From 72cbbc300ea8301bcd7a098dd4bdabf7c3b55c7e Mon Sep 17 00:00:00 2001
From: Dima Kan
Date: Mon, 17 Nov 2025 19:10:28 +0200
Subject: [PATCH 1/2] Implement RSS downloader, rewrite rules for improving
 transcription quality, and support for multiple podcasts

---
 rewrite_rules.json | 17 +
 src/add_title_keyword_field.py | 40 ++
 src/apply_rewrite_rules.py | 163 +++++++
 src/count_episodes.py | 77 ++++
 src/os_index.py | 35 +-
 src/os_ingest.py | 132 +++++-
 src/quick_upload.py | 2 +-
 src/rss_parser.py | 149 ++++++
 src/transcribe.py | 200 +++++++-
 test_rss_feasibility.py | 108 +++++
 ...-systems-the-unicorn-silk-sonic-methods.md | 0
 ...10-through-the-hurt-with-scotty-jackson.md | 0
 .../100-it-s-episode-100.md | 0
 .../101-systems-check-spring-2025.md | 0
 .../102-planning-doing-things-for-me-we.md | 0
 .../103-the-privilege-of-doing-this.md | 0
 .../104-checking-in-on-checking-out.md | 0
 .../11-huuuuuuuge-opportunities-for-you.md | 0
 .../12-con-doing-it-live.md | 0
 .../13-happiness-first-productivity-second.md | 0
 ...-the-appropriate-amount-of-expectations.md | 0
 .../15-jay-has-some-big-decisions.md | 0
 ...-friend-tells-you-what-you-already-knew.md | 0
 ...-related-subjects-for-your-productivity.md | 0
 ...putting-external-factors-in-a-vice-grip.md | 0
 ...s-spaghetti-combating-imposter-syndrome.md | 0
 .../2-when-you-cant-even-or-odd.md | 0
 .../20-living-in-a-world-where-and-exists.md | 0
 .../21-plan-to-be-flexible.md | 0
 .../22-you-con-clean-things-up.md | 0
 .../23-making-do-with-what-we-have.md | 0
 .../24-tiny-annoying-tasks.md | 0
 .../25-caring-to-care.md | 0
 .../26-the-content-dragon-appears.md | 0
 .../27-these-are-the-spoons-you-have.md | 0
 .../28-when-failure-happens.md | 0
 .../29-systems-check-summer-2022.md | 0
 .../3-getting-away-now.md | 0
 .../30-facing-the-reality-of-being-a-face.md | 0
 .../31-salt-pepper-garlic-fall.md | 0
 ...-first-of-all-is-it-fun-with-brad-dowdy.md | 0
 transcripts/{ => conduit_podcast}/33.md | 0
 .../34-debug-my-nervous-system.md | 0
 .../35-serious-unicorning-around.md | 0
 .../{ => conduit_podcast}/36-brain-bags.md | 0
 .../37-systems-check-fall-2022.md | 0
 .../38-scrape-scrap-and-survive.md | 0
 .../39-self-care-isn-t-selfish.md | 0
 .../4-big-and-little-wind-downs.md | 0
 .../40-throwing-off-the-emperor-s-groove.md | 0
 .../41-accountability-doing-it-right.md | 0
 .../42-we-asked-a-lot-of-questions.md | 0
 .../43-systems-check-chugga-chugga-to-do.md | 0
 ...aking-care-of-that-jar-in-the-preseason.md | 0
 .../45-adjusting-to-adjustments.md | 0
 .../46-feed-the-ducks.md | 0
 .../47-the-secret-soup-packing-sauce.md | 0
 ...ong-projects-remove-the-concept-of-time.md | 0
 ...ck-spring-2023-unicorns-and-thundercats.md | 0
 ...stained-progress-over-being-overwhelmed.md | 0
 .../50-friendship-hipaa.md | 0
 .../51-touching-squishy-brains.md | 0
 ...tes-that-right-there-thats-a-good-thing.md | 0
 .../53-it-s-a-mess-we-re-a-mess-it-s-messy.md | 0
 .../54-the-perfect-productivity-show.md | 0
 ...55-expectations-of-being-in-a-new-space.md | 0
 ...ummer-2023-sidetracked-with-doing-stuff.md | 0
 .../57-i-need-help-to-get-the-help.md | 0
 .../58-finding-your-rhythm.md | 0
 .../59-everything-is-broken.md | 0
 ...6-getting-back-into-the-swing-of-things.md | 0
 .../60-not-juggling-its-balancing.md | 0
 .../61-the-conduit-burnout-candle.md | 0
 .../62-systems-check-fall-2023.md | 0
 ...63-tiny-audits-for-a-mini-kondo-benefit.md | 0
 ...ty-software-with-no-productivity-system.md | 0
 ...cing-the-chaos-being-toxic-and-positive.md | 0
 .../66-i-wish-i-had-a-britnie.md | 0
 .../67-plan-but-be-ready-to-pivot.md | 0
 .../68-well-we-made-a-mistake.md | 0
 ...ystems-check-winter-2024-systems-che-ck.md | 0
 ...hat-were-not-gon-do-saying-no-to-things.md | 0
 .../{ => conduit_podcast}/70-gimme-f-g-zen.md | 0
 ...1-this-is-good-for-you-you-need-to-stop.md | 0
 .../72-back-that-thing-up.md | 0
 .../73-you-may-feel-some-slight-discomfort.md | 0
 .../74-knucklebones-big-risk-big-rewards.md | 0
 .../75-ideates-vs-executioners.md | 0
 .../76-systems-check-spring-2024.md | 0
 .../{ => conduit_podcast}/77-let-me-cook.md | 0
 .../78-big-bundles-of-identical-tasks.md | 0
 .../79-over-many-night-s-success.md | 0
 .../8-fear-of-success-because-if-i-succeed.md | 0
 .../80-jay-didnt-have-a-connection.md | 0
 ...81-bretts-mental-health-and-tech-corner.md | 0
 .../82-systems-check-summer-2024.md | 0
 .../83-the-tire-method-revisited.md | 0
 .../84-no-expectations-the-spiderman-story.md | 0
 .../85-embracing-boring.md | 0
 .../86-just-let-it-unfold.md | 0
 .../87-burned-all-the-way-out.md | 0
 .../88-more-work-thats-good-i-think.md | 0
 .../89-end-of-the-year-systems-check.md | 0
 .../9-decision-space-the-tire-technique.md | 0
 .../90-big-theme-guy-with-stephen-hackett.md | 0
 ...made-this-for-himself-and-maybe-you-too.md | 0
 .../92-finding-the-joy.md | 0
 ...3-burning-the-candle-in-the-name-of-joy.md | 0
 .../94-supervised-learning.md | 0
 .../95-communicating-about-communicating.md | 0
 .../{ => conduit_podcast}/96-body-grief.md | 0
 ...oad-with-the-unicorns-busiest-internets.md | 0
 ...sentation-advice-what-is-even-happening.md | 0
 .../99-expecting-expectations.md | 0
 ...mizer-with-daniel-wrigley-and-eric-pugh.md | 109 +++++
 ...rch-product-on-neural-search-principles.md | 309 +++++++++++++
 ...tionizing-e-commerce-with-vector-search.md | 384 ++++++++++++++++
 ...-2024-alessandro-benedetti-llms-in-solr.md | 158 +++++++
 ...s-2024-doug-turnbull-learning-in-public.md | 134 ++++++
 ...zzwords-2024-sonam-pankaj-embedanything.md | 104 +++++
 ...mi-on-the-weaviate-vector-search-engine.md | 54 +++
 ...mpathy-and-artifacts-with-john-berryman.md | 282 ++++++++++++
 ...tic-university-founder-at-henry-ai-labs.md | 306 +++++++++++++
 ...t-weaviate-chatgpt-llms-form-vs-meaning.md | 65 +++
 ...-ml-for-query-and-content-understanding.md | 269 +++++++++++
 ...vector-search-and-llms-with-leo-boytsov.md | 156 +++++++
 ...rch-as-a-constant-experimentation-cycle.md | 127 +++++
 ...ds-with-simon-eskildsen-ceo-turbopuffer.md | 91 ++++
 ...gh-measuring-search-quality-with-quepid.md | 201 ++++++++
 ...oka-data-at-the-core-of-all-the-cool-ml.md | 253 ++++++++++
 ...ector-database-and-working-with-clients.md | 344 ++++++++++++++
 ...ch-consultant-engineering-better-search.md | 321 +++++++++++++
 ...pinecone-vector-podcast-with-dmitry-kan.md | 170 +++++++
 ...of-vespa-from-sparse-into-neural-search.md | 102 +++++
 ...an-fontanals-principal-engineer-jina-ai.md | 151 ++++++
 ...andy-sql-meets-vector-search-at-rockset.md | 151 ++++++
 ...the-academia-industry-gap-with-haystack.md | 310 +++++++++++++
 ...le-in-embedding-computation-with-mighty.md | 433 ++++++++++++++++++
 .../saurabh-rai-growing-resume-matcher.md | 127 +++++
 ...f-swirl-search-in-siloed-data-with-llms.md | 378 +++++++++++++++
 ...-ii-bring-ai-to-company-data-with-swirl.md | 35 ++
 ...t-challenges-and-joys-of-ml-engineering.md | 238 ++++++++++
 .../trey-grainger-wormhole-vectors.md | 299 ++++++++++++
 ...-the-rise-fall-and-future-by-notebooklm.md | 101 ++++
 ...hium-hardware-accelerated-vector-search.md | 215 +++++++++
 ...-of-the-most-adopted-ann-algorithm-hnsw.md | 313 +++++++++++++
 ...-to-know-your-data-with-metric-learning.md | 186 ++++++++
 147 files changed, 7786 insertions(+), 13 deletions(-)
 create mode 100644 rewrite_rules.json
 create mode 100644 src/add_title_keyword_field.py
 create mode 100644 src/apply_rewrite_rules.py
 create mode 100644 src/count_episodes.py
 create mode 100644 src/rss_parser.py
 create mode 100644 test_rss_feasibility.py
 rename transcripts/{ => conduit_podcast}/1-our-systems-the-unicorn-silk-sonic-methods.md (100%)
 rename transcripts/{ => conduit_podcast}/10-through-the-hurt-with-scotty-jackson.md (100%)
 rename transcripts/{ => conduit_podcast}/100-it-s-episode-100.md (100%)
 rename transcripts/{ => conduit_podcast}/101-systems-check-spring-2025.md (100%)
 rename transcripts/{ => conduit_podcast}/102-planning-doing-things-for-me-we.md (100%)
 rename transcripts/{ => conduit_podcast}/103-the-privilege-of-doing-this.md (100%)
 rename transcripts/{ => conduit_podcast}/104-checking-in-on-checking-out.md (100%)
 rename transcripts/{ => conduit_podcast}/11-huuuuuuuge-opportunities-for-you.md (100%)
 rename transcripts/{ => conduit_podcast}/12-con-doing-it-live.md (100%)
 rename transcripts/{ => conduit_podcast}/13-happiness-first-productivity-second.md (100%)
 rename transcripts/{ => conduit_podcast}/14-giving-ourselves-the-appropriate-amount-of-expectations.md (100%)
 rename transcripts/{ => conduit_podcast}/15-jay-has-some-big-decisions.md (100%)
 rename transcripts/{ => conduit_podcast}/16-a-good-friend-tells-you-what-you-already-knew.md (100%)
 rename transcripts/{ => conduit_podcast}/17-space-and-related-subjects-for-your-productivity.md (100%)
 rename transcripts/{ => conduit_podcast}/18-putting-external-factors-in-a-vice-grip.md (100%)
 rename transcripts/{ => conduit_podcast}/19-eating-the-devils-spaghetti-combating-imposter-syndrome.md (100%)
 rename transcripts/{ => conduit_podcast}/2-when-you-cant-even-or-odd.md (100%)
 rename transcripts/{ => conduit_podcast}/20-living-in-a-world-where-and-exists.md (100%)
 rename transcripts/{ => conduit_podcast}/21-plan-to-be-flexible.md (100%)
 rename transcripts/{ => conduit_podcast}/22-you-con-clean-things-up.md (100%)
 rename transcripts/{ => conduit_podcast}/23-making-do-with-what-we-have.md (100%)
 rename transcripts/{ => conduit_podcast}/24-tiny-annoying-tasks.md (100%)
 rename transcripts/{ => conduit_podcast}/25-caring-to-care.md (100%)
 rename transcripts/{ => conduit_podcast}/26-the-content-dragon-appears.md (100%)
 rename transcripts/{ => conduit_podcast}/27-these-are-the-spoons-you-have.md (100%)
 rename transcripts/{ => conduit_podcast}/28-when-failure-happens.md (100%)
 rename transcripts/{ => conduit_podcast}/29-systems-check-summer-2022.md (100%)
 rename transcripts/{ => conduit_podcast}/3-getting-away-now.md (100%)
 rename transcripts/{ => conduit_podcast}/30-facing-the-reality-of-being-a-face.md (100%)
 rename transcripts/{ => conduit_podcast}/31-salt-pepper-garlic-fall.md (100%)
 rename transcripts/{ => conduit_podcast}/32-first-of-all-is-it-fun-with-brad-dowdy.md (100%)
 rename transcripts/{ => conduit_podcast}/33.md (100%)
 rename transcripts/{ => conduit_podcast}/34-debug-my-nervous-system.md (100%)
 rename transcripts/{ => conduit_podcast}/35-serious-unicorning-around.md (100%)
 rename transcripts/{ => conduit_podcast}/36-brain-bags.md (100%)
 rename transcripts/{ => conduit_podcast}/37-systems-check-fall-2022.md (100%)
 rename transcripts/{ => conduit_podcast}/38-scrape-scrap-and-survive.md (100%)
 rename transcripts/{ => conduit_podcast}/39-self-care-isn-t-selfish.md (100%)
 rename transcripts/{ => conduit_podcast}/4-big-and-little-wind-downs.md (100%)
 rename transcripts/{ => conduit_podcast}/40-throwing-off-the-emperor-s-groove.md (100%)
 rename transcripts/{ => conduit_podcast}/41-accountability-doing-it-right.md (100%)
 rename transcripts/{ => conduit_podcast}/42-we-asked-a-lot-of-questions.md (100%)
 rename transcripts/{ => conduit_podcast}/43-systems-check-chugga-chugga-to-do.md (100%)
 rename transcripts/{ => conduit_podcast}/44-taking-care-of-that-jar-in-the-preseason.md (100%)
 rename transcripts/{ => conduit_podcast}/45-adjusting-to-adjustments.md (100%)
 rename transcripts/{ => conduit_podcast}/46-feed-the-ducks.md (100%)
 rename transcripts/{ => conduit_podcast}/47-the-secret-soup-packing-sauce.md (100%)
 rename transcripts/{ => conduit_podcast}/48-long-projects-remove-the-concept-of-time.md (100%)
 rename transcripts/{ => conduit_podcast}/49-systems-check-spring-2023-unicorns-and-thundercats.md (100%)
 rename transcripts/{ => conduit_podcast}/5-sustained-progress-over-being-overwhelmed.md (100%)
 rename transcripts/{ => conduit_podcast}/50-friendship-hipaa.md (100%)
 rename transcripts/{ => conduit_podcast}/51-touching-squishy-brains.md (100%)
 rename transcripts/{ => conduit_podcast}/52-notes-that-right-there-thats-a-good-thing.md (100%)
 rename transcripts/{ => conduit_podcast}/53-it-s-a-mess-we-re-a-mess-it-s-messy.md (100%)
 rename transcripts/{ => conduit_podcast}/54-the-perfect-productivity-show.md (100%)
 rename transcripts/{ => conduit_podcast}/55-expectations-of-being-in-a-new-space.md (100%)
 rename transcripts/{ => conduit_podcast}/56-systems-check-summer-2023-sidetracked-with-doing-stuff.md (100%)
 rename transcripts/{ => conduit_podcast}/57-i-need-help-to-get-the-help.md (100%)
 rename transcripts/{ => conduit_podcast}/58-finding-your-rhythm.md (100%)
 rename transcripts/{ => conduit_podcast}/59-everything-is-broken.md (100%)
 rename transcripts/{ => conduit_podcast}/6-getting-back-into-the-swing-of-things.md (100%)
 rename transcripts/{ => conduit_podcast}/60-not-juggling-its-balancing.md (100%)
 rename transcripts/{ => conduit_podcast}/61-the-conduit-burnout-candle.md (100%)
 rename transcripts/{ => conduit_podcast}/62-systems-check-fall-2023.md (100%)
 rename transcripts/{ => conduit_podcast}/63-tiny-audits-for-a-mini-kondo-benefit.md (100%)
 rename transcripts/{ => conduit_podcast}/64-creating-productivity-software-with-no-productivity-system.md (100%)
 rename transcripts/{ => conduit_podcast}/65-embracing-the-chaos-being-toxic-and-positive.md (100%)
 rename transcripts/{ => conduit_podcast}/66-i-wish-i-had-a-britnie.md (100%)
 rename transcripts/{ => conduit_podcast}/67-plan-but-be-ready-to-pivot.md (100%)
 rename transcripts/{ => conduit_podcast}/68-well-we-made-a-mistake.md (100%)
 rename transcripts/{ => conduit_podcast}/69-systems-check-winter-2024-systems-che-ck.md (100%)
 rename transcripts/{ => conduit_podcast}/7-see-what-were-not-gon-do-saying-no-to-things.md (100%)
 rename transcripts/{ => conduit_podcast}/70-gimme-f-g-zen.md (100%)
 rename transcripts/{ => conduit_podcast}/71-this-is-good-for-you-you-need-to-stop.md (100%)
 rename transcripts/{ => conduit_podcast}/72-back-that-thing-up.md (100%)
 rename transcripts/{ => conduit_podcast}/73-you-may-feel-some-slight-discomfort.md (100%)
 rename transcripts/{ => conduit_podcast}/74-knucklebones-big-risk-big-rewards.md (100%)
 rename transcripts/{ => conduit_podcast}/75-ideates-vs-executioners.md (100%)
 rename transcripts/{ => conduit_podcast}/76-systems-check-spring-2024.md (100%)
 rename transcripts/{ => conduit_podcast}/77-let-me-cook.md (100%)
 rename transcripts/{ => conduit_podcast}/78-big-bundles-of-identical-tasks.md (100%)
 rename transcripts/{ => conduit_podcast}/79-over-many-night-s-success.md (100%)
 rename transcripts/{ => conduit_podcast}/8-fear-of-success-because-if-i-succeed.md (100%)
 rename transcripts/{ => conduit_podcast}/80-jay-didnt-have-a-connection.md (100%)
 rename transcripts/{ => conduit_podcast}/81-bretts-mental-health-and-tech-corner.md (100%)
 rename transcripts/{ => conduit_podcast}/82-systems-check-summer-2024.md (100%)
 rename transcripts/{ => conduit_podcast}/83-the-tire-method-revisited.md (100%)
 rename transcripts/{ => conduit_podcast}/84-no-expectations-the-spiderman-story.md (100%)
 rename transcripts/{ => conduit_podcast}/85-embracing-boring.md (100%)
 rename transcripts/{ => conduit_podcast}/86-just-let-it-unfold.md (100%)
 rename transcripts/{ => conduit_podcast}/87-burned-all-the-way-out.md (100%)
 rename transcripts/{ => conduit_podcast}/88-more-work-thats-good-i-think.md (100%)
 rename transcripts/{ => conduit_podcast}/89-end-of-the-year-systems-check.md (100%)
 rename transcripts/{ => conduit_podcast}/9-decision-space-the-tire-technique.md (100%)
 rename transcripts/{ => conduit_podcast}/90-big-theme-guy-with-stephen-hackett.md (100%)
 rename transcripts/{ => conduit_podcast}/91-robb-knight-made-this-for-himself-and-maybe-you-too.md (100%)
 rename transcripts/{ => conduit_podcast}/92-finding-the-joy.md (100%)
 rename transcripts/{ => conduit_podcast}/93-burning-the-candle-in-the-name-of-joy.md (100%)
 rename transcripts/{ => conduit_podcast}/94-supervised-learning.md (100%)
 rename transcripts/{ => conduit_podcast}/95-communicating-about-communicating.md (100%)
 rename transcripts/{ => conduit_podcast}/96-body-grief.md (100%)
 rename transcripts/{ => conduit_podcast}/97-reducing-load-with-the-unicorns-busiest-internets.md (100%)
 rename transcripts/{ => conduit_podcast}/98-presentation-advice-what-is-even-happening.md (100%)
 rename transcripts/{ => conduit_podcast}/99-expecting-expectations.md (100%)
 create mode 100644 transcripts/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md
 create mode 100644 transcripts/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md
 create mode 100644 transcripts/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md
 create mode 100644 transcripts/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md
 create mode 100644 transcripts/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md
 create mode 100644 transcripts/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md
 create mode 100644 transcripts/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md
 create mode 100644 transcripts/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md
 create mode 100644 transcripts/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md
 create mode 100644 transcripts/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md
 create mode 100644 transcripts/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md
 create mode 100644 transcripts/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md
 create mode 100644 transcripts/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md
 create mode 100644 transcripts/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md
 create mode 100644 transcripts/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md
 create mode 100644 transcripts/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md
 create mode 100644 transcripts/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md
 create mode 100644 transcripts/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md
 create mode 100644 transcripts/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md
 create mode 100644 transcripts/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md
 create mode 100644 transcripts/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md
 create mode 100644 transcripts/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md
 create mode 100644 transcripts/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md
 create mode 100644 transcripts/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md
 create mode 100644 transcripts/vector-podcast/saurabh-rai-growing-resume-matcher.md
 create mode 100644 transcripts/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md
 create mode 100644 transcripts/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md
 create mode 100644 transcripts/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md
 create mode 100644 transcripts/vector-podcast/trey-grainger-wormhole-vectors.md
 create mode 100644 transcripts/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md
 create mode 100644 transcripts/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md
 create mode 100644 transcripts/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md
 create mode 100644 transcripts/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md

diff --git a/rewrite_rules.json b/rewrite_rules.json
new file mode 100644
index 0000000..48cb018
--- /dev/null
+++ b/rewrite_rules.json
@@ -0,0 +1,17 @@
+{
+    "corner shortened": "Connor Shorten",
+    "Dimeji Conan": "Dmitry Kan",
+    "Dimitri Can": "Dmitry Kan",
+    "Dimitri": "Dmitry",
+    "Mietri": "Dmitry",
+    "Leo Boyzov": "Leo Boytsov",
+    "Cupid": "Quepid",
+    "Daniel Tungerling": "Daniel Tunkelang",
+    "Daniel Tungaling": "Daniel Tunkelang",
+    "VV8": "Weaviate",
+    "Weeveate": "Weaviate",
+    "Pine Cone": "Pinecone",
+    "FAYS": "FAISS",
+    "Yanny Vaknin": "Yaniv Vaknin",
+    "deep set": "Deepset"
+}
\ No newline at end of file
diff --git a/src/add_title_keyword_field.py b/src/add_title_keyword_field.py
new file mode 100644
index 0000000..9890a12
--- /dev/null
+++ b/src/add_title_keyword_field.py
@@ -0,0 +1,40 @@
+"""
+Add keyword subfield to title field in existing OpenSearch index.
+This allows aggregations on the title field.
+"""
+
+import os
+from dotenv import load_dotenv
+from opensearchpy import OpenSearch
+
+load_dotenv()
+
+connection_string = os.getenv("OPENSEARCH_SERVICE_URI")
+index_name = os.getenv("INDEX_NAME", "embedded_vp_transcripts")
+client = OpenSearch(connection_string, use_ssl=True, timeout=100)
+
+# Update mapping to add keyword subfield to title
+mapping_update = {
+    "properties": {
+        "title": {
+            "type": "text",
+            "fields": {
+                "keyword": {"type": "keyword"}
+            }
+        }
+    }
+}
+
+print(f"Updating mapping for index: {index_name}")
+try:
+    response = client.indices.put_mapping(
+        index=index_name,
+        body=mapping_update
+    )
+    print(f"✅ Mapping updated successfully: {response}")
+    print("\nNote: The keyword subfield will only be available for new documents.")
+    print("To make it available for existing documents, you'll need to reindex.")
+except Exception as e:
+    print(f"❌ Error updating mapping: {e}")
+    raise
diff --git a/src/apply_rewrite_rules.py b/src/apply_rewrite_rules.py
new file mode 100644
index 0000000..aa02215
--- /dev/null
+++ b/src/apply_rewrite_rules.py
@@ -0,0 +1,163 @@
+"""Apply rewrite rules to transcript files to correct common transcription errors."""
+
+import json
+import pathlib
+import re
+from typing import Dict
+
+import frontmatter
+import typer
+import typing_extensions
+from rich.progress import track
+
+app = typer.Typer()
+
+
+def load_rewrite_rules(rules_file: pathlib.Path) -> Dict[str, str]:
+    """Load rewrite rules from a JSON file."""
+    if not rules_file.exists():
+        typer.echo(f"Error: Rules file not found: {rules_file}", err=True)
+        raise typer.Exit(1)
+
+    try:
+        with open(rules_file, "r", encoding="utf-8") as f:
+            rules = json.load(f)
+        return rules
+    except json.JSONDecodeError as e:
+        typer.echo(f"Error: Invalid JSON in rules file: {e}", err=True)
+        raise typer.Exit(1)
+
+
+def apply_rewrite_rules(text: str, rules: Dict[str, str]) -> str:
+    """
+    Apply rewrite rules to text using case-insensitive matching with word boundaries.
+
+    Args:
+        text: The text to process
+        rules: Dictionary mapping incorrect text to correct text
+
+    Returns:
+        Text with rewrite rules applied
+    """
+    result = text
+
+    for incorrect, correct in rules.items():
+        # Build a case-insensitive regex with word boundaries at the start and end.
+        # For multi-word phrases, replace spaces with \s+ to allow flexible whitespace.
+        escaped = re.escape(incorrect)
+        # Replace literal spaces with \s+ (since Python 3.7, re.escape no longer
+        # escapes spaces, so replace the space character itself)
+        pattern = escaped.replace(" ", r"\s+")
+        # Add word boundaries at start and end
+        pattern = r"\b" + pattern + r"\b"
+        result = re.sub(pattern, correct, result, flags=re.IGNORECASE)
+
+    return result
+
+
+def process_transcript_file(
+    input_file: pathlib.Path,
+    output_file: pathlib.Path,
+    rules: Dict[str, str],
+) -> bool:
+    """
+    Process a single transcript file by applying rewrite rules.
+
+    Args:
+        input_file: Path to input transcript file
+        output_file: Path to output transcript file
+        rules: Dictionary of rewrite rules
+
+    Returns:
+        True if successful, False otherwise
+    """
+    try:
+        # Read and parse the file with frontmatter
+        content = input_file.read_text(encoding="utf-8")
+        post = frontmatter.loads(content)
+
+        # Apply rewrite rules to the content (not the frontmatter metadata)
+        corrected_content = apply_rewrite_rules(post.content, rules)
+
+        # Update the post with corrected content
+        post.content = corrected_content
+
+        # Ensure output directory exists
+        output_file.parent.mkdir(parents=True, exist_ok=True)
+
+        # Write the corrected file
+        output_file.write_text(frontmatter.dumps(post), encoding="utf-8")
+
+        return True
+    except Exception as e:
+        typer.echo(f"Error processing {input_file}: {e}", err=True)
+        return False
+
+
+@app.command()
+def apply_rules(
+    input_dir: typing_extensions.Annotated[
+        pathlib.Path,
+        typer.Option("--input-dir", "-i", help="Input directory containing transcript files"),
+    ],
+    output_dir: typing_extensions.Annotated[
+        pathlib.Path,
+        typer.Option("--output-dir", "-o", help="Output directory for corrected transcript files"),
+    ],
+    rules_file: typing_extensions.Annotated[
+        pathlib.Path,
+        typer.Option("--rules-file", "-r", help="Path to rewrite rules JSON file"),
+    ] = pathlib.Path("rewrite_rules.json"),
+):
+    """
+    Apply rewrite rules to transcript files.
+
+    Processes all .md files in the input directory recursively and writes
+    corrected versions to the output directory, preserving directory structure.
+    """
+    # Validate input directory
+    if not input_dir.exists():
+        typer.echo(f"Error: Input directory does not exist: {input_dir}", err=True)
+        raise typer.Exit(1)
+
+    if not input_dir.is_dir():
+        typer.echo(f"Error: Input path is not a directory: {input_dir}", err=True)
+        raise typer.Exit(1)
+
+    # Load rewrite rules
+    typer.echo(f"Loading rewrite rules from {rules_file}")
+    rules = load_rewrite_rules(rules_file)
+    typer.echo(f"Loaded {len(rules)} rewrite rules")
+
+    # Find all markdown files recursively
+    md_files = list(input_dir.rglob("*.md"))
+
+    if not md_files:
+        typer.echo(f"No .md files found in {input_dir}")
+        return
+
+    typer.echo(f"Found {len(md_files)} transcript files to process")
+
+    # Process each file
+    successful = 0
+    failed = 0
+
+    for input_file in track(md_files, description="Processing files"):
+        # Calculate relative path from input directory
+        relative_path = input_file.relative_to(input_dir)
+        output_file = output_dir / relative_path
+
+        if process_transcript_file(input_file, output_file, rules):
+            successful += 1
+        else:
+            failed += 1
+
+    typer.echo("\nProcessing complete:")
+    typer.echo(f"  ✓ Successfully processed: {successful}")
+    if failed > 0:
+        typer.echo(f"  ✗ Failed: {failed}", err=True)
+
+
+if __name__ == "__main__":
+    app()
diff --git a/src/count_episodes.py b/src/count_episodes.py
new file mode 100644
index 0000000..9a90ab7
--- /dev/null
+++ b/src/count_episodes.py
@@ -0,0 +1,77 @@
+"""
+Count unique episodes in OpenSearch index.
+This script provides a workaround if title.keyword doesn't exist yet.
+"""
+
+import os
+from dotenv import load_dotenv
+from opensearchpy import OpenSearch
+
+load_dotenv()
+
+connection_string = os.getenv("OPENSEARCH_SERVICE_URI")
+index_name = os.getenv("INDEX_NAME", "embedded_vp_transcripts")
+client = OpenSearch(connection_string, use_ssl=True, timeout=100)
+
+# Try with title.keyword first
+query = {
+    "size": 0,
+    "query": {
+        "match_all": {}
+    },
+    "aggs": {
+        "unique_episodes": {
+            "terms": {
+                "field": "title.keyword",
+                "size": 10000
+            }
+        }
+    }
+}
+
+print(f"Querying index: {index_name}")
+try:
+    response = client.search(index=index_name, body=query)
+    buckets = response.get("aggregations", {}).get("unique_episodes", {}).get("buckets", [])
+
+    if buckets:
+        print(f"✅ Found {len(buckets)} unique episodes")
+        print("\nFirst 10 episodes:")
+        for bucket in buckets[:10]:
+            print(f"  - {bucket['key']}: {bucket['doc_count']} chunks")
+    else:
+        print("⚠️ No results found. Trying alternative approach...")
+        # Alternative: get all unique titles using a different method
+        # This requires fetching documents and deduplicating
+        scroll_query = {
+            "size": 1000,
+            "_source": ["title"],
+            "query": {"match_all": {}}
+        }
+
+        titles = set()
+        response = client.search(index=index_name, body=scroll_query, scroll="2m")
+        scroll_id = response.get("_scroll_id")
+
+        while True:
+            hits = response.get("hits", {}).get("hits", [])
+            if not hits:
+                break
+
+            for hit in hits:
+                title = hit.get("_source", {}).get("title")
+                if title:
+                    titles.add(title)
+
+            if not scroll_id:
+                break
+
+            response = client.scroll(scroll_id=scroll_id, scroll="2m")
+            scroll_id = response.get("_scroll_id")
+
+        print(f"✅ Found {len(titles)} unique episodes (using workaround method)")
+
+except Exception as e:
+    print(f"❌ Error: {e}")
+    raise
diff --git a/src/os_index.py b/src/os_index.py
index 66b1465..092a886 100644
--- a/src/os_index.py
+++ b/src/os_index.py
@@ -21,10 +21,16 @@
     },
     "mappings": {
         "properties": {
-            "title": {"type": "text"},
-            "episode_number": {"type": "int"},
+            "title": {
+                "type": "text",
+                "fields": {
+                    "keyword": {"type": "keyword"}
+                }
+            },
+            "episode_number": {"type": "integer"},
             "description": {"type": "text"},
             "url": {"type": "keyword"},
+            "image_url": {"type": "keyword"},
             "content": {"type": "text"},
             "content_vector": {
                 "type": "knn_vector",
@@ -46,6 +52,29 @@ def create_index(index_name: str = index_name, index_settings=index_settings):
     """Checks for existing index and deletes it and recreates it if it exists"""
     if client.indices.exists(index=index_name):
+        print(f"Deleting existing index {index_name}")
         client.indices.delete(index=index_name)
-    client.indices.create(index=index_name, body=index_settings, ignore=400)
+    print(f"Creating index {index_name} with knn_vector mapping")
+    try:
+        response = client.indices.create(
+            index=index_name,
+            body=index_settings
+        )
+        print(f"✅ Index created successfully: {response}")
+    except Exception as e:
+        print(f"❌ Error creating index: {e}")
+        raise
+
+    # Verify the mapping was applied correctly
+    print("\nVerifying index mapping...")
+    mapping = client.indices.get_mapping(index=index_name)
+    content_vector_type = mapping[index_name]["mappings"]["properties"].get("content_vector", {}).get("type")
+    if content_vector_type == "knn_vector":
+        print("✅ content_vector field is correctly set to knn_vector")
+    else:
+        print(f"⚠️ WARNING: content_vector field type is '{content_vector_type}', expected 'knn_vector'")
+        print("   This may cause vector search to fail. Please recreate the index.")
+
+if __name__ == "__main__":
+    create_index()
\ No newline at end of file
diff --git a/src/os_ingest.py b/src/os_ingest.py
index 47c556a..c7de698 100644
--- a/src/os_ingest.py
+++ b/src/os_ingest.py
@@ -2,9 +2,13 @@
 import os
 import pathlib
 import re
+import typing
 import uuid
+import typing_extensions
 import arrow
+import slugify
+import typer
 from dotenv import load_dotenv
 from langchain_huggingface import HuggingFaceEmbeddings
 from langchain_text_splitters import RecursiveCharacterTextSplitter
@@ -37,6 +41,46 @@ def os_load_data_from_file(file: pathlib.Path):
     """Chunk data, create embeddings, and index in OpenSearch."""
     docs = []
+    # Load frontmatter and extract metadata
+    frontmatter_post = frontmatter.loads(file.read_text())
+
+    # Extract episode_number from title (e.g., "1 - Title" -> 1)
+    episode_number_match = re.match(r"^\d+", frontmatter_post["title"])
+    episode_number = int(episode_number_match.group()) if episode_number_match else None
+
+    # Parse pub_date - try multiple formats
+    pub_date_str = frontmatter_post["pub_date"]
+    pub_date = None
+
+    # Try the defined format first
+    try:
+        pub_date = arrow.get(pub_date_str, fmt).date().isoformat()
+    except (arrow.parser.ParserError, arrow.parser.ParserMatchError, ValueError):
+        # Try RFC 2822 format (e.g., "Fri, 07 Nov 2025 05:58:00 GMT")
+        try:
+            pub_date = arrow.get(pub_date_str, "ddd, DD MMM YYYY HH:mm:ss ZZZ").date().isoformat()
+        except (arrow.parser.ParserError, arrow.parser.ParserMatchError, ValueError):
+            # Try RFC 2822 without timezone
+            try:
+                pub_date = arrow.get(pub_date_str, "ddd, DD MMM YYYY HH:mm:ss").date().isoformat()
+            except (arrow.parser.ParserError, arrow.parser.ParserMatchError, ValueError):
+                # Last resort: try to parse with dateutil (more flexible)
+                from dateutil import parser as dateutil_parser
+                pub_date = dateutil_parser.parse(pub_date_str).date().isoformat()
+
+    if pub_date is None:
+        raise ValueError(f"Could not parse pub_date: {pub_date_str}")
+
+    # Build base_data with metadata fields
+    base_data = {
+        "title": frontmatter_post["title"],
+        "episode_number": episode_number,
+        "description": frontmatter_post.get("description", ""),
+        "url": frontmatter_post.get("url", ""),
+        "image_url": frontmatter_post.get("image_url", ""),
+        "pub_date": pub_date,
+    }
+
     post_chunks = splitter.create_documents([frontmatter_post.content])
     for post_chunk in post_chunks:
         doc = {
@@ -50,5 +94,91 @@ def os_load_data_from_file(file: pathlib.Path):
             },
         }
         docs.append(doc)
-    response = helpers.bulk(client, docs)
+    response = helpers.bulk(client, docs, index=INDEX_NAME)
     return response
+
+
+app = typer.Typer()
+
+
+@app.command()
+def ingest(
+    episode: typing_extensions.Annotated[
+        typing.Optional[str],
+        typer.Option("--episode", "-e", help="Episode name (slugified filename without .md extension)"),
+    ] = None,
+    show: typing_extensions.Annotated[
+        typing.Optional[str],
+        typer.Option("--show", "-s", help="Show name (subdirectory in transcripts/ folder)"),
+    ] = None,
+    all: typing_extensions.Annotated[
+        bool,
+        typer.Option("--all", "-a", help="Process all transcript files"),
+    ] = False,
+):
+    """
+    Ingest transcript files into OpenSearch.
+
+    Can process a specific episode, all episodes in a show, or all episodes.
+ """ + transcripts_dir = pathlib.Path("transcripts") + + if all: + # Process all files in transcripts directory (including subdirectories) + if show: + # Process all files in the show subdirectory + show_dir = transcripts_dir / slugify.slugify(show) + if not show_dir.exists(): + typer.echo(f"Show directory not found: {show_dir}", err=True) + raise typer.Exit(1) + files = list(show_dir.glob("*.md")) + else: + # Process all files in transcripts and all subdirectories + files = list(transcripts_dir.rglob("*.md")) + + if not files: + typer.echo("No transcript files found", err=True) + raise typer.Exit(1) + + typer.echo(f"Processing {len(files)} transcript file(s)...") + for file in files: + typer.echo(f"Processing: {file}") + try: + os_load_data_from_file(file) + typer.echo(f"✓ Successfully ingested {file.name}") + except Exception as e: + typer.echo(f"✗ Error processing {file.name}: {e}", err=True) + + elif episode: + # Process a specific episode + if show: + # Look in the show subdirectory + file_path = transcripts_dir / slugify.slugify(show) / f"{slugify.slugify(episode)}.md" + else: + # Look in transcripts root or try to find it recursively + file_path = transcripts_dir / f"{slugify.slugify(episode)}.md" + if not file_path.exists(): + # Try to find it in subdirectories + found_files = list(transcripts_dir.rglob(f"{slugify.slugify(episode)}.md")) + if found_files: + file_path = found_files[0] + + if not file_path.exists(): + typer.echo(f"Episode file not found: {file_path}", err=True) + raise typer.Exit(1) + + typer.echo(f"Processing: {file_path}") + try: + os_load_data_from_file(file_path) + typer.echo(f"✓ Successfully ingested {file_path.name}") + except Exception as e: + typer.echo(f"✗ Error processing {file_path.name}: {e}", err=True) + raise typer.Exit(1) + + else: + typer.echo("Error: Must specify either --episode or --all", err=True) + raise typer.Exit(1) + + +if __name__ == "__main__": + app() diff --git a/src/quick_upload.py b/src/quick_upload.py index 
fe446b3..090f6c3 100644 --- a/src/quick_upload.py +++ b/src/quick_upload.py @@ -15,7 +15,7 @@ ARROW_FMT = r"MMMM[\s+]D[\w+,\s+]YYYY" -def process_frontmatter(file: pathlib.Path) +def process_frontmatter(file: pathlib.Path): frontmatter_post = frontmatter.loads( file.read_text() ) # loads the metadata from the file diff --git a/src/rss_parser.py b/src/rss_parser.py new file mode 100644 index 0000000..03cadc9 --- /dev/null +++ b/src/rss_parser.py @@ -0,0 +1,149 @@ +"""Parse RSS feed to extract episode information and audio URLs.""" + +import xml.etree.ElementTree as ET +from typing import List, Dict, Optional +import httpx +import typer + + +def parse_rss_feed(rss_url: str) -> List[Dict[str, str]]: + """ + Parse an RSS feed and extract episode information. + + Args: + rss_url: URL of the RSS feed + + Returns: + List of dictionaries containing episode metadata and audio URLs + """ + typer.echo(f"Fetching RSS feed from {rss_url}") + response = httpx.get(rss_url) + response.raise_for_status() + + root = ET.fromstring(response.text) + + # Define namespaces + namespaces = { + 'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd', + 'podcast': 'https://podcastindex.org/namespace/1.0', + } + + # Find all items in the feed + items = root.findall('.//item') + episodes = [] + + for item in items: + # Extract title (handle CDATA) + title_elem = item.find('title') + if title_elem is not None: + title = title_elem.text if title_elem.text else "" + # Clean up CDATA if present + title = title.strip() + else: + title = "Unknown Title" + + # Extract description (handle CDATA) + desc_elem = item.find('description') + if desc_elem is not None: + description = desc_elem.text if desc_elem.text else "" + description = description.strip() + else: + description = "" + + # Extract publication date + pub_date_elem = item.find('pubDate') + pub_date = pub_date_elem.text if pub_date_elem is not None and pub_date_elem.text else "" + + # Extract link + link_elem = item.find('link') + link = 
link_elem.text if link_elem is not None and link_elem.text else "" + + # Extract audio URL from enclosure + enclosure = item.find('enclosure') + audio_url = None + if enclosure is not None: + audio_url = enclosure.get('url') + + # Extract episode number from itunes:episode if available + episode_elem = item.find('.//itunes:episode', namespaces) + episode_number = None + if episode_elem is not None and episode_elem.text: + try: + episode_number = int(episode_elem.text) + except (ValueError, AttributeError): + pass + + # Extract image URL from itunes:image if available + image_elem = item.find('.//itunes:image', namespaces) + image_url = None + if image_elem is not None: + image_url = image_elem.get('href') + + if audio_url: + episode_data = { + 'title': title, + 'description': description, + 'pub_date': pub_date, + 'url': link, + 'audio_url': audio_url, + 'episode_number': episode_number, + 'image_url': image_url, + } + episodes.append(episode_data) + + typer.echo(f"Found {len(episodes)} episodes in RSS feed") + return episodes + + +def get_episode_by_number(rss_url: str, episode_number: int) -> Optional[Dict[str, str]]: + """ + Get a specific episode by episode number from the RSS feed. + + Args: + rss_url: URL of the RSS feed + episode_number: Episode number to retrieve + + Returns: + Episode dictionary or None if not found + """ + episodes = parse_rss_feed(rss_url) + for episode in episodes: + if episode.get('episode_number') == episode_number: + return episode + return None + + +def get_latest_episode(rss_url: str) -> Optional[Dict[str, str]]: + """ + Get the latest episode from the RSS feed (first item). + + Args: + rss_url: URL of the RSS feed + + Returns: + Latest episode dictionary or None if feed is empty + """ + episodes = parse_rss_feed(rss_url) + return episodes[0] if episodes else None + + +def get_audio_url_from_rss_episode(episode: Dict[str, str]) -> tuple[Dict[str, str], str]: + """ + Extract metadata and audio URL from an episode dictionary. 
+ + Args: + episode: Episode dictionary from RSS feed + + Returns: + Tuple of (metadata_dict, audio_url) + """ + metadata = { + 'title': episode.get('title', 'Unknown'), + 'url': episode.get('url', ''), + 'description': episode.get('description', ''), + 'pub_date': episode.get('pub_date', ''), + 'image_url': episode.get('image_url', ''), + } + audio_url = episode.get('audio_url', '') + return (metadata, audio_url) + diff --git a/src/transcribe.py b/src/transcribe.py index 9119d32..5be7d36 100644 --- a/src/transcribe.py +++ b/src/transcribe.py @@ -10,14 +10,53 @@ import httpx import slugify import typer -import whisper from langchain_text_splitters import RecursiveCharacterTextSplitter from rich.progress import track from rich.prompt import Confirm from url_finder import fetch_latest_episode_number, get_audio_url_from_episode_number +from rss_parser import ( + parse_rss_feed, + get_episode_by_number, + get_latest_episode, + get_audio_url_from_rss_episode, +) + +# Lazy import and loading of Whisper model +_whisper_model = None +_whisper_module = None + + +def _get_whisper_model(): + """Lazily load the Whisper model only when needed.""" + global _whisper_model, _whisper_module + + if _whisper_model is not None: + return _whisper_model + + try: + import whisper + _whisper_module = whisper + except ImportError: + typer.echo( + "Error: Whisper not found. Please install it with: pip install openai-whisper", + err=True, + ) + raise typer.Exit(1) + + try: + _whisper_model = whisper.load_model("base") + return _whisper_model + except AttributeError as e: + typer.echo( + f"Error: Whisper module doesn't have expected API. 
" + f"Please ensure you have 'openai-whisper' installed: pip install openai-whisper", + err=True, + ) + typer.echo(f"Details: {e}", err=True) + raise typer.Exit(1) + app = typer.Typer() -model = whisper.load_model("base") splitter = RecursiveCharacterTextSplitter( chunk_size=300, separators=[".", "!", "?", "\n"], @@ -51,7 +90,10 @@ def download_audio_file(url: str) -> str: def transcribe_audio_file(audio_file: pathlib.Path) -> str: """Transcribe an audio file to text""" - + model = _get_whisper_model() + # _whisper_module is guaranteed to be set if _get_whisper_model() succeeded + whisper = _whisper_module + audio = whisper.load_audio(str(audio_file)) transcription = model.transcribe(audio=audio, verbose=False) @@ -77,10 +119,25 @@ def transcribe_file( return output_file.write_text(transcription) -def transcribe_from_audio_url(audio_url: str) -> int: +def transcribe_from_audio_url(audio_url: str) -> str: typer.echo(f"Transcribing audio from {audio_url}") - audio_file = download_audio_file(audio_url) - return transcribe_audio_file(audio_file) + audio_file_path = download_audio_file(audio_url) + transcription = transcribe_audio_file(pathlib.Path(audio_file_path)) + # Clean up temporary file + pathlib.Path(audio_file_path).unlink() + return transcription + + +def get_output_file_path(metadata: dict, show: typing.Optional[str] = None) -> pathlib.Path: + """Get the output file path for a transcript based on metadata and show name.""" + if show: + return pathlib.Path( + f"transcripts/{slugify.slugify(show)}/{slugify.slugify(metadata['title'])}.md" + ) + else: + return pathlib.Path( + f"transcripts/{slugify.slugify(metadata['title'])}.md" + ) @app.command(name="ep") @@ -95,6 +152,14 @@ def transcribe_from_episode_number( "-r", "--range", help="two numbers separated by a dash ('-'). 
Example: 1-10" ), ] = None, + show: typing_extensions.Annotated[ + typing.Optional[str], + typer.Option("--show", help="Show name to use as subdirectory in transcripts/ folder"), + ] = None, + skip_if_exists: typing_extensions.Annotated[ + bool, + typer.Option("--skip-if-exists", help="Skip episodes for which transcriptions already exist"), + ] = False, ): """ Transcribe an episode from an episode number @@ -122,13 +187,132 @@ def transcribe_from_episode_number( for episode_number in track(episode_numbers): metadata, audio_url = get_audio_url_from_episode_number(episode_number) + output_file = get_output_file_path(metadata, show) + + if skip_if_exists and output_file.exists(): + typer.echo(f"Skipping episode {episode_number}: {output_file} already exists") + continue + transcription = transcribe_from_audio_url(audio_url) post = frontmatter.Post( "\n".join(splitter.split_text(transcription)), **metadata ) - output_file = pathlib.Path( - f"transcripts/{slugify.slugify(metadata['title'])}.md" + output_file.parent.mkdir(parents=True, exist_ok=True) + output_file.write_text(frontmatter.dumps(post)) + + +@app.command(name="rss") +def transcribe_from_rss( + rss_url: typing_extensions.Annotated[ + str, + typer.Option("--url", help="RSS feed URL"), + ] = "https://media.rss.com/vector-podcast/feed.xml", + episode_numbers: typing.Optional[typing.List[int]] = None, + latest: typing_extensions.Annotated[ + bool, typer.Option("-l", "--latest", help="Transcribe latest episode only") + ] = False, + all: typing_extensions.Annotated[ + bool, typer.Option("-a", "--all", help="Transcribe all episodes") + ] = False, + _range: typing_extensions.Annotated[ + typing.Optional[str], + typer.Option( + "-r", "--range", help="two numbers separated by a dash ('-'). 
Example: 1-10" + ), + ] = None, + show: typing_extensions.Annotated[ + typing.Optional[str], + typer.Option("--show", help="Show name to use as subdirectory in transcripts/ folder"), + ] = None, + skip_if_exists: typing_extensions.Annotated[ + bool, + typer.Option("--skip-if-exists", help="Skip episodes for which transcriptions already exist"), + ] = False, +): + """ + Transcribe episodes from an RSS feed (e.g., Vector Podcast) + + Metadata is pulled from the RSS feed. + Audio is downloaded from the enclosure URL in the feed. + """ + + if latest: + episode = get_latest_episode(rss_url) + if not episode: + typer.echo("No episodes found in RSS feed", err=True) + raise typer.Exit(1) + + metadata, audio_url = get_audio_url_from_rss_episode(episode) + output_file = get_output_file_path(metadata, show) + + if skip_if_exists and output_file.exists(): + typer.echo(f"Skipping latest episode: {output_file} already exists") + return + + transcription = transcribe_from_audio_url(audio_url) + post = frontmatter.Post( + "\n".join(splitter.split_text(transcription)), **metadata + ) + output_file.parent.mkdir(parents=True, exist_ok=True) + output_file.write_text(frontmatter.dumps(post)) + return + + if all: + if Confirm.ask( + "This process will download and transcribe all episodes. 
Continue?", + show_choices=True, + default=False, + ): + episodes = parse_rss_feed(rss_url) + for episode in track(episodes): + metadata, audio_url = get_audio_url_from_rss_episode(episode) + output_file = get_output_file_path(metadata, show) + + if skip_if_exists and output_file.exists(): + typer.echo(f"Skipping episode: {output_file} already exists") + continue + + transcription = transcribe_from_audio_url(audio_url) + post = frontmatter.Post( + "\n".join(splitter.split_text(transcription)), **metadata + ) + output_file.parent.mkdir(parents=True, exist_ok=True) + output_file.write_text(frontmatter.dumps(post)) + return + + if _range: + start, stop = _range.split("-") + episode_numbers = list(range(int(start), int(stop) + 1)) + + if not episode_numbers: + # Default to latest episode + episode = get_latest_episode(rss_url) + if not episode: + typer.echo("No episodes found in RSS feed", err=True) + raise typer.Exit(1) + episodes_to_process = [episode] + else: + episodes_to_process = [] + for ep_num in episode_numbers: + episode = get_episode_by_number(rss_url, ep_num) + if episode: + episodes_to_process.append(episode) + else: + typer.echo(f"Episode {ep_num} not found in RSS feed", err=True) + + for episode in track(episodes_to_process): + metadata, audio_url = get_audio_url_from_rss_episode(episode) + output_file = get_output_file_path(metadata, show) + + if skip_if_exists and output_file.exists(): + typer.echo(f"Skipping episode: {output_file} already exists") + continue + + transcription = transcribe_from_audio_url(audio_url) + post = frontmatter.Post( + "\n".join(splitter.split_text(transcription)), **metadata ) + output_file.parent.mkdir(parents=True, exist_ok=True) output_file.write_text(frontmatter.dumps(post)) diff --git a/test_rss_feasibility.py b/test_rss_feasibility.py new file mode 100644 index 0000000..77d3cea --- /dev/null +++ b/test_rss_feasibility.py @@ -0,0 +1,108 @@ +#!/usr/bin/env python3 +""" +Test script to verify RSS feed parsing and audio 
download feasibility. +This script tests the core functionality without requiring full transcription. +""" + +import sys +import pathlib + +# Add src to path +sys.path.insert(0, str(pathlib.Path(__file__).parent / "src")) + +try: + from rss_parser import parse_rss_feed, get_latest_episode + from transcribe import download_audio_file + print("✅ All imports successful") +except ImportError as e: + print(f"❌ Import error: {e}") + print("Please install required dependencies:") + print(" pip install httpx typer") + sys.exit(1) + +def test_rss_parsing(): + """Test RSS feed parsing""" + print("\n📡 Testing RSS feed parsing...") + rss_url = "https://media.rss.com/vector-podcast/feed.xml" + + try: + episodes = parse_rss_feed(rss_url) + print(f"✅ Successfully parsed RSS feed") + print(f" Found {len(episodes)} episodes") + + if episodes: + latest = episodes[0] + print(f"\n📝 Latest episode:") + print(f" Title: {latest.get('title', 'N/A')}") + print(f" Audio URL: {latest.get('audio_url', 'N/A')[:80]}...") + print(f" Episode #: {latest.get('episode_number', 'N/A')}") + return latest.get('audio_url') + else: + print("⚠️ No episodes found in feed") + return None + except Exception as e: + print(f"❌ Error parsing RSS feed: {e}") + import traceback + traceback.print_exc() + return None + +def test_audio_download(audio_url): + """Test audio file download""" + if not audio_url: + print("\n⏭️ Skipping audio download test (no URL)") + return False + + print(f"\n⬇️ Testing audio download...") + print(f" URL: {audio_url[:80]}...") + + try: + # Just test the download, not full transcription + import tempfile + import httpx + + with tempfile.NamedTemporaryFile(mode="+wb", suffix=".mp3", delete=True) as f: + print(f" Downloading to temporary file...") + with httpx.stream("GET", audio_url, follow_redirects=True, timeout=30.0) as response: + response.raise_for_status() + size = 0 + for chunk in response.iter_bytes(): + f.write(chunk) + size += len(chunk) + if size > 1024 * 1024: # Stop after 
1MB for testing + break + print(f"✅ Successfully downloaded {size / 1024:.1f} KB (test)") + return True + except Exception as e: + print(f"❌ Error downloading audio: {e}") + import traceback + traceback.print_exc() + return False + +def main(): + print("=" * 60) + print("Vector Podcast RSS Feed Transcription - Feasibility Test") + print("=" * 60) + + audio_url = test_rss_parsing() + download_success = test_audio_download(audio_url) + + print("\n" + "=" * 60) + print("FEASIBILITY ASSESSMENT:") + print("=" * 60) + + if audio_url and download_success: + print("✅ FEASIBLE - All core components work!") + print("\nNext steps:") + print(" 1. Install Whisper: pip install openai-whisper") + print(" 2. Run transcription:") + print(" python src/transcribe.py rss --latest") + print(" python src/transcribe.py rss 1 2 3") + print(" python src/transcribe.py rss --all") + else: + print("⚠️ Some issues detected. Check errors above.") + print("=" * 60) + +if __name__ == "__main__": + main() + + diff --git a/transcripts/1-our-systems-the-unicorn-silk-sonic-methods.md b/transcripts/conduit_podcast/1-our-systems-the-unicorn-silk-sonic-methods.md similarity index 100% rename from transcripts/1-our-systems-the-unicorn-silk-sonic-methods.md rename to transcripts/conduit_podcast/1-our-systems-the-unicorn-silk-sonic-methods.md diff --git a/transcripts/10-through-the-hurt-with-scotty-jackson.md b/transcripts/conduit_podcast/10-through-the-hurt-with-scotty-jackson.md similarity index 100% rename from transcripts/10-through-the-hurt-with-scotty-jackson.md rename to transcripts/conduit_podcast/10-through-the-hurt-with-scotty-jackson.md diff --git a/transcripts/100-it-s-episode-100.md b/transcripts/conduit_podcast/100-it-s-episode-100.md similarity index 100% rename from transcripts/100-it-s-episode-100.md rename to transcripts/conduit_podcast/100-it-s-episode-100.md diff --git a/transcripts/101-systems-check-spring-2025.md b/transcripts/conduit_podcast/101-systems-check-spring-2025.md 
similarity index 100% rename from transcripts/101-systems-check-spring-2025.md rename to transcripts/conduit_podcast/101-systems-check-spring-2025.md diff --git a/transcripts/102-planning-doing-things-for-me-we.md b/transcripts/conduit_podcast/102-planning-doing-things-for-me-we.md similarity index 100% rename from transcripts/102-planning-doing-things-for-me-we.md rename to transcripts/conduit_podcast/102-planning-doing-things-for-me-we.md diff --git a/transcripts/103-the-privilege-of-doing-this.md b/transcripts/conduit_podcast/103-the-privilege-of-doing-this.md similarity index 100% rename from transcripts/103-the-privilege-of-doing-this.md rename to transcripts/conduit_podcast/103-the-privilege-of-doing-this.md diff --git a/transcripts/104-checking-in-on-checking-out.md b/transcripts/conduit_podcast/104-checking-in-on-checking-out.md similarity index 100% rename from transcripts/104-checking-in-on-checking-out.md rename to transcripts/conduit_podcast/104-checking-in-on-checking-out.md diff --git a/transcripts/11-huuuuuuuge-opportunities-for-you.md b/transcripts/conduit_podcast/11-huuuuuuuge-opportunities-for-you.md similarity index 100% rename from transcripts/11-huuuuuuuge-opportunities-for-you.md rename to transcripts/conduit_podcast/11-huuuuuuuge-opportunities-for-you.md diff --git a/transcripts/12-con-doing-it-live.md b/transcripts/conduit_podcast/12-con-doing-it-live.md similarity index 100% rename from transcripts/12-con-doing-it-live.md rename to transcripts/conduit_podcast/12-con-doing-it-live.md diff --git a/transcripts/13-happiness-first-productivity-second.md b/transcripts/conduit_podcast/13-happiness-first-productivity-second.md similarity index 100% rename from transcripts/13-happiness-first-productivity-second.md rename to transcripts/conduit_podcast/13-happiness-first-productivity-second.md diff --git a/transcripts/14-giving-ourselves-the-appropriate-amount-of-expectations.md 
b/transcripts/conduit_podcast/14-giving-ourselves-the-appropriate-amount-of-expectations.md similarity index 100% rename from transcripts/14-giving-ourselves-the-appropriate-amount-of-expectations.md rename to transcripts/conduit_podcast/14-giving-ourselves-the-appropriate-amount-of-expectations.md diff --git a/transcripts/15-jay-has-some-big-decisions.md b/transcripts/conduit_podcast/15-jay-has-some-big-decisions.md similarity index 100% rename from transcripts/15-jay-has-some-big-decisions.md rename to transcripts/conduit_podcast/15-jay-has-some-big-decisions.md diff --git a/transcripts/16-a-good-friend-tells-you-what-you-already-knew.md b/transcripts/conduit_podcast/16-a-good-friend-tells-you-what-you-already-knew.md similarity index 100% rename from transcripts/16-a-good-friend-tells-you-what-you-already-knew.md rename to transcripts/conduit_podcast/16-a-good-friend-tells-you-what-you-already-knew.md diff --git a/transcripts/17-space-and-related-subjects-for-your-productivity.md b/transcripts/conduit_podcast/17-space-and-related-subjects-for-your-productivity.md similarity index 100% rename from transcripts/17-space-and-related-subjects-for-your-productivity.md rename to transcripts/conduit_podcast/17-space-and-related-subjects-for-your-productivity.md diff --git a/transcripts/18-putting-external-factors-in-a-vice-grip.md b/transcripts/conduit_podcast/18-putting-external-factors-in-a-vice-grip.md similarity index 100% rename from transcripts/18-putting-external-factors-in-a-vice-grip.md rename to transcripts/conduit_podcast/18-putting-external-factors-in-a-vice-grip.md diff --git a/transcripts/19-eating-the-devils-spaghetti-combating-imposter-syndrome.md b/transcripts/conduit_podcast/19-eating-the-devils-spaghetti-combating-imposter-syndrome.md similarity index 100% rename from transcripts/19-eating-the-devils-spaghetti-combating-imposter-syndrome.md rename to transcripts/conduit_podcast/19-eating-the-devils-spaghetti-combating-imposter-syndrome.md diff --git 
a/transcripts/2-when-you-cant-even-or-odd.md b/transcripts/conduit_podcast/2-when-you-cant-even-or-odd.md similarity index 100% rename from transcripts/2-when-you-cant-even-or-odd.md rename to transcripts/conduit_podcast/2-when-you-cant-even-or-odd.md diff --git a/transcripts/20-living-in-a-world-where-and-exists.md b/transcripts/conduit_podcast/20-living-in-a-world-where-and-exists.md similarity index 100% rename from transcripts/20-living-in-a-world-where-and-exists.md rename to transcripts/conduit_podcast/20-living-in-a-world-where-and-exists.md diff --git a/transcripts/21-plan-to-be-flexible.md b/transcripts/conduit_podcast/21-plan-to-be-flexible.md similarity index 100% rename from transcripts/21-plan-to-be-flexible.md rename to transcripts/conduit_podcast/21-plan-to-be-flexible.md diff --git a/transcripts/22-you-con-clean-things-up.md b/transcripts/conduit_podcast/22-you-con-clean-things-up.md similarity index 100% rename from transcripts/22-you-con-clean-things-up.md rename to transcripts/conduit_podcast/22-you-con-clean-things-up.md diff --git a/transcripts/23-making-do-with-what-we-have.md b/transcripts/conduit_podcast/23-making-do-with-what-we-have.md similarity index 100% rename from transcripts/23-making-do-with-what-we-have.md rename to transcripts/conduit_podcast/23-making-do-with-what-we-have.md diff --git a/transcripts/24-tiny-annoying-tasks.md b/transcripts/conduit_podcast/24-tiny-annoying-tasks.md similarity index 100% rename from transcripts/24-tiny-annoying-tasks.md rename to transcripts/conduit_podcast/24-tiny-annoying-tasks.md diff --git a/transcripts/25-caring-to-care.md b/transcripts/conduit_podcast/25-caring-to-care.md similarity index 100% rename from transcripts/25-caring-to-care.md rename to transcripts/conduit_podcast/25-caring-to-care.md diff --git a/transcripts/26-the-content-dragon-appears.md b/transcripts/conduit_podcast/26-the-content-dragon-appears.md similarity index 100% rename from transcripts/26-the-content-dragon-appears.md 
rename to transcripts/conduit_podcast/26-the-content-dragon-appears.md diff --git a/transcripts/27-these-are-the-spoons-you-have.md b/transcripts/conduit_podcast/27-these-are-the-spoons-you-have.md similarity index 100% rename from transcripts/27-these-are-the-spoons-you-have.md rename to transcripts/conduit_podcast/27-these-are-the-spoons-you-have.md diff --git a/transcripts/28-when-failure-happens.md b/transcripts/conduit_podcast/28-when-failure-happens.md similarity index 100% rename from transcripts/28-when-failure-happens.md rename to transcripts/conduit_podcast/28-when-failure-happens.md diff --git a/transcripts/29-systems-check-summer-2022.md b/transcripts/conduit_podcast/29-systems-check-summer-2022.md similarity index 100% rename from transcripts/29-systems-check-summer-2022.md rename to transcripts/conduit_podcast/29-systems-check-summer-2022.md diff --git a/transcripts/3-getting-away-now.md b/transcripts/conduit_podcast/3-getting-away-now.md similarity index 100% rename from transcripts/3-getting-away-now.md rename to transcripts/conduit_podcast/3-getting-away-now.md diff --git a/transcripts/30-facing-the-reality-of-being-a-face.md b/transcripts/conduit_podcast/30-facing-the-reality-of-being-a-face.md similarity index 100% rename from transcripts/30-facing-the-reality-of-being-a-face.md rename to transcripts/conduit_podcast/30-facing-the-reality-of-being-a-face.md diff --git a/transcripts/31-salt-pepper-garlic-fall.md b/transcripts/conduit_podcast/31-salt-pepper-garlic-fall.md similarity index 100% rename from transcripts/31-salt-pepper-garlic-fall.md rename to transcripts/conduit_podcast/31-salt-pepper-garlic-fall.md diff --git a/transcripts/32-first-of-all-is-it-fun-with-brad-dowdy.md b/transcripts/conduit_podcast/32-first-of-all-is-it-fun-with-brad-dowdy.md similarity index 100% rename from transcripts/32-first-of-all-is-it-fun-with-brad-dowdy.md rename to transcripts/conduit_podcast/32-first-of-all-is-it-fun-with-brad-dowdy.md diff --git 
a/transcripts/33.md b/transcripts/conduit_podcast/33.md similarity index 100% rename from transcripts/33.md rename to transcripts/conduit_podcast/33.md diff --git a/transcripts/34-debug-my-nervous-system.md b/transcripts/conduit_podcast/34-debug-my-nervous-system.md similarity index 100% rename from transcripts/34-debug-my-nervous-system.md rename to transcripts/conduit_podcast/34-debug-my-nervous-system.md diff --git a/transcripts/35-serious-unicorning-around.md b/transcripts/conduit_podcast/35-serious-unicorning-around.md similarity index 100% rename from transcripts/35-serious-unicorning-around.md rename to transcripts/conduit_podcast/35-serious-unicorning-around.md diff --git a/transcripts/36-brain-bags.md b/transcripts/conduit_podcast/36-brain-bags.md similarity index 100% rename from transcripts/36-brain-bags.md rename to transcripts/conduit_podcast/36-brain-bags.md diff --git a/transcripts/37-systems-check-fall-2022.md b/transcripts/conduit_podcast/37-systems-check-fall-2022.md similarity index 100% rename from transcripts/37-systems-check-fall-2022.md rename to transcripts/conduit_podcast/37-systems-check-fall-2022.md diff --git a/transcripts/38-scrape-scrap-and-survive.md b/transcripts/conduit_podcast/38-scrape-scrap-and-survive.md similarity index 100% rename from transcripts/38-scrape-scrap-and-survive.md rename to transcripts/conduit_podcast/38-scrape-scrap-and-survive.md diff --git a/transcripts/39-self-care-isn-t-selfish.md b/transcripts/conduit_podcast/39-self-care-isn-t-selfish.md similarity index 100% rename from transcripts/39-self-care-isn-t-selfish.md rename to transcripts/conduit_podcast/39-self-care-isn-t-selfish.md diff --git a/transcripts/4-big-and-little-wind-downs.md b/transcripts/conduit_podcast/4-big-and-little-wind-downs.md similarity index 100% rename from transcripts/4-big-and-little-wind-downs.md rename to transcripts/conduit_podcast/4-big-and-little-wind-downs.md diff --git a/transcripts/40-throwing-off-the-emperor-s-groove.md 
b/transcripts/conduit_podcast/40-throwing-off-the-emperor-s-groove.md similarity index 100% rename from transcripts/40-throwing-off-the-emperor-s-groove.md rename to transcripts/conduit_podcast/40-throwing-off-the-emperor-s-groove.md diff --git a/transcripts/41-accountability-doing-it-right.md b/transcripts/conduit_podcast/41-accountability-doing-it-right.md similarity index 100% rename from transcripts/41-accountability-doing-it-right.md rename to transcripts/conduit_podcast/41-accountability-doing-it-right.md diff --git a/transcripts/42-we-asked-a-lot-of-questions.md b/transcripts/conduit_podcast/42-we-asked-a-lot-of-questions.md similarity index 100% rename from transcripts/42-we-asked-a-lot-of-questions.md rename to transcripts/conduit_podcast/42-we-asked-a-lot-of-questions.md diff --git a/transcripts/43-systems-check-chugga-chugga-to-do.md b/transcripts/conduit_podcast/43-systems-check-chugga-chugga-to-do.md similarity index 100% rename from transcripts/43-systems-check-chugga-chugga-to-do.md rename to transcripts/conduit_podcast/43-systems-check-chugga-chugga-to-do.md diff --git a/transcripts/44-taking-care-of-that-jar-in-the-preseason.md b/transcripts/conduit_podcast/44-taking-care-of-that-jar-in-the-preseason.md similarity index 100% rename from transcripts/44-taking-care-of-that-jar-in-the-preseason.md rename to transcripts/conduit_podcast/44-taking-care-of-that-jar-in-the-preseason.md diff --git a/transcripts/45-adjusting-to-adjustments.md b/transcripts/conduit_podcast/45-adjusting-to-adjustments.md similarity index 100% rename from transcripts/45-adjusting-to-adjustments.md rename to transcripts/conduit_podcast/45-adjusting-to-adjustments.md diff --git a/transcripts/46-feed-the-ducks.md b/transcripts/conduit_podcast/46-feed-the-ducks.md similarity index 100% rename from transcripts/46-feed-the-ducks.md rename to transcripts/conduit_podcast/46-feed-the-ducks.md diff --git a/transcripts/47-the-secret-soup-packing-sauce.md 
b/transcripts/conduit_podcast/47-the-secret-soup-packing-sauce.md similarity index 100% rename from transcripts/47-the-secret-soup-packing-sauce.md rename to transcripts/conduit_podcast/47-the-secret-soup-packing-sauce.md diff --git a/transcripts/48-long-projects-remove-the-concept-of-time.md b/transcripts/conduit_podcast/48-long-projects-remove-the-concept-of-time.md similarity index 100% rename from transcripts/48-long-projects-remove-the-concept-of-time.md rename to transcripts/conduit_podcast/48-long-projects-remove-the-concept-of-time.md diff --git a/transcripts/49-systems-check-spring-2023-unicorns-and-thundercats.md b/transcripts/conduit_podcast/49-systems-check-spring-2023-unicorns-and-thundercats.md similarity index 100% rename from transcripts/49-systems-check-spring-2023-unicorns-and-thundercats.md rename to transcripts/conduit_podcast/49-systems-check-spring-2023-unicorns-and-thundercats.md diff --git a/transcripts/5-sustained-progress-over-being-overwhelmed.md b/transcripts/conduit_podcast/5-sustained-progress-over-being-overwhelmed.md similarity index 100% rename from transcripts/5-sustained-progress-over-being-overwhelmed.md rename to transcripts/conduit_podcast/5-sustained-progress-over-being-overwhelmed.md diff --git a/transcripts/50-friendship-hipaa.md b/transcripts/conduit_podcast/50-friendship-hipaa.md similarity index 100% rename from transcripts/50-friendship-hipaa.md rename to transcripts/conduit_podcast/50-friendship-hipaa.md diff --git a/transcripts/51-touching-squishy-brains.md b/transcripts/conduit_podcast/51-touching-squishy-brains.md similarity index 100% rename from transcripts/51-touching-squishy-brains.md rename to transcripts/conduit_podcast/51-touching-squishy-brains.md diff --git a/transcripts/52-notes-that-right-there-thats-a-good-thing.md b/transcripts/conduit_podcast/52-notes-that-right-there-thats-a-good-thing.md similarity index 100% rename from transcripts/52-notes-that-right-there-thats-a-good-thing.md rename to 
transcripts/conduit_podcast/52-notes-that-right-there-thats-a-good-thing.md diff --git a/transcripts/53-it-s-a-mess-we-re-a-mess-it-s-messy.md b/transcripts/conduit_podcast/53-it-s-a-mess-we-re-a-mess-it-s-messy.md similarity index 100% rename from transcripts/53-it-s-a-mess-we-re-a-mess-it-s-messy.md rename to transcripts/conduit_podcast/53-it-s-a-mess-we-re-a-mess-it-s-messy.md diff --git a/transcripts/54-the-perfect-productivity-show.md b/transcripts/conduit_podcast/54-the-perfect-productivity-show.md similarity index 100% rename from transcripts/54-the-perfect-productivity-show.md rename to transcripts/conduit_podcast/54-the-perfect-productivity-show.md diff --git a/transcripts/55-expectations-of-being-in-a-new-space.md b/transcripts/conduit_podcast/55-expectations-of-being-in-a-new-space.md similarity index 100% rename from transcripts/55-expectations-of-being-in-a-new-space.md rename to transcripts/conduit_podcast/55-expectations-of-being-in-a-new-space.md diff --git a/transcripts/56-systems-check-summer-2023-sidetracked-with-doing-stuff.md b/transcripts/conduit_podcast/56-systems-check-summer-2023-sidetracked-with-doing-stuff.md similarity index 100% rename from transcripts/56-systems-check-summer-2023-sidetracked-with-doing-stuff.md rename to transcripts/conduit_podcast/56-systems-check-summer-2023-sidetracked-with-doing-stuff.md diff --git a/transcripts/57-i-need-help-to-get-the-help.md b/transcripts/conduit_podcast/57-i-need-help-to-get-the-help.md similarity index 100% rename from transcripts/57-i-need-help-to-get-the-help.md rename to transcripts/conduit_podcast/57-i-need-help-to-get-the-help.md diff --git a/transcripts/58-finding-your-rhythm.md b/transcripts/conduit_podcast/58-finding-your-rhythm.md similarity index 100% rename from transcripts/58-finding-your-rhythm.md rename to transcripts/conduit_podcast/58-finding-your-rhythm.md diff --git a/transcripts/59-everything-is-broken.md b/transcripts/conduit_podcast/59-everything-is-broken.md similarity 
index 100% rename from transcripts/59-everything-is-broken.md rename to transcripts/conduit_podcast/59-everything-is-broken.md diff --git a/transcripts/6-getting-back-into-the-swing-of-things.md b/transcripts/conduit_podcast/6-getting-back-into-the-swing-of-things.md similarity index 100% rename from transcripts/6-getting-back-into-the-swing-of-things.md rename to transcripts/conduit_podcast/6-getting-back-into-the-swing-of-things.md diff --git a/transcripts/60-not-juggling-its-balancing.md b/transcripts/conduit_podcast/60-not-juggling-its-balancing.md similarity index 100% rename from transcripts/60-not-juggling-its-balancing.md rename to transcripts/conduit_podcast/60-not-juggling-its-balancing.md diff --git a/transcripts/61-the-conduit-burnout-candle.md b/transcripts/conduit_podcast/61-the-conduit-burnout-candle.md similarity index 100% rename from transcripts/61-the-conduit-burnout-candle.md rename to transcripts/conduit_podcast/61-the-conduit-burnout-candle.md diff --git a/transcripts/62-systems-check-fall-2023.md b/transcripts/conduit_podcast/62-systems-check-fall-2023.md similarity index 100% rename from transcripts/62-systems-check-fall-2023.md rename to transcripts/conduit_podcast/62-systems-check-fall-2023.md diff --git a/transcripts/63-tiny-audits-for-a-mini-kondo-benefit.md b/transcripts/conduit_podcast/63-tiny-audits-for-a-mini-kondo-benefit.md similarity index 100% rename from transcripts/63-tiny-audits-for-a-mini-kondo-benefit.md rename to transcripts/conduit_podcast/63-tiny-audits-for-a-mini-kondo-benefit.md diff --git a/transcripts/64-creating-productivity-software-with-no-productivity-system.md b/transcripts/conduit_podcast/64-creating-productivity-software-with-no-productivity-system.md similarity index 100% rename from transcripts/64-creating-productivity-software-with-no-productivity-system.md rename to transcripts/conduit_podcast/64-creating-productivity-software-with-no-productivity-system.md diff --git 
a/transcripts/65-embracing-the-chaos-being-toxic-and-positive.md b/transcripts/conduit_podcast/65-embracing-the-chaos-being-toxic-and-positive.md similarity index 100% rename from transcripts/65-embracing-the-chaos-being-toxic-and-positive.md rename to transcripts/conduit_podcast/65-embracing-the-chaos-being-toxic-and-positive.md diff --git a/transcripts/66-i-wish-i-had-a-britnie.md b/transcripts/conduit_podcast/66-i-wish-i-had-a-britnie.md similarity index 100% rename from transcripts/66-i-wish-i-had-a-britnie.md rename to transcripts/conduit_podcast/66-i-wish-i-had-a-britnie.md diff --git a/transcripts/67-plan-but-be-ready-to-pivot.md b/transcripts/conduit_podcast/67-plan-but-be-ready-to-pivot.md similarity index 100% rename from transcripts/67-plan-but-be-ready-to-pivot.md rename to transcripts/conduit_podcast/67-plan-but-be-ready-to-pivot.md diff --git a/transcripts/68-well-we-made-a-mistake.md b/transcripts/conduit_podcast/68-well-we-made-a-mistake.md similarity index 100% rename from transcripts/68-well-we-made-a-mistake.md rename to transcripts/conduit_podcast/68-well-we-made-a-mistake.md diff --git a/transcripts/69-systems-check-winter-2024-systems-che-ck.md b/transcripts/conduit_podcast/69-systems-check-winter-2024-systems-che-ck.md similarity index 100% rename from transcripts/69-systems-check-winter-2024-systems-che-ck.md rename to transcripts/conduit_podcast/69-systems-check-winter-2024-systems-che-ck.md diff --git a/transcripts/7-see-what-were-not-gon-do-saying-no-to-things.md b/transcripts/conduit_podcast/7-see-what-were-not-gon-do-saying-no-to-things.md similarity index 100% rename from transcripts/7-see-what-were-not-gon-do-saying-no-to-things.md rename to transcripts/conduit_podcast/7-see-what-were-not-gon-do-saying-no-to-things.md diff --git a/transcripts/70-gimme-f-g-zen.md b/transcripts/conduit_podcast/70-gimme-f-g-zen.md similarity index 100% rename from transcripts/70-gimme-f-g-zen.md rename to transcripts/conduit_podcast/70-gimme-f-g-zen.md 
diff --git a/transcripts/71-this-is-good-for-you-you-need-to-stop.md b/transcripts/conduit_podcast/71-this-is-good-for-you-you-need-to-stop.md similarity index 100% rename from transcripts/71-this-is-good-for-you-you-need-to-stop.md rename to transcripts/conduit_podcast/71-this-is-good-for-you-you-need-to-stop.md diff --git a/transcripts/72-back-that-thing-up.md b/transcripts/conduit_podcast/72-back-that-thing-up.md similarity index 100% rename from transcripts/72-back-that-thing-up.md rename to transcripts/conduit_podcast/72-back-that-thing-up.md diff --git a/transcripts/73-you-may-feel-some-slight-discomfort.md b/transcripts/conduit_podcast/73-you-may-feel-some-slight-discomfort.md similarity index 100% rename from transcripts/73-you-may-feel-some-slight-discomfort.md rename to transcripts/conduit_podcast/73-you-may-feel-some-slight-discomfort.md diff --git a/transcripts/74-knucklebones-big-risk-big-rewards.md b/transcripts/conduit_podcast/74-knucklebones-big-risk-big-rewards.md similarity index 100% rename from transcripts/74-knucklebones-big-risk-big-rewards.md rename to transcripts/conduit_podcast/74-knucklebones-big-risk-big-rewards.md diff --git a/transcripts/75-ideates-vs-executioners.md b/transcripts/conduit_podcast/75-ideates-vs-executioners.md similarity index 100% rename from transcripts/75-ideates-vs-executioners.md rename to transcripts/conduit_podcast/75-ideates-vs-executioners.md diff --git a/transcripts/76-systems-check-spring-2024.md b/transcripts/conduit_podcast/76-systems-check-spring-2024.md similarity index 100% rename from transcripts/76-systems-check-spring-2024.md rename to transcripts/conduit_podcast/76-systems-check-spring-2024.md diff --git a/transcripts/77-let-me-cook.md b/transcripts/conduit_podcast/77-let-me-cook.md similarity index 100% rename from transcripts/77-let-me-cook.md rename to transcripts/conduit_podcast/77-let-me-cook.md diff --git a/transcripts/78-big-bundles-of-identical-tasks.md 
b/transcripts/conduit_podcast/78-big-bundles-of-identical-tasks.md similarity index 100% rename from transcripts/78-big-bundles-of-identical-tasks.md rename to transcripts/conduit_podcast/78-big-bundles-of-identical-tasks.md diff --git a/transcripts/79-over-many-night-s-success.md b/transcripts/conduit_podcast/79-over-many-night-s-success.md similarity index 100% rename from transcripts/79-over-many-night-s-success.md rename to transcripts/conduit_podcast/79-over-many-night-s-success.md diff --git a/transcripts/8-fear-of-success-because-if-i-succeed.md b/transcripts/conduit_podcast/8-fear-of-success-because-if-i-succeed.md similarity index 100% rename from transcripts/8-fear-of-success-because-if-i-succeed.md rename to transcripts/conduit_podcast/8-fear-of-success-because-if-i-succeed.md diff --git a/transcripts/80-jay-didnt-have-a-connection.md b/transcripts/conduit_podcast/80-jay-didnt-have-a-connection.md similarity index 100% rename from transcripts/80-jay-didnt-have-a-connection.md rename to transcripts/conduit_podcast/80-jay-didnt-have-a-connection.md diff --git a/transcripts/81-bretts-mental-health-and-tech-corner.md b/transcripts/conduit_podcast/81-bretts-mental-health-and-tech-corner.md similarity index 100% rename from transcripts/81-bretts-mental-health-and-tech-corner.md rename to transcripts/conduit_podcast/81-bretts-mental-health-and-tech-corner.md diff --git a/transcripts/82-systems-check-summer-2024.md b/transcripts/conduit_podcast/82-systems-check-summer-2024.md similarity index 100% rename from transcripts/82-systems-check-summer-2024.md rename to transcripts/conduit_podcast/82-systems-check-summer-2024.md diff --git a/transcripts/83-the-tire-method-revisited.md b/transcripts/conduit_podcast/83-the-tire-method-revisited.md similarity index 100% rename from transcripts/83-the-tire-method-revisited.md rename to transcripts/conduit_podcast/83-the-tire-method-revisited.md diff --git a/transcripts/84-no-expectations-the-spiderman-story.md 
b/transcripts/conduit_podcast/84-no-expectations-the-spiderman-story.md similarity index 100% rename from transcripts/84-no-expectations-the-spiderman-story.md rename to transcripts/conduit_podcast/84-no-expectations-the-spiderman-story.md diff --git a/transcripts/85-embracing-boring.md b/transcripts/conduit_podcast/85-embracing-boring.md similarity index 100% rename from transcripts/85-embracing-boring.md rename to transcripts/conduit_podcast/85-embracing-boring.md diff --git a/transcripts/86-just-let-it-unfold.md b/transcripts/conduit_podcast/86-just-let-it-unfold.md similarity index 100% rename from transcripts/86-just-let-it-unfold.md rename to transcripts/conduit_podcast/86-just-let-it-unfold.md diff --git a/transcripts/87-burned-all-the-way-out.md b/transcripts/conduit_podcast/87-burned-all-the-way-out.md similarity index 100% rename from transcripts/87-burned-all-the-way-out.md rename to transcripts/conduit_podcast/87-burned-all-the-way-out.md diff --git a/transcripts/88-more-work-thats-good-i-think.md b/transcripts/conduit_podcast/88-more-work-thats-good-i-think.md similarity index 100% rename from transcripts/88-more-work-thats-good-i-think.md rename to transcripts/conduit_podcast/88-more-work-thats-good-i-think.md diff --git a/transcripts/89-end-of-the-year-systems-check.md b/transcripts/conduit_podcast/89-end-of-the-year-systems-check.md similarity index 100% rename from transcripts/89-end-of-the-year-systems-check.md rename to transcripts/conduit_podcast/89-end-of-the-year-systems-check.md diff --git a/transcripts/9-decision-space-the-tire-technique.md b/transcripts/conduit_podcast/9-decision-space-the-tire-technique.md similarity index 100% rename from transcripts/9-decision-space-the-tire-technique.md rename to transcripts/conduit_podcast/9-decision-space-the-tire-technique.md diff --git a/transcripts/90-big-theme-guy-with-stephen-hackett.md b/transcripts/conduit_podcast/90-big-theme-guy-with-stephen-hackett.md similarity index 100% rename from 
transcripts/90-big-theme-guy-with-stephen-hackett.md rename to transcripts/conduit_podcast/90-big-theme-guy-with-stephen-hackett.md diff --git a/transcripts/91-robb-knight-made-this-for-himself-and-maybe-you-too.md b/transcripts/conduit_podcast/91-robb-knight-made-this-for-himself-and-maybe-you-too.md similarity index 100% rename from transcripts/91-robb-knight-made-this-for-himself-and-maybe-you-too.md rename to transcripts/conduit_podcast/91-robb-knight-made-this-for-himself-and-maybe-you-too.md diff --git a/transcripts/92-finding-the-joy.md b/transcripts/conduit_podcast/92-finding-the-joy.md similarity index 100% rename from transcripts/92-finding-the-joy.md rename to transcripts/conduit_podcast/92-finding-the-joy.md diff --git a/transcripts/93-burning-the-candle-in-the-name-of-joy.md b/transcripts/conduit_podcast/93-burning-the-candle-in-the-name-of-joy.md similarity index 100% rename from transcripts/93-burning-the-candle-in-the-name-of-joy.md rename to transcripts/conduit_podcast/93-burning-the-candle-in-the-name-of-joy.md diff --git a/transcripts/94-supervised-learning.md b/transcripts/conduit_podcast/94-supervised-learning.md similarity index 100% rename from transcripts/94-supervised-learning.md rename to transcripts/conduit_podcast/94-supervised-learning.md diff --git a/transcripts/95-communicating-about-communicating.md b/transcripts/conduit_podcast/95-communicating-about-communicating.md similarity index 100% rename from transcripts/95-communicating-about-communicating.md rename to transcripts/conduit_podcast/95-communicating-about-communicating.md diff --git a/transcripts/96-body-grief.md b/transcripts/conduit_podcast/96-body-grief.md similarity index 100% rename from transcripts/96-body-grief.md rename to transcripts/conduit_podcast/96-body-grief.md diff --git a/transcripts/97-reducing-load-with-the-unicorns-busiest-internets.md b/transcripts/conduit_podcast/97-reducing-load-with-the-unicorns-busiest-internets.md similarity index 100% rename from 
transcripts/97-reducing-load-with-the-unicorns-busiest-internets.md rename to transcripts/conduit_podcast/97-reducing-load-with-the-unicorns-busiest-internets.md diff --git a/transcripts/98-presentation-advice-what-is-even-happening.md b/transcripts/conduit_podcast/98-presentation-advice-what-is-even-happening.md similarity index 100% rename from transcripts/98-presentation-advice-what-is-even-happening.md rename to transcripts/conduit_podcast/98-presentation-advice-what-is-even-happening.md diff --git a/transcripts/99-expecting-expectations.md b/transcripts/conduit_podcast/99-expecting-expectations.md similarity index 100% rename from transcripts/99-expecting-expectations.md rename to transcripts/conduit_podcast/99-expecting-expectations.md diff --git a/transcripts/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md b/transcripts/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md new file mode 100644 index 0000000..4dbb65c --- /dev/null +++ b/transcripts/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md @@ -0,0 +1,109 @@ +--- +description: '

Vector Podcast website: https://vectorpodcast.com

Haystack US 2025: https://haystackconf.com/2025/

Federated search, Keyword & Neural Search, ML Optimisation, Pros and Cons of Hybrid search

It is fascinating and funny how things develop, but also turn around. In 2022-23 everyone was buzzing about hybrid search. In 2024 the conversation shifted to RAG, RAG, RAG. And now we are in 2025 and back to hybrid search - on a different level: finally there are strides and contributions towards making hybrid search parameters learnt with ML. How cool is that?

Design: Saurabh Rai, https://www.linkedin.com/in/srbhr/

The design of this episode is inspired by a scene in Blade Runner 2049. There''s a clear path leading towards where people want to go to, yet they''re searching for something.

00:00 Intro

00:54 Eric''s intro and Daniel''s background

02:50 Importance of Hybrid search: Daniel''s take

07:26 Eric''s take

10:57 Dmitry''s take

11:41 Eric''s predictions

13:47 Doug''s blog on RRF is not enough

16:18 How to not fall short of the blind picking in RRF: score normalization, combinations and weights

25:03 The role of query understanding: feature groups

35:11 Lesson 1 from Daniel: Simple models might be all you need

36:30 Lesson 2: query features might be all you need

38:30 Reasoning capabilities in search

40:02 Question from Eric: how is this different from Learning To Rank?

42:46 Carrying the past in Learning To Rank / any rank

44:21 Demo!

51:52 How to consume this in OpenSearch

55:15 What''s next

58:44 Haystack US 2025

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20250321_110308_985bc30944ce48882d237ba24dea55a4.png
pub_date: Fri, 21 Mar 2025 11:33:23 GMT
title: 'Adding ML layer to Search: Hybrid Search Optimizer with Daniel Wrigley and Eric Pugh'
url: https://rss.com/podcasts/vector-podcast/1951801
---

Hello there, Vector Podcast is back. Same season 3 - I think we are about to wrap it up with a few final, really interesting episodes.

Here I have the privilege to talk to the OpenSource Connections crew: Eric Pugh, whom you have seen in one of the previous episodes, and, you guessed it, Daniel Wrigley, joining us to discuss a really interesting topic on hybrid search and optimization. Really, really excited to have you both on the show. Hello.

Awesome, awesome. So, as a tradition, we start with the intros. Eric, everyone knows, but Eric, feel free to introduce yourself. I mean, great to be back, Dmitry.

I'm actually a little late getting here, because as I was driving to the office I realized that I forgot my mug that you gave me the other year. So I actually called Daniel, like, I'm going to be a little late, because I've got to go home and pick up the mug and bring it into the office.

My wife keeps it and we use it when we go hiking, but I was like, I'm going to bring it into the office and show it off, since this is my second podcast to do with you and the mug that you gave me two years ago - three years ago at this point. Yeah, probably three years. Works great, works great.

So yeah, super excited to be back here and, you know, kind of talk about some of the work that we've been doing with the OpenSearch community. So exciting. Yes. And Daniel, welcome. Can you say a few words about yourself, your background? Absolutely, yeah, thanks. It's great to be here.

I'm super excited, maybe a little nervous, but I'm sure it'll be fun. So I'm Daniel, I'm with OpenSource Connections. I started out as a search consultant back in May 2012.
So almost 13 years now, and I'm here to share some of the experiences that we made in our most recent project together with the folks of OpenSearch: when it comes to hybrid search, how to optimize hybrid search, and also what's necessary to optimize hybrid search, namely query sets and judgments. But I'm sure we'll get into that in a couple of seconds.

Yeah, thanks, Daniel. I'm also nervous, but I also know that, you know, when I release the episodes, I enjoy them. It's just fun, really. So I was thinking, like, hybrid search - yeah, we did discuss it, and I think the community discusses it at large in various forums.

Eric also reminded me of the episode with Alessandro Benedetti that we just did; it really was worth it.

Yeah, I was really just curious to maybe step back from that topic a little bit and discuss the importance of hybrid search: what is it, in your own words, and where do you see value for it compared to how we used to do search before? You want to take it, Daniel, and then I'll follow up?

Sure, yeah. So I think we see hybrid search, especially in this project, as, let's say, the process of blending traditional keyword search and, let's say, modern search approaches based on language models, mostly called either vector search or neural search. And I think the benefits of it - I guess you can group them into two groups.
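The core blending idea - making unbounded keyword scores comparable with bounded vector similarities, then taking a weighted sum - is detailed later in the episode. A minimal sketch of that idea (all function names and data here are made up for illustration; this is not OpenSearch's actual implementation):

```python
def min_max(scores):
    # Squash raw scores into [0, 1] so unbounded BM25 scores become
    # comparable with bounded vector similarities.
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_blend(keyword_scores, vector_scores, keyword_weight=0.3):
    # Weighted arithmetic mean of the two normalized score maps;
    # the keyword and neural weights sum to one.
    k, v = min_max(keyword_scores), min_max(vector_scores)
    docs = set(k) | set(v)
    blended = {
        doc: keyword_weight * k.get(doc, 0.0)
        + (1 - keyword_weight) * v.get(doc, 0.0)
        for doc in docs
    }
    return sorted(blended, key=blended.get, reverse=True)


# "notebook-2" never matches the keyword query "laptop",
# but the vector side still surfaces it near the top.
bm25 = {"laptop-1": 12.4, "case-7": 3.1}
vectors = {"laptop-1": 0.82, "notebook-2": 0.79, "case-7": 0.1}
print(hybrid_blend(bm25, vectors))  # → ['laptop-1', 'notebook-2', 'case-7']
```

Min-max (or L2) normalization, a choice of mean, and a grid of weights are exactly the parameter space Daniel walks through later in the conversation.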
Looking at the end user: we always want to provide the end users with the highest quality results, right? So search result quality is what we strive for, and traditional keyword search always lacks at, let's say, finding related things that may not really contain the specific words, but similar ones. So laptop and notebook is an example that I think we ran probably a million times in demos, maybe even more than a million times: if notebook is not in my product description, I will not find it when I search for laptop, and the other way around. And that's where, let's say, blending the two techniques really shines, because it enables you to not only find what your keywords are in, but also find related stuff to augment the result set. And I think that with that large benefit, of course, come a lot of challenges, because it is always, let's say, non-trivial how to actually blend the traditional techniques and the more modern techniques. So that's where the challenge between - or the challenge behind - hybrid search actually lies.
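One widely used way to do that blending, reciprocal rank fusion (RRF), comes up repeatedly later in this conversation. A minimal sketch (the function name is ours; k=60 is the constant from the original RRF formulation):

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Blend several ranked lists of doc ids by summing 1 / (k + rank).

    RRF is purely rank-based: it ignores the (incomparable) raw scores
    entirely, which is exactly the "blind picking" criticism raised
    later in the episode.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword = ["a", "b", "c"]
neural = ["b", "d", "a"]
print(reciprocal_rank_fusion([keyword, neural]))  # → ['b', 'a', 'd', 'c']
```

Because it looks only at ranks, RRF cannot tell a strong match from a weak one, which is what motivates the score-normalization alternatives discussed later.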
I mentioned two groups for which there are benefits. For the end user, we want to provide the highest quality results - that's one group. The other group is, of course, us, the ones providing search applications. I mean, we somehow need to profit from providing better results, and it is always different depending on, let's say, which scenario, which industry we are working in. The most transparent one is always e-commerce: the easier the end user, the consumer, actually finds stuff in your online shop, the easier it is for them to buy stuff; if they buy more stuff more easily, of course, we generate more revenue, and that's kind of the benefit that comes with providing better search results.

The other way is: we don't want to, let's say, manually tune systems indefinitely. So of course I can go ahead and say laptop is synonymous to notebook, and PC is maybe a broader term of laptop, and rules like these, but that's the kind of work that is never done if I have a changing catalog - I don't know, old products get thrown out of the product catalog, new products arrive. So it's a never-ending challenge for me, and I don't want to, let's say, spend my work hours always manually hunting these rules and thinking about what the users meant when they search for something. I want something, let's say, intelligently looking for the right things in my index, and that's what the neural part of hybrid search enables. So I think these are definitely the two groups that benefit, and how these two groups benefit, from my perspective.

Yeah, that's a really good intro. Eric, want to take it? Yeah, I think it's an interesting journey that we've been on the last few years, and I sort of look at hybrid search as a little bit of, like, a course correction, right? So keyword search has been around forever - well understood, frustrations are well known. And then vectors came out, and all these new products, these new vector databases - everybody was really excited about them, and we all
said, oh okay, let's go use vectors. And we leapt on that and got really excited, built everything using vectors, and I think maybe we went too far that way, over into vector land. And after we started getting some experience with vectors, we started realizing some of the problems with it, right? Like, it doesn't matter what you query, you're gonna get some search results - and sometimes zero search results is the right answer, right? You know, interesting challenges around faceting, or pagination, or highlighting can be weird, right? So, you know, I think that there are some definite challenges in vectors, and we all went over that way. And I think we've seen it in the last two years, where all the vector databases were frantically adding keyword-like search, and all of the keyword search indexes were frantically adding vectors. Okay, now we have these things - like, where do we go? Oh, hybrid search, right? Hybrid search popped out.

And, you know, hear me out: I think hybrid search is just good old federated search from the late 90s and 2000s, where you had two search engines, we'd send out two queries, and then you brought them back, and you're like, how do I merge them together? And sometimes we'd do terrible things, like showing two lists of results, right? Sometimes we would try to link them up together. It's the same idea: whether you're going to one search engine, making a keyword search and a neural search and bringing them together, or to two totally separate keyword search engines, you're still bringing back two lists. However, I think at least this time around, how to merge the lists of results together seems to be going better than when we did it back in federated search, right? And I look forward to talking more about some of the ways that we build our hybrid result set together. Part of me really kind of wonders why reciprocal rank fusion wasn't a thing the last time I did federated search back in the 2000s,
right? Like, it doesn't seem like that crazy of a concept - why didn't we do that, right? But we didn't. So I'm a little more optimistic about the value of it, but I think hybrid is a little bit of something old coming back, because we're back to the same problem: I literally have two search engines, two concepts for how to do information retrieval, and yet I want to blend it into one.

Yeah, that's an exciting topic. I think, to me, hybrid search opens doors beyond, sort of, what Daniel just explained - you know, the semantic connection between keywords and so on - when you go multi-modal, right? Of course you need to go there carefully, probably, but if you're missing metadata on a particular image on the product, you could reason about it using the image itself, and maybe also video, because we have video LLMs as well. They're more expensive, of course, to run, but, you know, sky's the limit, so to say, if you want to go there. So I think, in that sense, hybrid search unlocks many more avenues to explore, including in e-commerce, I think, right?

Yeah, yeah. I mean, I love that we are actually getting away from the old, just straight-up bag of words that was keyword search, that served us for a long time but still was just a very rough approximation of what people want, right? I mean, BM25 - you know, people say it's not even the best algorithm, it's just as fast as the one that we use. Vectors is sort of this idea that there are richer ways of understanding user queries and the content, and just going beyond text is, you know, absolutely wonderful, right? Lots of different things. I mean, at some point we'll do a vector search on usage patterns, right, to figure stuff out - like, the mode will be activity; it won't be video or image or something, it'll be activity. You'd be like, oh yeah, that's the person I want to talk to, they have the same activities as me, based on whatever it is that they do, right? So those kinds of things definitely are expressed
through the vectors. I do think that hybrid is an amazing thing for right now, for the next few years. I do think, though, it's also a little bit of a band-aid, in the sense that we're still leaning on keyword search for various use cases. And if we were to look 10 years out, I think an ideal solution is that we're not doing hybrid anymore - we just have a better approach to search, something beyond vector plus keyword, something better that still supports "zero results is the right answer", you know, some of these problems that using vectors gives us, right? We would have a better approach, and not this slightly band-aid "I have two different ways of searching and then have to wedge them back together". But for now, hybrid's exciting.

Yeah, I like that, I like where you're going. I also wanted to - I wonder if you saw that blog by Doug Turnbull, I will make sure to link it, where he talks about RRF, you know, reciprocal rank fusion, and he shows, on a handcrafted example, that if, let's say, neural search brings a relevant result to the top and keyword search lags and doesn't - so it basically brings noise - when you combine the two, you will end up having kind of half noise, half signal, and it will look terrible, right? And where do you stand on this? Like, if only there was a way of actually understanding, not just blindly matching things.

I'll hand it over to Daniel in just a moment. I do want to call out that I really liked your previous episode with Alessandro, where - I can't remember if it was you or Alessandro, but I think it was you, Dmitry, who said that your engineers were looking at hybrid search, and they kind of looked at it and said: when you strip away the fancy words, like reciprocal rank fusion, for blending things together, you're like, that's just round robin, right? And, you know, it's not just round robin - it's blind round robin, right?
It's not round robin like in your middle school, when you had to pick teams for dodgeball, right? The people picking knew who the best players were, so you were at least divvying up the best choices, and at the very end, those last two kids - you knew they were the worst choices, they were the noise in the search results, right? But that round robin at least had the benefit of knowing what was good. Reciprocal rank fusion has no sense of whether those results are good or bad, right? It is literally blindly picking them in some order, with no sense of what that is. And as you can imagine, blindly picking is going to leave you with, potentially, a very weak dodgeball team, right? And yet that's what we think of as state of the art.

Yeah. So, Daniel, what should we do in this case? Is there any solution? It's a good segue into what we actually tried and explored and experimented with. So in our most recent work, we tried to come up with a systematic approach to optimize hybrid search, specifically in OpenSearch. So in OpenSearch, right now, you have linear combination techniques at hand. That means you have two normalization techniques, and you can choose one: the L2 norm or the min-max norm. They are basically both there so that you can normalize the scores from keyword search into, let's say, the space of vector search, so that you can compare apples to apples, more or less, and not apples to oranges. Because, as we all know, BM25 scores - especially if you have, like, weird field weights - are unbounded: they can be in the dozens, the hundreds, the thousands. So you don't really know upfront in what range you are operating, and you also can't really compare the scores from one query to another query. That makes it really difficult to combine keyword search scores with any other, let's say, search mechanism. Together with these normalization techniques, the L2 norm and the min-max norm, you have three combination techniques
at hand and that's basically just three different means you can apply the arithmetic mean the harmonic mean and the geometric mean so that leaves you with two by three so that's already six parameter combinations that you can try on and then you can define weights um so how um how much neurosurge weight how much keyword search weight do i want to have in my query they always add up to one so you can say I want to go with 10% keyword 90% neural or 50-50 um thinking of let's say 11 of these weights so maybe you start with zero keyword and a hundred percent neural and 10% 90% and so on and so forth so that gives you a range of 11 multiplied with the six parameter combinations that we already had gives us let's say a solution area to explore of 66 different combinations which is pretty manageable so we defined optimizing hybrid search as a parameter optimization problem and we picked the most straightforward approach that you can pick and we just tried out all different combinations and calculated search metrics based on judgments and then we just had a look at which one is the best combination um for our experiments we used the ESCI data set um so that was released by amazon a couple of i think 18 months ago or something like that as part of our taglet competition um this data set comes with queries comes with products and most importantly it comes with judgments so we basically have everything that we need to really try out different um parameter combinations see how they work what results are retrieved um can calculate a couple of metrics compare these and then see which one is the best um parameter combination and um that's what we call the global hybrid search optimizer so we try to identify the best parameter combination globally for all the queries that we are looking at in a certain defined subset of queries so that's kind of the first step um the very very straightforward approach that we applied that's not really something um let's say scientifically um so 
sophisticated; it was just a very brute-force approach to see what's in there, and also to learn how results may be shaped or turn out differently when we increase the neural search weight or the keyword search weight, which normalization and combination technique is usually the best one for retrieving results, and so on and so forth.

We started out with what I call a reasonable baseline: searching across, I think, five or six fields — title, category, color, brand, description, some bullet points — so, for an e-commerce dataset, pretty basic stuff. And we calculated our metrics with that baseline. I would call it probably not the best baseline you can come up with, but a reasonable one. We didn't want to just create the weakest baseline, because that's not really difficult to outperform; we wanted a reasonable baseline without putting, let's say, man-years into finding out what the best baseline is. We got decent results out of that, and then we ran this global hybrid search optimizer, and it outperformed the baseline across the metrics we looked at: better NDCG, better DCG, better precision at 10 — those were the three metrics we examined. That was nice to see, because it already gave us assurance that there is a straightforward approach everyone can use — it's really easily applicable and gets you good results — and it also gives you assurance that there is something to neural search when switching from a keyword-based search application to a hybrid search application.

But, as always when you apply something globally, there are winners and there are losers: some queries really improved with this global hybrid search optimization step, but others didn't. So we took this one step further and thought about how we can create a process that dynamically, per query, predicts what the best parameter set is. And that goes in the direction mentioned in that blog post: it's kind of a query understanding approach to hybrid search. We're not just blindly applying one parameter combination that we identified over a thousand explored queries; we take one query, analyze that one query, and then say, based on the variety of experiments we made, what the best parameter combination is for this individual query, which we can now apply. So we're not globally applying something, but doing it individually, dynamically, per query.

And to maybe already give you the results of what we did, before going into detail on how we did it: the dynamic approach outperformed the global approach. We managed to identify a set of features, we trained a model — or multiple models, actually — and by applying these we were able to predict the best "neuralness", or the best neural search weight, for a given query, based on the results we got off the global hybrid search optimizer. We basically recycled all the different search metrics we got on a per-query basis, did some feature engineering, trained models, and then used those models to predict the best neural search weight for each query. And with this dynamic approach we even saw increases of up to 10% in one of the three metrics I just mentioned — NDCG, DCG and precision at 10.

Yeah, that's very exciting — thanks for sharing this whole end-to-end pipeline. I'm particularly interested, at least at this point, in the fact that, first of all, your dynamic approach outperformed the global one, and that seems to be thanks to that query understanding part, right? Can you talk a bit more about that? And also, did you check those predictions manually — for example, does it make intuitive sense that, for that specific query, the system picked more neuralness? I mean, is it
like a natural language question there, or some remnants of it? Or are there other interesting findings you could share?

Oh yeah. So let's first outline what we did exactly, and then dive into a couple of observations we made along the way. We started out by creating what we call feature groups, and then we created features within those groups. We looked at three different feature groups: the query feature group, the keyword search result feature group, and the semantic search result feature group. For the query feature group, we looked at the length of the query, the number of query terms, whether the query has numbers in it, and whether it has any special characters in it. We thought of these as ways of figuring out when a query is more specific and when it is more broad — a narrow query versus a broad query — and then we came up with assumptions like: the longer a query is, the more specific it is; with more specific queries we may have fewer results, and that's where we may want to augment search results with neural search results. On the other hand, with a very broad query we may have a lot of results — those are short queries — and then we may want to, let's say, only use the organic, traditional keyword search results. So we came up with a couple of assumptions on our side, and then with these four features.

For the keyword search result feature group, we looked at the number of search results — the number of hits we got when we executed the query with our baseline search configuration, the one searching the six fields — with the thinking that if we have zero keyword results, that is maybe a perfect scenario for neural search, because then we want to augment those zero results with what comes from the vector search application. The other two features in this group were the best title score in the keyword search results — if we have a strong title match, maybe that's an indication we don't need as much neural search — and, I think, the average title score in the top 10: if we have a high average in the title scores, that's maybe a good sign that no augmentation with neural search results is needed. For semantic search it was similar to the title score: we looked at the best title score and at the average semantic similarity based on the title we had indexed. By looking at these three groups, we thought: we now have a representation of the query on its own, of the result set based on keywords, and of the result set based on neural search.

That was our starting point, and then we did loads of experiments: what's the best feature combination when we train a linear regression model or a random forest regression model? What role does regularization play — can we optimize the model training aspect with that? So we really did a lot of iterations. We also compared a large query set against a smaller one, to see if that changes anything — randomly sampling 500 queries versus 5,000 queries. We did a lot of exploration to make sure that we were not, let's say, randomly receiving the uplift that we saw, but that there is actually something to it, so that we can go out into the wild — for example on this podcast — share our observations, and be on the safe side that they can hopefully be reproduced in other scenarios as well. So that's the "how we did it": the features, the feature engineering, and how we trained our models. We looked at linear regression and random forest regression as a starting point, because we thought: let's try simple models first, and if that works we can still look at more complex ones. And that's maybe already the first observation that
I can share. Linear regression models are the simplest form, random forest regression a slightly more complex form, and then — this was the last model iteration, which we did just last week — we also looked at gradient boosting methods. Interestingly, they were all almost the same from a model performance perspective. It wasn't the case that the most complex ones give you the best results, and that's a very reassuring feeling, because we need to calculate a couple of features per query, and that adds latency to your query execution — and especially in e-commerce, where every millisecond counts, we don't want to run multiple queries to calculate our features just to get a 0.3 percent performance increase. It really has to be worth the effort. So that's a nice observation: we don't have to go for the most complex model architectures; we can stick with the simple ones and not lose much performance, if any. The linear regression model and the random forest regression model actually scored absolutely equally when calculating the search metrics; they just predicted the NDCG scores slightly differently. That's how we did it: we predicted NDCG scores by adding neuralness as a tenth feature, looked at which neuralness scored best, and that's the neuralness we then went with for the testing efforts afterwards. So that's the first interesting observation we made.

We also looked at different feature groups: what happens to model performance if we focus only on query features, or only on keyword search result features — training models within one feature group only, not taking all features into account? The interesting part here was that a combination worked best — not always all nine features together with the neuralness feature, but at least some of the keyword search result features, some of the semantic search result features, and some of the query features. Those combinations were best; but models using the query features only — and these are the simplest ones to calculate — weren't far behind from a performance aspect. So you wouldn't really lose a lot if you only chose query features for your predictions. That was another nice observation: if you went with the fastest option — in terms of, let's say, inference speed, and also the speed of calculating the features beforehand — you don't lose much search result quality. Again, you don't have to go with the most complex approach to get reasonable results, which was, I think, the second most important finding, at least from my perspective, because it gives us the assurance that when putting this into production we don't add, let's say, hundreds of milliseconds to the query latency if we stick to the simple features.

Maybe that means there's room for growth with this technique, right? We're not maxing it out just to get started; we can start out, and then as we get more sophisticated we have room — we have milliseconds to burn — to do other cool, interesting things: ask an LLM to characterize the query, or something like that. We've got room. But I also like this lesson, and I think it resonates with what I've seen doing ML previously: start with simpler solutions and try to maximize ROI before upgrading to a more complex one. And you need to set some thresholds, because, as you said, Daniel, just adding 0.03 won't cut it — it's not worth it — because when you bring in neural search, it also means you need to build that parallel index of embeddings; you need compute, maybe GPUs as well, and someone will need to pay for it. And, in passing: while we were doing this project I kept asking Daniel, why is re-indexing with embeddings so slow? Where's my turbo button? Why is this still a problem — it's 2024, almost 2025 — why do embeddings take such a long time? I remain a little confused about it. Don't we just turn a knob and make a GPU go faster, and then re-indexing with embeddings is just as fast as re-indexing with keywords?

But also — the fascinating part, one thought that crossed my mind as you explained it, Daniel — is that in some sense you've built some sort of reasoning engine, if I may call it that. Maybe it's not fully "reasoning" the way, I don't know, LLMs are starting to do it, but it's an engine that looks at the query, examines its features, draws some conclusions, and then also looks at the results. It's not like you just understood the query, sent it over to the retriever side and hoped for the best; in some sense you do this dynamic sort of reasoning on top of everything. But the lesson there, you said — and correct me if I'm wrong — is that just by looking at the query features you could already achieve good results; you don't need to look at the result features. Yes, yes. But wouldn't it be nice to look at those? Yeah, I do love the idea of looking at both sides. We tend to focus on queries because I think that's the viewpoint of our industry — we are very query-centric in the search world; it's all about the query and what we can get out of it. We really don't look at the results that much, except to say whether they are good or bad, and
we're not particularly good about factoring what the user did back into our algorithms. And I love that this dynamic thing we're doing here is, I think, a pointer to bringing more dynamic aspects into our algorithms, where they can actually start evolving, or changing, or being very specific to particular query types, use cases, times of year. Today that's very difficult to do; only the most sophisticated teams have sets of algorithms like that. Yeah, and I also like what you said, Eric: looking at results and reasoning about results, together with what you understood about the query, might lead to a much better final representation of what you show to the user — because there are so many factors beyond the query and the results, as you said: the season, patterns you observed with the user, recent purchase history, and so on and so forth. It's very fascinating. And if I continue drawing this analogy with the LLM world: when you ask an LLM to think through what it has done, it may correct itself just by looking at what it has produced, because LLMs are, as someone said, calculators for words — if you give it its own output and ask... Yeah, exactly. I can't wait to write a search algorithm that understands what it did last time, when the user didn't like the results, so that when a similar query comes in from the same user it does something new — tries something new, because whatever it was doing before, the user didn't like. There's a joke that if the user hates what you're giving them, you might as well return random docs, because that'll be better than whatever you're doing right now. Yeah — at least you have a chance with the random ones.

One question I sort of have, though, about what we described: how is it different from learning to rank, other than learning to rank being about ranking one list while here we're ranking two lists? Or do I just conceptually have the role of learning to rank wrong — between what learning to rank is and how the dynamic hybrid optimizer works? So, we are not re-ranking results, which is what learning to rank typically does. What we are doing is learning when to, let's say, increase the weight on keyword search results or on neural search results. So it's kind of a "learn to blend", "learn to search" — the new technology. Or are we just done with the "learning to" nomenclature, and we like "optimizer" better than "learning to blend", maybe? Yeah, I'm not the most creative at coming up with clever names, so maybe it's time for not "learn to X" but "X optimizer" — and that's how we ended up with "hybrid search optimizer". But I wouldn't really have a good argument against calling it "learning to optimize hybrid search" or something like that, because that is what the dynamic approach does. As we gather more data, more clicks, those go into the features, right? We even use the language of learning to rank — feature engineering; we use that language, and we're building a model, and you even mentioned linear models and random forests, and those are all the words that make me think "oh, it's learning to rank". So it's interesting to see learning to rank maybe come back in a new way. Yeah — and learning to rank is something you can apply on top of the hybrid search optimizer; it's not like we have any kind of substitute here. It's still, I think, a very valuable tool in the mix, and this is just one way to figure out the best way of getting to reasonable hybrid search results.

Yeah — but I was recently also thinking about this, and I wonder what your hunch is on it: learning to rank sort of depends on the training data, and you usually collect it from the past.
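As an aside, the "global hybrid search optimizer" sweep discussed earlier — two normalization techniques, three combination means, eleven weight splits, 66 combinations in total — is small enough to brute-force. A minimal sketch of that idea (all names here are my own, and `evaluate` is a hypothetical stand-in for running each configuration against the judged queries and computing a metric such as mean NDCG):

```python
import itertools

# 2 normalization techniques x 3 combination means x 11 weight splits = 66 combos
NORMALIZATIONS = ["min_max", "l2"]
COMBINATIONS = ["arithmetic_mean", "harmonic_mean", "geometric_mean"]
NEURAL_WEIGHTS = [w / 10 for w in range(11)]  # 0.0, 0.1, ..., 1.0

def grid():
    # Enumerate every parameter combination; keyword weight is 1 - neural weight.
    for norm, comb, w in itertools.product(NORMALIZATIONS, COMBINATIONS, NEURAL_WEIGHTS):
        yield {"normalization": norm, "combination": comb,
               "neural_weight": w, "keyword_weight": round(1 - w, 1)}

combos = list(grid())  # 66 parameter sets

def best_params(evaluate):
    # Brute force: score every combination (e.g. mean NDCG over the judged
    # query set) and keep the best-scoring one.
    return max(combos, key=evaluate)
```

With `evaluate` returning a search metric for a parameter set, `best_params` is exactly the exhaustive 66-way comparison the global optimizer performs.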
You don't collect it from the future, right? And so as you move into the future and patterns change, you carry over that past weight, which can actually go against the intent of your reasoning engine. That's where I think a lot of work needs to go, in all of these directions. As you optimize your retrieval and your reasoning engine — your query understanding — maybe you should dial back the LTR a little, or maybe you need to retrain it right then and there, or retrain frequently enough so that you don't lose its strengths. Yeah, I think that's our challenge in a lot of these things: the historical approaches versus the predictive approaches — which ones you go with, and how you discount the historical when you have a bunch of new, interesting data. Yeah. But I also accept the limitations of the physical world: from investment books I've read, one key takeaway is that no one can predict the future — if someone claims they can, they're probably lying. But again, I guess there is still room for being more dynamic.

And is there something you guys want to show? I mean, is this something we can look at visually? Well, theoretically we can. No pressure. So, this is a small demo, and I'm going to show the results first and then how we get to those results. It takes in a user query — my search application is now this Jupyter notebook, so it's not the most sophisticated search application. It calculates the query features, then with these query features it reaches out to the model to get the best neuralness, and with that response the query is built and sent to OpenSearch. We're going to look at a couple of examples first, and then we can look at the code. Again, this uses part of the ESCI dataset; my index has about 20,000 documents in it, so it's not large.
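The per-query flow just described — compute the query features, ask the trained model which neuralness is predicted to score best — can be sketched roughly like this. The function and feature names are mine, and `ToyModel` is a hypothetical stand-in for the trained linear/random-forest regressor from the experiments described earlier:

```python
import re

def query_features(query: str) -> dict:
    """The four query-side features described earlier (names are illustrative)."""
    return {
        "length": len(query),
        "num_terms": len(query.split()),
        "has_numbers": int(bool(re.search(r"\d", query))),
        "has_special_chars": int(bool(re.search(r"[^A-Za-z0-9\s]", query))),
    }

class ToyModel:
    """Stand-in for the trained regressor (predicts an NDCG-like score
    for a feature dict). Toy behavior: pretend 0.9 neuralness is best."""
    def predict(self, feats):
        return -abs(feats["neuralness"] - 0.9)

def predict_neuralness(features: dict, model) -> float:
    # Try each candidate weight with neuralness added as an extra feature,
    # and keep the one whose predicted score is best.
    candidates = [w / 10 for w in range(11)]
    return max(candidates,
               key=lambda w: model.predict({**features, "neuralness": w}))
```

For example, `query_features("waterproof jacket for women")` yields four terms, no digits and no special characters; feeding that to `predict_neuralness` with a real model is what produces the per-query weights shown in the demo.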
It's only a subset of the ESCI dataset. When we send queries — in this case, "waterproof jacket" — the method first, as I just explained, calls out to the model, retrieves the neuralness score, and then builds the query. And then we have this HTML display here. As you can see, images are not available for all of the products, but what we can see is that "waterproof jacket" gets a 50-50 search weighting — 50% keyword, 50% neural. If we go for "waterproof jacket for women", the weights change: now we have 90% neural and only 10% keyword search weight. And that's because the query became much more specific — since we added "women", we are not expecting results for men or for kids, right? Exactly. So that's what we can infer about what the model picks up here: we have a longer query, a more specific query. The model isn't really looking at the words, I'd say, but at the features: query length, are there numbers in it, are there any special characters in it. Another one: "waterproof jacket black" — and we also see that some results we wouldn't really expect are at the top here. But again, it's only a small-ish proof of concept we're looking at. Still, we can see that queries which are similar from, let's say, a meaning standpoint retrieve different weights, and that's the interesting thing. And we can go for something completely different as well: "iPhone case". We see nice iPhone cases throughout, and that query goes with neuralness 0.7 and keyword 0.3. And, I don't know, "iPhone 15 Pro Max case black" — that would be a very, very specific query. And here, again, the neural search weight increases.
Whereas when we go for a very broad query — so that's maybe one characteristic of the model that you can almost feel: the more specific we get, the more neural weight the query gets, though other features also play a role. Yeah, that's very interesting. And this is how the OpenSearch query looks. The interesting part is this one here: we have the keyword query, which, as I explained before, searches across those couple of fields with different field weights — a best_fields multi_match query with the AND operator. And then we have a neural query that retrieves the top 100 based on the title embedding we have. The hybrid part is actually the search pipeline that normalizes — in this case with the L2 norm — and combines the results with the arithmetic mean, based on the keyword search weight and the neural search weight, which are passed in here as variables that another method predicts with model inference. So that's the small prototype, built in a Jupyter notebook, that we have. Everything we build is, as Eric just mentioned, open source: we have a public repository that contains everything but this one notebook, actually — everything you need to train the models, do the feature engineering, calculate the search metrics. Everything you actually need, so running this with the ESCI dataset is possible for everyone; and if you want to apply it to your own data, that of course is also possible. That's what we're looking at next: adoption in the industry, and also hooking it up with the other part of this project — let's call it the evaluation part — calculating implicit judgments based on user feedback: clicks, queries, stuff like that. So that we not only enable everyone to optimize hybrid search, but also empower everyone to come up with judgments if you don't have any, because that's kind of the basics you need for any query.
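The query shape just described — a `best_fields` `multi_match` with the AND operator on the keyword side, plus a `neural` clause over a title embedding, wrapped in a `hybrid` query — looks roughly like this. The field names, the embedding field and the model id are placeholders, not the demo's actual values:

```python
def build_hybrid_query(query_text: str, model_id: str) -> dict:
    """Sketch of an OpenSearch hybrid query body with a keyword and a neural side."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {   # keyword side: best_fields multi_match, AND operator
                        "multi_match": {
                            "query": query_text,
                            "type": "best_fields",
                            "operator": "and",
                            "fields": ["title^2", "category", "color",
                                       "brand", "description", "bullet_points"],
                        }
                    },
                    {   # neural side: k-NN over the title embedding, top 100
                        "neural": {
                            "title_embedding": {
                                "query_text": query_text,
                                "model_id": model_id,
                                "k": 100,
                            }
                        }
                    },
                ]
            }
        }
    }
```

This body is sent together with a `search_pipeline` parameter, so the normalization and combination step described next is applied to the two sub-result lists.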
Yeah, and that's where Quepid comes in — shameless plug. One of the things we have is actually a reference implementation: some of you may have heard of Chorus, which is a reference implementation for e-commerce search. We did it in Solr, and we have an OpenSearch version. Some of the stuff you're seeing is bleeding edge, hot off the presses, but we're working right now on getting the Chorus for OpenSearch edition updated with some of these scripts and notebooks, so you can just check it out, run the quick start, have everything in place and start playing with it — you don't have to build all the steps yourself, and you can see how all the pieces fit together. So that's available; it would be great if we can add a link in the show notes for that as well. Let's do that. And I also want to understand this search pipeline and all the mechanics of hybrid search that you implemented: is it a plugin to OpenSearch, and what's the plan for it? I guess you spoke to that — let's give it to as many users as possible. What's your idea there? So, hybrid search is available in OpenSearch in, let's say, a basic shape: you can create a pipeline and say 70% keyword search weight and 30% neural search weight, and you can also define these on the fly. But we currently have the limitation that, although we can hook up the model within a so-called ML inference pipeline in OpenSearch, this ML inference pipeline can, as of now, not pass the predicted neural and keyword search weights on to the search pipeline. A feature request is already out there, and I assume that in one of the next OpenSearch versions we will have the possibility not only to hook up the model within OpenSearch — which is already possible, so that from within OpenSearch you call out to the model for inference — but also to retrieve the predicted neural and keyword search weights and use them in your search pipeline.
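A search pipeline of the kind described — L2 score normalization combined with an arithmetic mean under per-query weights — looks roughly like this. The pipeline name is a placeholder, and in the demo the two weights would come from the model's prediction:

```python
def hybrid_pipeline_body(keyword_weight: float, neural_weight: float) -> dict:
    """Sketch of an OpenSearch search-pipeline body for hybrid score blending."""
    return {
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "l2"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # weight order matches the sub-query order in the
                        # hybrid query: [keyword, neural]
                        "parameters": {"weights": [keyword_weight, neural_weight]},
                    },
                }
            }
        ]
    }

# e.g. PUT /_search/pipeline/hybrid-weights with hybrid_pipeline_body(0.1, 0.9),
# then search with ?search_pipeline=hybrid-weights
```

Swapping `"l2"` for `"min_max"`, or `"arithmetic_mean"` for the harmonic or geometric mean, gives the other cells of the 2-by-3 grid discussed earlier.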
So there is already an implementation plan out there. There are open feature requests, and if anyone wants to give them a thumbs up to help prioritize that within the OpenSearch community, it would of course be greatly appreciated — and I'm sure we can include these GitHub issues in the show notes as well. Yeah, for sure, let's do that. That's a call-out to the community: please vote if you care — I hope there will be enough people who care about this. Yeah, exactly. And then you said everything is open source: does that mean the training scripts, the algorithms, the choices you made there are also open source, and we can link to those as well? Yes. Everything we did — we of course didn't include all the thousand experiments, but at least all the helpers we used to run those thousand experiments are in the repository, out there for everyone to look at, and maybe come up with even better ideas than we had. We'd definitely love to hear those as well. Wow, this is amazing — in a true sense you live up to your name, Open Source Connections. To me, this is a ton of work that you could have chosen to hide, working only with your clients, nurturing and ironing it out and making a ton of money from it. But you chose to open source it because you believe in the power of the community to enhance it. That's amazing. Well said.

What's next? You already said a couple of words, but what's next — and do you want to address our audience, and what do you expect from them? I think one of the things is that as we've gotten into this, we've found some rough spots in OpenSearch. OpenSearch has a strong ML component, the ML Commons project, but in integrating it in new and interesting ways like what Daniel was showing, we're finding some rough spots. It is interesting to me — it does bring to mind: do search engines — what we call a search engine — need to evolve to be more of an ML engine as well? I mean, it feels to me like search has been revolutionized by machine learning, and as we move in this direction of calculating, building models, and evaluating data on the fly, do our search engines need to support those use cases and go beyond just "I get a query, I get documents, I match them up and that's it"? Is there another layer of computation we need in the search engine, versus bolting it on in some other environment with an MLOps pipeline and all the rest? I think that's interesting. One place where OpenSearch is definitely breaking interesting ground is all the machine learning aspects, but data processing and building models and all of that maybe needs to become a first-class citizen of what we consider a search engine, versus something done by some other system elsewhere — because that's a lot more complexity, and it raises the barrier to adopting these things. So I look forward to things like a hybrid optimizer just being what you do when you build your search engine: of course you turn on the hybrid optimizer, if it meets your use case and you have the judgments and other features you need — versus "oh, a major engineering project that's going to take us six months". And, you know, supporting that: as you highlighted, Daniel, the search quality evaluation framework that we're adding to OpenSearch is really exciting — I'd love to come back, Dmitry, and talk all about that on another show. Yeah, let's do that. I'm really excited to dive deeper into eval, because I think in so many ways you need to start with eval — especially if you have a search engine already out there — to establish that baseline for yourself, and then learn and introspect where things work or fail. Yeah.
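Since NDCG, DCG and precision at 10 come up throughout this conversation as the evaluation metrics, here is a minimal sketch of them over graded judgments listed in rank order. This uses the common log2 discount; the project's exact formulas may differ:

```python
import math

def dcg(gains, k=10):
    """Discounted cumulative gain over the top-k graded judgments."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(gains, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

def precision_at_10(gains):
    """Fraction of the top 10 results with a positive relevance grade."""
    return sum(1 for g in gains[:10] if g > 0) / 10
```

For example, judged gains `[3, 2, 0, 1]` give a precision at 10 of 0.3, and an NDCG just below 1.0, since only the third and fourth results are swapped relative to the ideal ordering.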
Make sure some of these models don't go and produce terrible — yeah, batshit-crazy — results, right? There was the risk that boosting up the neuralness might be terrible; we had to understand that. So we need to be much better about evaluation. We can't take our eye off the ball of speed — query speed does remain top of mind — but we also really need to be good at evaluation, and I think for a lot of search teams that's kind of a new thing. Yes, absolutely. One other thing I want to shout out is the next Haystack conference: save the date — the last week in April. The save-the-date went out, and we are looking for talk reviewers, people who want to review the talk proposals. It's a double-blind process, and we get a couple of people from the community, so if that's something you're interested in, reach out to David Fisher — he's running that process. And the call for proposals will be out. I'm curious whether this year Haystack in Charlottesville might as well just be called "hybrid stack" — are we going to have two days of talking about hybrid? Because RAG — RAG, you know, that's last year; now we're on to hybrid. Or is there going to be something new? It's going to be interesting to see what comes out of the community for this year's Haystack. It's also very interesting to see this dynamic, this evolution, because I think two or three years ago hybrid search was at the top of the charts — everyone was discussing it — and then RAG superseded it. That tells me that maybe we didn't dive deep enough into the topic: we let it go, it passed, and people thought "oh, that doesn't work, we need something else", and now it's RAG, RAG, RAG everywhere. But then RAG also comes with its own, you know, limitations.
And now we have a new evolution level, right, with the hybrid search that you guys are optimizing. That's amazing — I was actually waiting for it, and I'm happy to see it happen. Thanks for sharing the story. Let's come back to the eval topic when you guys are ready. I also expect you'll publish some blog posts about this topic, so I'll be happy to promote those as well — and read them, of course, and learn. Terrific. Yeah, thanks. Thank you very much. Thanks for coming to the show today, sharing this story — it's very exciting — and also showing the demo. I like notebooks, because they allow you, you know, to quickly show things and get a feel for what's going on. And I'm glad that Eric still uses the cup we gave him as a gift — you really see the gift you gave coming back, and the person you gave it to still enjoying it. That's amazing. Awesome. Dmitry, thank you very much. Thank you, Eric and Daniel. Yeah, thanks for having us. Yeah, thank you. Take care. Bye-bye. Bye.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md b/transcripts/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md
new file mode 100644
index 0000000..e0f29e8
--- /dev/null
+++ b/transcripts/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md
@@ -0,0 +1,309 @@
---
description: '

Update: ZIR.AI has relaunched as Vectara: https://vectara.com/

Topics:

00:00 + Intro

00:54 Amin’s background at Google Research and affinity to NLP and vector + search field

05:28 Main focus areas of ZIR.AI in neural search

07:26 + Does the company offer neural network training to clients? Other support provided + with ranking and document format conversions

08:51 Usage of open source vs + developing own tech

10:17 The core of ZIR.AI product

14:36 API support, + communication protocols and P95/P99 SLAs, dedicated pools of encoders

17:13 + Speeding up single node / single customer throughput and challenge of productionizing + off the shelf models, like BERT

23:01 Distilling transformer models and why + it can be out of reach of smaller companies

25:07 Techniques for data augmentation + from Amin’s and Dmitry’s practice (key search team: margin loss)

30:03 Vector + search algorithms used in ZIR.AI and the need for boolean logic in company’s client + base

33:51 Dynamics of open source in vector search space and cloud players: + Google, Amazon, Microsoft

36:03 Implementing a multilingual search with BM25 + vs neural search and impact on business

38:56 Is vector search a hype similar + to big data a few years ago? Predictions for vector search algorithms influencing relational + databases

43:09 Is there a need to combine BM25 with neural search? Ideas + from Amin and features offered in ZIR.AI product

51:31 Increasing the robustness + of search — or simply making it to work

55:10 How will Search Engineer profession + change with neural search in the game?

Get a $100 discount (first month free) + for a 50mb plan, using the code VectorPodcast (no lock-in, you can cancel any time): + https://zir-ai.com/signup/user

' +image_url: https://media.rss.com/vector-podcast/20220216_040237_4d74468969220e3376998953833bb185.jpg +pub_date: Wed, 16 Feb 2022 16:14:37 GMT +title: Amin Ahmad - CTO, Vectara - Algolia / Elasticsearch-like search product on + neural search principles +url: https://rss.com/podcasts/vector-podcast/393967 +--- + +Hello, Vector Podcast is here, and today we're going to be talking with Amin Ahmad, co-founder and CTO of the company called ZIR AI. +I'm really, really excited to talk to Amin because basically he's innovating in this space, his company is innovating in this space, of bringing vector search to practice and also making it usable. Hey, Amin, how are you? I'm doing fine. Thank you. Thanks for having me. Awesome. +Thanks for joining. And I know it's almost like festive times, so it's probably quite a packed schedule for you otherwise as well. So yeah, I was thinking let's traditionally start with the introduction. +Like, can you please tell me a bit of your background before ZIR AI, how ZIR AI the startup came about, and your role at ZIR AI? Yes, sure. Me and my co-founder, we started ZIR AI in 2020. Before that, we were both working at Google. I had been there since 2010. +I worked in Google Research, focused on NLP and language understanding with machine learning. Prior to that, I had worked many other places in the industry. So I've been in the industry about 24 or 25 years now. +And around 2017, the team that I was working on in Google Research actually became known for Gmail Smart Reply. If you remember that feature. Yeah, that's an excellent feature. The moment I saw it, it was like, wow, that's fantastic. Yeah. Yeah, and it was impressive. +And I would say maybe it was a very practical application of NLP that was deployed on a very large scale. So that was the research group that I was a part of. It was under Ray Kurzweil, and it was developed in collaboration with some others.
+Anyway, around that time, I became very interested in using neural networks for more general-purpose information retrieval. And I specifically formulated this as question answering over a large corpus. And at the time, I mean, BERT, when it was released a year later, changed this. +But at the time, a lot of people would approach a machine learning problem from scratch. They would take a completely uninitialized neural network and then try to train it. And when the models get big and deep, mostly you don't have enough data for your task. +And also, you know, that doesn't jibe very well with how humans approach a task, if you think about it. +If you ask me to answer a question or to read a passage from a medical textbook, I may not be a doctor, but my understanding of the English language will allow me to get some of the information content from that passage. +So in the same way, I was thinking that if a neural network is truly understanding language in the way that people do, it should have this property. And it should be possible to train a general-purpose neural network that, without fine-tuning in a specific domain, can still work reasonably well. +So I set out to build this thing. And that was my research program in 2017. And we were actually able to launch the first iteration of that model in a product called Google Talk to Books. And I'm saying this to my knowledge; I would love it if someone corrected me in the comments section here. +Google Talk to Books is the first large-scale, end-to-end demonstration of a neural information retrieval system. It is a search over a corpus of around 200,000 books from the Google Books corpus, but it's done entirely with vector search. And I'm not aware of anything before that. +So the neural network is very important here. I was not part of the team that conceived this idea, and I was not actively working on it. They had a neural network which wasn't producing good enough results.
+And we put in this more general-purpose question answering neural network, and the results dramatically improved. This was basically the first rollout. +But then what I observed over the subsequent years was that I was able to take exactly the same neural network and apply it in at least six different products within Google. And this is what convinced me of the business value of what had been demonstrated here. +This could actually improve metrics in products used by millions of people. And so this was essentially the genesis of the idea of ZIR AI. +We started it, me and my co-founder, in 2020, and the objective is to provide something like Elasticsearch or Algolia, except using the principles of neural information retrieval. Because as you know, Elasticsearch and Algolia are fundamentally based on the BM25 algorithm. +So yeah, that's what we've been doing for the last two years. Yeah, this is fantastic. +I mean, it's fantastic also that you bring your experience from such a large company innovating in search, right? Over to, you know, the rest of the world, essentially, right? So I believe your goal is to apply this with as many clients as possible. +And are you focusing mostly on NLP at the moment, natural language processing? Yeah, well, from a customer's perspective, we provide a text search solution. +Now, one of the beauties of embedding-based techniques is that with a neural network, you can go beyond text, and you can embed images, video, and other types of media into a common embedding space. So that is where this company will eventually go. +But my roots are in NLP, and I think that text search by itself is a large area that takes an effort to do well. So that's where we're focused initially. Yeah, that makes total sense. +But as you said, you know, vector search is not really constrained by the application, as long as you can embed it, right?
+And plus all these multimodal scenarios, where you can combine, let's say, your camera pointed at something while you're talking to it, and then you can get some textual matches and suggestions, right? So that could be a very rich experience. +Right, right. And that particular application is actually achievable now, even in an all-text platform, if you feed the transcripts in. And these neural network approaches tend to work especially well with natural speech as query input. +So this is why they're often used in technologies like Assistant or Alexa. Because when people speak, it's obviously much different than when you're typing keywords into a search box with your keyboard. But they also work well when searching over natural language text like transcripts. Yeah, absolutely. +And when you say neural networks, you know, some of the, let's say, vector database providers and vendors on the market, they give you sort of this machinery. You can plug in some models. They also have some models available, let's say, from Hugging Face. +In your case, in the case of ZIR AI, are you innovating in this space of creating these neural networks for your clients? Yes, we are approaching the problem holistically. So, you know, the vector database is one critical component of a neural information retrieval system. +But there are other pieces, for instance, the re-ranking piece, or the neural network that produces the embeddings. And all of these need to work in coordination, in tandem. Ideally, when they do, you can squeeze a lot more performance out of the system. +So yes, our focus is end to end; we even handle data ingestion. It's not a big area of focus, but the reality is that you have to make the experience as easy as possible for widespread adoption, I think. So we allow our customers to just shovel in, you know, PDF documents and all kinds of other formats. +We perform the text extraction. We perform the segmentation of the document.
And we actually do the encoding with the neural network, build the vector database, and then handle the serving as well. Yeah, so it sounds like an all-around solution. +And I mean, it's very typical, you know, in some sense, to bring some algorithm or some idea to the market, but it doesn't have any connectors. Okay, how do I feed data into it? Or maybe there is a simple demo, and nothing beyond that. +But it sounds like you are taking the all-around approach. +And have you been looking to implement everything yourself, or are you also reusing some of the open source pipelines, you know, for example, for embedding or for document conversions and so on? Yeah, we are using open source as much as we can, and where we think it makes sense. +So for instance, for content extraction, there's Apache Tika, which is a very good framework. But then there are certain document types for which there are better alternatives out there. And, you know, we've had certain customers for which PDF extraction, for instance, was a priority. +And we discovered some shortfalls with Tika, so we went and researched and found there are better alternatives out there. And so we've got those implemented. But we didn't write a PDF extractor from scratch, obviously. That's too much for a two-man company to do. +So yeah, we're trying to really combine the best of breed in every area and create a cohesive system that just works out of the box quite well for a broad range of use cases. Oh yeah, that's awesome. +And it's also great to hear that you reuse open source software, you know, at least initially, or maybe you fine-tune it, so to say. But yeah, I mean, that's also amazing, because you can quickly build your product and focus on the goal. +Yeah, and now that we've approached this more closely, can you actually describe what your product is today? So as a client, what can I get? And what kind of support do you provide?
But first, can you start with the product itself? Yes. +So let me describe it abstractly, and then I'll explain very concretely what I mean. I would say that we're a cloud platform as a service for text retrieval, or text search. So the way it looks is we have two main APIs, one for indexing content and the other for running queries on the content. +So an organization would come, and they would index a large amount of content. They might also index periodically or incrementally over time. +And this would accrete into an index, and then subsequently they would come and run, generally, natural language text queries against that corpus, and we would return the best matches. So that's what we actually provide; now, how that looks on our platform. +So you essentially, you know, you come and you sign up, just the way you would sign up for an AWS account, and you're dropped into an admin console. Everything you can do in the admin console can be done through APIs. We're basically focused, again, on a platform. +So we're accessible through gRPC and REST. The console is basically there to allow you to, you know, point and click and quickly experiment and discover the value of the system. +Because our vision was that within 15 to 30 minutes, someone from an organization should be able to come, drop their documents into the system, and determine whether or not it's even going to meet their needs. +And then if it does, they can consult the documentation and learn how to use the APIs and get a proper integration going. So we organize collections of documents into what are called corpora. One corpus is essentially a customer-defined entity. +It groups related documents that they want to search together as a unit. We allow, you know, the customer to define any number of corpora; there are limits depending on the account type. And then you can essentially drag and drop the documents into the web browser, into the corpus upload.
+There's about a seven-minute latency, and then you can start running queries. We have a hosted UI that makes it easy to see the results kind of on the spot in the browser. +But when you run queries through our interface, through our APIs, you also have the ability to run one query against multiple corpora and merge the results. +We also support the ability to attach metadata as you're indexing content, metadata that is then returned to you in the search results. So that would allow you to join to, let's say, another system on your end. But those are some of the features that we provide. Yeah. +So it sounds like it's a self-service system, right? And so if I were a client of yours, I could get a subscription, a trial subscription maybe, and then upload my document corpus. +How big a corpus could I upload on a trial? Do you have any limitations there at this point? So our general trial has been 15 megabytes of text. And I'll explain what that translates to. I was just working with another customer. +And they had about one gigabyte of PDFs that we put into a corpus. And that turned out to be about 48 megabytes of text. So the billing is by the actual extracted textual content. So 15 megabytes is actually a decent data set; several hundred documents, you can imagine. +But we have plans that go much larger, and we have customers that are indexing far more data. Yeah, yeah, sounds great. And then what happens next? So let's say I'm happy. I want to move forward. +Now, you said that there are APIs that I can start introducing inside my prototype or my existing back end. Is that right? Yeah, that's right. So we primarily promote a gRPC interface, because it's high performance, low latency. We also do have a REST interface. +We have fully authenticated APIs. So we use OAuth 2.0, that standard.
So you would give credentials to your servers, and they would use those credentials to establish an authenticated session with the platform and then run queries for you at a very high rate. We scale horizontally. +We can go up to hundreds of QPS, though we haven't had a customer that's needed such a high rate, but we're capable of that. Yeah, yeah. +And you also mentioned that you maintain certain SLA guarantees, like P99 latency. Can you speak a bit about that, and also how much of that accounts for client need versus what you are building for the future? This is a good question. +So in terms of client need, we really haven't had any client that's required anything better than 200 milliseconds. Now, there's a potential client that we're working with. They're not yet a client. +They're looking for more like 50 to 60 milliseconds, because essentially the lookup into our system is only one part of their overall request handling process. So they have a much tighter budget. +In practice, what we're seeing on our platform for our customers today, aggregated over all queries, is a P99 of around 130 milliseconds. Our P50 is about 60 milliseconds. And this has been sufficient for our customers. +For customers that have tighter requirements, we actually have many different ways to address it. So actually, the main latency is not from the vector database. The vector database is generally quite fast. It's the neural network that has to do the text encoding. That's the bottleneck. +So we have the ability to set up dedicated pools of encoders, neural networks that do this encoding, for customers. We scale, and we're cost efficient, by sharing the pool across all customers. But for customers that have very stringent needs, we can set up dedicated pools for them. +But even when you go, let's say, single customer, single node, maybe a GPU node, there is still a theoretical boundary to how fast it can be.
+Let's say if I take an off-the-shelf BERT model, and if I throw in 768 dimensions, what's going to happen? How can I tune it on the speed side? Yeah, well, let me address two things that you said there. +So the off-the-shelf BERT model is a very common approach for many companies that are trying to productionize NLP. They use it because BERT has phenomenal accuracy. You fine-tune it with a little bit of data. And everyone always hits the same problem: it is very difficult to productionize. +And even at a place like Google, they didn't productionize BERT. They had to distill BERT and productionize that. And distillation requires a lot of expertise. It's out of the reach, I think, of most companies. +So as good as the results look in a staging environment, it's not really practical to productionize that. +And that comes back to the original point, that we tried to make the right choices. If we were deploying BERT, either it would be enormously expensive for us, because we'd have to be using GPU instances or TPU instances, or we would have very high latencies. +So we have a model that produces similar performance, but it runs much faster. It's still transformer-based. +Coming to your second point, I think your main question, your original question, was actually what's the theoretical limit of performance that we can achieve. Are you asking from a latency perspective? Yeah, latency. So I'll say this. +When it comes to the vector database, you probably know this better than I do. If it's indexed and quantized correctly, even running on CPUs, you can get down to three, four milliseconds of latency. +It depends on so many trade-offs, like how much recall you will sacrifice, and other things like that. What are the dimensions of the vector? But we found that to be quite feasible for our system. We don't do 768 dimensions. +Our neural nets produce a little bit less, but still, it's comparable. It's not that far off.
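As a rough illustration of the CPU-side vector search latency discussed above, here is a minimal brute-force sketch. The corpus size, dimensionality, and the use of exact search (rather than a quantized ANN index like the ones being described) are all assumptions for demonstration:

```python
import time
import numpy as np

# Hypothetical corpus: 100k documents embedded into 512-d unit vectors
# (chosen only for illustration; real dimensionality will differ).
rng = np.random.default_rng(0)
corpus = rng.standard_normal((100_000, 512)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact nearest-neighbor search by inner product (= cosine here)."""
    scores = corpus @ query
    # argpartition finds the top-k in O(n) instead of a full sort
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]

query = rng.standard_normal(512).astype(np.float32)
query /= np.linalg.norm(query)

start = time.perf_counter()
ids = search(query)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"top-10 ids: {ids[:3]}..., search took {elapsed_ms:.1f} ms")
```

Even exact search stays in the low milliseconds at this scale on a modern CPU; the quantized, approximate indexes mentioned in the conversation trade a little recall to reach similar or better latency on far larger corpora.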
In terms of the neural network, I would say that transformers are required for proper language understanding. +One of the things I didn't mention about our system is that I think we were basically one of the first teams, back in 2017, to incorporate transformers into a production architecture. This was my colleague Noah Constant. And actually, one of our colleagues who had previously been on our team was on the original transformer paper; he was in Google Brain at that time doing that research. We wanted to productionize a transformer model, and Noah basically spent a couple of months, took that research-level code, and got it to production quality. +Talk to Books is actually powered by a very early transformer-based model. We saw an enormous performance jump in our metrics, doing nothing other than switching to transformers. I've never seen such a big jump in any... Our metrics, we were looking at F1. Our F1 jumped from 30% to 38%, +just by switching to transformers. Not changing the training data or the evaluation objective, just making this one change in the architecture of the neural network. So I would consider that an absolute requirement. +I would also say that I'm not very familiar with the economics of GPU scaling, because it's generally kind of expensive. Our neural networks are actually designed to run reasonably well on CPUs. There are also these chips: obviously Google's got the TPU, but Amazon has Inferentia. +We're still kind of experimenting with what we can do with latency there. +I think that you can count on about 20 to 30 milliseconds of latency at the low end coming from the encoding process, unless you start moving to GPUs or something, and then you might be able to do maybe 5 to 10 milliseconds. +If you put that all together, it seems to me that realistically shooting for 30 to 40 milliseconds would be pretty aggressive in terms of what you can get at the lower bound. And maybe for many companies out there, this will be okay.
+As long as they don't run a web-scale type of deployment; maybe they can scale per region or per zone, or whatever makes sense to them. It sounds like 30 to 40 milliseconds could be quite an okay speed. We're talking about latency there. +I think that's a perfectly acceptable speed, even for web search or something. That's literally the blink of an eye, 40 milliseconds. I think the other thing to note is that these solutions are very horizontally scalable. +In terms of serving any given throughput, you just scale the neural network encoder pools, and you can replicate the vector database; if you're using Faiss, for instance, you start up replicas. You can basically get almost unlimited throughput. +It just depends on how much money you have to throw at the problem. So if you need 500 QPS, bring up more hardware. If you need 5,000 QPS, you can bring up more hardware and do it. Yeah, absolutely. +I also wanted to tap into what you said, that distilling BERT would be beyond reach for many companies. Can you open that up a little bit, and also can you share with our audience what you mean by distilling? Maybe some of our subscribers don't know that. +So, in a nutshell, and also, why do you think that it's so hard to do? +Okay, well, what distillation of a neural network refers to is taking a very large neural network, a neural network with a lot of parameters, let's call it billions of parameters, which is very accurate but cannot reasonably be run on a production workload, +and training a much smaller model that captures as much of the performance of the original model as possible, while fitting inside the engineering parameters of your production system. So, able to, for instance, run an inference within 50 milliseconds. +So the way that distillation normally happens is you use the parent model, it's called the teacher model, and you do a large-scale labeling of data.
And essentially the student model, the small model that you're training, needs to learn to make the same predictions. +And interestingly, it gets as much bang for the buck in terms of training from learning to make the correct predictions as it does from learning to, you know, assign probabilities to the incorrect predictions. +So the reason I'm saying that distillation is difficult is, there are approaches to it, but it's still a fairly open research topic. There's a lot of active research. +I haven't looked in the last couple of years; it's possible that there are frameworks out there now that make this much easier. +But certainly while I was at Google, in the 2018, '19, '20 time frame, distillation was generally a topic that was tackled by entire teams working over a quarter or two, at least for the most serious production systems. That's how it used to go. Yeah. +And definitely, when it comes to collecting data, as you rightly noted, you know, it's not something you can easily scale unless you have some clever technique for data augmentation. +And even then, for text, as I was alluding to in previous podcasts, you know, if you have "London is the capital of Great Britain", you cannot put just any random city there in that specific sentence, right? Right. Right. Right. Yeah, you need to have certain control. +But there are still ways to, for example, use retrieval itself to augment your data set, right? For example, if you need more entities, you can find them through retrieval, maybe even through vector search, by the way. I don't know if somebody has experimented with that already. +But there are other techniques, like producing these negative examples, as you alluded to, right? So you need to have as many negatives as positives, so that your model is balanced, right? And that goes to the general model training topic, which is, yeah, to your point. Yes.
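The teacher-student setup described above, where the student learns as much from the probabilities the teacher assigns to incorrect answers as from the correct ones, is commonly implemented as cross-entropy against the teacher's temperature-softened distribution. A minimal numpy sketch (the temperature value and the toy logits are assumptions, not anything from the conversation):

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution: matching the teacher's probabilities on the
    *incorrect* classes is rewarded too, not just the argmax."""
    teacher_p = softmax(teacher_logits, temperature)
    student_logp = np.log(softmax(student_logits, temperature))
    return float(-(teacher_p * student_logp).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.5]])        # confident but not one-hot
good_student = np.array([[3.9, 1.1, 0.4]])   # mimics the whole distribution
bad_student = np.array([[0.5, 4.0, 1.0]])    # disagrees with the teacher
print(distillation_loss(good_student, teacher))
print(distillation_loss(bad_student, teacher))  # noticeably higher loss
```

In practice this term is usually mixed with a standard hard-label loss, and the hard part, as noted in the conversation, is the large-scale teacher labeling and tuning around it rather than the loss itself.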
+And I think that's one of the keys to producing a neural retriever that can outperform BM25 in every workload. So that's an excellent point. Yeah. And that just reminded me of one challenge that we'd been solving in my team earlier, building a job search engine system. +When you evaluate the performance, let's say precision, we called it the mis-trigger rate: how frequently it triggers on a query it shouldn't actually have triggered on. The basic challenge there is, okay, I have these job queries, which I can mine from certain sources. +But then, as negative examples, you can pick everything else, right? But that "everything else" doesn't actually count, because, just to give you an example, let's say I search "find full-time job in London", right? So that's just a typical query. +You are really interested in finding that slightly negative example, which says, let's say, working hours of some office, right? Which is not about job search anymore. It's about points-of-interest search, maybe. +And so you really want to have those examples to see, okay, is your model able to differentiate between them? +And I guess the CheckList paper is another example, where they go beyond, you know, accuracy, in a way, saying, okay, you can actually fulfill these criteria and check your model on various aspects. +Right, right, right. +And is that something that you, like, how did you go about addressing that in your research? +I mean, you know, what we did, actually, if you look, it was one of the early, early papers. You know, the reason I like reading papers is because you can bring some ideas from one paper to some other domain. +And so the paper was about sentiment analysis, where one of the challenges back then, when it was dictionary-based systems, was, you know, how do I expand my positive dictionary? How do I expand my negative dictionary?
+And what they propose there is that you can use a retrieval system where you take an instance from the positive dictionary, let's say it's "good", okay? +And then you search with a pattern where you say "good and" and then a blank, and you just let your search engine tell you what "good" is co-occurring with in the sentences or text, right? +And the same for the bad one. Then they run some clustering on it, so that you can actually pick more representative items for your data set. +And in principle, you could apply a similar technique with the job queries, right? We didn't go that far, but we actually did try to use our own search engine to essentially, you know, augment. +One other potential technique that might help, short of introducing hard negatives (it's easier than introducing hard negatives), is to add what they call a margin loss, which is to essentially just say that the separation in the score that the neural network assigns the positive example versus the negative examples has to be large. +So you assign some lambda, and essentially you handicap the scores of the positive examples by that lambda, and it forces the neural network to introduce more separation. And so sometimes that can be helpful even if you haven't generated hard negatives. Yeah, yeah, absolutely. +Maybe we can also cite some papers in this podcast, you know; especially since you mentioned some papers, I will try to find this sentiment analysis paper, although I think it's probably five, six years old, or maybe even older. +But I mean, these ideas still live on, I think, and we shouldn't forget about them. Right. +And if we go back to your product, you said that you also look at using some of the existing algorithms in vector search. Can you name them? Or is this some kind of secret? Or are you customizing them as well? So, for the vector search piece specifically.
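Backing up briefly to the margin loss idea above, handicapping the positive's score by a lambda so the network is forced to widen the separation, it amounts to a hinge-style penalty. A minimal numpy sketch (the lambda value and the toy scores are invented for illustration):

```python
import numpy as np

def margin_loss(pos_score: float, neg_scores: np.ndarray, lam: float = 0.2) -> float:
    """Hinge loss: zero only when the positive example's score exceeds
    every negative's score by at least the margin `lam`; any shortfall
    is penalized, pushing the model to increase the separation."""
    return float(np.maximum(0.0, lam - (pos_score - neg_scores)).sum())

negs = np.array([0.55, 0.40, 0.10])
print(margin_loss(0.9, negs))  # well separated from all negatives
print(margin_loss(0.6, negs))  # 0.6 - 0.55 < 0.2, so this pair is penalized
```

Note that even "easy" negatives contribute gradient here whenever they land inside the margin, which is why this can help somewhat even without mined hard negatives.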
+Yeah, so I think we can say that at our core, we do take advantage of Faiss, or "fayss"; I'm not exactly sure how to pronounce it. Nobody knows; I think everyone says it their own way. +In my opinion, it's just an excellently designed system, with a team that's actively maintaining it, and they are obviously experts in that field. One of the features that customers have requested from us is the ability to mix in predicates and traditional Boolean logic. +So you might have this corpus of documents, and every document has this metadata, which is the date it was published. And then you might want to say, okay, give me the most relevant matches for the query, but only from documents published in 2021. +So this is a very crisp selection criterion that selects a subset of the corpus. This is actually something that we have not launched yet, but we've been actively working on it, and we will probably launch it in Q1. I believe some others have recently added this support. +Google Vertex Matching Engine, I think, is a recent offering; they also claim to have this support. It's important; many of our customers have asked for the same thing. So we've started from Faiss, but we have been customizing it. Yeah, yeah, sounds good. +So basically, some other companies call this symbolic filtering, and I think that's what you refer to, right? So I can have certain categorical variables, so to say, in my data, and I can filter by them, right? Exactly, right. +Yes, I think vanilla Faiss doesn't have this functionality, as far as I know. And so essentially you have to extend it. +And do you plan to keep that to yourself, which is perfectly fine? Or are you also able to contribute it back to the Faiss open source project? So, what I've noticed about the authors of Faiss is that they want to keep the product very focused on being a first-class vector engine.
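The feature being discussed, mixing a crisp Boolean predicate with nearest-neighbor ranking, can be sketched as pre-filtering the candidate set by metadata before scoring. This is a brute-force illustration with invented documents, not ZIR AI's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
# Toy corpus: each document carries a metadata field plus an embedding.
docs = [
    {"id": i, "year": 2019 + (i % 3), "vec": rng.standard_normal(dim)}
    for i in range(9)
]
for d in docs:
    d["vec"] /= np.linalg.norm(d["vec"])  # unit-normalize for cosine scoring

def filtered_search(query: np.ndarray, year: int, k: int = 2):
    """Apply the symbolic predicate first, then rank the surviving
    candidates by cosine similarity to the query vector."""
    candidates = [d for d in docs if d["year"] == year]
    candidates.sort(key=lambda d: -float(d["vec"] @ query))
    return [d["id"] for d in candidates[:k]]

q = rng.standard_normal(dim)
q /= np.linalg.norm(q)
print(filtered_search(q, year=2021))  # only documents published in 2021
```

Real systems have to decide between this kind of pre-filtering (filter, then search) and post-filtering (search, then drop non-matching hits), which is exactly why native predicate support inside the vector index is such a frequently requested feature.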
+And these are essentially augmentations that they're not interested in pulling in. I think they would see it as scope creep, which is probably fair. That said, would we contribute it as open source? We could still contribute it back as open source. +In fact, down the line, we could potentially make our entire stack open source. +I think some of the abuses of that, say, with regards to Elastic and how that's worked out, where you have these very large companies that essentially contribute very little but take advantage of their ability to launch platforms as a service, like Amazon can; that's kind of scared us. +So I think in the short term we're not doing that, but that's certainly something we could plan on doing in the longer term. Yeah. Yeah. +And I mean, of course, the dynamics of open source are kind of not necessarily settled, especially as you've brought up this example with Elastic, right? And the kind of battle between Elastic and Amazon. But for some companies, it still works. +As a starter, you know, you can enter this community. You start building the community around you. And they bring back ideas. They feed new use cases to you. +And maybe they even implement some features, right? Is this something that you've been thinking about as well, along these lines? Well, I definitely see your point. I definitely see your point. And, you know, at the same time, we also do have some competition in the space. +We're still in the early days, but 2021 in particular saw the launch of several competitors. And even Microsoft is in the mix now, with Microsoft Semantic Search. I think it's still in beta. Amazon launched Kendra in 2020. +I think that they probably get the credit for launching the first platform-as-a-service neural information retrieval system. +So in both of those cases, both of those systems, by the way, I think they actually are fundamentally based on a BM25 search, followed by re-ranking with a neural network.
This is what I've gathered from their own product marketing material, and it is still a neural search. +It just has a different set of pros and cons versus straight retrieval from a vector database. So for instance, just to give you one quick example: multilingual search. BM25 is not going to work for multilingual search. +You have queries coming in different languages, documents in different languages. BM25 won't work there, nor will a re-rank-on-BM25-results approach work there, because BM25 has to bring something back for the re-ranker to re-rank. +Well, in the case of our system, you can check out some of the demos: we can actually embed across languages into a shared embedding space, so you can search across languages. That's something which you need a vector database for. Yeah, exactly. +So you go multilingual on the first stage of retrieving the candidates. And I think this multilingual search in general has so much potential. I don't know if Google is using it already to some extent, but even at a smaller scale: instead of configuring, let's say, Solr... +We keep mentioning Elasticsearch a lot; they didn't pay for this podcast. But I'm just saying, take Apache Solr, right? You would have to go a long, long, long way to achieve it. But now Lucene released HNSW in version 9.0. +And so in principle, you could embed your documents using a multilingual model and retrieve them in the same way. So do you see huge potential in the market for multilinguality? +There have been some studies that showed that when eBay introduced automatic translation tools, there was a significant increase, a few percentage points of increase, in commerce on their platform, which translated to hundreds and hundreds of millions of dollars.
+So, you know, the advancements that have been made in machine translation, and now in things like cross-lingual retrieval, will serve to further break down barriers to commerce, at least, in a way that's commercially very valuable. +But speaking more broadly, I think what I will be very interested to see is how vector databases evolve and merge into traditional database technology, or into systems like Lucene, like information retrieval systems. +Because at the moment, you know, you have FAISS; it's kind of a separate, discrete entity. +But longer term, just conceptually: in a way, very low-dimensional vector database technology has already made its way into MySQL and Postgres with the spatial extensions that they've supported for many years. The quadtree algorithm for doing, you know, sublinear lookups on a map. +Those spatial extensions have been around for a while. +You can easily imagine that in the future, once people start to understand how useful vector embeddings can be, and that's established, you'll have, you know, columns of vector type in a relational database and be able to simply build an index and perform fast nearest-neighbor searches straight from Postgres. +So I think that's an exciting future to contemplate, and I see that eventually it will go there. +That sounds really interesting. Do you think that vector search in general is hype right now, the way big data was a few years ago? +No, no, it's not hype, because again, I saw neural information retrieval techniques, backed by vector databases, making a big difference in many products at Google. +So I think where it is right now is that there are a few big companies, the FAANG-type companies in Silicon Valley, that have the expertise to take advantage of it. It has not been commoditized yet. +So it's definitely not hype, but it's got a few years to go before it enters the mainstream consciousness.
+Yeah, for sure. But to your point, maybe at some point vector search will become, let's say, part of Postgres or MySQL or whatever other, so to say, traditional database, where traditional means it's widely used. +And then Lucene already introduced it too, right? So Lucene now has HNSW. +You can argue the point, okay: +maybe the Lucene index layout might not be optimally designed for, you know, nearest-neighbor retrieval, because if you look at FAISS methods or HNSW, it's some graph method, or it's a way to partition your space; in Lucene, you partition it by segments. +And that's kind of a given, right? Because it's designed for an inverted index. +But again, somewhere on Twitter I saw a thread from one Lucene committer who said, maybe this will by itself open up some new opportunities, because you'll have a separate vector space index per segment, right? And maybe you can design some features around that. +So it sounds like you still see the potential for merging these technologies in the future and bringing additional benefit. Well, yes; I can't really speak for Lucene, I haven't taken time to study that implementation. +How it was done, I think you know more about than me, but I was saying that eventually relational databases could, and might, you know, implement vector indexes directly. I'm not sure that I can see any technical reason why that wouldn't be possible, basically. +And it could potentially be very, very useful as neural networks go more and more mainstream for embedding. Yeah, I mean, it sounds like one logical step forward. +Maybe it will not be as scalable as a pure vector database, but on a small amount of data... Like when MySQL or Oracle or other databases introduced full-text search, right? Initially it wasn't there, right? Right.
+What restricts you from, you know, introducing another field with an embedding and actually running your vector retrieval there? Right. Yeah. Yeah, and I think it also comes down to this: okay, FAISS is always going to give you, you know, the maximum performance. +So, you know, there's going to be some subset of engineering teams that need that performance, and that's where they're going to go. But what about the mass market, you know, the Fortune 500 companies and such? They're dealing with problems at a scale where it's not necessary to go there. +And if it's just in the database, even if it's only giving me 80% of the total performance, that's good enough. +And in a way, that pragmatic trade-off is what's underlying ZIR AI's existence, because people often ask: couldn't I get better performance on my data set if I fine-tuned a BERT model and then distilled the BERT model? And it's like, yes, that's true. +We're aiming to give you a neural network and a full experience that will give you like 80% of the performance that you might be able to achieve, which is still better than what you get just from a keyword search. +But the reality is, you know, how many companies have the budget to have NLP engineers and data scientists squeeze out that extra performance? It's just not important in a lot of cases. Yeah, exactly. +And do you think that, you know, there is still a need to find a way to combine BM25, or whatever you have there, like the idea of sparse search, with the results from the nearest-neighbor search? Like, have you been thinking about it? +Have you seen your clients thinking about it or asking about it? There's a very interesting paper from Google, from about two years ago, by Dave Dobson and, I'm forgetting, other individuals. +It was specifically on this topic. You can obviously model a BM25 search as, you know, multiplication of sparse matrices.
And so you can imagine your vectors essentially having a dense part, produced by a neural network for instance, and then a very sparse tail or something. +And you actually want to perform dot products on these. How do you do it efficiently? The paper was going into some fascinating techniques for how to do that well. +So your question was, do you see these merging? And I think that, you know, I actually brought this up with the folks at FAISS: is this something on your roadmap, is this something you're interested in? They said no, we're not interested in this. +They're specifically focused on either sparse or dense, but not hybrid. But I think it's going to come down to this: if the utility of this sparse hybrid can be shown, then the technology is going to follow and try to create efficient implementations of it. +I think there are certainly classes of queries for which BM25 can't be beat, and for which exact keyword matching is going to be the correct way to do it in the future. So then you can take a few different strategies. +You can either try to classify the query when it's received and then dispatch it to the correct backend, or you can dispatch it to both a sparse and a dense index and then merge with a re-ranker. +Or you can do a truly hybrid system where you're simultaneously doing the multiplication on the sparse and the dense pieces and producing a final list in one shot, not relying on a re-ranker. So it's still an open area of research. Yeah, exactly. +And two things. I'm looking at it from the point of view of a customer. Let's say I already have a BM25-based platform, right? So I'm curious to see what that research can bring me. And maybe I'm thinking about introducing this as an explorative search feature, +because I'm not sure if it's going to fly for my documents or for my items in the database.
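The dense-part-plus-sparse-tail idea can be sketched very simply (a toy stand-in under assumed weights, not the Google paper's actual algorithm): a document vector is a dense block concatenated with a sparse tail, and the relevance score is one dot product that decomposes into a dense term and a sparse term.

```python
import numpy as np

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse):
    """Dot product of [dense ; sparse] vectors, computed piecewise.

    The sparse tails are dicts mapping term -> weight, so the sparse
    part only touches terms present in both query and document."""
    dense = float(np.dot(q_dense, d_dense))
    sparse = sum(w * d_sparse.get(t, 0.0) for t, w in q_sparse.items())
    return dense + sparse

q_dense = np.array([0.2, 0.5])
d_dense = np.array([0.4, 0.1])
q_sparse = {"vector": 1.2, "search": 0.8}      # e.g. BM25-style term weights
d_sparse = {"search": 0.5, "database": 0.9}

print(hybrid_score(q_dense, q_sparse, d_dense, d_sparse))  # ≈ 0.53
```

The efficiency question the paper tackles is how to run this at scale without materializing the sparse tail as a dense array.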
So that's one potential to think about. Okay, as you said, I can actually route the query to both sparse and dense retrieval and then maybe combine them, even with some linear formula. +And I can give a lower weight to the dense part and a higher one to the sparse part, because I still believe in the sparse part, and that's how my users are expecting the results to be. +But then maybe I can surface some magic, like Q&A, right? So they ask a question and I can give them the answer. And that might be really interesting. And the second point: there was a paper called BEIR, B-E-I-R. I will make sure that all of the papers are linked here in the show notes. +That paper actually compared dense retrieval versus BM25 on a number of tasks, right? So you can have search, you can have question answering, and the list goes on. And what they showed is that BM25 is fairly competitive. +It is actually above dense retrieval methods on zero-shot retrieval, right? So you didn't fine-tune the model; you just took it off the shelf: here is the task, let's compare, right? BM25 is very stable, so just a few models actually outperformed it. +And so in that sense, it sounds like BM25 is here to stay. What do you think? I agree with you. And again, this is where our scope as a company is: building an end-to-end information retrieval pipeline, which means that, okay, today we have neural dense retrieval. +Because BM25 has been done, right? It's in Lucene. It's well understood how to implement it. Although there are some tricks to actually make BM25 work even better than off-the-shelf implementations. +But what we want to eventually get to is that we could potentially build both the BM25 and dense indexes for our customers. In the end, we're just trying to serve the best results possible. So for instance, sometimes even very simple heuristics work.
+Single-word queries: often BM25 is how you want to serve them, not from a dense index. So if it's a single-word query, okay, you're going to do a BM25 search; if it's anything longer than one word, run neural search. That's not a very principled approach. +I'm just pointing out, you know, what goes on behind the scenes; that's the intelligence for the platform to provide. And we're not really restricted or married to a vector database, or only a vector database, powering the search of this platform. Yeah, yeah, that makes sense. +So does that manifest in some way in your product, that as a user I can have flexibility in how my search is processed, whether it's going to go the sparse route or the dense retrieval route? No, we don't. +At the moment, we are only doing dense retrieval, because we feel like that's the interesting part. We can add the BM25 part without a lot of difficulty, in six months from now or something like that. +But we do provide a few different flavors of the dense retrieval, because there are a few. There's question answering, or query answering: the user puts a query in and then you're trying to find good responses. +There's also another task which is semantic similarity, which is closely related, but it's like: I make a statement and I just want to find similar statements. So my statement is not necessarily a question that I'm looking for an answer to; I just want to find semantically similar statements. +And then the other thing that often comes up is question-question similarity. +You've seen it in Google, for instance: when you type a query, it says people also ask these questions, and it gives those similar questions, right? So there are use cases for question-question similarity. And so we support all three of those modes of operation. +And we allow our customers to specify, at query time, which mode they're trying to run in.
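The single-word routing heuristic mentioned above amounts to a tiny dispatcher. A minimal sketch (function and backend names are illustrative, not ZIR AI's API):

```python
def route_query(query: str) -> str:
    """Crude heuristic from the conversation: single-word queries go to
    BM25, everything longer goes to dense (neural) retrieval."""
    return "bm25" if len(query.split()) == 1 else "dense"

def search(query: str, backends: dict) -> list:
    """Dispatch the query to whichever backend the router chose."""
    return backends[route_query(query)](query)

# Hypothetical stand-in backends, just for illustration.
backends = {
    "bm25": lambda q: [f"bm25 hit for {q!r}"],
    "dense": lambda q: [f"dense hit for {q!r}"],
}

print(search("vaccine", backends))               # routed to BM25
print(search("how do vaccines work", backends))  # routed to dense retrieval
```

A more principled router would be a trained query classifier, but the platform-side contract, query in, backend chosen behind the scenes, stays the same.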
Yeah, yeah, that makes sense. That makes a lot of sense. +And of course, one thing that I keep thinking about: let's say you introduce the sparse search, say BM25, and some customer comes in and it's not the English language, it's something else, right? Then you need to bring in also the tokenization and other things, maybe from Lucene. +And of course, Lucene is a library; in principle, it could be wrapped in a Docker image and do that job, right? But then the question is, can you easily marry it all so that it is production-grade across different platforms and languages? And it's surprising how far +Lucene, along with Solr, has come in terms of providing good, sane defaults out of the box in terms of stopword lists and stemming. But my daughter's school started using a product that manages communication between the school and the parents. +And that thing was clearly using, you know, Lucene or Solr or Elasticsearch, and they didn't have the stemming configured properly. I didn't know it was even possible to misconfigure that. I was searching for "vaccine" and it couldn't find anything, because it was "vaccination" in the title over there. +So yeah, neural search is a little bit more bulletproof in that sense; it's a bit more immune to these kinds of mistakes, and it handles misspellings very easily. Yeah. Especially... I think there is also a paper, I think it was from Google, about training on the byte level. +And then you're not constrained by, okay, the complexity of the language, because you have byte-level representations. So in principle, your model should be robust to typos and misspellings and so on. And some of them come from speech, right? Exactly, exactly. Yeah. +And the example you brought up with your daughter's school system sounds interesting; it sounds like search is still largely broken.
+The moment you go to some system which is, let's say, for public use, right, it's not necessarily designed for findability. Search just exists there. +And you know, Daniel Tunkelang, I think, says the funny part of the search industry in general is that when the search engine works, nobody will go and praise you; they just use it. When it doesn't work, they will blame you. So you only ever hear about it when it fails. +How do you feel about that? Is this also the potential for your company, to go and fix many of these broken use cases? +Well, that's certainly our vision: that we will make it very easy for SaaS companies to provide a much more Google-like search experience in their products. +When it comes to the web, let's say it splits into two categories: SaaS companies and website owners. When it comes to website owners, I think search for websites is rarely used, and it becomes a cyclical thing: it's rarely used, so companies therefore don't invest any money in improving it, +and it's rarely used because it's not good. And basically Google does a good enough job of indexing websites, so site owners have accepted that Google is going to be the front door into their website. +On the other hand, I think it is obviously dangerous for them too, because you've had sites that essentially get obliterated when Google changes their quality guidelines: they drop off the front page, the traffic suddenly goes down by 95%, and there's no way to recover from it. +So it would be good for us to be able to provide a good search experience on their websites, but I think they don't do it because of the cost involved and because they don't know how. +And certainly Algolia and Elastic are making that easier, particularly Algolia, but there's still a lot better it could be made. Coming to SaaS companies, there we're talking about data that's private.
+The communications from the school to the parents are not on the web somewhere where they can be indexed by Google. So what I've noticed in the last few years is that some sort of search feature is present in most of these products now. +But yes, it's usually not tuned, maybe not even set up correctly, and it doesn't work well. And there's a lot of room for improvement. So I think these neural search technologies let you, you know, really easily improve the quality. +Easily, if you've got a set of simple APIs, and that's what we provide: our APIs basically look like Elastic's or Algolia's. You index documents, and you'd never know there's a neural network running in the background at all. And it's not important. +The queries just go in and the results come out, but these results are far, far better than what you would get from a keyword search. So I think there's a lot of scope, particularly for SaaS companies, for neural search. Yeah, yeah, absolutely. +I actually wanted to ask you a question that just came to my mind. I've been reading the book Relevant Search, I think, by Doug Turnbull and another author. +I might not be remembering it exactly, but this book, you know, goes chapter after chapter, where it says, okay, let's take it from first principles. You have a search task, you have documents, you need to start with tokenization. +And by the way, if you make a mistake there, documents will not be findable. And then you move one level up and start thinking, okay, what about the model: TF-IDF, BM25, what are the trade-offs and so on. So they teach you to become a search engineer, and then they proceed to ranking and so on, +and so forth. And my question is: what do you think is going to be the change in the search engineering profession going forward, once neural search hits the mass market? Because when I was a search engineer, I looked at Lucene and Solr and I didn't question much.
+I just went and implemented some changes, some parsers, some plugins, or modified the behavior of some algorithm, right, by extending a class. By the way, Lucene was making a lot of classes final in Java, so I could not actually extend them. +So I had to copy the entire package and then rename all these classes so there's no namespace clash, but that's okay, no worries. At some point I was worried that I would probably recreate Lucene entirely inside my IDE, because I had to touch multiple parts. +But I felt like I was in control, more or less, right? Not only because it's open source, but because I could read the code, I could talk to people, I could read books, I could read blogs, and I could experiment myself, right? +And that made me, I believe, a search engineer in that company, even though the company's goal was not to build search as a service; we were building the product. +Do you happen to have thoughts on how neural search will change the landscape of this job? Well, that's an excellent question. A few thoughts on that topic. Neural search is going to make it easier; it's going to require less expertise to put together high-quality search experiences. +And furthermore, the advantage that companies like Google or Microsoft have from click data is still going to be there, but it's going to diminish. And I think that's actually why, unless I'm biased here and misreading it, you see a lot of search engine companies starting up in the last year or two. +You've got Neeva, Kagi; I think the head of Salesforce research has started his own engine. I've even heard... you don't have to be Apple, you don't have to be ultra-rich or something. Yeah, exactly. +So maybe, some rumors say, Apple might be trying to do something like that. And it's basically because the amount of effort it takes now, I think, has gone down significantly.
So I think that's going to be one of the effects of neural search. +And I also expect that, just like, you know, Lucene has been around for a long time, since maybe the early 2000s; 2000, or 1999, I think, is when Doug Cutting started learning Java, and as a side project he decided to implement Lucene. +And so he started the whole community, and then Hadoop followed and so on and so forth. Yeah. Okay, because I remember that from a while ago. So I think that in the same way, there will be an open source neural search offering. It might come under the cover of Lucene or it might be a separate Apache project. +And eventually, it's going to be the go-to solution. So what companies like mine are doing right now is, you know... this technology is still pretty new, and we're filling in the gap. And we're also providing a completely hosted solution, which has some value on its own. +But I think longer term, that's where I see things headed, because, you know, we're getting to these very good general-performance neural networks, systems like BERT that can just perform well on a wide range of tasks. +And then you have, you know, T5 and now mT5, and you can go across like 100 different languages as well. So there will eventually be models that are good enough, and someone's going to take the effort to distill them into something that runs well. +And, you know, anybody in any organization will be able to download and use it the way they use Lucene today. I think that's where things will be, but it might be five years before we reach that point. Yeah, yeah. +And to take this thought forward from here: do you think the profession will change in such a way that instead of tweaking the index configuration to make your search work better, like increasing recall without suffering decreased precision, you will move more into: okay, here is the problem, +and this off-the-shelf network doesn't work.
I have to fine-tune it. So you become a little bit more like a researcher. Yeah, so that's an excellent point. +I think one of the key components in these systems, one that we have not built yet in our system, but it's in the blueprints, is some kind of a feedback mechanism. +You'll notice this in Kendra, for instance: thumbs up, thumbs down on the results, where you indicate what's good and what's bad. And then even with a small amount of that data, you can start to train a re-ranker. +And I think that with the volumes of data that you get on an internal application, let's say you're going to get a few thousand items of feedback, training a re-ranker is probably the most effective thing that you can do with that data. +Whether it's a random forest re-ranker, or you take a cross-attention neural network and fine-tune it, you can significantly improve the search quality that way. +So I think the machinery for doing all of that can also be part of the open source offering, because it's very broadly applicable and can be used by basically anyone. +Because, like you're saying, this is the problem that then comes up: I want to give feedback on this result so the system can improve itself. Yeah, yeah, absolutely. +So you kind of create the flywheel of success, right? You bring the data back, the model retrains, and so on and so forth. But there are also interesting challenges in neural networks, like catastrophic forgetting. +Is this something that you've been thinking about, maybe back at Google or now with your clients; something that you need to keep innovating on or solve some other way? Yeah, so I am familiar with the concept of catastrophic forgetting. +I honestly haven't studied it very much in the context of these large language models like BERT. Although, in general, the approach of taking a BERT-type model and fine-tuning it seems to be working well.
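The thumbs-up/thumbs-down feedback loop described above can be sketched as a tiny re-ranker (toy logic under assumed data, not Kendra's or ZIR AI's implementation): record judgments, then blend the retriever's score with a naive per-(query, document) vote signal.

```python
from collections import defaultdict

class FeedbackReranker:
    """Re-rank results by blending the retriever's score with net feedback
    votes (+1 per thumbs up, -1 per thumbs down). A real system would train
    a model on this data instead, e.g. a random forest or a cross-attention
    re-ranker, but the data flow is the same."""

    def __init__(self, weight=0.5):
        self.weight = weight
        self.votes = defaultdict(int)   # (query, doc_id) -> net votes

    def record(self, query, doc_id, thumbs_up):
        self.votes[(query, doc_id)] += 1 if thumbs_up else -1

    def rerank(self, query, results):
        """results: list of (doc_id, retriever_score), best first."""
        return sorted(
            results,
            key=lambda r: r[1] + self.weight * self.votes[(query, r[0])],
            reverse=True,
        )

rr = FeedbackReranker()
rr.record("vaccine policy", "doc_b", thumbs_up=True)
rr.record("vaccine policy", "doc_a", thumbs_up=False)
print(rr.rerank("vaccine policy", [("doc_a", 1.0), ("doc_b", 0.9)]))
```

Even a few thousand such judgments give a trained re-ranker a useful signal; the dict-of-votes version here is just the smallest possible stand-in.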
+But then you're essentially talking about taking it after it's been fine-tuned on one task, fine-tuning it for a different task, and it losing its abilities on the first task. And yeah, I guess I don't know how much of an issue that's going to be in the context of information retrieval. +Yeah. I mean, another thing: if you are familiar with learning to rank, for example, which may or may not involve a neural network; it may also be based on decision trees, like LambdaMART, for example. +You know, when you receive a new batch of clicks or downloads or whatever events you have in the system, and you retrain that model, it will also forget what it knew about the previous state, right? +It's very natural, and we can probably associate it with human life as well in some sense, although they say the older you get, the earlier the memories you actually remember: you might forget what happened yesterday, but you remember what happened 50 years ago. +Yeah, I'm probably noticing that one myself. +Yeah, me too, actually, because days go by and I'm like, okay, what's going on? But then you go, okay, when I was a kid, I remember something. But neural networks are probably a little bit different, or at least the present neural networks. +And so I think when you retrain the model... like, you have to retrain, otherwise it will drift, right? I think Google also has a paper about that: checking the consistency of your machine learning pipeline and your model, +so it doesn't drift and just blow up in front of the user's eyes, right? So you have to keep retraining it. But then that also means it will forget things. Maybe they were quite important; maybe they are not high-probability anymore, but they are still true. +But the network has forgotten about them. Right, right, right. Yeah, yeah, that makes sense. Yeah. Anyway, it was great talking to you, but I still want to close off.
+And before we go to some announcements from you: I ask this question of all guests, and different guests take it differently, but I really would like to hear your view on this question of WHY. It's a little bit more philosophical. +Like, in a way, you had a stable job at Google; a lot of challenge, a lot of data, a lot of user impact. As you said, the Auto-Reply feature was enabled for millions and millions of users. +And then you decided to go build your startup, and that's a nice way to experience the other side of things. +But why specifically neural search? What drives you to work on it? Well, I was initially attracted very much to the idea of automated reasoning. And of course, in its current incarnation, that's machine learning. And so I started to learn about that. +And I had this opportunity to work with Ray Kurzweil. He joined Google, I think, around 2012. I knew about him; he's a very inspirational figure. And he was specifically working on language understanding, because he saw that as being very critical to advancement in artificial intelligence. +So, you know, beyond that, I would say those are my broad interests. But then I just worked in this area specifically for eight years, and I think I became quite good at what I was doing. +And I also saw that what I was doing, post-2017 in particular, with this neural-network-based retrieval, had a lot of applicability to products. And you know, I think that being in a research team... a research team has a different type of focus. +There's a lot of focus on publishing papers and things, but not necessarily a lot of interest or appetite for building platforms. So in that way, maybe it wasn't really the right place to attempt that kind of work. But I'm an engineer as well, so to me this is very interesting.
+And I'm not sure if I'm answering your question, but that's some of my motivation. No, you do. I mean, essentially, I'm currently leading a search team. And yeah, you know, our KPIs are like: okay, how many papers did you publish, how many patents can you file? +But then you start thinking, okay, what impact am I making, right? There is not that much room to think about creating things. Maybe you can create a vision, but you might not necessarily tie it back to the day-to-day scenarios of users. +You have to be part of engineering, probably, to start delivering these things, at which point you are no longer a researcher. So it sounds like you managed to combine both of these, engineering and research, at ZIR AI. Yes, yes, it kind of brings both of the passions together in one company. +And if we're successful and we can take it into the future, the research end of the program is something that I'd really like to ramp up a lot. Since we started, honestly, there's been more engineering and less research. +Training the neural networks happened at the early stage of the company, and we haven't revisited it since then. But I think 2022 is going to be, first of all, a big year for this industry. +Beyond Pinecone getting funding, I was recently looking at Jina AI, if you're familiar with them. Yeah, they raised, I think, $30 million; it was in TechCrunch. So the industry is starting to get some notice. And for us as well, we expect to really expand in 2022. Oh yeah, fantastic. +And I mean, one manager that I worked with used to say that you need to first build the plumbing, and that's your engineering work. Once you have the plumbing, you can stand on it and actually fix some other, higher-level things. +And that's where you will probably come back to training neural networks and actually nailing that use case for your customers. Sounds really exciting.
This was really packed, with so much thinking that you brought in, and also some discoveries during this conversation. I really enjoyed it. +I'm just thinking, is there something you would like to announce from your product side, something that our listeners can try? Well, thank you. Thank you for the opportunity. +I think what I would say is that if what we've been talking about is interesting, and if someone would like to try it out, then we've created a special promo code. We're currently in a closed beta, so we're accepting customers, but on a case-by-case basis. +But we've created a promo code for listeners of this podcast. I'm going to share the exact code with you, and then you can post it in the comments to the video. +Essentially, it would give you a 50-megabyte account, which is larger than our standard trial account by about a factor of three, for free, for one month, if you want to just try out the system that we've been talking about. This is fantastic. Thank you, Amin, for this opportunity. +I'm sure some listeners will take this into use and build some cool vector search applications for their products. That would be great. Yeah, it was a pleasure to talk to you. I hope we can talk at some point down the road as well, and I wish you all the best +in the next year, with your ambitions and also with reaching out to clients and getting contracts. All the best to you on that front. It was my pleasure to talk to you, and hopefully we'll see each other next year as well. Thank you so much. It was good talking to you too. Thank you, Amin. Bye bye.
\ No newline at end of file diff --git a/transcripts/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md b/transcripts/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md new file mode 100644 index 0000000..144e122 --- /dev/null +++ b/transcripts/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md @@ -0,0 +1,384 @@ +--- +description: '

Topics:

00:00 Intro

02:20 Atita’s path into search engineering

09:00 + When it’s time to contribute to open source

12:08 Taking management role vs + software development

14:36 Knowing what you like (and coming up with a Solr + course)

19:16 Read the source code (and cook)

23:32 Open Bistro Innovations + Lab and moving to Germany

26:04 Affinity to Search world and working as a + Search Relevance Consultant

28:39 Bringing vector search to Chorus and Querqy

34:09 + What Atita learnt from Eric Pugh’s approach to improving Quepid

36:53 Making + vector search with Solr & Elasticsearch accessible through tooling and documentation

41:09 + Demystifying data embedding for clients (and for Java based search engines)

43:10 + Shifting away from generic to domain-specific in search+vector saga

46:06 + Hybrid search: where it will be useful to combine keyword with semantic search

50:53 + Choosing between new vector DBs and “old” keyword engines

58:35 Women of Search

1:14:03 + Important (and friendly) People of Open Source

1:22:38 Reinforcement learning + applied to our careers

1:26:57 The magical question of WHY

1:29:26 Announcements

See + show notes on YouTube: https://www.youtube.com/watch?v=BVM6TUSfn3E

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230516_070519_a42272298eaf6239be6e8050108fd5b9.jpg +pub_date: Wed, 17 May 2023 08:12:12 GMT +title: Atita Arora - Search Relevance Consultant - Revolutionizing E-commerce with + Vector Search +url: https://rss.com/podcasts/vector-podcast/953768 +--- + +Hello there, vector podcast. We are still in season 2. If you forgot, you can always go back and check season 1. We had some really awesome guests there. Today I have a big, big pleasure to talk to Atita Arora, who is a search relevance consultant with Open Source Connections. +And she has spent quite a bit of time in the search field and in NLP. I'm really curious to learn her journey and talk about some of the topics that we usually talk about on this podcast, like, you know, search, vector search, and also, you know, aspects of the profession. +Hey, Atita, how are you doing? Hi, pleasure to be here, to be interviewed. And I mean, before we start, I think a huge shout-out for you, that this is a great thing that you're doing. And I mean, I'm feeling pretty excited to be here today. Yeah, thanks, Atita. You're very kind. +And this is what also gives me energy, you know, when people like you say that this makes sense to continue doing. And I really enjoy it myself because I learn so much. I connect with all my guests on a different level in the podcast. And I hope that this is also informative for our listeners. +At least when I release all the podcasts, all the episodes so far, I kind of, like, you know, really remember, and I learned something new. It's kind of cool. Yeah, I was just wondering, like, we usually traditionally start with your background. And this is where people can learn more. +But I know that you've been blogging and you've been talking publicly at conferences. But still, it's very interesting to know, you know, how did you arrive at this profession? What was that journey? Yeah, I think that's actually an interesting question. 
+And I mean, usually when I'm speaking at conferences, I do have this about-me slide, which obviously I tend to just get through, because I feel like it's so repetitive now, having presented so many times. But I think I never really talk about how I started, where I started. +And I think it would be nice if it was documented somewhere to look at when I get older, and I can tell my kids or my, you know, grandkids that this is what I did. Thanks to you for that. By the way, I'm also one of the regular subscribers of the Vector Podcast. +Absolutely all your episodes, and obviously whenever you publish, I'm like probably the first one to check them out. So how I started is kind of interesting. I was a master's student and I was supposed to finish my master's, that is, a master's in computer application, in 2008. +However, our college, which is, I mean, I am from one of the top notch institutes, which has like a common aptitude test. It's like about 400K people every year take that test and about 100 people are selected. And I was one of them. So obviously it was already very prestigious. +And we had this culture that, you know, if the course is for like three years, that is a full time course, we would already get our placements in year two. +So the company that I got placement with was a very small company, and I think I have some sort of, you know, radar that I'm always attracted to small companies, because I feel like I get a lot of accountability, a lot of things to do. +Apart from the stated role in my job offer, which is what I kind of like as well. So it was interesting that I also reached out to them in 2007. +So I was supposed to complete in 2008, but I reached out to them in 2007 itself, that I have to complete my industrial project, which is supposed to be like a dissertation thing that you do in a PhD. So it was supposed to be a real life project. +And I reached out to them, can I join the company and, like, kind of do the training? 
And I mean, it was really nice of them to let me come. However, they didn't really have any kind of, like, training programs. +And they were experimenting with Solr and Lucene and Plone and Zope at that point in time. I'm not sure how many people would really know about Zope and Plone. They are like the Python based ones. I don't, I don't at least. Yeah, that's actually because it was really a thing back then. +So it is a content management system, which is based on Python. And at that point in time, you know, Java was really a thing, back when I started in 2007, eight. +So having worked on Python, I was like, why am I working on something which is like Python? I mean, I want to work on Java, you know, J2EE. That was really a thing, like, build cool applications. +But because I was a trainee and they could obviously, you know, kind of modulate, you know, my role, they asked me to research on, you know, Solr and Lucene, because they were coming up with a social networking website application. So Orkut was leading the space. +Facebook was coming in, and we were working on this Facebook application which could let two people talk without knowing each other's number. +And that was all through, like, what would pop up in my, you know, profile, like, who are the people who are close to me? And this was all supposed to be based on Lucene and Solr. +So this is all, I mean, it started off with that. I was literally pulling out all my hair at that point in time, because you can imagine how immature Solr was at that point in time. We didn't even know which version of Lucene would go along with which version of Solr. +So we were trying and testing, and there were so many things which were missing as well. But I think I got pretty much, you know, soaked up, even though at first I did not find all of that interesting, because my friends were doing Java, J2EE and .NET. 
+So I was like, I'm missing out on something. I would, you know, catch up with them, and they would talk about all, you know, cool applications that they're building and how database connections, etc. were working. +And I was talking about, yeah, I'm building this, you know, data, and we're trying to locate people on Google Maps, and yeah, Google is really a thing. So it was like I'm speaking a different language altogether, and people were like, what? So it was, it was interesting though. +So you felt like an underdog or something? Yeah, I felt like that. And on top of it, I think the bigger challenge was that we had this guy who was basically from the ontologies world. So semantic web was very underplayed at that point in time. +I think right now it's, like, coming up as if something really fancy, and all of these things that are really seen in, like, big light were not really known as they are now. So I was asked to find, you know, like, the application has this feature that I could place people in a circle. +Like, FOAF was really a thing, friend of a friend. So the major, I think, breakdown for me was, you know, dealing with the relationships. Every person is a document, and finding relationships between these documents was something that was given to me. +And I was like, why, why, God, why am I supposed to, you know, do this thing? So relationships and ontologies and visualizing this stuff. So we implemented this visual map using a cluster map API back then. And I mean, now when I look back, I feel like that was, like, very cool stuff that I did back then. +It sounds very cool actually. It is. Modeling a graph using Lucene. It's like not necessarily something people do, or at least I don't know about that. Actually, and we did not really have any cluster monitoring tools as well. So we built something by ourselves as well. +So using Graphviz, we built, you know, like, how each of the clusters are doing. 
So we had this thing that, obviously, clusters were not something that Solr supported back then, but we actually made our own cluster. +But one of the things that I would also like to mention here is that we did not really, I mean, at least I was, or my manager was, not really aware that all of this could be contributed to open source. +So we were, like, living in our own world, trying to, like, build something really cool only for the client, but not really for, I think, the public. And I think this is something that came way later in my life. +Yeah, I guess, I guess probably, like, before you contribute, at least how I felt myself when I also dabbled in Solr a bit, you know, in the beginning, and then it took 10 years of my life. I actually don't want to say off my life. It sounds so negative. +But like, you know, in the beginning, you still need, if you're like a startup or something, you still need to figure out whether this works or not, right? +Whether this solves some needs for your users, how much of this you want to still keep as a business secret, how much is okay to contribute, because you might see even more development in this, right? And get feedback. +Absolutely. Absolutely. + And, you know, having joined a company that was not really like, you know, the big companies back in India, I think maybe you would get an idea as to how cool or, you know, how small the company was, that in my induction program, like the first day when I joined, I was being asked, you know, grab a cup of coffee and watch this movie. +The movie was Pirates of Silicon Valley. So they said, you know, we don't want you to have any rocket science-y skills. We would just make you, you know, learn all of that stuff. Just get this mindset. And I think that's what I tried to do. I love that movie actually. +I think there are two versions of it, right? The original and some kind of remake, if I'm not mistaken. Right. That's correct. I think I watched the original one. 
The original one is amazing. It's like almost this kind of, you know, it's like a meditation. You go into that state of mind. Indeed. +Yeah. We have a lot to learn from that movie. Yeah. But I think just like everyone else, I was also, you know, in India, I think we do have a lot of pressure of academically, you know, like, building, grooming ourselves. +So when I started in 2008 and then, you know, got married in 2009, had my first kid in 2010, I think it was the time when I had to take a break. But when I did come back in 2011, things obviously had changed. +And someone, you know, told me that, you know, it would be a good idea to have more of, you know, like a hands-off kind of a role. And that made me think about, you know, going for an MBA. And I pursued an MBA in 2014. +And I decided that I would leave development, because it's too demanding and I cannot manage that with a child. And when I became a manager, I also took up a job as a manager. I did that for like two and a half months. + And I was like pretty bugged, because, I mean, obviously, you know, once a developer, always a developer. I think I always, I mean, I felt, like, a little bit, you know, more triggered, or more, you know, joy, in seeing how things really work, and not really by having, you know, said, you know, this is how, this is what we need to do with an application. +Like, this is the client requirement. This is the BRD, like a business requirement document, and then go implement it. So I think which is why, after two and a half months, I just decided to come back to where I belong. +And was there something in the first place that prompted you to take the manager role? Was it just the fact that you were coming out of maternity leave and you thought you would do better in management? Was there something else going on? I think that, that's also interesting. +I think there are two sides of this, you know, answer. 
The first one was, you know, people usually associate, and this is actually true, that, you know, in a dev role back in India, because we have a lot of, you know, outsourcing work, the clients are usually based out of the US. +We have long hours of working, and usually the client calls would happen in the evening, because we have like 12 hours or 10 hours of difference. So by the time you're ending your day, you have your client calls and you have to stay back in the office. And that's probably not like that for the managers. +They have a little more perks. And I think that's what someone suggested. And I think I tried to play along, although, I mean, I don't regret doing the MBA at all, because it just helped me understand what my manager is going through. +I mean, how is he thinking, like, how should I behave? So that just gave me also the context of the other side of the table. So I don't regret it. But in some sense, if I capture it right, it sounds like maybe it wasn't the most natural move for you to take the manager role. +Maybe it was just circumstantial in a way, right? You thought it would be better, easier, with your new responsibilities in the family, right? That is true. Yeah. And I mean, I would say, like, you know, personally, maybe things have obviously changed. +It's been five years since I moved to Berlin now. And until I moved, all, you know, women professionals that I know, my friends in India, I think they still have this problem of clearly communicating what they want in their job. I mean, if you can do that, I think you are already an awesome woman. +I was not one of them. It was always like something that, you know, people would see me as less if I'm asking for, like, I need to be at home with my kid, because he's too small. He may need me. 
+But I always tried to, you know, keep things to myself and try to, you know, change myself, try to leave what I was passionate about, just to fit into that frame, like how women should be, how a mother should be, or how a wife should be. So that was something I learned the very hard way. Yeah. +And it's like something that, I'm sure we will touch on this topic later in the podcast, but like, it's something that is kind of implicit. And when we talk about men, maybe they don't feel that. +And again, it depends on the culture where you come from. You know, in my culture, you know, men are also, like, assigned this responsibility that you should be the man who earns money, and hence all your decisions need to be made in order to maximize the probability that you will be that person. +But maybe you don't want that path, you know, maybe you still want to go and explore what is it that you like. +And so it's interesting how culture and, you know, society shape us in that direction, and we just carry the momentum until we realize, hold on a second, am I going in the right direction? And this happened to you? True, true. That was the exact same thing. +And again, I think the major bump came in that there was this company, a training company, so to say, to put it more precisely, who reached out to me, which was like far away, at least from the place I lived in. +And they said, like, could you remotely, you know, develop this Solr curriculum for us, for our training? And it was probably the first, you know, big thing that happened to me. And I was like, okay, I mean, I worked on applications that used Solr, I've used Solr before, but things have changed now. +And that was in 2014, when I came back, like, oh my god, Solr is like still working. Like, people are still working on this. Okay, wow, amazing. That's when I realized, okay, Solr has really transformed. The community has grown. And it was interesting. 
+I think that was more like, you know, meeting my old friend Solr in a whole new, you know, attire, like with a dinner jacket and suit and with a tie. And I was like, oh my god, dude, you are popular now. So that was, that was the thing for me. And preparing that course curriculum for them, +I think that was when I learned about all, you know, the developed features that were available in Solr back then. Also learned about Elasticsearch back then as well. But that training became such a hot cake, because I would give, you know, public webinars. +Obviously, I was being paid for that too. There were like almost 400, 500 people on those webinars to see what this course is all about. Everyone wanted to become one. So there was no role such as search engineer, like the engineer who knows about search, more or less. +But we would take like 25 people only in that course, or maybe even less sometimes. But I think preparing the course curriculum was one thing. +And then conducting that course for the first time was completely next level, because I did not imagine that people who would come to that course would come with like 10 or 20 or 30 years of experience in Java. +So to imagine, like, people are really asking me questions very low level, like, what is happening when, you know, faceting is happening? How is this, you know, variable, you know, is it in, you know, like, memory, or is it, like, somewhere else? +Like, what would happen, like, performance wise? Can I improve this? And I was like, I mean, I am literally stumped, because imagine, like, this was 2014. +I started back in 2008 professionally, after completing my studies. So six years, and that too with a break of like one and a half years, and competing with the knowledge of, like, low level code in Java with a person who's been working on Java for like 25 years. Obviously it was something. 
+And I would always say, and I hope people who are listening to this don't, you know, mind that I'm saying all of this out loud on a podcast, but I would say, like, I have nine years of experience. But even nine years was less at that point in time, because these people were always very senior. +And which made me, you know, like, take a break from my office, like, literally understand from the code level how each of these features was working. Although it was solely done to, like, you know, literally save my reputation at that point in time. +But I realized the benefit, or the grooming, that it brought along was way bigger. I think that understanding, that was, I would say, like, a major breakthrough. You reminded me of, I don't remember, was it 2011, probably, at Berlin Buzzwords. There was some raffle or whatever. +Like, I won a book written by Rafał Kuć on Elasticsearch. And he actually wrote, like, a couple of words there. And he said, if you don't find answers in this book, then read the code. And like, you know, the code is also open source. +And I was like, this was like such a big opening to me in some sense, even though I was coding by then. I was like, hold on a second. So if I don't find, what does it mean? So this book doesn't contain all the answers, you know, like, major things. And it's a pretty thick book, you know. +I was like, yeah, wow. So it just tells you how experimental you need to be, right? And that there are no given answers, right? Right. So true. Yeah, that's exactly how it is. And then, while giving this training, I mean, I ran almost like seven, eight batches. +One of the persons was my student who recommended my profile to Lucidworks. And that's how I got into Lucidworks. And I discovered a whole new world of open source. Oh, so now you can write code and, you know, contribute, or, you know, shape the product as well. +Like, I can really define how the Solr function would work. 
+Like, I've always, you know, been on the other side, like, complaining, like, oh, you know what, I don't like, you know, how this shows on the UI, or I don't like, you know, why it forgets about this thing, or how about, you know, if I could change this behavior. +So instead of, you know, just making that change in my local copy, I could actually open source stuff. I could actually contribute as to how the product shapes. And I think that was, like, you know, like a eureka moment for sure for me. So, yeah. +And at that point, you moved to the US for that job, or you were already in the US? So I moved briefly to the US, but as I said, by that time, I already had my second kid, who was six months old. And it was not very, you know, practical for me to stay there by myself. +And which is why I decided to come back, uh, leave my job there at Lucidworks. And I started my own consulting company called Bistro Innovation Labs. I know the name would sound as if it's a restaurant. +And the reason, I mean, there are like a lot of things that people used to ask me, like, why is it called Bistro Innovation Lab? Like, you should have something like a sci-fi formula or like some math algo in the name. +Why would you keep it like a bistro? And I was like, because I'm so passionate about cooking. I think I forgot to probably mention that during my intro. I got so excited. But yeah, I think the food part is something that really, really, uh, you know, brings out the best in me. +I think if my stuff or whatever I'm doing is not really working, I think if you do not find me at my desk, uh, you would find me in my kitchen. So that's mostly, uh, you know, where my world lies. So I'm dangling between my desk and my kitchen, because I think I love it that way. +And that's what I called my company as well. +But do you think there is some connection actually? 
+I think at some point I even, like, vlogged really briefly, uh, about this. Uh, as I was just learning how to cook, I guess, uh, there's some connection between how you write code from scratch and how you cook, before you learned how to cook that particular dish, right? +Like, you can assemble from building blocks, and, like, in that order. +Right. I think, I think that's an interesting point. I mean, it does function the same way. And it's obviously, like, the experimentation is something, like, you would experiment with different cuisines. Like, usually I have, I mean, I don't really do that. I don't change the basic nature of the food. +I mean, if it is German food, it should be German food. I mean, I would not try to Indianize the food. Uh, but I think somewhere I do that, and that experimentation is something that I would also connect with, like, creating something. +And I think that's, I mean, I never thought about it from that aspect, but yeah, I think good point, good correlation, so to say. Yeah. Absolutely. And then what happened next? So you opened your Bistro Innovation Labs? True. And I think that landed me a job in Germany. +Uh, and interestingly, I had never been to Germany before. And for me, I mean, I had been to London before, I had been to the US before. So to me, or precisely, so to say, you know, I'm trying to speak for all the Indians that would say, for us, in every foreign country, you know, we can speak in English. +But I think the biggest trauma was when I landed in Germany, in Berlin, and I realized that, uh, "Englisch geht nicht" (English won't do). And I realized that, you know, it would be very difficult. + But I think I had, I had a tough year in 2018 when I decided to move here, for several reasons, because I was trying to make sure, like, my, you know, family settles here, while at the same time trying to, you know, have a little bit of a grip on the language as well, and my work at that point in time. +But I decided to close the company afterwards. Sadly, you couldn't keep it. 
I could not. Yeah, there are some general norms, so it doesn't work. And did you, but you did have clients on it? Like you did the, oh yes, I had clients. I had clients. I had three pretty major clients back then. +And I think one thing that somebody commented about yesterday as well, and they said that I come across, you know, in a very subtle, you know, very straightforward way. And this is something that people do not expect me to be. But I think that subtleness came the hard way to me. +And I want to preserve it that way. So one thing that I try to set as an example, also for my kids, is that I don't lie. I try to make things as clear as possible. So I tried communicating to my clients that, you know, I would not be able to work because I already have my hands full. +And I'm trying to settle in a new country, trying to manage my family, also helping, you know, my husband, who did not have a job back then. But I mean, if you can adjust to that, we can still keep working. +But then I would not charge you for that, because, I mean, anyway, I would be paying taxes for it. So I think eventually, I mean, it was, like, more like one or two calls a week. And then it transformed into one call a week, then one call in two weeks, and then one call a month. +And then eventually I just lost all the clients. And I think that's when I decided to close it down. Yeah, yeah. That may happen. But, but still you have the, you know, the affinity towards the search world, right? And development. I do have. I do have. +And I am working as a search relevance consultant now with Open Source Connections. I think one of the reasons that I decided to work with this company, I mean, they are a well known search consulting company in the space. +And the mission statement, you know, that they want to empower the search teams of the world, that really, you know, is something that really rings a bell, you know, or I would say, like, it shines with what I want to do. 
+Basically, I mean, you know, all of us have this, you know, mindset that we all want to do something for money and something for what we really are passionate about. So I think that's really nice, like, how I connect, because I feel like I resonate with the model of the company. +I mean, I also want to, you know, empower people and, like, share my knowledge as open source, or, like, I mean, if I'm getting paid for it, even better. But yeah. And I think it is also different from how traditional consulting companies work. +Like, I have been with a consulting company myself at the start of my career. And I feel like the companies who are taking the services of these consulting companies are more, like, you know, very closely tied. I mean, it's like, you know, being married to them forever and ever. +Like, no turning back now, you're always, you know, going to be with us. And I think that's where Open Source Connections is different. It's like job security for some businesses, probably, right? That is true. Kind of like a working model. +And you're saying that, you know, in Open Source Connections, it's the opposite. And it's actually clearly stated, what you said, to empower search teams to learn and become independent if they want, right? Exactly. That kind of unties the whole situation. You don't need to really be tied. Exactly. +And if you look at it, it's very natural, like, you know, we help teams to, you know, fish their own fish. It's like, you know, we are unblocking them to achieve their goals. I mean, if you think of it from a video game point of view, like, and then they will be stuck at some other point. +I mean, by then the context would have changed, and they would have swum through, like, their initial challenges. So, I mean, as a consultant, I would also get a new use case next time. So it's like, you know, we keep on learning with our clients. +And I think that's what really excites me about my job. That sounds great. 
+And I mean, when you think about search, really, what are the companies that are so publicly known and shining and doing so many things, but Open Source Connections, you know, with Haystack conferences in Europe and in the US, with all the tooling, you know, like Quepid and beyond. +Like, I think in part, I think, why this podcast also exists is to keep going and discussing and keeping that connection open, that we talk and we develop the thought further and we share our experiences. +And I think in many ways that's what Open Source Connections has been so successfully doing. +And what is your role there, in a little bit more detail? So about my role, I think, and again, another thing that I love about my job is that I have independence in terms of what I choose to work on. +So when I talk about an engagement, I mean, there have been cases when I've been a strategist, also been an engineer. +So I get, I would say, like, enough time to, you know, research about stuff, I get time to also be hands on, I get time to, you know, explore stuff and, you know, develop good solutions, also as a counsel to the company that we work with. +And I think individually as well, like, you know, you feel really valued. Like, if I want to have some transition, like, for example, if you would, you know, like to bring this in, because it's a vector podcast: like, I added vector search to Chorus. + So I think Chorus is, so to say, like a small, you know, like, experimental, you know, webshop that a bunch of, you know, folks started off, I think Eric Pugh and René and Paul, and I think there are a bunch of other folks, who tried to bring together, you know, all the tooling which is needed to run a webshop or e-commerce shop. +And I think with all this buzz that I was, you know, hearing, you know, whenever we meet, you know, different clients, they were like, okay, what about vectors? Obviously, we're consultants, people look up to us, you know, for advice. 
+Like, is it something for me? Is it something that I can do? Is it something that, you know, we should go for? And, you know, I am a person, again, as I said, because I have something that really stops me if I have to lie. So, I mean, I would usually, you know, keep mum. +I would not really say something, and I don't want to be in that situation for too long, because obviously the world is still, you know, getting ahead of itself. +They are, you know, developing new solutions every day, and then ChatGPT came, and then, I mean, already a buzz about transformers and, like, LLMs. I think it's just non-stop. And everything is, you know, kind of, you know, like, the boundaries are diminishing. +So, I remember, like, last year when I started with Open Source Connections, and then, in fact, I got a chance to work with this client, and they were, like, all about, you know, Vespa, and then, we also considered working with Weaviate, and then, what do you suggest? Like, are we making a good choice? +I mean, is it something that we should be doing? And at that point in time, I was like, literally, okay, I think I don't know, I don't have enough context on this. +And I think that's where it all started. + I mean, that presentation I gave at Berlin Buzzwords last year about Vespa, it was a condensed version of how I learned Vespa, how I got completely, you know, I mean, it swept me off my feet, like, there is something that existed, you know, at the time when Solr existed, and it has been as solid as, you know, the rest of the other traditional search engines, as well as, you know, so much, you know, data science functionality that it offers. +So it was amazing. And I would say, like, sometimes that imposter syndrome that, you know, women have, you know, it also, you know, like, transforms, or it's something that triggers us to do something that, in the end, comes out as, you know, more like shining or grooming us. And I think I liked it. +So I think that's where it all started. 
And I think I proposed that idea to some of the folks at OSC, that this is what I want to do. It took some time, of course, because I was on full-time engagements with clients, but when I had the first, you know, pause, I tried to make some things work. I experimented with stuff, and that's what we came up with. So René really helped me with this, because I gave initial demos to him and Eric and Charlie. And I said: you know, I'm struggling with the fact that the embedding needs to be calculated somehow. I mean, Solr already supports vectors, but the only challenge is that the vectors for the query need to be somehow calculated outside of Solr. And that's when René suggested it would be a good idea to maybe use Querqy for it. And I had not worked on Querqy before, and I had been wanting to. Somehow none of my clients were really at that stage that they could use Querqy. And that's how I got to touch the entire stack of OpenSource Connections. I had already contributed to Quepid before, like adding visualisations and other stuff, solved a lot of bugs. I think Eric would really thank me for that, and I would hate it. Why would you hate that? I mean, because it was something like, I don't know, some sort of a charm: every time I would touch Quepid for some client requirement, I would find a bug. And I think it's been very nice, in principle, that he's a person who would say: oh, there's a bug, please log the bug. How about I offer you to work on it? I mean, it would initially feel like that because I reported it, but actually, if you look at it, it comes across as a very empowering thing. He offers you to solve it the way you like. And I think not many people are as open. Yeah.
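The query-side problem described here, that Solr can index vectors but the query vector has to be computed outside Solr (by the client, or by a query rewriter) before it can be handed over, can be sketched roughly as below. This is a minimal illustration, assuming Solr's `{!knn}` query parser over a dense vector field; the field name `vector` and the stand-in embedding are hypothetical, not the actual Chorus code.

```python
import json

def build_knn_query(query_vector, field="vector", top_k=10):
    """Format a Solr {!knn} query string from a client-side embedding.

    Solr stores and searches dense vectors, but the query embedding
    itself must be computed outside Solr and passed in as a literal
    bracketed vector.
    """
    # json.dumps produces the bracketed list syntax {!knn} expects
    return "{!knn f=%s topK=%d}%s" % (field, top_k, json.dumps(query_vector))

# In a real setup the vector would come from an embedding model;
# here it is a stand-in value.
q = build_knn_query([0.12, 0.34, 0.56], field="vector", top_k=5)
print(q)  # {!knn f=vector topK=5}[0.12, 0.34, 0.56]
```

The string would then go into the `q` parameter of a normal Solr request, which is exactly why something sitting in the query pipeline, like a rewriter, is a natural place to compute the embedding.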
And if you turn it around: for him to know all the context details is super hard, because you encountered it in a very specific context and you have all the input to reproduce it, right? So you are the expert on that bug.

That is true. I think that's another way to look at it. I mean, I never thought of it in that aspect, but yes, that's true. So that's something that triggered me to work on Quepid. So I worked on several bugs before I added visualisation. And that too came through because otherwise I had to do the visualisation outside of Quepid, and I would then complain: I'm anyway using data from Quepid, how would it be if I could use the data that's coming from Quepid inside Quepid, somehow supporting the visualisation as well? And I think it was certainly groundbreaking that we added Python notebooks functionality to Quepid. And that just opened up a whole lot of, you know, AI potential.

Yeah, that's amazing actually, because I also kept pushing Quepid in every job that I took, right? And not necessarily pushing as in: I am selling the tool and I don't care what the purpose for using it is. But I just know that for all these typical problems with search quality, instead of reinventing the wheel, why not take an open source tool like Quepid? And it's a commercially friendly licence: go ahead, deploy it, it's very easy to do so. Exactly. And then when I saw the notebooks, I was like, wow, this is so cool. I can just now say to my data scientists: hey, we've just labelled all these queries, can you do your magic right here in the notebook? And I can actually access it as well, potentially, or whatever. I mean, it's just so much easier than to scratch your head and think: okay, now I need to download all this data, all these annotations, and then push them somewhere else. And then... yeah, I think it's an amazing feature. Thanks for doing it. Oh, my pleasure.
Yeah. So I think also during this course of discussion we discovered some documentation bugs, things that were not mentioned in the Solr documentation. I contributed to that too. So I would say, in principle, we have a very supportive and encouraging culture in the company, which is what I really like. I think, I don't know, maybe I could talk a little more about this vector implementation. I mean, there have been several talks about it as well. I also presented it in Kraków, at Haystack on Tour. But it was something that had been sitting on my mind for too long, because I think we need a reasonable way to not, you know, deflect the question from the client. And at the same time, we want to address the question in the most explainable way. And I think this was the explainable thing that people can use. And all this while people have been discussing vectors, people charged money to show you that vectors could work in your search engine. And we do it for free. And again, going by what my company does: we provide a lot of informational content for free and open source, lots and lots of things which usually would cost companies a lot of money. And this is something where I really feel like I'm doing a good job, I'm making a difference. So that really brings a lot of satisfaction, that I'm doing a good job somewhere. So that's nice. And I think now I'm also working to experiment with other stuff on the vector side. I presented that we would be working on improving the image models. So basically, I'm sure you know about it, but for some viewers it would be a new thing: Chorus is a dummy shop, and we have a dataset that comes from Icecat.
So Icecat data, basically, is collated content from Amazon and other web shops. And usually that content is very, very structured. It has images as well, and lots of minute features or attributes of all the products. Which is why it's a good example. Plus we have other content as well in Chorus in general, like how Quepid would work locally for your web shop, and how you can use Querqy and SMUI to do the searchandising part and manage these search rules. Like, if you want to bring some brand up in the search results, you could do that as well. So we promised that we would be working on the images side. And because we have access to images, as we usually do in e-commerce shops, we tried to leverage that as well. And if you look at the demos that we presented, even without fine-tuning the results were breathtakingly unbelievable. We were like, wow, this is amazing. So in general, I think it was a very liberating experience that we could use vectors successfully in these shops which are using the traditional search engines. So we... I recently also contributed the Elasticsearch version. Because again, in a lot of forums, when we posted about vectors in Solr and the Chorus Solr version, we got a lot of questions like: is it going to be supported also in Elasticsearch? So I just took a stab at that too.

And Chorus is implemented in which language? Is it Java, or? Yes. So Chorus is, yes, it's a combination, of course. I think I'm adding a lot of content to it as well. One of the more interesting things that I also contributed, and I felt like it's something that usually people would not share in open source, is how you can convert your documents into vectors.
So I think this is the part, the data encoding process, that people would usually really charge you a high amount of money for: to add vectors into your indexing pipeline. And I provide that, again, for free. Yeah. And in the Solr version. So that is also something that people could take advantage of.

Yeah, I mean, just a couple of years ago... when was it, exactly a year ago? Eric travelled to Europe to meet you guys in Berlin, and then he also travelled to Finland to visit me. And he was in the hotel room and he said: hey, let's work for a few hours. And he was actually asking some questions about vector search. We were writing this article in Search Insights, right? Oh, right. Right. And he was, in his very passionate, almost theatrical way, walking around the room and saying: okay, here is the pipeline. I have Solr, I have this, I have my Java client. So where will you compute these vectors, if all the models are accessible through Python, right? Yeah. And it wasn't just a question of: you might get the same question from your client. It was like, wait a second, do I even know myself what I would do? And I said, probably I don't. So I would start engineering something from scratch.

That is true. And I think that is actually one of the most obvious questions that keeps coming up, because a lot of our clients are now interested in this. I mean, a typical work week for me has been: I'm giving more than four or five demos a week to clients who want to know about this. Will it fit my use case? I have this size of catalogue, will it fit my use case? What do I need? What kind of models do I need to choose? And obviously there are some things that we need to spend time on and, you know, charge money for.
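The document-encoding step mentioned here, computing an embedding for each document before it goes into the indexing pipeline, can be sketched as below. The embedding function is injected as a plain callable (in practice it might be a sentence-transformers model); the field names are illustrative assumptions, not the actual Chorus schema.

```python
def encode_documents(docs, embed, text_field="title", vector_field="vector"):
    """Attach an embedding to each document before indexing.

    `embed` is any callable mapping a string to a list of floats;
    injecting it keeps the pipeline itself model-agnostic. The result
    is a JSON-ready payload for a search engine's update endpoint.
    """
    update = []
    for doc in docs:
        enriched = dict(doc)  # copy, so the input docs stay untouched
        enriched[vector_field] = embed(doc[text_field])
        update.append(enriched)
    return update

# Toy embedding (character-length based), just to show the data flow.
toy_embed = lambda text: [float(len(text)), 0.0]
payload = encode_documents([{"id": "1", "title": "usb cable"}], toy_embed)
print(payload)  # [{'id': '1', 'title': 'usb cable', 'vector': [9.0, 0.0]}]
```

The point of the free contribution described in the interview is precisely this glue: the loop is trivial, but wiring a real model into it is what people were being charged for.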
But then eventually it turns out that people bring in their concerns, and that basically shapes what it is that we need to contribute next to open source. What is it that is confusing people? Why are people creating so much hype about this stuff? This needs to be demystified. And I think that's what my company does best, and I'm just learning from them.

Yeah, I think it's an excellent spot. And usually with all these hypes, things get overcomplicated. But in the end, if they prove their right to exist, then I think many of these things will get simplified. They will probably, to some extent, even commoditise. And I think it was the recent LinkedIn post by Doug Turnbull that blew up every single mind in search. He says: what's happening with Elasticsearch? What's happening with Solr? What's happening with large language models? And so many people chipped in. And I was wondering too. Yes. And they are still revolving around some of the more interesting concepts, some more basic concepts that really are unsolved in many ways. You know, how do you even deliver vectors to your database, and so on and so forth? And one of the comments, I don't remember who said it, was: hey, in the end, keyword search and vector search will both be equal kinds of modalities that you can play with in any order, probably giving some weightage to one or another depending on your use case. And so complexity will shift away from these basic topics to something more domain-specific.

That is actually right, and I'm glad you pointed it out. And this is something that has been constantly asked when we present, when we give demos: how do I fit this into my existing stack? I know it's cool.
Because when we say that it is understanding the semantic meaning of your query, that means that even things which are not described in similar vocabulary, your search engine can find them. Which is a very powerful thing if you look at it. And also there are some people who really wanted to use all the machine learning magic. I mean, I remember the craze in 2017 when LTR came out: how many people really wanted to use LTR? And then that kind of gave a lot of struggle to a lot of folks. So it's just that machine learning models have now become so easily accessible. People need to know, people deserve to know, that it is not rocket science anymore. It is very common, it's very obvious, it's like the natural path one should go down. And the thing that I wanted to point out earlier was that hybrid is the way forward. There are so many things that I could think of that keyword search does best. And I think the sooner people realise that hybrid is the way forward, the better. Yeah.

And what is your take on hybrid, if you were to offer it to a client? You know, I have heard from some clients in the past: okay, so we have a vector database, like Weaviate or Pinecone or whatever, or Qdrant. And then we have the needs of an e-commerce application, like facets, and we cannot do that in some of these databases. So what should we do? Does hybrid mean that we will run two databases, one Elasticsearch and one vector database, or do you have a better answer to that?

I think not really. I think that's also an interesting point, since it's coming up in the discussion here. I think sooner or later a lot of these vector database and vector search engine companies are also realising that they cannot lose what keyword search engines brought. They cannot just take it away.
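The "give some weightage to one or another" idea from the LinkedIn discussion above can be sketched as a simple fusion function. This is one illustrative scheme (a weighted linear blend over normalised scores), not anything from the systems discussed in the interview; reciprocal rank fusion is a common alternative.

```python
def hybrid_scores(keyword_hits, vector_hits, alpha=0.5):
    """Blend keyword and vector result lists with a single weight.

    Each input maps doc id -> score. `alpha` weights the keyword side;
    (1 - alpha) goes to the vector side. Scores are min-max normalised
    per list first, so the two very different scales become comparable.
    """
    def normalise(hits):
        if not hits:
            return {}
        lo, hi = min(hits.values()), max(hits.values())
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        return {d: (s - lo) / span for d, s in hits.items()}

    kw, vec = normalise(keyword_hits), normalise(vector_hits)
    fused = {d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
             for d in set(kw) | set(vec)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# BM25-style scores on one side, cosine-style scores on the other.
ranked = hybrid_scores({"a": 10.0, "b": 5.0}, {"a": 0.2, "b": 0.9}, alpha=0.7)
```

Tuning `alpha` per use case is exactly the "depending on your use case" caveat from the quote: e-commerce queries with strong exact-match intent want a high keyword weight, long natural-language queries a low one.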
I think one of the other things that keyword-based search engines brought is that you have total control over what goes into the search engine. And in the name of semantic understanding, you cannot just push your content into these new search engines. You still have to massage the content, you still have to treat it, you still have to have control over this data. And if you say that synonyms are not needed anymore because the vector search engines would understand all of that, then what about stemming? There are tons of other things that we do before putting data into the search engine. And that still stays relevant in a lot of different contexts, because this has developed, this has grown over a period of time. You cannot throw it all out. So I feel like there's a kind of mid-point that we have to come to, especially for the traditional search engines and the new search engines which are emerging in the market: they have to come somewhere in between, where we try to bring the best of both worlds. And I think that is going to be the way forward.

Yeah, but I guess we are not quite there yet, right? I would say the change has already started. All right. In the presentations that we give and the demonstrations that we provide to customers, people ask us: what is the smallest use case I could try with this? So one of the suggestions that always comes from my side is: attack the cases that do not perform well with your traditional search engine. For example, the long-tail queries. I think this is where any traditional search engine struggles.
This is probably the first thing: instead of running into a zero-results screen, it is better to have a chain of sorts, which would delegate your query to the vector search engine, or to a vector part of the query. And this would be the smallest way to adopt the newest technologies and leverage what they bring.

So maybe, from the product perspective and the business perspective, reducing the search abandonment rate, right? Because that is what actually takes a lot of money away from all these players. Absolutely. The abandonment: you search and search, and you cannot find anything. So why should I keep trying? The system does not even respond. Absolutely. I mean, I have been in that situation before, because I've also worked on a lot of product search, and I would leverage something like relaxation of the tokens in the query: I would keep dropping the parts that don't make sense. But I would say it's not the easiest thing to do. Rather, it is easier to pass this query, the long query, on to another system that is dealing with semantic similarity. And once that has proven its worth, I think that's the time we bring it forward. And we take it more like a bottom-up approach.

Yeah. So you think the message really to vector database companies is to think about what you can take from keyword search engines like Solr, Elasticsearch, OpenSearch as well, I guess, right? Absolutely. And also a message to the existing keyword search engine companies: you're not old-fashioned, you're not out of the market anyway. You have proven your worth over a period of time, and it is here to stay. It's just about how quickly you can adapt to the change. And I think that is happening. Yeah. Yeah. That's very interesting. I think you are calming many people.
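The "chain of sorts" described above, try the keyword engine first and delegate to the vector engine only when a long-tail query comes back empty, can be sketched like this. The two engines are stand-in callables (in practice a Solr/Elasticsearch query and a kNN query); injecting them keeps the routing logic itself engine-agnostic.

```python
def search_with_fallback(query, keyword_search, vector_search, min_hits=1):
    """Try keyword search first; delegate to vector search on (near) zero hits.

    Returns the hits plus a label saying which engine answered, which is
    useful for measuring how often the fallback actually fires.
    """
    hits = keyword_search(query)
    if len(hits) >= min_hits:
        return hits, "keyword"
    # Long-tail query with no lexical matches: fall back to semantic search
    return vector_search(query), "vector"

# Toy engines: the keyword side only knows the exact phrase "usb cable".
kw = lambda q: ["doc1"] if q == "usb cable" else []
vec = lambda q: ["doc7"]
print(search_with_fallback("usb cable", kw, vec))          # (['doc1'], 'keyword')
print(search_with_fallback("cord for charging", kw, vec))  # (['doc7'], 'vector')
```

This is the bottom-up adoption path from the interview: the vector engine only sees the queries the traditional engine already fails on, so the risk of the experiment is confined to the zero-results case.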
Who think: oh, I'm losing the wave of innovation, because we cannot introduce a vector database into the mix, or whatever. But I think you can, right? Like with Solr, Alessandro Benedetti is doing a lot of work implementing vector search there. And then in Elasticsearch, of course, Mayya Sharipova and others, and Julie... Yes. Julie Tibshirani has been doing work there, right? But I do still feel like what Doug was saying in his post, the crux of it: it doesn't feel like this functionality is advertised well. I think that's actually the point. Yes. And I mean it not even from the marketing perspective, but more from the perspective of: hey, how do you get things done with this? Yeah. All these basic questions answered. Yeah.

I think that's actually a very good point. And there's a contrast you see here, because the companies which are bringing vector search in a database or search engine format, they're new, they're upcoming in the market. It is part of their marketing strategy that they have to talk about it, they have to advertise it. Whereas the traditional search engine companies... I mean, Solr is not, you know, a company; Elastic probably is. But because they are already very popular, people are using them, they don't need that mass publicity, so to say. But I think we need to talk about it: okay, if that's a trend, we do it too, but we don't talk about it as much as we should. And that's actually an interesting point, which means that we should talk about Chorus more, because that basically exemplifies how easily this can be done with your search engine.
So you don't have to divorce your existing search engine to use some cool technology. Unless you really have a case where you are starting right from scratch, then I think you can consider using one of these. Otherwise, if you're using something already which has grown over a period of time, it doesn't make sense to throw everything out of the window just yet. Absolutely.

And with your Lucene mindset, what have you seen in Vespa that looked attractive? Yeah, I think I have been quite a Vespa fangirl, I would say, more for the reasons of the kind of content that Vespa generates. You know, when you're talking about features, about a search engine, or what can be enabled with a feature, you always think: okay, how will it perform, how much query response time am I looking at? What is the dataset I'm looking at? And when you look at Vespa's content, you don't have to look any further: everything is summarised so well. This is one thing I'm trying to add to my writing style: that I assess everything well enough that I can say it out loud to the public, to the world: this is how it performs. And there are very knowledgeable folks, especially Jo. I have been super impressed with how he describes stuff. And some of the things have really blown my mind as well, like: oh, this could be done in this way too. I think that's one of the things. And while developing this presentation last year, I think I bugged him a lot. But he was always, always super responsive, even his team. There were some UI things that I found that were not working as expected. And they're always very modest and acknowledging: okay, this is something we will work on.
And this is how it works. So they're always there, somehow. I don't know how big the team is, but there are always some familiar faces who are responding to your messages, and it's nice. One of the other things that I would like to point out, a distinct, I would say nice, thing, is that updates are one thing you would struggle with in a search engine. If you have a big catalogue and you're expecting updates to come in, especially in e-commerce... The company that I used to work for before, we used to have several updates, and we used to club them together, a kind of batching process. And we would process them together, because we didn't really have the resources to process them one by one, or just as they come. I think Vespa really handles that, through atomic updates, through partial updates, sorry, is what they do. And this is something really, really cool. And that just takes away the need to reindex everything. Sometimes people complain: I have a really big catalogue and it takes six hours if I reindex everything. I think that's where they clearly stand out. When they claim that they are searching big data, I think they really get it done.

Yeah. And I think it was also proven in the context of Yahoo systems, right? Some of the large-scale live systems. Yeah. I mean, they're always the early adopters, and I think the way they write about this stuff and how they implement it correlate: they really cover the topic when they're talking about it or implementing it. And that's what I really like about them. Yeah.

So if you would recommend something to someone who starts from scratch, would you recommend Vespa? I think I can. I think I can. You know, one of the things, maybe I'm old-fashioned or I don't know... and this is not said in an affiliated way.
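The partial/atomic updates praised here go through Vespa's `/document/v1` HTTP API: instead of re-feeding the whole document, you send only the fields to change, each wrapped in an update operation such as `assign`. Below is a hedged sketch that just builds the request path and body; the namespace, document type, and field names are illustrative assumptions.

```python
import json

def partial_update_request(namespace, doctype, doc_id, assignments):
    """Build a Vespa /document/v1 partial-update request (path + JSON body).

    With a partial update only the listed fields are rewritten, so a
    price or stock change does not require re-feeding (let alone
    re-indexing six hours of) the whole catalogue.
    """
    path = f"/document/v1/{namespace}/{doctype}/docid/{doc_id}"
    body = {"fields": {field: {"assign": value}
                       for field, value in assignments.items()}}
    return path, json.dumps(body)

path, body = partial_update_request("shop", "product", "sku-42", {"price": 12.99})
print(path)  # /document/v1/shop/product/docid/sku-42
print(body)  # {"fields": {"price": {"assign": 12.99}}}
```

In a live system this pair would be sent as an HTTP PUT to the Vespa container; the sketch stops at building the request so the batching-versus-streaming trade-off discussed above stays visible.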
I mean, no one has really paid me to say this. But it's more that I feel like it's not as fragile as so many systems that are coming up recently. I mean, obviously we have more processing power, we are way stronger infrastructure-wise. But I think it is as solid as, you know, Solr or Elastic are. That's the kind of trust I have in them. And also, clearly catching up to the trend and evolving is what they have going. So I think that's really something remarkable.

And I think, I mean, it's a huge palette of systems, and it's not like one of them is the only winner here. Like, I would think about Lucene itself, which has been developed for, what, 20 years? Yeah. Or more, actually. It has so many human languages, natural languages, supported, that you probably cannot find in Vespa or other systems. But again, it all depends on your market, where you're going. If it's English-speaking, probably you'll be fine. But if it's, say, Japanese... some of the interesting tokenisers that have been contributed to Lucene probably still stand out. Yeah, I think that's one good point: how much control do I need, or where exactly am I coming from? A lot depends on the context too. That is right. I mean, I would... yeah.

And, switching gears a bit. So we did touch on this topic. But your progression in the profession has been from, you know, what sounds to me like the super tough competition of going from 100,000 people to 100, something like that. It's just insane. It just feels like a journey full of challenge. But then, on top of these, there could be other challenges: I don't know, gender inequality or whatever is happening in the world today, right? In the profession. And this was one of the topics that really stood out at Haystack in Berlin last year, in September, right?
Where you ran the session Women in Search, if I remember the title correctly. And you had some women in search invited on stage. And some of them were not there, but had recorded little presentations. I mean, this was very emotional. It was a learning experience for many. The crowd was speechless in some sense; applauding, of course. What was going through your mind when you were preparing this session? How did you come up with this idea?

I think, I mean, I would also acknowledge: when people raised hands after this session, I was expecting, oh my god, what kind of questions? People would start asking me: how did you come up with this? How did you come to this figure, and stuff? And they would ask me questions about what I presented. But it was absolutely so heartwarming to see that every one of the people who raised their hands was there to appreciate and tell me how they felt about this session, and not really putting me on the spot. And one of the other things that I wanted to achieve... because we had the first session of Women in Search at Haystack US last year. So Haystack is just around the corner while we're talking about this, or might have happened by the time we roll this out. But yeah, I think Audrey presented the first Women in Search session in the US. And it was a panel discussion on what women really expect in a company, what kind of qualities or features stand out for women when they decide to join a company. And it was a long conversation. I think there were different kinds of feedback. Some people enjoyed it. Some people said: oh my god, it was kind of too much to sit in one place and listen to, like, white women talking. So the idea came from the point that I would not have a panel discussion. It had to be something different. And it had to be something solid, something that people can relate to.
And it had to be something that is contributed by several women. I cannot bring everyone up on the stage, of course, but I wanted to make sure I had as many women as possible, somehow, talking about themselves. Because each time I speak to fellow women in the search crowd, I feel like they very much underplay the contributions they make. And this is something I keep telling people that I meet: what you're doing is amazing. If the other person is failing to see what you've done, it's not that you've done anything less. And this is basically the message that I wanted to spread: if you're not seen enough, maybe we just need to gather, we as women, and we would be there to support each other. We would be there to advocate for each other. We would be there to mentor and collaborate with each other. And I also said this in the session: one thing that always surprised me is that men form groups to groom themselves, to develop themselves. Women just don't do that. I don't know why. Women are often, in their heads, still competing with each other, because it's like: okay, only one can be Miss Universe. Only one of you can succeed. There's always, you know, one best thing, and that basically triggers this competitive nature in us. And I think that needs to really go away. And people do it, people make us compete. And this is something that somehow I'm trying to spread: there's no point competing. Let us use each other's strengths to become one solid strength, so that we can really become a better version of ourselves. So we're not competing with each other. We have to act as one.
So I hope that that message spreads. And I hate when companies say things like: you know, we have rolled out this position, and we're expecting 33% of applications from women. We never do that for men. We never say the numbers out loud, like: this much of our applications came from men. Why do we always have to explicitly talk about this number? Like, we have reservations. We don't need reservations. We need to be equally considered. And there have been several sessions. And when you talked about the preparation: I literally soaked myself into that moment, became one of the activists myself. I was attending so many different kinds of webinars and in-person sessions. I don't know how many tons of groups I joined afterwards, to listen to what people really talk about. And this is very critical to me, because I want to make sure I deliver my 100% when I'm doing something, even if it is non-technical. Although I was very sceptical about whether I should do something non-technical, because in my head I was still replaying that I don't want to be typecast as: oh, maybe she's not that technical, that's why she's talking about non-tech stuff. So I was kind of confused about whether I should do that. I mean, I didn't want to be typecast. But in the end I was very happy, very surprised... happily surprised that people took it well. People took it the way I expected them to take it. And a lot of women reached out to me as well. A lot of companies reached out to me as well. And it's surprising how many people want to collaborate, and they tag me on several LinkedIn posts as well, when people make big claims. And that gives me an opportunity to speak to different companies about what they are doing.
And what is it that they want to do? And sometimes people expect that I would be coaching them on how they can add diversity, how they can bring more women into the team. And I'm like: okay, I can help you with your search application. This is maybe not what I master. But it's nice how many people want to be involved with this venture. There's this company that I recently spoke to, I would not name them, I would present them maybe at Haystack EU, who got so convinced... even the CEO of the company got so convinced that they introduced supporting women, and having women present in their company, as one of the pillars of the company. Like: this is what we are defending, this is what we are going to be talking about. Which was very heartwarming. People sent me messages like: oh, after your session we added two or three women to our team; in the next six months we will let you know how this goes. So it's very warming to know people are trusting me, people are reacting to it very positively, people are not typecasting me. And, I mean, I feel accountable for the rest of the women as well. So I feel like, if I can, I would somehow bring more women together. And the problem that they have is that they do not have means to network, even though we are in such an advanced world. Some way we could mentor women. If you remember, the women that came up on the stage, they were not speaking that fluently either. I mean, they were not groomed in that manner; some of them were really shaking. And some of these women felt more comfortable speaking in a recorded video, in a closed room. And I think that's where it was coming from. Some of them were obviously coming from another place altogether, last-minute additions as well.
+So obviously they could not manage to travel. But most of them had this fear. They were like, you know, we would introduce ourselves in one line. And I was like, yeah, I mean, one line, two lines, three lines, just come up on the stage. +Like, let's just, you know, light up the stage, feel what it really feels like. And you were there in the session as well: none of the contributions they mentioned were less in any way. It's just that we don't realize how big our contribution is. +I was blown away, literally. I think I even made a comment there. I don't remember the name of the lady, but she was saying, hey, I was just a student, and then I found an internship in one company and I started contributing to Solr, and my code was accepted. Yeah. And she was exactly... +She was still doubtful of herself. Like, am I going... what is this? Is this the right thing? Yeah. +And I was like, I remember myself, back in 2010, '11, '12, something like that, I came up with some ideas, some code, and I was kind of thinking, what if I contributed something to Solr? And I failed. I could never find myself a path there. +And then at some point, I kind of gave up in a way. And, you know, then this lady says, hey, I just did this small thing, right? And clearly underselling it. And by the way, yeah, that's true. +And by the way, I mean, if she's listening or someone else is listening, I have used param sets in the vector implementation as well. There you go. +Yeah, I have tried to include all the bits and pieces. I mean, I'm always, you know, touched by this open source contribution part. I feel like that gives you the opportunity to learn from experienced people. +And it takes effort to accept the feedback from such a large audience, so many experienced people.
It gives the opportunity to interact and have, you know, comments from people who've been working in the industry for such a long time. +And if you survive one contribution, even one contribution, I think that's where it starts. And it's like some addiction. Once you start, you just keep doing it in every possible way. +You just want to make sure you're contributing something or other. It's so amazing. It feels amazing. Yeah, exactly. +And it's nice to say this because I hope that there will be more female listeners also joining this podcast. According to my statistics, at least, there's been a domination of male listeners, if YouTube is not lying to me. But I don't think that should continue that way, for sure. +Because this world is much more diverse, much more interesting. And also we should remember to say that in the Relevance and Matching Tech Slack, there is a secret group, right? Can you say something about that? Like a channel? You mean the Women in Search group? It's open. +I mean, it's open to women and the allies. So if you consider yourself an ally for women in search, you're welcome to join it. There is no secret. You can happily be part of it. And we have a session once a month, every first Wednesday. And we still have that. +We are trying to bring in more useful content. So I talked about the mentoring part and the, you know, collaborating part, and how I'm seeing that, you know, women should contribute and collaborate more. +So I'm trying to have a pattern where we could have some tech sessions as well. Like, I think two months ago... because the last two I have not been able to attend myself: one I was out of town, and the other, I think I was not well. So I did not attend them.
+But then what I'm trying to have here is more tech sessions, more people opening up to themselves and then, you know, to the others: this is what I'm trying to achieve. +So in the session that I had like two months ago, I was still implementing stuff for the course. +I was still kind of battling with, like, how do I make images work? +And it was so beautiful that during that call, everyone was, you know, just kind of getting to know each other, and it's nice that every time we have a meeting, at least one or two new people join in. +So one of them said, like, have you tried this? Have you tried that? And everyone got their Jupyter notebooks out, and then they all started, you know, like, oh, this is going to work. Oh, maybe this one needs a licensing cost. +Oh, maybe this one we should not consider because it supports only these many things. And by that time, that one hour became so productive that everyone left that meeting with some knowledge of vectors. And it was so amazing to see that. That's super cool. +I mean, I'm really glad to hear this, because it both connects to like an overarching goal, some real purpose here, and also, you know, you create the environment of support and exchange and cross-pollination. I think this is something that men should do too. +Yeah, you're welcome to join the sessions. I mean, it's really nice. And I can say, like, you have a good reputation. People would love to have you. +We hosted Sebastian from Weaviate, who offered, after the Haystack EU session, to help women, you know, gain some public speaking skills. And I think we had an amazing session. We did not put that out on YouTube yet. +Not sure why, but then the people who attended absolutely thanked us and said it was really useful. So we're planning to have more sessions like that.
Maybe we could host you someday as well. That would be awesome for sure. And so you have a YouTube for that? Or is it part of... Oh, we have it. +I think this is the reason why it was not put on YouTube, because we don't really have a channel, and we did not know where we should put it. But we recorded it for sure. We have the recording. Yeah, I think today YouTube is probably one of the de facto platforms. +I don't want to over-advertise it, but I think it's a good place to be. Yeah, I think that's a good suggestion that we'll certainly consider in future as well. But yeah, I think the entire idea is that, you know, we need to make a positive impact. +Be it with open source, be it with, you know, trying to push and bring more minority people up front. +One of the things that I would like to highlight here, maybe we haven't touched on it, but while talking, it strikes me that we have a saying back in India that behind every successful man, there is a woman. +I think it kind of, you know, reverses itself in my case, because my story started... I mean, if I talk about it from my college days, my husband was also my classmate. So I'm one of those blessed people who has the company of someone who always supports me. +And I think I have always been here, you know, bold and standing on the stage giving presentations, but behind that is a person who is, you know, very meek, very nervous, like, how would I do it? And he's the one who's always encouraging me to come up on these stages, like, I can do it. +So it's interesting that behind me and every, you know, big step that I took, there was always a man. So if I talk about it, the first one started with my husband; the next one was obviously my manager at my first job. +And with the open source as well, I would like to mention that there's a search engine that's probably not very popular. Maybe a lot of people do not even know about it, an open-source search server.
And it's not affiliated with AWS OpenSearch in any way. So this guy had recommended me also on LinkedIn. +And we bumped into each other... maybe he was trying out, you know, other contributors or developers as well for his search engine. And we collaborated, and he showed me a whole new world of open source, of open source contributions. +So he's a very critical person, I would say, code-wise; I think he's really good. And he worships, I would say, Mike McCandless, because every time I would not understand anything in Lucene, he would show me something from Mike McCandless. So I mean, it was nice. +And then I started... I think, if I talk about the OpenSource Connections-affiliated libraries as well. So Chorus was mostly Eric and René kind of pushing me into, like, we should do something together, Quepid again. +Eric promoted, sort of, that you should try fixing some things also with Solr. It was Eric. And with OpenNLP, it was one of my colleagues at OpenSource Connections, who's not with OpenSource Connections anymore: Jeff, who is the chair of the OpenNLP library. +So there was this discussion we had because we were trying to work on a use case together, and he suggested we could do this with OpenNLP. And I had been using OpenNLP before as well, like in 2017. +And somehow I never really thought that I would contribute, but I pointed out to him that, you know, there's something that could be done in this direction. And he said, like, why don't you do it? And I was like, me? Like, no way. And he was like, no, no, you can do it. +I mean, if you understand this, I mean, certainly. And I think, as I said before, it's an addiction: once you start, you just don't want to stop. And I think that's how I started. I started using it more and more, contributing more, obviously. +And you became the committer as well, right? I became the committer, yes. Of OpenNLP, right? Congratulations. That's really great. Right? Nice. Actually nice.
And the kind of people who review your code... you get to know, like, oh, this could be done this way as well. +I mean, I can say that it's a very rounded development opportunity that I got with open source, with open source contributions. So that's something I can highly, highly recommend to anyone. +I think one of the things that has really helped me in principle is starting with the documentation or maybe the test cases. I think that's very classic advice that you would get from anyone who's been working and contributing. But that's really the right way to go about it. +I think documentation really helps because you cannot just write anything like that. You have to try things out yourself before writing about them. And I think that just gives you enough context to pick up some tickets and start solving them. +I am actually mentoring some people outside my job to contribute to open source. And just so you know, these are people who have been in the industry for like 20, 25 years. +So I mean, there are people who come with a lot of hesitation, like, is it even something that I should do? But it's nice. I mean, the kind of questions they bring in and the kind of discussions we have, it's amazing. So yeah. +Yeah, I mean, for sure, what you said makes so much sense, you know: you read the docs and you will test it out, probably as part of your job or of your research project or whatever, or your curiosity, right? And something will definitely not be right. +I mean, whatever is said in the doc is no longer true, or it doesn't apply to your use case. +But you said, you know, this imposter syndrome and, like, just general fear, or like you didn't have experience with it. How did you kind of leap from what you just encountered that doesn't work to actually going and contributing? I think that is something.
+I mean, I think the first time took kind of a bigger leap of faith than the rest of the other times. I think once it works... that's what I'm saying, like, the first time is always going to be the hardest. +And I think, not being able to contribute to Solr for almost six years... I had been away from Solr, and I can say a lot of the things I had already implemented came out in Solr 4.10; I think that's what I started working on in 2012 or '14. +And it was kind of this grudge that really took me on a guilt trip, that I could have done that, just that I did not have the knowledge. Which is why I'm saying that if you think you could do it, just think about the worst thing. +Worst is that somebody would reject it and ask you to rework it. But from your side, you would have the satisfaction that I thought this could be done this way and I tried to, you know, do it this way. And I think that's how I am living now. +I mean, if I have something sitting on my mind, that this is how it should be done, I just execute it, because that is what is in my control. Like, I can execute it. Approving or not approving something is something that I delegate to the other person. +So I think that is also something that we brought up in the discussion with Sebastian: a lot of women, when discussing how we can become better public speakers, asked him, you know, what is a good topic to submit to a conference? +Like, how do I ensure my topic is accepted? And he gave, I would say, very reasonable advice: from your side, what you can do is submit the proposal. +I mean, accepting or not accepting... once it is, you know, submitted, it goes to the next swimlane, and that's when you stop thinking about it.
+If it comes back to you, that's when you need to think about what you need to present to woo the crowd and make sure you make a point. But that's what even I do. If I think that this is how it should be done, I just do it. Like, at max, it would be rejected. +It would not be accepted, or somebody would say, like, oh, you know what, this is wrong, this is not how it should be done. That's about it. You get feedback. Either you get the recognition for the stuff that you've contributed, or you get feedback. Both of them are good. +Yeah, that's beautifully put. I remember one person who was a Java Champion. I don't know if you know of this title that was awarded at some point by Sun Microsystems; I think it's called Java Champion. +You know, people who really popularized Java and, you know, talked about it, wrote books, contributed code, and so on and so forth. +And he was saying, you know, whenever he received the question... and he had like 20, 25 years of experience in this... he was saying, hey, when I hear a question from a newcomer doubting themselves a million times, you know, should I apply for this job, or should I commit something here, like code or whatever, I am just afraid. He was saying, okay, go and get your rejection. So it's like that first leap that you take, and it becomes a bit more like a game. I mean, he had a bit of humor here, right? Okay. +So you're doubting yourself, you think you will get rejected? Then prove it, go and get it. Right. And as they go and get it, they will probably either succeed or get to some partial stage and probably need to revise something, and so on, right? Right. +No, that is amazing. I mean, I love it. And that also brings back... I used to have this email signature, I think during the time when I was in college, that I never lose: I either win or I learn. Yeah.
That just brought it back. I mean, I don't know +at what point in time I removed it, but I really lived by that line, that you either learn or you win. So either way, it's a win. Yeah. +And, probably getting ahead of ourselves, but even when you start getting these small wins, they have a start and an end, really; by the end of the win, you're like, okay, what should I do next? +You know, next challenge for me, you know. And probably you're already ahead of yourself and you're doing a lot of work already, but at the same time, that challenge is always there. +And that's part of the game and part of the reinforcement learning. And that is actually the correct terminology. I was just looking for that term. I think reinforcement learning... I think that's exactly how it happens. I think for me, it's exactly how it's going. +Like, I contribute something, then I get feedback and I present it. And I start, you know, becoming more passionate about the other stuff. So it's like, you know, throwing your hat over the wall. And the next time, you throw your hat over an even higher wall. +So I'm not sure if you know that expression; it's something like taking your chances. So it's like throwing your hat over the wall. So yeah, I'm trying to make sure all the feedback that I got from the demos and the presentations that I have given so far... +I mean, I, you know, collect all of it, making sure I address everything. I have a big presentation again at Berlin Buzzwords to give this year. Yeah, I think it's kind of like something... +If I look back, the first time when I spoke at Berlin Buzzwords, I think in the year 2020, that's when the conference was online. I remember that I went to my office because my kids would disturb me. +And I was so nervous, I had like five bottles of water next to me.
And then I did not drink anything, because I felt I would have to leave the seat for a toilet break if I drank that much water. +So I was so confused: I'm thirsty because I'm talking so much, but then I would not drink, because I would have to, you know, go away, because I was hosting the lightning talk session, which went on for longer than a usual talk session. +So from that point in time until now, I think, yes, it's been quite a journey. Yeah, amazing. +I mean, this is really coming to this logical question that I usually ask, the question of why. And you shared a lot today, you know, about women in search, your own journey, public speaking. +And there is always something new coming up: new projects, a new blog as a result of that project, maybe new struggles, new learnings, and so on. +But is there something that keeps you in this profession beyond these challenges, beyond solutions? Or is it just that? I think it's mostly... I don't know, it's something so engaging, and the joy that I get in solving things, I think that just keeps me going on and on. +And I think it's also a commitment that I have to myself, that learning things and experimenting with stuff really brings the best out of me. So I somehow have this ambition: I want to know everything in maybe language analysis. +I mean, that's, you know, such a passion that I have for this that I want to do a PhD someday in that. Which is why I want to know everything, and I'm somehow just aiming for that goal. I know I'm maybe old for doing a PhD. +But yeah, that's the goal I keep for myself. I think a short story before we end: people ask me what my name means. And if you haven't noticed, my name is a palindrome, my first name and my last name. +So it's Atita, and if you reverse it, it stays the same.
Also, people ask me, what does it mean? So my name is actually a machine learning model: I learn from the past. That's what my name means. +And I want to just, you know, live up to my name, so that if my parents, you know, thought of something before naming me, they get what they expected from me. And that's all about it. Oh, very beautiful answer. Really, it's like living up to your true self. Yes. Amazing. +Is there something you want to announce? You already said that you're going to present at Berlin Buzzwords. Is there something else you want to share with the audience that they should know about? Maybe a course or something that they should, you know, get and start with? +Oh, yeah, for sure. So the Elasticsearch version of the course is also out now. I mean, happy to take more questions; if you have anything, reach out to me on Slack, of course. Other than that, there were certain things that I wanted to rework, so we wanted to work on our image models. +That's now been fixed already, so happy for you to give that a try as well. Along with that, we're soon going to be sharing some more information about fine-tuning the models. We're already working with Jina AI on that. +So hopefully we can come up with a blog post or something that demystifies that part as well. Other than that, I have some plans also for Haystack EU. I'm working on some case studies with companies. +I mean, if somebody who's listening to this podcast thinks that they could be one of the candidates who wants to be involved with this case study, where we would talk about how the involvement of women changed things at your workplace, feel free to connect with me on the Relevance Slack, or whichever way you think is best, LinkedIn. +Yeah, or Twitter maybe. Oh, yeah, Twitter for sure. Absolutely. Thanks for adding that in. Fantastic. I really, really enjoyed this conversation, Atita. I learned something today, and I think that our listeners did as well.
Keep improving your model. +Keep generating more data points so you can further train your model. And I hope to meet you as well at Haystack or some other meetup, maybe online. And yeah, it was a real pleasure to talk to you today. Yeah, same here. It was amazing. +And I mean, it was obviously a different level, like being a subscriber and then being on the show. So that was kind of a moment I had with myself today. Thank you for this opportunity. Thanks so much. Also, all the best for the forthcoming podcasts. Yeah, thank you so much. +Thank you, Atita. Yeah, bye bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md b/transcripts/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md new file mode 100644 index 0000000..f633596 --- /dev/null +++ b/transcripts/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md @@ -0,0 +1,158 @@ +--- +description: '

Alessandro''s talk on Hybrid Search with Apache Solr Reciprocal Rank + Fusion: https://www.youtube.com/watch?v=8x2cbT5CCEM&list=PLq-odUc2x7i8jHpa6PHGzmxfAPEz-c-on&index=5

00:00 + Intro

00:50 Alessandro''s take on the bbuzz''24 conference

01:25 What + and value of hybrid search

04:55 Explainability of vector search results to + users

09:27 Explainability of vector search results to search engineers

13:12 + State of hybrid search in Apache Solr

14:32 What''s in Reciprocal Rank Fusion + beyond round-robin?

18:30 Open source for LLMs

22:48 How we should approach + this issue in business and research

26:12 How to maintain the status of an + open-source LLM / system

30:06 Prompt engineering (hope and determinism)

34:03 + DSPy

35:16 What''s next in Solr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20241107_011152_a59e71acc05fe03f850677d583f5111a.png +pub_date: Thu, 07 Nov 2024 13:59:44 GMT +title: Berlin Buzzwords 2024 - Alessandro Benedetti - LLMs in Solr +url: https://rss.com/podcasts/vector-podcast/1741381 +--- + +All right, Vector Podcast, and here I have Alessandro Benedetti with me, his second time on the podcast actually, and exactly the same place we recorded two years ago, I remember, at Berlin Buzzwords. Yeah, we were here. Yeah, I guess, in 2022. It was. +It was, by the way, a lot noisier, if you remember, than now, but it was the closing day, with all the people. Yeah, but I think it's almost end of day as well here, the first day of the conference. And yeah, I wanted to chat with you. +How do you like the conference so far? So it has been a great conference so far. We've been seeing many talks about language model integration with search. So that's the biggest new trend. +Vector-based search is still quite a strong topic, and in general also testing, like evaluation and explainability discussions around vector-based search or language models in general. And my talk was about hybrid search. Hybrid search. +Yeah, so you work a lot on Solr, right? That's kind of your playground, and that's where you integrate things. But also I heard that the guys at Reddit are using the work that you've been doing in Solr. So that's amazing. +Tell me a bit more about what hybrid search is, right? How do you see it? What's the value? And basically, maybe, what are the challenges that you needed to solve, and still see, related to hybrid search? +So the first point, and the reason I decided to start working a little bit more on hybrid search and contributing this Reciprocal Rank Fusion to Solr, is because of the limitations of vector-based search.
+So vector-based search of course introduces the ability to close the semantic gap between queries and documents, with some limitations, right? So the explainability part, for example, is an aspect I care a lot about, and it's just very difficult to explain vector-based search results. +Yeah, we have high-dimensional vectors. So many, many dimensions in the vectors, and humans are not really good at managing many dimensions. We live in a three-dimensional world, and it's even difficult for us to understand, like, a four-dimensional one. Yeah, then we have many elements in those vectors. +So each feature in the vector doesn't have a meaning for humans. You have, like, 768 dimensions in your vectors, and there's no single dimension that means something semantic. It's just the output of some machine learning model, but we can't interpret what it is. +And we can't interpret what would happen if that feature goes higher or lower. I mean, does a higher value for that feature mean higher relevance or not? You can't really tell with vector-based search. So these kinds of problems, yeah, start to have an impact, right? +So you have your clients using vector-based search; they are happy, and then they are not, and they want to explain, for example, yeah, what happens. Yeah. And another limitation is keyword-based matching. So while vector-based search tries to solve the vocabulary mismatch problem... +So if you have terms in your vocabulary that are different from the vocabulary used for queries. Yeah. At the same time, users are used to having keyword-matching documents in their response. +So when you don't provide keyword-matching documents in the response, there are going to be problems and questions. Yeah. Why do I see this? And why don't I see, for example, this title? Oh yeah. So with hybrid search, the idea is to mitigate those problems. +So mix up different query result sets.
Potentially, like, vector-based search results and traditional keyword-based search results. Yeah. Get back one result set. Yeah. Let's try to combine both worlds. Interesting. +And if we kind of step forward from this, let's say we deployed hybrid search, so now it basically takes some similar documents from keyword hits and then others from the vector side. You still get those documents that do not have keyword matches, right, from the vector space. +Do you know, or maybe you have employed, some ways of explaining to the user why they see them? So that's an interesting point, actually. I had a discussion recently about how we can explain vector-based search better. So we mentioned already all the problems. What can we do better? +So there are other approaches than just pure dense vector-based search, such as learned sparse retrieval, for example, where you learn query or document expansion term candidates based on learned models. So based on the probability, you will expand your queries with additional terms. +So that's a little bit more explainable, because at least you get back from the machine learning model alternative terms for the queries and the documents. Yeah. It's still a first layer of explainability. So you have something that's like additional concepts, so it's easier to understand. +Still, you have a probability assigned to the pair. So if it goes wrong, you may end up with unreasonable terms. So not perfect. A little bit better, maybe a little bit more explainable. +And then there are approaches such as ColBERT, where you encode your sentence not just into one vector but into a sequence of vectors. So multiple vectors, roughly one per token. And you do the same for your documents. +And then you basically return results based on the similarity between not just a single query vector and the document vector, but multiple query vectors. So each query vector, which is meant to be roughly a token, is matched with the tokens in the document.
+So you may be able to highlight the terms in the document that are close to the terms in the query. Yeah. Also in this case, of course, it's just a first layer of explainability, because if this goes wrong, of course, again, you have sequences of vectors. +So you can get a sort of heat map of which query terms match, more or less, the document ones, but still not perfect. Yeah, sure. +Of course, it's kind of like maybe experimentation is required, right? What works for you? What is the end product? But maybe one question for me as a user, right? Let's say I'm using Solr and you offer hybrid search now. +Are you already offering, or will you consider at some point offering, the capability to integrate what you just said into, let's say, the highlighter in Solr, so that it will actually build me the snippet regardless of the source of that document, whether it's keyword or vector? +That's a very interesting question, because there are, in my opinion, two layers of explainability: for engineers, who need to work on the engine and change the ranking, change the matching, and for users. Yeah. +So a user just wants to know why, for example, a result is there. And for users' explainability, actually, at my company we designed and developed a highlighter. +We call it the neural highlighter: it takes in input the large language model and, in the response, will highlight the snippet for each result document, not based on, let's say, exact match, but based on the query, as a system powered by a large language model. Yeah. +So in this way, you will be able to highlight parts of the original document that are semantically close to the query. Can you say the name again? What was the name? It's called the neural highlighter. Neural highlighter. So it's your proprietary product right now. Yes. +It's closed source, right? Yes. Right now, yes. We may contribute it to open source in the future, I don't know. Right now it's one of our products.
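Going back to the ColBERT-style late interaction described a moment ago: a minimal Python sketch of MaxSim scoring and the per-token "heat map" it enables. This is an illustration only, not the ColBERT or Solr implementation; the toy 2-d vectors stand in for real per-token embeddings from a fine-tuned encoder, and all function names are invented for the example.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: each query token vector takes its best match
    # among the document token vectors; the per-token maxima are summed.
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

def match_heatmap(query_tokens, query_vecs, doc_tokens, doc_vecs):
    # First-layer explainability: which document token each query token
    # matched best, and how strongly.
    rows = []
    for token, qv in zip(query_tokens, query_vecs):
        sims = [cosine(qv, dv) for dv in doc_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        rows.append((token, doc_tokens[best], sims[best]))
    return rows

# Toy vectors standing in for per-token embeddings.
q = [[1.0, 0.0], [0.0, 1.0]]
d = [[1.0, 0.0], [0.6, 0.8]]
score = maxsim_score(q, d)
rows = match_heatmap(["cheap", "laptop"], q, ["budget", "notebook"], d)
```

The `rows` output is exactly the kind of query-token-to-document-token mapping that could back the "heat map" explanation mentioned above.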
But I mean, it's a feature that... is it offered as a standalone component? It's a plugin. It's a plugin. So it's a plugin to Solr. +It's a plugin. That's the value prop as well, right? It doesn't always need to be open; it's something I can plug in. Exactly. It takes in input the large language model. Yeah. At response time. So you can, right. +So that will help to explain results to the users, right? And you also mentioned, and thanks for making this distinction, that there is also explainability for the engineers, which is also important. +So can you explain a bit what you mean by explainability for the engineers? Because I care about it a lot, of course; I used to be an engineer full time. And I need to know how to do it, how to tweak something. +But also, can I explain to myself that what I tweaked is actually the right thing, right? So, you know, kind of the process of engineering it. Yes. +So in Solr, for example, there is a debug component that gives the engineer the ability to expand the response with information about how the score was calculated. So in Solr, when you have a query and you have a result, a score is calculated for that result for that query. +And this score will impact the ranking. So, descending order, literally, right, from the highest score to the lowest. And normally this score is explained showing why you get that: mathematical calculations from the term frequencies, yeah, +the length of the document field, the average length of the field, the document frequency (how rare a term was, for example), and so on and so forth. So long mathematical expressions that are readable to the user, and you can understand, okay, I was aiming for this field to impact the score; let's see if it really impacts the score. With vector-based search right now, the only explanation that you get from an engineer perspective, literally, is that it's within the top K.
So this document was within the top K, with a cosine similarity between the query vector and the document vector. +That's not really helpful. It's just confirming what you know already, right? I mean, yeah, it's in the top K, it was returned. +So one of the ideas I was thinking of, though it's actually quite far from implementation, is to explain the reason a document is in the result set by showing examples. So the language models used to produce embeddings were fine-tuned on sentence similarity. +This means there were pairs of sentences with similar meaning and pairs of sentences with dissimilar meaning, and the model learned how to encode them. So I think it could be very interesting, to explain the reason a document is being returned. + Because of vector search, you show a snippet, say: because there are these similar pairs of sentences, and this is the similar sentence, so that the engineer can potentially go back and say, okay, let's take a look at the original training data. Did I cover this example well, or maybe the examples are wrong? +So I see, oh, these two sentences are shown as similar, but they are not. +It's just an idea, you know, to study. Wow, that's very interesting, because as you said, it's very limiting today to just know that the vector search happened and this is the result. Yeah, that's amazing. +I mean, it's really interesting that with this work, you are not just taking something and applying it to an implementation, right? Like implementing a plugin. You actually go into the space of exploring things, because it's not like everything is done, right? +And maybe in some companies it has been done, but they are not open sourcing it, right? And so you need to do the research, search for the solution. +That's very interesting. So in terms of functionality today, hybrid search is already available in Solr, right? Is it already released? In what form?
Yeah, so there are different ways hybrid search can be performed in Solr. So right now, we're on Solr 9.6. +There are ways of combining results from lexical search and vector-based search and then re-ranking them, for example using learning to rank. So you give different weights to different factors, for example the vector-based score or the traditional score. Yeah. +What is coming next, which was the topic of my talk, is reciprocal rank fusion. That's coming with Solr 9.7, so I guess in a couple of months we're going to release it. Nice. And that is a way of doing hybrid search that is independent of the score and just based on the ranking of the results. +So you mix the different ranked lists. Yeah, they can be two, maybe more. Yeah. More than two are supported, not just two. And then you combine them based on the position of the documents in the different ranked lists. Yeah. +The higher the position in the ranking, the higher the probability that the document is going to end up high in the final result set. Yeah, yeah. +Actually, maybe you can help me understand this, but when we were trying reciprocal rank fusion with another search engine, we actually found an implementation. So we could kind of plug it in as Python code very quickly. +But then when we looked at the code, one of my engineers said, this looks like a round-robin algorithm essentially. There is nothing particularly peculiar about it or tunable about it, which probably is not true, but I'm not sure what's your take on this. +So it felt like you have two lists and, starting from the top, you basically take the documents in order and you combine, blend the lists, right? +But what if you wanted to pay attention to some signals from these documents, based on their features, or maybe you wanted to introduce some logic on top of this, right?
+So you want to say, let's say in the context of geographic search, in the top three results I want to see a super popular POI, and I know what popular means. The second result could be, I don't know, the closest one, or maybe vice versa, it depends. And so on and so forth. So I have some kind of rules embedded, and then maybe it stops being RRF already, right? But still, taking a step backwards. +Did I explain it right? Or are there some parameters in RRF that I could tune a bit to get a different outcome? There's not much to tune, to be honest. So you got it right. +It's not really round-robin, though, because what you do is basically give a new score to the documents, based on all the rankings of that document in the result lists. +So it's not like interleaving, where, for example, you pick one document from one ranked list and then from the other list you pick another, and then you choose which one to go with next. +It's more like: let's see how many times this document appears in the ranked lists and where it appears, and let's build a new score from that. So the more you are in the top positions, the more likely you end up in the top positions of the final result list. +Given that, you're absolutely right that if you want to build more advanced ranking systems, potentially with different phases, different steps, it makes complete sense to build your original candidate set with reciprocal rank fusion. +And then you re-rank, for example, using learning to rank and many features, where you can have, again, maybe the vector distance as one feature, the lexical similarity on one field, popularity, geographical distance and many other features. +And then you apply learning to rank, for example, so you train a machine learning model to identify these weights. It makes perfect sense in my opinion.
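The mechanics described here, a new score built only from each document's positions across the ranked lists, fit in a few lines. A minimal sketch of reciprocal rank fusion; the smoothing constant `k=60` is the value used in the original 2009 paper and is essentially the only knob:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    # Fuse several ranked lists (e.g. lexical and vector results) into
    # one, using only positions -- raw scores are never compared.
    scores = {}
    for ranking in ranked_lists:
        for pos, doc_id in enumerate(ranking):
            # A document gains 1/(k + rank) from every list it appears in.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + pos + 1)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]
vector  = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([lexical, vector]))
# ['d1', 'd3', 'd2', 'd4']
```

Note how `d1`, which appears high in both lists, beats `d3`, which tops only one list: that accumulation over lists is exactly what distinguishes RRF from round-robin interleaving.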
+I believe reciprocal rank fusion, and in general, let's call them simple approaches, work for hybrid search, because if you take a look at the algorithm of reciprocal rank fusion, it's not rocket science. It's actually an old, open algorithm from 2009. +But this opens the door, in my opinion, to building your original candidate set and then potentially re-ranking it. Yeah, the candidate set is not random, there are already some reasons for a document to be there. +And of course, in any case, it goes without saying that we do need to have some method of combining these completely disparate score spaces into one. +And that could actually even be different search engines operating on the keyword level, because they output different scores, right? Or, I'm thinking, even separate shards of your data that also have their own IDF, right? Local IDF. So, yeah, incomparable scores, right? Awesome. +Also, not related to this, a completely different topic: there was also a keynote today about what open source means, right? +And without, of course, criticizing, some companies were mentioned in this context that claim their LLMs are open source, but when you look at the licenses, they are restrictive. They actually do not allow you to use the models independently, right, and go and serve your customers. + But you also mentioned, just before we started recording, that there are also cases where a model can be open source, more or less abiding by the principles of the open source spirit, but then the data it was trained on is not open, or the methods that were applied to the data are not open, right? +So to me, it sounds so important to keep declaring what open source is, what the principles are, right?
+ And maybe this keynote also shed some light, but it seems like this topic is also very close to you, and you are contributing a lot in open source, you are a committer. Can you share your vision on what open source is, and what the implications are for how this field should be developing? +I think it's a huge problem, especially because nowadays open washing, which is the practice of associating openness with something that is not really fully open, is happening a lot. +Because open source is cool, open source is seen as a good habit, so you're the good guys if you do open source. +So as you said, we are not going to name companies or organizations that claim, for example, their large language models are open source. But large language models are complex systems. The output, the final weights of the neural network, is just one little part of the entire picture. +These large language models are normally pre-trained on huge quantities of data with a pre-training algorithm. +So the pre-training data and the pre-training code, are they open? Are they not? I mean, many times it's not only not open, it's not even known what kind of data it is, just generic internet-scale data. +What about the fine-tuning, then? + So once you get the pre-training, which is the unsupervised part, where you just explore the web, that's pretty simple, then you want to fine-tune for specific tasks, like sentence similarity or instruction following or, I don't know, summarization, any kind of task you want to use the LLM for. And to do that, you normally use an additional training dataset that is specifically designed for that fine-tuning task. +And again, is that open? Do you document it and make it available? And the code for fine-tuning, do you make it available? +The output of the pre-training, do you make it available separately from the output of the fine-tuning? The documentation, any data that explains what was done and why, do you publish it?
+ So I've read an interesting paper, which I guess we can share as a comment, from a university. They were comparing all these aspects for LLMs, how famous "open source" LLMs actually behave on each of these columns, and it would surprise you how small a percentage of these big players are actually open sourcing everything. +So it's not just the license that, as you said correctly, is sometimes limiting, but literally the components shared. Sometimes it's just the final weights. Is that helpful? +I mean, in open source you want to cooperate, you want to improve the code. With normal code, you have access to everything and you can improve it, you can help the community. +If you just get access to the weights, you can use them, but can you, for example, improve them? Can you understand if the model is fair? What was in the data? Yes. Yeah, it's really difficult. Yeah. +And so, where do you think these discussions should start, or maybe they're ongoing? Are you part of some discussion? And how does it impact business and maybe research? Because there are different sides of this coin. +Many of these things emerge in the academic space, but then they move to create value on the business side, but it could also be vice versa. +So what do you think? How are we going to address this? +So I know that the Open Source Initiative, which is the group of people that drafted the open source definition, so they basically think about ways of defining open source, is working on a definition of open source for AI models. +We are hopefully going to see, soon enough, a definition of what it means for a model to be open source. And that is going to be great, because at that point it's not a matter of "I believe it's open" and "I claim it's open". Either it's open or it's not. +And everything is covered by a license that is going to be open or not.
+ In terms of cross-pollination between academia and industry, I think that in this period it is so important to see cross-pollination, because there are many models, for example, that are designed and contributed by academia that can then be used by organizations, and the other way around, because of course there is a lot of money involved in the pre-training and fine-tuning of these models. +So only a few organizations are actually able to do this. So they should try, ideally, to make it as open as possible, in a way that universities can then focus on small components and potentially help some more. Yeah, yeah. +I mean, pre-training at internet scale is incredibly expensive, from an energy perspective especially. So I hope we reach the point where everything is open enough for smaller organizations and academic organizations to contribute as well. +Yeah, it's very interesting, because there is always going to be this kind of play between: okay, this big company has all the servers they need to train the model. +So they can also decide how they will do it and not disclose it. But then maybe the question we need to be disputing and discussing is that they still don't have all the data to train on, right? Potentially. +There have been some cases mentioned in the keynote, you know, when some company, we will not name the company, goes and trains on some articles of a famous publishing house, right? And now that publishing house is unhappy, because they say, you took our articles without us knowing it. +Now it kind of evokes this question: okay, when I was reading this article, there was probably some license which said you can do this, but not this. Maybe there is something hidden, right? But only now have we started discussing these things, right?
+And that's a very interesting topic. But do you think that when a company has, let's say, open sourced the model and has checked everything on that manifesto or on that contract, + do you think there will still be a need for some tooling or some process to continuously maintain the status of this model as open source? Because it may well happen that either the company or a research institute goes and accidentally uses some data that no longer complies with this contract, right? +First of all, do you think such a thing exists, say, for Apache Solr? How do you make sure that no one will take a library that is not under the license it has to be under, plug it in, and then we do a release of, let's say, Solr? +I think there is some checker, right? Yeah. So this applies to a certain extent to code as well, right? So you are a contributor. When you basically sign the, let's say, contract with the Apache Software Foundation, you assure that any kind of contribution you do is your own. +So there's no code, for example, that was copyrighted by someone else, that sort of thing. It's genuinely created by you. It's genuinely created by you. So to a certain extent, it would be a similar thing to potentially adding some training data. +I think it's probably a little bit less likely that in an existing large language model, for example, someone would contribute a little more training data. I mean, it's more likely that maybe you would change the code a little bit, for example the code responsible for fine-tuning, that sort of thing. +But still, I think there will be this layer of responsibility that would weigh on the shoulders of the contributors, because of course you can't fully control these single individuals. +And you need to have this sort of layer where the non-profit, the open source project, protects itself. Yeah.
+Because I can imagine that, again, it's probably putting it to extremes, but there could eventually be some tooling where you take the model and you introspect its behavior, and you can make a guess about which data it was trained on. Potentially. Or at least find some similarities with what it produces. +I mean, there have been some attacks, so to say, right? So you can actually probe the model and see what it outputs, right? You can even break some models sometimes. That's true. So that's more on the hacker side, or the bad hacker side. But I mean, there probably will be tooling. +Do you think it's possible that there will be tooling checking the model and making some hypotheses? And as you said, once caught, that organization will kind of lose its trust, right? So obviously, everyone wants to be accountable and so on. +But then there could be a flip side of that: you can accidentally assume that they did it, but that's not true, right? Now that becomes a very hard debate, right? So it's an area which I think deserves exploration and study. +And I believe that being accountable for the data you use, and disclosing it, is of course the first step. But then also validating that companies tell the truth, for example, I think is going to be important to build trust and to make sure that what you disclose is actually what happens. +Because we never know. It's very interesting. Was there some other topic you wanted to cover? I mean, are you also working on RAG or anything like that, or evaluating LLM-based search? We are working on many different integrations with LLMs. Retrieval augmented generation is one of them. +Natural language parsing, for example, is another, so moving from natural language to structured queries. Yeah. Probably the last topic we can discuss is prompt engineering. Yeah.
+Briefly, because yes, this naming convention is something that really hurts me, because it's not engineering at all, in my opinion. You're just attempting to communicate with something and you don't know what to expect. +Because I've seen, I mean, I've seen tools today with people saying: you write this prompt and you hope you get this response. Yeah. You type this prompt and you ask, please give me the response. This is, to me, something that is not scientific at all. It's not scientific. It's not scientific. +It's not science. You can't just hope. Yeah, you can't just hope. Yeah. So there is, in my opinion, in short, a big margin of improvement there, to interact with LLMs in a more programmatic way. +I want to specify rules and get back a response that satisfies those rules. If I want to select an item from a list, I want to select an item from the list. I want the LLM to do no more than just select the item from the list. +Not 80% of the time select the item from the list, and 20% of the time select the item and give me an explanation. I just want the item from the list. Yeah. +And right now, and we will see at the conference, because I've seen in the agenda there are many talks about trying to solve this problem, but right now what I've seen as a possible solution is just that you post-validate the response and you go back. +Like, okay, I asked for a specific JSON in the response. There are mistakes, it's not parsable JSON. So I go back to the LLM and I say, this is not parsable JSON, can you fix it? And again, and again, and again, which is not really something you want to take to production. +So in short, in my opinion, using LLMs programmatically right now is fine for proofs of concept. But would I bring these sorts of approaches to production out of the box?
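The post-validate-and-retry loop described here can be sketched in a few lines. This is an illustration only: `call_llm` is a stub standing in for a real LLM API call, here scripted to fail once and then succeed.

```python
import json

def call_llm(prompt):
    # Stub standing in for a real LLM API call; for the demo it returns
    # broken JSON on the first attempt and valid JSON on the retry.
    call_llm.attempts += 1
    return '{"item": "b"' if call_llm.attempts == 1 else '{"item": "b"}'
call_llm.attempts = 0

def ask_for_json(prompt, max_retries=3):
    # The loop described above: parse the response, and if it is not
    # valid JSON, send it back to the model and ask for a fix.
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            prompt = f"This is not valid JSON, please fix it: {raw}"
    raise ValueError("no valid JSON after retries")

print(ask_for_json('Select one item from ["a", "b", "c"] as JSON.'))
# {'item': 'b'}
```

The point of the criticism stands out in the code: nothing guarantees the loop terminates with a valid answer, which is why it is fine for a proof of concept but uncomfortable in production.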
I wouldn't, because I want to bring something that is deterministic. Yeah. +It does what I want it to do 100% of the time. And I don't want to hope it works. I want to make it work. +Yeah, it's a very interesting topic, by the way, but I also see some level of contradiction between a non-deterministic, hallucinating model, essentially hallucinating by design because it keeps predicting tokens, right, +and the level of determinism you just explained, right? But I guess at the same time, someone might say that our life is not that deterministic either, many moving parts, and we still find a way to, I don't know, live with it and build something, right? Yeah, there's something that moves. +I think, you know, we are experiencing, in my opinion, the first days of this new world of big AI models. So I think it's fair. They were born to autocomplete text, to generate text. And now we are trying to use them to do tasks. Yes. Which is okay. +We as humans use language to do tasks. Yeah. So I guess we end up programming computers with it, right? So let's shape it a little bit more so that it would be programmable. Yeah. I mean, doesn't it remind you a bit, +I haven't explored it, but to mention DSPy, the package, probably you heard about it, right? Which replaces prompt engineering with a more programmable way of doing it. +I still don't know how it works, but I know that some of the engineers in my team applied it quite successfully to generate some synthetic queries. So that was very interesting. +Have you played with it? Do you know it? So my team, we've been playing with it for one of the proofs of concept for doing natural language processing, for structured Solr queries. And I think it's a nice first step. +Still, it gives you an indirection between the program and the prompt, a way to write a prompt.
So you have classes that mimic a programming language, but it ends up as a prompt. Yeah. I see. You are not sure that you will get what you want. But it's a first attempt. Yeah. Yeah. +I think it's okay. I mean, we will improve that. It feels maybe like a first baby step, in a way that it's not yet at the state you mentioned. Yeah. It's not a complete solution, but it's great. So what's next that you're working on? That you want to disclose? Yeah. +So first of all, I want to bring and merge the hybrid search reciprocal rank fusion into Solr, which is coming in 9.7. So I'm very close to the merge. Awesome. And we got some funding from the European Union to work on Solr. +So we're going to be able to contribute more vector-based search capabilities, better integration with learning to rank, better integration with inference endpoints, to make it a little bit more transparent. +And still in the works: multivalued support for vector-based search in Solr. And there are some pieces in Lucene to speed up and optimize vector-based search that are not yet in Solr. And that's among my top priorities. So this, in short. This is fantastic. This is fantastic. +And of course, it's all open source, and anyone can join. Maybe we can also make a call-out and say that everyone who wants to contribute, you know, the more the merrier. +Yeah, I actually enjoy, even though I don't do Solr or Lucene today, I am still reading the mailing lists. And for the most part, I'm reading the Lucene one. So sometimes I see your discussions as well, where you say, actually, by the way, I'm working on this hybrid search, +I did this and this, so maybe it will influence you. And I also love the culture where you do not really enforce or impose your solution. You just say, just for you to know, maybe it will be useful.
And someone says, yeah, awesome. I especially love that discussion. +I forgot the particular topic, but I remember it was a recent one. It's a recent one. So keep up your great work. It's always a pleasure to talk to you, and it looks like it's a tradition that we started, meeting at Berlin Buzzwords. +Yeah, so every two years. We see a lot of people. Yeah, there are a lot of people. So fantastic. Thank you so much, Alessandro. And enjoy the rest of the conference. Thank you. Thank you. \ No newline at end of file diff --git a/transcripts/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md b/transcripts/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md new file mode 100644 index 0000000..196d210 --- /dev/null +++ b/transcripts/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md @@ -0,0 +1,134 @@ +--- +description: '

00:00 Intro

00:30 Greets for Doug

01:46 Apache Solr and + stuff

03:08 Hello LTR project

04:42 Secret sauce of Doug''s continuous + blogging

08:50 SearchArray

13:22 Running complex ML experiments

17:29 + Efficient search orgs

22:58 Writing a book on search and AI

Show + notes:

- Doug''s talk on Learning To Rank at Reddit delivered at the Berlin + Buzzwords 2024 conference: https://www.youtube.com/watch?v=gUtF1gyHsSM

- + Hello LTR: https://github.com/o19s/hello-ltr

- + Lexical search for pandas with SearchArray: https://github.com/softwaredoug/searcharray

- + https://softwaredoug.com/

- + What AI Engineers Should Know about Search: https://softwaredoug.com/blog/2024/06/25/what-ai-engineers-need-to-know-search

- + AI Powered Search: https://www.manning.com/books/ai-powered-search

- + Quepid: https://github.com/o19s/quepid

- + Branching out in your ML / search experiments: https://dvc.org/doc/use-cases

- + Doug on Twitter: https://x.com/softwaredoug

- + Doug on LinkedIn: https://www.linkedin.com/in/softwaredoug/

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240718_110721_6a250f534a47b913cfe9ab7513e63b01.png +pub_date: Thu, 18 Jul 2024 11:10:42 GMT +title: Berlin Buzzwords 2024 - Doug Turnbull - Learning in Public +url: https://rss.com/podcasts/vector-podcast/1572886 +--- + +few years. Cool. Yeah. Hello, how are you? Hi, Doug. It's great meeting you at Berlin Buzzwords. Yeah, I can see you. Yeah, great to see you. It's your second time on the podcast and, yeah, excited to be back. Yeah. Awesome. +I think it's like two years, or only one, I can't keep track. But how have you been? I've been great. I've just been doing traditional learning to rank over at Reddit. And it's been a lot of fun. A lot of it's just meat-and-potatoes stuff. Yeah. +The stuff that I think is really important, like your training data with search, and getting your features right, and that sort of thing. Not actually too much vector search lately. So kind of paving a path in the ranking model space. +And I still think that's really important if you're building a RAG app or a lot of these things. A lot of people are sort of discovering this through the vector route. They're realizing there's a whole other side of information retrieval. Yeah, that's important. +And that's really exciting to me, because I think a lot of new ideas are coming into this space. Yeah, amazing talk as well. I'm sure we'll link it once it's published, the one you just gave. +And you also reminded me, as I told you, of the time when I was working on Solr, starting at version 1.5. Yeah. I was itching to ask you which version you are running, but then I was like, would it matter to me? Well, we were running Solr 7 until recently. +And one of the things you didn't talk about in the talk was having performance problems with Solr 7. Yeah. And moving to Solr 9 fixed it. So it helped with a lot of stability and performance problems.
+And that's just one of those things with a lot of these projects, not just learning to rank, a lot of machine learning projects. +What I find, especially with learning to rank, is that you're often building out and scaling up infrastructure for a certain problem at the same time you're doing machine learning. Yeah. So you're finding these problems. Yeah. +And you will spend weeks or months asking, why is it slow? It's unexpectedly slow. Yeah. What's behind it? And then you realize, oh, Solr 9 doesn't have this problem, and it will resolve it. Yeah. And we were already upgrading. +So like, okay, we can put this aside, we don't have to stress out about this performance problem. But that's why it takes a year, two years sometimes, for these projects to start to show results. Yeah. Yeah. +I also would like to say thank you for your project that I think you started back at OSC, OpenSource Connections. Yeah. Hello LTR. Yeah. Which I think is still out there. That's a great project. Yeah. +It really allowed me to quickly, you know, jump on the train and start moving, because I was actually alone on the team. I did do search before, but it wasn't related to ML at all, right? It was feature engineering, but on a different side of things. And yeah. So thanks for that, really. +Yeah. I think it's really important. And one piece of career advice that's helped me is to learn in public. So a lot of Hello LTR came about when I was learning how to do LTR. And I had to get some examples and try different things out. +And then as I made mistakes, those mistakes became lessons for the LTR training, also built on Hello LTR, that OpenSource Connections does. So I really encourage people: the best teachers are often people actively learning. Yeah. +Because you will encounter the mistakes that the experts forgot about. Yeah.
I couldn't tell you how to learn how a for loop works in Python, because I've done that for too long. +But the person who would have the empathy to teach that really well to someone learning from scratch would probably be my son, if he was learning Python for the first time. +So I really encourage people: be out there speaking, blogging, teaching, because you'll have insight into how to teach your project that the experts won't. Yeah. Yeah. +That's another side of your professional life that amazes me: how do you find time to blog? Those are really deep things sometimes; you go into detail with code, or you offer some thought model. Do you sleep at all? +I think I just have a high tolerance for making mistakes in public. Oh, and also I think a lot of it has to do with having a history degree. Oh, really? I didn't know that. Yeah. So, history and computer science. +When you do history, it's a lot of writing, writing, writing, reading, writing. And a lot of it, when you get to the senior level of history, is not just writing an essay, but: can you write your argument in a single page? Yeah. +Which is funny, because when you're a student, you think, I'm going to make the margins big and I'm going to make the text big so I can take up more space. Yeah. Well, when you start writing a lot, you tend to get really verbose. Yeah. +Then you have to learn to make your arguments exact. Yes. Yes. And then you're shorter. And yeah. So yeah, now that I'm doing the product management role, I do not have a history degree like you, but I have to write some things in a concise way. +Sometimes they say you have to remove half of the page because you're not fitting the page limit. I make that mistake all the time. How many one-pagers are actually like 10 pages?
+And another thing is: never talk about a hypothetical future, because you don't even know yourself whether it will happen or not, right? Yeah. +Either talk about things that you're absolutely certain have happened, or things you're certain are planned already, right? That's how we do product management. Yeah. So it teaches that side of things. +But I guess what I then do is go to blogging, and I use you as a great example there. You go and unleash yourself in blogging and you write what you want, right? But you once said that you became a more successful blogger the moment +you started modeling the specific person you're writing to, not an abstract audience and not yourself. +So is this how you still perceive it? Well, I definitely write to myself six months from now, but the audience I imagine is like a close group of friends. So I almost think about blogging like this, and it's easy for people to imagine: sitting down and writing a long email or Slack message. And what if you just turned that into a blog post? Yeah. +And to me, that's an inspiration. So many times you get excited about something and you want to send a message to your friends and share it. Turn that enthusiasm and that message into a blog post. Those are the best blog posts. +And I also think it's really important to remember it's blogging. It's a step above writing a social media post. It's very informal. Don't take it too seriously. You will make mistakes. Exactly. And do it for fun. Yeah. +But yeah, I do it a lot because, I think there's a meme of someone starting out on something. Yeah. Someone being mid-career, and then someone super senior in their career. +And often, in this meme, the starting out and the super senior are the same.
Yeah. And my version of that is: when you start out, you code and do stuff to impress your friends, like in high school or whatever. +Then you get all worried about having some big impact and impressing the whole world. And when you get super senior again, you're just like, I just want to do cool stuff to impress my friends. Yeah. +Which actually turns out also to be stuff the whole world cares about, because usually your friends are doing cool stuff themselves, like, you know, vector search or cool AI stuff. Yeah. So it turns out the rest of the world finds that interesting too. +But I think it's a really important thing to have an authentic voice. Yeah. And it's also part of building up your profile. Yeah. Yeah. So I think, like, Steve Jobs said the computer is a bicycle for the mind, right? And blogging is also, in a way, a bicycle for the mind. +It helps you rework your own thinking, right? And so is the programming you do on the side, like SearchArray, right? So tell me more about the motivation. Why did you start working on it? So I like to go against the grain a little bit. +I had actually worked on different versions of vector search for a long time before this craze, and on different hacks to make vector search work in the Solr and Elasticsearch world. So now everyone's into vector search. +And I built SearchArray in part because I wanted to do a little bit more native programming, get back to that. I used to do that, I used to be a C programmer. Yeah. I found that vector search is very welcoming to the machine learning and data science community. +But the traditional lexical search engines like Solr and Elasticsearch are very weird. Yeah. You have to know this weird query DSL, you have to understand tokenization and these things.
+So I wanted to take that lexical world and bring it into a data science environment that is very comfortable for machine learning and data science people. Yeah. So I built SearchArray. +The reason I built it, or what it does, is that it's basically a lexical extension to pandas. So if I have text, I can make a pandas column that's just tokenized text, and then I can ask it to score against a keyword and get a BM25 score. +And the main reason I built this was so that in a Colab notebook or something I can quickly test ideas without having to stand up Solr or Elasticsearch, or think about Docker containers and all this stuff. I can see, like, okay, I want to tokenize things a certain way, +I want to change the BM25 scoring to be a certain way. Yeah. And if you think about it, 90% of what you do in a lexical search engine is tweaking the tokenization, yes, tweaking the scoring, trying to index something new and search against it, like, oh, I have an entity recognition field now. +And the other thing you do a lot in lexical search engines is boosting: I want to boost by recency and do these other things. And a lot of that can be done really fast: I take a numerical date column in pandas, yeah, +I take a numpy array that's a BM25 score, I multiply them together, yeah, and I have a recency-weighted score. What does that look like in terms of my offline metrics? Yeah. Interesting. Without having to go off and figure out Elasticsearch and all that esoteric stuff. +So basically it allows you to try out some ideas really quickly, right? But then there will be some offset compared with reality, because the tokenizers in Solr probably work differently. Yeah. +But it will probably be close enough, right?
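The workflow Doug describes — a tokenized-text pandas column, BM25 scores as a plain numpy array, and a recency boost as an elementwise multiply — can be sketched with plain pandas and numpy. This is a minimal illustration of the idea, not SearchArray's actual API; the documents, dates, and decay function are made up:

```python
import numpy as np
import pandas as pd

def bm25_scores(docs, query_term, k1=1.2, b=0.75):
    """Score each tokenized doc against a single query term with BM25."""
    tf = np.array([doc.count(query_term) for doc in docs], dtype=float)
    doc_len = np.array([len(doc) for doc in docs], dtype=float)
    avg_len = doc_len.mean()
    n_docs = len(docs)
    n_containing = (tf > 0).sum()          # document frequency of the term
    idf = np.log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_len))

df = pd.DataFrame({
    "text": ["cheap red shoes", "red running shoes on sale", "blue winter coat"],
    "published": pd.to_datetime(["2024-06-01", "2023-01-15", "2024-05-20"]),
})
df["tokens"] = df["text"].str.split()        # the "tokenized text" column
scores = bm25_scores(df["tokens"], "shoes")  # plain numpy array of BM25 scores

# Recency boost: decay by age in days, then just multiply the two arrays.
age_days = (pd.Timestamp("2024-06-10") - df["published"]).dt.days.to_numpy()
recency = 1.0 / (1.0 + age_days / 365.0)
boosted = scores * recency
print(boosted.argmax())  # the fresher "shoes" doc (index 0) wins
```

Swapping the decay function or the tokenizer is a one-line change here, which is exactly the fast iteration loop being described.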
So also, if you've nailed the signal, like you explained today in your presentation, you know, what about the number of comments, or the recency, and so on. Yeah. I can be on the go and try that really quickly. Yeah. +Yeah. And a lot of times it's a big effort to index some new data into the search engine. Yeah. Right. Yeah. Like, is the upstream system fast enough to handle the load and stay up to date? +And really, to justify a project, you might start with a prototype and say, okay, I just pulled in a small test set of data, and it seems like there's some signal here. Yeah. Let's plan a project around it. Yeah. +And actually at MICES, the conference after Berlin Buzzwords, I'll talk about planning. Yeah. +But to me, a lot of this, how we build better prototypes to build plans, have ideas, and have conversations between engineers, data scientists, and product managers, is really one of the inspirations for SearchArray. +Before, to do this, I'd be like, okay, I have to stand up some examples, yeah, spend time on that, figure out how I'm going to index, allocate a cluster, and whatnot. Yeah. Exactly. Yeah. Oh, that's amazing. +This actually reminded me of when I was working on learning to rank. It was the last project I did in my ten-year tenure at AlphaSense. I said, hey, can I have this really expensive laptop, with something like 30 gigs of RAM and a one-terabyte drive? I thought I needed that much for some reason. And it was an SSD. +And I got it approved. It was like, oh my god. And I spent like a year working on it, on what was kind of a bloated version of SearchArray, because I could do everything on the laptop. +Disconnected, right?
The only problem I remember was tracking my experiment tree, because I would, like, bifurcate, kind of branch out. That's right. +And then, like, okay, should I go back now? Because it looks like I went down the rabbit hole and it isn't adding any value. So I need to go back to the state where it was better, you know, and start off from there. +That sounds a lot like some of the functionality included in the search relevance tuning tool you built, Quepid. Because I remember when I was building that many years ago, it was very much like: every time you tweak the query and submit, it saves that as a try. +Yeah, exactly. You can't fork off stuff, but you can go back and be like, oh, this thing didn't work out, +I'm going to go back. And yeah, it is funny, I have the same feeling, and even in a notebook environment I don't have that, because in notebooks you tweak a little bit, you forget what happened, you're like, why did I think my NDCG was good and now it's bad? +What did I do wrong? You wish the notebook was somehow versioning itself as you were going. Yeah, exactly. Or that somehow the whole environment was versioned, but yeah, that doesn't exist. I wish that kind of thing existed. +There was a tool we were using, but then it was acquired by some company. It was called spell.run. It was basically an integrated Python notebook environment that runs on a cluster, and they were heading in that direction. I was giving them a lot of feedback, +like, hey, can you actually build infrastructure that will also let me maintain my branched-out experiment space? And I think they got acquired before they could do this, and probably they're continuing with it.
+I don't know, but there is another project called DVC, I think, which allows you to basically maintain your experiments as git hashes, right? So you basically git-hash your code along with your data, and then you upload your data, +let's say, to some cloud drive, some abstract one, and you have your code associated with it. So you, or someone else, can restore the experiment. +If only that were frictionless, right? Or even just existed, right? Because I had to literally write things down on a piece of paper to remember what I needed to do. You know, sometimes I would go crazy. +At Shopify we had our search testing infrastructure in notebooks. Everything was in a monorepo, so you could stand up Elasticsearch, and, this being a Rails environment, +all the relevance logic was in a Ruby library that the Rails monolith would load, and it would make the network call to Elasticsearch and do whatever pre- and post-processing. +When we stood up the test environment, we wanted to load the right configs, so we would basically pin the commit hash of the repo it was supposed to be, and it would load the config, and it was amazing. +But yeah, getting reproducible environments, yeah, and experiments, is a challenge. +I mean, at some point your experiment rate will be, you know, limited by how quickly you can deploy things or shuffle things around, right? So yeah, I think so. +This is where infrastructure becomes a big topic, and in your talk I think you spent a good amount of time on it. Yeah, yeah. +I think it's about partnership, right?
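The content-addressed experiment idea described above — hash your code together with your data so an experiment state can be named, stored, and restored — can be sketched in a few lines of Python. This is only an illustration of the concept DVC implements, not DVC's actual tooling; the directory layout, file contents, and metric value are made up:

```python
import hashlib
import json
from pathlib import Path

def snapshot_experiment(code_dir: Path, data_files: list) -> str:
    """Hash code and data together so an experiment state gets a stable
    short name (a content address, like a git hash) it can be restored by."""
    h = hashlib.sha256()
    for path in sorted(code_dir.rglob("*.py")) + sorted(data_files):
        h.update(path.name.encode())
        h.update(path.read_bytes())
    return h.hexdigest()[:12]

# Hypothetical experiment layout, for illustration only:
root = Path("exp_demo")
(root / "src").mkdir(parents=True, exist_ok=True)
(root / "src" / "train.py").write_text("print('fit model v1')\n")
(root / "data.csv").write_text("q,click\nshoes,1\n")

tag = snapshot_experiment(root / "src", [root / "data.csv"])
# Record which snapshot produced which metrics, so you can branch out,
# go down a rabbit hole, and still find your way back to a known state.
registry = {tag: {"ndcg": 0.42}}  # 0.42 is a made-up metric value
print(json.dumps(registry))
```

Changing either the code or the data changes the tag, which is what makes "go back to the state where it was better" possible without a piece of paper.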
So, there has been a big theme in my career too: partnerships. Partnerships with PM, partnerships with data, yeah, partnerships with infrastructure. +You really have to have one cohesive team, and one of the anti-patterns is when they're so separate, yes, that it creates, I agree, a situation where you have to throw a requirement over the fence, yeah, and then a month later maybe you get something back, but it's not quite what you want. +You really have to act like one team, yeah, and search is so multi-functional. +Yeah, yeah, and I've seen it: at Shopify the challenge was that infrastructure was a different org, so we would throw things back and forth over the fence and they'd be not quite right, and we had to figure out the right way to partner. At Reddit +it's a bit more of a challenge in that data is a different group. + So we're sort of throwing things over the fence and getting things back, yeah, and so we have to actively work to make sure those partnerships are healthy. Yeah, it's a big challenge, and I think organizationally there are reasons, beyond search, that companies separate things out. + So it's not like there's an easy solution, but definitely when you get to search and these data products, not just search but recommendations and feeds and things, having cross-functional partnerships, and not only cross-functional partnerships but individuals who can work beyond their domain and wear multiple hats, yeah, is really important. + Yeah, I think you're absolutely spot on, you know. Back in my previous company, with clients at Silo AI, and now, I feel sort of the same. One thing I found, after some friction in the beginning, is that you should try to find the mutual benefits, so that they are driven as well as you. You don't know what the outcome is going to be, because experiments are always like that, right?
+Yeah, but the fact that we are running an experiment cross-department is amazing. Then you go to all these meetings with executives and you praise them, and they in some way praise you and your team, and that's how you get the right thing, right? +But things happen, things happen; you just need to be persistent, I guess. +Yeah, things happen all the time, and, you know, organizational changes are like the weather. + You never know when there's going to be a reorg, you know, someone comes in with a new perspective or whatever, and that's another career lesson: not to get too caught up emotionally when something happens. A lot of times it's just that so many things are out of your control, out in the politics or whatever the organizational changes are going to be. +I like this mental model of circles of control. Yeah, the inner one is your direct control: it's probably you, your time, where you work, specific tasks. +Yeah, the next one is influence: you cannot control it, but you can influence people or things, whatever it is. And the last one is things that may bother you but that you cannot do anything about, the no-control area. +Yeah, so you have to accept it, or move on, or do something, but don't get stuck on it. And things of course keep moving between the circles, it's a dynamic system, but it's still good to be aware of. +Yeah, I'm sure with your way of blogging and book writing and, yeah, all these projects, you actually have that outlet: okay, if this is stuck, I'm going to, you know, de-stress by blogging. Oh, you're right. +I do think the other career advice is: when you get hyper-focused on one thing, you lose the forest for the trees. Yes, so take a step back. Maybe there's a project that you really liked that got canceled or something. Yes.
+ Well, take a step back, and, uh, in a year you won't even remember it. There are so many interesting things to work on, and I think people forget that. You know, I had a brief break between Shopify and Reddit, and I realized what life would be like when I retire, because I would get up, and it's not like I just laid around doing nothing. +I was just like, oh, what could I play with? What could I do? Oh, there's a problem, there's an interesting problem. Yeah, yeah. And, uh, that really opens your eyes to that. There are always, so to speak, more fish in the sea, more problems to work on, more cool stuff. +So yeah, there was even a study that when people retire because they get money, from a lottery or some other way, they go enjoy life, but they also age much quicker. Yeah. And sometimes they, unfortunately, die quicker, because they have nothing to strive for. Yeah. Yeah. +So that's really, really cool advice. And as if all this were not enough, Quepid, SearchArray, blogging, of course work and other things, podcasting now, you're also writing a book. Tell me a bit more about that before we close. Oh, yeah. Yeah. +You just joked on stage that the idea came in 2018. Yeah. So Trey initiated the book; Trey's the primary author. And Trey came to me, I think in 2018, and said, hey, I want to let you know I'm writing a search book. +And, you know, for work and my wife and family and everything, I want to be done in six months. So I'm going to stress everyone out. And here we are. There was a pandemic, there was a lot of stuff that happened, but it's 2024. +And it's funny, because what you might have referred to as AI in 2018, of course, is now LLMs and these things. But it's really exciting.
I think a lot of the things in the book are timeless. Yeah. Techniques. Awesome. +We initially focused the book on Solr, but we took a step back and said, let's make this applicable to many search engines. Yeah. And there are examples being worked on for many platforms. The ecosystem is so huge now. +There are all kinds of vector databases that are even adding lexical search. And then there's, of course, Solr, Elasticsearch, there's OpenSearch, there's Vespa, yeah, in the more traditional space. And I worked primarily on the learning-to-rank content. +So, a lot of the things about how you get training data, train a model, how you evaluate these things, how you expose users to search results that are maybe a bit novel, like doing a little bit of exploration to build out your training data. +All of these things hold regardless of where search goes or where RAG goes. They're still very relevant. +And it feels like, in a way, a lot of how users interact with the world and with products is through some kind of search or some kind of retrieval system. Yeah. +Even a recommendation system or a feed system is feeling more like search, where it's real time and I'm getting stuff updated in real time. And of course, RAG is search. So I think search is still here, right? Taking over the world and all of that. Yeah, exactly. +Yeah. I've noticed a crowd is now gathering to have lunch, and we will have lunch soon as well. It's always a pleasure to talk to you, and finally in person. I think we've never met before, but have you been to Lucene Revolution in 2013, the one in Ireland? +Oh, I've been there as well. Is that the one in San Diego? No, no, in Dublin, in Dublin, yeah. Have you been there? Yes, I was there. Yeah. Yeah.
Uh, back then I didn't dare to, uh, you know, go say hi. But, you know, that's when we introduced Quepid, actually, I think. Yeah. Oh, that's amazing. +Yeah. That's a still-relevant project. You know, it's running 2013 JavaScript. State of the art. Yeah. And I've been consistently deploying it at every company. I agree. So at TomTom, we just released a new algorithm to production thanks to Quepid, in two weeks. +It had been on the shelf, of course, for quite some time, because the team couldn't figure out how to test the quality. And I said, okay, let's just do labeling, right? And let's use Quepid. Yeah. Just simple labels, and we measured a more-than-10% increase in precision with the new algorithm. +And I said, as a product manager, I approve the release. Let's go. That's great. Thanks for creating the tool. Great. Sure, happy to. I love making tools. Yeah. Thanks for your time, Doug. Enjoy the conference. Thank you. Enjoy Berlin. Yes. Thank you. Awesome. Thanks. \ No newline at end of file diff --git a/transcripts/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md b/transcripts/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md new file mode 100644 index 0000000..f343a52 --- /dev/null +++ b/transcripts/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md @@ -0,0 +1,104 @@ +--- +description: '

Video: https://youtu.be/dVIPBxHJ1kQ

00:00 + Intro

00:15 Greets for Sonam

01:02 Importance of metric learning

3:37 + Sonam''s background: Rasa, Qdrant

4:31 What''s EmbedAnything

5:52 What + a user gets

8:48 Do I need to know Rust?

10:18 Call-out to the community

10:35 + Multimodality

12:32 How to evaluate quality of LLM-based systems

16:38 + QA for multimodal use cases

18:17 Place for a human in the LLM craze

19:00 + Use cases for EmbedAnything

20:54 Closing theme (a longer one - enjoy!)

Show + notes:

- GitHub: https://github.com/StarlightSearch/EmbedAnything

- + HuggingFace Candle: https://github.com/huggingface/candle

- + Sonam''s talk on Berlin Buzzwords 2024: https://www.youtube.com/watch?v=YfR3kuSo-XQ

- + Removing GIL from Python: https://peps.python.org/pep-0703

- + Blind pairs in CLIP: https://arxiv.org/abs/2401.06209

- + Dark matter of intelligence: https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/

- + Rasa chatbots: https://github.com/RasaHQ/rasa

- + Prometheus: https://github.com/prometheus-eval/prometheus-eval

- + Dino: https://github.com/facebookresearch/dino

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240919_060938_934c8351e1fe4c81a354cd419d0a3307.png +pub_date: Thu, 19 Sep 2024 11:02:40 GMT +title: Berlin Buzzwords 2024 - Sonam Pankaj - EmbedAnything +url: https://rss.com/podcasts/vector-podcast/1663042 +--- + +Hello there, Vector Podcast! I'm here accompanied by Sonam. Sonam, you are, I guess, a visitor at the conference. Are you also giving a talk? Yes, I'm giving a talk tomorrow on metric learning. +Yeah, what's your topic? I'm not only talking metric learning tomorrow; I'm also very excited about what we are building with EmbedAnything at Starlight. So yeah, awesome. And is it your first time at the conference? Yes, it's the first time, but it's one of the best conferences. Awesome. +Yeah, I love it. I came here first in 2011, and I still love coming back once in a while. It's really good. I can see why you want to come back again and again. Yeah, exactly. Yeah. Awesome. +And you work mostly on, well, we had an episode actually with Qdrant on metric learning, I will make sure to link it. Tell me a bit more about metric learning, if you will. Like, why should everyone care, and should they use it? Maybe, yes. +So a lot of people just think, you know, we can change the distance and then we'll get the similarity. But the thing is, even if you change the distance, it won't make any difference, because those embeddings are already placed in the space; it's already relative. +So if you're doing cosine similarity, "I love pizza" and "I do not love pizza" give you 90% similarity, right? And another distance will not make any difference. +The thing is, with metric learning, you can build your own dataset and then train the embedding model again for your domain, right. Yeah.
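The pizza example above is easy to reproduce. In this sketch, toy bag-of-words vectors stand in for real sentence embeddings (an assumption for illustration; real embedding models are less extreme, but negated pairs still come out far more similar than their meanings warrant):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vocabulary: [i, love, pizza, do, not]
love_pizza     = np.array([1, 1, 1, 0, 0], dtype=float)  # "I love pizza"
not_love_pizza = np.array([1, 1, 1, 1, 1], dtype=float)  # "I do not love pizza"

sim = cosine(love_pizza, not_love_pizza)
print(round(sim, 2))  # 0.77 — near-opposite meanings, high similarity
```

Swapping cosine for Euclidean or dot product does not fix this, because the vectors themselves overlap heavily; that is the point being made: you have to change the embedding space (metric learning), not the distance function.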
+I mean, I'm still trying to understand it, but it's basically like this: on one hand, you have your data, and then you choose the model, and that model is pre-trained for you; you could also fine-tune it on your data if you want. And inherently it will have its own measure of similarity. +So it's not something you can easily control. Yeah. But then metric learning opposes this by saying that you should be in control of your metric. Yeah. +It's your similarity measure, not just the metric itself but the similarity measure, which means that I should kind of drop the model, just take my data, and start training a new network, right? So basically fine-tuning the embedding model +with your data? So yeah, suppose you're classifying intents. Yes. Okay. Where does metric learning really shine? It's classification versus similarity again. If you are doing classification, you are limited to certain classes, right? Suppose, yeah, particular intents. Yeah. +It's not scalable at, like, a million scale, you cannot keep adding classes. But with similarity search and metric learning, you can add any intent. A very clean solution. Yeah. Yeah. So it's not limited. Yeah. +That's one of the classical ways to see that metric learning plays a much, much better role at scale, and that's why vector databases can scale this much. Sure. Yeah. And tell me a bit more about yourself. +How did you end up in this space? Like, what was your path? I know you worked at Rasa as well, which is also an open source project I once looked at. But now you work on something else. What was your journey? And yeah. +So I worked as an AI researcher at Sama; we were mostly in clinical trials. So, you know, Pfizer and the like run clinical trials for 10 to 12 years, and we had that massive data, and we wanted to predict which subjects could drop out of the studies. +I had also published a paper before.
So you're well-versed in AI research and the AI area. Yeah. And then I joined Rasa for conversational AI; I love conversational AI. And then I joined Qdrant, and recently I got into this embedding space. +And now I have my own open source project called EmbedAnything, with which you can use many different multimodal sources and structured sources, and, you know, you get embeddings at 40x the speed of other existing pipelines. Wow. How did you do that? That is Rust. It's all available, +it's all open source, because I'm a huge supporter of open source. So what we do is, we have built this pipeline in Rust, from the PDF all the way to the embedding. One of the analogies I use most is: embedding models are, yeah, really, really cool. +They are becoming faster and everything. But if you want to drive a Porsche, would you like to drive it on a national highway or on a road full of potholes? So that's the analogy being used. +We are giving you a highway for driving your embedding model, your Porsche, you know, in a very sophisticated way with, yeah, no tech debt; you call it via a Python binding for embedding. Interesting. So you are basically building the infrastructure for these embedding models. +So what can I, as a user, do with this project? Yeah, very good question. So we are very production-ready. Yeah. And we do not use any kind of heavy library, right, like libtorch. +So if you have to embed something, you first go on Hugging Face, use sentence-transformers, and then you download this 2.5 GB library, which comes with libtorch and stuff like that. Yeah. And we have removed all those dependencies. All right. So it's much lighter. Yeah, +much lighter. We have made use of Candle from Hugging Face, because Candle is also in Rust, and because we are also building in Rust, it's much easier to integrate with Candle.
So yeah, it's a much lighter, much faster way of doing it. +What is Candle? Candle is basically inference on GPU and CPU. Oh, I see. Yeah. Yeah. And it's also open source. Yeah, it is. Okay. So you do everything unconventionally, in Rust, even though everyone else is doing it in Python. +Because, you know, multithreading is so embedded in Rust. Like, people will tell you that Python can also do multithreading, but that's not true multithreading because of the Global Interpreter Lock. Yeah. And Rust gives you the mutex lock, +so you can achieve true multithreading just with Rust. Yeah. They actually promised to solve the GIL problem in the next Python version. Yeah, there's already a proposal for it. Oh, wow. I don't know when it will materialize, but okay. And so, okay. +If I look at it from the perspective, let's say, of building some product, be it a chatbot or a search engine, you know, blended with vector search or something like that: what does my typical pipeline look like? Right. +So what will I do? Let's say I have my data, and maybe I've chosen a model, but that model is, okay, maybe not the fastest one. +What should I do? Do I turn to your platform to speed it up? Do I turn to your platform for some other things as well? So, we are not making any changes in the model itself. We are not quantizing. +We can use those models: Candle gives you a certain list of models that you can use and work with. Yeah. Basically, whatever Candle supports, we support. Yeah. Whatever Candle doesn't support, we cannot support, because we basically depend on them. Yeah. +So we are not doing anything in the model itself; we are doing it in the extraction and parsing part of the data. Right. If you have different videos, different PDFs, I will extract chunks and parse them, and do that extra fast. Yeah. Yeah.
+And then, let's say I want to go to production, but I also have some other components which maybe you don't integrate, right? My search cluster and something else, my services. So can I also go to production with your platform? Yeah. +Like, what will it look like exactly? Is it a Docker container? First of all, you do not need to code in Rust. A lot of developers reach out to me and, you know, they ask, do I need to know Rust to contribute to EmbedAnything? I'm like, no, you do not. +We have built this wrapper around Rust so that, you know, you can easily work with it from Python. Oh, so you have a Python wrapper of your own. Yeah. You only need to know Python. You do not need to know Rust at all. How interesting. Yeah. +So do you have any instances where companies have already built POCs with your platform? Or do you already have someone going to production? Yeah. So I get so many requests, on the acquisition side of things and stuff like that. +But, you know, it's a one-person, non-company project. We have got 6K downloads, but we haven't gotten to production yet. Hopefully in the next two, three months. Nice. And do you need any help from the community? Yes. +If you're interested in building infrastructure for AI in Rust and Python, do connect with us; we would love to have you on board. All right. But let's go back a little bit. So you also said there is a multimodality element to it. Yeah. +So I will tell you the way I see it, but please correct me or augment me. A couple of years ago, we gave a talk here at Berlin Buzzwords, basically showing a system where you can search images and text, whatever you want.
+And if you have images that do not have textual metadata, then that's your gateway into finding these images, because neural networks will understand and extract the content, using CLIP, right? Yeah. And so we were able to show some really interesting examples. +For example, in the e-commerce context you could find a long sleeveless dress, striped, whatever color, and so on and so forth. And it worked. And even some audience members asked us to demo their queries, and it still worked. +So that showed the power of multimodality, right? And we didn't even need to fine-tune it; it was just out of the box. But coming back to multimodality: what else are you thinking of as part of your backlog? Great question. +Even though CLIP is known for multimodality research, one of the best use cases of CLIP is zero-shot classification, right? It doesn't need prior training data at all, whether it is searching images or searching through text. +And it's so powerful, right? So we have a different example with it. But coming to your question: we want to embed audio, graphs, et cetera. All of these are in the pipeline, but right now we are only embedding text and images. And are you using CLIP? Yeah, we're using CLIP. +CLIP, that's right. I also wanted to ask if you may share your insight on evaluating these systems. +So, one piece of feedback I have gotten, for RAG or anything like that: let's say I have my LLM-based application, you know, how do I evaluate it? Because one of the complaints is that sometimes it gives perfect results and sometimes it gives awful results, right? +And there is nothing in between, right? Or barely. +So how would you solve this? Of course, you start with your metric learning and some other techniques, right?
But there is still the other side of things, when you go to production; as you know, at Rasa and Qdrant and many other companies, you care about quality. +So do you have any insight on that? Are you maybe planning to build something along the lines of evaluation? That's a great question. A great part of the response to it: LLMs are one example of this. +So, you know, an LLM gives you a bright answer, but it also gives you hallucination. A lot of people see hallucination as a bug, but I see it as a feature, because it wouldn't be able to do the creative job otherwise; it can do that with hallucinations. +There are so many tools, right, to measure the retrieval part, like Ragas, Prometheus, right? +But still, I think the recall measure, what we call it, like, you know, measuring how the LLM's recall is working, whether it's basically extracting the most relevant information, not rubbish information. +So those things are really important, and a lot of research is going on, but we are more focused on the infrastructure, and we are trying to keep up. But yeah, mostly I would go for classical testing ways like precision, recall, yeah. +But basically, okay, you test, and you see that once in a while it fails. So first of all, of course, catching that is important, right, when people are going to production. Yes, but what is your way backwards to fixing this? Yeah, from finding that bug. +Okay, let me think about that. +So, maybe you can give some example where you have fixed an issue, you know, reported by someone, not necessarily as part of your platform, but previously. Yeah, so I was working with this, that's why this talk came into my mind, right? +The negation problem. The negation problem is so huge.
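The classical precision/recall testing mentioned above can be sketched in a few lines: collect labeled judgments for a query, then compare the system's top-k ranking against them. The document IDs and relevance labels here are hypothetical, for illustration only:

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Classical retrieval eval: compare the top-k results
    against the set of human-labeled relevant documents."""
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k, hits / len(relevant)

# Hypothetical query: judges marked d1, d3, d5 as relevant.
relevant = {"d1", "d3", "d5"}
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6"]  # the system's ranking

p, r = precision_recall_at_k(retrieved, relevant, k=3)
print(p, r)  # 2 of the top 3 are relevant: precision 2/3, recall 2/3
```

Tracking these numbers per query over time is what catches the "once in a while it fails" cases before users do.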
+You will always find sentences with "not" in every domain, be it biomedical, be it law, and everything, and it still gives you the same similarity, even though you do not have to be a language expert to understand these are different things. Yeah, yeah, yeah. +So, different things, right? That's when metric learning comes in, that's when inference started to come to mind, because inference is very important. +Like, a lot of people have played with SNLI and such, and then they understand that to understand negation, you first need to understand inference. +So there's a method for that, right? Yeah, yeah, yeah: entailment, contradiction, neutral. Two sentences could be neutral, yeah, unrelated to each other, and two sentences could be contradictory to each other. +So, yeah, which means that you need a corpus of data, somehow labeled, yes, yes, logically reasoned through, right, using an algorithm. +Yes, so that's what SNLI is. SNLI is a dataset particularly for these problems, so yeah, if you fine-tune with it. It was fun. So that would be for text, and what about other modalities? +Like images. I know sometimes a model may hallucinate that there is something in the real world, but there is nothing like that. Oh, that's one thing, but I guess there are many. +So there are things like, there's a study called blind pairs in CLIP, that was done by, I'm sorry, I forgot the name. +Yeah, so they found out CLIP actually has blind pairs, like it cannot segment things really well, like, a cat sleeping on the car, or something, and then something else will give you the same description. So there DINO comes in. +So DINO does segmentation with self-supervised learning. Self-supervised learning, I think, is the best invention for this era of AI, thumbs up. Yeah, and in the open source.
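The negation problem described here is easy to reproduce with the classic mean-pooling baseline for sentence embeddings. This toy sketch uses made-up two-dimensional word vectors, but it mirrors what happens with real pretrained embeddings: "not" is a low-content function word, so averaging barely moves the sentence vector.

```python
import numpy as np

# Toy word vectors (hypothetical); function words get small magnitudes,
# roughly as they do in real embedding spaces.
vecs = {
    "the": np.array([0.01, 0.01]),
    "drug": np.array([1.0, 0.2]),
    "is": np.array([0.02, 0.01]),
    "effective": np.array([0.3, 1.0]),
    "not": np.array([0.05, 0.02]),
}

def embed(sentence):
    # Mean-pool word vectors -- the classic sentence-embedding baseline.
    return np.mean([vecs[w] for w in sentence.split()], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(embed("the drug is effective"), embed("the drug is not effective"))
print(round(sim, 3))  # → 1.0, despite the sentences contradicting each other
```

This is exactly why NLI-style supervision (entailment / contradiction / neutral pairs, as in SNLI) is needed: the geometry alone does not encode negation.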
Self-supervised learning. Is it open source? DINO, you say? Yeah, it is. +Yeah, it's from Meta, and I think LeCun is one of the authors, and, you know, he has written this, what do you call it, white paper on the dark matter of intelligence, that is self-supervised learning. So they are doing a lot of work in self-supervised learning. +You know, make the model learn from the data itself. You do not need to label it. Yeah, that's the self-supervised sort of thing. +Okay, then maybe another question I have is, where do you embed a human in this process? Do you ever, like, I don't know, check quality or give feedback? Exactly. So one other thing in metric learning is everyone thinks it's self-supervised, that it will learn from the data. +It doesn't need labels. But when the contrastive learning happens, who is making that negative mining fair? Who is making that negative a very, very crisp one, right? Exactly, where is it learning from? Not just a random negative, but a semantically hard negative. +Semantically negative. Yeah. So there, humans are very common, you know. For sure. Yeah. So, let's say someone wants to use your platform today, your embedding library. It's on GitHub, I'm guessing. We'll link it. Yeah. +And so, part of the story that becomes successful, I guess, is that you can map out your path from use cases to your library, to your project, which would probably be one of the components in the overall picture. +So which scenarios and use cases do you see where your platform can give value? Is it chatbots? Is it vector search? Is it anything at all? +Anywhere where embeddings are used, multi-modal embeddings; I want my library to be the infrastructure that people use for different modalities. +Awesome. Yeah. Well, this sounds really cool, and I wish you all the best in this project. Thank you.
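The hard-negative mining discussed above (picking a "semantically hard" negative rather than a random one) can be sketched in a few lines. This is a simplified illustration with made-up vectors: in triplet-style metric learning, the hard negative is the negative example closest to the anchor, because random negatives are usually too easy to teach the model anything.

```python
import numpy as np

def hardest_negative(anchor, negatives):
    # Return the index of the negative nearest to the anchor in embedding
    # space -- the "crisp" negative that contrastive training benefits from.
    dists = [np.linalg.norm(anchor - n) for n in negatives]
    return int(np.argmin(dists))

anchor = np.array([1.0, 0.0])
negatives = [np.array([5.0, 5.0]),   # easy negative, far away
             np.array([1.2, 0.1]),   # hard negative, deceptively close
             np.array([-4.0, 3.0])]  # easy negative
print(hardest_negative(anchor, negatives))  # → 1
```

Deciding which "close" items are genuinely negative (and not unlabeled positives) is where human judgment tends to enter the loop.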
I hope that some of my listeners will go and check it out, and maybe you will even get some contributors or, you know, users who can create tickets. Yeah. +I would love to see some issues. And, you know, if you want to raise some issues, go ahead, or to add any feature, you can add it as a pull request and we can take a look at it. We are really, really excited. And a lot of developers ask me, do I need to know Rust? No. +You do not need to know Rust at all. Yeah. So you can be, let's say, writing Python and still trigger it. Yeah. Oh, nice. Awesome. Maybe you can use ChatGPT as well to convert your Python to Rust, but that's another story. Awesome. And I look forward to your presentation. +I will not be there, but I will watch the recording, and I will also link this episode and the recording of your talk. Thank you. So good luck with that and thank you so much. Thank you. Thank you so much. Enjoy the content. Yeah. Thank you. \ No newline at end of file diff --git a/transcripts/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md b/transcripts/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md new file mode 100644 index 0000000..a7e0699 --- /dev/null +++ b/transcripts/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md @@ -0,0 +1,54 @@ +--- +description: '

1. Layering problem: www.edge.org/conversation/sean_…-layers-of-reality

2. + Podcast with Etienne Dilocker (SeMI Technologies Co-Founder & CTO): www.youtube.com/watch?v=6lkanzOqhDs

3. + SOC2: linfordco.com/blog/soc-1-vs-soc-2-audit-reports/

4. + Dmitry''s post on 7 Vector Databases: towardsdatascience.com/milvus-pineco…-9c65a3bd0696

5. + Billion-Scale ANN Challenge: big-ann-benchmarks.com/index.html

6. + Weaviate Introduction: www.semi.technology/developers/weaviate/current/ + Newsletter: www.semi.technology/newsletter/

7. + Use case: Scalable Knowledge Graph Search for 60+ million academic papers with Weaviate: + medium.com/keenious/knowledge-…aviate-7964657ec911

8. + Bob''s Twitter: twitter.com/bobvanluijt

9. + Dmitry''s Twitter: twitter.com/DmitryKan

10. + Dmitry''s tech blog: dmitry-kan.medium.com/

' +image_url: https://media.rss.com/vector-podcast/20211223_011215_3e84d5201cd172cc4c9a7c3057bf900a.jpg +pub_date: Thu, 23 Dec 2021 13:17:15 GMT +title: Bob van Luijt (CEO, Semi) on the Weaviate vector search engine +url: https://rss.com/podcasts/vector-podcast/347461 +--- + +That's a vector database, and I'm sure Bob will talk more about what it is and what it isn't. Hey Bob. Hey, thank you for having me, cool to be here. Yeah, thanks for joining. +I know you have a hectic schedule, but it's always nice to pause a little bit and talk about things. And I was thinking maybe we can start off with introductions: can you introduce yourself, your background, and kind of how you ended up working on this product and company? Yeah, sure. +So I started my career as a software engineer and later I moved more into IT and software consultancy, and one of the things is that I was working with a lot of unstructured data, and we're probably going to talk way more about that. + But the story that I have is that years ago I was at a conference in San Francisco, a cloud conference, and back then it had just been announced that there was a change in the Google search algorithm, and you have to bear in mind, this is predating things like transformers and those kinds of things. +This was the time when I think Google Glass was the biggest thing around. And they made a change and they said, well, we're going to go more towards contextual search. We're going to move away from what they called then PageRank to RankBrain. +And one of the things that I was looking into is: is there any company, or are these cloud providers going to provide database technology or search engine technology that actually deals with a similar type of search, so that it becomes easier to search through unstructured data? +And the answer was actually that they weren't looking into it, or maybe they weren't sharing it.
So I was actually at the airport of San Francisco and I just started to work on this idea. And it was coming from a lot of directions back then. +So a lot was happening: knowledge graphs were happening and machine learning was growing. And at some point I thought, hey, actually I do think that there's an opportunity in the market for this. And so I started to work on this. I started to gather a team around me. +And what then happened was that a lot happened in the machine learning space. So think about the transformer models: they were released, they were getting better and better. + And back then we were still looking at having these vector representations, which we can talk a little bit more about in a bit, on the side, and we thought, hey, actually, if we use this we can just solve new use cases and we can build a completely new database, a new search engine. +And so that is the origin story. So that's where I'm coming from and why we started, because unstructured data was a problem. It is still a problem. +And I strongly believe that these kinds of vector search technologies are helping in solving these problems, not only in text but basically for anything you can vectorize. So that can be images, that can be audio, but it could also be, I don't know, the human genome, you name it. +All these things can be vectorized, and it gives another perspective to search through the data. So that's the origin story. That's awesome. That's awesome to hear. And you know, this field is still in many ways emerging, right? +Like the field of, let's say, vector databases per se as products, but also the field of applying them, you know, for different use cases. +But you know, it's interesting, you touched on how you knew about Google disclosing something, and then you knew that the models had also been developing, right?
+Let's say basically you predated that, but then BERT came out, right, and then in other fields, you know, let's say computer vision, automatic speech recognition, they have also been vectorizing in some way; maybe signal processing wasn't vectorizing, but then I guess they started doing it. + And it's interesting: do you think that you basically predicted this field? Like, you didn't know it would happen, you felt that it would happen, but it wasn't at the same scale as it is now. Today we have so many models, right, like, I don't know, Hugging Face making a product out of it and so on. +But do you think there was a real need, or was it kind of a coincidence that yes, now there are models, we are addressing similar problems, you know, but using a different technique, and now you are there with your idea? + Yeah, so there are two sides to how I'm going to answer this question: first about the need, and secondly about when I saw the value. So let me start with the first thing. Unstructured data is huge, and the problem that we currently have with search is that if you know what you're looking for, you can find it; if you don't know what you're looking for, you can't. To make that very simple: if you have a webshop, for example, let's say a grocery store or something like that, and you're looking for medicine because you have a headache, then you must somehow know the name of the product, or somebody needs to tag the product so you can find it, right? So if you have, I don't know, an aspirin, then somebody has to add the keyword, you know, "headache" or something like that, or "painkiller", and even with "painkiller", you know, etc.
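The headache/aspirin tagging problem above is exactly what embedding search sidesteps: keyword search only matches terms someone wrote down, while vector search matches meaning. A toy sketch with made-up two-dimensional vectors (a real system would use a trained text encoder):

```python
import numpy as np

# Hypothetical embeddings: "aspirin" already sits near "headache" in
# embedding space, even though no one tagged it with that keyword.
catalog = {
    "aspirin":        np.array([0.9, 0.1]),
    "washing powder": np.array([0.1, 0.9]),
}
query_word = "headache"
query_vec = np.array([0.85, 0.15])  # stand-in embedding of the query

# Keyword search: exact term match against product names only.
keyword_hits = [name for name in catalog if query_word in name]

# Vector search: rank products by cosine similarity to the query embedding.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

best = max(catalog, key=lambda name: cosine(query_vec, catalog[name]))

print(keyword_hits)  # → [] (no product mentions "headache")
print(best)          # → aspirin
```

The keyword index finds nothing without manual tagging; the embedding comparison recovers the intended product anyway.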
+ What these models solve, and we already talked about the NLP, the natural language processing models, is that you can look in the vicinity of these words. And what I often give as an example, to think about it just as a mental model, is an actual physical grocery store. So the example that I always give is: let's say that I have a shopping list, and the shopping list says apple, banana, washing powder. If you would have a store that is organized as a traditional database, then it could be, for example, in alphabetical order, and it's going to be pretty difficult to actually find what you're looking for, because maybe at the A you might not find the apple, but you have to look at the G, because you're looking for a Granny Smith apple, etc. + And what these vector models do is that they're basically a form of hyperspace, right? You can envision them as a three-dimensional space. So if you walk to the food department in the grocery store and you find an apple, then you know that a banana will be closer by than the washing powder is, and if you move towards the washing powder, you move away from the food section. And that brings me to the second part of my answer, when I knew there was this potential: because I made a very, very simple prototype, which was based back then on GloVe, and the big problem was that people said, there's a problem with disambiguation. So if I have a word with a vector representation, for example "apple", is that related to the fruit apple or to the company Apple? So I did something very simple. I said, well, what if I have a document or a sentence, and again, bear in mind this is predating transformers, what if I take these individual words? So I wrote a very simple script that took these individual words, and I said, I'm going to calculate a new
vector representation, just a centroid, based on these words. So now I said, okay, I have "company" with "apple", so I take all these individual words, calculate the centroid, and now I see if I can somehow make the sentence less ambiguous. And that actually turned out to work rather well, not extremely well, but rather well, and again, we're talking years back now. And then I knew, okay, here is value, because I could think of so many things that you now can index and search in the, air quotes, vicinity of in your vector space. It made it easier to find things, it made it easier to classify things, etc., etc. So that would basically be my answer, how I see that. Yeah, it's a great answer. The way I thought about it is, you bring context to your data, right? If we stay on the text side for the moment: you said apple and banana, you know they are related because they are both fruit, right? But there could be some other related items in our data set we just don't know about, and as long as we encode them, with the right kind of distance metric we can figure out how close they are. So, coming back to your previous example: where we would have used, let's say, an inverted index, we would just store all our items in some alphabetical order and hope for the best, and that order inherently didn't have the context, right? The context was represented in a different way: specifically, in the case of the inverted index, you deal with a dictionary of terms pointing to a postings list, right, speaking in Lucene search engine lingo for the moment, and that postings list is just an ordered list of document IDs, so you don't have much context there either, right? No, exactly. And that is how it brings context. And again, going back to that mental model, the idea that you can have about it, is, as I said, if you take the
building where you have the grocery store: the building would basically be the database, and the model tells you where to put stuff in that building. So that's how it's giving that context. And then the only thing that we need to do, well, I make it sound very simple now, but the thing that we need to do in that database is make it as easy as possible for the end user to navigate through that building, and that is basically what the vector database is doing. So it's taking the data, and we can also talk a little bit more about the features that we have in Weaviate, because that's also something: we don't only store vectors, we also store data objects. Basically, if you bring a data object to Weaviate, you tell it, take this part of the information to vectorize. So for example, if you have a product, then you could say, well, I want to vectorize the title and description. That is vectorized, and then the model tells Weaviate where in that database, or in that vector space, to place that data object. And that is what we try to optimize as much as we can, so that you can search through hundreds of millions of data objects using that model in just mere milliseconds. Yeah, that's fantastic. And I think before we move on to what you are focusing on as a product, which is super exciting, and I mean you're doing a ton of work, I just wanted to close off on that line of thought: that maybe, just maybe, we are on the verge of retiring the inverted index data structure, because it has existed since, I think, the 15th century. Like, the first book where they published an index page at the end, that's an inverted index, because it said, okay, this word occurs on this page. That's an inverted index, right? And so it has existed for multiple centuries. So do you think we are on the verge of replacing it with contextualized embeddings? I mean, that is certainly an exciting thought. I have to note that there are a few things from a
technical perspective where the inverted index is still being used. But one of the things that we've done, for example, in Weaviate, is that we said we double down on the vector, you know, on the contextual search. And yes, every now and then, for example, if you have a product database and you say, show me products for outdoor sports, but they have to be more expensive than 10 bucks, then, you know, both types of indexes kick in. But it definitely starts from the perspective of vector search. And I like your idea. The amount of research that has been released, that we of course also benefit from, is amazing. So I like that thought, I like that idea. Yeah, so on a similar note: I also teach students a little bit at the local university here, and when I explain some basic building blocks of, you know, classical search engine architecture, and I explain the inverted index, then I puzzle them with this question: do you know how old this data structure is? And the students are actually from the linguistics department, so they're not the kind of IT people who care only about code, they also care about the rest of life in many ways. I don't want to blame the IT guys, I'm just saying the students are very multi-dimensional, you know. And they're just puzzled, and they say, okay, maybe 18th century. They don't know. But then I bring up a screenshot of a really old book, from the 15th century, and they're like, really? So I make that connection: hey, we are still using tech that was invented in the 15th century. Yeah, yeah. I mean, I agree with you, and that is extremely exciting, and I think we'll also get into that. But what you also see emerging is these use cases where those kinds of databases and search engines are kind of solved. I mean, of course, it's fair to say it can always be better
and always be more, but those are kind of solved. But what we actually see with these vector search engines is that new use cases and new options pop up, so we can do new things with it, and I think that's very exciting as well. Yeah, absolutely, that's an exciting way to approach this new emerging field: to look for use cases. And I was really wondering, what is it that you are building in the company, your Weaviate database engine? So you said that you had an idea, you know, you started assembling the team, now you give vision, you drive a lot of things in the open source, you're super active. What is it that you are focusing on, you know, for your users? And maybe you can also go into the use cases part. Sure. So it's important to bear in mind that if you look at a solution like Weaviate, you can take two angles to look at it. You can, as I like to call it, look at it bottom up, so that's really through the core technology, but you can also look top down, and that is from the use case perspective. And there are people working on Weaviate, and as you mentioned, it's open source, who are working and talking about it from the bottom-up approach, and I like to take it a little bit more top down: so, what are the things that we can do with it? So let me explain what we're building. At the core, you can see it as three layers, basically. The first layer is the database itself. You can find that database on GitHub, you can find the documentation on the website, and it's just called the Weaviate vector search engine. It's the core database. We see people use the database just to store data objects and their own vector representations. Now, what is very important to know from a use case perspective, and I'm now starting at the lowest level, is that we thought it was very important from the get-go to make sure that people can
not only store the vectors but also the data that they are representing. So, back to the example of the product, but it could be an article or what have you: you can actually store the product, so the price, the name, the description and those kinds of things, and you can say this product has this vector representation. And on top of that we also said we want to be able to connect these data objects together in a more, air quotes, traditional graph format. Just to be clear, we're not a graph database, but it has a graph data model. Now, when we go into use cases, I will share a few cool things that you can do with that. But that is at the core, at the heart, that is the database. And what does that database focus on? It focuses on being a database, so that you have create, read, update and delete functionality, which is easier said than done, and there's a lot of content that my colleague Etienne talks about online if you really want to get into the nitty-gritty. So there's a database that you use in a similar fashion to what you're used to: you take the container, you spin it up, APIs become available, the REST APIs and the GraphQL APIs, and clients are available, Python, Go, Java, what have you, that you can connect to the database. So if you're used to working with a database or a search engine, the same functionality is there. That sits at the core. Then around that we have a first layer, or a second layer, and those are modules. And what these modules do, they do a few things. We've seen, hey, there are actually certain types of, for example, machine learning models that people keep using over and over to get these vector representations. Why not bundle them? So think about the text2vec modules that we have. We have different types for different use cases, where you can say, well, I'm going to throw in that product, and automatically a model is taken to create a vector representation for it. So that's something that
we have. We also have question answering models, spell check models, and you can create your own models. Sorry, I'm saying models, I meant modules, sorry, this is a little bit confusing: models and modules. So I meant modules, and those are available open source as well, and my colleague Laura made a great video on how you can build your own modules. And then we have another layer around that, and then we go a little bit outside of the realm of the software per se, and those are more the packaged use cases that we see. So we see that there's a lot of value in retail, wholesale, e-commerce, in the medical space, the data management space, those kinds of spaces. And what we're doing, and that's mostly my focus, is: okay, if at the core we have this one singular database, what are the packaged things that we can do around it? And that is also where we make a distinction between our users and our customers. Our customers are mostly interested in these packages: you can say, okay, I have a classification problem. Great, you can actually do that with Weaviate, specifically for companies in your industry. It can also be document search for a medical use case, image use cases, image similarity, and so we package them together. Sometimes there is software involved, for example in the form of plugins and those kinds of things, but that is the outer layer. So that is what Weaviate looks like, and that's what you're constantly building, because, as you mentioned before, the vector is kind of a new thing, right, to actually deal with. But a question might be: so what's actually the new thing, right? The other day somebody asked me, a data scientist, well, if I have my vectors, I can just store them in my memory and do some similarities or something. And I said, yes, you can absolutely do that. But what if you want to do
that for a product catalog that might have like 50k products that are constantly changing, and those kinds of things? Then that becomes problematic. So we actually help you to bring these models to production, and what you actually see is that the new use cases that come out of that are tremendously big, and we're just constantly uncovering new ones. So let me give you one example, let's stay with the e-commerce example. In Weaviate we have a data object for a product, and the product has a vector representation that it got from, for example, a transformer model. Then we can also say, well, I have a cart, like a shopping cart. Now, if people add products to the shopping cart, we can calculate new vector representations in real time based on what people have in these carts. So now we can say, hey, based on what you have in your cart, you might be interested in this or that product as well. So now, all of a sudden, it changed from a search engine where you can find products into a recommendation engine for e-commerce, all in one. And those kinds of things we're constantly uncovering, and there's so much more that we can do, from very concrete things like e-commerce to, on the other end of the spectrum, things like vector representations that people are calculating for, like, genomes and those kinds of things. The use cases just keep turning up, you know, almost on a daily basis. Yeah, yeah, that's so great. I wanted to unpack things a little bit, so I understand them well enough, and maybe our listeners will too. So, you know, when you said models and modules: let's say I'm a researcher, I have an embedding model, right, that I've been using and battle-testing. Now, if I want to introduce that model into Weaviate, I will have to create a module which is using this model, is that right? That's correct, yeah. And
I need to extend some API, right, that you provide? Yes, and that is something that we spent a lot of time on, the API design, because I'm a strong believer in developer UX. It needs to be as clean and as easy as possible. So one of the things, for example, that we've done is that we've adopted GraphQL as an interface. And sometimes people ask, well, why GraphQL and not something more expressive like SPARQL or something like that, which is a good question. One of the things that we know is, well, we focus on being a vector database, and we just want to show these data objects with a vector representation, and, you know, sometimes it's possible to have these graph relations, connections, in them, but we're not focusing on being a graph database. We think GraphQL actually does the job, and it's easy for people to understand, it's very intuitive, and I think these kinds of things are very important. So to get back to your question: what we try to do is make it as easy as possible to actually bring your own models to production, and if you say, well, I don't have any models, but I just want to do, you know, a semantic search through, I don't know, resumes, right, then you just pick something off the shelf, shoot it into the API, and you can do that. So, let's say, and this is kind of, which I think is very important in today's world: even though a lot of machine learning and data science happens in Python, you know, when you go, let's say, to web scale, sometimes you cannot use Python anymore, like you need to use, let's say, Go, right, or maybe C bindings in Go and things like that. So your API, is it kind of cross-lingual, meaning I have my model in Python, or maybe in Go, can I kind of plug it in, or do I need to rewrite some layer on top of it to be compatible? Yes, that is a great question, and here comes especially the expertise of the development team in. So what they have
done is, well, we know that that center, right, that database, just needs to be optimized as far as possible, because, you know, let's stick with the e-commerce example: if you use it in production and hundreds of people are searching and you want to give these recommendations, you need to be able to scale it. So you need to choose a language and an architecture that actually supports that, and in our case that's Go. And if you look at the GitHub repository, you will even find assembly optimizations for certain things in there. But we also knew, we said, well, maybe you want to use a model that, for example, has bindings in Python, or you like to work in Python. So one of the things that we did there is, the way that the modules work is that the modules are containerized. There are APIs going between Weaviate and the different modules, and as long as you adhere to these APIs, you can choose whatever language you want to build a module. So in the case of Weaviate, Weaviate itself is completely written in Go, even with the assembly optimizations, those kinds of things, and we have a few modules that are, for example, written in Python, because we use specific types of transformer models that just, you know, run well within Python. So you can do whatever you want within it. When it comes to using Weaviate, you have the database running and you can pick a client, for example the Python client, and have the Python client interact with Weaviate wherever it sits. But if you're building a frontend application, people use, for example, the JavaScript client; I've seen people build React applications with the JavaScript client. So that's why we structured it like that, so that it's easy to use in production. Yeah, that's amazing. And, you know, what you touched on is so important, and close to my heart as well. I've been building
APIs kind of in my free time for a long time, and, you know, what I've noticed with users is that the lower you put the boundary to enter, right, meaning, let's say you have an API and you have published sample clients to use this API in all possible languages, let's say the mainstream ones at least, you know, it will lower the threshold to enter for them, so they will never even contact you, they will just start using it, right? That's the win, right? Yeah, and I even believe, so, to sidestep a little bit from the discussion, but it's interesting to talk about this: I am a strong believer, and I've been touting my horn about this for years already, that the overlap between the tech and the business side is, in my opinion, expressed in the API layer. So if you peel the onion of a tech business, you can go as deep as you want, unless you have a graphical user interface, but if you talk about database technology, those kinds of things, even bigger platforms, if you look at the API, the API describes to you in human language what it's exposing and therefore what the value is that it's creating, and the only thing that you need to do as a business is to capture that value. Yeah, exactly. And I think the saying behind the API platform I was using back then was: software is eating the world, but APIs are eating software. So they were decomposing the software that was sitting on the shelf somewhere in those big companies, small companies, whatever, and now it's introducing the network, right? Like, everyone wants to expose their value through an API, and you can easily consume that value through an API, right? And then you add all these, you know, payment layers and whatnot to actually make it economically feasible. But that's, I think, an exciting direction, and I'm happy to hear that you guys are pursuing that model: basically
making it an API. In many ways a database should be an API, right? It's sitting somewhere and I can connect to it with my client in the language of my choice and handle all the cases I need to handle.

Exactly. And if you look at a nice car, for example, there are two ways you can look at the car, if you compare it to software. The bottom-up way is that the first thing you do is open the hood and look at the beauty of the engine, and maybe you want to know how the engine works under the hood. The top-down way of looking at it is just opening the door, sitting in the chair, holding the interface of the car, the steering wheel, and going: oh, this car drives fantastic, it drives amazing. And my argument, and not everybody agrees, but that's my point of view, is that if you have an amazing engine but a shitty steering wheel, nobody's going to drive your car. The other way around is also true: if you have a beautiful interface in your car and a shitty engine, that also doesn't work. It needs to play well together, and that's again why I'm a strong believer in the UX, the experience that you have in using the technology, because of course an experience is not limited to a graphical user interface; it can also sit in an API, of course.

Yeah, absolutely. And coming back to some of the use cases you brought up, you mentioned the shopping cart. I was actually chatting to Eric Pugh (hey Eric, if you're listening to this) from OpenSource Connections, and out of the blue he was saying: hey, let's imagine a use case. It's a pizza delivery, and you want to encode the journey of your delivery person with no left turns, so only right turns, or forward and backward, but no left
turns, just for whatever reason. And this was a very interesting use case: can I actually express that in the form of an embedding? Probably I can, right? And do some kind of geographical search and say: okay, what's the most similar journey that will bring this pizza from A to B? Sounds a little bit crazy, but that's what I'm thinking: when a use case exists, the journey is to go backward from it to the embedding space, right? And that's very interesting.

Yeah, so let's build a little bit further on that. I like the example, it's a good example, but it's an abstract example, so we might make it a little bit more concrete. Let's say you have a pizza delivery service, and from the moment somebody orders the pizza you have certain data, right? You have data about what's on the pizza, where it's coming from, where the person lives, etc. There are two things you can do: you can encode that information in the vector representation, and if you are fast enough in comparing vectors with old orders, you can say something about that order, and that's where it becomes interesting. So what you now can do is, for example, you might be able to say something about delivery times. Say you're a big pizza chain and you've sold a million pizzas in the past. Now, based on this request, we calculate the vector representation for this order in real time, we do a real-time comparison with orders in the past, and we see: hey, the average of the last, I don't know, 10 orders that were similar was 18 minutes. So now, in real time, you can say something about that. This is just an example of a use case where these vector databases might be extremely valuable, and more and more of these kinds of cases are popping up, so that's extremely exciting in my opinion
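The delivery-time idea can be sketched in a few lines. This is a minimal illustration, not Weaviate code: the order vectors below are hand-made toy embeddings, and a real system would embed the order data with a model and use an ANN index rather than a brute-force scan:

```python
import math

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def estimate_delivery_minutes(order_vec, past_orders, k=3):
    # past_orders: list of (vector, delivery_minutes) for completed orders;
    # average the delivery time of the k most similar past orders
    ranked = sorted(past_orders, key=lambda p: cosine(order_vec, p[0]), reverse=True)
    top = ranked[:k]
    return sum(minutes for _, minutes in top) / len(top)

# toy vectors standing in for real order embeddings (hypothetical values)
past = [
    ([1.0, 0.0, 0.2], 18),
    ([0.9, 0.1, 0.3], 20),
    ([0.0, 1.0, 0.8], 35),
    ([0.1, 0.9, 0.7], 40),
]
print(estimate_delivery_minutes([0.95, 0.05, 0.25], past, k=2))  # 19.0
```

Brute force is fine for a handful of orders; the "million pizzas" case in the conversation is exactly where an ANN index like the one discussed later earns its keep.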
yeah, I mean it sounds so interesting, because there are so many products that still revolve around the idea of, in simplistic terms, the inverted index. I'm using that 15th-century model to represent my data, right? Then if I have images, I'm like: oops, what should I do now? Okay, maybe I can use some extension on top of Lucene, if I'm using Lucene, say, for the sake of it, but it's still kind of a limiting experience. It's like I'm solving my task with the wrong tool, in many ways. And maybe, since I mentioned Lucene, I can keep using it for the sparse search, for the normal inverted-index retrieval, but I can unlock so many new use cases with the vector search, and find new ways to show value to the users. Because especially in traditional search engines, if you return, say, 100 results, users sometimes don't have time to go through them, right? You basically offload a lot. Do you feel that way?

Yeah, certainly, and there are a lot of interesting things in what you're saying. So again, it depends on how you look at it, bottom-up or top-down. You can make an argument from the bottom-up approach: okay, we have the inverted index, or we have the vector index, and we can do certain things with it. But what I also like to do is look at it from a top-down perspective, and what I mean by that is this: if you want to build a project for yourself, or with your students, or for, I don't know, your boss or a customer, whatever, how often do you actually say: okay, the tool that I'm going to use to store my data is an inverted index? It probably doesn't happen that often. I mean, if you go to the websites of these famous big companies that build
databases around inverted indexes, they don't go: this is the best inverted index around, use us. They say something else, right? They say: hey, we help you with enterprise search, or we help you with logging, or with your security needs, those kinds of things. And where we are right now, at the cutting edge of vector search, is that we're talking about it the way you would talk about these inverted indexes. But what I hope, and one of the things we try to do at Weaviate, is to also talk about these new things and what you do with them. So: you can have these recommendation systems in e-commerce, you can do contextual search in e-commerce (I don't know why I'm stuck with e-commerce, but I keep coming back to it), or you can do contextual search in documents, those kinds of things. I was talking about this amazing use case that had to do with a resume. So there's a resume, and let's say the resume says: I'm an IT director and I played in the national Olympic beach volleyball team. That's what it says. And now there's a request: they're looking for somebody who's an IT director and who is interested in playing sports. You're not going to find that person with an inverted index, because there's no direct relationship between "sports" and being in the Olympic beach volleyball team, but with the vector index we can. I actually would like to find more words and better language to talk about these use cases, like contextual search, semantic search, those kinds of things, which I think still sound abstract to a lot of people's ears, but I think it's also very exciting. So there's
this new thing, and it goes for you as well, right? You're also helping with that; you're helping to let the world know: hey, look, there's this new thing, look at the things you can do with it. So the point I'm trying to make is not that I disagree with your point, I agree with your point, but the successful search engines now might be based on inverted indexes, and that's not how we talk about them. And I really think that we're at the cusp of that change, where people start to talk more from the perspective of the use cases and the things that you can build with them.

Yeah, absolutely. It's just that my engineering mind always kicks in and says: hey, but you're basically offering to replace an inverted index with a vector search data structure. But you are totally right. If an electric car company said: hey, buy our cars because we have the best battery, look how good it is, and they supplied some diagrams showing how well it conserves energy and so on, maybe that would appeal to some clients who want to save the planet, say, but the rest of the clients would say: okay, why should I buy your car if it's slower? You didn't focus on the use case. That's what I'm advocating for, and you should always listen to your users on that one.
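The resume example above can be shown with a toy comparison. This is a hypothetical sketch: the "embeddings" are hand-crafted two-dimensional vectors standing in for what a trained language model would produce, so the numbers are illustrative only:

```python
import math

def cos(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# hand-crafted toy vectors standing in for model-produced embeddings (hypothetical)
emb = {
    "sports":     [0.9, 0.1],
    "volleyball": [0.85, 0.2],
    "director":   [0.1, 0.9],
}

def keyword_match(query_term, doc_terms):
    # inverted-index-style retrieval: only an exact term lookup can hit
    return query_term in doc_terms

def vector_match(query_term, doc_terms, threshold=0.95):
    # vector-index-style retrieval: terms with nearby embeddings count as a hit
    return any(cos(emb[query_term], emb[t]) >= threshold for t in doc_terms if t in emb)

resume = ["director", "volleyball"]
print(keyword_match("sports", resume))  # False: "sports" never appears verbatim
print(vector_match("sports", resume))   # True: "volleyball" sits close to "sports"
```

The keyword lookup misses because no term overlaps, while the vector comparison recovers the relationship, which is exactly the gap between the two index types described in the conversation.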
Yes, and what you're saying is very interesting. This is something I was inspired by, something called the layering problem, which basically means this: in the past, and this was the case for me too, maybe 10 years ago or so, I thought that if I just drill down deeper and deeper and understand how something works at the core, that means I understand the whole concept of it. And the more I'm learning and the more I'm working on this, the more I think that's not the case. Let me give you an example. I saw a tweet come by on the day that Coinbase did its IPO, and I'm paraphrasing because I don't remember exactly, but somebody had found the Hacker News post where somebody announced they were working on Coinbase (it might not even have been called Coinbase back then), saying: I'm thinking of building a platform, blah blah blah, something like that. Well, you should have seen the responses, because people were like: no one wants to use that, and that's not what these blockchain technologies are made for, because we actually want to decentralize things, blah blah blah. Regardless of whether you agree or disagree with that statement, I think we can agree on the fact that Coinbase is doing pretty well and bringing a lot of value to people. So the point I'm trying to make with this story is that there's a risk in constantly doing that deep dive and making the deep-dive comparison, which is important and which needs to happen, and where we need to think about the product itself, but we also need to think in these other layers: how will people actually use it? Because bear in mind that the people currently involved in the discussion about these vector databases, who are very vocal about it, are people who are extremely knowledgeable about what's happening under the
hood. But what if you're just working in a company, you're just a normal software engineer, and somebody says: hey, I want to do better product search? You do a Google search on that and you find a solution like Weaviate. You might be interested in knowing what's happening under the hood, but there's a limit to that; you come to it through the use case, not bottom-up. That's how I look at it, so that's the point I want to make.

I really like your approach, because I entered this industry as an engineer and then progressed to work closer to product management. I didn't become a product manager, but when I talk to them, they don't really want to hear too much about the algorithms, right? Because it's not what they think about daily; they think about solving user use cases. And sometimes they may ask: how can I do this, is this possible? They give you a task, and then you go back to your toolbox and you're like: okay, what do I have here? A couple of databases, I have this queue system, okay, let me stitch things together and maybe this will work out. But I agree with you that in many ways we have that risk in engineering of focusing too much on what's closer to us. Say I enjoy using this IDE, I enjoy using this compiler, but what value does it produce beyond me enjoying using it? The end result, that's what matters.

No, absolutely, and don't get me wrong: sometimes when something is released in the software and it pops up on our internal Slack channel, I enjoy that as well, and I play around with it and go: it's amazing that this works, or that we can do this, or how fast things are, or how scalable they are. Don't get me wrong, I enjoy that a lot
but what I also enjoy, and I think that's also the role that I have in this company and the role that I'm trying to play for us when it comes to vector search, is, when you have these product managers, to actually listen to them and see what problem we can solve. I don't think it's the responsibility of the product manager to understand how vector search might apply to their use case; no, we need to be able to express to the product managers how we can bring value to them. Of course there are product managers who say: okay, for the next product we need this or that database, but I don't think there are many asking that question. They ask: okay, we have a lot of data and we can never lose it; architect, or engineer, how are we going to solve that? So it's a different language to talk about these problems, and what we now start to see is this wave coming where people express problems from a product manager perspective, business owner perspective, entrepreneur perspective. They say things like: hey, people keep typing into my search bar and it's giving me a headache, because they don't see results, and then we go: boom, that's a use case for Weaviate.

Yeah, absolutely, and that's a great segue actually to the second part of the show. Product managers answer the question of what we're building, and engineers answer the question of how. So I wanted you to talk a little bit about how you implemented Weaviate, and I understand that Etienne maybe could also talk about it; I think he talked about it recently in a podcast, and I'll make sure to link that in the show notes. But what caught my eye is that if you look at the landscape of the vector databases, some of them are closed source, and most of them up to now are open source, and it's an interesting distinction
because some businesses decide: we will keep it closed because it's at the core of what we offer, and maybe there are some risk elements involved for them, maybe something else, but that's their choice. Your choice was to open-source Weaviate. Can you talk a bit more about it?

Yeah, sure. So that goes back to this: if you have a use case, you can package things together, right? That goes from the lowest level of the technology, where the bits and bytes begin and where the index sits and how it works and how it's optimized and how it's scalable, and then you go up and up and you get to these modules that you might want to use, and then to these packages of additional tools that you might want to use for a specific use case. And then the question is: where does the most value come from, and what do people need to actually use this in production? And you try to somehow capture that value. There are two things that we see in the case of Weaviate, because Weaviate and of course also our competitors evolve in different directions, which is good, I think. So if you look at our enterprise customers, what's very important for them is that they want to have certain SLAs, sometimes they want specific sizing, sometimes they want to use things that are in these packages, sometimes they want specific models. That is where most of the value is coming from for them. They need Weaviate to do its thing, Weaviate is at the heart of it, but it's seldom that people specifically ask for the vector search engine itself. So if you go back to that example I gave: these famous search engines that you now have around are not promoted as "we have the best inverted-index pieces of software that
exist." It goes a little bit the same for us. So then you can say: well, if that's the case, we could consider open-sourcing it, and then you can make a pro list and a con list of open-sourcing, because I have a business model, right? I want to build my business. So what would be the pros of open-sourcing? Well, one is transparency. We're building something completely new, it all sounds very fancy, so we're going to show the world that we can actually do this. I just told you, very fancy, that we even have parts written in assembly in the code; well, you can actually see that, right? You can see how it's optimized, you can see how it functions. The effect of that is that it builds trust. The second thing is, as I mentioned before, we need to learn what the use cases are that people are building with vectors, and we see people building like crazy; our downloads are going up and up and up over time. The other day somebody published a great article about how they indexed 60 million data objects with Weaviate, and they were open-source users. What we learn from that: they're so kind to basically promote Weaviate, we get feedback, they give us help, etc. But there's also another thing: sometimes an open-source user finds a bug or finds something, and the way the software ecosystem is structured, the moment the fix comes in, our customers have that fix as well. So it's a win-win. And the thing is, customers don't mind that there are open-source users, because if a customer, or a prospect, says: hey Bob, can I also use the open-source version? you say: of course you can, but then you manage it yourself and you're stuck with the open-source license if something goes wrong, or you can choose the supported software, and then they go
like: well, we want all that, so then it's interesting for them to buy a license. What's also important to know is that companies like ours are young companies, so you're also trying to position yourself in the field and show what you can do, and I think open source is an amazing vehicle to do that, because as you probably know, the open-source community can be very direct, and that is great, because then you learn from it and you can make things better. So we've learned a lot from the community. All in all, it's currently a net win to have it open source, because it's helping us from an outreach point of view, it helps us build community, and it's not hurting our business strategy.

Yeah, that's well put. But I wanted to come back to open source a little. You did mention these key elements that are positive for you and that embed naturally into your business model, so to say. But there is also one element, if you compare this to closed source. The closed-source way would look like this inside your company: you have internal roadmap planning and your changelog, just releasing stuff, and then you go directly through your sales to upgrade installations, so it becomes a corporate, B2B-deployments type of thing. On the open-source side you need to do upfront work to maintain the connection: you've built the community, but you need to keep talking to the community, right? That's a lot of work as well. So how do you see that part of the story?

Yeah, that's a very interesting question, actually. So in the end I think that community is not just an open-source thing. Let me give you an example: a database company that I find very interesting in how they operate is Snowflake, right? Can you
say that? Because it's a data warehouse, you know, no vector search. I can easily say that.

No, but I find it very interesting. I sometimes talk to people and they tell me: you know what's an amazing company in how they build partnerships and help us build partnerships? That's Snowflake. And I was like: wow, that's interesting. So they explained to me how they do it. The point I'm trying to make is that they apparently are doing a great job in building a community, but they're of course completely closed source. So you need to build a community either way; if people don't like your stuff, they'll move away, and we know of a very famous database company where that is happening. So what we actually learned from them is: you want to have a community, you want to be nice, you want to have great products, because then the best marketing is basically word of mouth. So the point I'm trying to make is that community is not a thing only for open source. You somehow want to show value, and then you build a community of people using your technology, saying something about your technology, etc.

Yeah, absolutely. I guess the essence of my question was: if I maintain it as closed source, I can maintain my own standards, and I can be, say, SOC 2 compliant for the auditing part of things, so my business moves forward. But when I'm open source I need to maintain a different level of standards: documentation, code style, the process of submitting pull requests, how people can influence the Weaviate direction, and other things. It's a lot of support on your side: you support the clients, those that choose your deployments, your hosted version, your cloud, and then you need
to support the community. And I'm not saying this is a bad thing; I'm saying this is a portion of your business model, of your day-to-day life, that is dedicated to that. And you're doing a great job at that, by the way. I'm super amazed, positively: you are always welcoming on Slack, and the member count keeps increasing regularly. When I go back to the Weaviate Slack I'm like: okay, just a few weeks ago it was 150, now it's over 200. What's going on? So you're doing a great job, you and the whole team. But I mean, it's work; that's what I'm trying to say.

Yeah, well, running a startup in general is a lot of work, but I hear your argument and I just don't 100% agree with it, so let me explain why. First of all, take the simple example you mentioned. I don't know if everybody listening to the podcast knows what SOC 2 is, but it's the same, right? You can have an open-source product that is SOC 2 compliant, which is really interesting, again, from a business-model perspective, because you can say: if you use this software open source, it's not SOC 2 compliant; if you use the exact same software with a different license, it is SOC 2 compliant. So that's part of the open-source business model. That's one thing. The second thing, about maintaining documentation, for example: that is true, but if you have a closed-source solution, somebody still needs to use the APIs, so you still need documentation for it, and you still need to maintain that documentation. So I'm not sure that argument still holds. The only thing that is sometimes difficult is that people ask a lot of questions. You sometimes see those on our Slack; they ask a lot of questions, and of course you want to be friendly and you want to answer these questions. There are two things related to that. One is that at some point
of course, you see that people ask a lot of questions, and they keep asking over and over, and sometimes you say, in a friendly way: you know, maybe just watch this video first, or read this part of the documentation first. Sometimes I also do that, just kindly directing them: maybe you want to start there. That is one thing. Oh yeah, and the second thing, and it's something that I always tell the team, is that sometimes an open-source user might ask complicated questions; not complicated in that the question itself is complicated, but just: oh, another question, or why is he or she asking this? But I strongly believe that every question you get has a core of truth to it. So if somebody makes a fuss, if somebody asks a question, then probably others have that problem as well, and the upside of open source is that there's a lower barrier to entry, so people start to ask these questions and you learn from them. And I think it's completely fine, in return, when people ask specific questions, to ask them, not on the public Slack but maybe in the DMs: hey, may I ask what you are building with Weaviate? Because then there's this feedback loop and we're learning from it as well. So I hear your point, but I do think that open source is evolving, and the business models around it are evolving as well, and we're trying to benefit from that, and again, for now, it's a net positive.

Yeah, thanks Bob, it's very clear. The reason I'm asking this question is that there's always something behind your choice, right? It's supporting the idea you are driving. But there was an alternative model as well; you didn't consider it because you didn't want to go that path, right? You didn't want to go the closed-source path for your database, because
as you said, you want to get more feedback loops with the community, you want to learn more about the use cases, and this is a fantastic way of getting that. You show it transparently on the web: you can download it and host it yourself, and, probably, when you run into some issues here and there, we will be there to support you; and you can contribute back, if you get inspired by the tech itself, if you're deep in tech and want to fix some things or introduce a feature. That is amazing.

Yeah, and don't get me wrong, I don't have any problem with closed source; I mean, I can make an argument for closed source as well. But I do think it plays a role in your identity as a company: what kind of company do we want to be and how do we want to show that to the outside world? And yes, that comes with the complexity of needing to deal with it, but in the end it works well. For example, go back to these product managers: what we see is that sometimes you have developers around the table, and, especially with corporates, the developers expect that we have a closed-source solution. Then they see Weaviate, they see it's actually open source, and that makes them very enthusiastic: this is great, I did an installation and played around with it, this is great. Which is then positive feedback back to the product manager, and then everybody's happy. So again, it's currently a net positive. And also, I think, when you build something new, you create a new niche (and we're not alone, but it's also not very crowded, right?), you need to somehow show the world what that niche is and what that niche can do in as many ways as possible. So I dare
to bet you a nice bottle of wine that of the 10 people who told me they really like the fact that Weaviate is open source, only one of them actually looked at the software itself, went into the folders, looked at how it was written. Few people do that, and those people give feedback on it, but a lot of people just say: hey, it's great that you're open about it, we get it, we understand it, and your business model is great. So it's working; it's a friendly way of approaching the market, basically.

Yeah, I would argue so, yeah. And I think, to close off on this: in my previous company, when we were extending Apache Solr, the reason we were extending it was that we had a very specific use case that wasn't solved by the community, and when you go into the Apache Solr documentation you couldn't find a lot of material on that specific topic. So what I had to resort to was reading the source code, and this is something that one of the Elasticsearch book authors, I think Rafał Kuć, said: if you have a question and nobody answers it on the mailing list, and the documentation doesn't have an answer, go and read the code; that's your answer. And if it wasn't open source, what would I do? I would have to engage in some sales loop, or it would put the threshold to enter so unbearably high that I would just say: okay, I will find something else, maybe I will stop working on this problem.

Yeah, exactly, and just knowing that it's there, even if you don't need it, often works as a benefit. What I don't have, and surprisingly a lot of people ask me this, is a moral reason to have something open source; it's just something that works very well for us and how we want to position ourselves, so
it's a great question, but I think, to recap: it's to position vector search, and with that Weaviate, in the world, and to show people that this is something they can do, and I think it's working wonders.

Yeah, absolutely. And maybe we can cover another topic you've mentioned: at the core of Weaviate you're using certain algorithms for the vector search itself, for building the index and for the search, and you mentioned to me in private that you are using HNSW, the hierarchical navigable small world graph algorithm, and that you have customized it. Can you talk a bit more about why you did that? You did mention CRUD, right, that you needed to add CRUD. Yesterday I was checking their repository, and the original authors have actually already added CRUD there, probably because some other use case came in from elsewhere, you know: can you add it? And I was coming with my own new use case, by the way: can I somehow load the first layer of the graph in one go? And then I have to go and read their code; that's another beauty of open source, I can read it, right? But can you talk a bit more about why you customized HNSW, and did you implement it in Go in the end?

Yeah, so two parts to that. I did not do that implementation myself. We referred earlier to that other podcast; if listeners really want to go into the nitty-gritty of it, I would highly recommend listening to that podcast, and I believe you're also going to link it. And the answer to your question was basically already in your question, right? The problem that we needed to solve is that you can take an ANN library, but some of them are immutable, and then if you change something you need to rebuild it again. That is something you don't want in a database
because if you go back to that use case of for example the recommendation engine you somehow want to real time add a product to a card and somehow real time want to deal with that so then you are air quotes limited to a an algorithm that supports that and that was for us um uh H&S W was the was the right fit however um as you you can actually see that also in our documentation we not only have modules but the an n is actually also plug-in so currently we only have H&S W but there's no reason we're looking at others as well that in the future we're going to release other um an n plugins as well within we've it that you as an enthusiast can actually choose what you want to use but the only the the requirement that we have for such an um uh uh an algorithm is there basically when you say like okay we need to somehow have that crot support and or we need to build add crot support to it could even be the case that in the future it's um you know we're gonna support other use case for that's not the case but that's for now and uh to the second part of your question yes that's actually a customly built which you can of course see in the pit of repo you see a full circle yeah that's the value all along right so um and hopefully you know like in the end in the end of the day it's like if you guys and then something in part of H&S W that you implemented in in in glow and you published it as as code the original authors may also look at it right and they might you know take that idea and bring it back to the implementation and then that of course you know just as a side uh product will benefit some other part of of the community but you will be there as well right because like you know it's the authorship or you know the credit that that will be given to you because you did it right so i mean that that that's very interesting like you can benefit and reach out to you to even like new users potentially right or they will know about your existence um through this through this 
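The mutability requirement Bob describes can be illustrated with a toy sketch. This is not HNSW (Weaviate's implementation is in Go); it is a hypothetical brute-force stand-in, with invented names like `ToyVectorIndex`, showing only the property under discussion: a database-grade vector index must handle inserts and deletes in real time, without a full rebuild.

```python
import numpy as np

class ToyVectorIndex:
    """Toy mutable vector index with CRUD-style operations.

    A brute-force stand-in used only to illustrate the point from the
    conversation: adds and deletes take effect immediately, with no
    index rebuild. Real HNSW achieves this with graph updates.
    """

    def __init__(self):
        self.vectors = {}  # object id -> vector

    def add(self, obj_id, vec):    # Create / Update
        self.vectors[obj_id] = np.asarray(vec, dtype=float)

    def delete(self, obj_id):      # Delete, no rebuild needed
        self.vectors.pop(obj_id, None)

    def search(self, query, k=3):  # Read: k nearest by cosine similarity
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        scored = [(float(q @ (v / np.linalg.norm(v))), obj_id)
                  for obj_id, v in self.vectors.items()]
        scored.sort(reverse=True)
        return [obj_id for _, obj_id in scored[:k]]

# A product is added to a cart "in real time": the index reflects it
# immediately, and a delete likewise needs no rebuild.
index = ToyVectorIndex()
index.add("shoes", [1.0, 0.0])
index.add("socks", [0.9, 0.1])
index.add("laptop", [0.0, 1.0])
print(index.search([1.0, 0.05], k=2))  # → ['shoes', 'socks']
index.delete("socks")
print(index.search([1.0, 0.05], k=2))  # → ['shoes', 'laptop']
```

An immutable library would instead force a rebuild of the whole structure on every `add` or `delete`, which is exactly what rules those libraries out for database use.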
link.

No, exactly. And the thing is, what happens with a solution like Weaviate is that, yes, it has that ANN algorithm at its heart, at its core, but there's so much around it: the scalability, the capabilities it has, the way of storing the data objects. Building that yourself... it's very comparable to, for example, Lucene and Solr, or Lucene and Elasticsearch. So the way I like to talk about it, as a comparison: take the ANN algorithm, in your mind, as Lucene. The comparison isn't a hundred percent correct, but it makes my point. And then that whole thing that, for example, Solr or Elasticsearch build around it, that is what we're trying to do with these algorithms. That's how you could compare it. Rather than just saying we'll give you this out of the box, I do want to reiterate: yes, we have these power users who really want to know the nitty-gritty, who want to make those changes, but the majority of users, I like to call them the silent majority, just have: okay, I have a hundred thousand documents, I know a Hugging Face model that I like, how am I going to quickly search through it, period. And then they find Weaviate. That's the majority of users, and they probably don't even know what HNSW is, which is perfectly fine, because they do other things; they just sit in another layer of the layering I mentioned. So I think the cool thing about our modular system is that we can make the power users at the core happy, but we can also make the more generic developers, or full-stack developers, happy, and even, in the outer layer, the product managers. That's what we focus on, but all on a single core and a set of modules. It's not that we have two types of Weaviate; one Weaviate supports all these use cases.

Yeah. And for those of us who really want to go deep into detail, the analogy that just came to my mind is: if you take MySQL, or some SQL database, when you choose the index type for the field you index, and you're thinking, okay, is it going to be a B-tree, or full text, or some other data structure the SQL database offers you, that's when you start asking questions: what is the trade-off of choosing this version or the other? But you may also just index the data and solve your use case first, and only improve when your product manager comes back to you and says: hey, why is this slower than yesterday, can you improve it?

Exactly, exactly. And what I find interesting and important, at the cutting edge where vector search in general now sits, is that yes, it's very important to be transparent about, to share about, or not share, depending on your open or closed strategy, how these things work, what kind of algorithms are used, et cetera. That's also, as you mentioned earlier, something we are very active in; for example, my colleague Etienne is doing that from the core of Weaviate, and my other colleague Laura is doing that more from the GraphQL perspective, like what happens to these queries when they reach Weaviate, et cetera. But I think it's also important, if we take your MySQL example, to ask: what do people use MySQL for, or in our case, what do they use vector search for, and how can I communicate with people who have absolutely no idea what an ANN algorithm is? And I think those are the three pillars we stand on: the core, the interface, and the use cases. And we try to
cover these three pillars, all based on one codebase.

Yeah, that's fantastic. And I will link the, what is it, selfish plug: the blog post that basically explains a few details of six vector databases, a couple of which are closed source, the rest open source. You can actually see for yourself what is happening in those databases; there's so much material there. And don't go too technical yet, stay in the use-case part. I hope we can, as a community, collaborate more in bringing out these use cases, highlighting them. I just alluded to similarity code search, or encoding software viruses into that vector representation and then searching for similar viruses when you need to. There are so many use cases that it's our job, in many ways, to connect them with the tech, and you're doing a great job there, really. So don't fixate on the tech yet, because the tech is going to improve over time. There will be new cool algorithms; by the way, with the billion-scale competition going on, there will probably be new algorithms that perform better, and eventually performance will stop mattering, in a way. It will be something else; it will be: okay, what can I do with this?

Yeah. And let me, if I may, make a quick metaphor there; this metaphor has probably been used many times for technology, but just to make it anyway: the other day I ate at a great restaurant with a friend of mine. Amazing food, great atmosphere, everything. That would be the metaphor for the use case you have. But my friend said: I actually want to know how they make this. So she also bought a cookbook with the recipes in it, so she could go into the book, go a level deeper, and actually see how the chef prepared the dishes. Which is fine; I just wanted to have some nice food and a nice glass of wine, and she wanted to know a little bit more. And I think if we're smart about this, and I think that's also where open source business development is now, in, you know, 2021, almost 2022, you can actually cater to that whole stack, if you do it smart, because the levels click into each other. So the point is, like you said: the technology, and talking about the technology, is very important, but I don't think it's the only thing we should talk about. I think we should make sure we talk about both, and that they are constantly aligned. That's also aligned in how we talk about Weaviate internally, but people use different words and different ways to describe the technology: the people helping me on the business development side use different words, they never talk about HNSW like the people in the core tech team do, but they have a great understanding of the use cases that are being solved. I think that all needs to come together, and we're at an amazing point in time where that's happening.

Perfect. So I think that's just amazing. And by the way, kudos to you as well: you're carving out your own niche there, so good for you. It's cool to see you being independent in this field, because it also opens doors to talk to guys like you, really. If I was your competitor, maybe you wouldn't necessarily want to talk to me; it would be a different discussion. Yes. And at the same time, as I said in the first episode, I'm actually educating myself a lot on this, so in the process I hope to share the learnings and benefit
everyone, including myself. So that's the way to go forward, kind of on the open source side: I'm open sourcing myself, more or less.

Exactly. Hey Bob, you've really shared so much insight into what you do on the product side at Weaviate, as well as the technology. I want to drill more into the philosophical level: why do you do this? And when I say why, I mean you personally; that probably propagates to your team as well, and we could even ask everyone on your team why you do this, but you are the visionary, you are the core of this. What brings you to this field? You're at the forefront of it.

That's a great question, by the way, so thanks for asking. When I was experimenting with these models and these representations, I started to do research on how big this actually is, how much unstructured data there actually is that we could potentially help with. And I found that it was so big. At some point I was literally just walking down the street and, like in The Matrix, when you see everything as the matrix, I thought: wow, every company I see, every truck I see driving by, an airplane I see coming over, a warehouse I see, they could all potentially use Weaviate. And that's the dream: that I can just walk down the street and go, oh, you see that truck there? That company uses us to do X. You see the hospital over there? They use us to do Y. Et cetera, et cetera. It is such a thrill to be in a new niche, trying to build that product, build it into a solid product, and bring that new product to people to solve new problems. That is a personal driver, plus it's just something I'm personally very interested in. And something you probably already noticed, by the way, in
my answers, and it grew over time: I became interested in that layer between the tech and how people use it. There's this overlap there, and I'm interested in that overlap: how do people use the technology, how does that create value, how can we bring it to them, and how can we capture some of that value? That is something I'm extremely interested in, and this technology, this product, is the vehicle to do that. So if everything is this big, and we believe in this new niche with new database technology, then let's just go all in and see how far we can bring it. There's way more to say about this, but it's such an exciting time to work in this. So that's my personal reason why I do this.

So, since you are so big on use cases, is there a specific use case that drives you, one that got solved, or maybe hasn't been solved yet? By the way, in your videos you quite frequently say: okay, imagine a wine store. And I'm thinking there is probably good wine in Holland; when I travel, let's get together and drink some good wine. But is that the use case that drives you, or is there something else you think could be solved by Weaviate?

Oh no. So what drives me, and I'm doing this from the top of my head, is that you could look at the use cases from the perspective of size: large corporates are working with Weaviate, trying to solve problems where I go: this is amazing that they use this. That is something that drives me, on the one hand. On the other hand, what drives me is people looking at Weaviate to use it where it has an impact on people's lives: that can be medical use cases, or even going as far as the HR example I gave. And I'm a little bit vague about these use cases because we're still working on them. But when they're big, or when they have an important positive impact on people's lives, that is amazing. If I present results for these big use cases to certain people and you see their eyes open, they go: wow, I can actually do that? That's amazing; that's the coolest thing around. And there's also something, and I don't want to sound too vague about it, something exciting about machine learning, I guess, and especially NLP. One of the things we're working on, and hopefully are going to release very soon as a demo, is that we loaded the complete Wikipedia into Weaviate, just the whole thing; we're talking about almost a hundred million paragraphs. I watched an Anthony Hopkins movie the other day, and I typed into Weaviate just, you know, "which actor played Hannibal Lecter", or something, and in a few milliseconds it says Anthony Hopkins, and you go: whoa. That is so cool when it actually works; it just gives me a thrill. So I would say these three things are why I'm doing this.

Yeah, that's super exciting. If someone asked me the same question, then the word semantics, the similarity, that would drive me, because I actually did my PhD in machine translation, and my supervisor developed a semantic parser, not a syntactic parser, and I still cannot find an analogy for this work on the market. It was driving me, the way he explained it: hey, I
really read Tolstoy now, with my parser, every single day. And it fails, I fix it; it fails, I fix it. But sometimes it amazes me, because Tolstoy tends to create such long sentences that they can take several pages in the Russian, right? I don't know how it looks in translation, by the way, I've never seen the translation, but in the Russian source. And Tolstoy rewrote his books nine times; War and Peace is like several books. So he was basically compiling the language using his parser, and he was fascinated by this: okay, this is the semantic layer. And I was constantly thinking, I defended my thesis in 2011, okay, how can I apply this tech in real life? It was very difficult, because this parser is implemented in Forth, and I don't code in Forth. I don't know if you've heard of this language; it's used in industry, it's high performance, and it's, you know, functional: you can express many things with just one single word, and then it just unwinds behind the scenes, and you're like: okay, how do I debug this? There was a port to Java as well, done by another student. But I was constantly fascinated by this field: how can I bring semantics into the world of numbers, into the world of, well, let's say, inverted index systems, since I mentioned it?

Yeah, no, I like this very much, what you're saying. And this goes, again, a little bit out of the realm of the technology; I mean, you use the technology, but it goes beyond it. I remember, before the whole pandemic, I was speaking at a conference in London, and they were so nice to have me on the large stage, so I had this big screen, and I'm talking about Weaviate and giving a demo, and I could see the first few front rows. And, like what you
said before about the Q&A example: I give a Q&A example, a real demo, and what I always do, the moment I present to an audience and click to execute the query and get the response back, is look at the people I'm presenting to. And you see these people sit up just to watch your screen, and they go: whoa. And I go: that's such a cool thing. Something that we as a team came up with, that everybody participated in building, and then people enjoy seeing it and enjoy using it; that's amazing, that's just fantastic. And language has that additional element to it: it's how we communicate, and we get closer to having machines communicate, so that we can communicate with them in more natural language. That's amazing. So that would be a fourth reason for doing it.

Yeah, that's fantastic, that's so deep, and I'm happy that we also connect on that topic, the semantics. You can park all the other items, technology, use cases, product, but the semantics part, where we are driving this thing, is just amazing, and I'm happy for you guys to be doing this. And I think part of the excitement of being at the edge of things is also launching things, announcing something. Is there something you would like to announce?

Yeah, certainly. So one thing I already mentioned a bit: we're going to launch that huge dataset, the whole big Wikipedia with everything, for people to try out and play around with, so they can see how big it actually gets. And related to that, we actually already have a pre-release, but it's going to be released very soon as a standard
release: everything related to horizontal scalability, so that people can scale from the millions into the billions. We're starting to get these kinds of questions in, and we're very close, and I mean very, very close, because the pre-releases are already out there. And then it goes full circle back to what people ask; I sometimes get emails from people saying: we have X billion vectors, can you handle that? And you could say, well, other choices probably can too, but then I go: can you also store the metadata? We can also store the metadata, and then people get really excited. So it just keeps going and going. Those are the two big things I wanted to share, because a lot of people are asking for this, and we're probably going to make a lot of people happy when it's out of pre-release.

Yeah, that's fantastic. It all sounds so big. I'm actually spending my time now figuring out how to scale; I participate in the billion-scale competition, by the way, and I'm waiting with excitement for this release, because I would like to learn things from you guys as well. You've solved that, and you open source many things, and you will open source this part as well, so what you do is amazing, and I'm waiting with excitement to learn what you've done. So thanks so much, and thanks so much for your time as well. We went so deep today, in many areas, and I'm sure we can talk more at some point down the road.

We probably can. Thank you so much for having me, and keep up the great work, because you're doing a great job in the community and in the industry.

Yeah, thanks so much, Bob. My pleasure, and see you next time. Bye. Cool, thank you, bye. [music]
\ No newline at end of file
diff --git
a/transcripts/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md b/transcripts/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md new file mode 100644 index 0000000..9b30f29 --- /dev/null +++ b/transcripts/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md @@ -0,0 +1,282 @@ +--- +description: '

Vector Podcast website: https://vectorpodcast.com

Get your copy of John''s new book "Prompt Engineering for LLMs: The Art and Science of Building Large Language Model–Based Applications": https://amzn.to/4fMj2Ef

John Berryman is the founder and principal consultant of Arcturus Labs, where he specializes in AI application development (Agency and RAG). As an early engineer on GitHub Copilot, John contributed to the development of its completions and chat functionalities, working at the forefront of AI-assisted coding tools. John is coauthor of "Prompt Engineering for LLMs" (O''Reilly). Before his work on Copilot, John''s focus was search technology. His diverse experience includes helping to develop a next-generation search system for the US Patent Office, building search and recommendations for Eventbrite, and contributing to GitHub''s code search infrastructure. John is also coauthor of "Relevant Search" (Manning), a book that distills his expertise in the field. John''s unique background, spanning both cutting-edge AI applications and foundational search technologies, positions him at the forefront of innovation in LLM applications and information retrieval.

00:00 Intro

02:19 John''s background and story in search and ML

06:03 Is RAG just a prompt engineering technique?

10:15 John''s progression from a search engineer to ML researcher

13:40 LLM predictability vs more traditional programming

22:31 Code assist with GitHub Copilot

29:44 Role of keyword search for code at GitHub

35:01 GenAI: existential risk or pure magic? AI Natives

39:40 What are Artifacts

46:59 Demo!

55:13 Typed artifacts, tools, accordion artifacts

56:21 From Web 2.0 to Idea exchange

57:51 Spam will transform into Slop

58:56 John''s new book and Arcturus Labs intro

Show notes:

- John Berryman on X: https://x.com/JnBrymn

- Arcturus Labs: https://arcturus-labs.com/

- John''s blog on Artifacts (see demo in the episode): https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/

Watch on https://youtu.be/60HAtHVBYj8

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20250209_090249_4151453caa902e94e1bbf399c57f535b.png
pub_date: Mon, 10 Feb 2025 03:21:48 GMT
title: Code search, Copilot, LLM prompting with empathy and Artifacts with John Berryman
url: https://rss.com/podcasts/vector-podcast/1888857
---

Hello everyone, Vector Podcast is back, season 3. We are wrapping up the season with some really, really juicy episodes, and I'm sure you will love this one. I have the privilege of talking to John Berryman today. He is a former senior machine learning researcher who worked on GitHub Copilot. Currently, he runs his own consultancy, Arcturus Labs; I'm sure he will talk more about that. Welcome, John.

Good to be here. How's it going?

Awesome. I actually just picked up your book, the one that you and Doug Turnbull wrote together. I've interviewed Doug a couple of times already on the podcast; he has a lot to say. And I realized you'd written this book together. It's my go-to source of wisdom on search. Do you still remember which chapters you covered?

Oh my gosh, it's been a long time. If you told me a chapter title, I could probably say whether it was me or Doug. I did all the fun ones; Doug did all the hard ones. And we both did chapter one in our own time; I mean, we were both in chapter one twice.

Let me read some, maybe not everything: the search relevance problem, search under the hood, debugging your first relevance problem, taming tokens, basic multifield search, shaping the relevance function, providing relevance feedback, the relevance-centered enterprise. That's interesting. And then semantic and personalized search. Wow. When was this published?

I think we published that in 2016, I believe.

Yeah, 2016. It's been almost 10 years. That's the version I have. So you do have semantic search at the end there. Awesome. But yeah, John, it would be interesting for you to introduce yourself to our audience. What's your background?
How did you get here? What are you up to?

Oh, well, I guess that's a long story. I've had a very circuitous path. I started out in aerospace engineering because I liked the math. And as I got into the field, I found that the thing I really liked wasn't the satellites; it was the math and the software. You could do anything with those. So while everyone was geeking out about satellites and stuff, which I thought was really cool, I realized there's a big, big world out there, and you could address the whole thing with software and math. So I branched out, and that got me to that book in your hand. My next big adventure was into search. I joined a search consultancy in Charlottesville, Virginia, and worked with Doug Turnbull. I had amazing adventures, hopping on planes; I talked to Zappos about shoe sales and worked with the Patent Office. And then I got the opportunity to write that book with Doug, which pushed me really, really far. I got the opportunity to start working for some really interesting companies: I went to Eventbrite and built out their search and recommendations. And then I got a chance to parlay that into GitHub. So I went to GitHub and built out their Elasticsearch-based code search infrastructure. The old search infrastructure had smoke coming out of it, so we came in and rebuilt it from the ground up. And after a while, search was fun, but I was always trying to get a little bit back towards math, towards data science. In about 2021 I got my chance to make the leap: I joined data science at GitHub. And from there I ended up getting the opportunity, just right place, right time, to join Copilot, because that was, you know, machine learning type stuff, and I was in the data science group at that point. I came onto Copilot after the research team had wrapped up. There was a research team, brilliant people from GitHub Next.
They said: well, look at these large language models, they're going to do amazing things; I think it's time. And they built this prototype. Then I came in on the team that was there when it was going into production: how to get this shipped to everyone, how to start improving it, how to measure what was working and what wasn't. From there I went into chat, Copilot Chat; I was working with some of those features inside the web app. And finally I thought: well, I've got a little bit of knowledge in my head now, time to write another book. I connected with one of the research scientists from the original team, Albert Ziegler, and we wrote the book, Prompt Engineering for LLMs. It's about building LLM applications. With that just published two weeks ago, officially published, I have started out on a new adventure yet again: I am running Arcturus Labs, I'm an indie consultant. And I'm focusing on everything large language models: prompt engineering, how to build applications, feasibility, evaluations, stuff like that. Kind of anything you want at this point. And it's a blast.

Well, fantastic journey. Thanks for sharing that. Believe it or not, I actually advertised your recent book, the Prompt Engineering one, to my students on a recent course on LLMs and generative AI that I taught with my former colleagues. I took the chapter on RAG, and I thought that RAG is nothing else than prompt engineering, really.

Well, yeah, it's interesting. I mean, that's a topic in and of itself. Are we going to open that can of worms? Of course. Sure.

Yeah, RAG is an interesting thing, because everyone talks about RAG as if it's its own entity, a special thing.
But if you look at it, especially from my background, which has been search and then large language models, you can see that RAG is search and then large language models. And if you combine them both, it's really hard to get a good understanding of what's working and what's not. You throw up the basic chain application, connect the data source, and I guess you just pray that it works. But really, if you break it down into its components, you've got a search application and a prompt-engineered large language model application, and they overlap, but a lot of it is downstream. If you can look at those two chunks separately, it becomes a lot easier to debug problems. Rather than saying "the user asked this question and I got a garbage answer", you can say: the user asked this question, the large language model interpreted it as this search, the search returned these results, and maybe that's where the problem was. And you can start debugging that. Or the search results got interpreted this way, and maybe you're not presenting them right to the model. So the name of the game, with probably everything we're going to talk about today, is: figure out how to take this giant black box, break it down into components, figure out what it's made of and what could possibly be going wrong, put sensors in there, and actually debug it.

Yeah, you're absolutely right. And in the lecture I actually borrowed code from someone, I forgot their name, but I'll make sure to link it. We built a RAG system from the ground up without using any framework whatsoever. We didn't even mention LangChain; that's one way of doing it, for sure. But we really just built naive k-NN search and used a model out of the box, Sentence-BERT.
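The decomposition John describes, treating RAG as a search step plus an LLM step with an inspectable seam between them, can be sketched as follows. Everything here is hypothetical scaffolding: the retriever is a naive keyword-overlap stub standing in for vector search, and `call_llm` is a placeholder, not a real model API.

```python
# Sketch: RAG as two separately debuggable components. The point is the
# "sensors" at the seam, so a garbage answer can be traced to either the
# retrieval step or the prompting step.

def retrieve(query, corpus, k=2):
    """Search component: naive keyword-overlap scoring (stand-in for k-NN)."""
    terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(terms & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """Prompt-engineering component: how results are presented to the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def call_llm(prompt):
    """Placeholder for a real model call."""
    return "(model answer would go here)"

def answer(query, corpus, trace=None):
    passages = retrieve(query, corpus)
    prompt = build_prompt(query, passages)
    if trace is not None:  # the "sensors": capture each intermediate step
        trace["passages"] = passages
        trace["prompt"] = prompt
    return call_llm(prompt)

corpus = [
    "HNSW is a graph-based ANN algorithm.",
    "LangChain is a framework for LLM apps.",
    "Weaviate stores vectors and metadata.",
]
trace = {}
answer("what ANN algorithm is graph-based", corpus, trace)
# Inspect the seam: did retrieval go wrong, or the prompting?
print(trace["passages"])
```

With the trace captured, "user asked X, got garbage" becomes two separate questions: did `retrieve` return the right passages, and did `build_prompt` present them usably to the model.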
+And then I noticed that, because we did use dot product there, it would favor longer passages over shorter ones, right? For example, it would pull up an appendix of the AI-Powered Search book. +And I was like, you could clearly see that it's missing the point. It's not able to pull up the one short sentence where the answer lies. It just pulls something else remotely related. And that's exactly what you said, right? Like, you need to start debugging what's going on there. +And you need to start figuring out: maybe change the model, maybe change the chunking. But yeah, I agree. It felt a bit like a black box, but less so when you implement it ground up, right? So you don't depend on any framework. +And when you implement it ground up, you find out that it's not all that complicated. And once you've built every piece of it, like, you know, I mean, you've already seen the black box broken down into its sub-pieces. It's not a black box anymore. +So yeah, typically, since the whole industry now is sorting itself out, trying to figure out what tools are useful and what tools are not going to be useful, I often advocate that people start as close to the metal as possible. +Because these models are actually pretty friendly, pretty fun to play with. Don't put layers on top that obfuscate, you know, what's actually happening. Yeah, absolutely. I'm really itching to ask you more about your time at GitHub. +But before that, I also want to take a little look at your approach, how you view your career, right? So you worked on search, but then you ended up in the hottest place anyway, applying all the LLMs, right? +And you needed to convert, in a way, to an ML engineer. +Do you view it that way? And also, if you do, how did you prepare yourself to become a machine learning researcher, actually, not even an engineer, right? You are focusing on research aspects of things.
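The length bias mentioned above is easy to demonstrate: a raw dot product rewards vector magnitude, so a long, weakly related passage can outscore a short, on-point one, while cosine similarity (normalizing by vector length) does not. The 2-D vectors here are toy data chosen to illustrate the effect:

```python
import numpy as np

# Toy illustration of dot-product length bias in retrieval scoring.
query = np.array([1.0, 0.0])
relevant_short = np.array([0.9, 0.1])   # points almost exactly at the query
long_appendix = np.array([3.0, 3.0])    # weak match, but large magnitude

def dot(a, b):
    return float(a @ b)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Raw dot product prefers the big vector...
print(dot(query, long_appendix), dot(query, relevant_short))
# ...while cosine similarity prefers the passage that actually matches.
print(cosine(query, relevant_short), cosine(query, long_appendix))
```

Switching the scoring function (or unit-normalizing embeddings before indexing) is one of the first things to try when long appendices keep winning.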
So you needed to move the needle in the research space. +I don't know if I have a good answer for you. Like, if anyone thinks my career has been successful, and in many ways I've done all right, it's been luck, like tripping and falling uphill. Every time I fall down, it's in the uphill direction. It's like the hand of Providence. +And so what do I do with any of these crazy jumps that I make to prepare? Pretty much, I just take the jump. I'd like to say I plan how I'm going to prepare for the next jump. But I see the jump, and then I jump into it, and almost drown every single time before surviving. +So in this particular case, yeah, the move towards AI researcher, I mean, there's a lot of weight in that phrase that maybe I don't necessarily feel in my own career. +By being in search for so long, and by wanting to do data science for so long, I made myself, you know, over time, pretty aware of how things were, you know, just the typical approach to the models. +I was never taught any of this in school, but you know, you read the right books and go through the right examples. So I have gained, I wouldn't say an absolute comfort with any of this, even now, +but, you know, a familiarity from being around it for long periods at this point. And then when I jumped into the large language modeling stuff, it's actually kind of interesting, because it's a different type of AI expertise than we've had before, and maybe an easier entrance for a lot of people. +Much of my career, I have been an engineer, and really I still predominantly think with an engineering mindset. And so when you come into, you know, large language models, it's actually really approachable. +You don't have to immediately know everything about, you know, what choice of models to use and, like, you know, how to train and validate and evaluate the whole thing.
+And you can just go to work and, at first at least, just experiment. And I really encourage people to do this when they're building an application. Rather than, you know, thinking about all the evaluations and stuff up front, don't worry, you'll get to them. +But just get your hands dirty, start using the APIs, and build up some intuition and, in a weird way, empathy for these large language models. Yeah, yeah, this is brilliantly said. I just recently listened to the episode of Lex Fridman with the Anthropic team. +So the CEO and some of the researchers there. And one of them said something exactly along the lines of what you just said about empathy towards the model: when you know where the model succeeds and where it kind of fails, you learn how to prompt it. Right. +You know, like, which risks you will encounter, and you should be okay with those, but you don't tilt towards the more risky areas unless you need to succeed in some specific thing. So I don't know, I like that. +But what is your take on LLM unpredictability compared to more, if you will, you know, traditional programming per se? Right. So, for example, when we used to write code in, I don't know, C++, Java, what have you, it was very deterministic in many ways. +Maybe there were some non-deterministic things, like at runtime and so on, but still you felt like you were in control of many things, right? With LLMs, it's different. For example, when you ask an LLM to summarize a document for you, and then you ask a second time, the answer will be different. +It will be, you know, in subtle ways, it will be different. +And so that also creates, in my mind, some issues around: okay, if I have several users accessing the same document, should they compute the summary on the fly, or should I compute it once and store it and then show the same copy to all of them, right?
+But that also means that if the original summary was not good enough for some reason, and subsequent versions would have been better, I will never show those better versions. Right. So, like, you start asking this multitude of questions. Or am I asking the wrong questions? It's such a challenge. +And I don't, yeah, it's a departure. Right. Like, if you're used to doing something with Python, it's going to be the exact same answer every single time. With these models, it's just like, you know, a very finicky person that keeps changing their opinion. +And you ask them the same question twice, and they've forgotten what they just said. Because it's a new session, so they literally don't remember. Every time, they just start over. I don't think this is going to change anytime soon. +It's almost as if you've plugged a fake human into the circuit. It's going to be independent. That's the nature of it. And that nature is not going to change anytime soon. So I think what you're going to see is a modification in the way we build code around these things. +I think the pain point is when you assume that it's going to be as predictable as the code that you're used to. +But once you get over it, you realize that, okay, well, if I literally had a human in the loop, like an API to connect to a human, then I'd have to build a user experience that is somehow tolerant of that. And so let's see. +A lot of times, on their first foray into interacting with these things, people say: here's a specification, build this code, and they expect the answer to just come out. Now, that can fail in one of two big ways. One way is that it's just too complex. +The model can do chain-of-thought reasoning, o1 has it built in, and it's magic. And it's going to get better.
But with any sufficiently large, complex request, since you're just appending one token at a time, it's just too easy to paint yourself into a corner. +So models will get better, and they'll be less and less likely to paint themselves into a corner. But it'll always be the case with sufficient complexity. +The other issue that you run into, and why we'll never ever get there, is that when I describe something, the domain of possible implementations, possible completions that match that input, is so much larger than whatever I have in my head right now. +And so if you have a company that says, you know, you give us the specification of the code and it will just always make the code, it's like: you don't realize until the code is written what you even wanted. And then you go back and change it. +You don't realize until the code is written, and written incorrectly, that, oh, it's doing what I said, but that's not what I meant. +So what does this all mean? I think that future implementations have to do a lot to keep the user in the loop, and make the experience so that the user doesn't feel like they're just shouting instructions at a thing and then hoping that it works. +But the user has to be interacting with this thing and, you know, converging towards a solution. So you see this in a couple of ways. One way is with the assistant interface. +And Cursor, forgive me, GitHub, but Cursor is just a really good example here, where you feel like you're chatting with someone that is working with you on this code. It gets into something I hope we talk about a little bit later: +artifacts. You know, you're having this conversation here, but you're working on these artifacts. You're working on these things. And with these assistants, you understand what they're looking at. +Whenever they make a recommendation to change something, you understand how it's going to change your code.
You are still in control as a human, to say yes or no to all this stuff. And that's one way that they keep the users in the loop. +The other way that we keep users in the loop, and I promise I'll shut up soon, is there's assistant-type behavior, and then there are, like, workflows where a human is still in the loop. But there is a human at the beginning that designed a workflow as, like, a set of steps. +You can't just say: look at this website and pull out all the phone numbers, all of the menu items, all of the, you know, structured content, and always expect it to work. +Sometimes it's better to take this big thing and have a human, a human in the loop who is defining all the steps that it's going to take to implement this workflow. +And that way, you can make something that is recoverable, you know; there are error states for some of these steps, and you can get out of them, pass it back up to a real human. +But yeah, all along I'm saying these things are going to remain hard to predict, but the code that's built around them, I think, is going to become very tolerant of that, by pulling the users into the conversation constantly. +Yeah, so basically, if I got your idea right, you put the user in the driver's seat, right? And the model, or whatever LLM app, is still kind of like an assistant, as you said, or companion, whatever you want to call it, right? +But, like, I guess we are still at that point in time when we need to know exactly what we want, right? As users. +And I think we also need to know how to get it out of the model, right? +Because sometimes, no matter what you know, it's somehow not achievable, maybe because you don't know how to prompt well, or, you know, you just go into loops. I frequently go there when I, for example, chat to, I don't know, ChatGPT, or it could be any other tool.
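The recoverable-workflow pattern John describes a few turns back, predefined steps with error states that escalate back to a real person instead of silently producing garbage, can be sketched like this. The extraction step and the status-dict shape are illustrative assumptions:

```python
# Sketch of a human-designed workflow step with an explicit error state.
# When a step can't produce a trustworthy result, it escalates rather
# than guessing.

def extract_menu_items(page_text):
    """Toy step: pull out lines formatted as '* item'."""
    items = [line for line in page_text.splitlines() if line.startswith("* ")]
    if not items:
        raise ValueError("no menu items found")  # recoverable error state
    return [item[2:] for item in items]

def run_workflow(page_text):
    try:
        return {"status": "ok", "items": extract_menu_items(page_text)}
    except ValueError as err:
        # Pass it back up to a real human rather than fabricating output.
        return {"status": "needs_human", "reason": str(err)}

print(run_workflow("* Soup\n* Salad"))   # {'status': 'ok', 'items': ['Soup', 'Salad']}
print(run_workflow("no structure here"))
```

The point is not the parsing; it is that each step's failure mode is named and routed, which is what makes the overall system tolerant of an unpredictable component.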
+When it just keeps going and returning to the point which didn't work already, because the alternative doesn't work now. And I'm like, okay, neither works. Like, what you propose just doesn't work. +What should I do? But still, I feel like I became much more productive. I don't write code every day for my work anymore, you know, for a living. But when I do, I feel like I save, I don't know, three to five days of my time by using these tools. +But there is still this kind of unpredictable component to it. You know, I'll give you one example, a very specific one. So I was building, like, simple Python code which would draw a diagram. And on the x-axis, it would need to put, you know, values like 1, 1.5, 2, 2.5, and so on. +And the model made a mistake by rounding all these values to integers. And so on the x-axis, all of a sudden, I saw the same values repeated, right? And the model doesn't have the reasoning component to realize that it made a mistake. +Or at least call it out and say: do you want it this way, or should I do it another way? I had to correct it, because I knew that I needed to cast it to float. But if I didn't know programming, I wouldn't be able to do that, right? I would be stuck right there. +And so that's the level of sophistication we are at still, right? If we're talking about code completion. But I wonder how you feel about this. What do you think about code completion? +You did call out Cursor as the tool you probably use more often now, but you did work on that in GitHub, on the Copilot team. +And what was your sense of its quality and the challenges around it? And in general, how did you approach that research challenge? I can speak a little bit to that. There are two ways in which I will be unsatisfactory here. One, I can't get into all the details, probably. +And the other way is, I've been gone since May. So I'm sure there's been amazing change since then.
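The x-axis bug from that anecdote takes only a few lines to reproduce; no plotting library is needed to see the mistake. Casting the tick values to int collapses 1, 1.5, 2, 2.5 into duplicates, while keeping them as floats preserves distinct positions:

```python
# Reproduction of the rounding mistake: int() truncation collapses
# half-step tick values into duplicates.

ticks = [1, 1.5, 2, 2.5, 3]

buggy = [int(t) for t in ticks]     # [1, 1, 2, 2, 3] -- duplicates appear
fixed = [float(t) for t in ticks]   # [1.0, 1.5, 2.0, 2.5, 3.0]

print(buggy)
print(fixed)
```

This is the kind of error that is obvious to someone who knows the type system and invisible to someone who does not, which is exactly the sophistication gap being discussed.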
But Copilot completions was one of the first successful applications of large language models +outside of the pure model, ChatGPT, a large language model as a large-language-model service. Like, this was just, I guess it was the first. So the implementation was actually fairly simplistic. Basically, we weren't using chat models at the time. Those didn't exist. +We were only using completion models. Completion models, basically, I mean, your audience probably knows this, but given the top part of a document, then all the model does, and it's useful to think of the model this way, it simplifies things, all the model does is pick the next token. +What is the most likely token based on all these words before it? What's the next token? And then you append that one, and you do it again and again. +And so the big aha moment, which happened probably in 2019, well before my time on Copilot, was: look, I can take the top half of the code, down to the function, and the answer, you know, the completion that it makes, is surprisingly good. +So, like, maybe it's time to just wrap an application around it. And then after that, well, everybody's learning these lessons at this point, but it's all about the context that you put around it and how you present it so that the model can make sense of it. +At the time that I started with Copilot, we were still using the completion models. And the context itself was 2048 tokens, I think. So just a tiny, tiny, tiny window. +And so a huge focus at the time was how to take all the things that we thought might be useful and squeeze them down into this tiny space, and, you know, actually make sure you've nailed it. +Because not only do you have to fit the prompt into those 2048 tokens, but whatever the completion is, it's sharing the same window. You can move that line up and down, but it's always within 2048. The ingredients were pretty simple.
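The pick-append-repeat loop described above can be shown in miniature. The "model" here is just a lookup table of most-likely next tokens, a stand-in for a real LLM, but the control flow (predict the next token, append it, predict again) is the same:

```python
# Toy completion loop: a lookup table plays the role of the language model.

NEXT_TOKEN = {
    "def": "add", "add": "(", "(": "a", "a": ",", ",": "b", "b": ")",
    ")": ":", ":": "<end>",
}

def complete(prompt_tokens, max_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        nxt = NEXT_TOKEN.get(tokens[-1], "<end>")  # "most likely" next token
        if nxt == "<end>":
            break
        tokens.append(nxt)  # append, then predict again from the new tail
    return tokens

print(complete(["def"]))  # ['def', 'add', '(', 'a', ',', 'b', ')', ':']
```

A real model conditions on the whole prefix rather than just the last token, but this is why a complex request can "paint itself into a corner": each step commits to one token with no ability to backtrack.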
+The file that you're looking at is obviously the most important thing. If the file is long, which often will blow through that 2048, then the text right above the cursor is the important thing. +There was some initial work with what are still called fill-in-the-middle models, though you don't need a specialized model for this anymore, because all the models are so capable. +But you could, you know, give it the prefix and the suffix, and it would do a good job of filling in the middle. So the suffix was also an important part of the context. +Where do you stop this thing? And then as the models grew, as the context space grew a little bit, we could start sticking in extra things. And so, you know, you start with little-bitty things. +These models were trained on code, but they didn't necessarily have the context around the code. So the first easy thing to stick in: you could put a shebang at the top, or inject a comment that says, here's the path for this file. +And that gives the model context about where this lives in the context of everything else. A big breakthrough that Albert Ziegler and Mike Coother pioneered was the neighboring-tabs stuff. And I think this is all common sense these days. +But basically, when you, as a human, are using an IDE, you open up the file you're working on. But you also open up other files for reference. So, duh, why don't we, you know, do that ourselves. And the initial implementation of this, which has probably gotten better at this point, +was simple. It was: look at the text right around the cursor, and then search these files for similar text. And if, in your remaining 2048-token space, you have any room for any of these snippets, then you can chunk other stuff into the context. You have to be careful how you present that. +You can't just, you know, have random scraps of text that are, like, you know, partial function implementations.
Because that will prime the model to implement partial functions. Like, it'll, you know, just iterate the same gross pattern it sees above. +So you do things that make it look more like code. You say: here is an interesting snippet of code from this file, in a comment, so that it's still, you know, importantly, still valid syntax at the end of it. +And voila, the rest is history. We came out with a really impactful product; no one had seen anything like it before. And it certainly changed the way I code. I'm much quicker, and probably dumber, at the same time. Yeah, it's been an interesting experience. +Oh, maybe smarter, because you get to do more things, right? Like, I guess you can reach, you know, greater heights and experiment where you need to experiment, right? And not where it feels maybe more mundane. +As long as the code works and, like, I don't know, there are no security holes in it and stuff like that, which would need to be checked separately, I guess. Anyway, that's very interesting. +But to close the loop there, I'm just trying to understand: you said you focused on keyword search, right? So you owned the Elasticsearch sort of pipeline. +Can you, if you're comfortable disclosing that, like, would that index the visible code in the IDE somehow, or what was the role of that in the whole chain, the pipeline? You're asking a lot of questions that don't quite map onto my actual experience. +Let me see if I can take your question and tweak it just a little bit. When I came to GitHub, I worked on code search, which was keyword, like, lexical search for the entire code corpus. And that was really cool work. But they've since rebuilt the whole system yet again. +And it's a really amazing engine, a proprietary engine that's effectively grep at fantastically massive scale.
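The neighboring-tabs recipe described a little earlier, score open files by similarity to the text around the cursor, then present the best snippet as a comment (with its path) so the prompt stays valid syntax instead of priming the model with dangling partial functions, can be sketched as follows. The file contents, the Jaccard scorer, and the comment format are illustrative assumptions, not Copilot's actual implementation:

```python
# Sketch: pick the most similar open file and inject its snippet as a
# comment block, so the prompt remains syntactically valid Python.

def jaccard(a, b):
    """Word-set overlap between two strings, in [0, 1]."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def best_neighbor_snippet(cursor_context, open_files):
    path, text = max(open_files.items(),
                     key=lambda item: jaccard(cursor_context, item[1]))
    commented = "\n".join("# " + line for line in text.splitlines())
    return f"# Compare this snippet from {path}:\n{commented}"

open_files = {
    "utils/math.py": "def add(a, b): return a + b",
    "docs/notes.txt": "meeting agenda and action items",
}
print(best_neighbor_snippet("def add numbers a b", open_files))
```

Every injected line starts with `#`, which is the "still valid syntax" trick: the model sees the reference code without being nudged into continuing a half-written function.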
+But that said, that code engine, the one that I built on, and even the one that came after it, are not the things that are most beneficial for some of the applications that Copilot has in the editor. And they do different things for that. +For example, on the web app side, and now maybe even in the IDE, I'm remembering stuff from six months ago, they do just-in-time vector embedding and vector storage and stuff. +Vectors are a lot better for certain types of code search, where you're finding code that is "about" something. Whereas lexical search is a lot better when you're finding code that matches this exact string. +And I think everyone, in code search and outside of code search, everyone everywhere, is still kind of wrestling with this. There's no one data structure that does all that stuff ideally. And I think we were wrestling with that inside Copilot as well. +Yeah, but I guess, yeah, I understood your point, and I probably missed that in your explanation, that you worked on code search and not on the generation. That's why in code search, you did use the Elasticsearch index. +But what I was imagining, and I'm completely clueless on this topic, is that by virtue of the LLM being trained on a bunch of code, let's say open source code that you can train on license-wise, if the user is asking something that resembles code that had been written before, wouldn't it make sense to try to find that code and kind of, somehow, you know, RAG on it with the LLM? Or is it completely different from how you did it? +At this point, you know, as of May when I left, they'd moved to much larger models. +And the models themselves have read not only all the code on GitHub, but also the internet five times or something. So they've read all the blog posts about code. It's amazing, right? What times we live in.
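The lexical-versus-vector tension described above is often handled by blending both signals into one ranking. This sketch combines an exact-substring lexical score with an embedding cosine score; the toy embeddings and the 0.5/0.5 weighting are illustrative assumptions, not any production system's scoring:

```python
import math

# Hybrid ranking sketch: exact-string matching for "this string",
# embedding similarity for "about this".

def lexical_score(query, doc):
    return 1.0 if query in doc["text"] else 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query, query_vec, doc, w_lex=0.5, w_vec=0.5):
    return w_lex * lexical_score(query, doc) + w_vec * cosine(query_vec, doc["vec"])

docs = [
    {"text": "def parse_config(path): ...", "vec": [1.0, 0.0]},
    {"text": "network retry helpers",       "vec": [0.0, 1.0]},
]

# Exact-string query: the lexical term dominates the ranking.
ranked = sorted(docs, reverse=True,
                key=lambda d: hybrid_score("parse_config", [0.7, 0.7], d))
print(ranked[0]["text"])
```

Because no single data structure serves both needs ideally, many systems keep a lexical index and a vector index side by side and merge their results, which is essentially what this blend does.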
+So whenever you're typing something and it kind of smells like something it's seen before, it doesn't necessarily need RAG to go get, you know, common motifs. It's like: here's what I think you're doing in code right now, and it can piece code together from all the code it's ever learned from and extrapolate out of it. +But, and you know, this is me talking about how maybe I would build a Copilot: at some point, I guess, you need to see if the user's typing code that is so similar to code in this code base that it's worth bringing it in. +And we kind of did that in a rudimentary way with the neighboring tabs. You've already got the tabs open. And that ended up being super useful. +I think there's probably a kind of decreased efficacy, there's work on this, where if you're doing a RAG search over the entire code base, probably the code that you're going to find is already code that's open in the tabs right beside you. So maybe it's useful to do that, maybe it's not. +But I don't know. Yeah, interesting. I think Copilot is, as you said, the first successful LLM application. Probably some companies will say, no, no, no, Dr. Boog's was the first successful LLM application. +But, well, maybe that was the first successful neural search application, and then Copilot was the first successful LLM application. And there's Tabnine. +Yeah, there was another company that was out there actually before us, but they just didn't have quite the same, well, they weren't owned by Microsoft at the time. That probably helped a bit. Yeah, budget-wise, I'm guessing. Yeah. +Yeah, but I still feel like it feels like magic, right? Like, ChatGPT also felt magic and scary in the beginning.
+Like, when I saw it for the first time and I saw it produce code, I thought that my job is done. Even though I was not a programmer anymore by then, I felt the existential, well, not crisis, but fear that basically many of us, and especially junior developers, are probably not needed anymore. +But then, as I was, you know, overcoming my fear, I was like, now let me try this thing. It's probably a toy. I found some, as I explained, you know, edge cases where it just doesn't work. It goes in loops. And so I was like, okay, it seems like another tool under my belt. +So I'd better master it and not, you know, walk away from it. +But the code generation still feels like magic, because, like, you can hit tab on a method signature and complete something, or on a comment and complete something, but you could also write natural language, right? +You could say, generate test cases for me, or something like that, right? And then it will understand it, and will read your code, and will reason about it, and produce the test cases. +I mean, that feels really magical. The time we're wandering into right now is going to feel like magic for a while, until we get used to the exponential. It's just going to keep going up and up and up. +But you know, I've had those existential pains myself, but then I realized, when I start using these new tools the way that they want me to use them, I have superpowers. You've got to have the right mindset. +If your mindset is like, oh, my COBOL job is over, you might be right, your COBOL job is probably over. But if your mindset is like, oh, wow, I can do things I never could do before. +I, John Berryman, put together the HTML for my website and built a React app. Like, I thought I'd have to have a PhD to do something like that. But it's amazing.
+And what you're seeing is the emergence of a new group of people. They call us the AI natives, AI-native development. And I've heard, you know, "code composers" rather than just coders. And you have people that are technically savvy. +You have to have, you know, some ability to read code still at this point, to debug some stuff like you were talking about. But they holler at a screen: do this thing for me. +And it just takes a little bit of experience to learn how to shout at the screen in the right way. You've still got to have the human ability to think about how this is structured, how to modularize stuff. There is a craft to it still. +But you can start building up pieces. +Even if you're not technically savvy, if you've been building it in chunks, when one of these pieces messes up gloriously, and you've got your floating-point numbers that don't work out, like your example, then at least you can say: I'm going to delete back to here. +I'm going to try a different route, see if I can just bump it out of this. And often you can. And people in every walk of life are much more effective and efficient at creating. +And, you know, you don't always get to solve the nitpicky little stuff. You know, if you really love debugging and writing tests, I'm sorry, I think your days might be numbered. +But if you love creating, I think we're approaching a new golden age, and it's exponential. We're going to keep approaching new golden ages for a while. Yeah, I think in my career, if I can reflect a little bit, I love creating much more, for sure. +But back then, we didn't have LLMs, we didn't have Copilots. We had to do pair programming, right? And that was our companion. Yeah.
+And that notion that you just mentioned about creativity, I think that drove us much more forward than going down the rabbit holes, you know, of debugging that thing. However important that thing was, of course, you need to debug and so on. +But you would just feel exhausted after that. You know, like, yeah, I fixed that bug. Finally, I squashed it. +Move on, because you want to build stuff, right? And I think it was Dijkstra who said, if debugging is the process of finding and removing bugs, then programming must be the process of introducing bugs. That's right. Yeah. That's a vicious circle. Yeah. +You already touched on that topic a bit earlier: artifacts. I've read your blog post, which we'll definitely link, and I got inspired by it. +I have to say, because oftentimes, when I go to these chat applications, you know, ChatGPT or Perplexity, what have you, and have a longer conversation there, it is hard to then trace back and think, okay, I branched here, and, okay, what was my thinking again? +What did I produce at that point? There is nothing to hold on to except scrolling back and forth. +And that's what you really put your finger on. And I want you to open this up: you basically proposed something new, I believe. I wonder if you are the creator of this; in any case, you carry this idea forward. Can you explain what you mean by artifacts? I will carry the idea forward. +I think what we're seeing is some convergence around the notion I put into my blog post. For example, with Anthropic's Artifacts, they've splashed something that I think is getting at what I'm talking about. +But if you dig a little, in the end, it's not quite what I'm talking about. Whenever you engage in a conversation with an assistant, an LLM experience, they just want to chat. And so we've done good over time by giving them, like, tools.
+So now, rather than just being your therapist, they can go do things for you. So that's nice. But still, it's a linear flow. And whatever you're talking about, it flows back into the backscroll. +Most of the time, when you are getting work done, you're getting work done on something. An artifact. There is a stable, I really wanted to call it a "stable object of discourse," because it is an object; it's stable, even though it may change; and it's the object of the discourse. +But "artifact" is just easier to say. And this is what we deal with. Whenever we're pair programming on something, it's me and you looking at this piece of code, and you make a recommendation about it. And I say, that's good. We go back and forth. +And anything that you can imagine can be addressed like that. The situation becomes particularly potent whenever you're dealing with multiple artifacts. So if you're saying, I really like this thing over here, and I wonder how it would fit in with this thing over here, +you're having, as a human, to refer to more than one thing that exists outside of this linear conversation. And you're talking about how they relate to one another. And so the blog post, which I hope you guys all read, on arcturuslabs.com, +we'll get to this again in a second, right, it gets into what I think of as an artifact. It talks about how to build a prompt so that you have space for this linear conversation, +but you also draw the model's attention to a chunk, at the top usually; you might put it at the bottom, experiment with it. But a static chunk, which is like: here are all the things on the table that we can refer to. +Each object, each artifact, importantly has an ID to be referred to. And I've noticed that these models do really well with arbitrary hexadecimal IDs. So I'll just give them a random ID.
+And they're really good at referring to those. Like, you know, they don't seem to hallucinate these IDs, which surprised me. +And so if you have a prompt with these artifacts at the top, and you have a system message that explains to the model how to interact with these things, then my experience is that they obey the instructions really well. +Humans are used to using pronouns and names and nicknames and, you know, other pointers that refer to the real thing. +And these models, having read all the human text that they could get their hands on, the internet five times, they also understand what you mean when you use pronouns and stuff. So you can say, you know: dear model, there's this thing called an artifact. They have these IDs. +When you refer to them, use anchors like in HTML, because they've seen a lot of those, and in the href tag, refer to it. And here's an example. And they just, I haven't done any formal, like, you know, rigorous testing, but in my experience, they just haven't gone wrong. +They are comfortable referring to these things. And it provides a really slick experience, I think, for the user. The user at the end of this conversation is looking at a conversation that they don't have to scroll back up through. They're looking at artifacts on the right. +And they can grab the ones they need. The artifacts themselves, you know, as the application developer, you're in charge of how you want to present these things. If it's text, you can just make it text. +But if it's, like, a home listing, you know, in the background, it can really be represented by, you know, JSON. But you present the user, you know, a picture of the home and a, you know, scrollable tab, and maybe a scheduling button. +You can do all these rich things with artifacts that you can't do if you're just having a chit-chat conversation and it's all just scrolling back into the backscroll. So I think it's a cool enough idea.
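The layout just described, a static artifact block at the top of the prompt, each artifact carrying a random hexadecimal ID, plus a system message telling the model to refer to artifacts with HTML-style anchors, might be assembled like this. The exact tag names and instruction wording are assumptions for illustration, not a quoted production prompt:

```python
import secrets

def new_artifact(content):
    """Wrap content with a random 8-character hex ID the model can cite."""
    return {"id": secrets.token_hex(4), "content": content}

def build_prompt(artifacts, conversation):
    block = "\n".join(
        f'<artifact id="{a["id"]}">{a["content"]}</artifact>' for a in artifacts
    )
    system = ('You may refer to artifacts by ID using HTML-style anchors, '
              'e.g. <a href="#ID">name</a>.')
    # Static artifact block up top; linear conversation below it.
    return f"{system}\n\n{block}\n\n{conversation}"

listing = new_artifact("3BR home on Oak Street, $450k")
prompt = build_prompt([listing],
                      "User: draft an email about the Oak Street home.")
print(prompt)
```

The application can then scan the model's reply for `href="#..."` anchors and render each referenced artifact in a side panel, which is the "artifacts on the right" experience described above.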
+I think there are some indications that it's coming into existence with, you know, Anthropic's Artifacts, ChatGPT, OpenAI's Canvas. Cursor is actually implicitly doing a really good job; somehow they're doing this. So it'll come into reality, I think, at some point. It just gives you an idea. Yeah. +It feels like it structures the interaction with the LLM. +It doesn't feel like you lose things in a way where, like, you need it to summarize the conversation for you, right? To go back and tell you what was important, right? But how does it know what is important? You know, and you already forgot. +And so if you have these artifacts, you can refer to them. But it's interesting how you can use these artifacts. And by the way, I don't know, if you can demo something quickly, I saw a demo on your website. All right. +So this is how you go to my website. Oh, and you know, check this out. This website was me and, like, ChatGPT and Cursor just kind of hanging out, teaching me some HTML. But yeah, you go to my blog. Wait a second. Wait a second. You built this site with an LLM, correct? Yeah. +That's what you said. Well, it was me and a large language model. It wasn't just saying, build a website. Of course. It's what's going to happen in our future. Everything is going to be a conversation; I was working on this with a large language model. +It's a beautiful website, I have to say. Yeah, amazing. And the logo. Even the nifty little logo was AI-generated. Oh, amazing. Okay. This is ridiculous. I'm going to take up just a little bit of your time. It's okay. Oh, it's fine. This logo right here. Check out how many cool things are in it. +There's a bunch of little bits in here. And then I'll give you a quiz so you can find the last thing hiding in it. Arcturus is a star in the Northern Hemisphere. It's a navigational star. It's the brightest star in the northern sky. And it means guardian of the bear.
+And so with my logo here, you've got the A, you've got the bear. The A kind of serves as, it looks a little like, the guardian. The bear represents the big hairy problem. That's powerful. But I'm going to help you out: the stars are all four-pointed. It's navigational. +There's one more little Easter egg in this that I didn't notice until I finished building it. I didn't design it; it just emerged. And, yeah, I'll start hinting. If you're a good computer scientist, especially. Oh yeah, then, yeah, A* search, A* search. You got it. +You got it. You got it. I didn't even think about it. I just thought this needs kind of a star over here. And I looked at it and it's an A-star, which is, you know, optimal, near-optimal navigation of the difficult domain ahead. LLMs are good at creating Easter eggs, then. Yeah. +Yeah. So anyway, sorry. Also the stars, I mean, these stars are amazing as well. You can just stare at them, right, and marvel. They move, they look a bit like snowflakes sometimes as well. Yep, they do. All right, so thank you for the digression. +We're looking through my blog, and we're looking through Cut the Chit-Chat with Artifacts. One thing I'm trying to do recently with my blog, and I hope you guys will, you know, there's plenty of places where you can, uh, subscribe to this, I'm trying to put in plenty of examples. +And here's the kind of built-in example of it working. Let's see. You know what, we might very well edit this out, but I'm going to go down to the 'now you try' bit right here. So this is, uh, the naive approach. Let's say that I'm building, like, a real estate helper assistant. +I help real estate agents. And the real estate agent says, I want to put together an email for a client about my listing on Oak Street. Can you pull up the listing? And so the thing has some tools built in. It's got a get-listing tool.
And so you can see all the garbage that it puts in there. +And it's got this listing, but, like, I don't, I've got the listing. It says it's got the listing. Somehow, in all this garbage, there's a listing, but I don't know what the listing's really about. And so I could ask about it, but then it's a filter. +I don't have the thing that came from the database. I have this weird filter in front of it. Uh, can you pull an email template and draft a new email? Another tool that it has. I guess it's going to take its sweet time to do it. Oh, of course. Hmm. Hmm. Okay. +Um, so it drafts an email, but oh, look, I've forgotten the buyer's name. So this is one version of the email that is relevant to this thing right here. But, you know, I've forgotten to tell you his name is Tim Cersei and my company's name is Arcturus Real Estate. +Uh, it goes back to this, and so it fills it in, and then I'm left at the end of the conversation, you know, copying and pasting this out. +If this is what I want, I'm going to paste this into the user's email and be really embarrassed when it's got this little string at the top, because I've copied that out. And if I wanted to do anything else, like modify the template or do anything, it's just, it's not there for me. +All right. So let's do a similar experience with this. I want to, again, pull up that listing for Oak Street. Oh, how interesting. All right. +So this time, I'm still showing that it knows how to use tools, but every time it tries to spit out this, like, JSON stuff, it's actually getting substituted with an href that points to it. And where does it point to? When you click on it, it automatically loads this card right here. +Now, um, I didn't take time to make a real pretty interface, but you can imagine, this is JSON. You can make this look like anything you want to. You can make it link out to the database and do all sorts of things. All right.
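The substitution trick in this demo, where tool output never scrolls into the chat as raw JSON but is stored as an artifact and replaced with an href the UI resolves to a card, might look roughly like this. All names here are hypothetical, not from the actual demo code:

```python
ARTIFACTS: dict[str, dict] = {}  # id -> payload; rendered in a side panel

def register_artifact(artifact_id: str, payload: dict) -> None:
    # Re-registering an ID replaces the old version in place,
    # so edits don't pile up as near-duplicate copies in the chat.
    ARTIFACTS[artifact_id] = payload

def substitute_tool_output(artifact_id: str, payload: dict, label: str) -> str:
    # Instead of letting raw JSON scroll into the backscroll, store it
    # and emit an anchor that the UI resolves to a clickable card.
    register_artifact(artifact_id, payload)
    return f'<a href="#{artifact_id}">{label}</a>'

link = substitute_tool_output(
    "a1b2c3d4",
    {"address": "123 Oak St", "beds": 3},
    "Oak Street listing",
)
```

Calling `register_artifact` again with the same ID models the "I've updated that artifact for you" behavior: one artifact, edited in place, rather than stacked versions.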
I'm going to put together that email template again. Yeah. +I guess, especially when you build a dedicated LLM application, right, you know what types of objects you're going to be interacting with, and you can build the, you know, UI around those, right? But yeah, a very flexible, malleable interface. +The interface is whatever the user needs it to be, potentially. Yeah. All right. So it's got this customized email draft. Now, you know, I was looking here, there's no email draft here, but there is here, on the side of the screen. +And you can see, unfortunately, I forgot to stick in the user's name. So let's see. Here's the template that it used. We didn't see that in the last example. You can see how it wants to put together stuff. You can see how it actually put together stuff. +And whenever I said I forgot his name, it said, oh, okay, I've updated that artifact for you. So you don't have, like, multiple versions stacking up. You've just got this. And you could do even more interesting things, like I could say, you know, this is much better, and modify it right here. +And that is now part of the artifact that the assistant sees. It's in that artifact section at the top of the prompt. You could have it say, please change my email template forever to say something like this, and you can work on this and save that back. +It just, oh, it opens up a lot of possibilities for a user experience that is easier. Because when we get work done, we work on things, not just chat. Yeah. +You reason around artifacts, and you work with them almost as if they were physical objects, right? You can take this thing away with you and go proceed with your task. Yep. You refer to them. You modify them. We use them to do things. And you could, I'm guessing, +I'm really guessing, I'm new to this topic, you could maybe even condition the model on these things, right?
You could say, given this artifact, I want to do something else with it, like rewrite some parts. You know, would that work? I mean, with this kind of thing, the sky is the limit. +Uh, it's kind of been a fun thing to think about. But you could have typed artifacts. And then when you have a certain type of artifact, you could introduce tools. So, like, you know, if we need to modify this artifact, we know how to deal with it. +You can have, it's kind of what I did with my next post, the Roaming RAG, you can have artifacts that are like accordions. They're bigger than fits in the prompt. +But you can say, you know, here's a summarized outline of everything, and every piece of that summary the model can effectively click on and expand. It's just another ID and, you know, a tool to expand the section. So it can read docs that are bigger than fit in its context. +There are just a lot of neat things that I think you can do with artifacts as the starting point. It's very interesting. Don't you think, just one thought crossed my mind, that when we transitioned from the static web to, like, Web 2. +0, I guess, so what was it called, when you can actually modify things on the web, right? You could send a comment, you could do stuff. Now it feels like we've transitioned into a new phase where we do the same to ideas. +We, like, exchange ideas and we can modify them, you know, build on them, prompt with them, take them away and store them. So it becomes more on the concept level. I think everything's going to get really weird going forward. I think we've been used to going to the internet and going to web pages. +And even if we could interact a little bit, it's nothing like what you're about to see. I wonder if a lot of the internet experiences, you know, they're worried about all the text going away, uh, because, like, we've run out of text on the internet, training these giant, giant models.
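Stepping back to the Roaming RAG "accordion" idea above, a summarized outline in the prompt plus a tool that expands any section by ID, a minimal sketch could look like this. The section contents and helper names are illustrative assumptions, not code from the post:

```python
# Hypothetical document store: the full bodies never go in the prompt,
# only the outline does; the model pulls sections on demand.
DOC_SECTIONS = {
    "s1": {"title": "Installation", "body": "Run pip install ..."},
    "s2": {"title": "Configuration", "body": "Set the API key ..."},
}

def outline() -> str:
    # What actually goes in the prompt: headings only, one ID each.
    return "\n".join(
        f"[{sid}] {sec['title']}" for sid, sec in DOC_SECTIONS.items()
    )

def expand_section(section_id: str) -> str:
    # Tool the model calls when it wants the full text of one section.
    sec = DOC_SECTIONS.get(section_id)
    return sec["body"] if sec else f"No section {section_id!r}"
```

Because only headings and IDs sit in the context, the document can be arbitrarily larger than the context window while staying navigable.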
+Maybe the future of the internet is going to be replaced by just conversations. You're going to go to a place that is a sensible, you know, starting point, but the whole website is going to become whatever reality you need it to be at the time. +And I have no idea how we harvest the text of that to train future models. It might be crazy, but I think we're getting ready for a future we cannot possibly predict. Yeah, and I think spam will be replaced by slop, right? I don't know if you've heard of this. +No? Slop, S-L-O-P, is basically unverified output of an LLM. So, something that got produced, and back to your question, you have no idea if it's true or not, you go and paste it somewhere on the web, and then an LLM goes and scrapes it and learns from it. +So you spam the model. And so there is a call-out. It's this feedback effect. Yeah, exactly. And there is a call-out that, hey, let's not spam, let's not post slop on the web, because that will bite us as we move so far ahead with LLMs. And who is obeying that recommendation? Exactly. +Probably not the companies that need content produced. Yeah. Yeah. The moment you say, don't do something, there will be a bunch of people saying, oh, let's try. That sounds like fun. Oh, that's a good idea. And then we need to invent a solution for that. Hey, Jonathan, it was really exciting. +And I've learned, like, a ton by talking to you. I feel like we could record probably, like, a marathon-style episode, you know, four or five hours, before we get exhausted. But I also wanted to give you a chance to, you know, go on stage and talk about your book. +Like, why do you think everyone needs to read it? I want to read it, if I get a chance to get my hands on it, hopefully soon. Everyone needs to read it because every time I make a sale, I get one cup of coffee. So that's why everyone needs to read it. Of course. Yeah, that's a good reason.
+But then also, yeah, go ahead. No, I wanted also to give you a chance to talk about your company. +Because I know that feeling of starting something new on your own. You call yourself an indie consultant, right? At the same time, you have so much with you in your luggage, right? Like, the knowledge, the experience. And so why not share it in a different way, through your company. +But I wanted to learn a bit more. What is your vision for the company? What do you think you will offer, like, in the midterm? Where do you create the value for the customers? And maybe there will be some customers listening to this podcast, hopefully. Sure. +Well, okay, let's go through both of those then. I hope everyone reads the book. I hope they enjoy it. I hope they learn from it. Working with large language models is a very different beast from what you're used to. +I think, you know, three years from now, everyone will be a large language model application developer, because they're becoming so prevalent everywhere. So start early. Get your hands dirty, interact with these things. +And my book helps kind of, you know, give you the training wheels at first to understand: here are a bunch of the problems that you run into. Here's how the model works. There's actually a lot of good intuition in just understanding the tool that you're interacting with. +Here's how to organize a prompt. And that's not always easy. You've got to figure out all the stuff you might use. And you can't use all of it, because it doesn't all fit, or because you don't want to wait for the latency. +You know, it tells you how to fit that into a prompt, present it to the model in a way that, kind of empathetically, the model is going to understand. The model is not psychic. You need to talk to the model as if you're talking to, you know, someone that you're working with.
+ And then towards the end of the book, it gets outside of a single prompt and talks about, you know, this magic word we've got right now, agents, and how to build assistant behavior with tools, and how to, you know, build more sophisticated thinking steps with it, and review of, you know, what's happened. +And it talks about workflows, which is really another type of agent, about how to, you know, take an input, a bunch of data, pick it apart, do the right steps to get a job done, hopefully without going off track too much. +We talk a little bit about evaluation, and we wrap it up by saying, holy cow, look at the future we're going into. This is going to be amazing. So I hope you get a chance to read the book and I hope you enjoy it. I hope it's as enjoyable for you as it was painful for me to write. +And then, yes, I am out on my own now. I'm an indie consultant at Arcturus Labs. I'm specializing in all things just like the book: prompt engineering, large language model application development. I think we're going into a very different world as far as how you build things. +You've got to build, like we said earlier in this conversation, these components to deal with. You've got to build web apps to deal with these components that are very undependable, and make them as dependable as possible. +How do you make the user experience one where they trust what's happening? And that's tricky. So I offer a whole range of things, from just education, going in and training companies. I like going and working with them to think through what product they're working on right now, and their next big goal. +I can say, this is a great idea, you're on the right track; this is not quite feasible, but we can fix it. That's the product-type stuff I like thinking through. And then as we get to a longer engagement, I just love working with these companies, especially, like, startups. +Just sit down, pair with them, do transfer-of-knowledge type stuff.
It's just really neat to see what people are up to. A lot of creative ideas right now. And then finally, yeah, please make sure you check out my website, www.arcturus-labs.com. +I'm going to throw together a lot more blog posts like the one we demoed today. I'm trying right now to make sure every blog post has something juicy, a piece of code that actually works, so you can experience the thing that was running through my mind at the time. So try it out. +I'm really engaged on Twitter. Tell me what you think. And yeah, I'd like to get to know you guys too. Yeah, amazing, amazing. And I wish you all the best with your new adventure, your new venture. And yeah, we will link everything. We will link the book. +We will link your site and blogs for sure. Thanks so much for spending time with me and educating me and keeping up with my sometimes, you know, obvious questions. It was really, really a pleasure to talk to you. +And I really, really hope that we can record again sometime soon, because you seem to be cooking up a lot of ideas. And from what I gather, you take a really practical view of things. And you are an engineer and a researcher. And so that's very dear to my heart to see. +And I can't wait to see what you come up with next. Me too. Well, thanks so much for having me on. It's been great talking to you. So yeah, let's do this again sometime. Yeah, thanks, John. Have a good day. \ No newline at end of file diff --git a/transcripts/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md b/transcripts/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md new file mode 100644 index 0000000..15e0642 --- /dev/null +++ b/transcripts/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md @@ -0,0 +1,306 @@ +--- +description: '

Show notes:

- On the Measure of Intelligence by François Chollet + - Part 1: Foundations (Paper Explained) [YouTube](https://www.youtube.com/watch?v=3_qGr...)

- + [2108.07258 On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258)

- + [2005.11401 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)

- + Negative Data Augmentation: https://arxiv.org/abs/2102.05113

- Beyond + Accuracy: Behavioral Testing of NLP models with CheckList: [2005.04118 Beyond Accuracy: + Behavioral Testing of NLP models with CheckList](https://arxiv.org/abs/2005.04118)

- + Symbolic AI vs Deep Learning battle https://www.technologyreview.com/2020...

- + Dense Passage Retrieval for Open-Domain Question Answering https://arxiv.org/abs/2004.04906

- + Data Augmentation Can Improve Robustness https://arxiv.org/abs/2111.05328

- + Contrastive Loss Explained. Contrastive loss has been used recently… | by Brian + Williams | Towards Data Science https://towardsdatascience.com/contra...

- + Keras Code examples https://keras.io/examples/

- https://you.com/ -- new web search engine + by Richard Socher

- The Book of Why: The New Science of Cause and Effect: + Pearl, Judea, Mackenzie, Dana: 9780465097609: Amazon.com: Books https://www.amazon.com/Book-Why-Scien...

- + Chelsea Finn: https://twitter.com/chelseabfinn

- Jeff Clune: https://twitter.com/jeffclune

- + Michael Bronstein (Geometric Deep Learning): https://twitter.com/mmbronstein https://arxiv.org/abs/2104.13478

- + Connor''s Twitter: https://twitter.com/CShorten30

- Dmitry''s Twitter: https://twitter.com/DmitryKan

' +image_url: https://media.rss.com/vector-podcast/20211223_011252_c0a8e84bf74cac993f87600e13f3d942.jpg +pub_date: Thu, 23 Dec 2021 13:32:52 GMT +title: Connor Shorten - PhD Researcher - Florida Atlantic University & Founder at + Henry AI Labs +url: https://rss.com/podcasts/vector-podcast/347472 +--- + +Hey everyone, Vector Podcast here. And today we have Connor Shorten with me, who will talk a bit about his research, about vector databases, and about YouTube hopefully as well. So I'm expecting a really nice discussion today. +Hey Connor, how are you doing? Hey Dmitry, thanks so much for having me on the podcast. I'm really excited to continue our episode and maybe dive more into the deep learning research side. + I think our first podcast on Henry AI Labs went really into the detail and the practical implementation and the history of BERT and Elasticsearch and then all the different vector databases, and I think now we can kind of maybe look more at the research side of things and sort of discuss together where we think all this vector search engine stuff is headed. +Oh yeah, absolutely. And it's exciting to be recording on the day when you actually released that video. So obviously we will link it for our listeners and our audiences. And hey, could you please introduce yourself? Yeah, great. +So to introduce myself, I guess I would like to be kind of reintroducing myself almost every year. So obviously I make these YouTube videos and I'm kind of still discovering my role in deep learning research and still learning myself. +In my journey, I'm in my second year of my PhD. I finished my master's degree, where I got started with research on generative adversarial networks and data augmentation, published literature reviews on data augmentation for images and text. +And this has really been my research focus: data augmentation.
Primarily, my interest started when I first learned about deep learning. I come from being a basketball player. I played basketball in college, and I was ready to apply deep learning to basketball. +How can this improve basketball? So one thing about basketball is, when you're playing, you want to have a highlight mixtape where you have all your best moves, and it helps you get the college scholarship. +And so I was really familiar with that process of what it takes to be recruited to play college basketball. So I wanted to build this computer vision system that would crop out, you know, your made baskets from full game tapes automatically. +And so I came into this problem that everyone has seen, where if you try to do supervised learning with small datasets, it does not work. So, like, annotating data is extremely difficult. +Like, if you're doing it yourself, you can probably get yourself, you know, in my case, I was annotating made baskets in video clips, which is already high-dimensional data, already, you know, paying to store all that data. So, you know, labeling it was a problem. +So I said, maybe data augmentation, because I'm overfitting this data. So I can try to rotate it, crop it, horizontally flip it, increase the brightness, this whole package of things you can do. Yeah, and orientation. Yeah. Right. And so it worked pretty well. +So I was pretty inspired by this idea of data augmentation. +I really like papers like François Chollet's On the Measure of Intelligence, discussing ideas like system-centric and developer-aware generalization, this kind of matrix of knowns and unknowns with generalization cases. +So I hold the belief that we can kind of steer the data in the direction that enables more generalization. And the key to unlocking more generalization is mostly going to be in the data space.
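The basic augmentations listed above (flips, crops, brightness shifts) are simple array operations; a minimal pure-Python sketch, with made-up helper names, treating an image as a grid of pixel intensities:

```python
import random

def hflip(img: list[list[int]]) -> list[list[int]]:
    # Horizontal flip: reverse each row of pixel values.
    return [row[::-1] for row in img]

def brightness(img: list[list[int]], delta: int) -> list[list[int]]:
    # Additive brightness jitter, clipped to the valid 0..255 range.
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def augment(img: list[list[int]], rng: random.Random) -> list[list[int]]:
    out = img
    if rng.random() < 0.5:
        out = hflip(out)
    return brightness(out, rng.randint(-30, 30))

img = [[0, 128, 255], [10, 20, 30]]
variants = [augment(img, random.Random(s)) for s in range(8)]  # 1 image -> 8
```

Each labeled example yields several label-preserving variants, which is exactly the overfitting remedy described: more coverage of the data space without more annotation.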
+So I'd say I'm in this data-centric AI category, which has, you know, lately become one of the buzzwords for where your camp is. I love things like neural architecture search and different learning strategies and all that, but I really love data augmentation. +I think there's so much opportunity and research to explore this further. And so, yeah, I have a few ideas of how this could intersect with vector search engines and vector representation learning. So that's on one end. +So that's kind of, you know, my research interest in data augmentation and a bit of background about how I became so inspired by data augmentation. So then, to say what I'm doing right now: you know, I've started doing some experiment papers. +Most of my computing is managed with Google Colab, which is pretty nice. +You know, you have the Google Colab notebooks and then you have the Google Drive integration for persistence, and, you know, you can make it pretty far without putting a dent in your wallet, as long as you don't get too carried away. And so that's kind of how I'm setting that up. +And, you know, as I mentioned at the beginning, trying to reintroduce myself and figure out my role: I had kind of recently, like, a high of achieving the best student paper at this ICTAI conference, on something about inductive biases. +And then the next day I get my ICLR reviews back, which were not great. So, you know, that's kind of the journey of this. I'm just pressing forward to ICML and trying to bounce back and stay on this journey of figuring out how to do deep learning research. +So it's definitely highs and lows. Isn't it? It's almost always like that, you know, in machine learning, nothing is predictable and nothing is given, you know, and you need to be kind of averse to that. Well, not averse, but resistant, right? Like, okay, I'm fine.
+I can take risks, but it's like a marathon. It's not a sprint. Oh, yeah, definitely. And just the disappointment of investing a month or two into a research project, and then you just start running the experiments. And you're like, oh, this is not working. +And your advisor's on the phone twice a week saying, how's it going? And you're like, not good. You know? So that's stressful. And, you know, anyone else going through that, I can definitely relate to that kind of struggle. Is this, by the way, why you do the YouTube show Henry AI +Labs? Is this why you do it? Or is there something else as well? I just wanted to kind of tap into the psychological element of it, if you've thought about it. Yeah, yeah, I love to talk about it. +I mean, my inspiration for YouTube came from, I guess I was just one of these people who really enjoyed when we would have guest lectures come to Florida Atlantic University. +One that stood out to me more than anything else: researchers from Johns Hopkins came to us, and they had built a prosthetic limb that connects to a brain-computer interface. +And they have people who have lost their limbs, and they can, you know, blindfolded, touch an orange and say, this is an orange, this is an apple, this is a banana. And they came to talk to us at Florida Atlantic. And I mean, it was inspiring. +I love these kinds of seminars and, I guess, just falling in love with this kind of presentation. +It's almost, say, to me, it's kind of analogous to, maybe, stand-up comedy: how you have someone who gets up on stage and puts the show on, you know, with the benefit of the slides behind them. And, you know, I really like these kinds of talks. +And that's kind of the art of it, that's what I really like about YouTube. I mean, I definitely believe in YouTube as the medium for communicating these ideas right now.
+ You know, and we'll get into talking about writing on Medium, and, yeah, the different ways: you can write on Twitter, you can write on Medium, you can record podcasts and put them on Spotify, Apple, and you can write these research papers, obviously, just, you know, upload them to arXiv and treat it like a medium, though the number of users on arXiv is probably less than what you get on YouTube. +The content is different too. So, yeah. Yeah. So, yeah, I really believe in the medium, and I just want to see the art form develop further. Like, I'm really impressed with what Yannick Kilcher is doing. +Like, right now he's just released autoregressive diffusion models and, you know, I'm excited to watch it, and that's the fun of it, you have this excitement about it. Let's link that as well. It's on YouTube as well, or like another show you mentioned. Yeah, yeah. +I think just YouTube, Yannick Kilcher, I think most of our viewers will know what we're talking about. I just want to make sure that I also educate myself. So let's link that. Awesome. +Yeah, I mean, so yeah, you said that data augmentation is one thing you worked on and, I guess, continue working on. +It's actually interesting that you did that in the CV space, but there is also somehow a connection to text, right? Can you tell a bit more about that? Yeah, so I spent the, I think it was, sorry, I'm getting my dates wrong. It's currently the fall. +So, I think I spent the summer, spring of last year trying to transition these ideas into text. I did the image data augmentation survey in 2019, when the sentiment was still extremely hot around GANs, generative adversarial networks. Everyone was really excited about this real/fake loss. +We can generate data and then add that to the dataset, and then, you know, suddenly we have this very broad coverage for interpolation in our data space. So then I was trying to look into text.
Text is, I'd say the key lesson I learned is that it's harder to be label-preserving. +When you're forming the (x-prime, y) pair, it's less likely that the y is going to keep that same high-level class label as you do things like, say, the starter kit would be random swapping, random insertion, random deletion, those kinds of things. +And then you kind of transition into maybe trying to use a knowledge graph to better guide the text you're replacing. And then ideas like, say, mixup, where you cut and paste and glue sentences together. I'm not, like, a huge fan of that, but it's kind of interesting. Yes. +It's kind of like dropout. It's kind of like a, you know, I don't think there's a lot of intuition in the data space for why just smashing them together would work so well. But it does kind of work. +And then I really like this category of generative data augmentation, obviously given my start in generative adversarial networks. And this idea that you learn the data distribution. +So you sample from the data distribution to learn classifiers, the classifier being almost like an appendage of the generative model, which is like what we're talking about with the modules, the supervised learning tasks that you append onto the vector search engine database. +It's like this task of having a generative model, or say a representative vector space, is kind of like the real context that's built into the supervised learning task. Or at least that's the way I see it. And, you know, maybe anyone can leave a comment if they have a different idea about that. +I might be off base. But so that's kind of how I see those two things integrating. So to connect this back to text, what we can do with text is we can use things like GPT-3, or more so what they do is you would prompt GPT-3. +So you'd say, you know, please finish this movie review with a positive sentiment, as the prompt.
And then you can just remove whatever you want from the original data point, and GPT-3 can generate a new movie review. +And then you can blow up your dataset size, avoid the pitfalls of overfitting, and that's kind of the promise of data augmentation. So hopefully that answers the question of how I did this transition from image to text data augmentation. Yeah, it does. +And I mean, why I'm asking is also because, you know, you can also treat these two sources of data in, like, a joint training task, right? So you can kind of train a joint neural network. +And for example, when you watch, let's say the algorithm watches a movie or cartoon, and you see some scene where, you know, one hero is kind of crying and the other one is cheering him up. +You know, now where do you pay attention? That's also important, right? Because it's the whole scene. Maybe you need to pay attention just to that pin on his neck, you know, that he's not happy about. And, you know, things like that. +So have you thought about that as well? Or are you still considering them as independent? Yeah, I know. Yeah, I love that idea. Like, I think the word that most people are using is multimodal learning. And I'd call that paper multimodal data augmentation. +And you know, just last night Microsoft released a new 2.5-billion-parameter image-text embedding space. You know, everyone knows about OpenAI's CLIP image-text spaces and DALL-E, the avocado-shaped armchair generation. Everyone likes that. So yeah, I mean, multimodal learning is so exciting. +Yeah, I'd say it's going to be an interesting thing with the computation of it, and what kind of computation these kinds of setups require. +I'd say, especially with video data, like you just mentioned, I, you know, wouldn't really want to play around with video data with my Colab-Google Drive workflow that I mentioned earlier. Yeah. Yeah.
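The prompt-based augmentation described above, plus one of the "starter kit" operations, might be sketched like this. The `complete` function is a stand-in for a GPT-3-style API call, not a real client, and the helper names are assumptions:

```python
import random

def complete(prompt: str) -> str:
    # Placeholder for a GPT-3-style completion API; it just echoes
    # here so the sketch runs without credentials.
    return prompt + " [generated continuation]"

def augment_review(review: str, label: str) -> tuple[str, str]:
    # Ask the generative model to finish the review while preserving
    # the class label, then reuse the original label for the new text.
    prompt = (
        f"Please finish this movie review with a {label} sentiment:\n"
        f"{review}"
    )
    return complete(prompt), label

def random_deletion(tokens: list[str], p: float, rng: random.Random) -> list[str]:
    # "Starter kit" op: drop each token with probability p.
    kept = [t for t in tokens if rng.random() >= p]
    return kept or tokens  # never return an empty sentence

new_x, new_y = augment_review("A moving story about", "positive")
rng = random.Random(7)
short = random_deletion("a moving story about friendship".split(), 0.3, rng)
```

The hard part he notes, label preservation, is exactly what the prompt tries to enforce: the generated review inherits the sentiment label it was asked to express.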
But it's also interesting that the big players, like you mentioned Microsoft, and others, are moving in the direction of increasing the number of parameters in the model. But when you get to practice and you need to build a classifier, you know, you don't have that much capacity. You don't want to spend that much capacity, really, unless you're building like a Terminator-level AI which will handle all the tasks you have. But probably you won't do that, because it's still not there. So do you also think about that practical element, or are you still admiring the beauty of these complex models? What do you see there?

Yeah, well, I'll stake my flag in the same camp as the foundation models researchers, and I think it was mostly Stanford. They published this paper titled On the Opportunities and Risks of Foundation Models, some title like that; I'm sorry if it's not exactly correct. But, you know, it's this kind of ideology that big companies like Microsoft, NVIDIA, Google, Facebook will build these big, big models, and then what we'll do is use this knowledge distillation interface to compress them into practical use cases.

And so we've seen — I'd say this started with Colin Raffel and the people who worked on the Text-to-Text Transfer Transformer, the T5 model — they showed how you could unify all text supervised learning tasks through the same kind of language-modeling-style interface. You just prompt it with, you know, natural language inference and then give it the input, or you say "answer this question" and give it the input, or you say "re-rank these documents" and give it the documents. So it's the same interface for every supervised learning task.

And then just one more thing to put in the citation context is this general-purpose idea, like OpenAI's CLIP, and it looks like Microsoft, I think they're calling theirs Bletchley or something like that.
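The T5-style trick of putting every supervised task behind one text interface can be illustrated with a toy formatter. The prefixes below are illustrative, not T5's exact task prefixes:

```python
def to_text_to_text(task: str, **fields) -> str:
    """Format a supervised example in a T5-style text-to-text interface.

    A toy sketch: every task becomes 'prefix: input', so one
    sequence-to-sequence model can serve them all. Templates here
    are made up for illustration.
    """
    templates = {
        "sentiment": "sentiment sentence: {sentence}",
        "nli": "premise: {premise} hypothesis: {hypothesis}",
        "qa": "question: {question} context: {context}",
    }
    return templates[task].format(**fields)
```

The model's output is likewise just text ("positive", "entailment", an answer span), which is what makes the single interface possible.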
But this idea of just having two vector embedding spaces and then using the contrastive alignment as the general interface for any kind of task — because, as we mentioned, you can put any task into natural language; any task that you're going to do with supervised learning could be described with natural language. So you have that kind of interface, and the Allen Institute has another architecture called General Purpose Vision systems that, you know, unifies all these tasks: object detection, semantic segmentation, surface normal estimation. All these kinds of ideas are unifying one architecture interface.

So to wrap up my answer to the question, I think it's going to be Microsoft and them scaling up like crazy. Maybe they're going to run out of internet-scale data eventually. I think Microsoft has said that they could train like a 32-trillion-parameter model if they were motivated to do so. So I think they're going to run out of internet-scale data, and then data augmentation will be the next step, going from, say, the 400 million image-text pairs that are now open-sourced, or EleutherAI has The Pile, which is like 800 gigabytes of raw text if you want to do something with that. So I think eventually, as you go into the 32 trillion parameters and on, they're going to use data augmentation to have these inductive biases about how we can keep scaling the data side of it. So yeah, I think they can scale the models for a while.

Yeah, they're probably doing an amazing job, but they're probably still riding the horse of what Peter Norvig called the unreasonable effectiveness of data, right? So your algorithm might not be as nuanced as your data is, so just give the machine learning algorithm as much as possible and it will learn, right? But, you know, in practical situations, this is what I alluded to: you just don't have that much data.
On the other hand, you don't have that much choice, and you also mentioned this. This is a very interesting topic, data augmentation in text, because in images you can do cropping, rotation, huge changes and whatnot. In text, you can't do that so easily. For example, if you have the sentence "London is the capital of Great Britain," you cannot put Barcelona there; it will not make sense. But you can still find another example where you could probably swap cities, and that's how you build the augmentation.

But then there are other things. For example, if you take machine translation, it suffers from the hallucination problem. I don't know if you've heard about it, but if you have a certain distortion in your data — for example, you crawled the websites and you also erroneously crawled the advertisement, so you glued the advertisement to the source-target pair, right? — now your model is hallucinating about that advertisement, right? And it's flipping facts; it's also easily switching object and subject. So it's not trivial. And again, now I'm stepping onto the territory of the model itself and model robustness, but I think data augmentation plays a key role in actually making sure that your model can at least not hiccup on some very basic things, right?

Yeah, and we're completely in agreement on that. I think one other part of that story will be how — so Facebook has this model called retrieval-augmented generation, where the whole idea is to add more context to avoid this hallucination problem. So to break down three things you just said, I want to start off with the hallucination thing and transition right into that.
So I think the idea of adding more context is our best solution to stopping hallucination, and maybe using consistency or contrastive loss functions for the fine-tuning to make sure they're attending to the context. Because I recently reviewed a paper on my channel titled something like Open Challenges in Open-Domain Generalization — some title like that — where these models, you give them the context, so they have additional context in the input, but they just don't read it. They just generalize as if it's not there. So fixing that problem is definitely step one.

And then to go into the second thing you mentioned, where you replaced London with Barcelona: that's the thing about text data augmentation — it's not really label-preserving. It's harder to find symmetries in the space; it's easier to find these differences. So there's one paper, maybe, I'd like to point readers to, titled Negative Data Augmentation. They're kind of flipping the question: how do we use augmented data? Should we just keep using this, you know, KL divergence between the one-hot class vectors, or should we do something different with the augmented data? I mentioned consistency losses, where the loss would be between the representations of x and x', ignoring whatever the y label is, and negative data augmentation is saying, you know, push them apart. These are not the same label; we've switched London with Barcelona.

And then I think the last thing, as we're talking about the practical implementation — you said two things, there are two directions which are really interesting. And I think what you're getting at with the data augmentation is that you want to prevent overfitting. If you're, you know, grabbing Microsoft's 32-trillion-parameter model and you've only got 100 labeled examples, there's no way that's going to work. So you want to prevent overfitting.
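The two treatments of augmented data described here — consistency losses that pull the representations of x and a label-preserving x' together, and negative data augmentation that pushes apart x and a label-breaking x' — can be sketched as toy loss functions over embedding vectors. The plain-Python cosine and the hinge margin are purely illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def consistency_loss(z, z_aug):
    """Pull together representations of x and a label-preserving x'."""
    return 1.0 - cosine(z, z_aug)

def negative_aug_loss(z, z_neg, margin=0.5):
    """Push apart representations of x and a label-breaking x'
    (e.g. London -> Barcelona), hinge-style: penalize similarity
    only while it exceeds the margin."""
    return max(0.0, cosine(z, z_neg) - margin)
```

In a real setup z and z_neg would come from an encoder inside a training loop; here they are just vectors, to show the opposite directions of the two objectives.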
And then I think the second part of that story, when people talk about this kind of topic, is storage and inference cost, and obviously training cost if you're going to fine-tune this. So maybe training cost has been solved with prompting, where you don't actually need to do any gradient updates; you just give more in the input context. But then I think inference cost is solved with this knowledge distillation interface. And I think Hugging Face — man, I think the name of their product is Lightning or something like that — it's about inference acceleration. And it looks like they're doing it pretty well. So I certainly bet on Hugging Face to solve that problem.

Oh, yeah, absolutely. I think they call it Infinity, you know? Infinity. Yeah, sorry about that. Oh, it's okay. It's also like testing your memory, you know, like we remember. And I think, also, at some point — and I think Elon Musk is afraid of it; hey, Elon, if you're listening to this, hello — he's afraid that our interface is way too slow, right? And so eventually AI will basically supersede us, which I don't think so, but let's see.

But also, what's interesting — I was thinking, maybe developing this topic a little bit further, since you have so much knowledge on this and what you said is so packed — for example, we could use the language model itself to help us generate. You said GPT, right? It's a generative model, but there could be others which will help us generate things and then augment the dataset. But there is one beautiful paper, I don't know if you've read it. It's called What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models. And basically the paper essentially claims that BERT does not distinguish negations. And that can be super, super sensitive, like in sentiment analysis, right?
At least, but also in machine translation and other downstream tasks. So have you thought about this? Basically, there is now a development — I think it's also on Microsoft's side — to try to bring knowledge into the language model. And you can do it in a variety of ways; you mentioned knowledge graphs, but there are other ways to bring in structured knowledge. So any thoughts on that topic?

Yeah, and this is where I'm just starting to get back into Weaviate, because I think Weaviate is going to be a huge part of solving that problem and adding the additional context. But first I want to raise you one paper. On the psycholinguistics thing, I want to point viewers in the direction of CheckList. It was one of the best paper awards at a recent ACL conference — ACL, EMNLP, I think, are like the top NLP conferences. CheckList is exactly what you say: a complete suite of tests for negations, named-entity swapping. And it's really nice to use; it's on GitHub. So they have the interfaces for testing for that kind of thing, and I think once you have the test, you can start hacking away at solving it. It's not theoretically grounded, but if you have the right test, you can hack away until you pass the test. So CheckList is the test for that.

But then, yeah, the idea of context and Weaviate. So Weaviate is the vector search engine part, and, you know, Facebook's Dense Passage Retrieval paper is their current approach, where they have the text embeddings of the documents and they're going to go retrieve the context so that you can avoid hallucination, and hopefully avoid these kinds of vulnerabilities through robustness. So vector search engines are what I see as being a huge player in solving that particular problem.
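A toy version of the dense-retrieval idea — embed the query and the documents, return the nearest ones — looks like this. The hashed bag-of-words "embedding" is a stand-in for the trained bi-encoder a system like Dense Passage Retrieval would actually use:

```python
import math

def embed(text: str, dim: int = 16) -> list:
    """Stand-in embedding: a hashed bag-of-words vector, L2-normalized.

    A real dense retriever would use a trained neural bi-encoder here.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents whose embeddings score highest
    against the query by dot product."""
    q = embed(query)
    return sorted(
        documents,
        key=lambda d: -sum(a * b for a, b in zip(q, embed(d))),
    )[:k]
```

The retrieved passages would then be prepended to the generator's input, which is the "add more context to avoid hallucination" step; a vector database replaces the in-memory scan once the corpus is large.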
And I see that transitioning not just to text, but image-text, or video-text — the idea that you want to add some more context from your database to the current inference. Yeah, yeah.

I mean, Weaviate is doing fantastic work. Actually, we have a podcast recorded with Bob, so, you know, my listeners can actually watch it, and we also had an episode with you where we covered some of these things. And you also recorded a bunch of videos, like walking through the feature set. What caught your attention in Weaviate, if you can slightly compare it to other database vendors?

Okay, well, I don't have much of a comparison to other database vendors, so apologies to everyone out there working on this. My experience with it doesn't come from the practical software engineering side; it comes from reading these research papers and then being familiar with these ideas. And then, I mean, Weaviate is easy to use. The documentation is great; it's easy to get started with. So that was a huge thing for me. You know, when I first met Bob — first of all, he's a great guy — and, you know, meeting this team, they're all really on top of everything, and their Slack chat is really great, people pitching in with their problems; it's just a great community.

But what did it for me is, I met Bob and then I spent about two weeks going through their documentation, the quick start, the installation setup, getting my datasets in there. And it's just really easy to use. And then learning about all these other things, like the Python client. As we talk about fetching the context, I mean, we want to integrate that into a training loop, where, say, Facebook also recently released internet-augmented generation, where they're using the Bing API to bring in the context and then learn with that extra training. So Weaviate has a Python client that lets you integrate that into your model workflows.
And then something we talked about in our last podcast: I love the GraphQL interface. I think it's really cool. And I love the web demo. So you can get started with the GraphQL interface and practice your queries, you know, learn it quickly, before you make any commitment to installing the whole database. So yeah, I just think Weaviate is a beautiful technology that's making my life of trying to do deep learning research a lot easier. So, you know, it's awesome that they're willing to support Henry AI Labs and help me continue making content on YouTube, while at the same time it's a tool that helps me do what I want to do with this kind of research.

Yeah. And are you already using Weaviate in your research, or planning to use it?

Yeah. So I haven't really made a Henry AI Labs video on this yet, but it's something I'm really excited about. So one paper I recently had accepted is in ICMLA — not quite ICML, but ICMLA, the applied version of it. KerasBERT is the title of the paper, and it's about, you know, language modeling with the Keras documentation and Keras code examples. Like Sayak Paul, François Chollet — they're going crazy with these Keras code examples, and there are so many examples. You could organize, like, a whole PhD completely online around these Keras code examples. To me, it's the most interesting collection of deep learning information on the internet.

So from there, there are two ideas. One is: can we build a language model that can, like, debug your Keras code for you? And, you know, OpenAI Codex — everyone knows it looks like the answer to that is yes. And they have datasets of, like, LeetCode problems. I know everyone loves LeetCode — everyone who is looking for a job. Yeah, Codex is, you know, able to pass these LeetCode tests.
And, you know, I'd say some LeetCode tests are harder than the deep learning debugging. So it looks like a pretty promising solution. And the second project I have, that I'm integrating Weaviate into to help me, is — you know, Facebook is big on unsupervised machine translation. They did a paper where they're translating between Python and JavaScript without any annotation. So maybe we can translate between Keras and PyTorch, or PyTorch and JAX even, somehow without much labeling. And this is very much an infant research project. But if you had that, you could bring the Keras code examples to PyTorch and JAX and just, you know, help people share this knowledge.

So those are two of my personal projects that I've started integrating Weaviate in, and then there's one project that I'm extremely passionate about and really into, with my involvement with the university. And this is kind of a separate thing that I'm not too heavy on, because I don't want to push the commercial interest too much. And, you know, Weaviate is open source, so it's open source software: we can download it from GitHub and we have it, so they can't, you know, take it away.

So this other project is, we're trying to build patient information retrieval systems, where you come to the hospital and they start to record your, you know, coagulation studies, all the physiological markers, and the genetic history. And we want to go query the literature, maybe. So this is a research project, and the Allen Institute has been pioneering this with datasets like CORD-19 and their system called Supp.AI. Salesforce Research had a system called CO-Search. I'm just kind of naming things for people; I'm not going to describe all these things.
So these are scientific literature mining systems where, you know, you want information about, say, COVID-19, or someone's coming in with some obscure disease and you want to be able to query the literature with particular information about this patient. And this is the information retrieval problem that we're super interested in as vector search engine people. So we're trying to turn these patients — and what we have is mostly tabular data; you might get a little bit of medical images, some clinical reports for some text, but mostly tabular data — into vectors. We want to encode that into vectors, send those vectors into the scientific literature, and then maybe there's some clinical trial, you know, because it's so much data. Once you really download, say, the CORD-19 dataset from the Allen Institute, you'll realize that 500,000 papers about COVID is nothing anyone could read. I already know this from reading deep learning papers; it's like no one can read all of this.

And even if you go the traditional way in this area, let's say you have a keyword lookup, right? So with keyword search, you would have to build some kind of synonym layer, which means you need to understand what you're doing, or you will need to hire somebody to do that. And that's an additional step which, you know, doesn't reduce the journey for you. You have to do that, and the thing is, you feel like you have more control, maybe, but at the same time, it's very laborious. Whereas similarity search kind of doesn't have that boundary, right? Essentially you have encoded it, and now the challenge, the complexity, moves more into the space of choosing the right neural network and then choosing the right database. Everyone knows which is the right database.
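The trade-off between the two approaches — vector similarity for recall, literal keyword overlap as the control the user expects — can be sketched as a toy hybrid re-ranker. The blending weight and scoring are illustrative, not any engine's actual formula:

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def hybrid_rerank(query: str, candidates: list, alpha: float = 0.5) -> list:
    """Re-rank (doc, vector_score) pairs by blending vector similarity
    with keyword overlap; alpha weights the vector side."""
    return sorted(
        candidates,
        key=lambda dv: -(alpha * dv[1] + (1 - alpha) * keyword_score(query, dv[0])),
    )
```

So a candidate with a slightly lower vector score but exact keyword matches can still win, which matches the intuition that users expect to see their query terms in the snippets.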
So, but anyway, I'm just saying: do you think that similarity search will completely supersede keyword search, or do you still see some synergy between them?

Yeah. Well, before I get into saying my opinion on this, I'd say that I'm not the expert on keyword search. So here's my opinion on it. You know, Weaviate has symbolic filtering, where you can still do symbolic searches; you can still do the keyword filtering; you can still have these symbolic characteristics. And I believe in things like what Gary Marcus talks about — you know, it's not really robust to these symbolic queries, what we mentioned earlier, where you insert a negation and it might completely throw it off. So robustness is not completely solved. I was reading a paper this morning from DeepMind researchers called Data Augmentation Can Improve Robustness — such an on-the-nose title. So yeah, solving robustness. I still think solving robustness is a huge issue for this; it's not completely put together yet.

Yeah, absolutely, I agree. But, like, yeah, you mentioned you are not an expert on keyword search, but at the same time, I think you are an expert at using Google, right? So you still type keywords. And I think, psychologically, you still expect the snippets to contain some of your keywords as a validation that the search engine got it right. Otherwise, the search engine maybe just returns you garbage in response to what you want. Yeah, and that's why I think, like, the PageRank transition dynamic matrices, those kinds of things — it won't be enough to just have the vector search engine, probably. You'll probably need some kind of tuning layer. And that's why — so Weaviate has the Python client.
As I mentioned previously, a research project for this would be to integrate that Python client into the training loop of whatever is doing the supervised learning task, so it isn't just retrieving. It's like when we talked about the difference between information retrieval and approximate nearest neighbor search; it's kind of like the semantic differences between the things you're encoding, where you might be encoding, say, the email title and then the email body. And so you have these different kinds of transitions between the categories of objects you're encoding. So yeah, I still think that there's a layer — I don't know how to describe it, maybe like that System 1 / System 2 thing, I know people like that analogy — but there's some kind of layer between keyword search and vector neural representations. There's something in the middle of that, and, you know, I don't know what it is, but yeah, I guess PageRank.

Yeah, basically you're talking about how, even after the vector database has returned the nearest neighbors, you still have a sort of liberty to apply a re-ranker, right? Because that's where your business logic kicks in: the rules, the product, the vision, the design. There are so many inputs into that process of ranking. And then ranking, obviously, is a huge research area as well, you know, with click bias and things like that, right?

Yeah. I mean, it's also interesting — it just crossed my mind that yesterday Richard Socher announced his search engine, You.com. Did you have a chance to check it out? For listeners who didn't check it out yet, it's a search engine which summarizes the web pages and documents and so on, and it kind of makes them actionable. Just one example: it can find you a code snippet on Stack Overflow that you can actually copy-paste. And that's just one example, right?
But there are plenty more. Any thoughts on this?

Yeah, well, first of all, Richard Socher — his research has been incredible. And as I mentioned earlier in the podcast, CO-Search from Salesforce Research: he was one of the authors; I don't know who led the project. So yeah, You.com, I mean, it looks crazy. Have I used it? Not really yet, but I definitely believe in the concept, and the research is pointing in that direction. It's exciting. But do I think it'll be a solely neural system? I mean, designing new interfaces around search — I've started to circle around that a little bit as I try to think while I talk. But yeah, the You.com thing is exciting: new spaces for search engines. It's hard to even completely conceptualize it, I think, because you think of Google as this giant, indestructible search engine, but that's really not the story. There really is a ton of research in search engines.

Yeah, yeah. Actually, I'm currently working on web-scale search engines, which I cannot name because my client is under NDA, but we basically have all the charts, and we know that Google is at like 97%, and then everyone else is close to the bottom, unfortunately. Well, of course, Bing has a couple percent of the market. And if you go inside a specific country, the split might be different. If you take Russia, for example, Yandex is on top and Google is following them, but very closely. Overall, though, globally, Google is just somewhere beyond the sky. So you need to differentiate a lot, you know; you don't want to build another Google. It's almost like Peter Thiel's book, Zero to One, where he says if you are building another Facebook, you're not learning anything from Mark Zuckerberg, or if you're building another Google, you're not learning anything from the Google founders. You need to build that next one, right?
And I think Richard is probably trying to build that one. So yeah, it's an interesting direction, that he's trying to involve the AI much deeper in the process, probably already surfacing it to users. That's fantastic.

Yeah, I don't have anything to add other than shared excitement about what You.com will become. It's certainly exciting. Yeah, absolutely. All the best, Richard.

Yeah, and actually I wanted to make a slight segue: you shared a ton of information today. I wonder, how do you keep up with so much stuff happening? What are your preferred sources of information? Obviously YouTube is one, but, you know, there is also Medium; there are the publications themselves. How do you structure your consumption — the pacing, and where to put your attention, and so on?

Yeah, that's a great question. In the early days of my podcast, I was doing Machine Learning Street Talk with Tim Scarfe and Yannic Kilcher, and Tim asked Jonathan Frankle, the author of the lottery ticket hypothesis, the same question: what's your information diet? And I thought it's a really interesting question. So mine is, you know, like most people out there trying to be good at something: it's chaotic, and it gets overwhelming, and I get really stressed out sometimes. So I don't know if this is the best advice to follow, but here's what I do. I'm very active on Twitter, maybe to the point of it being detrimental to my health; I check Twitter all the time. I'm always refreshing Twitter and seeing the new headlines. And when I see an arXiv link, if I like it, I've tried to discipline myself: don't just like it — read the abstract, get a couple of sentences in, because clearly the title caught your attention. So Twitter is really where I get all my news.
And then the art form of making these YouTube videos — I mean, Yannic Kilcher and Tim Scarfe, whom I mentioned, the Machine Learning Street Talk, that kind of medium — I watch that; it's pretty good. I also watch, like, Letitia's Ms. Coffee Bean, to go down the list; they're not the only ones doing it well. A lot of people are starting to make really great YouTube videos, and I love that kind of medium for showing these things.

So my workout — say I'm a basketball player and I've got to work on my deep learning skills — is mostly about reading these papers. For my experiments, I'd say the coding part is not super challenging, thanks to things like the Keras code examples — major thanks to them, because that saves me so much headache in just getting running. So yeah, I try to read like five papers at a time; I try to switch; I try to set 20-minute timers; I drink a lot of coffee. And what else do I do? Yeah, I guess that's it, really: reading the papers. I mean, if you make paper summary videos and write blog posts, that's also a huge way to retain it. I try to talk to a lot of people as well; I try to keep a lot of contacts. I organize all this through Twitter. So I might just send messages to, say, Sayak Paul — I think he works at Carted, and he's one of the leaders of the Keras code examples. I'll send him ideas. I'll be like, you know, I saw this paper on Twitter; I think this reminds me of what you're doing. And, yes, I guess overall, that's my information diet. I'm probably leaving something out — I didn't really prepare something for this — but no, it's okay.

I mean, it's also great that you're speaking your mind, and the things that really stick, you know, you mentioned them, right?
But where on that scale would you put Medium, you know, the blogging platform, which kind of thrives on tutorials? Sometimes these tutorials are kind of okay, but you wonder: are they going deep enough? But then there are other pieces where they summarize papers in such a way that they actually try to explain them. It's almost like popularizing science, because you do want to breed that next generation as well. And maybe you will get some feedback on your ideas, because, don't you think, when you publish a research paper, for the most part of humanity, it's dry text. For some, it's just Greek, right? They will not even understand it; they will never read it. But they still might be curious: okay, how do robots make decisions, or something like that. You know, how does my car keep the lane? Actually, today I was driving to work, and my car switched to lane-keeping mode, and it was telling me that I should not steer to the left that much, so it was actually steering to the right. But the moment it noticed that I took my hands off the steering wheel, it started alarming me and saying, hey, are you asleep or something? So it's also kind of caring for you, right? In a way. So it's not trying to take over too much of the work, in that sense.

Yeah. The idea of popular science — I mean, you know, I'm recording my podcast with a bookshelf behind me; it makes me look smarter. But I only really read books like — I mean, The Book of Why is a bad example; that's a really great book, technical, and I really like that one. But for most of these popular science books, I'd have to be on an airplane or something. They're in the same category as Medium articles that are popular science.
Like, you know, I read research papers only — not to be dismissive of anything else; that's just the question of what particularly I study. And my approach is very people-centric. Like, when, say, Chelsea Finn publishes a new paper on Twitter, I'll go read that, because I've kind of been following her thinking. Jeff Clune is another example, with the AI-GAs, or François Chollet. These kinds of people — Michael Bronstein with geometric deep learning is another great example. I hate doing these lists; I never like to do them because the list is so endless, and I've left off so many people. But, you know, I like the people-centric focus, and I try to get to know these people and understand how they think about these things.

It's the same as when you go to a conference. Sometimes you don't go for the specific topic. Maybe when you're a little more junior you do, but later in your career, academic or industrial, you actually go to listen to that person, because they might not give you any novel idea, but they might give you so much of the experience that you really need daily, right?

Yeah, and just following the timeline of their work — their newest work will help you realize, oh, that's how they were thinking in the past work too. I kind of see how they're thinking about these things. And, you know, everybody thinks so abstractly; they have this idea, this vision, and it can be hard to communicate the vision in writing or videos. So yeah, just like you said, I think it's repeated exposure to the same person — hopefully that's a Henry AI Labs thing.

Yeah, absolutely, I'm pretty sure. I saw some really great comments underneath your videos; some people were saying, "I can't wait for the next one." So you're definitely doing a great job there. So kudos to you for doing that for so long, actually.
I don't know for how long you've been doing this, but you have a ton of videos. Yeah, and I really appreciate it. +You know, the people who keep commenting, I, you know, I recognize your profiles, and I do really, really appreciate it. So it helps me keep making the videos and staying convinced of that medium of YouTube being one of the ways to express these ideas. +I'd say even, even more so than writing papers that you submit to these conferences. Sometimes I, you know, I think making a YouTube video can be a powerful way to share ideas. +I don't know if I want to completely put my flag on that idea, because, you know, these reviews, you do get some really good reviews. Like, as I mentioned previously at the beginning of the video, I, you know, I literally got smashed on my ICLR reviews. +They were not good, but I got, I got really high-quality feedback. So, yes. You know, you're learning from it. You're learning. Right. Yeah. Actually, one of my managers used to say feedback is gold. +So even if it feels painful, take it, because the problem is that sometimes, especially as you grow in your career, you know, at some point you will be the role model for some other people. Now, where do you get the feedback from? Nowhere, because you're the person giving feedback. +But you still need to grow. You still have pains, you have doubts, you have ideas, you need validation. And maybe you're doing something wrong as well at some point. Maybe somebody is intimidated to tell you that because you are at the top. You are, like, the boss or whatever. +You know, like, who gives you feedback at that point? They actually recommend turning to, you know, professional coaches, and kind of those people who can actually steer you in some direction, right? Or maybe you can unload your thoughts. +Have you found yourself in that situation? Or what, what do you think? Yeah.
Well, I mean, I'm in a lucky situation where I do have a formal PhD advisor that, as I mentioned, I speak on the phone with very often. +And, you know, my PhD advisor and I have had a relationship for so long that he, like, introduced machine learning to me. So it's like, I was a basketball player, you know, taking classes. And so this was my introduction to machine learning. +I, like, I hardly understood, like, you know, a t-test, statistical regression analysis before this class. So I've had the same advisor for a long time in that regard, like a formal academic advisor. +And then meeting people like Bob and, you know, you and I as we talk now, I, you know, I'm trying to reach out and pick the brains of people and see what they think, I guess. Yeah. So basically, they become like, you might have multiple role models. +And sometimes, you know, like, they also say, you do not need a physical person with whom you talk, but it could be some kind of online person. Like for me, it used to be, for a long time, Elon Musk, because I've been focusing on building startups. +And his approach to startups was not like, hey, you know, go unleash yourself, get rid of your doubt and just do it. No, he's so deep into what he does. +Like, at some point, I want to record a podcast where I would like to talk to you or talk to somebody to actually explain it, and kind of, does it resonate with you; like, his thinking is, first, you need to try this before automating this. +You need to repeat it several times to learn new mistakes, and blah, blah, blah. So it's like an amazing way. +And he, like, built this kind of, you know, thought machinery that he applies to any problem, right? So any problem that lands in his hands, he's like, I can try it step by step like that and see what happens. And maybe at some point it just drops out and you're like, okay, I'm done here. +I'm moving to the next one, right? So I'm not going to waste my time.
And he's a super productive guy, as we know. So, I mean, sometimes it could be just an online person that you follow. And as you said, you do this on Twitter, like you said, like maniacally refreshing the tweets. +So just stay safe as well there. But at the same time, I think there's a period of time in your life when you're learning a ton. And later in your life, you will be kind of generating fruit out of it mostly. Or maybe you will be telling other people and maybe inspiring them more and more. +And then leading some research groups and, you know, work teams. And that's, that's totally fine. But I also wanted to call out your idea that I think is quite instructive for many of us. +And hopefully for our listeners: that yes, do go and read papers, because as Andrew Ng put it, he said, read a paper every weekend; let's say you have a full-time job and you don't have time to read it, you can read it on the weekend. +And he also recommended to start coding, you know; like, even if you didn't find the code for it, just try to implement the idea, right? +At some point, after reading the papers, you will actually start generating ideas, because you will find gaps in the thinking of the authors of all of these papers. +And nobody is doing a perfect job there. They're doing the publishable work, right? And so I think that resonates with you as well. Yeah, definitely. +You definitely, like, switch gears, where you become an idea machine, like you say, where you read a paper and you'll have, like, a billion ideas for how to extend it. And then you'll transition to this part, which is what I'm learning now. +And, you know, as I'm in my last year, I've been two years in my PhD, and the transition for me is going from idea machine to, okay, can you really build the idea for real? Do you really know how to test this? And that transition isn't super obvious. +And it's painful to be going back and forth between, you know, theoretical idea machine.
+I'm reading these papers, because, like, in terms of that flow state of creativity that you get into when you're working on things, for me, personally, reading papers is like the most satisfying thing. I feel very, like, productive when I'm reading papers. +I might, you know, I feel good. But when I'm engineering things, I feel more pain, man, because it's more painful, I'd say. Yes, yes. +And this is where, of course, you do want to have those well-oiled software systems, so that you don't need to waste your time setting things up or running out of disk, whatever, you know, happening so, so frequently. +So, like, even the innocuous things: like before I had integrated Google Drive with Google Colab, and it would crash, and I'd feel like I've just lost 10 hours of running this thing. So, and that is not good. +Like, this is, I think, what Joel Spolsky said at some point, you know, the co-founder of Stack Overflow; you know, he said, like, imagine that you want to print a piece of paper and you log into your computer and it says, please upgrade the driver. +So you upgrade the driver, and then the operating system says, I need to reboot. So it reboots, and it basically wastes 10 minutes of your time. And then again, it says, hey, actually, I cannot print because you ran out of something. And again, it installs things. +And, like, instead of solving the problem, you become the administrator of your computer, right? +And the same thing can happen so often in software, you know, development and research as well, because I think somebody put it on Twitter: we do not actually choose between big and small, like, doing a lot of things and doing a small amount of things. +We usually choose between small and nothing. And so I guess when those things are eating a lot of your small time, down to nothing, you're, like, frustrated and you're like, okay, I'm just down the rabbit hole. +What am I doing?
+And so I think tools like Weaviate save a ton of time, and everybody who is innovating in this space from the direction of usability, you know, like, and saving time, shaving those minutes off of, you know, your experience, I think that will save so much time for your thinking as well. +Yeah. And before Weaviate, I was doing a little bit of the sponsored content work, which for me is great, because I get to talk to these people and they teach me a lot. And so this is with Determined AI, which is now a part of Hewlett Packard. +And so, yeah, they're building the hyperparameter, like, distributed training, hyperparameter optimization, which is what we're talking about, like being the administrator of the system; they're doing a lot of this work. +And you know, as anyone, I'm sure people listening to this have gotten smoked with the cost of one of these experiments too. So it's not just your time. It's not fun. Yeah, actually, you reminded me of something on Google Cloud. It was a tutorial, like a workshop. It was a free one. +They even, like, gave us food. So you just show up and then they tell you things. And it was a practical one. And I remember one of the instructors, he was not an employee of Google, but he was certified. And, you know, like, he said, hey, now we're gonna spin up the Spanner cluster. +And Spanner is, like, planet-scale SQL with all the consistency and semantic guarantees, using atomic clocks. And there is, like, a fantastic presentation by one of its engineers that I have in my recordings. I have not published it yet, because I don't know if Google will try to sue me. +But you know, the idea is that it's a fantastic system. And there is a paper as well. And then the guy, the teacher, he said, well, hold on. Don't spin up too many of them, because I get the bill. And last month, I got a bill of $4,000. +And Google could not reimburse it, because they said, you're not an internal employee. So he was like, it's fun, but, you know, to the point when you might.
Yeah, it's funny. It's funny now, but it's not funny at all. Yeah, Determined AI calls it lunch and learn. +There is this kind of concept for deep learning, or, like, I'd say, data science content; like, even, you know, with physics, they're going to be doing experiments where it's expensive. So we're not going to each be doing it. +We're going to watch one person do it and kind of gather around as a community. And yeah, I see that as being a huge part. Just like Uber Eats coupons, I think, is a brilliant interface for it. And then everyone attends the thing. But yeah, I love that kind of thing. +And then, just quickly, one thing we're working on at Weaviate, and as people have seen with Hugging Face datasets and the Kaggle competitions, well, Hugging Face Datasets is a little different, but it is hosting the demos cheaply. So at Weaviate, we're working on this. +Wikidata is going to be the next big release, where we have the PyTorch-BigGraph embeddings; the graph structure makes it different from, say, Wikipedia, because it's really good at entity embedding. + As we mentioned London and Barcelona: if you construct a knowledge graph of Barcelona compared to London, that's going to have a better entity representation, using learning techniques like DeepWalk or node2vec, or maybe, maybe like a graph convolutional network with an autoencoder loss, but probably DeepWalk or node2vec is what I would say; I mean, I'm not completely caught up with that, but anyway, so having that kind of dataset, the Wikidata, and now it's cheaper. +That's the huge difference. That's the change in deep learning: Hugging Face is hosting all these datasets, so you don't have to host them yourself. You can just quickly access them.
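As an illustration of the DeepWalk/node2vec family mentioned above, here is a minimal sketch of the first stage, generating random walks over a toy knowledge graph. The graph, function names, and parameters are illustrative assumptions, not Weaviate's or PyTorch-BigGraph's actual pipeline; in practice the walks are then fed as "sentences" to a skip-gram model (e.g. gensim's Word2Vec) to produce the entity vectors discussed here.

```python
# Illustrative sketch only: uniform random walks as in DeepWalk.
# node2vec would bias the transitions with its return (p) and
# in-out (q) parameters.
import random

def random_walks(graph, walks_per_node=10, walk_length=5, seed=42):
    """Generate `walks_per_node` walks of up to `walk_length` nodes
    starting from every node in the adjacency-list `graph`."""
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy knowledge graph around the entities from the conversation.
toy_graph = {
    "Barcelona": ["Spain", "London"],
    "London": ["UK", "Barcelona"],
    "Spain": ["Barcelona"],
    "UK": ["London"],
}
walks = random_walks(toy_graph)
# Each walk is a sequence of entity tokens, e.g. starting at "Barcelona".
```

Entities that co-occur in many walks (Barcelona and Spain, say) end up with nearby vectors once the skip-gram step is applied, which is what makes the graph structure useful for entity representation.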
+ And with Weaviate, it's even more exciting, in my opinion, because they're hosting a vector search engine with model inference. I mean, Hugging Face is doing model inference too, as we talked about Infinity, where they've got inference times like milliseconds for these massive models. Yeah, you don't have to pay for the hosting of these things, which is obviously good. +Absolutely, absolutely. And also, the cost of maintaining hosted things is a cost not to neglect. So absolutely. Yeah, yeah. Absolutely. Hey, it was such a packed conversation. +I think the show notes will be infinite, because you mentioned so many names, so many articles, and that's fantastic. Thanks so much for doing this. I wanted to still kind of end on a little bit of that philosophical stance, which I usually do. +And I think we touched a lot on that, and thanks for doing this. But, like, in summary, what drives you? Why are you doing what you are doing? That's a great question. I mean, I guess, like, and I've heard, as you mentioned Elon Musk, I've heard that he says, like, I want to be useful. +That's one thing he says. Yeah. And I guess, in the same way, trying to do the useful thing. +And I guess, like, obviously, I like these big grandiose visions of things like helping with health care and self-driving cars and helping with poverty and creating housing, climate science, all these kinds of things, obviously. +So obviously, there are these big grandiose goals that I think we all share, truthfully. +But then it's more of a question of, how do you stay in the grind of it? And how do you keep waking up and keep getting at it? And so I'd say that kind of heuristic of just trying to do useful things every day is actually a pretty good guide. And so we all share these big visions. +But we need the motivation to pick ourselves off the couch and actually do that. Yeah, absolutely.
+And it also sounds like, you mentioned you played basketball and you continue playing that, right? So that thing, when you do the sport, you need to be persistent, right? And your body sometimes doesn't want to do it, maybe. But you know in your mind that you do want to do it. +And so that persistence, I think, also translates into, you know, the research and keeping up with things, right? Yeah. Yeah. + And to stay on that kind of analogy, I'd say, like, the physical pain of basketball, like, you might hurt your knee, you might have some tendonitis; it's that kind of physical pain, or the physical pain of when you're doing conditioning and you can't breathe; you're going to have that same kind of analog with this kind of mental work. +And it'll manifest itself in, like, depression and burnout. And so you have to be, like, as you do more training, you get better at the pain of the injuries. So to say, like, it's like injuries to your mind, the same kind of analog as physical injuries would be. +And I think understanding that and accepting it and dealing with it is important as well. And then it kind of translates into maybe some other region of your brain, when you have this pain from, like, you know, reviews, or, like, your experiment going wrong, and you're like, oh yeah, oh yeah, fine. +I've got to get a cup of coffee and, you know, in five minutes, I'm okay. Maybe. Yeah. The coffee is the key supplement. Absolutely. Connor, thanks so much. This was such a fantastic conversation. I'm pretty sure we can repeat it, have another one. +And I can't wait to see what development you're doing with Weaviate and also in all your research projects. You know, stay active, stay hungry, stay foolish, as Steve Jobs used to say. And I think that's fantastic, what you're doing. Thanks so much. Thank you so much for having me. Bye.
\ No newline at end of file diff --git a/transcripts/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md b/transcripts/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md new file mode 100644 index 0000000..1d134f4 --- /dev/null +++ b/transcripts/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md @@ -0,0 +1,65 @@ +--- +description: '

Topics:

00:00 Intro

01:54 Things Connor learnt in the + past year that changed his perception of Vector Search

02:42 Is search becoming + conversational?

05:46 Connor asks Dmitry: How Large Language Models will change + Search?

08:39 Vector Search Pyramid

09:53 Large models, data, Form vs + Meaning and octopus underneath the ocean

13:25 Examples of getting help from + ChatGPT and how it compares to web search today

18:32 Classical search engines + with URLs for verification vs ChatGPT-style answers

20:15 Hybrid search: keywords + + semantic retrieval

23:12 Connor asks Dmitry about his experience with sparse + retrieval

28:08 SPLADE vectors

34:10 OOD-DiskANN: handling the out-of-distribution + queries, and nuances of sparse vs dense indexing and search

39:54 Ways to + debug a query case in dense retrieval (spoiler: it is a challenge!)

44:47 + Intricacies of teaching ML models to understand your data and re-vectorization

49:23 + Local IDF vs global IDF and how dense search can approach this issue

54:00 + Realtime index

59:01 Natural language to SQL

1:04:47 Turning text into + a causal DAG

1:10:41 Engineering and Research as two highly intelligent disciplines

1:18:34 + Podcast search

1:25:24 Ref2Vec for recommender systems

1:29:48 Announcements

For + Show Notes, please check out the YouTube episode below.

This episode on YouTube: + https://www.youtube.com/watch?v=2Q-7taLZ374

Podcast + design: Saurabh Rai: https://twitter.com/srvbhr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230311_070307_5788fcdf763e7dd822dd4b0bbb59f9b6.jpg +pub_date: Sat, 11 Mar 2023 19:38:10 GMT +title: Connor Shorten - Research Scientist, Weaviate - ChatGPT, LLMs, Form vs Meaning +url: https://rss.com/podcasts/vector-podcast/861832 +--- + +Hello there, Vector Podcast, Season 2, and I am super, super, super excited to have a reappearance of Connor Shorten on Vector Podcast. We recorded like a year ago, about that time. Some things have changed. He is a research scientist at Semi Technologies, the company behind Weaviate. +Here you can see an episode with Bob, and here you can see the episode with Connor as well. And back then, when we were talking, Connor, you'd been a lot into basketball. Do you still play basketball? Yeah, I still play a little bit. +And I'll add also to that that I think you also have podcasts with Eddie and Laura, also in the queue of Weaviate. We'll add that. Exactly. +And I remember, like, you'd been big on computer vision, data augmentation back then, and your first, like, guinea pig task was, you know, capturing baskets in the basketball game. And I wonder if you continued working on that at some point. +Yeah, I think about it every now and then, but I've been so captivated by the natural language processing and the text search, honestly. I still think about image search a bit, but yeah, the text search to me is just, it's just so exciting. It feels like there's so much that you can do with it. +And yeah, it's really been, it's been an intense year. I've learned so much, and I think it'll be a totally different podcast with respect to, like, what I'm talking about. Yeah, yeah, absolutely. +I actually love to start also by asking you: what do you feel you've learned in this year that has changed something fundamentally in how you perceive vector search today versus back then, a year ago? Yeah, that's a big question. I think I'm definitely, with Weaviate,
+I've learned a lot about having, like, kind of the user focus, the product focus, definitely way more engineering understanding of the distributed data system, replication, CAP theorem, all these kinds of things. +So, like, the knowledge of the engineering around it, in addition to sort of the machine learning research about, like, how the vector representations get optimized with deep learning models, and then, you know, this whole retrieve-and-read research. +And overall, the space has evolved in such an amazing way, and it's just really exciting. Yeah, absolutely. +I've been, I've been also following all different things, reading papers, you know, implementing CLIP, but I still feel like I miss out on so many things, and I really hope we will cover some of them today. +And we're on the verge of, I think, maybe witnessing a change in the search paradigm, you know, with ChatGPT. First, I wanted to sort of get your reaction on this. Obviously you tested it. I also tested it, actually, when I published my recent blog post on neural search frameworks. +I was, like, just stuck on creating a title, and I asked ChatGPT, can you come up with a title, and it came up with a reasonably good title, and I actually used it without editing. And I read a bunch of other stories, you know, like, for example, how you can avoid fines for wrong parking and stuff. +But then there is this discussion going on, you know, like, how it may change search. +But before that, what was your impression of ChatGPT? Yeah, well, I think, like everyone else, sort of, in this, like, reading about, say, Google's Flan model, or, you know, we've been kind of reading about a lot of these large language models, but we haven't actually really gotten to use them.
+ I think Facebook's OPT model was on Hugging Face, and I played with that, and back at the time, the few-shot learning part was, like, the part that was so exciting, where you could, you know, give it, like, a few examples of a task and then it could just instantly learn the task, and that's, like, pretty surprising for people who've been doing supervised learning optimization for a long time. +And so mostly my thinking was few-shot learning, but this ChatGPT thing, this reinforcement learning from human feedback, this, like, I mean, the way that it can talk is just mind-blowing. +I'm so amazed by it, and I think, yeah, it's really unlocked a lot of thinking about the importance of prompting to me, and what prompting means. I used to think that was just kind of like a task description idea, which it still kind of is, but, like, the nuances of it are so much. +And yeah, I'd really love to, like, dive into this topic of large language models and search, and I have a few different dimensions of how I'm kind of thinking about these two things relating to each other, but since I've brought up prompting, I kind of want to stay on this one quickly. +So Bob and Jerry Liu showed me this thing called GPT Index, and GPT Index has this strategy for prompting GPT for summarization. +It has other things, but this is one thing that just really stood out to me, and there are, like, two strategies you can use to summarize long text with a large language model. + You can either create and refine, where you go paragraph by paragraph: you start by saying, please write a summary of this long text, you feed it paragraph by paragraph, and then it iteratively updates a summary; or you can have this tree, where you, you know, chunk it up, like, you know, as a tree, and then you combine it, like, recursively, and then build up the summary that way.
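The "create and refine" strategy described above can be sketched roughly like this. This is not GPT Index's actual API; `llm` is a stand-in for a real large-language-model call, stubbed out below for illustration.

```python
# Minimal sketch of the "create and refine" summarization strategy
# (not GPT Index / LlamaIndex's actual API).

def refine_summarize(chunks, llm):
    """Summarize the first chunk, then iteratively fold each
    further chunk into the running summary with a refine prompt."""
    summary = llm(f"Please write a summary of this text:\n{chunks[0]}")
    for chunk in chunks[1:]:
        summary = llm(
            f"Here is an existing summary:\n{summary}\n"
            f"Refine it with this additional text:\n{chunk}"
        )
    return summary

# Stub "LLM" for illustration only: echoes the prompt's last line.
def fake_llm(prompt):
    return prompt.strip().splitlines()[-1]

result = refine_summarize(["First paragraph.", "Second paragraph."], fake_llm)
```

The tree variant mentioned above would instead summarize chunks independently, then recursively summarize the summaries until one remains, trading the sequential dependency of refine for parallelizable calls.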
+So this kind of thing about how we use these large language models, all of it is so interesting, and so I guess, kind of, yeah, let me pass it back to you, and I'm curious: how do you think large language models will change search? +Yeah, I mean, I'm still kind of learning it, and I, having, you know, built a search engine before vector search, you know, using, like, TF-IDF basically, +I knew the cost of doing it wrong, you know, or sort of focusing too much on precision and then paying a huge bill because of that. + So, like, our search engine, for example, back in the days when we indexed on the sentence level at AlphaSense, would eat something like half a terabyte of memory, and, you know, memory was never cheap; like, it was very expensive even back then. And so we had to figure out ways to retain precision, not lose recall, or maybe even increase recall, because there was a problem with this precision-oriented search, and stay within the budget, right? So when I think about language models myself; and I also worked at Silo AI with one large client, you know, applying these models at web scale.
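For readers who haven't built one, the TF-IDF scoring Dmitry refers to can be sketched in a few lines. This is illustrative only; production engines of the kind he describes use inverted indexes, length normalization such as BM25, and compression to manage the memory costs he mentions.

```python
# Minimal TF-IDF scoring sketch (illustrative, not a production engine).
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each doc as the sum over query terms of tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter()                      # document frequency per term
    for terms in tokenized:
        df.update(set(terms))
    scores = []
    for terms in tokenized:
        tf = Counter(terms)             # term frequency in this doc
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # idf dampens terms that occur in many documents
                score += tf[term] * math.log(n / df[term])
        scores.append(score)
    return scores

docs = ["vector search engine", "search at web scale", "cooking recipes"]
scores = tf_idf_scores("web search", docs)
# The second document matches both query terms, so it scores highest.
```

Indexing at the sentence level, as described above, multiplies the number of "documents" and hence the size of these per-term statistics, which is one way the memory bill grows.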
+ The problem at web scale is that you really need to go sub-second, and not just sub-second, you need to go like 10 milliseconds or so, because you have so many components in the search engine; it's also multilingual, it's also serving a specific country, you know, with specific latency requirements and stuff. And then there is indexing: how quickly you can index things, right? Because you may also face bottlenecks there. So these are the things that I keep thinking about. But also the thing that we talked about a year ago in the same podcast, Vector Podcast, is that, you know, the models, like, trained by Microsoft, for instance, I can hardly imagine deploying them today in my practical setting, because they will have like billions of parameters, and so they will be probably slower, and also, how do I fine-tune them, how much server capacity will I need to fine-tune them? And so that's why I thought, you know, from the discussion with Malte Pietsch, right, he pointed me to the Atlas paper, where they basically are able to, with a few examples, fine-tune the model so quickly, and it will have substantially fewer parameters, so it becomes more practical, you know, both on the fine-tuning side and also on the serving side. And these are the topics that I keep thinking about before I enter the: is it ChatGPT, is it sexy, is it cool, is it answering my questions? You know, can I actually deploy it and not have angry faces from DevOps saying, hey, you just crossed all the limits; like, we are low-margin on search and you are just, you know, way above that, so sorry, we cannot deploy this. So these are the questions I'm thinking about a lot. Yeah, I think there are a couple of things to unpack, and no one's helped me develop the abstraction around the end-to-end search framework more than you, so thank you, with the pyramid diagrams and these kinds of things; it's so helpful. And yeah, you mentioned, like, the approximate nearest neighbor, then one up you have what I see as the information retrieval
layer, where you have the, you know, dense vector search, BM25, SPLADE, ColBERT at that layer, and then at the top you have what I think is going to be the ChatGPT layer; that would be my current prediction. And we're going to talk more about neural search frameworks on the Weaviate podcast. Yeah, well, maybe to just say a little bit: one of our favorite partners that we've been working with is Neural Magic, and Neural Magic is doing sparsity inference acceleration, where recently one of their papers is about getting the 175-billion-parameter GPT model to run on a single GPU. I know that, you know, you can probably compile these large language models on, like, an NVIDIA Triton server and do it that way, but I think that this sparsity acceleration for CPUs is just incredibly exciting for that particular dimension of it. And yeah, I think what you said inspired so many ideas. Yeah, I sort of, like, what I value in your approach is that you run, probably like a basketball player converted into a marathon runner, with the same capacity you have to play a game, you know; you basically run super quick and fast and long distances, you know, on the research side, and I love this approach, really, really, because it opens up a lot of opportunities. I sort of, because I come from the engineering background, yeah, I did my PhD, but it was like 11 years ago, so most of my time I spent in production, you know, building systems, and every time you just try to move a little bit, like, okay, let's add this, and, oh, the cost is this; oh, sorry, okay, it will take me now two more weeks to index my content. And do we have a use case for this? So you trickle back to almost, like, product-level management, and so you will get these questions inevitably, like, okay, why are we doing this, like, what's the actual trade-off, what's the benefit of bringing this into production, right? But at the same time, I'm fascinated by this. I mean, this will not stop for sure, right? Would you agree to that
statement? Yeah, I think, and there's, uh, so I know Hugging Face recently published, they open-sourced a dataset they did with Surge AI on getting these, um, human annotations to train the reward model in the reinforcement learning from human feedback strategy. So I think there'll be an open-sourcing of the data, of the data that you need to train the models, and then, yeah, I think pretty soon there'll be open-source versions of it. I think OpenAI, um, I'm very curious about this, like, kind of data flywheel idea, whereby, opening up the model, they spend a ton of money on letting you use it for free, but then they get the data of how you want to use it, and so I'm very curious how that leads to a better model. My PhD advisor is a world-class expert in class imbalance, like, understanding that machine learning models do not perform well on the long tail, you know, if you have imbalanced data; so a lot of, like, the bias discussion, things like that. So I'm curious, maybe it helps the long tail, getting all this data. Yeah, it's still not exactly clear how it will get better. I think one thing I've said previously is, like, there was this paper from Emily Bender and, um, Koller is the last name, sorry, but it's called Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, and it makes this argument that, it's like, language models, by predicting the next token, will never achieve meaning, because it's like an octopus underneath the ocean listening to two stranded islanders, and it's just mimicking their language. But if something like a bear is to show up on the island, and it goes, help, a bear, then the octopus is like, oh, I don't know what a bear is. But I think what we're seeing with the reinforcement learning thing is that it's acting; there's this other paper called Experience Grounds Language; it's about, it's like the levels of sort of developing meaning, and one of them is about the importance of acting, acting in your environment. I'm, I'm kind of
going around right here, but I also see, like, this causal inference stuff, and, uh, Judea Pearl has this ladder of causality, where, uh, you act, you make interventions, but then the top of the ladder of causality is you can understand, uh, counterfactuals. And so that last part, I have no idea how that's going to be achieved yet, but clearly ChatGPT is now, like, acting, so it's different from the, yeah, yeah, the next-word thing. Yeah, I think, coming back to ChatGPT, like, what, um, impressed me maybe the most is, uh, so I had, I had this problem: I was working on a billion-scale ANN search algorithm with a group of researchers and engineers, like, almost a year ago. So I invented this algorithm, I called it, like, candy, of course, you know, not meaning my surname, but in any case with a K. Um, it's all open source on GitHub, I'll make sure to link it. And so the problem was that it would work on 10 million vectors, it would work on 100 million vectors, but it would choke on one billion; it would basically run out of memory. Uh, and I did it entirely in Python, right? So maybe I should have chosen, in retrospect, some other language, but in any case, I wanted to make this work. Um, I couldn't; I ran out of time, and I ran out of compute resources, because they were given to us by Microsoft, um, for a limited period of time. So what I did is that I pasted that code into ChatGPT, and, yeah, first of all, I tried to paste the whole thing, but it said, well, it's too long, so I had to focus on a specific part where I think the problem, you know, kind of lurks. And it gave me the answer; it said, okay, maybe try to avoid allocating NumPy arrays as much as you do, try to pre-allocate them, try to reset them. And actually, I did that; I just didn't paste that portion of the code which was doing this, so the system didn't know that, but it was on the right track. But then, when I did it a year, sorry, a day later, the answer changed. The question was exactly
same, but the answer changed, and that kind of made me really, like, uh, what's going on? Like, is it learning as it goes? Can you explain this part? Like, have you seen this behavior? Like, was it the stochastic generation of, yeah, ChatGPT? Sorry, I was, like, I was trying to follow along with the, I think we're going to talk about, like, approximation error with the ANN search as we scale it, and I know we're coming back to the ChatGPT, but, uh, yeah. So it's like, uh, it's like a tree decoding, where it has a probability distribution over the vocabulary, and you can take several paths through that tree for what you're going to output, and, uh, you often randomly sample through the tree, if that makes sense. Like, um, yeah, yeah, it does, but I mean, the answer was kind of, like, in some sense, these two answers were complementary to each other, right? And maybe I could go on and say, hey, what do you mean by resetting, because it didn't provide any, uh, code examples; it would just say reset, and I was like, what do you mean by reset, I don't have such a method. So I think that was maybe the impressive part of ChatGPT. And, um, just to close off on that, there was a recent discussion on relevancy and matching text, like, where a lot of these search people, see, uh, there was, um, there was this argument against ChatGPT: that, let's say, if you go, um, you know, use, uh, DuckDuckGo today, you will see the links, right? You can go and examine the links, and you can actually verify the information to some extent, maybe not to the full extent, but to some extent. In ChatGPT, you can't do that: there is an answer, that's it. So it's quite a jump from being able to kind of seemingly check, is it trustworthy, to, well, you have no way to do that. What do you think of this aspect? Yeah, that's brilliant. It makes me think about, like, well, very broadly, it makes me think about artificial general intelligence compared to superintelligence, so to say, and, like, I think about the
artificial general intelligence part because OpenAI published WebGPT and InstructGPT. InstructGPT is the reinforcement-learning-from-human-feedback part, and WebGPT is the whole idea we're super excited about at Weaviate, where you search for context to append to the input. If you say, please ground your answer in this information, and then give it a paragraph about, say, how the BM25 algorithm works — I personally used it that way to understand hybrid search — if you give it the context, it's so much better. So I suspect that ChatGPT under the hood does something like a Google or Bing API search. But yeah, then this idea of superintelligence: I've been asking, can I use ChatGPT to help me write blog posts, survey papers, things like that, that are relevant for trying to be a master of search? And what I need from it is more like citation recommendation — I need it to go into, say, Leonid Boytsov's publications, parse them out for me, and help me understand what he's done. So it's the specific information. And then, yeah, the real — I mean, you.com also has a really brilliant thing where there's a search engine on this panel and the ChatGPT-style model on that side, so it's a user-interface problem, I think. Yeah, but I mean, I totally agree with you that the user interface definitely creates the bias in how we use things — like how you use traffic lights today: they go red, yellow, green; they don't go upside down, right? And if you see an upside-down one, you'll think, well, this is a wrong traffic light, I'd rather not cross here. It's kind of similar with search engines: we are used to seeing URLs and being able to click there. But of course, if you take Google — and I guess Bing does this too — they also pre-generate these answer boxes, right? You can click there, but I don't think you have a URL to verify the source of that information, if I'm not wrong. Yeah. So they are already playing with incorporating this knowledge from a language model. And of course they also want you to spend more time on their page — which is probably not good, but we'll not discuss that — so they don't share the traffic further. But the thing is, they still play with the idea: okay, what if we try to answer not just with the URL and a summary, but actually with the actual thing, right, with the actual answer? Oh, so that comes into extractive versus abstractive, and whether you want the question-answering models that classify the answers in the context. Yeah, and I think that still has a place for sure — it's super lightweight. As I mentioned, Neural Magic just did a sparse question-answering model that can run on CPU super fast, and I think that approach is also just going to be more cost-effective for a while. Yeah, exactly. But you mentioned BM25, and I'm curious — I've been trying to approach this hybrid search topic, but I think you went ahead. All right, so I was
just wondering: what's your take on this topic? Can you give a little intro for our listeners, but also, why do you think it's a good idea to build hybrid search, combining keyword retrieval with dense retrieval? Yeah, awesome. I'll start by saying this has just been the most satisfying project I've worked on since I joined Weaviate, and being part of this team — it's been a big team working on hybrid search — has been an incredible experience. So I guess, starting with the motivation: BM25 builds on term frequency–inverse document frequency by adding this binary independence model and the IDF calculation, and then you also normalize for the length of the document. It's just these subtle differences that make it different from TF-IDF — but you could also use TF-IDF in hybrid search if that's what you were after. So then you also have the vector search, and then you have this rank fusion. We found this paper where they have seven different strategies for rank fusion — RRF, Borda, I don't know, CombSUM — but in the end we just went with RRF, reciprocal rank fusion. Erika recently published a blog post that shows the equation and tells some of our thinking around it. It's where you combine the ranks, compared to, say, combining the scores — because BM25 returns a score in particular and vector search returns a distance, so you might look at some way of linearly or non-linearly combining those scores. And I've done some experiments with that; my thinking was, okay, what would be an optimal alpha per query? Would that maybe be a conditional model? So I tried this with few-shot learning on GPT-3, where you show a few examples of the optimal alpha and then ask: how would you weight BM25 and dense vector search given this query — and see if that is productive. But I found — and this is very interesting, because I think people have this idea that BM25 is very interpretable, but in my experience it hasn't been. When you're doing long-ish queries on long documents — and maybe we can talk about long queries versus short queries — I find that trying to decode why BM25 was better than dense search for some particular query is impossible. Maybe someone will prove me wrong, and I'll honestly look forward to seeing that. But there's this example we have: Erika was developing the Weaviate demo of hybrid search, where the query is "how to catch an Alaskan Pollock", and the idea is that the dense vector search contributes the disambiguation of "catch" — that it refers to fishing — and BM25 is specific to "Alaskan Pollock". But I haven't been able to just inspect that kind of behavior as I look through the BEIR benchmarks — I'm super excited to talk about that and how we've been evaluating it. But let me pass it back to you and ask about your experience with BM25, or keyword and dense search in particular, because then I'd like to take the topic to arbitrary combinations of retrieval methods, not just BM25 and, say, DPR or whatever. Yeah, I remember even before dense search appeared on the scene, we were experimenting with TF-IDF — which BM25 is sort of an add-on to; BM25 I think stands for "best match", so, period, problem solved, right? But you know, one of the questions — the reason I love working with product managers (and at the moment I am a product manager, so I took the other side of this thing; maybe we can talk more about that on the Weaviate podcast) — the reason I love talking to product managers is that maybe they don't know that much about algorithms as you, and they don't code maybe as much as you, but
they do care — they are the stakeholders of the end result, right? So when they go out and talk to sales or to the end users, they will not get a question about which alpha you have used. Coming back to your example: they will say, hey, I typed "cat" three times in my query and I still see that the document that mentions it once is at the top — how can you explain this? I will try to link it: there is a consulting company, I think they're based in Boston actually — I just forget their name, Kee-and-via something — and they have a really great presentation on Haystack Live, I believe, where they go super, super deep on how TF-IDF screws up our understanding of how things should work. They go through how many times the word "cat" is mentioned in the document versus how many times it's mentioned in the query, you can do all these combinatorial combinations, and then they explain what you would do to solve it. Another thing I found useful — it's also mentioned in the Relevant Search book by Doug Turnbull and John Berryman — is that if you use, let's say, Elasticsearch or a similar system, or Solr, you can actually use a function which explains the query step by step. It basically prints you the tree of how the engine came up with that final answer, with that final score, and how a specific field contributed. For example, at TomTom — I cannot go into much specifics of what we do at TomTom, but it's basically geographic search, right? You type some destination, let's say an address, or maybe a P.O.I. name, a point of interest like a shop, and it's multilingual as well. So obviously your query may sometimes hit, by accident, a wrong-language field, and the only way to know this is to print that query execution formula, if you will. Then you will see: okay, ah, it hit in, let's say, a French field, but I wasn't intending French, I was doing German or something — why did it do that? And you start reasoning about how you created the tokens. Because when you tokenize your text, it's the same problem as in dense search, in a way — like when you split text into paragraphs or sentences, here you need to split into tokens, and how you split the tokens depends on how you model the semantics of what you are converting to a token. You should not convert a string to a token; you should convert meaning to a token. If you capture meaning in that token, then you're done, in a way. But coming back to your question — I cannot answer it fully now, but I highly recommend that talk. You need to see how term frequencies and inverse document frequencies play together. Also, between BM25 and TF-IDF you have the term saturation issue, which is mitigated in BM25 to some extent. Meaning that if you have two terms where one occurs, like, a million times and the other a million and one times, TF-IDF will keep growing the score and treat that difference as meaningful, while BM25 saturates and is no longer sensitive to it — and that's why it's a little better, right? I think it solves this term saturation issue. I don't know if I answered your question. No, yeah — a couple of things. I really want to continue on this TF-IDF versus BM25, and then SPLADE tied to it. I think this pseudo-relevance feedback — is that the phrase? — the idea that if you're searching with BM25, you have the gold document and
you're asking, how would I have modified the query to produce that document? So I think that's one way. Another way is: how would you modify the indexing? That's more in your control, right? For example, in some cases you can remove the duplicates, because you don't need them, or you can split a term on numbers if they happen to occur inside the term — I'm making these examples up, but I'm saying that you have more control in the indexing than in the query; in the query you can model, for example, query similarity. Oh, that's super interesting. Yeah, the way you do the text preprocessing — stemming, stop-word removal, all that bag of tricks — that's what I hope dense vector search can kill; I hope anything can just go into it. And I think there's this thing, decoding the latent space of a vector search model, on that other idea of "what query would have produced this": you train a language model on document–query pairs, and then it generates a query that would have matched the document. Maybe that's useful. But I'm also very curious what you think about this idea of SPLADE vectors. SPLADE is where you keep the masked language modeling head: you encode the input into vectors, and the masked language modeling head only ever takes a vector as input — you mask out whatever the masked token was and send just that vector to the masked language modeling head, which produces a sparse distribution over what would replace it. And I think the idea behind SPLADE is that you do that for each token, then average all the vocabulary distributions, and that gives you a sparse vector that represents, say, "happy", "euphoric", "ecstatic" — the kind of synonyms behind it. Do you like that kind of idea? Yeah, I like that we can step back from the dense vector limitations and go back and try to capture what sparse vectors do. I don't know if you watched the episode with Doug Turnbull, but he actually shed light on this really well by saying: hey, if you take keyword retrieval with an inverted index, you deal with probably hundreds of thousands of dimensions, if not millions, if not billions — in some of the indexes we had at least a million per term, right? That's like a million positions, most of which are zeros, because the term doesn't occur in that specific doc ID, but it occurs in a few. In dense retrieval, you sort of compress all of these down to, let's say, 256 dimensions, and inherently you lose precision, so it becomes more recall-oriented. What sparse also means — and this is a little bit like going back to ANN algorithms — is that an inverted index is basically like a hash table: I have this term, it's an O(1) lookup in the hash table, and then you leapfrog. You use this leapfrog algorithm — implemented really well in Lucene, for example — to jump over long strides of consecutive doc IDs, because you don't really need to examine them in an AND query. Let's say the query is "cat AND dog", and you know that "cat" occurs in document ID 10, say, and for "dog" you are on 3 — you can leapfrog all the way to 10; you don't need to check everything in between, because they will never occur together. For an OR query that's another story, because that's a union; but an AND query is an intersection, so you always need an intersection. You can then stop early, because you don't need 100,000 results on the screen, right? And I'm still actually curious about how you would know when to stop — because
what if you didn't find a document that is even more relevant than what you have seen so far? But that's a matter of debate, I guess. Then you start scoring them and sorting them by relevance, right? Yeah — sorry if I'm a little behind: is this referring to how you can use an inverted index to calculate the BM25 scores? So, with my document collection, if "dog" appears, I know the documents it appears in, so that when I'm calculating... Yeah, yeah, but the comparison I wanted to make to dense search — to ANN vector search — is that they differ at the level of the base data structure. First of all, you have a choice of which algorithm you want to use, but let's say we take HNSW, which is the most popular — also implemented in Weaviate, I know. When you enter the first layer, you don't know where exactly you will end up. With a hash table, I know exactly where I'm entering, and I know that I'm exactly in the right place. And you can also expand your query with synonyms; then you enter at more points in the hash table, start traversing all of them in parallel, and come up with the answer. But in dense search you need to accept the uncertainty of navigating that graph: you don't know where it will land, it has certain limitations and trade-offs, and then it will pull up some nearest neighbors, and you should probably be happy with them, because otherwise you need to do it twice and pay that price, and so on. You see what I mean? They are fundamentally different on the search side as well. Oh, like this stochastic nature of it. Yeah — and also, I read this paper called OOD-DiskANN that talks about how much distribution shift can impact the graph-based Vamana. Vamana is like HNSW, but you flatten it, so there's no longer the hierarchy of layers — it's all one layer — and then you can put it on disk and it's a little cheaper to run, I think. Yeah, it's fascinating, the whole indexing part — that's kind of the meat of this, especially from the Weaviate aspect; that's where I see it, in addition to the UX and making it very developer-friendly. Well, there are a few sides to it, because there's also the distributed database part: it's all written in Go — the concurrency control, the replication, the backups, all these kinds of things. But that approximate nearest neighbor search — and I know you have this experience; I've listened to a ton of your talks, and you introduced me to the ANN benchmarks — I see there being three levels of errors that propagate up: there are the errors from HNSW and, say, product quantization; then there are the errors from the vector representations to begin with; and then there are maybe the errors in the question-answering model. So if you wanted to do, say, Natural Questions open-domain QA, you're looking at three layers of cascading errors that are sort of unrelated to each other. Yeah, exactly — really brilliantly put. If I may summarize it: to wear this hat of the person who is creating this vector search pyramid and stuff — I'm not the only one doing this, but I keep doing it because it helps me stay comfortable in the topic: okay, I'm looking at it from this angle, and if you accept it, stay with me; if you don't, you may as well augment it, like you did earlier with some levels. You just need to accept that uncertainty, like you explained — and also the uncertainty that, in this KANNDI work, they explicitly show that in HNSW you may have unreachable nodes. They counted something like 1,000 nodes that were completely unreachable from any point in the graph — no matter how you search,
how long you search, whatever the values of your ef and M parameters are during index construction and search — you just don't reach them. And I think that's somewhat similar to inverted index search, where you have, say, one million doc IDs per term: how do you know when to stop? You may never reach the documents that you should have visited, because you deliberately decided to stop prematurely — you don't have time, you have to return the documents within nine or ten milliseconds, so you have to make trade-offs. But those postings are naturally ordered in increasing order of doc ID, right? They're not ordered by "does this document answer anything" — does this document know anything about cats, or does it just mention them in passing; does this document know anything about Twitter, does it describe Twitter, or does it just say "please contact me on Twitter, here is my Twitter handle" — complete noise. So you see what I mean: in both approaches, at the fundamental level, at the data structure level, we deal with fundamental limitations — like the law of gravity: you cannot jump off and fly to the Moon or to Mars without additional thrust and devices and stuff. So do you feel the same — does it resonate? Oh yeah. Well, firstly, thank you — you just explained that concept to me for the first time; I'm just now, live on the podcast, understanding it. But yeah, it's very cool — sorting the inverted index to prioritize documents, maybe by clicks; clicks would be the most sensible thing if it's web pages, so to say. You sort the documents, and then you could probably calculate how much time you have to search and how far that lets you go into the inverted index. Super interesting. I think it's very interesting for Weaviate, with the hybrid search and the BM25 index, because I know the inverted index has been explored — we have this neuro-symbolic search where you annotate properties. Say you have a billion sneaker images, but you've also labeled what color they are, so "red" is the color, and then you can use that to filter the search. So there's definitely been some foundation in pre-filtering and integrating these kinds of symbolic inverted indexes with HNSW — it's not the first time Weaviate has explored that. But there are definitely nuances with BM25 because of the cardinality of how many terms come with a document — I think you're splitting it at, I don't know, 300 words per property — so just the size of it. And starting to think about the sizes of things: it inspired me, when you mentioned the compression bottleneck from sparse to dense, I was thinking, okay, let's say we have 384-dimensional vectors with 32 bits per vector position — what is that, 2 to the 384 times 32? It's still a massive combinatorial space, right? Yes, exactly. And as you said — is it even the case that the model captures everything we need to capture? All of these are numbers, of course; it's a numeric representation of the model's understanding — well, "understanding" in quotes — of the objects that we index. But I guess, for me — and you're way ahead on this, I feel, with Weaviate development — what mattered to me when I was a search engineer day to day is what tools (not necessarily tools as in specific programs, but tools as in algorithms, approaches) I have to control the process. So if somebody comes up and says, hey, can you look at this query, can you debug it — first of all, explain-queries are one brilliant way of doing it, and that's where you start. But then, once you understood: aha,
there is a problem — it hits this field, or I give too much of a boost in this situation — what should I do? You start tweaking these parameters, and you have these tools in your hands. Can you do that in vector search? I don't know. I have fine-tuning as one tool, right? If CLIP stops working on these images, I can go and fine-tune it, or BERT. But what else do I have? I can tune some parameters in HNSW, or ScaNN, or something; I can make all those thousand nodes reachable, like they did in KANNDI; and I can choose disk over RAM if I want to save on cost. But what else do I have as a control to actually go and debug and fix that specific query? What has been your experience, or your thinking, on that? Yeah, I think you've named them all. I mean, I've seen the tuning of efConstruction, as you mentioned, with HNSW. And something I'm really excited about is these BEIR benchmarks — maybe I can introduce them now, because I think they help with this idea of model selection, from the user's perspective of "how can I debug my system, how do I fix my search system". The BEIR benchmark is about diverse text retrieval — it's ArguAna, NFCorpus, TREC-COVID, and so on. The difference is, instead of saying that the search ImageNet is going to be MS MARCO — which is, you know, like 10 million Bing passages and a million labeled queries; the ImageNet idea of one general source, since ImageNet is a massive collection of images labeled in a bunch of categories — instead of asking "is MS MARCO the search ImageNet", it seems like we're going for diversity with BEIR. And if we want to talk about intents and instructions further: there's an equivalent to BEIR called LoTTE — L-o-T-T-E, capital T's — so they're going with beverages, right? And then there's also MIRACL, which is for multilingual. So there are a lot of these diverse text retrieval benchmarks, and it's expanding to where you would label with the instructions as well. I don't remember the names of these datasets off the top of my head, because it's very new, but I know this paper called Task-aware Retrieval with Instructions, and I think there's another paper with a model called INSTRUCTOR — this is the idea where you also label with the intent. But anyway, let me go back to the focus on how a user debugs the search system and how they can fix it. With the BEIR benchmarks, one idea is that we could test several different models, and you could say: okay, well, I'm building a nutrition search app — I'm, like, bodybuilding.com or something — so you would look at the NFCorpus results, see the performance of the different models, and that would maybe help you take a different model off the shelf. But then, with what you're saying about fine-tuning — I suspect that fine-tuning is going to be a super powerful lever. And maybe later — there are so many topics I want to talk to you about — with this idea: I've been building a Weaviate demo of podcast search, taking the Weaviate podcast, parsing the transcriptions and putting them in there, and I'm tempted to fine-tune it and start thinking about this positive/negative construction for that. I think in general with Weaviate we use OpenAI models, Cohere models, Hugging Face models, and we're not really training the models ourselves. But it's just such an interesting thing to tune — I know Jina AI's Finetuner is extremely interesting — and I do find myself constantly pulled in that direction of wanting to train models.
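The positive/negative construction mentioned here can be sketched as a toy contrastive (triplet) objective in plain Python. This is only an illustration of the idea, not Weaviate's or Jina Finetuner's actual API; the vectors and function names are made up for the example:

```python
def dot(u, v):
    # similarity between two (already normalized) embedding vectors
    return sum(a * b for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # contrastive objective: the anchor (query) embedding should be closer
    # to the positive passage than to the negative one, by at least `margin`
    return max(0.0, margin - dot(anchor, positive) + dot(anchor, negative))

# toy normalized embeddings
query = [1.0, 0.0]
relevant = [0.8, 0.6]    # sim(query, relevant) = 0.8
irrelevant = [0.0, 1.0]  # sim(query, irrelevant) = 0.0

print(triplet_loss(query, relevant, irrelevant))  # 0.0
```

In a real fine-tuning loop this quantity would be minimized over many (query, positive, negative) triplets mined from the corpus, updating the embedding model's weights rather than fixed vectors.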
Yeah, absolutely. When we presented Muves at Berlin Buzzwords last year, we actually said we also have Muver, the component allowing you to fine-tune a model. We kind of don't have it ready for prime time, but I've been really fascinated coding a bit of it and checking how well it can work in a more generic way — because, like Finetuner, it allows you to plug in several models, and different models have different inputs, different settings to train and fine-tune, and you need to be aware of that. CLIP is kind of a two-tower model, in a way, right? You do need the text, you do need the image. But coming back to the question of what tools I have: I feel — and I feel like you agree — that fine-tuning is one tool that should be more available to the masses, more available to the users, in a way that they are aware of this tool and know how best to use it, and also the pitfalls they may fall into. And I think this is what you brilliantly described, like a year ago, in the context of computer vision: data augmentation. It's one thing that you can feed in some manual examples, but how far can you go? Like in your basketball example: you've been manually labeling some examples, and you run out of patience, in a way. Okay, you can hire people to do that, but is that scalable? Probably not. And also, new trends come up. If you take a business specifically working on e-commerce, or, I don't know, full-text document search, things come up every week, maybe — say, Tesla releasing the Cybertruck, and you don't have it in the model. Like in your example — what was it, with the ocean? Yeah, "how to catch an Alaskan Pollock" — let's pretend that Alaskan Pollock is a new fish. With vector search you may try to find what could be the most similar object, but it may also be wrong, right? Or the distance may be so big that it doesn't make sense anymore to consider it as a candidate. So yeah, this is very interesting, and I hear that you really want to dive into the fine-tuning topic as well. Yeah — well, that idea is amazing, because there's this argument — when I interviewed Malte Pietsch, he gave me three reasons to favor the retrieve-then-read approach to large language models, and one of them was this idea that you can swap out the information to update it with new information: Cybertruck becomes a new thing, then you can put it in the context, and now the language model just has to reason across the context. But then, as you say, the embedding model doesn't know about the new thing, so the embedding model also isn't going to pick it up. So yeah, that continuous updating — one idea that I'm just incredibly excited about, and I haven't figured out how to make it work yet: the MLOps problem here is that you need to re-vectorize your dataset. So maybe the solution is that you vectorize, say, a thousand representative documents, and my hypothesis is that the proximity graph — from, I want to say, Vamana, or sort of from HNSW (I barely understand graph neural networks, let alone trying to make one hierarchical) — maybe you can use it. It's like a CycleGAN: it's very similar to image-to-image translation, or any kind of vector-space-to-vector-space translation — you input the vector, you output the change in the vector. So can you re-vectorize, like, a thousand documents and then propagate that throughout the graph, or throughout the corpus? And maybe that proximity graph has some kind of bias that facilitates the optimization task — or maybe the graph neural
network thing is too much overhead and you're better off just having like a transformer that takes into vector outputs a vector but yeah that this idea of like how do you continually update your embedding models it's fascinating right yeah yeah especially the ML ops aspect of it as you've mentioned like if if we were to insert new neighbors into the existing graph right would that change it favors something more recent or would it like break something that we didn't want to break and things but but in some sense if you think about coming back like we are still in the realm of this hybrid search topic in a way right if you look at BM25 OTF idea of approach right so if you compute so you're I so you term frequency is only dependent on this document right so that's fine it's kind of the independent of all other documents but your inverse document frequency is dependent on the whole corpus which is indexed in that chart by the way that's another like big topic which is kind of like crossing the boundary of is this just infrastructure issue in slash engineering is this kind of like research issue and it's like it's fuzzy it's it's it's it's a blend and so for that chart you're gonna have that local idea unless you build a a higher level cache which will keep track of each individual chart's idea and roll it up to the global idea and like if you look at Apache Solar I think I believe they had a country module or something implementing this where you can actually implement a global cache with IDF which will live on top of the chart and now you're coming back to MLOBS you need to make sure it never dies because if it dies you go back to like the chart level IDF and so that becomes dependent on okay I have managed to stuff stuff a lot of documents about cats in this chart so the IDF is like this and then I stuff a lot of documents on dogs here so they become like unbalanced if you if you know what I mean so they it's not a healthy mix of term statistics in your collection 
Right, and that will influence a lot of things. In some cases it's okay, but in other cases it may not work, if your query contains both concepts and they are unequally represented in your collection. Does it make sense? So you do have limitations, or, maybe I should pose it in a more positive way, research tasks. Such challenges: what should we do? I hope that in some sense dense search is pushing us to think more and more about this, and maybe some things will flow back from vector search into classical keyword retrieval, and maybe some new data structures will even emerge to tackle these things.

Yeah, I think that idea you're describing about IDF caching is super interesting. It's inspiring me to think generally about how we're trading knowledge on this, having this podcast, having this content and this communication, and how we've done our first iteration of BM25 and learned so much about the index structure. It is really interesting. I was thinking: how about SPLADE vectors, could we just update the masked language modeling head to get the new terms, would that be easier than this kind of global IDF cache, and is it more forward-thinking? And then maybe one other idea is this thing called ColBERT, which is a token-level representation approach. They call it late interaction: first you do the standard vector search, but then you keep the token vectors and re-score with those. And they've had efficiency improvements on that. They've recently published a paper, I know Christopher Potts and Omar Khattab are on it, sorry, I don't know the full author list, I'll do my best to give everyone credit, but in this paper they describe the original ColBERT as something like a 154-gigabyte index compared to around one gigabyte with other methods. So yeah, efficient indexing. I'm definitely rambling, but it is a big thing to unpack. There's so much depth to this, and that's what makes working in this field so exciting: there's so much opportunity, so much to explore.

Yeah, and so much unsolved as well. I don't know if you wanted to continue a thought. Oh no, sorry, I mean, we are branching out, but one thing you just reminded me of. Maybe I should start writing a book or something, because the moment I remember things like this I should write a chapter, keep adding, and then publish it; maybe you can be my co-author or something. I was thinking, it was maybe ten years ago at Berlin Buzzwords, there was a presentation by one of the engineers at Twitter. I don't know if he's still at Twitter, and I forgot his name; I remember he was German, working out of San Francisco. Coming back to that issue with sorted document IDs: what they did at Twitter, first of all, the scale of Twitter is such that you cannot possibly store a Lucene index on disk and then go and retrieve it, because it's just way too slow. So they moved the whole index into memory; they had to rewrite Lucene into this memory-friendly data structure. And one thing they did in particular is that, as tweets come in, each tweet is a document, it gets its unique document ID, and they would append this new document ID to the end of the postings list. So they would decompose the tweet into terms, and then they would know, okay, I now need to update that specific term's postings list. The postings list is just an array of doc IDs, so they would put that tweet's doc ID at the end, and as a new searcher comes in searching tweets, they would read backwards from the end; they wouldn't read from the beginning. So basically they encoded the temporal nature of tweets, and of what end users want: to search and view the tweets which are the freshest. I don't know if you are a heavy user of Twitter; when I log in and check my timeline, usually I see something super fresh and then I keep scrolling. But, not to diss Twitter, it's a nightmare to search on Twitter: when I search for something I know existed a week ago, there is no way for me to find it unless I know the exact tweet ID. At some point I was even indexing tweets, actually direct messages I had with a few people, in Solr, and then searching them there, because it was way faster than searching them on Twitter. If you have 5,000 direct messages, scrolling through them will take half a day, because they keep loading and loading. So what I'm trying to say is that they optimized the data structure for the nature of Twitter's usage, in such a way that they bias toward the recent tweets, and they don't care if you have to spend a day retrieving a super old tweet; that's a minority use case for them. The majority of users, 99% of users, will only want to see and consume the latest thing. In some sense this is the effect of optimizing to the usage, like what you said: we could optimize SPLADE, or a similar sparse LM, to learn that latest bit, since there is a high chance of it being retrieved as well, so we might as well bias the system toward that. But then, of course, there is the catastrophic forgetting thing and stuff like that.
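A minimal sketch of that append-and-read-backwards scheme, as I understood it from the talk (the class and method names are made up for illustration; the real in-memory Twitter index is of course far more involved):

```python
from collections import defaultdict

# Doc IDs are assigned in arrival order and appended to each term's
# postings list; search walks the list backwards so the freshest docs
# come out first, allowing early termination after k hits.

class RecencyIndex:
    def __init__(self):
        self.postings = defaultdict(list)  # term -> [doc_id, ...] ascending
        self.next_id = 0

    def add(self, text: str) -> int:
        doc_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)  # append at the end
        return doc_id

    def search(self, term: str, k: int = 3):
        # Read backwards: newest doc IDs first, stop after k results.
        return self.postings[term.lower()][::-1][:k]

idx = RecencyIndex()
for tweet in ["cats are great", "dogs bark", "cats sleep a lot",
              "more cats content", "even more cats"]:
    idx.add(tweet)

print(idx.search("cats"))  # → [4, 3, 2]: the three freshest "cats" docs
```

Note how old documents (like doc 0 here) are simply never reached unless you keep scanning, which is exactly the bias toward recency described above.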
Yeah, it's not an easy problem to continually update the MLM head either. It would maybe be worth adding that this MLM head in SPLADE doesn't need to be a hundred billion parameters; maybe a billion would be good, but it doesn't need to be a hundred billion. That's such a fascinating nugget of system design you just shared with the Twitter thing. And it's really interesting, I've seen this other company called Perplexity AI, which Aravind Srinivas, I think, is the founder and CEO of. It's cool because he worked on CURL with Pieter Abbeel, contrastive representation learning for robotics, where they're doing the same kind of vector optimization idea to learn a state space for robotic control, so I think it's really cool that now he's working in the search space too. They have this other approach which is something like natural language to SQL, and I'm getting a little off topic, but it's kind of related to Twitter: it's about putting tweets into data stores and then parsing natural language queries into SQL. So that's another idea, I guess: you would parse the query. What do you think about that idea, where you take the query and turn it into a SQL query?

Yes, yes, I know what you mean. It's very similar; I think deepset did that, or maybe it's the opposite, I'm not sure. If you have a table with fields and rows, let's say a list of mountains with their heights and so on, you can actually ask a question like "what is the tallest mountain in Europe or Asia?", turn that natural language query into a SQL command, and say: select mountains from this mountains table, order by height descending. I like this idea, and in fact, first of all, this is already doable; I think deepset does that in Haystack. But I also came across this idea during my PhD research. The problem there, I believe, was that there were engineers working on building aircraft, and they had to read a ton of manuals; but once you've read the manual, you still need to go and look up a specific number somewhere in a database. So basically they do a multiple-hop approach, and that can take forever: first you need to crunch through a ton of text material and somehow summarize it, and then, okay, now I need to go and look up that number in the database. But what if you could ask a natural language question to the manuals, then convert that into a SQL command which would know to go and look up that specific database table and give you the answer? The manual doesn't contain the number, but it has instructions for how to find it, and you would convert that, through this meta language, into SQL, and get the answer. This was pre-dense-retrieval era, obviously, but I still feel it has merit.

Well, I guess two things. First, there's this problem where you search the airline manual for some specific detail and it's in result seven; it almost got it, it's not outside the top 100, but it's seven. For that problem, I think this GPT Index recursive summarization, or create-and-refine summarization, will be the solution. And then, coming back to this idea of natural language to SQL and structured versus unstructured data: on the other end, you can also parse tables into text, and I've seen that done too, there's wiki-tables-to-text work. Me personally, my favorite application is scientific literature mining, searching through scientific papers, so you could parse out the results tables and turn them into natural language. And there are so many fascinating things. With a knowledge graph, the idea is that if I have "Dmitry Kan hosts the Vector Podcast, is a product manager at TomTom", with a knowledge graph I compress the representation of all these facts into one structure, compared to having a set of sentences. And maybe, if I can plug something I've done: I have a paper that will be published pretty soon, from my Florida Atlantic University PhD, an interdisciplinary team with the College of Nursing and a local healthcare system. We have electronic health records that describe COVID-19 patients, and we're trying to predict survival outcome, treatment forecasting, prognosis, all that kind of stuff. The thing we explored in this paper is: let's switch from the structured tabular data to parsing it into natural language text, turning it into clinical narratives, or doing this thing where you write "if feature name equals value, then label". There's a paper from the University of Wisconsin called language-interfaced fine-tuning where they do that same idea, but on the UCI machine learning repository data sets. So, I know I've taken us on quite a walk here.

I think it's cool. I'm sure listeners will be like, what? But what I've heard from my listeners, for example, is that they actually use these episodes as educational material. That's why, if we can stuff in as many links to papers and your work as possible, they can go and study it. So go, go, go.

To summarize, I guess the question is: how are we thinking about structured and unstructured data in deep learning systems? You could parse the structure into unstructured text, and then the transfer learning is really easy, right?
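The mountains-and-SQL example from a bit earlier can be sketched end to end. Here the "translation" step is just a hard-coded template standing in for what an actual natural-language-to-SQL model would produce, and the table schema and data are invented for illustration:

```python
import sqlite3

# A tiny in-memory table of mountains, as in the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mountains (name TEXT, continent TEXT, height_m INT)")
conn.executemany("INSERT INTO mountains VALUES (?, ?, ?)", [
    ("Everest", "Asia", 8849),
    ("Elbrus", "Europe", 5642),
    ("Mont Blanc", "Europe", 4808),
])

def question_to_sql(continent: str):
    # Stand-in for the learned translation of
    # "What is the tallest mountain in <continent>?"
    sql = ("SELECT name FROM mountains WHERE continent = ? "
           "ORDER BY height_m DESC LIMIT 1")
    return sql, (continent,)

sql, params = question_to_sql("Europe")
answer = conn.execute(sql, params).fetchone()[0]
print(answer)  # → Elbrus
```

The point of the pattern is exactly what's described above: the text (or manual) tells you how to find the number, and the structured store, queried via generated SQL, actually holds it.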
Yeah, yes, or you can keep the structure and then maybe learn a better representation thanks to the structure. And with that question, my interest has been heavily in these causal DAGs, this idea of creating structured causal relationships between variables. I still have no idea how you can take Wikipedia text and turn it into a causal diagram, but I have an idea, and it comes back to this AGI versus superintelligence idea: if I have a superintelligence and it's reading the search literature, I want it to have some kind of causal diagram of our current model of search. It has some model of how BM25 indexes, the limitations of that, the SPLADE representation, this MLOps problem; it has some structured representation of all these problems, such that when the new batch of arXiv papers or tweets or experiments comes in, however the news reaches it, it looks at its causal diagram and says: this violated this claim of mine. Because that's the thing: you see a paper like "autoregressive models as search engines", or, what's the name of that one, transformers as a differentiable search index; you see some title like that which violates your causal diagram of why things are the way they are, and that's what sparks your interest. So that's that particular angle of it.

Yeah, I'm mostly thinking, and I haven't explored this topic myself yet, but let's say you take a language model like BERT, which was, you could say, statically trained once on Wikipedia or news content. But the world is changing every single day, and your model isn't. So what you could do is introduce knowledge back into the model, and I'm still on the brink of exploring this; I think some recent work has discussed how you can incorporate knowledge into a language model. The way I see this, before I've even read those papers, so I may be reinventing the wheel: the language model might figure out that the question is about the president of the United States, a specific one, let's say Obama. But the question is: is Obama still the president of the United States? And now the language model is kind of handicapped; it says, well, I actually don't have the latest information. ChatGPT does that, right: "I was trained up to 2021, so I have no idea what happened in 2022, sorry, goodbye." But it could actually say: I figured out the context, I know roughly what you're asking, I know this person, I know what "president" means, I know the country, United States, but you're asking me a factual question. So what it could do is go and ask a knowledge graph, which is updated without recalculating the embeddings, which solves the MLOps problem. It's another data structure, a knowledge graph, being updated as we go. And, coming back to your point on structured query languages, in graph systems you also need to form your query in a certain way, so it forms the query in that way, traverses the graph, and checks: is Obama the president? The answer is no. It goes back, maybe to a language model or some other layer, and presents the answer to the user. That's just one thought; before even diving into this topic of incorporating knowledge into LMs, I would probably think like that.

Yeah, I love that you brought up the knowledge graph. That's kind of like GPT Index, and LangChain as well; I can't believe I haven't brought those up until now. We can talk about that more in the neural search frameworks discussion on the Weaviate podcast. But this idea of different kinds of external memory, and I don't know what's wrong with my brain today, I keep branching into completely different things.

I don't think it's wrong, I think it's the right setting; it's just not suitable for coding or something.

So I was recently talking with Shukri, who just joined Weaviate as well, about this idea of metadata re-ranking. One approach is an XGBoost re-ranker, where you take the BM25 score, the vector distance, and also symbolic features as input to the XGBoost re-ranker. So the question he raised was: do we want to store this metadata in Weaviate as well, or do we go get it from Redis or a feature store, something like that? And the knowledge graph idea connects to that, because it's like: are we going to build the knowledge graph in Weaviate, should it live in Weaviate, or should we plug Weaviate into something like Neo4j, or RelationalAI, or TigerGraph, I don't know all the RDF ontology technologies; or is it a top-level controller, like the neural search frameworks thing you're describing, something that hooks into Weaviate and hooks into Neo4j and sits at a higher level, picking between the indexes? So it's a question of what kind of technology gets built into Weaviate, and that's not even really up to me.

Exactly, but I think it's fun to brainstorm with you, because we kind of intuitively find these limitations together, and at the same time these limitations may lead to future discoveries, in engineering and in research. When I was giving this keynote at Haystack, where by the way the Weaviate folks were as well, I didn't feel bold enough to say this, but I will say it now: I feel like engineering and research are kind of indistinguishable in the amount of intellectual power you need to put in to solve things. Because it's not like it's given: if this data structure, the inverted index, is designed like this, and you do have
the issue of early termination, because you cannot waste so many CPU cycles, then, okay, without reading papers, can you go and solve it as just an engineer, so to say? No, you can't. It's super hard. You'd need to come up with something like a new vector space model, which was invented when, in the 60s or 70s? So can you come up with a competing model? It's equally hard as in research, where, okay, you know that SOTA is now this, can I beat it somehow? But it's not like you're beating SOTA just for the sake of it; maybe some people do, but I would take a stance of not doing that. I would try to solve an existing problem: I do want to surface, as you said, a more relevant document to the top, or maybe even the passage, maybe even an answer, so I keep pushing for that. Both of these require so much intelligence that they become indistinguishable in some sense. What exactly are you solving now: the MLOps problem, the inverted index data structure limitation, or how do I retrain the embeddings, how do I fine-tune the model without recomputing the embeddings, because that's way too expensive and someone has to pay the bill? Does that resonate with you? What are your thoughts on that?

Yeah, our CTO, Etienne Dilocker, has written about product engineering, on this meta level of how these decisions get made. And I think there's a book, I have a bookshelf behind me, I used to point at it in podcasts. Yes, "Ask Your Developer", the title is something like that. Well, okay, maybe I got a little off with this idea of research versus engineering. I think the scientist is metrics-oriented in a different way: for the scientist, the diversity of the tests and the data collection is more important. The engineer needs to build smoke tests, sort of, whereas the scientist needs a very rigorous data collection. That's how I see the distinction in responsibility. Does that make sense?

Yeah, it does, actually. You gave a very good distinguishing feature. What I was trying to say is that in engineering you still have a plethora of options; it's a combinatorial explosion in certain cases. There are also mundane parts in both, which we're not talking about, but they do exist. But you do have these decision points: okay, should I branch this way or that way, should I step back and rethink? But I agree, you gave a really good example: in research I care about data so much, while in engineering it's probably the quality assurance department that worries about what data we feed into the system, trying to break it, see the limits and where it breaks, what we need to fix, or whether it's stable and proven enough to release, things like that.

Yeah, and if I can stay on this a little more: I think this generalization testing, like an industry of quality assurance but for deep learning, is going to be really fascinating. When we first met, you had written this "not all vector databases are made equal" piece, and I thought that was so insightful, because you told the story of an emerging market. I really look forward to seeing the story of the emerging market around generalization testing, like with the BEIR benchmarks, that kind of thing, where you create some million-scale data set and measure NDCG, recall, precision over all these queries. I think maybe this idea of A/B testing with models is also going to be more popular. When I went to NeurIPS this year, there was this talk from Dr. Juho Kim about interaction-centric AI, and how that might differ from the first paradigm of model-centric AI, where, say, you judge an image generation model purely based on Inception Score or Fréchet Inception Distance to the feature spaces of real images; and from data-centric AI, which, I think Snorkel AI is very responsible for branding that term and making it so popular, but it's like you're really focusing on the curation of data, your language model is like MosaicML's PubMed GPT, where you have this particular data and you clean it and make it awesome. And then interaction-centric AI is a new way to evaluate models which is A/B-testing driven, or about how quickly a user can perform a task. I don't know if I've gone too far off topic.

No, I think it's exactly the topic to focus on if we are serious about putting these things out in production. You do need to provide evidence, to the stakeholders and to yourself, that this does hold water and we can release it, and that it's not going to show something indiscriminate to the users that leaves them completely puzzled. There are all these numerous examples, like Google search, when they, I think, incorporated some distilled version of BERT and it would flip the meaning: it would say "you do take this medicine" when the prescription actually says you do not take that medicine, or vice versa, because it's not sensitive to negations. So I totally agree, I'm with you on that: how do we QA the quality of the systems we release? And I think the OpenAI team pulled off a brilliant trick, in a way, when they said, hey, here is ChatGPT, go test it. They got like a million users in the first few days, because they
actually do need extra brains to go and test it in different scenarios and see where it breaks. Maybe that framing doesn't hold anymore, but yeah.

It's my understanding that's how Scale AI became the kings, with labeled data, like Mechanical Turk, and I think Surge AI is something that's been emerging too. Yeah, it's really interesting. I was wondering: you also worked on this podcast search, and you had the opinion that Whisper has some bottlenecks. I wonder if you want to tap into that a little bit.

Yeah, I'd love to tell this story. The story behind it is: Boris Power at OpenAI tweeted, when they cut the prices for the OpenAI embeddings, pointing out how cheap it would be to index a massive podcast like the Joe Rogan podcast. And I was like, hey, I have a podcast and you have the Vector Podcast. So I started doing this, where you take the audio files and put them into Whisper. I also tried Descript, which is something I like a lot; I've been using Descript for a long time for editing videos. But you still want to edit the podcast transcriptions a bit. You hear how I'm pausing right now while I'm talking? The raw transcription is not quite what you want to index, for this idea of how we create a knowledge base from these podcasts. Because in these podcasts we've covered so many topics, and it's easier to do it like this than to write it all down, and it's also very collaborative: with a podcast you get more people involved, it's a community-building thing. So yeah, that idea of creating knowledge bases out of podcasts. What would you rate your interest, on a scale of one to ten, in having a Vector Podcast knowledge base?

I mean, I would love to join the geeking-out here, because I was rewatching an episode with you and, here we go, you were exploding with knowledge, and in a way you're branching out a lot today as well, exploding with knowledge, because you read all these papers, you try things, you share Google Colabs and stuff. But how do I tap into this knowledge? It's very synchronous: there is no way to randomly jump in, hey, where did he talk about that model from Microsoft? Unless I have the timecode, I don't have a way to do that.

Yeah, and that's what inspires me so much. I want to fine-tune these models so badly, just using the turn-taking as the positive labeling.

Can you expand a bit more on that, what do you mean?

Okay, it's like: Connor says "I want to talk about turn-taking", and Dmitry says "can you expand on what that means, Connor?" That question-and-answer pair is how you'd get the positive example, potentially.

Yeah, and if you want more examples of what Connor said, you could augment with Connor's statements, his sentences.

Yeah, and I feel like the potential of it is crazy. I also think we're going to see it hooked into, say, Spotify, these big platforms that organize podcasts, and I think it'll help you discover. Because, something else: I love how you do this vector search podcast and I'm also doing a vector search podcast, and it's like, who else is out there doing maybe a recommendation podcast? It's this kind of discovery about the people, because podcasting is a very collaborative medium; you can't do it by yourself.

No, it's almost like being a stand-up comedian: anyone who is presenting needs the audience, because you simply do not generate the 3D-ness of your thoughts in the absence of people; it's very hard to do. And the same thing happens here, right now, when we exchange: I have a full sheet of these notes, and do I actually know all of these things myself? It's like it all floats in working memory. But coming back to Whisper, just to get it right: you're saying it's still a bottleneck, in your opinion? In what way?

Okay, well, I'd hate to be quoted as saying it's not good; that's not the same thing, and I do value it. But if you're creating a podcast search app, there still needs to be a little more parsing. I don't know if you need to correct it manually and then fine-tune. I've also been playing a bit more with ChatGPT, and as I've been learning about this kind of sequential prompting from GPT Index and LangChain, learning how you can get ChatGPT to maybe clean up a podcast transcription. But there's still a pretty difficult manual cleaning effort in the middle of that.

Yeah, actually, I can resonate with that. I've worked with one startup, helping them do speech-to-text, and first of all, one issue is very similar to low-resource languages in NLP: if you don't have a model trained on a lot of examples, or maybe it's been trained on TV shows and you're transcribing user speech, the topics are different, the style is different, everything is different, and so it breaks. So I was also alluding to the topic of fine-tuning there. But exactly as you said, the problem was that the output was so noisy that I had to write what I called an NLP layer, which would go and change things. For instance, if you say "25" and the model spells it out in letters, you collapse that back to a number. But sometimes it would do it
in problematic places and you're like oh no don't do that don't do it here you know so like it's just like an aftermath you know thing and you would wish that the model having enough context and knowledge about the world should do it right as it transcribes rather than you do this as as a aftermath yeah yeah exactly I'm thinking the same way and he's like it's a text layer afterwards yeah yeah exactly yeah super cool and then maybe like as we're wrapping up the podcast if you let me quickly tell you about ref to veck and sort of the pivot into recommendation and well so to start off ref to veck is about and it's about utilizing we VH data model a little more so we VH data model is designed where you have different classes so this class could be products this class could be user so you know like tables and SQL we have different data objects like the high-level ideas of designing data objects and then you have graph relations between them like user-like products so the simplest thing is that then you can represent the user as the average vector of the products that the user liked and then you can rank with you can re-rank with that or you could just search with that vector that that could be your search vector or you could have some other search like restaurants in Boston and because I live in Boston and you know like oh sorry I didn't mean to give away Boston in the query say I my my query is Italian restaurants and because it sees that Connor likes restaurant I don't know like some north and Italian restaurants one way that like it knows that I'm in Boston so it will it can personalize just using that vector to re-rank to only show me restaurants in Boston because if you show me a restaurant in Chicago it's like useless so so so that's kind of the first idea is this kind of like average the vectors to get the centroid but then there's this idea where and I learned this from talking to Martin Grootendorce about his burtopic library and I highly recommend people 
check that out it's such a cool way of visualizing vector spaces but this like HDB scan clustering so he was describing the difference between HDB scan and k-means clustering for how they produce centroids but and so HDB scan has this very cool like density clustering thing but regardless of the clustering you use I just I like HDB scan a lot but so let's say we get three centroids like I like Nike shoes a data shoes and Jordan shoes and you have these three centroids so you can use those three centroids is three average vectors from their respective clusters to re-rank with as well have some kind of diversity and results and then there is just this thinking around so so yes there that that's the recommendation pivot and then there's this idea of like top level index and I'm stealing that kind of terminology from gpt index because what gpt index does is to represent a long document you have again that tree summarization where you could say this is for obviously right and it's for and you summarize these two and then summarize one and see if this like top level index where you search through this layer first and this layer and so it's like if you're asking question like what was Barack Obama's legacy and then you have the symbolic filter of the titles of the Wikipedia pages and you have where title equals Barack Obama like that top level search will like super simplified the search space because now you're just looking in the Barack Obama article and there at all Wikipedia so I think reftivect also in the use of having constructing top level indexes by you know having document has passage has passage has passage again in the we get data model it can be it's just like I think it's a really interesting way that we're trying to use this cross-reference graph structure to move embeddings through the graph another idea and I know like doing a thousand ideas like it could be having like a graph convolutional network where okay so you have user-like product has brand okay 
Let's just make it a three-class graph like that. So you have this graph, and you need to aggregate the embeddings through the graph. So now it's like: should we just average, or should we try some kind of nonlinear graph convolutional network? The graph convolutional network could be beneficial because a graph network can handle an arbitrary number of inputs; it isn't a fixed input size like transformers, where you would zero-pad to 512 tokens, or a convolutional network; it's generally very flexible with respect to the number of inputs. So I hope that was an okay tour of Ref2Vec, and I know I tried to squeeze a lot in. It's amazing, actually, and I hope we can maybe discuss it in subsequent episodes as well, because the topic of personalization is also very interesting. For someone who says, okay, we just have these fixed vectors computed from the content, how the hell can we actually bring in the user? And this is what you've described, this is what I perceive from it. I think this is an excellent topic, and it kind of opens up opportunities for vector search to appeal to, you know, the search engine builder, and maybe some other engines, like recommendation. But I think we have a ton of material. I really love talking to you. Maybe before we close off, is there something you wanted to announce to the audience of Vector Podcast? Oh yeah, so we have toured a lot of things, but I really hope that you check out the Weaviate BEIR benchmarks repository. This is a recent effort around hybrid search, coming back to that earlier part of this long conversation; it really feels like forever ago. The hybrid search thing has been tested with the BEIR benchmarks, and there are different scales: small-scale BEIR, medium scale, larger scale. Right now there are the smaller-scale and some medium-scale ones, and right now
we're working on the backups. This is all based on, so, Weaviate 1.15 added backups, where you can, you know, back up the Weaviate instance to a file that lets you just restore the instance, so you don't need to re-import the data. It's like with FAISS indexes, how you can just read an index. So now what you can do is just load the Weaviate index. Why this is so exciting to me: I've always been really interested in Hugging Face datasets, or Papers with Code, papers with data, this organization of data. And I used to think, with Weaviate's Wikipedia demo, that it would need to be live, always hosted: you click "try it now" and then, boom, you're in the console and you can query it. But I think with this repo, where you just download the Docker file for Weaviate, it's like two lines of code: you do docker compose up and then a Python restore with the name of the dataset you want. I think that's just as easy as having an always-hosted demo. And I think the other thing, with hybrid search, another thing that excites me so much: if it's vector search only, you could argue, well, why don't I just use a FAISS index then? But because it's got BM25 plus the vector search, it's starting to offer more value in how it can help you with your information retrieval research. And in general, that's just something that is very important to me: trying to figure out how to connect with information retrieval research, and I think the BEIR benchmarks present a really exciting way to do it. I do have some ideas on how users would be interested in it, because I think the idea of the BEIR benchmarks is, maybe you look at it and you say, okay, NFCorpus or TREC-COVID or Natural Questions is very similar to the app that I'm building. But I think with ChatGPT you could probably loop through your documents and generate queries, and
those would be the gold documents for those queries, and you can do the same kind of evaluation testing where, as you mentioned, you want to see how that approximate nearest neighbor error cascades into the representation error, and see what that means for your particular problem. So I hope people check it out, and I hope people find it interesting. Yeah, that's a ton, super packed. Thanks so much. What I like in this discussion, compared to our last conversation, is that you continue to explode with knowledge, and I hope you will continue doing that. Thanks so much for your time today, Connor, and yeah, looking forward to talking more. Yeah, thank you so much, Dmitry. I feel like Vector Podcast is like the Super Bowl of search podcasts, so thank you so much. Thank you so much, Connor. Yeah, enjoy your day. Bye bye.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md b/transcripts/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md
new file mode 100644
index 0000000..8d9977c
--- /dev/null
+++ b/transcripts/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md
@@ -0,0 +1,269 @@
---
description: '

Topics:

00:00 Kick-off by Judy Zhu

01:33 Introduction by Dmitry Kan and his bio!

03:03 Daniel’s background

04:46 “Science is the difference between instinct and strategy”

07:41 Search as a personal learning experience

11:53 Why do we need Machine Learning in Search, or can we use manually curated features?

16:47 Swimming up-stream from relevancy: query / content understanding and where to start?

23:49 Rule-based vs Machine Learning approaches to Query Understanding: Pareto principle

29:05 How content understanding can significantly improve your search engine experience

32:02 Available datasets, tools and algorithms to train models for content understanding

38:20 Daniel’s take on the role of vector search in modern search engine design as the path to language of users

45:17 Mystical question of WHY: what drives Daniel in the search space today

49:50 Announcements from Daniel

51:15 Questions from the audience

Show notes:

[What is Content Understanding?. Content understanding is the foundation… | by Daniel Tunkelang | Content Understanding | Medium](https://medium.com/content-understanding/what-is-content-understanding-4da20e925974)

[Query Understanding: An Introduction | by Daniel Tunkelang | Query Understanding](https://queryunderstanding.com/introduction-c98740502103)

Science as Strategy [YouTube](https://www.youtube.com/watch?v=dftt6Yqgnuw)

Search Fundamentals course - https://corise.com/course/search-fundamentals

Search with ML course - https://corise.com/course/search-with-machine-learning

Books:

Faceted Search, by Daniel Tunkelang: https://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999

Modern Information Retrieval: The Concepts and Technology Behind Search, by Ricardo Baeza-Yates: https://www.amazon.com/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910/ref=sr11?qid=1653144684&refinements=p_27%3ARicardo+Baeza-Yates&s=books&sr=1-1

Introduction to Information Retrieval, by Chris Manning: https://www.amazon.com/Introduction-Information-Retrieval-Christopher-Manning/dp/0521865719/ref=sr1fkmr0_1?crid=2GIR19OTZ8QFJ&keywords=chris+manning+information+retrieval&qid=1653144967&s=books&sprefix=chris+manning+information+retrieval%2Cstripbooks-intl-ship%2C141&sr=1-1-fkmr0

Query Understanding for Search Engines, by Yi Chang and Hongbo Deng: https://www.amazon.com/Understanding-Search-Engines-Information-Retrieval/dp/3030583333

'
image_url: https://media.rss.com/vector-podcast/20220522_070529_71d7f3ebca3858a656066fb337b207c1.jpg
pub_date: Mon, 23 May 2022 13:00:19 GMT
title: Daniel Tunkelang - Leading Search Consultant - Leveraging ML for query and content understanding
url: https://rss.com/podcasts/vector-podcast/494873
---

We can get started. So I can kick us off, even though you two are really the stars of the show. So hi everyone, welcome to our fireside chat with Dmitry Kan and Daniel Tunkelang. This fireside chat is on search and all the interesting topics that Dmitry and Daniel will talk about. And it's a series that's hosted by CoRise. CoRise, just to do my plug here: we're a new education platform that transforms the way professionals build technical, high-demand skills through top industry leaders such as Daniel and collective peer learning with practitioners such as Dmitry. The format of our courses is pretty innovative, because we mix live instructor sessions with real-world projects and fireside chats like these with operators who are experts in their field. And actually, I see a couple of students from both the search class and from other classes in the audience. So welcome back to you guys, and welcome to everyone else here. So with that, I'll pass it on to Dmitry. Awesome, thanks, Judy. And hello, everyone. As they usually say: hey, the Vector Podcast is here. And today I have a luminary guest, a mogul of the search world, Daniel Tunkelang, and I'm beyond excited to be talking to him and discussing his favorite topics of query understanding and content understanding. And traditionally, I will introduce myself, for the first time on the podcast. What I want to say is: I have a PhD in natural language processing; I worked on machine translation back in the days. Currently I'm in two roles: principal AI scientist with Silo AI. It's a consulting gig.
And recently I started a job as a senior product manager at the company called TomTom, which produces maps, map search and navigation. I have 16 years of experience in developing search engines for startups and multinational technology giants. Most recently, I worked on multilingual web-scale search. I also claim to be an expert in vector search engines. And I'm the host of Vector Podcast, which focuses on this tech, but also, beyond that, on search at large. I'm also blogging on Medium. And as I said, I'm beyond excited to be talking to Daniel today. As a tradition, Daniel, could you please introduce yourself to me and our audience? Sure, Dmitry, thank you for that. I'm Daniel Tunkelang, and I've been working in search for, I guess, a little bit over two decades. I started after completing my PhD, which didn't have anything to do with search or information retrieval, but was actually in network visualization. I shortly ended up teaming up with a few folks to start a company called Endeca back in 1999 that ended up focusing on e-commerce search and, to some degree, enterprise search in general, site search. I was there for 10 years as the chief scientist. And then I went to Google, where, in fact, I worked on search: in local search, part of the maps search team, as a tech lead. I had been living in New York on the East Coast, and then, ironically, I left Google to join LinkedIn, where I first ran the product data science team but ended up coming back to my first love of search. And it was at LinkedIn where I started a query understanding team and shifted my focus, which had really been more around faceted search and interaction, to query understanding. After leaving LinkedIn, I decided to go off on my own, and for the past six or seven years I've been what I like to call a high-class consultant, trying to bring search to everybody who needs it, which turns out to be a lot of people.
And then last year I discovered the wonderful folks at CoRise and started teaching these classes with my friend and colleague Grant Ingersoll. Fantastic. And I can add to that, having been a student in your course: fantastic course, I've learned a lot. And yeah, I'm a happy owner of the certificate as well, so I can prove to future employers that I have passed it. And actually, I watched one presentation you gave at a CIO summit 10 years ago, and one key phrase, or suggestion, that I took away from it: you said science is the difference between instinct and strategy. And I wanted to ask you to speak a little bit to the role of science in everyday search engine development and research. Do you continue to view it that way, 10 years forward? I do. It's funny, because anybody who watches that, well, that is probably the only recording of me wearing a suit on any sort of video, the science-as-strategy talk. What I was thinking at the time: you know, as a data scientist, a big part of my job was getting people to use the scientific method, whether that was A/B testing or having, you know, clear falsifiable hypotheses and so forth. Now, don't get me wrong, instincts matter a lot. For example, if you go to a search engine and you're not happy with what you're seeing, your instincts are probably right: there's probably something wrong. But if you say, oh, I'm not seeing the results I like, I'm going to add something simple, I'm going to turn up one of these knobs to see what I get, then, don't get me wrong, you'll sometimes get improvements; instincts are not useless. But you won't have a way of being certain that you're getting improvements. And you may sometimes get improvements that happen to work in that particular moment, at that particular time, but which you can't explain or sustain.
And so science is about using techniques like what other people might call randomized controlled trials, but we like to call A/B tests. Science imposes a certain amount of discipline, and it keeps you honest, which I do think is the difference between running on instincts that may or may not work and being able to pursue a strategy where you not only can see whether or how things work, you can measure it as well and repeat what you do. So I still hold to that in this regard. Yeah, this is fantastic. And I highly recommend watching that video: even though it was for top executives, there are a lot of logical elements in it that you can apply in day-to-day work. And yeah, I remember also one quote from the book How Google Works: if we argue and we have data, let's look at the data; but if we go by opinions, let's go with mine. And it was said by a vice president of that area, so basically he's a HiPPO, so to speak, at the top of that ladder. So why not actually follow the hierarchy there? But yeah, I agree that if you have data, look at it; if you don't, try to collect it. Yeah. Yeah, I mean, indeed, data is the equalizer; for those of us who are not CEOs, it's how we get things done. Yeah, absolutely. By the way, I wanted to also say a couple of words on logistics: please send your questions in the chat, and we will handle them at the end of this session. Yeah, and 10 years forward, I've read your message on LinkedIn where you said, in a slightly sad tone, that not everyone shares my passion for search, but I suspect that many would be more excited about search if they understood it better. Was it just a moment of despair, or was it a moment where you thought: okay, I need to approach it differently? I can keep blogging about query understanding and content understanding, but how can I actually open the doors to the minds of new people, potentially students, in this field?
What was going through your mind when you wrote that? So, I'm an optimist. If two years of a pandemic and now a global crisis can't get me down, I'm certainly not going to despair just because not enough people are excited about search. But I have seen that, you know, our technology industry tends to have certain kinds of fads. In fact, back in the 90s everybody was excited about search; for those of you old enough to remember, Google was not the first major search engine; we were using Yahoo and so forth. And then, after Google came on the scene, many people said, oh, search is done. Now, I happened to not be one of those people, because I was at a startup, which actually also started in 1999, working on search, and we said, no, search isn't done at all. I mean, we were trying to help e-commerce companies, and we saw that there's a lot to do in search. Now you might think that 20 years later search would finally be done, but interestingly there are still so many opportunities, in fact using some of the latest developments in machine learning. But what I've seen is that people don't necessarily gravitate to search as an exciting problem. They're excited about voice recognition, about what they perceive as AI in general, which they may see as question answering, which at the heart of it has lots to do with search as well. But they don't realize that, you know, that humble little search box in which they are typing, and everything going on behind it, is still an extremely exciting area of development. I think it's because it does look so simple that they don't imagine what you can do: change the size of the search box, change the font? They don't see what's actually going on behind it. So my hope is that when people see the complexity involved, and the way in which search is amenable to the very techniques that they are excited about, they'll then say, oh wow, this is great.
And then, on top of everything else, it's a place where I can have a huge impact, a notable impact, on the way that people interact with machines. So no, no despair; just maybe sometimes a little bit of sadness that people don't share my excitement enough. Yeah, absolutely. I mean, search is a fantastic field; if you're not there, consider entering, or at least studying and evaluating it, but it's very deep. I remember actually, myself, like 20 years ago, still at university, I was asking a friend of mine: how does a search engine work? He was majoring in information retrieval back then; I knew nothing about the field. And he said, well, we use an inverted index; that's how we represent the documents. But then I was still not satisfied. I asked him: hey, how can a search engine actually know what I want to find when I don't know myself? Like, if I start typing something in the keyword box, it's like a chicken-and-egg problem: it means that I know something already about the subject, right? But what if I know nothing about it? And so in my mind I started hypothesizing that maybe we can build a search engine which will kind of refine the query. I didn't know how to do it, but I was just, you know, thinking in my mind that it's possible. And now, so many years fast-forward, we apply machine learning to search.
And as you said, many people still are working with completely hand-to-systems. +I think that machine learning plays a few roles though in, I think what you could say is modernize in search, but what we're making it do things you really couldn't do before. +So for one thing, when you're doing all of these hand-to-tune boosts, right, which you're typically saying, oh, I'm going to have a bunch of factors that affect me. I'll change the weights on those. I'll see what can improve. +Effectively, you're performing an optimization problem, but you're doing it by hand, or you're saying, let me go a little bit in this direction, a little bit in that direction, let me see what it can do. +Well, the main technique that machine learning does is optimization only that it does so by formalizing the objective that you're optimizing for, and then using mathematical techniques, like variations of gradient descent, to look for the place that is optimal. +Well, it would be silly for you to do things by hand that there is an existing architecture to do that. But the other thing is that when you do things by hand, you're very unlikely to be able to move too many knobs by hand. +Three or four factors, you can handle a hundred factors, almost certainly not. +And that tends to be the big win of machine learning is that because of the win that it automates what you would otherwise do by hand, it allows you to do things at a much larger scale and yet keep your weights about you. And that's usually what people do when they're concerned about ranking. +But the other the other way to break through in machine learning today is that in areas like query and content understanding, machine learning often allows you to solve problems. 
you'd have been very unlikely to solve by hand: to recognize when a query comes in that it's in a particular category, or that particular entities, people, brands and so forth are mentioned in that query; or to figure out what a piece of content is about and get a representation that you can then use to inform what should be returned. And that's an area where it's not new that you can use machine learning, but the ability of systems to do so using the more modern AI techniques of word embeddings and the like has dramatically changed the quality. It's a breakthrough that I can only compare to, you know, speech recognition: it has been around for a while, but if you used speech recognition systems in the 1980s or 1990s, you thought they were very cute, but they would never be useful. And today we take for granted that they work well enough that people who had no other option could actually manage with them. And I would say that machine learning in search has now reached a point where it would be silly not to use it if you have the data to do so. Yeah, absolutely, especially if you sit on a pile of data, right, as they used to say in the age of big data. But of course, there are still niche areas: let's say you launch a startup, so you don't have clicks. Maybe you can measure clicks some way, but let's say you don't have clicks; you don't have any user feedback yet. I think at that point you could still apply machine learning, right? Like deep learning; hopefully we'll get there. But before that: I think when people talk about ML in a search context, they quite often mean, you know, machine-learning-based relevancy, learning to rank; you learn a function which will rank your documents. But in a way, what can ranking find by itself? Not much, if the data is not there, if it's not categorized. So what's your view on where machine learning can bring a lot of benefit upstream?
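An aside on Daniel's optimization point above: hand-tuning boost weights is implicitly solving the same problem that gradient descent solves explicitly, by formalizing the objective and following its gradient. A minimal sketch with invented features and relevance labels (not production learning-to-rank, just the formalize-the-objective idea):

```python
import numpy as np

# Each row: made-up (text match, recency, popularity) scores for a
# query-document pair; y is an invented relevance label.
# Hand-tuning would mean nudging the weights w by intuition.
X = np.array([[0.9, 0.1, 0.3],
              [0.2, 0.8, 0.5],
              [0.7, 0.6, 0.1],
              [0.1, 0.2, 0.9]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Gradient descent formalizes the objective (mean squared error here)
# and follows its gradient instead of guessing weight changes.
w = np.zeros(3)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # d(MSE)/dw
    w -= 0.1 * grad

pred = X @ w
# The learned weights now score the relevant pairs above the irrelevant ones.
```

With three knobs this is easy to do by hand; the point of the automated version is that the same loop works unchanged with a hundred factors.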
And I think you touched on it: query understanding and content understanding. Can you drill a little bit more into that, especially from the point of view of how you approach the task, where you start? Sure. Well, as you know, anybody listening to this has already heard what I have to say: I like ranking; some of my best friends do ranking, and even I do it occasionally. But I think that ranking has been over-emphasized in search in general, and in the application of machine learning to search in particular. If we think of what ranking does: we have a search query, we have a potential result, and we score it with a function that will then determine the order in which we present it, assuming that that result is a candidate to be considered. And if you go back to the original models of information retrieval, they act as if every document in your corpus could be scored; the only reason you don't do that is that you can't afford to, it's too expensive. But that's the gist of it: a scoring function on the query and document, in some cases even on the user. Now, that's a lot of input into a function. A quite different way that you might approach the problem is to say: I have a query; I'm going to try to represent that query as usefully as possible without looking at any documents first. Also, before I even see any queries, I have documents; I'm going to try to represent them as well as possible before I see any queries, or at the very least before I see the particular query that someone's going to make. I might use the history of queries to, you know, inform my approach. So now we've factored this original scoring problem, which said "throw everything at a scoring function", and said: no, no, no, we're first going to understand the query in a representation that distills it to its essence. We have already understood the content, the documents, in a way that distills them to their essence.
And now, when we decide what to retrieve, we're going to use those representations that have already done some of the work for us. In the case of the documents, we did it offline. In the case of the query, we have to wait till we see it, unless it's maybe a head query we've seen before; but we do it once for the query, not once for every result. And we can use that to say, roughly speaking: if we have represented the query and the content in a similar space, retrieval, that is, deciding what documents we should look at, is much more of a matching problem. In fact, if the space uses a similar schema, for example if the query is mapped to a category or a set of categories, and the documents have been categorized using the same set, we can say: well, we should probably retrieve documents from those categories; or we may have other structured data we can use that way. And what is happening is that a lot of the work that ranking was doing, which was essentially trying to say "should I retrieve this document at all, is this document relevant enough to the query that it should be in consideration", this query-dependent aspect of ranking, can be solved by saying, basically: once I have the query and the content represented in the same space, do they match? Is there overlap there? So we're basically changing the first-order bits, the higher-order bits, of ranking into more of a classification problem. In my experience, once we have the query and the content in the same space, figuring out what content matches the general gist of the query should be an easier problem. And then ranking is more: oh well, a lot of things matched, but some are better than others. And that's, of course, where machine learning comes in: machine learning is how we get those representations. It's how we turn the query into a more useful representation, how we turn the content into a more useful representation.
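The shared-space matching Daniel describes can be sketched in a few lines. The documents, categories, and the toy query classifier below are all invented for illustration; a real system would use a trained classifier and an index rather than a dictionary scan:

```python
# Offline content understanding: each document already carries categories.
docs = {
    "d1": {"categories": {"children", "animation"}, "popularity": 0.9},
    "d2": {"categories": {"documentary"},           "popularity": 0.8},
    "d3": {"categories": {"children", "comedy"},    "popularity": 0.4},
}

def understand_query(query):
    """Stand-in for a learned query classifier mapping text to categories."""
    lookup = {"movies for kids": {"children"}}
    return lookup.get(query, set())

def search(query):
    q_cats = understand_query(query)
    # Retrieval becomes matching in the shared category space,
    # not a scoring function evaluated over every document.
    candidates = [d for d, m in docs.items() if m["categories"] & q_cats]
    # Ranking then only orders what already matched.
    return sorted(candidates, key=lambda d: -docs[d]["popularity"])

print(search("movies for kids"))   # ['d1', 'd3']: both match; d1 ranks first
```

The query-dependent "should this document be considered at all" work happens in the set intersection; the scoring function only has to order the survivors.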
But by treating those things separately, it allows us to be a lot more directed than we are with ranking, and in my experience it gets far better results. Yeah, I remember, in the Search with ML course by you and Grant Ingersoll, you gave that brilliant example that stuck with me. I believe it was Best Buy implementing, correct me if I'm wrong, query understanding functionality, where if you typed iPhone or some product that they didn't have at the moment, they would actually use query understanding to tell you that they don't have the product; they would not even go to search. I mean, the example was B&H Photo with iPhones, but an example that's even more fun is Netflix, where I don't have much insight as to the channels there, they haven't been one of my clients, but Netflix has, from its perspective, a limited catalog. They don't, for example, get movies from folks like Disney, which is quite protective of its catalog. But Netflix knows when you're looking for a Disney children's movie, and when you do that, rather than trying to simply match the text of your query, they show some of their own children's movies. So it's an example where you clearly split out the work of understanding the searcher's intent, query understanding, from simply retrieving and scoring results, because of that improved representation: they know you're looking for a children's movie, and "they have children's movies" is far more powerful than the traditional ways in which you might score and rank results. Yeah, I'm personally fascinated by the field of query understanding, having implemented it with my team at web scale. We worked on a job search engine, a vertical search engine powered by a web-scale search engine. First of all, it was multilingual. Second, you have to figure out the semantic subtleties when users type "opening hours" or "working hours", whatever way they phrase it in their language.
And that's not a query you want to execute in job search. But if they typed "jobs in IT in London", that's okay, and you can use that and pass it through the filter. So query understanding kind of worked as a filter, in a way; or a classifier, you could say, right? But then it would also give us this rich semantics that we could apply in fields. Let's say it's London, as a city: you don't want to search that word just in the description; you can apply it to the city field of the document. And I mean, back then we applied a rule-based approach, and it worked fine, but it was maybe very conservative in a way, right? Especially for languages like Turkish, where they have the word "iş", which is a, you know, semantically overloaded word used in different contexts: it may mean a bank, it may mean a job search, and a number of other meanings. But would you advocate for using machine learning in query understanding? I know, by the way, you wrote a brilliant series of blog posts on Medium drilling into so many subtopics of query understanding, and I especially liked that you can actually utilize it in autocomplete; I was actually fascinated that you connected those two, and I highly recommend everyone take a look at that. So what's your take on rule-based versus machine learning? Would you start with rule-based, and then, as you learn, go to machine learning, or would you start head-on with machine learning? So, I certainly see a lot of value for machine learning in query understanding, for some of the reasons I was saying before. But with that said, I think that there's often a sort of Pareto principle, an 80/20, in search problems. And when I go to people, especially folks in small organizations, I tell them: look, let's use your job search example. And so, I spent a few years on that and it's very close to my heart.
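A toy version of the rule-based query understanding Dmitry describes for the job vertical, routing "jobs in IT in London" into fielded search and rejecting non-job intents. The vocabularies, field names, and tokenization are all invented, and a real system would need much more (normalization, multiword entities, per-language rules for cases like Turkish "iş"):

```python
# Illustrative vocabularies; a real system would have curated gazetteers.
CITIES = {"london", "helsinki", "berlin"}
JOB_FIELDS = {"it", "nursing", "sales"}
NOT_JOB_INTENTS = {"opening hours", "working hours"}

def understand(query):
    q = query.lower()
    if any(phrase in q for phrase in NOT_JOB_INTENTS):
        return {"intent": "other"}   # the filter role: don't run this as a job search
    parsed = {"intent": "job_search", "filters": {}}
    for token in q.replace(",", " ").split():
        if token in CITIES:
            parsed["filters"]["city"] = token       # fielded search on the city field
        elif token in JOB_FIELDS:
            parsed["filters"]["category"] = token
    return parsed

print(understand("jobs in IT in London"))
print(understand("opening hours"))
```

This is exactly the kind of hand-built classifier that works well on the head of the query distribution and then hits the scaling wall Daniel describes next.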
You need to know: is somebody, for example, looking for a job title, or are they looking for, say in LinkedIn's case, someone's name; or, in the case of language, maybe you're not sure what language they're searching in and you're trying to do language identification. You could start by looking at the most common queries that you see, and then just having people (your own employees, a hired crowd, what have you) label those. You're not going to label more than hundreds or maybe thousands of queries that way; at hundreds of thousands it starts getting a bit silly. But you can do that, and you can say: okay, maybe I have now handled 20 or 30% of my traffic that way. It's not uncommon that with 10,000 queries you easily get to that. And you can say: great, now that I've done that, now that I know that this person is looking for a job title, that the language is Turkish, or what have you, what would I do with that? Well, I'm going to have a particular search experience in mind. If I know that it's about jobs, I won't look in people, or I won't look in content posts. If I know what language it is, I'm going to grab from that repository, and so forth. And you can learn what you would do there. Now, this won't scale into the tail of your distribution, but you can learn what happens with that sort of experience. And that's actually really important, because sometimes you don't know how people will react until you show it to them. There's a bit of a chicken-and-egg in these things, as to what is the quality of your data but also what is the experience. But once you've decided, okay, I am going to pursue this sort of experience: frankly, without machine learning you're never going to scale it. You're not going to label everything, and a rule-based approach to try to figure out what language something is in, or what category something is in, simply isn't going to scale.
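The 80/20 observation above is easy to quantify on a query log: sort queries by frequency and measure how much traffic the head covers. A sketch with synthetic traffic (the queries and counts are made up):

```python
from collections import Counter

# Synthetic query log: a few head queries dominate a long tail.
log = (["software engineer london"] * 400
       + ["data scientist"] * 250
       + ["nurse"] * 150
       + [f"rare query {i}" for i in range(200)])

counts = Counter(log)
total = len(log)

def coverage(top_n):
    """Share of all traffic covered by labeling only the top_n queries."""
    return sum(c for _, c in counts.most_common(top_n)) / total

print(round(coverage(3), 2))   # labeling just 3 queries covers 0.8 of traffic
```

On real logs the head is rarely this extreme, but the shape is the same: hand-labeling a few hundred head queries buys disproportionate coverage, and the remaining tail is where machine learning has to take over.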
+In the case of, for example, language, it's not like you're going to just build a dictionary, because you'll have cognates between the languages. Or in the case of job titles: +by the time you get to "Chief Vector Search Ninja", you're going to be in a bit of trouble recognizing that as someone's title. So that's the point at which collecting training data becomes critical. +One of the nice things is, if you've done some of the work by hand, that can actually be how you bootstrap training data for these approaches, especially if you aren't in a position to do so using feedback from your own search application. Yeah, absolutely. A quick shout out to our audience. +I think, if I'm reading it right, just a second, Andre has a poll on how many people in this call are using hand-tuned boosts versus machine learning. I'm really interested to hear or read these opinions. Maybe you can comment on it. +Yeah, and on the other hand, you've been advocating a lot for drilling into your content. And of course, some companies do this one way or another. But can you illuminate us on what you can do on the content understanding side? Sure. +So if you think about it, if all you had was query understanding, you might be able to figure out exactly what the searcher wants, but actually not be able to find it. +So content understanding is really what you're doing in order to represent content in your index in the best way to make it retrievable. So certainly, it's a great place to do things like categorization. +This is especially true, say, if you have a marketplace, or if you have a lot of unstructured content where you don't necessarily know what the content is about. It's also a good place to extract entities, terminology, even potentially determine the terminology that's used for representing it. +I mean, imagine, for example, you have a collection of research papers. You can discover the useful words or phrases that tend to carry meaning.
+You can relate them to one another by putting them in a vector space, where the distance between the vectors tells you how similar they are, and you can cluster those. +So in general, doing things that involve either classification or essentially annotation, recognizing entities or terms in those, allows you to enrich the way you index the content. +You can also figure out when documents are similar to one another, because when you have these vector representations, you can take the entire document or part of the document and do that. +And that can be useful for saying, oh, if you're interested in this document, you might be interested in these other ones, or maybe you're interested in these other ones, but they're more recent. +And that allows you to combine what content is about with other factors like its recency, popularity, other people that looked at it. +And you see this often not just for search, but specifically for being an engine for recommendations that are triggered from discovering something through an initial search. So all of these things basically make the content more retrievable, but also more explorable. Yeah, absolutely. +I can also add to that: in some settings, specifically in financial search, I happen to work at a company called AlphaSense. +You may end up in a situation where you cannot actually use the hints from the users, right? So for instance, if you do auto-suggest and you extract themes from queries, you could actually do that. I believe Google does that to some extent. +But in a financial setting, you cannot do this, because banks will prohibit sharing their searches with their rivals, right? You don't want to do that, ever. +And so at that point, you do go deeper into content understanding, and you start extracting stable themes, and maybe over time you can also extract trends as they show up. And that might be one way to combat the issue of not being able to use queries to influence your model.
+But yeah, you might have another setting. I'm curious to hear from the audience as well what kind of setting you guys have. And my next question would be on what the available data sets are. +Let's say I want to practice query understanding or content understanding at home in my lab. What are the available data sets, tools and algorithms that you can recommend that will allow us to train these models for both of these directions, query and content understanding? +So as those of you who took the class know, we've been using an e-commerce data set from Best Buy for teaching. +It's a nice data set. It's a little bit old, but it has the virtue that it has a bunch of structured data, queries, and some click data as well. And that's proven useful. You can get that from Kaggle, as they've made it available freely. +And indeed, Kaggle, which is at this point a subsidiary of Google but maintains an independent brand, is a great place to get data sets. +This one from Best Buy, I think, is probably the best all-around one for exploring particularly query understanding, and to some extent content understanding. There are certainly other data sets available. +You can, for example, grab dumps of data from Wikipedia, which are fascinating; Wikipedia is perhaps the best overall data set in the world. But bear in mind that it's a bit sprawling, and that they don't supply much in the way of queries or feedback. +And you'll have to do a little bit of work with that. There's a data set called MS MARCO that's been very popular with essentially the deep learning crowd, because it's an interesting place for doing question answering. +So I think a lot of the question becomes: what is the problem that you want to work on? +And I would say, for those of you who are already working in search in some capacity, or at least have access to data, you should really consider trying to use your own data, because usually the thing that is hardest to get in public data sets is user behavior.
+For perfectly understandable reasons, companies are not eager to share what their users have done, either because of the privacy constraints for their users or the competitive nature of that data. +So even if you're able to find catalog data, which you could, if it's structured, use to learn content understanding techniques. For query understanding, the most powerful thing you're going to use is a collection of queries and labels for what those queries mean. +And if you get a collection of data without even having what the queries are, let alone the labels, it's a little bit more difficult. Yeah, absolutely. And can you also share a bit on the tooling side, or maybe algorithms? Sure. +So, you know, different problems obviously call for different tools. On the ranking side, one of the most popular approaches that's still in use today is XGBoost, which you can get online easily enough. +And it's also been integrated with, I think at this point, most of the major, certainly Lucene-based, engines: Solr, Elasticsearch, and so forth. +If you're interested in classifying text or doing unsupervised learning on text, these days, frankly, I would go directly to embedding-based models. +And you can use tools like BERT, or maybe old-school fastText, which you can get online. You can download those, you can install them on your laptop, you can even get pre-trained models for hundreds of languages. And from that, it should be easy enough. +You can just walk through the tutorials where you take a bunch of labeled text examples. In the case of fastText, the example is Stack Exchange cooking questions with their labels. And in half an hour, you find that you're actually doing content classification from this. +And in the course we do this sort of thing with the Best Buy data as well. It's amazingly easy to see that these sorts of tools will start to give you reasonable-looking answers.
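As a taste of how little code supervised text classification takes, here is a sketch of a bag-of-words Naive Bayes classifier in plain Python. This is not fastText itself, which is what you would normally reach for, but the same label-from-text idea, on a few made-up Best Buy-style examples:

```python
import math
from collections import Counter, defaultdict

# Tiny labeled training set (hypothetical product-category examples).
train = [
    ("apple iphone 13 case", "phones"),
    ("samsung galaxy screen protector", "phones"),
    ("sony 65 inch oled tv", "tvs"),
    ("lg 55 inch smart tv", "tvs"),
]

# Multinomial Naive Bayes training: count words per label.
word_counts = defaultdict(Counter)
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def classify(text: str) -> str:
    """Pick the label maximizing log prior + add-one smoothed log likelihoods."""
    scores = {}
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / len(train))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("iphone charger"))  # → phones
```

With a real tool like fastText the training loop is a single library call, but the shape of the workflow, labeled texts in, a category predictor out, is exactly this.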
+Now, getting from reasonable answers to answers that you're happy with, and incorporating them into a search experience, can be the difference between an hour and a week or a month. But my hope is that by seeing how easy it is to get started with these, you get tempted enough that you say, great, but 80% isn't good enough. +I need to get myself to something I'd be willing to put in front of my customers. And to be fair, there's a little more hard work to make that happen. Yeah, absolutely. Vector search, by the way, is my favorite topic. I talk a lot about it. +And I'm super, super happy that you mentioned this now. And the gateway here to this topic, the connection with content understanding, is one of the techniques called doc2query, which essentially computes possible queries and then augments your document with them. +So you don't actually need to step into the unknown field of vector search, trying to re-engineer your search engine. You can actually keep your search engine architecture as it is. +And you just basically augment your documents in the hope of increasing coverage, and actually precision at the same time, for future queries. +So what's your take on these emerging techniques, but also what's your take on the role of vector search in general in search engine design today? Sure. So if you think about it, the doc2query approach is similar in spirit to saying, well, I'll have a known set of fields that I would assign to the document in a traditional inverted index or posting list. And indeed, the limitation of the older approaches is that they kind of force you into a limited vocabulary. +And now the query vocabulary is literally the language of users. So I think that's a great way to do things. And it handles also that documents often have a lot more variability than queries.
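The doc2query idea just described, augment each document with queries it should match, then index as usual, can be illustrated with a toy inverted index. The "generated" queries below are hand-written placeholders standing in for the output of the trained generation model that actual doc2query uses:

```python
from collections import defaultdict

docs = {
    "d1": "The HNSW graph offers logarithmic search complexity",
}
# In real doc2query these expansions come from a trained seq2seq model;
# here they are hand-written placeholders.
generated_queries = {
    "d1": ["fast approximate nearest neighbor search", "vector index algorithm"],
}

# Build an ordinary inverted index over the *augmented* document text.
index = defaultdict(set)
for doc_id, text in docs.items():
    augmented = text + " " + " ".join(generated_queries[doc_id])
    for term in augmented.lower().split():
        index[term].add(doc_id)

def search(query: str) -> set:
    """AND-match query terms against the augmented inverted index."""
    postings = [index[t] for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# "vector index" appears nowhere in the original text, only in the expansion:
print(search("vector index"))
```

The point the host makes holds here: the search engine architecture is untouched, yet the document is now findable in the users' vocabulary.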
+There's typically only so much people are going to type in a search box, but documents can be of all shapes and sizes. +So I'm certainly a fan of doing document enrichment that's query-friendly, or conversely, if you're going to do things on the query side, to think of a query actually as a bag of documents. +I think even though there have been these explicit, what we call two-tower approaches that try to sort of meet halfway, I think it's perfectly fine to say, well, we'll think of one of these things as more fundamental than the other. I think first off, it's great, +the idea that you can think of meaning as a point in a high-dimensional space and explore things around it, even though in a way it's not a new idea, right? +People have been using vectors at least as far back as techniques like TF-IDF, where the bag-of-words representation of content was simply a vector in a space where every word was a dimension. +We've gotten a little bit smarter about that over the past few decades. And certainly what we can do now with embeddings is amazing. With that said, I think that sometimes people use vectors as too much of a sledgehammer. +For example, if I do a query on a site for "cat", turning "cat" into a vector, and then turning all the documents into vectors, and then sorting them by cosine similarity, is probably overkill. How much information am I going to get out of the single token "cat"? +Figuring out whether something is a cat, as Google showed, may require a huge amount of machine learning, for example, based on tagging images. +But it's probably safe to say that at least at the query level, there's only so much nuance you're going to get out of a one-word query corresponding to an entity. +Whereas with a traditional approach, where you might... A pure vector-based approach would say, well, I'm going to take the vector for my query "cat".
I'm going to take all of the vectors for my documents, in which, I suppose, the very notion of "cat" is implied, and sort by their distance. +It probably makes sense to start by saying, maybe I could have actually, you know, either using doc2query or more traditional methods, annotated the documents in such a way that for the first pass, I could get the right things there. +In the case of queries as simple as the one I've spoken of, there may not be much I can do at that point in terms of use of vectors. Now, as the queries get longer and have more signal in them, this game changes completely. If I'm saying, I'm looking for a cat wearing a red bow tie. +Well, with a query like that, it's very unlikely that a traditional approach is going to be able to say, well, what do I do? I'm going to look for those words, some of them, some other ones. You know, is a necktie +the same as a bow tie? Would a cat in a tuxedo be better than just your typical cat pictures? And so, at that point, the game has changed, because it's not a simple binary question anymore. And identity being relaxed to similarity makes a huge difference. +And there, I think you lean much more heavily into vectors. Now, I'd say that it's still the case that doing a pure, grab-everything, nearest-neighbors search in a vector space can be computationally challenging. And it can lead you to sort of unpredictable results. +So, most people today are still doing their first pass at least by using traditional methods. But I do know folks who are increasingly trying to use vectors from the get-go, by using sort of coarse-grained vectors or various techniques to make that first pass be quick enough. +So, I think we're going in that direction.
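The bag-of-words picture referenced here, every word a dimension and similarity as the cosine between vectors, fits in a few lines. This is a plain term-frequency sketch, omitting the IDF weighting of full TF-IDF for brevity:

```python
import math
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Bag-of-words vector: each distinct word is a dimension, value = count."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

v1 = bow_vector("cat wearing a red bow tie")
v2 = bow_vector("cat in a red tie")
v3 = bow_vector("stock market report")
print(cosine(v1, v2) > cosine(v1, v3))  # → True
```

Of course, this representation still treats "necktie" and "bow tie" as unrelated dimensions; that is exactly the gap that learned embeddings close.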
I think that there's still a lot of value, both for computational efficiency and for explainability, in using, you know, traditional inverted indexing techniques, especially for the early stages of retrieval. +But either for capturing these nuances, or for, say, increasing your recall by retrieving things that you might have lost otherwise, we're seeing increasingly the use of vector search to get them. And, you know, we're doing this talk now in 2022. +I suspect in a few years the inverted index methods will become more and more confined to those cases where the data is really just simple and binary. +There'll always be certain head cases for them, but the use of vectors is only going to expand. Yeah, absolutely. +Especially, I would say, an inverted index will still be needed if you are looking for an exact phrase. Like, you don't want to say, hey, vectorize this and find the similar. No, I don't want similar. I want that exact thing. +And of course, there are other things that need to be improved in vector search. Like, a BERT model, according to one study, doesn't recognize negations, and that might actually be crucial for some search scenarios, or sentiment analysis. +And another thing is that, at this point, sparse BM25-based methods are a very strong baseline when you compare these methods across datasets, across tasks, and so across domains. So I think the future is very bright in this direction. +And I think a lot of folks are trying to combine these methods. So I'm happy that you are looking at this as well. And I believe you will be teaching about this as well. We are quite close to the top of the hour. And I'm happy to see so many questions coming in. +But I'm going to ask you my favorite question, the question of "why". It's this kind of mystical, philosophical question.
You are the most celebrated search engine professional today, one of the most, if not the most. +You've done everything there is to do in search, in my opinion. Like, when I look at your CV, you know, you even consulted for Zoom, through which we're doing this session. So that speaks volumes. And I just wanted to ask you what drives you to continue focusing on search engines, +and especially teaching about it. So search, of all the things we do with technology, I believe is the one that puts us as human beings front and center. +So much of what you see, and specifically machine learning and AI, is being done to us: feeds, recommendations, advertisements. And search starts with people expressing what they want. +And, you know, in my version of the future, the machines will help us, but they have to start with us expressing our intent. So that's, in essence, why search is so exciting. And as for why I teach it, well, it comes back to what you asked in the beginning. +Not that I'm despairing that there's nothing exciting about search. Not despairing, but I do think that the need for people to be building great search is not met by the supply of people who have learned about it. +And as much as I enjoy personally working as a consultant for companies, that's not exactly a scalable approach. +So what I see is, there are so many people out there who know enough that with a little bit of a push, some combination of the sort of basic domain knowledge that we're teaching in our fundamentals class, but also the kinds of techniques and feedback, our somewhat opinionated way of showing those techniques in the search with machine learning class that focuses on query understanding and content understanding, +that takes dense retrieval, vector retrieval, puts it in context, is just the nudge they need to get over this.
You don't need to spend years, and not everybody's going to get to do a PhD in information retrieval and machine translation. +But I think that today, if you are a software engineer, if you have a basic knowledge of coding, and you learn a few of these things, you can do wonders with the tools that are out there. And then from experience, you'll develop the rest of the skills that you need. +So I'm excited that I can be a part of enabling the next generation to just run circles around anything I ever got to do. +I look back at the work we did in the early 2000s and it looks so naive, although I think we were working at least on the right problems, but without the machinery we have today. +And I just think, you know, in another 20 years, I look forward to looking back on the naivety of what we thought search was back at this point. Yeah, I'm ecstatic. This is very deep, Daniel. Thanks so much. +And please keep doing what you're doing, because I really, really enjoy reading your blogs. I need to also read your book, by the way, on faceted search. +And before we move to the questions from the audience, is there any announcement you would like to make to our audience? Well, you know, for those who don't know, we're going to be teaching these two classes in June. +There's a search fundamentals class, which is a two-week class intended for people with no background in search, and the search with machine learning class that will start two weeks later. So people can take both, in fact. The latter focuses on search with machine learning. +So we're going to show you query understanding, content understanding, and vector retrieval. The first course will start on June 6th, the second on June 20th. And these classes are available to anyone in the world. +We make sure, Grant and I, that we cover the time zones well and are available asynchronously. So I hope that some of you have already signed up. Some of you have taken the class before.
+But what we experienced when we taught this class before was the incredible community, which Dmitry was indeed a part of. And we're excited to keep going. Absolutely. And I highly recommend you take this course, or both of these, or one of these. +From first-hand experience, it was a breathless run. Every single week I needed to hand in a project. And it's not just theory. And the theory, by the way, is very deep. If you have time, go and read all the write-ups that Daniel and Grant have done for the course. +But also the actual act of coding, the actual thing where you see how it evolves in your hands. It's amazing. So awesome. Let's proceed to the questions from the audience. I will take the first question from the Q&A panel. We also have some in the chat. So the first question is from Hemann Schu. +I apologize if I mispronounce your name. "Hi, Daniel. Any specific book to get started with search with elements of machine learning? Thanks." Yeah, so I'll say this: there's a lot that's been written on learning to rank. I think Chris Manning's book discusses it. +Ricardo Baeza-Yates' book might be a little bit older. What I think you're going to be less likely to find, though, is material on query and content understanding. There is a book. +It's really more a collection of survey essays on query understanding for search that was published, I forget if it was last year or the year before. A little bit expensive, but look it up if that's for you. If you're more interested in things for free, my blog at queryunderstanding.com is available. +At least it will give you a survey of the techniques there. But to be clear, it's very focused on query understanding as such. I'm writing a series on content understanding at, unimaginatively, contentunderstanding.com, that starts doing the same thing there.
+But from the perspective of books, I would say that probably Chris Manning's information retrieval book would be a good place to start for information retrieval in general. And the query understanding collection of essays is frankly the best published resource you're going to get for that. Awesome. +Hope that answers your question. The next one I'm going to take from the chat. It's from Chris: what are your recommendations for integrating information retrieval, retrieving documents, with question answering, returning answers within a context? Yeah. +So question answering is really exciting, right? This idea that you can get the information instead of just the document. +If you think about the way we've gotten there, a lot of it starts from passage retrieval, or even before that, search snippets. Essentially, if you think about the way Google looked five to ten years ago, you would see that sometimes, as you looked at your search results page, your answer was in the few words that were highlighted for the result. +But now it's more likely that you'll see that sentence extracted and put near the top. And I would say that a lot of question answering today feels a lot like passage retrieval. That is, find that sentence. +Although I would say that while before, that tended to be retrieving a passage that contained essentially the exact words you'd use, maybe with a little bit of variation for stemming or synonyms, nowadays it's more likely using a vector-based approach to find a sentence or passage that is similar in the vector space. +That's pretty exciting. However, what people really want is that, even though there is no sentence in the content that exactly answers your question, somehow the search engine will be able to, not be a search engine, +but an answer engine, and say, oh, I'm able to synthesize content from different places, understand your question and answer that. We are not there.
I mean, of course, you can ask what's e to the i pi plus one, and it will say zero, but that's cheating, right? That's really just computing. +You can play with Wolfram Alpha, which is a more sophisticated version of trying to essentially parse what you ask into a query that it can then execute in a formal language. But I think doing that on general information, we are far away from. It's exciting, but that will require a generation. +Absolutely. I agree on that. The next question from the Q&A panel is from Donnie: what roles do curated controlled vocabularies, terminology, thesauri, taxonomies, and so on, play in practical approaches to query and content understanding, in your experience? So: huge. +And I'm glad that you asked me this. Basically, collecting these sorts of curated vocabularies can be a great way to label the content and have targets for machine learning labeling. So for example, if you know what colors things come in, then people tend to look for colors. +They're great. You can actually say, great, I'm going to look for things that way. Or if I know what subjects people are interested in, those will be the subjects that tag the content, and they're the targets for queries. +So in a way, having these vocabularies changes what would otherwise be an unstructured problem, saying, well, I'm hoping that I can get some representation of what this is about, what the query is about, and match them, which, if I don't have vocabularies, leaves me somewhat unconstrained in how I do this. +Having controlled vocabularies can say, oh, those are the ways in which I will try to represent things. +Even if I'm using vector-based methods to get there, they give me some alignment on the space. And having multiple such vocabularies might say, well, sometimes I might be interested in one aspect of this content, sometimes in another, right? +So I might be interested in the color, or in the product type, or in the material.
+So I would say that having these vocabularies can make a big difference. And since you bring up thesauri: it can be helpful, when there's a vocabulary gap between the way people ask for things and the way things are represented, to use a thesaurus for query expansion. +You have to be careful, because a thesaurus tends to wreak havoc with the context in which those words occur, but still, they can be great for generating candidates for retrieving more results. Awesome. +I think we still have time, even at the top of the hour, for a last question from an anonymous attendee. So, Daniel, you talked about query classification for the retrieval side of things, but that can be a slippery slope. If content isn't 100% correctly categorized, and often it's not, +then our recall would be negatively impacted by using query understanding as a hard filter. Any input on that? Absolutely. I mean, I was burned by this myself when I was helping a client with trying to target promoted search results. +And I said, oh, we should only put them in the right category. Indeed, because there were some categorization errors, this had such a negative impact that I stormed into the room saying, close the test, we're losing money. +And I felt very embarrassed, because it turned out that, indeed, this was the problem: the content wasn't classified well. Of course, the first thing I would say is invest on the content side, because if you're able to classify queries, you probably can also invest in classifying the content. +And by the way, even if the content has been, say, categorized in a way you're not allowed to override. Right? Maybe you're a marketplace, or it's content where, you know, you don't own the categorization. +For example, on LinkedIn, if I decide to say I'm an attorney, LinkedIn's not going to just automatically change it and say, no, you're actually not. But it can still classify me and say, you know, you kind of look like a software engineer.
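The thesaurus-based query expansion mentioned a moment ago, generating candidate rewrites while being careful about context, can be sketched as below. The synonym table is a made-up stand-in for a real curated thesaurus:

```python
# Hypothetical thesaurus; a real one would be far larger and curated.
THESAURUS = {
    "attorney": ["lawyer", "counsel"],
    "tv": ["television"],
}

def expand_query(query: str) -> list:
    """Generate candidate queries by substituting one synonym at a time."""
    words = query.lower().split()
    candidates = [query.lower()]
    for i, w in enumerate(words):
        for syn in THESAURUS.get(w, []):
            candidates.append(" ".join(words[:i] + [syn] + words[i + 1:]))
    return candidates

print(expand_query("attorney jobs"))
# → ['attorney jobs', 'lawyer jobs', 'counsel jobs']
```

Substituting one term at a time, rather than all at once, is one simple way to limit the context damage Daniel warns about: each candidate stays close to the original query and can be scored or filtered before retrieval.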
+And you can use inferred categories in your retrieval. There's no prohibition there. I'd also say that, you know, maybe the content isn't 100% categorized because there are some similar categories. +Well, you can always take query classification, you know, not as a hard and fast rule, but more like guidelines, and say, well, I'll return things in that category. But I'll also return things that, say, refer to that category. +And maybe you'll even return things that are in similar categories. Or, in the same way, if I'm not 100% sure on the category, even on the query side, maybe I'll take the top few categories that I thought were possible. +There are a lot of ways in which you can use what you've learned about the query and what you've learned about the content as hints. And with all these things, it's a precision-recall trade-off. +And you generally have to decide what's the cost of losing that recall versus what's the cost of the annoyance of losing precision. And it will depend. Absolutely. It's a journey. And I've enjoyed this conversation so much. I've learned new things. I will rewatch this podcast myself. +And thanks to everyone for asking your questions, and thanks, Daniel, for answering them, brilliantly, as you usually do. Hopefully we covered all the questions from the chat and from the Q&A panel. And yeah, thank you so much. +And where you don't feel comfortable, or you don't know yet, I highly recommend you take the course, or a course, on search. And go from there, experiment. Be bold about what you do in your experiments. But apply science, and apply the knowledge from the moguls like Daniel and Grant. +So thank you very much, Daniel, for your time and for your wisdom today. Thank you, Dmitry, it was such a pleasure. All right. Thank you, everyone. Thanks.
\ No newline at end of file diff --git a/transcripts/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md b/transcripts/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md new file mode 100644 index 0000000..03f6472 --- /dev/null +++ b/transcripts/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md @@ -0,0 +1,156 @@ +--- +description: '

00:00 + Intro

01:31 + Leo''s story

09:59 + SPLADE: single model to solve both dense and sparse?

21:06 + DeepImpact

29:58 + NMSLIB: what are non-metric spaces

34:21 + How HNSW and NMSLIB joined forces

41:11 + Why FAISS did not choose NMSLIB''s algorithm

43:36 + Serendipity of discovery and the creation of industries

47:06 + Vector Search: intellectually rewarding, professionally undervalued

52:37 + Why RDBMS Still Struggles with Scalable Vector and Free-Text Search

1:00:16 + Leo''s recent favorite papers

Lots + of papers and other material from Leo: https://www.youtube.com/watch?v=gzWErcOXIKk

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250117_030137_0d0e98d093e79861b9dc85d445adcf1e.png +pub_date: Fri, 17 Jan 2025 15:07:51 GMT +title: Debunking myths of vector search and LLMs with Leo Boytsov +url: https://rss.com/podcasts/vector-podcast/1852660 +--- + +Hi everyone, Vector Podcast is back with still with season three and I'm super excited to be talking to my guests today and there is a connection with this episode between this episode and the episode that we recorded with Yuri Malkov about one of the most famous and popular vector search algorithm I can ask WL and I'm talking today with Leo Boytsov, who is the senior research scientist at AWS and he is also a co-author of Animesleab and Animesleab is today used at Open Search and probably some other places that I actually don't know and I hope to learn it as well today. +This is just exciting and I think goes without saying that the whole field stands on the work done by people like Leo and Yuri and others who actually develop the core algorithms and popularize them, improve them over time and then the story unfolds from there. +Hi Leo, how you doing? Hi, thank you for introducing me, it's a great pleasure to be able to podcast. Yeah, it's my pleasure as well to have you. Traditionally we start with the background. +Can you say in a few words your background maybe how you got here and what's your story in search vector search and maybe LLM? Yeah, sure, yeah so background is pretty long. So I've had a rather long career, honestly. Well in my current capacity as you mentioned, I am a scientist at WSAA labs. +For one year I was working on co-generation about this year, earlier this year I moved to a Q-console team, Q-console team works on question and switch at bots that answers questions about various AWS services. +So we can ask like, I don't know, it's like where's my EC2 instance things like that and how I set up things. 
But I have to make a disclaimer that today I do not speak on behalf of AWS, and I cannot talk in detail about my work there. So as I said, I've had a relatively long career. Nearly all of my life I have been a computer science geek with a passion for building cool stuff and solving hard problems. Yet my professional career started in rather mundane fashion: I started working on client-server software for financial systems. This was not my favorite subject, but pretty much the only one that paid reasonably well at the time. So I had to do a lot of front-end and back-end engineering using various SQL databases.

I was not satisfied with my career, but luckily I got really interested in algorithms, in particular retrieval algorithms. So I started working on this topic, first part time, then full time, but largely as a software engineer, less as a researcher. As a software engineer I worked for various companies, including two tiny startups and the Russian search engine Yandex. Later I moved to the United States and worked on the search engine PubMed at the National Center for Biotechnology Information. At first, and that was a common topic in my career, I was doing a lot of front-end development, but for the last few years there I worked primarily on retrieval, the core engine. In particular, I invented a pretty neat way to speed up weighted Boolean queries in retrieval.

Around that time I also realized that it would be hard to get a research position without a good degree. That motivated me to apply to a bunch of universities, and eventually I got accepted by Carnegie Mellon, which was a huge stroke of luck. So I did my PhD studies there. During these studies, I worked on a mix of machine learning and retrieval algorithms, without any deep learning. Vector search, or rather similarity search, was a part of my graduate studies, but I didn't use any deep learning.
It was a mix of classical machine learning, word2vec-style neural networks, and so on. An interesting part of that story is that my advisor, Eric Nyberg, worked on question answering, and together with his team he participated in the development of IBM Watson, the amazing trivia-playing system that in 2011 defeated human champions. That was one reason why I chose my advisor: it was such a cool topic. But pretty quickly I learned about the system and realized, oh, it's actually largely a search engine on steroids. Retrieval is at the core of IBM Watson; I have a blog post about that if anybody is interested. It's basically a retrieval-based extractive question answering system. So if you want to improve question answering, you need to improve retrieval. That's how I got back to working on retrieval algorithms.

And again, I saw an opportunity, and my big research question was: how can we do information retrieval using more advanced techniques than lexical search with BM25? Nowadays everybody just uses BERT-based models, or other transformer-based models, to create dense vector embeddings, and they are quite effective; that was not the case ten years ago. Whatever we had then was pretty ineffective for retrieval. My thought was that because no single representation was effective for retrieval, representations needed to be somehow combined and ensembled. So you don't use a single representation: you use a combined similarity, treat this similarity as a black box, and then apply a generic retrieval algorithm. In hindsight, that was a pretty ambitious project that required working on both the design of effective similarities and on retrieval algorithms. And that's where the NMSLIB library turned out to be very useful.
It was instrumental to this work, although it was created for a somewhat unrelated purpose. Overall it was a rather bumpy ride: things didn't work well initially, and I got a lot of help from other people, in particular from my co-author David Novak, who proposed an amazing improvement for one of the algorithms in NMSLIB. So we published that after my graduation. When I was writing my thesis, I found a bunch of issues with my previous approaches and realized that I could also use HNSW-like algorithms, which were not a core part of my thesis work. I got even stronger results, but that was a little bit too late to publish or use otherwise.

Moreover, there was a sort of facepalm realization about the similarity that I used, which retrieval treated completely as a black box. It was more effective than BM25 on the collections that I used, but I didn't realize that this black-box similarity was actually representable as an inner product between two large sparse vectors; my former co-author Chris Dyer pointed this out. If I had embraced this sparse vector approach from the get-go, it would have been a much easier problem to solve, from both engineering and scientific points of view, even without deep learning, and it could have produced more impact. But it's a little bit too late to dwell on this now.

Anyhow, I graduated six years ago, and since then I have been working as a research scientist and engineer on deep learning, specifically training models for, among other things, computer vision and retrieval. Despite this diversity, things have come full circle, and I am working on question answering systems once again. Yeah, that was pretty much it.

Amazing story. Yeah, thank you for that.
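Leo's "facepalm" observation can be made concrete with a tiny sketch (my illustration, not from the episode, with made-up term weights): a black-box lexical similarity that sums per-term weight products is exactly an inner product between two sparse vectors, here stored as Python dicts mapping term to weight.

```python
def sparse_dot(q, d):
    """Inner product of two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller vector; only terms shared by both contribute.
    if len(q) > len(d):
        q, d = d, q
    return sum(w * d[t] for t, w in q.items() if t in d)

# Hypothetical weights, purely for illustration.
query = {"vector": 1.2, "search": 0.8}
doc = {"vector": 0.5, "search": 0.4, "index": 0.9}
score = sparse_dot(query, doc)  # 1.2*0.5 + 0.8*0.4 = 0.92
```

Because only overlapping terms contribute, such a similarity can be served by an ordinary inverted index, which is why recognizing the sparse structure would have simplified the engineering.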
The story tends to repeat itself, but at the same time we find the topic still exciting, and it seems like you are still very interested in question answering and in improving its building blocks. It's kind of cool, right? That we are able to come back to some of these topics and pick them up on a different level. That's amazing. And there is a lot to unpack. From the moment you spoke about sparse and dense, I wanted to pick your brain on your take on the model called SPLADE, and SPLADE v2. I don't know if you're familiar with that model, but basically, there is always this discussion: should we take lexical search, combine it with dense search, and then do some kind of hybrid formula on top? And how do we even learn the parameters of that formula, depending on the domain? But then there is a more drastic approach: let's not do that, let's just take a single model which can handle both, and which can also support what dense search doesn't support, like exact phrase searches. What's your general intuition about that? How do you think about this?

Well, that's a super interesting question. I have one clarifying question though. Before I answer: you said that some people want to have a single model that's doing both. Could you elaborate a little bit on this?

Well, I guess maybe it's not that they want it, but it's the development where, instead of combining these disparate sources of results, one coming from lexical search, which is the well-known BM25-driven kind, and the other one more modern, in the sense that everyone wants to get exposed to dense search, you need to somehow figure out how to combine the results, right? One is designed maybe for precision, the lexical one; the other is designed more for recall, because the dense vectors don't have as many dimensions as the sparse vectors.
But then you still need to figure out: how do I combine these two? Usually people cite reciprocal rank fusion, from what I hear, but there are other methods as well, even clustering-based ones. So that's one approach. Another approach is to just stop doing that, I guess. If I understand correctly what SPLADE does, you encode your data with SPLADE once, and you can use its capabilities to also retrieve exact phrases. So effectively, ideally, you don't need the lexical matching engine anymore; but maybe I'm completely wrong. I just wanted to hear your opinion on that.

Okay, well, using your words, there is a lot to unpack here. I'm still not quite sure what you mean by having a single model, but let me try to start answering, and you can stop me and guide me in another direction if needed. First of all, what's interesting about natural language, and this is very different from the computer vision domain, is that we have multiple ways to represent text. In computer vision, each image is usually just represented as a fixed-size array of pixel values; that has become a commodity. But in natural language processing, we started with the so-called bag-of-words representations, where a document was represented by basically a sparse vector: you would have either zeros and ones, meaning a specific term is present or not, or weights instead of just zeros and ones. Then, with the development of deep learning, though it actually started a little bit earlier, for example with principal component analysis, people learned how to represent text using fixed-size vectors. This is not a very natural representation for text, and it didn't work really well initially, but now it gives good results.
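The reciprocal rank fusion the host mentions can be sketched in a few lines (my sketch; `k=60` is the constant from the original RRF paper, and the document ids are invented for illustration):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).

    Each ranking is a best-first list of doc ids; documents appearing high
    in several lists get the largest fused scores."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]  # e.g. a BM25 result list
dense = ["d3", "d1", "d4"]    # e.g. a k-NN result list
fused = rrf([lexical, dense])  # d1 and d3, present in both lists, rise to the top
```

RRF needs only ranks, not raw scores, which is why it is popular for fusing a BM25 list with a k-NN list whose scores live on incompatible scales.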
So we have two representations, and there are different approaches to combining them, of course. One: if you want to do retrieval, you can just do lexical search, you can do k-NN or ANN search on the dense vector representations, and then you can somehow merge the results; you can use a reranker, but you don't have to. That's the so-called hybrid search, and hybrid search can exist in different versions. If you want to combine things in a single model, why don't you represent each document using both a sparse and a dense vector? When you're computing the similarity, you can compute the similarity between the sparse parts and between the dense parts, and then combine them somehow, for example using a weight. That's in fact what I was trying to do in my thesis as well, because my similarity was basically an ensemble of several similarity scores for at least two representations. And that can work. There are of course modern instantiations of this; there's a paper, I think by some Google folks, where they did exactly this: they combined SPLADE and some dense vector embeddings. That can apparently work a little bit better, or sometimes maybe a lot better, than each representation on its own.

With both approaches, of course, there are the issues that you mentioned. I don't know what the best approach is, and I don't have a crystal ball regarding the best path forward. But with the dense representation, the problem clearly is that you have to pack everything into a fixed-size vector. As your document gets bigger, the vector size, the amount of information you can store, stays the same, but the document increases in size, so you would expect some deterioration in quality. Another reason why you can see deteriorating results is that you have fixed representations while the number of words is huge.
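The single-model weighted combination Leo describes, one score from the dense parts, one from the sparse parts, mixed with a weight, might look like this (my sketch; `alpha` is a tuning knob that would normally be learned on validation data, and all vectors here are toy values):

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sparse_dot(q, d):
    """Inner product of sparse {term: weight} vectors."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse, alpha=0.5):
    """Weighted mix of a dense (cosine) score and a sparse (lexical) score."""
    return alpha * cosine(q_dense, d_dense) + (1 - alpha) * sparse_dot(q_sparse, d_sparse)

score = hybrid_score([1.0, 0.0], [1.0, 0.0],
                     {"gpu": 0.7}, {"gpu": 1.0, "cloud": 0.3})
# dense part = 1.0, sparse part = 0.7, so score = 0.5*1.0 + 0.5*0.7 = 0.85
```

In practice the two scores usually need normalization onto a comparable scale before mixing; that detail is omitted here.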
A regular, educated person knows about 30,000 words, but in reality the internet has millions of words, right? And they are not only words; there are things like product identifiers. Sometimes people searching for products will copy-paste or type in those identifiers, and they get squished into that dense vector, so it cannot be precise.

There is an interesting paper by Nils Reimers, the Sentence-BERT author, where he gives some experimental and even theoretical evidence that as the collection size increases, dense vector search can deteriorate, simply because there will be some false positive matches due to squeezing a lot of information together into a fixed-size vector. So yeah, it's quite possible, but I haven't seen a follow-up of this work, so I don't know how much of a problem it is in practice.

Coming back to the sparse representations: they could potentially solve this issue, but not necessarily with SPLADE-like models. The problem with SPLADE is that these models create those sparse representations using not the words themselves but subword tokens. As a reminder, transformer models create a sort of new vocabulary that has some complete words, but most entries are incomplete words: prefixes, suffixes, parts of words. The difference between this new vocabulary and the actual vocabulary that people use on the internet is that it is limited: it can have maybe 50,000 tokens, maybe 200,000 in some of the advanced models, but we really have millions and millions of words.
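The subword-vocabulary issue can be seen with a toy WordPiece-style splitter (my sketch; the vocabulary below is a made-up stand-in for a real model vocabulary, and real tokenizers have more machinery, but the greedy longest-match idea is the same):

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first subword splitting, WordPiece-style.

    Continuation pieces carry the conventional '##' prefix."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # out-of-vocabulary: the surface form is lost
        pieces.append(match)
        start = end
    return pieces

vocab = {"embed", "##ding", "##dings", "sku", "##12"}
wordpiece("embeddings", vocab)  # -> ['embed', '##dings']
wordpiece("xk9902-b", vocab)    # -> ['[UNK]']  (a product id gets mangled)
```

A rare product identifier either shatters into uninformative fragments or collapses to `[UNK]`, which is exactly why a model whose "terms" are subword tokens struggles to be precise on such queries.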
So of course, that would also lead to some deterioration in quality, false positives, especially if you try to represent long documents. It's more sparse in some ways, but it's still constrained to a fixed vocabulary. Does that make sense?

Yeah, it does. It's very insightful, what you said. To make my question much more succinct, I could have asked: could we just use SPLADE for everything, instead of combining different approaches? And you basically answered it really eloquently. You said that SPLADE itself has limitations: for example, it would not allow us to properly embed all the variety of the language, and dealing with longer documents is another issue.

There is an interesting extension to this. I was just recently listening to a presentation on extensible SPLADE, where they extend the vocabulary of SPLADE by adding entities. That's one interesting direction of work. Another interesting direction is the so-called DeepImpact models, where they take a document, do document expansion using doc2query-style models, and then for each token in the document they learn a weight. So this is a little bit less limited, I think. In the end, to be able to handle those rare terms, we need a lexical representation to handle bigger vocabularies, and it's probably hard to model that with just fixed-size vectors.

Yeah, it makes a lot of sense. At the same time, we also know that, well, it depends on how you model this, but the vanilla lexical approach would miss semantic links, and a sort of understanding of the larger context, because all it does is look at the words through the BM25 model.
Sometimes it just pays attention to some words but not to others, and it may miss the main point of the query, right? But of course, these models still worked, at Yandex, which you know best, and elsewhere; they worked previously probably by virtue of training the users: hey, don't give me the full sentence, just give me specific words, a chopped list of words that I need to look up. That's how the inverted index worked out, I guess. And of course, on top of that you need a very smart reranking strategy to pull up the documents that are really relevant, right? But today we have this new, well, I keep calling it new, and it's not necessarily that new, but still fairly fresh, development of dense retrieval, which not many companies, I think, have onboarded into their products yet. It's a very interesting direction, and still you need to combine the two worlds, right? So it sounds like, from what you said, the only way to get better quality is to combine these approaches rather than try to develop one single holistic model to handle everything.

Oh, yeah, it's a great question. I actually don't know what the best path forward is; I just highlighted the deficiencies and advantages of different approaches. But I also want to comment on the DeepImpact model. The way I described it maybe made it sound like a BM25 model, but it's actually not. Maybe we should talk about sparse representations, learned sparse representations, because it's a bigger topic, much bigger than most people realize. People know BM25, people know dense vectors, and these are the simple things, but there is a lot in between.
So first of all, what can you do? That's what people did, and doc2query is the most famous way to do so, though it was actually not even a single group of people who proposed it. You can take a deep learning model, a contextualized model; maybe not necessarily contextualized, but contextualized models do a better job, because they look at the document as a whole, they don't look at individual chunks of the document, so they can understand the total meaning of the document. Then they propose new keywords, new terms: synonyms that could have been in this document but are not. If you add these terms to the document, the missing synonyms are there, and you can index the expanded document. So basically this is document expansion, and it helps mitigate the lexical mismatch between queries and documents. And I claim it's easier to do this expansion on the document side. There are of course approaches that do query expansion, basically adding synonyms at the query stage, but my claim is that it's much harder to do that accurately, because there is much less context. So this is one direction for fixing things and creating sparse representations.

Then there is the SPLADE model. What does the SPLADE model do? It does something completely different. It looks at the document, and there is a vocabulary of BERT tokens, and for each token it gives you a weight. It looks at the document, sort of understands its meaning, and says: this word, or prefix of a word, should have this weight. That's how you get a sparse representation.

But with DeepImpact, you're doing something slightly different.
You take a document and do this document expansion, adding words like synonyms. But then you don't index the document using BM25. Why? Because BM25 is clearly old style, and it doesn't take context into account. Instead, you train a transformer model that gives you a weight for each term in the expanded document, and then you use those weights for retrieval. That's what's called the DeepImpact model.

Yes. We should link that; I guess there is a paper for that as well, and we should be able to link it. That's very interesting. It's also interesting, what you mentioned about the dense models not being able to capture everything you want them to capture. And yet, this becomes a building block in the application phase, for example in RAG, Retrieval-Augmented Generation, because effectively the only method I have heard of so far, which is circulating a lot, is: just chunk it up. You chunk all documents up, and then you hope that the chunk size is less than, or about the same as, the capacity of the model, because otherwise it will chop off the end and you will lose part of the meaning. Or you also apply methods like some level of overlap, so you index a few more chunks for the same entity, and then try to query. And then, interestingly, you can generate questions out of chunks, questions these chunks might be able to answer, and then search those questions instead of the chunks themselves, which comes back to what you said about doc2query, I guess. So it's very interesting that we are standing on a set of building blocks that themselves should be optimized, and optimized, and optimized. But I guess we are already globally in the phase where everyone is trying to derive value from LLMs and RAG and everything, and yet we can stumble upon some really tricky situations, like you explained. Oh, it looks like we have a lot.
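The two-stage DeepImpact recipe Leo outlines, expand the document, then learn one weight per term, can be sketched as follows (my sketch: a static synonym table stands in for the doc2query generator, and `weight_fn` is a placeholder for the trained transformer that scores each term in context):

```python
def expand_document(terms, synonyms):
    """doc2query-style expansion: append terms a query might plausibly use."""
    expanded = list(terms)
    for t in terms:
        expanded.extend(synonyms.get(t, []))
    return expanded

def impact_index(expanded_terms, weight_fn):
    """DeepImpact-style indexing: a single learned weight per unique term."""
    return {t: weight_fn(t) for t in set(expanded_terms)}

doc = ["laptop", "battery", "life"]
synonyms = {"laptop": ["notebook"], "battery": ["charge"]}  # toy stand-in model
expanded = expand_document(doc, synonyms)
entry = impact_index(expanded, weight_fn=lambda t: round(1.0 / len(t), 2))
# 'notebook' and 'charge' are now searchable terms with their own weights,
# even though the original text never contained them.
```

The resulting `{term: weight}` entry is an ordinary sparse vector, so it drops straight into an inverted index, which is the practical appeal of learned sparse representations.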
Yeah, it looks like we still have a lot of research topics, a lot of questions to answer. I wanted to digress a little bit from here to the work you've done on NMSLIB, and I want to read it from your GitHub repository: it's a Non-Metric Space Library. I did spend some time, in my previous life, you know, when I was studying mathematics, and we did study a bunch of metric spaces. I never really imagined that this highly theoretical stuff would connect so deeply to practice; it's amazing. But can you tell me why it's a non-metric space library? Isn't it so that the whole idea of vector search is that we choose some metric, cosine or dot product or whatever, and that's how we express semantic similarity?

Great question. The reason is that we decided not to limit ourselves to metric search, because we felt, and that's also a feeling of other people, that metric search is limiting: it's not expressive enough. That turned out to be true to some degree, but not as much as we hoped. And indeed, in many cases, and the reason we were doing so is that representation learning was not as developed as it is now, we felt that people would engineer complex similarities, and we needed to support retrieval using these complex similarities. This did not happen. But what I think happened, and I want to connect this to my earlier statement: at the end of my graduate studies, or rather after defending the thesis, somebody pointed out that the similarities we were using were basically representable as a sparse inner product between two huge vectors. So in some sense it becomes similar to either DeepImpact or SPLADE. And in fact the similarity is the maximum inner product, not cosine similarity, and the search procedure is called maximum inner product search.
So basically, you want to retrieve documents that have the maximum inner product with the query. This is a symmetric similarity measure, in some sense, but it is not a metric, and it's not easily reducible to cosine similarity. Searching using cosine similarity is actually fully equivalent to searching using the Euclidean distance; for the inner product, you can reduce the search to cosine similarity or Euclidean distance, but it turns out that this reduction affects efficiency.

This is a somewhat bigger topic for discussion, but what happened is that the people maintaining Lucene were adding support for maximum inner product, and Vespa did this too, through this trick of reducing maximum inner product to cosine similarity or L2. I argued that there is research showing that this is suboptimal, and there was a discussion, and as a result they basically didn't do that.

So, long story short, I think non-metric similarity search in general turned out to be not so useful, but there are some instances, like maximum inner product search, where non-metric similarities are widely used.

Yeah, that's amazing. I hope our listeners are learning as much as I am, because oftentimes when you plunge into a new field, let's say search, all you see is what is being popularized, and you may go down the rabbit hole. So I'm really excited and thankful that you are able to share a much wider perspective on things. And then, most interestingly, you said you're a co-author of NMSLIB, alongside the other authors.
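The reduction Leo refers to can be sketched concretely (my sketch of the standard augmentation trick: append one extra coordinate `sqrt(M² − ‖x‖²)` to each data vector, where `M` is the largest norm, and a zero to the query; the Euclidean nearest neighbor of the augmented query is then the maximum-inner-product answer on the original vectors):

```python
import math

def augment_data(vectors):
    """Map each data vector x -> [x, sqrt(M^2 - |x|^2)], M = max norm.

    All augmented vectors get the same norm M, which turns maximum inner
    product into nearest-Euclidean-neighbor search."""
    m2 = max(sum(c * c for c in v) for v in vectors)
    return [v + [math.sqrt(m2 - sum(c * c for c in v))] for v in vectors]

def augment_query(q):
    return q + [0.0]  # the query keeps a zero in the extra coordinate

data = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
query = [1.0, 0.5]

best_ip = max(range(len(data)),
              key=lambda i: sum(a * b for a, b in zip(query, data[i])))
aug, aq = augment_data(data), augment_query(query)
best_l2 = min(range(len(aug)),
              key=lambda i: sum((a - b) ** 2 for a, b in zip(aq, aug[i])))
# best_ip == best_l2: the two searches agree on the winner
```

The transformation is exact, but as Leo notes it can hurt efficiency in practice: the augmented points tend to land in a geometry that is harder for approximate indexes to search well.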
Your collective work is now also used in the OpenSearch engine, which I believe I also had a chance to test at some point. It's a C++ library that is somehow loaded via the JVM, and then search is performed using HNSW. Can you tell me a bit about that story: how did you end up connecting NMSLIB and HNSW? Here I will probably link to the episode with Yuri, which is quite popular today.

Yeah, well, first of all I have to say that the popularity of NMSLIB, close to 100% of it, is certainly due to the development of HNSW, which was Yuri's creation, not mine, and we affected it in only very minor ways; I mean, we provided the platform. I think one trick Yuri reused, which I had borrowed from KGraph, was an efficient neighborhood-checking algorithm, but that was it.

NMSLIB itself was the creation of several people, and it has a rather wild story; it was never planned, and it's sort of random how we developed it. In 2012 I attended a conference where I met Bileg Naidan, who was doing his PhD on similarity search, and we found that we shared some interests, particularly in retrieval algorithms, and we decided to do a joint project together. My initial interest was a somewhat academic topic: non-metric search, which, as I explained before, is still largely of academic interest, because a lot of things are really metric, or at most maximum inner product search, which is in some sense almost metric. So that was basically purely academic interest, and I connected it to machine learning because I saw an opportunity to use machine learning to support generic algorithms that would do k-nearest neighbor search with non-metric similarities such as KL-divergence.
Yeah, so we did it as a machine learning course project, and we published a paper at NIPS, and it could have stopped at this point. But around that conference I also met Yuri and his co-author Alexander, who both worked at a company where they developed the small world graph approach; that was the original, non-hierarchical version. Alexander was really interested in the algorithms we had in NMSLIB, which tackled generic search in generic spaces with generic similarities, and he was eager to prove that the graph-based algorithm is actually truly generic. That's why he and his student created the first version of the small world graph in NMSLIB; they basically contributed that version. It was really super slow; I sped it up by about 10x, and that was the version we used to win the first ANN-Benchmarks. It was pretty good, but it had issues, and one issue was fixed thanks to Yuri sharing with me an early version of HNSW. I looked at the code; it was not the fast version he created later, but it already fixed something, and maybe he didn't realize it when he showed me that piece of code: I realized, oh, there is actually still an issue in the SW-graph. So the SW-graph was improved, and then Yuri contributed HNSW to NMSLIB, which was a huge contribution, a big step forward: it won the second ANN-Benchmarks competition, and the improved SW-graph was, I think, the second-best algorithm; I have a screenshot of this somewhere which I sometimes include in my job talks. HNSW also influenced Faiss, because they actually knew about KGraph and about graph-based retrieval, but there was one important reason why they didn't use it; you can ask me why. Anyway, it influenced Faiss and a lot of other people, and of course that was Yuri's work,
with a huge impact on the rest of this history. But I think Yuri shouldn't complain; he has had a great career, first at Twitter and now at OpenAI.

Yeah, it's an amazing story. Just to close the loop: why did Faiss not implement the approach you had?

This is a really interesting thing, because that's one of my favorite pieces of this story. It turns out that graph-based retrieval algorithms have a long history, so a lot of this was rediscovered: the pruning heuristics and the basic algorithm go back to papers from the 80s and 90s, but people did not use them. One hurdle was the inability to efficiently create those k-nearest neighbor graphs. A k-nearest neighbor graph is a simple concept: you have data points, and for each point you need to find the points that are its nearest neighbors and connect it to them, modulo some post-modification of the graph. But if you have n points, the brute-force approach is n-squared computation. How can you do this efficiently, how can you approximate it? The way it was done before, people were coming up with fancy algorithms to approximate this, and those fancy algorithms were not particularly scalable. KGraph, I think, is not particularly scalable; we played with it, we actually incorporated a KGraph implementation into NMSLIB, and it was indeed hard to create large graphs with it; it's just not very scalable. What Yuri and his co-authors did for the small world graph, while they were at that company, was to realize that they could combine retrieval and creation of the graph, and do it efficiently, in, using modern terminology, embarrassingly parallel fashion. That was, I think, one key missing block that prevented graph-based algorithms from becoming practical.

Yeah, that makes sense.
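The n-squared construction bottleneck Leo describes is easy to see in code (my sketch; the points are invented, and squared Euclidean distance is used since only the ordering matters):

```python
def knn_graph(points, k):
    """Brute-force k-NN graph: O(n^2) distance computations.

    This quadratic cost is exactly what the small-world-graph idea avoids,
    by inserting each point via a search over the graph built so far."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]  # keep the k closest neighbors
    return graph

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
knn_graph(pts, k=1)  # two tight pairs link to each other
```

For a few thousand points this is fine; at web scale the n² pairwise distances make exact construction infeasible, which is why incremental, search-driven construction was the key unlock.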
What excites me in this story you shared is how serendipitous the discovery process is, right? Something that feels random leads to, I don't know, the creation of an industry. You could largely say that the new industry of vector databases and vector search, and now RAG on top of that, was created because you guys worked on practical implementations of something that also stood on the shoulders of inventions and research done before. So it's kind of a natural progression, but I mean, it's amazing that it hinged on you meeting someone at a conference; not meeting them could basically have meant not creating an industry, right?

Quite possibly. Well, thank you for the kind words, but of course it's not only because of us; if not for us, I think other people would have done this. I agree, though, that we did useful work, and clearly people are using NMSLIB a lot. Even though it has a lot of issues, it still ended up being used rather widely. One reason it was used so widely is that people needed a library that would do k-nearest neighbor search, and do it from Python. People often take these little things for granted, but initially we honestly didn't have Python bindings, and to participate in ANN-Benchmarks and have something useful, you need Python bindings. The first version of those bindings was written by Bileg; I didn't create them. So there you go: the library became possible to use. And at the moment there was not such a big choice of libraries for k-nearest neighbor search. In terms of the competition, there was Annoy, which was noticeably slower; there was another library, FLANN, which used similar algorithms to Annoy but was less
optimized, and it was also slower, truth be told. There was also KGraph, but it was not so easily usable, and basically that was it. Faiss came later; it was only released, I think, a couple of years after, definitely after. It took several years for Faiss to appear and for people to start using it. So at some point there was a vacuum and we honestly filled it; now other approaches are taking over. So yeah, in summary, there was a lot of serendipity, but I wouldn't take credit for the industry; it would have been created without us, for sure.
Yeah, maybe, or maybe not. I also think it's quite a virtue, typical of a humble person, not to recognize the impact they're making, because the moment they do recognize it, that's probably the end of the story. You need to constantly stay low-ego and keep pointing at the goal, and it feels like this is your approach, but you also do make quite a bit of impact. I could ask a ton of questions, obviously, and I could also relate to what you explained about the struggles of optimizing these algorithms, because at some point I did embark on participating in the billion-scale ANN benchmarks, and I think I failed miserably. But at the same time I did have some code that worked on a small scale, and one of the building blocks there was HNSW, with a very, very simple intuition: you make several passes through the dataset, you try to find points in space that are close to each other, and you push them into some common bucket; I called them shards. Then you build an HNSW index for those shards. The only thing I couldn't figure out was that for those shards I still needed an entry point, to quickly identify which shards I should go down through when I look for similar documents for the
query. And I did attempt to modify the HNSW code in nmslib to give me only the first layer of the graph, so that I could pretend that that's my layer for entering the shards. I just ran out of time, but it was very exciting, and also, thanks to the organizers, we had access to really beefy machines, which I don't think I put to good use; I was mostly burning CPU capacity and memory. But I think it's an exciting field, and what I hope, with the vacuum that you mentioned, is that the torch will be carried forward and someone will get excited about, and not afraid of, trying new things in this space. Are you yourself still looking at this? Obviously you're looking after nmslib, but is there something that particularly excites you in this field that you would be working on, or are working on?
Yeah, great question. So first of all, I'm not sure if I'll do any more work on vector search. I honestly haven't been maintaining nmslib very well recently; I just didn't have a lot of time, and there was an issue with the build, which I will still fix, and I'll support later versions of Python for sure. It's piecemeal work: I find, say, half a day to fix the Windows build, and then something else pops up. So it is an exciting field, but it has also become really busy, and another thing is that it's still not very appreciated. Like you said, those were really nice words, that it helped an industry to be created, and maybe it's true to some degree. But is it appreciated by, you know, your potential employer? No, it's like zero appreciation. It was, and still is, a somewhat niche topic, and most people are of course interested in how you solve intelligence in the broad sense of the word: how do you create models that can see, and how
you can combine them, this whole new agentic ecosystem. All that stuff that really excites people is in this plane, or space, of large language models, machine learning, deep learning, intelligence, you name it. So that's why, while I do have ideas, and I did test some of them, things usually don't work, and I don't have time to think systematically about these issues.
Yeah, but I guess at the same time you did create the base for other people to innovate, and I think it's highly appreciated, really. I also wanted to pick up the topic that originally interested me when you popped up in my LinkedIn feed. You made a statement about relational databases trying to implement search capability and sort of miserably failing. Maybe you didn't use the word "miserably"; it's my word here. But I wanted to expand on this a little: why do you think they tried to do that, and while they were trying, what went wrong?
Yeah, great question. Well, first of all, I definitely wouldn't say "miserably", because it has been a success to some degree, and it's not over until it's over; people are working on this. What I have been observing for many, many years (and as I said, I did start my career as a person working on databases and writing a lot of SQL) is that the typical database is a very different beast from what you typically need for information retrieval. First of all, the early databases are oriented toward a certain tradeoff: they need good throughput for both inserts and updates, they need to be able to update information pretty quickly, and query performance also needs to be reasonable. They also support pretty complicated data, what they call the SQL
schema: there can be multiple tables, and all of that needs to be supported. So of course there are tradeoffs to be made to make that possible: to support generality, efficient updates, efficient inserts. But at the same time, if you're building a retrieval system, a lot of this is not necessary. Say you want to do keyword-based retrieval: all you need, at a high level (this is somewhat of a simplification), is to memorize which keywords appear in which documents. Then you have this so-called inverted index, where for each keyword you have a list of documents in which that keyword appears. It's a much simpler structure, and it permits much more efficient compression algorithms. So again, it's a different beast, also in terms of efficiency of updates: once you compress data and represent it in a special way, it becomes much harder to make the incremental updates for which those early databases were designed. So clearly there is a disconnect. It was somewhat reduced with the introduction of so-called columnar databases, but still, I believe columnar databases do not favor those point updates anymore; they are best used for bulk updates. And once you're doing bulk updates, you're sort of in search-engine territory, where you change the index in rather large increments and you don't worry too much about your information being really up to date: you can wait maybe a day, maybe a few hours, but you don't have instantaneous updates of the database. So these are different trade-offs. Of course there is a disconnect, and this is why, I believe, it was always hard to add full-text indexes to regular databases. But another aspect of the disconnect is that retrieval often needs a really different set of specialized features. So if
you have a relational database system, it's pretty hard to support, for example, tokenization, if you need to do tokenization in multiple languages. So of course that's part of why those specialized tools with a lot of features, like Lucene and Vespa, were created. Databases are catching up, but there is still a gap, and it's probably going to be really tedious to support the full set of features; you know, you'd need to match Vespa. So yeah, that's my five cents on this stuff.
Yeah, but I'm curious to understand a little bit why databases are still trying. Why are they trying to encompass these seemingly disparate ways of searching, when, as you explained, if you need a fully blown search engine that can support multiple languages, tokenization and so on, you'd better be using the likes of Lucene, Vespa, or maybe Elasticsearch on top of Lucene? Why are they still trying?
They want customers. It's of course advantageous to be a one-stop shop, so that customers come to a specific provider and have everything. I listened to a podcast with the Rockset co-founder (Rockset, the one that was acquired by OpenAI), but I think you recorded that podcast before they were acquired, so good timing. And you can clearly hear that message: we really want people to come and use our solution, so we have hybrid search, we have some support for ranking, we have this and we have that. I can't argue against this being convenient; it's definitely something very useful for customers.
Yeah, just a small correction: he's not a co-founder, I think he's, well, VP of engineering, or used to be VP of engineering, at Rockset. But he brings the story, and I encourage listeners to listen to the episode: the story of RocksDB scalability issues at Facebook and how it underpins the further journey at
Rockset. So I feel like we could discuss for five hours, and I'm actually a big fan of Lex Fridman's podcast, where some of the episodes are really, really long and you can listen to them for weeks. I really hope we can record with you sometime later as well, as you have topics to share. But is there something, Leo, that you want to share? It could be a paper you've read that particularly excites you, maybe a book, or anything else you want to say.
Yeah, great question. So I was interested a lot recently, maybe over the last couple of years, in how LLMs can be useful for search. One particularly interesting direction is how you use LLMs to train smaller models for retrieval and ranking; for me personally, it's a very exciting area of research. As far as distillation is concerned, there were several interesting papers on the topic. Basically, a lot of that work revolves around the creation of synthetic data: synthetic queries based on the documents. We have a document, and the model creates queries for it, queries such that the answer is in the document. So we have a positive, relevant document, and you can sample negatives from your collection and train on that. There is also a line of research where they try to create both queries and documents. So that line of research was particularly interesting to me, although there was some work before LLMs on creating synthetic queries; it was not a particularly widely used technique. One paper that stood out was the InPars paper from a couple of years ago, and we have a reproduction of this paper. That paper had a pretty quick follow-up from several authors at Google, called Promptagator, where they showed how this technique can be improved. And there was another follow-up with the same first
author, who had transitioned to DeepMind, where they showed they could do it somewhat better. But they found one issue with the synthetic query generation approach: the document used to create the queries is not always the most relevant document. You would think it makes sense that if the question is answered by this document, it is the most relevant document. It turns out that if you ask a question, there can be other documents that answer this question, and they can answer it even better. So they solve this problem using a relabeling approach: they generate a synthetic query from some document, then look at, say, the top 10 retrieved documents and use another LLM to decide whether these documents are relevant to the query or not. It's a very interesting paper as well. And finally, the last couple of papers I encountered were regarding the creation of documents, either jointly with queries or based on the queries; this is also very interesting.
Yeah, that's amazing, thanks for sharing, and I hope we can link all of these papers in the episode notes. Absolutely, because one of the goals of this podcast is to continue to be an educational resource, not just entertainment. Maybe some people view it as entertainment, in the good sense of the word, when you want to break away a little from your daily routine and listen to some insights, and we heard a lot of insights today from you. Thanks a lot for sharing, Leo, and I wish you all the best in your current projects and your future projects. And I would be equally excited to talk to you again at some point, because it does feel like you have a lot more to say than I'm able to contain in a
single episode. Yeah, it's my pleasure, thanks a lot for inviting me; I enjoyed the podcast, I enjoyed our conversation very much. Thank you very much, Leo, and good luck. Bye-bye! \ No newline at end of file diff --git a/transcripts/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md b/transcripts/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md new file mode 100644 index 0000000..d99c001 --- /dev/null +++ b/transcripts/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md @@ -0,0 +1,127 @@ +--- +description: '

Topics:

00:00 Intro

01:30 Doug’s story in Search

04:55 How Quepid came about

10:57 Relevance as product at Shopify: challenge, process, tools, evaluation

15:36 Search abandonment in Ecommerce

21:30 Rigor in A/B testing

23:53 Turn user intent and content meaning into tokens, not words into tokens

32:11 Use case for vector search in Maps. What about search in other domains?

38:05 Expanding on dense approaches

40:52 Sparse, dense, hybrid anyone?

48:18 Role of HNSW, scalability and new vector databases vs Elasticsearch / Solr dense search

52:12 Doug’s advice to vector database makers

58:19 Learning to Rank: how to start, how to collect data with active learning, what are the ML methods and a mindset

1:12:10 Blending search and recommendation

1:16:08 Search engineer role and key ingredients of managing search projects today

1:20:34 What does a Product Manager do on a Search team?

1:26:50 The magical question of WHY

1:29:08 Doug’s announcements

Show notes:

Doug’s course: https://www.getsphere.com/ml-engineering/ml-powered-search?source=Instructor-Other-070922-vector-pod

Upcoming book: https://www.manning.com/books/ai-powered-search?aaid=1&abid=e47ada24&chan=aips

Doug’s post in Shopify’s blog “Search at Shopify—Range in Data and Engineering is the Future”: https://shopify.engineering/search-at-shopify

Doug’s own blog: https://softwaredoug.com/

Using Bayesian optimization for Elasticsearch relevance: https://www.youtube.com/watch?v=yDcYi-ANJwE&t=1s

Hello LTR: https://github.com/o19s/hello-ltr

Vector Databases: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

Research: Search abandonment has a lasting impact on brand loyalty: https://cloud.google.com/blog/topics/retail/search-abandonment-impacts-retail-sales-brand-loyalty

Quepid: https://quepid.com/

Podcast design: Saurabh Rai [https://twitter.com/srvbhr]

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20221001_071023_bbd8f38e993da204036dc514900a891b.png
pub_date: Sat, 01 Oct 2022 07:32:38 GMT
title: Doug Turnbull - Staff Relevance Engineer, Shopify - Search as a constant experimentation cycle
url: https://rss.com/podcasts/vector-podcast/638830
---

Hello there, the Vector Podcast is here. We are rolling into season two of this podcast. And so today we have a bridge, so to say, from the US to Finland. And I'm super excited to talk to Doug Turnbull, Staff Relevance Engineer at Shopify, who used to be a CTO at OpenSource Connections, the company behind so many tools for us relevance engineers and relevance product managers, as I am today. He's the original creator of Quepid and Splainer, and also of the learning-to-rank repo hello-ltr. Awesome. Great to have you here. Hi, how are you doing? I'm great. Yeah, I'm doing great. Excited to chat about where search is going and the exciting places search is headed and everything. So finally, I get to be on this podcast.
I'm really excited to be here. Yeah, absolutely. Long overdue, and you are the legendary guest. So I'm super excited to talk to you.
And a lot to cover, but before we begin, could you spare a few minutes talking through your background, how you ended up in search? Was it an accident or was it not? It was mostly an accident.
So what happened was, for a long time, the first chapter of my career, the first half, was being a C and C++ developer. And I kind of got really into performance, so optimizing speed in native code. That was a lot of fun.
And I moved down here to Charlottesville in 2012 from the Washington DC area, so I was a couple of hours away from my work. And I found that, you know, at the time especially, being the one remote employee for an in-office company is just a nightmare.
And we had this neighborhood block party, and I decided to wear a nerdy t-shirt just to see, like, oh, maybe I'll meet other developers. And I think the shirt said something like "my code doesn't have any bugs, it just has random features".
And I so happened to run into Eric Pugh, who's the founder of OpenSource Connections, and sort of one thing led to another. I was like, oh, this seems cool. It's a small company. Always wanted to try out consulting and contracting.
And so, yeah, I ended up taking the job and getting more and more into search. Yeah, awesome. And you spent there how long? About eight years. Eight years, yeah, a long time. Yeah, it was a lot of fun. Yeah, and you've done so much.
I mean, literally in my previous job at AlphaSense (I spent ten and a half years there, and in my last half a year I was focusing on learning to rank), I could not find a better resource than the hello-ltr repo on GitHub that you have. Yeah. Yeah.
And it was an amazing journey, because first of all I had to learn it, and on the other hand I had to build what we could call an infrastructure pipeline, a flywheel of success.
So yeah, and then you train the model, you test, and then you validate, and so on and so forth. Validate with the users, maybe an A/B test. It was awesome. I built it entirely on your repo, and I think I even created some PR or issue there. But anyway, I'm sure, yeah.
Yeah, and you contribute a lot; I know you contribute a lot to Quepid too, and I think it's great work, constantly toiling on Quepid and trying to keep it going.
Yeah, and I mean, that's thanks to your curiosity that you created it; you kind of saw the nascent need for it. But also, when I came across it, it was very straightforward to start using it.
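The train-test-validate "flywheel" described just above can be sketched, in spirit, as a tiny pointwise learning-to-rank loop. This is a hypothetical illustration (the features, data, and model are made up), not code from hello-ltr:

```python
import math
import random

# Hypothetical training data: (feature_vector, relevance_label) pairs, e.g.
# features = [BM25 score, title match, recency]; label = 1 if judged relevant.
TRAIN = [
    ([2.4, 1.0, 0.2], 1),
    ([0.3, 0.0, 0.9], 0),
    ([1.8, 1.0, 0.5], 1),
    ([0.1, 0.0, 0.1], 0),
    ([0.9, 0.0, 0.4], 0),
    ([2.9, 1.0, 0.8], 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=200, lr=0.1):
    """Pointwise LTR: fit a logistic model P(relevant | features) with SGD."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    rng = random.Random(0)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the model output
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def score(w, b, x):
    """Rank documents for a query by this probability, highest first."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = train(list(TRAIN))
# Validate: a relevant-looking doc should outscore an irrelevant-looking one.
assert score(w, b, [2.5, 1.0, 0.5]) > score(w, b, [0.2, 0.0, 0.5])
```

In a real pipeline the features would come from the search engine (BM25 scores, field matches, and so on) and the labels from judgments or click data; the final validation step would be an A/B test, as mentioned above.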
And of course, it was also a learning experience.
But now, every time I join, let's say, a new gig (previously Silo AI, with a large client doing web-scale search), I bring it in. I say, there is no other tool that I know of; we should just try this. And I'm trying to bring it to TomTom right now as well. So we have it. Oh, that's awesome.
Yeah. Yeah. Yeah, Quepid has a funny origin story that sort of dovetails with my story. For a long time at OpenSource Connections, and I think this is true of a lot of places in the early 2010s, we would build these beautiful search apps.
And that would be part of our consulting, building these search apps. And they would be beautiful and look pretty, but then only at the very end would someone type in a search, and you would see, like, these results don't make any sense. And then people panic and they want to fix it.
They're about to go to market. They can't release like this. And they realize the search engine isn't some magic black box. It's actually this thing that we have to configure and tune.
And so Quepid actually started because (and there's an old Lucene Revolution video that talks about this) John Berryman, my coworker at the time, and I would go to our client, also in Charlottesville, Silverchair. And we were helping them develop these search applications.
And as they would tune, we'd constantly go back every week and try to fix something, and then we would end up breaking something else. So I finally got kind of tired of it.
And I just sat there and built, at the time, a Python Flask app that was just: let's show these search results and label them as good or bad, so we don't have to keep going backwards on our quality.
And while I was literally creating the app, he was sitting there trying to tune search with our client, Reena Morse, at Silverchair.
So it was kind of hacked together in an hour, and then we started using it. This is so cool.
And I mean, for me, this topic of quality assurance in search is big, I think, right? And maybe undervalued. I'm not sure. Yeah, it is. Yeah, totally.
But you know, at AlphaSense, for example, I had access to people who used to be financial analysts, who deeply understand the content. And it's so important to understand content: brokerage versus, you know, sell side versus buy side.
What is it? What is this? What are people looking for there? And I remember one of the guys on that product team said, well, this is fantastic: now I can explain to you what I need in terms of relevancy without getting into the weeds of your algorithm.
And then you toil away and get there, right? That's fantastic. And I remember at the web-scale search client, we got stuck a little bit optimizing our KPI metrics, one of them being click-through rate.
And I remember when I onboarded Quepid, we generated literally 70 Jira tickets as a result of first annotating, rating the queries, and then analyzing what went wrong there. And probably at least half of those were data-related issues, which you would think, hey, this is Quepid.
This is about relevance and not about data. Oh, yeah, you find that stuff all the time. Yeah.
We kind of had this model at OpenSource Connections that worked well: you come in as a consultant (we consulted on search relevance basically exclusively), and sometimes it's a very data-driven process, and it needs to be, but other times it's just jumping in.
Let's start with 12 queries, and let's label what the good results are, and improve those.
And then we would go through these sprints of, okay, let's take the next 12 queries, let's take the next 12 queries.
And you just constantly, gradually expand the envelope of what you're tuning. It actually worked really well as a practice for improving relevancy without having to spend months; you know, places don't necessarily have months to spend bootstrapping a clickstream pipeline and understanding clicks and all the biases and things like that.
And so it's just a really straightforward way to get started on the problem. Yeah, absolutely. And I don't know if you can imagine this, but when I was a consultant, I had a breather of two months between, you know, that client and TomTom.
So I consulted for startups, one of them in the US. And when you come in, they think you can do magic, and I thought, okay, maybe I can't, but I will not tell you that.
So I came there and I said, hey, how are you doing QA? And they showed me this massive Excel sheet with a colored legend and whatnot. And I said, well, this is cool, but I think it's not repeatable. And they said, yeah, it's a big pain point. I said, let's do something better, something else.
And I introduced Quepid; it took probably a couple of months. Then I lost touch with that startup, as I switched to consulting others. They didn't reach out, so then I reached out and I said, hey, what's the status?
And they said, you will be surprised, but we have moved the whole QA process to Quepid. I was like, wow. Oh, that's awesome. Yeah. That touch and feel of something you created for your use cases working for someone else's case: isn't it an amazing feeling? Yeah, it's great.
Yeah, it's funny how that works. You know, if you solve your own problem really well, there are probably other people out there that are like you, that have the same problem and appreciate that perspective on the problem.
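The judgment-driven workflow described above (label a small batch of queries, tune, and make sure you never "go backwards" on quality) can be sketched roughly like this. The queries, documents, and helper names are made up for illustration; this is not how Quepid is actually implemented:

```python
# Hypothetical judgments collected one small query batch at a time:
# (query, doc_id) -> 1 (good) or 0 (bad), as in the Excel-to-Quepid story above.
judgments = {
    ("running shoes", "doc1"): 1,
    ("running shoes", "doc7"): 0,
    ("running shoes", "doc3"): 1,
    ("rain jacket", "doc2"): 1,
    ("rain jacket", "doc9"): 0,
}

def precision_at_k(query, ranked_doc_ids, k=3):
    """Fraction of the top-k results judged good; unjudged docs count as bad."""
    top = ranked_doc_ids[:k]
    good = sum(judgments.get((query, d), 0) for d in top)
    return good / k

def regression_report(before, after, k=3):
    """Compare two runs (query -> ranked doc ids) so a tuning change that
    helps one query can't silently break another."""
    report = {}
    for query in before:
        old = precision_at_k(query, before[query], k)
        new = precision_at_k(query, after[query], k)
        report[query] = (old, new, "REGRESSED" if new < old else "ok")
    return report

# Search results before and after some hypothetical tuning change.
before = {"running shoes": ["doc1", "doc3", "doc7"],
          "rain jacket": ["doc2", "doc9", "doc4"]}
after = {"running shoes": ["doc7", "doc4", "doc1"],
         "rain jacket": ["doc2", "doc4", "doc9"]}
print(regression_report(before, after))
```

Each sprint then just adds another batch of (query, doc) judgments to the dictionary, gradually expanding the envelope of what is being tuned.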
So, so yeah, I think that's kind of a truism: if you have a need, solve the problem for yourself as the most important audience, and it will sort of naturally find the people who have the exact same problem. Yeah, fantastic.
And now you are at Shopify. Yeah. So how do you structure your work there? This is my "how" part of the podcast as well. The product you're building is relevancy, in many ways, right? And maybe performance of the search engine, because there are trade-offs. Yeah.
So how do you structure the whole process, experimentation, evaluation? Is there anything you can share? I understand there could be some private things you don't want to share. Oh, sure. Yeah. That's okay.
So for context, our team works on the relevancy of all of the Shopify storefronts, so all the little shops out there. And that's a really interesting process, because you can imagine the impact there is very variable per shop.
And of course we don't want to tank someone's sales, but at the same time, if we see something doing well generally, then we want to promote it.
So that part of the process is very different from what I've worked on in the past, where you work on one search engine; you might work on one Shopify store, so to speak.
And at Shopify, the challenge is that there are hundreds of thousands, millions, of little shops that use Shopify search.
And how do you find an algorithm, or algorithms, that support those, that work well for every possible ecommerce use case? In some cases, of course, there's a lot of apparel on Shopify.
There's, you know, all kinds of things; people build all kinds of businesses on Shopify, and Shopify very much wants to support creators. How do people even search, what do they expect, when they search on these stores?
So the good thing is, when I started at Shopify, there was already some amount of data flowing through, in terms of knowing what people are clicking on and such. So I was able to start developing a click model pretty early on.
A click model is something that looks at how users click on results and, in aggregate, gives a search result a probability of relevance for a given query.
And we noticed that people skip over certain products a lot: when they search for shoes, maybe for some reason the shoe search shows socks at the top. We know those socks are probably not relevant, and we know that whatever they're clicking on below that is very likely relevant.
And so at Shopify we were able to start using that as a test set. And then, of course, tooling is very dear to my heart, so one thing we've done at Shopify is build a large toolchain, called Boogie, to do offline experiments using that data. And it does about what you would expect from using something like Quepid: we can take this data and see, did we improve things, did things get worse, with our ideas?
And then of course we release to an A/B test, we look at our normal conversion metrics and that kind of thing, we do a lot of analysis of our A/B tests, and we graduate things to production. So at a super high level, there's nothing there that I think is that different from most places, other than that we have the challenge of so many different shops and things to solve for. I mean, this sounds so fantastic. It's almost like fixing or improving search for the entire e-commerce space, and maybe even beyond.
Yeah, that's part of the challenge and the draw, one of the reasons I'm at Shopify. There are people on Shopify who sell 100,000 products.
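The click-model idea Doug describes (results skipped over on the way to a click are probably irrelevant; clicked ones are probably relevant) can be sketched as a toy cascade-style estimator. This is a simplified illustration with made-up session data, not Shopify's actual model:

```python
from collections import defaultdict

# Hypothetical search sessions: (query, ranked doc_ids shown, clicked doc_ids).
sessions = [
    ("shoes", ["sock1", "shoe1", "shoe2"], {"shoe1"}),
    ("shoes", ["sock1", "shoe1", "shoe2"], {"shoe2"}),
    ("shoes", ["sock1", "shoe1", "shoe2"], {"shoe1", "shoe2"}),
]

def click_model(sessions):
    """Toy cascade-style click model: a result counts as examined if it was
    ranked at or above the lowest click in the session (i.e. the user saw it
    on the way down). P(relevant) is estimated as clicks / examinations."""
    clicks = defaultdict(int)
    examined = defaultdict(int)
    for query, ranking, clicked in sessions:
        if not clicked:
            continue  # without a click we can't tell what was examined
        lowest_click = max(ranking.index(d) for d in clicked)
        for doc in ranking[: lowest_click + 1]:
            examined[(query, doc)] += 1
            if doc in clicked:
                clicks[(query, doc)] += 1
    return {key: clicks[key] / n for key, n in examined.items()}

probs = click_model(sessions)
# The skipped-over socks get low probability; the clicked shoes get high.
assert probs[("shoes", "sock1")] < probs[("shoes", "shoe1")]
```

In aggregate, these probabilities give exactly the kind of test set described above: documents users consistently skip are labeled likely-irrelevant, documents they click likely-relevant. Real click models also correct for position bias and other effects.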
There are people who sell one product, right?
And there are stores that you use that you may not realize are Shopify stores, and then there are stores that are very clearly Shopify, where you see Shopify in the URL or in the footer and that kind of thing. But some, you know, your local shops; I think my local running shop, shout out to Ragged Mountain Running, and then there's a place at the farmers market, and a lot of those places use Shopify. But then there are also larger brands that use Shopify. Yeah, that's amazing.
I've seen you recently posted a link to this paper, was it from Google, saying that search abandonment costs US retail 300 billion dollars annually. So it's a massive, massive opportunity for search companies, consultancies and so on. Totally.
But why do you think this is still the case, regardless of all the efforts of the search community? Is it that the community is too small and there is potential to grow it, add companies and so on? Or is there something fundamental that still needs to be tackled?
I think it reminds me a lot of where the search space is; and not just the search space, but adjacent things like recommendations, or any surface on, let's say, a website or a product or an app that is algorithmic in some way.
It feels like, from how people build products, it's just fundamentally a nascent space. So it reminds me of early in my career, when I was a software developer.
And I worked at a couple of software companies, maybe because I was a C/C++ developer, that really were hardware companies. And people at management levels were used to running hardware companies, but more and more of the value was delivered through software.
And they didn't necessarily understand how to manage software.
With hardware you might have these very upfront, classically waterfall kinds of development processes. With software, in the early 2000s, we learned about agile: it's good to be iterative, it's okay to fail fast, and unlike hardware, with software we can always hit the undo button. So it's a very different practice and a very different style of leadership. And I think the same thing is becoming true of these algorithmic data products, like search. Sure, at the implementation level, at our level, you see a lot of people who really understand the problem and understand that it's very experimental. It's even more experimental than software, where you can ship something and undo it if you need to. It's extremely experimental: every week you're shipping something new, you're always looking at metrics and A/B tests, and every week you might take the product in a completely different direction. Honestly, I think one reason this is a problem is that organizations structured to ship classic software aren't necessarily well suited to ship these data products. I gave a talk at the e-commerce search conference in Charlottesville in April about how, at Shopify, one of the things we do to try to help with this problem is make engineering and data work hand in hand, because in many organizations they're very siloed from each other. And that can be a really big challenge, because you make these decisions day to day. As I'm implementing something in my search, as I'm writing lines of code, do I go to the left or to the right? Do I try boosting this, or do I implement this algorithm? And is that a little bit slower or a little bit faster?
And for those really intricate decisions you kind of need both sides, the data brain and the engineering brain. I can think of only a very small handful of companies, places like Google, maybe Meta/Facebook, that have really mastered this blending of data and engineering. Most other companies, which have finally mastered software engineering, haven't quite gotten over, from leadership and product leadership on down, the hump of how to think about data products: how do we manage things that are experimental, and aren't classic projects that take a couple of months to complete, with a very clear beginning, middle and end?

Yeah, exactly, beautifully put. It's not like a Toyota pipeline, where you can say: this is where we start, we put in all these materials, some people do something, we fix some bugs, and off we go with the car. There is no definition of done, in some sense, right?

Yeah, there's no definition of done; you're just constantly experimenting. It's not something visual, like: oh, we're going to add this button to the UI and it's going to do these things. In some ways you're rarely changing the UI; you're mixing up the search results and how they come up. There may be UI elements to it, like: oh, we understand this query better, so we serve this UI, but it's extremely fuzzy and hard to pin down. One of the biggest challenges I have, and I've had this in consulting and continue to have at Shopify, is how you coach stakeholders to understand what you even do. The plus side is that it's very tied to "we're going to make more money"; on the other hand, it's not as clearly defined.
So it's not a project in the traditional sense; it's a constant cycle of experimentation and optimization.

Yeah, absolutely. And it requires a different discipline and rigor. At TomTom, for example, I work in a search relevancy team. I'm not from the ML side, though I try. But the thing is, I was amazed by the team saying: hey, I'm running this A/B test, and they compute a bunch of metrics, confidence intervals, p-values, and they say: I feel like this is a good change in the algorithm, but it's not proving out; we split our traffic 50/50 and it just doesn't work, so after two weeks we have to kill it. You need to go through that rigor. If in that moment you doubt the data and say: no, I love this change, I'm going to push it forward anyway, you've lost; you just cannot do that, right?

It requires everyone to be a scientist too. Traditional product leadership, and other kinds of leadership, can be very opinion-driven, or have a strong vision. And I think there's still tons of room for that, because at the end of the day you need a strong hypothesis, and often what you're A/B testing sits within the context of a larger strategy, like: we think we'll get traction if we go in this direction. But you have to really bring science to it; everyone has to be a scientist. And an A/B test is often not as simple as a clear winner and loser. Sometimes it's a winner in one dimension and a loser in another dimension, and then: can we go and slice and dice the data to really understand what happened?
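The statistical rigor described here (p-values on a 50/50 traffic split) usually boils down to something like a two-proportion z-test on conversion counts. A minimal sketch under made-up numbers, not any team's actual pipeline:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test comparing conversion rates of a 50/50 A/B split.
    Illustrative sketch of the rigor described above."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF via the error function
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# A "feels better" change: 5.0% -> 5.4% conversion on 10k sessions per arm.
z, p = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=540, n_b=10_000)
# p comes out well above 0.05, so the honest call is exactly the one
# described above: you cannot ship on "I love this change" alone.
```

With more traffic the same lift would eventually reach significance, which is why these tests run for fixed windows (e.g. two weeks) before a kill/ship decision.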
It's often not a cut-and-dried story, and trying to understand the data well enough to even tell the story requires a lot of humility, if you're a leader, to say: what we learned is more important than my big idea being the clear winner so that I get all the credit.

Yeah, exactly, this is amazing. I'm actually rereading your book; I'm a big fan, and once we meet you'll give me the autograph, all right? And I think I actually saw you in person at Lucene Revolution 2013. You were on stage; it was in Dublin. Do you remember being in Dublin?

Yeah, yeah, that was fun. I remember they had this huge rugby stadium.

Yeah, exactly. Coming back to that: I think I talked about Quepid there, a colleague of mine came out and we talked about Quepid. And I got the blessing for Luke from Andrzej Białecki there. I said, would you be okay if I continue, because he didn't have time, and he said: yes, please. And then later it ended up becoming part of Lucene.
Luke even earned a Lucene committership for Tomoko Uchida, who is now driving massive changes in Lucene.

Oh yeah, that's true, I've seen her name a lot.

Fantastic. And in this book, which is why I brought it up, I was just reading one of the first chapters where you so beautifully said that analysis (in Lucene lingo, let's say, how you process the input text) shouldn't map words to tokens; it should map meaning and user intent to tokens. This is amazingly put, and you go on later to explain how you balance precision versus recall as you modify the analysis chain, whether you're stemming or not, and so on. Not many people view it that way. I didn't view it that way; I was always like: what should I tweak to make it work? But there is a related topic on this front: query and content understanding. How do these connect in your mind?

Yeah. First of all, I think it's funny how the work we do shapes our perspective on things. Writing that book was the early part of my relevance career at OpenSource Connections, and this still happens: you get brought in to a client and it's like, okay, we have this app over here, we have this indexing pipeline, you just work within this box that is the search engine. So I became quite adept at hacking the analyzers and the query and everything to do all the crazy things I wanted to do. Could I take in a taxonomy and map to a conceptual understanding of the language, not just the words themselves? People think about analyzers and they think about stemming and lowercasing, but more and more it was: I can only work within this box that is the search engine, whether through plugins or whatever, and
how can I massage the text coming in and the queries coming in so that they map to each other? In that context, you may have heard of Conway's law, which says you end up shipping your org chart: how you structure your projects is very much tied to the organizational structure of how you do things. So the consultant, or the relevance team, really only works in the box that is the search engine, and makes the magic more magical. When I think about that, it's often similar to how people think about relational databases: you're creating the structure of a database to answer certain questions. In the same way, using analysis and how you create fields, you're structuring an index, a view of some documents, to answer the natural language queries that come in. Everything is about massaging this database to rank results in a way that gets closer to the questions users are asking. A really concrete example of that, actually one of my earliest projects: say you're indexing some medical knowledge, questions or medical articles. There are taxonomies out there, MeSH (Medical Subject Headings) is one, that say: okay, this article is about something in the cardiovascular system, it has to do with the heart, and it has to do with the left ventricle. That's a taxonomy, a hierarchy. If I index that taxonomy a certain way, and when I take a query I also map the query to something in that taxonomy, say cardiovascular system, heart, left ventricle, then I can engineer the similarity in the search engine so that it uses the analysis to see that a document has
so many similar taxonomy nodes, which makes it more relevant, and maybe one or two dissimilar ones, which makes it a little less relevant. If I can zero in on that, then I'm really getting closer to meaning than to whether it's a stemmed version of this word or not. And you can create tokenization pipelines that take terms like, say, "myocardial infarction", which is a heart attack, and use synonyms and other things to say: this is actually this part of this taxonomy, and therefore we expand it to these taxonomy tokens, and the same thing at index time. So I got very adept at massaging data in that way. But when you take a step back: if you have access to the full indexing pipeline, as most teams do, and you have access to the full query API and everything, you're really doing the same exact thing. You're massaging content as it comes in, and in some ways you have more tools if you can do it before it gets to the search engine. The same thing with queries: you might have some ability to apply an NLP model or do some kind of entity recognition before the query comes in. So philosophically you're doing the same thing. On one side you're mapping documents to queries, and on the other side you're mapping queries to the document structure, and you're trying to map those two together in a way that creates a ranking function that does what you want it to do.

Yeah, absolutely. I think it was Daniel Tunkelang who summarized his 20 years of experience as comparing sets of documents: is this set of documents better than the other one? And then everything else, query understanding, content understanding, whatever, comes in as input.

Yeah, it's amazing, absolutely. All of these things come together, and the search engine is kind of the core driver,
and you're trying to massage this similarity engine to make that quote-unquote cosine similarity what you want it to be.

Yeah. I recently ran across one case. In map search, you could ask: what do people type? Well, they type addresses, they type coordinates, and they also type questions, like "where do I go hiking here, in this area?" (not something we can handle right now, but maybe in the future we will). And the case was a search with a company name. We support points-of-interest search, POI. You have a meaningful part of the company name, I don't remember, something like "white mice", and then less meaningful parts, like "Limited" or "South Africa", things that repeat across a number of company names. And our search engine, because of the minimum_should_match feature, actually focused on the less meaningful components, and bumped some overlaps higher just because of how TF-IDF works. By the way, a lot of work goes into understanding why this TF equals this number, and then I need to figure out the IDF, right? So I went on Twitter and tweeted that I'd come across another use case where maybe vector search could help, because it would hopefully focus on the meaningful part: you have the attention mechanism in transformer models like BERT, so presumably it would focus only on the right part and find it. Do you think... maybe that was a moment of despair. Do you believe in this?
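The TF-IDF behavior in this anecdote comes down to IDF weighting terms purely by corpus rarity. A toy sketch with a made-up company-name corpus and a simplified IDF formula (not Lucene's exact one):

```python
import math

# Hypothetical company-name corpus, invented for illustration.
corpus = [
    "white mice trading limited south africa",
    "acme holdings limited south africa",
    "blue sky ventures limited south africa",
    "white goods repair",
]

def idf(term):
    """Simplified IDF: only rarity in the corpus matters, not whether
    the term carries the query's actual intent."""
    df = sum(term in doc.split() for doc in corpus)  # document frequency
    return math.log(len(corpus) / (1 + df))

# Here "mice" (the meaningful part) outweighs "limited" (boilerplate), as
# you'd hope; the anecdote's failure mode appears when corpus statistics
# and minimum_should_match make the boilerplate tokens decisive instead.
```

Since "limited" appears in three of four documents its IDF collapses toward zero, while the rare "mice" keeps a high weight; the failure the speaker describes happens when those statistics don't line up with what's actually meaningful in the name.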
I mean, I do believe in it to some extent; I totally believe that's a valid thing. I also think that sometimes document frequency itself is really interesting, because it gets at the idea of specificity in the query. But document frequency is sometimes a poor measure of specificity: just because something is rare in the corpus doesn't necessarily mean it's more specific to the user's intent. Thinking of a case like that: we did a project for O'Reilly Media, to help with something similar to Safari Books Online, which people might be familiar with. If people search "JavaScript books", it just so happens, given how titles are written, that if you write a book on React you're not going to put "JavaScript" in the title, but React is conceptually about JavaScript. So what's really interesting is: you type "JavaScript books", and a great React book might be a great JavaScript book, but you have to understand React in the context of this broader concept of JavaScript, even though that exact term isn't in the title. So this concept of term specificity is really useful, but the way we get at it with document frequency can be not great. And to your point about the attention mechanism, that's really interesting, because I can see conceptually how that can let you zero in on the concepts that are most important to a document. One of the reasons I think BERT is so transformative: traditionally, for years and years, even going back to the early 2000s with latent semantic analysis and those kinds of things,
and then eventually word2vec, these techniques are really great for, in some ways, increasing recall, or getting at a rough semantic sense of what's there. But at the end of the day, that's not helping me get at the higher-precision component of search that traditional search engines thrive at and are still really good at: you know this is a shoe, I don't need to see socks, just show me the shoes. You don't have that fuzziness you get in a dense vector representation, where everything is compressed down and fuzzy. But what BERT and those kinds of models really do with the attention mechanism turns that on its head: there are parts we can get at with the precision of these related concepts. We know the most important part of this document is the part that talks about JavaScript, the JavaScript-iness of it, and when we search for that, we can zero in on that dimension of it, as opposed to a fuzzy concept of programming languages and JavaScript, if that makes sense. It's zeroing in on what makes this thing precisely interesting, as opposed to traditional dense vector representations, which have been fuzzier, casting a wide net, more focused on recall.

Yeah, exactly. You wrote a nice paper, "What problem does BERT hope to solve for search?", at the end of 2019, and you compared the inverted-index sparse search method with word2vec really well there. You basically allude to the fact that BERT probably gets the aboutness of the document better than word2vec or TF-IDF, because with word2vec you essentially have a window that you slide
through the text. To your earlier example: if this React book never happens to mention React near "JavaScript", because everyone knows it's JavaScript, then you will never find it using word2vec; it will be too distant. But BERT tries to embed the whole document, or chunks of it, averaged and so on, so it might.

Yeah. And if you were to use word2vec, you'd have to implement your own attention mechanism, in a way. You'd be like: okay, which parts of the document matter? First I've got to throw out a bunch of front matter and end matter and junk, and with word2vec you'd have to somehow engineer: we'll look at these paragraphs, maybe focus more on these ones and throw away some others. The aboutness gets really blended. Whereas the amazing thing about BERT is its ability to really zero in on the aboutness: it's not just that the document has an embedding, each token position has an embedding. So if I take a question, I can really zero in on: this is the part of this article that is most similar to it. You've still got challenges with the fuzziness of dense vectors, and it's maybe not precisely the words you're looking for, but just the fact that each token position of a book might be an embedding is mind-boggling. It can be a beast to manage and deal with, but it's a really powerful concept.

Yeah, absolutely. Plus it's a masked model, so it can predict what the token at a masked-out position should be, and then it can actually predict entire sentences; I think that was one of the side effects, so it could become generative.

Yeah, totally, exactly.

So it's pretty amazing. And so today, as we roll into
this trend of sparse versus dense: I think a lot of discussion is still going on around how dense retrieval will enter this sparse search world at larger scale. How do you feel about this? And of course there is hybrid search as well; it's a hot space.

Yeah. I know there are a lot of open source projects, there's Milvus, there are companies like Pinecone, there's Qdrant, all of these systems doing dense vector retrieval. And it's also just a fun problem, if you've been in search for a while, to think about approximate nearest neighbors and how you solve that. I know for a long time it's been a side project for a lot of people; I know you, Dmitry, and Max had a lot of fun in the billion-scale vector challenge. The first thing to ask is: why do we need these extra databases? It's interesting to think about, because we just talked about how we can map tokens to meaning and that kind of thing; why can't we apply the same techniques to the dense world, why can't we use a traditional search engine? If you think about it, the data structures underneath them are optimized very differently. Yes, in both cases you're mapping query meaning to document meaning; fundamentally the task is the same. But the data structures you would use for a dense system, where everything is clustered into maybe 256 or 768 or however many dimensions, are very different from a sparse index, where, with something like a traditional Lucene index, the dimensionality is way, way higher. You could expect hundreds of thousands of words, and each word is its own dimension. And if you think about that, you're going to have situations where word frequency
follows Zipf's law: "the" occurs in every document, then frequency falls right off a cliff, "cat" occurs in 1% of documents, and as you keep going you get specific terminology, "feline" occurs in 0.1%, and it really falls off a cliff. Sparse vector indices are really optimized for that use case: I have a term, it basically points at a small handful of documents that contain that term, and I can do that lookup very quickly, fetch those, score those, and get a score. What's interesting about the dense vector case: with a sparse vector, I go in with a single term, or maybe two or three terms, I look up in this giant data structure, I get a handle to all the things that match, and I can sort those and get them back. With a dense vector, the query isn't two or three terms; it's some value in each dimension of a 256-or-more-dimensional vector. So right off the bat, 256 terms would be a large query in a traditional search engine. And really, you're looking up in an index that itself has much smaller dimensionality, where every document has some amount of value in each dimension. It's not like "cat", which occurs in three things; all billion documents have, if "cat" were one of the dimensions, some level of cat-ness. If you just think about how you would build a data structure for that, it would be a very different thing. That's why people build these completely different data structures, and why approximate nearest neighbors on large-scale data is so important: you do want some sense of similar conceptual meaning in this compressed vector space. But that in and of itself gets at the
pros and cons of each. With this compressed representation, you don't specifically have the word "cat", or even direct synonyms of cat that you've created; you have a rough statistical sense of cat-ness or animal-ness that you're clustering together. By compressing to smaller dimensions you've lost some precision, by definition, but you've expanded the net of what you might bring in. So that's the pro and the con of the dense vector representation, whereas continuing to use a sparse vector representation is a much more precisely managed lookup. So there's not some future where you throw away one or the other. More and more, the reality is hybrid retrieval, where you're using both data structures to serve search results, mixing both perspectives: expanding the meaning to maybe mean other things, or staying in the more precise world of sparse meaning.

Yeah, it's amazing how you put it; it struck me how simply you can explain complex things. On sparse vectors: you said hundreds of thousands of terms in your term dictionary. When I worked at AlphaSense, I once counted that we had a billion, because you feed in millions and millions of documents, and they vary quite a lot. Of course there is some overlap, financial and legal terms like "revenue" occur everywhere, but since they describe different verticals of the industry, healthcare versus pure finance, banking, investment firms, they have different lingo. And that's amazing, the way you put it: if before I had a vector of, say, a billion dimensions, now I have much less, 768 dimensions, maybe 1024. I heard recently that one committer on Elasticsearch is trying to push to upgrade
to 2048 or something.

Oh wow, I didn't see that, that's amazing.

I think it was Mayya Sharipova. And this is amazing, but I guess you're right. And there is another thing that comes to mind. We had a podcast with Yury Malkov, the creator of one of the most popular ANN algorithms, HNSW, the Hierarchical Navigable Small World graph. I asked him this question: say you have this geometric similarity search, and in the case of e-commerce you also want filters, so you want to say: I want these shoes, in this size, in stock, and so on. Surprisingly, he said these contradict each other so much that he cannot even imagine creating a generic algorithm that covers this case, because essentially it could quickly degrade to traversing the whole space of points. Because, as you said, besides aboutness as a dimension, you also have these filters as dimensions: you could say, cluster these points on color, cluster these points on size; imagine doing all of that up front. It's not a generic solution. He was blunt and said: this is not possible, I don't see how you could do this. And yet the vector databases claim that they have done it, at scale. I sense there is some truth he... maybe there are edge cases where it doesn't work, or maybe it goes over a second and that's acceptable, I don't know. How do you feel?

Yeah, I mean, it feels a bit like overcomplicating a solved problem. I suspect we'll be in a world of more hybrid retrieval, where you're using a traditional filter for those kinds of things. I feel like dense retrieval is the missing piece of most people's search systems, if for nothing else then as a first pass. For a long time it was the case that people
would do a first pass with something like BM25 scoring and then maybe apply a learning-to-rank algorithm. I feel like that's going to flip, or could flip: first I'm going to get this dense vector candidate list, because it's compressed, it's actually more recall-oriented, and then I'm going to use sparse vector techniques to filter and re-rank, these kinds of things. But I also think, just for speed and ops, there's a practical concern: Solr and Elasticsearch have such huge install bases, and a huge base of practitioners who know how to scale them. With the new dense vector techniques, I'm not sure people are going to completely throw out their Elasticsearch or Solr installs just to get this new functionality. In fact, as Elasticsearch and Solr adopt more of these things, I think more and more people will say: oh, that's cool, I'm going to use this in addition to my Elasticsearch or Solr. So I feel like we'll end up in a world where, yeah, Elasticsearch and Solr don't give us as nice a set of features for that, but they already work pretty well for 80% of what we do anyway, and we just want to tack on this extra bit. That's my expectation of the future, more than that we'll throw out the existing systems and adopt something new.

This is a very interesting opinion. Of course, I'm not downplaying the players you mentioned, and we haven't even talked about the seven or so vector databases that exist today, or neural frameworks like Jina and Haystack that sit on top of these databases. But I agree: I think the future might be in flexibility. If I'm already on Solr, why don't I just use the ANN plugin and try my luck, wet my toes, so to say? I don't want to jump to an entirely new world, a new database that I don't have experience with. But if you
haven't had that setup, if you're a startup: I know some startups, when they want to go in that direction with neural search, do consider Vespa or Weaviate or Pinecone, and that might be a different use case as well. By the way, this is an entirely new big topic, but it's not only search, right? Search is still being figured out, and some companies do it, but then you also have machine learning pipelines, like recommender systems.

Oh, totally, yeah. That's a great use case. For search, of course, you have this huge install base, especially for established companies like Shopify or whoever else. But you're absolutely right, there is a lot of opportunity for this in so many places: pure question-answering applications, or places where you're using it as a backend component to do some kind of inference for recommendation systems. I think it's a fantastic thing, and I do think that more and more a bigger practice will evolve around scaling these things out and understanding them from an operational perspective. So yeah, I definitely think it's an interesting landscape to watch. And the other counter, even to what I said: Solr and Elasticsearch are established for their use cases, but this world I was describing before, of building these data-driven or personalization-driven surfaces, is just wide open. I look at my phone; it has limited screen real estate; it needs to show me something relevant to me. Take Peloton, for example, the fitness app: I want to do a workout, I go to the app, and it does have navigation, but it's also just trying to show me something on the screen that's going to be relevant to me, so I engage with it. Or Netflix, or all of the UIs we use these days: they're not really
point-and-click; they're driven by some kind of smart algorithm, and it's not necessarily a classic search use case where it's point, click, filter, then search with relevance. So I do think it's a wide-open space, and honestly an under-appreciated one. In some ways, and I'm just thinking of this now, speaking completely off the cuff: if I were to advise a Pinecone or somebody, I might say, stop talking about yourself as a vector database and start talking about all of these use cases. In my book I talk about relevance-oriented applications, or, I forget, maybe relevance-driven enterprises, and I think a lot of these applications really are relevance-oriented applications. Whether it's personalization or recommendations, they're about ranking something for a user, for a given context, or maybe for a given query or question. I would focus on that universe of stuff, because I don't know if anyone's really speaking from a product perspective about what the engine is that drives that, and I think that could be a really exciting product, or open source space, or whatever it might begin as.
This is great advice; I might quote you in the upcoming keynote I need to deliver at Haystack Berlin, because this is a great thought. In many ways, when people come to me and ask what the difference is between vector and neural search, I say there's not much difference: "vector" is probably the mathematical stance, and "neural" is more how a deep learning engineer or researcher likes to take it, from that angle. But you put it so beautifully: maybe we focus too much on this technical level, saying, this is vector search, and yeah, it's all sexy, you need to buy it, and we don't focus enough on the use cases and on how these things complement each other. It's not like vector search is trying to kick out sparse search; that's not going to happen. By the way, phrase search is not supported in vector search; maybe it will be, but not right now. There's also a set of problems where, to this day, people use tree-based models: I'm going to mix some kind of similarity with some kind of statistic about my data. Tabular data, so to speak, has consistently been dominated by tree-based models, which is a completely different thing from deep learning and neural search. And those things integrate pretty well with the learning-to-rank plugins in Solr and Elasticsearch, where you can plug a vector similarity into that kind of tree-based model, but the opposite isn't necessarily true. This is very interesting. So, we spoke a lot, and I'm sure we could speak more, about how to engineer a search engine. Let's say you're a startup: you don't have clicks, you don't have feedback from your users, maybe not in that form, but you can still engineer. Now you have dense search, and you can still engineer by tweaking analysis chains and crafting
synonym dictionaries. But once you're past that launch and you've gathered data, the natural move is to start looking into something like learning to rank, and you've spoken a lot about that. I was just quickly googling: you spoke, I believe, at Berlin Buzzwords, at Haystack, somewhere in the San Francisco Bay Area, about how to turn ranking into a machine learning problem. We also spoke about Bayesian methods, and then there's also, I recently learned (well, not that recently, I think it was last Haystack or maybe the one before), learning to boost. How do these methods come together? Where do you start, for those who have only heard about this but haven't tried it yet? And maybe a connected question: do you think we will marry the dense retrieval signals with learning to rank in some way? Does that make sense? Yeah. So, I think a lot of companies go into the learning-to-rank process thinking it's easy: I have a model, I'm going to train this model with my clicks, and, knock on wood, search will magically get better. What's interesting, if you go back to the Haystack talks about learning to rank, and talks at other conferences, is that the number one place people get stuck and spend their energy is the training data, much less so the model, the features, and all of that. And if you think about it, it makes sense, because one of the biggest problems with the training data is that it's horribly biased towards whatever the old algorithm is. People are always clicking on what the old algorithm showed them, regardless of whether it was good or not; it's getting some clicks, while the stuff that might be amazing but sits on the tenth page never gets clicks. So how do you optimize search in that context? It sort of doesn't matter what model or features you use until
you get really well-labeled training data; until then you can't make much progress. So, what you can do to get started is, at a minimum: okay, we know the training data is not perfect, but if we just look at the top end, the top ten or so, what's actually getting clicks, we might be able to start to learn what differentiates them. There are many ways of doing learning to rank, but if we start to think about it as a classification problem within the context of the results we do have click data on, asking what differentiates things that are being seen with a lot of impressions and no clicks from things with lots of clicks, you start to see the features that separate those. Then at least you know, within the context of your search filter bubble, what differentiates relevant from irrelevant, and you can kind of use that to rank. But at some point you do need to realize: I am working within this filter bubble of my original algorithm, and all I'm doing is tweaking a few things up and a few things down. How do I bust that filter bubble and get different kinds of potentially relevant results in front of users, to see whether or not they'll click? And that's really probably the big challenge people have, honestly not just with learning to rank, but with any algorithmic search work they're doing. Yeah, absolutely. When I was doing it, using your Hello-LTR repo, I think I focused a lot on the mechanical aspects: okay, what is pairwise, what is listwise, I need to read the LambdaMART papers to get into the weeds of the algorithms. But I think I spent maybe too little time figuring out the data part, like head versus tail, and I think some people say "torso" as well.
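[Editor's note: a toy illustration of the idea above: treating "what got clicked vs. what was merely seen" as a classification problem over ranking features, inside the current filter bubble. The feature values, click labels, learning rate, and epoch count are all invented; a real setup would use a proper LTR library and far more data.]

```python
import math

# Hypothetical click log for one query's top results: each row is
# (features, clicked) with features = [title_match, recency]. All invented.
rows = [
    ([1.0, 0.2], 1),  # strong title match, clicked
    ([0.9, 0.1], 1),
    ([0.8, 0.9], 1),
    ([0.1, 0.8], 0),  # lots of impressions, no clicks
    ([0.2, 0.7], 0),
    ([0.1, 0.1], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny logistic regression via stochastic gradient descent: learn which
# features separate clicked documents from merely-seen ones.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    for x, y in rows:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = p - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err

score = lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
# title_match ends up with a much larger weight than recency,
# i.e. it is the feature that separates clicked from unclicked here.
print(round(score([0.95, 0.5]), 2))  # high score: likely relevant in this bubble
```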
And I'm like, torso? What's that? So, do you have any advice for those who are starting out? Do they just need to be born data scientists, or is there a toolset, a methodology, a book? Yeah, it helps, I guess. So, this is a big focus of AI-Powered Search, a book I helped write with Trey Grainger and Max Irwin, and then ML-Powered Search, which is the training I'm doing. Because I think a lot of the focus these days is on cool things like dense vector retrieval, BERT, and these kinds of things, and to me that's like taking out an old model and putting in a new model, but the problems outside of it, getting the training data, are still really hard. There are a lot of techniques people can use, and the thing people aren't talking about enough in this space is active learning and reinforcement learning. That's what I talk about a lot in my book and my training: this idea of, how do we strategically explore new potentially relevant search results for a query, while still exploiting the knowledge we do have? Because we don't want to just show people completely random results; how do we play with that boundary a little, in a strategic way, and not just: here's a bunch of results, oh, and here's a random one. There are processes out there to do that, and one that's very near and dear to my heart, a very practical thing for people to learn about, is what's called a Gaussian process. A Gaussian process is just a different kind of regression, so it's the same thing we're doing in learning to rank: given a bunch of features, the title BM25, maybe some dense vector similarity, or the popularity of the document, we're learning from our data what function of those features predicts
relevance and what doesn't. But what's interesting about a Gaussian process is that at any given point it knows how certain it is of its prediction. Obviously, at points that are in your training set it's going to have high certainty: the variance (that's where the "Gaussian" comes in, the Gaussian distribution at that point) is very small; it's very certain. When you go a little farther out from a point you have information about, it might try to connect the dots between that observation and maybe one down here, but the uncertainty grows and grows as you move away from existing observations. And that's interesting, because this model is both predicting relevance for arbitrary points in the feature space (it can do that because it can see patterns, like, there seems to be a cluster of training data over here where things in this realm are more relevant than things over there on the bottom left) and, in between those data points, where the uncertainty really lies, it can say: I think we should probe here. I think we should select this document to show the user to get more information, something we feel, with a reasonable degree of confidence, is probably relevant, but we're not entirely sure, because we haven't exactly observed it yet. And that's really where you can explore both the training data and the feature space. So if you introduce a new feature into learning to rank, you could say: let me try different combinations of this, and then, as the general pattern emerges, you can try things in between to really understand how that feature interacts with how users are interacting with the data. That's really why it's active learning: yes, you're training a model, but the model itself can know its own gaps, so that you can imagine,
as you're serving search results, going to this model and saying: I don't only want what you're most certain about, I also want, strategically, where you want to explore. And you can show those results to users too, and start to gather clicks and information on that. To me, that's a really exciting field for where search and information retrieval and all of these fuzzy relevance interfaces could go and do a lot of amazing work. Yeah, it sounds fantastic, like a mathematically driven wave expanding your click base, right? And it still sounds very experimental to me, because nothing is given; from what you explained, you have to design it correctly. It's still an experiment. Would you run an A/B test? Is that how you would do it? Your model is essentially a reflection of the data choices you made, and you explained a Gaussian model to do that, and then you run an A/B test. Is that right? Yeah, you could set it up in lots of different ways. Let's say a classic A/B test is: if I search for "shoe", I get ranking A or ranking B. There are actually a lot of creative ways you can do it, but a classic way might be to say: in the third slot, I'm going to put the explore item. In every other slot I might have my traditional LTR model ranking results, using LambdaMART or SVMrank or any of these traditional learning-to-rank models (or not even learning to rank, some other solution that works well with your features), and then you slot in that third result that's going to explore a bit, that's going to be different; or as many explore results as you want. Another, completely different option is to use it to drive other surfaces around the search results; it's not like we show people just ten results anymore.
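[Editor's note: a from-scratch sketch of the Gaussian-process idea Doug describes: a regression that reports both a predicted relevance and its own uncertainty, so an "explore" result can be picked by upper confidence bound. The kernel, observations, and candidate points are invented, and the linear solver is a toy; a real system would use a GP library.]

```python
import math

def rbf(a, b, ls=1.0):
    # Radial basis function kernel: similarity decays with distance.
    return math.exp(-((a - b) ** 2) / (2 * ls * ls))

def solve(A, y):
    # Gauss-Jordan elimination with partial pivoting (tiny systems only).
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_posterior(xs, ys, x, noise=1e-6):
    # Posterior mean and variance of a GP at x, given observations (xs, ys).
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    kstar = [rbf(a, x) for a in xs]
    alpha = solve(K, ys)   # K^{-1} y
    v = solve(K, kstar)    # K^{-1} k*
    mean = sum(k * a for k, a in zip(kstar, alpha))
    var = rbf(x, x) - sum(k * vi for k, vi in zip(kstar, v))
    return mean, max(var, 0.0)

# Invented (feature value, observed relevance) pairs.
xs, ys = [0.0, 1.0, 3.0], [0.2, 0.8, 0.3]

# Upper confidence bound: probe where predicted relevance plus
# uncertainty peaks, balancing exploitation against exploration.
candidates = [0.5, 2.0, 3.0]
def ucb(x):
    m, v = gp_posterior(xs, ys, x)
    return m + 2.0 * math.sqrt(v)
best = max(candidates, key=ucb)
print(best)  # the candidate worth showing a user to learn the most
```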
We give people lots of different UI widgets: off to the right you might have something that looks a bit more like an ad, or suggestions; in product search you might have products similar to those you searched for, different kinds of prompts. And you might explore in those spaces too, to get more click data and traffic, to explore what might be relevant. So it depends a lot on how you want to drive your UI in your specific use case and what might be appropriate. And, help me understand, this is different from click models, right? Because we also have the click bias problem, and we could redistribute the click weight towards those unseen items. Is this different? So, yes, it's different. What I'm describing is step two of a process. Step one, before you even get here, is that you don't just want to take raw clicks. If you search for "shoe" and you notice something gets a certain click-through rate, you don't necessarily want to take that raw number of clicks, because even among the things you're showing users there is something called position bias: people scan top to bottom, and they're just going to click on the first result more than the second result. There are lots of reasons for that; even when they notice both, they might say, oh, this algorithm must know what it's doing. People scan top to bottom, and for various reasons they will click the first result more than the second, and so on. It's an interesting phenomenon of psychology, how people process even the search results that are shown to them. Yeah, exactly. And by the way, just as you explained this, it occurred to me: have you noticed how the interfaces have changed?
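[Editor's note: a small sketch of the "step one" above: debiasing raw clicks for position bias with inverse propensity weighting. The examination probabilities per rank and the click log are invented; real propensities have to be estimated, for example from randomization experiments.]

```python
# Hypothetical examination probabilities per rank (a position-bias curve):
# we assume users always see rank 1, see rank 2 ~60% of the time, etc.
propensity = {1: 1.0, 2: 0.6, 3: 0.4}

# Click log: (doc_id, rank_shown, clicked). All invented.
log = [
    ("a", 1, True), ("a", 1, True), ("a", 1, False),
    ("b", 2, True), ("b", 2, False), ("b", 2, True),
    ("c", 3, True), ("c", 3, False), ("c", 3, False),
]

def debiased_ctr(doc_id):
    # Inverse propensity weighting: a click at a low rank counts for more,
    # because fewer users ever examined that slot. The result is a relative
    # relevance signal, not a true rate (it can exceed 1.0).
    weighted_clicks = sum(1.0 / propensity[r] for d, r, c in log
                          if d == doc_id and c)
    impressions = sum(1 for d, r, c in log if d == doc_id)
    return weighted_clicks / impressions

for d in ("a", "b", "c"):
    # doc c's clicks at rank 3 now count for more than its raw CTR suggests
    print(d, round(debiased_ctr(d), 2))
```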
You go on YouTube to watch Shorts, and there's no way to search them, right? You just click, and you watch and watch. I think at that point, first of all, there's no position bias, you don't know what's next, but I think the goal is also more entertainment; it's not that I have an information need and I'm actually searching for something. But I guess sometimes, and I think you also spoke about this, search blends with recommender systems, because the user might not know what they're looking for. Sometimes they do, sometimes they don't: it's explorative search, which means it could become a recommender system, which means you could plug in those explorative results. Exactly, and that blending can be very interesting. It can also be challenging, because search is a very intentional activity. If you do something in, let's say, a dense vector representation, there are relationships that hold in a general sense (when you train on Wikipedia, it makes sense that these things go together), but maybe in a specific domain, a specific profession, there's jargon, and it turns out those don't go together; people will notice, and they'll complain about these things. A sort of domain-independent example of this that you sometimes see is that things that are opposites actually co-occur: "I want to cancel my reservation" and "I want to confirm my reservation" sometimes co-occur with the same kinds of words. In some retrieval situations you might be able to get away with that, and in a recommendations context people are like, yeah, whatever, but when I'm searching, it's: how dare you not understand me? People almost get offended by it, because it's almost like going to a person at a store, asking a question, and being given the exact opposite or something. Yeah, exactly. I think my wife was recently doing
a search in one of the grocery apps (everything gets delivered home today, even in Europe), and she was searching for oil, and she was saying: hey, your vector search research could probably be applied here. The top result was tuna fish, and she was like, why? Maybe because oil is one of the ingredients, the tuna is in oil, right? But she was looking for a category of things. Or take breads: she searched for breads and was getting yogurts. So I think that's probably a negative example of explorative search, or maybe not, I'm not sure, but it's puzzling to the user not to see breads on the page and to see yogurts instead. Yeah, exactly. That's actually a good example of a traditional search engine kind of doing that: it's "oil", but it's tuna in oil, whereas maybe a dense vector search might actually work better, until you get motor oil. So both sides have to be tuned carefully, because search is really one of those things. And I think the Google article we talked about a while ago actually speaks to this, not just in terms of lost revenue but in terms of brand retention, because people will not come back to your store if the search doesn't get them, if it seems dumb. People notice when search is not understanding them. Yeah, a 300-billion-dollar opportunity for everyone out there. So, in this maze of things, learning to rank, dense retrieval, we still also need to be concerned with how we manage this as a project, right? There are a lot of ideas and thoughts here, but I'm particularly interested in the search engineer role transcending itself into something else. For example, it used to be: I was a Solr relevance search engineer, I guess, a few years ago, and I was just reading these XML files and tweaking and tweaking, and then, you
know, the indexing and search pipeline and so on. But today, you mentioned data science came into play, and it's still being integrated. What other aspects do we need to think about? How should we form search teams? I believe you have a blog post on that as well; we'll cite it in the show notes. Oh yeah, and I know the Shopify engineering blog covers this, and Eric Pugh at OpenSource Connections talks about it a lot too. I can't say I have all the answers, because you're right, it's a brand-new space, but I think it's an interesting thing to talk about. There are two principles I think about when I think about a search team, and you have to do both; it's like building a plane while you're flying it. I remember at OpenSource Connections we would sometimes get projects that were almost too infrastructure-focused, and then other projects that were too focused on only building the experiments and solutions. What I mean by that is: sometimes the infrastructure folks' idea of experiments is more like, we're going to spend nine months gathering clicks and processing them and trying to understand what's relevant, before ever touching a model or tuning relevance anywhere. And at the other end of the spectrum you have teams that are like: we're not even going to try to understand what's relevant, just tune things, YOLO, ship it, and hopefully things look good. And honestly, both of those are anti-patterns, because in the case where we just study the problem, we never actually deliver anything, and not just as a consultant but as a practitioner working on a team, your stakeholders are going to lose patience and you're not going to have much success. On the other hand, I had one project where we spent months and months developing experiments; they did have the
ability to A/B test, but we didn't really have any ability to understand, or dig beneath, what was happening at a query level or anything. We just spent months and months experimenting, threw a dozen experiments at the wall as A/B tests, and none of them turned out to matter; I suspect in that case it turned out to be a performance issue or a UX issue that was actually more of a problem. Really, what you have to be doing in this relevance space is shipping experiments all the time, with whatever infrastructure you have to support them, while simultaneously improving the engine you're using to understand the quality of relevance. As an example, you might start out with Quepid and start shipping things incrementally, getting people's feedback, as imperfect as it might be, knowing it's wrong, and start shipping changes. But at the same time as you're doing that with your right hand, with your left hand you're going: oh, we have to start gathering click data, because eventually the Quepid experimentation might hit diminishing returns, or run into really subtle cases where people aren't going to be able to easily tell me the difference. And if you're not doing both, you're really going to get yourself into trouble. Yeah, and it's amazing: you really described it not as an individual-level experience but as a team-level experience, right? Now everyone can figure out: add the data scientist, add the UX person, add the product manager, add the search engineers, and work together in one single concert. Yeah, totally. And that's a tricky thing to build, because, yes, you do need all those roles, and you need a tremendous amount of data literacy. You probably need a good strong core of engineering and data working together, so that's probably a good place to start.
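[Editor's note: Quepid manages human relevance judgments; below is a minimal sketch of the kind of offline metric (NDCG) that such judgment lists feed, so incremental changes can be compared before click data exists. The 0-3 grades and the two rankings are invented.]

```python
import math

def dcg(gains):
    # Discounted cumulative gain over a ranked list of graded judgments:
    # relevant documents count for less the further down they appear.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains):
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal else 0.0

# Human judgments (0-3 scale, invented) for the documents each ranking
# returned, listed in the order it returned them:
before = [1, 3, 0, 2]   # current algorithm
after  = [3, 2, 1, 0]   # experimental change
print(round(ndcg(before), 3), round(ndcg(after), 3))
```

A higher NDCG for `after` would justify shipping the change incrementally, even while click gathering is still being built out.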
But as you eventually add someone like a product manager: what does a product manager on a search team do? That's a really interesting question, because I think it's quite different from building other features. A product manager on a search team is constantly looking at data, let's say at the query level (it doesn't have to be the query level; it could be a user or whatever), trying to say: here's a cluster of problems we have, or opportunities. Maybe it's a certain kind of search: searches for colors in products, or searches using this type of terminology. They have to have the ability to constantly do the analysis of that data, advocate for the data they need to get implemented, and then, working with their data and engineering team, understand the experiments: let's think about half a dozen experiments that could treat this problem, prioritize and triage them in terms of the reward/effort trade-off, and really plan out how we do those experiments. And when you do that planning, it's not just about the nuts and bolts of how we get this experiment into production (we built this pipeline, we do these things); it's also about how we will measure and answer the questions about those experiments. I feel like that's one of the toughest roles; it's a unicorn, it's hard to find someone with all of those skills, but it's also essential to having a really successful search team. Really accurately put. I'm still learning the product manager role myself, but that's exactly right: you need to generate the insight for yourself; it's constant detective work, you keep looking. Yeah, that's a good way of putting it. You have to be a really good detective, and then you also have to figure out where you're going to go digging as a detective. What am
I going to do? Maybe I need to set up a team of manual labelers, because there's something in our click data that's not quite right, or do something different with our click data. You really have to be able to understand and appreciate the nature of your evidence. Yeah, exactly. And maybe to add to that: when I used to be an engineer, what do you do daily? You open Jira and ask, what's the next ticket with my name on it? Somebody thought about it, somebody decided what needs to be done; they don't tell you this might be an experiment, but it's given, right? With product management, I don't open Jira and know what to do every day. I'm like: let's think. Okay, look at the metrics, look at the query logs, see what engineering has done, what experiment we just completed, and try to combine these pieces: what did we learn from that experiment, and what might be the next step? Yeah, and also subscribing to bold changes. Sometimes it's easy to go step by step, evolutionarily, but sometimes you need to jump over steps, which requires boldness, and then messaging that and saying: hey, we need this. It makes me think of going after a research grant: in the US, if you want to do a big research project at a university, you go to a government agency and you make this big proposal. For these big bets it's almost like that. Yes, we have this side over here that's constantly evolving whatever currently works, but for these big bets you almost have to think in terms of: we want to spend X amount of time researching this area to see if this direction works out. And as part of that, you also have to define the early tests, the prototypes, before we build the big thing, to know whether we should invest even further. And that's a tricky thing. I
think that's something a really good product manager can coach the stakeholders on: thinking about these things as bets, and not as sure things that we know are going to work out. That's also really important. Yeah, exactly. One example comes to mind. Was it eBay? When they didn't have typeahead and then added it, they tapped into something like a hundred-million-dollar market, because you reduce the time spent in each search session: you get there faster, which means a faster transaction. Yeah, totally; it probably involved product management thinking, what if we do this? It's outside-the-box thinking. Yeah, totally. And before we close off (I really enjoyed this conversation, Doug, and I think we could speak for an entire day; my inner engineer is having a lot of fun really getting into this), I love asking this question, and you partially answered it during this podcast: the why-question. You've done a ton. It's not just that you imagined doing things or told someone to do them; you actually did them yourself: Quepid, the Learning to Rank plugin, Splainer, the books, all of these really, almost physically tangible things. And you still keep going and going: you talk at conferences, you push so much material on LinkedIn and Twitter that I can barely keep up. What drives you in this space? That's a great question. I think what gets me excited about this space is that it feels like the future of how people interact with computers. Take Google, for example: for a long time people have used Google, and it's really a command-line interface, but it understands natural language. And I feel like more and more interfaces are this fuzzy, search-like interaction, and it's
this thing creeping up on us that people aren't quite realizing. And then the other thing that makes it a fascinating field is the intersection of data and engineering and product and UX: you have to have all of these parts of your brain working together to understand and solve the problem. It's really just a huge intellectual challenge. But more foundationally, interacting with interesting and great people in the field also drives me; it's just fun to interact out there with people like you, Dmitry, and others who also get excited about the problem and like to nerd out about it. So the social aspect drives me too: sharing my crazy ideas or products or books, getting feedback, and continuing the conversation. Yeah, and I think it's endless. You're making a great contribution there, but it's an endless journey in many ways, right? So many facets, so much dimensionality. Yeah, totally, absolutely. And of course, people want to learn these things. I myself, from time to time, subscribe to a course and just have a blast for, I don't know, four weeks, two months, whatever, not for the certificate but for the knowledge, and for that feeling of connection to that knowledge. And with that, I want to ask if you have any announcements. Yeah, so I'm doing a course with Sphere. Sphere is a fantastic company that is trying to build these next-level courses; it's not your basic Udemy course where we learn some basic things, it's almost like a masterclass with a professional. They're really focused on machine learning engineering right now, so they have recommender systems and all these things. I'm doing an ML-Powered Search course, and it really covers a lot of the things we've talked about, starting from just appreciating the relevance problem to building up
learning-to-rank models, with a real focus on the problem of ranking, and then also discovery: doing feature exploration and training-data exploration to try to figure out what's even relevant, beyond the filter bubble of our current search algorithms. So if you're interested in that, catch up with me; it's getsphere.com, and you can find the ML-Powered Search course there. And then, of course, all my other things out there: AI-Powered Search, written with Trey and Max, and Relevant Search, of course, hopefully still relevant, so to speak, and all the other stuff that I think people find interesting and useful. And of course I also want to continue to plug OpenSource Connections: they have great training and consulting, I was a key part of the training on that team, and it's a great resource you can go to. This is a fantastic announcement, thanks for that. And I also want to say that the reason I enjoy reading your book Relevant Search is not only that you share a bunch there (for example, indexing songs, and I was like, what, an inverted index? You can do it your way), but that your way of writing is very thorough. It's like you create a network of thoughts: as I go through the text, you say, we will talk about this later, but let me still spend a few sentences explaining what I mean, and it reads like a conversation. Yeah, I try to be conversational, including the typical bad jokes and that sort of humor. Exactly; I'm also learning on that side. So that's fantastic, and it gives it that feel. Keep going, keep doing this. I enjoy following what you do and connecting once in a while; you sometimes give me really good advice (should this be the title of the blog post, should I venture into this or not) and things like that. It's amazing, this cross-pollination; I'm enjoying it a lot, and I recommend everyone subscribe to your course. We will of course link it. Thank you, and have fun! Oh,
definitely we'll do awesome thanks so much Doug I enjoyed it and see you soon hopefully in person yeah yeah same all right bye bye all right take care with \ No newline at end of file diff --git a/transcripts/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md b/transcripts/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md new file mode 100644 index 0000000..d793b4f --- /dev/null +++ b/transcripts/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md @@ -0,0 +1,91 @@ +--- +description: '

Turbopuffer search engine supports such products as Cursor, Notion, + Linear, Superhuman and Readwise.

This episode on YouTube: https://youtu.be/I8Ztqajighg

Medium: + https://dmitry-kan.medium.com/vector-podcast-simon-eskildsen-turbopuffer-69e456da8df3

Dev: + https://dev.to/vectorpodcast/vector-podcast-simon-eskildsen-turbopuffer-cfa

If + you are on Lucene / OpenSearch stack, you can go managed by signing up here: https://console.aiven.io/signup?utm_source=youtube&utm_medium=&&utm_content=vectorpodcast

Time + codes:

00:00 Intro

00:15 Napkin Problem 4: Throughput of Redis

01:35 + Episode intro

02:45 Simon''s background, including implementation of Turbopuffer

09:23 + How Cursor became an early client

11:25 How to test pre-launch

14:38 + Why a new vector DB deserves to exist?

20:39 Latency aspect

26:27 Implementation + language for Turbopuffer

28:11 Impact of LLM coding tools on programmer craft

30:02 + Engineer 2 CEO transition

35:10 Architecture of Turbopuffer

43:25 Disk + vs S3 latency, NVMe disks, DRAM

48:27 Multitenancy

50:29 Recall@N benchmarking

59:38 + filtered ANN and Big-ANN Benchmarks

1:00:54 What users care about more (than + Recall@N benchmarking)

1:01:28 Spicy question about benchmarking in competition

1:06:01 + Interesting challenges ahead to tackle

1:10:13 Simon''s announcement

Show + notes:

- Turbopuffer in Cursor: https://www.youtube.com/watch?v=oFfVt3S51T4&t=5223s

transcript: + https://lexfridman.com/cursor-team-transcript

- + https://turbopuffer.com/

- + Napkin Math: https://sirupsen.com/napkin

- + Follow Simon on X: https://x.com/Sirupsen

- + Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696/

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250919_060954_7832a7c20742f9493a19a27a0c5d8947.png +pub_date: Fri, 19 Sep 2025 06:09:39 GMT +title: Economical way of serving vector search workloads with Simon Eskildsen, CEO + Turbopuffer +url: https://rss.com/podcasts/vector-podcast/2222846 +--- +

Now, let's get started. Napkin Problem 4. Today, as you were preparing your organic high-mountain oolong in the kitchenette, one of your lovely co-workers mentioned that they were looking at adding more Redis nodes, because Redis was maxing out at 10,000 commands per second, which they were trending aggressively towards. You asked them how they were using it. Were they running some obscure O(n) command? They had used BPF probes to determine that it was all GET key and SET key value, and they also confirmed all the values were less than about 64 bytes. For those unfamiliar with Redis, it's a single-threaded, in-memory key-value store written in C. After this encounter, you walk to the window. You look out and sip your high-mountain oolong. As you stare at yet another condominium building being built, it hits you: 10,000 commands per second. 10,000. Isn't that a bit low? Shouldn't something that's fundamentally just doing random memory reads and writes over an established TCP session be able to do more?

Hello there! Vector Podcast is back, season 4, and we are kicking off with an exciting topic and guest: Simon Eskildsen, CEO of Turbopuffer. I've been watching you guys almost from the start, just following each other on Twitter, like virtual friends. And it's funny that before this episode, you, the CEO of the company, tried to sell Turbopuffer to me, saying, hey, why don't you use it? Did you make a compelling pitch? Yeah, it should pass, for sure. But hey, welcome, first of all, welcome. And thank you very much for having me.
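The napkin problem read out above can be sketched in a few lines of arithmetic. This is a minimal, hypothetical estimate: the per-operation costs below are rough orders of magnitude typical of napkin-math tables, not measurements of any real Redis deployment.

```python
# Napkin math: is 10,000 commands/sec low for a single-threaded,
# in-memory key-value store serving tiny GET/SET commands?
NS_PER_SEC = 1_000_000_000

# Assumed per-command costs (orders of magnitude, not measurements):
memory_read_ns = 100           # ~100 ns to touch a small value in DRAM
parse_and_dispatch_ns = 1_000  # ~1 us to parse and dispatch GET/SET
syscall_ns = 5_000             # ~5 us of kernel/network overhead per round trip

ns_per_command = memory_read_ns + parse_and_dispatch_ns + syscall_ns
ceiling_cmds_per_sec = NS_PER_SEC // ns_per_command

print(f"rough single-connection ceiling: {ceiling_cmds_per_sec:,} commands/sec")
```

Even with generous overhead assumptions the ceiling lands in the hundreds of thousands of commands per second, an order of magnitude above the 10,000/sec in the problem, which is exactly why the number should feel suspiciously low.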
It's a tradition to usually start with the background, if you could speak in your own words about yourself, your journey. I know that you've worked at Shopify at some point, also scaling databases, I guess, right? But I've also been following your Napkin Math newsletter. I was reading it, and maybe I'll quote some text from there today, just to amuse and excite the audience. But tell me about yourself.

Yeah, I can give a very brief overview, and then we can dig into anything if something stands out. I started programming when I was a teenager. Similar to you, English is not my first language. So at some point I exhausted the Danish web, and then dove into a video game addiction for three years as a teenager to learn enough English to sort of, you know, get my own ChatGPT moment and take-off point. And then I spent a lot of time in high school being not very good at competitive programming, but good enough to qualify for the small country of Denmark. And then I spent almost a decade working at Shopify, doing mainly infrastructure work. When I joined Shopify and the infrastructure team (I mean, it was not even an infrastructure team, DevOps was just becoming a thing), we were driving just a couple hundred requests per second, and by the time I left, we saw peaks of more than a million. I more or less worked on all of the stateful systems that power that, because they generally tend to be the bottleneck, just playing whack-a-mole every single year for every Black Friday, for many years. And I spent the majority of those years on one of the last-resort pager rotations for Shopify as well. Those pages are very scary in the middle of the night, because a lot of GMV, of course, runs through Shopify, so there was very high responsibility in that. I left in 2021 and kind of jumped around at my friends' companies, helping them with various things. I had spent almost my entire career at one company.
So I wanted to dabble and just go and basically help my friends with any infrastructure challenges that they had. And in 2023, when ChatGPT launched and the APIs launched, I was working with my friends at this company called Readwise. They have a product, similar to Pocket and others, for reading articles later; a phenomenal product. And they asked me to build a recommendation feature for articles. And I was like, well, it's perfect, right? Embedding models are basically just LLMs with their heads chopped off, and they're trained on exactly this data. So we built something, and it actually worked pretty well for just recommending articles. But then I ran the back-of-the-envelope math on what it would cost to do this for the entire article catalog, which had hundreds of millions of articles. And it would have cost more than 30 grand a month to do. For a large company that's not a big deal for an experiment, but this was a company that was spending three grand a month on a Postgres instance that, prior to working on this, I had tuned. Spending ten times that on just recommendations, and possibly search, was just untenable. So the feature was stopped in its tracks. And it was a bit sad. It ended up in that bucket that a lot of companies have: okay, we're going to work on this when it becomes cheaper, and then we'll ship this feature. But it was a bit sad because I was excited about this feature; I'm a user of the product as well. And I could not stop thinking about it: why was it so expensive? The vector databases at the time were storing everything in memory, and DRAM in a cloud costs somewhere between two to five dollars per gigabyte. The economics of this just didn't line up. It wasn't that these vector databases were doing anything, you know, malicious in their pricing; they were just trying to earn a normal margin on memory pricing.
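The back-of-the-envelope math described here can be reconstructed roughly as follows. All concrete figures (article count, embedding dimensions, DRAM price) are illustrative assumptions chosen to match the "hundreds of millions of articles" and "$2 to $5 per gigabyte" mentioned in the conversation, not Readwise's actual numbers.

```python
# Illustrative napkin math for keeping article embeddings in DRAM.
articles = 200_000_000            # "hundreds of millions of articles" (assumed)
dims = 1536                       # a common embedding dimensionality (assumed)
bytes_per_float = 4               # float32
dram_dollars_per_gb_month = 3.5   # midpoint of the quoted $2-$5/GB DRAM price

gb = articles * dims * bytes_per_float / 1e9
raw_monthly = gb * dram_dollars_per_gb_month
print(f"{gb:,.0f} GB of raw vectors -> ~${raw_monthly:,.0f}/month in DRAM alone")
```

The raw vectors alone land in the thousands of dollars per month; once you add index overhead and replication (typically a further 3x to 7x), the bill climbs toward the tens of thousands of dollars per month figure mentioned in the episode.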
But memory pricing was just too high, and it stopped this feature in its tracks. And what I couldn't stop thinking about is: why can't we do all of this on top of object storage? Like, we just put it on object storage, that's the source of truth, and then when we actually need some piece of data, we put it in memory, or even on disk if we can. I pitched it to Mac and Nathan, and I was like, I think that's about a hundred times cheaper. And of course, that would have been a no-brainer for Readwise. We would have just bought it and started using it and tried it out, right? And maybe put way more data in, and maybe worked our way up to that 30-grand-a-month bill, but with a different workload. So yeah, I couldn't stop thinking about it, and eventually I started writing the first version over the summer of 2023, just me alone in the wilds of Canada, and then launched it in October of 2023, which is probably where you saw it. I didn't really tell anyone about it, I was just hacking away. I had a lot of ideas over that summer; some of them are still in the product, and a lot of them we've since phased out. But the most important thing was that it launched. The first version of Turbopuffer (I was just looking at the website the other day for an unrelated reason) didn't have mutable indexes. You just wrote to it, then you called an index endpoint, and then you were locked in, like, that's it. And it didn't have any SDKs. It was just a big, pure-HTML website. But it was enough to ship it, and it caught the attention, at the time, of the Cursor team, back in 2023. And of course, this was early on for Cursor, and it was early on for us. The vector database they were built on did not line up with their per-user economics and how they wanted to use RAG in Cursor. And so they wanted to try to work together.
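The "about a hundred times cheaper" claim, and the per-gigabyte arithmetic Simon walks through later in the episode, can be sketched directly. Prices here are the rough cloud list prices quoted in the conversation, used as assumptions rather than a pricing reference.

```python
# Blended storage-cost sketch: all-in-DRAM vs replicated disk vs object storage.
dram = 3.50        # $/GB-month of cloud DRAM (the all-in-memory vector DBs)
disk = 0.10        # $/GB-month of cloud block storage (SSD), as quoted
s3 = 0.02          # $/GB-month of S3-style object storage, as quoted

# Traditional stateful engine: 3 replicas on disk, run at ~50% utilization.
replicated_disk = 3 * disk / 0.5        # -> $0.60/GB-month
# Object-storage-native engine: S3 as source of truth plus one
# disk cache copy that can safely run at 100% utilization.
s3_native = disk / 1.0 + s3             # -> $0.12/GB-month

print(f"DRAM vs object storage: {dram / s3:.0f}x")
print(f"replicated disk vs S3-native blended: {replicated_disk / s3_native:.0f}x")
```

The DRAM-to-object-storage ratio is where the "hundred times cheaper" intuition comes from; the 60-cent vs 12-cent blended figures are the same numbers Simon cites when comparing replicated disk against the S3-native architecture.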
And we exchanged a bunch of emails of bullet points, and it was very clear that they thought this architecture was exactly right. Knowing the team now, they had just sat down at the dining table, done the napkin math over there, and thought, why hasn't anyone built it like this? And so I went to San Francisco and spent some time with them, came up with a bunch of features that they would need, and called the best engineer that I knew from Shopify, my co-founder Justin, and asked if he'd come on board, because I thought, maybe there's something here. And yeah, we launched it; Cursor moved over, and their bill was reduced by 95%. The traditional storage architecture they were on before didn't make sense for Cursor's economics, but our storage architecture really did, because you put all the codebase embeddings on S3, and then the ones that are actively being used we can have in RAM or on disk. I'll stop there, but that would be what led up to this moment.

Oh, that's an amazing journey. A lot to ask, of course, a lot of questions. But just on that Cursor thing: as I told you before we started recording, you know, I knew about you launching this, working on this, and then I listened to the Lex Fridman podcast episode with the Cursor team, and they mentioned Turbopuffer, sort of like in passing, but, you know, I think that also probably created a lot of attention for you guys. I'm just curious, how did you get together? Did you know someone on the Cursor team that you could, like, partner with early on? And essentially they kind of helped you to pioneer it, right, in some sense, becoming the first client, or maybe future client. How did you approach them?

They did... I mean, they were a design partner in every sense of the word. We had a Slack channel, and I feel like they treated us as part of their team, and we treated them as part of our team.
They came inbound. They sent an email based on the website, and they said, hey, we would need mutable indexes, glob, and a couple of other things. And it's like, well, that's a very reasonable request, right? And I think they had the conviction that this was the right architecture, and that if we could prove out their trust, then we'd be in a good place. So it was really just an honest conversation, just the way that the website is today: a very honest description of what the trade-offs are, what it can do, what it can not do, what the latency profile is, what the guarantees are. And that's exactly the kind of bullet-point discussion that we engaged in over email before I met the team in person. And of course, they were a small team at the time, and they needed help with parts of their infrastructure, working very, very closely with teams that they could trust, with the right economics and the right reliability.

Yeah, for sure. But I guess that honesty, which I also value a lot, you know, in my work (I became a product manager, you know, three years ago, and I think it applies to any discipline: be honest), that honesty probably rests on the fact that you had done your napkin math and you knew where this would scale, how this could go, right? How did you go about doing that pre-launch, before having any client? Was it the company of your friends that helped you to, kind of like, figure out the economics and sort of the throughput and all of these rigorous questions that you ask, you know, as problem statements in Napkin Math?
I should almost bring up the Internet Archive version of it. For the first version of Turbopuffer, I had not thought about the business at all. I didn't have any launch playbook. I had, of course, done all the economics of what it would cost me to operate, and spent a decent amount of time on the pricing, because that felt like an important thing to spend time on at the time, but there was really not much more than that. Of course, the Readwise team was very interested, but at the time I could barely do, you know, around 10 million vectors, which is not enough for their use case. I can screen-share the website with you right here, what it looked like at the time, and then, for the listening audience, we can get your reaction. But it was very simple. I wouldn't put in any sophistication, and honestly, I was exhausted. I'd been working on this, like, completely alone, not telling anyone about it, no interested customers, for like four months, extremely focused, like, every single day. You could ask my wife; she would say I was very distracted, and she was just like, well, why are you working so hard on this? There's no one on the team, you don't have any customers lined up. And I'm just like, someone has to do this. And I just launched it. I mean, now I feel some embarrassment about that launch; it was pretty slow. I had spent a bunch of time actually trying to make it work in Wasm and on the edge, but it was too hard to make it fast, and there were a bunch of other false starts like that, on different types of ANN indexing structures (we could talk about that as well, and what we settled on). But there was no real sophistication in the go-to-market. It was really just: here it is, here's the napkin math, here's what it does, let's see how the world takes it.

But I think when you sit on... well, you didn't sit on it yet, but you had a cool technology idea in
mind, right? You knew, you know, it might play out, but also, of course, it required a lot of hard work, like you said. But after that, after you see it fly, like, on some small scale, or whatever scale, I think that brings you that excitement to bring it to the world, right? So yeah, I see you're sharing the screen of the Web Archive page. Yeah, that's it, very simple. Yeah, yeah, that's awesome. But that's actually a good segue. You probably know I've been there at the emergence of the vector database field. I think I was probably the first to write just a simple blog post with, like, you know, these short snippets of what each vector database did, and how they stand out, and so on. Turbopuffer wasn't there, because Turbopuffer was still in your mind, I think. But the segue here is: I don't have it covered in that blog post, but, in your mind, why were you not happy with the vector databases at large? Did you try all of them? Did you try some of them? Why did you think that a new vector database deserved to exist?

Yeah, I think it really just came back to the Readwise example. They looked like great products. I really like the APIs of many of them. They had lots of features that have taken me a long time to build, even features that we don't have today, although we have a lot of features today compared to when we launched. It came out of the cost piece: it felt like there was a lot of latent demand built up in the market, of people who wanted to use these things, but it just didn't make sense with the economics. It's very difficult to earn a return on search. I mean, I remember the search clusters at Shopify were very expensive, but e-commerce is a lot about search, and so it was okay, right? But for a lot of companies, search is an important feature, but it is not the feature. And so the per-user economics just have to make sense. It's not that everyone just wants it in the cheapest possible way; it's that if you invest in
infrastructure, you have to get a return on that investment. And I knew that at Readwise they could get a return on that investment, but not at 30 grand a month; it was maybe closer to 3 grand or 5 grand a month that they would feel they could earn a return on that feature, and engender conversion, engagement, and whatever. So it was really about the storage architecture. And when I think about databases now... this was not as coherent to me at the time. At the time, I was driven by the napkin math, not the market, nothing else. It was based on one qualitative experience and the napkin math; there was nothing else in it. I can speak about it in a more sophisticated way now, you know, having learned a lot about go-to-market since, but that's really all there was at the time: an insight on those two things. The best ideas, right, are simultaneous inventions. Someone else would have done it six months later; probably other people were doing it at the time who launched later. We were the first to launch with this particular architecture, but it was out there for the grabbing. The idea was in the air; S3 finally had the right pieces in place. So, the way that I think about this, to really boil it down, is that if you want to create a generational database company, I think you need two things. You need a new workload. The new workload here is that almost every company on earth sits on a treasure trove of data, and they want to connect that to LLMs, especially all the unstructured data, which has always been very difficult to do. We did this for structured data in the 2010s: the new workload then was that we wanted to do analytics on billions, tens of billions, trillions of rows of structured data. But now, with LLMs, we're entering into that with unstructured data. That's the first thing, the new workload, because that's when people go out shopping for a new database. The second thing that you
need is a new storage architecture. If you don't have a new storage architecture that is fundamentally a better trade-off for the particular workload, then there's no reason why tacking on a secondary index to your relational database, to your OLAP store, to your existing search engine, wouldn't eat it. I would have made that decision in the shoes I was in at Shopify, right? It's like, well, this database has a really good vector index, but it doesn't bring anything new in terms of the storage architecture, so we're just going to invest in the MySQL extension, which is what we really ran at Shopify, or the Lucene workload. These are great databases, they've stood the test of time, and when you're on call, you become very conservative in what you adopt for new workloads. But you cannot ignore a new storage architecture that is an order of magnitude cheaper than the previous one. When you store a gigabyte of data in a traditional storage engine, you have to replicate it to three disks, maybe two if you have more risk tolerance, but likely three. A gigabyte of disk from the cloud vendors costs about 10 cents, and you run it at 50% utilization, because otherwise it's too scary to be on call: 20 cents per gigabyte, times three for all the replicas, 60 cents per gigabyte. Object storage is two cents per gigabyte, so it's 30 times cheaper if it's all cold. Now, by the time you have some of it on SSD and some of it in memory, the blended cost ends up being different, but it tracks the actual value to the customer. And even if you have all of it on disk, well, you only need one copy, and that disk you can run at 100% utilization, meaning the blended cost is now 12 cents per gigabyte: the 10 cents at 100% utilization, plus the two cents per gigabyte for object storage. So now you have the ingredients of an actual new database. You have a new workload, which means that people are out there trying to look for ways to connect their data to
LLMs, and then you have the second ingredient, which is a new storage architecture that allows them to do it an order of magnitude easier and cheaper than they can with their existing architectures. And this matters because vectors are so big: a kilobyte of text easily turns into tens of kilobytes of vector data.

Yeah, yeah, it's absolutely true. One other thing that I keep hearing, or kept hearing, about whether or not to introduce vector search into the mix for some really heavy workloads, is that it will bring a certain latency on top that we cannot tolerate. For example, if you run a hybrid search, like you guys have implemented as well, one of the legs will be the slowest, and you will have to wait for that slowest component. And if it adds, I don't know, a few hundred milliseconds on top of your original retrieval mechanism, then it's going to be a no-go. What's your take on that? Obviously you have thought about it. What's the edge that Turbopuffer brings in this space over, maybe, pure vector databases?

Yeah, I think there are two types of ways that people adopt vector databases, or Turbopuffer. We don't consider Turbopuffer a pure-play vector database; we consider it a search engine. We actually consider it a full database, because there's a full generic LSM underneath all of that, and we consider that the actual asset of Turbopuffer is an LSM that's object-storage native and doesn't rely on any state. We just think that the vector index and the search-engine index is what the market needed the most. So let's speak about latency. There's no real fundamental latency trade-off with this architecture. The only thing is that once in a while you will hit a cold query, but the entire database is optimized around minimizing the number of round trips that you do to S3. With S3 you can max out a network card, right? On a GCP or AWS box you can get 50 to 100 gigabits per second of network bandwidth; you
can drive all of that bandwidth, and this is similar to, in the clouds actually often even better than, disks, even with NVMe SSDs. So the network is phenomenal. You can drive, you know, gigabytes of data per second in a single round trip, so you can get great throughput, but the latency is high: the p90 might be around 200 milliseconds to S3 for every round trip, more or less regardless of how much data you transfer, assuming you're saturating the box. We've designed almost everything in Turbopuffer around minimizing the number of round trips to S3. That doesn't just help for S3; it also helps for modern disks, which are the same story: you can drive enormous amounts of bandwidth, but the round-trip time is long (hundreds of microseconds rather than hundreds of milliseconds, but still substantial compared to DRAM). So the latency trade-off is not a fundamental trade-off with this architecture: by the time the data makes it into the memory cache, it's just as fast as everyone else. We have found that people don't care whether it's, like, a millisecond or five milliseconds; as long as it's reliably less than around 50 milliseconds, they're good. And I think a lot of the traditional storage architectures, especially because of the sharding structure with multiple nodes, already put you in a worse position: if you run a query on one of the traditional search engines, generally you touch five, ten, maybe more nodes, because the shard sizes are very, very small (we could go into more depth on that), so you already have this problem. What we see is that there are two types of ways that people adopt it. The first one: you have an existing lexical search engine, you're having a hard time running it because of the traditional, very stateful architecture (they're reputed for just being difficult to run), and you're already a
little bit at your threshold for the amount of money that you're spending on this cluster. And if you put the vector data in, it's often 10 to 20 times larger than the text data, so the project just stops in its tracks, similar to the Readwise case I mentioned before. So for those players, we often see that they have something that's really well tuned for the lexical side, and they adopt a vector store, and then they do two queries in parallel. The vector store should not be slower than the lexical one, right? So these are just two futures that you merge together. And in general, we see that our customers are actually quite happy to move some of the ranking, like the final second-stage ranking, out of the search engine and into a search.py, instead of a big search JSON config, which can be very difficult to maintain. Many of these companies express a lot of desire to move more and more of their lexical work onto Turbopuffer as well, and we have a full-text search engine. We don't have every feature of Lucene yet, but we're working very, very actively on bringing this up. What we also see is that a lot of our customers don't need all of the features of Lucene anymore, because the vectors are so good that a lot of the, you know, PhD-level efforts we did before to turn strings into things are not as much of an issue anymore. Really, what we use strings for more is that, when you search for "DM", you get "DM" back, right, like a prefix match, whereas an embedding model might think that it's a direct message. Those kinds of things are important, and we still need string matching for that; lots of applications need it. But there are a lot of things that we do in Lucene, with synonyms, with stemming, with all these kinds of things, that the embedding models are frankly just a lot better at. So we find that there's an adoption curve there. A lot of the newer companies just start with embedding models and simple full-text search, and they get it up and running on Turbopuffer, and they like that they just pay for what they need. They don't have to think about it, and they could pump in a petabyte of data if they wanted, and it would be extremely competitive on pricing, and they don't have to think about it.

Oh, that's awesome, that's awesome. Actually, I forgot to ask you: which language did you choose to implement Turbopuffer in?

Yeah, well, it was just me at the time, but I chose Rust. I spent the majority of my career writing Ruby at Shopify, and then a lot of Go as well, for some of the infrastructure components, and then mainly debugging and reading C, which is what all the databases we were using were written in. I really like Go, and I liked Go alongside Ruby at Shopify, because Go was one of those things where, when leading teams, I didn't have to worry about whether someone knew Go or not, because the ramp-up to learn it is two weeks. The ramp-up to learn Rust and be proficient in it is months, right? And someone who has written Rust for two years is a lot more productive in the language than someone who's written it for two months. That's just not the case for Go: someone who's spent two years in it is just not that much more productive. And I think
that's an amazing feature of the language. But from my own point of view, and from the napkin-math point of view, I was always so hungry, having been inside of runtimes, the Ruby MRI runtime and then the Go runtime, to get directly connected to the metal of the machine. And for a database in particular, that was very important: we need to vectorize everything, we need full control over that. And I think that full control, as remarkable as Go is (and I think it would have been okay), that raw access to the machine is needed for writing something like Turbopuffer.

Yeah, for sure. I still remember the times when I was learning and coding industrially in C and C++. Like you say, you really need to be very, very careful, but in return you can get a lot of performance gains, you know, and some of your ideas really fly. But today, I guess, I'm coding more in Python, or should I even say that I code in Python, when I use Cursor more and more, which is, by the way, scary. That feeling when some other entity writes code and you are just reading it, right? It's a little bit scary, and I'm still grappling with it, but the amount of productivity that I get is enormous. It's like, you know, I can ship features daily and just see them being used. That's amazing.
I think what I love about it is that I still love to sit there and write the occasional code by hand. You know, maybe at some point we will market Turbopuffer as an artisanally written database, because we don't use a ton of AI for the very key parts; I mean, we're at the edge of what the LLMs can know. But I think that for me, in a position where I'm in and out of meetings all day these days, I can actually get a lot done in a 30-minute window when I have something that's prompting and writing the tests, right? You fire it off at the beginning of a meeting, and you check in during the, you know, 15 or 30 minutes you have in between blocks. And this allows me to actually contribute a lot more code than I otherwise would have been able to. Not in the core engine; you know, I don't get let into a lot of that anymore, because I don't have the time and focus that it takes to fully think something through there. But for the website, the API, initial features, all of that, it's just been wonderful.

Yeah, that's amazing. I also wanted to go a bit on a tangent: you essentially, you could say, have been a mathematician-engineer, but you took a leap towards becoming a CEO, right? And, you know, as you said, you go to meetings, you do lots of, you know, probably sales and product and all of that stuff. Was it a natural transition for you? What have you learned in this journey, and what, maybe, do you miss from your previous career, when you were, like, you know, hands-on and would sit down and write a bunch of code?

I have a couple of angles to answer the question, not necessarily a direct answer. One angle is that, fundamentally, I'm, like, a growth junkie, for better or worse, and I think that entrepreneurship is the ultimate path for a growth junkie. It was never really something that I assumed I was going to do. Even when I was working on the project, it was
never about becoming a founder, it was just about creating the database, right? At some point, becoming the founder of the company becomes a means to an end of creating the database and getting it into the hands of our users and making sure they have a great time. That's always what drove me: the real why. Our customers should have this, they should have a great experience, and to me the founder role and all of the other things have been a means towards an end there. I think one of the things that is maybe controversial, but also feels like a true statement, is that at some point I feel a bit numb to what work I enjoy and what I don't enjoy anymore, because what I enjoy the most is making this company successful and making the database successful for our customers. That's what I care the most about. And honestly, I love sales, I love marketing, I love the engineering, I love hiring people for the team, I love all of these things, so it's not a simplistic answer of oh, I've been coding my whole life. I think it's more that code is my idle activity: if there is that one-to-two-hour window and there's nothing urgent on, then I'm going to go spend some time in the code. It's like, oh, how did Nathan implement this new query planning heuristic? That's my idle activity. And when interviewing people, I always try to understand, especially if they're in a more hybrid role, what's your idle activity? What's the thing you do when you have one to two hours and nothing else comes up? Do you gravitate towards the code, do you start writing an article, do you start playing with the product? What is that idle activity? For me it is code, that's what everything is grounded in, and I think it has a deep influence on how I lead the company. I often think about
something that Taleb said, you know, the author of Antifragile and a bunch of other books, which is that the best authors of books are not the ones that sit down and read a bunch of papers, then write a page, then read another paper, write a page. The best books are written by people who just go to the cabin, sit down, write 500 pages, and hit publish. Of course that's not what actually happens, but if you read Taleb's books, it's probably pretty close to what actually happened; he just has the citations in his head. I think about that often when building this company: it has felt like I've worked towards this my whole life without knowing it, and every morning I wake up feeling that this is exactly what it has all led up to. So it's very natural, even if it wasn't a goal unto itself, that with the experience I've had it makes sense to do exactly this, and I tremendously enjoy it. But it's not a simplistic answer to do I miss coding: no, I want to make this company incredibly successful, and sometimes I will code as a recreational activity. Yeah, definitely. When I look at you on Twitter, for example, you come across as a very technical person, and you are, for sure, even though to grow your business you need to do a lot of other activities. I don't mean to ask it in the sense of hey, do you regret doing sales now, do you regret not doing more coding, which is not true, you still do that. And I think all engineers become better engineers if they learn the mastery of actually presenting what they do, right? Then they don't need a middle layer, someone else who goes and talks to the product manager or whoever else needs to be talked to; they can represent themselves. But I also love how eloquently you put it: what is your idle activity, what's your affinity, what do you gravitate to? And it actually
resonates a lot with me, because my idle activity, when I'm really nervous that I'm doing nothing, especially on vacations, is that I start coding; I just go, okay, let's just hypothesize about something. But let's dial back into the architecture. When I look at the architecture page of turbopuffer, it's very simple: a client connecting over TCP to a database instance, and it has just two components, the memory or SSD cache and the object storage. Tell me a bit more; our listeners and I mostly know what object storage is, but tell me more about that memory component: what algorithm design went into it, maybe the trade-offs, and how frequently you need to do the round trips to object storage versus when you don't. Yeah, I think it would be easiest to do this by speaking about the lifetime of a request as the cache warms. We'll actually start with the write path. When you do a write into turbopuffer, it's as simple as you can imagine it. I mean, at this point we've optimized parts of it so that it's not this simple, but this is the best way to explain it. When you do a write to turbopuffer, that write basically goes into a file in a directory called the write-ahead log. So when you write to a namespace, you can imagine that on S3 it's like /namespace/write-ahead-log, and the write-ahead log is basically just a sequence of all the writes in order, the raw writes. You do your write, and it might be, okay, I'm inserting a document with text "Dmitri" and one with text "Simon", and those are the two documents. In the simplest way, you can imagine that this file is called 0.json, the next one is called 1.json, then 2.json, and so on. That's a database, right? That's just the write-ahead log, and if you want to satisfy a query, you just scan through all the JSON documents and satisfy the query. That's actually a respectable database, and it's not
even that far from the first version of turbopuffer. But of course you have to index that data as well. So as you can imagine, once many megabytes of data come in, an indexing node will asynchronously pick it up and put it into the inverted index for full-text search, into an ANN index for vector search, and into an attribute, or filtering, index for the other attributes, and there will be other index types in the future. When that happens, it will put the results into /namespace/index and just start putting files in there, and then the query layer can consult those files: instead of scanning through every single document to find "Dmitri", you can just look up "Dmitri" in the inverted index, find the document, and return it. That's how a write works. When a write happens, it will go through one of the query nodes, and the write will also be written into the cache, both the memory cache and the disk cache. When you do a query, you will go to that same query node; there's consistent hashing, so if there are three nodes, the same namespace will end up on node one all the time, if that's where it hashes to. When you do a query, it will first check the caches. If you just did the write, well, it's already there, because we wrote all the writes into the cache; that's the write-through cache, and we will satisfy the query mainly from the cache. If for whatever reason this namespace is not there, maybe you did the write a month ago and it's fallen out of the cache, and you do the read, well, then we'll read through the cache by going directly to object storage, with as few round trips as possible, to get the data to satisfy the query, both from the index and from the WAL. We'll do range reads directly on S3, the old HTTP Range header, to get exactly the bytes we need to satisfy the query, and then start hydrating the cache on the query node
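The write path described above, a write-ahead log of numbered JSON files in object storage plus a naive scan for queries, can be sketched in a few lines. This is a toy model under stated assumptions: the dict stands in for an S3 bucket, and the key layout (`wal/`, `0.json`, …) is illustrative, not turbopuffer's actual format.

```python
import json

class ToyLogDB:
    """Toy database: object storage is a dict, every write appends the
    next numbered JSON file to the namespace's write-ahead log."""

    def __init__(self):
        self.bucket = {}      # key -> bytes, stands in for S3
        self.next_seq = {}    # namespace -> next WAL sequence number

    def write(self, namespace, docs):
        seq = self.next_seq.get(namespace, 0)
        key = f"{namespace}/wal/{seq}.json"
        self.bucket[key] = json.dumps(docs).encode()
        self.next_seq[namespace] = seq + 1
        return key

    def query(self, namespace, predicate):
        # With no index yet, a query is just an in-order scan over every
        # WAL entry -- slow but correct, as described in the conversation.
        results = []
        for seq in range(self.next_seq.get(namespace, 0)):
            docs = json.loads(self.bucket[f"{namespace}/wal/{seq}.json"])
            results.extend(d for d in docs if predicate(d))
        return results

db = ToyLogDB()
db.write("ns1", [{"text": "Dmitri"}, {"text": "Simon"}])
db.write("ns1", [{"text": "Nathan"}])
print(db.query("ns1", lambda d: d["text"] == "Simon"))  # [{'text': 'Simon'}]
```

An async indexer in the real system would fold these WAL entries into inverted, ANN, and attribute indexes so queries stop paying for the full scan.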
so the subsequent queries get faster and faster, and we can hydrate the cache at a gigabyte per second, even for very, very large namespaces. So that's the general architecture of turbopuffer: a completely cold query takes hundreds of milliseconds, and a warm query can take as little as 10 milliseconds to satisfy. The last detail I'll point out, and then we can go into a particular aspect of this, is that turbopuffer has chosen to do consistent reads by default. This is an unusual choice for search engines; the ones we've seen don't do this unless you turn it on explicitly. I think they've done more work now for real-time indexing, which to me is the gold standard, which is why I keep referring back to it; it's a phenomenal piece of software. Turbopuffer does consistent reads by default, meaning that if you do a write and then you read immediately afterwards, that write will be visible. In order to satisfy that, we can't just rely on the cache on that node being up to date: that node could have died, or the hash ring could have moved because we scaled up. So on every single query, we go to object storage and check what the latest entry in the WAL is, and whether we have that entry. Is it at 3.
json, and do I have that? So we have a little pointer file that we can download and look at, and that round trip is basically our p50: our spans are often one to two milliseconds of actual search and then, on GCS, depending on the region, about 12 to 16 milliseconds waiting for that consistency check. On S3 it's a little bit better, about eight milliseconds. But you can turn this off, and then you get eventual consistency instead, which is very normal for these databases, could be up to a minute out of date, and then you can often see a millisecond or less of latency to turbopuffer by turning off that check. But we find that this is a very safe default, and I think that databases should ship with very safe and unsurprising defaults. Yeah, for sure. So in that cache, and let's focus back on just the search part for now, you also have the ANN index. Is that also stored on S3, and do you also keep kind of a replica of it in memory for quick access? How do you synchronize the two? Both the write-ahead log and the index, everything, is stored on S3. If you killed all of the compute nodes of turbopuffer in all of our clusters, we would not lose any data; there is no data on the compute nodes that matters, it's only transient caching. But we cache everything: if you're accessing the index, we'll cache the index; if you're accessing the write-ahead log files, because the namespace is so small or there's a part of the data that hasn't been indexed yet, that's also on S3 and goes into the same cache with everything else, prioritized by the workloads to try to get the best performance possible. Yeah, it's quite smart. I remember at some previous companies, when I was running Apache Solr, one of the problems was always that all of these shards are super
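The consistency check just described, fetch a tiny pointer object recording the latest WAL sequence, then read through to object storage if the local cache is behind, can be sketched as below. All names and the pointer-file layout are illustrative assumptions, not turbopuffer's real protocol.

```python
class Cache:
    """Per-node transient cache of WAL entries."""
    def __init__(self):
        self.applied_seq = -1   # last WAL sequence applied on this node
        self.entries = []

def consistent_read(bucket, namespace, cache):
    # The one unavoidable round trip per query (~8-16 ms in the
    # conversation): read the pointer file for the latest WAL sequence.
    latest = int(bucket[f"{namespace}/wal/POINTER"])
    # Hydrate any WAL entries this node's cache has not seen yet, so the
    # read reflects every acknowledged write even after node churn.
    for seq in range(cache.applied_seq + 1, latest + 1):
        cache.entries.append(bucket[f"{namespace}/wal/{seq}.json"])
        cache.applied_seq = seq
    return list(cache.entries)

bucket = {
    "ns1/wal/POINTER": "1",
    "ns1/wal/0.json": '[{"text": "Dmitri"}]',
    "ns1/wal/1.json": '[{"text": "Simon"}]',
}
cache = Cache()
print(len(consistent_read(bucket, "ns1", cache)))  # 2 -- both writes visible
```

Skipping the pointer fetch is exactly the eventual-consistency mode mentioned above: reads get cheaper, but a fresh write may not be visible yet.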
cold, because they're never used, right? We still pay for them, but then when a query hits, you incur so much latency that it's super painful. So I was always coming up with these ideas: what if I run some post-indexing warm-up script that goes and shoots a bunch of queries at all of the shards, just to keep them up and running and warm, or just cat all the index files into memory on Linux. We've done that too. That was like 10 years ago or so, and it was a very strange feeling: why do I need to mess with that level of detail? It never actually paid off. I think what pays off is a smarter way to organize your index and how you read the data back, because essentially users really only need fresh data first. On Twitter, for example, everyone is really after the recent tweets and not some archive, and it was a very similar case for us. But it's very interesting that you go into so much detail there, to make the database effectively a living organism adjusting to the usage. But you also have multi-tenancy, right? Meaning that the same turbopuffer deployed across the data centers is going to be used by multiple companies at the same time, unless they demand isolation. How do you think about that, when they effectively use the same instance, compute, and index? I'd love to go into the Solr example for just one second before we go into multi-tenancy. How slow were those queries? Because when you say cold, do you mean that it's not in memory? When I say cold, I mean that it's on S3. What kind of latency were you seeing that you had to do this work? It was very slow. First of all, it also has to do with the domain specificity: the queries were Boolean and very long, so sometimes a query by itself would take a minute to execute on the original index design, which was just super crazy, right? But it was also very accurate, because it was sentence-level search, and then I
had to design a new system, a new architecture, where we could retain the accuracy of that engine but not have to spend so much money on indexing individual sentences, so we indexed one complete document instead. I had to change the algorithms slightly, and so it went to sub-second. It was still slow, I think, but it was much faster, and we could effectively scale the company after that, going from a minute, with something like 75% of infrastructure costs shaved off. But that was munging with Lucene's algorithm and changing how it scans the documents; it had nothing to do with the level that you go to with turbopuffer, effectively controlling the whole process. Got it. Yeah, I think the point there is that we do see that some customers are concerned with this cache, because they've been bitten before. Basically, the way I would think about it is that in some of the traditional engines, the way they do IO, if something is on disk, it feels like it's bad: if it's on disk it's slow, and it really has to be in memory. So the pufferfish only had two settings, so to speak: fully inflated, it's in DRAM; deflated, it's on disk, which is quite slow, and frankly, in some of the traditional storage engines I've seen the latency on disk being similar to our latency on S3. So then you have to load it into DRAM, and a lot of these traditional databases have to do a full copy into DRAM, they can't just zero-copy off of disk, and the disks were also quite slow, these old network disks. The NVMe disks are so fast: they can drive bandwidth that's within a very low multiple of DRAM, tens of gigabytes per second, but their cost is almost two orders of magnitude
lower, so this completely changes the economics. But you can't take advantage of these disks very easily. You can't just put the same software on them and have it be 10 times faster than on an older disk, even if the hardware is fundamentally capable of it, because, for example, what we found is that we had to remove the Linux page cache, since the Linux page cache cannot keep up with these disks, so you have to do direct IO. But when you do direct IO you don't get coalescing, you don't get all these other things; now you have to write your own IO driver. Databases just have not been built to take advantage of this; they're not built to drive the IO depth, basically as many outstanding IO requests as these disks can handle, so there's a lot of throughput left on the table. There are just a lot of barriers to entry there. So what we find, again speaking in generic terms, for say a query over millions of vectors: when something is on disk it's maybe high tens of milliseconds, 50 to 70 milliseconds when it's fully on disk, maybe lower depending on the query, the machine, whatever; and when it's in memory it's closer to 10 to 20 milliseconds. So these are not bad numbers; the user is barely going to notice, though of course you're going to get more throughput that way. And when it's on S3, it's maybe more like five to six hundred milliseconds, which a user would notice. But a lot of our customers, like Notion for example, when you open the Q&A dialog and these different dialogs that will query turbopuffer, they will send a request to tell turbopuffer, hey, can you start warming up the cache here, in a way that makes sense; and by cache we just mean putting it onto disk, starting with the upper layers of the ANN index and other things, to reduce the time as much as possible. So there are a lot of things that can be done here that are very simple, which means that
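The tiering argument above lends itself to a napkin-math sketch. The latency figures are the rough per-query numbers quoted in the conversation; the cost ratios are order-of-magnitude illustrations of "NVMe is ~100x cheaper than DRAM", not measurements.

```python
# Napkin math on the storage tiers discussed above.
# (latency_ms_per_query, relative_cost_per_byte) -- both approximate.
tiers = {
    "DRAM":         (15,  100.0),   # ~10-20 ms warm query
    "local NVMe":   (60,  1.0),     # ~50-70 ms, ~2 orders cheaper than DRAM
    "object store": (550, 0.1),     # ~500-600 ms fully cold
}

for name, (latency_ms, cost) in tiers.items():
    print(f"{name:>12}: ~{latency_ms:>3} ms/query, ~{cost}x cost/byte")

# The point of the hierarchy: NVMe is ~4x slower than DRAM here but
# ~100x cheaper per byte, so serving warm data from NVMe instead of DRAM
# changes the economics while users barely notice the latency difference.
dram_latency, dram_cost = tiers["DRAM"]
nvme_latency, nvme_cost = tiers["local NVMe"]
print(f"NVMe: {nvme_latency / dram_latency:.0f}x slower, "
      f"{dram_cost / nvme_cost:.0f}x cheaper than DRAM")
```

This is the gap-to-first-principles style of reasoning the guest returns to later when discussing benchmarking.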
there's barely a trade-off. Yeah, but let's go back to multi-tenancy, unless you had a follow-up. Let's do that, yeah: how does the multi-tenancy part work? So turbopuffer can run in three different ways. It can run in multi-tenant clusters; that's what Cursor does, that's what Linear does, and many of our customers. In multi-tenancy you share the compute, and we can do this so cheaply because we share the caching, we share all of this infrastructure; it's very easy for us to run this way, so that's the default mode. The cache is of course segregated in different ways, but also shared in ways where, if you have a big burst of traffic, you get more of the cache than others. So it's a very great way of running: multi-tenancy. The other thing we do for multi-tenancy, to keep it very secure, is that because all the data at rest is in the bucket, you can pass an encryption key to turbopuffer that we don't have access to, unless it's audit-logged on your side, with which we can encrypt and decrypt the objects. This is logically, and from a security standpoint, equivalent to you having all the data in your own bucket. This is a very nice primitive that, for example, Linear takes advantage of, because they have full control over their data: they can see when turbopuffer is accessing it, they can shut it down at any point in time, and they can even pass that on to their own customers, where turbopuffer can encrypt data for Linear's customers on behalf of the customer, with the customer's key. This is, I think, really groundbreaking and underrated in this architecture. You can of course do single-tenancy with turbopuffer as well, with the compute just for you, and you can do BYOC, where we run turbopuffer inside of your cloud in a way that's very compliant and we can never see customer data. But we find that multi-tenancy with the encryption, which can
be done per namespace, satisfies the security requirements of even some of the biggest companies in the world. Yeah, that sounds awesome. I also wanted to pick one topic which used to spark a lot of flame discussions, I don't know if it still does: what is your recall at N? When I go to the docs of turbopuffer, it says recall at N is 100%, recall at 10, excuse me, for vector search. So does it not say 100%? It said 90 to 100, right? No, I think it says, wait, wait, wait, what was the page where you saw that? Oh, here, the limits. Oh, I see: observed in production. Yeah, it should say up to 100%; that's a bug in the docs that I shipped last night, I'm going to fix it after this. Awesome. What it says in the limits is 90 to 100%, but let's talk about recall, I'd love to get into recall. I think recall is incredibly important. You have to trust your database here, in the same way that you have to trust your database to do fsync, and to trust that when we say we don't return a success to you unless it's committed to S3, it really is committed. Recall is similar: if you are working on search, and you're working on connecting data to LLMs, then you don't want to worry in your evals about whether your vector database is giving you low recall. It's actually a very sophisticated problem to evaluate whether that is the cause, so you have to trust your vendor. This is an underrated problem, and I love that you're asking about it; very few people ask about it unless they're quite sophisticated. So let's go into it, let's give a long answer here for your audience, because I think this is paramount. Most databases that have a vector index are benchmarked against these different ANN open-source datasets, so there's SIFT and others. The problem with these datasets is that they do not represent what we've seen in the real world. A lot of them
are very low dimensionality: when we do benchmarking at a billion scale, which we're working on right now, the biggest datasets we can find are like 64 dimensions. This is not what people are doing in production; they're doing at least 512, and I'd say the average is around 768 dimensions. These are not representative datasets, and the distributions in the academic benchmarks are also completely different from what we see in real datasets. In real datasets we see millions of duplicates, we see filtering, all these chaotic environments that do not present themselves in the academic benchmarks. So if you're using a vector index that's only been tested on academic benchmarks, it's like the LLMs, right? You don't really trust one just based on its score; it's all vibes, it's all qualitative, because outside the benchmarks that everyone was training on, will it work for your domain? Very early on in turbopuffer's history, in the first month, I was mainly iterating against the SIFT dataset, just a 128-dimensional dataset. I didn't know anything about ANN at the time, so it was like, okay, this is pretty good, we can tune some heuristics on this and then I can go wider, but I have the feedback loop. And the observation I had at the time was that I got something that worked really well, great heuristics on SIFT, and then when I ran it on the other datasets it just completely did not work well or generalize. I think that taught me an early lesson that these academic datasets are just not enough, and the only way to know what your recall is going to be is to measure it in production. This is what turbopuffer does: for a percentage of queries, it depends on the number of queries that you do, but let's say around 1% of queries, turbopuffer will run an exhaustive search
against the ANN index on a separate worker fleet. We will then emit a metric to Datadog that is the recall number for this query: basically, this is the top 10 we know is accurate, and this is what the heuristic ANN index returned, what's the overlap? And we will average that over time. I have a graph in Datadog that shows all the different organizations that have had more than 100 queries in the past hour or whatever, and then we have the recall for all of them: the recall at what they asked for, the recall at 10, the p10 recall, the p90 recall. And we try our best to make sure that this is green at all times; we consider green anything above 90%, which is generally quite good. Well, 90% is quite good for some queries, but for simple queries it's often closer to 100%; many of our customers have 99.5% recall. So this is the only way that we know how to do this, and it's fun that you ask this question today, because last night I was hacking on putting this into the dashboard, literally putting the recall that we observe from this monitoring system into the dashboard of the user, because we think it's that important. And it's very difficult to get right; we have spent thousands of engineering hours to make sure that the recall is high. Now, recall on academic benchmarks: easy. Recall on raw ANN search, especially on academic benchmarks: very easy. Raw recall on production datasets: I'd say medium to medium-hard. High recall on ANN queries with filters, with mixed selectivity and incremental indexing: absolute hard mode. This is what the databases that just slap a secondary vector index onto an existing engine can't do; they can't sustain something like a thousand writes per second with high recall in the face of very difficult filter queries. So let's talk about filtered recall for a second. There are barely any academic datasets on this yet; it's all in the production workloads. What a filtered ANN query means is, let's
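The recall check described above, running an exhaustive search next to the ANN answer for a sample of queries and reporting the overlap, is simple to state precisely. This sketch uses synthetic data and a faked ANN result; the sampling fleet, Datadog emission, and percentile aggregation are out of scope.

```python
import random

def brute_force_top_k(query, vectors, k):
    """Exhaustive search: the ground truth the sampled queries are checked against."""
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(query, v))
    return sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))[:k]

def recall_at_k(ann_ids, exact_ids):
    """Fraction of the true top-k that the ANN index actually returned."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]

exact = brute_force_top_k(query, vectors, 10)
# Pretend the ANN index returned 9 of the true top 10 plus one stray id.
ann = exact[:9] + [-1]
print(recall_at_k(ann, exact))  # 0.9
```

In a production setup this metric would be emitted per query and averaged per organization, which is what makes the per-tenant recall graphs possible.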
say that, for example, you have an e-commerce site and you're searching for, I don't know, "yellow", and you want to only get things that ship to Canada. That cuts the clusters in different, weird ways, and might end up with a selectivity of 50%; so if you just visit the closest vectors with some heuristic, you're not going to get the true nearest neighbors, because you actually have to search maybe twice as many, maybe three times as many vectors to get the right recall. The query planner, the thing in the database that decides where to go on disk to find the data, aggregate it all together, and return it to the user, needs to be aware of the selectivity of the filter and plan that into the ANN index search. Again, if a database is not really serious about their vector offering, they're not doing this: they're not measuring it in production, they're not willing to show their users, and they don't have the full infrastructure in place to measure the recall. So I'd say we take this extremely seriously, and we don't want our users to have to guess at this. It's sometimes a thankless job, because many, many of the evals that we see against some of the other vector indexes have very low recall, and how are users supposed to know? Running these tests is extremely difficult. It is, and as you said, you need to trust your vendor there. It's basically, as some documentation pages would put it, the floor, the bottom line; if the quality isn't there, then why are you even running this? It's the difference between finding that product with those constraints when it exists, and not finding it, and therefore not buying it, and so on and so forth. That's right, and I think you can never guarantee a recall; you can observe what you are trying to make it be
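The selectivity-aware planning just described, widening the ANN search so enough candidates survive the filter, can be sketched as follows. The scaling rule and the over-scan factor here are toy heuristics for illustration, not turbopuffer's actual planner.

```python
def plan_candidates(k, selectivity, overscan=2.0, cap=100_000):
    """How many approximate nearest neighbours to visit before filtering.
    With selectivity 0.5 and k=10, visiting only 10 would leave ~5
    survivors; scale up by 1/selectivity plus a safety factor."""
    if selectivity <= 0:
        return cap
    return min(cap, int(k / selectivity * overscan))

def filtered_search(query_top_n, passes_filter, k, selectivity):
    # query_top_n(n) stands in for the ANN index: it returns the n
    # approximate nearest ids in order of increasing distance.
    n = plan_candidates(k, selectivity)
    survivors = [i for i in query_top_n(n) if passes_filter(i)]
    return survivors[:k]

# Toy index: the "nearest" ids are just 0, 1, 2, ...; the filter keeps ~50%.
top_n = lambda n: list(range(n))
ships_to_canada = lambda i: i % 2 == 0

print(plan_candidates(10, 0.5))                          # 40 candidates, not 10
print(filtered_search(top_n, ships_to_canada, 10, 0.5))  # 10 even ids
```

A real planner would estimate selectivity from the attribute index rather than take it as an argument, and would keep expanding the search if too few survivors come back.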
here on every dataset, but if you send a billion completely random vectors with 3,000 dimensions and try to query them in turbopuffer, and there is no natural clustering because they're random vectors, you're not going to get 100% recall. And when you send that with a 10% selectivity filter, it just completely breaks every heuristic that's been made. But all real production data, data that people actually want to search, has some natural clustering to it, so that's not a real benchmark you can evaluate recall on. So we always take this seriously, and in POCs and with the monitoring we do, we're looking at these numbers all the time, but there are absolute edge cases that can be very, very difficult. And what you have to do as a database vendor is a tug of war between, we're going to look at more data to try to get high recall, and, we're going to improve the clustering of the data so that we have to search less data. You're always trying to improve the clustering, and you're always trying to improve the performance of the database so you can look at more data to get high recall. Yeah, for sure. You mentioned the challenges with filters, and I don't know if you're aware of those recent ANN benchmarks, the big-ann-benchmarks, which I happened to have the pleasure to participate in. One of the tasks they have is filtered search. I have not participated in that one, but again, as you said, it's kind of academic, though some of the datasets are quite logical, like a hundred-something dimensions and not that huge. That's the thing: there are hundreds of dimensions, like 200. These are not real datasets. No, they are real datasets, but they are from the past generation of vectors, the pre-current, modern embedding era, which just scores so much higher
than these, so we just don't see people use them. Yeah, exactly. It's still fun to participate in that benchmark, by the way, because the data is there and some of the guarantees you need to hit are really high, like thousands to tens of thousands of queries per second, so if you can create a toy index that works, it's just a proud moment, I guess. That's right, but I would say that people don't really care about these benchmarks. They're fun competitions, but I think it can ruin your company if all you're trying to do is maximize these benchmarks, because how many companies in the world are trying to do 10,000 QPS on a billion vectors? Not that many. But there are a lot of companies that have a billion vectors lying around that they want to search, and they just don't want the pricing to be offensive; whereas with turbopuffer you can do this, depending on the dimensionality, for like a thousand dollars a month. That's what people really seem to care about. Yeah, sure. Maybe I can ask you a spicy question, if I may: why do you think some of the vector database players indulge in that game of showing benchmarks and telling everyone, we are the best, and then someone else comes over and says, no, you made a mistake in the benchmark? Why do you think this is happening, like, publicly, if you're comfortable talking about this? Yeah, we don't publish benchmarks against anyone else. In fact, it's usually against the terms of service to do that for almost every vendor, including the big vendors, like the hyperscalers; it probably shouldn't be prohibited for the hyperscalers, for competitive reasons, but anyway. For the peers, I think it's a low blow, because everyone can sort of p-hack their way to something where they come out better, and it becomes mud throwing, and it's a very distracting activity. We benchmark ourselves in the ways that we find our
customers are actually using the database, so we're not doing it at 10,000 QPS, because that's just not what we see against a single namespace. We benchmark against ourselves, we benchmark against first principles, and we're always considering: what is the gap between what turbopuffer does and first principles? That's what I've learned, and that's why I do napkin math, because the fundamental thing you should be benchmarking against is first principles. If there's a gap between what the DRAM or disk bandwidth is, multiplied by how much of it you need, and what your database is actually doing, then you either have a gap in your understanding or you've found room for improvement. That's what matters. Of course it also matters what other people are doing, but what matters the most is what your customers are trying to do, and they'll pull you in that direction. We think this easily becomes one of those metrics where, if you give people a metric, they'll optimize for it, and benchmarks of how many QPS you can do at some recall number are just not what people care about. They care about it working, they care about enormous write throughput, they care about costs, they care about other things that are much harder to put in such a benchmark. I think benchmarks are important in the sense that we need to give people an idea of what they should expect, and they should hold us to that. What I would love to have is more observability in the turbopuffer product of what kind of performance you are seeing. We're working on exposing query plans from turbopuffer, so you can see what's causing the query latency to be what it is. So no, I don't think the mud throwing is great. I think at some point someone's going to publish a benchmark with turbopuffer against something else, and then we'll have to deal
with that as it comes, right. But it's certainly not an activity that we plan to engage in. Yeah, I love your answer, because it also resonates with me in a different dimension, you know. I found myself in a situation, at some point in the past, when we've been copycatted, if I can say it that way. So there was a company that literally copied the whole interface, like how the product looks, and we felt threatened. But what they couldn't copy is essentially the internal IP, right: all the algorithms, everything we've spent hard working time on, you know. They couldn't copy that, and effectively that doesn't fly by itself, right? So basically what I'm trying to say is that even though it felt threatening, still, thinking about what you need to solve, right, by the laws of physics, you really need to focus just on that, and if you solve that, you become the leader of the market. And that's what happened to the company, actually: the story was that it actually acquired this copycat, right, and that's it. I mean, it doesn't mean that's a bad outcome for either of them, but what I'm trying to say is: just focus on that thing that you're trying to solve, and don't indulge in these games of, like you said, mud throwing and stuff. I like that. Really well said. Yeah, so we focus on customer studies, we focus on first principles, we focus on benchmarking, and we focus on what customers are telling us that they need, and I think those are the right things to focus on for our company, for sure. And just looking at the clientele, right, the ones that you shared: just knowing those names, Cursor and Notion, that everyone is pretty much using every day, that's like a testament to what you've done. I also wanted to ask you, before we close, about what are maybe the technical or business or
some other challenges that you see ahead of yourself, or maybe that's already happening and you see that it's important, especially in this space of LLMs, where LLMs can bring value. What is it? It feels like you have been wildly successful as a business and as a technology, but is there something that you see as still unsolved, ahead of you, and worth solving? I'll go back to, you know, I spent a long time at Shopify, and part of growing up at that company from when I was very young was being taught a bit in the school of Toby Lütke, the CEO. And something that he often said about himself is that you have to grow to keep up with the business, and that's what it is for me as well, right? I first had to grow as an engineer to put out the first version. Then I had to build an engineering team to take it much further than I ever could alone, and I think we have just an absolutely, like, 99th-percentile engineering team now. Then I turned my focus to sales and learning that, and now I have to turn it towards marketing, towards legal, towards all these different things to build the company. We spoke a little bit about this before, about, I think, one of the beliefs that we have, which is just the talent density of the team. I think that a lot of people talk a lot about talent density, and I think that there is now a generation of companies that's really trying to do it. I think that with the tools that we now have available to us, and especially the kind of tool that we work on every day, the floor for productivity has been raised, but the ceiling has been raised far more. And so what really matters to us is having a team of individuals where everyone is a player, right? We see these teams today as almost more like sports teams than how companies were originally built, and I think that we hold that as a strong belief in how
we are building the company. But it demands a lot from everyone to work in this way, though it's very fun, and I think that the growth that that embodies in everyone, including myself, is important. I have to keep up with that; I have to keep up with the demands of how our customers and our team internally and everything grows. And that's the biggest challenge: just the amount of new that has to be learned so that we can become a successful company, which is important for me, for our customers, and for everyone who's chosen to come along for the ride and join the company.

Oh, that's awesome. Yeah, this field changes so quickly. It felt much slower when I was coding myself, you know, Java, Lucene, all that stuff. You had, like, Solr, Elasticsearch, and that's it, for a long time, and then a lot of new engines popped up, especially when vector search appeared on the scene. But now, with the LLM advancements and all of that, it just feels so crazy. So yeah, it's a very interesting challenge for sure, you know, personally and business-wise and team-wise. And keeping balance is another one.

I think the pace that we see everyone running at now in the successful companies is beyond anything that I've seen before. It reminds me of just the months leading up to Black Friday at Shopify, but it's all the time, and I love it; I'm addicted to that pace. And I think that we have created a team of people who seek intensity, and that's exactly what we think we need to create the right product, at a pace that makes sense for our customers too, so that they are never bottlenecked on us. And that is what keeps me up at night.

Oh, that's awesome, that's great. We usually end with some sort of announcement. Anything you want to say to the audience? Especially now that you said that you want to go deeper into marketing, it is your chance. Anything that you want to share, or call for?
I think we've refrained from doing any large releases, and we try to just ship as rapidly as we possibly can. If I look at the changelog this month, I mean, we launched two regions, Singapore and Canada, we've added the float type, we've recently added clients for Java, Go, Ruby. One of the things that I think is really exciting is our conditional writes, and this is where turbopuffer is not just a bunch of files on S3; it's not even just a search engine. It can do conditional writes, where you can say: hey, I only want to replace this document if it's newer than the old version, right? These are real database features. And things like patch writes, where you do a partial update. But we just launch, and then we put it on X, and we move on. So I don't have any big announcement. We went GA a couple of months ago; it would have been natural to make a big deal of that, but we just try to ship an announcement, get it out there as soon as possible, move on, and ship the next thing.

Yeah, congratulations on your original launch, and on GA; I think that's a big milestone as well. And as you said, you're probably not as ceremonial anymore: you just keep shipping, and you follow what the customers need. But sometimes some of these things may go unnoticed unless, you know, people follow exactly what you do, and so in that sense I feel like there is room, or a stage, for saying: hey guys, go use it, it's GA, right? Run your benchmark.
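As a rough illustration of the conditional writes described above (replace a document only if the incoming version is newer), here is a minimal sketch. The class and method names are hypothetical and this is not the turbopuffer API, just the compare-and-set idea:

```python
# Hypothetical sketch of a conditional (compare-and-set) write: accept a
# document only if its version is newer than what is already stored.
# Not the turbopuffer API, just an illustration of the database feature.
from dataclasses import dataclass, field

@dataclass
class DocStore:
    docs: dict = field(default_factory=dict)  # doc_id -> (version, payload)

    def conditional_upsert(self, doc_id: str, version: int, payload: dict) -> bool:
        """Write only if `version` is strictly newer than the stored one."""
        current = self.docs.get(doc_id)
        if current is not None and current[0] >= version:
            return False  # stale write, rejected
        self.docs[doc_id] = (version, payload)
        return True

store = DocStore()
print(store.conditional_upsert("a", 2, {"text": "v2"}))  # True: new document
print(store.conditional_upsert("a", 1, {"text": "v1"}))  # False: older version
```

A real system would evaluate the condition atomically on the storage side, which is what makes it a database feature rather than client-side logic.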
Now that I think about it a bit more, one announcement might be that early in turbopuffer's history we were very focused on doing many namespaces that were small, but we are getting very good at large namespaces now. We have customers that are searching billions of vectors at once, and we have customers that want to search hundreds of billions of vectors all at once, and we are working with them on that. And this is not particularly scary anymore; we know exactly what we need to get there. So if you have use cases of that caliber, you may have passed by turbopuffer before, but we're getting ready, and we are ready for hundreds of millions and billions at once. The only limitation there is really just the size of a single machine, and then we shard over them. But, going back to the sharding we talked about before, you need to make every shard as large as possible to get the best economics and the best performance, and that's been one of the issues with some of the traditional search engines.

Yeah, for sure.
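The first-principles "napkin math" mentioned earlier in the conversation (bandwidth multiplied by how much data you touch) gives a quick floor for what searching a namespace of this size must cost. A toy sketch, with illustrative numbers that are assumptions, not measurements:

```python
# Napkin math: the lower bound for a brute-force scan over N vectors is
# (bytes touched) / (DRAM or disk bandwidth). Numbers are illustrative only.
def scan_floor_seconds(n_vectors: int, dims: int,
                       bytes_per_dim: int = 4,          # float32
                       bandwidth_bytes_s: float = 20e9  # ~20 GB/s, assumed
                       ) -> float:
    bytes_touched = n_vectors * dims * bytes_per_dim
    return bytes_touched / bandwidth_bytes_s

# 1 billion 768-dim float32 vectors streamed at ~20 GB/s on one machine:
t = scan_floor_seconds(1_000_000_000, 768)
print(f"{t:.0f} s")  # ~154 s: why indexing and sharding matter at this scale
```

If the database is far from this kind of floor, either the mental model is wrong or there is room for improvement, which is the benchmarking philosophy described above.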
Yeah, I really enjoyed the convo. I know we could have gone into so many topics; I really wanted to ask you also about ANN algorithms and stuff. But I feel like we could talk more later as well, you know, down the road, as you guys are progressing; hopefully you'll be open to that. I've learned a ton, and it's a very interesting design that you have, and the whole journey of you pushing for four months, you know, uninterrupted. I hope you have now regained some of the balance back in your life, now that you have the team supporting you. But I really enjoyed this conversation, Simon. Thank you so much for your time. Thank you, Dmitry.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md b/transcripts/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md
new file mode 100644
index 0000000..5076243
--- /dev/null
+++ b/transcripts/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md
@@ -0,0 +1,201 @@
---
description: '

00:00 Intro

00:21 Guest Introduction: Eric Pugh

03:00 Eric''s story in search and the evolution of search technology

07:27 Quepid: Improving Search Relevancy

10:08 When to use Quepid

14:53 Flash back to Apache Solr 1.4 and the book (of which Eric is one author)

17:49 Quepid Demo and Future Enhancements

23:57 Real-Time Query Doc Pairs with WebSockets

24:16 Integrating Quepid with Search Engines

25:57 Introducing LLM-Based Judgments

28:05 Scaling Up Judgments with AI

28:48 Data Science Notebooks in Quepid

33:23 Custom Scoring in Quepid

39:23 API and Developer Tools

42:17 The Future of Search and Personal Reflections

Show notes:

- Hosted Quepid: https://app.quepid.com/

- Ragas: Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines: https://github.com/explodinggradients...

- Why Quepid: https://quepid.com/why-quepid/

- Quepid on Github: https://github.com/o19s/quepid

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20240626_010626_075b8a8d662d3fbf1946ef06b8218efa.png
pub_date: Wed, 26 Jun 2024 13:42:56 GMT
title: Eric Pugh - Measuring Search Quality with Quepid
url: https://rss.com/podcasts/vector-podcast/1539938
---

Hello there, Vector Podcast, season 3. In this season I made one simple promise: I will try to stick to 30-minute episodes. Let's see how well I do. It's not always easy, especially when you have guests like Eric Pugh, whom I'm really having the pleasure to talk to today.
I can say that we've been working together on Quepid, on ideation, on things. And I've learned a ton from you. Yeah, yeah, I'm super excited. So when did I come visit you? I think it was two years ago or so. I think so. It was the pandemic, I guess.
Yeah, it was the very end of the pandemic. Right. I remember getting my... yeah, so it was still the pandemic. It was still the pandemic, right? Because I had to get a COVID test. So yeah, I was like, I want to meet you in person.
And I called you and said, I'm going to come to Helsinki and visit you. And I think you were like, why? I mean, we don't work together. Or, well, we worked on Quepid, though. Quite a few evenings together, right? I think it was like nine o'clock your time, Helsinki time. Yes. Yes.
And it was Friday. I remember vividly, Friday. What else to do? So I went camping with my family. Can I screen share? Yes, yes. Of course. You can give me permissions. I went camping with my family.
And if you think back to my visit to you, you and your wife gave me a little gift. Yeah, give me the screen share. I can only do it as host. Let's make you a host, and you can screen share. Yeah, I can. All right. And so I just wanted to show off this cup. Oh, there it is.
So we've had that little wooden cup. I think it's a traditional Finnish drinking vessel for when you're out in nature. And there it is, with coffee.
And then I'm also showing off my enameled metal cup that I picked up at OpenSearchCon EU a couple weeks ago in Berlin, which Zeta Alpha shared; I had some great conversations about search relevancy and measurement with them.
So we took these two cups on our family camping trip the other week. I wanted to show those off to you. This is lovely. This is lovely. And I'm glad you're putting this to good use. Yep. It goes with us. So, fantastic. Yes. So, where do we start? First of all, hello. Welcome. Welcome. Yeah.
Thank you very much for having me. It's long overdue. And usually we start with a little bit of a background. Obviously, people can go look you up. I think you even have a Wikipedia page about you. I think so. I don't know. That is a lot, right; it takes a lot to get to a Wikipedia page.
I don't know that I'm quite there yet. Yes. So my name's Eric Pugh. Been doing search for about, I don't know, we're getting to like 15 years. And I was there when search was, like, at first, "oh, you have your own search engine", it was very exotic. And there was nothing open source.
It was all commercial. And then I cut my teeth in search going through the big data time period. Right. When, as Grant Ingersoll said once, search is the UI to big data. And it was all about data:
can we handle it, and how do we store it and scale up our search engines? And that was great, and it kind of led into the machine learning time period. Really, at that point, it was like, OK, we have lots of data. We can now search it.
What does it mean? What are people looking for? It wasn't enough to have fast search with 10 blue links.
It all of a sudden became really important to ask: am I giving my users what they want or not? And machine learning and data science really came along and helped us make those determinations.
So really, that's when Open Source Connections, the company I was one of the co-founders of and am one of the leaders of, really started focusing on the value side of search: relevancy.
Am I giving people what they're looking for? How do I drive more revenue in e-commerce? How do I help people use my SaaS products? Are they subscribed, and do they renew their subscriptions? All of this, right? And yeah, machine learning was awesome. Data science was awesome.
Really got into a whole measurement thing. And one of the products that I stewarded, Quepid, through which we know each other, came out of that time period, because we said: why are we building custom tooling for every project? Maybe we could share some things.
So, and then yeah, today it's really been exciting to see generative AI come along, and vectors.
And it's interesting, because, you know, for a little while I was like, is search still going to be a domain? And you know, search has totally changed, but it's still how people interact with systems, right?
Whether it's a bot and retrieval augmented generation or a more traditional keyword search, using LLMs, using models, using vectors, there's still a search engine in the middle of it, mediating, moderating that conversation.
So really excited about what Gen AI has let us do. And I think my big takeaway right now is that historically search was fairly mediocre. You could make it a little better, you could make it a little worse, but people understood it; it was fairly explainable.
Why I'm really excited about measurement and understanding these days is because now, with Gen AI, we have much better tools. We don't have to have mediocre search, kind of better, kind of worse. Instead we can have amazing, accurate search results that really understand what you're looking for.
And you're like, yes, this is exactly what I wanted.
But the flip side of it is, sometimes those search results are batshit crazy, and you know, you have no idea why it came back with that, and it makes you lose trust.
And so now, instead of all search results sort of being in the middle, yeah, a little better, a little worse, we're now really polarized. Sometimes they're amazing, sometimes they're terrible.
And we need to understand what that curve looks like and make sure that the amount of terrible is something that we're willing to deal with, right?
Terrible results one in 10,000, one in 5,000, one in a million; depending on your domain, it may need to be one in a billion that's terrible, right, depending on what you're doing.
So exciting times, really exciting. Yeah, it's an amazing story. And of course, I'm very pleased to also have been able to pick up Quepid with you early on, where I tried to pioneer it two companies ago, when I was actually leaving. But it was almost ready.
And then at the next company I actually deployed it. And I remember we generated 70 JIRA tickets just by looking at queries in Quepid, because you know how it usually goes.
People develop software, other people check on it, other people are just project managing and things like this, and no one really takes the lead on looking at the queries. And this is actually the most fun sometimes: to look at queries and sort of, you know, investigate what's going on.
Do you even like these results? How do you feel about them? You know, let alone setting up a team around it, where some annotators can actually go and label with some domain expertise, or maybe pretending to be users, and things like this.
So it's an amazing system, and we continue to use it today. Of course, this was the first thing I pioneered at TomTom, and it's still there. It's fantastic, that is wonderful. I mean, it's been great to see the adoption of the product, and people have been using it for a long time.
So I'm going to show a query set today that is a thousand queries, and maybe a thousand queries that have been judged 10 deep, right, by hand, for three years. Almost four years.
This one organization, the Nerry Information Network, has been using Quepid for years, and now they've built up this massive body of ratings, and they have tons of data and trend lines for: what did search look like four years ago? What did it look like last year? What does it look like today?
It's really been exciting to see them.
They've just been using the little hosted Quepid at app.quepid.com, and it's worked for them. So a thousand queries definitely takes a long time to work your way through. But these days they're just kind of keeping an eye on what's changing, right? Barring a major algorithm change,
it's just sort of staying on top of it and keeping everything right. But yeah, it's really exciting to see people using it. Yeah.
Definitely I'm having a little bit of thoughts about: where does Quepid live in our generative AI future? I've been playing a lot with tools like Ragas and some of the other ones, right? And it's interesting to see what tooling exists, where does Quepid do things well, and where does it have challenges?
Where do we want to go with it? So yeah, for sure.
And for those who don't know Quepid, I mean, I can give my short intro, but obviously feel free to augment. The way I see it is that, basically, instead of hearsay, and someone saying "your search doesn't work, and here is one anecdotal example",
what you can do is, vice versa, you could say "I improved search,
and here is one anecdotal example where it really shines", right? Now, should we ship it? So basically, I think Quepid really gives you the tooling, and you can actually, if you want, do it in as unbiased a way as possible, where you will do blind labeling in some sense, right?
So I've done it actually just recently.
And basically you allow your users, well, your domain experts actually, but maybe even developers, to go label queries.
And it also has this sandbox where you can plug in your own engine, but you can also plug in those standard engines like Elasticsearch, Solr, OpenSearch and others.
And I think you even added some vector search engines recently, right? Yeah, so we have Vectara, which is a pure vector search engine.
And then OpenSearch, Elasticsearch, Solr, the Lucene-based search engines, and then, kind of exciting, you can also now plug in your own search API. And so you can just talk to any API, a RESTful GET/POST JSON sort of API, and use Quepid as well. So that's been really good. Fantastic.
I love this. Why Quepid? This is sort of the origin story. Doug Turnbull, who many of you may know, right, from his book Relevant Search, created Quepid. And we're looking at, like, a decade ago at this point.
And it was because, you know, it was difficult to measure and improve search, right? Lots of spreadsheets going back and forth, lots of conversations. You fix one thing, break another. And Doug and Arena were working together on a project, and that was literally the origin story for Quepid.
So Quepid's all about making collaboration better, making your testing more accurate, and making things go faster, right? Because we need to iterate and experiment quickly, right?
The one thing I know is that the team that can experiment quickly and effectively is the team that's going to win out, right? It's not about specific technology choices or technical expertise.
It's experimentation: can you do it quickly? So yeah, Quepid.com has the advertising-free hosted version, really excited. It continues to be useful in today's world. Absolutely. And it's also open source, right? So you don't have to buy anything, whatever.
It used to be a product, though. It used to be generating revenue.
Yeah, I mean, you told me. We're consultants, you know. So yeah, we used to sell it. We used to sell it for $10,000 a year for an enterprise license. And we had customers, and it was great.
But I think then we figured out we were making, I don't know, $80,000 a year, which sounds like a lot, but then investing $150,000 in salary supporting it. And it was like, yeah, we're not a product company. And we are Open Source Connections; having a commercial product just didn't fit naturally.
And since we're all about training our clients and empowering search teams, right? It doesn't necessarily feel empowering to be like, yes, we've empowered you, but you have to pay us money every month for this one product, right? It just felt more natural to have it as an open source project.
Yeah, absolutely. And it also fits your, well, how should I say, your professional line. You're a Lucene and Solr committer, right? Yeah, so I am a committer, though not active on Lucene; that's just a level of technical expertise. But I am a committer on Solr.
And then, as an interesting personal professional development, I've gotten much more involved with the OpenSearch community over the whole year.
And so I'm now a, they call it a maintainer instead of committer, but I am a maintainer for OpenSearch documentation, which has really been a lot of fun to work on. And we'll talk about it maybe in another podcast, but I'm contributing some new features to OpenSearch, the open source product.
So really excited about that. So, actually, give me one second. I have one thing to confess, one second. I have to confess, or share, one personal bit: when I started in search, of course, it was early, it was about 2003, when I wrote my own search engine.
But when I started doing search in the industry, right, it was 2010. And it was Apache Solr. And when you Googled Apache Solr, you would mostly find Javadoc. Yeah.
And maybe, and then I figured out there is also a mailing list. I was like, but is there a place where I can read about Solr besides wiki pages? Because the wiki pages were not kind of complete, in a way. Yep. Yep. I was like... and I found this book. Oh my gosh. 1.4. Yeah.
Enterprise Search Server. Yes. Yes. Yes. And I read it cover to cover. I have to say it, because I had one challenging task: I had to build an autosuggest, and that autosuggest had to abide by certain rules.
And I was like, oh my god, how will I do it? And the moment I did it, it was also slow. So I had to figure it out on our data, on our version, our model of data. Right. Oh my god, this was so exciting. I was going back and forth between the book, and then a bit of googling, and then trying things.
Ah, yes. Fantastic. Wow. Thanks for doing this. So you're also the author. You're also the author. Yeah. Yeah. So we did that book. We did a second version of it for an updated Solr. But that was quite a few years ago.
I am kind of curious what's going to happen with technical books. Right. I mean, in the Solr community, we've got the ref guide, which is, I think, pretty darn good considering how it's written. I do sort of wonder what the future of technical books will be with open source communities.
And what do we do? So maybe, like, cookbooks, you know, where you have specific cases, and, like, how would you go about building these things, and maybe real data, so people can actually try things, right? Yeah. Yeah. I mean, it has gotten a lot easier to publish on the web, right? Yeah.
Have something. But yeah, you know, I think a lot of people write a book sort of as a rite of passage as well. Right. So a book is a little different thing from writing a reference for an open source project. Right. For sure. How to make them printable,
so you can say "I wrote the book" for this open source project. But we'll see. Yeah. That's exciting.
But you also wanted to show something. Let's demo. I'd love to. Yeah.
So we touched briefly on Quepid, right? And I'm one of the stewards of the project. And historically, for those of you who've used Quepid in the past, the way it has worked is... I'll just bring up my localhost. Here we go. Right.
So, one of the things that we've added recently... this is the development version. So a user with realistic activity in Quepid is who I've pulled up, and I've got a couple of cases here. But you know, in Quepid, it works well... I'm going to bring up a case, right? Here's a case.
I'm going to search for milk. I did a query for milk. This is using sort of a random data set here. It's backed by a Solr search engine. You can see right there, there's my search engine. And Quepid works great for a
relatively small number of queries, up to hundreds, right? And one of the things that we found is that this interface works well, especially if the search engine is super fast and responsive. But this is a rich single-page JavaScript application;
it's making queries in real time to a search engine. If you have a thousand queries, like the people I mentioned before, it takes like 15, 20 minutes to load up and for all the queries to be run.
And we know that lots of people want to run more queries, 5,000, right? When people ask "how many queries should I be measuring?", I'm like, well, start out with what you can. If that's 25 or 50, that's better than zero. Think about 200, maybe 300, maybe 1,000, 5,000.
Right? And then above 5,000, that's sort of only for the most sophisticated teams. But Quepid kind of tops out at maybe a thousand queries. And so we've been doing a lot of work to think about how we support larger data sets, right? Larger query sets.
And what's been really fun is to work on introducing background processing, right?
Instead of everything being limited by the request-response cycle of your web browser, what if we can run some background jobs? And so I'm just going to show really quick. I'm going to go and bring up all the books.
And I've got an import feature. So we have an exported book, book export 39. It's a 62-megabyte JSON file. So, 62 megabytes. And I'm going to go ahead and click upload. And now in Quepid, what we're starting to do is we can take large files, JSON files predominantly, and we store them in the background.
And we kick off a process, a background job. And there you can see, right? There we are loading a whole bunch of queries, right? And these are all sort of scientific queries, some very complex ones and simpler ones.
And you can see it's going to take a while, because this had, what, 28,000 query-doc pairs, right? So those are being loaded along with their judgments. But what's sort of fun, with the new background jobs and using WebSockets,
is that we're also able to push updates to you as background jobs are happening inside Quepid. So right here, there we are, and we are loading a whole bunch of data. Now, yes, it would be nice if it was a Parquet file, not the MySQL database that we were using.
So we'll have to think about some of those things. But this is starting to open up the door to moving larger data sets and being really comfortable with that sort of 5,000 queries, 50,000 query-doc pairs kind of data.
We're not yet going to manage the 100,000-query or quarter-million-document data sets,
but we're at least scaling it up to get a broader set. The other thing that I'm also excited about is we're getting closer to being able to run these analytics on a regular basis, right? Now that we have some background processing, we could think about every night.
So these little charts here that you see, that are showing some basic scoring information: you could start using this to monitor over time, instead of having to roll your own dashboarding tools. Yeah. So that's something I'm really excited about. I'm also going to point out two PRs.
So github.com/o19s/quepid is the open source project, and there are a couple of pull requests that are in progress, but we're looking to land them soon. Right. Here is pull request 976: imagine if we could run thousands of queries nightly in Quepid,
now that we've got background jobs working and communicating state with the user, right. This will be coming pretty soon. Pretty soon in open source time, which means, I don't know, we'll see, the next few months. It depends on bugs, people helping and testing. So this one's super exciting.
Let's go back and see how we're doing. Yep. So there we go. We're up to 4,968 query-doc pairs as we kind of count up. Yeah. Yeah. This is all through the magic of WebSockets, which has been really cool to see.
And as you are loading this here, are you also executing it against the search engine? Or is it all static data here? Yes, static data. A book represents the query-doc pairs with all of the data, whereas the case is where we do the real-time querying. Now that we have this one working,
once we have this PR, then you'll be able to run a background job in Quepid, with a similar counter, maybe up here next to one of your cases, that says: we're running queries, 5,000 queries, this is our progress, and this is the number that errored out.
But of course, for listeners to understand, what takes time is basically also inserting this data into Quepid's database, which is MySQL, right. And Redis, I guess, or have you stopped using Redis? I'm not sure. So we're actually using MySQL as our database.
However, what manages this communication, the WebSockets, is all in Redis.
So the way that our background jobs and our front-end jobs in our web browsers keep track of each other is through Redis. Yeah, so, you know, I'm running localhost, so you won't see it, but everybody who is connected, who has permissions for this book, everybody would be seeing these messages. Yeah, yeah. So it's kind of broadcasting to everyone who has access. Exactly. Exactly. So that's something I'm really, really excited about.
The other thing that I'm really excited about is LLM-based judgments, right? So you kind of started out this conversation about using Quepid with human judges, annotators, right, and gathering high-quality data. But as we all know, human judgment is expensive. Not every organization can do it.
My colleague Scott Stoltz last year did some interesting work playing around with ChatGPT, when it first came out, to evaluate: do this query and this document match?
And then we've been working with Moody's on their RAG solution and using what we've been calling Judge Judy, an LLM, to evaluate. What that lets us do is this: we're using a small set of human judges to validate our LLM judge, Judge Judy.
And if we have good correlation, right, if our inter-rater reliability looks good, you know, Fleiss' Kappa, Cohen's, all those metrics look good, then this gives us confidence to go ahead and scale up the judgments, right, using an LLM.
Today, that is a bunch of pandas notebooks and kind of custom code. However, the other pull request that I'm really excited about, right, is this: meet Judge Judy, she is your AI-powered subject matter expert. Right.
And so in the not-too-distant future, you will be able to... let me go ahead and bring up this case, right. Here we have one person who's been the judge, but soon you'll have a second column next to it, Judge Judy, right, using whatever prompt you've typed in, right, or provided, judging.
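The validation step Eric describes, checking that an LLM judge agrees with a small set of human judges before scaling up, can be sketched with a hand-rolled Cohen's kappa. The labels below are toy data, not real Quepid judgments:

```python
# Cohen's kappa: agreement between two raters, corrected for chance agreement.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters gave the same label.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label distribution.
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    return (observed - expected) / (1 - expected)

human = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = relevant, 0 = not relevant
llm   = [1, 0, 1, 0, 0, 1, 0, 1]
print(cohens_kappa(human, llm))  # 0.5: moderate agreement
```

In practice you would likely reach for `sklearn.metrics.cohen_kappa_score` or the notebooks Quepid ships rather than hand-rolling this, but the idea is the same: only scale up LLM judgments once agreement with humans is acceptably high.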
+So that's the other big "how do we scale up Quepid and make it relevant in our gen-AI world," right — those are sort of the two big things. This is fantastic. This is really fantastic. Yeah. Wow. +I hope these PRs will land really soon, especially the LLM one, right, because this allows people to really quickly hit the ground running and start labeling. +Well — someone will label in a way. But exactly, exactly — the trick is having the right prompts, right, and having the right set of positive examples and negative examples, right. +But one of the things that we're working on — so Quepid, right, ships with a set of data science notebooks; they need a little bit more work. Let's see if this comes up in my dev version — I don't think we ship that there, so I'm going to switch to the production Quepid. Yeah, no worries. And notebooks. +Oop — they're loading up. So in this examples folder, we're actually shipping a couple of notebooks for you to use: Fleiss' kappa, Jaccard and RBO comparison, multi-rater analysis, right. These notebooks here you can directly use with your Quepid book of judgments to evaluate how we're doing overall. +And so this can let you take your human judgments, understand how good or bad they are, and then, when you bring the LLM-powered judge in, compare the LLM judge to what your human judges were doing and feel some confidence. +So I'm really excited to be shipping these, because I think it's going to lower the barrier to getting judgments. And that's something that a lot of search teams are like: I would love to use Quepid, I would love to do this, +but I can't do any of this until I have judgments, and I don't know where to get them, or I don't have the domain experts that I need, right. +And you know, search-oriented organizations often have that figured out, but a lot of other teams are like: we just have a search engine that works, you know, reasonably well, and we don't have that. So we've got to lower the barrier to getting judgments, and I'm excited about this.
+This is fantastic, but I can also add from my personal experience, you know — yes, you're absolutely right; sometimes there is even friction, right. +The search engineer says: no, I don't want to label, I'm a search engineer, I'm developing the algorithm. But they would get so many more insights if they actually labeled. +And in our teams, you know, if you have, I don't know, 10 people, and each labels 10 queries, you will have 100 queries labeled. +That is, if you don't go for overlapping and stuff like that — if you do, then yeah, it's another story. But you know, then all of us all of a sudden get all these insights, right. +Now the LLM thing can actually help you scale this, right. And then of course all this prompting — and in Label Studio, by the way, they have released a capability, maybe something to think about, where an agent will learn from user feedback, right. +So let's say they label — the LLM will label, make some mistakes, and then the domain expert will correct them, and it takes that in as feedback and becomes better over time. +So basically you kind of support it — it's like, not a copilot; someone in a previous episode called it a "confidant." So you collaborate with these things in a way, right. This is a fantastic direction. Yeah. +So I mean, this is definitely very much around that more narrow relevance judging, versus the generic labeling play that Label Studio is, right. But there's definitely room for inspiration from both Label Studio, which I've been looking at more, as well as Ragas and how it's doing some of the new metrics. +Yeah, it's interesting. So yeah, exactly. What I love about Quepid is that I can really connect it to the live search engine — I mean, not necessarily production, it can be some development version of it — +and I can start labeling and querying. And as you said, search is the interface to big data, right.
So Quepid becomes the interface to your search, which is the interface to your big data and all the unknowns there. +And then — that one looks terrible, that one looks OK, that one looks OK, that one looks terrible, right; yeah, maybe that one's OK, that one doesn't look it, right — we can immediately start building some sort of understanding right away. +So a quick little binary rating, right — we're going to start building that and get a sense of what our score is going to be. Exactly, exactly. And that score is also customizable. We've done some implementations in a JavaScript-looking language, right — I think it is JavaScript. +Yeah, you just come in here and you take your score — so here's classic NDCG@10, but you can change it. Like one recently: we wanted to know — in this score right here, we're being, you know, penalized because "soy" is returning zero results, and so it's giving us a zero. +So it's bringing down our average precision. But what if you wanted to know that it was supposed to be zero results — zero results is actually right? Yes, yes, yes. So for that one we actually went in and added a per-query option: should be ZSR — zero search results. +Right, and we set that option, and then in our custom score, if "should be ZSR" is true and there were zero results, then we gave it a one, right, because it's working the way it should — and vice versa. +We had other situations where, yeah, if we started returning results for "soy," that would have been worse search, right. And so, yeah, that was a great use of a custom score. Yeah, fantastic. We're all about that. Yeah, yeah, exactly. +Also where Quepid helped us: sometimes you don't speak the language. It could be Korean, which you don't speak, but you need to move on. And on one occasion we sent a Quepid case to our Korean native speakers in the company, and they labeled it and told us how it looks. That worked.
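Quepid's custom scorers are written in its JavaScript-like scripting layer, so the following is only a language-neutral sketch (in Python, with hypothetical names) of the zero-results rule described above — reward a flagged query for returning nothing, penalize it for returning anything:

```python
def score_query(num_results, should_be_zsr, base_score):
    """Score one query, honoring a per-query 'should be ZSR' flag.

    should_be_zsr: annotators said this query SHOULD return zero results
    base_score:    the normal relevance score (e.g. NDCG@10) when results exist
    """
    if should_be_zsr:
        # Returning nothing is the correct behavior here:
        # full marks for zero results, zero for any results at all.
        return 1.0 if num_results == 0 else 0.0
    if num_results == 0:
        # The query should have matched but came back empty.
        return 0.0
    return base_score

# e.g. the "soy" query flagged as should-be-ZSR, correctly returning nothing:
print(score_query(num_results=0, should_be_zsr=True, base_score=0.0))  # → 1.0
```

The point of the design is that the flag lives on the query, not in the scorer, so one shared scorer can handle both kinds of queries across a whole case.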
+So the other thing — here we are, we are happily loading up all these, but I'll go ahead and click Judge, right. So this is sort of an older approach to rating: a zero-to-ten scale. I wouldn't do that now, but that's what we've been using. +So there you can see — here is the human rater interface. Now, I don't know what a good rating is or not here, but you can rate the documents. And this is a recent add-on: what if you can't read it, right? +I am not a vet and don't understand the science, right. And so I'll hit "skip judging" on that, and that, and that. That's been helpful in just cranking out your human judgments. +Yeah — we've got just a couple of judgments, mostly Jeff; yeah, Jeff's got 2,500, I've got four in here, and I marked one as unreadable. I can reset that, which should throw it back in the pool, right — maybe we have a conversation about why it was unreadable and then throw it back in the pool. +Almost there, we are almost there, right — the background job is wonderful, but it doesn't necessarily mean it's any faster. But now at least I can watch it, watch the countdown. +Fantastic demo. Can you tell a bit more about the tech side of things? You mentioned MySQL and Redis already. If someone wants to jump in and start contributing — what is the level of effort they need to go through? +It's a little bit of a challenge, right. So this is a Ruby on Rails web app, right — like, it is a full-stack web app, and it's all just standard Ruby on Rails.
+ It's been upgraded over the years to the latest standards, so if you do Rails development, everything's going to feel very comfortable. Obviously a lot of us in the search and information-retrieval world don't have that expertise, and that's just a challenge. One thing I will say is that if you join or ask questions on Relevance Slack, in #quepid, I'm happy to answer those questions. The core application that you play with in here — +it's an old Angular 1 app. It works great, no problems, but it's an Angular 1 app, and because it's an open source project, not a commercial product, we sort of stayed away from attempting the big rewrite to update it to React or name-your-thing. It seems to work fine. +So Quepid is an Angular 1 app for all of this, and then outside of this it's all just a standard Rails application — lots of model-view-controller-type screens that you can see right here, all standard Rails, MySQL database, Redis for sort of the communication layer. +And it's all built using Docker. So the README has way too much developer-centric setup, right — but if you have Docker, then you run bin/setup_docker, and that will set you up with the development environment; literally what I was just showing is running inside Docker. +And then you fire it up locally with bin/docker server, and that runs it locally. So there's a lot of docs in here for all the different parts — it can be a little overwhelming; I think we have to rework some of this documentation, but it's all there. +Now, a couple of things I'll show off: we actually have an API now. So right here, you come here and you generate your personal access token, like that — and just for fun, this curl command will show you your user. So we have an authenticated API. +And we're slowly working on documenting all of those APIs.
+So there we go — we're slowly documenting all the APIs. And one of the things that I encourage people to consider, right, is: maybe Quepid doesn't do everything you need to do, and so you're building some scripts outside of it, or some notebooks — but you can use Quepid as your shared source of truth. +So maybe you have a case that represents your golden set of queries, right — you, in your notebook, can go and grab all the queries. And so we're adding more and more documentation on all of these different APIs. That's fantastic, yeah. +To make it a little bit easier for people to understand: so I can look at this — here's case four — but I can also look at it like this. That's JSON, right? +This should give me back my JSON, right — it's all my JSON data, right, there are all my different scores, etc. So if Quepid provides value but doesn't do everything you need to do, you can read right from it. +We're going to do a lot of export and import functions as well. So yeah, it's fantastic. And yeah — it's loaded, it's loaded. There we go: 29,087 query-document pairs and 29,291 judgments, right. So it's all preserved. +There you go. This is fantastic — thanks for the demo, Eric. I learned a lot, because I wasn't keeping up as closely. I think we are also running an outdated version of Quepid, so I will ask the team to upgrade, obviously. +Yeah, we should — yeah. The release cadence is fairly fast, so make sure your deployment model is pretty simple and automated, so you can keep up. Yeah, exactly. +This is fantastic. I'm sure we can talk more about your other projects, and we will save that for another episode. But I was also thinking — I like to ask this question, and now I get the chance, yeah.
+ The question of "why," I call it — the motivational one. What keeps you up at night, so to say? Why are you still in search, Eric? You've spent so many years — do you think it's still unsolved, or what keeps you going in this topic? Yeah — so what I love about search is that it kind of reinvents itself every — + I mean, five to seven years. It sort of reinvents itself every seven years, right. I sort of started out — at one time it was exciting just to have an open source search engine, right, in a world of big, expensive commercial search engines, and then it was really exciting to get into big data from the search perspective, +becoming a data scientist, right — I mean, I pretend to be a data scientist, I pretend to be a machine learning guy, right, through search. So it reinvented itself, and now I'm a prompt engineer and generative-AI person, through search. And so I love that the field reinvents itself. + But also, certain long-standing principles around measurement and experimentation appear to remain relevant, even though it reinvents itself every seven years, right. And it's been really, really exciting — I like that what I'm doing now is not what I was doing seven years ago, and I suspect I won't be doing it in another seven years. +And I like making things happen, I like solving problems, and search remains sort of the way people interact with technology systems, right. I am really intrigued by, and looking forward to, when +search isn't just "I ask a question, get a response," but I ask a question, get a response, then I have another conversation, and the search engine understands that. We talk about search as a conversation, but +we don't normally do that — we just pretend it's a one-shot kind of thing. I look forward to that side of things. And then: what are the new use cases we're going to enable? Right — I'm going with my family to Spain for three weeks.
+ In July. Got my plane tickets booked — I got not-great flights, but cheap flights. Imagine that there was a search engine out there that knew what my plane flights were, knew what my wife's personal tolerances are, and it was constantly shopping for a cheaper flight — and actually cancelled the current flight, bought the new one, and, you know, just let me know: by the way, I saved you + another 400 bucks for your family of four, or I found a better flight, or there was an upgrade, right. Like, wouldn't it be amazing if, once you kind of gave it the parameters, it was doing that? And I suspect that's going to kind of look like a search experience, right — it's going to be a query with a bunch of parameters. + It understands what my preferences and tolerances and risks are, right. And that's going to be a really interesting thing to measure, and I think it'll be really powerful. So, excited about that future — I suspect that's the next thing we get to once we get through kind of the current generative-AI stuff. +That's a beautiful answer, thanks so much. Really, I've learned a lot today. I'm sure we'll repeat this — let's do it — I know you have another topic to talk about from your conference talk, and another project you're working on. And I'm sure Quepid continues to be + really relevant to what we do. It's a tool, right — it goes in your toolbox, or maybe it's a toolbox full of tools. But I think it's a fantastic one to have, to really complete your search journey. Because if you are only writing code, and you're never looking at queries, you're never labeling — you never hear how it feels, you know, using this change — you will not get far. So please use it.
+I mean, of course you can set up something, you know, in the kitchen with Microsoft Excel or whatever, Google Spreadsheets — but maybe that's not scalable enough, and not repeatable. And yeah, why waste time if there are already cool tools like Quepid — open source. Really excited about this. +That's great, that's great. Yeah, the scaling-up is super exciting. It will be interesting to make sure that Quepid remains true to what it does and doesn't try to become all things to all people — we'll see what happens. +Absolutely, yes. And you, as a listener, have a chance to contribute — it's open source. Exactly, exactly. Thanks so much, Eric, I really enjoyed it. Thank you, bye-bye, cheers. \ No newline at end of file diff --git a/transcripts/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md b/transcripts/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md new file mode 100644 index 0000000..bfcb665 --- /dev/null +++ b/transcripts/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md @@ -0,0 +1,253 @@ +--- +description: '

Toloka’s support for Academia: grants and educator partnerships

https://toloka.ai/collaboration-with-educators-form

https://toloka.ai/research-grants-form

These + are pages leading to them:

https://toloka.ai/academy/education-partnerships

https://toloka.ai/grants

Topics:

00:00 + Intro

01:25 Jenny’s path from graduating in ML to a Data Advocate role

07:50 + What goes into the labeling process with Toloka

11:27 How to prepare data + for labeling and design tasks

16:01 Jenny’s take on why Relevancy needs more + data in addition to clicks in Search

18:23 Dmitry plays the Devil’s Advocate + for a moment

22:41 Implicit signals vs user behavior and offline A/B testing

26:54 + Dmitry goes back to advocating for good search practices

27:42 Flower search + as a concrete example of labeling for relevancy

39:12 NDCG, ERR as ranking + quality metrics

44:27 Cross-annotator agreement, perfect list for NDCG and + Aggregations

47:17 On measuring and ensuring the quality of annotators with + honeypots

54:48 Deep-dive into aggregations

59:55 Bias in data, SERP, + labeling and A/B tests

1:16:10 Is unbiased data attainable?

1:23:20 + Announcements

This episode on YouTube: https://youtu.be/Xsw9vPFqGf4

Podcast + design: Saurabh Rai: https://twitter.com/srvbhr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230128_100136_c691208feb13437a07aae0f929db756b.jpg +pub_date: Sat, 28 Jan 2023 10:19:56 GMT +title: Evgeniya Sukhodolskaya - Data Advocate, Toloka - Data at the core of all the + cool ML +url: https://rss.com/podcasts/vector-podcast/799802 +--- + +Hello there — Vector Podcast, season 2. I hope you were waiting for a new episode. Today I'm really happy about my guest and the topic, because in many ways we haven't covered it as much as I think and hope we can cover it today. +It's the topic of data — the role of data. While everyone is talking about sexy deep learning, ChatGPT, learning-to-rank, new algorithms and so on, I still believe we should not forget about where all these things begin, and that is data. +And I'm happy to welcome Evgeniya Sukhodolskaya, Data Advocate at Toloka, today with me. How are you doing? Thank you, Dima. Thank you. I am super happy to talk. I'm pre-vacation, I'm very mellow — so I'm feeling like I'm just having a little chitchat before vacation. +Yes, exactly how it should be. And I'm really happy to have you here. We met at the Haystack conference, and it was great — I saw so much excitement in you when you talked about: hey, but what about data? We should also talk about it, don't forget it. +And I'm really excited to drill into this today with you. I think let's start as we usually start, with your background, and we'll roll from there. Okay, perfect. Yeah, Haystack — I think that's literally where I formulated my passion, that I want to talk about search through the means of the data. +So I'm feeling like today I'm getting a present for Christmas. So yeah, about me: I am the type of person who got her perfect position by chance, because I never knew it existed.
+Because I finished my bachelor's in machine learning, and I was like: okay, if I did so, I need to work around it, you know. But everybody is always hyping on something in machine learning — I mean, I finished in 2019, +so it was the very big era of GANs and the like, and everybody wanted to work with computer vision and so on. So I thought: okay, maybe I need to start working somewhere in the field and see what I will like. And by some chance, I started working as a software developer — +I don't know why, just, you know, out of the blue; you're a student, you're getting your first job. Then after some time I realized that I'm doing more business-analytics-type tasks, and consulting, alongside the development, and I liked both. +And I was like: okay, so I don't want to be just a developer. And I switched to a position called technical manager, which was something in between an analyst +and a person who manages projects with people. And at that time, that was the first time I tried crowdsourcing on our projects, because we were doing tasks around moderation. +And with moderation you actually need a lot of labeled data, checked for quality, because it's a very hard task. And still something was off, because here I wasn't using enough machine learning. Okay — then I switched to machine learning engineer. +I mean, it sounds like I'm hopping on and hopping off, but these were longer periods of time. It was amazing — I had an amazing lead; I am still very grateful to him. He taught me a lot about machine learning in production. +But at this point, I realized that I had lost some of the knowledge I'd received back then. So I applied for a master's program — in Munich, actually. Now I'm studying at the Technical University of Munich, also in a kind of machine-learning program.
And also I felt that I'm not speaking enough. You see — it's always like: not speaking enough, not ML-ing enough. And at some point my friends, who I knew back then and who were working at that point at Toloka, said: hey, Jenny, how about you work with us as a data advocate? +And I was like: what is that? I'm going to Germany, I don't know what that is. And they're like: oh, that would be perfect for you, because you can do machine learning and research, but you also can speak. + And I am so happy that it literally happened, because for the last year and some I've been working with data, with crowdsourcing, with search — because I also have some past experience: when I was a machine learning engineer, I actually worked with search and recommendations a little bit. +So all of these combined into one perfect profession. So I would say I'm a very happy person. This is super great. And it sounds like you are in the warm waters of what you really want to do. At the same time, I think it's still a very demanding role and field. +So you are basically still doing some ML, right, but you also advocate for data. +Can you expand a bit on what you do? Oh yeah, it's actually all-in-one — you know, like those shampoos with conditioner and body wash and everything — because I have some freedom in my position to choose what I want to study now. +So, for example, I chose going to the search conferences and talking about it, because I had some experience. I really love the idea of comparing crowdsourcing and machine learning models on particular tasks. For example, let's think about adversarial attacks: +it's interesting how far we can go with detecting them by machine learning versus detecting them by humans — these different comparisons of where the crowd wins and where the machine learning wins.
+It's a question which is interesting in general, especially now with ChatGPT, when everybody is like: oh my god, the AI is conscious, okay, close everything, fire all the software engineers, we are done. + So it's super interesting to explore that, and I am always reading articles about this, attending talks about this, also doing some talks myself. Plus, of course, I'm also participating in the development of our company, because Toloka started as a data-labeling company, but now it's expanding much more, in the sense that we also started designing ML tools on top of it. +Because when you're having such a resource — you know, human manual labeling, basically in your... I don't know how to say it... in your basement? But that sounds creepy. Yeah — you can use it for transfer learning or for other interesting tasks. +And yeah, we expanded a lot, and it's very interesting to systematize this process and to come and talk about it — also in the sense of the manual-labeling assistance tools I am currently developing. Yeah, this is fantastic. +And when somebody approaches Toloka to work with it — or maybe you just create an account, I guess, and you start, you know, creating projects and so on — +many things go into the process: starting from the price, right — how economical can it be, do I have any control over this — but also, for example, the outcome, you know, the quality of labels that I will get. +How do you usually structure the process? Is there some general recipe that Toloka would offer to any user, and maybe on top of that you would offer some additional service, so to say, right, or advice to a company? Is there something around that you could share with us?
+ I would say I can talk non-stop, but in general it's like this. Firstly, of course, when you're deciding that you need manual labeling for some reason — some dataset — you need to understand that you actually need it, because not every task that exists needs it. "Just use a data-centric approach, throw data at it, because nothing tops that" — that's not correct, of course. +You can be fine using open source data; you can be fine using synthetic data, because it's cheap — you're just generating it yourself. But in a lot of domains you don't have enough available data for your specific domain, +and it's hard to gather it or generate it. Or sometimes you need human curation over the machine learning processes — for example, for moderation. +For example, it's a hot topic — I've seen such a thread on Twitter, how people try to ask ChatGPT for some really, really dangerous stuff and check if it will provide it — and it did. So, you know, we still need human curation over the data generated by ML/AI mechanisms. +So in this case, if you feel like you need to gather a dataset for your specific problem and you don't know where to start, there are crowdsourcing platforms — for example, Toloka. +It is a platform which was created by engineers, for engineers. So it's not only the business model where you're coming to us, we're consulting you, and you're going away. +We're actually super happy if you're trying to deal with it yourself, because we have an open API, and it's more about mechanisms than speaking with manual labor — it's literally about handling the crowd with mechanisms, sort of filtering, etc. +So usually, to start, you need to register, and then we have huge tons of tutorials and education programs. And also we have a community which is quite hands-on, actually, and happy to discuss any problems or questions.
+But I would say we try to implement in the platform simple steps that help you set up your first labeling project pretty intuitively, without studying much, and let it run. +There are inbuilt instructions on every step — some little video or text instructions telling you the good practices. +So we try to make it as simple as possible. I actually saw it develop, because when I started using Toloka there wasn't any of this, and now it's impressive how they changed everything. Yeah, I can imagine. + And so inside Toloka — I mean, if I consider Toloka as a packaged product that I get access to — inside it, I presume, you have the labeling editor, or component, whatever you're calling it, into which I can flexibly load any data format, right? And also any vertical: from computer vision to audio to text, time series maybe, and so on. +What does it take — the way I imagine it in my head is: let's say this is a team that hasn't had experience with labeling before, but they realize its importance for their project. So they will not be professionals in this space. +What do they need to think about when they prepare the data? Maybe quantity — maybe you recommend they start with a smaller quantity? How should they reason about format? And should they first go and watch all the tutorials, or can they somehow intuitively follow the UI? + I would say it's like this. Crowdsourcing reminds me a little bit of training a machine learning model — you need to spend some time tuning, of course. And firstly, addressing your comments: it's for different data types — audio, text, video, etc. — and it can be used for multiple use cases, like gathering data:
+ gathering datasets for voice assistants, or for self-driving cars. Or — as I hope we will still stick to the main theme of the podcast — we will talk about search relevance with human labeling. So yeah, let's imagine your team is creating a project and they realize: we need human labeling — but they have never seen the platform. +I would say that the most important thing is to start thinking in terms of the decomposition of the task. +It's the key thing in any crowdsourcing task. It's pretty scalable, so the amount is not a problem; it's not that expensive — you can set up reasonable prices and it will be pretty cheap. But the one thing that is very important: don't make the task too complicated. + With in-house experts, for example, you can ask them whatever, and they will think of the rest. Here you're working with people who are not committed to provide you something more than you asked of them. + So tasks should be simple and well defined. That is the thing you need to train a little: how to decompose a task — for example, such that if you offer it to a person who is not in your area of study, you can be sure they can still do it without special training, maybe just by reading the instructions or completing some exam. + And the rest is pretty much covered by the platform, because now there are specific mechanics which predefine your settings so you don't make mistakes in, you know, paying money to the crowd, or doing the interface incorrectly — because we try to implement as much testing in-house as possible.
+ And the interface — the program where you configure the interface — is done pretty intuitively, so you don't have to learn JavaScript or HTML or anything. It's done in basic building blocks which can be swapped around and grouped together into a nice-looking interface. + So I hope the biggest burden is just to start thinking differently. It's like with programming: there was a moment when I learned Haskell, and I had to completely reprogram my mind, because it's such a different language — in terms of programming, you need to think differently. The same is with crowdsourcing: you need to think about decomposing. That is the most important thing. +Yeah, that's exciting. Yeah — Haskell: functional, mathematical... oh my god. Yes, but probably beautiful too. Yeah, this sounds great. So it does sound like self-service in many ways. And now that you called out search, which is also very dear to my heart — and I'm glad it is the same for you — + let's start with the basics. You know, I have a search engine, I have users, I've got logs, right? So what I can do is record, for every search, the position where the click happened — what we returned, where the click was. So I have plenty of data, assuming I have plenty of users. Why do I need another dataset? Can you convince me? +What am I missing? +Here I would say manual labeling is much more about not creating a new dataset from scratch, but evaluating how well you rank the results for your queries — because, as far as I understand, there are a lot of pretty sophisticated ranking algorithms out there.
+ They keep changing, starting from simple search over a document and a query, which is somewhat in the past now that people are building vector databases and it's super sophisticated. But still, we have search, we have recommendations, we have some order of results which the user expects to receive. +I mean, the user doesn't expect a particular order, but they expect to see the right answer as close to the top as possible, not to dig through five pages. For that specifically, human evaluation on top of implicit signals like clicks is crucial. +And I can try to elaborate on that. How do you see it from your perspective: if you were creating a search engine, do you think we would also need humans to look at the ranking results, or do you think that clicks and buys, if we're talking about e-commerce, are enough? + Well, let me play the devil's advocate, the data advocate. In principle I already have users, so they will tell me with their clicks, won't they? So I might as well just measure, you know, plot the click-through rate or something else and see what's going on. That would probably be my online metric. +But I guess when you talk about human labeling, you imply that offline evaluation is important as well, right? +Oh yeah, I know I'm asking a rhetorical question, like, we need it, right? But actually I can give some motivation behind it. It's a very interesting question what human clicks actually mean.
+ We can return to that, because recently I had this meetup about biases, and one of the very interesting talks we had was about position bias: humans just tend to click on whatever they are offered, because they have learned that the things at the top positions are exactly what they need. +And they may be dissatisfied with that, but they just follow the general way search engines work. +So technically, online metrics make a lot of sense, of course, because from clicks and buys you can predict most of the behavior, and it's fast and automated. + I mean, everybody knows A/B testing. But one thing it doesn't cover is explicit signals: you can't tell from clicks and buys the overall human satisfaction score, because users are generally not explicitly asked: do you like this search result, maybe you wanted something else, maybe you wanted more, maybe you wanted to be recommended something else. + And the other thing is that in an A/B test, when we change something and judge it by clicks and some assumptions, introducing a new feature can actually hurt our product, because it's happening in real life: users see the changes, they see how the ranking differs with the new feature, and they get really
+ dissatisfied. And humans are not, you know, easily forgiving: they see problems in your search engine and they might say, I'm not using it again, no thank you. So one reason why offline labeling can be better is that you can experiment much more with it. It's like, you know, the feature in Zoom: if I do this, it's actually noticed by the neural network, and of course by me. +That is amazing, oh my god. Okay, yeah, we're really in the time of the artificial everything. I got super amused, I'm sorry. + Yeah, so firstly, with offline labeling you can definitely experiment more, because you can try different features without harming the end users of your system, of your engine. And secondly, you can check how satisfied they are with what they get: actually, explicitly ask them what they think about it. Because when you're just guessing their behavior from implicit things, +like where their eyes look and how much they click, you can make many more mistakes. Because, as they say, we can't get into another human's head, but we can at least try to ask, and that will probably be closer to reality. + Oh yeah, absolutely. So what you're saying is that usually in search engines we skip that step of asking. We could integrate some thumbs-up or thumbs-down approach, but it might still be rather implicit and not explain everything we want to learn. + Basically, yeah. I just remembered that I was reading some pretty interesting articles recently about recommendation systems and making them account more for human behavior: not just giving people the most popular result, or the result most desired by similar users, as in collaborative filtering.
+ But sometimes we need to give humans a result they are not thinking of, one that will make their health or life better. Because you know this problem of recommender systems: when you're used to clicking on something, at some point the recommender starts offering you the same pool of things and you're kind of stuck in it. That's why the authors of this paper were suggesting that we sometimes need to + ask humans explicitly whether they liked what was recommended, whether they understand why it was recommended to them, and whether they want to change the track of the recommendations. So that's why we shouldn't look at online metrics alone. And yeah, the second grand reason why offline metrics are good: you can experiment without harm, and very fast, because online metrics usually take something like two weeks. +You need to wait for the statistical test to produce significant results, whereas offline you can test much faster, and you can even perform offline A/B testing on manually labeled data, which costs less harm because real users won't see your mistakes. + Yeah, I think this strikes a chord with me for sure. There is always a cost to pay when you go online: you are deliberately, potentially harming someone's experience to learn whether your, is it called counterfactual, your change in the algorithm is good or bad. + Yeah, it reminds me, sorry, I'm going totally off topic, but it reminds me of when I was working on moderation of advertisements. There it's crucial not to make mistakes online, because there are two failure modes. On one side, you're showing end users advertisements for things you aren't supposed to show, like drugs,
+ I don't know, funeral services, yellow news, something that is dangerous or just stupid. On the other hand, if you're not letting some healthy content through moderation, you're losing the money of the companies that have a deal with you, and your own. + So here you need to be very cautious with any online experiments; you do pretty much everything offline, and you need to closely monitor how your machine learning algorithms do in evaluation, because the environment changes a lot. In the advertisement world, new laws come in very fast, and people are impressively inventive + at evading all the boundaries when they want to defraud something. Imagine: every day there are new algorithms for creating advertisements that get past the machine learning models blocking the fraud. So you need to adapt very fast, and for that, and for safe experimenting, you of course very much need offline data and offline labeling. + Yeah, I'm slowly starting to wake up from my devil's advocate role. So I should stop being careless and not rely only on the data I see in production, because in a way that is prime time for my product, and I should be careful: it's not that you deploy something once and it stays there forever and ChatGPT takes care of everything. +It's actually something I will need to evolve, and this is where the crowdsourcing approach may help me do it more economically and less intrusively. This is really good. Let's maybe try to make it a little more concrete and play this game. +Can you verbally visualize, describe... Let's say I'm developing, I don't know, a flower search engine. I don't want to say e-commerce, I don't want to say something specific. Let's say I'm offering flowers, and I would like users to search for them.
+ I guess, can you propose a framework of thought for how I should approach crowdsourcing? What should I focus on? Can I choose an offline metric that you would recommend? Do you think there is some specific way to connect it with my business goals, a metric that will be reflective of my business goals, or would you start with something like, I don't know, NDCG, +whatever, and go from there? / It's a very, very long topic to discuss, but let's start somewhere. From my perspective, you're building your engine, and at some point you of course implement some online evaluation. + You can find a lot written about online evaluation, so you have somewhere to start, and then you come to the point where you need some offline labeling. I would picture the pipeline as a circle which goes on indefinitely, and it has several parts. + You decide what your ranking means: what do you want to show first, how many positions do people actually see, what do you want them to see first, what counts as relevant. And you select some end metric that you're going to use. + There are some popular metrics. You know NDCG; this discounted cumulative gain family is a very popular and nice place to start. There are even simpler ones, just evaluating precision and recall over the positions of the elements in your resulting ranked list, and there are more sophisticated approaches, like
+ expected reciprocal rank, for example, if you've heard of it. That's more of a cascade approach, because you know that people stop clicking through after some certain position. But since we're talking about flowers, I would say it's more like image search, a simple case with fairly definitive answers. It's not like searching for items, where once you find what you desire you stop scrolling; with flowers you may just want to see them, + download them or something. So I would say NDCG is a pretty good basis at the beginning, and then you can adapt these metrics based on what you're really interested in; maybe you run advertisements on some of the flowers, for instance. + Then the next part: after you define what you want to do, which metric you want to evaluate, what you want to compare, you think about what you need from human labeling: how to sample the data, what the result will be, how to aggregate it, and how to use this information in your product. + And then, for example, for NDCG you usually need some ideal ranking to compare your ranking to. This is exactly where crowdsourcing comes in, because you can gather this ideal ranking from the crowd and then compare it with your real search engine's answers. So okay, we defined the goal: we want an ideal ranking of flowers by query. And not one query, because I'm not going to evaluate
+ just one query; that would be, I don't know, super simple, and you want to evaluate in general. So here comes sampling, and how you approach sampling of your queries and your search engine's results can vary. You could just sample the most popular flowers and queries, but that's usually not the best approach, simply because the most popular queries are usually handled very well and they are very simple. + When many people are searching for the same thing, it means it's not a very complicated thing. But there is a huge tail of very rare queries which you also want to consider in the evaluation, in the ideal ranking. So here come techniques like reservoir sampling, or stratified sampling; I would recommend stratified sampling, adapted to your own needs. + To explain it very briefly, without going deep: you have your data, the whole set of queries, with how often they're asked, how popular they are. You put them into bins based on popularity, and you sample from each bin, but the bins are sized differently, according to their share of the overall popularity.
+ So you're kind of modeling the distribution of the data in your engine by sampling like this. After you have these data samples, you need to think about how to present them to the manual labelers: what do you want to ask? You want some ideal ranking, yes, and one option is to give them a query and, say, the first 10 or 20 results that your engine returns. How many results depends on the click-through rate, which you can estimate: you have data about how users click, how far down they go, and you may find that after, I don't know, the 15th position it's usually not interesting to anyone, so you don't need to worry about it very much. + But here, if you give a crowd worker the whole list and say, okay, rank these from most relevant to least relevant: as I was saying before, decomposition is very important, and this ranking task is very hard. I have a bachelor's and a master's, I think I'm a generally educated person to some extent, but if somebody gave me a flower query and 15 pictures and said, please rank them from the most suitable to the least suitable, I would be like, oh my god, I can't do that, it's too much. So there are other approaches. One is to take a specific item returned by your system together with the query and ask: are they relevant or irrelevant, is this a matching pair? That is much simpler and well understood by the crowd, but the problem is that you can't really compare items with the same relevance: the labels say this is relevant and this is relevant, and you're left wondering which one to put on top. You can ask people to give some percentage of relevancy off the top of their head, but different people think
+ differently, so it's very hard to aggregate the results. + The nicest approach, I would say, is pairwise comparisons: you give a query, you give two answers, and you ask, which one is better? Then from these pairwise comparisons you can build a whole ranking, by aggregating them into a list from most relevant to least relevant. And of course, if you do pairwise comparisons honestly, the way they're supposed to be done, it's n squared comparisons over the items, which is a ton. So usually our suggestion is to do it in a sorting manner, like merge sort, with n log n comparisons: a rougher estimate from fewer sampled pairs, but in the end you still get a pretty good estimated ranked list. + So you create the assignments, you get the pairwise comparisons, you get the results, you can estimate the quality of the results by how good each particular user is at these particular assignments, and then you aggregate. There are models you can use for aggregating, mathematical, statistical models, for example Bradley-Terry. We actually implemented this in our Crowd-Kit: we tried to make an open Python library for every type of crowdsourcing aggregation, not only Toloka ones. So you can implement it yourself, or take some library, even ours, and then you get your ideal ranking as desired, and you can compute the metrics against it: how well your search engine ranks on large samples, how relevant these flowers are. Then you see the overall result. It might be good or not very good; if it's not very good, you can, for example, pick out the domains where you see the most mistakes and
+ ask the crowd, in separate projects, domain by domain, where exactly the mistakes are. Maybe you have problems with identifying the color of a flower, or with poor lighting on the photos, and you can figure out what exactly the problem is. And you can use this manual labeling, firstly, for evaluating the metrics from time to time, to see how your search engine improves as you add new features and change the ranking algorithms, and you can also train your ML ranking models on this manually labeled data. So I would say it works roughly like this. + Yeah, this is great; it does sound like a very structured process. But I do want to drill into a couple of specifics. One is NDCG. NDCG is, in principle, a metric I could communicate to management: I could say that yesterday we were at 75% and today we are at 76%, so we are improving, right? It's on a percentage scale. If I remove the letter N from the formula, it becomes an absolute scale and there is no way to tell whether we are progressing or regressing.
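As a concrete reference for the NDCG discussion, here is a minimal sketch of DCG and its normalized version. The function names are mine, and the `2**rel - 1` gain with a log2 discount is one common convention among several:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevances."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(ranked_relevances, ideal_relevances=None):
    """NDCG: DCG of the system ranking divided by DCG of the ideal ranking.

    If no separate ideal list is supplied, the same labels re-sorted
    descending are used as the 'ideal', which is the common default.
    """
    if ideal_relevances is None:
        ideal_relevances = sorted(ranked_relevances, reverse=True)
    best = dcg(ideal_relevances)
    return dcg(ranked_relevances) / best if best > 0 else 0.0
```

Note that when the ideal list is just the same labels re-sorted, `ndcg([1, 1, 1])` and `ndcg([3, 3, 3])` both come out as 1.0: a mediocre list and a perfect list get the same score, which is exactly the normalization caveat being discussed here.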
+ But at the same time, again wearing my devil's advocate suit for a moment: NDCG has a problem. Let's say I have a scale of labels from zero to three, where zero means a completely irrelevant result and three means a perfect result. If I receive two ratings, one with all ones (a suboptimal result: nothing better in the list, but not perfect) and the other with all threes (absolutely perfect), NDCG yields exactly the same number. You rightly mentioned the optimal, perfect ranking: if my perfect ranking is just the visible labeled area re-sorted, of exactly the same length, then NDCG will yield 100% in both cases, and that is kind of a problem. You touched on this when you said we need to construct this perfect order of results. So how long should it be? If I show 10 flowers on the screen, 10 bouquets, whatever, how long should that ideal list be? Thirty? A hundred? Is there any recommendation? + I would say, as I mentioned before, firstly you can use the expected reciprocal rank metric, which is exactly about the moment when the user loses attention: after that point you can still make mistakes, but users just aren't reaching them. And to find this moment where interest terminates, I think you can pre-evaluate it with clicks, if you have any data: give each item some weight by how often it's reached, and then estimate how many items your typical user looks through before they're satisfied with the result. And maybe over time this number will even decrease, because your ranking will become better.
+ But you can also try to emulate the same experiment with crowdsourcing, giving workers some certain number of objects. Why am I talking about this? Because recently, when we had this talk about biases, the presenter, to test his hypothesis about click-through, created a project in Toloka where he had a query + and around 20 or 30 recommendations, ordered the way a search engine would order them, and he looked at how far people click through, to check the position-bias hypothesis. So in general you can also test your hypothesis online with click metrics, see how to + choose this cutoff position, and then test it offline. But one additional thing: when we're talking about business, we're also talking about budgets. Of course, the more you need to evaluate, the more the cost will rise, just because you're giving more data to the crowd, and the crowd needs to complete more assignments, so it becomes more costly. +So I would estimate the length you need from the click-through data, and then maybe cut it down based on your estimated cost of manual labeling, and try to align the two a little. Still, +the result might not be 100% perfect, in the sense that some people reach farther and see the mistakes, but it will still be a big improvement if you catch the mistakes in the top ranking positions. +I think connected to this, there is a notion of disagreement between annotators: what is relevant for you might be completely irrelevant to me, and for the same query I may want to see the results in a different order.
+ I think one of the suggestions I've heard for how to construct this perfect list is to take and concatenate all the rankings given by independent annotators for the same query, and then re-sort them into the order that makes it perfect from top to bottom. Of course you will still have issues with ties: if you have three threes, how should you order them? +But at least they will all be visible on the screen, so maybe that's fine. +Or maybe not, who knows. But you kind of achieve this perfect list, which incorporates the wisdom, or the wishes, of the other people in the same group. Have you experimented with something along these lines, or do you think it's sensible to do? +To experiment with which part: with the length, or with reordering, with constructing the perfect list? Because for NDCG you need that perfect list to divide by in the formula. +You mean, how do we experiment with the length of this list? / No, in this case I'm actually describing that specific way of building it: you take the sub-lists from different people who annotated the same query, then you stack them together, and then you sort them, right? + Ah, it's a very interesting approach. I would say I myself have never tried such a technique, which sounds very interesting, but we usually just do aggregation with models which don't concatenate, but instead take into account the overall quality of each user at this ranking problem. + So when you do the aggregation, you lean more towards users who are proficient in ranking in general, so you trust them as good users of the search. For example, when one person said all threes and one said all ones, but I know that the "threes" person is generally good at this, I will just take their list as the ideal labeling.
+ Yeah, this is the exciting part. You're tapping into the topic of the quality of annotators, which is super important. You could teach the annotators if you had them in-house, but with external ones you don't really control who gets which task. So how exactly does, say, Toloka do it, or what kind of methodology should I apply to measure the quality of the data? +What are the components there? + It's actually a very big system, in the sense that you need not only to measure quality but also to protect your project from fraud, from people who specifically want to break the quality: not just making honest human mistakes, but really trying to scam you with your data. +There are different techniques, starting from the super simple anti-fraud ones, which look at how fast you are labeling. +Are you labeling with some non-human distribution of labels, like clicking the same option forever? Sometimes it's even checking your behavior in general, across different projects, with a lot of data: +different projects, or how your mouse moves, something like this. And there are also, of course, general exams: +checking your language proficiency, your writing, your proficiency in some other skills, which together build up a certain, I would say, portrait of a good labeler. Because if you're able to provide good results in the skills
+ around this problem, it means that at least you're generally not a fraudster, and you have a chance to succeed at the task. And then there's the main mechanism, used in most tasks where you have categories, like classification, where we know some sort of ground truth. + Of course that doesn't exist for ranking, because with ranking we don't know: it's a very subjective matter which item you prefer. But you can create obvious examples, very obvious ones, where you do know the ground truth, and you can hide them and shuffle them in. +So you have examples of tasks with known answers, and you can + covertly shuffle them in between the assignments, so people complete them without noticing, because it's hidden by the API and everything. And by the percentage of these examples they answer correctly, you can estimate their skills, because you know that, in general, for this class of task they give the right answers. + And the second technique, which also works well for more creative or data-gathering assignments, for example when you need to take a picture, or do an assignment outdoors, like go and check that there is a building at some certain address for a map service: there you can do an even trickier thing and have the crowd evaluate the other crowd. + You create a specific validation project with another crowdsourced pool, give them the answers of the first crowd, and say: okay, guys, now you need to evaluate: does this look correct to you, does it look like fraud or not? And by this double evaluation you actually sort out all the problems.
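On the scoring side, the hidden control-task mechanism described above boils down to something like the following toy version (names and data shapes are invented for illustration):

```python
def annotator_accuracy(responses, golden):
    """Estimate each annotator's skill from hidden control tasks.

    responses: dict {annotator: {task_id: label}} -- all submitted answers.
    golden:    dict {task_id: true_label} -- the hidden 'honeypot' tasks
               shuffled in among the regular assignments.
    Returns {annotator: fraction of control tasks answered correctly},
    or None for annotators who never saw a control task.
    """
    skills = {}
    for annotator, answers in responses.items():
        controls = [t for t in answers if t in golden]
        if not controls:
            skills[annotator] = None  # no evidence yet
            continue
        correct = sum(answers[t] == golden[t] for t in controls)
        skills[annotator] = correct / len(controls)
    return skills
```

A platform would then use these scores to gate who keeps receiving tasks, or to weight their answers during aggregation.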
+Wow, I have never heard of such a method, it's amazing. More traditionally, maybe ten years ago, in a project related to sentiment analysis, we were doing double annotation. You give the same +item, you know, the same label, and you ask a human whether they agree or not, but twice, and then you basically calculate the inter-annotator agreement. But what you just described is so brilliantly put, and sort of invented, in a way. + Was this invented at Toloka, or have you seen it somewhere before? / I don't know if we invented it; I'd be super happy. It doesn't seem like rocket science, but it works, yeah. And about inter-annotator agreement: it also works, especially in classification, and that's actually how we started creating these hidden assignments recently. + As I mentioned, we call them honeypots, or golden assignments: the ones with hidden tasks which you shuffle into the data and then use to evaluate the skills of people doing certain kinds of assignments. We're also saving these skills, and sometimes you can access them, because they're already on the platform; they're called global skills. You can just + preselect for your project people who have already succeeded in moderation, for example. That actually helped me a lot recently, because I didn't have to train the crowd for my very complex stuff. But I digress: before, when I was working with Toloka some time ago, you had to create these specific tasks yourself.
+ These hidden ones, you had to label them manually, and that took time and was kind of tiring, because you're sitting there creating, like, a hundred of them. Usually you need a certain sample of these tasks, some share of the overall number of tasks in your project, to evaluate how good people are. Because if you give someone 100 items to label and only ask once +whether the answer is correct, you can't tell whether this person is good or bad; it could be pure luck. So labeling them all yourself was time-consuming and sad, and recently we decided: okay, but we have a + crowd, so why are we doing the job of the crowd? Let's just create these hidden tasks with another crowd. And we can do it easily using inter-human agreement: you give them a task, you preselect crowd members with good skills from the past, in general, so you trust them more, and you throw, for example, ten people at one tiny bit of a task, ten people independently, without knowing what the others say, and then + usually some strong agreement emerges, and you know that is the right answer, and you can directly pick it and shuffle it into the other project. So you see, we're building self-working mechanisms: you just throw some data into the system. / Yeah, it's like self-reinforcement. I think this is amazing, and it also surfaces, I believe, a +feature of Toloka that you cannot get if you just set up an open-source labeling tool: that, for a specific task like moderation or,
+ I don't know, sentiment, whatever, machine translation, you can actually request and gather a group that will be proficient in that specific space. Because otherwise you're going to waste cycles potentially teaching people. I think this is something we started to say at the beginning of the podcast: the data is important, but the humans that annotate + it are important too. / Yeah, absolutely, this is great. I still wanted to understand one building block: you were talking about aggregation. Can you restate what you mean, and what should I pay attention to as a user of such a platform? + So, there are different ways of annotating data, and there can be different cases when you need aggregation. Aggregation is this: imagine that you receive some raw results from the human annotators, and then you need to aggregate them into some final answer that you will use for your model or for something else. There can be different cases when you need it, for example, as we were talking about, + agreement between humans on some task. Say you have a task of labeling a picture: is it a cat or a dog? And you decided that you want, like, four annotators, and three of them said it's a cat and one said it's a dog, so you have these four answers, and to conclude that it's a cat you need to perform an aggregation. For classification tasks it's pretty easy: + you can do just majority vote, or majority vote weighted by the skills of these people. But when it comes to aggregating, for example, images, say you're doing segmentation and you need to aggregate different answers about the segmentations, it's already harder, because doing a majority vote pixel-wise is a bit of hard work, you know.
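The cat-or-dog example maps onto a tiny skill-weighted majority vote. This is a sketch of the idea; the 0.5 default weight for annotators with no known skill is my own assumption:

```python
from collections import defaultdict

def weighted_majority_vote(votes, skills):
    """Aggregate categorical labels, weighting each vote by annotator skill.

    votes:  list of (annotator, label) pairs for a single task.
    skills: dict {annotator: weight in (0, 1]}, e.g. the accuracy
            estimated from hidden control tasks.
    Returns the label with the highest total weight.
    """
    scores = defaultdict(float)
    for annotator, label in votes:
        scores[label] += skills.get(annotator, 0.5)  # unknown workers: 0.5
    return max(scores, key=scores.get)
```

With equal skills this reduces to a plain majority vote; with skewed skills, one trusted expert can outvote several weak annotators, which is exactly the "I trust the skilled person's answer" behavior described above.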
+ So for that there are usually models that are pre-designed, used and studied in crowd science: models for the aggregation of images, and also for aggregating the pairwise comparisons I was talking about. That one is a specifically hard task, because you have these pairwise assignments, and sometimes it's like A is better than B, B is better than C, but C is better than A, and you have a cycle and you don't know what to do. So for that there exist + a couple of models, for example noisy Bradley-Terry, which are based on the expectation-maximization algorithm: it assumes that behind the labelers' answers there is some ground truth, and it tries to estimate it, getting as close to it as possible over a couple of iterations of the model. In the end it just gives you a ranked list of responses: for example, if we're computing NDCG or some other metric, we just need a list that says item one is the best and item ten is the worst. The aggregation over all of these pairwise comparisons will give you that list. + You can implement these aggregations yourself and study them, because we're not the first ones doing crowdsourcing: in crowd science there are a lot of models presented, and our research team is actually also studying and implementing them, and I hope +I'm not praising them too much, but... It's your moment of "okay, our research team is great", yeah. But yeah, for aggregation, for example, we created a tool that can be used paired with the platform, so you don't have to think much about how it works. But if you want, +write me and I can provide you with papers. Yeah, absolutely, and all the papers that you mentioned during this podcast I would really love to include in the show notes as well, because I see how the listeners find the episodes educational, and they,
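A from-scratch sketch of the (plain, non-noisy) Bradley-Terry idea mentioned here: fit a latent "strength" per item from pairwise wins with the standard iterative MM updates, then read off the ranked list. This illustrates the model family, not Toloka's noisy Bradley-Terry implementation; note that cycles (A > B > C > A) don't break it, they just produce similar strengths.

```python
from collections import defaultdict

def bradley_terry(comparisons, iters=100):
    """Rank items from (winner, loser) pairs with the Bradley-Terry model."""
    wins = defaultdict(int)    # total wins per item
    pairs = defaultdict(int)   # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    p = {i: 1.0 for i in items}  # initial strengths
    for _ in range(iters):
        new_p = {}
        for i in items:
            # Standard MM update: wins_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                pairs[frozenset((i, j))] / (p[i] + p[j])
                for j in items if j != i and pairs[frozenset((i, j))]
            )
            new_p[i] = wins[i] / denom if denom else p[i]
        total = sum(new_p.values())
        p = {i: v / total for i, v in new_p.items()}  # normalize
    return sorted(items, key=p.get, reverse=True)      # best item first

# "A" wins all its comparisons, "C" loses all of them.
data = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
print(bradley_terry(data))  # ['A', 'B', 'C']
```

The returned list is exactly the "item one is the best, item ten is the worst" ordering that downstream metrics like NDCG need.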
+Some of them spend a lot of time, you know, listening through and then reading the papers as well, or at least browsing through them. So, Jenny, so much stuff you have shared so far, and I think even those who are using open-source tools, like, I don't know, Label Studio + and others, I'm sure they can learn from what you said, but I also hope that they will improve their practices by tapping into the talent behind Toloka. But there is one topic that I think keeps surfacing everywhere, though to some degree it has become an overheated discussion: bias in data, and how it can actually drive inequalities in life and in the world. I +think that by virtue of us being in this space we should resist this as much as possible, but I wanted to pick your brain on what bias is for you, and how you have seen or discussed it in ML projects. + I really like this topic, because I recently hosted a panel discussion about biases, and I love hosting panel discussions, because you can come knowing zero about the subject, you can ask the stupidest questions to the most awesome engineers in the field, and you return super educated. So maybe I'll try to recreate what the people from this panel discussion said to me. +From my understanding there are, I would consider, two types of biases. + The first one, I think of as ethical bias: it's more related to the things that we shouldn't be biased on, but we are, because of the historical data, or the unfair results of history. It's the bias on skin color, the bias on gender, the bias on some other attributes, and they are here and there in the big models, +for example even in DALL-E and GPT-3 and everything.
+ Sadly, since they are learning on the data available on the internet, and the internet is a very toxic space sometimes... I still love the stories of the chatbots that were trained on Twitter and then became so offensive that nobody could let them out into the open, like into the business communication world. + So these models of course have these ethical biases, and that should be controlled very much, and that's why we have the fairness topic in AI. And actually, I studied this recently, I love my master's, because I'm revisiting all the topics in ML, so when I'm talking about it I feel like I'm literally coming from an academic background. +In the master's, the fairness algorithms are pretty much laid out: how you can evaluate, how you can try to make your data less biased, or just test it for fairness. But yeah, it's still here, +sadly. Of course there are approaches for how to avoid it, but not fully, and sadly we're constructing new biases here and there: we're getting rid of one and we're constructing a new one. + And the second type, they are more like behavioral biases. Maybe they're less harmful in general, because I consider ethical biases very harmful: when we're creating systems related to jurisdiction or some other things, those biases can be crucial. And also, by the way, the same goes for biased data. + Oh, I remember the story about covid. It's not an ethical bias, but it's a bias, and it was very crucial. When covid started and everybody was panicking, people started to think: maybe we can do something with AI, some model that will help recognize from the lungs whether a person's pneumonia is caused by covid or not.
+And there already was data from China, because it started earlier there, and there were a lot of AI and ML engineers working on that. But the problem was that the data was biased, and it wasn't cleaned and sorted out, because people didn't have much time, I mean, it was a very crucial moment. + Because of that, the models were working in a very biased and bad way. They were, for example, predicting that if the person on the scan is in a lying position, she or he has covid, but if they're in a standing position, they don't have it, just because the people who were lying down while the scans were taken were simply in a worse medical condition. +In general, because when you can't stand up, that means you're pretty ill, you know. So it was just a bias in the data, because it was imbalanced, and that is the kind of result of bias that you need to monitor and control, and that's why you can't just release it into the open world. + And yeah, the behavioral biases, that's more about, for example, the search engine, I think I touched on it: the position bias. You're just trained to click on the first three elements that you see, because you're so overloaded with information that you don't have the energy to go through tens of pages and select exactly what you need. And there are some other biases too. + For example, we know one behavioral thing that people have, it's an interesting thing, it's called choice overload. In recommendation systems, like in restaurants, people actually prefer to see something with a bigger menu, with bigger recommendations, because they think: oh, it's enriched, it's nice, I would love it. But the more choice you have,
+ the more cost you're spending on the decision, your inner cost, your evaluation cost. At some point it becomes just not feasible, not useful: you need fewer items to make a better choice, and at some point you just lose attention and everything. And that's another thing that comes from our behavior, that biases a lot of instruments and a lot of models, and that we need to take into account, +or otherwise we won't be successful in implementing them. Yeah, absolutely. On this paralysis of choice: would you think that reducing the number of options will bias our system in some way? Like, strictly speaking, do we actually introduce bias by reducing the number of options? + Oh, I'm pretty sure we do. But as I said, sometimes you can use biases; not all of the biases, I would say, are that harmful, and sometimes you can try to use them for a better outcome. Of course I'm not talking about the ethical biases, but, for example, with reducing the amount of choice: + of course you're biasing people towards what you offer, and it depends on what you offer in this limited choice. If you're offering them the most popular items, of course they can get stuck in a pool of selecting the same items, without changing their preferences the way they would like to. But in general, for them it will be easier to select something that they really prefer across all the characteristics. + So even by biasing people here, you're actually kind of helping them with the choice process. As a general recommendation, after 10 or 15 items, as far as I recall, the choice overload becomes too much, so you just can't... You know, I hate these restaurants where you have a menu with sushi, Indian food, Mexican food, and you go: oh my god, I'm so hungry, but I can't choose.
+ Yeah, it's like no focus, and maybe no face of the restaurant, in a way. But at the same time, I'm pretty sure there are customers who are in haste, and they don't have time to drill in and understand what the local cuisine here is: just give me that pizza or burger or whatever, and I will flick through the menu, right? But I really wanted to relate to what you said: I think bias is not always negative, and I think it's important + to understand that in some circumstances it can actually bring positive impact, and the example you gave is one of those, right? But at the same time, whatever I show on the screen in the search engine, you already talked about the click bias: whatever I show, in that order... In the majority of countries in the world we go top to bottom, left to right, and we will click +in that order. But at the same time, anything that I say, for example now, already biases you to think in that direction, and if I choose another strategy and start talking about snow or something else, your mind will tune to a completely different topic, right? And you will be biased again, without realizing that I'm doing this. +So the same, I think, will apply to the annotation and labeling project: whatever I show, in which order I show it, which questions I ask, will bias the annotators too. + Besides all the other factors: if I overload them with questions, they will be tired and really not give me value, but even if I don't do that, the order of the tasks might bias them, and many other things. Can you talk a bit more about that, and also, is there some silver bullet that can solve this, or at least improve it? +Okay, from my position, which is a very subjective opinion, I would say bias is a very human thing. So even creating, like, a big...
+ ...even the perfect model, trained on purely human data in the amount that we have now, even if we increase that amount, we cannot get rid of the biases that we are so afraid of. We would need to go to some system of robots creating robots creating robots, and then maybe we'd get rid of our own biases, because, as you know, the human factor is really a thing: we're making mistakes sometimes just because we're +not as attentive. So we're just biased, as is. But I would say with human labeling, you see, you're making a product from biased humans for biased humans. +At least, the thing I was talking about with the clicks: when you're predicting something by people's clicks in online experiments, you're introducing even more bias, you're introducing a third person into this chain, the developer, who +makes assumptions about other people's biases, sometimes without knowing their culture. Or, as you said, in search engines we read top to bottom, from left to right, but some cultures have a different way of writing and reading, right to left, or maybe they have a different design sense. + There are different people who like different types of search, based on age maybe; some people see less well, or there are people with color blindness and they need other results, because it also depends on how you see everything. So when only one person makes the assumptions, especially a developer... I was asking a question on the panel: should we all be psychologists and philosophers to create such systems? Because in the end,
+ the developer decides what to do, and sometimes this person is not educated in the psychology of human behavior, and that might produce mistakes. So that's why I think human labeling wins. It's not that the crowd are all psychologists and philosophers, but when you are giving the same task to the crowd, you can do a pre-filtering of the crowd: + for example, by the same target audience that you're interested in, by language, by culture, by interests. For example, you have a task about flowers: take people who like flowers, who work with flowers, or ask them, do you like flowers, and then send them to this task. By this you're at least + trying to model the same behavior with actual people, having the same distribution, maybe a small subsample of the same distribution as your target users, and you're not deciding for them yourself. So I would say the best recommendation is to think about filtering your crowd for your assignments: think about who you want to be satisfied by your product, and ask the people related to that to do the evaluation. +So I think that's the best recommendation: to do the testing that way. + Yeah, just one thought came to mind: in principle, if we consider that the annotation process is building some kind of mathematical function that we're trying to fit to reality, then in principle we could build a "perfect" annotation project that fits reality perfectly, +and by that replicate the same biases that exist today, and earn money, right? That would be kind of the wrong way to go. I hope companies aren't doing that, right? +I mean, I need to say that even Toloka, if I am not mistaken, actually uses a little bit of machine learning labeling as support. We're learning on our crowd in each project, and we're providing some pre-labeling, where the model
+learns from the labels of the target audience and tries to emulate the same behavior. But it's still not the evil sentient AI robot, because it's mostly manually labeled. But I need to mention +that the humans who are labeling assignments are sometimes very educated and very smart, and they are very willing to trick the system, and actually, when you want to trick a system, you become super talented. +So I saw some people creating algorithms that label the assignments for them, emulating the human time of labeling, the human way of moving the mouse, the human way of understanding the instruction. +Recently I was asked how we're blocking this type of people, but I'm saying: why ban this type of people? They are getting so close to actual labeling that... Yeah, I'm sure we can learn from them, because, and you already touched on this topic, another big area of research is how... +I believe it's called gamification: when you break the machine learning model by supplying a certain sequence of actions and inputs such that it unlocks some doors or whatever, right? Maybe you receive a loan that you are not supposed to, and things like that. +Yeah, this is interesting. And do you think, I'm asking the same question as you asked at your event: do you think that unbiased data is attainable, that there is such a thing as zero bias? + Honestly, I don't think so, I don't believe so. I might be incorrect, and the different experts who were sitting with me on the panel discussion said different things; of course I asked them the same question, because it's very interesting: is it only our thing, why are we in this loop of creating and fixing biases, you know?
+ I personally don't think so, because bias is a very human thing: we try to get rid of one, we're creating another. But it's not bad, it's not bad. We just need to get rid of the actually dangerous biases, the ethical ones and the others like them, and with the rest we just need to understand how to deal with them. And as humans, we can recognize the biases that are harmful, and that's good. That's why, for example, we need + the manual evaluation of the AI systems that are being trained now: they are very nicely trained, but they are producing biases and they can't detect what they are doing, so sometimes they can be harmful. That's why, from my perspective, big models alone still can't just be used, even though they exist and they impress us very much; they need some verification on top. +Yeah, exactly, and this is where the human labeling comes in. I think the flip side, if I would flip around my question, or your question rather, + about getting the completely unbiased dataset: you could also ask, could we actually source a dataset that contains all the little biases, or little diversities, that exist in the world? Or okay, maybe not in the world, but in the domain of operation that you are in with your business. And maybe +formulating it that way gives me a lever to start thinking: okay, what is it that I'm missing in the data? And of course the most challenging question is to know what I don't know I'm missing, right? So it's equally hard, but it's probably more on the trajectory of amassing the dataset. +I would say that I actually heard of some approaches working on that: specifically taking very biased data in the domain into account, seeing how your system will perceive it, and actually catching the mistakes through it. Because yeah, we...
+We can account for the biased data, and there are some guidelines on how to notice biases in your data or models, so we can try to at least approach this task from that perspective, yeah. +Sounds great. Hey, Jenny, I really enjoyed talking to you, and I think we could talk forever on this topic, but I really love asking the question of "why", which is + tapping into your motivation. You did say in the beginning that all the stars aligned in a way and you got the dream job, but at the same time you still wake up every morning and say: okay, what will I do today, what drives me, why? So what drives you to do what you do, as a data advocate and beyond? + I would say, I don't want to start the story with reflecting on how I woke up one day and realized that my heart belongs to AI and everything, but I would point to little moments in my life: when I had to write an essay about whether computers can think when I was applying to university; when I had to explain to my parents what AI is and why I'm doing it; + when I compared the other positions; when I saw the questions that people in general are asking about DALL-E and GPT-3; when I visited some industrial conferences and compared them with the research conferences, and noticed that people are fascinated by the models and their architectures, but when it comes to taking it down to development and to actually helping people, people +struggle with doing the simple, well, not simple, but basic things, like providing the right data. They sound less interesting, but they're actually very crucial,
+ like the bias monitoring, not just creating a proof of concept. It bugged me so much, because I see so many cool models, ideas, architectures around, creating insane applications, but they don't always come to production and don't always start helping people. And I would say I really love, +in general, to see something start working, it's very satisfying. So I chose my path: to approach people and talk about data and how it can actually help to train the models and make them closer to production, +make them closer to actually being here, working well, and helping people with these magnificent ideas created by researchers. +Oh yeah, that sounds so cool, very deep, thank you for sharing this. I think in many ways it's like the dream, maybe, of creating that companion that can think the same way we think and is not human. +Because as a humanity we could also just reproduce, and so on, but we also challenge ourselves and others about what is possible, what the limits of our intelligence are, you know, whether there are any. +Yes, and there are very interesting tasks that are still waiting there to be solved, and can be solved with AI. I think it's magnificent. I am waiting to see if somebody, some model, will finally win the Loebner Prize and pass the Turing test, so I'm rooting for ChatGPT. +Maybe it will be fine-tuned on something like flower search or something. Yeah, yeah, on human labeling with Toloka, I am pretty sure. Exactly, exactly. +And yeah, traditionally, of course, we will link everything we can link about Toloka, but if you were to announce something to the audience, maybe how they can get closer to the platform, you know, start playing around?
+Okay, yeah, I have three things that I really want to announce. Firstly, if you want to talk about, just in general, do you need manual labeling, do you need to do transfer learning with crowdsourcing, do you want to just use crowdsourcing and think about it, +then join our community, because there we talk in general about ML stuff, about AI, about crowdsourcing, about the data-centric and human-in-the-loop approaches, and there we can +concretely talk about the topics that concern you, your company, or your business. And also we have two initiatives, for education and for researchers: if somebody is interested in checking some hypothesis on crowdsourcing, for example some + questionnaire, some ethical study, some gathering of datasets for your own tool, or you want to create some education, like teaching a course on crowdsourcing to your students, we have two application forms where we can provide you with some promo codes for using crowdsourcing for free, to play around and maybe start liking it as I do. +And I truly like it, because when you can gather a 12k dataset in one day, you start liking it. This is mind-blowing. So yeah, that's me, that's it, thank you very much. That's fun. That was magical, thank you. +I'm sorry for being a very talkative person, but I'm always like this, don't be afraid of me. + No, it's your character, it's your energy, it's your experience, and it's something that speaks up, you know, beside you controlling it. I think it's important, it's amazing, and that's how it should be. I really, really enjoyed talking to you today, Jenny. I hope this is not the last time, we can record another one, and another one. +And all the best with your Christmas vacation, thank you, and recharging, and all the best with Toloka. Thank you very much. You can do it however you like, actually, we approve.
+Yeah, thank you, and I hope the audience got that magic tune as well, and everyone will also have time to recharge during the Christmas and New Year break, and we will continue from here. Thank you, Jenny. Thank you, bye-bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md b/transcripts/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md new file mode 100644 index 0000000..119916d --- /dev/null +++ b/transcripts/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md @@ -0,0 +1,344 @@ +--- +description: '

Order your Milvus t-shirt / hoodie! https://milvus.typeform.com/to/IrnLAgui Thanks + Filip for arranging.

Show notes:

- Milvus DB: https://milvus.io/

- Not All Vector + Databases Are Made Equal: https://towardsdatascience.com/milvus...

- Milvus + talk at Haystack: https://www.youtube.com/watch?v=MLSMs...

- BEIR: + A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models + https://arxiv.org/abs/2104.08663 +

- End-to-End Environmental Sound Classification using a 1D Convolutional + Neural Network: https://arxiv.org/abs/1904.08990

- What BERT is not: Lessons + from a new suite of psycholinguistic diagnostics for language models https://arxiv.org/abs/1907.13528

- + NVIDIA Triton Inference Server: https://developer.nvidia.com/nvidia-t... +

- Towhee -- ML / Embedding pipeline making steps before Milvus easier: https://github.com/towhee-io/towhee +

- Being at the leading edge: http://paulgraham.com/startupideas.html

' +image_url: https://media.rss.com/vector-podcast/20211223_011243_83a6aa11fa9886f4212eedc43a2501c3.jpg +pub_date: Thu, 23 Dec 2021 13:28:43 GMT +title: Filip Haltmayer (Data Engineer, Ziliz) on Milvus vector database and working + with clients +url: https://rss.com/podcasts/vector-podcast/347470 +--- + +All right, Vector Podcast, episode three. And today we have Filip Haltmayer, data engineer at Zilliz, who works a lot with users, and the company is building the vector search database called Milvus. Hey Filip. Nice to meet you. Yeah, you got it: data engineer at Zilliz, that's pretty much me. +Yeah, awesome, awesome. Nice to meet you as well, and thanks for joining the show. I usually like to start with you introducing yourself to our audience: what's your background and how did you end up working for Zilliz? Sounds good. +Yes, so you got my name already, Filip Haltmayer. I graduated from UC Santa Cruz with a BS in computer science in 2020, so starting out right during COVID. And out of college, I really wanted to get into the startup scene. +And I was doing a lot of things with machine learning, taking a lot of classes, doing projects. And when I went to look for a job, I realized that for anything machine learning related you have to have a PhD. You have to be doing a master's, a PhD, extra work. +You're not getting into the field straight out of college. So the next step was: okay, what's new and growing in that field, somewhere where there isn't already so much set knowledge? And that's where vector search came in. +And then I found Zilliz, did the whole process, and I really thought I'd fit in, and that's where it took off. So that's kind of how I got to where I am right now: +pretty much straight out of college, getting into the whole field and figuring out how it all works. Oh yeah, that's cool. And can you tell me a bit more?
I've also been doing some tech stuff for a few years here and there. +But, "data engineer" and "you work with users": how exactly does that look? Yeah, so "data engineer" gets thrown around a lot at a lot of companies. For me, how it works here, data engineer falls into user success and also a more pre-sales style of things. +So, how to use our tech, creating new use cases: we have a bootcamp where we show examples of how to use Milvus. That's what we're doing. We're talking to the customers that are trying to learn how they can implement it in their system, what problems they're having. +So we're the ones that are front-facing in the company. And then as a data engineer, I've also worked a lot on the cloud deployment, figuring that out, optimizing it, and I've worked on some development aspects as well. But it's a lot of hats. +So it's like startup data engineer: at least here, it's pretty much whatever needs help or whatever needs work, you kind of get put on that. So it's a cool opportunity to try a lot of it and get to meet a lot of customers and cool people in all different parts of the field. +Yeah, and you also learn to interact with users, because they bring a different perspective on things. They probably don't focus as much on the internals, but they need to solve something, right? Oh yeah, it's definitely that. So figuring out, seeing their use cases is always crazy: seeing how much data they're dealing with, these cool ideas of what they're trying to do, and seeing how we can make it work. And usually it all goes well: we can figure out solutions, we work together, we keep relationships, I think it's really cool. And sometimes it doesn't work. +Sometimes we need to put more things into Milvus. So we keep the communication line open, figure out more things to put in, get their input on what we're working on, and go from there.
But it's a lot of back-and-forth conversation with users and customers. +Yeah, for sure. It's kind of like, first you need to learn what it is they're trying to do, right, before you suggest any solution, because it takes a lot of time. +Do you feel the same way when you talk to them? Instead of jumping in and solving, do you try to figure it out first? Or do you have a different approach? +So I think I go the way of: yeah, get all the info first, because everyone's use case is different, none of them are ever the same. And then come back to the team and discuss it. +See, do we have anyone else that's been doing something similar? Do we have a solution? Because sometimes we've had to do hacky solutions with our previous version. +There were things that just weren't up to par, and you couldn't really change them that much, because it was built on an old style of doing things and it didn't work. So you do some hacky solution for them, some way to trick it into working in production. +But then we take notes of that, and later on put it into the plan, like, when we're going to a new version: okay, we've got to think about this, we've got to improve on this. But yeah, it all starts with figuring out what they're doing, getting the whole picture. +And there are always conversations where they can't really say everything, because a lot of these companies have a reason to be secretive: it's a new field and they may have a really good new idea. So it's about extracting as much as we can without crossing those bounds, getting the info from the little they can tell, and seeing if we can help them with what we've got. But it's a big team effort, it's not just me: +I talk to them, bring it back, and then we all work together to solve it. Yeah, yeah, for sure, for sure.
And are your users aware that you are helping them with vector search? +Do they even care? So, I would say 70% of the people we talk to are aware of it. They come to us for help. +So they know what they're getting into, they know what they need vector search for, and they know what they're doing and why. But sometimes we also get people that are just: hey, I want to find similar images. +And there, we have the simple tutorials that deal with it, but they want to know more about it. So there is some explaining of what vector search is, +what vectors are, sometimes. And that's understandable: not everyone goes and studies machine learning and knows what vectors are, the math as well. +But I would say 70% of the time they get it and they know what they're getting into, and 30% of the time it's a whole new world. We explain it, but you can't explain it all in one day; there's a lot of stuff that goes on behind it. +Sure, you might touch on vectors one day, but then you have to get into the algorithms the next day, and then it's about keeping that relationship and answering questions whenever they come up. Oh yeah, oh yeah, for sure. +And are you using Milvus most of the time, as part of your user engagement? How does that look? +You bring the database and you say it can solve a bunch of different use cases, but, you know, we also need to vectorize your data? Or maybe they bring the vectors? +How does that look? So they usually bring the vectors themselves. We're currently also working on building something a little bit above Milvus for actually getting the vectors, but that's a work in progress and it should be releasing soon.
+But for now, we have our examples, we have a bootcamp where we pull up the basic pipeline. It's always around Milvus, but for images we have a ResNet-50 and we kind of show them how it goes in that process. +So that's for the 30%: we kind of go over one pipeline. It's like a small file, because it's just three steps: encode it, insert it, and then search it. +But yeah, most of the time they already have their embeddings, though, those bigger companies who know what vector search is, who know what they're getting into. They already have their 512 dimensions, 10 million vectors. They know what they're getting into. +So they just want to see, okay, how fast can you do it? What are the bottlenecks? What do we need to scale out if we're going to scale? So it's again 70/30; for the 30% you usually need to go over the actual embeddings as well. So just like a quick neural net lesson. +Yeah, yeah. Or maybe it's in the culture of the company to kind of dive deeper into what they are doing, right? Yeah. +And maybe they think that they can kind of take it over and then run with it as they learn, right? But then you said 70% are kind of like, you know, here is my problem. +Can you solve it? Does this work? Can you handle this? And that's usually how it is, 70% again, but they have their own neural net, and sometimes they don't want to tell us what neural nets they're using or how exactly their data looks. +But they give us, okay, this many dimensions, this many vectors, and this many read requests, write requests, will it work? And then we kind of go from there and see if we can solve it. Yeah, yeah, sounds cool. But can you actually tell me what is vector search and what is Milvus? Okay, sure. +So yeah, vector search. Pretty much a way to search vectors and compare vectors as well. Yeah, I'll go over it.
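The three-step pipeline described here (encode it, insert it, search it) can be sketched in a few lines of numpy. This is a minimal sketch, not Milvus code: the random-projection "encoder" stands in for a real model like ResNet-50, and the brute-force cosine search stands in for what the database does with an ANN index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: encode. A random projection stands in for a real neural net
# (e.g. ResNet-50); in practice you'd use actual model embeddings.
def encode(raw, proj):
    v = raw @ proj
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

proj = rng.normal(size=(1024, 128))          # "model": 1024-d input -> 128-d vector
corpus_raw = rng.normal(size=(1000, 1024))   # 1000 raw items (e.g. image features)

# Step 2: insert. Here "the database" is just an in-memory array.
db = encode(corpus_raw, proj)

# Step 3: search. Brute-force cosine similarity, top-5 results.
query = encode(corpus_raw[42], proj)         # query with a known item
scores = db @ query
top5 = np.argsort(scores)[::-1][:5]
print(top5[0])  # the item itself comes back first -> 42
```

A real deployment swaps step 3 for an index build plus an index search, but the data flow stays the same.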
So numbers and vectors: with numbers, you have values that are easily comparable. You can store them in relational databases. Yeah, like greater than, equal to, less than. +So to actually index these and search quickly through them, you can do things like B-trees. This is a very efficient and very fast way of searching for a value. +Vectors, on the other hand, you don't really have this kind of direct comparison; you have similarity metrics, which is a math equation where you kind of find out how far they diverge. But it doesn't tell you, okay, this element is diverging this much. +It's like a lump sum for every value in the vector combined: this is how different the entire thing is. And that makes indexing a little bit more difficult, because you kind of start relying on more approximate algorithms. +So this is where approximate nearest neighbor search comes in, which is pretty much all of vector search. It's this library of algorithms. And there you can do clustering, and then you can do graph-based, tree-based. So the big names right now: for inverted file we have Faiss. That's the library for it; it's clustering based on centroids. +And then you store values in the inverted file and you search through that. There's tree-based, which is Spotify's Annoy, what they're using for their music recommendations. +And that's just building trees and splitting all your data by hyperplanes and then going left or right. And then we have graph-based, which is HNSW, which I think is the biggest one right now. And it's pretty much a graph: you start with a very sparse, very empty graph on the top layer. +And then you find the closest point, let's say, and then you drop down a layer where it gets more dense, and you keep dropping and dropping. +And then there's locality-sensitive hashing: with normal hash algorithms, you avoid collisions; with locality-sensitive hashing, you try to get collisions.
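The "lump sum" similarity metrics mentioned here reduce to a couple of lines of numpy; this is just the math, not any particular library's implementation. Note how the same pair of vectors can look far apart under one metric and identical under another:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, twice the length

# Euclidean (L2) distance: one lump-sum number for how far apart the
# whole vectors are, with no per-element verdict.
l2 = np.linalg.norm(a - b)

# Cosine similarity: angle between the vectors, ignoring length.
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(l2, 4), round(cos, 4))  # b is far in L2 terms but identical in direction
```

This is why the choice of metric is part of the index configuration rather than an afterthought.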
If you get collisions, that means that they're close together. +And one thing I kind of forgot to go over is the data types that this brings. So there is structured data, which is those numbers, strings, things that can be easily compared. And then there's unstructured data. +And this is pretty much those images, videos, medical data, things that computers can't easily understand. And with unstructured data, you throw it through a neural net and you get those vectors that we previously talked about. +And with relational or structured data, you can just take the data itself, because it can be easily compared. It's already known to a computer; it can understand it. +And then there's in between, which is semi-structured. Semi-structured is things like emails, where you have structure to it: you have the body, the header, the sending address, every email has those, but the data inside is unstructured. +This is kind of where you use a mix of both. But yeah, vector search gets a little complicated, but the main way to think about it: you have unstructured data your computer does not understand whatsoever. You can have two images a pixel apart. +And half the time, if your algorithm is not good, your computer will think they are two completely different pictures. It won't get it. So you take that unstructured data, you throw it through a neural net, you get vectors, and with those vectors you use those previous algorithms to find things that are similar. +And that's how you can quickly search through it. Right, right. And I mean, so you mentioned these several algorithms there, which, I agree, I read these papers as well. This is cool. +But just to satisfy my curiosity, where would you put the product quantization methods, you know, which are also implemented in Faiss and maybe somewhere else too?
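The "collisions mean closeness" idea behind locality-sensitive hashing can be illustrated with random hyperplanes: each hash bit records which side of a hyperplane a vector falls on, so nearby vectors share far more bits than unrelated ones. A toy sketch, not a production LSH scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
planes = rng.normal(size=(8, 64))  # 8 random hyperplanes in 64-d space

def lsh_bucket(v):
    # Hash = which side of each hyperplane the vector falls on.
    return tuple((planes @ v > 0).astype(int))

def hamming(u, v):
    # How many hash bits differ between two buckets.
    return sum(x != y for x, y in zip(u, v))

# Average over many pairs: tiny perturbations barely change the hash,
# unrelated vectors disagree on about half the bits.
near_diff = np.mean([
    hamming(lsh_bucket(b), lsh_bucket(b + 0.01 * rng.normal(size=64)))
    for b in rng.normal(size=(200, 64))
])
far_diff = np.mean([
    hamming(lsh_bucket(rng.normal(size=64)), lsh_bucket(rng.normal(size=64)))
    for _ in range(200)
])
print(near_diff < far_diff)  # True: near vectors collide on far more hash bits
```

Real LSH schemes use many such hash tables and tuned bit counts, but the collision-as-signal principle is the same.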
+Like, is this a fundamentally different approach compared to LSH, graphs, trees, or is this something else in your book? So with Faiss, with this quantization, I think I'd just find that to be part of the existing approaches. I looked a bit into it. +This one went a little deeper, because I didn't really work on it that much, but I did go and look into it. But it's pretty much just simplifying things for clustering, is the way I saw it; I would classify it under clustering. +You kind of need something to simplify and speed it up. So that's where in Faiss you have the quantization-based indexes. You have the flat indexes, which aren't quantized, and you have SQ8, which is quantization-based. +And there are a few more among the names, but it's just a way of speeding up an approach that's already in use. I'm not sure how well it would work with other algorithms. So like using that quantization and then trying it on Annoy: you quantize everything and then you start doing the splits. +It might speed things up, but yeah, it's a little bit outside of what I know. But if it works with Faiss, I believe it could work with the other algorithms. It's just, I don't think it's standard yet. Yeah, I mean, I agree. +And I mean, there are a number of approaches where they combine things. If you take the DiskANN paper from Microsoft, from, I think, the Bing team, they combine HNSW with product quantization. And they also have clustering. So it's kind of a three-phase algorithm. +They first cluster the points, they get the centroids. Then they kind of quantize them, I guess, kind of lose some precision on the vectors, so that you can actually load them in memory. +And then from there, for the clusters, they build the HNSW, the graph kind of layout, right? For each kind of shard, you could say, for each cluster. And then it's basically a few-steps kind of approach. +So your query comes in, it basically goes through this quantization.
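Product quantization itself is easy to sketch: split each vector into sub-vectors, run k-means per sub-space, and store only small centroid IDs instead of full floats. A minimal numpy version with a hand-rolled Lloyd's k-means (Faiss has its own heavily optimized implementation; this is just the idea):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=20):
    # Plain Lloyd's algorithm, enough for a sketch.
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        ids = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(ids == j):
                centers[j] = x[ids == j].mean(0)
    return centers, ids

data = rng.normal(size=(500, 16))   # 500 vectors, 16 dims
m, k = 4, 8                         # 4 sub-spaces, 8 centroids each

codebooks, codes = [], []
for s in range(m):
    sub = data[:, s * 4:(s + 1) * 4]    # split into 4-d sub-vectors
    centers, ids = kmeans(sub, k)
    codebooks.append(centers)
    codes.append(ids)
codes = np.stack(codes, axis=1)         # each vector -> 4 small centroid IDs

# Reconstruct by codebook lookup: lossy, but much smaller to store.
recon = np.concatenate([codebooks[s][codes[:, s]] for s in range(m)], axis=1)
err = np.mean((data - recon) ** 2)
print(codes.shape, err < np.mean(data ** 2))  # compressed codes; error well below signal
```

Each vector shrinks from 16 floats to 4 three-bit codes, which is exactly the trick that lets billion-scale indexes fit in memory.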
You find the closest centroids, you know. And then you go and kind of search them. And then you re-rank the results based on the disk. +So from disk, you read the non-quantized versions of the vectors, right? So that you can actually get the precision back. So I mean, what I'm trying to say basically is that you can combine these algorithms in various ways. Yeah. Yeah. +Depending on your use case, basically, like if you try to optimize for memory or speed or something like that. Yeah. Yeah. And so, before we go into Milvus, if we go back to use cases, you mentioned there are a number of things. +Let's say you can encode almost any object, and you gave an example, a really good one, about email, right? So email, on one hand, everyone knows what it is. On the other hand, it has unstructured kind of parts to it. +And if you compare text, let's say, versus audio or video, do you think that you can equally apply vector search? Of course you can, but I'm asking in terms of the quality that you will get. +Or do you need to go the extra mile, you know, in audio, the extra mile in video, compared to text? There are so many models. Honestly, that's a good question. I think that's where the neural nets come in and that's where they're important. +How the black box kind of sorts everything out. But I believe with text, I feel like, first, there's been a lot more work, and a lot more people have been looking into it, and for now everyone's kind of switching to that for product recommendation. +There's a lot more money in that area. So I think there are a lot more advances in those neural nets. But I think with text, the way I personally see it, and this isn't scientific fact, is I feel like there's a lot more underlying structure in language. +I feel like there are a lot more rules, underlying connections, that I would think a neural net would find, compared to an image.
And then with those underlying values and kind of the underlying language, you'll have an easier time grouping things together with a neural net. +And if you can more easily group things together, the more easily you can search it, pretty much. You can make these clusters a lot more accurate if things are already going to be near each other, and it's easier to find. +With images, on the other hand, I feel like there's not as much of a background connection in everything. Again, all personal take, but I feel like, sure, an image might have the same object, but there's no real underlying thing linking those objects. Yeah, there's the shape. +But in language, you have a lot more than just the shape of an object. So that's kind of where I think text does have a better time. But in reality, when we look at our systems and everything, when we try it, it always ends up being very close. +Maybe because, up until now, it's already approximate. So no one's really been hurt that much by half a percent of accuracy. Everyone understands it's still kind of a new field, it's kind of growing, and these methods are all approximate. Like, you're never going to get a perfect answer. +It's really up to the testing with your neural net to see which embeddings work, and optimizing your neural net, because these algorithms, for approximate nearest neighbor, these algorithms aren't really learning. +Sure, there is some learning with the quantization-based ones, where it is kind of making its own quantization. I know for Faiss, it does that. But again, it's an algorithm that goes step by step and there's not too much randomness. +Sure, Spotify does random splits in their Annoy. But yeah, so you kind of have to optimize your neural net to really get the best performance. But there are some values that you can mess around with in the actual approximate nearest neighbor search.
+But those don't play as big of a role, I believe, as what you're doing with your neural net. Yeah, that's interesting. +So, if I actually take a step back a little bit, can you tell me and our listeners why we cannot do exact kNN, exact k-nearest neighbors? Why do we need to do approximate? What stops us from doing exact? So with exact, okay, so first, you can get exact, but that's just going to be brute force. +Maybe all these libraries, maybe not all of them, but most of them do have a brute force search. But then you haven't solved anything. Then you can just use a relational database, throw in your numbers and go by each column, look through each one, see which one's closest. +Yeah, so you go approximate, that's where you get your speed up. And then with approximate, because you're doing clustering, you assume similar things are going to be embedded close together. +Like, if you have a neural net, your embedding layer, your vector, you hope that it's going to find similarities. Like if you have two items that are very similar, you hope that their distance is not going to be far. This isn't always the case. +A neural net, if you have a photo of a car and a photo of a car with a bike in the background, it might for some reason focus on the bike. We don't really know what's going on. There's research into seeing what's actually going on behind the scenes in the box. +But yeah, these two might pop out with two completely different values. They might be in completely different clusters, even though they kind of should be similar. So that's where it can kind of go wrong. +And then, see, yeah, you search the wrong cluster, and then you'll miss it, even though it was supposed to be a good choice. But then there's also the aspect of not searching through everything. You want to speed things up. +You search through the top 10 matches, let's say, for inverted file lists, which is the centroid-based clustering. You look at the top 10 centroids.
If you look at the top 10 centroids, and you find the vectors in there, yeah, they're going to be similar. +But there might be an 11th centroid that is a very similar one, lower by just a tiny bit. And inside it, it might have the perfect answer. +So there is all of this approximation, where you only look at the top X candidates, combined with the fact that you only make so many clusters, you make X clusters. There are always going to be outliers out of bounds. So that's where you kind of get that loss. +Because, leading on from this, will similarity search take over everything? It won't really, because sometimes you need perfect results. And similarity search is kind of useless there. +It's going to end up being brute force, and with brute force, any algorithm really works. You're going to be looking through every single value. Yeah. +So complexity-wise, it becomes big O of n, where n could be like 1 billion, right? Yeah, when you're at 1 billion, there's no problem solved anymore if you look through everything. Yeah. Yeah. Yeah. +That's why you need to go approximate, but it's not like approximate to the level of losing tens of percent. Yeah. It's usually, I would say, around a three percent loss. +If you're doing a very reasonable speed-versus-recall balance, and that's where you can change the values in the actual algorithm. But if you keep it kind of balanced, 97% is usually the average of what I've been seeing. So it's pretty strong. +And this is where, yeah, it's approximate. When you're dealing with billions of data points, finding the exact answer for some use cases is very useful, but usually in the billion-scale data range, you're okay with just getting a few that are very close. Yeah. Yeah.
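The "top 10 centroids, but the answer might be in the 11th" tradeoff is easy to demonstrate with a toy inverted-file index: cluster the data, probe only the closest few clusters, and search brute force inside them. This is a plain numpy sketch of the IVF idea, not Faiss or Milvus code; when every cluster is probed, it degenerates back to exact search.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(2000, 32))

# Toy "IVF": 20 clusters via a few Lloyd's k-means steps.
centers = data[rng.choice(2000, 20, replace=False)]
for _ in range(10):
    ids = np.argmin(((data[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for j in range(20):
        if np.any(ids == j):
            centers[j] = data[ids == j].mean(0)

def ivf_search(q, nprobe):
    # Probe only the nprobe closest clusters, brute force inside them.
    order = np.argsort(((centers - q) ** 2).sum(-1))[:nprobe]
    cand = np.where(np.isin(ids, order))[0]
    return cand[np.argmin(((data[cand] - q) ** 2).sum(-1))]

q = rng.normal(size=32)
truth = np.argmin(((data - q) ** 2).sum(-1))   # exact brute-force answer

# Probing every cluster recovers exact search; small nprobe trades
# recall for speed, and may miss the true neighbor.
print(ivf_search(q, 20) == truth)  # True
```

Raising `nprobe` is exactly the speed-versus-recall knob the conversation describes.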
+And I mean, so when I published a blog post about all the vector databases, I will make sure to link it in the notes. And Milvus was there as well. And in the comments somebody said that they had actually been using a NoSQL database for a genome-related project. +And what the guy said he did is that he actually pre-computed the nearest neighbors for each individual entry. And then he stored them as individual items in the NoSQL database. +And so as a query came in, he basically went and asked for each item, okay, what are your neighbors, right? And then he said that on a small scale this worked fine, but he wouldn't necessarily use this on the kind of next level, right? Yeah. +And so can you tell me more, like, how Milvus is done? What is it as a product, let's say, and what's included inside? What can I get as a user? Yeah, sure. +So, um, yeah, Milvus, we kind of built it as a database first, similarity search second, where everyone's collecting a bunch of data, a bunch of vectors, everyone's hoarding all their data, and they're making their neural nets, they're all getting embeddings, but then, like, what's next? +You need to do something with that data. +So that's where, again, similarity search comes in. Yeah, what we're doing is building up a database system. So right now with version 2.0, we're really working on making it cloud-native, something scalable, something fast, and something easy to use. +So you can think of it pretty much as a MySQL database, just for vectors. And in that regard, you have the CRUD operations, you have sharding, you have all of these operations, and we're kind of building that up for vectors themselves. +And then later on, we're going to be building up other parts that branch off to kind of make those vectors. So it's the core to our entire pipeline for dealing with similarity search.
And yeah, that's kind of what it is. +In terms of the actions you can do with it: you can do storing, you can do updating, as I said, partitioning, sharding; we're adding scalar filtering right now. It's with ints, but I think this month, so in the next weeks, I believe, we're going to be having strings for scalar filtering. +What scalar filtering is, is being able to filter results in a fast way. So instead of searching through everything and then filtering out certain things, you apply the filter first, or apply the filter during the search, to speed everything up and also get more accurate results. +So, let's say you have a vector and then there's a filter that says glasses equals true. You can look for every single vector that has glasses equals true. And that's very useful and something that everyone's been looking for. But yeah, it's a database first. +And then for the actual searching, we employ all these libraries that we previously mentioned, Annoy, Faiss, HNSW, all these guys, to build these indexes. And then you can select whichever one you want. You can use multiple. +Sometimes some will work better for images, or if your neural net's working a certain way, it might work better with this one. So you can store multiple of these indexes, test pretty easily, and mess around with it. And once you're done, you select what you want and you call it a day. +And you search and you get results. Yeah. Actually, when I was watching one presentation by your colleagues at Haystack, we will make sure to link this as well. This caught my eye besides, you know, the horizontal scaling that other databases have as well. +Well, maybe not all of them, but most of them. But, you know, one thing that caught my eye was that I can index, you said, the data using different index layouts, essentially different algorithms that you alluded to earlier. +And then I can somehow test and kind of figure out which one works better. Is that right?
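The scalar-filtering idea ("glasses equals true", applied before or during the search rather than after it) can be sketched like this. In Milvus itself this would be a filter expression on the search call; here it's just a boolean mask applied before the distance computation, with hypothetical field names:

```python
import numpy as np

rng = np.random.default_rng(3)
vectors = rng.normal(size=(1000, 64))
glasses = rng.random(1000) < 0.3        # hypothetical boolean scalar field per vector

def filtered_search(q, mask, k=5):
    # Apply the scalar filter first, then run the vector search
    # only over the survivors.
    idx = np.where(mask)[0]
    d = ((vectors[idx] - q) ** 2).sum(-1)
    return idx[np.argsort(d)[:k]]

q = rng.normal(size=64)
hits = filtered_search(q, glasses)
print(all(glasses[hits]))  # every result satisfies the filter -> True
```

Filtering first avoids the classic post-filter failure mode where all top-k vector hits get discarded by the predicate and you come back with too few results.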
Yeah, pretty much. So we do have benchmarking tools, but you can also benchmark yourself as well. +The way it works is: every single index has its own parameters, and you can just constantly build up more. You can build up 10 with parameters changed this way, or 10 completely different indexes. +And then you perform a search with the same vector for each index, because when you search, you can select which index you want to use. So you can just take that search, throw it through every single index, and see the results. +And then if you have baseline data, where you already have it labeled, you know what results you should be getting from brute force. So when we do these benchmarks, it's always compared to brute force, because brute force will give you the exact answers. +And from there, you can kind of see, okay, how many hits did I get, how many did I miss, and see what your recall rate is. +And then you can also time these things as well, because some parameters, if you make 10,000 clusters within your data, that's going to take a bit to search if you want to search through every single one. +So you time it and then you can kind of get this ratio of speed to performance, or we usually say speed-recall. But yeah, so you can build up all of them and go from there; if one doesn't work, you can just delete it, it'll do it in the background, build a new one. +You can be doing these searches and everything concurrently, because, back to Go, the indexes can be built at the same time as you're searching and doing all these other things, because we have workers for queries, for building indexes, and for inserting data. +So it's all kind of in the background and kind of gets dealt with for you. Yeah, that's cool. And I mean, so you mentioned the technical part, you know; different products might have some SLA, let's say, you know, how quick it is, queries per second, P99, whatever. +But what about the semantic part?
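The recall benchmark described here, comparing an index's hits against brute-force ground truth, boils down to a set intersection. A sketch with a deliberately crippled "approximate index" (it only sees a random 80% of the data) standing in for a real ANN index:

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(5000, 32))
q = rng.normal(size=32)
k = 10

dists = ((data - q) ** 2).sum(-1)
truth = set(np.argsort(dists)[:k])           # brute force = exact ground truth

# Fake "approximate index": only looks at a random 80% sample of the data,
# standing in for whatever candidates a real ANN index would visit.
sample = rng.choice(5000, 4000, replace=False)
approx = set(sample[np.argsort(dists[sample])[:k]])

recall = len(truth & approx) / k             # fraction of exact hits recovered
print(0.0 <= recall <= 1.0)
```

Pair this recall number with a latency measurement on the same queries and you get the speed-recall curve the conversation talks about.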
Like, you mentioned that there is a ground truth that you can always compare to, right? + But what about the other side of things, let's say for people who are, say, product managers? They're not very technical, they will not look into these metrics, but they still would like to get a way of understanding, you know, what's the kind of impact on the semantic part of things, right? +For instance, you're comparing, you know, inverted index versus vector search, right? +So with the semantic part, we don't really deal with that as much, because we're assuming that your semantics are done well by the neural net. Because this is where it kind of goes: you compare everything to brute force. +If your brute force shows that this is the correct response, or this wording, or these top three results, those are mathematically the closest, most similar to your input. So that's what you compare against. +If those aren't close, that means that there's an error a step above, because your neural net is not finding the connections correctly. + So that's kind of how we compare to the base, which is just the flat index of brute force, and we kind of pull out and see if you're hitting the right responses. That's sort of what we deal with, not the actual semantics, because the semantics come from the neural net, and finding that issue is more above us in the whole stack, if that makes sense. +Yeah, yeah, for sure. So basically what you're saying is that, you know, if I fix the model, right, so the model is fixed. +And I pick, let's say, different algorithms for indexing, as well as, let's say, even different distances, right? In some cases, I can maybe choose different distances, right? +Although maybe you can tell me if I'm wrong here, because if I trained the model for a specific distance, maybe I cannot easily pick another distance during test, is that right?
So a distance, what do you mean by selecting a distance? +Because it's all based on closest; we rank it closest to furthest and then it's only the top N results. +Yeah, I guess what I meant is the distance metric itself. So it could be Hamming, you know, and yeah, do you cover all those? Yeah. So comparing across these different distance metrics, that one is kind of, you have to look at your data. +Because yeah, if you use different distance metrics, then your flat baseline is going to be different. But yeah, that's one where, if you're going to compare across indexes, you have to keep them on the same distance metric. +Swapping them out will make some big changes, I think, because if you go from L1 to L2, or maybe not L1 to L2, but say cosine to Euclidean, it switches things up a bit, where in some cases with one of the distances a higher value is better, and in some cases a lower value is better. +So there's no real direct comparison. They're still usually going to rank in the same order. But yeah, for figuring out which one you want to use there, it's kind of give or take. You actually have to look at the results for that one. +There's no real mathematical way to kind of compare semantics to distance and get the relationship, if that makes sense. Yeah, yeah, for sure. For sure, it's more like experimentation is needed there, right? Yeah. Exactly. +And also, actually, I just remembered, when you were describing these different distance metrics, I remembered a paper, I think it's called BEIR. So it was comparing different methods to do the re-ranking step, right? Like dense retrieval and some other methods I forget already. +But they actually found out that if you have documents, let's say text documents, cosine similarity will favor shorter documents given a tie, versus dot product will favor longer documents. And this is by design of the formula.
Cosine similarity is basically mapping everything to the unit sphere. +With the dot product, there is nothing to normalize on. +So it basically just takes all the components of your vector and says, okay, here is the value, and the largest one wins, right? And that can actually impact the user experience, right? Like if I have a database, let's say, of news versus some deep research, right? +So the deep research is thousands of pages and the news is a couple of pages, maybe even just a couple of paragraphs. +Yeah. So if my hits are just in a paragraph in the news and also in a paragraph in the longer document, with cosine, I'll get the news. I will not get the deep research, right? See what I'm saying? No, that makes sense. Yeah. +So yeah, that's, I think, one where you have to kind of test it out and see what you want, because some people searching might want the news, some people searching might want the scientific paper. And that's one where you look at history, I guess. +For thinking about this, let's say Google is doing it: you look at the user's history of how they search, if they're searching for scientific stuff, or if they're always looking at news, and you maybe swap the index to a different distance metric. But yeah, I haven't thought of that too much. +That's really interesting. Oh, yeah, I need to check out that paper. Yeah, for sure. I'll send you the link and I'll make sure to also include it in the notes, for those of you who are interested in reading papers. And yeah, so that's awesome. +So you guys basically built a database where users can mix and match the way they want, right? And then you help them; do you guide the users in the process of doing this? +So if they come to us for help, we usually, we have some articles where we mess around with the indexes and different parameters, and we kind of have graphs, let's say speed, performance, recall, that kind of stuff, where it's kind of preliminary.
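The cosine-versus-dot-product point raised above is easy to verify numerically: make a "long document" vector a scaled-up copy of a "short document" vector, and dot product prefers the long one while cosine scores them identically. Toy vectors, purely illustrative:

```python
import numpy as np

q = np.array([1.0, 0.0, 1.0])                # query
short_doc = np.array([0.8, 0.1, 0.9])        # e.g. a news paragraph
long_doc = 5.0 * short_doc                   # same direction, larger norm (longer doc)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Dot product grows with vector norm, so the "longer" document wins...
print(q @ long_doc > q @ short_doc)          # True
# ...while cosine normalizes length away, so the two score identically.
print(np.isclose(cosine(q, long_doc), cosine(q, short_doc)))  # True
```

Whether embedding norm actually tracks document length depends on the model, but whenever it does, this formula-level bias carries straight through to what the user sees first.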
+We kind of hope that they learn it on their own, because we can only help so many people. So yeah, we do help out. We point people in the right directions. And if it's a really interesting use case or a really big use case, we'll kind of mess around with it ourselves and try to help out. +But we also just hope that people mess around and then post the results. The more data we get, the more it helps, when people share what they're doing. +We're trying to share as much as we can, kind of get people into this, get the word spread, and that's pretty much open source. Like, a big deal of open source is kind of getting this out there. +Like, competition's good, new innovation's good, just get this vector similarity search out there, get people interested in it. Yeah. Yeah. And your website has so many use cases covered; I was looking at audio search. That was interesting. +Like, you basically walk through, you know, selecting a library, how I would encode the song, and I have an idea to try it out on a few songs that I have as MP3s. Yeah. +And what I was particularly interested in is, like, okay, is there a way to separate the singer's voice from the musical instruments, from, I don't know, the style of the song, and so on. Yeah. +And it's a really cool one, because this is one of the things that I kind of looked into a lot and did some talks on, not really detailed, but it always popped up. +So for all these recommendation systems, I think, I have never looked at them deeply, everything's behind closed doors. +But like Spotify's recommendations, and then when you're doing the Shazam thing, I think it was Shazam for the audio recognizing, yeah, they separate the background music and the vocals and they pretty much discard the vocals. +The music searching is based just on the background, and there were some techniques.
So I think for separating the audio, there are like 1D neural nets that go along the waveform, and there are time-series-based neural nets. But another one was audio inversion. +So it would help, when you had the background, to invert it, or when you had the vocals, to get the audio out. But a lot of it was working on that: pulling out the background music is the big step. And then you run the neural net on that to get the embedding. +So that's how you avoid, if you're recommending songs, cover songs; you can easily filter out cover songs because they're going to have the exact same background. The vocals will be different. +And with these recommendation systems, another cool thing is you don't want the perfectly similar. Like with your search result, you don't want the exact same result. It's like you don't want the top 10 closest. +You might want, like, the last 10 out of 100 that are close, because you want something similar but not exactly the same. But yeah, audio inversion and 1D neural nets and a few others. +I don't remember them off the top of my head, but it's a hard problem to solve, getting the vocals out without having separated files already. +And it's an exciting topic, because, like, you know, there are so many examples on the web of how you can index text, how you can do something else with text and more text, right? +But it's infrequent that I come across some image search or audio, or even, for that matter, video. You know, I haven't seen any blog posts on video. +I don't know if you guys have it. So video, yeah, that one gets a little difficult in terms of how you're going to sort everything out, because when you're dealing with videos, everything is frame by frame.
+So then it's: how do you take every frame and sort of group it together into one sort of ID, and then if any frame matches, you point to that. It gets a little difficult with video. +It's not too bad if you're doing, let's say, live tracking in a video. Like, let's say there's a soccer player and you pull out the most similar player that looks like him; you can get a name for him to track him, so it knows his name. +That's kind of simpler if you're doing live tracking, but if you're looking for things within a video, like you look for a single person across an entire video, then it kind of gets difficult; it takes time. +You either index all the frames or you pull out a few key frames, but not too many people, I'm going to be honest, are doing video yet. +I think there is a little bit of a lack there right now, to be honest. I think images are the most used for us, because images, I think, are the easiest, even compared to text, because with text you have some of these neural networks, the transformer networks, that are a little bit hard to use. +Yeah, in Python you have sentence-transformers, the easiest one, where you just input the string, but the other ones kind of require you to add tags in the string and do these things, which not everyone understands. +With images, it's just: import some ResNet-50, which torch makes really simple. You put the image in the ResNet and then literally you get your embedding vector; you can directly pipe it. +So it's a very simple one, and it gets good results that are pretty interesting, and you can do a lot with images, and I don't think enough people are doing it yet, for like shopping things. Everyone's still relying on text, but let's say you upload an image of a shoe, you find that shoe. +I think everyone would enjoy that a lot more. Yeah. So it's like a very concrete application in business, right? And e-commerce is a very big area.
So yeah, that makes total sense. It's not like many users are going, "oh, I remember that scene in the movie, can I find it by expressing it in words?" Yeah, it won't work. Maybe you can name the actor and then give some description of the scene, but then you already have to know the actor, and personally, I don't know any actor names. So yeah, exactly, it doesn't work for me.

And it defeats the purpose of search, right? Because early on, when I was just entering this field many years ago, I was like: so with search, I need to know what to look for, right? So I'm typing the keywords, telling the search engine what I'm looking for, but I don't know what I'm looking for.

Yeah, you're already doing the job for it. Yeah. Like if you're searching by keywords, that means you don't really need the search engine anymore, if you're doing all of this work. Yeah.

And I really loved one competition that Yandex did. They actually stopped doing it, and it's a pity. The competition was like this: they give you a question, and basically you compete with other people, all right? So they give you a question, but it's not like "what is the color of the submarine in the Beatles song", right? It's like you first need to answer the first part of the question. Then you get kind of another puzzle, another question; the puzzle gets solved and you get the full question, and so on. It's a multi-layered process. So basically they're telling you that a cool search engine would be doing that: you could ask a very convoluted question and it would figure everything out.

I feel like you might know more; isn't there the aspect of, I remember taking an NLP class...
It was always that you could only get to two degrees. Like if you have a question, and then based on that answer you have another part, the next part of the question is placed on the first one. I think they were only able to do the second degree, a question after the question. Getting that third part would always fail; it wouldn't be able to do the connection all the way back. Yeah, yeah. They stopped doing that. It seems like it was a really good competition.

And it was also based on a lot of associations, something that computers may or may not be good at. You know, if you know Prolog, the programming language, it basically has associations built in as first class. You type something like "orange is a fruit", and then you ask about fruit and it says orange; if you type orange, it says fruit, right? It remembered that mapping. And then you can use this associative kind of programming in a bunch of places when building AI. But I haven't been programming in Prolog; it was just part of one course.

But you know, the questions they asked at the Yandex competition were also something like, "who met this lady when he was a student", blah, blah, blah. And you're like, I've already lost the train of thought in this question. So I spent, I don't know, maybe one minute just figuring out what is being asked. And then you're decomposing this problem into multiple problems, and you start from the first one, then solving it, then the next one, and time is running. It's like five minutes per question, if I remember correctly. It was a fantastic competition. You know, I don't think search engines are on that level yet. So yeah.

They'll probably get there. It's those neural nets.
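The Prolog-style association described above (assert a fact once, query it from either side) can be sketched in plain Python. This is a toy stand-in, not real Prolog; the class and relation names are invented for the example.

```python
class FactBase:
    """Tiny Prolog-flavored associative store: tell it a relation once,
    ask about it from either side."""
    def __init__(self):
        self.pairs = set()

    def tell(self, a, b):
        self.pairs.add((a, b))

    def ask(self, term):
        # Everything associated with `term`, regardless of direction.
        out = [b for a, b in self.pairs if a == term]
        out += [a for a, b in self.pairs if b == term]
        return sorted(out)

kb = FactBase()
kb.tell("orange", "fruit")
kb.tell("apple", "fruit")
print(kb.ask("orange"))  # ['fruit']
print(kb.ask("fruit"))   # ['apple', 'orange']
```

Real Prolog additionally gives you unification and backtracking over such facts; this sketch only captures the bidirectional lookup the host mentions.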
What they're doing with them, it's going to get there. Yeah. And we have those giant networks that they're now creating, like Nvidia released that Megatron thing. And GPT, what is it, three right now? Yeah, GPT-3 is the one that they're not releasing. It's going to get there, one way or another.

So does it make you excited to try these models in real life? Or do you think they're still too far from real life?

It makes me excited, and I think they're doing really well. It's just that this whole trend has kind of been going in a not-user-friendly direction. No one can run any of these models; you need like nine A100s, like $100,000 worth of compute. Who has that, other than the places that are already doing it? Look at what GPT-3 took: I don't know how many billions and billions of parameters. No one can run that unless you're at some super big company. My opinion is, what's the point? You can always throw more and more hardware at it, and you can always get 0.001 percent closer and closer. And that's kind of where this whole area of research is. It's kind of the new thing: we've already kind of maxed ourselves out on neural nets, I personally believe, unless there are some huge architecture changes that inspire some really interesting stuff. I don't see neural nets changing that much, and I feel like we can do more with the next step, the approximate nearest neighbor, vector similarity search. I think that's where we can make some new headway until we max this out. But yeah, I don't know. I'm excited to see where it goes. I just hope it's going to be something I can try myself, and not need to spend $400 on Amazon for hours' worth of calculations.

Yeah. So maybe somebody needs to work on compressing these models, sort of compressing the compute power they actually require.
But the thing is, now it's like everyone's interested when they say, "oh yeah, we used $40 million worth of compute." Everyone thinks that's cool; that's going to get some news, and people are going to be interested in it when it's bigger. But, I don't know, I remember when I was studying all this, there was a lot of work on sparse neural nets, and that was kind of going to be the future of compressing all this down. Unfortunately, I haven't really kept up on it too much, but hopefully they do make moves, because again, I don't see throwing more and more hardware at it as innovation, compared to actually making it efficient. Yeah, that's my take. People are doing it, so there's a reason to do it. Yeah, for sure.

It's like, I guess the hope of researchers is to essentially emulate the human brain, right? But I think the human brain has like a hundred billion neurons, or even more, right? I don't know. It has a bunch of neurons, and then there are all the connections we have. Yeah.

Yeah, I don't think hardware-wise we're going to be close yet. I know they're doing a lot of research in this area, but I definitely don't think so; that's the way I estimate it. Yeah, yeah, yeah.

But just to close on that thought: the human brain does not use the energy of a server farm, just one electric lamp or whatever. Yeah, even more efficiency right there. A couple of kilos and that's it, right? That's the device, and then we can do all of this. Yeah, there is a long way to go. Yeah, and it's exciting though. Yeah, absolutely.

But actually, today I learned from my colleague Aarne Talman, and I will make sure to ask him for the link to this paper. The paper said that if you take a model like BERT, BERT will not be able to distinguish negations.
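This is not BERT itself, but a dependency-free toy that shows why models leaning on surface similarity can miss negation: adding "not" barely moves a bag-of-words vector, even though it flips the meaning. The sentences are made up for the example.

```python
from collections import Counter
import math

def bow_cosine(s1, s2):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    c1, c2 = Counter(s1.split()), Counter(s2.split())
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(c1) * norm(c2))

a = "the results of this model are reliable"
b = "the results of this model are not reliable"
# Opposite meanings, yet the vectors are nearly identical.
print(round(bow_cosine(a, b), 3))  # ~0.94
```

Contextual models like BERT are far richer than bags of words, but the research being discussed suggests a related blind spot: a single "not" changes the representation much less than it changes the meaning.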
Have you seen such research? Which is very amazing: for a model as powerful as BERT not to be able to distinguish negations, that could be a deal breaker in some cases. Even though BERT is a very powerful model, and Google is probably using it for a few percent of their searches. But to know that it doesn't distinguish between a negated and a non-negated phrase, that's interesting.

And that brings more questions in languages that have double negations: where it matters, where it doesn't matter. Double negations in some languages are used quite a bit, and they really change the meaning. So that's where it's like, what do we do there? But I wonder why. I wonder if that's something where we know why, or if it's just a black box of mystery doing something there.

Yeah, because if it was a rule-based system, you could claim: hey, I have the rules here, right? And I've managed to encode negations; I know what they are. And maybe you run out of all possible combinations, then you add another one and another one. But with BERT, you don't do that, right? You mask the text and you train it, and that's it. You didn't tell it what negation is. And you can also argue that in our human brain, we don't have syntax either, right? There are some studies like that. For instance, when kids learn to speak a language, they don't know what negation is, right? They don't know the syntactic structure, or pronouns, or whatever. They just speak. So we probably don't use syntax in our brain either; we use some semantic grams, I don't know, something like that. Yeah. It's an exciting topic.

So, if we go back to Milvus: you guys have essentially built support for a number of indexing algorithms, as you said, right? Yeah. And can I, as a user, also plug in my own method?

So, we're currently working on those plans.
Right now, it's kind of blocked off; there's a bunch of changes you'd have to make deeper in. But we are working towards something along those lines, where we'll have a system where you can kind of bring your own in. We're also trying to add Google's ScaNN, and we're working on DiskANN, so we're already working on putting all the main ones in. But right now, you can't really do your own.

And there's another question that comes up a lot of the time: can you use your own distance metric? And that's one where, unfortunately, you kind of lose. You can do your own distance metric, but it would require you to only use a flat index, because the reason all of these algorithms are efficient is that they exploit specific distance metrics. Let's say with quantization, it kind of plugs and plays together nicely. When you try changing those things, everything breaks, and you have to revert to a flat-based system. But yeah, for now we're trying to add in all of the most famous, or not most famous, but state-of-the-art nearest neighbor search algorithms. And then later on, hopefully, we can make this thing where you can code it yourself, plug it in, and make it work.

And also, Milvus pays a lot of attention to scalability, so for instance horizontal scaling, not only vertical, right? But for instance, one thing that I've been thinking about is that in the whole infrastructure and pipeline of a search engine, one of the bottlenecks is actually getting the data in, right? The data comes in, let's say in raw format: news items, images, what have you. Now you need to compute the embeddings at some good rate, right? So, throughput. Yeah. So do you guys have any work done in this area, or recommendations for the users?
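As an aside on the custom-metric point in this exchange: a flat (brute-force) index is the one structure that accepts any user-supplied metric, and a linear scan makes that obvious. This is a toy sketch of the idea, not Milvus's API; the IDs, vectors, and the weighted metric are invented for the example.

```python
def flat_search(query, vectors, metric, top_k=3):
    """Brute-force 'FLAT' index: score every stored vector with an
    arbitrary, user-supplied distance metric and return the closest IDs.
    ANN structures (IVF, HNSW, PQ, ...) generally can't do this, since
    their speedups bake in assumptions about the metric."""
    scored = sorted(vectors.items(), key=lambda kv: metric(query, kv[1]))
    return [vid for vid, _ in scored[:top_k]]

# A custom, non-standard metric: weighted Manhattan distance.
weights = [1.0, 5.0]
def weighted_l1(a, b):
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

vectors = {1: (0.0, 0.0), 2: (3.0, 0.0), 3: (0.0, 1.0)}
print(flat_search((0.0, 0.0), vectors, weighted_l1, top_k=2))  # [1, 2]
```

Note how ID 3 loses to ID 2 under this metric even though it is closer in plain Euclidean terms: exactly the kind of behavior a prebuilt ANN index cannot honor without being rebuilt around the new metric.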
So what we've been recommending for now is having an inference server; Nvidia has one, and there are a few other inference servers, and you scale those up themselves. We are also currently working on this ourselves; we're calling it Towhee. It's an ML pipeline system that's scalable and focuses on embeddings, so doing all these things around embeddings, everything embeddings, and making pipelines that can scale across multiple machines and multiple GPUs. It's still a work in progress, pretty early stage; that's what I'm currently working on too. There are people that don't know that step and don't know what the best process is, and it's also open source. So it's going to be the step ahead of Milvus, and then as it progresses we'll interlink it with Milvus, make them easy to plug and play together. But for now, it's all about scaling up inference servers. Luckily, you can scale that pretty easily. When it comes to videos, where frame order matters, it's a little different. But yes, for now, we at Milvus are only the storage and search part. Everything above it is up to the user. Yeah.

And in Milvus, do I only store the vectors, or can I also store the input object?

Right now, no input object. That was kind of a decision: it slows things down a lot, and that's where a NoSQL database or something like that would work a lot better for those quick retrievals where you need exact lookups. So for now we store ints, and then we're going to store strings, and later on we're going to add more and more types that can be stored alongside. Another thing: we're hoping to also be able to build indexes on strings, like you can on ints. So for now, we don't store the objects. In the future, when we have strings, you can link the file path.
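The split described here, where the vector store holds only (integer ID, vector) and a separate key-value store maps IDs back to file paths, can be sketched with two plain maps. This illustrates the architecture only; it is not Milvus's actual API, and the paths and vectors are made up.

```python
import math

# The "pure vector database" part: int IDs mapped to embedding vectors only.
vector_store = {
    101: (0.1, 0.9),
    102: (0.8, 0.2),
}

# What a NoSQL/KV store would hold alongside it: ID -> original object's path.
object_store = {
    101: "/data/images/cat.jpg",
    102: "/data/images/dog.jpg",
}

def nearest(query):
    """Find the nearest vector, then resolve its ID to the stored object."""
    best_id = min(vector_store, key=lambda i: math.dist(query, vector_store[i]))
    return object_store[best_id]

print(nearest((0.9, 0.1)))  # /data/images/dog.jpg
```

The vector side answers "which ID is closest?"; everything about displaying or fetching the original object lives in the second store.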
Because that's usually what happens: when you store an object, you're just storing the file path. But yeah, object storage is currently not part of the Milvus server; you just get an ID and a vector. Yeah, yeah. And then you go back and link it to the other database where those objects are stored, in case you need to display them or something like that.

So in that sense, you guys are like a pure vector database. Exactly. Literally the vectors plus, I guess, the scalar values that you can filter on, right? Yeah, exactly. Oh, that's awesome. I think that covers a lot of use cases, doesn't it? Yeah.

And I was also thinking: Milvus is open source, and this is one of my favorite questions. Can you speak more about why Milvus is open source? What do you get from it being open source?

So I think the biggest thing right now is that you need open source to get this idea out. With vector search, if you close-source it, people don't really know what's going on; you don't really get the information that you want. And that doesn't spark competition, because if you're not in there already, you need to find out everything yourself, do it all just to catch up, and that's going to take a long time. With open source, it promotes this competition and innovation, because everyone can see what we're doing. You can see all these algorithms, you can learn how vector search works, and then people can branch off and do their own. Sure, it's competition, but the only way you're going to get more users familiar with this and knowing about vector search is to get as many people doing it, as many people trying their own routes, as many people building up their own systems, and just getting it out there. That's what we want, and I think that's the biggest way you can do it.
If you open source, everyone can see what's going on, learn from it and just go ahead. +And then also with open source, you kind of get feedback from everyone from all different areas, from all different like, you can be a student working on a project who has some great idea. Like he's not some company. +So if you're like sometimes close source, if it's not bringing money in or something like that, no one really really listened to that small student and his idea. So where he might not be able to use it. +So it's just about getting more perspectives on it and getting more input and kind of making it accessible to everyone and sparking that competition innovation. Yeah, actually you brought a very interesting topic. I didn't think about it that way to be honest. + And now that you said it, it's very logical that it may as well be a competition between some users because they are using the same tech and they have different use cases or maybe the same use case, but they are competing like to get that last sort of percent of precision out or whatever you're doing. +But also at the same time, you know, like when you look at open source projects, like I don't know Apache Software Foundation for that sake, you know, when you go there and you ask a question, you, first of all, you don't have to say the company that you work for, right? +Or maybe you are that student that you said. +And, and you know, you just focus on the matter, right? You focus on what is it you're asking about? And then if somebody is so curious, even if they're competing over the same thing, they might kind of casually share something, right? +I mean, that's what I've seen in the in the mailing lists a lot. +Like users just some of the other users, they just come in and say, Hey, why are you doing this? You know, did you consider something else? And you're focusing so much on solving a specific problem? Yeah, I think it's just, yeah, it's kind of in like with competition, there's innovation. 
And with innovation, you get more people interested. I think that's kind of what happened with neural nets when they started out. I don't think everyone was using them; everyone was just using some brute-force text search, keyword matching. And then as people learned about it more, there were open source systems. And with a lot of these neural nets: if you're going to be making neural nets, you're going to be doing research on them, you're going to be posting those papers, everyone's going to see them and build on top of them, and it'll just explode. I think that's now happening with these BERT models, with Hugging Face, all of this. It's just exploding, more people are looking into it, and it's better for everything at this point. So that's kind of why we do it. There's a bit of my opinion and a bit of company motto in there, but that's the reason.

Yeah, it's kind of like a compound effect of multiple inputs. And then everyone essentially has the same goal: to serve the users the best. Or maybe to solve that specific problem they're solving, maybe even for themselves. But yeah, that's very interesting.

And so basically you have Slack, where I can go and ask my question. How do you balance your time between doing the actual work and helping the community?

So that's a hard one. Right now with Slack, it's people that come to us. Because this area hasn't blown up so much, it's still manageable. I'm thinking of the future, when you get to the levels of these other open source projects, where they have like 20,000 people in their Slack, all posting questions. Right now it's pretty manageable, and you can keep on top of it. And besides Slack, we made a Discourse; we made a lot of preliminary areas where you can talk to us. And then there are GitHub issues, all that. But there's also another aspect of splitting up the problems.
Because if you open up a Slack, people might post their technical problems there, or something that might be worth being a GitHub issue. The people looking at the Slack the majority of the time are more user-success style, not full-blown R&D deep engineers. So that's where I think the balance problem comes up: what belongs on Slack, and what belongs on GitHub as an issue. But for now, all of it's easily solvable, because it's a steady inflow that we can manage, and we have enough people looking at it. We'll see; in the future that's going to be another problem to deal with, and I'm interested to see how we handle it.

Yeah, it's like a catastrophic success, right? Exactly, it may happen. But hopefully it will be manageable in your case, and you can, as you said, cater to that community as well as actually keep solving things and keep your roadmap under control, because you also need to keep innovating in this space, right? Yeah, I'm glad it's working for you.

And I've been also Slacking a bit with you guys; Slacking is maybe not the right word. I got immediate answers to my questions, and there's been a long thread: why doesn't Docker work, can you try this, can you try that? And it's also, you know, the first impression you get of the database, or of the open source product: how soon do you get an answer?

Yeah, and you definitely try. Sometimes you're working on one problem and you have another problem that's completely separate; it's a big system, so you're jumping around between them. But then it's also: okay, let me find someone to answer that for you. So you go look internally for someone: hey, can you answer this? But hopefully it's working. I think we're pretty quick with our responses. Maybe overnight it's sometimes difficult, with everyone sleeping and everything.
But we try to get responses whenever we can.

Yeah, some people are in China, I guess. It was like a five-hour difference with my time zone sometimes.

With yours, yes, probably five; mine is 14. So you ask one question and it takes a couple of days, right? Yeah, it gets interesting with the very deep technical questions, because then I have to bridge the gap in time, try to find solutions on my own as to why it's going wrong. But then once five o'clock hits for me, I can pull in the external knowledge from the other team. But it's fun.

This needs to be solved with vector search: I don't need an exact answer, just an approximate one, but faster.

Oh yeah, we were working on that. We're trying to apply it to a chatbot for all the problems that you have. It's been working okay, but we're working on it, trying to get more questions, kind of building up a dataset. That's the issue with everything: just building up that dataset.

Yeah, absolutely. So that it will make sense for the chatbot, because a chatbot wouldn't create answers. Well, unless it's some generative model. Yeah, GPT-3 for our questions: it would make a story out of it. Yeah, but then it might be hallucinating as well. In some cases that's okay though, right? Yeah, one out of ten is the correct answer; the other ones are all just like "burn your computer". Yeah, if you want to have fun with it, you don't need an exact answer. You just go: okay, hey buddy, how are you doing?

Yeah, so that's fantastic. So it's lovely moving to the WHY section, even though I didn't announce all the sections, but we kind of mixed WHAT and HOW together in many ways, and you handled it really, really well. You know, the WHY question that I really like to ask everyone on this show is: what motivates you to be part of vector search development today?

I think for me the biggest thing, and I went over it a few times, is everyone storing all this data.
And it's a huge amount, across all these companies. And then, as the next step, I want to see what we can do with it. Vector search is one way; who knows, maybe vector search might not be it. But in that chase of figuring out vector search, of perfecting it, something might pop out, and you kind of ride this wave of what's next. That's why I really like vector search right now. I get to learn about all these things, I still get to throw my ideas into it and have them matter. The previous wave that's past already has gotten to the point where you really need deep, deep knowledge to actually be able to innovate. You do with vector search and all of these things too, but it's a little bit fresher, if that makes sense. I want to ride that wave of freshness, the next step of dealing with these huge amounts of data.

Yeah, that's amazing. And I think I've read somewhere, from one of the founders of Y Combinator, it was an essay: he said that when you are on the bleeding edge of doing something, you automatically become the expert in that field. And if something works for you, the rest of the market will probably try to copy it. If it didn't work, then probably everyone else couldn't figure it out either, because you are the bleeding-edge expert, right? You are right there. And if you figure out something very interesting, something revolutionary in some way, then you will be the first to possibly capture the value, right? And so you work toward that goal. On one hand, as you said, it motivates you to unlock those silos of data, unstructured and structured. On the other hand, you said maybe it won't be vector search, maybe it will be something else, because you are in that experimental mode, right? Yeah.
Whereas you can quickly transition and keep that knowledge, keep it going and keep it running. Yeah, that's pretty much why I'm doing this. It's been really fun so far. Also startups: I like it, I like wearing multiple hats. Kind of just do it, trial by fire, and just get it done.

Yeah, yeah. Putting your money where your mouth is, how do you say it in English? Put your money where your mouth is? Yeah, something along those lines. But I mean, instead of just blogging or saying how cool it is, you actually go and try to apply it to some real use case, right? Exactly.

And if I may ask you: do you think that something, tactically or strategically, is missing right now in the vector search space? Maybe along the lines of how we explain it, or maybe there are some untapped use cases, or something else that comes to mind?

I think the big thing right now, and I may be wrong on this one, I might be explaining it weirdly, is having a standard. We don't really have a standard for any of this yet, and there are a bunch of things popping up, and everyone's going to be scared to move away. Like, I feel like with Elasticsearch, and I don't know too much about the history and what's going on, but a bunch of people have built up their systems on it, and it's kind of been the standard for that text-based keyword searching and that kind of stuff. And then when we say, oh yeah, do word embeddings, it'll make everything improve, do this, do this, there's no standard in any of it. Some of us are doing a vector database, some people are doing vector search with a database attached. Everyone's kind of just doing their own thing; there's no big anchor that keeps people trying to make it similar. So yeah, there's no standard, which I think is kind of an issue.
And it's going to hurt everyone in the long run, because with no standard, people won't be as excited to try it out: there are too many options, and switching is too much of a pain. I don't know if that made sense, but that's sort of what I'm seeing as an issue right now. It'll probably solve itself at some point; I think that happens naturally.

I guess explaining it, you could have seen from my earlier attempt at explaining vector search, it was all over the place. It's a step, it's a jump, and, yeah, not everyone will know similarity, like cosine distances. You need to be somewhat involved with machine learning. I think the best way around that is just making full pipelines for people, where you just put an image in, you get your result, and then go from there; from there on they can start messing around with it. But in time, I think everyone will have that, and everyone's working toward it.

Yeah, I think what you said makes a lot of sense, and thanks for bringing up this topic of standardization, because on one hand it points us to think that this field is still fragmented, right? I've blogged about it: I had six databases, and then one evening I get a comment on the blog saying, hey, we are the new kid on the block, can you add us? That's the seventh database, right? So how many more are there? Probably tens, I don't know.

Oh yeah, they're always popping up. But it's good for innovation. It's just that we're competing against ourselves. We can all compete against each other, but the people that are actually going to use it are going to look at this mess and say: why would we go into that area? Yeah.

And there was actually, you know, the Relevance Slack, I don't know if you're on it; it's a community of all the search and relevance consultants. I think I am. Yeah. Yeah.
Yeah. It's a fantastic place. I'll also make sure to link it in the notes. And there was one very interesting piece touching on what you just said: there was a heated discussion on what we should call it, pre-filtering, single-stage filtering, something-filtering. And you know, when you invent that, you go to your users and say: yeah, today we released single-stage pre-filter after-filter filtering, so please use it, right? And then some other company comes and says: no, we invented another one, it's called after-pre-filtering single-stage with double sub-stages. I'm just making it up, obviously. Exactly, you know what I mean. I'm exaggerating, right? And then, as you said, eventually it will hurt the users, because they will say: oh no, no, I have that single-stage filter-after-filtering, I will not go and switch to another one, right?

Yeah, exactly. But I think it's all young. I'm kind of new to this whole field of how things work, but I think it's natural for these young fields: everyone's going to race to the top and see. You've just got to do what you've got to do. But it's interesting how it's all going to play out, whether there's going to be more communication between everyone or not. I don't know; it might be a little bit above what I'm doing. But we'll see, the future awaits with all of this.

But I still feel like, at the end of the day, you really need to focus on the users, right? You're not focusing on inventing a new term book or a new dictionary for vector search. Eventually it will be published, by the way; I'm sure there will be so many terms, it will be published. But we need to work toward that. Yeah. No, I agree, 100%.

So hey Philip, it's been a really great discussion.
I was thinking, would you like to announce something to the users of Milvus, or maybe to those who are not yet using Milvus but would like to try it out?

Yeah. So we'd love to have people get involved. It's a pretty easy one to check out and see how our system works. And we are releasing a general release candidate, so a pretty much tried-and-true Milvus 2.0, in the coming month.

And you also mentioned that other system, Towhee you said. What is it about, and when will it be out?

Yeah, Towhee is ML pipeline software, simplifying things mainly for embeddings. It's all about embeddings and kind of making these pipelines for you. So it's a pipeline system everyone can operate, and everyone can upload their solutions if they want, and download them. And yeah, it's still a work in progress, but look out for it, because I think it's going to help in a lot of these areas.

Yeah, that's super cool. That sounds very exciting, you know, to take this package, plug things in, and try it out for real on real data. Exactly. That's fantastic. Thanks for doing this. We'll make sure to also mention this in the show notes, with a link if it's there by then. Yeah, awesome.

Thanks, Philip, so much for your time, for going so deep with me, even into the philosophy behind neural networks, and for sharing your ideas and thoughts. Thanks so much. And I hope we can make another episode at some point down the road, if you're open to it, especially as the company matures and the product matures and you get more use cases. I'm looking forward to more blog posts as well. Awesome. Yeah, thanks for having me. It was a really fun discussion. Thanks so much, Philip. Bye bye. Bye.
\ No newline at end of file diff --git a/transcripts/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md b/transcripts/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md new file mode 100644 index 0000000..1eefc66 --- /dev/null +++ b/transcripts/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md @@ -0,0 +1,321 @@ +--- +description: '

Vector Podcast Live

Topics:

00:00 Kick-off, introducing co:rise study platform

03:03 Grant’s background

04:58 Principle of 3 C’s in the life of a CTO: Code, Conferences and Customers

07:16 Principle of 3 C’s in the Search Engine development: Content, Collaboration and Context

11:51 Balance between manual tuning in pursuit to learn and Machine Learning

15:42 How to nurture intuition in building search engine algorithms

18:51 How to change the approach of organizations to true experimentation

23:17 Where should one start in approaching the data (like click logs) for developing a search engine

29:36 How to measure the success of your search engine

33:50 The role of manual query rating to improve search result relevancy

36:56 What are the available datasets, tools and algorithms that allow us to build a search engine?

41:56 Vector search and its role in broad search engine development and how the profession is shaping up

49:01 The magical question of WHY: what motivates Grant to stay in the space

52:09 Announcement from Grant: course discount code DGSEARCH10

54:55 Questions from the audience

Show notes:

- Grant’s interview at Berlin Buzzwords 2016: https://www.youtube.com/watch?v=Y13gZM5EGdc

- “BM25 is so Yesterday: Modern Techniques for Better Search”: https://www.youtube.com/watch?v=CRZfc9lj7Po

- “Taming Text” - book co-authored by Grant: https://www.manning.com/books/taming-text

- Search Fundamentals course - https://corise.com/course/search-fundamentals

- Search with ML course - https://corise.com/course/search-with-machine-learning

- Click Models for Web Search: https://github.com/markovi/PyClick

- Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing, book by Ron Kohavi et al: https://www.amazon.com/Trustworthy-Online-Controlled-Experiments-Practical-ebook/dp/B0845Y3DJV

- Quepid, open source tool and free service for query rating and relevancy tuning: https://quepid.com/

- Grant’s talk in 2013 where he discussed the need of a vector field in Lucene and Solr: https://www.youtube.com/watch?v=dCCqauwMWFE

- CLIP model for multimodal search: https://openai.com/blog/clip/

- Demo of multimodal search with CLIP: https://blog.muves.io/multilingual-and-multimodal-vector-search-with-hardware-acceleration-2091a825de78

- Learning to Boost: https://www.youtube.com/watch?v=af1dyamySCs

- Dmitry’s Medium List on Vector Search: https://medium.com/@dmitry-kan/list/vector-search-e9b564d14274

'
image_url: https://media.rss.com/vector-podcast/20220609_020607_0461c4544521e6be53134d28774b7c4a.jpg
pub_date: Thu, 09 Jun 2022 14:51:07 GMT
title: Grant Ingersoll - Fractional CTO, Leading Search Consultant - Engineering Better Search
url: https://rss.com/podcasts/vector-podcast/514832
---

Hello there, Vector Podcast is here. I'm Dmitry Kan and I'll be hosting this session. And just a few words on the logistics: everyone in the audience, feel free to submit your questions either through the Q&A panel or directly in the chat, and we will try to handle as many questions as we can. I'll say a few words about co:rise. co:rise is a new education platform that transforms the way professionals build technical, high-demand skills through top industry instructors and collective peer learning. And the format of their courses is innovative, mixing live instructor sessions with real-world projects and fireside chats like this one with operators who are experts in their fields. I will say a few words about myself as well, untraditionally for the podcast, but I think it's becoming a tradition now — second time. I'm Dmitry Kan, I have a PhD in natural language processing. I worked at the company AlphaSense and helped build the search stack — I spent like a decade, you know, there. I've been a principal AI scientist at Silo AI, an AI consulting company focusing on a number of ML verticals. And recently I joined the company TomTom as a senior product manager working on search. I've also been a contributor and user of Quepid. It's a query rating tool, go check it out — it's an open source tool. So overall I've spent like 16 years developing search engines for startups and multinational technology giants. I also happen to be hosting this podcast, Vector Podcast — go check it out, I'll share the link in a second. And I'm also blogging on Medium on my findings in vector search. So you might hear me talking about vector search here and there.
And today I'm super, super excited to have Grant Ingersoll with me. I've known Grant since about 2011. Not personally, but I've seen him on stage at, you know, the Berlin Buzzwords conference and Lucene/Solr Revolution. And he has been a long-time contributor in open source as well: Solr, Lucene, Mahout and others. And a very, very effective presenter. I just watched a few presentations as homework for this session — there will be some questions from there. But hey Grant, let's start with an introduction in your own words. Hey Dmitri, and thank you so much for having me on the Vector Podcast, and obviously props to co:rise here as well for helping sponsor this. Both Daniel Tunkelang and I are on the co:rise platform and really enjoying our time there. So real quick about myself. As you said, my name is Grant Ingersoll. I guess these days a long-standing user and contributor and committer, and generally somebody who participates in the search space, if you will. I think I wrote my first Lucene code back in 2004 or so — I guess that maybe makes me old. As far as my background: I was one of the co-founders of Lucidworks, which is one of the leading companies in the search space. I then left them in 2019 to become the chief technology officer at the Wikimedia Foundation. You probably know them better as the nonprofit behind Wikipedia and Wikidata. So I was the CTO there for two years. And then in August or so of 2021, I took some time off. And then in January of 2022, I went out on my own as a consultant and an instructor for co:rise. So here we are now. I am commonly doing work in what I would call fractional CTO land, which means I primarily help companies kind of get their technology stack in order, make decisions about technology, hire teams, upgrade teams — do all the things that a CTO would do, often for small businesses and/or startups. And so that's really my background. Really happy to be here and looking forward to the podcast. Awesome.
Great to have you, really, Grant. And also, you know, finally I have a chance to ask some questions and chat with you in this cozy atmosphere as well. And I wanted to start with a question. So I was watching a kind of short interview you gave during Berlin Buzzwords 2016, where you said how you split your time as the then-CTO of, I believe, Lucidworks. You said that you split your time between three C's, which is writing code, going to conferences and talking to customers. Now that you're independent, is this how you spend your time, or did you get some new letters of the alphabet? Yeah. Yeah, and there are often, in there as well, colleagues and co-workers. You know, especially in — you know, the CTO role is kind of a funny one, right? Depending on the company, it can mean a lot of different things. At some companies, CTOs are entirely outward facing. It's effectively a sales role or a marketing role, right? You're out evangelizing the product, you're talking to customers, etc. In a startup, the CTO is often the primary engineer. If you're a two-person startup and you're just getting off the ground, you probably have the CTO title if you're the technical one in that startup, and you're probably writing all the code, right? In other places, you're running your engineering team, and you may not be writing as much code, but you're responsible for the team. I guess over my years, I've worn all of those hats. I've been out doing conferences and evangelizing. I've done a lot of sales work — especially later on at Lucidworks, I did a lot of sales work as the company evolved and grew. When I was at Wikimedia, it was all pretty much internal: running the technology team, you know, helping make technology decisions, all of those kinds of things. So I wouldn't necessarily say it's changed much. I still do write some code, but not as much as I used to, I guess, when I was a full-time engineer. But yeah, it still roughly falls into those categories.
Yeah, and I mean, having been a student on your course, I've really enjoyed so much of the code that you've written to support this infrastructure of building the search engine. And I mean, you are still a highly technical person, so I wouldn't discount that. And I mean, this is something that is dear to my heart as well, for me being an engineer, to talk to a like-minded person. And in the same vein, at the same conference in 2017, you gave an excellent talk titled "BM25 is so Yesterday: Modern Techniques for Better Search". And what's funny — and I'm going to share the link as well — but what's funny is that, I don't know if you noticed it yourself, but you again have three C's in there. I wonder if you did it on purpose. What you have there as building blocks of this kind of journey of building a search engine: so the first one is content. And you piggyback on Solr capabilities, but in general it could be any search engine out there, with rules for content, like with boosting, manual boosting, you know, landing pages, and so on. The second C is collaboration. So that's, the way you put it, collective intelligence to predict user behavior based on, like, historical aggregated data. And this is where I think recommenders come in, popularity signals, and so on. And last but not least, you have context, which is when you ask questions: who are you, where are you, you know, what have you done previously. And this is when you start doing market and user segmentation and venture into personalization and so on. Would you say that you view the search engine journey and development the same way today, or have you changed your perspective? I really need to check and get a little more creative. I think I'm using the letter C there too many times in a row, but I mean, I think a lot of that still stands pretty true.
Regardless of the engine you're using or whether you're using deep learning techniques or not, like, you know, at the end of the day, you're trying to match users to information that will help them make better decisions or be more informed, right. And you know, these days, I would probably add in one more. I'm trying to think of how I could be witty and make it into another C, but you know, in working with Daniel Tunkelang on this class, one of the things that has just absolutely wowed me is the query understanding aspect of it. And so maybe you could put that into the context category if you wanted. But, you know, realistically speaking, that's work you can do, especially in large-scale environments where you have a lot of queries, to really understand what users are asking or intending to ask when they put in a query. So I would probably throw that in there if I didn't include it back then. And so like I said, maybe that's part of your content or your context: the actual query a user is asking, or the set of queries that a user is asking. You know, but I still think a lot of that stands at the conceptual level, right. If you think about it — this is the Vector Podcast, right — all of this stuff, we're building vectors and then essentially calculating this fancy version of a cosine similarity between them. And at the end of the day, all of these techniques we're doing are effectively: how can we shape those vectors so that things that are meant to be closer together show up closer together, and things that are not as related, the cosine is further apart, right. Like at the end of the day, that math doesn't change, yet all these techniques, whether it's deep learning, et cetera, are all about creating those vectors and doing that calculation, right.
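The vector framing Grant describes — shaping vectors so that related things end up closer under a cosine comparison — can be sketched in a few lines. This is an illustrative toy, not anything from the episode; the embedding values are made-up numbers:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: the query should land closer to the relevant document.
query = [0.9, 0.1, 0.3]
relevant_doc = [0.8, 0.2, 0.4]
unrelated_doc = [0.1, 0.9, 0.1]

assert cosine_similarity(query, relevant_doc) > cosine_similarity(query, unrelated_doc)
```

Every technique mentioned here — synonyms, embeddings, popularity boosts — ultimately moves these vectors around so that this one comparison ranks the right things higher.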
And so by understanding your content, you're shifting those vectors, you're transforming them in the space, you're adding synonyms, you're adding embeddings, all of those kinds of things — you're adding proper nouns, you're doing noun phrases, et cetera. By understanding your context, you're able to ask better queries, right, which is shifting the query vector, right. And by using popularity, et cetera, you're also then shifting those vectors by essentially adding more weight to things that are more popular, right. You know, so at the end of the day, yeah, I would say I'd still stand by that, with the caveat of really bringing forward the query understanding aspect of it. Yeah, I think query understanding — you put it brilliantly — is like a really exciting space, and we actually recorded a podcast as well with Daniel Tunkelang, where he explained a lot of it; he also blogged a lot about it. So go check it out. And in that same presentation, when you demoed the capabilities of the Lucidworks platform, where you played a lot with different ranking strategies — basically you pre-trained some of them and you were able to switch live — I felt like you are a tinkerer as well. You enjoy really going deep down into what a search engine can do and what you can extract from the data. And my question is, where do you see the balance between kind of doing this in a more manual fashion, where you actually educate yourself, right, versus throwing it to a machine learning model? Yeah, it's a great question. I mean, I think, you know, obviously — and I see my former colleague and co-founder Erik Hatcher is on — I mean, he used to always say "it depends", and I'd say it depends here of course as well. Which is, you know, I mean, there are some situations where you just don't have enough data for machine learning, right? So by default, you are going to be manually tuning the situation, right?
You see that a lot in enterprise systems, especially smaller enterprise systems or in niche applications where, you know, effectively search just needs to be good enough. Maybe you're not monetizing search. And so you don't, you know — you just kind of need it to be reasonably good, right? It's a feature in a much broader set of features that users are going to engage with. And so, you know, where and how you would use machine learning in those situations — you may or may not. In the situations where you have lots and lots of data, lots of users, you're probably monetizing search, whether that's via e-commerce or web search or ads or whatever. Like, you know, I think machine learning makes a lot more sense there, and it's a lot easier to run these types of experiments that allow you to tinker, not just with the hand-ranked models — which I think hand-ranking still has its place, right? Because they help you form intuition about what is in your data, right? And that intuition is really important even in a machine learning world because, you know, at the end of the day, even with machine learning, while you can try out a lot more features and approaches, you still have limited time, right? And so, you still have to have some intuition about what's going to work, and I think there's no substitute for that intuition helping guide you into what matters. Like, so for instance, in a learning-to-rank scenario where you're actually learning a ranking model, you still are often building up those systems using the features of your data.
So you have to know what those features are, and one of the nice things is, with Lucene-based engines like OpenSearch or Solr or Elasticsearch — and I'm sure Vespa has the same kind of thing — you can go and play around with those; you can create your own function queries that allow you to roughly try out different formulas for ranking, and then you can go and turn those things into machine learning models, right? That learn a much more effective function than what you could come up with, right? So, I think even in this world of large data sets and machine learning, you're still going to have to build intuition, right? Yeah, absolutely. And in your own experience, and in the experience of the teams that you supported, how do you nurture this intuition? Like, do you read books? Do you constantly experiment? And also, when it comes, you know, to understanding fundamentals of search — let's say knowing how the TF-IDF formula is composed, or BM25, what are the trade-offs — versus sort of going and actually experimenting and trying out things, you know, where do you see that balance as well, for yourself maybe and also for the teams around you? Yeah, I mean, I think everybody will have their own, you know, kind of — depending on where you come from, right? Like if you have — if you've done deep academic work, you're probably going to have a lot more understanding of the math and the theoretical side of it. And then you're going to have to develop the intuition of real-world data, right? How messy it is, how clunky it is, how full of junk and spam, et cetera, right? Because a lot of times when you're dealing with academic data sets, they're pretty clean, right? Relatively speaking — they still of course have their own set of garbage and nuances in them.
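As a concrete anchor for the ranking-formula fundamentals mentioned here (TF-IDF, BM25), below is a minimal sketch of a single BM25 term score in Python. It uses the standard formulation with the `k1` and `b` defaults commonly seen in Lucene-based engines; it's an illustration of the math, not any engine's exact implementation:

```python
import math

def bm25_term_score(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Score contribution of one query term for one document (standard BM25).

    tf: term frequency in the document; df: number of docs containing the term.
    """
    # Rarer terms get a higher inverse document frequency weight.
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates (k1) and is normalized by document length (b).
    tf_norm = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# A rare term (low df) contributes more than a common term, all else equal.
rare = bm25_term_score(tf=3, df=10, num_docs=100_000, doc_len=120, avg_doc_len=200)
common = bm25_term_score(tf=3, df=50_000, num_docs=100_000, doc_len=120, avg_doc_len=200)
assert rare > common
```

Playing with `k1` and `b` on a real index is exactly the kind of manual tuning that builds the intuition discussed above.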
Whereas if you're an engineer and you're coming at it from — like, hey, you know, often what I see with engineers is they come at it from a quantitative standpoint of, I want to make sure this is scalable and reliable. So they're solving for the hardening-of-the-system problem first. And then they often will develop the relevance side of it, or the understanding of the data, second. Now again, broad generalizations there, because, you know, folks have all kinds of different backgrounds. But you know, so as a leader and somebody who, you know, manages people in this space, I would often just work with you depending on what your background and understanding and intuition is. And then, you know, try to help you complement whatever it is you're missing there, right? Like, I think you have to have an understanding of how these engines work. I've often seen folks who don't have an understanding of all the capabilities of these modern search engines recreate the wheel, right? Like, they're reinventing the wheel because they're coming from the first principles of the math that they learned at the academic level. And then — but they don't necessarily know how that applies to real data in the real world. Whereas a lot of these, you know, modern search engines, because they grew up in large-scale, you know, publicly traded, high-volume spaces, they've really been hardened on the engineering side, and they really know how to deal with all the nuances of real-world data, right? And so, by learning those kinds of things, you will be much more effective at bringing to bear your intuitions and understandings from whichever background that is. I don't know if that makes sense or not. Yeah, no, absolutely. Yeah, actually in the same presentation, you also said, like, you've seen cases where you come in to help a company and they point you to sort of like a database of almost 10,000 rules.
And so you said that they have that — in principle, you could just remove Solr or whatever search engine you have and just use those rules to retrieve documents, right? But when you go and ask specific questions — what does this rule do? — the answer that you illustrated was, well, it was created by Joy, you know, and he quit five years ago. So then they said, it makes sense, so we keep it. So how do you go about convincing the organization or teams to change their perception and sort of become more flexible and move into this flywheel of experiments? Yeah, it's hard. And again, I think, you know, I mean, you have to look at incentives and first principles there, right? Like, again, if you're in this boat of, like, search is just a feature, there may or may not be any incentive. But if you're in this boat of, hey, search is a really critical aspect of what we do — our users use it all the time, it's key to revenue, it's key to timeliness, or, you know, people's lives are on the line, et cetera — you're going to invest in making search as capable as possible. Those folks usually don't take much convincing once you can show them a better way, right? They're often already frustrated by the sheer number of rules that they have. And so one of the things that can often work in those situations, I think, is, you know — a lot of these machine learning systems will actually learn the set of rules, right? And so if you want, you can just start to learn the rules by the fact that you're gathering your queries and your click logs and you're looking at the engagements users are having with the system, with the rules in place. And then over time, you know, that will learn it. The harder part often is getting that last part, which is true experimentation, whereby they actually have a system in place for running multi-variant experiments or A/B tests, right?
And they can actually try out different approaches and see which one wins and see which one's most effective, and then go with that, you know, until the next one beats it, right? That's a fair amount of engineering work to get in place. It's also a fair amount of math to do in order to make sure it's appropriate. These days, there are systems and tools that allow you to do it, but if you want to homegrow it, you know, that can take a lot of work. So getting people to be in that mindset, especially in environments or company cultures where there's pride in being right — you know, you sometimes see that in a lot of companies, where it's a whoever's-the-boss-has-to-be-right kind of situation. Those types of companies are always going to struggle with experiment mindsets because, you know, they reward, quote unquote, being right, as opposed to, quote unquote, you know, rewarding longer-term growth and incremental improvements with the occasional failures, right? So you really have to look at company culture first, and potentially reset that, and then build and bake in the necessary engineering work to make experiments work. Yeah, absolutely. I agree with that same thought that, you know, without failures, you cannot really breed the culture of creating cool new stuff, because you basically cannot unleash yourself to go and mess with your code base, right? And do things and create new stuff. So, like, you need to be brave, for sure. Well, as I think my friend Ted Dunning said, the cool thing about experimentation frameworks is you get to be wrong, and that's okay, right? Like, you're actually right by the fact that you're wrong. Because you're right in the long run, right? Yes. Even if any given experiment is flat or bad, right? But overall, you know, in the long run, you're going to win out, because it's easier and easier for you to add in a new approach. Yeah, absolutely.
I think Turnbull also said, like, you know, how you basically accumulate these bruises, right? So, like, scar tissue, as some other people say. So I think without doing things — without failing as well — you can't learn. So I totally agree with that. But still, for those who are still learning, you know — and we are discussing, to some extent, the courses that you've been teaching — where do you start? Like, let's say you have some data, right? You have some click logs within your organization, or maybe you found some data set. Where do you start? How do you go about dissecting that data set? What do you do with it as next steps, and what to avoid maybe, and what good things to keep in mind? Yeah, I mean, I think, you know, first off — I mean, a lot of companies aren't even all that great at actually collecting and managing their query logs, right? So if you've got a search engine up and running and you want to improve it, I mean, I think the first thing you have to do — again, it kind of goes back to first principles — is ask: am I measuring things that help me understand what users are doing? That's the first step, right? Like, make sure you're able to process your query logs and capture things like session history and what users clicked on, what they saw. A lot of companies will only measure what was clicked on, but they actually don't measure what was seen by the user, or at least inferred to be seen by the user. And that can be a big loss, because with a lot of these machine learning systems, you need to know what wasn't chosen just as much as you need to know what was chosen, right? So really make sure you've got the instrumentation of your system in place. And guess what? A search engine is a great place to store all of that data as well, right? As Elastic has proven out with their use of search for logs, and Splunk as well, right? And so make sure you're capturing all that stuff.
And then again, I think this is where your intuition starts to come in. So whenever I get a new data set, a new set of click logs, I start to look at: well, what are my most popular queries? What are users asking today? What are they asking overall? What led to zero results? How often are they rewriting their queries — like, they typed in a query and then they didn't like the results, so they rewrote it. You know, all of these things are pretty easily discoverable in query logs, right? So just start digging in and building some intuition for those things. So for instance, one of the things we would do when I was back at Lucidworks is what we called head/tail analysis — or long-tail analysis is another thing you see in the literature — you know, especially in the e-commerce world, where you have this power-law distribution where most people ask the same things over and over, but you often have a really long tail. When you analyze the long tail in a lot of e-commerce situations, what you often find, for instance, is that the long tail is actually pretty highly correlated with the head queries, right? And so developing that intuition of, like, you know, why are these long-tail queries working or not working — that can then help you do much better at all of your queries, right? And so, you know, from those click logs, then you start to focus on: well, how do I improve my head or my torso queries, like the ones that are most common? And then as you go on, you can look at how you handle long-tail queries, depending on how important they are to you. You know, and from that click log, then you can start to build — in some cases, it still might make sense for you to have rules. And then you can also look at — you know, again, I would try to look at the problem holistically: what's going to get me the most bang for my buck in terms of where I should spend my time, right?
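The first-pass query-log digging described here — most popular queries, zero-result rate, query rewrites within a session — can be sketched in a few lines of Python. The log records and field names below are hypothetical, not a real schema:

```python
from collections import Counter

# Hypothetical query-log records; field names are illustrative only.
log = [
    {"session": "s1", "query": "iphone 13", "num_results": 40, "clicked": True},
    {"session": "s1", "query": "iphone 13 case", "num_results": 25, "clicked": True},
    {"session": "s2", "query": "ifone 13", "num_results": 0, "clicked": False},
    {"session": "s2", "query": "iphone 13", "num_results": 40, "clicked": True},
]

# Head queries: what are users asking most?
top_queries = Counter(r["query"] for r in log).most_common(10)

# How often does the engine come back empty-handed?
zero_result_rate = sum(r["num_results"] == 0 for r in log) / len(log)

# Crude reformulation signal: consecutive queries in one session with no click in between.
reformulations = []
last_in_session = {}
for r in log:
    prev = last_in_session.get(r["session"])
    if prev is not None and not prev["clicked"]:
        reformulations.append((prev["query"], r["query"]))
    last_in_session[r["session"]] = r

print(top_queries, zero_result_rate, reformulations)
```

Reformulation pairs like `("ifone 13", "iphone 13")` are a cheap source of spelling corrections and synonym candidates before any machine learning is involved.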
So in the short run, rules are probably easier, but they're harder to maintain in the long run. And of course, you can only manage so many rules on your own, you know, even with several people, whereas machine learning may take more work up front, but in the long run is probably easier to maintain. Although I do still wonder, you know, if we're going to run into the same kinds of problems we have with rules with machine learning models, where we have so many different models being applied, and they're built by different teams, and they're applied in different scenarios — and next thing you know, you have a complexity problem on that front as well. But, you know, luckily, with things like machine learning operations becoming more of a focus and people getting much more rigorous about how they deploy and manage models, I think most of those problems will be mitigated in the long run. But it still goes back to the same core principle, which is you need to have good housekeeping in order to be successful, both with rules and with machine learning models. I don't know if that was kind of long-winded — I don't know if that answered the question or not. It does, it does. I mean, it gives the intuition, especially where you said the connection between — you know, that was an insight actually to me — the connection between head and tail, that 50% of the tail may correlate with your head. And that's amazing. Like, 50% of these super hard queries could be kind of, you know, removed from that complexity space, right? Which is — again, you know, your mileage may vary, right? Like, it depends on your data set and your app. But, you know, like in e-commerce, right — if iPhone 13 or whatever is the head query, there's probably a tail query that's, you know, silver 64-gigabyte iPhone 13 with case, right? Like, that's probably a tail query, or at least a torso query. And once you have those types of realizations, you can start to link these up.
And then the cool thing really is that the things you know about the head can apply to those types of tail queries as well. And so you might be able to more effectively manage those tail queries, even without machine learning models. Yeah, absolutely. And just a quick reminder to our respected audience: feel free to send your questions. Otherwise, I will ask all the questions myself — which, of course, I have — but, you know, I'm sure you guys and girls have some interesting cases. We do have a few questions already, but we will answer them at the end of this session. And coupling, you know, that process of sort of, you know, crafting the signals and training your model and deploying it, and the MLOps that you mentioned — when it comes to measurement, how do you measure? How do you make sure that, you know, what happens right now in production still makes sense, that you don't need to take any hectic action, like, you know, okay, pulling the model back or something like that? What's your sense on that front? And, like, maybe some measurements that you have deployed yourself and have been observing every single day and relying on? Again, it depends on, you know, kind of what domain you work in. But, you know, I mean, there's lots of literature on how to score and, you know, test your model. So things like precision and recall, where you're looking at what users are clicking on and whether they're finding the results; things like zero results. Or — often one of the things that I find helpful is what you would call surprising results, where documents are occurring fairly high up in the results, but they're not actually garnering the clicks that you would expect given that position. So for instance, you know, many people in search understand that there's a position bias that's just built into all of us as humans. We trust the machine.
And so we click on the first one. Well, if you consistently see that a document is appearing at, say, number one or number two in the results, but it's getting way fewer clicks than, say, the sixth or seventh document, that might be an indication to you that that document isn't particularly relevant, or for whatever reason users aren't liking it. So those kinds of more subtle metrics can also be informative. I think, you know, if you have an A/B experiment testing framework in place, obviously you can do all of your metrics around A/B testing — you know, start with just giving a certain amount of traffic to your new approach and then ramping up as it meets your metrics, whatever your targets are, if that's things like add-to-carts, et cetera. You can ramp up those types of tests as it proves out. There's obviously things you can do offline as well, especially if you have enough query logs. And if your index hasn't changed that much, but maybe just the approach you're taking has, then you can replay your logs, you can test out and, you know, effectively simulate what users might click on in those scenarios. And then of course there's the old-fashioned — just, you know, things like smell tests: do these results look better to me as an expert? You obviously have to be careful there. Or to a small cohort of experts — you know, like maybe your colleagues, etc. — who might spend some time scoring. So all of these things, I think, are techniques and measurements you can use to check whether results are, you know, good enough for them to go into production. I think Ron Kohavi — I forget the name of the book, but he has a really good book along with a co-author on online experimentation. It's probably these days the Bible of online experimentation. So I would encourage listeners to check that out.
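The "surprising results" signal described here — a document ranked high that underperforms the click-through rate you'd expect at that position — can be sketched as a comparison against a position prior. All the numbers below are illustrative; real priors would be estimated from your own logs:

```python
# Illustrative position-bias priors: expected CTR by rank (made-up numbers).
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05, 6: 0.04, 7: 0.03}

def surprising_results(impressions, clicks, threshold=0.5):
    """Flag (doc, position) pairs whose observed CTR falls far below the position prior.

    impressions/clicks: dicts keyed by (doc_id, position) with raw counts.
    """
    flagged = []
    for key, n in impressions.items():
        doc_id, pos = key
        if n == 0 or pos not in EXPECTED_CTR:
            continue
        observed = clicks.get(key, 0) / n
        if observed < threshold * EXPECTED_CTR[pos]:
            flagged.append((doc_id, pos, observed))
    return flagged

# doc_a sits at rank 1 but only gets a 3% CTR — suspiciously low for that slot.
impressions = {("doc_a", 1): 1000, ("doc_b", 6): 1000}
clicks = {("doc_a", 1): 30, ("doc_b", 6): 50}
print(surprising_results(impressions, clicks))
```

A flagged pair is a candidate for closer inspection: the document may be irrelevant, have a bad title/snippet, or be boosted by a stale rule.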
And then there are lots of metrics you can deploy that are pretty standard and well publicized; some quick googling should find those for people. Yeah, for sure. Of course, you could measure something like NDCG, which is offline, right? But you do need rated queries. And as a contributor to Quepid, which is an open-source query rating system, I'm curious to hear your opinion: on one hand, of course, you can always go and sanity check, smoke test, your ranker. But that's maybe just for engineers or product managers, a smaller group, versus when you go and try to understand the intent of queries at larger scale with this manual effort. Have you seen, have you deployed such methods within organizations? What do you feel about doing this in companies on a more regular basis? And as a shout-out to what you did in the Search with ML course: you did ask us to rate some queries and create a judgment list, to get a feel for the process. And I think that by itself is a great idea, because it pushes you towards further understanding what it is that you're building for. So yeah. Yeah, I mean, it makes a ton of sense, if you can afford it, to do offline evaluation using professional annotators. I don't know how good Mechanical Turk is these days, but something like Mechanical Turk, or — I forget what CrowdFlower is called now — or Appen, a company we've worked with in the past: there are these companies out there that will provide you with a large number of annotators who will run your queries and then rate them for you. And of course, you can use that as well.
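NDCG, the graded offline metric mentioned here, is straightforward to compute once you have rated queries. A minimal sketch, assuming relevance judgments are already mapped to integer grades per ranked result:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded judgments.

    Position 1 gets discount log2(2), position 2 gets log2(3), and so on,
    so relevant documents ranked lower contribute less.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k=10):
    """NDCG@k: DCG of the actual ranking divided by the DCG of the
    ideal (best possible) ordering of the same judgments."""
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A perfect ranking scores 1.0; swapping a highly relevant result below a less relevant one pushes the score below 1.0, which is why the rated judgment lists discussed here are a prerequisite.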
So again, it often comes down to whether you're monetizing your search results, and folks who do monetize their search results will typically pay for those kinds of things, especially once they reach really large scale — your Amazons and the like. Where and how much you can do that often comes down to budget and time, right? So if you have the budget, I've seen companies do that. I don't know about weekly — there might be some that do it weekly at really large scale, but that gets really expensive — quarterly, or whenever there's a major update to the system, those kinds of things. So by all means. I think often in this space, we love to say, oh, well, this is the way you do it. And the reality is, you want a hybrid approach to most of these things, right? Because there's no one perfect model, and there's no one perfect way of evaluating a model, right? And so you need to blend these and build up a broader sense of what actually works, right? Yeah, absolutely. It's, I guess, general awareness that these systems and approaches exist, so that when you feel stuck — when you can't generate ideas for where you can improve your search engine — you can go deeper and try to involve these techniques, I believe, to help you understand. And before we move further to some of the higher-level questions, I still wanted to ask you a slightly more detailed question: if somebody in the audience or among the listeners wants to try to build this kind of end-to-end search engine at home, what available datasets, tools, and algorithms exist today that will allow you to build this and train relevancy models and all these building blocks of the search engine? Yeah, I mean, it's interesting.
I think in many ways we live in a golden age of search engines, right? There are several just top-notch, open-source, freely available search engines on the market. There are a number of companies competing in this space, right? So picking an engine is almost a plethora of riches. It's a challenge to pick one because there are so many good choices, right? And you're often asking: what specific features or domains am I going to participate in? So, obviously: one, choose a good engine. And I think you really can't go wrong with any of the main ones — the Lucene-based ones: Solr, Elasticsearch, OpenSearch. I haven't played with Vespa myself, but I think that one's coming on strong as well. You see a lot of interesting capabilities coming out of that. And then you obviously have the companies behind them. Of course, I'm a co-founder of Lucidworks, so still a big shout-out and big fan there, because I think they're doing a lot of interesting things. But you also see a number of other players in that space, both with deep learning or neural-based approaches, as well as blended, hybrid, or traditional approaches. So, one: start with your engine. See what it's capable of. And then on the dataset front, it really kind of depends on what domain you're in. But I'm a big fan of starting with public datasets. TREC is a great place to get datasets across a large number of domains. You can also get queries. So whether you want to do web search or e-commerce or legal or enterprise or medical, you can go to TREC, get a dataset, start indexing it, and play around with it. These days it's also just super easy to go crawl. So get Scrapy or curl or wget or one of these crawlers and go crawl websites.
And then you can start going from there. The query log side tends to be a little bit harder, because companies don't like to release their queries. But there are several datasets that do have some form of queries with them. They may not be enough for you to fully test all the features of an engine. In our class, we use a really old dataset from Best Buy that has query logs in it — well, query click logs. But for instance, it doesn't tell you what was shown to the user; it just tells you what they clicked on. And so you can't actually build full models, or effective models, with that alone. But it's actually a really good e-commerce dataset, because it has all of the problems of a dataset that comes from a company. Namely, there's a lot of missing data in there. There's a lot of bad data. But there's also a lot of really good data. And so, starting with those, you kind of just start to push the engine through its paces. Start with the tutorials, the basic features, and then see where you can go deeper. Can you actually get best-in-class relevance measurement out of it? Can you get best-in-class speed and performance out of it? And then just work your way through the engine. And these days, you can typically do that in, say, less than a week. And that's really amazing, right? Especially when you combine that with all the great information out on the web. I think when I was getting started, you had to go and really dig in underneath the hood and figure out a lot of those pieces. It would take several weeks, if not a month or more, to really feel like you understood an engine and where it went. And I think these days it's just so much easier to do that, which is awesome. Yeah, absolutely. And I remember during the course we had to do it within a week, per project. So that was super exciting.
And I think this would not be the Vector Podcast if I didn't also ask your opinion on vector search. What's your feel for how it will augment the search engine experience, on the user side as well as on the development side? And connected to that: what do you think the search engineer profession is going to be like soon? I think it's already shaping up in many ways — the boundary between data scientist and search engineer is blending. Do you feel yourself like that? Do you think this is the direction we are going? Or do you think it's a fad that will wear off at some point? Yeah, well, it's not going to wear off. I mean, there's too much money and too much investment and too much better results. I will state upfront: I'm not an expert on these vector engines, right? It's kind of interesting. I went back and looked through some of my talks, and I think I gave a talk in 2013 on what the Lucene and Solr community needed to do next. And one of the things was: we need to add support for dense vectors. That was 2013. I think we just got dense vector support in Solr. Elastic maybe was there a little bit sooner, but roughly the same time frame. There are plugins, of course, that have been around, like the k-NN plugins, things like that. Hey folks, this stuff is here to stay. I mean, the really interesting thing is, you're starting to see these hybrid models where BM25 is still really good and really fast at that first-pass retrieval. It's kind of hard to beat in terms of the scale at which you can get a first-pass ranking, right? And then feeding those results into much deeper or more capable engines — I think that's been around for a while, and academia has proven that out. Clearly, using embeddings and vectors for things like query understanding and content understanding, and using tools like BERT, etc.,
for enriching your understanding of your content, and then making those searchable — that's all, I think, well and good. I think the really interesting question will be whether the vector engines can add all of the layers that the sparse approaches have — I don't know about perfected, but added — over the years: the faceting, the aggregations, the spell checking, the highlighting, all of those things that actually go into building a search application. If the vector engines deliver all of those things and deliver better results, that's probably a no-brainer, right? In the meantime, we have these hybrids, because I think nobody is delivering all of the capabilities. The other thing that's interesting with the dense vectors is that you can start to map multimodal data types all into the same engine — so images and text and audio, etc., right? And again, I'm not an expert on this, but that's my understanding. So then you can query across spaces, if you will. Again, I may not be using the right terminology here, but that to me is often — at least people talk about it like it's a holy grail. I'm not fully convinced people will actually search that way. I still think that remains to be seen, because there are a lot of implications for the user interface, and the user experience is how you interact with that. People have long talked about, oh, hey, I'm going to take a picture and then get back my search results, but every time I use those tools, I'm like, okay, that's nice, but it's still clunky from a user experience standpoint, right? So there's a lot of that work, above and beyond just the core engine, that has to be solved. But clearly, there's a lot of money and effort going into it. And so as a search engineer you can't ignore it; as a data scientist, you can't ignore it. And so you've got to get up to speed on how these are built.
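The hybrid pattern described here — a cheap BM25 first pass at scale, then a deeper reranker over the shortlist — can be sketched in a few lines. This is a toy illustration with hand-made token lists and embeddings standing in for a real index and a real encoder:

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Toy BM25 over pre-tokenized docs: the cheap first-pass retrieval."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = [0.0] * n
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            scores[i] += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_search(query_terms, query_vec, docs, doc_vecs, first_pass_k=100):
    """BM25 first pass over everything, then rerank only the top
    candidates by embedding similarity -- the deeper, slower stage."""
    bm25 = bm25_scores(query_terms, docs)
    shortlist = sorted(range(len(docs)),
                       key=lambda i: bm25[i], reverse=True)[:first_pass_k]
    return sorted(shortlist,
                  key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
```

The point is the shape of the pipeline: the expensive vector comparison only ever touches `first_pass_k` documents, which is why BM25 is hard to beat for the first stage at scale.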
I think all the major engines, open source and proprietary, have some form of blended models at this point. Again, if you're in a domain that you don't have enough data for, these may or may not work — although, again, one of the interesting things with these neural models is that you can often train a general model and then just use a few examples from your domain to essentially tailor that general model to your environment, right? One of my clients is doing this in the NLP space right now. We're using a general model around analyzing contracts, and then we're applying domain-specific things to it. And it's really interesting how effective it is with very few examples, right? That's an NLP problem, not a search problem, but I think you're going to just continue to see that trend grow and expand. So you've got to be on board with it. Yeah, absolutely. And you can of course find more conversation about this on the podcast. But I agree with you that the multimodality aspect of vector search is quite exciting. And where the data sits in images, for instance, that haven't been annotated yet, right? So many images and videos are uploaded every single day. If the model is able to transcend the domains so easily — like the CLIP model, for instance, built by OpenAI — it's not a perfect model; sometimes it fails, but sometimes it also amazes you: how could it figure out how to work so reliably on my data that it hasn't seen before? That's amazing. Well, and it goes back to your earlier question, which is: at the end of the day, folks, go evaluate it and see whether it works better for you. And like I said even earlier, they're all just vectors, and we're all just trying to calculate cosines between the user's query and the document vector. And so in some regards, we're just building a better vector, right? It's just a better vector.
It has more information encoded in it. And so if I can query that more effectively, then why wouldn't you use it? Yeah, exactly. And of course, there are other subtopics there — how to make it faster and so on — but I think eventually we'll get there. Hey, Google figured it out for 10% of the queries, so I guess the rest of the world will catch up. Before we continue to the questions from the audience, of which we have a few, I do love asking — and if you can keep it a little bit short, because we are short on time, but I'm still super interested to hear — about your motivation to stay in this space. You have tried so many things in your career, right? Looking at your LinkedIn profile, it's just experience after experience: fractional CTO and full-time CTO and engineer and so on, and book author. What motivates you to stay in this space today, and also to go into education and teaching? Yeah, I mean, it's funny. I think even when I was at Wikimedia and I quote-unquote left search, we still ran a very large search engine, and I always enjoyed my conversations with the search team at Wikimedia, just because it's such a high-traffic website, and search there, I think, does something like 6,000 queries per second or so. So in some ways — and this is reflecting back on my career — I think I fell in love with language, and the way humans use language and find information, back circa 1999 or so, when I started at a small company called TextWise, run by Liz Liddy, who is one of the pioneers in the natural language processing field, and it just happened to have a search project that I started working on, right? But to me, at the end of the day, this space — and this is why I went to Wikimedia —
so I'd say search isn't necessarily the through line, even though it often appears to be the through line in my career. The deeper through line, I think, is that I am fascinated by how we can leverage computers to help users make more informed, more capable, more aware decisions in their lives — whether that's purchasing online, or political, or governmental, or whatever it is. I am fascinated by how we can help people make more informed decisions, because I think that's the thing that lifts us up, right? And so education is an easy follow-on from that through line. The more people I can help use these tools — and also learn myself — the better off we'll all be, right? We have to use these tools to help us as humans get along better, be more informed, and so on, so forth. So that's probably the through line of the career: how do you help people find information and take action that makes us all better? Absolutely, this is very deep. Thanks so much. I love asking this question because I'm super motivated to stay in the space, but I also love to see the facets and the motivation of other professionals like yourself that I'm looking up to. I really enjoyed this conversation. Is there an announcement that you want to make in terms of the courses that you're going to be teaching soon? Yeah, that's great. I appreciate that, Dmitry, and I know we have some user questions, and I'm happy to stay on a little bit longer as well to get to those. Yes, we actually have two classes coming up. So one of the things we learned in the first run of Search with Machine Learning is that effectively we had one week of trying to get everybody onto the same page of how OpenSearch works and what the basics of search are.
And then we had three weeks of fairly intense machine learning in a search environment. And one of the things that happened in the class, because we didn't have a lot of prerequisites, is that we had a really wide array of students — folks who were deep experts like yourself, as well as people totally new to this arena. And what happened, I think, is that the first week for the new people was, hey, this is too much for me to get up to speed; and for the folks who had already done search, it was, hey, I already know how to do all of this. And so, trying to bridge that gap, I think we kind of ended up in this lukewarm area where nobody was quite satisfied. So one of the things we did was split the new material out into a two-week class called Search Fundamentals, which covers all of the basic intuitions of search, whether it's deep-learning-based or sparse-vector-based. And so we cover indexing, querying, faceting, spell checking, autocomplete — kind of all the building blocks of a search application. And then with the machine learning class, because we're dropping that beginner week, we have now added neural retrieval — dense retrieval — into it as well. The Search Fundamentals class starts next Monday, June 6th. You can still sign up. It's $200. There's a code, DGSearch 10. And then Search with Machine Learning is two weeks after that, and that's a four-week class. Both are project-intensive. Every week, you're going to do a project, you're going to write code, you're going to interact with students, you're going to hear lectures, and so on. In many ways, I think it's modeled after a university-style class, where every week you have homework, every week you have lectures, and so forth. So yeah, please sign up. Yeah, that's awesome. What I personally enjoyed during the course — the Search with ML four-week course — was the atmosphere.
The atmosphere basically created itself among the students — there were over 100 people there on Slack helping each other. That was just amazing. Somebody saved me a ton of time by just sharing a recipe that I followed to quickly get through some hurdle. And I learned — and of course I knew some stuff; yes, I'm an expert in this field, but you can also put your expertise to the test when you run so fast during the course — and the support that you guys provided was amazing. I've enjoyed this conversation so much. Now we are moving to the questions from the audience. And feel free to ask questions, please; we still have a few minutes. The first question comes from Avinash, who is currently testing the approach of using a bi-encoder to find the top-10 most similar sentences, and later passing those top-10 sentences to a cross-encoder model to find the most similar sentence among the top 10 using cosine similarity. I guess he's asking for advice: is this an appropriate method? This is where my expertise just isn't, so Avinash, I will apologize: I do not know enough here to give you advice. I would probably ask first: what is the actual problem you are trying to solve? If you're trying to find similar sentences, then from my basic-level understanding of what you're describing, it sounds like a reasonable approach. But there are people who know much more — probably Dmitry, you could answer this one better than I — but I have not played with or tried out those specific types of capabilities, so I don't have good advice there. I have worked in general on sentence similarity type problems. It is always challenging. In fact, at one of my fractional clients, we are doing sentence similarity, or clause similarity, types of problems.
And I think we are using similar modeling techniques, but I'm not doing the day-to-day modeling on that, so I'm really just trusting the data scientists. Yeah, I can add to this: I happened to give a community talk during the Search with ML course, and there I actually go explicitly into this bi-encoder and cross-encoder topic. The one thing is that a cross-encoder is much more computationally intensive, so you don't want to run it on a huge number of sentences — and it looks like that's what you're doing, only running it on the top 10. So that sounds sensible to me. I think I would pay more attention to testing your approach: make sure to reserve some part of your dataset to test it carefully. Yeah, this is the cool thing for me, coming back in from Wiki-land: I'm learning so much now too. I've been digging my way through a lot of these things, but as you can see, this is why it's the golden age — because there are so many approaches, and they're often improving the state of the art every week, right? Yeah, exactly. A lot of things are happening. Another question I'm taking now from the chat: Carlos is asking, I'd like to know Grant's opinion and insight about learning to boost. He also gives a link to a presentation at a Haystack conference. I don't know if you're familiar with this approach, Grant — can you say anything? I am not. I'd like to know — learning to boost, interesting. Another thing to go learn. Yeah, I think it's akin to learning to rank. I think it's related, but I actually don't know that much detail myself; that presentation was great, though. It looked like a new thing, but at the same time kind of familiar. Basically, instead of learning to rank, you learn the boost values, as far as I remember. It sounds interesting and reasonable. Again, at the end of the day, how do we shape these vectors? I know that's a generic wave of the hands, but I would take this and go try it.
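The two-stage setup Avinash described — bi-encoder retrieval over everything, then cross-encoder reranking of just the top 10 — looks roughly like this. The embeddings and the `cross_score` callable are placeholders for real models (in practice, precomputed sentence embeddings and a trained cross-encoder); this is only a sketch of the control flow:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def two_stage_rank(query_vec, sentence_vecs, cross_score, top_k=10):
    """Stage 1 (bi-encoder): rank all sentences by cheap cosine similarity
    against precomputed embeddings. Stage 2 (cross-encoder): apply the
    expensive scorer only to the top_k shortlist and re-sort by it."""
    shortlist = sorted(range(len(sentence_vecs)),
                       key=lambda i: cosine(query_vec, sentence_vecs[i]),
                       reverse=True)[:top_k]
    return sorted(shortlist, key=cross_score, reverse=True)
```

This mirrors the cost argument made in the answer: the cross-encoder sees `top_k` candidates instead of the whole corpus, which is what makes the approach tractable.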
I think with most of these machine learning systems you're trying to learn weights that then shape the way that vector gets calculated. If it works on your domain, and it's fast enough, and you can maintain it, then go for it. You don't need some expert's blessing on it. It certainly sounds interesting. LTR certainly has its own challenges in terms of tweaking and tuning — I know I've struggled with LTR a lot. I've struggled with hand-tuned boosts a lot as well, so anything that helps with that, I think, would be good. Yeah, awesome. The next question comes from Nico — hey, Nico, a former colleague from AlphaSense: if you're hosting an information search engine which should catch new topics, like COVID when it hit, how do you notice proactively that your boosting model or vector embedding model does not recognize queries related to these new topics? Yeah, that's where I think the instrumentation of your system comes in, right? And the human in the loop on that instrumentation, right? I mean, nobody talks about it, but even at the really large, successful search engines, there are still people who are reviewing where things are working and not working. And generally they're doing it at the experimentation level, but people still dig into queries. What queries are underperforming? What documents are underperforming? And there are a lot of good tools out there for anomaly detection as well. So recognizing when new queries are coming in is something that anomaly detection algorithms will help with, right? Looking at your top queries, your trending queries, and then again looking at those results — there are machine learning approaches to automatically identifying and alerting on those kinds of things, again along the anomaly detection line. But at the end of the day, you can always do that with people as well, right? And that's where humans maybe are still better at recognizing some of those things.
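The trending-query anomaly detection mentioned here can be as simple as a z-score over daily query counts. A minimal sketch, assuming query counts have already been aggregated per day; the threshold values are arbitrary assumptions:

```python
import statistics

def trending_queries(history, today, z_threshold=3.0, min_count=5):
    """Flag queries whose count today is an outlier vs. their daily history.

    `history` maps query -> list of past daily counts; `today` maps
    query -> today's count. Brand-new queries with no history but enough
    volume are flagged too -- the 'COVID suddenly appears' case.
    """
    flagged = []
    for query, count in today.items():
        past = history.get(query, [])
        if len(past) < 2:
            # Never (or barely) seen before: flag if it has real volume.
            if count >= min_count:
                flagged.append(query)
            continue
        mean = statistics.mean(past)
        stdev = statistics.pstdev(past)
        if stdev == 0:
            if count > mean and count >= min_count:
                flagged.append(query)
        elif (count - mean) / stdev >= z_threshold:
            flagged.append(query)
    return flagged
```

Flagged queries would then go to the human-in-the-loop review the answer describes, rather than triggering automatic action.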
Yeah, and I think you also alluded to this somewhat. I mean, this question to me is like a chicken-and-egg problem, right? If a new topic arises in the queries and also in the documents, but I haven't handled it before, then what can I do live? So I think you said: try to measure things — if some top-ranking documents are not clicked, then that's probably a signal that something is smoky there; go check it out. Another thing that I could recommend, maybe from my side, is that you could try to cluster your queries. And sometimes the funny thing is that queries are related in some way, right? So if it's a completely new cluster — and usually dense retrieval helps a lot there; models pre-trained on your domain, or maybe on some generic domain like news, might still pick these things up and put them in the same basket — then ask some human annotators to go and check, instead of checking the whole multi-million-entry log, which would be super complicated. And, you know, I agree. And the nice thing about these engines is that there is still the good old BM25 case, where at least basic-level keywords are going to match. And so if a new term like COVID comes in, and it's in the documents, you'll at least probably get an exact match. You may not handle the fuzzy matches all that well, but something's better than nothing. And then that allows you to start to iterate on it. Yeah, exactly. So the next question, from the Q&A panel, is from Chris: for the Search with ML course, which front-end frameworks are most students using for their projects? Front-end framework feels a little open-ended to me, but I can tell you: one of the things we're doing in both classes is we try to work end-to-end with a real dataset and a real search application. For better or for worse, we chose not to use notebooks.
Notebooks are great for a lot of things, but I don't know that they always show you how actual applications work. So we actually build out a really simple application. The front end is Tailwind CSS and a really simple Flask serving layer for the APIs. And then we use OpenSearch for the search engine, and things like fastText and a few other things for the ML side of it. We use the Learning to Rank plugin for OpenSearch — trying to think if there's anything else in our stack. It's primarily Python, but I think if you were a Java user, or a user of any of the other languages where there are clients for OpenSearch, you would do just fine in the class. You maybe just won't be able to use all of the Python capabilities that we have in the class. I hope that answers your question, Chris. The repositories — at least the base-level repositories — are all available under my GitHub. So you can just go to my GitHub, which is gsingers — I'll put that in the chat — and then you can see the frameworks we use. Yeah, awesome. And I can just add: you can pick these things up; if you know Python, it's probably easy for you, but even if you don't, you can pick this up. And the next question is from the chat, from Kwasi — I hope I pronounce your name correctly: as these days most of these sorts of approaches are based on transformers, for anyone who wants to try out an IR approach using transformers as a pet project, does Grant have any recommendations in terms of cloud services and tools? I don't have any specific recommendations. I know there are several players I've looked at. For instance, I saw somebody in one of the IR communities I'm in post about Qdrant, I think — QDRANT. I know there's Weaviate, I know there's Pinecone; Elastic, Solr, and OpenSearch all have dense vector retrieval capabilities. I've been playing around, just getting started, with Hugging Face.
I'm a little late to the Hugging Face game when it comes to these things. I know a lot of the people I talk to use Colab to build and run these systems. And so I think you can probably get started there. Again, Dmitry, you may have better tutorials — I know you've posted a bunch of stuff on Medium about how to get started in this, as have other people. So I would start there; I guess you won't go wrong with any one of those. And then for me, I always go back to this: I like to take a dataset that I'm familiar with first, rather than a technology that I'm unfamiliar with. Whenever I'm learning something new, I start with something I'm familiar with and then try to apply that thing to the new technology, as opposed to picking the technology first and then trying to go back and forth between the tutorials that they provide. I always like to go back to a domain I'm familiar with, because then I don't have to rebuild my intuition. So for instance, I've never really done image search, but I've done e-commerce search all the time. So it makes way more sense for me to try out transformers with e-commerce than with images, just because I don't have the core intuition for images the way I do for e-commerce. So I would probably start that way first. Yeah, I agree. And another thing — yes, thank you, Grant, you mentioned my Medium blog posts; there are a lot more people blogging on this, but I have a specific collection on Medium, 37 minutes by sheer reading time. You can go through the basics, like exact k-NN search, all the way up to neural retrieval, which is approximate nearest neighbor search — because you cannot do exact k-NN search at scale; it will just not scale. So you have to go and cut some corners, so to say — but actually in a more mathematical sense: you create algorithms that beautifully handle this complexity for you. So go check it out.
I think the next, and probably last, question — but not least — is from Ashish: is the Search with ML course right to step into if I'm looking to learn about semantic search and add that functionality to SQL or NoSQL databases? That's an interesting question. I guess I haven't thought about it in that sense. I think a lot of the techniques we use in the ML class relate to semantic search, and relate to how we can get better relevance out of the engine — semantic search being one of those types of capabilities. Semantic search is often a pretty loaded phrase, so depending on what you're trying to do there, your mileage may vary. But we certainly cover things like classifying your content, classifying your queries. We do learning to rank. We talk about synonym expansion, smarter queries, better filters — all of those kinds of things, I think, can be loosely grouped under semantic search. If you're talking more about graph-based inference — using things like Wikidata or DBpedia to infer relationships and do semantic search that way — we don't really cover those as much. We do base everything on OpenSearch, but I think the concepts apply in general. With the SQL and NoSQL databases — I know a lot of them have kind of baseline search functionality in them. And so you would be able to apply some of the principles, because a lot of what we do, you actually do either before indexing or before querying. So those would certainly apply, because at the end of the day, you're just using those things to generate a better query, or a better document to be stored in your engine. And so I don't see a reason why they wouldn't work in a NoSQL store or a SQL store. It's just then: how do you translate that into your query language, right? But we do use OpenSearch. All the examples are OpenSearch.
You would have to do the work to leap to that, whatever it is your engine is doing. Yeah, absolutely. And the good thing is that OpenSearch does have a k-NN plugin. +They call it the k-NN plugin, but it's actually approximate nearest neighbor search. And it's off-heap, for those who care. So it's not inside Java, but it still allows you to get a feel of how neural search will influence your results. +And you can also, you know, mix and match, sort of use more traditional BM25 with this. Awesome. This was the last question. Thanks so much to everyone who asked their questions live. And, you know, consider joining the course if you haven't yet. +And, Grant, thanks so much for this session and for answering the questions and sharing your wisdom. I've enjoyed this conversation very much. Thank you. Thanks so much for having me, Dmitry, and keep up the great work. I love the podcast. +And it's awesome to see a search-dedicated podcast out there. So congrats and good luck with that. Thank you so much. All right. Bye-bye. Bye, folks. Thanks, Dmitry. Thanks, Grant. All right. \ No newline at end of file diff --git a/transcripts/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md b/transcripts/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md new file mode 100644 index 0000000..d1f8800 --- /dev/null +++ b/transcripts/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md @@ -0,0 +1,170 @@ +--- +description: '

Show notes:

1. Pinecone 2.0: https://www.pinecone.io/learn/pinecon... + It is GA and free: https://www.pinecone.io/learn/v2-pric...

2. Get your + “Love Thy Nearest Neighbour” t-shirt :) shoot an email to greg@pinecone.io

3. + Billion-Scale Approximate Nearest Neighbour Search Challenge: https://big-ann-benchmarks.com/index.... +

4. ANNOY: https://github.com/spotify/annoy

5. FAISS: https://github.com/facebookresearch/f... +

6. HNSW: https://github.com/nmslib/hnswlib

7. “How Zero Results Are + Killing Ecommerce Conversions” https://lucidworks.com/post/how-zero-... +

8. Try out Pinecone vector DB: https://app.pinecone.io/

9. Twitter: https://twitter.com/Pinecone_io

10. + LinkedIn: https://www.linkedin.com/company/pine...

11. Greg’s Twitter: + https://twitter.com/grigoriy_kogan +

12. Dmitry''s Twitter: https://twitter.com/DmitryKan

Watch on YouTube: https://www.youtube.com/watch?v=jT3i7NLwJ8w

' +image_url: https://media.rss.com/vector-podcast/20211206_061204_ed150262b3f862f73666d3cce317fc98.jpg +pub_date: Mon, 06 Dec 2021 18:00:04 GMT +title: Greg Kogan - Pinecone - Vector Podcast with Dmitry Kan +url: https://rss.com/podcasts/vector-podcast/334671 +--- + +Hello everyone, so, Dr. Podcast here. Today I have Greg Coggen with the charter of marketing. He works for Pinecon. So today we will dive into Pinecon and maybe Greg will give us some highlights as well. Hi Greg. It's me, Tree. Thanks for having me. Yeah, awesome. Thanks for joining. +So I was thinking maybe you can introduce yourself to our audience because actually I personally was quite impressed that you're so technical and even though you're in charge of marketing, you're like your lingo is so technical. +So technical, so can you do have some technical background? Yeah, I actually have a degree in naval architecture. It's an engineering degree and that was my career for three years. And I did systems engineering and mechanical engineering electrical and so on. +While I was doing that, I also was moonlighting as a web developer and taught myself PHP and things like that and reading about startups and eventually became clear that I should make my day jobs related to startups. +And so I left my engineering career and went to work with startups with marketing and I fell in love with it. That was about nine years ago. And I've been working with for the past eight years. I was consulting and advising technical startups on marketing. +And I loved it because I was able to use my engineering thinking and get along well with technical founders and the like the coding foundation I had allowed me to get a grasp for what it is the products do. 
+ And last year I joined Pinecone as the VP of marketing, and the engineering background certainly helps: we have a technical product, technical users, and really everyone at the company has a very technical background. Even our director of product has a PhD in electrical engineering, just to give you a sense. +And I was like, wow, that's impressive. Yeah, like you mentioned PHP, actually this was one of the first languages I learned to code in, beside Perl, but yeah, these days... +I almost slow down before I tell people I learned PHP, because I know there's a bit of stigma with it. It was like messy and, you know, not as pristine or, +yeah, as fancy as something else, but it got the job done; with that foundation, a lot of other things made a lot more sense. Yeah, absolutely. Yeah, I mean, I also enjoyed it; actually, one of the first jobs I got was in PHP. +So I built like a forum, and every class in the code was starting with "oops", and I was asking the new engineer, doesn't it mean OOP, like object-oriented programming? And he said, no, it just means "oops, I'm not technical". So he wasn't technical enough to know what OOP is. +But anyway, that was kind of funny. Yeah, that's cool. So basically, like, you have the technical background. You also know how to explain things. +I think it's very important in our profession at large, and it sounds like you've been advancing into this topic more and more, to the level of becoming, you know, a VP of marketing, actually, to be precise, right? Yeah, that's awesome. +So tell me a bit more about Pinecone, like what are you guys building? And yeah, I know that you've recently had a major upgrade of Pinecone. Maybe, if you wish, you could highlight some of the improvements you guys made. Sure. +So we're building a vector database that makes it very easy to build and deploy vector search into production applications.
This is especially useful for semantic search and recommendation systems. +There are, we saw, I should say the founders saw, that there are several ways of doing this, to try and emulate the big companies like Facebook, Google, Microsoft and Spotify. + They all involved a lot of engineering work and a lot of infrastructure work and maintenance to actually make it run in production, whether you're a small startup and have better things to focus on, or a big tech company and also have better things to focus on, especially when supporting your search and recommender systems would involve like a big team of engineers. +So we recently announced Pinecone 2.0, and that's a major release that gets us closer to helping companies deploy this in production. +So one of the biggest things we've heard from users is that to get this in production, they need to emulate some of the traditional search features they had before, in the systems they're trying to replace. And that was specifically filtering. +They wanted to have some control over the nearest neighbor search results that they were getting through Pinecone. +Another thing was cost, since typically vector searches, nearest neighbor searches, are done in memory; companies with millions and billions of items, which are the types of companies that benefit most from Pinecone, +were finding it prohibitively expensive to do vector search, not just on Pinecone, but anywhere. And so for them, the barrier to getting into production wasn't lack of engineering teams. It was just astronomical cost projections. +And so for that, we are releasing hybrid storage, which basically stores some data on disk and a smaller amount of data in memory, which reduces infrastructure costs up to 10x. And we're passing that along to users. +So it's still managed infrastructure, but their costs are going to go down as well. And there's some other things like SOC 2 compliance.
There's a totally new REST API and Python client. And console. And a bunch of other things as well. +So, yeah, and there's even more I can't announce just yet, but our engineering team is growing and our development velocity is picking up as well. So we're going to have lots of new things to share very soon. Yeah, that's fantastic. Can't wait. And congrats on the 2.0 release. +But I just noticed your t-shirt says "Love Thy Nearest Neighbor". Wow. This is so relevant to this discussion. We have lots more of these. Anyone can send me an email at greg@pinecone.io and I'll get you a form to fill out to get one. Oh, thanks. Thanks, Greg. I'll gladly wear it. +So yeah, I mean, that covers the value prop behind your product. So the key element for me is also that, as you said, you're reducing cost, and, you know, you provide a fully managed, you know, solution for vector search. +So teams don't have to kind of run around figuring out some low-level things, and can just get to business. That's great. So the next thing I wanted to ask you is more along the lines of how: you know, there are different ways of implementing vector search, right, and there are different algorithms. +There is ANN-Benchmarks, and there will be the Big ANN Benchmarks soon as well; that competition is ongoing, and for listeners I'll share the link as well. But what ways did you kind of consider to implement your tech? I know some parts of it are proprietary. +So maybe you cannot share too much detail, but maybe you can share some things, give us a clue how you do things on kind of the algorithmic side, and also kind of speak to the product at large. Like, you know, you mentioned SOC 2 compliance. +So it was very important for your customers, right? So that also is kind of included in the "how" part.
+Yeah, I'll be a little lighter on the technical side, because I would rather point you to our docs, and point people to our docs and some of the articles and examples we have, than say something that's imprecise from a technical standpoint. +Generally, we see three layers inside a vector search solution or vector database. The lowest layer is the nearest neighbor search algorithm, like Annoy or HNSW. Then there's an index library, which contains those algorithms, and that's like Faiss. +And then there's a shell around that, which we're calling a vector database, that provides things like live index updates and CRUD operations on vectors and filtering and metadata storage and things like that. So for the index, Pinecone does use Faiss for exact search; +you can choose what sort of engine you're running, and a proprietary index for approximate search, which is obviously the bulk of use cases for us. +And we thought a lot about performance comparisons, maybe even open sourcing that proprietary index so we could be included in ANN-Benchmarks. +While we were thinking about that, we learned from users that actually, like, eking out slightly lower latencies or slightly higher recall from the index was not really what they were after. That's not where they were stuck. +They were stuck on downstream things like horizontal scaling and adding features to an index, setting up the infrastructure and managing it. And so since learning and validating that, we focus much more on those things, and stayed with a proprietary index for approximate search. +And sure enough, we find that even people who ask a lot about this, after they sign up and start using it, you know, it solves their use case and they don't ask us about it again. A lot of people come to us after that from some other search or recommendation system to vector search, +and they're just looking for an easy way to run it in production.
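The "shell" layer described above (live upserts, CRUD on vectors, metadata filtering around a nearest-neighbor core) can be sketched as a toy in-memory stand-in. Everything here is illustrative, not Pinecone's actual implementation: a real engine replaces the linear scan with an ANN index, and the filter is applied in the same pass as the scan (single-stage filtering).

```python
import math

class ToyVectorDB:
    """Toy stand-in for the 'database shell' layer: CRUD plus metadata
    filtering around a nearest-neighbor scan. Illustrative only."""

    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        self._items[item_id] = (vector, metadata or {})

    def delete(self, item_id):
        self._items.pop(item_id, None)

    def query(self, vector, k=3, filter=None):
        # Single-stage filtering: metadata is checked during the scan itself,
        # not as a post-filter on an already-truncated result list.
        def ok(meta):
            return all(meta.get(f) == v for f, v in (filter or {}).items())
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        hits = [(dist(vector, v), i) for i, (v, m) in self._items.items() if ok(m)]
        return [i for _, i in sorted(hits)[:k]]
```

The design point is the order of operations: post-filtering after an ANN lookup can return fewer than k results, which is why filtering inside the index is the harder, more useful feature.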
So that's one use case: just implement vector search in production. + Or a lot of people come to us from the application side, which is, they don't even know they want to use vector search, but they know they want to replace their keyword search with semantic search, or they want to implement image similarity search that will work on fuzzy matches, or they want to do anomaly detection, +or classification and things like that. It really has as many applications as search and information retrieval in general. A lot of people come to us for, excuse me, for semantic search. So they have their embedding models, like BERT or something like that. +And they got it working in the lab; the data science team got semantic search working using embeddings. Now they're like, okay, engineering team or ML engineering team: +how do we get this in our product? How do we keep latency below 200 milliseconds? How do we add filtering to this to give users control? And the ML engineering team then goes out and finds, like, oh, we can do this with something like Pinecone. +So those are the typical use cases, I would say: semantic search the most common, or somebody just coming because they're looking for vector search, regardless of what it's for. Yeah, yeah. From our perspective, for Pinecone, +we don't care what your data is; if it's in an embedding format, you can index it and then you search through it. It works with any model, any, you know, initial data, and because we have a REST API, you can call it from anywhere. +So you can use it in a notebook, you can use it in the backend application. Yeah, we're seeing a lot of interesting use cases. Yeah, sounds great. Sounds like a lot, actually, of use cases that you mentioned. +I mean, obviously it's search, but then the entry to it could also be, like, data science experiments that they want to run.
+If you take Faiss, for instance, you know, many data science teams run large-scale experiments using the library, but obviously that's the data science part, that's the exploration part. +But the moment you want to put this out to prod, you'll face a bunch of kind of low-level engineering concerns, like, oh, how do I do this? How do I do that? Reinventing the wheel isn't ever fun. +Well, sometimes it's fun if it's kind of hobby work, but if you don't have time and you want to move faster, then obviously you will want to use an existing solution for that. Yeah, we find that, you know, for the data science team, it's not their issue. +They need to develop the model and prove that the method works. It becomes an engineering team's issue, or the ML engineering team's issue. And yeah, they're often not exactly lacking things to do. +So, some organizations are all about focusing on the core product and trying to use managed services wherever possible. Others like to develop things in-house and prefer to take open source as much as possible. +So I think it depends on, you know, how you prioritize your focus and, you know, what's the engineering culture at the company. Yeah, absolutely. +And sounds like you also address the elements of, like, SOC 2, and I believe you also will have GDPR covered at some point, or already covered. +So we say we're GDPR friendly, which means: there's no official certification you can get for GDPR, you can just be following the regulations and able to make the proper disclosures and able to act on requests for deletion and things like that. +These are the types of things, like the security aspects, that a data science team might not be forced to think about when they're developing a vector search solution for some application. But when it goes to engineering, when you start talking about getting it into production...
+And depending on the company, yeah, all these things come up. +Does it pass our security review? Does it pass our reliability requirements? Who's going to be on call if this thing goes down? All these things come up, and we worry about those things so that the users don't have to. +Yeah, that's a big benefit, like, to the users; again, to focus on what matters to them. And by the way, I don't want this to come off as just promotional. +Anyone listening to this can treat this as a heads-up about what you should think about if you want to get vector search into production, even if you're using some other solution; like, these are things you should plan for, and start thinking about, and leave time to do. +Yeah, absolutely. You don't want to be caught by surprise on those items, for sure. Yeah, that's awesome. By the way, I remember that you guys also made a bunch of blog posts on, like, Faiss and LSH. I mean, I really like the way you did it. +You know, it almost looks like a comic book, you know, the way you get deep with these things. And I think you also shared the source code, like some notebooks. Is that right? Yeah, we publish articles on vector search, on Faiss, on +semantic search, different techniques and things like that. A lot of them are done by the very talented James Briggs, I should give him a shout-out. We have a new one today about composite indexes and the index factory in Faiss. +We share code snippets and we have example notebooks for all of them. And yeah, we're very happy to see people like them. Even people who are not familiar with vector search will see it and it piques their interest, because engineers like to see how things work and learn new things. +And that's our goal. Some of them have almost nothing to do with Pinecone; we just want more people to learn about vector search and
realize that they can use vector search to improve their current applications. And if we succeed in that, I think it'll certainly help us, but really everyone in this space. Yeah, I mean, absolutely. Those articles look like your jam, you know; people are reading, citing and kind of discussing them on Slack and things like that. +And yeah, it sounds like you guys are also willing to share your knowledge with the community, even beyond, you know, customer interaction and so on. Right. So that's awesome. Yeah. I think we are moving slowly to the third section of our podcast, which is "why". +And I know it's a little bit more philosophical: the stance on what you do and kind of how you do it. I don't know if you've been reflecting on your journey. I know you said you joined Pinecone last year. +But I guess I'll start off by just asking you what motivates you to be part of vector search development and this community. Me personally, I've worked with over 40 startups when I was consulting. And when I met, you know, the founder of Pinecone +and learned about the product and about the space, I saw a familiar pattern, which caught my attention. And the familiar pattern was from 2015, six years ago now, almost seven, when I started working with the, at the time, very small company called Domino Data Lab, which +is an MLOps platform; at the time, we called that a data science platform. It's used by over 20% of the Fortune 500 companies. At the time it was a small team, and it was a product for data scientists, but nobody knew exactly what a data scientist is. +Few people called themselves that, even if they were doing data science work. A lot of data science work was done on just people's laptops. And it was a very young area, let's say. Not quite mature. There was no tooling for it and so on.
+And over time, over a few years, data science became, of course, a core function in many companies, just like engineering and marketing and customer support. And as that happened, having the right tooling for that function, +and kind of maturing the capabilities, and making sure everything data scientists run can run in production securely and reliably and things like that, became more important. Of course, the companies that were solving those things were growing with that demand. +And so I wanted to be a part of that kind of journey again. And again, in Pinecone I saw a product that is pretty early in the space. The vector database concept was new, and we had to spend a lot of time explaining to people what it means; they weren't getting it. +On the user side, you see many, many engineers doing ML engineering work who don't yet call themselves ML engineers. They're still titled software engineers. Or they might be data scientists, but they're now working on, like, production applications. +And also we see that companies are struggling: as they want to take vector search out of the lab and into production applications, they're running up against the same challenges; the technology they had available wasn't quite built for that, +for huge scale and for being secure and reliable. And so that's the environment. And yeah, it's exciting to be in an emerging category like that, and solve a real need, and watch the need grow. Yeah. That's my personal, you know, that's what motivates me. And that's why +I'm excited to be here. If you want to go even on a more philosophical level, like, it's really rewarding to me to help grow the kinds of technologies that are powering our software infrastructure, which everything in this world runs on today.
+And, you know, the things that are in the back, that are kind of behind the scenes and under the hood, that, you know, most consumers and most people don't know about: that their Facebook feed is powered by similarity search, or that their Google search is powered by similarity search. But +even without them knowing, it affects them tremendously. And, you know, I think that's really cool. Yeah, yeah. That's so deep, +I mean, your connection to it. And in general, it sounds like you are excited to be at the bleeding edge of the stack, right? Kind of building the next thing. And it's always exciting. Of course, it's also in many ways, kind of, I don't want to use the word dangerous, +I want to use the word kind of intense and, you know, bold type of thing, right? Yeah, yeah. For sure, we don't know how the future will play out. We have our hopes and we're making our bets. And it's exciting to try it. And that motivates us. +Yeah, we're not looking for safety here. Yeah, yeah, absolutely. But on that front, on the future, a little bit touching on the future of this market: even though it's emerging, you know, and it's still unfolding in many ways, there are so many players already. +But I'm just thinking, what do you think, kind of, what strategic items are missing on the market right now? You know, when you think about it, not the data science part; I think that data science is developed quite well. We have a lot of competing, you know, algorithms and frameworks. +But more like on the business side, right? And maybe that's in line with how users understand the systems. Maybe they don't understand enough. Like, what items are missing, and maybe you're working on that. +Maybe you're willing to share, maybe not, but maybe something along those lines that we can discuss. Yeah, I think you actually made the right point, which is:
For a certain audience, +there's still more to be done for a very technical audience that's familiar with vector search, but they have a lot of tools in front of them, and right now, whatever extra features they needed, they've hopefully figured out. +It's everyone else who doesn't yet understand this and doesn't quite see how it applies to their applications, and for whom it's not clear, you know, how to choose an algorithm, how to tune it. That's, I think, where the future is: educating +those people and those companies and then bringing this capability to them. And that means just helping them understand what it is. But it also means making the products more accessible to them, like taking care of some of the technical details, so that they can just focus on +the business side of things in their application. And there are really many companies out there that can use vector search but just haven't heard of it, don't realize it. And I think the future is in reaching those people. I think even looking beyond vector search, at +vector embeddings in general: more and more companies adopt ML and NLP and continue hiring data scientists, +and now machine learning engineers, which, by the way, are growing at a faster pace than data scientists. The number of people with machine learning engineer titles is growing fast, whereas data scientists grew by, +I don't remember exactly, but single digits. So, obviously, not all those ML engineers are working on vector search, but they will have more and more vector embedding data that they're trying to wrangle and want to maintain. And so, +in the past five years, we saw
the introduction of data warehouses and data lakes, and really, companies realizing that they need to centralize their data for their data science teams and analysts and so on. And so we believe companies will need the same thing for vector embeddings, +so that they have one database for all their vectorized data that can feed models, that can feed applications, feed training and analysis and so on. So, yeah, that might be a few years out, and we'll see. +Those are the kinds of things we're thinking about often, yeah, beyond vector search: how do we help people get more use out of their vector data? Yeah, I mean, so I guess it goes along the lines also of producing docs and, you know, documentation +and source code explaining things, right? How can I, you know, keep things running and get stuff done? Because I don't want to focus on, like, you know... actually, what I really want is to achieve my goal, +let's say, create a music search service or something like that. Yeah, exactly. Yeah, that's a trap a lot of people and companies fall into. They love the technology. They're very proud of building something unique, +as you should be, but you have to remember that the people you're serving are just trying to solve some business problem. +Some of them, the early adopters, might be very curious about how it works under the hood, and they might want to have the ability to pull some levers and turn some knobs. +But the vast majority of people just want to implement machine learning into their applications to create a smarter search function, to increase user engagement, things like that. Yeah, yeah. I think there was also one recent article, +I will also link it in the notes, explaining that you can apply vector search to solve the zero-hit problem in e-commerce. And that's how you can save, well, actually earn money, right?
So save the user experience in that sense. Yeah, so it sounds like more and more use cases are coming up, +and you guys are at the forefront of actually hearing what the use cases are, right? And hopefully you'll be sharing some of those with the audience at large, and we'll learn from you guys. Yeah, we're still constantly surprised by what people want to do with vector search. +And yeah, we want to make the product available to as many people as possible to see what they come up with. I will say, though, we're also surprised, in a way, by how many people want to just do vector search on text data, which seems like such a simple thing to us, maybe. +But it gets back to this point that not everyone is as far along as people in the vector search community. So we've got to bring more people with us and help them see that once they're done with a semantic search use case, there's actually a lot more they can do with it later on. Yeah. +Yeah, I think it's something that probably needs a bit of discovery for everyone, but also sort of, like, blogging more about that and sharing more about that. +It's not only text; it's actually everything that is encodable as a vector, right? And it could be dense, it could be sparse, it could be whatever you have there, as long as it's a vector. +Then you can send it in, index and search, and then you need tools to choose the metric function, right? We didn't talk about it, but I know you guys support, like, three major distances: Euclidean, dot product and cosine. +So, I mean, this is more or less the standard across many data science applications, but I'm sure there is somebody somewhere sitting in a garage inventing a new metric, and probably you will want to provide a plug-in architecture for that case as well, right? +Yeah, well, we have our own people in these figurative garages working on stuff as well.
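The three "major distances" mentioned here, written out in plain Python. Which one is appropriate depends on how the embedding model was trained; cosine similarity on normalized semantic embeddings is a common default.

```python
import math

# Euclidean distance: straight-line distance between two points (lower = closer).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Dot product: unnormalized similarity (higher = closer for this and cosine).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Cosine similarity: dot product of the vectors scaled by their lengths,
# so only the angle between them matters, not their magnitudes.
def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

For unit-length vectors, cosine similarity and dot product rank neighbors identically, and Euclidean distance produces the same ordering reversed, which is why engines can often convert between metrics internally.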
+So, but also to go back to the previous thing: the vector database that surrounds the engine as well, which might just look like more traditional database features, simply applied to vector search, rather than some breakthrough algorithms or things like that. +Although, yeah, you know, the filtering that we introduced with Pinecone 2.0, doing single-stage filtering on a vector index, let's just say that it took a lot of late nights in the garage. Yeah, yeah, sounds exciting. +It sounds like something your customers will benefit from, right? Almost immediately. Yeah, that's fantastic. +Yeah, I was thinking, like, do you want to add anything more on Pinecone? Like, for instance, if somebody wants to try it out today, what does the process look like, or should they just shoot you an email? Yeah, well, if they want to shoot me an email, they're welcome to do that. +If they want a t-shirt, send me an email, greg@pinecone.io. But actually, we want to make it very easy for people to start and experiment with it. And so you can go to pinecone.io/start and create a free account. And for small workloads, it's actually free to use. +You get one pod, which is definitely enough for experimenting. And if you have a small workload, it's enough for your production use case. That's the easiest and fastest way to sign up. You don't have to talk to anyone. +If you need custom deployment configurations, like certain availability zones or anything else, you can send me an email or you can use the contact form on our site and we'll get you set up. And it's almost as quick; we just have to set up some configurations. +But we want to help you get to production, and that means not standing in your way. So that's the best way: go to pinecone.io/start. Awesome. And we'll make sure to link that in the notes as well. And you said pod; what do you mean, Kubernetes, right? A Kubernetes pod? Yes. Oh, yeah.
+I mean, we didn't touch on this in the episode, but obviously you guys are scaling with Kubernetes. So you're also modern on that side as well. Oh, yeah. +You know, I should have mentioned this when you asked about the inner workings, but yeah, we're using Kubernetes to make the whole service horizontally scalable. And of course, it's totally managed on our side, +so you don't have to know anything about containers or Kubernetes, or worry about any of it. There's Kafka for streaming, to support streaming index updates or batch updates. There are load balancers, there are API gateways, there are just a bunch of different things. +There's a key-value store under the hood. If you want to see the architecture, you can check out our docs and learn a bit more about it. But again, you don't have to know anything about it. And that's the point. We made it so +you just make your API calls and get your results and do something with those results. Yeah. Exactly. Fantastic. +And by the way, are you planning at some point to maybe open source, or actually implement, some things for the public to send the data in? Or do you think it's not a problem at all? Some kind of connector, some kind of gluing code to Pinecone on the integration side. +Because I guess obviously clients will still look at how they plug Pinecone into the right part of their architecture. Yeah, we're thinking a lot about that. We're looking at what are the most common data sources, +what does typical usage look like, and what's the trickiest part for people. And we are thinking about how to make the trickiest parts easiest for as many people as possible. So I can't say much more than that, but certainly we'll have some common use cases covered soon. Yeah, sure. +I mean, that's so important. I mean, a lot of things in machine learning, you know, it's like 80% goes to data collection and cleaning, +and then in the end, you plug in some model and, ooh, I solved the task, right?
And the same kind of goes for trying databases or, you know, software: okay, how do I plug this in? And days go by and you are still figuring things out. So I think that's something to address. +And I guess you guys are doing that, right? Yeah, yeah, we definitely are. And also, you know, we expect people to keep their data warehouse and their document store, because we are your vector database, we're not your blob storage or document storage. +So a lot of people use Pinecone alongside their data warehouse or some other database, and the easier and more seamless the connections are between the two, the easier it is to get vector search into production. So that's what we're thinking about. Yeah, sounds great. Thanks. +And yeah, I think we can wrap up. I really enjoyed talking to you, Greg. I mean, your shirt is the best. Once I get it, I will wear it as well, and I'll be compatible with vector search. So thanks so much for your thoughts. I mean, this was super deep. +And also you shared some of your personal, you know, attitude and aspirations in this area. It's still emerging, but it's great to see you guys at the forefront of it. And I hope to hear more. +And just the last question: where can our listeners follow you, maybe on Twitter or LinkedIn, where are you publicly available? So for Pinecone, we publish a lot of things on our website. You can go to pinecone.io and at the bottom you can subscribe for email updates. +And you'll get, you know, all these articles and things that you heard about in your inbox. On Twitter, we're at pinecone underscore io. On LinkedIn, we also have a big following there. Me personally, I'm at Gregory underscore Kogan; Gregory is g-r-i-g-o-r-i-y, underscore, K-o-g-a-n. +But a lot of the things I post there are Pinecone-related, because that's what I think about a lot these days.
And I'll also add big credit to you for leading the way with doing a podcast like this. +Yeah, it's exciting to see more people learn about vector search and start thinking about it and implementing it. And a lot of it is thanks to evangelists like you who put in the work to do that. Thanks. Glad to hear that. Thanks so much. +And actually, I must say that I'm educating myself equally on this journey. So hopefully, as part of this journey, the listeners and the readers can educate themselves as well. In the end, you know, value increases by doing these things. That's what drives me here. +So thanks so much for joining the show. Yeah, I hope we can record another one at some time down the road. Yeah, that would be awesome. Awesome. Thanks, Greg. Bye bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md b/transcripts/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md new file mode 100644 index 0000000..ccb0c5b --- /dev/null +++ b/transcripts/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md @@ -0,0 +1,102 @@ +--- +description: '

Topics:

00:00 Introduction

01:21 Jo Kristian’s background + in Search / Recommendations since 2001 in Fast Search & Transfer (FAST)

03:16 + Nice words about Trondheim

04:37 Role of NTNU in supplying search talent and + having roots in FAST

05:33 History of Vespa from keyword search

09:00 + Architecture of Vespa and programming language choice: C++ (content layer), Java + (HTTP requests and search plugins) and Python (pyvespa)

13:45 How Python API + enables evaluation of the latest ML models with Vespa and ONNX support

17:04 + Tensor data structure in Vespa and its use cases

22:23 Multi-stage ranking + pipeline use cases with Vespa

24:37 Optimizing your ranker for top 1. Bonus: + cool search course mentioned!

30:18 Fascination of Query Understanding, ways + to implement and its role in search UX

33:34 You need to have investment to + get great results in search

35:30 Game-changing vector search in Vespa and + impact of MS Marco Passage Ranking

38:44 User aspect of vector search algorithms

43:19 + Approximate vs exact nearest neighbor search tradeoffs

47:58 Misconceptions + in neural search

52:06 Ranking competitions, idea generation and BERT bi-encoder + dream

56:19 Helping wider community through improving search over CORD-19 + dataset

58:13 Multimodal search is where vector search shines

1:01:14 + Power of building fully-fledged demos

1:04:47 How to combine vector search + with sparse search: Reciprocal Rank Fusion

1:10:37 The philosophical WHY question: + Jo Kristian’s drive in the search field

1:21:43 Announcement on the coming + features from Vespa

- Jo Kristian’s Twitter: https://twitter.com/jobergum

- + Dmitry’s Twitter: https://twitter.com/DmitryKan

For + the Show Notes check: https://www.youtube.com/watch?v=UxEdoXtA9oM

' +image_url: https://media.rss.com/vector-podcast/20220412_120408_e18078d3137041275301d6bf045caa0e.jpg +pub_date: Tue, 12 Apr 2022 12:29:08 GMT +title: Jo Bergum - Distinguished Engineer, Yahoo! Vespa - Journey of Vespa from Sparse + into Neural Search +url: https://rss.com/podcasts/vector-podcast/452635 +--- + +Everyone, Vector Podcast is here. I hope you have been waiting for another episode. And today I have a rock star with me: Jo Kristian Bergum, a distinguished engineer with Yahoo. And he has been super vocal in the field of vector search. +And he has also been advocating for one of the famous vector search engines, actually more of a platform; surely Jo can talk more about it. It's called Vespa. Hey Jo, how are you doing? Hey Dmitry, I'm good, thanks. How are you? I'm great. Thank you very much for taking time to talk to me. +It's fantastic being here on your show. It's become so popular. Thank you for that introduction. I'm not sure if I'm a rock star. It's really interesting to be here. I really look forward to our conversation on vector search, and maybe we'll touch on language models as well. +And I'll talk a little bit about Vespa and the technology in Vespa. I'm really excited. Yeah, I'm looking forward to that. And I mean, you are a rock star. I can see you everywhere: on Twitter and LinkedIn and blogging, and what else. So it has been like this. +And I'm really glad to talk to you here today. So, as a tradition, could you please introduce yourself however you want, in the detail you want, and we'll take it from there? Yeah. So my name is Jo Kristian and I work for Yahoo. I've been working for Yahoo since 2007. +My current role in Yahoo is distinguished engineer, and I work on the Vespa platform. I've been working on search and recommendations since about 2001, when I joined a company here as an intern during my studies, a company called Fast Search and Transfer, a Norwegian company.
+Back then they were doing web search with this web search engine called alltheweb.com. They started around '98, I think, trying to compete with Google and so on. And then Yahoo came along and bought the web search division, the team here in Trondheim. They also bought AltaVista and so on. +That was back in 2003, and in 2004 Vespa was born. I actually worked at Fast in the enterprise search division for some time, three years, and then I joined Yahoo in 2007. Since then I've been here working on search and Vespa at Yahoo. So that's my background. +I also hold a master's degree in computer science from the Norwegian University of Science and Technology (NTNU) here in Trondheim. Oh yeah, that's great. Actually, by the way, I did visit Trondheim. Was it 2007? For an interview with one search company, not Fast. But yeah, it was a great visit. I mean, I love the city. +It's an amazing place. Yeah, it's an amazing place. And it's funny what you said about search and Trondheim, because it really has a special place, maybe special even in Europe, because at one time we had Google, Bing and Yahoo all here in Trondheim. So that was a fantastic time. +Google shut down their office here in Trondheim, but now Microsoft is here in Trondheim, and Yahoo also has an office here in Trondheim. So there's a lot of search technology competence here in Trondheim. +This is amazing, actually, for a relatively small city, but I think Trondheim used to be the capital of Norway at some point back in history. Yeah, at some point, way back in the Viking days. Exactly. So now all these Vikings have stopped going around with boats and harassing people. +Now we develop search technology instead. Yeah, such a move. Wow. And I also understood that in Trondheim, as you said, there is the university. Is it actually one of the talent suppliers for this industry, or for engineering in general? Yeah, it is. +We have the largest technical university in Norway, NTNU, here in Trondheim.
So there's an old kind of history there, and it's definitely one of the reasons why the search companies evolved here. The Fast Search and Transfer company was founded by people coming out of the university here. +A couple of the founders came out of NTNU, and they actually started with FTP Search back in, like, '97. That developed into this web search engine, and then eventually this became Vespa at Yahoo. Oh yeah, yeah, sounds great. +So I can maybe touch on the background, since I've mentioned web search now, and maybe not everybody has heard about Vespa. We actually started developing Vespa in 2004. So Yahoo said, you know, we brought you into the company. +We want you to build a vertical search platform that we can use across our properties in Yahoo. For example, Yahoo Finance and Yahoo News need to have some kind of search engine. +So they gave that task to the team here in Trondheim, and they started building Vespa, the Vespa platform, using the roots and the technology from web search and putting that into a package that the verticals could install and use. +Over time, basically starting with basic BM25-like search, keyword search, Vespa gradually added more features: real-time indexing, tensors, aggregations, grouping and facets as well. +So it really developed over time as new requirements came in. When we started, Vespa was about search, but around 2007, 2008, Vespa also started to be used more as a recommendation engine, so serving of recommendations. So when you go to finance.yahoo.com +and there's a set of articles recommended to you, the serving engine doing that is Vespa. And then in 2017, Yahoo decided that we're going to open-source Vespa to the world.
+So we open-sourced it under the Apache 2.0 license, and we still continue to very actively develop Vespa and add new features and so on. So that's a brief background. Vespa is not new. +It really has a very long history, and I think that's also a great thing, and we can maybe talk a little bit about it, because you know, we need to develop software over time. There have been a lot of changes in the infrastructure. There was no cloud, no public cloud. +There was no Kubernetes back in 2004. When I started in 2007, you know, a high-powered content node machine would have maybe eight gigs of RAM, a maximum of 1 gigabit per second of network, and it would have spinning disks. +Now, fast forward, we have NVMe SSD disks. We have nodes with potentially four terabytes of memory, lots of CPU power. So keeping up, improving the software and adapting it to new hardware and so on, it's been really fun to watch. +I think we did a good job actually keeping Vespa modern, coming from something that started in 2004. It sounds like a really exciting journey, starting from, as you explained, you know, small-scale servers all the way to today. +And the technology has changed so much, right? The disks became faster, I guess, and the network has become faster. +And I remember in the Silicon Valley sitcom, if you watched it, they had a case where they optimized one module in the system and the whole system went down because it was way too fast. +So it sounds like you have done quite a bit of work to actually keep this in shape. And if I understood correctly, technically speaking, a portion of Vespa is implemented in Java, a portion in C or C++, and then you also have some Python.
+Maybe you can talk more about the choice of languages and the sort of culture there is in the team. But I'm also curious: around the same time, Lucene was also developing quite fast, right? +Did you look at what that team was doing, it being an open source project? Was there something to borrow from? Yeah, so let me tackle the first question, around Vespa and the languages that we use. There's a lot to cover here. So Vespa is around +1.7 million lines of code, the total Vespa platform. Roughly 50% is written in Java +and 50% is written in C++. +Why do we use two different languages, and what are the trade-offs? +In the Vespa architecture, we made a clear distinction around what we call the content cluster, which holds the content, where you actually index and invert the documents, and where you have all the data structures for fast searching. +The content layer is written in C++ because you're managing a lot of data. You have data that you need to have in memory and so on, so it needs to be fairly efficient. Then what we call the stateless layer is the layer that actually interacts with user requests. +A user request comes in, it's accepted by the HTTP server, and that layer is written in Java. So you can also deploy plugins there. You can write your own searcher functions that can dispatch the query and get a reply. +It's transparent to a given searcher whether you have a 100-node cluster or a single-node cluster. That's hidden away when you deploy a plugin. So those languages have different trade-offs. +It's a lot easier for people to write plugins using Java without shooting themselves in the foot, as with C++. In the content layer, in C++, we don't allow any kind of plugins. You can contribute to the open source, but then it needs to be a kind of feature.
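The dispatch Jo describes, where a searcher is oblivious to whether it runs against one content node or a hundred, can be sketched as a scatter-gather over node-local partitions. This is a toy illustration of the pattern, not Vespa's actual implementation; the scoring function and data are made up:

```python
import heapq

def scatter_gather(nodes, score, k=3):
    """Scatter a query to each content node, let each node rank its own
    partition locally, then gather and merge the per-node top-k lists
    into a single global top-k."""
    local_results = []
    for node_docs in nodes:  # "scatter": each node ranks only its partition
        local_top = heapq.nlargest(k, ((score(d), d) for d in node_docs))
        local_results.extend(local_top)
    # "gather": merge the partial lists into the global top-k
    return [d for _, d in heapq.nlargest(k, local_results)]

# three "content nodes", each holding a slice of the corpus
nodes = [["apple pie", "banana"], ["avocado", "kiwi"], ["apricot", "plum"]]
# toy scoring function: count of the letter 'a' in the document
top = scatter_gather(nodes, lambda d: d.count("a"), k=2)
print(top)
```

The caller sees only the merged result, which is why adding nodes changes capacity but not the searcher-facing interface.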
+We don't allow you to embed a library or something into the content layer. So that's a trade-off. Then you mentioned Python. We have what we call pyvespa, which is a language binding on top of the HTTP API. So it's not part of the Vespa core. +It's an API we built around interacting with Vespa, doing model evaluation and evaluating, for example, different retrieval and ranking strategies. So that's the language side. And regarding Lucene, Apache Lucene: if I recall correctly, I think Apache Lucene started in 1998. +So around the same time. There's a lot of inspiration, of course, and there are not that many ways you can build a search engine. And Lucene, pretty much, it's a really good library. +So yeah, we definitely look at what's happening in open source, and we have a lot of admiration for the work and the committers of Apache Lucene. I mean, it's a great job that they've done, being able to develop this over 20 years. +The core difference between Vespa and Apache Lucene is that Vespa is a full engine. So it becomes more like comparing Vespa with Elasticsearch or Apache Solr, which are engines built on top. There's no Vespa library which you can use. +You have to buy into the whole platform. Yes, basically like a web server around it and all the components, like the nodes, the overseer and other architectural elements. Yeah, for sure. +On the Python side, I'm also curious: with all the development of models and, you know, Hugging Face, you can pretty much find a paper and most likely there is a model already available in some shape and form. +So does the Python layer in Vespa help newcomers experiment more easily with these models in conjunction with Vespa? We do hope so. And that was one of the goals for making pyvespa.
+There are different kinds of use cases. Say you have a lower query volume, maybe 200,000 documents or something like that, and you don't really need very low latency and so on. +Then you can use Python and do embeddings and play around; it natively works with Hugging Face and all those libraries that are typically written in Python. And then you can use Vespa through purely HTTP-based APIs and so on. +The other option, which is more involved, I have to say, is that you can take a transformer model, for example, and export it to ONNX, the Open Neural Network Exchange format. +That's an open neural network format that multiple companies, like Microsoft and I think also Facebook, have rallied around. +So you can take the transformer models from the Hugging Face library, export them to ONNX, and then import the ONNX models into Vespa for evaluation. +In Vespa we integrate with ONNX Runtime, which is an open source library from Microsoft with a lot of different language bindings: Python, C++, Java. It's a really great library and we integrate with that. +So you don't use it directly, but you can put the model into Vespa, and Vespa can use it and invoke it and so on.
+ So there's a kind of trade-off between, you know, getting to know Vespa, playing around with it at maybe low QPS, and the scenario where you have a really large scale and want very high throughput or something like that. Then you move the model to ONNX and deploy it actually inside the Vespa cluster, which has many benefits, because then you don't transfer a lot of data over the network and so on. Because the network is still a factor: within the data centers, maybe the network limitations have been solved, so you can get 10 gigs or even 25 gigs, but going cross-region, latency is still a concern. And that's one thing that really fascinates me: sometimes the use cases are still bottlenecked by the speed of light, right? +Going from the east coast to the west coast in the US is easily 100 milliseconds. +The speed of light hasn't been beaten yet, so yeah, physics. + Yeah, this is fantastic. And even before we go into this wonderful world of models and the latest advancements, I'm curious to dig into the item you mentioned: when you were evolving Vespa over time, you found a need to add some really interesting data structures, like the tensors you mentioned. Could you elaborate a bit more on how this need arose, what the typical use cases for it are today, and how accessible it is to an average user of Vespa, so to say? +Yeah, so I'll do a little bit of history on that. In the Vespa document model, you have to have a defined schema. +So you define, for instance, a document type called document, and it has a title, maybe some timestamp, it might have an integer attribute.
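As a rough sketch, a schema along the lines Jo describes might look like the following. The document and field names are illustrative, and the exact syntax should be checked against the Vespa documentation for your version:

```
schema article {
    document article {
        field title type string {
            indexing: index | summary
        }
        field timestamp type long {
            indexing: attribute | summary
        }
        field likes type int {
            indexing: attribute | summary
        }
    }
}
```

The `indexing` statements decide whether a field is inverted for text search (`index`), kept as an in-memory attribute for fast filtering and ranking (`attribute`), or returned in results (`summary`).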
+ So it's a normal document model, what you'd expect from any schema-oriented database, and we also had vectors early on, so you could actually do brute-force dot products as part of ranking. That was really popular in Yahoo: for various ranking requirements you would perform multiple different dot products over the documents that your query has retrieved. Then around 2013, 2014, the researchers in Yahoo said, you know, we really want to express these types of recommendation models where we can use the general concept of a tensor, so not just storing a vector in the document but even a matrix, and they had some use cases around recommendation. + So for instance, in the document you can represent a matrix: is this document popular in multiple different categories? For example, this document is popular among people that are interested in news, this one among those interested in finance, and so on.
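The recommendation use case Jo describes, a per-category popularity tensor stored in the document and matched against a query-side interest tensor at ranking time, can be sketched in plain Python. Dicts stand in for sparse tensors here, and all names and numbers are illustrative:

```python
def category_score(user_interest, doc_popularity):
    """Rank score as a sparse tensor product: sum over categories of
    (query-side interest weight) * (document-side popularity).
    Categories the user has no interest in contribute nothing."""
    return sum(weight * doc_popularity.get(cat, 0.0)
               for cat, weight in user_interest.items())

# document-side tensor: how popular this article is per audience category
doc = {"news": 0.9, "finance": 0.2}
# query-side tensor: the current user's interests
user = {"finance": 1.0, "sports": 0.5}
print(category_score(user, doc))
```

Because both sides are sparse, the engine only has to touch the categories the two tensors share, which is what makes evaluating this per document at ranking time feasible.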
+ So you can actually have tensors both on the document side and on the query side, and during the ranking phase you can evaluate these kinds of expressions. It's a really powerful language. One concrete example, we haven't touched on the language models and so on, but take the ColBERT model, which is Contextualized Late Interaction over BERT, where the query is not represented as one vector, but each of the terms in the query is represented as a vector, and similarly on the document side each of the document terms is represented as a vector. Then at runtime you retrieve documents and rank them based on this maximum similarity function: it takes the vector of the first query term, performs dot products against the vectors of the document terms, and takes the maximum of those scores; you do that for all of the query terms, and the final score is the sum. Personally, I'd seen tensors used not that much for search use cases, more for recommendation use cases, so when I saw ColBERT and I saw the maximum operator, I was like, this is just a perfect fit for the Vespa tensor support, a perfect use case. So yeah, that's one example. Awesome, the way you described it: many models today are like, okay, embeddings, you embed this paragraph or whatever, but if you need to go to the word level, that's lots of data and lots of computation, right? So how would you even do this? It sounds like tensors have found their use case there. Yeah, and with ColBERT, when we implemented ColBERT in Vespa, we also did a large sample application around the MS MARCO dataset, the passage ranking dataset of MS MARCO. We made a sample app where you can combine these different retrieval and ranking strategies, and in our case we used ColBERT as a
re-ranking model. And that's one of the real strengths of Vespa: we allow you to express really complex retrieval and ranking pipelines. You do a query, and each of the nodes involved in the query will do local matching and ranking; then you can have a second phase locally on each node, and then, once you have the global view after you have done the scatter-gather, you can do another re-ranking phase, because then you have the global view. So there are a lot of possibilities to trade off between accuracy and cost. Yeah, exactly. And actually, as you've been describing this, I realized that we recently discussed multi-stage ranking in one of the podcasts, right? So you could have either sparse or dense retrieval, but you can then later use a graph algorithm to re-rank the items. I think it was in the podcast with Yury Malkov, the author of the HNSW algorithm. So have you seen any use cases based on Vespa for multi-stage ranking pipelines?
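The MaxSim function described above can be written down in a few lines. Plain Python lists stand in for real term embeddings; the 2-d vectors below are toy values:

```python
def max_sim(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query term vector, take the
    maximum dot product against all document term vectors, then sum these
    per-term maxima into the final relevance score."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# toy 2-d "term embeddings"
query = [[1.0, 0.0], [0.0, 1.0]]            # two query terms
doc = [[0.8, 0.1], [0.2, 0.9], [0.5, 0.5]]  # three document terms
print(max_sim(query, doc))
```

Note the asymmetry: each query term is free to match its own best document term, which is exactly the word-level interaction that a single paragraph embedding cannot express.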
+ Definitely. I mean, both for search internally at Yahoo, and we also see this outside from customers using Vespa: they do multi-stage retrieval and ranking pipelines. The reason why you do it, typically, is that it's too expensive to evaluate the final ranking model over all the documents, right? So you take some kind of approximation of that model and execute that as a kind of candidate retriever. And, we haven't talked about the vector search capabilities of Vespa yet, but one of the beauties of Vespa is that after we integrated approximate nearest neighbor search, you can do a combination when you actually do the matching and querying: you can combine the regular sparse or keyword search with a vector search, and then you re-rank. This paradigm of having multiple stages, you see it in question answering pipelines as well, right? You have a retriever and then you have what they call a reader. The retriever basically finds some candidate passages from Wikipedia, and then the reader extracts the answer. But evaluating the reader, which is typically a complex transformer model where you input both the query and the document at the same time into the deep neural network, it's very expensive to actually evaluate that over all the potential passages as the user types.
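The retrieve-then-rerank paradigm can be sketched as two phases: a cheap score over everything selects candidates, and an expensive score is evaluated only on the survivors. Both scoring functions below are toy stand-ins (term overlap for the cheap phase, a handwritten lambda for the "expensive model"):

```python
def search(docs, query_terms, rerank, k=10, final_k=3):
    """Two-phase ranking: a cheap first phase (term-overlap count) selects
    the top-k candidates; an expensive second phase re-scores only those k."""
    # phase 1: cheap score evaluated over ALL documents
    first = sorted(
        docs,
        key=lambda d: len(set(query_terms) & set(d["text"].split())),
        reverse=True,
    )
    candidates = first[:k]
    # phase 2: expensive model evaluated on the k candidates only
    return sorted(candidates, key=rerank, reverse=True)[:final_k]

docs = [
    {"id": 1, "text": "cheap phone case"},
    {"id": 2, "text": "phone with great camera"},
    {"id": 3, "text": "camera tripod"},
    {"id": 4, "text": "laptop bag"},
]
# stand-in for a heavy re-ranking model
expensive = lambda d: d["text"].count("phone") * 2 + d["text"].count("camera")
top = search(docs, ["phone", "camera"], expensive, k=3, final_k=2)
print([d["id"] for d in top])
```

The cost argument is visible even in the toy: the expensive function runs k times, not once per document, and k stays fixed as the corpus grows.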
+ It's super intensive. And I'm super curious to drill into this topic of combining neural search with sparse search. Actually, before that, as you've been talking I've realized: I'm actually now taking the Search with Machine Learning course by Grant Ingersoll and Daniel Tunkelang. It's a fantastic course, I can highly recommend it; it's super intense as well. And I think yesterday Grant mentioned that there are companies which really need to optimize only, like, the top one or top two results, and they have built models to optimize only that top one or top two, which sounds mind-blowing, right? And maybe this applies to web scale to some extent. One of my recent experiences is actually in a web-scale search engine: we have a mobile screen, so we can only show the top three results, and the target is obviously to have a high CTR. We quickly noticed that if you do a sparse search without any logic on the query whatsoever, CTR is very low. So you have to do some tricks like query understanding, and also try to increase precision while maintaining the diversity of search results to some degree. Basically, with sparse search it's very easy to hit just the tip of that iceberg and say, okay, I have three teacher jobs for you, which is not that interesting because we don't know if the user is looking for teacher jobs, right? So have you seen cases like this? I think these are really challenging ones. Yeah, but generally, if you look at the results, if you evaluate on MS MARCO, for example, the official metric there is the mean reciprocal rank, right? So if you get the actual relevant document, you're able to retrieve it and put it in position one, that query gets a score of one, right? But if you put it in the second place, it's going to have a score of 0.5.
+I think that's a really good measure when you talk about mobile-screen precision at 1, 2, 3, so that's really important. But in the kind of multi-stage retrieval and ranking pipeline, it makes sense to spend more of the computational budget, within the latency SLA, on those top-k hits, right? Like when you go to Google today and do a search, probably 100 million documents will be excluded in just a fraction of a millisecond, right? And then there are multiple stages, and you can be sure that for, say, the ten last documents from the previous stages, they invest more time and computational resources on those hits. Yeah, exactly. And the good thing is, and this is what I talked about with the Vespa architecture, where you have this division between the stateless layer, which is doing the scatter-gather, and each of the local nodes, is that basically in a search engine today you need to move computation to the data. There's a lot of talk about moving or separating compute from storage, which is a huge thing in the cloud, right? But in search use cases, with high throughput and high document volume, you need to be able to do both: you need to do fast computations across multiple nodes, and then you transfer data. In Vespa, for each of the hits you can include ranking features and so on that the subsequent phases can actually use for re-ranking. And the good thing, and I know you've done a talk about diversity in search results and so on, is that you need to have that global view in order to optimize for diversity; then you can throw away a lot of the hits that you're not going to show because of business constraints or diversity constraints, and you don't need to invoke the heavy model for those hits. So yeah, I think it's interesting for these kinds of pipelines. But one thing that is challenging regarding multi-stage pipelines is that the stages interact with each other, right? And
if you have a system for training and retraining your model using statistical features, what users are clicking on and so on as some of the features, then you will have some biases towards the ranking algorithm that is in place today, because that's the model that is bringing in the interactions. So you basically just retrain on the top hits. And that was what we saw from Amazon as well: when they started to improve the retriever, instead of having BM25 retrieval and then re-ranking, they had a mean reciprocal rank score of 0.35 or something, and after changing to a dense retriever we're now talking about 0.42 or something like that. So, by improving the retriever, right, because the retriever kind of sets the upper bound, since the re-ranker cannot really dream up the relevant hits if the retriever hasn't retrieved them, right? So that's an important point about the retrieval and ranking stages. Exactly. And I think we can gradually move into neural search and vector search, but, you know, it was one of the students' questions in the same course: how much can you actually solve with the re-ranker if your first-stage retriever didn't even find what the user is looking for, which means probably the query is not a match for this search engine? Let's say they're looking for a specific model of a phone, but the site doesn't even sell phones, right? And I think the response from Daniel Tunkelang on this one was that you can actually implement a query understanding system which will understand the query as much as it can, and if it knows that there are no such items in the database, don't even bother searching for them. I think this was really clever advice, and he said that system worked extremely well, for user satisfaction, to save their time, right? Because in the end what we're doing is actually optimizing the user journey, which
then translates into business, right? So that was a fantastic example of how you can also tackle such a search problem. Yeah, and that's one area where, because we're building all of these sample applications of what you can build with Vespa, query understanding has been one of the topics that I wanted to build out to demonstrate how to do that, especially building it using a transformer model, actually. You can have different ways of doing this, but one way is to treat it as a multi-label categorization problem: given a query, here are the intents and their probabilities. But what's stopping me from doing this is that we need to work on open data sets, and there are very few query data sets of this kind. So one approximation is actually to train using the titles: in an e-commerce setting you can train based on the titles, but then you need to have some kind of label. You can do it around categories, so you have the title of the e-commerce listing and you have the category. The beauty of this is that you're actually mapping free-text queries into a fixed, predefined vocabulary, which is the categories, and then you can actually eliminate the zero-hits problem, because you're no longer retrieving based on the free-text query but based on the most interesting categories. So yeah, these are really interesting topics. And that's one thing that I find with search, and why I love being in search: there's a ton of things that you can build. There's query understanding, there are facets, there's retrieval and ranking, dense and sparse, and then you have the scale of it, how to make it fast. There are so many things; just query understanding alone is probably a full research topic on its own. So there are so many things involved in search, so
Yeah, it's an endless journey, I agree. You can dive into the NLP side of things, or stay with scaling of search, or query parsing, whatever you find passion in, and maybe your passion changes over time as well: you do a bit of NLP, then you move to query parsing, then maybe even to scalability. It's a fascinating topic, and what also fascinates me is that on the other side of things, users are not sleeping either; they're puzzling you all the time with new queries, seasonal changes, and data changes. In a mid-size or larger company it's usually the work of multiple teams or departments, some looking after data, some after ranking and recommendation, some after feature collection and whatnot, and something somewhere can go wrong, so you need to prepare for it and build a system that is resilient. It's a fantastic space.

It really is, and there are so many methods. That's also one of the things: people want to build something great, but whether you're using Elasticsearch or Vespa, you need some investment to actually get great results. The same goes if you're using a vector search library: you need some kind of data pipeline for your documents and queries. I'm not a huge believer that any of these technologies work that well out of the box; search is definitely not a solved problem. Even Google is struggling: there are many question-answering queries and so on that they get totally wrong. People want to build Google, but they have maybe two guys, or girls, working on search. You don't build a great
search experience with a team of two people.

Yeah, it's a huge investment, and a time investment too: you don't just need to hire a lot of smart people, you need to give them time to actually work through all these challenges. Now that you've mentioned vector search, I'm curious: when in the Vespa journey did you first hear about vector search, and what caught your eye? Even today, when companies evaluate whether to take the neural search journey or stay with sparse search, it's not that obvious, and maybe you could share some advice there, but first a historical deep dive would be super exciting.

Yeah, so we had been using dot products and so on for search, but it was brute force; we've been able to do brute-force vector search in Vespa for a long time. Then in 2018 BERT came out, and in January 2019 researchers published really great results on MS MARCO passage ranking. We thought: this BERT model, how can we use it? There are a lot of things to get your head around, what BERT is and how to use it. We saw there were basically two ways of using it. One is as a representation model, where you encode the query and the document independently; then with a vector search library you can build an index of your corpus and retrieve pretty efficiently if you have an approximate nearest neighbor implementation. That was what motivated us: in the summer of 2019, in August, we started the work to actually have vector search. There were also a couple of internal image search use cases around Hamming distance pushing for it, so there were multiple drivers, and since Vespa was open source by then, users were also asking: can Vespa do
vector search? We could already represent vectors, but it's not that cost-efficient if we have to do brute force. So we started looking at it, and we had all the building pieces: the tensors, the document models representing floats and all these numeric fields, so it wasn't a lot of work to get the pieces together, but we had to implement the algorithm. We spent about a month surveying multiple algorithms for approximate vector search and how they could fit into the Vespa model of doing things. That's the background of why vector search came to Vespa, and it was really exciting when we started working on it; there was a lot of interest, and people wanted to work on that project.

Of course, because it's bleeding edge and new. As I mentioned on one of the podcasts, I think the one with Yury Malkov: I had a friend who worked on essentially vector search, but he was a mathematician himself, so I viewed it as a pure mathematical concept; I thought he was playing with theoretical advancements. Then he actually gave a talk at Google presenting his algorithm for nearest neighbor search and how to optimize it, and even then I wasn't buying in; it still felt like pure mathematics. But when I was reading the HNSW paper I saw it citing his work, and I thought: wow, now these paths have intersected, now this makes sense. It usually excites me when something is put into practice. Is that how you felt as well? Was the mathematical aspect engaging for you, or did you view it more as engineering?

I'm definitely on the engineering side.
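For context on the brute-force baseline discussed here, exact nearest neighbor search is just a full scan over the corpus scored by dot product; a minimal sketch with made-up two-dimensional vectors (real embeddings would have hundreds of dimensions):

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def brute_force_knn(query, corpus, k=2):
    """Exact top-k by scanning every vector -- no index structure needed."""
    scored = sorted(corpus.items(), key=lambda kv: dot(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy corpus: doc ids and vectors are illustrative only.
corpus = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
print(brute_force_knn([1.0, 0.2], corpus))  # -> ['d1', 'd3']
```

This is exact and trivially correct, which is why it remains the reference point; the cost is a full scan per query, which is what the approximate algorithms surveyed here (HNSW among them) trade accuracy away to avoid.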
For example, with transformers I don't care about the deep neural network architecture or how the layers interact; I basically treat the model as a black box. This is the box, and it needs a tokenizer; okay, what's the tokenizer, what's the input and output, what can I use it for? A lot of research actually studies how to build the ultimate neural network architecture, but for me it was definitely not about the math. We do have people on our team with a heavy math background, and they can teach me a bit about what a proper distance metric is and why this one works and that one doesn't, so it was a real learning experience for me to engage with the core team on this feature. And we had a huge discussion: one of my main points was that we need integration, so that when users want vector search, they can also have filters, and can express all of this in our query language to combine the best of both worlds. That took some time to get right, and it was really fun to see that you can actually write a query saying: give me documents that are near in vector space, filter on this attribute, and at the same time also retrieve using the weakAnd query operator, which is an optimization technique for sparse retrieval, all expressed in the same query. I have to say I was really proud of our effort when it came out with that. On the feature side, I think integrating vector search has been the biggest game changer for us, because it spurred a lot of interest in Vespa, and we have people coming in because of it.

Yeah, and I can imagine that vector search can be useful in search as well as
recommendation systems, right?

Yeah, exactly. Factorization machines and dot products have been used for recommendation for a long time, so you see the technology for search and recommendation use cases merging into the same technology space, and for those types of use cases I think Vespa is a really strong technology. But the interesting thing I want to mention is that we have people coming in asking about Vespa thinking it's a vector search database, and then they realize: hey, there are keywords, there's ranking, there are a lot of other features here. That's been interesting for me. I see vector search as a feature of Vespa, a very important feature of this whole serving engine that you can use for search and recommendation, but still one feature of Vespa.

Yeah, I have to admit I probably played a role in bringing those users to you through that blog post that I have of course mentioned multiple times, where I compare, by now, seven vector databases, and I did put Vespa in that corner, considering only the vector part. But I knew that you guys offer a lot more, and I'm still learning; at some point hopefully I'll use Vespa in some project so I can actually evaluate it. You're absolutely right that some of these systems go beyond just vector search, and also about the use cases, the way you view this: you should take a step back and ask yourself what it is you are actually trying to build.

Yeah, I think that's really important. And to clarify on the algorithm side, which we didn't cover: after investigating several techniques we went with Yury Malkov's HNSW algorithm, and we implemented a version of it that can also handle filtering, real-time updates, and so on. But I think
one discussion you don't hear that often is that when you introduce HNSW or any approximate technique, you lose some accuracy compared to brute force. For example, on the SIFT dataset of one million vectors, you can do a single-threaded brute-force search over all one million vectors in about a hundred milliseconds, but with approximate search and some HNSW parameter settings you might get down to 0.1 milliseconds using a library. That's a thousand times faster, but you're losing some accuracy, and when I see blog posts about approximate vector search that don't mention the trade-off between recall and performance, I think: you should include the recall numbers. It's really important, because for many use cases it might be that you need to use brute force, because the approximation error you introduce is not acceptable. We actually have use cases now, without large amounts of documents, where we use brute-force search, and Vespa supports brute-force search.

Okay, so you can switch?

Yes, and that's the beauty: since we support both, at query time you can just set approximate to true or false. That means you can take a query, run it with brute force, compare the exact result with the approximate one, and compute the overlap between the two, which is typically what's used as recall at k. I did two blog posts on what I call billion-scale vector search, and in the last one I did a deep dive into these trade-offs, because when you introduce approximation you also need to build the index structures; in HNSW you need to build the graph, which takes time and resources and costs memory.
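The overlap measurement described here, running the same query exactly and approximately and comparing the top-k lists, is a one-liner; a small sketch with invented document IDs:

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact (brute-force) top-k recovered by the approximate top-k."""
    exact, approx = set(exact_ids[:k]), set(approx_ids[:k])
    return len(exact & approx) / k

# Toy result lists: the approximate index missed 'd2' and returned 'd9' instead.
exact_top = ["d1", "d2", "d3", "d4"]
approx_top = ["d1", "d3", "d9", "d4"]
print(recall_at_k(exact_top, approx_top, k=4))  # 3 of 4 exact hits recovered -> 0.75
```

Reporting this number alongside latency is the point being made: a thousandfold speedup at recall 0.75 and the same speedup at recall 0.99 are very different engineering propositions.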
So there are all these trade-offs; search in general has a lot of trade-offs, but especially around vector search. I call it the jack of all trade-offs, because there are so many things to consider: memory usage, disk usage, CPU, and so on.

I love the term jack of all trade-offs.

But it really is like that, you really have so many trade-offs. Some companies may have lots of data but not much real traffic; in that case disk-based ANN, approaches that basically use disk, can be a good alternative, because when you're buying or renting servers in the cloud, a given amount of memory comes with a given amount of CPU; there's a fixed relationship between the two. So there are different trade-offs depending on what you're actually going to use it for.

Yeah, exactly. Have you heard any other misconceptions about neural search at large? Say somebody comes and says: I want to implement a question answering system. You could in principle use sparse search techniques, or query understanding techniques, and do it almost in a rule-based fashion, but neural search on the other hand is the new sexy stuff everyone wants to try. So the question is: have you heard of any misconceptions, or things people think are much easier than they are?

That's a fantastic question; I can just sit back and relax for a few minutes, because this is a topic I really love. If you look at semantic search, especially around vector search: semantic search might mean a lot of things, but the typical thing people call semantic search today is vector search, with independent query embeddings and document embeddings. If you take
a pre-trained language model from Hugging Face, just pull the model and encode your queries using, for instance, the CLS token or the average over all tokens, the results you get are not going to compete with BM25 at all, because that language model has only learned masked language modeling; it's basically been trained on predicting missing words. It's a deep neural network that has not been trained for retrieval; it's like taking the neural network from my vacuum cleaner and putting it into my car to try to drive the car. That was one of the things we struggled with as well when we looked at BERT: other people said it was so great, and we had the engine, so we could compare it with BM25, and we tried BERT here and there, and on actual information retrieval benchmarks the results were really not good. Then came the realization, and I think this happened across the industry, in 2020, when the DPR (Dense Passage Retriever) paper came out from Facebook: they trained a dense retriever on Natural Questions, the Google dataset, using a contrastive loss and hard negative mining, so they basically demonstrated how to actually train a dense retriever model, and we saw the results were much better than BM25 there. So that's one area where just using a pre-trained model might not work well, especially if it hasn't been tuned for retrieval. And even with MS MARCO, which is the largest dataset out there that you can train a model on, if you train a model on MS MARCO and then apply it to a different domain, a different dataset, it might not
outcompete BM25; in fact, in many cases it actually underperforms BM25. That's why there's a lot of interest, especially recently, in combining exact matching, the user actually searched for this phrase, with the vector representation, and in how to combine them. Two of my colleagues are right now working on the BEIR dataset; they opened a PR to the project to include Vespa as well, and then we'll demonstrate some methods for combining sparse and dense.

Yeah, that's awesome. I read the BEIR paper after you referred it to me, and it was quite eye-opening, because it compares not only search engine algorithms and approaches but also datasets and tasks, and different tasks, like searching versus answering questions, can matter quite a lot. It was eye-opening that, first of all, BM25 is fairly competitive, not a loser at all, so you should still consider using it, and maybe even keep it as a strong baseline in everything you do. I know some companies still use TF-IDF, by the way, so maybe they should first transition to BM25 and only then jump to neural search techniques like dense retrieval. I also saw that you've participated in various competitions on dense retrieval and ranking; can you elaborate on what drives your interest there? To me that sounds more like academic interest in a way, but of course you're also showcasing, and probably bringing ideas back to Vespa.
Yeah, so the motivation was actually around MS MARCO passage ranking, where we could use the dataset. When we started implementing vector search, that was one dream, and the other was: how can we represent reranking with BERT in Vespa, using the actual BERT model with both the query and the document as input at the same time? We were looking at the results, and I think the first paper we read used maybe a day, even with a GPU, to run 3,600 queries, so it was not really practical; the question was how to make it practical. Two years later we actually beat their benchmark, end to end in Vespa, in less than 100 milliseconds, on CPU. There was a lot of learning to get there, but the motivation was to demonstrate that you can take a state-of-the-art, or close to state-of-the-art, retrieval and ranking pipeline from an open dataset that is widely recognized and that researchers are publishing papers on, and actually use those models with Vespa and serve them in your stack. So the motivation was not really on the science side. But I have to say, I would really encourage everybody who works in search to look at some of these open datasets and play with them. Maybe you have some ideas about search and how to do search; there's a lot of talk about boosting and phrasing, so how does that actually impact the results on a dataset? I can really recommend TREC-COVID, a dataset that was made at the start of the pandemic; it has about 50 queries with deep judgments for each query, and the collection is
rather small, so you can play with it on a single node. I'd really encourage people in search to try it out, because then you get the feeling: does it actually work? Compare it with BM25; what happens if I do phrase matching or something clever? And I'm really not a huge fan of anecdotal query examples, where you see commercial actors saying: I'm searching for this, look at these magic results. I'm more into demonstrating that Vespa can do this, with the numbers to back it, on real datasets.

Yeah, I agree. At the end of the day what matters is, first of all, whether you can apply the technique, as you said, in your real setting, in your domain. And another thing you just mentioned, the TREC-COVID dataset: as a result of your research you might also have an impact on the global situation, maybe somewhere locally, maybe somebody will use your work to actually implement a better search system. So I think that's also a fantastic segue to the context of what you're doing.

That's actually a very interesting point, because at the start of the pandemic we built a CORD-19 search interface that we published online, so people could actually go and search that dataset. I don't recall the details, but it's still online, and because it's all open source, people forked it and started building on Vespa based on it, and I think that service is in much better shape now than the CORD-19 search we originally did; they actually built on that work, which is great. I love putting out what I call sample applications: what can you build with Vespa? A lot of my time these days is spent on making these sample applications smooth and easy to work with, and we've been rather weak on the UI side,
putting together front ends and so on, so that's some work I'm doing right now: building more of the product, what you can build with it. People don't get really excited about looking at JSON output; they want to see interactions, facets, autocompletion, the whole experience. For the product people it's like being shown the engine when what you actually want is to look at the car, get fascinated by how shiny and sleek it is, and say: I'm buying it.

Yes, I totally hear you. And in these use cases, other platforms in the neural search space are also doing multiple demos. Have you been looking in the direction of multimodal search? Does it excite you, do you think it's too bleeding-edge or a niche use case, or do you think it has potential, with neural search crossing the boundary from text toward images, audio, and so on?

Yeah, I think multimodal is really where vector search shines. I have some doubts about the out-of-domain case we discussed, using a vector model for text search when you don't have any labeled training data and haven't adapted it to your data; using vector search alone for that I think is questionable. But in the multimodal setting, where you combine a transformer model and a typical image model and train that joint representation, from what I've seen of these models, and we did a sample application on this as well using the CLIP embedding model from OpenAI, I have to say I'm really impressed, just from eyeballing the results; I don't have any hard datasets on it, but it's really impressive what that model can actually do. So I definitely think multimodal is not that far ahead, because we
see real interest in representing CLIP in Vespa from actual user questions; I'm looking at an email right now where they want help with their schema, and they definitely want to use CLIP. So I don't think it's that far off at the moment. Another thing I'm working on right now, speaking of sample applications, is a new one that demonstrates, in a UI, an e-commerce setting where you combine fuzzy matching, exact matching, and vector search, all in the same query, with sliders you can move to see how the results change in real time. I just need some help on the React front end, because I have to admit I'm not a great JavaScript programmer. But I definitely think multimodal vector search has a huge number of use cases.

Yeah, I hope that among the listeners of this podcast there are some with front-end skills, and since you're building this as open source, that might be a good way to contribute to this crazy journey.

Definitely. We do see more involvement and contributions from the community around Vespa; I think we've built up the community side a lot over the last two years, with people getting to know more about Vespa, starting to contribute back to the sample applications and documentation, and we're also seeing more involvement in contributing to the code. From a product side it's really important for us. We also have a commercial offering of Vespa, a hosted, multi-region solution, but we want Vespa to be able to run fully fledged in your environment if that's what you want, because it's open source under Apache 2.0, and if you want to
use our cloud, you're welcome to do that and get the same kind of functionality. What we add in the cloud is CI/CD pipelines and multi-region failover, like US East and US West, and taking care of nodes failing and so on, the whole hosted experience. And that's been an issue with our sample apps: there has been some friction around how to deploy them locally versus to the cloud, so I'm trying to bring them together so that they work in multiple environments.

Yeah, that makes a lot of sense, and I guess it takes a lot of engineering effort to cover all these different use cases, so it sounds quite exciting. And actually demoing the technology, as other vector databases have got right, is such a low barrier to entry, especially for non-technical people, or those in charge of businesses or business units who actually make decisions. For them, a relevant demo is going to be quite a game changer, because if they have to reason about your technology only through the eyes of the engineers in their company, that's a much longer path.

Yeah, exactly, and I want this experience to be as smooth as possible: you get started with the sample application, run it locally, get some data into it, fire up your React front end, and interact with it; and if you're happy with it and want to share it with your friends, you can upload it to the Vespa Cloud and share the URL with them. That's a model I really believe in: it's open source, so you can run it locally, and then the cloud provider can take care of the hosting for you. And right now we're actually providing free trials, so you only need an email address for the Vespa Cloud; you
don't need a credit card or anything like that, so you can actually play with it.

We can even leave a link where listeners can try out Vespa and subscribe; I think that will be quite beneficial. And even though we've drifted a little away from vector search in our conversation, you did mention the exciting space of combining vector search with sparse search, and I wanted to take it from the angle of a non-technical user. Say they come to you and ask: Jo, can you enlighten me a little on how to combine these things? Maybe I just want to dip my toe into vector search to see what it can and cannot do in my domain. What would you recommend, assuming they already have a sparse search engine and are evaluating Vespa as one candidate?

Yeah, so if you're using Vespa it's rather easy: you can express it in the query and write rank profiles that say how to combine the sparse ranking signal, for example BM25, with the vector retrieval. For those not using Vespa, using for example Elasticsearch, OpenSearch, or Solr, what we see is that they build a lot of infrastructure on top of these, with the ranking layers outside of Elasticsearch. In that case you could have a vector search library running alongside Elasticsearch; you need to keep those two data stores in sync, and then you can fetch in parallel: Elasticsearch, give me your best results; vector search database, give me your best results. Then you can use a technique called reciprocal rank fusion, where you basically merge results based on their ranks, on whether the document is found in both lists. It's a powerful technique because you don't have to know anything about the
distribution of the ranking scores. Google is writing a lot about reciprocal rank fusion, so it's an interesting direction, and we know from both Bing and Baidu in China that they do this kind of mixed retrieval with different systems for sparse and dense signals. But for regular users that means a lot of moving parts: different data stores to manage and keep in sync. One of the things we turn to our advantage is that with Vespa you get these capabilities in the same engine; you don't need to store the data in different stores and have consistency problems because of that. But yeah, if you're sitting there today with OpenSearch or Elasticsearch and you don't want to invest in Vespa, you could try fanning out the query to both systems and doing reciprocal rank fusion.

Yeah, it could be one way to introduce something more like semantic search, if you view it that way, so that's a great idea. I think there are multiple approaches to this: if you're within one search engine, say Vespa, Elasticsearch, OpenSearch, or Solr, you could in principle also experiment with fusing the neural search results with sparse search using some kind of linear combination as you retrieve, right?

Yes, you can use a linear combination, but the great thing about rank fusion is that you don't look at the ranking scores at all; you fuse the lists purely by the order in which results are returned, so you don't have to know anything about the score distributions. BM25, for example, is basically unbounded: a score could be 25, it could be 100. That makes it very difficult to combine with a linear model, because you have two signals: one can be any magnitude, and the other is going to be between 0 and 1. So reciprocal rank fusion is definitely an interesting approach.
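The rank-only fusion being described can be sketched in a few lines; the document IDs below are made up, and k=60 is the commonly used constant from the original reciprocal rank fusion formulation:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists by position only: score(d) = sum over lists of 1 / (k + rank).

    No ranking scores are consulted, so an unbounded BM25 list and a
    [0, 1]-bounded cosine-similarity list can be merged directly.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy result lists from two independent systems (IDs are illustrative).
bm25_hits = ["d1", "d2", "d3"]
vector_hits = ["d2", "d4", "d1"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # -> ['d2', 'd1', 'd4', 'd3']
```

Documents appearing high in both lists ('d2' here) float to the top, which is exactly the "is the document found in both" intuition mentioned above, and no score normalization between the two systems is ever needed.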
This is actually a super great point, and hopefully we can provide some links to this technique, because I've heard this question multiple times: exactly what you just said, that the score spaces are completely different and not compatible with each other, so you have to find a way to interleave or merge the results. What you said makes total sense: you don't pay attention to the score space, you look at the order and do your best to interleave them.

Yeah, and there was actually a recent paper on this, because there's been more interest in the fact that dense models alone don't generalize that well when used out of domain, and one of the things Google researchers did, showing promising results, was to use this rank fusion. I've seen reciprocal rank fusion in multiple Google papers, so the researchers are really interested in it.

Sounds like a popular technique. Well, time flies, and I've really enjoyed talking; it feels like we could record another podcast covering multiple topics. But I'd still really love to pick your brain on a philosophical question: what keeps you so interested? You're a loud voice behind Vespa in general, but you also offer a lot of advice, through your blogs, your public presentations, and sharing papers on Twitter, which at least for me was super helpful, being able to quickly read a paper you shared. What keeps you motivated and interested in staying in this field? And specifically, is there something missing in the vector search space, or in search in general, that you would like to fix?

Yeah, that's a great philosophical question. I think
I'm not that excited about vector search; I see that as a technique. I'm more excited about search, because I think it's such a fascinating problem. We touched on it before: you have query categorization, spelling, so many different aspects of building a great search experience. And also the scale thing is really appealing to me; I'm kind of passionate about how we can do this at billion scale, how we can make it fast. What if we need 100,000 queries per second? What if we need to update all the documents in real time, within 12 hours or one hour? Where are the limits? Where is the cloud going? This compute-versus-storage question: can we move more computation out of the storage layer? There are a lot of exciting things on the systems side.

But on the more science side of things, how to build a great search experience, I think you need multiple techniques. We didn't touch on it, but vector search is one thing, sparse search is another, and GBDT models, tree-based models, are really ruling search; they're kind of the hammer of search, because those models on tabular data, on statistical features, really show promising results. And that's another thing I think is great about Vespa: you can combine these GBDT models and neural models in the same ranking function. I don't think there's one single silver bullet for retrieval; I think there are multiple different signals. For instance, if Google only did vector search, only on the text, right, you basically have a lot of duplicates on the web, you have low-quality content, there's PageRank, there are other factors. It's not only vector search; there are different techniques. And that also means there are a lot of new things to learn: how do you do query categorization, how do
you actually determine which facets and what kind of navigation you're going to show to the user? And, like you touched on at the start, if your user runs a query and we don't have any good results, should we just show them some random results, or should we say, hey, I'm sorry, but we don't have anything for your query? So yeah, that's really what motivates me: it's such a fantastic problem if you're interested in scale and all these things coming together.

Yeah, thanks for that. It's deep and it's very wide, and I think it's a limitless space. And I hope newcomers feel it has a low bar to entry, especially, and we didn't touch on this, with your work in open source. You know, the support: you can go on Slack, or whatever tool you're using to communicate with your users, and actually listen and address their concerns and questions, and hopefully this opens more possibilities for newcomers to enter.

Yeah, I love, it's actually a weakness as well, but I love answering questions. You can see me answering questions in multiple Slack spaces; I love people asking questions about search. And what really gets me is when someone is struggling with something, you know, "how can I do this with Vespa?", and I'll try to explain it: you have to do this and that. And then I go back to the team saying, we need to fix this, we need to make this easier for people to use, right? So it's a both-ways thing, and that's one thing I learned in my career: listen carefully to your users, how they're using the product, what the pain points are, how it feels to get started, are they able to progress. So that's really motivating too. And honestly, I think that some of the work we've done using some of these smaller transformer models has
had an impact on the industry. Like, I got contacted by a person on Twitter the other day who said, you know, I saw your tweet about these smaller language models, not the BERT base that people usually turn to, but this MiniLM model, which is a distilled model with 22 million parameters; I actually did a demo with it that you can run in your browser. And he said, I saw your tweet and went ahead and tried it for my domain, which was classification of hate speech, and then he did a blog post on it and mentioned me. That was really interesting for me to see: that I could share something people could actually make use of, even outside of search.

And I learned a lot, especially around the Relevance Slack space that we are both in, the OpenSource Connections Slack space. There's a lot of discussion there about vector search, we share blog posts, and then I ask Greg from Pinecone a tough question, maybe. So I really love being there and discussing, and I learn a lot from other people, like Josh Devins from Elasticsearch, and from you. And especially around Berlin Buzzwords last year, you did the AMA on vector search, and for me one of the key moments was when Max Irwin, your co-host there, said: what if the user types a phrase query, you know, in actual quote marks, "I want to search for this exact phrase, don't show me anything else, give me that phrase"? That's something that is really hard to do with vector search alone, right? Because you basically map it into this vector space and do the similarities. That was a key takeaway for me, and a real eye-opener: you need to be building out better examples of how to actually combine sparse and dense signals.

Yeah, this is amazing, and what I enjoyed in what you said is that you keep your practitioner hat on. So you don't just buy into these new models easily, or
you don't stay in the position of "okay, I'm only an engineer, I don't even know what machine learning is", because I think the profession is slowly changing. It's a blend of skills that you need today to succeed as a search engineer, and maybe it shouldn't be called "search engineer" anymore; I think it needs some new term, but we will probably be stuck with it for lack of a better one. Eventually you will need more skills under your belt, and I think the work you are doing is amazing in sharing this knowledge so that people can actually reproduce it, and that's super crucial for progress.

Yeah, I mean, thank you, Dmitry, that's really nice of you. And I think it's actually true, you know, to share. And what you said: building a search team today is really hard, especially since deep learning entered the search field, right? So now you need to know how to configure and do matching and boosting in Elasticsearch, and you also need to know, how do I train the dense vector model, should I use BERT base or BERT large, does it handle multilingual text, does it handle spell correction? There are always different things. So building a search team in 2022 is not easy, because you need this kind of mixed NLP-plus-search competence; there are a lot of different things, and that's what I love about it. I talked about it on Twitter, and in a talk I did earlier as well: this neural paradigm shift has opened up a knowledge gap, how to actually successfully apply these methods. And also on the technology side, which we try to bridge with Vespa, you can combine different techniques; we don't have to throw away decades of the inverted index. We don't need to throw that away; it still has value, and it's going to have value probably forever.
So that's interesting, but what's also been fascinating, and I've said this numerous times, is that I don't think I've learned as much in my career as I have over the last three years. Reading papers hadn't been a big interest of mine earlier; it was more the systems side, the engineering side. But it's been an eye-opener for me, learning how to apply these techniques. And that's the great thing about open source: we can share ways of doing things.

Yeah, absolutely, sharing is caring, and so much comes back to you. As you said, you get mentioned somewhere and you feel like you didn't do it in vain, but you might also learn something new, like a new use case. And I feel the same: when I blog, or when some video is viewed by somebody, and then somebody says thank you, even multiple months after I did it, it's just an amazing feeling. It's a sense of connection as well, especially these days when we maybe don't meet socially as we used to. It's actually an evolutionarily new way of connecting, and I feel much more comfortable and enjoy this more detailed conversation. So maybe these interactions on Twitter bring a lot more value, and I think this is super great. Is there something you would like to share on Vespa development, or maybe something that users might anticipate, and maybe you want to point them to some tutorial they might take a look at?

Yeah, so I can give a few product updates on what's coming for Vespa. We are going to release some integrated dense models for Vespa, so you don't have to export anything; you can use these models off the shelf. And then we allow you to tune the query encoder: the document encoder is frozen, but you can tune the query encoder, and then we'll show how to combine both dense and sparse. So that's one thing
that is coming out. The other thing is that we're taking some steps for low-QPS use cases, because we designed Vespa to be kind of low single-digit milliseconds on multiple different use cases, but not everybody needs that. So we're introducing some new options for memory management, so that we can actually run on servers with less memory. I think that's going to be a game changer for certain use cases that don't need high throughput and low latency. So that's two things, and I think that's more than enough. And there will be some blogs, I think, about our results on the BEIR benchmark. And I'm also going to come out with a blog post on a technique called SPANN, which is a paper from Microsoft; it's a really interesting technique with a hybrid combination of HNSW and an inverted file, and you can represent this SPANN in Vespa. So I'm going to do a part three of my blog post series on billion scale; that's something I'm looking forward to. But right now I'm refactoring a lot of the sample applications and so on, to make the getting-started experience smoother.

Yeah, sounds fantastic, looking forward to it. We'll make sure to link all the blog posts you mentioned, especially on billion-scale vector search, and the other tutorials. This is fantastic; thank you for doing this, and keep doing this, keep finding the energy. I know it's tough sometimes, but I think it also keeps you awake and pushing yourself forward, and I think the best way to use your brain is actually doing something useful, be it reading a paper, implementing code, or blogging about it. So thanks so much for your active contribution.

Thank you, thank you as well. I really enjoyed this conversation, and I really hope we can record again at some point down the road, if you're open to it; I think we can cover a lot more topics. But I wish you all
the success in your endeavors, and stay warm and excited about the field.

Yeah, I will. I mean, such an exciting field. Thank you very much, Dmitry, for hosting this, and we'll talk later on.

Thank you, and see you around. Bye bye.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md b/transcripts/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md
new file mode 100644
index 0000000..af8fb9a
--- /dev/null
+++ b/transcripts/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md
@@ -0,0 +1,151 @@
---
description: '

Topics:

00:00 Intro

00:42 Joan''s background

01:46 What attracted Joan''s attention in Jina as a company and product?

04:39 Main area of focus for Joan in the product

05:46 How Open Source model works for Jina?

08:38 Deeper dive into Jina.AI as a product and technology stack

11:57 Does Jina fit the use cases of smaller / mid-size players with smaller amount of data?

13:45 KNN/ANN algorithms available in Jina

16:05 BigANN competition and BuddyPQ, increasing 12% in recall over FAISS

17:07 Does Jina support customers in model training? Finetuner

20:46 How does Jina framework compare to Vector Databases?

26:46 Jina''s investment in user-friendly APIs

31:04 Applications of Jina beyond search engines, like question answering systems

33:20 How to bring bits of neural search into traditional keyword retrieval? Connection to model interpretability

41:14 Does Jina allow going multimodal, including images / audio etc?

46:03 The magical question of Why

55:20 Product announcement from Joan

Order your Jina swag https://docs.google.com/forms/d/e/1FAIpQLSedYVfqiwvdzWPX-blCpVu-tQoiFiUJQz2QnIHU1ggy1oyg/ Use this promo code: vectorPodcastxJinaAI

Show notes:

- Jina.AI: https://jina.ai/

- HNSW + PostgreSQL Indexer: [GitHub - jina-ai/executor-hnsw-postgres: A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL](https://github.com/jina-ai/executor-h...)

- pqlite: [GitHub - jina-ai/pqlite: A fast embedded library for Approximate Nearest Neighbor Search integrated with the Jina ecosystem](https://github.com/jina-ai/pqlite)

- BuddyPQ: [Billion-Scale Vector Search: Team Sisu and BuddyPQ | by Dmitry Kan | Big-ANN-Benchmarks | Nov, 2021 | Medium](https://medium.com/big-ann-benchmarks...)

- PaddlePaddle: [GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)](https://github.com/PaddlePaddle/Paddle)

- Jina Finetuner: [Finetuner 0.3.1 documentation](https://finetuner.jina.ai/)

- [Not All Vector Databases Are Made Equal | by Dmitry Kan | Towards Data Science](https://towardsdatascience.com/milvus...)

- Fluent interface (method chaining): [Fluent interfaces in Python | Florian Einfalt – Developer](https://florianeinfalt.de/posts/fluen...)

- Sujit Pal’s blog: [Salmon Run](http://sujitpal.blogspot.com/)

- ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626

Special thanks to Saurabh Rai for the Podcast Thumbnail: https://twitter.com/srbhr_ https://www.linkedin.com/in/srbh077/

'
image_url: https://media.rss.com/vector-podcast/20220119_090157_f67877f44bb32ae14fd380d9328691ec.jpg
pub_date: Wed, 19 Jan 2022 21:02:57 GMT
title: Joan Fontanals - Principal Engineer - Jina AI
url: https://rss.com/podcasts/vector-podcast/366298
---

Hey everyone, Vector Podcast is here, and today we are continuing our quest to learn more about vector technologies and embedding technology platforms. Today I have a guest from Jina AI; his name is Joan Fontanals and he is a Principal Engineer at Jina AI. Hey Joan.

Hello, nice to meet you.

Yeah, nice to meet you as well. Thanks for joining me today. I'm really excited to talk about what Jina AI is. In some sense I used to use some predecessors of Jina AI, but not Jina AI itself. But first of all, I would like you to introduce yourself and your background to our listeners and to me.

So, well, I studied an engineering degree in Barcelona, not computer science but general engineering, which mixes electrical engineering and mechanical engineering, but I got into software engineering because I was involved with robotics. When I started my professional career I did software engineering at different companies and industries, then I got more into data engineering, machine learning and these kinds of fields, and I also did some work on traditional search, on web search engines and so on, and then life just brought me to Jina, which was a good step in my career.

Oh yeah, cool. So, what caught your eye in Jina AI as a company, and maybe as a technology, as a product, or maybe the team?

So, for me what caught my eye was the technology and the vision: I see that vector search, embeddings and semantic search in general can revolutionize how we understand search, bring it to the next level, adapt to different kinds of data, and go beyond the typical search bar that we are so much used to.
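The core idea described here, ranking documents by similarity in an embedding space rather than by keyword overlap, can be sketched in a few lines. To be clear, the 3-dimensional vectors below are invented toy embeddings and this is not Jina code; a real system would produce high-dimensional vectors with a trained encoder model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical embeddings: "laptop" and "notebook" land near each other
# even though they share no characters, which is the point of the approach.
docs = {
    "laptop":   [0.9, 0.1, 0.0],
    "notebook": [0.8, 0.2, 0.1],
    "banana":   [0.0, 0.1, 0.9],
}
results = semantic_search([0.85, 0.15, 0.05], docs)
```

The toy query vector sits near "laptop" and "notebook" in the made-up space, so those come back first regardless of any lexical match.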
Yeah, yeah, and I mean, Jina is more than just embeddings; it's more like an ecosystem, right? It has a marketplace, it has many different building blocks and components.

This is what I think most of the people listening to us might be wondering about, because it's a question we receive a lot. We are not just another vector database like the ones that have been covered on the podcast. We are treating the problem of semantic search as an end-to-end problem, and we are trying to build an ecosystem to help businesses and developers develop their own neural-search-based engines. For that we are building an ecosystem from the core, our document types, onward, and we also recently launched this Finetuner project to help you with fine-tuning your models for your search applications. So we are building a whole family of products and projects around this area of neural search.

Yeah, it sounds quite ambitious, and it sounds like all of these building blocks are really needed for anybody who wants to venture into the embedding world of semantic search, or, you know, bring in the power of these deep learning models.

So it goes beyond only embedding your data and searching through it: you may want to cut it into different pieces, you may want to re-rank at the end, you may want to join different modalities together. So we are trying to make it easy for the user to develop these applications, so that they speak the same language, and we hope they will all speak the Jina language.

Oh yeah, oh yeah, for sure. And Jina is open source, right?
Yes.

So can you speak a bit more towards the business model, how Jina makes money? Basically it's open source, anybody can go and download it and leverage it in their work; or is there something like products for which customers can pay?

Right now we are completely open source; everything you can see in our repos is open for everyone.

Yeah, okay. And you are mostly working on the backend side of things, so you're not interacting with customers directly, right? Is that correct? 

I'm working mostly on the main products.

And what do you hear about use cases? How do they translate to your level, your kind of day-to-day job?

So most of our solution engineers, let's say, that are closer to clients and users, they bring us their pain points on how they are trying to solve users' needs. Some of the main use cases we are trying to solve come from text search, image search, multimodal search; that is something we are trying to excel at, going beyond only using text or images to search, maybe having a combination of both to power search to the next level.

So they might bring some kind of use case that you need to figure out at the tech level, right?

Yes, it kind of translates to me. But on the other hand, like you said, it's open source, so it means there are a bunch of GitHub issues coming in, right? And if you have Slack, I don't know if you're using Slack anyway, probably every day you wake up and there are questions there, right? So they're also clients in a way, right?
Yes, for me my users are our clients and we have to listen to them. That's the big point of open source, in my opinion: this direct feedback from the users. You can correct your direction, and you can measure whether your APIs or your design are too complex for the user to grasp, or whatever. So this direct feedback is really useful.

And to this point it's manageable?

Yeah, yeah, but it's also, I guess I alluded to this in one of the podcasts, it was with Bob van Luijt from SeMI, it's also sometimes maybe hard to keep up with all the questions, right? If you get all these questions, when do you find the time to really answer them deeply?

Yeah, find time to answer them, yeah. So we are trying to grow our team, knowing that the community is something that makes us special, and it's important for us to take care of our community, so we are all trying to keep an eye on it.

Yeah, I remember when I was developing search code we were using Apache Solr, and I had to customize some parts of Solr and Lucene, and I remember that in order to get up to speed I had to go to this mailing list, right? And there were thousands and thousands of emails; Apache Solr was super active, you know, and still is in many ways. And I was like, how can I keep up with all these questions? But I do need to somehow keep up, and maybe summarize what is being asked there, in order to understand whether it's useful for me or not. Because when you ask a question on the mailing list, or today on Slack, sometimes you need to be ready to pay back, right? If somebody helps you in the community, you sometimes need to pay back as well. So it's like a game.
When this happens in the community, I think it's really pleasant for the whole team: when the community interacts with each other and no one in the team has to jump in, because they help each other. That's when I think the community really scales and open source goes to the next level.

Yeah, it's kind of regenerating itself, and there's the cultural element of it. The community drives you forward; it's a driving force of the project, from the interaction point of view and feature-wise as well. Yeah, sounds good. So Joan, tell me more about Jina itself as a product, let's say as a technology stack. What can I do as a user using Jina? And is it self-serve, and so on?

So the main point of Jina is that we want to be with the user from the minute they are experimenting with their search application. For instance, we are written in Python and we have a really nice Python API to work with your documents, which can deal with any type of data: text, images, audio, video. We are trying to build a really easy-to-use API for you to run your solutions locally.

The first, experimental phase is to wrap your code for loading the files and processing the images or whatever, embedding them, and searching with approximate or exact nearest neighbor search. Then, once you have this, we make it easy for you to wrap it in microservices, what we call Executors. So in the first phase you deal with these document array types that we came up with; at the next layer you have the Executors, so you wrap your logic in different microservices; and then we put it in what we call a Flow, which is kind of a pipeline that is ready to scale locally or remotely, or even with Kubernetes, so that you have replication and scalability taken care of. So we are trying to bring you easily from your day zero of development to the production system.
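The documents → executors → flow progression described above can be illustrated with a toy, framework-free sketch. To be clear, this is not Jina's actual API; the class and method names are invented purely to show the pattern of wrapping each processing step in an executor and chaining executors in a flow.

```python
class Executor:
    """One self-contained processing step (illustrative, not Jina's class)."""
    def process(self, docs):
        raise NotImplementedError

class Chunker(Executor):
    def process(self, docs):
        # Cut each document into smaller pieces (here: naive sentence split)
        return [chunk for d in docs for chunk in d.split(". ") if chunk]

class Embedder(Executor):
    def process(self, docs):
        # Stand-in for a neural encoder: attach a trivial 1-d "embedding"
        # (the text length) to each chunk
        return [(d, [float(len(d))]) for d in docs]

class Flow:
    """Chain executors into a pipeline and push documents through it."""
    def __init__(self, *executors):
        self.executors = executors

    def run(self, docs):
        for ex in self.executors:
            docs = ex.process(docs)
        return docs

flow = Flow(Chunker(), Embedder())
results = flow.run(["First sentence. Second sentence"])
```

Because each step hides behind the same `process` interface, swapping the toy embedder for a real model, or running steps as separate services, leaves the pipeline shape unchanged, which is the point of the executor/flow design.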
Yeah, sounds good, sounds comprehensive. And what if I would like to just use a hosted version? Can I use a hosted version from Jina AI, or do I need to run the operations myself?

There is no hosted version at this point.

Yeah, so basically it's like a Lego type of thing, right?

Yes, exactly.

I will have a nice deployment. And we even have this marketplace, as you said, this Hub, so you can share publicly, or privately with your colleagues, or with the community, the building blocks you think may be useful.

Yeah, modern deep learning models packed for processing, re-ranking, even vector search.

Yeah, so how does it align with companies or hubs like Hugging Face? You know, Hugging Face is also very famous on the model side, right? So let's say somebody picks a model and wants to bring it to Jina, what's the process there?

So, I would say Hugging Face is quite inspirational for us in this sense; this marketplace, this community place, is quite similar. But Jina's marketplace is related to our Executors, so it goes beyond only models: it's any self-contained building block you can build that is able to be part of this Jina pipeline. And we are trying to make it user-friendly for you to find it and use it through a simple API, and we're still working to make it easier all the time.
Yeah, of course, because the infrastructure part tends to take a lot of time, you know. How do I bring my model, let's say I have a custom model and I want to bring it inside Jina so it serves as an embedding layer? How do I figure out all the scalability and latency parameters and so on?

So the first thing is to get it working: we expose these Executors, which have an API to read requests, maybe inspired by this FastAPI approach. And then, with this Flow, you have the parameters to replicate, to scale and so on; you may run it on GPU, whatever.

Yeah, so based on your cost factors you can choose whether it's CPU, and then latency; and for some models CPU is actually fine, so why not.

Yeah, it depends also on the user's needs. For instance, we are also seeing that neural search does not need to be only for these big giants with big amounts of data and big amounts of resources. A smaller company can also benefit from the power of neural networks to power their search; they may not require so many resources, or they may not require so much speed. So we are giving the power to more users.

Yeah, and the flexibility of the platform, because essentially if they wanted to do it from scratch they would need to figure out similar things, like component isolation and scaling, quality checks and so on. And on the algorithm side, you said you have exact search as well as approximate search; can you talk more about that and maybe mention some algorithms that you support?

So right now, natively, we support a quite optimized version of exact nearest neighbor search, but then one of these building blocks can wrap any client for any other vector
database. But, for instance, we just released our own approximate nearest neighbor solutions; we have two of them that we've developed. One is based on HNSW plus a Postgres indexer, a Postgres database to store the documents. And then we have built, well, we just released it in Slack and the community can start enjoying it, what we call pqlite, which works with product quantization but also has support for HNSW.

You said pqlite? How do you spell that?

pqlite; it's like a product quantization "lite" version.

And in what sense is it light compared to product quantization?

I have not been involved so much in this project, so it's a new thing for me, but it is light in the sense that it's quite native to working with our document types. It's not as general as working with any object; it is really built to integrate very easily with Jina, with a specific kind of schema or document types. And it's also open source.

And obviously you can provide the links, or we can link them in the show notes, but do you also have some kind of latency analysis for this algorithm? Has it been conducted, do you know?

Yeah, there are some benchmarks you're going to find in the readme; I don't have the numbers in my head right now.

Yeah, I think for a portion of our audience it's going to be interesting to check out, because, as you know, my team just completed participation in BigANN, I don't know if you heard about this competition; it's billion-scale approximate nearest neighbor search. We invented a new algorithm called BuddyPQ, and I will also link the blog post about it in the show notes. We increased recall by 12% over FAISS, so yeah, over the FAISS algorithm. So I think it's great that you guys are also inventing. I don't know if you are getting
to this billion scale; I think we are more in the million scale.

Yeah, actually we also ventured into billion scale, but in the process we figured out a solution for million scale. So it's not for billions yet; we don't know yet if we can generalize to that level, but I think we can, with some additional research.

Well, this is the first version, so for sure we will try to improve it.

Yeah, awesome, this is great. And have you also helped customers to train models?

We didn't help customers directly, well, we did from our solutions point of view, but this is an interesting topic, because this is one of the pains we found quite often with our users. It was easy for them to get to that level of, let's say, 70% accuracy with any deep learning model that all these tech giants have developed, right? But we believe that this last mile, this transfer learning part, is important. When we realized that, we started this project, well, by now it's already released, Finetuner, maybe we can share that as well, where we try to make it easy for users to fine-tune their models for their machine learning search applications. It is also framework-agnostic: we support PyTorch, TensorFlow and PaddlePaddle. So we recognized this pain point for the users, that once they had everything running the quality was not as expected, and we are trying to help the user, within our ecosystem, get to that level by using this Finetuner.

So basically, can you explain a bit more about Finetuner, like what input do I need to provide as a user?

So Finetuner could feel similar to any PyTorch dataset, for instance, but we are trying to put our documents in as the main citizen of our ecosystem, so you have to wrap any of your data into our document types, which is really easy; it's something easy to learn and easy to use
and then you can fit your models. We have made it easy for you to use the most typical loss functions, we are trying to introduce hard negative mining, and we are trying to make it easy for everyone to solve the common problems that come up when training for search applications. We are also making an interactive labeler that helps you, through an easy-to-use UI, tag similar objects interactively.

Yeah, so, I mean, fine-tuning can be a pipeline in itself, right? Like, how do you get the data samples you want to fine-tune on? You might have them before launch, or from tests after launch; it's, you know, the cycle and flywheel of success, so to say, right? So do you cover the full workflow up to and including production, or is it pre-production?

So for now we are just tuning the embedding model, just to get embeddings with better semantics for your dataset and your specific use case. But we are at a really early stage, it's 0.
2 release or something around there, so there's a long way to go. Yeah, for sure, but I mean the direction is fantastic, because that's exactly what addresses the real need of any user. Like, it's all fancy to take a Hugging Face model or whatever, but fine-tuning it to the level where your users love it, that's a different story. Yeah, that sounds great. But I also wanted to come back to something: you mentioned that Jina AI doesn't really compare to vector databases, but I do sometimes get questions about how these systems compare to each other. You may or may not know, I blogged about all the vector databases I knew at that point, and it turned out there were six, and then a seventh one knocked on the door, so it's also now on the blog. But I didn't cover Jina AI, and I didn't cover deepset's Haystack, because I thought that Jina and Haystack are like layers above a vector database. Is that the right thinking? Yes, I think it makes sense. We might try to develop our own solutions for the use cases we feel are most worth it, but yeah, I think that's right. I think vector databases cover one of the parts, maybe one of the main challenges, of vector search or neural search, but we try to see the whole scope and the whole pipeline. So in Jina you can wrap a client that will use any of the big vector databases. Have you done any integration with some vector database? Not ourselves right now, but we might do it in the future. Okay, because for now, you did mention that you offer ANN algorithms, which to me sounds like a core building block of vector databases, but then of course in a vector database you have many more things, right? Like, where do you store objects, how you store them, what about filters, and so on. But we are trying to cover the right scope. For instance, for some use cases
exact nearest neighbor search might work just fine, and users don't need to worry about configuring fancy ANN models for their recall and speed requirements. So I think there is room for everyone; you just have to offer what is right for the right use case and the right need. Yeah, of course. And by the way, what's the core programming language used in Jina? So our core programming language is Python, because, since we are this pipeline and we are like a glue and an ecosystem, most of our operations are wrapping models that run in optimized languages anyway, and Python helps us iterate really fast where other languages might slow us down. Yeah, that's true. And does it also apply to the ANN algorithms that you mentioned, like PQLite, is that also Python? I don't know for every part. For instance, I think we are also using some bindings for HNSW. So you are probably using the C++ version of HNSW with bindings to Python, right? Yes, for the HNSW part, that's for sure. For some other parts I don't know, maybe Cython in some of the code; we are trying to optimize wherever we can. Yeah, but it sounds cool. If we continue this comparison a little bit between Jina and vector databases: if you pick vector databases, let's say Weaviate is implemented in Go, Qdrant is implemented in Rust, so these are compiled languages, right? Vespa is Java plus some C++, but mostly Java. So nobody implements the vector search in pure Python, because it's going to be very taxing on the latency, you know. No, but we are not running the expensive operations in Python. For instance, for the nearest neighbor search, we are based on NumPy operations, which are optimized at the NumPy level, and for the approximate nearest neighbors, I think most of the heavy lifting is done at the C++ level, and we're just covering it with our bindings. Yeah, and I'm still
curious about PQLite: is it C or is it Python? I think we'd need to check the documentation. Yeah, I'm curious because I've actually invented a new ANN algorithm myself. I haven't published it widely; it's open source, but I haven't done thorough benchmarking. And what I faced is that, in Python, even though I optimized all parts of the algorithm, using pre-allocation and NumPy, it still runs out of memory. Runs out of memory as in it leaks memory, and the Python virtual machine doesn't tell you where. You don't have tools; okay, there are some tools, but they're not as useful as what you'd get in, say, Java. So I've been a little bit desperate, and I've been thinking, okay, should I now move into Rust or Go territory, which might be a little bit more dangerous? Even though I do have some experience in C++, do I want to go there now? Python is much more comfortable. I think it depends on the layer you are working with. I think that offering Python APIs in the field of machine learning attracts people and makes everything much easier to use. If you get the API right, you might then bind it to whichever of your favorite languages, but I think getting a comfortable API that developers love using is one of the key first steps. So do you invest a lot into building these APIs? Can you give an example of some API within Jina that makes the workflow easier? So, for instance, we are trying to improve a lot the Document. Documents and DocumentArrays are the two core members of our family in the ecosystem, so we are spending a lot of time on making them easy to use, for instance with this fluent pattern. We are trying to invest a lot of time in finding the best, most Pythonic way to work with them. So it's a constant evolution. Yeah, of course, but the API is exactly
that layer which is essentially facing the customer, right? And you don't know the scenarios they will use it in, and sometimes they might surprise you, or they might say, okay, I found some workaround for your missing parts, and then you think, okay, I didn't think about that. Right, the API layer is a fantastic way of talking to your client, through the API contract, in a way. Yeah, and it's quite a big challenge, I would say, to strike the right balance between ease of use and flexibility: what belongs there and what doesn't, because there's always a risk of putting too much functionality in one thing and making it very powerful but a nightmare to use. Yeah, in this balance, I think, lies the key. What is your choice when you have to choose? Let's say it's a balance of flexibility versus, what did you say, ease of use, right? Ease of use. I think I'm now tending to go for ease of use, because, with open source, and I repeat, open source teaches you a lot, I think at some point we loaded the APIs with too much and they were a little bit complex to use. You could do a lot of things, but in the end maybe not everybody was doing so. So I think ease of use, as the first entry barrier, is the most important thing. Yeah. And it's also interesting: if you have a real API deployed somewhere, a published contract, and people are sending queries there, then you know which endpoints and which features have been used, which are not, which options are completely ignored even though you put them in the docs, right? But how do you go about this in open source? Somebody downloads your code, they use it somewhere, you don't know how. So how do you collect this analytics from them? Do you just send call-out messages: hey guys, what do you use, what don't you? Right now we are trying to pay attention to who is using what, and when people ask us, we try to get
the most information out of them; not information on their business, but on how they use it, how they feel. So right now the community is the only source of information we have. That's open source. How do you talk to them? Do you send messages saying, hey guys, can you vote on keeping this feature and removing that one? Not exactly like this, but you can see which people are more or less engaged, which people are finding it easier or having more difficulties with your solution. And we have a developer relations team that tries to get feedback from the community in many forms, so this is a global effort. But in the end you have a say, right? No matter what they ask, you have a say, is that right? Well, sometimes you cannot please the community to the full extent; we have to keep a roadmap. For instance, people might want you to build something that is ML-related but maybe not so significant for search solutions. There is quite a tension there, I think. Also, beyond search, where can I use Jina? What kind of other use cases have you seen beyond similarity search, let's say? Since we are building these abstractions, it is quite easy for a user to use them for building any classification model or anything. You could even use Jina to easily deploy and scale, say, an object segmentation model. This is something that you could do, but Jina was born to build neural search solutions and will keep working on that. So you could still use it for other things, but it might not be the best tool for them. We were not born for that, but you can do it, because, for instance, classification or segmenting an object can be part of your pipeline; but in essence we are born to support search applications. Yeah, so it's either
search applications or something that you can frame as a search application. For instance, a question answering system you can frame as a pipeline where you do some retrieval, dense or sparse search, and then a reader extracts more information from the results. So anything that falls into this domain, you can do. Yeah, and I guess, based on the research and practice happening in retrieval-based data augmentation, you can also formulate data augmentation as a process of search, in principle, right? So the output will be your augmented data, but you use search inside. Yes, actually, that might be. So many problems can be framed as search. In the end, the vectors somehow hold the semantic information; we don't understand exactly why, but it is encoded there, right? So just by clustering them together, we somehow gain some understanding. So many things to play with. I also wanted to ask a little closer to the similarity search itself. Let's say I built a traditional text search engine, and I'm moving away from BM25, which is probably the majority of this market today. So I'm thinking, okay, what are these cool kids doing, maybe I should try it out and plug in some BERT model. But in my UI I'm also showing snippets, and it's very easy to show snippets when it's a keyword search, right? So what should I do, or what can I do, with a model like BERT and Jina AI to show snippets, or something that will resemble snippets, to the users? Maybe you can check where the attention is put in your model, or surface that somehow. But yeah, I think there is also the fact that we have grown into this keyword search, which is so interpretable, so easy to use, and even so easy to hack. You as a user know how to tweak your query if you don't find what you want: okay, this word might
find it here. And since these models are kind of a black box for many of us, I think this interpretability is one of the main challenges, and one of the main directions research should go. Yeah, but you call it out as interpretability; for the users, and for me, let's say I'm a product manager, I don't care whether it's a BERT model or BM25. I used to see snippets, I want to see them now. So what should they do? Yeah, but the point is, with BM25 I can give you a snippet because I know why I have this result here: it correlates, and I know where the information that I want is. Maybe, for instance, in Jina one of the main building blocks that we have is our Document with its recursive structure, because for most things, if you search a text or a document, you might need to break it into paragraphs, into chunks, and so on. So maybe what you can do is run the vector search at a granular level, let's say at sentence level, but then show the results at paragraph or document level, so you can very easily highlight the sentence that really drove the search to this page. That's a use case we know. Yeah, I remember actually, I don't know if you know the blog Salmon Run, by Sujit Pal. He's doing a lot of blogging in the area of: here is the problem, how do I solve it, and then usually he goes into deep learning, or trying out some vector search, maybe, or not. And I remember he was saying how he would solve this snippeting problem, because he comes from traditional search, and I do in a way too. And, if I remember correctly, he said: you can kind of build almost like a dictionary, right? So let's say you take a word, you can embed it, take another word, you can embed it, so you can embed a whole dictionary, right? Now, when you found that document, you can map back from its embedding to the words, if they
happen to be close enough geometrically. You can find close enough words, so you can try to say, okay, maybe these keywords are representative of this text. You're not 100% sure, but at least you try. So you go backwards, reverse engineering from the embeddings. It's interesting, though: you may need to go through all the pain of lemmatizing and that kind of stuff, which you may have saved yourself by going to semantic search, and now you are back to it. So, trade-offs; but yeah, it might be good. Yeah, lemmatization is another thing, but I think there was this paper, from Google I believe, about byte-level training, right? So they don't care if it's a lemma or a suffix or a prefix; they don't go sub-word level, they go byte level. And with byte level you can essentially compute the distance again, right? Okay, how close is this to this dictionary word or not. But then again, from there, in order to produce a snippet that will look like natural language, you would have to use some kind of model, like GPT, to generate the sentence, and at that point it might actually go in a completely different direction from your text, right? Start hallucinating, or write a news item that doesn't exist. Well, maybe you can use these extractive models, where you extract a span from a sentence given a context; all this literature is out there. Yeah, but I mean, the attention that you mentioned can probably be used here, right? You can ask the model: okay, what did you pay attention to when you did the matching? But still, as you say, you can call it interpretability, but on the other hand, when you go to that specific product case, you need that snippet, or you need that context of the match. Like, if you said mathematics and it picked algebra, why did it pick algebra? At least explain, because
here it's more or less obvious, but in a specific domain it might not be, right? Yes. So what do we do? I don't know, maybe we are not using the right tool; maybe we are obsessed with using deep learning for everything. But I think these two worlds, keyword search, what we call traditional search, and this neural search, can be combined to power things to the next level. I don't think they need to be enemies; there are good and bad things on both sides. Do you have any thoughts on how you would combine them? For instance, in a solution you could get results based on both sides and then, at a ranking step, consider what is best. Is this a complex query? Maybe the user is looking for some semantically rich result. Did this user just send a couple of keywords? Then this user might be expecting keyword-based results. Yeah, that's true. Well, you could even go as simple as giving that control to users, right? Everybody knows keyword search. They first want to go with what they know, what works or may not work. And then, only if they are not satisfied enough, they might go into explorative mode and turn on the similarity search. Yeah, that might be quite viable. It's interesting. The problem is that keyword search might not have a good future for image-based search or any other modality-based search. Yeah, exactly. Like, the moment you go beyond text, what do you do? That's the big power, I think, and the big future that neural search has ahead. That is where no traditional search solution, I think, will keep up. So if I want to build a multimodal search, can I take some Executor from the marketplace and plug it into Jina today and do it? Yeah, I think we have some. For instance, you can use CLIP.
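One way to implement the ranking-step combination just described, blending a keyword score with a vector score and picking the weight per query, is sketched below. This is my own illustrative toy, not anything from Jina or a real hybrid-search product; the scorers and the alpha heuristic are invented stand-ins for BM25, real embeddings, and a learned weight:

```python
def keyword_score(query, doc):
    # Toy lexical score: fraction of query terms found verbatim in the doc.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def vector_score(query_vec, doc_vec):
    # Dot product, i.e. cosine similarity for unit-length embeddings.
    return sum(a * b for a, b in zip(query_vec, doc_vec))

def hybrid_score(query, doc, query_vec, doc_vec, alpha):
    """alpha=1.0 -> pure semantic ranking, alpha=0.0 -> pure keyword."""
    return (alpha * vector_score(query_vec, doc_vec)
            + (1 - alpha) * keyword_score(query, doc))

def choose_alpha(query):
    # Crude version of the heuristic from the conversation: a couple of
    # bare keywords likely wants keyword-style results; longer, sentence-like
    # queries lean semantic. Real systems would learn this weight.
    return 0.3 if len(query.split()) <= 2 else 0.7
```

A real system would plug BM25 and trained embeddings into the same shape of decision; only the blending pattern is the point here.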
You can use CLIP to encode; there are image and text models, and it performs very well. We have wrapped it in one of these Executors on Hub, and you can use these CLIP models for your cross-modal search cases. It's quite efficient, without that much fine-tuning, for searching images given text and the other way around. Yeah, that sounds cool. But I was thinking, if I want to combine speech, text, and image, then I probably need to come up with some meta-model of that, right? There is some research in this area where the modalities are not treated differently and encoded separately, but considered together. There is even some research on multimodality with context switching, moving the vectors. So it's also possible to take the latest research, wrap it into one of these modules, and deploy it in production, but this is not so easy. We didn't focus on building these from scratch, but we're also looking at bringing top-notch researchers into building these modules. So, in that case, would you prefer the community to help out and bring in the model, or are you helping to do that? Right now we are driving this direction, to offer this to the community. I think our dream as an open source project is to have the community flourish and live on its own. So the future should be community driven? Yeah, because in the end, the community might also know... Well, when this grows big, the community will be helping each other. Yes. Some of these things will become what you may call commodity to some extent, right? Or at least the way you integrate might become commodity, and the use cases might become commodity, and there will be new use cases which are untapped; but I think the community can definitely help each other out.
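The idea behind CLIP-style cross-modal search can be caricatured in a few lines: text and images land in one shared vector space, so a single similarity function serves both. The two "encoders" below are fake stand-ins with hand-assigned vectors (invented purely for this sketch; real coordinates come from jointly trained text and image towers), but the shared-space retrieval pattern is the real one:

```python
import math

# Toy shared space: axis 0 = "animal-ness", axis 1 = "vehicle-ness".
def encode_text(text):
    table = {"a photo of a dog": [0.95, 0.05], "a fast red car": [0.1, 0.9]}
    return table[text]

def encode_image(image_name):
    table = {"dog.jpg": [0.9, 0.1], "car.jpg": [0.05, 0.95]}
    return table[image_name]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search_images(text_query, image_names):
    # One similarity function compares a text vector against image vectors.
    q = encode_text(text_query)
    return max(image_names, key=lambda n: cosine(q, encode_image(n)))

print(search_images("a photo of a dog", ["dog.jpg", "car.jpg"]))  # → dog.jpg
```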
What we might need to focus on is making these models easier to use, or easier to find if we have a marketplace where everything lives; maybe we need to help the community find what they need at any time. Yeah. Content-wise, hopefully there comes a time when the community is the main contributor there. Was there something else in Jina that we should know about as users, some cool feature or some system that you think doesn't exist in competitors, something that is cool? I don't know about competitors right now, so I think what I like the most is the ease of use and the time saving. If you go to our ecosystem and try to build, from zero to deploying, a neural search solution, an image search solution, I think you will really enjoy the easiness. Yeah, so it's like a well-oiled machine. But can I also bring it up on my laptop? Yes, you can try everything on your laptop. The point is you may not be able to index that many images, but you can get a first feeling on your laptop. Yeah, I mean, if I want to build a demo to impress my manager, I'd usually use my laptop, right? That's maybe one way. Jina is really good for that. Yeah, that's pretty cool. And I think it's also nice that you said it's Python friendly, so it opens doors to so many things, especially with Hugging Face; it's pretty much all Python, right? So I can pick some models, plug them in, and I don't need to containerize them or figure out isolation and so on; I just plug them in and start using them. I think that's also a great boost to productivity, to actually implementing the use case rather than focusing on mundane components and processes, right? Yeah, and even these modules that we have are already containerized for you. We build a container on our end so that you can run it in an isolated way with all your dependencies and stuff. Yeah, sounds great.
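Getting that "first feeling" on a laptop doesn't even require an ANN index: at small scale, the exact nearest neighbour search mentioned earlier in the conversation is perfectly serviceable. A dependency-free sketch of my own (toy vectors, standing in for real embeddings):

```python
import math

def exact_knn(query, vectors, k=2):
    """Brute-force exact nearest neighbours by Euclidean distance.

    O(n * d) per query: no index to build or tune, and recall is 100%
    by construction. Fine for thousands of vectors on a laptop; ANN
    structures like HNSW only pay off at much larger scale.
    """
    scored = sorted(vectors.items(), key=lambda kv: math.dist(query, kv[1]))
    return [name for name, _ in scored[:k]]

vectors = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}
print(exact_knn([1.0, 0.0], vectors, k=2))  # → ['cat', 'dog']
```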
I mean, I think we now have a pretty good understanding of Jina. Of course, we didn't read the docs yet, but it sounds promising, so I hope some of our listeners and audience will try it out. I wanted to go more to the philosophical level. What drives you in this space? You said that you've been working in web-scale search before, right, and some other search and engineering in general. So what drives you now in this area, and why did you join Jina? So I joined Jina as I was in this traditional search space, working on training ranking models, and what drove me was to enable these search systems, these search experiences, beyond text. I was super curious; it's impressive to me how the same framework of extracting meaningful vectors with semantic information can be used for images, for video, for audio, for anything. This framework, I think, has a lot of future. Also interesting is how the research areas of the different modalities interact with each other: convolutional neural networks appeared, and even some text classification used to use them; then the transformer appeared, and right now the computer vision community is falling in love with transformers. This back and forth, I think, is impressive. But also, if you think of the magic of getting this vector and having so much meaning there, it's quite amazing. Yeah, it's true. It's very powerful. The sheer fact that you don't need to build a synonym dictionary, as you would in full-text search, right? It just tells you that mathematics is close to algebra; you throw data at it, and it's unsupervised, right? It just tells you: hey, I've trained, now I can tell you what's close to each other geometrically. It also has mathematical beauty there, right? Geometric closeness rather than some obscure, hand-tuned sparse closeness.
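The "mathematics is close to algebra" effect is just cosine similarity in the embedding space. The three vectors below are made up for illustration (real ones would come from a trained model), but this comparison is exactly what replaces the synonym dictionary:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical 3-d embeddings, hand-made for this example.
vectors = {
    "mathematics": [0.8, 0.6, 0.05],
    "algebra":     [0.75, 0.65, 0.1],
    "cooking":     [0.05, 0.1, 0.95],
}

sim_algebra = cosine(vectors["mathematics"], vectors["algebra"])
sim_cooking = cosine(vectors["mathematics"], vectors["cooking"])
print(f"math~algebra: {sim_algebra:.2f}, math~cooking: {sim_cooking:.2f}")
```

No curated synonym list says "mathematics ≈ algebra"; the geometry does, which is the point being made.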
It's quite elegant, I would say. Yeah. You have abstracted all this knowledge, and you have this simple thing that you can imagine in your head as a 3D space, and that is as simple as the algebra from, I don't know which grade, but quite simple. Yeah, I think in simplicity there is a lot of beauty. Yeah, it's very easy to explain to your granny. Like, I'm doing this, you know: it's a 3D space of sorts, there are points, and I'm just looking for the closest one. I expect to have something that puts things close to each other that make sense together; that is what we expect from this black box. Exactly. And then the question of scale: if you go to billions of vectors, then, okay, can you trade some of that closeness precision and get back the speed? So, yeah, it's very interesting. Does it interest you more on the deep learning side, or the mathematics side, or the engineering side, or maybe some other side? On every side. From mathematics, I enjoy a lot of the beauty of it; sometimes it's too obscure for me, but I really like it. And deep learning I like too, although I feel that some of the research doesn't seem to be so innovative, and maybe we should spend more time checking other paths. Okay. Which other paths? I don't know, to be honest. I just feel that there's so much literature that I cannot keep up with. And then from the engineering side, I think it's cool. It's the space where I think I can provide the most value; sometimes the other concepts are too abstract for me. Yeah, I want to call out your point: is deep learning the only way? For example, one scary thing is that these models are becoming more and more parameterized, so you have hundreds of billions of parameters, maybe a trillion. How many more can you have? A zillion parameters in there.
But first of all, it's impractical. If you take that model and try to plug it in, it doesn't plug in, because it's too expensive. And also, you might not have that much data in the first place, so why should you care? Web-scale search engines probably will, but you, as a researcher in, let's say, a startup, don't know if you need that much. You need to solve one specific thing, right? So it would look really strange to bring this huge microscope, this GPT model, in and say: this is what we need to use. And then the whole budget goes into paying for that model, you know; it's impractical. So that direction by itself, I think, is a little bit of a dead end. Or, I don't know, how do you feel about it? Yeah, it's a race where I add another layer, get more parameters, and I win. And I think the first step to move away from this is to really understand how things are learned and why they are learned the way they are. In image models you can more or less visualize what the filters have learned; you have some idea of where the model is looking. But maybe we should put more research there: let's slow down this race and understand, and maybe we find a way to make it more sustainable for everyone. Yeah. I remember when I was doing my PhD in machine translation, it was using statistical models, like Moses and statistical machine translation, and it would suffer from things like out-of-vocabulary words, and how do I bring syntax in, and what not. But then when deep learning came, all of a sudden you see that it translates much, much better, and you think, wow, they probably solved it now, right? The claim from the 50s that we will solve machine translation, probably now the promise is delivered. But then you notice: it's fluent, but, I don't want to use the word stupid, it just doesn't get it, right?
It mixes up subject and object easily. It may go and hallucinate about something that doesn't exist there. Or it actually starts translating into single letters all of a sudden, you know, or repeating n-grams. You see that it didn't exactly solve it, right? You wouldn't trust your life to such a system yet. Yeah. And then you come back and think, okay, I used to do it with a rule-based approach, so I could understand the syntax of the sentence and then the semantics of each node in the tree. And then when I translate, I use some semantic function, and it's all well-defined in a taxonomy of semantic functions and so on. Okay, now I go back to deep learning: do you have anything like that? No, it's just the latent space. Maybe there should be a way to combine them, I don't know. I think, as humans, we have built this complex way of talking to each other, with semantics, in multiple languages and stuff, and there is no way that all this language can just collapse into this deep learning world. It seems counterintuitive, at least. Yeah. So you think that maybe the voice of those who build alternatives to deep learning, alternative approaches, should be louder? Yeah, I think we may suffer from the bias of the winner. Maybe the first one who opens a door doesn't win the race, but if they show another way the race might go, I think they deserve more attention. Yeah, this is quite deep. Thanks for this reflection. You clearly think about it a lot: not to be biased, yes, there are challenges, but it might not be the only right approach, and given your experience you can judge it with open eyes. I think we should explore more, and maybe not focus on just one thing. And probably explore with Jina, right? That's the goal. For sure. Yeah. So this is super great.
Is there something you want to share? You already shared that the Finetuner is available, so our listeners can go and check it out, right? Is there something else that we should be expecting, Joan? Early next year we should be releasing our 3.0 version, so stay tuned for that. Yeah, absolutely. We will be moving fast in the coming months. Yeah, this is fantastic. I mean, thanks so much for all this information and detail on Jina, and also for your ambition and your thinking here. It's really nice that you keep your open mind available to all of our listeners. Yeah, thanks so much. It was a pleasure to talk to you, Joan, today. Thank you. I enjoyed it very much. Yeah, thank you very much. Looking forward to 3.0. Yeah. Thank you. Cheers. Bye, bye.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md b/transcripts/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md
new file mode 100644
index 0000000..b8fe010
--- /dev/null
+++ b/transcripts/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md
@@ -0,0 +1,151 @@
---
description: '

00:00 Intro

00:42 Louis''s background

05:39 From Facebook to Rockset

07:41 Embeddings prior to deep learning / LLM era

12:35 What''s Rockset as a product

15:27 Use cases

18:04 RocksDB as part of Rockset

20:33 AI capabilities: ANN index, hybrid search

25:11 Types of hybrid search

28:05 Can one learn the alpha?

30:03 Louis''s prediction of the future of vector search

33:55 RAG and other AI capabilities

41:46 Call out to the Vector Search community

46:16 Vector Databases vs Databases

49:16 Question of WHY

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20240501_010549_d5b842295c8b59f78ff9fa1e488d2af8.png
pub_date: Wed, 01 May 2024 13:54:39 GMT
title: Louis Brandy - SQL meets Vector Search at Rockset
url: https://rss.com/podcasts/vector-podcast/1460893
---

Hello there, Vector Podcast! Season three, and as promised, I'm trying to shoot for 30-minute episodes. Let's see how I'm going to do on this one. I'm super excited to have Louis Brandy, Vice President of Engineering at Rockset. I know you guys are building a database. Hey Louis, how are you doing? I'm doing great. So far so good. Thank you for having me today. Oh yeah, excited. Excited to learn about Rockset as well. But before that, it's a tradition: could you please introduce yourself, a little bit about your background and how you got to this stage in your professional life? Sure. So I've been at Rockset for two years and change, over two years, as VP of Engineering. Before that, I was at Facebook for 11 years. I did roughly three things at Facebook, and it's funny, because even the ones that feel least relevant have become more relevant recently. I did spam fighting infrastructure for my first chunk of time at Facebook, and that involved two large systems. One was a super real-time system, which turns into the real-time database we're going to talk about today. And the other was, we did a lot of vector clustering. I was doing vectors way before they were cool; this was like 2011 to 2015 or so. We used vectors a lot in spam fighting and image classification. And this was even before deep learning took over the world, right before deep learning changed everything in this space. But we were using vectors a lot. We built some pretty powerful systems, actually, like large-scale vector clustering, before it was cool. Now everyone's building large-scale vector applications.
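The vector clustering Louis mentions can be caricatured with a tiny greedy pass, a toy of my own invention and nothing like Facebook's actual systems, that groups near-duplicate items (say, spam messages) by embedding distance:

```python
import math

def greedy_cluster(vectors, threshold):
    """Greedy single-pass clustering: attach each vector to the first
    cluster whose leader is within `threshold`, else start a new cluster.
    A crude cousin of clustering near-duplicate spam by embedding proximity.
    """
    leaders = []      # one representative vector per cluster
    assignments = []  # cluster id for each input vector, in order
    for vec in vectors:
        for cid, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:
                assignments.append(cid)
                break
        else:  # no existing cluster is close enough
            leaders.append(vec)
            assignments.append(len(leaders) - 1)
    return assignments

spam_vecs = [[0.1, 0.1], [0.12, 0.09], [5.0, 5.0], [5.1, 4.9]]
print(greedy_cluster(spam_vecs, threshold=0.5))  # → [0, 0, 1, 1]
```

At real scale this single-threaded pass would of course be replaced by distributed approximate methods; the sketch only shows the grouping idea.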
And then I worked on a lot of other stuff in my time at Facebook.
There was a lot of core C++ library work, a lot of infrastructure stuff. I worked on open source projects called Folly and Thrift. These are basically core libraries that Facebook has released over the years. And the theme of all this is highly scalable infra.
And then I did some real-time and some vector stuff back in the spam-fighting days. It's not totally applicable to the modern world, but it's still a pretty interesting background. It's a very interesting confluence of things that have brought me to Rockset.
So yeah, that's my life story, roughly, in a nutshell. There's more, but I think that will do for the intro. Yeah, for sure. Very exciting, really exciting. I've heard about Thrift.
And I also remember, early on, many years ago, when some of you guys from engineering at Facebook were on stage, you would constantly hint at the point that, yeah, we ran out of the capabilities of this database, so we needed to scale up.
We needed to build a new one sometimes. And it was really interesting that it's a constant battle against too many images, too many videos, and so on and so forth. Yeah, one thing that I've always said is that everything is broken at scale.
Like, there's this idea that sometimes you reach for the right tool for the job, but the reality is, when you push even the right tool to the limit, it will fall over, and you'll find yourself rebuilding something that other people take for granted.
My favorite example of this is at Facebook, we had a team in my core C++ group that was working on malloc. Like, who works on malloc? It turns out there are people that work on malloc.
Most of them work at a place like Facebook or Google, but that's the kind of thing where you can save a lot of money by making tiny improvements to malloc. So it's worth doing. It's amazing. I remember I did a bit of C++ as well.
I guess you could say two and a half years.
And at some point, at an anti-virus company here in Finland, I had to choose which malloc it would be, right? And I had to discuss it with my team, and I was like, really, is that really the thing we need to discuss?
And they said, yeah, actually, you won't believe it, because we were running on a mobile phone. Back then it was Microsoft's Windows Mobile, I guess it was called, right? So you have to be really careful all around.
Yeah, I mean, there are only, like, four mallocs in the world. So you might have chosen ours. Who knows? Amazing. All this says is that you've been really, really deep and low-level. And so I think you dabbled in coding, obviously, right? Yeah. So I was fairly technical.
I've been a manager for a relatively long time, I don't know, 12 years or so, but I've always been a fairly technical manager along my path. And so, for example, for years I worked on the core C++ libraries at Facebook, even while I was a manager, even a director.
I was on the C++ standards committee for a while, and doing things like that. Sorry, I got paged. Everything's fine. So yeah, I've definitely worked in the code. I've tried to stay as hands-on as possible. In recent years, it's become increasingly difficult.
I don't know, it's sort of the dark side of management. You slowly slide into more managerial things. But I still try to stay about as hands-on as I possibly can. Oh, tell me about that.
I mean, I'm also on the product management side today, and previously a manager of people as well. And I'm like, am I sliding backwards? Do I need to?
Sometimes I do, but it's not on the same level as it used to be, for sure. But it makes sense to stay on these topics.
And then, after all these years, you decided to move to Rockset. I've read a blog post, I think you wrote it for the company, where you give the reasons why you did so, and you talk about the team's strengths and so on. Some of them are from Facebook as well.
That matters, right? Can you repeat that story a little bit: why you moved from a big company, you could say, to a startup? So the answer, in short, is the people. The core group at Rockset, well, I shouldn't say the core group now.
The core group now has grown a lot, but the original founding core was a bunch of extremely strong Facebook people that I knew from Facebook. And so, you know, you mentioned rebuilding databases.
For example, two, probably three, of the main people responsible for rebuilding databases at Facebook are at Rockset. So Dhruba, who's our CTO, built RocksDB at Facebook.
And that was part of replacing the storage plane of MySQL in a highly scalable way at Facebook. And then, of course, there's the graph database that powers literally all of Facebook. Facebook is a graph, and it is primarily powered by a graph database called TAO.
Nathan and Venkat are two people who worked on that extensively, as tech leads and, in some sense, founders of that project at Facebook. So this is an extremely pedigreed group.
But to me, the pedigree isn't even the main thing; they're also genuinely amazing people to work with and work around.
And so there's this idea of, hey, you want to join a startup with a bunch of the smartest people you've ever worked with and try to do something, and worst-case scenario, it all goes kaput or whatever, but you've worked with some of the best people on a really interesting problem for a couple of years.
And I was like, yeah, I'm in for that. There's a longer version of that story, but that is really the central reason why I ended up switching. Yeah, I mean, it sounds like a brilliant reason too.
But I'm also interested: you said you'd been using embeddings before, at Facebook, working on vectors, and you said that was prior to the deep learning era.
So can you explain a bit how these vectors were created, if that's possible? So there is some sensitivity here, but it's maybe not for the reason you think. It's not trade-secret sensitivity. The sensitivity is with the abuse use cases. What we were doing was image classification.
And I'm not going to go into too much detail, for maybe obvious reasons, but there are images that you are not allowed to use or put up, and obviously what they don't want to do is hand all these companies the images and say, if you see this illegal image, tell us.
So oftentimes they give you hashes. But these aren't actual hashes. They are not a hash of the illegal image. They are a locality-sensitive hash, and they're a vector. What they are is literally a vector.
And Euclidean distance is the measure, so you basically have a classic vector search problem: you're given a pile of vectors. There's a technology known as PhotoDNA that you can look up. As far as I know, it's not an open standard.
So what it actually is is not in the public domain, but it's effectively a mechanism for turning images into vectors that's used as this hashing mechanism.
And so Facebook built a bunch of infrastructure to flag hashes that came through, for reasons that are not fun to talk about. Let's put it that way. Again, I don't want to get into it.
It's kind of awful, right? But at the end of the day, you have vectors flowing into the system.
And what you're doing on every single upload is essentially a vector search.
You're saying, hey, given this corpus of vectors, does this vector that's coming in match any of these? That was the basic core of the system. But once you have these vectors, you can start to do other abuse-fighting things. For example, you can start clustering vectors.
You can build vector clusters, and that way you can find neighborhoods of images, like similar images. Now, here, similar means something quite different, because these were not about semantic similarity.
So this is not what you would get from an embedding today, from, say, any of the modern models. Yeah, I've heard of CLIP. Yeah, whatever. Yeah. These were much more, like, textural. People familiar with image processing techniques will know these.
These are vectors based on things like local pixel gradients or wavelet transforms, things like that.
So when we say images were similar, we mean things like rebalancing the white balance, or changing the hue and saturation, those kinds of image manipulations, or re-encoding with a different JPEG encoding.
It was tolerant to that kind of manipulation. It wasn't finding images of elephants; that's not what it was doing. Yeah, yeah, I remember I took a course. Actually, I did a master's degree here in Finland dedicated to data security.
And one of the courses was about how you can tamper with images that have watermarks, right? And then, how do you make that watermark resilient to any tampering that might happen on the image level, on any of the bands and stuff, as you explained, wavelets and stuff?
So digital image processing is the term to Google if someone wants to look it up. It's a big, big topic.
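The matching step described here, a small, fixed set of locality-sensitive hash vectors checked by Euclidean distance on every upload, can be sketched roughly like this. This is a toy illustration, not PhotoDNA itself (whose format, as Louis notes, is not public); the function name and threshold are made up:

```python
import numpy as np

def matches_banned_set(incoming: np.ndarray, banned: np.ndarray, threshold: float) -> bool:
    """Brute-force check: is `incoming` within `threshold` (Euclidean
    distance) of any vector in the small, fixed `banned` set?
    At this set size the search itself is trivial; the hard part at
    Facebook scale was running it on every single upload."""
    dists = np.linalg.norm(banned - incoming, axis=1)
    return bool((dists <= threshold).any())
```

Because the banned set stays small and nearly static, a linear scan suffices, which is exactly why no ANN index was needed in that era.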
But what struck me in what you explained is that every image upload had to go through that process, which means it had to be super scalable.
And also, your database of vectors would be ever-growing, all the time: as an image passes through, or doesn't, you would need to add it somewhere to your vector space. So in this case, no. This is the one advantage we had, because we only care about matches to a specific, relatively small set.
Oh, I see, I see. So it's a set that shouldn't grow, ideally, right? Yeah, or only very, very nominally. I see. Yeah. And it's funny, because that's the big difference that made it easy in that era.
Today, you'd have to bust out all the ANN stuff, and maybe stuff we'll get into, to really be able to do a much more scalable vector search. So this was really more about evaluating a relatively fixed set of vectors.
So you could hyper-optimize how that set was organized, but evaluate it at an insane scale. So the update problem wasn't very hard, but the evaluation problem needed to be extremely high scale. Yeah, a bunch of questions in my mind, but let's move on to Rockset.
Tell me more about the "what" part, you know, what it is as a product, and then slowly let's go deeper into the technology side. Yeah. So my standard statement of what Rockset is: Rockset is a search and analytics database built for the cloud.
And that's a bunch of, oh, I forgot one: it's a real-time search and analytics database for the cloud.
Now, that's a bunch of little buzzwords, and it's very easy to get lost in the kind of marketing feel of that, but each of those words does a non-trivial amount of work in what it is I'm really trying to build here. So first of all, it's a search and analytics database.
So here, what we mean is an OLAP-style analytics database. That's where we're starting.
We want to run analytics-type queries, and, I won't get into all of this, but this is separate from your OLTP-style databases.
So this is not MySQL, not a large transactional thing. It is an OLAP-style database.
And search and analytics is a very interesting pairing in this world, because systems like Elasticsearch, a very search-oriented system, and systems like Rockset, which have an analytics style, are actually not that different architecturally. The way you use them may feel different.
The primitives you're using feel different, but all that sits fairly shallowly in the technology. The underlying architecture of these systems ends up looking quite similar. So search and analytics actually go together quite nicely, from an "I can do both" perspective.
Maybe I don't do both well, but that will mostly exist at the top, not in the infrastructure. Then, it's in the cloud. So the whole system is built to be elastic from the beginning. So if you send me twice as much data, I can scale you out in a way that just works.
You don't have to worry. You're not reprovisioning more machines to double your cluster size or anything like that. And then, real-time.
So our focus has always been real-time, which is to say, specifically: most people, when they think of real-time, want their queries to be fast, but the real heart of real-time is ingest latency.
So if you send me new data, how quickly does that data get manifested in the queries? If it only shows up in tomorrow's queries, that's not a real-time system.
And there are a lot of systems like this: very big, batch-style, mega-exabyte-type Hadoop clusters where you can query yesterday's data, right? And get genuinely enormous amounts of data. That is not Rockset; that's not Rockset's problem.
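Ingest latency, how quickly new data becomes visible in query results, can be probed with a simple freshness check. A minimal sketch, where `write_fn` and `query_fn` stand in for a real database client (all names here are illustrative, not any Rockset API):

```python
import time

def measure_ingest_latency(write_fn, query_fn, probe_id, timeout_s=10.0, poll_s=0.05):
    """Write a marker document, then poll queries until it becomes visible;
    the elapsed time approximates ingest-to-query latency. Returning None
    means the data never became queryable within the timeout."""
    start = time.monotonic()
    write_fn(probe_id)
    while time.monotonic() - start < timeout_s:
        if query_fn(probe_id):
            return time.monotonic() - start
        time.sleep(poll_s)
    return None
```

By this measure, a batch system whose data appears "tomorrow" would simply time out, which is the distinction Louis is drawing.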
But for us, it's like, hey, if you want the last minute's data, on a working set that's ideally several zeros smaller, then that's where Rockset is meant to work really well. And so this is the heart of what we've set out to build, at a high level.
And I don't know if you want me to keep going. I feel like I've already said too much. No, it's amazing. It's a good start. I wanted to stay a little bit on the product side and flip over to the use cases for a moment.
So what are the typical use cases? Can you zoom out as much as possible, maybe even giving hypothetical examples, that's fine: products that use your product. Yeah. So we have a bunch of customers in a bunch of different domains.
So one way to think about this is just, who's using it and why? What domains are they using it in? For example, we have a bunch of gaming customers. So there are real-time events occurring in games.
Imagine an online game of some sort: they're collating that information constantly and keeping, say, leaderboards or things like that real-time. There are also several logistics and supply-chain-type people using it.
So, where is my package right now, or where is the boat in the ocean? These kinds of queries are very commonly done. They're basically tracking their entire supply chain, trying to find shortages and what's going to create problems down the line, in logistics-type settings.
There's a lot of FinTech. A lot of financial firms use it, a lot of fraud detection. So again, fraud and spam, these are very real-time problems. You can't detect yesterday's spam or fraud; that's really harmful. You need to know now, right?
And then a lot of recommendation and product-experience use cases. Anytime you want to power a user-facing experience, you almost always need that to be real-time. So an example I like to use: there's a place called Whatnot. Go to whatnot.com if you've never heard of it.
Whatnot is basically a streaming site for buying and selling. So it's sort of eBay meets Twitch, the easiest way I could describe it. But what's really cool about that is you have a recommendation problem: I want to buy something, and there are people selling it.
So it sees my interests and wants to show me, like, you might want to check these things out. That's a recommendation problem. But it's really real-time, right? It has to match me to online sellers at any given moment.
And so it's a recommendation system that needs high scale, and it also needs to be real-time. It needs to use a lot of real-time data. So these are all use cases for Rockset. Every one of these is real customers using Rockset to do something.
Yeah, for sure. Now I want to jump back to the tech side. So Rockset: inside it, are you using RocksDB or something else? So, okay. Are we using RocksDB? First of all, do we know what RocksDB is? Just so everyone's on the same page.
RocksDB is an engine that was built by Dhruba at Facebook. And I shouldn't say by Dhruba alone, but by a team that Dhruba was a part of. He was one of the original founders of that team; there are certainly a lot of people involved in RocksDB. It's a key-value store.
It's built to scale very well and do log-structured merges over time. Rockset absolutely uses RocksDB as its storage plane, and there's a lot of Rockset built on top of RocksDB. So Rockset is not RocksDB-as-a-service; that is not what Rockset is.
We do use it as the storage plane of Rockset.
And we do take heavy advantage of, to get into the technical weeds a little bit, log-structured merges, to keep our indexes up to date continuously. And that is a big part of the real-timeness of Rockset:
being able to update the index continuously, having this heavyweight infrastructure to merge these indexes, and the kind of append-only, log-structured way you do things in the LSM world. That's part of the secret sauce.
It's not that secret, but it's part of the secret sauce of Rockset. Yeah, for sure. But then also all these things like vector search, storing the embeddings: is that also happening outside of RocksDB, basically in the layer you explained?
So hold on, you asked about vectors, what were the things? Oh, embeddings. Embeddings, and vector search itself, and the ANN indexes, presumably. Yeah.
So the ANN index: we've extended RocksDB a little bit to have this notion of a blob of memory that you attach to a particular thing, which is going to be the ANN index. And then you can build custom operators to merge them, for example.
And so we do essentially shove the ANN index into this, so it gets into RocksDB. RocksDB doesn't know about ANN indexes; it just knows there's a blob of memory that it has to log-structure-merge down the road. As for embeddings, for us, that's just arrays.
For us, an embedding is just a vector, and a vector is just an array. There's no real difference in the way these things are stored, and those are stored in RocksDB. Yeah, got it.
And so, basically, what other AI capabilities does Rockset offer? What's the secret sauce there? It's user-facing, right? But still. So there are a few things to talk about here, when we talk about secret sauce.
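The append-only, log-structured merge idea described here can be illustrated with a toy store. This is a deliberately simplified sketch of the LSM pattern; RocksDB's real memtables, SSTables, and compaction machinery are far more involved:

```python
class TinyLSM:
    """Toy log-structured merge store: writes land in an in-memory table,
    which is flushed to immutable sorted runs; runs are merged down
    (compacted) so reads stay fast."""

    def __init__(self, flush_size=4):
        self.memtable = {}
        self.runs = []          # newest run first
        self.flush_size = flush_size

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_size:
            # flush: freeze the memtable into an immutable sorted run
            self.runs.insert(0, dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:   # newest run wins on duplicates
            if key in run:
                return run[key]
        return None

    def compact(self):
        merged = {}
        for run in reversed(self.runs):  # oldest first, newer values overwrite
            merged.update(run)
        self.runs = [merged]
```

Writes never modify data in place, they only append and merge, which is what lets an index (including an opaque ANN blob with a custom merge operator) stay continuously up to date.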
One thing we skipped over that's worth touching on in terms of Rockset architecture is that Rockset has two things you hope every database has, but not every database has. One is disaggregated storage, fully disaggregated storage.
So you can double your storage, or you can double your compute; you can do either. You don't have to do both, right? They are separable.
There are compute-optimized machines and storage-optimized machines, and you can add to either group independently. We also have compute-compute isolation. So you can set aside a set of machines, for example, just to do ingest, and a different set of machines just to do queries.
And they both operate on the same backend, for example. You can go farther than that.
You can have different groups of machines for different sets of queries, or per tenant, or whatever. You can go wild with this idea of isolating compute from compute, right? Once you have disaggregated storage, this is an idea you can pursue.
This is already really powerful for AI use cases, in a way you don't necessarily appreciate, because what it means is I have a way to do my index rebuilds, which are expensive in a vector world, away from the machines handling queries.
What's not going to happen is the database bogging itself down doing an index update of some sort while queries are trying to be served, and you're getting timeouts. So being able to actually separate out compute is very powerful in these AI settings.
Another example, no one's done this in real anger yet, but it's going to come: hey, I have a god-awful amount of vectors, and I want to update them to the next generation. The new OpenAI model has come out; I want to rerun the entire data set.
We can do that in this kind of off-on-the-side fashion, in a way that just redoes it all in place without affecting the running application, as an example.
So that's one very architectural, very database-type feature that you will miss if you don't have it when the day comes.
Moving up to more AI-level things, the other thing we have is a huge pile of infrastructure for doing SQL and relational queries, right? In this system, that's separate from the vector stuff.
So when the vector stuff gets mixed with that stuff, things get very powerful and very magical.
And this gets you into, it's funny, because database people talk a certain way and AI people talk a certain way, and a lot of times they're actually saying the same thing, but they use none of the same words.
And so they don't know they're talking about the same thing. But as an example, in an AI context, things like metadata filtering or hybrid search, these are all things Rockset does out of the box.
Metadata filtering, in an AI or vector context, that's just the WHERE clause of a SQL query. That's all it is: where X is greater than this and the time is greater than that. So for us, that's all done. Metadata filtering is easy; that's not a hard problem at all.
We have a super powerful query language and a query optimizer. All you have to do is merge that with the ANN-style vector search, and metadata filtering is not a hard problem for us to solve,
the way it would be for others. And so I do think we really shine in situations where, A, you care about real-time ingest, and B, you care about any kind of hybrid search or metadata filtering.
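The point that metadata filtering is "just the WHERE clause" combined with vector search can be sketched in a few lines. This is a toy in-memory version; in Rockset it would be a single SQL query, whose exact syntax I won't guess at here, and all names below are illustrative:

```python
import numpy as np

def filtered_knn(query_vec, docs, predicate, k=3):
    """WHERE-style metadata filter first, then rank the survivors by
    Euclidean distance to the query vector and keep the top k.
    `docs` is a list of dicts holding an 'embedding' plus metadata fields."""
    survivors = [d for d in docs if predicate(d)]
    survivors.sort(key=lambda d: float(np.linalg.norm(np.asarray(d["embedding"]) - query_vec)))
    return survivors[:k]
```

A predicate like `lambda d: d["updated_at"] > cutoff` expresses exactly the "closest vectors updated in the last 10 minutes" query mentioned later in the conversation.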
Rockset's really good as well for raw vector power, but I wouldn't say we're the best database in the world for that. I view it more like we're going where our customers are taking us.
If customers came to me and said, hey, if I can have 10 times more vectors and 4% more precision and recall, if you implement this slightly better algorithm with these parameters, we would do it.
But almost always, what they want is that hybrid search. Hybrid search seems to be the king. It's merging these things. And that's where a lot of our effort has gone: into making the hybrid search story, making these two worlds work together fairly seamlessly.
Being able to say, show me the closest 10 vectors that were updated in the last 10 minutes: that kind of query is really powerful. And that's what we've been focused on. But I guess the timestamp example you gave, that's also a metadata check, right?
It's kind of a WHERE clause where you say between a and b timestamps. Yes. Yes. But hybrid search, at least the way I'm hearing people do this, is, let's take the search domain's example.
You might have a keyword search, right, which is your sparse index, and then you have your dense, semantic vector search. And you want to combine the two in some way. For example, you could say, I still trust keyword search, so let's give it 75% of the weight, and then 25% goes to vector.
And then you combine them with some merging strategy, and then you return the result to the user. Is this how you see hybrid search, or do you see it differently? So I have a whole rant here. You might, yes, you've unlocked my rant here.
So let's go. Hybrid search is one of these very overloaded terms, exactly as you've described. This is sometimes what people mean. Sometimes people smuggle in metadata filtering as hybrid search.
Strictly speaking, under my definition, metadata filtering is a kind of hybrid search. It just has extreme weights, right? It's weight one if it matches and zero if it doesn't. So it's kind of a weighted hybrid search.
You can also do this kind of linear-combination hybrid search, right? Like, I have a BM25 keyword-type ranker, which, by the way, Rockset can do. In Rockset you can write "order by keyword ranking, limit 10"; you could write that.
And then you can also do the vector "limit 10", like, show me the 10 closest vectors. There's nothing stopping you from saying, order by 0.25 of that plus 0.75 of that, for example, as in your example.
So that kind of linear-combination hybrid search is doable. That's how you can do that kind of hybrid search on Rockset today.
Now, people do slightly more advanced things than this, by the way. You can go beyond that in hybrid search and get into things like bi-encoding and cross-encoding, where you really do try to take the expanded vector space and treat it non-linearly, so it's no longer a linear combination of the two halves.
And this is something we are actively looking at.
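The linear-combination hybrid search just described (e.g. 75% keyword weight, 25% vector weight) can be sketched as a scoring function. A toy version; the min-max normalization step is my assumption to make the two score scales comparable, not something stated in the episode:

```python
def hybrid_score(keyword_scores, vector_scores, alpha=0.75):
    """Blend a BM25-style keyword score list with a vector-similarity score
    list: alpha weights the keyword side, (1 - alpha) the vector side.
    Both lists are min-max normalized first so the scales line up."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    kw, vec = norm(keyword_scores), norm(vector_scores)
    return [alpha * k + (1 - alpha) * v for k, v in zip(kw, vec)]
```

Setting alpha to 1.0 or 0.0 recovers pure keyword or pure vector ranking, and metadata filtering is the degenerate case Louis describes: weight one on a match, zero otherwise.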
I don't think it's hard to add. It's an easy extension onto the current system, but it's more of a science question. If you tell me what to add, I'll add it, sure, that's easy. But what do we add? What's the right cross-encoder? I don't know. That's a much harder problem; that's much more of a scientific question, in terms of: do I need to train an encoder for your particular use case? Is there such a thing as a good off-the-shelf one? So that's kind of where we're at with this. But in terms of adding that functionality, this is the frontier right now for us, for those people that are trying to go beyond the linear combination. Yeah. Maybe you can share some resources as well, for me and the audience to read. But also, I thought, you know, when the hybrid search topic emerged in the vector database world, Weaviate, Pinecone, Milvus and the like, I think one thing that was overlooked, and I really wanted to tap into it at some point, is how to learn the alpha, right? Because it's not a given: if you go with a linear combination, what should the alpha be for your data? Yeah, it's fun. This is where the search community has been doing things like this for a long time. Search people are quite familiar with this idea of: I have a semantic search system, I have a keyword ranking system, I have an alpha, and I learn that alpha and inject it into my system. And then they've gone even farther: search has this whole WAND idea, I don't know if people are familiar. So again, we have all these community terms, like Weak AND. Weak AND, exactly, right? Jo Bergum, listening to this podcast, will probably say, yeah, I know what you're talking about. Yeah, yeah. So it's funny, because the vector
community is sort of, I mean, it's not rehashing, it's not relearning, because it's got this new thing, this ANN thing, but it's got to drag ANN through the search history of these other kinds of ideas. So yes, learning the alpha parameter: this is not a particularly hard thing to do using Rockset, but it's not a thing we help you with. I don't have a button you push to automatically learn your alpha. You can send me whatever query you want; it can have whatever alpha in it you want. You can build whatever system, query us arbitrarily to generate an alpha however you'd like, and then send me the queries with the alpha you've derived. That's roughly how it's going to look today for us. Yeah. Do you think at all, maybe peeking a little bit into the future, for inspiration: do you think the industry will one day end up suggesting these values to its users, learning from the data, maybe even looking at how things behave, at interactions, what people click? Although, yeah, there is a risk of going too much into the application logic, which you probably do not want to do. My view is kind of like, once upon a time I had a similar feeling. This reminds me of a similar discussion that happened not that long ago, around feature stores. Database people looked at feature stores and were like, what do you need a feature store for? Just use a database to store features. And the reality is, most feature stores, that's what they are: they are databases that put a lot of things on top to help manage, as a first-class citizen, the lifetime of a feature, with orchestration platforms like Tecton and other orchestration systems. I think no matter what, there'll always be a database in there, and something like Rockset
will be in there. And the question of whether or not Rockset the company becomes a larger piece of software that has Rockset the database and some orchestration layers above it to help you do these kinds of things, that's a harder question. If you ask me to make a prediction about where things are going, my guess is that for the foreseeable future, hybrid search of some kind is king. Very few problems will be purely vector search; that's my guess. Almost all will be greatly benefited by some form of hybridization, even if it's just metadata filtering. And that means that the more advanced search techniques will slowly migrate over, which means things like alpha learning and Weak AND and all these other higher-level, two-stage-retrieval-type ideas that come from the search world. I do think they will come over and more and more influence the vector search world, because vector search ultimately is a form of search, so it shouldn't be surprising that most of these same ideas still apply. Yeah, for sure. There's this extreme example from Mark Cuban, in the episode of the Lex Fridman podcast that I just finished listening to. He says that probably, in the future, all of us will have our own LLMs, trained for whatever reason. For example, you want to do stock trading, so you start training your model, maybe on a specific subset of stocks or whatever, and then it will help you, it will augment you, as they say, as an entity. Yeah. I would love a ChatGPT that could, when making an email, sketch me a skeleton email in my voice. The ChatGPT voice, if I say, hey, write me an email to say this to somebody, it's not my voice, right? It's, I don't know, a little too corporate. My voice is a little bit more, yeah. So it'd be cool to have it learn my voice and be able to write me a skeleton of something that sounded like me. That would be awesome.
I'm there for that, yeah. What I would like is that some model, or whatever it is, would remind me that I forgot to drink water, something like that. So it learns my habits, and it knows what's bad for my health: remember to do this, remember to stand up, remember to walk, things like this. You know, I drank some water, that's good. Everyone drinks water. Yes, please do, because it's very healthy; you need to drink, I guess, two liters a day or whatever. Some people do forget this, and then they say, you know, I have to take pills or whatever. No, you don't, just drink water. But so, what else do you want to share about Rockset as an offering, as an AI enabler? Do you guys plan to support RAG, or do you think RAG is sort of a client-side thing as well, that people can do building on top of you, things like that? No, no, we actually have a bunch of RAG-style use cases on Rockset today, and I do think Rockset naturally supports RAG. But it's interesting; one of my open questions is that pure RAG, and I'm making up a term here, is actually one of the very few almost perfect vector use cases, in that it's pure vector search. But I'm actually not convinced, because even most of the people that we know who are doing RAG-style things are also doing some amount of boosting and/or metadata filtering to further augment, like hybrid-augment, the retrieval that augments the generation. So for example: hey, if the user asks about a certain thing, when you search for blurbs to augment the generation, boost the more recent ones, that kind of thing. There's this kind of logic that gets injected into these systems. So you can build this with Rockset today, and I'm quite keen on these kinds of use cases. I would say that, looking forward, I am quite interested in this emerging dynamic of where the real value is from here. There are at least three dimensions things could go. One is better and better ANN algorithms that squeeze more performance, more scale, more recall out of every byte of RAM and so forth. Another direction is incrementability: a lot of these really advanced, really strong ANN indexes are not easily updateable, so updateability destroys a lot of what you just worked really hard to build, or you spend way too much CPU to do it. So which is better? In real life, would I rather update twice as fast, or twice as painlessly, or would I rather get three and a half percent more on my precision-recall? And then the third dimension is how these things integrate with other indexes, right? Certain ANN indexes are much better at doing metadata filtering at scale than others, and so if there's more value in that than in the 3% I got over here, then... So it's not altogether clear. We are pretty heavily betting on the, well, I shouldn't say we're betting on it: right now we got the hybrid stuff relatively easily, so that's the thing we're building heavily, because all the hybridization was already there. And the incrementability, because that's core: for us the incrementability is not optional, you have to have that. I can't use an ANN index that requires overnight training; that doesn't work with Rockset, we're trying to be real-time. And then I guess there's one fourth dimension that could blow all this up, which is that somehow the vectors get so good that none of the rest of this matters. Maybe there is no RAG, maybe the vectors are just good enough, maybe the machine is smart enough that we don't need any of the rest of this, we don't need any hybrids. I think that's unlikely in the short and
medium term, but who knows in the long term; that would probably require some kind of singularity. Yes, it's a jump, right? Because that means that you do not need foundation models from Meta or whoever, right? You could train it from scratch, and if you can do it within a couple of minutes, then why would you bother taking those models? That's very interesting. That's why I said there's three, and then I threw the fourth one in, because it's not impossible, but I think it's not likely, not anytime soon. Exactly. I mean, if this were about to happen, then probably we would already see the signals of that, but today we can still see how these giants keep training the models, and they keep open-sourcing, sometimes in quotes, sometimes for real. But yeah, that's another topic to cover. I have a very practical question as well. For example, if I do have a model, and that model could be from Hugging Face, for example, so it's not mine, how do I bring the embeddings to Rockset? Can I leverage Rockset's infrastructure to compute the embeddings themselves? So the answer, in short, is no today, and it is super high on my list. If a customer came to me tomorrow and was like, hey, I want to run this model using your infrastructure over my data, I'd probably find a way to make that work for an existing customer, because that's a feature I want to build; I'm waiting for the excuse to build it. The problem for me is that it's just really hard to build generally. If it were just "call this API" or "support these exact kinds of models", it's not so hard, but doing it in general without a specific customer demand is a little bit trickier. So we can wait until that takes a little bit more shape. But we have the pieces in place: it's not hard for me to spin up a bunch of machines that run over your data and write to your database. It's the actual last mile of what code do I run, how do I secure that code, that kind of stuff, that's what's missing for us today. So today you have to give me the embeddings; you're going to have to compute them and put them in Rockset. But this is at the top of my list of features I want to build. Yeah, I mean, by the way, if you take databases today, probably you could divide them into two groups using this dimension specifically: whether or not you can compute embeddings inside. And sometimes you do not want that, because you want to fine-tune the model, and obviously the database wouldn't have access to it, unless there is a very easy way to plug it in, which I haven't seen, by the way; probably I'm missing something, but I haven't seen it. And everyone today has some sort of vector support, both the traditional databases as well as this new breed of vector databases. But that's interesting, that you guys are looking in that direction. What else? If someone in the audience wants to try Rockset today, do they need to pay right away, or can they have some free tier to play around? Oh, there's a free tier, yeah. So you can play around for free in Rockset, and if anybody is super interested and they have something interesting, they can always email us too. We will try to find a way to make that stuff work as much as possible, but yes, there is a free tier, and you can play around with it. And it is managed, so the one thing you have to understand about Rockset is that it's a managed service, right? You're not going to download it and run it or whatever; that's not the way it works. No, and by the way, that's exactly the advantage for businesses, right? And that's why we do have different business models, because at the end of the day you're not doing this only for fun, you really need to earn money too
for the company to grow and build more things for your users, and so that's an absolutely legit approach. Not everything needs to be open source; you chose it that way, but it's great that you have a free tier, and we can also link it in the show notes. Sure. What are you looking for, you know? You said you already have so many clients in different nations, different verticals. What else would you benefit from by sharing Rockset with a wider community through these podcasts? All right, there's a lot of ways to answer this question, but this is the vector crowd, right? So, selfishly, and I kind of already hinted at this, I'm trying to get a clearer sense of where the value is going to come from for vectors in the short and medium term. There are a lot of people out there, and we saw this: there's a million people going, oh my god, vectors are happening, how do I plug this into my business? Can I use this? And we've seen a bunch of interesting, super novel use cases, things you would not expect. You know, there's an insurance company that wants to scan internal documents; they want to do internal search, semantic search. And so my most selfish interest here is to really get a clear picture of which of these little subdomains is actually providing real value, what is really taking off. Sometimes it's hard to tell who's just messing around, because everyone's messing around, literally everyone, and who has actually latched on to something that's got some real legs. And every time we find a customer that's got real legs, we dig in, we're all in, we're like, all right, how can we help you? Again, I'm waiting for one of these people to come back and be like, can we retrain our embeddings? All right, yeah, let's go build it, right? So that's kind of my, yeah, I want people to keep messing around with this stuff. All of us messing around is going to find where it gets traction, where we can get our hooks in and where things start to really make progress, and then I just want to hear from those people. I want to know what you need. Every time we talk to someone, it's something new and surprising, right? And that's kind of it. Yeah, when the real world intersects with all this, you know, in my head it's all ANN indexes and graph theory or whatever, but when the real world intersects, it's always something simple that you need, that would make your life a lot easier, and that's the kind of stuff I'm eager to hear. Yeah, I could share one example with you, without saying who. One member of my team said, hey, we're using one search engine today which also has, beyond the sparse index, vector search support. And he was saying, okay, they're using the HNSW algorithm, but I cannot tweak the M parameter, and I forgot what the second parameter was, and because I cannot do that, recall is really below what it needs to be, it just doesn't work. And then he went online, it's an open source database, he filed the issue on GitHub, and they realized, oh, we missed a really important thing. So they quickly exposed the parameters, and now he can tune them, right? So yeah, the tuning of the index, that's a good one, right? A lot of these systems have tiers: there's a coarse grain and a fine grain, so you have HNSW over IVF, or just HNSW, or just IVF, and then each of these has parameters, and so you get these massive config strings that say how these are built. And we expose this, so you can do all this stuff, but in real life, if you're building this, what number do you even pick? How do you know? I don't know; that person must have gone through a
lot to decide they needed to change that, because it's not obvious. It's not like, oh yeah, you look at the data and see 16 is wrong. The infrastructure to optimize this system is not trivial, and then even if you do optimize it, you have to rerun everything, you have to rebuild that index once you've kind of trained it, so to speak. So yeah, I think that's a huge area where our infrastructure is not helpful at the moment. Yeah, but I'm sure you will learn. In general I'm excited, Louis, look, you have so much information that I think we should record another episode down the road, as you guys progress on the database and add all these interesting tweaks and knobs to the database. But I'm also super excited about the direction, because basically, if you take pure vector databases, they do not implement SQL support, right? Right. The purpose of their existence is something else; they've been designed to have vectors as first class citizens, and so they make it super easy to plug in a model, or actually have the model almost pulled from Hugging Face or some other model hub. But then when you want to do some facets, or whatever you want to call them, aggregations, right, that's not as easy. It depends on the database, probably, but I've seen some, I don't want to name them, and in any case they know it's a weak point, and it's probably because they do not want to serve that segment of the market. Maybe they do. That's partially right, but it's so hard. Yeah, exactly. I mean, I think the really good vector databases, the ones that succeed, will slowly turn into databases, and databases will turn into... these things are merging, they're just coming at each other from different directions. If you're building a vector database and you're looking at your metadata filtering support, you're like, I can't make this more powerful without just reinventing SQL; at some point I'm going to have to just build SQL. And so one day they're going to bite the bullet, and, well, maybe not SQL, but something SQL-complete, if you will, because you just need all that stuff. And then pretty soon you get into the problem of, hey, my metadata filter is the slow part of my thing, so now what? Now I'm doing query optimization, SQL query optimization; now I'm building query optimizers, metadata filter optimizers. And we brought all of that to the party, right? I have a cost-based optimizer for my SQL queries, so if your metadata filter does crazy stuff, I can do all kinds of SQL magic to optimize that query. But on the flip side, I think the good systems need all this stuff, so we took two hard problems and said, congratulations, this is now one hard problem. And it's like, okay, well, okay, it's a big hard problem. Yeah, I love how you model it, that databases and non-databases sort of will converge eventually, even though everyone at this point calls themselves a database, with probably minor exceptions. But still, you are spot on on, first of all, what is a database, and then whether or not you have all these features that need to be supported. And also, really importantly, the world is used to having SQL databases, right? I don't have a better analogy, but basically, if you develop something and say it can run but cannot walk, it's like, okay, but sometimes you need to walk, right? That's amazing. Before we close, I really like to ask this question. Some people find it a little awkward to answer, but I do feel it's important; it's a little bit philosophical. I ask: what drives you? It used to be "why do you do this", but basically, when you wake up, you know, you are
driven to continue, but what's inside that's spinning you? You've been through it, right? You've been doing this for so many years, also at Facebook at scale, but you want to continue to do it. So, the way I think about this is: there's a shiny problem at the heart of all this that I love, and if you let me, I will sit there and I will be happy. If I just come into work every day and look through the crashes and fix bugs, anything that's crashing, then look through the profiles and optimize code, I can just do this; this just makes me happy. Building reliable, scalable systems makes me happy. So there's this shiny problem in the middle of all this, and it's the common thread through everything, that I could just do and be happy, and it's rewarded and rewarding, right? So that's the basis: it's really easy to like this stuff. Then, obviously, you have to extend upon that. The way that you get driven beyond the shiny thing, because I could go do that for Minecraft mods, I don't have to do it for databases, is some larger mission that you feel connected to. For me, the mission here, the people were actually kind of the original driving force. These are the people; I don't even care what we're doing, let's go do it, us as a group, that's going to be fun. But then the whole AI thing, I mean, look, we can get philosophical; if you want to get philosophical, let's do it real quick. There are like two or three nominations for technologies that will change the 21st century, and you've got to work pretty hard to not put AI at the top of that list. Maybe there are some other ones you could argue: maybe nuclear fusion is a 21st century revolution, maybe gene editing, I don't know, you could come up with something. But chances are that AI is going to be a defining 21st century technology. So, you're going to let me play with my shiny toys in that? Yeah, okay, I'm out of bed now, right? I'll get out of bed, I'll come in, and let's go build something. I think that's my answer to your question. Amazing. And I think you've got it, the passion and the knowledge, and I saw the movement, so I'm really excited to see what you guys get built. Thank you so much for joining me today to discuss. Yes, we didn't go into the nitty-gritty of tuning this or that algorithm and how exactly the algorithms work, but hey, I really enjoyed the product level; this is, as some at my company say, on the money. Fantastic, thank you so much, Louis, enjoy your day, and let's talk soon. Awesome, thank you for having me, and yeah, happy to chat again. All right, cheers, bye-bye.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md b/transcripts/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md
new file mode 100644
index 0000000..6ac8998
--- /dev/null
+++ b/transcripts/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md
@@ -0,0 +1,310 @@
---
description: '

Topics:

00:00 Introduction

01:12 Malte’s background

07:58 NLP crossing paths with Search

11:20 Product discovery: early stage repetitive use cases pre-dating Haystack

16:25 Acyclic directed graph for modeling a complex search pipeline

18:22 Early integrations with Vector Databases

20:09 Aha!-use case in Haystack

23:23 Capabilities of Haystack today

30:11 Deepset Cloud: end-to-end deployment, experiment tracking, observability, evaluation, debugging and communicating with stakeholders

39:00 Examples of value for the end-users of Deepset Cloud

46:00 Success metrics

50:35 Where Haystack is taking us beyond MLOps for search experimentation

57:13 Haystack as a smart assistant to guide experiments

1:02:49 Multimodality

1:05:53 Future of the Vector Search / NLP field: large language models

1:15:13 Incorporating knowledge into Language Models & an Open NLP Meetup on this topic

1:16:25 The magical question of WHY

1:23:47 Announcements from Malte

Show notes:

- Haystack: https://github.com/deepset-ai/haystack/

- Deepset Cloud: https://www.deepset.ai/deepset-cloud

- Tutorial: Build Your First QA System: https://haystack.deepset.ai/tutorials/v0.5.0/first-qa-system

- Open NLP Meetup on Sep 29th (Nils Reimers talking about “Incorporating New Knowledge Into LMs”): https://www.meetup.com/open-nlp-meetup/events/287159377/

- Atlas Paper (Few shot learning with retrieval augmented large language models): https://arxiv.org/abs/2208.03299

- Tweet from Patrick Lewis: https://twitter.com/PSH_Lewis/status/1556642671569125378

- Zero click search: https://www.searchmetrics.com/glossary/zero-click-searches/

Very large LMs:

- 540B PaLM by Google: https://lnkd.in/eajsjCMr

- 11B Atlas by Meta: https://lnkd.in/eENzNkrG

- 20B AlexaTM by Amazon: https://lnkd.in/eyBaZDTy

- Players in Vector Search: https://www.youtube.com/watch?v=8IOpgmXf5r8 https://dmitry-kan.medium.com/players-in-vector-search-video-2fd390d00d6

- Click Residual: A Query Success Metric: https://observer.wunderwood.org/2022/08/08/click-residual-a-query-success-metric/

- Tutorials and papers around incorporating Knowledge into Language Models: https://cs.stanford.edu/people/cgzhu/

Podcast design: Saurabh Rai https://twitter.com/srvbhr

'
image_url: https://media.rss.com/vector-podcast/20220830_070827_46ba9c40226c9b5c8e39886c99b0aea3.jpg
pub_date: Tue, 30 Aug 2022 07:27:26 GMT
title: Malte Pietsch - CTO, Deepset - Passion in NLP and bridging the academia-industry gap with Haystack
url: https://rss.com/podcasts/vector-podcast/599924
---

Hello there, Vector Podcast. Season 2: we are relaunching after summer, and there was a little bit of a break. The last episode was from Berlin Buzzwords, and today, coincidentally, we have a guest from Berlin: Malte Pietsch, the CTO of Deepset, the company behind Haystack.
So we're going to be diving into what I call a neural search framework, though I wonder if Malte would give a different picture there; I'm still very interested to learn and dive into multiple topics. Hey, Malte, how are you doing? I'm good, doing great. Thanks for having me today.
How are you doing? I'm good. I'm great. It's still summer. It's super hot, as we were saying before the recording, but I like it.
So yeah, before we dive into what Haystack is, I'd really like to learn about you: what is your background, and how did you find yourself in this space of what we call Vector Search? I wonder if you describe it differently, but I call it Vector Search, the Vector Search players.
Can you tell a bit about that? Yeah, sure, happy to. I would say my background is mostly in what I would probably call NLP engineering these days. During my studies, I basically had no clue about NLP; I don't think it was really part of our coursework or really a thing.
For me, it all started basically after my studies, when I went to a research project in the US, which was at the intersection of machine learning and healthcare. The big focus there was on numerical data.
So we were basically trying to find signals and patterns in laboratory measurements for kidney disease patients to predict some kinds of risks.
And it was all this kind of numerical data.
NLP wasn't really in the scope of that project, but there was, for me, basically one event that got me in touch with NLP and eventually made me fall in love with it.
In this project, we tried to predict a lot of these risk factors through a lot of, I would say, quite fancy modeling to get some good signals. And in the end, it kind of worked.
We were able to predict some risks, but when we then talked to doctors and showed them these results, or asked for their feedback, they said, yeah, that's all correct, but it's not really new; we knew that before. But this part here, this is an interesting one.
And that part was basically the only small part where we looked at the written notes of doctors during treatments. From a modeling perspective, that was really, I would say, nothing fancy, nothing advanced, nothing we spent a lot of time on.
But in the end, it was the point, I think, where the physicians saw the biggest value. And that got me thinking: this kind of data source was something they couldn't really access before.
And now, with these very simple, naive methods, they somehow saw value, a new thing. And that's basically where I thought, oh, that's cool.
What can you actually do with more advanced methods? If you have fancier models, how can you make this kind of unused data source accessible? And yeah, basically realizing the power of it.
That's basically when I started digging deeper, working more on NLP. At some point I then left research, because I was really interested in seeing these models working in the real world.
How do they work at scale? How can they really solve problems every day?
And basically, I came back to Germany and worked in a couple of startups, always at this, let's say, NLP-at-scale intersection, a lot in online advertising and recommender systems.
And then eventually, four years ago, together with two colleagues, we founded Deepset, basically because we saw this big momentum piling up.
This was still pre-transformers, but there were early signs, I think, in research that things were becoming more feasible and that super interesting things were becoming possible.
At the same time, we also saw that there's this big gap: things becoming possible on the research side didn't really mean people were using them in production in the industry. And I think we were in this interesting bubble back then.
We had applied deep learning models at scale, saw how that worked, but also saw how much manual work it actually takes to get it done.
Basically, the early days of Deepset were mainly around: how can we bridge that gap, how can we get the latest models from research into production in the industry, what kind of product and tooling can we build to make that transition easier?
Yeah, and that's basically how we ended up in the startup world, building out Deepset. Initially, it was really more that we saw this problem. We had a couple of product hypotheses, but we didn't, let's say, place a bet directly on one of them.
We rather said, okay, let's go out there, let's really try to understand for one year what the really repetitive use cases out there are, what the pain points of enterprise teams working in that field really are, and then settle on a product and build it out.
Yeah, that's basically how, after one year, we ended up in search.
And of course, I would say the one use case, the dominant use case, was present in every company we worked with, and it was really a, let's say, big, valuable use case, where the push came not only from the developers who wanted to do something better, but actually also from the business side, where people saw big value in it: hey, I use Google every day, why can't we have something similar in our product or on our internal data sets? And that was something that got us really interested.
At the same time, on the tech side, we were learning more and more about the pain points: why is it actually so difficult for people in these enterprises to build modern search systems, and what could you actually do to help them? Yeah, that's fascinating.
Actually, four or five years ago, could you have imagined that NLP would cross paths with search? Because in many ways, in this sparse search world, which existed for many, many years before, I sense it that way, in mailing lists, let's say the Apache Solr mailing list, people were dreaming about applying NLP in some way, compared to what is happening right now.
I don't want to downplay those efforts, but I'm talking about things like: you could embed a part-of-speech tag at the term level, and then use that during search. Again, you need to run some kind of parser on the query, and then use that payload information to filter through, let's say, adjectives and verbs or something. I don't know if there was any practical application in place; probably there was.
But again, if you compare that to what is happening today, you basically have a vast array of models, right, deep learning models, that can be applied directly to search using the vector search approach. Could you have imagined this happening when you were about to start the company?
No, I would say, I think we had big, let's say, big dreams about NLP
and we were true believers that things would become easier and, let's say, more feasible in production, but for us that was actually more on the, I would say, transfer learning side, making models more easily adaptable to certain domains. Search, I think, was not it for us at first.
It was only on our journey that we kind of realized, oh, these are actually two interesting different fields connecting over time, right? And also, at least from my perspective, from the community side, from the people who worked on information retrieval,
I think for a long time there were a lot of skeptical people who wouldn't be talking about ANN or dense retrieval, for good reason, right? Because there was also a lot of hype around deep learning, and a lot of promises were made, like that it would just outperform sparse retrieval out of the box.
And I think many of these promises were not kept for a long time.
But then basically there was another phase where people realized, oh, actually, now it's starting to work, and not only in research and these ivory towers and lab settings, but actually also in reality, at scale.
And I think that was also the moment where it got really interesting, and since then it's just crazy to see how things are progressing, when thinking about multimodal search, or now, let's say, going away from document retrieval to maybe something like question answering, which we do a lot.
It's really crazy to see what's possible these days, and I couldn't have imagined that it would go so fast. Yeah, and there are a lot of contributors as well, of course. I just happened to give a talk about players in vector search.
I will link it in the show notes; it was just published with Sease's London IR Meetup. But even during that presentation, I felt like I was scratching the tip of the iceberg; I know there is so much happening.
And for Haystack, did you have a vision for the product? Like you said, you didn't know what the product would be, but you knew the repetitive use cases, in a way, right, and also the challenges. Can you share some of the early-day challenges that you saw?
And do you think they are solved today, or are they still kind of in the mix of, we need to fix some things there? So I think that was basically what that first year of Deepset was all about, where we did these learnings, where it wasn't that clear.
But after that year, I think we had a lot of clear insights and, at least for us, a clear vision for Haystack and what we wanted to solve there.
And I would say the big challenge, the big problem, that we focused on, that we saw in the industry, was having all these great technologies out there, and
Haystack basically has, as a design philosophy, two principles in place that try to bring these technologies together in a meaningful way.
What I mean by that is basically: if you think about search, it's really a lot more than a model, right? Typically you have vector databases,
and you maybe chain together multiple models; there's something you want to do at indexing time, and other things you want to do at query time.
And for each of these components that you need in the end, there are so many different options you can plug in, and often it's hard to decide in the early days:
do I go for Elasticsearch or something like Pinecone, a vector database? Do I go for this model or that model? Do I need, I don't know, just a retriever in my pipeline, or do I actually also need to add a re-ranker or something else?
And we just saw that teams were actually spending a lot of time on gluing these things together manually.
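One way to picture the building-block idea described here is a single retriever interface with swappable implementations behind it. The classes and scoring tricks below are a toy sketch with made-up names, not Haystack's actual API:

```python
# Toy sketch of the "building blocks" idea: two retrievers behind one
# interface, so the caller can swap implementations without other changes.
# All names and the toy scoring logic are invented for illustration.
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[str]: ...

class KeywordRetriever:
    """Stand-in for a BM25/Elasticsearch-style retriever."""
    def __init__(self, docs: list[str]):
        self.docs = docs
    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        terms = set(query.lower().split())
        # Rank by shared whole words with the query.
        scored = sorted(self.docs,
                        key=lambda d: len(terms & set(d.lower().split())),
                        reverse=True)
        return scored[:top_k]

class DenseRetriever:
    """Stand-in for an embedding retriever with a vector DB behind it."""
    def __init__(self, docs: list[str]):
        self.docs = docs
    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        # Toy "embedding": character-bigram overlap instead of a real model.
        def grams(s: str) -> set[str]:
            return {s[i:i + 2] for i in range(len(s) - 1)}
        q = grams(query.lower())
        scored = sorted(self.docs,
                        key=lambda d: len(q & grams(d.lower())),
                        reverse=True)
        return scored[:top_k]

def search(retriever: Retriever, query: str) -> list[str]:
    # Caller code is identical no matter which retriever is plugged in.
    return retriever.retrieve(query, top_k=2)

docs = ["haystack builds search pipelines",
        "vector databases store embeddings"]
hits = search(KeywordRetriever(docs), "search pipelines")
dense_hits = search(DenseRetriever(docs), "store embeddings")
```

Swapping Elasticsearch for Pinecone or Weaviate then becomes a one-line change at construction time, which is the pain point the interview describes.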
And even when they had it set up once, there was constant maintenance work, or iterations where they had to exchange one component of the system. And that was really slowing them down a lot, and sometimes even caused a project to get sidetracked over time, not really ending up in production but kind of dying at the prototyping stage, because it just took so long. With Haystack we basically tried to solve that by having very clear building blocks, like, for example, the retriever, which has a very clean interface. Within that you can swap a lot of different technologies and models, and the same for vector databases and document stores, where you can very easily change between something like Elasticsearch, Pinecone, Weaviate and whatnot. So I would say that was the one thing: these building blocks, trying to get the focus of developers back on making the creative decisions about what they actually want in their pipeline, trying it out with end users, rather than spending time gluing things together. The second thing, I would say, is also a very deep concept in Haystack: pipelines. Really, what we saw is that it's not just one model; it's typically a couple of steps that you want to have there. So in Haystack we started early on with directed acyclic graphs, where you can have different nodes, and when a query, or at indexing time a file, hits the pipeline, you can route it through this graph. That can be very simple: there is a query, I put it to a retriever, and I get back my documents. Or it can get quite complex, where you say: depending on the query type, if it's a keyword query I route it along a certain path in my graph, my pipeline, and if it's a question, maybe I go a different way and have different models involved in my search request. These two were the core principles in Haystack. That's very interesting.
So that second thing, the directed acyclic graph, allows for very complex scenarios, right. Like you explained, we could in principle support a question answering use case side by side with normal search, with retrievers, re-rankers and so on, right. Is that correct? Exactly. That's what we basically learned from customers: we saw there was big interest in something like question answering, and people said, wow, that's amazing, can we use that for our website or for our product here? But doing that switch in a production setting is quite tough, right. If people are used to keyword queries, they know: I have to enter a few keywords to get my results. And then from one day to the other you switch to more semantic queries, maybe more questions, or, with dense retrieval, really full sentences. It takes some time for people to adjust, and we saw in a couple of scenarios that the traffic, the requests that come in, start out mostly as keyword queries and then over time slowly shift towards more semantic queries, when people realize: oh, I can actually also ask a question, just like on Google. That's the trend, but you need your system to allow both for a certain time, and Haystack does that with the query classifier, where you can classify: is this a question or a keyword query? Or you could go more semantic, on a topic level, saying: this is a query for a certain category in my document set, and then maybe do something different. And early on, did Haystack integrate with any database per se? Was it Elasticsearch back then? Yeah, Elasticsearch was the starting point, the very first document store we had. But Elasticsearch back then didn't, I believe, support neural search, right? So how did you actually glue these things together?
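The routing idea described above can be sketched in a few lines of plain Python. Note this is an illustrative toy, not Haystack's actual API: the node names, the naive question heuristic, and the stub retrievers are all assumptions made up for this sketch.

```python
# Minimal sketch of a DAG-style search pipeline with query routing.
# All names here are illustrative, not Haystack's real API.

def classify_query(query: str) -> str:
    # Naive stand-in for a trained query classifier:
    # treat anything that looks like a question as "question".
    question_words = ("who", "what", "when", "where", "why", "how")
    if query.endswith("?") or query.lower().startswith(question_words):
        return "question"
    return "keyword"

def keyword_retriever(query: str) -> list[str]:
    return [f"keyword-match for: {query}"]      # stand-in for BM25 results

def dense_retriever(query: str) -> list[str]:
    return [f"semantic-match for: {query}"]     # stand-in for embedding results

def reader(query: str, docs: list[str]) -> str:
    return f"answer extracted from {docs[0]}"   # stand-in for an extractive QA model

def run_pipeline(query: str) -> dict:
    # Route the query through a different branch of the graph per query type.
    if classify_query(query) == "question":
        docs = dense_retriever(query)
        return {"route": "question", "answer": reader(query, docs)}
    docs = keyword_retriever(query)
    return {"route": "keyword", "docs": docs}

print(run_pipeline("revenue forecast 2023")["route"])    # keyword
print(run_pipeline("How will revenue evolve?")["route"]) # question
```

A real query classifier would of course be a trained model, but the branching structure of the graph is the same.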
Yeah, that was just coming in over time. It was the era where, for us, Elasticsearch was really central. We came a lot from question answering use cases, and the question was really: how do we scale that? How can we ask questions not on a single document, a single small passage, but actually on millions of files? And BM25 worked as a retriever step before the reader; it was okay, not too bad, and that's how it started. Then it evolved very fast in a vector search direction, where we had FAISS as a next document store, in combination with some SQL database for the metadata and so on. And then it really took off on the vector database side, with Milvus, Weaviate, Pinecone and so on and so forth; OpenSearch today is also part of the stack. But that was, I think, just half a year after we launched Haystack. Oh yeah, that's awesome, that sounds quite quick. I know Weaviate was also emerging about the same time, and then Milvus as well, I guess. That sounds super cool. And as you were approaching your clients, or prospects, was there any specific use case you would demo because you knew it would trigger the aha moment, like question answering, or maybe a specific domain where you did that? Yeah, for us it was a lot around question answering back then; that really created many of these aha moments. I remember we were at one client, in a meeting, and it was in the financial domain. They were interested in asking questions on financial reports of certain companies, basically accelerating their analysis.
And at one point in this meeting we showed what you can do with question answering, asked these questions, and they also suggested their own questions that we should ask, and they worked. So at that point they were convinced: oh, this is not fake, not smoke and mirrors. And the boss of the department stood up, shouting, wow, that's amazing, went out of the office to the office next door, called over colleagues and said: you have to see this. That was actually even before we started building Haystack, but these kinds of moments were very important, to see that this is something that is not just fascinating for techies like us; business people and users also see the value, see value in it for their work. I can imagine that, and it's a whole class of what we call knowledge workers, right. It's something you spend so much time on, crafting these queries. I spent some time in full-text finance search, at AlphaSense, and I remember some of the clients had accumulated Boolean queries over a period of 20 years, and they were so long, several pages. When you slap that into Solr it runs for three minutes, because our index layout was not what it is today and was not very optimal. It's crazy to see what people start doing as workarounds. We had a similar case with an airplane manufacturer; it was not the financial domain but more on the maintenance level, analyzing issues that come up in certain technical areas, and they also had these crazy Boolean search queries. People just became experts at crafting them, but it took them really long; creating and sending one query was easily taking minutes.
Yeah, exactly. So, what is Haystack today? Can you elaborate a bit on the architecture, and maybe, if you find it easy, pick a specific use case? Recently I was talking to a stakeholder who wanted to build a chatbot, but in a very specific domain: that chatbot would ask you some kind of philosophical question. Difficult questions, a little bit distracting you from what's going on. Let's say you are at a conference and a lot of things go through your mind, but you don't register what's happening, you don't see the value, and that Zen bot might ask you something and essentially allow you to pause and reflect, right. What I realized is that, yeah, I could pick an off-the-shelf model, say a question answering BERT or something, but it probably wouldn't work on what I want; my domain is different, and I had an electronic book with these Zen-type statements. So the question I'm hinting at is about fine-tuning, or maybe even retraining, but where would I start with Haystack? Can you walk me through the architecture? So, as mentioned earlier, the two core principles are these building blocks and using the building blocks to assemble pipelines. The core we come from is question answering and search, but by now the framework has evolved a lot beyond that: we have a lot of different nodes and can support a lot of different use cases, from translation to zero-shot classification. You can use these nodes in isolation, or you can assemble them and use them within your search pipeline. Usually, what our users do, how they start, is they come with some kind of search use case and pick one of the standard pipelines that we have, so with a few lines of Python you create a pipeline for, say, question answering or maybe dense retrieval.
You pick a document store, you pick one model, for example from the Hugging Face model hub, and we give some recommendations on which models might be a good starting point. Then it's very easy to just put your files into a pipeline; they can be PDF files, we do the conversion for you, there's a node for it. So you have a basic demo system up and running in a few minutes, and that's often already a good starting point if you're new to the field, or if you just want to try it out quickly on the kind of ebooks you mentioned, and get a first understanding of how good off-the-shelf pipelines are for your use case, get that first data point. Then come the typical next steps in a project, if you see this is promising but not enough to really go to production. Then you typically go more into experimentation mode: let's evaluate and compare a couple of different models, let's adjust this pipeline a bit, add a re-ranker maybe, or go to a hybrid retriever pipeline, where you basically have a BM25 retriever in parallel to a dense retriever and join the documents. Haystack has a lot of functionality that makes it easy to change a pipeline as you want, very quickly, and then evaluate whether that gives you any benefit. If these off-the-shelf options and combinations are not enough for your use case, then yeah, you can go down the fine-tuning route. We also have an open source annotation tool, a labeling tool, where you can create training data and fine-tune parts of your pipeline, the retriever or the reader for question answering. So basically everything from a quick prototype, to running some experiments here and there, to going to production and deploying it with a basic REST API.
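The hybrid retriever idea mentioned above (a sparse keyword retriever running in parallel to a dense retriever, with the results joined) can be sketched with toy scoring functions. Everything here is a deliberately simplified assumption: crude keyword overlap stands in for BM25, and a precomputed similarity dict stands in for embedding cosine similarity.

```python
# Toy sketch of hybrid retrieval: sparse + dense scores fused by weighted sum.

def sparse_scores(query, docs):
    # crude keyword-overlap score as a stand-in for BM25
    q = set(query.lower().split())
    return {d: len(q & set(d.lower().split())) / len(q) for d in docs}

def dense_scores(query, docs, sim):
    # `sim` stands in for cosine similarity of precomputed embeddings
    return {d: sim.get(d, 0.0) for d in docs}

def join(query, docs, sim, alpha=0.5):
    # fuse both ranked lists with a linear combination of scores
    s, d = sparse_scores(query, docs), dense_scores(query, docs, sim)
    fused = {doc: alpha * s[doc] + (1 - alpha) * d[doc] for doc in docs}
    return sorted(fused, key=fused.get, reverse=True)

docs = ["revenue increased by 12%", "the weather was sunny"]
fake_sim = {"revenue increased by 12%": 0.9, "the weather was sunny": 0.1}
ranking = join("how did revenue evolve", docs, fake_sim)
print(ranking[0])  # the revenue document wins on both signals
```

Real systems often use other join strategies (e.g. reciprocal rank fusion), but the parallel-then-join pipeline shape is the point here.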
Sounds cool. So in that experimentation mode, I guess one aspect is the fine-tuning you mentioned, and the other is which building blocks I could plug in, right. I know you guys have really good documentation; is there something like a tutorial or walkthrough that would help me, as a user, discover what the options are? So we have a couple of different tutorials showing you which kinds of nodes you can use. Many people are not aware of, for example, options at indexing time that might be helpful. For instance, enriching your documents with metadata can be incredibly powerful later at search time, because you can then filter your search space down to the categories you're interested in. And there we have tutorials that show how easily you can classify documents at indexing time into certain categories and then, later on at query time, use these categories to narrow down your search space, filter for these categories. On the model side, say you know you want a QA model, a reader, and you're wondering which model to pick: I would probably suggest you just go to our benchmarks page, which is linked from the documentation; there we have a couple of comparisons in terms of accuracy and speed. We also have most of our own models on the Hugging Face model hub, where you can find this information in the model cards. Yeah, that's awesome. So in addition to the open source version, which I presume I could host completely myself, and I still have a bunch of questions on that open source side, you also offer a cloud version, you call it Deepset Cloud, right? Can you explain what users get with that? I presume scalability, but maybe something else. And I think we can leave a link in the show notes as well for those who want to try it out.
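The index-time enrichment plus query-time filtering pattern described above can be sketched in plain Python. The keyword-based "classifier", the document-store layout, and the filter syntax are all hypothetical stand-ins, not any framework's real API.

```python
# Sketch: enrich documents with a category at indexing time,
# then filter on that category at query time.

def classify(doc: str) -> str:
    # stand-in for a real document classifier model
    return "finance" if any(w in doc.lower() for w in ("revenue", "profit")) else "other"

def index(raw_docs):
    # attach a metadata field to each document while indexing
    return [{"content": d, "meta": {"category": classify(d)}} for d in raw_docs]

def search(store, query, filters=None):
    hits = [d for d in store if query.lower() in d["content"].lower()]
    if filters:
        # narrow the search space to documents matching all filter fields
        hits = [d for d in hits if all(d["meta"].get(k) == v for k, v in filters.items())]
    return hits

store = index([
    "Revenue grew in Q3",
    "Q3 bird migration report",
    "Profit warning issued",
])

print(len(search(store, "Q3")))                                  # 2 matches
print(len(search(store, "Q3", filters={"category": "finance"})))  # 1 match
```

The payoff is exactly the one described in the interview: the same keyword query returns fewer, more relevant hits once the metadata filter narrows the space.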
Yeah. Haystack, the open source project, is a Python framework, and you can do everything you want there: prototype, run experiments and, if you want, also go to production with it. But we also found that, in addition to that, people want something more: they want a hosted platform that is really end to end, with faster workflows, covering the whole lifecycle of an application, from early prototyping to running many experiments in parallel, getting more guidance on what to work on, investigating documents in a faster way. And then: okay, now I did all these experiments, and I want a kind of one-click path to production, and I don't want to bother with any scaling or productionizing on my side. This is basically what we do with Deepset Cloud. Imagine a hosted platform, a SaaS platform, where you develop your NLP applications and can easily bring them to production and monitor them afterwards. So really the whole lifecycle, and especially getting your NLP pipelines to production faster than you would at a pure Python level, then continuing to monitor them and closing the loop when you later maintain them. Sounds cool. So with the open source version I presume I could do local development on my PC and then use some deployment pipeline to deploy, and with the cloud version I have sort of a managed Haystack, right. And now, thinking about developer experience: are you moving more towards cloud tools as well? For example, a code editor could be in the cloud; I make the changes, click the button, and off it goes, I don't even need to download anything locally. Or do you see some other trend with your users?
Maybe that's an important point: it's still a developer platform, right, so we are not in the low-code/no-code space. What we really try to do is give developers the option to customize components, and that goes through coding. There we have, for example, editors directly on the platform where you can edit, say, the YAML definition of pipelines and quickly switch certain parameters if you want. Then there are hosted notebooks where you can easily open these resources, like a pipeline, and we automatically generate some Python code for it in a notebook that you can then edit, as you know it from Haystack open source: adjust a certain component, debug it, maybe add another one. And then it's basically just one Python line again to move from the Python code in your notebook back to the production artifact, the pipeline that is deployed and can run in production. Yeah, sounds cool. And if a user, and by user I mean it could be a company, has an established toolset, maybe they use SageMaker, maybe they use something else: how do you relate to those toolsets outside of Haystack? Do you have to integrate? I would say in most cases not. Where we basically stop with Deepset Cloud is: you have your pipeline as an NLP service, and you have your REST API that you expose; that's where we stop. There's a lot of stuff in a company built around that, integrating it into your product, and also on the other side: where do the files come from, where does the data come from, how do you bring it into Deepset Cloud. But within that space we rather see customers who appreciate that it's fully integrated, and they usually don't want to stay on SageMaker, if they are on it, for these NLP use cases. So, from our perspective:
Those are more generic solutions that are not specific to NLP; they work for any kind of machine learning. But if you really have cases where you want to be faster on your NLP use cases, and want more support on that side, that's where Deepset Cloud comes into play. To give you an example: think of experiments to evaluate these pipelines. You want a lot of options to investigate predictions and what these metrics actually say, and that is usually missing in solutions like SageMaker; you have to combine them with many other tools and build a lot of extra stuff in there. That all comes together already with Deepset Cloud. Got it. So Deepset Cloud will offer me sort of an evaluation toolset, right. Can I get the same in the open source version, or is it not present there? You can evaluate single pipelines in the open source version as well. The difference is that in Deepset Cloud you have a full overview over your project, where we track all your experiments so you can compare them, and easily launch 20 experiments in parallel, even on large data sets; with open source you would generally need to provision a lot of machines, GPUs, to run that in parallel. That's one thing we offer in Deepset Cloud, and the other is, I would say, the UI layer on top. Of course I can work with Haystack and get a report on my experiments, maybe a pandas DataFrame, I get some metrics. What we add in the UI on top, in Deepset Cloud, is letting people interact with this kind of data more easily: finding examples of queries that fail, or that are successful, getting feedback from end users as well, so collaborating with the people who actually use the search system in the end.
And that's also something we saw a lot: yeah, you can extract your predictions, maybe as a CSV, and then you share it with a colleague who rates them, gives a human evaluation of whether these predictions make sense or not. But again, that's a lot of friction; you have a lot of these CSVs, these Excel files, floating around. What we do is bring this together, having it in one place so that you can easily reuse it for other experiments in the future, and even use it for training, all in one central place. Yeah, sounds amazing. From what I gather, this sounds like an end-to-end MLOps platform specifically for NLP and neural search, right. Exactly. You have thought through so many things, not only the developer side like experimentation, but also debugging, and actually going through the feedback from stakeholders or users, and then communicating with them. Yeah, and I think this is something that is missed in many projects, this end-user collaboration, and from our experience it should really happen at a very early stage of a project, and then continuously as you move to production, and even when you are in production. And if you don't have the right tooling, it's very annoying: just building a demo, a UI for some search system, if you are not a frontend developer, if you're an NLP engineer, takes extra time. Even with something like Streamlit these days it's still annoying to do properly, and if you're in an enterprise you maybe also have to sort out access, permissions, passwords. But it's so important, I think, when you look at which projects work out in the end, which of our customers' pipelines go to production.
It's really a big criterion, I think, in the early days: sharing a demo with your colleagues and end users, really with the first pipeline you have, more or less giving it into the hands of users and seeing what they think about it and how they use it. There were so many examples where NLP engineers thought they knew what people were searching for, but after these demo sessions, after sharing it and seeing what people actually do, they realized: oh, they use a lot of keyword queries, or they never put a question mark at the end, or they have a lot of misspellings, whatever else. So there are a lot of early learnings you can make as a developer from these demos. And on the other side, creating this early aha moment, this wow effect, and some trust on the end-user side is also crucial. So I would say that's the first point: a very early demo, getting this initial feedback. And probably the second point we see often is when you've spent some time running your experiments and tuning your pipeline on the way to production: at some point there's a second phase where you again do some manual evaluation with end users, not relying completely on machine learning metrics. Because we think there's some kind of metric blindness in the industry; sometimes you just get obsessed with the one metric you optimize in these experiments, whatever it is, just increasing it from experiment to experiment. Then you go to production and you realize: wow, okay, this metric doesn't say anything about the user satisfaction I get in the end. And there are so many examples from our customers where just handing out the pipeline, showing search queries and results, and collecting some easy thumbs-up/thumbs-down feedback made a difference.
Then you try to correlate: is that really what we also saw in our experiments, in our metrics? And in many cases either the pipeline was not yet ready for production, it was far less accurate than they thought, or it was the other way around, where teams thought they were stuck: we will never get beyond, say, an F1 score of 60%, we can't ship this, it's not working. Then they handed out these predictions, or gave out the demo, and people actually noted: these predictions are perfectly fine. When you dig deeper, I think it's often that engineers don't look enough into the data and just rely on this high-level metric. And especially nowadays these metrics only tell part of the story, because for question answering, and also for search, if you have a relevance data set and, let's say, you always label the exact answer for a certain question or query, there are just so many ways to give a correct answer that is different from this label. To give an example: we have many customers in the financial domain, and a typical question there is: how will revenue evolve next year? Maybe in your evaluation data set you labeled "it will increase by 12%". Now at prediction time your model maybe finds another passage, or generates the answer, and says "it will significantly increase". So there's barely any overlap on the lexical side, but both answers make sense and are correct, and we can probably debate which one is more accurate. In many cases they basically give the same answer semantically; they are just formulated very differently, and that's where traditional metrics fail. So yeah, we need better metrics, and we did some research work on that; it's also part of Haystack, where you can use something like semantic answer similarity as a metric.
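The failure mode described here is easy to demonstrate with the standard SQuAD-style lexical metrics. The sketch below (my own minimal re-implementation, not deepset's code) scores the interview's example: exact match calls the prediction a total miss, and token-level F1 gives it only partial credit, even though the two answers are semantically equivalent. A semantic answer similarity metric would instead embed both answers with a language model and compare them, which needs an external model and is therefore only noted in a comment.

```python
import re
from collections import Counter

def tokens(text):
    # crude SQuAD-style normalization: lowercase, keep word chars (and a trailing %)
    return re.findall(r"\w+%?", text.lower())

def exact_match(pred, gold):
    return float(tokens(pred) == tokens(gold))

def token_f1(pred, gold):
    p, g = Counter(tokens(pred)), Counter(tokens(gold))
    overlap = sum((p & g).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)

gold = "It will increase by 12%"
pred = "It will significantly increase"   # semantically fine, lexically different

print(exact_match(pred, gold))            # 0.0 -- exact match sees a total miss
print(round(token_f1(pred, gold), 2))     # 0.67 -- partial lexical credit only
# A semantic answer similarity metric would score this pair near 1.0,
# but requires an embedding/cross-encoder model rather than string overlap.
```

This is why looking at the raw predictions matters: a pipeline can look stuck at a mediocre F1 while its answers are perfectly acceptable to users.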
But of course it's also just about looking at your data, looking at these predictions, and seeing whether they are really wrong or actually okay, and whether it's maybe a problem of the metric or of your labeling process, where you may need to collect more of the different answers that are acceptable. Yeah, I totally agree. It's the challenge of intersecting user language with whatever machinery you have to answer it, be it sparse search, be it dense search, it doesn't matter. Users don't care; what they care about is that their language is understood, and often enough it's not. Especially around things like BERT: if we go dense, a BERT model doesn't understand negations, right, there was a research paper on that, and that might actually cause harm. There was even a Google example showing the opposite: you say "I don't want that", but the system says "yes, you actually do, take that medicine", which might be harmful. And then the metrics: essentially, from what you just described, you might have offline metrics, let's say NDCG or precision or recall, whatever, and then you have online metrics. And actually crafting the online metrics is also an art, and a never-ending journey. Just recently I came across a blog post shared by a former Netflix engineer, I will make sure to link it in the show notes as well, describing a click residual metric: what is your expected success on, let's say, a certain segment of your market, on those queries, versus what you actually got, where people keep trying and trying and it just doesn't deliver. So you could use these as low-hanging fruit to fix your system.
So, do you see, and maybe it's already happening in Haystack, that I as a user might be able to describe my own metric, say in the form of Python code, plug it into Haystack, and let it measure what I want, kind of mimicking the online metric in substance? So, providing custom metrics: yeah, you can do that to some degree already, plugging in a Python function and forwarding it; that's one way. The other is probably at the node level. You can imagine that in a pipeline, at some point, there are answers or documents, so you can easily add custom nodes where you say: this node should now compare the output to whatever you want, or maybe, in a monitoring setting, write some logs somewhere, take some signals from the query. To some extent that's a way you can monitor it. So yeah, I think that's one of the next steps: more and more online metrics, more and more online experiments. Right now, big parts of the market are still in that phase of developing, experimenting, finding the pipeline, getting it initially to production, and having a smooth journey, a fast path to production and high success rates for these projects; I would say the focus is very much there at the moment. But further down the road, if you really think about the whole NLP lifecycle, on the monitoring side there is this topic of online metrics, but also things like data drift: do my queries actually shift in a different direction? Think of query profiles: for a given use case, how can we describe the query distribution? This can be on a formal level, like...
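The custom-node idea mentioned above can be sketched as a plain function inserted into the pipeline's node sequence. The node protocol here (a dict payload passed from node to node) is an invented illustration, not any framework's real interface; the signals logged are arbitrary examples.

```python
# Sketch: plugging a custom monitoring/metric node into a pipeline.
# The node protocol (dict in, dict out) is illustrative only.

collected = []  # stand-in for a real metrics/logging sink

def retriever_node(payload):
    payload["docs"] = [f"doc for {payload['query']}"]  # stub retrieval step
    return payload

def monitor_node(payload):
    # custom node: record whatever online signal you care about,
    # e.g. whether the query is a question, how many docs came back
    collected.append({
        "query": payload["query"],
        "is_question": payload["query"].rstrip().endswith("?"),
        "n_docs": len(payload["docs"]),
    })
    return payload  # pass the payload through unchanged

def run(query, nodes):
    payload = {"query": query}
    for node in nodes:
        payload = node(payload)
    return payload

pipeline = [retriever_node, monitor_node]
run("how does this work?", pipeline)
run("error codes list", pipeline)
print(sum(rec["is_question"] for rec in collected))  # 1
```

Because the monitor node only observes and passes the payload through, it can be dropped into any position in the graph without changing search behavior, which is what makes this pattern useful for online metrics.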
...the share of questions versus keyword queries, but it could also be on a topic level: understanding what the profile is at one point in time, matching it with certain pipelines, but also seeing whether it's changing over time. Yeah, you somewhat anticipated, or partly answered, my next question about where you see the biggest effort in Haystack and Deepset Cloud going, let's say beyond MLOps, beyond tightening the knobs and making sure that this flies and works correctly, more towards the vision side; I know you're also hiring a product manager. And connected to that, if you will: what do you think is still missing in the market today? Maybe in understanding, maybe at the perception level, maybe in tooling. You already alluded to things like metric blindness, and users getting stuck thinking it's the wrong system when actually it's not, they just didn't look the right way, and things like that. Yeah, there's a ton of work left. As we already talked about, things have progressed a lot in the last years, it's crazy to see, but I still feel we're in the middle of it, or just starting, and there's so much more you can improve and do better. I would say for us right now there are a lot of different directions, but especially on the open source side we want to improve the developer experience, also simplifying the first steps with Haystack. I think it can still be overwhelming, and I really want to make sure we get as many people as possible to that first aha moment: using your own data, asking a few questions, comparing sparse to dense retrieval, and really experiencing this first hand. I think that's one of the things we work on.
Then a lot around multimodal: we recently added support for tables within Haystack, so one interesting direction right now is that you can really query into these kinds of tables in your documents, and maybe further down the road into your SQL database as another data source, and then of course everything around images, video and audio is also interesting for us. For our customers it's typically less important than text and tables, but I still think there are interesting options you can explore there. So that's a lot on the open source side. On Deepset Cloud we recently launched the experiments module, which was one big step forward, and now a lot of work goes into giving guidance and suggestions there too. For example: I ran an experiment, I have a lot of these metrics, a lot of data that was generated somehow, but since it's not a single model anymore, it's a pipeline, as a data scientist I really want to understand: okay, where should I focus? What's probably a good way forward to improve this pipeline? Is it rather a retrieval problem, is it rather another node that I should improve, is maybe something wrong with my evaluation data set, should I go back to labeling? Making these kinds of analyses easier is something we work on right now. And then, further down the road, it will for us be a lot about expanding into this whole ML lifecycle we talked about, monitoring, but also just making it simpler to integrate at both ends. So on the one side, ingesting your source data more easily, bringing it more easily into Deepset Cloud, so that you can say: maybe I have a wiki system that I use, maybe I use Notion, or maybe I use...
+ Confluence, or maybe an Elasticsearch cluster where I already have the documents I'm interested in, so that we have smooth connectors there and you can import your data and directly work on it. And then on the other end, once you have your API, how can I easily get a search bar or search functionality into my final product? So there are a lot of things there. And then everything around fine-tuning and few-shot learning with large language models, which some of us are quite excited about, because, as we mentioned, there's already been a big step forward in that there are a lot of use cases where you don't need to train at all anymore. And maybe that's a misperception that you also see in the market: typical users come to us and say, oh yeah, for this use case, how can I train? And then we usually ask, do you really need to train your own model? Have you tried this and that, these kinds of combinations of models that are out there, certain sentence transformers, certain pre-trained QA models or encoder models? And they're like, no, but our use case is different and that won't work. And in many cases it does work, or at least they're surprised how good it already is, and maybe it's enough to get started. So I think that misperception is still out there. Then, to be fair, there are also these cases where fine-tuning still helps, where you really care about a few percentage points better accuracy, and where you then go down and say, let's now start labeling, let's collect data, either in a manual labeling process or maybe from somewhat noisy, real-time production data, where you saw what people searched, what they clicked, and how we can use that for training. That's something where we see big potential, probably for next year: basically we want to simplify this domain adaptation, to have less manual effort and a more automated way of training.
+So I think that's a promising direction, maybe together with large language models. + Yeah, sounds cool. And if we look even further into the future, say five or ten years out, do you think that Haystack at some point may even start suggesting to the user what to try? You know, you go and set up a KPI for yourself, your end goal, and then through the chain it sees that something is going on there. +Then it would actually suggest you also try some other model. Do you think that's possible, or do you think it's the wrong direction altogether, and you'd rather leave this to the creativity of your users? +I think it's a combination of both. I definitely think it helps to accelerate certain parts of your work, so especially suggesting what experiment to run next, or what could be something you can try. +So I'm a big fan of that, and I think we don't need to go five or ten years down the road; that is happening already sooner, also in Haystack and Deepset Cloud. + And maybe just one thing: at our company we have something that we call Hacky Friday, one Friday every month where every person in the company can work on whatever they want, really hacking on crazy ideas, trying stuff out. And I know that this Friday people are working on a generative model where you basically describe what you want, what kind of pipeline, so you can type in. +Let's say, I want a document search pipeline that works on legal data and is very fast, something like that. +And the output is basically a YAML file that describes this Haystack pipeline, which you can then easily load in Python and try out, or also load into Deepset Cloud and run it there.
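To make the idea concrete, a generated pipeline definition of the kind described might look roughly like this. This is a sketch following the Haystack 1.x YAML schema; the component choices, index name, and parameters are illustrative assumptions, not what the hack-day model actually produces:

```yaml
# Hypothetical output for the prompt "a document search pipeline that
# works on legal data and is very fast" (illustrative only).
version: '1.8.0'
components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
    params:
      index: legal_docs           # assumed index name
  - name: Retriever
    type: BM25Retriever           # sparse retrieval keeps latency low
    params:
      document_store: DocumentStore
      top_k: 5
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]
```

A file like this can then be loaded in Python with Haystack 1.x's `Pipeline.load_from_yaml(...)`, or uploaded to Deepset Cloud, as described above.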
+So we are experimenting with that right now, and of course somewhere further down the road I could see that you take signals from what we know worked on certain domains and basically feed that into this generative process. +Yeah, it sounds cool. It actually reminded me of the time when I was doing my PhD, something like 12 years ago, a bit more. I had a collaborator who wrote a paper on taking user text and converting it into C++ code. + I don't remember exactly all the details of the use case, but I remember it was somewhere in an airport, so they do a lot of routine work, and instead of repeating it you could actually build a smarter system. So do you think this could be the future of Haystack, or maybe the industry at large? +Yeah, at least I think it's one element, if you will, that helps accelerate things. If you look at Copilot right now, I like it a lot for coding, and I'm still in many cases surprised by what Copilot suggests as you write code. +And I think something similar is also possible on the machine learning side, where you get not only generic correct code but really something that fits your use case as you describe it. +I mean, if you think about the big picture, I think it's one piece that helps you in your workflow. There are still many, many other pieces that we need to get right, and it won't be the holy grail at the end. +What I really believe in is that you need a framework or a platform, whatever we want to call it, where you can easily compare things on your data, and I think this helps a lot in creating transparency in the market, among other things.
+ And also a kind of trust for your own use case, that you are not basically making a technology choice before you've actually started working on your use case. And I think that holds for vector databases, where maybe today this is a good choice for you, but maybe one year down the road you want to switch. This market is so early that it's very hard to place a bet right now on one of these technologies. + On the modeling side, there's so much crazy buzz around large language models, and you can clearly see the trend going there, but I think it's also very important to understand whether that's really useful for your use case now, and how it compares to much smaller models. And this should be easy; it shouldn't be a big part of your project, it should rather be that you just try it. + So thinking about the options you want to try, maybe getting some suggestions there as well, that's the human creativity part. But the actual, say, swapping of components and making them comparable, that's nothing you should spend time on as a developer. + And connected to the question about the future, maybe I'm closing off on that: we recently built, with my colleague Aarne Talman, a multimodal and multilingual search demo, where we used a CLIP model off the shelf, without any fine-tuning, on web data, and it showed us really amazing results. Like, where keyword search cannot find anything, because the metadata simply. +Isn't there, and it's multilingual, so you type the same query with neural retrieval and it gets it. +Is there anything stopping Haystack from moving in that direction as well, sort of crossing the boundary of text only? You did say multimodal in the context of, let's say, querying a table, but I could also query an image. +Do you think that Haystack is going in that direction as well?
+ I think we're right now working on it, so we have a first case we want to support where you have a text query but you can also query into images on the result side. And then basically the other way around, having an image as a query, would probably be one of the later ones, or wanting to find different media types, I'd say. +But yeah, this is definitely what we're working on right now. I also think we always need to see what the big use cases are, what kind of customers we have, and how they would use it. +I think with images there are a lot of interesting use cases, mainly in e-commerce I would say, and that's cool. So we already support it to some degree, and will support more, I think, in the next months. + That's great to learn, and that also means that I need to adjust my classification, because I've been presenting what I know about the players in vector databases and neural frameworks, and specifically for Haystack I put NLP as the main vertical, and I think you guys largely still advertise that as the main vertical, but I think nothing stops you from. +Switching that to multimodality, so NLP, computer vision, and maybe even speech at some point. +Yeah, totally. I think our approach there is a bit like doing one thing to quite a depth first and then moving on to the next, rather than, let's say, starting with very high-level basic support for all modalities and then growing all of them. +So what we rather did in the past, and are still doing, is very deep support for text, and we want to have everything in place there before moving on to the next. That's a bit of a philosophy question, maybe a strategic question, how you want to do it. +So this multimodal field is changing quite a lot, right? A lot of things: generative models, really big large models, models that I don't even know how to use yet, like DALL-E.
+Yeah, I think that's for now a bit of a question of experimental interest, but probably there will be some use cases. Where else do you think the trends are going in this space? +Yeah, so one big trend for sure is these large language models and everything around them, as I talked about earlier. Where is it right now, and is it already really usable today, is it already worth investigating and comparing them for your own use cases? +And I think there we are, I would say, still in an early phase. Look at, for example, GPT-3: I think Nils Reimers did quite a nice analysis earlier this year, where he compared embeddings from GPT-3 to more standard-size transformers. +And there I think we saw that the performance is not bad, but it's also definitely not outperforming regular-size models, which are a thousand times smaller and cost a few dollars, not thousands or tens of thousands of dollars, in inference costs. +So I think right now it's basically: let's see case by case whether it makes sense for your use case. But if you look a bit further into the next years. +I'm pretty sure and convinced that it's only a matter of time until we see more and more large language models really in production, also in search pipelines in production. +And I think that right now it's this phase of figuring out how we can make them more efficient, more reliable, so that we can really trust these results. + And how we can easily update them with new knowledge. What I'm personally quite excited about, and looking a lot into now, is this area of research around retrieval-based NLP. So yes, on the one hand, scaling up the models, making them bigger, because we learned over the last years that they are good few-shot learners.
+And that's of course exciting, because you can just take these models and kind of throw a task at them and they will perform, so less manual work of annotating data, creating domain-specific datasets, and so on. +But I think we also saw that they are not very efficient, and there are these other problems. + How do you actually now teach them about recent events, or about your own domain knowledge? And typically these datasets that you want to search in are not static, they're constantly evolving, and you don't really want to retrain these crazy models every few days or weeks just to catch up. +And yeah, I think this is where this stream of retrieval-based, or retrieval-augmented, models is super interesting, and there's a lot of cool work. Just this week there was the publication from Patrick Lewis and colleagues around the Atlas model. +So the basic idea there is: can we somehow remove, say, the memory part from these big models and kind of outsource it to a database, to an index, and then at query time we still have a large model. + That does the, let's say, complex reasoning, but it's basing the generation on some retrieved documents, and that can be useful for search but also for fact checking or other use cases. Long story short, I think they did a lot of interesting experiments in that paper that show you can actually outsource quite a bit of these parameters, of this memory, into a vector database and still keep the few-shot capabilities of these giant language models. +And I think this is a super cool route: yes, larger models, but still not putting everything into them, not blowing up the parameter size unreasonably, but rather combining them with, let's say, an external document base or knowledge base.
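The retrieval-augmented idea can be sketched in miniature. This toy Python is emphatically not the Atlas model: a bag-of-words overlap score stands in for the dense retriever, and simple string templating stands in for the generator. What it does show is the key property discussed above: the "knowledge" lives in an editable document store, so answers change without retraining anything.

```python
import math
from collections import Counter

# Tiny "external memory": facts live in documents, not in model weights.
DOCS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]

def vectorize(text):
    # Bag-of-words term counts; a crude stand-in for a dense embedding.
    return Counter(t.strip(".,?!").lower() for t in text.split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def retrieve(query, k=1):
    q = vectorize(query)
    return sorted(DOCS, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def answer(query):
    # "Generation" grounded in the retrieved document: the model part stays
    # fixed while the knowledge in DOCS can be edited at any time.
    return f"Based on: {retrieve(query)[0]}"

print(answer("What is the capital of France?"))
# -> Based on: Paris is the capital of France.
```

Swapping a document in `DOCS` (say, when a fact changes) immediately changes the grounded answer, which is exactly the update path that retraining a monolithic model cannot offer cheaply.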
+Yeah, the topic you touched upon is fascinating: on the one hand, let's say you have a model, and if you keep retraining it or fine-tuning it on the latest data, you may run into what I think is called catastrophic forgetting. + Like things that we as humans know, I don't know, what a liquid is, kind of at a high level without going into chemistry: it's not that we think about it every single day when we drink water, but it's also not that we actually forget it if somebody asks us. No matter how many news articles, papers, or books we read, we still remember the basic facts. And I think what you just said. + With the Atlas model, this approach of outsourcing that memory into some database that you can maybe even control, and say, okay, these facts need to stay, I never want them to go away no matter what, these are basic principles, and maybe they exist in every domain, like finance or healthcare, and so on. +And yeah, I think this is an interesting direction. Yeah, and actually all these facts change, right? It can also be that over time you have to adjust facts or knowledge, and this is way easier, I think, if you have it explicitly somewhere in documents, and not so much in the parameters of the model. +Yeah, exactly, and maybe just one example that comes to my mind: cities change names, right? So you could still go back and say, what was the name of that city between, you know, 1995 and 2000, something like that. +Yeah, or presidents of nations also change, so for these kinds of queries I think you want to make sure that you're up to date. +Yeah, and I think maybe coming back to search, understanding the context will play such a huge role once these models become even more mature and available and knowledge-aware.
+ But the challenge of extracting context from the query is still there. If I say, who is the president of the United States, it might conclude that I'm asking about the current president, but if a couple of paragraphs above I was already setting the stage about a specific period of time in the past, it could actually reason that I'm maybe not asking about the present. +Exactly, it could do this reasoning, or it could ask a clarifying question, right? Or give you. + A couple of options: did you mean this? Like you would want in a human conversation. So I think it's called conversational information retrieval, and I think we might start seeing this blend of what today is called a chatbot and a search engine, but it could be a search engine which is just clarifying. +Yeah, I mean, I think we are also seeing in the field that what we understand as search is evolving, so it's not so much anymore, if I think about web search engines. +In a few cases you still search and click on a website and then look for your information there. But in many cases we now have zero-click search, where you have your query and within the search results you already find what you wanted. And I think this is just. +Yeah, getting more and more popular: you're not providing, say, the route to another knowledge source, but you're trying to really answer the query directly, so there's no need to go further. +I will also try to remember to link one paper, maybe it's a series of papers from Microsoft, where they try to embed knowledge into the language model, and that's. +I think it's a very interesting direction, as is embedding knowledge graphs into the model, because one way, as you said, and I think that trend is probably still there, is that you can keep adding parameters, more and more billions.
+More and more billions of parameters, but at some point it simply becomes impractical to have such a large model in production, and then how do you fine-tune it? And again, it doesn't capture the relationships well enough if you didn't explain them. +Absolutely, and just thinking of it, we actually have a meetup at the end of September, if you or anyone listening is interested, where a speaker will talk about exactly that topic, how to. +Incorporate knowledge into a language model, and that will be at the end of September as well; just look for it, or maybe we can link it in the show notes. +It's our Open NLP meetup series. Absolutely, will do that gladly. Now my favorite question, which I know you touched on many times as well during this podcast, which I really, really enjoyed. +But what else drives you? Beyond, you know, your role as a CTO, or your role as a pioneer in this space, maybe educating and reaching more and more people: is there something else that drives you, sort of beyond the tech itself in this field? +I mean, I think my excitement, my passion for NLP is clear. I hope that came through. +But yeah, for me the technology is one thing, but then really seeing how you solve problems with it, like how you can make the annoying work of a financial analyst faster and better, seeing that either first-hand, because they are a customer, or more indirectly. +And knowing that this is now possible. I think that's still a big driver for me personally. And one thing I absolutely love about open source is that it's not just paying users, commercial users, where you see that. +We really have this huge community by now around Haystack, where there are so many different people with different backgrounds, different use cases, and it's.
+For me, often, at the end of the day, I really like scrolling through all the GitHub issues and questions that come in, or on Slack, and now we're on Discord. +And seeing what people are actually building with it. It's really cool to see what use cases they come up with, but also how far this has actually come. +It's used by so many companies all around the world, from big tech to classical enterprises to startups, to build their products on top. And I think that's one of the biggest motivation boosters you can get: seeing the community appreciating it and using it. + And probably also beyond GitHub: just recently I ran into a guy in a bar here in Berlin who used Haystack, and that's definitely something I never would have imagined a few years ago. +And these kinds of things happen. Once, over a glass of beer, we defined a bit of a vision and thought about some goals at the company offsite. +One of ours for the open source side was that people would start putting, say, Haystack experience into their job requirements, or the other way around, people putting it in their CVs, and we thought, oh, this is maybe three years down the road. +But then a few weeks afterwards we saw the first job postings where this was required, and also CVs where it was mentioned. +So I think it's just cool to see how you can leave a footprint beyond, let's say, your immediate bubble. It really spreads: it's open, it's all digital, it's kind of connected in the world. And leaving that kind of footprint is what I enjoy. +And yeah, search as a domain is just really interesting to me because it's so diverse: you can go in many directions, dive very deep into NLP, think a lot about the user side and what use cases you can make it work for, and think a lot about scalability. +It's, from my point of view, one of the most exciting and diverse applications of technology right now.
And in a way, I think, you can really relate to it: you can really think, okay, what is actually possible, what kind of information can you make accessible? +And that's obviously the beauty of it. Yeah, it's beautifully put, thanks for sharing. I know some of the guests I ask this question would probably think, hey, why this philosophical question, I'm just doing it, I like it, and that's it. +But I think it gives so much towards reflecting on what you do, because that might also influence your choices in the tech, or in how you approach your users, what message you send, and so on and so forth, and maybe reconsidering some things as well. + And the open source part reminded me of one story from my first visit to the US. I think it was 2015, at ApacheCon. I was crossing at the traffic light, you know, at a pedestrian crossing, and it was this wide avenue, so in my memory the light takes a few minutes, but of course it's not minutes, maybe 20 seconds. +And a guy is literally bouncing towards me from the other side of the road saying, I know you! I was like, no, it's impossible, it's my first visit, you know, I'm not a public figure. +How is it possible? And he said, because you build Luke, one of the open source Lucene index tools, which I used to work on. +Which I inherited from its original creator Andrzej Bialecki. And that's it, he didn't stop to say anything else, but he made my day, you know. And I think what you felt in the bar was probably similar, knowing that that person uses Haystack, and you know, it's amazing. +Absolutely, because it just feels very honest, right? It's not because of some crazy marketing or anything like that, it's just a really natural community thing, just building something that's useful for others.
+ Yeah, exactly, which probably reinforces you and gives you, well, in this case direct, feedback, not only on the specifics of your platform, but on the fact that they're using it, relying on it, and building a business on it, and that validates the decisions you made in the architecture and so on and so forth. That's amazing. +Yeah, I mean, from a company perspective that's one of the fastest feedback cycles you can have, right? And seeing diverse use cases, diverse developer personas, how they approach things, what they're struggling with: yeah, that angle is also absolutely crucial. +I think it's the best, and I think it was Elon Musk who said the best setting is when your users fall in love with your product, and then you just succeed. So, yeah, there you go. Amazing, I've enjoyed this podcast so much. Is there anything you want to announce to our listeners? + Yeah, just the meetup I already mentioned, so if you're interested in NLP, that's happening in September. It will be hybrid, so you can join online, but if you happen to be in Berlin we also have a small on-site event. And then, of course, if you haven't tried Haystack yet, maybe check it out on GitHub. + I promise you can get an easy first pipeline up and running, so just give question answering a try, especially if you're coming from traditional search. And on Deepset Cloud, as mentioned, we just released the big new module on experiments. We're still at an early stage with the product, but we have an early access program, so if you're interested, if you have a search use case that you want to bring to production in a fast way, and want to think about how to scale it, how to actually fine-tune that pipeline, how to collaborate with your end users and get feedback, just reach out to us and we can get you on the early access program.
+ Amazing, thanks so much, Malte. I've enjoyed this, it was deep and thoughtful, and we will make sure to link all the goodies that you mentioned in the show notes. And I hope we meet some day, maybe in Berlin, maybe somewhere else. Absolutely, yeah, let's make that happen, and I totally enjoyed our conversation as well, so thanks for having me. +Fantastic, all the best with Haystack and with your research and development. Thanks a lot, thanks, Malte. Bye bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md b/transcripts/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md new file mode 100644 index 0000000..160de1b --- /dev/null +++ b/transcripts/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md @@ -0,0 +1,433 @@ +--- +description: '

00:00 Introduction

01:10 Max''s deep experience in search and + how he transitioned from structured data

08:28 Query-term dependence problem + and Max''s perception of the Vector Search field

12:46 Is vector search a + solution looking for a problem?

20:16 How to move embeddings computation from + GPU to CPU and retain GPU latency?

27:51 Plug-in neural model into Java? Example + with a Hugging Face model

33:02 Web-server Mighty and its philosophy

35:33 + How Mighty compares to in-DB embedding layers, like Weaviate or Vespa

39:40 + The importance of fault-tolerance in search backends

43:31 Unit economics + of Mighty

50:18 Mighty distribution and supported operating systems

54:57 + The secret sauce behind Mighty''s insane fast-ness

59:48 What a customer is + paying for when buying Mighty

1:01:45 How will Max track the usage of Mighty: + is it commercial or research use?

1:04:39 Role of Open Source Community to + grow business

1:10:58 Max''s vision for Mighty connectors to popular vector + databases

1:18:09 What tooling is missing beyond Mighty in vector search pipelines

1:22:34 + Fine-tuning models, metric learning and Max''s call for partnerships

1:26:37 + MLOps perspective of neural pipelines and Mighty''s role in it

1:30:04 Mighty + vs AWS Inferentia vs Hugging Face Infinity

1:35:50 What''s left in ML for + those who are not into Python

1:40:50 The philosophical (and magical) question + of WHY

1:48:15 Announcements from Max

25% discount for the first year + of using Mighty in your great product / project with promo code VECTOR:

https://bit.ly/3QekTWE

Show notes:

- + Max''s blog about BERT and search relevance: https://opensourceconnections.com/blog/2019/11/05/understanding-bert-and-search-relevance/

- + Case study and unit economics of Mighty: https://max.io/blog/encoding-the-federal-register.html

- + Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

Watch + on YouTube: https://youtu.be/LnF4hbl1cE4

' +image_url: https://media.rss.com/vector-podcast/20220616_060650_51fed3f5cf98ff1ddb61cc17e11e43be.jpg +pub_date: Thu, 16 Jun 2022 18:27:50 GMT +title: Max Irwin - Founder, MAX.IO - On economics of scale in embedding computation + with Mighty +url: https://rss.com/podcasts/vector-podcast/522301 +--- + +Hello, Vector Podcast is here. And today I'm going to be talking to Max Irwin. He's a star in the search engine world, and he has also been dabbling in NLP a lot, for, I don't know, 20 years. It's a huge amount of time. +And I mean, he has been consulting in this space, also building products, and now he's focusing on building his new product. He's the founder of a company called max.io, which is also a website, you can go check it out. And he's building the Mighty inference server. +And a number of other tools that I'm sure Max will talk about today. Hey, Max, how are you doing? I'm doing great. How are you? I'm great, and thanks so much for joining me today. I'm very happy to be talking to you today. +And I'm learning about Mighty and all the things that you're cooking there. But I think, as a tradition, could you start with introducing yourself first? Sure. Yeah. Hi. When I was younger, in my language courses. +I was more of a mathematics and computer nerd. So I had to kind of relearn language and improve my language skills to be able to be dangerous in NLP. +But I started NLP probably around 2014, 2013, depending on when I really first started hearing about it and getting interested. But I earnestly started in 2015, 2016, with actual product development around NLP. With search, I've been doing search since about 2010, 2011. +Again, it's fuzzy when I actually first started, but.
I think the first real serious thing I did with search was when I went to take my first Solr training course, back when Lucidworks still had Solr training and had contractors coming to give it. +So that was in 2012, but I'd been messing around with engines before that. I started on an engine called dtSearch, which was a C++ closed-source engine, but you could buy the code for like a thousand dollars a year. +So the company I was working for, MediRegs, actually bought the code. And I was the newbie with search; I mean, we had guys working with it for a while, and they had built a whole platform around dtSearch. It was starting to show its age, so we started shifting over to Solr. +But yeah, that's about when I started. Well, before that, I did a whole bunch of computer programming. So the 20 years, 22 years-ish stuff that's in my bio: I graduated university in the year 2000 and I've been working professionally in software ever since. +But with search, I really got interested around 2012. That's when I said, wow, this is amazing, this is so much different from what I've been doing before. So that's when I really dove headfirst into the problem space and the domain. +Yeah, and some people say that many of us ended up in the search field by accident, as well as in NLP. I've been talking to one professor here at the University of Helsinki who has built a machine translation team, a very strong one, and he has built the system called OPUS. +And he actually said that he ended up in NLP also by accident, because it was just an offer from a professor, and he decided to take it, and he turned out to be quite good at it, you know. +And he also had another option, to go and work in Germany, he's from Germany, at some database company. And luckily he didn't take that path. How was it for you?
How do you feel about having ended up in this space? +That's a great question. It's interesting. It was definitely somewhat accidental. I had the pleasure of meeting so many people in search through my different positions, with varying degrees of expertise. +I found that a lot of people who got involved with machine learning found out about search because TF-IDF and all that stuff is like an algorithm, and it's like, oh, there's this whole language problem behind search that we have to figure out. +And then search people get involved in machine learning because, oh, this language problem is horrible, how do we solve it with automation and learning? So I accidentally stumbled on it, because I took a role that was in healthcare compliance. +And I was interested in that domain specifically, and search just happened to be a really important problem there. So that's how I got into the technical domain of search. +And it was just so much more fascinating than the stuff I was used to with CRUD, you know, just create, read, update, delete, and workflow applications, which I'd been doing for 10 to 12 years at that point. Yeah. +Yeah, I mean, for me, I think I started in search academically around 2002, 2003. But then seven years passed and I still couldn't find a niche or a job for myself, because there weren't many search companies in Finland at that point. +And then I found a company which I joined in 2010, AlphaSense, and it was Apache Solr, Lucene, everything new. +But it was still somehow inviting, and I think the first time, when I'd built the backend, I was like, okay, somebody is going to use this, somebody is going to type queries and try to find information.
So I tried it out too, and maybe it worked, maybe it didn't. I wasn't the user of the system, I didn't know what to type, so I was just grabbing phrases from the documents to see whether it would find them or not. Is that something that attracted you as well: findability? Or maybe discovery, though discovery is the next stage. Even findability itself?

Yeah, I guess search was really my first step toward working with real complex data, unstructured data. You reach a limit with structured data at some point: getting stuff into databases, getting it out, and things like that. You can spend a lifetime in that work, but I felt like I'd been doing it for a while. With search, it was this weird world with all this unknown stuff where you don't know what to do. It's an unsolved problem. I felt like databases were a solved problem, whereas search wasn't, and still isn't. If I'd kept doing the same database work, that's all no-code now; you can build the same stuff I was building with no-code tools. You don't even have to be a programmer if you don't want to, at least at the level we were doing it in the mid-2000s. Search is still unsolved. Even when we get to vectors, and we're going to talk about vector search, of course, that's still an unsolved problem. It's another tool, but you still have all these issues you have to take into account.

Yeah, so endless exploration. It's an infinite quest in many ways; there's a limitless number of tasks to solve. But then, somehow in your career there was a turn where you decided to get closer to this vector search field. I wanted to hear your first reaction. What did you think about it?
When did you hear about it? And what attracted you?

I'd say the first thing that really attracted me toward vector search was the BERT paper. That was written in 2018, but I didn't come across it until 2019, when Google had written a blog about how they were using it for their web search. And you could download some Python and get this stuff to work. The reason I was so fascinated is that, having worked in search for, let's do some math, eight years at that point, I had been stumbling along with the vocabulary problem, the query-term dependence problem, as we call it. To solve it, you create a bunch of synonyms, and you get to a certain level of advancement, and then you create a taxonomy, and then a knowledge graph.

Before BERT, we'd started playing around with word2vec, asking: can these kinds of embeddings be used to solve this vocabulary problem, with synonyms and knowledge-graph vocabulary expansion? The answer turned out to be no. word2vec didn't work as well as we'd hoped: it helped with some things but harmed others, and it produced a lot of noise. Maybe we didn't give it a good enough chance, but we saw that we could train the thing pretty quickly on our own content and still have the problem.

So when I started to play around with some of the Python tools that were available for BERT and large language models, which start from an embedding lookup, in the spirit of word2vec, to get the first encodings and then build forward from there, I saw something different. I saw actual similarity, where before, with word2vec, I'd really only seen co-occurrence.
With word2vec you see that things appear in the same context, but actual linguistic similarity: the first time I saw that was with BERT, and that's where all the hype came from. Then the next step with BERT is: okay, I have these vectors, now what do I do with them? And I said, okay, I have to use a dot product, or a cosine similarity. Let me just do that. And then: oh, you can't just do that across every vector, it's impossible at scale, you have to do something else. And then you go on this learning path, right?

That's where I ended up. I'd actually written a blog post about it in 2019, and I think that post was widely accepted by the community; it's still on the OpenSource Connections blog. It was really showing: hey, this is a change, it's not just Google that's going to be doing this, this is really interesting. A lot of people agreed, and there was this movement that kind of happened after that, with a lot of other people coming to the same conclusions. But there were a lot of challenges. With vector search and approximate nearest neighbor search, that's just the tool to solve the problem. You start with this problem over here, and then you go ten steps over there, and finally you get to vector search: okay, this is a potential solution, or the core of one, with all this stuff in the middle.

I feel I should read that blog post, and we'll definitely link it in the show notes. But sometimes when I look at vector search demos, applications, algorithms, I get the feeling one might think: okay, I have a solution, let me find a problem. Because it's all so semantic, I mean, it's so sexy, right?
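The "you can't just do that across every vector" realization described above can be sketched in a few lines. This is a toy brute-force scan over invented random "embeddings" (the data, dimensions, and helper names are all made up for illustration); it is exactly the O(n·d) work per query that approximate nearest neighbor indexes exist to avoid.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def brute_force_search(query: np.ndarray, docs: np.ndarray, k: int = 3):
    """Score the query against every document vector: the naive exhaustive
    scan that ANN structures like HNSW replace at scale."""
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    scores = docs @ query / norms          # cosine score for every document
    top = np.argsort(-scores)[:k]          # indices of the k best matches
    return [(int(i), float(scores[i])) for i in top]

rng = np.random.default_rng(42)
docs = rng.normal(size=(1000, 64))                   # 1,000 toy 64-dim embeddings
query = docs[17] + rng.normal(scale=0.01, size=64)   # near-duplicate of doc 17
print(brute_force_search(query, docs)[0][0])         # doc 17 ranks first
```

This works fine for a thousand vectors; at tens of millions, the per-query scan is what pushes you onto the "learning path" toward ANN indexes.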
Do you think this is one of the misconceptions in this field, or do you think it's well past that already?

That's a great question. I don't think it's a solution looking for a problem; it actually does solve some problems. But I agree there's a lot of gray area in how you get from "I need to find things, as a person" to vector search actually being a solution. There are a lot of people who picked it up and said, okay, we can just use this and it's going to solve these problems. But it doesn't, because search is not just about similarity. You can express a query's similarity with a document using TF-IDF, BM25, a sentence transformer's cosine distance, whatever, but that's only similarity. There's also the need the person has. You get a bunch of candidate documents that are similar, but what's the actual document you need? That's where a lot of other things come into play.

It's just one piece in a much larger search or recommendations platform. You still have to take in all the other signals, and what's common now in the more mature platforms is a learning-to-rank algorithm, where vector similarity is one feature in a learning-to-rank model, along with the BM25 of the title, the BM25 of the body, the number of clicks, the date, all this other stuff. It's a piece. But what that piece solves is the query-term dependence problem: I don't have to go in and craft synonyms by hand, and I don't have that endless task anymore.
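The "similarity is one feature among many" point above can be made concrete with a minimal sketch. The feature names and weights here are invented for illustration; a real system would learn the weights with something like LambdaMART rather than hand-pick a linear model.

```python
# Linear ranking sketch: vector similarity is just one input among several.

def ltr_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Linear ranking model: weighted sum of per-document features."""
    return sum(weights[name] * value for name, value in features.items())

weights = {              # hypothetical learned weights
    "bm25_title": 0.9,
    "bm25_body": 0.4,
    "vector_sim": 1.2,   # embedding cosine similarity: one feature, not the score
    "clicks_log": 0.3,
    "recency": 0.2,
}

doc_a = {"bm25_title": 2.1, "bm25_body": 1.0, "vector_sim": 0.62,
         "clicks_log": 3.0, "recency": 0.1}
doc_b = {"bm25_title": 0.4, "bm25_body": 0.8, "vector_sim": 0.91,
         "clicks_log": 1.0, "recency": 0.9}

ranked = sorted([("a", doc_a), ("b", doc_b)],
                key=lambda d: ltr_score(d[1], weights), reverse=True)
print([doc_id for doc_id, _ in ranked])
```

Note that doc_b has the higher vector similarity yet ranks second: the title match and click signals on doc_a dominate, which is exactly the "candidates are similar, but which one do you need" distinction.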
You still have all these other tasks to do, but that one has maybe been kept at bay a little bit.

Yeah, absolutely. Maybe I can restate my question, or clarify what I mean. When you read a paper like BERT, or similar papers, they say: we ran this downstream task like sentiment analysis, we also did question answering, we did recommendation, all these other things, and it works great. That pushes you to think: is this a universal language model that I can now take and apply to every task? And the answer is actually no, because if you're in healthcare and the model was trained on news, it's not going to work. The vocabulary problem was not excluded from this journey; if there's a mismatch, there's a mismatch. But the model itself is, of course, a clever piece of tech that you can take and fine-tune, or train, on your data.

It is, but I think we still see a huge gap in the domain. There are a lot of organizations that can just make use of pretrained models and fine-tune them. But there are still domains where you can't do that. Like law, right? Law is its own language. Law written in English, I wouldn't even call that English; I'd call it legal English, because the structure, the vocabulary, the grammar, all of it is so different from what's in a Wikipedia article or news or something like that.
So when you fine-tune a pretrained model that was trained on, let's say, OntoNotes 5, a collection of news, Wikipedia, the general knowledge most people use, there's still a gap. There's something missing, because the original trained model was lacking this context. And that's only the content side. When people search and type in terms, you can imagine a Venn diagram: here's all of the content over here that you've trained on, and here are all the terms that your users know, and you try to bring those closer together somehow. If the model was trained on content that's way up here, you're going to have trouble putting them together. I don't know if the camera does a good job with my hands showing this.

No, you're doing it perfectly.

So I think one of the big existing problems is that pretraining still costs a ridiculous amount of money and is out of reach of most teams. I've read papers, one of them by Microsoft, showing that the BERT vocabulary is around 30,000 tokens, and if you increase the vocabulary size to something like 100,000, the model generalizes much better, and of course you expand the content and domains involved in the training. I think we're going to see more of that. The world is still stuck on those 30,000 terms pretrained on things like OntoNotes, because it's just so expensive to train models. And Google and Microsoft and Facebook, the companies that train models, they're not necessarily going to open source those. Maybe they will at some point.
But I think we're going to need to see big companies that are specific to a domain train those models and then open source them. If you spend millions of dollars to train a model and you're a big private company, though, are you going to open source the model weights? Probably not. You're going to keep them for yourself, because that's huge value for your product.

I guess you open source the idea. You publish: okay, here is the BERT model, here is the MUM model, or whatever, but then go train it yourself.

Yeah, if you have a couple million dollars lying around.

And in another episode I was talking to Ahmad, who used to work at Google, and he said that entire teams would be dedicated on a quarterly basis to the expensive fine-tuning work with BERT and similar models. Can you imagine? It's a team effort, and these people, some of them invented the model, and even with all the resources Google has, they would fine-tune for three months. Maybe fine-tuning itself isn't out of reach of startups, but there are other things that are, and this is where you saw the gap with Mighty. Every time I install a vector database, I'm not going to name one, it says: hey, it will be faster if you use GPUs. And I'm like, okay, I'm a startup, I don't have GPUs. So this is, I think, one of the gaps you saw. But are there other gaps that you're addressing with Mighty Inference Server?

Yes. So in the NLP world and the vector world right now, all anyone talks about is Python, Python, Python. Everything is in Python; when you get to production you use something else, but it's Python, Python. I came from a non-Python background. I started with C and Pascal when I was really young, and my C programming is terrible, for sure. Then in the early 2000s I discovered intermediately compiled languages, Java, C#, things like that. I was in the Microsoft world, so I was doing C# for a while. And all the while I'd been doing JavaScript, because I was involved in the web in the mid-90s, and that's how I got involved with content and content data; it was all web stuff, and you've got to know JavaScript if you do anything with the web. So it was C# and JavaScript for me for a while.

So I know there's a gap. If you go into the JavaScript world, Node or TypeScript, and you want to do NLP, the suggestion is pretty much: learn Python. Same with C#: okay, there are some libraries out there, but they're clunky and nobody really uses them. Microsoft probably does, because they're Microsoft and they built C#, but outside of Microsoft, who's using C# for natural language, to train models? Nobody. And to host models, okay, you have to jump through all these hoops and it's really hard.

So unless you want to put Python in your stack, which is basically a non-starter for a lot of teams, teams that work in languages like Node, JavaScript, C#, Java, Ruby, Go, there are so many huge languages out there that just can't touch this stuff. I wanted something that broke out of this Python enclosure: how do you get this into the hands of people who just want to build web applications and don't want to go into the Python family? That was one of the starting catalysts for Mighty Inference Server.
There is one tool I have to use that is Python, because you have to convert the model, and I convert it to ONNX. Most people in the NLP world know about ONNX by now; it stands for Open Neural Network Exchange, and it's an intermediary format that can be used generically, like an open model format.

There are runtimes that can take ONNX models and run them. The biggest one is ONNX Runtime, which is developed by Microsoft; it's open source, MIT-licensed, and written in C++. But there are bindings for other languages, and the community contributes bindings too. You can use ONNX Runtime in Python if you want. For those Python people who want to host models in Python: just convert your model to ONNX and host it in ONNX Runtime in Python. It will roughly double the speed of model inference out of the box; you don't have to do anything, you clone the repo, press a button, and it's twice as fast.

For others, there are bindings for C#, bindings for Java, maybe bindings for Ruby, I haven't looked, probably bindings for Go, and even where Microsoft doesn't support them, the community builds them.

But then there's this other problem: those are just the model weights. If you're hosting the runtime for the model weights, you put in inputs and you get outputs. But where do you get the inputs from? You have to tokenize the text and do all the work to prepare it, to preprocess it, and then you can pass the tokenized data in as inputs.
But all the tokenizers are written in Python, so now you have that problem too. I actually used Rust for Mighty Inference Server, because Hugging Face based their fast tokenizers on Rust: they wrote them in Rust and offer bindings for Python, so if you install fast tokenizers in Python, you're actually using Rust under the hood.

So I wrote a web server that wraps the Rust tokenizers and ONNX Runtime, plus a whole bunch of code for pipeline-specific stuff: question answering, sentence transformers, sequence classification, which is like sentiment analysis, token classification, which is named entity recognition, and I'm working on others too. And that thing is really good. It's so much faster than Python; it's not even close. It was probably three or four times as fast before any performance tuning, and I've gone through tuning since. I haven't compared it to Python in a long time, but it might be five times as fast as Python right now on CPU. You can also use a GPU if you want, and you maintain the same relative speed: if you take the model in Python and put it on a GPU, versus taking the ONNX Runtime model and putting it on the GPU, the latter is far faster.

And when you say bindings in other languages, like Java or C#: if my stack is in Java, I can take this model and kind of link it into my Java code?

You can. Let's take a Hugging Face model, for example, say bert-base-cased; most people know that one. You can export that to ONNX with Hugging Face code in Python, and now you have an ONNX model.
Now, in Java, you can stand up a class that wraps ONNX Runtime, load the model into memory with ONNX Runtime in your class, and then create methods around that class. Then you can call it: I pass in the inputs and I get the outputs, and that's all in Java. Well, via the C++ runtime underneath, of course, but to wrap that C++ runtime there have to be bindings between the languages, and Java has a native interface to talk to C++: JNI, the Java Native Interface.

So the part where Java talks to ONNX Runtime is taken care of already. You still have to write all the other stuff around it to leverage it, but that's what programmers are used to; in Java you can do that. And Jeff Zemerick, who works at OpenSource Connections, was working on a project to load an ONNX model in OpenNLP, which is a Java library, and I think he succeeded, though I haven't seen the code for it.

Yeah, that's awesome. The reason I'm asking is that I witnessed this tectonic shift in my previous company, where we had the entire stack in Java. We'd actually started with Perl, but we had to rewrite everything in Java; it just didn't scale in Perl. We had Apache Solr on one end as the open source search engine, which is also written in Java, and when we would customize it, we would write plugins in Java and so on. But then when we wanted to introduce AI into the pipeline, of course, everything was in Python, and we hired people who could only do Python, nothing else, fresh grads.
With this new architecture, okay, you have Python as one step in the pipeline: how do you call it, how do you handle errors, how do you scale it? And we were also moving to Kubernetes, to add to this crazy mix. What we ended up doing was putting an asynchronous processor in every place where we had Python, to abstract Python away from Java. You would just send a message to an SQS queue, and on the other end there's somebody consuming it. Can you imagine how "scalable" that is?

It works, and you can scale it locally, but as an overall architecture I don't think it's a very smooth solution, not to mention that the performance side of it is just not taken care of. And what you're saying, essentially, is that with the ONNX bindings in Java we could just train the model, export it in ONNX format, and then use it directly in Java.

You can, yes. But you still have to get the inputs to the model. If it's an image or something like that, it's usually pretty easy, but if it's text, then you have to tokenize first, and you have to use the right tokenizer, and you have to jump through a bunch of hoops to get it to work correctly. It's probably a month's worth of work to get a tokenizer working in Java the way you need it to.

Yeah. And maybe you could, in principle, share that tokenizer between tasks? So for sentiment or for entity recognition, in principle you could use the same tokenizer?

Right. The tokenizer relies on the vocabulary and the configuration, which are bound to the model; the model depends on these things. So if you have a generalized way to load a vocabulary and a configuration, then yes, you could just take that and reuse it in your new stack.
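The "tokenizer is bound to the model" point above can be illustrated with a toy greedy longest-match subword tokenizer, WordPiece-style: swap the vocabulary and the same text maps to different pieces, which is why you must ship the model's own vocabulary and configuration. The vocabulary and words here are invented; real tokenizers also handle normalization, special tokens, and id mapping.

```python
# Toy WordPiece-style tokenizer: greedy longest-match-first subword split.

def wordpiece_tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Split one word into subword pieces using the given vocabulary."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            # Non-initial pieces are prefixed "##", as in BERT's vocabulary.
            cand = word[start:end] if start == 0 else "##" + word[start:end]
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]          # no matching piece: whole word unknown
        tokens.append(piece)
        start = end
    return tokens

vocab = {"search": 0, "token": 1, "##izer": 2, "##izers": 3, "[UNK]": 4}
print(wordpiece_tokenize("tokenizers", vocab))  # ['token', '##izers']
print(wordpiece_tokenize("finland", vocab))     # ['[UNK]']
```

With a different vocabulary, "tokenizers" could come out as three pieces or as unknown, and the model's input ids would no longer mean what the weights expect; that coupling is the "month's worth of work" hiding in a Java port.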
But having said all this, with Mighty you took a different approach. The philosophy behind Mighty is that you offer it as a web server, right? Can you tell me more about it? I'm sure you can open up a lot of detail.

Yeah. The reason I went that route is that when you want to do model inference, you want to give it as much compute as possible, and you kind of want it to be its own thing. So I went the microservice route. I'm not saying microservices are the way of the future and better than monoliths and all that, but think about the coupling when model inference is part of your regular application code. Maybe you don't want that. You want a separate service, and this is part of the bigger MLOps question: how often should I update models, what do I need to know about drift, what about logging, all these things. You need a way to handle all of that, and if you embed model inference in your own code, now you're responsible for all of it too.

As a microservice, you can evolve that service and say: all right, this thing is responsible for model inference, and that's it. You can evolve it, and all the concerns around it, okay, you need a new model, you want to A/B test models, you want to do logging, you want all these other things, can evolve in their own way. The separation of concerns makes much more sense.
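The microservice pattern described above can be sketched end-to-end in the standard library: a separate HTTP process owns "model inference" (faked here with a deterministic stand-in function), and application code in any language just POSTs text and gets a vector back. The endpoint path, JSON shape, and embedding function are all invented for illustration; this is not Mighty's actual API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic placeholder for a real sentence-transformer model."""
    return [((hash((text, i)) % 1000) / 1000.0) for i in range(dim)]

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        text = json.loads(body)["text"]
        payload = json.dumps({"embedding": fake_embed(text)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), InferenceHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Application code": any HTTP-capable language could play this side.
req = Request(f"http://127.0.0.1:{server.server_port}/embed",
              data=json.dumps({"text": "hello search"}).encode(),
              headers={"Content-Type": "application/json"})
vector = json.loads(urlopen(req).read())["embedding"]
server.shutdown()
print(len(vector))  # 8-dimensional toy embedding
```

Because only HTTP crosses the boundary, the inference side can be swapped, scaled, A/B-tested, or logged independently of every caller, which is the separation of concerns being argued for.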
And it gets you out of the problem of: okay, am I going to build a Mighty for Ruby, a Mighty for Node, a Mighty for Go? I don't have to do that. I can just build Mighty Inference Server as a web server, or gRPC, which is on the roadmap, though I don't know how long that's going to take. So you have this one thing, and then the API is always the same, and client libraries for HTTP are super easy.

And if you compare this to, let's say, a database like Weaviate or Vespa, they have inference inside them, right? So if you've already bought into that solution, in principle you could do this. The only caveat, I think, is that if you have a custom model, you'll have to go the extra mile to actually integrate it inside the database. With Weaviate, I think you'd have to master Go; with Vespa, you'd have to master C++ or Java. I'm not sure, I'm not an expert in that, but there's an episode with Jo Bergum you can check out. So on the product side, how would you compare Mighty to that approach?

So Vespa uses ONNX Runtime, I believe. I'm not a hundred percent sure it's ONNX Runtime, but I think it is. So you still have to go through the step of converting the model there too. With Weaviate it's a little bit different: you have these things called modules, and the modules are typically Docker containers with APIs exposed, and then there's logic written in the module code for Weaviate that wraps that API. The easiest path is to copy and paste an existing module and change things to match the API of the Docker container you have. So it's not that much work, but you still have to know Go to do it.
And, yeah, I think there's another problem I have with that approach, and I'm not saying it's wrong, but from my perspective: if you look at the documentation for a couple of vector search engines, I think Weaviate and maybe another one, they'll say it's better to use the GPU for inference and the CPU for the vector search. You want to provide as many workers to the search algorithm as possible, and you don't want the model inference and the vector search fighting for resources, because both are very expensive. So they say: if you have a GPU, all your model inference goes to the GPU, your vector search is all on CPU, and you get this one perfect box where everything just works.

But okay, what if you want to scale beyond that? You can only send so many documents into a GPU at a time. What if I need twelve machines that are all hosting Weaviate and all hosting Mighty, or whatever your inference solution is, all at once? This goes back to the separation-of-concerns problem: what if I have a lot of documents to process, and getting the vectors into the vector search doesn't take that long, but processing those documents does, so I have to preprocess? Now you've got a situation where you might need another solution to do that batch preprocessing, in another place. And then you bypass the module entirely: when you integrate into Weaviate that way, you just send the vectors directly to Weaviate, so you're not using its inference at all.
So again, I'm not saying it's wrong. I think it's a great idea, because you can install one thing that will just work; you don't have to install three different things and try to figure it all out. Getting up to speed that way is probably quick. But in the long term, for overall scalability, I think you now have this coupling, and that's a bit of a challenge. I don't know how that gets resolved.

Yeah, that's actually a good point, because you reminded me of something. I don't remember precisely what we were balancing between, but we had Solr with a Java pipeline in front of it. The pipeline would process documents as they came in: chunk them, classify them, run sentiment analysis on them, and so on. We were thinking, okay, some of these things could be computed inside Solr. We could write some clever plugin; Solr has a lot of hooks there, you know, before it indexes a document you can run a ton of things, and OpenNLP is one example of something you could plug in and run there.

And I remember my manager, the VP of engineering, came and said: hey, what if we lose Solr? If we computed everything inside Solr, stored it there, and then lost it, then what? Now you need to bring it back up really quickly, and usually what you want to do is replicate some shard and off you go. But if you don't have that data, you need to recompute it, and you have no intermediate storage. Solr is not your storage; it's not a database. So we backtracked and said, okay, we will compute everything and store it in S3, in file storage, and in the event of losing Solr we will restore it and reindex everything on the fly.
So that kind of resurrects the same situation, whether it's Weaviate or Qdrant or any other vector database: if you lose the database, you lose the vectors. If you computed them inside the database, bringing it back up and saying, hey, please compute my vectors again, please, please, is just too much time.

You're exactly right, and this is a lesson I learned, thankfully not the hard way, but just by picking things up when I was at Wolters Kluwer, which is a huge publishing firm. You have your content, editorial content and primary-source content, and it's written in such a way that it's pretty raw from a machine perspective. Then it goes through a series of steps: I'm going to add the topics, and then I'm going to save that state somewhere. Then, okay, now I have to add this other thing, entity recognition or something, and that's also saved. You have all these intermediate steps, so if you lose anything, it's recoverable; you don't have to rerun the entire pipeline. Rerunning the entire pipeline from scratch with that content would take months, not just days, literally months. That's a disastrous scenario.

So the lesson you learn is: okay, you don't do everything all in one place, because if you lose it, it's all gone and you start from scratch. So, separating concerns in that way. And then there's the idea that you can plug this thing in anywhere along the chain. You have a microservice; you can put it anywhere. And you don't even have to just take the vectors and stick them in the search engine, right?
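The "save every intermediate step" discipline described above can be sketched as a small cache layer: each enrichment stage writes its output to durable storage keyed by document id, so losing the search engine never forces a full recompute. Plain JSON files stand in for S3 here, and the stage name, document id, and "topics" output are all invented for illustration.

```python
import json
import tempfile
from pathlib import Path

class EnrichmentStore:
    """Durable per-stage cache: (stage, doc_id) -> saved enrichment result."""

    def __init__(self, root: Path):
        self.root = root

    def _path(self, stage: str, doc_id: str) -> Path:
        return self.root / stage / f"{doc_id}.json"

    def run_stage(self, stage: str, doc_id: str, compute):
        """Return the cached result, invoking the expensive compute only once."""
        path = self._path(stage, doc_id)
        if path.exists():                      # already enriched: cheap reload
            return json.loads(path.read_text())
        result = compute()                     # expensive model call goes here
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result))
        return result

store = EnrichmentStore(Path(tempfile.mkdtemp()))
calls = []
topics = lambda: calls.append(1) or ["search", "nlp"]  # fake "add topics" stage
first = store.run_stage("topics", "doc-1", topics)
second = store.run_stage("topics", "doc-1", topics)    # served from storage
print(first == second, len(calls))  # identical result, computed only once
```

Reindexing the search engine then becomes a replay of stored results rather than a months-long recompute, which is the point of keeping Solr (or Weaviate, or Qdrant) out of the role of system of record.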
Well, what if you need the vectors and you want to do something else with them?
+What if you have a recommendations platform, this other system over here, and you want to do other stuff?
+Then you have to think about routing and all these other things. But if you just have an easy way to get vectors, you can plug it anywhere along the stack, and the rest is up to you. There's no prescribed way of doing things. It's a Lego.
+You put the Lego wherever you want. Yeah, that's a great point, because we also implemented an algorithm which computed some topics, I think, and we used fastText and word2vec vectors.
+But we didn't need the vectors in the downstream system in the end. We just computed them, clustered, ran some magic algorithm, and produced the topics. So you store actual words in some database, or index them in the search engine. So yeah, you're absolutely right.
+Sometimes you don't need the vectors, but they're still the medium to get to your target. So, I've seen the blog post, which we'll also link, that you published on max.io.
+And I think there's almost a unit-economics question here: if I run mighty on a gazillion servers, how does it play out? How much does it cost? Separation of concerns, and also resource separation, all these things, and how economical it is in the end.
+That's something you're proposing. So let's say somebody takes mighty and wants to scale it. All of a sudden, instead of 10,000 documents, you get 10 million documents to process, right?
+Because somebody changed something in the pipeline and now you need to rerun the whole thing. So what is your recommendation on the economics side? How do you see mighty playing a role in making this huge job more economical?
+So the first thing I see is that you can calculate the cost ahead of time, because it's absolutely linearly scalable, right? Mighty itself sits on one CPU, right?
+It sits on one thread. I'll even say a thread, because these days you have cores and CPUs and threads and it gets messed up.
+You can tell mighty to use multiple threads in certain situations if you want to, but the approach to batch processing that I use I actually learned from another team's amazing blog.
+I think it was early January they released a blog post talking about this exact problem: do you have one process across multiple threads, or do you have multiple processes?
+So if you go the multiple-processes route: let's say I take a bunch of documents and I pass them in, and I have some level of consistency in the document size.
+I pass them in, and it takes X to get all of my documents ingested, right? So you have that number. You know how long it took, and you know how much content you processed in terms of bytes.
+So if I add another process now, and this is purely parallelizable, so half of my documents go here and half go there, it's what I said: exactly linearly scalable. I add a CPU, it halves the time it takes.
+So if I have a situation where I say, okay, I did 10,000 documents and it took me X, and now I have to do a million documents:
+how long do I want it to take? You can actually write down the calculation and say, I need this exact infrastructure. Which is a huge problem right now, because a lot of people don't do that.
+It's just: let's add a lot of GPUs and see what happens. You can spend the time to go through and do that calculation. But it's not so straightforward.
And you'd have to do the costing yourself.
+I haven't released it, but I want to have a calculator that says: how many bytes do you have, how long do you want to spend, and it tells you, well, it'll cost you this much on Amazon or whatever. So that's one thing.
+Also, since I mentioned GPUs: I built mighty so it works on CPU.
+If you are a company that's getting into this stuff, there's this idea of unit economics: how long does it take to process something? What's the cost, and how do I scale it to a billion documents?
+Say I'm coming into this ecosystem and content processing, and I'm used to working in Java or C# or something like that.
+Now you're telling me I need to buy GPUs. I have to run GPUs, and then I go check the prices and I'm like, well, that's not how much we spend on infrastructure. That's not in our budget, I'm sorry to tell you. So maybe we can't even do this.
+So I wanted a way around that problem, where you could just use CPU, and there's a straightforward understanding of the cost you have to put in.
+I haven't checked Amazon prices in a while, but my site is hosted on Linode, which is another cloud platform.
+The pricing is better, and I just like them. They were actually recently purchased by a huge company, I forget the name, whatever. Anyway, I use Linode, and it's cheap for CPUs. It's great.
+You want to run a GPU, it's like $500 a month or $1,000 a month. That's a lot of money for one machine, and most teams are not willing to spend that. You can probably do fractional GPUs on AWS, I think, but it's still expensive.
+And now it's this cost that never goes away.
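The linear-scaling arithmetic being described can be written down directly. A rough sketch of such a calculator (function name, parameters, and the example numbers are all hypothetical; plug in your own measured throughput and instance pricing):

```javascript
// Back-of-the-envelope CPU inference cost estimate, assuming the workload is
// embarrassingly parallel so throughput scales linearly with cores, as the
// interview describes.

function estimate({ docsMeasured, secondsMeasured, docsTarget, targetSeconds, pricePerCoreHour }) {
  const docsPerCoreSecond = docsMeasured / secondsMeasured; // measured on one core
  const coreSecondsNeeded = docsTarget / docsPerCoreSecond; // total work to do
  const coresNeeded = Math.ceil(coreSecondsNeeded / targetSeconds); // parallelism for the deadline
  const costUSD = (coreSecondsNeeded / 3600) * pricePerCoreHour;    // compute-hours * price
  return { coresNeeded, costUSD };
}
```

For example, if 10,000 documents took 1,000 seconds on one core and you want a million documents done within an hour, `estimate({ docsMeasured: 10000, secondsMeasured: 1000, docsTarget: 1000000, targetSeconds: 3600, pricePerCoreHour: 0.05 })` says you need 28 cores.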
Like once you do it, it's there for a long time, you know. CPUs are a commodity; with GPUs you have to fight with the cryptocurrency crowd over the cost, all this stuff. So, yes.
+I can imagine GPUs being used during model training or fine-tuning, but for serving that sounds way too expensive. Right. Yeah, that makes a lot of sense.
+And so, how exactly do you offer mighty? It's a binary package, right, that I can install and basically run on my system, and I can decide whether it will be a standalone kind of script, or a pod in Kubernetes, or a Docker image in some other non-Kubernetes setup.
+Is that right? That's right. It's a very small executable, and Linux is a first-class citizen.
+It'll run on Windows, and it'll run on Mac. I've heard of people running it on an M1 Mac, but they had to do a lot of stuff to fix dependencies, and it wasn't really working that well.
+And I think, what's it called, Rosetta? I think it's still using that to do the x86 bridge, the translation layer. So on M1 Macs, I wouldn't consider it working. I've also seen some other problems on Mac that I'm trying to resolve.
+It works fine on my machine, right? That type of thing. But really it's meant to run on Linux. You can run it in Docker; it's really easy to get started in Docker. So you can download the executable and run it on your box.
+Or you can just pull the Docker image and use that, which is probably a little more straightforward, since you don't have to worry about other dependencies. With Linux machines, use Docker if you're doing Kubernetes and all that stuff. Great.
+Run it in Docker. Just make sure you sort out, in your pod or whatever, how much compute you're actually giving it.
Because model inference, and it's not just mighty, all model inference, is really, really heavy. It's really expensive.
+It wants a lot of compute. Not so much memory, but compute. So just be sure to give it enough to satisfy your use case. I haven't actually run any containerized benchmarks myself. I like to run things old school; this whole Docker thing...
+Yeah, okay, I'll make a Dockerfile. Sure, you can use it in Docker, it's on Docker Hub. But I like to install stuff the old-fashioned way. On Ubuntu, I just download the thing. It's a tarball. You untar the tarball and you're good to go.
+And the way you start it: it's a Rust program with a library dependency, which is its runtime. It's dynamically linked, not statically linked. To start it, you can either start one core, and you specify the model.
+Or there's a thing called mighty-cluster, which is just a bash script. It'll check how many cores you have on the machine and start a process for every core. So it does this for you.
+And it takes a little less than half a second for each core to start up. I actually put that in on purpose; it's a limit I put in to slow it down a little bit so it didn't go off the rails. But you could probably take that limit off.
+You could just modify the bash script and see how quickly it will start up. In that blog post you mentioned before, I ran it on 128 cores, so there I actually took the rails off and let it start up really quickly.
+But it can take a moment to start it up on every single core. And yeah, you can do it in Docker, you can do it bare metal. If there are any people out there using Windows, I'd love to hear from you. I have feedback from Mac and Linux, but I haven't gotten any Windows feedback.
+So I don't even know if it's worth building it for Windows these days. Maybe not. Yeah, I think it depends. The scenario would be: you're a developer on Windows, and for some reason you don't go to your server side.
+You want to develop everything locally, right? I actually saw such guys in my team; they wanted to bring up every single service on their laptop. Yeah. That's how they developed; they didn't want to depend on any external connection.
+Right, and even Docker is a pain on Windows sometimes these days, right? I know the Windows ecosystem because I used to be in it in the 2000s. But that's the mindset: I'm just going to run everything natively on Windows. Yeah.
+And when I tried mighty on Mac, I think it took some seconds to boot, but the moment it booted, I was shooting queries at it to compute the vectors, and it was insanely fast.
+Is it ONNX only, or is there a secret sauce in this insane fastness? If you're used to running models in Python, it'll seem insanely fast. A lot of it is ONNX; they get most of the credit there, yes.
+But there's a lot of other stuff that goes into it, like the tokenization and the pre-processing and the post-processing. It's fast because I've been using Rust for it, and Rust is a really interesting language. It's gotten me back into systems programming.
+I'm not here to say that Rust is the most amazing thing ever. There are things I love about it, and things where I'm like, I don't know if I would do it that way. But you're supposed to do things a certain way, because when the compiler understands that, it'll super-optimize it for you.
+It's hard to wrap your brain around if you're coming from a dynamically typed language like Python or JavaScript. It's hard to get a handle on at first.
I come from a compiled background, you know, typed programming languages that compile ahead of time.
+I was used to that from my previous life, so I was able to pick it up again. I just read the Rust book; there's a free book out there. I actually bought the paperback, because I like paperbacks and hardcovers, actual paper, these days.
+So I was reading it like that, and just going through the examples. It took me a couple of weeks to get a handle on Rust. The Rust language gets a lot of the credit as well. It just optimizes. And you have to learn this field that I'm in now, model inference.
+It's a super-niche field where you have to understand the hardware and you have to understand the machine learning, and those two fields are so different. There are very few people out there who are really good at both. So, there's a word: vectorization.
+Vectorization on the CPU means: if I have to do a calculation on a byte, and I have a 64-bit register but only an eight-bit byte, well, I can vectorize and do eight calculations at once with SIMD.
+SIMD: single instruction, multiple data, right? So if you turn on certain compiler flags, Rust will do that for you automatically.
+So you get that speedup. I turned those knobs all the way up: use AVX and AVX2 if the processor supports it, and most processors do these days on x86. ARM has a different instruction set.
+I'm going to get into the ARM world; I have to get an M1 Mac and start messing around with all that. But if you know that stuff and you know how to turn it on, Rust does the rest for you.
+You kind of have to write your code a certain way so that Rust will do the optimization a certain way. You can't think old school; you have to think in Rust's world a little bit.
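The register arithmetic behind "eight calculations at once" is simply register width divided by element width. A tiny illustration of that arithmetic (not SIMD itself, just the lane count it buys you):

```javascript
// How many elements fit in one SIMD register: a 64-bit register holding
// 8-bit bytes gives 8 lanes, i.e. eight calculations per instruction.
// AVX2's 256-bit registers with 32-bit floats also come out to 8 lanes.
function simdLanes(registerBits, elementBits) {
  return Math.floor(registerBits / elementBits);
}
```

That lane count is the theoretical upper bound on the speedup the auto-vectorizer can extract from a loop over those elements.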
+And you get all this extra speed from pretty much nothing, just from writing your code a certain way and turning on a couple of compiler flags. That's why it was so fast.
+Yeah, but you still needed to figure all of this out, and I remember you saying that you spent a bunch of weeks coding on this stuff.
+You got things done, but I know, and many of us in the audience probably know, that as a programmer you might say, yeah, I can do it, but you cannot actually estimate when you'll be done. You get into the weeds: oh my god, some dependency doesn't work, or I'm sending a request and it fails, what's going on, and you spend so much time. And if you're debugging an algorithm, that's another story.
+That's an inner journey there, debugging all those states. I'm just trying to say that even though you make it sound so easy to master Rust and go through this maze and shape things the way the compiler wants,
+it's still a lot of time, and it's skill. You mastered it, and in the end the result was not given to you; you earned it, right? So why not turn this into a business? Now, on the business side, I'm thinking:
+how do you offer mighty? You have the binary; the model will be shipped separately somehow, outside the binary, right? But what am I, as a customer, paying for? And also, a question ahead of time: can you give a discount code for our audience to try it?
+Oh, that's a great question. Yes. So my business model is, again, old school, because I've been doing software for a long time. It's licensed software, right? You pay a license, you get to use the software. I'm still trying to figure out the exact price point.
+Some people say it's too cheap, which is interesting, because I didn't think so.
Some people say I should charge more money for it.
+It's $99 a month right now, as this podcast is published; after that it may change. It's set up through Stripe, so I can go and create a discount code for folks. I don't have a code right now, but if you email me and say, I heard about you on the Vector Podcast,
+follow the link in the notes and email me, and we'll set something up so you can get a discount. That's the way it works. But that's for commercial use: if you're using it commercially and you're making money from it,
+then I ask that you pay a license, please. If you are a nonprofit or charity, or you're a student, or you just have a side project and you're messing around and just want to get some vectors: go install it, don't worry about it.
+But if you put it in production and you're charging money for your product, please, please buy a license. Yeah, yeah. I have a question then: how will you track who is using it commercially and who is using it for a hobbyist project? That's a great question, and I don't track that.
+I'm really into privacy and safety on the web, so I don't like the idea of putting a whole bunch of tracking telemetry into mighty. I think that's a terrible way to run a product these days.
+The only thing it does is, when it first starts up, it asks the server what the latest version is, and it'll tell you if there's a new version. With that, I can see that somebody asked for a new version.
+And I anonymize all the IP addresses, so I don't even know who it is. There's no user information at all. I just use that to roughly track how often it starts, and I see maybe five downloads a day right now.
+So if you're pirating it, I can't stop you, and I'm not spending my time trying to stop you. It's not worth my energy.
I'd much rather work with teams who really want to gain something.
+So if you do buy a license, I'll work with you on setting it up, telling you how to use it, and working on it with you. It's not advertised, but around model inference itself, I'm happy to offer services to get your model up and running and make sure it's running well.
+Even doing a model conversion with you, setting you up with that stuff. But that's not advertised. It does say I'll spend an hour with you if you buy a subscription, to get you set up, but if you need more help than that, let me know.
+Now, there's another tier. If you're Amazon... well, Amazon would never buy mighty, they have their own world. But if you're a cloud provider, or you want to offer it as an API, that's different, because the license is per product that I sell it for.
+So if you're selling it as a cloud provider, as an API, and you've got a thousand clients who are now using mighty, well, I actually count all of those clients as mighty users.
+I don't have a price published, but if you're in that situation, I'm not going to charge you $99 a month for each client. If you're running it as a business like that, contact me and we'll work something out. Yeah, that's perfect. I mean, it sounds like a solid model.
+For the start, for sure. And another favorite question I have, and I've been asking this question also to open-source players like Weaviate and, I think, Qdrant:
+I have this thought that one way of building the kind of connection that may yield a business case for you is what you just explained, right? Somebody buys a license, and then you scale with them: you explain how to make it better,
+how to tune it, maybe implement some features. Another route is to open a Slack channel or a Discord, whatever.
+You invite users there and start talking to them, and maybe you'll have some open-source components as well at some point. I don't know, say, a tool that helps me convert my model into a representation that mighty can read.
+Have you considered taking that open-source route as one way of building a community, some of whom will be your users and paying customers? Great question. I don't have a Slack myself; I'm a member of many other Slacks. I could set up a Discord.
+I'm on Discord, mostly just for the MLOps community. But I could just start a thread or a channel there. I don't know if mighty itself needs its own Slack. I think it would be part of another community.
+One of the annoying things for me is that I have to go and join twelve million Slacks, because everybody has their own Slack, and they don't work with one another. Discord does that way better. Slack, we've got to have words. You've got to make it easier.
+I have four or five email addresses across twelve different Slacks right now, and I can't keep track of them. But in terms of open source, I already have a bunch of open-source projects. There's maxdotio, M-A-X-D-O-T-I-O spelled out, on GitHub.
+Somebody already took max-io, and we can't have dots in GitHub names. That's fine, names are names. So there's mighty-convert, which I'm actually working on updating, because it's based on Optimum, which is a Hugging Face repository that does model conversion.
+It's a very light wrapper around Optimum. It basically just converts the model for you and bundles the tokenizer and a configuration. That's it. It's pretty straightforward. You can do that yourself; you don't have to use it. But that's open source.
+There's also mighty-batch, which is a node program for doing concurrent batch processing of documents into vectors, pointed at a mighty server.
+That's best described in the blog post I wrote about converting the Code of Federal Regulations; it's on the homepage of max.io. And there's also a bunch of other open-source projects that I haven't talked about yet.
+So, node-mighty, which is basically an API client for node that talks to mighty and does connection pooling. If you have four mighty cores running, it'll talk to all of them and negotiate which core to use when it makes a call.
+That's really easy to use in, say, an Express server. I also wrote two other node modules while I was at it that aren't for mighty. I wrote node-qdrant, so now there's a node client for the Qdrant vector database.
+I told the guys at Qdrant that this exists, and I'm trying to work a blog post out of it. I guess this is the announcement; I'll publish something, and there's going to be a demo. I also wrote node-pinecone. Well, it's pinecone-io,
+so now there's a Node.js integration for Pinecone. You can talk to Pinecone from node, from an Express server or something. The folks at Pinecone don't know that I wrote that yet, because I just put it out there. It's on npm.
+So I've got to work that out, and they might want it. If you guys want this, you know, I just wanted something that I could use. But it's your name, so please take the package from me, if you aren't upset that I used your name.
+I just wanted a tool for my own Node.js testing. But this stuff integrates with mighty really easily. So I have all these node clients now, and I'm focusing on JavaScript for it. All this stuff is going to be released.
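node-mighty's actual API isn't shown in the conversation, but round-robin connection pooling across several single-core server processes, as described, can be sketched like this. The endpoint URLs and the request shape in `embed` are made up for illustration and may not match the real client:

```javascript
// Round-robin pool over several single-core inference servers, so one client
// can keep every core busy without callers picking endpoints themselves.
class VectorPool {
  constructor(endpoints) {
    this.endpoints = endpoints; // e.g. ["http://localhost:5050", "http://localhost:5051"]
    this.next = 0;
  }
  pick() {
    const endpoint = this.endpoints[this.next];
    this.next = (this.next + 1) % this.endpoints.length; // rotate through the cores
    return endpoint;
  }
  async embed(text) {
    // Hypothetical request shape; the real client's API may differ.
    const res = await fetch(`${this.pick()}/embed?q=${encodeURIComponent(text)}`);
    return res.json();
  }
}
```

Each call lands on the next server in the ring, which is the "negotiate which core to use" behavior described above.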
+It's already up there. It's on npm, it's on my GitHub, and it's free to use. It maybe needs a little more polish. I haven't fully mapped out the APIs; I just mapped out the core stuff that I needed.
+So it doesn't do things like the scroll command, where you can scroll through all the points in Qdrant. I don't know how much use there is for that; it's really easy to add, I just didn't have the time. So yeah, there's a bunch of open-source work that I've been doing.
+I also want to mention I'm working on starter applications. Right now, basically, there's a starter app that works with node, node-mighty, and Qdrant, and another with node-mighty and Pinecone.
+I have two starter apps that aren't released yet, which I'm polishing up and getting out there, where it's really easy, if you're a JavaScript person, to take documents, convert them into vectors, load them into a vector database, and have a search app running on them.
+That's fantastic. So much to unpack, and I think we could be witnessing community-written software for a closed-software company.
+I mean, Pinecone is a closed-software company, right? And we have an episode with Greg Kogan, who is the chief marketing officer at Pinecone. We can connect you, and you can discuss the future. I talk to Greg; we're working on some stuff.
+So my question is: what made you write those connectors? Did you think this would also pave the way to using mighty, plugging mighty into the pipeline? Let's say I'm a Pinecone user, and I can have a node-pinecone connector at the same time as mighty.
+I'd say half and half. I do want to promote it, of course. But again, I want to bring these tools outside of the Python ecosystem.
+If you look at the vector databases right now, with the exception of Weaviate, which does a great job of having different clients for different languages and stacks, both Qdrant and Pinecone are all Python.
+Qdrant is written in Rust, but their first-class client right now is in Python. They did that because, obviously, everybody who has to get vectors has to use Python anyway. Or they used to. That's why they chose Python; at least that's what I speculate.
+And Pinecone as well: all their examples are in Jupyter notebook form. You go in and you want to do a semantic search example, and that's a Python notebook. I'm not crazy about Python notebooks.
+I think Python notebooks are good for illustrating ideas and sketches for papers, but it's really hard for me to look at a Python notebook and say, here's how I make this into a working application. It doesn't translate well, because the architecture isn't there.
+It's a bunch of cells that you run in order. That's not how real-world applications work.
+So the idea is to get these tools, ideas, and capabilities out into the hands of a lot of other people who want to use this stuff but are not familiar with Python or NLP. They want to be able to use
+this new technology, because they might have a business problem to solve. So you're thinking about engineers who are day-to-day productizing their code and thinking, okay, yeah, I need an embedding layer, but I don't care about notebooks, I'm not a Pythonista or whatever.
+So just give me the tool. Exactly.
+And speaking of the tools: you disclosed something to me ahead of time, that one of your overarching goals is to develop as many tools for the vector search community as possible. Some of the tools you mentioned go beyond pure engineering components like connectors; you said maybe fine-tuning a model, or something of that sort. At that point, I think you're stepping onto the ground of other startups, like Jina and Deepset and so on. Do you feel that way, or do you not concern yourself with them, and just think: okay, what's missing in the field, I'm going to add it and open-source it? Yeah, the latter.
+So, Deepset is all Python again. Jina, I think, is a lot of Python, right? I'm not as familiar with Jina. Yeah, they are mostly Python.
+There's a huge opportunity to make these tools available to non-Python stacks. Before I started working in machine learning, I had never even considered Python as an application framework. You know, people are using Django, Flask, and stuff like that.
+For me, it's not that I didn't take it seriously; I just felt it wasn't something I would have chosen to use, compared with a lot of other stacks.
+There are so many other teams out there that want to use these things, but now it's, oh, Python, Python, Python, nonstop. So we've got to break out of that somehow. And I'm starting with node, because the JavaScript ecosystem is absolutely enormous.
+I think people underestimate the size of the JavaScript ecosystem. If you're in machine learning and you're listening to this podcast right now, there are maybe a hundred people using JavaScript for applications for every one of you. That's how big it is.
+So I'm starting there.
I just know it's an enormous community. And not only for front-end development; we need to emphasize this, because you also have server-side JavaScript, like node. Yes, and others. And it's huge.
+And a lot of the middleware between your app and your super-cool search engine or vector database is written in node, because it's so much easier. Oh, well, not easy, I don't know.
+Is it easier? I think it's just the pervasive nature of JavaScript. Yeah, I don't know if I'd say it's easier than Python. I think they're actually similar in a lot of ways. The syntax is a little different: curly braces versus tabs.
+But I think node, and we're getting away from vectors right now, but node started because JavaScript was the language of the web, and people didn't want to learn another language to also write back-end code.
+You know, we were using Perl, right? There was a long time when it was Perl, PHP, plus JavaScript. There was that whole world out there. So that's where node came from: the web front end. And the web front end is enormous.
+A lot of them just adopted node, and then node had its own hype cycle. 2010 through 2014 was maybe node's heyday, when it went through the roof. Everything was Node.js; it was going crazy. Now it's all
+machine learning and AI, and a lot of people got involved in this world, but there's still a huge section of the world that's written on top of node, from applications that started in the early 2010s and have evolved ever since.
+Yeah, but back to tools. You said in the early notes you shared that you also want to address some of the unsolved problems, like model fine-tuning or other pipeline steps that may precede the embedding layer you've now addressed with mighty.
+So what are your thoughts there? What do you think is missing? Yeah, I don't know if I'm going to get into actual model tuning. First of all, I'm not as good at training models as other people. There are people better suited to train models.
+But I do think there's a lot of other information that's lacking in the MLOps world and in vector search. One of them is just: how similar are these things, right? What's the distribution of similarities?
+I think Weaviate said they support some of that, and Vespa has some of that in logging. But I don't know about Pinecone, and I'm pretty sure Qdrant does not. So what do I mean by this? Say I have a vector and a lot of data,
+and I do a query against Qdrant, for example. I get back a list of documents that are nearest neighbors, along with their similarities.
+Well, where does that fit? If the first document comes back at, like, 0.49 similarity, is that good? Is that bad? What counts as really good similarity here? Maybe the best similarities are in the 0.8 range.
+So now I know that, in terms of my entire corpus and how people usually query, this result is actually not that great. There are a lot of questions to be answered around that stuff. I think that's lacking in a lot of ways. I don't know if that's the right fit for mighty, though.
+I think those are external tools that I'm kicking around, and all of that would be open source.
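The missing tooling being described — "is 0.49 actually a good top score for this corpus?" — boils down to placing a score within the distribution of scores the corpus usually produces. A minimal sketch of that idea (the function name and threshold interpretation are illustrative, not from any of the tools mentioned):

```javascript
// Where does a similarity score sit relative to the scores this corpus
// usually produces? A top hit of 0.49 may be weak if typical best matches
// land around 0.8, even though 0.49 looks fine in isolation.
function scorePercentile(score, observedTopScores) {
  const below = observedTopScores.filter((s) => s < score).length;
  return below / observedTopScores.length; // fraction of observed scores below it
}
```

Fed with the top-hit scores logged from real queries, a result's percentile tells you whether "0.49 similar" is a strong match for this corpus or a sign the query found nothing good.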
So I'm very interested in Mighty being the business side, and then all the other stuff is open source, to make it easier for people to use these things.
+But yeah, there's a lot of stuff in terms of... so in terms of the options, there's model drift. It's like, well, let's say I have 100 sentences, right, and I vectorize these against, you know, model 1.2.3, and I get back a list of vectors. Now I've upgraded my model to model 1.3.8. Now I run my test sentences through and I get different vectors. How much has changed? What's the difference there?
+So there's this whole world around measuring model drift, and there are some interesting open source tools on this already. But they're written in Python. Now you'd have to use Python and do all this stuff. So I'm trying to understand
+what tools could be written that are not in Python land, that could expose these statistics and this important information to people who, you know, don't want to marry themselves to Python. Yeah, yeah, absolutely.
+This sounds like you also touched on this very important topic, which I think is known as metric learning, where on one hand you do want to know what the optimal distance is, and maybe you need to fine-tune your model, or maybe your data is not a good fit for this model, and so on.
+But you do need the tools, maybe something like Quepid, you know, for ranking evaluation and tuning. You would also have some Quepid-like tool, maybe even with a UI, where you can load these vectors, visualize them, and see, OK,
+if you put the fields together, what's missing and so on, and then have the stats on it, right, so you can actually run the statistics. I want to let Eric write that tool. I love Quepid. Quepid is so great. Eric, go write Quepid for vector search.
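The drift check Max outlines (same test sentences, two model versions, compare the vectors) can be sketched in a few lines. The toy two-dimensional vectors here stand in for real embeddings, and the version numbers are just the ones from the conversation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def drift_report(old_vecs, new_vecs):
    """Per-sentence cosine similarity between old-model and new-model vectors,
    summarized as a mean and a worst case."""
    sims = [cosine(o, n) for o, n in zip(old_vecs, new_vecs)]
    return {"mean": sum(sims) / len(sims), "min": min(sims)}

# Pretend embeddings of the same test sentences from "model 1.2.3" and "model 1.3.8".
old_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_vecs = [[0.9, 0.1], [0.1, 0.9], [1.0, 1.2]]

# Values close to 1.0 mean the upgrade barely moved the vectors;
# a low "min" would flag sentences whose meaning shifted the most.
report = drift_report(old_vecs, new_vecs)
```

In practice you would keep a fixed set of test sentences, re-embed them on every model upgrade, and track this report over time.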
+Yeah, I think we can team up on that, maybe all of us contribute and make it open source. Yeah, but I think this is one way to look at it, right? And I think the Qdrant
+developers push metric learning quite heavily forward. By the time this episode is out, there will be another episode with a developer from Qdrant who is actually very big on this idea of metric learning.
+And he open sources, of course, everything, and he offers tools and also papers that you can read and educate yourself on in this space. I think this is something that is barely scratched at the moment by the community, and even by the end users. You know, they think: OK, I take a CLIP model, I have the images, I plug them in together, works fine, I'm done. What if it doesn't work?
+What if you have some images you never find for any query, but you do want to find them, because it's an image of some product that was recently released and you do want to showcase it, right? And you're not using keyword search there, it's an image,
+and you're using vectors to retrieve it. So there's a bunch of topics there. Another favorite of mine is robustness, right? So if I have an aircraft and I rotate it a little bit, all of a sudden I find kittens instead of aircraft. This is what was shown yesterday at the Jina meetup, and it was amazing. I mean, robustness: you just change your input slightly and, yeah, it doesn't work.
+So I think there are a lot of things missing, but from what I sense in your answer, it feels like you still want to keep your focus on Mighty and push that as far along as possible, right?
+Yes. And what I really want is, I love that people download and install it and use it and do whatever they want to get vectors with Mighty, that's awesome. I'm really trying to find partners. I'm really trying to find partners who want to just make it really super easy
+to do model inference at scale.
+So for example, I haven't gotten any replies. I've been, not spamming, I've been emailing and trying to get in touch with cloud providers, right, to say: serverless inference. If you could offer serverless inference through Lambdas or whatever,
+so many people are asking for that. You can't really do that with Python tools these days. You can do it, it's just that it would take forever and it would be really expensive, really slow. But there's such an opportunity for cloud providers to make it super easy, so you can have,
+you know, you want to get content from point A into your recommendation engine or your vector database or whatever.
+Do you want to stand up a big GPU server in the middle to do this? No, you don't want to do that if you can avoid it. So how about something serverless that people can just run? So I'm trying to find partners there. I'm trying to find partners who have
+search platforms and other platforms, who just see this as a Lego in their stack, something that's going to make things easier, and who don't want to, you know, hire a team and spend months building this thing and trying to figure it out.
+You can do that, of course, go do that, but you know, you can save yourself a lot of time and pain by working with stuff that's already there. Yeah, that makes sense. I mean, probably companies like the likes of Algolia or... Right, exactly. But potentially Elastic, you know, because
+both of these want to get closer to neural search, even though maybe they were not wired up originally to be vector search databases. But they do have the components, like Elastic is based on Lucene, and Algolia is
+probably also based on Lucene, I'm not sure, but I'm sure they're looking at this field. So, I mean, for them... and now we are getting a little bit into MLOps and the vision that you also shared a little bit ahead of time, that
Mighty could be one of the components in the MLOps ecosystem, right?
+Yeah, absolutely. Not just a standalone kind of script which I download and then I'm thinking, OK, where do I plug it in? Are you thinking in that direction as well yourself? Like, OK,
+identifying the tools and systems where Mighty could play the role of the embedding software. Yeah, absolutely.
+The other thing I want to figure out is: does it make sense as it is right now, as a web server? For every case, probably not. There are probably situations where gRPC would fit better than one request at a time. But yeah, it's meant to be flexible: you stick in a model, your own model,
+you know, and you run it how you want. The other thing that I found was that I met a lot of people who are
+scratching their heads saying: which model should I use? Is my first model right, or whatever? And I just want to start playing around with this. So the other thing I did is, I have default models that I chose, that I know work well, because,
+you know, especially Nils Reimers, he's amazing, and he's done amazing
+community development around SBERT and the models that he's trained, and the documentation he's published around why certain models are good and others are bad. Other people don't know of this stuff, so it's just like, well, you don't have to go off and learn and understand
right away why you should choose one model versus another. It's a hard decision to make, so there are some defaults that I chose, so it's really easy to get started.
+So the vectors themselves, right off the bat, or if you do question answering, will be pretty good,
+for regular English, nothing domain-specific. You still have to do fine-tuning for most cases, but you're not going to start fine-tuning before you even know how this thing performs.
+So in the beginning, right, you want to try a model and see how close it is. So there's some starting work there. I know Algolia is getting into the vector search stuff, so maybe they don't know how to choose a model. You guys can use my default model if you want.
+Yeah, absolutely. I mean, so far what I hear from you is that Mighty has these qualities. It can run on pure CPU, which is a win on cost. It scales, which is also a win on cost in the long term, right? And it's also insanely fast, which is a win on product, a win on go-to-market: I have this document, and how quickly it
+travels through the pipeline and is searchable matters, right? In some cases it's paramount. You know, in the financial space, a document came out, I want it indexed right away, a second after. I don't want to wait five minutes, it would be way too late for me to make a decision.
+I mean, is there something else? Maybe you could compare now, or point us to a blog post, you know, with other vendors. Amazon has Inferentia, Hugging Face has Infinity, right?
+And NVIDIA, I think they also had some layer, I forgot its name. But those are probably fairly expensive, probably not $90 a month. So what is your thinking there? I think you are also vocal in this space, in the direction that Mighty is much more economical
+than these more expensive solutions, but they probably offer something else as well. You have a niche for sure, yeah.
+I think that, so the interesting thing is, if you get into Amazon Inferentia and all this stuff, they crafted their entire... they built their own hardware, they have their Neuron cores
+that all the stuff is based around, and that's
+big-time lock-in, right? Mighty is just a web API, you can just use it. I think that...
+I've considered also hosting an API. Like Hugging Face: Hugging Face is one of the most amazing software companies ever. That's the real community-driven open source stuff, they do such amazing work, so I don't want to say anything bad about Hugging Face, because I really have nothing bad to say at all.
+But you know, Infinity definitely has a fit for the market, which is: if you are Walmart and you need a solution, OK,
+Hugging Face Infinity is in your budget, go pay for it. That's the type of thing that Walmart should use.
+But if you're a five-person developer team, or even if you work at a company of, like, 300 people,
+it is really, really expensive.
+So there is a market segmentation there. There's a difference between: OK, well, how much can you afford, who can you hire, what's the level of internal support that you have to put around this thing, and how does it all fit?
+And the teams that are just starting off, that need something that works really fast and is easy to use, that's where Mighty fits in. I don't think Mighty competes with Infinity, because honestly, hey, Walmart, if you want to be a customer, if you want to buy Mighty, sure, go ahead, let's talk, or you can pay the 99 bucks a month. But that's not my target. I'm trying to make it super easy for everybody else.
+Someone recently connected with me on LinkedIn, I think some kind of VP of engineering: hey, if you're looking into embeddings, contact Max. Really?
+So, to understand Infinity a little bit better, because I didn't try it at all: is this some kind of web service that you basically buy a subscription for, like a SaaS kind of thing? No, it's like a Docker container. I think Infinity is a Docker container.
+I don't know, it might even be written in Rust, I'm not sure. Considering their tokenizers are written in Rust, they may have done something there. Infinity came out before Mighty, so they may have done something. A perfect competitor for Mighty in that sense.
+I mean, not on pricing, but the package itself, right? So basically it's Docker anyway. Yeah, and I think Infinity encourages GPU. They want you to use a GPU for it.
+I think Infinity fits well if you have, like, a million requests an hour, something at that scale, you know.
+If you have 20,000 requests a day, or a thousand requests a day, that range, maybe 100,000, you know,
+I think Mighty is perfect for that. You don't have to have this huge scale. It can get bigger, you can just spend more money on hardware and scale it up as much as you want.
+You can support a million requests a day if you want to, or 10 million, you just have to put more hardware behind it. So I think I'm just competing in a different market. I don't think Infinity and I are targeting the same businesses.
+Yeah, and I mean, you do have the edge on the fact that you want to address the community beyond Python, so I think it's a big message to send. And in some ways through you, you channeled this feeling that, hey, the guys in Node.js or Java probably feel left out
+from this thing. But it's probably not true. I mean, I know there is also Deeplearning4j and so on, but it's like an island in the ocean in comparison.
+I think it just didn't get the adoption that Python got. I remember going through these internal pains myself, right? It was like 2015, 2016, and I started getting into deep learning training, and I took Coursera courses, Andrew Ng's courses, on machine learning and stuff.
+So I was taught with Octave, which is an open-source mathematical language, GNU Octave, but it's its own language, right? It's mathematics as code. But then the next courses were all Python, and I was like, oh no, I have to learn Python. I don't know Python. I need a new language to use this stuff. OK, fine, I'll do it. So I went down that path, and I learned Python and I got pretty good at it.
+But a lot of people just don't want to take that step, you know? They want to ship code in their stack. So it's a big ask to say: if you want to use these awesome tools, you've got to convert your language.
+Yeah, yeah, exactly. And if you're not into data science or machine learning, then why would you enter Python at all? It has no single winning point. Well, maybe simplicity, but hey, is that it, you know?
+And it's loosely typed. Of course, you can make it more strict with typing and so on, but still. It took me, I think, a good three years to learn Python properly, because it's not just: OK, I understand how to do the for loop, I understand the indentation. To actually master it, right, you know, to avoid stupidly loading the model multiple times in Gunicorn.
+And I didn't even really enter the async world lately.
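A quick back-of-envelope on the request rates mentioned a bit earlier, roughly a million requests an hour versus 20,000 a day. The numbers are the ones from the conversation; the variable names and the averaging are mine, and sustained averages of course ignore traffic bursts:

```python
def requests_per_second(total_requests, period_seconds):
    """Average sustained request rate over a period."""
    return total_requests / period_seconds

# "A million requests an hour" (the Infinity-style scale) vs
# "20,000 requests a day" (the scale Mighty targets comfortably).
large_scale = requests_per_second(1_000_000, 3_600)   # hundreds of rps sustained
small_scale = requests_per_second(20_000, 86_400)     # well under 1 rps sustained
```

The three-orders-of-magnitude gap between the two averages is the market segmentation being described: very different hardware and pricing make sense at each end.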
But even just writing normal software in Python takes a lot of time, and productizing it takes a lot of time.
+So why would you enter it if you are not after the tasty machine learning and data science? Why would you even consider converting your software stack into this?
+So it should be the other way around, and I think you're doing a great job there with Mighty, basically offered as a service, and maybe in the future as some kind of library or some kind of environment. I mean, Microsoft has been doing a bunch of these things. I don't know if you remember the CLR, the Common Language Runtime.
+So you bring up Visual Studio and you can say, OK, my project will be in Perl, compiled and run for Java. I don't remember exactly, it was crazy. I was just experimenting with it, and I barely knew any of these languages as a student, but I was fascinated by the idea. It didn't fly, I think, but it was amazing.
+Yeah, absolutely. And you know, I did play around with the idea of: well, what if you don't even have to download Mighty? I was playing around with this idea from the npm perspective: what if you just installed it with an npm command? And I thought, that's a little bit heavyweight. Do I want to bring in this thing? I don't know if I should do that.
+I also don't want to set false expectations, and maybe this is just because I'm not great at marketing, but I don't want to set the expectation that you just do npm install mighty server and then you have a perfectly running thing, because it's more than that. You have to scale it properly, you have to give it its own compute.
+You have to choose the appropriate model. You have to do certain things to really get the most out of it.
+So I don't want to set false expectations, where somebody deploys it and it doesn't work well at all, because they just did npm install mighty server, which doesn't exist, by the way, don't try that, and then it didn't work.
+So there is a little bit of knowledge that you do have to attain. You do have to familiarize yourself with some concepts, but that doesn't mean learning an entirely new language and stack.
+Yeah, it's more like, probably, a DevOps person can pick it up, and learning that way is much faster than actually, you know, figuring out how the hell you will plug it into your Java code or C++ code or whatever. So yeah, of course.
+I've really enjoyed this conversation, Max. We went deep into all these aspects. Maybe we can record another episode going in another direction. I'm sure there are many,
+many directions to take. But I enjoy asking this philosophical question of why, if you can still spare a few minutes: why, how are you fascinated by this field of vector search? What brought you into it? And I remember, I will also
+mention this, that we did form a team with you, and you responded positively to my inquiry to compete in the billion-scale ANN competition. And you basically almost single-handedly drove the idea of BuddyPQ.
+Of course, we also had Alexander Simarov, who was helping you, and all of us were brainstorming with you.
+But so that was maybe an academic fascination, right? But are there other facets that keep you going, also given your background in search, which was pre-vector-search?
+Yeah, I'd say just my endless curiosity about things, you know. I think a lot of us have that. If you're listening to this podcast, probably a lot of you in the audience are very curious about technology in general, and the limitations of technology, and what's possible,
+and getting to that new magical thing, and trying something for the first time and saying: oh my god, this is incredible, I can't believe this actually worked. So it's that. I mean, you know, I'm in my 40s now,
+so I've gone through that cycle a lot of times, where I've tried something and it was amazing. I do really feel that there's a lot of practicality to it though, you know, in my wisdom now.
+I see that just because something looks cool doesn't mean it's the best thing in the world and should be used everywhere. But I see the practical use and need for vector search.
+Whether or not it turns out to be the end-all-be-all of search, you know, that debate is open, right? But I don't think it is. I think it's just one piece in the puzzle. But it does solve a whole class of problems that were unsolvable, if you go back 10 years,
+when I first started in search, the types of things that I'm doing right now. And I'll give you an example, and I actually said this to somebody the other day.
+You know, the first time I installed Solr, maybe Elasticsearch was around at that time, or maybe it was Compass. It wasn't even Elastic yet. The first time I installed Solr and I put in some documents, I was like, wow, this is amazing.
+I can do a search. This is so much better than that crappy index that I was using on SQL Server. It was really that type of amazement.
+But then, you know, you work with it over time, you see the limitations, and it's like, oh, I've got to deal with synonyms and all these other problems and all this stuff.
+I'll say that, you know, when you first start off, the relevance of Solr out of the box: you take their example schema.xml, and you add some documents to it, and you get back stuff, and you're like, OK, this is cool.
+Take that feeling, and, I'll just use Qdrant as an example, because Qdrant is, in my opinion, super easy to use. You just docker pull Qdrant and throw some stuff in there. Especially now with this Node thing.
+So when I did that, the first time I used Qdrant, I wrote this Node wrapper and I just chucked in a whole bunch of documents, and I saw the out-of-the-box relevance. And I'm not saying this is fine-tuned, this isn't something production-worthy.
+But just the out-of-the-box relevance, I was like: this is better, and, in my opinion, I would spend less time worrying about this than I would with an inverted index. You know, maybe the results aren't super precise all the time, and things like that.
+But if I'm on a team and I've got this search bar and this content, I don't want to worry about it, right? I don't want to worry about it, I just want it to work. I want it to surface stuff that's reasonably accurate. It doesn't have to be the best search in the world.
+But it's a cost for me, it's a cost for my team. I don't make money from search, but it's something I have to support.
+I think vector search offers a really, really good solution there, because you don't have to chase that endless bug of: this result doesn't even have anything to do with my search.
+You searched for, you know, "what is the best hiking boot" or something like that, and a document matched "what" 10 times, but there's no semblance of hiking boots or anything in the document. This is terrible. You don't get anything like that in vector search.
+And that's, I think, the appeal.
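The hiking-boot anecdote is easy to reproduce with a toy scorer. This is a deliberately naive sketch (invented query and documents, plain term counting, no stopword removal, stemming, or TF-IDF), just to show how pure term matching can rank an irrelevant document first:

```python
def keyword_score(query, doc):
    """Naive inverted-index-style scoring: total occurrences of query terms in the doc."""
    query_terms = query.lower().split()
    doc_terms = doc.lower().split()
    return sum(doc_terms.count(t) for t in query_terms)

query = "what is the best hiking boot"

# An irrelevant document that happens to repeat the stopword "what" a lot.
doc_a = "what what what what a day, and what a year it has been"
# A relevant document about hiking boots.
doc_b = "our trail footwear guide reviews the best hiking boot brands"

# Term counting rewards doc_a's five occurrences of "what" over doc_b's
# genuinely relevant terms -- exactly the failure mode described above.
score_a = keyword_score(query, doc_a)
score_b = keyword_score(query, doc_b)
```

Real keyword engines mitigate this with stopword handling and TF-IDF/BM25 weighting, but the underlying term-matching trap is real; an embedding-based scorer would not reward "what" this way.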
You know, when you get into real production, highly tuned search, it's just one piece. But for the teams that want it to work out of the box and don't want to deal with it,
+I think it's a better solution than Elasticsearch or Solr. You end up spending a lot less time, with less headache. Yeah, that's amazing. That's so deep. What you said speaks, and sings, of a practitioner, but I think also of a dreamer.
+I think you dream of better ways of searching things, right?
+And you went through it practically, but also, you know, when you get so deep into the practical elements, you get stuck in them, and you're thinking in the framework of the given system, or even the given language, right?
+Sometimes the paradigms that you read in the docs, and you keep thinking about them, it's hard to unstick yourself from them.
+And I mean, the fact that the vector search field was born is magical in many ways. So I feel like you feel the same.
+And the fact that you also ventured with me and others into building a new algorithm for vector search says that you wanted to go as deep as implementing an algorithm, right? And what could be sexier than implementing an algorithm? I mean, I don't know.
+Of course, all other things are also sexy, but I'm just saying that it's very complex, very intellectually demanding work. So that's amazing. Thanks so much for this depth.
+And is there something you would like to share or announce, you know, on Mighty, or maybe on something you're going to be presenting at Berlin Buzzwords, I know, as well? Yeah, so I am presenting at Berlin Buzzwords.
+And I am putting together a charity event to hack on vector search. That's going to be on May 5th.
I don't know when this podcast will be published, but on May 5th, I want it to be just an all-day learning session. And I'm not charging money for this.
+It's free. I just want to show people how to use these tools if you're not in the Python world. If you're part of the Python world and you want to join: amazing, great.
+But I want to do an all-day hackathon, where I'll show you how to get these things up and running, you hack away at it, and by the end you'll have a working example of your own.
+And all the while we're going to raise money for charity, specifically around refugees and displaced people, because of the horrible things that are happening in Ukraine and other parts of the world as well.
+Getting some learning happening and also raising money for charity seems like a great way to spend time. So I plan to host that on May 5th.
+It's probably going to be on Twitch, because I want it to be an open drop-in, drop-out format. You can come, you can go. It's not going to be a controlled Zoom, it's going to be on Twitch, with chat and stuff like that. So I'm going to get it all set up.
+Details are going to come out shortly. By the time this is published, maybe the details will be available already. We'll drop a link. Yeah, awesome.
+It sounds amazing that you also keep thinking about these sensitive topics, what's happening in the world, and that you are contributing with your skills to a good cause here. Thanks so much. I will try to publish this podcast before May 5th,
+to make sure that people can join and chip in, of course. We can do the utmost social media push. This is amazing. Thanks so much, Max. I've enjoyed this conversation thoroughly. We went into depth in all dimensions. It's a multi-dimensional conversation.
+So thanks so much, and keep it up.
And I'm curious to hear news about Mighty and the tooling around it, and also looking forward to your Berlin Buzzwords presentation. Yeah, thank you so much, Dima. It's great to chat. Yeah, thank you, Max. Cheers. Cheers. Take care. Bye-bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/saurabh-rai-growing-resume-matcher.md b/transcripts/vector-podcast/saurabh-rai-growing-resume-matcher.md new file mode 100644 index 0000000..80fdce8 --- /dev/null +++ b/transcripts/vector-podcast/saurabh-rai-growing-resume-matcher.md @@ -0,0 +1,127 @@ +--- +description: '

Topics:

00:00 Intro - how + do you like our new design?

00:52 Greets

01:55 + Saurabh''s background

03:04 Resume Matcher: + 4.5K stars, 800 community members, 1.5K forks

04:11 + How did you grow the project?

05:42 + Target audience and how to use Resume Matcher

09:00 + How did you attract so many contributors?

12:47 + Architecture aspects

15:10 Cloud or + not

16:12 + Challenges in maintaining OS projects

17:56 + Developer marketing with Swirl AI Connect

21:13 + What you (listener) can help with

22:52 + What drives you?

Show notes:

- Resume Matcher: https://github.com/srbhr/Resume-Matcher

website: + https://resumematcher.fyi/

- + Ultimate CV by Martin John Yate: https://www.amazon.com/Ultimate-CV-Cr...

- + fastembed: https://github.com/qdrant/fastembed

- + Swirl: https://github.com/swirlai/swirl-search

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20240412_070424_039c919c74991b595a1fa22f4c6cb1dd.png
pub_date: Fri, 12 Apr 2024 19:17:29 GMT
title: Saurabh Rai - Growing Resume Matcher
url: https://rss.com/podcasts/vector-podcast/1434941
---

Hello there, this is Vector Podcast, third season. I'm sure you've been waiting for new episodes. There has been something happening in my family, on the positive side, so I was a little bit distracted in a way, but I'm super excited to have my guest with me.
+It's Saurabh Rai, who is a software developer. He also doubles in DevRel. He has an open source project which is skyrocketing in stars. Yeah, welcome, Saurabh.
+Hi, Dmitry, and it's a pleasure to be on the first episode of the third season, and it's a pretty amazing introduction that you do. So, hey, audience, I am Saurabh. I am a software developer with more than two years of experience. I've been doing a lot of open source projects.
+And there is one more thing that I should also mention, which is also very important, well, at least for me, and I know for you as well: you are the designer on this podcast, and I'm sure that you will be designing this very episode as well. Oh, yeah.
+Yeah, it's going to be fun, like, designing your own, editing your own episode and the banner and so on. So it's pretty amazing, going from design to open source and then here. Yeah, absolutely.
+So, yeah, if you saw some of the designs that drew your attention, you should know that these were done by Saurabh. I'm really excited to have you here. As usual, we start with an intro.
+Could you introduce yourself to our audience? What's your background and how did you get here? Yeah. OK, so I have a background in computer science and engineering, and I came out with an engineering degree right after the COVID second wave hit in India.
+And after that, I've been doing full stack development for a very large corporate company out there. I've probably been there for two and a half years.
+And apart from that, I was pretty much involved in open source projects, vector search, machine learning and AI, all the same spaces that you are in. That's how I found you, and the other amazing team members out there that we have collaborated with and that I've designed for as well.
+So that was an interest that I've kept on, to work towards artificial intelligence, machine learning, vector spaces, vector search, vector databases and all those interesting things. And yeah, all things open source as well.
+So with all that's happened in the last two years with me, I ended up creating this project of mine called Resume Matcher, which started to gain more attention than I initially assumed.
+And that's where it all blew up: nearly 4,500 GitHub stars, a huge amount of traffic on the website and a lot of downloads, maybe more than 1,000 forks. I've got like 800 members in my community. So it's pretty amazing. Yeah, this is insane and crazy.
+I remember we were chatting together about this project, and you said you have this project, which was kind of like, you know, a side interest, something you'd been doing on the side, right? And then we were chatting, and I gave you a small piece of advice to rename it slightly.
+And then you decided to really go public with it. Just an insane amount of growth that you've made there, from a couple of stars to 4,500 stars. How did you do that? I still don't understand.
+So the amazing part with the whole thing was that, as you mentioned, initially it was called resume matching, and it was pretty much focused on the algorithm.
+And when we were talking about it, the idea was to make it more public-friendly, to have a nice intuitive dashboard so that people can interact, apart from the command line stuff that I had before.
+So the whole thing was to create a product that people can use, and that introduced me to something called developer marketing. It's not just that you have the product or you have the code; there has to be something special that you do about it so that it grows.
+I mean, we have probably millions of open source projects on GitHub, and GitLab as well. What makes the difference between something that has 100 stars, or maybe fewer, and something that has 4,000 or more? It's the element of marketing, and a finely polished product.
+So that's the key differentiator between everything. Writing software is essential, but if you don't do the marketing, if you don't do the advertising or evangelize about your stuff, then it probably doesn't get as much attention as it deserves.
+So that was the game changer for Resume Matcher. This is an amazing journey that you've had there. And this is where I want to ask you my next question: can you explain what Resume Matcher does, and how it is relevant or useful to pretty much anybody, I guess, but maybe developers first, right? The target audience for Resume Matcher was developers out there.
+I'm a software developer, so I knew what the challenge was with the whole project. What Resume Matcher does is, it is a reverse ATS tool for your resume. It takes certain keywords from job descriptions and then it matches them with your resume.
+It suggests that you can probably add some extra keywords. An example would be: hey, I am a Java developer, I'm going to apply for this full stack developer job that I know I can do well. The full stack developer job description contains certain extra keywords.
+I have written Java, but the job description mentions it as J2EE, Java 2 Enterprise Edition, or probably another package. Maybe it has Maven, it has Spring, and I have written Spring Boot. There are certain keyword mismatches there, which anybody can make. +That is something that Resume Matcher fills in. It will go out, parse your resume, and match it with a lot of job descriptions out there. It's going to suggest a similarity score as well: hey, this is the job, and this is how much you match. That's where the vector similarity comes in. +You mentioned ATS. What does it stand for? Can you expand it? Oh yeah, for sure. For those who don't know, ATS is an applicant tracking system. There are a lot of them. +What actually happens is that when you apply to a company, anything from a startup to a large corporate company, you submit your resume and it gets ingested by these applicant tracking systems. +These systems take your resume, extract all the keywords, and then they run some similarity scores against the job description. They also do some custom keyword searching as well, but that's on a per-company basis. It's like optimizing your resume, right? I remember I was reading one book. +I will need to look it up to link it in the show notes, but basically it's a book about how you should approach job seeking, and part of it is basically writing your resume, right? +I remember that it actually started in the reverse way, the same way you just explained: it would first ask you to list the jobs that you want to apply to, right? Sort of scope them. +Then you try to summarize what's common in there, if there is some commonality, and then you go backwards to your experience and you try to match these things in a proper way, right? +You even need to start your resume from the key skill that the job ad is looking for, and that's how your resume is going to stand out. +That's an amazing way that you cracked it.
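The keyword-gap idea described here, finding the terms a job description asks for that the resume never mentions, can be sketched in a few lines. This is only an illustration, not Resume Matcher's actual code; the tokenizer regex and the tiny stop-word list are assumptions made for the example.

```python
import re

# Minimal stop-word list, just for this sketch.
STOP_WORDS = {"a", "an", "and", "for", "the", "with"}

def keywords(text: str) -> set[str]:
    """Lowercase the text and keep word-like tokens that aren't stop words."""
    return {t for t in re.findall(r"[a-z0-9+#.]+", text.lower()) if t not in STOP_WORDS}

def missing_keywords(resume: str, job_description: str) -> set[str]:
    """Keywords present in the job description but absent from the resume."""
    return keywords(job_description) - keywords(resume)

resume = "Java developer with Spring Boot and Maven experience"
jd = "Full stack developer: Java, J2EE, Spring, Maven, Kubernetes"
print(sorted(missing_keywords(resume, jd)))  # → ['full', 'j2ee', 'kubernetes', 'stack']
```

Exactly as in the J2EE example above, the mismatch surfaces because the two texts use different surface forms for overlapping skills.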
Basically, I don't know if you've read that book, but... No, I have not. That's amazing. This tool has now how many contributors? It's open source, right? Sorry, I missed it a bit. So this tool is open source, right? Yeah, open source with the Apache 2.0 license. +Yeah, and how many... Part of the journey, it's not just getting stars, and you did get stars, which is amazing, but I think an even more interesting result, at least for me, is that you got contributors besides yourself. +And I think the last time I checked was today; you have, I don't know, dozens of them. How did that happen? Yeah, more than that... So this advice would be more towards the open source companies out there. We've seen a lot of them getting started. Y Combinator has a lot of... +They've started funding a lot of open source companies out there, which is pretty amazing, from vector databases to machine learning tools, generative AI tools, all those amazing things. So the interesting part of this is that trending gets you visibility. +Like, if you are on the GitHub trending feed, it will give you visibility. And there are all these people out there who are looking for their next content. So we have a lot of YouTubers out there who talk about open source projects. We have a lot of blog writers out there who write about them. +Even I have written about it in some of my blogs as well, like the other different tools. So the trending feed gives you visibility. People pick it up and then people tweet about the project. People talk in their different forums. It gets reshared. +So this gives you really good visibility. It probably improves your SEO a bit, I would say. And once you are visible, more people are going to download the project. They will try it out. And if they find a bug, they will contribute a fix. +And some other people are really enthusiastic. They want to learn by hopping into your community.
They will talk about it like, hey, can I contribute? Is this an issue that I want to work on? So the whole thing with Resume Matcher was the same as well. Yeah, it's amazing. +And then I feel like maybe these people also found your project relevant to themselves or maybe to their connections. I was checking your Product Hunt page and I see some comments where people say: hey, this is amazing, I also recommended it to a friend of mine who's looking for a job. +And I think this market, the IT sector, became a little volatile, probably around the world, right? So you kind of kicked in with this project at the right time, right? Yes, I would say that maybe my timing was perfect in that sense. +I would not like to put it that way, but I just wrote the product; I never knew that the market was going to be like that. So yeah, Resume Matcher took advantage of the timing. We're also seeing a lot of generative AI based open source startups. +LlamaIndex is pretty amazing in that sense. I know LlamaIndex and LangChain; I was trying them out when they were really small, when they were just growing. And then after like 12, 13 months, I believe? Well, yeah, around nearly a year, maybe less. Now they're full-blown companies. +They've got investment. They have their own cloud tier. So maybe the whole next big shift is towards open source. Yeah, exactly. Well, getting a bit more technical, can you also unveil some of the architecture decisions you made for this tool? I know that you are using vector search. +Is that right? What are you using? Are you using some library or database? Maybe you started using a database. Can you explain a bit more about the architecture of your system? I use basic tools such as pdfminer, or PDF extraction tools, word and text extraction tools, I would say. +I use that. Then I use libraries such as spaCy and NLTK.
+And yeah, spaCy and NLTK, and code that I've written to combine them and use algorithms to extract chunks of text. And then I use something called, so I was using the Qdrant vector database to do the vector embedding. +But later on Qdrant introduced something called FastEmbed, which is pretty amazing, and it can do the text to vector embedding on the fly. And then using that, well, someone contributed the code to do the cosine similarity for that. +So: extract the text, do the analysis using spaCy and NLTK, and there's also another wrapper on top of spaCy that I use. Get the core of the data, visualize it, and then send it on for the vector embedding. But search is something that I would like to introduce. +Maybe generative AI is also something that I'm working on as well. So the current goal over the next couple of months is to get a dynamic ATS that takes in your resume, takes in the job description, and optimizes it without hallucinating, without adding some extra keywords out there. +Like, as a Java developer, if you haven't worked in the cloud, then you don't want Kubernetes to be suddenly added into your resume. Or maybe you don't want it to say: hey, this guy has worked at some other company and created OpenAI, or any made-up stuff that comes out. +So limiting the hallucination, all of that, is where introducing generative AI comes in. Yeah, yeah, for sure. +And are you also planning to make a cloud version, something that you host for other people to access? Because at the moment, it's mostly self-service, right? People need to download your repository, set it up and start using it. +The challenge with cloud is that cloud is not free. And if I were to introduce a cloud variant, it would have to have a paywall: people have to log in, they have to subscribe and all that.
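The pipeline Saurabh describes, extract text, embed it, then rank job descriptions by cosine similarity, can be sketched as below. The real project uses a proper embedding model (Qdrant's FastEmbed); to keep this sketch dependency-free, a toy word-count vector stands in for the embedding step, and the job texts are made up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. In the real pipeline this step would be
    done by an embedding model; a sparse word-count vector stands in here."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

resume = embed("java spring boot maven developer")
jobs = {
    "full stack": embed("java spring maven full stack developer"),
    "data eng": embed("python spark airflow data engineer"),
}
# Score every job description against the resume, exactly the "this is the
# job, this is how much you match" step described above.
scores = {title: cosine_similarity(resume, vec) for title, vec in jobs.items()}
print(max(scores, key=scores.get))  # → full stack
```

With a real embedding model the vectors are dense and capture meaning beyond exact word overlap, but the ranking step is the same cosine computation.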
+It's just a far-fetched goal that if I want to, I can make a paid version of it. +But what I really want is a downloadable version that people can download and easily access, not just software developers, but maybe everyone out there. Like a macOS app or an iOS app that people can download and start using. Yeah, for sure. That makes a lot of sense. +Yeah, you also mentioned something about the challenges of maintaining this project. Is it challenging to maintain a project that you've been the author of, but that now has so many contributors, maybe also a growing demand for things, with people making decisions and so on? +How do you coordinate this thing today? Is it a challenge? Yeah, that's an interesting question, by the way. It becomes challenging after a certain time. So when you're trending, it gives you a really intense dopamine hit and you're excited about it. +You check out the GitHub feed, you have grown by 200 stars and everything, and I've seen a lot of people joining and talking to you. And then, after the whole phase fades out, there comes a time when I have a lot of pull requests out there, which we have to talk about. +And even when the pull requests are there, you have to download them, test them, and then there could be things where you have to request changes, and it all takes up a lot of time. So it does become challenging, but hey, that's part of the process. +You do the same thing at your workplace as well. So I won't focus too much on that; it's the work that's there. Yeah, it's the same for any open source project. Yeah, exactly. +So it's kind of like a marathon and you need to dedicate some chunk of your day to maintain it, to keep it alive, right? Oh yeah, that's amazing. +You also mentioned that you basically replicated this success of yours; you became like a marketing guru in some sense, right?
+I know that there are even companies with open source repositories that haven't had the kind of success that you've had in just a few weeks. +I mean, I'm just really amazed by this. You also picked up another project, you replicated the same process there, and you grew it to a really big level. So can you talk a bit more about what this project is about? And what's your role there? What do you do? Yep. +So while I was doing Resume Matcher and all this, I was looking for a search engine out there, something that could easily federate queries across different sources and extract information. And the guest has probably also appeared on the previous season, season two, of Vector Podcast. +That's when I found Swirl Search. So I had a talk with Sid and we kicked it off: hey, we have similar interests. He was also into search and embeddings and all that AI stuff, and I was also interested in that. So I joined in as an open source contributor. +We talked about: hey, how can we replicate the success of Resume Matcher with Swirl? It wasn't much of a serious role, I would say, or a full-time role, but I did test whether the principles work somewhere else as well. +So I took that as a challenge, and Swirl Search went from, say, a hundred GitHub stars to somewhere around 1,400 now. +And I did that in around two months, I would say. So it depends on the project and on timing; not everything is going to be resume oriented. It's a more practical and more enterprisey thing with Swirl. +What they do, in short, is that they are an AI connect platform: they connect internal large language models to your enterprise data sources such as Teams and Outlook. So while they have some developer audience, it's pretty focused on something like a niche out there. +Instead of something like ChatGPT, which is generic and everyone can use.
+So it's rather something which will cater to, let's say, five percent or even three percent of the audience out there. So that's that. I mean, that was the challenge, and I talked to the team. +We made some changes, wrote some blogs, and we arrived at a really good, substantial result. So I could say the principles work for all the companies out there. Swirl could be, let's say, one of the extreme use cases: hey, this is catering to enterprise, but it works for them. +So it can work for any generic public-facing project. Yeah, amazing. And please do check out the episode with Sid Probstein, who created Swirl. He's a very driven individual with a lot of experience in search engine development and in software development at large. +Please check it out. Yeah, that's amazing. I also want to ask you: where do you go from here? Do you need some help with Resume Matcher, or do you have enough help but need, I don't know, maybe cloud credits to host it, something like that? +Well, anything, anything else. Okay, yeah, for sure. Any help would be appreciated with Resume Matcher out there. If you can donate to the project, it can help drive the motivation to do stuff. +What would really help, especially, is with the generative AI offering that I'm going to develop: maybe helping to test it out with different AI providers. We have open source models, Mixtral, Google's Gemma, all those things, and then we have ChatGPT. +So I would really appreciate anyone's help in orchestrating the whole project around how to do it in a better generative manner. And of course, cloud hosting. +If there's a vendor out there that can help us with cloud hosting, or someone who would like to sponsor the project, that would be even better, because the website gets a lot of traffic. I can put you up on the website.
+If you'd like to sponsor, you can reach out to me. +I will drop in the details. That's fantastic. I'm sure there will be someone reaching out, or at least checking out the website and using the tool. Yeah, that's amazing. +I also like to ask a slightly more philosophical question at the end of the podcast, which I used to phrase as "why do you do things?", but maybe I will try to rephrase it in this third season. +What drives you when you wake up? Why are you doing this? What drives you in your open source work? Pretty amazing question. What drives me? So for a philosophical question I would like to quote Seneca, the Roman philosopher. +He says life is short and our time on planet Earth is pretty limited, but if you use it well, the same life could be long enough. So it's not about starting with why; you just start with something. And I started with open source. Anybody can do open source. +All you need is a laptop or a computer and to know how to use Git. And it starts from there. You build out a public project. You talk about it. Share the idea. You contribute to other projects. And the whole chain starts from there. +And that's how I found all these amazing people out there, including you, coming up with ways to improve Resume Matcher. I've met some really great, amazing people out there as well. +In the beginning, when I was doing the whole thing, it never occurred to me that this project could go out and become a really trending project. It never did. I mean, it was the wildest, maybe 0.01% chance, that this could go trending. And then it did happen. +And not only did it happen once; it went out and became this whole trending thing multiple times. So that was pretty amazing. So in the beginning, it was: let's just do something, we have time, don't waste it, let's get started. And it actually panned out.
+The more action you take, the better it gets. Yeah, absolutely. So keep on doing this. And also, you are very driven: every time we talk on the podcast, I learn something from you. You give me a couple of links and I start checking them. So it seems like you're always on the edge. That's amazing. +Yeah, thank you very much. Saurabh, I really enjoyed chatting with you. I'm sure we'll connect more. I'm looking forward to the design, the craziest design, that you will come up with for this episode. Of course. And all the best with Resume Matcher and with your blogging. +And I also know you're using a ton of modern tools, right? Like ChatGPT, and you apply them to your work. That's amazing. Yeah, for sure. And yes, thank you, Dmitry. Thank you for having me on this podcast and giving me the opportunity to talk about Resume Matcher. \ No newline at end of file diff --git a/transcripts/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md b/transcripts/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md new file mode 100644 index 0000000..adcf3b9 --- /dev/null +++ b/transcripts/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md @@ -0,0 +1,378 @@ +--- +description: '

Topics:

00:00 Intro

00:22 Quick demo of SWIRL on the + summary transcript of this episode

01:29 Sid’s background

08:50 Enterprise + vs Federated search

17:48 How vector search covers for missing folksonomy + in enterprise data

26:07 Relevancy from vector search standpoint

31:58 + How ChatGPT improves programmer’s productivity

32:57 Demo!

45:23 Google + PSE

53:10 Ideal user of SWIRL

57:22 Where SWIRL sits architecturally

1:01:46 + How to evolve SWIRL with domain expertise

1:04:59 Reasons to go open source

1:10:54 + How SWIRL and Sid interact with ChatGPT

1:23:22 The magical question of WHY

1:27:58 + Sid’s announcements to the community

YouTube version: https://www.youtube.com/watch?v=vhQ5LM5pK_Y

Design + by Saurabh Rai: https://twitter.com/_srbhr_ + Check out his Resume Matcher project: https://www.resumematcher.fyi/

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230722_050704_2b439b236c5d93de6718cfecda81d779.jpg +pub_date: Sat, 22 Jul 2023 05:03:26 GMT +title: Sid Probstein - Creator of SWIRL - Search in siloed data with LLMs +url: https://rss.com/podcasts/vector-podcast/1047952 +--- + +In this episode, you will learn about Swirl, a metasearch engine with large language models for your siloed data. Here you can see how it works on the summary transcript of this episode, created with a tool called ClearWord. +Hello there, Vector Podcast Season 2, and today I'm super excited to be talking to Sid Probstein, the creator of Swirl Search. It's a federated vector search engine, if I'm correct, but I would love to hear more from Sid himself. Hello, Sid, how are you? I'm doing great. It's really great to be here. +Thank you so much for inviting me. Yeah, thanks for joining. I'm sure you are very busy building Swirl, and I'm really curious to learn more about it, amidst all the discussion, you know, about how ChatGPT is going to change things. +You know, is it going to conquer us or whatnot? But yeah, I mean, I'm really interested to hear how you guys are doing, how you guys are building this. And traditionally, we start with your background, because we really want to know how you got here. Absolutely. +And it's been an interesting journey. Swirl is actually the 12th venture I've been lucky enough to work on. I started actually at a free email company called FreeMark Mail. You might remember Juno, our vastly more successful competitor. +It was a great, great lesson in marketing and customer acquisition. +But long story short, you know, my dad was an MIT professor, and he was interested in computers, and somewhere around, it was too long ago, but I was about 12, and I picked up a TRS-80 with 16K of RAM, I think, and a cassette tape for storage.
+And we went to a couple of, actually, we went to two classes together, and then he didn't want to do it anymore, but I stayed with it. And I have always loved getting that computer to do the things that we wanted to do. +And so I guess ever since then, I followed the tech path. I was lucky enough to do my undergrad at MIT. I actually have an MBA, though; I'm one of those MBA CTOs. And mostly I've worked building software and leading teams to build products and services. +One of them was Attivio, which is now actually ServiceNow, which is obviously one of the unicorns out there. They really totally disrupted the knowledge base and help desk space. +And it's an incredible application of interesting core technology, from the beginning, when things were whiteboardy. +I've worked in a couple of other search companies, and with some other search companies. I was lucky to spend a little time with Massood Zarrabian over at BA Insight, and also Jeff Fried; a very cool company. +I know those guys back from FAST, another company that I worked at, now Microsoft. FAST was one of the early players in enterprise search that had an excellent product that scaled, right as Google was sort of becoming a household name and disintermediating everybody. +We had the tool to build the catalog, the e-catalog, mostly for publishers, but then it really spread out and started to affect intranets. And it was truly there that I saw the power of search and how it could change almost everything from the business perspective. +You know, business intelligence and reporting and all of these systems that have been around for 70, 80 years, they're what we settle for. But everybody, you know, from Brin and Page on, right, and way before that, we were all inspired by that Star Trek computer. +Why can't we just ask it? You know, it seems like it's not that hard. And now, of course, not to give away the lede, right?
But now there's definitely something doing that, and it's been a long time coming. But that is not structured data. +Well, let's not argue about the semantics, but it's not what people refer to as structured. It's not database data: metrics and KPIs and sales numbers and things like that. +I think it was really at FAST, and also at Northern Light Technology, which is still going strong, by the way, with some fantastic indexing and search, and now they're doing question answering. The first place I really touched search was at Northern Light. It's the human interface. +And we feel like it should be coming along faster. And now the tech, after many years of indexing and vector search, right? And the advances driven so much by Google, right? Transformer architectures and vectors. That has all come together into a pretty amazing place. +And so, long story short, that background led me to create Swirl, because I noticed a couple of things. It really came down to three things. One is that there are silos, super silos, like ServiceNow. +ServiceNow really did get a lot of the knowledge bases and a lot of the help desk, you know, the ticket base with the streams and tickets. M365 kind of won the files race, at least, right along with email. And I guess they've done very well. +Obviously, a very impressive performance to build Teams into the large community that it has developed. And then there are others, right? There's certainly Salesforce, a great example of where most of the CRM data now lives. Snowflake is another one; you can't really get a copy of these. +I mean, moving the data out from Snowflake is relatively easy, but for the others there's a complicated API there. Salesforce has thousands of tables. So you can't really get that data anymore, and yet it has some of the most important ideas, concepts, and knowledge in your entire company. +So that's when I realized something that had been tried before: metasearch, right, or federated search.
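The metasearch idea Sid is describing, query every source where it lives instead of copying everything into one index, can be sketched as a fan-out over per-source connectors. This is a toy illustration, not Swirl's code: the connector functions, source names, and results below are made up, and a real connector would call each source's own search API with the user's credentials.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub "connectors": in a real metasearch engine each of these would call a
# source's search API (a ticketing system, M365, a CRM, ...). Hypothetical data.
def search_wiki(q):  return [{"source": "wiki", "title": "Vector search intro"}]
def search_crm(q):   return [{"source": "crm", "title": "Acme renewal notes"}]
def search_files(q): return [{"source": "files", "title": "Vector search deck"}]

CONNECTORS = [search_wiki, search_crm, search_files]

def federated_search(query: str) -> list[dict]:
    """Fan the query out to every source in parallel, then merge the results.
    Re-ranking the merged list (e.g. by vector similarity of each result
    against the query) would happen after this step."""
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda fn: fn(query), CONNECTORS)
    return [hit for hits in result_lists for hit in hits]

hits = federated_search("vector search")
print(len(hits))  # → 3, one hit per stub source
```

The hard part, as the conversation goes on to explain, is making sense of the disparate merged results, which is where vector similarity comes in.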
I think metasearch is clearer, because now sometimes people say federated search is about e-commerce federation. +Metasearch was hard to do because of connectivity, right? It could take you months just to get somebody to change a network thing, or to put a VPN in place, or to change permissions. That was expensive in a large enterprise. +But now, especially with public services, pretty much everything has an API. The perimeter doesn't exist the way it used to. And so you can query everything. So that left the problem of: can you make sense of things? And that's of course what we're here about, right: vectors. +The power of vector search and vector similarity, specifically, right, the cosine vector similarity that we use in Swirl to make sense of completely disparate and very, very incompatible results, if you will. And it's shocking how well it works. +When I saw it work, I said, there's more to this than I thought. And now it seems I'm not the only one. So, but anyway, that's a little bit of the story and my background. I hope that made some sense. Yeah, it's a very solid background. +You reminded me of one computer, I don't remember the name, but it didn't have a display the way we have today, right? It just had the keyboard. And then it had the cassette. And so my friend and I were sitting there for several minutes waiting for it to boot. +And then there was some game like Mario or whatever. That was on the cool Apple IIs. I was always envious of the Apple II kids. Because you're right, on the TRS-80, we only had block graphics. It was hilarious. But it did move a little bit faster in a way. +You got to wait a long time for Apple upgrades. But I remember the TRS-80, there was an incredible ecosystem of things you could add to it. So, memory. And then there was a company called Percom that put out disk drives. Wow, a disk drive. +That was a game changer if you played with a cassette recorder.
Although, who didn't love switching your parents' cassettes with the data tape, so they'd put it on in the car and we'd go: okay, are they going to stop and turn that off? It was a hilarious prank. A great way to get some sound. +But then came disk drives, right? First there were the five and a quarter inch, or actually eight inch, then five and a quarter. And at that point the cassette was finally, sort of, replaced, right? Then the IBM PC showed up, and that was a bit of a game changer. +But the Apple always had better graphics. Yeah, absolutely. I just wanted to come back to what you said about federated search and enterprise search. +I remember hearing about enterprise search, was it like 15, 16, 17 years ago, at the university, when one of my supervisors was focusing on it, and he was saying: this is the next big thing, and once it's figured out, you know, we will be rich. But somehow it didn't happen. +And then later in my career, I heard the term federated search in connection to: okay, we have our own search engine, we have a client's data, can we combine the two without needing them to upload their data to our servers? +Because in some cases, they wouldn't trust us, you know, to secure it well enough, and so forth. +And then we were confronted with the fact that maybe it would incur quite a bit of latency, and also, even in the first place, how would we build this? +But, you know, before we even get there, how do you relate enterprise search versus federated search? So I think they're different in that enterprise search is about a realm, right? Enterprise search usually means non-public sources. +And I think it's important to differentiate: the problems of the large enterprise, and even the medium enterprise, are not the same as those of the sort of small and medium enterprise. Maybe that's not a great dividing line.
+But definitely the large enterprise has a very different set of problems. It's so much more about, you know, global distribution and languages and regulation. If you're a, you know, small company like Swirl Inc., we have five people, we can work off of almost anything. +And we don't have the silo problem, because we've just picked, you know, we have four. But it's interesting. We do still have the silo problem, right. +And as I'm going to show you, just when we were trying to find the steering document for this discussion, I realized I was hunting around for which silo I put it in, instead of just going to search. So it's funny that we've trained ourselves to work that way. +I think it's a reflection of the reality that in the large enterprise, it's exactly what you said: entitlements are extremely important. You're talking about crown jewel data, like P&L data, product data, or customer feedback. CRM data is much less sensitive in some ways. +There's also data that you might purchase; it's very common for companies to build and/or purchase data sets and assemble them, or assemble derivative sets. These would be incredibly valuable for lots of uses. +The simplest one, or the most obvious one, is sales: help sales, help partners sell more; at the knowledge companies, help the salespeople better understand their customers or industries. And there's a massive amount of information overload. So the problems are different. +They're acute. They're willing to spend significant money, right, and invest in really creating a better world. I think now, maybe one of the most important trends is that people are not so interested in more search boxes. +They want to build proactive systems that bring people the information that they need. And this has been a long-time thing in sales with things like LinkedIn Sales Navigator, right? A lot of the public data gets harvested and brought to you.
+But think about all of that incredibly rich, valuable internal data, and how hard it is to bring that to people inside the enterprise because of those entitlement lines. +So federated or metasearch is a technical approach. In traditional enterprise search, traditionally, the tool is indexing. +So you take the data from all the sources that you need to query, which, since that's hundreds, if not thousands, inside the large enterprise, usually means you start with a few. +And you extract the data, meaning you pull it all out. Then you have to remodel it, because you could leave it sort of as is, but the odds are high that won't help with search. You need to make at least some of the fields, things like title and body, line up. +So you map those things over, and you have to make sure that the set of entitlements, meaning who's authorized to see stuff, all of that from all the silos, has been aggregated and correctly rationalized and put together. Then you index it. +Indexing is a technical process: creating a structure like the index at the back of most long books, a list of words with basically page numbers, but in this case they're slightly more complex. They might identify the document, the field, and the exact token position it occurs at. +So you have this kind of data structure, and you just have to keep it up to date anytime anything changes. So it's really hard. I have been very lucky to work in search, and FAST was a phenomenal indexing company; it innovated in indexing beyond the pale. Really incredible stuff. +FAST was one of the first companies to do updateable indices; you could actually update them. Then a lot of the stuff that they did advanced vectors. We did it at FAST, but, you know, a tiny bit, right? Whatever the nuggets were, they went on. They went so far with engine development at FAST. +And now it's, by the way, available through the Vespa project, right?
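The inverted index Sid describes, a list of words mapping to the document, field, and exact token position each occurs at, can be sketched in a few lines. This is the textbook data structure, not FAST's or any product's implementation; the sketch tracks (document id, token position) pairs and leaves out fields, updates, and entitlements.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each word to the (document id, token position) pairs where it
    occurs, like the index at the back of a book but down to the exact token."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].append((doc_id, pos))
    return index

docs = {
    "doc1": "federated search queries many silos",
    "doc2": "enterprise search needs entitlements",
}
index = build_inverted_index(docs)
print(index["search"])  # → [('doc1', 1), ('doc2', 1)]
```

The hard part Sid points at is not building this structure once but keeping it current and correct across many changing, permissioned sources, which is exactly what metasearch avoids.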
+If you go to vespa.ai, all that stuff is available, open source too. Yeah, we have an episode with Vespa, you probably know. Yes. He's my hero on Twitter. +So, incredible advances at FAST, and frankly at Attivio, you know, there were a bunch of patents filed. +Some very smart people worked on that problem and came up with incredible ways to interlink data by combining graph and a traditional inverted index, then adding machine learning to that and doing things like predicting the answer to a service ticket. +So there's no end to indexing. It's just hard. That's all. It's just hard. And especially when you want to combine silos. And so over the years, I've bumped into people who have had the multi-silo problem in great numbers. +There is one consulting company that has more than 500 silos, separate installations of Elastic, literally from version two to version eight or whatever they're on now, right? Because that was a standard. +And when they got a JSON data set or a database, or they bought something, or they did a hackathon, invariably the documents ended up in some Elastic with some security on it. +And now some of the variation in partner and case team performance is attributed internally, through surveys, to knowing where to get the data. +If you know, oh, I know to talk to this person, they will have the key to unlock this particular thing that I can then use to say: hey, look at this incredible work we did in your industry before, or look at this incredible work we did for you in the past, right? +A new partner might not know that. +They've done five engagements that were very similar. So it's that kind of, and I think the word is, systematic: people want to be very much more systematic now, because everyone is too busy and there's information overload. So that's really it, to break those lines down. +My view is that enterprise search now really desperately, critically requires metasearch.
It's the only choice. You cannot keep pulling out all of the data; even if you were to desire that, it's very hard to do.
+The old way would basically be to pull all the data out of everything and sort of filter it down. Why not search it where it lives? The data is out there now, and the vendors are doing incredible things.
+I mean, ServiceNow, from where it was years ago to where it is today, it's incredible. There's an amazing team of people working away on that, and that's true of most applications now. Somebody's working on search. It has a nice, high-quality API. So let them do their thing; let them master it.
+But the other interesting thing that makes meta search particularly powerful for the enterprise is that you're always searching on behalf of someone. Right? And that goes with the flow; it goes with the grain of the enterprise architecture.
+You're supposed to query on behalf of a user, and if you do, in theory the app can just maintain the context. It only gets tricky when you start saying, oh, I want to combine these five together at the data level. When you do it at the user level, that's fine.
+Either the user was authorized to see all three or they weren't, or they were able to see a portion of it or they weren't. That's the way things work in the enterprise. So that's the subtle difference, right? To delineate them. Yeah. And why the potential is there: indexing is costly.
+Yeah, and you described it really eloquently, in a way: to some extent, by implementing meta search, you wouldn't need to solve indexing issues, you wouldn't need to solve entitlement issues, right? You kind of use the existing proxies.
+But there is one remaining bit that I'm really curious about. If you look at, let's say, what Google did for web search, they leveraged what you could call a proxy signal.
+Other people created pages linking to more important pages, hubs, and then you invent the algorithm on top of what they created.
+So you still kind of rely on what others did, in a way, right? And so now you have the PageRank algorithm for how you rank the documents, and all of a sudden, that's the breakthrough, and results look a lot more relevant. In enterprise search, you don't necessarily have this.
+Okay, you do have documents that are being created and edited and so forth. But then, as you said, there are a lot of silos, right? And so things get created,
+but there is no single place where you can ask: what happened? What did I miss? What do you have on this topic? And so forth. Just this morning, I was browsing through Office 365.
+They have a single page which shows me all the documents that either I interacted with, or that someone interacted with where I am part of that group. And I can search there. That was actually helpful; it saved a lot of time. But again, it doesn't have Confluence.
+It doesn't have Salesforce. It doesn't have a bunch of other places where it would need to go. So I guess one missing component in enterprise search was how you would rank these documents, right? Because you don't have a lot of signals. You simply have the documents themselves.
+So would you say that vector search now opens up this horizon for us, that it helps solve this problem? Absolutely. And I think if we untangle it a little bit, it gets back to Google. In fact, it goes right back to Google. Google had the biggest data set in history:
+the web, incredibly interlinked. And they did the absolute best job of figuring out how to model that structure. You weren't searching every web page every time you searched. You were searching a structure that, in fact, is a large language model. Right? That's what they built.
+They pioneered it. Or was it the very first one? No, that's probably not true at all.
BERT was an early one that got popular. I don't want to claim, I have no idea, right, what came first. But BERT was certainly the one that was the game changer. It was widely recognized.
+That's where the real popularization of transformer models, I think, came from. And it's that structure. What is that structure? It's a structure that can evaluate results almost independently of the results themselves. You don't have to look at every web page. And so that's the key.
+In fact, you're absolutely right. I think there have been many attempts to do meta search and federated search, even against APIs. But you then end up with just all the results, tiled or whatever it is, but it's just all the results. And that doesn't help with information overload.
+It also doesn't really get to the potential. So the key is, and what Swirl uses, is a large language model. There's more to it, right? There's a relevancy algorithm around it.
+There's a similarity pipeline around it. But at the end of the day, there's a large model that evaluates data as vectors of real numbers, and it allows you to do incredible comparisons. I put this together writing nights and weekends, starting last year.
+And when I started to get results from it, I was shocked, because I did not expect it to be as good as it came out. The thing about relevancy, right? We always say we'll know it when we see it. But building tests around it is very difficult.
+You come up with gold standards. And I love all the tooling; there's so much good tooling around it. But at the end of the day, it all depends, fundamentally, on some finite set of labeled outcomes, right? That's what it is. I found another way to measure relevancy without doing that.
+And the way to do that is to look at how far to the right the term hits are.
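This "how far to the right" heuristic can be sketched in a few lines. My toy illustration, not Swirl's actual scoring code: a hit early in the text contributes much more than a trailing, in-passing mention.

```python
def position_score(query_terms, text, decay=0.05):
    """Score a result by how early query terms appear: hits further to
    the left contribute more; a lone mention at the very end of the
    text contributes almost nothing."""
    tokens = text.lower().split()
    score = 0.0
    for term in query_terms:
        for pos, tok in enumerate(tokens):
            if tok == term.lower():
                score += 1.0 / (1.0 + decay * pos)  # decays with position
                break  # count only the first occurrence of each term
    return score

good = "New York hotels are booked solid this week"
bad = "cities include Boston Chicago Dallas and finally New York"
# the early mention outscores the trailing, in-passing one
assert position_score(["new", "york"], good) > position_score(["new", "york"], bad)
```

The decay constant here is arbitrary; the point is only the ordering it induces.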
When you're using Swirl, because of the large language model we use, it favors hits that are to the left, at the beginning of the sentence. It views aboutness as living at the beginning of the sentence.
+It's extremely good at discriminating, at identifying hits that are merely in passing. So I think we can all agree; I've always viewed relevancy as a bit of a stepped function.
+The absolute top is exactly what I searched for, in the entire field of the title and the body, right? Next are hits at the beginning of the body; we can probably agree that's got to be a good hit, to the degree there's a title and a body.
+And then we all fear the terrible mention, right? The enemy of relevancy is one mention of New York at the very end; it's in some list of cities that has absolutely nothing to do with the Big Apple.
+And that's what I found: the lower you are in the result list, the further to the right your search terms are. The other thing about meta search is that you don't have the documents, so I believe an evidence-based approach is critical.
+Did the silo return the search terms that you, the user, put in? Are they visible in the results? If they're not visible, then there's a question. So the other piece of it is that we use an evidence-based metric, combined with similarity, to rank, and it works.
+Now, that said, here's all the stuff that I just left out. You have to normalize the query, especially if you interpret the query and rewrite it. One of the most important things about meta search is that you can't send the same query to every endpoint. Not all endpoints are equal.
+Some endpoints, for example, might be a database that's really able to target one field at a time effectively. So, for example, there might be a repository that knows about companies. So if your search is "electric vehicle Tesla", don't send "electric vehicle" to it, just send "Tesla".
+So we provide a way to mark that: Swirl has the ability to tag each search provider with what it knows about. So you'd write that as "electric vehicle company:Tesla". Tesla goes just to the company silos; everybody else drops the tag.
+So Google gets "electric vehicle Tesla", which of course it does a magnificent job on. Then you have to normalize the query when you're scoring, and you have to normalize each field as well. Freshness is certainly an interesting thing.
+I found the model also works best if we add a boost based on the topness of the results. So if a repository gave something rank number one, we should probably at least factor that in a little bit. And then, of course, there are things like number of hits.
+And vector similarity is ultimately used, right? We aggregate vector similarities to reflect the evidence level, and the strength of it is captured in the similarity. So a lot went into it.
+It's probably the most awful module in our repo, but somebody smarter will rewrite it soon. It works, though, and that's the important thing. And that is why I'm here today, right? I have exited other ventures because I believe in this so much.
+And I put together a little company that is here to support it. It's 100% open source under Apache 2.0. Clone it, or grab the Docker image, and you can make it run in two lines. And you'll see. Yeah, that sounds fantastic. And I'm sure our listeners will take a look, especially because it's open source.
+It's much easier to, you know, start hacking on it over the weekend or something. I also wanted to ask you, before you show us some demos, and I think this will be really interesting and a change in format for the podcast to some extent:
+you mentioned the similarity aspect of vector search, right? It probably also exists in keyword search to some extent, but there, as you said, we trained ourselves to look at what we see.
And if we see the keyword, we kind of trust it more.
+Although this probably varies case by case. But in similarity search, in vector search, this similarity is at play, right? So what if you cannot find a top result, or even a middling relevant result, and you only find very distant ones?
+How do you detect this, and how do you deal with it? So the similarity will be poor and there'll be no evidence, so the score will be low, and it will end up dropped to the back of the result list. That's the key. Now, there are a bunch of reasons that can happen, though.
+One of those reasons could be that perhaps we do not understand the domain as well as the silo does.
+One example of that is when we're dealing with transformations of entities, very often dictionary-based, or sometimes it's more subtle. One of the things we learned very quickly is about Querqy, an amazing open source package that's used with Elasticsearch, Solr, and OpenSearch, I should say.
+And it rewrites queries. It's kind of the standard for it; it's very common to find it, and it's really amazing. So here, the idea is that the silo knows something that we don't. So we actually have an integration now where we listen to the feedback that comes from each engine.
+So if they highlight hits, for example, we check the similarity, and if the similarity is high enough, we'll honor that. And that's another idea: we want to honor the feedback from each of those silos.
+Now, we're not doing it today, but in the future, why not requery based on expanding our vocabulary around the search? Those are all things that can be done. And by the way, we'd love to get a pull request if someone wants to do that. That honestly is kind of the key to the whole thing. Yeah.
+So you kind of learn from each silo.
Anyway, you have a multiple-voter problem, but you also want to really hear out the signal from every one of the voters, and make sure that you roll this up into the best formula, right? The best representation of this signal to the user.
+That's right. Absolutely. And then you can learn from some of those silos, because you're right, some of them are getting really smart. Just some examples; I'll throw out Vectara. Amazing, amazing, incredible vector database.
+That's probably an inadequate description, but it answers questions on your documents. There's a revolution in vector search. Some people are focused very much on performance, right? Some people are focused on knowledge. Some people are focused on exporting vectors.
+So I think the enterprise, especially the large enterprise, already has dozens of indexing tools and engines and others. And there will be many of these too, special-case ones, right? There'll be some that are incredible at customer service. And some will be incredible at exception handling.
+Some will be incredible at, perhaps, sales pre-qualification. You know, I just sort of learned from past examples. Watson was going to diagnose everything, right? And I think what it ultimately did well was pre-approval authorizations.
+So over time, I think these will clearly all become more automated. But you still need a way, if you're trying to figure out what's new across these silos, to query them all. And that's where Swirl is happy to help. We have an integration with ChatGPT.
+You can query ChatGPT. In fact, by default, if you put your key in, we query it every time, and we rewrite the query. If the query is a question, we just pass it right along. If it's not, we rewrite it using a prompt into something like "tell me about" the thing.
+So you get a summary, right, which we pop up. I think we also have a query processor, so you can have ChatGPT change the user's query.
Like, you could say rewrite this to a Boolean query, or rewrite it, why not, to a vector query.
+But in the end, right, it's going to do that on its own on the back side of things. So when you're trying to solve problems in the enterprise, the key is you need an interface to write to.
+And it would be nice not to have to write code to connect to all these things, getting back to your question about architecture. And so those are the key abstractions in Swirl. With Swirl, you don't have to write code to connect to an endpoint that we already have a connector to.
+You just write a SearchProvider. All you need to know is JSONPath, and maybe be able to read a little of the developer API doc. Right. And then you'll pretty much be able to get the SearchProvider working. Now, what if you need to write a connector? And of course, here's the punch line.
+Well, I think it will probably take you a couple of hours, depending on your skill at Python. But on average, it shouldn't take more than two hours, because I can give you a prompt, and we can teach ChatGPT about the connector class.
+You should be able to get that done in a couple of hours, just fixing up what it produces. I found that about 70% of the time, it will produce a workable connector. It's just fast, with the right prompt. Teach it our connector class,
+give it the right prompt, and bang, you have almost-working code. Yeah, I think this is the best part of systems like ChatGPT: you can outsource this mundane work and really focus on the actual thing. I was actually blown away myself,
+and to some extent scared, a few weeks ago, when I just said, hey, can you create Python code which will talk to the TomTom search API, the map search API. And it did create it. It just asked me to insert a token, so I subscribed for a developer token. But I was really blown away.
+I would have spent probably several half-days, here and there, figuring things out, right? If it wasn't TomTom, then some other API.
But yeah, I think this is amazing. And I believe that you guys are documenting a lot,
+but if you haven't yet, just explaining this in the docs, I think, could save a lot of time for developers. But I was wondering, maybe you would like to show us a demo of Swirl, and then we'll dig deeper into that. Absolutely. So let me share my screen.
+So hopefully you can see my screen. Yes. So this is Swirl. Actually, I'll start here. This is the Swirl repo. Everything you need to get started is here. The README describes pretty much the two commands you need to run to get Swirl running if you have Docker.
+There are more detailed instructions if you want to download it. Everything that you'll see here runs; we have automated tests against everything, and a whole CI/CD environment. And support, I just want to be clear, is free.
+Please just join our Slack channel, and we're happy to help anytime, anywhere. Now, when you get Swirl installed locally, as I have it, you'll get this nice homepage. But ultimately, what most people want to see is the UI. So this is Spyglass.
+It's an open source UI produced by a sister company, KMW, run by an actually long-time friend, Kevin Waters. And he's a long-time committer and contributor to the open source community as well. So Spyglass is a great starting point for building user interfaces. It has a lot of the key building blocks.
+And so here: yesterday, you sent me a document to use. And I admit, today I was going, where is that document?
+And I eventually said, okay, it's in Microsoft Outlook, and I found it.
+But I forgot that I could just search, because one of the great things coming out in Swirl version two, which is going to drop next month, in May, is full M365 support. So you can do the OAuth2 dance. And here I've actually searched through my M365.
+And here's my acceptance of your meeting, actually, and some other references to it.
And then here, document number four, a document shared with you: "vector podcast". So if I had searched, it would have been the fourth hit, above the fold.
+And I actually haven't done the relevancy tuning on email or OneDrive yet, so it worked well enough to come up. But what I think you can see, again, is that the matches are early in the document; it favors them.
+First of all, of course, it likes both terms together, but, with some exceptions, it favors the term that's to the left. And so you can see there were a lot of results, but only a few really ranked high. And that's the key, right? I scan it; I'm pretty much done now.
+And I can say, you know, I probably want to go look in my email or my OneDrive; that's more than likely where it is. And I can go and do that very simply. Right, there we go. Now I have it in the top three. But the power of meta search is more than just that.
+Let's do that. Is it like a Django app, or? Yes. Yeah. So the stack is RabbitMQ, Django, Python, and Celery, although we're not using too much Celery, and SQLite or Postgres, with a lot of packages. We use NLTK, spaCy, JSONPath, some others.
+So now, here I am running my "electric vehicle company:Tesla" query. This is an earlier version of the software, so you're going to see one bug here, which is that you'll see the emphasis tags instead of having them render, just because I reloaded the older version.
+But here we can see a lot more sources than just, you know, enterprise sources. In particular, one of the things that the Swirl adaptive query processor does is rewrite this query. Most repositories will get the search "electric vehicle Tesla", with the company tag removed.
+However, the company funding database in BigQuery, which I just fixed, will actually only get the query "Tesla".
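The routing just described, a tag steering one term to the providers that understand it while everyone else gets the query with the tag stripped, can be sketched like this. This is my illustration of the idea, not Swirl's adaptive query processor, and the tag syntax handling is simplified:

```python
import re

def route_query(query, providers):
    """providers: dict of provider name -> set of tags it handles.
    A token like 'company:Tesla' goes only to providers tagged
    'company'; everyone else gets the query with the tag stripped."""
    tagged = dict(re.findall(r"(\w+):(\w+)", query))   # {'company': 'Tesla'}
    plain = re.sub(r"(\w+):", "", query)               # 'electric vehicle Tesla'
    routed = {}
    for name, tags in providers.items():
        hits = [value for tag, value in tagged.items() if tag in tags]
        routed[name] = " ".join(hits) if hits else plain
    return routed

providers = {"funding_db": {"company"}, "google": set()}
routed = route_query("electric vehicle company:Tesla", providers)
# funding_db gets just 'Tesla'; google gets 'electric vehicle Tesla'
```

The provider names here are placeholders matching the demo's scenario.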
So if we now look at the results, you know, we'll see fairly traditional, high-quality content here about electric vehicles, with Tesla favored early on.
+So, for example, it loves this hit with Tesla right at the beginning of the body. Most of these, I think, are pretty good hits. And here's a database hit. This is from BigQuery; it's a company funding record. So Tesla Motors raised a large Series C back in 2006.
+This is an old database of funding records from Kaggle. Now, a couple of things I want to point out on the fly: Swirl allows you to turn a database record into a sort of pseudo-document. You can actually just write this as a Python expression and use braces to refer to the fields.
+And I'll show that in a second. In addition, though, Swirl has a fixed schema: url, title, body, date_published, date_retrieved, and author. But it also has a payload field. The payload field can hold anything, and by default, anything that you don't specify a mapping for goes into the payload.
+You can also say, please don't put anything in the payload. So here, the fields are also repeated as data items, so that if I want, I can extract them individually. And the idea here is you have a normalized record that reflects the top relevancy items.
+So you know whether or not you should go deeper, and the payload will have anything extra that you might need to make that decision. So, for example, if we look a little further down, here's a result from nlresearch.com.
+That's Northern Light, the company where I learned a lot about search; it was really the first search company I worked for. Still going strong. One of the things they do is extract super high-quality news from the web.
+And they field it and classify it, and you can do really rich searching. So here is an article they pulled together that, basically, is not so much about the electric vehicle market; it's about Tesla. So it ranked a little bit lower.
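The fixed-schema-plus-payload idea described above can be sketched as a simple mapping step. A hypothetical illustration, not Swirl's actual mapping code: named fields are copied into the normalized record, and everything unmapped lands in the payload.

```python
def normalize(raw_hit, mappings):
    """mappings: normalized field name -> key in the source hit.
    Anything in the source hit that isn't mapped goes into 'payload'."""
    record = {field: raw_hit.get(src, "") for field, src in mappings.items()}
    mapped_keys = set(mappings.values())
    record["payload"] = {k: v for k, v in raw_hit.items() if k not in mapped_keys}
    return record

raw = {"link": "https://example.com/a", "headline": "Tesla raises Series C",
       "snippet": "Tesla Motors raised ...", "funding_round": "C"}
hit = normalize(raw, {"url": "link", "title": "headline", "body": "snippet"})
# hit["payload"] keeps the extra 'funding_round' field for later inspection
```

The source field names (`link`, `headline`, `funding_round`) are invented for the example.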
+In this case, there were some other ones that ranked higher. They have some nice data that we like to capture and put in the payload as well. So this really is the core of Swirl. And as you say, it has things like facets. For example, we use YouTrack internally to track issues.
+So if I want to just switch to those, it'll bring just those up. Oh, looks like I goofed on that one. Another thing you can do when you're running, oops, just a second. Another thing you can do: we have the concept of mixers. Not for drinks, but for results.
+You can mix the results up. By default, we do it by relevancy, but you can specify different mixers. For example, the date mixer is date-sorted, and it hides anything that doesn't have a date_published.
+The round-robin mixer, on the other hand, still sort of honors relevancy, but it just takes one result from each source. So you get a cross section of the results. So here, for example, just looking at the top five: the best result from each silo, right here at the top.
+And of course, here I'm arguing a little bit about the relevancy of this, right in one of our support tickets. So you see everything kind of just brought together for me, and I can decide which things I might like to do.
+Yeah, and maybe, I'm just commenting as we go, but maybe visually it could also show where each result comes from, right? Because you do have the sources on the left. Yes. So it could actually say, this comes from here, this comes from there. But again, the combined view is also excellent.
+It's just if you needed to know, right? If you need to know, where did I get this from? That's right. So we do keep the source in the result here, along with whatever the source tells us the author is. However, in this version, we didn't get to one thing:
+we like to report the original rank. So you should see it on this ITnews result from NLResearch here.
It's the number one result in the 2.0 version. Actually, there's a new version that's coming out; I think we're going to just do a bug fix on this. The latest version, 10.1, which is in the repo now, fixes that and a couple of other issues. So if you just get the newest, you'll be good.
+In 2.0 we have a little bit of a new treatment for this that I think you'll like a lot better. But before I jump to that, you asked me a really important question.
+Right? So this UI is great, and it will evolve. It's here so that you can show the power, right? And we ship it integrated. But from a developer perspective, none of this is super helpful, right? How do I integrate this with an existing UI? So that's what I really wanted to show you next.
+So first, how do we connect to something? The answer is a SearchProvider definition. So this definition right here is a text record, mostly JSON, and mostly just strings.
+This configures our out-of-the-box RequestsGet connector to query a search provider, in particular this Google Programmable Search Engine that I put together. And actually, we ship with three of them preset, and please feel free to share our keys.
+We're happy to; we want to make sure that something is working for everybody, right out of the box. So further on in this are the things you'd expect. You configure it by providing a URL, and you can construct the URL by pulling in fields from the query mappings.
+So the only thing that ever really changes in a Google PSE is the CX code. Everything else you can just copy and paste, and you can put dozens of them in. Also here are some of the important system settings that help the system work, that help us process this.
+So we have four different processing pipelines built into Swirl. One is a pre-query pipeline that runs before federation. And then there's a query processing pipeline that runs for each connector, or I should actually say SearchProvider, which is a configured instance of a connector.
+Then each of those also has a result processing pipeline, which transforms the results from the source into our normalized format. And then there's post-result processing, which does things like relevancy ranking, where you want all of the data. And they're all different.
+By the way, there's an object model behind Swirl, so creating these things is really simple. There are different base classes for them, and they set you up with everything you need. So essentially you come in, and you have a Python model, or I should say a Django model object, to operate on.
+All you have to do is change it, exit, and you're done. Simple, simple. Also, we map out the different query capabilities of each provider in the query mappings. So how do you tell a given endpoint to sort by date? This is how you add that to the URL. How do you page the results? This is how.
+Result index is a Swirl capability where we can provide you with the index number; you can also use result page, so the count or the page that you want. And here's an important one too: the NOT character. Does the silo support NOT as a term? This one doesn't. It does not support NOT as a term,
+but it supports a NOT character. So as an example, now if I go to the search object, I can run a search. I'll run it for knowledge management. Actually, I'll just let that one run for a second. There we go. I got my ChatGPT error; I have the wrong ChatGPT API key. But that's okay.
+Everybody knows what it would say about this stuff. So actually, the query I really want to do is "Elon Musk NOT Twitter". A perfectly legitimate query, right? What's going on in Elon Musk's world that's not related to Twitter? Now, here's the thing: Google PSE will not understand that query.
+Everybody says, what, Google doesn't understand NOT? No, web Google does, but Google Programmable Search Engine does not honor NOT. And in fact, just to prove it: pse.google.com. By the way, before I talked to you, I didn't know this system existed myself.
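Pulling the pieces just described together, a SearchProvider record has roughly the following shape. This is an illustrative sketch from memory, not a verbatim record from the repo: the field names and the exact query_mappings syntax should be checked against the Swirl documentation, and the CX code and key values are placeholders.

```json
{
  "name": "Example Google PSE",
  "connector": "RequestsGet",
  "url": "https://www.googleapis.com/customsearch/v1?cx=YOUR_CX_CODE&key=YOUR_KEY",
  "query_mappings": "DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT_CHAR=-",
  "result_mappings": "url=link,body=snippet"
}
```

The `NOT_CHAR=-` entry is the capability flag discussed here: it tells Swirl this provider negates with a minus rather than the NOT keyword.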
Oh my gosh.
+For slicing up the web, it is incredible. I mean, it takes two seconds to build one, right? And you just give it examples. So here's the thing. You can go to the public URL for one of the programmable search engines I put in, and I'll do the same exact query: Elon Musk NOT Twitter.
+Okay, so the very first result has Twitter in it, right? It's right there. In fact, the second result also has Twitter. Google Programmable Search Engine is not going through the full Google parser, and it does not honor the NOT. However, if I write it this way, it works perfectly:
+the minus syntax works. Okay. So now, when we look at this definition, it says the NOT character for Google PSE is minus. So now, if we look at the search I ran, let's look at the search object. It's another object inside Swirl.
+Why is there a search object? Because in meta search, it takes a few seconds to get the results from everything, and you may want to look at that data over and over again.
+In fact, one of the cool things you can do in Swirl is set the subscribe flag; Swirl will then recheck for new results every so often, update, and mark them new. So it's alert mode, if you will, or subscribe mode, as we like to call it.
+So let's take a look at the search object. This object contains, for starters, a block of messages that explain exactly what was done to the query. And here you can see the adaptive query processor rewrote the query for Google PSE from "Elon Musk NOT Twitter" to "Elon Musk -Twitter".
+So this way we guarantee you're going to get the right result, not a bad result. Oh, and our relevancy model also checks: if you have a NOT-ed term in your query and we find it in a result, we drop that result to the bottom and actually put a special flag on it. We say this was a bad result.
+Most of the others, though, frankly, just don't handle NOT. YouTrack doesn't handle NOT at all.
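The per-provider NOT handling just described can be sketched in a few lines. My illustration of the idea, not Swirl's adaptive query processor itself: a provider with a NOT character gets the term rewritten, and a provider with no negation support gets the negated term dropped entirely.

```python
def adapt_not(query, not_char=None):
    """Rewrite 'a NOT b' per provider capability.
    not_char='-'  -> 'a -b'   (a minus-syntax engine, PSE-style)
    not_char=None -> 'a'      (provider can't negate: drop the term)."""
    out, negate_next = [], False
    for tok in query.split():
        if negate_next:
            negate_next = False
            if not_char:
                out.append(not_char + tok)  # e.g. '-Twitter'
            continue  # no support: silently drop the negated term
        if tok.upper() == "NOT":
            negate_next = True
            continue
        out.append(tok)
    return " ".join(out)

assert adapt_not("Elon Musk NOT Twitter", not_char="-") == "Elon Musk -Twitter"
assert adapt_not("Elon Musk NOT Twitter") == "Elon Musk"
```

This mirrors the two behaviors in the demo: the rewrite for Google PSE and the removal for YouTrack.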
So for that one, we remove it completely and just say, go and give us what you've got. And for others, we probably would have left it in.
+Looking at the results, there's also an info block. This is all JSON, so it's straightforward for a developer using Python; it's little lists and dictionaries. There's a block that describes what each of the different sources gave back. Easy to parse if you want to build on that.
+You have a filter URL, so you can construct your own facet display and jump to any given provider. We actually give you the query that we ran, so if you want to check the results, assuming you have the right credentials, there are the results. I can actually go look at, and modify, my JSON.
+And then, as you would expect, there's a summary of what was found. So here's what we actually searched: the overall query. If you want to rerun, update, or rescore a query, you can do that right from the result list. Those links are available.
+We summarize the federation results and the timing, and give you the next page of results. Everything is stored in Swirl, so you can page through. By the way, you can also set a retention or expiration factor if you want, so results will simply disappear. For secure applications, you can even set it
+so there's no storage at all. And then, the results. So from a developer perspective, literally, I'm going to extract the results dictionary, or sorry, the results list, from the structure that I get back when I call it. And I'm going to iterate on that, and each item is a dictionary.
+It's a flat dictionary with the things you would expect, pretty much, right? Title, URL, body, date_published, date_retrieved, and author. Everything else is meta-information: which SearchProvider responded, what the rank was, our score.
+There are various techniques to turn that into a probability or a confidence level, if you would like. We may do that in the future; I think if people wanted it, we'd love to hear about it.
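From the developer perspective described above, consuming the response really is just lists and dictionaries. A hypothetical sketch with a mocked-up response: the result fields follow the fixed schema the guest lists, but the exact response layout and meta-field names (like `searchprovider` and the score key here) are my guesses and should be checked against the API docs.

```python
import json

# A mocked-up response in the shape described: an info block per source,
# plus a flat 'results' list using the fixed schema.
response = json.loads("""
{
  "info": {"Google PSE": {"found": 2}, "BigQuery": {"found": 1}},
  "results": [
    {"title": "Tesla raises Series C", "url": "https://example.com/1",
     "body": "Tesla Motors raised ...", "author": "",
     "searchprovider": "BigQuery", "rank": 1, "score": 1834.0},
    {"title": "EV market grows", "url": "https://example.com/2",
     "body": "Electric vehicle sales ...", "author": "",
     "searchprovider": "Google PSE", "rank": 1, "score": 902.5}
  ]
}
""")

# iterate the flat dictionaries, as described in the conversation
for hit in response["results"]:
    print(hit["searchprovider"], hit["rank"], hit["title"])
```

In a real integration, `response` would come from an authenticated HTTP call to a running Swirl instance rather than a literal string.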
I think for now, though, people seem to be very happy just with rank.
+Most importantly, and really, this is what Swirl's ultimate value is: we explain exactly why the result matched and why it scored as it did. So, for example, in this case, of course, there are no stems for a name, but basically we use NLTK, and we stem to maximize recall.
+Then you'll see the actual extracted hits, the actual hit, not the lower-cased, tokenized version, right? So we extract the actual hit. And then we produce the score, which is the cosine similarity between the query and the text around the hit in the result.
+So we more or less sentence-tokenize the result that we get, and then we're basically trying to stay within that sentence and see how relevant it is.
+And ultimately, since we are sending different queries to different systems, and of course different systems have different result lengths on average, we do adjustments for both of those. We also give you the exact token locations for everything that's hit, so you're ready to rank from there.
+Wow. So much is done behind the scenes here, and so much is simplified on the other side, the outer side. That's amazing.
+And how many systems do you support, or which systems do you support out of the box today? So I'm happy to say we have connectors to all of the major open source search engines, including Solr, AWS OpenSearch, or opensearch.org I should say, and Elasticsearch.
+We also support the main open databases, Postgres and SQLite, and some of the more traditional cloud ones, Google BigQuery for example. And we are in the process of adding, as I mentioned, M365. Also, as of the last release, you can connect to Atlassian using our RequestsGet connector.
+You can connect to YouTrack. So for many of the sophisticated repositories, you can actually just use the RequestsGet connector to talk to them. And M365 and Slack are coming in our next release, which is next month.
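Returning for a moment to the scoring step described above, cosine similarity between the query and the text around the hit, the core comparison can be sketched like this. This is a toy illustration with a bag-of-words stand-in for the embedding; Swirl's actual model and pipeline differ:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a
    sentence-embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "electric vehicle Tesla"
on_topic = "Tesla is an electric vehicle maker"
in_passing = "a list of cities ending with New York"
# the sentence about the topic scores higher than an unrelated one
assert cosine(embed(query), embed(on_topic)) > cosine(embed(query), embed(in_passing))
```

The sentence-tokenize-then-compare step in the conversation would apply this comparison per sentence around each hit.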
Well, I think especially Slack, or any messenger that has these kinds of APIs you can utilize, that's going to be a big thing in my opinion, because so much is happening in Slack or similar platforms. So much knowledge is written there in public channels, or in your own direct messages, right? If it's possible to access them, then I think this is amazing.

We even support Microsoft Teams in the next release: full search of messages, and also all the shared objects, depending on configuration.

And if you're familiar with the M365 OpenID Connect infrastructure and sort of that ecosystem, it's entirely under the deployer's control. Swirl is just software. I mean, we have a hosted platform which you get connected to, but the permissioning, all of that, is actually done on the owner's side. And you can turn it off in one second for any reason you're uncomfortable.

But Swirl 2.0, again, which will be coming out next month, has all of the OAuth and OIDC capabilities, so that you're really just connecting your Microsoft account and searching through that stuff. And there are no other user interfaces or IDs or anything like that. It's all seamless. And again, it's all completely controlled by the deployer inside that M365 tenant, and the owner.

Yeah, fantastic. Is there something else you want to show on this demo? Or do we want to go back to video-and-audio mode, for those who are listening only?

All right. I hope that was more than enough. There's a ton to show; I just wanted to give a little flavor for it. In particular, we're really focused on making this easy for developers. That's the current audience. I think there's lots more we can do in the future. But if you want to add a bunch of sources or solve a multi-silo search problem, that's what Swirl's intended to do.

That's... It's amazing. It's amazing. And how do you see the clientele? Like, what is the ideal client for this system?
How do you want to interact with these clients? And how do you see... or maybe you've already experienced, you know, the first steps to succeeding on this path?

So honestly, people who are using it today are doing three things with it. And I'm super curious, right, as to which of these will evolve.

I think the most basic, or the most obvious, use case is one search box to rule them all, pardon the Lord of the Rings reference. But honestly, that's been so hard. If you've done a lot of enterprise search projects, normally, for the initial scope, it's expensive, and it takes about a year or whatever; you get a couple silos in place, and things are good, and people like it. But adding silos over time is super costly, and it's hard, and this is the way to do it. You have a great existing search index, you have a search UI, awesome. Connect the search index to Swirl, and connect your UI to Swirl. Now you can add a whole bunch of other sources and get great ranking, and you don't have to change the UI necessarily. For the most part, every search UI has URL, title, and body, and maybe a date. So for starters, you can just take those. And if you have more, right, if you want to do a source facet, that's cool.

From there, I think people with Python, right, Django, experience, who want to take this and tailor it, we'd love to help, we'd love to hear what you're doing. Again, please, the Slack support is all free, just join the community and get in there, and tell us what's going on, or ask. And I think there are lots of other people who are working with it too, who are starting to answer questions and things like that.

The second thing, though: there are definitely use cases where people really want to monitor multiple sources and push notifications out, like to Slack, and to Teams, and things like that. That's a very different model.
I don't know if that's for everybody, but I think, in a way, that's the future. Right, we shouldn't have to ask, when going to a search box takes time, and then I still have to parse the results.

Swirl doesn't do any profiling or anything like that, but depending on what you know, as the builder of search apps, right, or insight apps, you should be able to target them. The barrier is usually not what we know about the user, right? Since they're an employee, we might have knowledge about their skills; we probably have access, in theory, to some other information about their job function and department and who they talk to. So it shouldn't be that hard, but the problem isn't knowing that stuff. The problem is saying, okay, well, how do I get content, right? How do I get that out? So again, hook it up to Swirl.

Build a watch list, which can be essentially a group of queries, a set of search objects with the subscribe function turned on, for a bunch of topics. Push that data out to the people who need to know. Create groups; use service accounts to search, as opposed to using individual users, right? Targeting individual users is not super valuable for proactive delivery, but on a group basis, very valuable.

So, right, create an industry feed. If you really know where to get the best industry data, why not make that systematic? Why not make that data available to everybody who's out there trying to talk to those folks, through whatever, through their mobile?

And this is the thing: trying to do end-to-end enterprise search is super hard. You've got to get people to adopt your solution. Why would you want my mobile app? You probably already have a cool one. You might already have five. So it's all about just putting that data out there so people can keep building fast. That's it.

Yeah, this is amazing.
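The watch-list pattern described above, saved queries with subscribe turned on, pushing new results into Slack, can be sketched as follows. The watch-list entries and webhook URL are placeholders of our own; the payload shape follows Slack's incoming-webhook format, and the actual HTTP POST is shown as a comment so the sketch stays self-contained.

```python
import json

# Hypothetical watch list: a group of saved queries whose new results
# should be pushed out to a channel (the query strings are made up).
WATCH_LIST = ["mortgage exception handling", "competitor pricing"]

def slack_payload(query: str, new_results: list[dict]) -> str:
    """Format new metasearch results as a Slack incoming-webhook payload."""
    lines = [f"*New results for:* {query}"]
    lines += [f"• <{r['url']}|{r['title']}>" for r in new_results]
    return json.dumps({"text": "\n".join(lines)})

payload = slack_payload(
    WATCH_LIST[0],
    [{"title": "Updated exception policy", "url": "https://example.com/doc"}],
)
print(payload)

# In a real deployment you would POST this on a schedule, e.g.:
#   req = urllib.request.Request(WEBHOOK_URL, payload.encode(),
#                                {"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```

Run from a service account on a schedule, this is the "push, don't ask" model: the group gets the data in the workflow it already uses, instead of each person re-running the query.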
I mean, you simplified a lot in how you presented it, and you solved so many, not edge cases, but really challenging things that are showstoppers sometimes. You know, like, okay, I have this existing search app, it's used within my department, and I just want to add one data source. Now, what do I do, right? Do I really need to change my UI? Do I really need to rewrite the back end and things like that? So when I introduce Swirl, will it actually sit in front of every search back-end call, between the UI and the search back end?

That's how I'd do it now. And that's how we're setting it up; we use it internally, and that's the way to do it: rather than querying an index, you just query Swirl and have it query all of those things. And what you get is the best results from across all sources.

Now, that's no substitute, though, for going into the silo. Sometimes you need to go into the silo. In addition to a great search API, they have a lot of business logic on their side, like query synonyms. There's a lot more. You probably want to view the object in their environment versus in Swirl. We could create a copy of it or whatever, like everybody else does. We don't. If somebody wants to do preview, you know, there are so many technologies to do that, but why?

Instead, I think the best thing to do is, after the user has scanned the shallow results that Swirl gives you immediately, in two or three seconds, that's nothing compared to the time it takes to go to each silo. After you've done three silos, you're already way ahead, right? But then say, okay, look, it's obvious to me that the best results here are maybe in OneDrive in this folder, or maybe in this Teams chat or these Teams chats. So now click, go into that environment, and hopefully you can then, right, traverse the data and get what you actually need.
And down the road, those repositories will be serving up answers, right? We haven't mentioned ChatGPT much, but I assume you've seen the Microsoft Copilot demo. How long before that's pushing the data back, as opposed to you asking for it, right? It's saying, oh, here's the summary you need today. If you knew what to tell it, it could probably do that for you. So I think that's the new landscape.

The much more important thing than the one search box to rule them all is to use the power of metasearch to connect systems together and deliver information to the stuff you have already, to the workflows that already work and make value. Whether that's Slack, or a newsletter, or a notification to a Salesforce queue, that's what you should do. The world doesn't need another search UI.

Yeah. Especially, like, today I saw a message on Slack from one of the senior managers saying, hey, what's the password to this thing? And I can imagine that with their busy schedules, if they don't have access, they don't have the password right now, they will switch to another topic. But maybe this topic was still important, maybe even more important; they just don't want to wait. And what you're saying is that in principle, they could have configured it once and accessed it as many times as they need.

Exactly. Exactly. And it's not uncommon in the world of, you know, consulting, strategic consulting, tech strategy, that the most powerful people are analysts and admins, because partners are very busy, right, talking to clients and solving client problems and finding new ones. So they rely on those folks to have access to all the systems and to go scour them. And of course, that's a waste, right? Probably nobody loves scouring those silos, but even more, we cannot be 100% systematic all the time.
But with technologies like metasearch and push technologies, and there are a million things you could use and a million ways to deliver those things, the opportunity is really there to let those people work on something else, right, to create value in other ways and not just be scouring everything for whatever's relevant.

Yeah, absolutely. And how do you view the problem, or do you think it's a problem at all, of evolving such a search engine? Like, if I have domain experts who could actually label results for me for these queries, could I somehow integrate this into the process with Swirl?

Absolutely. So that's actually a nice lead into the third use case that people are starting to look at with Swirl. Exactly what you said: maybe I'm trying to build the ChatGPT of my business, okay? Maybe it doesn't even have to be that, maybe it's something even simpler. How would I automate handling of an exception when processing a mortgage, as an example? How could I automate that? That's really hard. That is probably not a rules-based system. But it's exactly what you said: I need labels, right? So you're going to have your humans go scour, whatever, the various locations, Slack and Teams and various products, and hopefully they find them and label them.

Why not use metasearch for that? If you can metasearch those things and use the language model, right, to basically say, I'm going to label anything over a certain score as being about this thing, then I give it a bunch of labels, let it go find those things, and pull the documents, because you will need the documents. The difference between pulling documents and searching documents in M365 is one permission.
So today, right, if you install Swirl against M365, against your tenant, you are granting permission for Swirl, on behalf of some user, to search through the OneDrive files. So you could also grant a permission to fetch those files. So use Swirl to find the documents that are about the exception handling, across silos, and label the ones that are above a certain threshold. Perhaps you could display those in a UI and let the analyst check the labels; you could use a cool tool like Prodigy, as an example, from Explosion, the same folks who make spaCy, which is what we use in Swirl.

And I think from there, if you trusted the labels, if the labels were good enough, you could actually do your first run: hold out 25 or 40 percent or whatever your preferred number of the labeled results, build the machine learning model with the rest, and then test with the holdout set; if it's bad, build a confusion matrix, et cetera, et cetera. There you go. And at least now you're reviewing and refining and adjusting the threshold, as opposed to starting with hand labeling of data.

Yeah. That's a great application for metasearch and language models.

Exactly.

And you've explained it basically in the most straightforward way: machine learning model training, testing, validation, right? The search world doesn't escape from that. I think this is amazing. You chose open source as the model for your product. You have some thoughts on this. I really like this question; I think I asked it to Bob van Luijt from Weaviate as well. You know, why did you guys, looking at your competition, let's say Pinecone didn't choose open source, for reasons that are valid for them, but you guys did. What makes you believe this model works better?
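The weak-labeling workflow just described, label anything over a score threshold, hold out a slice for review, evaluate with a confusion matrix, can be sketched end to end. The scores and "human judgments" below are fabricated for illustration, and a real pipeline would fit an actual classifier on the training slice rather than just counting.

```python
import random

random.seed(0)

# Fabricated metasearch results: a score (as a ranking engine might
# produce) plus a hypothetical human relevance judgment for evaluation.
results = [{"doc": f"doc-{i}",
            "swirl_score": random.uniform(0, 1000),
            "truly_relevant": i % 3 == 0} for i in range(30)]

# Step 1: weak-label anything over a chosen score threshold as positive.
THRESHOLD = 500.0
for r in results:
    r["label"] = r["swirl_score"] >= THRESHOLD

# Step 2: hold out ~25% for analyst review/testing; train on the rest
# (a real pipeline would fit a model here instead of just splitting).
random.shuffle(results)
holdout, train = results[: len(results) // 4], results[len(results) // 4:]

# Step 3: confusion matrix of weak label vs. human judgment on the holdout.
tp = sum(r["label"] and r["truly_relevant"] for r in holdout)
fp = sum(r["label"] and not r["truly_relevant"] for r in holdout)
fn = sum(not r["label"] and r["truly_relevant"] for r in holdout)
tn = sum(not r["label"] and not r["truly_relevant"] for r in holdout)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

The point of the sketch is the loop: if the matrix looks bad, you adjust the threshold and re-run, which is far cheaper than starting from hand-labeled data.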
Because in some sense, it does require a lot of public-facing work, right? You need to explain, you need to document, you need to review pull requests, with all the goodies that come with that, of course. There's extra work involved, but you definitely get some benefits. What is your thinking here?

The truth is, I've been an open source person forever. I just believe in it. Whether it was Jeff Hammerbacher's amazing comment about how it's too bad that everyone's spending their time on clicks, right, and his belief that the data science approach benefits hugely from open source, that's so true. Joseph Jacks, the notable VC, right, has written so much about how it's open source software that's really eating the world; it's eating at a considerably higher rate. And the reason, I think, is a few things. One is trust.

You know, during the pandemic, I think the large enterprises saw a lot of promising young ventures just not make it. And if you bet on one of those technologies, you probably didn't get the technology. Or maybe you did, right? I don't know, but there was a certain amount of risk involved in that. And open source addresses that: although I don't think people want to take the code and run with it, they want to know that they could if they had to.

The second thing, though: the trust is much deeper when you have a commercial company that supports open source, the so-called commercial open source model, because it does require that public investment, that public discipline. We're all about people using it. There's no sales; nobody has that title. We're here to make people successful using it. And I'm not sure, to be honest with you, of all the different ways it's going to evolve, but we want to evolve in line with what the actual community needs. You know, I think you start with a kernel of an idea, and I've worked in search enough to have that. But beyond that, it's a collective thing.
I love the way Vespa, as an example, is so open; look at how well it's evolving in the hands of the community that needs it. I think there's a similar community here. And what's out there for them is a bunch of vendors, potentially some good and some unknown, and some interesting open source products, some of which might take a lot of work to put together. And maybe there are stories about super hot projects where there's one committer, and they go on vacation for two months and everything falls apart, or they lose interest after two years and leave with 2,000 open tickets. It's good to know that there's a little commercial entity behind it.

But ultimately, aren't the greatest innovations coming from open source? OpenAI? Most of the pieces are out there; that's why there have been so many replications. And that's the last piece of it: it's provable. You can take my word for it, you can look at all the charts and stuff, but with two commands, if you have Docker running, you can get Swirl going, and you can see for yourself. And if it doesn't do something, well, help us make it better.

Sorry, go ahead. Exactly. Exactly. No, I mean, that exactly proves it, because however magical the software is, if you are the engineer, you really want to open the engine and see what's going on there. How can I modify this? How can I plug this in? Because if it's not open, I guess, well, maybe someone will blame me and say, no, this is wrong. But you know, if it's an API that I need to pay for, what's the path for me to get into hacking? Should I buy it on my own credit card? Or should I call my manager and say, hey, can you...? Well, usually what happens, if you look at Pinecone, for example, is they will allocate a free tier, right? And so you can hack with the free tier. If you run out, then you'll call your manager, I guess.

Right. And nothing wrong with that, too. I mean, I think that's just a facilitation of the try-and-buy process.
It's still a commercial company; you can't know for sure. Right. And honestly, that works for many companies. There's no one-size-fits-all. My point is this: I think for solving the kinds of complex multi-silo problems in the large enterprise, where I have been very lucky to work before, and where, at least to some degree, I hear about the problems, right, even if I don't understand them all, open source is the winning model, because it is so tailorable. You know, no one has the same thing. Everybody has seven of everything, I think, in the large enterprise. And then there's regulation and compliance systems, all that stuff. Those are the actual barriers. So open source is most adoptable in that regard.

And then I think, as long as there's some option to say, well, they're not disappearing, right? There's still someone to help us who really knows how this thing works. It's safe and tailorable. And that's what's really driving so much of the growth, the incredible growth, in the software. Again, ChatGPT, right? The paper, but not the methods. It's being commercialized, but that's no surprise.

Yeah, I mean, it probably wouldn't exist otherwise. Like, just yesterday I was hacking on something before going to bed. And it was super slow, because I think it was US daytime; everyone was probably hacking there as well. But I was fine with that. It was typing slowly, giving me some code snippets. But could it have given me these code snippets if they were not online, if they were not on GitHub or somewhere else, right? So I think it's standing on the shoulders of giants again.

Totally. I completely agree with you. And it's extremely limited. Look, it was trained, at least partly, the non-code part, right, on Reddit. It reads like Reddit.
It's a little bit of a know-it-all, you know, and it gives the sort of consensus answer. Now, that's great for code, as long as the consensus data is modern, current, and available. So it probably won't teach you that much about enterprise integration patterns and enterprise workloads. But it'll teach you a lot about open source. I try to write with it almost every day. And I can say this: it's very good at filling in a class function. If you teach it a class, it's very good at that. And that's really, I think, commodity work, right? How to connect to X? It's very, very disruptive there.

It's also potentially disruptive to a lot of natural language tasks. I think that's the way it is, because it is, at the end of the day, a giant natural language model, right? So it's not surprising. It can do things like translation. It's very good at rewriting a query to make it broader. It knows how to rewrite a query to make it Boolean. Those things aren't going to change. But getting the data to it... Again, if you want to build the ChatGPT of mortgage exception handling, you're going to need to pull a lot of internal data and label it carefully. And you might discover you don't have enough; that could also be the case. There's a whole synthetic data market that's ready to solve that problem. But in the large enterprise, I think it's much more the other problem: we can't get to it. We know it's there.

On that front, have you actually considered implementing a ChatGPT plugin, so that I can go, as a user, configure things, add my tokens, and now, boom, I can search my internal data lakes?

So we are integrated with ChatGPT. There's a connector, so you can query it. We, by default, send every query to it. We also have a query processor, and we will soon have a result processor that will summarize your results for you.
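The query-rewriting idea mentioned here, asking a language model to broaden a query or turn it into a Boolean expression, amounts to little more than prompt construction. A minimal sketch: the prompt wording is our own invention, and the actual model call is shown commented out (using the OpenAI chat-completions client as one possibility) so the sketch stays self-contained.

```python
# Build a rewriting prompt for a language model. The two prompt styles
# below are illustrative assumptions, not any product's actual prompts.
def rewrite_prompt(query: str, style: str) -> str:
    styles = {
        "broader": "Rewrite this search query to be broader, keeping the original intent:",
        "boolean": "Rewrite this search query as a Boolean expression using AND/OR/NOT:",
    }
    return f"{styles[style]} {query}"

prompt = rewrite_prompt("mortgage exception handling", "boolean")
print(prompt)

# With an LLM client, e.g. the openai package, you might then do:
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": prompt}])
#   rewritten = resp.choices[0].message.content
```

A query processor in a federation pipeline would run this rewrite before fanning the query out to the individual sources.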
But frankly, I think several people have already done stuff like that, so you just copy and paste the links; you can probably get that. I think that's really an essential piece of it. Now, to generate queries from ChatGPT, I think that's easy to do, right? Someone can do that. But this is my point: there will be other GPTs. We refer to ChatGPT as a question answerer, right? For questions. If you say "question:" and put your question in, we'll send it to ChatGPT. I am sure people are looking at the amazing platforms you've just mentioned, right? All of them. Those are going to end up deployed in different parts of the enterprise: answering questions, summarizing, extracting, predicting, prescribing. There will be all those things out there. And the key will be, how do you get at them?

Yeah. It's still the problem. Right. Just because you have something that will comment on the financial implications of a federal rule change, for example, doesn't mean anyone's going to go look at it. But if you made sure that every day, or whatever the interval is, we were checking for new updates from it, and those were being pushed out to the people who needed to know and read it, especially if you could check that they read it... If you could imagine doing something like pushing information to analysts, or somebody who's taking action on it, and then tracking to see who read it, and then watching their performance, I am sure that will be a thing in the financial services world. You know, it's a tough world; they're very used to a high level of governance, if you will. But I think that's the kind of system that will ultimately produce the automation where a ChatGPT will be able to solve the mortgage exception on its own 90% of the time, right, engaging a human 10% of the time.

Yes. That's somewhat scary, but I think it could also be liberating if done well.
And I think there is a big discussion on this topic going on. How do we collectively, as humanity, make sure that this tech doesn't hose us, right? Doesn't just kick us out of our professions, and we still have a way to... I mean, even just going back to yesterday's example, I was really going in circles. I was just drawing some pins on a map using ChatGPT, and it couldn't get exactly the crux of what I was asking. And so I went to the kitchen, thought for just two minutes, and realized, okay, I can just break my code into two parts without telling ChatGPT what I'm doing, and just run everything in my IDE, and boom, I'm done, because otherwise I was just going in circles. And maybe it's just me being unable to engineer better prompts, better questions. Or maybe ChatGPT does have limitations as well. You never know. But it did help me; probably 90% of the work was done using that interaction. I would have spent several half-days, as they call them, or whatever, evenings, figuring out all these things. Like, what library should I use to connect to OpenStreetMap or whatever, how do I drop pins?

Absolutely. ChatGPT is the perfect replacement for the more senior developer who will answer your texts or your Slack. You know, you used to work until you were blocked, and then go find somebody and say, okay, I can't figure this out. That was pre-internet, right? Then, for a long time, we had... the other thing that ChatGPT has completely displaced. Yeah, Stack Overflow. Stack Overflow. Right. Exactly. For a while, we had Stack Overflow, and now ChatGPT. It's funny, I forgot the name because I use ChatGPT instead. I haven't Googled for a code thing in so long. It can even replace your habit, right? Your memory and habit, in some sense.

Yeah. Well, you know, we all got good at evaluating those, right?
The Stack Overflow articles, like, okay, when's it from? How many upvotes does it have? Is there a good response? Does it have the green check mark? ChatGPT is pretty much bringing you back the green-check-mark answer. So there's no point anymore. That's what it's good at.

I totally agree. It's funny you mention this, because exactly the same thought crossed my mind when I was interacting with ChatGPT. I was relating it to my experience with Stack Overflow, doing some small Android application. And I ran into an issue which was described in something like 20 questions and answers on exactly the same topic. And every one had a green check mark and upvotes, but nothing worked. And in the end, I found just one of them that worked. And you know, that process was iterative, repetitive, and also in some sense frustrating, but in the end, when you achieve it, it's fine; you achieve what you want. With ChatGPT it's somewhat similar, but the experience is different. I don't need to type that much. I mean, I don't need to type something into Google, then go to Stack Overflow, read the thing, comprehend it, and then apply it. With ChatGPT, all of these steps are just condensed into me literally typing what I want and getting something on the screen.

Right. That part by itself is amazing. It is hard to predict how far that will go. But I think one thing is very clear: the M365 silo is probably the most important one going forward, because it's going to kind of automatically be taking in the knowledge, which is very present in Outlook, right? Maybe not so much in Calendar, but in your email there's a lot of knowledge; in Teams there's a lot of knowledge. Documents, probably a decent amount there too, although I think that tends to be more scattered. But effectively, right, ChatGPT was trained on Reddit, which is chat. Teams is chat.
Outlook is sort of chat. So there's no doubt that maybe those early interactions will come through that channel. But I do think that, exactly as you said early on, Microsoft is never going to make it easy to talk to anybody else. They still come from that position of silo dominance, or whatever it is. They don't like to work with Salesforce; Salesforce doesn't like to work with them. Nobody likes to use the non-great product in someone else's stack just because we're trying to consolidate. So that's why it persists. And that is very real, and it exacerbates the problem, the walls between the silos. And then throw in all the others: after you get the basic whatever, big five, then you have all the Elastics and OpenSearches and Solrs and Postgres, to say nothing of the applications. So one group is using Swirl to look at five different ticket systems. They're all just tickets. YouTrack is one, from JetBrains, and then there are some others. Okay, that's a really interesting problem. The cost to migrate all that stuff... I don't think it's necessarily that much money. It's just a massive amount of pain. If you could figure out how to do it, it's not that much money, but it is a tremendous amount of work.

Yes. I think you probably don't realize it yourself yet, but from the way you explain this, it feels like you've invented the ChatGPT of the search part. I mean, in some sense: simplifying things, not actually, as you said, requiring anyone to physically move data here and there, which can take years, sometimes dozens of years; people simply don't do this. And also access to the data: like, today I only remember a fraction of the things that I did. I literally forget things that I've done yesterday.
I might sometimes reflect and remember something from a week ago or so, but still, because of information overload, when I need to make decisions, when I need to scramble something together quickly on a conference page, how much knowledge do I have myself? If I had that magical search bar where I could type something and just get the supporting material, without going all over the place, essentially doing what search engines should do: just go and check what happened, where and when and by whom.

Exactly. Exactly. There's so much amazing work and time and genius that's gone into some of these apps. I mean, who doesn't love them? They all have incredible capabilities, and they're evolving, they're growing all the time. In a way, right, the idea that you would take data out to try to make sense of it is absurd. It really is. Salesforce is 2,000-plus tables just to make the application work; you're going to extract that? No, you're going to query it. And that's the key, right? And so we're focused on making the querying easy and understandable. Simplicity. You know, I've worked on some amazing products that were not simple. And I'm sorry for some of them, right, not being that simple. But at the end of the day, I think today in the enterprise it's got to get easier, and there have got to be alternatives to indexing. Thus the simplicity.

Amazing. Here comes my favorite question, as we get closer to the end of this amazing podcast episode. You've done a lot in software engineering, you've done quite a lot in search. You mentioned all these companies, you know, like FAST, whose product, you know, became like Vespa and so on. You're building Swirl. Why? What keeps you motivated to do this? As amazing as it is, you're doing a lot of things, and also in the open. What motivates you to stay in this topic of search?
You know, whether or not it's been search, data integration has been the thing that I've always liked. I started my career at John Hancock financial services, working in marketing, doing customer segmentation. Interesting stuff. But really, the problem the company couldn't solve was how to view completely separate product lines in one way. They had no idea, right? A 110-year-old company had no idea that it had a Pareto distribution, actually somewhat worse: like 10% or 15% of the customers were producing 80% of the premiums. Everybody got treated equally. It was a very old-school business that was all about customers without really understanding customers. And it was still massively successful. So that's no knock on them; they were one of the biggest users of technology. Hancock had the largest IBM mainframe, I think, in the Northeast for many years. But the silo problem was the problem we had to solve to actually take the company to the level where it could compete with direct mail companies, because direct mail companies had a lower cost basis and they knew the customer.

And that project, quite honestly, is the pattern that I have seen over and over again, regardless of the venture; search has been one of them. But I was really lucky to work on mortgage processing too. A company called AI Foundry, which was actually backed by Kodak Alaris, the world leader in scanning at the time, said: we need to do something interesting with this scanning technology, and we'd like to apply it in a market other than consumer photos and things like that, try to find a new market. And mortgage turned out to be hot, because if you've taken a mortgage, you have this ugly moment of sending them a bunch of documents, and then you just have to wait. And then sometimes they're like, oh, I need you to do this one again.
+I believe there's research that showed that something like a third of applicants drop out for every two or three days that, you know, you haven't gotten back to them about their documents. They just want that all-clear, like, you're good. +So AI Foundry used pretty interesting OCR, zone technology, and text classification to turn the mortgage application into data. Not 100%: the state of the art before was keying it, manually keying it, and then someone would manually review it. So we switched it to review-only. +The company was successful. It was a silo problem again. You could think of the different types of documents as being fundamentally silos; understanding them was hard, and we did a lot of modeling, and it worked. It worked great, right? Gaulous bought the company. That's just another example. +I did the same thing in an IoT company most recently, where we were basically taking sensor data from healthcare settings, marrying it up with other data, like their EHR data, and trying to predict, you know, the likelihood of various conditions. So it's always the silo problem. +And frankly, every single one of these ventures would have benefited from something like SWIRL. So that's why I did it. To be honest with you, I think the data problem is huge. I'm passionate about it. +And I think it's important to solve it, because, frankly, some of the service problems that we all suffer when we're out in the field dealing with large companies happen because they just don't have the data. +They're not trying to be mean or clueless, right? Sometimes it's just a hard problem to solve. We expect a lot now. As an engineer, I'm expecting ChatGPT-level responses pretty soon. +And yet what we have is Siri, who can barely figure out how to turn off the alarm, you know. So there are going to be some bumps. There are going to be some sudden pulls and pushes.
But I think the important thing is, you asked me why do it in the open: because prove it. Awesome. +Yeah, this is an amazing answer. So data is literally king, and the one who has universal access to data wins, right? In so many senses of the word. This is so great. It's been so good chatting with you, Sid, I've learned a lot. I was wondering if there's something you would like to announce. +Something that's cooking. Or you simply want to invite developers to a tutorial and to send a pull request. Well, I would love to do that. First of all, we have webinars every couple of weeks. Please come if you're interested. +You just need to put an email address at the end of the reg form. We are also totally available on Slack. We don't have sales; it's free. Just connect up. You'll talk to support, or customer success, I guess, is the more appropriate term these days. +But we're here to help. That includes me and everybody else on the team. There are only five of us, but we're all here to help. We would love to hear what you want to do with SWIRL, what you're doing with SWIRL. +If you need help with a search provider, we'll write it for you or help you get it working. What I can say for sure is this: next month, version 2.0 will drop. It will be something you can try with one click, and it will have the M365 integration that I talked about. +So the full ability to deploy it to your tenant in our hosted version, or just to take the Docker image and run with it, hook that up; it will support OAuth2 and OIDC. Many, many more features. We'll be elaborating on the things you can do with it over the next couple of months, particularly in May. +And I just really would beg people to try it and tell us what you think. That's my ask. And if anybody wants to work on it, you know, we're always delighted to accept, and even guide anybody as to where to start. So that's where we are.
+We're very young and we're trying to figure this out. And energetic and knowledgeable. We will link everything you mentioned, of course, in the episode show notes, so everyone can click at will and, you know, follow and learn from you as I did today. +And I really want to allocate time to participate in one of your webinars as well. I'm pretty sure I will learn more. That would be great. We are definitely bringing in folks. We had, again, KMW, which makes Spyglass, the open source project. +We previously had the author of Quarge on, Renee. It was great fun. We hope to have him on again, because I think we could learn; I'm actually looking forward to a talk about the things they're doing. And many others. So absolutely, we'd love to have you on. +And if you know anybody who wants to talk about this stuff too, please, I'd love to have them on as well. Fantastic. Thanks for pushing the envelope of search. Keep pushing. I wish you all the success that you can get, and beyond. +And I hope we can chat more down the road as you guys grow, and I'm pretty sure you will. +Thank you so much for the confidence. We would love to share updates in the future. Especially, I'll be very psyched to show you some of the machine learning stuff we were talking about; we definitely want to build that as a use case and make it one-click easy to do. +So yeah, let's keep in touch. I'd love to. I mean, I'm a huge fan of the podcast. Obviously, I've listened to the Vespa episode several times, and please keep it up. It's awesome. There are not enough people focused on this incredible area of technology we're talking about. +I think it's going to become more common, but it's still a little bit unknown. Yeah, I appreciate your kind words. It's thanks to you, makers. Thank you so much, Sid, for your time. Really enjoyed it. Thank you very much. Bye bye.
\ No newline at end of file diff --git a/transcripts/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md b/transcripts/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md new file mode 100644 index 0000000..e434309 --- /dev/null +++ b/transcripts/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md @@ -0,0 +1,35 @@ +--- +description:

00:00 Intro

01:54 Reflection on the past year in AI

08:08 Reader LLM (and RAG)

12:36 Does it need fine-tuning to a domain?

14:20 How LLMs can lie

17:32 What if data isn't perfect

21:21 SWIRL's secret sauce with Reader LLM

23:55 Feedback loop

26:14 Some surprising client perspective

31:17 How Gen AI can change communication interfaces

34:11 Call-out to the Community

+image_url: https://media.rss.com/vector-podcast/ep_cover_20240515_120505_ab56f7a7d7ebadfb6bbd3486a4d2e7ad.png +pub_date: Wed, 15 May 2024 12:57:55 GMT +title: Sid Probstein, part II - Bring AI to company data with SWIRL +url: https://rss.com/podcasts/vector-podcast/1480271 +--- + +Hello there, this is Vector Podcast, Season 3, and I'm super excited. Talking to companies with thousands and thousands of users and thousands and thousands of systems, it's been a time of inspiration and a little bit of continued nervousness about what it all means. +Last March, the 15th, actually the 14th, was Pi Day, and that was the one-year anniversary of GPT-4. What I've learned is that those large enterprises again looked at GPT-4 and said, this is going to change our business. +This can really help everybody be an efficient expert and just slice through the current problems of siloed data and inconsistent systems. +But at the same time, there was a lot of fear about, well, are we exposing invaluable internal data to AIs that are then going to be trained on it? Is this going to be exposed? Lost? There have been many, many lawsuits. + So ultimately the large enterprises did what they always do, which is engage with it on their own terms, and many of them purchased, downloaded, and installed generative AIs and LLMs in their private clouds. We're working with one large company that did that and trained it with a bunch of what they called safe data. + So annual reports and an employee handbook. And it's very interesting to talk to, but it can't really help a business person, somebody trying to answer a question in the supply chain group, or in the R&D group, or in HR, because it doesn't have access to those systems. And if you've ever worked in those places, +you know, when you onboard, the first thing your manager does is open a bunch of tickets so that you can have access to systems. That's hard enough.
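The access problem described here, every user entitled to a different slice of the systems, is exactly what a federated layer has to respect: broker one query only to the sources a given user may see, then merge the results. A minimal sketch of that idea, with entirely hypothetical connector and entitlement names (this is not SWIRL's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source connectors: each one knows how to query a single silo.
SOURCES = {
    "wiki":  lambda q: [f"wiki hit for {q!r}"],
    "crm":   lambda q: [f"crm hit for {q!r}"],
    "email": lambda q: [f"email hit for {q!r}"],
}

# Per-user entitlements, e.g. mirrored from SSO group membership.
ACL = {"alice": {"wiki", "crm"}, "bob": {"wiki"}}

def federated_search(user: str, query: str) -> dict:
    """Broker one query to every source this user may see; merge the results."""
    allowed = ACL.get(user, set())
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(fn, query)
            for name, fn in SOURCES.items()
            if name in allowed  # never even ask unauthorized silos
        }
        for name, fut in futures.items():
            results[name] = fut.result()
    return results

# bob is only entitled to the wiki, so only wiki results come back
print(federated_search("bob", "renewal dates"))
```

The design point matches the transcript: authorization is enforced at query time, per caller, so no copy of the data ever leaves the systems of record.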
+So the reason that there's been, in a way, so little progress, right? Lots of installs of AI, but not that much real... I'd love to hear from you some of the use cases out there. + People are still trying, we're still trying, to get the data to the AI so that it can provide the benefit. And ultimately what happened is this: they've got the AIs installed, and the first generation of AI solution architectures is what I will refer to as a vendor-driven, put-the-data-in architecture. Literally every product out there, I don't want to name them, but they all say the first step is put the data in. Again, for some people, for many applications, for POVs, for testing it out, that's great; who hasn't done it with a few PDFs, right, and gotten some interesting results? But you can't just take a copy of a departmental database and hand it over to a centralized corporate database for training. There are rules in place to prevent that. Even more difficult is the idea that you would send it outside your perimeter into someone else's cloud, right? At another big manufacturing firm, they have a 24-month waiting list to onboard a new SaaS product; they're like, we have to put our security team on it. So I believe it's a very interesting time. And ultimately, what happened is SWIRL thought differently about the problem. As you said, we thought about it from the search technology perspective: why would we move all of the data? Instead, essentially take only the data that you need, and give it to the AI at that moment. And what SWIRL does first, to do that, is create a single pane of glass. Well, the next thing I'll mention is SWIRL is software. We are a software company, and our software is typically deployed in the customer's private cloud. We are happy to do hosting for POVs and for various applications, but for the larger enterprise we don't expect that to be the case. Once you deploy SWIRL, it integrates with your single sign-on systems such as Microsoft or Okta or
Ping Federate, and others; you can have, you know, whatever in there. Once it's configured, you send a question, a prompt, or a search query to SWIRL. It brokers that query to all the sources that it's authorized to, and it does so on behalf of that user. So not only is it safe, compliant search using existing infrastructure, but it's personal: the data the user or caller gets back is based on what that user can see. So I use it all the time, and it's my email, my Outlook, my calendar, my LinkedIn, whatever, right? It's my view. We actually love the idea that we should present the data to the user. So you get that single pane of glass, and you can decide what to do with it. You can say, I don't want this source, or whatever; you can make adjustments. But ultimately we then execute RAG. We have our own excellent, high-quality RAG, better than many. In particular, it seeks highly relevant passages from all of the documents. We can fetch the documents, and authenticate on the fly to do that, bind those to a prompt, we have our own prompt engineering, and you can override it, and then do the RAG against a huge list of AI providers. Actually, we support more than 20 today, including most of the ones we see out there: OpenAI, OpenAI on Azure, Bedrock, Google, Mistral, Cohere, etc. And in all cases, no code should be required. You configure an existing connector; more than likely you're putting in just endpoint information and authentication tokens, and then SWIRL again does that brokering, creates that pane of glass, and executes the RAG. You can also use SWIRL just for the R: if you have your own RAG, you can get the result list and do your own fetching, or you can hook in after SWIRL has the fetched results and operate on just the full documents. The key to this, I love that you asked, is the Reader LLM. We have been really heads-down working on the Reader LLM. I've actually been asking people if they have heard the term before, and many haven't. I don't know what
your take is on Reader LLMs these days. Oh yeah, I'm still catching up, really. I mean, the way I see it, I'm still kind of plowing through RAG itself, right? So you asked what my take is on how easy it is to onboard to these AI models and so on. I have a sense that people are aware of this because it's so easy to access through ChatGPT and similar tools, but when it comes to deploying these things, I don't think it's as easy, right? Because you have to go through a list of models, you need to figure out which one to pick, and hence you need to be a data scientist at that point, or an ML practitioner or whatever. And the web is exploding with so much cheap advice, you know, use this, use that, but as you go through that process you realize that none of those models work, and so you need to do something. OK, there's RAG, but setting up RAG means that you need to bring in a vector database that you haven't seen before, and things like that, right? So, yeah. I love that. So, just speaking of misinformation, I think you're absolutely right, there's so much confusing stuff out there. You do not need a vector database to do RAG. You never did. It's a vendor thing that I totally understand: they're charging per gigabyte or whatever, so they say you have to have it to RAG. There's an excellent study by ZetHub, and actually Simson Garfinkel, an advisor to SWIRL, you may have heard that name, incredible tech writer, recently wrote a survey, or a summary I should say, of the ZetHub study. The ZetHub study shows that you do not need to vectorize your data to get high-quality results. Instead, you just increase the number of results you get from a so-called naive, non-vector search engine or database, and re-rank using vectors. That's exactly what SWIRL does: we vectorize the result-set snippets, we vectorize the full text of the documents, we vectorize the query, the prompt, whatever it
is, right? And our Reader LLM is responsible for a complex similarity re-ranking. You can actually plug your choice of embeddings into our Reader LLM; embeddings are actually just a feature, one of the many things that LLMs do, so you can change that. But the Reader LLM, here's really the core of it: it's the middle layers of a generative AI LLM without the text generation and text interpretation part. That's not there at all. Instead, you use it to determine similarity, right? Cosine, or, there are many different algorithms, but ultimately you're taking some algorithm like that, and you're using embeddings plus the Reader LLM's own knowledge to say, how similar is the query, or the prompt, or part of it, to the response that I got? Or, find the most relevant passage in a document. Because you're absolutely right, there are tools like LangChain out there, as one example, which give you lots of interesting tooling, but it's still on you, the developer. I actually had ChatGPT generate me a pipeline, just as a demo, and the biggest problem is it generated a function that I have to fill in, which is called "select documents". That's really hard. And ultimately you're basically just providing the pipeline to move the data once again. But the Reader LLM in SWIRL is all about re-ranking and finding the best passages, so that you are not sending a hundred PDFs of which one paragraph is relevant; you are sending the paragraph. That way you can put in a lot more data, and you also don't blow out your token limits, right, assuming you have such a thing if you're on-prem. But that's the Reader LLM. I'll say this: Reader LLMs are the unsung heroes of, especially, search, but also of RAG. When you're looking at, I would say, Bing or ChatGPT, and you ask it a question and it goes and fetches documents from the web, it's almost certainly using a Reader LLM to determine which pages are best. And to be fair, Bing and Google have incredible knowledge of that already, so it's
not like it's that hard. But then they're almost certainly reading the most relevant passages, right? They're not just passing the whole web page in. So Reader LLMs are a thing. They're definitely becoming more and more prevalent, and they provide a critical, non-hallucinating step to help find the best results so the user doesn't have to. That's very interesting. And how, let's say, if you plug into a company's network, and they focus on something, I don't know, healthcare, banking, what have you, would you need to fine-tune the Reader LLM in any way? No, I actually don't recommend it. I think there's a lot of evidence that fine-tuning, because it's a fundamentally lossy process, is somewhat responsible for hallucinations. There's been quite a bit written about this, and I think that ultimately the winning combination today is that you use a very well-trained, capable model that is a generalist, and you provide it with the data that you need to provide it with, at the moment you need to. For example, SWIRL's prompt engineering does a few things. One, we force it to only consider the RAG data and not add its own model thoughts: you can interpret, but don't create facts that aren't presented to you. Second, we force it to disambiguate. One of the worst errors in prompt engineering is just letting it go right on past, equating two entities with the same name as if they're the same thing. So our default engineering says, listen, if you see two entities with the same name, essentially call that out; don't just gloss over it. The last one: especially when you're talking about multiple sources of data, and enterprise data, the user must be able to verify. Nobody wants to make a career-limiting move because they took ChatGPT's answer and said, here it is, right, put it up on the investor site. Not a good idea. So SWIRL also forces the AI to quote the sources that it used, to cite them. And of course you also have
access to the underlying search results, right? So you can verify that, yes, you have a million dollars in insurance coverage and it covers X, Y, and Z. That's key. Yeah, that's amazing. When you said that about hallucinations, you reminded me: I was just listening to an interview, not related to the AI world or the tech world, it's political science. The scientist was asked, you know, are you using ChatGPT at work? And she said, yes, sometimes I do. Sometimes I use it as a co-writer, so, you know, I draft some things quickly, and I still see that ChatGPT is very crude in the way it approaches it. I can do it better, but sometimes I'm just lazy or tired, OK, let it do it. But then the thing that struck her was that it actually hallucinates. She was asking, give me the top five books in political science in a specific country, and ChatGPT was very confident and named the five books and the authors, and when she Googled them, they don't exist. And then she said, they don't exist, and ChatGPT responded, OK, here is the one book that you should read, and that didn't exist either. So she was genuinely baffled, and she said, OK, you might say something with less confidence, but why lie? Why do you lie? She doesn't know what hallucinations are, but she looks at it as a user, and it's very disconcerting. So, believe it or not, when I first started using GPT-4, I got a hallucination that I thought was so real, I wrote to the publisher and said, why is this article no longer online? And the publisher wrote back and said, there is no such article, but it could have been. GPT-4 said it was authored by another author who had posted on that site; the URL looked correct, and the content, I mean, the snippet it gave me, looked absolutely real. But again, when they build these models, a, you know, 10 or 20 gigabyte model of GPT-4 or 3.5 or whatever it is, petabytes and petabytes of data
went into that. So by definition it's lossy. But the way the LLM, the generative part, works is that it must provide a response. You know how it is when you can't quite remember the name of something? It's essentially doing the same thing. It knows, I saw an artifact that looked like that, but I don't have the artifact anymore, so it generates something that is the consensus version of what it would have been, had it existed. And that's why I don't believe in fine-tuning so much. I think if you have a highly capable model with some reasoning and the ability to interpret text and follow instructions, you provide it with your internal data, and that is the beauty of RAG. Because here's the thing, the reason it's so good at things: why does GPT-4 sound like a smart person on, you know, Reddit or Facebook or something like that? Because that's where it was trained from. And of course, on something like Reddit they have the same conversation 10 million times, right? I mean, how many discussions of, whatever, Twin Peaks or Battlestar Galactica are there? There are a lot of them. And so it learns the core of these things and can answer those questions. But if you feed it your internal data, it's probably not so repetitive; it's probably much more conflicting than not, and that's why you produce more problems. It's much better to give it the one thing that's really relevant and let it reason. Yeah, that sounds good, and slightly live, right? It's something that can be updated throughout the lifecycle of your company or department or whatever. But there is one challenge I want to offer to you, and it came to me just today as I was thinking and preparing for this episode. Data is not gold. Sometimes it is gold, because everyone talks about it, la la la, but it also is very complex machinery, and it can have mistakes of its own, you know, misattribution, misclassification, and human error, what have you.
How would you say the Reader LLM, or SWIRL, is going to tackle this issue? Or is it just going to be a transparent, garbage-in-garbage-out type of response? It's a great question. Speaking of hallucinations in AI, we all have probably worked with somebody at one time or another who made a mistake, or didn't understand the problem enough, and that stuff gets into Teams and Slack, and, you know, documents are wrong. It's incredible. You're right, it's incredibly messy in the enterprise. Has anybody not worked at a firm where they had, you know, 500 versions of the same PowerPoint that just evolved? So, absolutely, these are things that ultimately are going to have to continue to be worked on, but here's one point. Number one, if you leave the data in the system of record, you're much less likely to introduce new problems, especially security problems. And if you leave it in the system of record, then any domain modeling, lexicons, ontologies, text items, you get the benefit of those. If someone cared about that source, they might very well have done some of that, right? So if you pull it all out and put it in a vector database, what happened to all of that knowledge? So I would argue that the systems of record that are valuable have things in place to deal with that. Number two, the Reader LLM does a couple of things to help with this. One, it's aware of certain commonly problematic formats. Email is the worst: reply, forward, and signature content are very, very problematic. We have a solution for public data too, so that you can get article content without getting, as an example, navigation, advertisements, cloaked data, stuff like that. Because very often public data is relevant to the large enterprise: they want to see policy changes, regulatory changes, online catalog changes. That's all relevant stuff. Then there's the similarity problem. So another thing the Reader LLM does: it can do semantic analysis to determine
which is the latest version of the same document. LLMs are amazing at that, much better than old-school multi-window setups where you're trying to take a signature of the document and say, well, this could be it, it's very similar. The LLM does it much better, and you can quickly say, this is the latest version of that spreadsheet, or you can let the user decide. That's another thing. Who doesn't love shopping? I love being able to look at my shopping cart full of SWIRL results and say, you know, this one I know isn't really relevant, these are the five I've chosen, or maybe this is the source, or these are the sources, that I want my data from today. That's another way of allowing the user to bring their expertise and experience and knowledge and say, no, no, no: Collibra, not ThoughtSpot; Snowflake, not Oracle; whatever. And I'm not picking on anybody; they're all present, they all have value. The question is, which one has the answer for me today? Well, until they can write the query with the context that answers that, I think the key is to keep the user in the loop and make sure that there are citations, and ultimately, in a year the systems will be smarter and many of these problems will be solved. After all, almost all the naive search engines, the ones that were BM25 or whatever, pretty much all have vector upgrades now. The only question is, can you wait long enough to vectorize, in high-dimensional space, a few million documents? Exactly, yeah. Sometimes when I use ChatGPT, I don't use it that often by the way, for some reason, maybe it says something about me, maybe I should learn to, but sometimes, as you said, it just generates something that seems a little average, you know, a code snippet or something like that. You try it, it doesn't work. At that point, when I get frustrated a little bit, I'm like, can you show me the source, maybe a link to Stack Overflow, so I can go and drill in for myself, you know, I don't have to
sort of keep pounding on you, asking, OK, that didn't work, this didn't work, because I can do the same thing just staring at the Stack Overflow page, right? And maybe there have already been some updates, and someone said, no, that doesn't work. Sometimes you see the selected answer, but then there is another one which everyone says works, not the selected one. So that's just funny. Yeah, that's amazing. So, the Reader LLM, just to sort of bring it back to the ground, especially for those who are novices like myself, I still consider myself a novice: have you taken a Reader LLM off the shelf, have you implemented someone's paper, or did you have to train it? How did you go about it? We built it largely ourselves; it's been an evolving thing, but there are definitely other Reader LLMs out there. The key is to preserve the structure, right, and the pieces of the structure that allow you to do similarity. We implemented our own similarity and other algos. We also do things like named entity recognition and sentiment analysis; LLMs are great at that stuff. It can do scoring for machine learning purposes, so we have a nice intent detection system now that will, essentially, based on the responses that you get, tell you which sources are most relevant right up front, and also, optionally, ratings, if you want to bring that into the system. Passage detection in our Reader LLM is totally in response to the problem you described, which is that the data is messy and we don't necessarily want to ship, you know, 500 hundred-page PDFs that have essentially the same data. So there, it finds the most relevant passage super quickly and truncates the document down to a window around it. Those are the things, and we've really implemented it ourselves. It's our own creation. Oh, that's fantastic. So
that's your secret sauce as well. I mean, that's something to be proud of. And I also want to sort of close up the description that you gave, or maybe look at the future: does SWIRL have some way of getting feedback? If not, do you plan to implement it? Do you think it's reasonable to have a feedback loop, you know, like in ChatGPT, where you can say thumbs up, thumbs down? You cannot say much, you know, you can say this was the answer; I don't know if that's going to go into the loop, but whatever, because it gives me the joy of sort of completing it. Oh yes. So when SWIRL AI Connect is deployed in the enterprise, for starters you get to connect the data and get RAG with your choice of AIs, and by the way, it's again configuration for the AI: you put your keys in, pick the model, and you can RAG against it. You can also choose the role you want to use different generative AIs in: you can use one for query writing, you can use one for direct answers, you can use one for RAG, and if it has embeddings, you can use that to power the Reader LLM. So just to be clear, it's a bit more flexible. Yeah, I was asking about feedback, right? So, do you have it, and if not, do you plan to, and do you think it's reasonable to have it? Absolutely. So after you deploy AI Connect, as mentioned, you get those abilities. Then we have an analytics package which will give you insight as to which sources are providing the most relevant responses, and ratings, and putting all of that into dashboards: understanding who the number one users are, who writes the best prompts, which sources produce the best results, which prompts. That absolutely is all part of the offering, and ultimately it's part of what we tailor for the deployment. And again, that can be on premises. AI Connect is the key, because it's collecting that data, again always in the customer's private cloud; like, we don't see it, we're not SaaS. But that data is absolutely
turnable into gold, a variety of different gold things, and so you can hopefully figure out which AI works best for which groups, you can figure out which sources are providing the best input for RAG, etc. Yeah, that's fantastic. I love that you do have feedback. I think it's definitely gold. It could also be super messy and noisy and stuff, but it's better than the absence of it. That's amazing. Maybe, like, in the past year or so you've been deploying this with clients, and obviously you don't have to mention the names, but was there something that surprised you in how clients perceived SWIRL? Yeah, I would say that people have not really been looking for search. The explosion of AI and the excitement around AI kind of crowded everything out. So that's why I think so many of these copy-in architectures got so much momentum. And by the way, I think people are doing incredible stuff with that, so it's not like those aren't perfectly legitimate. I mean, every database ever starts that way: you put the data in and you get insight out of it. It's just that there's a bit more to the story. There's a whole other world of, well, there are a lot of these, and I just moved them to the cloud, and I can't necessarily do it. But people weren't thinking search. I'm not sure what they believed the answer would be, but there were some excellent posts. There was one on LinkedIn by, I think, Vector Ventures, I should probably get the name right, but in any event, they published an excellent piece about how search is probably the answer to making AI work in a lot of these cases, and they also point out that there are not that many people who have come at it from the search perspective. So that was a bit surprising to me, because the large enterprise has always loved search, always, because that's how knowledge workers and people get stuff done. Yes, you have business intelligence and dashboards and reporting, and we
like those things, but so much of the qualitative "why did things happen, explain it to me, how do I solve this", that's been something that search did a good job of. Ultimately it's a technology, right, and the marriage of search with LLMs, that seems to be the unlocker, if you will, in the enterprise, and that was surprising to me. I thought that there would be much more of a search-first approach, and I think everybody had to get through the understanding of what it means to adopt AI and how the first generation works. And now I think people are recognizing real-time architecture, using systems of record with re-ranking, and then just keeping up, right, with the incredible innovation in, let's call it, generative AI. That's the interesting thing. But again, I come back to what I said at the beginning: there are going to be many, many incredible generative AIs, and they're going to do different things that we haven't even seen yet, I don't think, the most extraordinary ones. They'll come from the big science publishers, they're going to build incredible life sciences generative AI; I'm sure people like Bloomberg and the FT are going to build incredible financial ones, and that's great. But all of those still need your data to operate in your environment and give you answers that are meaningful, and that problem, that problem is the problem that Swirl solves. So just to understand, what were they expecting, was it like a chat interface, or, you said they didn't expect search? No, they thought they would ask the question. Oh yeah, so they thought like a chat, right? Yeah, everybody wants kind of a conversational interface. You know, another thing I learned, actually, you really reminded me: people are not so interested in the idea that there is an AI place that you go to. I think another very logical step is business folks, knowledge workers, they would like to use the channels they use today. So rather than "I have to have a new place to go", why can't I talk to it on Teams or in Slack
or on my WhatsApp, why can't I text it? If I need to get a visual I could always go to a screen, right, and then I could have it show me the underlying data, show me the graph, show me the chart. But the future is not applications that are destinations; the future is an ongoing dialog with the AI that understands your business, your world, has access to your data, and becomes your trusted advisor and agent. I don't want to use the word copilot because I think that's a little... it's much more your confidant, it's much more your agent. It's going to tell you stuff like "hey, that question you asked last month, there's a very different answer this month", and that's a pretty interesting thing. Or it's going to let you explore. So, you know, "tell me about our customer service ratings", "well, which region?", right, disambiguation, which was previously something that you would do through facets in search, right. That's the kind of thing that should become more dialog-oriented. But those things, that's going to take some time, because in order to know how to disambiguate you still have to know what data is relevant, right? So that's been surprising, but I think we're going to see a wave of search-driven innovation, and I'm excited about it. I think the more people shift away from innovating in a repository to innovating across repositories, we'll see another layer of innovation and even more productivity lift, right, for the people to use it. Oh yeah, that's fantastic. The way I put it, and I'm glad to hear this because, being a search professional, you know, and now a product manager, I love the fact that the powerhouse of the future of AI still continues to be search, right at the core. And I think it also says that search isn't solved, and maybe this is another iteration in which we will approach it. Because search is also perception, right, it's also how you express yourself, how you perceive what you see; maybe the interfaces will change,
right? So sometimes I do want, you know, that product that Google had, Google Glass: sometimes I want to have glasses on me to take a picture, or not to be as distracted by going and fetching my phone or something, right, because today I still have to do that, it's not as immersive an experience. And also I've noticed, working with engineers now that I've flipped to this side of the process (I'm a product manager, so I keep thinking about things and they keep coding), sometimes they don't even go back on Slack for a couple of hours when I ask something, because they don't want to be distracted from their ideas. So maybe there could be a way for me, or an agent, or whoever, to sort of sneak into their idea, ask a question, talk to them, right? That would be fantastic. Maybe it sounds a little crazy; you still want to have privacy and, sort of, flow, but at the same time there is the reality of your job, right? You do need to go back to your email, to your Slack or whatever you're using, Teams, and get distracted, and then you forget what it is you'd been onto when you come back to your mode of execution. Absolutely, you know, in a way applications are distracting. I think there was a really good study recently that showed the danger of interrupting engineers, right, because of the context switch. It's definitely the same for business people; they're just like everybody else, right, context matters and it can be hard to switch. I think that's the real promise of AI. Look at ChatGPT: you go to ChatGPT, they're going to have search soon, web search, you can ask it questions, and you don't have to go to Google and Bing and five other places, right, to get it. And that is the real possibility: that you would choose the way you want to interact with it, and that thing, in theory, that single point, that single pane of glass or single conversational agent, right, that could potentially be in front of many, many sources of data. And that, I
think, is what people realize. It's hard to say what it really looks like in five years; if AI really continues along the path it's on, the answer is: it's the end of applications. Yeah, exactly. And going back to being immersive, and sort of feeling that I'm myself and I am in control, and not vice versa. Because today I don't feel like I'm in control: applications update by themselves, the iPhone restarts, I have no idea what's in that update, I will never be able to understand what they do, but they do it. So sometimes I feel like whatever I have bought belongs to someone else. But probably this will change, and I think this should change. As we wrap up, I was thinking: is there something you want to call out to the community? By now Swirl has obviously progressed, you guys are open source, I love it, you have a bunch of contributors, probably, that you trust and work with, but is there anything that you would benefit from calling out to the larger community? I think I'm very happy to see folks shift and focus towards search. I think the thing I'd call out is to say, you know, there are many different user communities that want to consume AI, and they will benefit from it, and I think the key is not to go too far on the hype cycle, right? Because, honestly, another thing I learned is that not everybody is into the details of how AI works. Like, fine-tuning is an example: it's a very deep discussion; at some level I'm no expert, right, I can tell you a lot about it, but I think there are people who have done much more on it than I will ever do. But at the end of the day, that's way, way, way far from the user's head, right, from what they're trying to understand. And the people making decisions about bringing these things in are asking: is it safe, how can I trust it, how do I get it to provide a benefit? So I think the honest thing is, rather than focusing on "we've got a few more tokens than somebody else", talk about use cases,
focus on the user. I think that's what Swirl did from the beginning, because if you're in search, there is nothing but the user, right? The user's intent is everything. And we can go back to lots of great writing about that, from Baeza-Yates to Tunkelang and all points in between, but the user's intent is important, and that's the thing to focus on: what are they trying to accomplish? And build great use cases that ultimately allow people to focus on the things they'd rather focus on, instead of, you know, the minutiae and the time spent collecting all these different data points. Yeah, that's, I think, what you do as a tech industry responding to the user demand for AI, my two cents. Oh, that's amazing: don't try to outsmart the users, and make the things that you produce explainable, and they probably will adopt them quicker. That's amazing. I also saw a super, maybe provocative, really nice post from your side where you say that you outshine Google; I will link it in the show notes, and maybe we'll discuss it at some point as well, something about ranking. Look, I really enjoyed chatting with you today. I'm sure there will be someone in the community reaching out and maybe trying out Swirl; to be honest, it's itching for me to try it out. When I mentioned it in my company, someone said, "I would love to have that single keyword box so that I can search Slack and Confluence and email and everything". That's amazing, that's fantastic, and it's also amazing that you guys do it in the open, so everyone can try it. All the best to you, good luck in whatever you're building in your next big things. Thanks, Dmitry. Thank you so much, it was great to talk to you, and when you're ready to try Swirl, you've got the open source version, enjoy our Slack, and we'll be happy to help. Absolutely, thank you very much, enjoy your day.
\ No newline at end of file
diff --git
a/transcripts/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md b/transcripts/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md
new file mode 100644
index 0000000..6d4c583
--- /dev/null
+++ b/transcripts/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md
@@ -0,0 +1,238 @@
---
description: '

Show notes:

- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction https://research.google/pubs/pub46555/

- IEEE MLOps Standard for Ethical AI https://docs.google.com/document/d/1x...

- Qdrant: https://qdrant.tech/

- Elixir connector for Qdrant by Tom: https://github.com/tlack/exqdr

- Other 6 vector databases: https://towardsdatascience.com/milvus...

- ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626

- Tantivy: https://github.com/quickwit-inc/tantivy

- Papers with code: https://paperswithcode.com/

'
image_url: https://media.rss.com/vector-podcast/20211223_041259_de64d1b728c612795842622095155ffc.jpg
pub_date: Thu, 23 Dec 2021 16:01:59 GMT
title: Tom Lackner - VP Engineering - Classic.com - on Qdrant, NFT, challenges and joys of ML engineering
url: https://rss.com/podcasts/vector-podcast/347538
---

Hi, everyone. Vector Podcast is here. And today we have Tom Lackner, Vice President of Technology at the company called Classic. And I'm sure Tom will talk more about it. And he's also the founder and sole developer of Lookpop, which I'm sure Tom will talk more about as well today.
And what's really cool is that Tom has been using the vector database called Qdrant in his development. And so today we have a user of a vector database, not a maker. And it's amazing to hear firsthand how it goes with a vector database. Hey Tom. Hey, what's going on? So great that you joined today.
And I just wanted to start as usual: if you could please introduce yourself and give a little bit of color to your background. Sure. My name is Tom Lackner. I'm a software developer living in Miami, Florida, a very warm place.
I've been developing stuff on the web for about 20 years now, since the early days of it. And I really, really love vector databases these days and doing stuff with embeddings. Yeah, fantastic, fantastic.
And can you tell more about Classic? So I know that it's about classic cars, but yeah, what is this website about, and what's the community maybe around it, and so on. So I'm the VP of technology for a site called classic.com that tracks classic car values.
So what we basically do is we go out on the web and we grab all the car sales that are occurring, that are happening, in a way that's easily understood. So if anything is sold with a price on it, we record that information.
And then we cross-reference all these vehicles, broken down into what we call markets.
So if a vehicle came in two different trims, two different levels of options, we break those out separately, and we can give the user a really good estimate of value, with a very specific and granular understanding of what a car is really worth.
So it's basically like a big-data-for-cars type project, I guess you could say. Yeah, and I mean, I checked the website, and the cars look so great, and some of them are kind of on the high end in terms of pricing.
So it also defines the audience, right? Yeah, classic car values have really gone up in the past five years, especially considering COVID and a couple of factors in the United States. So it's more important than ever to do really intelligent, savvy shopping before you make a purchase.
So that's where we're coming from. Oh, yeah. Awesome. And is it so that the user experience is mostly managed on the website, or do you also have some offline part of the operations? So most of our operations are online, on the website.
We also have an iPhone app, but what's really important is our backend crawlers. So we have a huge amount of software and resources attached to the idea of writing crawlers that can understand different auction websites really, really well.
That's a critical part of the infrastructure that's sort of behind the scenes, but ends up becoming, you know, a key part of what we're doing. Yeah. And I noticed, obviously, you have a search bar there.
So what happens when I type something in? You know, on Classic, we use a combination of Postgres for the actual OLTP data, like the actual ground truth. And then we feed that into Elasticsearch to do the full-text search.
What we're actually trying to do there is transition that as well to using a text embedding. I find that text embeddings are easier to use in the long run. But what's actually challenging there is developing a good understanding of typos. Right.
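One common way to attack the typo problem just described is to augment training data with synthetic misspellings. The sketch below is hypothetical (it is not classic.com's actual pipeline): it generates character-level perturbations of a brand name, and each (typo, canonical term) pair could then feed a contrastive fine-tuning loop.

```python
import random

def typo_variants(term: str, n: int = 5, seed: int = 0) -> list[str]:
    """Generate character-level typos (drop, swap, duplicate) of a term.
    A hypothetical augmentation step for training a typo-tolerant
    embedding model."""
    rng = random.Random(seed)
    variants: set[str] = set()
    while len(variants) < n:
        chars = list(term)
        i = rng.randrange(len(chars) - 1)
        op = rng.choice(["drop", "swap", "dup"])
        if op == "drop":
            del chars[i]
        elif op == "swap":
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        else:  # duplicate a character
            chars.insert(i, chars[i])
        variant = "".join(chars)
        if variant != term:  # a swap of identical letters changes nothing
            variants.add(variant)
    return sorted(variants)

# Pair each synthetic typo with its canonical form as training data
# for contrastive fine-tuning of an embedding model.
pairs = [(typo, "lamborghini") for typo in typo_variants("lamborghini")]
```

The same generator can be run over the whole catalog of makes and models to build a domain-specific robustness set.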
So we could probably go into that detail later, but most of the text embeddings that you encounter aren't really typo-tolerant. So in our case, that search box needs to really understand, let's say, Ferrari or Lamborghini. Those words are often spelled incorrectly, for obvious reasons.
So one of the things that's holding us up there is developing a typo-tolerant embedding at the system level.
Yeah, it sounds like there are some similarities to web search, for example, where users are using colloquial language, or, if they talk to their microphone instead of typing, then you have these typical problems from ASR, automatic speech recognition systems, and you need to tolerate that.
So, it means that, I don't know, we've been thinking about data augmentation techniques. Have you thought about that as well? So what I've tried to do is to retrain the model using basically our input data, but with certain transformations applied, certain permutations.
At this point, I am not at the point where I have a usable model coming out of that, but I'm still doing some research, and it should work in theory.
Yeah, and there are so many models on Hugging Face that I guess you can also kind of tap into, right? And that's actually one of the hard parts: to evaluate all those models.
So I have been taking a couple of days to write a script that just downloaded every single one and tried them, to determine which did best at our tasks. Yeah, exactly. And also, choosing the quality metrics is another direction: how do you evaluate? Yeah, absolutely.
Yeah, we're in kind of new territory for a lot of this. So I mean, that's exciting on one hand, but on the other hand, sometimes you just don't know the answer to a problem. Yeah, yeah, for sure.
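The model-evaluation script described above can be reduced to a tiny hit-rate harness: for each labeled (query, expected document) pair, check whether the model ranks the expected document first. The "embedding" here is a toy bag-of-characters stand-in, purely to make the sketch self-contained; a real run would plug in each downloaded Hugging Face model instead.

```python
from collections import Counter

def char_embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-characters counts (stand-in for a real model)."""
    return Counter(text.lower())

def similarity(a: Counter, b: Counter) -> int:
    """Overlap between two bag-of-character vectors."""
    return sum(min(a[c], b[c]) for c in a)

def top1_hit_rate(embed, corpus, labeled_queries):
    """Fraction of queries whose expected document is ranked first."""
    hits = 0
    for query, expected in labeled_queries:
        qv = embed(query)
        best = max(corpus, key=lambda doc: similarity(qv, embed(doc)))
        hits += best == expected
    return hits / len(labeled_queries)

# Hypothetical labeled typo queries against a tiny corpus of makes.
corpus = ["ferrari", "lamborghini", "porsche"]
queries = [("ferari", "ferrari"), ("lamborgini", "lamborghini"), ("porche", "porsche")]
rate = top1_hit_rate(char_embed, corpus, queries)
```

Running the same harness with each candidate model's encode function gives one comparable number per model, which is essentially what the evaluation script has to produce.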
So in that sense, Classic is kind of... well, it's funny that there's a coincidence: Classic, and then classic search, in a way, since you're using TF-IDF or BM25, right? Well, of course, you will add an embedding layer at some point to make it more semantic, right?
Then you said that you have Lookpop, where you are now experimenting with vector search.
Can you tell me more, what is Lookpop? And then, how do you implement vector search there? For the last couple of years, I've been really interested in search engines and how search engines work.
I feel like Google has sort of done us a disservice in certain ways over the past, you know, couple of generations of its development. So I've been interested in developing better web search tools. Lookpop.co is my effort to make an NFT search tool.
So NFTs are digital artworks that you can buy. In the past year, the NFT market has exploded. I think something like $6 billion has been exchanged this year in NFTs.
But the problem with NFTs is that, coming from the world and the language of cryptocurrency, a lot of the websites related to NFTs are about the price, the up, the down, this, that, you know, what's hot, what's not, blah, blah, blah, who's flipping. You know, I personally don't care for that.
So I was looking for an NFT search engine that could actually help me understand the meaning of NFTs and find visually similar ones.
If I find something I like, it would be cool to be able to see stuff that's kind of in that same vein, without having to manually search around on OpenSea, for instance, which is the number one NFT market. You can only search by the name of the creators, right, which is so weird to me.
I wanted to be able to search by themes, by visual styles. And when I came across CLIP, the text embedding system, or the image embedding system, it provided all those features in a pretty easy-to-use way. So I'm really excited about that functionality.
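Once CLIP (or any image/text embedding model) has produced vectors, "find visually similar ones" reduces to nearest-neighbour ranking. A minimal sketch, with three-dimensional toy vectors standing in for real CLIP embeddings and hypothetical item names; a production system would hold the vectors in a vector database rather than a dict:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, catalog, top_k=2):
    """Rank catalog items by cosine similarity to the query embedding."""
    ranked = sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]),
                    reverse=True)
    return ranked[:top_k]

# Toy stand-in embeddings for three NFT images (names are hypothetical).
catalog = {
    "pixel-cat": [0.9, 0.1, 0.0],
    "pixel-dog": [0.8, 0.2, 0.1],
    "abstract-swirl": [0.0, 0.1, 0.9],
}
# A query embedding near the "pixel" cluster should surface both pixel pieces.
neighbours = most_similar([1.0, 0.0, 0.0], catalog)
```

Because CLIP places text and images in the same space, the query vector can come from either an example image or a free-text theme description.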
Yeah.
And CLIP is basically the embedding model developed by OpenAI. I think it's also available as a Hugging Face model, so you can plug it in much more easily in the code.
And so, what is your experience with CLIP so far? So one of the great things about embeddings is that when they work, it's sort of like magic, right? It's amazing that this was even possible.
The problem is, though, if you actually look at the result set as a whole, it's only 80% accurate, right? You'll find that 20% of those in there are just "what the hell is this?"
So as a sort of imperative programmer coming to it, or a guy whose experience is based in the world of traditional programming, to see that, it's like, okay, this is a bug. But it's not.
One of the switchovers you need to make is to accept the fact that you're going to get a lot of great results for very little effort, but you're not going to get 100% of the results right.
It's more about identifying the results that are bad, flagging them, and trying to retrain to get them out of the loop in the future. Yeah, exactly.
So, building that pipeline, it's essentially an MLOps pipeline, right? Machine learning operations, where you need to switch your mind a little bit into building things this pipeline way.
You can detect problems and then feed back into the process of building the next version of your model, right? So it's not as easy as opening your debugger and then, okay, here is the bug, it's logical, fix it, done. Yeah, the pipeline to develop these models is long-term.
It's very different from a piece of software, and you need continuous monitoring, and you need to continuously be able to have signals feeding in to make that model better next time. It's actually pretty difficult.
I know there are a lot of startups around MLOps now, which makes total sense to me, but I feel like developers, myself included, need to build the mindset, and to mentally know: okay, these are the different components that I need to put into the system.
Yeah, absolutely. And there are white papers published; for example, there is one by Google. I will try to link it in the show notes and share it with you.
But it's a fairly long document, and it goes so high-level that you might, you know, doze off while reading it. So you need real tools, right? And you need some kind of understanding: okay, I stitched this together and I just achieved my goal.
I don't want to build the full-blown MLOps pipeline, right? And it's also expensive; retraining these models is very slow, which means you're going to want to use the best hardware you can.
And if you're doing that every day, which is crazy, but let's say you do it every week or every month, it's still a significant amount of fixed resources you have to allocate to it, and mental resources to understand it and to debug it. Yes. Yes. Yeah. You're right.
Yeah, I think there is still a long way to go in this direction, but at the same time, you as a developer want to focus on the task and on the result, right? Not on figuring out what's the bug in that framework or whatever.
So yeah, I think there are tools already available. Maybe one of them that I've been using is Determined AI; they're kind of making some early-stage moves there. It's completely open source, and it claims that it basically utilizes your GPUs to the maximum, because GPUs are super expensive.
And yeah, so basically it abstracts GPU allocation away from you, but it has some limitations as well. So the team is working on resolving them, but PyTorch and TensorFlow are supported.
So you can run some fine-tuning or training or evaluation and hyperparameter search.
So yeah, I mean, it gives you a sense of control in a way, but of course that comes with some rigidity built in. But eventually, I really hope that they will make it and it will become more widespread. Yeah. Awesome. Awesome.
So today, if I go to Lookpop, can I buy an NFT already? Okay, you can just find it. You can just find it and click through to OpenSea; the actual process of you getting the deliverable and the token and all that stuff is actually pretty complicated, so I'm going to let them do it.
I really want Lookpop, hopefully, to be indexing tons of different NFT markets. OpenSea is the biggest one, but there are quite a few other small ones. So I didn't want to tie myself too closely to one particular blockchain or one particular form of operation.
I do think that this is developing so quickly. NFTs weren't even a thing until about two years ago. So I feel like it's a little early to sort of get in bed with just one of the vendors. Yeah. Yeah.
And I've also noticed, when I joined the Qdrant Telegram channel, I saw you've been so active, like you are sending advice or commentary almost every single day. So I love Qdrant and I love Telegram.
Yeah, I was just thinking, are you the developer of Qdrant or what? But you are the user, is that right?
Yeah, I mean, I think I'm doing informal tech support in the opposite time zone, because they're all on CET like you are, you know, for some reason, although a lot of people wind up in, you know, American time in there.
I was looking for a long time for a vector database. I tried FAISS ("face", I think they call it) in Python, and I tried a couple of others, and I really didn't find anything
I guess you could say intuitive, that intuitively scratched my itch, you know. I don't like software that complicates things; I like things that are sort of isolated and independent and easy to install and use.
Qdrant just sort of ticked all those boxes for me: it's a small download, it's dockerized, so it's very easy to install. The API just makes sense. I had evaluated other vector databases; we can talk about that if you want.
I found that Qdrant was the best mix of all those different factors. So, you know, when I embrace an open source project, I try to do my best to help them out. So I built the first Elixir connector to use Qdrant from Elixir. I'm still trying to develop other little pieces of the puzzle.
So actually I'm interested, I'm quite interested, because, you know, I published a blog on seven vector databases.
It was actually six, and then the founder of Qdrant knocks on my door, virtual door, and he said: hey, please add our database as well, because we are the new kid on the block, and you just probably didn't see us.
And then I opened their website, and I was kind of a little bit blown away, because, you know, the documentation looks interesting, very good, and also the way they position it. Yeah. They talk about things like metric learning, things that I hadn't even heard of before.
And then I also discovered the developer team, what they do, and also that they customized the HNSW algorithm as well. They do graph algorithms.
And can you walk me through the options that you considered a little bit: which databases you took a look at, how deep you went before you decided to go with Qdrant, and what the ticking moment was, like, okay, I like this? Right.
So I think the main one that I studied for a while, that I think a lot of people look at, is Milvus. Milvus has a lot of really exciting energy going on. I think they've got a good replication story as well.
But the problem was, they seemed like they wanted me to use their Python data science toolkit to sort of interact with it. Their API was very abstract and focused on, obviously, just not what I was really doing.
I needed an API that was oriented around data operations and working, not so much analysis. So that kind of slowed me down there. With Qdrant, I felt that as soon as I got to the web page, I knew how to use it. Right.
Which matters, because, you know, for some reason, to me, it's a problem if a software system abstracts away the specifics of how it operates and how you use it too much.
I like to say I like a language where you go to the site and you just see a bit of code there on the home page, and there's a button, you can run it, you know. I don't want to be too removed from the actual tasks of what I'm doing.
So Qdrant just seemed to understand that and get that. And then, going to the channel, I liked the fact that they had a good, deep technical understanding of how it works, but they weren't trying to beat you over the head with the specifics.
It was kind of abstract at the right level. And, you know, it's really fast. I tried tossing millions of records at it, and it was almost imperceptibly slower. You almost couldn't tell that you were adding so many records. So I thought that was really fair.
A lot of these vector databases now, I feel like they're more like platforms, you know. I didn't want that.
I wanted almost like a Redis of vector databases, you know. By platform, do you mean this database is trying to lock you in, in a way, kind of like giving you so many features you don't need? Exactly. Okay.
So I think it's all well-meaning, but I just feel like I can't trust one vendor for a lot of what I'm doing. I need to sort of spread my risk, you know, over different parts. So I try not to embrace any parts of the system that are too large, too monolithic.
You know, yeah. And I guess at this point, you're wearing your VP hat, right? The VP of engineering hat, where you don't just go, oh, this is a sexy platform or tool, let's use it.
But you want to see, long-term, what are the implications, right? And you need to adopt technology that has the right sort of surface shape, so that you know it's going to slide in easily.
You know, with, for instance, the Python FAISS or whatever, I knew that it was going to be a nightmare to wrap, to make connectors for different systems. And I also knew that I wasn't going to program my entire thing in Python.
And I also knew that, you know, I would need to have a long-running component, running a web server, that was independent of Python backend restarts. So with all those factors together, I think Qdrant was kind of the obvious choice.
And plus, just looking through the code, it seemed short. And for some reason, I've been having really good results with stuff written in Rust lately. A lot of Rust software comes across as really reliable and performant. Yes.
So that's what I was about to ask: when you were choosing the platform or the database, did you pay attention to the programming language it was implemented in as well? Yeah.
For some reason, I know it's unfair, but I've definitely observed certain patterns in, let's say, all Java-based server applications; Elastic is a great example. They always want to consume many, many, many CPUs and run into RAM limitations.
And of course, there are still those confusing garbage collection cycles in Java. And every time I run a Java-based service, I end up doing tweaking on the JVM. Yeah, the garbage collector, right? You don't want to do that black magic.
Yeah, I want the thing that's doing the running to sort of be self-operating.
I don't want to have to be tweaking that all the time as my application needs grow and change. So that kind of disqualifies all Java-based software for me.
And I believe that one of the major vector databases is Java, right? I think it's... it is written in Go, actually. But if you mean Vespa, Vespa is written in Java, and some other parts. Yeah, Vespa, yeah. Yeah.
But by the way, did you consider Vespa or Weaviate as you were studying different databases? I believe I checked out the Vespa site; like you said, Java. What was the other one you said? Weaviate. Weaviate, written in Go. Yeah. It's also open source, and also has a cloud offering.
By the way, I have episodes with both Milvus and Weaviate, for those of us that would like to listen in on what the building blocks and architectures behind these are. Yeah. And features. That's actually awesome. I have to listen to that. Yeah.
I'm actually very curious about the different implementation choices. Yeah. Yeah. Because Go is also a high-performance language, right? Compared to Python or Java. Of course, in Java, it also depends how you do it.
You know, if you take Elasticsearch or Solr, both of them are using Apache Lucene, which is the search library inside. And Apache Lucene has been optimized for, well, close to 20 years or even more. So I mean, it's close to C in some sense, but of course, it is not C.
So when you load more and more data, eventually you will run into the situations that you just explained, right? You start tweaking the garbage collector.
There is like a dedicated channel, or even a person, I feel, like Sean Hasey on the community side, who has a lot of wisdom in how specifically you need to tune which parameters in which garbage collector.
Which GC do you want? Or whatever you have, depending on the Java version, right?
Because different Java versions have different GCs, and it's almost like opening a whole can of worms when you don't want to. You want to solve that task and move on to the next one. +So yeah, so far I haven't had that issue with the Rust-based stuff that I'm integrating into my work. But you know, I would imagine the people using the first wave of Java-based server software didn't find any problems either. +So maybe as time goes on, we'll discover that, you know, you can't do large allocations in Rust or something like that. +Yeah, it's also cool that Rust has actually been picked up by many teams developing search engines, and not necessarily vector databases, but traditional, you know, inverted-index databases. Tantivy is one of them, which is using Rust. +And they have a nice blog as well, explaining some of the performance bottlenecks they were able to resolve. And Tantivy is way faster than Lucene. There is another presentation by the Tantivy maintainer. I'm looking for a good full-text system with BM25 and all that kind of stuff. +Unfortunately, Qdrant isn't going to add that, I don't think; it's kind of off-base for them. But that's such an important part. You know, there's a reason that so many of these startups have a team of people just doing search result quality; search results are critical. +Yeah, now that you mentioned this important topic, I also wanted to pick your brain a little bit on that. You know, you have the traditional search engine, let's say on your Classic.com site, right, where I type text and you use an inverted index to find the results. +And then you want to bring in a BERT model or some other model to deal with more semantic search, right. So have you thought about how you would combine them? Let's say inverted index versus vector search. +So I actually... okay, so we say search, right, but there are actually kind of different sub-tasks inside of search.
One of them is when you search for something, we want to show you something that's similar. So you don't necessarily want to get the exact same term. +So that requires one piece of data or one mechanism, right, which is more like a recommendation-type system. You also want to handle things with direct keyword matches, of course, but you also want to handle typos, right. +So typos require a second layer, and the structure of the databases, or the way you implement it, has to work a certain way. Okay. +And I feel like the best way to do this is to have the search piece do multiple different attempts at solving the query and then combine them with an intelligent strategy. So, for instance, on Classic.com we're now building a better auto-suggest component. +And it's actually doing three different types of queries. +And I think that if you start recording what users are doing, and you start looking at every single search and saying, what did we do wrong here? How could we have served this better? I think you'll see that it's actually not just one type of query. +When people see a search box, they'll just start plugging things in. They don't know if they're going to do English-language queries, which is something that an embedding would handle, right, because an embedding can understand any sort of information or intention in that query. +But sometimes they're just searching for a specific model number or something like that. In my experience, with a lot of text embedding models, if you use a term that's outside of the domain, something that was outside the keyword list it was trained with, you're going to get really bad results. +So that's another thing I have to sort of be thinking about.
+So unfortunately, right now I think that the best way to set these things up is to do multiple search queries, maybe a Postgres query, and maybe a Qdrant query, and then correlate all those results and display them intelligently. Yeah, exactly. +So basically, you almost need some smart ranker or re-ranker, which combines these results, and it doesn't care which algorithm was used to bring them in. What it cares about is, you know, optimizing the KPI, let's say click-through rate or whatever it is. +Because in some applications... like, I've been talking to one company building maps, and they said that, for example, when you sit in a car and you start typing a few letters, like two or three, you don't have much time as a driver, and you just need to get going, right? + So if this company is doing badly at predicting the intent... and by the way, what they do is that they don't limit the search only to the radius around you, because they believe that you might be going to the airport from where you will fly out to, I don't know, Washington, DC, and then you are looking for that specific street while you are sitting in the car in Europe. +And so they search the whole database of points of interest, and, you know, first of all, at that scale it's going to be a problem. And the other thing is, you need to actually rank the results in such a way that they get it right. So it's an extremely difficult problem to handle. +So in that case, I would... yeah, in that case, they're probably predicting from where you are now. If you're here now, what's the most likely thing that you want to go to? It's kind of an interesting problem. +And actually, you bring up a good point, which is that a lot of startups don't have enough data to make those intelligent associations.
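The "correlate all those results" step the speakers describe is often implemented with Reciprocal Rank Fusion. As a minimal sketch (not the speakers' actual code; the doc IDs and engine names are placeholders), RRF merges ranked lists from engines whose raw scores are not comparable:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: merge ranked doc-ID lists from different
    engines (e.g. a lexical query, a Postgres query, a vector query)
    using only each document's rank, never the raw, incompatible scores."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]   # hypothetical hits from a keyword engine
vector  = ["d3", "d1", "d4"]   # hypothetical hits from a vector engine
fused = rrf_fuse([lexical, vector])  # d1 and d3 rise to the top
```

Documents that appear high in several lists accumulate the most reciprocal-rank credit, which is one simple "intelligent strategy" for the blending problem discussed here.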
+So it becomes a game of sort of finding an open data set that you can use, or something you do have, and sort of abstracting from it, or extending it in a certain way, so that you can make these intelligent inferences. But it's very, very difficult. +And until you have a lot of users, you don't have any data coming back to you telling you whether or not you're doing a good job. So it's not easy. +And I think that's one of the reasons we see that some of these big startups, these platforms, become very entrenched with their machine learning tools and the data sets they have; it becomes hard to unseat them. +Because all the activity in that space is happening on their property, you know? Yeah, yeah, exactly. So, one thing I wanted to also mention: you said you want to handle typos. +Did you know about, or did you look into, byte-level embedding models? So basically, instead of, you know, segmenting the word, let's say, letter by letter or whatever, which could also be expensive, they go down to the byte level. I think that paper was published by Google. +I will try to look it up and also link it in the show notes. But did you know about this? Or did you consider such models? That's news to me. +What I've been trying to do is just retrain an existing model with a bunch of permutations of things that were like common typos, dashes and stuff like that. But that's a very interesting idea. +So basically, they're working on a character-by-character level, right? And the embedding itself is composed of... it's even bytes, because the language could be anything, okay; you don't want to apply some linguistic component, which is language-dependent. +Let's say in Chinese, you need to segment the string, right? You need to know where the space is, which is not there geometrically.
And then in some other languages, let's say Russian, you will have rich morphology. +So a lot of endings and prefixes on the word, right? So instead of, yeah, surface forms... instead of applying a surface-form lemmatizer, you will go and just look at the bytes. And then you ask the neural network to pay attention to the bytes instead of the characters. Yeah. +That's actually a brilliant idea. And no, I haven't heard of that, but I would love to apply it. So please send me the link. I will, I will for sure. It would be cool if you can apply it and take a look at it. +And hopefully there is a model that you can take off the shelf, and not, like, spend weeks or months researching it. The amount of effort going into training these models now is really, really absurd. It's ludicrous, yes. +And I mean, the models are getting bigger and bigger, but it's funny that they are not necessarily becoming smarter in a way. And I will try to link it. I'm actually now editing an episode, so by the time this one is published, that one should also be published. +And yeah, so basically, how do you train everything? I don't know; as a developer, you don't have time to research things, right? +Now, what options do you have? You will probably need to go to some company which will offer you the service for money, or you need to go and scout on the Hugging Face site and hope for the best, right? How do you feel about that? Yeah. +I have spent so much time just sort of getting my brain around certain things, because, you know, there's no real jumping-off point for a lot of this stuff. +There's no single place you can go to. I see people on the web, on these sites, asking, what book should I read about this? A book, are you kidding me? Who could even write a book about this? There are no books about any of this, you know?
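The byte-level paper being discussed isn't named here, but the underlying intuition (sub-word units degrade gracefully under typos, with no language-specific lemmatizer or segmenter) can be illustrated with a tiny character-trigram similarity. This is only a sketch of the motivation, not the embedding model from the conversation:

```python
def char_ngrams(text, n=3):
    # Pad with spaces so word boundaries also form n-grams.
    text = f"  {text.lower()} "
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def trigram_similarity(a, b):
    """Jaccard overlap of character trigrams: a single typo changes only
    a few trigrams, so the score drops gradually instead of to zero,
    unlike an exact whole-word match."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)

trigram_similarity("elasticsearch", "elasticsaerch")  # stays high despite the typo
trigram_similarity("elasticsearch", "postgres")       # near zero
```

A byte-level model takes the same idea further: by operating on raw bytes it sidesteps language-dependent tokenization entirely, which is exactly the Chinese-segmentation and Russian-morphology point made above.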
+It's changing so quickly that I feel like you have to be part of numerous internet subcultures and very specific research websites to even understand what's going on. +But thank God that people like Hugging Face are putting so many resources into just making these tools available; their pipelines package is such a time saver. I can't believe I was ever implementing all these things from scratch or from, you know, separate tools. +Yeah, yeah, there is another site that I wanted to mention, which is also picking up: Papers with Code. Because when you read a paper, you're like, okay, how do I do this? I need to spend a few weekends to implement it. Some of us have the skills, some of us don't. So those people are lucky. Yeah. +And they probably belong to the communities so well that, you know, they know their way through. Yeah, I think that, you know, papers without code are a little scandalous to me now. +I feel like it's very difficult to measure someone's results and to really evaluate what people are doing if they're not releasing the code. And even if they explain the algorithm in the paper, a lot of times the specific training process for the model is what's really critical. +So, certain decisions they made about what's included in the data set and what isn't included in the data set, and just sort of the training loop engineering; I feel like that's super important. +So I think the success that OpenAI has had with CLIP goes to show that someone with a great idea, and a model that's released on time and early, is really going to be a game changer for the industry. +Yeah, I remember in the Telegram channel of Qdrant, two people, including you, I think, said that without any fine-tuning, you got really great results with CLIP. And I think you guys applied it to different domains. +And that was amazing, because especially the cross-domain application, you know, is such a big pain for these models.
+There is a paper as well where they take off-the-shelf models and apply them to different search tasks, because a search task could, let's say, look like a question answering system, or it could operate in the financial domain, right? So it goes into a specific domain. +And basically what they did is that they applied no fine-tuning whatsoever. So they took the models and applied what they call zero-shot learning, right? You just need to predict right away. And they showed that, ah man, they're not all equal at all. +And sometimes they fail miserably. But they actually found that a specific category of algorithms, I think based on dense retrievers, if I remember correctly, performs better than others. +But if you compare the dense retrievers to the BM25 method, BM25 was still fairly close to them. And it's way less expensive. So that's really interesting. Yeah. It also depends a lot on the very, very specific use cases of your users. +Like what you're saying with BM25: if they're searching for a lot of jargon and industry-specific stuff, BM25 is definitely going to kill it, compared even to models that are tolerant of terms outside of their keyword space. +Like, I really feel like what we need is a more natural way to blend these two kinds of techniques. +And I think as we see more and more advanced vector-based search engines coming out, we're going to see systems that are able to store the full text, store the vector embedding, compare them both and rank them in a uniform way, which is so critical. +And I think something you mentioned that is super interesting is the systems where they're retraining them using these simple keyword question answering tasks, and the result comes out much, much better, the accuracy and so on. I think that's so interesting.
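Since BM25 keeps coming up as the cheap lexical baseline that dense retrievers are measured against, here is a minimal, self-contained sketch of the scoring formula (the documents, query, and parameter values k1=1.5, b=0.75 are illustrative assumptions, not anything from the conversation):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal BM25: score every doc for a whitespace-tokenized query.
    Rare terms get high IDF; term frequency saturates via k1; b applies
    document-length normalization against the average length."""
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                      # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = ["red ford mustang 1968", "blue chevy camaro", "ford gt with red interior"]
scores = bm25_scores("red mustang", docs)  # first doc matches both terms
```

Note how the rare term "mustang" contributes more than the common "red", which is the jargon-and-model-numbers strength described above.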
+And I believe that if you could take a model and take specifics about your use case and blend them together very quickly and easily, we would end up seeing embeddings that produce a much better result in the field. Yeah. +And I think you are also tapping into the part where I hope that at some point we also get a confidence level. Let's say from BM25, you can also get a confidence estimate of how well it worked. And the same goes for dense retrieval, you know, vector retrieval. +It could also say, hey, I'm, like, 60% confident that I found you a good result, right? Then the question is how you build it, but that's the goal. I'm just pushing this out to the universe. So hey, everyone who works on search engines, maybe you can consider this. +Or maybe you already are considering this. But I think that would be so much easier, because, for example, in e-commerce, right, one of the problems is zero-hit search. And probably for your search as well, right? Like, somebody typed something that you couldn't handle. +Now, what do you show? A blank screen, or do you show the most popular NFTs, right? And that's one of the hard things about, I guess, a traditional imperative-based search engine. You can never show the user an empty page. You can never just say, ah, nothing, sorry, try again. +You have to always be feeding them next steps that they can go to that make a lot of sense. And that's definitely one of the challenges with old-style database search approaches: finding results that are relevant, but not quite right. +You know, SQL doesn't really do that. So that's another great thing about Qdrant. A lot of times you're getting a score metric back that is a good, continuous value. It's not boolean, yes this matched, no that didn't match. Yeah, yeah, exactly.
+I remember when I was slowly entering this field, I had a friend who was majoring in information retrieval systems as an academic. +And I asked him, hey, how do search engines work? You know, if I type something, what happens next? And I knew nothing about inverted indexes, nothing at all. And he said, yeah, there is an inverted index. We break the documents down into this kind of vector of terms. +And then each term points to a posting list with the doc IDs and so on. And then you apply Boolean-like logic on top of those. And you make it efficient. But then I was still not satisfied. +I said, hey, so it means that if I need to find something, let's say I'm in discovery mode, I don't know what to type. So what should I do? And he said, yeah, IR is not there yet. Like, there is no discovery. +You literally need to type at least something, right? And then I said, okay, when I type something, how does the search engine know what I'm looking for? +And he said, well, that inverted model, which is the vector space model from the 60s or 70s, basically builds some kind of understanding of the document. +And I said, how exactly does it understand the document? And he said, basically, it's a bag of words. And I said, how can it make sure that it understands the meaning when it's just a bag of words? Well, he said, there is also an IDF component and a TF component. And these two play together. +And hopefully the idea is that you will find some unique document, which uniquely explains what you're looking for. But what if I'm not looking for a rare term? If I'm looking for "to be or not to be", each of these words is a stop word. +Now, how does it know what I'm looking for? And then he said, okay, Google actually pays more attention to the title. So if these words occur in the title, they will rank the document higher. And at that point I was like, this is like magic. So it doesn't understand anything.
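The mechanics the friend describes, an inverted index mapping each term to a posting list of doc IDs, Boolean logic over those lists, and TF-IDF weighting on top of the bag of words, fit in a few lines. A toy sketch (the three documents are invented for illustration):

```python
import math
from collections import defaultdict

docs = {
    1: "to be or not to be",
    2: "the inverted index maps terms to documents",
    3: "bag of words with tf and idf weights",
}

# Inverted index: term -> posting list (set of doc IDs).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def tfidf(term, doc_id):
    """Classic TF-IDF weight of one term in one document."""
    tokens = docs[doc_id].lower().split()
    tf = tokens.count(term) / len(tokens)
    idf = math.log(len(docs) / len(index[term]))  # rare term -> larger weight
    return tf * idf

def search(query):
    """Boolean AND over posting lists, then rank by summed TF-IDF."""
    terms = query.lower().split()
    hits = set(docs)
    for t in terms:
        hits &= index.get(t, set())
    return sorted(hits, key=lambda d: -sum(tfidf(t, d) for t in terms))
```

This also shows the limitation raised in the conversation: for "to be or not to be", every term is common across documents, so IDF contributes little and no real "understanding" emerges.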
I'm searching. +It's just tuning, right? It's layers of hacks upon hacks upon hacks to achieve certain goals. It's very interesting. And in the case of Google, it's amazing that it works as well as it does. The scope of documents that they have in that index is ridiculous. +And to be able to fulfill realistic queries, especially if you consider doing an exact match query for long terms across a huge index of documents, like, how the hell... the quotation-mark queries, I guess you could call them. Very interesting. I think that's a really good thing. +One of the things that I've found to help me evaluate the overall confidence level of what these techniques can do is to evaluate different choices. So for instance, on Classic.com, one of the things we're exploring... we have an enormous editor workflow. +So when a new vehicle comes into the site, we need to have a person who is an expert at that make and model look at the vehicle and determine what it is and answer some questions about it. +Like, what color is it? Has it been restored or is it in an original condition? So what we're trying now is to actually use CLIP for that. So I have a database of, let's say, potential colors. +And then I evaluate the image with CLIP and I say, picture of a red car, picture of a blue car, picture of a green car. And then I look at all of them, and I determine which one has the closest distance. +But also, overall, did any of them have a close distance, or were they all kind of distant, or were they all very far away from the embedding of the query? And if so, then I tell myself, okay, we're not answering this question well. +Right? Like, the fact that it had no strong suggestion at all is in a way a confidence factor, or a confidence metric in a way. Which is fantastic.
+You were able to find an answer to my question, which was broad enough, I think, but essentially you can use a threshold on the distance. That didn't cross my mind at all. Like, yeah, you're right. +You can define kind of like a confidence interval for these distances, right? And you know which metric you're using, and you know your data set as well. +Right? So you could go through them all in your lab and check: okay, is this a good one? Is this a bad one? Yeah, that's an amazing solution that you just came up with. +And from the perspective of the amount of artifacts... like, when you're building a piece of software, you have to ask, how many little artifacts am I creating here? What do I have to actually do? Am I creating a lot of stuff? +Am I just creating a little bit of stuff that works for a broad range of data and use cases? Now with CLIP, you get so much for free, quote unquote. That whole question-answering system I just described took a couple of hours to implement. +And of course, it's going to take a lot of tweaking, but compare that to training a bunch of image classifiers to answer the same task, which would take an enormous amount of effort. You know, we have seven different attributes. +So there would have to be seven different models, hundreds of thousands of training images for each, a very elaborate process of manually correlating them. With CLIP, I just got all that quickly. And again, it's not super accurate, but it gives you a building block that you can just apply everywhere. +And you know, if at some point I wanted to find other vehicles like this one, that same model works. If I want to find out whether this matches a certain given piece of text, like, is this a Ford Mustang? That model works. It's just, I don't know, it's really, really amazing.
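The zero-shot workflow described here, compare an image embedding against prompt embeddings like "picture of a red car" and refuse to answer when even the best match is distant, reduces to an argmax over cosine similarities plus a threshold. The embeddings below are stand-in 3-d vectors (a real pipeline would get them from CLIP's image and text encoders), and the 0.5 threshold is an assumed value:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(image_emb, prompt_embs, threshold=0.5):
    """CLIP-style zero-shot choice: pick the closest text prompt, but
    return None ("no strong suggestion") when even the best similarity
    is below the threshold, so the item can be routed to a human editor."""
    sims = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None

# Stand-in embeddings, purely illustrative.
prompts = {
    "a photo of a red car":  [0.9, 0.1, 0.0],
    "a photo of a blue car": [0.1, 0.9, 0.0],
}
classify([0.8, 0.2, 0.1], prompts)  # a confident "red" answer
classify([0.0, 0.0, 1.0], prompts)  # far from every prompt -> None
```

The None branch is exactly the "confidence metric" insight from the conversation: absence of any strong match is itself a signal.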
+Yeah, it's mind-blowing that, you know, somebody in the science world thought about this problem, and they came up with some really great solution that you can actually use. +But when you discovered that CLIP works so well, were you amazed to the point of going back and reading the paper, or are you not interested in papers? I flipped through the paper. I'm interested in papers to some extent. +I know some people take a week off of work to read papers and stuff like that. I'm like, wow. You know, I don't come from a math background. I come more from a practitioner, programmer background. +So sometimes I actually prefer to study the code and to understand it; a lot of times the usage instructions will give you a lot of subtle information about ways that you should and should not apply it. So I kind of stay in that space for the most part. +But I am definitely paying a lot of attention to all embeddings at this point. +And I feel like, especially with the multimodal ones, once they start including video content, and once we can run audio through there as well, it's just going to be a really exciting time to be alive. +Yeah, I think it's great that you're looking at the code, because it's like several levels of abstraction, right? First, you're trying to understand, okay, is this useful? Okay, it is. +How does it work? What are the limitations? What are the advantages? Then you go to the paper, where they, of course, beautifully describe the algorithm and say it's the best, it beats the state of the art and, you know, the previous work. +But the problem is that there is always a gap, or usually there is a gap, between the paper and the code. So if they publish the open source code, you go there and, oh my god, there's a bunch of additional hacks on top of the paper to make it really work. Oh, I see. +So yeah, it's amazing that you go back and read the code.
And are you ever scared of reading, like, C++ code? No, no, no, no. In a way, C++ is, you know, different. It's a different part of your brain. +You know, C++ is so much simpler in a certain sense, in that every line there has a specific action at a specific point in the code. +Every line there has a certain meaning, whereas with, let's say, a model in PyTorch, there's a lot of... for instance, if your normalization is wrong, right? It's hard to tell that. It's hard to even see that, except by watching a training curve and guessing and sort of hoping. +And maybe that speaks to my skill set, but I definitely think that the machine learning model is brilliant, because it's such a small amount of code that can do so much. Whereas the C++ stuff is interesting because everything is excruciatingly carefully defined. +So they're kind of two separate sides, but both beautiful in their own way. Yeah, absolutely, absolutely. Especially when they get combined. So you're looking at some model in C++, because, for example, HNSW, the graph algorithm, is implemented in C++ that looks like C. +So I took a look at it with one of my colleagues and he was like, wow, this is not modern C++ code. And yeah, it looks very basic in a way. Of course, they use some C++ elements, but, for example, they allocate memory down at the level of mallocs. +Yeah, and you're like, wow, you're doing that. And then some other companies, for example SeMI, which builds Weaviate, or Qdrant, basically re-implement the algorithm in their language of choice, Go or Rust or whatever. +Because you probably feel better after understanding each bit there, and then you can also control it in the way you want, especially after listening to users.
+Yeah, and I think that they're kind of living in different computational spaces, in a sense. What they're expressing, and what a line of code does, is the complete opposite from the perspective of what we're committing to the machine. +You know, with machine learning models, we're building a framework in which it's capable of learning something; in a C++-based or any imperative environment like that, we're expressing everything it can specifically do. It's almost the opposite. It's kind of interesting to think about. +Yeah, exactly. +Hey, Tom, it was really great talking to you, but I was still thinking, if you can spend a few minutes, and if you are not averse to philosophy... I like to ask this question of each guest on my podcast. Considering that vector search is an emerging field in many ways... +We don't know yet if it will fully replace traditional search or if they will work together. But in general, what makes you excited? Why are you doing this? What keeps you going and exploring this field today? I have a very simple answer for you. +I'm tired of writing if statements. So you want to piggyback on these complex models that are already trained. I want to show the machine examples of it working correctly and examples of it working incorrectly, and the machine learns exactly what those if statements should be. +I mean, the idea that we have to train something by illustrating every possible variation of it is just insane. + Imagine, say, on Lookpop, when you're searching for "money" and you see images with dollar signs in them. That could have been programmed by human beings, but it would take a team of hundreds, and it would take them 10 years, and then they would finally have the money detector, you know. +Some brilliant dudes took a couple of months to express how it could work, and now we can solve all these different questions. It's fascinating.
No, it's an amazing answer actually. Thank you. +I mean, you know, some people get entrenched in, like, oh, I'm so in love with machine learning, but what you're saying is that you have a practical need and you also know the limitations of your previous approach, right? +Like if statements: who wants to code if statements? Or if you take, let's say, a dictionary somewhere in Solr or Elasticsearch, you need to manually code that dictionary up and maintain it. +Oh my god. Really? Is that the best part of your job? Probably not. Defining synonyms is a whole... I cannot believe I have to define synonyms. Someone's already done this somewhere, you know, and it's just sitting there on some dusty, dusty shelves. So why not embed them inside the machine learning algorithm? Yeah, absolutely. Hey, it's so fantastic talking to you. +Thanks for bringing this user perspective. And is there something you would like to announce or share with the audience, you know, anything at all? Just check out lookpop.co and take in some NFTs in your life. +Yeah, buy an NFT and spice up your life, the digital life, right? Yeah, awesome. Hey, Tom, thanks so much. I really wish you all the best in trying Qdrant and implementing it in your product, and also, like, the whole web3 for your user base. +And I'm sure we can talk later as well and you can share some progress, you know, as you go. Great. Thank you so much. Thank you so much. Yeah, bye bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/trey-grainger-wormhole-vectors.md b/transcripts/vector-podcast/trey-grainger-wormhole-vectors.md new file mode 100644 index 0000000..e6aa3e1 --- /dev/null +++ b/transcripts/vector-podcast/trey-grainger-wormhole-vectors.md @@ -0,0 +1,299 @@ +--- +description: '

This lightning session introduces a new idea in vector search - Wormhole + vectors!

It has deep roots in physics and allows for transcending spaces of + any nature: sparse, vector and behaviour (but could theoretically be any N-dimensional + space).

Blog post on Medium: https://dmitry-kan.medium.com/novel-idea-in-vector-search-wormhole-vectors-6093910593b8

Session + page on maven: https://maven.com/p/8c7de9/beyond-hybrid-search-with-wormhole-vectors?utm_campaign=NzI2NzIx&utm_medium=ll_share_link&utm_source=instructor

To + try the managed OpenSearch (multi-cloud, automatic backups, disaster recovery, vector + search and more), go here: https://console.aiven.io/signup?utm_source=youtube&utm_medium&&utm_content=vectorpodcast

Get + credits to use Aiven''s products (PG, Kafka, Valkey, OpenSearch, ClickHouse): https://aiven.io/startups

Timecodes:

00:00 + Intro by Dmitry

01:48 Trey''s presentation

03:05 Walkthrough of the AI-Powered + Search course by Trey and Doug

07:07 Intro to vector spaces and embeddings

Disjoint vector spaces and the need for hybrid search

23:11 Different modes + of search

24:49 Wormhole vectors

47:49 Q&A

What you''ll + learn:

- What are "Wormhole Vectors"?

Learn how wormhole vectors + work & how to use them to traverse between disparate vector spaces for better + hybrid search.

- Building a behavioral vector space from click stream data

Learn + to generate behavioral embeddings to be integrated with dense/semantic and sparse/lexical + vector queries.

- Traverse lexical, semantic, & behavioral vector spaces

Jump + back and forth between multiple dense and sparse vector spaces in the same query

- + Advanced hybrid search techniques (beyond fusion algorithms)

Hybrid search + is more than mixing lexical + semantic search. See advanced techniques and where + wormhole vectors fit in.

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20251107_051156_724d3e0d493d36eed167f0604822b7e3.png +pub_date: Fri, 07 Nov 2025 05:58:00 GMT +title: Trey Grainger - Wormhole Vectors +url: https://rss.com/podcasts/vector-podcast/2314900 +--- + +Alright, hello everyone, wherever you are, really, really happy to see all of you online. Welcome to the Beyond Hybrid Search with Warm Home Vectors. It's another idea that Tray is going to present today and we will have a discussion and all of you are welcome to ask questions as well. Yeah, cool. +I think we'll start with that. This is just a quick intro from me. I'm Dmitri Khan. I, most recently, I'm with Ivan. I joined as a product director leading the search domain. We offer managed open search so that you don't have your headaches setting it up and doing some DevOps. +And you can choose any cloud whatsoever, really. And then just go and run with that. And I'll share a couple of links later. I'm also a host of the vector podcast that I started, I think, four years ago. I already stopped counting. Maybe some of you have heard some of the episodes. +And yeah, it keeps going on and off, but I'm really excited to continue doing that. I've been in search for, I think, 16 years, maybe 20 years if I include academic experience or exposure. I've built search at startups, multinational technology giants. +I think what was the startup, for example, AlfaSense became, I think, a unicorn company by now. Yeah, I'm super excited to partner with three AI power search and support from vector podcasts looking forward to the session today. Over to you, Trey. Awesome. Thanks, Dmitri. Appreciate it. +I'm really excited to have Dmitri Khan more for the conversation part of this. He's been doing, doing the vector podcast and in the space for a long time. So I think it'd be useful to help him facilitate, get lots of questions and good discussions. So I'm Trey Granger. 
I'm the lead author of the book AI-Powered Search, along with Doug Turnbull and Max Irwin. I'm the founder of Searchkernel, a company that does AI-powered search consulting, and a technical advisor at OpenSource Connections. For the last year I've been an adjunct professor at Furman University, teaching computer science. My background: basically my entire career has been in search, particularly the intersection of data science, AI, and search. At my last company prior to Searchkernel, I was the CTO of Presearch, which is a decentralized web search engine. Prior to that I was the Chief Algorithms Officer at Lucidworks, a search company, and before that their SVP of Engineering. I also led search at CareerBuilder before that, and a decade ago I wrote Solr in Action, but AI-Powered Search is the focus of what I'm doing right now. The book's got quite good reviews from folks; if you haven't checked it out, please check it out.

This lightning lesson is one of a series leading up to an AI-Powered Search course that Doug Turnbull and I are teaching, starting two weeks from today. It's themed based upon the book, but we'll be going into a lot of new and emerging techniques that aren't in the book as well. Just to give you a sense, and I'll spend a minute on this, maybe two, if you're curious: it's four solid weeks of material. The first week we'll do a course intro, introduce the search relevance problem, and talk about ranking, those kinds of things. We'll have a guest session from Eric Pugh of OpenSource Connections talking about user behavior insights: collecting clickstream data and how to properly collect and process it. Our next session will be on signals and reflected intelligence models.
That covers everything from signals boosting for popularized relevance, to learning to rank for generalized relevance, to collaborative filtering and matrix factorization for personalized relevance, to knowledge graph learning, where you learn from user behaviors about your domain: terms, misspellings, things like that. Every week we'll have office hours where you can bring your hardest questions, and we've got labs throughout the course as well; if you need help with those, we can help.

The next week we'll dive into AI-powered query modalities: things like bi-encoders and cross-encoders. We'll talk about chunking, late interaction models, hybrid search, multimodal search, all of those. Again, all of this has code and notebooks associated with it that we'll be working through. We have a guest lecture from Jenny from Qdrant, who will be talking about mixing sparse sentence representations with miniCOIL, and after that we'll dive into hands-on building of ranking classifiers, or learning-to-rank models, and what is entailed in that. We will, of course, then have office hours again.

The next week we'll dive deep into RAG: naive RAG, agentic RAG, adaptive RAG, guardrails, all the sorts of things you need to understand to do RAG well. We'll talk about agentic search towards the end of the course, and about interleaving strategies for RAG. We'll have Max Irwin, our co-author on AI-Powered Search, giving a guest lecture session. After that will be automating learning to rank with click models and with active learning, so we'll be diving into how to deal with biases in your data, and how to deal with exploration versus exploitation: looking for results that maybe don't show up in your normal search patterns.
Then, in the final two weeks, we'll have a guest lecture from Jon Handler from OpenSearch and AWS, talking about scaling vector search in production, with lots of good experience from large-scale OpenSearch clusters and Amazon services. Then we'll dive into optimizing AI search for production: everything from quantization, re-ranking strategies, and semantic caching to running local models. For our last session we'll dive deep into AI-powered query understanding and agentic search, focused on really interpreting and understanding queries and leveraging agents as part of that process. So if that's interesting to you, there's a link and a QR code here; anyone who attends today is eligible for 20% off the course. Definitely check it out if you've been considering it; there are two weeks left. And even if you can't attend all the sessions, everyone who's enrolled will have permanent access to all of the recordings, all the code, and all the course materials, so you can use them going forward.

So, done with that. Now I'd like to get to our topic, which is beyond hybrid search with wormhole vectors. Let me dive straight in, and feel free, if you have questions, as Dmitry said, to post them in the comments. Dmitry, feel free to interrupt me at any point if there's something worth diving into; otherwise I'm just going to keep going, and we'll focus on conversation at the end.

I want to start with some basic material on vectors and vector spaces, to set our expectations for where we're going with wormhole vectors. To start: vectors, by definition, mathematically, are something that has direction and magnitude. You can think of a vector as something that can go in any direction in vector space. You can add vectors together to generate a new vector; you can average vectors together to find an area that's in between them. There are lots of mathematical operations we can do on vectors, but keep in mind that they have both a direction and a magnitude.

When we think of embeddings, an embedding is a set of coordinates in a vector space, in this case a semantic vector space into which we can map a concept. So whereas vectors have dimensions that go in any direction, when we talk about an embedding, an embedding is actually a point in vector space. For example, this point right here, this series of floats for "book" or "tree" or what have you: you can think of it as a vector originating from the origin at (0, 0) and extending out to that point, but fundamentally we think of an embedding as a coordinate, a point in vector space that corresponds with some semantic meaning.

In search, whenever we're dealing with embeddings, we often have things like word or phrase embeddings, where we take an individual word and, typically leveraging a transformer model, generate the series of floats that represents the meaning of that word given the context around it. But we can also have sentence embeddings, where we look at all of the words in the sentence and their contextual meaning and generate an embedding that represents the meaning of the sentence. We can have paragraph embeddings that summarize the core ideas of a paragraph, and the same thing with a document. Often in search we'll start with just a document embedding, and when we take a query, we generate an embedding for it and do a vector similarity between the two to find related documents that match the query. But you can chunk documents up in any way, into any number of vectors.
Now, we typically think of embeddings and vectors as having a relatively small number of dimensions. We call these dense vectors, where maybe there are 768 or 1024 or some such number of dimensions, and we compress lots of data into a continuous space within those.

However, there's also the notion of sparse vectors, and the best way to think of a sparse vector for purposes of our discussion today is to think of lexical search: just trying to run a search for keywords. Imagine you have a one-million-dimensional vector, not 768 but a million dimensions, and every single one of those dimensions corresponds to a term in your index where you've indexed all of your keywords; let's just assume there are only a million terms in your index. If I wanted to represent "latte" as a query... well, let me not do latte, let me do doughnut. If I want to represent "doughnut" as a query, I can represent it as a vector of a million zeros, minus one: there's a one in the column for "doughnut". This is a million-dimensional vector with only one value set, and that value is whether the text "doughnut" appears within this document or query. That's a sparse vector: it's sparse because most of the data is not filled in, and I have mostly zeros with some ones. Of course, if I ran the search for "cheese pizza", that vector would have two ones in it, one for "cheese" and one for "pizza", so it's a million-dimensional vector with two ones in it.

This is just as valid a vector as a dense vector with only 768 dimensions. But when we move from lexical matching, where we can match on those yes-or-no ones and zeros in an inverted index, to doing semantic search, we typically focus on a much smaller number of dimensions. So conceptually, as an embedding here, what I have is eight dimensions for each of these items.
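As a sketch of that idea, here is a one-hot sparse query vector over a tiny hypothetical vocabulary standing in for the million-term index (the terms and sizes are made up for illustration):

```python
import numpy as np

# Tiny hypothetical vocabulary standing in for the million-term index.
vocab = {"cheese": 0, "pizza": 1, "doughnut": 2, "latte": 3, "water": 4}

def sparse_query_vector(query, vocab):
    """Map a keyword query to a 0/1 vector over the vocabulary:
    one dimension per term, 1.0 if the term appears in the query."""
    vec = np.zeros(len(vocab))
    for term in query.lower().split():
        if term in vocab:
            vec[vocab[term]] = 1.0
    return vec

print(sparse_query_vector("cheese pizza", vocab))  # two ones, rest zeros
```

A real index would store only the non-zero entries, which is exactly what an inverted index does.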
The items that I showed on the previous slide each have dimensions indicating whether it's food, whether it's a drink, how much dairy it has, whether it's bread, caffeine, sweet, calories, healthy, etc. So you can see apple juice is now represented not as "it has the word apple and it has the word juice", but as: very much not food, very much a drink, no dairy, no bread, no caffeine, very high on sweet but not all the way up, very high on calories, and kind of sort of healthy but not really. Same thing with cheese bread sticks: very much food, not a drink, a good bit of dairy, very much bread, no caffeine; you get the idea. The attributes are the dimensions of these concepts, and we map the concepts over here by representing them in these eight dimensions. In search, what we typically do is represent documents and queries with these vectors, and then do some sort of vector similarity calculation in order to say how related or similar things are.

For example, if I take the vector over here for cheese pizza and do a cosine similarity between that vector and every other vector, I see that cheese bread sticks have a very high similarity, followed by cinnamon bread sticks, followed by doughnut, all the way down to water. These are essentially ranked with respect to cheese pizza: the cheesiest, breadiest, unhealthiest, least drink-like things are at the top, ranked all the way down to its essential opposite in this vector space, which is water, all the way at the other end of the spectrum. Same thing with green tea, very similar to water, then cappuccino, latte; you get the idea.
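That cosine-similarity ranking can be sketched like this; the 8-dimensional feature values below are made-up approximations of the slide, not the actual numbers:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 8-dim "explicit feature" embeddings (a subset of the slide):
# [food, drink, dairy, bread, caffeine, sweet, calories, healthy]
items = {
    "cheese pizza":        np.array([0.9, 0.0, 0.8, 0.7, 0.0, 0.1, 0.9, 0.2]),
    "cheese bread sticks": np.array([0.9, 0.0, 0.7, 0.9, 0.0, 0.1, 0.8, 0.2]),
    "doughnut":            np.array([0.9, 0.0, 0.1, 0.8, 0.0, 0.9, 0.9, 0.1]),
    "water":               np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]),
}

# Rank every item by similarity to the cheese pizza vector.
query = items["cheese pizza"]
ranked = sorted(items, key=lambda name: cosine(items[name], query), reverse=True)
```

With these toy values, the ranking goes from cheese pizza itself down to water, mirroring the "cheesiest to least cheesy" ordering described above.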
So in a semantic vector space, things span across these dimensions, and they sit at different places within the vector space corresponding to the meaning of these attributes.

Now, the transformers we get from all the LLMs we're leveraging for vector search today don't use explicit features like we have here (food, drink, dairy, bread, etc.); they use latent features. Latent just means hidden; another way to put it is that the dimensions don't correspond one-to-one with particular attributes, and it's combinations of those dimensions together that give us our meaning. To see that visually: if I were to create an embedding space, and this is obviously flattened, there could be hundreds or thousands of dimensions, but if these are all of the embeddings that I have, and I search for the phrase "Darth Vader", turn it into an embedding, and match it, you'll see that over here on the right I have a cluster of meaning associated with the search for Darth Vader. There are some other points in various places, but if I look at the items in this cluster, I see pictures of Darth Vader, which is what I would expect, because the meaning of Darth Vader is essentially in this area of vector space. Similarly, if I search for "puppy", then this cluster of meaning right here corresponds with puppies, and indeed I see pictures of puppies.

So an interesting question arises when I ask: what happens if I find the midpoint between "puppy" and "Darth Vader" in this semantic vector space? People have different intuitions about what actually happens here; some say, "I don't know what I would find in the middle." The answer is: if this vector space is properly constructed so that the semantic meaning is represented, i.e. the further away I get from this point, the more I get away from "dog", and the further away I get from this one, the more I get away from "Darth Vader", and vice versa, then what I would expect to find if I average those two, a vector from here and a vector from here, is a puppy Darth Vader: a cute puppy Darth Vader, right here in the middle.

For some people that makes intuitive sense, but if you think about what a semantic vector space is doing, representing meaning across a continuous spectrum, you would expect to find this, because I'm essentially finding the thing that is the average, in between Darth Vader and puppy, within the semantic vector space. Now, there are all sorts of reasons why this might not work, depending upon how you've trained your model, and whether there's too much data being compressed into too little space, but conceptually this works.

Similarly, consider an embedding search for "superhero flying" versus "superheroes flying". This is comparable to running a search for "superhero flying" with the idea of a singular hero, then subtracting out the idea of singular and adding in the idea of plural, going from "hero" to "heroes". What happens over here is that this is essentially the same vector, but moved toward, in the direction of, multiple versus singular. What you see, in fact, is that while some of the images are the same, in general I'm now seeing more images of superheroes in groups of multiple superheroes. So let me demonstrate this with a very explicit, concrete example.
If I take an embedding for this image, which is a DeLorean from Back to the Future, I could describe it as: a sporty car with two doors, one on either side; it's kind of boxy, and it's got really cool lighting. When I run a search with this image embedding against other images, I find other sporty cars, obviously some DeLoreans in here, but also, in general, sporty cars with a door on either side and really cool lighting, for the most part.

However, what if I take an embedding for the query from the last slide, "superhero", and average, or pool, it with this image embedding? What would I get? We actually have an example of this in the AI-Powered Search book, where we're doing multimodal search. If I take an embedding for "superhero" and an embedding for this image, what I in fact get as the very first result is a sporty car with cool lighting, with a superhero on top, because that's what I'd expect this region of the semantic vector space, in between those two concepts, to contain. And for these other images: again, sporty cars, single doors, but notice that in all of them there's a person, and it just so happens that that person is the protagonist of their story. Maybe those stories didn't have actual superheroes, but these are the heroes of those stories.

You get the idea. I wanted to paint that conceptually, just to talk about regions of vector space, what they represent, and how you can use math on vectors to move between them, combine concepts, and find related things.
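The "midpoint" and "pooling" operations described here are just vector averaging plus renormalization. A minimal sketch with hypothetical 2-D stand-in embeddings (real models produce hundreds of dimensions):

```python
import numpy as np

def midpoint(a, b):
    """Average two embeddings and renormalize to unit length."""
    m = (a + b) / 2.0
    return m / np.linalg.norm(m)

# Hypothetical stand-in embeddings for two concepts.
puppy = np.array([1.0, 0.0])
darth_vader = np.array([0.0, 1.0])

blend = midpoint(puppy, darth_vader)
# blend is equally similar to both concepts in this toy space.
```

Searching with `blend` would retrieve items near both concepts at once, which is the "puppy Darth Vader" effect.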
So now, zooming back out to the topic of today: one problem that we commonly come across, and this is where hybrid search comes into play, is that we have disjoint vector spaces in search, and that leads to disjoint query paradigms. What I mean is that we have a sparse lexical semantic space, which is our inverted index, what I showed you earlier with the million dimensions and the keywords. Each term represents a dimension, so that is a vector space; it's just a very sparse one. Similarly, we have dense vector spaces, where most of the embeddings that we get out of large language models live; they're compacted into a small number of dimensions, but they're continuous. Because we have these two different query paradigms, what often happens with vector search is we say, "I don't know how to combine this dense query on this embedding with this sparse query with these keywords, so I'm just going to run them as separate searches." And in fact, that's what most hybrid search implementations look like out of the box.

So this is an example of RRF, the reciprocal rank fusion algorithm, where I'm taking a lexical query over here for "the hobbit", and I'm matching on a bunch of documents. You'll see each of these has the word "hobbit" somewhere, either in the title or maybe in the description. But notice that while the first four results look pretty good, those are the only results that had the word "hobbit" in them. For the rest of these results: The Good, the Bad and the Ugly just happens to match on the word "the" three times, and this next result happens to match on The Lord of the Rings, which has "the" in it three times as well.
That happened to give me a good result, but it was purely coincidence, because it doesn't have the word "hobbit" in it; then I get The Abyss, and then The Apartment, again only matching on the word "the". So the lexical search found all the results that had the word "hobbit" in them, but it completely missed a whole bunch of other potentially relevant results. Likewise, my vector query over here for this embedding matched The Hobbit here, and it matched a Harry Potter movie here: similar concepts, similar themes, and a similar kind of visual style. Then The Lord of the Rings; Rise of the Guardians, which I guess is maybe conceptually similar even though it's a cartoon; and The Wailing, which I think has a visually similar style but is a really bad match. You get the idea: there are some really good results from the dense vector search, and some really good results from the lexical, or sparse vector, search. With hybrid search, with reciprocal rank fusion, we can take each of those separate result sets and combine them in a way that up-weights things that both the lexical and the dense search found relevant; it moves those to the top and gives us better results overall. You can see that I've matched most of the good results over here, so it's better than either the lexical or the dense vector search mechanism individually. However, I'm still treating them as entirely separate things: I'm doing the lexical search, I'm doing the embedding search, and then I'm combining them together.
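Reciprocal rank fusion itself is only a few lines. A minimal sketch, using made-up result lists echoing the movie example (`k=60` is the commonly used smoothing constant, not something stated here):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked result lists: each document scores the sum of
    1 / (k + rank) over every list it appears in."""
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up top results echoing the movie example.
lexical = ["the hobbit", "the good the bad and the ugly", "the abyss"]
dense = ["the hobbit", "harry potter", "the lord of the rings"]

fused = reciprocal_rank_fusion([lexical, dense])
# "the hobbit" ranks first in both lists, so it stays on top after fusion.
```

Documents found by both retrievers accumulate score from both lists, which is exactly the up-weighting described above.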
But in reality, there are lots of ways to merge these different paradigms, even beyond just the embeddings we get from text. We can take text and run it through a text encoder to generate an embedding; we can take images and generate embeddings for them; we can also take user behaviors, generate behavior-based embeddings, and combine those together. And there are different ways to generate new vector spaces: you can concatenate these together and do dimensionality reduction, or you can stack them. I'm not going to get into those today, but the reality is we've got a lot of tools at our disposal for querying data and relating it in different ways.

In fact, what I described for hybrid search a second ago with RRF is just scratching the surface of what we can do by combining different paradigms. This spectrum here on the left is token matching, sort of traditional lexical search, and things like TF-IDF matching fit over here. On the opposite end of the spectrum we've got dense vector search, and of course RRF would fall in here, in this hybrid of sparse retrieval and dense vector search, where we run them independently, in parallel, and combine the results. But there are also mechanisms where we could, for example, run sparse retrieval first and then re-rank using dense embeddings. Or something like miniCOIL, which I mentioned Jenny from Qdrant is going to come talk about in the AI-Powered Search course: you can actually run a sparse search and have embeddings that add semantic data to your lexical queries, to better leverage semantics as part of your sparse search. There are semantic knowledge graphs; there are all these different techniques we can use to get better search, whether that's hybrid search or one of these other techniques. I just want to note that there are lots of ways to work with embeddings, and with sparse and dense vectors, and to combine them to improve query understanding and recall.

So one of the things I'm experimenting with, sort of an emerging way to do this, is something I'm calling wormhole vectors. The idea is that I've got these different vector spaces: my sparse lexical vector space, which we talked about; my dense semantic vector space; and, as I mentioned, we can also generate behavioral vector spaces, which I'll show in a little bit. I want to walk through what this technique looks like.

I do want to frame this as something new and emerging. I've got lots of experience doing some of this across different vector spaces, but there are a lot of things I still need to iron out in terms of best practices, so treat this as something emerging that you can play with; I think the intuition will be really helpful. In preparation for the course, and going forward, I'm going to be doing a lot more in terms of concrete examples for this.

I don't want to get into quantum physics in general, but wormholes, if you're not familiar, are essentially passages through spacetime. You can think of a wormhole as the ability to go from one point in space to another point in space and essentially hop there instantly. I could get into Einstein-Rosen bridges and that kind of stuff, but I don't really want to for purposes of today. I'll also skip over this notion of entanglement and how it relates to wormholes; maybe we'll come to that in the Q&A if Dmitry is interested, but this is about search, not quantum physics. So, practically, this is what it means to generate a wormhole vector.
There's a fundamental base reality to all these vector spaces. Whether I query with an embedding in a dense vector space, or I query with a lexical query over here, or I query with user behavior over here, all of those queries ultimately boil down to matching something, and the something that they match is really critical to how we understand queries and relevance. What they boil down to is a document set.

If you run an embedding search over here, you find a point in vector space, and if it's a dense space, you typically run an approximate nearest neighbor algorithm, or otherwise find the nearest neighbors to whatever point you're querying, and those are your relevant documents. Those documents form a set; you can cut off the threshold at any point to say "these are the documents that matched", but that set of documents collectively has some meaning, some relationships within it, that represents the meaning of that query. Likewise, if I do a keyword search, I find a document set, and the collection of those documents represents the meaning of that query, at least as well as we've been able to represent it in that vector space; same thing over here.

So the idea of a wormhole vector is this: if I want to query in one vector space and find a corresponding region in another vector space that shares the same semantic meaning, then I query in the current vector space. Let me start here: I'll query in the sparse lexical vector space. I will then find a relevant document set; this is what search does, it finds a document set. And then I will derive a wormhole vector to a corresponding region of another vector space.
For example, once I've found a document set over here, I use that document set to generate what I'm calling a wormhole vector: a vector that allows me to query in the other vector space, to hop, or traverse, to the other vector space instantly, to a region that shares a similar semantic meaning with the region in the lexical space. Then, once I've derived that vector for the other vector space, I run that query in the other vector space to traverse to it, and I repeat as needed. I can actually hop back and forth between vector spaces to find and collect documents, to try to better understand them, and then use that understanding to take those documents and return them as the full set of search results.

Let me actually show this visually for a second; let me click here and restart this demo. Imagine I have a sparse vector space over here on the left. The way this works is: I send in a query, and this query finds a set of relevant documents in this vector space. Once it's found those documents, it uses them as, essentially, a wormhole. Once I run that query, I find the relevant documents, which are the things nearby in vector space; I then use those to generate a vector, an embedding, that I'm going to run a search for over here in the dense space. Once I run that search, you'll notice that in this example it's not exactly where these documents are, but it's very nearby, meaning the collection of these things together, and what's understood semantically about their relationship, maps to this point in vector space on the right. That then allows me to find other things surrounding it that represent a similar meaning.
That was just looking at two vector spaces, a sparse vector space and a dense vector space, for keywords and then for embeddings. But as I mentioned, there's also this notion of a behavioral vector space, and the same thing happens here: I can run a query, find relevant documents, and use those as my wormhole. Then I generate this wormhole vector to hop through the wormhole to the other side, to find the region corresponding to that meaning in either of these other vector spaces. In this case I've done matrix factorization, which is the process you go through when you're doing collaborative filtering for recommendations, and then I can hop to the corresponding region over here. So that's the general idea, described visually.

Give me one second to pull the slides up. All right. The next question is how we actually create these wormhole vectors. Dmitry, if there are any questions, feel free to interrupt me at any point. (I think we have a couple of questions, but we'll defer them to the end. Sounds good.) All right. So the question now is how to create a wormhole vector, and there are essentially two types that I'm going to focus on right now.

The first is when I'm trying to go to a dense vector space with embeddings. This is very easy: all I have to do is pool, or average, the vectors of the top-n documents. Imagine I run a keyword search: I find a set of documents and rank them. I don't necessarily need to take the entire document set (I could), but if I want a more semantically relevant, or let's just say relevant, set corresponding to that keyword query, I take just the top-n documents. Then I generate a new embedding in the dense space that is simply the average of those.
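That averaging step, mean-pooling the embeddings of the top-n lexically retrieved documents into one dense query vector, can be sketched with NumPy; the document embeddings below are hypothetical:

```python
import numpy as np

def wormhole_vector(doc_embeddings, top_n=10):
    """Mean-pool the embeddings of the top-n retrieved documents into a
    single dense query vector, renormalized for cosine search."""
    pooled = np.mean(doc_embeddings[:top_n], axis=0)
    return pooled / np.linalg.norm(pooled)

# Hypothetical dense embeddings of documents returned by a lexical
# search, best-ranked first.
top_docs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.1, 0.2],
])

dense_query = wormhole_vector(top_docs, top_n=3)
# dense_query can now be searched against the dense vector space.
```

Renormalizing is a choice that suits cosine-based nearest-neighbor search; for dot-product or Euclidean indexes the raw mean may be preferable.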
If you go back to the Darth Vader example from earlier, where the puppy Darth Vader is in the middle, that was a combination of the meaning of Darth Vader and the meaning of puppy. Think of this as taking a bunch of documents that each have their own meaning; when I pool them together, I'm creating an embedding that carries the average of their meaning. If I assume the documents I queried on the lexical side share some meaning, and I take the top documents from that set, then I can hop over to the dense space with that shared meaning and find other things that mean something similar, even if they don't match the keywords.

Likewise, I can go the other direction. If I'm in my embedding space, my dense space, I can run a search and find the top-n most related embeddings by cosine similarity or what have you. Conceptually it seems more difficult to then hop over to the sparse space: how do you generate a sparse vector? But there's a technique called semantic knowledge graphs, which I'll walk you through, that allows you to do this.

So, zooming back out: I mentioned pooling the vectors of the top-k documents. All you need to do is query the lexical space, get the top-k documents, get the embeddings of those documents, and average them together; there's a simple way to do that just using NumPy. The semantic knowledge graph approach is the same idea: I get the top-k documents in the current vector space, and then I do a semantic knowledge graph traversal to derive a sparse lexical query that best represents those documents.

So, functionally, let me talk about semantic knowledge graphs for a second and show you the structure of natural language. You can think of language as a graph of relationships: we've got prefixes and suffixes, and those map to terms, and those map to term sequences and documents.
Once you've got documents, and terms across documents, you can think of all of this as a giant graph of relationships. I can take individual words; in this case "Trey", "his", "he", "they" all refer to the same entity; I can take other things, and if I think of this as a graph, then in fact you can leverage your inverted index as a graph and traverse it to find these relationships. In a typical search engine, any of the Lucene engines for example, you have an inverted index, which maps terms to sets of documents, and you've usually also got a forward index; in OpenSearch, Elasticsearch, Solr, any Lucene engine, that's going to be your doc values. So I can take any term and map it to a set of documents, and I can take any document and map it to a set of terms. If I can do both of those, then that's a graph, and I can traverse back and forth across it.

For example, if I have the skill "Java" in the skill field, I've got a set of documents that contain the keyword "Java", and you can think of that set of documents as representing the keyword "Java". Similarly, those documents link to other skills. You'll notice there are documents linking the skill "Java" with skills like "Scala" and "Hibernate", but no documents linking them to a skill like "oncology". In a set theory view it looks like this: notice that this set doesn't intersect with these. And in a graph theory view, the same underlying indexes look like this: I have a graph where the skill "Java" has a has-related-skill edge to the skill "Scala" and to the skill "Hibernate", while "oncology" is completely disconnected from this graph. All I'm doing is leveraging my inverted index, my sparse representation, to traverse across these relationships.
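A toy version of that traversal, using nothing but an inverted index built from hypothetical documents; the relatedness score here is a crude foreground/background ratio, a stand-in for the z-score-style scoring a real semantic knowledge graph uses:

```python
from collections import defaultdict

# Hypothetical mini-corpus: doc id -> terms.
docs = {
    1: {"server", "docker", "kubernetes"},
    2: {"server", "docker", "jenkins"},
    3: {"server", "restaurant", "tip"},
    4: {"restaurant", "tip", "bill"},
    5: {"docker", "kubernetes", "git"},
}

# The inverted index: term -> set of doc ids containing it.
index = defaultdict(set)
for doc_id, terms in docs.items():
    for term in terms:
        index[term].add(doc_id)

def relatedness(term, context_term):
    """Fraction of a term's documents that also contain the context term:
    a crude stand-in for a real SKG's statistical relatedness score."""
    overlap = index[term] & index[context_term]
    return len(overlap) / len(index[term])

# Traverse from "server" to its most semantically related terms.
related = sorted((t for t in index if t != "server"),
                 key=lambda t: relatedness(t, "server"), reverse=True)
```

The traversal is term, to its document set, to the other terms in that set, ranked by how strongly they co-occur; no embeddings involved.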
+ This is very useful for things like disambiguation, where I can take a keyword like server, traverse through documents to find the top semantically related categories, for example DevOps and travel, and then within each of those categories traverse to other keywords and find which are the most semantically related keywords to server. In the DevOps category, for example, I get terms like Docker and Jenkins and Git; in travel, I get things like tip, restaurant, bill. + So all of this just leverages an inverted index. There are no embeddings whatsoever; this is all just leveraging the sparse semantic space. But why this matters for modeling intent is that if I have a query like barbecue near haystack over here, I can generate a sparse vector representing the meaning of barbecue by looking at the index and seeing what's related to it. +So in this context, what I'm able to find is that barbecue is related to things like ribs and brisket and pork and the category of restaurant. That is, I can generate a sparse lexical vector like this purely from the semantics, the things that are semantically nearby barbecue in my sparse vector space. +Also, if you look at the query over on the right, barbecue grill, what I'm able to do is generate a sparse vector that is barbecue or grill or propane or charcoal. Notice that this vector is now different, because it's contextualized based upon grill being in this query. +So now my query becomes the category of restaurant plus this list of words that better represents the meaning of barbecue. Again, no embeddings, no transformer models, no LLMs involved here. This is purely leveraging my sparse lexical space and the semantics within it. +And this is some example source code from the AI Powered Search book for traversing semantic knowledge graphs. But the idea here with the wormhole vectors is that I can take a query in any vector space. So for example, if I take a lexical query here.
+ I can easily take, you know, lasagna, or drive through, or what have you, and I can generate these representations over here by taking lasagna, finding a doc set that matches that keyword, and then from that doc set finding these other relationships. For example, lasagna can be described as the category Italian, with keywords like lasagna, alfredo, pasta, and Italian. +And then Korean barbecue can be represented as the category Korean, with terms like Korean, banchan, et cetera, and fast food gets things like McDonald's and Wendy's. So this is purely leveraging the inverted index, and I've been doing this for years and it works very, very well. +But this is purely leveraging the inverted index and this document set. The idea with the wormhole vectors is not just to stay within a single vector space but to be able to go across vector spaces. +So similarly, I should be able to take an embedding that finds a region in semantic vector space, in a dense space, find the nearby things, which ultimately just translate to a doc set. +And then from that doc set I can use the same technique to ask what things are related within these documents and generate, you know, the same kinds of outputs over here. + You can also think of this, taking away all the wormhole vector terminology, as just a way to make your embeddings more explainable, right? I've got an embedding, I go to a dense vector space, I find documents, and then from that set of documents I'm now deriving a lexical vector, which is readable. +And then I'm describing what's happening there, and of course I can then turn around, take that, and query my sparse space to match other things that have the terms but maybe didn't match in the dense space. So that's the general idea. +And there's one last thing I wanted to cover briefly, which is this notion of behavioral embedding spaces, because I've mentioned it multiple times and I have a feeling a lot of people aren't super familiar. So let me click here.
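Before moving on, here is a toy sketch of the dense-to-sparse hop just described. The tiny corpus and the simple "lift" score (foreground rate over background rate) are invented for illustration; the actual semantic knowledge graph work uses a more principled relatedness score, but the traversal shape is the same: doc set, then terms, then ranking.

```python
from collections import defaultdict

# Illustrative corpus: doc id -> terms (values invented for this sketch)
corpus = {
    0: ["barbecue", "ribs", "brisket"],
    1: ["barbecue", "pork", "restaurant"],
    2: ["flight", "hotel", "tip"],
    3: ["restaurant", "bill", "tip"],
}

# Build the inverted index: term -> set of doc ids (the graph's edges)
inverted = defaultdict(set)
for doc_id, terms in corpus.items():
    for term in terms:
        inverted[term].add(doc_id)

def related_terms(doc_set, top_n=3):
    """Traverse docs -> terms; rank terms concentrated in doc_set
    relative to the whole corpus (a stand-in for SKG relatedness)."""
    n_docs = len(corpus)
    scores = {}
    for term, postings in inverted.items():
        foreground = len(postings & doc_set) / len(doc_set)  # rate in the doc set
        background = len(postings) / n_docs                  # rate in the corpus
        if foreground > 0:
            scores[term] = foreground / background
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Pretend {0, 1} is the doc set found by a k-NN query in the dense space;
# the traversal turns it back into sparse, human-readable terms.
print(related_terms({0, 1}))
```

The same `related_terms` call works whether the doc set came from a lexical query or from dense k-NN hits, which is what makes it usable as the dense-to-sparse direction of the wormhole.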
+The general idea, and I'll be very quick through this, we'll spend more time in the AI Powered Search course diving into all of this, but the very high level intuition is that users interact with your documents, right? They run queries, they click on the data. + They click on things, they like them, add to cart, purchase. Those are user behavioral signals, and if you've got a sufficient amount of traffic, you want to be collecting those and leveraging them to build reflected intelligence algorithms. I mentioned several types earlier: signals boosting, collaborative filtering and matrix factorization, learning to rank, and knowledge graph learning. But specifically on collaborative filtering, which is mostly focused on personalized search: + so for personalized search, for understanding user behavior to generate better personalized results, we typically leverage collaborative filtering, which is the norm for doing recommendations. I start with a particular item or a particular user, and I recommend other items based upon that item or user. + It typically looks like this, right? Somebody runs searches or purchases things like Apple and MacBook, and then these are the items they interact with, you know, iPads and MacBook Airs, things like that, and then for that user we can generate this list of recommendations by running this so-called collaborative filtering algorithm, in this case. + I want to briefly mention again that with typical content-based embeddings, I mentioned latent features earlier: typically you have items, and there are these densely packed dimensions that represent different features collectively. You know, this particular feature might have a strong correlation with size, this one might have a strong correlation with color, this one might have a strong correlation with, you know, whether this is kind of like a computer.
+ That meaning is spread across many different dimensions. Similarly, when we're doing collaborative filtering, it also relies on latent features or latent dimensions. So for example, if I have a bunch of users: my first user likes these three movies, my next user likes these three movies, my third user likes these three, my next user likes these three, and my last user likes these three. + You can kind of visually see here that there's some similarity here and some similarity there; your brain is probably picking out what it is. But if I were to map these conceptually, I would say that users one, two, and three tended to like movies that were about superheroes, made by Marvel Studios and occasionally Warner Brothers; they're all action movies, and they're not suitable for small children. + Whereas users four and five all like animated movies, all of them suitable for small children, and all of them made by Disney and Pixar. A collaborative filtering algorithm sort of discovers these relationships and recommends based upon them, because they exist in the underlying documents even though we don't have them modeled out explicitly. And the way this works with collaborative filtering is we do matrix factorization. We start with a user-item +matrix, where here's my list of users and here's my items, and then these are the amounts to which they like those items; we can derive this based upon just their querying and click patterns. + The intermediate step for collaborative filtering is matrix factorization, which takes this underlying user-item interaction matrix and tries to break it into two different matrices: this user-feature matrix and this item-feature matrix. The idea is that I can generate a set of latent values associated with each user across some number of dimensions. I'm only showing three here visually because it's a PowerPoint slide, but you know there'd be more.
+ And if I have, you know, the same latent dimensions over here for the items, then when I multiply a particular user's values for these latent dimensions with a particular item's, I'm pulling apart how much of the interaction belongs to the movie and how much belongs to the user. + So this is a user embedding and this is an item embedding. This is just how collaborative filtering works to actually generate recommendations for particular items, which is not particularly useful for today. But what I can do is generate these latent embeddings, and these essentially allow me to create a behavioral embedding space for my items. +Once I've done that, I can add these behavioral embeddings onto documents, just like I do with content-based embeddings, whether for images or text or what have you, and then leverage those as a behavioral space. +So we do this commonly with, you know, personalized search, for example; we'll go through this in the course. But say I have a person who previously searched for Hello Kitty plush toy, GE electric razor, GE bright white light bulbs, Samsung stainless steel refrigerator. + I can take a normal query for microwave, which just returns random microwaves. If I use these vectors improperly, with no guardrails, I might do things like blur the lines between categories. For most people, if they've searched for a Samsung stainless steel refrigerator, the best result here would be a Samsung stainless steel microwave, but if you do this wrong, with the sort of naive approach, you know, I might end up with a Hello Kitty microwave or, you know, a Panasonic microwave. +Not Panasonic, but they might end up with things that don't exactly match all of the preferences within the category. Again, that's for another day, but this is how a behavioral vector space would typically be used.
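As a rough sketch of the factorization step described above, here is a toy example. It uses a truncated SVD purely as a stand-in for the ALS-style, implicit-feedback solvers real recommenders use, and all the interaction values are invented; the point is the output shape, one latent vector per user and per item, which can then serve as a behavioral embedding.

```python
import numpy as np

# Rows = users, columns = items; values = interaction strength derived
# from queries, clicks, add-to-carts, purchases (all numbers invented).
interactions = np.array([
    [5.0, 4.0, 0.0, 0.0],   # user 0 likes items 0 and 1 (say, superhero movies)
    [4.0, 5.0, 0.0, 1.0],   # user 1, similar taste to user 0
    [0.0, 0.0, 5.0, 4.0],   # user 2 likes items 2 and 3 (say, animated movies)
])

k = 2  # number of latent dimensions to keep
U, s, Vt = np.linalg.svd(interactions, full_matrices=False)
user_factors = U[:, :k] * s[:k]   # one k-dim behavioral embedding per user
item_factors = Vt[:k, :].T        # one k-dim behavioral embedding per item

# user @ item approximates the interaction matrix; high values in
# previously empty cells are recommendation candidates.
predicted = user_factors @ item_factors.T
```

The `item_factors` rows are what the talk proposes attaching to documents alongside content-based embeddings, so the same k-NN machinery can query the behavioral space.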
+But ultimately there are a lot of tips and tricks you can use in AI powered search to combine all of these different techniques that you might use to run searches, to do query understanding and relevance, and to integrate wormhole vectors in various places. + So there are lots of different query paradigms to experiment with and merge using wormhole vectors. But that's the general idea I wanted to introduce today, to get the discussion going: going from thinking of these vector spaces as entirely orthogonal, where we have to query them separately, or maybe even query them in the same query but filter on them separately, to actually pulling the semantic understanding out of one vector space and using that to craft +a sort of wormhole, or hopping-off point, to another vector space, to sort of continue exploring with a different query paradigm. So that's pretty much it for the talk today. Dima, I don't know if you want to start diving into some questions. +I know some people will have to hop off at the top of the hour because this is scheduled for an hour, but I'm also happy to just keep going with questions a little after if it makes sense, and people can drop off when they want. But let's maybe dive into some discussion. +Yeah, we have a bunch of questions, thanks a bunch. This is a fantastic topic. I just recently traveled to Texas from Finland and it took me like 12 hours. I wish there were wormhole, you know, jump-through points so I could end up there much quicker. We have so many questions, man. +So I'll defer my questions and I'll just jump in. There is one logistics question from Arthjune, I hope I pronounce your name correctly, I'm sorry. How is the course different from the AI Powered Search book? And then later: is this topic, wormhole vectors, covered in the book? Awesome. +Okay, so I would say, material-wise, there's probably about a 40% overlap.
Like, the book is a good solid foundation for how to think about AI powered search. Obviously we go through all the mental models and lots of code examples. +So a lot of the labs and a lot of the code for the course will come from the book. However, there are a lot of new topics and things that we just couldn't fit; you know, we couldn't write a thousand-page book. +And so there are a lot of things we just couldn't get to, because we had to start from the beginning and frame it. +So things like late interaction models, things like agentic search, that aren't in the book (late interaction models are referenced, but we just couldn't get into depth) that are more modern and interesting ways to solve problems. +Things like miniCOIL, which I mentioned, those things will be in the course and are unique to the course, and we'll have guest speakers who are experts in those things. So I would say the course doesn't expect you to have read the book or to understand the fundamentals in the book; we'll cover those. +But we won't cover everything in the book, and we'll also cover a lot of things that aren't in the book and go into deeper depth. And so I would say, you know, if you've read the book, the course is still going to be really valuable. +And even if you can't make all the sessions, again, the videos and all the materials are, you know, available for you forever. So you don't have to have read the book to take the course, but if you have read the book, the course is still going to be massively useful. Yeah. +So the two complement each other. And by the way, I own the book and it's an amazing read, you know, in silence. And then the course is a different way of, you know, engaging with the material, a dynamic way. +Well, I didn't answer the last part, which is: will wormhole vectors be covered? They will definitely be covered, more so as the techniques and strategies for how to hop back and forth between spaces.
+So some of it's actually in the book; the semantic knowledge graph stuff is already in the book. But yeah, we'll definitely talk about wormhole vectors explicitly and have some more specific examples people can play with. Yeah, awesome. +And I do want to mention this is experimental and emerging, and there are some things that I glossed over today in terms of, you know, hopping to a particular point versus trying to hop to a region and have more of a shape, which we could chat about as well. +But yeah, there are still some things I'm doing to kind of better understand it and refine it, you know. Yeah, awesome. I'm trying to speed up, but there's a question from Claudio. +What are the latent features? Basically, where you switched from sort of explicit feature matrices to latent features. Maybe I can take it quickly. It's basically, in an LLM, +or if you deal with an encoder model where you generate embeddings, basically these are internal representations that the model learns, and they're compressed. They're an abstract way of, you know, dealing with patterns or relationships in your data. +It's not exactly that black and white, but on the conceptual level, they're like internal weights that the model learns. Then there is a question from Julian, a very concrete one. +Can you give an exact, concrete example of how to compute the wormhole vector from sparse to dense space? Yeah, so I had a slide. Sparse to dense is the easy one. Let me go back to the slide. One second, almost there. Here we go. +So to go from sparse to dense, think of it this way: you've got a bunch of documents in your index and you generate embeddings for those documents. That's how your dense space is constructed, right? Those embeddings on the documents.
+If you query for the documents using keywords in your sparse space, then you're still matching that set of documents, and all of those documents have the embeddings on them. So all you do is run a keyword search on your documents. +Take the top N documents that are the most relevant, right? They hopefully semantically represent the concept the best. Then you take those embeddings off of them and you literally average them together. The code for that is on the screen right here. +And with that you just generate this pooled embedding. +It's that notion of Darth Vader versus puppy and finding the puppy Darth Vader in the middle, right? If someone were to run a keyword search, and it's easy to think of this with a single keyword, but let's go back to, well, let's go back to cheese pizza, right? +Like, if I search for pizza, I'm going to match a bunch of pizzas. +If I search for cheese, maybe pizzas come back, as all pizza has cheese. Let's do cinnamon breadsticks instead, right? If I search for bread, I'm going to find bread, you know, documents that have the word bread. If I search for cinnamon, I find documents with cinnamon. +If I search for sticks, I find documents with sticks; sticks by itself isn't really what I'm looking for. +But if I do cinnamon bread sticks, then I'm finding all of the documents that have those terms together, which likely are cinnamon breadsticks, or have the notion of cinnamon breadsticks, or are talking about cinnamon breadsticks. + So if I take all of those documents, the most relevant ones, and I average their embeddings together and go over to the dense space, where I land should be where the concept of cinnamon breadsticks is, and things nearby, which may not have the words cinnamon, bread, or sticks in them, should come back. +I might get, you know, certain kinds of donuts, or I might get like a churro or something like that. So that's how it works. But this is the math here.
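A minimal sketch of that pooling math, with invented embedding values; only the mean-and-renormalize step matters:

```python
import numpy as np

def pooled_embedding(doc_embeddings: np.ndarray) -> np.ndarray:
    """Average the top-N document embeddings and re-normalize to unit length."""
    pooled = doc_embeddings.mean(axis=0)
    return pooled / np.linalg.norm(pooled)

# Embeddings of the top 3 documents matched by the lexical query
# (e.g. "cinnamon breadsticks"); values are made up for illustration.
top_n_docs = np.array([
    [0.8, 0.1, 0.1, 0.0],
    [0.7, 0.2, 0.0, 0.1],
    [0.9, 0.0, 0.1, 0.0],
])
query_vector = pooled_embedding(top_n_docs)
# query_vector is the "wormhole" point: use it for nearest-neighbor
# search in the dense space.
```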
That's actually the easiest direction, going from sparse to dense. Dense to sparse requires a semantic knowledge graph or similar. +Awesome. I hope this answers your question, Julian. If not, feel free to unmute and ask follow-up questions. Otherwise, I'll jump to the next one, from Ursula. +Do we build the inverted index and the forward index to build the knowledge graph using just some document chunks? Do we need a much bigger document base to make it work? That's a good question. + So for a semantic knowledge graph to work the best, you need to have overlaps of terms across documents. Meaning, if I take something like Stack Exchange, where there's a bunch of topics being talked about, you'll have lots of people who use the same words together in the same documents. +When that happens, you can easily find sets of terms that overlap commonly and use the semantic knowledge graph to generate semantic understanding and relationships based upon those co-occurrences. All the math for that's in the AI Powered Search book, but that's when it works the best. +Something like Wikipedia, even though it's commonly used for like every data science project, is actually really bad for semantic knowledge graphs, because every Wikipedia document tends to be about a particular topic. +And other than common terminology, you tend to not have a lot of overlap across documents, because they're all focused on one idea. So for a semantic knowledge graph to work well, you typically want to have overlap across your documents. +What that means is that if you chunk your documents so small that you only have a couple of words or sentences or something like that, you lose a lot of that context. I mean, in general, when you chunk, you lose context; that's the problem with most forms of chunking. +And so you have to be careful not to chunk too much, but the inverse is also true.
If you only have 100 documents and every single one of them is a thousand pages long, well, there's way too much overlap and everything is related at that point. +So I would say it's no different from how you would typically segment your documents for any search problem, right? You need to be granular enough to be useful, but not so broad as to be too general. +And now a logistical question from Arjun: whether we will share slides. Yes, absolutely. Yeah, everybody who signed up will get the video for this; probably in like 48 hours you'll get emailed a copy of the video, so you can refer back to it. +And I'll also send an email with the slides, probably shortly thereafter. Yeah, and I plan to publish this on the Vector Podcast as well. Yes, absolutely. Later. The next question is really cool, from Claudia: creating a wormhole vector that will move us from embedding space to a sparse vector. +I understand the methodology, but the way back: how do we aggregate a set of sparse vectors that represent documents in a way that will allow us to move to the embedding space? +So in the sparse words, like you showed, Trey, we have like millions of dimensions, right? How do we compact that and not lose anything, and not introduce any noise, on our way to the dense space? Yeah, so it's a really great question. +I answered it technically in terms of pooling, but let me add some color to it in terms of techniques. So there are a couple of things here. +One, whenever you're querying an inverted index, there's typically a kind of Boolean matching phase and then a ranking phase, meaning if you had 10 million documents in your index, you're not going to return a ranked list of 10 million documents. +You're probably going to return the documents that have the specific keywords you searched for, which is going to be a much smaller document set.
And you can do the same thing on the dense side with, you know, cutoffs on cosine similarity or something like that. +But step one is you start with a condensed document set that should generally represent the meaning of what you searched for, using the keywords you searched for, on the lexical side. +However, because the idea of a wormhole vector is to find the best corresponding region in the other semantic space, it can often be useful to not take that entire document set matching the query either. If you feel confident about your ranking, then you can take the top N documents. +So maybe you matched 10,000, and maybe you only take the top 100 and say: hey, from the top 100, if you know your relevance ranking is good, then you're going to use that to generate a more precise wormhole vector from the meaning of those top documents over to the dense space. +And whether you go with the full matching document set or with the top N is really just a practical matter of how confident you are in the ranking. If you're really confident in your relevance, you should go with the more relevant documents. +And if you're not, just take the whole document set and it should sort of average out, you know, the meaning. Another thing that we didn't really get into is the strategy, the technique I was showing. Let me jump back to the final slide one second. So I jump back to here. +So the technique that I'm showing, where I get my document set and pool my embeddings together, ultimately gives me a single embedding, which is a single point over here in my dense vector space. But different queries have different specificity. +So imagine this is like a job search engine. If I run a search for, you know, senior AI search engineer with, you know, ColBERT, signals boosting, and collaborative filtering, if I run that search, that's a very specific search.
+Frankly, it probably doesn't match anybody, but if I ran that search, it would be a very small number of documents; it's very specific. And so in that case, having a point kind of makes sense. However, what if I ran a search for sales? That's like a third of all jobs. +And for me to take the notion of sales, which is probably a giant region in this vector space with lots of nuance inside of it, and then turn that into just a point in the other vector space, +it's probably not going to work out super well, because sales is probably distributed across that other vector space in a much larger region. And so there's this notion of query specificity, which is also really useful. +So I would actually argue that the better way to do this technique is as part of your initial query, when you're finding the set of documents. + If you can look, for example, at the embeddings and do just a cosine similarity across the embeddings that you're pooling, you can go from a bunch of embeddings that are just pooled together into a point to actually asking: what is the relative size of the range of the cosines within these embeddings? +And if it's a very large range, I understand that this is not a very specific query; it's a broad query. Therefore, when I go query in the dense space, I need to draw a larger radius, or a larger kind of shape, around what I'm searching for. +So ideally you're actually searching for a shape and not just a point. But literally every vector search implementation I've seen at any company is searching on embeddings as points and just looking for the nearest things, not searching on shapes. +And so we don't even really have the query patterns and paradigms in place today to do that kind of query. But I think that would be a further improvement on the paradigm here. Awesome. Yeah, Tim Allison says thank you. Thanks, Tim. The next question is from Julian.
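As an aside, the cosine-spread idea from the specificity discussion above could be sketched like this. The embeddings, the 0.9 cutoff, and the radii are all invented for illustration; real thresholds would need tuning against your own data.

```python
import numpy as np

def specificity(doc_embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among the top-N document embeddings."""
    normed = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(doc_embeddings)
    return float(sims[~np.eye(n, dtype=bool)].mean())  # off-diagonal mean

def search_radius(spec: float) -> float:
    """Specific queries get a tight radius; broad queries a wide one."""
    return 0.2 if spec > 0.9 else 0.6  # illustrative cutoff and radii

# Tight cluster = specific query (e.g. a niche job title);
# wide spread = broad query (e.g. "sales").
tight = np.array([[1.0, 0.0], [0.99, 0.1], [0.98, 0.05]])
broad = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
```

This is still querying with a point plus a radius rather than a true shape, but it captures the broad-versus-specific distinction the talk raises.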
+Can you recommend any papers or other material to explore the topic further? So, not really; the wormhole vector thing is something I kind of came up with. I will say, well, two things. One, semantic knowledge graphs: +I was actually the lead author on the original semantic knowledge graph paper, back in, I don't know, 2016 or whenever it was published. So for this notion of being able to jump between spaces back to a sparse space, you could look at that paper, +if you want an actual research paper. I've also given lots of talks about it, it's in the AI Powered Search book, and it'll be in the course. However, the notion of running a query and pooling vectors together, and even the notion of query specificity, that cosine similarity thing: +if you look at Daniel Tunkelang, he actually did a lightning talk with us a week or two ago on query understanding. He talks about this notion of a bag of documents to represent a query. It's functionally the exact same thing. Right? +So if I run a query and think of the query's meaning as being represented by the set of documents that match that query, then to take that set of documents that holds that meaning and pool the embeddings is to create an average embedding that represents that meaning in embedding space. +It's functionally the same thing that Daniel describes when he talks about bags of documents. So I would say look at Daniel's work, look at the lightning talk he gave a week or two ago with us; those are some good resources. And of course, the book and the course. Yeah, awesome. +Maybe at some point a paper as well, right? Yeah, it's definitely possible. I need a lot of good evals on how this actually does in practice. Yeah, absolutely. Arjun had the same question, so I'll skip that. Mostly, he's asking about the iPhone query case: +so cases and chargers may also appear; would it be correct to take the average of them? So, good question.
So, yeah. Lexical queries work really well when you've got particular terms you're looking for, whether it's an ID or a keyword. +They don't work as well with semantic meaning, whereas in a dense space, obviously, you query on meaning. But if you try to search for a product ID in a dense space, unless you've fine-tuned it for that, it's going to do an awful job. + And so, in the case of searching for iPhone and getting iPhone cases, this somewhat gets back to what I said earlier: ideally, you take the top N documents that are the most relevant and limit to that. If your ranking algorithm can already sort of understand that when someone searches for iPhone they mean an actual iPhone versus a case, +that's a better way to go, versus just taking anything that matches the term. That being said, what you can do is, for example, in that case, search for iPhone, find the iPhone cases along with the iPhones, and get that average vector. +And then there's still this region that, along certain dimensions, is associated with iPhone. +If you hopped over to the behavioral embedding space, what you're going to find is that, you know, hey, these cases are very highly correlated to these items, the iPhones that actually correspond to those cases. +So that might be a case where you would want to hop to the behavioral space and leverage what's there. Also, just to note, we've talked about taking entire queries and hopping between spaces. +But there's also a line of thinking and practice here around using this for query understanding, not just ranking.
+ And so you could, for example, split the query into individual keywords, you know, just the word iPhone on its own, and you could also search in the dense space, and you could try to take the individual pieces, find things related to them, and then leverage that for query understanding to hop back and forth between spaces. + Look, the answer is you still have the fundamental limitations of each space. But imagine if somebody searched for: I want a phone that's really good at blah, blah, blah, that's made by Apple, +with product ID X, right? And imagine trying to search for that like an OR query on the lexical side; you'll actually match that ID and probably have it come up at the very top. And then you can imagine searching for that embedding in the dense space. +And you could imagine, for each of those, hopping back and forth and trying to see what documents are there, a couple of times. So, short answer: there are a lot of different ways that you could leverage this technique to hop back and forth. +I'm not going to claim that I've thought through every single one of them; there are lots of ways to do it. +But I think as an introduction to the topic, and as a tool that you can add to your tool belt to get explainability in another vector space based upon what you found in the first vector space, I think this is a really cool technique. +And I just wanted to present it, get feedback, and enjoy this discussion. Thanks, Trey. We're quite over time. Thanks everyone for staying on, and hopefully, if you can still stay on, we can get to the bottom of the list. +Yeah, and by the way, to add to your answer, Trey, I think somewhere in there is probably a notion of search result diversity as well. Right? So even if the user types iPhone, they may mean only the phone, or they actually may mean something else. +Right?
So I think showing diverse results and then traversing to the other side with those diverse results could also make sense. Absolutely. Yeah. +Then Arjun is asking, if I summarize the question: is there a cheaper way than using a semantic knowledge graph? Maybe, you know, the fear is that the graph approach is computationally expensive. Is there some cheaper way, like just running an embedding model, typically? It just depends. +Yeah, I mean, there are other techniques. Like if you have a fine-tuned SPLADE model, for example, it can give you very comparable semantic understanding on the sparse side. The catch with that is you have to fine-tune it to your data. +And also, one of the benefits of the semantic knowledge graph is that, I'm just going to quickly jump to the slide to show you this one. Let me do the one that's got keywords. Here we go. +Here, with the semantic knowledge graph approach, you have the ability to not just represent the query with a bunch of terms with values; you can actually use any field. So it's really useful to be able to describe it with a category of Korean and a bunch of terms here. +And maybe you've got other fields on your documents that are really useful for describing the document, a taxonomy of some sort. The semantic knowledge graph gives you a much richer ability to turn the set of documents into a fully expressive query. So yeah, there are other techniques. +You could look at SPLADE, things like that, but nothing that's nearly as expressive. And these are the concepts you're going to cover in the course, right? Yeah, we'll cover it all in the course for those who are interested to learn more. +Sorry, not trying to make this talk just a big promo for the course, but wormhole vectors by themselves are a really interesting topic. But yeah, obviously, I would love it if you would join us in the course. It'll be fun.
+Vibrato is asking, you know, what do you think about some of the reputable search sites like Indeed or LinkedIn, where searching for a male engineer will bring you results like data engineers and, you know, not directly related stuff. +And so the question is, why search documents not based on the entire user query, right, only part of it? So I'm trying to understand the question in relation to the wormhole vector topic. Yeah, I think it's less directly related. I think it's more erring on the side of data bias. +Why do these reputable search sites not sort of utilize semantic search, you know, one to one in a way? Yeah, I got you. +I mean, the reality is that most AI-powered search algorithms, really all of them, use data, and the data is biased, right? So like the reality is, in the world, if you look at data engineering jobs, they are statistically more skewed towards males being in those jobs than females. +That doesn't mean anything in terms of who can do the job or who can't. It's just a reality that, you know, there tend to be more males in engineering, and therefore the data is reflecting that. It would be nice to be able to take those biases out. +And in fact, there's ways you can do that, but they're extra work. And so the out-of-the-box algorithms that are typically employed don't necessarily try to tackle those biases. + So yeah, I think it's valiant to, you know, try to, especially when you're dealing with things like people's livelihoods and, you know, careers and things like that. I think it's a great exercise and something they should focus on, but it's, unfortunately, kind of a reality of the underlying data that's being bubbled up, I think. +Yeah, my take is the data is biased. Yeah, I agree.
+ I've built one job search engine a couple of companies ago, and my take is that probably these companies are trying to avoid, you know, these traps where a super, super precise query will either lead to nothing or lead to just a couple of jobs on the screen, because their business is to show you as many jobs as possible so that they can monetize that. +So maybe there's a business element as well, but I'm sure there are other, like, technical aspects of this, which we should not disregard. Sure. +And then from Arjun: in your experience, how much do the following results differ: first, querying against the dense vector space directly; and second, querying in the sparse vector space, wormholing to the dense vector space, and finally getting the docs that are similar to the wormhole vector, the average vector? +Yeah, yeah. + I mean, at the end of the day, if you have a query that you run against your lexical space that matches mostly documents that are related to the query, and then you hop over to the dense space, you're typically going to get a lot of overlap, because the lexical space semantics are going to be very similar to the dense vector space semantics in terms of the underlying meaning. +If you were to take the lexical space, and I should mention you can actually use a wormhole vector in the same vector space. +I kind of showed that with taking a query like lasagna and then rewriting it as a more expanded-out lexical query with a category of Italian, and so you don't have to actually jump between different vector spaces. +You can even jump within the same vector space. And I think that in this context, the more similar the meaning of the underlying set of documents matching each query, the more you're going to be able to find interesting missing links in the other vector space.
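The sparse-to-dense round trip being compared here can be sketched with toy data. `sparse_search` below is a crude stand-in for BM25, and the three-dimensional embeddings are invented for illustration; the helper names are made up, not an actual library API:

```python
# Hypothetical sketch of the "wormhole vector" hop: run a lexical query,
# average the dense embeddings of the top hits, then use that average as a
# new query vector in the dense space. All data below is toy data.

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

# Toy corpus: doc id -> (bag of terms for the sparse side, dense embedding).
docs = {
    "d1": ({"iphone", "apple", "phone"}, [0.9, 0.1, 0.0]),
    "d2": ({"apple", "juice"},           [0.1, 0.9, 0.1]),
    "d3": ({"android", "phone"},         [0.8, 0.0, 0.2]),
}

def sparse_search(query_terms, k=2):
    """Rank docs by simple term overlap (a stand-in for BM25)."""
    return sorted(docs, key=lambda d: -len(docs[d][0] & query_terms))[:k]

def wormhole_search(query_terms, k=2):
    top = sparse_search(query_terms, k)
    hole = average([docs[d][1] for d in top])      # the "wormhole" vector
    return sorted(docs, key=lambda d: -cosine(docs[d][1], hole))

print(wormhole_search({"iphone", "phone"}))  # ['d1', 'd3', 'd2']
```

Note how `d3` (no keyword overlap beyond "phone") surfaces near the top in the dense ranking: that is the "missing links in the other space" effect being discussed.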
+ If you have very orthogonal queries, like you can imagine on the lexical side searching for, you know, orange juice AND Nintendo Switch, right? You'll get nothing for that. But orange juice OR Nintendo Switch, you'll basically end up with a document set that is really two separate document sets, right? There's not a lot of overlap. And if you hop over to the dense space and get the average of those, + there's still going to be things that are probably close to Nintendo Switch and probably close to orange juice. But the more different those things are, you might get some weird stuff in between, because you're now looking across two different places: any Nintendo Switch stuff that's orange or related to juice might show up, but it's going to be weird. And so this isn't like a magical silver bullet that solves every query understanding or every relevance problem; it's just another tool in our toolkit +to be able to better reason about the underlying documents and queries, and to explain queries in another modality, if you will, another query modality. Yeah, in other words, what you're searching for should still kind of make sense. Yeah, yeah, it does, and it probably will return some useful results. +Tips says thank you. Thank you, Tips. Rostem. Rostem asks about the impact of document segmentation: so basically, what are the suggestions to improve that so that wormhole vectors would be useful? +Yeah, I think it's common sense, the same way that you would think about chunking documents for doing RAG.
+ If the documents are too big, then there's too much loss of specificity and too much context being blurred together. And if the documents are too tiny, then you're losing context; they're too specific, and there's not enough overlap. So I think, you know, whatever your domain is, I mean, if you've got giant PDFs that are books, maybe break them into chapters, or possibly even sections of chapters if the sections are large. +But yeah, just use common sense: what's a reasonable-size document that represents the meaning of something that is, what's it called, integral, like a whole thing, a whole concept. Yeah, yeah. +It depends on the domain, but I would just say, you know, your common sense is probably going to take you far on that one. Yeah, and for long documents that are like a thousand pages, for sure you want to do that. And maybe the last question is from Arjun. +Can this idea of wormhole vectors give us more serendipitous results? Give us more what? Serendipitous results. + Yeah, absolutely. So think of just the behavioral space, right? If I run a keyword query and then I want to find other things that are related to it, that don't match the terms and maybe don't even match the meaning, but user behavior has said these are things I should suggest, I'm basically infusing recommendations. Then if I hop over to the dense space, +then I take my keywords and I'm finding other things that share meaning but don't necessarily have that keyword. + If I'm starting with dense and I hop over to the lexical side, I'm making sure that I'm finding things with that meaning, but I'm adding in keywords that were completely ignored by the dense space. That's not necessarily serendipitous, that's just fixing problems. But I would say going from lexical to semantic will more so get you
+So the things that were dismissed. But yeah, for actual serendipitous results, the behavioral space is probably going to give you a lot more magic there. Alright, awesome. I think it's a wrap. Thanks so much everyone. +Thank you so much for the presentation, for the idea, and for handling the questions with such an immense speed. Thank you all for your time. This was awesome. Thanks, Dmitry, really appreciate it. This was awesome, thanks everybody for joining. +And yeah, the video and slides and everything will be coming out to you, and I hope to see you soon. Thank you. See you soon. Bye bye. Bye. \ No newline at end of file diff --git a/transcripts/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md new file mode 100644 index 0000000..9c87276 --- /dev/null +++ b/transcripts/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md @@ -0,0 +1,101 @@ +--- +description: '

https://www.vectorpodcast.com/

I + had fun interacting with NotebookLM - mostly for self-educational purposes. I think + this tool can help by bringing an additional perspective over textual content. + It ties to what RAG (Retrieval Augmented Generation) can do to content generation + in another modality. In this case, text is used to augment the generation of a podcast + episode.

This episode is based on my blog post: https://dmitry-kan.medium.com/the-rise-fall-and-future-of-vector-databases-how-to-pick-the-one-that-lasts-6b9fbb43bbbe

Time + codes:

00:00 Intro to the topic

1:11 Dmitry''s knowledge in the space

1:54 + Unpacking the Rise & Fall idea

3:14 How attention got back to Vector DBs + for a bit

4:18 Getting practical: Dmitry''s guide for choosing the right Vector + Database

4:39 FAISS

5:34 What if you need fine-grained keyword search? + Look at Apache Lucene-based engines

6:41 Exception to the rule: Late-interaction + models

8:30 Latency and QPS: GSI APU, Vespa, Hyperspace

9:28 Strategic + approach

9:55 Cloud solutions: CosmosDB, Vertex AI, Pinecone, Weaviate Cloud

10:14 + Community voice: pgvector

10:48 Picture of the fascinating future of the field

12:23 + Question to the audience

12:44 Taking a step back: key points

13:45 + Don''t get caught up in trendy shiny new tech

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250302_080303_04f8b5f2665529faa9d13569b97a18c9.png +pub_date: Sun, 02 Mar 2025 08:27:58 GMT +title: 'Vector Databases: The Rise, Fall and Future - by NotebookLM' +url: https://rss.com/podcasts/vector-podcast/1922013 +--- + +Welcome back everybody. Today, we're gonna be diving into the world of vector databases. Ooh, fun. Their rise, their potential fall, and what the future holds. Okay. You know, you sent us this fascinating Medium article to kind of guide our exploration. Ooh. +Called The Rise, Fall, and Future of Vector Databases: How to Pick the One that Lasts, by Dmitry Kan. Yeah, I saw that one. Published January 6th, 2025. Mm-hmm. So get this: the term vector database might actually be on its way out. +Really? And your choice of database could hinge on needing things like faceted search. Oh, wow. Or even those super cool late interaction models. Huh, interesting. Intrigued? I know I am. Let's break it all down. Okay, let's do it. This is gonna be good. +What I thought was so interesting about this article is how it really blends like the technical side with the broader AI landscape. Yeah, you're right. It's not just about the nuts and bolts. It's about how perceptions and adoption of vector databases are shifting within the AI world. Absolutely. +Like this is not just a technical deep dive. Right. This is about how people are thinking about these things. Yeah. And using them. Totally. And Dmitry brings this really unique perspective to this whole conversation. He does. +Because he was like deeply involved in this emerging market just a few years back. Oh, really? He was even advising VCs on which vector database companies to back. Wow. So he's like an insider. Yeah, he's got the inside scoop. So he really saw this whole thing unfold firsthand. +He was there from the beginning. Wow, that's amazing. And it's interesting because just a few years ago, vector databases were like the hot topic.
They were everywhere, right? Everybody was talking about it. Like they were the key to unlocking all these powerful AI applications. +Like this was gonna change everything. Everyone was so excited. Yeah. Okay, so let's unpack this whole rise and fall idea. Okay. So Dmitry noticed something interesting. What's up? Fewer people were reading his early articles about vector databases. Oh, really? Huh? I wonder why. +What do you make of that? Well, you know, it kind of hints at a potential shift in the industry. Okay. Instead of being seen as these like standalone solutions, it seems like vector search technology is kind of merging with other AI advancements, becoming part of a bigger picture. +Like what? Think LLMs, multimodal search. They're all getting more integrated. Okay, so it's not that vector databases are like vanishing. Right. They're evolving. Exactly. And blending into more comprehensive solutions. That's it. Okay, I got it. The technology itself is still crucial. +But how we think about it and use it is evolving. Okay. Like, you know, you have your traditional databases, right? Like your SQL and NoSQL types. Well, a lot of them have integrated vector search capabilities now. Oh, wow. +So the data type itself is becoming normalized within these existing systems. Okay. I see. So it's becoming more commonplace. Yeah. Exactly. It's not this like niche thing anymore. Right. It's just part of the toolkit. It's becoming part of the fabric of how we work with data. +I like that, the fabric of how we work with data. It's a good way to put it, right? But here's where things get really interesting. Okay. Tell me. Dmitry saw this resurgence in views for those older articles. Oh, so people are coming back to them. +They're coming back, and guess what? What? It was right when major funding announcements hit for some vector database startups back in April 2023. Oh, interesting. Like, big money was flowing back into this space. Yeah.
Like Pinecone's $100 million Series B. Yeah. Pinecone was huge. +Weaviate securing $50 million. Huh. And Qdrant getting $7.5 million in seed funding. Yeah. Those were big headlines. They were. It really highlights how much media coverage and industry buzz can influence how we perceive technology trends. Totally. It's like a self-fulfilling prophecy almost. Yeah. +And it makes you think like how much of what we perceive as the next big thing is actually driven by, you know, the hype, the funding, the media attention. Yeah. It's fascinating. So clearly, there's still a ton of innovation and investment happening in the vector database space. +For sure. But let's get practical for our listener out there. Let's give them some actionable advice. The real gem in this article is Dmitry's guide for choosing the right vector database. Right. Because there's no one-size-fits-all solution. Exactly. It really depends on your specific needs. +It's like having a roadmap for navigating this complex landscape. Totally, a roadmap. So where does he suggest starting? Okay. I'm ready. His secret weapon is FAISS. Okay. Now, it's not technically a full-fledged database. Right. It's more of a powerful library. Okay. +But the kicker: it can handle massive data sets. Okay. We're talking over a billion vectors. Wow. That's a lot. So it can scale. It can handle the big stuff. And the beauty of FAISS is its simplicity and scalability. So it's perfect for initial exploration and prototyping. +You can just get in there and start playing around. Exactly. Yeah. But of course, uh oh, there's always a but. There's a trade-off. Okay. What is it? Built-in filtering capabilities, or the lack of them. Uh, so you can't really do that fine-grained search. Right. Like with keywords and stuff. +Which might mean getting creative with workarounds? Okay. So you've got to be a little clever. A little bit. If you want to use FAISS for certain things. Yeah.
So if you need that fine-grained, controlled keyword search. Yeah. +Along with your vector search, what does Dmitry recommend? I'm all ears. He suggests looking at databases built on top of Lucene. Lucene. Options like OpenSearch, Elasticsearch, and Apache Solr. Got it. So these are all built on this like solid foundation of Lucene technology. Yeah. +Lucene's been around for a while, right? It's a mature technology with a proven track record. Yeah. So it's reliable. Reliable. Yeah. And it provides that robust keyword search. Okay. You mentioned plus multilingual support. That's important these days. Super important. +And it performs incredibly well. So it's fast and efficient. Nice. This makes it particularly well suited for e-commerce. Oh, interesting, why e-commerce? Where features like faceting, which allows users to refine their search by specific attributes. Oh, I see. So like filtering by brand. Yeah. +Like filtering by brand. Price range. Price range, size. All those things are essential. Makes sense for e-commerce. He did mention one exception though, right? There's always an exception. What is it? Qdrant, even though it's not built on Lucene. Oh, okay. Includes faceting capabilities. +Interesting. So it's kind of a hybrid approach. Making it a contender in those scenarios too. So Qdrant's kind of a wild card. A little bit. Yeah. It's got its own unique set of features. And it shows the importance of going beyond general categories. Yeah. +And really digging into the specific features each database offers. Absolutely. You can't just like assume that because it's in one category. Right. It's got all the features you need. You got to do your research. You got to look under the hood. Exactly. +Now what if you need something more advanced? Okay. Like what? Like support for those late interaction models. Late interaction models, huh? Yeah. Have you heard of these? I've heard the term, but I'm not really sure what they are. Okay.
So imagine you're searching for the perfect pair of red shoes. +Okay. I like shoes. But only after you've seen a picture of the outfit you want them to match. Oh, I see. So like the context of the search changes. Exactly. Based on something you see later on. That's where late interaction models come in. Okay. +Allowing you to refine your search based on context that's only available later in the process. So it's like a more dynamic way of searching. It is a more dynamic way of searching. Interesting. And it requires a different level of database support. I bet. +And Dmitry points to Qdrant or Vespa as potential solutions. Okay. So they can handle those late interactions. Because they offer that support natively. So you don't have to hack it together yourself. Exactly. That's good to know. +So choosing a database that can handle those complexities is critical for performance and efficiency. You don't want your search to be slow and clunky. Especially if you're dealing with a lot of data. Right. Or if you need those results in real time. But it doesn't stop there. There's more. +The next step in Dmitry's roadmap is super important. Okay. Hit me. Considering latency. Latency, okay. And those queries-per-second demands. Oh, yes. QPS. Can make or break your application? You're telling me. If your database is slow. Yeah. Or it can't handle the volume of queries. +It's going to be a bad experience for the user. It's going to be a disaster. So you've got to think about those things up front. Absolutely. And choose a database that can handle the load. If high performance is the name of the game. Yeah. +You'll want to explore solutions like GSI APU, Vespa, or Hyperspace. Got it. In fact, Dmitry even shared an anecdote about a CTO. Oh, I love a good anecdote. Who confessed that no open-source vector database could handle their extreme workload. Wow. So they had to go with a commercial solution. +They had to find something else. That's interesting.
Choosing wisely is essential. You can't just pick the first one you see. No. You've got to do your homework. And think about your long-term needs. So the takeaway here is you need to think strategically. Yep. +Do you invest the engineering time to set up and maintain an open-source database? Right. Or do you go with the convenience and potentially higher costs of a cloud solution? Right. It's a classic trade-off. There's no right or wrong answer. It depends on your situation. +It's all about finding the balance that works best for your specific situation. Absolutely. And there are a lot of great cloud and API-based options out there. Like what? Like Cosmos DB, Vertex AI, Pinecone Cloud, Weaviate Cloud, and others. So there's no shortage of options. +There's a lot to choose from. It's a good problem to have, right? It is a good problem to have. Better than having no options at all. And we love hearing from our community. Oh yes, our listeners are the best. One reader, Matt Collins, suggested exploring extensions like pgvector. +Pgvector? Okay. Which adds vector search to PostgreSQL. Oh, so you can just add it onto your existing Postgres database. Exactly. That's pretty cool. It's a really clever solution. You don't have to rip and replace your whole infrastructure. +And it speaks to the constantly evolving nature of the vector database landscape. It's a fast-moving field. There's always something new happening, new solutions emerging. Speaking of evolution. Oh, this is where it gets really interesting. +Dmitry paints a fascinating picture of the future. I can't wait to hear this. He believes the future lies in what he calls neural search frameworks. Oh, wow. These frameworks could revolutionize how we build AI-powered applications. Okay. +Imagine a system that streamlines the entire process, from data modeling and embedding selection to evaluation and scaling.
+So instead of wrestling with the complexities of choosing and integrating all the different components, it would be like having an intelligent assistant guiding you through building a search application no matter what database technology you're using. Exactly. +And this vision ties in nicely with the concept of compound AI systems. Oh, interesting. Where LLMs, vector databases, and other AI components work together like a well-coordinated orchestra. So instead of focusing on the individual instruments, you're conducting the entire symphony. Precisely. +I love that analogy. Users can then focus on the task they're trying to solve, rather than the technical nuts and bolts. So it's about abstracting away the complexity and empowering users to focus on the bigger picture. It's about making AI more accessible and user-friendly. +It's fascinating how this all connects to those funding announcements we talked about earlier. Right. It seems like the industry might be moving towards a more unified approach to AI solutions. That's a keen observation. While individual components like vector databases are still important. +For sure. The future might be about how these pieces fit into a larger ecosystem. Yeah, it's all about the big picture. This brings us to an interesting question for you, the listener. Oh, yes. Let's get our listeners involved. +Do you see neural search frameworks as a complete paradigm shift? Or will specialized vector databases continue to have a distinct role to play? It's a tough question. It's something to think about. Let us know your thoughts in the comments. We'd love to hear from you. +But before we get too caught up in the future, let's take a step back and revisit one of Dmitry's key points about the impact of media coverage on perceptions of technology trends. Right. That was a really important point. +It's a crucial reminder to be discerning consumers of information, especially in a field as dynamic as AI, where innovation is constant.
What might seem like a decline could actually be a natural evolution. Interesting. As a technology matures and finds its place within a larger ecosystem. Right. +Like a caterpillar transforming into a butterfly. That's a wonderful analogy. It's still the same creature, just in a more advanced and beautiful form. It underscores the importance of staying curious, continuing to explore, and never assuming that any technology is truly dead. +Because it might just be evolving into something even better. Who knows what exciting developments await us in the world of vector databases? I'm definitely eager to see what the future holds. Me too. This deep dive has given me a whole new perspective. I'm sure it has for our listeners as well. +You know, as we're discussing this, it strikes me that Dmitry's journey with vector databases mirrors a broader trend in the tech world. Oh, how so? We often get caught up in the hype cycle. Oh, yeah, for sure. +But true innovation often emerges when technologies evolve and integrate in unexpected ways. It's like that saying, the whole is greater than the sum of its parts. And that brings us to another crucial point from the article. Okay. One that I think holds immense value for our listeners today. +I'm all ears. Remember how Dmitry emphasized that it's not just about the vectors themselves. It's about understanding the nuances of data pre-processing, model selection, and embedding techniques. And even knowing when to switch back to traditional keyword search for certain tasks. +Yeah, sometimes the old ways are still the best. He's advocating for a more holistic approach, where vector databases are seen as one tool among many in the AI toolbox. So it's not a silver bullet. It's not a magic solution. It's one piece of the puzzle. Exactly. +It's about a deeper understanding of the underlying principles, not just blindly applying the latest trendy technology. Right.
It's about making informed choices based on a thorough analysis of your specific needs and constraints. So don't just jump on the bandwagon. +Do your research and figure out what's right for you. So for those of you out there exploring AI solutions, don't get fixated on buzzwords. Take the time to really grasp the fundamentals. Understand the basics. Experiment with different approaches. Play around with different tools. +And don't be afraid to challenge assumptions. Question everything. And remember, the AI landscape is constantly evolving. What works best today might be superseded by something even more powerful and efficient tomorrow. So stay curious. Stay engaged. And keep learning. +Couldn't have said it better myself. Well folks, that brings us to the end of our deep dive into the world of vector databases. It's been a wild ride. We've explored their rise, their potential fall, and the exciting possibilities of neural search frameworks. We've covered a lot of ground. +We've also learned some valuable lessons about navigating the hype cycle and making informed decisions in a rapidly changing technological landscape. It's been a fascinating journey. Absolutely. And we hope you've enjoyed it as much as we have. Until next time. Keep exploring. Keep questioning. +And keep that thirst for knowledge alive. It's funny actually, while we're focused on all this cutting-edge tech, Dmitry actually kind of throws it back to basics in the article a little bit. Oh yeah, I remember that part. +He recounts this conversation he had with the chief data scientist at a major bank. That was a good one, which I thought was so interesting because it really emphasizes how even with all these advancements, sometimes the simplest solution is the best one. You don't always need the fanciest tools. +Right. Sometimes it's about using the right tool for the job. Exactly. This bank had poured resources into building this complex vector search system. Okay. But guess what? What?
They ended up getting better results with good old-fashioned keyword search. Really? For some very specific tasks. Huh. +So even the big banks are going back to basics sometimes. Sometimes it makes more sense. Yeah. It's a powerful reminder that we shouldn't dismiss those tried and true methods. Like don't throw out the baby with the bathwater. Right. Exactly. +Those tools, when used strategically, can outperform the flashiest new tech. It's all about choosing the right tool for the job. It's like trying to use a chainsaw to cut a piece of paper. Oof. Yeah, that wouldn't end well. Sometimes a simple pair of scissors does the job better. Much better. +And that brings us back to Dmitry's vision of neural search frameworks. Okay. If they become a reality. Yeah. Could they simplify these choices for us? Interesting question. +Would they be able to determine the best approach? Like whether it's vector search, keyword search, or a hybrid, based on the task at hand. That's the dream. That would be amazing. It would be like having this AI assistant that just knows the best way to search for anything. That's a fantastic question. +It really gets to the heart of what these frameworks could potentially offer. Like a more intelligent and adaptable approach to search. So instead of us having to figure it all out. Yeah. The system would just do it for us. +A vision of a system that not only combines different AI components, but also understands which tool is best suited for each specific task. That would be a game changer. It would free us from getting bogged down in the technical details and allow us to focus on solving real-world problems. +It's all about making AI work for us. Not the other way around. Exactly. And it highlights the importance of staying flexible and open to new possibilities. As the AI landscape continues to evolve, we need to be willing to adapt and embrace new approaches. +Even if they challenge existing assumptions. Challenge the status quo. Right.
Well said. Well, this deep dive has been a whirlwind of information. It has. But I think the biggest takeaway for me is that the world of vector databases, and AI in general, is anything but static. +It's constantly changing. It's a constantly evolving landscape full of exciting possibilities and unexpected twists and turns. You never know what's going to come next. And that's what makes it so exciting. I completely agree. And it's a landscape that requires us to stay curious. Keep learning. +And never stop asking questions. And on that note, we'd love to hear from you, our listeners. Yes, please share your thoughts. What are your thoughts on the future of vector databases? Hmm. +Do you think neural search frameworks will revolutionize the way we build AI applications? That's the big question. Share your insights, your predictions, and your questions. We're all ears. We're always eager to continue the conversation. +Until next time, keep exploring, keep innovating, and keep diving deep into the fascinating world of AI. That's a wrap on this deep dive, folks. We hope you've enjoyed the journey as much as we have. \ No newline at end of file diff --git a/transcripts/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md new file mode 100644 index 0000000..be99ac6 --- /dev/null +++ b/transcripts/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md @@ -0,0 +1,215 @@ +--- +description: '

00:00 Introduction

01:11 Yaniv’s background and intro to Searchium + & GSI

04:12 Ways to consume the APU acceleration for vector search

05:39 + Power consumption dimension in vector search

7:40 Place of the platform in + terms of applications, use cases and developer experience

12:06 Advantages + of APU Vector Search Plugins for Elasticsearch and OpenSearch compared to their + own implementations

17:54 Everyone needs to save: the economic profile of + the APU solution

20:51 Features and ANN algorithms in the solution

24:23 + Consumers most interested in dedicated hardware for vector search vs SaaS

27:08 + Vector Database or a relevance oriented application?

33:51 Where to go with + vector search?

42:38 How Vector Search fits into Search

48:58 Role of + the human in the AI loop

58:05 The missing bit in the AI/ML/Search space

1:06:37 + Magical WHY question

1:09:54 Announcements

- Searchium vector search: https://searchium.ai/

- Dr. Avidan Akerib, founder behind the APU technology: https://www.linkedin.com/in/avidan-akerib-phd-bbb35b12/

- OpenSearch benchmark for performance tuning: https://betterprogramming.pub/tired-of-troubleshooting-idle-search-resources-use-opensearch-benchmark-for-performance-tuning-d4277c9f724

- APU KNN plugin for OpenSearch: https://towardsdatascience.com/bolster-opensearch-performance-with-5-simple-steps-ca7d21234f6b

- Multilingual and Multimodal Search with Hardware Acceleration: https://blog.muves.io/multilingual-and-multimodal-vector-search-with-hardware-acceleration-2091a825de78

- Muves talk at Berlin Buzzwords, where we have utilized GSI APU: https://blog.muves.io/muves-at-berlin-buzzwords-2022-3150eef01c4

- Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

Episode on YouTube: https://youtu.be/EerdWRPuqd4

Podcast design: Saurabh Rai: https://twitter.com/srvbhr

'
image_url: https://media.rss.com/vector-podcast/ep_cover_20221221_081200_2671d7352871b25bd4959821449e1a69.jpg
pub_date: Wed, 21 Dec 2022 20:35:43 GMT
title: Yaniv Vaknin - Director of Product, Searchium - Hardware accelerated vector search
url: https://rss.com/podcasts/vector-podcast/752549
---

Hello, Vector Podcast is here. We continue to roll in season 2 of this podcast. Today I have a very interesting guest, Yaniv Vaknin, who is the Director of Product at Searchium. If you've read my blog post "Not All Vector Databases Are Made Equal", one of the vector databases, or rather technologies, stood out. It's a technology made by GSI Technology, and it implements hardware for vector search. It's very rare that this approach exists on the market today. And I'm super excited to talk to Yaniv Vaknin. How are you, Yaniv?

Hey, great. Thanks for having me, Dmitry.

Yeah, I'm really glad you joined and found time in your busy schedule. So can you first explain how Searchium and GSI are related? And maybe at the same time, if you could introduce yourself and talk about your background and how you got here.

Yeah. So maybe I will start with a quick introduction. I'm Director of Product at Searchium AI. Searchium AI is a SaaS platform for ML search applications, based on a purpose-built AI chip for search applications. Prior to this role, I worked at AWS as a machine learning specialist, where I worked with a broad spectrum of top-tier tech companies, trying to help them in their machine learning domain. And I was super excited by the fifth revolution, the AI revolution, with cool stuff in NLP, search, unstructured data, structured data. I've worked with various companies: cyber, fintech, e-commerce, etc. I was co-founder and CEO of DeepSea AI, which was the first computer-vision-based system for open water drowning detection.
So we are the SaaS solution of GSI. GSI acquired an Israeli startup a few years ago, and the founder is Dr. Avidan Akerib, who is one of the smartest guys I have ever met. During his PhD, he invented a new concept. Traditionally, the CPU communicates with the memory, and then you have challenges of bottlenecks, I/O, etc. But with the new concept, you build a memory that is the processor, so all of the computation happens inside the memory. You can guess that when you're running heavy or memory-intensive applications, if all of the processing happens inside the memory, you can get single-digit-millisecond latency. Yeah, so GSI acquired the Israeli startup, and we are based in Israel. We have an R&D team of approximately 40 to 50 people. In order to scale it, we started Searchium AI, because it's super hard to scale hardware. So today we are offering this unique hardware on the cloud. It can be AWS, GCP, or any other cloud, and customers can consume it as a SaaS platform.

Yeah, makes sense. But there is still an option, if I want a completely on-premise setup, right? In principle, I could buy the APU cards, APU being an Associative Processing Unit, and install them, similar to what I would do with a GPU. Is that right?

Yeah, yeah, totally. So there are two types of implementation: the first one is on-prem and the second is via the cloud. There are various configurations, and for customers that would like to consume it as an on-prem solution, there are various capabilities. One of the major things about this hardware accelerator is the power consumption. Comparing it to a CPU or GPU, it can be 5% or 10% of the power consumption. So for companies that are running heavy workloads on GPU and CPU, a big part of the total cost of ownership is the power consumption, among other factors.
So on-prem customers can reduce the infrastructure cost in terms of the total cost of ownership, power consumption, etc.

Yeah, this is really cool. And it's not very often that we mention power consumption as one of the dimensions on this podcast. I mean, it's crucial, of course, for the planet and also for the electricity bill, with electricity costs skyrocketing, you know, so I think it's quite important. I was just alluding to the fact that you should not skip it when assessing a system or a vector search solution, right? Not only focusing entirely on the offering itself; you should still worry about how it will scale in different dimensions. I'm glad you guys also worry about the power consumption part.

Yeah, low carbon footprint is a major issue right now, especially in Europe. Usually, when developers are launching AWS instances, they choose by parameters like virtual CPUs, RAM, etc., and they are not aware of the carbon footprint. But when you are running it on-prem, this is a major parameter, a key parameter for deciding what is the right platform or the right hardware for you. So I totally agree with you, we should take it into consideration, and I assume for cloud providers like AWS, GCP, and Azure this can be critical; we are in conversation with some of the cloud providers that I mentioned.

Yeah, this sounds great. If we move a little bit closer to the algorithm side: this is a kind of dedicated hardware, and as far as I understood, also based on our Berlin Buzzwords presentation, this hardware can support not only vector search but some other scenarios, right? Like image-processing-related tasks.
So is there any kind of constraint on what type of vector search algorithm you can implement, or does it not have any constraint?

Yeah. So I think that the biggest challenge today is this: you can develop state-of-the-art hardware, but the major challenge is how you integrate it with the community. NVIDIA has done it very well with CUDA, and it should be part of the ecosystem. So in terms of applications here, we have another application for image processing; it is based on, say, satellite images and radar images, and we can process them faster by a few orders of magnitude compared to an NVIDIA GPU. In the past we had a few other applications, for genomes and molecules, and today we would like to focus on the biggest challenges. I believe that, and we can elaborate about it later on, search is still broken, and this is a huge market, so our focus right now is on search. We can actually expand to other solutions as well, like image processing, and we already have a solution and a customer for this solution. And one of our efforts is to build an ecosystem around this. Hopefully soon we will launch our Python compiler, so developers can write their code in Python and then run it seamlessly on our APU without trying to learn a new framework or a new language. This is another direction we are working on. I think one of the biggest challenges today is simplifying the technological stack for developers, so they keep working with the common frameworks or languages; they would like to stay with the languages that are familiar, and the learning curve is not always easy.
They don't always have time to learn a new framework, so we are trying to simplify the integration with their current stack. One of our solutions is a plugin on top of Elasticsearch and OpenSearch, which are offering vector search themselves today, and we can talk about it. We have a plugin on top of these search applications, because some of the customers would like to stay with their current Elasticsearch or OpenSearch, so we built a plugin on top of it. And we are talking with search engines and vector databases in order to integrate our solution with their solutions. In terms of the landscape, we do not perceive the vector search engines and vector databases as competitors. My perception is that they are potential partners, better together, to give greater value to their customers, reducing the infrastructure cost without sacrificing accuracy. So yeah, we are trying to be part of the ecosystem and to help customers scale and improve their search applications.

This is interesting, you touched on being like a competitor to vector databases. I think it's an interesting topic in general, because on one hand, if you take all the vector database players, they probably look at each other as competitors, but at the same time, as all of you players share the approach, the documentation, how you think about yourselves, I think it also helps the whole market cumulatively. I also wanted to drill a bit into this Elasticsearch and OpenSearch plugin. Essentially, the Elasticsearch team has been busy recently, and I think they released some updates in version 8.5 where you can do things like hybrid search, right? But this is all inside Java, so it kind of runs in the same JVM, right?
The approach that you guys have implemented is basically a search backend, right, which runs somewhere else, let's say if we're using the cloud offering, but at the same time it feels sort of native to Elasticsearch, so I don't need to do much, right? I just need to install the plugin; of course, I need to have credentials. And what I wanted to say is that it feels like you expand the capabilities of Elasticsearch beyond what it offers, in a way that you can actually move the load of vector search away from it to another backend, right? Can you talk a bit more about the unit cost, this kind of unit economics, and the advantages of the approach that you have implemented?

Yeah, this is a great point, actually. We are trying to decouple storage and compute. Let's say, for instance, a customer with Elasticsearch or OpenSearch has tens of clusters, and they would like to scale it and optimize it. We are running on top of Elasticsearch, and our solution is kind of the compute for Elasticsearch, so they can run, scale, and reduce the infrastructure cost. Because all of this is a question of how many machines you run, okay? You can get like 99.9 recall, or accuracy, and you can get single-digit-millisecond latency, but in terms of the infrastructure costs, one of the biggest challenges today for enterprises is the low margins due to infrastructure-heavy applications. If you are running GPUs on the cloud, or heavy machines, big machines with high memory, it's great in terms of the dev team, we are getting great performance, high recall, but again, then you're moving on and discussing the business side.
So we're having a lot of things going on on the business side. In terms of the margins and the profit of the companies, today it's a big issue, you know, with companies that have the challenge of being profitable. So we are trying, and we have added a few benchmarks, we are trying to reduce the infrastructure costs, so instead of 10 machines it can be two machines plus our accelerator, our APU, and with that you can scale it. And one other interesting thing: many companies are talking about the scale challenge, about the one-billion-scale challenge. And data is exploding, right? Today there are 80 zettabytes, and 10 years ago it was like 16, so essentially data is growing very fast, and I assume that in the next couple of years it will grow exponentially. And 90% of this data, the data that is created every year, is unstructured, so this is the cliche of finding a needle in a haystack. I assume more and more companies will face the scale challenge, like above one billion, and I know that this is a challenge for some of the search engine companies, you know, scaling to hundreds of millions and billions. I had a conversation with one of the biggest e-commerce companies in Asia, and they told me, yeah, our challenge is to scale; they have like two billion items in the index, and again, the infrastructure cost is a major issue. I read a post by Amazon's CFO like a week ago, and their focus right now is reducing the infrastructure cost for their customers. Any solution that can reduce the infrastructure cost for enterprises, I think, is a major issue not only for R&D teams but for business decision-makers in enterprises.

I will pull up that link so we can also include it in the show notes. Some of our listeners, by the way, find it quite educational to have all these additional links and study materials, and I think we can also include that. That's super cool.
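As a side note on the Elasticsearch/OpenSearch plugin discussed a moment ago: for readers who have not seen one, this is roughly what a standard OpenSearch k-NN query body looks like. A minimal sketch in Python; the field name `embedding` and the example vector are hypothetical, and the exact query surface of the APU plugin may differ from the stock k-NN plugin.

```python
# Build the JSON body for a standard OpenSearch k-NN search request.
# Field name and vector values are made up for illustration.

def knn_query(field, vector, k=10):
    """Return a dict shaped like an OpenSearch k-NN query body."""
    return {
        "size": k,
        "query": {
            "knn": {
                field: {
                    "vector": vector,  # the query embedding
                    "k": k,            # number of nearest neighbours to return
                }
            }
        },
    }

body = knn_query("embedding", [0.1, 0.2, 0.3], k=5)
# This dict would be POSTed to an index's /_search endpoint on a cluster
# with the k-NN plugin installed (e.g. via opensearch-py or requests).
```

The appeal of the plugin approach described in the episode is precisely that the client-side query stays in this familiar shape while the vector computation is offloaded to a different backend.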
So in a way, your challenge is that, as you said, there are low margins for these big players, everyone tries to stay profitable. So your challenge is to not only fit into that narrow window but also be profitable yourself, right, as you provide that acceleration. What do you think, where do you stand today on that? Do you think there is a lot still to do, or do you think it's already something that companies can try?

Yeah. So today we have the first generation of our AI chip, the APU, and there is potential for improving our hardware and the bill of materials of our hardware. Generally speaking, next year we launch our second generation. For instance, if today, in terms of performance, we are talking about single- to double-digit-millisecond latency, next year we will launch our second generation, and it will be more than 10x faster. So I think we are just scratching the tip of the iceberg. I think that the hardware challenge is solved, but every week we have a new implementation improving our performance on the software layer. We have a few layers: there is the hardware layer, I spoke about it, the first generation and the second generation, and I believe that there is huge potential in optimizing our software layer, because we are trying to reinvent search. I think there's huge potential on the hardware side, but we didn't even start to optimize our software performance. Recently we found a new implementation to reduce the latency by 40%, it was like two weeks ago, so hopefully we will launch it to production in the upcoming weeks. In terms of your question, yeah, I think we are just at the beginning, and I believe that we can optimize both the hardware and the software layer, and hopefully it will be very profitable.
Sounds great. I mean, in general, since I have had exposure to it, as we implemented the image search demo, it was quite interesting how easy it was to set it up, right? You don't need to worry about the hardware part. Yeah, it acts a little bit like a black box, but on the other hand, it's very scalable. And you guys also have, I will make sure to link this, you also published what is called the neural hashing algorithm, right, which is one of the algorithms that you have implemented; it would also be cool to drill into that direction. But in general, it was fairly straightforward how we upload the data, how it gets indexed, and then how we can query. I was just thinking to take it a little bit deeper: can you talk to some of the features? Many of the vector database players say: why do you need vector databases? Because, first of all, if you took Faiss, for example, a similar framework, you wouldn't have the filter support, right? And of course, in a real application, like a search app, you do need filters alongside whatever retriever you implement, keyword or vector. So can you talk a bit more about features and maybe also touch on the algorithms that you guys have implemented?

Yeah, yeah. So there are various types of features and implementations. We are working with the common algorithms: it can be flat search, for applications like face recognition, where you need to search every record; we have an implementation of IVF and a new implementation of HNSW on our APU, pre-filtering, and other features. One of the areas that we would like to focus on, as you mentioned, is to simplify, so you can work with it as a black box, install the plugin, and work with your technological stack and with your search application, either Elasticsearch, OpenSearch, or a vector search engine or vector database.
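Two of the ideas just mentioned, "flat" (exact, scan-every-record) search and pre-filtering, can be shown in a toy sketch. All documents and the 3-dimensional vectors below are made up; at scale, real systems replace the full scan with an ANN index such as IVF or HNSW, but the filtering-before-scoring idea is the same.

```python
# Toy flat (exact) k-nearest-neighbour search with a metadata pre-filter.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

docs = [
    {"id": 1, "price": 80,  "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "price": 120, "vec": [0.8, 0.2, 0.1]},
    {"id": 3, "price": 60,  "vec": [0.1, 0.9, 0.2]},
]

def knn(query_vec, k=2, max_price=None):
    # Pre-filter: drop non-matching docs *before* the vector scan.
    pool = [d for d in docs if max_price is None or d["price"] <= max_price]
    # Flat search: score every remaining record exactly, best first.
    pool.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in pool[:k]]

print(knn([1.0, 0.0, 0.0], k=2))                 # exact top-2 over all docs
print(knn([1.0, 0.0, 0.0], k=2, max_price=100))  # only docs priced <= 100
```

This also illustrates why filters matter in practice: the "red and white dress below $100" style of query discussed later in the episode is exactly a vector query plus a structured pre-filter.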
Pre-filtering, as you mentioned, is supported, and I think that we should focus on simplifying. This is our biggest challenge: simplifying the work with our platform and creating more integrations and more connectors, not on the feature level, but in terms of working with the ecosystem. This is our main focus right now, and again, improving the performance, because we are customer-obsessed, and we would like our customers to get the lowest infrastructure cost without sacrificing latency and, sorry, the accuracy. Recall.

Yeah, that makes sense, especially doing this at scale, right? I know that some of the players say it's very rare that there are clients with more than, you know, dozens of millions of items, right? But today you already mentioned that there are clients which have more than a billion items, maybe more than two billion items. So do you think that going forward you will see more of this second type of players, with more data, or do you think that there is still a use for dedicated hardware for these kinds of smaller-scale players?

Yeah, absolutely, I agree with you. I think that in terms of the scale challenge, we are working with customers, and some of them, as you mentioned, have tens of billions, but moving forward, I think most of the enterprises and big companies will scale to one billion, 10 billion, and maybe even more. In terms of the ecosystem, my two cents is that companies are still using the concept of keyword search for some applications: TF-IDF, BM25. For some applications it's a good solution, and you know, you don't need a hammer for a screwdriver problem, right? So for some use cases keyword search is a good fit, and this is part of the concept of hybrid search. So I think we are still at the beginning of vector search, if I may call it the vector search revolution, where you can have the
x2vec concept, like any unstructured data. Usually we are talking about text, but there are broad areas that we could develop some cool stuff for, as I mentioned, genomes. We have a video search, we have a notebook, a website notebook with video search, and again, there's a broad spectrum of applications where companies can develop some cool stuff, and we are excited to see entities and startups that are developing applications on top of these vector search applications.

Yeah, you touch on that topic, by the way, which I also spoke to, to some extent, at the Haystack conference in Berlin, where I gave a keynote; I will also make sure to give the link. So I think there was a wonderful thesis there that we should stop calling it vector search, and I'm really interested to hear your thoughts on that. Because in principle, you and I being product managers, if we think about some problem to solve, right, let's say we want to introduce, I don't know, a question answering component in our search engine, we probably wouldn't say, oh, I know how to solve it, it's vector search. Instead, the thesis was: let's call it a relevance application, right, or a relevance-oriented application. What's your broad take on this? You touched on this as well: people are not yet aware of this revolution; it's probably already happening, but people don't know what to do with it, right? Just yesterday I saw one user saying, can you actually explain what I can do with it, right? So do you think that the world, let's say the world of software development, is still awakening to this new field?

Yeah, absolutely, I fully agree with you. Essentially, when I'm talking with developers and I'm saying we are working on vector search, they are asking: vector what?
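The "vector what?" question is usually answered with word embeddings: words become vectors, and relationships become arithmetic. A toy sketch of that idea, with hand-made 2-dimensional vectors (dimension 0 is roughly "royalty", dimension 1 roughly "gender"); real embeddings have hundreds of dimensions and are learned from data, not written by hand.

```python
# Hand-made toy "embeddings" to illustrate analogy arithmetic on vectors.
emb = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def nearest(vec):
    # Closest vocabulary word to the given vector (squared Euclidean distance).
    return min(emb, key=lambda w: sum((x - y) ** 2 for x, y in zip(emb[w], vec)))

result = add(sub(emb["king"], emb["man"]), emb["woman"])
print(nearest(result))  # the analogy king - man + woman lands on "queen"
```

This is exactly the king/queen intuition Yaniv uses in his answer below, reduced to runnable arithmetic.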
And I think that, for most of the developers, and this is one of our challenges, we need to democratize AI and machine learning. In terms of technology, my perspective is that technology is an enabler: if the best solution is vector search, great, it can outperform on various applications. But from a product perspective, you are trying to create value. I think the first lesson of product is to create value for your customer, that's it, simple as that, and what the technology is, what is under the hood, what is inside the black box, it really doesn't matter. In terms of technology, yeah, it's a crazy time for developers: the AI and machine learning revolution, Stable Diffusion, generative AI, and I've heard that OpenAI is planning to launch the new GPT-4. The pace of innovation is totally crazy, and it's really hard to keep it simple, to simplify it, when people are asking you. You know, there's the grandmother test for startups: explain your idea in plain English, and it's super challenging to simplify it. So when developers or companies ask what vector search is, I'm using the example of transforming words, in the case of text, into numbers. It's easy for us to compare numbers, right? We know that three is close, or similar, to four, right? But what is the connection between king and queen? Okay, so how do you represent it as a number? And again, I'm trying to super-simplify, if you are trying to build an equation, what is the connection between king and queen? You can say
king minus man plus woman equals queen. So you are trying to represent it as numbers, and this is the concept of a vector: you are representing unstructured data, and it can also be with image embeddings, etc. And then, I think, for most of the tech companies today, their core technology is search. Okay, if you are looking for a movie, it's Netflix. If you would like to hear something cool, or your podcast, you are running a query on Spotify, "vector podcast", and you will get Dmitry's podcast. Or you would like to buy a dress, and you are trying to do it very simply. Let's take, for instance, e-commerce, right? Most consumers don't have the time and the patience to run SQL queries, filter this, filter that; they would like to write it in simple English or in a different language, okay? So let's take for example a girl in Asia: she would like to purchase a red and white short-sleeve dress. Up until the vector search revolution, she didn't have the option to do it, so usually she would get similar results, but it would not always be a red and white short-sleeve dress. And what about the challenge of the language, okay? If her English is not so great and she would like to purchase something on Amazon, eBay, or any other e-commerce site, there's the challenge of language. So essentially, vector search is breaking the barrier of the language and the barrier of understanding what your question or your query is. And there's a broad discussion about democratizing AI, what is the added value of AI; you have autonomous cars and this is great, but breaking barriers, the language barrier, with multilingual models and some other cool stuff, this is, I think, something that is doing real good for the ecosystem and for consumers and the people that
have a barrier of language. This is a great example of the added value of vector search.

Yeah, I agree. I mean, all of the examples that you brought up: if you look at how you would tackle, I don't know, the red short-sleeve dress with the more traditional approach, I guess you would need to build some kind of query understanding system. But even then, even after you build it, let's say you will run filters on your data, right? But that also means you do need to have the filters; what if you don't have them, if you don't have the values in those fields in your documents, right? And this is, by the way, not unusual: I used to oversee a project in the e-commerce space where we would get data from new providers all the time, right? One of the issues was to map them back to our ontology, but at the same time, they would miss a lot of field values, right? So what would you put there? They give you some description, and then they give you an image or a set of images. So with the more traditional approach to search, keyword search, you're kind of stuck, right? What would you do there? And I guess, of course, people do solve it in some way, but instead you could just apply vector search, right? And even though I say "just", there is still some challenge, for example with model fine-tuning and things like that. Can you talk a bit more to this, maybe the new challenges that this field opens? Of course, it gives us
opportunities, it gives us advantages, it solves some painstaking issues that we had before, but what do we need to focus on going forward, once we deploy such systems, beyond only the hardware part, also on the algorithm side?

Yeah, you know, this is a great question, because it resonates with one of your recent blog posts, where you published Google's research about e-commerce companies losing 300 billion dollars due to search abandonment in the US only. And again, this is a crazy number, because if I would like to buy a green polo shirt, and I really want to buy a green polo shirt, and the e-commerce site has this green polo shirt in the warehouse, in the inventory, and they can't find the match, this is the huge challenge. And again, this is just one example, but our mission is to break this barrier for developers, and it's not only e-commerce. Expanding it to searching logs, okay: if you would like to find an anomaly, or you would like to understand what the root cause is when you have software system logs, or even fintech, e-commerce, and other areas, I think that there's some cool stuff over there. One way to move forward is, let's take for instance Siri: I would like to buy with your audio, right? "I would like to buy a red and white short-sleeve dress below $100." Okay, this is a simple thing for consumers, but technology-wise, this is the huge challenge. The first challenge is to convert the audio to text, and today you can convert it directly to vectors, and then you can run this query, but again, you need to filter, because if you want
something that is below $100, usually it's the price field. So I think this is the biggest challenge: that consumers, or people, can communicate in a natural way with the computer, with audio, and say it very simply, without trying to run complicated SQL queries, etc. So I think this is the holy grail of machine learning: to process this query and, when you would like to purchase on a certain website, it will give you the place-order page, you will get all of the details, you will see the type of the dress, and it will give you the right result, and it will be below $100. I think this is the way, or this is a direction, that we can move forward with this technology.

Yeah, that sounds great. So in principle, so that our listeners, especially new ones, will understand: vector search really opens doors to new types of data, right, new modalities, as they say. Previously it was maybe only the text modality. Even if you saw pictures on the monitor or on your phone as a response to your query, it doesn't necessarily mean that that query was really grasping the best parts of that image, like it would actually understand what is in the image. But with vector search you can also implement that, for example using the CLIP model or some other model, where you can really infer meaning from that picture, right? And what you are saying is that in the future, and maybe this is to some extent happening already, we can also cross modalities between voice and text, right? So what I'm saying can be represented as a vector, and then you can find an image or find a video, right? There are a lot of applications.
Yeah, totally, exactly. And you know, if you are working with your Instagram and you found a nice celeb who is wearing a nice dress and you would like to buy something similar, with image search you can find it: find me this dress, or the most relevant dress, the closest example of this dress. There are various options; this is just one example of how to monetize Instagram or TikTok, where consumers can watch their favorite celeb that they are following, and if they see something: this is great, I want to purchase it. In terms of monetization and the added value for the customer, take these platforms as an "anycommerce" platform, okay? This is a fresh concept, but this is a way for companies to monetize the platform: it's not just social media, it can be e-commerce, and it can be super simple. Because up until now, they've seen a nice dress or a nice shirt, but they cannot do anything with it, they cannot purchase it, they don't know how to explain to the machine or the computer what type of clothing they would like to buy. So yeah, there are various options, and I'm eager to see what applications developers and entrepreneurs will develop with this technology.

Yep, that sounds great. One of the apps that you just reminded me of: I think it was James Briggs who built a simple demo using the recent model called Whisper from OpenAI. On YouTube today, how you find things is mostly based on titles, I believe; this is what people type.
+ But then he built a demo where you can land on the precise timecode which contains the answer to your question. You know, that could be really interesting. Just to think about it: it unlocks even more of what you said in the beginning, like we have these zettabytes of data and so on, but we are not able to unlock the data, right? It's just sitting there, waiting to be discovered, so to say.
+ Yeah, it's really cool. I wanted to spend a bit of time on the search topic itself. So you did mention this search abandonment issue, which is in e-commerce, but in general, if we think about the search field
+ on a much larger scale. And I think Daniel Tunkelang also said about it that when the search engine doesn't work you are blamed, but when it does work you don't hear anything. It's like people take it for granted, kind of like water from the tap, I guess, if that's the right analogy. So what do you think of the search field in general? Where do you think the vector search field fits in, and what's the role of this hybrid approach,
+ where you have keywords, which are more familiar to users, versus vector search? Where would you take this yourself, as a product manager with unlimited resources, where would you go?
+ Yeah, this is an interesting question. So I think that search is still an unsolved problem, and in order to find the right object, or the most accurate type of data, we still have a lot of work to do to develop this ecosystem, you know, to build the multimodal and multilingual models. And I think that the big tech companies are doing a great job with this task.
+ Like OpenAI and the other folks. And hybrid search is a very interesting concept; I believe that for some applications it can be a good way to solve their challenges. But I think one of the most important things, and again this is something I've learned, is the concept of working backwards from the customer. If we have a discussion with a customer, we are asking: what is the problem that you would like to solve? And you should be focused on what the problem is that they would like to solve for more than 50% of your discussion.
+ And if you don't have a good fit, it's not a good fit. If we see that the vector search technology is not a good fit, we will say it to the customer. We are not trying, you know, to force our way into a space where keyword search is a great solution. So I think the focus should be around the problem space: trying to figure out what is their pain point, or their customers' pain point. Is it the accuracy, for some applications?
+ We spoke with a fraud detection company, and for their use case keyword search was a good enough solution. Great, so go ahead, we don't want to disturb you. So I think the focus should be around the problem and the challenge, and then what is the potential of the solution. And sometimes we are talking about recall: is it the most important parameter? For some of the customers 90% is good enough for the use case, but for mission-critical applications
+ sorry, it should be 99.
+99%, right? So I think, to some extent, it's a question of what the problem is and what the KPI is that you would like to achieve. Would you like to optimize the recall? Great, we would optimize it. Or would you like to reduce the infrastructure cost with the same KPI, a recall of X? And you have a latency of X and it's good enough, or maybe the target can be latency.
+ So for instance, Amazon published research that every 100 milliseconds of latency equals 1% of the revenue. So if the revenue is $1 billion, then 100 milliseconds of latency equals $10 million. This is a huge impact for companies.
+ So I think the main question is: what is the problem that you would like to solve, what is the pain point? Start from the customer and then work backwards to find out if you have a good solution, and if the solution is a good fit.
+ And there are various concepts: keyword search is a great solution, vector search is a great solution, and AI search is a good solution. I think the biggest question is what the problem is that the customer would like to solve.
+ Yeah, I think you put it really brilliantly, because it's very easy to get into the minutiae of tweaking things on the software side and saying "I have the best algorithm", right, or "I have the fastest", or whatever.
+ But then you may have forgotten, I guess, the most important dimension for your customer. Maybe it's power consumption, which we mentioned previously, or something else, right? But also what you said:
+ how you can think the way Amazon did it, right? They think big. They say, okay, of all these dollars we earned, how much did we actually waste on, you know, latency, and also how many clients, or potential clients, did we lose? Because if the server doesn't respond soon enough, then...
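The back-of-the-envelope arithmetic above can be written out explicitly. This is a minimal sketch; the 1%-per-100-ms figure is the rule of thumb quoted in the conversation, assumed linear for small latencies, not an exact law:

```python
def latency_revenue_impact(annual_revenue: float, added_latency_ms: float,
                           pct_loss_per_100ms: float = 1.0) -> float:
    """Estimate yearly revenue lost for a given amount of added latency.

    Uses the rule of thumb quoted above: every 100 ms of extra latency
    costs about 1% of revenue.
    """
    loss_fraction = (added_latency_ms / 100.0) * (pct_loss_per_100ms / 100.0)
    return annual_revenue * loss_fraction

# $1B revenue and 100 ms of extra latency -> about $10M lost per year.
print(latency_revenue_impact(1_000_000_000, 100))  # 10000000.0
```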
+ And it's only an average, right? 100 milliseconds, but for some it may look more like closer to a second, including their own internet connection, and they might just give up and say: ah, this is not working today, I will go and check something else, I will forget about what I wanted to buy. So, right.
+ Yeah, this is very interesting. Also, you brought up, somewhat behind the scenes, the topic of the role of the human in this whole loop. I also want to pick your brain on that. You know, there is one direction in AI saying this is going to be a wholly automatic thing, you don't need to do anything, it will decide for you, which is also, by the way, a little bit worrisome, if the AI is going to decide everything.
+ But even coming back to earth: where do you see humans playing a role? In some sense we are slower than machines; in some sense I think we are still faster, for example, in creating things, though even there the machines are tapping in. And connected with MLOps topics, machine learning operations, and connected with
+ bias in the data that we collect to train models, or maybe some other dimension that I'm missing where you think the human is going to play a role, can you expand a little bit on that? Yeah, so actually I wrote on Medium about the MLOps challenge and the human in the loop, and what the place of the human in the loop is. Essentially, I believe that machine learning is a decision support system.
+ Okay, I believe that the human has a huge, or a significant, role in helping the machine to decide, and a good way to automate processes is to use the machine and to set a threshold, okay? So for instance, if we are talking about a cybersecurity challenge, you can decide that a threshold below 0.
+7 is good enough, and above that the SOC teams will check the anomaly. Then again, you are reducing the manpower cost, because you are automating, and you are sending queries, or a stream of data, to analysts who will, you know, fine-tune the model, and then you can create a learning model, right? So it's a human in the loop.
+ It's a human in the loop: the human is giving feedback to the model, and then you can detect data drift. If it's not automated, you know, there are solutions out there that are good for that, etc., but again:
+ my two cents is that we are not ready yet for fully automated systems. And then again, not all of the anomalies will be checked by a human, because you have the false positive fatigue, the alert fatigue, in the cyber domain. So I believe in a combination, or a hybrid model, where you can define a certain threshold and send cases to a human to run a sanity check. And you know, I've worked with many data scientists, and they always like to improve the state-of-the-art model and improve the F1 score from 99 to 99.9. But what is the impact on the business? Is it important enough for the business to invest resources in this research or not? Like, five data scientists running and testing and optimizing the hyperparameters.
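The threshold-based routing described above can be sketched as a tiny triage function. This is a hedged illustration with made-up scores and a made-up threshold, not any particular product's API: anomalies the model scores below the threshold are auto-closed, the rest are queued for a SOC analyst, and the analyst's verdicts can later feed back into retraining.

```python
def triage(anomaly_score: float, threshold: float = 0.7) -> str:
    """Route a model's anomaly score: auto-handle low scores, escalate the rest."""
    return "auto_close" if anomaly_score < threshold else "human_review"

# Analyst verdicts on escalated cases become labeled data for retraining,
# which is the "human in the loop" feedback described above.
feedback_queue = []

def record_verdict(event_id: str, verdict: str) -> None:
    feedback_queue.append((event_id, verdict))

print(triage(0.35))  # auto_close
print(triage(0.92))  # human_review
```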
+ For months. But what is the impact on the business? So essentially, I believe, and again this resonates with the search domain, that the smart thing for companies is to integrate a human-in-the-loop mechanism where they can measure KPIs, like how many clicks land on the first result, or any other KPIs. And then, if it's a good model, great, we should keep it; you know, if it's not broken, don't touch it. But if something in the mechanism is not working, or there's a drift in the data, we can research it again and find the root cause, and then a human will detect it, or a machine will detect it. So I believe this is a question of layers: you have the machine learning
+ layer, and then MLOps tools, like AutoML and hyperparameter optimization and data drift and model drift and other tools. But I don't think we are ready to fully automate the whole process. And yeah, this is a great question; for instance, autonomous cars: are we ready yet or not? I think this is the challenge for the data science ecosystem in the next years.
+ Yeah, I think it's also about our psychological readiness to accept these solutions, right? Maybe previously, when we didn't have, let's say, the elevator, everyone was walking up the stairs and no one really complained. But then when the first elevator arrived, people were looking at it with wide eyes: what is this, should they trust it, will I get stuck in it, or something, you know? The same, I think, goes for what you just raised, the self-driving cars. I think it was Elon Musk saying, I don't remember exactly the stats, but something like one in a thousand, you know, so it avoids basically 999 cases. So would you trust that, or do you need an even bigger number, and so on and so forth, like a complete 1000, so it's never mistaken? But what about cases where it's
+ hard to decide, right? Like, you are inevitably going to crash the car, and now you need to choose where: into the human, or maybe, I don't know, into the tree, which hurts the driver, and stuff like that, right? So these are, I think, the same decisions that we would be making as humans that algorithms should now make. And I think what humans, or humanity, have a hard time with is probably accepting the fact that someone else is going to make the decision, right? So yeah, it's a revolution. You know, you mentioned the elevator, but there's also the famous story about the horses and the cars: why should we need the cars, right? So it's a revolution. I think that most of the features we are working on are improving the quality of life, and people can automate processes and focus on their family, and instead of doing some complicated task they can focus their time on innovation, or play football, or soccer, whatever they want. It makes our life easier to some extent. Yeah, and we believe collectively that vector search is going to help there.
+ I really like, of course, to ask this philosophical "why" question of mine, but before that I was thinking: what do you think of the field in general, vector search, and maybe including search and machine learning? A lot is happening, but what do you think is still missing from your perspective, something that maybe we need to fix to be more efficient?
+ Yeah, I think it's education, simplifying the concept of search. I think this should be our main focus. So education, generating content, and again, I really like the grandmother test: simplifying, not super complicated mathematical equations, etc., trying to simplify and explain it to your grandmother.
+ So I think it's education. You know, the ecosystem is trying to generate high-quality content, videos, YouTube, blog posts; we are trying to contribute to this effort as well, and I don't think we are doing enough. And you know, it can be at high schools or universities. But again, vector search is technology.
+ It's an enabler; it's not the objective, it's not the target. But in order to unlock the potential of vector search and machine learning and transformers and all of this cool stuff, we should invest some of our resources in education and learning and training, and unlock the potential so that every developer can build a vector search based application in every field. It can be, as I mentioned before, healthcare, FinTech, education, manufacturing, or any other domain that is eager to solve some problem. I think we should simplify it, similar to the revolution of AutoML: so instead of, you know, processing and labeling images yourself, you have an AutoML tool or solution; you provision the data, label it, and then under the hood the AutoML model will run the experiments, find the right algorithm, find the right hyperparameters and optimize them. You can define what the KPI is that you would like to optimize, whether it's F1 score or recall or whatever, and then you will get the model, and if you would like to deep dive, you will get the code. So, generating models with low code; this is another area that we should focus on. But to some extent I believe that education, training and generating high-quality content should be our focus right now.
+ Yeah, I think you put it really well, and I would probably even add to this: yes, there is content which kind of promotes someone's solution, right, but at the same time you really want to educate, like, why should people even care about your solution. So you need to take a few steps back and explain: what are we talking about, you know, what problems exist that you are targeting, right? So if I were asking the same question of myself, I still see a lot of content which is much more promotional than it should be, because in the beginning
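The AutoML workflow sketched above (provide labeled data, let the tool try algorithms and hyperparameters, keep the best candidate against a chosen KPI) can be illustrated with a toy random search. This is a stand-in: the "models" are simple named candidates and the KPI surface is made up, where a real AutoML system would train and evaluate real models.

```python
import random

def automl_search(candidates, evaluate, trials: int = 20, seed: int = 0):
    """Toy AutoML loop: sample (model, hyperparameter) pairs, keep the best KPI."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        name, lo, hi = rng.choice(candidates)
        param = rng.uniform(lo, hi)
        score = evaluate(name, param)   # the KPI to optimize (e.g. F1 or recall)
        if best is None or score > best[0]:
            best = (score, name, param)
    return best

# Made-up KPI surface: each candidate "model" peaks at some parameter value.
def evaluate(name, param):
    peak = {"knn": 0.5, "tree": 0.2}[name]
    return 1.0 - abs(param - peak)

result = automl_search([("knn", 0.0, 1.0), ("tree", 0.0, 1.0)], evaluate)
print(result[1], round(result[0], 2))
```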
of this revolution you still need to explain what is happening, what the hell is going on, you know. Why? Because the reaction could also come from the incumbent players: they will say, no, this is not where things are going, and they will go back to their clients and say the same. But you should not position it that way; you should, as you said, explain, start from the problem, right? Start from what your actual business and product target is. And I guess this is not something that many engineers ask themselves. Some of them do; some of the best that I know do, and some of the best data scientists do as well: they don't code before they have understood what is being asked of them. And I think it's an amazing skill, and this is exactly where education also helps, like why data scientists and engineers should also care about this new field.
+ Yeah, yeah, this is super important. And honestly, when I'm saying this, internally we should improve the quality of the content, and not try, you know, to sell our solution, but just explain it for a software developer without a background in machine learning: simplify it for him and explain what the concept is, what the trade-offs between the concepts are, and again, give him the option to understand what is happening and decide what the best tool is for him. Is it a screwdriver, is it a hammer? He will understand the bits and bytes, and the trade-offs, and we give him the full picture about what the pros and cons of every solution are, and, you know, he will make the decision.
+ Yeah, exactly. And I think, digging into it more, some of the players are doing a really great job there, and I'm looking forward also to seeing some blog posts. You already mentioned the notebooks that you guys are publishing on your website, and I believe that was the Searchium website, right? And I'm looking forward to more content.
+ There. Now that I've learned that you really care about the topic, I think it's important to create and share, and maybe educate the educators and give the example. So I think this is really great.
+ Yeah, one example of a great blog post that one of our software developers wrote is how to optimize OpenSearch workloads.
+ So again, we have a plugin on top of it, but he wrote about what the options are without writing about our solution: what the options are with which our customers can optimize it. And another interesting blog post that we will publish soon is, you know, benchmarking. One of the things that we should improve in our ecosystem is to decide on a standard tool that will
+ help us to decide what the KPI and the benchmark are. There are various benchmarks out there; we are familiar with Rally, the Elastic benchmark. I haven't seen a good industry-standard benchmark in vector search; there was the Big ANN challenge one or two years ago, but again, I don't think we have a
+ good tool today. So one of our developers wrote about how to run the benchmark tool, the OpenSearch Benchmark: how to use this benchmark, and what the
+ bits and bytes are, with tips on how to understand the benchmark tools. So yeah, I think it's about starting from the education and then offering customers a way to check your solution. Yeah, sounds great. I think maybe even by the time this podcast is published we'll have that new blog post as well.
+ Hey, yeah, I'm really excited to be chatting with you today. I mean, we touched a lot of deep topics; I'm sure we could have gone on
+ for longer, but I was also really curious to ask you this magical "why" question. You know, the same way as you said: don't think about the software, think about the problem that you're solving. The reason I'm asking the "why" question is
+ because I truly believe that if you don't understand why you do things, then you're kind of flying through things, right? So you might well regret it some time later. Maybe you train the muscle, but still, I don't think it's a sensible approach in your life. So I'm really interested, given all your experience in
+ machine learning and product management and software development: why are you excited to work on vector search, search, whatever it is that you do day to day?
+ Yeah, I think this is a great question. I really like the Y Combinator accelerator approach to building products: build something that customers like, or love. And essentially, you know, we are trying to build some cool stuff and make
+ people's lives easier, happier. I gave the example of the girl from Asia, so this is, I think, one of our missions. But it's not only the girl from Asia who would like to purchase a red short-sleeve dress; it's the DevOps engineer who is trying to find the right log, and instead of
+ working on it for hours, it will take him seconds, okay? So essentially our mission, and I'm excited that I'm working on this topic, is to make
+ consumers', businesses' and enterprises' lives easier. So I think it's a very simple statement of the why, and I believe this is my mission, or our mission. And to some extent I think this is a doing-good perspective. So, you know, you have, like, gambling companies
+ building some stuff and building applications, and my approach is, you know, building things that will help humanity. So I'm excited that these are the things I'm working on. And by the way, this was also in my previous startup, where we tried to save lives, right? Drowning detection,
+ in a residential pool or open water, and, you know, saving lives. And if we can save lives, maybe for health applications, detecting cancer with image embeddings, or some other cool stuff, I'm super excited that this is the domain we are working in.
+ Yeah, this is very relatable, and it's fantastic that you're bringing this up, you know, how we can actually improve life, beside building great products, or products that sell.
+ This is amazing. And to conclude: is there any announcement that you want to make from your side, from the Searchium.ai side?
+ Yeah, so I'm very excited, because we are building some cool stuff. The first thing: we launched our Searchium.ai platform, where we offer customers a free tier to check our platform. And again,
+ we are not supporting all of the features yet, but it is very important for us to get your feedback, so I encourage you to check it out and to send me an email, or send my team an email, or, you know, support.
+ We are trying to build things that developers would like, and we are very focused on the customer. So this is the first announcement, and every piece of feedback is valuable. Next year we will launch our second generation, where we can offer
+ a few new implementations in terms of performance. And hopefully, at the beginning of 2023 we will release our Python compiler and some other cool stuff. So we are working on a few vectors, if I may use
+ the term: on the software, on the hardware, on the system, the user experience and the user interface, to simplify it. Yeah, so these are the things we are working on right now, and we will be happy to stay in touch.
+ Sounds great, thanks. It looks like your plate is full of really exciting things, so all the best to you and your team; I know some of them.
+ Yeah, it's amazing that you guys are building this, and I'm really looking forward to gen 2 of the APU hardware as well. Yeah, all the best, we will stay in touch. Thank you very much for this episode.
+ Yeah, thank you very much, Dmitry, it was a pleasure talking with you. You know, super interesting stuff; I can talk for hours about this domain. I'm excited to work in this domain, and I'm really looking forward to hearing from the community. Fantastic, thanks so much, Yennef, thank you for now. Thank you very much, bye bye.
\ No newline at end of file
diff --git a/transcripts/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md b/transcripts/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md
new file mode 100644
index 0000000..111477f
--- /dev/null
+++ b/transcripts/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md
@@ -0,0 +1,313 @@
+---
+description: '

Topics:

00:00 Introduction

01:04 Yury’s background in + laser physics, computer vision and startups

05:14 How Yury entered the field + of nearest neighbor search and his impression of it

09:03 “Not all Small Worlds + are Navigable”

10:10 Gentle introduction into the theory of Small World Navigable + Graphs and related concepts

13:55 Further clarification on the input constraints + for the NN search algorithm design

15:03 What did not work in NSW algorithm + and how did Yury set up to invent new algorithm called HNSW

24:06 Collaboration + with Leo Boytsov on integrating HNSW in nmslib

26:01 Differences between HNSW + and NSW

27:55 Does algorithm always converge?

31:56 How FAISS’s implementation + is different from the original HNSW

33:13 Could Yury predict that his algorithm + would be implemented in so many frameworks and vector databases in languages like + Go and Rust?

36:51 How our perception of high-dimensional spaces change compared + to 3D?

38:30 ANN Benchmarks

41:33 Feeling proud of the invention and + publication process during 2,5 years!

48:10 Yury’s effort to maintain HNSW + and its GitHub community and the algorithm’s design principles

53:29 Dmitry’s + ANN algorithm KANNDI, which uses HNSW as a building block

1:02:16 Java / Python + Virtual Machines, profiling and benchmarking. “Your analysis of performance contradicts + the profiler”

1:05:36 What are Yury’s hopes and goals for HNSW and role of + symbolic filtering in ANN in general

1:13:05 The future of ANN field: search + inside a neural network, graph ANN

1:15:14 Multistage ranking with graph based + nearest neighbor search

1:18:18 Do we have the “best” ANN algorithm? How ANN + algorithms influence each other

1:21:27 Yury’s plans on publishing his ideas

1:23:42 + The intriguing question of Why

Show notes:

- HNSW library: https://github.com/nmslib/hnswlib/

- + HNSW paper Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate + nearest neighbor search using hierarchical navigable small world graphs. TPAMI, + 42(4), 824-836. (arxiv:1603.09320)

- NSW paper Malkov, Y., Ponomarenko, A., + Logvinov, A., & Krylov, V. (2014). Approximate nearest neighbor algorithm based + on navigable small world graphs. Information Systems, 45, 61-68.

- Yury Lifshits’s + paper: https://yury.name/papers/lifshits2009combinatorial.pdf

- + Sergey Brin’s work in nearest neighbour search: GNAT - Geometric Near-neighbour + Access Tree: [CiteSeerX — Near neighbor search in large metric spaces](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.8156)

- + Podcast with Leo Boytsov: https://rare-technologies.com/rrp-4-leo-boytsov-knn-search/

- + Million-Scale ANN Benchmarks: http://ann-benchmarks.com/

- Billion Scale ANN Benchmarks: + https://github.com/harsha-simhadri/big-ann-benchmarks

- FALCONN + algorithm: https://github.com/falconn-lib/falconn

- Mentioned navigable + small world papers:

Kleinberg, J. M. (2000). Navigation in a small world. + Nature, 406(6798), 845-845.;

Boguna, M., Krioukov, D., & Claffy, K. C. + (2009). Navigability of complex networks. Nature Physics, 5(1), 74-80.

'
image_url: https://media.rss.com/vector-podcast/20220131_090127_be85ef047356dd187c4b22fb3a9286be.jpg
pub_date: Mon, 31 Jan 2022 09:41:27 GMT
title: Yury Malkov - Staff Engineer, Twitter - Author of the most adopted ANN algorithm
  HNSW
url: https://rss.com/podcasts/vector-podcast/377082
---

Hello, Vector Podcast is here, and today we're going to be talking to the author of the HNSW library and algorithm. It's one of the best algorithms out there, one of the most used algorithms in vector search. And today I'm talking to Yury Malkov. How are you doing? Hi. Hi. Hi.
So yeah, my name is Yury Malkov. Currently I'm working at Twitter, as a staff ML engineer in content understanding and research and recommender systems. Yeah, please note that during the discussion I don't represent Twitter's point of view; the views are my own.
Yeah, so it's great. So you already began introducing yourself. I was wondering if you could tell me a bit about yourself, your background, and then maybe we can also move into discussing the algorithm itself. Okay, sure. Yeah, so my trajectory of moving to ML is quite typical for Russia.
So yeah, I got a good physics education in Nizhny Novgorod, and there I did my PhD in laser physics. I was doing experiments on terawatt lasers. So that was fun, and that part of physics is considered to be the sexy part, similar to computer vision in machine learning.
And I was lucky to have good supervisors. One of my supervisors, who was mostly my supervisor on paper, helped me, and is now the head of the Russian Academy of Sciences. So yeah, I had good supervisors.
In addition to physics, I was concurrently working part time in a startup that was building distributed, scalable search systems based on insights from real networks. Yeah, that work ended up in several papers on the predecessor of HNSW.
And, unfortunately, the startup was closed even before I got my PhD.
So yeah, I decided to focus on physics after that. But after I got my PhD degree in physics, there was a choice for me of what to do next, and to proceed with a career in physics
I had to go abroad, and I didn't want to go abroad; I wanted to stay in Nizhny Novgorod. So I decided to just switch directions, to network science. And then I got a really good grant from the Russian fund Alpha Phi, which is not around anymore.
So I could do research on my own, with a pretty good salary. And yeah, I also joined companies, computer vision companies, to get insight into why people actually use similarity search algorithms and machine learning.
I worked at a television company, and later at NtechLab, which is the company that is doing big brother for Moscow, like the Moscow surveillance.
And later I joined the Samsung AI Center in Moscow, where I worked with Victor Lempitsky, who is one of the well-known personas in Russia. And in 2019 I moved to the US, and now I work at Twitter on recommender systems and content understanding, like large models. Oh yeah.
So you probably also use nearest neighbor search in your work, or... Well, I can mention it. Yeah, well, not really. Most of my time at Twitter, the last half a year, I spent on improving search relevance. So that is mostly the ranker.
But that is closely related to nearest neighbor search. Yeah. So you mentioned the background: you've been in Russia, and it was kind of related to computer vision. Of course, you had a physics background by education, but you also worked in computer vision startups.
So what was your impression of this nearest neighbor search problem, and how did you think about it? Did you read papers to understand what had been done in that area? I think the area was pretty developed, right, in the papers, like NSW itself, right, like navigable small worlds.
+ Well, so in the startup, Meta Labs, I had been working, I think, for six or seven years, so it was quite a significant period of time. And we started just from distributed search. The idea was that we do it from scratch:
we don't care what has been done before. So we had an idea: there are distributed hash tables, like Chord, and other stuff, and we wanted to do that, but with similarity search, and it should scale. And that's a very different approach from nearest neighbor search.
And most of the time we spent developing this algorithm, it was not even nearest neighbor search; it was closer to this symbolic filtering, but with arbitrary filters.
And only at some point in time we had a realization: oh, this is similar to what people actually need, and there are a lot of papers on nearest neighbor search. So we switched direction, and now the most cited publications are on nearest neighbors. Yeah. Yeah.
I don't remember, was it in your paper or somebody else's paper: I saw a paper by my old friend Yury Lifshits, because he actually defended his PhD thesis in this space. When he was doing it, I think it was 2009,
I was considering this a pure mathematical problem, without maybe a direct application. But then he gave a talk at Google, you know, the Google Tech Talks, I don't know if they still exist or not, but he presented this problem, and they did some optimizations as well.
And then, I think, your paper cites it, or maybe someone else's, I don't remember. I was really surprised to see his work also in the same line of things that now leads to vector search, essentially. Well, yeah, I think I saw his work, but it seemed like more theory.
Like, if you look at the history of graph approaches: now it's mostly rehashing of old stuff.
There are definitely new things, but there is so much work done before. Sergey Brin worked on nearest neighbor search with GNAT, and that is also good work.
There was earlier work on graph search, I think in 1993, which isn't that different from the current approaches, though it also had problems with scalability at that point. So yes, there is a large amount of previous work in that area.
But you said you didn't concern yourself with reading too many papers before you started inventing this new algorithm, is that right? Yeah, sure, we read papers, but they were not really relevant. We read papers on network science.
There was a problem with building these navigable small worlds: not every small world network is navigable, most models are not. We wanted to build navigable small worlds, and we also didn't understand
what the criteria were, how we could do it. We reinvented Delaunay graphs inside the company, and after you reinvent something like that, you start to search and see there are lots of papers that did the same. Right. Yeah. So we went the other way.
Yeah. Now that you mention it, can you introduce these concepts, at least at a high level, to our audience? What is a small world, what does it mean for it to be navigable, a bit more at the user-facing level, if possible.
Well, a navigable small world: you have a large network, and navigability means you can find paths between arbitrary elements in this network on a logarithmic scale. The number of hops is logarithmic, and you can use only local information
and do something like greedy search; only greedy search is allowed. If you can find the path in a logarithmic number of steps, your network is navigable.
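The definition just given, greedy hops using only local neighbor information, can be sketched on a toy graph; the coordinates and shortcut links below are made up purely for illustration:

```python
import math

def greedy_search(neighbors, points, start, query):
    """Greedy routing with only local information: from `start`, hop to
    whichever neighbor is closest to `query`; stop at a local minimum."""
    def dist(node, target):
        return math.dist(points[node], target)

    current, path = start, [start]
    while True:
        best = min(neighbors[current], key=lambda v: dist(v, query))
        if dist(best, query) >= dist(current, query):
            return current, path          # no neighbor improves: stop here
        current = best
        path.append(current)

# Toy "airport" line network: local links plus a few long-range shortcuts.
points = {v: (float(v),) for v in range(16)}
neighbors = {v: {v - 1, v + 1} & points.keys() for v in points}
for a, b in [(0, 8), (8, 12), (12, 14)]:      # the long-range "hub" links
    neighbors[a].add(b)
    neighbors[b].add(a)

found, path = greedy_search(neighbors, points, start=0, query=(15.0,))
# path hops 0 -> 8 -> 12 -> 14 -> 15 instead of walking all 15 local links
```

Without the long-range links the same query would take 15 hops; the shortcuts are what keep the hop count small.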
And the small world part: why is it called small?
That's for historical reasons. There was the famous Milgram experiment, where they sent letters from a random person to some target person.
That was in effect a greedy search through connections, very similar to this, and it's called the small-world experiment, hence small world. And real networks, like human connection networks, have low diameter.
And they are navigable, at least according to the Milgram experiment and subsequent experiments. Is this related, in common terms, to the six handshakes you need to connect one random person with another random person on the planet?
Yes, yes, that's that experiment; I think it was done in the 60s.
And so the navigable part, if we put it in the context of search: let's say I have local information, and I would like to travel from here. Say I'm in Helsinki and I would like to travel to New York. How do I travel? I need to go to the airport.
From the airport I travel maybe to some city in Europe, from there I change, you know, the airplane, and then fly over to New York.
I'm making it a little more complicated, there is actually a direct flight from Helsinki to New York, but okay. Is that analogous to the navigable part?
Yes, yes, generally you can pinpoint it that way: if you start and finish in small local airports, which usually don't have many connections, they are connected through hubs.
Yeah, and that is one model of navigable small worlds. There is also Kleinberg's model, which doesn't have hubs, so you can build navigable small worlds without hubs as well.
But they have polylogarithmic complexity. So maybe I'll ask you to provide some references later, especially for those who want to dig deeper into the mathematics, because you mentioned several models that are new to me at least, and I'm sure to part of our audience as well.
And I wanted to ask you about the context of your invention: what was the input? You said you had a lot of data from computer vision, but was there something else, dimensionality or some other constraint, that was tough for previous algorithms, like LSH or any other?
Well, LSH didn't even work there; we worked with tree structures, and how would you even do LSH in that setting?
As for LSH, I thought those were not practical algorithms. Even when I spoke with people who were writing a lot of papers on LSH, they expressed doubts about whether those algorithms are practical: they are not learnable, so they cannot take advantage of the data that you have.
What they told me is that they see quantization as just a better, practical version of LSH.
Yeah, right. So I'm really interested in how you set out to invent the algorithm. I can give you a brief example: in the recent billion-scale vector search challenge,
we had a small team, and one of our team members implemented a small change in the product quantization layer, basically in how you shuffle the dimensions of the vector, and he achieved something like a 12% recall increase over the baseline, you know, the Facebook Faiss algorithm.
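For readers unfamiliar with the product quantization being mentioned: it splits each vector into subvectors and learns a small codebook per subspace, so a dimension permutation changes which dimensions end up sharing a codebook. This is a minimal sketch with a toy k-means, illustrative sizes, and a hypothetical `perm` argument standing in for the shuffle; it is not the Faiss implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=20):
    """Tiny k-means, enough for illustration (not a trained Faiss quantizer)."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def pq_train(data, m, perm, k=16):
    """One codebook per subspace; `perm` is a dimension shuffle applied
    before splitting, which changes which dimensions share a codebook."""
    return [kmeans(s, k) for s in np.split(data[:, perm], m, axis=1)]

def pq_encode(x, codebooks, m, perm):
    """Compress a vector down to m small codebook ids."""
    subs = np.split(x[perm], m)
    return [int(np.argmin(((cb - s) ** 2).sum(axis=1)))
            for s, cb in zip(subs, codebooks)]

d, m = 8, 4
data = rng.normal(size=(500, d)).astype(np.float32)
perm = np.arange(d)                 # identity here; a permutation is the "shuffle"
codebooks = pq_train(data, m, perm)
code = pq_encode(data[0], codebooks, m, perm)   # 4 codebook ids per vector
```

A vector of 8 floats compresses to 4 byte-sized ids; which permutation works best is exactly the kind of knob the anecdote describes tuning.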
I didn't have that much knowledge; I had read your paper and some other papers, and I was just thinking: okay, if I started from first principles, how would I solve it? I know nothing about this problem, so how can I solve, you know, search in multi-dimensional space?
I actually implemented a very, very simple algorithm using your algorithm as one of the components; maybe we can talk about it later. But how did you start inventing HNSW?
Well, HNSW had a predecessor, NSW, which is also called MSW or SW-graph in different places, depending on where you look. And it had problems.
It had several problems, but if you don't think about the distributed setup, the main one was polylogarithmic scalability with the number of elements, and that killed the performance on low-dimensional data.
There were comparison works, one by Leonid Boytsov, where he evaluated different algorithms, and NSW really didn't perform that well on some datasets; the loss was by many orders of magnitude, it could be 1000 times slower than the best solution.
So the work on HNSW was targeted at improving the previous version so it wouldn't have this problem and, ideally, would perform the best in all setups. And yeah, that has been solved.
Right, but you still needed to add that magical H in front of it, you made it hierarchical. What pushed you in that direction? Did you think it might work, or did it prove itself as a result of experimentation?
Well, that has many ingredients in it. For one thing, when I worked at the startup, we had a different problem with the distributed index: NSW had an unpleasant quality, that the hubs created in the network are the first elements.
For a distributed system, you want to add new nodes to increase its capacity, but all your hubs are in the first nodes, the older nodes, because they were created before the new nodes even existed.
The traffic is routed through the same old nodes, which makes it not scalable. We spent quite a lot of time figuring out how to solve this, and at some point I noticed that our NSW approach is pretty similar to a skip list in terms of what is produced as the final network.
The idea is: if you create a skip list for 1D data and create an NSW for 1D data, and for the skip list you merge all the links regardless of layer, you get a similar network in terms of degree distribution, distance distribution, all major properties.
But the skip list doesn't have this problem: you can add new nodes, they can get high levels, and your traffic will be routed through nodes uniformly across your distributed system. We knew about that equivalence at the startup, but only for the problem of distributed search.
It would still use the same polylogarithmic, tree-like search algorithm, which doesn't think about how many links you have on a node, so it was shelved for that reason. But then, after I did my PhD, I wanted to publish a good paper on network science.
And there was a result: a method to create navigable networks which was not known before. I tried to publish it in Nature; it was rejected. Nature Physics also rejected it, by the editors; then it was rejected in Scientific Reports after a review; and then it was finally published in PLOS ONE.
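The skip-list property described above, that a newly inserted node can land on a high level, comes from sampling the level at insertion time; the HNSW paper draws it from an exponentially decaying distribution, roughly as follows (the constants are the paper's suggested defaults, shown here just to illustrate the shape):

```python
import math
import random

def draw_level(m_l, rng):
    """Insertion level drawn as in skip lists (and the HNSW paper):
    floor(-ln(U) * mL), an exponentially decaying distribution, so any
    newly inserted node has a small chance of becoming a high-level hub."""
    return int(-math.log(rng.random()) * m_l)

rng = random.Random(42)
m_l = 1 / math.log(16)                  # the paper's suggested mL = 1/ln(M)
levels = [draw_level(m_l, rng) for _ in range(100_000)]
# Roughly 1/16 of nodes reach level >= 1, roughly 1/256 reach level >= 2, ...
```

Because levels are sampled independently at insertion, late-arriving nodes become hubs just as often as early ones, which is exactly the fix for the old-hub problem described above.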
And I really like this paper; it was the most surprising result I think I ever got, but it's not really cited.
As a byproduct, I did a comparison with other navigable small world methods. I had this old vision that you can look at real-world networks, replicate them in a computer system, and they will work.
So I replicated the work done on scale-free navigable small worlds, which were a very popular thing at the moment.
And the performance was very bad, extremely bad. The reason is that in a scale-free network, scale-free meaning a power-law distribution of degrees, there is a coefficient gamma, and the best case is when gamma is close to two. But gamma close to two means the hub degree scales almost linearly with the size of the network, so when a greedy search goes through a hub, it evaluates a huge portion of the network, and you get linear scalability instead of the ultra-logarithmic, log log N, behavior:
the number of hops is log log N, but at some point you evaluate almost every point in your network, so the performance is really bad. After that I realized what the problem with NSW was, and I thought: oh, we already have a solution for that, because the skip list doesn't have this problem. So I implemented the prototype, and it worked.
Then I worked on the C++ version and the evaluation. By the way, when you started implementing your prototype, was it initially in C++, or was it in Java? And if Java, was it because Java is your favorite language, or why?
Because the distributed system was implemented in Java, so that was close by. So maybe you were thinking it would be easier to integrate in Java, right?
Well, I just knew how to code it in Java; I had coded NSW several times, and all that Java code was released.
So I just coded it, and then I had to port it to C++ to make it efficient. And there is Leonid Boytsov, who is a maintainer of nmslib; I had been in contact with him for quite a while, and so it was implemented in that library.
Did you collaborate with him to implement it in nmslib in the most efficient way?
Well, first of all, the ideology of the library is very close to what we had been developing: it is not only focused on typical distances like L2, cosine, or inner product.
So it made sense to compare on all those distances, and Leonid also had a paper with a benchmark on all of them, so we could just implement a new algorithm and run the bench.
Right, and that was a really good point. nmslib was also integrated with ann-benchmarks, so if you add an algorithm there, you can
go through all the sets of benchmarks. Yeah, so it was kind of easy for you to evaluate where your algorithm stands against the others, right? Yes, right. And you also had a co-author on your paper; maybe you could introduce him as well.
Oh yeah, that is Dmitry Yashunin. He was my peer in the physics lab; he got his PhD the same year I did. So we decided to team up, and he helped a lot: he did all the evaluation, integrated it with other code,
and ran the evaluation on the clusters that we had.
Yeah, nice. So, back to the invention: as you were inventing this algorithm, did you have to make a lot of adjustments to its core as you evaluated it, or was it,
you know, the first shot and that was it? Well, not really. There are two changes in HNSW compared to NSW. The first one is the idea of layers; that solves most of the problems with low-dimensional data
and also improves performance on most of the tasks, on most of the distributions, maybe not by much on high-dimensional data. Still, when I ran the whole suite, there were a few datasets on which it performed worse than
the VP-tree, which is from Leonid's library. I thought that wasn't a big deal, but I communicated the results to Leonid, trying to convince him that we don't need that many algorithms.
He was not convinced, so we added another improvement: the heuristic for selecting the neighbors, which I knew personally from the work on spatial approximation trees.
That made the transition to skip lists exact: you can build an exact skip list in 1D using this heuristic. And that addition improved the performance.
Correct me if I'm wrong, I actually read your paper really closely, I printed it and was reading with a pencil, making notes: at some point, was it so that the algorithm also needed to be shown to converge?
Because you keep reshuffling the points in some way as you build it, and you use multiple threads to build the actual paths between the nodes, between layers, right? So do you still need to somehow make sure it converges for all dimensionalities, all spaces, or was that not necessary?
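The neighbor-selection heuristic mentioned a moment ago can be sketched as follows; this is one simplified reading of the idea, with toy 2-D points, not the exact hnswlib code:

```python
import math

def select_neighbors_heuristic(base, candidates, m, points):
    """Scan candidates in order of distance to `base`, keeping a candidate
    only if it is closer to `base` than to every neighbor kept so far, so
    links spread in different directions instead of clustering on one side."""
    dist = lambda a, b: math.dist(points[a], points[b])
    selected = []
    for c in sorted(candidates, key=lambda c: dist(base, c)):
        if len(selected) == m:
            break
        if all(dist(c, base) < dist(c, s) for s in selected):
            selected.append(c)
    return selected

points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.1, 0.1), 3: (0.0, 1.0), 4: (-1.0, 0.0)}
# Plain "closest m" would keep both 1 and 2, which point the same way;
# the heuristic drops 2 because it is far closer to 1 than to the base.
chosen = select_neighbors_heuristic(0, [1, 2, 3, 4], m=4, points=points)
```

Even with room for four neighbors, point 2 is pruned, leaving links in three distinct directions.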
Well, the algorithm is pretty stable. The result that you can use many threads is an empirical one; I was surprised when I saw it, but even for NSW, the first algorithm, even if you start, I don't know, 40 threads from a single element, I found no drop in the recall.
No drop in recall or speed, which was a bit surprising. In terms of stability, the main way to make it stable is just to use proper parameters, big enough ones. There are ways to make it stable even
under corruption, but that is pretty costly: if you bootstrap the graph, if you do iterations similar to NN-descent, I think you probably know it, you can make it stable even if it's corrupted by a lot.
But that is done only for updates, because when you update, you are kind of corrupting the graph. In hnswlib,
updates weren't specifically made to be very stable, but plain construction doesn't have to be that stable, it doesn't have to converge in every situation; just keep the parameters high enough and it won't diverge.
Right, yeah. I'm also curious to hear your opinion on this: after your paper I started reading other papers, for example Microsoft's Zoom algorithm, and later what they called DiskANN, with some modifications. They were comparing to HNSW at a larger scale, something like billions of nodes, billions of points in the space, right?
And they were trying to minimize the cost it incurs, because as you build HNSW you also use quite a bit of memory, so I wanted to hear your opinion on that part. What they did, I don't know if you're familiar with these papers, is offload a portion of the retrieval to an SSD disk.
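The split being described here, a cheap pass over compressed vectors that stay in memory followed by full-precision re-ranking of a short candidate list (the part a DiskANN-style system would read from SSD), can be sketched like this; the 8-bit scalar quantization and all sizes are illustrative assumptions, not any system's actual scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

# Full-precision vectors play the role of the data living "on SSD";
# crude 8-bit codes play the role of what stays resident "in RAM".
data = rng.normal(size=(10_000, 64)).astype(np.float32)
lo, hi = float(data.min()), float(data.max())
codes = np.round((data - lo) / (hi - lo) * 255).astype(np.uint8)

def search(query, k=10, shortlist=100):
    """Cheap pass over compressed codes, then re-rank a short candidate
    list with full-precision distances (the only 'disk reads' needed)."""
    q = np.round((query - lo) / (hi - lo) * 255).astype(np.int16)
    approx = np.abs(codes.astype(np.int16) - q).sum(axis=1)
    cand = np.argpartition(approx, shortlist)[:shortlist]
    exact = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(exact)[:k]]

ids = search(data[0])     # the query's own row should come back first
```

Only 100 of the 10,000 full-precision rows are touched per query, which is the point of the RAM/SSD split.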
So they combined your algorithm with additional layers, and they resolve to full precision when they go to the SSD, but they don't do that in memory. So they do use quantization, right? Yeah, they use quantization, exactly.
That's a very popular approach, and it makes sense: you have a hardware limitation on what you can store, but you have a hardware hierarchy, so you have not-so-big RAM and lots of SSD, and maybe, if you have a distributed system, access to other nodes.
So yeah, that's a clever use of the hierarchy; it makes sense. At the same time, your algorithm was taken into popular frameworks like Faiss, right? Faiss is not a single algorithm, and one of its indexes is HNSW.
I actually don't know how they did it: did they take your C++ dependency directly, or did they reimplement it, do you know?
They reimplemented it from scratch. I talked to them once, and they said they tried different ways, but in the end it was pretty close to the initial C++ library. Some things are implemented differently in Faiss, though.
For instance, there is a pool in hnswlib for keeping track of visited elements: when you run a search, there is a map, think of a bitmap, which knows which nodes in the network have been visited.
In hnswlib it is kept in memory all the time, and a new search just picks one from the pool, while Faiss creates it once per search, so there batch searches are more effective.
Yeah, batch search is another feature that is sometimes implemented in vector databases. But did you expect your algorithm to become so widely applicable? Do you know it has been reimplemented in several languages? For example, as part of the vector database called Weaviate it was implemented in Go.
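Stepping back to the visited-list pool just described: the reuse trick can be modeled in a few lines. This is a simplified sketch of the pattern, with a generation counter added so "clearing" the buffer is O(1); it is not hnswlib's actual C++ implementation:

```python
import threading
from collections import deque

class VisitedListPool:
    """Toy model of the pattern: searches borrow a preallocated 'visited'
    buffer from a pool, and a generation counter makes resetting it O(1)
    instead of zeroing the whole array on every query."""

    def __init__(self, num_elements):
        self.num_elements = num_elements
        self.pool = deque()
        self.lock = threading.Lock()

    def acquire(self):
        with self.lock:
            if self.pool:
                buf = self.pool.popleft()
            else:
                buf = {"tags": [0] * self.num_elements, "cur": 0}
        buf["cur"] += 1                  # new generation: all nodes unvisited
        return buf

    def release(self, buf):
        with self.lock:
            self.pool.append(buf)

def mark_visited(buf, node):
    buf["tags"][node] = buf["cur"]

def is_visited(buf, node):
    return buf["tags"][node] == buf["cur"]

pool = VisitedListPool(num_elements=1_000)
v = pool.acquire()
mark_visited(v, 42)
seen_during = is_visited(v, 42)          # True within this search
pool.release(v)
v2 = pool.acquire()                      # same buffer object, reused
seen_after = is_visited(v2, 42)          # False: reset cost was O(1)
```

The allocation happens once per pool entry rather than once per query, which is the saving being described.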
And there is a database called Qdrant, which implemented it in Rust. Of course, all of these implementations also add CRUD support, so you can actually update, because in a real database you need these features.
They also added filtering on top of that, inside your algorithm. So did you expect such popularity? No, no. I thought we would publish the algorithm, win the benchmarks, and win clearly.
Though at that time, just before we published, there was a competitor, FALCONN, which also published benchmarks where it did better; but FALCONN targeted only a few specific metrics.
It was actually also made by a person from the same school I went to, Ilya Razenshteyn, so I talked with him a bit. And I thought that since we had open-sourced the code and published the paper, people would quickly iterate on top of it and improve it.
But it took others much more time to improve upon it than I expected. Maybe that was due to a lack of interest, maybe some inertia, I don't know. Looking at how many startups and solutions are popping up right now, it seems most of the interest came much later
than the point when it was released, so it was hard to predict back then.
Yeah. Do you think nmslib had something to do with this success? Maybe nmslib was somewhat visible, and when you added your algorithm there and showed that it performs, the people who followed the library at least knew: okay, there is a new algorithm.
I think so, yeah, that helped. nmslib is a good library, so it has some audience, but I think the most attention came from ann-benchmarks.
Because Annoy had a lot of attention by that point, and that benchmark was done by the same person
who did Annoy, so I think it drew some traffic to the library. I also think the idea of the algorithm was understandable, and that affects usage too: if you understand something, you are more likely to use it.
Yeah, that's Erik Bernhardsson, right, the Swedish guy, as he says, the Swede who is stuck in New York City. I think he originally implemented Annoy. There is also a presentation by him where he explains not only the Annoy algorithm but also
how intuition no longer works in multi-dimensional spaces. We think in terms of the 3D world we live in, where the further a point is from you, you can still see it, somehow perceive it; in multi-dimensional space it's not like that. I don't know what your view on that is, by the way:
does geometric intuition change in high-dimensional space?
Well, yes. There are many interpretations of this, and people who work with nearest neighbor search know about it: when you have many dimensions, even small perturbations can take you far.
So to find the nearest neighbor you need a huge covering sphere when you divide up the space. That makes the problem complicated, and it's one of the reasons why all the practical methods are approximate.
Right, yeah, you do need some approximation in order to find the points. So, when you mentioned ann-benchmarks: was it you who submitted the algorithm to the benchmarks, or did Erik pick it up and make it available in ann-benchmarks?
No, no, I made a pull request to add it.
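The high-dimensional effect discussed above can be seen numerically: the ratio between the farthest and the nearest of many random points shrinks toward 1 as dimensionality grows. A small seeded demonstration, with uniform random data and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def contrast(dim, n=2000):
    """Ratio of farthest to nearest distance from a random query to n
    random points; it shrinks toward 1 as dimensionality grows, so the
    'nearest' neighbor stops being meaningfully nearer than the rest."""
    points = rng.random((n, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return float(d.max() / d.min())

low_dim = contrast(2)       # large ratio: a clear nearest neighbor exists
high_dim = contrast(1000)   # ratio close to 1: distances concentrate
```

This distance concentration is one standard way to state why exact high-dimensional search is hard and why the practical methods are approximate.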
All right, so you pushed it forward yourself; it wasn't that you just implemented it and waited for it to be discovered, so to say.
Definitely. One of the reasons to use nmslib was that nmslib was already integrated in ann-benchmarks, so adding it was just a matter of adding some code in ann-benchmarks that pulls the algorithm
and then tuning the parameters. But that was simple to do, right? Yeah. And as you did that, what were the results? Of course, ann-benchmarks has a number of metrics, for example even indexing speed,
not only the recall-versus-QPS trade-off. Was there some specific metric on which HNSW excelled over other algorithms?
Well, at that point there was no logging of construction time and memory consumption, and the initial version in nmslib had a clear focus on performance, the recall-to-speed ratio.
And, well, it's hard to do proper benchmarking: there are a number of scenarios. Somewhere you have a limit on memory, somewhere a limit on construction time; sometimes you don't care about them at all and just care about the speed.
You can also care about multi-thread performance, or about single-thread performance; there are many different scenarios, so it's pretty hard to do proper benchmarking, and I made the decision to focus on recall and not think about construction and memory.
Okay, I see. And when you did that, was HNSW at the top of the competition?
Yes, it was at the top, and on many benchmarks there was a huge gap compared to the next competitor. Maybe on GloVe, against FALCONN, the gap was smaller, but there was still a significant difference on many of them.
Yeah, and at some point after that there was a release of the KGraph algorithm, which decreased the difference, but HNSW was still on top.
So did it make you feel proud at that moment, seeing the big gap and knowing this is your invention? How did you feel about it?
Well, that felt nice for sure. And when the paper was finally accepted, that also felt really good; I think it took about two and a half years to publish the paper.
As they say in the US, every rejection brings you closer to the goal. So it sounds like you were rejected by multiple journals before it was finally published?
No, that was a single journal; it's just that one revision took one year. That is PAMI, Transactions on Pattern Analysis and Machine Intelligence.
We follow the practice in physics and ignore the conferences, and for the grants we also need journal publications, not conference publications. So we sent it to PAMI and had a few revisions there, but each revision took a year.
Wow, that is super long. Why do you think it was like that? Why would reviewers scrutinize your submission so much? Well, I don't know. I actually talked with the editor, because I was very angry after the first review, and it seems the problem is simply how
publications in computer science are organized. It's not only that journal; there are so many journals that have this problem. When I looked at Twitter, there were discussions like: oh, I got a review invitation from such-and-such journal,
they say I have to write the review in 10 days; oh, I'm never going to do that, no way am I writing a review in 10 days. In physics it sometimes took a few weeks to write a review, while in these journals you send the paper and then wait for months; you could have started writing the review right away, so what takes so long? And in computer science,
journals are very slow, and conferences are also slow; it takes several months to get a review. And people saw that we were using arXiv; if there were no arXiv, I think they would just
create new journals. Yeah, exactly, there shouldn't be any monopolies in that sense; maybe go and create your own journal. But the problem is, when you are a PhD student, you have a chicken-and-egg problem: you haven't proven yourself yet, and you need a publication to defend your thesis. That's the trap.
Well, when they created a new conference, I don't remember exactly, I think ICLR was created not that long ago, they could have created a journal as well. The same people said: we don't want conferences; with conferences you have a very tight deadline, which means that if you miss it you wait for another year, and that is not
great. So let's create a journal, and then you have a continuous spectrum of time for when you want to send your paper, no deadlines. But there are no deadlines for reviewers either.
Yeah, I mean, during the review period of a conference you can get, like, 10 papers at the same time, so you have to review them in a batch, whereas with journals you get a review from time to time; your load is distributed.
Yeah. By the way, what is your take on this: I think the NeurIPS conference decided this year to hold all reviews publicly, so essentially anybody can see the comments from the reviewers, there is a discussion between reviewers and authors, and everything is public. Do you think it improves the process somehow or not? What's your take?
Well, I think that makes sense. It sets the bar for reviewers higher, because if you know your review will be read by random people, you want to make it better and spend more time reading the paper. It also helps people understand the review process from the outside: if you're a new reviewer and want to understand how to do a proper review, you can just read reviews by other people,
and that is helpful. And if you want to publish a paper, you can find similar papers and read the reviews for those papers, and understand why they were rejected or accepted. So that helps. I don't see much of a problem with it, and it fights against corruption; some places in science are corrupted.
Yeah, it at least brings transparency to the process, and, as you mentioned, someone can learn how to do these things right, so I think it's also useful. Maybe it prevents situations where a paper is rejected outright because the reviewer has some bias against the topic. At least transparency is good, I think.
Are you publishing today, by the way? Do you have any publishable work, do you intend to publish? Not much; I'm working mostly on production. Maybe next year I'll work on something publishable. The last thing I published was at Samsung, on pose estimation.
Yeah, but I've noticed you are very active on the hnsw GitHub. When I posted my question, and maybe we can discuss that as well if you're curious, you responded really fast. So it means you still allocate a chunk of your time to look at issues and pull requests on GitHub.
Yeah, though I wish I did it better; I miss some things there. But I try to update the library. When I designed HNSW there were some design decisions, and even if I see outside algorithms that improve upon it, I think some of them are not
aligned with the design, so I skip them. One of those decisions is that HNSW tries to avoid a global view of the network, because it is a descendant of distributed algorithms, and relying on a global view is not a good strategy there, even though it sometimes helps.
There are papers where you ensure that the path from the entry point of the network to every node is short. You can do that, but it breaks if you are doing insertions, for instance: you cannot have a global view and a dynamic nature at the same time.
That's why I ignore some of that work. There's also a focus on custom distances: even though hnswlib supports only three distances, it's pretty easy to implement whatever distance you want, and I believe there will be a shift in which distances are being used
after some time, because there are problems with those simple distances. You mean distances like cosine and dot product? Yeah, but it's more a problem of wanting to embed everything, to embed an entity into a single vector representation; that has limitations, as you probably know.
+ Transformers are based on attention, and before them there were LSTMs with attention for translation — without attention it didn't work well, because it compressed everything into a single vector. So I believe that in some time there will be at least set distances, where your object and your query are represented as a set — a set which can be shuffled without changing the structure. + For a user that can be a set of user interests; for a document it can be a set of subjects inside the document; for the query it can be different parts that you want to have in the query at the same time, but those parts might not be ordered — and when you embed something, you make it ordered. For instance, when I played with CLIP — + it has this zero-shot capability, which is nice: you can take an image and see which text is closest. But it definitely struggles with the notion of which words are there and in what order — what is the first word, and so on. Yeah — and geometrically, in different languages it might even be a different geometry of words, right? Should you read left to right or right + to left — and then you also need another dimension for the language, I guess. + Yeah, we can represent it as a bag of words — maybe an ordered bag of words, with positional encoding as people do now for text — but I think ANN would need to adapt to that situation in the future, and keeping the ability to add new distances is important. + Yeah. So are you working on this personally, or are you welcoming pull requests, as they say, to implement different distances? Well, I'm welcoming pull requests for sure, because those are very application-specific, and it's pretty easy to implement, I don't know —
+ — some simple distance. Say you have a set-to-set distance: you just select which elements are the closest out of the set, so it would be many-to-many — somewhat similar, I think, to what ColBERT does, though I think they can go without it; but essentially you'd need a set-to-set distance, right? Yeah, makes sense. Since I mentioned it twice already — + I was wondering if I could pick your brain on what I've been thinking about in this ANN space. And trust me, it's an absolutely simple algorithm that I came up with. The only problem is that I chose Python as the language, and Python has this somewhat odd virtual machine and garbage collection — so maybe it's also a bug in my code — but on a billion nodes I cannot make it converge in reasonable memory. + It runs out of memory at around 995 million. What I did: I took the input set of points — the points are 128 or 200 dimensions. + Essentially I pick a point — the first one, not a random one — and then, on a sample of points, I compute a median distance, so basically the typical distance between them in a pairwise fashion. + And then I use that as a filter to build what I call a shard. Essentially I decided to split the billion points into a controllable number of shards, say 1000 shards. I pick the first point and then ask which other points are close enough — within that median distance — to this point. + I join them together into the shard, and when the shard reaches 1 million (so with 1000 shards of roughly 1 million points each, that's a billion points), I close that shard and run HNSW on it, so that I have that shard as a hierarchical navigable small world graph.
+ And it seems to converge — at least on 10 million it converges, on 100 million it converges; it runs out of memory on one billion, but I think that's just some weirdness in how I do the big loop over all the points. + When I reached out to you on GitHub, my idea was to also access the first layer of the graph — that first layer where the query enters — and use that + as the sort of entry point across the 1000 shards. Because I don't want to load all 1000 into memory, I want to load only a sufficient number of entry points, so that I can quickly check which shard is closer to my query, then go inside that one and use HNSW there. What do you think about this idea? It's very simple, I think. + Yes, well, that makes sense. It's clustering: you cluster the points into 1000 clusters and then select among the clusters. That makes sense — historically there were other papers that suggested something similar. + I think in FLANN that was one of the distributed strategies they suggested. Yeah, that might work out, though it depends on the scale — and for a production system you also want to replicate those nodes, right? + Okay, maybe let's come at it from a different direction: you can also start with very small pieces — it might not be needed in this case, if you want to balance — but on the top layer you can also use, as in this Microsoft paper that you mentioned, or in other papers — + if you want to divide your dataset into, say, a million clusters, you can use a higher-level index to decide which shard to select, right? Yes. + Though if you're not talking about that scale, it probably doesn't make too much sense.
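The shard-building scheme described above — estimate a median pairwise distance on a sample, then greedily group points within that distance of a seed point until the shard fills up — can be sketched roughly as follows. This is a hedged sketch on synthetic data; all names are hypothetical, and in the real pipeline each closed shard would then get its own HNSW index, with the shard seed points kept in memory as coarse entry points.

```python
import numpy as np

def median_pairwise_distance(points, sample=256, seed=0):
    """Estimate the median pairwise Euclidean distance on a random sample."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=min(sample, len(points)), replace=False)
    s = points[idx]
    d = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
    # take only the upper triangle so each pair is counted once
    return np.median(d[np.triu_indices(len(s), k=1)])

def assign_shards(points, threshold, max_shard_size):
    """Greedy shard assignment: a point joins the first open shard whose
    seed point is within `threshold`; otherwise it seeds a new shard."""
    seeds, shards = [], []
    for i, p in enumerate(points):
        placed = False
        for j, seed_pt in enumerate(seeds):
            if (len(shards[j]) < max_shard_size
                    and np.linalg.norm(p - seed_pt) <= threshold):
                shards[j].append(i)
                placed = True
                break
        if not placed:
            seeds.append(p)        # this point becomes a new shard's seed
            shards.append([i])
    return seeds, shards
```

At query time one would compare the query only against the seed points, load the closest shard's HNSW index, and search inside it.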
+ But yeah, you can do this. Yeah — I'm still hopeful and keep trying. I have a friend on Twitter who actually recorded a YouTube video where he said: here, here and here you make a mistake, this is why you lose memory — you should never allocate objects inside loops, you should pre-allocate them as NumPy arrays, and so on. + With his modifications it still runs out of memory, so I need to keep moving forward. I'm still hopeful I can do it in Python, but something tells me maybe I should move to a more memory-controllable language, something like Rust or C++, I don't know. + Well, I'm not sure. You're probably using C++ libraries from Python, like NumPy or Torch — yeah, something like that — so they should not leak memory; those are pretty controllable. Yeah, it's definitely my code somewhere in the loop. The hottest part of the algorithm, in terms of profiling, is — + well, you pre-compute the median distance once and then use that value all the time, so that's okay — it's just one object, it doesn't allocate much. But then you extract the next batch of points; I read the one-billion set in one-million batches. + I sense that there could be a memory leak there, because it's a binary file, and in NumPy you say: from this file, read the next batch — you provide the offset — so I sense that maybe it leaks memory there. Maybe not, I don't know. + Because I've noticed that in the Faiss library they use mmap to do the same thing, and I'm not using mmap. You can also use mmap: if you read tensors with NumPy, it also has memmap options, so you can load the file memory-mapped in NumPy.
+ But even if you're reading via a plain open file — opened in read-binary mode — it should not leak memory; you just read from it. + Yeah, so it must be something super stupid in my code, something really obvious to somebody like you — okay, here is the point, you should not do this — but for me, I invented this basic idea and then I'm pushing it further. It works on 10 million and I'm okay, but the task, as part of this challenge, was to do billion scale, right? So it's like you climb and climb the mountain + without reaching the top. Well, there is a top, of course — it's only one billion points — but it keeps me quite excited to keep going. Of course I already see the need for some improvements, for example updates: say a new point comes in and I have 1000 shards predefined, so I need to find an existing shard for it or create a new one at some point. That part I defer to the future; I'm not sure yet if I can do it. + But maybe I need to push harder to just make it converge first. Okay — you can profile for memory: loop the operations in the code that you think can leak, and profile the memory for those. + Yeah, I've been doing that. I also come from the world of Java, where it's quite straightforward in a way. There are tools in Python too, but when you plug in a memory profiler it slows down your computation significantly, so you have to wait like 10 times longer to see the result.
+ Yeah, I'm not a fan of profilers. Recently I found a talk on YouTube which explained why we shouldn't use profilers: profilers became obsolete when processors became not just multithreaded but pipelined — since the Pentium, which was superscalar, your operations execute out of order, and when you look at profiler results you can't really understand them. When I was developing HNSW I hardly used profilers; I just wrote benchmarks for operations. I had a baseline and a trial — they usually work on the same memory, the index is the same, but there are different implementations of search — and your speed can depend on memory and on how you allocate it, and with those benchmarks you can measure something like a 1 or 2% difference. + And when you do a lot of benchmarks with one or two percent improvements each, you can get a 20% improvement, a 50% improvement. I never used profilers, and I never in my life saw people use profilers and get really deep insights from them. + Yeah, I agree. We were also building a search engine where, by design, we had billions of documents, and each document was just a short sentence — a statement from a real document — and of course we were running into all these garbage-collector stop-the-world problems and so on. + We were running profilers — I think one of them was JRockit, and others — and when you see the graphs you're like: okay, so now I know, yes, it leaks — but what should I do? Or it tells you that your code is using byte arrays too much — what can you do with that, right?
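The baseline-vs-trial benchmarking Yury describes — running two implementations on the same data and comparing best-of-N timings until 1-2% differences become visible — can be sketched with the standard-library `timeit` module. The harness and the toy workloads below are hypothetical, not Yury's actual benches:

```python
import timeit

def compare(baseline, trial, number=1000, repeat=5):
    """Time two callables on identical work and return trial/baseline.
    Taking the minimum over repeats suppresses run-to-run noise, which
    is what makes ~1-2% differences measurable."""
    b = min(timeit.repeat(baseline, number=number, repeat=repeat))
    t = min(timeit.repeat(trial, number=number, repeat=repeat))
    return t / b  # < 1.0 means the trial implementation is faster

# hypothetical example: two ways to sum squares over the same range
ratio = compare(lambda: sum(i * i for i in range(100)),
                lambda: sum(map(lambda i: i * i, range(100))))
```

Chaining many such small, verified wins is how the cumulative 20-50% speedups he mentions accumulate.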
+ Yeah, and for performance it's even worse: you see that this module takes a lot of time, but in a multithreaded world it's not certain that you can improve it — and that happened quite a few times; people came to me and said: your analysis of performance contradicts the profiler. + And that's okay, right — because you didn't optimize for the profiler. Yeah, because a profiler cannot tell you what would happen if you changed something. Exactly, it's just a snapshot. Yeah, it's just a snapshot. + And coming back to HNSW: what are you hoping to achieve in, say, the midterm future? For example, there is widely cited work where reimplementations of HNSW add symbolic filtering. What would it take, in your original paper, in your original algorithm, to add symbolic filters? How does it change the dynamics of the graph and the search? + Well, it seems to me I can correlate interest in ANN with interest in symbolic filtering. I think two years ago I hadn't heard people talk about symbolic filtering, but now it's a hot topic. + People want symbolic filtering from different places — for targeting, for ads, you may want some targeting for the audience, or some other filters — but I see that as outside of ANN. + As I said, when I was working at a startup, our first application was doing something like symbolic filtering, and there it's easier in some sense, because — as you said — there is this problem of distances in high-dimensional space, and in symbolic filtering there is no such problem.
+ In symbolic filtering you have a query that has an exact result, and if you write an SQL query it can be optimized to work efficiently. But ANN does a very different job — it does approximate filtering. You can mix them together to some extent: you have a distance, and you add some + prefix to it which somehow captures the symbolic filter, and you can build an index that takes that into account — there are other people who have suggested doing that as well. + And yeah, that can help during search: if you filter by the symbol, you can easily add filtering — when HNSW filters the results, it can be done the same way. + Yeah, you can extract only elements that pass the filter, and there is some guidance on the graph because you created it with the filter in mind. But for me — I don't know — you have a huge number of possible filters, so what will be the metric, and how would you balance it against the approximate network? That creates a lot of problems, I think. + I thought that the best solution would be to keep this to some extent, but focus more on how you can shard the index according to those — + according to the query distribution. You shard so that you can handle such queries. For instance, there are some queries that work well with this filtering — if, I don't know, 20% of the elements pass the symbolic filter, that is fine, you can use it — but maybe there are queries for which only one in a million passes, and those points are in different parts —
+ — of the space, exactly. For them you can see in real time — you search and you see that it doesn't perform well — and you can just build a separate index for them, right? Because you know those sets are small; people want to find them, and maybe out of a billion elements there's only a million of them. + So you cache them — build a cached index for those on the fly. That is a discrete optimization problem, and I think it's a bit outside of the index, because the index is — + yeah, focused on a different part. And I don't think other ANN algorithms can somehow avoid this problem. Yeah, exactly. + What you say — if I understand correctly — contradicts, in a sense, the nature of symbolic filtering; but still people do it, right? For example, Weaviate did it, and Qdrant did it, and Milvus as well. It's funny — in Milvus they use + Faiss and other algorithms, and they say: we only support integer fields, we don't support strings yet, we're working on adding strings. Which essentially means they're designing this graph in such a way that it doesn't support strings yet — maybe because it's not so easy to add, right? + Well, I'm not sure; it also depends on how you measure the performance. If you have rare queries which don't have any results, the algorithm probably doesn't even work on them — but they are so rare that when you measure overall recall you don't see any problems. So you can definitely build a solution, maybe something simple with filtering during search, + but for sure it will fail on some points, and that is suboptimal in terms of latency.
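The "filtering during search" baseline the discussion keeps returning to can be made concrete with a brute-force sketch: only points whose attribute passes a predicate are considered candidates. Graph indexes try to reproduce this behaviour by skipping filtered-out nodes during traversal; the function, names, and data here are hypothetical:

```python
import numpy as np

def knn_filtered(query, vectors, labels, allow, k=10):
    """Brute-force kNN restricted to points whose label passes the
    symbolic filter `allow`. Returns indices of the k nearest survivors."""
    mask = np.fromiter((allow(l) for l in labels), dtype=bool,
                       count=len(labels))
    if not mask.any():
        return np.array([], dtype=int)   # no element passes the filter
    cand = np.flatnonzero(mask)
    d = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(d)[:k]]
```

This also makes the failure mode discussed above visible: when `allow` passes only one in a million points, the candidate set is tiny and scattered, so a graph traversal degenerates toward exactly this kind of brute-force scan.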
So if you're talking about existing solutions — maybe they have a really good solution which I just don't know about; I looked at a few, and it was mostly filtering inside the graph. + Yeah — if you have really rare elements which are distributed evenly across the search space, in different parts, it will struggle, because you basically need to brute-force the whole thing. + Yeah, exactly. To me it sounds like combinatorial explosion: if I add more and more symbolic filters, essentially I'm introducing new subspaces into my space, so I need to push these points somehow close + to each other within that specific symbolic filter — but if I add more of them, now I have a multi-dimensional space of filters, right? + Yeah, you have a really high-dimensional space of filters, but you don't really know the distribution of queries over those filters — it can be very different, because it's a user distribution. + That also makes the problem more complicated. It can still work if the distribution is kind of similar — it will work if you crank up the parameters of the graph and use more connections — but there is a mismatch: at query time your distribution may be very different, and you need to think about it. + So how do you balance those? You have two types of distance, and you want to balance them against the query distribution. + Yeah. So — this field of vector search, does it excite you, this field you contributed to? How do you feel about this field that is emerging right now? + I think it is very important. Right now I'm working mostly on applications — how to take advantage of this — and there are many applications which cannot be done without efficient search. There was a paper from DeepMind quite recently where they used search inside of the network, and well —
+ That makes a lot of sense, and I think there will be more papers — there were papers before that one too — that use ANN inside a big, huge neural network. + Yeah — for example, these learning-to-hash methods, I don't know if you've heard about them. When I tried to put everything into buckets — how many different types of algorithms exist — I didn't know about learning to hash; it seemed to be one of the recent developments. + Are you following that as well? Well, learning to hash — I'm not really following it; learning to hash existed before HNSW. There are such algorithms, and when I talked with people who specialize in product quantization and review those papers, + they told me that learning to hash never reaches the performance of product quantization — at least that's how it was a few years ago. Maybe now it's solved. But when I talk about ANN inside networks, I think about graphs. + Yeah — and one interesting thing can happen with graphs. An additional advantage of graph nearest-neighbor search engines is that you can change the metric. + For instance, if you are doing multi-stage ranking, you have multiple candidate sources: for search you have something like BM25, and you might also have embeddings with similarity search. + Those are separate sources, which are then ranked. But essentially, why do you need ANN in the first place? You need ANN to speed up the ranking: in principle you could rank all the documents using your heavy ranker, + but that's too expensive, so you add ANN — basically, for vector search, you distill everything to vectors and keep the same objective.
And you have a way to specify the interactions. + But you can look at it the other way: you have a graph, and the graph nodes are just the candidates. You start with a simple, light metric; then you have a more complicated metric on this graph; and you have a final ranker that can also be searched on this graph. + That means you don't supply a set of candidates to the ranking; rather, you supply entry points into the graph — a graph which is built to try to capture the similarity for the ranking. + So instead of filtering from one stage to the next, you just switch the metric on the graph. You had a light metric, which is just vectors; + now you have more complicated metrics — you hydrate the features of the elements in the graph, traverse it, and now you have a really complicated metric. You still just have an entry point on the graph, you explore it, and — + well, you can fix some mistakes made by the previous layers. So it's not exact filtering — that's another, maybe unique, feature of graph methods. + That sounds quite exciting. Have you thought about publishing this idea? It sounds quite unique. Well, it doesn't make sense to publish an idea without an implementation. + Yeah, for sure, but maybe you can influence those who would like to experiment with it — at least those who watch this podcast; I think they will listen and probably pick it up. And use graph algorithms, for sure. + Yeah — it sounds like all ANN algorithms have advantages and disadvantages, right? It's not like any of them uniquely outperforms the others.
+ There is a division: if you think about quantization algorithms, they are kind of orthogonal to graph algorithms — they quantize, they compress to save memory and speed up the computation. + But older algorithms just use something like IVF, and that is one-layer filtering; you can use graphs instead of IVF. So you can use graphs and add quantization on top — Faiss did that before, and I think some others did too. + Yeah, and vector databases actually offer it as one method — Milvus, for example, offers IVF, and then you can choose whether you want exact kNN or ANN, so you can configure it in different ways. + Yeah — it sounds like, maybe without realizing it much, you are at the core of what's happening in vector search. Of course there have been multiple other contributions, but for some reason exactly your algorithm has been picked up by many vector databases — there are like seven of them; I actually wrote a blog about six of them, and then a seventh knocked on my door and said: can you also add us? + And when I was going through the different databases — implemented in Java, or Python, or Rust, or Go — all of them picked your algorithm for some reason. + Maybe it was a combination of things: how easy it is to implement, how transparent it is to understand, and its stability. + Yeah, probably — I'm not totally sure. The library was also implemented as header-only — well, not the initial one; that was the second library. There was a problem with the HNSW implementation in nmslib: the nmslib format was a bit restrictive for efficient operation, so it was converted to a flat memory format —
+ And that made construction slower and memory consumption bigger, so hnswlib was implemented as a header-only library — the header-only design was inspired by the success of other such libraries — and I think that also might have contributed, because it's very easy to integrate: + there are a few files, and it compiles in seconds. Yeah, maybe that also helped — keeping the library's surface simple and easy to integrate. Yeah. + And it must feel kind of cool to have this impact. I also hope you will continue doing some publishable work in some fashion — it doesn't need to be a journal article that gets rejected five times, but something. Is that something you are planning? Well, that depends — I cannot talk too much about my work at Twitter, but maybe we will publish something; it depends on how it goes. Is it nearest neighbors? + Yeah, not only, but yeah. It's hard to predict now whether it will work out well. Yeah — at least the idea you mentioned, if it's outside Twitter, for example in the hnswlib library — the idea of this multi-stage ranking sounds quite exciting. + Well, I think it can be implemented only by teams who own the rankers and the whole pipeline, because you need to hydrate the features on the fly, and feature hydration is very specific to the application. + Yeah — mostly inside a production environment. Yeah, that makes sense. So maybe it will call for the creation of datasets and benchmarks, if the industry chooses to move in that direction.
+ There are some obvious problems with data privacy there, so it's hard to publish. Well, you can think of a toy problem — one where you don't work with actual users — maybe image-to-image search with a huge transformer model on top, or maybe something like MS MARCO — maybe it can be done with that. + It can be experimented with, maybe. Yeah, I think we went really deep today, Yury — it was really cool talking to you. + I always like to ask this orthogonal, slightly more philosophical question of "why": in your own words, why would you say this field attracted you? + Well, I didn't have much choice — I just got my first job offer, and it was in this field. + But it's also about scale. People like scaling — many games you play on Android or elsewhere are based on scaling: you do a little action and there are huge consequences, like destroying something. That is scaling. + And this is just the pure scale of how you scale machine learning applications. + Yeah — so on one hand it was kind of predefined, as you said, you found the job; on the other hand you were still curious enough to implement that algorithm. It wasn't like somebody said you had to do it — you could also have chosen to just code nine to five and go home. + But you still decided to implement an algorithm. Yes, well, that was a fun job. So you weren't scared by the challenge itself — was it actually motivating? + There wasn't that much push from the company itself; we could do whatever we wanted inside the company, so it was very relaxed.
+ Yeah, that might actually be a really good environment for inventing things, don't you think? If you come to work and somebody says no, you cannot do what you want, you should do this — it might be too restrictive. + But here there were both the challenges and the freedom to solve them. + Yeah, there are two components: first, you need freedom and the ability to do long-term stuff, without worrying about what you're going to ship to production soon; the second is concentration of talent — a high concentration of talent, so people can share ideas. + If you have this mix, there will be innovations for sure. Yeah, it sounds like you had the combination of the components you mentioned — the talent and the rest. + Yeah — I also saw that at other companies; at Samsung there was already a strong team and there was a lot of innovation; there are a few startups which came from our lab, and there were really good papers. Yeah, that's a recipe for innovation for sure. + Yeah, I'm really happy it turned out so well for you — and for your co-author as well; I think he also continues to work in the industry, at least last time I checked. And I really hope you will get some really cool pull requests on hnswlib that pass your criteria. + Well, most of them pass — it's just that I would love to have more time, and I'll try to allocate more time for looking at them and checking them. Yeah, it's really great. I really enjoyed talking to you, Yury — thanks so much for making the time, especially at Christmas time. + All the best to you in the future, also at Twitter, and I hope to see some published work at some point. I enjoyed reading your paper, and then reading your code — it feels like you've put a lot of effort in there, and —
+ It also influences the industry so much today — maybe you don't realize it every single day, but you should know there are so many databases using your algorithm in production. It's really cool work. + Yeah, it was great that it became a success. Maybe one thing I would note: the idea with the rankers was only partially implemented and still needs work, and there was earlier work on this — maybe you know it — + on using ANN for the final ranker. I felt I needed to cite this; I learned this idea, maybe not unchanged, from him. + Yeah, sounds great. I've also interacted a bit with him, and he's a very knowledgeable guy with very strong opinions as well, so maybe we will also talk with him in one of the episodes. I'm glad you collaborated — it's a fantastic result for the industry, and definitely for your profiles as well. + So thank you so much for your time; I hope you have a relaxing time over Christmas, and happy New Year. Thank you very much for your time, Yury. Thank you. Yeah, bye-bye. [Music] \ No newline at end of file diff --git a/transcripts/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md b/transcripts/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md new file mode 100644 index 0000000..93839c4 --- /dev/null +++ b/transcripts/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md @@ -0,0 +1,186 @@ +--- +description: '

Topics:

00:00 Intro

01:03 Yusuf’s background

03:00 + Multimodal search in tech and humans

08:53 CLIP: discovering hidden semantics

13:02 + Where to start to apply metric learning in practice. AutoEncoder architecture included!

19:00 + Unpacking it further: what is metric learning and the difference with deep metric + learning?

28:50 How Deep Learning allowed us to transition from pixels to + meaning in the images

32:05 Increasing efficiency: vector compression and + quantization aspects

34:25 Yusuf gives a practical use-case with Conversational AI of where metric learning can prove to be useful. And tools!

40:59 A few words on how the podcast is made :) Yusuf’s explanation of how Gmail smart reply feature works internally

51:19 Metric learning helps us learn the best vector representation for the given task

52:16 Metric learning shines in data scarce regimes. Positive impact on the planet

58:30 Yusuf’s motivation to work in the space of vector search, Qdrant, deep learning and metric learning — the question of Why

1:05:02 Announcements from Yusuf

- Join discussions at Discord: https://discord.qdrant.tech

- Yusuf''s Medium: https://medium.com/@yusufsarigoz and LinkedIn: https://www.linkedin.com/in/yusufsarigoz/

- GSOC 2022: TensorFlow Similarity - project led by Yusuf: https://docs.google.com/document/d/1fLDLwIhnwDUz3uUV8RyUZiOlmTN9Uzy5ZuvI8iDDFf8/edit#heading=h.zftd93u5hfnp

- Dmitry''s Twitter: https://twitter.com/DmitryKan

Full Show Notes: https://www.youtube.com/watch?v=AU0O_6-EY6s

' +image_url: https://media.rss.com/vector-podcast/20220507_080542_57009c58f961b6d0713e057b9a5a4832.jpg +pub_date: Sat, 07 May 2022 20:37:42 GMT +title: Yusuf Sarıgöz - AI Research Engineer, Qdrant - Getting to know your data with metric learning +url: https://rss.com/podcasts/vector-podcast/479453 +--- + +Hello, today we have a new episode of the Vector Podcast, and today I'm super happy to have you, Yusuf Sarıgöz, with me. He holds the role of AI Research Engineer at Qdrant. +It's a vector search database company, and you might remember we had an episode with Tom Lackner, who is a user of Qdrant. Today we have an episode and discussion with you, Yusuf, who works for Qdrant. +And one of the core topics today we're going to be discussing is metric learning, but before that, hey Yusuf, how are you doing? I'm very excited to join you in this episode to discuss metric learning, and thank you for having me. Yeah, thanks for coming on. +Really, I think this topic is something that has been crossing my area of focus, and also some of the questions that users are asking, you know: okay, if I have this data set, how can I be sure that it will work with neural search, right? And I think metric learning seems to be one of the answers. +But before we start discussing this in depth, I was thinking, could you please introduce yourself to our audience? Yes, sure. As you just said, I'm a software developer and AI researcher with a background in linguistics at the university. +Actually, I've been developing software since my high school years. During my master's study, I combined my experience and my education to study machine translation. +After several years of experience in different roles and at different startups, I ended up with multimodal retrieval, because I had long experience in both computer vision and natural language processing. So for some time now, my main focus has been metric learning.
+I was already a user of Qdrant, even before joining Qdrant, and I thought it would be very cool to work for an open source project that I find valuable myself. Yeah, sounds awesome. Sounds cool. You just mentioned multimodal. +So you mean like multimodal search, right? And I think this field is still kind of in many ways shaping up and many people are still learning and kind of scratching their heads like, what is multimodal? Maybe if you could give an example or explain a little bit what multimodal is. Yes, sure. +Actually, as you just said, multimodal is quite a new topic. Actually, it's resurrecting with developments in deep metric learning. One of the most famous applications is CLIP by OpenAI, short for Contrastive Language-Image Pre-training. +In the most basic terms, they train a model to construct a unified vector space for both images and texts. Basically, they have two encoders, one for images and one for texts. Suppose that you have a pair of an image and its textual description. +When you feed this image and that textual description to these encoders, you are supposed to get very similar vector outputs from these encoders. So you can search images with a textual query, or vice versa. +So in a way, text is one modality and image is another modality, but in this case, we kind of go across modalities. I think we can cross the border of modalities with this. +Yeah, which I think to many users will sound like magic, because essentially, if you view an image as a set of pixels and a textual query as a set of words, now you somehow magically search for your words in pixels, but actually that's not exactly what's happening. +Of course, we do the embedding and so on, but in a nutshell, it kind of sounds like this magical cross-modal search.
+ Yes, I expect for newcomers it is a little bit like magic, but for quite a long time we have already been using vector search in the context of image search. In that case, we search for images with a query which is an image itself, but in this case, we make a connection between two modalities. +This is also how our human brain functions. +Most of the time, we don't consume information from a single modality. When we try to understand our environment, we take in both visual input and audio input, and we also talk to people around us, and it gives us a better understanding of the environment. +So if we want to make our AI smarter, we also need to help it gain this ability as well. So beyond searching for images with a textual query, this also helps us to combine information from different sources. +So in this case, maybe we can also have AI better understand its environment by combining, for example, a stream from the camera and also maybe an output from speech recognition: encoding them into vectors, we can combine these two vectors to feed into another encoder. +So this also opens up new opportunities. Yeah, that's a great intro there, also how you gave the analogy with how the human brain functions, like how we take so many signals into our decision making. +And specifically, what you mentioned about CLIP: I like the fact that in practical settings, let's say you have images of some goods and you want to make a search over those goods, and you also have some metadata, let's say titles or descriptions, right? +It may be that some human decided what to put in that text, but they didn't put everything that there is in the image, right? And so I think CLIP helps us find the semantics that's hidden inside the image itself, right? So I think that has practical impact on what we build. +Yeah, exactly.
+ Actually, in traditional search, let's take product search as an example: when you want to develop product search for, for example, an e-commerce website, you need to enter different terms that can define that product to have users find that product with different wording. But this is not so practical, because people use very different terms to refer to things. +And in the current capacity of e-commerce websites, we have hundreds of thousands of products, and they also need to be updated once you add new products and remove old ones. +And when you also add typos to this complexity, it actually explodes to millions, maybe tens of millions of possibilities. This is beyond the power of humans, actually. +But once you make a connection between text and images, you don't need to enter such descriptive text: you only encode images into vectors at index time and store them in a vector database. +Then at inference time, all you need is to encode the textual input as well and query that pre-indexed database to get similar results. +Actually, this also brings new opportunities. For example, people usually enter some pre-defined textual descriptors in these search engines, but some new products may have brand new features that people are not accustomed to. +So even in this case, our vector-search-based solution that combines images and text can find that image as well. Yeah, that sounds cool. So it kind of opens up a lot of opportunities that didn't exist before, when we modeled our object purely through textual representation. +Maybe somebody did attempt to also encode images in some other binary format, but I think maybe it wasn't as efficient, and definitely not multimodal. So that sounds so cool. +And so how do you connect this? Where do you start? Usually, let's say you have a data set, right? And you want to implement a neural search experience.
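The retrieval flow described above can be sketched in a few lines. This is an illustrative sketch only: the vectors below are made-up stand-ins for the outputs of a real CLIP-style pair of encoders, and the item descriptions are hypothetical.

```python
import numpy as np

def cosine_sim(query, matrix):
    # Cosine similarity between one query vector and each row of a matrix.
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

# Pretend image embeddings, as an image encoder would produce at index time.
image_vectors = np.array([
    [0.9, 0.1, 0.0],   # photo of a red sneaker
    [0.1, 0.8, 0.2],   # photo of a leather handbag
    [0.0, 0.2, 0.9],   # photo of a wristwatch
])

# Pretend text embedding for the query "red running shoes", produced by the
# matching text encoder into the same vector space.
query_vector = np.array([0.85, 0.15, 0.05])

scores = cosine_sim(query_vector, image_vectors)
best = int(np.argmax(scores))  # index of the most similar image
```

The point is that no textual description of the sneaker was ever indexed; the match comes purely from the shared vector space.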
+At what point in time do you start thinking about what metric is best for my data set? And from which angle do you usually approach this? It would really help to hear your theoretical as well as practical thoughts on this. +Yes. Actually, there are lots of very different techniques and methods and approaches to metric learning that can work for some specific types of problems. +But in my practical experience, I usually begin with an autoencoder, because it's already very easy to implement and easy to train, and it can be applied to almost any data type. Basically, in autoencoders, we have two models, an encoder and a decoder. +The encoder part encodes samples into an n-dimensional vector. This n should be much lower than the dimensionality of the input sample. And the decoder is supposed to reconstruct the input sample when this encoded vector is given to it. So this is a self-supervised method. +So it can be applied to any type of data set. You don't need labels. It usually gives very good results. After training such a model, you can visualize the embeddings. We call the output vectors of the encoder embeddings. So you can visualize such embeddings with a tool. +This tool can be, for example, TensorFlow Projector, and another tool by Yubach, I just can't recall the word there. Sorry. No worries. We can find those links later, I guess. Yeah, we can put a link in the description. +And these visualization tools help us see if our encoder really embeds similar samples closer to each other than dissimilar ones. If it does, we can use the encoder part. +We can just dispose of the decoder part, simply keep the encoder part, and use it to encode our samples and index them in the vector database. And we can already start searching semantically. But we can usually do better than this with only a small set of labeled data. +And you actually need only a few labels for that.
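As a rough illustration of the autoencoder idea described above (a toy sketch, not production code), here is a tiny linear autoencoder in NumPy: the encoder maps 10-dimensional samples to 2-dimensional embeddings, the decoder tries to reconstruct the input, and after training the decoder is thrown away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 10-D that really live on a 2-D subspace plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# Encoder W_e (10 -> 2) and decoder W_d (2 -> 10), trained with plain
# gradient descent on the mean squared reconstruction error.
W_e = rng.normal(scale=0.1, size=(10, 2))
W_d = rng.normal(scale=0.1, size=(2, 10))
lr = 0.02

def loss(X, W_e, W_d):
    return float(np.mean((X @ W_e @ W_d - X) ** 2))

initial = loss(X, W_e, W_d)
for _ in range(2000):
    Z = X @ W_e                      # embeddings: the part we keep for search
    G = 2 * (Z @ W_d - X) / X.size   # gradient of the MSE w.r.t. reconstruction
    grad_W_d = Z.T @ G
    grad_W_e = X.T @ (G @ W_d.T)
    W_d -= lr * grad_W_d
    W_e -= lr * grad_W_e
final = loss(X, W_e, W_d)

embeddings = X @ W_e  # after training, the decoder can be discarded
```

Real autoencoders use non-linear deep networks, but the workflow is the same: train on reconstruction, keep only the encoder, index the embeddings.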
Actually, we are preparing some publications to demonstrate this. After you train an encoder with a considerable amount of unlabeled data, all you need to do is just fine-tune it with a small set of labeled data. +On the supervised side, there are really quite a number of very different approaches to metric learning, from more traditional margin-based approaches to newer classification-based approaches. And actually, they deserve a long discussion of their own. For sure. Yeah, that's awesome. +But just to unpack it a little bit: so in a nutshell, metric learning allows me to learn the optimal distance metric for my data. So it's kind of like a function of my dataset's inner properties. Yeah, actually, let's clarify this metric thing. +What does it mean in this context? In this context, a metric is a non-negative function with two inputs, let's say x and y, and it is used to measure what is called the distance between x and y. When we feed it such two inputs, it gives us a scalar positive value. +If this value is closer to zero, then we can assume that those two inputs are more similar to each other than two inputs with a higher distance value. So our whole objective in metric learning is to train functions that can give this distance value. +On the practical side, we usually train a model that outputs an n-dimensional vector, and then we can apply different distance functions such as Euclidean and cosine distance to get a measurement of the distance value. There is also the term deep metric learning. +Actually, traditional metric learning uses some linear transformations to project samples into an n-dimensional feature space to apply a metric function. +But this linear aspect of such transformations limits the use of traditional metric learning with richer data types, for example, images and texts.
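The two distance functions mentioned here are easy to write down; a minimal version:

```python
import numpy as np

def euclidean_distance(x, y):
    # Straight-line distance; zero only when x and y are identical.
    return float(np.linalg.norm(x - y))

def cosine_distance(x, y):
    # 1 - cosine similarity: depends on direction only, not magnitude.
    sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(1.0 - sim)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

d_same = euclidean_distance(a, a)   # identical inputs give distance 0
d_orth = cosine_distance(a, b)      # orthogonal directions give distance 1
```

Both satisfy the property Yusuf describes: non-negative, near zero for similar inputs, larger for dissimilar ones.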
+So deep metric learning benefits from the methods of deep learning to learn non-linear transformations to project samples into a new n-dimensional vector space. +But in this context, I usually use metric learning as an umbrella term to refer to both traditional metric learning and deep metric learning, just like we do with machine learning to refer to both classical machine learning and deep learning. Yeah, that makes sense. Thank you. +And so essentially, in layman's terms, deep learning allows us to vectorize data objects that previously we couldn't vectorize as easily, so images and so on. And do it efficiently, because in images, you might have way too many pixels. +So if you just take the vector of all the pixels, it's way too big of an object to deal with. And so you vectorize, as you said in the beginning, and you basically project it into a lower dimensional space. So now you can actually operate on it efficiently. Exactly. +Let's take images as an example. Let's assume that we have images with a size of 200 times 200, and we also have a channel value of three. So we end up with 200 times 200 times three values for a single image. Too many values also mean great variance. +So it's not so practical to make a measurement between two images, because those pixel values can include quite shallow surface features that do not make any sense in our semantics. +But once we encode those high dimensional inputs into a low dimensional vector space, for example, we usually have 512 or 1024 dimensional vectors, this value is really low when compared to the original dimension of that sample. +So in this case, that model should learn a representation of high dimensional samples. Actually, we just throw away the unnecessary part of those samples, and we only keep the part that matters for us. Yeah, yeah.
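The arithmetic in this example is easy to check (the 1024-dimensional embedding size is just one common choice):

```python
# A 200 x 200 RGB image versus a typical embedding size.
height, width, channels = 200, 200, 3
raw_values = height * width * channels   # values per raw image: 120000
embedding_dims = 1024                    # a common embedding size
compression = raw_values / embedding_dims  # over 100x fewer values
```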
+So kind of in some sense, you could say it's like signal compression, right? +So in some sense, using the signal's distribution, you could actually compress things. Like, theoretically speaking, in an image you have one object, and the rest is just a background of one color. +You really don't need to pass all these pixels independently. You could just say, okay, it's a background, I've learned that it's that color, kind of semantically, I guess, and then what matters is the object somewhere there that we focus on when we look at this picture, right? Yeah, exactly. +Actually, in the original distribution case, for example with images, we don't have any connection between the value of a pixel and the semantic counterpart of that pixel. But once we transform it into a vector space, at least theoretically, we can draw conclusions. +For example, we have a 1024 dimensional vector as a representation of that image. In this case, if we examine this vector space, we can draw conclusions like: this value at index zero encodes this feature of the image. +For example, it can encode the size of a specific object, or the color value of a specific object, or maybe some more abstract features of objects. This enables us to search more efficiently; otherwise, our values are distributed over a very wide range. +And we don't have such interpretations in that distribution space. Yeah, that makes sense. It's highly variant and also, in some sense, a waste of space, because we are not communicating that much more information by encoding all these pixels. +But we could actually extract some features and patterns in the image. +I think some early work on this was done using, if I remember, it was called a Gabor filter, or some other ways of kind of smoothing your image and trying to learn what features you have, for instance, if you try to differentiate between spruce and willow trees.
+So like for the purposes of keeping one tree and then maybe removing the others. But I think it wasn't as efficient perhaps as compared to deep learning, because deep learning, as far as I understand, basically learns without hand-crafted features in many ways. +It learns from the data, and then you should have some target function that you're optimizing for, so it can update the weights inside it. Exactly. +Actually, the most differentiating feature of deep learning is that it is used to learn the parameters of complex functions instead of manually tuning them. Before deep learning, we already had most of the filters we currently have. +But the parameters of such filters were supposed to be manually tuned by experts in that domain. In deep learning, we learn those parameters directly from data. And as you said, actually, the beginning of metric learning is also in dimensionality reduction. +We have the most popular contrastive loss, for example. + The first introduction of contrastive loss was in 2005, and the original purpose of that function was actually to reduce the dimensionality of high dimensional inputs, rather than vector search or anything else: they just tried to reduce the dimensionality of high dimensional inputs to use lower dimensional features in other models. +Yeah, that sounds exciting. Actually, before you brought this up, I didn't think that way, because I was experimenting in my team also with things like product quantization. So you already have the vectors computed by the neural network, but you could actually quantize them even further. +So you save space, and of course you introduce some overlaps that might decrease your precision slightly, but you're gonna save a ton of space and make your search more efficient. +So it's almost like you could think of dimensionality reduction on so many different levels and in so many ways as you reason about your data, right? Yeah, exactly.
+Actually, metric learning is itself a type of dimensionality reduction, but even after you apply metric learning and vector encoding to your data, you still have a high dimensional vector. You have, for example, 1024 dimensional data times 32 bits for a single float. +So it's already a huge amount of data when you have, for example, millions of samples. So you can still apply some quantization methods to get even smaller representations from that. +And this can also be hierarchical, meaning that you can get several representations of the same sample at different levels of information encoded in that feature space. Yeah, that's fantastic. + So I was also thinking, could you give some practical example or setting where I could start thinking about deploying metric learning? And also, could you point us in the direction of what tools are available? I don't think we need to reinvent everything from scratch; maybe there are some best practices available, you know, to structure this process. +Can you give some advice on that? Yeah, sure. For a starter example, actually, metric learning is best known for its use in face recognition, but personally, I don't support the use of machine learning to process biometric information. +So I'll give an example from our everyday life. Actually, we use it almost every day: Smart Reply, the feature found in, for example, Gmail, LinkedIn, and other messaging apps. It is trained from a large collection of conversation histories on these platforms. +Basically, just like the image-and-text unified vector space example we discussed in the beginning, they construct a unified vector space for conversation histories and single sentences. +For any moment of a conversation, you encode the history of that conversation to retrieve the most relevant replies to that history. And you can show them as suggestions to the users, and users can pick one of them.
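To make the storage math concrete, here is a simple int8 scalar quantization sketch. This is deliberately much simpler than the product quantization mentioned earlier, but it shows the float32-versus-int8 trade-off on the numbers from the conversation:

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(1000, 1024)).astype(np.float32)

# Symmetric scalar quantization: scale float32 values into [-127, 127]
# and round to int8, keeping one shared scale factor for reconstruction.
scale = float(np.abs(vectors).max()) / 127.0
quantized = np.round(vectors / scale).astype(np.int8)

float_bytes = vectors.nbytes      # 1000 * 1024 * 4 bytes
int8_bytes = quantized.nbytes     # 1000 * 1024 * 1 bytes, a 4x saving

# Reconstruction error is bounded by the quantization step.
reconstructed = quantized.astype(np.float32) * scale
max_err = float(np.abs(vectors - reconstructed).max())
```

The precision loss (`max_err`) is the "slight decrease" mentioned in the conversation, traded for a quarter of the memory.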
+And what is exciting with this setup: you can also log the chosen reply, and you can continue improving your model from direct feedback from your actual users. So it's a really practical use case of metric learning. +And for practitioners who want to start experimenting with metric learning, actually, there are lots of tools to solve very different problems in metric learning. +So in the context of deep learning model development itself, we have several libraries, such as PyTorch Metric Learning and TensorFlow Similarity. +There are other libraries as well, but I think these are the most mature and, how should I say, most versatile libraries to tackle different data tasks. +On the other hand, for visualization, we have TensorFlow Projector, a browser-based tool where you can examine your embeddings easily. +There are also vector search databases, which are increasing in number, but of course, I am a fan of Qdrant, because it's really doing a great job with extensive filtering support for a variety of data tasks. And it's doing this very efficiently and elegantly, in only 40 megabytes. +So it opens up something very important: to put your metric learning model into production and to combine vector search with filtered search as well. So you can filter your data based on their payload information at the same time as vector search. +Other than that, beside my research and engineering practice, I'm also maintaining a repository called Awesome Metric Learning, and I'm regularly sharing new developments in the domain of metric learning with personal annotations. +So I think it might also be quite helpful for those who want to find their way in this domain. That's awesome. Thank you.
+I will certainly make sure to add all of these links in the notes to this podcast, and usually all of these podcasts that I do have a lot of links that you can almost use as educational material. And thanks so much for adding so much information here. +And I actually wanted to drill a little bit again into that brilliant example you gave about predicting sort of what's next when I type. +Actually, I use this feature quite a lot, especially when you're on the go, and today I think I've used it somewhere in Gmail; I was on the go and I had only one finger, right? +So just holding my phone as I go, and there was a question, and the answer was something like: yes, it happened, or yes, it did. +And maybe it wasn't the best semantic choice, or maybe not the most elegant choice linguistically; maybe I would add more color. But because I was on the go, it was fine to save those few minutes and not be distracted by the phone. +So I just pressed that button and off it goes. And so that's a fantastic feature. So I wanted to open up the process of metric learning in this case a little bit. Basically, I imagine, and please correct me if I'm wrong: +As an input, I would have, let's say, a pair of sentences: what was the input and what was the prediction, and that prediction could be either curated by experts or mined from the logs, whatever. +So let's say we have a corpus like this, right? So we can employ a sequence to sequence model or some other model to actually train our first predictor. +So at which point would you start thinking, and how exactly would you start thinking, about metric learning? Like, how can I change the behavior of my model? Will I replace the last layer of my neural network with a different layer that I have learned from metric learning? +Can you open up this kitchen for me a bit? Thanks.
+Actually, this Smart Reply feature has its own paper by Google as well, and they really do a great job describing the whole logic and the design decisions behind this feature. + As you already said, the suggested replies are not the best, most specific replies that you can imagine, but this is actually by design, because they do not generate those replies: they have a large collection of such replies, and they should be as flexible as possible to fit into different circumstances. +So they shouldn't have any specific references to a specific sentence in the conversation. They should be generic enough to apply to almost any conversation. +For the training side, yeah, actually, they collect a large collection from the different platforms they are running, Gmail and other platforms, and they filter for short replies and thematically broad samples, such as the example you gave: yes, I did, or no, I didn't. +And the actual training algorithm works like this. They actually came up with a very creative, very clever loss function to train this model. They have only pairs of two samples, and there is no other label or information. +We only have one input and one ground truth; we have no other scoring, no other label or anything else. So we only get a batch of, for example, n pairs, and we encode those 2n samples, because we have two samples per pair. +Once we encode them with our encoder, we can compute a distance matrix between all these outputs of the encoder. A distance matrix is a two-dimensional matrix that defines the distance value between all possible pairs in a collection. +So we have a matrix of n times n, and we already have these samples as pairs. We already know that there is a companion target sample for each sample: for the first sample at index zero, the companion sample should also be at index zero; for the sample at index one,
+the companion sample should be at index one. So we can generate these target labels just based on this information. So it's like a categorical classification now. +So for the first sample in the pair at index zero, the categorical label should be zero, and all other index values are wrong answers. +So we can just encode this information as a one-hot encoding, and we can simply use a cross-entropy loss to train this model. +It is called multiple negatives ranking loss, because in some way we rank all possible replies in a batch with multiple negatives and only one positive sample. Yeah. +And so you would train this network with this loss function, and so the output will be what? Will it be the optimal metric? Yeah. +Once we train this model, we end up with a model that can encode a sentence in such a way that that vector can retrieve the most relevant vectors from a collection of possible replies. So after we train this model, we encode all possible replies and index them in a vector database. +And at inference time, we encode the user's input with this model and make a vector search query to that pre-indexed database of possible replies, and we can get, for example, the k nearest neighbors to that vector to suggest to the user. Yeah. +I mean, after you explain this, to me the mental image it evokes is that rather than learning the metric, we're actually learning the vectors themselves. +We're learning the best vector representation for our object to satisfy some goal, right? Let's say that for this sentence, the closest reply should be this one, in some sense. Yeah, exactly. Actually, the model learns a representation that satisfies our purpose. +So in some way, we can freely pick any distance metric based on this intuition. Yeah.
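The in-batch negatives trick just described can be sketched directly. This is an illustrative NumPy version of the loss computation only; real implementations plug it into a learned encoder with autograd, and the toy embeddings below are made up:

```python
import numpy as np

def multiple_negatives_ranking_loss(query_emb, reply_emb):
    # Entry (i, j) scores query i against reply j; the "correct" reply for
    # query i sits on the diagonal (j == i), and every other reply in the
    # batch acts as a negative.
    sims = query_emb @ reply_emb.T
    # Softmax cross-entropy with the diagonal as the one-hot target.
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    n = query_emb.shape[0]
    return float(-log_probs[np.arange(n), np.arange(n)].mean())

# Toy batch of 3 (history, reply) pairs, already encoded.
queries = np.eye(3)
good_replies = np.eye(3)                      # each reply matches its query
shuffled = np.roll(good_replies, 1, axis=0)   # every reply is mismatched

low = multiple_negatives_ranking_loss(queries, good_replies)
high = multiple_negatives_ranking_loss(queries, shuffled)
```

Matched pairs score a lower loss than mismatched ones, which is exactly the signal that drives the encoder toward useful embeddings.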
So the second part of your question: when can we think about metric learning? +Actually, metric learning can be applied to almost any domain of problems, but there are some particular cases where metric learning really shines over other alternatives. +These are data scarce regimes especially: if you are short of labeled data, you can still do a pretty good job with, for example, an autoencoder, as we already discussed previously. And also, if you have rapidly changing distributions, it's again very helpful. +And if you have, for example, a very high number of classes, again, metric learning can do a good job. Finally, metric learning is one of the best ways to actually increase the performance of machine learning models, even after training. +In normal deep learning training, there is no way to increase the performance of a model after training is complete. But in metric learning, this is quite possible. +For example, instead of just training a classification model to produce a probability distribution over a set of classes, we can train a metric learning model and encode samples with that model to store somewhere. +And during inference, we can query that store to get the most similar k nearest neighbors and decide on the predicted category based on the majority of those k nearest neighbors. This is called k-NN classification, in fact. +And on the practical side, you can continue to add new samples to that store without any need to retrain the model. And once you add new samples to that store, your model's performance will also increase. +And also, there is another use case, for example, a more recent approach by DeepMind. Up until now, the only way to make AI smarter was usually to train a bigger and bigger language model. But in the most recent study by DeepMind, they augment language models with retrieval capability.
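A minimal sketch of the k-NN classification idea described above: embeddings go into a store, prediction is a majority vote among the nearest neighbors, and appending rows to the store improves the classifier without retraining any model. The embeddings and labels here are toy values.

```python
import numpy as np

def knn_predict(store_vectors, store_labels, query, k=3):
    # Majority vote among the k nearest stored embeddings (Euclidean).
    dists = np.linalg.norm(store_vectors - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = store_labels[nearest]
    values, counts = np.unique(votes, return_counts=True)
    return int(values[np.argmax(counts)])

# Toy embedding store: two well-separated classes.
store = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
labels = np.array([0, 0, 0, 1, 1, 1])

pred = knn_predict(store, labels, np.array([0.05, 0.05]))
# New (embedding, label) rows can be appended to store/labels at any time.
```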
+This means actually they encode and store a large collection of corpus text in a vector database. And during inference, they query this database to get the most relevant sentences, the most relevant text to the user input. And they combine them to feed to the model. +And with this technique, they can achieve the same performance as GPT-3 with 25x fewer parameters. So it's a very efficient way of doing AI. So I'm also quite happy to see the direction of AI towards a more efficient one with metric learning as well. Yeah, yeah, it's fantastic. +And I think it has a good impact on the planet, because I don't think we want to spend too much electricity powering and training neural networks. Yeah, exactly. +And it also enables democratization of deep learning, because not everyone has the same resources as these large companies like Google, Facebook, and OpenAI. So I think it's also important for that reason as well. Yeah, that's fantastic. I mean, you gave quite a lot of detail on metric learning. +Of course, there is a ton to learn. I've even seen a book cited on one of the metric learning pages that I found through your Awesome Metric Learning resource. And now that we touched a bit on where AI is going and how to make it more efficient, +I'd also like to ask the question of why, sort of this magical question which drills into your motivation: why are you in this space at all, let's say deep learning and Qdrant, vector search, and also specifically metric learning? +Can you elaborate a bit on the philosophy that drives you here? Yes, sure. Actually, what motivates me to work with metric learning is its potential to approach many different problems very efficiently. +Before metric learning, actually, you needed to train very different models to solve very different problems. But with metric learning, you can train a single model and use the very same model to solve very different problems.
+And this is also another facet that makes metric learning efficient. Actually, metric learning has a great potential, but you also need a great tool to put it into production. For example, up to now there was no way to combine vector search with payload information. +Even if you made a connection, it was not so practical, because you lose some information: you could not filter on the two systems of information at the same time. Qdrant is doing a great job by combining vector search with filterable payload information. +So it opens up quite a few new opportunities. With that, you can filter your information based on, for example, geographic place, or another sparse category, a numeric value or anything else, while at the same time doing a vector search. So I think it's really exciting. +One of the most common problems in AI: you actually do the research, but you don't have the required tooling to make it practical in the real world. +So I think it's quite important to have such tools as Qdrant to achieve very different, very difficult and challenging problems very elegantly and efficiently. Yeah, absolutely. That's quite deep. Thank you so much for sharing this. +It also resonates with me, because in many ways, you know, deep learning, on one hand, maybe some people feel is kind of overhyped, and there is so much material on the web. +On the other hand, when you start doing it yourself, you might end up, you know, going down the rabbit hole, and you don't know all the tools, as you said. You don't know all the best practices. +And also, before we had vector databases, you couldn't actually apply this. Okay, you could of course build some nice demo and, you know, throw up a web page and just ask somebody: okay, type something here and my neural network will do something.
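The combination of payload filtering and vector search discussed here can be illustrated in pure NumPy. A real engine like Qdrant does this at scale with indexes; this toy version, with made-up vectors and payloads, only shows the semantics of filtering and ranking in one query:

```python
import numpy as np

# Toy index: each vector has a payload (metadata) attached.
vectors = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
payloads = [{"city": "Berlin"}, {"city": "London"}, {"city": "Berlin"}]

def filtered_search(query, city, top=1):
    # Keep only points whose payload matches, then rank by dot product.
    allowed = [i for i, p in enumerate(payloads) if p["city"] == city]
    scores = vectors[allowed] @ query
    order = np.argsort(scores)[::-1][:top]
    return [allowed[i] for i in order]

result = filtered_search(np.array([1.0, 0.0]), city="Berlin")
```

The London point is never a candidate, no matter how similar its vector is, which is the "two systems of information at the same time" idea from the conversation.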
+But now, like, you can kind of scale this further and index your embeddings and see the end result of what you're doing through the retrieval process. So I think that opens up a lot of opportunities. So that's super cool. Yeah, exactly. +Actually, once we have such tooling, the domain also improves more rapidly, and the improvements in the domain also foster the development of such tools. +So I think it goes both ways, and metric learning will be in a better place in the future with these rapid developments in the domain. Yeah, absolutely. +And I was thinking, there's like a ton of material; I'm sure we'll have to digest, at least I will have to digest a lot of it and see how I can apply this. And thankfully, you have, you know, this awesome metric learning resource on GitHub that we can check out. +We'll make sure to leave it in the notes. And if some of us want to kind of work with you or interact with you, can you make like a little announcement of where we can join forces and kind of learn more about metric learning and maybe contribute to this field together with you? Yes, you're right. +I have several announcements, maybe. First, beyond my resource and engineering side, I'm also a community guide, and we have a Discord server at Qdrant where we hold a paper reading club. We had the first one about contrastive loss and we will also have another session about triplet loss. +And I also wrote an intuitive post about triplet loss. Our approach will be like: I will write such intuitive posts about papers and then we will hold Q&A sessions in our Discord server. +So everyone who is curious about metric learning can join the Discord server to enjoy this discussion. Apart from that, beside my professional life, I'm recognized as a Google Developer Expert in machine learning, on the volunteering, community side.
+And this year at Google Summer of Code, I will serve as a mentor for a Python package for metric learning in the TensorFlow ecosystem. +So university students and fresh graduates can apply to Google Summer of Code if they want to work with me in this effort and contribute to the field. That's fantastic. + I think Google Summer of Code is an exciting place to be, and there are so many projects, but it's great to learn that you are leading the metric learning exploration there, and I'm sure there will be interest towards it, and I will make sure to also leave the relevant link in the show notes on this. +Yeah, thanks so much. Yusuf, this was a pleasure to discuss with you. I feel like I dipped some of my fingers in the water of metric learning. I think there is still a ton to learn, and thanks so much for introducing it from so many angles. We've enjoyed this conversation. +Thank you Dmitry again for this great opportunity. I hope the audience also enjoyed it and I hope it will be helpful for those who are interested in metric learning. Yeah, for sure. Thank you so much. +I learned a ton and I hope I'll also see you maybe doing some presentations or reading your blogs to learn more about it. Thanks so much. Thank you so much. Yeah, bye bye.
\ No newline at end of file From 83f179b8a9cf26e69110382a562a119d1d27b318 Mon Sep 17 00:00:00 2001 From: Dima Kan Date: Fri, 21 Nov 2025 11:36:31 +0200 Subject: [PATCH 2/2] added timecode support --- STREAMLIT_UPDATE_PROMPT.md | 139 + rewrite_rules.json | 1 + src/apply_rewrite_rules.py | 26 +- src/os_index.py | 68 +- src/os_ingest.py | 163 +- src/transcribe.py | 127 +- ...mizer-with-daniel-wrigley-and-eric-pugh.md | 2760 +++++++ ...rch-product-on-neural-search-principles.md | 4359 +++++++++++ ...tionizing-e-commerce-with-vector-search.md | 5153 +++++++++++++ ...-2024-alessandro-benedetti-llms-in-solr.md | 2019 +++++ ...s-2024-doug-turnbull-learning-in-public.md | 1675 ++++ ...zzwords-2024-sonam-pankaj-embedanything.md | 1143 +++ ...mi-on-the-weaviate-vector-search-engine.md | 4459 +++++++++++ ...mpathy-and-artifacts-with-john-berryman.md | 3520 +++++++++ ...tic-university-founder-at-henry-ai-labs.md | 4312 +++++++++++ ...t-weaviate-chatgpt-llms-form-vs-meaning.md | 4895 ++++++++++++ ...-ml-for-query-and-content-understanding.md | 3273 ++++++++ ...vector-search-and-llms-with-leo-boytsov.md | 2867 +++++++ ...rch-as-a-constant-experimentation-cycle.md | 4317 +++++++++++ ...ds-with-simon-eskildsen-ceo-turbopuffer.md | 4199 ++++++++++ ...gh-measuring-search-quality-with-quepid.md | 2895 +++++++ ...oka-data-at-the-core-of-all-the-cool-ml.md | 2887 +++++++ ...ector-database-and-working-with-clients.md | 5860 ++++++++++++++ ...ch-consultant-engineering-better-search.md | 3903 ++++++++++ ...pinecone-vector-podcast-with-dmitry-kan.md | 2170 ++++++ ...of-vespa-from-sparse-into-neural-search.md | 4169 ++++++++++ ...an-fontanals-principal-engineer-jina-ai.md | 2922 +++++++ ...andy-sql-meets-vector-search-at-rockset.md | 3062 ++++++++ ...the-academia-industry-gap-with-haystack.md | 3051 ++++++++ ...le-in-embedding-computation-with-mighty.md | 6142 +++++++++++++++ .../saurabh-rai-growing-resume-matcher.md | 1396 ++++ ...f-swirl-search-in-siloed-data-with-llms.md | 5254 
+++++++++++++ ...-ii-bring-ai-to-company-data-with-swirl.md | 1979 +++++ ...t-challenges-and-joys-of-ml-engineering.md | 3385 ++++++++ .../trey-grainger-wormhole-vectors.md | 6801 +++++++++++++++++ ...-the-rise-fall-and-future-by-notebooklm.md | 2211 ++++++ ...hium-hardware-accelerated-vector-search.md | 2490 ++++++ ...-of-the-most-adopted-ann-algorithm-hnsw.md | 3878 ++++++++++ ...-to-know-your-data-with-metric-learning.md | 2188 ++++++ 39 files changed, 116084 insertions(+), 34 deletions(-) create mode 100644 STREAMLIT_UPDATE_PROMPT.md create mode 100644 transcripts_with_timestamps/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md create mode 100644 transcripts_with_timestamps/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md create mode 100644 transcripts_with_timestamps/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md create mode 100644 transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md create mode 100644 transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md create mode 100644 transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md create mode 100644 transcripts_with_timestamps/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md create mode 100644 transcripts_with_timestamps/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md create mode 100644 transcripts_with_timestamps/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md create mode 100644 transcripts_with_timestamps/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md create mode 100644 
transcripts_with_timestamps/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md create mode 100644 transcripts_with_timestamps/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md create mode 100644 transcripts_with_timestamps/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md create mode 100644 transcripts_with_timestamps/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md create mode 100644 transcripts_with_timestamps/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md create mode 100644 transcripts_with_timestamps/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md create mode 100644 transcripts_with_timestamps/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md create mode 100644 transcripts_with_timestamps/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md create mode 100644 transcripts_with_timestamps/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md create mode 100644 transcripts_with_timestamps/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md create mode 100644 transcripts_with_timestamps/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md create mode 100644 transcripts_with_timestamps/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md create mode 100644 transcripts_with_timestamps/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md create mode 100644 transcripts_with_timestamps/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md create mode 100644 
transcripts_with_timestamps/vector-podcast/saurabh-rai-growing-resume-matcher.md create mode 100644 transcripts_with_timestamps/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md create mode 100644 transcripts_with_timestamps/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md create mode 100644 transcripts_with_timestamps/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md create mode 100644 transcripts_with_timestamps/vector-podcast/trey-grainger-wormhole-vectors.md create mode 100644 transcripts_with_timestamps/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md create mode 100644 transcripts_with_timestamps/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md create mode 100644 transcripts_with_timestamps/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md create mode 100644 transcripts_with_timestamps/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md diff --git a/STREAMLIT_UPDATE_PROMPT.md b/STREAMLIT_UPDATE_PROMPT.md new file mode 100644 index 0000000..34ba95e --- /dev/null +++ b/STREAMLIT_UPDATE_PROMPT.md @@ -0,0 +1,139 @@ +# Streamlit App Update: Support Timestamp-Enabled OpenSearch Index + +## Context + +A new timestamp-enabled OpenSearch index has been created alongside the existing index. This new index includes timestamp information for each chunk, allowing the RAG system to generate timestamped YouTube URLs that open episodes at the exact moment where relevant snippets occur. 
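
For orientation, a minimal self-contained sketch of the transformation this update enables, using the example values that appear later in this document (video ID `BVM6TUSfn3E`, chunk start at 1296 seconds); the variable names are illustrative only:

```python
# Sketch: turn a chunk's stored video ID and start offset into a
# timestamped YouTube deep link, plus a human-readable position.
video_id = "BVM6TUSfn3E"   # example value from this document
timestamp = 1296           # chunk start time in seconds

# youtu.be accepts the offset via the t= query parameter
url = f"https://youtu.be/{video_id}?t={timestamp}"

# 1296 seconds corresponds to 21 minutes 36 seconds into the episode
minutes, seconds = divmod(timestamp, 60)
display = f"{minutes}:{seconds:02d}"

print(url)      # https://youtu.be/BVM6TUSfn3E?t=1296
print(display)  # 21:36
```

The sections below spell out the same idea as concrete changes to the Streamlit app.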
+ +## Index Structure + +The timestamp-enabled index has all the same fields as the standard index, plus these additional fields: + +- **`youtube_url`** (keyword): Full YouTube URL (e.g., `https://www.youtube.com/watch?v=BVM6TUSfn3E`) +- **`youtube_video_id`** (keyword): Just the video ID (e.g., `BVM6TUSfn3E`) +- **`timestamp`** (integer): Chunk start time in seconds (e.g., 1296 for 21:36) +- **`chunk_index`** (integer): Position of chunk within the episode + +## Environment Variables + +The timestamp-enabled index name is determined by: +- `INDEX_NAME_WITH_TIMESTAMPS` environment variable (if set) +- Otherwise defaults to `{INDEX_NAME}_timestamps` (e.g., if `INDEX_NAME=transcripts`, then `transcripts_timestamps`) + +## Required Changes + +### 1. Update Index Name Configuration + +Add support for selecting which index to use (standard or timestamp-enabled). You can either: +- Add a configuration option/environment variable to choose the index +- Or add a toggle in the Streamlit UI to switch between indices + +### 2. Update Search Query to Include Timestamp Fields + +When querying the timestamp-enabled index, ensure your search results include the new fields: + +```python +# Example: Include timestamp fields in the _source +search_body = { + "query": {...}, + "_source": ["title", "url", "content", "youtube_url", "youtube_video_id", "timestamp", "chunk_index", ...] +} +``` + +### 3. Construct Timestamped YouTube URLs + +When displaying episode URLs in search results, check if `timestamp` and `youtube_video_id` are available. 
If they are, construct a timestamped URL: + +**URL Format:** +- Base YouTube URL: `https://www.youtube.com/watch?v={video_id}` +- Timestamped URL: `https://youtu.be/{video_id}?t={timestamp}` + +**Example:** +- Video ID: `BVM6TUSfn3E` +- Timestamp: `1296` (seconds) +- Result: `https://youtu.be/BVM6TUSfn3E?t=1296` + +**Implementation:** +```python +def build_episode_url(hit): + """Build episode URL with timestamp if available""" + url = hit.get('_source', {}).get('url', '') + youtube_video_id = hit.get('_source', {}).get('youtube_video_id') + timestamp = hit.get('_source', {}).get('timestamp') + + # If we have YouTube video ID and timestamp, use timestamped URL + if youtube_video_id and timestamp is not None: + return f"https://youtu.be/{youtube_video_id}?t={timestamp}" + + # Otherwise, fall back to original URL + return url +``` + +### 4. Handle Missing Timestamps Gracefully + +Some chunks may not have timestamps (e.g., old transcripts without Whisper segments). Always check if `timestamp` is not None before constructing timestamped URLs: + +```python +if timestamp is not None: + url = f"https://youtu.be/{youtube_video_id}?t={timestamp}" # use timestamped URL +else: + url = episode_url # use regular episode URL +``` + +### 5.
Display Timestamps in UI (Optional Enhancement) + +Consider displaying the timestamp in a human-readable format alongside the URL: + +```python +def format_timestamp(seconds): + """Convert seconds to MM:SS or HH:MM:SS format""" + if seconds is None: + return None + hours = seconds // 3600 + minutes = (seconds % 3600) // 60 + secs = seconds % 60 + if hours > 0: + return f"{hours}:{minutes:02d}:{secs:02d}" + return f"{minutes}:{secs:02d}" + +# In your UI (check against None so a 0-second timestamp still renders): +if timestamp is not None: + st.write(f"⏱️ {format_timestamp(timestamp)}") +``` + +## Example Search Result Structure + +When querying the timestamp-enabled index, search results will have this structure: + +```python +{ + "_source": { + "title": "Episode Title", + "url": "https://rss.com/podcasts/vector-podcast/599924", + "content": "Chunk text content...", + "youtube_url": "https://www.youtube.com/watch?v=BVM6TUSfn3E", # May be None + "youtube_video_id": "BVM6TUSfn3E", # May be None + "timestamp": 1296, # Integer seconds, may be None + "chunk_index": 5, + # ...
other existing fields + } +} +``` + +## Testing Checklist + +- [ ] Update index name to use timestamp-enabled index +- [ ] Verify search queries return timestamp fields +- [ ] Test URL construction with timestamps +- [ ] Test fallback to regular URL when timestamp is None +- [ ] Test with episodes that have YouTube URLs +- [ ] Test with episodes that don't have YouTube URLs +- [ ] Verify timestamped URLs open YouTube at correct time + +## Notes + +- The timestamp-enabled index can coexist with the standard index +- Both indices can be queried independently +- The standard index remains unchanged and continues to work as before +- Episodes without Whisper segments will have `timestamp: None` +- Episodes without YouTube URLs in description will have `youtube_url: None` and `youtube_video_id: None` + diff --git a/rewrite_rules.json b/rewrite_rules.json index 48cb018..754ec5b 100644 --- a/rewrite_rules.json +++ b/rewrite_rules.json @@ -2,6 +2,7 @@ "corner shortened": "Connor Shorten", "Dimeji Conan": "Dmitry Kan", "Dimitri Can": "Dmitry Kan", + "Dmitri Khan": "Dmitry Kan", "Dimitri": "Dmitry", "Mietri": "Dmitry", "Leo Boyzov": "Leo Boytsov", diff --git a/src/apply_rewrite_rules.py b/src/apply_rewrite_rules.py index aa02215..77b4e97 100644 --- a/src/apply_rewrite_rules.py +++ b/src/apply_rewrite_rules.py @@ -99,22 +99,44 @@ def apply_rules( input_dir: typing_extensions.Annotated[ pathlib.Path, typer.Option("--input-dir", "-i", help="Input directory containing transcript files"), - ], + ] = None, output_dir: typing_extensions.Annotated[ pathlib.Path, typer.Option("--output-dir", "-o", help="Output directory for corrected transcript files"), - ], + ] = None, rules_file: typing_extensions.Annotated[ pathlib.Path, typer.Option("--rules-file", "-r", help="Path to rewrite rules JSON file"), ] = pathlib.Path("rewrite_rules.json"), + with_timestamps: typing_extensions.Annotated[ + bool, + typer.Option("--with-timestamps", help="Process transcripts_with_timestamps/ directory 
instead of transcripts/"), + ] = False, + in_place: typing_extensions.Annotated[ + bool, + typer.Option("--in-place", help="Overwrite input files instead of writing to output directory"), + ] = False, ): """ Apply rewrite rules to transcript files. Processes all .md files in the input directory recursively and writes corrected versions to the output directory, preserving directory structure. + + If --with-timestamps is used, defaults to transcripts_with_timestamps/ directory. + If --in-place is used, files are overwritten in the input directory. """ + # Set default directories based on with_timestamps flag + if input_dir is None: + input_dir = pathlib.Path("transcripts_with_timestamps" if with_timestamps else "transcripts") + + if output_dir is None: + if in_place: + output_dir = input_dir + else: + # Default output: add _corrected suffix + output_dir = pathlib.Path(f"{input_dir}_corrected") + # Validate input directory if not input_dir.exists(): typer.echo(f"Error: Input directory does not exist: {input_dir}", err=True) diff --git a/src/os_index.py b/src/os_index.py index 092a886..39fa653 100644 --- a/src/os_index.py +++ b/src/os_index.py @@ -4,6 +4,7 @@ Should be ran before sending information to a new database """ +import copy import os from dotenv import load_dotenv @@ -47,10 +48,66 @@ } index_name = os.getenv("INDEX_NAME", "transcripts") +index_name_with_timestamps = os.getenv("INDEX_NAME_WITH_TIMESTAMPS", f"{index_name}_timestamps") -def create_index(index_name: str = index_name, index_settings=index_settings): - """Checks for existing index and deletes it and recreates it if it exists""" +def create_index_with_timestamps(index_name: str = index_name_with_timestamps): + """Creates an index with timestamp fields for timecode support""" + # Start with base index settings (deep copy to avoid modifying original) + timestamps_index_settings = copy.deepcopy(index_settings) + + # Add timestamp-related fields to mappings + 
timestamps_index_settings["mappings"]["properties"]["youtube_url"] = {"type": "keyword"} + timestamps_index_settings["mappings"]["properties"]["youtube_video_id"] = {"type": "keyword"} + timestamps_index_settings["mappings"]["properties"]["timestamp"] = {"type": "integer"} + timestamps_index_settings["mappings"]["properties"]["chunk_index"] = {"type": "integer"} + + if client.indices.exists(index=index_name): + print(f"Deleting existing index {index_name}") + client.indices.delete(index=index_name) + + print(f"Creating index {index_name} with knn_vector mapping and timestamp fields") + try: + response = client.indices.create( + index=index_name, + body=timestamps_index_settings + ) + print(f"✅ Index created successfully: {response}") + except Exception as e: + print(f"❌ Error creating index: {e}") + raise + + # Verify the mapping was applied correctly + print(f"\nVerifying index mapping...") + mapping = client.indices.get_mapping(index=index_name) + content_vector_type = mapping[index_name]["mappings"]["properties"].get("content_vector", {}).get("type") + if content_vector_type == "knn_vector": + print(f"✅ content_vector field is correctly set to knn_vector") + else: + print(f"⚠️ WARNING: content_vector field type is '{content_vector_type}', expected 'knn_vector'") + print(f" This may cause vector search to fail. 
Please recreate the index.") + + # Verify timestamp fields + has_timestamp = "timestamp" in mapping[index_name]["mappings"]["properties"] + has_youtube_url = "youtube_url" in mapping[index_name]["mappings"]["properties"] + if has_timestamp and has_youtube_url: + print(f"✅ Timestamp fields are correctly set") + else: + print(f"⚠️ WARNING: Some timestamp fields are missing") + + +def create_index(index_name: str = index_name, index_settings=index_settings, with_timestamps: bool = False): + """Checks for existing index and deletes it and recreates it if it exists + + Args: + index_name: Name of the index to create + index_settings: Index settings to use (defaults to base settings) + with_timestamps: If True, create index with timestamp fields + """ + if with_timestamps: + create_index_with_timestamps(index_name) + return + if client.indices.exists(index=index_name): print(f"Deleting existing index {index_name}") client.indices.delete(index=index_name) @@ -77,4 +134,9 @@ def create_index(index_name: str = index_name, index_settings=index_settings): print(f" This may cause vector search to fail. 
Please recreate the index.") if __name__ == "__main__": - create_index() \ No newline at end of file + import sys + with_timestamps = "--with-timestamps" in sys.argv + if with_timestamps: + create_index_with_timestamps() + else: + create_index() \ No newline at end of file diff --git a/src/os_ingest.py b/src/os_ingest.py index c7de698..4294319 100644 --- a/src/os_ingest.py +++ b/src/os_ingest.py @@ -1,4 +1,5 @@ import frontmatter +import json import os import pathlib import re @@ -13,10 +14,12 @@ from langchain_huggingface import HuggingFaceEmbeddings from langchain_text_splitters import RecursiveCharacterTextSplitter from opensearchpy import OpenSearch, helpers +from bs4 import BeautifulSoup load_dotenv() INDEX_NAME = os.getenv("INDEX_NAME") +INDEX_NAME_WITH_TIMESTAMPS = os.getenv("INDEX_NAME_WITH_TIMESTAMPS", f"{INDEX_NAME}_timestamps" if INDEX_NAME else "transcripts_timestamps") CONNECTION_STRING = os.getenv("OPENSEARCH_SERVICE_URI") client = OpenSearch(CONNECTION_STRING, use_ssl=True, timeout=100) @@ -37,8 +40,113 @@ ) -def os_load_data_from_file(file: pathlib.Path): - """Chunk data, create embeddings, and index in OpenSearch.""" +def parse_youtube_url(description: str) -> typing.Optional[str]: + """Extract YouTube URL from HTML description field. + + Looks for patterns like youtube.com/watch?v=VIDEO_ID or youtu.be/VIDEO_ID + Returns the full URL or None if not found. 
+ """ + if not description: + return None + + # Parse HTML description + soup = BeautifulSoup(description, 'html.parser') + + # Look for YouTube links in anchor tags + for link in soup.find_all('a', href=True): + href = link.get('href', '') + if 'youtube.com/watch' in href or 'youtu.be/' in href: + # Extract the URL, handling both full URLs and relative paths + if href.startswith('http'): + return href + elif href.startswith('//'): + return 'https:' + href + elif href.startswith('/'): + return 'https://www.youtube.com' + href + + # Also check for YouTube URLs in plain text + youtube_pattern = r'(https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+)' + matches = re.findall(youtube_pattern, description) + if matches: + return matches[0] + + return None + + +def extract_youtube_video_id(youtube_url: str) -> typing.Optional[str]: + """Extract video ID from YouTube URL. + + Handles both formats: + - youtube.com/watch?v=VIDEO_ID + - youtu.be/VIDEO_ID + + Returns just the video ID or None if not found. + """ + if not youtube_url: + return None + + # Pattern for youtube.com/watch?v=VIDEO_ID + match = re.search(r'(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]+)', youtube_url) + if match: + return match.group(1) + + return None + + +def map_chunk_to_segment_timestamp(chunk_text: str, whisper_segments: typing.List[dict]) -> typing.Optional[float]: + """Map a chunk to its corresponding Whisper segment and return the start timestamp. 
+ + Args: + chunk_text: The text content of the chunk + whisper_segments: List of Whisper segments, each with 'start', 'end', and 'text' keys + + Returns: + The start timestamp (in seconds) of the first matching segment, or None if no match + """ + if not whisper_segments or not chunk_text: + return None + + # Normalize text for comparison (lowercase, strip whitespace) + chunk_text_normalized = chunk_text.lower().strip() + + # Try to find segment that contains the chunk text + for segment in whisper_segments: + segment_text = segment.get('text', '').lower().strip() + if not segment_text: + continue + + # Check if chunk text is contained in segment text or vice versa + # We use a threshold to handle partial matches + if chunk_text_normalized in segment_text or segment_text in chunk_text_normalized: + return segment.get('start') + + # Also check for significant overlap (at least 50% of shorter string) + shorter_len = min(len(chunk_text_normalized), len(segment_text)) + if shorter_len > 0: + # Simple overlap check: count common words + chunk_words = set(chunk_text_normalized.split()) + segment_words = set(segment_text.split()) + if chunk_words and segment_words: + overlap_ratio = len(chunk_words & segment_words) / min(len(chunk_words), len(segment_words)) + if overlap_ratio > 0.5: + return segment.get('start') + + # If no exact match, try to find the segment that starts closest to where the chunk would be + # This is a fallback for chunks that don't match exactly + # We'll use the first segment as a last resort (not ideal, but better than None) + if whisper_segments: + return whisper_segments[0].get('start') + + return None + + +def os_load_data_from_file(file: pathlib.Path, use_timestamps: bool = False): + """Chunk data, create embeddings, and index in OpenSearch. 
+ + Args: + file: Path to the transcript file + use_timestamps: If True, extract YouTube URLs and map chunks to Whisper segments for timestamps + """ docs = [] # Load frontmatter and extract metadata @@ -80,9 +188,31 @@ def os_load_data_from_file(file: pathlib.Path): "image_url": frontmatter_post.get("image_url", ""), "pub_date": pub_date, } + + # Extract YouTube URL and video ID if using timestamps + whisper_segments = None + if use_timestamps: + description = frontmatter_post.get("description", "") + youtube_url = parse_youtube_url(description) + youtube_video_id = extract_youtube_video_id(youtube_url) if youtube_url else None + + base_data["youtube_url"] = youtube_url + base_data["youtube_video_id"] = youtube_video_id + + # Load Whisper segments from frontmatter + whisper_segments_str = frontmatter_post.get("whisper_segments") + if whisper_segments_str: + try: + whisper_segments = json.loads(whisper_segments_str) + except (json.JSONDecodeError, TypeError): + whisper_segments = None post_chunks = splitter.create_documents([frontmatter_post.content]) - for post_chunk in post_chunks: + + # Determine which index to use + target_index = INDEX_NAME_WITH_TIMESTAMPS if use_timestamps else INDEX_NAME + + for chunk_index, post_chunk in enumerate(post_chunks): doc = { **base_data, **{ @@ -93,8 +223,17 @@ def os_load_data_from_file(file: pathlib.Path): ], }, } + + # Add timestamp fields if using timestamps + if use_timestamps: + doc["chunk_index"] = chunk_index + # Map chunk to Whisper segment to get timestamp + timestamp = map_chunk_to_segment_timestamp(post_chunk.page_content, whisper_segments) if whisper_segments else None + doc["timestamp"] = int(timestamp) if timestamp is not None else None + docs.append(doc) - response = helpers.bulk(client, docs, index=INDEX_NAME) + + response = helpers.bulk(client, docs, index=target_index) return response @@ -115,13 +254,19 @@ def ingest( bool, typer.Option("--all", "-a", help="Process all transcript files"), ] = False, + 
with_timestamps: typing_extensions.Annotated[ + bool, + typer.Option("--with-timestamps", help="Use timestamp-enabled index and extract YouTube URLs/timestamps"), + ] = False, ): """ Ingest transcript files into OpenSearch. Can process a specific episode, all episodes in a show, or all episodes. + Use --with-timestamps to ingest into the timestamp-enabled index and read from transcripts_with_timestamps/ folder. """ - transcripts_dir = pathlib.Path("transcripts") + # Use different folder based on with_timestamps flag + transcripts_dir = pathlib.Path("transcripts_with_timestamps" if with_timestamps else "transcripts") if all: # Process all files in transcripts directory (including subdirectories) @@ -141,10 +286,12 @@ def ingest( raise typer.Exit(1) typer.echo(f"Processing {len(files)} transcript file(s)...") + index_type = "timestamp-enabled" if with_timestamps else "standard" + typer.echo(f"Using {index_type} index") for file in files: typer.echo(f"Processing: {file}") try: - os_load_data_from_file(file) + os_load_data_from_file(file, use_timestamps=with_timestamps) typer.echo(f"✓ Successfully ingested {file.name}") except Exception as e: typer.echo(f"✗ Error processing {file.name}: {e}", err=True) @@ -168,8 +315,10 @@ def ingest( raise typer.Exit(1) typer.echo(f"Processing: {file_path}") + index_type = "timestamp-enabled" if with_timestamps else "standard" + typer.echo(f"Using {index_type} index") try: - os_load_data_from_file(file_path) + os_load_data_from_file(file_path, use_timestamps=with_timestamps) typer.echo(f"✓ Successfully ingested {file_path.name}") except Exception as e: typer.echo(f"✗ Error processing {file_path.name}: {e}", err=True) diff --git a/src/transcribe.py b/src/transcribe.py index 5be7d36..5f9c790 100644 --- a/src/transcribe.py +++ b/src/transcribe.py @@ -1,7 +1,9 @@ """Use Whisper to transcribe audio files to text.""" from os import name +import json import pathlib +import re import tempfile import typing import typing_extensions @@ -75,12 
+77,24 @@ def download_audio_file(url: str) -> str:
     Returns:
         The path to the downloaded audio file.
     """
+    # Use browser-like headers to avoid getting ad-injected versions
+    # Some CDNs (like RSS.com) serve different content based on User-Agent
+    headers = {
+        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
+        "Accept": "audio/mpeg, audio/*, */*",
+        "Accept-Language": "en-US,en;q=0.9",
+        "Referer": "https://rss.com/",
+    }
     with tempfile.NamedTemporaryFile(mode="+wb", suffix=".mp3", delete=False) as f:
         typer.echo(f"Downloading audio file from {url}")
-        with httpx.stream("GET", url, follow_redirects=True) as response:
-
+        with httpx.stream("GET", url, headers=headers, follow_redirects=True) as response:
+            # Log the final URL after redirects for debugging
+            final_url = str(response.url)
+            if final_url != url:
+                typer.echo(f"Redirected to: {final_url}")
+
             typer.echo(f"Saving audio file to {f.name}")
             for chunk in response.iter_bytes():
                 f.write(chunk)
@@ -88,8 +102,8 @@ def download_audio_file(url: str) -> str:
     return f.name


-def transcribe_audio_file(audio_file: pathlib.Path) -> str:
-    """Transcribe an audio file to text"""
+def transcribe_audio_file(audio_file: pathlib.Path) -> dict:
+    """Transcribe an audio file and return full Whisper result with text and segments"""
     model = _get_whisper_model()
     # _whisper_module is guaranteed to be set if _get_whisper_model() succeeded
     whisper = _whisper_module
@@ -97,6 +111,12 @@
     audio = whisper.load_audio(str(audio_file))
     transcription = model.transcribe(audio=audio, verbose=False)
+    return transcription
+
+
+def transcribe_audio_file_text_only(audio_file: pathlib.Path) -> str:
+    """Transcribe an audio file to text (backward compatibility)"""
+    transcription = transcribe_audio_file(audio_file)
     return transcription["text"]
@@ -116,10 +136,11 @@ def transcribe_file(
     if not output_file:
         output_file = input_file.absolute().with_suffix(".txt")

-    return output_file.write_text(transcription)
+    return output_file.write_text(transcription["text"])


-def transcribe_from_audio_url(audio_url: str) -> str:
+def transcribe_from_audio_url(audio_url: str) -> dict:
+    """Transcribe audio from URL and return full Whisper result with text and segments"""
     typer.echo(f"Transcribing audio from {audio_url}")
     audio_file_path = download_audio_file(audio_url)
     transcription = transcribe_audio_file(pathlib.Path(audio_file_path))
@@ -128,15 +149,22 @@ def transcribe_from_audio_url(audio_url: str) -> str:
     return transcription


-def get_output_file_path(metadata: dict, show: typing.Optional[str] = None) -> pathlib.Path:
-    """Get the output file path for a transcript based on metadata and show name."""
+def get_output_file_path(metadata: dict, show: typing.Optional[str] = None, with_timestamps: bool = False) -> pathlib.Path:
+    """Get the output file path for a transcript based on metadata and show name.
+
+    Args:
+        metadata: Episode metadata dictionary
+        show: Optional show name (subdirectory)
+        with_timestamps: If True, save to transcripts_with_timestamps/ folder
+    """
+    base_dir = "transcripts_with_timestamps" if with_timestamps else "transcripts"
     if show:
         return pathlib.Path(
-            f"transcripts/{slugify.slugify(show)}/{slugify.slugify(metadata['title'])}.md"
+            f"{base_dir}/{slugify.slugify(show)}/{slugify.slugify(metadata['title'])}.md"
         )
     else:
         return pathlib.Path(
-            f"transcripts/{slugify.slugify(metadata['title'])}.md"
+            f"{base_dir}/{slugify.slugify(metadata['title'])}.md"
         )
@@ -160,6 +188,10 @@ def transcribe_from_episode_number(
         bool,
         typer.Option("--skip-if-exists", help="Skip episodes for which transcriptions already exist"),
     ] = False,
+    with_timestamps: typing_extensions.Annotated[
+        bool,
+        typer.Option("--with-timestamps", help="Extract and store Whisper segments with timestamps (slower, saves to transcripts_with_timestamps/)"),
+    ] = False,
 ):
     """
     Transcribe an episode from an episode number
@@ -187,15 +219,24 @@ def transcribe_from_episode_number(
     for episode_number in track(episode_numbers):
         metadata, audio_url = get_audio_url_from_episode_number(episode_number)
-        output_file = get_output_file_path(metadata, show)
+        output_file = get_output_file_path(metadata, show, with_timestamps=with_timestamps)
         if skip_if_exists and output_file.exists():
             typer.echo(f"Skipping episode {episode_number}: {output_file} already exists")
             continue
-        transcription = transcribe_from_audio_url(audio_url)
+        transcription_result = transcribe_from_audio_url(audio_url)
+        transcription_text = transcription_result["text"]
+
+        # Only extract and store segments if with_timestamps is enabled
+        if with_timestamps:
+            whisper_segments = transcription_result.get("segments", [])
+            metadata["whisper_segments"] = json.dumps(whisper_segments) if whisper_segments else None
+        else:
+            metadata["whisper_segments"] = None
+
         post = frontmatter.Post(
-            "\n".join(splitter.split_text(transcription)), **metadata
+            "\n".join(splitter.split_text(transcription_text)), **metadata
         )
         output_file.parent.mkdir(parents=True, exist_ok=True)
         output_file.write_text(frontmatter.dumps(post))
@@ -228,6 +269,10 @@ def transcribe_from_rss(
         bool,
         typer.Option("--skip-if-exists", help="Skip episodes for which transcriptions already exist"),
     ] = False,
+    with_timestamps: typing_extensions.Annotated[
+        bool,
+        typer.Option("--with-timestamps", help="Extract and store Whisper segments with timestamps (slower, saves to transcripts_with_timestamps/)"),
+    ] = False,
 ):
     """
     Transcribe episodes from an RSS feed (e.g., Vector Podcast)
@@ -235,6 +280,17 @@
     Metadata is pulled from the RSS feed. Audio is downloaded from the enclosure URL in the feed.
     """
+    # Auto-detect show name from RSS URL if not provided
+    if show is None:
+        # Try to extract show name from URL pattern like "rss.com/vector-podcast/feed.xml"
+        match = re.search(r'/([^/]+)/feed\.xml', rss_url)
+        if match:
+            show = match.group(1)
+        else:
+            # Fallback: try to extract from any path segment before feed.xml
+            match = re.search(r'/([^/]+)/[^/]*feed', rss_url)
+            if match:
+                show = match.group(1)

     if latest:
         episode = get_latest_episode(rss_url)
@@ -243,15 +299,24 @@
             raise typer.Exit(1)

         metadata, audio_url = get_audio_url_from_rss_episode(episode)
-        output_file = get_output_file_path(metadata, show)
+        output_file = get_output_file_path(metadata, show, with_timestamps=with_timestamps)
         if skip_if_exists and output_file.exists():
             typer.echo(f"Skipping latest episode: {output_file} already exists")
             return
-        transcription = transcribe_from_audio_url(audio_url)
+        transcription_result = transcribe_from_audio_url(audio_url)
+        transcription_text = transcription_result["text"]
+
+        # Only extract and store segments if with_timestamps is enabled
+        if with_timestamps:
+            whisper_segments = transcription_result.get("segments", [])
+            metadata["whisper_segments"] = json.dumps(whisper_segments) if whisper_segments else None
+        else:
+            metadata["whisper_segments"] = None
+
         post = frontmatter.Post(
-            "\n".join(splitter.split_text(transcription)), **metadata
+            "\n".join(splitter.split_text(transcription_text)), **metadata
         )
         output_file.parent.mkdir(parents=True, exist_ok=True)
         output_file.write_text(frontmatter.dumps(post))
@@ -266,15 +331,24 @@ def transcribe_from_rss(
         episodes = parse_rss_feed(rss_url)
         for episode in track(episodes):
             metadata, audio_url = get_audio_url_from_rss_episode(episode)
-            output_file = get_output_file_path(metadata, show)
+            output_file = get_output_file_path(metadata, show, with_timestamps=with_timestamps)
             if skip_if_exists and output_file.exists():
                 typer.echo(f"Skipping episode: {output_file} already exists")
                 continue
-            transcription = transcribe_from_audio_url(audio_url)
+            transcription_result = transcribe_from_audio_url(audio_url)
+            transcription_text = transcription_result["text"]
+
+            # Only extract and store segments if with_timestamps is enabled
+            if with_timestamps:
+                whisper_segments = transcription_result.get("segments", [])
+                metadata["whisper_segments"] = json.dumps(whisper_segments) if whisper_segments else None
+            else:
+                metadata["whisper_segments"] = None
+
             post = frontmatter.Post(
-                "\n".join(splitter.split_text(transcription)), **metadata
+                "\n".join(splitter.split_text(transcription_text)), **metadata
             )
             output_file.parent.mkdir(parents=True, exist_ok=True)
             output_file.write_text(frontmatter.dumps(post))
@@ -302,15 +376,24 @@ def transcribe_from_rss(
         for episode in track(episodes_to_process):
             metadata, audio_url = get_audio_url_from_rss_episode(episode)
-            output_file = get_output_file_path(metadata, show)
+            output_file = get_output_file_path(metadata, show, with_timestamps=with_timestamps)
             if skip_if_exists and output_file.exists():
                 typer.echo(f"Skipping episode: {output_file} already exists")
                 continue
-            transcription = transcribe_from_audio_url(audio_url)
+            transcription_result = transcribe_from_audio_url(audio_url)
+            transcription_text = transcription_result["text"]
+
+            # Only extract and store segments if with_timestamps is enabled
+            if with_timestamps:
+                whisper_segments = transcription_result.get("segments", [])
+                metadata["whisper_segments"] = json.dumps(whisper_segments) if whisper_segments else None
+            else:
+                metadata["whisper_segments"] = None
+
             post = frontmatter.Post(
-                "\n".join(splitter.split_text(transcription)), **metadata
+                "\n".join(splitter.split_text(transcription_text)), **metadata
             )
             output_file.parent.mkdir(parents=True, exist_ok=True)
             output_file.write_text(frontmatter.dumps(post))
diff --git a/transcripts_with_timestamps/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md b/transcripts_with_timestamps/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md
new file mode 100644
index 0000000..9483550
--- /dev/null
+++ b/transcripts_with_timestamps/vector-podcast/adding-ml-layer-to-search-hybrid-search-optimizer-with-daniel-wrigley-and-eric-pugh.md
@@ -0,0 +1,2760 @@
+---
+description: '

+  Vector Podcast website: https://vectorpodcast.com
+
+  Haystack US 2025: https://haystackconf.com/2025/
+
+  Federated search, Keyword & Neural Search, ML Optimisation, Pros and Cons of Hybrid search
+
+  It is fascinating and funny how things develop, but also turn around. In 2022-23 everyone was buzzing about hybrid search. In 2024 the conversation shifted to RAG, RAG, RAG. And now we are in 2025 and back to hybrid search - on a different level: finally there are strides and contributions towards making hybrid search parameters learnt with ML. How cool is that?
+
+  Design: Saurabh Rai, https://www.linkedin.com/in/srbhr/
+
+  The design of this episode is inspired by a scene in Blade Runner 2049. There''s a clear path leading towards where people want to go to, yet they''re searching for something.
+
+  00:00 Intro
+
+  00:54 Eric''s intro and Daniel''s background
+
+  02:50 Importance of Hybrid search: Daniel''s take
+
+  07:26 Eric''s take
+
+  10:57 Dmitry''s take
+
+  11:41 Eric''s predictions
+
+  13:47 Doug''s blog on RRF is not enough
+
+  16:18 How to not fall short of the blind picking in RRF: score normalization, combinations and weights
+
+  25:03 The role of query understanding: feature groups
+
+  35:11 Lesson 1 from Daniel: Simple models might be all you need
+
+  36:30 Lesson 2: query features might be all you need
+
+  38:30 Reasoning capabilities in search
+
+  40:02 Question from Eric: how is this different from Learning To Rank?
+
+  42:46 Carrying the past in Learning To Rank / any rank
+
+  44:21 Demo!
+
+  51:52 How to consume this in OpenSearch
+
+  55:15 What''s next
+
+  58:44 Haystack US 2025
+
+  YouTube: https://www.youtube.com/watch?v=quY769om1EY
+
' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250321_110308_985bc30944ce48882d237ba24dea55a4.png +pub_date: Fri, 21 Mar 2025 11:33:23 GMT +title: 'Adding ML layer to Search: Hybrid Search Optimizer with Daniel Wrigley and + Eric Pugh' +url: https://rss.com/podcasts/vector-podcast/1951801 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 19.48, "text": " Hello + there, Vector Podcast is back.", "tokens": [50364, 2425, 456, 11, 691, 20814, 29972, + 307, 646, 13, 51338], "temperature": 0.0, "avg_logprob": -0.377059421023807, "compression_ratio": + 1.1504424778761062, "no_speech_prob": 0.23972827196121216}, {"id": 1, "seek": 0, + "start": 19.48, "end": 21.88, "text": " Same season 3.", "tokens": [51338, 10635, + 3196, 805, 13, 51458], "temperature": 0.0, "avg_logprob": -0.377059421023807, "compression_ratio": + 1.1504424778761062, "no_speech_prob": 0.23972827196121216}, {"id": 2, "seek": 0, + "start": 21.88, "end": 27.0, "text": " I think we are about to wrap it up with few + final really interesting episodes.", "tokens": [51458, 286, 519, 321, 366, 466, + 281, 7019, 309, 493, 365, 1326, 2572, 534, 1880, 9313, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.377059421023807, "compression_ratio": 1.1504424778761062, + "no_speech_prob": 0.23972827196121216}, {"id": 3, "seek": 2700, "start": 27.0, "end": + 32.44, "text": " There I have the privilege to talk to the open source crew Eric + Pugh who you have seen", "tokens": [50364, 821, 286, 362, 264, 12122, 281, 751, + 281, 264, 1269, 4009, 7260, 9336, 430, 1984, 567, 291, 362, 1612, 50636], "temperature": + 0.0, "avg_logprob": -0.4400940972405511, "compression_ratio": 1.495049504950495, + "no_speech_prob": 0.3294818103313446}, {"id": 4, "seek": 2700, "start": 32.44, "end": + 40.24, "text": " in the one of the previous episodes and you guessed Daniel Riggly + joining us to discuss", "tokens": [50636, 294, 264, 472, 295, 264, 3894, 9313, 293, + 291, 21852, 8033, 497, 46737, 5549, 505, 281, 
2248, 51026], "temperature": 0.0, + "avg_logprob": -0.4400940972405511, "compression_ratio": 1.495049504950495, "no_speech_prob": + 0.3294818103313446}, {"id": 5, "seek": 2700, "start": 40.24, "end": 46.84, "text": + " really interesting topic on hybrid search and optimization.", "tokens": [51026, + 534, 1880, 4829, 322, 13051, 3164, 293, 19618, 13, 51356], "temperature": 0.0, "avg_logprob": + -0.4400940972405511, "compression_ratio": 1.495049504950495, "no_speech_prob": 0.3294818103313446}, + {"id": 6, "seek": 2700, "start": 46.84, "end": 49.2, "text": " Really really excited + to have you both on the show.", "tokens": [51356, 4083, 534, 2919, 281, 362, 291, + 1293, 322, 264, 855, 13, 51474], "temperature": 0.0, "avg_logprob": -0.4400940972405511, + "compression_ratio": 1.495049504950495, "no_speech_prob": 0.3294818103313446}, {"id": + 7, "seek": 2700, "start": 49.2, "end": 50.2, "text": " Hello.", "tokens": [51474, + 2425, 13, 51524], "temperature": 0.0, "avg_logprob": -0.4400940972405511, "compression_ratio": + 1.495049504950495, "no_speech_prob": 0.3294818103313446}, {"id": 8, "seek": 2700, + "start": 50.2, "end": 52.04, "text": " Awesome.", "tokens": [51524, 10391, 13, 51616], + "temperature": 0.0, "avg_logprob": -0.4400940972405511, "compression_ratio": 1.495049504950495, + "no_speech_prob": 0.3294818103313446}, {"id": 9, "seek": 5204, "start": 53.04, "end": + 54.04, "text": " Awesome.", "tokens": [50414, 10391, 13, 50464], "temperature": + 0.0, "avg_logprob": -0.2870995751742659, "compression_ratio": 1.6875, "no_speech_prob": + 0.29941433668136597}, {"id": 10, "seek": 5204, "start": 54.04, "end": 58.04, "text": + " So as a tradition we start with the intros.", "tokens": [50464, 407, 382, 257, + 6994, 321, 722, 365, 264, 560, 2635, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.2870995751742659, "compression_ratio": 1.6875, "no_speech_prob": 0.29941433668136597}, + {"id": 11, "seek": 5204, "start": 58.04, "end": 63.04, "text": " Eric everyone knows + but 
Eric feel free to introduce yourself.", "tokens": [50664, 9336, 1518, 3255, + 457, 9336, 841, 1737, 281, 5366, 1803, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.2870995751742659, "compression_ratio": 1.6875, "no_speech_prob": 0.29941433668136597}, + {"id": 12, "seek": 5204, "start": 63.04, "end": 65.03999999999999, "text": " I mean + great to be back to me tree.", "tokens": [50914, 286, 914, 869, 281, 312, 646, 281, + 385, 4230, 13, 51014], "temperature": 0.0, "avg_logprob": -0.2870995751742659, "compression_ratio": + 1.6875, "no_speech_prob": 0.29941433668136597}, {"id": 13, "seek": 5204, "start": + 65.03999999999999, "end": 69.75999999999999, "text": " I actually I''m a little + late getting here because I realized I was driving to the office", "tokens": [51014, + 286, 767, 286, 478, 257, 707, 3469, 1242, 510, 570, 286, 5334, 286, 390, 4840, 281, + 264, 3398, 51250], "temperature": 0.0, "avg_logprob": -0.2870995751742659, "compression_ratio": + 1.6875, "no_speech_prob": 0.29941433668136597}, {"id": 14, "seek": 5204, "start": + 69.75999999999999, "end": 73.32, "text": " that I forgot my mug that you gave me + the other year.", "tokens": [51250, 300, 286, 5298, 452, 23610, 300, 291, 2729, + 385, 264, 661, 1064, 13, 51428], "temperature": 0.0, "avg_logprob": -0.2870995751742659, + "compression_ratio": 1.6875, "no_speech_prob": 0.29941433668136597}, {"id": 15, + "seek": 5204, "start": 73.32, "end": 76.52, "text": " So I actually called Daniels + like I''m going to be a little late because I got to go home", "tokens": [51428, + 407, 286, 767, 1219, 8033, 82, 411, 286, 478, 516, 281, 312, 257, 707, 3469, 570, + 286, 658, 281, 352, 1280, 51588], "temperature": 0.0, "avg_logprob": -0.2870995751742659, + "compression_ratio": 1.6875, "no_speech_prob": 0.29941433668136597}, {"id": 16, + "seek": 5204, "start": 76.52, "end": 78.8, "text": " and pick up the mug and bring + it into the office.", "tokens": [51588, 293, 1888, 493, 264, 23610, 293, 1565, 309, + 666, 264, 
3398, 13, 51702], "temperature": 0.0, "avg_logprob": -0.2870995751742659, + "compression_ratio": 1.6875, "no_speech_prob": 0.29941433668136597}, {"id": 17, + "seek": 7880, "start": 78.8, "end": 82.96, "text": " My wife keeps it and we use + it when we go hiking but I was like I''m going to bring it into", "tokens": [50364, + 1222, 3836, 5965, 309, 293, 321, 764, 309, 562, 321, 352, 23784, 457, 286, 390, + 411, 286, 478, 516, 281, 1565, 309, 666, 50572], "temperature": 0.0, "avg_logprob": + -0.26173441863257035, "compression_ratio": 1.713740458015267, "no_speech_prob": + 0.11045863479375839}, {"id": 18, "seek": 7880, "start": 82.96, "end": 88.44, "text": + " the office and show it off since this is my second podcast to do with you and + the mug", "tokens": [50572, 264, 3398, 293, 855, 309, 766, 1670, 341, 307, 452, + 1150, 7367, 281, 360, 365, 291, 293, 264, 23610, 50846], "temperature": 0.0, "avg_logprob": + -0.26173441863257035, "compression_ratio": 1.713740458015267, "no_speech_prob": + 0.11045863479375839}, {"id": 19, "seek": 7880, "start": 88.44, "end": 92.0, "text": + " that you gave me two years ago three years ago at this point.", "tokens": [50846, + 300, 291, 2729, 385, 732, 924, 2057, 1045, 924, 2057, 412, 341, 935, 13, 51024], + "temperature": 0.0, "avg_logprob": -0.26173441863257035, "compression_ratio": 1.713740458015267, + "no_speech_prob": 0.11045863479375839}, {"id": 20, "seek": 7880, "start": 92.0, + "end": 94.47999999999999, "text": " Yeah probably three years.", "tokens": [51024, + 865, 1391, 1045, 924, 13, 51148], "temperature": 0.0, "avg_logprob": -0.26173441863257035, + "compression_ratio": 1.713740458015267, "no_speech_prob": 0.11045863479375839}, + {"id": 21, "seek": 7880, "start": 94.47999999999999, "end": 95.84, "text": " Works + great works great.", "tokens": [51148, 27914, 869, 1985, 869, 13, 51216], "temperature": + 0.0, "avg_logprob": -0.26173441863257035, "compression_ratio": 1.713740458015267, + "no_speech_prob": 0.11045863479375839}, 
{"id": 22, "seek": 7880, "start": 95.84, + "end": 101.12, "text": " So yeah super excited to be back here and you know kind + of talk about some of the work", "tokens": [51216, 407, 1338, 1687, 2919, 281, 312, + 646, 510, 293, 291, 458, 733, 295, 751, 466, 512, 295, 264, 589, 51480], "temperature": + 0.0, "avg_logprob": -0.26173441863257035, "compression_ratio": 1.713740458015267, + "no_speech_prob": 0.11045863479375839}, {"id": 23, "seek": 7880, "start": 101.12, + "end": 103.67999999999999, "text": " that we''ve been doing with the open search + community.", "tokens": [51480, 300, 321, 600, 668, 884, 365, 264, 1269, 3164, 1768, + 13, 51608], "temperature": 0.0, "avg_logprob": -0.26173441863257035, "compression_ratio": + 1.713740458015267, "no_speech_prob": 0.11045863479375839}, {"id": 24, "seek": 7880, + "start": 103.67999999999999, "end": 105.28, "text": " So exciting.", "tokens": [51608, + 407, 4670, 13, 51688], "temperature": 0.0, "avg_logprob": -0.26173441863257035, + "compression_ratio": 1.713740458015267, "no_speech_prob": 0.11045863479375839}, + {"id": 25, "seek": 7880, "start": 105.28, "end": 107.08, "text": " Yes.", "tokens": + [51688, 1079, 13, 51778], "temperature": 0.0, "avg_logprob": -0.26173441863257035, + "compression_ratio": 1.713740458015267, "no_speech_prob": 0.11045863479375839}, + {"id": 26, "seek": 10708, "start": 107.08, "end": 109.12, "text": " And Daniel welcome.", + "tokens": [50364, 400, 8033, 2928, 13, 50466], "temperature": 0.0, "avg_logprob": + -0.30384905745343466, "compression_ratio": 1.4292682926829268, "no_speech_prob": + 0.029049118980765343}, {"id": 27, "seek": 10708, "start": 109.12, "end": 111.88, + "text": " Can you say a few words about yourself your background?", "tokens": [50466, + 1664, 291, 584, 257, 1326, 2283, 466, 1803, 428, 3678, 30, 50604], "temperature": + 0.0, "avg_logprob": -0.30384905745343466, "compression_ratio": 1.4292682926829268, + "no_speech_prob": 0.029049118980765343}, {"id": 28, "seek": 10708, "start": 
111.88, + "end": 114.0, "text": " Absolutely yeah thanks.", "tokens": [50604, 7021, 1338, + 3231, 13, 50710], "temperature": 0.0, "avg_logprob": -0.30384905745343466, "compression_ratio": + 1.4292682926829268, "no_speech_prob": 0.029049118980765343}, {"id": 29, "seek": + 10708, "start": 114.0, "end": 115.0, "text": " It''s great to be here.", "tokens": + [50710, 467, 311, 869, 281, 312, 510, 13, 50760], "temperature": 0.0, "avg_logprob": + -0.30384905745343466, "compression_ratio": 1.4292682926829268, "no_speech_prob": + 0.029049118980765343}, {"id": 30, "seek": 10708, "start": 115.0, "end": 119.52, + "text": " I''m super excited maybe a little nervous but I''m sure it''ll be fun.", + "tokens": [50760, 286, 478, 1687, 2919, 1310, 257, 707, 6296, 457, 286, 478, 988, + 309, 603, 312, 1019, 13, 50986], "temperature": 0.0, "avg_logprob": -0.30384905745343466, + "compression_ratio": 1.4292682926829268, "no_speech_prob": 0.029049118980765343}, + {"id": 31, "seek": 10708, "start": 119.52, "end": 123.2, "text": " So I''m Daniel + I''m with open source connections.", "tokens": [50986, 407, 286, 478, 8033, 286, + 478, 365, 1269, 4009, 9271, 13, 51170], "temperature": 0.0, "avg_logprob": -0.30384905745343466, + "compression_ratio": 1.4292682926829268, "no_speech_prob": 0.029049118980765343}, + {"id": 32, "seek": 10708, "start": 123.2, "end": 129.48, "text": " I started out + as a search consultant back in May 2012.", "tokens": [51170, 286, 1409, 484, 382, + 257, 3164, 24676, 646, 294, 1891, 9125, 13, 51484], "temperature": 0.0, "avg_logprob": + -0.30384905745343466, "compression_ratio": 1.4292682926829268, "no_speech_prob": + 0.029049118980765343}, {"id": 33, "seek": 12948, "start": 129.48, "end": 140.35999999999999, + "text": " So almost 13 years now and I''m here to share some of our experiences + that we made in our", "tokens": [50364, 407, 1920, 3705, 924, 586, 293, 286, 478, + 510, 281, 2073, 512, 295, 527, 5235, 300, 321, 1027, 294, 527, 50908], "temperature": + 0.0, 
"avg_logprob": -0.20429320335388185, "compression_ratio": 1.6462264150943395, + "no_speech_prob": 0.11860533803701401}, {"id": 34, "seek": 12948, "start": 140.35999999999999, + "end": 146.6, "text": " most recent project together with the folks of open search + when it comes to hybrid search", "tokens": [50908, 881, 5162, 1716, 1214, 365, 264, + 4024, 295, 1269, 3164, 562, 309, 1487, 281, 13051, 3164, 51220], "temperature": + 0.0, "avg_logprob": -0.20429320335388185, "compression_ratio": 1.6462264150943395, + "no_speech_prob": 0.11860533803701401}, {"id": 35, "seek": 12948, "start": 146.6, + "end": 153.28, "text": " how to optimize hybrid search and also what''s necessary + to optimize hybrid search namely", "tokens": [51220, 577, 281, 19719, 13051, 3164, + 293, 611, 437, 311, 4818, 281, 19719, 13051, 3164, 20926, 51554], "temperature": + 0.0, "avg_logprob": -0.20429320335388185, "compression_ratio": 1.6462264150943395, + "no_speech_prob": 0.11860533803701401}, {"id": 36, "seek": 12948, "start": 153.28, + "end": 158.88, "text": " query sets and judgments but I''m sure we''ll get into + that in a couple of seconds.", "tokens": [51554, 14581, 6352, 293, 40337, 457, 286, + 478, 988, 321, 603, 483, 666, 300, 294, 257, 1916, 295, 3949, 13, 51834], "temperature": + 0.0, "avg_logprob": -0.20429320335388185, "compression_ratio": 1.6462264150943395, + "no_speech_prob": 0.11860533803701401}, {"id": 37, "seek": 15888, "start": 158.88, + "end": 160.28, "text": " Yeah thanks Daniel.", "tokens": [50364, 865, 3231, 8033, + 13, 50434], "temperature": 0.0, "avg_logprob": -0.3505380980822505, "compression_ratio": + 1.5956521739130434, "no_speech_prob": 0.039369069039821625}, {"id": 38, "seek": + 15888, "start": 160.28, "end": 167.35999999999999, "text": " I''m also nervous but + I also know that you know when I release in the episodes I enjoy them.", "tokens": + [50434, 286, 478, 611, 6296, 457, 286, 611, 458, 300, 291, 458, 562, 286, 4374, + 294, 264, 9313, 286, 2103, 552, 13, 
50788], "temperature": 0.0, "avg_logprob": -0.3505380980822505, + "compression_ratio": 1.5956521739130434, "no_speech_prob": 0.039369069039821625}, + {"id": 39, "seek": 15888, "start": 167.35999999999999, "end": 169.76, "text": " + It''s just it''s just fun really.", "tokens": [50788, 467, 311, 445, 309, 311, 445, + 1019, 534, 13, 50908], "temperature": 0.0, "avg_logprob": -0.3505380980822505, "compression_ratio": + 1.5956521739130434, "no_speech_prob": 0.039369069039821625}, {"id": 40, "seek": + 15888, "start": 169.76, "end": 174.12, "text": " So I was thinking like hybrid search + yeah we did discuss and I think community discusses", "tokens": [50908, 407, 286, + 390, 1953, 411, 13051, 3164, 1338, 321, 630, 2248, 293, 286, 519, 1768, 2248, 279, + 51126], "temperature": 0.0, "avg_logprob": -0.3505380980822505, "compression_ratio": + 1.5956521739130434, "no_speech_prob": 0.039369069039821625}, {"id": 41, "seek": + 15888, "start": 174.12, "end": 176.76, "text": " it at large and various forums.", + "tokens": [51126, 309, 412, 2416, 293, 3683, 26998, 13, 51258], "temperature": 0.0, + "avg_logprob": -0.3505380980822505, "compression_ratio": 1.5956521739130434, "no_speech_prob": + 0.039369069039821625}, {"id": 42, "seek": 15888, "start": 176.76, "end": 182.04, + "text": " Erick also reminded me of the episode with Alessandro Benindetti that + we just did it really", "tokens": [51258, 3300, 618, 611, 15920, 385, 295, 264, + 3500, 365, 967, 442, 29173, 3964, 471, 12495, 300, 321, 445, 630, 309, 534, 51522], + "temperature": 0.0, "avg_logprob": -0.3505380980822505, "compression_ratio": 1.5956521739130434, + "no_speech_prob": 0.039369069039821625}, {"id": 43, "seek": 15888, "start": 182.04, + "end": 183.04, "text": " was worth.", "tokens": [51522, 390, 3163, 13, 51572], "temperature": + 0.0, "avg_logprob": -0.3505380980822505, "compression_ratio": 1.5956521739130434, + "no_speech_prob": 0.039369069039821625}, {"id": 44, "seek": 18304, "start": 184.04, + "end": 190.64, 
"text": " Yeah I was really just curious maybe step back from that + topic a little bit and discuss", "tokens": [50414, 865, 286, 390, 534, 445, 6369, + 1310, 1823, 646, 490, 300, 4829, 257, 707, 857, 293, 2248, 50744], "temperature": + 0.0, "avg_logprob": -0.18696778615315754, "compression_ratio": 1.5869565217391304, + "no_speech_prob": 0.072663314640522}, {"id": 45, "seek": 18304, "start": 190.64, + "end": 195.23999999999998, "text": " the importance of hybrid search and what is + it in your own words where do you see value", "tokens": [50744, 264, 7379, 295, + 13051, 3164, 293, 437, 307, 309, 294, 428, 1065, 2283, 689, 360, 291, 536, 2158, + 50974], "temperature": 0.0, "avg_logprob": -0.18696778615315754, "compression_ratio": + 1.5869565217391304, "no_speech_prob": 0.072663314640522}, {"id": 46, "seek": 18304, + "start": 195.23999999999998, "end": 200.92, "text": " for it compared to how we + used to do search before.", "tokens": [50974, 337, 309, 5347, 281, 577, 321, 1143, + 281, 360, 3164, 949, 13, 51258], "temperature": 0.0, "avg_logprob": -0.18696778615315754, + "compression_ratio": 1.5869565217391304, "no_speech_prob": 0.072663314640522}, {"id": + 47, "seek": 18304, "start": 200.92, "end": 203.6, "text": " You want to take it + Daniel and then I''ll follow up.", "tokens": [51258, 509, 528, 281, 747, 309, 8033, + 293, 550, 286, 603, 1524, 493, 13, 51392], "temperature": 0.0, "avg_logprob": -0.18696778615315754, + "compression_ratio": 1.5869565217391304, "no_speech_prob": 0.072663314640522}, {"id": + 48, "seek": 18304, "start": 203.6, "end": 211.32, "text": " Sure yeah so I think + we see hybrid search especially in this project as let''s say the", "tokens": [51392, + 4894, 1338, 370, 286, 519, 321, 536, 13051, 3164, 2318, 294, 341, 1716, 382, 718, + 311, 584, 264, 51778], "temperature": 0.0, "avg_logprob": -0.18696778615315754, + "compression_ratio": 1.5869565217391304, "no_speech_prob": 0.072663314640522}, {"id": + 49, "seek": 21132, "start": 211.32, "end": 
218.4, "text": " process of blending + traditional keyword search and also let''s say modern search approaches", "tokens": + [50364, 1399, 295, 23124, 5164, 20428, 3164, 293, 611, 718, 311, 584, 4363, 3164, + 11587, 50718], "temperature": 0.0, "avg_logprob": -0.280281497586158, "compression_ratio": + 1.5730337078651686, "no_speech_prob": 0.0051127891056239605}, {"id": 50, "seek": + 21132, "start": 218.4, "end": 227.56, "text": " based on language more or mostly + called either vector search or neuro search and I think", "tokens": [50718, 2361, + 322, 2856, 544, 420, 5240, 1219, 2139, 8062, 3164, 420, 16499, 3164, 293, 286, 519, + 51176], "temperature": 0.0, "avg_logprob": -0.280281497586158, "compression_ratio": + 1.5730337078651686, "no_speech_prob": 0.0051127891056239605}, {"id": 51, "seek": + 21132, "start": 227.56, "end": 235.4, "text": " the benefits of it are probably + you follow it or I guess you can you can group them into", "tokens": [51176, 264, + 5311, 295, 309, 366, 1391, 291, 1524, 309, 420, 286, 2041, 291, 393, 291, 393, 1594, + 552, 666, 51568], "temperature": 0.0, "avg_logprob": -0.280281497586158, "compression_ratio": + 1.5730337078651686, "no_speech_prob": 0.0051127891056239605}, {"id": 52, "seek": + 21132, "start": 235.4, "end": 236.4, "text": " two groups.", "tokens": [51568, 732, + 3935, 13, 51618], "temperature": 0.0, "avg_logprob": -0.280281497586158, "compression_ratio": + 1.5730337078651686, "no_speech_prob": 0.0051127891056239605}, {"id": 53, "seek": + 23640, "start": 236.96, "end": 243.84, "text": " Looking at the end user we always + want to provide the end users with the most or the", "tokens": [50392, 11053, 412, + 264, 917, 4195, 321, 1009, 528, 281, 2893, 264, 917, 5022, 365, 264, 881, 420, 264, + 50736], "temperature": 0.0, "avg_logprob": -0.18179401397705078, "compression_ratio": + 1.6495327102803738, "no_speech_prob": 0.030946411192417145}, {"id": 54, "seek": + 23640, "start": 243.84, "end": 251.04000000000002, "text": " highest 
quality results + right so search result quality is what we strive for and traditional", "tokens": + [50736, 6343, 3125, 3542, 558, 370, 3164, 1874, 3125, 307, 437, 321, 23829, 337, + 293, 5164, 51096], "temperature": 0.0, "avg_logprob": -0.18179401397705078, "compression_ratio": + 1.6495327102803738, "no_speech_prob": 0.030946411192417145}, {"id": 55, "seek": + 23640, "start": 251.04000000000002, "end": 258.36, "text": " keyword search always + lacks of let''s say finding related things that may not really contain", "tokens": + [51096, 20428, 3164, 1009, 31132, 295, 718, 311, 584, 5006, 4077, 721, 300, 815, + 406, 534, 5304, 51462], "temperature": 0.0, "avg_logprob": -0.18179401397705078, + "compression_ratio": 1.6495327102803738, "no_speech_prob": 0.030946411192417145}, + {"id": 56, "seek": 23640, "start": 258.36, "end": 265.44, "text": " the specific + words but similar so laptop and notebook is an example that I think we", "tokens": + [51462, 264, 2685, 2283, 457, 2531, 370, 10732, 293, 21060, 307, 364, 1365, 300, + 286, 519, 321, 51816], "temperature": 0.0, "avg_logprob": -0.18179401397705078, + "compression_ratio": 1.6495327102803738, "no_speech_prob": 0.030946411192417145}, + {"id": 57, "seek": 26544, "start": 265.44, "end": 273.24, "text": " ran probably + a million times in demos maybe even more than a million times so if notebook", "tokens": + [50364, 5872, 1391, 257, 2459, 1413, 294, 33788, 1310, 754, 544, 813, 257, 2459, + 1413, 370, 498, 21060, 50754], "temperature": 0.0, "avg_logprob": -0.1342410099359206, + "compression_ratio": 1.6515837104072397, "no_speech_prob": 0.004714095965027809}, + {"id": 58, "seek": 26544, "start": 273.24, "end": 278.16, "text": " is not in my + product description I will not find it when I search for laptop and the other", + "tokens": [50754, 307, 406, 294, 452, 1674, 3855, 286, 486, 406, 915, 309, 562, + 286, 3164, 337, 10732, 293, 264, 661, 51000], "temperature": 0.0, "avg_logprob": + -0.1342410099359206, 
"compression_ratio": 1.6515837104072397, "no_speech_prob": + 0.004714095965027809}, {"id": 59, "seek": 26544, "start": 278.16, "end": 286.6, + "text": " way around and that''s where let''s say blending the two techniques really + shine because it enables", "tokens": [51000, 636, 926, 293, 300, 311, 689, 718, + 311, 584, 23124, 264, 732, 7512, 534, 12207, 570, 309, 17077, 51422], "temperature": + 0.0, "avg_logprob": -0.1342410099359206, "compression_ratio": 1.6515837104072397, + "no_speech_prob": 0.004714095965027809}, {"id": 60, "seek": 26544, "start": 286.6, + "end": 293.24, "text": " you to not only find where your keywords are in but also + find related stuff to augment", "tokens": [51422, 291, 281, 406, 787, 915, 689, + 428, 21009, 366, 294, 457, 611, 915, 4077, 1507, 281, 29919, 51754], "temperature": + 0.0, "avg_logprob": -0.1342410099359206, "compression_ratio": 1.6515837104072397, + "no_speech_prob": 0.004714095965027809}, {"id": 61, "seek": 29324, "start": 293.48, + "end": 301.64, "text": " the result set and I think that with that with that large + benefit of course come a lot of", "tokens": [50376, 264, 1874, 992, 293, 286, 519, + 300, 365, 300, 365, 300, 2416, 5121, 295, 1164, 808, 257, 688, 295, 50784], "temperature": + 0.0, "avg_logprob": -0.1804484110029917, "compression_ratio": 1.680473372781065, + "no_speech_prob": 0.0049310545437037945}, {"id": 62, "seek": 29324, "start": 301.64, + "end": 309.72, "text": " challenges because it always is let''s say non-tribal how + to actually blend the traditional techniques", "tokens": [50784, 4759, 570, 309, + 1009, 307, 718, 311, 584, 2107, 12, 83, 2024, 304, 577, 281, 767, 10628, 264, 5164, + 7512, 51188], "temperature": 0.0, "avg_logprob": -0.1804484110029917, "compression_ratio": + 1.680473372781065, "no_speech_prob": 0.0049310545437037945}, {"id": 63, "seek": + 29324, "start": 309.72, "end": 317.08, "text": " and the more modern techniques + so that''s where the challenge between or the challenge behind", 
"tokens": [51188, + 293, 264, 544, 4363, 7512, 370, 300, 311, 689, 264, 3430, 1296, 420, 264, 3430, + 2261, 51556], "temperature": 0.0, "avg_logprob": -0.1804484110029917, "compression_ratio": + 1.680473372781065, "no_speech_prob": 0.0049310545437037945}, {"id": 64, "seek": + 31708, "start": 317.15999999999997, "end": 323.47999999999996, "text": " hybrid + search actually lies. I mentioned two groups for which there are benefits of the + end user", "tokens": [50368, 13051, 3164, 767, 9134, 13, 286, 2835, 732, 3935, 337, + 597, 456, 366, 5311, 295, 264, 917, 4195, 50684], "temperature": 0.0, "avg_logprob": + -0.11517394343508949, "compression_ratio": 1.7395348837209301, "no_speech_prob": + 0.007801172323524952}, {"id": 65, "seek": 31708, "start": 323.47999999999996, "end": + 328.44, "text": " we want to provide the end user with the highest quality results + that''s one group the other", "tokens": [50684, 321, 528, 281, 2893, 264, 917, 4195, + 365, 264, 6343, 3125, 3542, 300, 311, 472, 1594, 264, 661, 50932], "temperature": + 0.0, "avg_logprob": -0.11517394343508949, "compression_ratio": 1.7395348837209301, + "no_speech_prob": 0.007801172323524952}, {"id": 66, "seek": 31708, "start": 328.44, + "end": 335.4, "text": " group is of course we as the ones providing search applications + I mean we somehow need to profit", "tokens": [50932, 1594, 307, 295, 1164, 321, + 382, 264, 2306, 6530, 3164, 5821, 286, 914, 321, 6063, 643, 281, 7475, 51280], "temperature": + 0.0, "avg_logprob": -0.11517394343508949, "compression_ratio": 1.7395348837209301, + "no_speech_prob": 0.007801172323524952}, {"id": 67, "seek": 31708, "start": 335.4, + "end": 344.28, "text": " from providing better results and it then is always different + or yeah different in which", "tokens": [51280, 490, 6530, 1101, 3542, 293, 309, + 550, 307, 1009, 819, 420, 1338, 819, 294, 597, 51724], "temperature": 0.0, "avg_logprob": + -0.11517394343508949, "compression_ratio": 1.7395348837209301, "no_speech_prob": + 
0.007801172323524952}, {"id": 68, "seek": 34428, "start": 344.28, "end": 349.79999999999995, + "text": " let''s say scenario in which industry we are working so the monosperm + transparent one is always", "tokens": [50364, 718, 311, 584, 9005, 294, 597, 3518, + 321, 366, 1364, 370, 264, 1108, 329, 610, 76, 12737, 472, 307, 1009, 50640], "temperature": + 0.0, "avg_logprob": -0.18972611701351472, "compression_ratio": 1.7149122807017543, + "no_speech_prob": 0.0008283494389615953}, {"id": 69, "seek": 34428, "start": 349.79999999999995, + "end": 357.23999999999995, "text": " e-commerce the easier the end user the consumer + actually finds stuff in your online shop the easier", "tokens": [50640, 308, 12, + 26926, 264, 3571, 264, 917, 4195, 264, 9711, 767, 10704, 1507, 294, 428, 2950, 3945, + 264, 3571, 51012], "temperature": 0.0, "avg_logprob": -0.18972611701351472, "compression_ratio": + 1.7149122807017543, "no_speech_prob": 0.0008283494389615953}, {"id": 70, "seek": + 34428, "start": 357.23999999999995, "end": 363.23999999999995, "text": " is for + them to buy stuff if they buy more stuff more easily of course we generate more + revenue and", "tokens": [51012, 307, 337, 552, 281, 2256, 1507, 498, 436, 2256, + 544, 1507, 544, 3612, 295, 1164, 321, 8460, 544, 9324, 293, 51312], "temperature": + 0.0, "avg_logprob": -0.18972611701351472, "compression_ratio": 1.7149122807017543, + "no_speech_prob": 0.0008283494389615953}, {"id": 71, "seek": 34428, "start": 363.23999999999995, + "end": 371.96, "text": " that''s kind of the benefit then that comes with providing + better search results and the other way", "tokens": [51312, 300, 311, 733, 295, + 264, 5121, 550, 300, 1487, 365, 6530, 1101, 3164, 3542, 293, 264, 661, 636, 51748], + "temperature": 0.0, "avg_logprob": -0.18972611701351472, "compression_ratio": 1.7149122807017543, + "no_speech_prob": 0.0008283494389615953}, {"id": 72, "seek": 37196, "start": 371.96, + "end": 381.32, "text": " is we don''t want to let''s say 
manually tune systems let''s + say indefinitely so of course I can", "tokens": [50364, 307, 321, 500, 380, 528, + 281, 718, 311, 584, 16945, 10864, 3652, 718, 311, 584, 24162, 10925, 370, 295, 1164, + 286, 393, 50832], "temperature": 0.0, "avg_logprob": -0.07885779937108357, "compression_ratio": + 1.5567567567567568, "no_speech_prob": 0.0034359849523752928}, {"id": 73, "seek": + 37196, "start": 381.32, "end": 391.4, "text": " go ahead and say laptop is synonymous + to notebook and PC is maybe broader term of laptop and rules", "tokens": [50832, + 352, 2286, 293, 584, 10732, 307, 5451, 18092, 281, 21060, 293, 6465, 307, 1310, + 13227, 1433, 295, 10732, 293, 4474, 51336], "temperature": 0.0, "avg_logprob": -0.07885779937108357, + "compression_ratio": 1.5567567567567568, "no_speech_prob": 0.0034359849523752928}, + {"id": 74, "seek": 37196, "start": 391.4, "end": 399.79999999999995, "text": " like + these but that''s kind of work that is never done if I have a changing catalog that + I don''t", "tokens": [51336, 411, 613, 457, 300, 311, 733, 295, 589, 300, 307, 1128, + 1096, 498, 286, 362, 257, 4473, 19746, 300, 286, 500, 380, 51756], "temperature": + 0.0, "avg_logprob": -0.07885779937108357, "compression_ratio": 1.5567567567567568, + "no_speech_prob": 0.0034359849523752928}, {"id": 75, "seek": 39980, "start": 400.44, + "end": 407.24, "text": " know old products get thrown out of the product catalog + new products arrive so it''s a never-ending", "tokens": [50396, 458, 1331, 3383, + 483, 11732, 484, 295, 264, 1674, 19746, 777, 3383, 8881, 370, 309, 311, 257, 1128, + 12, 2029, 50736], "temperature": 0.0, "avg_logprob": -0.1322068590106386, "compression_ratio": + 1.6010928961748634, "no_speech_prob": 0.0010116742923855782}, {"id": 76, "seek": + 39980, "start": 407.24, "end": 416.04, "text": " challenge for me and I don''t want + to let''s say spend my work for us always manually hunting these", "tokens": [50736, + 3430, 337, 385, 293, 286, 500, 380, 528, 281, 718, 311, 
584, 3496, 452, 589, 337, + 505, 1009, 16945, 12599, 613, 51176], "temperature": 0.0, "avg_logprob": -0.1322068590106386, + "compression_ratio": 1.6010928961748634, "no_speech_prob": 0.0010116742923855782}, + {"id": 77, "seek": 39980, "start": 416.04, "end": 422.12, "text": " rules and thinking + about what made the users mean when they start for something I want something", + "tokens": [51176, 4474, 293, 1953, 466, 437, 1027, 264, 5022, 914, 562, 436, 722, + 337, 746, 286, 528, 746, 51480], "temperature": 0.0, "avg_logprob": -0.1322068590106386, + "compression_ratio": 1.6010928961748634, "no_speech_prob": 0.0010116742923855782}, + {"id": 78, "seek": 42212, "start": 423.08, "end": 430.68, "text": " let''s say intelligently + looking for the right things in my index and that''s what the neural part", "tokens": + [50412, 718, 311, 584, 5613, 2276, 1237, 337, 264, 558, 721, 294, 452, 8186, 293, + 300, 311, 437, 264, 18161, 644, 50792], "temperature": 0.0, "avg_logprob": -0.19416969541519408, + "compression_ratio": 1.5833333333333333, "no_speech_prob": 0.0043699066154658794}, + {"id": 79, "seek": 42212, "start": 430.68, "end": 439.32, "text": " of hybrid search + enables us so I think these are definitely maybe the two groups that benefit", "tokens": + [50792, 295, 13051, 3164, 17077, 505, 370, 286, 519, 613, 366, 2138, 1310, 264, + 732, 3935, 300, 5121, 51224], "temperature": 0.0, "avg_logprob": -0.19416969541519408, + "compression_ratio": 1.5833333333333333, "no_speech_prob": 0.0043699066154658794}, + {"id": 80, "seek": 42212, "start": 439.32, "end": 445.72, "text": " and how these + two groups benefit from my perspective yeah that''s really good intro Ericy water", + "tokens": [51224, 293, 577, 613, 732, 3935, 5121, 490, 452, 4585, 1338, 300, 311, + 534, 665, 12897, 9336, 88, 1281, 51544], "temperature": 0.0, "avg_logprob": -0.19416969541519408, + "compression_ratio": 1.5833333333333333, "no_speech_prob": 0.0043699066154658794}, + {"id": 81, "seek": 44572, "start": 
445.8, "end": 452.36, "text": " take it yeah + I think it''s an interesting journey that we''ve been on the last few years and + I", "tokens": [50368, 747, 309, 1338, 286, 519, 309, 311, 364, 1880, 4671, 300, + 321, 600, 668, 322, 264, 1036, 1326, 924, 293, 286, 50696], "temperature": 0.0, + "avg_logprob": -0.11472883678617932, "compression_ratio": 1.6753246753246753, "no_speech_prob": + 0.002358798636123538}, {"id": 82, "seek": 44572, "start": 452.36, "end": 459.40000000000003, + "text": " sort of look at hybrid search as a little bit of a like a course correction + right so keyword search", "tokens": [50696, 1333, 295, 574, 412, 13051, 3164, 382, + 257, 707, 857, 295, 257, 411, 257, 1164, 19984, 558, 370, 20428, 3164, 51048], "temperature": + 0.0, "avg_logprob": -0.11472883678617932, "compression_ratio": 1.6753246753246753, + "no_speech_prob": 0.002358798636123538}, {"id": 83, "seek": 44572, "start": 459.40000000000003, + "end": 465.88000000000005, "text": " been around forever well understood frustrations + are well known and then vectors came out and all", "tokens": [51048, 668, 926, 5680, + 731, 7320, 7454, 12154, 366, 731, 2570, 293, 550, 18875, 1361, 484, 293, 439, 51372], + "temperature": 0.0, "avg_logprob": -0.11472883678617932, "compression_ratio": 1.6753246753246753, + "no_speech_prob": 0.002358798636123538}, {"id": 84, "seek": 44572, "start": 465.88000000000005, + "end": 471.88000000000005, "text": " these new products these new vector databases + everybody was really excited about them and we all", "tokens": [51372, 613, 777, + 3383, 613, 777, 8062, 22380, 2201, 390, 534, 2919, 466, 552, 293, 321, 439, 51672], + "temperature": 0.0, "avg_logprob": -0.11472883678617932, "compression_ratio": 1.6753246753246753, + "no_speech_prob": 0.002358798636123538}, {"id": 85, "seek": 47188, "start": 471.88, + "end": 477.64, "text": " said oh okay let''s go use vectors and we leapt on that + and got really excited built everything", "tokens": [50364, 848, 1954, 1392, 
718, + 311, 352, 764, 18875, 293, 321, 476, 2796, 322, 300, 293, 658, 534, 2919, 3094, + 1203, 50652], "temperature": 0.0, "avg_logprob": -0.09517174232296827, "compression_ratio": + 1.7570093457943925, "no_speech_prob": 0.0008602467132732272}, {"id": 86, "seek": + 47188, "start": 477.64, "end": 486.28, "text": " using vectors and I think maybe + we went too far that way over into vector land and we started after", "tokens": + [50652, 1228, 18875, 293, 286, 519, 1310, 321, 1437, 886, 1400, 300, 636, 670, 666, + 8062, 2117, 293, 321, 1409, 934, 51084], "temperature": 0.0, "avg_logprob": -0.09517174232296827, + "compression_ratio": 1.7570093457943925, "no_speech_prob": 0.0008602467132732272}, + {"id": 87, "seek": 47188, "start": 486.28, "end": 492.36, "text": " we started getting + some experience with vectors we started realizing some of the problems with it", + "tokens": [51084, 321, 1409, 1242, 512, 1752, 365, 18875, 321, 1409, 16734, 512, + 295, 264, 2740, 365, 309, 51388], "temperature": 0.0, "avg_logprob": -0.09517174232296827, + "compression_ratio": 1.7570093457943925, "no_speech_prob": 0.0008602467132732272}, + {"id": 88, "seek": 47188, "start": 492.36, "end": 498.44, "text": " right like doesn''t + matter what you query you''re gonna get some search results right", "tokens": [51388, + 558, 411, 1177, 380, 1871, 437, 291, 14581, 291, 434, 799, 483, 512, 3164, 3542, + 558, 51692], "temperature": 0.0, "avg_logprob": -0.09517174232296827, "compression_ratio": + 1.7570093457943925, "no_speech_prob": 0.0008602467132732272}, {"id": 89, "seek": + 49844, "start": 499.4, "end": 507.48, "text": " sometimes zero search results is + the right answer right you know interesting challenges around", "tokens": [50412, + 2171, 4018, 3164, 3542, 307, 264, 558, 1867, 558, 291, 458, 1880, 4759, 926, 50816], + "temperature": 0.0, "avg_logprob": -0.1020694308810764, "compression_ratio": 1.742081447963801, + "no_speech_prob": 0.001348108984529972}, {"id": 90, "seek": 49844, "start": 
508.12, + "end": 514.04, "text": " you know faceting or pagination or highlighting can be + weird right so you know I think that there", "tokens": [50848, 291, 458, 1915, 9880, + 420, 11812, 2486, 420, 26551, 393, 312, 3657, 558, 370, 291, 458, 286, 519, 300, + 456, 51144], "temperature": 0.0, "avg_logprob": -0.1020694308810764, "compression_ratio": + 1.742081447963801, "no_speech_prob": 0.001348108984529972}, {"id": 91, "seek": 49844, + "start": 514.04, "end": 519.08, "text": " are some definite challenges in vectors + and we all went over that way and I think we''ve seen it", "tokens": [51144, 366, + 512, 25131, 4759, 294, 18875, 293, 321, 439, 1437, 670, 300, 636, 293, 286, 519, + 321, 600, 1612, 309, 51396], "temperature": 0.0, "avg_logprob": -0.1020694308810764, + "compression_ratio": 1.742081447963801, "no_speech_prob": 0.001348108984529972}, + {"id": 92, "seek": 49844, "start": 519.08, "end": 528.36, "text": " in the last + two years where all the vector databases were frantically adding keyword like search", + "tokens": [51396, 294, 264, 1036, 732, 924, 689, 439, 264, 8062, 22380, 645, 431, + 49505, 5127, 20428, 411, 3164, 51860], "temperature": 0.0, "avg_logprob": -0.1020694308810764, + "compression_ratio": 1.742081447963801, "no_speech_prob": 0.001348108984529972}, + {"id": 93, "seek": 52836, "start": 529.16, "end": 538.44, "text": " and all of the + keyword search indexes were all frantically adding vectors okay now we have these", + "tokens": [50404, 293, 439, 295, 264, 20428, 3164, 8186, 279, 645, 439, 431, 49505, + 5127, 18875, 1392, 586, 321, 362, 613, 50868], "temperature": 0.0, "avg_logprob": + -0.13279681735568577, "compression_ratio": 1.5826771653543308, "no_speech_prob": + 0.0006227812846191227}, {"id": 94, "seek": 52836, "start": 538.44, "end": 547.32, + "text": " things as like where do we go oh hybrid search right hybrid search and + you know hybrid search popped out", "tokens": [50868, 721, 382, 411, 689, 360, 321, + 352, 1954, 13051, 3164, 
558, 13051, 3164, 293, 291, 458, 13051, 3164, 21545, 484, + 51312], "temperature": 0.0, "avg_logprob": -0.13279681735568577, "compression_ratio": + 1.5826771653543308, "no_speech_prob": 0.0006227812846191227}, {"id": 95, "seek": + 54732, "start": 547.4000000000001, "end": 559.88, "text": " and you know hear me + out I think hybrid search is just good old federated search from the late 90s", + "tokens": [50368, 293, 291, 458, 1568, 385, 484, 286, 519, 13051, 3164, 307, 445, + 665, 1331, 38024, 770, 3164, 490, 264, 3469, 4289, 82, 50992], "temperature": 0.0, + "avg_logprob": -0.10870110470315685, "compression_ratio": 1.5956284153005464, "no_speech_prob": + 0.0003426594485063106}, {"id": 96, "seek": 54732, "start": 559.88, "end": 568.2, + "text": " and 2000s where you had two search engines with two we send out two queries + and then you brought", "tokens": [50992, 293, 8132, 82, 689, 291, 632, 732, 3164, + 12982, 365, 732, 321, 2845, 484, 732, 24109, 293, 550, 291, 3038, 51408], "temperature": + 0.0, "avg_logprob": -0.10870110470315685, "compression_ratio": 1.5956284153005464, + "no_speech_prob": 0.0003426594485063106}, {"id": 97, "seek": 54732, "start": 568.2, + "end": 574.2, "text": " them back and you''re like how do I merge them together + and sometimes you do terrible things like", "tokens": [51408, 552, 646, 293, 291, + 434, 411, 577, 360, 286, 22183, 552, 1214, 293, 2171, 291, 360, 6237, 721, 411, + 51708], "temperature": 0.0, "avg_logprob": -0.10870110470315685, "compression_ratio": + 1.5956284153005464, "no_speech_prob": 0.0003426594485063106}, {"id": 98, "seek": + 57420, "start": 574.2, "end": 582.84, "text": " two lists of results right we was + sometimes we would try to link them up together um it''s the", "tokens": [50364, + 732, 14511, 295, 3542, 558, 321, 390, 2171, 321, 576, 853, 281, 2113, 552, 493, + 1214, 1105, 309, 311, 264, 50796], "temperature": 0.0, "avg_logprob": -0.10777336718088174, + "compression_ratio": 1.799043062200957, "no_speech_prob": 
0.0002078502147924155}, + {"id": 99, "seek": 57420, "start": 582.84, "end": 588.84, "text": " same idea whether + you''re going to one search engine you''re making a keyword search and a neural", + "tokens": [50796, 912, 1558, 1968, 291, 434, 516, 281, 472, 3164, 2848, 291, 434, + 1455, 257, 20428, 3164, 293, 257, 18161, 51096], "temperature": 0.0, "avg_logprob": + -0.10777336718088174, "compression_ratio": 1.799043062200957, "no_speech_prob": + 0.0002078502147924155}, {"id": 100, "seek": 57420, "start": 588.84, "end": 593.6400000000001, + "text": " search and bring them together or two totally separate see keyword search + engines you''re still", "tokens": [51096, 3164, 293, 1565, 552, 1214, 420, 732, + 3879, 4994, 536, 20428, 3164, 12982, 291, 434, 920, 51336], "temperature": 0.0, + "avg_logprob": -0.10777336718088174, "compression_ratio": 1.799043062200957, "no_speech_prob": + 0.0002078502147924155}, {"id": 101, "seek": 57420, "start": 593.6400000000001, "end": + 601.24, "text": " bringing back two lists however I think at least this time around + how to merge the lists of", "tokens": [51336, 5062, 646, 732, 14511, 4461, 286, + 519, 412, 1935, 341, 565, 926, 577, 281, 22183, 264, 14511, 295, 51716], "temperature": + 0.0, "avg_logprob": -0.10777336718088174, "compression_ratio": 1.799043062200957, + "no_speech_prob": 0.0002078502147924155}, {"id": 102, "seek": 60124, "start": 601.24, + "end": 608.6800000000001, "text": " results together seems to be going better than + when we did it back in federated search right", "tokens": [50364, 3542, 1214, 2544, + 281, 312, 516, 1101, 813, 562, 321, 630, 309, 646, 294, 38024, 770, 3164, 558, 50736], + "temperature": 0.0, "avg_logprob": -0.07979805022478104, "compression_ratio": 1.5869565217391304, + "no_speech_prob": 0.0002062526618828997}, {"id": 103, "seek": 60124, "start": 609.4, + "end": 614.52, "text": " uh and I look forward to talking more about like some of + the ways that we bring hybrid you know build", "tokens": 
[50772, 2232, 293, 286, + 574, 2128, 281, 1417, 544, 466, 411, 512, 295, 264, 2098, 300, 321, 1565, 13051, + 291, 458, 1322, 51028], "temperature": 0.0, "avg_logprob": -0.07979805022478104, + "compression_ratio": 1.5869565217391304, "no_speech_prob": 0.0002062526618828997}, + {"id": 104, "seek": 60124, "start": 614.52, "end": 622.36, "text": " our hybrid + results set together um part of me really kind of wonders why ranked reciprocal + fusion", "tokens": [51028, 527, 13051, 3542, 992, 1214, 1105, 644, 295, 385, 534, + 733, 295, 27348, 983, 20197, 46948, 23100, 51420], "temperature": 0.0, "avg_logprob": + -0.07979805022478104, "compression_ratio": 1.5869565217391304, "no_speech_prob": + 0.0002062526618828997}, {"id": 105, "seek": 62236, "start": 623.32, "end": 631.88, + "text": " wasn''t a thing the last time I did federated search back in the 2000s + right like doesn''t seem like", "tokens": [50412, 2067, 380, 257, 551, 264, 1036, + 565, 286, 630, 38024, 770, 3164, 646, 294, 264, 8132, 82, 558, 411, 1177, 380, 1643, + 411, 50840], "temperature": 0.0, "avg_logprob": -0.0960174814860026, "compression_ratio": + 1.598901098901099, "no_speech_prob": 0.0015305147971957922}, {"id": 106, "seek": + 62236, "start": 631.88, "end": 638.2, "text": " that crazy of a concept why didn''t + we do that right but we didn''t uh so I''m a little more optimistic", "tokens": + [50840, 300, 3219, 295, 257, 3410, 983, 994, 380, 321, 360, 300, 558, 457, 321, + 994, 380, 2232, 370, 286, 478, 257, 707, 544, 19397, 51156], "temperature": 0.0, + "avg_logprob": -0.0960174814860026, "compression_ratio": 1.598901098901099, "no_speech_prob": + 0.0015305147971957922}, {"id": 107, "seek": 62236, "start": 638.2, "end": 645.72, + "text": " about the value of it but um it i think hybrid is a little bit of something + old coming back", "tokens": [51156, 466, 264, 2158, 295, 309, 457, 1105, 309, 741, + 519, 13051, 307, 257, 707, 857, 295, 746, 1331, 1348, 646, 51532], "temperature": + 0.0, "avg_logprob": 
-0.0960174814860026, "compression_ratio": 1.598901098901099, + "no_speech_prob": 0.0015305147971957922}, {"id": 108, "seek": 64572, "start": 645.72, + "end": 652.9200000000001, "text": " because we''re back to the same problem I literally + have two search engines two concepts for how to", "tokens": [50364, 570, 321, 434, + 646, 281, 264, 912, 1154, 286, 3736, 362, 732, 3164, 12982, 732, 10392, 337, 577, + 281, 50724], "temperature": 0.0, "avg_logprob": -0.08661473629086516, "compression_ratio": + 1.6008403361344539, "no_speech_prob": 0.0022914668079465628}, {"id": 109, "seek": + 64572, "start": 652.9200000000001, "end": 659.4, "text": " do information retrieval + and yet I want to blend it into one yeah that''s exciting topic I think", "tokens": + [50724, 360, 1589, 19817, 3337, 293, 1939, 286, 528, 281, 10628, 309, 666, 472, + 1338, 300, 311, 4670, 4829, 286, 519, 51048], "temperature": 0.0, "avg_logprob": + -0.08661473629086516, "compression_ratio": 1.6008403361344539, "no_speech_prob": + 0.0022914668079465628}, {"id": 110, "seek": 64572, "start": 660.28, "end": 666.0400000000001, + "text": " to me a hybrid search opens doors beyond sort of what I think Daniel just + explained you know", "tokens": [51092, 281, 385, 257, 13051, 3164, 9870, 8077, 4399, + 1333, 295, 437, 286, 519, 8033, 445, 8825, 291, 458, 51380], "temperature": 0.0, + "avg_logprob": -0.08661473629086516, "compression_ratio": 1.6008403361344539, "no_speech_prob": + 0.0022914668079465628}, {"id": 111, "seek": 64572, "start": 666.0400000000001, "end": + 671.88, "text": " the semantic connection between you know keywords and so on is + where you go multi-model right", "tokens": [51380, 264, 47982, 4984, 1296, 291, + 458, 21009, 293, 370, 322, 307, 689, 291, 352, 4825, 12, 8014, 338, 558, 51672], + "temperature": 0.0, "avg_logprob": -0.08661473629086516, "compression_ratio": 1.6008403361344539, + "no_speech_prob": 0.0022914668079465628}, {"id": 112, "seek": 67188, "start": 671.88, + "end": 677.24, 
"text": " of course you need to go there carefully probably but if + you do miss metadata on a particular you", "tokens": [50364, 295, 1164, 291, 643, + 281, 352, 456, 7500, 1391, 457, 498, 291, 360, 1713, 26603, 322, 257, 1729, 291, + 50632], "temperature": 0.0, "avg_logprob": -0.1272990573536266, "compression_ratio": + 1.7518518518518518, "no_speech_prob": 0.0032136570662260056}, {"id": 113, "seek": + 67188, "start": 677.24, "end": 684.28, "text": " know image on the product you could + reason about it using the image itself and maybe also video", "tokens": [50632, + 458, 3256, 322, 264, 1674, 291, 727, 1778, 466, 309, 1228, 264, 3256, 2564, 293, + 1310, 611, 960, 50984], "temperature": 0.0, "avg_logprob": -0.1272990573536266, + "compression_ratio": 1.7518518518518518, "no_speech_prob": 0.0032136570662260056}, + {"id": 114, "seek": 67188, "start": 684.28, "end": 688.6, "text": " because we have + video alarms as well they''re more expensive of course to run but you know", "tokens": + [50984, 570, 321, 362, 960, 45039, 382, 731, 436, 434, 544, 5124, 295, 1164, 281, + 1190, 457, 291, 458, 51200], "temperature": 0.0, "avg_logprob": -0.1272990573536266, + "compression_ratio": 1.7518518518518518, "no_speech_prob": 0.0032136570662260056}, + {"id": 115, "seek": 67188, "start": 688.6, "end": 694.12, "text": " sky''s the limit + so to say if you want to go there and go um yeah so I think I think in that sense", + "tokens": [51200, 5443, 311, 264, 4948, 370, 281, 584, 498, 291, 528, 281, 352, + 456, 293, 352, 1105, 1338, 370, 286, 519, 286, 519, 294, 300, 2020, 51476], "temperature": + 0.0, "avg_logprob": -0.1272990573536266, "compression_ratio": 1.7518518518518518, + "no_speech_prob": 0.0032136570662260056}, {"id": 116, "seek": 67188, "start": 694.12, + "end": 700.52, "text": " hybrid search uh unlocks many more avenues to explore including + in e-commerce I think right", "tokens": [51476, 13051, 3164, 2232, 517, 34896, 867, + 544, 43039, 281, 6839, 3009, 294, 308, 12, 
26926, 286, 519, 558, 51796], "temperature": + 0.0, "avg_logprob": -0.1272990573536266, "compression_ratio": 1.7518518518518518, + "no_speech_prob": 0.0032136570662260056}, {"id": 117, "seek": 70052, "start": 701.48, + "end": 710.36, "text": " yeah yeah yeah I mean I love that we are actually getting + away from the old just straight up", "tokens": [50412, 1338, 1338, 1338, 286, 914, + 286, 959, 300, 321, 366, 767, 1242, 1314, 490, 264, 1331, 445, 2997, 493, 50856], + "temperature": 0.0, "avg_logprob": -0.09141743838132083, "compression_ratio": 1.6785714285714286, + "no_speech_prob": 0.003222076455131173}, {"id": 118, "seek": 70052, "start": 710.36, + "end": 716.36, "text": " bag of words that was keyword that served us for a long + time but still was just a very rough", "tokens": [50856, 3411, 295, 2283, 300, 390, + 20428, 300, 7584, 505, 337, 257, 938, 565, 457, 920, 390, 445, 257, 588, 5903, 51156], + "temperature": 0.0, "avg_logprob": -0.09141743838132083, "compression_ratio": 1.6785714285714286, + "no_speech_prob": 0.003222076455131173}, {"id": 119, "seek": 70052, "start": 716.36, + "end": 723.56, "text": " approximation of what people want right I mean BM25 you + know people say it''s not even the best", "tokens": [51156, 28023, 295, 437, 561, + 528, 558, 286, 914, 15901, 6074, 291, 458, 561, 584, 309, 311, 406, 754, 264, 1151, + 51516], "temperature": 0.0, "avg_logprob": -0.09141743838132083, "compression_ratio": + 1.6785714285714286, "no_speech_prob": 0.003222076455131173}, {"id": 120, "seek": + 70052, "start": 723.56, "end": 729.72, "text": " algorithm it''s just as fast as + the one that we use uh vectors is sort of this idea of there are", "tokens": [51516, + 9284, 309, 311, 445, 382, 2370, 382, 264, 472, 300, 321, 764, 2232, 18875, 307, + 1333, 295, 341, 1558, 295, 456, 366, 51824], "temperature": 0.0, "avg_logprob": + -0.09141743838132083, "compression_ratio": 1.6785714285714286, "no_speech_prob": + 0.003222076455131173}, {"id": 121, "seek": 72972, 
"start": 729.72, "end": 736.6, + "text": " richer ways of understanding user queries and the content tech and just + going beyond taxes the", "tokens": [50364, 29021, 2098, 295, 3701, 4195, 24109, + 293, 264, 2701, 7553, 293, 445, 516, 4399, 10041, 264, 50708], "temperature": 0.0, + "avg_logprob": -0.12392632527784868, "compression_ratio": 1.696969696969697, "no_speech_prob": + 0.0008326139068230987}, {"id": 122, "seek": 72972, "start": 736.6, "end": 741.96, + "text": " you know it''s absolutely wonderful right lots of different things I mean + some point we''ll do a vector", "tokens": [50708, 291, 458, 309, 311, 3122, 3715, + 558, 3195, 295, 819, 721, 286, 914, 512, 935, 321, 603, 360, 257, 8062, 50976], + "temperature": 0.0, "avg_logprob": -0.12392632527784868, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.0008326139068230987}, {"id": 123, "seek": 72972, "start": 741.96, + "end": 751.24, "text": " search on usage patterns right to figure stuff out right + like it''ll be the the mode will be activity", "tokens": [50976, 3164, 322, 14924, + 8294, 558, 281, 2573, 1507, 484, 558, 411, 309, 603, 312, 264, 264, 4391, 486, 312, + 5191, 51440], "temperature": 0.0, "avg_logprob": -0.12392632527784868, "compression_ratio": + 1.696969696969697, "no_speech_prob": 0.0008326139068230987}, {"id": 124, "seek": + 72972, "start": 751.24, "end": 756.6, "text": " won''t be video or image or something + it''ll be activity you''d be like oh yeah that''s the person", "tokens": [51440, + 1582, 380, 312, 960, 420, 3256, 420, 746, 309, 603, 312, 5191, 291, 1116, 312, 411, + 1954, 1338, 300, 311, 264, 954, 51708], "temperature": 0.0, "avg_logprob": -0.12392632527784868, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0008326139068230987}, + {"id": 125, "seek": 75660, "start": 756.6, "end": 763.32, "text": " I want to talk + to they have the same activities as me based on whatever it is that they do right", + "tokens": [50364, 286, 528, 281, 751, 281, 436, 362, 
264, 912, 5354, 382, 385, 2361, + 322, 2035, 309, 307, 300, 436, 360, 558, 50700], "temperature": 0.0, "avg_logprob": + -0.056114905122397606, "compression_ratio": 1.6277777777777778, "no_speech_prob": + 0.000335940218064934}, {"id": 126, "seek": 75660, "start": 763.32, "end": 771.64, + "text": " so but those kinds of things definitely are expressed through the vectors + um I do think that hybrid", "tokens": [50700, 370, 457, 729, 3685, 295, 721, 2138, + 366, 12675, 807, 264, 18875, 1105, 286, 360, 519, 300, 13051, 51116], "temperature": + 0.0, "avg_logprob": -0.056114905122397606, "compression_ratio": 1.6277777777777778, + "no_speech_prob": 0.000335940218064934}, {"id": 127, "seek": 75660, "start": 771.64, + "end": 780.2, "text": " is an amazing thing for right now for the next few years + uh I do think though it''s also a little", "tokens": [51116, 307, 364, 2243, 551, + 337, 558, 586, 337, 264, 958, 1326, 924, 2232, 286, 360, 519, 1673, 309, 311, 611, + 257, 707, 51544], "temperature": 0.0, "avg_logprob": -0.056114905122397606, "compression_ratio": + 1.6277777777777778, "no_speech_prob": 0.000335940218064934}, {"id": 128, "seek": + 78020, "start": 780.2, "end": 789.1600000000001, "text": " bit of a bandaid in the + sense that we''re still leaning on keyword search for you know various", "tokens": + [50364, 857, 295, 257, 4116, 17810, 294, 264, 2020, 300, 321, 434, 920, 23390, 322, + 20428, 3164, 337, 291, 458, 3683, 50812], "temperature": 0.0, "avg_logprob": -0.08040446501511794, + "compression_ratio": 1.5454545454545454, "no_speech_prob": 0.002656811149790883}, + {"id": 129, "seek": 78020, "start": 789.1600000000001, "end": 797.48, "text": " + use cases and if we were to look 10 years out I think an ideal solution is that + we''re not doing", "tokens": [50812, 764, 3331, 293, 498, 321, 645, 281, 574, 1266, + 924, 484, 286, 519, 364, 7157, 3827, 307, 300, 321, 434, 406, 884, 51228], "temperature": + 0.0, "avg_logprob": -0.08040446501511794, "compression_ratio": 
1.5454545454545454, + "no_speech_prob": 0.002656811149790883}, {"id": 130, "seek": 78020, "start": 797.48, + "end": 804.76, "text": " hybrid anymore just have a better approach to search something + beyond vector plus keyword something", "tokens": [51228, 13051, 3602, 445, 362, + 257, 1101, 3109, 281, 3164, 746, 4399, 8062, 1804, 20428, 746, 51592], "temperature": + 0.0, "avg_logprob": -0.08040446501511794, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.002656811149790883}, {"id": 131, "seek": 80476, "start": 804.76, + "end": 811.3199999999999, "text": " better that still supports the zero results + is the right answer you know some of these problems", "tokens": [50364, 1101, 300, + 920, 9346, 264, 4018, 3542, 307, 264, 558, 1867, 291, 458, 512, 295, 613, 2740, + 50692], "temperature": 0.0, "avg_logprob": -0.15298568118702283, "compression_ratio": + 1.7025862068965518, "no_speech_prob": 0.015728961676359177}, {"id": 132, "seek": + 80476, "start": 811.3199999999999, "end": 819.08, "text": " that vector using vectors + gives us right we would have better better approach and not this slightly", "tokens": + [50692, 300, 8062, 1228, 18875, 2709, 505, 558, 321, 576, 362, 1101, 1101, 3109, + 293, 406, 341, 4748, 51080], "temperature": 0.0, "avg_logprob": -0.15298568118702283, + "compression_ratio": 1.7025862068965518, "no_speech_prob": 0.015728961676359177}, + {"id": 133, "seek": 80476, "start": 819.08, "end": 825.4, "text": " band-aid I have + two different ways of searching and then have to wedge them back together yeah I + like", "tokens": [51080, 4116, 12, 64, 327, 286, 362, 732, 819, 2098, 295, 10808, + 293, 550, 362, 281, 34530, 552, 646, 1214, 1338, 286, 411, 51396], "temperature": + 0.0, "avg_logprob": -0.15298568118702283, "compression_ratio": 1.7025862068965518, + "no_speech_prob": 0.015728961676359177}, {"id": 134, "seek": 80476, "start": 825.4, + "end": 830.84, "text": " for now hybrid''s exciting yeah I like that I like where + you''re going I 
wanted to also I wonder if", "tokens": [51396, 337, 586, 13051, + 311, 4670, 1338, 286, 411, 300, 286, 411, 689, 291, 434, 516, 286, 1415, 281, 611, + 286, 2441, 498, 51668], "temperature": 0.0, "avg_logprob": -0.15298568118702283, + "compression_ratio": 1.7025862068965518, "no_speech_prob": 0.015728961676359177}, + {"id": 135, "seek": 83084, "start": 830.84, "end": 839.48, "text": " you saw that + blog by Dr. Enbal I will make sure to link it where he talks about uh rrf you know", + "tokens": [50364, 291, 1866, 300, 6968, 538, 2491, 13, 2193, 2645, 286, 486, 652, + 988, 281, 2113, 309, 689, 415, 6686, 466, 2232, 367, 81, 69, 291, 458, 50796], "temperature": + 0.0, "avg_logprob": -0.1799412688163862, "compression_ratio": 1.4973821989528795, + "no_speech_prob": 0.00138946867082268}, {"id": 136, "seek": 83084, "start": 839.48, + "end": 848.2, "text": " reciprocal rank fusion and he shows on a like handcrafted + example that uh you know if let''s say", "tokens": [50796, 46948, 6181, 23100, 293, + 415, 3110, 322, 257, 411, 1011, 5611, 292, 1365, 300, 2232, 291, 458, 498, 718, + 311, 584, 51232], "temperature": 0.0, "avg_logprob": -0.1799412688163862, "compression_ratio": + 1.4973821989528795, "no_speech_prob": 0.00138946867082268}, {"id": 137, "seek": + 83084, "start": 848.9200000000001, "end": 856.6800000000001, "text": " neural search + brings relevant result to the top of few results keyword lacks and doesn''t so it", + "tokens": [51268, 18161, 3164, 5607, 7340, 1874, 281, 264, 1192, 295, 1326, 3542, + 20428, 31132, 293, 1177, 380, 370, 309, 51656], "temperature": 0.0, "avg_logprob": + -0.1799412688163862, "compression_ratio": 1.4973821989528795, "no_speech_prob": + 0.00138946867082268}, {"id": 138, "seek": 85668, "start": 856.68, "end": 862.52, + "text": " basically brings noise when you combine the two you will end up having + kind of half noise half", "tokens": [50364, 1936, 5607, 5658, 562, 291, 10432, 264, + 732, 291, 486, 917, 493, 1419, 733, 295, 1922, 5658, 
1922, 50656], "temperature": + 0.0, "avg_logprob": -0.19746763627607744, "compression_ratio": 1.8396226415094339, + "no_speech_prob": 0.006937813945114613}, {"id": 139, "seek": 85668, "start": 862.52, + "end": 868.8399999999999, "text": " signal and it will look terrible it will look + terrible right and where do you stand on this like", "tokens": [50656, 6358, 293, + 309, 486, 574, 6237, 309, 486, 574, 6237, 558, 293, 689, 360, 291, 1463, 322, 341, + 411, 50972], "temperature": 0.0, "avg_logprob": -0.19746763627607744, "compression_ratio": + 1.8396226415094339, "no_speech_prob": 0.006937813945114613}, {"id": 140, "seek": + 85668, "start": 870.12, "end": 878.3599999999999, "text": " only there was a way + yeah yeah it''s only there was a way of at our understand not just blindly yeah", + "tokens": [51036, 787, 456, 390, 257, 636, 1338, 1338, 309, 311, 787, 456, 390, + 257, 636, 295, 412, 527, 1223, 406, 445, 47744, 1338, 51448], "temperature": 0.0, + "avg_logprob": -0.19746763627607744, "compression_ratio": 1.8396226415094339, "no_speech_prob": + 0.006937813945114613}, {"id": 141, "seek": 85668, "start": 878.92, "end": 884.92, + "text": " you know blindly matching things um and I''ll hand it over to Daniel and + just a moment I do want to", "tokens": [51476, 291, 458, 47744, 14324, 721, 1105, + 293, 286, 603, 1011, 309, 670, 281, 8033, 293, 445, 257, 1623, 286, 360, 528, 281, + 51776], "temperature": 0.0, "avg_logprob": -0.19746763627607744, "compression_ratio": + 1.8396226415094339, "no_speech_prob": 0.006937813945114613}, {"id": 142, "seek": + 88492, "start": 884.92, "end": 891.4, "text": " call that I really liked uh your + previous episode with all a Sandra where uh I can''t remember", "tokens": [50364, + 818, 300, 286, 534, 4501, 2232, 428, 3894, 3500, 365, 439, 257, 28184, 689, 2232, + 286, 393, 380, 1604, 50688], "temperature": 0.0, "avg_logprob": -0.14683284542777322, + "compression_ratio": 1.670995670995671, "no_speech_prob": 0.0005746468668803573}, + 
{"id": 143, "seek": 88492, "start": 891.4, "end": 896.68, "text": " was you were + all a Sandra but you kind of I think it was you Demetri said yeah that your engineers", + "tokens": [50688, 390, 291, 645, 439, 257, 28184, 457, 291, 733, 295, 286, 519, + 309, 390, 291, 4686, 302, 470, 848, 1338, 300, 428, 11955, 50952], "temperature": + 0.0, "avg_logprob": -0.14683284542777322, "compression_ratio": 1.670995670995671, + "no_speech_prob": 0.0005746468668803573}, {"id": 144, "seek": 88492, "start": 896.68, + "end": 902.1999999999999, "text": " were looking at hybrid search and they kind + of looked at it and said when you strip away the fancy", "tokens": [50952, 645, + 1237, 412, 13051, 3164, 293, 436, 733, 295, 2956, 412, 309, 293, 848, 562, 291, + 12828, 1314, 264, 10247, 51228], "temperature": 0.0, "avg_logprob": -0.14683284542777322, + "compression_ratio": 1.670995670995671, "no_speech_prob": 0.0005746468668803573}, + {"id": 145, "seek": 88492, "start": 902.1999999999999, "end": 909.9599999999999, + "text": " words like ranked reciprocal fusion for blending things together you''re + like that''s just round", "tokens": [51228, 2283, 411, 20197, 46948, 23100, 337, + 23124, 721, 1214, 291, 434, 411, 300, 311, 445, 3098, 51616], "temperature": 0.0, + "avg_logprob": -0.14683284542777322, "compression_ratio": 1.670995670995671, "no_speech_prob": + 0.0005746468668803573}, {"id": 146, "seek": 90996, "start": 909.96, "end": 918.76, + "text": " Robin right and you know round robin is not necessarily a round blind + and it''s blind round", "tokens": [50364, 16533, 558, 293, 291, 458, 3098, 3870, + 259, 307, 406, 4725, 257, 3098, 6865, 293, 309, 311, 6865, 3098, 50804], "temperature": + 0.0, "avg_logprob": -0.12723348428914835, "compression_ratio": 1.8398058252427185, + "no_speech_prob": 0.00046124018263071775}, {"id": 147, "seek": 90996, "start": 918.76, + "end": 925.64, "text": " Robin right it''s not round robin in in your middle school + when you had to pick teams for 
dodgeball", "tokens": [50804, 16533, 558, 309, 311, + 406, 3098, 3870, 259, 294, 294, 428, 2808, 1395, 562, 291, 632, 281, 1888, 5491, + 337, 27238, 3129, 51148], "temperature": 0.0, "avg_logprob": -0.12723348428914835, + "compression_ratio": 1.8398058252427185, "no_speech_prob": 0.00046124018263071775}, + {"id": 148, "seek": 90996, "start": 926.6, "end": 932.76, "text": " right the people + picking knew who the best players were so at least you were at least divvying", + "tokens": [51196, 558, 264, 561, 8867, 2586, 567, 264, 1151, 4150, 645, 370, 412, + 1935, 291, 645, 412, 1935, 3414, 85, 1840, 51504], "temperature": 0.0, "avg_logprob": + -0.12723348428914835, "compression_ratio": 1.8398058252427185, "no_speech_prob": + 0.00046124018263071775}, {"id": 149, "seek": 90996, "start": 932.76, "end": 939.08, + "text": " the best choices and at the very end those last two kids you know you + knew they were the worst", "tokens": [51504, 264, 1151, 7994, 293, 412, 264, 588, + 917, 729, 1036, 732, 2301, 291, 458, 291, 2586, 436, 645, 264, 5855, 51820], "temperature": + 0.0, "avg_logprob": -0.12723348428914835, "compression_ratio": 1.8398058252427185, + "no_speech_prob": 0.00046124018263071775}, {"id": 150, "seek": 93908, "start": 939.08, + "end": 945.32, "text": " choices they were the noise in the search results right + but you were that round robin at least had", "tokens": [50364, 7994, 436, 645, 264, + 5658, 294, 264, 3164, 3542, 558, 457, 291, 645, 300, 3098, 3870, 259, 412, 1935, + 632, 50676], "temperature": 0.0, "avg_logprob": -0.11290172088977903, "compression_ratio": + 1.798165137614679, "no_speech_prob": 0.00011593567614909261}, {"id": 151, "seek": + 93908, "start": 945.32, "end": 953.1600000000001, "text": " the benefit of knowing + what was good ranked refresh ranked reciprocal fusion has no sense of whether", + "tokens": [50676, 264, 5121, 295, 5276, 437, 390, 665, 20197, 15134, 20197, 46948, + 23100, 575, 572, 2020, 295, 1968, 51068], "temperature": 0.0, 
"avg_logprob": -0.11290172088977903, + "compression_ratio": 1.798165137614679, "no_speech_prob": 0.00011593567614909261}, + {"id": 152, "seek": 93908, "start": 953.1600000000001, "end": 960.2, "text": " the + those results are good or bad right it is literally blindly picking them in some + order with no", "tokens": [51068, 264, 729, 3542, 366, 665, 420, 1578, 558, 309, + 307, 3736, 47744, 8867, 552, 294, 512, 1668, 365, 572, 51420], "temperature": 0.0, + "avg_logprob": -0.11290172088977903, "compression_ratio": 1.798165137614679, "no_speech_prob": + 0.00011593567614909261}, {"id": 153, "seek": 93908, "start": 960.2, "end": 966.2, + "text": " sense of uh of what that is and as you can imagine blindly picking is + going to leave lead you", "tokens": [51420, 2020, 295, 2232, 295, 437, 300, 307, + 293, 382, 291, 393, 3811, 47744, 8867, 307, 516, 281, 1856, 1477, 291, 51720], "temperature": + 0.0, "avg_logprob": -0.11290172088977903, "compression_ratio": 1.798165137614679, + "no_speech_prob": 0.00011593567614909261}, {"id": 154, "seek": 96620, "start": 966.2, + "end": 973.5600000000001, "text": " be pinched potentially a very weak dodgeball + team right and yet that''s what we think of a state of the", "tokens": [50364, 312, + 14614, 292, 7263, 257, 588, 5336, 27238, 3129, 1469, 558, 293, 1939, 300, 311, 437, + 321, 519, 295, 257, 1785, 295, 264, 50732], "temperature": 0.0, "avg_logprob": -0.1573984252081977, + "compression_ratio": 1.6853448275862069, "no_speech_prob": 0.0013409038074314594}, + {"id": 155, "seek": 96620, "start": 973.5600000000001, "end": 980.5200000000001, + "text": " art yeah so Daniel what should we do in this case is there any solution + it''s it''s a good segue", "tokens": [50732, 1523, 1338, 370, 8033, 437, 820, 321, + 360, 294, 341, 1389, 307, 456, 604, 3827, 309, 311, 309, 311, 257, 665, 33850, 51080], + "temperature": 0.0, "avg_logprob": -0.1573984252081977, "compression_ratio": 1.6853448275862069, + "no_speech_prob": 0.0013409038074314594}, 
{"id": 156, "seek": 96620, "start": 980.5200000000001, + "end": 988.12, "text": " into what we actually tried and explored and experimented + with um so in our most recent work we", "tokens": [51080, 666, 437, 321, 767, 3031, + 293, 24016, 293, 5120, 292, 365, 1105, 370, 294, 527, 881, 5162, 589, 321, 51460], + "temperature": 0.0, "avg_logprob": -0.1573984252081977, "compression_ratio": 1.6853448275862069, + "no_speech_prob": 0.0013409038074314594}, {"id": 157, "seek": 96620, "start": 988.84, + "end": 994.84, "text": " tried to come up with a systematic approach to optimize + hybrid search specifically in open search", "tokens": [51496, 3031, 281, 808, 493, + 365, 257, 27249, 3109, 281, 19719, 13051, 3164, 4682, 294, 1269, 3164, 51796], "temperature": + 0.0, "avg_logprob": -0.1573984252081977, "compression_ratio": 1.6853448275862069, + "no_speech_prob": 0.0013409038074314594}, {"id": 158, "seek": 99484, "start": 995.24, + "end": 1005.8000000000001, "text": " um so in open search actually right now you + have linear combination techniques at hand so that", "tokens": [50384, 1105, 370, + 294, 1269, 3164, 767, 558, 586, 291, 362, 8213, 6562, 7512, 412, 1011, 370, 300, + 50912], "temperature": 0.0, "avg_logprob": -0.15689779500492285, "compression_ratio": + 1.6932515337423313, "no_speech_prob": 0.0004383990599308163}, {"id": 159, "seek": + 99484, "start": 1005.8000000000001, "end": 1011.5600000000001, "text": " means you + have two normalization techniques you can choose one the L2 norm the min max norm", + "tokens": [50912, 1355, 291, 362, 732, 2710, 2144, 7512, 291, 393, 2826, 472, 264, + 441, 17, 2026, 264, 923, 11469, 2026, 51200], "temperature": 0.0, "avg_logprob": + -0.15689779500492285, "compression_ratio": 1.6932515337423313, "no_speech_prob": + 0.0004383990599308163}, {"id": 160, "seek": 99484, "start": 1011.5600000000001, + "end": 1018.0400000000001, "text": " um they are basically both there so that you + can normalize the scores from keyword search", 
"tokens": [51200, 1105, 436, 366, + 1936, 1293, 456, 370, 300, 291, 393, 2710, 1125, 264, 13444, 490, 20428, 3164, 51524], + "temperature": 0.0, "avg_logprob": -0.15689779500492285, "compression_ratio": 1.6932515337423313, + "no_speech_prob": 0.0004383990599308163}, {"id": 161, "seek": 101804, "start": 1018.92, + "end": 1026.28, "text": " into the let''s say space of vector search so that you + can compare apples to apples more or less", "tokens": [50408, 666, 264, 718, 311, + 584, 1901, 295, 8062, 3164, 370, 300, 291, 393, 6794, 16814, 281, 16814, 544, 420, + 1570, 50776], "temperature": 0.0, "avg_logprob": -0.15317499464836673, "compression_ratio": + 1.60989010989011, "no_speech_prob": 0.0031713047064840794}, {"id": 162, "seek": + 101804, "start": 1026.28, "end": 1034.44, "text": " here and not apples to oranges + because as we all know vm25 scores especially if you have if you have", "tokens": + [50776, 510, 293, 406, 16814, 281, 35474, 570, 382, 321, 439, 458, 371, 76, 6074, + 13444, 2318, 498, 291, 362, 498, 291, 362, 51184], "temperature": 0.0, "avg_logprob": + -0.15317499464836673, "compression_ratio": 1.60989010989011, "no_speech_prob": 0.0031713047064840794}, + {"id": 163, "seek": 101804, "start": 1034.44, "end": 1042.12, "text": " like wired + field weights they are unbounded they can be in the dozens the hundreds the thousands", + "tokens": [51184, 411, 27415, 2519, 17443, 436, 366, 517, 18767, 292, 436, 393, + 312, 294, 264, 18431, 264, 6779, 264, 5383, 51568], "temperature": 0.0, "avg_logprob": + -0.15317499464836673, "compression_ratio": 1.60989010989011, "no_speech_prob": 0.0031713047064840794}, + {"id": 164, "seek": 104212, "start": 1042.12, "end": 1049.4799999999998, "text": + " so you don''t really know upfront in what range you are operating and also you + can''t really compare", "tokens": [50364, 370, 291, 500, 380, 534, 458, 30264, 294, + 437, 3613, 291, 366, 7447, 293, 611, 291, 393, 380, 534, 6794, 50732], "temperature": + 0.0, "avg_logprob": 
-0.11347551814845351, "compression_ratio": 1.64, "no_speech_prob": + 0.00010527812264626846}, {"id": 165, "seek": 104212, "start": 1049.4799999999998, + "end": 1055.6399999999999, "text": " the scores from one query to another query + so that makes it really difficult to combine keyword", "tokens": [50732, 264, 13444, + 490, 472, 14581, 281, 1071, 14581, 370, 300, 1669, 309, 534, 2252, 281, 10432, 20428, + 51040], "temperature": 0.0, "avg_logprob": -0.11347551814845351, "compression_ratio": + 1.64, "no_speech_prob": 0.00010527812264626846}, {"id": 166, "seek": 104212, "start": + 1056.6799999999998, "end": 1065.8, "text": " search scores with any other um let''s + say search mechanism together with these normalization", "tokens": [51092, 3164, + 13444, 365, 604, 661, 1105, 718, 311, 584, 3164, 7513, 1214, 365, 613, 2710, 2144, + 51548], "temperature": 0.0, "avg_logprob": -0.11347551814845351, "compression_ratio": + 1.64, "no_speech_prob": 0.00010527812264626846}, {"id": 167, "seek": 106580, "start": + 1065.8799999999999, "end": 1072.84, "text": " techniques the L2 norm them in max + norm you have three combination techniques at hand and that''s", "tokens": [50368, + 7512, 264, 441, 17, 2026, 552, 294, 11469, 2026, 291, 362, 1045, 6562, 7512, 412, + 1011, 293, 300, 311, 50716], "temperature": 0.0, "avg_logprob": -0.16189916928609213, + "compression_ratio": 1.8293838862559242, "no_speech_prob": 0.0012069582007825375}, + {"id": 168, "seek": 106580, "start": 1072.84, "end": 1078.68, "text": " basically + just three different means you can apply the arithmetic mean the harmonic mean and + the", "tokens": [50716, 1936, 445, 1045, 819, 1355, 291, 393, 3079, 264, 42973, + 914, 264, 32270, 914, 293, 264, 51008], "temperature": 0.0, "avg_logprob": -0.16189916928609213, + "compression_ratio": 1.8293838862559242, "no_speech_prob": 0.0012069582007825375}, + {"id": 169, "seek": 106580, "start": 1078.68, "end": 1086.36, "text": " geometric + mean so that leaves you with two by 
three so that''s already six parameter combinations", + "tokens": [51008, 33246, 914, 370, 300, 5510, 291, 365, 732, 538, 1045, 370, 300, + 311, 1217, 2309, 13075, 21267, 51392], "temperature": 0.0, "avg_logprob": -0.16189916928609213, + "compression_ratio": 1.8293838862559242, "no_speech_prob": 0.0012069582007825375}, + {"id": 170, "seek": 106580, "start": 1086.36, "end": 1094.9199999999998, "text": + " that you can try on and then you can define weights um so how um how much neurosurge + weight how", "tokens": [51392, 300, 291, 393, 853, 322, 293, 550, 291, 393, 6964, + 17443, 1105, 370, 577, 1105, 577, 709, 28813, 374, 432, 3364, 577, 51820], "temperature": + 0.0, "avg_logprob": -0.16189916928609213, "compression_ratio": 1.8293838862559242, + "no_speech_prob": 0.0012069582007825375}, {"id": 171, "seek": 109492, "start": 1094.92, + "end": 1101.4, "text": " much keyword search weight do i want to have in my query + they always add up to one so you can say", "tokens": [50364, 709, 20428, 3164, 3364, + 360, 741, 528, 281, 362, 294, 452, 14581, 436, 1009, 909, 493, 281, 472, 370, 291, + 393, 584, 50688], "temperature": 0.0, "avg_logprob": -0.1726050059000651, "compression_ratio": + 1.5217391304347827, "no_speech_prob": 0.000397005642298609}, {"id": 172, "seek": + 109492, "start": 1101.4, "end": 1110.2, "text": " I want to go with 10% keyword + 90% neural or 50-50 um thinking of let''s say 11 of these", "tokens": [50688, 286, + 528, 281, 352, 365, 1266, 4, 20428, 4289, 4, 18161, 420, 2625, 12, 2803, 1105, 1953, + 295, 718, 311, 584, 2975, 295, 613, 51128], "temperature": 0.0, "avg_logprob": -0.1726050059000651, + "compression_ratio": 1.5217391304347827, "no_speech_prob": 0.000397005642298609}, + {"id": 173, "seek": 109492, "start": 1111.16, "end": 1118.3600000000001, "text": + " weights so maybe you start with zero keyword and a hundred percent neural and + 10% 90% and so on", "tokens": [51176, 17443, 370, 1310, 291, 722, 365, 4018, 20428, + 293, 257, 3262, 3043, 
18161, 293, 1266, 4, 4289, 4, 293, 370, 322, 51536], "temperature": + 0.0, "avg_logprob": -0.1726050059000651, "compression_ratio": 1.5217391304347827, + "no_speech_prob": 0.000397005642298609}, {"id": 174, "seek": 111836, "start": 1118.4399999999998, + "end": 1127.08, "text": " and so forth so that gives you a range of 11 multiplied + with the six parameter combinations", "tokens": [50368, 293, 370, 5220, 370, 300, + 2709, 291, 257, 3613, 295, 2975, 17207, 365, 264, 2309, 13075, 21267, 50800], "temperature": + 0.0, "avg_logprob": -0.12123590502245672, "compression_ratio": 1.5737704918032787, + "no_speech_prob": 0.00017715782450977713}, {"id": 175, "seek": 111836, "start": + 1127.08, "end": 1137.6399999999999, "text": " that we already had gives us let''s + say a solution area to explore of 66 different combinations", "tokens": [50800, + 300, 321, 1217, 632, 2709, 505, 718, 311, 584, 257, 3827, 1859, 281, 6839, 295, + 21126, 819, 21267, 51328], "temperature": 0.0, "avg_logprob": -0.12123590502245672, + "compression_ratio": 1.5737704918032787, "no_speech_prob": 0.00017715782450977713}, + {"id": 176, "seek": 111836, "start": 1137.6399999999999, "end": 1146.52, "text": + " which is pretty manageable so we defined optimizing hybrid search as a parameter + optimization problem", "tokens": [51328, 597, 307, 1238, 38798, 370, 321, 7642, + 40425, 13051, 3164, 382, 257, 13075, 19618, 1154, 51772], "temperature": 0.0, "avg_logprob": + -0.12123590502245672, "compression_ratio": 1.5737704918032787, "no_speech_prob": + 0.00017715782450977713}, {"id": 177, "seek": 114652, "start": 1147.16, "end": 1152.52, + "text": " and we picked the most straightforward approach that you can pick and + we just tried out all", "tokens": [50396, 293, 321, 6183, 264, 881, 15325, 3109, + 300, 291, 393, 1888, 293, 321, 445, 3031, 484, 439, 50664], "temperature": 0.0, + "avg_logprob": -0.1616206623259045, "compression_ratio": 1.6437768240343347, "no_speech_prob": + 0.001243914826773107}, {"id": 178, 
"seek": 114652, "start": 1152.52, "end": 1160.6, + "text": " different combinations and calculated search metrics based on judgments + and then we just had a", "tokens": [50664, 819, 21267, 293, 15598, 3164, 16367, + 2361, 322, 40337, 293, 550, 321, 445, 632, 257, 51068], "temperature": 0.0, "avg_logprob": + -0.1616206623259045, "compression_ratio": 1.6437768240343347, "no_speech_prob": + 0.001243914826773107}, {"id": 179, "seek": 114652, "start": 1160.6, "end": 1168.36, + "text": " look at which one is the best combination um for our experiments we used + the ESCI data set um so", "tokens": [51068, 574, 412, 597, 472, 307, 264, 1151, + 6562, 1105, 337, 527, 12050, 321, 1143, 264, 12564, 25240, 1412, 992, 1105, 370, + 51456], "temperature": 0.0, "avg_logprob": -0.1616206623259045, "compression_ratio": + 1.6437768240343347, "no_speech_prob": 0.001243914826773107}, {"id": 180, "seek": + 114652, "start": 1168.36, "end": 1175.72, "text": " that was released by amazon + a couple of i think 18 months ago or something like that as part of our", "tokens": + [51456, 300, 390, 4736, 538, 47010, 257, 1916, 295, 741, 519, 2443, 2493, 2057, + 420, 746, 411, 300, 382, 644, 295, 527, 51824], "temperature": 0.0, "avg_logprob": + -0.1616206623259045, "compression_ratio": 1.6437768240343347, "no_speech_prob": + 0.001243914826773107}, {"id": 181, "seek": 117572, "start": 1175.72, "end": 1182.68, + "text": " taglet competition um this data set comes with queries comes with products + and most importantly", "tokens": [50364, 6162, 2631, 6211, 1105, 341, 1412, 992, + 1487, 365, 24109, 1487, 365, 3383, 293, 881, 8906, 50712], "temperature": 0.0, "avg_logprob": + -0.1821271260579427, "compression_ratio": 1.7725118483412323, "no_speech_prob": + 0.0002748727274592966}, {"id": 182, "seek": 117572, "start": 1182.68, "end": 1190.6000000000001, + "text": " it comes with judgments so we basically have everything that we need to + really try out different", "tokens": [50712, 309, 1487, 365, 
40337, 370, 321, 1936, + 362, 1203, 300, 321, 643, 281, 534, 853, 484, 819, 51108], "temperature": 0.0, "avg_logprob": + -0.1821271260579427, "compression_ratio": 1.7725118483412323, "no_speech_prob": + 0.0002748727274592966}, {"id": 183, "seek": 117572, "start": 1190.6000000000001, + "end": 1196.6000000000001, "text": " um parameter combinations see how they work + what results are retrieved um can calculate a couple", "tokens": [51108, 1105, 13075, + 21267, 536, 577, 436, 589, 437, 3542, 366, 19817, 937, 1105, 393, 8873, 257, 1916, + 51408], "temperature": 0.0, "avg_logprob": -0.1821271260579427, "compression_ratio": + 1.7725118483412323, "no_speech_prob": 0.0002748727274592966}, {"id": 184, "seek": + 117572, "start": 1196.6000000000001, "end": 1202.3600000000001, "text": " of metrics + compare these and then see which one is the best um parameter combination", "tokens": + [51408, 295, 16367, 6794, 613, 293, 550, 536, 597, 472, 307, 264, 1151, 1105, 13075, + 6562, 51696], "temperature": 0.0, "avg_logprob": -0.1821271260579427, "compression_ratio": + 1.7725118483412323, "no_speech_prob": 0.0002748727274592966}, {"id": 185, "seek": + 120236, "start": 1203.1599999999999, "end": 1211.1599999999999, "text": " and um + that''s what we call the global hybrid search optimizer so we try to identify the + best", "tokens": [50404, 293, 1105, 300, 311, 437, 321, 818, 264, 4338, 13051, 3164, + 5028, 6545, 370, 321, 853, 281, 5876, 264, 1151, 50804], "temperature": 0.0, "avg_logprob": + -0.08685370742297563, "compression_ratio": 1.5909090909090908, "no_speech_prob": + 0.0012637972831726074}, {"id": 186, "seek": 120236, "start": 1211.1599999999999, + "end": 1218.28, "text": " parameter combination globally for all the queries that + we are looking at in a certain defined", "tokens": [50804, 13075, 6562, 18958, 337, + 439, 264, 24109, 300, 321, 366, 1237, 412, 294, 257, 1629, 7642, 51160], "temperature": + 0.0, "avg_logprob": -0.08685370742297563, "compression_ratio": 
1.5909090909090908, + "no_speech_prob": 0.0012637972831726074}, {"id": 187, "seek": 120236, "start": 1219.08, + "end": 1226.04, "text": " subset of queries so that''s kind of the first step um + the very very straightforward approach", "tokens": [51200, 25993, 295, 24109, 370, + 300, 311, 733, 295, 264, 700, 1823, 1105, 264, 588, 588, 15325, 3109, 51548], "temperature": + 0.0, "avg_logprob": -0.08685370742297563, "compression_ratio": 1.5909090909090908, + "no_speech_prob": 0.0012637972831726074}, {"id": 188, "seek": 122604, "start": 1226.2, + "end": 1232.28, "text": " that we applied that''s not really something um let''s + say scientifically um so first decated", "tokens": [50372, 300, 321, 6456, 300, + 311, 406, 534, 746, 1105, 718, 311, 584, 39719, 1105, 370, 700, 979, 770, 50676], + "temperature": 0.0, "avg_logprob": -0.22167723855854551, "compression_ratio": 1.696035242290749, + "no_speech_prob": 0.0014247347135096788}, {"id": 189, "seek": 122604, "start": 1232.28, + "end": 1240.36, "text": " there was just a very brute force approach to see um what''s + in there also learn how results may be", "tokens": [50676, 456, 390, 445, 257, 588, + 47909, 3464, 3109, 281, 536, 1105, 437, 311, 294, 456, 611, 1466, 577, 3542, 815, + 312, 51080], "temperature": 0.0, "avg_logprob": -0.22167723855854551, "compression_ratio": + 1.696035242290749, "no_speech_prob": 0.0014247347135096788}, {"id": 190, "seek": + 122604, "start": 1241.32, "end": 1248.12, "text": " shaped or turn out differently + when we increase neural search weight or increase the keyword search", "tokens": + [51128, 13475, 420, 1261, 484, 7614, 562, 321, 3488, 18161, 3164, 3364, 420, 3488, + 264, 20428, 3164, 51468], "temperature": 0.0, "avg_logprob": -0.22167723855854551, + "compression_ratio": 1.696035242290749, "no_speech_prob": 0.0014247347135096788}, + {"id": 191, "seek": 122604, "start": 1248.12, "end": 1255.32, "text": " weight which + normalization combination uh technique is usually the one that''s 
best to retrieve", + "tokens": [51468, 3364, 597, 2710, 2144, 6562, 2232, 6532, 307, 2673, 264, 472, + 300, 311, 1151, 281, 30254, 51828], "temperature": 0.0, "avg_logprob": -0.22167723855854551, + "compression_ratio": 1.696035242290749, "no_speech_prob": 0.0014247347135096788}, + {"id": 192, "seek": 125532, "start": 1255.32, "end": 1262.4399999999998, "text": + " the results and so on and so forth so um we started out with what I call reasonable", + "tokens": [50364, 264, 3542, 293, 370, 322, 293, 370, 5220, 370, 1105, 321, 1409, + 484, 365, 437, 286, 818, 10585, 50720], "temperature": 0.0, "avg_logprob": -0.20958518981933594, + "compression_ratio": 1.5393258426966292, "no_speech_prob": 0.000259325752267614}, + {"id": 193, "seek": 125532, "start": 1262.4399999999998, "end": 1269.72, "text": + " baseline so searching across um I think five or six fields so title category color + brand", "tokens": [50720, 20518, 370, 10808, 2108, 1105, 286, 519, 1732, 420, 2309, + 7909, 370, 4876, 7719, 2017, 3360, 51084], "temperature": 0.0, "avg_logprob": -0.20958518981933594, + "compression_ratio": 1.5393258426966292, "no_speech_prob": 0.000259325752267614}, + {"id": 194, "seek": 125532, "start": 1269.72, "end": 1277.8, "text": " and description + some bullet points so ecommerce data set like pretty basic stuff um and we calculated", + "tokens": [51084, 293, 3855, 512, 11632, 2793, 370, 308, 26926, 1412, 992, 411, + 1238, 3875, 1507, 1105, 293, 321, 15598, 51488], "temperature": 0.0, "avg_logprob": + -0.20958518981933594, "compression_ratio": 1.5393258426966292, "no_speech_prob": + 0.000259325752267614}, {"id": 195, "seek": 127780, "start": 1277.8799999999999, + "end": 1286.52, "text": " our metrics with that baseline so um I would call it uh + probably not the best baseline you can come", "tokens": [50368, 527, 16367, 365, + 300, 20518, 370, 1105, 286, 576, 818, 309, 2232, 1391, 406, 264, 1151, 20518, 291, + 393, 808, 50800], "temperature": 0.0, "avg_logprob": 
-0.07955386373731825, "compression_ratio": + 1.9121951219512194, "no_speech_prob": 0.0008101784042082727}, {"id": 196, "seek": + 127780, "start": 1286.52, "end": 1293.32, "text": " up with um but a reasonable + baseline um you could come up with so we didn''t want to let''s say", "tokens": + [50800, 493, 365, 1105, 457, 257, 10585, 20518, 1105, 291, 727, 808, 493, 365, 370, + 321, 994, 380, 528, 281, 718, 311, 584, 51140], "temperature": 0.0, "avg_logprob": + -0.07955386373731825, "compression_ratio": 1.9121951219512194, "no_speech_prob": + 0.0008101784042082727}, {"id": 197, "seek": 127780, "start": 1294.12, "end": 1300.04, + "text": " just create the weakest baseline because that''s not really difficult + to let''s say outperform so", "tokens": [51180, 445, 1884, 264, 44001, 20518, 570, + 300, 311, 406, 534, 2252, 281, 718, 311, 584, 484, 26765, 370, 51476], "temperature": + 0.0, "avg_logprob": -0.07955386373731825, "compression_ratio": 1.9121951219512194, + "no_speech_prob": 0.0008101784042082727}, {"id": 198, "seek": 127780, "start": 1300.04, + "end": 1306.36, "text": " we wanted to create a reasonable baseline without putting + let''s say a man here in finding out what the", "tokens": [51476, 321, 1415, 281, + 1884, 257, 10585, 20518, 1553, 3372, 718, 311, 584, 257, 587, 510, 294, 5006, 484, + 437, 264, 51792], "temperature": 0.0, "avg_logprob": -0.07955386373731825, "compression_ratio": + 1.9121951219512194, "no_speech_prob": 0.0008101784042082727}, {"id": 199, "seek": + 130636, "start": 1306.36, "end": 1314.76, "text": " best baseline is um that''s + called okay right um we got decent results out of that and then", "tokens": [50364, + 1151, 20518, 307, 1105, 300, 311, 1219, 1392, 558, 1105, 321, 658, 8681, 3542, 484, + 295, 300, 293, 550, 50784], "temperature": 0.0, "avg_logprob": -0.13504152095064204, + "compression_ratio": 1.8110599078341014, "no_speech_prob": 0.0002652165130712092}, + {"id": 200, "seek": 130636, "start": 1314.76, "end": 1321.56, "text": " 
we ran this + global hybrid search optimizer and that outperformed the baseline already um across + the", "tokens": [50784, 321, 5872, 341, 4338, 13051, 3164, 5028, 6545, 293, 300, + 484, 610, 22892, 264, 20518, 1217, 1105, 2108, 264, 51124], "temperature": 0.0, + "avg_logprob": -0.13504152095064204, "compression_ratio": 1.8110599078341014, "no_speech_prob": + 0.0002652165130712092}, {"id": 201, "seek": 130636, "start": 1321.56, "end": 1328.76, + "text": " metrics that we had a look at so better in DCG better DCG better precision + at 10 were um these were", "tokens": [51124, 16367, 300, 321, 632, 257, 574, 412, + 370, 1101, 294, 9114, 38, 1101, 9114, 38, 1101, 18356, 412, 1266, 645, 1105, 613, + 645, 51484], "temperature": 0.0, "avg_logprob": -0.13504152095064204, "compression_ratio": + 1.8110599078341014, "no_speech_prob": 0.0002652165130712092}, {"id": 202, "seek": + 130636, "start": 1328.76, "end": 1335.8, "text": " the three metrics that we had + a look at and that was nice to see because that already gave us um let''s", "tokens": + [51484, 264, 1045, 16367, 300, 321, 632, 257, 574, 412, 293, 300, 390, 1481, 281, + 536, 570, 300, 1217, 2729, 505, 1105, 718, 311, 51836], "temperature": 0.0, "avg_logprob": + -0.13504152095064204, "compression_ratio": 1.8110599078341014, "no_speech_prob": + 0.0002652165130712092}, {"id": 203, "seek": 133580, "start": 1335.8, "end": 1342.2, + "text": " say assurance in a there is a straightforward approach that everyone can + use because it''s really", "tokens": [50364, 584, 32189, 294, 257, 456, 307, 257, + 15325, 3109, 300, 1518, 393, 764, 570, 309, 311, 534, 50684], "temperature": 0.0, + "avg_logprob": -0.12143955732646741, "compression_ratio": 1.8019323671497585, "no_speech_prob": + 0.00033421439002268016}, {"id": 204, "seek": 133580, "start": 1342.2, "end": 1349.48, + "text": " easily applicable um it gets you good results and it also gives you assurance + that there is", "tokens": [50684, 3612, 21142, 1105, 309, 2170, 291, 
665, 3542, + 293, 309, 611, 2709, 291, 32189, 300, 456, 307, 51048], "temperature": 0.0, "avg_logprob": + -0.12143955732646741, "compression_ratio": 1.8019323671497585, "no_speech_prob": + 0.00033421439002268016}, {"id": 205, "seek": 133580, "start": 1349.48, "end": 1356.28, + "text": " something too neural search when switching to it from a keyword based + um search engine or", "tokens": [51048, 746, 886, 18161, 3164, 562, 16493, 281, + 309, 490, 257, 20428, 2361, 1105, 3164, 2848, 420, 51388], "temperature": 0.0, "avg_logprob": + -0.12143955732646741, "compression_ratio": 1.8019323671497585, "no_speech_prob": + 0.00033421439002268016}, {"id": 206, "seek": 133580, "start": 1356.28, "end": 1364.9199999999998, + "text": " search application to a hybrid search um application but um as always + when you apply something", "tokens": [51388, 3164, 3861, 281, 257, 13051, 3164, + 1105, 3861, 457, 1105, 382, 1009, 562, 291, 3079, 746, 51820], "temperature": 0.0, + "avg_logprob": -0.12143955732646741, "compression_ratio": 1.8019323671497585, "no_speech_prob": + 0.00033421439002268016}, {"id": 207, "seek": 136492, "start": 1364.92, "end": 1372.28, + "text": " globally there are winners and there are losers so um some of the queries + really improve by this", "tokens": [50364, 18958, 456, 366, 17193, 293, 456, 366, + 37713, 370, 1105, 512, 295, 264, 24109, 534, 3470, 538, 341, 50732], "temperature": + 0.0, "avg_logprob": -0.09761110428840883, "compression_ratio": 1.6201117318435754, + "no_speech_prob": 0.00019892251293640584}, {"id": 208, "seek": 136492, "start": + 1372.28, "end": 1380.52, "text": " hybrid search optimization step the global one + but others didn''t so we took this one step further", "tokens": [50732, 13051, 3164, + 19618, 1823, 264, 4338, 472, 457, 2357, 994, 380, 370, 321, 1890, 341, 472, 1823, + 3052, 51144], "temperature": 0.0, "avg_logprob": -0.09761110428840883, "compression_ratio": + 1.6201117318435754, "no_speech_prob": 0.00019892251293640584}, {"id": 
209, "seek": + 136492, "start": 1380.52, "end": 1390.2, "text": " and thought about how can we + um really create a process that dynamically per query now predicts", "tokens": [51144, + 293, 1194, 466, 577, 393, 321, 1105, 534, 1884, 257, 1399, 300, 43492, 680, 14581, + 586, 6069, 82, 51628], "temperature": 0.0, "avg_logprob": -0.09761110428840883, + "compression_ratio": 1.6201117318435754, "no_speech_prob": 0.00019892251293640584}, + {"id": 210, "seek": 139020, "start": 1390.28, "end": 1397.96, "text": " what the + best parameter set is and that now is also like going in this direction that dark + mentions", "tokens": [50368, 437, 264, 1151, 13075, 992, 307, 293, 300, 586, 307, + 611, 411, 516, 294, 341, 3513, 300, 2877, 23844, 50752], "temperature": 0.0, "avg_logprob": + -0.13489577770233155, "compression_ratio": 1.7477064220183487, "no_speech_prob": + 0.0005321715143509209}, {"id": 211, "seek": 139020, "start": 1397.96, "end": 1403.4, + "text": " in his blog post right so that''s kind of a query understanding approach + to hybrid search", "tokens": [50752, 294, 702, 6968, 2183, 558, 370, 300, 311, 733, + 295, 257, 14581, 3701, 3109, 281, 13051, 3164, 51024], "temperature": 0.0, "avg_logprob": + -0.13489577770233155, "compression_ratio": 1.7477064220183487, "no_speech_prob": + 0.0005321715143509209}, {"id": 212, "seek": 139020, "start": 1404.1200000000001, + "end": 1411.48, "text": " so we''re not just blindly applying one parameter combination + that we identified on a thousand queries", "tokens": [51060, 370, 321, 434, 406, + 445, 47744, 9275, 472, 13075, 6562, 300, 321, 9234, 322, 257, 4714, 24109, 51428], + "temperature": 0.0, "avg_logprob": -0.13489577770233155, "compression_ratio": 1.7477064220183487, + "no_speech_prob": 0.0005321715143509209}, {"id": 213, "seek": 139020, "start": 1411.48, + "end": 1418.68, "text": " that we explored we are taking one query analyzing this + one query and then saying based on", "tokens": [51428, 300, 321, 24016, 321, 366, + 
1940, 472, 14581, 23663, 341, 472, 14581, 293, 550, 1566, 2361, 322, 51788], "temperature": + 0.0, "avg_logprob": -0.13489577770233155, "compression_ratio": 1.7477064220183487, + "no_speech_prob": 0.0005321715143509209}, {"id": 214, "seek": 141868, "start": 1419.3200000000002, + "end": 1425.72, "text": " a variety of experiments that we made what is for this + individual query the best parameter", "tokens": [50396, 257, 5673, 295, 12050, 300, + 321, 1027, 437, 307, 337, 341, 2609, 14581, 264, 1151, 13075, 50716], "temperature": + 0.0, "avg_logprob": -0.12494560388418344, "compression_ratio": 1.7788461538461537, + "no_speech_prob": 0.0004352013929747045}, {"id": 215, "seek": 141868, "start": 1425.72, + "end": 1431.16, "text": " combination that we cannot apply so that we are not really + globally applying something but", "tokens": [50716, 6562, 300, 321, 2644, 3079, + 370, 300, 321, 366, 406, 534, 18958, 9275, 746, 457, 50988], "temperature": 0.0, + "avg_logprob": -0.12494560388418344, "compression_ratio": 1.7788461538461537, "no_speech_prob": + 0.0004352013929747045}, {"id": 216, "seek": 141868, "start": 1431.16, "end": 1440.68, + "text": " individually dynamically per query and um to already maybe yeah give you + the results um of", "tokens": [50988, 16652, 43492, 680, 14581, 293, 1105, 281, + 1217, 1310, 1338, 976, 291, 264, 3542, 1105, 295, 51464], "temperature": 0.0, "avg_logprob": + -0.12494560388418344, "compression_ratio": 1.7788461538461537, "no_speech_prob": + 0.0004352013929747045}, {"id": 217, "seek": 141868, "start": 1441.24, "end": 1447.96, + "text": " what we did and then go into detail how we did it um the dynamic approach + outperformed the global", "tokens": [51492, 437, 321, 630, 293, 550, 352, 666, 2607, + 577, 321, 630, 309, 1105, 264, 8546, 3109, 484, 610, 22892, 264, 4338, 51828], "temperature": + 0.0, "avg_logprob": -0.12494560388418344, "compression_ratio": 1.7788461538461537, + "no_speech_prob": 0.0004352013929747045}, {"id": 218, "seek": 
144796, "start": 1447.96, + "end": 1456.76, "text": " approach so we managed to identify a set of features we + trained a model or multiple models actually", "tokens": [50364, 3109, 370, 321, + 6453, 281, 5876, 257, 992, 295, 4122, 321, 8895, 257, 2316, 420, 3866, 5245, 767, + 50804], "temperature": 0.0, "avg_logprob": -0.130331888794899, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 0.0004946246044710279}, {"id": 219, "seek": + 144796, "start": 1457.8, "end": 1464.52, "text": " and by applying this we were + able to predict the best neuralness in that case or the best neural", "tokens": + [50856, 293, 538, 9275, 341, 321, 645, 1075, 281, 6069, 264, 1151, 18161, 1287, + 294, 300, 1389, 420, 264, 1151, 18161, 51192], "temperature": 0.0, "avg_logprob": + -0.130331888794899, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0004946246044710279}, + {"id": 220, "seek": 144796, "start": 1464.52, "end": 1471.96, "text": " search weight + for a given query based on um the results we got off the global hybrid search", + "tokens": [51192, 3164, 3364, 337, 257, 2212, 14581, 2361, 322, 1105, 264, 3542, + 321, 658, 766, 264, 4338, 13051, 3164, 51564], "temperature": 0.0, "avg_logprob": + -0.130331888794899, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0004946246044710279}, + {"id": 221, "seek": 147196, "start": 1472.04, "end": 1479.32, "text": " optimizer + so we basically recycled all the different search metrics on a per query basis that + we got", "tokens": [50368, 5028, 6545, 370, 321, 1936, 30674, 439, 264, 819, 3164, + 16367, 322, 257, 680, 14581, 5143, 300, 321, 658, 50732], "temperature": 0.0, "avg_logprob": + -0.15848018421846277, "compression_ratio": 1.668103448275862, "no_speech_prob": + 0.0003756447113119066}, {"id": 222, "seek": 147196, "start": 1479.88, "end": 1486.3600000000001, + "text": " did some feature engineering trained models and then used these models + to predict what is the", "tokens": [50760, 630, 512, 4111, 
7043, 8895, 5245, 293, + 550, 1143, 613, 5245, 281, 6069, 437, 307, 264, 51084], "temperature": 0.0, "avg_logprob": + -0.15848018421846277, "compression_ratio": 1.668103448275862, "no_speech_prob": + 0.0003756447113119066}, {"id": 223, "seek": 147196, "start": 1486.3600000000001, + "end": 1493.32, "text": " best neural search weight for this query and with this + dynamic approach we even saw increases", "tokens": [51084, 1151, 18161, 3164, 3364, + 337, 341, 14581, 293, 365, 341, 8546, 3109, 321, 754, 1866, 8637, 51432], "temperature": + 0.0, "avg_logprob": -0.15848018421846277, "compression_ratio": 1.668103448275862, + "no_speech_prob": 0.0003756447113119066}, {"id": 224, "seek": 147196, "start": 1493.32, + "end": 1501.48, "text": " up to 10% in one of the metrics of the three that I just + mentioned DCG and DCG and precision at 10", "tokens": [51432, 493, 281, 1266, 4, + 294, 472, 295, 264, 16367, 295, 264, 1045, 300, 286, 445, 2835, 9114, 38, 293, 9114, + 38, 293, 18356, 412, 1266, 51840], "temperature": 0.0, "avg_logprob": -0.15848018421846277, + "compression_ratio": 1.668103448275862, "no_speech_prob": 0.0003756447113119066}, + {"id": 225, "seek": 150196, "start": 1502.52, "end": 1507.56, "text": " yeah that''s + very exciting thanks for for sharing this whole you know end-to-end", "tokens": + [50392, 1338, 300, 311, 588, 4670, 3231, 337, 337, 5414, 341, 1379, 291, 458, 917, + 12, 1353, 12, 521, 50644], "temperature": 0.0, "avg_logprob": -0.19924222855340867, + "compression_ratio": 1.6415929203539823, "no_speech_prob": 0.0009391083149239421}, + {"id": 226, "seek": 150196, "start": 1507.56, "end": 1516.76, "text": " picture + pipeline I''m particularly interested at least at this point in time about the fact + that", "tokens": [50644, 3036, 15517, 286, 478, 4098, 3102, 412, 1935, 412, 341, + 935, 294, 565, 466, 264, 1186, 300, 51104], "temperature": 0.0, "avg_logprob": -0.19924222855340867, + "compression_ratio": 1.6415929203539823, "no_speech_prob": 
0.0009391083149239421}, + {"id": 227, "seek": 150196, "start": 1516.76, "end": 1523.08, "text": " well first + of all your dynamic approach outperformed the global one right and that seems to + be thanks", "tokens": [51104, 731, 700, 295, 439, 428, 8546, 3109, 484, 610, 22892, + 264, 4338, 472, 558, 293, 300, 2544, 281, 312, 3231, 51420], "temperature": 0.0, + "avg_logprob": -0.19924222855340867, "compression_ratio": 1.6415929203539823, "no_speech_prob": + 0.0009391083149239421}, {"id": 228, "seek": 150196, "start": 1523.08, "end": 1530.68, + "text": " to that query understanding part right can you talk a bit more about that + uh and also did you", "tokens": [51420, 281, 300, 14581, 3701, 644, 558, 393, 291, + 751, 257, 857, 544, 466, 300, 2232, 293, 611, 630, 291, 51800], "temperature": 0.0, + "avg_logprob": -0.19924222855340867, "compression_ratio": 1.6415929203539823, "no_speech_prob": + 0.0009391083149239421}, {"id": 229, "seek": 153068, "start": 1531.64, "end": 1539.4, + "text": " check those predictions manually for example does it make intuitive sense + that system picked that''s", "tokens": [50412, 1520, 729, 21264, 16945, 337, 1365, + 775, 309, 652, 21769, 2020, 300, 1185, 6183, 300, 311, 50800], "temperature": 0.0, + "avg_logprob": -0.16313033633761936, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.0011577354744076729}, {"id": 230, "seek": 153068, "start": 1539.4, "end": 1545.16, + "text": " for that specific query it picked more neuralness I mean is it like is + it like a natural language", "tokens": [50800, 337, 300, 2685, 14581, 309, 6183, + 544, 18161, 1287, 286, 914, 307, 309, 411, 307, 309, 411, 257, 3303, 2856, 51088], + "temperature": 0.0, "avg_logprob": -0.16313033633761936, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.0011577354744076729}, {"id": 231, "seek": 153068, "start": 1545.72, + "end": 1551.16, "text": " question there or like some remnants of it or is there + some other interesting findings that you", 
"tokens": [51116, 1168, 456, 420, 411, + 512, 44652, 295, 309, 420, 307, 456, 512, 661, 1880, 16483, 300, 291, 51388], "temperature": + 0.0, "avg_logprob": -0.16313033633761936, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.0011577354744076729}, {"id": 232, "seek": 153068, "start": 1551.16, + "end": 1559.88, "text": " could share possibly oh yeah so um let''s first maybe + outline what we did exactly and then", "tokens": [51388, 727, 2073, 6264, 1954, + 1338, 370, 1105, 718, 311, 700, 1310, 16387, 437, 321, 630, 2293, 293, 550, 51824], + "temperature": 0.0, "avg_logprob": -0.16313033633761936, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.0011577354744076729}, {"id": 233, "seek": 155988, "start": 1560.44, + "end": 1567.8000000000002, "text": " dive into a couple of observations that we + made on the way um so we started out by creating", "tokens": [50392, 9192, 666, + 257, 1916, 295, 18163, 300, 321, 1027, 322, 264, 636, 1105, 370, 321, 1409, 484, + 538, 4084, 50760], "temperature": 0.0, "avg_logprob": -0.1355469822883606, "compression_ratio": + 2.0828729281767955, "no_speech_prob": 0.0001732686796458438}, {"id": 234, "seek": + 155988, "start": 1568.8400000000001, "end": 1575.8000000000002, "text": " what we + call feature groups and then we created features for these feature groups so we + looked at", "tokens": [50812, 437, 321, 818, 4111, 3935, 293, 550, 321, 2942, 4122, + 337, 613, 4111, 3935, 370, 321, 2956, 412, 51160], "temperature": 0.0, "avg_logprob": + -0.1355469822883606, "compression_ratio": 2.0828729281767955, "no_speech_prob": + 0.0001732686796458438}, {"id": 235, "seek": 155988, "start": 1575.8000000000002, + "end": 1582.2, "text": " three different feature groups um one was the query feature + group the next one was the keyword", "tokens": [51160, 1045, 819, 4111, 3935, 1105, + 472, 390, 264, 14581, 4111, 1594, 264, 958, 472, 390, 264, 20428, 51480], "temperature": + 0.0, "avg_logprob": -0.1355469822883606, 
"compression_ratio": 2.0828729281767955, + "no_speech_prob": 0.0001732686796458438}, {"id": 236, "seek": 155988, "start": 1582.2, + "end": 1589.24, "text": " search result feature group and then the semantics search + result feature group for the query", "tokens": [51480, 3164, 1874, 4111, 1594, 293, + 550, 264, 4361, 45298, 3164, 1874, 4111, 1594, 337, 264, 14581, 51832], "temperature": + 0.0, "avg_logprob": -0.1355469822883606, "compression_ratio": 2.0828729281767955, + "no_speech_prob": 0.0001732686796458438}, {"id": 237, "seek": 158924, "start": 1589.32, + "end": 1597.64, "text": " feature group we had a look at the length of the query + the number of query terms if the query has", "tokens": [50368, 4111, 1594, 321, + 632, 257, 574, 412, 264, 4641, 295, 264, 14581, 264, 1230, 295, 14581, 2115, 498, + 264, 14581, 575, 50784], "temperature": 0.0, "avg_logprob": -0.07804080934235544, + "compression_ratio": 1.8389261744966443, "no_speech_prob": 0.00014303417992778122}, + {"id": 238, "seek": 158924, "start": 1597.64, "end": 1605.0, "text": " numbers in + it and if the query has any special characters in it so we kind of thought of ways", + "tokens": [50784, 3547, 294, 309, 293, 498, 264, 14581, 575, 604, 2121, 4342, 294, + 309, 370, 321, 733, 295, 1194, 295, 2098, 51152], "temperature": 0.0, "avg_logprob": + -0.07804080934235544, "compression_ratio": 1.8389261744966443, "no_speech_prob": + 0.00014303417992778122}, {"id": 239, "seek": 158924, "start": 1605.96, "end": 1612.52, + "text": " figuring out when is the query maybe more specific when it is more when + is it more", "tokens": [51200, 15213, 484, 562, 307, 264, 14581, 1310, 544, 2685, + 562, 309, 307, 544, 562, 307, 309, 544, 51528], "temperature": 0.0, "avg_logprob": + -0.07804080934235544, "compression_ratio": 1.8389261744966443, "no_speech_prob": + 0.00014303417992778122}, {"id": 240, "seek": 161252, "start": 1612.52, "end": 1618.68, + "text": " broad query a narrow query and then we will just come up with 
rules like + well a longer", "tokens": [50364, 4152, 14581, 257, 9432, 14581, 293, 550, 321, + 486, 445, 808, 493, 365, 4474, 411, 731, 257, 2854, 50672], "temperature": 0.0, + "avg_logprob": -0.14232163096583167, "compression_ratio": 1.968586387434555, "no_speech_prob": + 0.0014477790100499988}, {"id": 241, "seek": 161252, "start": 1618.68, "end": 1626.68, + "text": " query is the more specific it is and maybe if we have more specific queries + we have less results", "tokens": [50672, 14581, 307, 264, 544, 2685, 309, 307, 293, + 1310, 498, 321, 362, 544, 2685, 24109, 321, 362, 1570, 3542, 51072], "temperature": + 0.0, "avg_logprob": -0.14232163096583167, "compression_ratio": 1.968586387434555, + "no_speech_prob": 0.0014477790100499988}, {"id": 242, "seek": 161252, "start": 1626.68, + "end": 1634.2, "text": " that''s where we may want to augment search results with + neural search results on the other hand", "tokens": [51072, 300, 311, 689, 321, + 815, 528, 281, 29919, 3164, 3542, 365, 18161, 3164, 3542, 322, 264, 661, 1011, 51448], + "temperature": 0.0, "avg_logprob": -0.14232163096583167, "compression_ratio": 1.968586387434555, + "no_speech_prob": 0.0014477790100499988}, {"id": 243, "seek": 161252, "start": 1634.2, + "end": 1639.48, "text": " when we have a very broad query we may have a lot of results + these are short queries and then we", "tokens": [51448, 562, 321, 362, 257, 588, + 4152, 14581, 321, 815, 362, 257, 688, 295, 3542, 613, 366, 2099, 24109, 293, 550, + 321, 51712], "temperature": 0.0, "avg_logprob": -0.14232163096583167, "compression_ratio": + 1.968586387434555, "no_speech_prob": 0.0014477790100499988}, {"id": 244, "seek": + 163948, "start": 1639.48, "end": 1646.2, "text": " may want to let''s say only use + organic traditional keyword search results yeah so we just came up with", "tokens": + [50364, 815, 528, 281, 718, 311, 584, 787, 764, 10220, 5164, 20428, 3164, 3542, + 1338, 370, 321, 445, 1361, 493, 365, 50700], "temperature": 0.0, 
"avg_logprob": + -0.11411915296389732, "compression_ratio": 1.819047619047619, "no_speech_prob": + 0.00030637902091257274}, {"id": 245, "seek": 163948, "start": 1646.2, "end": 1654.84, + "text": " a couple of assumptions on our side and then with these four features + for the keyword search result", "tokens": [50700, 257, 1916, 295, 17695, 322, 527, + 1252, 293, 550, 365, 613, 1451, 4122, 337, 264, 20428, 3164, 1874, 51132], "temperature": + 0.0, "avg_logprob": -0.11411915296389732, "compression_ratio": 1.819047619047619, + "no_speech_prob": 0.00030637902091257274}, {"id": 246, "seek": 163948, "start": + 1654.84, "end": 1662.6, "text": " feature group we looked at the number of search + results the number of hits we got when we", "tokens": [51132, 4111, 1594, 321, 2956, + 412, 264, 1230, 295, 3164, 3542, 264, 1230, 295, 8664, 321, 658, 562, 321, 51520], + "temperature": 0.0, "avg_logprob": -0.11411915296389732, "compression_ratio": 1.819047619047619, + "no_speech_prob": 0.00030637902091257274}, {"id": 247, "seek": 163948, "start": + 1663.4, "end": 1669.24, "text": " executed the query with our baseline search configuration + so the one searching in the six", "tokens": [51560, 17577, 264, 14581, 365, 527, + 20518, 3164, 11694, 370, 264, 472, 10808, 294, 264, 2309, 51852], "temperature": + 0.0, "avg_logprob": -0.11411915296389732, "compression_ratio": 1.819047619047619, + "no_speech_prob": 0.00030637902091257274}, {"id": 248, "seek": 166924, "start": + 1669.24, "end": 1676.04, "text": " fields and then with something like hey if we + have zero results then this is maybe a perfect scenario", "tokens": [50364, 7909, + 293, 550, 365, 746, 411, 4177, 498, 321, 362, 4018, 3542, 550, 341, 307, 1310, 257, + 2176, 9005, 50704], "temperature": 0.0, "avg_logprob": -0.11994010894024959, "compression_ratio": + 1.699421965317919, "no_speech_prob": 0.0004973821341991425}, {"id": 249, "seek": + 166924, "start": 1676.04, "end": 1684.92, "text": " for neural search because then + we 
want to augment zero results with zero keyword results with what", "tokens": + [50704, 337, 18161, 3164, 570, 550, 321, 528, 281, 29919, 4018, 3542, 365, 4018, + 20428, 3542, 365, 437, 51148], "temperature": 0.0, "avg_logprob": -0.11994010894024959, + "compression_ratio": 1.699421965317919, "no_speech_prob": 0.0004973821341991425}, + {"id": 250, "seek": 166924, "start": 1684.92, "end": 1694.6, "text": " comes from + the vector search application the other two features we had in this group were the", + "tokens": [51148, 1487, 490, 264, 8062, 3164, 3861, 264, 661, 732, 4122, 321, 632, + 294, 341, 1594, 645, 264, 51632], "temperature": 0.0, "avg_logprob": -0.11994010894024959, + "compression_ratio": 1.699421965317919, "no_speech_prob": 0.0004973821341991425}, + {"id": 251, "seek": 169460, "start": 1695.48, "end": 1702.4399999999998, "text": + " the best title score we had in the keyword search result so if we have a strong + title match maybe", "tokens": [50408, 264, 1151, 4876, 6175, 321, 632, 294, 264, + 20428, 3164, 1874, 370, 498, 321, 362, 257, 2068, 4876, 2995, 1310, 50756], "temperature": + 0.0, "avg_logprob": -0.0765617644950135, "compression_ratio": 1.716763005780347, + "no_speech_prob": 0.0008155876421369612}, {"id": 252, "seek": 169460, "start": 1702.4399999999998, + "end": 1711.8, "text": " that''s an indication of we don''t need as much neural + search and we also have a look at I think the", "tokens": [50756, 300, 311, 364, + 18877, 295, 321, 500, 380, 643, 382, 709, 18161, 3164, 293, 321, 611, 362, 257, + 574, 412, 286, 519, 264, 51224], "temperature": 0.0, "avg_logprob": -0.0765617644950135, + "compression_ratio": 1.716763005780347, "no_speech_prob": 0.0008155876421369612}, + {"id": 253, "seek": 169460, "start": 1711.8, "end": 1719.56, "text": " average title + score in the top 10 was was another one so if we have like a high average in the + title", "tokens": [51224, 4274, 4876, 6175, 294, 264, 1192, 1266, 390, 390, 1071, + 472, 370, 498, 321, 362, 
411, 257, 1090, 4274, 294, 264, 4876, 51612], "temperature": + 0.0, "avg_logprob": -0.0765617644950135, "compression_ratio": 1.716763005780347, + "no_speech_prob": 0.0008155876421369612}, {"id": 254, "seek": 171956, "start": 1719.56, + "end": 1728.84, "text": " scores that''s maybe a good sign for no augmentation needed + with neural search results for semantic", "tokens": [50364, 13444, 300, 311, 1310, + 257, 665, 1465, 337, 572, 14501, 19631, 2978, 365, 18161, 3164, 3542, 337, 47982, + 50828], "temperature": 0.0, "avg_logprob": -0.09373751420241136, "compression_ratio": + 1.7218934911242603, "no_speech_prob": 0.0005050160689279437}, {"id": 255, "seek": + 171956, "start": 1728.84, "end": 1738.6, "text": " search it was similar to the + title score so we had a look at the best title score and the average", "tokens": + [50828, 3164, 309, 390, 2531, 281, 264, 4876, 6175, 370, 321, 632, 257, 574, 412, + 264, 1151, 4876, 6175, 293, 264, 4274, 51316], "temperature": 0.0, "avg_logprob": + -0.09373751420241136, "compression_ratio": 1.7218934911242603, "no_speech_prob": + 0.0005050160689279437}, {"id": 256, "seek": 171956, "start": 1739.3999999999999, + "end": 1746.44, "text": " semantic similarity based on the title that we had indexed + so by looking at these three groups", "tokens": [51356, 47982, 32194, 2361, 322, + 264, 4876, 300, 321, 632, 8186, 292, 370, 538, 1237, 412, 613, 1045, 3935, 51708], + "temperature": 0.0, "avg_logprob": -0.09373751420241136, "compression_ratio": 1.7218934911242603, + "no_speech_prob": 0.0005050160689279437}, {"id": 257, "seek": 174644, "start": 1746.44, + "end": 1753.48, "text": " we thought of well we now have a representation of the + query on its own the result set based on keywords", "tokens": [50364, 321, 1194, + 295, 731, 321, 586, 362, 257, 10290, 295, 264, 14581, 322, 1080, 1065, 264, 1874, + 992, 2361, 322, 21009, 50716], "temperature": 0.0, "avg_logprob": -0.0883289048838061, + "compression_ratio": 1.7972972972972974, 
"no_speech_prob": 0.0012511537643149495}, + {"id": 258, "seek": 174644, "start": 1753.48, "end": 1760.28, "text": " and then + the result set based on neural search and that was kind of our starting point and + then we", "tokens": [50716, 293, 550, 264, 1874, 992, 2361, 322, 18161, 3164, 293, + 300, 390, 733, 295, 527, 2891, 935, 293, 550, 321, 51056], "temperature": 0.0, "avg_logprob": + -0.0883289048838061, "compression_ratio": 1.7972972972972974, "no_speech_prob": + 0.0012511537643149495}, {"id": 259, "seek": 174644, "start": 1760.28, "end": 1768.1200000000001, + "text": " did loads of experiments having to do with what''s the best feature combination + when we train a linear", "tokens": [51056, 630, 12668, 295, 12050, 1419, 281, 360, + 365, 437, 311, 264, 1151, 4111, 6562, 562, 321, 3847, 257, 8213, 51448], "temperature": + 0.0, "avg_logprob": -0.0883289048838061, "compression_ratio": 1.7972972972972974, + "no_speech_prob": 0.0012511537643149495}, {"id": 260, "seek": 174644, "start": 1768.1200000000001, + "end": 1775.96, "text": " regression model or a random forest regression model what + does regularization play for a role", "tokens": [51448, 24590, 2316, 420, 257, 4974, + 6719, 24590, 2316, 437, 775, 3890, 2144, 862, 337, 257, 3090, 51840], "temperature": + 0.0, "avg_logprob": -0.0883289048838061, "compression_ratio": 1.7972972972972974, + "no_speech_prob": 0.0012511537643149495}, {"id": 261, "seek": 177596, "start": 1776.2, + "end": 1784.2, "text": " can we optimize the model training aspect with that so + we really did a lot of iterations with these we", "tokens": [50376, 393, 321, 19719, + 264, 2316, 3097, 4171, 365, 300, 370, 321, 534, 630, 257, 688, 295, 36540, 365, + 613, 321, 50776], "temperature": 0.0, "avg_logprob": -0.14441747376413056, "compression_ratio": + 1.646067415730337, "no_speech_prob": 0.00133566337171942}, {"id": 262, "seek": 177596, + "start": 1784.2, "end": 1791.0, "text": " also had a look at a large query set and + versus a smaller 
query set to see if that also provides", "tokens": [50776, 611, + 632, 257, 574, 412, 257, 2416, 14581, 992, 293, 5717, 257, 4356, 14581, 992, 281, + 536, 498, 300, 611, 6417, 51116], "temperature": 0.0, "avg_logprob": -0.14441747376413056, + "compression_ratio": 1.646067415730337, "no_speech_prob": 0.00133566337171942}, + {"id": 263, "seek": 177596, "start": 1791.0, "end": 1801.4, "text": " different + aspects to it if we just randomly sound the 500 yet 500 queries versus 5000 queries", + "tokens": [51116, 819, 7270, 281, 309, 498, 321, 445, 16979, 1626, 264, 5923, 1939, + 5923, 24109, 5717, 23777, 24109, 51636], "temperature": 0.0, "avg_logprob": -0.14441747376413056, + "compression_ratio": 1.646067415730337, "no_speech_prob": 0.00133566337171942}, + {"id": 264, "seek": 180140, "start": 1801.5600000000002, "end": 1810.92, "text": + " so we did a lot of exploration to really pick out and make sure that we are not + really let''s say", "tokens": [50372, 370, 321, 630, 257, 688, 295, 16197, 281, + 534, 1888, 484, 293, 652, 988, 300, 321, 366, 406, 534, 718, 311, 584, 50840], "temperature": + 0.0, "avg_logprob": -0.17442808431737564, "compression_ratio": 1.7045454545454546, + "no_speech_prob": 0.0032360912300646305}, {"id": 265, "seek": 180140, "start": 1810.92, + "end": 1820.1200000000001, "text": " randomly receiving the uplift that we saw but + actually really making sure that there is something", "tokens": [50840, 16979, 10040, + 264, 45407, 300, 321, 1866, 457, 767, 534, 1455, 988, 300, 456, 307, 746, 51300], + "temperature": 0.0, "avg_logprob": -0.17442808431737564, "compression_ratio": 1.7045454545454546, + "no_speech_prob": 0.0032360912300646305}, {"id": 266, "seek": 180140, "start": 1820.1200000000001, + "end": 1830.2800000000002, "text": " to it and that we can go out in the wild there + for example on this podcast and share our observations and", "tokens": [51300, 281, + 309, 293, 300, 321, 393, 352, 484, 294, 264, 4868, 456, 337, 1365, 322, 341, 7367, + 
293, 2073, 527, 18163, 293, 51808], "temperature": 0.0, "avg_logprob": -0.17442808431737564, + "compression_ratio": 1.7045454545454546, "no_speech_prob": 0.0032360912300646305}, + {"id": 267, "seek": 183028, "start": 1831.16, "end": 1837.08, "text": " be kind + of on the safe side that they can be reproduced hopefully in other scenarios as + well", "tokens": [50408, 312, 733, 295, 322, 264, 3273, 1252, 300, 436, 393, 312, + 11408, 1232, 4696, 294, 661, 15077, 382, 731, 50704], "temperature": 0.0, "avg_logprob": + -0.12407591848662405, "compression_ratio": 1.6705882352941177, "no_speech_prob": + 0.00036687497049570084}, {"id": 268, "seek": 183028, "start": 1839.72, "end": 1847.3999999999999, + "text": " so that''s kind of on the let''s say how did we do it here so the features + feature engineering", "tokens": [50836, 370, 300, 311, 733, 295, 322, 264, 718, + 311, 584, 577, 630, 321, 360, 309, 510, 370, 264, 4122, 4111, 7043, 51220], "temperature": + 0.0, "avg_logprob": -0.12407591848662405, "compression_ratio": 1.6705882352941177, + "no_speech_prob": 0.00036687497049570084}, {"id": 269, "seek": 183028, "start": + 1847.3999999999999, "end": 1853.48, "text": " how we train our models so we have + a look at linear regression models and random forest regression", "tokens": [51220, + 577, 321, 3847, 527, 5245, 370, 321, 362, 257, 574, 412, 8213, 24590, 5245, 293, + 4974, 6719, 24590, 51524], "temperature": 0.0, "avg_logprob": -0.12407591848662405, + "compression_ratio": 1.6705882352941177, "no_speech_prob": 0.00036687497049570084}, + {"id": 270, "seek": 185348, "start": 1854.3600000000001, "end": 1861.24, "text": + " as a starting point because we thought let''s have a look at simple models first + and if that works", "tokens": [50408, 382, 257, 2891, 935, 570, 321, 1194, 718, + 311, 362, 257, 574, 412, 2199, 5245, 700, 293, 498, 300, 1985, 50752], "temperature": + 0.0, "avg_logprob": -0.07810977064532998, "compression_ratio": 1.8390243902439025, + "no_speech_prob": 
0.0013448463287204504}, {"id": 271, "seek": 185348, "start": 1861.24, + "end": 1867.64, "text": " we can still have a look at the more complex ones and + that''s maybe already the first observation", "tokens": [50752, 321, 393, 920, 362, + 257, 574, 412, 264, 544, 3997, 2306, 293, 300, 311, 1310, 1217, 264, 700, 14816, + 51072], "temperature": 0.0, "avg_logprob": -0.07810977064532998, "compression_ratio": + 1.8390243902439025, "no_speech_prob": 0.0013448463287204504}, {"id": 272, "seek": + 185348, "start": 1867.64, "end": 1875.64, "text": " that I can share so linear regression + models and the simplest form random forest regression", "tokens": [51072, 300, 286, + 393, 2073, 370, 8213, 24590, 5245, 293, 264, 22811, 1254, 4974, 6719, 24590, 51472], + "temperature": 0.0, "avg_logprob": -0.07810977064532998, "compression_ratio": 1.8390243902439025, + "no_speech_prob": 0.0013448463287204504}, {"id": 273, "seek": 185348, "start": 1875.64, + "end": 1880.76, "text": " slightly more complex form and then this is the last model + iteration that we did last week", "tokens": [51472, 4748, 544, 3997, 1254, 293, + 550, 341, 307, 264, 1036, 2316, 24784, 300, 321, 630, 1036, 1243, 51728], "temperature": + 0.0, "avg_logprob": -0.07810977064532998, "compression_ratio": 1.8390243902439025, + "no_speech_prob": 0.0013448463287204504}, {"id": 274, "seek": 188076, "start": 1881.4, + "end": 1891.72, "text": " we also have a look at gradient boosting methods and interestingly + they all were almost the same", "tokens": [50396, 321, 611, 362, 257, 574, 412, + 16235, 43117, 7150, 293, 25873, 436, 439, 645, 1920, 264, 912, 50912], "temperature": + 0.0, "avg_logprob": -0.07371518688817177, "compression_ratio": 1.5815217391304348, + "no_speech_prob": 0.0006007174379192293}, {"id": 275, "seek": 188076, "start": 1891.72, + "end": 1898.36, "text": " from the model performance perspective so it wasn''t like + the most complex ones really give you", "tokens": [50912, 490, 264, 2316, 3389, + 4585, 
370, 309, 2067, 380, 411, 264, 881, 3997, 2306, 534, 976, 291, 51244], "temperature": + 0.0, "avg_logprob": -0.07371518688817177, "compression_ratio": 1.5815217391304348, + "no_speech_prob": 0.0006007174379192293}, {"id": 276, "seek": 188076, "start": 1898.36, + "end": 1906.6, "text": " the best results and that''s kind of a very reassuring + feeling because we need to calculate a couple", "tokens": [51244, 264, 1151, 3542, + 293, 300, 311, 733, 295, 257, 588, 19486, 1345, 2633, 570, 321, 643, 281, 8873, + 257, 1916, 51656], "temperature": 0.0, "avg_logprob": -0.07371518688817177, "compression_ratio": + 1.5815217391304348, "no_speech_prob": 0.0006007174379192293}, {"id": 277, "seek": + 190660, "start": 1906.76, "end": 1914.9199999999998, "text": " features right per + query and that adds latency to your query execution and especially in e-commerce", + "tokens": [50372, 4122, 558, 680, 14581, 293, 300, 10860, 27043, 281, 428, 14581, + 15058, 293, 2318, 294, 308, 12, 26926, 50780], "temperature": 0.0, "avg_logprob": + -0.13972900027320498, "compression_ratio": 1.5614973262032086, "no_speech_prob": + 0.0022194196935743093}, {"id": 278, "seek": 190660, "start": 1914.9199999999998, + "end": 1920.76, "text": " where every millisecond basically counts we don''t really + want to let''s say run multiple queries", "tokens": [50780, 689, 633, 27940, 18882, + 1936, 14893, 321, 500, 380, 534, 528, 281, 718, 311, 584, 1190, 3866, 24109, 51072], + "temperature": 0.0, "avg_logprob": -0.13972900027320498, "compression_ratio": 1.5614973262032086, + "no_speech_prob": 0.0022194196935743093}, {"id": 279, "seek": 190660, "start": 1920.76, + "end": 1929.8, "text": " to calculate our features just to have like 0.3 percent + performance increase so it really has to", "tokens": [51072, 281, 8873, 527, 4122, + 445, 281, 362, 411, 1958, 13, 18, 3043, 3389, 3488, 370, 309, 534, 575, 281, 51524], + "temperature": 0.0, "avg_logprob": -0.13972900027320498, "compression_ratio": 1.5614973262032086, 
+ "no_speech_prob": 0.0022194196935743093}, {"id": 280, "seek": 192980, "start": 1930.44, + "end": 1937.6399999999999, "text": " be worth the effort so that''s kind of a nice + observation so we don''t have to go for the most", "tokens": [50396, 312, 3163, + 264, 4630, 370, 300, 311, 733, 295, 257, 1481, 14816, 370, 321, 500, 380, 362, 281, + 352, 337, 264, 881, 50756], "temperature": 0.0, "avg_logprob": -0.15205131342381606, + "compression_ratio": 1.6771300448430493, "no_speech_prob": 0.0016967261908575892}, + {"id": 281, "seek": 192980, "start": 1937.6399999999999, "end": 1943.6399999999999, + "text": " complex model architectures we can stick with the simple ones and don''t + really lose a lot of", "tokens": [50756, 3997, 2316, 6331, 1303, 321, 393, 2897, + 365, 264, 2199, 2306, 293, 500, 380, 534, 3624, 257, 688, 295, 51056], "temperature": + 0.0, "avg_logprob": -0.15205131342381606, "compression_ratio": 1.6771300448430493, + "no_speech_prob": 0.0016967261908575892}, {"id": 282, "seek": 192980, "start": 1943.6399999999999, + "end": 1950.68, "text": " performance if any the linear regression model and the + random forest regression model they actually", "tokens": [51056, 3389, 498, 604, + 264, 8213, 24590, 2316, 293, 264, 4974, 6719, 24590, 2316, 436, 767, 51408], "temperature": + 0.0, "avg_logprob": -0.15205131342381606, "compression_ratio": 1.6771300448430493, + "no_speech_prob": 0.0016967261908575892}, {"id": 283, "seek": 192980, "start": 1950.68, + "end": 1959.24, "text": " scored absolutely equally when calculating the search + metrics so they predicted the NDEG", "tokens": [51408, 18139, 3122, 12309, 562, + 28258, 264, 3164, 16367, 370, 436, 19147, 264, 426, 22296, 38, 51836], "temperature": + 0.0, "avg_logprob": -0.15205131342381606, "compression_ratio": 1.6771300448430493, + "no_speech_prob": 0.0016967261908575892}, {"id": 284, "seek": 195924, "start": 1959.24, + "end": 1966.68, "text": " scores slightly different so that''s how we did it we + predicted 
NDEG scores by adding neuroness", "tokens": [50364, 13444, 4748, 819, + 370, 300, 311, 577, 321, 630, 309, 321, 19147, 426, 22296, 38, 13444, 538, 5127, + 12087, 266, 442, 50736], "temperature": 0.0, "avg_logprob": -0.16975004217597875, + "compression_ratio": 1.7710280373831775, "no_speech_prob": 0.0007928272243589163}, + {"id": 285, "seek": 195924, "start": 1966.68, "end": 1973.88, "text": " as a 10th + feature in that case and by looking at which neuroness scored best and that kind + of", "tokens": [50736, 382, 257, 1266, 392, 4111, 294, 300, 1389, 293, 538, 1237, + 412, 597, 12087, 266, 442, 18139, 1151, 293, 300, 733, 295, 51096], "temperature": + 0.0, "avg_logprob": -0.16975004217597875, "compression_ratio": 1.7710280373831775, + "no_speech_prob": 0.0007928272243589163}, {"id": 286, "seek": 195924, "start": 1973.88, + "end": 1981.96, "text": " the neuroness we went then with for testing efforts afterwards + so that''s kind of the first interesting", "tokens": [51096, 264, 12087, 266, 442, + 321, 1437, 550, 365, 337, 4997, 6484, 10543, 370, 300, 311, 733, 295, 264, 700, + 1880, 51500], "temperature": 0.0, "avg_logprob": -0.16975004217597875, "compression_ratio": + 1.7710280373831775, "no_speech_prob": 0.0007928272243589163}, {"id": 287, "seek": + 195924, "start": 1981.96, "end": 1989.0, "text": " observation that we made we also + had a look at different feature groups so what happens", "tokens": [51500, 14816, + 300, 321, 1027, 321, 611, 632, 257, 574, 412, 819, 4111, 3935, 370, 437, 2314, 51852], + "temperature": 0.0, "avg_logprob": -0.16975004217597875, "compression_ratio": 1.7710280373831775, + "no_speech_prob": 0.0007928272243589163}, {"id": 288, "seek": 198900, "start": 1989.0, + "end": 1997.16, "text": " to model performance if we focus only on query features + or only on keyword search result features", "tokens": [50364, 281, 2316, 3389, 498, + 321, 1879, 787, 322, 14581, 4122, 420, 787, 322, 20428, 3164, 1874, 4122, 50772], + "temperature": 0.0, 
"avg_logprob": -0.10760331572147838, "compression_ratio": 1.7439024390243902, + "no_speech_prob": 0.0002571550430729985}, {"id": 289, "seek": 198900, "start": 1997.16, + "end": 2004.28, "text": " so training models within one feature group only and not + taking all features into account", "tokens": [50772, 370, 3097, 5245, 1951, 472, + 4111, 1594, 787, 293, 406, 1940, 439, 4122, 666, 2696, 51128], "temperature": 0.0, + "avg_logprob": -0.10760331572147838, "compression_ratio": 1.7439024390243902, "no_speech_prob": + 0.0002571550430729985}, {"id": 290, "seek": 198900, "start": 2005.32, "end": 2012.28, + "text": " the interesting part here was that the combination work best so not always + the combinations of all", "tokens": [51180, 264, 1880, 644, 510, 390, 300, 264, + 6562, 589, 1151, 370, 406, 1009, 264, 21267, 295, 439, 51528], "temperature": 0.0, + "avg_logprob": -0.10760331572147838, "compression_ratio": 1.7439024390243902, "no_speech_prob": + 0.0002571550430729985}, {"id": 291, "seek": 201228, "start": 2012.28, "end": 2018.76, + "text": " in nine features together with the neuroness feature but at least some + of the key word search", "tokens": [50364, 294, 4949, 4122, 1214, 365, 264, 12087, + 266, 442, 4111, 457, 412, 1935, 512, 295, 264, 2141, 1349, 3164, 50688], "temperature": + 0.0, "avg_logprob": -0.16376498937606812, "compression_ratio": 1.913265306122449, + "no_speech_prob": 0.000762462499551475}, {"id": 292, "seek": 201228, "start": 2018.76, + "end": 2024.12, "text": " result features some of the semantics search result features + and some of the queries features", "tokens": [50688, 1874, 4122, 512, 295, 264, + 4361, 45298, 3164, 1874, 4122, 293, 512, 295, 264, 24109, 4122, 50956], "temperature": + 0.0, "avg_logprob": -0.16376498937606812, "compression_ratio": 1.913265306122449, + "no_speech_prob": 0.000762462499551475}, {"id": 293, "seek": 201228, "start": 2024.84, + "end": 2030.2, "text": " so they were best but looking at the query features only + 
and these are the simplest ones to", "tokens": [50992, 370, 436, 645, 1151, 457, + 1237, 412, 264, 14581, 4122, 787, 293, 613, 366, 264, 22811, 2306, 281, 51260], + "temperature": 0.0, "avg_logprob": -0.16376498937606812, "compression_ratio": 1.913265306122449, + "no_speech_prob": 0.000762462499551475}, {"id": 294, "seek": 201228, "start": 2030.2, + "end": 2038.12, "text": " calculate they weren''t part of these models from this + performance aspect so you wouldn''t really", "tokens": [51260, 8873, 436, 4999, + 380, 644, 295, 613, 5245, 490, 341, 3389, 4171, 370, 291, 2759, 380, 534, 51656], + "temperature": 0.0, "avg_logprob": -0.16376498937606812, "compression_ratio": 1.913265306122449, + "no_speech_prob": 0.000762462499551475}, {"id": 295, "seek": 203812, "start": 2038.1999999999998, + "end": 2047.08, "text": " lose a lot if you only shows query features for your predictions + so that was another nice observation", "tokens": [50368, 3624, 257, 688, 498, 291, + 787, 3110, 14581, 4122, 337, 428, 21264, 370, 300, 390, 1071, 1481, 14816, 50812], + "temperature": 0.0, "avg_logprob": -0.13134191975449072, "compression_ratio": 1.6793478260869565, + "no_speech_prob": 0.0008912773337215185}, {"id": 296, "seek": 203812, "start": 2047.08, + "end": 2055.08, "text": " here that if you went with the highest performance in + terms of let''s say inference speed and also speed", "tokens": [50812, 510, 300, + 498, 291, 1437, 365, 264, 6343, 3389, 294, 2115, 295, 718, 311, 584, 38253, 3073, + 293, 611, 3073, 51212], "temperature": 0.0, "avg_logprob": -0.13134191975449072, + "compression_ratio": 1.6793478260869565, "no_speech_prob": 0.0008912773337215185}, + {"id": 297, "seek": 203812, "start": 2055.08, "end": 2063.16, "text": " of calculating + the features beforehand you don''t lose a lot of search result quality so again + you don''t", "tokens": [51212, 295, 28258, 264, 4122, 22893, 291, 500, 380, 3624, + 257, 688, 295, 3164, 1874, 3125, 370, 797, 291, 500, 380, 51616], 
"temperature": + 0.0, "avg_logprob": -0.13134191975449072, "compression_ratio": 1.6793478260869565, + "no_speech_prob": 0.0008912773337215185}, {"id": 298, "seek": 206316, "start": 2063.24, + "end": 2069.24, "text": " have to go with the most complex approach to get reasonable + results out of that which was I think", "tokens": [50368, 362, 281, 352, 365, 264, + 881, 3997, 3109, 281, 483, 10585, 3542, 484, 295, 300, 597, 390, 286, 519, 50668], + "temperature": 0.0, "avg_logprob": -0.21859038503546463, "compression_ratio": 1.5973451327433628, + "no_speech_prob": 0.007162398658692837}, {"id": 299, "seek": 206316, "start": 2069.24, + "end": 2074.92, "text": " the second most important finding at least from my perspective + because that gives us again", "tokens": [50668, 264, 1150, 881, 1021, 5006, 412, + 1935, 490, 452, 4585, 570, 300, 2709, 505, 797, 50952], "temperature": 0.0, "avg_logprob": + -0.21859038503546463, "compression_ratio": 1.5973451327433628, "no_speech_prob": + 0.007162398658692837}, {"id": 300, "seek": 206316, "start": 2075.64, "end": 2083.48, + "text": " the assurance that when putting this into production we don''t assume + let''s say hundreds of", "tokens": [50988, 264, 32189, 300, 562, 3372, 341, 666, + 4265, 321, 500, 380, 6552, 718, 311, 584, 6779, 295, 51380], "temperature": 0.0, + "avg_logprob": -0.21859038503546463, "compression_ratio": 1.5973451327433628, "no_speech_prob": + 0.007162398658692837}, {"id": 301, "seek": 206316, "start": 2083.48, "end": 2089.08, + "text": " milliseconds added to your query latency if you stick to the the simple + features.", "tokens": [51380, 34184, 3869, 281, 428, 14581, 27043, 498, 291, 2897, + 281, 264, 264, 2199, 4122, 13, 51660], "temperature": 0.0, "avg_logprob": -0.21859038503546463, + "compression_ratio": 1.5973451327433628, "no_speech_prob": 0.007162398658692837}, + {"id": 302, "seek": 208908, "start": 2089.16, "end": 2093.72, "text": " May it means + that there''s like room for growth with this 
technique right we''re not", "tokens": + [50368, 1891, 309, 1355, 300, 456, 311, 411, 1808, 337, 4599, 365, 341, 6532, 558, + 321, 434, 406, 50596], "temperature": 0.0, "avg_logprob": -0.18722731913995305, + "compression_ratio": 1.757462686567164, "no_speech_prob": 0.06885755062103271}, + {"id": 303, "seek": 208908, "start": 2093.72, "end": 2099.88, "text": " maxing out + this technique just to get started we can start out and then as we get more sophisticated", + "tokens": [50596, 11469, 278, 484, 341, 6532, 445, 281, 483, 1409, 321, 393, 722, + 484, 293, 550, 382, 321, 483, 544, 16950, 50904], "temperature": 0.0, "avg_logprob": + -0.18722731913995305, "compression_ratio": 1.757462686567164, "no_speech_prob": + 0.06885755062103271}, {"id": 304, "seek": 208908, "start": 2099.88, "end": 2107.56, + "text": " we have room we have milliseconds to burn to do other cool interesting + things ask an lllm to", "tokens": [50904, 321, 362, 1808, 321, 362, 34184, 281, + 5064, 281, 360, 661, 1627, 1880, 721, 1029, 364, 287, 285, 76, 281, 51288], "temperature": + 0.0, "avg_logprob": -0.18722731913995305, "compression_ratio": 1.757462686567164, + "no_speech_prob": 0.06885755062103271}, {"id": 305, "seek": 208908, "start": 2107.56, + "end": 2112.2799999999997, "text": " characterize the query or something like that + right we''ve got room for bro but I will also like", "tokens": [51288, 38463, 264, + 14581, 420, 746, 411, 300, 558, 321, 600, 658, 1808, 337, 2006, 457, 286, 486, 611, + 411, 51524], "temperature": 0.0, "avg_logprob": -0.18722731913995305, "compression_ratio": + 1.757462686567164, "no_speech_prob": 0.06885755062103271}, {"id": 306, "seek": 208908, + "start": 2112.2799999999997, "end": 2118.04, "text": " this lesson and I think it + kind of resonates with what I''ve seen in doing a mal previously is that", "tokens": + [51524, 341, 6898, 293, 286, 519, 309, 733, 295, 41051, 365, 437, 286, 600, 1612, + 294, 884, 257, 2806, 8046, 307, 300, 51812], "temperature": 0.0, 
"avg_logprob": + -0.18722731913995305, "compression_ratio": 1.757462686567164, "no_speech_prob": + 0.06885755062103271}, {"id": 307, "seek": 211804, "start": 2118.68, "end": 2125.4, + "text": " start with like simpler solutions and try to kind of maximize ROI you + know by upgrading to", "tokens": [50396, 722, 365, 411, 18587, 6547, 293, 853, 281, + 733, 295, 19874, 49808, 291, 458, 538, 36249, 281, 50732], "temperature": 0.0, "avg_logprob": + -0.08514618343777126, "compression_ratio": 1.641025641025641, "no_speech_prob": + 0.008666086941957474}, {"id": 308, "seek": 211804, "start": 2125.4, "end": 2130.6, + "text": " a more complex one and you need to set some thresholds because like as + you said Daniel you know just", "tokens": [50732, 257, 544, 3997, 472, 293, 291, + 643, 281, 992, 512, 14678, 82, 570, 411, 382, 291, 848, 8033, 291, 458, 445, 50992], + "temperature": 0.0, "avg_logprob": -0.08514618343777126, "compression_ratio": 1.641025641025641, + "no_speech_prob": 0.008666086941957474}, {"id": 309, "seek": 211804, "start": 2130.6, + "end": 2137.08, "text": " you know adding 0.03 won''t get it right it doesn''t it''s + not worth because also when you bring", "tokens": [50992, 291, 458, 5127, 1958, + 13, 11592, 1582, 380, 483, 309, 558, 309, 1177, 380, 309, 311, 406, 3163, 570, 611, + 562, 291, 1565, 51316], "temperature": 0.0, "avg_logprob": -0.08514618343777126, + "compression_ratio": 1.641025641025641, "no_speech_prob": 0.008666086941957474}, + {"id": 310, "seek": 211804, "start": 2137.08, "end": 2142.92, "text": " in neural + search it means that you need to build that parallel index of things right like + you need", "tokens": [51316, 294, 18161, 3164, 309, 1355, 300, 291, 643, 281, 1322, + 300, 8952, 8186, 295, 721, 558, 411, 291, 643, 51608], "temperature": 0.0, "avg_logprob": + -0.08514618343777126, "compression_ratio": 1.641025641025641, "no_speech_prob": + 0.008666086941957474}, {"id": 311, "seek": 214292, "start": 2143.32, "end": 2151.48, + "text": " 
you need to compute and maybe GPUs as well and someone will need to pay + for it and I guess", "tokens": [50384, 291, 643, 281, 14722, 293, 1310, 18407, 82, + 382, 731, 293, 1580, 486, 643, 281, 1689, 337, 309, 293, 286, 2041, 50792], "temperature": + 0.0, "avg_logprob": -0.3127850426567925, "compression_ratio": 1.5217391304347827, + "no_speech_prob": 0.007821322418749332}, {"id": 312, "seek": 214292, "start": 2151.48, + "end": 2157.2400000000002, "text": " the passing Daniel we''re doing this project + I kept asking Daniel I''m like why is re-indexing", "tokens": [50792, 264, 8437, + 8033, 321, 434, 884, 341, 1716, 286, 4305, 3365, 8033, 286, 478, 411, 983, 307, + 319, 12, 471, 3121, 278, 51080], "temperature": 0.0, "avg_logprob": -0.3127850426567925, + "compression_ratio": 1.5217391304347827, "no_speech_prob": 0.007821322418749332}, + {"id": 313, "seek": 214292, "start": 2157.2400000000002, "end": 2164.44, "text": + " with embedding so slow like where''s my turbo button like really why is this still + a problem it''s", "tokens": [51080, 365, 12240, 3584, 370, 2964, 411, 689, 311, + 452, 20902, 2960, 411, 534, 983, 307, 341, 920, 257, 1154, 309, 311, 51440], "temperature": + 0.0, "avg_logprob": -0.3127850426567925, "compression_ratio": 1.5217391304347827, + "no_speech_prob": 0.007821322418749332}, {"id": 314, "seek": 216444, "start": 2164.44, + "end": 2173.16, "text": " 2024 almost 2025 why does embeddings take a long time + yeah I remain a little confused why it''s so like", "tokens": [50364, 45237, 1920, + 39209, 983, 775, 12240, 29432, 747, 257, 938, 565, 1338, 286, 6222, 257, 707, 9019, + 983, 309, 311, 370, 411, 50800], "temperature": 0.0, "avg_logprob": -0.17782463965477882, + "compression_ratio": 1.5336787564766838, "no_speech_prob": 0.005210345145314932}, + {"id": 315, "seek": 216444, "start": 2174.04, "end": 2181.32, "text": " don''t we + just turn a knob and make a GPU go faster and then re-index with embeddings is just + the same", "tokens": [50844, 
500, 380, 321, 445, 1261, 257, 26759, 293, 652, 257, + 18407, 352, 4663, 293, 550, 319, 12, 471, 3121, 365, 12240, 29432, 307, 445, 264, + 912, 51208], "temperature": 0.0, "avg_logprob": -0.17782463965477882, "compression_ratio": + 1.5336787564766838, "no_speech_prob": 0.005210345145314932}, {"id": 316, "seek": + 216444, "start": 2181.32, "end": 2190.2000000000003, "text": " speed as re-indexing + with keywords yeah but but also what Daniel says now but also like the", "tokens": + [51208, 3073, 382, 319, 12, 471, 3121, 278, 365, 21009, 1338, 457, 457, 611, 437, + 8033, 1619, 586, 457, 611, 411, 264, 51652], "temperature": 0.0, "avg_logprob": + -0.17782463965477882, "compression_ratio": 1.5336787564766838, "no_speech_prob": + 0.005210345145314932}, {"id": 317, "seek": 219020, "start": 2190.2, "end": 2194.52, + "text": " fascinating part like one thought crossed my mind as you explained it + Daniel is that in some", "tokens": [50364, 10343, 644, 411, 472, 1194, 14622, 452, + 1575, 382, 291, 8825, 309, 8033, 307, 300, 294, 512, 50580], "temperature": 0.0, + "avg_logprob": -0.09886228519937267, "compression_ratio": 1.6986899563318778, "no_speech_prob": + 0.0012310482561588287}, {"id": 318, "seek": 219020, "start": 2194.52, "end": 2200.68, + "text": " sense you''ve built some sort of like a reasoning engine if I may call + it that way maybe it''s not", "tokens": [50580, 2020, 291, 600, 3094, 512, 1333, + 295, 411, 257, 21577, 2848, 498, 286, 815, 818, 309, 300, 636, 1310, 309, 311, 406, + 50888], "temperature": 0.0, "avg_logprob": -0.09886228519937267, "compression_ratio": + 1.6986899563318778, "no_speech_prob": 0.0012310482561588287}, {"id": 319, "seek": + 219020, "start": 2200.68, "end": 2206.3599999999997, "text": " fully you know reasoning + like I don''t know LLM start to do it that way or something but it''s like", "tokens": + [50888, 4498, 291, 458, 21577, 411, 286, 500, 380, 458, 441, 43, 44, 722, 281, 360, + 309, 300, 636, 420, 746, 457, 309, 311, 411, 51172], 
"temperature": 0.0, "avg_logprob": + -0.09886228519937267, "compression_ratio": 1.6986899563318778, "no_speech_prob": + 0.0012310482561588287}, {"id": 320, "seek": 219020, "start": 2207.08, "end": 2212.8399999999997, + "text": " the engine that looks at the query and examines its features and makes + some some conclusions and then", "tokens": [51208, 264, 2848, 300, 1542, 412, 264, + 14581, 293, 1139, 1652, 1080, 4122, 293, 1669, 512, 512, 22865, 293, 550, 51496], + "temperature": 0.0, "avg_logprob": -0.09886228519937267, "compression_ratio": 1.6986899563318778, + "no_speech_prob": 0.0012310482561588287}, {"id": 321, "seek": 221284, "start": 2212.84, + "end": 2219.96, "text": " it looks also at the results it''s not like you just you + just you understood the query you set it", "tokens": [50364, 309, 1542, 611, 412, + 264, 3542, 309, 311, 406, 411, 291, 445, 291, 445, 291, 7320, 264, 14581, 291, 992, + 309, 50720], "temperature": 0.0, "avg_logprob": -0.07936587540999702, "compression_ratio": + 1.7713004484304933, "no_speech_prob": 0.001717950333841145}, {"id": 322, "seek": + 221284, "start": 2219.96, "end": 2225.32, "text": " over to the retriever side and + then you hope for the best that there will be best results right but in", "tokens": + [50720, 670, 281, 264, 19817, 331, 1252, 293, 550, 291, 1454, 337, 264, 1151, 300, + 456, 486, 312, 1151, 3542, 558, 457, 294, 50988], "temperature": 0.0, "avg_logprob": + -0.07936587540999702, "compression_ratio": 1.7713004484304933, "no_speech_prob": + 0.001717950333841145}, {"id": 323, "seek": 221284, "start": 2225.32, "end": 2231.7200000000003, + "text": " some sense you you basically do this dynamic sort of reasoning on top + of everything but but the", "tokens": [50988, 512, 2020, 291, 291, 1936, 360, 341, + 8546, 1333, 295, 21577, 322, 1192, 295, 1203, 457, 457, 264, 51308], "temperature": + 0.0, "avg_logprob": -0.07936587540999702, "compression_ratio": 1.7713004484304933, + "no_speech_prob": 0.001717950333841145}, 
{"id": 324, "seek": 221284, "start": 2231.7200000000003, + "end": 2238.44, "text": " lesson there you said and correct me if I''m wrong is + that just by looking at the query features you", "tokens": [51308, 6898, 456, 291, + 848, 293, 3006, 385, 498, 286, 478, 2085, 307, 300, 445, 538, 1237, 412, 264, 14581, + 4122, 291, 51644], "temperature": 0.0, "avg_logprob": -0.07936587540999702, "compression_ratio": + 1.7713004484304933, "no_speech_prob": 0.001717950333841145}, {"id": 325, "seek": + 223844, "start": 2238.44, "end": 2245.56, "text": " could already achieve good results + you don''t need to look at the result features yes yes but", "tokens": [50364, 727, + 1217, 4584, 665, 3542, 291, 500, 380, 643, 281, 574, 412, 264, 1874, 4122, 2086, + 2086, 457, 50720], "temperature": 0.0, "avg_logprob": -0.09210114893705948, "compression_ratio": + 1.7252252252252251, "no_speech_prob": 0.001563511323183775}, {"id": 326, "seek": + 223844, "start": 2245.56, "end": 2251.8, "text": " wouldn''t it be nice to look + at those yeah I do love the idea of looking at both sides right we", "tokens": [50720, + 2759, 380, 309, 312, 1481, 281, 574, 412, 729, 1338, 286, 360, 959, 264, 1558, 295, + 1237, 412, 1293, 4881, 558, 321, 51032], "temperature": 0.0, "avg_logprob": -0.09210114893705948, + "compression_ratio": 1.7252252252252251, "no_speech_prob": 0.001563511323183775}, + {"id": 327, "seek": 223844, "start": 2251.8, "end": 2258.28, "text": " tend to focus + on queries because I think that''s the viewpoint of our industry we are very query", + "tokens": [51032, 3928, 281, 1879, 322, 24109, 570, 286, 519, 300, 311, 264, 35248, + 295, 527, 3518, 321, 366, 588, 14581, 51356], "temperature": 0.0, "avg_logprob": + -0.09210114893705948, "compression_ratio": 1.7252252252252251, "no_speech_prob": + 0.001563511323183775}, {"id": 328, "seek": 223844, "start": 2258.28, "end": 2263.64, + "text": " centric in the search world it''s all about the query and what can we + get out of the query we 
really", "tokens": [51356, 1489, 1341, 294, 264, 3164, 1002, + 309, 311, 439, 466, 264, 14581, 293, 437, 393, 321, 483, 484, 295, 264, 14581, 321, + 534, 51624], "temperature": 0.0, "avg_logprob": -0.09210114893705948, "compression_ratio": + 1.7252252252252251, "no_speech_prob": 0.001563511323183775}, {"id": 329, "seek": + 226364, "start": 2263.72, "end": 2270.92, "text": " don''t look at the results that + much except to say are they good or bad and we''re not particularly good", "tokens": + [50368, 500, 380, 574, 412, 264, 3542, 300, 709, 3993, 281, 584, 366, 436, 665, + 420, 1578, 293, 321, 434, 406, 4098, 665, 50728], "temperature": 0.0, "avg_logprob": + -0.08737870057423909, "compression_ratio": 1.6353591160220995, "no_speech_prob": + 0.003289616433903575}, {"id": 330, "seek": 226364, "start": 2270.92, "end": 2280.6, + "text": " about factoring in and what did the user do back into our algorithm and + yeah I love that this is a", "tokens": [50728, 466, 1186, 3662, 294, 293, 437, 630, + 264, 4195, 360, 646, 666, 527, 9284, 293, 1338, 286, 959, 300, 341, 307, 257, 51212], + "temperature": 0.0, "avg_logprob": -0.08737870057423909, "compression_ratio": 1.6353591160220995, + "no_speech_prob": 0.003289616433903575}, {"id": 331, "seek": 226364, "start": 2280.6, + "end": 2286.7599999999998, "text": " little the dynamic thing that we''re doing + here I think it''s a pointer to bring in more dynamic", "tokens": [51212, 707, 264, + 8546, 551, 300, 321, 434, 884, 510, 286, 519, 309, 311, 257, 23918, 281, 1565, 294, + 544, 8546, 51520], "temperature": 0.0, "avg_logprob": -0.08737870057423909, "compression_ratio": + 1.6353591160220995, "no_speech_prob": 0.003289616433903575}, {"id": 332, "seek": + 228676, "start": 2287.6400000000003, "end": 2293.96, "text": " aspects to our algorithms + where they actually can start evolving or changing or being very specific", "tokens": + [50408, 7270, 281, 527, 14642, 689, 436, 767, 393, 722, 21085, 420, 4473, 420, 885, + 588, 2685, 
50724], "temperature": 0.0, "avg_logprob": -0.09651506142538102, "compression_ratio": + 1.6055555555555556, "no_speech_prob": 0.007187430281192064}, {"id": 333, "seek": + 228676, "start": 2293.96, "end": 2303.8, "text": " to very specific query types + use cases time of year right and today that''s very difficult to do", "tokens": + [50724, 281, 588, 2685, 14581, 3467, 764, 3331, 565, 295, 1064, 558, 293, 965, 300, + 311, 588, 2252, 281, 360, 51216], "temperature": 0.0, "avg_logprob": -0.09651506142538102, + "compression_ratio": 1.6055555555555556, "no_speech_prob": 0.007187430281192064}, + {"id": 334, "seek": 228676, "start": 2303.8, "end": 2313.4, "text": " only the most + sophisticated to teams have sets of algorithms yeah but I also feel like I like", + "tokens": [51216, 787, 264, 881, 16950, 281, 5491, 362, 6352, 295, 14642, 1338, + 457, 286, 611, 841, 411, 286, 411, 51696], "temperature": 0.0, "avg_logprob": -0.09651506142538102, + "compression_ratio": 1.6055555555555556, "no_speech_prob": 0.007187430281192064}, + {"id": 335, "seek": 231340, "start": 2313.56, "end": 2319.0, "text": " what you + said Eric like looking at results you know and reasoning about results and also + what", "tokens": [50372, 437, 291, 848, 9336, 411, 1237, 412, 3542, 291, 458, 293, + 21577, 466, 3542, 293, 611, 437, 50644], "temperature": 0.0, "avg_logprob": -0.0943937654848452, + "compression_ratio": 1.892156862745098, "no_speech_prob": 0.0017330568516626954}, + {"id": 336, "seek": 231340, "start": 2319.0, "end": 2325.1600000000003, "text": + " you understood about the query might lead to much better final representation + of what you show to", "tokens": [50644, 291, 7320, 466, 264, 14581, 1062, 1477, + 281, 709, 1101, 2572, 10290, 295, 437, 291, 855, 281, 50952], "temperature": 0.0, + "avg_logprob": -0.0943937654848452, "compression_ratio": 1.892156862745098, "no_speech_prob": + 0.0017330568516626954}, {"id": 337, "seek": 231340, "start": 2325.1600000000003, + "end": 2332.12, 
"text": " the user right because there are so many there are so + many factors also beyond the query and results", "tokens": [50952, 264, 4195, 558, + 570, 456, 366, 370, 867, 456, 366, 370, 867, 6771, 611, 4399, 264, 14581, 293, 3542, + 51300], "temperature": 0.0, "avg_logprob": -0.0943937654848452, "compression_ratio": + 1.892156862745098, "no_speech_prob": 0.0017330568516626954}, {"id": 338, "seek": + 231340, "start": 2332.12, "end": 2338.12, "text": " right like as you said season + or you know you observed some patterns with the user the recent", "tokens": [51300, + 558, 411, 382, 291, 848, 3196, 420, 291, 458, 291, 13095, 512, 8294, 365, 264, 4195, + 264, 5162, 51600], "temperature": 0.0, "avg_logprob": -0.0943937654848452, "compression_ratio": + 1.892156862745098, "no_speech_prob": 0.0017330568516626954}, {"id": 339, "seek": + 233812, "start": 2338.12, "end": 2345.16, "text": " purchase history and so on and + so forth yeah I mean it''s very fascinating and also like if I", "tokens": [50364, + 8110, 2503, 293, 370, 322, 293, 370, 5220, 1338, 286, 914, 309, 311, 588, 10343, + 293, 611, 411, 498, 286, 50716], "temperature": 0.0, "avg_logprob": -0.15321807239366614, + "compression_ratio": 1.6491228070175439, "no_speech_prob": 0.008580013178288937}, + {"id": 340, "seek": 233812, "start": 2345.16, "end": 2350.8399999999997, "text": + " continue to draw this analogy with lm world you know when you ask lm to to think + through what it", "tokens": [50716, 2354, 281, 2642, 341, 21663, 365, 287, 76, 1002, + 291, 458, 562, 291, 1029, 287, 76, 281, 281, 519, 807, 437, 309, 51000], "temperature": + 0.0, "avg_logprob": -0.15321807239366614, "compression_ratio": 1.6491228070175439, + "no_speech_prob": 0.008580013178288937}, {"id": 341, "seek": 233812, "start": 2350.8399999999997, + "end": 2358.2799999999997, "text": " has done it may correct itself right by just + looking at what it has produced because lm''s are", "tokens": [51000, 575, 1096, + 309, 815, 3006, 2564, 558, 
538, 445, 1237, 412, 437, 309, 575, 7126, 570, 287, 76, + 311, 366, 51372], "temperature": 0.0, "avg_logprob": -0.15321807239366614, "compression_ratio": + 1.6491228070175439, "no_speech_prob": 0.008580013178288937}, {"id": 342, "seek": + 233812, "start": 2358.2799999999997, "end": 2365.96, "text": " as someone said calculators + for words so if you if you give it itself its own output and ask", "tokens": [51372, + 382, 1580, 848, 4322, 3391, 337, 2283, 370, 498, 291, 498, 291, 976, 309, 2564, + 1080, 1065, 5598, 293, 1029, 51756], "temperature": 0.0, "avg_logprob": -0.15321807239366614, + "compression_ratio": 1.6491228070175439, "no_speech_prob": 0.008580013178288937}, + {"id": 343, "seek": 236596, "start": 2366.52, "end": 2375.0, "text": " yeah exactly + yeah like I can''t wait to write a search algorithm that understands what they did", + "tokens": [50392, 1338, 2293, 1338, 411, 286, 393, 380, 1699, 281, 2464, 257, 3164, + 9284, 300, 15146, 437, 436, 630, 50816], "temperature": 0.0, "avg_logprob": -0.1078964430710365, + "compression_ratio": 1.7853881278538812, "no_speech_prob": 0.026512479409575462}, + {"id": 344, "seek": 236596, "start": 2375.0, "end": 2380.6, "text": " the last time + the user didn''t like the results and so when you get a similar query for the same + user", "tokens": [50816, 264, 1036, 565, 264, 4195, 994, 380, 411, 264, 3542, 293, + 370, 562, 291, 483, 257, 2531, 14581, 337, 264, 912, 4195, 51096], "temperature": + 0.0, "avg_logprob": -0.1078964430710365, "compression_ratio": 1.7853881278538812, + "no_speech_prob": 0.026512479409575462}, {"id": 345, "seek": 236596, "start": 2380.6, + "end": 2386.44, "text": " do something new right try something new because whatever + you were doing before the user didn''t like", "tokens": [51096, 360, 746, 777, 558, + 853, 746, 777, 570, 2035, 291, 645, 884, 949, 264, 4195, 994, 380, 411, 51388], + "temperature": 0.0, "avg_logprob": -0.1078964430710365, "compression_ratio": 1.7853881278538812, + 
"no_speech_prob": 0.026512479409575462}, {"id": 346, "seek": 236596, "start": 2387.2400000000002, + "end": 2393.64, "text": " yeah joke about if the user hates what you''re giving + them you might as well just return random", "tokens": [51428, 1338, 7647, 466, 498, + 264, 4195, 23000, 437, 291, 434, 2902, 552, 291, 1062, 382, 731, 445, 2736, 4974, + 51748], "temperature": 0.0, "avg_logprob": -0.1078964430710365, "compression_ratio": + 1.7853881278538812, "no_speech_prob": 0.026512479409575462}, {"id": 347, "seek": + 239364, "start": 2393.64, "end": 2399.64, "text": " docs because that''ll be better + than whatever you''re doing right now yeah yeah at least you you have a", "tokens": + [50364, 45623, 570, 300, 603, 312, 1101, 813, 2035, 291, 434, 884, 558, 586, 1338, + 1338, 412, 1935, 291, 291, 362, 257, 50664], "temperature": 0.0, "avg_logprob": + -0.1216478895867008, "compression_ratio": 1.8246445497630333, "no_speech_prob": + 0.010809781961143017}, {"id": 348, "seek": 239364, "start": 2399.64, "end": 2407.3199999999997, + "text": " chance there right with the random so what question I sort of have though + in what we described how is", "tokens": [50664, 2931, 456, 558, 365, 264, 4974, + 370, 437, 1168, 286, 1333, 295, 362, 1673, 294, 437, 321, 7619, 577, 307, 51048], + "temperature": 0.0, "avg_logprob": -0.1216478895867008, "compression_ratio": 1.8246445497630333, + "no_speech_prob": 0.010809781961143017}, {"id": 349, "seek": 239364, "start": 2407.3199999999997, + "end": 2414.12, "text": " it different than learning to rank other than learning + to rank is about ranking one list", "tokens": [51048, 309, 819, 813, 2539, 281, + 6181, 661, 813, 2539, 281, 6181, 307, 466, 17833, 472, 1329, 51388], "temperature": + 0.0, "avg_logprob": -0.1216478895867008, "compression_ratio": 1.8246445497630333, + "no_speech_prob": 0.010809781961143017}, {"id": 350, "seek": 239364, "start": 2415.48, + "end": 2422.8399999999997, "text": " and here we''re ranking two lists is do I just 
+ conceptually have the role of learning to rank", "tokens": [51456, 293, 510, 321, + 434, 17833, 732, 14511, 307, 360, 286, 445, 3410, 671, 362, 264, 3090, 295, 2539, + 281, 6181, 51824], "temperature": 0.0, "avg_logprob": -0.1216478895867008, "compression_ratio": + 1.8246445497630333, "no_speech_prob": 0.010809781961143017}, {"id": 351, "seek": + 242284, "start": 2422.84, "end": 2430.04, "text": " wrong between what learning + to rank is and how the dynamic hybrid optimizer was working", "tokens": [50364, + 2085, 1296, 437, 2539, 281, 6181, 307, 293, 577, 264, 8546, 13051, 5028, 6545, 390, + 1364, 50724], "temperature": 0.0, "avg_logprob": -0.08268000474616663, "compression_ratio": + 1.6171428571428572, "no_speech_prob": 0.0007803972694091499}, {"id": 352, "seek": + 242284, "start": 2432.28, "end": 2439.56, "text": " so I mean we are not re-ranking + results right so that''s what typically learning to rank does but", "tokens": [50836, + 370, 286, 914, 321, 366, 406, 319, 12, 20479, 278, 3542, 558, 370, 300, 311, 437, + 5850, 2539, 281, 6181, 775, 457, 51200], "temperature": 0.0, "avg_logprob": -0.08268000474616663, + "compression_ratio": 1.6171428571428572, "no_speech_prob": 0.0007803972694091499}, + {"id": 353, "seek": 242284, "start": 2439.56, "end": 2450.28, "text": " what we + are kind of doing is we are learning when when to let''s say increase the weight + on keyword", "tokens": [51200, 437, 321, 366, 733, 295, 884, 307, 321, 366, 2539, + 562, 562, 281, 718, 311, 584, 3488, 264, 3364, 322, 20428, 51736], "temperature": + 0.0, "avg_logprob": -0.08268000474616663, "compression_ratio": 1.6171428571428572, + "no_speech_prob": 0.0007803972694091499}, {"id": 354, "seek": 245028, "start": 2450.28, + "end": 2457.4, "text": " search results or on your search results right so it''s + kind of a learn to", "tokens": [50364, 3164, 3542, 420, 322, 428, 3164, 3542, 558, + 370, 309, 311, 733, 295, 257, 1466, 281, 50720], "temperature": 0.0, "avg_logprob": + 
-0.19344098944413035, "compression_ratio": 1.7352941176470589, "no_speech_prob": + 0.0023917951621115208}, {"id": 355, "seek": 245028, "start": 2458.6800000000003, + "end": 2467.0800000000004, "text": " lend learn to blend learn to search the new + technology or we just done with the learning to", "tokens": [50784, 21774, 1466, + 281, 10628, 1466, 281, 3164, 264, 777, 2899, 420, 321, 445, 1096, 365, 264, 2539, + 281, 51204], "temperature": 0.0, "avg_logprob": -0.19344098944413035, "compression_ratio": + 1.7352941176470589, "no_speech_prob": 0.0023917951621115208}, {"id": 356, "seek": + 245028, "start": 2469.0, "end": 2474.52, "text": " nomenclature right and we like + optimizer better than learning to blend", "tokens": [51300, 297, 4726, 3474, 1503, + 558, 293, 321, 411, 5028, 6545, 1101, 813, 2539, 281, 10628, 51576], "temperature": + 0.0, "avg_logprob": -0.19344098944413035, "compression_ratio": 1.7352941176470589, + "no_speech_prob": 0.0023917951621115208}, {"id": 357, "seek": 247452, "start": 2475.16, + "end": 2484.2, "text": " maybe yeah so I''m not the one who''s most creative in + coming up with clever names so maybe it''s", "tokens": [50396, 1310, 1338, 370, + 286, 478, 406, 264, 472, 567, 311, 881, 5880, 294, 1348, 493, 365, 13494, 5288, + 370, 1310, 309, 311, 50848], "temperature": 0.0, "avg_logprob": -0.1449633631212958, + "compression_ratio": 1.7416267942583732, "no_speech_prob": 0.005703783128410578}, + {"id": 358, "seek": 247452, "start": 2484.2, "end": 2492.12, "text": " maybe it''s + time for not learn to but blah blah blah optimizer and that''s kind of how we", + "tokens": [50848, 1310, 309, 311, 565, 337, 406, 1466, 281, 457, 12288, 12288, 12288, + 5028, 6545, 293, 300, 311, 733, 295, 577, 321, 51244], "temperature": 0.0, "avg_logprob": + -0.1449633631212958, "compression_ratio": 1.7416267942583732, "no_speech_prob": + 0.005703783128410578}, {"id": 359, "seek": 247452, "start": 2492.12, "end": 2496.7599999999998, + "text": " ended up with the 
hybrid search optimizer right now but I wouldn''t really + have a good", "tokens": [51244, 4590, 493, 365, 264, 13051, 3164, 5028, 6545, 558, + 586, 457, 286, 2759, 380, 534, 362, 257, 665, 51476], "temperature": 0.0, "avg_logprob": + -0.1449633631212958, "compression_ratio": 1.7416267942583732, "no_speech_prob": + 0.005703783128410578}, {"id": 360, "seek": 247452, "start": 2497.64, "end": 2503.96, + "text": " let''s say argument against calling it learning to optimize hybrid search + or something like that", "tokens": [51520, 718, 311, 584, 6770, 1970, 5141, 309, + 2539, 281, 19719, 13051, 3164, 420, 746, 411, 300, 51836], "temperature": 0.0, "avg_logprob": + -0.1449633631212958, "compression_ratio": 1.7416267942583732, "no_speech_prob": + 0.005703783128410578}, {"id": 361, "seek": 250396, "start": 2504.36, "end": 2512.2, + "text": " because that''s kind of what the dynamic approach does right as we gather + more data more clicks more", "tokens": [50384, 570, 300, 311, 733, 295, 437, 264, + 8546, 3109, 775, 558, 382, 321, 5448, 544, 1412, 544, 18521, 544, 50776], "temperature": + 0.0, "avg_logprob": -0.10012498768893155, "compression_ratio": 1.8483412322274881, + "no_speech_prob": 0.0012311162427067757}, {"id": 362, "seek": 250396, "start": 2512.2, + "end": 2518.04, "text": " of that those go into the features right we even use the + the language of learning to rank right", "tokens": [50776, 295, 300, 729, 352, 666, + 264, 4122, 558, 321, 754, 764, 264, 264, 2856, 295, 2539, 281, 6181, 558, 51068], + "temperature": 0.0, "avg_logprob": -0.10012498768893155, "compression_ratio": 1.8483412322274881, + "no_speech_prob": 0.0012311162427067757}, {"id": 363, "seek": 250396, "start": 2518.04, + "end": 2524.84, "text": " feature engineering right we use that language and we''re + building a model and you even mention", "tokens": [51068, 4111, 7043, 558, 321, + 764, 300, 2856, 293, 321, 434, 2390, 257, 2316, 293, 291, 754, 2152, 51408], "temperature": + 0.0, "avg_logprob": 
-0.10012498768893155, "compression_ratio": 1.8483412322274881, + "no_speech_prob": 0.0012311162427067757}, {"id": 364, "seek": 250396, "start": 2525.4, + "end": 2532.04, "text": " right linear models in a forest and you know those are + all the the words that I think of as oh it''s", "tokens": [51436, 558, 8213, 5245, + 294, 257, 6719, 293, 291, 458, 729, 366, 439, 264, 264, 2283, 300, 286, 519, 295, + 382, 1954, 309, 311, 51768], "temperature": 0.0, "avg_logprob": -0.10012498768893155, + "compression_ratio": 1.8483412322274881, "no_speech_prob": 0.0012311162427067757}, + {"id": 365, "seek": 253204, "start": 2532.04, "end": 2537.64, "text": " learning + to rank so interesting I think that''s just it''s interesting to see learning to + rate maybe", "tokens": [50364, 2539, 281, 6181, 370, 1880, 286, 519, 300, 311, 445, + 309, 311, 1880, 281, 536, 2539, 281, 3314, 1310, 50644], "temperature": 0.2, "avg_logprob": + -0.1863905035931131, "compression_ratio": 1.7557603686635945, "no_speech_prob": + 0.0007230683113448322}, {"id": 366, "seek": 253204, "start": 2537.64, "end": 2544.92, + "text": " come back in a new way yeah yeah I''m learning to rank still something + you can apply on top of the", "tokens": [50644, 808, 646, 294, 257, 777, 636, 1338, + 1338, 286, 478, 2539, 281, 6181, 920, 746, 291, 393, 3079, 322, 1192, 295, 264, + 51008], "temperature": 0.2, "avg_logprob": -0.1863905035931131, "compression_ratio": + 1.7557603686635945, "no_speech_prob": 0.0007230683113448322}, {"id": 367, "seek": + 253204, "start": 2544.92, "end": 2553.56, "text": " hybrid search optimizer right + so it''s not like we have any kind of substitute here so that''s kind", "tokens": + [51008, 13051, 3164, 5028, 6545, 558, 370, 309, 311, 406, 411, 321, 362, 604, 733, + 295, 15802, 510, 370, 300, 311, 733, 51440], "temperature": 0.2, "avg_logprob": + -0.1863905035931131, "compression_ratio": 1.7557603686635945, "no_speech_prob": + 0.0007230683113448322}, {"id": 368, "seek": 253204, "start": 
2553.56, "end": 2561.08, + "text": " of still I think a very valuable tool in the mix and that''s just now + one way to really", "tokens": [51440, 295, 920, 286, 519, 257, 588, 8263, 2290, + 294, 264, 2890, 293, 300, 311, 445, 586, 472, 636, 281, 534, 51816], "temperature": + 0.2, "avg_logprob": -0.1863905035931131, "compression_ratio": 1.7557603686635945, + "no_speech_prob": 0.0007230683113448322}, {"id": 369, "seek": 256108, "start": 2561.7999999999997, + "end": 2567.88, "text": " figure out what''s the best way of getting to reasonable + hybrid search results yeah but I was", "tokens": [50400, 2573, 484, 437, 311, 264, + 1151, 636, 295, 1242, 281, 10585, 13051, 3164, 3542, 1338, 457, 286, 390, 50704], + "temperature": 0.0, "avg_logprob": -0.1099062442779541, "compression_ratio": 1.722488038277512, + "no_speech_prob": 0.0013559159124270082}, {"id": 370, "seek": 256108, "start": 2567.88, + "end": 2571.0, "text": " recently also thinking about this I wonder what''s your + hunt on that but", "tokens": [50704, 3938, 611, 1953, 466, 341, 286, 2441, 437, + 311, 428, 12454, 322, 300, 457, 50860], "temperature": 0.0, "avg_logprob": -0.1099062442779541, + "compression_ratio": 1.722488038277512, "no_speech_prob": 0.0013559159124270082}, + {"id": 371, "seek": 256108, "start": 2573.16, "end": 2579.88, "text": " learning + to rank sort of depends on the training data and you usually collected from the + past you", "tokens": [50968, 2539, 281, 6181, 1333, 295, 5946, 322, 264, 3097, 1412, + 293, 291, 2673, 11087, 490, 264, 1791, 291, 51304], "temperature": 0.0, "avg_logprob": + -0.1099062442779541, "compression_ratio": 1.722488038277512, "no_speech_prob": 0.0013559159124270082}, + {"id": 372, "seek": 256108, "start": 2579.88, "end": 2586.2799999999997, "text": + " don''t collect it from the future right and so as you move into the future and + patterns change you", "tokens": [51304, 500, 380, 2500, 309, 490, 264, 2027, 558, + 293, 370, 382, 291, 1286, 666, 264, 2027, 293, 8294, 
1319, 291, 51624], "temperature": + 0.0, "avg_logprob": -0.1099062442779541, "compression_ratio": 1.722488038277512, + "no_speech_prob": 0.0013559159124270082}, {"id": 373, "seek": 258628, "start": 2586.28, + "end": 2593.5600000000004, "text": " carry over that past weight that can actually + go against the intent of your reasoning engine and", "tokens": [50364, 3985, 670, + 300, 1791, 3364, 300, 393, 767, 352, 1970, 264, 8446, 295, 428, 21577, 2848, 293, + 50728], "temperature": 0.0, "avg_logprob": -0.08094951718352562, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.007452432997524738}, {"id": 374, "seek": + 258628, "start": 2593.5600000000004, "end": 2600.36, "text": " and that''s where + I think that a lot of work needs to go in all of these directions but as you", "tokens": + [50728, 293, 300, 311, 689, 286, 519, 300, 257, 688, 295, 589, 2203, 281, 352, 294, + 439, 295, 613, 11095, 457, 382, 291, 51068], "temperature": 0.0, "avg_logprob": + -0.08094951718352562, "compression_ratio": 1.7272727272727273, "no_speech_prob": + 0.007452432997524738}, {"id": 375, "seek": 258628, "start": 2600.36, "end": 2606.2000000000003, + "text": " optimize your retrieving and your reasoning engine you know your query + understanding maybe you", "tokens": [51068, 19719, 428, 19817, 798, 293, 428, 21577, + 2848, 291, 458, 428, 14581, 3701, 1310, 291, 51360], "temperature": 0.0, "avg_logprob": + -0.08094951718352562, "compression_ratio": 1.7272727272727273, "no_speech_prob": + 0.007452432997524738}, {"id": 376, "seek": 258628, "start": 2606.2000000000003, + "end": 2612.0400000000004, "text": " should dial back the LTR a little bit or maybe + you need to retrain it right there right then I", "tokens": [51360, 820, 5502, 646, + 264, 441, 25936, 257, 707, 857, 420, 1310, 291, 643, 281, 1533, 7146, 309, 558, + 456, 558, 550, 286, 51652], "temperature": 0.0, "avg_logprob": -0.08094951718352562, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 
0.007452432997524738}, + {"id": 377, "seek": 261204, "start": 2612.04, "end": 2621.4, "text": " don''t know + or retrain frequently enough so that you don''t lose the invention strengths right + yeah", "tokens": [50364, 500, 380, 458, 420, 1533, 7146, 10374, 1547, 370, 300, + 291, 500, 380, 3624, 264, 22265, 16986, 558, 1338, 50832], "temperature": 0.0, "avg_logprob": + -0.1083514844217608, "compression_ratio": 1.6416184971098267, "no_speech_prob": + 0.0012699058279395103}, {"id": 378, "seek": 261204, "start": 2621.64, "end": 2628.2799999999997, + "text": " I think that''s our challenge in a lot of these things yeah the historical + approaches versus the", "tokens": [50844, 286, 519, 300, 311, 527, 3430, 294, 257, + 688, 295, 613, 721, 1338, 264, 8584, 11587, 5717, 264, 51176], "temperature": 0.0, + "avg_logprob": -0.1083514844217608, "compression_ratio": 1.6416184971098267, "no_speech_prob": + 0.0012699058279395103}, {"id": 379, "seek": 261204, "start": 2628.2799999999997, + "end": 2637.32, "text": " predictive approaches right and you know which ones do + you go with and how do you discount", "tokens": [51176, 35521, 11587, 558, 293, + 291, 458, 597, 2306, 360, 291, 352, 365, 293, 577, 360, 291, 11635, 51628], "temperature": + 0.0, "avg_logprob": -0.1083514844217608, "compression_ratio": 1.6416184971098267, + "no_speech_prob": 0.0012699058279395103}, {"id": 380, "seek": 263732, "start": 2637.32, + "end": 2644.2000000000003, "text": " the historical if you have a bunch of new interesting + data yeah yeah but I also like the limitations", "tokens": [50364, 264, 8584, 498, + 291, 362, 257, 3840, 295, 777, 1880, 1412, 1338, 1338, 457, 286, 611, 411, 264, + 15705, 50708], "temperature": 0.0, "avg_logprob": -0.09566466779593961, "compression_ratio": + 1.6651982378854626, "no_speech_prob": 0.0058951606042683125}, {"id": 381, "seek": + 263732, "start": 2644.2000000000003, "end": 2652.6800000000003, "text": " of the + physical world I think from investment books I''ve 
read one key key takeaway lesson + is that", "tokens": [50708, 295, 264, 4001, 1002, 286, 519, 490, 6078, 3642, 286, + 600, 1401, 472, 2141, 2141, 30681, 6898, 307, 300, 51132], "temperature": 0.0, "avg_logprob": + -0.09566466779593961, "compression_ratio": 1.6651982378854626, "no_speech_prob": + 0.0058951606042683125}, {"id": 382, "seek": 263732, "start": 2652.6800000000003, + "end": 2657.56, "text": " no one can predict the future if someone claims that they + can do they probably lie", "tokens": [51132, 572, 472, 393, 6069, 264, 2027, 498, + 1580, 9441, 300, 436, 393, 360, 436, 1391, 4544, 51376], "temperature": 0.0, "avg_logprob": + -0.09566466779593961, "compression_ratio": 1.6651982378854626, "no_speech_prob": + 0.0058951606042683125}, {"id": 383, "seek": 263732, "start": 2659.4, "end": 2665.2400000000002, + "text": " but but again I guess there is still room for being more dynamic and is + there something you guys", "tokens": [51468, 457, 457, 797, 286, 2041, 456, 307, + 920, 1808, 337, 885, 544, 8546, 293, 307, 456, 746, 291, 1074, 51760], "temperature": + 0.0, "avg_logprob": -0.09566466779593961, "compression_ratio": 1.6651982378854626, + "no_speech_prob": 0.0058951606042683125}, {"id": 384, "seek": 266524, "start": 2665.24, + "end": 2670.12, "text": " want to also show I mean is this something we can look + at visually or", "tokens": [50364, 528, 281, 611, 855, 286, 914, 307, 341, 746, + 321, 393, 574, 412, 19622, 420, 50608], "temperature": 0.0, "avg_logprob": -0.15950644400811964, + "compression_ratio": 1.6226415094339623, "no_speech_prob": 0.001306170946918428}, + {"id": 385, "seek": 266524, "start": 2673.24, "end": 2683.56, "text": " well theoretically + we can no pressure I mean so this small demo here and I''m going to show the", "tokens": + [50764, 731, 29400, 321, 393, 572, 3321, 286, 914, 370, 341, 1359, 10723, 510, 293, + 286, 478, 516, 281, 855, 264, 51280], "temperature": 0.0, "avg_logprob": -0.15950644400811964, + "compression_ratio": 
1.6226415094339623, "no_speech_prob": 0.001306170946918428}, + {"id": 386, "seek": 266524, "start": 2683.56, "end": 2692.2, "text": " results first + and then how we get to these results it basically takes in a user query my such", + "tokens": [51280, 3542, 700, 293, 550, 577, 321, 483, 281, 613, 3542, 309, 1936, + 2516, 294, 257, 4195, 14581, 452, 1270, 51712], "temperature": 0.0, "avg_logprob": + -0.15950644400811964, "compression_ratio": 1.6226415094339623, "no_speech_prob": + 0.001306170946918428}, {"id": 387, "seek": 269220, "start": 2692.2, "end": 2697.96, + "text": " application now is this Jupyter Notebook so it''s not the most sophisticated + search application", "tokens": [50364, 3861, 586, 307, 341, 22125, 88, 391, 11633, + 2939, 370, 309, 311, 406, 264, 881, 16950, 3164, 3861, 50652], "temperature": 0.0, + "avg_logprob": -0.14665975878315587, "compression_ratio": 1.7409638554216869, "no_speech_prob": + 0.00034268011222593486}, {"id": 388, "seek": 269220, "start": 2698.68, "end": 2706.04, + "text": " but it calculates the query features then with these query features reaches + out to the model to", "tokens": [50688, 457, 309, 4322, 1024, 264, 14581, 4122, + 550, 365, 613, 14581, 4122, 14235, 484, 281, 264, 2316, 281, 51056], "temperature": + 0.0, "avg_logprob": -0.14665975878315587, "compression_ratio": 1.7409638554216869, + "no_speech_prob": 0.00034268011222593486}, {"id": 389, "seek": 269220, "start": + 2706.04, "end": 2716.2, "text": " get the the best neural nest what these search + features and with that retrieved response the query", "tokens": [51056, 483, 264, + 264, 1151, 18161, 15646, 437, 613, 3164, 4122, 293, 365, 300, 19817, 937, 4134, + 264, 14581, 51564], "temperature": 0.0, "avg_logprob": -0.14665975878315587, "compression_ratio": + 1.7409638554216869, "no_speech_prob": 0.00034268011222593486}, {"id": 390, "seek": + 271620, "start": 2716.2, "end": 2724.52, "text": " is built together and then sent + to open search so we''re just going to 
have a look at a couple of", "tokens": [50364, + 307, 3094, 1214, 293, 550, 2279, 281, 1269, 3164, 370, 321, 434, 445, 516, 281, + 362, 257, 574, 412, 257, 1916, 295, 50780], "temperature": 0.0, "avg_logprob": -0.13025188446044922, + "compression_ratio": 1.5668449197860963, "no_speech_prob": 0.0020670457743108273}, + {"id": 391, "seek": 271620, "start": 2724.52, "end": 2731.48, "text": " examples + first and then we can have a look at the code so again this is now part of the ESC + iData", "tokens": [50780, 5110, 700, 293, 550, 321, 393, 362, 257, 574, 412, 264, + 3089, 370, 797, 341, 307, 586, 644, 295, 264, 12564, 34, 741, 35, 3274, 51128], + "temperature": 0.0, "avg_logprob": -0.13025188446044922, "compression_ratio": 1.5668449197860963, + "no_speech_prob": 0.0020670457743108273}, {"id": 392, "seek": 271620, "start": 2731.48, + "end": 2738.68, "text": " set my index has like 20,000 documents in it so it''s + not large it''s only a subset of the ESC iData", "tokens": [51128, 992, 452, 8186, + 575, 411, 945, 11, 1360, 8512, 294, 309, 370, 309, 311, 406, 2416, 309, 311, 787, + 257, 25993, 295, 264, 12564, 34, 741, 35, 3274, 51488], "temperature": 0.0, "avg_logprob": + -0.13025188446044922, "compression_ratio": 1.5668449197860963, "no_speech_prob": + 0.0020670457743108273}, {"id": 393, "seek": 273868, "start": 2739.64, "end": 2749.72, + "text": " and when we send queries now in this case waterproof jacket in this method + the method first as I", "tokens": [50412, 293, 562, 321, 2845, 24109, 586, 294, + 341, 1389, 27974, 11781, 294, 341, 3170, 264, 3170, 700, 382, 286, 50916], "temperature": + 0.0, "avg_logprob": -0.12327960087702824, "compression_ratio": 1.5806451612903225, + "no_speech_prob": 0.002759872004389763}, {"id": 394, "seek": 273868, "start": 2749.72, + "end": 2759.0, "text": " just explained cause out to the model it retrieves the + neural nest score and then builds the query", "tokens": [50916, 445, 8825, 3082, + 484, 281, 264, 2316, 309, 19817, 977, 
264, 18161, 15646, 6175, 293, 550, 15182, + 264, 14581, 51380], "temperature": 0.0, "avg_logprob": -0.12327960087702824, "compression_ratio": + 1.5806451612903225, "no_speech_prob": 0.002759872004389763}, {"id": 395, "seek": + 273868, "start": 2759.0, "end": 2766.3599999999997, "text": " together and then + we have this HTML display here as you can see there are not images available for", + "tokens": [51380, 1214, 293, 550, 321, 362, 341, 17995, 4674, 510, 382, 291, 393, + 536, 456, 366, 406, 5267, 2435, 337, 51748], "temperature": 0.0, "avg_logprob": + -0.12327960087702824, "compression_ratio": 1.5806451612903225, "no_speech_prob": + 0.002759872004389763}, {"id": 396, "seek": 276636, "start": 2766.36, "end": 2773.88, + "text": " all of the products but what we can see here is that what a weatherproof + sorry weatherproof jacket", "tokens": [50364, 439, 295, 264, 3383, 457, 437, 321, + 393, 536, 510, 307, 300, 437, 257, 5503, 15690, 2597, 5503, 15690, 11781, 50740], + "temperature": 0.0, "avg_logprob": -0.19228851318359375, "compression_ratio": 1.4848484848484849, + "no_speech_prob": 0.0024820754770189524}, {"id": 397, "seek": 276636, "start": 2774.84, + "end": 2784.84, "text": " it gets a 50-50 search waiting also in this case 50% keyword + 50% neural if we go for weatherproof", "tokens": [50788, 309, 2170, 257, 2625, 12, + 2803, 3164, 3806, 611, 294, 341, 1389, 2625, 4, 20428, 2625, 4, 18161, 498, 321, + 352, 337, 5503, 15690, 51288], "temperature": 0.0, "avg_logprob": -0.19228851318359375, + "compression_ratio": 1.4848484848484849, "no_speech_prob": 0.0024820754770189524}, + {"id": 398, "seek": 278484, "start": 2784.84, "end": 2796.52, "text": " jacket for + the women the weights change so now we have 90% neural and only 10% keyword search + weight", "tokens": [50364, 11781, 337, 264, 2266, 264, 17443, 1319, 370, 586, 321, + 362, 4289, 4, 18161, 293, 787, 1266, 4, 20428, 3164, 3364, 50948], "temperature": + 0.0, "avg_logprob": -0.16628310259650736, 
"compression_ratio": 1.6032608695652173, + "no_speech_prob": 0.015731146559119225}, {"id": 399, "seek": 278484, "start": 2797.08, + "end": 2803.08, "text": " and then oh and that''s because the query became much + more specific meaning that since we did add", "tokens": [50976, 293, 550, 1954, + 293, 300, 311, 570, 264, 14581, 3062, 709, 544, 2685, 3620, 300, 1670, 321, 630, + 909, 51276], "temperature": 0.0, "avg_logprob": -0.16628310259650736, "compression_ratio": + 1.6032608695652173, "no_speech_prob": 0.015731146559119225}, {"id": 400, "seek": + 278484, "start": 2803.08, "end": 2809.32, "text": " women there we are not expecting + results for men or for kids right is that exactly so that''s kind", "tokens": [51276, + 2266, 456, 321, 366, 406, 9650, 3542, 337, 1706, 420, 337, 2301, 558, 307, 300, + 2293, 370, 300, 311, 733, 51588], "temperature": 0.0, "avg_logprob": -0.16628310259650736, + "compression_ratio": 1.6032608695652173, "no_speech_prob": 0.015731146559119225}, + {"id": 401, "seek": 280932, "start": 2809.32, "end": 2815.4, "text": " of what we + can now infer maybe in what the model picks up here so we have a longer query we + have a", "tokens": [50364, 295, 437, 321, 393, 586, 13596, 1310, 294, 437, 264, + 2316, 16137, 493, 510, 370, 321, 362, 257, 2854, 14581, 321, 362, 257, 50668], "temperature": + 0.0, "avg_logprob": -0.10824088428331458, "compression_ratio": 1.7873303167420815, + "no_speech_prob": 0.0033873342908918858}, {"id": 402, "seek": 280932, "start": 2815.4, + "end": 2821.6400000000003, "text": " more specific query so it''s not really looking + at the words I''ll say the model but at the features like", "tokens": [50668, 544, + 2685, 14581, 370, 309, 311, 406, 534, 1237, 412, 264, 2283, 286, 603, 584, 264, + 2316, 457, 412, 264, 4122, 411, 50980], "temperature": 0.0, "avg_logprob": -0.10824088428331458, + "compression_ratio": 1.7873303167420815, "no_speech_prob": 0.0033873342908918858}, + {"id": 403, "seek": 280932, "start": 
2821.6400000000003, "end": 2829.0800000000004, + "text": " query length are there numbers in it are there any special characters + in it and so on so another one", "tokens": [50980, 14581, 4641, 366, 456, 3547, + 294, 309, 366, 456, 604, 2121, 4342, 294, 309, 293, 370, 322, 370, 1071, 472, 51352], + "temperature": 0.0, "avg_logprob": -0.10824088428331458, "compression_ratio": 1.7873303167420815, + "no_speech_prob": 0.0033873342908918858}, {"id": 404, "seek": 280932, "start": 2829.0800000000004, + "end": 2834.84, "text": " weatherproof jacket black and we also see that maybe results + that we wouldn''t really expect", "tokens": [51352, 5503, 15690, 11781, 2211, 293, + 321, 611, 536, 300, 1310, 3542, 300, 321, 2759, 380, 534, 2066, 51640], "temperature": + 0.0, "avg_logprob": -0.10824088428331458, "compression_ratio": 1.7873303167420815, + "no_speech_prob": 0.0033873342908918858}, {"id": 405, "seek": 283484, "start": 2835.0, + "end": 2841.56, "text": " in the top here but again it''s only a smallish proof + of concept kind of thing that we are looking at", "tokens": [50372, 294, 264, 1192, + 510, 457, 797, 309, 311, 787, 257, 1359, 742, 8177, 295, 3410, 733, 295, 551, 300, + 321, 366, 1237, 412, 50700], "temperature": 0.0, "avg_logprob": -0.17334971708409927, + "compression_ratio": 1.6779661016949152, "no_speech_prob": 0.0023127684835344553}, + {"id": 406, "seek": 283484, "start": 2841.56, "end": 2849.1600000000003, "text": + " here but we can see different queries that are similar from let''s say meanings + standpoint of you", "tokens": [50700, 510, 457, 321, 393, 536, 819, 24109, 300, + 366, 2531, 490, 718, 311, 584, 28138, 15827, 295, 291, 51080], "temperature": 0.0, + "avg_logprob": -0.17334971708409927, "compression_ratio": 1.6779661016949152, "no_speech_prob": + 0.0023127684835344553}, {"id": 407, "seek": 283484, "start": 2849.8, "end": 2857.4, + "text": " they retrieve different weights in that case and that''s kind of the interesting + thing and we can go", 
"tokens": [51112, 436, 30254, 819, 17443, 294, 300, 1389, + 293, 300, 311, 733, 295, 264, 1880, 551, 293, 321, 393, 352, 51492], "temperature": + 0.0, "avg_logprob": -0.17334971708409927, "compression_ratio": 1.6779661016949152, + "no_speech_prob": 0.0023127684835344553}, {"id": 408, "seek": 285740, "start": 2857.48, + "end": 2868.12, "text": " for something completely different as well um iPhone case + and we see like nice iPhone cases", "tokens": [50368, 337, 746, 2584, 819, 382, + 731, 1105, 7252, 1389, 293, 321, 536, 411, 1481, 7252, 3331, 50900], "temperature": + 0.0, "avg_logprob": -0.2702459971110026, "compression_ratio": 1.364963503649635, + "no_speech_prob": 0.0014275498688220978}, {"id": 409, "seek": 285740, "start": 2868.92, + "end": 2879.56, "text": " throughout the query and that goes with neuro.7 and keyword.3 + and I don''t know what''s iPhone 15", "tokens": [50940, 3710, 264, 14581, 293, 300, + 1709, 365, 16499, 13, 22, 293, 20428, 13, 18, 293, 286, 500, 380, 458, 437, 311, + 7252, 2119, 51472], "temperature": 0.0, "avg_logprob": -0.2702459971110026, "compression_ratio": + 1.364963503649635, "no_speech_prob": 0.0014275498688220978}, {"id": 410, "seek": + 287956, "start": 2880.2799999999997, "end": 2890.84, "text": " roll max a is black + so that would be a very very very specific query and here again the", "tokens": + [50400, 3373, 11469, 257, 307, 2211, 370, 300, 576, 312, 257, 588, 588, 588, 2685, + 14581, 293, 510, 797, 264, 50928], "temperature": 0.0, "avg_logprob": -0.23271123824581022, + "compression_ratio": 1.6645962732919255, "no_speech_prob": 0.0011124081211164594}, + {"id": 411, "seek": 287956, "start": 2890.84, "end": 2897.48, "text": " neural search + rate increases whereas when we go for a very broad query so that''s maybe one", + "tokens": [50928, 18161, 3164, 3314, 8637, 9735, 562, 321, 352, 337, 257, 588, 4152, + 14581, 370, 300, 311, 1310, 472, 51260], "temperature": 0.0, "avg_logprob": -0.23271123824581022, + "compression_ratio": 
1.6645962732919255, "no_speech_prob": 0.0011124081211164594}, + {"id": 412, "seek": 287956, "start": 2899.24, "end": 2905.48, "text": " characteristic + of the model that you that you can almost feel is the more specific we get", "tokens": + [51348, 16282, 295, 264, 2316, 300, 291, 300, 291, 393, 1920, 841, 307, 264, 544, + 2685, 321, 483, 51660], "temperature": 0.0, "avg_logprob": -0.23271123824581022, + "compression_ratio": 1.6645962732919255, "no_speech_prob": 0.0011124081211164594}, + {"id": 413, "seek": 290548, "start": 2905.96, "end": 2911.88, "text": " the more + neural weight it gets but also other features do play a role.", "tokens": [50388, + 264, 544, 18161, 3364, 309, 2170, 457, 611, 661, 4122, 360, 862, 257, 3090, 13, + 50684], "temperature": 0.0, "avg_logprob": -0.1436357810849049, "compression_ratio": + 1.5515151515151515, "no_speech_prob": 0.00042315831524319947}, {"id": 414, "seek": + 290548, "start": 2912.76, "end": 2922.36, "text": " Yeah yeah that''s very interesting + and that''s how the open search query looks like so let''s say", "tokens": [50728, + 865, 1338, 300, 311, 588, 1880, 293, 300, 311, 577, 264, 1269, 3164, 14581, 1542, + 411, 370, 718, 311, 584, 51208], "temperature": 0.0, "avg_logprob": -0.1436357810849049, + "compression_ratio": 1.5515151515151515, "no_speech_prob": 0.00042315831524319947}, + {"id": 415, "seek": 290548, "start": 2922.36, "end": 2931.48, "text": " the interesting + part is that one here so we have a keyword query which is like I explained", "tokens": + [51208, 264, 1880, 644, 307, 300, 472, 510, 370, 321, 362, 257, 20428, 14581, 597, + 307, 411, 286, 8825, 51664], "temperature": 0.0, "avg_logprob": -0.1436357810849049, + "compression_ratio": 1.5515151515151515, "no_speech_prob": 0.00042315831524319947}, + {"id": 416, "seek": 293148, "start": 2931.48, "end": 2937.32, "text": " before searching + in these couple of fields with different field weights a best fields query", "tokens": + [50364, 949, 10808, 294, 613, 
1916, 295, 7909, 365, 819, 2519, 17443, 257, 1151, + 7909, 14581, 50656], "temperature": 0.0, "avg_logprob": -0.17114732265472413, "compression_ratio": + 1.7061611374407584, "no_speech_prob": 0.0001779366284608841}, {"id": 417, "seek": + 293148, "start": 2937.88, "end": 2943.56, "text": " multi match with the end operator + and then we have a neural query that retrieves the", "tokens": [50684, 4825, 2995, + 365, 264, 917, 12973, 293, 550, 321, 362, 257, 18161, 14581, 300, 19817, 977, 264, + 50968], "temperature": 0.0, "avg_logprob": -0.17114732265472413, "compression_ratio": + 1.7061611374407584, "no_speech_prob": 0.0001779366284608841}, {"id": 418, "seek": + 293148, "start": 2944.68, "end": 2950.84, "text": " A100 based on the title embedding + that we have and then the hybrid part is actually the", "tokens": [51024, 316, 6879, + 2361, 322, 264, 4876, 12240, 3584, 300, 321, 362, 293, 550, 264, 13051, 644, 307, + 767, 264, 51332], "temperature": 0.0, "avg_logprob": -0.17114732265472413, "compression_ratio": + 1.7061611374407584, "no_speech_prob": 0.0001779366284608841}, {"id": 419, "seek": + 293148, "start": 2950.84, "end": 2958.36, "text": " search pipeline that normalizes + in this case with the L2 norm and combines the results with the", "tokens": [51332, + 3164, 15517, 300, 2710, 5660, 294, 341, 1389, 365, 264, 441, 17, 2026, 293, 29520, + 264, 3542, 365, 264, 51708], "temperature": 0.0, "avg_logprob": -0.17114732265472413, + "compression_ratio": 1.7061611374407584, "no_speech_prob": 0.0001779366284608841}, + {"id": 420, "seek": 295836, "start": 2958.36, "end": 2964.76, "text": " arithmetic + mean based on the keyword search rate and neural search rate and that are here", + "tokens": [50364, 42973, 914, 2361, 322, 264, 20428, 3164, 3314, 293, 18161, 3164, + 3314, 293, 300, 366, 510, 50684], "temperature": 0.0, "avg_logprob": -0.1950115849894862, + "compression_ratio": 1.5625, "no_speech_prob": 0.00012403786240611225}, {"id": 421, + "seek": 295836, "start": 
2964.76, "end": 2972.2000000000003, "text": " passed in + as variables that are another method yeah predicts with model inference in that + case.", "tokens": [50684, 4678, 294, 382, 9102, 300, 366, 1071, 3170, 1338, 6069, + 82, 365, 2316, 38253, 294, 300, 1389, 13, 51056], "temperature": 0.0, "avg_logprob": + -0.1950115849894862, "compression_ratio": 1.5625, "no_speech_prob": 0.00012403786240611225}, + {"id": 422, "seek": 295836, "start": 2975.0, "end": 2985.32, "text": " Yeah that''s + that''s kind of the small but built in Jupyter notebook prototype that we have", + "tokens": [51196, 865, 300, 311, 300, 311, 733, 295, 264, 1359, 457, 3094, 294, + 22125, 88, 391, 21060, 19475, 300, 321, 362, 51712], "temperature": 0.0, "avg_logprob": + -0.1950115849894862, "compression_ratio": 1.5625, "no_speech_prob": 0.00012403786240611225}, + {"id": 423, "seek": 298532, "start": 2986.28, "end": 2993.4, "text": " everything + we built is like Eric just mentioned open source so we have a public repository + that", "tokens": [50412, 1203, 321, 3094, 307, 411, 9336, 445, 2835, 1269, 4009, + 370, 321, 362, 257, 1908, 25841, 300, 50768], "temperature": 0.0, "avg_logprob": + -0.1696358228984632, "compression_ratio": 1.6367713004484306, "no_speech_prob": + 0.0016907332465052605}, {"id": 424, "seek": 298532, "start": 2993.4, "end": 2999.7200000000003, + "text": " contains all but this one notebook actually but everything you need to + train the models", "tokens": [50768, 8306, 439, 457, 341, 472, 21060, 767, 457, + 1203, 291, 643, 281, 3847, 264, 5245, 51084], "temperature": 0.0, "avg_logprob": + -0.1696358228984632, "compression_ratio": 1.6367713004484306, "no_speech_prob": + 0.0016907332465052605}, {"id": 425, "seek": 298532, "start": 3000.44, "end": 3007.88, + "text": " do the feature engineering calculate the search metrics so yeah everything + you actually need so", "tokens": [51120, 360, 264, 4111, 7043, 8873, 264, 3164, + 16367, 370, 1338, 1203, 291, 767, 643, 370, 51492], 
"temperature": 0.0, "avg_logprob": + -0.1696358228984632, "compression_ratio": 1.6367713004484306, "no_speech_prob": + 0.0016907332465052605}, {"id": 426, "seek": 298532, "start": 3007.88, "end": 3014.52, + "text": " running this with the ESC IDATA set is possible for everyone and if you + want to apply", "tokens": [51492, 2614, 341, 365, 264, 12564, 34, 7348, 44811, 992, + 307, 1944, 337, 1518, 293, 498, 291, 528, 281, 3079, 51824], "temperature": 0.0, + "avg_logprob": -0.1696358228984632, "compression_ratio": 1.6367713004484306, "no_speech_prob": + 0.0016907332465052605}, {"id": 427, "seek": 301452, "start": 3014.6, "end": 3020.7599999999998, + "text": " with your own data that of course is also possible so that''s kind of + what we are looking at next", "tokens": [50368, 365, 428, 1065, 1412, 300, 295, + 1164, 307, 611, 1944, 370, 300, 311, 733, 295, 437, 321, 366, 1237, 412, 958, 50676], + "temperature": 0.0, "avg_logprob": -0.11262814203898112, "compression_ratio": 1.6, + "no_speech_prob": 0.0009003984159789979}, {"id": 428, "seek": 301452, "start": 3020.7599999999998, + "end": 3030.04, "text": " here so adoption in the industry and also hooking it up + with the other part of this project which is", "tokens": [50676, 510, 370, 19215, + 294, 264, 3518, 293, 611, 1106, 5953, 309, 493, 365, 264, 661, 644, 295, 341, 1716, + 597, 307, 51140], "temperature": 0.0, "avg_logprob": -0.11262814203898112, "compression_ratio": + 1.6, "no_speech_prob": 0.0009003984159789979}, {"id": 429, "seek": 301452, "start": + 3030.04, "end": 3039.08, "text": " namely the let''s call it the evaluation part + calculating implicit judgments based on user feedback", "tokens": [51140, 20926, + 264, 718, 311, 818, 309, 264, 13344, 644, 28258, 26947, 40337, 2361, 322, 4195, + 5824, 51592], "temperature": 0.0, "avg_logprob": -0.11262814203898112, "compression_ratio": + 1.6, "no_speech_prob": 0.0009003984159789979}, {"id": 430, "seek": 303908, "start": + 3039.16, "end": 3046.92, "text": " so 
clicks queries stuff like that so that we + also enable not only everyone to optimize hybrid", "tokens": [50368, 370, 18521, + 24109, 1507, 411, 300, 370, 300, 321, 611, 9528, 406, 787, 1518, 281, 19719, 13051, + 50756], "temperature": 0.0, "avg_logprob": -0.1788503646850586, "compression_ratio": + 1.5988700564971752, "no_speech_prob": 0.005032109096646309}, {"id": 431, "seek": + 303908, "start": 3046.92, "end": 3053.72, "text": " search but also enable and empower + everyone to well come up with judgments if you don''t have any", "tokens": [50756, + 3164, 457, 611, 9528, 293, 11071, 1518, 281, 731, 808, 493, 365, 40337, 498, 291, + 500, 380, 362, 604, 51096], "temperature": 0.0, "avg_logprob": -0.1788503646850586, + "compression_ratio": 1.5988700564971752, "no_speech_prob": 0.005032109096646309}, + {"id": 432, "seek": 303908, "start": 3053.72, "end": 3060.52, "text": " because + that''s kind of the the basics you need for any query you need. Yeah and that''s + where", "tokens": [51096, 570, 300, 311, 733, 295, 264, 264, 14688, 291, 643, 337, + 604, 14581, 291, 643, 13, 865, 293, 300, 311, 689, 51436], "temperature": 0.0, "avg_logprob": + -0.1788503646850586, "compression_ratio": 1.5988700564971752, "no_speech_prob": + 0.005032109096646309}, {"id": 433, "seek": 306052, "start": 3060.6, "end": 3069.96, + "text": " keep it comes in. Shameless flag. 
So one of the things we actually have + a reference implementation", "tokens": [50368, 1066, 309, 1487, 294, 13, 42912, + 4272, 7166, 13, 407, 472, 295, 264, 721, 321, 767, 362, 257, 6408, 11420, 50836], + "temperature": 0.0, "avg_logprob": -0.1594693258907018, "compression_ratio": 1.7443946188340806, + "no_speech_prob": 0.0019274278311058879}, {"id": 434, "seek": 306052, "start": 3069.96, + "end": 3074.6, "text": " you know some of you all may have heard of chorus which + is reference implementation for e-commerce", "tokens": [50836, 291, 458, 512, 295, + 291, 439, 815, 362, 2198, 295, 22632, 597, 307, 6408, 11420, 337, 308, 12, 26926, + 51068], "temperature": 0.0, "avg_logprob": -0.1594693258907018, "compression_ratio": + 1.7443946188340806, "no_speech_prob": 0.0019274278311058879}, {"id": 435, "seek": + 306052, "start": 3074.6, "end": 3080.84, "text": " search we did it in solar this + we have an open search version and some of the stuff that you''re", "tokens": [51068, + 3164, 321, 630, 309, 294, 7936, 341, 321, 362, 364, 1269, 3164, 3037, 293, 512, + 295, 264, 1507, 300, 291, 434, 51380], "temperature": 0.0, "avg_logprob": -0.1594693258907018, + "compression_ratio": 1.7443946188340806, "no_speech_prob": 0.0019274278311058879}, + {"id": 436, "seek": 306052, "start": 3080.84, "end": 3086.84, "text": " seeing is + sort of bleeding edge hot off the presses but we''re working right now on getting + that", "tokens": [51380, 2577, 307, 1333, 295, 19312, 4691, 2368, 766, 264, 40892, + 457, 321, 434, 1364, 558, 586, 322, 1242, 300, 51680], "temperature": 0.0, "avg_logprob": + -0.1594693258907018, "compression_ratio": 1.7443946188340806, "no_speech_prob": + 0.0019274278311058879}, {"id": 437, "seek": 308684, "start": 3086.84, "end": 3093.8, + "text": " chorus for open search edition updated with some of these scripts and + notebooks so you can just", "tokens": [50364, 22632, 337, 1269, 3164, 11377, 10588, + 365, 512, 295, 613, 23294, 293, 43782, 370, 291, 393, 445, 
50712], "temperature": + 0.0, "avg_logprob": -0.13832103941175672, "compression_ratio": 1.7085201793721974, + "no_speech_prob": 0.001251386129297316}, {"id": 438, "seek": 308684, "start": 3093.8, + "end": 3098.6800000000003, "text": " check it out run the quick start and then have + everything and start playing with it so you don''t", "tokens": [50712, 1520, 309, + 484, 1190, 264, 1702, 722, 293, 550, 362, 1203, 293, 722, 2433, 365, 309, 370, 291, + 500, 380, 50956], "temperature": 0.0, "avg_logprob": -0.13832103941175672, "compression_ratio": + 1.7085201793721974, "no_speech_prob": 0.001251386129297316}, {"id": 439, "seek": + 308684, "start": 3098.6800000000003, "end": 3106.36, "text": " have to build all + the steps yourself so you can see how all the pieces fit together and so that''s", + "tokens": [50956, 362, 281, 1322, 439, 264, 4439, 1803, 370, 291, 393, 536, 577, + 439, 264, 3755, 3318, 1214, 293, 370, 300, 311, 51340], "temperature": 0.0, "avg_logprob": + -0.13832103941175672, "compression_ratio": 1.7085201793721974, "no_speech_prob": + 0.001251386129297316}, {"id": 440, "seek": 308684, "start": 3106.36, "end": 3111.0, + "text": " available be great if we can add a link in the line or notes for that + as well to meet you.", "tokens": [51340, 2435, 312, 869, 498, 321, 393, 909, 257, + 2113, 294, 264, 1622, 420, 5570, 337, 300, 382, 731, 281, 1677, 291, 13, 51572], + "temperature": 0.0, "avg_logprob": -0.13832103941175672, "compression_ratio": 1.7085201793721974, + "no_speech_prob": 0.001251386129297316}, {"id": 441, "seek": 311100, "start": 3111.08, + "end": 3116.68, "text": " Yeah let''s do that and just want to also to understand + this search pipeline and all this you know", "tokens": [50368, 865, 718, 311, 360, + 300, 293, 445, 528, 281, 611, 281, 1223, 341, 3164, 15517, 293, 439, 341, 291, 458, + 50648], "temperature": 0.0, "avg_logprob": -0.1607085197202621, "compression_ratio": + 1.7321428571428572, "no_speech_prob": 0.007830423302948475}, {"id": 
442, "seek": + 311100, "start": 3116.68, "end": 3123.0, "text": " mechanics of hybrid search that + you guys implemented is it like a plugin to open search and what''s", "tokens": + [50648, 12939, 295, 13051, 3164, 300, 291, 1074, 12270, 307, 309, 411, 257, 23407, + 281, 1269, 3164, 293, 437, 311, 50964], "temperature": 0.0, "avg_logprob": -0.1607085197202621, + "compression_ratio": 1.7321428571428572, "no_speech_prob": 0.007830423302948475}, + {"id": 443, "seek": 311100, "start": 3123.0, "end": 3129.08, "text": " the plan + for it right so I guess you spoke to that like let''s let''s give it to as many + users as", "tokens": [50964, 264, 1393, 337, 309, 558, 370, 286, 2041, 291, 7179, + 281, 300, 411, 718, 311, 718, 311, 976, 309, 281, 382, 867, 5022, 382, 51268], "temperature": + 0.0, "avg_logprob": -0.1607085197202621, "compression_ratio": 1.7321428571428572, + "no_speech_prob": 0.007830423302948475}, {"id": 444, "seek": 311100, "start": 3129.08, + "end": 3139.24, "text": " possible what''s your idea there. 
So hybrid search is + available in open search so with let''s say", "tokens": [51268, 1944, 437, 311, + 428, 1558, 456, 13, 407, 13051, 3164, 307, 2435, 294, 1269, 3164, 370, 365, 718, + 311, 584, 51776], "temperature": 0.0, "avg_logprob": -0.1607085197202621, "compression_ratio": + 1.7321428571428572, "no_speech_prob": 0.007830423302948475}, {"id": 445, "seek": + 313924, "start": 3140.12, "end": 3147.24, "text": " the the basic share so you can + create a pipeline and say 70% keyword search weight and 30%", "tokens": [50408, + 264, 264, 3875, 2073, 370, 291, 393, 1884, 257, 15517, 293, 584, 5285, 4, 20428, + 3164, 3364, 293, 2217, 4, 50764], "temperature": 0.0, "avg_logprob": -0.16770266019381008, + "compression_ratio": 1.5810055865921788, "no_speech_prob": 0.0015157173620536923}, + {"id": 446, "seek": 313924, "start": 3147.24, "end": 3154.8399999999997, "text": + " neural search weight you can also define these on the fly but we currently have + the limitation", "tokens": [50764, 18161, 3164, 3364, 291, 393, 611, 6964, 613, + 322, 264, 3603, 457, 321, 4362, 362, 264, 27432, 51144], "temperature": 0.0, "avg_logprob": + -0.16770266019381008, "compression_ratio": 1.5810055865921788, "no_speech_prob": + 0.0015157173620536923}, {"id": 447, "seek": 313924, "start": 3154.8399999999997, + "end": 3162.6, "text": " that we although we can hook up the model within a so-called + ml inference pipeline in open search", "tokens": [51144, 300, 321, 4878, 321, 393, + 6328, 493, 264, 2316, 1951, 257, 370, 12, 11880, 23271, 38253, 15517, 294, 1269, + 3164, 51532], "temperature": 0.0, "avg_logprob": -0.16770266019381008, "compression_ratio": + 1.5810055865921788, "no_speech_prob": 0.0015157173620536923}, {"id": 448, "seek": + 316260, "start": 3163.48, "end": 3173.3199999999997, "text": " this ml inference + pipeline can as of now not pass the predicted neural and keyword search weights", + "tokens": [50408, 341, 23271, 38253, 15517, 393, 382, 295, 586, 406, 1320, 264, + 19147, 
18161, 293, 20428, 3164, 17443, 50900], "temperature": 0.0, "avg_logprob": + -0.09703114724928333, "compression_ratio": 1.5846994535519126, "no_speech_prob": + 0.000978419790044427}, {"id": 449, "seek": 316260, "start": 3173.3199999999997, + "end": 3181.64, "text": " to this pipeline but feature request is already out there + and I assume that in one of the next", "tokens": [50900, 281, 341, 15517, 457, 4111, + 5308, 307, 1217, 484, 456, 293, 286, 6552, 300, 294, 472, 295, 264, 958, 51316], + "temperature": 0.0, "avg_logprob": -0.09703114724928333, "compression_ratio": 1.5846994535519126, + "no_speech_prob": 0.000978419790044427}, {"id": 450, "seek": 316260, "start": 3182.2, + "end": 3188.6, "text": " open search versions we will have the possibility to not + only what''s already possible hook up the", "tokens": [51344, 1269, 3164, 9606, + 321, 486, 362, 264, 7959, 281, 406, 787, 437, 311, 1217, 1944, 6328, 493, 264, 51664], + "temperature": 0.0, "avg_logprob": -0.09703114724928333, "compression_ratio": 1.5846994535519126, + "no_speech_prob": 0.000978419790044427}, {"id": 451, "seek": 318860, "start": 3188.6, + "end": 3197.3199999999997, "text": " model within open search so that it from within + open search you call out to the model to make", "tokens": [50364, 2316, 1951, 1269, + 3164, 370, 300, 309, 490, 1951, 1269, 3164, 291, 818, 484, 281, 264, 2316, 281, + 652, 50800], "temperature": 0.0, "avg_logprob": -0.13806663125248278, "compression_ratio": + 1.7577639751552796, "no_speech_prob": 0.0006006536423228681}, {"id": 452, "seek": + 318860, "start": 3197.3199999999997, "end": 3204.92, "text": " inference you will + retrieve the predicted neural and keyword search weights and then you can use", + "tokens": [50800, 38253, 291, 486, 30254, 264, 19147, 18161, 293, 20428, 3164, 17443, + 293, 550, 291, 393, 764, 51180], "temperature": 0.0, "avg_logprob": -0.13806663125248278, + "compression_ratio": 1.7577639751552796, "no_speech_prob": 0.0006006536423228681}, + {"id": 
453, "seek": 318860, "start": 3204.92, "end": 3211.64, "text": " these in + your search pipeline so there is already an implementation plan out there there + are", "tokens": [51180, 613, 294, 428, 3164, 15517, 370, 456, 307, 1217, 364, 11420, + 1393, 484, 456, 456, 366, 51516], "temperature": 0.0, "avg_logprob": -0.13806663125248278, + "compression_ratio": 1.7577639751552796, "no_speech_prob": 0.0006006536423228681}, + {"id": 454, "seek": 321164, "start": 3211.96, "end": 3219.3199999999997, "text": + " open feature requests and if anyone wants to give these thumbs up to prioritize + that within the", "tokens": [50380, 1269, 4111, 12475, 293, 498, 2878, 2738, 281, + 976, 613, 8838, 493, 281, 25164, 300, 1951, 264, 50748], "temperature": 0.0, "avg_logprob": + -0.10399868844569414, "compression_ratio": 1.6551724137931034, "no_speech_prob": + 0.006629263982176781}, {"id": 455, "seek": 321164, "start": 3219.3199999999997, + "end": 3224.68, "text": " open search community that would of course be greatly + appreciated and I''m sure we can include", "tokens": [50748, 1269, 3164, 1768, 300, + 576, 295, 1164, 312, 14147, 17169, 293, 286, 478, 988, 321, 393, 4090, 51016], "temperature": + 0.0, "avg_logprob": -0.10399868844569414, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.006629263982176781}, {"id": 456, "seek": 321164, "start": 3224.68, + "end": 3231.8799999999997, "text": " these GitHub issues as well in the show notes. 
+ Yeah for sure let''s do that and we will call out", "tokens": [51016, 613, 23331, + 2663, 382, 731, 294, 264, 855, 5570, 13, 865, 337, 988, 718, 311, 360, 300, 293, + 321, 486, 818, 484, 51376], "temperature": 0.0, "avg_logprob": -0.10399868844569414, + "compression_ratio": 1.6551724137931034, "no_speech_prob": 0.006629263982176781}, + {"id": 457, "seek": 321164, "start": 3231.8799999999997, "end": 3236.6, "text": + " that''s how to call out to the community please vote if you care I hope there + will be enough people", "tokens": [51376, 300, 311, 577, 281, 818, 484, 281, 264, + 1768, 1767, 4740, 498, 291, 1127, 286, 1454, 456, 486, 312, 1547, 561, 51612], "temperature": + 0.0, "avg_logprob": -0.10399868844569414, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.006629263982176781}, {"id": 458, "seek": 323660, "start": 3236.68, + "end": 3245.64, "text": " who care about this. Yeah exactly and then the you said + everything is open source does that mean", "tokens": [50368, 567, 1127, 466, 341, + 13, 865, 2293, 293, 550, 264, 291, 848, 1203, 307, 1269, 4009, 775, 300, 914, 50816], + "temperature": 0.0, "avg_logprob": -0.15059559920738483, "compression_ratio": 1.7085201793721974, + "no_speech_prob": 0.01387438178062439}, {"id": 459, "seek": 323660, "start": 3245.64, + "end": 3252.68, "text": " that the training scripts the algorithms the choices you + make you can make there is also open", "tokens": [50816, 300, 264, 3097, 23294, + 264, 14642, 264, 7994, 291, 652, 291, 393, 652, 456, 307, 611, 1269, 51168], "temperature": + 0.0, "avg_logprob": -0.15059559920738483, "compression_ratio": 1.7085201793721974, + "no_speech_prob": 0.01387438178062439}, {"id": 460, "seek": 323660, "start": 3252.68, + "end": 3260.04, "text": " source right and we can link to that as well yeah. 
Yes + so everything with the I mean we of course", "tokens": [51168, 4009, 558, 293, 321, + 393, 2113, 281, 300, 382, 731, 1338, 13, 1079, 370, 1203, 365, 264, 286, 914, 321, + 295, 1164, 51536], "temperature": 0.0, "avg_logprob": -0.15059559920738483, "compression_ratio": + 1.7085201793721974, "no_speech_prob": 0.01387438178062439}, {"id": 461, "seek": + 323660, "start": 3260.04, "end": 3265.88, "text": " didn''t include all the thousand + experiments but all the helpers at least that we used to run", "tokens": [51536, + 994, 380, 4090, 439, 264, 4714, 12050, 457, 439, 264, 854, 433, 412, 1935, 300, + 321, 1143, 281, 1190, 51828], "temperature": 0.0, "avg_logprob": -0.15059559920738483, + "compression_ratio": 1.7085201793721974, "no_speech_prob": 0.01387438178062439}, + {"id": 462, "seek": 326588, "start": 3265.88, "end": 3274.44, "text": " these 1000 + experiments they are all in the repository out there for everyone to have a look + at and", "tokens": [50364, 613, 9714, 12050, 436, 366, 439, 294, 264, 25841, 484, + 456, 337, 1518, 281, 362, 257, 574, 412, 293, 50792], "temperature": 0.0, "avg_logprob": + -0.14934514550601735, "compression_ratio": 1.546875, "no_speech_prob": 0.001986442133784294}, + {"id": 463, "seek": 326588, "start": 3274.44, "end": 3280.84, "text": " maybe come + up with even better ideas than we had so we definitely always love to hear these + as well.", "tokens": [50792, 1310, 808, 493, 365, 754, 1101, 3487, 813, 321, 632, + 370, 321, 2138, 1009, 959, 281, 1568, 613, 382, 731, 13, 51112], "temperature": + 0.0, "avg_logprob": -0.14934514550601735, "compression_ratio": 1.546875, "no_speech_prob": + 0.001986442133784294}, {"id": 464, "seek": 326588, "start": 3281.48, "end": 3289.4, + "text": " Wow this is amazing this is a we can true sense you speak up to your name + open source connections", "tokens": [51144, 3153, 341, 307, 2243, 341, 307, 257, + 321, 393, 2074, 2020, 291, 1710, 493, 281, 428, 1315, 1269, 4009, 9271, 51540], + "temperature": 
0.0, "avg_logprob": -0.14934514550601735, "compression_ratio": 1.546875, + "no_speech_prob": 0.001986442133784294}, {"id": 465, "seek": 328940, "start": 3289.88, + "end": 3296.44, "text": " that''s amazing like to me it''s a ton of work that you + could choose to hide as well right and work", "tokens": [50388, 300, 311, 2243, + 411, 281, 385, 309, 311, 257, 2952, 295, 589, 300, 291, 727, 2826, 281, 6479, 382, + 731, 558, 293, 589, 50716], "temperature": 0.0, "avg_logprob": -0.14937685894709762, + "compression_ratio": 1.8240740740740742, "no_speech_prob": 0.014770988374948502}, + {"id": 466, "seek": 328940, "start": 3296.44, "end": 3302.6800000000003, "text": + " only with your clients and and nurture and iron out and everything and make a + ton of money and", "tokens": [50716, 787, 365, 428, 6982, 293, 293, 41451, 293, + 6497, 484, 293, 1203, 293, 652, 257, 2952, 295, 1460, 293, 51028], "temperature": + 0.0, "avg_logprob": -0.14937685894709762, "compression_ratio": 1.8240740740740742, + "no_speech_prob": 0.014770988374948502}, {"id": 467, "seek": 328940, "start": 3302.6800000000003, + "end": 3307.56, "text": " continue on that but you choose to open source because + you believe in the power of the community as", "tokens": [51028, 2354, 322, 300, + 457, 291, 2826, 281, 1269, 4009, 570, 291, 1697, 294, 264, 1347, 295, 264, 1768, + 382, 51272], "temperature": 0.0, "avg_logprob": -0.14937685894709762, "compression_ratio": + 1.8240740740740742, "no_speech_prob": 0.014770988374948502}, {"id": 468, "seek": + 328940, "start": 3307.56, "end": 3318.6800000000003, "text": " well to enhance it + that''s amazing well said what''s next you already said a couple of words but what''s", + "tokens": [51272, 731, 281, 11985, 309, 300, 311, 2243, 731, 848, 437, 311, 958, + 291, 1217, 848, 257, 1916, 295, 2283, 457, 437, 311, 51828], "temperature": 0.0, + "avg_logprob": -0.14937685894709762, "compression_ratio": 1.8240740740740742, "no_speech_prob": + 0.014770988374948502}, {"id": 469, 
"seek": 331868, "start": 3318.68, "end": 3325.96, + "text": " next and also do you want to address our audience what do you expect from + me I think you know one of", "tokens": [50364, 958, 293, 611, 360, 291, 528, 281, + 2985, 527, 4034, 437, 360, 291, 2066, 490, 385, 286, 519, 291, 458, 472, 295, 50728], + "temperature": 0.0, "avg_logprob": -0.15480988748957603, "compression_ratio": 1.7345132743362832, + "no_speech_prob": 0.0003564977669157088}, {"id": 470, "seek": 331868, "start": 3325.96, + "end": 3332.2, "text": " the things is you know as we''ve gotten into this you know + we''ve we''ve found some rough spots in", "tokens": [50728, 264, 721, 307, 291, + 458, 382, 321, 600, 5768, 666, 341, 291, 458, 321, 600, 321, 600, 1352, 512, 5903, + 10681, 294, 51040], "temperature": 0.0, "avg_logprob": -0.15480988748957603, "compression_ratio": + 1.7345132743362832, "no_speech_prob": 0.0003564977669157088}, {"id": 471, "seek": + 331868, "start": 3332.2, "end": 3339.16, "text": " open search right open search + has a strong ML component ML Commons project right but integrating it", "tokens": + [51040, 1269, 3164, 558, 1269, 3164, 575, 257, 2068, 21601, 6542, 21601, 34894, + 1716, 558, 457, 26889, 309, 51388], "temperature": 0.0, "avg_logprob": -0.15480988748957603, + "compression_ratio": 1.7345132743362832, "no_speech_prob": 0.0003564977669157088}, + {"id": 472, "seek": 331868, "start": 3339.16, "end": 3344.9199999999996, "text": + " in sort of new and interesting ways like what Daniel was showing we''re finding + some rough spots", "tokens": [51388, 294, 1333, 295, 777, 293, 1880, 2098, 411, + 437, 8033, 390, 4099, 321, 434, 5006, 512, 5903, 10681, 51676], "temperature": 0.0, + "avg_logprob": -0.15480988748957603, "compression_ratio": 1.7345132743362832, "no_speech_prob": + 0.0003564977669157088}, {"id": 473, "seek": 334492, "start": 3345.7200000000003, + "end": 3353.8, "text": " it is interesting to me it does bring them in the do search + engines what we call a search 
engine need", "tokens": [50404, 309, 307, 1880, 281, + 385, 309, 775, 1565, 552, 294, 264, 360, 3164, 12982, 437, 321, 818, 257, 3164, + 2848, 643, 50808], "temperature": 0.0, "avg_logprob": -0.08490586280822754, "compression_ratio": + 1.7757847533632287, "no_speech_prob": 0.0021228506229817867}, {"id": 474, "seek": + 334492, "start": 3353.8, "end": 3360.6800000000003, "text": " to evolve to be more + of an ML engine as well right I mean it feels to me like search has been", "tokens": + [50808, 281, 16693, 281, 312, 544, 295, 364, 21601, 2848, 382, 731, 558, 286, 914, + 309, 3417, 281, 385, 411, 3164, 575, 668, 51152], "temperature": 0.0, "avg_logprob": + -0.08490586280822754, "compression_ratio": 1.7757847533632287, "no_speech_prob": + 0.0021228506229817867}, {"id": 475, "seek": 334492, "start": 3360.6800000000003, + "end": 3367.64, "text": " revolutionized by machine learning and as we move into + this direction of more calculating building", "tokens": [51152, 8894, 1602, 538, + 3479, 2539, 293, 382, 321, 1286, 666, 341, 3513, 295, 544, 28258, 2390, 51500], + "temperature": 0.0, "avg_logprob": -0.08490586280822754, "compression_ratio": 1.7757847533632287, + "no_speech_prob": 0.0021228506229817867}, {"id": 476, "seek": 334492, "start": 3367.64, + "end": 3373.96, "text": " models evaluating data on the fly do our search engines + need to support those use cases and go beyond", "tokens": [51500, 5245, 27479, 1412, + 322, 264, 3603, 360, 527, 3164, 12982, 643, 281, 1406, 729, 764, 3331, 293, 352, + 4399, 51816], "temperature": 0.0, "avg_logprob": -0.08490586280822754, "compression_ratio": + 1.7757847533632287, "no_speech_prob": 0.0021228506229817867}, {"id": 477, "seek": + 337396, "start": 3373.96, "end": 3381.7200000000003, "text": " just the I I get + a query I get documents I match them up and that''s it right is there another", + "tokens": [50364, 445, 264, 286, 286, 483, 257, 14581, 286, 483, 8512, 286, 2995, + 552, 493, 293, 300, 311, 309, 558, 307, 456, 1071, 
50752], "temperature": 0.0, "avg_logprob": + -0.11376806323447924, "compression_ratio": 1.6877828054298643, "no_speech_prob": + 0.0004616710648406297}, {"id": 478, "seek": 337396, "start": 3381.7200000000003, + "end": 3388.52, "text": " layer of computation that we kind of need in the search + engine versus bolting it on in", "tokens": [50752, 4583, 295, 24903, 300, 321, 733, + 295, 643, 294, 264, 3164, 2848, 5717, 8986, 783, 309, 322, 294, 51092], "temperature": + 0.0, "avg_logprob": -0.11376806323447924, "compression_ratio": 1.6877828054298643, + "no_speech_prob": 0.0004616710648406297}, {"id": 479, "seek": 337396, "start": 3389.48, + "end": 3396.04, "text": " some other environment with an ML ops pipeline and all + the rest um and and I think that''s in the", "tokens": [51140, 512, 661, 2823, 365, + 364, 21601, 44663, 15517, 293, 439, 264, 1472, 1105, 293, 293, 286, 519, 300, 311, + 294, 264, 51468], "temperature": 0.0, "avg_logprob": -0.11376806323447924, "compression_ratio": + 1.6877828054298643, "no_speech_prob": 0.0004616710648406297}, {"id": 480, "seek": + 337396, "start": 3396.04, "end": 3401.48, "text": " interesting you know one place + where I think open search is a little bit you know is definitely", "tokens": [51468, + 1880, 291, 458, 472, 1081, 689, 286, 519, 1269, 3164, 307, 257, 707, 857, 291, 458, + 307, 2138, 51740], "temperature": 0.0, "avg_logprob": -0.11376806323447924, "compression_ratio": + 1.6877828054298643, "no_speech_prob": 0.0004616710648406297}, {"id": 481, "seek": + 340148, "start": 3402.28, "end": 3406.6, "text": " you know breaking some interesting + ground is all the machine learning aspects to it", "tokens": [50404, 291, 458, 7697, + 512, 1880, 2727, 307, 439, 264, 3479, 2539, 7270, 281, 309, 50620], "temperature": + 0.0, "avg_logprob": -0.07449235544576273, "compression_ratio": 1.6846846846846846, + "no_speech_prob": 0.000674513285048306}, {"id": 482, "seek": 340148, "start": 3407.48, + "end": 3413.88, "text": " but you know data 
processing and building models and all + of that needs to maybe be a first class", "tokens": [50664, 457, 291, 458, 1412, + 9007, 293, 2390, 5245, 293, 439, 295, 300, 2203, 281, 1310, 312, 257, 700, 1508, + 50984], "temperature": 0.0, "avg_logprob": -0.07449235544576273, "compression_ratio": + 1.6846846846846846, "no_speech_prob": 0.000674513285048306}, {"id": 483, "seek": + 340148, "start": 3413.88, "end": 3421.56, "text": " citizen of what we consider + a search engine versus something done by some other system elsewhere", "tokens": + [50984, 13326, 295, 437, 321, 1949, 257, 3164, 2848, 5717, 746, 1096, 538, 512, + 661, 1185, 14517, 51368], "temperature": 0.0, "avg_logprob": -0.07449235544576273, + "compression_ratio": 1.6846846846846846, "no_speech_prob": 0.000674513285048306}, + {"id": 484, "seek": 340148, "start": 3421.56, "end": 3427.4, "text": " because that''s + a lot more complexity and you know raises the barrier to adopting these things so", + "tokens": [51368, 570, 300, 311, 257, 688, 544, 14024, 293, 291, 458, 19658, 264, + 13357, 281, 32328, 613, 721, 370, 51660], "temperature": 0.0, "avg_logprob": -0.07449235544576273, + "compression_ratio": 1.6846846846846846, "no_speech_prob": 0.000674513285048306}, + {"id": 485, "seek": 342740, "start": 3427.48, "end": 3434.2000000000003, "text": + " I look forward to things like a hybrid optimizer just sort of being like what + you do when you build", "tokens": [50368, 286, 574, 2128, 281, 721, 411, 257, 13051, + 5028, 6545, 445, 1333, 295, 885, 411, 437, 291, 360, 562, 291, 1322, 50704], "temperature": + 0.0, "avg_logprob": -0.09127230976903161, "compression_ratio": 1.7255813953488373, + "no_speech_prob": 0.0005911454791203141}, {"id": 486, "seek": 342740, "start": 3434.2000000000003, + "end": 3439.1600000000003, "text": " your search engine of course you turn on the + hybrid optimizer if it meets your use case and", "tokens": [50704, 428, 3164, 2848, + 295, 1164, 291, 1261, 322, 264, 13051, 5028, 6545, 498, 
309, 13961, 428, 764, 1389, + 293, 50952], "temperature": 0.0, "avg_logprob": -0.09127230976903161, "compression_ratio": + 1.7255813953488373, "no_speech_prob": 0.0005911454791203141}, {"id": 487, "seek": + 342740, "start": 3439.1600000000003, "end": 3445.7200000000003, "text": " you have + the judgments and other features that you need right versus oh a major engineering + project", "tokens": [50952, 291, 362, 264, 40337, 293, 661, 4122, 300, 291, 643, + 558, 5717, 1954, 257, 2563, 7043, 1716, 51280], "temperature": 0.0, "avg_logprob": + -0.09127230976903161, "compression_ratio": 1.7255813953488373, "no_speech_prob": + 0.0005911454791203141}, {"id": 488, "seek": 342740, "start": 3445.7200000000003, + "end": 3452.28, "text": " that we''re going to do this going to take us six months + so um yeah yeah yeah um", "tokens": [51280, 300, 321, 434, 516, 281, 360, 341, 516, + 281, 747, 505, 2309, 2493, 370, 1105, 1338, 1338, 1338, 1105, 51608], "temperature": + 0.0, "avg_logprob": -0.09127230976903161, "compression_ratio": 1.7255813953488373, + "no_speech_prob": 0.0005911454791203141}, {"id": 489, "seek": 345228, "start": 3453.2400000000002, + "end": 3459.48, "text": " um and you know supporting that you know as Dan you highlighted + the search quality evaluation", "tokens": [50412, 1105, 293, 291, 458, 7231, 300, + 291, 458, 382, 3394, 291, 17173, 264, 3164, 3125, 13344, 50724], "temperature": + 0.0, "avg_logprob": -0.14010675563368685, "compression_ratio": 1.6695278969957081, + "no_speech_prob": 0.0030809228774160147}, {"id": 490, "seek": 345228, "start": 3459.48, + "end": 3465.0, "text": " framework that we''re adding to open searches really exciting + would love to come back to Metri and", "tokens": [50724, 8388, 300, 321, 434, 5127, + 281, 1269, 26701, 534, 4670, 576, 959, 281, 808, 646, 281, 6377, 470, 293, 51000], + "temperature": 0.0, "avg_logprob": -0.14010675563368685, "compression_ratio": 1.6695278969957081, + "no_speech_prob": 0.0030809228774160147}, {"id": 
491, "seek": 345228, "start": 3465.0, + "end": 3471.0, "text": " talk all about that on another show yeah let''s do that + I''m really excited to dive deeper into", "tokens": [51000, 751, 439, 466, 300, + 322, 1071, 855, 1338, 718, 311, 360, 300, 286, 478, 534, 2919, 281, 9192, 7731, + 666, 51300], "temperature": 0.0, "avg_logprob": -0.14010675563368685, "compression_ratio": + 1.6695278969957081, "no_speech_prob": 0.0030809228774160147}, {"id": 492, "seek": + 345228, "start": 3471.0, "end": 3476.36, "text": " evil because I think in so many + ways you need to start with evil especially if you have a search engine", "tokens": + [51300, 6724, 570, 286, 519, 294, 370, 867, 2098, 291, 643, 281, 722, 365, 6724, + 2318, 498, 291, 362, 257, 3164, 2848, 51568], "temperature": 0.0, "avg_logprob": + -0.14010675563368685, "compression_ratio": 1.6695278969957081, "no_speech_prob": + 0.0030809228774160147}, {"id": 493, "seek": 347636, "start": 3476.36, "end": 3482.92, + "text": " right out there uh to establish that uh you know baseline for yourself + and then learn and", "tokens": [50364, 558, 484, 456, 2232, 281, 8327, 300, 2232, + 291, 458, 20518, 337, 1803, 293, 550, 1466, 293, 50692], "temperature": 0.0, "avg_logprob": + -0.15163103739420572, "compression_ratio": 1.7168949771689497, "no_speech_prob": + 0.0086812824010849}, {"id": 494, "seek": 347636, "start": 3482.92, "end": 3489.56, + "text": " introspect where things uh work or fail yeah make sure some of these models + don''t go and produce", "tokens": [50692, 560, 28713, 689, 721, 2232, 589, 420, + 3061, 1338, 652, 988, 512, 295, 613, 5245, 500, 380, 352, 293, 5258, 51024], "temperature": + 0.0, "avg_logprob": -0.15163103739420572, "compression_ratio": 1.7168949771689497, + "no_speech_prob": 0.0086812824010849}, {"id": 495, "seek": 347636, "start": 3489.56, + "end": 3497.2400000000002, "text": " terrible yeah back shit crazy results right + we had the risk that you know boosting up on neural", "tokens": [51024, 6237, 
1338, + 646, 4611, 3219, 3542, 558, 321, 632, 264, 3148, 300, 291, 458, 43117, 493, 322, + 18161, 51408], "temperature": 0.0, "avg_logprob": -0.15163103739420572, "compression_ratio": + 1.7168949771689497, "no_speech_prob": 0.0086812824010849}, {"id": 496, "seek": 347636, + "start": 3497.2400000000002, "end": 3502.6, "text": "ness might be terrible right + we have to understand that and so we need to be much better about", "tokens": [51408, + 1287, 1062, 312, 6237, 558, 321, 362, 281, 1223, 300, 293, 370, 321, 643, 281, 312, + 709, 1101, 466, 51676], "temperature": 0.0, "avg_logprob": -0.15163103739420572, + "compression_ratio": 1.7168949771689497, "no_speech_prob": 0.0086812824010849}, + {"id": 497, "seek": 350260, "start": 3502.6, "end": 3509.08, "text": " evaluation + right we can''t take our eye off the ball of speed right query speed does remain", + "tokens": [50364, 13344, 558, 321, 393, 380, 747, 527, 3313, 766, 264, 2594, 295, + 3073, 558, 14581, 3073, 775, 6222, 50688], "temperature": 0.0, "avg_logprob": -0.20521946957236842, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.0012746984139084816}, + {"id": 498, "seek": 350260, "start": 3509.7999999999997, "end": 3515.72, "text": + " just sort of top of mind we also really need to be good at evaluation and I think + for a lot of", "tokens": [50724, 445, 1333, 295, 1192, 295, 1575, 321, 611, 534, + 643, 281, 312, 665, 412, 13344, 293, 286, 519, 337, 257, 688, 295, 51020], "temperature": + 0.0, "avg_logprob": -0.20521946957236842, "compression_ratio": 1.8436018957345972, + "no_speech_prob": 0.0012746984139084816}, {"id": 499, "seek": 350260, "start": 3515.72, + "end": 3521.72, "text": " search teams that''s kind of a new thing yes absolutely + so the pooling and low search agents yeah and", "tokens": [51020, 3164, 5491, 300, + 311, 733, 295, 257, 777, 551, 2086, 3122, 370, 264, 7005, 278, 293, 2295, 3164, + 12554, 1338, 293, 51320], "temperature": 0.0, "avg_logprob": -0.20521946957236842, + 
"compression_ratio": 1.8436018957345972, "no_speech_prob": 0.0012746984139084816}, + {"id": 500, "seek": 350260, "start": 3521.72, "end": 3527.24, "text": " we do want + to do what other thing I want to shout out yeah one other thing I want to shout + out is that", "tokens": [51320, 321, 360, 528, 281, 360, 437, 661, 551, 286, 528, + 281, 8043, 484, 1338, 472, 661, 551, 286, 528, 281, 8043, 484, 307, 300, 51596], + "temperature": 0.0, "avg_logprob": -0.20521946957236842, "compression_ratio": 1.8436018957345972, + "no_speech_prob": 0.0012746984139084816}, {"id": 501, "seek": 352724, "start": 3527.7999999999997, + "end": 3535.9599999999996, "text": " the next haystack conference uh the saved the + date last week in April right um last week in April", "tokens": [50392, 264, 958, + 4842, 372, 501, 7586, 2232, 264, 6624, 264, 4002, 1036, 1243, 294, 6929, 558, 1105, + 1036, 1243, 294, 6929, 50800], "temperature": 0.0, "avg_logprob": -0.15075209971224324, + "compression_ratio": 1.7180616740088106, "no_speech_prob": 0.0024090104270726442}, + {"id": 502, "seek": 352724, "start": 3536.7599999999998, "end": 3543.8799999999997, + "text": " save the date went out um and we are looking for uh talk reviewers people + who want to be reviewing", "tokens": [50840, 3155, 264, 4002, 1437, 484, 1105, 293, + 321, 366, 1237, 337, 2232, 751, 45837, 561, 567, 528, 281, 312, 19576, 51196], "temperature": + 0.0, "avg_logprob": -0.15075209971224324, "compression_ratio": 1.7180616740088106, + "no_speech_prob": 0.0024090104270726442}, {"id": 503, "seek": 352724, "start": 3543.8799999999997, + "end": 3548.4399999999996, "text": " the talk proposals it''s a double blind process + we get a couple people from the community so if", "tokens": [51196, 264, 751, 20198, + 309, 311, 257, 3834, 6865, 1399, 321, 483, 257, 1916, 561, 490, 264, 1768, 370, + 498, 51424], "temperature": 0.0, "avg_logprob": -0.15075209971224324, "compression_ratio": + 1.7180616740088106, "no_speech_prob": 
0.0024090104270726442}, {"id": 504, "seek": + 352724, "start": 3548.4399999999996, "end": 3555.08, "text": " that''s something + interested reach out to David Fisher uh he''s running that process uh and call for", + "tokens": [51424, 300, 311, 746, 3102, 2524, 484, 281, 4389, 26676, 2232, 415, 311, + 2614, 300, 1399, 2232, 293, 818, 337, 51756], "temperature": 0.0, "avg_logprob": + -0.15075209971224324, "compression_ratio": 1.7180616740088106, "no_speech_prob": + 0.0024090104270726442}, {"id": 505, "seek": 355508, "start": 3555.08, "end": 3562.44, + "text": " proposals will be out I''m curious if this year haystack in Charlottesville + might as well just be called", "tokens": [50364, 20198, 486, 312, 484, 286, 478, + 6369, 498, 341, 1064, 4842, 372, 501, 294, 14130, 1521, 279, 8386, 1062, 382, 731, + 445, 312, 1219, 50732], "temperature": 0.0, "avg_logprob": -0.14968310252274616, + "compression_ratio": 1.652542372881356, "no_speech_prob": 0.0013577057980000973}, + {"id": 506, "seek": 355508, "start": 3562.44, "end": 3571.16, "text": " hybrid stack + like are we gonna have two days of talking about hybrid uh because R.A.G. 
rag is", + "tokens": [50732, 13051, 8630, 411, 366, 321, 799, 362, 732, 1708, 295, 1417, 466, + 13051, 2232, 570, 497, 13, 32, 13, 38, 13, 17539, 307, 51168], "temperature": 0.0, + "avg_logprob": -0.14968310252274616, "compression_ratio": 1.652542372881356, "no_speech_prob": + 0.0013577057980000973}, {"id": 507, "seek": 355508, "start": 3571.16, "end": 3575.96, + "text": " rag is you know that''s last year now we''re on to hybrid uh or is there + going to be something new", "tokens": [51168, 17539, 307, 291, 458, 300, 311, 1036, + 1064, 586, 321, 434, 322, 281, 13051, 2232, 420, 307, 456, 516, 281, 312, 746, 777, + 51408], "temperature": 0.0, "avg_logprob": -0.14968310252274616, "compression_ratio": + 1.652542372881356, "no_speech_prob": 0.0013577057980000973}, {"id": 508, "seek": + 355508, "start": 3575.96, "end": 3580.84, "text": " so it''s going to be interesting + to see what what what kind of comes out of the community uh for", "tokens": [51408, + 370, 309, 311, 516, 281, 312, 1880, 281, 536, 437, 437, 437, 733, 295, 1487, 484, + 295, 264, 1768, 2232, 337, 51652], "temperature": 0.0, "avg_logprob": -0.14968310252274616, + "compression_ratio": 1.652542372881356, "no_speech_prob": 0.0013577057980000973}, + {"id": 509, "seek": 358084, "start": 3580.84, "end": 3586.6800000000003, "text": + " this year''s haystack it''s also very interesting to to see this dynamic or this + evolution because", "tokens": [50364, 341, 1064, 311, 4842, 372, 501, 309, 311, + 611, 588, 1880, 281, 281, 536, 341, 8546, 420, 341, 9303, 570, 50656], "temperature": + 0.0, "avg_logprob": -0.10982514279229301, "compression_ratio": 1.7717391304347827, + "no_speech_prob": 0.008731388486921787}, {"id": 510, "seek": 358084, "start": 3586.6800000000003, + "end": 3592.2000000000003, "text": " I think two or three years ago hybrid search + was at the top of charts everyone was discussing it", "tokens": [50656, 286, 519, + 732, 420, 1045, 924, 2057, 13051, 3164, 390, 412, 264, 1192, 295, 17767, 1518, 
390, + 10850, 309, 50932], "temperature": 0.0, "avg_logprob": -0.10982514279229301, "compression_ratio": + 1.7717391304347827, "no_speech_prob": 0.008731388486921787}, {"id": 511, "seek": + 358084, "start": 3592.2000000000003, "end": 3598.04, "text": " and then rags proceeded + it and that just tells me that maybe we didn''t dive deep enough into the", "tokens": + [50932, 293, 550, 367, 12109, 39053, 309, 293, 300, 445, 5112, 385, 300, 1310, 321, + 994, 380, 9192, 2452, 1547, 666, 264, 51224], "temperature": 0.0, "avg_logprob": + -0.10982514279229301, "compression_ratio": 1.7717391304347827, "no_speech_prob": + 0.008731388486921787}, {"id": 512, "seek": 358084, "start": 3598.04, "end": 3604.36, + "text": " topic which is let it go it passed and people thought oh that doesn''t + work we need we need something", "tokens": [51224, 4829, 597, 307, 718, 309, 352, + 309, 4678, 293, 561, 1194, 1954, 300, 1177, 380, 589, 321, 643, 321, 643, 746, 51540], + "temperature": 0.0, "avg_logprob": -0.10982514279229301, "compression_ratio": 1.7717391304347827, + "no_speech_prob": 0.008731388486921787}, {"id": 513, "seek": 358084, "start": 3604.36, + "end": 3610.76, "text": " else and now it''s rag rag rag everywhere but then rag + also comes with its own you know limitation", "tokens": [51540, 1646, 293, 586, + 309, 311, 17539, 17539, 17539, 5315, 457, 550, 17539, 611, 1487, 365, 1080, 1065, + 291, 458, 27432, 51860], "temperature": 0.0, "avg_logprob": -0.10982514279229301, + "compression_ratio": 1.7717391304347827, "no_speech_prob": 0.008731388486921787}, + {"id": 514, "seek": 361084, "start": 3611.2400000000002, "end": 3618.92, "text": + " uh and now we have a new you know evolution level right with hybrid search you + guys are", "tokens": [50384, 2232, 293, 586, 321, 362, 257, 777, 291, 458, 9303, + 1496, 558, 365, 13051, 3164, 291, 1074, 366, 50768], "temperature": 0.0, "avg_logprob": + -0.14187672024681455, "compression_ratio": 1.6854460093896713, "no_speech_prob": + 
0.0023195648100227118}, {"id": 515, "seek": 361084, "start": 3619.48, "end": 3624.44, + "text": " optimizing uh that''s amazing I was actually waiting for it I''m happy + to see it happen", "tokens": [50796, 40425, 2232, 300, 311, 2243, 286, 390, 767, + 3806, 337, 309, 286, 478, 2055, 281, 536, 309, 1051, 51044], "temperature": 0.0, + "avg_logprob": -0.14187672024681455, "compression_ratio": 1.6854460093896713, "no_speech_prob": + 0.0023195648100227118}, {"id": 516, "seek": 361084, "start": 3625.6400000000003, + "end": 3630.28, "text": " and thanks for sharing the story let''s come back to the + evolve when you guys are ready I also", "tokens": [51104, 293, 3231, 337, 5414, + 264, 1657, 718, 311, 808, 646, 281, 264, 16693, 562, 291, 1074, 366, 1919, 286, + 611, 51336], "temperature": 0.0, "avg_logprob": -0.14187672024681455, "compression_ratio": + 1.6854460093896713, "no_speech_prob": 0.0023195648100227118}, {"id": 517, "seek": + 361084, "start": 3630.28, "end": 3636.44, "text": " expect that you will publish + some blog posts around this topic about this topic so I''ll you", "tokens": [51336, + 2066, 300, 291, 486, 11374, 512, 6968, 12300, 926, 341, 4829, 466, 341, 4829, 370, + 286, 603, 291, 51644], "temperature": 0.0, "avg_logprob": -0.14187672024681455, + "compression_ratio": 1.6854460093896713, "no_speech_prob": 0.0023195648100227118}, + {"id": 518, "seek": 363644, "start": 3636.44, "end": 3640.68, "text": " know I''ll + be happy to promote those as well and read them of course and learn", "tokens": + [50364, 458, 286, 603, 312, 2055, 281, 9773, 729, 382, 731, 293, 1401, 552, 295, + 1164, 293, 1466, 50576], "temperature": 0.0, "avg_logprob": -0.2078845549602898, + "compression_ratio": 1.6859504132231404, "no_speech_prob": 0.015017562545835972}, + {"id": 519, "seek": 363644, "start": 3642.28, "end": 3647.88, "text": " terrific + yeah thank you thank you very much thanks for coming to the show today and sharing + this", "tokens": [50656, 20899, 1338, 1309, 291, 
1309, 291, 588, 709, 3231, 337, + 1348, 281, 264, 855, 965, 293, 5414, 341, 50936], "temperature": 0.0, "avg_logprob": + -0.2078845549602898, "compression_ratio": 1.6859504132231404, "no_speech_prob": + 0.015017562545835972}, {"id": 520, "seek": 363644, "start": 3647.88, "end": 3652.12, + "text": " story it''s very exciting and also showing the demo I like notebooks because + they love you", "tokens": [50936, 1657, 309, 311, 588, 4670, 293, 611, 4099, 264, + 10723, 286, 411, 43782, 570, 436, 959, 291, 51148], "temperature": 0.0, "avg_logprob": + -0.2078845549602898, "compression_ratio": 1.6859504132231404, "no_speech_prob": + 0.015017562545835972}, {"id": 521, "seek": 363644, "start": 3653.16, "end": 3657.8, + "text": " you know too quickly uh they say things and and have a feel of what''s + going on", "tokens": [51200, 291, 458, 886, 2661, 2232, 436, 584, 721, 293, 293, + 362, 257, 841, 295, 437, 311, 516, 322, 51432], "temperature": 0.0, "avg_logprob": + -0.2078845549602898, "compression_ratio": 1.6859504132231404, "no_speech_prob": + 0.015017562545835972}, {"id": 522, "seek": 363644, "start": 3658.68, "end": 3662.28, + "text": " I''m glad that Eric still uses the cup that we gave him as a gift", "tokens": + [51476, 286, 478, 5404, 300, 9336, 920, 4960, 264, 4414, 300, 321, 2729, 796, 382, + 257, 5306, 51656], "temperature": 0.0, "avg_logprob": -0.2078845549602898, "compression_ratio": + 1.6859504132231404, "no_speech_prob": 0.015017562545835972}, {"id": 523, "seek": + 366228, "start": 3662.36, "end": 3669.4, "text": " I love it it''s like like you + really see you really see the gift you gave like coming back and saying", "tokens": + [50368, 286, 959, 309, 309, 311, 411, 411, 291, 534, 536, 291, 534, 536, 264, 5306, + 291, 2729, 411, 1348, 646, 293, 1566, 50720], "temperature": 0.0, "avg_logprob": + -0.2811511484781901, "compression_ratio": 1.8245614035087718, "no_speech_prob": + 0.02362840808928013}, {"id": 524, "seek": 366228, "start": 3669.4, "end": 
3675.7200000000003, + "text": " you did this and the person you gave it to is still enjoying that''s amazing", + "tokens": [50720, 291, 630, 341, 293, 264, 954, 291, 2729, 309, 281, 307, 920, 9929, + 300, 311, 2243, 51036], "temperature": 0.0, "avg_logprob": -0.2811511484781901, + "compression_ratio": 1.8245614035087718, "no_speech_prob": 0.02362840808928013}, + {"id": 525, "seek": 366228, "start": 3675.7200000000003, "end": 3680.1200000000003, + "text": " so awesome to meet you thank you very much thank you thank you and Daniel + on", "tokens": [51036, 370, 3476, 281, 1677, 291, 1309, 291, 588, 709, 1309, 291, + 1309, 291, 293, 8033, 322, 51256], "temperature": 0.0, "avg_logprob": -0.2811511484781901, + "compression_ratio": 1.8245614035087718, "no_speech_prob": 0.02362840808928013}, + {"id": 526, "seek": 366228, "start": 3680.1200000000003, "end": 3685.0, "text": + " yeah thanks for having us yeah thank you take care bye bye", "tokens": [51256, + 1338, 3231, 337, 1419, 505, 1338, 1309, 291, 747, 1127, 6543, 6543, 51500], "temperature": + 0.0, "avg_logprob": -0.2811511484781901, "compression_ratio": 1.8245614035087718, + "no_speech_prob": 0.02362840808928013}]'
+---
+
+Hello there, Vector Podcast is back. Same season 3. I think we are about to wrap it up with a few final, really interesting episodes.
+Here I have the privilege to talk to the OpenSource Connections crew: Eric Pugh, whom you have seen in one of the previous episodes, and, you guessed it, Daniel Wrigley, joining us to discuss a really interesting topic on hybrid search and its optimization. Really, really excited to have you both on the show. Hello.
+Awesome. Awesome. So, as a tradition, we start with the intros. Eric everyone knows, but Eric, feel free to introduce yourself. I mean, great to be back, Dmitry.
+I'm actually a little late getting here because, as I was driving to the office, I realized that I forgot my mug, the one that you gave me the other year.
So I actually called Daniel, like, I'm going to be a little late, because I've got to go home and pick up the mug and bring it into the office.
+My wife keeps it and we use it when we go hiking, but I was like, I'm going to bring it into the office and show it off, since this is my second podcast to do with you, and the mug that you gave me two years ago, three years ago at this point. Yeah, probably three years. Works great, works great.
+So yeah, super excited to be back here and, you know, kind of talk about some of the work that we've been doing with the OpenSearch community. So exciting. Yes. And Daniel, welcome. Can you say a few words about yourself, your background? Absolutely, yeah, thanks. It's great to be here.
+I'm super excited, maybe a little nervous, but I'm sure it'll be fun. So I'm Daniel, I'm with OpenSource Connections. I started out as a search consultant back in May 2012.
+So almost 13 years now, and I'm here to share some of the experiences that we made in our most recent project together with the folks of OpenSearch, when it comes to hybrid search: how to optimize hybrid search, and also what's necessary to optimize hybrid search, namely query sets and judgments. But I'm sure we'll get into that in a couple of seconds.
+Yeah, thanks Daniel. I'm also nervous, but I also know that, you know, when I release the episodes, I enjoy them. It's just fun, really. So I was thinking, like, hybrid search, yeah, we did discuss it, and I think the community discusses it at large in various forums.
+Eric also reminded me of the episode with Alessandro Benedetti that we just did; it really was worth it.
+Yeah, I was really just curious, maybe step back from that topic a little bit and discuss the importance of hybrid search: what is it, in your own words, and where do you see value for it compared to how we used to do search before? You want to take it, Daniel, and then I'll follow up.
+Sure, yeah. So I think we see hybrid search, especially in this project, as, let's say, the process of blending traditional keyword search with, let's say, modern search approaches based on language models, mostly called either vector search or neural search. And I think you can group the benefits of it into two groups.
+Looking at the end user: we always want to provide the end users with the highest quality results, right? So search result quality is what we strive for, and traditional keyword search always lacks, let's say, finding related things that may not really contain the specific words but similar ones. Laptop and notebook is an example that I think we ran probably a million times in demos, maybe even more than a million times: if notebook is not in my product description, I will not find it when I search for laptop, and the other way around. And that's where, let's say, blending the two techniques really shines, because it enables you to not only find where your keywords are, but also find related stuff to augment the result set. And I think that with that large benefit, of course, come a lot of challenges, because it is always, let's say, non-trivial how to actually blend the traditional techniques and the more modern techniques. So that's where the challenge behind hybrid search actually lies.
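The blending Daniel describes, normalizing keyword and vector scores into a comparable range and then mixing them with a weight, can be sketched in a few lines. This is a hypothetical toy, not OpenSearch's implementation; the `hybrid_blend` helper, doc IDs, and scores are made up for illustration, using min-max normalization and a weighted arithmetic mean:

```python
# Toy sketch of hybrid-search blending (NOT the OpenSearch implementation):
# min-max normalize each score list, then take a weighted arithmetic mean.

def min_max(scores):
    """Normalize a {doc_id: score} map into the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_blend(bm25_scores, vector_scores, neural_weight=0.5):
    """Blend two {doc_id: score} maps; weights sum to one.

    Docs missing from one list contribute 0 from that side, which is how
    vector results can augment a keyword result set (e.g. a 'notebook'
    product surfacing for the query 'laptop')."""
    kw, vec = min_max(bm25_scores), min_max(vector_scores)
    docs = set(kw) | set(vec)
    blended = {
        d: (1 - neural_weight) * kw.get(d, 0.0) + neural_weight * vec.get(d, 0.0)
        for d in docs
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

# Unbounded BM25 scores on one side, cosine similarities on the other;
# the 'notebook' item has no keyword match at all.
bm25 = {"laptop-1": 12.4, "laptop-2": 9.1, "sleeve-1": 3.0}
vectors = {"laptop-1": 0.83, "notebook-1": 0.80, "sleeve-1": 0.31}
ranking = hybrid_blend(bm25, vectors, neural_weight=0.5)
print(ranking)
```

With a 50-50 weight the notebook product enters the blended ranking even though it scores zero on the keyword side, which is the laptop/notebook augmentation effect described above.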
+I mentioned two groups for which there are benefits. The end user: we want to provide the end user with the highest quality results, that's one group. The other group is, of course, us, the ones providing search applications. I mean, we somehow need to profit from providing better results, and it is always different depending on, let's say, which scenario, which industry we are working in. The most transparent one is always e-commerce: the easier the end user, the consumer, actually finds stuff in your online shop, the easier it is for them to buy stuff; if they buy more stuff more easily, of course, we generate more revenue, and that's kind of the benefit that comes with providing better search results. The other side is that we don't want to, let's say, manually tune systems indefinitely. Of course, I can go ahead and say laptop is synonymous to notebook, and PC is maybe a broader term of laptop, and rules like these, but that's kind of work that is never done if I have a changing catalog: old products get thrown out of the product catalog, new products arrive. So it's a never-ending challenge for me, and I don't want to, let's say, spend my workforce always manually hunting these rules and thinking about what the users meant when they search for something. I want something, let's say, intelligently looking for the right things in my index, and that's what the neural part of hybrid search enables. So I think these are maybe the two groups that benefit, and how these two groups benefit, from my perspective.
Yeah, that's a really good intro. Eric, you want to take it?
Yeah, I think it's an interesting journey that we've been on the last few years, and I sort of look at hybrid search as a little bit of a course correction, right? Keyword search has been around forever, well understood, frustrations well known. And then vectors came out, and all these new products, these new vector databases, everybody was really excited about them, and we all said, oh, okay, let's go use vectors, and we leapt on that and got really excited, built everything using vectors. And I think maybe we went too far that way over into vector land, and after we started getting some experience with vectors, we started realizing some of the problems with it, right? Like, it doesn't matter what you query, you're gonna get some search results, right? Sometimes zero search results is the right answer, right? You know, interesting challenges around, you know, faceting, or pagination, or highlighting can be weird, right? So I think that there are some definite challenges in vectors, and we all went over that way, and I think we've seen it in the last two years where all the vector databases were frantically adding keyword-like search, and all of the keyword search indexes were frantically adding vectors. Okay, now we have these things, so, like, where do we go? Oh, hybrid search, right? Hybrid search popped out. And, you know, hear me out: I think hybrid search is just good old federated search from the late 90s and 2000s, where you had two search engines, we sent out two queries, and then you brought them back, and you're like, how do I merge them together? And sometimes you'd do terrible things like two lists of results, right? Sometimes we would try to link them up together. It's the same idea: whether you're going to one search engine, making a keyword search and a neural search and bringing them together, or to two totally separate keyword search engines, you're still bringing back two lists. However, I think at least this time around, how to merge the lists of results together seems to be going better than when we did it back in federated search, right? And I look forward to talking more about some of the ways that we build our hybrid result set together. Part of me really kind of wonders why reciprocal rank fusion wasn't a thing the last time I did federated search back in the 2000s, right? Like, it doesn't seem like that crazy of a concept, so why didn't we do that, right? But we didn't. So I'm a little more optimistic about the value of it, but I think hybrid is a little bit of something old coming back, because we're back to the same problem: I literally have two search engines, two concepts for how to do information retrieval, and yet I want to blend it into one.
Yeah, that's an exciting topic. I think, to me, hybrid search opens doors beyond sort of what Daniel just explained. You know, the semantic connection between keywords and so on is where you go multi-modal, right? Of course, you need to go there carefully, probably, but if you are missing metadata on a particular, you know, image on the product, you could reason about it using the image itself, and maybe also video, because we have video LLMs as well; they're more expensive, of course, to run. But, you know, sky's the limit, so to say, if you want to go there. So I think, in that sense, hybrid search unlocks many more avenues to explore, including in e-commerce, right?
Yeah, yeah. I mean, I love that we are actually getting away from the old, just straight-up bag of words that was keyword search, which served us for a long time but still was just a very rough approximation of what people want, right? I mean, BM25, you know, people say it's not even the best algorithm, it's just as fast as the one that we use. Vectors are sort of this idea that there are richer ways of understanding user queries and the content, and just going beyond text, you know, it's absolutely wonderful, right? Lots of different things. I mean, at some point we'll do a vector search on usage patterns, right, to figure stuff out, right? Like, the mode will be activity, it won't be video or image or something, it'll be activity. You'd be like, oh yeah, that's the person I want to talk to, they have the same activities as me, based on whatever it is that they do, right? So those kinds of things definitely are expressed through the vectors. I do think that hybrid is an amazing thing for right now, for the next few years. I do think, though, it's also a little bit of a band-aid, in the sense that we're still leaning on keyword search for, you know, various use cases, and if we were to look 10 years out, I think an ideal solution is that we're not doing hybrid anymore; we just have a better approach to search, something beyond vector plus keyword, something better that still supports "zero results is the right answer", you know, some of these problems that using vectors gives us, right? We would have a better approach, and not this slightly band-aid "I have two different ways of searching and then have to wedge them back together". But for now, hybrid's exciting.
Yeah, I like that, I like where you're going. I also wonder if you saw that blog by Doug Turnbull.
+I will make sure to link it. He talks about RRF, you know, reciprocal rank fusion, and he shows on a, like, handcrafted example that, you know, if, let's say, neural search brings relevant results to the top while keyword search lags and doesn't, so it basically brings noise, then when you combine the two, you will end up having kind of half noise, half signal, and it will look terrible, right? So where do you stand on this? Like, if only there was a way, yeah, if only there was a way of actually understanding, not just blindly, you know, blindly matching things.
I'll hand it over to Daniel in just a moment, but I do want to call out that I really liked your previous episode with Alessandro, where, I can't remember if it was you or Alessandro, but I think it was you, Dmitry, who said that your engineers were looking at hybrid search, and they kind of looked at it and said: when you strip away the fancy words like reciprocal rank fusion for blending things together, that's just round robin, right? And, you know, it's not just round robin, it's blind round robin, right?
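The reciprocal rank fusion being discussed can be sketched in a few lines. This is a hypothetical toy (the `rrf` helper and document IDs are made up, and k = 60 is the constant from the original RRF paper): note that it consumes only rank positions, never relevance scores, which is exactly the "blindness" described here.

```python
# Toy sketch of reciprocal rank fusion (RRF): each input list contributes
# 1 / (k + rank) per document, using rank positions only -- no scores,
# no notion of whether a result is actually good or bad.

def rrf(result_lists, k=60):
    """Fuse ranked lists of doc IDs (best first) into one ranking."""
    fused = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

keyword_hits = ["d3", "d1", "d7"]   # ranked output of a keyword query
vector_hits = ["d1", "d9", "d3"]    # ranked output of a vector query
print(rrf([keyword_hits, vector_hits]))
```

Documents appearing high in both lists ("d1") float to the top, but a noisy list is weighed exactly like a good one, which is the half-noise, half-signal failure mode mentioned above.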
It's not like round robin in your middle school, when you had to pick teams for dodgeball, right? The people picking knew who the best players were, so at least you were divvying up the best choices, and at the very end, those last two kids, you know, you knew they were the worst choices; they were the noise in the search results, right? But that round robin at least had the benefit of knowing what was good. Reciprocal rank fusion has no sense of whether those results are good or bad, right? It is literally blindly picking them in some order, with no sense of what that is. And as you can imagine, blindly picking is potentially going to leave you with a very weak dodgeball team, right? And yet that's what we think of as state of the art.
Yeah. So, Daniel, what should we do in this case? Is there any solution?
It's a good segue into what we actually tried and explored and experimented with. So in our most recent work, we tried to come up with a systematic approach to optimize hybrid search, specifically in OpenSearch. In OpenSearch, right now, you have linear combination techniques at hand. That means you have two normalization techniques you can choose from, the L2 norm and the min-max norm. They are basically both there so that you can normalize the scores from keyword search into, let's say, the space of vector search, so that you can compare apples to apples, more or less, and not apples to oranges. Because, as we all know, BM25 scores, especially if you have, like, weird field weights, are unbounded: they can be in the dozens, the hundreds, the thousands, so you don't really know upfront what range you are operating in, and you also can't really compare the scores from one query to another query. That makes it really difficult to combine keyword search scores with any other, let's say, search mechanism. Together with these normalization techniques, the L2 norm and the min-max norm, you have three combination techniques at hand, and that's basically just three different means: you can apply the arithmetic mean, the harmonic mean and the geometric mean. So that leaves you with two by three, so that's already six parameter combinations that you can try out. And then you can define weights: how much neural search weight, how much keyword search weight do I want to have in my query? They always add up to one, so you can say, I want to go with 10% keyword, 90% neural, or 50-50. Thinking of, let's say, 11 of these weights, so maybe you start with zero keyword and a hundred percent neural, then 10% and 90%, and so on and so forth, that gives you a range of 11, which, multiplied by the six parameter combinations we already had, gives us, let's say, a solution area to explore of 66 different combinations, which is pretty manageable.
So we defined optimizing hybrid search as a parameter optimization problem, and we picked the most straightforward approach you can pick: we just tried out all the different combinations, calculated search metrics based on judgments, and then we just had a look at which one is the best combination. For our experiments, we used the ESCI dataset. It was released by Amazon a couple of, I think, 18 months ago or something like that, as part of a competition. This dataset comes with queries, comes with products, and, most importantly, it comes with judgments. So we basically have everything that we need to really try out different parameter combinations, see how they work, what results are retrieved, calculate a couple of metrics, compare these, and then see which one is the best parameter combination. And that's what we call the global hybrid search optimizer: we try to identify the best parameter combination globally for all the queries that we are looking at, in a certain defined subset of queries. So that's kind of the first step, the very, very straightforward approach that we applied. That's not really something, let's say, scientifically sophisticated; it was just a very brute-force approach to see what's in there, and also to learn how results may be shaped or turn out differently when we increase the neural search weight or increase the keyword search weight, which normalization-combination technique is usually the one that's best to retrieve the results, and so on and so forth.
We started out with what I call a reasonable baseline: searching across, I think, five or six fields, so title, category, color, brand and description, some bullet points. E-commerce dataset, like, pretty basic stuff. And we calculated our metrics with that baseline. I would call it probably not the best baseline you can come up with, but a reasonable baseline. We didn't want to, let's say, just create the weakest baseline, because that's not really difficult to, let's say, outperform. So we wanted to create a reasonable baseline without putting, let's say, a man-year into finding out what the best baseline is, and that's okay, right? We got decent results out of that, and then we ran this global hybrid search optimizer, and that outperformed the baseline already across the metrics that we had a look at: better NDCG, better DCG, better precision at 10; these were the three metrics that we had a look at. And that was nice to see, because it already gave us, let's say, assurance that there is a straightforward approach that everyone can use, because it's really easily applicable and it gets you good results, and it also gives you assurance that there is something to neural search when switching from a keyword-based search engine or search application to a hybrid search application.
But, as always when you apply something globally, there are winners and there are losers. Some of the queries really improved by this hybrid search optimization step, the global one, but others didn't. So we took this one step further and thought about how we can really create a process that dynamically, per query, predicts what the best parameter set is. And that now is also going in the direction that Doug mentions in his blog post, right? So that's kind of a query understanding approach to hybrid search. We're not just blindly applying one parameter combination that we identified on a thousand queries that we explored; we are taking one query, analyzing this one query, and then saying, based on a variety of experiments that we made, what is, for this individual query, the best parameter combination that we can now apply. So we are not really globally applying something, but individually, dynamically, per query.
And to maybe already give you the results of what we did, before going into detail on how we did it: the dynamic approach outperformed the global approach. We managed to identify a set of features, we trained a model, or multiple models actually, and by applying this we were able to predict the best neuralness, in that case, or the best neural search weight, for a given query, based on the results we got from the global hybrid search optimizer. So we basically recycled all the different search metrics on a per-query basis that we got, did some feature engineering, trained models, and then used these models to predict what the best neural search weight for this query is. And with this dynamic approach we even saw increases of up to 10% in one of the three metrics that I just mentioned: NDCG, DCG and precision at 10.
Yeah, that's very exciting. Thanks for sharing this whole, you know, end-to-end picture, pipeline. I'm particularly interested, at least at this point in time, in the fact that, well, first of all, your dynamic approach outperformed the global one, right? And that seems to be thanks to that query understanding part, right? Can you talk a bit more about that? And also, did you check those predictions manually? For example, does it make intuitive sense what the system picked, that for that specific query it picked more neuralness? I mean, is it
like, is it like a natural language question there, or some remnants of it, or are there some other interesting findings that you could share, possibly?
Oh yeah. So let's first maybe outline what we did exactly, and then dive into a couple of observations that we made on the way. We started out by creating what we call feature groups, and then we created features for these feature groups. We looked at three different feature groups: one was the query feature group, the next one was the keyword search result feature group, and then the semantic search result feature group.
For the query feature group, we had a look at the length of the query, the number of query terms, whether the query has numbers in it, and whether the query has any special characters in it. So we kind of thought of ways of figuring out when a query is maybe more specific and when it is more broad, a narrow query, and then we would just come up with rules like: well, the longer a query is, the more specific it is, and maybe if we have more specific queries we have fewer results, and that's where we may want to augment search results with neural search results. On the other hand, when we have a very broad query, we may have a lot of results; these are short queries, and then we may want to, let's say, only use organic, traditional keyword search results. Yeah, so we just came up with a couple of assumptions on our side, and then with these four features.
For the keyword search result feature group, we looked at the number of search results, the number of hits we got when we executed the query with our baseline search configuration, so the one searching in the six fields, together with something like: hey, if we have zero results, then this is maybe a perfect scenario for neural search, because then we want to augment zero keyword results with what comes from the vector search application. The other two features we had in this group were the best title score we had in the keyword search results, so if we have a strong title match, maybe that's an indication that we don't need as much neural search; and we also had a look at, I think, the average title score in the top 10 as another one, so if we have, like, a high average in the title scores, that's maybe a good sign that no augmentation with neural search results is needed. For semantic search, it was similar to the title score: we had a look at the best title score and the average semantic similarity based on the title that we had indexed. So by looking at these three groups, we thought: well, we now have a representation of the query on its own, the result set based on keywords, and the result set based on neural search. And that was kind of our starting point.
Then we did loads of experiments having to do with: what's the best feature combination when we train a linear regression model or a random forest regression model? What role does regularization play? Can we optimize the model training aspect with that? So we really did a lot of iterations with these. We also had a look at a large query set versus a smaller query set, to see if that also provides different aspects to it, if we just randomly sample 500 queries versus 5,000 queries. So we did a lot of exploration to really make sure that we are not, let's say, randomly receiving the uplift that we saw, but actually making sure that there is something to it, and that we can go out into the wild, for example on this podcast, and share our observations, and be kind of on the safe side that they can be reproduced, hopefully, in other scenarios as well. So that's kind of the "how did we do it": the features, the feature engineering, how we trained our models.
We had a look at linear regression models and random forest regression as a starting point, because we thought: let's have a look at simple models first, and if that works, we can still have a look at the more complex ones. And that's maybe already the first observation that I can share: linear regression models, the simplest form; random forest regression, a slightly more complex form; and then, in the last model iteration that we did last week, we also had a look at gradient boosting methods. And interestingly, they were all almost the same from the model performance perspective. So it wasn't like the most complex ones really give you the best results, and that's kind of a very reassuring feeling, because we need to calculate a couple of features per query, right, and that adds latency to your query execution, and especially in e-commerce, where every millisecond basically counts, we don't really want to, let's say, run multiple queries to calculate our features just to have, like, a 0.3 percent performance increase. It really has to be worth the effort. So that's kind of a nice observation: we don't have to go for the most complex model architectures, we can stick with the simple ones and not really lose a lot of performance, if any. The linear regression model and the random forest regression model actually scored absolutely equally when calculating the search metrics; they just predicted the NDCG scores slightly differently. So that's how we did it: we predicted NDCG scores by adding neuralness as a tenth feature, in that case, and by looking at which neuralness scored best, and that's the neuralness we then went with for the testing efforts afterwards.
So that's kind of the first interesting observation that we made. We also had a look at different feature groups: what happens to model performance if we focus only on query features, or only on keyword search result features, so training models within one feature group only and not taking all features into account? The interesting part here was that the combinations worked best, so not always the combination of all nine features together with the neuralness feature, but at least some of the keyword search result features, some of the semantic search result features and some of the query features, so
they were best. But looking at the query features only, and these are the simplest ones to calculate, they weren't far off these models from the performance aspect, so you wouldn't really lose a lot if you only chose query features for your predictions. That was another nice observation: if you went with the highest performance in terms of, let's say, inference speed and also the speed of calculating the features beforehand, you don't lose a lot of search result quality. So again, you don't have to go with the most complex approach to get reasonable results, which was, I think, the second most important finding, at least from my perspective, because it gives us the assurance that when putting this into production we don't add, let's say, hundreds of milliseconds to your query latency if you stick to the simple features. Maybe it means that there's room for growth with this technique, right? We're not maxing out this technique just to get started; we can start out, and then as we get more sophisticated we have room, we have milliseconds to burn to do other cool, interesting things, ask an LLM to characterize the query or something like that. We've got room to grow. But I also like this lesson, and I think it resonates with what I've seen doing ML previously: start with simpler solutions and try to maximize ROI before upgrading to a more complex one. And you need to set some thresholds, because as you said, Daniel, just adding 0.03 won't cut it, it's not worth it, because when you bring in neural search it means you need to build that parallel index of things, right? You need to compute, maybe GPUs as well, and someone will need to pay for it. And I guess in passing, while we were doing this project, I kept asking Daniel: why is re-indexing with embeddings so slow? Where's my turbo button? Really, why is this still a problem? It's 2024, almost 2025, why do embeddings take so long? Yeah, I remain a little confused why that's so. Don't we just turn a knob and make a GPU go faster, and then re-indexing with embeddings is just the same speed as re-indexing with keywords? Yeah, but also, the fascinating part, one thought crossed my mind as you explained it, Daniel, is that in some sense you've built some sort of reasoning engine, if I may call it that way. Maybe it's not fully reasoning the way LLMs start to do it, but it's an engine that looks at the query, examines its features, makes some conclusions, and then also looks at the results. It's not like you just understood the query, sent it over to the retriever side, and then hoped for the best that there would be the best results, right? In some sense you basically do this dynamic sort of reasoning on top of everything. But the lesson there, you said, and correct me if I'm wrong, is that just by looking at the query features you could already achieve good results, you don't need to look at the result features? Yes, yes, but wouldn't it be nice to look at those? Yeah, I do love the idea of looking at both sides. We tend to focus on queries because I think that's the viewpoint of our industry; we are very query-centric in the search world, it's all about the query and what we can get out of the query. We really don't look at the results that much, except to say are they good or bad, and
we're not particularly good about factoring what the user did back into our algorithm. And yeah, I love the dynamic thing that we're doing here; I think it's a pointer to bringing more dynamic aspects into our algorithms, where they can actually start evolving, or changing, or being very specific to very specific query types, use cases, time of year, right? Today that's very difficult to do; only the most sophisticated teams have sets of algorithms. Yeah, but I also like what you said, Eric: looking at results, reasoning about results, and also about what you understood about the query, might lead to a much better final representation of what you show to the user, because there are so many factors beyond the query and results, right? As you said, season, or patterns you observed with the user, the recent purchase history, and so on and so forth. Yeah, it's very fascinating, and if I continue to draw this analogy with the LLM world: when you ask an LLM to think through what it has done, it may correct itself just by looking at what it has produced, because LLMs are, as someone said, calculators for words, so if you give it its own output and ask... Yeah, exactly. I can't wait to write a search algorithm that understands what it did the last time, that the user didn't like the results, so when you get a similar query from the same user, do something new, try something new, because whatever you were doing before, the user didn't like. Yeah, there's a joke that if the user hates what you're giving them, you might as well just return random docs, because that'll be better than whatever you're doing right now. Yeah, at least you have a chance there with the random ones. So one question I sort of have, though: in what we described, how is it different from learning to rank, other than learning to rank being about ranking one list and here we're ranking two lists? Do I just conceptually have the role of learning to rank wrong, between what learning to rank is and how the dynamic hybrid optimizer works? So, I mean, we are not re-ranking results, right? That's what learning to rank typically does. What we are doing is learning when to, let's say, increase the weight on keyword search results or on neural search results, so it's kind of a learn-to-blend, a learn-to-search. Is that the new terminology, or are we just done with the "learning to" nomenclature? And do we like "optimizer" better than "learning to blend", maybe? Yeah, I'm not the one who's most creative in coming up with clever names, so maybe it's time for not "learn to" but "blah blah blah optimizer", and that's kind of how we ended up with the hybrid search optimizer. But I wouldn't really have a good argument against calling it "learning to optimize hybrid search" or something like that, because that's what the dynamic approach does: as we gather more data, more clicks, those go into the features. We even use the language of learning to rank, right? Feature engineering, we use that language, and we're building a model, and you even mention linear models and a forest, and those are all the words that I think of as, oh, it's learning to rank. So interesting; it's interesting to see learning to rank maybe come back in a new way. Yeah, and learning to rank is still something you can apply on top of the hybrid search optimizer, so it's not like we have any kind of substitute here; it's still, I think, a very valuable tool in the mix, and this is just now one way to really figure out the best way of getting to reasonable hybrid search results. Yeah, but I was recently also thinking about this, and I wonder what's your hunch on that: learning to rank sort of depends on the training data, and you usually collect it from the past, you don't collect it from
the future, right? And so as you move into the future and patterns change, you carry over that past weight, and that can actually go against the intent of your reasoning engine. That's where I think a lot of work needs to go in all of these directions: as you optimize your retrieval and your reasoning engine, your query understanding, maybe you should dial back the LTR a little bit, or maybe you need to retrain it right then, I don't know, or retrain frequently enough so that you don't lose those strengths. Yeah, I think that's our challenge in a lot of these things: the historical approaches versus the predictive approaches, which ones do you go with, and how do you discount the historical if you have a bunch of new, interesting data? Yeah, but I also like the limitations of the physical world. From investment books I've read, one key takeaway lesson is that no one can predict the future; if someone claims that they can, they probably lie. But again, I guess there is still room for being more dynamic. Is there something you guys want to also show? Is this something we can look at visually? Well, theoretically we can. No pressure. So, this small demo here, and I'm going to show the results first and then how we get to these results, basically takes in a user query. My search application now is this Jupyter notebook, so it's not the most sophisticated search application, but it calculates the query features, then with these query features reaches out to the model to get the best neuralness, and with that retrieved response the query is built together and sent to OpenSearch. We're just going to have a look at a couple of examples first, and then we can have a look at the code. Again, this is part of the ESCI dataset; my index has around 20,000 documents in it, so it's not large, it's only a subset of the ESCI data. When we send queries now, in this case "waterproof jacket", the method first, as I just explained, calls out to the model, retrieves the neuralness score, and then builds the query. Then we have this HTML display here; as you can see, there are not images available for all of the products, but what we can see is that for a waterproof, sorry, weatherproof jacket, it gets a 50-50 search weighting, in this case 50% keyword and 50% neural. If we go for "weatherproof jacket for women", the weights change: now we have 90% neural and only 10% keyword search weight. And that's because the query became much more specific, meaning that since we added "women" there, we are not expecting results for men or for kids, right? Is that it? Exactly, that's what we can infer about what the model picks up here: we have a longer query, a more specific query. The model is not really looking at the words, I'd say, but at the features, like query length, are there numbers in it, are there any special characters in it, and so on. Another one, "weatherproof jacket black", and we also see maybe some results in the top here that we wouldn't really expect, but again, it's only a smallish proof of concept that we are looking at. Still, we can see that queries that are similar from, let's say, a meaning standpoint retrieve different weights in that case, and that's the interesting thing. And we can go for something completely different as well: "iphone case", and we see nice iPhone cases throughout, and that goes with 0.7 neural and 0.3 keyword. And, I don't know, "iphone 15 pro max case black", that would be a very, very specific query, and here again the neural search weight increases, whereas when we go for a very broad query it decreases. So that's maybe one characteristic of the model that you can almost feel: the more specific we get, the more neural weight it gets, but other features do play a role as well.
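The flow just described, compute cheap query features, ask a regression model which blend weight ("neuralness") maximizes predicted NDCG, and derive the keyword weight from it, can be sketched roughly as below. This is a minimal sketch with made-up synthetic training data and an illustrative feature list; the feature names, the linear model, and the candidate grid are assumptions, not the project's actual code:

```python
import numpy as np

def query_features(query: str) -> list[float]:
    # Cheap, query-only features of the kind mentioned above
    # (an illustrative selection, not the project's exact list).
    return [
        float(len(query)),                                         # length in characters
        float(len(query.split())),                                 # number of tokens
        float(any(c.isdigit() for c in query)),                    # contains numbers?
        float(any(not c.isalnum() and c != " " for c in query)),   # special characters?
    ]

# Synthetic training data: for each observed (features, neuralness) pair we
# pretend an NDCG was measured; here more specific (longer) queries do better
# with a higher neural weight, mimicking the pattern seen in the demo.
rng = np.random.default_rng(42)
n = 2000
feats = rng.random((n, 4)) * [30, 6, 1, 1]       # random feature vectors
tried = rng.random((n, 1))                       # neuralness value that was tried
specificity = feats[:, 1] / 6.0                  # token count as a crude proxy
ndcg = 1.0 - np.abs(specificity - tried[:, 0]) + 0.05 * rng.standard_normal(n)

# Linear regression (least squares) over the features, the tried neuralness,
# an interaction term, and an intercept column.
X = np.hstack([feats, tried, tried * feats[:, [1]], np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, ndcg, rcond=None)

def blend_weights(query: str) -> tuple[float, float]:
    """Predict NDCG for candidate neuralness values 0.0..1.0 and return
    (keyword_weight, neural_weight) for the best-scoring candidate."""
    f = query_features(query)
    candidates = np.linspace(0.0, 1.0, 11)
    rows = np.column_stack([np.tile(f, (len(candidates), 1)),
                            candidates, candidates * f[1],
                            np.ones(len(candidates))])
    neural = float(candidates[np.argmax(rows @ coef)])
    return round(1.0 - neural, 1), round(neural, 1)
```

With the synthetic data above, a longer, more specific query receives a higher neural weight than a short, broad one, which is the qualitative behavior the demo shows; the real project predicts this from richer query and result-set features.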
Yeah, that's very interesting. And this is how the OpenSearch query looks. The interesting part is this one here: we have a keyword query, which is, as I explained before, searching in these couple of fields with different field weights, a best-fields multi-match query with the AND operator, and then we have a neural query that retrieves the top 100 based on the title embedding that we have. The hybrid part is actually the search pipeline that normalizes, in this case with the L2 norm, and combines the results with the arithmetic mean based on the keyword search weight and neural search weight, which are passed in here as variables that another method predicts with model inference. Yeah, that's the small prototype built in a Jupyter notebook. Everything we built is, as Eric just mentioned, open source; we have a public repository that contains everything, well, all but this one notebook actually, but everything you need to train the models, do the feature engineering, calculate the search metrics. So running this with the ESCI dataset is possible for everyone, and if you want to apply it to your own data, that is of course also possible. That's what we are looking at next: adoption in the industry, and also hooking it up with the other part of this project, namely, let's call it the evaluation part, calculating implicit judgments based on user feedback, so clicks, queries, stuff like that, so that we not only enable everyone to optimize hybrid search but also empower everyone to come up with judgments if you don't have any, because that's the basis you need for any of this. Yeah, and that's where Quepid comes in. Shameless plug.
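The query just described, a best-fields multi_match with the AND operator alongside a neural query over the title embedding, wrapped in a hybrid query, has roughly the shape below. This is an illustrative sketch: the field names, boosts, and `<model-id>` are placeholders, not the project's actual configuration:

```python
def hybrid_query(query_text: str, k: int = 100) -> dict:
    # Illustrative OpenSearch hybrid query body; "title"/"description",
    # the ^3 boost, and "<model-id>" are placeholder assumptions.
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {   # keyword part: best-fields multi_match with AND operator
                        "multi_match": {
                            "query": query_text,
                            "type": "best_fields",
                            "operator": "and",
                            "fields": ["title^3", "description"],
                        }
                    },
                    {   # neural part: top-k retrieval over the title embedding
                        "neural": {
                            "title_embedding": {
                                "query_text": query_text,
                                "model_id": "<model-id>",
                                "k": k,
                            }
                        }
                    },
                ]
            }
        }
    }
```

Note that the keyword/neural blend weights do not live in this body; they are applied by the search pipeline that normalizes and combines the two result lists.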
So, one of the things: we actually have a reference implementation. Some of you may have heard of Chorus, which is a reference implementation for e-commerce search. We did it in Solr, and we have an OpenSearch version, and some of the stuff you're seeing is sort of bleeding edge, hot off the presses, but we're working right now on getting that Chorus for OpenSearch edition updated with some of these scripts and notebooks. So you can just check it out, run the quick start, and then have everything and start playing with it, so you don't have to build all the steps yourself and you can see how all the pieces fit together. And that's available; it would be great if we can add a link in the show notes for that as well, Dmitry. Yeah, let's do that. And I just want to also understand this search pipeline and all the mechanics of hybrid search that you guys implemented: is it like a plugin to OpenSearch, and what's the plan for it? I guess you spoke to that, like let's give it to as many users as possible; what's your idea there?
So hybrid search is available in OpenSearch in, let's say, its basic shape: you can create a pipeline and say 70% keyword search weight and 30% neural search weight, and you can also define these on the fly. But we currently have the limitation that, although we can hook up the model within a so-called ML inference pipeline in OpenSearch, this ML inference pipeline can as of now not pass the predicted neural and keyword search weights to the search pipeline. A feature request is already out there, and I assume that in one of the next OpenSearch versions we will have the possibility to not only hook up the model within OpenSearch, which is already possible, so that from within OpenSearch you call out to the model to make inference, but also to retrieve the predicted neural and keyword search weights and then use these in your search pipeline. So there is already an implementation plan out there, there are open feature requests, and if anyone wants to give these a thumbs up to prioritize them within the OpenSearch community, that would of course be greatly appreciated, and I'm sure we can include these GitHub issues in the show notes as well. Yeah, for sure, let's do that, and we will call that out: to the community, please vote if you care; I hope there will be enough people who care about this. Yeah, exactly. And then, you said everything is open source; does that mean the training scripts, the algorithms, the choices you can make there, are also open source, and we can link to that as well? Yes, so everything. I mean, we of course didn't include all the thousand experiments, but at least all the helpers that we used to run these thousand experiments are in the repository, out there for everyone to have a look at and maybe come up with even better ideas than we had; we definitely always love to hear those as well.
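The pipeline-with-weights setup just described, L2 score normalization followed by a weighted arithmetic mean over the keyword and neural result lists, is configured on an OpenSearch search pipeline with a body of roughly this shape. The exact weights here are illustrative:

```python
def hybrid_pipeline_body(keyword_weight: float, neural_weight: float) -> dict:
    # Illustrative OpenSearch search-pipeline body: L2 score normalization,
    # then a weighted arithmetic mean combining the two result lists.
    # Weight order matches the order of sub-queries in the hybrid query.
    assert abs(keyword_weight + neural_weight - 1.0) < 1e-9, "weights should sum to 1"
    return {
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "l2"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [keyword_weight, neural_weight]},
                    },
                }
            }
        ]
    }
```

Such a body would be PUT to `/_search/pipeline/<name>` and referenced at query time via the `search_pipeline` request parameter; the limitation discussed above is precisely that the ML inference pipeline cannot yet feed the predicted weights into this processor automatically.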
Wow, this is amazing. In a true sense you live up to your name, Open Source Connections. To me it's a ton of work that you could choose to hide as well, right, and work only with your clients and nurture and iron it out and make a ton of money and continue on that, but you choose to open source it because you believe in the power of the community to enhance it. That's amazing. Well said. What's next? You already said a couple of words, but what's next, and also, do you want to address our audience, what do you expect from them? I think one of the things is, as we've gotten into this, we've found some rough spots in OpenSearch. OpenSearch has a strong ML component, the ML Commons project, but integrating it in sort of new and interesting ways, like what Daniel was showing, we're finding some rough spots. It is interesting to me; it does bring up the question: do search engines, what we call a search engine, need to evolve to be more of an ML engine as well? It feels to me like search has been revolutionized by machine learning, and as we move in this direction of more calculating, building models, evaluating data on the fly, do our search engines need to support those use cases and go beyond just the "I get a query, I get documents, I match them up, and that's it"? Is there another layer of computation that we kind of need in the search engine, versus bolting it on in some other environment with an MLOps pipeline and all the rest? And I think that's interesting. One place where OpenSearch is definitely breaking some interesting ground is all the machine learning aspects of it, but data processing and building models and all of that maybe needs to be a first-class citizen of what we consider a search engine, versus something done by some other system elsewhere, because that's a lot more complexity and raises the barrier to adopting these things. So I look forward to things like a hybrid optimizer just sort of being what you do when you build your search engine: of course you turn on the hybrid optimizer if it meets your use case and you have the judgments and other features that you need, versus, oh, a major engineering project that's going to take us six months. And, supporting that, as Daniel highlighted, the search quality evaluation framework that we're adding to OpenSearch is really exciting; we'd love to come back, Dmitry, and talk all about that on another show. Yeah, let's do that. I'm really excited to dive deeper into eval, because I think in so many ways you need to start with eval, especially if you already have a search engine out there, to establish that baseline for yourself and then learn and introspect where things work or fail. Yeah, and make sure some of these models don't go and produce terrible, batshit-crazy results, right? We had the risk that boosting up on neuralness might be terrible; we have to understand that, and so we need to be much better about evaluation. We can't take our eye off the ball of speed, query speed does remain top of mind, but we also really need to be good at evaluation, and I think for a lot of search teams that's kind of a new thing. Yes, absolutely, so the tooling, and also search agents. One other thing I want to shout out is that the next Haystack conference, save the date, is the last week in April; the save-the-date went out last week, and we are looking for talk reviewers, people who want to review the talk proposals. It's a double-blind process, and we get a couple of people from the community, so if that's something you're interested in, reach out to David Fisher, he's running that process, and the call for proposals will be out. I'm curious if this year Haystack in Charlottesville might as well just be called Hybrid Stack: are we going to have two days of talking about hybrid? Because RAG, you know, that's last year; now we're on to hybrid, or is there going to be something new? It's going to be interesting to see what comes out of the community for this year's Haystack. It's also very interesting to see this dynamic, this evolution, because I think two or three years ago hybrid search was at the top of the charts, everyone was discussing it, and then RAG superseded it, and that just tells me that maybe we didn't dive deep enough into the topic: we let it go, it passed, and people thought, oh, that doesn't work, we need something else, and now it's RAG, RAG, RAG everywhere. But then RAG also comes with its own limitations, and now we have a new evolution level with hybrid search; you guys are optimizing it, which is amazing, I was actually waiting for it, I'm happy to see it happen, and thanks for sharing the story. Let's come back to the eval topic when you guys are ready. I also expect that you will publish some blog posts about this topic, so I'll be happy to promote those as well, and read them of course, and learn. Terrific, yeah, thank you very much. Thanks for coming to the show today and sharing this story, it's very exciting, and also for showing the demo; I like notebooks because they allow you to quickly try things and have a feel of what's going on. I'm glad that Eric still uses the cup that we gave him as a gift, I love it; you really see the gift you gave coming back, and the person you gave it to is still enjoying it, that's amazing. So awesome to meet you, thank you very much. Thank you, thank you. And Daniel? Yeah, thanks for having us. Yeah, thank you, take care, bye bye \ No newline at end of file diff --git
a/transcripts_with_timestamps/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md b/transcripts_with_timestamps/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md new file mode 100644 index 0000000..4da2bc7 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/amin-ahmad-cto-vectara-algolia-elasticsearch-like-search-product-on-neural-search-principles.md @@ -0,0 +1,4359 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=e2tZ6HD4I44

Update: ZIR.AI has relaunched as Vectara: https://vectara.com/

Topics:

00:00 Intro

00:54 Amin’s background at Google Research and affinity to NLP and vector search field

05:28 Main focus areas of ZIR.AI in neural search

07:26 Does the company offer neural network training to clients? Other support provided with ranking and document format conversions

08:51 Usage of open source vs developing own tech

10:17 The core of ZIR.AI product

14:36 API support, communication protocols and P95/P99 SLAs, dedicated pools of encoders

17:13 Speeding up single node / single customer throughput and the challenge of productionizing off-the-shelf models, like BERT

23:01 Distilling transformer models and why it can be out of reach of smaller companies

25:07 Techniques for data augmentation from Amin’s and Dmitry’s practice (key search term: margin loss)

30:03 Vector search algorithms used in ZIR.AI and the need for boolean logic in company’s client base

33:51 Dynamics of open source in vector search space and cloud players: Google, Amazon, Microsoft

36:03 Implementing a multilingual search with BM25 vs neural search and impact on business

38:56 Is vector search a hype similar to big data a few years ago? Prediction for vector search algorithms’ influence on relational databases

43:09 Is there a need to combine BM25 with neural search? Ideas from Amin and features offered in ZIR.AI product

51:31 Increasing the robustness of search — or simply making it work

55:10 How will the Search Engineer profession change with neural search in the game?

Get a $100 discount (first month free) for a 50mb plan, using the code VectorPodcast (no lock-in, you can cancel any time): https://zir-ai.com/signup/user

' +image_url: https://media.rss.com/vector-podcast/20220216_040237_4d74468969220e3376998953833bb185.jpg +pub_date: Wed, 16 Feb 2022 16:14:37 GMT +title: Amin Ahmad - CTO, Vectara - Algolia / Elasticsearch-like search product on + neural search principles +url: https://rss.com/podcasts/vector-podcast/393967 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 21.080000000000002, + "text": " Hello, vector podcast is here and today we''re going to be talking with + Amin Ahmed, co-founder", "tokens": [50364, 2425, 11, 8062, 7367, 307, 510, 293, + 965, 321, 434, 516, 281, 312, 1417, 365, 2012, 259, 39189, 11, 598, 12, 33348, 51418], + "temperature": 0.0, "avg_logprob": -0.3554973602294922, "compression_ratio": 1.1293103448275863, + "no_speech_prob": 0.09521778672933578}, {"id": 1, "seek": 0, "start": 21.080000000000002, + "end": 24.16, "text": " and CEO of the company called ZIR AI.", "tokens": [51418, + 293, 9282, 295, 264, 2237, 1219, 1176, 7740, 7318, 13, 51572], "temperature": 0.0, + "avg_logprob": -0.3554973602294922, "compression_ratio": 1.1293103448275863, "no_speech_prob": + 0.09521778672933578}, {"id": 2, "seek": 2416, "start": 24.16, "end": 30.2, "text": + " I''m really, really excited to talk to Amin because basically he''s innovating + in this space,", "tokens": [50364, 286, 478, 534, 11, 534, 2919, 281, 751, 281, + 2012, 259, 570, 1936, 415, 311, 5083, 990, 294, 341, 1901, 11, 50666], "temperature": + 0.0, "avg_logprob": -0.24888961335532686, "compression_ratio": 1.644, "no_speech_prob": + 0.18610267341136932}, {"id": 3, "seek": 2416, "start": 30.2, "end": 34.56, "text": + " his company is innovating in this space of bringing vector search to practice + and also", "tokens": [50666, 702, 2237, 307, 5083, 990, 294, 341, 1901, 295, 5062, + 8062, 3164, 281, 3124, 293, 611, 50884], "temperature": 0.0, "avg_logprob": -0.24888961335532686, + "compression_ratio": 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 4, "seek": + 2416, "start": 34.56, 
"end": 36.36, "text": " making it usable.", "tokens": [50884, + 1455, 309, 29975, 13, 50974], "temperature": 0.0, "avg_logprob": -0.24888961335532686, + "compression_ratio": 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 5, "seek": + 2416, "start": 36.36, "end": 38.16, "text": " Hey, I mean, how are you?", "tokens": + [50974, 1911, 11, 286, 914, 11, 577, 366, 291, 30, 51064], "temperature": 0.0, "avg_logprob": + -0.24888961335532686, "compression_ratio": 1.644, "no_speech_prob": 0.18610267341136932}, + {"id": 6, "seek": 2416, "start": 38.16, "end": 39.96, "text": " I''m doing fine.", + "tokens": [51064, 286, 478, 884, 2489, 13, 51154], "temperature": 0.0, "avg_logprob": + -0.24888961335532686, "compression_ratio": 1.644, "no_speech_prob": 0.18610267341136932}, + {"id": 7, "seek": 2416, "start": 39.96, "end": 40.96, "text": " Thank you.", "tokens": + [51154, 1044, 291, 13, 51204], "temperature": 0.0, "avg_logprob": -0.24888961335532686, + "compression_ratio": 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 8, "seek": + 2416, "start": 40.96, "end": 41.96, "text": " Thanks for having me.", "tokens": + [51204, 2561, 337, 1419, 385, 13, 51254], "temperature": 0.0, "avg_logprob": -0.24888961335532686, + "compression_ratio": 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 9, "seek": + 2416, "start": 41.96, "end": 42.96, "text": " Awesome.", "tokens": [51254, 10391, + 13, 51304], "temperature": 0.0, "avg_logprob": -0.24888961335532686, "compression_ratio": + 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 10, "seek": 2416, "start": + 42.96, "end": 43.96, "text": " Thanks for joining.", "tokens": [51304, 2561, 337, + 5549, 13, 51354], "temperature": 0.0, "avg_logprob": -0.24888961335532686, "compression_ratio": + 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 11, "seek": 2416, "start": + 43.96, "end": 50.16, "text": " And I know it''s almost like festive times, so it''s + probably quite a packed schedule for", "tokens": [51354, 400, 286, 458, 
309, 311, + 1920, 411, 42729, 1413, 11, 370, 309, 311, 1391, 1596, 257, 13265, 7567, 337, 51664], + "temperature": 0.0, "avg_logprob": -0.24888961335532686, "compression_ratio": 1.644, + "no_speech_prob": 0.18610267341136932}, {"id": 12, "seek": 2416, "start": 50.16, + "end": 53.64, "text": " you otherwise as well.", "tokens": [51664, 291, 5911, 382, + 731, 13, 51838], "temperature": 0.0, "avg_logprob": -0.24888961335532686, "compression_ratio": + 1.644, "no_speech_prob": 0.18610267341136932}, {"id": 13, "seek": 5364, "start": + 53.64, "end": 58.0, "text": " So yeah, I was thinking let''s traditionally start + with the introduction.", "tokens": [50364, 407, 1338, 11, 286, 390, 1953, 718, 311, + 19067, 722, 365, 264, 9339, 13, 50582], "temperature": 0.0, "avg_logprob": -0.3188347524526168, + "compression_ratio": 1.4660633484162895, "no_speech_prob": 0.0052063073962926865}, + {"id": 14, "seek": 5364, "start": 58.0, "end": 65.92, "text": " Like, can you please + tell me a bit of background before ZIR AI and OZIR AI is a startup and", "tokens": + [50582, 1743, 11, 393, 291, 1767, 980, 385, 257, 857, 295, 3678, 949, 1176, 7740, + 7318, 293, 422, 57, 7740, 7318, 307, 257, 18578, 293, 50978], "temperature": 0.0, + "avg_logprob": -0.3188347524526168, "compression_ratio": 1.4660633484162895, "no_speech_prob": + 0.0052063073962926865}, {"id": 15, "seek": 5364, "start": 65.92, "end": 68.2, "text": + " you''re rolling at ZIR AI?", "tokens": [50978, 291, 434, 9439, 412, 1176, 7740, + 7318, 30, 51092], "temperature": 0.0, "avg_logprob": -0.3188347524526168, "compression_ratio": + 1.4660633484162895, "no_speech_prob": 0.0052063073962926865}, {"id": 16, "seek": + 5364, "start": 68.2, "end": 71.92, "text": " Yes, sure.", "tokens": [51092, 1079, + 11, 988, 13, 51278], "temperature": 0.0, "avg_logprob": -0.3188347524526168, "compression_ratio": + 1.4660633484162895, "no_speech_prob": 0.0052063073962926865}, {"id": 17, "seek": + 5364, "start": 71.92, "end": 76.16, "text": " Me and my 
co-founder, we started ZIR + AI in 2020.", "tokens": [51278, 1923, 293, 452, 598, 12, 33348, 11, 321, 1409, 1176, + 7740, 7318, 294, 4808, 13, 51490], "temperature": 0.0, "avg_logprob": -0.3188347524526168, + "compression_ratio": 1.4660633484162895, "no_speech_prob": 0.0052063073962926865}, + {"id": 18, "seek": 5364, "start": 76.16, "end": 78.16, "text": " Before that, we + were both working at Google.", "tokens": [51490, 4546, 300, 11, 321, 645, 1293, + 1364, 412, 3329, 13, 51590], "temperature": 0.0, "avg_logprob": -0.3188347524526168, + "compression_ratio": 1.4660633484162895, "no_speech_prob": 0.0052063073962926865}, + {"id": 19, "seek": 5364, "start": 78.16, "end": 82.44, "text": " I had been there + since 2010.", "tokens": [51590, 286, 632, 668, 456, 1670, 9657, 13, 51804], "temperature": + 0.0, "avg_logprob": -0.3188347524526168, "compression_ratio": 1.4660633484162895, + "no_speech_prob": 0.0052063073962926865}, {"id": 20, "seek": 8244, "start": 82.44, + "end": 92.72, "text": " I worked in Google Research, focused on NLP and language + understanding with machine learning.", "tokens": [50364, 286, 2732, 294, 3329, 10303, + 11, 5178, 322, 426, 45196, 293, 2856, 3701, 365, 3479, 2539, 13, 50878], "temperature": + 0.0, "avg_logprob": -0.166553709242079, "compression_ratio": 1.4851485148514851, + "no_speech_prob": 0.002010730793699622}, {"id": 21, "seek": 8244, "start": 92.72, + "end": 95.84, "text": " Prior to that, I had worked many other places in the industry.", + "tokens": [50878, 24032, 281, 300, 11, 286, 632, 2732, 867, 661, 3190, 294, 264, + 3518, 13, 51034], "temperature": 0.0, "avg_logprob": -0.166553709242079, "compression_ratio": + 1.4851485148514851, "no_speech_prob": 0.002010730793699622}, {"id": 22, "seek": + 8244, "start": 95.84, "end": 101.4, "text": " So I''ve been in the industry about + 24 or 25 years now.", "tokens": [51034, 407, 286, 600, 668, 294, 264, 3518, 466, + 4022, 420, 3552, 924, 586, 13, 51312], "temperature": 0.0, "avg_logprob": 
-0.166553709242079, + "compression_ratio": 1.4851485148514851, "no_speech_prob": 0.002010730793699622}, + {"id": 23, "seek": 8244, "start": 101.4, "end": 111.0, "text": " And around 2017, + the team that I was working on in Google Research actually became known", "tokens": + [51312, 400, 926, 6591, 11, 264, 1469, 300, 286, 390, 1364, 322, 294, 3329, 10303, + 767, 3062, 2570, 51792], "temperature": 0.0, "avg_logprob": -0.166553709242079, + "compression_ratio": 1.4851485148514851, "no_speech_prob": 0.002010730793699622}, + {"id": 24, "seek": 11100, "start": 111.0, "end": 114.08, "text": " for Gmail Smart + Reply.", "tokens": [50364, 337, 36732, 12923, 3696, 356, 13, 50518], "temperature": + 0.0, "avg_logprob": -0.27677382797491357, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.07919315248727798}, {"id": 25, "seek": 11100, "start": 114.08, + "end": 115.08, "text": " If you remember that feature.", "tokens": [50518, 759, + 291, 1604, 300, 4111, 13, 50568], "temperature": 0.0, "avg_logprob": -0.27677382797491357, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, + {"id": 26, "seek": 11100, "start": 115.08, "end": 117.0, "text": " Yeah, that''s + an excellent feature.", "tokens": [50568, 865, 11, 300, 311, 364, 7103, 4111, 13, + 50664], "temperature": 0.0, "avg_logprob": -0.27677382797491357, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, {"id": 27, "seek": 11100, + "start": 117.0, "end": 120.0, "text": " The moment I saw it, it was like, wow, that''s + fantastic.", "tokens": [50664, 440, 1623, 286, 1866, 309, 11, 309, 390, 411, 11, + 6076, 11, 300, 311, 5456, 13, 50814], "temperature": 0.0, "avg_logprob": -0.27677382797491357, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, + {"id": 28, "seek": 11100, "start": 120.0, "end": 121.0, "text": " Yeah.", "tokens": + [50814, 865, 13, 50864], "temperature": 0.0, "avg_logprob": -0.27677382797491357, + 
"compression_ratio": 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, + {"id": 29, "seek": 11100, "start": 121.0, "end": 122.0, "text": " Yeah, and it was + impressive.", "tokens": [50864, 865, 11, 293, 309, 390, 8992, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.27677382797491357, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.07919315248727798}, {"id": 30, "seek": 11100, "start": 122.0, + "end": 126.08, "text": " And I would say maybe it was a very practical application + of NLP that went, that was deployed", "tokens": [50914, 400, 286, 576, 584, 1310, + 309, 390, 257, 588, 8496, 3861, 295, 426, 45196, 300, 1437, 11, 300, 390, 17826, + 51118], "temperature": 0.0, "avg_logprob": -0.27677382797491357, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, {"id": 31, "seek": 11100, + "start": 126.08, "end": 128.56, "text": " on a very large scale.", "tokens": [51118, + 322, 257, 588, 2416, 4373, 13, 51242], "temperature": 0.0, "avg_logprob": -0.27677382797491357, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, + {"id": 32, "seek": 11100, "start": 128.56, "end": 131.84, "text": " So that was + the research group that I was a part of.", "tokens": [51242, 407, 300, 390, 264, + 2132, 1594, 300, 286, 390, 257, 644, 295, 13, 51406], "temperature": 0.0, "avg_logprob": + -0.27677382797491357, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.07919315248727798}, {"id": 33, "seek": 11100, "start": 131.84, "end": 136.96, + "text": " It was under Rakers, while that was developed in collaboration with some + others.", "tokens": [51406, 467, 390, 833, 497, 19552, 11, 1339, 300, 390, 4743, + 294, 9363, 365, 512, 2357, 13, 51662], "temperature": 0.0, "avg_logprob": -0.27677382797491357, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.07919315248727798}, + {"id": 34, "seek": 13696, "start": 136.96, "end": 145.56, "text": " Anyway, around + that time, I became 
very interested in using neural networks for more general purpose", + "tokens": [50364, 5684, 11, 926, 300, 565, 11, 286, 3062, 588, 3102, 294, 1228, + 18161, 9590, 337, 544, 2674, 4334, 50794], "temperature": 0.0, "avg_logprob": -0.17008631317703812, + "compression_ratio": 1.663003663003663, "no_speech_prob": 0.062159594148397446}, + {"id": 35, "seek": 13696, "start": 145.56, "end": 147.32, "text": " information + retrieval.", "tokens": [50794, 1589, 19817, 3337, 13, 50882], "temperature": 0.0, + "avg_logprob": -0.17008631317703812, "compression_ratio": 1.663003663003663, "no_speech_prob": + 0.062159594148397446}, {"id": 36, "seek": 13696, "start": 147.32, "end": 152.44, + "text": " And I specifically formulated this question answering over a large corpus.", + "tokens": [50882, 400, 286, 4682, 48936, 341, 1168, 13430, 670, 257, 2416, 1181, + 31624, 13, 51138], "temperature": 0.0, "avg_logprob": -0.17008631317703812, "compression_ratio": + 1.663003663003663, "no_speech_prob": 0.062159594148397446}, {"id": 37, "seek": 13696, + "start": 152.44, "end": 157.88, "text": " And at the time, I mean, Bert, when it + was released a year later, changed this idea.", "tokens": [51138, 400, 412, 264, + 565, 11, 286, 914, 11, 29594, 11, 562, 309, 390, 4736, 257, 1064, 1780, 11, 3105, + 341, 1558, 13, 51410], "temperature": 0.0, "avg_logprob": -0.17008631317703812, + "compression_ratio": 1.663003663003663, "no_speech_prob": 0.062159594148397446}, + {"id": 38, "seek": 13696, "start": 157.88, "end": 162.48000000000002, "text": " + But at the time, a lot of people would approach a machine learning problem from + scratch.", "tokens": [51410, 583, 412, 264, 565, 11, 257, 688, 295, 561, 576, 3109, + 257, 3479, 2539, 1154, 490, 8459, 13, 51640], "temperature": 0.0, "avg_logprob": + -0.17008631317703812, "compression_ratio": 1.663003663003663, "no_speech_prob": + 0.062159594148397446}, {"id": 39, "seek": 13696, "start": 162.48000000000002, "end": + 166.48000000000002, "text": " It would 
take a completely uninitialized neural network + and then try to train it.", "tokens": [51640, 467, 576, 747, 257, 2584, 517, 259, + 270, 831, 1602, 18161, 3209, 293, 550, 853, 281, 3847, 309, 13, 51840], "temperature": + 0.0, "avg_logprob": -0.17008631317703812, "compression_ratio": 1.663003663003663, + "no_speech_prob": 0.062159594148397446}, {"id": 40, "seek": 16648, "start": 166.48, + "end": 173.0, "text": " And when the models get big and deep, mostly you don''t + have enough data for your task.", "tokens": [50364, 400, 562, 264, 5245, 483, 955, + 293, 2452, 11, 5240, 291, 500, 380, 362, 1547, 1412, 337, 428, 5633, 13, 50690], + "temperature": 0.0, "avg_logprob": -0.14112989334833054, "compression_ratio": 1.5891472868217054, + "no_speech_prob": 0.0020516153890639544}, {"id": 41, "seek": 16648, "start": 173.0, + "end": 178.51999999999998, "text": " And it also, you know, that doesn''t jive very + well with if you think about how humans approach", "tokens": [50690, 400, 309, 611, + 11, 291, 458, 11, 300, 1177, 380, 361, 488, 588, 731, 365, 498, 291, 519, 466, 577, + 6255, 3109, 50966], "temperature": 0.0, "avg_logprob": -0.14112989334833054, "compression_ratio": + 1.5891472868217054, "no_speech_prob": 0.0020516153890639544}, {"id": 42, "seek": + 16648, "start": 178.51999999999998, "end": 179.51999999999998, "text": " a task.", + "tokens": [50966, 257, 5633, 13, 51016], "temperature": 0.0, "avg_logprob": -0.14112989334833054, + "compression_ratio": 1.5891472868217054, "no_speech_prob": 0.0020516153890639544}, + {"id": 43, "seek": 16648, "start": 179.51999999999998, "end": 183.95999999999998, + "text": " If you ask me to answer a question or to read a message from a medical + textbook, I may", "tokens": [51016, 759, 291, 1029, 385, 281, 1867, 257, 1168, 420, + 281, 1401, 257, 3636, 490, 257, 4625, 25591, 11, 286, 815, 51238], "temperature": + 0.0, "avg_logprob": -0.14112989334833054, "compression_ratio": 1.5891472868217054, + "no_speech_prob": 
0.0020516153890639544}, {"id": 44, "seek": 16648, "start": 183.95999999999998, + "end": 188.92, "text": " not be a doctor, but my understanding of the English language + will allow me to get some", "tokens": [51238, 406, 312, 257, 4631, 11, 457, 452, + 3701, 295, 264, 3669, 2856, 486, 2089, 385, 281, 483, 512, 51486], "temperature": + 0.0, "avg_logprob": -0.14112989334833054, "compression_ratio": 1.5891472868217054, + "no_speech_prob": 0.0020516153890639544}, {"id": 45, "seek": 16648, "start": 188.92, + "end": 192.2, "text": " of the information content from that passage.", "tokens": + [51486, 295, 264, 1589, 2701, 490, 300, 11497, 13, 51650], "temperature": 0.0, "avg_logprob": + -0.14112989334833054, "compression_ratio": 1.5891472868217054, "no_speech_prob": + 0.0020516153890639544}, {"id": 46, "seek": 19220, "start": 192.2, "end": 197.6, + "text": " So in the same way, I was thinking that if a neural network is truly understanding + language", "tokens": [50364, 407, 294, 264, 912, 636, 11, 286, 390, 1953, 300, 498, + 257, 18161, 3209, 307, 4908, 3701, 2856, 50634], "temperature": 0.0, "avg_logprob": + -0.151705795459533, "compression_ratio": 1.5974025974025974, "no_speech_prob": 0.0024464698508381844}, + {"id": 47, "seek": 19220, "start": 197.6, "end": 200.72, "text": " in the way that + people do, it should have this property.", "tokens": [50634, 294, 264, 636, 300, + 561, 360, 11, 309, 820, 362, 341, 4707, 13, 50790], "temperature": 0.0, "avg_logprob": + -0.151705795459533, "compression_ratio": 1.5974025974025974, "no_speech_prob": 0.0024464698508381844}, + {"id": 48, "seek": 19220, "start": 200.72, "end": 205.92, "text": " And it should + be possible to train a general purpose neural network that without fine tuning", + "tokens": [50790, 400, 309, 820, 312, 1944, 281, 3847, 257, 2674, 4334, 18161, 3209, + 300, 1553, 2489, 15164, 51050], "temperature": 0.0, "avg_logprob": -0.151705795459533, + "compression_ratio": 1.5974025974025974, "no_speech_prob": 
0.0024464698508381844}, + {"id": 49, "seek": 19220, "start": 205.92, "end": 211.32, "text": " in a specific + domain can also work reasonably well.", "tokens": [51050, 294, 257, 2685, 9274, + 393, 611, 589, 23551, 731, 13, 51320], "temperature": 0.0, "avg_logprob": -0.151705795459533, + "compression_ratio": 1.5974025974025974, "no_speech_prob": 0.0024464698508381844}, + {"id": 50, "seek": 19220, "start": 211.32, "end": 213.2, "text": " So I set out + to build this thing.", "tokens": [51320, 407, 286, 992, 484, 281, 1322, 341, 551, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.151705795459533, "compression_ratio": + 1.5974025974025974, "no_speech_prob": 0.0024464698508381844}, {"id": 51, "seek": + 19220, "start": 213.2, "end": 216.92, "text": " And that was my research program + in 2017.", "tokens": [51414, 400, 300, 390, 452, 2132, 1461, 294, 6591, 13, 51600], + "temperature": 0.0, "avg_logprob": -0.151705795459533, "compression_ratio": 1.5974025974025974, + "no_speech_prob": 0.0024464698508381844}, {"id": 52, "seek": 21692, "start": 216.92, + "end": 223.72, "text": " And we were actually able to launch the first iteration + of that model in a product called", "tokens": [50364, 400, 321, 645, 767, 1075, + 281, 4025, 264, 700, 24784, 295, 300, 2316, 294, 257, 1674, 1219, 50704], "temperature": + 0.0, "avg_logprob": -0.2097498807040128, "compression_ratio": 1.6374045801526718, + "no_speech_prob": 0.007150536868721247}, {"id": 53, "seek": 21692, "start": 223.72, + "end": 225.64, "text": " Google Talk to Books.", "tokens": [50704, 3329, 8780, 281, + 33843, 13, 50800], "temperature": 0.0, "avg_logprob": -0.2097498807040128, "compression_ratio": + 1.6374045801526718, "no_speech_prob": 0.007150536868721247}, {"id": 54, "seek": + 21692, "start": 225.64, "end": 230.56, "text": " So to, and I''m saying this to + my knowledge, I would love if someone corrected me in the", "tokens": [50800, 407, + 281, 11, 293, 286, 478, 1566, 341, 281, 452, 3601, 11, 286, 576, 959, 498, 
1580, + 31687, 385, 294, 264, 51046], "temperature": 0.0, "avg_logprob": -0.2097498807040128, + "compression_ratio": 1.6374045801526718, "no_speech_prob": 0.007150536868721247}, + {"id": 55, "seek": 21692, "start": 230.56, "end": 232.72, "text": " comments section + here.", "tokens": [51046, 3053, 3541, 510, 13, 51154], "temperature": 0.0, "avg_logprob": + -0.2097498807040128, "compression_ratio": 1.6374045801526718, "no_speech_prob": + 0.007150536868721247}, {"id": 56, "seek": 21692, "start": 232.72, "end": 237.88, + "text": " This is Google Talk to Books is the first large scale end-to-end demonstration + of a neural", "tokens": [51154, 639, 307, 3329, 8780, 281, 33843, 307, 264, 700, + 2416, 4373, 917, 12, 1353, 12, 521, 16520, 295, 257, 18161, 51412], "temperature": + 0.0, "avg_logprob": -0.2097498807040128, "compression_ratio": 1.6374045801526718, + "no_speech_prob": 0.007150536868721247}, {"id": 57, "seek": 21692, "start": 237.88, + "end": 240.2, "text": " information retrieval system.", "tokens": [51412, 1589, + 19817, 3337, 1185, 13, 51528], "temperature": 0.0, "avg_logprob": -0.2097498807040128, + "compression_ratio": 1.6374045801526718, "no_speech_prob": 0.007150536868721247}, + {"id": 58, "seek": 21692, "start": 240.2, "end": 246.88, "text": " So it is a search + over a corpus of around 200,000 books from the Google Books corpus.", "tokens": + [51528, 407, 309, 307, 257, 3164, 670, 257, 1181, 31624, 295, 926, 2331, 11, 1360, + 3642, 490, 264, 3329, 33843, 1181, 31624, 13, 51862], "temperature": 0.0, "avg_logprob": + -0.2097498807040128, "compression_ratio": 1.6374045801526718, "no_speech_prob": + 0.007150536868721247}, {"id": 59, "seek": 24688, "start": 246.88, "end": 250.72, + "text": " But it''s done entirely with vector search.", "tokens": [50364, 583, 309, + 311, 1096, 7696, 365, 8062, 3164, 13, 50556], "temperature": 0.0, "avg_logprob": + -0.22665776704487048, "compression_ratio": 1.6153846153846154, "no_speech_prob": + 0.0012818914838135242}, 
{"id": 60, "seek": 24688, "start": 250.72, "end": 254.07999999999998, + "text": " And I''m not aware of anything before that.", "tokens": [50556, 400, 286, + 478, 406, 3650, 295, 1340, 949, 300, 13, 50724], "temperature": 0.0, "avg_logprob": + -0.22665776704487048, "compression_ratio": 1.6153846153846154, "no_speech_prob": + 0.0012818914838135242}, {"id": 61, "seek": 24688, "start": 254.07999999999998, "end": + 256.88, "text": " So the neural network is very important here.", "tokens": [50724, + 407, 264, 18161, 3209, 307, 588, 1021, 510, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.22665776704487048, "compression_ratio": 1.6153846153846154, "no_speech_prob": + 0.0012818914838135242}, {"id": 62, "seek": 24688, "start": 256.88, "end": 262.12, + "text": " I, not part of the team that conceived this idea and I was not actively + working on it.", "tokens": [50864, 286, 11, 406, 644, 295, 264, 1469, 300, 34898, + 341, 1558, 293, 286, 390, 406, 13022, 1364, 322, 309, 13, 51126], "temperature": + 0.0, "avg_logprob": -0.22665776704487048, "compression_ratio": 1.6153846153846154, + "no_speech_prob": 0.0012818914838135242}, {"id": 63, "seek": 24688, "start": 262.12, + "end": 267.08, "text": " They had a neural network which wasn''t producing good + enough results.", "tokens": [51126, 814, 632, 257, 18161, 3209, 597, 2067, 380, + 10501, 665, 1547, 3542, 13, 51374], "temperature": 0.0, "avg_logprob": -0.22665776704487048, + "compression_ratio": 1.6153846153846154, "no_speech_prob": 0.0012818914838135242}, + {"id": 64, "seek": 24688, "start": 267.08, "end": 272.24, "text": " And we put in + this more general purpose question answering your network and the results dramatically", + "tokens": [51374, 400, 321, 829, 294, 341, 544, 2674, 4334, 1168, 13430, 428, 3209, + 293, 264, 3542, 17548, 51632], "temperature": 0.0, "avg_logprob": -0.22665776704487048, + "compression_ratio": 1.6153846153846154, "no_speech_prob": 0.0012818914838135242}, + {"id": 65, "seek": 24688, "start": 
272.24, "end": 273.24, "text": " improved.", + "tokens": [51632, 9689, 13, 51682], "temperature": 0.0, "avg_logprob": -0.22665776704487048, + "compression_ratio": 1.6153846153846154, "no_speech_prob": 0.0012818914838135242}, + {"id": 66, "seek": 27324, "start": 273.24, "end": 275.8, "text": " This was basically + the first rollout.", "tokens": [50364, 639, 390, 1936, 264, 700, 3373, 346, 13, + 50492], "temperature": 0.0, "avg_logprob": -0.170166015625, "compression_ratio": + 1.5798319327731092, "no_speech_prob": 0.00768980011343956}, {"id": 67, "seek": 27324, + "start": 275.8, "end": 281.40000000000003, "text": " But then what I observed over + the subsequent years was that I was able to take exactly", "tokens": [50492, 583, + 550, 437, 286, 13095, 670, 264, 19962, 924, 390, 300, 286, 390, 1075, 281, 747, + 2293, 50772], "temperature": 0.0, "avg_logprob": -0.170166015625, "compression_ratio": + 1.5798319327731092, "no_speech_prob": 0.00768980011343956}, {"id": 68, "seek": 27324, + "start": 281.40000000000003, "end": 287.88, "text": " the same neural network and + apply it in at least six different products within Google.", "tokens": [50772, 264, + 912, 18161, 3209, 293, 3079, 309, 294, 412, 1935, 2309, 819, 3383, 1951, 3329, 13, + 51096], "temperature": 0.0, "avg_logprob": -0.170166015625, "compression_ratio": + 1.5798319327731092, "no_speech_prob": 0.00768980011343956}, {"id": 69, "seek": 27324, + "start": 287.88, "end": 296.04, "text": " And this is what convinced me of the business + value of what had been demonstrated here.", "tokens": [51096, 400, 341, 307, 437, + 12561, 385, 295, 264, 1606, 2158, 295, 437, 632, 668, 18772, 510, 13, 51504], "temperature": + 0.0, "avg_logprob": -0.170166015625, "compression_ratio": 1.5798319327731092, "no_speech_prob": + 0.00768980011343956}, {"id": 70, "seek": 27324, "start": 296.04, "end": 301.64, + "text": " This could actually improve metrics and products used by millions of people.", + "tokens": [51504, 639, 727, 767, 
3470, 16367, 293, 3383, 1143, 538, 6803, 295, 561, + 13, 51784], "temperature": 0.0, "avg_logprob": -0.170166015625, "compression_ratio": + 1.5798319327731092, "no_speech_prob": 0.00768980011343956}, {"id": 71, "seek": 30164, + "start": 301.64, "end": 307.44, "text": " And so this was essentially the genesis + of the idea of the ZRAI.", "tokens": [50364, 400, 370, 341, 390, 4476, 264, 1049, + 9374, 295, 264, 1558, 295, 264, 1176, 3750, 40, 13, 50654], "temperature": 0.0, + "avg_logprob": -0.2683603161963347, "compression_ratio": 1.5597014925373134, "no_speech_prob": + 0.0020390308927744627}, {"id": 72, "seek": 30164, "start": 307.44, "end": 312.71999999999997, + "text": " We started me in my co-founder in 2020 and the objective is to provide + something like", "tokens": [50654, 492, 1409, 385, 294, 452, 598, 12, 33348, 294, + 4808, 293, 264, 10024, 307, 281, 2893, 746, 411, 50918], "temperature": 0.0, "avg_logprob": + -0.2683603161963347, "compression_ratio": 1.5597014925373134, "no_speech_prob": + 0.0020390308927744627}, {"id": 73, "seek": 30164, "start": 312.71999999999997, "end": + 317.71999999999997, "text": " elastic search or algolia, except using the principles + of neural information retrieval.", "tokens": [50918, 17115, 3164, 420, 3501, 29760, + 11, 3993, 1228, 264, 9156, 295, 18161, 1589, 19817, 3337, 13, 51168], "temperature": + 0.0, "avg_logprob": -0.2683603161963347, "compression_ratio": 1.5597014925373134, + "no_speech_prob": 0.0020390308927744627}, {"id": 74, "seek": 30164, "start": 317.71999999999997, + "end": 325.03999999999996, "text": " So as you know, elastic search and algolia + are based on the BM25 algorithm fundamentally.", "tokens": [51168, 407, 382, 291, + 458, 11, 17115, 3164, 293, 3501, 29760, 366, 2361, 322, 264, 15901, 6074, 9284, + 17879, 13, 51534], "temperature": 0.0, "avg_logprob": -0.2683603161963347, "compression_ratio": + 1.5597014925373134, "no_speech_prob": 0.0020390308927744627}, {"id": 75, "seek": + 30164, "start": 
325.03999999999996, "end": 327.64, "text": " So yeah, so that''s + what we''ve been doing for the last two years.", "tokens": [51534, 407, 1338, 11, + 370, 300, 311, 437, 321, 600, 668, 884, 337, 264, 1036, 732, 924, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.2683603161963347, "compression_ratio": 1.5597014925373134, + "no_speech_prob": 0.0020390308927744627}, {"id": 76, "seek": 30164, "start": 327.64, + "end": 329.2, "text": " Yeah, this is fantastic.", "tokens": [51664, 865, 11, 341, + 307, 5456, 13, 51742], "temperature": 0.0, "avg_logprob": -0.2683603161963347, "compression_ratio": + 1.5597014925373134, "no_speech_prob": 0.0020390308927744627}, {"id": 77, "seek": + 32920, "start": 329.2, "end": 334.84, "text": " I mean, it''s fantastic also that + you bring your experience from such a large company innovating", "tokens": [50364, + 286, 914, 11, 309, 311, 5456, 611, 300, 291, 1565, 428, 1752, 490, 1270, 257, 2416, + 2237, 5083, 990, 50646], "temperature": 0.0, "avg_logprob": -0.21437828063964845, + "compression_ratio": 1.5193798449612403, "no_speech_prob": 0.003055588575080037}, + {"id": 78, "seek": 32920, "start": 334.84, "end": 336.32, "text": " in search, right?", + "tokens": [50646, 294, 3164, 11, 558, 30, 50720], "temperature": 0.0, "avg_logprob": + -0.21437828063964845, "compression_ratio": 1.5193798449612403, "no_speech_prob": + 0.003055588575080037}, {"id": 79, "seek": 32920, "start": 336.32, "end": 341.48, + "text": " Over to, you know, the rest of the world essentially, right?", "tokens": + [50720, 4886, 281, 11, 291, 458, 11, 264, 1472, 295, 264, 1002, 4476, 11, 558, 30, + 50978], "temperature": 0.0, "avg_logprob": -0.21437828063964845, "compression_ratio": + 1.5193798449612403, "no_speech_prob": 0.003055588575080037}, {"id": 80, "seek": + 32920, "start": 341.48, "end": 346.52, "text": " So that I believe your goal is + to apply this with as many clients as possible.", "tokens": [50978, 407, 300, 286, + 1697, 428, 3387, 307, 281, 3079, 341, 
365, 382, 867, 6982, 382, 1944, 13, 51230], + "temperature": 0.0, "avg_logprob": -0.21437828063964845, "compression_ratio": 1.5193798449612403, + "no_speech_prob": 0.003055588575080037}, {"id": 81, "seek": 32920, "start": 346.52, + "end": 351.03999999999996, "text": " And are you focusing mostly on NLP at the moment, + natural language processing?", "tokens": [51230, 400, 366, 291, 8416, 5240, 322, + 426, 45196, 412, 264, 1623, 11, 3303, 2856, 9007, 30, 51456], "temperature": 0.0, + "avg_logprob": -0.21437828063964845, "compression_ratio": 1.5193798449612403, "no_speech_prob": + 0.003055588575080037}, {"id": 82, "seek": 32920, "start": 351.03999999999996, "end": + 356.08, "text": " Yeah, so well, we''re focused from a customer''s perspective.", + "tokens": [51456, 865, 11, 370, 731, 11, 321, 434, 5178, 490, 257, 5474, 311, 4585, + 13, 51708], "temperature": 0.0, "avg_logprob": -0.21437828063964845, "compression_ratio": + 1.5193798449612403, "no_speech_prob": 0.003055588575080037}, {"id": 83, "seek": + 35608, "start": 356.08, "end": 359.28, "text": " We provide a tech search solution.", + "tokens": [50364, 492, 2893, 257, 7553, 3164, 3827, 13, 50524], "temperature": 0.0, + "avg_logprob": -0.2024254384248153, "compression_ratio": 1.6018099547511313, "no_speech_prob": + 0.001940710935741663}, {"id": 84, "seek": 35608, "start": 359.28, "end": 366.03999999999996, + "text": " Now, one of the beauties of embedding based techniques is that in your + network,", "tokens": [50524, 823, 11, 472, 295, 264, 1869, 530, 295, 12240, 3584, + 2361, 7512, 307, 300, 294, 428, 3209, 11, 50862], "temperature": 0.0, "avg_logprob": + -0.2024254384248153, "compression_ratio": 1.6018099547511313, "no_speech_prob": + 0.001940710935741663}, {"id": 85, "seek": 35608, "start": 366.03999999999996, "end": + 372.03999999999996, "text": " you can go beyond text and you can embed images, video + and other types of media", "tokens": [50862, 291, 393, 352, 4399, 2487, 293, 291, + 393, 12240, 5267, 11, 
960, 293, 661, 3467, 295, 3021, 51162], "temperature": 0.0, + "avg_logprob": -0.2024254384248153, "compression_ratio": 1.6018099547511313, "no_speech_prob": + 0.001940710935741663}, {"id": 86, "seek": 35608, "start": 372.03999999999996, "end": + 374.28, "text": " into a common embedding space.", "tokens": [51162, 666, 257, 2689, + 12240, 3584, 1901, 13, 51274], "temperature": 0.0, "avg_logprob": -0.2024254384248153, + "compression_ratio": 1.6018099547511313, "no_speech_prob": 0.001940710935741663}, + {"id": 87, "seek": 35608, "start": 374.28, "end": 378.32, "text": " So that is where + this company will eventually go.", "tokens": [51274, 407, 300, 307, 689, 341, 2237, + 486, 4728, 352, 13, 51476], "temperature": 0.0, "avg_logprob": -0.2024254384248153, + "compression_ratio": 1.6018099547511313, "no_speech_prob": 0.001940710935741663}, + {"id": 88, "seek": 35608, "start": 378.32, "end": 384.15999999999997, "text": " + But my roots are in NLP and I think that tech search by itself is a large area", + "tokens": [51476, 583, 452, 10669, 366, 294, 426, 45196, 293, 286, 519, 300, 7553, + 3164, 538, 2564, 307, 257, 2416, 1859, 51768], "temperature": 0.0, "avg_logprob": + -0.2024254384248153, "compression_ratio": 1.6018099547511313, "no_speech_prob": + 0.001940710935741663}, {"id": 89, "seek": 38416, "start": 384.16, "end": 385.6, + "text": " that takes an effort to do well.", "tokens": [50364, 300, 2516, 364, 4630, + 281, 360, 731, 13, 50436], "temperature": 0.0, "avg_logprob": -0.21838103584621263, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.002856289502233267}, + {"id": 90, "seek": 38416, "start": 385.6, "end": 388.96000000000004, "text": " So + that''s where we''re focused initially.", "tokens": [50436, 407, 300, 311, 689, + 321, 434, 5178, 9105, 13, 50604], "temperature": 0.0, "avg_logprob": -0.21838103584621263, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.002856289502233267}, + {"id": 91, "seek": 38416, "start": 
388.96000000000004, "end": 390.36, "text": " + Yeah, that makes total sense.", "tokens": [50604, 865, 11, 300, 1669, 3217, 2020, + 13, 50674], "temperature": 0.0, "avg_logprob": -0.21838103584621263, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.002856289502233267}, {"id": 92, "seek": + 38416, "start": 390.36, "end": 396.56, "text": " But as you said, you know, vector + search is not kind of constrained by the application", "tokens": [50674, 583, 382, + 291, 848, 11, 291, 458, 11, 8062, 3164, 307, 406, 733, 295, 38901, 538, 264, 3861, + 50984], "temperature": 0.0, "avg_logprob": -0.21838103584621263, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.002856289502233267}, {"id": 93, "seek": + 38416, "start": 396.56, "end": 398.88000000000005, "text": " as long as you can + embed it, right?", "tokens": [50984, 382, 938, 382, 291, 393, 12240, 309, 11, 558, + 30, 51100], "temperature": 0.0, "avg_logprob": -0.21838103584621263, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.002856289502233267}, {"id": 94, "seek": + 38416, "start": 398.88000000000005, "end": 404.0, "text": " And plus all these multimodal + scenarios where you can combine, let''s say,", "tokens": [51100, 400, 1804, 439, + 613, 32972, 378, 304, 15077, 689, 291, 393, 10432, 11, 718, 311, 584, 11, 51356], + "temperature": 0.0, "avg_logprob": -0.21838103584621263, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.002856289502233267}, {"id": 95, "seek": 38416, "start": 404.0, + "end": 408.04, "text": " your camera pointed at something and then you''re talking + to it and then you can kind of", "tokens": [51356, 428, 2799, 10932, 412, 746, 293, + 550, 291, 434, 1417, 281, 309, 293, 550, 291, 393, 733, 295, 51558], "temperature": + 0.0, "avg_logprob": -0.21838103584621263, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.002856289502233267}, {"id": 96, "seek": 38416, "start": 408.04, + "end": 411.48, "text": " get some textual matches and 
suggestions, right?", "tokens": + [51558, 483, 512, 2487, 901, 10676, 293, 13396, 11, 558, 30, 51730], "temperature": + 0.0, "avg_logprob": -0.21838103584621263, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.002856289502233267}, {"id": 97, "seek": 41148, "start": 411.48, + "end": 414.36, "text": " So that could be a very rich experience.", "tokens": [50364, + 407, 300, 727, 312, 257, 588, 4593, 1752, 13, 50508], "temperature": 0.0, "avg_logprob": + -0.21947385646678783, "compression_ratio": 1.5818815331010454, "no_speech_prob": + 0.0024948555510491133}, {"id": 98, "seek": 41148, "start": 414.36, "end": 415.16, + "text": " Right, right.", "tokens": [50508, 1779, 11, 558, 13, 50548], "temperature": + 0.0, "avg_logprob": -0.21947385646678783, "compression_ratio": 1.5818815331010454, + "no_speech_prob": 0.0024948555510491133}, {"id": 99, "seek": 41148, "start": 415.16, + "end": 422.16, "text": " And that particular application is actually achievable + now, even in an all text platform,", "tokens": [50548, 400, 300, 1729, 3861, 307, + 767, 3538, 17915, 586, 11, 754, 294, 364, 439, 2487, 3663, 11, 50898], "temperature": + 0.0, "avg_logprob": -0.21947385646678783, "compression_ratio": 1.5818815331010454, + "no_speech_prob": 0.0024948555510491133}, {"id": 100, "seek": 41148, "start": 422.16, + "end": 424.6, "text": " if you feed the transcripts in.", "tokens": [50898, 498, + 291, 3154, 264, 24444, 82, 294, 13, 51020], "temperature": 0.0, "avg_logprob": -0.21947385646678783, + "compression_ratio": 1.5818815331010454, "no_speech_prob": 0.0024948555510491133}, + {"id": 101, "seek": 41148, "start": 424.6, "end": 430.04, "text": " And these neural + network approaches tend to work especially well with natural speech,", "tokens": + [51020, 400, 613, 18161, 3209, 11587, 3928, 281, 589, 2318, 731, 365, 3303, 6218, + 11, 51292], "temperature": 0.0, "avg_logprob": -0.21947385646678783, "compression_ratio": + 1.5818815331010454, "no_speech_prob": 
0.0024948555510491133}, {"id": 102, "seek": + 41148, "start": 430.04, "end": 431.44, "text": " both as query input.", "tokens": + [51292, 1293, 382, 14581, 4846, 13, 51362], "temperature": 0.0, "avg_logprob": -0.21947385646678783, + "compression_ratio": 1.5818815331010454, "no_speech_prob": 0.0024948555510491133}, + {"id": 103, "seek": 41148, "start": 431.44, "end": 436.68, "text": " So this is + why they''re often used in technologies like Assistant or Alexa.", "tokens": [51362, + 407, 341, 307, 983, 436, 434, 2049, 1143, 294, 7943, 411, 14890, 420, 22595, 13, + 51624], "temperature": 0.0, "avg_logprob": -0.21947385646678783, "compression_ratio": + 1.5818815331010454, "no_speech_prob": 0.0024948555510491133}, {"id": 104, "seek": + 41148, "start": 436.68, "end": 440.08000000000004, "text": " Because people, when + they speak, it''s obviously much different than when you''re typing keywords", "tokens": + [51624, 1436, 561, 11, 562, 436, 1710, 11, 309, 311, 2745, 709, 819, 813, 562, 291, + 434, 18444, 21009, 51794], "temperature": 0.0, "avg_logprob": -0.21947385646678783, + "compression_ratio": 1.5818815331010454, "no_speech_prob": 0.0024948555510491133}, + {"id": 105, "seek": 44008, "start": 440.08, "end": 443.12, "text": " in a search + box with your keyboard.", "tokens": [50364, 294, 257, 3164, 2424, 365, 428, 10186, + 13, 50516], "temperature": 0.0, "avg_logprob": -0.22153741805279842, "compression_ratio": + 1.708185053380783, "no_speech_prob": 0.0015338828088715672}, {"id": 106, "seek": + 44008, "start": 443.12, "end": 447.03999999999996, "text": " But then also when + searching over natural language text like transcripts.", "tokens": [50516, 583, + 550, 611, 562, 10808, 670, 3303, 2856, 2487, 411, 24444, 82, 13, 50712], "temperature": + 0.0, "avg_logprob": -0.22153741805279842, "compression_ratio": 1.708185053380783, + "no_speech_prob": 0.0015338828088715672}, {"id": 107, "seek": 44008, "start": 447.03999999999996, + "end": 448.24, "text": " Yeah, absolutely.", 
"tokens": [50712, 865, 11, 3122, 13, + 50772], "temperature": 0.0, "avg_logprob": -0.22153741805279842, "compression_ratio": + 1.708185053380783, "no_speech_prob": 0.0015338828088715672}, {"id": 108, "seek": + 44008, "start": 448.24, "end": 453.03999999999996, "text": " And when you say neural + networks, you know, some of them, let''s say, vector database providers", "tokens": + [50772, 400, 562, 291, 584, 18161, 9590, 11, 291, 458, 11, 512, 295, 552, 11, 718, + 311, 584, 11, 8062, 8149, 11330, 51012], "temperature": 0.0, "avg_logprob": -0.22153741805279842, + "compression_ratio": 1.708185053380783, "no_speech_prob": 0.0015338828088715672}, + {"id": 109, "seek": 44008, "start": 453.03999999999996, "end": 457.71999999999997, + "text": " and vendors on the market, they give you sort of this machinery.", "tokens": + [51012, 293, 22056, 322, 264, 2142, 11, 436, 976, 291, 1333, 295, 341, 27302, 13, + 51246], "temperature": 0.0, "avg_logprob": -0.22153741805279842, "compression_ratio": + 1.708185053380783, "no_speech_prob": 0.0015338828088715672}, {"id": 110, "seek": + 44008, "start": 457.71999999999997, "end": 459.03999999999996, "text": " You can + plug in some models.", "tokens": [51246, 509, 393, 5452, 294, 512, 5245, 13, 51312], + "temperature": 0.0, "avg_logprob": -0.22153741805279842, "compression_ratio": 1.708185053380783, + "no_speech_prob": 0.0015338828088715672}, {"id": 111, "seek": 44008, "start": 459.03999999999996, + "end": 463.71999999999997, "text": " They also have some models available, let''s + say, from hugging face.", "tokens": [51312, 814, 611, 362, 512, 5245, 2435, 11, + 718, 311, 584, 11, 490, 41706, 1851, 13, 51546], "temperature": 0.0, "avg_logprob": + -0.22153741805279842, "compression_ratio": 1.708185053380783, "no_speech_prob": + 0.0015338828088715672}, {"id": 112, "seek": 44008, "start": 463.71999999999997, + "end": 469.56, "text": " In your case, in case of ZRI, are you innovating in this + space of creating this neural networks", 
"tokens": [51546, 682, 428, 1389, 11, 294, + 1389, 295, 1176, 5577, 11, 366, 291, 5083, 990, 294, 341, 1901, 295, 4084, 341, + 18161, 9590, 51838], "temperature": 0.0, "avg_logprob": -0.22153741805279842, "compression_ratio": + 1.708185053380783, "no_speech_prob": 0.0015338828088715672}, {"id": 113, "seek": + 46956, "start": 469.56, "end": 471.16, "text": " for your clients?", "tokens": [50364, + 337, 428, 6982, 30, 50444], "temperature": 0.0, "avg_logprob": -0.18582420349121093, + "compression_ratio": 1.6156716417910448, "no_speech_prob": 0.002171857515349984}, + {"id": 114, "seek": 46956, "start": 471.16, "end": 475.24, "text": " Yes, we are + approaching the problem holistically.", "tokens": [50444, 1079, 11, 321, 366, 14908, + 264, 1154, 4091, 20458, 13, 50648], "temperature": 0.0, "avg_logprob": -0.18582420349121093, + "compression_ratio": 1.6156716417910448, "no_speech_prob": 0.002171857515349984}, + {"id": 115, "seek": 46956, "start": 475.24, "end": 481.48, "text": " So we''re, + you know, the vector database is one critical component of a neural information", + "tokens": [50648, 407, 321, 434, 11, 291, 458, 11, 264, 8062, 8149, 307, 472, 4924, + 6542, 295, 257, 18161, 1589, 50960], "temperature": 0.0, "avg_logprob": -0.18582420349121093, + "compression_ratio": 1.6156716417910448, "no_speech_prob": 0.002171857515349984}, + {"id": 116, "seek": 46956, "start": 481.48, "end": 483.08, "text": " retrieval system.", + "tokens": [50960, 19817, 3337, 1185, 13, 51040], "temperature": 0.0, "avg_logprob": + -0.18582420349121093, "compression_ratio": 1.6156716417910448, "no_speech_prob": + 0.002171857515349984}, {"id": 117, "seek": 46956, "start": 483.08, "end": 488.56, + "text": " But there''s other pieces, for instance, like the re-ranking piece or + the neural network", "tokens": [51040, 583, 456, 311, 661, 3755, 11, 337, 5197, + 11, 411, 264, 319, 12, 20479, 278, 2522, 420, 264, 18161, 3209, 51314], "temperature": + 0.0, "avg_logprob": -0.18582420349121093, 
"compression_ratio": 1.6156716417910448, + "no_speech_prob": 0.002171857515349984}, {"id": 118, "seek": 46956, "start": 488.56, + "end": 490.2, "text": " that produces the embeddings.", "tokens": [51314, 300, 14725, + 264, 12240, 29432, 13, 51396], "temperature": 0.0, "avg_logprob": -0.18582420349121093, + "compression_ratio": 1.6156716417910448, "no_speech_prob": 0.002171857515349984}, + {"id": 119, "seek": 46956, "start": 490.2, "end": 492.76, "text": " And all of these + need to work in coordination and tandem.", "tokens": [51396, 400, 439, 295, 613, + 643, 281, 589, 294, 21252, 293, 48120, 13, 51524], "temperature": 0.0, "avg_logprob": + -0.18582420349121093, "compression_ratio": 1.6156716417910448, "no_speech_prob": + 0.002171857515349984}, {"id": 120, "seek": 46956, "start": 492.76, "end": 497.04, + "text": " Ideally, when they do, you can squeeze a lot more performance out of this + system.", "tokens": [51524, 40817, 11, 562, 436, 360, 11, 291, 393, 13578, 257, + 688, 544, 3389, 484, 295, 341, 1185, 13, 51738], "temperature": 0.0, "avg_logprob": + -0.18582420349121093, "compression_ratio": 1.6156716417910448, "no_speech_prob": + 0.002171857515349984}, {"id": 121, "seek": 49704, "start": 497.04, "end": 502.44, + "text": " So yes, our focus is on, we even handle data ingestion.", "tokens": [50364, + 407, 2086, 11, 527, 1879, 307, 322, 11, 321, 754, 4813, 1412, 3957, 31342, 13, 50634], + "temperature": 0.0, "avg_logprob": -0.17426842037994084, "compression_ratio": 1.5756302521008403, + "no_speech_prob": 0.0017174474196508527}, {"id": 122, "seek": 49704, "start": 502.44, + "end": 504.28000000000003, "text": " It''s not a big area of focus.", "tokens": + [50634, 467, 311, 406, 257, 955, 1859, 295, 1879, 13, 50726], "temperature": 0.0, + "avg_logprob": -0.17426842037994084, "compression_ratio": 1.5756302521008403, "no_speech_prob": + 0.0017174474196508527}, {"id": 123, "seek": 49704, "start": 504.28000000000003, + "end": 511.24, "text": " But the reality is that 
you have to make your experiences + as easy as possible for widespread", "tokens": [50726, 583, 264, 4103, 307, 300, + 291, 362, 281, 652, 428, 5235, 382, 1858, 382, 1944, 337, 22679, 51074], "temperature": + 0.0, "avg_logprob": -0.17426842037994084, "compression_ratio": 1.5756302521008403, + "no_speech_prob": 0.0017174474196508527}, {"id": 124, "seek": 49704, "start": 511.24, + "end": 512.9200000000001, "text": " adoption, I think.", "tokens": [51074, 19215, + 11, 286, 519, 13, 51158], "temperature": 0.0, "avg_logprob": -0.17426842037994084, + "compression_ratio": 1.5756302521008403, "no_speech_prob": 0.0017174474196508527}, + {"id": 125, "seek": 49704, "start": 512.9200000000001, "end": 519.64, "text": " + So we allow our customers to just shovel in, you know, PDF documents and all kinds + of", "tokens": [51158, 407, 321, 2089, 527, 4581, 281, 445, 29789, 294, 11, 291, + 458, 11, 17752, 8512, 293, 439, 3685, 295, 51494], "temperature": 0.0, "avg_logprob": + -0.17426842037994084, "compression_ratio": 1.5756302521008403, "no_speech_prob": + 0.0017174474196508527}, {"id": 126, "seek": 49704, "start": 519.64, "end": 521.04, + "text": " other formats.", "tokens": [51494, 661, 25879, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.17426842037994084, "compression_ratio": 1.5756302521008403, + "no_speech_prob": 0.0017174474196508527}, {"id": 127, "seek": 49704, "start": 521.04, + "end": 522.52, "text": " We perform the text extraction.", "tokens": [51564, 492, + 2042, 264, 2487, 30197, 13, 51638], "temperature": 0.0, "avg_logprob": -0.17426842037994084, + "compression_ratio": 1.5756302521008403, "no_speech_prob": 0.0017174474196508527}, + {"id": 128, "seek": 49704, "start": 522.52, "end": 524.48, "text": " We perform + the segmentation of the document.", "tokens": [51638, 492, 2042, 264, 9469, 399, + 295, 264, 4166, 13, 51736], "temperature": 0.0, "avg_logprob": -0.17426842037994084, + "compression_ratio": 1.5756302521008403, "no_speech_prob": 0.0017174474196508527}, 
+ {"id": 129, "seek": 52448, "start": 524.48, "end": 529.32, "text": " And we actually + do the encoding with the neural network, build the vector database and then", "tokens": + [50364, 400, 321, 767, 360, 264, 43430, 365, 264, 18161, 3209, 11, 1322, 264, 8062, + 8149, 293, 550, 50606], "temperature": 0.0, "avg_logprob": -0.22864268596907308, + "compression_ratio": 1.68259385665529, "no_speech_prob": 0.020112542435526848}, + {"id": 130, "seek": 52448, "start": 529.32, "end": 531.24, "text": " handle the + serving as well.", "tokens": [50606, 4813, 264, 8148, 382, 731, 13, 50702], "temperature": + 0.0, "avg_logprob": -0.22864268596907308, "compression_ratio": 1.68259385665529, + "no_speech_prob": 0.020112542435526848}, {"id": 131, "seek": 52448, "start": 531.24, + "end": 533.72, "text": " Yeah, so it sounds like an all-around solution.", "tokens": + [50702, 865, 11, 370, 309, 3263, 411, 364, 439, 12, 25762, 3827, 13, 50826], "temperature": + 0.0, "avg_logprob": -0.22864268596907308, "compression_ratio": 1.68259385665529, + "no_speech_prob": 0.020112542435526848}, {"id": 132, "seek": 52448, "start": 533.72, + "end": 537.8000000000001, "text": " And I mean, it''s very typical, you know, in + some sense kind of to bring some algorithm", "tokens": [50826, 400, 286, 914, 11, + 309, 311, 588, 7476, 11, 291, 458, 11, 294, 512, 2020, 733, 295, 281, 1565, 512, + 9284, 51030], "temperature": 0.0, "avg_logprob": -0.22864268596907308, "compression_ratio": + 1.68259385665529, "no_speech_prob": 0.020112542435526848}, {"id": 133, "seek": 52448, + "start": 537.8000000000001, "end": 541.12, "text": " or some idea to the market, + but like it doesn''t have any connectors.", "tokens": [51030, 420, 512, 1558, 281, + 264, 2142, 11, 457, 411, 309, 1177, 380, 362, 604, 31865, 13, 51196], "temperature": + 0.0, "avg_logprob": -0.22864268596907308, "compression_ratio": 1.68259385665529, + "no_speech_prob": 0.020112542435526848}, {"id": 134, "seek": 52448, "start": 541.12, + "end": 542.84, 
"text": " Okay, how do I feed data into it?", "tokens": [51196, 1033, + 11, 577, 360, 286, 3154, 1412, 666, 309, 30, 51282], "temperature": 0.0, "avg_logprob": + -0.22864268596907308, "compression_ratio": 1.68259385665529, "no_speech_prob": 0.020112542435526848}, + {"id": 135, "seek": 52448, "start": 542.84, "end": 545.52, "text": " Or maybe there + is like a simple demo.", "tokens": [51282, 1610, 1310, 456, 307, 411, 257, 2199, + 10723, 13, 51416], "temperature": 0.0, "avg_logprob": -0.22864268596907308, "compression_ratio": + 1.68259385665529, "no_speech_prob": 0.020112542435526848}, {"id": 136, "seek": 52448, + "start": 545.52, "end": 547.76, "text": " And yeah, nothing beyond that.", "tokens": + [51416, 400, 1338, 11, 1825, 4399, 300, 13, 51528], "temperature": 0.0, "avg_logprob": + -0.22864268596907308, "compression_ratio": 1.68259385665529, "no_speech_prob": 0.020112542435526848}, + {"id": 137, "seek": 52448, "start": 547.76, "end": 553.44, "text": " But it sounds + like you are taking the kind of all-around approach.", "tokens": [51528, 583, 309, + 3263, 411, 291, 366, 1940, 264, 733, 295, 439, 12, 25762, 3109, 13, 51812], "temperature": + 0.0, "avg_logprob": -0.22864268596907308, "compression_ratio": 1.68259385665529, + "no_speech_prob": 0.020112542435526848}, {"id": 138, "seek": 55344, "start": 553.44, + "end": 560.1600000000001, "text": " And have you been looking to implement everything + yourself or are you also kind of reusing some", "tokens": [50364, 400, 362, 291, + 668, 1237, 281, 4445, 1203, 1803, 420, 366, 291, 611, 733, 295, 319, 7981, 512, + 50700], "temperature": 0.0, "avg_logprob": -0.2266206643016068, "compression_ratio": + 1.6, "no_speech_prob": 0.0018445164896547794}, {"id": 139, "seek": 55344, "start": + 560.1600000000001, "end": 566.6800000000001, "text": " of the open source pipelines, + you know, like for example, for embedding or for document", "tokens": [50700, 295, + 264, 1269, 4009, 40168, 11, 291, 458, 11, 411, 337, 1365, 11, 337, 
12240, 3584, + 420, 337, 4166, 51026], "temperature": 0.0, "avg_logprob": -0.2266206643016068, + "compression_ratio": 1.6, "no_speech_prob": 0.0018445164896547794}, {"id": 140, + "seek": 55344, "start": 566.6800000000001, "end": 569.36, "text": " conversions + and so on?", "tokens": [51026, 42256, 293, 370, 322, 30, 51160], "temperature": + 0.0, "avg_logprob": -0.2266206643016068, "compression_ratio": 1.6, "no_speech_prob": + 0.0018445164896547794}, {"id": 141, "seek": 55344, "start": 569.36, "end": 577.12, + "text": " Yeah, we are using open source as much as we can and where we think it + makes sense.", "tokens": [51160, 865, 11, 321, 366, 1228, 1269, 4009, 382, 709, + 382, 321, 393, 293, 689, 321, 519, 309, 1669, 2020, 13, 51548], "temperature": 0.0, + "avg_logprob": -0.2266206643016068, "compression_ratio": 1.6, "no_speech_prob": + 0.0018445164896547794}, {"id": 142, "seek": 55344, "start": 577.12, "end": 581.84, + "text": " So for instance, for content extraction, there''s a Pashitika, which is + a very good framework.", "tokens": [51548, 407, 337, 5197, 11, 337, 2701, 30197, + 11, 456, 311, 257, 430, 1299, 270, 5439, 11, 597, 307, 257, 588, 665, 8388, 13, + 51784], "temperature": 0.0, "avg_logprob": -0.2266206643016068, "compression_ratio": + 1.6, "no_speech_prob": 0.0018445164896547794}, {"id": 143, "seek": 58184, "start": + 581.84, "end": 586.5600000000001, "text": " But then there are certain document + types for which there are better alternatives out", "tokens": [50364, 583, 550, + 456, 366, 1629, 4166, 3467, 337, 597, 456, 366, 1101, 20478, 484, 50600], "temperature": + 0.0, "avg_logprob": -0.2829961858244024, "compression_ratio": 1.7132075471698114, + "no_speech_prob": 0.0045359134674072266}, {"id": 144, "seek": 58184, "start": 586.5600000000001, + "end": 587.5600000000001, "text": " there.", "tokens": [50600, 456, 13, 50650], + "temperature": 0.0, "avg_logprob": -0.2829961858244024, "compression_ratio": 1.7132075471698114, + "no_speech_prob": 
0.0045359134674072266}, {"id": 145, "seek": 58184, "start": 587.5600000000001, + "end": 592.6, "text": " And, you know, we''ve had certain customers for which PDF + extraction, for instance, was a priority.", "tokens": [50650, 400, 11, 291, 458, + 11, 321, 600, 632, 1629, 4581, 337, 597, 17752, 30197, 11, 337, 5197, 11, 390, 257, + 9365, 13, 50902], "temperature": 0.0, "avg_logprob": -0.2829961858244024, "compression_ratio": + 1.7132075471698114, "no_speech_prob": 0.0045359134674072266}, {"id": 146, "seek": + 58184, "start": 592.6, "end": 597.32, "text": " And we discovered some shortfalls + with Tick-N-We went and we researched and found there''s better", "tokens": [50902, + 400, 321, 6941, 512, 2099, 18542, 365, 314, 618, 12, 45, 12, 4360, 1437, 293, 321, + 37098, 293, 1352, 456, 311, 1101, 51138], "temperature": 0.0, "avg_logprob": -0.2829961858244024, + "compression_ratio": 1.7132075471698114, "no_speech_prob": 0.0045359134674072266}, + {"id": 147, "seek": 58184, "start": 597.32, "end": 598.32, "text": " alternatives + out there.", "tokens": [51138, 20478, 484, 456, 13, 51188], "temperature": 0.0, + "avg_logprob": -0.2829961858244024, "compression_ratio": 1.7132075471698114, "no_speech_prob": + 0.0045359134674072266}, {"id": 148, "seek": 58184, "start": 598.32, "end": 599.8000000000001, + "text": " And so we''ve got those implemented.", "tokens": [51188, 400, 370, 321, + 600, 658, 729, 12270, 13, 51262], "temperature": 0.0, "avg_logprob": -0.2829961858244024, + "compression_ratio": 1.7132075471698114, "no_speech_prob": 0.0045359134674072266}, + {"id": 149, "seek": 58184, "start": 599.8000000000001, "end": 602.32, "text": " + But we didn''t write a PDF extractor from scratch, obviously.", "tokens": [51262, + 583, 321, 994, 380, 2464, 257, 17752, 8947, 284, 490, 8459, 11, 2745, 13, 51388], + "temperature": 0.0, "avg_logprob": -0.2829961858244024, "compression_ratio": 1.7132075471698114, + "no_speech_prob": 0.0045359134674072266}, {"id": 150, "seek": 58184, 
"start": 602.32, + "end": 605.5600000000001, "text": " That''s too much for a two-man company to do.", + "tokens": [51388, 663, 311, 886, 709, 337, 257, 732, 12, 1601, 2237, 281, 360, 13, + 51550], "temperature": 0.0, "avg_logprob": -0.2829961858244024, "compression_ratio": + 1.7132075471698114, "no_speech_prob": 0.0045359134674072266}, {"id": 151, "seek": + 60556, "start": 605.56, "end": 612.56, "text": " So yeah, we''re trying to really + combine the best of breed in every area and create a cohesive", "tokens": [50364, + 407, 1338, 11, 321, 434, 1382, 281, 534, 10432, 264, 1151, 295, 18971, 294, 633, + 1859, 293, 1884, 257, 43025, 50714], "temperature": 0.0, "avg_logprob": -0.21462676308371803, + "compression_ratio": 1.6090225563909775, "no_speech_prob": 0.08632315695285797}, + {"id": 152, "seek": 60556, "start": 612.56, "end": 617.28, "text": " system that + just works out of the box quite well for a broad range of use cases.", "tokens": + [50714, 1185, 300, 445, 1985, 484, 295, 264, 2424, 1596, 731, 337, 257, 4152, 3613, + 295, 764, 3331, 13, 50950], "temperature": 0.0, "avg_logprob": -0.21462676308371803, + "compression_ratio": 1.6090225563909775, "no_speech_prob": 0.08632315695285797}, + {"id": 153, "seek": 60556, "start": 617.28, "end": 618.76, "text": " Oh yeah, that''s + awesome.", "tokens": [50950, 876, 1338, 11, 300, 311, 3476, 13, 51024], "temperature": + 0.0, "avg_logprob": -0.21462676308371803, "compression_ratio": 1.6090225563909775, + "no_speech_prob": 0.08632315695285797}, {"id": 154, "seek": 60556, "start": 618.76, + "end": 625.16, "text": " And it''s also great to hear that you reuse open source + software, you know, at least initially", "tokens": [51024, 400, 309, 311, 611, 869, + 281, 1568, 300, 291, 26225, 1269, 4009, 4722, 11, 291, 458, 11, 412, 1935, 9105, + 51344], "temperature": 0.0, "avg_logprob": -0.21462676308371803, "compression_ratio": + 1.6090225563909775, "no_speech_prob": 0.08632315695285797}, {"id": 155, "seek": + 60556, "start": 
625.16, "end": 628.5999999999999, "text": " or maybe you can find + two minutes, so to say.", "tokens": [51344, 420, 1310, 291, 393, 915, 732, 2077, + 11, 370, 281, 584, 13, 51516], "temperature": 0.0, "avg_logprob": -0.21462676308371803, + "compression_ratio": 1.6090225563909775, "no_speech_prob": 0.08632315695285797}, + {"id": 156, "seek": 60556, "start": 628.5999999999999, "end": 633.68, "text": " + But yeah, I mean, also that''s amazing because you can quickly kind of build your + product", "tokens": [51516, 583, 1338, 11, 286, 914, 11, 611, 300, 311, 2243, 570, + 291, 393, 2661, 733, 295, 1322, 428, 1674, 51770], "temperature": 0.0, "avg_logprob": + -0.21462676308371803, "compression_ratio": 1.6090225563909775, "no_speech_prob": + 0.08632315695285797}, {"id": 157, "seek": 63368, "start": 633.68, "end": 635.68, + "text": " and focus on the goal.", "tokens": [50364, 293, 1879, 322, 264, 3387, + 13, 50464], "temperature": 0.0, "avg_logprob": -0.26846617519265353, "compression_ratio": + 1.5919282511210762, "no_speech_prob": 0.012865211814641953}, {"id": 158, "seek": + 63368, "start": 635.68, "end": 641.9599999999999, "text": " Yeah, and now that we + approached this more kind of closely, can you actually describe what", "tokens": + [50464, 865, 11, 293, 586, 300, 321, 17247, 341, 544, 733, 295, 8185, 11, 393, 291, + 767, 6786, 437, 50778], "temperature": 0.0, "avg_logprob": -0.26846617519265353, + "compression_ratio": 1.5919282511210762, "no_speech_prob": 0.012865211814641953}, + {"id": 159, "seek": 63368, "start": 641.9599999999999, "end": 643.4799999999999, + "text": " is your product today?", "tokens": [50778, 307, 428, 1674, 965, 30, 50854], + "temperature": 0.0, "avg_logprob": -0.26846617519265353, "compression_ratio": 1.5919282511210762, + "no_speech_prob": 0.012865211814641953}, {"id": 160, "seek": 63368, "start": 643.4799999999999, + "end": 645.4, "text": " So as a client, what can I get?", "tokens": [50854, 407, + 382, 257, 6423, 11, 437, 393, 286, 483, 
30, 50950], "temperature": 0.0, "avg_logprob": + -0.26846617519265353, "compression_ratio": 1.5919282511210762, "no_speech_prob": + 0.012865211814641953}, {"id": 161, "seek": 63368, "start": 645.4, "end": 647.7199999999999, + "text": " What can I, what kind of support you also provide?", "tokens": [50950, + 708, 393, 286, 11, 437, 733, 295, 1406, 291, 611, 2893, 30, 51066], "temperature": + 0.0, "avg_logprob": -0.26846617519265353, "compression_ratio": 1.5919282511210762, + "no_speech_prob": 0.012865211814641953}, {"id": 162, "seek": 63368, "start": 647.7199999999999, + "end": 650.52, "text": " But first, can you start with the product itself?", "tokens": + [51066, 583, 700, 11, 393, 291, 722, 365, 264, 1674, 2564, 30, 51206], "temperature": + 0.0, "avg_logprob": -0.26846617519265353, "compression_ratio": 1.5919282511210762, + "no_speech_prob": 0.012865211814641953}, {"id": 163, "seek": 63368, "start": 650.52, + "end": 652.3199999999999, "text": " Yes.", "tokens": [51206, 1079, 13, 51296], "temperature": + 0.0, "avg_logprob": -0.26846617519265353, "compression_ratio": 1.5919282511210762, + "no_speech_prob": 0.012865211814641953}, {"id": 164, "seek": 63368, "start": 652.3199999999999, + "end": 658.24, "text": " So to describe it abstractly, and then I''ll explain very + concretely what I mean.", "tokens": [51296, 407, 281, 6786, 309, 12649, 356, 11, + 293, 550, 286, 603, 2903, 588, 39481, 736, 437, 286, 914, 13, 51592], "temperature": + 0.0, "avg_logprob": -0.26846617519265353, "compression_ratio": 1.5919282511210762, + "no_speech_prob": 0.012865211814641953}, {"id": 165, "seek": 65824, "start": 658.24, + "end": 664.5600000000001, "text": " I would say that we''re a cloud platform as + a service for text retrieval or text search.", "tokens": [50364, 286, 576, 584, + 300, 321, 434, 257, 4588, 3663, 382, 257, 2643, 337, 2487, 19817, 3337, 420, 2487, + 3164, 13, 50680], "temperature": 0.0, "avg_logprob": -0.1612650916689918, "compression_ratio": + 1.76, 
"no_speech_prob": 0.11757352203130722}, {"id": 166, "seek": 65824, "start": + 664.5600000000001, "end": 671.48, "text": " So the way it looks is we have two main + APIs, one for indexing content and the other for", "tokens": [50680, 407, 264, 636, + 309, 1542, 307, 321, 362, 732, 2135, 21445, 11, 472, 337, 8186, 278, 2701, 293, + 264, 661, 337, 51026], "temperature": 0.0, "avg_logprob": -0.1612650916689918, "compression_ratio": + 1.76, "no_speech_prob": 0.11757352203130722}, {"id": 167, "seek": 65824, "start": + 671.48, "end": 673.24, "text": " running queries on the content.", "tokens": [51026, + 2614, 24109, 322, 264, 2701, 13, 51114], "temperature": 0.0, "avg_logprob": -0.1612650916689918, + "compression_ratio": 1.76, "no_speech_prob": 0.11757352203130722}, {"id": 168, "seek": + 65824, "start": 673.24, "end": 677.4, "text": " So an organization would come and + they would index a large amount of content.", "tokens": [51114, 407, 364, 4475, + 576, 808, 293, 436, 576, 8186, 257, 2416, 2372, 295, 2701, 13, 51322], "temperature": + 0.0, "avg_logprob": -0.1612650916689918, "compression_ratio": 1.76, "no_speech_prob": + 0.11757352203130722}, {"id": 169, "seek": 65824, "start": 677.4, "end": 682.6800000000001, + "text": " They might index periodically or incrementally as well over time.", "tokens": + [51322, 814, 1062, 8186, 38916, 420, 26200, 379, 382, 731, 670, 565, 13, 51586], + "temperature": 0.0, "avg_logprob": -0.1612650916689918, "compression_ratio": 1.76, + "no_speech_prob": 0.11757352203130722}, {"id": 170, "seek": 65824, "start": 682.6800000000001, + "end": 687.64, "text": " And this would accrete in an index and then subsequently + they would come and they would", "tokens": [51586, 400, 341, 576, 1317, 7600, 294, + 364, 8186, 293, 550, 26514, 436, 576, 808, 293, 436, 576, 51834], "temperature": + 0.0, "avg_logprob": -0.1612650916689918, "compression_ratio": 1.76, "no_speech_prob": + 0.11757352203130722}, {"id": 171, "seek": 68764, "start": 687.64, "end": 
693.16, + "text": " run generally natural language text queries against that corpus and we + would return the", "tokens": [50364, 1190, 5101, 3303, 2856, 2487, 24109, 1970, + 300, 1181, 31624, 293, 321, 576, 2736, 264, 50640], "temperature": 0.0, "avg_logprob": + -0.16679914961469935, "compression_ratio": 1.61864406779661, "no_speech_prob": 0.0009590925765223801}, + {"id": 172, "seek": 68764, "start": 693.16, "end": 695.4, "text": " best matches.", + "tokens": [50640, 1151, 10676, 13, 50752], "temperature": 0.0, "avg_logprob": -0.16679914961469935, + "compression_ratio": 1.61864406779661, "no_speech_prob": 0.0009590925765223801}, + {"id": 173, "seek": 68764, "start": 695.4, "end": 700.84, "text": " So what we actually + provide and how that looks on our platform.", "tokens": [50752, 407, 437, 321, 767, + 2893, 293, 577, 300, 1542, 322, 527, 3663, 13, 51024], "temperature": 0.0, "avg_logprob": + -0.16679914961469935, "compression_ratio": 1.61864406779661, "no_speech_prob": 0.0009590925765223801}, + {"id": 174, "seek": 68764, "start": 700.84, "end": 705.36, "text": " So we, you + essentially, you know, you come and you sign up just the way you would sign", "tokens": + [51024, 407, 321, 11, 291, 4476, 11, 291, 458, 11, 291, 808, 293, 291, 1465, 493, + 445, 264, 636, 291, 576, 1465, 51250], "temperature": 0.0, "avg_logprob": -0.16679914961469935, + "compression_ratio": 1.61864406779661, "no_speech_prob": 0.0009590925765223801}, + {"id": 175, "seek": 68764, "start": 705.36, "end": 711.68, "text": " up for an AWS + account, you''re dropped into an admin console.", "tokens": [51250, 493, 337, 364, + 17650, 2696, 11, 291, 434, 8119, 666, 364, 24236, 11076, 13, 51566], "temperature": + 0.0, "avg_logprob": -0.16679914961469935, "compression_ratio": 1.61864406779661, + "no_speech_prob": 0.0009590925765223801}, {"id": 176, "seek": 68764, "start": 711.68, + "end": 714.48, "text": " Everything you can do in the admin console can be done + through APIs.", "tokens": [51566, 5471, 
291, 393, 360, 294, 264, 24236, 11076, 393, + 312, 1096, 807, 21445, 13, 51706], "temperature": 0.0, "avg_logprob": -0.16679914961469935, + "compression_ratio": 1.61864406779661, "no_speech_prob": 0.0009590925765223801}, + {"id": 177, "seek": 71448, "start": 714.48, "end": 717.88, "text": " We''re basically + focused on again, a platform.", "tokens": [50364, 492, 434, 1936, 5178, 322, 797, + 11, 257, 3663, 13, 50534], "temperature": 0.0, "avg_logprob": -0.20046348571777345, + "compression_ratio": 1.6676923076923076, "no_speech_prob": 0.13901077210903168}, + {"id": 178, "seek": 71448, "start": 717.88, "end": 720.8000000000001, "text": " + So we''re accessible through GRPC and rest.", "tokens": [50534, 407, 321, 434, 9515, + 807, 10903, 12986, 293, 1472, 13, 50680], "temperature": 0.0, "avg_logprob": -0.20046348571777345, + "compression_ratio": 1.6676923076923076, "no_speech_prob": 0.13901077210903168}, + {"id": 179, "seek": 71448, "start": 720.8000000000001, "end": 725.0, "text": " The + platform, the console is basically to allow you to, you know, point and click and + quickly", "tokens": [50680, 440, 3663, 11, 264, 11076, 307, 1936, 281, 2089, 291, + 281, 11, 291, 458, 11, 935, 293, 2052, 293, 2661, 50890], "temperature": 0.0, "avg_logprob": + -0.20046348571777345, "compression_ratio": 1.6676923076923076, "no_speech_prob": + 0.13901077210903168}, {"id": 180, "seek": 71448, "start": 725.0, "end": 727.32, + "text": " experiment and discover the value of the system.", "tokens": [50890, 5120, + 293, 4411, 264, 2158, 295, 264, 1185, 13, 51006], "temperature": 0.0, "avg_logprob": + -0.20046348571777345, "compression_ratio": 1.6676923076923076, "no_speech_prob": + 0.13901077210903168}, {"id": 181, "seek": 71448, "start": 727.32, "end": 732.9200000000001, + "text": " Because our vision was that within, within 15 to 30 minutes, someone from + an organization", "tokens": [51006, 1436, 527, 5201, 390, 300, 1951, 11, 1951, 2119, + 281, 2217, 2077, 11, 1580, 490, 364, 4475, 
51286], "temperature": 0.0, "avg_logprob": + -0.20046348571777345, "compression_ratio": 1.6676923076923076, "no_speech_prob": + 0.13901077210903168}, {"id": 182, "seek": 71448, "start": 732.9200000000001, "end": + 736.32, "text": " should be able to come, drop their documents into the system and + determine whether or not", "tokens": [51286, 820, 312, 1075, 281, 808, 11, 3270, + 641, 8512, 666, 264, 1185, 293, 6997, 1968, 420, 406, 51456], "temperature": 0.0, + "avg_logprob": -0.20046348571777345, "compression_ratio": 1.6676923076923076, "no_speech_prob": + 0.13901077210903168}, {"id": 183, "seek": 71448, "start": 736.32, "end": 738.08, + "text": " it''s even going to meet their needs.", "tokens": [51456, 309, 311, 754, + 516, 281, 1677, 641, 2203, 13, 51544], "temperature": 0.0, "avg_logprob": -0.20046348571777345, + "compression_ratio": 1.6676923076923076, "no_speech_prob": 0.13901077210903168}, + {"id": 184, "seek": 71448, "start": 738.08, "end": 743.8000000000001, "text": " + And then if it does, they can consult the documentation and learn how to use the + APIs and get", "tokens": [51544, 400, 550, 498, 309, 775, 11, 436, 393, 7189, 264, + 14333, 293, 1466, 577, 281, 764, 264, 21445, 293, 483, 51830], "temperature": 0.0, + "avg_logprob": -0.20046348571777345, "compression_ratio": 1.6676923076923076, "no_speech_prob": + 0.13901077210903168}, {"id": 185, "seek": 74380, "start": 743.8, "end": 745.64, + "text": " a proper integration going.", "tokens": [50364, 257, 2296, 10980, 516, + 13, 50456], "temperature": 0.0, "avg_logprob": -0.17702918333165785, "compression_ratio": + 1.6074766355140186, "no_speech_prob": 0.0005752904107794166}, {"id": 186, "seek": + 74380, "start": 745.64, "end": 750.68, "text": " So we organize collections of documents + into what are called corpora.", "tokens": [50456, 407, 321, 13859, 16641, 295, 8512, + 666, 437, 366, 1219, 6804, 64, 13, 50708], "temperature": 0.0, "avg_logprob": -0.17702918333165785, + "compression_ratio": 
1.6074766355140186, "no_speech_prob": 0.0005752904107794166}, + {"id": 187, "seek": 74380, "start": 750.68, "end": 754.24, "text": " So one corpus + is essentially, it''s a customer defined entity.", "tokens": [50708, 407, 472, 1181, + 31624, 307, 4476, 11, 309, 311, 257, 5474, 7642, 13977, 13, 50886], "temperature": + 0.0, "avg_logprob": -0.17702918333165785, "compression_ratio": 1.6074766355140186, + "no_speech_prob": 0.0005752904107794166}, {"id": 188, "seek": 74380, "start": 754.24, + "end": 760.28, "text": " It groups related documents that they want to search together + as a unit.", "tokens": [50886, 467, 3935, 4077, 8512, 300, 436, 528, 281, 3164, + 1214, 382, 257, 4985, 13, 51188], "temperature": 0.0, "avg_logprob": -0.17702918333165785, + "compression_ratio": 1.6074766355140186, "no_speech_prob": 0.0005752904107794166}, + {"id": 189, "seek": 74380, "start": 760.28, "end": 767.0, "text": " We allow, you + know, the customer to define any number of corpora, there''s limits depending", + "tokens": [51188, 492, 2089, 11, 291, 458, 11, 264, 5474, 281, 6964, 604, 1230, + 295, 6804, 64, 11, 456, 311, 10406, 5413, 51524], "temperature": 0.0, "avg_logprob": + -0.17702918333165785, "compression_ratio": 1.6074766355140186, "no_speech_prob": + 0.0005752904107794166}, {"id": 190, "seek": 74380, "start": 767.0, "end": 769.16, + "text": " on the account type.", "tokens": [51524, 322, 264, 2696, 2010, 13, 51632], + "temperature": 0.0, "avg_logprob": -0.17702918333165785, "compression_ratio": 1.6074766355140186, + "no_speech_prob": 0.0005752904107794166}, {"id": 191, "seek": 76916, "start": 769.16, + "end": 775.64, "text": " And then you can essentially drag and drop the documents + into the web browser, into the", "tokens": [50364, 400, 550, 291, 393, 4476, 5286, + 293, 3270, 264, 8512, 666, 264, 3670, 11185, 11, 666, 264, 50688], "temperature": + 0.0, "avg_logprob": -0.21588961791992187, "compression_ratio": 1.7798507462686568, + "no_speech_prob": 0.0041803279891610146}, 
{"id": 192, "seek": 76916, "start": 775.64, + "end": 776.64, "text": " corpus upload.", "tokens": [50688, 1181, 31624, 6580, 13, + 50738], "temperature": 0.0, "avg_logprob": -0.21588961791992187, "compression_ratio": + 1.7798507462686568, "no_speech_prob": 0.0041803279891610146}, {"id": 193, "seek": + 76916, "start": 776.64, "end": 779.4399999999999, "text": " It''ll be, there''s + about a seven minute latency.", "tokens": [50738, 467, 603, 312, 11, 456, 311, 466, + 257, 3407, 3456, 27043, 13, 50878], "temperature": 0.0, "avg_logprob": -0.21588961791992187, + "compression_ratio": 1.7798507462686568, "no_speech_prob": 0.0041803279891610146}, + {"id": 194, "seek": 76916, "start": 779.4399999999999, "end": 780.76, "text": " + And then you can start running queries.", "tokens": [50878, 400, 550, 291, 393, + 722, 2614, 24109, 13, 50944], "temperature": 0.0, "avg_logprob": -0.21588961791992187, + "compression_ratio": 1.7798507462686568, "no_speech_prob": 0.0041803279891610146}, + {"id": 195, "seek": 76916, "start": 780.76, "end": 785.24, "text": " And when you + run, we have a hosted UI that makes it easy to see the results kind of on", "tokens": + [50944, 400, 562, 291, 1190, 11, 321, 362, 257, 19204, 15682, 300, 1669, 309, 1858, + 281, 536, 264, 3542, 733, 295, 322, 51168], "temperature": 0.0, "avg_logprob": -0.21588961791992187, + "compression_ratio": 1.7798507462686568, "no_speech_prob": 0.0041803279891610146}, + {"id": 196, "seek": 76916, "start": 785.24, "end": 786.9599999999999, "text": " + the spot in the browser.", "tokens": [51168, 264, 4008, 294, 264, 11185, 13, 51254], + "temperature": 0.0, "avg_logprob": -0.21588961791992187, "compression_ratio": 1.7798507462686568, + "no_speech_prob": 0.0041803279891610146}, {"id": 197, "seek": 76916, "start": 786.9599999999999, + "end": 791.48, "text": " But when you run queries through our interface, you also + have our, our, our API is, you also", "tokens": [51254, 583, 562, 291, 1190, 24109, + 807, 527, 9226, 11, 291, 
611, 362, 527, 11, 527, 11, 527, 9362, 307, 11, 291, 611, + 51480], "temperature": 0.0, "avg_logprob": -0.21588961791992187, "compression_ratio": + 1.7798507462686568, "no_speech_prob": 0.0041803279891610146}, {"id": 198, "seek": + 76916, "start": 791.48, "end": 798.1999999999999, "text": " have the ability to + run one query against multiple corpora and merge the results.", "tokens": [51480, + 362, 264, 3485, 281, 1190, 472, 14581, 1970, 3866, 1181, 79, 3252, 293, 22183, 264, + 3542, 13, 51816], "temperature": 0.0, "avg_logprob": -0.21588961791992187, "compression_ratio": + 1.7798507462686568, "no_speech_prob": 0.0041803279891610146}, {"id": 199, "seek": + 79820, "start": 798.2, "end": 805.6, "text": " So we also support the ability to + attach metadata as your indexing content, attach metadata", "tokens": [50364, 407, + 321, 611, 1406, 264, 3485, 281, 5085, 26603, 382, 428, 8186, 278, 2701, 11, 5085, + 26603, 50734], "temperature": 0.0, "avg_logprob": -0.21361607638272373, "compression_ratio": + 1.6784313725490196, "no_speech_prob": 0.0029648279305547476}, {"id": 200, "seek": + 79820, "start": 805.6, "end": 809.8000000000001, "text": " that then is returned + to you in the search results.", "tokens": [50734, 300, 550, 307, 8752, 281, 291, + 294, 264, 3164, 3542, 13, 50944], "temperature": 0.0, "avg_logprob": -0.21361607638272373, + "compression_ratio": 1.6784313725490196, "no_speech_prob": 0.0029648279305547476}, + {"id": 201, "seek": 79820, "start": 809.8000000000001, "end": 815.0400000000001, + "text": " So that would allow you to, to join to let''s say another system on your + end.", "tokens": [50944, 407, 300, 576, 2089, 291, 281, 11, 281, 3917, 281, 718, + 311, 584, 1071, 1185, 322, 428, 917, 13, 51206], "temperature": 0.0, "avg_logprob": + -0.21361607638272373, "compression_ratio": 1.6784313725490196, "no_speech_prob": + 0.0029648279305547476}, {"id": 202, "seek": 79820, "start": 815.0400000000001, "end": + 818.0, "text": " But those are, those are some of 
the features that we provide.", + "tokens": [51206, 583, 729, 366, 11, 729, 366, 512, 295, 264, 4122, 300, 321, 2893, + 13, 51354], "temperature": 0.0, "avg_logprob": -0.21361607638272373, "compression_ratio": + 1.6784313725490196, "no_speech_prob": 0.0029648279305547476}, {"id": 203, "seek": + 79820, "start": 818.0, "end": 819.0, "text": " Yeah.", "tokens": [51354, 865, 13, + 51404], "temperature": 0.0, "avg_logprob": -0.21361607638272373, "compression_ratio": + 1.6784313725490196, "no_speech_prob": 0.0029648279305547476}, {"id": 204, "seek": + 79820, "start": 819.0, "end": 821.84, "text": " So it sounds like it''s a self service + system, right?", "tokens": [51404, 407, 309, 3263, 411, 309, 311, 257, 2698, 2643, + 1185, 11, 558, 30, 51546], "temperature": 0.0, "avg_logprob": -0.21361607638272373, + "compression_ratio": 1.6784313725490196, "no_speech_prob": 0.0029648279305547476}, + {"id": 205, "seek": 79820, "start": 821.84, "end": 827.6800000000001, "text": " + And so if I was a client of yours, I could like get a subscription trial subscription", + "tokens": [51546, 400, 370, 498, 286, 390, 257, 6423, 295, 6342, 11, 286, 727, 411, + 483, 257, 17231, 7308, 17231, 51838], "temperature": 0.0, "avg_logprob": -0.21361607638272373, + "compression_ratio": 1.6784313725490196, "no_speech_prob": 0.0029648279305547476}, + {"id": 206, "seek": 82768, "start": 827.68, "end": 832.16, "text": " maybe then + upload my document corpus.", "tokens": [50364, 1310, 550, 6580, 452, 4166, 1181, + 31624, 13, 50588], "temperature": 0.0, "avg_logprob": -0.2156848714809225, "compression_ratio": + 1.524229074889868, "no_speech_prob": 0.0041611953638494015}, {"id": 207, "seek": + 82768, "start": 832.16, "end": 834.76, "text": " How big a corpus could I upload + on a trial?", "tokens": [50588, 1012, 955, 257, 1181, 31624, 727, 286, 6580, 322, + 257, 7308, 30, 50718], "temperature": 0.0, "avg_logprob": -0.2156848714809225, "compression_ratio": + 1.524229074889868, "no_speech_prob": 
0.0041611953638494015}, {"id": 208, "seek": + 82768, "start": 834.76, "end": 838.7199999999999, "text": " Do you have any limitation + there at this point?", "tokens": [50718, 1144, 291, 362, 604, 27432, 456, 412, 341, + 935, 30, 50916], "temperature": 0.0, "avg_logprob": -0.2156848714809225, "compression_ratio": + 1.524229074889868, "no_speech_prob": 0.0041611953638494015}, {"id": 209, "seek": + 82768, "start": 838.7199999999999, "end": 843.9599999999999, "text": " So our general + trial has been 15 megabytes of text.", "tokens": [50916, 407, 527, 2674, 7308, 575, + 668, 2119, 10816, 24538, 295, 2487, 13, 51178], "temperature": 0.0, "avg_logprob": + -0.2156848714809225, "compression_ratio": 1.524229074889868, "no_speech_prob": 0.0041611953638494015}, + {"id": 210, "seek": 82768, "start": 843.9599999999999, "end": 846.3599999999999, + "text": " And I''ll explain what that translates to.", "tokens": [51178, 400, 286, + 603, 2903, 437, 300, 28468, 281, 13, 51298], "temperature": 0.0, "avg_logprob": + -0.2156848714809225, "compression_ratio": 1.524229074889868, "no_speech_prob": 0.0041611953638494015}, + {"id": 211, "seek": 82768, "start": 846.3599999999999, "end": 851.12, "text": " + I was, I was, I was just working with another customer.", "tokens": [51298, 286, + 390, 11, 286, 390, 11, 286, 390, 445, 1364, 365, 1071, 5474, 13, 51536], "temperature": + 0.0, "avg_logprob": -0.2156848714809225, "compression_ratio": 1.524229074889868, + "no_speech_prob": 0.0041611953638494015}, {"id": 212, "seek": 82768, "start": 851.12, + "end": 855.88, "text": " And they had about one gigabyte of PDFs that we put into + a corpus.", "tokens": [51536, 400, 436, 632, 466, 472, 8741, 34529, 295, 17752, + 82, 300, 321, 829, 666, 257, 1181, 31624, 13, 51774], "temperature": 0.0, "avg_logprob": + -0.2156848714809225, "compression_ratio": 1.524229074889868, "no_speech_prob": 0.0041611953638494015}, + {"id": 213, "seek": 85588, "start": 855.88, "end": 859.52, "text": " And then that + turned 
out to be about 48 megabytes of text.", "tokens": [50364, 400, 550, 300, + 3574, 484, 281, 312, 466, 11174, 10816, 24538, 295, 2487, 13, 50546], "temperature": + 0.0, "avg_logprob": -0.18534351607500496, "compression_ratio": 1.6091954022988506, + "no_speech_prob": 0.000489726779051125}, {"id": 214, "seek": 85588, "start": 859.52, + "end": 862.52, "text": " So the, the billing is by the actual extracted textual + content.", "tokens": [50546, 407, 264, 11, 264, 35618, 307, 538, 264, 3539, 34086, + 2487, 901, 2701, 13, 50696], "temperature": 0.0, "avg_logprob": -0.18534351607500496, + "compression_ratio": 1.6091954022988506, "no_speech_prob": 0.000489726779051125}, + {"id": 215, "seek": 85588, "start": 862.52, "end": 869.32, "text": " So 15 megabytes + is actually a decent data set, several hundred documents you can imagine.", "tokens": + [50696, 407, 2119, 10816, 24538, 307, 767, 257, 8681, 1412, 992, 11, 2940, 3262, + 8512, 291, 393, 3811, 13, 51036], "temperature": 0.0, "avg_logprob": -0.18534351607500496, + "compression_ratio": 1.6091954022988506, "no_speech_prob": 0.000489726779051125}, + {"id": 216, "seek": 85588, "start": 869.32, "end": 873.76, "text": " So, but we + have, we have plans that go much larger and we have customers that are indexing", + "tokens": [51036, 407, 11, 457, 321, 362, 11, 321, 362, 5482, 300, 352, 709, 4833, + 293, 321, 362, 4581, 300, 366, 8186, 278, 51258], "temperature": 0.0, "avg_logprob": + -0.18534351607500496, "compression_ratio": 1.6091954022988506, "no_speech_prob": + 0.000489726779051125}, {"id": 217, "seek": 85588, "start": 873.76, "end": 874.76, + "text": " far more data.", "tokens": [51258, 1400, 544, 1412, 13, 51308], "temperature": + 0.0, "avg_logprob": -0.18534351607500496, "compression_ratio": 1.6091954022988506, + "no_speech_prob": 0.000489726779051125}, {"id": 218, "seek": 85588, "start": 874.76, + "end": 877.12, "text": " Yeah, yeah, sounds great.", "tokens": [51308, 865, 11, + 1338, 11, 3263, 869, 13, 51426], 
"temperature": 0.0, "avg_logprob": -0.18534351607500496, + "compression_ratio": 1.6091954022988506, "no_speech_prob": 0.000489726779051125}, + {"id": 219, "seek": 85588, "start": 877.12, "end": 878.32, "text": " And then what + happens next?", "tokens": [51426, 400, 550, 437, 2314, 958, 30, 51486], "temperature": + 0.0, "avg_logprob": -0.18534351607500496, "compression_ratio": 1.6091954022988506, + "no_speech_prob": 0.000489726779051125}, {"id": 220, "seek": 85588, "start": 878.32, + "end": 879.32, "text": " So let''s say I''m happy.", "tokens": [51486, 407, 718, + 311, 584, 286, 478, 2055, 13, 51536], "temperature": 0.0, "avg_logprob": -0.18534351607500496, + "compression_ratio": 1.6091954022988506, "no_speech_prob": 0.000489726779051125}, + {"id": 221, "seek": 85588, "start": 879.32, "end": 881.04, "text": " I want to move + forward.", "tokens": [51536, 286, 528, 281, 1286, 2128, 13, 51622], "temperature": + 0.0, "avg_logprob": -0.18534351607500496, "compression_ratio": 1.6091954022988506, + "no_speech_prob": 0.000489726779051125}, {"id": 222, "seek": 88104, "start": 881.04, + "end": 886.64, "text": " Now you said that there are APIs that I can start kind + of introducing inside my prototype", "tokens": [50364, 823, 291, 848, 300, 456, + 366, 21445, 300, 286, 393, 722, 733, 295, 15424, 1854, 452, 19475, 50644], "temperature": + 0.0, "avg_logprob": -0.2055803680419922, "compression_ratio": 1.5, "no_speech_prob": + 0.0828024297952652}, {"id": 223, "seek": 88104, "start": 886.64, "end": 888.64, + "text": " or my existing back end.", "tokens": [50644, 420, 452, 6741, 646, 917, + 13, 50744], "temperature": 0.0, "avg_logprob": -0.2055803680419922, "compression_ratio": + 1.5, "no_speech_prob": 0.0828024297952652}, {"id": 224, "seek": 88104, "start": + 888.64, "end": 889.64, "text": " Is that right?", "tokens": [50744, 1119, 300, 558, + 30, 50794], "temperature": 0.0, "avg_logprob": -0.2055803680419922, "compression_ratio": + 1.5, "no_speech_prob": 0.0828024297952652}, 
{"id": 225, "seek": 88104, "start": + 889.64, "end": 891.16, "text": " Yeah, that''s right.", "tokens": [50794, 865, 11, + 300, 311, 558, 13, 50870], "temperature": 0.0, "avg_logprob": -0.2055803680419922, + "compression_ratio": 1.5, "no_speech_prob": 0.0828024297952652}, {"id": 226, "seek": + 88104, "start": 891.16, "end": 897.52, "text": " So we, we support primarily we + promote a gRPC interface because it''s high performance", "tokens": [50870, 407, + 321, 11, 321, 1406, 10029, 321, 9773, 257, 290, 49, 12986, 9226, 570, 309, 311, + 1090, 3389, 51188], "temperature": 0.0, "avg_logprob": -0.2055803680419922, "compression_ratio": + 1.5, "no_speech_prob": 0.0828024297952652}, {"id": 227, "seek": 88104, "start": + 897.52, "end": 898.52, "text": " low latency.", "tokens": [51188, 2295, 27043, 13, + 51238], "temperature": 0.0, "avg_logprob": -0.2055803680419922, "compression_ratio": + 1.5, "no_speech_prob": 0.0828024297952652}, {"id": 228, "seek": 88104, "start": + 898.52, "end": 901.04, "text": " We also do have a rest interface.", "tokens": [51238, + 492, 611, 360, 362, 257, 1472, 9226, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.2055803680419922, "compression_ratio": 1.5, "no_speech_prob": 0.0828024297952652}, + {"id": 229, "seek": 88104, "start": 901.04, "end": 904.36, "text": " We have fully + authenticated APIs.", "tokens": [51364, 492, 362, 4498, 9214, 3587, 21445, 13, 51530], + "temperature": 0.0, "avg_logprob": -0.2055803680419922, "compression_ratio": 1.5, + "no_speech_prob": 0.0828024297952652}, {"id": 230, "seek": 88104, "start": 904.36, + "end": 907.76, "text": " So we use OAuth 2.0 that standard.", "tokens": [51530, + 407, 321, 764, 48424, 2910, 568, 13, 15, 300, 3832, 13, 51700], "temperature": 0.0, + "avg_logprob": -0.2055803680419922, "compression_ratio": 1.5, "no_speech_prob": + 0.0828024297952652}, {"id": 231, "seek": 90776, "start": 907.76, "end": 913.68, + "text": " So you would give credentials to your servers and they would use those + 
credentials to establish", "tokens": [50364, 407, 291, 576, 976, 27404, 281, 428, + 15909, 293, 436, 576, 764, 729, 27404, 281, 8327, 50660], "temperature": 0.0, "avg_logprob": + -0.24384072449830202, "compression_ratio": 1.6139705882352942, "no_speech_prob": + 0.04690662771463394}, {"id": 232, "seek": 90776, "start": 913.68, "end": 919.04, + "text": " an authenticated session with the platform and then run, run queries for + you at a very", "tokens": [50660, 364, 9214, 3587, 5481, 365, 264, 3663, 293, 550, + 1190, 11, 1190, 24109, 337, 291, 412, 257, 588, 50928], "temperature": 0.0, "avg_logprob": + -0.24384072449830202, "compression_ratio": 1.6139705882352942, "no_speech_prob": + 0.04690662771463394}, {"id": 233, "seek": 90776, "start": 919.04, "end": 921.04, + "text": " high rate.", "tokens": [50928, 1090, 3314, 13, 51028], "temperature": + 0.0, "avg_logprob": -0.24384072449830202, "compression_ratio": 1.6139705882352942, + "no_speech_prob": 0.04690662771463394}, {"id": 234, "seek": 90776, "start": 921.04, + "end": 922.52, "text": " We scale horizontally.", "tokens": [51028, 492, 4373, 33796, + 13, 51102], "temperature": 0.0, "avg_logprob": -0.24384072449830202, "compression_ratio": + 1.6139705882352942, "no_speech_prob": 0.04690662771463394}, {"id": 235, "seek": + 90776, "start": 922.52, "end": 927.28, "text": " We can go up to hundreds of QPS, + though we haven''t had a customer that''s needed such", "tokens": [51102, 492, 393, + 352, 493, 281, 6779, 295, 1249, 6273, 11, 1673, 321, 2378, 380, 632, 257, 5474, + 300, 311, 2978, 1270, 51340], "temperature": 0.0, "avg_logprob": -0.24384072449830202, + "compression_ratio": 1.6139705882352942, "no_speech_prob": 0.04690662771463394}, + {"id": 236, "seek": 90776, "start": 927.28, "end": 929.92, "text": " a high rate, + but we''re capable of that.", "tokens": [51340, 257, 1090, 3314, 11, 457, 321, 434, + 8189, 295, 300, 13, 51472], "temperature": 0.0, "avg_logprob": -0.24384072449830202, + "compression_ratio": 
1.6139705882352942, "no_speech_prob": 0.04690662771463394}, + {"id": 237, "seek": 90776, "start": 929.92, "end": 930.92, "text": " Yeah, yeah.", + "tokens": [51472, 865, 11, 1338, 13, 51522], "temperature": 0.0, "avg_logprob": + -0.24384072449830202, "compression_ratio": 1.6139705882352942, "no_speech_prob": + 0.04690662771463394}, {"id": 238, "seek": 90776, "start": 930.92, "end": 937.72, + "text": " And you also mentioned that you maintain certain like SLA guarantees like + P99 latency", "tokens": [51522, 400, 291, 611, 2835, 300, 291, 6909, 1629, 411, + 318, 11435, 32567, 411, 430, 8494, 27043, 51862], "temperature": 0.0, "avg_logprob": + -0.24384072449830202, "compression_ratio": 1.6139705882352942, "no_speech_prob": + 0.04690662771463394}, {"id": 239, "seek": 93772, "start": 937.72, "end": 944.96, + "text": " can you speak a bit about that and also like how much of that accounts + for client need", "tokens": [50364, 393, 291, 1710, 257, 857, 466, 300, 293, 611, + 411, 577, 709, 295, 300, 9402, 337, 6423, 643, 50726], "temperature": 0.0, "avg_logprob": + -0.24096171752266263, "compression_ratio": 1.5541125541125542, "no_speech_prob": + 0.0018748895963653922}, {"id": 240, "seek": 93772, "start": 944.96, "end": 947.84, + "text": " versus what you are building for the future.", "tokens": [50726, 5717, + 437, 291, 366, 2390, 337, 264, 2027, 13, 50870], "temperature": 0.0, "avg_logprob": + -0.24096171752266263, "compression_ratio": 1.5541125541125542, "no_speech_prob": + 0.0018748895963653922}, {"id": 241, "seek": 93772, "start": 947.84, "end": 950.8000000000001, + "text": " And this is a good question.", "tokens": [50870, 400, 341, 307, 257, 665, + 1168, 13, 51018], "temperature": 0.0, "avg_logprob": -0.24096171752266263, "compression_ratio": + 1.5541125541125542, "no_speech_prob": 0.0018748895963653922}, {"id": 242, "seek": + 93772, "start": 950.8000000000001, "end": 957.1600000000001, "text": " So in terms + of client need, we really haven''t had any client 
that''s required anything better", + "tokens": [51018, 407, 294, 2115, 295, 6423, 643, 11, 321, 534, 2378, 380, 632, + 604, 6423, 300, 311, 4739, 1340, 1101, 51336], "temperature": 0.0, "avg_logprob": + -0.24096171752266263, "compression_ratio": 1.5541125541125542, "no_speech_prob": + 0.0018748895963653922}, {"id": 243, "seek": 93772, "start": 957.1600000000001, "end": + 960.64, "text": " than 200 milliseconds.", "tokens": [51336, 813, 2331, 34184, 13, + 51510], "temperature": 0.0, "avg_logprob": -0.24096171752266263, "compression_ratio": + 1.5541125541125542, "no_speech_prob": 0.0018748895963653922}, {"id": 244, "seek": + 93772, "start": 960.64, "end": 963.36, "text": " Now there''s a potential client + that we''re working with.", "tokens": [51510, 823, 456, 311, 257, 3995, 6423, 300, + 321, 434, 1364, 365, 13, 51646], "temperature": 0.0, "avg_logprob": -0.24096171752266263, + "compression_ratio": 1.5541125541125542, "no_speech_prob": 0.0018748895963653922}, + {"id": 245, "seek": 93772, "start": 963.36, "end": 965.24, "text": " They''re not + yet acclaimed.", "tokens": [51646, 814, 434, 406, 1939, 1317, 22642, 13, 51740], + "temperature": 0.0, "avg_logprob": -0.24096171752266263, "compression_ratio": 1.5541125541125542, + "no_speech_prob": 0.0018748895963653922}, {"id": 246, "seek": 96524, "start": 965.24, + "end": 973.72, "text": " They''re looking for more like 50 to 60 milliseconds because + essentially the look up into our system", "tokens": [50364, 814, 434, 1237, 337, + 544, 411, 2625, 281, 4060, 34184, 570, 4476, 264, 574, 493, 666, 527, 1185, 50788], + "temperature": 0.0, "avg_logprob": -0.17719637883173955, "compression_ratio": 1.490990990990991, + "no_speech_prob": 0.005780704785138369}, {"id": 247, "seek": 96524, "start": 973.72, + "end": 977.88, "text": " is only one part of their overall request handling process.", + "tokens": [50788, 307, 787, 472, 644, 295, 641, 4787, 5308, 13175, 1399, 13, 50996], + "temperature": 0.0, "avg_logprob": 
-0.17719637883173955, "compression_ratio": 1.490990990990991, + "no_speech_prob": 0.005780704785138369}, {"id": 248, "seek": 96524, "start": 977.88, + "end": 980.04, "text": " So they have a much tighter budget.", "tokens": [50996, + 407, 436, 362, 257, 709, 30443, 4706, 13, 51104], "temperature": 0.0, "avg_logprob": + -0.17719637883173955, "compression_ratio": 1.490990990990991, "no_speech_prob": + 0.005780704785138369}, {"id": 249, "seek": 96524, "start": 980.04, "end": 984.04, + "text": " In practice, what we''re seeing on our platform for our customers today + aggregated over", "tokens": [51104, 682, 3124, 11, 437, 321, 434, 2577, 322, 527, + 3663, 337, 527, 4581, 965, 16743, 770, 670, 51304], "temperature": 0.0, "avg_logprob": + -0.17719637883173955, "compression_ratio": 1.490990990990991, "no_speech_prob": + 0.005780704785138369}, {"id": 250, "seek": 96524, "start": 984.04, "end": 988.76, + "text": " all queries is a P99 of around 130 milliseconds.", "tokens": [51304, 439, + 24109, 307, 257, 430, 8494, 295, 926, 19966, 34184, 13, 51540], "temperature": 0.0, + "avg_logprob": -0.17719637883173955, "compression_ratio": 1.490990990990991, "no_speech_prob": + 0.005780704785138369}, {"id": 251, "seek": 98876, "start": 988.76, "end": 991.92, + "text": " Our P50 is about 60 milliseconds.", "tokens": [50364, 2621, 430, 2803, + 307, 466, 4060, 34184, 13, 50522], "temperature": 0.0, "avg_logprob": -0.14299837180546351, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.015431715175509453}, + {"id": 252, "seek": 98876, "start": 991.92, "end": 995.8, "text": " And this has + been sufficient for our customers.", "tokens": [50522, 400, 341, 575, 668, 11563, + 337, 527, 4581, 13, 50716], "temperature": 0.0, "avg_logprob": -0.14299837180546351, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.015431715175509453}, + {"id": 253, "seek": 98876, "start": 995.8, "end": 1001.4399999999999, "text": " + For customers that have tighter requirements, we 
actually have many different ways + to address", "tokens": [50716, 1171, 4581, 300, 362, 30443, 7728, 11, 321, 767, + 362, 867, 819, 2098, 281, 2985, 50998], "temperature": 0.0, "avg_logprob": -0.14299837180546351, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.015431715175509453}, + {"id": 254, "seek": 98876, "start": 1001.4399999999999, "end": 1002.4399999999999, + "text": " it.", "tokens": [50998, 309, 13, 51048], "temperature": 0.0, "avg_logprob": + -0.14299837180546351, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.015431715175509453}, {"id": 255, "seek": 98876, "start": 1002.4399999999999, "end": + 1004.56, "text": " So actually the main latency is not from the vector database.", + "tokens": [51048, 407, 767, 264, 2135, 27043, 307, 406, 490, 264, 8062, 8149, 13, + 51154], "temperature": 0.0, "avg_logprob": -0.14299837180546351, "compression_ratio": + 1.6840148698884758, "no_speech_prob": 0.015431715175509453}, {"id": 256, "seek": + 98876, "start": 1004.56, "end": 1006.88, "text": " The vector database is generally + quite fast.", "tokens": [51154, 440, 8062, 8149, 307, 5101, 1596, 2370, 13, 51270], + "temperature": 0.0, "avg_logprob": -0.14299837180546351, "compression_ratio": 1.6840148698884758, + "no_speech_prob": 0.015431715175509453}, {"id": 257, "seek": 98876, "start": 1006.88, + "end": 1009.3199999999999, "text": " It''s the neural network that has to do the + text encoding.", "tokens": [51270, 467, 311, 264, 18161, 3209, 300, 575, 281, 360, + 264, 2487, 43430, 13, 51392], "temperature": 0.0, "avg_logprob": -0.14299837180546351, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.015431715175509453}, + {"id": 258, "seek": 98876, "start": 1009.3199999999999, "end": 1011.52, "text": + " That''s the bottleneck.", "tokens": [51392, 663, 311, 264, 44641, 547, 13, 51502], + "temperature": 0.0, "avg_logprob": -0.14299837180546351, "compression_ratio": 1.6840148698884758, + "no_speech_prob": 0.015431715175509453}, 
{"id": 259, "seek": 98876, "start": 1011.52, + "end": 1018.4399999999999, "text": " So we have the ability to set up dedicated + pools of encoders, neural networks that do", "tokens": [51502, 407, 321, 362, 264, + 3485, 281, 992, 493, 8374, 28688, 295, 2058, 378, 433, 11, 18161, 9590, 300, 360, + 51848], "temperature": 0.0, "avg_logprob": -0.14299837180546351, "compression_ratio": + 1.6840148698884758, "no_speech_prob": 0.015431715175509453}, {"id": 260, "seek": + 101844, "start": 1018.44, "end": 1021.2, "text": " this encoding of four customers.", + "tokens": [50364, 341, 43430, 295, 1451, 4581, 13, 50502], "temperature": 0.0, "avg_logprob": + -0.2530548725653132, "compression_ratio": 1.6153846153846154, "no_speech_prob": + 0.011765317991375923}, {"id": 261, "seek": 101844, "start": 1021.2, "end": 1028.48, + "text": " So we scale and we''re cost efficient by sharing the pool across all customers.", + "tokens": [50502, 407, 321, 4373, 293, 321, 434, 2063, 7148, 538, 5414, 264, 7005, + 2108, 439, 4581, 13, 50866], "temperature": 0.0, "avg_logprob": -0.2530548725653132, + "compression_ratio": 1.6153846153846154, "no_speech_prob": 0.011765317991375923}, + {"id": 262, "seek": 101844, "start": 1028.48, "end": 1032.1200000000001, "text": + " But for customers that have very stringent needs, we can set up dedicated pools + for them.", "tokens": [50866, 583, 337, 4581, 300, 362, 588, 6798, 317, 2203, 11, + 321, 393, 992, 493, 8374, 28688, 337, 552, 13, 51048], "temperature": 0.0, "avg_logprob": + -0.2530548725653132, "compression_ratio": 1.6153846153846154, "no_speech_prob": + 0.011765317991375923}, {"id": 263, "seek": 101844, "start": 1032.1200000000001, + "end": 1039.64, "text": " But even when you go, let''s say single customer, single + node, maybe GPU node, there are still", "tokens": [51048, 583, 754, 562, 291, 352, + 11, 718, 311, 584, 2167, 5474, 11, 2167, 9984, 11, 1310, 18407, 9984, 11, 456, 366, + 920, 51424], "temperature": 0.0, "avg_logprob": 
-0.2530548725653132, "compression_ratio": + 1.6153846153846154, "no_speech_prob": 0.011765317991375923}, {"id": 264, "seek": + 101844, "start": 1039.64, "end": 1042.96, "text": " theoretical boundary to how + fast it can be.", "tokens": [51424, 20864, 12866, 281, 577, 2370, 309, 393, 312, + 13, 51590], "temperature": 0.0, "avg_logprob": -0.2530548725653132, "compression_ratio": + 1.6153846153846154, "no_speech_prob": 0.011765317991375923}, {"id": 265, "seek": + 101844, "start": 1042.96, "end": 1048.3600000000001, "text": " Let''s say if I take + an off-the-shelf birth model, and if I throw 768 dimensions,", "tokens": [51590, + 961, 311, 584, 498, 286, 747, 364, 766, 12, 3322, 12, 46626, 3965, 2316, 11, 293, + 498, 286, 3507, 1614, 27102, 12819, 11, 51860], "temperature": 0.0, "avg_logprob": + -0.2530548725653132, "compression_ratio": 1.6153846153846154, "no_speech_prob": + 0.011765317991375923}, {"id": 266, "seek": 104836, "start": 1048.36, "end": 1050.6799999999998, + "text": " what''s going to happen?", "tokens": [50364, 437, 311, 516, 281, 1051, + 30, 50480], "temperature": 0.0, "avg_logprob": -0.32399056174538354, "compression_ratio": + 1.5757575757575757, "no_speech_prob": 0.022683139890432358}, {"id": 267, "seek": + 104836, "start": 1050.6799999999998, "end": 1054.04, "text": " How can I fine tune + it on the speed size?", "tokens": [50480, 1012, 393, 286, 2489, 10864, 309, 322, + 264, 3073, 2744, 30, 50648], "temperature": 0.0, "avg_logprob": -0.32399056174538354, + "compression_ratio": 1.5757575757575757, "no_speech_prob": 0.022683139890432358}, + {"id": 268, "seek": 104836, "start": 1054.04, "end": 1059.28, "text": " Yeah, well, + let me address two things that you said there.", "tokens": [50648, 865, 11, 731, + 11, 718, 385, 2985, 732, 721, 300, 291, 848, 456, 13, 50910], "temperature": 0.0, + "avg_logprob": -0.32399056174538354, "compression_ratio": 1.5757575757575757, "no_speech_prob": + 0.022683139890432358}, {"id": 269, "seek": 104836, "start": 
1059.28, "end": 1065.6799999999998, + "text": " So the off-the-shelf birth model is a very common approach that many companies + are trying", "tokens": [50910, 407, 264, 766, 12, 3322, 12, 46626, 3965, 2316, 307, + 257, 588, 2689, 3109, 300, 867, 3431, 366, 1382, 51230], "temperature": 0.0, "avg_logprob": + -0.32399056174538354, "compression_ratio": 1.5757575757575757, "no_speech_prob": + 0.022683139890432358}, {"id": 270, "seek": 104836, "start": 1065.6799999999998, + "end": 1066.84, "text": " to productionize NLP.", "tokens": [51230, 281, 4265, 1125, + 426, 45196, 13, 51288], "temperature": 0.0, "avg_logprob": -0.32399056174538354, + "compression_ratio": 1.5757575757575757, "no_speech_prob": 0.022683139890432358}, + {"id": 271, "seek": 104836, "start": 1066.84, "end": 1070.08, "text": " They use + it because birth has a phenomenal accuracy.", "tokens": [51288, 814, 764, 309, 570, + 3965, 575, 257, 17778, 14170, 13, 51450], "temperature": 0.0, "avg_logprob": -0.32399056174538354, + "compression_ratio": 1.5757575757575757, "no_speech_prob": 0.022683139890432358}, + {"id": 272, "seek": 104836, "start": 1070.08, "end": 1072.32, "text": " You fine + tune it with a little bit of data.", "tokens": [51450, 509, 2489, 10864, 309, 365, + 257, 707, 857, 295, 1412, 13, 51562], "temperature": 0.0, "avg_logprob": -0.32399056174538354, + "compression_ratio": 1.5757575757575757, "no_speech_prob": 0.022683139890432358}, + {"id": 273, "seek": 104836, "start": 1072.32, "end": 1076.6, "text": " And everyone + always hits the same problem that is very difficult to productionize.", "tokens": + [51562, 400, 1518, 1009, 8664, 264, 912, 1154, 300, 307, 588, 2252, 281, 4265, 1125, + 13, 51776], "temperature": 0.0, "avg_logprob": -0.32399056174538354, "compression_ratio": + 1.5757575757575757, "no_speech_prob": 0.022683139890432358}, {"id": 274, "seek": + 107660, "start": 1076.6, "end": 1082.36, "text": " And even at a place like Google, + they didn''t productionize birth.", "tokens": 
[50364, 400, 754, 412, 257, 1081, + 411, 3329, 11, 436, 994, 380, 4265, 1125, 3965, 13, 50652], "temperature": 0.0, + "avg_logprob": -0.18467098016005296, "compression_ratio": 1.6653061224489796, "no_speech_prob": + 0.006023973226547241}, {"id": 275, "seek": 107660, "start": 1082.36, "end": 1084.84, + "text": " They had to distill birth and productionize it.", "tokens": [50652, 814, + 632, 281, 42923, 3965, 293, 4265, 1125, 309, 13, 50776], "temperature": 0.0, "avg_logprob": + -0.18467098016005296, "compression_ratio": 1.6653061224489796, "no_speech_prob": + 0.006023973226547241}, {"id": 276, "seek": 107660, "start": 1084.84, "end": 1088.52, + "text": " And distillation requires a lot of expertise.", "tokens": [50776, 400, + 42923, 399, 7029, 257, 688, 295, 11769, 13, 50960], "temperature": 0.0, "avg_logprob": + -0.18467098016005296, "compression_ratio": 1.6653061224489796, "no_speech_prob": + 0.006023973226547241}, {"id": 277, "seek": 107660, "start": 1088.52, "end": 1092.52, + "text": " It''s out of the reach, I think, of most companies.", "tokens": [50960, + 467, 311, 484, 295, 264, 2524, 11, 286, 519, 11, 295, 881, 3431, 13, 51160], "temperature": + 0.0, "avg_logprob": -0.18467098016005296, "compression_ratio": 1.6653061224489796, + "no_speech_prob": 0.006023973226547241}, {"id": 278, "seek": 107660, "start": 1092.52, + "end": 1099.6399999999999, "text": " So as good as the results look in a staging + environment, that''s not really a practical", "tokens": [51160, 407, 382, 665, 382, + 264, 3542, 574, 294, 257, 41085, 2823, 11, 300, 311, 406, 534, 257, 8496, 51516], + "temperature": 0.0, "avg_logprob": -0.18467098016005296, "compression_ratio": 1.6653061224489796, + "no_speech_prob": 0.006023973226547241}, {"id": 279, "seek": 107660, "start": 1099.6399999999999, + "end": 1100.6399999999999, "text": " to productionize that.", "tokens": [51516, + 281, 4265, 1125, 300, 13, 51566], "temperature": 0.0, "avg_logprob": -0.18467098016005296, + "compression_ratio": 
1.6653061224489796, "no_speech_prob": 0.006023973226547241}, + {"id": 280, "seek": 107660, "start": 1100.6399999999999, "end": 1105.3999999999999, + "text": " And that comes back to the original point that we tried to make the right + choices where", "tokens": [51566, 400, 300, 1487, 646, 281, 264, 3380, 935, 300, + 321, 3031, 281, 652, 264, 558, 7994, 689, 51804], "temperature": 0.0, "avg_logprob": + -0.18467098016005296, "compression_ratio": 1.6653061224489796, "no_speech_prob": + 0.006023973226547241}, {"id": 281, "seek": 110540, "start": 1105.4, "end": 1108.96, + "text": " if we were deploying birth, either it would be enormously expensive for + us because we''d", "tokens": [50364, 498, 321, 645, 34198, 3965, 11, 2139, 309, + 576, 312, 39669, 5124, 337, 505, 570, 321, 1116, 50542], "temperature": 0.0, "avg_logprob": + -0.24159034480893515, "compression_ratio": 1.6495176848874598, "no_speech_prob": + 0.0007520999060943723}, {"id": 282, "seek": 110540, "start": 1108.96, "end": 1114.88, + "text": " have to be using GPU instances or TPU instances, or we would have very + high latencies.", "tokens": [50542, 362, 281, 312, 1228, 18407, 14519, 420, 314, + 8115, 14519, 11, 420, 321, 576, 362, 588, 1090, 4465, 6464, 13, 50838], "temperature": + 0.0, "avg_logprob": -0.24159034480893515, "compression_ratio": 1.6495176848874598, + "no_speech_prob": 0.0007520999060943723}, {"id": 283, "seek": 110540, "start": 1114.88, + "end": 1119.0400000000002, "text": " So we have a model that produces similar performance, + but it runs much faster.", "tokens": [50838, 407, 321, 362, 257, 2316, 300, 14725, + 2531, 3389, 11, 457, 309, 6676, 709, 4663, 13, 51046], "temperature": 0.0, "avg_logprob": + -0.24159034480893515, "compression_ratio": 1.6495176848874598, "no_speech_prob": + 0.0007520999060943723}, {"id": 284, "seek": 110540, "start": 1119.0400000000002, + "end": 1121.68, "text": " It''s still transformer-based.", "tokens": [51046, 467, + 311, 920, 31782, 12, 6032, 13, 51178], 
"temperature": 0.0, "avg_logprob": -0.24159034480893515, + "compression_ratio": 1.6495176848874598, "no_speech_prob": 0.0007520999060943723}, + {"id": 285, "seek": 110540, "start": 1121.68, "end": 1125.6000000000001, "text": + " Coming to your second point, I think your main question, your original question, + was actually", "tokens": [51178, 12473, 281, 428, 1150, 935, 11, 286, 519, 428, + 2135, 1168, 11, 428, 3380, 1168, 11, 390, 767, 51374], "temperature": 0.0, "avg_logprob": + -0.24159034480893515, "compression_ratio": 1.6495176848874598, "no_speech_prob": + 0.0007520999060943723}, {"id": 286, "seek": 110540, "start": 1125.6000000000001, + "end": 1129.96, "text": " what''s the theoretical limit of performance that we can + achieve in terms of are you asking", "tokens": [51374, 437, 311, 264, 20864, 4948, + 295, 3389, 300, 321, 393, 4584, 294, 2115, 295, 366, 291, 3365, 51592], "temperature": + 0.0, "avg_logprob": -0.24159034480893515, "compression_ratio": 1.6495176848874598, + "no_speech_prob": 0.0007520999060943723}, {"id": 287, "seek": 110540, "start": 1129.96, + "end": 1131.68, "text": " from a latency perspective?", "tokens": [51592, 490, 257, + 27043, 4585, 30, 51678], "temperature": 0.0, "avg_logprob": -0.24159034480893515, + "compression_ratio": 1.6495176848874598, "no_speech_prob": 0.0007520999060943723}, + {"id": 288, "seek": 110540, "start": 1131.68, "end": 1133.92, "text": " Yeah, a + latency.", "tokens": [51678, 865, 11, 257, 27043, 13, 51790], "temperature": 0.0, + "avg_logprob": -0.24159034480893515, "compression_ratio": 1.6495176848874598, "no_speech_prob": + 0.0007520999060943723}, {"id": 289, "seek": 113392, "start": 1133.92, "end": 1138.76, + "text": " So I''ll say this.", "tokens": [50364, 407, 286, 603, 584, 341, 13, 50606], + "temperature": 0.0, "avg_logprob": -0.22485659672663763, "compression_ratio": 1.3526011560693643, + "no_speech_prob": 0.007748408708721399}, {"id": 290, "seek": 113392, "start": 1138.76, + "end": 1146.8000000000002, 
"text": " When it comes to the vector database, you probably + know this better than I do.", "tokens": [50606, 1133, 309, 1487, 281, 264, 8062, + 8149, 11, 291, 1391, 458, 341, 1101, 813, 286, 360, 13, 51008], "temperature": 0.0, + "avg_logprob": -0.22485659672663763, "compression_ratio": 1.3526011560693643, "no_speech_prob": + 0.007748408708721399}, {"id": 291, "seek": 113392, "start": 1146.8000000000002, + "end": 1159.1200000000001, "text": " If it''s indexed and quantized correctly on + our last stuff, even running on CPUs, you", "tokens": [51008, 759, 309, 311, 8186, + 292, 293, 4426, 1602, 8944, 322, 527, 1036, 1507, 11, 754, 2614, 322, 13199, 82, + 11, 291, 51624], "temperature": 0.0, "avg_logprob": -0.22485659672663763, "compression_ratio": + 1.3526011560693643, "no_speech_prob": 0.007748408708721399}, {"id": 292, "seek": + 113392, "start": 1159.1200000000001, "end": 1162.6000000000001, "text": " can get + down to three, four milliseconds of latency.", "tokens": [51624, 393, 483, 760, + 281, 1045, 11, 1451, 34184, 295, 27043, 13, 51798], "temperature": 0.0, "avg_logprob": + -0.22485659672663763, "compression_ratio": 1.3526011560693643, "no_speech_prob": + 0.007748408708721399}, {"id": 293, "seek": 116260, "start": 1162.6, "end": 1166.6799999999998, + "text": " It depends on so many trade-offs, like how much recolor you will decircify + and other things", "tokens": [50364, 467, 5946, 322, 370, 867, 4923, 12, 19231, + 11, 411, 577, 709, 850, 36182, 291, 486, 979, 347, 66, 2505, 293, 661, 721, 50568], + "temperature": 0.0, "avg_logprob": -0.2549551474947889, "compression_ratio": 1.6, + "no_speech_prob": 0.05158064886927605}, {"id": 294, "seek": 116260, "start": 1166.6799999999998, + "end": 1167.6799999999998, "text": " like that.", "tokens": [50568, 411, 300, 13, + 50618], "temperature": 0.0, "avg_logprob": -0.2549551474947889, "compression_ratio": + 1.6, "no_speech_prob": 0.05158064886927605}, {"id": 295, "seek": 116260, "start": + 1167.6799999999998, "end": 
1169.1599999999999, "text": " What are the dimensions + of the vector?", "tokens": [50618, 708, 366, 264, 12819, 295, 264, 8062, 30, 50692], + "temperature": 0.0, "avg_logprob": -0.2549551474947889, "compression_ratio": 1.6, + "no_speech_prob": 0.05158064886927605}, {"id": 296, "seek": 116260, "start": 1169.1599999999999, + "end": 1175.1999999999998, "text": " But I think that we found that to be quite + feasible for our system.", "tokens": [50692, 583, 286, 519, 300, 321, 1352, 300, + 281, 312, 1596, 26648, 337, 527, 1185, 13, 50994], "temperature": 0.0, "avg_logprob": + -0.2549551474947889, "compression_ratio": 1.6, "no_speech_prob": 0.05158064886927605}, + {"id": 297, "seek": 116260, "start": 1175.1999999999998, "end": 1177.1999999999998, + "text": " We don''t do 768 dimensions.", "tokens": [50994, 492, 500, 380, 360, 24733, + 23, 12819, 13, 51094], "temperature": 0.0, "avg_logprob": -0.2549551474947889, "compression_ratio": + 1.6, "no_speech_prob": 0.05158064886927605}, {"id": 298, "seek": 116260, "start": + 1177.1999999999998, "end": 1180.76, "text": " Our neural nets produce a little bit + less, but still it''s comparable.", "tokens": [51094, 2621, 18161, 36170, 5258, + 257, 707, 857, 1570, 11, 457, 920, 309, 311, 25323, 13, 51272], "temperature": 0.0, + "avg_logprob": -0.2549551474947889, "compression_ratio": 1.6, "no_speech_prob": + 0.05158064886927605}, {"id": 299, "seek": 116260, "start": 1180.76, "end": 1183.04, + "text": " It''s not that far off.", "tokens": [51272, 467, 311, 406, 300, 1400, + 766, 13, 51386], "temperature": 0.0, "avg_logprob": -0.2549551474947889, "compression_ratio": + 1.6, "no_speech_prob": 0.05158064886927605}, {"id": 300, "seek": 116260, "start": + 1183.04, "end": 1189.8799999999999, "text": " In terms of the neural network, I + would say that transformers are required for proper", "tokens": [51386, 682, 2115, + 295, 264, 18161, 3209, 11, 286, 576, 584, 300, 4088, 433, 366, 4739, 337, 2296, + 51728], "temperature": 0.0, 
"avg_logprob": -0.2549551474947889, "compression_ratio": + 1.6, "no_speech_prob": 0.05158064886927605}, {"id": 301, "seek": 116260, "start": + 1189.8799999999999, "end": 1190.8799999999999, "text": " language understanding.", + "tokens": [51728, 2856, 3701, 13, 51778], "temperature": 0.0, "avg_logprob": -0.2549551474947889, + "compression_ratio": 1.6, "no_speech_prob": 0.05158064886927605}, {"id": 302, "seek": + 119088, "start": 1190.96, "end": 1195.3600000000001, "text": " One of the things + I didn''t mention about our system was I think that we were basically", "tokens": + [50368, 1485, 295, 264, 721, 286, 994, 380, 2152, 466, 527, 1185, 390, 286, 519, + 300, 321, 645, 1936, 50588], "temperature": 0.0, "avg_logprob": -0.3187896465433055, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.007115307729691267}, + {"id": 303, "seek": 119088, "start": 1195.3600000000001, "end": 1200.7600000000002, + "text": " one of the first teams back in 2017 to incorporate transformers production + architecture.", "tokens": [50588, 472, 295, 264, 700, 5491, 646, 294, 6591, 281, + 16091, 4088, 433, 4265, 9482, 13, 50858], "temperature": 0.0, "avg_logprob": -0.3187896465433055, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.007115307729691267}, + {"id": 304, "seek": 119088, "start": 1200.7600000000002, "end": 1205.48, "text": + " This was my colleagues, Noah Constant.", "tokens": [50858, 639, 390, 452, 7734, + 11, 20895, 37413, 13, 51094], "temperature": 0.0, "avg_logprob": -0.3187896465433055, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.007115307729691267}, + {"id": 305, "seek": 119088, "start": 1205.48, "end": 1207.2800000000002, "text": + " He was actually one of our colleagues.", "tokens": [51094, 634, 390, 767, 472, + 295, 527, 7734, 13, 51184], "temperature": 0.0, "avg_logprob": -0.3187896465433055, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.007115307729691267}, + {"id": 306, "seek": 119088, "start": 
1207.2800000000002, "end": 1211.0400000000002, + "text": " Previously being in our team was on the original transformer paper.", + "tokens": [51184, 33606, 885, 294, 527, 1469, 390, 322, 264, 3380, 31782, 3035, + 13, 51372], "temperature": 0.0, "avg_logprob": -0.3187896465433055, "compression_ratio": + 1.6223175965665235, "no_speech_prob": 0.007115307729691267}, {"id": 307, "seek": + 119088, "start": 1211.0400000000002, "end": 1214.2, "text": " He was in Google Brain + at that time doing that research.", "tokens": [51372, 634, 390, 294, 3329, 29783, + 412, 300, 565, 884, 300, 2132, 13, 51530], "temperature": 0.0, "avg_logprob": -0.3187896465433055, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.007115307729691267}, + {"id": 308, "seek": 121420, "start": 1214.2, "end": 1220.3600000000001, "text": + " We wanted to productionize a plan to a model.", "tokens": [50364, 492, 1415, 281, + 4265, 1125, 257, 1393, 281, 257, 2316, 13, 50672], "temperature": 0.0, "avg_logprob": + -0.30803849962022567, "compression_ratio": 1.584033613445378, "no_speech_prob": + 0.024104176089167595}, {"id": 309, "seek": 121420, "start": 1220.3600000000001, + "end": 1225.04, "text": " Noah basically spent a couple of months, took that research + level code and got it to production", "tokens": [50672, 20895, 1936, 4418, 257, + 1916, 295, 2493, 11, 1890, 300, 2132, 1496, 3089, 293, 658, 309, 281, 4265, 50906], + "temperature": 0.0, "avg_logprob": -0.30803849962022567, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.024104176089167595}, {"id": 310, "seek": 121420, "start": 1225.04, + "end": 1226.04, "text": " quality.", "tokens": [50906, 3125, 13, 50956], "temperature": + 0.0, "avg_logprob": -0.30803849962022567, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.024104176089167595}, {"id": 311, "seek": 121420, "start": 1226.04, + "end": 1232.2, "text": " Talk to books is actually being powered by a very early + transformer based model.", "tokens": [50956, 
8780, 281, 3642, 307, 767, 885, 17786, + 538, 257, 588, 2440, 31782, 2361, 2316, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.30803849962022567, "compression_ratio": 1.584033613445378, "no_speech_prob": + 0.024104176089167595}, {"id": 312, "seek": 121420, "start": 1232.2, "end": 1238.6000000000001, + "text": " We saw an enormous performance jump in our metrics, doing nothing other + than switching to transformers.", "tokens": [51264, 492, 1866, 364, 11322, 3389, + 3012, 294, 527, 16367, 11, 884, 1825, 661, 813, 16493, 281, 4088, 433, 13, 51584], + "temperature": 0.0, "avg_logprob": -0.30803849962022567, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.024104176089167595}, {"id": 313, "seek": 121420, "start": 1238.6000000000001, + "end": 1241.3600000000001, "text": " I''ve never seen such a big jump in any...", + "tokens": [51584, 286, 600, 1128, 1612, 1270, 257, 955, 3012, 294, 604, 485, 51722], + "temperature": 0.0, "avg_logprob": -0.30803849962022567, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.024104176089167595}, {"id": 314, "seek": 124136, "start": 1241.36, + "end": 1243.52, "text": " Our metrics, we were looking at F1.", "tokens": [50364, + 2621, 16367, 11, 321, 645, 1237, 412, 479, 16, 13, 50472], "temperature": 0.0, "avg_logprob": + -0.33342599215572827, "compression_ratio": 1.3980099502487562, "no_speech_prob": + 0.010498964227735996}, {"id": 315, "seek": 124136, "start": 1243.52, "end": 1251.9199999999998, + "text": " Our F1 jumped from 30% to 38%.", "tokens": [50472, 2621, 479, 16, 13864, + 490, 2217, 4, 281, 12843, 6856, 50892], "temperature": 0.0, "avg_logprob": -0.33342599215572827, + "compression_ratio": 1.3980099502487562, "no_speech_prob": 0.010498964227735996}, + {"id": 316, "seek": 124136, "start": 1251.9199999999998, "end": 1253.7199999999998, + "text": " Just by switching to transformers.", "tokens": [50892, 1449, 538, 16493, + 281, 4088, 433, 13, 50982], "temperature": 0.0, "avg_logprob": 
-0.33342599215572827, + "compression_ratio": 1.3980099502487562, "no_speech_prob": 0.010498964227735996}, + {"id": 317, "seek": 124136, "start": 1253.7199999999998, "end": 1259.04, "text": + " Not changing the training data or the evaluation objective, just making this one + change in the", "tokens": [50982, 1726, 4473, 264, 3097, 1412, 420, 264, 13344, + 10024, 11, 445, 1455, 341, 472, 1319, 294, 264, 51248], "temperature": 0.0, "avg_logprob": + -0.33342599215572827, "compression_ratio": 1.3980099502487562, "no_speech_prob": + 0.010498964227735996}, {"id": 318, "seek": 124136, "start": 1259.04, "end": 1261.04, + "text": " architecture of the neural network.", "tokens": [51248, 9482, 295, 264, + 18161, 3209, 13, 51348], "temperature": 0.0, "avg_logprob": -0.33342599215572827, + "compression_ratio": 1.3980099502487562, "no_speech_prob": 0.010498964227735996}, + {"id": 319, "seek": 124136, "start": 1261.04, "end": 1265.6, "text": " I would consider + that''s an absolute requirement.", "tokens": [51348, 286, 576, 1949, 300, 311, 364, + 8236, 11695, 13, 51576], "temperature": 0.0, "avg_logprob": -0.33342599215572827, + "compression_ratio": 1.3980099502487562, "no_speech_prob": 0.010498964227735996}, + {"id": 320, "seek": 126560, "start": 1265.6, "end": 1272.6, "text": " I would also + say that I''m not very familiar with the economics of GPU scaling because it''s", + "tokens": [50364, 286, 576, 611, 584, 300, 286, 478, 406, 588, 4963, 365, 264, 14564, + 295, 18407, 21589, 570, 309, 311, 50714], "temperature": 0.0, "avg_logprob": -0.279806750944291, + "compression_ratio": 1.4609053497942386, "no_speech_prob": 0.01739194616675377}, + {"id": 321, "seek": 126560, "start": 1272.6, "end": 1274.76, "text": " generally + kind of expensive.", "tokens": [50714, 5101, 733, 295, 5124, 13, 50822], "temperature": + 0.0, "avg_logprob": -0.279806750944291, "compression_ratio": 1.4609053497942386, + "no_speech_prob": 0.01739194616675377}, {"id": 322, "seek": 126560, "start": 1274.76, + 
"end": 1280.0, "text": " Our neural networks are actually designed to run reasonably + well on CPUs.", "tokens": [50822, 2621, 18161, 9590, 366, 767, 4761, 281, 1190, + 23551, 731, 322, 13199, 82, 13, 51084], "temperature": 0.0, "avg_logprob": -0.279806750944291, + "compression_ratio": 1.4609053497942386, "no_speech_prob": 0.01739194616675377}, + {"id": 323, "seek": 126560, "start": 1280.0, "end": 1287.56, "text": " There''s + also these tips like obviously Google''s got the TPU, but Amazon has Inferencia.", + "tokens": [51084, 821, 311, 611, 613, 6082, 411, 2745, 3329, 311, 658, 264, 314, + 8115, 11, 457, 6795, 575, 682, 612, 268, 2755, 13, 51462], "temperature": 0.0, "avg_logprob": + -0.279806750944291, "compression_ratio": 1.4609053497942386, "no_speech_prob": 0.01739194616675377}, + {"id": 324, "seek": 126560, "start": 1287.56, "end": 1291.32, "text": " We''re still + kind of experimenting with what we can do with latency there.", "tokens": [51462, + 492, 434, 920, 733, 295, 29070, 365, 437, 321, 393, 360, 365, 27043, 456, 13, 51650], + "temperature": 0.0, "avg_logprob": -0.279806750944291, "compression_ratio": 1.4609053497942386, + "no_speech_prob": 0.01739194616675377}, {"id": 325, "seek": 129132, "start": 1291.32, + "end": 1301.4399999999998, "text": " I think that you can count on about 20 to 30 + milliseconds of latency at the low end from", "tokens": [50364, 286, 519, 300, 291, + 393, 1207, 322, 466, 945, 281, 2217, 34184, 295, 27043, 412, 264, 2295, 917, 490, + 50870], "temperature": 0.0, "avg_logprob": -0.22534859807867752, "compression_ratio": + 1.5853658536585367, "no_speech_prob": 0.003443585243076086}, {"id": 326, "seek": + 129132, "start": 1301.4399999999998, "end": 1306.1599999999999, "text": " coming + from the encoding process unless you start moving to GPU or something and then you", + "tokens": [50870, 1348, 490, 264, 43430, 1399, 5969, 291, 722, 2684, 281, 18407, + 420, 746, 293, 550, 291, 51106], "temperature": 0.0, "avg_logprob": 
-0.22534859807867752, + "compression_ratio": 1.5853658536585367, "no_speech_prob": 0.003443585243076086}, + {"id": 327, "seek": 129132, "start": 1306.1599999999999, "end": 1311.84, "text": + " might be able to do maybe 5 to 10 milliseconds.", "tokens": [51106, 1062, 312, + 1075, 281, 360, 1310, 1025, 281, 1266, 34184, 13, 51390], "temperature": 0.0, "avg_logprob": + -0.22534859807867752, "compression_ratio": 1.5853658536585367, "no_speech_prob": + 0.003443585243076086}, {"id": 328, "seek": 129132, "start": 1311.84, "end": 1319.1599999999999, + "text": " If you put that all together, it seems to me realistically you can shoot + for 30 to 40 milliseconds", "tokens": [51390, 759, 291, 829, 300, 439, 1214, 11, + 309, 2544, 281, 385, 40734, 291, 393, 3076, 337, 2217, 281, 3356, 34184, 51756], + "temperature": 0.0, "avg_logprob": -0.22534859807867752, "compression_ratio": 1.5853658536585367, + "no_speech_prob": 0.003443585243076086}, {"id": 329, "seek": 131916, "start": 1319.16, + "end": 1323.72, "text": " would be pretty aggressive in terms of what you can get + at the lower bound.", "tokens": [50364, 576, 312, 1238, 10762, 294, 2115, 295, 437, + 291, 393, 483, 412, 264, 3126, 5472, 13, 50592], "temperature": 0.0, "avg_logprob": + -0.28014289226728617, "compression_ratio": 1.5524193548387097, "no_speech_prob": + 0.2558579444885254}, {"id": 330, "seek": 131916, "start": 1323.72, "end": 1327.52, + "text": " And maybe for many companies out there, this will be okay.", "tokens": + [50592, 400, 1310, 337, 867, 3431, 484, 456, 11, 341, 486, 312, 1392, 13, 50782], + "temperature": 0.0, "avg_logprob": -0.28014289226728617, "compression_ratio": 1.5524193548387097, + "no_speech_prob": 0.2558579444885254}, {"id": 331, "seek": 131916, "start": 1327.52, + "end": 1333.3600000000001, "text": " As long as they don''t run web scale type of + deployment, maybe they can scale per region", "tokens": [50782, 1018, 938, 382, + 436, 500, 380, 1190, 3670, 4373, 2010, 295, 19317, 11, 1310, 436, 
393, 4373, 680, + 4458, 51074], "temperature": 0.0, "avg_logprob": -0.28014289226728617, "compression_ratio": + 1.5524193548387097, "no_speech_prob": 0.2558579444885254}, {"id": 332, "seek": 131916, + "start": 1333.3600000000001, "end": 1337.92, "text": " or per zone or whatever it + is that makes sense to them.", "tokens": [51074, 420, 680, 6668, 420, 2035, 309, + 307, 300, 1669, 2020, 281, 552, 13, 51302], "temperature": 0.0, "avg_logprob": -0.28014289226728617, + "compression_ratio": 1.5524193548387097, "no_speech_prob": 0.2558579444885254}, + {"id": 333, "seek": 131916, "start": 1337.92, "end": 1343.0800000000002, "text": + " I think sounds like 30 to 40 milliseconds could be quite an okay speed.", "tokens": + [51302, 286, 519, 3263, 411, 2217, 281, 3356, 34184, 727, 312, 1596, 364, 1392, + 3073, 13, 51560], "temperature": 0.0, "avg_logprob": -0.28014289226728617, "compression_ratio": + 1.5524193548387097, "no_speech_prob": 0.2558579444885254}, {"id": 334, "seek": 131916, + "start": 1343.0800000000002, "end": 1344.92, "text": " We''re talking about latency + there.", "tokens": [51560, 492, 434, 1417, 466, 27043, 456, 13, 51652], "temperature": + 0.0, "avg_logprob": -0.28014289226728617, "compression_ratio": 1.5524193548387097, + "no_speech_prob": 0.2558579444885254}, {"id": 335, "seek": 134492, "start": 1344.92, + "end": 1349.2, "text": " I think that''s a perfectly acceptable speed even for web + search or something.", "tokens": [50364, 286, 519, 300, 311, 257, 6239, 15513, 3073, + 754, 337, 3670, 3164, 420, 746, 13, 50578], "temperature": 0.0, "avg_logprob": -0.3164606730143229, + "compression_ratio": 1.6148409893992932, "no_speech_prob": 0.3439730107784271}, + {"id": 336, "seek": 134492, "start": 1349.2, "end": 1353.8000000000002, "text": + " That''s literally the blink of an eye, 40 milliseconds.", "tokens": [50578, 663, + 311, 3736, 264, 24667, 295, 364, 3313, 11, 3356, 34184, 13, 50808], "temperature": + 0.0, "avg_logprob": -0.3164606730143229, 
"compression_ratio": 1.6148409893992932, + "no_speech_prob": 0.3439730107784271}, {"id": 337, "seek": 134492, "start": 1353.8000000000002, + "end": 1358.76, "text": " I think the other thing to note is that these solutions + are very horizontally scalable.", "tokens": [50808, 286, 519, 264, 661, 551, 281, + 3637, 307, 300, 613, 6547, 366, 588, 33796, 38481, 13, 51056], "temperature": 0.0, + "avg_logprob": -0.3164606730143229, "compression_ratio": 1.6148409893992932, "no_speech_prob": + 0.3439730107784271}, {"id": 338, "seek": 134492, "start": 1358.76, "end": 1363.96, + "text": " In terms of serving any given throughput, you just scale the neural network + and code", "tokens": [51056, 682, 2115, 295, 8148, 604, 2212, 44629, 11, 291, 445, + 4373, 264, 18161, 3209, 293, 3089, 51316], "temperature": 0.0, "avg_logprob": -0.3164606730143229, + "compression_ratio": 1.6148409893992932, "no_speech_prob": 0.3439730107784271}, + {"id": 339, "seek": 134492, "start": 1363.96, "end": 1369.96, "text": " or pools + and you can replicate the vector database if using FIAS for instance you start up", + "tokens": [51316, 420, 28688, 293, 291, 393, 25356, 264, 8062, 8149, 498, 1228, + 479, 40, 3160, 337, 5197, 291, 722, 493, 51616], "temperature": 0.0, "avg_logprob": + -0.3164606730143229, "compression_ratio": 1.6148409893992932, "no_speech_prob": + 0.3439730107784271}, {"id": 340, "seek": 134492, "start": 1369.96, "end": 1370.96, + "text": " replicas.", "tokens": [51616, 3248, 9150, 13, 51666], "temperature": 0.0, + "avg_logprob": -0.3164606730143229, "compression_ratio": 1.6148409893992932, "no_speech_prob": + 0.3439730107784271}, {"id": 341, "seek": 134492, "start": 1370.96, "end": 1372.8400000000001, + "text": " You can basically get almost unlimited throughput.", "tokens": [51666, + 509, 393, 1936, 483, 1920, 21950, 44629, 13, 51760], "temperature": 0.0, "avg_logprob": + -0.3164606730143229, "compression_ratio": 1.6148409893992932, "no_speech_prob": + 0.3439730107784271}, {"id": 
342, "seek": 137284, "start": 1372.84, "end": 1375.1599999999999, + "text": " It just depends on how much money you have to throw at the problem.", + "tokens": [50364, 467, 445, 5946, 322, 577, 709, 1460, 291, 362, 281, 3507, 412, + 264, 1154, 13, 50480], "temperature": 0.0, "avg_logprob": -0.24277473077541445, + "compression_ratio": 1.6446886446886446, "no_speech_prob": 0.060518525540828705}, + {"id": 343, "seek": 137284, "start": 1375.1599999999999, "end": 1378.76, "text": + " So if you need 500 QPS, bring up more hardware.", "tokens": [50480, 407, 498, + 291, 643, 5923, 1249, 6273, 11, 1565, 493, 544, 8837, 13, 50660], "temperature": + 0.0, "avg_logprob": -0.24277473077541445, "compression_ratio": 1.6446886446886446, + "no_speech_prob": 0.060518525540828705}, {"id": 344, "seek": 137284, "start": 1378.76, + "end": 1381.84, "text": " If you need 5000 QPS, you can bring up more hardware and + do it.", "tokens": [50660, 759, 291, 643, 23777, 1249, 6273, 11, 291, 393, 1565, + 493, 544, 8837, 293, 360, 309, 13, 50814], "temperature": 0.0, "avg_logprob": -0.24277473077541445, + "compression_ratio": 1.6446886446886446, "no_speech_prob": 0.060518525540828705}, + {"id": 345, "seek": 137284, "start": 1381.84, "end": 1383.3999999999999, "text": + " Yeah, absolutely.", "tokens": [50814, 865, 11, 3122, 13, 50892], "temperature": + 0.0, "avg_logprob": -0.24277473077541445, "compression_ratio": 1.6446886446886446, + "no_speech_prob": 0.060518525540828705}, {"id": 346, "seek": 137284, "start": 1383.3999999999999, + "end": 1389.4399999999998, "text": " I also wanted to tap into what you said that + distilling bird would be beyond reach for", "tokens": [50892, 286, 611, 1415, 281, + 5119, 666, 437, 291, 848, 300, 1483, 7345, 5255, 576, 312, 4399, 2524, 337, 51194], + "temperature": 0.0, "avg_logprob": -0.24277473077541445, "compression_ratio": 1.6446886446886446, + "no_speech_prob": 0.060518525540828705}, {"id": 347, "seek": 137284, "start": 1389.4399999999998, + "end": 
1390.4399999999998, "text": " many companies.", "tokens": [51194, 867, 3431, + 13, 51244], "temperature": 0.0, "avg_logprob": -0.24277473077541445, "compression_ratio": + 1.6446886446886446, "no_speech_prob": 0.060518525540828705}, {"id": 348, "seek": + 137284, "start": 1390.4399999999998, "end": 1394.12, "text": " Can you open up a + little bit and also can you share with our audience what do you mean by", "tokens": + [51244, 1664, 291, 1269, 493, 257, 707, 857, 293, 611, 393, 291, 2073, 365, 527, + 4034, 437, 360, 291, 914, 538, 51428], "temperature": 0.0, "avg_logprob": -0.24277473077541445, + "compression_ratio": 1.6446886446886446, "no_speech_prob": 0.060518525540828705}, + {"id": 349, "seek": 137284, "start": 1394.12, "end": 1395.12, "text": " distilling?", + "tokens": [51428, 1483, 7345, 30, 51478], "temperature": 0.0, "avg_logprob": -0.24277473077541445, + "compression_ratio": 1.6446886446886446, "no_speech_prob": 0.060518525540828705}, + {"id": 350, "seek": 137284, "start": 1395.12, "end": 1398.8, "text": " Maybe some + of our subscribers don''t know that.", "tokens": [51478, 2704, 512, 295, 527, 11092, + 500, 380, 458, 300, 13, 51662], "temperature": 0.0, "avg_logprob": -0.24277473077541445, + "compression_ratio": 1.6446886446886446, "no_speech_prob": 0.060518525540828705}, + {"id": 351, "seek": 139880, "start": 1399.6399999999999, "end": 1403.28, "text": + " So in the nutshell, and also why do you think that it''s so hard to do?", "tokens": + [50406, 407, 294, 264, 37711, 11, 293, 611, 983, 360, 291, 519, 300, 309, 311, 370, + 1152, 281, 360, 30, 50588], "temperature": 0.0, "avg_logprob": -0.28881819248199464, + "compression_ratio": 1.6582914572864322, "no_speech_prob": 0.006933798547834158}, + {"id": 352, "seek": 139880, "start": 1405.28, "end": 1414.0, "text": " Okay, well, + so what distillation of a neural network refers to is taking a very large neural", + "tokens": [50688, 1033, 11, 731, 11, 370, 437, 42923, 399, 295, 257, 18161, 3209, + 14942, 
281, 307, 1940, 257, 588, 2416, 18161, 51124], "temperature": 0.0, "avg_logprob": + -0.28881819248199464, "compression_ratio": 1.6582914572864322, "no_speech_prob": + 0.006933798547834158}, {"id": 353, "seek": 139880, "start": 1414.0, "end": 1419.8, + "text": " network and neural network with a lot of parameters, it''s called billions + of parameters, which", "tokens": [51124, 3209, 293, 18161, 3209, 365, 257, 688, + 295, 9834, 11, 309, 311, 1219, 17375, 295, 9834, 11, 597, 51414], "temperature": + 0.0, "avg_logprob": -0.28881819248199464, "compression_ratio": 1.6582914572864322, + "no_speech_prob": 0.006933798547834158}, {"id": 354, "seek": 139880, "start": 1419.8, + "end": 1425.8799999999999, "text": " is very accurate but cannot reasonably be run + on a production workload.", "tokens": [51414, 307, 588, 8559, 457, 2644, 23551, + 312, 1190, 322, 257, 4265, 20139, 13, 51718], "temperature": 0.0, "avg_logprob": + -0.28881819248199464, "compression_ratio": 1.6582914572864322, "no_speech_prob": + 0.006933798547834158}, {"id": 355, "seek": 142588, "start": 1425.88, "end": 1431.1200000000001, + "text": " And training a much smaller model that captures as much of the performance + of the original", "tokens": [50364, 400, 3097, 257, 709, 4356, 2316, 300, 27986, + 382, 709, 295, 264, 3389, 295, 264, 3380, 50626], "temperature": 0.0, "avg_logprob": + -0.1924495469956171, "compression_ratio": 1.6610169491525424, "no_speech_prob": + 0.04053569585084915}, {"id": 356, "seek": 142588, "start": 1431.1200000000001, "end": + 1437.64, "text": " model as possible, but fitting inside the engineering parameters + of your production system.", "tokens": [50626, 2316, 382, 1944, 11, 457, 15669, + 1854, 264, 7043, 9834, 295, 428, 4265, 1185, 13, 50952], "temperature": 0.0, "avg_logprob": + -0.1924495469956171, "compression_ratio": 1.6610169491525424, "no_speech_prob": + 0.04053569585084915}, {"id": 357, "seek": 142588, "start": 1437.64, "end": 1443.1200000000001, + "text": " So able to 
for instance run an inference within 50 milliseconds.", "tokens": + [50952, 407, 1075, 281, 337, 5197, 1190, 364, 38253, 1951, 2625, 34184, 13, 51226], + "temperature": 0.0, "avg_logprob": -0.1924495469956171, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.04053569585084915}, {"id": 358, "seek": 142588, "start": 1443.1200000000001, + "end": 1450.5600000000002, "text": " So the way that distillation normally happens + is you use the parent model is called the teacher", "tokens": [51226, 407, 264, + 636, 300, 42923, 399, 5646, 2314, 307, 291, 764, 264, 2596, 2316, 307, 1219, 264, + 5027, 51598], "temperature": 0.0, "avg_logprob": -0.1924495469956171, "compression_ratio": + 1.6610169491525424, "no_speech_prob": 0.04053569585084915}, {"id": 359, "seek": + 142588, "start": 1450.5600000000002, "end": 1455.3200000000002, "text": " model + and you do a large scale labeling of data.", "tokens": [51598, 2316, 293, 291, 360, + 257, 2416, 4373, 40244, 295, 1412, 13, 51836], "temperature": 0.0, "avg_logprob": + -0.1924495469956171, "compression_ratio": 1.6610169491525424, "no_speech_prob": + 0.04053569585084915}, {"id": 360, "seek": 145532, "start": 1455.32, "end": 1460.8, + "text": " And essentially the student model, the small model that you''re training + needs to learn", "tokens": [50364, 400, 4476, 264, 3107, 2316, 11, 264, 1359, 2316, + 300, 291, 434, 3097, 2203, 281, 1466, 50638], "temperature": 0.0, "avg_logprob": + -0.15440780558484665, "compression_ratio": 1.7854077253218885, "no_speech_prob": + 0.00035531993489712477}, {"id": 361, "seek": 145532, "start": 1460.8, "end": 1463.0, + "text": " to make the same predictions.", "tokens": [50638, 281, 652, 264, 912, + 21264, 13, 50748], "temperature": 0.0, "avg_logprob": -0.15440780558484665, "compression_ratio": + 1.7854077253218885, "no_speech_prob": 0.00035531993489712477}, {"id": 362, "seek": + 145532, "start": 1463.0, "end": 1470.08, "text": " And interestingly, it gets as + much bang for the buck in 
terms of training from learning", "tokens": [50748, 400, + 25873, 11, 309, 2170, 382, 709, 8550, 337, 264, 14894, 294, 2115, 295, 3097, 490, + 2539, 51102], "temperature": 0.0, "avg_logprob": -0.15440780558484665, "compression_ratio": + 1.7854077253218885, "no_speech_prob": 0.00035531993489712477}, {"id": 363, "seek": + 145532, "start": 1470.08, "end": 1474.9199999999998, "text": " to make the correct + predictions as it does from learning to, you know, assign probabilities", "tokens": + [51102, 281, 652, 264, 3006, 21264, 382, 309, 775, 490, 2539, 281, 11, 291, 458, + 11, 6269, 33783, 51344], "temperature": 0.0, "avg_logprob": -0.15440780558484665, + "compression_ratio": 1.7854077253218885, "no_speech_prob": 0.00035531993489712477}, + {"id": 364, "seek": 145532, "start": 1474.9199999999998, "end": 1476.9199999999998, + "text": " to the incorrect predictions.", "tokens": [51344, 281, 264, 18424, 21264, + 13, 51444], "temperature": 0.0, "avg_logprob": -0.15440780558484665, "compression_ratio": + 1.7854077253218885, "no_speech_prob": 0.00035531993489712477}, {"id": 365, "seek": + 145532, "start": 1476.9199999999998, "end": 1483.4399999999998, "text": " So the + reason I''m saying that distillation is difficult is there''s, I think it approaches", + "tokens": [51444, 407, 264, 1778, 286, 478, 1566, 300, 42923, 399, 307, 2252, 307, + 456, 311, 11, 286, 519, 309, 11587, 51770], "temperature": 0.0, "avg_logprob": -0.15440780558484665, + "compression_ratio": 1.7854077253218885, "no_speech_prob": 0.00035531993489712477}, + {"id": 366, "seek": 148344, "start": 1483.44, "end": 1485.6000000000001, "text": + " to it, it''s still a fairly open research topic.", "tokens": [50364, 281, 309, + 11, 309, 311, 920, 257, 6457, 1269, 2132, 4829, 13, 50472], "temperature": 0.0, + "avg_logprob": -0.20257222311837333, "compression_ratio": 1.6029411764705883, "no_speech_prob": + 0.09794837236404419}, {"id": 367, "seek": 148344, "start": 1485.6000000000001, "end": + 1487.48, "text": " 
There''s a lot of active research.", "tokens": [50472, 821, 311, + 257, 688, 295, 4967, 2132, 13, 50566], "temperature": 0.0, "avg_logprob": -0.20257222311837333, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.09794837236404419}, + {"id": 368, "seek": 148344, "start": 1487.48, "end": 1490.6000000000001, "text": + " I haven''t looked in the last couple of years as possible that there might be + frameworks", "tokens": [50566, 286, 2378, 380, 2956, 294, 264, 1036, 1916, 295, + 924, 382, 1944, 300, 456, 1062, 312, 29834, 50722], "temperature": 0.0, "avg_logprob": + -0.20257222311837333, "compression_ratio": 1.6029411764705883, "no_speech_prob": + 0.09794837236404419}, {"id": 369, "seek": 148344, "start": 1490.6000000000001, "end": + 1492.88, "text": " out there now that make this much easier.", "tokens": [50722, + 484, 456, 586, 300, 652, 341, 709, 3571, 13, 50836], "temperature": 0.0, "avg_logprob": + -0.20257222311837333, "compression_ratio": 1.6029411764705883, "no_speech_prob": + 0.09794837236404419}, {"id": 370, "seek": 148344, "start": 1492.88, "end": 1499.56, + "text": " But certainly while I was at Google in 2018, 1920 time frame distillation + was generally", "tokens": [50836, 583, 3297, 1339, 286, 390, 412, 3329, 294, 6096, + 11, 22003, 565, 3920, 42923, 399, 390, 5101, 51170], "temperature": 0.0, "avg_logprob": + -0.20257222311837333, "compression_ratio": 1.6029411764705883, "no_speech_prob": + 0.09794837236404419}, {"id": 371, "seek": 148344, "start": 1499.56, "end": 1504.0, + "text": " a topic that was tackled by entire teams working over a quarter or two, + at least for the", "tokens": [51170, 257, 4829, 300, 390, 9426, 1493, 538, 2302, + 5491, 1364, 670, 257, 6555, 420, 732, 11, 412, 1935, 337, 264, 51392], "temperature": + 0.0, "avg_logprob": -0.20257222311837333, "compression_ratio": 1.6029411764705883, + "no_speech_prob": 0.09794837236404419}, {"id": 372, "seek": 148344, "start": 1504.0, + "end": 1505.48, "text": " most serious 
production systems.", "tokens": [51392, 881, + 3156, 4265, 3652, 13, 51466], "temperature": 0.0, "avg_logprob": -0.20257222311837333, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.09794837236404419}, + {"id": 373, "seek": 148344, "start": 1505.48, "end": 1507.48, "text": " That''s + how it was used to go.", "tokens": [51466, 663, 311, 577, 309, 390, 1143, 281, 352, + 13, 51566], "temperature": 0.0, "avg_logprob": -0.20257222311837333, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.09794837236404419}, {"id": 374, "seek": + 148344, "start": 1507.48, "end": 1508.48, "text": " Yeah.", "tokens": [51566, 865, + 13, 51616], "temperature": 0.0, "avg_logprob": -0.20257222311837333, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.09794837236404419}, {"id": 375, "seek": + 148344, "start": 1508.48, "end": 1511.92, "text": " And definitely when it comes + to collecting data as you rightly not just, you know, it''s", "tokens": [51616, + 400, 2138, 562, 309, 1487, 281, 12510, 1412, 382, 291, 32879, 406, 445, 11, 291, + 458, 11, 309, 311, 51788], "temperature": 0.0, "avg_logprob": -0.20257222311837333, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.09794837236404419}, + {"id": 376, "seek": 151192, "start": 1511.92, "end": 1517.1200000000001, "text": + " not something you can easily scale unless you have some clever technique for data + augmentation.", "tokens": [50364, 406, 746, 291, 393, 3612, 4373, 5969, 291, 362, + 512, 13494, 6532, 337, 1412, 14501, 19631, 13, 50624], "temperature": 0.0, "avg_logprob": + -0.2160141626993815, "compression_ratio": 1.6570397111913358, "no_speech_prob": + 0.009416880086064339}, {"id": 377, "seek": 151192, "start": 1517.1200000000001, + "end": 1523.0, "text": " And even then, like for text, as I was eluding in previous + podcasts, you know, like if you", "tokens": [50624, 400, 754, 550, 11, 411, 337, + 2487, 11, 382, 286, 390, 806, 33703, 294, 3894, 24045, 11, 291, 458, 11, 411, 498, + 
291, 50918], "temperature": 0.0, "avg_logprob": -0.2160141626993815, "compression_ratio": + 1.6570397111913358, "no_speech_prob": 0.009416880086064339}, {"id": 378, "seek": + 151192, "start": 1523.0, "end": 1527.5600000000002, "text": " have like a London + is the capital of Great Britain, you cannot put any random city there", "tokens": + [50918, 362, 411, 257, 7042, 307, 264, 4238, 295, 3769, 12960, 11, 291, 2644, 829, + 604, 4974, 2307, 456, 51146], "temperature": 0.0, "avg_logprob": -0.2160141626993815, + "compression_ratio": 1.6570397111913358, "no_speech_prob": 0.009416880086064339}, + {"id": 379, "seek": 151192, "start": 1527.5600000000002, "end": 1529.0800000000002, + "text": " in that specific sentence, right?", "tokens": [51146, 294, 300, 2685, + 8174, 11, 558, 30, 51222], "temperature": 0.0, "avg_logprob": -0.2160141626993815, + "compression_ratio": 1.6570397111913358, "no_speech_prob": 0.009416880086064339}, + {"id": 380, "seek": 151192, "start": 1529.0800000000002, "end": 1530.0800000000002, + "text": " Right.", "tokens": [51222, 1779, 13, 51272], "temperature": 0.0, "avg_logprob": + -0.2160141626993815, "compression_ratio": 1.6570397111913358, "no_speech_prob": + 0.009416880086064339}, {"id": 381, "seek": 151192, "start": 1530.0800000000002, + "end": 1531.0800000000002, "text": " Right.", "tokens": [51272, 1779, 13, 51322], + "temperature": 0.0, "avg_logprob": -0.2160141626993815, "compression_ratio": 1.6570397111913358, + "no_speech_prob": 0.009416880086064339}, {"id": 382, "seek": 151192, "start": 1531.0800000000002, + "end": 1532.0800000000002, "text": " Right.", "tokens": [51322, 1779, 13, 51372], + "temperature": 0.0, "avg_logprob": -0.2160141626993815, "compression_ratio": 1.6570397111913358, + "no_speech_prob": 0.009416880086064339}, {"id": 383, "seek": 151192, "start": 1532.0800000000002, + "end": 1533.0800000000002, "text": " Yeah, you need to have certain control.", "tokens": + [51372, 865, 11, 291, 643, 281, 362, 1629, 1969, 13, 51422], 
"temperature": 0.0, + "avg_logprob": -0.2160141626993815, "compression_ratio": 1.6570397111913358, "no_speech_prob": + 0.009416880086064339}, {"id": 384, "seek": 151192, "start": 1533.0800000000002, + "end": 1539.68, "text": " But there are still ways to, for example, use retrieval + itself to augment your data set,", "tokens": [51422, 583, 456, 366, 920, 2098, 281, + 11, 337, 1365, 11, 764, 19817, 3337, 2564, 281, 29919, 428, 1412, 992, 11, 51752], + "temperature": 0.0, "avg_logprob": -0.2160141626993815, "compression_ratio": 1.6570397111913358, + "no_speech_prob": 0.009416880086064339}, {"id": 385, "seek": 153968, "start": 1539.68, + "end": 1540.68, "text": " right?", "tokens": [50364, 558, 30, 50414], "temperature": + 0.0, "avg_logprob": -0.19192072653001355, "compression_ratio": 1.6620209059233448, + "no_speech_prob": 0.014143800362944603}, {"id": 386, "seek": 153968, "start": 1540.68, + "end": 1543.52, "text": " For example, if you need more entities, you can find them + through retrieval, maybe even", "tokens": [50414, 1171, 1365, 11, 498, 291, 643, + 544, 16667, 11, 291, 393, 915, 552, 807, 19817, 3337, 11, 1310, 754, 50556], "temperature": + 0.0, "avg_logprob": -0.19192072653001355, "compression_ratio": 1.6620209059233448, + "no_speech_prob": 0.014143800362944603}, {"id": 387, "seek": 153968, "start": 1543.52, + "end": 1546.0, "text": " through vector search, by the way.", "tokens": [50556, + 807, 8062, 3164, 11, 538, 264, 636, 13, 50680], "temperature": 0.0, "avg_logprob": + -0.19192072653001355, "compression_ratio": 1.6620209059233448, "no_speech_prob": + 0.014143800362944603}, {"id": 388, "seek": 153968, "start": 1546.0, "end": 1549.0, + "text": " I don''t know if somebody experimented with that already.", "tokens": + [50680, 286, 500, 380, 458, 498, 2618, 5120, 292, 365, 300, 1217, 13, 50830], "temperature": + 0.0, "avg_logprob": -0.19192072653001355, "compression_ratio": 1.6620209059233448, + "no_speech_prob": 0.014143800362944603}, {"id": 389, "seek": 
153968, "start": 1549.0, + "end": 1553.5600000000002, "text": " But there are other techniques like kind of + producing these negative examples and as you", "tokens": [50830, 583, 456, 366, + 661, 7512, 411, 733, 295, 10501, 613, 3671, 5110, 293, 382, 291, 51058], "temperature": + 0.0, "avg_logprob": -0.19192072653001355, "compression_ratio": 1.6620209059233448, + "no_speech_prob": 0.014143800362944603}, {"id": 390, "seek": 153968, "start": 1553.5600000000002, + "end": 1554.5600000000002, "text": " alluded to, right?", "tokens": [51058, 33919, + 281, 11, 558, 30, 51108], "temperature": 0.0, "avg_logprob": -0.19192072653001355, + "compression_ratio": 1.6620209059233448, "no_speech_prob": 0.014143800362944603}, + {"id": 391, "seek": 153968, "start": 1554.5600000000002, "end": 1559.96, "text": + " So you need to have as many negative also as many positive so that your model + is balanced,", "tokens": [51108, 407, 291, 643, 281, 362, 382, 867, 3671, 611, 382, + 867, 3353, 370, 300, 428, 2316, 307, 13902, 11, 51378], "temperature": 0.0, "avg_logprob": + -0.19192072653001355, "compression_ratio": 1.6620209059233448, "no_speech_prob": + 0.014143800362944603}, {"id": 392, "seek": 153968, "start": 1559.96, "end": 1560.96, + "text": " right?", "tokens": [51378, 558, 30, 51428], "temperature": 0.0, "avg_logprob": + -0.19192072653001355, "compression_ratio": 1.6620209059233448, "no_speech_prob": + 0.014143800362944603}, {"id": 393, "seek": 153968, "start": 1560.96, "end": 1567.8400000000001, + "text": " And that goes to a general model training topic, which is a year to your + point.", "tokens": [51428, 400, 300, 1709, 281, 257, 2674, 2316, 3097, 4829, 11, + 597, 307, 257, 1064, 281, 428, 935, 13, 51772], "temperature": 0.0, "avg_logprob": + -0.19192072653001355, "compression_ratio": 1.6620209059233448, "no_speech_prob": + 0.014143800362944603}, {"id": 394, "seek": 153968, "start": 1567.8400000000001, + "end": 1568.8400000000001, "text": " Yes.", "tokens": [51772, 1079, 13, 
51822], + "temperature": 0.0, "avg_logprob": -0.19192072653001355, "compression_ratio": 1.6620209059233448, + "no_speech_prob": 0.014143800362944603}, {"id": 395, "seek": 156884, "start": 1568.84, + "end": 1597.6, "text": " And I think that''s one of the key to producing a neural + retriever that can outperform BM25", "tokens": [50364, 400, 286, 519, 300, 311, + 472, 295, 264, 2141, 281, 10501, 257, 18161, 19817, 331, 300, 393, 484, 26765, 15901, + 6074, 51802], "temperature": 0.2, "avg_logprob": -0.8956533813476563, "compression_ratio": + 1.0344827586206897, "no_speech_prob": 0.03218916431069374}, {"id": 396, "seek": + 159760, "start": 1597.6, "end": 1598.6, "text": " in every workload.", "tokens": + [50364, 294, 633, 20139, 13, 50414], "temperature": 0.0, "avg_logprob": -0.32192740853377216, + "compression_ratio": 1.6608996539792387, "no_speech_prob": 0.3511809706687927}, + {"id": 397, "seek": 159760, "start": 1598.6, "end": 1600.6, "text": " No, so it''s + an excellent point.", "tokens": [50414, 883, 11, 370, 309, 311, 364, 7103, 935, + 13, 50514], "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": + 1.6608996539792387, "no_speech_prob": 0.3511809706687927}, {"id": 398, "seek": 159760, + "start": 1600.6, "end": 1601.6, "text": " Yeah.", "tokens": [50514, 865, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": 1.6608996539792387, + "no_speech_prob": 0.3511809706687927}, {"id": 399, "seek": 159760, "start": 1601.6, + "end": 1607.48, "text": " Also, I just reminded me of one challenge that we''ve + been solving in my team actually earlier", "tokens": [50564, 2743, 11, 286, 445, + 15920, 385, 295, 472, 3430, 300, 321, 600, 668, 12606, 294, 452, 1469, 767, 3071, + 50858], "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": + 1.6608996539792387, "no_speech_prob": 0.3511809706687927}, {"id": 400, "seek": 159760, + "start": 1607.48, "end": 1610.6399999999999, "text": " with 
building like a job + search engine system.", "tokens": [50858, 365, 2390, 411, 257, 1691, 3164, 2848, + 1185, 13, 51016], "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": + 1.6608996539792387, "no_speech_prob": 0.3511809706687927}, {"id": 401, "seek": 159760, + "start": 1610.6399999999999, "end": 1615.6, "text": " And you know, like when you + evaluate the performance, let''s say precision or when it kind", "tokens": [51016, + 400, 291, 458, 11, 411, 562, 291, 13059, 264, 3389, 11, 718, 311, 584, 18356, 420, + 562, 309, 733, 51264], "temperature": 0.0, "avg_logprob": -0.32192740853377216, + "compression_ratio": 1.6608996539792387, "no_speech_prob": 0.3511809706687927}, + {"id": 402, "seek": 159760, "start": 1615.6, "end": 1620.9199999999998, "text": + " of, we call it misrecall, so how frequently it mis triggers to query, shouldn''t + have actually", "tokens": [51264, 295, 11, 321, 818, 309, 3346, 13867, 336, 11, + 370, 577, 10374, 309, 3346, 22827, 281, 14581, 11, 4659, 380, 362, 767, 51530], + "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": 1.6608996539792387, + "no_speech_prob": 0.3511809706687927}, {"id": 403, "seek": 159760, "start": 1620.9199999999998, + "end": 1622.1599999999999, "text": " triggered.", "tokens": [51530, 21710, 13, 51592], + "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": 1.6608996539792387, + "no_speech_prob": 0.3511809706687927}, {"id": 404, "seek": 159760, "start": 1622.1599999999999, + "end": 1627.1599999999999, "text": " And you know, like the basic challenge there + is, okay, I have this job queries, which I", "tokens": [51592, 400, 291, 458, 11, + 411, 264, 3875, 3430, 456, 307, 11, 1392, 11, 286, 362, 341, 1691, 24109, 11, 597, + 286, 51842], "temperature": 0.0, "avg_logprob": -0.32192740853377216, "compression_ratio": + 1.6608996539792387, "no_speech_prob": 0.3511809706687927}, {"id": 405, "seek": 162716, + "start": 1627.16, "end": 1630.44, 
"text": " can mind from certain sources.", "tokens": + [50364, 393, 1575, 490, 1629, 7139, 13, 50528], "temperature": 0.0, "avg_logprob": + -0.1821998636773292, "compression_ratio": 1.6742081447963801, "no_speech_prob": + 0.001221410115249455}, {"id": 406, "seek": 162716, "start": 1630.44, "end": 1634.92, + "text": " But then you can as negative examples, you can pick everything else, right?", + "tokens": [50528, 583, 550, 291, 393, 382, 3671, 5110, 11, 291, 393, 1888, 1203, + 1646, 11, 558, 30, 50752], "temperature": 0.0, "avg_logprob": -0.1821998636773292, + "compression_ratio": 1.6742081447963801, "no_speech_prob": 0.001221410115249455}, + {"id": 407, "seek": 162716, "start": 1634.92, "end": 1639.68, "text": " But that + everything else doesn''t actually count because just to give you an example, let''s", + "tokens": [50752, 583, 300, 1203, 1646, 1177, 380, 767, 1207, 570, 445, 281, 976, + 291, 364, 1365, 11, 718, 311, 50990], "temperature": 0.0, "avg_logprob": -0.1821998636773292, + "compression_ratio": 1.6742081447963801, "no_speech_prob": 0.001221410115249455}, + {"id": 408, "seek": 162716, "start": 1639.68, "end": 1644.16, "text": " say when + I say find full-time job in London, right?", "tokens": [50990, 584, 562, 286, 584, + 915, 1577, 12, 3766, 1691, 294, 7042, 11, 558, 30, 51214], "temperature": 0.0, "avg_logprob": + -0.1821998636773292, "compression_ratio": 1.6742081447963801, "no_speech_prob": + 0.001221410115249455}, {"id": 409, "seek": 162716, "start": 1644.16, "end": 1647.8000000000002, + "text": " So that''s just a typical query.", "tokens": [51214, 407, 300, 311, 445, + 257, 7476, 14581, 13, 51396], "temperature": 0.0, "avg_logprob": -0.1821998636773292, + "compression_ratio": 1.6742081447963801, "no_speech_prob": 0.001221410115249455}, + {"id": 410, "seek": 162716, "start": 1647.8000000000002, "end": 1654.88, "text": + " You are really interested to find that slightly negative example, which says, + let''s say,", "tokens": [51396, 509, 366, 534, 
3102, 281, 915, 300, 4748, 3671, + 1365, 11, 597, 1619, 11, 718, 311, 584, 11, 51750], "temperature": 0.0, "avg_logprob": + -0.1821998636773292, "compression_ratio": 1.6742081447963801, "no_speech_prob": + 0.001221410115249455}, {"id": 411, "seek": 165488, "start": 1654.88, "end": 1657.8000000000002, + "text": " something hours of some office, right?", "tokens": [50364, 746, 2496, + 295, 512, 3398, 11, 558, 30, 50510], "temperature": 0.0, "avg_logprob": -0.1791841578933428, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.13948795199394226}, + {"id": 412, "seek": 165488, "start": 1657.8000000000002, "end": 1660.0400000000002, + "text": " Which is not about job search anymore.", "tokens": [50510, 3013, 307, + 406, 466, 1691, 3164, 3602, 13, 50622], "temperature": 0.0, "avg_logprob": -0.1791841578933428, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.13948795199394226}, + {"id": 413, "seek": 165488, "start": 1660.0400000000002, "end": 1662.96, "text": + " It''s about points of interest search, maybe.", "tokens": [50622, 467, 311, 466, + 2793, 295, 1179, 3164, 11, 1310, 13, 50768], "temperature": 0.0, "avg_logprob": + -0.1791841578933428, "compression_ratio": 1.6538461538461537, "no_speech_prob": + 0.13948795199394226}, {"id": 414, "seek": 165488, "start": 1662.96, "end": 1669.0400000000002, + "text": " And so you really want to have those examples to see, okay, does your + model, you know, is", "tokens": [50768, 400, 370, 291, 534, 528, 281, 362, 729, + 5110, 281, 536, 11, 1392, 11, 775, 428, 2316, 11, 291, 458, 11, 307, 51072], "temperature": + 0.0, "avg_logprob": -0.1791841578933428, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.13948795199394226}, {"id": 415, "seek": 165488, "start": 1669.0400000000002, + "end": 1671.24, "text": " able to differentiate between them?", "tokens": [51072, + 1075, 281, 23203, 1296, 552, 30, 51182], "temperature": 0.0, "avg_logprob": -0.1791841578933428, + "compression_ratio": 
1.6538461538461537, "no_speech_prob": 0.13948795199394226}, + {"id": 416, "seek": 165488, "start": 1671.24, "end": 1679.4, "text": " And I guess + checklist paper is another example where they go like beyond, you know, imaginary", + "tokens": [51182, 400, 286, 2041, 30357, 3035, 307, 1071, 1365, 689, 436, 352, 411, + 4399, 11, 291, 458, 11, 26164, 51590], "temperature": 0.0, "avg_logprob": -0.1791841578933428, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.13948795199394226}, + {"id": 417, "seek": 165488, "start": 1679.4, "end": 1683.96, "text": " in a way + that saying, okay, you can actually fulfill this criteria and you can actually", + "tokens": [51590, 294, 257, 636, 300, 1566, 11, 1392, 11, 291, 393, 767, 13875, + 341, 11101, 293, 291, 393, 767, 51818], "temperature": 0.0, "avg_logprob": -0.1791841578933428, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.13948795199394226}, + {"id": 418, "seek": 168396, "start": 1683.96, "end": 1687.48, "text": " check your + model on various, very suspects.", "tokens": [50364, 1520, 428, 2316, 322, 3683, + 11, 588, 35667, 13, 50540], "temperature": 0.0, "avg_logprob": -0.2075612407085324, + "compression_ratio": 1.7900763358778626, "no_speech_prob": 0.014408075250685215}, + {"id": 419, "seek": 168396, "start": 1687.48, "end": 1688.48, "text": " Right, right.", + "tokens": [50540, 1779, 11, 558, 13, 50590], "temperature": 0.0, "avg_logprob": + -0.2075612407085324, "compression_ratio": 1.7900763358778626, "no_speech_prob": + 0.014408075250685215}, {"id": 420, "seek": 168396, "start": 1688.48, "end": 1689.48, + "text": " Right.", "tokens": [50590, 1779, 13, 50640], "temperature": 0.0, "avg_logprob": + -0.2075612407085324, "compression_ratio": 1.7900763358778626, "no_speech_prob": + 0.014408075250685215}, {"id": 421, "seek": 168396, "start": 1689.48, "end": 1693.88, + "text": " And is that something that you like, how did you go about addressing that + in your research?", "tokens": [50640, 400, 
307, 300, 746, 300, 291, 411, 11, 577, + 630, 291, 352, 466, 14329, 300, 294, 428, 2132, 30, 50860], "temperature": 0.0, + "avg_logprob": -0.2075612407085324, "compression_ratio": 1.7900763358778626, "no_speech_prob": + 0.014408075250685215}, {"id": 422, "seek": 168396, "start": 1693.88, "end": 1699.0, + "text": " I mean, you know, what we did is that actually, if you look, it was like + one of the early,", "tokens": [50860, 286, 914, 11, 291, 458, 11, 437, 321, 630, + 307, 300, 767, 11, 498, 291, 574, 11, 309, 390, 411, 472, 295, 264, 2440, 11, 51116], + "temperature": 0.0, "avg_logprob": -0.2075612407085324, "compression_ratio": 1.7900763358778626, + "no_speech_prob": 0.014408075250685215}, {"id": 423, "seek": 168396, "start": 1699.0, + "end": 1703.72, "text": " early papers, you know, the reason I like reading papers + is because you can bring some", "tokens": [51116, 2440, 10577, 11, 291, 458, 11, + 264, 1778, 286, 411, 3760, 10577, 307, 570, 291, 393, 1565, 512, 51352], "temperature": + 0.0, "avg_logprob": -0.2075612407085324, "compression_ratio": 1.7900763358778626, + "no_speech_prob": 0.014408075250685215}, {"id": 424, "seek": 168396, "start": 1703.72, + "end": 1707.0, "text": " ideas from one paper to some other domain.", "tokens": + [51352, 3487, 490, 472, 3035, 281, 512, 661, 9274, 13, 51516], "temperature": 0.0, + "avg_logprob": -0.2075612407085324, "compression_ratio": 1.7900763358778626, "no_speech_prob": + 0.014408075250685215}, {"id": 425, "seek": 168396, "start": 1707.0, "end": 1712.1200000000001, + "text": " And so the paper was about sentiment analysis where one of the challenge + was back then when", "tokens": [51516, 400, 370, 264, 3035, 390, 466, 16149, 5215, + 689, 472, 295, 264, 3430, 390, 646, 550, 562, 51772], "temperature": 0.0, "avg_logprob": + -0.2075612407085324, "compression_ratio": 1.7900763358778626, "no_speech_prob": + 0.014408075250685215}, {"id": 426, "seek": 171212, "start": 1712.12, "end": 1716.6, + "text": " it was 
dictionary-based systems, you know, how do I expand my positive + dictionary?", "tokens": [50364, 309, 390, 25890, 12, 6032, 3652, 11, 291, 458, 11, + 577, 360, 286, 5268, 452, 3353, 25890, 30, 50588], "temperature": 0.0, "avg_logprob": + -0.15576336509303043, "compression_ratio": 1.824390243902439, "no_speech_prob": + 0.006721757352352142}, {"id": 427, "seek": 171212, "start": 1716.6, "end": 1719.08, + "text": " How do I expand my negative dictionary?", "tokens": [50588, 1012, 360, + 286, 5268, 452, 3671, 25890, 30, 50712], "temperature": 0.0, "avg_logprob": -0.15576336509303043, + "compression_ratio": 1.824390243902439, "no_speech_prob": 0.006721757352352142}, + {"id": 428, "seek": 171212, "start": 1719.08, "end": 1725.28, "text": " And what + they propose there is that you can use a retrieval system where you say, okay,", + "tokens": [50712, 400, 437, 436, 17421, 456, 307, 300, 291, 393, 764, 257, 19817, + 3337, 1185, 689, 291, 584, 11, 1392, 11, 51022], "temperature": 0.0, "avg_logprob": + -0.15576336509303043, "compression_ratio": 1.824390243902439, "no_speech_prob": + 0.006721757352352142}, {"id": 429, "seek": 171212, "start": 1725.28, "end": 1732.0, + "text": " you take an instance from a positive dictionary, let''s say it''s good, + okay.", "tokens": [51022, 291, 747, 364, 5197, 490, 257, 3353, 25890, 11, 718, 311, + 584, 309, 311, 665, 11, 1392, 13, 51358], "temperature": 0.0, "avg_logprob": -0.15576336509303043, + "compression_ratio": 1.824390243902439, "no_speech_prob": 0.006721757352352142}, + {"id": 430, "seek": 171212, "start": 1732.0, "end": 1738.52, "text": " And then + you search with a pattern where you say good and then a blank and you just let", + "tokens": [51358, 400, 550, 291, 3164, 365, 257, 5102, 689, 291, 584, 665, 293, + 550, 257, 8247, 293, 291, 445, 718, 51684], "temperature": 0.0, "avg_logprob": -0.15576336509303043, + "compression_ratio": 1.824390243902439, "no_speech_prob": 0.006721757352352142}, + {"id": 431, "seek": 173852, 
"start": 1738.52, "end": 1745.24, "text": " your search + engine tell you what good is occurring with in the sentences or text, right?", "tokens": + [50364, 428, 3164, 2848, 980, 291, 437, 665, 307, 18386, 365, 294, 264, 16579, 420, + 2487, 11, 558, 30, 50700], "temperature": 0.0, "avg_logprob": -0.17335914682458947, + "compression_ratio": 1.6784313725490196, "no_speech_prob": 0.006699309218674898}, + {"id": 432, "seek": 173852, "start": 1745.24, "end": 1746.84, "text": " And the + same for the bad one.", "tokens": [50700, 400, 264, 912, 337, 264, 1578, 472, 13, + 50780], "temperature": 0.0, "avg_logprob": -0.17335914682458947, "compression_ratio": + 1.6784313725490196, "no_speech_prob": 0.006699309218674898}, {"id": 433, "seek": + 173852, "start": 1746.84, "end": 1751.44, "text": " Then they run some clustering + on it so that you can actually pick more representative items", "tokens": [50780, + 1396, 436, 1190, 512, 596, 48673, 322, 309, 370, 300, 291, 393, 767, 1888, 544, + 12424, 4754, 51010], "temperature": 0.0, "avg_logprob": -0.17335914682458947, "compression_ratio": + 1.6784313725490196, "no_speech_prob": 0.006699309218674898}, {"id": 434, "seek": + 173852, "start": 1751.44, "end": 1752.92, "text": " from your data set.", "tokens": + [51010, 490, 428, 1412, 992, 13, 51084], "temperature": 0.0, "avg_logprob": -0.17335914682458947, + "compression_ratio": 1.6784313725490196, "no_speech_prob": 0.006699309218674898}, + {"id": 435, "seek": 173852, "start": 1752.92, "end": 1757.56, "text": " And in principle, + you could apply a similar technique with the job queries, right?", "tokens": [51084, + 400, 294, 8665, 11, 291, 727, 3079, 257, 2531, 6532, 365, 264, 1691, 24109, 11, + 558, 30, 51316], "temperature": 0.0, "avg_logprob": -0.17335914682458947, "compression_ratio": + 1.6784313725490196, "no_speech_prob": 0.006699309218674898}, {"id": 436, "seek": + 173852, "start": 1757.56, "end": 1764.96, "text": " And we didn''t go that far, + but we actually did try to use 
our own search engine to essentially,", "tokens": + [51316, 400, 321, 994, 380, 352, 300, 1400, 11, 457, 321, 767, 630, 853, 281, 764, + 527, 1065, 3164, 2848, 281, 4476, 11, 51686], "temperature": 0.0, "avg_logprob": + -0.17335914682458947, "compression_ratio": 1.6784313725490196, "no_speech_prob": + 0.006699309218674898}, {"id": 437, "seek": 173852, "start": 1764.96, "end": 1767.72, + "text": " you know, augment.", "tokens": [51686, 291, 458, 11, 29919, 13, 51824], + "temperature": 0.0, "avg_logprob": -0.17335914682458947, "compression_ratio": 1.6784313725490196, + "no_speech_prob": 0.006699309218674898}, {"id": 438, "seek": 176772, "start": 1767.72, + "end": 1771.1200000000001, "text": " And then there''s another potential technique + that might help their short of introducing", "tokens": [50364, 400, 550, 456, 311, + 1071, 3995, 6532, 300, 1062, 854, 641, 2099, 295, 15424, 50534], "temperature": + 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": 2.0, "no_speech_prob": + 0.020148206502199173}, {"id": 439, "seek": 176772, "start": 1771.1200000000001, + "end": 1772.1200000000001, "text": " hard negatives.", "tokens": [50534, 1152, 40019, + 13, 50584], "temperature": 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": + 2.0, "no_speech_prob": 0.020148206502199173}, {"id": 440, "seek": 176772, "start": + 1772.1200000000001, "end": 1776.92, "text": " It''s easier than introducing hard + negatives just to add like what they call a margin loss,", "tokens": [50584, 467, + 311, 3571, 813, 15424, 1152, 40019, 445, 281, 909, 411, 437, 436, 818, 257, 10270, + 4470, 11, 50824], "temperature": 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": + 2.0, "no_speech_prob": 0.020148206502199173}, {"id": 441, "seek": 176772, "start": + 1776.92, "end": 1781.72, "text": " which is to essentially just say that the separation + in the score that the neural network", "tokens": [50824, 597, 307, 281, 4476, 445, + 584, 300, 264, 14634, 294, 264, 6175, 300, 
264, 18161, 3209, 51064], "temperature": + 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": 2.0, "no_speech_prob": + 0.020148206502199173}, {"id": 442, "seek": 176772, "start": 1781.72, "end": 1787.44, + "text": " assign the positive example versus the negative examples has to be large.", + "tokens": [51064, 6269, 264, 3353, 1365, 5717, 264, 3671, 5110, 575, 281, 312, 2416, + 13, 51350], "temperature": 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": + 2.0, "no_speech_prob": 0.020148206502199173}, {"id": 443, "seek": 176772, "start": + 1787.44, "end": 1792.96, "text": " So you sign some lambda and it has to be, essentially, + you handicap the scores of the positive", "tokens": [51350, 407, 291, 1465, 512, + 13607, 293, 309, 575, 281, 312, 11, 4476, 11, 291, 45975, 264, 13444, 295, 264, + 3353, 51626], "temperature": 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": + 2.0, "no_speech_prob": 0.020148206502199173}, {"id": 444, "seek": 176772, "start": + 1792.96, "end": 1797.2, "text": " examples by that lambda and it forces the neural + network to introduce more separation.", "tokens": [51626, 5110, 538, 300, 13607, + 293, 309, 5874, 264, 18161, 3209, 281, 5366, 544, 14634, 13, 51838], "temperature": + 0.0, "avg_logprob": -0.219615770422894, "compression_ratio": 2.0, "no_speech_prob": + 0.020148206502199173}, {"id": 445, "seek": 179720, "start": 1797.2, "end": 1802.2, + "text": " So sometimes that can be helpful, even if you haven''t generated hard + negatives.", "tokens": [50364, 407, 2171, 300, 393, 312, 4961, 11, 754, 498, 291, + 2378, 380, 10833, 1152, 40019, 13, 50614], "temperature": 0.0, "avg_logprob": -0.23219970294407435, + "compression_ratio": 1.6074074074074074, "no_speech_prob": 0.0014588504564017057}, + {"id": 446, "seek": 179720, "start": 1802.2, "end": 1805.3600000000001, "text": + " Yeah, yeah, absolutely.", "tokens": [50614, 865, 11, 1338, 11, 3122, 13, 50772], + "temperature": 0.0, "avg_logprob": 
-0.23219970294407435, "compression_ratio": 1.6074074074074074, + "no_speech_prob": 0.0014588504564017057}, {"id": 447, "seek": 179720, "start": 1805.3600000000001, + "end": 1810.3600000000001, "text": " Maybe we can also cite some papers in this + podcast, you know, like especially you mentioned", "tokens": [50772, 2704, 321, + 393, 611, 37771, 512, 10577, 294, 341, 7367, 11, 291, 458, 11, 411, 2318, 291, 2835, + 51022], "temperature": 0.0, "avg_logprob": -0.23219970294407435, "compression_ratio": + 1.6074074074074074, "no_speech_prob": 0.0014588504564017057}, {"id": 448, "seek": + 179720, "start": 1810.3600000000001, "end": 1811.3600000000001, "text": " some papers.", + "tokens": [51022, 512, 10577, 13, 51072], "temperature": 0.0, "avg_logprob": -0.23219970294407435, + "compression_ratio": 1.6074074074074074, "no_speech_prob": 0.0014588504564017057}, + {"id": 449, "seek": 179720, "start": 1811.3600000000001, "end": 1816.24, "text": + " And I will try to find this sentiment analysis paper, although I think it''s probably + like", "tokens": [51072, 400, 286, 486, 853, 281, 915, 341, 16149, 5215, 3035, 11, + 4878, 286, 519, 309, 311, 1391, 411, 51316], "temperature": 0.0, "avg_logprob": + -0.23219970294407435, "compression_ratio": 1.6074074074074074, "no_speech_prob": + 0.0014588504564017057}, {"id": 450, "seek": 179720, "start": 1816.24, "end": 1819.52, + "text": " five, six years old or maybe even older.", "tokens": [51316, 1732, 11, + 2309, 924, 1331, 420, 1310, 754, 4906, 13, 51480], "temperature": 0.0, "avg_logprob": + -0.23219970294407435, "compression_ratio": 1.6074074074074074, "no_speech_prob": + 0.0014588504564017057}, {"id": 451, "seek": 179720, "start": 1819.52, "end": 1822.6000000000001, + "text": " But I mean, this idea is still live forward, I think.", "tokens": [51480, + 583, 286, 914, 11, 341, 1558, 307, 920, 1621, 2128, 11, 286, 519, 13, 51634], "temperature": + 0.0, "avg_logprob": -0.23219970294407435, "compression_ratio": 1.6074074074074074, + 
"no_speech_prob": 0.0014588504564017057}, {"id": 452, "seek": 179720, "start": 1822.6000000000001, + "end": 1825.88, "text": " And like we shouldn''t forget about them.", "tokens": + [51634, 400, 411, 321, 4659, 380, 2870, 466, 552, 13, 51798], "temperature": 0.0, + "avg_logprob": -0.23219970294407435, "compression_ratio": 1.6074074074074074, "no_speech_prob": + 0.0014588504564017057}, {"id": 453, "seek": 182588, "start": 1825.88, "end": 1836.3200000000002, + "text": " And if we go back to your product, so basically, like you said that you + also kind of look", "tokens": [50364, 400, 498, 321, 352, 646, 281, 428, 1674, 11, + 370, 1936, 11, 411, 291, 848, 300, 291, 611, 733, 295, 574, 50886], "temperature": + 0.0, "avg_logprob": -0.2596008555094401, "compression_ratio": 1.510752688172043, + "no_speech_prob": 0.008051185868680477}, {"id": 454, "seek": 182588, "start": 1836.3200000000002, + "end": 1842.0, "text": " at using some of the existing algorithms in vector search, + can you name them?", "tokens": [50886, 412, 1228, 512, 295, 264, 6741, 14642, 294, + 8062, 3164, 11, 393, 291, 1315, 552, 30, 51170], "temperature": 0.0, "avg_logprob": + -0.2596008555094401, "compression_ratio": 1.510752688172043, "no_speech_prob": 0.008051185868680477}, + {"id": 455, "seek": 182588, "start": 1842.0, "end": 1846.2, "text": " Or is this + some kind of secret or are you customizing them as well?", "tokens": [51170, 1610, + 307, 341, 512, 733, 295, 4054, 420, 366, 291, 2375, 3319, 552, 382, 731, 30, 51380], + "temperature": 0.0, "avg_logprob": -0.2596008555094401, "compression_ratio": 1.510752688172043, + "no_speech_prob": 0.008051185868680477}, {"id": 456, "seek": 182588, "start": 1846.2, + "end": 1849.4, "text": " So for the vector search, specifically.", "tokens": [51380, + 407, 337, 264, 8062, 3164, 11, 4682, 13, 51540], "temperature": 0.0, "avg_logprob": + -0.2596008555094401, "compression_ratio": 1.510752688172043, "no_speech_prob": 0.008051185868680477}, + {"id": 457, "seek": 
182588, "start": 1849.4, "end": 1850.4, "text": " Yeah.", "tokens": + [51540, 865, 13, 51590], "temperature": 0.0, "avg_logprob": -0.2596008555094401, + "compression_ratio": 1.510752688172043, "no_speech_prob": 0.008051185868680477}, + {"id": 458, "seek": 185040, "start": 1850.92, "end": 1857.96, "text": " I think + we can say that we know we at our core, we do take advantage of phase or fire.", + "tokens": [50390, 286, 519, 321, 393, 584, 300, 321, 458, 321, 412, 527, 4965, 11, + 321, 360, 747, 5002, 295, 5574, 420, 2610, 13, 50742], "temperature": 0.0, "avg_logprob": + -0.36677643625359785, "compression_ratio": 1.5330396475770924, "no_speech_prob": + 0.12372417002916336}, {"id": 459, "seek": 185040, "start": 1857.96, "end": 1858.96, + "text": " So I''m not exactly right.", "tokens": [50742, 407, 286, 478, 406, 2293, + 558, 13, 50792], "temperature": 0.0, "avg_logprob": -0.36677643625359785, "compression_ratio": + 1.5330396475770924, "no_speech_prob": 0.12372417002916336}, {"id": 460, "seek": + 185040, "start": 1858.96, "end": 1859.96, "text": " I don''t pronounce that from.", + "tokens": [50792, 286, 500, 380, 19567, 300, 490, 13, 50842], "temperature": 0.0, + "avg_logprob": -0.36677643625359785, "compression_ratio": 1.5330396475770924, "no_speech_prob": + 0.12372417002916336}, {"id": 461, "seek": 185040, "start": 1859.96, "end": 1860.96, + "text": " No, but nobody knows.", "tokens": [50842, 883, 11, 457, 5079, 3255, 13, + 50892], "temperature": 0.0, "avg_logprob": -0.36677643625359785, "compression_ratio": + 1.5330396475770924, "no_speech_prob": 0.12372417002916336}, {"id": 462, "seek": + 185040, "start": 1860.96, "end": 1864.3600000000001, "text": " I think everyone + says they own weight.", "tokens": [50892, 286, 519, 1518, 1619, 436, 1065, 3364, + 13, 51062], "temperature": 0.0, "avg_logprob": -0.36677643625359785, "compression_ratio": + 1.5330396475770924, "no_speech_prob": 0.12372417002916336}, {"id": 463, "seek": + 185040, "start": 1864.3600000000001, 
"end": 1872.2800000000002, "text": " In my + opinion, it''s just an excellently designed system with a team that''s actively + maintaining", "tokens": [51062, 682, 452, 4800, 11, 309, 311, 445, 364, 45817, 2276, + 4761, 1185, 365, 257, 1469, 300, 311, 13022, 14916, 51458], "temperature": 0.0, + "avg_logprob": -0.36677643625359785, "compression_ratio": 1.5330396475770924, "no_speech_prob": + 0.12372417002916336}, {"id": 464, "seek": 185040, "start": 1872.2800000000002, "end": + 1876.44, "text": " it and there are obviously experts in that field.", "tokens": + [51458, 309, 293, 456, 366, 2745, 8572, 294, 300, 2519, 13, 51666], "temperature": + 0.0, "avg_logprob": -0.36677643625359785, "compression_ratio": 1.5330396475770924, + "no_speech_prob": 0.12372417002916336}, {"id": 465, "seek": 187644, "start": 1876.44, + "end": 1885.72, "text": " One of the features that customers have requested from + us is the ability to mix in predicate", "tokens": [50364, 1485, 295, 264, 4122, + 300, 4581, 362, 16436, 490, 505, 307, 264, 3485, 281, 2890, 294, 3852, 8700, 50828], + "temperature": 0.0, "avg_logprob": -0.2067992023585998, "compression_ratio": 1.6736401673640167, + "no_speech_prob": 0.04804878681898117}, {"id": 466, "seek": 187644, "start": 1885.72, + "end": 1887.88, "text": " predicates and traditional Boolean logic.", "tokens": + [50828, 47336, 1024, 293, 5164, 23351, 28499, 9952, 13, 50936], "temperature": 0.0, + "avg_logprob": -0.2067992023585998, "compression_ratio": 1.6736401673640167, "no_speech_prob": + 0.04804878681898117}, {"id": 467, "seek": 187644, "start": 1887.88, "end": 1893.0800000000002, + "text": " So you might have this corpus of documents and they all have this, every + document has", "tokens": [50936, 407, 291, 1062, 362, 341, 1181, 31624, 295, 8512, + 293, 436, 439, 362, 341, 11, 633, 4166, 575, 51196], "temperature": 0.0, "avg_logprob": + -0.2067992023585998, "compression_ratio": 1.6736401673640167, "no_speech_prob": + 0.04804878681898117}, {"id": 468, 
"seek": 187644, "start": 1893.0800000000002, "end": + 1895.68, "text": " this metadata, which is the date it was published.", "tokens": + [51196, 341, 26603, 11, 597, 307, 264, 4002, 309, 390, 6572, 13, 51326], "temperature": + 0.0, "avg_logprob": -0.2067992023585998, "compression_ratio": 1.6736401673640167, + "no_speech_prob": 0.04804878681898117}, {"id": 469, "seek": 187644, "start": 1895.68, + "end": 1898.48, "text": " And then you might want to say, okay, give me the most + relevant matches for the query,", "tokens": [51326, 400, 550, 291, 1062, 528, 281, + 584, 11, 1392, 11, 976, 385, 264, 881, 7340, 10676, 337, 264, 14581, 11, 51466], + "temperature": 0.0, "avg_logprob": -0.2067992023585998, "compression_ratio": 1.6736401673640167, + "no_speech_prob": 0.04804878681898117}, {"id": 470, "seek": 187644, "start": 1898.48, + "end": 1901.56, "text": " but only for documents published in 2021.", "tokens": + [51466, 457, 787, 337, 8512, 6572, 294, 7201, 13, 51620], "temperature": 0.0, "avg_logprob": + -0.2067992023585998, "compression_ratio": 1.6736401673640167, "no_speech_prob": + 0.04804878681898117}, {"id": 471, "seek": 190156, "start": 1901.56, "end": 1907.84, + "text": " This is like a very crisp selection criteria and this selects a subset + of the corpus.", "tokens": [50364, 639, 307, 411, 257, 588, 22952, 9450, 11101, + 293, 341, 3048, 82, 257, 25993, 295, 264, 1181, 31624, 13, 50678], "temperature": + 0.0, "avg_logprob": -0.2956766278556224, "compression_ratio": 1.5619469026548674, + "no_speech_prob": 0.009947966784238815}, {"id": 472, "seek": 190156, "start": 1907.84, + "end": 1914.2, "text": " So this is actually something that we have not launched + yet, but we''ve been actively working", "tokens": [50678, 407, 341, 307, 767, 746, + 300, 321, 362, 406, 8730, 1939, 11, 457, 321, 600, 668, 13022, 1364, 50996], "temperature": + 0.0, "avg_logprob": -0.2956766278556224, "compression_ratio": 1.5619469026548674, + "no_speech_prob": 0.009947966784238815}, {"id": 
473, "seek": 190156, "start": 1914.2,
+ "end": 1917.28, "text": " on and will probably launch in Q1.", "tokens": [50996,
+ 322, 293, 486, 1391, 4025, 294, 1249, 16, 13, 51150], "temperature": 0.0, "avg_logprob":
+ -0.2956766278556224, "compression_ratio": 1.5619469026548674, "no_speech_prob":
+ 0.009947966784238815}, {"id": 474, "seek": 190156, "start": 1917.28, "end": 1926.3999999999999,
+ "text": " I believe, I recently added the support Google Vertex Matching Engine,
+ I think, is a recent", "tokens": [51150, 286, 1697, 11, 286, 3938, 3869, 264, 1406,
+ 3329, 28162, 14324, 2848, 11, 286, 519, 11, 307, 257, 5162, 51606], "temperature":
+ 0.0, "avg_logprob": -0.2956766278556224, "compression_ratio": 1.5619469026548674,
+ "no_speech_prob": 0.009947966784238815}, {"id": 475, "seek": 190156, "start": 1926.3999999999999,
+ "end": 1927.3999999999999, "text": " offering.", "tokens": [51606, 8745, 13, 51656],
+ "temperature": 0.0, "avg_logprob": -0.2956766278556224, "compression_ratio": 1.5619469026548674,
+ "no_speech_prob": 0.009947966784238815}, {"id": 476, "seek": 190156, "start": 1927.3999999999999,
+ "end": 1929.3999999999999, "text": " They also claim to have this support.", "tokens":
+ [51656, 814, 611, 3932, 281, 362, 341, 1406, 13, 51756], "temperature": 0.0, "avg_logprob":
+ -0.2956766278556224, "compression_ratio": 1.5619469026548674, "no_speech_prob":
+ 0.009947966784238815}, {"id": 477, "seek": 192940, "start": 1929.4, "end": 1932.16,
+ "text": " It''s important, many of our customers have asked for the same thing.",
+ "tokens": [50364, 467, 311, 1021, 11, 867, 295, 527, 4581, 362, 2351, 337, 264,
+ 912, 551, 13, 50502], "temperature": 0.0, "avg_logprob": -0.28342952284702033, "compression_ratio":
+ 1.6223776223776223, "no_speech_prob": 0.04829133301973343}, {"id": 478, "seek":
+ 192940, "start": 1932.16, "end": 1939.0, "text": " So we''ve started from FAISS,
+ but we have been customizing it.", "tokens": [50502, 407, 321, 600, 1409, 490, 257,
+ 479, 40, 
3160, 11, 457, 321, 362, 668, 2375, 3319, 309, 13, 50844], "temperature": + 0.0, "avg_logprob": -0.28342952284702033, "compression_ratio": 1.6223776223776223, + "no_speech_prob": 0.04829133301973343}, {"id": 479, "seek": 192940, "start": 1939.0, + "end": 1940.2, "text": " Yeah, yeah, sounds good.", "tokens": [50844, 865, 11, 1338, + 11, 3263, 665, 13, 50904], "temperature": 0.0, "avg_logprob": -0.28342952284702033, + "compression_ratio": 1.6223776223776223, "no_speech_prob": 0.04829133301973343}, + {"id": 480, "seek": 192940, "start": 1940.2, "end": 1945.16, "text": " So basically + some other companies call it symbolic filtering and that''s what I think you", "tokens": + [50904, 407, 1936, 512, 661, 3431, 818, 309, 25755, 30822, 293, 300, 311, 437, 286, + 519, 291, 51152], "temperature": 0.0, "avg_logprob": -0.28342952284702033, "compression_ratio": + 1.6223776223776223, "no_speech_prob": 0.04829133301973343}, {"id": 481, "seek": + 192940, "start": 1945.16, "end": 1946.16, "text": " refer to, right?", "tokens": + [51152, 2864, 281, 11, 558, 30, 51202], "temperature": 0.0, "avg_logprob": -0.28342952284702033, + "compression_ratio": 1.6223776223776223, "no_speech_prob": 0.04829133301973343}, + {"id": 482, "seek": 192940, "start": 1946.16, "end": 1951.16, "text": " So I can + have certain categorical variables, so to say in my data and I can filter by it,", + "tokens": [51202, 407, 286, 393, 362, 1629, 19250, 804, 9102, 11, 370, 281, 584, + 294, 452, 1412, 293, 286, 393, 6608, 538, 309, 11, 51452], "temperature": 0.0, "avg_logprob": + -0.28342952284702033, "compression_ratio": 1.6223776223776223, "no_speech_prob": + 0.04829133301973343}, {"id": 483, "seek": 192940, "start": 1951.16, "end": 1952.16, + "text": " right?", "tokens": [51452, 558, 30, 51502], "temperature": 0.0, "avg_logprob": + -0.28342952284702033, "compression_ratio": 1.6223776223776223, "no_speech_prob": + 0.04829133301973343}, {"id": 484, "seek": 192940, "start": 1952.16, "end": 1953.16, + "text": " 
Exactly, right.", "tokens": [51502, 7587, 11, 558, 13, 51552], "temperature":
+ 0.0, "avg_logprob": -0.28342952284702033, "compression_ratio": 1.6223776223776223,
+ "no_speech_prob": 0.04829133301973343}, {"id": 485, "seek": 192940, "start": 1953.16,
+ "end": 1958.76, "text": " Yeah, so I think vanilla FAISS doesn''t have this
+ functionality as far as I know.", "tokens": [51552, 865, 11, 370, 286, 519, 17528,
+ 15044, 295, 1851, 1177, 380, 362, 341, 14980, 382, 1400, 382, 286, 458, 13, 51832],
+ "temperature": 0.0, "avg_logprob": -0.28342952284702033, "compression_ratio": 1.6223776223776223,
+ "no_speech_prob": 0.04829133301973343}, {"id": 486, "seek": 195876, "start": 1958.76,
+ "end": 1961.16, "text": " And so essentially you''ll have to kind of extend it.",
+ "tokens": [50364, 400, 370, 4476, 291, 603, 362, 281, 733, 295, 10101, 309, 13,
+ 50484], "temperature": 0.0, "avg_logprob": -0.1534564899948408, "compression_ratio":
+ 1.6216216216216217, "no_speech_prob": 0.0035870415158569813}, {"id": 487, "seek":
+ 195876, "start": 1961.16, "end": 1965.32, "text": " And do you plan to keep it to
+ yourself, which is perfectly fine, or are you also able", "tokens": [50484, 400,
+ 360, 291, 1393, 281, 1066, 309, 281, 1803, 11, 597, 307, 6239, 2489, 11, 420, 366,
+ 291, 611, 1075, 50692], "temperature": 0.0, "avg_logprob": -0.1534564899948408,
+ "compression_ratio": 1.6216216216216217, "no_speech_prob": 0.0035870415158569813},
+ {"id": 488, "seek": 195876, "start": 1965.32, "end": 1972.12, "text": " to contribute
+ it back to the FAISS open source project?", "tokens": [50692, 281, 10586, 309, 646,
+ 281, 264, 479, 40, 3160, 1269, 4009, 1716, 30, 51032], "temperature": 0.0, "avg_logprob":
+ -0.1534564899948408, "compression_ratio": 1.6216216216216217, "no_speech_prob":
+ 0.0035870415158569813}, {"id": 489, "seek": 195876, "start": 1972.12, "end": 1977.16,
+ "text": " So I think what I''ve noticed about the authors of FAISS is that they want
+ to keep the product", 
"tokens": [51032, 407, 286, 519, 437, 286, 600, 5694, 466, + 264, 16552, 295, 479, 40, 3160, 307, 300, 436, 528, 281, 1066, 264, 1674, 51284], + "temperature": 0.0, "avg_logprob": -0.1534564899948408, "compression_ratio": 1.6216216216216217, + "no_speech_prob": 0.0035870415158569813}, {"id": 490, "seek": 195876, "start": 1977.16, + "end": 1981.28, "text": " very focused on being a first class vector engine.", "tokens": + [51284, 588, 5178, 322, 885, 257, 700, 1508, 8062, 2848, 13, 51490], "temperature": + 0.0, "avg_logprob": -0.1534564899948408, "compression_ratio": 1.6216216216216217, + "no_speech_prob": 0.0035870415158569813}, {"id": 491, "seek": 195876, "start": 1981.28, + "end": 1986.52, "text": " And these are essentially augmentations that they''re + not interested in pulling in.", "tokens": [51490, 400, 613, 366, 4476, 29919, 763, + 300, 436, 434, 406, 3102, 294, 8407, 294, 13, 51752], "temperature": 0.0, "avg_logprob": + -0.1534564899948408, "compression_ratio": 1.6216216216216217, "no_speech_prob": + 0.0035870415158569813}, {"id": 492, "seek": 198652, "start": 1986.52, "end": 1992.2, + "text": " And I think they would see it as scope creep, which is probably fair.", + "tokens": [50364, 400, 286, 519, 436, 576, 536, 309, 382, 11923, 9626, 11, 597, + 307, 1391, 3143, 13, 50648], "temperature": 0.0, "avg_logprob": -0.218062345798199, + "compression_ratio": 1.7581967213114753, "no_speech_prob": 0.0031924243085086346}, + {"id": 493, "seek": 198652, "start": 1992.2, "end": 1995.2, "text": " That said, + would we contribute it as open source?", "tokens": [50648, 663, 848, 11, 576, 321, + 10586, 309, 382, 1269, 4009, 30, 50798], "temperature": 0.0, "avg_logprob": -0.218062345798199, + "compression_ratio": 1.7581967213114753, "no_speech_prob": 0.0031924243085086346}, + {"id": 494, "seek": 198652, "start": 1995.2, "end": 1998.0, "text": " Like we could + still contribute it back as open source.", "tokens": [50798, 1743, 321, 727, 920, + 10586, 309, 646, 382, 1269, 
4009, 13, 50938], "temperature": 0.0, "avg_logprob": + -0.218062345798199, "compression_ratio": 1.7581967213114753, "no_speech_prob": 0.0031924243085086346}, + {"id": 495, "seek": 198652, "start": 1998.0, "end": 2002.0, "text": " In fact, down + the line we could potentially make our entire stack open source.", "tokens": [50938, + 682, 1186, 11, 760, 264, 1622, 321, 727, 7263, 652, 527, 2302, 8630, 1269, 4009, + 13, 51138], "temperature": 0.0, "avg_logprob": -0.218062345798199, "compression_ratio": + 1.7581967213114753, "no_speech_prob": 0.0031924243085086346}, {"id": 496, "seek": + 198652, "start": 2002.0, "end": 2010.4, "text": " I think some of the abusiveness + of that, say, in regards to elastic and how it''s worked,", "tokens": [51138, 286, + 519, 512, 295, 264, 48819, 8477, 295, 300, 11, 584, 11, 294, 14258, 281, 17115, + 293, 577, 309, 311, 2732, 11, 51558], "temperature": 0.0, "avg_logprob": -0.218062345798199, + "compression_ratio": 1.7581967213114753, "no_speech_prob": 0.0031924243085086346}, + {"id": 497, "seek": 198652, "start": 2010.4, "end": 2014.08, "text": " where you + have these very large companies that essentially contribute very little, but", "tokens": + [51558, 689, 291, 362, 613, 588, 2416, 3431, 300, 4476, 10586, 588, 707, 11, 457, + 51742], "temperature": 0.0, "avg_logprob": -0.218062345798199, "compression_ratio": + 1.7581967213114753, "no_speech_prob": 0.0031924243085086346}, {"id": 498, "seek": + 201408, "start": 2014.08, "end": 2022.0, "text": " they take advantage of their + ability to launch platforms as a service like Amazon can.", "tokens": [50364, 436, + 747, 5002, 295, 641, 3485, 281, 4025, 9473, 382, 257, 2643, 411, 6795, 393, 13, + 50760], "temperature": 0.0, "avg_logprob": -0.29417976379394534, "compression_ratio": + 1.5254901960784313, "no_speech_prob": 0.014069268479943275}, {"id": 499, "seek": + 201408, "start": 2022.0, "end": 2023.8, "text": " That''s kind of scared us.", "tokens": + [50760, 663, 311, 733, 295, 5338, 505, 
13, 50850], "temperature": 0.0, "avg_logprob": + -0.29417976379394534, "compression_ratio": 1.5254901960784313, "no_speech_prob": + 0.014069268479943275}, {"id": 500, "seek": 201408, "start": 2023.8, "end": 2028.36, + "text": " I think in the short term, we''re not doing that, but that''s certainly + something we could", "tokens": [50850, 286, 519, 294, 264, 2099, 1433, 11, 321, + 434, 406, 884, 300, 11, 457, 300, 311, 3297, 746, 321, 727, 51078], "temperature": + 0.0, "avg_logprob": -0.29417976379394534, "compression_ratio": 1.5254901960784313, + "no_speech_prob": 0.014069268479943275}, {"id": 501, "seek": 201408, "start": 2028.36, + "end": 2030.56, "text": " plan on doing in the longer term.", "tokens": [51078, + 1393, 322, 884, 294, 264, 2854, 1433, 13, 51188], "temperature": 0.0, "avg_logprob": + -0.29417976379394534, "compression_ratio": 1.5254901960784313, "no_speech_prob": + 0.014069268479943275}, {"id": 502, "seek": 201408, "start": 2030.56, "end": 2031.56, + "text": " Yeah.", "tokens": [51188, 865, 13, 51238], "temperature": 0.0, "avg_logprob": + -0.29417976379394534, "compression_ratio": 1.5254901960784313, "no_speech_prob": + 0.014069268479943275}, {"id": 503, "seek": 201408, "start": 2031.56, "end": 2038.04, + "text": " And I mean, of course, the dynamics of open source is not necessarily + solved, especially as", "tokens": [51238, 400, 286, 914, 11, 295, 1164, 11, 264, + 15679, 295, 1269, 4009, 307, 406, 4725, 13041, 11, 2318, 382, 51562], "temperature": + 0.0, "avg_logprob": -0.29417976379394534, "compression_ratio": 1.5254901960784313, + "no_speech_prob": 0.014069268479943275}, {"id": 504, "seek": 201408, "start": 2038.04, + "end": 2040.48, "text": " you''ve brought up this example with the elastic, right?", + "tokens": [51562, 291, 600, 3038, 493, 341, 1365, 365, 264, 17115, 11, 558, 30, + 51684], "temperature": 0.0, "avg_logprob": -0.29417976379394534, "compression_ratio": + 1.5254901960784313, "no_speech_prob": 0.014069268479943275}, {"id": 505, 
"seek": + 204048, "start": 2040.48, "end": 2044.8, "text": " And they''re kind of battle between + elastic and Amazon.", "tokens": [50364, 400, 436, 434, 733, 295, 4635, 1296, 17115, + 293, 6795, 13, 50580], "temperature": 0.0, "avg_logprob": -0.22550853836202175, + "compression_ratio": 1.6108949416342413, "no_speech_prob": 0.043774936348199844}, + {"id": 506, "seek": 204048, "start": 2044.8, "end": 2049.8, "text": " But like for + some companies, it still works as a starter.", "tokens": [50580, 583, 411, 337, + 512, 3431, 11, 309, 920, 1985, 382, 257, 22465, 13, 50830], "temperature": 0.0, + "avg_logprob": -0.22550853836202175, "compression_ratio": 1.6108949416342413, "no_speech_prob": + 0.043774936348199844}, {"id": 507, "seek": 204048, "start": 2049.8, "end": 2051.6, + "text": " You can enter this community.", "tokens": [50830, 509, 393, 3242, 341, + 1768, 13, 50920], "temperature": 0.0, "avg_logprob": -0.22550853836202175, "compression_ratio": + 1.6108949416342413, "no_speech_prob": 0.043774936348199844}, {"id": 508, "seek": + 204048, "start": 2051.6, "end": 2053.8, "text": " You start building the community + around you.", "tokens": [50920, 509, 722, 2390, 264, 1768, 926, 291, 13, 51030], + "temperature": 0.0, "avg_logprob": -0.22550853836202175, "compression_ratio": 1.6108949416342413, + "no_speech_prob": 0.043774936348199844}, {"id": 509, "seek": 204048, "start": 2053.8, + "end": 2055.64, "text": " And so they bring back ideas.", "tokens": [51030, 400, + 370, 436, 1565, 646, 3487, 13, 51122], "temperature": 0.0, "avg_logprob": -0.22550853836202175, + "compression_ratio": 1.6108949416342413, "no_speech_prob": 0.043774936348199844}, + {"id": 510, "seek": 204048, "start": 2055.64, "end": 2058.52, "text": " They feed + in new use cases to you.", "tokens": [51122, 814, 3154, 294, 777, 764, 3331, 281, + 291, 13, 51266], "temperature": 0.0, "avg_logprob": -0.22550853836202175, "compression_ratio": + 1.6108949416342413, "no_speech_prob": 0.043774936348199844}, 
{"id": 511, "seek": + 204048, "start": 2058.52, "end": 2061.4, "text": " And maybe they even implement + some features, right?", "tokens": [51266, 400, 1310, 436, 754, 4445, 512, 4122, + 11, 558, 30, 51410], "temperature": 0.0, "avg_logprob": -0.22550853836202175, "compression_ratio": + 1.6108949416342413, "no_speech_prob": 0.043774936348199844}, {"id": 512, "seek": + 204048, "start": 2061.4, "end": 2066.92, "text": " And is this something that you''ve + been thinking as well along these lines?", "tokens": [51410, 400, 307, 341, 746, + 300, 291, 600, 668, 1953, 382, 731, 2051, 613, 3876, 30, 51686], "temperature": + 0.0, "avg_logprob": -0.22550853836202175, "compression_ratio": 1.6108949416342413, + "no_speech_prob": 0.043774936348199844}, {"id": 513, "seek": 204048, "start": 2066.92, + "end": 2069.04, "text": " Well, I definitely see your point.", "tokens": [51686, + 1042, 11, 286, 2138, 536, 428, 935, 13, 51792], "temperature": 0.0, "avg_logprob": + -0.22550853836202175, "compression_ratio": 1.6108949416342413, "no_speech_prob": + 0.043774936348199844}, {"id": 514, "seek": 206904, "start": 2069.04, "end": 2070.04, + "text": " I definitely see your point.", "tokens": [50364, 286, 2138, 536, 428, + 935, 13, 50414], "temperature": 0.0, "avg_logprob": -0.25695695072771557, "compression_ratio": + 1.526829268292683, "no_speech_prob": 0.0011328626424074173}, {"id": 515, "seek": + 206904, "start": 2070.04, "end": 2078.04, "text": " At the same time, we also do + have some competition in the space.", "tokens": [50414, 1711, 264, 912, 565, 11, + 321, 611, 360, 362, 512, 6211, 294, 264, 1901, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.25695695072771557, "compression_ratio": 1.526829268292683, "no_speech_prob": + 0.0011328626424074173}, {"id": 516, "seek": 206904, "start": 2078.04, "end": 2086.32, + "text": " We''re still in the early days, but 2021, in particular, saw the launch + of several competitors.", "tokens": [50814, 492, 434, 920, 294, 264, 2440, 1708, + 
11, 457, 7201, 11, 294, 1729, 11, 1866, 264, 4025, 295, 2940, 18333, 13, 51228], + "temperature": 0.0, "avg_logprob": -0.25695695072771557, "compression_ratio": 1.526829268292683, + "no_speech_prob": 0.0011328626424074173}, {"id": 517, "seek": 206904, "start": 2086.32, + "end": 2091.12, "text": " And even Microsoft is in the mix now, Microsoft''s semantic + search.", "tokens": [51228, 400, 754, 8116, 307, 294, 264, 2890, 586, 11, 8116, + 311, 47982, 3164, 13, 51468], "temperature": 0.0, "avg_logprob": -0.25695695072771557, + "compression_ratio": 1.526829268292683, "no_speech_prob": 0.0011328626424074173}, + {"id": 518, "seek": 206904, "start": 2091.12, "end": 2094.8, "text": " I think it''s + still in beta, Amazon launch Kendra in 2020.", "tokens": [51468, 286, 519, 309, + 311, 920, 294, 9861, 11, 6795, 4025, 20891, 424, 294, 4808, 13, 51652], "temperature": + 0.0, "avg_logprob": -0.25695695072771557, "compression_ratio": 1.526829268292683, + "no_speech_prob": 0.0011328626424074173}, {"id": 519, "seek": 209480, "start": 2094.8, + "end": 2100.1200000000003, "text": " I think that they probably get the credit for + launching the first platform as a service", "tokens": [50364, 286, 519, 300, 436, + 1391, 483, 264, 5397, 337, 18354, 264, 700, 3663, 382, 257, 2643, 50630], "temperature": + 0.0, "avg_logprob": -0.19223769098265558, "compression_ratio": 1.6779661016949152, + "no_speech_prob": 0.010225232690572739}, {"id": 520, "seek": 209480, "start": 2100.1200000000003, + "end": 2102.7200000000003, "text": " neural information retrieval system.", "tokens": + [50630, 18161, 1589, 19817, 3337, 1185, 13, 50760], "temperature": 0.0, "avg_logprob": + -0.19223769098265558, "compression_ratio": 1.6779661016949152, "no_speech_prob": + 0.010225232690572739}, {"id": 521, "seek": 209480, "start": 2102.7200000000003, + "end": 2109.04, "text": " So in both of the cases, both of those systems, by the + way, I think that they actually are", "tokens": [50760, 407, 294, 1293, 295, 264, + 
3331, 11, 1293, 295, 729, 3652, 11, 538, 264, 636, 11, 286, 519, 300, 436, 767,
+ 366, 51076], "temperature": 0.0, "avg_logprob": -0.19223769098265558, "compression_ratio":
+ 1.6779661016949152, "no_speech_prob": 0.010225232690572739}, {"id": 522, "seek":
+ 209480, "start": 2109.04, "end": 2115.2400000000002, "text": " fundamentally based
+ on a BM25 search, followed by re-ranking with the neural network.", "tokens": [51076,
+ 17879, 2361, 322, 257, 18038, 6074, 3164, 11, 6263, 538, 319, 12, 20479, 278, 365,
+ 264, 18161, 3209, 13, 51386], "temperature": 0.0, "avg_logprob": -0.19223769098265558,
+ "compression_ratio": 1.6779661016949152, "no_speech_prob": 0.010225232690572739},
+ {"id": 523, "seek": 209480, "start": 2115.2400000000002, "end": 2119.2200000000003,
+ "text": " This is what I''ve gathered from their own product marketing material,
+ which is still", "tokens": [51386, 639, 307, 437, 286, 600, 13032, 490, 641, 1065,
+ 1674, 6370, 2527, 11, 597, 307, 920, 51585], "temperature": 0.0, "avg_logprob":
+ -0.19223769098265558, "compression_ratio": 1.6779661016949152, "no_speech_prob":
+ 0.010225232690572739}, {"id": 524, "seek": 209480, "start": 2119.2200000000003,
+ "end": 2120.2200000000003, "text": " in neural search.", "tokens": [51585, 294,
+ 18161, 3164, 13, 51635], "temperature": 0.0, "avg_logprob": -0.19223769098265558,
+ "compression_ratio": 1.6779661016949152, "no_speech_prob": 0.010225232690572739},
+ {"id": 525, "seek": 209480, "start": 2120.2200000000003, "end": 2124.4, "text":
+ " It just has a different set of pros and cons versus like straight retrieval from
+ a vector", "tokens": [51635, 467, 445, 575, 257, 2649, 484, 295, 6267, 293, 1014,
+ 5717, 411, 2997, 19817, 3337, 490, 257, 8062, 51844], "temperature": 0.0, "avg_logprob":
+ -0.19223769098265558, "compression_ratio": 1.6779661016949152, "no_speech_prob":
+ 0.010225232690572739}, {"id": 526, "seek": 212440, "start": 2124.4, "end": 2125.4,
+ "text": " database.", "tokens": [50364, 8149, 
13, 50414], "temperature": 0.0, "avg_logprob":
+ -0.21398587954246392, "compression_ratio": 1.703125, "no_speech_prob": 0.011719164438545704},
+ {"id": 527, "seek": 212440, "start": 2125.4, "end": 2132.48, "text": " So, for instance,
+ just to give you one quick example, a multilingual search, BM25 is not", "tokens":
+ [50414, 407, 11, 337, 5197, 11, 445, 281, 976, 291, 472, 1702, 1365, 11, 257, 2120,
+ 38219, 3164, 11, 18038, 6074, 307, 406, 50768], "temperature": 0.0, "avg_logprob":
+ -0.21398587954246392, "compression_ratio": 1.703125, "no_speech_prob": 0.011719164438545704},
+ {"id": 528, "seek": 212440, "start": 2132.48, "end": 2134.6800000000003, "text":
+ " going to work for a multilingual search.", "tokens": [50768, 516, 281, 589, 337,
+ 257, 2120, 38219, 3164, 13, 50878], "temperature": 0.0, "avg_logprob": -0.21398587954246392,
+ "compression_ratio": 1.703125, "no_speech_prob": 0.011719164438545704}, {"id": 529,
+ "seek": 212440, "start": 2134.6800000000003, "end": 2139.7200000000003, "text":
+ " You have queries coming in different languages, documents in different languages.",
+ "tokens": [50878, 509, 362, 24109, 1348, 294, 819, 8650, 11, 8512, 294, 819, 8650,
+ 13, 51130], "temperature": 0.0, "avg_logprob": -0.21398587954246392, "compression_ratio":
+ 1.703125, "no_speech_prob": 0.011719164438545704}, {"id": 530, "seek": 212440, "start":
+ 2139.7200000000003, "end": 2145.04, "text": " BM25 won''t work there, nor will a
+ re-rank on a BM25 results approach work over there,", "tokens": [51130, 18038, 6074,
+ 1582, 380, 589, 456, 11, 6051, 486, 257, 319, 12, 20479, 322, 257, 18038, 6074,
+ 3542, 3109, 589, 670, 456, 11, 51396], "temperature": 0.0, "avg_logprob": -0.21398587954246392,
+ "compression_ratio": 1.703125, "no_speech_prob": 0.011719164438545704}, {"id": 531,
+ "seek": 212440, "start": 2145.04, "end": 2148.84, "text": " because the BM25 has
+ to bring something 
646, 281, 319, 12, 20479, 278, 13, 51586], "temperature": 0.0, + "avg_logprob": -0.21398587954246392, "compression_ratio": 1.703125, "no_speech_prob": + 0.011719164438545704}, {"id": 532, "seek": 212440, "start": 2148.84, "end": 2153.52, + "text": " Well, in the case of our system, you can check on some of the demos.", + "tokens": [51586, 1042, 11, 294, 264, 1389, 295, 527, 1185, 11, 291, 393, 1520, + 322, 512, 295, 264, 33788, 13, 51820], "temperature": 0.0, "avg_logprob": -0.21398587954246392, + "compression_ratio": 1.703125, "no_speech_prob": 0.011719164438545704}, {"id": 533, + "seek": 215352, "start": 2153.52, "end": 2158.08, "text": " We can actually embed + across languages into a shared embedding space.", "tokens": [50364, 492, 393, 767, + 12240, 2108, 8650, 666, 257, 5507, 12240, 3584, 1901, 13, 50592], "temperature": + 0.0, "avg_logprob": -0.24934017986332604, "compression_ratio": 1.6450381679389312, + "no_speech_prob": 0.01436303649097681}, {"id": 534, "seek": 215352, "start": 2158.08, + "end": 2160.64, "text": " And so you can search across languages.", "tokens": [50592, + 400, 370, 291, 393, 3164, 2108, 8650, 13, 50720], "temperature": 0.0, "avg_logprob": + -0.24934017986332604, "compression_ratio": 1.6450381679389312, "no_speech_prob": + 0.01436303649097681}, {"id": 535, "seek": 215352, "start": 2160.64, "end": 2163.64, + "text": " That''s something which you need a vector database for.", "tokens": [50720, + 663, 311, 746, 597, 291, 643, 257, 8062, 8149, 337, 13, 50870], "temperature": 0.0, + "avg_logprob": -0.24934017986332604, "compression_ratio": 1.6450381679389312, "no_speech_prob": + 0.01436303649097681}, {"id": 536, "seek": 215352, "start": 2163.64, "end": 2164.64, + "text": " Yeah, exactly.", "tokens": [50870, 865, 11, 2293, 13, 50920], "temperature": + 0.0, "avg_logprob": -0.24934017986332604, "compression_ratio": 1.6450381679389312, + "no_speech_prob": 0.01436303649097681}, {"id": 537, "seek": 215352, "start": 2164.64, + "end": 2171.52, 
"text": " So you go multilingual on the first stage of retrieving
+ the data dates, right?", "tokens": [50920, 407, 291, 352, 2120, 38219, 322, 264,
+ 700, 3233, 295, 19817, 798, 264, 1412, 11691, 11, 558, 30, 51264], "temperature":
+ 0.0, "avg_logprob": -0.24934017986332604, "compression_ratio": 1.6450381679389312,
+ "no_speech_prob": 0.01436303649097681}, {"id": 538, "seek": 215352, "start": 2171.52,
+ "end": 2175.56, "text": " And I think this multilingual search in general, I think
+ it has so much potential.", "tokens": [51264, 400, 286, 519, 341, 2120, 38219, 3164,
+ 294, 2674, 11, 286, 519, 309, 575, 370, 709, 3995, 13, 51466], "temperature": 0.0,
+ "avg_logprob": -0.24934017986332604, "compression_ratio": 1.6450381679389312, "no_speech_prob":
+ 0.01436303649097681}, {"id": 539, "seek": 215352, "start": 2175.56, "end": 2181.0,
+ "text": " I don''t know if Google is using it already, to some extent, but like
+ even like at smaller", "tokens": [51466, 286, 500, 380, 458, 498, 3329, 307, 1228,
+ 309, 1217, 11, 281, 512, 8396, 11, 457, 411, 754, 411, 412, 4356, 51738], "temperature":
+ 0.0, "avg_logprob": -0.24934017986332604, "compression_ratio": 1.6450381679389312,
+ "no_speech_prob": 0.01436303649097681}, {"id": 540, "seek": 218100, "start": 2181.0,
+ "end": 2186.6, "text": " scale, instead of configuring, let''s say, Solr.", "tokens":
+ [50364, 4373, 11, 2602, 295, 6662, 1345, 11, 718, 311, 584, 11, 7936, 13, 50644],
+ "temperature": 0.0, "avg_logprob": -0.2809359414236886, "compression_ratio": 1.4792626728110598,
+ "no_speech_prob": 0.061261195689439774}, {"id": 541, "seek": 218100, "start": 2186.6,
+ "end": 2189.0, "text": " We keep mentioning Elasticsearch a lot.", "tokens": [50644,
+ 492, 1066, 18315, 1036, 3164, 257, 688, 13, 50764], "temperature": 0.0, "avg_logprob":
+ -0.2809359414236886, "compression_ratio": 1.4792626728110598, "no_speech_prob":
+ 0.061261195689439774}, {"id": 542, "seek": 218100, "start": 2189.0, "end": 2192.0,
+ "text": " They 
didn''t pay for these podcasts.", "tokens": [50764, 814, 994, 380,
+ 1689, 337, 613, 24045, 13, 50914], "temperature": 0.0, "avg_logprob": -0.2809359414236886,
+ "compression_ratio": 1.4792626728110598, "no_speech_prob": 0.061261195689439774},
+ {"id": 543, "seek": 218100, "start": 2192.0, "end": 2196.08, "text": " But I''m
+ just saying, let''s say Apache Solr, or Lucene, right?", "tokens": [50914, 583,
+ 286, 478, 445, 1566, 11, 718, 311, 584, 46597, 7936, 11, 420, 12687, 66, 1450, 11,
+ 558, 30, 51118], "temperature": 0.0, "avg_logprob": -0.2809359414236886, "compression_ratio":
+ 1.4792626728110598, "no_speech_prob": 0.061261195689439774}, {"id": 544, "seek":
+ 218100, "start": 2196.08, "end": 2202.24, "text": " So you''ll have to kind of like,
+ yeah, go a long, long, long way to achieve it.", "tokens": [51118, 407, 291, 603,
+ 362, 281, 733, 295, 411, 11, 1338, 11, 352, 257, 938, 11, 938, 11, 938, 636, 281,
+ 4584, 309, 13, 51426], "temperature": 0.0, "avg_logprob": -0.2809359414236886, "compression_ratio":
+ 1.4792626728110598, "no_speech_prob": 0.061261195689439774}, {"id": 545, "seek":
+ 218100, "start": 2202.24, "end": 2207.56, "text": " But like, okay, now Lucene released
+ HNSW in 9.0 version.", "tokens": [51426, 583, 411, 11, 1392, 11, 586, 12687, 66,
+ 1450, 4736, 389, 5, 50, 54, 294, 1722, 13, 15, 3037, 13, 51692], "temperature":
+ 0.0, "avg_logprob": -0.2809359414236886, "compression_ratio": 1.4792626728110598,
+ "no_speech_prob": 0.061261195689439774}, {"id": 546, "seek": 220756, "start": 2207.56,
+ "end": 2213.92, "text": " And so in principle, you could embed your documents using
+ multilingual model and retrieve", "tokens": [50364, 400, 370, 294, 8665, 11, 291,
+ 727, 12240, 428, 8512, 1228, 2120, 38219, 2316, 293, 30254, 50682], "temperature":
+ 0.0, "avg_logprob": -0.2174994945526123, "compression_ratio": 1.5104166666666667,
+ "no_speech_prob": 0.0037523882929235697}, {"id": 547, "seek": 220756, "start": 2213.92,
+ "end": 2215.84, "text": " 
them in the same way, right?", "tokens": [50682, 552, + 294, 264, 912, 636, 11, 558, 30, 50778], "temperature": 0.0, "avg_logprob": -0.2174994945526123, + "compression_ratio": 1.5104166666666667, "no_speech_prob": 0.0037523882929235697}, + {"id": 548, "seek": 220756, "start": 2215.84, "end": 2225.16, "text": " So do you + see huge potential for the market, you know, for the multilinguality?", "tokens": + [50778, 407, 360, 291, 536, 2603, 3995, 337, 264, 2142, 11, 291, 458, 11, 337, 264, + 2120, 38219, 507, 30, 51244], "temperature": 0.0, "avg_logprob": -0.2174994945526123, + "compression_ratio": 1.5104166666666667, "no_speech_prob": 0.0037523882929235697}, + {"id": 549, "seek": 220756, "start": 2225.16, "end": 2234.08, "text": " No, there + have been some studies that showed that when eBay introduced automatic translator", + "tokens": [51244, 883, 11, 456, 362, 668, 512, 5313, 300, 4712, 300, 562, 33803, + 7268, 12509, 35223, 51690], "temperature": 0.0, "avg_logprob": -0.2174994945526123, + "compression_ratio": 1.5104166666666667, "no_speech_prob": 0.0037523882929235697}, + {"id": 550, "seek": 223408, "start": 2234.08, "end": 2238.48, "text": " tools, there + was a significant increase.", "tokens": [50364, 3873, 11, 456, 390, 257, 4776, 3488, + 13, 50584], "temperature": 0.0, "avg_logprob": -0.22587434595281428, "compression_ratio": + 1.6818181818181819, "no_speech_prob": 0.07370851188898087}, {"id": 551, "seek": + 223408, "start": 2238.48, "end": 2242.6, "text": " It was a few, I think, you know, + a few percentage points of increase in commerce on their", "tokens": [50584, 467, + 390, 257, 1326, 11, 286, 519, 11, 291, 458, 11, 257, 1326, 9668, 2793, 295, 3488, + 294, 26320, 322, 641, 50790], "temperature": 0.0, "avg_logprob": -0.22587434595281428, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.07370851188898087}, + {"id": 552, "seek": 223408, "start": 2242.6, "end": 2246.96, "text": " platform, + which translated to hundreds and hundreds of 
millions of dollars.", "tokens": [50790, + 3663, 11, 597, 16805, 281, 6779, 293, 6779, 295, 6803, 295, 3808, 13, 51008], "temperature": + 0.0, "avg_logprob": -0.22587434595281428, "compression_ratio": 1.6818181818181819, + "no_speech_prob": 0.07370851188898087}, {"id": 553, "seek": 223408, "start": 2246.96, + "end": 2252.2799999999997, "text": " So the, you know, the advancements that have + been made in machine translation and now,", "tokens": [51008, 407, 264, 11, 291, + 458, 11, 264, 7295, 1117, 300, 362, 668, 1027, 294, 3479, 12853, 293, 586, 11, 51274], + "temperature": 0.0, "avg_logprob": -0.22587434595281428, "compression_ratio": 1.6818181818181819, + "no_speech_prob": 0.07370851188898087}, {"id": 554, "seek": 223408, "start": 2252.2799999999997, + "end": 2258.36, "text": " and she like, cross-lingual retrieval, will serve to further + break down barriers to commerce", "tokens": [51274, 293, 750, 411, 11, 3278, 12, + 1688, 901, 19817, 3337, 11, 486, 4596, 281, 3052, 1821, 760, 13565, 281, 26320, + 51578], "temperature": 0.0, "avg_logprob": -0.22587434595281428, "compression_ratio": + 1.6818181818181819, "no_speech_prob": 0.07370851188898087}, {"id": 555, "seek": + 223408, "start": 2258.36, "end": 2259.36, "text": " at least.", "tokens": [51578, + 412, 1935, 13, 51628], "temperature": 0.0, "avg_logprob": -0.22587434595281428, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.07370851188898087}, + {"id": 556, "seek": 223408, "start": 2259.36, "end": 2262.0, "text": " And in a + way that''s commercially very valuable.", "tokens": [51628, 400, 294, 257, 636, + 300, 311, 41751, 588, 8263, 13, 51760], "temperature": 0.0, "avg_logprob": -0.22587434595281428, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.07370851188898087}, + {"id": 557, "seek": 226200, "start": 2262.0, "end": 2267.12, "text": " But speaking + more broadly, I think that what I would be very interested to see is how", "tokens": + [50364, 583, 4124, 544, 19511, 11, 286, 
519, 300, 437, 286, 576, 312, 588, 3102,
+ 281, 536, 307, 577, 50620], "temperature": 0.0, "avg_logprob": -0.26387407973005966,
+ "compression_ratio": 1.4761904761904763, "no_speech_prob": 0.007855252362787724},
+ {"id": 558, "seek": 226200, "start": 2267.12, "end": 2277.48, "text": " vector databases
+ evolve and merge into traditional database technology or into systems like", "tokens":
+ [50620, 8062, 22380, 16693, 293, 22183, 666, 5164, 8149, 2899, 420, 666, 3652, 411,
+ 51138], "temperature": 0.0, "avg_logprob": -0.26387407973005966, "compression_ratio":
+ 1.4761904761904763, "no_speech_prob": 0.007855252362787724}, {"id": 559, "seek":
+ 226200, "start": 2277.48, "end": 2280.64, "text": " Lucene, like information retrieval
+ systems.", "tokens": [51138, 9593, 1450, 11, 411, 1589, 19817, 3337, 3652, 13, 51296],
+ "temperature": 0.0, "avg_logprob": -0.26387407973005966, "compression_ratio": 1.4761904761904763,
+ "no_speech_prob": 0.007855252362787724}, {"id": 560, "seek": 226200, "start": 2280.64,
+ "end": 2285.92, "text": " Because at the moment, you know, you have FAISS, it''s
+ kind of a separate discrete entity.", "tokens": [51296, 1436, 412, 264, 1623, 11,
+ 291, 458, 11, 291, 362, 479, 40, 3160, 11, 309, 311, 733, 295, 257, 4994, 2983,
+ 4751, 13977, 13, 51560], "temperature": 0.0, "avg_logprob": -0.26387407973005966,
+ "compression_ratio": 1.4761904761904763, "no_speech_prob": 0.007855252362787724},
+ {"id": 561, "seek": 228592, "start": 2285.92, "end": 2292.4, "text": " But longer
+ term, just, you know, conceptually, in a way, very low-dimensional vector database",
+ "tokens": [50364, 583, 2854, 1433, 11, 445, 11, 291, 458, 11, 3410, 671, 11, 294,
+ 257, 636, 11, 588, 2295, 12, 18759, 8062, 8149, 50688], "temperature": 0.0, "avg_logprob":
+ -0.20521536327543713, "compression_ratio": 1.619047619047619, "no_speech_prob":
+ 0.31724634766578674}, {"id": 562, "seek": 228592, "start": 2292.4, "end": 2297.6,
+ "text": " technology has already made its way into 
my sequel and Postgres with the + spatial extensions", "tokens": [50688, 2899, 575, 1217, 1027, 1080, 636, 666, 452, + 20622, 293, 10223, 45189, 365, 264, 23598, 25129, 50948], "temperature": 0.0, "avg_logprob": + -0.20521536327543713, "compression_ratio": 1.619047619047619, "no_speech_prob": + 0.31724634766578674}, {"id": 563, "seek": 228592, "start": 2297.6, "end": 2301.36, + "text": " that they''ve supported for many years.", "tokens": [50948, 300, 436, + 600, 8104, 337, 867, 924, 13, 51136], "temperature": 0.0, "avg_logprob": -0.20521536327543713, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.31724634766578674}, + {"id": 564, "seek": 228592, "start": 2301.36, "end": 2307.16, "text": " The quadri + algorithm for doing, you know, sublinear lookups on a map.", "tokens": [51136, 440, + 10787, 470, 9284, 337, 884, 11, 291, 458, 11, 1422, 28263, 574, 7528, 322, 257, + 4471, 13, 51426], "temperature": 0.0, "avg_logprob": -0.20521536327543713, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.31724634766578674}, {"id": 565, "seek": 228592, + "start": 2307.16, "end": 2309.56, "text": " Those spatial extensions have been around + for a while.", "tokens": [51426, 3950, 23598, 25129, 362, 668, 926, 337, 257, 1339, + 13, 51546], "temperature": 0.0, "avg_logprob": -0.20521536327543713, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.31724634766578674}, {"id": 566, "seek": 228592, + "start": 2309.56, "end": 2314.6, "text": " You can easily imagine that in the future, + once people start to understand how useful vector", "tokens": [51546, 509, 393, + 3612, 3811, 300, 294, 264, 2027, 11, 1564, 561, 722, 281, 1223, 577, 4420, 8062, + 51798], "temperature": 0.0, "avg_logprob": -0.20521536327543713, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.31724634766578674}, {"id": 567, "seek": 231460, + "start": 2314.6, "end": 2321.64, "text": " embeddings can be, and that''s established, + that you''ll have a, you know, columns of 
vector", "tokens": [50364, 12240, 29432, + 393, 312, 11, 293, 300, 311, 7545, 11, 300, 291, 603, 362, 257, 11, 291, 458, 11, + 13766, 295, 8062, 50716], "temperature": 0.0, "avg_logprob": -0.25377947256105754, + "compression_ratio": 1.6088560885608856, "no_speech_prob": 0.004319720435887575}, + {"id": 568, "seek": 231460, "start": 2321.64, "end": 2327.3199999999997, "text": + " type in a relational database and be able to simply build an index and perform, + you know,", "tokens": [50716, 2010, 294, 257, 38444, 8149, 293, 312, 1075, 281, + 2935, 1322, 364, 8186, 293, 2042, 11, 291, 458, 11, 51000], "temperature": 0.0, + "avg_logprob": -0.25377947256105754, "compression_ratio": 1.6088560885608856, "no_speech_prob": + 0.004319720435887575}, {"id": 569, "seek": 231460, "start": 2327.3199999999997, + "end": 2331.24, "text": " fast nearest neighbor searches straight from Postgres.", + "tokens": [51000, 2370, 23831, 5987, 26701, 2997, 490, 10223, 45189, 13, 51196], + "temperature": 0.0, "avg_logprob": -0.25377947256105754, "compression_ratio": 1.6088560885608856, + "no_speech_prob": 0.004319720435887575}, {"id": 570, "seek": 231460, "start": 2331.24, + "end": 2334.0, "text": " So I think that''s an exciting future to contemplate.", + "tokens": [51196, 407, 286, 519, 300, 311, 364, 4670, 2027, 281, 19935, 473, 13, + 51334], "temperature": 0.0, "avg_logprob": -0.25377947256105754, "compression_ratio": + 1.6088560885608856, "no_speech_prob": 0.004319720435887575}, {"id": 571, "seek": + 231460, "start": 2334.0, "end": 2336.68, "text": " And I see that eventually it + will go there.", "tokens": [51334, 400, 286, 536, 300, 4728, 309, 486, 352, 456, + 13, 51468], "temperature": 0.0, "avg_logprob": -0.25377947256105754, "compression_ratio": + 1.6088560885608856, "no_speech_prob": 0.004319720435887575}, {"id": 572, "seek": + 231460, "start": 2336.68, "end": 2338.92, "text": " That sounds really interesting, + Lake.", "tokens": [51468, 663, 3263, 534, 1880, 11, 10582, 13, 51580], 
"temperature": + 0.0, "avg_logprob": -0.25377947256105754, "compression_ratio": 1.6088560885608856, + "no_speech_prob": 0.004319720435887575}, {"id": 573, "seek": 231460, "start": 2338.92, + "end": 2342.7999999999997, "text": " Do you think that vector searches in general + is a hype right now?", "tokens": [51580, 1144, 291, 519, 300, 8062, 26701, 294, + 2674, 307, 257, 24144, 558, 586, 30, 51774], "temperature": 0.0, "avg_logprob": + -0.25377947256105754, "compression_ratio": 1.6088560885608856, "no_speech_prob": + 0.004319720435887575}, {"id": 574, "seek": 234280, "start": 2342.8, "end": 2346.1600000000003, + "text": " I think the way big data was few years ago.", "tokens": [50364, 286, 519, + 264, 636, 955, 1412, 390, 1326, 924, 2057, 13, 50532], "temperature": 0.0, "avg_logprob": + -0.23654603958129883, "compression_ratio": 1.5271966527196652, "no_speech_prob": + 0.0058137052692472935}, {"id": 575, "seek": 234280, "start": 2346.1600000000003, + "end": 2355.36, "text": " No, no, it''s not hype because, again, I saw neural information + techniques, backed by vector", "tokens": [50532, 883, 11, 572, 11, 309, 311, 406, + 24144, 570, 11, 797, 11, 286, 1866, 18161, 1589, 7512, 11, 20391, 538, 8062, 50992], + "temperature": 0.0, "avg_logprob": -0.23654603958129883, "compression_ratio": 1.5271966527196652, + "no_speech_prob": 0.0058137052692472935}, {"id": 576, "seek": 234280, "start": 2355.36, + "end": 2359.4, "text": " databases, making a big difference in many products at + Google.", "tokens": [50992, 22380, 11, 1455, 257, 955, 2649, 294, 867, 3383, 412, + 3329, 13, 51194], "temperature": 0.0, "avg_logprob": -0.23654603958129883, "compression_ratio": + 1.5271966527196652, "no_speech_prob": 0.0058137052692472935}, {"id": 577, "seek": + 234280, "start": 2359.4, "end": 2365.52, "text": " So I think where it is right + now is that there''s a few big companies, like the Fang type", "tokens": [51194, + 407, 286, 519, 689, 309, 307, 558, 586, 307, 300, 456, 311, 257, 1326, 
955, 3431, + 11, 411, 264, 25409, 2010, 51500], "temperature": 0.0, "avg_logprob": -0.23654603958129883, + "compression_ratio": 1.5271966527196652, "no_speech_prob": 0.0058137052692472935}, + {"id": 578, "seek": 234280, "start": 2365.52, "end": 2370.5600000000004, "text": + " companies in Silicon Valley, that have the expertise to take advantage of it.", + "tokens": [51500, 3431, 294, 25351, 10666, 11, 300, 362, 264, 11769, 281, 747, 5002, + 295, 309, 13, 51752], "temperature": 0.0, "avg_logprob": -0.23654603958129883, "compression_ratio": + 1.5271966527196652, "no_speech_prob": 0.0058137052692472935}, {"id": 579, "seek": + 237056, "start": 2370.56, "end": 2372.88, "text": " It''s not being commoditized + yet.", "tokens": [50364, 467, 311, 406, 885, 19931, 270, 1602, 1939, 13, 50480], + "temperature": 0.0, "avg_logprob": -0.28119694685735624, "compression_ratio": 1.5611510791366907, + "no_speech_prob": 0.04782700911164284}, {"id": 580, "seek": 237056, "start": 2372.88, + "end": 2377.2, "text": " So it''s definitely not hype, but it''s got a few years + to go before it enters a mainstream", "tokens": [50480, 407, 309, 311, 2138, 406, + 24144, 11, 457, 309, 311, 658, 257, 1326, 924, 281, 352, 949, 309, 18780, 257, 15960, + 50696], "temperature": 0.0, "avg_logprob": -0.28119694685735624, "compression_ratio": + 1.5611510791366907, "no_speech_prob": 0.04782700911164284}, {"id": 581, "seek": + 237056, "start": 2377.2, "end": 2378.2, "text": " consciousness.", "tokens": [50696, + 10081, 13, 50746], "temperature": 0.0, "avg_logprob": -0.28119694685735624, "compression_ratio": + 1.5611510791366907, "no_speech_prob": 0.04782700911164284}, {"id": 582, "seek": + 237056, "start": 2378.2, "end": 2379.2, "text": " Yeah, for sure.", "tokens": [50746, + 865, 11, 337, 988, 13, 50796], "temperature": 0.0, "avg_logprob": -0.28119694685735624, + "compression_ratio": 1.5611510791366907, "no_speech_prob": 0.04782700911164284}, + {"id": 583, "seek": 237056, "start": 2379.2, "end": 
2384.88, "text": " But like + to your point, like maybe at some point, vector search will become, let''s say,", + "tokens": [50796, 583, 411, 281, 428, 935, 11, 411, 1310, 412, 512, 935, 11, 8062, + 3164, 486, 1813, 11, 718, 311, 584, 11, 51080], "temperature": 0.0, "avg_logprob": + -0.28119694685735624, "compression_ratio": 1.5611510791366907, "no_speech_prob": + 0.04782700911164284}, {"id": 584, "seek": 237056, "start": 2384.88, "end": 2393.2, + "text": " part of Postgres or my SQL or whatever other like, kind of traditional, + so to say, database,", "tokens": [51080, 644, 295, 10223, 45189, 420, 452, 19200, + 420, 2035, 661, 411, 11, 733, 295, 5164, 11, 370, 281, 584, 11, 8149, 11, 51496], + "temperature": 0.0, "avg_logprob": -0.28119694685735624, "compression_ratio": 1.5611510791366907, + "no_speech_prob": 0.04782700911164284}, {"id": 585, "seek": 237056, "start": 2393.2, + "end": 2397.36, "text": " which is traditional is in its widely used.", "tokens": + [51496, 597, 307, 5164, 307, 294, 1080, 13371, 1143, 13, 51704], "temperature": + 0.0, "avg_logprob": -0.28119694685735624, "compression_ratio": 1.5611510791366907, + "no_speech_prob": 0.04782700911164284}, {"id": 586, "seek": 237056, "start": 2397.36, + "end": 2399.92, "text": " And then you''ve seen already also introduced it, right?", + "tokens": [51704, 400, 550, 291, 600, 1612, 1217, 611, 7268, 309, 11, 558, 30, 51832], + "temperature": 0.0, "avg_logprob": -0.28119694685735624, "compression_ratio": 1.5611510791366907, + "no_speech_prob": 0.04782700911164284}, {"id": 587, "seek": 239992, "start": 2399.92, + "end": 2403.0, "text": " Lucine now has H&SW.", "tokens": [50364, 9593, 533, 586, + 575, 389, 5, 50, 54, 13, 50518], "temperature": 0.0, "avg_logprob": -0.22172396523611887, + "compression_ratio": 1.4948979591836735, "no_speech_prob": 0.01975647732615471}, + {"id": 588, "seek": 239992, "start": 2403.0, "end": 2412.16, "text": " You can go + and argue to the point, okay, maybe Lucine index layout might 
not be kind of", "tokens": + [50518, 509, 393, 352, 293, 9695, 281, 264, 935, 11, 1392, 11, 1310, 9593, 533, + 8186, 13333, 1062, 406, 312, 733, 295, 50976], "temperature": 0.0, "avg_logprob": + -0.22172396523611887, "compression_ratio": 1.4948979591836735, "no_speech_prob": + 0.01975647732615471}, {"id": 589, "seek": 239992, "start": 2412.16, "end": 2419.0, + "text": " optimally designed for, you know, nearest neighbor retrieval because, + because like if you", "tokens": [50976, 5028, 379, 4761, 337, 11, 291, 458, 11, + 23831, 5987, 19817, 3337, 570, 11, 570, 411, 498, 291, 51318], "temperature": 0.0, + "avg_logprob": -0.22172396523611887, "compression_ratio": 1.4948979591836735, "no_speech_prob": + 0.01975647732615471}, {"id": 590, "seek": 239992, "start": 2419.0, "end": 2425.2400000000002, + "text": " look at five methods or H&SW, you know, like it''s some graph method or + it''s a way to partition", "tokens": [51318, 574, 412, 1732, 7150, 420, 389, 5, + 50, 54, 11, 291, 458, 11, 411, 309, 311, 512, 4295, 3170, 420, 309, 311, 257, 636, + 281, 24808, 51630], "temperature": 0.0, "avg_logprob": -0.22172396523611887, "compression_ratio": + 1.4948979591836735, "no_speech_prob": 0.01975647732615471}, {"id": 591, "seek": + 242524, "start": 2425.24, "end": 2428.9599999999996, "text": " your space in the + scene, you partition it by segments.", "tokens": [50364, 428, 1901, 294, 264, 4145, + 11, 291, 24808, 309, 538, 19904, 13, 50550], "temperature": 0.0, "avg_logprob": + -0.24098337173461915, "compression_ratio": 1.5737051792828685, "no_speech_prob": + 0.059421613812446594}, {"id": 592, "seek": 242524, "start": 2428.9599999999996, + "end": 2430.9199999999996, "text": " And that''s kind of like given, right?", "tokens": + [50550, 400, 300, 311, 733, 295, 411, 2212, 11, 558, 30, 50648], "temperature": + 0.0, "avg_logprob": -0.24098337173461915, "compression_ratio": 1.5737051792828685, + "no_speech_prob": 0.059421613812446594}, {"id": 593, "seek": 242524, "start": 
2430.9199999999996, + "end": 2434.2, "text": " Because it''s designed for inverted index.", "tokens": + [50648, 1436, 309, 311, 4761, 337, 38969, 8186, 13, 50812], "temperature": 0.0, + "avg_logprob": -0.24098337173461915, "compression_ratio": 1.5737051792828685, "no_speech_prob": + 0.059421613812446594}, {"id": 594, "seek": 242524, "start": 2434.2, "end": 2440.52, + "text": " But again, on Twitter somewhere, I saw it with from one Lucine commeter + who said, maybe", "tokens": [50812, 583, 797, 11, 322, 5794, 4079, 11, 286, 1866, + 309, 365, 490, 472, 9593, 533, 800, 2398, 567, 848, 11, 1310, 51128], "temperature": + 0.0, "avg_logprob": -0.24098337173461915, "compression_ratio": 1.5737051792828685, + "no_speech_prob": 0.059421613812446594}, {"id": 595, "seek": 242524, "start": 2440.52, + "end": 2447.7599999999998, "text": " this will by itself open up some new opportunities + because you''ll have a separate vector space index", "tokens": [51128, 341, 486, + 538, 2564, 1269, 493, 512, 777, 4786, 570, 291, 603, 362, 257, 4994, 8062, 1901, + 8186, 51490], "temperature": 0.0, "avg_logprob": -0.24098337173461915, "compression_ratio": + 1.5737051792828685, "no_speech_prob": 0.059421613812446594}, {"id": 596, "seek": + 242524, "start": 2447.7599999999998, "end": 2449.64, "text": " per segment, right?", + "tokens": [51490, 680, 9469, 11, 558, 30, 51584], "temperature": 0.0, "avg_logprob": + -0.24098337173461915, "compression_ratio": 1.5737051792828685, "no_speech_prob": + 0.059421613812446594}, {"id": 597, "seek": 242524, "start": 2449.64, "end": 2452.56, + "text": " And maybe you can design some features around that.", "tokens": [51584, + 400, 1310, 291, 393, 1715, 512, 4122, 926, 300, 13, 51730], "temperature": 0.0, + "avg_logprob": -0.24098337173461915, "compression_ratio": 1.5737051792828685, "no_speech_prob": + 0.059421613812446594}, {"id": 598, "seek": 245256, "start": 2452.56, "end": 2458.44, + "text": " So it sounds like you still see the potential for merging 
these technologies + in the future", "tokens": [50364, 407, 309, 3263, 411, 291, 920, 536, 264, 3995, + 337, 44559, 613, 7943, 294, 264, 2027, 50658], "temperature": 0.0, "avg_logprob": + -0.260723876953125, "compression_ratio": 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, + {"id": 599, "seek": 245256, "start": 2458.44, "end": 2461.16, "text": " and then + bringing additional benefit.", "tokens": [50658, 293, 550, 5062, 4497, 5121, 13, + 50794], "temperature": 0.0, "avg_logprob": -0.260723876953125, "compression_ratio": + 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, {"id": 600, "seek": + 245256, "start": 2461.16, "end": 2463.88, "text": " Well, yes, I can''t really speak + for Lucine.", "tokens": [50794, 1042, 11, 2086, 11, 286, 393, 380, 534, 1710, 337, + 9593, 533, 13, 50930], "temperature": 0.0, "avg_logprob": -0.260723876953125, "compression_ratio": + 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, {"id": 601, "seek": + 245256, "start": 2463.88, "end": 2466.4, "text": " I haven''t taken time to study + that implementation.", "tokens": [50930, 286, 2378, 380, 2726, 565, 281, 2979, 300, + 11420, 13, 51056], "temperature": 0.0, "avg_logprob": -0.260723876953125, "compression_ratio": + 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, {"id": 602, "seek": + 245256, "start": 2466.4, "end": 2468.68, "text": " How it was done is I think you + know more about it than me.", "tokens": [51056, 1012, 309, 390, 1096, 307, 286, + 519, 291, 458, 544, 466, 309, 813, 385, 13, 51170], "temperature": 0.0, "avg_logprob": + -0.260723876953125, "compression_ratio": 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, + {"id": 603, "seek": 245256, "start": 2468.68, "end": 2476.12, "text": " But I was + seeing that eventually relational databases could and might, you know, implement", + "tokens": [51170, 583, 286, 390, 2577, 300, 4728, 38444, 22380, 727, 293, 1062, + 11, 291, 458, 11, 4445, 51542], "temperature": 0.0, 
"avg_logprob": -0.260723876953125, + "compression_ratio": 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, + {"id": 604, "seek": 245256, "start": 2476.12, "end": 2479.24, "text": " indexes, + vector indexes directly.", "tokens": [51542, 8186, 279, 11, 8062, 8186, 279, 3838, + 13, 51698], "temperature": 0.0, "avg_logprob": -0.260723876953125, "compression_ratio": + 1.5960784313725491, "no_speech_prob": 0.030855737626552582}, {"id": 605, "seek": + 247924, "start": 2479.24, "end": 2482.9199999999996, "text": " I''m not sure that + I can see any technical reason why that wouldn''t be possible, basically.", "tokens": + [50364, 286, 478, 406, 988, 300, 286, 393, 536, 604, 6191, 1778, 983, 300, 2759, + 380, 312, 1944, 11, 1936, 13, 50548], "temperature": 0.0, "avg_logprob": -0.22915748014288434, + "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.002577786101028323}, + {"id": 606, "seek": 247924, "start": 2482.9199999999996, "end": 2487.7999999999997, + "text": " And it could potentially be very, very useful as neural networks, you + know, go more and", "tokens": [50548, 400, 309, 727, 7263, 312, 588, 11, 588, 4420, + 382, 18161, 9590, 11, 291, 458, 11, 352, 544, 293, 50792], "temperature": 0.0, "avg_logprob": + -0.22915748014288434, "compression_ratio": 1.5714285714285714, "no_speech_prob": + 0.002577786101028323}, {"id": 607, "seek": 247924, "start": 2487.7999999999997, + "end": 2489.68, "text": " more mainstream for embedding.", "tokens": [50792, 544, + 15960, 337, 12240, 3584, 13, 50886], "temperature": 0.0, "avg_logprob": -0.22915748014288434, + "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.002577786101028323}, + {"id": 608, "seek": 247924, "start": 2489.68, "end": 2493.8799999999997, "text": + " Yeah, I mean, it sounds like one logical step forward.", "tokens": [50886, 865, + 11, 286, 914, 11, 309, 3263, 411, 472, 14978, 1823, 2128, 13, 51096], "temperature": + 0.0, "avg_logprob": -0.22915748014288434, "compression_ratio": 
1.5714285714285714, + "no_speech_prob": 0.002577786101028323}, {"id": 609, "seek": 247924, "start": 2493.8799999999997, + "end": 2501.2799999999997, "text": " Maybe it will not be kind of scalable as a + pure vector database, but like on a small", "tokens": [51096, 2704, 309, 486, 406, + 312, 733, 295, 38481, 382, 257, 6075, 8062, 8149, 11, 457, 411, 322, 257, 1359, + 51466], "temperature": 0.0, "avg_logprob": -0.22915748014288434, "compression_ratio": + 1.5714285714285714, "no_speech_prob": 0.002577786101028323}, {"id": 610, "seek": + 247924, "start": 2501.2799999999997, "end": 2507.8399999999997, "text": " like amount + of data, let''s say when my SQL or Oracle or other databases, they introduce", "tokens": + [51466, 411, 2372, 295, 1412, 11, 718, 311, 584, 562, 452, 19200, 420, 25654, 420, + 661, 22380, 11, 436, 5366, 51794], "temperature": 0.0, "avg_logprob": -0.22915748014288434, + "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.002577786101028323}, + {"id": 611, "seek": 247924, "start": 2507.8399999999997, "end": 2509.0, "text": + " full text search, right?", "tokens": [51794, 1577, 2487, 3164, 11, 558, 30, 51852], + "temperature": 0.0, "avg_logprob": -0.22915748014288434, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.002577786101028323}, {"id": 612, "seek": 250900, "start": 2509.0, + "end": 2511.16, "text": " And initially it wasn''t there, right?", "tokens": [50364, + 400, 9105, 309, 2067, 380, 456, 11, 558, 30, 50472], "temperature": 0.0, "avg_logprob": + -0.2620651623434272, "compression_ratio": 1.681992337164751, "no_speech_prob": 0.00325383385643363}, + {"id": 613, "seek": 250900, "start": 2511.16, "end": 2512.16, "text": " Right.", + "tokens": [50472, 1779, 13, 50522], "temperature": 0.0, "avg_logprob": -0.2620651623434272, + "compression_ratio": 1.681992337164751, "no_speech_prob": 0.00325383385643363}, + {"id": 614, "seek": 250900, "start": 2512.16, "end": 2519.36, "text": " What restricts + you from, you know, 
introducing another field with embedding and actually running", + "tokens": [50522, 708, 7694, 82, 291, 490, 11, 291, 458, 11, 15424, 1071, 2519, + 365, 12240, 3584, 293, 767, 2614, 50882], "temperature": 0.0, "avg_logprob": -0.2620651623434272, + "compression_ratio": 1.681992337164751, "no_speech_prob": 0.00325383385643363}, + {"id": 615, "seek": 250900, "start": 2519.36, "end": 2521.36, "text": " your vector + retrieval there?", "tokens": [50882, 428, 8062, 19817, 3337, 456, 30, 50982], "temperature": + 0.0, "avg_logprob": -0.2620651623434272, "compression_ratio": 1.681992337164751, + "no_speech_prob": 0.00325383385643363}, {"id": 616, "seek": 250900, "start": 2521.36, + "end": 2522.36, "text": " Right.", "tokens": [50982, 1779, 13, 51032], "temperature": + 0.0, "avg_logprob": -0.2620651623434272, "compression_ratio": 1.681992337164751, + "no_speech_prob": 0.00325383385643363}, {"id": 617, "seek": 250900, "start": 2522.36, + "end": 2523.36, "text": " Yeah.", "tokens": [51032, 865, 13, 51082], "temperature": + 0.0, "avg_logprob": -0.2620651623434272, "compression_ratio": 1.681992337164751, + "no_speech_prob": 0.00325383385643363}, {"id": 618, "seek": 250900, "start": 2523.36, + "end": 2529.64, "text": " And I think it also, it comes down to this that, okay, + FICE is always going to give you,", "tokens": [51082, 400, 286, 519, 309, 611, 11, + 309, 1487, 760, 281, 341, 300, 11, 1392, 11, 479, 13663, 307, 1009, 516, 281, 976, + 291, 11, 51396], "temperature": 0.0, "avg_logprob": -0.2620651623434272, "compression_ratio": + 1.681992337164751, "no_speech_prob": 0.00325383385643363}, {"id": 619, "seek": 250900, + "start": 2529.64, "end": 2531.72, "text": " you know, the maximum performance.", + "tokens": [51396, 291, 458, 11, 264, 6674, 3389, 13, 51500], "temperature": 0.0, + "avg_logprob": -0.2620651623434272, "compression_ratio": 1.681992337164751, "no_speech_prob": + 0.00325383385643363}, {"id": 620, "seek": 250900, "start": 2531.72, "end": 2536.92, + "text": " So, 
you know, there''s going to be some subset of engineering teams that + need that performance", "tokens": [51500, 407, 11, 291, 458, 11, 456, 311, 516, + 281, 312, 512, 25993, 295, 7043, 5491, 300, 643, 300, 3389, 51760], "temperature": + 0.0, "avg_logprob": -0.2620651623434272, "compression_ratio": 1.681992337164751, + "no_speech_prob": 0.00325383385643363}, {"id": 621, "seek": 250900, "start": 2536.92, + "end": 2538.64, "text": " and that''s where they''re going to go.", "tokens": [51760, + 293, 300, 311, 689, 436, 434, 516, 281, 352, 13, 51846], "temperature": 0.0, "avg_logprob": + -0.2620651623434272, "compression_ratio": 1.681992337164751, "no_speech_prob": 0.00325383385643363}, + {"id": 622, "seek": 253864, "start": 2538.7599999999998, "end": 2543.16, "text": + " What about the mass market, you know, the Fortune 500 companies and things and + they''re dealing", "tokens": [50370, 708, 466, 264, 2758, 2142, 11, 291, 458, 11, + 264, 38508, 5923, 3431, 293, 721, 293, 436, 434, 6260, 50590], "temperature": 0.0, + "avg_logprob": -0.2629668572369744, "compression_ratio": 1.6474358974358974, "no_speech_prob": + 0.006068071350455284}, {"id": 623, "seek": 253864, "start": 2543.16, "end": 2547.2799999999997, + "text": " with problems at such a scale where it''s not necessary to go there.", + "tokens": [50590, 365, 2740, 412, 1270, 257, 4373, 689, 309, 311, 406, 4818, 281, + 352, 456, 13, 50796], "temperature": 0.0, "avg_logprob": -0.2629668572369744, "compression_ratio": + 1.6474358974358974, "no_speech_prob": 0.006068071350455284}, {"id": 624, "seek": + 253864, "start": 2547.2799999999997, "end": 2551.6, "text": " And if it''s just + in the database, even if it''s only giving me 80% of the total performance,", "tokens": + [50796, 400, 498, 309, 311, 445, 294, 264, 8149, 11, 754, 498, 309, 311, 787, 2902, + 385, 4688, 4, 295, 264, 3217, 3389, 11, 51012], "temperature": 0.0, "avg_logprob": + -0.2629668572369744, "compression_ratio": 1.6474358974358974, "no_speech_prob": + 
0.006068071350455284}, {"id": 625, "seek": 253864, "start": 2551.6, "end": 2552.6, + "text": " that''s good enough.", "tokens": [51012, 300, 311, 665, 1547, 13, 51062], + "temperature": 0.0, "avg_logprob": -0.2629668572369744, "compression_ratio": 1.6474358974358974, + "no_speech_prob": 0.006068071350455284}, {"id": 626, "seek": 253864, "start": 2552.6, + "end": 2558.52, "text": " And in a way, that pragmatic tradeoff is what''s underlying + Zerae I''s existence because", "tokens": [51062, 400, 294, 257, 636, 11, 300, 46904, + 4923, 4506, 307, 437, 311, 14217, 1176, 1663, 68, 286, 311, 9123, 570, 51358], "temperature": + 0.0, "avg_logprob": -0.2629668572369744, "compression_ratio": 1.6474358974358974, + "no_speech_prob": 0.006068071350455284}, {"id": 627, "seek": 253864, "start": 2558.52, + "end": 2563.0, "text": " people often ask, I can get better performance on my data + set.", "tokens": [51358, 561, 2049, 1029, 11, 286, 393, 483, 1101, 3389, 322, 452, + 1412, 992, 13, 51582], "temperature": 0.0, "avg_logprob": -0.2629668572369744, "compression_ratio": + 1.6474358974358974, "no_speech_prob": 0.006068071350455284}, {"id": 628, "seek": + 253864, "start": 2563.0, "end": 2568.6, "text": " If I find tune, a bird model and + then distilled the bird model is like, yes, that''s true.", "tokens": [51582, 759, + 286, 915, 10864, 11, 257, 5255, 2316, 293, 550, 1483, 6261, 264, 5255, 2316, 307, + 411, 11, 2086, 11, 300, 311, 2074, 13, 51862], "temperature": 0.0, "avg_logprob": + -0.2629668572369744, "compression_ratio": 1.6474358974358974, "no_speech_prob": + 0.006068071350455284}, {"id": 629, "seek": 256860, "start": 2568.88, "end": 2573.0, + "text": " We''re aiming to give you a neural network and a full experience that + will give you like", "tokens": [50378, 492, 434, 20253, 281, 976, 291, 257, 18161, + 3209, 293, 257, 1577, 1752, 300, 486, 976, 291, 411, 50584], "temperature": 0.0, + "avg_logprob": -0.2220158760364239, "compression_ratio": 1.5648854961832062, 
"no_speech_prob": + 0.0027911229990422726}, {"id": 630, "seek": 256860, "start": 2573.0, "end": 2577.04, + "text": " 80% of the performance that you might be able to achieve, which is still + better than you", "tokens": [50584, 4688, 4, 295, 264, 3389, 300, 291, 1062, 312, + 1075, 281, 4584, 11, 597, 307, 920, 1101, 813, 291, 50786], "temperature": 0.0, + "avg_logprob": -0.2220158760364239, "compression_ratio": 1.5648854961832062, "no_speech_prob": + 0.0027911229990422726}, {"id": 631, "seek": 256860, "start": 2577.04, "end": 2579.04, + "text": " get just from a keyword search.", "tokens": [50786, 483, 445, 490, 257, + 20428, 3164, 13, 50886], "temperature": 0.0, "avg_logprob": -0.2220158760364239, + "compression_ratio": 1.5648854961832062, "no_speech_prob": 0.0027911229990422726}, + {"id": 632, "seek": 256860, "start": 2579.04, "end": 2584.48, "text": " But the + reality is, you know, how many companies have the budget to have NLP engineers and", + "tokens": [50886, 583, 264, 4103, 307, 11, 291, 458, 11, 577, 867, 3431, 362, 264, + 4706, 281, 362, 426, 45196, 11955, 293, 51158], "temperature": 0.0, "avg_logprob": + -0.2220158760364239, "compression_ratio": 1.5648854961832062, "no_speech_prob": + 0.0027911229990422726}, {"id": 633, "seek": 256860, "start": 2584.48, "end": 2586.2799999999997, + "text": " data science and squeeze out that extra performance?", "tokens": [51158, + 1412, 3497, 293, 13578, 484, 300, 2857, 3389, 30, 51248], "temperature": 0.0, "avg_logprob": + -0.2220158760364239, "compression_ratio": 1.5648854961832062, "no_speech_prob": + 0.0027911229990422726}, {"id": 634, "seek": 256860, "start": 2586.2799999999997, + "end": 2588.7999999999997, "text": " It''s just not important in a lot of cases.", + "tokens": [51248, 467, 311, 445, 406, 1021, 294, 257, 688, 295, 3331, 13, 51374], + "temperature": 0.0, "avg_logprob": -0.2220158760364239, "compression_ratio": 1.5648854961832062, + "no_speech_prob": 0.0027911229990422726}, {"id": 635, "seek": 256860, 
"start": 2588.7999999999997, + "end": 2590.3199999999997, "text": " Yeah, exactly.", "tokens": [51374, 865, 11, + 2293, 13, 51450], "temperature": 0.0, "avg_logprob": -0.2220158760364239, "compression_ratio": + 1.5648854961832062, "no_speech_prob": 0.0027911229990422726}, {"id": 636, "seek": + 259032, "start": 2590.32, "end": 2601.1200000000003, "text": " And do you think + that, you know, there is still a need to find a way to combine BM25 or", "tokens": + [50364, 400, 360, 291, 519, 300, 11, 291, 458, 11, 456, 307, 920, 257, 643, 281, + 915, 257, 636, 281, 10432, 15901, 6074, 420, 50904], "temperature": 0.0, "avg_logprob": + -0.2468852233886719, "compression_ratio": 1.638655462184874, "no_speech_prob": 0.005492100492119789}, + {"id": 637, "seek": 259032, "start": 2601.1200000000003, "end": 2605.6800000000003, + "text": " whatever you have there, like the idea of Spark Search with the results + from the nearest", "tokens": [50904, 2035, 291, 362, 456, 11, 411, 264, 1558, 295, + 23424, 17180, 365, 264, 3542, 490, 264, 23831, 51132], "temperature": 0.0, "avg_logprob": + -0.2468852233886719, "compression_ratio": 1.638655462184874, "no_speech_prob": 0.005492100492119789}, + {"id": 638, "seek": 259032, "start": 2605.6800000000003, "end": 2607.0800000000004, + "text": " neighbor search?", "tokens": [51132, 5987, 3164, 30, 51202], "temperature": + 0.0, "avg_logprob": -0.2468852233886719, "compression_ratio": 1.638655462184874, + "no_speech_prob": 0.005492100492119789}, {"id": 639, "seek": 259032, "start": 2607.0800000000004, + "end": 2609.2400000000002, "text": " Like have you been thinking about it?", "tokens": + [51202, 1743, 362, 291, 668, 1953, 466, 309, 30, 51310], "temperature": 0.0, "avg_logprob": + -0.2468852233886719, "compression_ratio": 1.638655462184874, "no_speech_prob": 0.005492100492119789}, + {"id": 640, "seek": 259032, "start": 2609.2400000000002, "end": 2614.4, "text": + " Have you seen your clients kind of thinking about it or asking about it?", 
"tokens": + [51310, 3560, 291, 1612, 428, 6982, 733, 295, 1953, 466, 309, 420, 3365, 466, 309, + 30, 51568], "temperature": 0.0, "avg_logprob": -0.2468852233886719, "compression_ratio": + 1.638655462184874, "no_speech_prob": 0.005492100492119789}, {"id": 641, "seek": + 259032, "start": 2614.4, "end": 2618.32, "text": " There''s a very interesting paper + from Google about two years ago, Dave Dobson and I''m", "tokens": [51568, 821, 311, + 257, 588, 1880, 3035, 490, 3329, 466, 732, 924, 2057, 11, 11017, 1144, 929, 266, + 293, 286, 478, 51764], "temperature": 0.0, "avg_logprob": -0.2468852233886719, "compression_ratio": + 1.638655462184874, "no_speech_prob": 0.005492100492119789}, {"id": 642, "seek": + 261832, "start": 2618.32, "end": 2621.32, "text": " forgetting the other individuals.", + "tokens": [50364, 25428, 264, 661, 5346, 13, 50514], "temperature": 0.0, "avg_logprob": + -0.3059347735510932, "compression_ratio": 1.4413145539906103, "no_speech_prob": + 0.012042051181197166}, {"id": 643, "seek": 261832, "start": 2621.32, "end": 2629.56, + "text": " They specifically on this topic, you can obviously model a BM25 search + as, you know,", "tokens": [50514, 814, 4682, 322, 341, 4829, 11, 291, 393, 2745, + 2316, 257, 15901, 6074, 3164, 382, 11, 291, 458, 11, 50926], "temperature": 0.0, + "avg_logprob": -0.3059347735510932, "compression_ratio": 1.4413145539906103, "no_speech_prob": + 0.012042051181197166}, {"id": 644, "seek": 261832, "start": 2629.56, "end": 2632.2400000000002, + "text": " multiplication of Spark''s matrices.", "tokens": [50926, 27290, 295, 23424, + 311, 32284, 13, 51060], "temperature": 0.0, "avg_logprob": -0.3059347735510932, + "compression_ratio": 1.4413145539906103, "no_speech_prob": 0.012042051181197166}, + {"id": 645, "seek": 261832, "start": 2632.2400000000002, "end": 2638.96, "text": + " And so you can imagine your vectors essentially having a dense part produced by + a neural network", "tokens": [51060, 400, 370, 291, 393, 3811, 428, 18875, 
4476, + 1419, 257, 18011, 644, 7126, 538, 257, 18161, 3209, 51396], "temperature": 0.0, + "avg_logprob": -0.3059347735510932, "compression_ratio": 1.4413145539906103, "no_speech_prob": + 0.012042051181197166}, {"id": 646, "seek": 261832, "start": 2638.96, "end": 2643.32, + "text": " for instance, and then a very sparse tail or something.", "tokens": [51396, + 337, 5197, 11, 293, 550, 257, 588, 637, 11668, 6838, 420, 746, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.3059347735510932, "compression_ratio": 1.4413145539906103, + "no_speech_prob": 0.012042051181197166}, {"id": 647, "seek": 264332, "start": 2643.32, + "end": 2650.6400000000003, "text": " And you actually want to perform dot products + and how do you do it efficiently?", "tokens": [50364, 400, 291, 767, 528, 281, 2042, + 5893, 3383, 293, 577, 360, 291, 360, 309, 19621, 30, 50730], "temperature": 0.0, + "avg_logprob": -0.2595584930912141, "compression_ratio": 1.704225352112676, "no_speech_prob": + 0.02395879663527012}, {"id": 648, "seek": 264332, "start": 2650.6400000000003, "end": + 2654.96, "text": " And the paper was going into some fascinating techniques for + how to do that well.", "tokens": [50730, 400, 264, 3035, 390, 516, 666, 512, 10343, + 7512, 337, 577, 281, 360, 300, 731, 13, 50946], "temperature": 0.0, "avg_logprob": + -0.2595584930912141, "compression_ratio": 1.704225352112676, "no_speech_prob": 0.02395879663527012}, + {"id": 649, "seek": 264332, "start": 2654.96, "end": 2657.7200000000003, "text": + " So your question was like, do you see these merging?", "tokens": [50946, 407, + 428, 1168, 390, 411, 11, 360, 291, 536, 613, 44559, 30, 51084], "temperature": 0.0, + "avg_logprob": -0.2595584930912141, "compression_ratio": 1.704225352112676, "no_speech_prob": + 0.02395879663527012}, {"id": 650, "seek": 264332, "start": 2657.7200000000003, "end": + 2663.28, "text": " And I think that, you know, I actually brought this up with the + folks at Fires.", "tokens": [51084, 400, 286, 519, 300, 
11, 291, 458, 11, 286, 767, + 3038, 341, 493, 365, 264, 4024, 412, 479, 3145, 13, 51362], "temperature": 0.0, + "avg_logprob": -0.2595584930912141, "compression_ratio": 1.704225352112676, "no_speech_prob": + 0.02395879663527012}, {"id": 651, "seek": 264332, "start": 2663.28, "end": 2665.0800000000004, + "text": " Is this something on your roadmap?", "tokens": [51362, 1119, 341, 746, + 322, 428, 35738, 30, 51452], "temperature": 0.0, "avg_logprob": -0.2595584930912141, + "compression_ratio": 1.704225352112676, "no_speech_prob": 0.02395879663527012}, + {"id": 652, "seek": 264332, "start": 2665.0800000000004, "end": 2666.36, "text": + " Is this something you''re interested in?", "tokens": [51452, 1119, 341, 746, 291, + 434, 3102, 294, 30, 51516], "temperature": 0.0, "avg_logprob": -0.2595584930912141, + "compression_ratio": 1.704225352112676, "no_speech_prob": 0.02395879663527012}, + {"id": 653, "seek": 264332, "start": 2666.36, "end": 2668.44, "text": " They said, + no, we''re not interested in this.", "tokens": [51516, 814, 848, 11, 572, 11, 321, + 434, 406, 3102, 294, 341, 13, 51620], "temperature": 0.0, "avg_logprob": -0.2595584930912141, + "compression_ratio": 1.704225352112676, "no_speech_prob": 0.02395879663527012}, + {"id": 654, "seek": 264332, "start": 2668.44, "end": 2673.28, "text": " They''re + specifically focused on either sparse or dense, but not high.", "tokens": [51620, + 814, 434, 4682, 5178, 322, 2139, 637, 11668, 420, 18011, 11, 457, 406, 1090, 13, + 51862], "temperature": 0.0, "avg_logprob": -0.2595584930912141, "compression_ratio": + 1.704225352112676, "no_speech_prob": 0.02395879663527012}, {"id": 655, "seek": 267328, + "start": 2673.28, "end": 2677.28, "text": " And I think that it''s going to come + down to this.", "tokens": [50364, 400, 286, 519, 300, 309, 311, 516, 281, 808, 760, + 281, 341, 13, 50564], "temperature": 0.0, "avg_logprob": -0.23729058150406723, "compression_ratio": + 1.5982142857142858, "no_speech_prob": 
0.0033037937246263027}, {"id": 656, "seek": + 267328, "start": 2677.28, "end": 2687.28, "text": " If the utility of this sparse + hybrid can be shown, then the technology is going to follow and try to create", + "tokens": [50564, 759, 264, 14877, 295, 341, 637, 11668, 13051, 393, 312, 4898, + 11, 550, 264, 2899, 307, 516, 281, 1524, 293, 853, 281, 1884, 51064], "temperature": + 0.0, "avg_logprob": -0.23729058150406723, "compression_ratio": 1.5982142857142858, + "no_speech_prob": 0.0033037937246263027}, {"id": 657, "seek": 267328, "start": 2687.28, + "end": 2689.28, "text": " efficient implementations of it.", "tokens": [51064, 7148, + 4445, 763, 295, 309, 13, 51164], "temperature": 0.0, "avg_logprob": -0.23729058150406723, + "compression_ratio": 1.5982142857142858, "no_speech_prob": 0.0033037937246263027}, + {"id": 658, "seek": 267328, "start": 2689.28, "end": 2695.28, "text": " I think + that there are certainly classes of queries for which BM25 can''t be beat.", "tokens": + [51164, 286, 519, 300, 456, 366, 3297, 5359, 295, 24109, 337, 597, 15901, 6074, + 393, 380, 312, 4224, 13, 51464], "temperature": 0.0, "avg_logprob": -0.23729058150406723, + "compression_ratio": 1.5982142857142858, "no_speech_prob": 0.0033037937246263027}, + {"id": 659, "seek": 267328, "start": 2695.28, "end": 2700.28, "text": " And the + exact keyword matching is going to be the correct way to do it in the future.", + "tokens": [51464, 400, 264, 1900, 20428, 14324, 307, 516, 281, 312, 264, 3006, 636, + 281, 360, 309, 294, 264, 2027, 13, 51714], "temperature": 0.0, "avg_logprob": -0.23729058150406723, + "compression_ratio": 1.5982142857142858, "no_speech_prob": 0.0033037937246263027}, + {"id": 660, "seek": 270028, "start": 2700.28, "end": 2704.28, "text": " So then + you can take a few different strategies.", "tokens": [50364, 407, 550, 291, 393, + 747, 257, 1326, 819, 9029, 13, 50564], "temperature": 0.0, "avg_logprob": -0.11681803432079631, + "compression_ratio": 1.7777777777777777, 
"no_speech_prob": 0.008699117228388786}, + {"id": 661, "seek": 270028, "start": 2704.28, "end": 2711.28, "text": " You can + either try to classify the query when it''s received and then dispatch it to the + correct back end.", "tokens": [50564, 509, 393, 2139, 853, 281, 33872, 264, 14581, + 562, 309, 311, 4613, 293, 550, 36729, 309, 281, 264, 3006, 646, 917, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.11681803432079631, "compression_ratio": 1.7777777777777777, + "no_speech_prob": 0.008699117228388786}, {"id": 662, "seek": 270028, "start": 2711.28, + "end": 2716.28, "text": " Or you can dispatch it to a sparse and a dense index and + then merge with a re-ranger.", "tokens": [50914, 1610, 291, 393, 36729, 309, 281, + 257, 637, 11668, 293, 257, 18011, 8186, 293, 550, 22183, 365, 257, 319, 12, 81, + 3176, 13, 51164], "temperature": 0.0, "avg_logprob": -0.11681803432079631, "compression_ratio": + 1.7777777777777777, "no_speech_prob": 0.008699117228388786}, {"id": 663, "seek": + 270028, "start": 2716.28, "end": 2727.28, "text": " Or you can do this like truly + hybrid system where you''re simultaneously doing the multiplication on the sparse + and the dense pieces and producing a final list in like in one shot, not relying + on a re-ranger.", "tokens": [51164, 1610, 291, 393, 360, 341, 411, 4908, 13051, + 1185, 689, 291, 434, 16561, 884, 264, 27290, 322, 264, 637, 11668, 293, 264, 18011, + 3755, 293, 10501, 257, 2572, 1329, 294, 411, 294, 472, 3347, 11, 406, 24140, 322, + 257, 319, 12, 81, 3176, 13, 51714], "temperature": 0.0, "avg_logprob": -0.11681803432079631, + "compression_ratio": 1.7777777777777777, "no_speech_prob": 0.008699117228388786}, + {"id": 664, "seek": 272728, "start": 2727.28, "end": 2733.28, "text": " So it''s + still an open area of research.", "tokens": [50364, 407, 309, 311, 920, 364, 1269, + 1859, 295, 2132, 13, 50664], "temperature": 0.0, "avg_logprob": -0.18982413208600388, + "compression_ratio": 1.5194805194805194, "no_speech_prob": 
0.0942075327038765}, + {"id": 665, "seek": 272728, "start": 2733.28, "end": 2734.28, "text": " Exactly.", + "tokens": [50664, 7587, 13, 50714], "temperature": 0.0, "avg_logprob": -0.18982413208600388, + "compression_ratio": 1.5194805194805194, "no_speech_prob": 0.0942075327038765}, + {"id": 666, "seek": 272728, "start": 2734.28, "end": 2736.28, "text": " And two + things.", "tokens": [50714, 400, 732, 721, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.18982413208600388, "compression_ratio": 1.5194805194805194, "no_speech_prob": + 0.0942075327038765}, {"id": 667, "seek": 272728, "start": 2736.28, "end": 2739.28, + "text": " Like I''m looking at it from the point of view of a customer.", "tokens": + [50814, 1743, 286, 478, 1237, 412, 309, 490, 264, 935, 295, 1910, 295, 257, 5474, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.18982413208600388, "compression_ratio": + 1.5194805194805194, "no_speech_prob": 0.0942075327038765}, {"id": 668, "seek": 272728, + "start": 2739.28, "end": 2742.28, "text": " Let''s say I already have the M25 platform, + right?", "tokens": [50964, 961, 311, 584, 286, 1217, 362, 264, 376, 6074, 3663, + 11, 558, 30, 51114], "temperature": 0.0, "avg_logprob": -0.18982413208600388, "compression_ratio": + 1.5194805194805194, "no_speech_prob": 0.0942075327038765}, {"id": 669, "seek": 272728, + "start": 2742.28, "end": 2743.28, "text": " Base platform.", "tokens": [51114, 21054, + 3663, 13, 51164], "temperature": 0.0, "avg_logprob": -0.18982413208600388, "compression_ratio": + 1.5194805194805194, "no_speech_prob": 0.0942075327038765}, {"id": 670, "seek": 272728, + "start": 2743.28, "end": 2746.28, "text": " And so I''m curious.", "tokens": [51164, + 400, 370, 286, 478, 6369, 13, 51314], "temperature": 0.0, "avg_logprob": -0.18982413208600388, + "compression_ratio": 1.5194805194805194, "no_speech_prob": 0.0942075327038765}, + {"id": 671, "seek": 272728, "start": 2746.28, "end": 2749.28, "text": " Okay. 
So + I''m curious to see what vector search can bring me.", "tokens": [51314, 1033, 13, + 407, 286, 478, 6369, 281, 536, 437, 8062, 3164, 393, 1565, 385, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.18982413208600388, "compression_ratio": 1.5194805194805194, + "no_speech_prob": 0.0942075327038765}, {"id": 672, "seek": 272728, "start": 2749.28, + "end": 2755.28, "text": " And maybe I''m thinking about introducing this as an explorative + search feature.", "tokens": [51464, 400, 1310, 286, 478, 1953, 466, 15424, 341, + 382, 364, 24765, 1166, 3164, 4111, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.18982413208600388, "compression_ratio": 1.5194805194805194, "no_speech_prob": + 0.0942075327038765}, {"id": 673, "seek": 275528, "start": 2755.28, "end": 2761.28, + "text": " So because I''m not sure if it''s going to fly for my documents or for + my items in the database, right?", "tokens": [50364, 407, 570, 286, 478, 406, 988, + 498, 309, 311, 516, 281, 3603, 337, 452, 8512, 420, 337, 452, 4754, 294, 264, 8149, + 11, 558, 30, 50664], "temperature": 0.0, "avg_logprob": -0.13929992344068445, "compression_ratio": + 1.6754716981132076, "no_speech_prob": 0.0034120010677725077}, {"id": 674, "seek": + 275528, "start": 2761.28, "end": 2769.28, "text": " So that''s one potential to + think about, okay, as you said, I can actually route this query to both sparse and + dense retrieval.", "tokens": [50664, 407, 300, 311, 472, 3995, 281, 519, 466, 11, + 1392, 11, 382, 291, 848, 11, 286, 393, 767, 7955, 341, 14581, 281, 1293, 637, 11668, + 293, 18011, 19817, 3337, 13, 51064], "temperature": 0.0, "avg_logprob": -0.13929992344068445, + "compression_ratio": 1.6754716981132076, "no_speech_prob": 0.0034120010677725077}, + {"id": 675, "seek": 275528, "start": 2769.28, "end": 2773.28, "text": " And then + maybe combine them in some linear formula, even.", "tokens": [51064, 400, 550, 1310, + 10432, 552, 294, 512, 8213, 8513, 11, 754, 13, 51264], "temperature": 0.0, "avg_logprob": 
+ -0.13929992344068445, "compression_ratio": 1.6754716981132076, "no_speech_prob": + 0.0034120010677725077}, {"id": 676, "seek": 275528, "start": 2773.28, "end": 2782.28, + "text": " And I can give like a smaller score, lower score to, to, or wait to the + dense part and then higher to the sparse part because I still believe in sparse + part.", "tokens": [51264, 400, 286, 393, 976, 411, 257, 4356, 6175, 11, 3126, 6175, + 281, 11, 281, 11, 420, 1699, 281, 264, 18011, 644, 293, 550, 2946, 281, 264, 637, + 11668, 644, 570, 286, 920, 1697, 294, 637, 11668, 644, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.13929992344068445, "compression_ratio": 1.6754716981132076, + "no_speech_prob": 0.0034120010677725077}, {"id": 677, "seek": 278228, "start": 2782.28, + "end": 2786.28, "text": " And that''s how my users are expecting results to be there.", + "tokens": [50364, 400, 300, 311, 577, 452, 5022, 366, 9650, 3542, 281, 312, 456, + 13, 50564], "temperature": 0.0, "avg_logprob": -0.1715029529017261, "compression_ratio": + 1.6131386861313868, "no_speech_prob": 0.0059082768857479095}, {"id": 678, "seek": + 278228, "start": 2786.28, "end": 2789.28, "text": " But then maybe I can surface + some magic like Q and A, right?", "tokens": [50564, 583, 550, 1310, 286, 393, 3753, + 512, 5585, 411, 1249, 293, 316, 11, 558, 30, 50714], "temperature": 0.0, "avg_logprob": + -0.1715029529017261, "compression_ratio": 1.6131386861313868, "no_speech_prob": + 0.0059082768857479095}, {"id": 679, "seek": 278228, "start": 2789.28, "end": 2792.28, + "text": " So they asked the question and I can give them the answer.", "tokens": + [50714, 407, 436, 2351, 264, 1168, 293, 286, 393, 976, 552, 264, 1867, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.1715029529017261, "compression_ratio": 1.6131386861313868, + "no_speech_prob": 0.0059082768857479095}, {"id": 680, "seek": 278228, "start": 2792.28, + "end": 2794.28, "text": " And that might be really interesting.", "tokens": [50864, + 400, 300, 
1062, 312, 534, 1880, 13, 50964], "temperature": 0.0, "avg_logprob": -0.1715029529017261, + "compression_ratio": 1.6131386861313868, "no_speech_prob": 0.0059082768857479095}, + {"id": 681, "seek": 278228, "start": 2794.28, "end": 2798.28, "text": " And the + second point, there was a paper called beer.", "tokens": [50964, 400, 264, 1150, + 935, 11, 456, 390, 257, 3035, 1219, 8795, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.1715029529017261, "compression_ratio": 1.6131386861313868, "no_speech_prob": + 0.0059082768857479095}, {"id": 682, "seek": 278228, "start": 2798.28, "end": 2803.28, + "text": " B E I R. I will make sure that all of the papers will be linked here in + the show notes.", "tokens": [51164, 363, 462, 286, 497, 13, 286, 486, 652, 988, + 300, 439, 295, 264, 10577, 486, 312, 9408, 510, 294, 264, 855, 5570, 13, 51414], + "temperature": 0.0, "avg_logprob": -0.1715029529017261, "compression_ratio": 1.6131386861313868, + "no_speech_prob": 0.0059082768857479095}, {"id": 683, "seek": 278228, "start": 2803.28, + "end": 2810.28, "text": " But that paper actually compared a dense retrieval versus + BM25 on a number of tasks.", "tokens": [51414, 583, 300, 3035, 767, 5347, 257, 18011, + 19817, 3337, 5717, 15901, 6074, 322, 257, 1230, 295, 9608, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.1715029529017261, "compression_ratio": 1.6131386861313868, + "no_speech_prob": 0.0059082768857479095}, {"id": 684, "seek": 281028, "start": 2810.28, + "end": 2813.28, "text": " Right? 
So you can have a search.", "tokens": [50364, 1779, + 30, 407, 291, 393, 362, 257, 3164, 13, 50514], "temperature": 0.0, "avg_logprob": + -0.16128367236536792, "compression_ratio": 1.6216216216216217, "no_speech_prob": + 0.01648273691534996}, {"id": 685, "seek": 281028, "start": 2813.28, "end": 2817.28, + "text": " You can have a question answering and leaves goes on.", "tokens": [50514, + 509, 393, 362, 257, 1168, 13430, 293, 5510, 1709, 322, 13, 50714], "temperature": + 0.0, "avg_logprob": -0.16128367236536792, "compression_ratio": 1.6216216216216217, + "no_speech_prob": 0.01648273691534996}, {"id": 686, "seek": 281028, "start": 2817.28, + "end": 2821.28, "text": " And so what they showed is that BM25 is fairly competitive.", + "tokens": [50714, 400, 370, 437, 436, 4712, 307, 300, 15901, 6074, 307, 6457, 10043, + 13, 50914], "temperature": 0.0, "avg_logprob": -0.16128367236536792, "compression_ratio": + 1.6216216216216217, "no_speech_prob": 0.01648273691534996}, {"id": 687, "seek": + 281028, "start": 2821.28, "end": 2827.28, "text": " It actually is above dense retrieval + methods like on zero, zero short retrieval.", "tokens": [50914, 467, 767, 307, 3673, + 18011, 19817, 3337, 7150, 411, 322, 4018, 11, 4018, 2099, 19817, 3337, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.16128367236536792, "compression_ratio": 1.6216216216216217, + "no_speech_prob": 0.01648273691534996}, {"id": 688, "seek": 281028, "start": 2827.28, + "end": 2829.28, "text": " Right? So like you didn''t find you in this model.", "tokens": + [51214, 1779, 30, 407, 411, 291, 994, 380, 915, 291, 294, 341, 2316, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.16128367236536792, "compression_ratio": 1.6216216216216217, + "no_speech_prob": 0.01648273691534996}, {"id": 689, "seek": 281028, "start": 2829.28, + "end": 2833.28, "text": " So you just took them off the shelf. Here is the task. + Let''s compare. 
Right?", "tokens": [51314, 407, 291, 445, 1890, 552, 766, 264, 15222, + 13, 1692, 307, 264, 5633, 13, 961, 311, 6794, 13, 1779, 30, 51514], "temperature": + 0.0, "avg_logprob": -0.16128367236536792, "compression_ratio": 1.6216216216216217, + "no_speech_prob": 0.01648273691534996}, {"id": 690, "seek": 281028, "start": 2833.28, + "end": 2835.28, "text": " BM25 is very stable.", "tokens": [51514, 15901, 6074, + 307, 588, 8351, 13, 51614], "temperature": 0.0, "avg_logprob": -0.16128367236536792, + "compression_ratio": 1.6216216216216217, "no_speech_prob": 0.01648273691534996}, + {"id": 691, "seek": 281028, "start": 2835.28, "end": 2838.28, "text": " So just + few models actually outperformed it.", "tokens": [51614, 407, 445, 1326, 5245, 767, + 484, 610, 22892, 309, 13, 51764], "temperature": 0.0, "avg_logprob": -0.16128367236536792, + "compression_ratio": 1.6216216216216217, "no_speech_prob": 0.01648273691534996}, + {"id": 692, "seek": 283828, "start": 2838.28, "end": 2845.28, "text": " And so in + that sense, it sounds like BM25 is here to stay. 
What do you think?", "tokens": + [50364, 400, 370, 294, 300, 2020, 11, 309, 3263, 411, 15901, 6074, 307, 510, 281, + 1754, 13, 708, 360, 291, 519, 30, 50714], "temperature": 0.0, "avg_logprob": -0.17544059753417968, + "compression_ratio": 1.5267489711934157, "no_speech_prob": 0.13621282577514648}, + {"id": 693, "seek": 283828, "start": 2845.28, "end": 2847.28, "text": " I agree + with you.", "tokens": [50714, 286, 3986, 365, 291, 13, 50814], "temperature": 0.0, + "avg_logprob": -0.17544059753417968, "compression_ratio": 1.5267489711934157, "no_speech_prob": + 0.13621282577514648}, {"id": 694, "seek": 283828, "start": 2847.28, "end": 2859.28, + "text": " And again, this is where our scope is as a company is on building an end + to end information retrieval pipeline, which means that, okay, today, we have a + neural dense retrieval.", "tokens": [50814, 400, 797, 11, 341, 307, 689, 527, 11923, + 307, 382, 257, 2237, 307, 322, 2390, 364, 917, 281, 917, 1589, 19817, 3337, 15517, + 11, 597, 1355, 300, 11, 1392, 11, 965, 11, 321, 362, 257, 18161, 18011, 19817, 3337, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.17544059753417968, "compression_ratio": + 1.5267489711934157, "no_speech_prob": 0.13621282577514648}, {"id": 695, "seek": + 283828, "start": 2859.28, "end": 2862.28, "text": " Because the M25 has been done, + right?", "tokens": [51414, 1436, 264, 376, 6074, 575, 668, 1096, 11, 558, 30, 51564], + "temperature": 0.0, "avg_logprob": -0.17544059753417968, "compression_ratio": 1.5267489711934157, + "no_speech_prob": 0.13621282577514648}, {"id": 696, "seek": 283828, "start": 2862.28, + "end": 2863.28, "text": " It''s in the scene.", "tokens": [51564, 467, 311, 294, + 264, 4145, 13, 51614], "temperature": 0.0, "avg_logprob": -0.17544059753417968, + "compression_ratio": 1.5267489711934157, "no_speech_prob": 0.13621282577514648}, + {"id": 697, "seek": 283828, "start": 2863.28, "end": 2865.28, "text": " It''s well + understood how to implement it.", "tokens": [51614, 
467, 311, 731, 7320, 577, 281, + 4445, 309, 13, 51714], "temperature": 0.0, "avg_logprob": -0.17544059753417968, + "compression_ratio": 1.5267489711934157, "no_speech_prob": 0.13621282577514648}, + {"id": 698, "seek": 286528, "start": 2865.28, "end": 2871.28, "text": " Although + there are some tricks to actually make the M25 work even better than my off the + shelf implementations.", "tokens": [50364, 5780, 456, 366, 512, 11733, 281, 767, + 652, 264, 376, 6074, 589, 754, 1101, 813, 452, 766, 264, 15222, 4445, 763, 13, 50664], + "temperature": 0.0, "avg_logprob": -0.15374137445823433, "compression_ratio": 1.578125, + "no_speech_prob": 0.021132908761501312}, {"id": 699, "seek": 286528, "start": 2871.28, + "end": 2873.28, "text": " But what.", "tokens": [50664, 583, 437, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.15374137445823433, "compression_ratio": 1.578125, "no_speech_prob": + 0.021132908761501312}, {"id": 700, "seek": 286528, "start": 2873.28, "end": 2881.28, + "text": " Where we want to eventually get to is we could potentially build the BM25 + and dense indexes for our customers.", "tokens": [50764, 2305, 321, 528, 281, 4728, + 483, 281, 307, 321, 727, 7263, 1322, 264, 15901, 6074, 293, 18011, 8186, 279, 337, + 527, 4581, 13, 51164], "temperature": 0.0, "avg_logprob": -0.15374137445823433, + "compression_ratio": 1.578125, "no_speech_prob": 0.021132908761501312}, {"id": 701, + "seek": 286528, "start": 2881.28, "end": 2883.28, "text": " And then return.", "tokens": + [51164, 400, 550, 2736, 13, 51264], "temperature": 0.0, "avg_logprob": -0.15374137445823433, + "compression_ratio": 1.578125, "no_speech_prob": 0.021132908761501312}, {"id": 702, + "seek": 286528, "start": 2883.28, "end": 2886.28, "text": " We''re trying to just + serve the best results possible.", "tokens": [51264, 492, 434, 1382, 281, 445, 4596, + 264, 1151, 3542, 1944, 13, 51414], "temperature": 0.0, "avg_logprob": -0.15374137445823433, + "compression_ratio": 1.578125, "no_speech_prob": 
0.021132908761501312}, {"id": 703, + "seek": 286528, "start": 2886.28, "end": 2891.28, "text": " So for instance, you + could take even sometimes even very simple heuristics work single word queries.", + "tokens": [51414, 407, 337, 5197, 11, 291, 727, 747, 754, 2171, 754, 588, 2199, + 415, 374, 6006, 589, 2167, 1349, 24109, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.15374137445823433, "compression_ratio": 1.578125, "no_speech_prob": 0.021132908761501312}, + {"id": 704, "seek": 289128, "start": 2891.28, "end": 2896.28, "text": " Often BM25 + is how you want to serve them, not not not from a dense index.", "tokens": [50364, + 20043, 15901, 6074, 307, 577, 291, 528, 281, 4596, 552, 11, 406, 406, 406, 490, + 257, 18011, 8186, 13, 50614], "temperature": 0.0, "avg_logprob": -0.13678568601608276, + "compression_ratio": 1.7098976109215016, "no_speech_prob": 0.08786576986312866}, + {"id": 705, "seek": 289128, "start": 2896.28, "end": 2900.28, "text": " So if it''s + a single word query, okay, you''re going to be on 25 search.", "tokens": [50614, + 407, 498, 309, 311, 257, 2167, 1349, 14581, 11, 1392, 11, 291, 434, 516, 281, 312, + 322, 3552, 3164, 13, 50814], "temperature": 0.0, "avg_logprob": -0.13678568601608276, + "compression_ratio": 1.7098976109215016, "no_speech_prob": 0.08786576986312866}, + {"id": 706, "seek": 289128, "start": 2900.28, "end": 2903.28, "text": " If it''s + anything longer than one word run, then search.", "tokens": [50814, 759, 309, 311, + 1340, 2854, 813, 472, 1349, 1190, 11, 550, 3164, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.13678568601608276, "compression_ratio": 1.7098976109215016, "no_speech_prob": + 0.08786576986312866}, {"id": 707, "seek": 289128, "start": 2903.28, "end": 2908.28, + "text": " That''s not a very principled approach. 
I''m just pointing out that, you + know, what''s going on behind the scenes.", "tokens": [50964, 663, 311, 406, 257, + 588, 3681, 15551, 3109, 13, 286, 478, 445, 12166, 484, 300, 11, 291, 458, 11, 437, + 311, 516, 322, 2261, 264, 8026, 13, 51214], "temperature": 0.0, "avg_logprob": -0.13678568601608276, + "compression_ratio": 1.7098976109215016, "no_speech_prob": 0.08786576986312866}, + {"id": 708, "seek": 289128, "start": 2908.28, "end": 2919.28, "text": " That''s + the intelligence of the platform to provide and we''re not really restricted or + married to a vector database or only a vector database, powering powering the search + of this platform.", "tokens": [51214, 663, 311, 264, 7599, 295, 264, 3663, 281, + 2893, 293, 321, 434, 406, 534, 20608, 420, 5259, 281, 257, 8062, 8149, 420, 787, + 257, 8062, 8149, 11, 1347, 278, 1347, 278, 264, 3164, 295, 341, 3663, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.13678568601608276, "compression_ratio": 1.7098976109215016, + "no_speech_prob": 0.08786576986312866}, {"id": 709, "seek": 291928, "start": 2919.28, + "end": 2922.28, "text": " Yeah, yeah, that makes sense.", "tokens": [50364, 865, + 11, 1338, 11, 300, 1669, 2020, 13, 50514], "temperature": 0.0, "avg_logprob": -0.32465085116299713, + "compression_ratio": 1.6121495327102804, "no_speech_prob": 0.026882139965891838}, + {"id": 710, "seek": 291928, "start": 2922.28, "end": 2935.28, "text": " So is does + that manifest in some way in your product that I as a user can have the flexibility + and how my search is processed is going to go the sparse route or is it going to + go the the density tree will.", "tokens": [50514, 407, 307, 775, 300, 10067, 294, + 512, 636, 294, 428, 1674, 300, 286, 382, 257, 4195, 393, 362, 264, 12635, 293, 577, + 452, 3164, 307, 18846, 307, 516, 281, 352, 264, 637, 11668, 7955, 420, 307, 309, + 516, 281, 352, 264, 264, 10305, 4230, 486, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.32465085116299713, "compression_ratio": 
1.6121495327102804, "no_speech_prob": + 0.026882139965891838}, {"id": 711, "seek": 291928, "start": 2935.28, "end": 2940.28, + "text": " No, we don''t so at the moment we''re only doing the answer tree will + because we feel like that''s the interest.", "tokens": [51164, 883, 11, 321, 500, + 380, 370, 412, 264, 1623, 321, 434, 787, 884, 264, 1867, 4230, 486, 570, 321, 841, + 411, 300, 311, 264, 1179, 13, 51414], "temperature": 0.0, "avg_logprob": -0.32465085116299713, + "compression_ratio": 1.6121495327102804, "no_speech_prob": 0.026882139965891838}, + {"id": 712, "seek": 294028, "start": 2940.28, "end": 2947.28, "text": " We can add + that we can add the BM25 parts without a lot of difficulty in six months from now + or something like that.", "tokens": [50364, 492, 393, 909, 300, 321, 393, 909, 264, + 15901, 6074, 3166, 1553, 257, 688, 295, 10360, 294, 2309, 2493, 490, 586, 420, 746, + 411, 300, 13, 50714], "temperature": 0.0, "avg_logprob": -0.1677037779107151, "compression_ratio": + 1.6093023255813954, "no_speech_prob": 0.1522897183895111}, {"id": 713, "seek": 294028, + "start": 2947.28, "end": 2954.28, "text": " So, but we do provide a few different + flavors of the dense retrieval because there''s a few.", "tokens": [50714, 407, + 11, 457, 321, 360, 2893, 257, 1326, 819, 16303, 295, 264, 18011, 19817, 3337, 570, + 456, 311, 257, 1326, 13, 51064], "temperature": 0.0, "avg_logprob": -0.1677037779107151, + "compression_ratio": 1.6093023255813954, "no_speech_prob": 0.1522897183895111}, + {"id": 714, "seek": 294028, "start": 2954.28, "end": 2960.28, "text": " There''s + question answering so the user puts it or query answering the user puts a query + in and then you''re trying to find good responses.", "tokens": [51064, 821, 311, + 1168, 13430, 370, 264, 4195, 8137, 309, 420, 14581, 13430, 264, 4195, 8137, 257, + 14581, 294, 293, 550, 291, 434, 1382, 281, 915, 665, 13019, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.1677037779107151, "compression_ratio": 
1.6093023255813954, + "no_speech_prob": 0.1522897183895111}, {"id": 715, "seek": 296028, "start": 2960.28, + "end": 2972.28, "text": " There''s also another task which is semantic similarity, + which is closely related, but it''s like I make a statement and I just want to find + similar statements so my statement is not necessarily a question that I''m looking + for an answer to.", "tokens": [50364, 821, 311, 611, 1071, 5633, 597, 307, 47982, + 32194, 11, 597, 307, 8185, 4077, 11, 457, 309, 311, 411, 286, 652, 257, 5629, 293, + 286, 445, 528, 281, 915, 2531, 12363, 370, 452, 5629, 307, 406, 4725, 257, 1168, + 300, 286, 478, 1237, 337, 364, 1867, 281, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.15994347965016084, "compression_ratio": 1.8826291079812207, "no_speech_prob": + 0.048425037413835526}, {"id": 716, "seek": 296028, "start": 2972.28, "end": 2984.28, + "text": " I just want to find semantically similar statements and then the other + thing is question question similarity often comes up it comes up usually in the + not not in.", "tokens": [50964, 286, 445, 528, 281, 915, 4361, 49505, 2531, 12363, + 293, 550, 264, 661, 551, 307, 1168, 1168, 32194, 2049, 1487, 493, 309, 1487, 493, + 2673, 294, 264, 406, 406, 294, 13, 51564], "temperature": 0.0, "avg_logprob": -0.15994347965016084, + "compression_ratio": 1.8826291079812207, "no_speech_prob": 0.048425037413835526}, + {"id": 717, "seek": 298428, "start": 2984.28, "end": 2991.28, "text": " Well, you''ve + seen it in Google for instance when you type with query and then it says people + also ask these questions and they get these similar questions right.", "tokens": + [50364, 1042, 11, 291, 600, 1612, 309, 294, 3329, 337, 5197, 562, 291, 2010, 365, + 14581, 293, 550, 309, 1619, 561, 611, 1029, 613, 1651, 293, 436, 483, 613, 2531, + 1651, 558, 13, 50714], "temperature": 0.0, "avg_logprob": -0.12817928194999695, + "compression_ratio": 1.7479674796747968, "no_speech_prob": 0.010657334700226784}, + {"id": 718, 
"seek": 298428, "start": 2991.28, "end": 3002.28, "text": " So there''s + use cases for question question similarity and so we support all three of those + modes of operation and we allow at query time our customers to specify which mode + they''re trying to run it.", "tokens": [50714, 407, 456, 311, 764, 3331, 337, 1168, + 1168, 32194, 293, 370, 321, 1406, 439, 1045, 295, 729, 14068, 295, 6916, 293, 321, + 2089, 412, 14581, 565, 527, 4581, 281, 16500, 597, 4391, 436, 434, 1382, 281, 1190, + 309, 13, 51264], "temperature": 0.0, "avg_logprob": -0.12817928194999695, "compression_ratio": + 1.7479674796747968, "no_speech_prob": 0.010657334700226784}, {"id": 719, "seek": + 298428, "start": 3002.28, "end": 3007.28, "text": " Yeah, yeah, that makes sense + that makes a lot of sense and of course.", "tokens": [51264, 865, 11, 1338, 11, + 300, 1669, 2020, 300, 1669, 257, 688, 295, 2020, 293, 295, 1164, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.12817928194999695, "compression_ratio": 1.7479674796747968, + "no_speech_prob": 0.010657334700226784}, {"id": 720, "seek": 300728, "start": 3007.28, + "end": 3032.28, "text": " One thing that I keep thinking about is let''s say when + we introduce the sparse search let''s say Bm 25 and some customer comes in and it''s + not English language it''s something else right then you need to bring in also the + tokenization and other things from maybe from Lucene and of course, Lucene is a + library in principle it could be wrapped in a Docker image and you can do that job + right.", "tokens": [50364, 1485, 551, 300, 286, 1066, 1953, 466, 307, 718, 311, + 584, 562, 321, 5366, 264, 637, 11668, 3164, 718, 311, 584, 363, 76, 3552, 293, 512, + 5474, 1487, 294, 293, 309, 311, 406, 3669, 2856, 309, 311, 746, 1646, 558, 550, + 291, 643, 281, 1565, 294, 611, 264, 14862, 2144, 293, 661, 721, 490, 1310, 490, + 9593, 1450, 293, 295, 1164, 11, 9593, 1450, 307, 257, 6405, 294, 8665, 309, 727, + 312, 14226, 294, 257, 33772, 3256, 293, 291, 393, 
360, 300, 1691, 558, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.2221856920906667, "compression_ratio": 1.6208333333333333, + "no_speech_prob": 0.13411051034927368}, {"id": 721, "seek": 303228, "start": 3032.28, + "end": 3040.28, "text": " But then the question is can you easily married so that + it is production grade between different platforms and languages.", "tokens": [50364, + 583, 550, 264, 1168, 307, 393, 291, 3612, 5259, 370, 300, 309, 307, 4265, 7204, + 1296, 819, 9473, 293, 8650, 13, 50764], "temperature": 0.0, "avg_logprob": -0.19335718557868206, + "compression_ratio": 1.5454545454545454, "no_speech_prob": 0.03915693610906601}, + {"id": 722, "seek": 303228, "start": 3040.28, "end": 3050.28, "text": " And it''s + surprising Lucene has come a long way so there''s come long in terms of providing + a good sense out of defaults out of the box in terms of stop wordless and stemming + but I have.", "tokens": [50764, 400, 309, 311, 8830, 9593, 1450, 575, 808, 257, + 938, 636, 370, 456, 311, 808, 938, 294, 2115, 295, 6530, 257, 665, 2020, 484, 295, + 7576, 82, 484, 295, 264, 2424, 294, 2115, 295, 1590, 1349, 1832, 293, 12312, 2810, + 457, 286, 362, 13, 51264], "temperature": 0.0, "avg_logprob": -0.19335718557868206, + "compression_ratio": 1.5454545454545454, "no_speech_prob": 0.03915693610906601}, + {"id": 723, "seek": 305028, "start": 3051.28, "end": 3062.28, "text": " My daughter + school started using this like a product that manages communication between the + school and the parents and that thing was clearly using.", "tokens": [50414, 1222, + 4653, 1395, 1409, 1228, 341, 411, 257, 1674, 300, 22489, 6101, 1296, 264, 1395, + 293, 264, 3152, 293, 300, 551, 390, 4448, 1228, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.151140343059193, "compression_ratio": 1.7095435684647302, "no_speech_prob": + 0.09328672289848328}, {"id": 724, "seek": 305028, "start": 3062.28, "end": 3077.28, + "text": " You know, Lucene or solar elastic search and they didn''t 
have the stemming + configured properly and I didn''t know as possible to misto misconfigure that so + I was searching for vaccine and it couldn''t find find it because it was vaccination + in the title over there.", "tokens": [50964, 509, 458, 11, 9593, 1450, 420, 7936, + 17115, 3164, 293, 436, 994, 380, 362, 264, 12312, 2810, 30538, 6108, 293, 286, 994, + 380, 458, 382, 1944, 281, 3544, 78, 3346, 1671, 20646, 540, 300, 370, 286, 390, + 10808, 337, 7007, 293, 309, 2809, 380, 915, 915, 309, 570, 309, 390, 16498, 294, + 264, 4876, 670, 456, 13, 51714], "temperature": 0.0, "avg_logprob": -0.151140343059193, + "compression_ratio": 1.7095435684647302, "no_speech_prob": 0.09328672289848328}, + {"id": 725, "seek": 307728, "start": 3077.28, "end": 3089.28, "text": " So yeah, + so with the with the neurosurgeon is kind of a little bit more bullet proof, you + know, it''s it''s a bit more immune to these kinds of mistakes and those misspellings + very easily.", "tokens": [50364, 407, 1338, 11, 370, 365, 264, 365, 264, 28813, + 374, 11641, 307, 733, 295, 257, 707, 857, 544, 11632, 8177, 11, 291, 458, 11, 309, + 311, 309, 311, 257, 857, 544, 11992, 281, 613, 3685, 295, 8038, 293, 729, 1713, + 49241, 1109, 588, 3612, 13, 50964], "temperature": 0.0, "avg_logprob": -0.20593896204111528, + "compression_ratio": 1.6680161943319838, "no_speech_prob": 0.049178414046764374}, + {"id": 726, "seek": 307728, "start": 3089.28, "end": 3102.28, "text": " Yeah, yeah, + especially I think there is also a paper about I think it was from Google you know + to train on bite level and so you will not be constrained by okay the complexity + of the language because you have like bite level.", "tokens": [50964, 865, 11, 1338, + 11, 2318, 286, 519, 456, 307, 611, 257, 3035, 466, 286, 519, 309, 390, 490, 3329, + 291, 458, 281, 3847, 322, 7988, 1496, 293, 370, 291, 486, 406, 312, 38901, 538, + 1392, 264, 14024, 295, 264, 2856, 570, 291, 362, 411, 7988, 1496, 13, 51614], "temperature": + 0.0, 
"avg_logprob": -0.20593896204111528, "compression_ratio": 1.6680161943319838, + "no_speech_prob": 0.049178414046764374}, {"id": 727, "seek": 310228, "start": 3102.28, + "end": 3111.28, "text": " Definitions and and so in principle your model should + be robust to typos and misspellings and so on and some of them come from speech + right so.", "tokens": [50364, 46245, 2451, 293, 293, 370, 294, 8665, 428, 2316, + 820, 312, 13956, 281, 2125, 329, 293, 1713, 49241, 1109, 293, 370, 322, 293, 512, + 295, 552, 808, 490, 6218, 558, 370, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.09581869076459835, "compression_ratio": 1.7211538461538463, "no_speech_prob": + 0.06103605777025223}, {"id": 728, "seek": 310228, "start": 3112.28, "end": 3127.28, + "text": " Exactly exactly yeah and it sounds like interesting like the example you + brought up with your daughter school like system like it sounds like largely search + is still broken it''s like like the moment you go to some.", "tokens": [50864, 7587, + 2293, 1338, 293, 309, 3263, 411, 1880, 411, 264, 1365, 291, 3038, 493, 365, 428, + 4653, 1395, 411, 1185, 411, 309, 3263, 411, 11611, 3164, 307, 920, 5463, 309, 311, + 411, 411, 264, 1623, 291, 352, 281, 512, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.09581869076459835, "compression_ratio": 1.7211538461538463, "no_speech_prob": + 0.06103605777025223}, {"id": 729, "seek": 312728, "start": 3127.28, "end": 3154.28, + "text": " System which is let''s say for public use right like it''s not necessarily + designed for for findability there it exists and you know like like Daniel tanker + lung I think he says like the funny part of search industry in general is that when + search engine works nobody will go and praise you they just use it when it doesn''t + work they will blame you so you always air on on that.", "tokens": [50364, 8910, + 597, 307, 718, 311, 584, 337, 1908, 764, 558, 411, 309, 311, 406, 4725, 4761, 337, + 337, 915, 2310, 456, 309, 8198, 293, 291, 458, 
411, 411, 8033, 5466, 260, 16730, + 286, 519, 415, 1619, 411, 264, 4074, 644, 295, 3164, 3518, 294, 2674, 307, 300, + 562, 3164, 2848, 1985, 5079, 486, 352, 293, 13286, 291, 436, 445, 764, 309, 562, + 309, 1177, 380, 589, 436, 486, 10127, 291, 370, 291, 1009, 1988, 322, 322, 300, + 13, 51714], "temperature": 0.0, "avg_logprob": -0.20001648693549923, "compression_ratio": + 1.6491228070175439, "no_speech_prob": 0.06799925118684769}, {"id": 730, "seek": + 315428, "start": 3154.28, "end": 3163.28, "text": " How do you feel about that like + is this also the potential for your company to go and fix many of these broken use + cases.", "tokens": [50364, 1012, 360, 291, 841, 466, 300, 411, 307, 341, 611, 264, + 3995, 337, 428, 2237, 281, 352, 293, 3191, 867, 295, 613, 5463, 764, 3331, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.1565423398404508, "compression_ratio": 1.591549295774648, + "no_speech_prob": 0.08269992470741272}, {"id": 731, "seek": 315428, "start": 3163.28, + "end": 3182.28, "text": " Well that certainly that certainly actually our vision + that we will make it very easy for SAS companies to provide a much more in Google + like search experience in their products so when it comes to web say that let''s.", + "tokens": [50814, 1042, 300, 3297, 300, 3297, 767, 527, 5201, 300, 321, 486, 652, + 309, 588, 1858, 337, 33441, 3431, 281, 2893, 257, 709, 544, 294, 3329, 411, 3164, + 1752, 294, 641, 3383, 370, 562, 309, 1487, 281, 3670, 584, 300, 718, 311, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.1565423398404508, "compression_ratio": 1.591549295774648, + "no_speech_prob": 0.08269992470741272}, {"id": 732, "seek": 318228, "start": 3182.28, + "end": 3191.28, "text": " Into two categories SAS companies and website owners when + it comes to website owners I think the search for websites is really used because.", + "tokens": [50364, 23373, 732, 10479, 33441, 3431, 293, 3144, 7710, 562, 309, 1487, + 281, 3144, 7710, 286, 519, 264, 3164, 337, 12891, 307, 
534, 1143, 570, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.160246471563975, "compression_ratio": 1.8588235294117648, + "no_speech_prob": 0.237690269947052}, {"id": 733, "seek": 318228, "start": 3192.28, + "end": 3210.28, "text": " And it becomes like a cyclical thing it''s really used + companies therefore don''t invest any money in improving it it''s really used because + it''s not good and basically Google does enough a good enough job actually indexing + well sites so site owners have accepted that Google is going to be the front door + into their into their website.", "tokens": [50864, 400, 309, 3643, 411, 257, 19474, + 804, 551, 309, 311, 534, 1143, 3431, 4412, 500, 380, 1963, 604, 1460, 294, 11470, + 309, 309, 311, 534, 1143, 570, 309, 311, 406, 665, 293, 1936, 3329, 775, 1547, 257, + 665, 1547, 1691, 767, 8186, 278, 731, 7533, 370, 3621, 7710, 362, 9035, 300, 3329, + 307, 516, 281, 312, 264, 1868, 2853, 666, 641, 666, 641, 3144, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.160246471563975, "compression_ratio": 1.8588235294117648, + "no_speech_prob": 0.237690269947052}, {"id": 734, "seek": 321028, "start": 3210.28, + "end": 3237.28, "text": " On the other hand I think it''s it is obviously dangerous + for them to because you''ve had sites that essentially get obliterated when Google + changes you know their quality guidelines and they they drop off the front page + and the traffic goes down by 95% suddenly and there''s no way to recover from it + so it would be good first to be able to provide a good search experience on the + websites but I think they don''t do it for the cost involved and they don''t know + how to and certainly.", "tokens": [50364, 1282, 264, 661, 1011, 286, 519, 309, 311, + 309, 307, 2745, 5795, 337, 552, 281, 570, 291, 600, 632, 7533, 300, 4476, 483, 23740, + 1681, 770, 562, 3329, 2962, 291, 458, 641, 3125, 12470, 293, 436, 436, 3270, 766, + 264, 1868, 3028, 293, 264, 6419, 1709, 760, 538, 13420, 4, 5800, 293, 456, 311, + 
572, 636, 281, 8114, 490, 309, 370, 309, 576, 312, 665, 700, 281, 312, 1075, 281, + 2893, 257, 665, 3164, 1752, 322, 264, 12891, 457, 286, 519, 436, 500, 380, 360, + 309, 337, 264, 2063, 3288, 293, 436, 500, 380, 458, 577, 281, 293, 3297, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.09170211278475247, "compression_ratio": 1.6783216783216783, + "no_speech_prob": 0.03175082057714462}, {"id": 735, "seek": 323728, "start": 3237.28, + "end": 3253.6800000000003, "text": " Algolia and elastic are making that easier + particularly algolia but there''s still a lot better that it could be made coming + to SAS companies there they''re talking about data that''s private the communications + of the school to the parents are not on the web somewhere they can be indexed by + Google.", "tokens": [50364, 35014, 29760, 293, 17115, 366, 1455, 300, 3571, 4098, + 3501, 29760, 457, 456, 311, 920, 257, 688, 1101, 300, 309, 727, 312, 1027, 1348, + 281, 33441, 3431, 456, 436, 434, 1417, 466, 1412, 300, 311, 4551, 264, 15163, 295, + 264, 1395, 281, 264, 3152, 366, 406, 322, 264, 3670, 4079, 436, 393, 312, 8186, + 292, 538, 3329, 13, 51184], "temperature": 0.0, "avg_logprob": -0.14748193884408603, + "compression_ratio": 1.688976377952756, "no_speech_prob": 0.005614744499325752}, + {"id": 736, "seek": 323728, "start": 3254.1600000000003, "end": 3261.88, "text": + " So I feel like what I''ve noticed in the last few years is that some sort of search + feature is present in most of these products now.", "tokens": [51208, 407, 286, + 841, 411, 437, 286, 600, 5694, 294, 264, 1036, 1326, 924, 307, 300, 512, 1333, 295, + 3164, 4111, 307, 1974, 294, 881, 295, 613, 3383, 586, 13, 51594], "temperature": + 0.0, "avg_logprob": -0.14748193884408603, "compression_ratio": 1.688976377952756, + "no_speech_prob": 0.005614744499325752}, {"id": 737, "seek": 326188, "start": 3261.88, + "end": 3291.48, "text": " But yes it''s usually not tuned maybe not even set up + correctly and it doesn''t work well and 
there''s a lot of room for improvement so + I think these these neural search technologies let you you know really easily improve + the quality easily if you''ve got a set of simple APIs and that''s what we provide + our APIs basically look like elastic or Algolia''s index documents and you never + know there''s a neural network running.", "tokens": [50364, 583, 2086, 309, 311, + 2673, 406, 10870, 1310, 406, 754, 992, 493, 8944, 293, 309, 1177, 380, 589, 731, + 293, 456, 311, 257, 688, 295, 1808, 337, 10444, 370, 286, 519, 613, 613, 18161, + 3164, 7943, 718, 291, 291, 458, 534, 3612, 3470, 264, 3125, 3612, 498, 291, 600, + 658, 257, 992, 295, 2199, 21445, 293, 300, 311, 437, 321, 2893, 527, 21445, 1936, + 574, 411, 17115, 420, 35014, 29760, 311, 8186, 8512, 293, 291, 1128, 458, 456, 311, + 257, 18161, 3209, 2614, 13, 51844], "temperature": 0.0, "avg_logprob": -0.1850170486274807, + "compression_ratio": 1.672, "no_speech_prob": 0.027591120451688766}, {"id": 738, + "seek": 329188, "start": 3291.88, "end": 3300.0, "text": " And the background at + all and it''s not important just the queries go in and the results come out but + these results are far far better than what you would get from a keyword search.", + "tokens": [50364, 400, 264, 3678, 412, 439, 293, 309, 311, 406, 1021, 445, 264, + 24109, 352, 294, 293, 264, 3542, 808, 484, 457, 613, 3542, 366, 1400, 1400, 1101, + 813, 437, 291, 576, 483, 490, 257, 20428, 3164, 13, 50770], "temperature": 0.0, + "avg_logprob": -0.19768850008646646, "compression_ratio": 1.6425855513307985, "no_speech_prob": + 0.009925175458192825}, {"id": 739, "seek": 329188, "start": 3301.0, "end": 3308.88, + "text": " So so I think there''s a lot of scope particularly for SAS companies for + for neural search.", "tokens": [50820, 407, 370, 286, 519, 456, 311, 257, 688, 295, + 11923, 4098, 337, 33441, 3431, 337, 337, 18161, 3164, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.19768850008646646, "compression_ratio": 1.6425855513307985, + 
"no_speech_prob": 0.009925175458192825}, {"id": 740, "seek": 329188, "start": 3309.36, + "end": 3321.36, "text": " Yeah yeah absolutely I actually wanted to ask you just + a question came to my mind I''ve been reading the book about I think about relevant + search it''s called by.", "tokens": [51238, 865, 1338, 3122, 286, 767, 1415, 281, + 1029, 291, 445, 257, 1168, 1361, 281, 452, 1575, 286, 600, 668, 3760, 264, 1446, + 466, 286, 519, 466, 7340, 3164, 309, 311, 1219, 538, 13, 51838], "temperature": + 0.0, "avg_logprob": -0.19768850008646646, "compression_ratio": 1.6425855513307985, + "no_speech_prob": 0.009925175458192825}, {"id": 741, "seek": 332188, "start": 3321.88, + "end": 3329.88, "text": " Doctor and ball and other authors I might be not remembering + exactly but this book you know it goes chapter after chapter wait says.", "tokens": + [50364, 10143, 293, 2594, 293, 661, 16552, 286, 1062, 312, 406, 20719, 2293, 457, + 341, 1446, 291, 458, 309, 1709, 7187, 934, 7187, 1699, 1619, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.22046729974579393, "compression_ratio": 1.4655172413793103, + "no_speech_prob": 0.004405877087265253}, {"id": 742, "seek": 332188, "start": 3330.4, + "end": 3338.12, "text": " Okay let''s just take it from the first principles you + have a search to ask you have documents you need to start with like.", "tokens": + [50790, 1033, 718, 311, 445, 747, 309, 490, 264, 700, 9156, 291, 362, 257, 3164, + 281, 1029, 291, 362, 8512, 291, 643, 281, 722, 365, 411, 13, 51176], "temperature": + 0.0, "avg_logprob": -0.22046729974579393, "compression_ratio": 1.4655172413793103, + "no_speech_prob": 0.004405877087265253}, {"id": 743, "seek": 333812, "start": 3338.6, + "end": 3359.04, "text": " tokenization and by the way if you make a mistake that + it will be not findable and then you move one level up and then you start thinking + okay what about the model okay TF IDF BM25 what are the trade of Sunson and so they + teach you to become a search 
engineer and then they proceed to ranking and so on + so forth and my question is like.", "tokens": [50388, 14862, 2144, 293, 538, 264, + 636, 498, 291, 652, 257, 6146, 300, 309, 486, 312, 406, 915, 712, 293, 550, 291, + 1286, 472, 1496, 493, 293, 550, 291, 722, 1953, 1392, 437, 466, 264, 2316, 1392, + 40964, 7348, 37, 15901, 6074, 437, 366, 264, 4923, 295, 6163, 3015, 293, 370, 436, + 2924, 291, 281, 1813, 257, 3164, 11403, 293, 550, 436, 8991, 281, 17833, 293, 370, + 322, 370, 5220, 293, 452, 1168, 307, 411, 13, 51410], "temperature": 0.0, "avg_logprob": + -0.24452042881446548, "compression_ratio": 1.6, "no_speech_prob": 0.14902831614017487}, + {"id": 744, "seek": 335904, "start": 3360.04, "end": 3373.04, "text": " What do + you think is going to be the change in the search engine profession going forward + once neural search will hit the mass market because when I was the search engineer.", + "tokens": [50414, 708, 360, 291, 519, 307, 516, 281, 312, 264, 1319, 294, 264, 3164, + 2848, 7032, 516, 2128, 1564, 18161, 3164, 486, 2045, 264, 2758, 2142, 570, 562, + 286, 390, 264, 3164, 11403, 13, 51064], "temperature": 0.0, "avg_logprob": -0.20458857218424478, + "compression_ratio": 1.6974789915966386, "no_speech_prob": 0.019305700436234474}, + {"id": 745, "seek": 335904, "start": 3374.2799999999997, "end": 3388.52, "text": + " Like I looked at the scene and solar and I I didn''t question much I just went + and like implemented some changes some parsers some plugins or modified the behavior + of some of some algorithm right by extending that class by the way.", "tokens": + [51126, 1743, 286, 2956, 412, 264, 4145, 293, 7936, 293, 286, 286, 994, 380, 1168, + 709, 286, 445, 1437, 293, 411, 12270, 512, 2962, 512, 21156, 433, 512, 33759, 420, + 15873, 264, 5223, 295, 512, 295, 512, 9284, 558, 538, 24360, 300, 1508, 538, 264, + 636, 13, 51838], "temperature": 0.0, "avg_logprob": -0.20458857218424478, "compression_ratio": + 1.6974789915966386, "no_speech_prob": 
0.019305700436234474}, {"id": 746, "seek": + 338904, "start": 3389.04, "end": 3412.04, "text": " The scene was not it was making + a lot of classes final and in Java and so I cannot actually extend them so I had + to copy the entire like package and then and then rename all these classes so there + is no like namespace clash but that''s okay nowhere it''s at some point I was worried + that I will probably reintroduce the scene all the way in my ID because I had to + touch multiple parts.", "tokens": [50364, 440, 4145, 390, 406, 309, 390, 1455, 257, + 688, 295, 5359, 2572, 293, 294, 10745, 293, 370, 286, 2644, 767, 10101, 552, 370, + 286, 632, 281, 5055, 264, 2302, 411, 7372, 293, 550, 293, 550, 36741, 439, 613, + 5359, 370, 456, 307, 572, 411, 5288, 17940, 36508, 457, 300, 311, 1392, 11159, 309, + 311, 412, 512, 935, 286, 390, 5804, 300, 286, 486, 1391, 319, 38132, 384, 264, 4145, + 439, 264, 636, 294, 452, 7348, 570, 286, 632, 281, 2557, 3866, 3166, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.23042189937898483, "compression_ratio": 1.6946902654867257, + "no_speech_prob": 0.011080016382038593}, {"id": 747, "seek": 341204, "start": 3412.04, + "end": 3437.88, "text": " But so I felt like I''m in control more or less right + not because it''s on not not only because it''s open source but because I could + read the code I could talk to people I could read books I could read blogs and I + could experiment myself right and that made me I believe a search engineer in that + company even though the company''s goal was not to build you know searches service + we were building the product.", "tokens": [50364, 583, 370, 286, 2762, 411, 286, + 478, 294, 1969, 544, 420, 1570, 558, 406, 570, 309, 311, 322, 406, 406, 787, 570, + 309, 311, 1269, 4009, 457, 570, 286, 727, 1401, 264, 3089, 286, 727, 751, 281, 561, + 286, 727, 1401, 3642, 286, 727, 1401, 31038, 293, 286, 727, 5120, 2059, 558, 293, + 300, 1027, 385, 286, 1697, 257, 3164, 11403, 294, 300, 2237, 754, 1673, 264, 
2237, + 311, 3387, 390, 406, 281, 1322, 291, 458, 26701, 2643, 321, 645, 2390, 264, 1674, + 13, 51656], "temperature": 0.0, "avg_logprob": -0.15382315895774148, "compression_ratio": + 1.7885462555066078, "no_speech_prob": 0.07129093259572983}, {"id": 748, "seek": + 343788, "start": 3438.2000000000003, "end": 3443.96, "text": " How do you do happen + it thoughts around like how neural search will change the landscape of this job.", + "tokens": [50380, 1012, 360, 291, 360, 1051, 309, 4598, 926, 411, 577, 18161, 3164, + 486, 1319, 264, 9661, 295, 341, 1691, 13, 50668], "temperature": 0.0, "avg_logprob": + -0.2286097764968872, "compression_ratio": 1.625615763546798, "no_speech_prob": 0.0033826478756964207}, + {"id": 749, "seek": 343788, "start": 3447.08, "end": 3449.96, "text": " Well that''s + a that''s an excellent question.", "tokens": [50824, 1042, 300, 311, 257, 300, 311, + 364, 7103, 1168, 13, 50968], "temperature": 0.0, "avg_logprob": -0.2286097764968872, + "compression_ratio": 1.625615763546798, "no_speech_prob": 0.0033826478756964207}, + {"id": 750, "seek": 343788, "start": 3451.6, "end": 3453.8, "text": " Well a few + a few thoughts on that topic.", "tokens": [51050, 1042, 257, 1326, 257, 1326, 4598, + 322, 300, 4829, 13, 51160], "temperature": 0.0, "avg_logprob": -0.2286097764968872, + "compression_ratio": 1.625615763546798, "no_speech_prob": 0.0033826478756964207}, + {"id": 751, "seek": 343788, "start": 3454.92, "end": 3456.52, "text": " Neural search + is going to make it.", "tokens": [51216, 1734, 1807, 3164, 307, 516, 281, 652, 309, + 13, 51296], "temperature": 0.0, "avg_logprob": -0.2286097764968872, "compression_ratio": + 1.625615763546798, "no_speech_prob": 0.0033826478756964207}, {"id": 752, "seek": + 343788, "start": 3458.6800000000003, "end": 3467.12, "text": " Easier it''s going + to require less expertise to put together high quality search experiences and furthermore.", + "tokens": [51404, 46879, 811, 309, 311, 516, 281, 3651, 1570, 11769, 281, 
829, 1214, + 1090, 3125, 3164, 5235, 293, 3052, 3138, 13, 51826], "temperature": 0.0, "avg_logprob": + -0.2286097764968872, "compression_ratio": 1.625615763546798, "no_speech_prob": 0.0033826478756964207}, + {"id": 753, "seek": 346788, "start": 3468.36, "end": 3474.7200000000003, "text": + " The advantage the companies like Google or Microsoft have from click data it''s + still going to be there but it''s going to diminish.", "tokens": [50388, 440, 5002, + 264, 3431, 411, 3329, 420, 8116, 362, 490, 2052, 1412, 309, 311, 920, 516, 281, + 312, 456, 457, 309, 311, 516, 281, 48696, 13, 50706], "temperature": 0.0, "avg_logprob": + -0.19499053955078124, "compression_ratio": 1.50253807106599, "no_speech_prob": 0.00915265642106533}, + {"id": 754, "seek": 346788, "start": 3475.52, "end": 3487.32, "text": " And I think + that''s actually why maybe I''m biased here and misreading it you see a lot of search + engine companies starting up in the last year or two you''ve got meva.", "tokens": + [50746, 400, 286, 519, 300, 311, 767, 983, 1310, 286, 478, 28035, 510, 293, 3346, + 35908, 309, 291, 536, 257, 688, 295, 3164, 2848, 3431, 2891, 493, 294, 264, 1036, + 1064, 420, 732, 291, 600, 658, 385, 2757, 13, 51336], "temperature": 0.0, "avg_logprob": + -0.19499053955078124, "compression_ratio": 1.50253807106599, "no_speech_prob": 0.00915265642106533}, + {"id": 755, "seek": 348732, "start": 3487.88, "end": 3494.84, "text": " Kagi I think + the head of sales force research has started his own engine I''ve even heard some + rumors you don''t.", "tokens": [50392, 591, 20291, 286, 519, 264, 1378, 295, 5763, + 3464, 2132, 575, 1409, 702, 1065, 2848, 286, 600, 754, 2198, 512, 21201, 291, 500, + 380, 13, 50740], "temperature": 0.0, "avg_logprob": -0.405896199213994, "compression_ratio": + 1.599078341013825, "no_speech_prob": 0.07359088957309723}, {"id": 756, "seek": 348732, + "start": 3497.6400000000003, "end": 3513.0800000000004, "text": " Right right I + heard some movie Apple Richard so 
yeah exactly so to maybe some rumors Apple might + be trying to do something like that and it''s it''s basically because the amount + of effort it takes now I think has gone down significantly.", "tokens": [50880, + 1779, 558, 286, 2198, 512, 3169, 6373, 9809, 370, 1338, 2293, 370, 281, 1310, 512, + 21201, 6373, 1062, 312, 1382, 281, 360, 746, 411, 300, 293, 309, 311, 309, 311, + 1936, 570, 264, 2372, 295, 4630, 309, 2516, 586, 286, 519, 575, 2780, 760, 10591, + 13, 51652], "temperature": 0.0, "avg_logprob": -0.405896199213994, "compression_ratio": + 1.599078341013825, "no_speech_prob": 0.07359088957309723}, {"id": 757, "seek": 351308, + "start": 3513.96, "end": 3519.96, "text": " So I think that that''s going to be + one of the effects of neural.", "tokens": [50408, 407, 286, 519, 300, 300, 311, + 516, 281, 312, 472, 295, 264, 5065, 295, 18161, 13, 50708], "temperature": 0.0, + "avg_logprob": -0.39067249913369456, "compression_ratio": 1.5791666666666666, "no_speech_prob": + 0.016968876123428345}, {"id": 758, "seek": 351308, "start": 3521.08, "end": 3528.6, + "text": " And I also expect it just like you know a losing has been around for a + long time I mean maybe the early 2000''s 2000.", "tokens": [50764, 400, 286, 611, + 2066, 309, 445, 411, 291, 458, 257, 7027, 575, 668, 926, 337, 257, 938, 565, 286, + 914, 1310, 264, 2440, 8132, 311, 8132, 13, 51140], "temperature": 0.0, "avg_logprob": + -0.39067249913369456, "compression_ratio": 1.5791666666666666, "no_speech_prob": + 0.016968876123428345}, {"id": 759, "seek": 351308, "start": 3529.4, "end": 3541.24, + "text": " 99 to 9 I think when that cutting started learning Java and as a side + product project he decided to implement Lucene and so he started the whole community + and then Hadoop followed and so on so far.", "tokens": [51180, 11803, 281, 1722, + 286, 519, 562, 300, 6492, 1409, 2539, 10745, 293, 382, 257, 1252, 1674, 1716, 415, + 3047, 281, 4445, 9593, 1450, 293, 370, 415, 1409, 264, 1379, 1768, 293, 550, 
389, + 1573, 404, 6263, 293, 370, 322, 370, 1400, 13, 51772], "temperature": 0.0, "avg_logprob": + -0.39067249913369456, "compression_ratio": 1.5791666666666666, "no_speech_prob": + 0.016968876123428345}, {"id": 760, "seek": 354124, "start": 3541.3199999999997, + "end": 3548.12, "text": " Yeah okay because I yeah I remember from a time ago so + I think that in the same way there will be an open source.", "tokens": [50368, 865, + 1392, 570, 286, 1338, 286, 1604, 490, 257, 565, 2057, 370, 286, 519, 300, 294, 264, + 912, 636, 456, 486, 312, 364, 1269, 4009, 13, 50708], "temperature": 0.0, "avg_logprob": + -0.19038545763170397, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.0064502740278840065}, {"id": 761, "seek": 354124, "start": 3549.08, "end": 3557.3999999999996, + "text": " Neural thing it might come under the cover of Lucene or it might be a + separate Apache project and and eventually it''s going to be the go to solution.", + "tokens": [50756, 1734, 1807, 551, 309, 1062, 808, 833, 264, 2060, 295, 9593, 1450, + 420, 309, 1062, 312, 257, 4994, 46597, 1716, 293, 293, 4728, 309, 311, 516, 281, + 312, 264, 352, 281, 3827, 13, 51172], "temperature": 0.0, "avg_logprob": -0.19038545763170397, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0064502740278840065}, + {"id": 762, "seek": 354124, "start": 3558.52, "end": 3568.9199999999996, "text": + " So what companies like mine are doing right now is you know this technology is + still pretty new and we''re feeling in the gap and we''re also providing like a + completely hosted solution which has some some value on its own.", "tokens": [51228, + 407, 437, 3431, 411, 3892, 366, 884, 558, 586, 307, 291, 458, 341, 2899, 307, 920, + 1238, 777, 293, 321, 434, 2633, 294, 264, 7417, 293, 321, 434, 611, 6530, 411, 257, + 2584, 19204, 3827, 597, 575, 512, 512, 2158, 322, 1080, 1065, 13, 51748], "temperature": + 0.0, "avg_logprob": -0.19038545763170397, "compression_ratio": 1.6666666666666667, + 
"no_speech_prob": 0.0064502740278840065}, {"id": 763, "seek": 356892, "start": 3569.88, + "end": 3578.28, "text": " But I think longer term that''s why I see things headed + because you know we''re getting into these very good general performance neural + networks.", "tokens": [50412, 583, 286, 519, 2854, 1433, 300, 311, 983, 286, 536, + 721, 12798, 570, 291, 458, 321, 434, 1242, 666, 613, 588, 665, 2674, 3389, 18161, + 9590, 13, 50832], "temperature": 0.0, "avg_logprob": -0.1267379685944202, "compression_ratio": + 1.686131386861314, "no_speech_prob": 0.004201448056846857}, {"id": 764, "seek": + 356892, "start": 3580.2000000000003, "end": 3589.2400000000002, "text": " Systems + like Bert that can just perform well on a wide range of tasks and then you have + like you know t5 and now mt5 and you can go across like 100 different languages + as well.", "tokens": [50928, 27059, 411, 29594, 300, 393, 445, 2042, 731, 322, 257, + 4874, 3613, 295, 9608, 293, 550, 291, 362, 411, 291, 458, 256, 20, 293, 586, 275, + 83, 20, 293, 291, 393, 352, 2108, 411, 2319, 819, 8650, 382, 731, 13, 51380], "temperature": + 0.0, "avg_logprob": -0.1267379685944202, "compression_ratio": 1.686131386861314, + "no_speech_prob": 0.004201448056846857}, {"id": 765, "seek": 356892, "start": 3590.36, + "end": 3596.44, "text": " So there will eventually be models that are good enough + and someone''s going to take the effort to distill them into something that runs + well.", "tokens": [51436, 407, 456, 486, 4728, 312, 5245, 300, 366, 665, 1547, 293, + 1580, 311, 516, 281, 747, 264, 4630, 281, 42923, 552, 666, 746, 300, 6676, 731, + 13, 51740], "temperature": 0.0, "avg_logprob": -0.1267379685944202, "compression_ratio": + 1.686131386861314, "no_speech_prob": 0.004201448056846857}, {"id": 766, "seek": + 359644, "start": 3597.4, "end": 3603.88, "text": " And and you know anybody in any + organization will be able to to download and use it the way to use Lucene today.", + "tokens": [50412, 400, 293, 291, 
458, 4472, 294, 604, 4475, 486, 312, 1075, 281, + 281, 5484, 293, 764, 309, 264, 636, 281, 764, 9593, 1450, 965, 13, 50736], "temperature": + 0.0, "avg_logprob": -0.15026896794637043, "compression_ratio": 1.6929824561403508, + "no_speech_prob": 0.01676947996020317}, {"id": 767, "seek": 359644, "start": 3603.88, + "end": 3608.36, "text": " I think that''s where things will be but it might be it + might be five years before we reach that point.", "tokens": [50736, 286, 519, 300, + 311, 689, 721, 486, 312, 457, 309, 1062, 312, 309, 1062, 312, 1732, 924, 949, 321, + 2524, 300, 935, 13, 50960], "temperature": 0.0, "avg_logprob": -0.15026896794637043, + "compression_ratio": 1.6929824561403508, "no_speech_prob": 0.01676947996020317}, + {"id": 768, "seek": 359644, "start": 3609.16, "end": 3619.88, "text": " Yeah yeah + and I mean to take this thought forward from here like like maybe the profession + do you think the profession will change in such a way that instead of tweaking.", + "tokens": [51000, 865, 1338, 293, 286, 914, 281, 747, 341, 1194, 2128, 490, 510, + 411, 411, 1310, 264, 7032, 360, 291, 519, 264, 7032, 486, 1319, 294, 1270, 257, + 636, 300, 2602, 295, 6986, 2456, 13, 51536], "temperature": 0.0, "avg_logprob": + -0.15026896794637043, "compression_ratio": 1.6929824561403508, "no_speech_prob": + 0.01676947996020317}, {"id": 769, "seek": 361988, "start": 3620.44, "end": 3628.84, + "text": " The index configuration to make your search kind of work better like increase + recall and you know not suffer from decreased precision.", "tokens": [50392, 440, + 8186, 11694, 281, 652, 428, 3164, 733, 295, 589, 1101, 411, 3488, 9901, 293, 291, + 458, 406, 9753, 490, 24436, 18356, 13, 50812], "temperature": 0.0, "avg_logprob": + -0.2006618037368312, "compression_ratio": 1.5561224489795917, "no_speech_prob": + 0.0060213725082576275}, {"id": 770, "seek": 361988, "start": 3629.8, "end": 3639.2400000000002, + "text": " You will move more like into okay here is the problem and 
this of the + shelf network doesn''t work I have to fine tune it so you become a little bit more + like a researcher.", "tokens": [50860, 509, 486, 1286, 544, 411, 666, 1392, 510, + 307, 264, 1154, 293, 341, 295, 264, 15222, 3209, 1177, 380, 589, 286, 362, 281, + 2489, 10864, 309, 370, 291, 1813, 257, 707, 857, 544, 411, 257, 21751, 13, 51332], + "temperature": 0.0, "avg_logprob": -0.2006618037368312, "compression_ratio": 1.5561224489795917, + "no_speech_prob": 0.0060213725082576275}, {"id": 771, "seek": 363924, "start": 3639.9599999999996, + "end": 3666.9199999999996, "text": " Yeah so that''s an excellent point I think + one of the key components in these systems and that we have not built yet in our + system but it''s in the it''s in the blueprints is some kind of a feedback mechanism + you''ll notice this in Kendra though for instance thumbs up thumbs down on the results + for instance where you indicate what''s good and what''s bad and then even with + a small amount of that data you can start to train a re-ranker.", "tokens": [50400, + 865, 370, 300, 311, 364, 7103, 935, 286, 519, 472, 295, 264, 2141, 6677, 294, 613, + 3652, 293, 300, 321, 362, 406, 3094, 1939, 294, 527, 1185, 457, 309, 311, 294, 264, + 309, 311, 294, 264, 888, 23547, 47523, 307, 512, 733, 295, 257, 5824, 7513, 291, + 603, 3449, 341, 294, 20891, 424, 1673, 337, 5197, 8838, 493, 8838, 760, 322, 264, + 3542, 337, 5197, 689, 291, 13330, 437, 311, 665, 293, 437, 311, 1578, 293, 550, + 754, 365, 257, 1359, 2372, 295, 300, 1412, 291, 393, 722, 281, 3847, 257, 319, 12, + 20479, 260, 13, 51748], "temperature": 0.0, "avg_logprob": -0.1648145294189453, + "compression_ratio": 1.7182539682539681, "no_speech_prob": 0.011909730732440948}, + {"id": 772, "seek": 366692, "start": 3667.48, "end": 3677.16, "text": " And I think + that in the presence of like the volumes of data that you get on an internal application + let''s say you''re going to get a few thousand items of feedback.", "tokens": [50392, + 400, 
286, 519, 300, 294, 264, 6814, 295, 411, 264, 22219, 295, 1412, 300, 291, 483, + 322, 364, 6920, 3861, 718, 311, 584, 291, 434, 516, 281, 483, 257, 1326, 4714, 4754, + 295, 5824, 13, 50876], "temperature": 0.0, "avg_logprob": -0.16178493089573356, + "compression_ratio": 1.657370517928287, "no_speech_prob": 0.003004827070981264}, + {"id": 773, "seek": 366692, "start": 3677.8, "end": 3691.56, "text": " Training + a re-ranker is probably the most effective thing that you can do that data whether + it''s a random for a free rank or you take a cross attention on your network and + you fine tune it but you can significantly improve the search quality that way.", + "tokens": [50908, 20620, 257, 319, 12, 20479, 260, 307, 1391, 264, 881, 4942, 551, + 300, 291, 393, 360, 300, 1412, 1968, 309, 311, 257, 4974, 337, 257, 1737, 6181, + 420, 291, 747, 257, 3278, 3202, 322, 428, 3209, 293, 291, 2489, 10864, 309, 457, + 291, 393, 10591, 3470, 264, 3164, 3125, 300, 636, 13, 51596], "temperature": 0.0, + "avg_logprob": -0.16178493089573356, "compression_ratio": 1.657370517928287, "no_speech_prob": + 0.003004827070981264}, {"id": 774, "seek": 369156, "start": 3691.56, "end": 3712.92, + "text": " So so so I think that the the machinery for doing all of that can also + be part of the open source offering because because it''s it''s very broadly applicable + and can be used by basically anyone because like you say this is the problem that + that then comes up is like I want to give feedback on this results so the system + can improve itself.", "tokens": [50364, 407, 370, 370, 286, 519, 300, 264, 264, + 27302, 337, 884, 439, 295, 300, 393, 611, 312, 644, 295, 264, 1269, 4009, 8745, + 570, 570, 309, 311, 309, 311, 588, 19511, 21142, 293, 393, 312, 1143, 538, 1936, + 2878, 570, 411, 291, 584, 341, 307, 264, 1154, 300, 300, 550, 1487, 493, 307, 411, + 286, 528, 281, 976, 5824, 322, 341, 3542, 370, 264, 1185, 393, 3470, 2564, 13, 51432], + "temperature": 0.0, "avg_logprob": 
-0.16626375015467815, "compression_ratio": 1.6699507389162562, + "no_speech_prob": 0.025836151093244553}, {"id": 775, "seek": 371292, "start": 3713.7200000000003, + "end": 3728.12, "text": " Yeah yeah absolutely so you kind of create the flywheel + of success right so that you you bring the data back and then the model retrain + and so on so forth but there is also there are also like interesting challenges + like in your old network like catastrophic forgetting.", "tokens": [50404, 865, + 1338, 3122, 370, 291, 733, 295, 1884, 264, 3603, 22830, 295, 2245, 558, 370, 300, + 291, 291, 1565, 264, 1412, 646, 293, 550, 264, 2316, 1533, 7146, 293, 370, 322, + 370, 5220, 457, 456, 307, 611, 456, 366, 611, 411, 1880, 4759, 411, 294, 428, 1331, + 3209, 411, 34915, 25428, 13, 51124], "temperature": 0.0, "avg_logprob": -0.14932635522657825, + "compression_ratio": 1.7480314960629921, "no_speech_prob": 0.15311677753925323}, + {"id": 776, "seek": 371292, "start": 3728.6800000000003, "end": 3740.92, "text": + " Like is this something that you''ve been thinking maybe back at Google or now + with your clients something that kind of you need to keep innovating or solve it + some other way.", "tokens": [51152, 1743, 307, 341, 746, 300, 291, 600, 668, 1953, + 1310, 646, 412, 3329, 420, 586, 365, 428, 6982, 746, 300, 733, 295, 291, 643, 281, + 1066, 5083, 990, 420, 5039, 309, 512, 661, 636, 13, 51764], "temperature": 0.0, + "avg_logprob": -0.14932635522657825, "compression_ratio": 1.7480314960629921, "no_speech_prob": + 0.15311677753925323}, {"id": 777, "seek": 374092, "start": 3740.92, "end": 3770.84, + "text": " Yeah so I am familiar with the concept of catastrophic forgetting I honestly + haven''t studied it very much in the context of of these large language models like + Bert although in general the approach of you know taking a Bert type model and fine + tuning seems seems to be working well but but then you''re essentially talking about + taking after has been fine tuned on one 
task and then fine tuning for different + task and it''s going to be a great deal.", "tokens": [50400, 865, 370, 286, 669, + 4963, 365, 264, 3410, 295, 34915, 25428, 286, 6095, 2378, 380, 9454, 309, 588, 709, + 294, 264, 4319, 295, 295, 613, 2416, 2856, 5245, 411, 29594, 4878, 294, 2674, 264, + 3109, 295, 291, 458, 1940, 257, 29594, 2010, 2316, 293, 2489, 15164, 2544, 2544, + 281, 312, 1364, 731, 457, 457, 550, 291, 434, 4476, 1417, 466, 1940, 934, 575, 668, + 2489, 10870, 322, 472, 5633, 293, 550, 2489, 15164, 337, 819, 5633, 293, 309, 311, + 516, 281, 312, 257, 869, 2028, 13, 51860], "temperature": 0.0, "avg_logprob": -0.3113837510012509, + "compression_ratio": 1.7192307692307693, "no_speech_prob": 0.007751050870865583}, + {"id": 778, "seek": 377092, "start": 3770.92, "end": 3781.28, "text": " Because + I do think it''s going to get solved because it increases its abilities on the first + task. And yeah I guess I don''t know how much of an issue that''s that''s going + to be in the context as information retrieval.", "tokens": [50364, 1436, 286, 360, + 519, 309, 311, 516, 281, 483, 13041, 570, 309, 8637, 1080, 11582, 322, 264, 700, + 5633, 13, 400, 1338, 286, 2041, 286, 500, 380, 458, 577, 709, 295, 364, 2734, 300, + 311, 300, 311, 516, 281, 312, 294, 264, 4319, 382, 1589, 19817, 3337, 13, 50882], + "temperature": 0.8, "avg_logprob": -0.6311803098584785, "compression_ratio": 1.7327044025157232, + "no_speech_prob": 0.011484101414680481}, {"id": 779, "seek": 377092, "start": 3781.64, + "end": 3796.28, "text": " Yeah I mean another thing like if you are familiar with + learning to rank for example, which may or may not involve in your own network it + may also be based on decision tree like lambda marked for example, you know when + you receive a new batch of clicks or downloads or whatever events you have in the + system and you retrain that model.", "tokens": [50900, 865, 286, 914, 1071, 551, + 411, 498, 291, 366, 4963, 365, 2539, 281, 6181, 337, 1365, 11, 597, 
815, 420, 815, + 406, 9494, 294, 428, 1065, 3209, 309, 815, 611, 312, 2361, 322, 3537, 4230, 411, + 13607, 12658, 337, 1365, 11, 291, 458, 562, 291, 4774, 257, 777, 15245, 295, 18521, + 420, 36553, 420, 2035, 3931, 291, 362, 294, 264, 1185, 293, 291, 1533, 7146, 300, + 2316, 13, 51632], "temperature": 0.8, "avg_logprob": -0.6311803098584785, "compression_ratio": + 1.7327044025157232, "no_speech_prob": 0.011484101414680481}, {"id": 780, "seek": + 379628, "start": 3796.28, "end": 3800.0, "text": " clicks or downloads or whatever + events you have in the system and you", "tokens": [50364, 18521, 420, 36553, 420, + 2035, 3931, 291, 362, 294, 264, 1185, 293, 291, 50550], "temperature": 0.0, "avg_logprob": + -0.2876692842846074, "compression_ratio": 1.7263513513513513, "no_speech_prob": + 0.43136316537857056}, {"id": 781, "seek": 379628, "start": 3800.0, "end": 3804.7200000000003, + "text": " retain that model, it will also forget what it knew about the previous + state,", "tokens": [50550, 18340, 300, 2316, 11, 309, 486, 611, 2870, 437, 309, + 2586, 466, 264, 3894, 1785, 11, 50786], "temperature": 0.0, "avg_logprob": -0.2876692842846074, + "compression_ratio": 1.7263513513513513, "no_speech_prob": 0.43136316537857056}, + {"id": 782, "seek": 379628, "start": 3804.7200000000003, "end": 3810.1600000000003, + "text": " right? 
It''s very natural and it probably is we can associate it with + human life", "tokens": [50786, 558, 30, 467, 311, 588, 3303, 293, 309, 1391, 307, + 321, 393, 14644, 309, 365, 1952, 993, 51058], "temperature": 0.0, "avg_logprob": + -0.2876692842846074, "compression_ratio": 1.7263513513513513, "no_speech_prob": + 0.43136316537857056}, {"id": 783, "seek": 379628, "start": 3810.1600000000003, "end": + 3813.6400000000003, "text": " as well in some sense, although they say the older + you get, the earlier", "tokens": [51058, 382, 731, 294, 512, 2020, 11, 4878, 436, + 584, 264, 4906, 291, 483, 11, 264, 3071, 51232], "temperature": 0.0, "avg_logprob": + -0.2876692842846074, "compression_ratio": 1.7263513513513513, "no_speech_prob": + 0.43136316537857056}, {"id": 784, "seek": 379628, "start": 3813.6400000000003, "end": + 3817.8, "text": " memories you will actually remember, you might forget what happened + yesterday,", "tokens": [51232, 8495, 291, 486, 767, 1604, 11, 291, 1062, 2870, 437, + 2011, 5186, 11, 51440], "temperature": 0.0, "avg_logprob": -0.2876692842846074, + "compression_ratio": 1.7263513513513513, "no_speech_prob": 0.43136316537857056}, + {"id": 785, "seek": 379628, "start": 3817.8, "end": 3821.1200000000003, "text": + " but you remember what happened like 50 years ago. But like, yeah,", "tokens": + [51440, 457, 291, 1604, 437, 2011, 411, 2625, 924, 2057, 13, 583, 411, 11, 1338, + 11, 51606], "temperature": 0.0, "avg_logprob": -0.2876692842846074, "compression_ratio": + 1.7263513513513513, "no_speech_prob": 0.43136316537857056}, {"id": 786, "seek": + 379628, "start": 3821.1200000000003, "end": 3824.0800000000004, "text": " what''s + probably noticing that with myself. 
Yeah, me too, actually,", "tokens": [51606, + 437, 311, 1391, 21814, 300, 365, 2059, 13, 865, 11, 385, 886, 11, 767, 11, 51754], + "temperature": 0.0, "avg_logprob": -0.2876692842846074, "compression_ratio": 1.7263513513513513, + "no_speech_prob": 0.43136316537857056}, {"id": 787, "seek": 382408, "start": 3824.3199999999997, + "end": 3827.84, "text": " because days go by and I''m like, okay, what''s going + on? But then you go,", "tokens": [50376, 570, 1708, 352, 538, 293, 286, 478, 411, + 11, 1392, 11, 437, 311, 516, 322, 30, 583, 550, 291, 352, 11, 50552], "temperature": + 0.0, "avg_logprob": -0.20278196167527585, "compression_ratio": 1.6641509433962265, + "no_speech_prob": 0.01039421558380127}, {"id": 788, "seek": 382408, "start": 3827.84, + "end": 3833.16, "text": " okay, when I was a kid, I remember something. But like + neural networks are", "tokens": [50552, 1392, 11, 562, 286, 390, 257, 1636, 11, + 286, 1604, 746, 13, 583, 411, 18161, 9590, 366, 50818], "temperature": 0.0, "avg_logprob": + -0.20278196167527585, "compression_ratio": 1.6641509433962265, "no_speech_prob": + 0.01039421558380127}, {"id": 789, "seek": 382408, "start": 3833.16, "end": 3836.4, + "text": " probably a little bit different, or at least the present neural networks.", + "tokens": [50818, 1391, 257, 707, 857, 819, 11, 420, 412, 1935, 264, 1974, 18161, + 9590, 13, 50980], "temperature": 0.0, "avg_logprob": -0.20278196167527585, "compression_ratio": + 1.6641509433962265, "no_speech_prob": 0.01039421558380127}, {"id": 790, "seek": + 382408, "start": 3837.7999999999997, "end": 3843.0, "text": " Right. 
So I think + when you when you retrain the model, like you have to", "tokens": [51050, 1779, + 13, 407, 286, 519, 562, 291, 562, 291, 1533, 7146, 264, 2316, 11, 411, 291, 362, + 281, 51310], "temperature": 0.0, "avg_logprob": -0.20278196167527585, "compression_ratio": + 1.6641509433962265, "no_speech_prob": 0.01039421558380127}, {"id": 791, "seek": + 382408, "start": 3843.0, "end": 3846.64, "text": " retrain otherwise, it will drift, + right? I think Google also has a paper", "tokens": [51310, 1533, 7146, 5911, 11, + 309, 486, 19699, 11, 558, 30, 286, 519, 3329, 611, 575, 257, 3035, 51492], "temperature": + 0.0, "avg_logprob": -0.20278196167527585, "compression_ratio": 1.6641509433962265, + "no_speech_prob": 0.01039421558380127}, {"id": 792, "seek": 382408, "start": 3846.64, + "end": 3852.04, "text": " about that, like kind of checking the consistency of your + machine learning", "tokens": [51492, 466, 300, 11, 411, 733, 295, 8568, 264, 14416, + 295, 428, 3479, 2539, 51762], "temperature": 0.0, "avg_logprob": -0.20278196167527585, + "compression_ratio": 1.6641509433962265, "no_speech_prob": 0.01039421558380127}, + {"id": 793, "seek": 385204, "start": 3852.04, "end": 3855.8, "text": " pipeline + and your model. So it doesn''t drift and just explode in the eyes of", "tokens": + [50364, 15517, 293, 428, 2316, 13, 407, 309, 1177, 380, 19699, 293, 445, 21411, + 294, 264, 2575, 295, 50552], "temperature": 0.0, "avg_logprob": -0.22144873529417902, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.019363148137927055}, + {"id": 794, "seek": 385204, "start": 3855.8, "end": 3859.88, "text": " the front + of the eyes of the user, right? 
So you have to keep retraining it.", "tokens": [50552, + 264, 1868, 295, 264, 2575, 295, 264, 4195, 11, 558, 30, 407, 291, 362, 281, 1066, + 49356, 1760, 309, 13, 50756], "temperature": 0.0, "avg_logprob": -0.22144873529417902, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.019363148137927055}, + {"id": 795, "seek": 385204, "start": 3860.64, "end": 3864.64, "text": " But but + then that also means that it will forget things. Maybe they were quite", "tokens": + [50794, 583, 457, 550, 300, 611, 1355, 300, 309, 486, 2870, 721, 13, 2704, 436, + 645, 1596, 50994], "temperature": 0.0, "avg_logprob": -0.22144873529417902, "compression_ratio": + 1.7234848484848484, "no_speech_prob": 0.019363148137927055}, {"id": 796, "seek": + 385204, "start": 3864.64, "end": 3869.64, "text": " important. Maybe they are not + high probability anymore, but they still are", "tokens": [50994, 1021, 13, 2704, + 436, 366, 406, 1090, 8482, 3602, 11, 457, 436, 920, 366, 51244], "temperature": + 0.0, "avg_logprob": -0.22144873529417902, "compression_ratio": 1.7234848484848484, + "no_speech_prob": 0.019363148137927055}, {"id": 797, "seek": 385204, "start": 3869.64, + "end": 3875.08, "text": " true. But the network has forgotten about them. Right, + right, right. Yeah,", "tokens": [51244, 2074, 13, 583, 264, 3209, 575, 11832, 466, + 552, 13, 1779, 11, 558, 11, 558, 13, 865, 11, 51516], "temperature": 0.0, "avg_logprob": + -0.22144873529417902, "compression_ratio": 1.7234848484848484, "no_speech_prob": + 0.019363148137927055}, {"id": 798, "seek": 385204, "start": 3875.08, "end": 3882.0, + "text": " then yeah, that makes sense. Yeah. 
Anyway, it was it was a great talking", + "tokens": [51516, 550, 1338, 11, 300, 1669, 2020, 13, 865, 13, 5684, 11, 309, 390, + 309, 390, 257, 869, 1417, 51862], "temperature": 0.0, "avg_logprob": -0.22144873529417902, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.019363148137927055}, + {"id": 799, "seek": 388200, "start": 3882.0, "end": 3886.36, "text": " to you, but + I still want to close off. And before we go to some announcement for", "tokens": + [50364, 281, 291, 11, 457, 286, 920, 528, 281, 1998, 766, 13, 400, 949, 321, 352, + 281, 512, 12847, 337, 50582], "temperature": 0.0, "avg_logprob": -0.24063308839875508, + "compression_ratio": 1.758364312267658, "no_speech_prob": 0.01530259195715189}, + {"id": 800, "seek": 388200, "start": 3886.36, "end": 3891.16, "text": " you, I''m + thinking like I''m asking this question to different to all guests and", "tokens": + [50582, 291, 11, 286, 478, 1953, 411, 286, 478, 3365, 341, 1168, 281, 819, 281, + 439, 9804, 293, 50822], "temperature": 0.0, "avg_logprob": -0.24063308839875508, + "compression_ratio": 1.758364312267658, "no_speech_prob": 0.01530259195715189}, + {"id": 801, "seek": 388200, "start": 3891.16, "end": 3896.56, "text": " different + guests take it differently. But I really would like to hear your view", "tokens": + [50822, 819, 9804, 747, 309, 7614, 13, 583, 286, 534, 576, 411, 281, 1568, 428, + 1910, 51092], "temperature": 0.0, "avg_logprob": -0.24063308839875508, "compression_ratio": + 1.758364312267658, "no_speech_prob": 0.01530259195715189}, {"id": 802, "seek": 388200, + "start": 3896.96, "end": 3901.72, "text": " on that question of why it''s a little + bit more philosophical. 
Like like in a", "tokens": [51112, 322, 300, 1168, 295, + 983, 309, 311, 257, 707, 857, 544, 25066, 13, 1743, 411, 294, 257, 51350], "temperature": + 0.0, "avg_logprob": -0.24063308839875508, "compression_ratio": 1.758364312267658, + "no_speech_prob": 0.01530259195715189}, {"id": 803, "seek": 388200, "start": 3901.72, + "end": 3906.24, "text": " way, like you had a stable job at Google, a lot of challenge, + a lot of data,", "tokens": [51350, 636, 11, 411, 291, 632, 257, 8351, 1691, 412, + 3329, 11, 257, 688, 295, 3430, 11, 257, 688, 295, 1412, 11, 51576], "temperature": + 0.0, "avg_logprob": -0.24063308839875508, "compression_ratio": 1.758364312267658, + "no_speech_prob": 0.01530259195715189}, {"id": 804, "seek": 388200, "start": 3906.36, + "end": 3911.0, "text": " a lot of user impact. Like as you said, like Autorripe + Live feature was enabled", "tokens": [51582, 257, 688, 295, 4195, 2712, 13, 1743, + 382, 291, 848, 11, 411, 6049, 284, 470, 494, 10385, 4111, 390, 15172, 51814], "temperature": + 0.0, "avg_logprob": -0.24063308839875508, "compression_ratio": 1.758364312267658, + "no_speech_prob": 0.01530259195715189}, {"id": 805, "seek": 391100, "start": 3911.0, + "end": 3916.88, "text": " and to like millions and millions of users. 
So then you + then you decided to go to", "tokens": [50364, 293, 281, 411, 6803, 293, 6803, 295, + 5022, 13, 407, 550, 291, 550, 291, 3047, 281, 352, 281, 50658], "temperature": 0.0, + "avg_logprob": -0.2420102528163365, "compression_ratio": 1.626984126984127, "no_speech_prob": + 0.004987666383385658}, {"id": 806, "seek": 391100, "start": 3916.88, "end": 3922.0, + "text": " to build your startup and that''s a nice nice kind of way to experience + like another", "tokens": [50658, 281, 1322, 428, 18578, 293, 300, 311, 257, 1481, + 1481, 733, 295, 636, 281, 1752, 411, 1071, 50914], "temperature": 0.0, "avg_logprob": + -0.2420102528163365, "compression_ratio": 1.626984126984127, "no_speech_prob": 0.004987666383385658}, + {"id": 807, "seek": 391100, "start": 3922.0, "end": 3927.8, "text": " side of things. + But why specifically neural search? What drives you to to work on it?", "tokens": + [50914, 1252, 295, 721, 13, 583, 983, 4682, 18161, 3164, 30, 708, 11754, 291, 281, + 281, 589, 322, 309, 30, 51204], "temperature": 0.0, "avg_logprob": -0.2420102528163365, + "compression_ratio": 1.626984126984127, "no_speech_prob": 0.004987666383385658}, + {"id": 808, "seek": 391100, "start": 3929.08, "end": 3935.48, "text": " Well, what + attracted me to I was initially attracted very much to the idea of", "tokens": [51268, + 1042, 11, 437, 15912, 385, 281, 286, 390, 9105, 15912, 588, 709, 281, 264, 1558, + 295, 51588], "temperature": 0.0, "avg_logprob": -0.2420102528163365, "compression_ratio": + 1.626984126984127, "no_speech_prob": 0.004987666383385658}, {"id": 809, "seek": + 391100, "start": 3935.48, "end": 3940.36, "text": " automated reasoning. 
And then + of course, that comes if it''s current incarnation,", "tokens": [51588, 18473, 21577, + 13, 400, 550, 295, 1164, 11, 300, 1487, 498, 309, 311, 2190, 49988, 11, 51832], + "temperature": 0.0, "avg_logprob": -0.2420102528163365, "compression_ratio": 1.626984126984127, + "no_speech_prob": 0.004987666383385658}, {"id": 810, "seek": 394036, "start": 3940.36, + "end": 3946.6800000000003, "text": " it''s machine learning. And so I started to + learn about that. And I had this", "tokens": [50364, 309, 311, 3479, 2539, 13, 400, + 370, 286, 1409, 281, 1466, 466, 300, 13, 400, 286, 632, 341, 50680], "temperature": + 0.0, "avg_logprob": -0.24763920542958018, "compression_ratio": 1.5236220472440944, + "no_speech_prob": 0.0061943805776536465}, {"id": 811, "seek": 394036, "start": 3946.6800000000003, + "end": 3951.08, "text": " opportunity to work with the Ray Kurzweil who joined Google, + I think around 2012.", "tokens": [50680, 2650, 281, 589, 365, 264, 10883, 45307, + 826, 388, 567, 6869, 3329, 11, 286, 519, 926, 9125, 13, 50900], "temperature": 0.0, + "avg_logprob": -0.24763920542958018, "compression_ratio": 1.5236220472440944, "no_speech_prob": + 0.0061943805776536465}, {"id": 812, "seek": 394036, "start": 3952.04, "end": 3956.36, + "text": " I knew about him. He''s a very inspirational figure. 
And he was specifically", + "tokens": [50948, 286, 2586, 466, 796, 13, 634, 311, 257, 588, 33554, 2573, 13, + 400, 415, 390, 4682, 51164], "temperature": 0.0, "avg_logprob": -0.24763920542958018, + "compression_ratio": 1.5236220472440944, "no_speech_prob": 0.0061943805776536465}, + {"id": 813, "seek": 394036, "start": 3956.36, "end": 3960.08, "text": " working + on language understanding because he saw that as being very critical", "tokens": + [51164, 1364, 322, 2856, 3701, 570, 415, 1866, 300, 382, 885, 588, 4924, 51350], + "temperature": 0.0, "avg_logprob": -0.24763920542958018, "compression_ratio": 1.5236220472440944, + "no_speech_prob": 0.0061943805776536465}, {"id": 814, "seek": 394036, "start": 3961.08, + "end": 3968.2400000000002, "text": " to advancement in artificial intelligence. + So so you know, then beyond that,", "tokens": [51400, 281, 35764, 294, 11677, 7599, + 13, 407, 370, 291, 458, 11, 550, 4399, 300, 11, 51758], "temperature": 0.0, "avg_logprob": + -0.24763920542958018, "compression_ratio": 1.5236220472440944, "no_speech_prob": + 0.0061943805776536465}, {"id": 815, "seek": 396824, "start": 3968.24, "end": 3972.0, + "text": " I would say those are my broad interests. But then I just worked in this + area", "tokens": [50364, 286, 576, 584, 729, 366, 452, 4152, 8847, 13, 583, 550, + 286, 445, 2732, 294, 341, 1859, 50552], "temperature": 0.0, "avg_logprob": -0.15510095868791854, + "compression_ratio": 1.6135458167330676, "no_speech_prob": 0.00186805403791368}, + {"id": 816, "seek": 396824, "start": 3972.0, "end": 3978.16, "text": " specifically + for eight years. 
And I think I became quite good at what I was doing.", "tokens": + [50552, 4682, 337, 3180, 924, 13, 400, 286, 519, 286, 3062, 1596, 665, 412, 437, + 286, 390, 884, 13, 50860], "temperature": 0.0, "avg_logprob": -0.15510095868791854, + "compression_ratio": 1.6135458167330676, "no_speech_prob": 0.00186805403791368}, + {"id": 817, "seek": 396824, "start": 3978.16, "end": 3983.24, "text": " And then + also saw that what I was doing post 2017 in particular with this", "tokens": [50860, + 400, 550, 611, 1866, 300, 437, 286, 390, 884, 2183, 6591, 294, 1729, 365, 341, 51114], + "temperature": 0.0, "avg_logprob": -0.15510095868791854, "compression_ratio": 1.6135458167330676, + "no_speech_prob": 0.00186805403791368}, {"id": 818, "seek": 396824, "start": 3983.24, + "end": 3991.8799999999997, "text": " neural network based retrieval had a lot of + applicability to products. And you know,", "tokens": [51114, 18161, 3209, 2361, + 19817, 3337, 632, 257, 688, 295, 2580, 2310, 281, 3383, 13, 400, 291, 458, 11, 51546], + "temperature": 0.0, "avg_logprob": -0.15510095868791854, "compression_ratio": 1.6135458167330676, + "no_speech_prob": 0.00186805403791368}, {"id": 819, "seek": 396824, "start": 3991.8799999999997, + "end": 3996.8799999999997, "text": " I think that being in a research team or research + team has a different type of focus.", "tokens": [51546, 286, 519, 300, 885, 294, + 257, 2132, 1469, 420, 2132, 1469, 575, 257, 819, 2010, 295, 1879, 13, 51796], "temperature": + 0.0, "avg_logprob": -0.15510095868791854, "compression_ratio": 1.6135458167330676, + "no_speech_prob": 0.00186805403791368}, {"id": 820, "seek": 399688, "start": 3997.6800000000003, + "end": 4002.0, "text": " There''s a lot of focus on on publishing papers and things, + but not necessarily a lot of", "tokens": [50404, 821, 311, 257, 688, 295, 1879, + 322, 322, 17832, 10577, 293, 721, 11, 457, 406, 4725, 257, 688, 295, 50620], "temperature": + 0.0, "avg_logprob": -0.15346144013485666, "compression_ratio": 
1.644927536231884, + "no_speech_prob": 0.006761382799595594}, {"id": 821, "seek": 399688, "start": 4002.0, + "end": 4008.88, "text": " interest or appetite for building platform. So in that + way, maybe this wasn''t really the right place", "tokens": [50620, 1179, 420, 23996, + 337, 2390, 3663, 13, 407, 294, 300, 636, 11, 1310, 341, 2067, 380, 534, 264, 558, + 1081, 50964], "temperature": 0.0, "avg_logprob": -0.15346144013485666, "compression_ratio": + 1.644927536231884, "no_speech_prob": 0.006761382799595594}, {"id": 822, "seek": + 399688, "start": 4008.88, "end": 4016.0, "text": " to attempt that kind of work. + But to me, I''m an engineer as well. So this is this is very interesting.", "tokens": + [50964, 281, 5217, 300, 733, 295, 589, 13, 583, 281, 385, 11, 286, 478, 364, 11403, + 382, 731, 13, 407, 341, 307, 341, 307, 588, 1880, 13, 51320], "temperature": 0.0, + "avg_logprob": -0.15346144013485666, "compression_ratio": 1.644927536231884, "no_speech_prob": + 0.006761382799595594}, {"id": 823, "seek": 399688, "start": 4017.52, "end": 4020.7200000000003, + "text": " And I''m not sure if I''m answering your question, but that''s some of + my motivation.", "tokens": [51396, 400, 286, 478, 406, 988, 498, 286, 478, 13430, + 428, 1168, 11, 457, 300, 311, 512, 295, 452, 12335, 13, 51556], "temperature": 0.0, + "avg_logprob": -0.15346144013485666, "compression_ratio": 1.644927536231884, "no_speech_prob": + 0.006761382799595594}, {"id": 824, "seek": 399688, "start": 4020.7200000000003, + "end": 4026.6400000000003, "text": " No, you do. I mean, essentially, I''m currently + leading a search team. 
And yeah,", "tokens": [51556, 883, 11, 291, 360, 13, 286, + 914, 11, 4476, 11, 286, 478, 4362, 5775, 257, 3164, 1469, 13, 400, 1338, 11, 51852], + "temperature": 0.0, "avg_logprob": -0.15346144013485666, "compression_ratio": 1.644927536231884, + "no_speech_prob": 0.006761382799595594}, {"id": 825, "seek": 402664, "start": 4026.64, + "end": 4031.2799999999997, "text": " you know, our KPIs is like, okay, how many + papers you published, how many patents you can file.", "tokens": [50364, 291, 458, + 11, 527, 41371, 6802, 307, 411, 11, 1392, 11, 577, 867, 10577, 291, 6572, 11, 577, + 867, 38142, 291, 393, 3991, 13, 50596], "temperature": 0.0, "avg_logprob": -0.13301255702972412, + "compression_ratio": 1.6597222222222223, "no_speech_prob": 0.003590879263356328}, + {"id": 826, "seek": 402664, "start": 4032.72, "end": 4038.24, "text": " But also + when you start thinking, okay, what impact am I making, right? There is not that + much room", "tokens": [50668, 583, 611, 562, 291, 722, 1953, 11, 1392, 11, 437, + 2712, 669, 286, 1455, 11, 558, 30, 821, 307, 406, 300, 709, 1808, 50944], "temperature": + 0.0, "avg_logprob": -0.13301255702972412, "compression_ratio": 1.6597222222222223, + "no_speech_prob": 0.003590879263356328}, {"id": 827, "seek": 402664, "start": 4038.24, + "end": 4044.96, "text": " to think about creating, maybe you can create a vision, + but you might not necessarily tie it", "tokens": [50944, 281, 519, 466, 4084, 11, + 1310, 291, 393, 1884, 257, 5201, 11, 457, 291, 1062, 406, 4725, 7582, 309, 51280], + "temperature": 0.0, "avg_logprob": -0.13301255702972412, "compression_ratio": 1.6597222222222223, + "no_speech_prob": 0.003590879263356328}, {"id": 828, "seek": 402664, "start": 4044.96, + "end": 4050.0, "text": " in back to the day-to-day scenarios of users. 
You have + to be part of engineering, probably,", "tokens": [51280, 294, 646, 281, 264, 786, + 12, 1353, 12, 810, 15077, 295, 5022, 13, 509, 362, 281, 312, 644, 295, 7043, 11, + 1391, 11, 51532], "temperature": 0.0, "avg_logprob": -0.13301255702972412, "compression_ratio": + 1.6597222222222223, "no_speech_prob": 0.003590879263356328}, {"id": 829, "seek": + 402664, "start": 4050.0, "end": 4054.8799999999997, "text": " to start delivering + these things at which point you are no longer a researcher. So it sounds like", + "tokens": [51532, 281, 722, 14666, 613, 721, 412, 597, 935, 291, 366, 572, 2854, + 257, 21751, 13, 407, 309, 3263, 411, 51776], "temperature": 0.0, "avg_logprob": + -0.13301255702972412, "compression_ratio": 1.6597222222222223, "no_speech_prob": + 0.003590879263356328}, {"id": 830, "seek": 405488, "start": 4055.28, "end": 4060.32, + "text": " you managed to combine both of these engineering and research at ZRI.", + "tokens": [50384, 291, 6453, 281, 10432, 1293, 295, 613, 7043, 293, 2132, 412, 1176, + 5577, 13, 50636], "temperature": 0.0, "avg_logprob": -0.1666931254523141, "compression_ratio": + 1.6840148698884758, "no_speech_prob": 0.004649677779525518}, {"id": 831, "seek": + 405488, "start": 4062.32, "end": 4069.84, "text": " Yes, yes. It''s kind of both + of the passions together in one company. 
And if we''re successful,", "tokens": [50736, + 1079, 11, 2086, 13, 467, 311, 733, 295, 1293, 295, 264, 30640, 1214, 294, 472, 2237, + 13, 400, 498, 321, 434, 4406, 11, 51112], "temperature": 0.0, "avg_logprob": -0.1666931254523141, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.004649677779525518}, + {"id": 832, "seek": 405488, "start": 4069.84, "end": 4074.08, "text": " and we can + take it into the future, the research end of the program is something that I''d + really", "tokens": [51112, 293, 321, 393, 747, 309, 666, 264, 2027, 11, 264, 2132, + 917, 295, 264, 1461, 307, 746, 300, 286, 1116, 534, 51324], "temperature": 0.0, + "avg_logprob": -0.1666931254523141, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.004649677779525518}, {"id": 833, "seek": 405488, "start": 4074.08, "end": 4079.52, + "text": " like to ramp up a lot. Since we started, honestly, there''s been more + engineering and less research.", "tokens": [51324, 411, 281, 12428, 493, 257, 688, + 13, 4162, 321, 1409, 11, 6095, 11, 456, 311, 668, 544, 7043, 293, 1570, 2132, 13, + 51596], "temperature": 0.0, "avg_logprob": -0.1666931254523141, "compression_ratio": + 1.6840148698884758, "no_speech_prob": 0.004649677779525518}, {"id": 834, "seek": + 405488, "start": 4080.7200000000003, "end": 4084.1600000000003, "text": " The training, + the neural networks was at the early stage of the company, and then we haven''t", + "tokens": [51656, 440, 3097, 11, 264, 18161, 9590, 390, 412, 264, 2440, 3233, 295, + 264, 2237, 11, 293, 550, 321, 2378, 380, 51828], "temperature": 0.0, "avg_logprob": + -0.1666931254523141, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.004649677779525518}, {"id": 835, "seek": 408416, "start": 4084.16, "end": 4093.04, + "text": " revisited it since then. 
But I think 2022 is going to be, first of all, + it''s going to be a", "tokens": [50364, 20767, 1226, 309, 1670, 550, 13, 583, 286, + 519, 20229, 307, 516, 281, 312, 11, 700, 295, 439, 11, 309, 311, 516, 281, 312, + 257, 50808], "temperature": 0.0, "avg_logprob": -0.1808463136355082, "compression_ratio": + 1.482905982905983, "no_speech_prob": 0.003978653345257044}, {"id": 836, "seek": + 408416, "start": 4093.04, "end": 4099.5199999999995, "text": " big year for this + industry. Beyond Pinecon getting funding, I was recently looking Gina AI,", "tokens": + [50808, 955, 1064, 337, 341, 3518, 13, 19707, 33531, 1671, 1242, 6137, 11, 286, + 390, 3938, 1237, 34711, 7318, 11, 51132], "temperature": 0.0, "avg_logprob": -0.1808463136355082, + "compression_ratio": 1.482905982905983, "no_speech_prob": 0.003978653345257044}, + {"id": 837, "seek": 408416, "start": 4099.5199999999995, "end": 4104.72, "text": + " if you''re familiar with them. They, I think raised $30 million, it was in TechCrunch.", + "tokens": [51132, 498, 291, 434, 4963, 365, 552, 13, 814, 11, 286, 519, 6005, 1848, + 3446, 2459, 11, 309, 390, 294, 13795, 38750, 1680, 13, 51392], "temperature": 0.0, + "avg_logprob": -0.1808463136355082, "compression_ratio": 1.482905982905983, "no_speech_prob": + 0.003978653345257044}, {"id": 838, "seek": 408416, "start": 4105.5199999999995, + "end": 4112.24, "text": " So the industry is starting to get some notice. 
And for + us as well, we expect,", "tokens": [51432, 407, 264, 3518, 307, 2891, 281, 483, + 512, 3449, 13, 400, 337, 505, 382, 731, 11, 321, 2066, 11, 51768], "temperature": + 0.0, "avg_logprob": -0.1808463136355082, "compression_ratio": 1.482905982905983, + "no_speech_prob": 0.003978653345257044}, {"id": 839, "seek": 411224, "start": 4113.2, + "end": 4116.08, "text": " we expect to really expand in 2022.", "tokens": [50412, + 321, 2066, 281, 534, 5268, 294, 20229, 13, 50556], "temperature": 0.0, "avg_logprob": + -0.1880799461813534, "compression_ratio": 1.683794466403162, "no_speech_prob": 0.014929133467376232}, + {"id": 840, "seek": 411224, "start": 4116.08, "end": 4123.12, "text": " Oh, yeah, + fantastic. And I mean, one manager that I worked with used to say that you need + to first", "tokens": [50556, 876, 11, 1338, 11, 5456, 13, 400, 286, 914, 11, 472, + 6598, 300, 286, 2732, 365, 1143, 281, 584, 300, 291, 643, 281, 700, 50908], "temperature": + 0.0, "avg_logprob": -0.1880799461813534, "compression_ratio": 1.683794466403162, + "no_speech_prob": 0.014929133467376232}, {"id": 841, "seek": 411224, "start": 4123.12, + "end": 4128.88, "text": " build the plumbing. And that''s your engineering work. + Once you have the plumbing, you can stand on", "tokens": [50908, 1322, 264, 39993, + 13, 400, 300, 311, 428, 7043, 589, 13, 3443, 291, 362, 264, 39993, 11, 291, 393, + 1463, 322, 51196], "temperature": 0.0, "avg_logprob": -0.1880799461813534, "compression_ratio": + 1.683794466403162, "no_speech_prob": 0.014929133467376232}, {"id": 842, "seek": + 411224, "start": 4128.88, "end": 4133.76, "text": " it and actually fix some other + things, high level. 
And that''s where you will probably come back to", "tokens": + [51196, 309, 293, 767, 3191, 512, 661, 721, 11, 1090, 1496, 13, 400, 300, 311, 689, + 291, 486, 1391, 808, 646, 281, 51440], "temperature": 0.0, "avg_logprob": -0.1880799461813534, + "compression_ratio": 1.683794466403162, "no_speech_prob": 0.014929133467376232}, + {"id": 843, "seek": 411224, "start": 4134.32, "end": 4138.719999999999, "text": + " training neural networks and actually nailing that use case for your customers. + Sounds really", "tokens": [51468, 3097, 18161, 9590, 293, 767, 10173, 278, 300, + 764, 1389, 337, 428, 4581, 13, 14576, 534, 51688], "temperature": 0.0, "avg_logprob": + -0.1880799461813534, "compression_ratio": 1.683794466403162, "no_speech_prob": 0.014929133467376232}, + {"id": 844, "seek": 413872, "start": 4138.72, "end": 4146.88, "text": " exciting. + This was really packed and so much thinking that you brought in. And also some", + "tokens": [50364, 4670, 13, 639, 390, 534, 13265, 293, 370, 709, 1953, 300, 291, + 3038, 294, 13, 400, 611, 512, 50772], "temperature": 0.0, "avg_logprob": -0.13732930525992681, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.007298737298697233}, + {"id": 845, "seek": 413872, "start": 4146.88, "end": 4151.52, "text": " discoveries + during this conversation. I really enjoyed it. 
I''m just thinking, is there something", + "tokens": [50772, 28400, 1830, 341, 3761, 13, 286, 534, 4626, 309, 13, 286, 478, + 445, 1953, 11, 307, 456, 746, 51004], "temperature": 0.0, "avg_logprob": -0.13732930525992681, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.007298737298697233}, + {"id": 846, "seek": 413872, "start": 4151.52, "end": 4155.6, "text": " you would + like to announce from your product side, something that our listeners can try?", + "tokens": [51004, 291, 576, 411, 281, 7478, 490, 428, 1674, 1252, 11, 746, 300, + 527, 23274, 393, 853, 30, 51208], "temperature": 0.0, "avg_logprob": -0.13732930525992681, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.007298737298697233}, + {"id": 847, "seek": 413872, "start": 4157.6, "end": 4161.52, "text": " Well, thank + you. Thank you for the opportunity. I think what I would say is that if", "tokens": + [51308, 1042, 11, 1309, 291, 13, 1044, 291, 337, 264, 2650, 13, 286, 519, 437, 286, + 576, 584, 307, 300, 498, 51504], "temperature": 0.0, "avg_logprob": -0.13732930525992681, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.007298737298697233}, + {"id": 848, "seek": 413872, "start": 4162.16, "end": 4166.320000000001, "text": + " what we''ve been talking about is interesting and if someone would like to try + it out,", "tokens": [51536, 437, 321, 600, 668, 1417, 466, 307, 1880, 293, 498, + 1580, 576, 411, 281, 853, 309, 484, 11, 51744], "temperature": 0.0, "avg_logprob": + -0.13732930525992681, "compression_ratio": 1.6818181818181819, "no_speech_prob": + 0.007298737298697233}, {"id": 849, "seek": 416632, "start": 4167.28, "end": 4173.12, + "text": " then we''ve created a special promo code. We''re currently in a closed + beta. 
So we''re accepting", "tokens": [50412, 550, 321, 600, 2942, 257, 2121, 26750, + 3089, 13, 492, 434, 4362, 294, 257, 5395, 9861, 13, 407, 321, 434, 17391, 50704], + "temperature": 0.0, "avg_logprob": -0.136667534856513, "compression_ratio": 1.583673469387755, + "no_speech_prob": 0.008405245840549469}, {"id": 850, "seek": 416632, "start": 4173.12, + "end": 4178.16, "text": " customers, but kind of on a case by case basis. But we''ve + created a promo code for listeners of this", "tokens": [50704, 4581, 11, 457, 733, + 295, 322, 257, 1389, 538, 1389, 5143, 13, 583, 321, 600, 2942, 257, 26750, 3089, + 337, 23274, 295, 341, 50956], "temperature": 0.0, "avg_logprob": -0.136667534856513, + "compression_ratio": 1.583673469387755, "no_speech_prob": 0.008405245840549469}, + {"id": 851, "seek": 416632, "start": 4178.16, "end": 4183.12, "text": " podcast. + I think I''m going to, I''m sure the exact code with you. And then you can post + it in the", "tokens": [50956, 7367, 13, 286, 519, 286, 478, 516, 281, 11, 286, 478, + 988, 264, 1900, 3089, 365, 291, 13, 400, 550, 291, 393, 2183, 309, 294, 264, 51204], + "temperature": 0.0, "avg_logprob": -0.136667534856513, "compression_ratio": 1.583673469387755, + "no_speech_prob": 0.008405245840549469}, {"id": 852, "seek": 416632, "start": 4183.12, + "end": 4191.599999999999, "text": " comments to the video. But essentially would + give, give you a 50 megabyte account, which is much", "tokens": [51204, 3053, 281, + 264, 960, 13, 583, 4476, 576, 976, 11, 976, 291, 257, 2625, 10816, 34529, 2696, + 11, 597, 307, 709, 51628], "temperature": 0.0, "avg_logprob": -0.136667534856513, + "compression_ratio": 1.583673469387755, "no_speech_prob": 0.008405245840549469}, + {"id": 853, "seek": 419160, "start": 4191.68, "end": 4198.56, "text": " larger than + our standard trial account by about a factor of three for one month. 
If you want + to just", "tokens": [50368, 4833, 813, 527, 3832, 7308, 2696, 538, 466, 257, 5952, + 295, 1045, 337, 472, 1618, 13, 759, 291, 528, 281, 445, 50712], "temperature": 0.0, + "avg_logprob": -0.12828274242213514, "compression_ratio": 1.6442953020134228, "no_speech_prob": + 0.01291816495358944}, {"id": 854, "seek": 419160, "start": 4198.56, "end": 4202.96, + "text": " try out the system that we''ve been talking about. This is fantastic. + Thank you, Amin, for this", "tokens": [50712, 853, 484, 264, 1185, 300, 321, 600, + 668, 1417, 466, 13, 639, 307, 5456, 13, 1044, 291, 11, 2012, 259, 11, 337, 341, + 50932], "temperature": 0.0, "avg_logprob": -0.12828274242213514, "compression_ratio": + 1.6442953020134228, "no_speech_prob": 0.01291816495358944}, {"id": 855, "seek": + 419160, "start": 4202.96, "end": 4209.76, "text": " opportunity. I''m sure some + listeners will take this into use and build some cool vector search", "tokens": + [50932, 2650, 13, 286, 478, 988, 512, 23274, 486, 747, 341, 666, 764, 293, 1322, + 512, 1627, 8062, 3164, 51272], "temperature": 0.0, "avg_logprob": -0.12828274242213514, + "compression_ratio": 1.6442953020134228, "no_speech_prob": 0.01291816495358944}, + {"id": 856, "seek": 419160, "start": 4209.76, "end": 4215.84, "text": " applications + for their products. That would be great. Yeah, it was a pleasure to talk to you. + I hope", "tokens": [51272, 5821, 337, 641, 3383, 13, 663, 576, 312, 869, 13, 865, + 11, 309, 390, 257, 6834, 281, 751, 281, 291, 13, 286, 1454, 51576], "temperature": + 0.0, "avg_logprob": -0.12828274242213514, "compression_ratio": 1.6442953020134228, + "no_speech_prob": 0.01291816495358944}, {"id": 857, "seek": 419160, "start": 4215.84, + "end": 4220.88, "text": " we can talk at some point down the road as well. And I + wish you all the best in the future. 
In the", "tokens": [51576, 321, 393, 751, 412, + 512, 935, 760, 264, 3060, 382, 731, 13, 400, 286, 3172, 291, 439, 264, 1151, 294, + 264, 2027, 13, 682, 264, 51828], "temperature": 0.0, "avg_logprob": -0.12828274242213514, + "compression_ratio": 1.6442953020134228, "no_speech_prob": 0.01291816495358944}, + {"id": 858, "seek": 422088, "start": 4220.96, "end": 4227.52, "text": " next year + with your ambition and also with reaching to clients and getting contracts. And", + "tokens": [50368, 958, 1064, 365, 428, 22814, 293, 611, 365, 9906, 281, 6982, 293, + 1242, 13952, 13, 400, 50696], "temperature": 0.0, "avg_logprob": -0.22535778724983946, + "compression_ratio": 1.5942857142857143, "no_speech_prob": 0.00485513499006629}, + {"id": 859, "seek": 422088, "start": 4228.4800000000005, "end": 4234.0, "text": + " all the best to you on that front. It was my pleasure to talk to you and hopefully + to see you", "tokens": [50744, 439, 264, 1151, 281, 291, 322, 300, 1868, 13, 467, + 390, 452, 6834, 281, 751, 281, 291, 293, 4696, 281, 536, 291, 51020], "temperature": + 0.0, "avg_logprob": -0.22535778724983946, "compression_ratio": 1.5942857142857143, + "no_speech_prob": 0.00485513499006629}, {"id": 860, "seek": 422088, "start": 4234.0, + "end": 4239.68, "text": " next year as well. Thank you so much. It was good talking + to you too. Thank you, Amin. Bye-bye.", "tokens": [51020, 958, 1064, 382, 731, 13, + 1044, 291, 370, 709, 13, 467, 390, 665, 1417, 281, 291, 886, 13, 1044, 291, 11, + 2012, 259, 13, 4621, 12, 6650, 13, 51304], "temperature": 0.0, "avg_logprob": -0.22535778724983946, + "compression_ratio": 1.5942857142857143, "no_speech_prob": 0.00485513499006629}]' +--- + +Hello, vector podcast is here and today we're going to be talking with Amin Ahmed, co-founder and CEO of the company called ZIR AI. 
I'm really, really excited to talk to Amin because basically he's innovating in this space, his company is innovating in this space of bringing vector search to practice and also making it usable. Hey, Amin, how are you? I'm doing fine. Thank you. Thanks for having me. Awesome.
Thanks for joining. And I know it's almost like festive times, so it's probably quite a packed schedule for you otherwise as well. So yeah, I was thinking let's traditionally start with the introduction.
Like, can you please tell me a bit of your background before ZIR AI, how ZIR AI the startup came about, and your role at ZIR AI? Yes, sure. Me and my co-founder, we started ZIR AI in 2020. Before that, we were both working at Google. I had been there since 2010.
I worked in Google Research, focused on NLP and language understanding with machine learning. Prior to that, I had worked many other places in the industry. So I've been in the industry about 24 or 25 years now.
And around 2017, the team that I was working on in Google Research actually became known for Gmail Smart Reply. If you remember that feature. Yeah, that's an excellent feature. The moment I saw it, it was like, wow, that's fantastic. Yeah. Yeah, and it was impressive.
And I would say maybe it was a very practical application of NLP that went, that was deployed on a very large scale. So that was the research group that I was a part of. It was under Ray Kurzweil, and it was developed in collaboration with some others.
Anyway, around that time, I became very interested in using neural networks for more general purpose information retrieval. And I specifically formulated this as question answering over a large corpus. And at the time, I mean, BERT, when it was released a year later, changed this idea.
But at the time, a lot of people would approach a machine learning problem from scratch. They would take a completely uninitialized neural network and then try to train it.
And when the models get big and deep, mostly you don't have enough data for your task.
And also, you know, that doesn't jibe very well if you think about how humans approach a task.
If you ask me to answer a question or to read a passage from a medical textbook, I may not be a doctor, but my understanding of the English language will allow me to get some of the information content from that passage.
So in the same way, I was thinking that if a neural network is truly understanding language in the way that people do, it should have this property. And it should be possible to train a general purpose neural network that without fine tuning in a specific domain can also work reasonably well.
So I set out to build this thing. And that was my research program in 2017. And we were actually able to launch the first iteration of that model in a product called Google Talk to Books. And I'm saying this to my knowledge, I would love it if someone corrected me in the comments section here.
Google Talk to Books is the first large scale end-to-end demonstration of a neural information retrieval system. So it is a search over a corpus of around 200,000 books from the Google Books corpus. But it's done entirely with vector search. And I'm not aware of anything before that.
So the neural network is very important here. I was not part of the team that conceived this idea and I was not actively working on it. They had a neural network which wasn't producing good enough results.
And we put in this more general purpose question answering neural network and the results dramatically improved. This was basically the first rollout.
But then what I observed over the subsequent years was that I was able to take exactly the same neural network and apply it in at least six different products within Google. And this is what convinced me of the business value of what had been demonstrated here.
This could actually improve metrics in products used by millions of people.
And so this was essentially the genesis of the idea of ZIR AI.
We started, me and my co-founder, in 2020, and the objective is to provide something like Elasticsearch or Algolia, except using the principles of neural information retrieval. So as you know, Elasticsearch and Algolia are based on the BM25 algorithm fundamentally.
So yeah, so that's what we've been doing for the last two years. Yeah, this is fantastic.
I mean, it's fantastic also that you bring your experience from such a large company innovating in search, right? Over to, you know, the rest of the world essentially, right? So I believe your goal is to apply this with as many clients as possible.
And are you focusing mostly on NLP at the moment, natural language processing? Yeah, so well, from a customer's perspective, we provide a text search solution.
Now, one of the beauties of embedding based techniques is that with a neural network, you can go beyond text and you can embed images, video and other types of media into a common embedding space. So that is where this company will eventually go.
But my roots are in NLP and I think that text search by itself is a large area that takes an effort to do well. So that's where we're focused initially. Yeah, that makes total sense.
But as you said, you know, vector search is not kind of constrained by the application as long as you can embed it, right?
And plus all these multimodal scenarios where you can combine, let's say, your camera pointed at something and then you're talking to it and then you can kind of get some textual matches and suggestions, right? So that could be a very rich experience.
Right, right. And that particular application is actually achievable now, even in an all-text platform, if you feed the transcripts in. And these neural network approaches tend to work especially well with natural speech, both as query input.
Because people, when they speak, it's obviously much different than when you're typing keywords in a search box with your keyboard. But then also when searching over natural language text like transcripts. Yeah, absolutely. +And when you say neural networks, you know, some of them, let's say, vector database providers and vendors on the market, they give you sort of this machinery. You can plug in some models. They also have some models available, let's say, from hugging face. +In your case, in case of ZRI, are you innovating in this space of creating this neural networks for your clients? Yes, we are approaching the problem holistically. So we're, you know, the vector database is one critical component of a neural information retrieval system. +But there's other pieces, for instance, like the re-ranking piece or the neural network that produces the embeddings. And all of these need to work in coordination and tandem. Ideally, when they do, you can squeeze a lot more performance out of this system. +So yes, our focus is on, we even handle data ingestion. It's not a big area of focus. But the reality is that you have to make your experiences as easy as possible for widespread adoption, I think. So we allow our customers to just shovel in, you know, PDF documents and all kinds of other formats. +We perform the text extraction. We perform the segmentation of the document. And we actually do the encoding with the neural network, build the vector database and then handle the serving as well. Yeah, so it sounds like an all-around solution. +And I mean, it's very typical, you know, in some sense kind of to bring some algorithm or some idea to the market, but like it doesn't have any connectors. Okay, how do I feed data into it? Or maybe there is like a simple demo. And yeah, nothing beyond that. +But it sounds like you are taking the kind of all-around approach. 
And have you been looking to implement everything yourself or are you also kind of reusing some of the open source pipelines, you know, like for example, for embedding or for document conversions and so on? Yeah, we are using open source as much as we can and where we think it makes sense.
So for instance, for content extraction, there's Apache Tika, which is a very good framework. But then there are certain document types for which there are better alternatives out there. And, you know, we've had certain customers for which PDF extraction, for instance, was a priority.
And we discovered some shortfalls with Tika. We went and we researched and found there are better alternatives out there. And so we've got those implemented. But we didn't write a PDF extractor from scratch, obviously. That's too much for a two-man company to do.
So yeah, we're trying to really combine the best of breed in every area and create a cohesive system that just works out of the box quite well for a broad range of use cases. Oh yeah, that's awesome.
And it's also great to hear that you reuse open source software, you know, at least initially, or maybe until you fine-tune it, so to say. But yeah, I mean, also that's amazing because you can quickly kind of build your product and focus on the goal.
Yeah, and now that we approached this more closely, can you actually describe what is your product today? So as a client, what can I get? What kind of support do you also provide? But first, can you start with the product itself? Yes.
So to describe it abstractly, and then I'll explain very concretely what I mean, I would say that we're a cloud platform as a service for text retrieval or text search. So the way it looks is we have two main APIs, one for indexing content and the other for running queries on the content.
So an organization would come and they would index a large amount of content. They might index periodically or incrementally as well over time.
And this would accrete in an index and then subsequently they would come and they would run generally natural language text queries against that corpus and we would return the best matches. So what we actually provide and how that looks on our platform.
So you essentially, you know, you come and you sign up just the way you would sign up for an AWS account, and you're dropped into an admin console. Everything you can do in the admin console can be done through APIs. We're basically focused on, again, a platform.
So we're accessible through gRPC and REST. The console is basically to allow you to, you know, point and click and quickly experiment and discover the value of the system.
Because our vision was that within 15 to 30 minutes, someone from an organization should be able to come, drop their documents into the system and determine whether or not it's even going to meet their needs.
And then if it does, they can consult the documentation and learn how to use the APIs and get a proper integration going. So we organize collections of documents into what are called corpora. So one corpus is essentially a customer-defined entity.
It groups related documents that they want to search together as a unit. We allow, you know, the customer to define any number of corpora; there are limits depending on the account type. And then you can essentially drag and drop the documents into the web browser, into the corpus upload.
There's about a seven minute latency. And then you can start running queries. And when you run, we have a hosted UI that makes it easy to see the results kind of on the spot in the browser.
But when you run queries through our interface, through our APIs, you also have the ability to run one query against multiple corpora and merge the results.
So we also support the ability, as you're indexing content, to attach metadata that then is returned to you in the search results.
So that would allow you to join to, let's say, another system on your end. But those are some of the features that we provide. Yeah.
So it sounds like it's a self service system, right? And so if I was a client of yours, I could get a subscription, a trial subscription maybe, then upload my document corpus.
How big a corpus could I upload on a trial? Do you have any limitation there at this point? So our general trial has been 15 megabytes of text. And I'll explain what that translates to. I was just working with another customer.
And they had about one gigabyte of PDFs that we put into a corpus. And then that turned out to be about 48 megabytes of text. So the billing is by the actual extracted textual content. So 15 megabytes is actually a decent data set, several hundred documents you can imagine.
But we have plans that go much larger and we have customers that are indexing far more data. Yeah, yeah, sounds great. And then what happens next? So let's say I'm happy. I want to move forward.
Now you said that there are APIs that I can start kind of introducing inside my prototype or my existing back end. Is that right? Yeah, that's right. So we support, primarily we promote, a gRPC interface because it's high performance, low latency. We also do have a REST interface.
We have fully authenticated APIs. So we use OAuth 2.0, that standard. So you would give credentials to your servers and they would use those credentials to establish an authenticated session with the platform and then run queries for you at a very high rate. We scale horizontally.
We can go up to hundreds of QPS, though we haven't had a customer that's needed such a high rate, but we're capable of that. Yeah, yeah.
And you also mentioned that you maintain certain SLA guarantees, like P99 latency. Can you speak a bit about that, and also how much of that accounts for client need versus what you are building for the future?
And this is a good question.
So in terms of client need, we really haven't had any client that's required anything better than 200 milliseconds. Now there's a potential client that we're working with. They're not yet a client.
They're looking for more like 50 to 60 milliseconds because essentially the lookup into our system is only one part of their overall request handling process. So they have a much tighter budget.
In practice, what we're seeing on our platform for our customers today, aggregated over all queries, is a P99 of around 130 milliseconds. Our P50 is about 60 milliseconds. And this has been sufficient for our customers.
For customers that have tighter requirements, we actually have many different ways to address it. So actually the main latency is not from the vector database. The vector database is generally quite fast. It's the neural network that has to do the text encoding. That's the bottleneck.
So we have the ability to set up dedicated pools of encoders, neural networks that do this encoding, for customers. We scale and we're cost efficient by sharing the pool across all customers. But for customers that have very stringent needs, we can set up dedicated pools for them.
But even when you go, let's say, single customer, single node, maybe a GPU node, there is still a theoretical boundary to how fast it can be.
Let's say if I take an off-the-shelf BERT model, and if I throw in 768 dimensions, what's going to happen? How can I fine tune it on the speed side? Yeah, well, let me address two things that you said there.
So the off-the-shelf BERT model is a very common approach for many companies that are trying to productionize NLP. They use it because BERT has phenomenal accuracy. You fine tune it with a little bit of data. And everyone always hits the same problem: it is very difficult to productionize.
And even at a place like Google, they didn't productionize BERT. They had to distill BERT and productionize it.
And distillation requires a lot of expertise. It's out of the reach, I think, of most companies.
So as good as the results look in a staging environment, it's not really practical to productionize that.
And that comes back to the original point, that we tried to make the right choices, where if we were deploying BERT, either it would be enormously expensive for us because we'd have to be using GPU instances or TPU instances, or we would have very high latencies.
So we have a model that produces similar performance, but it runs much faster. It's still transformer-based.
Coming to your second point, I think your main question, your original question, was actually what's the theoretical limit of performance that we can achieve, in terms of, are you asking from a latency perspective? Yeah, latency. So I'll say this.
When it comes to the vector database, you probably know this better than I do. If it's indexed and quantized correctly and all that stuff, even running on CPUs, you can get down to three, four milliseconds of latency.
It depends on so many trade-offs, like how much recall you will sacrifice and other things like that. What are the dimensions of the vector? But I think that we found that to be quite feasible for our system. We don't do 768 dimensions.
Our neural nets produce a little bit less, but still it's comparable. It's not that far off. In terms of the neural network, I would say that transformers are required for proper language understanding.
One of the things I didn't mention about our system was I think that we were basically one of the first teams back in 2017 to incorporate transformers in a production architecture. This was my colleague, Noah Constant.
Previously to being in our team, he was on the original transformer paper. He was in Google Brain at that time doing that research. We wanted to productionize such a model.
Noah basically spent a couple of months, took that research-level code and got it to production quality.
Talk to Books is actually being powered by a very early transformer-based model. We saw an enormous performance jump in our metrics, doing nothing other than switching to transformers. I've never seen such a big jump in any... In our metrics, we were looking at F1. Our F1 jumped from 30% to 38%.
Just by switching to transformers. Not changing the training data or the evaluation objective, just making this one change in the architecture of the neural network. I would consider that an absolute requirement.
I would also say that I'm not very familiar with the economics of GPU scaling because it's generally kind of expensive. Our neural networks are actually designed to run reasonably well on CPUs. There are also these chips: obviously Google's got the TPU, but Amazon has Inferentia.
We're still kind of experimenting with what we can do with latency there.
I think that you can count on about 20 to 30 milliseconds of latency at the low end coming from the encoding process, unless you start moving to GPU or something, and then you might be able to do maybe 5 to 10 milliseconds.
If you put that all together, it seems to me realistically shooting for 30 to 40 milliseconds would be pretty aggressive in terms of what you can get at the lower bound. And maybe for many companies out there, this will be okay.
As long as they don't run a web-scale type of deployment, maybe they can scale per region or per zone or whatever it is that makes sense to them. I think it sounds like 30 to 40 milliseconds could be quite an okay speed. We're talking about latency there.
I think that's a perfectly acceptable speed even for web search or something. That's literally the blink of an eye, 40 milliseconds. I think the other thing to note is that these solutions are very horizontally scalable.
In terms of serving any given throughput, you just scale the neural network encoder pools, and you can replicate the vector database; if using FAISS, for instance, you start up replicas. You can basically get almost unlimited throughput.
It just depends on how much money you have to throw at the problem. So if you need 500 QPS, bring up more hardware. If you need 5000 QPS, you can bring up more hardware and do it. Yeah, absolutely.
I also wanted to tap into what you said, that distilling BERT would be beyond reach for many companies. Can you open up a little bit and also can you share with our audience what you mean by distilling? Maybe some of our subscribers don't know that.
So in a nutshell, and also why do you think that it's so hard to do?
Okay, well, so what distillation of a neural network refers to is taking a very large neural network, a neural network with a lot of parameters, let's say billions of parameters, which is very accurate but cannot reasonably be run on a production workload.
And training a much smaller model that captures as much of the performance of the original model as possible, but fits inside the engineering parameters of your production system. So it's able to, for instance, run an inference within 50 milliseconds.
So the way that distillation normally happens is you use the parent model, called the teacher model, and you do a large scale labeling of data. And essentially the student model, the small model that you're training, needs to learn to make the same predictions.
And interestingly, it gets as much bang for the buck in terms of training from learning to make the correct predictions as it does from learning to, you know, assign probabilities to the incorrect predictions.
So the reason I'm saying that distillation is difficult is that, though there are approaches to it, it's still a fairly open research topic. There's a lot of active research.
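The teacher-student setup described here can be illustrated with a minimal sketch. This is not ZIR AI's implementation; the function names and the temperature value are illustrative, and real pipelines use a cross-entropy variant of this loss over large labeled batches:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The student is trained to reproduce the teacher's whole distribution,
    so it learns as much from the probabilities the teacher assigns to
    incorrect classes as from the correct one.
    """
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    return sum(p * math.log(p / q) for p, q in zip(t, s) if p > 0)

# A student that exactly matches the teacher incurs zero loss...
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # → 0.0
# ...while a student that inverts the teacher's ranking is penalized.
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))
```

The temperature softens both distributions so the teacher's low-probability "incorrect" classes still carry gradient signal, which is the point made in the conversation.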
I haven't looked in the last couple of years; it's possible that there might be frameworks out there now that make this much easier.
But certainly while I was at Google in the 2018, '19, '20 time frame, distillation was generally a topic that was tackled by entire teams working over a quarter or two, at least for the most serious production systems. That's how it used to go. Yeah.
And definitely when it comes to collecting data, as you rightly noted, you know, it's not something you can easily scale unless you have some clever technique for data augmentation.
And even then, like for text, as I was alluding to in previous podcasts, you know, if you have something like "London is the capital of Great Britain", you cannot put any random city there in that specific sentence, right? Right. Right. Right. Yeah, you need to have certain control.
But there are still ways to, for example, use retrieval itself to augment your data set, right? For example, if you need more entities, you can find them through retrieval, maybe even through vector search, by the way. I don't know if somebody experimented with that already.
But there are other techniques, like kind of producing these negative examples, as you alluded to, right? So you need to have as many negative as positive examples so that your model is balanced, right? And that goes to a general model training topic, which is, yeah, to your point. Yes.
And I think that's one of the keys to producing a neural retriever that can outperform BM25 in every workload. No, so it's an excellent point. Yeah. Also, it just reminded me of one challenge that we've been solving in my team actually, earlier, with building a job search engine system.
And you know, when you evaluate the performance, let's say precision, or what we call misrecall, so how frequently it mis-triggers on a query it shouldn't have actually triggered on.
And you know, the basic challenge there is, okay, I have these job queries, which I can mine from certain sources.
But then as negative examples, you can pick everything else, right? But that everything else doesn't actually count, because, just to give you an example, let's say when I say "find full-time job in London", right? So that's just a typical query.
You are really interested to find that slightly negative example, which says, let's say, "opening hours of some office", right? Which is not about job search anymore. It's about points-of-interest search, maybe.
And so you really want to have those examples to see, okay, is your model, you know, able to differentiate between them?
And I guess the CheckList paper is another example where they go, like, beyond, you know, the imaginable, in a way, saying, okay, you can actually fulfill these criteria and you can actually check your model on various aspects.
Right, right. Right.
And is that something that you, like, how did you go about addressing that in your research?
I mean, you know, what we did is that actually, if you look, it was like one of the early, early papers. You know, the reason I like reading papers is because you can bring some ideas from one paper to some other domain.
And so the paper was about sentiment analysis, where one of the challenges back then, when it was dictionary-based systems, was, you know, how do I expand my positive dictionary? How do I expand my negative dictionary?
And what they propose there is that you can use a retrieval system where you say, okay, you take an instance from a positive dictionary, let's say it's "good", okay.
And then you search with a pattern where you say "good" and then a blank, and you just let your search engine tell you what "good" is occurring with in the sentences or text, right? And the same for the bad one.
Then they run some clustering on it so that you can actually pick more representative items from your data set.
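The pattern-and-blank dictionary expansion described above might look roughly like this minimal sketch. The "<seed> and ___" pattern, the function name, and the toy corpus are assumptions for illustration, and the clustering step the paper adds afterward is omitted:

```python
import re
from collections import Counter

def expand_dictionary(seed_word, corpus, top_n=3):
    """Expand a sentiment dictionary by pattern search: collect the words
    that co-occur with the seed in '<seed> and ___' contexts, most
    frequent first."""
    pattern = re.compile(rf"\b{re.escape(seed_word)} and (\w+)", re.IGNORECASE)
    counts = Counter()
    for sentence in corpus:
        counts.update(m.lower() for m in pattern.findall(sentence))
    return [word for word, _ in counts.most_common(top_n)]

corpus = [
    "The food was good and tasty.",
    "Service was good and friendly, really good and tasty overall.",
    "A good and reliable option.",
]
print(expand_dictionary("good", corpus))  # → ['tasty', 'friendly', 'reliable']
```

In the setting from the conversation, the regex scan would be replaced by a query against a real search engine (or a vector index) over a large corpus.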
+And in principle, you could apply a similar technique with the job queries, right? We didn't go that far, but we actually did try to use our own search engine to essentially, you know, augment.
+And then there's another potential technique that might help there, short of introducing hard negatives.
+It's easier than introducing hard negatives: just add what they call a margin loss, which essentially says that the separation between the scores the neural network assigns to the positive example versus the negative examples has to be large.
+So you assign some lambda, and essentially you handicap the scores of the positive examples by that lambda, and it forces the neural network to introduce more separation. So sometimes that can be helpful, even if you haven't generated hard negatives. Yeah, yeah, absolutely.
+Maybe we can also cite some papers in this podcast, especially since you mentioned some. I will try to find this sentiment analysis paper, although I think it's probably five, six years old, or maybe even older. But I mean, these ideas still live on, I think.
+And we shouldn't forget about them.
+And if we go back to your product: you said that you also look at using some of the existing algorithms in vector search. Can you name them? Or is this some kind of secret, or are you customizing them as well? For the vector search, specifically. Yeah.
+I think we can say that at our core we do take advantage of FAISS — or "face," I'm not exactly sure how it's pronounced. No, but nobody knows. I think everyone says it their own way.
+In my opinion, it's just an excellently designed system, with a team that's actively maintaining it, and they are obviously experts in that field. One of the features that customers have requested from us is the ability to mix in predicates and traditional Boolean logic.
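+The margin-loss idea described here can be made concrete with a minimal sketch in plain Python (the function name and the lambda value are illustrative, not the guest's actual implementation): the loss is zero once the positive example outscores every negative by at least the margin lambda, so training pushes the network to widen the separation.

```python
def margin_loss(pos_score, neg_scores, lam=0.5):
    """Hinge-style margin loss: each negative example is penalized
    unless the positive example outscores it by at least `lam`."""
    return sum(max(0.0, lam - (pos_score - neg)) for neg in neg_scores)
```

+With lam = 0.5, a positive scored 1.0 against a negative scored 0.9 still incurs a loss of 0.4, while a negative scored 0.2 incurs none — even without hard negatives, the handicap forces extra separation.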
+So you might have this corpus of documents, and every document has this metadata, which is the date it was published. And then you might want to say: okay, give me the most relevant matches for the query, but only for documents published in 2021.
+This is a very crisp selection criterion, and it selects a subset of the corpus. This is actually something that we have not launched yet, but we've been actively working on it and will probably launch it in Q1.
+I believe Google recently added support for this — Vertex Matching Engine, I think, is a recent offering, and they also claim to have this support. It's important; many of our customers have asked for the same thing. So we started from FAISS, but we have been customizing it. Yeah, yeah, sounds good.
+So basically, some other companies call it symbolic filtering, and that's what I think you're referring to, right? I can have certain categorical variables, so to say, in my data, and I can filter by them, right? Exactly, right.
+Yeah, so I think vanilla FAISS doesn't have this functionality, as far as I know. So essentially you'd have to extend it.
+And do you plan to keep that to yourself, which is perfectly fine, or are you also able to contribute it back to the FAISS open source project? So what I've noticed about the authors of FAISS is that they want to keep the product very focused on being a first-class vector engine.
+And these are essentially augmentations that they're not interested in pulling in. I think they would see it as scope creep, which is probably fair. That said, would we contribute it as open source? We could still contribute it back as open source.
+In fact, down the line we could potentially make our entire stack open source.
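+A toy sketch of this kind of filtered vector search, with hypothetical field names (not the actual product API): apply the crisp metadata predicate first, then rank only the surviving documents by similarity to the query embedding.

```python
import math

def filtered_search(query_vec, docs, year=None, k=2):
    """Pre-filter by a metadata predicate, then rank by cosine similarity."""
    candidates = [d for d in docs if year is None or d["year"] == year]

    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    return sorted(candidates, key=lambda d: cos(query_vec, d["vec"]),
                  reverse=True)[:k]
```

+A brute-force scan like this is only for illustration; the engineering challenge discussed here is doing the same thing inside an approximate-nearest-neighbor index without scanning everything.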
+I think some of the abuses of that — say, in regard to Elastic and how that's worked out, where you have these very large companies that essentially contribute very little but take advantage of their ability to launch platforms as a service, like Amazon can — that's kind of scared us.
+I think in the short term we're not doing that, but it's certainly something we could plan on doing in the longer term. Yeah.
+And I mean, of course, the dynamics of open source are not necessarily solved, especially as you've brought up this example with Elastic, right, and that kind of battle between Elastic and Amazon. But for some companies, it still works as a starter. You can enter this community.
+You start building the community around you, and so they bring back ideas. They feed new use cases to you. And maybe they even implement some features, right? Is this something that you've been thinking about along these lines as well? Well, I definitely see your point.
+I definitely see your point. At the same time, we also do have some competition in the space. We're still in the early days, but 2021, in particular, saw the launch of several competitors. Even Microsoft is in the mix now, with Microsoft semantic search.
+I think it's still in beta. Amazon launched Kendra in 2020; I think they probably get the credit for launching the first platform-as-a-service neural information retrieval system.
+In both of those systems, by the way, I think they're actually fundamentally based on a BM25 search followed by re-ranking with a neural network. This is what I've gathered from their own product marketing material — and that's still neural search.
+It just has a different set of pros and cons versus straight retrieval from a vector database. So, just to give you one quick example: multilingual search. BM25 is not going to work for multilingual search.
+You have queries coming in different languages, documents in different languages. BM25 won't work there, nor will a re-ranking-on-BM25-results approach, because BM25 has to bring something back to re-rank. Well, in the case of our system — you can check out some of the demos —
+we can actually embed across languages into a shared embedding space. And so you can search across languages. That's something you need a vector database for. Yeah, exactly.
+So you go multilingual at the first stage, when retrieving the candidates, right? And I think this multilingual search in general has so much potential.
+I don't know if Google is using it already, to some extent, but even at a smaller scale, instead of configuring, let's say, Solr — we keep mentioning Elasticsearch a lot; they didn't pay for this podcast —
+but I'm just saying, let's say Apache Solr, or Lucene, right? You'd have to go a long, long way to achieve it. But okay, now Lucene released HNSW in version 9.0.
+And so in principle, you could embed your documents using a multilingual model and retrieve them in the same way, right? So do you see huge potential in the market for multilinguality?
+Well, there have been some studies that showed that when eBay introduced automatic translation tools, there was a significant increase —
+a few percentage points, I think, of increase in commerce on their platform, which translated to hundreds and hundreds of millions of dollars.
+So the advancements that have been made in machine translation, and now in cross-lingual retrieval, will serve to further break down barriers to commerce, at least. And in a way that's commercially very valuable.
+But speaking more broadly, I think what I would be very interested to see is how vector databases evolve and merge into traditional database technology, or into systems like Lucene — information retrieval systems.
+Because at the moment, you have FAISS as kind of a separate, discrete entity.
+But longer term, conceptually, in a way, very low-dimensional vector database technology has already made its way into MySQL and Postgres with the spatial extensions that they've supported for many years — the quadtree algorithms for doing sublinear lookups on a map.
+Those spatial extensions have been around for a while.
+You can easily imagine that in the future, once people start to understand how useful vector embeddings can be, and that's established, you'll have columns of vector type in a relational database, and you'll be able to simply build an index and perform fast nearest-neighbor searches straight from Postgres.
+So I think that's an exciting future to contemplate, and I see that eventually it will go there. That sounds really interesting. Do you think that vector search in general is hype right now — the way big data was a few years ago?
+No, no, it's not hype, because, again, I saw neural information retrieval techniques, backed by vector databases, making a big difference in many products at Google.
+So I think where it is right now is that there are a few big companies, the FAANG-type companies in Silicon Valley, that have the expertise to take advantage of it. It's not been commoditized yet.
+So it's definitely not hype, but it's got a few years to go before it enters the mainstream consciousness. Yeah, for sure.
+But to your point, maybe at some point vector search will become, let's say, part of Postgres or MySQL or whatever other, so to say, traditional database — traditional in the sense that it's widely used.
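+As a speculative sketch of what that could feel like, here is a brute-force stand-in using SQLite from the Python standard library: embeddings live in an ordinary column (serialized as JSON), and a query ranks rows by dot product. A real vector column type of the kind imagined here would index this to make the lookup sublinear; everything below is illustrative.

```python
import json
import sqlite3

# An ordinary relational table where one column holds an embedding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT, vec TEXT)")
rows = [(1, "about cats", [1.0, 0.0]), (2, "about dogs", [0.0, 1.0])]
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(i, body, json.dumps(v)) for i, body, v in rows],
)

def nearest(query, k=1):
    """Brute-force nearest neighbor over the 'vector column'."""
    scored = []
    for id_, body, vec_json in conn.execute("SELECT id, body, vec FROM docs"):
        vec = json.loads(vec_json)
        score = sum(q * x for q, x in zip(query, vec))
        scored.append((score, id_, body))
    return sorted(scored, reverse=True)[:k]
```

+The point of the sketch is only that nothing in a relational engine forbids this; what's missing today is a native index over the column.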
+And then Lucene has already introduced it too, right? Lucene now has HNSW.
+You can go and argue the point that maybe the Lucene index layout might not be optimally designed for nearest-neighbor retrieval, because if you look at FAISS's methods, or HNSW, it's some graph method, or a way to partition your space; in Lucene, you partition it by segments.
+And that's kind of a given, right? Because it's designed for an inverted index.
+But again, somewhere on Twitter I saw a tweet from one Lucene committer who said maybe this will by itself open up some new opportunities, because you'll have a separate vector-space index per segment, right? And maybe you can design some features around that.
+So it sounds like you still see the potential for merging these technologies in the future and bringing additional benefit. Well, yes. I can't really speak for Lucene; I haven't taken time to study that implementation or how it was done — I think you know more about it than me.
+But I was saying that eventually relational databases could, and might, implement vector indexes directly. I'm not sure I can see any technical reason why that wouldn't be possible, basically.
+And it could potentially be very, very useful as neural networks go more and more mainstream for embedding. Yeah, I mean, it sounds like one logical step forward.
+Maybe it will not be as scalable as a pure vector database, but on a smaller amount of data — like when MySQL or Oracle or other databases introduced full-text search, right? Initially it wasn't there, right? Right.
+What restricts you from introducing another field with an embedding and actually running your vector retrieval there? Right. Yeah. And I think it also comes down to this: okay, FAISS is always going to give you the maximum performance.
+So there's going to be some subset of engineering teams that need that performance, and that's where they're going to go. But what about the mass market — the Fortune 500 companies and so on? They're dealing with problems at a scale where it's not necessary to go there.
+And if it's just in the database, even if it's only giving me 80% of the total performance, that's good enough. And in a way, that pragmatic tradeoff is what underlies ZIR AI's existence, because people often ask: can't I get better performance on my data set
+if I fine-tune a BERT model and then distill it? And the answer is: yes, that's true. We're aiming to give you a neural network and a full experience that will give you maybe 80% of the performance you might be able to achieve, which is still better than what you get just from a keyword search.
+But the reality is, how many companies have the budget for NLP engineers and data scientists to squeeze out that extra performance? It's just not important in a lot of cases. Yeah, exactly.
+And do you think there is still a need to find a way to combine BM25, or whatever sparse search you have there, with the results from the nearest-neighbor search? Have you been thinking about it?
+Have you seen your clients thinking about it or asking about it? There's a very interesting paper from Google from about two years ago — Dave Dobson, and I'm forgetting the other individuals —
+specifically on this topic. You can obviously model a BM25 search as multiplication of sparse matrices. And so you can imagine your vectors essentially having a dense part produced by a neural network, for instance, and then a very sparse tail or something.
+And you actually want to perform dot products — and how do you do that efficiently? The paper went into some fascinating techniques for how to do that well. So your question was: do I see these merging?
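+The dense-head-plus-sparse-tail idea can be sketched in a few lines (a toy illustration of that setting, not the paper's actual algorithm): the dense part is an ordinary dot product over embedding dimensions, while the sparse tail is stored as a term-to-weight map so you only touch terms the two vectors share.

```python
def hybrid_dot(dense_a, sparse_a, dense_b, sparse_b):
    """Dot product of vectors with a dense head and a sparse (term: weight) tail."""
    # Dense head: elementwise product over the embedding dimensions.
    dense = sum(x * y for x, y in zip(dense_a, dense_b))
    # Sparse tail: iterate only over terms present in both vectors,
    # which is how BM25-like sparse scoring stays cheap.
    sparse = sum(w * sparse_b[t] for t, w in sparse_a.items() if t in sparse_b)
    return dense + sparse
```

+The hard research problem is doing this at scale inside one index, not the arithmetic itself.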
And I think that — you know, I actually brought this up with the folks at FAISS:
+is this something on your roadmap? Is this something you're interested in? They said, no, we're not interested in this. They're specifically focused on either sparse or dense, but not hybrid. And I think it's going to come down to this:
+if the utility of this sparse–dense hybrid can be shown, then the technology is going to follow, and people will try to create efficient implementations of it. I think there are certainly classes of queries for which BM25 can't be beat,
+and for which exact keyword matching is going to remain the correct way to do it in the future. So then you can take a few different strategies. You can either try to classify the query when it's received and then dispatch it to the correct back end.
+Or you can dispatch it to both a sparse and a dense index and then merge with a re-ranker. Or you can do a truly hybrid system where you're simultaneously doing the multiplication on the sparse and the dense pieces and producing a final list in one shot, not relying on a re-ranker.
+So it's still an open area of research. Exactly. And two things. I'm looking at it from the point of view of a customer. Let's say I already have a BM25 platform, right? A base platform. And so I'm curious to see what vector search can bring me.
+And maybe I'm thinking about introducing this as an explorative search feature,
+because I'm not sure if it's going to fly for my documents, or for my items in the database, right? So that's one option to think about: as you said, I can actually route the query to both sparse and dense retrieval, and then maybe combine them in some linear formula, even.
+And I can give a smaller weight, a lower score, to the dense part, and a higher one to the sparse part, because I still believe in the sparse part, and that's how my users are expecting results to be.
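+The dispatch-to-both-and-merge strategy can start as simply as a weighted linear blend of normalized scores. A minimal sketch, assuming scores in [0, 1] and arbitrary weights (a learned re-ranker would replace this in practice):

```python
def blend(sparse_hits, dense_hits, w_sparse=0.7, w_dense=0.3, k=3):
    """Merge two result sets ({doc_id: score}) by a weighted linear
    combination; documents missing from one side score 0.0 there."""
    ids = set(sparse_hits) | set(dense_hits)
    scored = {
        d: w_sparse * sparse_hits.get(d, 0.0) + w_dense * dense_hits.get(d, 0.0)
        for d in ids
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

+Giving the sparse side the larger weight keeps results close to what keyword-search users already expect, while still letting strong dense matches surface.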
+But then maybe I can surface some magic, like Q&A, right? So they ask a question and I can give them the answer. And that might be really interesting. And the second point: there was a paper called BEIR, B-E-I-R. I will make sure that all of the papers are linked in the show notes.
+That paper actually compared dense retrieval versus BM25 on a number of tasks, right? You can have search, you can have question answering, and the list goes on. And what they showed is that BM25 is fairly competitive.
+It's actually above dense retrieval methods on zero-shot retrieval, right? So you didn't fine-tune the model; you just took it off the shelf: here is the task, let's compare. Right? BM25 is very stable, so only a few models actually outperformed it.
+And so in that sense, it sounds like BM25 is here to stay. What do you think? I agree with you. And again, this is where our scope as a company is: building an end-to-end information retrieval pipeline, which means that today we have neural dense retrieval.
+Because BM25 has been done, right? It's in Lucene. It's well understood how to implement it, although there are some tricks to actually make BM25 work even better than off-the-shelf implementations.
+Where we want to eventually get to is that we could potentially build both BM25 and dense indexes for our customers, and then return — we're trying to just serve the best results possible. So, for instance, sometimes even very simple heuristics work: single-word queries,
+often BM25 is how you want to serve them, not from a dense index. So if it's a single-word query, okay, you're going to do a BM25 search. If it's anything longer than one word, run a dense search. That's not a very principled approach;
+I'm just pointing out, you know, what goes on behind the scenes.
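+That single-word heuristic is literally a one-liner. A sketch of such a dispatcher — a deliberately unprincipled baseline, as the speaker notes, with invented backend names:

```python
def route(query: str) -> str:
    """Route single-word queries to BM25; longer queries go to the dense index."""
    return "bm25" if len(query.split()) == 1 else "dense"
```

+A production system would replace this with a trained query classifier, but the routing interface stays the same.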
That's the intelligence the platform should provide, and we're not really restricted or married to a vector database — or only a vector database — powering the search of this platform. Yeah, yeah, that makes sense.
+So does that manifest in some way in your product — that I as a user have the flexibility in how my search is processed, whether it's going to go the sparse route or the dense retrieval route?
+No. At the moment we're only doing the dense retrieval, because we feel like that's the interesting part. We can add the BM25 parts without a lot of difficulty, six months from now or something like that.
+But we do provide a few different flavors of the dense retrieval, because there are a few. There's question answering — or query answering: the user puts a query in, and then you're trying to find good responses.
+There's also another task, which is semantic similarity. It's closely related, but it's like: I make a statement and I just want to find similar statements. My statement is not necessarily a question that I'm looking for an answer to;
+I just want to find semantically similar statements. And then the other thing is question–question similarity, which often comes up.
+Well, you've seen it in Google, for instance, when you type a query and then it says "people also ask" and shows these similar questions, right?
+So there are use cases for question–question similarity, and so we support all three of those modes of operation, and we allow our customers to specify at query time which mode they're trying to run in. Yeah, yeah, that makes sense — that makes a lot of sense. And of course —
+One thing that I keep thinking about is, let's say, when you introduce the sparse search — let's say BM25 — and some customer comes in and it's not the English language, it's something else, right? Then you need to bring in the tokenization and other things, maybe from Lucene. And of course, Lucene is a library; in principle it could be wrapped in a Docker image, and you can do that job, right?
+But then the question is: can you easily marry them so that it's production-grade across different platforms and languages?
+And surprisingly, Lucene has come a long way in terms of providing good, sensible defaults out of the box — stop-word lists and stemming. But —
+my daughter's school started using this product that manages communication between the school and the parents, and that thing was clearly using
+Lucene, or Solr or Elasticsearch, and they didn't have the stemming configured properly. I didn't even know it was possible to misconfigure that. So I was searching for "vaccine" and it couldn't find it, because it was "vaccination" in the title over there.
+So yeah, with neural search it's a little bit more bulletproof, you know — a bit more immune to these kinds of mistakes, and it handles those misspellings very easily.
+Yeah, yeah. Especially — I think there is also a paper, I think it was from Google, about training at the byte level, so you will not be constrained by the complexity of the language, because you have byte-level
+representations, and so in principle your model should be robust to typos and misspellings and so on — and some of those come from speech, right?
+Exactly, exactly, yeah. And it's interesting — the example you brought up with your daughter's school system — it sounds like search is still largely broken. Like, the moment you go to some
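+The "vaccine" vs. "vaccination" failure can also be dodged at the sub-word level. This character-trigram similarity is my own toy illustration of that robustness, not the byte-level training method from the paper mentioned:

```python
def trigrams(text, n=3):
    """Set of character n-grams of a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def char_similarity(a, b):
    """Jaccard overlap of character trigrams: shared sub-word pieces let
    'vaccine' match 'vaccination' with no stemmer configured at all."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / len(ga | gb)
```

+Anything operating below the word level — character n-grams, subword tokens, or raw bytes — degrades gracefully on morphology and typos where exact-token matching falls to zero.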
+system which is, let's say, for public use, right — it's not necessarily designed for findability; it just exists. And, you know, Daniel Tunkelang, I think, says the funny part of the search industry in general is that when the search engine works, nobody will go and praise you — they just use it; when it doesn't work, they will blame you. So you're always on the losing side of that.
+How do you feel about that? Is this also the potential for your company — to go and fix many of these broken use cases?
+Well, that certainly is actually our vision: that we will make it very easy for SaaS companies to provide a much more Google-like search experience in their products. When it comes to the web, let's say we divide it
+into two categories: SaaS companies and website owners. When it comes to website owners, I think search on websites is really unused, because —
+and it becomes a cyclical thing — it's unused, so companies don't invest any money in improving it; and it's unused because it's not good. And basically, Google does a good enough job of indexing websites, so site owners have accepted that Google is going to be the front door into their website.
+On the other hand, I think it's obviously dangerous for them too, because you've had sites that essentially get obliterated when Google changes their quality guidelines: they drop off the front page, the traffic suddenly goes down by 95%, and there's no way to recover from it. So it would be good, first, to be able to provide a good search experience on websites, but I think they don't do it because of the cost involved, and because they don't know how to. And certainly
+Algolia and Elastic are making that easier — particularly Algolia — but there's still a lot of room to make it better. Coming to SaaS companies: there, we're talking about data that's private. The communications from the school to the parents are not on the web somewhere where they can be indexed by Google.
+So I feel like what I've noticed in the last few years is that some sort of search feature is present in most of these products now.
+But yes, it's usually not tuned, maybe not even set up correctly, it doesn't work well, and there's a lot of room for improvement. So I think these neural search technologies let you really easily improve the quality, if you've got a set of simple APIs. And that's what we provide: our APIs basically look like Elastic's or Algolia's. You index documents, and you'd never know there's a neural network running
+in the background at all — and it's not important. The queries just go in and the results come out, but those results are far, far better than what you would get from a keyword search. So I think there's a lot of scope, particularly for SaaS companies, for neural search.
+Yeah, yeah, absolutely. I actually wanted to ask you — a question just came to my mind. I've been reading the book — I think it's called Relevant Search, by Doug Turnbull and other authors; I might not be remembering exactly. But this book goes chapter after chapter, where it says:
+okay, let's take it from first principles. You have a search task, you have documents. You need to start with
+tokenization — and by the way, if you make a mistake there, things will not be findable — and then you move one level up, and you start thinking: okay, what about the model? Okay, TF-IDF, BM25, what are the trade-offs, and so on. So they teach you to become a search engineer, and then they proceed to ranking, and so on and so forth. And my question is:
+what do you think is going to be the change in the search engineering profession going forward, once neural search hits the mass market? Because when I was a search engineer,
+I looked at Lucene and Solr and I didn't question much. I just went and implemented some changes, some parsers, some plugins, or modified the behavior of some algorithm, right — by extending a class, by the way.
+Lucene was making a lot of classes final, in Java, so I couldn't actually extend them; I had to copy the entire package and then rename all these classes so there was no namespace clash. But that's okay. At some point I was worried that I would end up practically reimplementing Lucene in my IDE, because I had to touch multiple parts.
+But so I felt like I was in control, more or less, right? Not only because it's open source, but because I could read the code, I could talk to people, I could read books, I could read blogs, and I could experiment myself, right? And that made me, I believe, a search engineer in that company, even though the company's goal was not to build a search service; we were building the product.
+Do you happen to have thoughts on how neural search will change the landscape of this job? Well, that's an excellent question. A few thoughts on that topic. Neural search is going to make it
+easier — it's going to require less expertise to put together high-quality search experiences. And furthermore, the advantage that companies like Google or Microsoft have from click data — it's still going to be there, but it's going to diminish.
+And I think that's actually why — maybe I'm biased here and misreading it — you see a lot of search engine companies starting up in the last year or two. You've got Neeva, Kagi; I think the head of Salesforce Research has started his own engine; I've even heard some rumors, you know.
+Right, right — Richard Socher, yeah, exactly. And maybe some rumors that Apple might be trying to do something like that. And it's basically because the amount of effort it takes now, I think, has gone down significantly.
+So I think that's going to be one of the effects of neural search. And I also expect — just like, you know, Lucene has been around for a long time, I mean maybe since the early 2000s —
+1999, I think, when Doug Cutting started learning Java and, as a side project, decided to implement Lucene — and so he started the whole community, and then Hadoop followed, and so on and so forth.
+Yeah, okay — I remember it from a while ago. So I think that, in the same way, there will be an open source
+neural thing. It might come under the cover of Lucene, or it might be a separate Apache project, and eventually it's going to be the go-to solution.
+So what companies like mine are doing right now — this technology is still pretty new, and we're filling in the gap, and we're also providing a completely hosted solution, which has some value on its own.
+But longer term, that's where I see things headed, because we're getting into these very good general-performance neural networks —
+systems like BERT that can just perform well on a wide range of tasks. And then you have T5, and now mT5, and you can go across a hundred different languages as well.
+So there will eventually be models that are good enough, and someone's going to take the effort to distill them into something that runs well.
+And anybody in any organization will be able to download and use it, the way they use Lucene today.
+I think that's where things will be, but it might be five years before we reach that point. Yeah, yeah. And, I mean, to take this thought forward from here: do you think the profession will change in such a way that instead of tweaking
+the index configuration to make your search work better — increase recall without suffering decreased precision —
+you will move more into: okay, here is the problem, and this off-the-shelf network doesn't work, I have to fine-tune it — so you become a little bit more like a researcher?
+Yeah, so that's an excellent point. I think one of the key components in these systems — one that we have not built yet in our system, but it's in the blueprints — is some kind of feedback mechanism. You'll notice this in Kendra, for instance: thumbs up, thumbs down on the results, where you indicate what's good and what's bad. And then, even with a small amount of that data, you can start to train a re-ranker.
+And I think that with the volumes of data you get on an internal application, let's say, you're going to get a few thousand items of feedback.
+Training a re-ranker is probably the most effective thing you can do with that data — whether it's a random forest re-ranker, or you take a cross-attention network and fine-tune it — and you can significantly improve the search quality that way.
+So I think the machinery for doing all of that can also be part of the open source offering, because it's very broadly applicable and can be used by basically anyone. Because, like you say, this is the problem that then comes up: I want to give feedback on these results so the system can improve itself.
+Yeah, yeah, absolutely. So you create the flywheel of success, right? You bring the data back, and then the model retrains, and so on and so forth. But there are also interesting challenges in your neural network, like catastrophic forgetting.
+Is this something that you've been thinking about, maybe back at Google, or now with your clients — something you need to keep innovating on, or solve some other way?
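+As a toy sketch of that feedback loop — a stand-in for the random-forest or cross-attention re-rankers mentioned, with invented names and an arbitrary learning rate — accumulate per-document boosts from thumbs-up/down votes and fold them into the retrieval scores:

```python
def train_boosts(feedback, lr=0.1):
    """feedback: iterable of (doc_id, vote), vote = +1 (thumbs up) or -1 (down)."""
    boosts = {}
    for doc_id, vote in feedback:
        boosts[doc_id] = boosts.get(doc_id, 0.0) + lr * vote
    return boosts

def rerank(base_scores, boosts):
    """Re-order documents by base retrieval score plus the learned boost."""
    return sorted(
        base_scores,
        key=lambda d: base_scores[d] + boosts.get(d, 0.0),
        reverse=True,
    )
```

+Even this crude per-document memorization illustrates the flywheel: a few thousand votes shift the ordering, and a real learned re-ranker generalizes the same signal across queries.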
+Yeah, so I am familiar with the concept of catastrophic forgetting. I honestly haven't studied it very much in the context of these large language models like BERT, although in general the approach of taking a BERT-type model and fine-tuning it seems to be working well. But then you're essentially talking about taking a model that has been fine-tuned on one task and then fine-tuning it for a different task, and whether it's going to degrade a great deal —
+whether it loses its abilities on the first task. I do think it's going to get solved. And, yeah, I guess I don't know how much of an issue that's going to be in the context of information retrieval.
+Yeah, I mean, another thing — if you're familiar with learning to rank, for example, which may or may not involve a neural network; it may also be based on decision trees, like LambdaMART, for example — you know, when you receive a new batch of clicks or downloads or whatever events you have in the system and you retrain that model,
+it will also forget what it knew about the previous state, right?
+It's very natural, and we can probably associate it with human life as well, in some sense — although they say the older you get, the earlier the memories you actually remember: you might forget what happened yesterday, but you remember what happened fifty years ago.
+Yeah, I'm probably noticing that with myself. Yeah, me too, actually, because days go by and I'm like, okay, what's going on? But then you go: okay, when I was a kid — I remember something.
+But neural networks are probably a little bit different, or at least the present neural networks. Right.
+So I think when you retrain the model — and you have to retrain, otherwise it will drift, right? I think Google also has a paper about that — about checking the consistency of your machine learning pipeline and your model,
+So it doesn't drift and just explode in front of the eyes of the user, right? So you have to keep retraining it. But then that also means that it will forget things. Maybe they were quite important; maybe they are not high probability anymore, but they are still true. +But the network has forgotten about them. Right, right. Yeah, that makes sense. Anyway, it has been great talking to you, but I still want to close off. +And before we go to some announcements from you, I'm thinking, I ask this question to all guests, and different guests take it differently. But I really would like to hear your view on that question of why; it's a little bit more philosophical. +Like, in a way, you had a stable job at Google, a lot of challenge, a lot of data, a lot of user impact. As you said, the auto-reply feature was enabled live to, like, millions and millions of users. +So then you decided to go build your startup, and that's a nice kind of way to experience another side of things. +But why specifically neural search? What drives you to work on it? Well, I was initially attracted very much to the idea of automated reasoning. And of course, in its current incarnation, that's machine learning. And so I started to learn about that. +And I had this opportunity to work with Ray Kurzweil, who joined Google, I think, around 2012. I knew about him; he's a very inspirational figure. And he was specifically working on language understanding, because he saw that as being very critical to advancement in artificial intelligence. +So, you know, beyond that, I would say those are my broad interests. But then I just worked in this area specifically for eight years. And I think I became quite good at what I was doing.
+And then I also saw that what I was doing post-2017, in particular with this neural network based retrieval, had a lot of applicability to products. And you know, I think that being in a research team, a research team has a different type of focus. +There's a lot of focus on publishing papers and things, but not necessarily a lot of interest or appetite for building platforms. So in that way, maybe this wasn't really the right place to attempt that kind of work. But to me, I'm an engineer as well, so this is very interesting. +And I'm not sure if I'm answering your question, but that's some of my motivation. No, you do. I mean, essentially, I'm currently leading a search team. And yeah, you know, our KPIs are like, okay, how many papers you've published, how many patents you can file. +But also when you start thinking, okay, what impact am I making, right? There is not that much room to think about creating; maybe you can create a vision, but you might not necessarily tie it back to the day-to-day scenarios of users. +You have to be part of engineering, probably, to start delivering these things, at which point you are no longer a researcher. So it sounds like you managed to combine both of these, engineering and research, at ZIR AI. Yes, yes. It's kind of both of the passions together in one company. +And if we're successful, and we can take it into the future, the research end of the program is something that I'd really like to ramp up a lot. Since we started, honestly, there's been more engineering and less research. +The training of the neural networks was at the early stage of the company, and we haven't revisited it since then. But I think 2022 is going to be, first of all, a big year for this industry. +Beyond Pinecone getting funding, I was recently looking at Jina AI, if you're familiar with them. They raised, I think, $30 million; it was in TechCrunch. So the industry is starting to get some notice.
And for us as well, we expect to really expand in 2022. Oh, yeah, fantastic. +And I mean, one manager that I worked with used to say that you need to first build the plumbing, and that's your engineering work. Once you have the plumbing, you can stand on it and actually fix some other, higher-level things. +And that's where you will probably come back to training neural networks and actually nailing that use case for your customers. Sounds really exciting. This was really packed, with so much thinking that you brought in, and also some discoveries during this conversation. I really enjoyed it. +I'm just thinking, is there something you would like to announce from your product side, something that our listeners can try? Well, thank you. Thank you for the opportunity. +I think what I would say is that if what we've been talking about is interesting and someone would like to try it out, then we've created a special promo code. We're currently in a closed beta, so we're accepting customers, but kind of on a case-by-case basis. +But we've created a promo code for listeners of this podcast. I think I'm going to share the exact code with you, and then you can post it in the comments to the video. +But essentially it would give you a 50-megabyte account, which is larger than our standard trial account by about a factor of three, for one month, if you want to just try out the system that we've been talking about. This is fantastic. Thank you, Amin, for this opportunity. +I'm sure some listeners will put this to use and build some cool vector search applications for their products. That would be great. Yeah, it was a pleasure to talk to you. I hope we can talk at some point down the road as well. And I wish you all the best in the future, +in the next year, with your ambitions, and also with reaching out to clients and getting contracts. All the best to you on that front. It was my pleasure to talk to you, and hopefully to see you next year as well.
Thank you so much. It was good talking to you too. Thank you, Amin. Bye-bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md b/transcripts_with_timestamps/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md new file mode 100644 index 0000000..0308a57 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/atita-arora-search-relevance-consultant-revolutionizing-e-commerce-with-vector-search.md @@ -0,0 +1,5153 @@ +--- +description: '

Topics:

00:00 Intro

02:20 Atita’s path into search engineering

09:00 When it’s time to contribute to open source

12:08 Taking management role vs software development

14:36 Knowing what you like (and coming up with a Solr course)

19:16 Read the source code (and cook)

23:32 Open Bistro Innovations Lab and moving to Germany

26:04 Affinity to Search world and working as a Search Relevance Consultant

28:39 Bringing vector search to Chorus and Querqy

34:09 What Atita learnt from Eric Pugh’s approach to improving Quepid

36:53 Making vector search with Solr & Elasticsearch accessible through tooling and documentation

41:09 Demystifying data embedding for clients (and for Java based search engines)

43:10 Shifting away from generic to domain-specific in search+vector saga

46:06 Hybrid search: where it will be useful to combine keyword with semantic search

50:53 Choosing between new vector DBs and “old” keyword engines

58:35 Women of Search

1:14:03 Important (and friendly) People of Open Source

1:22:38 Reinforcement learning applied to our careers

1:26:57 The magical question of WHY

1:29:26 Announcements

See show notes on YouTube: https://www.youtube.com/watch?v=BVM6TUSfn3E

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230516_070519_a42272298eaf6239be6e8050108fd5b9.jpg +pub_date: Wed, 17 May 2023 08:12:12 GMT +title: Atita Arora - Search Relevance Consultant - Revolutionizing E-commerce with + Vector Search +url: https://rss.com/podcasts/vector-podcast/953768 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 29.92, "text": " Hello + there, vector podcast. We are still in season 2. If you forgot, you can", "tokens": + [50364, 2425, 456, 11, 8062, 7367, 13, 492, 366, 920, 294, 3196, 568, 13, 759, 291, + 5298, 11, 291, 393, 51860], "temperature": 0.0, "avg_logprob": -0.38413872926131537, + "compression_ratio": 1.0, "no_speech_prob": 0.09154795110225677}, {"id": 1, "seek": + 2992, "start": 29.92, "end": 37.0, "text": " always go back and check season 1. + We had some really awesome guests there. Today I", "tokens": [50364, 1009, 352, + 646, 293, 1520, 3196, 502, 13, 492, 632, 512, 534, 3476, 9804, 456, 13, 2692, 286, + 50718], "temperature": 0.0, "avg_logprob": -0.20175727409652516, "compression_ratio": + 1.4790697674418605, "no_speech_prob": 0.14707162976264954}, {"id": 2, "seek": 2992, + "start": 37.0, "end": 42.480000000000004, "text": " have a big, big pleasure to + talk to Atita Aurora, who is the search relevance", "tokens": [50718, 362, 257, + 955, 11, 955, 6834, 281, 751, 281, 1711, 2786, 40663, 11, 567, 307, 264, 3164, 32684, + 50992], "temperature": 0.0, "avg_logprob": -0.20175727409652516, "compression_ratio": + 1.4790697674418605, "no_speech_prob": 0.14707162976264954}, {"id": 3, "seek": 2992, + "start": 42.480000000000004, "end": 50.36, "text": " consultant with open source + connections. 
And she has spent quite a bit of time", "tokens": [50992, 24676, 365, + 1269, 4009, 9271, 13, 400, 750, 575, 4418, 1596, 257, 857, 295, 565, 51386], "temperature": + 0.0, "avg_logprob": -0.20175727409652516, "compression_ratio": 1.4790697674418605, + "no_speech_prob": 0.14707162976264954}, {"id": 4, "seek": 2992, "start": 50.36, + "end": 58.64, "text": " in search field in NLP. I''m really curious to learn her + journey and talk some", "tokens": [51386, 294, 3164, 2519, 294, 426, 45196, 13, + 286, 478, 534, 6369, 281, 1466, 720, 4671, 293, 751, 512, 51800], "temperature": + 0.0, "avg_logprob": -0.20175727409652516, "compression_ratio": 1.4790697674418605, + "no_speech_prob": 0.14707162976264954}, {"id": 5, "seek": 5864, "start": 58.64, + "end": 63.4, "text": " of the topics that we usually talk on this podcast like, + you know, search vector,", "tokens": [50364, 295, 264, 8378, 300, 321, 2673, 751, + 322, 341, 7367, 411, 11, 291, 458, 11, 3164, 8062, 11, 50602], "temperature": 0.0, + "avg_logprob": -0.3004286575317383, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.11591478437185287}, {"id": 6, "seek": 5864, "start": 63.4, "end": 69.64, "text": + " what vector search and also, you know, aspects of profession. Hey, Atita, how + are you doing?", "tokens": [50602, 437, 8062, 3164, 293, 611, 11, 291, 458, 11, + 7270, 295, 7032, 13, 1911, 11, 1711, 2786, 11, 577, 366, 291, 884, 30, 50914], "temperature": + 0.0, "avg_logprob": -0.3004286575317383, "compression_ratio": 1.6278026905829597, + "no_speech_prob": 0.11591478437185287}, {"id": 7, "seek": 5864, "start": 69.64, + "end": 77.6, "text": " Hi, pleasure to be here, to be in tree. 
And I mean, before + we start, I think a huge shout-out", "tokens": [50914, 2421, 11, 6834, 281, 312, + 510, 11, 281, 312, 294, 4230, 13, 400, 286, 914, 11, 949, 321, 722, 11, 286, 519, + 257, 2603, 8043, 12, 346, 51312], "temperature": 0.0, "avg_logprob": -0.3004286575317383, + "compression_ratio": 1.6278026905829597, "no_speech_prob": 0.11591478437185287}, + {"id": 8, "seek": 5864, "start": 77.6, "end": 85.08, "text": " for you that this + is the great thing that you''re doing. And I mean, I''m feeling pretty excited", + "tokens": [51312, 337, 291, 300, 341, 307, 264, 869, 551, 300, 291, 434, 884, 13, + 400, 286, 914, 11, 286, 478, 2633, 1238, 2919, 51686], "temperature": 0.0, "avg_logprob": + -0.3004286575317383, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.11591478437185287}, {"id": 9, "seek": 8508, "start": 85.08, "end": 90.88, "text": + " to be here today. Yeah, thanks, Atita. You''re very kind. And this is what also + gives me", "tokens": [50364, 281, 312, 510, 965, 13, 865, 11, 3231, 11, 1711, 2786, + 13, 509, 434, 588, 733, 13, 400, 341, 307, 437, 611, 2709, 385, 50654], "temperature": + 0.0, "avg_logprob": -0.17276929772418478, "compression_ratio": 1.5584415584415585, + "no_speech_prob": 0.2547583281993866}, {"id": 10, "seek": 8508, "start": 90.88, + "end": 97.67999999999999, "text": " energy, you know, when people like you say that + this makes sense to continue doing. And I", "tokens": [50654, 2281, 11, 291, 458, + 11, 562, 561, 411, 291, 584, 300, 341, 1669, 2020, 281, 2354, 884, 13, 400, 286, + 50994], "temperature": 0.0, "avg_logprob": -0.17276929772418478, "compression_ratio": + 1.5584415584415585, "no_speech_prob": 0.2547583281993866}, {"id": 11, "seek": 8508, + "start": 97.67999999999999, "end": 103.75999999999999, "text": " really enjoy it + myself because I learn so much. 
I connect with all my guests on a different", "tokens": + [50994, 534, 2103, 309, 2059, 570, 286, 1466, 370, 709, 13, 286, 1745, 365, 439, + 452, 9804, 322, 257, 819, 51298], "temperature": 0.0, "avg_logprob": -0.17276929772418478, + "compression_ratio": 1.5584415584415585, "no_speech_prob": 0.2547583281993866}, + {"id": 12, "seek": 8508, "start": 103.75999999999999, "end": 109.36, "text": " level + in the podcast. And I hope that this is also informative for our listeners. At least", + "tokens": [51298, 1496, 294, 264, 7367, 13, 400, 286, 1454, 300, 341, 307, 611, + 27759, 337, 527, 23274, 13, 1711, 1935, 51578], "temperature": 0.0, "avg_logprob": + -0.17276929772418478, "compression_ratio": 1.5584415584415585, "no_speech_prob": + 0.2547583281993866}, {"id": 13, "seek": 10936, "start": 109.36, "end": 114.52, "text": + " when I release and all the podcasts, all the episodes so far, I kind of like, + you know,", "tokens": [50364, 562, 286, 4374, 293, 439, 264, 24045, 11, 439, 264, + 9313, 370, 1400, 11, 286, 733, 295, 411, 11, 291, 458, 11, 50622], "temperature": + 0.0, "avg_logprob": -0.17865766920484938, "compression_ratio": 1.6654275092936803, + "no_speech_prob": 0.12446001172065735}, {"id": 14, "seek": 10936, "start": 114.52, + "end": 120.44, "text": " really remember and I learned something new. It''s kind + of cool. Yeah, I was just wondering", "tokens": [50622, 534, 1604, 293, 286, 3264, + 746, 777, 13, 467, 311, 733, 295, 1627, 13, 865, 11, 286, 390, 445, 6359, 50918], + "temperature": 0.0, "avg_logprob": -0.17865766920484938, "compression_ratio": 1.6654275092936803, + "no_speech_prob": 0.12446001172065735}, {"id": 15, "seek": 10936, "start": 120.44, + "end": 126.24, "text": " like, we usually traditionally start with with your background. 
+ And this is where people", "tokens": [50918, 411, 11, 321, 2673, 19067, 722, 365, + 365, 428, 3678, 13, 400, 341, 307, 689, 561, 51208], "temperature": 0.0, "avg_logprob": + -0.17865766920484938, "compression_ratio": 1.6654275092936803, "no_speech_prob": + 0.12446001172065735}, {"id": 16, "seek": 10936, "start": 126.24, "end": 131.72, + "text": " can learn more. But I know that you''ve been blogging and you''ve been + talking publicly", "tokens": [51208, 393, 1466, 544, 13, 583, 286, 458, 300, 291, + 600, 668, 6968, 3249, 293, 291, 600, 668, 1417, 14843, 51482], "temperature": 0.0, + "avg_logprob": -0.17865766920484938, "compression_ratio": 1.6654275092936803, "no_speech_prob": + 0.12446001172065735}, {"id": 17, "seek": 10936, "start": 131.72, "end": 138.24, + "text": " on conferences. But still, it''s very interesting to know, you know, how + did you arrive at this", "tokens": [51482, 322, 22032, 13, 583, 920, 11, 309, 311, + 588, 1880, 281, 458, 11, 291, 458, 11, 577, 630, 291, 8881, 412, 341, 51808], "temperature": + 0.0, "avg_logprob": -0.17865766920484938, "compression_ratio": 1.6654275092936803, + "no_speech_prob": 0.12446001172065735}, {"id": 18, "seek": 13824, "start": 138.24, + "end": 144.60000000000002, "text": " profession? What was it journey? Yeah, I think + that''s actually an interesting question. 
And", "tokens": [50364, 7032, 30, 708, + 390, 309, 4671, 30, 865, 11, 286, 519, 300, 311, 767, 364, 1880, 1168, 13, 400, + 50682], "temperature": 0.0, "avg_logprob": -0.17747703620365687, "compression_ratio": + 1.6175438596491227, "no_speech_prob": 0.04353488236665726}, {"id": 19, "seek": 13824, + "start": 144.60000000000002, "end": 150.52, "text": " I mean, usually when I''m + speaking at conferences, I do have this about me slide, which obviously", "tokens": + [50682, 286, 914, 11, 2673, 562, 286, 478, 4124, 412, 22032, 11, 286, 360, 362, + 341, 466, 385, 4137, 11, 597, 2745, 50978], "temperature": 0.0, "avg_logprob": -0.17747703620365687, + "compression_ratio": 1.6175438596491227, "no_speech_prob": 0.04353488236665726}, + {"id": 20, "seek": 13824, "start": 150.52, "end": 155.68, "text": " I tend to just + get through because I feel like it''s so repetitive now, of having presented", "tokens": + [50978, 286, 3928, 281, 445, 483, 807, 570, 286, 841, 411, 309, 311, 370, 29404, + 586, 11, 295, 1419, 8212, 51236], "temperature": 0.0, "avg_logprob": -0.17747703620365687, + "compression_ratio": 1.6175438596491227, "no_speech_prob": 0.04353488236665726}, + {"id": 21, "seek": 13824, "start": 155.68, "end": 162.60000000000002, "text": " + so many times. 
But I think I never really talk about as to how I started where I + started.", "tokens": [51236, 370, 867, 1413, 13, 583, 286, 519, 286, 1128, 534, + 751, 466, 382, 281, 577, 286, 1409, 689, 286, 1409, 13, 51582], "temperature": 0.0, + "avg_logprob": -0.17747703620365687, "compression_ratio": 1.6175438596491227, "no_speech_prob": + 0.04353488236665726}, {"id": 22, "seek": 13824, "start": 162.60000000000002, "end": + 168.08, "text": " And I think it would be nice if it was documented somewhere to + look at when I get older and", "tokens": [51582, 400, 286, 519, 309, 576, 312, 1481, + 498, 309, 390, 23007, 4079, 281, 574, 412, 562, 286, 483, 4906, 293, 51856], "temperature": + 0.0, "avg_logprob": -0.17747703620365687, "compression_ratio": 1.6175438596491227, + "no_speech_prob": 0.04353488236665726}, {"id": 23, "seek": 16808, "start": 168.08, + "end": 173.52, "text": " I can tell my kids or my, you know, grandkids that this + is what I did it. Thanks to you for", "tokens": [50364, 286, 393, 980, 452, 2301, + 420, 452, 11, 291, 458, 11, 2697, 35015, 300, 341, 307, 437, 286, 630, 309, 13, + 2561, 281, 291, 337, 50636], "temperature": 0.0, "avg_logprob": -0.21099816197934357, + "compression_ratio": 1.5148936170212766, "no_speech_prob": 0.011438341811299324}, + {"id": 24, "seek": 16808, "start": 173.52, "end": 179.20000000000002, "text": " + that. By the way, I''m also one of the regular subscribers of the Vector Podcast. 
+ Absolutely", "tokens": [50636, 300, 13, 3146, 264, 636, 11, 286, 478, 611, 472, + 295, 264, 3890, 11092, 295, 264, 691, 20814, 29972, 13, 7021, 50920], "temperature": + 0.0, "avg_logprob": -0.21099816197934357, "compression_ratio": 1.5148936170212766, + "no_speech_prob": 0.011438341811299324}, {"id": 25, "seek": 16808, "start": 179.20000000000002, + "end": 185.8, "text": " all your episodes and obviously whenever you publish, I''m + like probably the first ones", "tokens": [50920, 439, 428, 9313, 293, 2745, 5699, + 291, 11374, 11, 286, 478, 411, 1391, 264, 700, 2306, 51250], "temperature": 0.0, + "avg_logprob": -0.21099816197934357, "compression_ratio": 1.5148936170212766, "no_speech_prob": + 0.011438341811299324}, {"id": 26, "seek": 16808, "start": 185.8, "end": 193.0, "text": + " to check them out. So how I started is kind of interesting. I was a master''s + students", "tokens": [51250, 281, 1520, 552, 484, 13, 407, 577, 286, 1409, 307, + 733, 295, 1880, 13, 286, 390, 257, 4505, 311, 1731, 51610], "temperature": 0.0, + "avg_logprob": -0.21099816197934357, "compression_ratio": 1.5148936170212766, "no_speech_prob": + 0.011438341811299324}, {"id": 27, "seek": 19300, "start": 193.0, "end": 198.24, + "text": " and I was supposed to finish my master''s, that is, a master''s in computer + application in", "tokens": [50364, 293, 286, 390, 3442, 281, 2413, 452, 4505, 311, + 11, 300, 307, 11, 257, 4505, 311, 294, 3820, 3861, 294, 50626], "temperature": 0.0, + "avg_logprob": -0.22142265153967816, "compression_ratio": 1.5619469026548674, "no_speech_prob": + 0.030503883957862854}, {"id": 28, "seek": 19300, "start": 198.24, "end": 205.92, + "text": " 2008. 
However, our college, which is, I mean, I am from one of the top + notch institutes,", "tokens": [50626, 10389, 13, 2908, 11, 527, 3859, 11, 597, 307, + 11, 286, 914, 11, 286, 669, 490, 472, 295, 264, 1192, 26109, 4348, 1819, 11, 51010], + "temperature": 0.0, "avg_logprob": -0.22142265153967816, "compression_ratio": 1.5619469026548674, + "no_speech_prob": 0.030503883957862854}, {"id": 29, "seek": 19300, "start": 205.92, + "end": 212.16, "text": " which has like a common attitude test. It''s like about + 400 K people every year take that", "tokens": [51010, 597, 575, 411, 257, 2689, + 10157, 1500, 13, 467, 311, 411, 466, 8423, 591, 561, 633, 1064, 747, 300, 51322], + "temperature": 0.0, "avg_logprob": -0.22142265153967816, "compression_ratio": 1.5619469026548674, + "no_speech_prob": 0.030503883957862854}, {"id": 30, "seek": 19300, "start": 212.16, + "end": 218.16, "text": " test and about 100 people selected. And I was one of them. + So obviously it was already", "tokens": [51322, 1500, 293, 466, 2319, 561, 8209, + 13, 400, 286, 390, 472, 295, 552, 13, 407, 2745, 309, 390, 1217, 51622], "temperature": + 0.0, "avg_logprob": -0.22142265153967816, "compression_ratio": 1.5619469026548674, + "no_speech_prob": 0.030503883957862854}, {"id": 31, "seek": 21816, "start": 218.16, + "end": 224.64, "text": " very prestigious. And we had this culture that, you know, + if the course is for like three years,", "tokens": [50364, 588, 33510, 13, 400, + 321, 632, 341, 3713, 300, 11, 291, 458, 11, 498, 264, 1164, 307, 337, 411, 1045, + 924, 11, 50688], "temperature": 0.0, "avg_logprob": -0.13065780577112418, "compression_ratio": + 1.7259786476868328, "no_speech_prob": 0.45448753237724304}, {"id": 32, "seek": 21816, + "start": 224.64, "end": 230.32, "text": " that is full time course, we would already + get our placements in the year two. 
So the company", "tokens": [50688, 300, 307, + 1577, 565, 1164, 11, 321, 576, 1217, 483, 527, 20831, 6400, 294, 264, 1064, 732, + 13, 407, 264, 2237, 50972], "temperature": 0.0, "avg_logprob": -0.13065780577112418, + "compression_ratio": 1.7259786476868328, "no_speech_prob": 0.45448753237724304}, + {"id": 33, "seek": 21816, "start": 230.32, "end": 236.0, "text": " that I got placement + with was a very small company and I think I have some sort of, you know,", "tokens": + [50972, 300, 286, 658, 17257, 365, 390, 257, 588, 1359, 2237, 293, 286, 519, 286, + 362, 512, 1333, 295, 11, 291, 458, 11, 51256], "temperature": 0.0, "avg_logprob": + -0.13065780577112418, "compression_ratio": 1.7259786476868328, "no_speech_prob": + 0.45448753237724304}, {"id": 34, "seek": 21816, "start": 236.0, "end": 240.56, "text": + " radar that I''m always attracted to small companies because I feel like I get + a lot of accountability,", "tokens": [51256, 16544, 300, 286, 478, 1009, 15912, + 281, 1359, 3431, 570, 286, 841, 411, 286, 483, 257, 688, 295, 19380, 11, 51484], + "temperature": 0.0, "avg_logprob": -0.13065780577112418, "compression_ratio": 1.7259786476868328, + "no_speech_prob": 0.45448753237724304}, {"id": 35, "seek": 21816, "start": 240.56, + "end": 247.44, "text": " a lot of things to do. Apart from the stated rule in my + job offer, which is what I kind of like as", "tokens": [51484, 257, 688, 295, 721, + 281, 360, 13, 24111, 490, 264, 11323, 4978, 294, 452, 1691, 2626, 11, 597, 307, + 437, 286, 733, 295, 411, 382, 51828], "temperature": 0.0, "avg_logprob": -0.13065780577112418, + "compression_ratio": 1.7259786476868328, "no_speech_prob": 0.45448753237724304}, + {"id": 36, "seek": 24744, "start": 247.44, "end": 254.07999999999998, "text": " + well. So it was interesting that I also reached out to them in 2007. 
So I was supposed + to complete", "tokens": [50364, 731, 13, 407, 309, 390, 1880, 300, 286, 611, 6488, + 484, 281, 552, 294, 12656, 13, 407, 286, 390, 3442, 281, 3566, 50696], "temperature": + 0.0, "avg_logprob": -0.10158309739889558, "compression_ratio": 1.861244019138756, + "no_speech_prob": 0.008128110319375992}, {"id": 37, "seek": 24744, "start": 254.07999999999998, + "end": 260.0, "text": " in 2008, but I reached out to them in 2007 itself that I + have to complete my industrial project,", "tokens": [50696, 294, 10389, 11, 457, + 286, 6488, 484, 281, 552, 294, 12656, 2564, 300, 286, 362, 281, 3566, 452, 9987, + 1716, 11, 50992], "temperature": 0.0, "avg_logprob": -0.10158309739889558, "compression_ratio": + 1.861244019138756, "no_speech_prob": 0.008128110319375992}, {"id": 38, "seek": 24744, + "start": 260.0, "end": 266.24, "text": " which is supposed to be like a dissertation + thing that you do in PhD. So it was supposed to be a", "tokens": [50992, 597, 307, + 3442, 281, 312, 411, 257, 39555, 551, 300, 291, 360, 294, 14476, 13, 407, 309, 390, + 3442, 281, 312, 257, 51304], "temperature": 0.0, "avg_logprob": -0.10158309739889558, + "compression_ratio": 1.861244019138756, "no_speech_prob": 0.008128110319375992}, + {"id": 39, "seek": 24744, "start": 266.24, "end": 271.52, "text": " real life project. + And I reached out to them that can I join the company and like kind of do the", + "tokens": [51304, 957, 993, 1716, 13, 400, 286, 6488, 484, 281, 552, 300, 393, 286, + 3917, 264, 2237, 293, 411, 733, 295, 360, 264, 51568], "temperature": 0.0, "avg_logprob": + -0.10158309739889558, "compression_ratio": 1.861244019138756, "no_speech_prob": + 0.008128110319375992}, {"id": 40, "seek": 27152, "start": 271.52, "end": 277.59999999999997, + "text": " training. And I mean, it was really nice of them to let me come. 
However, + they didn''t really have", "tokens": [50364, 3097, 13, 400, 286, 914, 11, 309, 390, + 534, 1481, 295, 552, 281, 718, 385, 808, 13, 2908, 11, 436, 994, 380, 534, 362, + 50668], "temperature": 0.0, "avg_logprob": -0.21913656254404598, "compression_ratio": + 1.5958333333333334, "no_speech_prob": 0.008976553566753864}, {"id": 41, "seek": + 27152, "start": 277.59999999999997, "end": 285.2, "text": " any kind of like training + programs. And they were experimenting with the solar and losing and", "tokens": + [50668, 604, 733, 295, 411, 3097, 4268, 13, 400, 436, 645, 29070, 365, 264, 7936, + 293, 7027, 293, 51048], "temperature": 0.0, "avg_logprob": -0.21913656254404598, + "compression_ratio": 1.5958333333333334, "no_speech_prob": 0.008976553566753864}, + {"id": 42, "seek": 27152, "start": 285.2, "end": 289.76, "text": " blown and zo + at that point in time. I''m not sure how many people would really know about zooping", + "tokens": [51048, 16479, 293, 5721, 412, 300, 935, 294, 565, 13, 286, 478, 406, + 988, 577, 867, 561, 576, 534, 458, 466, 5721, 26125, 51276], "temperature": 0.0, + "avg_logprob": -0.21913656254404598, "compression_ratio": 1.5958333333333334, "no_speech_prob": + 0.008976553566753864}, {"id": 43, "seek": 27152, "start": 289.76, "end": 296.96, + "text": " alone. They are like the Python based. I don''t I don''t at least. Yeah, + that''s actually because", "tokens": [51276, 3312, 13, 814, 366, 411, 264, 15329, + 2361, 13, 286, 500, 380, 286, 500, 380, 412, 1935, 13, 865, 11, 300, 311, 767, 570, + 51636], "temperature": 0.0, "avg_logprob": -0.21913656254404598, "compression_ratio": + 1.5958333333333334, "no_speech_prob": 0.008976553566753864}, {"id": 44, "seek": + 29696, "start": 296.96, "end": 303.35999999999996, "text": " it was really a thing + back then. 
So it is a content based content management system, which is based", + "tokens": [50364, 309, 390, 534, 257, 551, 646, 550, 13, 407, 309, 307, 257, 2701, + 2361, 2701, 4592, 1185, 11, 597, 307, 2361, 50684], "temperature": 0.0, "avg_logprob": + -0.1463161377679734, "compression_ratio": 1.7945205479452055, "no_speech_prob": + 0.0030420497059822083}, {"id": 45, "seek": 29696, "start": 303.35999999999996, "end": + 310.64, "text": " on Python. And at that point in time, you know, Java was really + a thing back then when I started in", "tokens": [50684, 322, 15329, 13, 400, 412, + 300, 935, 294, 565, 11, 291, 458, 11, 10745, 390, 534, 257, 551, 646, 550, 562, + 286, 1409, 294, 51048], "temperature": 0.0, "avg_logprob": -0.1463161377679734, + "compression_ratio": 1.7945205479452055, "no_speech_prob": 0.0030420497059822083}, + {"id": 46, "seek": 29696, "start": 310.64, "end": 316.56, "text": " 2007, eight. + So having worked on Python, I was like, why am I working on something which is like", + "tokens": [51048, 12656, 11, 3180, 13, 407, 1419, 2732, 322, 15329, 11, 286, 390, + 411, 11, 983, 669, 286, 1364, 322, 746, 597, 307, 411, 51344], "temperature": 0.0, + "avg_logprob": -0.1463161377679734, "compression_ratio": 1.7945205479452055, "no_speech_prob": + 0.0030420497059822083}, {"id": 47, "seek": 29696, "start": 316.56, "end": 321.59999999999997, + "text": " Python? I mean, I want to work on Java, you know, J to E. That is really + a thing like build cool", "tokens": [51344, 15329, 30, 286, 914, 11, 286, 528, 281, + 589, 322, 10745, 11, 291, 458, 11, 508, 281, 462, 13, 663, 307, 534, 257, 551, 411, + 1322, 1627, 51596], "temperature": 0.0, "avg_logprob": -0.1463161377679734, "compression_ratio": + 1.7945205479452055, "no_speech_prob": 0.0030420497059822083}, {"id": 48, "seek": + 32160, "start": 321.68, "end": 328.8, "text": " applications. 
But because I was + a trainee and they could obviously, you know, kind of modulate,", "tokens": [50368, + 5821, 13, 583, 570, 286, 390, 257, 40350, 293, 436, 727, 2745, 11, 291, 458, 11, + 733, 295, 1072, 5256, 11, 50724], "temperature": 0.0, "avg_logprob": -0.14958092440729556, + "compression_ratio": 1.7391304347826086, "no_speech_prob": 0.006743357516825199}, + {"id": 49, "seek": 32160, "start": 328.8, "end": 333.20000000000005, "text": " you + know, my role, they asked me to research on, you know, solar and losing because + they were coming", "tokens": [50724, 291, 458, 11, 452, 3090, 11, 436, 2351, 385, + 281, 2132, 322, 11, 291, 458, 11, 7936, 293, 7027, 570, 436, 645, 1348, 50944], + "temperature": 0.0, "avg_logprob": -0.14958092440729556, "compression_ratio": 1.7391304347826086, + "no_speech_prob": 0.006743357516825199}, {"id": 50, "seek": 32160, "start": 333.20000000000005, + "end": 341.68, "text": " up with the social networking website application. So Orkut + was leaving the space. Facebook was coming", "tokens": [50944, 493, 365, 264, 2093, + 17985, 3144, 3861, 13, 407, 1610, 74, 325, 390, 5012, 264, 1901, 13, 4384, 390, + 1348, 51368], "temperature": 0.0, "avg_logprob": -0.14958092440729556, "compression_ratio": + 1.7391304347826086, "no_speech_prob": 0.006743357516825199}, {"id": 51, "seek": + 32160, "start": 341.68, "end": 346.96000000000004, "text": " in and we were working + on this Facebook application, which could let two people talk without knowing", + "tokens": [51368, 294, 293, 321, 645, 1364, 322, 341, 4384, 3861, 11, 597, 727, + 718, 732, 561, 751, 1553, 5276, 51632], "temperature": 0.0, "avg_logprob": -0.14958092440729556, + "compression_ratio": 1.7391304347826086, "no_speech_prob": 0.006743357516825199}, + {"id": 52, "seek": 34696, "start": 346.96, "end": 352.47999999999996, "text": " + each other''s number. 
And that was all through like what would pop in my, you know, + profile,", "tokens": [50364, 1184, 661, 311, 1230, 13, 400, 300, 390, 439, 807, + 411, 437, 576, 1665, 294, 452, 11, 291, 458, 11, 7964, 11, 50640], "temperature": + 0.0, "avg_logprob": -0.12085545857747396, "compression_ratio": 1.7849056603773585, + "no_speech_prob": 0.003413987811654806}, {"id": 53, "seek": 34696, "start": 352.47999999999996, + "end": 356.79999999999995, "text": " like who are the people who are close to me? + And this was all supposed to be based out of", "tokens": [50640, 411, 567, 366, + 264, 561, 567, 366, 1998, 281, 385, 30, 400, 341, 390, 439, 3442, 281, 312, 2361, + 484, 295, 50856], "temperature": 0.0, "avg_logprob": -0.12085545857747396, "compression_ratio": + 1.7849056603773585, "no_speech_prob": 0.003413987811654806}, {"id": 54, "seek": + 34696, "start": 356.79999999999995, "end": 362.64, "text": " Lucine and solar. So + this is all, I mean, it started off with that I was literally pulling out all", + "tokens": [50856, 9593, 533, 293, 7936, 13, 407, 341, 307, 439, 11, 286, 914, 11, + 309, 1409, 766, 365, 300, 286, 390, 3736, 8407, 484, 439, 51148], "temperature": + 0.0, "avg_logprob": -0.12085545857747396, "compression_ratio": 1.7849056603773585, + "no_speech_prob": 0.003413987811654806}, {"id": 55, "seek": 34696, "start": 362.64, + "end": 369.91999999999996, "text": " my hair at that point in time because you can + imagine how immature solar was at that point in time.", "tokens": [51148, 452, 2578, + 412, 300, 935, 294, 565, 570, 291, 393, 3811, 577, 49539, 7936, 390, 412, 300, 935, + 294, 565, 13, 51512], "temperature": 0.0, "avg_logprob": -0.12085545857747396, "compression_ratio": + 1.7849056603773585, "no_speech_prob": 0.003413987811654806}, {"id": 56, "seek": + 34696, "start": 369.91999999999996, "end": 374.15999999999997, "text": " We didn''t + even know like which version of Lucine would go along with which version of solar.", + "tokens": [51512, 492, 994, 380, 754, 
458, 411, 597, 3037, 295, 9593, 533, 576, + 352, 2051, 365, 597, 3037, 295, 7936, 13, 51724], "temperature": 0.0, "avg_logprob": + -0.12085545857747396, "compression_ratio": 1.7849056603773585, "no_speech_prob": + 0.003413987811654806}, {"id": 57, "seek": 37416, "start": 374.16, "end": 378.08000000000004, + "text": " So we were trying and testing and there were so many things which were + missing as well.", "tokens": [50364, 407, 321, 645, 1382, 293, 4997, 293, 456, 645, + 370, 867, 721, 597, 645, 5361, 382, 731, 13, 50560], "temperature": 0.0, "avg_logprob": + -0.1528182470497965, "compression_ratio": 1.6843971631205674, "no_speech_prob": + 0.009581063874065876}, {"id": 58, "seek": 37416, "start": 378.88000000000005, "end": + 385.52000000000004, "text": " But I think I got pretty much, you know, soaked up + even though I at first I did not find all", "tokens": [50600, 583, 286, 519, 286, + 658, 1238, 709, 11, 291, 458, 11, 27368, 493, 754, 1673, 286, 412, 700, 286, 630, + 406, 915, 439, 50932], "temperature": 0.0, "avg_logprob": -0.1528182470497965, "compression_ratio": + 1.6843971631205674, "no_speech_prob": 0.009581063874065876}, {"id": 59, "seek": + 37416, "start": 385.52000000000004, "end": 390.72, "text": " of that interesting + because my friends were doing Java, G2E and dot nets. 
So I was like I''m missing", + "tokens": [50932, 295, 300, 1880, 570, 452, 1855, 645, 884, 10745, 11, 460, 17, + 36, 293, 5893, 36170, 13, 407, 286, 390, 411, 286, 478, 5361, 51192], "temperature": + 0.0, "avg_logprob": -0.1528182470497965, "compression_ratio": 1.6843971631205674, + "no_speech_prob": 0.009581063874065876}, {"id": 60, "seek": 37416, "start": 390.72, + "end": 395.28000000000003, "text": " out on something I would, you know, catch up + with them and they would talk about all, you know,", "tokens": [51192, 484, 322, + 746, 286, 576, 11, 291, 458, 11, 3745, 493, 365, 552, 293, 436, 576, 751, 466, 439, + 11, 291, 458, 11, 51420], "temperature": 0.0, "avg_logprob": -0.1528182470497965, + "compression_ratio": 1.6843971631205674, "no_speech_prob": 0.009581063874065876}, + {"id": 61, "seek": 37416, "start": 395.28000000000003, "end": 399.76000000000005, + "text": " cool applications that they''re building and how database connections, + etc. were working. And I was", "tokens": [51420, 1627, 5821, 300, 436, 434, 2390, + 293, 577, 8149, 9271, 11, 5183, 13, 645, 1364, 13, 400, 286, 390, 51644], "temperature": + 0.0, "avg_logprob": -0.1528182470497965, "compression_ratio": 1.6843971631205674, + "no_speech_prob": 0.009581063874065876}, {"id": 62, "seek": 39976, "start": 399.76, + "end": 404.8, "text": " talking about yeah, I''m building this, you know, data and + we''re trying to locate people on Google", "tokens": [50364, 1417, 466, 1338, 11, + 286, 478, 2390, 341, 11, 291, 458, 11, 1412, 293, 321, 434, 1382, 281, 22370, 561, + 322, 3329, 50616], "temperature": 0.0, "avg_logprob": -0.12723109698054766, "compression_ratio": + 1.6680851063829787, "no_speech_prob": 0.012591775506734848}, {"id": 63, "seek": + 39976, "start": 404.8, "end": 410.0, "text": " maps and yeah, Google is really a + thing. 
So it was like I''m speaking a different language altogether", "tokens": + [50616, 11317, 293, 1338, 11, 3329, 307, 534, 257, 551, 13, 407, 309, 390, 411, + 286, 478, 4124, 257, 819, 2856, 19051, 50876], "temperature": 0.0, "avg_logprob": + -0.12723109698054766, "compression_ratio": 1.6680851063829787, "no_speech_prob": + 0.012591775506734848}, {"id": 64, "seek": 39976, "start": 410.0, "end": 416.24, + "text": " and people were like what? So it was it was interesting though. So you + felt like an underdog or", "tokens": [50876, 293, 561, 645, 411, 437, 30, 407, 309, + 390, 309, 390, 1880, 1673, 13, 407, 291, 2762, 411, 364, 833, 14833, 420, 51188], + "temperature": 0.0, "avg_logprob": -0.12723109698054766, "compression_ratio": 1.6680851063829787, + "no_speech_prob": 0.012591775506734848}, {"id": 65, "seek": 39976, "start": 416.24, + "end": 422.96, "text": " something? Yeah, I felt like that. And on top of it, I + think the bigger challenge was that we had", "tokens": [51188, 746, 30, 865, 11, + 286, 2762, 411, 300, 13, 400, 322, 1192, 295, 309, 11, 286, 519, 264, 3801, 3430, + 390, 300, 321, 632, 51524], "temperature": 0.0, "avg_logprob": -0.12723109698054766, + "compression_ratio": 1.6680851063829787, "no_speech_prob": 0.012591775506734848}, + {"id": 66, "seek": 42296, "start": 422.96, "end": 431.2, "text": " this guy who + was basically from the ontology''s word. So semantic web was very underplayed at + that", "tokens": [50364, 341, 2146, 567, 390, 1936, 490, 264, 6592, 1793, 311, 1349, + 13, 407, 47982, 3670, 390, 588, 833, 2858, 292, 412, 300, 50776], "temperature": + 0.0, "avg_logprob": -0.12622997164726257, "compression_ratio": 1.6291666666666667, + "no_speech_prob": 0.010736115276813507}, {"id": 67, "seek": 42296, "start": 431.2, + "end": 436.47999999999996, "text": " point in time. 
I think right now it''s like + coming up as if something really fancy and all of these", "tokens": [50776, 935, + 294, 565, 13, 286, 519, 558, 586, 309, 311, 411, 1348, 493, 382, 498, 746, 534, + 10247, 293, 439, 295, 613, 51040], "temperature": 0.0, "avg_logprob": -0.12622997164726257, + "compression_ratio": 1.6291666666666667, "no_speech_prob": 0.010736115276813507}, + {"id": 68, "seek": 42296, "start": 436.47999999999996, "end": 443.12, "text": " + things that are really seen in like big light were not really known as they were. + So I was asked to", "tokens": [51040, 721, 300, 366, 534, 1612, 294, 411, 955, 1442, + 645, 406, 534, 2570, 382, 436, 645, 13, 407, 286, 390, 2351, 281, 51372], "temperature": + 0.0, "avg_logprob": -0.12622997164726257, "compression_ratio": 1.6291666666666667, + "no_speech_prob": 0.010736115276813507}, {"id": 69, "seek": 42296, "start": 443.12, + "end": 449.03999999999996, "text": " find, you know, like the application has this + feature that I could place people in the circle.", "tokens": [51372, 915, 11, 291, + 458, 11, 411, 264, 3861, 575, 341, 4111, 300, 286, 727, 1081, 561, 294, 264, 6329, + 13, 51668], "temperature": 0.0, "avg_logprob": -0.12622997164726257, "compression_ratio": + 1.6291666666666667, "no_speech_prob": 0.010736115276813507}, {"id": 70, "seek": + 44904, "start": 449.04, "end": 455.52000000000004, "text": " Like for forse really + a thing friend of a friend. So the major I think the breakdown for me was,", "tokens": + [50364, 1743, 337, 337, 405, 534, 257, 551, 1277, 295, 257, 1277, 13, 407, 264, + 2563, 286, 519, 264, 18188, 337, 385, 390, 11, 50688], "temperature": 0.0, "avg_logprob": + -0.18876303395917338, "compression_ratio": 1.694915254237288, "no_speech_prob": + 0.008275278843939304}, {"id": 71, "seek": 44904, "start": 455.52000000000004, "end": + 462.32000000000005, "text": " you know, dealing with the relationship. 
Every person + is a document and finding relationship between", "tokens": [50688, 291, 458, 11, + 6260, 365, 264, 2480, 13, 2048, 954, 307, 257, 4166, 293, 5006, 2480, 1296, 51028], + "temperature": 0.0, "avg_logprob": -0.18876303395917338, "compression_ratio": 1.694915254237288, + "no_speech_prob": 0.008275278843939304}, {"id": 72, "seek": 44904, "start": 462.32000000000005, + "end": 469.36, "text": " these documents was something that was given to me. And + I was like why why God why am I supposed to", "tokens": [51028, 613, 8512, 390, + 746, 300, 390, 2212, 281, 385, 13, 400, 286, 390, 411, 983, 983, 1265, 983, 669, + 286, 3442, 281, 51380], "temperature": 0.0, "avg_logprob": -0.18876303395917338, + "compression_ratio": 1.694915254237288, "no_speech_prob": 0.008275278843939304}, + {"id": 73, "seek": 44904, "start": 469.36, "end": 475.36, "text": " you know, do + this thing? So relationships and ontologies and visualizing this stuff. So we implemented", + "tokens": [51380, 291, 458, 11, 360, 341, 551, 30, 407, 6159, 293, 6592, 6204, 293, + 5056, 3319, 341, 1507, 13, 407, 321, 12270, 51680], "temperature": 0.0, "avg_logprob": + -0.18876303395917338, "compression_ratio": 1.694915254237288, "no_speech_prob": + 0.008275278843939304}, {"id": 74, "seek": 47536, "start": 475.36, "end": 482.08000000000004, + "text": " this visual map using cluster map API back then. And I mean, now when + I look back, I feel like", "tokens": [50364, 341, 5056, 4471, 1228, 13630, 4471, + 9362, 646, 550, 13, 400, 286, 914, 11, 586, 562, 286, 574, 646, 11, 286, 841, 411, + 50700], "temperature": 0.0, "avg_logprob": -0.20343955682248485, "compression_ratio": + 1.625531914893617, "no_speech_prob": 0.014307098463177681}, {"id": 75, "seek": 47536, + "start": 482.08000000000004, "end": 487.68, "text": " that was like very cool stuff + that I did back then. It sounds very cool actually. 
It is.", "tokens": [50700, 300, + 390, 411, 588, 1627, 1507, 300, 286, 630, 646, 550, 13, 467, 3263, 588, 1627, 767, + 13, 467, 307, 13, 50980], "temperature": 0.0, "avg_logprob": -0.20343955682248485, + "compression_ratio": 1.625531914893617, "no_speech_prob": 0.014307098463177681}, + {"id": 76, "seek": 47536, "start": 487.68, "end": 493.28000000000003, "text": " + Modeling Graph using Lucine. It''s like not necessarily something people do or at + least I don''t know", "tokens": [50980, 6583, 11031, 21884, 1228, 441, 1311, 533, + 13, 467, 311, 411, 406, 4725, 746, 561, 360, 420, 412, 1935, 286, 500, 380, 458, + 51260], "temperature": 0.0, "avg_logprob": -0.20343955682248485, "compression_ratio": + 1.625531914893617, "no_speech_prob": 0.014307098463177681}, {"id": 77, "seek": 47536, + "start": 493.28000000000003, "end": 499.28000000000003, "text": " about that. Actually, + and we did not really have any cluster monitoring tools as well. So we built", "tokens": + [51260, 466, 300, 13, 5135, 11, 293, 321, 630, 406, 534, 362, 604, 13630, 11028, + 3873, 382, 731, 13, 407, 321, 3094, 51560], "temperature": 0.0, "avg_logprob": -0.20343955682248485, + "compression_ratio": 1.625531914893617, "no_speech_prob": 0.014307098463177681}, + {"id": 78, "seek": 49928, "start": 499.28, "end": 506.4, "text": " something by + ourselves as well. So using GraphWiz, we built, you know, like how each of the clusters", + "tokens": [50364, 746, 538, 4175, 382, 731, 13, 407, 1228, 21884, 54, 590, 11, 321, + 3094, 11, 291, 458, 11, 411, 577, 1184, 295, 264, 23313, 50720], "temperature": + 0.0, "avg_logprob": -0.1289782831745763, "compression_ratio": 1.7517482517482517, + "no_speech_prob": 0.0013217147206887603}, {"id": 79, "seek": 49928, "start": 506.4, + "end": 512.4, "text": " are doing. 
So we had this thing that obviously cluster was + not something that solar supported back", "tokens": [50720, 366, 884, 13, 407, 321, + 632, 341, 551, 300, 2745, 13630, 390, 406, 746, 300, 7936, 8104, 646, 51020], "temperature": + 0.0, "avg_logprob": -0.1289782831745763, "compression_ratio": 1.7517482517482517, + "no_speech_prob": 0.0013217147206887603}, {"id": 80, "seek": 49928, "start": 512.4, + "end": 517.52, "text": " then, but we actually made our own cluster. But one of + the things that I would also like to mention", "tokens": [51020, 550, 11, 457, 321, + 767, 1027, 527, 1065, 13630, 13, 583, 472, 295, 264, 721, 300, 286, 576, 611, 411, + 281, 2152, 51276], "temperature": 0.0, "avg_logprob": -0.1289782831745763, "compression_ratio": + 1.7517482517482517, "no_speech_prob": 0.0013217147206887603}, {"id": 81, "seek": + 49928, "start": 517.52, "end": 523.12, "text": " here is that we we did not really, + I mean, at least I was or my manager was not really aware of like", "tokens": [51276, + 510, 307, 300, 321, 321, 630, 406, 534, 11, 286, 914, 11, 412, 1935, 286, 390, 420, + 452, 6598, 390, 406, 534, 3650, 295, 411, 51556], "temperature": 0.0, "avg_logprob": + -0.1289782831745763, "compression_ratio": 1.7517482517482517, "no_speech_prob": + 0.0013217147206887603}, {"id": 82, "seek": 49928, "start": 523.12, "end": 528.0799999999999, + "text": " all of this could be contributed to open source. So we were like living + in our own world, trying to,", "tokens": [51556, 439, 295, 341, 727, 312, 18434, + 281, 1269, 4009, 13, 407, 321, 645, 411, 2647, 294, 527, 1065, 1002, 11, 1382, 281, + 11, 51804], "temperature": 0.0, "avg_logprob": -0.1289782831745763, "compression_ratio": + 1.7517482517482517, "no_speech_prob": 0.0013217147206887603}, {"id": 83, "seek": + 52808, "start": 528.64, "end": 534.5600000000001, "text": " like build something + really cool only for the client, but not really for I think a public. 
And I", "tokens": + [50392, 411, 1322, 746, 534, 1627, 787, 337, 264, 6423, 11, 457, 406, 534, 337, + 286, 519, 257, 1908, 13, 400, 286, 50688], "temperature": 0.0, "avg_logprob": -0.16743523183495107, + "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.005527488421648741}, + {"id": 84, "seek": 52808, "start": 534.5600000000001, "end": 541.6, "text": " think + this is something that came way later in my life. Yeah, I guess, I guess probably + like before", "tokens": [50688, 519, 341, 307, 746, 300, 1361, 636, 1780, 294, 452, + 993, 13, 865, 11, 286, 2041, 11, 286, 2041, 1391, 411, 949, 51040], "temperature": + 0.0, "avg_logprob": -0.16743523183495107, "compression_ratio": 1.6594827586206897, + "no_speech_prob": 0.005527488421648741}, {"id": 85, "seek": 52808, "start": 541.6, + "end": 547.9200000000001, "text": " you contribute, at least how I feel myself when + I also doubled in solar a bit, you know, in the", "tokens": [51040, 291, 10586, + 11, 412, 1935, 577, 286, 841, 2059, 562, 286, 611, 24405, 294, 7936, 257, 857, 11, + 291, 458, 11, 294, 264, 51356], "temperature": 0.0, "avg_logprob": -0.16743523183495107, + "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.005527488421648741}, + {"id": 86, "seek": 52808, "start": 547.9200000000001, "end": 554.88, "text": " beginning + and then it took 10 years of in my life. I actually don''t want to say off my life.", + "tokens": [51356, 2863, 293, 550, 309, 1890, 1266, 924, 295, 294, 452, 993, 13, + 286, 767, 500, 380, 528, 281, 584, 766, 452, 993, 13, 51704], "temperature": 0.0, + "avg_logprob": -0.16743523183495107, "compression_ratio": 1.6594827586206897, "no_speech_prob": + 0.005527488421648741}, {"id": 87, "seek": 55488, "start": 554.88, "end": 561.4399999999999, + "text": " It sounds so negative. 
But like, you know, in the beginning, you still + need, if you''re like a", "tokens": [50364, 467, 3263, 370, 3671, 13, 583, 411, + 11, 291, 458, 11, 294, 264, 2863, 11, 291, 920, 643, 11, 498, 291, 434, 411, 257, + 50692], "temperature": 0.0, "avg_logprob": -0.20008695502030222, "compression_ratio": + 1.6782608695652175, "no_speech_prob": 0.01816609315574169}, {"id": 88, "seek": 55488, + "start": 561.4399999999999, "end": 565.6, "text": " startup or something, you still + need to figure out whether this works or not, right? Whether this", "tokens": [50692, + 18578, 420, 746, 11, 291, 920, 643, 281, 2573, 484, 1968, 341, 1985, 420, 406, 11, + 558, 30, 8503, 341, 50900], "temperature": 0.0, "avg_logprob": -0.20008695502030222, + "compression_ratio": 1.6782608695652175, "no_speech_prob": 0.01816609315574169}, + {"id": 89, "seek": 55488, "start": 565.6, "end": 570.56, "text": " solves some needs + for your users, how much of this you want to still keep as a business secret,", + "tokens": [50900, 39890, 512, 2203, 337, 428, 5022, 11, 577, 709, 295, 341, 291, + 528, 281, 920, 1066, 382, 257, 1606, 4054, 11, 51148], "temperature": 0.0, "avg_logprob": + -0.20008695502030222, "compression_ratio": 1.6782608695652175, "no_speech_prob": + 0.01816609315574169}, {"id": 90, "seek": 55488, "start": 570.56, "end": 577.92, + "text": " how much is okay to contribute because you might see even more development + in this, right? And get", "tokens": [51148, 577, 709, 307, 1392, 281, 10586, 570, + 291, 1062, 536, 754, 544, 3250, 294, 341, 11, 558, 30, 400, 483, 51516], "temperature": + 0.0, "avg_logprob": -0.20008695502030222, "compression_ratio": 1.6782608695652175, + "no_speech_prob": 0.01816609315574169}, {"id": 91, "seek": 57792, "start": 577.92, + "end": 584.0, "text": " to feedback. Absolutely. Absolutely. 
And, you know, having + joined a company that was not really like,", "tokens": [50364, 281, 5824, 13, 7021, + 13, 7021, 13, 400, 11, 291, 458, 11, 1419, 6869, 257, 2237, 300, 390, 406, 534, + 411, 11, 50668], "temperature": 0.0, "avg_logprob": -0.14772834199847598, "compression_ratio": + 1.7079646017699115, "no_speech_prob": 0.04782097041606903}, {"id": 92, "seek": 57792, + "start": 584.0, "end": 590.88, "text": " you know, big companies back in India, + I think maybe you would get an idea as to how cool or,", "tokens": [50668, 291, + 458, 11, 955, 3431, 646, 294, 5282, 11, 286, 519, 1310, 291, 576, 483, 364, 1558, + 382, 281, 577, 1627, 420, 11, 51012], "temperature": 0.0, "avg_logprob": -0.14772834199847598, + "compression_ratio": 1.7079646017699115, "no_speech_prob": 0.04782097041606903}, + {"id": 93, "seek": 57792, "start": 590.88, "end": 596.9599999999999, "text": " you + know, how small the company was that in my induction program, like the first day + when I joined,", "tokens": [51012, 291, 458, 11, 577, 1359, 264, 2237, 390, 300, + 294, 452, 33371, 1461, 11, 411, 264, 700, 786, 562, 286, 6869, 11, 51316], "temperature": + 0.0, "avg_logprob": -0.14772834199847598, "compression_ratio": 1.7079646017699115, + "no_speech_prob": 0.04782097041606903}, {"id": 94, "seek": 57792, "start": 597.5999999999999, + "end": 602.64, "text": " I was being asked that, you know, grab a cup of coffee + and watch this movie. The movie was", "tokens": [51348, 286, 390, 885, 2351, 300, + 11, 291, 458, 11, 4444, 257, 4414, 295, 4982, 293, 1159, 341, 3169, 13, 440, 3169, + 390, 51600], "temperature": 0.0, "avg_logprob": -0.14772834199847598, "compression_ratio": + 1.7079646017699115, "no_speech_prob": 0.04782097041606903}, {"id": 95, "seek": 60264, + "start": 602.64, "end": 607.6, "text": " Pirates of Silicon Valley. 
So they said, + you know, we don''t want you to have any rocket science", "tokens": [50364, 24161, + 1024, 295, 25351, 10666, 13, 407, 436, 848, 11, 291, 458, 11, 321, 500, 380, 528, + 291, 281, 362, 604, 13012, 3497, 50612], "temperature": 0.0, "avg_logprob": -0.14641619092635527, + "compression_ratio": 1.7031802120141342, "no_speech_prob": 0.04054149240255356}, + {"id": 96, "seek": 60264, "start": 607.6, "end": 613.04, "text": " e-scales. We + were just make you, you know, learn all of that stuff. Just get this mindset. And + I", "tokens": [50612, 308, 12, 4417, 4229, 13, 492, 645, 445, 652, 291, 11, 291, + 458, 11, 1466, 439, 295, 300, 1507, 13, 1449, 483, 341, 12543, 13, 400, 286, 50884], + "temperature": 0.0, "avg_logprob": -0.14641619092635527, "compression_ratio": 1.7031802120141342, + "no_speech_prob": 0.04054149240255356}, {"id": 97, "seek": 60264, "start": 613.04, + "end": 618.48, "text": " think that''s what I tried at. I love that movie actually. + I think there are two versions of it,", "tokens": [50884, 519, 300, 311, 437, 286, + 3031, 412, 13, 286, 959, 300, 3169, 767, 13, 286, 519, 456, 366, 732, 9606, 295, + 309, 11, 51156], "temperature": 0.0, "avg_logprob": -0.14641619092635527, "compression_ratio": + 1.7031802120141342, "no_speech_prob": 0.04054149240255356}, {"id": 98, "seek": 60264, + "start": 618.48, "end": 623.52, "text": " right? The original and some kind of remake + if I''m not mistaken. Right. That''s correct. I think I", "tokens": [51156, 558, + 30, 440, 3380, 293, 512, 733, 295, 28582, 498, 286, 478, 406, 21333, 13, 1779, 13, + 663, 311, 3006, 13, 286, 519, 286, 51408], "temperature": 0.0, "avg_logprob": -0.14641619092635527, + "compression_ratio": 1.7031802120141342, "no_speech_prob": 0.04054149240255356}, + {"id": 99, "seek": 60264, "start": 623.52, "end": 630.48, "text": " watched the + original one. The original one is amazing. 
It''s like almost this kind of, you know,", + "tokens": [51408, 6337, 264, 3380, 472, 13, 440, 3380, 472, 307, 2243, 13, 467, + 311, 411, 1920, 341, 733, 295, 11, 291, 458, 11, 51756], "temperature": 0.0, "avg_logprob": + -0.14641619092635527, "compression_ratio": 1.7031802120141342, "no_speech_prob": + 0.04054149240255356}, {"id": 100, "seek": 63048, "start": 630.48, "end": 636.48, + "text": " it''s like a meditation. You go into that state of mind. Indeed. Yeah. + We have a lot to learn from", "tokens": [50364, 309, 311, 411, 257, 12537, 13, 509, + 352, 666, 300, 1785, 295, 1575, 13, 15061, 13, 865, 13, 492, 362, 257, 688, 281, + 1466, 490, 50664], "temperature": 0.0, "avg_logprob": -0.10801331931297932, "compression_ratio": + 1.646090534979424, "no_speech_prob": 0.01345206331461668}, {"id": 101, "seek": 63048, + "start": 636.48, "end": 643.44, "text": " that movie. Yeah. But I think just like + everyone else, I was also, you know, in India, I think we do", "tokens": [50664, + 300, 3169, 13, 865, 13, 583, 286, 519, 445, 411, 1518, 1646, 11, 286, 390, 611, + 11, 291, 458, 11, 294, 5282, 11, 286, 519, 321, 360, 51012], "temperature": 0.0, + "avg_logprob": -0.10801331931297932, "compression_ratio": 1.646090534979424, "no_speech_prob": + 0.01345206331461668}, {"id": 102, "seek": 63048, "start": 643.44, "end": 650.0, + "text": " have a lot of pressure of academically, you know, like building, grooming + ourselves. 
So when I started", "tokens": [51012, 362, 257, 688, 295, 3321, 295, + 48944, 11, 291, 458, 11, 411, 2390, 11, 49700, 4175, 13, 407, 562, 286, 1409, 51340], + "temperature": 0.0, "avg_logprob": -0.10801331931297932, "compression_ratio": 1.646090534979424, + "no_speech_prob": 0.01345206331461668}, {"id": 103, "seek": 63048, "start": 650.0, + "end": 658.0, "text": " in 2008 and then, you know, got married in 2009, had my + first gig in 2010, I think it was the time", "tokens": [51340, 294, 10389, 293, + 550, 11, 291, 458, 11, 658, 5259, 294, 11453, 11, 632, 452, 700, 8741, 294, 9657, + 11, 286, 519, 309, 390, 264, 565, 51740], "temperature": 0.0, "avg_logprob": -0.10801331931297932, + "compression_ratio": 1.646090534979424, "no_speech_prob": 0.01345206331461668}, + {"id": 104, "seek": 65800, "start": 658.0, "end": 664.96, "text": " when I had to + take a break. But when I did it come back in 2011, things were obviously had changed.", + "tokens": [50364, 562, 286, 632, 281, 747, 257, 1821, 13, 583, 562, 286, 630, 309, + 808, 646, 294, 10154, 11, 721, 645, 2745, 632, 3105, 13, 50712], "temperature": + 0.0, "avg_logprob": -0.11054019574765805, "compression_ratio": 1.6652542372881356, + "no_speech_prob": 0.017460757866501808}, {"id": 105, "seek": 65800, "start": 664.96, + "end": 670.88, "text": " And someone, you know, told me that, you know, it would + be a good idea to have more of, you know,", "tokens": [50712, 400, 1580, 11, 291, + 458, 11, 1907, 385, 300, 11, 291, 458, 11, 309, 576, 312, 257, 665, 1558, 281, 362, + 544, 295, 11, 291, 458, 11, 51008], "temperature": 0.0, "avg_logprob": -0.11054019574765805, + "compression_ratio": 1.6652542372881356, "no_speech_prob": 0.017460757866501808}, + {"id": 106, "seek": 65800, "start": 670.88, "end": 679.6, "text": " like a hands-off + kind of a role. And that made me think about, you know, going for an MBA. 
And I", + "tokens": [51008, 411, 257, 2377, 12, 4506, 733, 295, 257, 3090, 13, 400, 300, 1027, + 385, 519, 466, 11, 291, 458, 11, 516, 337, 364, 26674, 13, 400, 286, 51444], "temperature": + 0.0, "avg_logprob": -0.11054019574765805, "compression_ratio": 1.6652542372881356, + "no_speech_prob": 0.017460757866501808}, {"id": 107, "seek": 65800, "start": 679.6, + "end": 687.12, "text": " pursued MBA in 2014. And I decided that I would leave development + because it''s too demanding and I", "tokens": [51444, 34893, 26674, 294, 8227, 13, + 400, 286, 3047, 300, 286, 576, 1856, 3250, 570, 309, 311, 886, 19960, 293, 286, + 51820], "temperature": 0.0, "avg_logprob": -0.11054019574765805, "compression_ratio": + 1.6652542372881356, "no_speech_prob": 0.017460757866501808}, {"id": 108, "seek": + 68712, "start": 687.12, "end": 693.36, "text": " cannot manage that with a child. + And when I became a manager, I also took up a job as a manager.", "tokens": [50364, + 2644, 3067, 300, 365, 257, 1440, 13, 400, 562, 286, 3062, 257, 6598, 11, 286, 611, + 1890, 493, 257, 1691, 382, 257, 6598, 13, 50676], "temperature": 0.0, "avg_logprob": + -0.0980463799308328, "compression_ratio": 1.862962962962963, "no_speech_prob": 0.0033531850203871727}, + {"id": 109, "seek": 68712, "start": 693.36, "end": 698.88, "text": " I did that + for like two and a half months. 
And I was like pretty bugged because, I mean, obviously,", + "tokens": [50676, 286, 630, 300, 337, 411, 732, 293, 257, 1922, 2493, 13, 400, 286, + 390, 411, 1238, 7426, 3004, 570, 11, 286, 914, 11, 2745, 11, 50952], "temperature": + 0.0, "avg_logprob": -0.0980463799308328, "compression_ratio": 1.862962962962963, + "no_speech_prob": 0.0033531850203871727}, {"id": 110, "seek": 68712, "start": 698.88, + "end": 704.0, "text": " you know, once a developer, always a developer, I think + I started always, I mean, I felt like a little", "tokens": [50952, 291, 458, 11, + 1564, 257, 10754, 11, 1009, 257, 10754, 11, 286, 519, 286, 1409, 1009, 11, 286, + 914, 11, 286, 2762, 411, 257, 707, 51208], "temperature": 0.0, "avg_logprob": -0.0980463799308328, + "compression_ratio": 1.862962962962963, "no_speech_prob": 0.0033531850203871727}, + {"id": 111, "seek": 68712, "start": 704.0, "end": 710.64, "text": " bit, you know, + more triggered or more, you know, joy in seeing how things really work and not really", + "tokens": [51208, 857, 11, 291, 458, 11, 544, 21710, 420, 544, 11, 291, 458, 11, + 6258, 294, 2577, 577, 721, 534, 589, 293, 406, 534, 51540], "temperature": 0.0, + "avg_logprob": -0.0980463799308328, "compression_ratio": 1.862962962962963, "no_speech_prob": + 0.0033531850203871727}, {"id": 112, "seek": 68712, "start": 710.64, "end": 715.44, + "text": " by having, you know, said that, you know, this is how, this is what we + need to do with an application.", "tokens": [51540, 538, 1419, 11, 291, 458, 11, + 848, 300, 11, 291, 458, 11, 341, 307, 577, 11, 341, 307, 437, 321, 643, 281, 360, + 365, 364, 3861, 13, 51780], "temperature": 0.0, "avg_logprob": -0.0980463799308328, + "compression_ratio": 1.862962962962963, "no_speech_prob": 0.0033531850203871727}, + {"id": 113, "seek": 71544, "start": 715.44, "end": 720.5600000000001, "text": " + Like, this is the client requirement. 
This is the BRD, like a business requirement + document and then", "tokens": [50364, 1743, 11, 341, 307, 264, 6423, 11695, 13, + 639, 307, 264, 10262, 35, 11, 411, 257, 1606, 11695, 4166, 293, 550, 50620], "temperature": + 0.0, "avg_logprob": -0.11553638593285484, "compression_ratio": 1.7205882352941178, + "no_speech_prob": 0.008576691150665283}, {"id": 114, "seek": 71544, "start": 720.5600000000001, + "end": 726.0, "text": " go implemented. So I think which is why after two and a + half months, I just decided to come back to", "tokens": [50620, 352, 12270, 13, + 407, 286, 519, 597, 307, 983, 934, 732, 293, 257, 1922, 2493, 11, 286, 445, 3047, + 281, 808, 646, 281, 50892], "temperature": 0.0, "avg_logprob": -0.11553638593285484, + "compression_ratio": 1.7205882352941178, "no_speech_prob": 0.008576691150665283}, + {"id": 115, "seek": 71544, "start": 726.0, "end": 732.24, "text": " the, that''s + where I belong. And was there something in the first place that prompted you to + take the", "tokens": [50892, 264, 11, 300, 311, 689, 286, 5784, 13, 400, 390, 456, + 746, 294, 264, 700, 1081, 300, 31042, 291, 281, 747, 264, 51204], "temperature": + 0.0, "avg_logprob": -0.11553638593285484, "compression_ratio": 1.7205882352941178, + "no_speech_prob": 0.008576691150665283}, {"id": 116, "seek": 71544, "start": 732.24, + "end": 737.9200000000001, "text": " manager role? Was it just the fact that you + were going out of the maternity leave and you thought", "tokens": [51204, 6598, + 3090, 30, 3027, 309, 445, 264, 1186, 300, 291, 645, 516, 484, 295, 264, 2389, 28120, + 1856, 293, 291, 1194, 51488], "temperature": 0.0, "avg_logprob": -0.11553638593285484, + "compression_ratio": 1.7205882352941178, "no_speech_prob": 0.008576691150665283}, + {"id": 117, "seek": 71544, "start": 737.9200000000001, "end": 742.08, "text": " + you will do better in management? 
Was there something else going on?", "tokens": + [51488, 291, 486, 360, 1101, 294, 4592, 30, 3027, 456, 746, 1646, 516, 322, 30, + 51696], "temperature": 0.0, "avg_logprob": -0.11553638593285484, "compression_ratio": + 1.7205882352941178, "no_speech_prob": 0.008576691150665283}, {"id": 118, "seek": + 74208, "start": 742.8000000000001, "end": 750.0, "text": " I think that that''s + also interesting. I think there are two sides of this, you know, answer.", "tokens": + [50400, 286, 519, 300, 300, 311, 611, 1880, 13, 286, 519, 456, 366, 732, 4881, 295, + 341, 11, 291, 458, 11, 1867, 13, 50760], "temperature": 0.0, "avg_logprob": -0.11672892029752437, + "compression_ratio": 1.7397260273972603, "no_speech_prob": 0.019799504429101944}, + {"id": 119, "seek": 74208, "start": 750.0, "end": 756.24, "text": " The first one + was, you know, people usually associate and this is actually true that, you know,", + "tokens": [50760, 440, 700, 472, 390, 11, 291, 458, 11, 561, 2673, 14644, 293, 341, + 307, 767, 2074, 300, 11, 291, 458, 11, 51072], "temperature": 0.0, "avg_logprob": + -0.11672892029752437, "compression_ratio": 1.7397260273972603, "no_speech_prob": + 0.019799504429101944}, {"id": 120, "seek": 74208, "start": 756.24, "end": 761.84, + "text": " in a dev role back in India, because we have a lot of, you know, sourcing + work. The clients are", "tokens": [51072, 294, 257, 1905, 3090, 646, 294, 5282, + 11, 570, 321, 362, 257, 688, 295, 11, 291, 458, 11, 11006, 2175, 589, 13, 440, 6982, + 366, 51352], "temperature": 0.0, "avg_logprob": -0.11672892029752437, "compression_ratio": + 1.7397260273972603, "no_speech_prob": 0.019799504429101944}, {"id": 121, "seek": + 74208, "start": 761.84, "end": 767.2, "text": " usually based out of US. 
We have + long hours of working and usually the client calls would happen", "tokens": [51352, + 2673, 2361, 484, 295, 2546, 13, 492, 362, 938, 2496, 295, 1364, 293, 2673, 264, + 6423, 5498, 576, 1051, 51620], "temperature": 0.0, "avg_logprob": -0.11672892029752437, + "compression_ratio": 1.7397260273972603, "no_speech_prob": 0.019799504429101944}, + {"id": 122, "seek": 76720, "start": 767.2, "end": 773.36, "text": " in the evening, + because we have like 12 hours or 10 hours of difference. So by the time you''re", + "tokens": [50364, 294, 264, 5634, 11, 570, 321, 362, 411, 2272, 2496, 420, 1266, + 2496, 295, 2649, 13, 407, 538, 264, 565, 291, 434, 50672], "temperature": 0.0, "avg_logprob": + -0.12867415664542434, "compression_ratio": 1.6774193548387097, "no_speech_prob": + 0.025800229981541634}, {"id": 123, "seek": 76720, "start": 773.36, "end": 777.36, + "text": " ending your day, you have your client calls and you have to stay back + in office. And that''s", "tokens": [50672, 8121, 428, 786, 11, 291, 362, 428, 6423, + 5498, 293, 291, 362, 281, 1754, 646, 294, 3398, 13, 400, 300, 311, 50872], "temperature": + 0.0, "avg_logprob": -0.12867415664542434, "compression_ratio": 1.6774193548387097, + "no_speech_prob": 0.025800229981541634}, {"id": 124, "seek": 76720, "start": 777.9200000000001, + "end": 782.8000000000001, "text": " probably not like that for our managers. They + have little more perks. And I think that''s what", "tokens": [50900, 1391, 406, + 411, 300, 337, 527, 14084, 13, 814, 362, 707, 544, 36991, 13, 400, 286, 519, 300, + 311, 437, 51144], "temperature": 0.0, "avg_logprob": -0.12867415664542434, "compression_ratio": + 1.6774193548387097, "no_speech_prob": 0.025800229981541634}, {"id": 125, "seek": + 76720, "start": 782.8000000000001, "end": 787.76, "text": " someone suggested. 
And + I think I tried to play along, although I mean, I don''t regret doing", "tokens": + [51144, 1580, 10945, 13, 400, 286, 519, 286, 3031, 281, 862, 2051, 11, 4878, 286, + 914, 11, 286, 500, 380, 10879, 884, 51392], "temperature": 0.0, "avg_logprob": -0.12867415664542434, + "compression_ratio": 1.6774193548387097, "no_speech_prob": 0.025800229981541634}, + {"id": 126, "seek": 76720, "start": 787.76, "end": 793.2800000000001, "text": " + MB at all, because it''s, it just helped me understanding like what my manager is + going through.", "tokens": [51392, 28866, 412, 439, 11, 570, 309, 311, 11, 309, + 445, 4254, 385, 3701, 411, 437, 452, 6598, 307, 516, 807, 13, 51668], "temperature": + 0.0, "avg_logprob": -0.12867415664542434, "compression_ratio": 1.6774193548387097, + "no_speech_prob": 0.025800229981541634}, {"id": 127, "seek": 79328, "start": 793.28, + "end": 797.76, "text": " I mean, how is he thinking like, how should I behave? So + that just gave me also the context of", "tokens": [50364, 286, 914, 11, 577, 307, + 415, 1953, 411, 11, 577, 820, 286, 15158, 30, 407, 300, 445, 2729, 385, 611, 264, + 4319, 295, 50588], "temperature": 0.0, "avg_logprob": -0.14023775424597398, "compression_ratio": + 1.6008230452674896, "no_speech_prob": 0.019613690674304962}, {"id": 128, "seek": + 79328, "start": 797.76, "end": 804.9599999999999, "text": " the other side of the + table. So I don''t regret it. But in some sense, it''s, if I capture it, right,", + "tokens": [50588, 264, 661, 1252, 295, 264, 3199, 13, 407, 286, 500, 380, 10879, + 309, 13, 583, 294, 512, 2020, 11, 309, 311, 11, 498, 286, 7983, 309, 11, 558, 11, + 50948], "temperature": 0.0, "avg_logprob": -0.14023775424597398, "compression_ratio": + 1.6008230452674896, "no_speech_prob": 0.019613690674304962}, {"id": 129, "seek": + 79328, "start": 804.9599999999999, "end": 811.4399999999999, "text": " it sounds + that maybe it wasn''t the most natural move for you to take the manager role. 
Maybe + it was", "tokens": [50948, 309, 3263, 300, 1310, 309, 2067, 380, 264, 881, 3303, + 1286, 337, 291, 281, 747, 264, 6598, 3090, 13, 2704, 309, 390, 51272], "temperature": + 0.0, "avg_logprob": -0.14023775424597398, "compression_ratio": 1.6008230452674896, + "no_speech_prob": 0.019613690674304962}, {"id": 130, "seek": 79328, "start": 811.4399999999999, + "end": 818.24, "text": " just some circumstantial in a way, right? You thought it + would be better, easier with your new", "tokens": [51272, 445, 512, 7982, 394, 831, + 294, 257, 636, 11, 558, 30, 509, 1194, 309, 576, 312, 1101, 11, 3571, 365, 428, + 777, 51612], "temperature": 0.0, "avg_logprob": -0.14023775424597398, "compression_ratio": + 1.6008230452674896, "no_speech_prob": 0.019613690674304962}, {"id": 131, "seek": + 81824, "start": 818.32, "end": 825.2, "text": " responsibilities in the family, + right? That is good. Yeah. And I mean, I would say like,", "tokens": [50368, 16190, + 294, 264, 1605, 11, 558, 30, 663, 307, 665, 13, 865, 13, 400, 286, 914, 11, 286, + 576, 584, 411, 11, 50712], "temperature": 0.0, "avg_logprob": -0.1394248216048531, + "compression_ratio": 1.5454545454545454, "no_speech_prob": 0.012731199152767658}, + {"id": 132, "seek": 81824, "start": 826.32, "end": 831.76, "text": " you know, personally, + maybe women have obviously changed. It''s been five years I''ve moved to", "tokens": + [50768, 291, 458, 11, 5665, 11, 1310, 2266, 362, 2745, 3105, 13, 467, 311, 668, + 1732, 924, 286, 600, 4259, 281, 51040], "temperature": 0.0, "avg_logprob": -0.1394248216048531, + "compression_ratio": 1.5454545454545454, "no_speech_prob": 0.012731199152767658}, + {"id": 133, "seek": 81824, "start": 831.76, "end": 838.64, "text": " Berlin now. 
+ And until I moved and all, you know, women professionals that I know, my friends + in", "tokens": [51040, 13848, 586, 13, 400, 1826, 286, 4259, 293, 439, 11, 291, + 458, 11, 2266, 11954, 300, 286, 458, 11, 452, 1855, 294, 51384], "temperature": + 0.0, "avg_logprob": -0.1394248216048531, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.012731199152767658}, {"id": 134, "seek": 81824, "start": 838.64, + "end": 843.44, "text": " India, I think they still have this problem of clearly + communicating what they want in their job", "tokens": [51384, 5282, 11, 286, 519, + 436, 920, 362, 341, 1154, 295, 4448, 17559, 437, 436, 528, 294, 641, 1691, 51624], + "temperature": 0.0, "avg_logprob": -0.1394248216048531, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.012731199152767658}, {"id": 135, "seek": 84344, "start": 844.08, + "end": 850.08, "text": " I mean, if you can''t do that, I think you are already + an awesome woman. I was not one of them.", "tokens": [50396, 286, 914, 11, 498, + 291, 393, 380, 360, 300, 11, 286, 519, 291, 366, 1217, 364, 3476, 3059, 13, 286, + 390, 406, 472, 295, 552, 13, 50696], "temperature": 0.0, "avg_logprob": -0.12224768179434317, + "compression_ratio": 1.7794117647058822, "no_speech_prob": 0.20404767990112305}, + {"id": 136, "seek": 84344, "start": 850.5600000000001, "end": 855.12, "text": " + It was always like something that, you know, people would see me as less if I''m + asking for like,", "tokens": [50720, 467, 390, 1009, 411, 746, 300, 11, 291, 458, + 11, 561, 576, 536, 385, 382, 1570, 498, 286, 478, 3365, 337, 411, 11, 50948], "temperature": + 0.0, "avg_logprob": -0.12224768179434317, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.20404767990112305}, {"id": 137, "seek": 84344, "start": 855.12, + "end": 860.4000000000001, "text": " I need to be at home with my kid, because he''s + too small. He may need me. 
But I always try to,", "tokens": [50948, 286, 643, 281, + 312, 412, 1280, 365, 452, 1636, 11, 570, 415, 311, 886, 1359, 13, 634, 815, 643, + 385, 13, 583, 286, 1009, 853, 281, 11, 51212], "temperature": 0.0, "avg_logprob": + -0.12224768179434317, "compression_ratio": 1.7794117647058822, "no_speech_prob": + 0.20404767990112305}, {"id": 138, "seek": 84344, "start": 860.4000000000001, "end": + 865.7600000000001, "text": " you know, keep things to myself and try to, you know, + change myself, try to leave what I was passionate", "tokens": [51212, 291, 458, + 11, 1066, 721, 281, 2059, 293, 853, 281, 11, 291, 458, 11, 1319, 2059, 11, 853, + 281, 1856, 437, 286, 390, 11410, 51480], "temperature": 0.0, "avg_logprob": -0.12224768179434317, + "compression_ratio": 1.7794117647058822, "no_speech_prob": 0.20404767990112305}, + {"id": 139, "seek": 84344, "start": 865.7600000000001, "end": 871.6, "text": " about + just to fit into that frame, like how women should be, how a mother should be, or + how a", "tokens": [51480, 466, 445, 281, 3318, 666, 300, 3920, 11, 411, 577, 2266, + 820, 312, 11, 577, 257, 2895, 820, 312, 11, 420, 577, 257, 51772], "temperature": + 0.0, "avg_logprob": -0.12224768179434317, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.20404767990112305}, {"id": 140, "seek": 87160, "start": 871.6, + "end": 877.6800000000001, "text": " wife should be. So that was something I learned + very hard way. Yeah. 
And it''s like something that,", "tokens": [50364, 3836, 820, + 312, 13, 407, 300, 390, 746, 286, 3264, 588, 1152, 636, 13, 865, 13, 400, 309, 311, + 411, 746, 300, 11, 50668], "temperature": 0.0, "avg_logprob": -0.16596042408662684, + "compression_ratio": 1.6796536796536796, "no_speech_prob": 0.02458283118903637}, + {"id": 141, "seek": 87160, "start": 878.32, "end": 883.84, "text": " I''m sure we + will touch this on this topic later in the podcast, but like, it''s something that + is", "tokens": [50700, 286, 478, 988, 321, 486, 2557, 341, 322, 341, 4829, 1780, + 294, 264, 7367, 11, 457, 411, 11, 309, 311, 746, 300, 307, 50976], "temperature": + 0.0, "avg_logprob": -0.16596042408662684, "compression_ratio": 1.6796536796536796, + "no_speech_prob": 0.02458283118903637}, {"id": 142, "seek": 87160, "start": 883.84, + "end": 889.2, "text": " kind of implicit. And when we talk about men, maybe they + don''t feel that. And again, it depends on", "tokens": [50976, 733, 295, 26947, + 13, 400, 562, 321, 751, 466, 1706, 11, 1310, 436, 500, 380, 841, 300, 13, 400, 797, + 11, 309, 5946, 322, 51244], "temperature": 0.0, "avg_logprob": -0.16596042408662684, + "compression_ratio": 1.6796536796536796, "no_speech_prob": 0.02458283118903637}, + {"id": 143, "seek": 87160, "start": 889.2, "end": 896.24, "text": " the culture + where you come from, you know, in my culture, you know, men also like assign this", + "tokens": [51244, 264, 3713, 689, 291, 808, 490, 11, 291, 458, 11, 294, 452, 3713, + 11, 291, 458, 11, 1706, 611, 411, 6269, 341, 51596], "temperature": 0.0, "avg_logprob": + -0.16596042408662684, "compression_ratio": 1.6796536796536796, "no_speech_prob": + 0.02458283118903637}, {"id": 144, "seek": 89624, "start": 896.32, "end": 901.92, + "text": " responsibilities that you should be the man who earns money and hands + all your decisions need", "tokens": [50368, 16190, 300, 291, 820, 312, 264, 587, + 567, 46936, 1460, 293, 2377, 439, 428, 5327, 643, 50648], "temperature": 0.0, 
"avg_logprob": + -0.09731222902025495, "compression_ratio": 1.75, "no_speech_prob": 0.0396302193403244}, + {"id": 145, "seek": 89624, "start": 901.92, "end": 907.52, "text": " to be based + in order to maximize that probability that you will be that person. But maybe you + don''t", "tokens": [50648, 281, 312, 2361, 294, 1668, 281, 19874, 300, 8482, 300, + 291, 486, 312, 300, 954, 13, 583, 1310, 291, 500, 380, 50928], "temperature": 0.0, + "avg_logprob": -0.09731222902025495, "compression_ratio": 1.75, "no_speech_prob": + 0.0396302193403244}, {"id": 146, "seek": 89624, "start": 907.52, "end": 912.4, "text": + " want that path, you know, maybe you still want to go and explore what is it that + you like.", "tokens": [50928, 528, 300, 3100, 11, 291, 458, 11, 1310, 291, 920, + 528, 281, 352, 293, 6839, 437, 307, 309, 300, 291, 411, 13, 51172], "temperature": + 0.0, "avg_logprob": -0.09731222902025495, "compression_ratio": 1.75, "no_speech_prob": + 0.0396302193403244}, {"id": 147, "seek": 89624, "start": 913.12, "end": 920.0, "text": + " And so it''s interesting that how culture and, you know, society shape us in that + direction,", "tokens": [51208, 400, 370, 309, 311, 1880, 300, 577, 3713, 293, 11, + 291, 458, 11, 4086, 3909, 505, 294, 300, 3513, 11, 51552], "temperature": 0.0, "avg_logprob": + -0.09731222902025495, "compression_ratio": 1.75, "no_speech_prob": 0.0396302193403244}, + {"id": 148, "seek": 89624, "start": 920.0, "end": 925.6, "text": " until we just + carry the momentum until we realize, hold on a second, am I going in the right", + "tokens": [51552, 1826, 321, 445, 3985, 264, 11244, 1826, 321, 4325, 11, 1797, 322, + 257, 1150, 11, 669, 286, 516, 294, 264, 558, 51832], "temperature": 0.0, "avg_logprob": + -0.09731222902025495, "compression_ratio": 1.75, "no_speech_prob": 0.0396302193403244}, + {"id": 149, "seek": 92560, "start": 925.6, "end": 931.36, "text": " direction? And + this would happen to you? True, true. That that was the exact same thing. 
And", + "tokens": [50364, 3513, 30, 400, 341, 576, 1051, 281, 291, 30, 13587, 11, 2074, + 13, 663, 300, 390, 264, 1900, 912, 551, 13, 400, 50652], "temperature": 0.0, "avg_logprob": + -0.2040805197381354, "compression_ratio": 1.5, "no_speech_prob": 0.005986114963889122}, + {"id": 150, "seek": 92560, "start": 932.08, "end": 941.36, "text": " again, I think + the major bump came in that there was this company or or training company. So to + say,", "tokens": [50688, 797, 11, 286, 519, 264, 2563, 9961, 1361, 294, 300, 456, + 390, 341, 2237, 420, 420, 3097, 2237, 13, 407, 281, 584, 11, 51152], "temperature": + 0.0, "avg_logprob": -0.2040805197381354, "compression_ratio": 1.5, "no_speech_prob": + 0.005986114963889122}, {"id": 151, "seek": 92560, "start": 942.08, "end": 948.5600000000001, + "text": " let''s put it more precisely, who reached out to me, which was like far + away, at least from the place", "tokens": [51188, 718, 311, 829, 309, 544, 13402, + 11, 567, 6488, 484, 281, 385, 11, 597, 390, 411, 1400, 1314, 11, 412, 1935, 490, + 264, 1081, 51512], "temperature": 0.0, "avg_logprob": -0.2040805197381354, "compression_ratio": + 1.5, "no_speech_prob": 0.005986114963889122}, {"id": 152, "seek": 94856, "start": + 948.56, "end": 956.4799999999999, "text": " I lived in. And they said, like, could + you remotely, you know, develop this solar curriculum", "tokens": [50364, 286, 5152, + 294, 13, 400, 436, 848, 11, 411, 11, 727, 291, 20824, 11, 291, 458, 11, 1499, 341, + 7936, 14302, 50760], "temperature": 0.0, "avg_logprob": -0.15999479004831024, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.08323980867862701}, {"id": 153, "seek": + 94856, "start": 956.4799999999999, "end": 962.0799999999999, "text": " for us for + our training? 
And it was probably the first, you know, big things that happened + to me.", "tokens": [50760, 337, 505, 337, 527, 3097, 30, 400, 309, 390, 1391, 264, + 700, 11, 291, 458, 11, 955, 721, 300, 2011, 281, 385, 13, 51040], "temperature": + 0.0, "avg_logprob": -0.15999479004831024, "compression_ratio": 1.6106194690265487, + "no_speech_prob": 0.08323980867862701}, {"id": 154, "seek": 94856, "start": 962.0799999999999, + "end": 968.16, "text": " And I was like, okay, I mean, I work on the application + that uses solar-avue solar before", "tokens": [51040, 400, 286, 390, 411, 11, 1392, + 11, 286, 914, 11, 286, 589, 322, 264, 3861, 300, 4960, 7936, 12, 706, 622, 7936, + 949, 51344], "temperature": 0.0, "avg_logprob": -0.15999479004831024, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.08323980867862701}, {"id": 155, "seek": + 94856, "start": 968.16, "end": 973.28, "text": " things have changed now. And that + was in 2014, was they come back? Like, oh my god,", "tokens": [51344, 721, 362, + 3105, 586, 13, 400, 300, 390, 294, 8227, 11, 390, 436, 808, 646, 30, 1743, 11, 1954, + 452, 3044, 11, 51600], "temperature": 0.0, "avg_logprob": -0.15999479004831024, + "compression_ratio": 1.6106194690265487, "no_speech_prob": 0.08323980867862701}, + {"id": 156, "seek": 97328, "start": 973.28, "end": 977.4399999999999, "text": " + solar is like still working. Like people are still working on this. Okay, wow, amazing.", + "tokens": [50364, 7936, 307, 411, 920, 1364, 13, 1743, 561, 366, 920, 1364, 322, + 341, 13, 1033, 11, 6076, 11, 2243, 13, 50572], "temperature": 0.0, "avg_logprob": + -0.1385592788946433, "compression_ratio": 1.7490774907749078, "no_speech_prob": + 0.025677448138594627}, {"id": 157, "seek": 97328, "start": 977.4399999999999, "end": + 982.48, "text": " That''s when I realized, okay, solar has really transformed. The + community has grown. 
And it was", "tokens": [50572, 663, 311, 562, 286, 5334, 11, + 1392, 11, 7936, 575, 534, 16894, 13, 440, 1768, 575, 7709, 13, 400, 309, 390, 50824], + "temperature": 0.0, "avg_logprob": -0.1385592788946433, "compression_ratio": 1.7490774907749078, + "no_speech_prob": 0.025677448138594627}, {"id": 158, "seek": 97328, "start": 982.48, + "end": 987.6, "text": " interesting. I think that was more like, you know, meeting + my old friend solar in a whole new, you", "tokens": [50824, 1880, 13, 286, 519, + 300, 390, 544, 411, 11, 291, 458, 11, 3440, 452, 1331, 1277, 7936, 294, 257, 1379, + 777, 11, 291, 51080], "temperature": 0.0, "avg_logprob": -0.1385592788946433, "compression_ratio": + 1.7490774907749078, "no_speech_prob": 0.025677448138594627}, {"id": 159, "seek": + 97328, "start": 987.6, "end": 993.76, "text": " know, a tire like with dinner jacket + and suit and with a tie. And I was like, oh my god,", "tokens": [51080, 458, 11, + 257, 11756, 411, 365, 6148, 11781, 293, 5722, 293, 365, 257, 7582, 13, 400, 286, + 390, 411, 11, 1954, 452, 3044, 11, 51388], "temperature": 0.0, "avg_logprob": -0.1385592788946433, + "compression_ratio": 1.7490774907749078, "no_speech_prob": 0.025677448138594627}, + {"id": 160, "seek": 97328, "start": 993.76, "end": 1001.36, "text": " dude, you + are popular now. So that was that was the thing for me. And preparing that course + curriculum", "tokens": [51388, 6449, 11, 291, 366, 3743, 586, 13, 407, 300, 390, + 300, 390, 264, 551, 337, 385, 13, 400, 10075, 300, 1164, 14302, 51768], "temperature": + 0.0, "avg_logprob": -0.1385592788946433, "compression_ratio": 1.7490774907749078, + "no_speech_prob": 0.025677448138594627}, {"id": 161, "seek": 100136, "start": 1001.44, + "end": 1008.16, "text": " for them. 
I think that was when I learned about all, you + know, the developed features that were", "tokens": [50368, 337, 552, 13, 286, 519, + 300, 390, 562, 286, 3264, 466, 439, 11, 291, 458, 11, 264, 4743, 4122, 300, 645, + 50704], "temperature": 0.0, "avg_logprob": -0.15640393325260707, "compression_ratio": + 1.6549295774647887, "no_speech_prob": 0.024459991604089737}, {"id": 162, "seek": + 100136, "start": 1008.16, "end": 1013.76, "text": " available in solar back then. + Also learned about elastic storage back then as well. But that", "tokens": [50704, + 2435, 294, 7936, 646, 550, 13, 2743, 3264, 466, 17115, 6725, 646, 550, 382, 731, + 13, 583, 300, 50984], "temperature": 0.0, "avg_logprob": -0.15640393325260707, "compression_ratio": + 1.6549295774647887, "no_speech_prob": 0.024459991604089737}, {"id": 163, "seek": + 100136, "start": 1013.76, "end": 1019.44, "text": " training became such a hot cake + because I would give, you know, public webinars. Obviously,", "tokens": [50984, + 3097, 3062, 1270, 257, 2368, 5908, 570, 286, 576, 976, 11, 291, 458, 11, 1908, 26065, + 13, 7580, 11, 51268], "temperature": 0.0, "avg_logprob": -0.15640393325260707, "compression_ratio": + 1.6549295774647887, "no_speech_prob": 0.024459991604089737}, {"id": 164, "seek": + 100136, "start": 1019.44, "end": 1024.88, "text": " we''re was being paid for that + too. There were like almost like 400, 500 people on those webinars", "tokens": [51268, + 321, 434, 390, 885, 4835, 337, 300, 886, 13, 821, 645, 411, 1920, 411, 8423, 11, + 5923, 561, 322, 729, 26065, 51540], "temperature": 0.0, "avg_logprob": -0.15640393325260707, + "compression_ratio": 1.6549295774647887, "no_speech_prob": 0.024459991604089737}, + {"id": 165, "seek": 100136, "start": 1024.88, "end": 1029.76, "text": " to see like + what this course is all about. 
Everyone wanted to become, so there was no rules,", + "tokens": [51540, 281, 536, 411, 437, 341, 1164, 307, 439, 466, 13, 5198, 1415, + 281, 1813, 11, 370, 456, 390, 572, 4474, 11, 51784], "temperature": 0.0, "avg_logprob": + -0.15640393325260707, "compression_ratio": 1.6549295774647887, "no_speech_prob": + 0.024459991604089737}, {"id": 166, "seek": 102976, "start": 1030.08, "end": 1037.36, + "text": " such as search engineer, like the engineer who knows about search more + or less. But we would take", "tokens": [50380, 1270, 382, 3164, 11403, 11, 411, + 264, 11403, 567, 3255, 466, 3164, 544, 420, 1570, 13, 583, 321, 576, 747, 50744], + "temperature": 0.0, "avg_logprob": -0.12685231601490693, "compression_ratio": 1.6943231441048034, + "no_speech_prob": 0.004095530137419701}, {"id": 167, "seek": 102976, "start": 1037.36, + "end": 1046.48, "text": " like 25 people only in that course, or maybe even less + sometimes. But I think preparing the course", "tokens": [50744, 411, 3552, 561, + 787, 294, 300, 1164, 11, 420, 1310, 754, 1570, 2171, 13, 583, 286, 519, 10075, 264, + 1164, 51200], "temperature": 0.0, "avg_logprob": -0.12685231601490693, "compression_ratio": + 1.6943231441048034, "no_speech_prob": 0.004095530137419701}, {"id": 168, "seek": + 102976, "start": 1046.48, "end": 1053.12, "text": " curriculum was one thing. 
And + then conducting that course for the first time was completely next", "tokens": [51200, + 14302, 390, 472, 551, 13, 400, 550, 21749, 300, 1164, 337, 264, 700, 565, 390, 2584, + 958, 51532], "temperature": 0.0, "avg_logprob": -0.12685231601490693, "compression_ratio": + 1.6943231441048034, "no_speech_prob": 0.004095530137419701}, {"id": 169, "seek": + 102976, "start": 1053.12, "end": 1058.8799999999999, "text": " level because I did + not imagine like people who would come to that course would come with like", "tokens": + [51532, 1496, 570, 286, 630, 406, 3811, 411, 561, 567, 576, 808, 281, 300, 1164, + 576, 808, 365, 411, 51820], "temperature": 0.0, "avg_logprob": -0.12685231601490693, + "compression_ratio": 1.6943231441048034, "no_speech_prob": 0.004095530137419701}, + {"id": 170, "seek": 105888, "start": 1058.88, "end": 1066.48, "text": " 10 or 20 + or 30 years of experience in Java. So to imagine like people are really asking me + questions", "tokens": [50364, 1266, 420, 945, 420, 2217, 924, 295, 1752, 294, 10745, + 13, 407, 281, 3811, 411, 561, 366, 534, 3365, 385, 1651, 50744], "temperature": + 0.0, "avg_logprob": -0.2611370271849401, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.007277671713382006}, {"id": 171, "seek": 105888, "start": 1066.48, + "end": 1071.92, "text": " very low level, like what is happening when you know, + face-sitting is happening? 
How is this,", "tokens": [50744, 588, 2295, 1496, 11, + 411, 437, 307, 2737, 562, 291, 458, 11, 1851, 12, 82, 2414, 307, 2737, 30, 1012, + 307, 341, 11, 51016], "temperature": 0.0, "avg_logprob": -0.2611370271849401, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.007277671713382006}, {"id": 172, "seek": + 105888, "start": 1071.92, "end": 1078.4, "text": " you know, a variable, you know, + is it in the, you know, like a memory or is it in like", "tokens": [51016, 291, + 458, 11, 257, 7006, 11, 291, 458, 11, 307, 309, 294, 264, 11, 291, 458, 11, 411, + 257, 4675, 420, 307, 309, 294, 411, 51340], "temperature": 0.0, "avg_logprob": -0.2611370271849401, + "compression_ratio": 1.6551724137931034, "no_speech_prob": 0.007277671713382006}, + {"id": 173, "seek": 105888, "start": 1079.2800000000002, "end": 1085.1200000000001, + "text": " somewhere else? Like what would have like, like, performance wise? Can + I improve this? And I was like,", "tokens": [51384, 4079, 1646, 30, 1743, 437, 576, + 362, 411, 11, 411, 11, 3389, 10829, 30, 1664, 286, 3470, 341, 30, 400, 286, 390, + 411, 11, 51676], "temperature": 0.0, "avg_logprob": -0.2611370271849401, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.007277671713382006}, {"id": 174, "seek": + 108512, "start": 1085.12, "end": 1092.1599999999999, "text": " I mean, I am literally + like stumped because imagine like this was 2014. I started back in 2008", "tokens": + [50364, 286, 914, 11, 286, 669, 3736, 411, 43164, 292, 570, 3811, 411, 341, 390, + 8227, 13, 286, 1409, 646, 294, 10389, 50716], "temperature": 0.0, "avg_logprob": + -0.15624447490857996, "compression_ratio": 1.5577689243027888, "no_speech_prob": + 0.01345678698271513}, {"id": 175, "seek": 108512, "start": 1092.1599999999999, "end": + 1098.08, "text": " professionally after completing my studies. 
So six years and + that took with a break of like one and", "tokens": [50716, 27941, 934, 19472, 452, + 5313, 13, 407, 2309, 924, 293, 300, 1890, 365, 257, 1821, 295, 411, 472, 293, 51012], + "temperature": 0.0, "avg_logprob": -0.15624447490857996, "compression_ratio": 1.5577689243027888, + "no_speech_prob": 0.01345678698271513}, {"id": 176, "seek": 108512, "start": 1098.08, + "end": 1104.2399999999998, "text": " a half years and competing with the knowledge + of like low level code in Java with a person who''s", "tokens": [51012, 257, 1922, + 924, 293, 15439, 365, 264, 3601, 295, 411, 2295, 1496, 3089, 294, 10745, 365, 257, + 954, 567, 311, 51320], "temperature": 0.0, "avg_logprob": -0.15624447490857996, + "compression_ratio": 1.5577689243027888, "no_speech_prob": 0.01345678698271513}, + {"id": 177, "seek": 108512, "start": 1104.2399999999998, "end": 1110.7199999999998, + "text": " been working on Java for like 25 years. Obviously was something. And I + would always say, and I hope", "tokens": [51320, 668, 1364, 322, 10745, 337, 411, + 3552, 924, 13, 7580, 390, 746, 13, 400, 286, 576, 1009, 584, 11, 293, 286, 1454, + 51644], "temperature": 0.0, "avg_logprob": -0.15624447490857996, "compression_ratio": + 1.5577689243027888, "no_speech_prob": 0.01345678698271513}, {"id": 178, "seek": + 111072, "start": 1111.44, "end": 1117.6000000000001, "text": " people who are listening + to this did not, you know, recall like I''m saying all of this out loud on", "tokens": + [50400, 561, 567, 366, 4764, 281, 341, 630, 406, 11, 291, 458, 11, 9901, 411, 286, + 478, 1566, 439, 295, 341, 484, 6588, 322, 50708], "temperature": 0.0, "avg_logprob": + -0.13879073293585525, "compression_ratio": 1.6623931623931625, "no_speech_prob": + 0.022663138806819916}, {"id": 179, "seek": 111072, "start": 1117.6000000000001, + "end": 1122.96, "text": " a podcast, but I would say like I have nine years of experience. 
+ But even nine years was less at", "tokens": [50708, 257, 7367, 11, 457, 286, 576, + 584, 411, 286, 362, 4949, 924, 295, 1752, 13, 583, 754, 4949, 924, 390, 1570, 412, + 50976], "temperature": 0.0, "avg_logprob": -0.13879073293585525, "compression_ratio": + 1.6623931623931625, "no_speech_prob": 0.022663138806819916}, {"id": 180, "seek": + 111072, "start": 1122.96, "end": 1129.44, "text": " that point in time because these + people are like always very senior. And which made me, you know,", "tokens": [50976, + 300, 935, 294, 565, 570, 613, 561, 366, 411, 1009, 588, 7965, 13, 400, 597, 1027, + 385, 11, 291, 458, 11, 51300], "temperature": 0.0, "avg_logprob": -0.13879073293585525, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.022663138806819916}, + {"id": 181, "seek": 111072, "start": 1129.44, "end": 1135.52, "text": " like take + break from my office, like literally understand from the code level, like how each + of", "tokens": [51300, 411, 747, 1821, 490, 452, 3398, 11, 411, 3736, 1223, 490, + 264, 3089, 1496, 11, 411, 577, 1184, 295, 51604], "temperature": 0.0, "avg_logprob": + -0.13879073293585525, "compression_ratio": 1.6623931623931625, "no_speech_prob": + 0.022663138806819916}, {"id": 182, "seek": 113552, "start": 1135.52, "end": 1140.72, + "text": " these features were working. Although it was solely done to like, you + know, literally save my", "tokens": [50364, 613, 4122, 645, 1364, 13, 5780, 309, + 390, 23309, 1096, 281, 411, 11, 291, 458, 11, 3736, 3155, 452, 50624], "temperature": + 0.0, "avg_logprob": -0.2074526081914487, "compression_ratio": 1.546875, "no_speech_prob": + 0.005118136294186115}, {"id": 183, "seek": 113552, "start": 1140.72, "end": 1147.12, + "text": " reputation at that point in time. 
But I realized like the benefit or the + grooming that it brought", "tokens": [50624, 13061, 412, 300, 935, 294, 565, 13, + 583, 286, 5334, 411, 264, 5121, 420, 264, 49700, 300, 309, 3038, 50944], "temperature": + 0.0, "avg_logprob": -0.2074526081914487, "compression_ratio": 1.546875, "no_speech_prob": + 0.005118136294186115}, {"id": 184, "seek": 113552, "start": 1147.12, "end": 1154.72, + "text": " along was like way bigger. I think that understanding that was like, I + would say like a major breakthrough.", "tokens": [50944, 2051, 390, 411, 636, 3801, + 13, 286, 519, 300, 3701, 300, 390, 411, 11, 286, 576, 584, 411, 257, 2563, 22397, + 13, 51324], "temperature": 0.0, "avg_logprob": -0.2074526081914487, "compression_ratio": + 1.546875, "no_speech_prob": 0.005118136294186115}, {"id": 185, "seek": 113552, "start": + 1156.08, "end": 1164.56, "text": " You reminded me of, I don''t remember was it + 2011, probably, and Berlin buzzwords. There was some", "tokens": [51392, 509, 15920, + 385, 295, 11, 286, 500, 380, 1604, 390, 309, 10154, 11, 1391, 11, 293, 13848, 13036, + 13832, 13, 821, 390, 512, 51816], "temperature": 0.0, "avg_logprob": -0.2074526081914487, + "compression_ratio": 1.546875, "no_speech_prob": 0.005118136294186115}, {"id": 186, + "seek": 116552, "start": 1165.68, "end": 1173.52, "text": " raffle for whatever. + Like I want a book written by Rafale Kutz on Elasticsearch. And he actually wrote", + "tokens": [50372, 367, 29264, 337, 2035, 13, 1743, 286, 528, 257, 1446, 3720, 538, + 29611, 1220, 591, 325, 89, 322, 2699, 2750, 405, 1178, 13, 400, 415, 767, 4114, + 50764], "temperature": 0.0, "avg_logprob": -0.17618908601648667, "compression_ratio": + 1.7712177121771218, "no_speech_prob": 0.02280421555042267}, {"id": 187, "seek": + 116552, "start": 1174.24, "end": 1180.08, "text": " like a couple words there. 
And + he said, if you don''t find answers in this book, then read the code.", "tokens": + [50800, 411, 257, 1916, 2283, 456, 13, 400, 415, 848, 11, 498, 291, 500, 380, 915, + 6338, 294, 341, 1446, 11, 550, 1401, 264, 3089, 13, 51092], "temperature": 0.0, + "avg_logprob": -0.17618908601648667, "compression_ratio": 1.7712177121771218, "no_speech_prob": + 0.02280421555042267}, {"id": 188, "seek": 116552, "start": 1180.08, "end": 1184.6399999999999, + "text": " And like, you know, the code is also open source. And I was like, this + was like such a big", "tokens": [51092, 400, 411, 11, 291, 458, 11, 264, 3089, 307, + 611, 1269, 4009, 13, 400, 286, 390, 411, 11, 341, 390, 411, 1270, 257, 955, 51320], + "temperature": 0.0, "avg_logprob": -0.17618908601648667, "compression_ratio": 1.7712177121771218, + "no_speech_prob": 0.02280421555042267}, {"id": 189, "seek": 116552, "start": 1185.28, + "end": 1190.32, "text": " opening to me in some sense, even though I was coding + by then. I was like, hold on a second.", "tokens": [51352, 5193, 281, 385, 294, + 512, 2020, 11, 754, 1673, 286, 390, 17720, 538, 550, 13, 286, 390, 411, 11, 1797, + 322, 257, 1150, 13, 51604], "temperature": 0.0, "avg_logprob": -0.17618908601648667, + "compression_ratio": 1.7712177121771218, "no_speech_prob": 0.02280421555042267}, + {"id": 190, "seek": 116552, "start": 1190.32, "end": 1195.12, "text": " So if I + don''t find what does it mean? So this book doesn''t contain all the answers, you + know,", "tokens": [51604, 407, 498, 286, 500, 380, 915, 437, 775, 309, 914, 30, + 407, 341, 1446, 1177, 380, 5304, 439, 264, 6338, 11, 291, 458, 11, 51844], "temperature": + 0.0, "avg_logprob": -0.17618908601648667, "compression_ratio": 1.7712177121771218, + "no_speech_prob": 0.02280421555042267}, {"id": 191, "seek": 119512, "start": 1195.12, + "end": 1202.6399999999999, "text": " like major things. And it''s pretty thick book, + you know. I was like, yeah, wow. 
So it just tells you", "tokens": [50364, 411, 2563, + 721, 13, 400, 309, 311, 1238, 5060, 1446, 11, 291, 458, 13, 286, 390, 411, 11, 1338, + 11, 6076, 13, 407, 309, 445, 5112, 291, 50740], "temperature": 0.0, "avg_logprob": + -0.17288289211764193, "compression_ratio": 1.5679012345679013, "no_speech_prob": + 0.0036876611411571503}, {"id": 192, "seek": 119512, "start": 1202.6399999999999, + "end": 1208.9599999999998, "text": " that how experimental you need to be, right? + And that there are no given answers, right?", "tokens": [50740, 300, 577, 17069, + 291, 643, 281, 312, 11, 558, 30, 400, 300, 456, 366, 572, 2212, 6338, 11, 558, 30, + 51056], "temperature": 0.0, "avg_logprob": -0.17288289211764193, "compression_ratio": + 1.5679012345679013, "no_speech_prob": 0.0036876611411571503}, {"id": 193, "seek": + 119512, "start": 1209.6, "end": 1217.36, "text": " Right. So true. Yeah, that''s + exactly how it is. And then while giving this training, I mean,", "tokens": [51088, + 1779, 13, 407, 2074, 13, 865, 11, 300, 311, 2293, 577, 309, 307, 13, 400, 550, 1339, + 2902, 341, 3097, 11, 286, 914, 11, 51476], "temperature": 0.0, "avg_logprob": -0.17288289211764193, + "compression_ratio": 1.5679012345679013, "no_speech_prob": 0.0036876611411571503}, + {"id": 194, "seek": 119512, "start": 1217.36, "end": 1224.8, "text": " I ran almost + like seven eight batches. One of the person was my student who recommended my profile", + "tokens": [51476, 286, 5872, 1920, 411, 3407, 3180, 15245, 279, 13, 1485, 295, 264, + 954, 390, 452, 3107, 567, 9628, 452, 7964, 51848], "temperature": 0.0, "avg_logprob": + -0.17288289211764193, "compression_ratio": 1.5679012345679013, "no_speech_prob": + 0.0036876611411571503}, {"id": 195, "seek": 122480, "start": 1224.8, "end": 1232.56, + "text": " to Lucidworks. And that''s how I got into Lucidworks. 
And I discovered + a whole new word of open source.", "tokens": [50364, 281, 9593, 327, 18357, 13, + 400, 300, 311, 577, 286, 658, 666, 9593, 327, 18357, 13, 400, 286, 6941, 257, 1379, + 777, 1349, 295, 1269, 4009, 13, 50752], "temperature": 0.0, "avg_logprob": -0.12676288099849925, + "compression_ratio": 1.79182156133829, "no_speech_prob": 0.010339317843317986}, + {"id": 196, "seek": 122480, "start": 1232.56, "end": 1238.1599999999999, "text": + " Oh, so now you can write code and, you know, contribute or, you know, shape the + product as well.", "tokens": [50752, 876, 11, 370, 586, 291, 393, 2464, 3089, 293, + 11, 291, 458, 11, 10586, 420, 11, 291, 458, 11, 3909, 264, 1674, 382, 731, 13, 51032], + "temperature": 0.0, "avg_logprob": -0.12676288099849925, "compression_ratio": 1.79182156133829, + "no_speech_prob": 0.010339317843317986}, {"id": 197, "seek": 122480, "start": 1238.1599999999999, + "end": 1243.6, "text": " Like I can really define like how the solar function would + work. Like I''ve always, you know,", "tokens": [51032, 1743, 286, 393, 534, 6964, + 411, 577, 264, 7936, 2445, 576, 589, 13, 1743, 286, 600, 1009, 11, 291, 458, 11, + 51304], "temperature": 0.0, "avg_logprob": -0.12676288099849925, "compression_ratio": + 1.79182156133829, "no_speech_prob": 0.010339317843317986}, {"id": 198, "seek": 122480, + "start": 1243.6, "end": 1247.6, "text": " being on the other side, like complaining, + like, oh, you know what? 
I don''t like, you know, how", "tokens": [51304, 885, 322, + 264, 661, 1252, 11, 411, 20740, 11, 411, 11, 1954, 11, 291, 458, 437, 30, 286, 500, + 380, 411, 11, 291, 458, 11, 577, 51504], "temperature": 0.0, "avg_logprob": -0.12676288099849925, + "compression_ratio": 1.79182156133829, "no_speech_prob": 0.010339317843317986}, + {"id": 199, "seek": 122480, "start": 1247.6, "end": 1252.8799999999999, "text": + " this shows on UI or I don''t like, you know, why it forgets about this thing or + how about you,", "tokens": [51504, 341, 3110, 322, 15682, 420, 286, 500, 380, 411, + 11, 291, 458, 11, 983, 309, 2870, 82, 466, 341, 551, 420, 577, 466, 291, 11, 51768], + "temperature": 0.0, "avg_logprob": -0.12676288099849925, "compression_ratio": 1.79182156133829, + "no_speech_prob": 0.010339317843317986}, {"id": 200, "seek": 125288, "start": 1252.88, + "end": 1257.2, "text": " you know, if I could change this behavior. So instead of, + you know, just making that change in my", "tokens": [50364, 291, 458, 11, 498, 286, + 727, 1319, 341, 5223, 13, 407, 2602, 295, 11, 291, 458, 11, 445, 1455, 300, 1319, + 294, 452, 50580], "temperature": 0.0, "avg_logprob": -0.21621912175958807, "compression_ratio": + 1.65625, "no_speech_prob": 0.009860881604254246}, {"id": 201, "seek": 125288, "start": + 1257.2, "end": 1262.64, "text": " local copy, I could actually open source stuff. + I could actually contribute as to how product shapes.", "tokens": [50580, 2654, + 5055, 11, 286, 727, 767, 1269, 4009, 1507, 13, 286, 727, 767, 10586, 382, 281, 577, + 1674, 10854, 13, 50852], "temperature": 0.0, "avg_logprob": -0.21621912175958807, + "compression_ratio": 1.65625, "no_speech_prob": 0.009860881604254246}, {"id": 202, + "seek": 125288, "start": 1262.64, "end": 1270.8000000000002, "text": " And I think + that was like, you know, like you''re a car moment for sure for me. 
So, yeah.", + "tokens": [50852, 400, 286, 519, 300, 390, 411, 11, 291, 458, 11, 411, 291, 434, + 257, 1032, 1623, 337, 988, 337, 385, 13, 407, 11, 1338, 13, 51260], "temperature": + 0.0, "avg_logprob": -0.21621912175958807, "compression_ratio": 1.65625, "no_speech_prob": + 0.009860881604254246}, {"id": 203, "seek": 125288, "start": 1270.8000000000002, + "end": 1275.44, "text": " And at that point, you moved to the US for that job or + you were already in the US.", "tokens": [51260, 400, 412, 300, 935, 11, 291, 4259, + 281, 264, 2546, 337, 300, 1691, 420, 291, 645, 1217, 294, 264, 2546, 13, 51492], + "temperature": 0.0, "avg_logprob": -0.21621912175958807, "compression_ratio": 1.65625, + "no_speech_prob": 0.009860881604254246}, {"id": 204, "seek": 127544, "start": 1276.4, + "end": 1283.3600000000001, "text": " So I moved briefly to US, but as I said, like + by that time, I already had my second kit,", "tokens": [50412, 407, 286, 4259, 10515, + 281, 2546, 11, 457, 382, 286, 848, 11, 411, 538, 300, 565, 11, 286, 1217, 632, 452, + 1150, 8260, 11, 50760], "temperature": 0.0, "avg_logprob": -0.16441081268618804, + "compression_ratio": 1.5269709543568464, "no_speech_prob": 0.060659222304821014}, + {"id": 205, "seek": 127544, "start": 1283.3600000000001, "end": 1289.76, "text": + " who was six months old. And it was not very, you know, practical for me to stay + there by myself.", "tokens": [50760, 567, 390, 2309, 2493, 1331, 13, 400, 309, 390, + 406, 588, 11, 291, 458, 11, 8496, 337, 385, 281, 1754, 456, 538, 2059, 13, 51080], + "temperature": 0.0, "avg_logprob": -0.16441081268618804, "compression_ratio": 1.5269709543568464, + "no_speech_prob": 0.060659222304821014}, {"id": 206, "seek": 127544, "start": 1289.76, + "end": 1294.48, "text": " And which is why I decided to come back, uh, leave my + job there in Lucidworks. 
And I started", "tokens": [51080, 400, 597, 307, 983, 286, + 3047, 281, 808, 646, 11, 2232, 11, 1856, 452, 1691, 456, 294, 9593, 327, 18357, + 13, 400, 286, 1409, 51316], "temperature": 0.0, "avg_logprob": -0.16441081268618804, + "compression_ratio": 1.5269709543568464, "no_speech_prob": 0.060659222304821014}, + {"id": 207, "seek": 127544, "start": 1294.48, "end": 1299.3600000000001, "text": + " my own consulting company called bistro innovation labs. I know the name would + sound like", "tokens": [51316, 452, 1065, 23682, 2237, 1219, 18209, 340, 8504, 20339, + 13, 286, 458, 264, 1315, 576, 1626, 411, 51560], "temperature": 0.0, "avg_logprob": + -0.16441081268618804, "compression_ratio": 1.5269709543568464, "no_speech_prob": + 0.060659222304821014}, {"id": 208, "seek": 129936, "start": 1300.0, "end": 1304.8799999999999, + "text": " as if it''s a restaurant. And the reason, I mean, there are like a lot + of things that people", "tokens": [50396, 382, 498, 309, 311, 257, 6383, 13, 400, + 264, 1778, 11, 286, 914, 11, 456, 366, 411, 257, 688, 295, 721, 300, 561, 50640], + "temperature": 0.0, "avg_logprob": -0.10632821821397351, "compression_ratio": 1.6456140350877193, + "no_speech_prob": 0.016868602484464645}, {"id": 209, "seek": 129936, "start": 1304.8799999999999, + "end": 1309.04, "text": " used to ask me like, why is it called bistro innovation + lab? Like you should have something like a", "tokens": [50640, 1143, 281, 1029, + 385, 411, 11, 983, 307, 309, 1219, 18209, 340, 8504, 2715, 30, 1743, 291, 820, 362, + 746, 411, 257, 50848], "temperature": 0.0, "avg_logprob": -0.10632821821397351, + "compression_ratio": 1.6456140350877193, "no_speech_prob": 0.016868602484464645}, + {"id": 210, "seek": 129936, "start": 1309.04, "end": 1315.04, "text": " sci-fi formula + or like some math algo in the name. 
Why would you keep it like a bistro?", "tokens": + [50848, 2180, 12, 13325, 8513, 420, 411, 512, 5221, 8655, 294, 264, 1315, 13, 1545, + 576, 291, 1066, 309, 411, 257, 18209, 340, 30, 51148], "temperature": 0.0, "avg_logprob": + -0.10632821821397351, "compression_ratio": 1.6456140350877193, "no_speech_prob": + 0.016868602484464645}, {"id": 211, "seek": 129936, "start": 1315.04, "end": 1319.4399999999998, + "text": " And I was like, because I''m so passionate about cooking, I think, I forgot + to probably mention", "tokens": [51148, 400, 286, 390, 411, 11, 570, 286, 478, 370, + 11410, 466, 6361, 11, 286, 519, 11, 286, 5298, 281, 1391, 2152, 51368], "temperature": + 0.0, "avg_logprob": -0.10632821821397351, "compression_ratio": 1.6456140350877193, + "no_speech_prob": 0.016868602484464645}, {"id": 212, "seek": 129936, "start": 1319.4399999999998, + "end": 1325.84, "text": " that during my intro. I got so excited. But yeah, I think + a food part is something that really,", "tokens": [51368, 300, 1830, 452, 12897, + 13, 286, 658, 370, 2919, 13, 583, 1338, 11, 286, 519, 257, 1755, 644, 307, 746, + 300, 534, 11, 51688], "temperature": 0.0, "avg_logprob": -0.10632821821397351, "compression_ratio": + 1.6456140350877193, "no_speech_prob": 0.016868602484464645}, {"id": 213, "seek": + 132584, "start": 1325.84, "end": 1332.1599999999999, "text": " really, uh, you know, + brings the best in me. 
I think if my staff or whatever I''m doing is not", "tokens": + [50364, 534, 11, 2232, 11, 291, 458, 11, 5607, 264, 1151, 294, 385, 13, 286, 519, + 498, 452, 3525, 420, 2035, 286, 478, 884, 307, 406, 50680], "temperature": 0.0, + "avg_logprob": -0.14028264582157135, "compression_ratio": 1.7777777777777777, "no_speech_prob": + 0.03664243593811989}, {"id": 214, "seek": 132584, "start": 1332.1599999999999, "end": + 1338.0, "text": " really working, I think if you do not find me on my desk, uh, + you would find me in my kitchen.", "tokens": [50680, 534, 1364, 11, 286, 519, 498, + 291, 360, 406, 915, 385, 322, 452, 10026, 11, 2232, 11, 291, 576, 915, 385, 294, + 452, 6525, 13, 50972], "temperature": 0.0, "avg_logprob": -0.14028264582157135, + "compression_ratio": 1.7777777777777777, "no_speech_prob": 0.03664243593811989}, + {"id": 215, "seek": 132584, "start": 1338.0, "end": 1344.9599999999998, "text": + " So that''s mostly, uh, you know, where my word lies. Some dangling between my + desk and my kitchen,", "tokens": [50972, 407, 300, 311, 5240, 11, 2232, 11, 291, + 458, 11, 689, 452, 1349, 9134, 13, 2188, 21892, 1688, 1296, 452, 10026, 293, 452, + 6525, 11, 51320], "temperature": 0.0, "avg_logprob": -0.14028264582157135, "compression_ratio": + 1.7777777777777777, "no_speech_prob": 0.03664243593811989}, {"id": 216, "seek": + 132584, "start": 1344.9599999999998, "end": 1349.1999999999998, "text": " because + I think I love it that way. And that''s what I call my company as well.", "tokens": + [51320, 570, 286, 519, 286, 959, 309, 300, 636, 13, 400, 300, 311, 437, 286, 818, + 452, 2237, 382, 731, 13, 51532], "temperature": 0.0, "avg_logprob": -0.14028264582157135, + "compression_ratio": 1.7777777777777777, "no_speech_prob": 0.03664243593811989}, + {"id": 217, "seek": 132584, "start": 1349.1999999999998, "end": 1353.76, "text": + " But do you think there is some connection actually? 
I think at some point, I even + like vlog really", "tokens": [51532, 583, 360, 291, 519, 456, 307, 512, 4984, 767, + 30, 286, 519, 412, 512, 935, 11, 286, 754, 411, 8917, 534, 51760], "temperature": + 0.0, "avg_logprob": -0.14028264582157135, "compression_ratio": 1.7777777777777777, + "no_speech_prob": 0.03664243593811989}, {"id": 218, "seek": 135376, "start": 1353.76, + "end": 1360.32, "text": " briefly, uh, about this, uh, as I was just learning how + to cook, I guess, uh, there''s some connection", "tokens": [50364, 10515, 11, 2232, + 11, 466, 341, 11, 2232, 11, 382, 286, 390, 445, 2539, 577, 281, 2543, 11, 286, 2041, + 11, 2232, 11, 456, 311, 512, 4984, 50692], "temperature": 0.0, "avg_logprob": -0.1049320423497563, + "compression_ratio": 1.7148014440433212, "no_speech_prob": 0.021328510716557503}, + {"id": 219, "seek": 135376, "start": 1360.32, "end": 1366.48, "text": " between + how you write code from scratch and how you cook before you learned how to cook + that", "tokens": [50692, 1296, 577, 291, 2464, 3089, 490, 8459, 293, 577, 291, 2543, + 949, 291, 3264, 577, 281, 2543, 300, 51000], "temperature": 0.0, "avg_logprob": + -0.1049320423497563, "compression_ratio": 1.7148014440433212, "no_speech_prob": + 0.021328510716557503}, {"id": 220, "seek": 135376, "start": 1366.48, "end": 1371.68, + "text": " particular dish, right? Like you can assemble from building blocks and + like in that order.", "tokens": [51000, 1729, 5025, 11, 558, 30, 1743, 291, 393, + 22364, 490, 2390, 8474, 293, 411, 294, 300, 1668, 13, 51260], "temperature": 0.0, + "avg_logprob": -0.1049320423497563, "compression_ratio": 1.7148014440433212, "no_speech_prob": + 0.021328510716557503}, {"id": 221, "seek": 135376, "start": 1372.8799999999999, + "end": 1377.68, "text": " Right. I think, I think that''s an interesting point. 
+ I mean, it does function the same way.", "tokens": [51320, 1779, 13, 286, 519, 11, + 286, 519, 300, 311, 364, 1880, 935, 13, 286, 914, 11, 309, 775, 2445, 264, 912, + 636, 13, 51560], "temperature": 0.0, "avg_logprob": -0.1049320423497563, "compression_ratio": + 1.7148014440433212, "no_speech_prob": 0.021328510716557503}, {"id": 222, "seek": + 135376, "start": 1378.24, "end": 1382.56, "text": " And it''s obviously like the + experimentation is something like you would experiment with different", "tokens": + [51588, 400, 309, 311, 2745, 411, 264, 37142, 307, 746, 411, 291, 576, 5120, 365, + 819, 51804], "temperature": 0.0, "avg_logprob": -0.1049320423497563, "compression_ratio": + 1.7148014440433212, "no_speech_prob": 0.021328510716557503}, {"id": 223, "seek": + 138256, "start": 1382.56, "end": 1389.28, "text": " cuisines. Like usually I have, + I mean, I don''t really do that. I don''t change the basic nature of", "tokens": + [50364, 2702, 271, 1652, 13, 1743, 2673, 286, 362, 11, 286, 914, 11, 286, 500, 380, + 534, 360, 300, 13, 286, 500, 380, 1319, 264, 3875, 3687, 295, 50700], "temperature": + 0.0, "avg_logprob": -0.14194114391620344, "compression_ratio": 1.7902621722846441, + "no_speech_prob": 0.005392957013100386}, {"id": 224, "seek": 138256, "start": 1389.28, + "end": 1393.76, "text": " the food. I mean, if it is German food, it should be German + food. I mean, I would not try to", "tokens": [50700, 264, 1755, 13, 286, 914, 11, + 498, 309, 307, 6521, 1755, 11, 309, 820, 312, 6521, 1755, 13, 286, 914, 11, 286, + 576, 406, 853, 281, 50924], "temperature": 0.0, "avg_logprob": -0.14194114391620344, + "compression_ratio": 1.7902621722846441, "no_speech_prob": 0.005392957013100386}, + {"id": 225, "seek": 138256, "start": 1393.76, "end": 1400.3999999999999, "text": + " Indianize the food. 
Uh, but I think somewhere I do that and that experimentation + is something that", "tokens": [50924, 6427, 1125, 264, 1755, 13, 4019, 11, 457, + 286, 519, 4079, 286, 360, 300, 293, 300, 37142, 307, 746, 300, 51256], "temperature": + 0.0, "avg_logprob": -0.14194114391620344, "compression_ratio": 1.7902621722846441, + "no_speech_prob": 0.005392957013100386}, {"id": 226, "seek": 138256, "start": 1400.3999999999999, + "end": 1405.28, "text": " I would also connect with like creating something. And + I think that''s, I mean, I never thought about it", "tokens": [51256, 286, 576, + 611, 1745, 365, 411, 4084, 746, 13, 400, 286, 519, 300, 311, 11, 286, 914, 11, 286, + 1128, 1194, 466, 309, 51500], "temperature": 0.0, "avg_logprob": -0.14194114391620344, + "compression_ratio": 1.7902621722846441, "no_speech_prob": 0.005392957013100386}, + {"id": 227, "seek": 138256, "start": 1405.28, "end": 1411.9199999999998, "text": + " from that aspect, but yeah, I think good point, good core religion, so to say. + Yeah.", "tokens": [51500, 490, 300, 4171, 11, 457, 1338, 11, 286, 519, 665, 935, + 11, 665, 4965, 7561, 11, 370, 281, 584, 13, 865, 13, 51832], "temperature": 0.0, + "avg_logprob": -0.14194114391620344, "compression_ratio": 1.7902621722846441, "no_speech_prob": + 0.005392957013100386}, {"id": 228, "seek": 141192, "start": 1412.0, "end": 1416.88, + "text": " Absolutely. And then what happened next? So you opened your bistro innovations + lab?", "tokens": [50368, 7021, 13, 400, 550, 437, 2011, 958, 30, 407, 291, 5625, + 428, 18209, 340, 24283, 2715, 30, 50612], "temperature": 0.0, "avg_logprob": -0.16402820587158204, + "compression_ratio": 1.6026200873362446, "no_speech_prob": 0.0033813731279224157}, + {"id": 229, "seek": 141192, "start": 1417.76, "end": 1424.0800000000002, "text": + " True. And I think that landed me a job in Germany. 
Uh, and interestingly, I had + never been", "tokens": [50656, 13587, 13, 400, 286, 519, 300, 15336, 385, 257, 1691, + 294, 7244, 13, 4019, 11, 293, 25873, 11, 286, 632, 1128, 668, 50972], "temperature": + 0.0, "avg_logprob": -0.16402820587158204, "compression_ratio": 1.6026200873362446, + "no_speech_prob": 0.0033813731279224157}, {"id": 230, "seek": 141192, "start": 1424.0800000000002, + "end": 1431.04, "text": " a Germany before. And for me, I mean, I had been to London + before I had been to US before. So to", "tokens": [50972, 257, 7244, 949, 13, 400, + 337, 385, 11, 286, 914, 11, 286, 632, 668, 281, 7042, 949, 286, 632, 668, 281, 2546, + 949, 13, 407, 281, 51320], "temperature": 0.0, "avg_logprob": -0.16402820587158204, + "compression_ratio": 1.6026200873362446, "no_speech_prob": 0.0033813731279224157}, + {"id": 231, "seek": 141192, "start": 1431.04, "end": 1437.1200000000001, "text": + " me or precisely so to say, you know, I''m trying to wrap all the Indians that + would say, for us,", "tokens": [51320, 385, 420, 13402, 370, 281, 584, 11, 291, + 458, 11, 286, 478, 1382, 281, 7019, 439, 264, 23838, 300, 576, 584, 11, 337, 505, + 11, 51624], "temperature": 0.0, "avg_logprob": -0.16402820587158204, "compression_ratio": + 1.6026200873362446, "no_speech_prob": 0.0033813731279224157}, {"id": 232, "seek": + 143712, "start": 1437.12, "end": 1442.2399999999998, "text": " every foreign country, + you know, we can speak in English. But I think the biggest trauma was like", "tokens": + [50364, 633, 5329, 1941, 11, 291, 458, 11, 321, 393, 1710, 294, 3669, 13, 583, 286, + 519, 264, 3880, 11407, 390, 411, 50620], "temperature": 0.0, "avg_logprob": -0.18979384167359606, + "compression_ratio": 1.649789029535865, "no_speech_prob": 0.014449995942413807}, + {"id": 233, "seek": 143712, "start": 1442.2399999999998, "end": 1451.6799999999998, + "text": " when I landed in Germany in Berlin. And I realized that, uh, English gay + niche. 
And I realized that,", "tokens": [50620, 562, 286, 15336, 294, 7244, 294, + 13848, 13, 400, 286, 5334, 300, 11, 2232, 11, 3669, 9049, 19956, 13, 400, 286, 5334, + 300, 11, 51092], "temperature": 0.0, "avg_logprob": -0.18979384167359606, "compression_ratio": + 1.649789029535865, "no_speech_prob": 0.014449995942413807}, {"id": 234, "seek": + 143712, "start": 1451.6799999999998, "end": 1458.8799999999999, "text": " you know, + it would be very difficult. But I think I had, I had a tough year in 2018 when I + decided", "tokens": [51092, 291, 458, 11, 309, 576, 312, 588, 2252, 13, 583, 286, + 519, 286, 632, 11, 286, 632, 257, 4930, 1064, 294, 6096, 562, 286, 3047, 51452], + "temperature": 0.0, "avg_logprob": -0.18979384167359606, "compression_ratio": 1.649789029535865, + "no_speech_prob": 0.014449995942413807}, {"id": 235, "seek": 143712, "start": 1458.8799999999999, + "end": 1463.28, "text": " to move here for several reasons, because I was trying + to make sure like my, you know, family", "tokens": [51452, 281, 1286, 510, 337, + 2940, 4112, 11, 570, 286, 390, 1382, 281, 652, 988, 411, 452, 11, 291, 458, 11, + 1605, 51672], "temperature": 0.0, "avg_logprob": -0.18979384167359606, "compression_ratio": + 1.649789029535865, "no_speech_prob": 0.014449995942413807}, {"id": 236, "seek": + 146328, "start": 1463.28, "end": 1468.8799999999999, "text": " settles here will + at the same time trying to, you know, have a little bit of a grip on the language", + "tokens": [50364, 5584, 904, 510, 486, 412, 264, 912, 565, 1382, 281, 11, 291, 458, + 11, 362, 257, 707, 857, 295, 257, 12007, 322, 264, 2856, 50644], "temperature": + 0.0, "avg_logprob": -0.20464043223529782, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.014934953302145004}, {"id": 237, "seek": 146328, "start": 1468.8799999999999, + "end": 1475.44, "text": " as well, like my work at that point in time. 
But I decided + to close the company afterwards.", "tokens": [50644, 382, 731, 11, 411, 452, 589, + 412, 300, 935, 294, 565, 13, 583, 286, 3047, 281, 1998, 264, 2237, 10543, 13, 50972], + "temperature": 0.0, "avg_logprob": -0.20464043223529782, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.014934953302145004}, {"id": 238, "seek": 146328, "start": 1475.44, + "end": 1483.12, "text": " Sadly, you couldn''t keep it. I could not. Yeah, there + are some general notes. So it doesn''t work.", "tokens": [50972, 29628, 11, 291, + 2809, 380, 1066, 309, 13, 286, 727, 406, 13, 865, 11, 456, 366, 512, 2674, 5570, + 13, 407, 309, 1177, 380, 589, 13, 51356], "temperature": 0.0, "avg_logprob": -0.20464043223529782, + "compression_ratio": 1.6182572614107884, "no_speech_prob": 0.014934953302145004}, + {"id": 239, "seek": 146328, "start": 1484.6399999999999, "end": 1490.16, "text": + " And did you, but you did have clients on it? Like you did the, oh yes, I had clients. + I had clients.", "tokens": [51432, 400, 630, 291, 11, 457, 291, 630, 362, 6982, + 322, 309, 30, 1743, 291, 630, 264, 11, 1954, 2086, 11, 286, 632, 6982, 13, 286, + 632, 6982, 13, 51708], "temperature": 0.0, "avg_logprob": -0.20464043223529782, + "compression_ratio": 1.6182572614107884, "no_speech_prob": 0.014934953302145004}, + {"id": 240, "seek": 149016, "start": 1490.16, "end": 1496.16, "text": " I had three + pretty major clients back then. And I think one thing that somebody commented about + it", "tokens": [50364, 286, 632, 1045, 1238, 2563, 6982, 646, 550, 13, 400, 286, + 519, 472, 551, 300, 2618, 26940, 466, 309, 50664], "temperature": 0.0, "avg_logprob": + -0.10768066066326482, "compression_ratio": 1.7543859649122806, "no_speech_prob": + 0.019880710169672966}, {"id": 241, "seek": 149016, "start": 1496.16, "end": 1503.0400000000002, + "text": " yesterday as well. 
And they say that I come across subtle, you know, in + a very subtle way, you know,", "tokens": [50664, 5186, 382, 731, 13, 400, 436, 584, + 300, 286, 808, 2108, 13743, 11, 291, 458, 11, 294, 257, 588, 13743, 636, 11, 291, + 458, 11, 51008], "temperature": 0.0, "avg_logprob": -0.10768066066326482, "compression_ratio": + 1.7543859649122806, "no_speech_prob": 0.019880710169672966}, {"id": 242, "seek": + 149016, "start": 1504.16, "end": 1510.16, "text": " very straightforward way. And + this is something that people do not expect me to be. But I think that", "tokens": + [51064, 588, 15325, 636, 13, 400, 341, 307, 746, 300, 561, 360, 406, 2066, 385, + 281, 312, 13, 583, 286, 519, 300, 51364], "temperature": 0.0, "avg_logprob": -0.10768066066326482, + "compression_ratio": 1.7543859649122806, "no_speech_prob": 0.019880710169672966}, + {"id": 243, "seek": 149016, "start": 1510.16, "end": 1516.72, "text": " subtleness + came way, hard way to me. And I want to preserve it that way. So one thing that + I try to", "tokens": [51364, 7257, 45887, 1361, 636, 11, 1152, 636, 281, 385, 13, + 400, 286, 528, 281, 15665, 309, 300, 636, 13, 407, 472, 551, 300, 286, 853, 281, + 51692], "temperature": 0.0, "avg_logprob": -0.10768066066326482, "compression_ratio": + 1.7543859649122806, "no_speech_prob": 0.019880710169672966}, {"id": 244, "seek": + 151672, "start": 1516.72, "end": 1521.44, "text": " set an example also for my kids + is that I don''t lie. I try to make things as clear as possible. 
So", "tokens": + [50364, 992, 364, 1365, 611, 337, 452, 2301, 307, 300, 286, 500, 380, 4544, 13, + 286, 853, 281, 652, 721, 382, 1850, 382, 1944, 13, 407, 50600], "temperature": 0.0, + "avg_logprob": -0.08500471640759566, "compression_ratio": 1.7178571428571427, "no_speech_prob": + 0.016301972791552544}, {"id": 245, "seek": 151672, "start": 1521.44, "end": 1526.32, + "text": " I tried communicating to my clients that, you know, I would not be able + to work because I have", "tokens": [50600, 286, 3031, 17559, 281, 452, 6982, 300, + 11, 291, 458, 11, 286, 576, 406, 312, 1075, 281, 589, 570, 286, 362, 50844], "temperature": + 0.0, "avg_logprob": -0.08500471640759566, "compression_ratio": 1.7178571428571427, + "no_speech_prob": 0.016301972791552544}, {"id": 246, "seek": 151672, "start": 1526.32, + "end": 1531.44, "text": " already my hands full. And I''m trying to settle in a + new country, trying to manage my family,", "tokens": [50844, 1217, 452, 2377, 1577, + 13, 400, 286, 478, 1382, 281, 11852, 294, 257, 777, 1941, 11, 1382, 281, 3067, 452, + 1605, 11, 51100], "temperature": 0.0, "avg_logprob": -0.08500471640759566, "compression_ratio": + 1.7178571428571427, "no_speech_prob": 0.016301972791552544}, {"id": 247, "seek": + 151672, "start": 1531.44, "end": 1538.08, "text": " also helping, you know, my husband + who did not have a job back then. But I mean, if you can adjust", "tokens": [51100, + 611, 4315, 11, 291, 458, 11, 452, 5213, 567, 630, 406, 362, 257, 1691, 646, 550, + 13, 583, 286, 914, 11, 498, 291, 393, 4369, 51432], "temperature": 0.0, "avg_logprob": + -0.08500471640759566, "compression_ratio": 1.7178571428571427, "no_speech_prob": + 0.016301972791552544}, {"id": 248, "seek": 151672, "start": 1538.08, "end": 1542.96, + "text": " with that, we can still keep working. 
But then I would not charge you + for that because I mean,", "tokens": [51432, 365, 300, 11, 321, 393, 920, 1066, + 1364, 13, 583, 550, 286, 576, 406, 4602, 291, 337, 300, 570, 286, 914, 11, 51676], + "temperature": 0.0, "avg_logprob": -0.08500471640759566, "compression_ratio": 1.7178571428571427, + "no_speech_prob": 0.016301972791552544}, {"id": 249, "seek": 154296, "start": 1542.96, + "end": 1548.48, "text": " anyway, I would be paying taxes for it. So I think eventually, + I mean, it was like more like", "tokens": [50364, 4033, 11, 286, 576, 312, 6229, + 10041, 337, 309, 13, 407, 286, 519, 4728, 11, 286, 914, 11, 309, 390, 411, 544, + 411, 50640], "temperature": 0.0, "avg_logprob": -0.16440885969736044, "compression_ratio": + 1.6725663716814159, "no_speech_prob": 0.038620784878730774}, {"id": 250, "seek": + 154296, "start": 1549.3600000000001, "end": 1555.76, "text": " one or two calls + a week. And then it transformed into one call of a, then one call in two weeks,", + "tokens": [50684, 472, 420, 732, 5498, 257, 1243, 13, 400, 550, 309, 16894, 666, + 472, 818, 295, 257, 11, 550, 472, 818, 294, 732, 3259, 11, 51004], "temperature": + 0.0, "avg_logprob": -0.16440885969736044, "compression_ratio": 1.6725663716814159, + "no_speech_prob": 0.038620784878730774}, {"id": 251, "seek": 154296, "start": 1555.76, + "end": 1560.08, "text": " and then one call of month. And then eventually I just + lost all the clients. And I think", "tokens": [51004, 293, 550, 472, 818, 295, 1618, + 13, 400, 550, 4728, 286, 445, 2731, 439, 264, 6982, 13, 400, 286, 519, 51220], "temperature": + 0.0, "avg_logprob": -0.16440885969736044, "compression_ratio": 1.6725663716814159, + "no_speech_prob": 0.038620784878730774}, {"id": 252, "seek": 154296, "start": 1560.64, + "end": 1567.8400000000001, "text": " that''s when I decided to close it out. Yeah, + yeah. That may happen. 
But but the still you have the,", "tokens": [51248, 300, + 311, 562, 286, 3047, 281, 1998, 309, 484, 13, 865, 11, 1338, 13, 663, 815, 1051, + 13, 583, 457, 264, 920, 291, 362, 264, 11, 51608], "temperature": 0.0, "avg_logprob": + -0.16440885969736044, "compression_ratio": 1.6725663716814159, "no_speech_prob": + 0.038620784878730774}, {"id": 253, "seek": 156784, "start": 1567.84, "end": 1574.24, + "text": " you know, the affinity to boards search world, right? And develop.", "tokens": + [50364, 291, 458, 11, 264, 39703, 281, 13293, 3164, 1002, 11, 558, 30, 400, 1499, + 13, 50684], "temperature": 0.0, "avg_logprob": -0.19147167435611587, "compression_ratio": + 1.6490384615384615, "no_speech_prob": 0.01197296753525734}, {"id": 254, "seek": + 156784, "start": 1574.24, "end": 1580.6399999999999, "text": " I do have. I do have. + And I am working as a search relevance consultant now with open source", "tokens": + [50684, 286, 360, 362, 13, 286, 360, 362, 13, 400, 286, 669, 1364, 382, 257, 3164, + 32684, 24676, 586, 365, 1269, 4009, 51004], "temperature": 0.0, "avg_logprob": -0.19147167435611587, + "compression_ratio": 1.6490384615384615, "no_speech_prob": 0.01197296753525734}, + {"id": 255, "seek": 156784, "start": 1580.6399999999999, "end": 1584.9599999999998, + "text": " connections. I think one of the reasons that I decided to work with this + company, I mean,", "tokens": [51004, 9271, 13, 286, 519, 472, 295, 264, 4112, 300, + 286, 3047, 281, 589, 365, 341, 2237, 11, 286, 914, 11, 51220], "temperature": 0.0, + "avg_logprob": -0.19147167435611587, "compression_ratio": 1.6490384615384615, "no_speech_prob": + 0.01197296753525734}, {"id": 256, "seek": 156784, "start": 1584.9599999999998, "end": + 1592.32, "text": " they are well known search consulting companies in the space. 
+ And the mission statement that,", "tokens": [51220, 436, 366, 731, 2570, 3164, 23682, + 3431, 294, 264, 1901, 13, 400, 264, 4447, 5629, 300, 11, 51588], "temperature": + 0.0, "avg_logprob": -0.19147167435611587, "compression_ratio": 1.6490384615384615, + "no_speech_prob": 0.01197296753525734}, {"id": 257, "seek": 159232, "start": 1593.12, + "end": 1599.4399999999998, "text": " you know, they want to empower the search teams + of the word that really, you know, is something that", "tokens": [50404, 291, 458, + 11, 436, 528, 281, 11071, 264, 3164, 5491, 295, 264, 1349, 300, 534, 11, 291, 458, + 11, 307, 746, 300, 50720], "temperature": 0.0, "avg_logprob": -0.12073604733336206, + "compression_ratio": 1.8691588785046729, "no_speech_prob": 0.03333936631679535}, + {"id": 258, "seek": 159232, "start": 1599.4399999999998, "end": 1605.9199999999998, + "text": " really rings a bell, you know, or I would say like it shines with what + I want to do. Basically,", "tokens": [50720, 534, 11136, 257, 4549, 11, 291, 458, + 11, 420, 286, 576, 584, 411, 309, 28056, 365, 437, 286, 528, 281, 360, 13, 8537, + 11, 51044], "temperature": 0.0, "avg_logprob": -0.12073604733336206, "compression_ratio": + 1.8691588785046729, "no_speech_prob": 0.03333936631679535}, {"id": 259, "seek": + 159232, "start": 1605.9199999999998, "end": 1610.56, "text": " I mean, you know, + all of us have this, you know, mindset that we all want to do something for money", + "tokens": [51044, 286, 914, 11, 291, 458, 11, 439, 295, 505, 362, 341, 11, 291, + 458, 11, 12543, 300, 321, 439, 528, 281, 360, 746, 337, 1460, 51276], "temperature": + 0.0, "avg_logprob": -0.12073604733336206, "compression_ratio": 1.8691588785046729, + "no_speech_prob": 0.03333936631679535}, {"id": 260, "seek": 159232, "start": 1610.56, + "end": 1617.12, "text": " and something for what we really are passionate about. 
+ So I think that''s really nice like how I connect", "tokens": [51276, 293, 746, + 337, 437, 321, 534, 366, 11410, 466, 13, 407, 286, 519, 300, 311, 534, 1481, 411, + 577, 286, 1745, 51604], "temperature": 0.0, "avg_logprob": -0.12073604733336206, + "compression_ratio": 1.8691588785046729, "no_speech_prob": 0.03333936631679535}, + {"id": 261, "seek": 161712, "start": 1617.12, "end": 1623.28, "text": " because + I feel like I resonate with the model of the company. I mean, I also want to, you + know,", "tokens": [50364, 570, 286, 841, 411, 286, 34285, 365, 264, 2316, 295, 264, + 2237, 13, 286, 914, 11, 286, 611, 528, 281, 11, 291, 458, 11, 50672], "temperature": + 0.0, "avg_logprob": -0.13322033832982644, "compression_ratio": 1.6416666666666666, + "no_speech_prob": 0.0015714033506810665}, {"id": 262, "seek": 161712, "start": 1623.28, + "end": 1629.12, "text": " empower people and like share my knowledge as open source + or like, I mean, if I''m getting paid for", "tokens": [50672, 11071, 561, 293, 411, + 2073, 452, 3601, 382, 1269, 4009, 420, 411, 11, 286, 914, 11, 498, 286, 478, 1242, + 4835, 337, 50964], "temperature": 0.0, "avg_logprob": -0.13322033832982644, "compression_ratio": + 1.6416666666666666, "no_speech_prob": 0.0015714033506810665}, {"id": 263, "seek": + 161712, "start": 1629.12, "end": 1635.9199999999998, "text": " it, even better. + But yeah. And I think it is also different from how traditional consulting companies", + "tokens": [50964, 309, 11, 754, 1101, 13, 583, 1338, 13, 400, 286, 519, 309, 307, + 611, 819, 490, 577, 5164, 23682, 3431, 51304], "temperature": 0.0, "avg_logprob": + -0.13322033832982644, "compression_ratio": 1.6416666666666666, "no_speech_prob": + 0.0015714033506810665}, {"id": 264, "seek": 161712, "start": 1635.9199999999998, + "end": 1641.84, "text": " work. Like I have been with consulting company myself + in a starting of my career. 
And I feel like", "tokens": [51304, 589, 13, 1743, 286, + 362, 668, 365, 23682, 2237, 2059, 294, 257, 2891, 295, 452, 3988, 13, 400, 286, + 841, 411, 51600], "temperature": 0.0, "avg_logprob": -0.13322033832982644, "compression_ratio": + 1.6416666666666666, "no_speech_prob": 0.0015714033506810665}, {"id": 265, "seek": + 164184, "start": 1641.9199999999998, "end": 1647.4399999999998, "text": " that the + companies who are taking the services of these consulting companies are more like, + you know,", "tokens": [50368, 300, 264, 3431, 567, 366, 1940, 264, 3328, 295, 613, + 23682, 3431, 366, 544, 411, 11, 291, 458, 11, 50644], "temperature": 0.0, "avg_logprob": + -0.15836983635312035, "compression_ratio": 1.7773722627737227, "no_speech_prob": + 0.0074394214898347855}, {"id": 266, "seek": 164184, "start": 1647.4399999999998, + "end": 1653.6, "text": " very closely tied. I mean, it''s like, you know, being + mad at with them forever and ever. Like, no,", "tokens": [50644, 588, 8185, 9601, + 13, 286, 914, 11, 309, 311, 411, 11, 291, 458, 11, 885, 5244, 412, 365, 552, 5680, + 293, 1562, 13, 1743, 11, 572, 11, 50952], "temperature": 0.0, "avg_logprob": -0.15836983635312035, + "compression_ratio": 1.7773722627737227, "no_speech_prob": 0.0074394214898347855}, + {"id": 267, "seek": 164184, "start": 1653.6, "end": 1657.76, "text": " turning back + now, you''re always, you know, going to be with us. And I think that''s where it''s", + "tokens": [50952, 6246, 646, 586, 11, 291, 434, 1009, 11, 291, 458, 11, 516, 281, + 312, 365, 505, 13, 400, 286, 519, 300, 311, 689, 309, 311, 51160], "temperature": + 0.0, "avg_logprob": -0.15836983635312035, "compression_ratio": 1.7773722627737227, + "no_speech_prob": 0.0074394214898347855}, {"id": 268, "seek": 164184, "start": 1657.76, + "end": 1663.76, "text": " source connection. It''s like job security for some businesses, + probably, right? 
That is true.", "tokens": [51160, 4009, 4984, 13, 467, 311, 411, + 1691, 3825, 337, 512, 6011, 11, 1391, 11, 558, 30, 663, 307, 2074, 13, 51460], "temperature": + 0.0, "avg_logprob": -0.15836983635312035, "compression_ratio": 1.7773722627737227, + "no_speech_prob": 0.0074394214898347855}, {"id": 269, "seek": 164184, "start": 1663.76, + "end": 1669.12, "text": " Kind of like working model. And you''re saying that, you + know, in open source connections, it''s the", "tokens": [51460, 9242, 295, 411, + 1364, 2316, 13, 400, 291, 434, 1566, 300, 11, 291, 458, 11, 294, 1269, 4009, 9271, + 11, 309, 311, 264, 51728], "temperature": 0.0, "avg_logprob": -0.15836983635312035, + "compression_ratio": 1.7773722627737227, "no_speech_prob": 0.0074394214898347855}, + {"id": 270, "seek": 166912, "start": 1669.12, "end": 1675.04, "text": " opposite. + And it''s actually clearly stated that what you said, empower search teams to", + "tokens": [50364, 6182, 13, 400, 309, 311, 767, 4448, 11323, 300, 437, 291, 848, + 11, 11071, 3164, 5491, 281, 50660], "temperature": 0.0, "avg_logprob": -0.15537635647520728, + "compression_ratio": 1.603448275862069, "no_speech_prob": 0.004672720097005367}, + {"id": 271, "seek": 166912, "start": 1676.1599999999999, "end": 1683.28, "text": + " learn and become independent if they want, right? Exactly. That kind of entice + the whole situation.", "tokens": [50716, 1466, 293, 1813, 6695, 498, 436, 528, 11, + 558, 30, 7587, 13, 663, 733, 295, 948, 573, 264, 1379, 2590, 13, 51072], "temperature": + 0.0, "avg_logprob": -0.15537635647520728, "compression_ratio": 1.603448275862069, + "no_speech_prob": 0.004672720097005367}, {"id": 272, "seek": 166912, "start": 1683.28, + "end": 1689.12, "text": " You don''t need to really be exactly. 
And if you look + at its very natural, like, you know,", "tokens": [51072, 509, 500, 380, 643, 281, + 534, 312, 2293, 13, 400, 498, 291, 574, 412, 1080, 588, 3303, 11, 411, 11, 291, + 458, 11, 51364], "temperature": 0.0, "avg_logprob": -0.15537635647520728, "compression_ratio": + 1.603448275862069, "no_speech_prob": 0.004672720097005367}, {"id": 273, "seek": + 166912, "start": 1689.12, "end": 1694.1599999999999, "text": " we help teams to, + you know, fish their own fish. It''s like, you know, we are unblocking them to", + "tokens": [51364, 321, 854, 5491, 281, 11, 291, 458, 11, 3506, 641, 1065, 3506, + 13, 467, 311, 411, 11, 291, 458, 11, 321, 366, 517, 28830, 278, 552, 281, 51616], + "temperature": 0.0, "avg_logprob": -0.15537635647520728, "compression_ratio": 1.603448275862069, + "no_speech_prob": 0.004672720097005367}, {"id": 274, "seek": 169416, "start": 1694.72, + "end": 1699.44, "text": " achieve their goals. I mean, if you think of it from a + video game point of view, like, and then", "tokens": [50392, 4584, 641, 5493, 13, + 286, 914, 11, 498, 291, 519, 295, 309, 490, 257, 960, 1216, 935, 295, 1910, 11, + 411, 11, 293, 550, 50628], "temperature": 0.0, "avg_logprob": -0.09924637526273727, + "compression_ratio": 1.7428571428571429, "no_speech_prob": 0.03188350051641464}, + {"id": 275, "seek": 169416, "start": 1699.44, "end": 1703.8400000000001, "text": + " they will be stuck at some other point. I mean, by then the context would have + changed and they", "tokens": [50628, 436, 486, 312, 5541, 412, 512, 661, 935, 13, + 286, 914, 11, 538, 550, 264, 4319, 576, 362, 3105, 293, 436, 50848], "temperature": + 0.0, "avg_logprob": -0.09924637526273727, "compression_ratio": 1.7428571428571429, + "no_speech_prob": 0.03188350051641464}, {"id": 276, "seek": 169416, "start": 1703.8400000000001, + "end": 1709.8400000000001, "text": " would have swim through like their initial + challenges. 
So, I mean, as a consultant, I would also get", "tokens": [50848, 576, + 362, 7110, 807, 411, 641, 5883, 4759, 13, 407, 11, 286, 914, 11, 382, 257, 24676, + 11, 286, 576, 611, 483, 51148], "temperature": 0.0, "avg_logprob": -0.09924637526273727, + "compression_ratio": 1.7428571428571429, "no_speech_prob": 0.03188350051641464}, + {"id": 277, "seek": 169416, "start": 1709.8400000000001, "end": 1715.6000000000001, + "text": " a new use case next time. So it''s like, you know, we keep on learning + with our clients. And I think", "tokens": [51148, 257, 777, 764, 1389, 958, 565, + 13, 407, 309, 311, 411, 11, 291, 458, 11, 321, 1066, 322, 2539, 365, 527, 6982, + 13, 400, 286, 519, 51436], "temperature": 0.0, "avg_logprob": -0.09924637526273727, + "compression_ratio": 1.7428571428571429, "no_speech_prob": 0.03188350051641464}, + {"id": 278, "seek": 169416, "start": 1715.6000000000001, "end": 1722.5600000000002, + "text": " that''s what really excites me about my job. That sounds great. And I + mean, when you think about", "tokens": [51436, 300, 311, 437, 534, 1624, 3324, 385, + 466, 452, 1691, 13, 663, 3263, 869, 13, 400, 286, 914, 11, 562, 291, 519, 466, 51784], + "temperature": 0.0, "avg_logprob": -0.09924637526273727, "compression_ratio": 1.7428571428571429, + "no_speech_prob": 0.03188350051641464}, {"id": 279, "seek": 172256, "start": 1722.56, + "end": 1731.12, "text": " search, really, what are the companies that are so publicly + known and shining and doing so many", "tokens": [50364, 3164, 11, 534, 11, 437, + 366, 264, 3431, 300, 366, 370, 14843, 2570, 293, 18269, 293, 884, 370, 867, 50792], + "temperature": 0.0, "avg_logprob": -0.18339136144617102, "compression_ratio": 1.6771300448430493, + "no_speech_prob": 0.009003683924674988}, {"id": 280, "seek": 172256, "start": 1731.12, + "end": 1736.8799999999999, "text": " things, but open source connections, you know, + with haystack, grandparents in Europe and", "tokens": [50792, 721, 11, 457, 1269, + 4009, 9271, 11, 
291, 458, 11, 365, 4842, 372, 501, 11, 21876, 294, 3315, 293, 51080], + "temperature": 0.0, "avg_logprob": -0.18339136144617102, "compression_ratio": 1.6771300448430493, + "no_speech_prob": 0.009003683924674988}, {"id": 281, "seek": 172256, "start": 1737.84, + "end": 1746.72, "text": " in the US with all the tooling, you know, like, cube it + and beyond. Like, I think it''s in part,", "tokens": [51128, 294, 264, 2546, 365, + 439, 264, 46593, 11, 291, 458, 11, 411, 11, 13728, 309, 293, 4399, 13, 1743, 11, + 286, 519, 309, 311, 294, 644, 11, 51572], "temperature": 0.0, "avg_logprob": -0.18339136144617102, + "compression_ratio": 1.6771300448430493, "no_speech_prob": 0.009003683924674988}, + {"id": 282, "seek": 172256, "start": 1746.72, "end": 1751.76, "text": " I think + why this podcast also exists is because to keep going and discussing and keeping + that", "tokens": [51572, 286, 519, 983, 341, 7367, 611, 8198, 307, 570, 281, 1066, + 516, 293, 10850, 293, 5145, 300, 51824], "temperature": 0.0, "avg_logprob": -0.18339136144617102, + "compression_ratio": 1.6771300448430493, "no_speech_prob": 0.009003683924674988}, + {"id": 283, "seek": 175256, "start": 1752.8, "end": 1761.28, "text": " connection + open that we talk and we develop the thought further and we share our experiences.", + "tokens": [50376, 4984, 1269, 300, 321, 751, 293, 321, 1499, 264, 1194, 3052, 293, + 321, 2073, 527, 5235, 13, 50800], "temperature": 0.0, "avg_logprob": -0.21040453910827636, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0031428614165633917}, + {"id": 284, "seek": 175256, "start": 1762.0, "end": 1766.8799999999999, "text": + " And I think in many ways that''s what open source connections has been so successfully + doing.", "tokens": [50836, 400, 286, 519, 294, 867, 2098, 300, 311, 437, 1269, 4009, + 9271, 575, 668, 370, 10727, 884, 13, 51080], "temperature": 0.0, "avg_logprob": + -0.21040453910827636, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 
0.0031428614165633917}, {"id": 285, "seek": 175256, "start": 1767.76, "end": 1771.84, + "text": " And what is your role there in little bit more detail?", "tokens": [51124, + 400, 437, 307, 428, 3090, 456, 294, 707, 857, 544, 2607, 30, 51328], "temperature": + 0.0, "avg_logprob": -0.21040453910827636, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.0031428614165633917}, {"id": 286, "seek": 175256, "start": 1772.96, + "end": 1778.72, "text": " So about my role, I think things, and again, another thing + is that I love about my job is that", "tokens": [51384, 407, 466, 452, 3090, 11, + 286, 519, 721, 11, 293, 797, 11, 1071, 551, 307, 300, 286, 959, 466, 452, 1691, + 307, 300, 51672], "temperature": 0.0, "avg_logprob": -0.21040453910827636, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0031428614165633917}, {"id": 287, "seek": + 177872, "start": 1779.2, "end": 1785.04, "text": " I have independence in terms + of like what I choose to work on. So when I talk about an engagement,", "tokens": + [50388, 286, 362, 14640, 294, 2115, 295, 411, 437, 286, 2826, 281, 589, 322, 13, + 407, 562, 286, 751, 466, 364, 8742, 11, 50680], "temperature": 0.0, "avg_logprob": + -0.1518173308599563, "compression_ratio": 1.76, "no_speech_prob": 0.03011154942214489}, + {"id": 288, "seek": 177872, "start": 1785.04, "end": 1791.28, "text": " I mean, + there have been cases when I''ve been strategist, also been an engineer. 
So I get, + I would say,", "tokens": [50680, 286, 914, 11, 456, 362, 668, 3331, 562, 286, 600, + 668, 5464, 468, 11, 611, 668, 364, 11403, 13, 407, 286, 483, 11, 286, 576, 584, + 11, 50992], "temperature": 0.0, "avg_logprob": -0.1518173308599563, "compression_ratio": + 1.76, "no_speech_prob": 0.03011154942214489}, {"id": 289, "seek": 177872, "start": + 1791.28, "end": 1797.28, "text": " like enough time to, you know, research about + stuff, I get time to also be hands on, I get time to,", "tokens": [50992, 411, 1547, + 565, 281, 11, 291, 458, 11, 2132, 466, 1507, 11, 286, 483, 565, 281, 611, 312, 2377, + 322, 11, 286, 483, 565, 281, 11, 51292], "temperature": 0.0, "avg_logprob": -0.1518173308599563, + "compression_ratio": 1.76, "no_speech_prob": 0.03011154942214489}, {"id": 290, "seek": + 177872, "start": 1797.28, "end": 1802.72, "text": " you know, explore stuff and, + you know, develop good solutions also as a council to the company", "tokens": [51292, + 291, 458, 11, 6839, 1507, 293, 11, 291, 458, 11, 1499, 665, 6547, 611, 382, 257, + 9209, 281, 264, 2237, 51564], "temperature": 0.0, "avg_logprob": -0.1518173308599563, + "compression_ratio": 1.76, "no_speech_prob": 0.03011154942214489}, {"id": 291, "seek": + 180272, "start": 1803.44, "end": 1809.1200000000001, "text": " that we work with. 
+ And I think individually as well, like, you know, you feel really valued.", "tokens": + [50400, 300, 321, 589, 365, 13, 400, 286, 519, 16652, 382, 731, 11, 411, 11, 291, + 458, 11, 291, 841, 534, 22608, 13, 50684], "temperature": 0.0, "avg_logprob": -0.1600781032017299, + "compression_ratio": 1.7035398230088497, "no_speech_prob": 0.01575920544564724}, + {"id": 292, "seek": 180272, "start": 1809.1200000000001, "end": 1814.16, "text": + " Like if I want to have some transition, like, for example, if you would, you know, + like to bring", "tokens": [50684, 1743, 498, 286, 528, 281, 362, 512, 6034, 11, + 411, 11, 337, 1365, 11, 498, 291, 576, 11, 291, 458, 11, 411, 281, 1565, 50936], + "temperature": 0.0, "avg_logprob": -0.1600781032017299, "compression_ratio": 1.7035398230088497, + "no_speech_prob": 0.01575920544564724}, {"id": 293, "seek": 180272, "start": 1814.16, + "end": 1821.3600000000001, "text": " this in, because it''s a vector podcast, like + I added vector search to chorus. So I think chorus is", "tokens": [50936, 341, 294, + 11, 570, 309, 311, 257, 8062, 7367, 11, 411, 286, 3869, 8062, 3164, 281, 22632, + 13, 407, 286, 519, 22632, 307, 51296], "temperature": 0.0, "avg_logprob": -0.1600781032017299, + "compression_ratio": 1.7035398230088497, "no_speech_prob": 0.01575920544564724}, + {"id": 294, "seek": 180272, "start": 1822.24, "end": 1828.32, "text": " so to say, + like a small, you know, like experimental, you know, webshop that bunch of, you + know,", "tokens": [51340, 370, 281, 584, 11, 411, 257, 1359, 11, 291, 458, 11, 411, + 17069, 11, 291, 458, 11, 2859, 9050, 300, 3840, 295, 11, 291, 458, 11, 51644], "temperature": + 0.0, "avg_logprob": -0.1600781032017299, "compression_ratio": 1.7035398230088497, + "no_speech_prob": 0.01575920544564724}, {"id": 295, "seek": 182832, "start": 1828.32, + "end": 1833.52, "text": " folks started off, I think Eric Pugh and Renee and Paul + and I think there are a bunch of other", "tokens": [50364, 4024, 1409, 766, 11, + 
286, 519, 9336, 430, 1984, 293, 47790, 293, 4552, 293, 286, 519, 456, 366, 257, + 3840, 295, 661, 50624], "temperature": 0.0, "avg_logprob": -0.14346295368822315, + "compression_ratio": 1.8520900321543408, "no_speech_prob": 0.029631907120347023}, + {"id": 296, "seek": 182832, "start": 1833.52, "end": 1838.56, "text": " folks who + tried to bring together, you know, all the two links, which is needed to run a webshop", + "tokens": [50624, 4024, 567, 3031, 281, 1565, 1214, 11, 291, 458, 11, 439, 264, + 732, 6123, 11, 597, 307, 2978, 281, 1190, 257, 2859, 9050, 50876], "temperature": + 0.0, "avg_logprob": -0.14346295368822315, "compression_ratio": 1.8520900321543408, + "no_speech_prob": 0.029631907120347023}, {"id": 297, "seek": 182832, "start": 1838.56, + "end": 1844.6399999999999, "text": " or e-commerce shop. And I think with all this + buzz that I was, you know, hearing at, you know,", "tokens": [50876, 420, 308, 12, + 26926, 3945, 13, 400, 286, 519, 365, 439, 341, 13036, 300, 286, 390, 11, 291, 458, + 11, 4763, 412, 11, 291, 458, 11, 51180], "temperature": 0.0, "avg_logprob": -0.14346295368822315, + "compression_ratio": 1.8520900321543408, "no_speech_prob": 0.029631907120347023}, + {"id": 298, "seek": 182832, "start": 1844.6399999999999, "end": 1848.56, "text": + " whenever we meet, you know, different clients, they were like, okay, what about + vectors? Obviously,", "tokens": [51180, 5699, 321, 1677, 11, 291, 458, 11, 819, + 6982, 11, 436, 645, 411, 11, 1392, 11, 437, 466, 18875, 30, 7580, 11, 51376], "temperature": + 0.0, "avg_logprob": -0.14346295368822315, "compression_ratio": 1.8520900321543408, + "no_speech_prob": 0.029631907120347023}, {"id": 299, "seek": 182832, "start": 1848.56, + "end": 1853.04, "text": " we''re consultants, people look up to us, you know, for + advises. 
Like, is it something for me?", "tokens": [51376, 321, 434, 38935, 11, + 561, 574, 493, 281, 505, 11, 291, 458, 11, 337, 1551, 3598, 13, 1743, 11, 307, 309, + 746, 337, 385, 30, 51600], "temperature": 0.0, "avg_logprob": -0.14346295368822315, + "compression_ratio": 1.8520900321543408, "no_speech_prob": 0.029631907120347023}, + {"id": 300, "seek": 182832, "start": 1853.04, "end": 1858.1599999999999, "text": + " Is it something that I can do? Is it something that, you know, we should go for, + and, you know,", "tokens": [51600, 1119, 309, 746, 300, 286, 393, 360, 30, 1119, + 309, 746, 300, 11, 291, 458, 11, 321, 820, 352, 337, 11, 293, 11, 291, 458, 11, + 51856], "temperature": 0.0, "avg_logprob": -0.14346295368822315, "compression_ratio": + 1.8520900321543408, "no_speech_prob": 0.029631907120347023}, {"id": 301, "seek": + 185816, "start": 1858.16, "end": 1864.72, "text": " I am a person again, I said, + because I have something that really stops me if I have to lie.", "tokens": [50364, + 286, 669, 257, 954, 797, 11, 286, 848, 11, 570, 286, 362, 746, 300, 534, 10094, + 385, 498, 286, 362, 281, 4544, 13, 50692], "temperature": 0.0, "avg_logprob": -0.1737676033606896, + "compression_ratio": 1.6925795053003534, "no_speech_prob": 0.0016984354006126523}, + {"id": 302, "seek": 185816, "start": 1865.28, "end": 1869.76, "text": " So, I mean, + I would usually, you know, keep mum. 
I would not really say something and I don''t", + "tokens": [50720, 407, 11, 286, 914, 11, 286, 576, 2673, 11, 291, 458, 11, 1066, + 14697, 13, 286, 576, 406, 534, 584, 746, 293, 286, 500, 380, 50944], "temperature": + 0.0, "avg_logprob": -0.1737676033606896, "compression_ratio": 1.6925795053003534, + "no_speech_prob": 0.0016984354006126523}, {"id": 303, "seek": 185816, "start": 1869.76, + "end": 1874.3200000000002, "text": " want to be in that situation for too long, + because obviously the word is still, you know, getting", "tokens": [50944, 528, + 281, 312, 294, 300, 2590, 337, 886, 938, 11, 570, 2745, 264, 1349, 307, 920, 11, + 291, 458, 11, 1242, 51172], "temperature": 0.0, "avg_logprob": -0.1737676033606896, + "compression_ratio": 1.6925795053003534, "no_speech_prob": 0.0016984354006126523}, + {"id": 304, "seek": 185816, "start": 1874.3200000000002, "end": 1879.76, "text": + " ahead of themselves. They are, you know, developing new solutions every day and + then Chad GPD came and", "tokens": [51172, 2286, 295, 2969, 13, 814, 366, 11, 291, + 458, 11, 6416, 777, 6547, 633, 786, 293, 550, 22268, 460, 17349, 1361, 293, 51444], + "temperature": 0.0, "avg_logprob": -0.1737676033606896, "compression_ratio": 1.6925795053003534, + "no_speech_prob": 0.0016984354006126523}, {"id": 305, "seek": 185816, "start": 1879.76, + "end": 1884.96, "text": " then, I mean, already a buzz about transformers and like + LLM''s, I think it''s just non-stop.", "tokens": [51444, 550, 11, 286, 914, 11, + 1217, 257, 13036, 466, 4088, 433, 293, 411, 441, 43, 44, 311, 11, 286, 519, 309, + 311, 445, 2107, 12, 13559, 13, 51704], "temperature": 0.0, "avg_logprob": -0.1737676033606896, + "compression_ratio": 1.6925795053003534, "no_speech_prob": 0.0016984354006126523}, + {"id": 306, "seek": 188496, "start": 1885.68, "end": 1890.24, "text": " And everything + is, you know, kind of, you know, like the boundaries are diminishing.", "tokens": + [50400, 400, 1203, 307, 11, 291, 458, 11, 733, 295, 11, 
291, 458, 11, 411, 264, + 13180, 366, 15739, 3807, 13, 50628], "temperature": 0.0, "avg_logprob": -0.16491677509090766, + "compression_ratio": 1.6726618705035972, "no_speech_prob": 0.026465218514204025}, + {"id": 307, "seek": 188496, "start": 1890.8, "end": 1896.4, "text": " So, I remember + like last year when I started with open source connections and Gen, in fact,", "tokens": + [50656, 407, 11, 286, 1604, 411, 1036, 1064, 562, 286, 1409, 365, 1269, 4009, 9271, + 293, 3632, 11, 294, 1186, 11, 50936], "temperature": 0.0, "avg_logprob": -0.16491677509090766, + "compression_ratio": 1.6726618705035972, "no_speech_prob": 0.026465218514204025}, + {"id": 308, "seek": 188496, "start": 1896.4, "end": 1902.56, "text": " I got a chance + to work with this client and they were like, all about, you know, West Spa and then", + "tokens": [50936, 286, 658, 257, 2931, 281, 589, 365, 341, 6423, 293, 436, 645, + 411, 11, 439, 466, 11, 291, 458, 11, 4055, 23729, 293, 550, 51244], "temperature": + 0.0, "avg_logprob": -0.16491677509090766, "compression_ratio": 1.6726618705035972, + "no_speech_prob": 0.026465218514204025}, {"id": 309, "seek": 188496, "start": 1902.56, + "end": 1908.32, "text": " we also considered working with VVA8 and then what do + you suggest? Like, are we making a good", "tokens": [51244, 321, 611, 4888, 1364, + 365, 691, 20914, 23, 293, 550, 437, 360, 291, 3402, 30, 1743, 11, 366, 321, 1455, + 257, 665, 51532], "temperature": 0.0, "avg_logprob": -0.16491677509090766, "compression_ratio": + 1.6726618705035972, "no_speech_prob": 0.026465218514204025}, {"id": 310, "seek": + 188496, "start": 1908.32, "end": 1913.04, "text": " choice? I mean, is it something + that we should be doing? 
And at that point in time, I was like,", "tokens": [51532, + 3922, 30, 286, 914, 11, 307, 309, 746, 300, 321, 820, 312, 884, 30, 400, 412, 300, + 935, 294, 565, 11, 286, 390, 411, 11, 51768], "temperature": 0.0, "avg_logprob": + -0.16491677509090766, "compression_ratio": 1.6726618705035972, "no_speech_prob": + 0.026465218514204025}, {"id": 311, "seek": 191304, "start": 1913.12, "end": 1917.84, + "text": " literally, like, okay, I think I don''t know, I don''t have enough context + of this. And I think that''s", "tokens": [50368, 3736, 11, 411, 11, 1392, 11, 286, + 519, 286, 500, 380, 458, 11, 286, 500, 380, 362, 1547, 4319, 295, 341, 13, 400, + 286, 519, 300, 311, 50604], "temperature": 0.0, "avg_logprob": -0.23501207210399486, + "compression_ratio": 1.6081632653061224, "no_speech_prob": 0.031644903123378754}, + {"id": 312, "seek": 191304, "start": 1917.84, "end": 1923.2, "text": " where it + all started. I mean, that presentation I gave at Perlin Buzzwords last year about + West Spa,", "tokens": [50604, 689, 309, 439, 1409, 13, 286, 914, 11, 300, 5860, + 286, 2729, 412, 3026, 5045, 29209, 13832, 1036, 1064, 466, 4055, 23729, 11, 50872], + "temperature": 0.0, "avg_logprob": -0.23501207210399486, "compression_ratio": 1.6081632653061224, + "no_speech_prob": 0.031644903123378754}, {"id": 313, "seek": 191304, "start": 1923.2, + "end": 1931.04, "text": " it was condensed version of how I learned Westpiant, how + I got completely, you know, I mean, it", "tokens": [50872, 309, 390, 36398, 3037, + 295, 577, 286, 3264, 4055, 79, 5798, 11, 577, 286, 658, 2584, 11, 291, 458, 11, + 286, 914, 11, 309, 51264], "temperature": 0.0, "avg_logprob": -0.23501207210399486, + "compression_ratio": 1.6081632653061224, "no_speech_prob": 0.031644903123378754}, + {"id": 314, "seek": 191304, "start": 1931.04, "end": 1937.92, "text": " swidled + me off my floor like there is something that existed, you know, at the time when + it''s so", "tokens": [51264, 1693, 327, 1493, 385, 766, 452, 4123, 
411, 456, 307, + 746, 300, 13135, 11, 291, 458, 11, 412, 264, 565, 562, 309, 311, 370, 51608], "temperature": + 0.0, "avg_logprob": -0.23501207210399486, "compression_ratio": 1.6081632653061224, + "no_speech_prob": 0.031644903123378754}, {"id": 315, "seek": 193792, "start": 1937.92, + "end": 1943.1200000000001, "text": " existed and it has been as solid as, you know, + rest of the other traditional search engines as", "tokens": [50364, 13135, 293, + 309, 575, 668, 382, 5100, 382, 11, 291, 458, 11, 1472, 295, 264, 661, 5164, 3164, + 12982, 382, 50624], "temperature": 0.0, "avg_logprob": -0.1690689241043245, "compression_ratio": + 1.7727272727272727, "no_speech_prob": 0.011185696348547935}, {"id": 316, "seek": + 193792, "start": 1943.1200000000001, "end": 1951.8400000000001, "text": " well as, + you know, so many, you know, data science functionality that it offers. So it was + amazing.", "tokens": [50624, 731, 382, 11, 291, 458, 11, 370, 867, 11, 291, 458, + 11, 1412, 3497, 14980, 300, 309, 7736, 13, 407, 309, 390, 2243, 13, 51060], "temperature": + 0.0, "avg_logprob": -0.1690689241043245, "compression_ratio": 1.7727272727272727, + "no_speech_prob": 0.011185696348547935}, {"id": 317, "seek": 193792, "start": 1951.8400000000001, + "end": 1957.28, "text": " And I would say like, sometimes that imposter syndrome + that, you know, women have, you know, it also,", "tokens": [51060, 400, 286, 576, + 584, 411, 11, 2171, 300, 704, 7096, 19371, 300, 11, 291, 458, 11, 2266, 362, 11, + 291, 458, 11, 309, 611, 11, 51332], "temperature": 0.0, "avg_logprob": -0.1690689241043245, + "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.011185696348547935}, + {"id": 318, "seek": 193792, "start": 1957.28, "end": 1964.24, "text": " you know, + like transforms or it''s something that triggers us to do something that in the + end,", "tokens": [51332, 291, 458, 11, 411, 35592, 420, 309, 311, 746, 300, 22827, + 505, 281, 360, 746, 300, 294, 264, 917, 11, 51680], "temperature": 0.0, 
"avg_logprob": + -0.1690689241043245, "compression_ratio": 1.7727272727272727, "no_speech_prob": + 0.011185696348547935}, {"id": 319, "seek": 196424, "start": 1964.88, "end": 1972.48, + "text": " comes out as, you know, more like shining or grooming us. And I think + I liked it. So I think that''s", "tokens": [50396, 1487, 484, 382, 11, 291, 458, + 11, 544, 411, 18269, 420, 49700, 505, 13, 400, 286, 519, 286, 4501, 309, 13, 407, + 286, 519, 300, 311, 50776], "temperature": 0.0, "avg_logprob": -0.12738232140068537, + "compression_ratio": 1.677685950413223, "no_speech_prob": 0.004510264378041029}, + {"id": 320, "seek": 196424, "start": 1972.48, "end": 1979.76, "text": " where it + all started. And I think I proposed that idea to some of the folks at OSE that this + is what I", "tokens": [50776, 689, 309, 439, 1409, 13, 400, 286, 519, 286, 10348, + 300, 1558, 281, 512, 295, 264, 4024, 412, 422, 5879, 300, 341, 307, 437, 286, 51140], + "temperature": 0.0, "avg_logprob": -0.12738232140068537, "compression_ratio": 1.677685950413223, + "no_speech_prob": 0.004510264378041029}, {"id": 321, "seek": 196424, "start": 1979.76, + "end": 1986.32, "text": " want to do. It took some time, of course, because I was + on the full-time engagements with the clients,", "tokens": [51140, 528, 281, 360, + 13, 467, 1890, 512, 565, 11, 295, 1164, 11, 570, 286, 390, 322, 264, 1577, 12, 3766, + 44978, 365, 264, 6982, 11, 51468], "temperature": 0.0, "avg_logprob": -0.12738232140068537, + "compression_ratio": 1.677685950413223, "no_speech_prob": 0.004510264378041029}, + {"id": 322, "seek": 196424, "start": 1986.32, "end": 1992.32, "text": " but when + I had like the first, you know, stop, I tried to make some things work. 
I experimented + with", "tokens": [51468, 457, 562, 286, 632, 411, 264, 700, 11, 291, 458, 11, 1590, + 11, 286, 3031, 281, 652, 512, 721, 589, 13, 286, 5120, 292, 365, 51768], "temperature": + 0.0, "avg_logprob": -0.12738232140068537, "compression_ratio": 1.677685950413223, + "no_speech_prob": 0.004510264378041029}, {"id": 323, "seek": 199232, "start": 1992.32, + "end": 2000.8799999999999, "text": " stuff and that''s what we came up with. So, + Rene really helped me with the, because I gave like initial", "tokens": [50364, + 1507, 293, 300, 311, 437, 321, 1361, 493, 365, 13, 407, 11, 497, 1450, 534, 4254, + 385, 365, 264, 11, 570, 286, 2729, 411, 5883, 50792], "temperature": 0.0, "avg_logprob": + -0.1335090923309326, "compression_ratio": 1.6528925619834711, "no_speech_prob": + 0.008777234703302383}, {"id": 324, "seek": 199232, "start": 2000.8799999999999, + "end": 2007.76, "text": " demos to him and Eric and Charlie. And I said like, you + know, I''m struggling with that the embedding", "tokens": [50792, 33788, 281, 796, + 293, 9336, 293, 13754, 13, 400, 286, 848, 411, 11, 291, 458, 11, 286, 478, 9314, + 365, 300, 264, 12240, 3584, 51136], "temperature": 0.0, "avg_logprob": -0.1335090923309326, + "compression_ratio": 1.6528925619834711, "no_speech_prob": 0.008777234703302383}, + {"id": 325, "seek": 199232, "start": 2007.76, "end": 2014.56, "text": " needs to + be calculated somehow. 
I mean, solar already supports vectors, but the only challenges + that,", "tokens": [51136, 2203, 281, 312, 15598, 6063, 13, 286, 914, 11, 7936, 1217, + 9346, 18875, 11, 457, 264, 787, 4759, 300, 11, 51476], "temperature": 0.0, "avg_logprob": + -0.1335090923309326, "compression_ratio": 1.6528925619834711, "no_speech_prob": + 0.008777234703302383}, {"id": 326, "seek": 199232, "start": 2014.56, "end": 2019.6, + "text": " you know, the vectors for the query need to be somehow calculated, you + know, outside of solar.", "tokens": [51476, 291, 458, 11, 264, 18875, 337, 264, + 14581, 643, 281, 312, 6063, 15598, 11, 291, 458, 11, 2380, 295, 7936, 13, 51728], + "temperature": 0.0, "avg_logprob": -0.1335090923309326, "compression_ratio": 1.6528925619834711, + "no_speech_prob": 0.008777234703302383}, {"id": 327, "seek": 201960, "start": 2019.6, + "end": 2023.6, "text": " And that''s when you know, Rene suggested like it would + be a good idea to maybe, you know,", "tokens": [50364, 400, 300, 311, 562, 291, + 458, 11, 497, 1450, 10945, 411, 309, 576, 312, 257, 665, 1558, 281, 1310, 11, 291, + 458, 11, 50564], "temperature": 0.0, "avg_logprob": -0.17282724010851958, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.002024167450144887}, {"id": 328, "seek": + 201960, "start": 2023.6, "end": 2028.24, "text": " use Quirky for it. And I have + not worked on Quirky before. And I have been wanting to do that.", "tokens": [50564, + 764, 2326, 347, 4133, 337, 309, 13, 400, 286, 362, 406, 2732, 322, 2326, 347, 4133, + 949, 13, 400, 286, 362, 668, 7935, 281, 360, 300, 13, 50796], "temperature": 0.0, + "avg_logprob": -0.17282724010851958, "compression_ratio": 1.7096774193548387, "no_speech_prob": + 0.002024167450144887}, {"id": 329, "seek": 201960, "start": 2028.8799999999999, + "end": 2034.08, "text": " Somehow none of my clients were really at that stage that + they could use Quirky. 
And that''s how", "tokens": [50828, 28357, 6022, 295, 452, + 6982, 645, 534, 412, 300, 3233, 300, 436, 727, 764, 2326, 347, 4133, 13, 400, 300, + 311, 577, 51088], "temperature": 0.0, "avg_logprob": -0.17282724010851958, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.002024167450144887}, {"id": 330, "seek": + 201960, "start": 2034.08, "end": 2039.36, "text": " I got to, you know, touch the + entire stack of open source connections. I have already contributed on", "tokens": + [51088, 286, 658, 281, 11, 291, 458, 11, 2557, 264, 2302, 8630, 295, 1269, 4009, + 9271, 13, 286, 362, 1217, 18434, 322, 51352], "temperature": 0.0, "avg_logprob": + -0.17282724010851958, "compression_ratio": 1.7096774193548387, "no_speech_prob": + 0.002024167450144887}, {"id": 331, "seek": 201960, "start": 2039.36, "end": 2046.0, + "text": " QPIT before, like adding visualizations and the other stuff. Solved a + lot of bugs. I think Eric", "tokens": [51352, 1249, 47, 3927, 949, 11, 411, 5127, + 5056, 14455, 293, 264, 661, 1507, 13, 7026, 937, 257, 688, 295, 15120, 13, 286, + 519, 9336, 51684], "temperature": 0.0, "avg_logprob": -0.17282724010851958, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.002024167450144887}, {"id": 332, "seek": + 204600, "start": 2046.0, "end": 2053.76, "text": " would really help you for that. + And I would hate you. Why would you hate for that? I mean, because it", "tokens": + [50364, 576, 534, 854, 291, 337, 300, 13, 400, 286, 576, 4700, 291, 13, 1545, 576, + 291, 4700, 337, 300, 30, 286, 914, 11, 570, 309, 50752], "temperature": 0.0, "avg_logprob": + -0.17612601386176216, "compression_ratio": 1.7508896797153024, "no_speech_prob": + 0.03424769267439842}, {"id": 333, "seek": 204600, "start": 2053.76, "end": 2057.92, + "text": " was, it was something like, I don''t know, like some sort of a charm. 
+ Like every time I would touch", "tokens": [50752, 390, 11, 309, 390, 746, 411, 11, + 286, 500, 380, 458, 11, 411, 512, 1333, 295, 257, 18904, 13, 1743, 633, 565, 286, + 576, 2557, 50960], "temperature": 0.0, "avg_logprob": -0.17612601386176216, "compression_ratio": + 1.7508896797153024, "no_speech_prob": 0.03424769267439842}, {"id": 334, "seek": + 204600, "start": 2057.92, "end": 2064.16, "text": " QPIT for some client requirement, + I would find a bug. And then I think it''s been very nice, I think", "tokens": [50960, + 1249, 47, 3927, 337, 512, 6423, 11695, 11, 286, 576, 915, 257, 7426, 13, 400, 550, + 286, 519, 309, 311, 668, 588, 1481, 11, 286, 519, 51272], "temperature": 0.0, "avg_logprob": + -0.17612601386176216, "compression_ratio": 1.7508896797153024, "no_speech_prob": + 0.03424769267439842}, {"id": 335, "seek": 204600, "start": 2064.88, "end": 2070.4, + "text": " in principle that he''s person that who would say like, oh, there''s a + bug. Please lock the bug.", "tokens": [51308, 294, 8665, 300, 415, 311, 954, 300, + 567, 576, 584, 411, 11, 1954, 11, 456, 311, 257, 7426, 13, 2555, 4017, 264, 7426, + 13, 51584], "temperature": 0.0, "avg_logprob": -0.17612601386176216, "compression_ratio": + 1.7508896797153024, "no_speech_prob": 0.03424769267439842}, {"id": 336, "seek": + 204600, "start": 2070.4, "end": 2074.88, "text": " How about I offer you to work + on it? 
I mean, it would initially feel like because I recorded it,", "tokens": [51584, + 1012, 466, 286, 2626, 291, 281, 589, 322, 309, 30, 286, 914, 11, 309, 576, 9105, + 841, 411, 570, 286, 8287, 309, 11, 51808], "temperature": 0.0, "avg_logprob": -0.17612601386176216, + "compression_ratio": 1.7508896797153024, "no_speech_prob": 0.03424769267439842}, + {"id": 337, "seek": 207488, "start": 2074.88, "end": 2079.92, "text": " but actually, + I mean, if you see at that, I mean, it would actually come as a, you know, very", + "tokens": [50364, 457, 767, 11, 286, 914, 11, 498, 291, 536, 412, 300, 11, 286, + 914, 11, 309, 576, 767, 808, 382, 257, 11, 291, 458, 11, 588, 50616], "temperature": + 0.0, "avg_logprob": -0.10375417023897171, "compression_ratio": 1.7073170731707317, + "no_speech_prob": 0.004145223647356033}, {"id": 338, "seek": 207488, "start": 2079.92, + "end": 2086.08, "text": " empowering thing. Like he offers you to, you know, solve + it the way you like. And I think not many", "tokens": [50616, 28261, 551, 13, 1743, + 415, 7736, 291, 281, 11, 291, 458, 11, 5039, 309, 264, 636, 291, 411, 13, 400, 286, + 519, 406, 867, 50924], "temperature": 0.0, "avg_logprob": -0.10375417023897171, + "compression_ratio": 1.7073170731707317, "no_speech_prob": 0.004145223647356033}, + {"id": 339, "seek": 207488, "start": 2086.08, "end": 2093.52, "text": " people are + as open. Yeah. 
And if you turn it around like for him to know all the context details", + "tokens": [50924, 561, 366, 382, 1269, 13, 865, 13, 400, 498, 291, 1261, 309, 926, + 411, 337, 796, 281, 458, 439, 264, 4319, 4365, 51296], "temperature": 0.0, "avg_logprob": + -0.10375417023897171, "compression_ratio": 1.7073170731707317, "no_speech_prob": + 0.004145223647356033}, {"id": 340, "seek": 207488, "start": 2093.52, "end": 2098.32, + "text": " is super hard because you encountered it in very specific context and + you have all the input to", "tokens": [51296, 307, 1687, 1152, 570, 291, 20381, + 309, 294, 588, 2685, 4319, 293, 291, 362, 439, 264, 4846, 281, 51536], "temperature": + 0.0, "avg_logprob": -0.10375417023897171, "compression_ratio": 1.7073170731707317, + "no_speech_prob": 0.004145223647356033}, {"id": 341, "seek": 207488, "start": 2098.32, + "end": 2104.1600000000003, "text": " reproduce it, right? So you are the expert + of that bug. That is true. I think that''s another way to look", "tokens": [51536, + 29501, 309, 11, 558, 30, 407, 291, 366, 264, 5844, 295, 300, 7426, 13, 663, 307, + 2074, 13, 286, 519, 300, 311, 1071, 636, 281, 574, 51828], "temperature": 0.0, "avg_logprob": + -0.10375417023897171, "compression_ratio": 1.7073170731707317, "no_speech_prob": + 0.004145223647356033}, {"id": 342, "seek": 210416, "start": 2104.16, "end": 2109.2799999999997, + "text": " at it. I mean, I never thought of that in that aspect. But yes, that''s + that''s true. So that''s", "tokens": [50364, 412, 309, 13, 286, 914, 11, 286, 1128, + 1194, 295, 300, 294, 300, 4171, 13, 583, 2086, 11, 300, 311, 300, 311, 2074, 13, + 407, 300, 311, 50620], "temperature": 0.0, "avg_logprob": -0.13573940087717476, + "compression_ratio": 1.786231884057971, "no_speech_prob": 0.005890439730137587}, + {"id": 343, "seek": 210416, "start": 2110.0, "end": 2117.92, "text": " something + that triggered me to work on QPIT. 
So I worked on several bugs before I added visualisation.", + "tokens": [50656, 746, 300, 21710, 385, 281, 589, 322, 1249, 47, 3927, 13, 407, + 286, 2732, 322, 2940, 15120, 949, 286, 3869, 5056, 7623, 13, 51052], "temperature": + 0.0, "avg_logprob": -0.13573940087717476, "compression_ratio": 1.786231884057971, + "no_speech_prob": 0.005890439730137587}, {"id": 344, "seek": 210416, "start": 2117.92, + "end": 2123.12, "text": " And that to came through because I had to otherwise, you + know, do the visualisation outside of QPIT,", "tokens": [51052, 400, 300, 281, 1361, + 807, 570, 286, 632, 281, 5911, 11, 291, 458, 11, 360, 264, 5056, 7623, 2380, 295, + 1249, 47, 3927, 11, 51312], "temperature": 0.0, "avg_logprob": -0.13573940087717476, + "compression_ratio": 1.786231884057971, "no_speech_prob": 0.005890439730137587}, + {"id": 345, "seek": 210416, "start": 2123.12, "end": 2128.48, "text": " and I would + then complain like, how about, I mean, I''m anyway using data from QPIT. How is + it like,", "tokens": [51312, 293, 286, 576, 550, 11024, 411, 11, 577, 466, 11, 286, + 914, 11, 286, 478, 4033, 1228, 1412, 490, 1249, 47, 3927, 13, 1012, 307, 309, 411, + 11, 51580], "temperature": 0.0, "avg_logprob": -0.13573940087717476, "compression_ratio": + 1.786231884057971, "no_speech_prob": 0.005890439730137587}, {"id": 346, "seek": + 210416, "start": 2128.48, "end": 2133.68, "text": " you know, if I could use the + data that''s coming from QPIT, like insight QPIT somehow, you know,", "tokens": + [51580, 291, 458, 11, 498, 286, 727, 764, 264, 1412, 300, 311, 1348, 490, 1249, + 47, 3927, 11, 411, 11269, 1249, 47, 3927, 6063, 11, 291, 458, 11, 51840], "temperature": + 0.0, "avg_logprob": -0.13573940087717476, "compression_ratio": 1.786231884057971, + "no_speech_prob": 0.005890439730137587}, {"id": 347, "seek": 213368, "start": 2133.68, + "end": 2139.3599999999997, "text": " supporting the visualisation as well. 
And I + think it was certainly groundbreaking that we added", "tokens": [50364, 7231, 264, + 5056, 7623, 382, 731, 13, 400, 286, 519, 309, 390, 3297, 42491, 300, 321, 3869, + 50648], "temperature": 0.0, "avg_logprob": -0.14842683710950486, "compression_ratio": + 1.5447154471544715, "no_speech_prob": 0.0047005838714540005}, {"id": 348, "seek": + 213368, "start": 2139.3599999999997, "end": 2146.24, "text": " Python notebooks + functionality to QPIT. And that just opened up a whole lot of, you know, the AI", + "tokens": [50648, 15329, 43782, 14980, 281, 1249, 47, 3927, 13, 400, 300, 445, 5625, + 493, 257, 1379, 688, 295, 11, 291, 458, 11, 264, 7318, 50992], "temperature": 0.0, + "avg_logprob": -0.14842683710950486, "compression_ratio": 1.5447154471544715, "no_speech_prob": + 0.0047005838714540005}, {"id": 349, "seek": 213368, "start": 2146.24, "end": 2154.16, + "text": " portal. Yeah, that''s amazing actually because I also kept pushing QPIT + in every job that I took,", "tokens": [50992, 14982, 13, 865, 11, 300, 311, 2243, + 767, 570, 286, 611, 4305, 7380, 1249, 47, 3927, 294, 633, 1691, 300, 286, 1890, + 11, 51388], "temperature": 0.0, "avg_logprob": -0.14842683710950486, "compression_ratio": + 1.5447154471544715, "no_speech_prob": 0.0047005838714540005}, {"id": 350, "seek": + 213368, "start": 2154.16, "end": 2159.8399999999997, "text": " right? And not necessarily + pushing as in I am selling the tool and I don''t care what''s the", "tokens": [51388, + 558, 30, 400, 406, 4725, 7380, 382, 294, 286, 669, 6511, 264, 2290, 293, 286, 500, + 380, 1127, 437, 311, 264, 51672], "temperature": 0.0, "avg_logprob": -0.14842683710950486, + "compression_ratio": 1.5447154471544715, "no_speech_prob": 0.0047005838714540005}, + {"id": 351, "seek": 215984, "start": 2159.84, "end": 2166.4, "text": " purpose for + using it. 
But I just know that for all these typical problems with search quality,", + "tokens": [50364, 4334, 337, 1228, 309, 13, 583, 286, 445, 458, 300, 337, 439, 613, + 7476, 2740, 365, 3164, 3125, 11, 50692], "temperature": 0.0, "avg_logprob": -0.1682829357328869, + "compression_ratio": 1.524, "no_speech_prob": 0.007397774141281843}, {"id": 352, + "seek": 215984, "start": 2167.28, "end": 2173.92, "text": " instead of like reinventing + a wheel, why not take an open source, you know, tool, like QPIT.", "tokens": [50736, + 2602, 295, 411, 33477, 278, 257, 5589, 11, 983, 406, 747, 364, 1269, 4009, 11, 291, + 458, 11, 2290, 11, 411, 1249, 47, 3927, 13, 51068], "temperature": 0.0, "avg_logprob": + -0.1682829357328869, "compression_ratio": 1.524, "no_speech_prob": 0.007397774141281843}, + {"id": 353, "seek": 215984, "start": 2173.92, "end": 2178.8, "text": " And it''s + commercially friendly license, you know, go ahead, deploy it. It''s very easy to + do so.", "tokens": [51068, 400, 309, 311, 41751, 9208, 10476, 11, 291, 458, 11, + 352, 2286, 11, 7274, 309, 13, 467, 311, 588, 1858, 281, 360, 370, 13, 51312], "temperature": + 0.0, "avg_logprob": -0.1682829357328869, "compression_ratio": 1.524, "no_speech_prob": + 0.007397774141281843}, {"id": 354, "seek": 215984, "start": 2179.44, "end": 2186.1600000000003, + "text": " Exactly. And then when I saw the, the notebooks, I was like, wow, this + is so cool. I can just now", "tokens": [51344, 7587, 13, 400, 550, 562, 286, 1866, + 264, 11, 264, 43782, 11, 286, 390, 411, 11, 6076, 11, 341, 307, 370, 1627, 13, 286, + 393, 445, 586, 51680], "temperature": 0.0, "avg_logprob": -0.1682829357328869, "compression_ratio": + 1.524, "no_speech_prob": 0.007397774141281843}, {"id": 355, "seek": 218616, "start": + 2186.24, "end": 2191.68, "text": " select to my data scientists and say, hey, we''ve + just labeled all this queries. 
Can you do your", "tokens": [50368, 3048, 281, 452, + 1412, 7708, 293, 584, 11, 4177, 11, 321, 600, 445, 21335, 439, 341, 24109, 13, 1664, + 291, 360, 428, 50640], "temperature": 0.0, "avg_logprob": -0.14744657532781616, + "compression_ratio": 1.6218181818181818, "no_speech_prob": 0.008775784634053707}, + {"id": 356, "seek": 218616, "start": 2191.68, "end": 2196.7999999999997, "text": + " magic right here in the notebook? And I can actually access it as well, potentially, + or whatever.", "tokens": [50640, 5585, 558, 510, 294, 264, 21060, 30, 400, 286, + 393, 767, 2105, 309, 382, 731, 11, 7263, 11, 420, 2035, 13, 50896], "temperature": + 0.0, "avg_logprob": -0.14744657532781616, "compression_ratio": 1.6218181818181818, + "no_speech_prob": 0.008775784634053707}, {"id": 357, "seek": 218616, "start": 2196.7999999999997, + "end": 2201.44, "text": " I mean, it''s just like so much easier than to scratch + your head and think, okay, now I need to", "tokens": [50896, 286, 914, 11, 309, + 311, 445, 411, 370, 709, 3571, 813, 281, 8459, 428, 1378, 293, 519, 11, 1392, 11, + 586, 286, 643, 281, 51128], "temperature": 0.0, "avg_logprob": -0.14744657532781616, + "compression_ratio": 1.6218181818181818, "no_speech_prob": 0.008775784634053707}, + {"id": 358, "seek": 218616, "start": 2201.44, "end": 2207.12, "text": " download + all this data, all these annotations, and then push them somewhere else. And then,", + "tokens": [51128, 5484, 439, 341, 1412, 11, 439, 613, 25339, 763, 11, 293, 550, + 2944, 552, 4079, 1646, 13, 400, 550, 11, 51412], "temperature": 0.0, "avg_logprob": + -0.14744657532781616, "compression_ratio": 1.6218181818181818, "no_speech_prob": + 0.008775784634053707}, {"id": 359, "seek": 218616, "start": 2207.68, "end": 2210.72, + "text": " yeah, it''s, I think it''s an amazing feature. 
Thanks for doing it.", + "tokens": [51440, 1338, 11, 309, 311, 11, 286, 519, 309, 311, 364, 2243, 4111, 13, + 2561, 337, 884, 309, 13, 51592], "temperature": 0.0, "avg_logprob": -0.14744657532781616, + "compression_ratio": 1.6218181818181818, "no_speech_prob": 0.008775784634053707}, + {"id": 360, "seek": 221072, "start": 2211.2, "end": 2218.56, "text": " Oh, my pleasure. + Yeah. So I think also during this course of discussion, we also discovered some", + "tokens": [50388, 876, 11, 452, 6834, 13, 865, 13, 407, 286, 519, 611, 1830, 341, + 1164, 295, 5017, 11, 321, 611, 6941, 512, 50756], "temperature": 0.0, "avg_logprob": + -0.11172658464182979, "compression_ratio": 1.6134453781512605, "no_speech_prob": + 0.066267229616642}, {"id": 361, "seek": 221072, "start": 2218.56, "end": 2223.9199999999996, + "text": " documentation bugs, something that we''re not mentioned in solar documentation. + I contributed", "tokens": [50756, 14333, 15120, 11, 746, 300, 321, 434, 406, 2835, + 294, 7936, 14333, 13, 286, 18434, 51024], "temperature": 0.0, "avg_logprob": -0.11172658464182979, + "compression_ratio": 1.6134453781512605, "no_speech_prob": 0.066267229616642}, {"id": + 362, "seek": 221072, "start": 2223.9199999999996, "end": 2229.4399999999996, "text": + " to that too. So I would say like in principle, I think we have a very supportive + and encouraging", "tokens": [51024, 281, 300, 886, 13, 407, 286, 576, 584, 411, + 294, 8665, 11, 286, 519, 321, 362, 257, 588, 14435, 293, 14580, 51300], "temperature": + 0.0, "avg_logprob": -0.11172658464182979, "compression_ratio": 1.6134453781512605, + "no_speech_prob": 0.066267229616642}, {"id": 363, "seek": 221072, "start": 2229.4399999999996, + "end": 2235.9199999999996, "text": " culture in the company, which is what I really + like. 
I think, I don''t know, maybe if I could talk", "tokens": [51300, 3713, 294, + 264, 2237, 11, 597, 307, 437, 286, 534, 411, 13, 286, 519, 11, 286, 500, 380, 458, + 11, 1310, 498, 286, 727, 751, 51624], "temperature": 0.0, "avg_logprob": -0.11172658464182979, + "compression_ratio": 1.6134453781512605, "no_speech_prob": 0.066267229616642}, {"id": + 364, "seek": 223592, "start": 2235.92, "end": 2241.12, "text": " a little more about + this wet room implementation. I mean, there have been several talks about it as", + "tokens": [50364, 257, 707, 544, 466, 341, 6630, 1808, 11420, 13, 286, 914, 11, + 456, 362, 668, 2940, 6686, 466, 309, 382, 50624], "temperature": 0.0, "avg_logprob": + -0.1855200101744454, "compression_ratio": 1.7149122807017543, "no_speech_prob": + 0.014834241941571236}, {"id": 365, "seek": 223592, "start": 2241.12, "end": 2249.2000000000003, + "text": " well. I also presented that, you know, Craco, the haystack on tour. But + I think it was something that", "tokens": [50624, 731, 13, 286, 611, 8212, 300, + 11, 291, 458, 11, 4779, 11428, 11, 264, 4842, 372, 501, 322, 3512, 13, 583, 286, + 519, 309, 390, 746, 300, 51028], "temperature": 0.0, "avg_logprob": -0.1855200101744454, + "compression_ratio": 1.7149122807017543, "no_speech_prob": 0.014834241941571236}, + {"id": 366, "seek": 223592, "start": 2249.2000000000003, "end": 2255.6800000000003, + "text": " was, you know, has been sitting on my mind for too long, because I think + we need a, you know,", "tokens": [51028, 390, 11, 291, 458, 11, 575, 668, 3798, + 322, 452, 1575, 337, 886, 938, 11, 570, 286, 519, 321, 643, 257, 11, 291, 458, 11, + 51352], "temperature": 0.0, "avg_logprob": -0.1855200101744454, "compression_ratio": + 1.7149122807017543, "no_speech_prob": 0.014834241941571236}, {"id": 367, "seek": + 223592, "start": 2255.6800000000003, "end": 2262.2400000000002, "text": " reasonable + way to not, you know, like, you know, detain the question from the client. 
And also,", + "tokens": [51352, 10585, 636, 281, 406, 11, 291, 458, 11, 411, 11, 291, 458, 11, + 1141, 491, 264, 1168, 490, 264, 6423, 13, 400, 611, 11, 51680], "temperature": 0.0, + "avg_logprob": -0.1855200101744454, "compression_ratio": 1.7149122807017543, "no_speech_prob": + 0.014834241941571236}, {"id": 368, "seek": 226224, "start": 2262.24, "end": 2267.3599999999997, + "text": " at the same time, we want to address the question in most, you know, like + explainable way. And I think", "tokens": [50364, 412, 264, 912, 565, 11, 321, 528, + 281, 2985, 264, 1168, 294, 881, 11, 291, 458, 11, 411, 2903, 712, 636, 13, 400, + 286, 519, 50620], "temperature": 0.0, "avg_logprob": -0.12070008574939164, "compression_ratio": + 1.8984375, "no_speech_prob": 0.009548589587211609}, {"id": 369, "seek": 226224, + "start": 2267.3599999999997, "end": 2272.4799999999996, "text": " this was like + the explainable thing that, you know, is something that people can use. And I think", + "tokens": [50620, 341, 390, 411, 264, 2903, 712, 551, 300, 11, 291, 458, 11, 307, + 746, 300, 561, 393, 764, 13, 400, 286, 519, 50876], "temperature": 0.0, "avg_logprob": + -0.12070008574939164, "compression_ratio": 1.8984375, "no_speech_prob": 0.009548589587211609}, + {"id": 370, "seek": 226224, "start": 2273.4399999999996, "end": 2278.0, "text": + " all this while, you know, people have been discussing about vectors, I think people + charged", "tokens": [50924, 439, 341, 1339, 11, 291, 458, 11, 561, 362, 668, 10850, + 466, 18875, 11, 286, 519, 561, 11109, 51152], "temperature": 0.0, "avg_logprob": + -0.12070008574939164, "compression_ratio": 1.8984375, "no_speech_prob": 0.009548589587211609}, + {"id": 371, "seek": 226224, "start": 2278.0, "end": 2282.7999999999997, "text": + " money to show you that, you know, vectors could work in your search engine. 
And + we do it for free.", "tokens": [51152, 1460, 281, 855, 291, 300, 11, 291, 458, 11, + 18875, 727, 589, 294, 428, 3164, 2848, 13, 400, 321, 360, 309, 337, 1737, 13, 51392], + "temperature": 0.0, "avg_logprob": -0.12070008574939164, "compression_ratio": 1.8984375, + "no_speech_prob": 0.009548589587211609}, {"id": 372, "seek": 226224, "start": 2283.6, + "end": 2288.4799999999996, "text": " And I think, again, going by like what my company + does, like we provide a lot of informational", "tokens": [51432, 400, 286, 519, + 11, 797, 11, 516, 538, 411, 437, 452, 2237, 775, 11, 411, 321, 2893, 257, 688, 295, + 49391, 51676], "temperature": 0.0, "avg_logprob": -0.12070008574939164, "compression_ratio": + 1.8984375, "no_speech_prob": 0.009548589587211609}, {"id": 373, "seek": 228848, + "start": 2288.48, "end": 2294.16, "text": " content, you know, for free and open + source, lots and lots of things, you know, which usually", "tokens": [50364, 2701, + 11, 291, 458, 11, 337, 1737, 293, 1269, 4009, 11, 3195, 293, 3195, 295, 721, 11, + 291, 458, 11, 597, 2673, 50648], "temperature": 0.0, "avg_logprob": -0.10567758633540227, + "compression_ratio": 1.8341232227488151, "no_speech_prob": 0.009845786727964878}, + {"id": 374, "seek": 228848, "start": 2294.16, "end": 2298.88, "text": " would cost + a lot of money to the companies. And I think this is something I really feel like + I''m", "tokens": [50648, 576, 2063, 257, 688, 295, 1460, 281, 264, 3431, 13, 400, + 286, 519, 341, 307, 746, 286, 534, 841, 411, 286, 478, 50884], "temperature": 0.0, + "avg_logprob": -0.10567758633540227, "compression_ratio": 1.8341232227488151, "no_speech_prob": + 0.009845786727964878}, {"id": 375, "seek": 228848, "start": 2298.88, "end": 2306.0, + "text": " doing a good job. I''m making a difference. 
So I think that that really + brings a lot of, you know,", "tokens": [50884, 884, 257, 665, 1691, 13, 286, 478, + 1455, 257, 2649, 13, 407, 286, 519, 300, 300, 534, 5607, 257, 688, 295, 11, 291, + 458, 11, 51240], "temperature": 0.0, "avg_logprob": -0.10567758633540227, "compression_ratio": + 1.8341232227488151, "no_speech_prob": 0.009845786727964878}, {"id": 376, "seek": + 228848, "start": 2306.0, "end": 2313.2, "text": " like satisfaction that I''m doing + a good job somewhere. So that''s, that''s nice. And I think now I''m", "tokens": + [51240, 411, 18715, 300, 286, 478, 884, 257, 665, 1691, 4079, 13, 407, 300, 311, + 11, 300, 311, 1481, 13, 400, 286, 519, 586, 286, 478, 51600], "temperature": 0.0, + "avg_logprob": -0.10567758633540227, "compression_ratio": 1.8341232227488151, "no_speech_prob": + 0.009845786727964878}, {"id": 377, "seek": 231320, "start": 2313.2, "end": 2319.2, + "text": " working on to also, you know, experiment with other stuff with the selector + thing. I presented that,", "tokens": [50364, 1364, 322, 281, 611, 11, 291, 458, + 11, 5120, 365, 661, 1507, 365, 264, 23264, 1672, 551, 13, 286, 8212, 300, 11, 50664], + "temperature": 0.0, "avg_logprob": -0.14932908263860964, "compression_ratio": 1.705128205128205, + "no_speech_prob": 0.00987426657229662}, {"id": 378, "seek": 231320, "start": 2319.2, + "end": 2324.48, "text": " you know, we would be working on improving the image models. + So basically, I mean, I''m sure you", "tokens": [50664, 291, 458, 11, 321, 576, + 312, 1364, 322, 11470, 264, 3256, 5245, 13, 407, 1936, 11, 286, 914, 11, 286, 478, + 988, 291, 50928], "temperature": 0.0, "avg_logprob": -0.14932908263860964, "compression_ratio": + 1.705128205128205, "no_speech_prob": 0.00987426657229662}, {"id": 379, "seek": 231320, + "start": 2324.48, "end": 2330.7999999999997, "text": " know about it, but for some + viewers, it would be a new thing that chorus is a dummy shop. 
And we have a", "tokens": + [50928, 458, 466, 309, 11, 457, 337, 512, 8499, 11, 309, 576, 312, 257, 777, 551, + 300, 22632, 307, 257, 35064, 3945, 13, 400, 321, 362, 257, 51244], "temperature": + 0.0, "avg_logprob": -0.14932908263860964, "compression_ratio": 1.705128205128205, + "no_speech_prob": 0.00987426657229662}, {"id": 380, "seek": 231320, "start": 2330.7999999999997, + "end": 2337.6, "text": " dataset that comes from icecat. So icecat data basically + is, you know, like a collaborated content", "tokens": [51244, 28872, 300, 1487, + 490, 4435, 18035, 13, 407, 4435, 18035, 1412, 1936, 307, 11, 291, 458, 11, 411, + 257, 42463, 2701, 51584], "temperature": 0.0, "avg_logprob": -0.14932908263860964, + "compression_ratio": 1.705128205128205, "no_speech_prob": 0.00987426657229662}, + {"id": 381, "seek": 233760, "start": 2337.68, "end": 2343.12, "text": " from Amazon + and other, you know, web shops. And usually that content is like very, very structured", + "tokens": [50368, 490, 6795, 293, 661, 11, 291, 458, 11, 3670, 14457, 13, 400, 2673, + 300, 2701, 307, 411, 588, 11, 588, 18519, 50640], "temperature": 0.0, "avg_logprob": + -0.1396901169601752, "compression_ratio": 1.5918367346938775, "no_speech_prob": + 0.017319360747933388}, {"id": 382, "seek": 233760, "start": 2343.12, "end": 2349.04, + "text": " content. It has images as well. And Lord, if you know, my new features + or attributes of all the", "tokens": [50640, 2701, 13, 467, 575, 5267, 382, 731, + 13, 400, 3257, 11, 498, 291, 458, 11, 452, 777, 4122, 420, 17212, 295, 439, 264, + 50936], "temperature": 0.0, "avg_logprob": -0.1396901169601752, "compression_ratio": + 1.5918367346938775, "no_speech_prob": 0.017319360747933388}, {"id": 383, "seek": + 233760, "start": 2349.04, "end": 2356.08, "text": " products. So which is why it''s + like a good example. 
Plus we have other content as well in the chorus", "tokens": + [50936, 3383, 13, 407, 597, 307, 983, 309, 311, 411, 257, 665, 1365, 13, 7721, 321, + 362, 661, 2701, 382, 731, 294, 264, 22632, 51288], "temperature": 0.0, "avg_logprob": + -0.1396901169601752, "compression_ratio": 1.5918367346938775, "no_speech_prob": + 0.017319360747933388}, {"id": 384, "seek": 233760, "start": 2356.08, "end": 2361.36, + "text": " in general, like how Cupid would work locally for your web shop and how + you can use quirky and", "tokens": [51288, 294, 2674, 11, 411, 577, 383, 6127, 576, + 589, 16143, 337, 428, 3670, 3945, 293, 577, 291, 393, 764, 49515, 293, 51552], "temperature": + 0.0, "avg_logprob": -0.1396901169601752, "compression_ratio": 1.5918367346938775, + "no_speech_prob": 0.017319360747933388}, {"id": 385, "seek": 236136, "start": 2361.36, + "end": 2366.8, "text": " smooie to do the search andizing part and, you know, manage + these search rules. Like if you want to", "tokens": [50364, 899, 1986, 414, 281, + 360, 264, 3164, 293, 3319, 644, 293, 11, 291, 458, 11, 3067, 613, 3164, 4474, 13, + 1743, 498, 291, 528, 281, 50636], "temperature": 0.0, "avg_logprob": -0.14937747608531604, + "compression_ratio": 1.7956204379562044, "no_speech_prob": 0.014125058427453041}, + {"id": 386, "seek": 236136, "start": 2366.8, "end": 2374.0, "text": " bring some + brand up in the search results, you could do that as well. So I think we promised + that we", "tokens": [50636, 1565, 512, 3360, 493, 294, 264, 3164, 3542, 11, 291, + 727, 360, 300, 382, 731, 13, 407, 286, 519, 321, 10768, 300, 321, 50996], "temperature": + 0.0, "avg_logprob": -0.14937747608531604, "compression_ratio": 1.7956204379562044, + "no_speech_prob": 0.014125058427453041}, {"id": 387, "seek": 236136, "start": 2374.0, + "end": 2379.04, "text": " would be working on the images side. 
And because we have + access to images, usually we have any", "tokens": [50996, 576, 312, 1364, 322, 264, + 5267, 1252, 13, 400, 570, 321, 362, 2105, 281, 5267, 11, 2673, 321, 362, 604, 51248], + "temperature": 0.0, "avg_logprob": -0.14937747608531604, "compression_ratio": 1.7956204379562044, + "no_speech_prob": 0.014125058427453041}, {"id": 388, "seek": 236136, "start": 2379.04, + "end": 2384.32, "text": " commerce shops. So we tried to, you know, leverage that + as well. And I think if you look at the demos", "tokens": [51248, 26320, 14457, + 13, 407, 321, 3031, 281, 11, 291, 458, 11, 13982, 300, 382, 731, 13, 400, 286, 519, + 498, 291, 574, 412, 264, 33788, 51512], "temperature": 0.0, "avg_logprob": -0.14937747608531604, + "compression_ratio": 1.7956204379562044, "no_speech_prob": 0.014125058427453041}, + {"id": 389, "seek": 236136, "start": 2384.32, "end": 2390.48, "text": " that we + presented, even without fine tuning the results were like breathtakingly unbelievable.", + "tokens": [51512, 300, 321, 8212, 11, 754, 1553, 2489, 15164, 264, 3542, 645, 411, + 48393, 356, 16605, 13, 51820], "temperature": 0.0, "avg_logprob": -0.14937747608531604, + "compression_ratio": 1.7956204379562044, "no_speech_prob": 0.014125058427453041}, + {"id": 390, "seek": 239048, "start": 2390.56, "end": 2396.96, "text": " We were + like, wow, this is, this is amazing. 
So I think in general, I think it was very, + you know,", "tokens": [50368, 492, 645, 411, 11, 6076, 11, 341, 307, 11, 341, 307, + 2243, 13, 407, 286, 519, 294, 2674, 11, 286, 519, 309, 390, 588, 11, 291, 458, 11, + 50688], "temperature": 0.0, "avg_logprob": -0.19843683487329727, "compression_ratio": + 1.7408759124087592, "no_speech_prob": 0.004124154802411795}, {"id": 391, "seek": + 239048, "start": 2396.96, "end": 2405.44, "text": " liberating experience that, + you know, we could use vectors successfully in these shops, which are", "tokens": + [50688, 6774, 990, 1752, 300, 11, 291, 458, 11, 321, 727, 764, 18875, 10727, 294, + 613, 14457, 11, 597, 366, 51112], "temperature": 0.0, "avg_logprob": -0.19843683487329727, + "compression_ratio": 1.7408759124087592, "no_speech_prob": 0.004124154802411795}, + {"id": 392, "seek": 239048, "start": 2405.44, "end": 2410.88, "text": " using the + traditional search engines. So we, I recently also contributed the last search version.", + "tokens": [51112, 1228, 264, 5164, 3164, 12982, 13, 407, 321, 11, 286, 3938, 611, + 18434, 264, 1036, 3164, 3037, 13, 51384], "temperature": 0.0, "avg_logprob": -0.19843683487329727, + "compression_ratio": 1.7408759124087592, "no_speech_prob": 0.004124154802411795}, + {"id": 393, "seek": 239048, "start": 2410.88, "end": 2416.56, "text": " Because + again, in lot of forms, when we posted about vectors in solar and chorus solar version,", + "tokens": [51384, 1436, 797, 11, 294, 688, 295, 6422, 11, 562, 321, 9437, 466, 18875, + 294, 7936, 293, 22632, 7936, 3037, 11, 51668], "temperature": 0.0, "avg_logprob": + -0.19843683487329727, "compression_ratio": 1.7408759124087592, "no_speech_prob": + 0.004124154802411795}, {"id": 394, "seek": 239048, "start": 2416.56, "end": 2420.32, + "text": " we got a lot of questions about like, is it going to be supported also + in last search?", "tokens": [51668, 321, 658, 257, 688, 295, 1651, 466, 411, 11, + 307, 309, 516, 281, 312, 8104, 611, 294, 1036, 3164, 30, 
51856], "temperature": + 0.0, "avg_logprob": -0.19843683487329727, "compression_ratio": 1.7408759124087592, + "no_speech_prob": 0.004124154802411795}, {"id": 395, "seek": 242032, "start": 2420.4, + "end": 2428.1600000000003, "text": " So I think I just took stab at that too. And + chorus is implemented in which language? Is it", "tokens": [50368, 407, 286, 519, + 286, 445, 1890, 16343, 412, 300, 886, 13, 400, 22632, 307, 12270, 294, 597, 2856, + 30, 1119, 309, 50756], "temperature": 0.0, "avg_logprob": -0.17292947185282803, + "compression_ratio": 1.5965665236051503, "no_speech_prob": 0.0009233547607436776}, + {"id": 396, "seek": 242032, "start": 2428.1600000000003, "end": 2434.96, "text": + " Java or? Yes. So chorus is, yes, it''s combination, of course. I think I''m adding + a lot of, you", "tokens": [50756, 10745, 420, 30, 1079, 13, 407, 22632, 307, 11, + 2086, 11, 309, 311, 6562, 11, 295, 1164, 13, 286, 519, 286, 478, 5127, 257, 688, + 295, 11, 291, 51096], "temperature": 0.0, "avg_logprob": -0.17292947185282803, "compression_ratio": + 1.5965665236051503, "no_speech_prob": 0.0009233547607436776}, {"id": 397, "seek": + 242032, "start": 2434.96, "end": 2438.88, "text": " know, I think content to it + as well. 
I think one of the more interesting things that I also", "tokens": [51096, + 458, 11, 286, 519, 2701, 281, 309, 382, 731, 13, 286, 519, 472, 295, 264, 544, 1880, + 721, 300, 286, 611, 51292], "temperature": 0.0, "avg_logprob": -0.17292947185282803, + "compression_ratio": 1.5965665236051503, "no_speech_prob": 0.0009233547607436776}, + {"id": 398, "seek": 242032, "start": 2438.88, "end": 2444.32, "text": " contributed + and I felt like it''s something that usually people would not share it on the open", + "tokens": [51292, 18434, 293, 286, 2762, 411, 309, 311, 746, 300, 2673, 561, 576, + 406, 2073, 309, 322, 264, 1269, 51564], "temperature": 0.0, "avg_logprob": -0.17292947185282803, + "compression_ratio": 1.5965665236051503, "no_speech_prob": 0.0009233547607436776}, + {"id": 399, "seek": 244432, "start": 2444.96, "end": 2451.52, "text": " source is + like how you can convert your documents into vectors. So I think this is the part + that", "tokens": [50396, 4009, 307, 411, 577, 291, 393, 7620, 428, 8512, 666, 18875, + 13, 407, 286, 519, 341, 307, 264, 644, 300, 50724], "temperature": 0.0, "avg_logprob": + -0.1386772541517622, "compression_ratio": 1.6995515695067265, "no_speech_prob": + 0.019027087837457657}, {"id": 400, "seek": 244432, "start": 2451.52, "end": 2456.7200000000003, + "text": " usually, you know, the data encoding process, something that people would + really charge you", "tokens": [50724, 2673, 11, 291, 458, 11, 264, 1412, 43430, + 1399, 11, 746, 300, 561, 576, 534, 4602, 291, 50984], "temperature": 0.0, "avg_logprob": + -0.1386772541517622, "compression_ratio": 1.6995515695067265, "no_speech_prob": + 0.019027087837457657}, {"id": 401, "seek": 244432, "start": 2456.7200000000003, + "end": 2462.56, "text": " high amount of money to, you know, add vectors into your + indexing pipeline. 
And I provide that", "tokens": [50984, 1090, 2372, 295, 1460, + 281, 11, 291, 458, 11, 909, 18875, 666, 428, 8186, 278, 15517, 13, 400, 286, 2893, + 300, 51276], "temperature": 0.0, "avg_logprob": -0.1386772541517622, "compression_ratio": + 1.6995515695067265, "no_speech_prob": 0.019027087837457657}, {"id": 402, "seek": + 244432, "start": 2462.56, "end": 2468.2400000000002, "text": " again for free. Yeah. + And in solar version. So I think that is also something that people could", "tokens": + [51276, 797, 337, 1737, 13, 865, 13, 400, 294, 7936, 3037, 13, 407, 286, 519, 300, + 307, 611, 746, 300, 561, 727, 51560], "temperature": 0.0, "avg_logprob": -0.1386772541517622, + "compression_ratio": 1.6995515695067265, "no_speech_prob": 0.019027087837457657}, + {"id": 403, "seek": 246824, "start": 2468.24, "end": 2473.52, "text": " take advantage + of. Yeah, I mean, just a couple of years ago, when was it exactly a year ago?", + "tokens": [50364, 747, 5002, 295, 13, 865, 11, 286, 914, 11, 445, 257, 1916, 295, + 924, 2057, 11, 562, 390, 309, 2293, 257, 1064, 2057, 30, 50628], "temperature": + 0.0, "avg_logprob": -0.17051021420225806, "compression_ratio": 1.565040650406504, + "no_speech_prob": 0.027464911341667175}, {"id": 404, "seek": 246824, "start": 2474.16, + "end": 2480.8799999999997, "text": " Eric, you traveled to Europe to meet you guys + in Berlin and then he also traveled to Finland to", "tokens": [50660, 9336, 11, + 291, 16147, 281, 3315, 281, 1677, 291, 1074, 294, 13848, 293, 550, 415, 611, 16147, + 281, 24869, 281, 50996], "temperature": 0.0, "avg_logprob": -0.17051021420225806, + "compression_ratio": 1.565040650406504, "no_speech_prob": 0.027464911341667175}, + {"id": 405, "seek": 246824, "start": 2480.8799999999997, "end": 2490.24, "text": + " visit me. And he wasn''t the hotel room and he said, hey, let''s work for a few + hours. 
And he was", "tokens": [50996, 3441, 385, 13, 400, 415, 2067, 380, 264, 7622, + 1808, 293, 415, 848, 11, 4177, 11, 718, 311, 589, 337, 257, 1326, 2496, 13, 400, + 415, 390, 51464], "temperature": 0.0, "avg_logprob": -0.17051021420225806, "compression_ratio": + 1.565040650406504, "no_speech_prob": 0.027464911341667175}, {"id": 406, "seek": + 246824, "start": 2490.24, "end": 2497.2, "text": " actually asking some questions + about vector search. We were writing this article in search insights,", "tokens": + [51464, 767, 3365, 512, 1651, 466, 8062, 3164, 13, 492, 645, 3579, 341, 7222, 294, + 3164, 14310, 11, 51812], "temperature": 0.0, "avg_logprob": -0.17051021420225806, + "compression_ratio": 1.565040650406504, "no_speech_prob": 0.027464911341667175}, + {"id": 407, "seek": 249720, "start": 2497.2, "end": 2504.16, "text": " right? Oh, + right. Right. And and and he was saying his very passion, he''s like almost theatrical.", + "tokens": [50364, 558, 30, 876, 11, 558, 13, 1779, 13, 400, 293, 293, 415, 390, + 1566, 702, 588, 5418, 11, 415, 311, 411, 1920, 42806, 13, 50712], "temperature": + 0.0, "avg_logprob": -0.1704241180419922, "compression_ratio": 1.6317991631799162, + "no_speech_prob": 0.006502034142613411}, {"id": 408, "seek": 249720, "start": 2504.16, + "end": 2508.72, "text": " He was like walking in the room and saying, okay, here + is the pipeline. I have solar. I have this.", "tokens": [50712, 634, 390, 411, 4494, + 294, 264, 1808, 293, 1566, 11, 1392, 11, 510, 307, 264, 15517, 13, 286, 362, 7936, + 13, 286, 362, 341, 13, 50940], "temperature": 0.0, "avg_logprob": -0.1704241180419922, + "compression_ratio": 1.6317991631799162, "no_speech_prob": 0.006502034142613411}, + {"id": 409, "seek": 249720, "start": 2508.72, "end": 2516.08, "text": " I have my + Java like client. 
So where will you compute this vectors if all the models are accessible", + "tokens": [50940, 286, 362, 452, 10745, 411, 6423, 13, 407, 689, 486, 291, 14722, + 341, 18875, 498, 439, 264, 5245, 366, 9515, 51308], "temperature": 0.0, "avg_logprob": + -0.1704241180419922, "compression_ratio": 1.6317991631799162, "no_speech_prob": + 0.006502034142613411}, {"id": 410, "seek": 249720, "start": 2516.08, "end": 2522.08, + "text": " through Python, right? Yeah. And it wasn''t just a question of you might + get the same question", "tokens": [51308, 807, 15329, 11, 558, 30, 865, 13, 400, + 309, 2067, 380, 445, 257, 1168, 295, 291, 1062, 483, 264, 912, 1168, 51608], "temperature": + 0.0, "avg_logprob": -0.1704241180419922, "compression_ratio": 1.6317991631799162, + "no_speech_prob": 0.006502034142613411}, {"id": 411, "seek": 252208, "start": 2522.16, + "end": 2527.7599999999998, "text": " with your client, but it was like, wait a second. + Do I even know myself? What would I do? And I", "tokens": [50368, 365, 428, 6423, + 11, 457, 309, 390, 411, 11, 1699, 257, 1150, 13, 1144, 286, 754, 458, 2059, 30, + 708, 576, 286, 360, 30, 400, 286, 50648], "temperature": 0.0, "avg_logprob": -0.11935158698789534, + "compression_ratio": 1.6521739130434783, "no_speech_prob": 0.07275978475809097}, + {"id": 412, "seek": 252208, "start": 2527.7599999999998, "end": 2534.0, "text": + " said, probably I don''t. So I would start engineering something from scratch. + That is true. 
And I think", "tokens": [50648, 848, 11, 1391, 286, 500, 380, 13, + 407, 286, 576, 722, 7043, 746, 490, 8459, 13, 663, 307, 2074, 13, 400, 286, 519, + 50960], "temperature": 0.0, "avg_logprob": -0.11935158698789534, "compression_ratio": + 1.6521739130434783, "no_speech_prob": 0.07275978475809097}, {"id": 413, "seek": + 252208, "start": 2534.0, "end": 2539.84, "text": " that that is that is actually + one of the most obvious questions that keeps coming across because", "tokens": [50960, + 300, 300, 307, 300, 307, 767, 472, 295, 264, 881, 6322, 1651, 300, 5965, 1348, 2108, + 570, 51252], "temperature": 0.0, "avg_logprob": -0.11935158698789534, "compression_ratio": + 1.6521739130434783, "no_speech_prob": 0.07275978475809097}, {"id": 414, "seek": + 252208, "start": 2539.84, "end": 2545.6, "text": " Lord of our clients are now interested + in this. I mean, I know a typical work week for me has been", "tokens": [51252, + 3257, 295, 527, 6982, 366, 586, 3102, 294, 341, 13, 286, 914, 11, 286, 458, 257, + 7476, 589, 1243, 337, 385, 575, 668, 51540], "temperature": 0.0, "avg_logprob": + -0.11935158698789534, "compression_ratio": 1.6521739130434783, "no_speech_prob": + 0.07275978475809097}, {"id": 415, "seek": 252208, "start": 2545.6, "end": 2550.7999999999997, + "text": " like, I''m giving like more than four or five demos in a week to clients + like who want to know about", "tokens": [51540, 411, 11, 286, 478, 2902, 411, 544, + 813, 1451, 420, 1732, 33788, 294, 257, 1243, 281, 6982, 411, 567, 528, 281, 458, + 466, 51800], "temperature": 0.0, "avg_logprob": -0.11935158698789534, "compression_ratio": + 1.6521739130434783, "no_speech_prob": 0.07275978475809097}, {"id": 416, "seek": + 255080, "start": 2550.88, "end": 2557.44, "text": " this like will it fit my use + case? 
Like I have this, you know, size of the catalog like will it fit", "tokens": + [50368, 341, 411, 486, 309, 3318, 452, 764, 1389, 30, 1743, 286, 362, 341, 11, 291, + 458, 11, 2744, 295, 264, 19746, 411, 486, 309, 3318, 50696], "temperature": 0.0, + "avg_logprob": -0.13553236447847805, "compression_ratio": 1.88212927756654, "no_speech_prob": + 0.009676621295511723}, {"id": 417, "seek": 255080, "start": 2557.44, "end": 2563.6800000000003, + "text": " my use case? What do I need? And what kind of models? I mean, I need to + choose and obviously there", "tokens": [50696, 452, 764, 1389, 30, 708, 360, 286, + 643, 30, 400, 437, 733, 295, 5245, 30, 286, 914, 11, 286, 643, 281, 2826, 293, 2745, + 456, 51008], "temperature": 0.0, "avg_logprob": -0.13553236447847805, "compression_ratio": + 1.88212927756654, "no_speech_prob": 0.009676621295511723}, {"id": 418, "seek": 255080, + "start": 2563.6800000000003, "end": 2568.32, "text": " are some things that we need + to also, you know, spend time and, you know, charge money for. But then", "tokens": + [51008, 366, 512, 721, 300, 321, 643, 281, 611, 11, 291, 458, 11, 3496, 565, 293, + 11, 291, 458, 11, 4602, 1460, 337, 13, 583, 550, 51240], "temperature": 0.0, "avg_logprob": + -0.13553236447847805, "compression_ratio": 1.88212927756654, "no_speech_prob": 0.009676621295511723}, + {"id": 419, "seek": 255080, "start": 2568.32, "end": 2573.6800000000003, "text": + " eventually it turns out that, you know, people bring in their concerns and that + basically, you know,", "tokens": [51240, 4728, 309, 4523, 484, 300, 11, 291, 458, + 11, 561, 1565, 294, 641, 7389, 293, 300, 1936, 11, 291, 458, 11, 51508], "temperature": + 0.0, "avg_logprob": -0.13553236447847805, "compression_ratio": 1.88212927756654, + "no_speech_prob": 0.009676621295511723}, {"id": 420, "seek": 255080, "start": 2573.6800000000003, + "end": 2579.76, "text": " like shapes, what is it that we need to contribute next + to the open source? 
What is it that is", "tokens": [51508, 411, 10854, 11, 437, + 307, 309, 300, 321, 643, 281, 10586, 958, 281, 264, 1269, 4009, 30, 708, 307, 309, + 300, 307, 51812], "temperature": 0.0, "avg_logprob": -0.13553236447847805, "compression_ratio": + 1.88212927756654, "no_speech_prob": 0.009676621295511723}, {"id": 421, "seek": 257976, + "start": 2579.76, "end": 2584.48, "text": " confusing people? Why people are creating + so much hype about this stuff? Like this needs to be", "tokens": [50364, 13181, + 561, 30, 1545, 561, 366, 4084, 370, 709, 24144, 466, 341, 1507, 30, 1743, 341, 2203, + 281, 312, 50600], "temperature": 0.0, "avg_logprob": -0.1433835438319615, "compression_ratio": + 1.6182572614107884, "no_speech_prob": 0.007359507959336042}, {"id": 422, "seek": + 257976, "start": 2584.48, "end": 2590.0800000000004, "text": " demystified. And + I think that''s what my company does the best. And I think I''m just learning from + them.", "tokens": [50600, 1371, 38593, 2587, 13, 400, 286, 519, 300, 311, 437, 452, + 2237, 775, 264, 1151, 13, 400, 286, 519, 286, 478, 445, 2539, 490, 552, 13, 50880], + "temperature": 0.0, "avg_logprob": -0.1433835438319615, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.007359507959336042}, {"id": 423, "seek": 257976, "start": 2590.0800000000004, + "end": 2598.1600000000003, "text": " Yeah, I think it''s the it''s an excellent + spot. And it''s like, usually with all these hypes,", "tokens": [50880, 865, 11, + 286, 519, 309, 311, 264, 309, 311, 364, 7103, 4008, 13, 400, 309, 311, 411, 11, + 2673, 365, 439, 613, 2477, 5190, 11, 51284], "temperature": 0.0, "avg_logprob": + -0.1433835438319615, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.007359507959336042}, {"id": 424, "seek": 257976, "start": 2598.1600000000003, + "end": 2606.7200000000003, "text": " things get overcomplicated. 
But in the end, + if they prove to to exist, right, it prove the right to", "tokens": [51284, 721, + 483, 670, 43856, 3587, 13, 583, 294, 264, 917, 11, 498, 436, 7081, 281, 281, 2514, + 11, 558, 11, 309, 7081, 264, 558, 281, 51712], "temperature": 0.0, "avg_logprob": + -0.1433835438319615, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.007359507959336042}, {"id": 425, "seek": 260672, "start": 2606.72, "end": 2614.08, + "text": " exist. Then I think many of these things will get simplified. They will + probably to some extent", "tokens": [50364, 2514, 13, 1396, 286, 519, 867, 295, + 613, 721, 486, 483, 26335, 13, 814, 486, 1391, 281, 512, 8396, 50732], "temperature": + 0.0, "avg_logprob": -0.32494096081666274, "compression_ratio": 1.6166666666666667, + "no_speech_prob": 0.008377269841730595}, {"id": 426, "seek": 260672, "start": 2614.08, + "end": 2622.3999999999996, "text": " even commoditize. And like, I think it was + the recent LinkedIn post that, broken up every single", "tokens": [50732, 754, 19931, + 270, 1125, 13, 400, 411, 11, 286, 519, 309, 390, 264, 5162, 20657, 2183, 300, 11, + 5463, 493, 633, 2167, 51148], "temperature": 0.0, "avg_logprob": -0.32494096081666274, + "compression_ratio": 1.6166666666666667, "no_speech_prob": 0.008377269841730595}, + {"id": 427, "seek": 260672, "start": 2622.3999999999996, "end": 2627.8399999999997, + "text": " mind and search by Dr. Dr. and Bullway, he says, you know, why what''s + happening with the last", "tokens": [51148, 1575, 293, 3164, 538, 2491, 13, 2491, + 13, 293, 14131, 676, 11, 415, 1619, 11, 291, 458, 11, 983, 437, 311, 2737, 365, + 264, 1036, 51420], "temperature": 0.0, "avg_logprob": -0.32494096081666274, "compression_ratio": + 1.6166666666666667, "no_speech_prob": 0.008377269841730595}, {"id": 428, "seek": + 260672, "start": 2627.8399999999997, "end": 2632.7999999999997, "text": " search? + What''s happening with solar? What''s happening with this? That''s right. 
Lodge + language models.", "tokens": [51420, 3164, 30, 708, 311, 2737, 365, 7936, 30, 708, + 311, 2737, 365, 341, 30, 663, 311, 558, 13, 441, 19315, 2856, 5245, 13, 51668], + "temperature": 0.0, "avg_logprob": -0.32494096081666274, "compression_ratio": 1.6166666666666667, + "no_speech_prob": 0.008377269841730595}, {"id": 429, "seek": 263280, "start": 2632.8, + "end": 2638.32, "text": " And, you know, like, and it''s like, how do so many people + chip in?", "tokens": [50364, 400, 11, 291, 458, 11, 411, 11, 293, 309, 311, 411, + 11, 577, 360, 370, 867, 561, 11409, 294, 30, 50640], "temperature": 0.0, "avg_logprob": + -0.21572826219641644, "compression_ratio": 1.680952380952381, "no_speech_prob": + 0.08679592609405518}, {"id": 430, "seek": 263280, "start": 2639.6000000000004, "end": + 2645.6000000000004, "text": " And I was wondering them too. Yes. And they are still + like revolving around like some of the", "tokens": [50704, 400, 286, 390, 6359, + 552, 886, 13, 1079, 13, 400, 436, 366, 920, 411, 16908, 798, 926, 411, 512, 295, + 264, 51004], "temperature": 0.0, "avg_logprob": -0.21572826219641644, "compression_ratio": + 1.680952380952381, "no_speech_prob": 0.08679592609405518}, {"id": 431, "seek": 263280, + "start": 2645.6000000000004, "end": 2652.5600000000004, "text": " some more interesting + concepts, some more like basic concepts that really are unsolved in many ways.", + "tokens": [51004, 512, 544, 1880, 10392, 11, 512, 544, 411, 3875, 10392, 300, 534, + 366, 2693, 29110, 294, 867, 2098, 13, 51352], "temperature": 0.0, "avg_logprob": + -0.21572826219641644, "compression_ratio": 1.680952380952381, "no_speech_prob": + 0.08679592609405518}, {"id": 432, "seek": 263280, "start": 2652.5600000000004, "end": + 2659.2000000000003, "text": " And, you know, how do you even like deliver vectors + to your database and so on and so forth?", "tokens": [51352, 400, 11, 291, 458, + 11, 577, 360, 291, 754, 411, 4239, 18875, 281, 428, 8149, 293, 370, 322, 293, 370, + 5220, 30, 
51684], "temperature": 0.0, "avg_logprob": -0.21572826219641644, "compression_ratio": + 1.680952380952381, "no_speech_prob": 0.08679592609405518}, {"id": 433, "seek": 265920, + "start": 2659.2, "end": 2666.24, "text": " And one of the comments, I don''t remember + who said it was, hey, you know, in the end, keyword", "tokens": [50364, 400, 472, + 295, 264, 3053, 11, 286, 500, 380, 1604, 567, 848, 309, 390, 11, 4177, 11, 291, + 458, 11, 294, 264, 917, 11, 20428, 50716], "temperature": 0.0, "avg_logprob": -0.16907945546236905, + "compression_ratio": 1.5394736842105263, "no_speech_prob": 0.014189569279551506}, + {"id": 434, "seek": 265920, "start": 2666.24, "end": 2672.16, "text": " search and + vector search, both will be as equal kind of like, modesty that you can", "tokens": + [50716, 3164, 293, 8062, 3164, 11, 1293, 486, 312, 382, 2681, 733, 295, 411, 11, + 1072, 7819, 300, 291, 393, 51012], "temperature": 0.0, "avg_logprob": -0.16907945546236905, + "compression_ratio": 1.5394736842105263, "no_speech_prob": 0.014189569279551506}, + {"id": 435, "seek": 265920, "start": 2672.96, "end": 2680.24, "text": " play with + in any order, probably give some weightage to one or another depending on your use + case.", "tokens": [51052, 862, 365, 294, 604, 1668, 11, 1391, 976, 512, 3364, 609, + 281, 472, 420, 1071, 5413, 322, 428, 764, 1389, 13, 51416], "temperature": 0.0, + "avg_logprob": -0.16907945546236905, "compression_ratio": 1.5394736842105263, "no_speech_prob": + 0.014189569279551506}, {"id": 436, "seek": 265920, "start": 2680.24, "end": 2684.16, + "text": " And so complexity will shift away from these basic topics to something + more", "tokens": [51416, 400, 370, 14024, 486, 5513, 1314, 490, 613, 3875, 8378, + 281, 746, 544, 51612], "temperature": 0.0, "avg_logprob": -0.16907945546236905, + "compression_ratio": 1.5394736842105263, "no_speech_prob": 0.014189569279551506}, + {"id": 437, "seek": 268416, "start": 2685.12, "end": 2692.56, "text": " domain specific. 
+ That is actually right. And I''m glad you pointed out. And this was something", + "tokens": [50412, 9274, 2685, 13, 663, 307, 767, 558, 13, 400, 286, 478, 5404, 291, + 10932, 484, 13, 400, 341, 390, 746, 50784], "temperature": 0.0, "avg_logprob": -0.15394076895206532, + "compression_ratio": 1.6478260869565218, "no_speech_prob": 0.05800002068281174}, + {"id": 438, "seek": 268416, "start": 2693.04, "end": 2697.92, "text": " that that''s + been like constantly asked when we present, when we give demos that, you know,", + "tokens": [50808, 300, 300, 311, 668, 411, 6460, 2351, 562, 321, 1974, 11, 562, + 321, 976, 33788, 300, 11, 291, 458, 11, 51052], "temperature": 0.0, "avg_logprob": + -0.15394076895206532, "compression_ratio": 1.6478260869565218, "no_speech_prob": + 0.05800002068281174}, {"id": 439, "seek": 268416, "start": 2697.92, "end": 2704.3999999999996, + "text": " how do I fit this into my existing stack? I know it''s cool. And because + when we say that, you know,", "tokens": [51052, 577, 360, 286, 3318, 341, 666, 452, + 6741, 8630, 30, 286, 458, 309, 311, 1627, 13, 400, 570, 562, 321, 584, 300, 11, + 291, 458, 11, 51376], "temperature": 0.0, "avg_logprob": -0.15394076895206532, "compression_ratio": + 1.6478260869565218, "no_speech_prob": 0.05800002068281174}, {"id": 440, "seek": + 268416, "start": 2704.3999999999996, "end": 2709.7599999999998, "text": " it is + understanding the semantic meaning of your query. 
And that means that even if the + things", "tokens": [51376, 309, 307, 3701, 264, 47982, 3620, 295, 428, 14581, 13, + 400, 300, 1355, 300, 754, 498, 264, 721, 51644], "temperature": 0.0, "avg_logprob": + -0.15394076895206532, "compression_ratio": 1.6478260869565218, "no_speech_prob": + 0.05800002068281174}, {"id": 441, "seek": 270976, "start": 2709.76, "end": 2715.92, + "text": " which are not described in the similar vocabulary, then also your search + engine can find them.", "tokens": [50364, 597, 366, 406, 7619, 294, 264, 2531, 19864, + 11, 550, 611, 428, 3164, 2848, 393, 915, 552, 13, 50672], "temperature": 0.0, "avg_logprob": + -0.10815545090106356, "compression_ratio": 1.6319444444444444, "no_speech_prob": + 0.005701898131519556}, {"id": 442, "seek": 270976, "start": 2715.92, "end": 2720.8, + "text": " So I think which is a very powerful thing if you look at. And also there + are some people who", "tokens": [50672, 407, 286, 519, 597, 307, 257, 588, 4005, + 551, 498, 291, 574, 412, 13, 400, 611, 456, 366, 512, 561, 567, 50916], "temperature": + 0.0, "avg_logprob": -0.10815545090106356, "compression_ratio": 1.6319444444444444, + "no_speech_prob": 0.005701898131519556}, {"id": 443, "seek": 270976, "start": 2720.8, + "end": 2726.0800000000004, "text": " really wanted to use all the machine learning + magic. I mean, I remember the craze in 2017 when", "tokens": [50916, 534, 1415, + 281, 764, 439, 264, 3479, 2539, 5585, 13, 286, 914, 11, 286, 1604, 264, 2094, 1381, + 294, 6591, 562, 51180], "temperature": 0.0, "avg_logprob": -0.10815545090106356, + "compression_ratio": 1.6319444444444444, "no_speech_prob": 0.005701898131519556}, + {"id": 444, "seek": 270976, "start": 2726.0800000000004, "end": 2731.92, "text": + " LTR came out like how many people really wanted to use LTR? 
And then it was like, + you know,", "tokens": [51180, 441, 25936, 1361, 484, 411, 577, 867, 561, 534, 1415, + 281, 764, 441, 25936, 30, 400, 550, 309, 390, 411, 11, 291, 458, 11, 51472], "temperature": + 0.0, "avg_logprob": -0.10815545090106356, "compression_ratio": 1.6319444444444444, + "no_speech_prob": 0.005701898131519556}, {"id": 445, "seek": 270976, "start": 2733.1200000000003, + "end": 2738.32, "text": " that kind of gave a lot of struggle to a lot of folks. + So it''s just that how easily accessible,", "tokens": [51532, 300, 733, 295, 2729, + 257, 688, 295, 7799, 281, 257, 688, 295, 4024, 13, 407, 309, 311, 445, 300, 577, + 3612, 9515, 11, 51792], "temperature": 0.0, "avg_logprob": -0.10815545090106356, + "compression_ratio": 1.6319444444444444, "no_speech_prob": 0.005701898131519556}, + {"id": 446, "seek": 273832, "start": 2738.32, "end": 2743.44, "text": " you know, + machine learning models now become. So people need to know, people deserve to know", + "tokens": [50364, 291, 458, 11, 3479, 2539, 5245, 586, 1813, 13, 407, 561, 643, + 281, 458, 11, 561, 9948, 281, 458, 50620], "temperature": 0.0, "avg_logprob": -0.11155099868774414, + "compression_ratio": 1.7890625, "no_speech_prob": 0.0073401289992034435}, {"id": + 447, "seek": 273832, "start": 2743.44, "end": 2749.36, "text": " that it is not + that rocket science anymore. Like it is very common. It''s very obvious. It''s like", + "tokens": [50620, 300, 309, 307, 406, 300, 13012, 3497, 3602, 13, 1743, 309, 307, + 588, 2689, 13, 467, 311, 588, 6322, 13, 467, 311, 411, 50916], "temperature": 0.0, + "avg_logprob": -0.11155099868774414, "compression_ratio": 1.7890625, "no_speech_prob": + 0.0073401289992034435}, {"id": 448, "seek": 273832, "start": 2749.36, "end": 2754.56, + "text": " the natural path that should go into. 
And the thing that I wanted to point + out earlier was that", "tokens": [50916, 264, 3303, 3100, 300, 820, 352, 666, 13, + 400, 264, 551, 300, 286, 1415, 281, 935, 484, 3071, 390, 300, 51176], "temperature": + 0.0, "avg_logprob": -0.11155099868774414, "compression_ratio": 1.7890625, "no_speech_prob": + 0.0073401289992034435}, {"id": 449, "seek": 273832, "start": 2755.28, "end": 2759.84, + "text": " the hybrid is the wave forward. There are so many things that I could + think of that keyword does", "tokens": [51212, 264, 13051, 307, 264, 5772, 2128, + 13, 821, 366, 370, 867, 721, 300, 286, 727, 519, 295, 300, 20428, 775, 51440], "temperature": + 0.0, "avg_logprob": -0.11155099868774414, "compression_ratio": 1.7890625, "no_speech_prob": + 0.0073401289992034435}, {"id": 450, "seek": 273832, "start": 2759.84, "end": 2764.8, + "text": " the best. And I think sooner people realize that hybrid is the way forward.", + "tokens": [51440, 264, 1151, 13, 400, 286, 519, 15324, 561, 4325, 300, 13051, 307, + 264, 636, 2128, 13, 51688], "temperature": 0.0, "avg_logprob": -0.11155099868774414, + "compression_ratio": 1.7890625, "no_speech_prob": 0.0073401289992034435}, {"id": + 451, "seek": 276480, "start": 2764.96, "end": 2774.8, "text": " Yeah. 
And like what + is your take on hybrid if you were to offer it to a client, you know,", "tokens": + [50372, 865, 13, 400, 411, 437, 307, 428, 747, 322, 13051, 498, 291, 645, 281, 2626, + 309, 281, 257, 6423, 11, 291, 458, 11, 50864], "temperature": 0.0, "avg_logprob": + -0.23560344582737083, "compression_ratio": 1.644736842105263, "no_speech_prob": + 0.023212773725390434}, {"id": 452, "seek": 276480, "start": 2774.8, "end": 2780.4, + "text": " I have heard from some of the clients in the past, you know, okay, so + if we have a vector database,", "tokens": [50864, 286, 362, 2198, 490, 512, 295, + 264, 6982, 294, 264, 1791, 11, 291, 458, 11, 1392, 11, 370, 498, 321, 362, 257, + 8062, 8149, 11, 51144], "temperature": 0.0, "avg_logprob": -0.23560344582737083, + "compression_ratio": 1.644736842105263, "no_speech_prob": 0.023212773725390434}, + {"id": 453, "seek": 276480, "start": 2780.4, "end": 2787.2000000000003, "text": + " like VVAT or pinecon or whatever or quadrant. And then we have the needs for the + e-commerce", "tokens": [51144, 411, 691, 53, 2218, 420, 15113, 1671, 420, 2035, + 420, 46856, 13, 400, 550, 321, 362, 264, 2203, 337, 264, 308, 12, 26926, 51484], + "temperature": 0.0, "avg_logprob": -0.23560344582737083, "compression_ratio": 1.644736842105263, + "no_speech_prob": 0.023212773725390434}, {"id": 454, "seek": 276480, "start": 2787.2000000000003, + "end": 2794.0800000000004, "text": " application like facets and we cannot do it + in some of these databases. 
So what should we do?", "tokens": [51484, 3861, 411, + 49752, 293, 321, 2644, 360, 309, 294, 512, 295, 613, 22380, 13, 407, 437, 820, 321, + 360, 30, 51828], "temperature": 0.0, "avg_logprob": -0.23560344582737083, "compression_ratio": + 1.644736842105263, "no_speech_prob": 0.023212773725390434}, {"id": 455, "seek": + 279408, "start": 2794.16, "end": 2802.0, "text": " Like does hybrid mean that we + will run to databases like one elastic search, one vector database,", "tokens": + [50368, 1743, 775, 13051, 914, 300, 321, 486, 1190, 281, 22380, 411, 472, 17115, + 3164, 11, 472, 8062, 8149, 11, 50760], "temperature": 0.0, "avg_logprob": -0.15756186123551993, + "compression_ratio": 1.7013574660633484, "no_speech_prob": 0.001454259268939495}, + {"id": 456, "seek": 279408, "start": 2802.0, "end": 2808.16, "text": " or do you + have a better answer to that? I think not really. I think that''s also an interesting + point", "tokens": [50760, 420, 360, 291, 362, 257, 1101, 1867, 281, 300, 30, 286, + 519, 406, 534, 13, 286, 519, 300, 311, 611, 364, 1880, 935, 51068], "temperature": + 0.0, "avg_logprob": -0.15756186123551993, "compression_ratio": 1.7013574660633484, + "no_speech_prob": 0.001454259268939495}, {"id": 457, "seek": 279408, "start": 2808.16, + "end": 2813.04, "text": " if it''s coming in the discussion here. 
I think sooner + or lot of these, you know,", "tokens": [51068, 498, 309, 311, 1348, 294, 264, 5017, + 510, 13, 286, 519, 15324, 420, 688, 295, 613, 11, 291, 458, 11, 51312], "temperature": + 0.0, "avg_logprob": -0.15756186123551993, "compression_ratio": 1.7013574660633484, + "no_speech_prob": 0.001454259268939495}, {"id": 458, "seek": 279408, "start": 2814.08, + "end": 2819.52, "text": " vector database and vector search engine companies are + also realizing that they cannot lose what", "tokens": [51364, 8062, 8149, 293, 8062, + 3164, 2848, 3431, 366, 611, 16734, 300, 436, 2644, 3624, 437, 51636], "temperature": + 0.0, "avg_logprob": -0.15756186123551993, "compression_ratio": 1.7013574660633484, + "no_speech_prob": 0.001454259268939495}, {"id": 459, "seek": 281952, "start": 2819.6, + "end": 2824.08, "text": " keyword search engines brought. They cannot just take + it away. I think one of the other things that,", "tokens": [50368, 20428, 3164, + 12982, 3038, 13, 814, 2644, 445, 747, 309, 1314, 13, 286, 519, 472, 295, 264, 661, + 721, 300, 11, 50592], "temperature": 0.0, "avg_logprob": -0.11625288214002337, "compression_ratio": + 2.0208333333333335, "no_speech_prob": 0.00718507869169116}, {"id": 460, "seek": + 281952, "start": 2824.08, "end": 2831.52, "text": " you know, the keyword based + search engines abroad is like, you have total control on what goes", "tokens": [50592, + 291, 458, 11, 264, 20428, 2361, 3164, 12982, 12637, 307, 411, 11, 291, 362, 3217, + 1969, 322, 437, 1709, 50964], "temperature": 0.0, "avg_logprob": -0.11625288214002337, + "compression_ratio": 2.0208333333333335, "no_speech_prob": 0.00718507869169116}, + {"id": 461, "seek": 281952, "start": 2831.52, "end": 2837.28, "text": " into the + search engine. 
And I think this is something in the name of semantic understanding,", + "tokens": [50964, 666, 264, 3164, 2848, 13, 400, 286, 519, 341, 307, 746, 294, 264, + 1315, 295, 47982, 3701, 11, 51252], "temperature": 0.0, "avg_logprob": -0.11625288214002337, + "compression_ratio": 2.0208333333333335, "no_speech_prob": 0.00718507869169116}, + {"id": 462, "seek": 281952, "start": 2837.28, "end": 2842.48, "text": " you cannot + just push your content into these search engines. So you still have to massage the", + "tokens": [51252, 291, 2644, 445, 2944, 428, 2701, 666, 613, 3164, 12982, 13, 407, + 291, 920, 362, 281, 16145, 264, 51512], "temperature": 0.0, "avg_logprob": -0.11625288214002337, + "compression_ratio": 2.0208333333333335, "no_speech_prob": 0.00718507869169116}, + {"id": 463, "seek": 281952, "start": 2842.48, "end": 2848.72, "text": " content, + you still have to treat it, you still have to have control on this data. And somehow, + like if", "tokens": [51512, 2701, 11, 291, 920, 362, 281, 2387, 309, 11, 291, 920, + 362, 281, 362, 1969, 322, 341, 1412, 13, 400, 6063, 11, 411, 498, 51824], "temperature": + 0.0, "avg_logprob": -0.11625288214002337, "compression_ratio": 2.0208333333333335, + "no_speech_prob": 0.00718507869169116}, {"id": 464, "seek": 284872, "start": 2848.72, + "end": 2854.64, "text": " you say that, you know, synonyms are not needed anymore + because, you know, the vector search engines,", "tokens": [50364, 291, 584, 300, + 11, 291, 458, 11, 5451, 2526, 2592, 366, 406, 2978, 3602, 570, 11, 291, 458, 11, + 264, 8062, 3164, 12982, 11, 50660], "temperature": 0.0, "avg_logprob": -0.1351154227005808, + "compression_ratio": 1.751111111111111, "no_speech_prob": 0.001237755292095244}, + {"id": 465, "seek": 284872, "start": 2854.64, "end": 2859.2799999999997, "text": + " you know, would understand all of that. But what about stemming? 
What about, you + know, there are", "tokens": [50660, 291, 458, 11, 576, 1223, 439, 295, 300, 13, + 583, 437, 466, 12312, 2810, 30, 708, 466, 11, 291, 458, 11, 456, 366, 50892], "temperature": + 0.0, "avg_logprob": -0.1351154227005808, "compression_ratio": 1.751111111111111, + "no_speech_prob": 0.001237755292095244}, {"id": 466, "seek": 284872, "start": 2859.2799999999997, + "end": 2863.8399999999997, "text": " tons of other things that we do in the before + pointing data into the search engine. And I think", "tokens": [50892, 9131, 295, + 661, 721, 300, 321, 360, 294, 264, 949, 12166, 1412, 666, 264, 3164, 2848, 13, 400, + 286, 519, 51120], "temperature": 0.0, "avg_logprob": -0.1351154227005808, "compression_ratio": + 1.751111111111111, "no_speech_prob": 0.001237755292095244}, {"id": 467, "seek": + 284872, "start": 2863.8399999999997, "end": 2871.2799999999997, "text": " that still + stays relevant in a lot of different contexts because this has developed, this has + grown", "tokens": [51120, 300, 920, 10834, 7340, 294, 257, 688, 295, 819, 30628, + 570, 341, 575, 4743, 11, 341, 575, 7709, 51492], "temperature": 0.0, "avg_logprob": + -0.1351154227005808, "compression_ratio": 1.751111111111111, "no_speech_prob": 0.001237755292095244}, + {"id": 468, "seek": 287128, "start": 2871.28, "end": 2878.0800000000004, "text": + " over the period of time. You cannot throw it all out. 
So I feel like there''s + a, you know,", "tokens": [50364, 670, 264, 2896, 295, 565, 13, 509, 2644, 3507, + 309, 439, 484, 13, 407, 286, 841, 411, 456, 311, 257, 11, 291, 458, 11, 50704], + "temperature": 0.0, "avg_logprob": -0.12014622364229369, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.034422505646944046}, {"id": 469, "seek": 287128, "start": 2878.0800000000004, + "end": 2884.96, "text": " mid kind of point that, you know, we have to come, like, + especially for the traditional search engines", "tokens": [50704, 2062, 733, 295, + 935, 300, 11, 291, 458, 11, 321, 362, 281, 808, 11, 411, 11, 2318, 337, 264, 5164, + 3164, 12982, 51048], "temperature": 0.0, "avg_logprob": -0.12014622364229369, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.034422505646944046}, {"id": 470, "seek": + 287128, "start": 2884.96, "end": 2891.92, "text": " and the new search engines, + which are emerging in the market, they have to come somewhere in between,", "tokens": + [51048, 293, 264, 777, 3164, 12982, 11, 597, 366, 14989, 294, 264, 2142, 11, 436, + 362, 281, 808, 4079, 294, 1296, 11, 51396], "temperature": 0.0, "avg_logprob": -0.12014622364229369, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.034422505646944046}, + {"id": 471, "seek": 287128, "start": 2891.92, "end": 2897.6000000000004, "text": + " where we try to, you know, bring best of both words. 
And I think that is going + to be the way forward.", "tokens": [51396, 689, 321, 853, 281, 11, 291, 458, 11, + 1565, 1151, 295, 1293, 2283, 13, 400, 286, 519, 300, 307, 516, 281, 312, 264, 636, + 2128, 13, 51680], "temperature": 0.0, "avg_logprob": -0.12014622364229369, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.034422505646944046}, {"id": 472, "seek": + 289760, "start": 2898.24, "end": 2901.2, "text": " Yeah, but I guess we are not + kind of like there yet, right?", "tokens": [50396, 865, 11, 457, 286, 2041, 321, + 366, 406, 733, 295, 411, 456, 1939, 11, 558, 30, 50544], "temperature": 0.0, "avg_logprob": + -0.12156861460107005, "compression_ratio": 1.7628458498023716, "no_speech_prob": + 0.08503904938697815}, {"id": 473, "seek": 289760, "start": 2902.48, "end": 2908.72, + "text": " I would say like, I mean, the change has already started. All right. In + the presentations that we", "tokens": [50608, 286, 576, 584, 411, 11, 286, 914, + 11, 264, 1319, 575, 1217, 1409, 13, 1057, 558, 13, 682, 264, 18964, 300, 321, 50920], + "temperature": 0.0, "avg_logprob": -0.12156861460107005, "compression_ratio": 1.7628458498023716, + "no_speech_prob": 0.08503904938697815}, {"id": 474, "seek": 289760, "start": 2908.72, + "end": 2915.52, "text": " give and the demonstrations that we provide to the customers, + I think people ask us that, you know,", "tokens": [50920, 976, 293, 264, 34714, + 300, 321, 2893, 281, 264, 4581, 11, 286, 519, 561, 1029, 505, 300, 11, 291, 458, + 11, 51260], "temperature": 0.0, "avg_logprob": -0.12156861460107005, "compression_ratio": + 1.7628458498023716, "no_speech_prob": 0.08503904938697815}, {"id": 475, "seek": + 289760, "start": 2915.52, "end": 2920.56, "text": " what is the smallest use case + I could, you know, try with this. 
So I think one of the suggestions", "tokens": + [51260, 437, 307, 264, 16998, 764, 1389, 286, 727, 11, 291, 458, 11, 853, 365, 341, + 13, 407, 286, 519, 472, 295, 264, 13396, 51512], "temperature": 0.0, "avg_logprob": + -0.12156861460107005, "compression_ratio": 1.7628458498023716, "no_speech_prob": + 0.08503904938697815}, {"id": 476, "seek": 289760, "start": 2920.56, "end": 2925.2, + "text": " that always, you know, comes from my side is that, you know, attack the + cases that do not,", "tokens": [51512, 300, 1009, 11, 291, 458, 11, 1487, 490, 452, + 1252, 307, 300, 11, 291, 458, 11, 2690, 264, 3331, 300, 360, 406, 11, 51744], "temperature": + 0.0, "avg_logprob": -0.12156861460107005, "compression_ratio": 1.7628458498023716, + "no_speech_prob": 0.08503904938697815}, {"id": 477, "seek": 292520, "start": 2926.16, + "end": 2931.04, "text": " you know, perform well with your traditional search engine. + So for example, like the long tail queries,", "tokens": [50412, 291, 458, 11, 2042, + 731, 365, 428, 5164, 3164, 2848, 13, 407, 337, 1365, 11, 411, 264, 938, 6838, 24109, + 11, 50656], "temperature": 0.0, "avg_logprob": -0.12444874335979593, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.006000712513923645}, {"id": 478, "seek": + 292520, "start": 2931.04, "end": 2936.16, "text": " I think this is where any traditional + search engine struggles. 
This is probably the first thing,", "tokens": [50656, 286, + 519, 341, 307, 689, 604, 5164, 3164, 2848, 17592, 13, 639, 307, 1391, 264, 700, + 551, 11, 50912], "temperature": 0.0, "avg_logprob": -0.12444874335979593, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.006000712513923645}, {"id": 479, "seek": + 292520, "start": 2936.16, "end": 2940.7999999999997, "text": " and instead of, you + know, running into a zero head screen, like it is better to, you know, have a", + "tokens": [50912, 293, 2602, 295, 11, 291, 458, 11, 2614, 666, 257, 4018, 1378, + 2568, 11, 411, 309, 307, 1101, 281, 11, 291, 458, 11, 362, 257, 51144], "temperature": + 0.0, "avg_logprob": -0.12444874335979593, "compression_ratio": 1.8202247191011236, + "no_speech_prob": 0.006000712513923645}, {"id": 480, "seek": 292520, "start": 2940.7999999999997, + "end": 2946.96, "text": " chain sort of, which should delegate your query to the + vector search engine or a vector part of", "tokens": [51144, 5021, 1333, 295, 11, + 597, 820, 40999, 428, 14581, 281, 264, 8062, 3164, 2848, 420, 257, 8062, 644, 295, + 51452], "temperature": 0.0, "avg_logprob": -0.12444874335979593, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.006000712513923645}, {"id": 481, "seek": + 292520, "start": 2946.96, "end": 2952.96, "text": " the query. And this is something + should be like a smallest, you know, like way to adopt the", "tokens": [51452, 264, + 14581, 13, 400, 341, 307, 746, 820, 312, 411, 257, 16998, 11, 291, 458, 11, 411, + 636, 281, 6878, 264, 51752], "temperature": 0.0, "avg_logprob": -0.12444874335979593, + "compression_ratio": 1.8202247191011236, "no_speech_prob": 0.006000712513923645}, + {"id": 482, "seek": 295296, "start": 2952.96, "end": 2957.68, "text": " the newest + technologies and take leverage what they find a bring it. 
So maybe like from the + product", "tokens": [50364, 264, 17569, 7943, 293, 747, 13982, 437, 436, 915, 257, + 1565, 309, 13, 407, 1310, 411, 490, 264, 1674, 50600], "temperature": 0.0, "avg_logprob": + -0.29656117356668305, "compression_ratio": 1.731012658227848, "no_speech_prob": + 0.00315081630833447}, {"id": 483, "seek": 295296, "start": 2957.68, "end": 2963.76, + "text": " perspective and business perspective, reducing the search abandonment + rate, right? Because that,", "tokens": [50600, 4585, 293, 1606, 4585, 11, 12245, + 264, 3164, 9072, 518, 3314, 11, 558, 30, 1436, 300, 11, 50904], "temperature": 0.0, + "avg_logprob": -0.29656117356668305, "compression_ratio": 1.731012658227848, "no_speech_prob": + 0.00315081630833447}, {"id": 484, "seek": 295296, "start": 2963.76, "end": 2967.6, + "text": " that that what actually takes a lot of money away from all these players, + you know,", "tokens": [50904, 300, 300, 437, 767, 2516, 257, 688, 295, 1460, 1314, + 490, 439, 613, 4150, 11, 291, 458, 11, 51096], "temperature": 0.0, "avg_logprob": + -0.29656117356668305, "compression_ratio": 1.731012658227848, "no_speech_prob": + 0.00315081630833447}, {"id": 485, "seek": 295296, "start": 2967.6, "end": 2972.32, + "text": " absolutely. The abandonment you''re just based, based off and you''re + like, you cannot find anything.", "tokens": [51096, 3122, 13, 440, 9072, 518, 291, + 434, 445, 2361, 11, 2361, 766, 293, 291, 434, 411, 11, 291, 2644, 915, 1340, 13, + 51332], "temperature": 0.0, "avg_logprob": -0.29656117356668305, "compression_ratio": + 1.731012658227848, "no_speech_prob": 0.00315081630833447}, {"id": 486, "seek": 295296, + "start": 2972.32, "end": 2976.48, "text": " So why should I keep trying? 
The system + does not even like respond.", "tokens": [51332, 407, 983, 820, 286, 1066, 1382, + 30, 440, 1185, 775, 406, 754, 411, 4196, 13, 51540], "temperature": 0.0, "avg_logprob": + -0.29656117356668305, "compression_ratio": 1.731012658227848, "no_speech_prob": + 0.00315081630833447}, {"id": 487, "seek": 295296, "start": 2977.28, "end": 2981.92, + "text": " Absolutely. I mean, I have been in that situation before because I''ve + also worked on like a lot of", "tokens": [51580, 7021, 13, 286, 914, 11, 286, 362, + 668, 294, 300, 2590, 949, 570, 286, 600, 611, 2732, 322, 411, 257, 688, 295, 51812], + "temperature": 0.0, "avg_logprob": -0.29656117356668305, "compression_ratio": 1.731012658227848, + "no_speech_prob": 0.00315081630833447}, {"id": 488, "seek": 298192, "start": 2981.92, + "end": 2986.32, "text": " product searches and I would leverage something like, + you know, I would keep on doing like the", "tokens": [50364, 1674, 26701, 293, 286, + 576, 13982, 746, 411, 11, 291, 458, 11, 286, 576, 1066, 322, 884, 411, 264, 50584], + "temperature": 0.0, "avg_logprob": -0.07577056243640035, "compression_ratio": 1.7672727272727273, + "no_speech_prob": 0.0023128297179937363}, {"id": 489, "seek": 298192, "start": 2986.32, + "end": 2991.84, "text": " relaxation of the tokens in the query. So I would keep + dropping the content that doesn''t make sense.", "tokens": [50584, 30315, 295, 264, + 22667, 294, 264, 14581, 13, 407, 286, 576, 1066, 13601, 264, 2701, 300, 1177, 380, + 652, 2020, 13, 50860], "temperature": 0.0, "avg_logprob": -0.07577056243640035, + "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0023128297179937363}, + {"id": 490, "seek": 298192, "start": 2991.84, "end": 2997.6800000000003, "text": + " But then I would say it''s not the easiest thing to do. 
Rather, it is easier to + pass on this query,", "tokens": [50860, 583, 550, 286, 576, 584, 309, 311, 406, + 264, 12889, 551, 281, 360, 13, 16571, 11, 309, 307, 3571, 281, 1320, 322, 341, 14581, + 11, 51152], "temperature": 0.0, "avg_logprob": -0.07577056243640035, "compression_ratio": + 1.7672727272727273, "no_speech_prob": 0.0023128297179937363}, {"id": 491, "seek": + 298192, "start": 2997.6800000000003, "end": 3003.28, "text": " like the long query + to another system that is, you know, dealing with semantic similarity.", "tokens": + [51152, 411, 264, 938, 14581, 281, 1071, 1185, 300, 307, 11, 291, 458, 11, 6260, + 365, 47982, 32194, 13, 51432], "temperature": 0.0, "avg_logprob": -0.07577056243640035, + "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0023128297179937363}, + {"id": 492, "seek": 298192, "start": 3003.28, "end": 3009.76, "text": " And once + that has proven its worth, I think that''s the time we bring it forward. And we + take it more", "tokens": [51432, 400, 1564, 300, 575, 12785, 1080, 3163, 11, 286, + 519, 300, 311, 264, 565, 321, 1565, 309, 2128, 13, 400, 321, 747, 309, 544, 51756], + "temperature": 0.0, "avg_logprob": -0.07577056243640035, "compression_ratio": 1.7672727272727273, + "no_speech_prob": 0.0023128297179937363}, {"id": 493, "seek": 300976, "start": 3009.76, + "end": 3017.36, "text": " like from the down to up approach. Yeah. 
So you think + the message really to vector database", "tokens": [50364, 411, 490, 264, 760, 281, + 493, 3109, 13, 865, 13, 407, 291, 519, 264, 3636, 534, 281, 8062, 8149, 50744], + "temperature": 0.0, "avg_logprob": -0.12492439895868301, "compression_ratio": 1.5243243243243243, + "no_speech_prob": 0.00846576876938343}, {"id": 494, "seek": 300976, "start": 3017.36, + "end": 3026.2400000000002, "text": " companies is to think about what you can take + from keyword search engines like solar elastic", "tokens": [50744, 3431, 307, 281, + 519, 466, 437, 291, 393, 747, 490, 20428, 3164, 12982, 411, 7936, 17115, 51188], + "temperature": 0.0, "avg_logprob": -0.12492439895868301, "compression_ratio": 1.5243243243243243, + "no_speech_prob": 0.00846576876938343}, {"id": 495, "seek": 300976, "start": 3026.2400000000002, + "end": 3033.28, "text": " search, open search, I guess as well, right? Absolutely. + And also message to the existing keyword", "tokens": [51188, 3164, 11, 1269, 3164, + 11, 286, 2041, 382, 731, 11, 558, 30, 7021, 13, 400, 611, 3636, 281, 264, 6741, + 20428, 51540], "temperature": 0.0, "avg_logprob": -0.12492439895868301, "compression_ratio": + 1.5243243243243243, "no_speech_prob": 0.00846576876938343}, {"id": 496, "seek": + 303328, "start": 3033.76, "end": 3040.32, "text": " search engine companies as well + as that you''re not old fashioned, you''re not like out of the market", "tokens": + [50388, 3164, 2848, 3431, 382, 731, 382, 300, 291, 434, 406, 1331, 40646, 11, 291, + 434, 406, 411, 484, 295, 264, 2142, 50716], "temperature": 0.0, "avg_logprob": -0.16272992836801628, + "compression_ratio": 1.528205128205128, "no_speech_prob": 0.0229375958442688}, {"id": + 497, "seek": 303328, "start": 3040.32, "end": 3047.28, "text": " anyway, like you + have proven your worth over the period of time and it is here to stay. 
It''s just", + "tokens": [50716, 4033, 11, 411, 291, 362, 12785, 428, 3163, 670, 264, 2896, 295, + 565, 293, 309, 307, 510, 281, 1754, 13, 467, 311, 445, 51064], "temperature": 0.0, + "avg_logprob": -0.16272992836801628, "compression_ratio": 1.528205128205128, "no_speech_prob": + 0.0229375958442688}, {"id": 498, "seek": 303328, "start": 3047.28, "end": 3055.1200000000003, + "text": " that how quickly you can adopt to the change. And I think that is happening. + Yeah. Yeah. That''s very", "tokens": [51064, 300, 577, 2661, 291, 393, 6878, 281, + 264, 1319, 13, 400, 286, 519, 300, 307, 2737, 13, 865, 13, 865, 13, 663, 311, 588, + 51456], "temperature": 0.0, "avg_logprob": -0.16272992836801628, "compression_ratio": + 1.528205128205128, "no_speech_prob": 0.0229375958442688}, {"id": 499, "seek": 305512, + "start": 3055.12, "end": 3063.52, "text": " interesting. I think you you are calming + many people. No, I like, oh, I''m like, I will", "tokens": [50364, 1880, 13, 286, + 519, 291, 291, 366, 39723, 867, 561, 13, 883, 11, 286, 411, 11, 1954, 11, 286, 478, + 411, 11, 286, 486, 50784], "temperature": 0.0, "avg_logprob": -0.28673347730315135, + "compression_ratio": 1.5765765765765767, "no_speech_prob": 0.008153512142598629}, + {"id": 500, "seek": 305512, "start": 3063.52, "end": 3070.4, "text": " losing, I + will losing the wave of innovation because we cannot, you know, introduce vector", + "tokens": [50784, 7027, 11, 286, 486, 7027, 264, 5772, 295, 8504, 570, 321, 2644, + 11, 291, 458, 11, 5366, 8062, 51128], "temperature": 0.0, "avg_logprob": -0.28673347730315135, + "compression_ratio": 1.5765765765765767, "no_speech_prob": 0.008153512142598629}, + {"id": 501, "seek": 305512, "start": 3070.4, "end": 3074.3199999999997, "text": + " database into the mix or whatever. But I think you can, right? 
Like with solar,", + "tokens": [51128, 8149, 666, 264, 2890, 420, 2035, 13, 583, 286, 519, 291, 393, + 11, 558, 30, 1743, 365, 7936, 11, 51324], "temperature": 0.0, "avg_logprob": -0.28673347730315135, + "compression_ratio": 1.5765765765765767, "no_speech_prob": 0.008153512142598629}, + {"id": 502, "seek": 305512, "start": 3074.7999999999997, "end": 3079.7599999999998, + "text": " Alexander Benedetti is doing a lot of work in implementing the vector + search there. And then", "tokens": [51348, 14845, 39753, 12495, 307, 884, 257, 688, + 295, 589, 294, 18114, 264, 8062, 3164, 456, 13, 400, 550, 51596], "temperature": + 0.0, "avg_logprob": -0.28673347730315135, "compression_ratio": 1.5765765765765767, + "no_speech_prob": 0.008153512142598629}, {"id": 503, "seek": 307976, "start": 3080.5600000000004, + "end": 3085.28, "text": " in the elastic search, of course, solar Maria Shripe and + others have been", "tokens": [50404, 294, 264, 17115, 3164, 11, 295, 1164, 11, 7936, + 12734, 1160, 470, 494, 293, 2357, 362, 668, 50640], "temperature": 0.0, "avg_logprob": + -0.34566887388838097, "compression_ratio": 1.530701754385965, "no_speech_prob": + 0.014164176769554615}, {"id": 504, "seek": 307976, "start": 3086.0800000000004, + "end": 3092.4, "text": " Julie. Yes. Julie Tsipzirani have been doing work there, + right? 
But like I do still feel like", "tokens": [50680, 18794, 13, 1079, 13, 18794, + 16518, 647, 89, 347, 3782, 362, 668, 884, 589, 456, 11, 558, 30, 583, 411, 286, + 360, 920, 841, 411, 50996], "temperature": 0.0, "avg_logprob": -0.34566887388838097, + "compression_ratio": 1.530701754385965, "no_speech_prob": 0.014164176769554615}, + {"id": 505, "seek": 307976, "start": 3093.36, "end": 3099.28, "text": " what Doug + was saying in his post, you know, the cracks of it that like it doesn''t feel like", + "tokens": [51044, 437, 12742, 390, 1566, 294, 702, 2183, 11, 291, 458, 11, 264, + 21770, 295, 309, 300, 411, 309, 1177, 380, 841, 411, 51340], "temperature": 0.0, + "avg_logprob": -0.34566887388838097, "compression_ratio": 1.530701754385965, "no_speech_prob": + 0.014164176769554615}, {"id": 506, "seek": 307976, "start": 3099.28, "end": 3105.6800000000003, + "text": " this functionality is advertised well. I think that''s actually the point. + Yes. And of it,", "tokens": [51340, 341, 14980, 307, 42310, 731, 13, 286, 519, 300, + 311, 767, 264, 935, 13, 1079, 13, 400, 295, 309, 11, 51660], "temperature": 0.0, + "avg_logprob": -0.34566887388838097, "compression_ratio": 1.530701754385965, "no_speech_prob": + 0.014164176769554615}, {"id": 507, "seek": 310568, "start": 3105.7599999999998, + "end": 3111.8399999999997, "text": " not even from the marketing perspective, but + more like from the perspective of, hey, how do you", "tokens": [50368, 406, 754, + 490, 264, 6370, 4585, 11, 457, 544, 411, 490, 264, 4585, 295, 11, 4177, 11, 577, + 360, 291, 50672], "temperature": 0.0, "avg_logprob": -0.15636871542249406, "compression_ratio": + 1.7026022304832713, "no_speech_prob": 0.01789936050772667}, {"id": 508, "seek": + 310568, "start": 3111.8399999999997, "end": 3116.48, "text": " get things done with + this? Yeah. 
Like all these basic questions answered.", "tokens": [50672, 483, 721, + 1096, 365, 341, 30, 865, 13, 1743, 439, 613, 3875, 1651, 10103, 13, 50904], "temperature": + 0.0, "avg_logprob": -0.15636871542249406, "compression_ratio": 1.7026022304832713, + "no_speech_prob": 0.01789936050772667}, {"id": 509, "seek": 310568, "start": 3117.2, + "end": 3123.12, "text": " Yeah. I think that''s actually a very good point. I think + and there''s a way, I would say like a", "tokens": [50940, 865, 13, 286, 519, 300, + 311, 767, 257, 588, 665, 935, 13, 286, 519, 293, 456, 311, 257, 636, 11, 286, 576, + 584, 411, 257, 51236], "temperature": 0.0, "avg_logprob": -0.15636871542249406, + "compression_ratio": 1.7026022304832713, "no_speech_prob": 0.01789936050772667}, + {"id": 510, "seek": 310568, "start": 3123.12, "end": 3128.96, "text": " contrast + you see here, because the companies which are bringing the vector search in a database", + "tokens": [51236, 8712, 291, 536, 510, 11, 570, 264, 3431, 597, 366, 5062, 264, + 8062, 3164, 294, 257, 8149, 51528], "temperature": 0.0, "avg_logprob": -0.15636871542249406, + "compression_ratio": 1.7026022304832713, "no_speech_prob": 0.01789936050772667}, + {"id": 511, "seek": 310568, "start": 3128.96, "end": 3133.52, "text": " of the search + engine format, they''re new. They''re upcoming in the market. I think this is part + of", "tokens": [51528, 295, 264, 3164, 2848, 7877, 11, 436, 434, 777, 13, 814, 434, + 11500, 294, 264, 2142, 13, 286, 519, 341, 307, 644, 295, 51756], "temperature": + 0.0, "avg_logprob": -0.15636871542249406, "compression_ratio": 1.7026022304832713, + "no_speech_prob": 0.01789936050772667}, {"id": 512, "seek": 313352, "start": 3133.52, + "end": 3138.16, "text": " the marketing strategy that they have to talk about it. + They have to advertise it. 
It''s just that,", "tokens": [50364, 264, 6370, 5206, + 300, 436, 362, 281, 751, 466, 309, 13, 814, 362, 281, 35379, 309, 13, 467, 311, + 445, 300, 11, 50596], "temperature": 0.0, "avg_logprob": -0.14483001828193665, "compression_ratio": + 1.794871794871795, "no_speech_prob": 0.008761400356888771}, {"id": 513, "seek": + 313352, "start": 3138.16, "end": 3142.56, "text": " you know, the traditional search + engine companies or I would not say like solar is not with your", "tokens": [50596, + 291, 458, 11, 264, 5164, 3164, 2848, 3431, 420, 286, 576, 406, 584, 411, 7936, 307, + 406, 365, 428, 50816], "temperature": 0.0, "avg_logprob": -0.14483001828193665, + "compression_ratio": 1.794871794871795, "no_speech_prob": 0.008761400356888771}, + {"id": 514, "seek": 313352, "start": 3142.56, "end": 3148.96, "text": " company. + Elastic probably is. But then because they are already, you know, like very popular + people", "tokens": [50816, 2237, 13, 2699, 2750, 1391, 307, 13, 583, 550, 570, 436, + 366, 1217, 11, 291, 458, 11, 411, 588, 3743, 561, 51136], "temperature": 0.0, "avg_logprob": + -0.14483001828193665, "compression_ratio": 1.794871794871795, "no_speech_prob": + 0.008761400356888771}, {"id": 515, "seek": 313352, "start": 3148.96, "end": 3154.64, + "text": " are using it, they don''t need that mass, you know, publicity. 
So to say, + but I think we need to", "tokens": [51136, 366, 1228, 309, 11, 436, 500, 380, 643, + 300, 2758, 11, 291, 458, 11, 37264, 13, 407, 281, 584, 11, 457, 286, 519, 321, 643, + 281, 51420], "temperature": 0.0, "avg_logprob": -0.14483001828193665, "compression_ratio": + 1.794871794871795, "no_speech_prob": 0.008761400356888771}, {"id": 516, "seek": + 313352, "start": 3154.64, "end": 3161.52, "text": " talk about it that, okay, I + think if that''s a trend, like we do it too, but we don''t talk about it", "tokens": + [51420, 751, 466, 309, 300, 11, 1392, 11, 286, 519, 498, 300, 311, 257, 6028, 11, + 411, 321, 360, 309, 886, 11, 457, 321, 500, 380, 751, 466, 309, 51764], "temperature": + 0.0, "avg_logprob": -0.14483001828193665, "compression_ratio": 1.794871794871795, + "no_speech_prob": 0.008761400356888771}, {"id": 517, "seek": 316152, "start": 3161.6, + "end": 3168.48, "text": " that much as much as we should be doing. And I think that''s + actually an interesting point,", "tokens": [50368, 300, 709, 382, 709, 382, 321, + 820, 312, 884, 13, 400, 286, 519, 300, 311, 767, 364, 1880, 935, 11, 50712], "temperature": + 0.0, "avg_logprob": -0.12229512751787558, "compression_ratio": 1.6828193832599119, + "no_speech_prob": 0.0038236642722040415}, {"id": 518, "seek": 316152, "start": 3168.48, + "end": 3174.16, "text": " which means that we should talk about course more, because + I think that basically exemplifies as", "tokens": [50712, 597, 1355, 300, 321, 820, + 751, 466, 1164, 544, 11, 570, 286, 519, 300, 1936, 24112, 11221, 382, 50996], "temperature": + 0.0, "avg_logprob": -0.12229512751787558, "compression_ratio": 1.6828193832599119, + "no_speech_prob": 0.0038236642722040415}, {"id": 519, "seek": 316152, "start": 3174.16, + "end": 3179.6, "text": " to how easily can this be done with your search engine. 
+ So you don''t have to divorce your existing", "tokens": [50996, 281, 577, 3612, + 393, 341, 312, 1096, 365, 428, 3164, 2848, 13, 407, 291, 500, 380, 362, 281, 16052, + 428, 6741, 51268], "temperature": 0.0, "avg_logprob": -0.12229512751787558, "compression_ratio": + 1.6828193832599119, "no_speech_prob": 0.0038236642722040415}, {"id": 520, "seek": + 316152, "start": 3179.6, "end": 3186.56, "text": " search engine to use some cool + technology, unless you really have a case where you are starting", "tokens": [51268, + 3164, 2848, 281, 764, 512, 1627, 2899, 11, 5969, 291, 534, 362, 257, 1389, 689, + 291, 366, 2891, 51616], "temperature": 0.0, "avg_logprob": -0.12229512751787558, + "compression_ratio": 1.6828193832599119, "no_speech_prob": 0.0038236642722040415}, + {"id": 521, "seek": 318656, "start": 3186.64, "end": 3191.92, "text": " right up + from the scratch. I think you can consider using one of these. Otherwise, I think + if you''re", "tokens": [50368, 558, 493, 490, 264, 8459, 13, 286, 519, 291, 393, + 1949, 1228, 472, 295, 613, 13, 10328, 11, 286, 519, 498, 291, 434, 50632], "temperature": + 0.0, "avg_logprob": -0.14279097726900283, "compression_ratio": 1.5151515151515151, + "no_speech_prob": 0.02175074629485607}, {"id": 522, "seek": 318656, "start": 3191.92, + "end": 3197.36, "text": " using something already, which has grown over the period + of time, it doesn''t make sense to throw", "tokens": [50632, 1228, 746, 1217, 11, + 597, 575, 7709, 670, 264, 2896, 295, 565, 11, 309, 1177, 380, 652, 2020, 281, 3507, + 50904], "temperature": 0.0, "avg_logprob": -0.14279097726900283, "compression_ratio": + 1.5151515151515151, "no_speech_prob": 0.02175074629485607}, {"id": 523, "seek": + 318656, "start": 3197.36, "end": 3205.84, "text": " everything out of the window + just yet. Absolutely. 
And with your Lucine mindset, what have you seen in", "tokens": + [50904, 1203, 484, 295, 264, 4910, 445, 1939, 13, 7021, 13, 400, 365, 428, 9593, + 533, 12543, 11, 437, 362, 291, 1612, 294, 51328], "temperature": 0.0, "avg_logprob": + -0.14279097726900283, "compression_ratio": 1.5151515151515151, "no_speech_prob": + 0.02175074629485607}, {"id": 524, "seek": 320584, "start": 3205.84, "end": 3216.4, + "text": " Vespa that looked attractive for that? Yeah, I think I have been quite + a Vespa fan girl, I would say", "tokens": [50364, 691, 279, 4306, 300, 2956, 12609, + 337, 300, 30, 865, 11, 286, 519, 286, 362, 668, 1596, 257, 691, 279, 4306, 3429, + 2013, 11, 286, 576, 584, 50892], "temperature": 0.0, "avg_logprob": -0.21336880852194393, + "compression_ratio": 1.7981220657276995, "no_speech_prob": 0.1404028683900833}, + {"id": 525, "seek": 320584, "start": 3217.04, "end": 3222.48, "text": " more for + the reasons that the content, the kind of content that Vespa, he generates. I think,", + "tokens": [50924, 544, 337, 264, 4112, 300, 264, 2701, 11, 264, 733, 295, 2701, + 300, 691, 279, 4306, 11, 415, 23815, 13, 286, 519, 11, 51196], "temperature": 0.0, + "avg_logprob": -0.21336880852194393, "compression_ratio": 1.7981220657276995, "no_speech_prob": + 0.1404028683900833}, {"id": 526, "seek": 320584, "start": 3223.36, "end": 3227.1200000000003, + "text": " you know, you would need, when you''re talking about features, when you''re + talking about, you know,", "tokens": [51240, 291, 458, 11, 291, 576, 643, 11, 562, + 291, 434, 1417, 466, 4122, 11, 562, 291, 434, 1417, 466, 11, 291, 458, 11, 51428], + "temperature": 0.0, "avg_logprob": -0.21336880852194393, "compression_ratio": 1.7981220657276995, + "no_speech_prob": 0.1404028683900833}, {"id": 527, "seek": 320584, "start": 3227.1200000000003, + "end": 3233.36, "text": " search engine or, you know, different kind of like what + can be enabled with this feature,", "tokens": [51428, 3164, 2848, 420, 11, 291, + 458, 11, 819, 
733, 295, 411, 437, 393, 312, 15172, 365, 341, 4111, 11, 51740], "temperature": + 0.0, "avg_logprob": -0.21336880852194393, "compression_ratio": 1.7981220657276995, + "no_speech_prob": 0.1404028683900833}, {"id": 528, "seek": 323336, "start": 3233.36, + "end": 3238.4, "text": " you all, you know, think about like, okay, will it perform + like how much of queer response time", "tokens": [50364, 291, 439, 11, 291, 458, + 11, 519, 466, 411, 11, 1392, 11, 486, 309, 2042, 411, 577, 709, 295, 20323, 4134, + 565, 50616], "temperature": 0.0, "avg_logprob": -0.1633374086066858, "compression_ratio": + 1.7728937728937728, "no_speech_prob": 0.006744744721800089}, {"id": 529, "seek": + 323336, "start": 3238.4, "end": 3243.6, "text": " am I looking at? What is the data + set I''m looking at? And when you look at like the Vespa''s content,", "tokens": + [50616, 669, 286, 1237, 412, 30, 708, 307, 264, 1412, 992, 286, 478, 1237, 412, + 30, 400, 562, 291, 574, 412, 411, 264, 691, 279, 4306, 311, 2701, 11, 50876], "temperature": + 0.0, "avg_logprob": -0.1633374086066858, "compression_ratio": 1.7728937728937728, + "no_speech_prob": 0.006744744721800089}, {"id": 530, "seek": 323336, "start": 3243.6, + "end": 3249.36, "text": " I mean, you don''t have to look any further, like everything + is summarized so well. 
I think this", "tokens": [50876, 286, 914, 11, 291, 500, + 380, 362, 281, 574, 604, 3052, 11, 411, 1203, 307, 14611, 1602, 370, 731, 13, 286, + 519, 341, 51164], "temperature": 0.0, "avg_logprob": -0.1633374086066858, "compression_ratio": + 1.7728937728937728, "no_speech_prob": 0.006744744721800089}, {"id": 531, "seek": + 323336, "start": 3249.36, "end": 3257.76, "text": " one thing I''m trying to, you + know, add to my writing style that I assess everything,", "tokens": [51164, 472, + 551, 286, 478, 1382, 281, 11, 291, 458, 11, 909, 281, 452, 3579, 3758, 300, 286, + 5877, 1203, 11, 51584], "temperature": 0.0, "avg_logprob": -0.1633374086066858, + "compression_ratio": 1.7728937728937728, "no_speech_prob": 0.006744744721800089}, + {"id": 532, "seek": 323336, "start": 3257.76, "end": 3262.6400000000003, "text": + " well, that I can say it out loud to the public, to the word that, you know, this + is how it performs. And I", "tokens": [51584, 731, 11, 300, 286, 393, 584, 309, + 484, 6588, 281, 264, 1908, 11, 281, 264, 1349, 300, 11, 291, 458, 11, 341, 307, + 577, 309, 26213, 13, 400, 286, 51828], "temperature": 0.0, "avg_logprob": -0.1633374086066858, + "compression_ratio": 1.7728937728937728, "no_speech_prob": 0.006744744721800089}, + {"id": 533, "seek": 326264, "start": 3262.64, "end": 3269.2, "text": " think the + very knowledgeable folks, I mean, especially Joe, I think I have been super impressed + with how", "tokens": [50364, 519, 264, 588, 33800, 4024, 11, 286, 914, 11, 2318, + 6807, 11, 286, 519, 286, 362, 668, 1687, 11679, 365, 577, 50692], "temperature": + 0.0, "avg_logprob": -0.11364034217173659, "compression_ratio": 1.7607142857142857, + "no_speech_prob": 0.007545668166130781}, {"id": 534, "seek": 326264, "start": 3269.2, + "end": 3274.48, "text": " he describes stuff. 
And I think some of the things have + really like blown my mind out as well,", "tokens": [50692, 415, 15626, 1507, 13, + 400, 286, 519, 512, 295, 264, 721, 362, 534, 411, 16479, 452, 1575, 484, 382, 731, + 11, 50956], "temperature": 0.0, "avg_logprob": -0.11364034217173659, "compression_ratio": + 1.7607142857142857, "no_speech_prob": 0.007545668166130781}, {"id": 535, "seek": + 326264, "start": 3274.48, "end": 3280.56, "text": " like, oh, this could be done + in this way as well. I think that''s that''s one of the things. And while", "tokens": + [50956, 411, 11, 1954, 11, 341, 727, 312, 1096, 294, 341, 636, 382, 731, 13, 286, + 519, 300, 311, 300, 311, 472, 295, 264, 721, 13, 400, 1339, 51260], "temperature": + 0.0, "avg_logprob": -0.11364034217173659, "compression_ratio": 1.7607142857142857, + "no_speech_prob": 0.007545668166130781}, {"id": 536, "seek": 326264, "start": 3280.56, + "end": 3287.04, "text": " developing this presentation last year, I think I bugged + him a lot. But if he was, he was always,", "tokens": [51260, 6416, 341, 5860, 1036, + 1064, 11, 286, 519, 286, 7426, 3004, 796, 257, 688, 13, 583, 498, 415, 390, 11, + 415, 390, 1009, 11, 51584], "temperature": 0.0, "avg_logprob": -0.11364034217173659, + "compression_ratio": 1.7607142857142857, "no_speech_prob": 0.007545668166130781}, + {"id": 537, "seek": 326264, "start": 3287.04, "end": 3292.16, "text": " always super + responsive, even his team, there were some, you know, UI things that I found out,", + "tokens": [51584, 1009, 1687, 21826, 11, 754, 702, 1469, 11, 456, 645, 512, 11, + 291, 458, 11, 15682, 721, 300, 286, 1352, 484, 11, 51840], "temperature": 0.0, "avg_logprob": + -0.11364034217173659, "compression_ratio": 1.7607142857142857, "no_speech_prob": + 0.007545668166130781}, {"id": 538, "seek": 329216, "start": 3292.16, "end": 3298.3999999999996, + "text": " like we''re not working as expected. 
And I think they''re always, you + know, very modest and, you know,", "tokens": [50364, 411, 321, 434, 406, 1364, 382, + 5176, 13, 400, 286, 519, 436, 434, 1009, 11, 291, 458, 11, 588, 25403, 293, 11, + 291, 458, 11, 50676], "temperature": 0.0, "avg_logprob": -0.1315971314907074, "compression_ratio": + 1.8377358490566038, "no_speech_prob": 0.0027782367542386055}, {"id": 539, "seek": + 329216, "start": 3298.3999999999996, "end": 3304.16, "text": " acknowledging that, + okay, this is something that will work on. And this is how it works. So they''re", + "tokens": [50676, 30904, 300, 11, 1392, 11, 341, 307, 746, 300, 486, 589, 322, 13, + 400, 341, 307, 577, 309, 1985, 13, 407, 436, 434, 50964], "temperature": 0.0, "avg_logprob": + -0.1315971314907074, "compression_ratio": 1.8377358490566038, "no_speech_prob": + 0.0027782367542386055}, {"id": 540, "seek": 329216, "start": 3304.16, "end": 3308.3999999999996, + "text": " always there somehow, like, I don''t know how big the team is, they''re + always, you know, some", "tokens": [50964, 1009, 456, 6063, 11, 411, 11, 286, 500, + 380, 458, 577, 955, 264, 1469, 307, 11, 436, 434, 1009, 11, 291, 458, 11, 512, 51176], + "temperature": 0.0, "avg_logprob": -0.1315971314907074, "compression_ratio": 1.8377358490566038, + "no_speech_prob": 0.0027782367542386055}, {"id": 541, "seek": 329216, "start": 3309.2799999999997, + "end": 3315.2, "text": " familiar faces who are always responding to your messages, + but it''s nice. 
One of the other things", "tokens": [51220, 4963, 8475, 567, 366, + 1009, 16670, 281, 428, 7897, 11, 457, 309, 311, 1481, 13, 1485, 295, 264, 661, 721, + 51516], "temperature": 0.0, "avg_logprob": -0.1315971314907074, "compression_ratio": + 1.8377358490566038, "no_speech_prob": 0.0027782367542386055}, {"id": 542, "seek": + 329216, "start": 3315.2, "end": 3320.3999999999996, "text": " that I would like + to point out is also that, I mean, distinct or I would say like a nice thing,", + "tokens": [51516, 300, 286, 576, 411, 281, 935, 484, 307, 611, 300, 11, 286, 914, + 11, 10644, 420, 286, 576, 584, 411, 257, 1481, 551, 11, 51776], "temperature": 0.0, + "avg_logprob": -0.1315971314907074, "compression_ratio": 1.8377358490566038, "no_speech_prob": + 0.0027782367542386055}, {"id": 543, "seek": 332040, "start": 3320.48, "end": 3325.6, + "text": " is that updates is one thing that, you know, you would struggle with in + a search engine.", "tokens": [50368, 307, 300, 9205, 307, 472, 551, 300, 11, 291, + 458, 11, 291, 576, 7799, 365, 294, 257, 3164, 2848, 13, 50624], "temperature": 0.0, + "avg_logprob": -0.1517350556420498, "compression_ratio": 1.7794117647058822, "no_speech_prob": + 0.009086442179977894}, {"id": 544, "seek": 332040, "start": 3325.6, "end": 3330.48, + "text": " If you have big catalog and you''re expecting, especially like in e-commerce, + like, updates come in,", "tokens": [50624, 759, 291, 362, 955, 19746, 293, 291, + 434, 9650, 11, 2318, 411, 294, 308, 12, 26926, 11, 411, 11, 9205, 808, 294, 11, + 50868], "temperature": 0.0, "avg_logprob": -0.1517350556420498, "compression_ratio": + 1.7794117647058822, "no_speech_prob": 0.009086442179977894}, {"id": 545, "seek": + 332040, "start": 3330.48, "end": 3334.7200000000003, "text": " the company that + I used to work for before, we used to have several updates and we used to club", + "tokens": [50868, 264, 2237, 300, 286, 1143, 281, 589, 337, 949, 11, 321, 1143, + 281, 362, 2940, 9205, 293, 321, 1143, 281, 6482, 
51080], "temperature": 0.0, "avg_logprob": + -0.1517350556420498, "compression_ratio": 1.7794117647058822, "no_speech_prob": + 0.009086442179977894}, {"id": 546, "seek": 332040, "start": 3334.7200000000003, + "end": 3342.08, "text": " them out, kind of bashing process. And we would process + them just as like together, because we didn''t", "tokens": [51080, 552, 484, 11, + 733, 295, 987, 571, 1399, 13, 400, 321, 576, 1399, 552, 445, 382, 411, 1214, 11, + 570, 321, 994, 380, 51448], "temperature": 0.0, "avg_logprob": -0.1517350556420498, + "compression_ratio": 1.7794117647058822, "no_speech_prob": 0.009086442179977894}, + {"id": 547, "seek": 332040, "start": 3342.08, "end": 3348.2400000000002, "text": + " really have resources to process them, like one by one or just as how they come. + I think Westphah", "tokens": [51448, 534, 362, 3593, 281, 1399, 552, 11, 411, 472, + 538, 472, 420, 445, 382, 577, 436, 808, 13, 286, 519, 4055, 950, 545, 51756], "temperature": + 0.0, "avg_logprob": -0.1517350556420498, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.009086442179977894}, {"id": 548, "seek": 334824, "start": 3348.24, + "end": 3353.2799999999997, "text": " really does that, like through updates, through + atomic updates is something, through partial updates,", "tokens": [50364, 534, 775, + 300, 11, 411, 807, 9205, 11, 807, 22275, 9205, 307, 746, 11, 807, 14641, 9205, 11, + 50616], "temperature": 0.0, "avg_logprob": -0.18496579195545837, "compression_ratio": + 2.0125, "no_speech_prob": 0.026107536628842354}, {"id": 549, "seek": 334824, "start": + 3353.2799999999997, "end": 3359.2, "text": " sorry, is what they do. And I think + this is something really, really cool. 
And I think that just", "tokens": [50616, + 2597, 11, 307, 437, 436, 360, 13, 400, 286, 519, 341, 307, 746, 534, 11, 534, 1627, + 13, 400, 286, 519, 300, 445, 50912], "temperature": 0.0, "avg_logprob": -0.18496579195545837, + "compression_ratio": 2.0125, "no_speech_prob": 0.026107536628842354}, {"id": 550, + "seek": 334824, "start": 3359.2, "end": 3363.52, "text": " takes away that need, + that you need to rein next everything and sometimes, you know, people", "tokens": + [50912, 2516, 1314, 300, 643, 11, 300, 291, 643, 281, 6561, 958, 1203, 293, 2171, + 11, 291, 458, 11, 561, 51128], "temperature": 0.0, "avg_logprob": -0.18496579195545837, + "compression_ratio": 2.0125, "no_speech_prob": 0.026107536628842354}, {"id": 551, + "seek": 334824, "start": 3363.52, "end": 3369.52, "text": " complain that I have + a really big catalog and it takes like six hours. If I rein next everything,", "tokens": + [51128, 11024, 300, 286, 362, 257, 534, 955, 19746, 293, 309, 2516, 411, 2309, 2496, + 13, 759, 286, 6561, 958, 1203, 11, 51428], "temperature": 0.0, "avg_logprob": -0.18496579195545837, + "compression_ratio": 2.0125, "no_speech_prob": 0.026107536628842354}, {"id": 552, + "seek": 334824, "start": 3369.52, "end": 3374.72, "text": " I think that''s something + they clearly stand out. I think when they say when they claim that we", "tokens": + [51428, 286, 519, 300, 311, 746, 436, 4448, 1463, 484, 13, 286, 519, 562, 436, 584, + 562, 436, 3932, 300, 321, 51688], "temperature": 0.0, "avg_logprob": -0.18496579195545837, + "compression_ratio": 2.0125, "no_speech_prob": 0.026107536628842354}, {"id": 553, + "seek": 337472, "start": 3375.2, "end": 3380.72, "text": " are searching for big + data, I think they really get it done. Yeah. 
And I think it was also proven in", + "tokens": [50388, 366, 10808, 337, 955, 1412, 11, 286, 519, 436, 534, 483, 309, + 1096, 13, 865, 13, 400, 286, 519, 309, 390, 611, 12785, 294, 50664], "temperature": + 0.0, "avg_logprob": -0.19768675247041306, "compression_ratio": 1.6936170212765957, + "no_speech_prob": 0.021658197045326233}, {"id": 554, "seek": 337472, "start": 3380.72, + "end": 3387.3599999999997, "text": " the context of Yahoo systems, right? Some of + the life scavenants. Yeah. I mean, they''re always the", "tokens": [50664, 264, + 4319, 295, 41757, 3652, 11, 558, 30, 2188, 295, 264, 993, 4216, 553, 1719, 13, 865, + 13, 286, 914, 11, 436, 434, 1009, 264, 50996], "temperature": 0.0, "avg_logprob": + -0.19768675247041306, "compression_ratio": 1.6936170212765957, "no_speech_prob": + 0.021658197045326233}, {"id": 555, "seek": 337472, "start": 3387.3599999999997, + "end": 3392.7999999999997, "text": " early adopters and I think the way they write + about this stuff and how they implement, I think they", "tokens": [50996, 2440, + 22486, 1559, 293, 286, 519, 264, 636, 436, 2464, 466, 341, 1507, 293, 577, 436, + 4445, 11, 286, 519, 436, 51268], "temperature": 0.0, "avg_logprob": -0.19768675247041306, + "compression_ratio": 1.6936170212765957, "no_speech_prob": 0.021658197045326233}, + {"id": 556, "seek": 337472, "start": 3392.7999999999997, "end": 3399.6, "text": + " correlate, you know, cover the topic when they''re talking about it or implementing + it. And I think", "tokens": [51268, 48742, 11, 291, 458, 11, 2060, 264, 4829, 562, + 436, 434, 1417, 466, 309, 420, 18114, 309, 13, 400, 286, 519, 51608], "temperature": + 0.0, "avg_logprob": -0.19768675247041306, "compression_ratio": 1.6936170212765957, + "no_speech_prob": 0.021658197045326233}, {"id": 557, "seek": 339960, "start": 3399.68, + "end": 3405.7599999999998, "text": " that''s what I really like about them. Yeah. 
+ So like if you would recommend someone who starts", "tokens": [50368, 300, 311, + 437, 286, 534, 411, 466, 552, 13, 865, 13, 407, 411, 498, 291, 576, 2748, 1580, + 567, 3719, 50672], "temperature": 0.0, "avg_logprob": -0.11981833197853782, "compression_ratio": + 1.672340425531915, "no_speech_prob": 0.012026501819491386}, {"id": 558, "seek": + 339960, "start": 3405.7599999999998, "end": 3412.88, "text": " from scratch, would + you recommend Westphah? I think I can. I think I can. I think, you know, one of + the", "tokens": [50672, 490, 8459, 11, 576, 291, 2748, 4055, 950, 545, 30, 286, + 519, 286, 393, 13, 286, 519, 286, 393, 13, 286, 519, 11, 291, 458, 11, 472, 295, + 264, 51028], "temperature": 0.0, "avg_logprob": -0.11981833197853782, "compression_ratio": + 1.672340425531915, "no_speech_prob": 0.012026501819491386}, {"id": 559, "seek": + 339960, "start": 3412.88, "end": 3418.64, "text": " things, maybe I''m old fashioned + or I don''t know, like this, this is not affiliated way. I mean, no", "tokens": + [51028, 721, 11, 1310, 286, 478, 1331, 40646, 420, 286, 500, 380, 458, 11, 411, + 341, 11, 341, 307, 406, 42174, 636, 13, 286, 914, 11, 572, 51316], "temperature": + 0.0, "avg_logprob": -0.11981833197853782, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.012026501819491386}, {"id": 560, "seek": 339960, "start": 3418.64, + "end": 3427.2799999999997, "text": " one has really paid me to say this. But more + of like, I feel like it''s not as fragile as so many", "tokens": [51316, 472, 575, + 534, 4835, 385, 281, 584, 341, 13, 583, 544, 295, 411, 11, 286, 841, 411, 309, 311, + 406, 382, 23847, 382, 370, 867, 51748], "temperature": 0.0, "avg_logprob": -0.11981833197853782, + "compression_ratio": 1.672340425531915, "no_speech_prob": 0.012026501819491386}, + {"id": 561, "seek": 342728, "start": 3427.28, "end": 3431.76, "text": " systems + that are coming up recently. I mean, obviously we have more sourcing power. 
We are", + "tokens": [50364, 3652, 300, 366, 1348, 493, 3938, 13, 286, 914, 11, 2745, 321, + 362, 544, 11006, 2175, 1347, 13, 492, 366, 50588], "temperature": 0.0, "avg_logprob": + -0.1479584533389252, "compression_ratio": 1.6768558951965065, "no_speech_prob": + 0.012698985636234283}, {"id": 562, "seek": 342728, "start": 3432.32, "end": 3438.2400000000002, + "text": " ways stronger infrastructure wise. But I think it is as solid as, you + know, so do our elastic is.", "tokens": [50616, 2098, 7249, 6896, 10829, 13, 583, + 286, 519, 309, 307, 382, 5100, 382, 11, 291, 458, 11, 370, 360, 527, 17115, 307, + 13, 50912], "temperature": 0.0, "avg_logprob": -0.1479584533389252, "compression_ratio": + 1.6768558951965065, "no_speech_prob": 0.012698985636234283}, {"id": 563, "seek": + 342728, "start": 3438.2400000000002, "end": 3446.48, "text": " I think the kind + of trust I have in them. And also like as, you know, clearly catching up to the + trend", "tokens": [50912, 286, 519, 264, 733, 295, 3361, 286, 362, 294, 552, 13, + 400, 611, 411, 382, 11, 291, 458, 11, 4448, 16124, 493, 281, 264, 6028, 51324], + "temperature": 0.0, "avg_logprob": -0.1479584533389252, "compression_ratio": 1.6768558951965065, + "no_speech_prob": 0.012698985636234283}, {"id": 564, "seek": 342728, "start": 3446.48, + "end": 3453.0400000000004, "text": " and, you know, evolving soon is also what they + have. So I think that''s that''s really kind of", "tokens": [51324, 293, 11, 291, + 458, 11, 21085, 2321, 307, 611, 437, 436, 362, 13, 407, 286, 519, 300, 311, 300, + 311, 534, 733, 295, 51652], "temperature": 0.0, "avg_logprob": -0.1479584533389252, + "compression_ratio": 1.6768558951965065, "no_speech_prob": 0.012698985636234283}, + {"id": 565, "seek": 345304, "start": 3453.12, "end": 3460.72, "text": " something + remarkable. And like to think, I mean, it''s a huge pallet of systems. 
And it''s + not like one", "tokens": [50368, 746, 12802, 13, 400, 411, 281, 519, 11, 286, 914, + 11, 309, 311, 257, 2603, 24075, 302, 295, 3652, 13, 400, 309, 311, 406, 411, 472, + 50748], "temperature": 0.0, "avg_logprob": -0.30099921226501464, "compression_ratio": + 1.4846938775510203, "no_speech_prob": 0.012932395562529564}, {"id": 566, "seek": + 345304, "start": 3460.72, "end": 3466.88, "text": " of one is the only winner here. + Like it would think it would think about, you know, Luzin itself,", "tokens": [50748, + 295, 472, 307, 264, 787, 8507, 510, 13, 1743, 309, 576, 519, 309, 576, 519, 466, + 11, 291, 458, 11, 441, 3334, 259, 2564, 11, 51056], "temperature": 0.0, "avg_logprob": + -0.30099921226501464, "compression_ratio": 1.4846938775510203, "no_speech_prob": + 0.012932395562529564}, {"id": 567, "seek": 345304, "start": 3468.32, "end": 3476.08, + "text": " which has been developed for like how many years 20? Yeah. Or more actually. + It has so many", "tokens": [51128, 597, 575, 668, 4743, 337, 411, 577, 867, 924, + 945, 30, 865, 13, 1610, 544, 767, 13, 467, 575, 370, 867, 51516], "temperature": + 0.0, "avg_logprob": -0.30099921226501464, "compression_ratio": 1.4846938775510203, + "no_speech_prob": 0.012932395562529564}, {"id": 568, "seek": 347608, "start": 3477.04, + "end": 3483.2799999999997, "text": " human languages. Natural languages support + it that you cannot find probably in West", "tokens": [50412, 1952, 8650, 13, 20137, + 8650, 1406, 309, 300, 291, 2644, 915, 1391, 294, 4055, 50724], "temperature": 0.0, + "avg_logprob": -0.22690791079872533, "compression_ratio": 1.563265306122449, "no_speech_prob": + 0.01213217992335558}, {"id": 569, "seek": 347608, "start": 3483.2799999999997, "end": + 3488.3199999999997, "text": " Boa or other systems. But again, it all depends on + your market where you''re going. 
If it''s English", "tokens": [50724, 3286, 64, + 420, 661, 3652, 13, 583, 797, 11, 309, 439, 5946, 322, 428, 2142, 689, 291, 434, + 516, 13, 759, 309, 311, 3669, 50976], "temperature": 0.0, "avg_logprob": -0.22690791079872533, + "compression_ratio": 1.563265306122449, "no_speech_prob": 0.01213217992335558}, + {"id": 570, "seek": 347608, "start": 3488.3199999999997, "end": 3495.2799999999997, + "text": " speaking, probably you''ll be fine. But like if it''s like Japanese or, + you know, some of the interesting", "tokens": [50976, 4124, 11, 1391, 291, 603, + 312, 2489, 13, 583, 411, 498, 309, 311, 411, 5433, 420, 11, 291, 458, 11, 512, 295, + 264, 1880, 51324], "temperature": 0.0, "avg_logprob": -0.22690791079872533, "compression_ratio": + 1.563265306122449, "no_speech_prob": 0.01213217992335558}, {"id": 571, "seek": 347608, + "start": 3495.2799999999997, "end": 3502.08, "text": " tokenizers that have been + contributed to Luzin, I probably still stand out. Yeah, I think that''s", "tokens": + [51324, 14862, 22525, 300, 362, 668, 18434, 281, 441, 3334, 259, 11, 286, 1391, + 920, 1463, 484, 13, 865, 11, 286, 519, 300, 311, 51664], "temperature": 0.0, "avg_logprob": + -0.22690791079872533, "compression_ratio": 1.563265306122449, "no_speech_prob": + 0.01213217992335558}, {"id": 572, "seek": 350208, "start": 3502.88, "end": 3508.0, + "text": " one good point that how much of control or where exactly am I coming from? + I think a lot depends on", "tokens": [50404, 472, 665, 935, 300, 577, 709, 295, + 1969, 420, 689, 2293, 669, 286, 1348, 490, 30, 286, 519, 257, 688, 5946, 322, 50660], + "temperature": 0.0, "avg_logprob": -0.19078363156786152, "compression_ratio": 1.5655737704918034, + "no_speech_prob": 0.008014625869691372}, {"id": 573, "seek": 350208, "start": 3508.0, + "end": 3516.24, "text": " the context too. That is, that is right. I mean, I would. + Yeah. 
And like switching gears a bit.", "tokens": [50660, 264, 4319, 886, 13, 663, + 307, 11, 300, 307, 558, 13, 286, 914, 11, 286, 576, 13, 865, 13, 400, 411, 16493, + 20915, 257, 857, 13, 51072], "temperature": 0.0, "avg_logprob": -0.19078363156786152, + "compression_ratio": 1.5655737704918034, "no_speech_prob": 0.008014625869691372}, + {"id": 574, "seek": 350208, "start": 3517.12, "end": 3523.36, "text": " So we did + touch on this topic. But like, so your progression and the profession has been from,", + "tokens": [51116, 407, 321, 630, 2557, 322, 341, 4829, 13, 583, 411, 11, 370, 428, + 18733, 293, 264, 7032, 575, 668, 490, 11, 51428], "temperature": 0.0, "avg_logprob": + -0.19078363156786152, "compression_ratio": 1.5655737704918034, "no_speech_prob": + 0.008014625869691372}, {"id": 575, "seek": 350208, "start": 3523.92, "end": 3530.3199999999997, + "text": " you know, what sounds to me like, I don''t know, it''s the super tough + competition to go from", "tokens": [51456, 291, 458, 11, 437, 3263, 281, 385, 411, + 11, 286, 500, 380, 458, 11, 309, 311, 264, 1687, 4930, 6211, 281, 352, 490, 51776], + "temperature": 0.0, "avg_logprob": -0.19078363156786152, "compression_ratio": 1.5655737704918034, + "no_speech_prob": 0.008014625869691372}, {"id": 576, "seek": 353032, "start": 3531.1200000000003, + "end": 3539.28, "text": " to for 100,000 people to 100 something like that. It''s + just insane. It just feels like, you know,", "tokens": [50404, 281, 337, 2319, 11, + 1360, 561, 281, 2319, 746, 411, 300, 13, 467, 311, 445, 10838, 13, 467, 445, 3417, + 411, 11, 291, 458, 11, 50812], "temperature": 0.0, "avg_logprob": -0.20359127930920534, + "compression_ratio": 1.5877551020408163, "no_speech_prob": 0.013856948353350163}, + {"id": 577, "seek": 353032, "start": 3539.28, "end": 3545.52, "text": " a journey + full of challenge. 
But then like on top of these, there could be other challenges + that,", "tokens": [50812, 257, 4671, 1577, 295, 3430, 13, 583, 550, 411, 322, 1192, + 295, 613, 11, 456, 727, 312, 661, 4759, 300, 11, 51124], "temperature": 0.0, "avg_logprob": + -0.20359127930920534, "compression_ratio": 1.5877551020408163, "no_speech_prob": + 0.013856948353350163}, {"id": 578, "seek": 353032, "start": 3545.52, "end": 3549.6000000000004, + "text": " I don''t know, like gender inequality or whatever is happening in the + world today, right? In the", "tokens": [51124, 286, 500, 380, 458, 11, 411, 7898, + 16970, 420, 2035, 307, 2737, 294, 264, 1002, 965, 11, 558, 30, 682, 264, 51328], + "temperature": 0.0, "avg_logprob": -0.20359127930920534, "compression_ratio": 1.5877551020408163, + "no_speech_prob": 0.013856948353350163}, {"id": 579, "seek": 353032, "start": 3549.6000000000004, + "end": 3559.84, "text": " profession. And this was one of the topics that really + stood out on Haystack in Berlin last year,", "tokens": [51328, 7032, 13, 400, 341, + 390, 472, 295, 264, 8378, 300, 534, 9371, 484, 322, 8721, 372, 501, 294, 13848, + 1036, 1064, 11, 51840], "temperature": 0.0, "avg_logprob": -0.20359127930920534, + "compression_ratio": 1.5877551020408163, "no_speech_prob": 0.013856948353350163}, + {"id": 580, "seek": 355984, "start": 3559.84, "end": 3565.2000000000003, "text": + " in September, right? Where you ran the session, women in search, if I remember + correctly, the", "tokens": [50364, 294, 7216, 11, 558, 30, 2305, 291, 5872, 264, + 5481, 11, 2266, 294, 3164, 11, 498, 286, 1604, 8944, 11, 264, 50632], "temperature": + 0.0, "avg_logprob": -0.1963681920369466, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.0023764614015817642}, {"id": 581, "seek": 355984, "start": 3565.2000000000003, + "end": 3574.1600000000003, "text": " title. And you had some women in search invited + on stage. 
And some of them have sent, but here,", "tokens": [50632, 4876, 13, 400, + 291, 632, 512, 2266, 294, 3164, 9185, 322, 3233, 13, 400, 512, 295, 552, 362, 2279, + 11, 457, 510, 11, 51080], "temperature": 0.0, "avg_logprob": -0.1963681920369466, + "compression_ratio": 1.5454545454545454, "no_speech_prob": 0.0023764614015817642}, + {"id": 582, "seek": 355984, "start": 3574.1600000000003, "end": 3583.6000000000004, + "text": " recorded, um, little presentations. I mean, this was very like emotional. + It was, it was a learning", "tokens": [51080, 8287, 11, 1105, 11, 707, 18964, 13, + 286, 914, 11, 341, 390, 588, 411, 6863, 13, 467, 390, 11, 309, 390, 257, 2539, 51552], + "temperature": 0.0, "avg_logprob": -0.1963681920369466, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.0023764614015817642}, {"id": 583, "seek": 358360, "start": 3583.6, + "end": 3590.3199999999997, "text": " experience for many. The crowd was speechless + in some sense, uploading, of course.", "tokens": [50364, 1752, 337, 867, 13, 440, + 6919, 390, 48450, 294, 512, 2020, 11, 27301, 11, 295, 1164, 13, 50700], "temperature": + 0.0, "avg_logprob": -0.14773925011899292, "compression_ratio": 1.7686567164179106, + "no_speech_prob": 0.015759911388158798}, {"id": 584, "seek": 358360, "start": 3591.2, + "end": 3595.2, "text": " What was going through your mind when you were preparing + this session? How did you come up with", "tokens": [50744, 708, 390, 516, 807, 428, + 1575, 562, 291, 645, 10075, 341, 5481, 30, 1012, 630, 291, 808, 493, 365, 50944], + "temperature": 0.0, "avg_logprob": -0.14773925011899292, "compression_ratio": 1.7686567164179106, + "no_speech_prob": 0.015759911388158798}, {"id": 585, "seek": 358360, "start": 3595.2, + "end": 3602.48, "text": " this idea? 
I think, I mean, I would also acknowledge like, + when people raise hands after this", "tokens": [50944, 341, 1558, 30, 286, 519, + 11, 286, 914, 11, 286, 576, 611, 10692, 411, 11, 562, 561, 5300, 2377, 934, 341, + 51308], "temperature": 0.0, "avg_logprob": -0.14773925011899292, "compression_ratio": + 1.7686567164179106, "no_speech_prob": 0.015759911388158798}, {"id": 586, "seek": + 358360, "start": 3602.48, "end": 3607.7599999999998, "text": " session, I was expecting + like, oh my god, what kind of questions? Like, people would start asking me,", "tokens": + [51308, 5481, 11, 286, 390, 9650, 411, 11, 1954, 452, 3044, 11, 437, 733, 295, 1651, + 30, 1743, 11, 561, 576, 722, 3365, 385, 11, 51572], "temperature": 0.0, "avg_logprob": + -0.14773925011899292, "compression_ratio": 1.7686567164179106, "no_speech_prob": + 0.015759911388158798}, {"id": 587, "seek": 358360, "start": 3607.7599999999998, + "end": 3612.0, "text": " like, you know, how did you come up with this? Like, how + did you come to this figure and stuff? And", "tokens": [51572, 411, 11, 291, 458, + 11, 577, 630, 291, 808, 493, 365, 341, 30, 1743, 11, 577, 630, 291, 808, 281, 341, + 2573, 293, 1507, 30, 400, 51784], "temperature": 0.0, "avg_logprob": -0.14773925011899292, + "compression_ratio": 1.7686567164179106, "no_speech_prob": 0.015759911388158798}, + {"id": 588, "seek": 361200, "start": 3612.0, "end": 3619.68, "text": " they would + ask me questions about what I presented. 
But it was absolutely so hot warming to + see that", "tokens": [50364, 436, 576, 1029, 385, 1651, 466, 437, 286, 8212, 13, + 583, 309, 390, 3122, 370, 2368, 17983, 281, 536, 300, 50748], "temperature": 0.0, + "avg_logprob": -0.1243658978888329, "compression_ratio": 1.6, "no_speech_prob": + 0.008551513776183128}, {"id": 589, "seek": 361200, "start": 3619.68, "end": 3626.48, + "text": " each one of the people who raised hands were to appreciate and tell me + how they felt about this", "tokens": [50748, 1184, 472, 295, 264, 561, 567, 6005, + 2377, 645, 281, 4449, 293, 980, 385, 577, 436, 2762, 466, 341, 51088], "temperature": + 0.0, "avg_logprob": -0.1243658978888329, "compression_ratio": 1.6, "no_speech_prob": + 0.008551513776183128}, {"id": 590, "seek": 361200, "start": 3626.48, "end": 3632.4, + "text": " session and not really like putting me on the spot. And I think one of + the other things that I", "tokens": [51088, 5481, 293, 406, 534, 411, 3372, 385, + 322, 264, 4008, 13, 400, 286, 519, 472, 295, 264, 661, 721, 300, 286, 51384], "temperature": + 0.0, "avg_logprob": -0.1243658978888329, "compression_ratio": 1.6, "no_speech_prob": + 0.008551513776183128}, {"id": 591, "seek": 361200, "start": 3632.4, "end": 3639.2, + "text": " wanted to achieve, because we''ve had the first session of women in search + in Haystack, US last year.", "tokens": [51384, 1415, 281, 4584, 11, 570, 321, 600, + 632, 264, 700, 5481, 295, 2266, 294, 3164, 294, 8721, 372, 501, 11, 2546, 1036, + 1064, 13, 51724], "temperature": 0.0, "avg_logprob": -0.1243658978888329, "compression_ratio": + 1.6, "no_speech_prob": 0.008551513776183128}, {"id": 592, "seek": 363920, "start": + 3639.2, "end": 3644.72, "text": " So Haystack is just from the corner while we''re + talking about this or might have happened when we", "tokens": [50364, 407, 8721, + 372, 501, 307, 445, 490, 264, 4538, 1339, 321, 434, 1417, 466, 341, 420, 1062, 362, + 2011, 562, 321, 50640], "temperature": 0.0, "avg_logprob": 
-0.11773257101735761, + "compression_ratio": 1.6058091286307055, "no_speech_prob": 0.0018224960658699274}, + {"id": 593, "seek": 363920, "start": 3644.72, "end": 3652.56, "text": " roll this + out. But yeah, I think Audrey presented the first women in search session in US. + And it was", "tokens": [50640, 3373, 341, 484, 13, 583, 1338, 11, 286, 519, 31808, + 8212, 264, 700, 2266, 294, 3164, 5481, 294, 2546, 13, 400, 309, 390, 51032], "temperature": + 0.0, "avg_logprob": -0.11773257101735761, "compression_ratio": 1.6058091286307055, + "no_speech_prob": 0.0018224960658699274}, {"id": 594, "seek": 363920, "start": 3652.56, + "end": 3659.9199999999996, "text": " a panel discussion as to what women really + expect in the company or what kind of qualities or", "tokens": [51032, 257, 4831, + 5017, 382, 281, 437, 2266, 534, 2066, 294, 264, 2237, 420, 437, 733, 295, 16477, + 420, 51400], "temperature": 0.0, "avg_logprob": -0.11773257101735761, "compression_ratio": + 1.6058091286307055, "no_speech_prob": 0.0018224960658699274}, {"id": 595, "seek": + 363920, "start": 3659.9199999999996, "end": 3664.3199999999997, "text": " what kind + of features they, you know, stand out for women when they decide to join a company.", + "tokens": [51400, 437, 733, 295, 4122, 436, 11, 291, 458, 11, 1463, 484, 337, 2266, + 562, 436, 4536, 281, 3917, 257, 2237, 13, 51620], "temperature": 0.0, "avg_logprob": + -0.11773257101735761, "compression_ratio": 1.6058091286307055, "no_speech_prob": + 0.0018224960658699274}, {"id": 596, "seek": 366432, "start": 3665.04, "end": 3670.96, + "text": " And it was our long conversation. I think there were different kind of + feedback. 
Some people", "tokens": [50400, 400, 309, 390, 527, 938, 3761, 13, 286, + 519, 456, 645, 819, 733, 295, 5824, 13, 2188, 561, 50696], "temperature": 0.0, "avg_logprob": + -0.12627704856321983, "compression_ratio": 1.72, "no_speech_prob": 0.023725125938653946}, + {"id": 597, "seek": 366432, "start": 3670.96, "end": 3675.6000000000004, "text": + " enjoyed it. Some people said, like, oh my god, it was kind of like too much to + sit in one place and", "tokens": [50696, 4626, 309, 13, 2188, 561, 848, 11, 411, + 11, 1954, 452, 3044, 11, 309, 390, 733, 295, 411, 886, 709, 281, 1394, 294, 472, + 1081, 293, 50928], "temperature": 0.0, "avg_logprob": -0.12627704856321983, "compression_ratio": + 1.72, "no_speech_prob": 0.023725125938653946}, {"id": 598, "seek": 366432, "start": + 3675.6000000000004, "end": 3681.52, "text": " listen to like white women talking. + So I think the idea came from the point that I would not, you know,", "tokens": + [50928, 2140, 281, 411, 2418, 2266, 1417, 13, 407, 286, 519, 264, 1558, 1361, 490, + 264, 935, 300, 286, 576, 406, 11, 291, 458, 11, 51224], "temperature": 0.0, "avg_logprob": + -0.12627704856321983, "compression_ratio": 1.72, "no_speech_prob": 0.023725125938653946}, + {"id": 599, "seek": 366432, "start": 3681.52, "end": 3688.0, "text": " have like + a panel discussion. It has to be something different. And it has to be something", + "tokens": [51224, 362, 411, 257, 4831, 5017, 13, 467, 575, 281, 312, 746, 819, 13, + 400, 309, 575, 281, 312, 746, 51548], "temperature": 0.0, "avg_logprob": -0.12627704856321983, + "compression_ratio": 1.72, "no_speech_prob": 0.023725125938653946}, {"id": 600, + "seek": 368800, "start": 3688.0, "end": 3693.36, "text": " solid. It has to be something + that people can relate to. 
And it has to be something that is", "tokens": [50364, + 5100, 13, 467, 575, 281, 312, 746, 300, 561, 393, 10961, 281, 13, 400, 309, 575, + 281, 312, 746, 300, 307, 50632], "temperature": 0.0, "avg_logprob": -0.10493412338385061, + "compression_ratio": 1.7977941176470589, "no_speech_prob": 0.019587725400924683}, + {"id": 601, "seek": 368800, "start": 3693.36, "end": 3698.56, "text": " contributed + by several women. So I cannot bring everyone up on the stage of course, but I wanted + to", "tokens": [50632, 18434, 538, 2940, 2266, 13, 407, 286, 2644, 1565, 1518, 493, + 322, 264, 3233, 295, 1164, 11, 457, 286, 1415, 281, 50892], "temperature": 0.0, + "avg_logprob": -0.10493412338385061, "compression_ratio": 1.7977941176470589, "no_speech_prob": + 0.019587725400924683}, {"id": 602, "seek": 368800, "start": 3698.56, "end": 3704.88, + "text": " make sure like I have as many women as possible somehow to talk about + themselves because each time I", "tokens": [50892, 652, 988, 411, 286, 362, 382, + 867, 2266, 382, 1944, 6063, 281, 751, 466, 2969, 570, 1184, 565, 286, 51208], "temperature": + 0.0, "avg_logprob": -0.10493412338385061, "compression_ratio": 1.7977941176470589, + "no_speech_prob": 0.019587725400924683}, {"id": 603, "seek": 368800, "start": 3704.88, + "end": 3712.16, "text": " speak to, you know, fellow women in the search of crowd, + I feel like they, they very much, you know,", "tokens": [51208, 1710, 281, 11, 291, + 458, 11, 7177, 2266, 294, 264, 3164, 295, 6919, 11, 286, 841, 411, 436, 11, 436, + 588, 709, 11, 291, 458, 11, 51572], "temperature": 0.0, "avg_logprob": -0.10493412338385061, + "compression_ratio": 1.7977941176470589, "no_speech_prob": 0.019587725400924683}, + {"id": 604, "seek": 368800, "start": 3712.16, "end": 3717.44, "text": " underplay + what contributions they make. 
And I think this is something I keep telling, you + know,", "tokens": [51572, 833, 2858, 437, 15725, 436, 652, 13, 400, 286, 519, 341, + 307, 746, 286, 1066, 3585, 11, 291, 458, 11, 51836], "temperature": 0.0, "avg_logprob": + -0.10493412338385061, "compression_ratio": 1.7977941176470589, "no_speech_prob": + 0.019587725400924683}, {"id": 605, "seek": 371744, "start": 3717.44, "end": 3722.16, + "text": " people that I meet that, you know, what you''re doing is amazing. It''s + just that if the other person", "tokens": [50364, 561, 300, 286, 1677, 300, 11, + 291, 458, 11, 437, 291, 434, 884, 307, 2243, 13, 467, 311, 445, 300, 498, 264, 661, + 954, 50600], "temperature": 0.0, "avg_logprob": -0.09088776462761931, "compression_ratio": + 1.9308943089430894, "no_speech_prob": 0.0030685572419315577}, {"id": 606, "seek": + 371744, "start": 3722.16, "end": 3727.12, "text": " failing to see what you''ve + done, it''s not that you''ve done, you''ve not done anything less.", "tokens": [50600, + 18223, 281, 536, 437, 291, 600, 1096, 11, 309, 311, 406, 300, 291, 600, 1096, 11, + 291, 600, 406, 1096, 1340, 1570, 13, 50848], "temperature": 0.0, "avg_logprob": + -0.09088776462761931, "compression_ratio": 1.9308943089430894, "no_speech_prob": + 0.0030685572419315577}, {"id": 607, "seek": 371744, "start": 3727.84, "end": 3733.04, + "text": " And I think this is basically the message that I wanted to kind of, you + know, spread across", "tokens": [50884, 400, 286, 519, 341, 307, 1936, 264, 3636, + 300, 286, 1415, 281, 733, 295, 11, 291, 458, 11, 3974, 2108, 51144], "temperature": + 0.0, "avg_logprob": -0.09088776462761931, "compression_ratio": 1.9308943089430894, + "no_speech_prob": 0.0030685572419315577}, {"id": 608, "seek": 371744, "start": 3733.04, + "end": 3738.4, "text": " that, you know, if you''re not seen enough, I mean, maybe + we just need to gather, you know, like", "tokens": [51144, 300, 11, 291, 458, 11, + 498, 291, 434, 406, 1612, 1547, 11, 286, 914, 11, 1310, 321, 445, 
643, 281, 5448, + 11, 291, 458, 11, 411, 51412], "temperature": 0.0, "avg_logprob": -0.09088776462761931, + "compression_ratio": 1.9308943089430894, "no_speech_prob": 0.0030685572419315577}, + {"id": 609, "seek": 371744, "start": 3739.04, "end": 3744.48, "text": " we as women + and we would be there to support each other. We would be there to kind of, you know,", + "tokens": [51444, 321, 382, 2266, 293, 321, 576, 312, 456, 281, 1406, 1184, 661, + 13, 492, 576, 312, 456, 281, 733, 295, 11, 291, 458, 11, 51716], "temperature": + 0.0, "avg_logprob": -0.09088776462761931, "compression_ratio": 1.9308943089430894, + "no_speech_prob": 0.0030685572419315577}, {"id": 610, "seek": 374448, "start": 3744.48, + "end": 3748.72, "text": " advocate for each other. We would be there to mentor and + collaborate with each other.", "tokens": [50364, 14608, 337, 1184, 661, 13, 492, + 576, 312, 456, 281, 14478, 293, 18338, 365, 1184, 661, 13, 50576], "temperature": + 0.0, "avg_logprob": -0.11359937729374055, "compression_ratio": 1.7875457875457876, + "no_speech_prob": 0.005558434873819351}, {"id": 611, "seek": 374448, "start": 3748.72, + "end": 3754.32, "text": " If, and I also said this in the session, that one thing + that always, you know, like surprised me", "tokens": [50576, 759, 11, 293, 286, + 611, 848, 341, 294, 264, 5481, 11, 300, 472, 551, 300, 1009, 11, 291, 458, 11, 411, + 6100, 385, 50856], "temperature": 0.0, "avg_logprob": -0.11359937729374055, "compression_ratio": + 1.7875457875457876, "no_speech_prob": 0.005558434873819351}, {"id": 612, "seek": + 374448, "start": 3754.32, "end": 3760.56, "text": " is that men form group to kind + of groom themselves to develop themselves. 
Women just don''t do that.", "tokens": + [50856, 307, 300, 1706, 1254, 1594, 281, 733, 295, 22198, 2969, 281, 1499, 2969, + 13, 11065, 445, 500, 380, 360, 300, 13, 51168], "temperature": 0.0, "avg_logprob": + -0.11359937729374055, "compression_ratio": 1.7875457875457876, "no_speech_prob": + 0.005558434873819351}, {"id": 613, "seek": 374448, "start": 3760.56, "end": 3766.16, + "text": " I don''t know why women are often, you know, like in their heads, they''re + still competing with each other", "tokens": [51168, 286, 500, 380, 458, 983, 2266, + 366, 2049, 11, 291, 458, 11, 411, 294, 641, 8050, 11, 436, 434, 920, 15439, 365, + 1184, 661, 51448], "temperature": 0.0, "avg_logprob": -0.11359937729374055, "compression_ratio": + 1.7875457875457876, "no_speech_prob": 0.005558434873819351}, {"id": 614, "seek": + 374448, "start": 3766.16, "end": 3772.2400000000002, "text": " because it''s like, + okay, only one could be misuniverse. Only, you know, one of you can be succeeded.", + "tokens": [51448, 570, 309, 311, 411, 11, 1392, 11, 787, 472, 727, 312, 3346, 409, + 5376, 13, 5686, 11, 291, 458, 11, 472, 295, 291, 393, 312, 20263, 13, 51752], "temperature": + 0.0, "avg_logprob": -0.11359937729374055, "compression_ratio": 1.7875457875457876, + "no_speech_prob": 0.005558434873819351}, {"id": 615, "seek": 377224, "start": 3772.3199999999997, + "end": 3776.3199999999997, "text": " There''s always like, you know, one best thing + and that basically, you know, triggers this", "tokens": [50368, 821, 311, 1009, + 411, 11, 291, 458, 11, 472, 1151, 551, 293, 300, 1936, 11, 291, 458, 11, 22827, + 341, 50568], "temperature": 0.0, "avg_logprob": -0.11926333739025759, "compression_ratio": + 1.8104265402843602, "no_speech_prob": 0.010484054684638977}, {"id": 616, "seek": + 377224, "start": 3776.3199999999997, "end": 3783.7599999999998, "text": " competitive + nature in us. And I think that needs to really go away. 
Like, and people do it, + people", "tokens": [50568, 10043, 3687, 294, 505, 13, 400, 286, 519, 300, 2203, + 281, 534, 352, 1314, 13, 1743, 11, 293, 561, 360, 309, 11, 561, 50940], "temperature": + 0.0, "avg_logprob": -0.11926333739025759, "compression_ratio": 1.8104265402843602, + "no_speech_prob": 0.010484054684638977}, {"id": 617, "seek": 377224, "start": 3783.7599999999998, + "end": 3789.3599999999997, "text": " make us compete. And I think this is something + that somehow, you know, I''m trying to, you know,", "tokens": [50940, 652, 505, + 11831, 13, 400, 286, 519, 341, 307, 746, 300, 6063, 11, 291, 458, 11, 286, 478, + 1382, 281, 11, 291, 458, 11, 51220], "temperature": 0.0, "avg_logprob": -0.11926333739025759, + "compression_ratio": 1.8104265402843602, "no_speech_prob": 0.010484054684638977}, + {"id": 618, "seek": 377224, "start": 3790.08, "end": 3796.3199999999997, "text": + " like spread across that there''s no point competing. Let us, you know, use each + other''s, you know,", "tokens": [51256, 411, 3974, 2108, 300, 456, 311, 572, 935, + 15439, 13, 961, 505, 11, 291, 458, 11, 764, 1184, 661, 311, 11, 291, 458, 11, 51568], + "temperature": 0.0, "avg_logprob": -0.11926333739025759, "compression_ratio": 1.8104265402843602, + "no_speech_prob": 0.010484054684638977}, {"id": 619, "seek": 379632, "start": 3796.32, + "end": 3800.96, "text": " strength to become, you know, one solid strength so that, + you know, we can really, like,", "tokens": [50364, 3800, 281, 1813, 11, 291, 458, + 11, 472, 5100, 3800, 370, 300, 11, 291, 458, 11, 321, 393, 534, 11, 411, 11, 50596], + "temperature": 0.0, "avg_logprob": -0.14218286948628944, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.08456988632678986}, {"id": 620, "seek": 379632, "start": 3801.6800000000003, + "end": 3807.44, "text": " become a better version of ourselves. So we''re not competing + with each other. 
We are, you know,", "tokens": [50632, 1813, 257, 1101, 3037, 295, + 4175, 13, 407, 321, 434, 406, 15439, 365, 1184, 661, 13, 492, 366, 11, 291, 458, + 11, 50920], "temperature": 0.0, "avg_logprob": -0.14218286948628944, "compression_ratio": + 1.6919642857142858, "no_speech_prob": 0.08456988632678986}, {"id": 621, "seek": + 379632, "start": 3807.44, "end": 3816.32, "text": " we have to act as one. So that''s, + I hope that that message kind of spreads across. And I hate when", "tokens": [50920, + 321, 362, 281, 605, 382, 472, 13, 407, 300, 311, 11, 286, 1454, 300, 300, 3636, + 733, 295, 25728, 2108, 13, 400, 286, 4700, 562, 51364], "temperature": 0.0, "avg_logprob": + -0.14218286948628944, "compression_ratio": 1.6919642857142858, "no_speech_prob": + 0.08456988632678986}, {"id": 622, "seek": 379632, "start": 3816.32, "end": 3822.88, + "text": " companies say is like, you know, we have rolled out this position. And + we''re expecting like 33%", "tokens": [51364, 3431, 584, 307, 411, 11, 291, 458, + 11, 321, 362, 14306, 484, 341, 2535, 13, 400, 321, 434, 9650, 411, 11816, 4, 51692], + "temperature": 0.0, "avg_logprob": -0.14218286948628944, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.08456988632678986}, {"id": 623, "seek": 382288, "start": 3822.88, + "end": 3827.6, "text": " applications from women. Like, we never do that for men. + Like, we never say out the numbers that,", "tokens": [50364, 5821, 490, 2266, 13, + 1743, 11, 321, 1128, 360, 300, 337, 1706, 13, 1743, 11, 321, 1128, 584, 484, 264, + 3547, 300, 11, 50600], "temperature": 0.0, "avg_logprob": -0.11256140073140462, + "compression_ratio": 1.9716599190283401, "no_speech_prob": 0.22399087250232697}, + {"id": 624, "seek": 382288, "start": 3827.6, "end": 3833.76, "text": " you know, + like these much of our, you know, applications came from men. 
Like, why we have + to always,", "tokens": [50600, 291, 458, 11, 411, 613, 709, 295, 527, 11, 291, 458, + 11, 5821, 1361, 490, 1706, 13, 1743, 11, 983, 321, 362, 281, 1009, 11, 50908], "temperature": + 0.0, "avg_logprob": -0.11256140073140462, "compression_ratio": 1.9716599190283401, + "no_speech_prob": 0.22399087250232697}, {"id": 625, "seek": 382288, "start": 3833.76, + "end": 3838.96, "text": " you know, explicitly talk about this number. Like, we + have reservations. We don''t need reservations.", "tokens": [50908, 291, 458, 11, + 20803, 751, 466, 341, 1230, 13, 1743, 11, 321, 362, 40222, 13, 492, 500, 380, 643, + 40222, 13, 51168], "temperature": 0.0, "avg_logprob": -0.11256140073140462, "compression_ratio": + 1.9716599190283401, "no_speech_prob": 0.22399087250232697}, {"id": 626, "seek": + 382288, "start": 3839.52, "end": 3844.08, "text": " We need, like, we need to be + equally considered. And I think there have been like several", "tokens": [51196, + 492, 643, 11, 411, 11, 321, 643, 281, 312, 12309, 4888, 13, 400, 286, 519, 456, + 362, 668, 411, 2940, 51424], "temperature": 0.0, "avg_logprob": -0.11256140073140462, + "compression_ratio": 1.9716599190283401, "no_speech_prob": 0.22399087250232697}, + {"id": 627, "seek": 382288, "start": 3845.12, "end": 3850.7200000000003, "text": + " sessions. And I think when you talked about the preparation, I think I literally + soaked into that", "tokens": [51476, 11081, 13, 400, 286, 519, 562, 291, 2825, 466, + 264, 13081, 11, 286, 519, 286, 3736, 27368, 666, 300, 51756], "temperature": 0.0, + "avg_logprob": -0.11256140073140462, "compression_ratio": 1.9716599190283401, "no_speech_prob": + 0.22399087250232697}, {"id": 628, "seek": 385072, "start": 3850.72, "end": 3856.3999999999996, + "text": " moment, you know, became one of like the activists myself. 
I was attending + so many different kind of", "tokens": [50364, 1623, 11, 291, 458, 11, 3062, 472, + 295, 411, 264, 23042, 2059, 13, 286, 390, 15862, 370, 867, 819, 733, 295, 50648], + "temperature": 0.0, "avg_logprob": -0.142848277914113, "compression_ratio": 1.6482758620689655, + "no_speech_prob": 0.008944659493863583}, {"id": 629, "seek": 385072, "start": 3856.3999999999996, + "end": 3861.2, "text": " webinars and like the in-person sessions. I don''t know + how many tons of like groups that I", "tokens": [50648, 26065, 293, 411, 264, 294, + 12, 10813, 11081, 13, 286, 500, 380, 458, 577, 867, 9131, 295, 411, 3935, 300, 286, + 50888], "temperature": 0.0, "avg_logprob": -0.142848277914113, "compression_ratio": + 1.6482758620689655, "no_speech_prob": 0.008944659493863583}, {"id": 630, "seek": + 385072, "start": 3861.2, "end": 3866.8799999999997, "text": " join afterwards to + listen to like what people really talk about. And I think this is like way", "tokens": + [50888, 3917, 10543, 281, 2140, 281, 411, 437, 561, 534, 751, 466, 13, 400, 286, + 519, 341, 307, 411, 636, 51172], "temperature": 0.0, "avg_logprob": -0.142848277914113, + "compression_ratio": 1.6482758620689655, "no_speech_prob": 0.008944659493863583}, + {"id": 631, "seek": 385072, "start": 3867.6, "end": 3873.8399999999997, "text": + " specific or critical to me because I want to make sure like I deliver my 100% + when I''m doing", "tokens": [51208, 2685, 420, 4924, 281, 385, 570, 286, 528, 281, + 652, 988, 411, 286, 4239, 452, 2319, 4, 562, 286, 478, 884, 51520], "temperature": + 0.0, "avg_logprob": -0.142848277914113, "compression_ratio": 1.6482758620689655, + "no_speech_prob": 0.008944659493863583}, {"id": 632, "seek": 385072, "start": 3873.8399999999997, + "end": 3879.52, "text": " something even though if it is non-technical. 
So although + I was like very skeptical about like if I", "tokens": [51520, 746, 754, 1673, 498, + 309, 307, 2107, 12, 29113, 804, 13, 407, 4878, 286, 390, 411, 588, 28601, 466, 411, + 498, 286, 51804], "temperature": 0.0, "avg_logprob": -0.142848277914113, "compression_ratio": + 1.6482758620689655, "no_speech_prob": 0.008944659493863583}, {"id": 633, "seek": + 387952, "start": 3879.52, "end": 3883.68, "text": " should do something non-technical + because I think in my head I was still replaying that I don''t want", "tokens": + [50364, 820, 360, 746, 2107, 12, 29113, 804, 570, 286, 519, 294, 452, 1378, 286, + 390, 920, 23836, 278, 300, 286, 500, 380, 528, 50572], "temperature": 0.0, "avg_logprob": + -0.14551137659671534, "compression_ratio": 1.783882783882784, "no_speech_prob": + 0.007647112011909485}, {"id": 634, "seek": 387952, "start": 3883.68, "end": 3888.4, + "text": " to be type-casted as like, oh, maybe she''s not that technical. That''s + why she''s talking about,", "tokens": [50572, 281, 312, 2010, 12, 66, 34440, 382, + 411, 11, 1954, 11, 1310, 750, 311, 406, 300, 6191, 13, 663, 311, 983, 750, 311, + 1417, 466, 11, 50808], "temperature": 0.0, "avg_logprob": -0.14551137659671534, + "compression_ratio": 1.783882783882784, "no_speech_prob": 0.007647112011909485}, + {"id": 635, "seek": 387952, "start": 3888.4, "end": 3894.48, "text": " you know, + non-text stuff. So it was kind of like I was confused like if I should do that. + I mean,", "tokens": [50808, 291, 458, 11, 2107, 12, 25111, 1507, 13, 407, 309, 390, + 733, 295, 411, 286, 390, 9019, 411, 498, 286, 820, 360, 300, 13, 286, 914, 11, 51112], + "temperature": 0.0, "avg_logprob": -0.14551137659671534, "compression_ratio": 1.783882783882784, + "no_speech_prob": 0.007647112011909485}, {"id": 636, "seek": 387952, "start": 3894.48, + "end": 3901.04, "text": " I don''t want to be type-casted. 
But in the end I was + very happy, very surprised, happy, surprised", "tokens": [51112, 286, 500, 380, + 528, 281, 312, 2010, 12, 66, 34440, 13, 583, 294, 264, 917, 286, 390, 588, 2055, + 11, 588, 6100, 11, 2055, 11, 6100, 51440], "temperature": 0.0, "avg_logprob": -0.14551137659671534, + "compression_ratio": 1.783882783882784, "no_speech_prob": 0.007647112011909485}, + {"id": 637, "seek": 387952, "start": 3901.04, "end": 3907.28, "text": " that people + took it well. People took it the way I expected them to take it. And a lot of women", + "tokens": [51440, 300, 561, 1890, 309, 731, 13, 3432, 1890, 309, 264, 636, 286, + 5176, 552, 281, 747, 309, 13, 400, 257, 688, 295, 2266, 51752], "temperature": 0.0, + "avg_logprob": -0.14551137659671534, "compression_ratio": 1.783882783882784, "no_speech_prob": + 0.007647112011909485}, {"id": 638, "seek": 390728, "start": 3907.28, "end": 3911.92, + "text": " reach out to me as well. A lot of companies reach out to me as well. And + it''s surprising that", "tokens": [50364, 2524, 484, 281, 385, 382, 731, 13, 316, + 688, 295, 3431, 2524, 484, 281, 385, 382, 731, 13, 400, 309, 311, 8830, 300, 50596], + "temperature": 0.0, "avg_logprob": -0.1158824602762858, "compression_ratio": 1.8185185185185184, + "no_speech_prob": 0.023273682221770287}, {"id": 639, "seek": 390728, "start": 3911.92, + "end": 3916.8, "text": " how many people want to collaborate and you know they tag + me on several, you know, LinkedIn posts", "tokens": [50596, 577, 867, 561, 528, + 281, 18338, 293, 291, 458, 436, 6162, 385, 322, 2940, 11, 291, 458, 11, 20657, 12300, + 50840], "temperature": 0.0, "avg_logprob": -0.1158824602762858, "compression_ratio": + 1.8185185185185184, "no_speech_prob": 0.023273682221770287}, {"id": 640, "seek": + 390728, "start": 3916.8, "end": 3922.1600000000003, "text": " as well when people + make big claims. 
And that gives me an opportunity to speak to different companies", + "tokens": [50840, 382, 731, 562, 561, 652, 955, 9441, 13, 400, 300, 2709, 385, 364, + 2650, 281, 1710, 281, 819, 3431, 51108], "temperature": 0.0, "avg_logprob": -0.1158824602762858, + "compression_ratio": 1.8185185185185184, "no_speech_prob": 0.023273682221770287}, + {"id": 641, "seek": 390728, "start": 3922.1600000000003, "end": 3927.28, "text": + " as to, you know, what are they doing? And what is it that they want to do? And + sometimes people expect", "tokens": [51108, 382, 281, 11, 291, 458, 11, 437, 366, + 436, 884, 30, 400, 437, 307, 309, 300, 436, 528, 281, 360, 30, 400, 2171, 561, 2066, + 51364], "temperature": 0.0, "avg_logprob": -0.1158824602762858, "compression_ratio": + 1.8185185185185184, "no_speech_prob": 0.023273682221770287}, {"id": 642, "seek": + 390728, "start": 3927.28, "end": 3931.84, "text": " that you know, I would be coaching + like how can they add diversity, how they can bring in more", "tokens": [51364, + 300, 291, 458, 11, 286, 576, 312, 15818, 411, 577, 393, 436, 909, 8811, 11, 577, + 436, 393, 1565, 294, 544, 51592], "temperature": 0.0, "avg_logprob": -0.1158824602762858, + "compression_ratio": 1.8185185185185184, "no_speech_prob": 0.023273682221770287}, + {"id": 643, "seek": 393184, "start": 3931.92, "end": 3937.1200000000003, "text": + " women in the team. And I''m like, okay, I can help you with your, you know, search + application.", "tokens": [50368, 2266, 294, 264, 1469, 13, 400, 286, 478, 411, 11, + 1392, 11, 286, 393, 854, 291, 365, 428, 11, 291, 458, 11, 3164, 3861, 13, 50628], + "temperature": 0.0, "avg_logprob": -0.15840547061660915, "compression_ratio": 1.5254901960784313, + "no_speech_prob": 0.01553185936063528}, {"id": 644, "seek": 393184, "start": 3937.1200000000003, + "end": 3942.88, "text": " Not maybe this is not probably what I master. But it''s + nice. 
I think how many people kind of", "tokens": [50628, 1726, 1310, 341, 307, + 406, 1391, 437, 286, 4505, 13, 583, 309, 311, 1481, 13, 286, 519, 577, 867, 561, + 733, 295, 50916], "temperature": 0.0, "avg_logprob": -0.15840547061660915, "compression_ratio": + 1.5254901960784313, "no_speech_prob": 0.01553185936063528}, {"id": 645, "seek": + 393184, "start": 3942.88, "end": 3948.7200000000003, "text": " want to be involved + with this venture. There''s this company that I recently spoke to. I would not name", + "tokens": [50916, 528, 281, 312, 3288, 365, 341, 18474, 13, 821, 311, 341, 2237, + 300, 286, 3938, 7179, 281, 13, 286, 576, 406, 1315, 51208], "temperature": 0.0, + "avg_logprob": -0.15840547061660915, "compression_ratio": 1.5254901960784313, "no_speech_prob": + 0.01553185936063528}, {"id": 646, "seek": 393184, "start": 3948.7200000000003, "end": + 3956.08, "text": " them. I would present them maybe in Haystack EU who have got + so convinced. And even the CEO of the", "tokens": [51208, 552, 13, 286, 576, 1974, + 552, 1310, 294, 8721, 372, 501, 10887, 567, 362, 658, 370, 12561, 13, 400, 754, + 264, 9282, 295, 264, 51576], "temperature": 0.0, "avg_logprob": -0.15840547061660915, + "compression_ratio": 1.5254901960784313, "no_speech_prob": 0.01553185936063528}, + {"id": 647, "seek": 395608, "start": 3956.08, "end": 3962.0, "text": " company got + so convinced that they introduced like supporting women and having women in presence", + "tokens": [50364, 2237, 658, 370, 12561, 300, 436, 7268, 411, 7231, 2266, 293, 1419, + 2266, 294, 6814, 50660], "temperature": 0.0, "avg_logprob": -0.10892707210476116, + "compression_ratio": 1.7553956834532374, "no_speech_prob": 0.005058001726865768}, + {"id": 648, "seek": 395608, "start": 3962.96, "end": 3967.7599999999998, "text": + " in their company as like one of the pillars pillars of the company. 
Like this + is what we are", "tokens": [50708, 294, 641, 2237, 382, 411, 472, 295, 264, 26729, + 26729, 295, 264, 2237, 13, 1743, 341, 307, 437, 321, 366, 50948], "temperature": + 0.0, "avg_logprob": -0.10892707210476116, "compression_ratio": 1.7553956834532374, + "no_speech_prob": 0.005058001726865768}, {"id": 649, "seek": 395608, "start": 3967.7599999999998, + "end": 3972.7999999999997, "text": " defending. This is what we are going to be + talking about. So which was very hot warming. People sent", "tokens": [50948, 21377, + 13, 639, 307, 437, 321, 366, 516, 281, 312, 1417, 466, 13, 407, 597, 390, 588, 2368, + 17983, 13, 3432, 2279, 51200], "temperature": 0.0, "avg_logprob": -0.10892707210476116, + "compression_ratio": 1.7553956834532374, "no_speech_prob": 0.005058001726865768}, + {"id": 650, "seek": 395608, "start": 3972.7999999999997, "end": 3978.24, "text": + " me messages like, Oh, after your session, you know, we added like two or three + women to our team.", "tokens": [51200, 385, 7897, 411, 11, 876, 11, 934, 428, 5481, + 11, 291, 458, 11, 321, 3869, 411, 732, 420, 1045, 2266, 281, 527, 1469, 13, 51472], + "temperature": 0.0, "avg_logprob": -0.10892707210476116, "compression_ratio": 1.7553956834532374, + "no_speech_prob": 0.005058001726865768}, {"id": 651, "seek": 395608, "start": 3978.24, + "end": 3983.04, "text": " I think in next six months time, we would let you know + how this goes. So it''s very warming to know,", "tokens": [51472, 286, 519, 294, + 958, 2309, 2493, 565, 11, 321, 576, 718, 291, 458, 577, 341, 1709, 13, 407, 309, + 311, 588, 17983, 281, 458, 11, 51712], "temperature": 0.0, "avg_logprob": -0.10892707210476116, + "compression_ratio": 1.7553956834532374, "no_speech_prob": 0.005058001726865768}, + {"id": 652, "seek": 398304, "start": 3983.04, "end": 3990.32, "text": " like people + are trusting me, people are, you know, like reacting to it very positively. 
People", + "tokens": [50364, 411, 561, 366, 28235, 385, 11, 561, 366, 11, 291, 458, 11, 411, + 25817, 281, 309, 588, 25795, 13, 3432, 50728], "temperature": 0.0, "avg_logprob": + -0.14304685592651367, "compression_ratio": 1.7671232876712328, "no_speech_prob": + 0.012050105258822441}, {"id": 653, "seek": 398304, "start": 3990.32, "end": 3998.24, + "text": " are not, you know, typecasting me. And that''s, I mean, it feels, you + know, like I feel accountable", "tokens": [50728, 366, 406, 11, 291, 458, 11, 2010, + 48860, 385, 13, 400, 300, 311, 11, 286, 914, 11, 309, 3417, 11, 291, 458, 11, 411, + 286, 841, 18024, 51124], "temperature": 0.0, "avg_logprob": -0.14304685592651367, + "compression_ratio": 1.7671232876712328, "no_speech_prob": 0.012050105258822441}, + {"id": 654, "seek": 398304, "start": 3998.24, "end": 4004.64, "text": " for rest + of the women as well. So I feel like if I mean, I can somehow, you know, bring more", + "tokens": [51124, 337, 1472, 295, 264, 2266, 382, 731, 13, 407, 286, 841, 411, 498, + 286, 914, 11, 286, 393, 6063, 11, 291, 458, 11, 1565, 544, 51444], "temperature": + 0.0, "avg_logprob": -0.14304685592651367, "compression_ratio": 1.7671232876712328, + "no_speech_prob": 0.012050105258822441}, {"id": 655, "seek": 398304, "start": 4004.64, + "end": 4009.12, "text": " of women together and like the problem that they have, + they do not have means to network even though", "tokens": [51444, 295, 2266, 1214, + 293, 411, 264, 1154, 300, 436, 362, 11, 436, 360, 406, 362, 1355, 281, 3209, 754, + 1673, 51668], "temperature": 0.0, "avg_logprob": -0.14304685592651367, "compression_ratio": + 1.7671232876712328, "no_speech_prob": 0.012050105258822441}, {"id": 656, "seek": + 400912, "start": 4009.2799999999997, "end": 4014.88, "text": " we are so much of + advanced word. Like some way we could mentor women. 
I think if you remember, like", + "tokens": [50372, 321, 366, 370, 709, 295, 7339, 1349, 13, 1743, 512, 636, 321, + 727, 14478, 2266, 13, 286, 519, 498, 291, 1604, 11, 411, 50652], "temperature": + 0.0, "avg_logprob": -0.1357899962878618, "compression_ratio": 1.8404669260700388, + "no_speech_prob": 0.01437942124903202}, {"id": 657, "seek": 400912, "start": 4015.6, + "end": 4020.88, "text": " the women that came up on the stage, they were not speaking + up that nicely as well. I mean,", "tokens": [50688, 264, 2266, 300, 1361, 493, 322, + 264, 3233, 11, 436, 645, 406, 4124, 493, 300, 9594, 382, 731, 13, 286, 914, 11, + 50952], "temperature": 0.0, "avg_logprob": -0.1357899962878618, "compression_ratio": + 1.8404669260700388, "no_speech_prob": 0.01437942124903202}, {"id": 658, "seek": + 400912, "start": 4020.88, "end": 4025.3599999999997, "text": " they were not groomed + in the in the manner that, you know, some of them were really shaking.", "tokens": + [50952, 436, 645, 406, 22198, 292, 294, 264, 294, 264, 9060, 300, 11, 291, 458, + 11, 512, 295, 552, 645, 534, 15415, 13, 51176], "temperature": 0.0, "avg_logprob": + -0.1357899962878618, "compression_ratio": 1.8404669260700388, "no_speech_prob": + 0.01437942124903202}, {"id": 659, "seek": 400912, "start": 4025.3599999999997, "end": + 4030.4, "text": " And some of these women, you know, wanted to, they were feeling + more comfortable to speak,", "tokens": [51176, 400, 512, 295, 613, 2266, 11, 291, + 458, 11, 1415, 281, 11, 436, 645, 2633, 544, 4619, 281, 1710, 11, 51428], "temperature": + 0.0, "avg_logprob": -0.1357899962878618, "compression_ratio": 1.8404669260700388, + "no_speech_prob": 0.01437942124903202}, {"id": 660, "seek": 400912, "start": 4030.4, + "end": 4035.68, "text": " like in a recorded video in the closed room. And I think + that''s where it was coming from. 
Some of", "tokens": [51428, 411, 294, 257, 8287, + 960, 294, 264, 5395, 1808, 13, 400, 286, 519, 300, 311, 689, 309, 390, 1348, 490, + 13, 2188, 295, 51692], "temperature": 0.0, "avg_logprob": -0.1357899962878618, "compression_ratio": + 1.8404669260700388, "no_speech_prob": 0.01437942124903202}, {"id": 661, "seek": + 403568, "start": 4035.68, "end": 4041.04, "text": " them were obviously coming from + another place altogether, last minute editions as well. So", "tokens": [50364, 552, + 645, 2745, 1348, 490, 1071, 1081, 19051, 11, 1036, 3456, 44840, 382, 731, 13, 407, + 50632], "temperature": 0.0, "avg_logprob": -0.13537356567382813, "compression_ratio": + 1.7732342007434945, "no_speech_prob": 0.004257075022906065}, {"id": 662, "seek": + 403568, "start": 4041.04, "end": 4045.8399999999997, "text": " obviously they could + not manage to travel. But most of them had this fear that they were like,", "tokens": + [50632, 2745, 436, 727, 406, 3067, 281, 3147, 13, 583, 881, 295, 552, 632, 341, + 4240, 300, 436, 645, 411, 11, 50872], "temperature": 0.0, "avg_logprob": -0.13537356567382813, + "compression_ratio": 1.7732342007434945, "no_speech_prob": 0.004257075022906065}, + {"id": 663, "seek": 403568, "start": 4045.8399999999997, "end": 4049.2799999999997, + "text": " you know, we would introduce ourselves in one line. And I was like, yeah, + I mean, one line,", "tokens": [50872, 291, 458, 11, 321, 576, 5366, 4175, 294, 472, + 1622, 13, 400, 286, 390, 411, 11, 1338, 11, 286, 914, 11, 472, 1622, 11, 51044], + "temperature": 0.0, "avg_logprob": -0.13537356567382813, "compression_ratio": 1.7732342007434945, + "no_speech_prob": 0.004257075022906065}, {"id": 664, "seek": 403568, "start": 4049.2799999999997, + "end": 4054.96, "text": " two line, three line, just come up on the stage. 
Like, + let''s just, you know, feel that, you know,", "tokens": [51044, 732, 1622, 11, 1045, + 1622, 11, 445, 808, 493, 322, 264, 3233, 13, 1743, 11, 718, 311, 445, 11, 291, 458, + 11, 841, 300, 11, 291, 458, 11, 51328], "temperature": 0.0, "avg_logprob": -0.13537356567382813, + "compression_ratio": 1.7732342007434945, "no_speech_prob": 0.004257075022906065}, + {"id": 665, "seek": 403568, "start": 4054.96, "end": 4060.72, "text": " light up + the stage, like how does it really feels like? So, and if you, you were there in + the session", "tokens": [51328, 1442, 493, 264, 3233, 11, 411, 577, 775, 309, 534, + 3417, 411, 30, 407, 11, 293, 498, 291, 11, 291, 645, 456, 294, 264, 5481, 51616], + "temperature": 0.0, "avg_logprob": -0.13537356567382813, "compression_ratio": 1.7732342007434945, + "no_speech_prob": 0.004257075022906065}, {"id": 666, "seek": 406072, "start": 4060.8799999999997, + "end": 4066.64, "text": " as well, like none of the contributions they mentioned + were less in any way. It''s just that we don''t", "tokens": [50372, 382, 731, 11, + 411, 6022, 295, 264, 15725, 436, 2835, 645, 1570, 294, 604, 636, 13, 467, 311, 445, + 300, 321, 500, 380, 50660], "temperature": 0.0, "avg_logprob": -0.15934961849881202, + "compression_ratio": 1.6753246753246753, "no_speech_prob": 0.18113590776920319}, + {"id": 667, "seek": 406072, "start": 4066.64, "end": 4073.2799999999997, "text": + " realize the, that how big our contribution is. I was blown literally like, I think + I even made", "tokens": [50660, 4325, 264, 11, 300, 577, 955, 527, 13150, 307, 13, + 286, 390, 16479, 3736, 411, 11, 286, 519, 286, 754, 1027, 50992], "temperature": + 0.0, "avg_logprob": -0.15934961849881202, "compression_ratio": 1.6753246753246753, + "no_speech_prob": 0.18113590776920319}, {"id": 668, "seek": 406072, "start": 4074.24, + "end": 4080.7999999999997, "text": " the comment there. 
I don''t remember the name + of the lady, but she was saying, hey, I was just like", "tokens": [51040, 264, 2871, + 456, 13, 286, 500, 380, 1604, 264, 1315, 295, 264, 7262, 11, 457, 750, 390, 1566, + 11, 4177, 11, 286, 390, 445, 411, 51368], "temperature": 0.0, "avg_logprob": -0.15934961849881202, + "compression_ratio": 1.6753246753246753, "no_speech_prob": 0.18113590776920319}, + {"id": 669, "seek": 406072, "start": 4081.52, "end": 4087.3599999999997, "text": + " a student. And then I found an internship in one company and I started contributing + to solar", "tokens": [51404, 257, 3107, 13, 400, 550, 286, 1352, 364, 16861, 294, + 472, 2237, 293, 286, 1409, 19270, 281, 7936, 51696], "temperature": 0.0, "avg_logprob": + -0.15934961849881202, "compression_ratio": 1.6753246753246753, "no_speech_prob": + 0.18113590776920319}, {"id": 670, "seek": 408736, "start": 4087.44, "end": 4098.64, + "text": " and my code was accepted. Yeah. And she was exactly. She was still, like + doubtful of herself. Like,", "tokens": [50368, 293, 452, 3089, 390, 9035, 13, 865, + 13, 400, 750, 390, 2293, 13, 1240, 390, 920, 11, 411, 6385, 906, 295, 7530, 13, + 1743, 11, 50928], "temperature": 0.0, "avg_logprob": -0.23595818849367517, "compression_ratio": + 1.6351931330472103, "no_speech_prob": 0.016693614423274994}, {"id": 671, "seek": + 408736, "start": 4098.64, "end": 4104.64, "text": " am I going, what is this? Is + this the right thing? Yeah. 
And I was like, I remember myself,", "tokens": [50928, + 669, 286, 516, 11, 437, 307, 341, 30, 1119, 341, 264, 558, 551, 30, 865, 13, 400, + 286, 390, 411, 11, 286, 1604, 2059, 11, 51228], "temperature": 0.0, "avg_logprob": + -0.23595818849367517, "compression_ratio": 1.6351931330472103, "no_speech_prob": + 0.016693614423274994}, {"id": 672, "seek": 408736, "start": 4105.6, "end": 4111.6, + "text": " when 2010, 11, 12, something like that, I came up with some ideas, some + code, and I was kind of", "tokens": [51276, 562, 9657, 11, 2975, 11, 2272, 11, 746, + 411, 300, 11, 286, 1361, 493, 365, 512, 3487, 11, 512, 3089, 11, 293, 286, 390, + 733, 295, 51576], "temperature": 0.0, "avg_logprob": -0.23595818849367517, "compression_ratio": + 1.6351931330472103, "no_speech_prob": 0.016693614423274994}, {"id": 673, "seek": + 408736, "start": 4111.6, "end": 4116.56, "text": " thinking, what if I contributed + something to solar? And I failed. I could never like find my,", "tokens": [51576, + 1953, 11, 437, 498, 286, 18434, 746, 281, 7936, 30, 400, 286, 7612, 13, 286, 727, + 1128, 411, 915, 452, 11, 51824], "temperature": 0.0, "avg_logprob": -0.23595818849367517, + "compression_ratio": 1.6351931330472103, "no_speech_prob": 0.016693614423274994}, + {"id": 674, "seek": 411656, "start": 4116.56, "end": 4123.68, "text": " myself a + path there. And then at some point, I kind of like gave up in a way. And you know,", + "tokens": [50364, 2059, 257, 3100, 456, 13, 400, 550, 412, 512, 935, 11, 286, 733, + 295, 411, 2729, 493, 294, 257, 636, 13, 400, 291, 458, 11, 50720], "temperature": + 0.0, "avg_logprob": -0.19213156430226452, "compression_ratio": 1.6431718061674008, + "no_speech_prob": 0.0027083023451268673}, {"id": 675, "seek": 411656, "start": 4123.68, + "end": 4131.280000000001, "text": " like, and then this lady says, hey, I just did + this small thing, right? 
And clearly underselling it.", "tokens": [50720, 411, 11, + 293, 550, 341, 7262, 1619, 11, 4177, 11, 286, 445, 630, 341, 1359, 551, 11, 558, + 30, 400, 4448, 833, 30427, 309, 13, 51100], "temperature": 0.0, "avg_logprob": -0.19213156430226452, + "compression_ratio": 1.6431718061674008, "no_speech_prob": 0.0027083023451268673}, + {"id": 676, "seek": 411656, "start": 4131.280000000001, "end": 4137.4400000000005, + "text": " And by the way, yeah, that''s true. And by the way, I mean, if she''s + listening or someone else is", "tokens": [51100, 400, 538, 264, 636, 11, 1338, 11, + 300, 311, 2074, 13, 400, 538, 264, 636, 11, 286, 914, 11, 498, 750, 311, 4764, 420, + 1580, 1646, 307, 51408], "temperature": 0.0, "avg_logprob": -0.19213156430226452, + "compression_ratio": 1.6431718061674008, "no_speech_prob": 0.0027083023451268673}, + {"id": 677, "seek": 411656, "start": 4137.4400000000005, "end": 4141.52, "text": + " listening, I have used peramsat in the vector implementation as well. There you + go.", "tokens": [51408, 4764, 11, 286, 362, 1143, 680, 4070, 267, 294, 264, 8062, + 11420, 382, 731, 13, 821, 291, 352, 13, 51612], "temperature": 0.0, "avg_logprob": + -0.19213156430226452, "compression_ratio": 1.6431718061674008, "no_speech_prob": + 0.0027083023451268673}, {"id": 678, "seek": 414152, "start": 4141.92, "end": 4148.400000000001, + "text": " Yeah, I have tried to include all the bits and pieces like, you know, + I mean, I''m always,", "tokens": [50384, 865, 11, 286, 362, 3031, 281, 4090, 439, + 264, 9239, 293, 3755, 411, 11, 291, 458, 11, 286, 914, 11, 286, 478, 1009, 11, 50708], + "temperature": 0.0, "avg_logprob": -0.15071676018532743, "compression_ratio": 1.701834862385321, + "no_speech_prob": 0.05892246589064598}, {"id": 679, "seek": 414152, "start": 4148.400000000001, + "end": 4154.4800000000005, "text": " you know, touched by this open source, you + know, contribution part. 
I feel like that gives you", "tokens": [50708, 291, 458, + 11, 9828, 538, 341, 1269, 4009, 11, 291, 458, 11, 13150, 644, 13, 286, 841, 411, + 300, 2709, 291, 51012], "temperature": 0.0, "avg_logprob": -0.15071676018532743, + "compression_ratio": 1.701834862385321, "no_speech_prob": 0.05892246589064598}, + {"id": 680, "seek": 414152, "start": 4154.4800000000005, "end": 4162.4800000000005, + "text": " the opportunity that you can learn from experienced people. And it takes + effort to accept the feedback", "tokens": [51012, 264, 2650, 300, 291, 393, 1466, + 490, 6751, 561, 13, 400, 309, 2516, 4630, 281, 3241, 264, 5824, 51412], "temperature": + 0.0, "avg_logprob": -0.15071676018532743, "compression_ratio": 1.701834862385321, + "no_speech_prob": 0.05892246589064598}, {"id": 681, "seek": 414152, "start": 4162.4800000000005, + "end": 4168.240000000001, "text": " from such a large audience, so many experienced + people. It gives the opportunity to", "tokens": [51412, 490, 1270, 257, 2416, 4034, + 11, 370, 867, 6751, 561, 13, 467, 2709, 264, 2650, 281, 51700], "temperature": 0.0, + "avg_logprob": -0.15071676018532743, "compression_ratio": 1.701834862385321, "no_speech_prob": + 0.05892246589064598}, {"id": 682, "seek": 416824, "start": 4168.88, "end": 4174.16, + "text": " interact and have, you know, like comments from people who''ve been working + in the industry for", "tokens": [50396, 4648, 293, 362, 11, 291, 458, 11, 411, 3053, + 490, 561, 567, 600, 668, 1364, 294, 264, 3518, 337, 50660], "temperature": 0.0, + "avg_logprob": -0.1436111946416095, "compression_ratio": 1.8097014925373134, "no_speech_prob": + 0.02423580177128315}, {"id": 683, "seek": 416824, "start": 4174.16, "end": 4180.8, + "text": " such a long time. 
And if you survive like one contribution, even one contribution, + I think that''s", "tokens": [50660, 1270, 257, 938, 565, 13, 400, 498, 291, 7867, + 411, 472, 13150, 11, 754, 472, 13150, 11, 286, 519, 300, 311, 50992], "temperature": + 0.0, "avg_logprob": -0.1436111946416095, "compression_ratio": 1.8097014925373134, + "no_speech_prob": 0.02423580177128315}, {"id": 684, "seek": 416824, "start": 4180.8, + "end": 4187.44, "text": " where it starts. And something like, you know, it''s like + some addiction. Like once you start it,", "tokens": [50992, 689, 309, 3719, 13, + 400, 746, 411, 11, 291, 458, 11, 309, 311, 411, 512, 16835, 13, 1743, 1564, 291, + 722, 309, 11, 51324], "temperature": 0.0, "avg_logprob": -0.1436111946416095, "compression_ratio": + 1.8097014925373134, "no_speech_prob": 0.02423580177128315}, {"id": 685, "seek": + 416824, "start": 4187.44, "end": 4193.36, "text": " it''s like, you don''t, you + just keep one of, do it like every, in every possible way. You just want", "tokens": + [51324, 309, 311, 411, 11, 291, 500, 380, 11, 291, 445, 1066, 472, 295, 11, 360, + 309, 411, 633, 11, 294, 633, 1944, 636, 13, 509, 445, 528, 51620], "temperature": + 0.0, "avg_logprob": -0.1436111946416095, "compression_ratio": 1.8097014925373134, + "no_speech_prob": 0.02423580177128315}, {"id": 686, "seek": 416824, "start": 4193.36, + "end": 4198.0, "text": " to make sure like you''re contributing something or the + other. It''s so amazing. It feels amazing.", "tokens": [51620, 281, 652, 988, 411, + 291, 434, 19270, 746, 420, 264, 661, 13, 467, 311, 370, 2243, 13, 467, 3417, 2243, + 13, 51852], "temperature": 0.0, "avg_logprob": -0.1436111946416095, "compression_ratio": + 1.8097014925373134, "no_speech_prob": 0.02423580177128315}, {"id": 687, "seek": + 419800, "start": 4198.96, "end": 4207.36, "text": " Yeah, exactly. 
And it''s nice + to say this because I hope that there will be more female listeners", "tokens": + [50412, 865, 11, 2293, 13, 400, 309, 311, 1481, 281, 584, 341, 570, 286, 1454, 300, + 456, 486, 312, 544, 6556, 23274, 50832], "temperature": 0.0, "avg_logprob": -0.14016225968284168, + "compression_ratio": 1.6069868995633187, "no_speech_prob": 0.004716574214398861}, + {"id": 688, "seek": 419800, "start": 4207.36, "end": 4212.96, "text": " also joining + this podcast, according to my statistics, at least, you know, there''ve been", "tokens": + [50832, 611, 5549, 341, 7367, 11, 4650, 281, 452, 12523, 11, 412, 1935, 11, 291, + 458, 11, 456, 600, 668, 51112], "temperature": 0.0, "avg_logprob": -0.14016225968284168, + "compression_ratio": 1.6069868995633187, "no_speech_prob": 0.004716574214398861}, + {"id": 689, "seek": 419800, "start": 4212.96, "end": 4220.56, "text": " domination + of male listeners if YouTube is not lying to me. But I don''t think that should", + "tokens": [51112, 41502, 295, 7133, 23274, 498, 3088, 307, 406, 8493, 281, 385, + 13, 583, 286, 500, 380, 519, 300, 820, 51492], "temperature": 0.0, "avg_logprob": + -0.14016225968284168, "compression_ratio": 1.6069868995633187, "no_speech_prob": + 0.004716574214398861}, {"id": 690, "seek": 419800, "start": 4220.56, "end": 4226.32, + "text": " continue that way, for sure. 
Because this world is much more diverse, + much more interesting,", "tokens": [51492, 2354, 300, 636, 11, 337, 988, 13, 1436, + 341, 1002, 307, 709, 544, 9521, 11, 709, 544, 1880, 11, 51780], "temperature": 0.0, + "avg_logprob": -0.14016225968284168, "compression_ratio": 1.6069868995633187, "no_speech_prob": + 0.004716574214398861}, {"id": 691, "seek": 422800, "start": 4228.08, "end": 4235.6, + "text": " and also we should remember to say that in the relevance and matching + text lack, there is a secret", "tokens": [50368, 293, 611, 321, 820, 1604, 281, + 584, 300, 294, 264, 32684, 293, 14324, 2487, 5011, 11, 456, 307, 257, 4054, 50744], + "temperature": 0.0, "avg_logprob": -0.20341597000757852, "compression_ratio": 1.5708154506437768, + "no_speech_prob": 0.005313091445714235}, {"id": 692, "seek": 422800, "start": 4235.6, + "end": 4240.88, "text": " group, right? Can you say something about that? Like a + channel? You mean women and source group?", "tokens": [50744, 1594, 11, 558, 30, + 1664, 291, 584, 746, 466, 300, 30, 1743, 257, 2269, 30, 509, 914, 2266, 293, 4009, + 1594, 30, 51008], "temperature": 0.0, "avg_logprob": -0.20341597000757852, "compression_ratio": + 1.5708154506437768, "no_speech_prob": 0.005313091445714235}, {"id": 693, "seek": + 422800, "start": 4240.88, "end": 4247.2, "text": " It''s open. I mean, it''s open + to women and the allies. So if you consider yourself as a", "tokens": [51008, 467, + 311, 1269, 13, 286, 914, 11, 309, 311, 1269, 281, 2266, 293, 264, 14719, 13, 407, + 498, 291, 1949, 1803, 382, 257, 51324], "temperature": 0.0, "avg_logprob": -0.20341597000757852, + "compression_ratio": 1.5708154506437768, "no_speech_prob": 0.005313091445714235}, + {"id": 694, "seek": 422800, "start": 4247.2, "end": 4254.72, "text": " ally for + women in search, you''re welcome to join it. There is no secret. 
You can be", "tokens": + [51324, 23356, 337, 2266, 294, 3164, 11, 291, 434, 2928, 281, 3917, 309, 13, 821, + 307, 572, 4054, 13, 509, 393, 312, 51700], "temperature": 0.0, "avg_logprob": -0.20341597000757852, + "compression_ratio": 1.5708154506437768, "no_speech_prob": 0.005313091445714235}, + {"id": 695, "seek": 425472, "start": 4254.72, "end": 4260.64, "text": " happily + part of it. And we have session one some month every first Wednesday. And we still + have", "tokens": [50364, 19909, 644, 295, 309, 13, 400, 321, 362, 5481, 472, 512, + 1618, 633, 700, 10579, 13, 400, 321, 920, 362, 50660], "temperature": 0.0, "avg_logprob": + -0.11159610748291016, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.07438478618860245}, {"id": 696, "seek": 425472, "start": 4260.64, "end": 4267.280000000001, + "text": " that. We are trying to bring in more useful content. So I talked about + like mentoring part and", "tokens": [50660, 300, 13, 492, 366, 1382, 281, 1565, + 294, 544, 4420, 2701, 13, 407, 286, 2825, 466, 411, 30257, 644, 293, 50992], "temperature": + 0.0, "avg_logprob": -0.11159610748291016, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.07438478618860245}, {"id": 697, "seek": 425472, "start": 4267.280000000001, + "end": 4272.96, "text": " the, you know, collaborating part. And then how I''m seeing + that, you know, women should contribute", "tokens": [50992, 264, 11, 291, 458, 11, + 30188, 644, 13, 400, 550, 577, 286, 478, 2577, 300, 11, 291, 458, 11, 2266, 820, + 10586, 51276], "temperature": 0.0, "avg_logprob": -0.11159610748291016, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.07438478618860245}, {"id": 698, "seek": + 425472, "start": 4272.96, "end": 4279.360000000001, "text": " and, you know, collaborate + more. 
So I''m trying to have a pattern where we could have some text", "tokens": + [51276, 293, 11, 291, 458, 11, 18338, 544, 13, 407, 286, 478, 1382, 281, 362, 257, + 5102, 689, 321, 727, 362, 512, 2487, 51596], "temperature": 0.0, "avg_logprob": + -0.11159610748291016, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.07438478618860245}, {"id": 699, "seek": 427936, "start": 4279.679999999999, "end": + 4288.32, "text": " sessions as well. Like I imagine, I think two, two months ago, + I think, because from last two,", "tokens": [50380, 11081, 382, 731, 13, 1743, 286, + 3811, 11, 286, 519, 732, 11, 732, 2493, 2057, 11, 286, 519, 11, 570, 490, 1036, + 732, 11, 50812], "temperature": 0.0, "avg_logprob": -0.15529436331528884, "compression_ratio": + 1.7123893805309736, "no_speech_prob": 0.054409489035606384}, {"id": 700, "seek": + 427936, "start": 4288.32, "end": 4294.24, "text": " I have not been able to attend + myself one when I was out of town and the other one, I think I was", "tokens": [50812, + 286, 362, 406, 668, 1075, 281, 6888, 2059, 472, 562, 286, 390, 484, 295, 3954, 293, + 264, 661, 472, 11, 286, 519, 286, 390, 51108], "temperature": 0.0, "avg_logprob": + -0.15529436331528884, "compression_ratio": 1.7123893805309736, "no_speech_prob": + 0.054409489035606384}, {"id": 701, "seek": 427936, "start": 4294.24, "end": 4300.48, + "text": " not well. So I did not attend them. 
But then what I''m trying to have + here is that we are trying to", "tokens": [51108, 406, 731, 13, 407, 286, 630, 406, + 6888, 552, 13, 583, 550, 437, 286, 478, 1382, 281, 362, 510, 307, 300, 321, 366, + 1382, 281, 51420], "temperature": 0.0, "avg_logprob": -0.15529436331528884, "compression_ratio": + 1.7123893805309736, "no_speech_prob": 0.054409489035606384}, {"id": 702, "seek": + 427936, "start": 4300.48, "end": 4306.719999999999, "text": " have more like text + sessions, more like people open up to themselves and then, you know, to the", "tokens": + [51420, 362, 544, 411, 2487, 11081, 11, 544, 411, 561, 1269, 493, 281, 2969, 293, + 550, 11, 291, 458, 11, 281, 264, 51732], "temperature": 0.0, "avg_logprob": -0.15529436331528884, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.054409489035606384}, + {"id": 703, "seek": 430672, "start": 4306.72, "end": 4310.88, "text": " others that + this is what I''m trying to achieve. So I think the session that I had like two + months", "tokens": [50364, 2357, 300, 341, 307, 437, 286, 478, 1382, 281, 4584, + 13, 407, 286, 519, 264, 5481, 300, 286, 632, 411, 732, 2493, 50572], "temperature": + 0.0, "avg_logprob": -0.1017657804858777, "compression_ratio": 1.75177304964539, + "no_speech_prob": 0.007724245544523001}, {"id": 704, "seek": 430672, "start": 4310.88, + "end": 4317.12, "text": " ago, I was still implementing stuff up in course. I was + still kind of battling with like, how do I", "tokens": [50572, 2057, 11, 286, 390, + 920, 18114, 1507, 493, 294, 1164, 13, 286, 390, 920, 733, 295, 33752, 365, 411, + 11, 577, 360, 286, 50884], "temperature": 0.0, "avg_logprob": -0.1017657804858777, + "compression_ratio": 1.75177304964539, "no_speech_prob": 0.007724245544523001}, + {"id": 705, "seek": 430672, "start": 4317.12, "end": 4323.12, "text": " make images + work? 
And I was considering and it was so beautiful that during that call, like + everyone", "tokens": [50884, 652, 5267, 589, 30, 400, 286, 390, 8079, 293, 309, + 390, 370, 2238, 300, 1830, 300, 818, 11, 411, 1518, 51184], "temperature": 0.0, + "avg_logprob": -0.1017657804858777, "compression_ratio": 1.75177304964539, "no_speech_prob": + 0.007724245544523001}, {"id": 706, "seek": 430672, "start": 4323.12, "end": 4327.12, + "text": " who was, you know, just kind of, you know, getting to know each other + and it''s nice that, you know,", "tokens": [51184, 567, 390, 11, 291, 458, 11, 445, + 733, 295, 11, 291, 458, 11, 1242, 281, 458, 1184, 661, 293, 309, 311, 1481, 300, + 11, 291, 458, 11, 51384], "temperature": 0.0, "avg_logprob": -0.1017657804858777, + "compression_ratio": 1.75177304964539, "no_speech_prob": 0.007724245544523001}, + {"id": 707, "seek": 430672, "start": 4327.12, "end": 4333.52, "text": " every time + we have a meeting, at least one or two new people join in. So one of them said, + like,", "tokens": [51384, 633, 565, 321, 362, 257, 3440, 11, 412, 1935, 472, 420, + 732, 777, 561, 3917, 294, 13, 407, 472, 295, 552, 848, 11, 411, 11, 51704], "temperature": + 0.0, "avg_logprob": -0.1017657804858777, "compression_ratio": 1.75177304964539, + "no_speech_prob": 0.007724245544523001}, {"id": 708, "seek": 433352, "start": 4333.52, + "end": 4337.280000000001, "text": " have you tried this? Have you tried that? And + everyone got there, you know, Jupiter notebooks out", "tokens": [50364, 362, 291, + 3031, 341, 30, 3560, 291, 3031, 300, 30, 400, 1518, 658, 456, 11, 291, 458, 11, + 24567, 43782, 484, 50552], "temperature": 0.0, "avg_logprob": -0.11702247287916101, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.04374105483293533}, + {"id": 709, "seek": 433352, "start": 4337.280000000001, "end": 4341.360000000001, + "text": " and then they all started, you know, like, oh, this is going to work. 
+ Oh, maybe this one needs a", "tokens": [50552, 293, 550, 436, 439, 1409, 11, 291, + 458, 11, 411, 11, 1954, 11, 341, 307, 516, 281, 589, 13, 876, 11, 1310, 341, 472, + 2203, 257, 50756], "temperature": 0.0, "avg_logprob": -0.11702247287916101, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.04374105483293533}, {"id": 710, "seek": + 433352, "start": 4341.360000000001, "end": 4346.160000000001, "text": " licensing + cost. Oh, maybe this one we should not consider because it supports these many things", + "tokens": [50756, 29759, 2063, 13, 876, 11, 1310, 341, 472, 321, 820, 406, 1949, + 570, 309, 9346, 613, 867, 721, 50996], "temperature": 0.0, "avg_logprob": -0.11702247287916101, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.04374105483293533}, + {"id": 711, "seek": 433352, "start": 4346.160000000001, "end": 4352.88, "text": + " only. And then by that time that one hour, I mean, became so productive that everyone + left that", "tokens": [50996, 787, 13, 400, 550, 538, 300, 565, 300, 472, 1773, + 11, 286, 914, 11, 3062, 370, 13304, 300, 1518, 1411, 300, 51332], "temperature": + 0.0, "avg_logprob": -0.11702247287916101, "compression_ratio": 1.7896440129449838, + "no_speech_prob": 0.04374105483293533}, {"id": 712, "seek": 433352, "start": 4352.88, + "end": 4356.72, "text": " meeting with some knowledge of vectors. And it was so + amazing to see that.", "tokens": [51332, 3440, 365, 512, 3601, 295, 18875, 13, 400, + 309, 390, 370, 2243, 281, 536, 300, 13, 51524], "temperature": 0.0, "avg_logprob": + -0.11702247287916101, "compression_ratio": 1.7896440129449838, "no_speech_prob": + 0.04374105483293533}, {"id": 713, "seek": 433352, "start": 4357.4400000000005, "end": + 4363.040000000001, "text": " That''s super cool. 
I mean, I''m really glad to hear + this because it both connects to like an", "tokens": [51560, 663, 311, 1687, 1627, + 13, 286, 914, 11, 286, 478, 534, 5404, 281, 1568, 341, 570, 309, 1293, 16967, 281, + 411, 364, 51840], "temperature": 0.0, "avg_logprob": -0.11702247287916101, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.04374105483293533}, {"id": 714, "seek": + 436304, "start": 4363.04, "end": 4369.2, "text": " overarching goal, some real purpose + here. And also, you know, you create the environment of", "tokens": [50364, 45501, + 3387, 11, 512, 957, 4334, 510, 13, 400, 611, 11, 291, 458, 11, 291, 1884, 264, 2823, + 295, 50672], "temperature": 0.0, "avg_logprob": -0.14916689872741698, "compression_ratio": + 1.6288209606986899, "no_speech_prob": 0.001983682857826352}, {"id": 715, "seek": + 436304, "start": 4369.84, "end": 4375.84, "text": " support and exchange and cross-pollination. + I think this is something that men should do too.", "tokens": [50704, 1406, 293, + 7742, 293, 3278, 12, 79, 1833, 2486, 13, 286, 519, 341, 307, 746, 300, 1706, 820, + 360, 886, 13, 51004], "temperature": 0.0, "avg_logprob": -0.14916689872741698, "compression_ratio": + 1.6288209606986899, "no_speech_prob": 0.001983682857826352}, {"id": 716, "seek": + 436304, "start": 4378.56, "end": 4384.4, "text": " Yeah, you''re welcome to join + the sessions. I mean, this is really, I mean, we feel, I mean,", "tokens": [51140, + 865, 11, 291, 434, 2928, 281, 3917, 264, 11081, 13, 286, 914, 11, 341, 307, 534, + 11, 286, 914, 11, 321, 841, 11, 286, 914, 11, 51432], "temperature": 0.0, "avg_logprob": + -0.14916689872741698, "compression_ratio": 1.6288209606986899, "no_speech_prob": + 0.001983682857826352}, {"id": 717, "seek": 436304, "start": 4384.4, "end": 4388.56, + "text": " it''s really nice. And I can say, like, you have good reputation. 
People + would love to have you.", "tokens": [51432, 309, 311, 534, 1481, 13, 400, 286, 393, + 584, 11, 411, 11, 291, 362, 665, 13061, 13, 3432, 576, 959, 281, 362, 291, 13, 51640], + "temperature": 0.0, "avg_logprob": -0.14916689872741698, "compression_ratio": 1.6288209606986899, + "no_speech_prob": 0.001983682857826352}, {"id": 718, "seek": 438856, "start": 4389.360000000001, + "end": 4397.280000000001, "text": " We hosted a Sebastian from VV8 who offered to + help after the Haystack EU session to help women to,", "tokens": [50404, 492, 19204, + 257, 31102, 490, 691, 53, 23, 567, 8059, 281, 854, 934, 264, 8721, 372, 501, 10887, + 5481, 281, 854, 2266, 281, 11, 50800], "temperature": 0.0, "avg_logprob": -0.1901424217224121, + "compression_ratio": 1.503787878787879, "no_speech_prob": 0.10184956341981888}, + {"id": 719, "seek": 438856, "start": 4397.280000000001, "end": 4402.160000000001, + "text": " you know, have some public speaking skills. And I think we''ve had an + amazing session. We did not put", "tokens": [50800, 291, 458, 11, 362, 512, 1908, + 4124, 3942, 13, 400, 286, 519, 321, 600, 632, 364, 2243, 5481, 13, 492, 630, 406, + 829, 51044], "temperature": 0.0, "avg_logprob": -0.1901424217224121, "compression_ratio": + 1.503787878787879, "no_speech_prob": 0.10184956341981888}, {"id": 720, "seek": 438856, + "start": 4402.160000000001, "end": 4410.400000000001, "text": " that out on YouTube + yet. Not sure why, but then people who attended absolutely hanged us and said,", + "tokens": [51044, 300, 484, 322, 3088, 1939, 13, 1726, 988, 983, 11, 457, 550, 561, + 567, 15990, 3122, 3967, 292, 505, 293, 848, 11, 51456], "temperature": 0.0, "avg_logprob": + -0.1901424217224121, "compression_ratio": 1.503787878787879, "no_speech_prob": 0.10184956341981888}, + {"id": 721, "seek": 438856, "start": 4410.400000000001, "end": 4416.080000000001, + "text": " like, it was really useful. Some planning to have more sessions like that. 
+ Maybe we could host you", "tokens": [51456, 411, 11, 309, 390, 534, 4420, 13, 2188, + 5038, 281, 362, 544, 11081, 411, 300, 13, 2704, 321, 727, 3975, 291, 51740], "temperature": + 0.0, "avg_logprob": -0.1901424217224121, "compression_ratio": 1.503787878787879, + "no_speech_prob": 0.10184956341981888}, {"id": 722, "seek": 441608, "start": 4416.08, + "end": 4421.5199999999995, "text": " someday as well. That would be awesome for + sure. And so you have a YouTube for that. Or is it part of", "tokens": [50364, 19412, + 382, 731, 13, 663, 576, 312, 3476, 337, 988, 13, 400, 370, 291, 362, 257, 3088, + 337, 300, 13, 1610, 307, 309, 644, 295, 50636], "temperature": 0.0, "avg_logprob": + -0.19657424644187646, "compression_ratio": 1.7644927536231885, "no_speech_prob": + 0.044600486755371094}, {"id": 723, "seek": 441608, "start": 4421.5199999999995, + "end": 4425.36, "text": " Oh, we have it. I think this is the reason why it was + not put on YouTube because we don''t really", "tokens": [50636, 876, 11, 321, 362, + 309, 13, 286, 519, 341, 307, 264, 1778, 983, 309, 390, 406, 829, 322, 3088, 570, + 321, 500, 380, 534, 50828], "temperature": 0.0, "avg_logprob": -0.19657424644187646, + "compression_ratio": 1.7644927536231885, "no_speech_prob": 0.044600486755371094}, + {"id": 724, "seek": 441608, "start": 4425.36, "end": 4430.88, "text": " have a channel. + And we did not know like where should we put it. But we recorded it for sure. We + have", "tokens": [50828, 362, 257, 2269, 13, 400, 321, 630, 406, 458, 411, 689, + 820, 321, 829, 309, 13, 583, 321, 8287, 309, 337, 988, 13, 492, 362, 51104], "temperature": + 0.0, "avg_logprob": -0.19657424644187646, "compression_ratio": 1.7644927536231885, + "no_speech_prob": 0.044600486755371094}, {"id": 725, "seek": 441608, "start": 4430.88, + "end": 4437.36, "text": " the recording. Yeah, I think today YouTube is one of the + probably defective platforms. 
I don''t", "tokens": [51104, 264, 6613, 13, 865, 11, + 286, 519, 965, 3088, 307, 472, 295, 264, 1391, 16445, 488, 9473, 13, 286, 500, 380, + 51428], "temperature": 0.0, "avg_logprob": -0.19657424644187646, "compression_ratio": + 1.7644927536231885, "no_speech_prob": 0.044600486755371094}, {"id": 726, "seek": + 441608, "start": 4437.36, "end": 4445.28, "text": " want to over-advertise it, but + I think it''s a good place to be. Yeah, I think that''s a good", "tokens": [51428, + 528, 281, 670, 12, 345, 3281, 908, 309, 11, 457, 286, 519, 309, 311, 257, 665, 1081, + 281, 312, 13, 865, 11, 286, 519, 300, 311, 257, 665, 51824], "temperature": 0.0, + "avg_logprob": -0.19657424644187646, "compression_ratio": 1.7644927536231885, "no_speech_prob": + 0.044600486755371094}, {"id": 727, "seek": 444528, "start": 4445.28, "end": 4451.5199999999995, + "text": " suggestion, I think, will certainly consider for that in future as well. + But yeah, I think the", "tokens": [50364, 16541, 11, 286, 519, 11, 486, 3297, 1949, + 337, 300, 294, 2027, 382, 731, 13, 583, 1338, 11, 286, 519, 264, 50676], "temperature": + 0.0, "avg_logprob": -0.12401339020391908, "compression_ratio": 1.646551724137931, + "no_speech_prob": 0.015021010302007198}, {"id": 728, "seek": 444528, "start": 4451.5199999999995, + "end": 4456.96, "text": " entire idea is that, you know, we need to make a positive + impact. Be it with open source, be it with,", "tokens": [50676, 2302, 1558, 307, + 300, 11, 291, 458, 11, 321, 643, 281, 652, 257, 3353, 2712, 13, 879, 309, 365, 1269, + 4009, 11, 312, 309, 365, 11, 50948], "temperature": 0.0, "avg_logprob": -0.12401339020391908, + "compression_ratio": 1.646551724137931, "no_speech_prob": 0.015021010302007198}, + {"id": 729, "seek": 444528, "start": 4456.96, "end": 4466.16, "text": " you know, + the trying to push and bring up more minority people up front. 
One of the most,", + "tokens": [50948, 291, 458, 11, 264, 1382, 281, 2944, 293, 1565, 493, 544, 16166, + 561, 493, 1868, 13, 1485, 295, 264, 881, 11, 51408], "temperature": 0.0, "avg_logprob": + -0.12401339020391908, "compression_ratio": 1.646551724137931, "no_speech_prob": + 0.015021010302007198}, {"id": 730, "seek": 444528, "start": 4466.88, "end": 4471.36, + "text": " I think one of the things that I would like to highlight here is also + maybe we haven''t touched,", "tokens": [51444, 286, 519, 472, 295, 264, 721, 300, + 286, 576, 411, 281, 5078, 510, 307, 611, 1310, 321, 2378, 380, 9828, 11, 51668], + "temperature": 0.0, "avg_logprob": -0.12401339020391908, "compression_ratio": 1.646551724137931, + "no_speech_prob": 0.015021010302007198}, {"id": 731, "seek": 447136, "start": 4471.36, + "end": 4478.719999999999, "text": " but while talking, it strikes me that we have + a saying back in India that behind every successful", "tokens": [50364, 457, 1339, + 1417, 11, 309, 16750, 385, 300, 321, 362, 257, 1566, 646, 294, 5282, 300, 2261, + 633, 4406, 50732], "temperature": 0.0, "avg_logprob": -0.15552099624482713, "compression_ratio": + 1.5662650602409638, "no_speech_prob": 0.012272743508219719}, {"id": 732, "seek": + 447136, "start": 4478.719999999999, "end": 4487.44, "text": " man, there is a woman. + I think it kind of, you know, reverses itself in my case because my story", "tokens": + [50732, 587, 11, 456, 307, 257, 3059, 13, 286, 519, 309, 733, 295, 11, 291, 458, + 11, 14582, 279, 2564, 294, 452, 1389, 570, 452, 1657, 51168], "temperature": 0.0, + "avg_logprob": -0.15552099624482713, "compression_ratio": 1.5662650602409638, "no_speech_prob": + 0.012272743508219719}, {"id": 733, "seek": 447136, "start": 4487.44, "end": 4493.44, + "text": " started. I mean, if I talk about from my college, my husband is also my, + was my classmate. 
So I''m", "tokens": [51168, 1409, 13, 286, 914, 11, 498, 286, + 751, 466, 490, 452, 3859, 11, 452, 5213, 307, 611, 452, 11, 390, 452, 1508, 13963, + 13, 407, 286, 478, 51468], "temperature": 0.0, "avg_logprob": -0.15552099624482713, + "compression_ratio": 1.5662650602409638, "no_speech_prob": 0.012272743508219719}, + {"id": 734, "seek": 447136, "start": 4493.44, "end": 4499.04, "text": " like one + of those blessed people who has a company of someone who always supports me. And + I think", "tokens": [51468, 411, 472, 295, 729, 12351, 561, 567, 575, 257, 2237, + 295, 1580, 567, 1009, 9346, 385, 13, 400, 286, 519, 51748], "temperature": 0.0, + "avg_logprob": -0.15552099624482713, "compression_ratio": 1.5662650602409638, "no_speech_prob": + 0.012272743508219719}, {"id": 735, "seek": 449904, "start": 4499.12, "end": 4504.56, + "text": " I have been here always, you know, bold and like standing on the stage + giving presentations because", "tokens": [50368, 286, 362, 668, 510, 1009, 11, 291, + 458, 11, 11928, 293, 411, 4877, 322, 264, 3233, 2902, 18964, 570, 50640], "temperature": + 0.0, "avg_logprob": -0.13823289719838944, "compression_ratio": 1.7627737226277371, + "no_speech_prob": 0.014229062013328075}, {"id": 736, "seek": 449904, "start": 4506.0, + "end": 4511.5199999999995, "text": " behind that is the person who is, you know, + very meek, very nervous. Like, how would I do it? And", "tokens": [50712, 2261, + 300, 307, 264, 954, 567, 307, 11, 291, 458, 11, 588, 385, 916, 11, 588, 6296, 13, + 1743, 11, 577, 576, 286, 360, 309, 30, 400, 50988], "temperature": 0.0, "avg_logprob": + -0.13823289719838944, "compression_ratio": 1.7627737226277371, "no_speech_prob": + 0.014229062013328075}, {"id": 737, "seek": 449904, "start": 4511.5199999999995, + "end": 4517.36, "text": " he''s the one who''s always encouraging me to come up + on these stage and like, I can do it. 
So it''s", "tokens": [50988, 415, 311, 264, + 472, 567, 311, 1009, 14580, 385, 281, 808, 493, 322, 613, 3233, 293, 411, 11, 286, + 393, 360, 309, 13, 407, 309, 311, 51280], "temperature": 0.0, "avg_logprob": -0.13823289719838944, + "compression_ratio": 1.7627737226277371, "no_speech_prob": 0.014229062013328075}, + {"id": 738, "seek": 449904, "start": 4517.36, "end": 4522.32, "text": " interesting + that behind me and every, you know, big stake that I took, there was always a man.", + "tokens": [51280, 1880, 300, 2261, 385, 293, 633, 11, 291, 458, 11, 955, 10407, + 300, 286, 1890, 11, 456, 390, 1009, 257, 587, 13, 51528], "temperature": 0.0, "avg_logprob": + -0.13823289719838944, "compression_ratio": 1.7627737226277371, "no_speech_prob": + 0.014229062013328075}, {"id": 739, "seek": 449904, "start": 4522.96, "end": 4528.32, + "text": " So if I talk about like the first one started with my husband, the next + one was obviously my", "tokens": [51560, 407, 498, 286, 751, 466, 411, 264, 700, + 472, 1409, 365, 452, 5213, 11, 264, 958, 472, 390, 2745, 452, 51828], "temperature": + 0.0, "avg_logprob": -0.13823289719838944, "compression_ratio": 1.7627737226277371, + "no_speech_prob": 0.014229062013328075}, {"id": 740, "seek": 452832, "start": 4528.4, + "end": 4536.0, "text": " manager at my first job. And with the open source as well, + I would like to mention that there''s", "tokens": [50368, 6598, 412, 452, 700, 1691, + 13, 400, 365, 264, 1269, 4009, 382, 731, 11, 286, 576, 411, 281, 2152, 300, 456, + 311, 50748], "temperature": 0.0, "avg_logprob": -0.13512563203510486, "compression_ratio": + 1.5551020408163265, "no_speech_prob": 0.005254331044852734}, {"id": 741, "seek": + 452832, "start": 4537.44, "end": 4542.08, "text": " search engine that''s probably + not very popular. 
Maybe a lot of people do not even know about it,", "tokens": [50820, + 3164, 2848, 300, 311, 1391, 406, 588, 3743, 13, 2704, 257, 688, 295, 561, 360, 406, + 754, 458, 466, 309, 11, 51052], "temperature": 0.0, "avg_logprob": -0.13512563203510486, + "compression_ratio": 1.5551020408163265, "no_speech_prob": 0.005254331044852734}, + {"id": 742, "seek": 452832, "start": 4542.08, "end": 4550.0, "text": " open source + server. And it''s not affiliated with AWS open search in any way. But so this guy", + "tokens": [51052, 1269, 4009, 7154, 13, 400, 309, 311, 406, 42174, 365, 17650, 1269, + 3164, 294, 604, 636, 13, 583, 370, 341, 2146, 51448], "temperature": 0.0, "avg_logprob": + -0.13512563203510486, "compression_ratio": 1.5551020408163265, "no_speech_prob": + 0.005254331044852734}, {"id": 743, "seek": 452832, "start": 4550.0, "end": 4555.92, + "text": " has recommended me also on LinkedIn. And we bumped into like maybe he + was trying out, you know,", "tokens": [51448, 575, 9628, 385, 611, 322, 20657, 13, + 400, 321, 42696, 666, 411, 1310, 415, 390, 1382, 484, 11, 291, 458, 11, 51744], + "temperature": 0.0, "avg_logprob": -0.13512563203510486, "compression_ratio": 1.5551020408163265, + "no_speech_prob": 0.005254331044852734}, {"id": 744, "seek": 455592, "start": 4556.0, + "end": 4562.96, "text": " other contributors or developers as well for his search + engine. And we collaborated and he", "tokens": [50368, 661, 45627, 420, 8849, 382, + 731, 337, 702, 3164, 2848, 13, 400, 321, 42463, 293, 415, 50716], "temperature": + 0.0, "avg_logprob": -0.21204226504090012, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.007024990860372782}, {"id": 745, "seek": 455592, "start": 4562.96, + "end": 4571.04, "text": " showed me whole new word of open source, open source contributions. 
+ So he''s a very critical person,", "tokens": [50716, 4712, 385, 1379, 777, 1349, + 295, 1269, 4009, 11, 1269, 4009, 15725, 13, 407, 415, 311, 257, 588, 4924, 954, + 11, 51120], "temperature": 0.0, "avg_logprob": -0.21204226504090012, "compression_ratio": + 1.6538461538461537, "no_speech_prob": 0.007024990860372782}, {"id": 746, "seek": + 455592, "start": 4571.04, "end": 4576.88, "text": " I would say like code wise, + I think he''s really good. And he worships, I would say Mike McIndels,", "tokens": + [51120, 286, 576, 584, 411, 3089, 10829, 11, 286, 519, 415, 311, 534, 665, 13, 400, + 415, 9965, 82, 11, 286, 576, 584, 6602, 4050, 40, 273, 1625, 11, 51412], "temperature": + 0.0, "avg_logprob": -0.21204226504090012, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.007024990860372782}, {"id": 747, "seek": 455592, "start": 4576.88, + "end": 4581.12, "text": " because every time I would not understand anything and + Lucy and he would show me something from my", "tokens": [51412, 570, 633, 565, 286, + 576, 406, 1223, 1340, 293, 22698, 293, 415, 576, 855, 385, 746, 490, 452, 51624], + "temperature": 0.0, "avg_logprob": -0.21204226504090012, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.007024990860372782}, {"id": 748, "seek": 458112, "start": 4581.2, + "end": 4590.0, "text": " McIndels. So I mean, it was, it was nice. And then I started, + I think if I talk about from the open", "tokens": [50368, 4050, 40, 273, 1625, 13, + 407, 286, 914, 11, 309, 390, 11, 309, 390, 1481, 13, 400, 550, 286, 1409, 11, 286, + 519, 498, 286, 751, 466, 490, 264, 1269, 50808], "temperature": 0.0, "avg_logprob": + -0.17724386014436422, "compression_ratio": 1.4723618090452262, "no_speech_prob": + 0.014261680655181408}, {"id": 749, "seek": 458112, "start": 4590.0, "end": 4599.44, + "text": " source connection, affiliated libraries as well. 
So chorus was mostly + like Eric and Renee kind of", "tokens": [50808, 4009, 4984, 11, 42174, 15148, 382, + 731, 13, 407, 22632, 390, 5240, 411, 9336, 293, 47790, 733, 295, 51280], "temperature": + 0.0, "avg_logprob": -0.17724386014436422, "compression_ratio": 1.4723618090452262, + "no_speech_prob": 0.014261680655181408}, {"id": 750, "seek": 458112, "start": 4600.32, + "end": 4606.96, "text": " pushed me into like we should do something together, keep + it again. Eric promoted sort of that,", "tokens": [51324, 9152, 385, 666, 411, 321, + 820, 360, 746, 1214, 11, 1066, 309, 797, 13, 9336, 21162, 1333, 295, 300, 11, 51656], + "temperature": 0.0, "avg_logprob": -0.17724386014436422, "compression_ratio": 1.4723618090452262, + "no_speech_prob": 0.014261680655181408}, {"id": 751, "seek": 460696, "start": 4607.44, + "end": 4615.52, "text": " you should try fixing some things also with solar. It + was Eric. And with open NLP, it was one of my", "tokens": [50388, 291, 820, 853, + 19442, 512, 721, 611, 365, 7936, 13, 467, 390, 9336, 13, 400, 365, 1269, 426, 45196, + 11, 309, 390, 472, 295, 452, 50792], "temperature": 0.0, "avg_logprob": -0.1476672328248316, + "compression_ratio": 1.6738197424892705, "no_speech_prob": 0.003900152398273349}, + {"id": 752, "seek": 460696, "start": 4616.16, "end": 4622.0, "text": " colleagues + at open source connections, who''s not with open source connections anymore. So + Jeff,", "tokens": [50824, 7734, 412, 1269, 4009, 9271, 11, 567, 311, 406, 365, 1269, + 4009, 9271, 3602, 13, 407, 7506, 11, 51116], "temperature": 0.0, "avg_logprob": + -0.1476672328248316, "compression_ratio": 1.6738197424892705, "no_speech_prob": + 0.003900152398273349}, {"id": 753, "seek": 460696, "start": 4622.0, "end": 4629.52, + "text": " who is chair of the open NLP library. 
So I think there was this discussion + we had because we", "tokens": [51116, 567, 307, 6090, 295, 264, 1269, 426, 45196, + 6405, 13, 407, 286, 519, 456, 390, 341, 5017, 321, 632, 570, 321, 51492], "temperature": + 0.0, "avg_logprob": -0.1476672328248316, "compression_ratio": 1.6738197424892705, + "no_speech_prob": 0.003900152398273349}, {"id": 754, "seek": 460696, "start": 4629.52, + "end": 4635.12, "text": " were trying to work on a use case together. And he suggested + like we could do this with open NLP. And", "tokens": [51492, 645, 1382, 281, 589, + 322, 257, 764, 1389, 1214, 13, 400, 415, 10945, 411, 321, 727, 360, 341, 365, 1269, + 426, 45196, 13, 400, 51772], "temperature": 0.0, "avg_logprob": -0.1476672328248316, + "compression_ratio": 1.6738197424892705, "no_speech_prob": 0.003900152398273349}, + {"id": 755, "seek": 463512, "start": 4635.92, "end": 4643.04, "text": " I was using + an open NLP before as well, like in 2017. And somehow never really thought of like + I", "tokens": [50404, 286, 390, 1228, 364, 1269, 426, 45196, 949, 382, 731, 11, + 411, 294, 6591, 13, 400, 6063, 1128, 534, 1194, 295, 411, 286, 50760], "temperature": + 0.0, "avg_logprob": -0.15151654119076935, "compression_ratio": 1.7614035087719297, + "no_speech_prob": 0.008517570793628693}, {"id": 756, "seek": 463512, "start": 4643.04, + "end": 4646.5599999999995, "text": " would contribute, but I pointed out to him + that you know there''s something that could be done in", "tokens": [50760, 576, + 10586, 11, 457, 286, 10932, 484, 281, 796, 300, 291, 458, 456, 311, 746, 300, 727, + 312, 1096, 294, 50936], "temperature": 0.0, "avg_logprob": -0.15151654119076935, + "compression_ratio": 1.7614035087719297, "no_speech_prob": 0.008517570793628693}, + {"id": 757, "seek": 463512, "start": 4646.5599999999995, "end": 4653.2, "text": + " this direction. And he said like why don''t you do it? And I was like me like + no way. 
And he was like", "tokens": [50936, 341, 3513, 13, 400, 415, 848, 411, 983, + 500, 380, 291, 360, 309, 30, 400, 286, 390, 411, 385, 411, 572, 636, 13, 400, 415, + 390, 411, 51268], "temperature": 0.0, "avg_logprob": -0.15151654119076935, "compression_ratio": + 1.7614035087719297, "no_speech_prob": 0.008517570793628693}, {"id": 758, "seek": + 463512, "start": 4653.2, "end": 4658.32, "text": " no, no, you can do it. I mean, + if you understand this, I mean certainly. And I think as I said before,", "tokens": + [51268, 572, 11, 572, 11, 291, 393, 360, 309, 13, 286, 914, 11, 498, 291, 1223, + 341, 11, 286, 914, 3297, 13, 400, 286, 519, 382, 286, 848, 949, 11, 51524], "temperature": + 0.0, "avg_logprob": -0.15151654119076935, "compression_ratio": 1.7614035087719297, + "no_speech_prob": 0.008517570793628693}, {"id": 759, "seek": 463512, "start": 4658.32, + "end": 4663.2, "text": " like it''s an addiction, once you started, you just don''t + want to stop. And I think that''s how I started,", "tokens": [51524, 411, 309, 311, + 364, 16835, 11, 1564, 291, 1409, 11, 291, 445, 500, 380, 528, 281, 1590, 13, 400, + 286, 519, 300, 311, 577, 286, 1409, 11, 51768], "temperature": 0.0, "avg_logprob": + -0.15151654119076935, "compression_ratio": 1.7614035087719297, "no_speech_prob": + 0.008517570793628693}, {"id": 760, "seek": 466320, "start": 4663.2, "end": 4667.679999999999, + "text": " I mean, I started using it more and more contributing more and obviously. + And you became the", "tokens": [50364, 286, 914, 11, 286, 1409, 1228, 309, 544, + 293, 544, 19270, 544, 293, 2745, 13, 400, 291, 3062, 264, 50588], "temperature": + 0.0, "avg_logprob": -0.24138724217649365, "compression_ratio": 1.6843971631205674, + "no_speech_prob": 0.005107386037707329}, {"id": 761, "seek": 466320, "start": 4667.679999999999, + "end": 4674.96, "text": " commuter as well, right? I became the comatir. Yes. Of + open NLP, right? Congratulations. 
That''s", "tokens": [50588, 800, 20314, 382, 731, + 11, 558, 30, 286, 3062, 264, 395, 267, 347, 13, 1079, 13, 2720, 1269, 426, 45196, + 11, 558, 30, 9694, 13, 663, 311, 50952], "temperature": 0.0, "avg_logprob": -0.24138724217649365, + "compression_ratio": 1.6843971631205674, "no_speech_prob": 0.005107386037707329}, + {"id": 762, "seek": 466320, "start": 4674.96, "end": 4680.16, "text": " really great. + You''re right. Nice. Actually nice. And the kind of people who review your code + and", "tokens": [50952, 534, 869, 13, 509, 434, 558, 13, 5490, 13, 5135, 1481, 13, + 400, 264, 733, 295, 561, 567, 3131, 428, 3089, 293, 51212], "temperature": 0.0, + "avg_logprob": -0.24138724217649365, "compression_ratio": 1.6843971631205674, "no_speech_prob": + 0.005107386037707329}, {"id": 763, "seek": 466320, "start": 4680.16, "end": 4685.12, + "text": " like you get to know like oh, this could be done this way as well. I mean, + I can say that it''s a", "tokens": [51212, 411, 291, 483, 281, 458, 411, 1954, 11, + 341, 727, 312, 1096, 341, 636, 382, 731, 13, 286, 914, 11, 286, 393, 584, 300, 309, + 311, 257, 51460], "temperature": 0.0, "avg_logprob": -0.24138724217649365, "compression_ratio": + 1.6843971631205674, "no_speech_prob": 0.005107386037707329}, {"id": 764, "seek": + 466320, "start": 4685.12, "end": 4690.88, "text": " very rounded development opportunity + that I got with open source, open source contributions.", "tokens": [51460, 588, + 23382, 3250, 2650, 300, 286, 658, 365, 1269, 4009, 11, 1269, 4009, 15725, 13, 51748], + "temperature": 0.0, "avg_logprob": -0.24138724217649365, "compression_ratio": 1.6843971631205674, + "no_speech_prob": 0.005107386037707329}, {"id": 765, "seek": 469088, "start": 4690.88, + "end": 4696.72, "text": " So that''s something I can highly, highly recommend anyone. 
+ I think one of the things that has really", "tokens": [50364, 407, 300, 311, 746, + 286, 393, 5405, 11, 5405, 2748, 2878, 13, 286, 519, 472, 295, 264, 721, 300, 575, + 534, 50656], "temperature": 0.0, "avg_logprob": -0.09136404557661577, "compression_ratio": + 1.7642857142857142, "no_speech_prob": 0.003882095217704773}, {"id": 766, "seek": + 469088, "start": 4696.72, "end": 4702.8, "text": " helped me in principle is like + starting with the documentation or maybe the test cases. I think", "tokens": [50656, + 4254, 385, 294, 8665, 307, 411, 2891, 365, 264, 14333, 420, 1310, 264, 1500, 3331, + 13, 286, 519, 50960], "temperature": 0.0, "avg_logprob": -0.09136404557661577, "compression_ratio": + 1.7642857142857142, "no_speech_prob": 0.003882095217704773}, {"id": 767, "seek": + 469088, "start": 4702.8, "end": 4708.32, "text": " that''s very classic advice that + you would get from anyone who''s been working and contributing. But", "tokens": + [50960, 300, 311, 588, 7230, 5192, 300, 291, 576, 483, 490, 2878, 567, 311, 668, + 1364, 293, 19270, 13, 583, 51236], "temperature": 0.0, "avg_logprob": -0.09136404557661577, + "compression_ratio": 1.7642857142857142, "no_speech_prob": 0.003882095217704773}, + {"id": 768, "seek": 469088, "start": 4708.32, "end": 4712.64, "text": " that''s + really the right way to go about it. I think documentation really helps because + you can not", "tokens": [51236, 300, 311, 534, 264, 558, 636, 281, 352, 466, 309, + 13, 286, 519, 14333, 534, 3665, 570, 291, 393, 406, 51452], "temperature": 0.0, + "avg_logprob": -0.09136404557661577, "compression_ratio": 1.7642857142857142, "no_speech_prob": + 0.003882095217704773}, {"id": 769, "seek": 469088, "start": 4712.64, "end": 4717.52, + "text": " just write anything like that. You have to try things out yourself before + writing about them. 
And I", "tokens": [51452, 445, 2464, 1340, 411, 300, 13, 509, + 362, 281, 853, 721, 484, 1803, 949, 3579, 466, 552, 13, 400, 286, 51696], "temperature": + 0.0, "avg_logprob": -0.09136404557661577, "compression_ratio": 1.7642857142857142, + "no_speech_prob": 0.003882095217704773}, {"id": 770, "seek": 471752, "start": 4717.6, + "end": 4722.72, "text": " think that just gives you enough context to pick up some + tickets and like start solving them.", "tokens": [50368, 519, 300, 445, 2709, 291, + 1547, 4319, 281, 1888, 493, 512, 12628, 293, 411, 722, 12606, 552, 13, 50624], "temperature": + 0.0, "avg_logprob": -0.12343944298042046, "compression_ratio": 1.61864406779661, + "no_speech_prob": 0.009592908434569836}, {"id": 771, "seek": 471752, "start": 4723.280000000001, + "end": 4732.240000000001, "text": " I am actually mentoring some of the people like + outside my job to contribute on open source.", "tokens": [50652, 286, 669, 767, + 30257, 512, 295, 264, 561, 411, 2380, 452, 1691, 281, 10586, 322, 1269, 4009, 13, + 51100], "temperature": 0.0, "avg_logprob": -0.12343944298042046, "compression_ratio": + 1.61864406779661, "no_speech_prob": 0.009592908434569836}, {"id": 772, "seek": 471752, + "start": 4732.240000000001, "end": 4738.4800000000005, "text": " And just so you + know, these are the people who have been in the industry for like 20, 25 years.", + "tokens": [51100, 400, 445, 370, 291, 458, 11, 613, 366, 264, 561, 567, 362, 668, + 294, 264, 3518, 337, 411, 945, 11, 3552, 924, 13, 51412], "temperature": 0.0, "avg_logprob": + -0.12343944298042046, "compression_ratio": 1.61864406779661, "no_speech_prob": 0.009592908434569836}, + {"id": 773, "seek": 471752, "start": 4738.4800000000005, "end": 4743.76, "text": + " So I mean, there are people who come with Lord of hesitation that I mean, is it + even something that", "tokens": [51412, 407, 286, 914, 11, 456, 366, 561, 567, 808, + 365, 3257, 295, 36125, 300, 286, 914, 11, 307, 309, 754, 746, 300, 51676], "temperature": + 
0.0, "avg_logprob": -0.12343944298042046, "compression_ratio": 1.61864406779661, + "no_speech_prob": 0.009592908434569836}, {"id": 774, "seek": 474376, "start": 4743.76, + "end": 4749.04, "text": " I should do? But it''s nice. I mean, the kind of questions + they bring in and the kind of discussions", "tokens": [50364, 286, 820, 360, 30, + 583, 309, 311, 1481, 13, 286, 914, 11, 264, 733, 295, 1651, 436, 1565, 294, 293, + 264, 733, 295, 11088, 50628], "temperature": 0.0, "avg_logprob": -0.17519226814936667, + "compression_ratio": 1.617283950617284, "no_speech_prob": 0.004983008839190006}, + {"id": 775, "seek": 474376, "start": 4749.04, "end": 4757.4400000000005, "text": + " we have. It''s amazing. So yeah. Yeah, I mean, for sure, like what you said, makes + so much sense,", "tokens": [50628, 321, 362, 13, 467, 311, 2243, 13, 407, 1338, + 13, 865, 11, 286, 914, 11, 337, 988, 11, 411, 437, 291, 848, 11, 1669, 370, 709, + 2020, 11, 51048], "temperature": 0.0, "avg_logprob": -0.17519226814936667, "compression_ratio": + 1.617283950617284, "no_speech_prob": 0.004983008839190006}, {"id": 776, "seek": + 474376, "start": 4757.4400000000005, "end": 4764.4800000000005, "text": " you know, + like you read the docs, you will and you will test it out probably as part of your + job", "tokens": [51048, 291, 458, 11, 411, 291, 1401, 264, 45623, 11, 291, 486, + 293, 291, 486, 1500, 309, 484, 1391, 382, 644, 295, 428, 1691, 51400], "temperature": + 0.0, "avg_logprob": -0.17519226814936667, "compression_ratio": 1.617283950617284, + "no_speech_prob": 0.004983008839190006}, {"id": 777, "seek": 474376, "start": 4764.4800000000005, + "end": 4771.360000000001, "text": " or of your research project or whatever or your + curiosity, right? 
And something will definitely not", "tokens": [51400, 420, 295, + 428, 2132, 1716, 420, 2035, 420, 428, 18769, 11, 558, 30, 400, 746, 486, 2138, 406, + 51744], "temperature": 0.0, "avg_logprob": -0.17519226814936667, "compression_ratio": + 1.617283950617284, "no_speech_prob": 0.004983008839190006}, {"id": 778, "seek": + 477136, "start": 4771.36, "end": 4777.679999999999, "text": " be right. I mean, + whatever said in the doc is not longer true or it doesn''t apply for use case.", + "tokens": [50364, 312, 558, 13, 286, 914, 11, 2035, 848, 294, 264, 3211, 307, 406, + 2854, 2074, 420, 309, 1177, 380, 3079, 337, 764, 1389, 13, 50680], "temperature": + 0.0, "avg_logprob": -0.16831255932243502, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.0029066165443509817}, {"id": 779, "seek": 477136, "start": 4777.679999999999, + "end": 4783.679999999999, "text": " But how did you you said, you know, this imposter + syndrome and like just general fear or like", "tokens": [50680, 583, 577, 630, 291, + 291, 848, 11, 291, 458, 11, 341, 704, 7096, 19371, 293, 411, 445, 2674, 4240, 420, + 411, 50980], "temperature": 0.0, "avg_logprob": -0.16831255932243502, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.0029066165443509817}, {"id": 780, "seek": + 477136, "start": 4784.24, "end": 4792.08, "text": " you didn''t have experience + with it? How did you kind of leap from from what you just encountered", "tokens": + [51008, 291, 994, 380, 362, 1752, 365, 309, 30, 1012, 630, 291, 733, 295, 19438, + 490, 490, 437, 291, 445, 20381, 51400], "temperature": 0.0, "avg_logprob": -0.16831255932243502, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.0029066165443509817}, + {"id": 781, "seek": 477136, "start": 4792.08, "end": 4800.48, "text": " that doesn''t + work to actually going and contributing it? I think that that is something. 
I mean, + I''ve", "tokens": [51400, 300, 1177, 380, 589, 281, 767, 516, 293, 19270, 309, 30, + 286, 519, 300, 300, 307, 746, 13, 286, 914, 11, 286, 600, 51820], "temperature": + 0.0, "avg_logprob": -0.16831255932243502, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.0029066165443509817}, {"id": 782, "seek": 480048, "start": 4800.5599999999995, + "end": 4808.799999999999, "text": " so I think the first time it took kind of the + bigger leap of faith, then rest of the other times.", "tokens": [50368, 370, 286, + 519, 264, 700, 565, 309, 1890, 733, 295, 264, 3801, 19438, 295, 4522, 11, 550, 1472, + 295, 264, 661, 1413, 13, 50780], "temperature": 0.0, "avg_logprob": -0.1445729804761482, + "compression_ratio": 1.736842105263158, "no_speech_prob": 0.002417856128886342}, + {"id": 783, "seek": 480048, "start": 4808.799999999999, "end": 4813.839999999999, + "text": " I think once it works, that''s what I''m saying. Like the first time is + always going to be the hardest.", "tokens": [50780, 286, 519, 1564, 309, 1985, 11, + 300, 311, 437, 286, 478, 1566, 13, 1743, 264, 700, 565, 307, 1009, 516, 281, 312, + 264, 13158, 13, 51032], "temperature": 0.0, "avg_logprob": -0.1445729804761482, + "compression_ratio": 1.736842105263158, "no_speech_prob": 0.002417856128886342}, + {"id": 784, "seek": 480048, "start": 4813.839999999999, "end": 4820.24, "text": + " I think you would always and I think not being able to contribute on solar for + almost like six", "tokens": [51032, 286, 519, 291, 576, 1009, 293, 286, 519, 406, + 885, 1075, 281, 10586, 322, 7936, 337, 1920, 411, 2309, 51352], "temperature": 0.0, + "avg_logprob": -0.1445729804761482, "compression_ratio": 1.736842105263158, "no_speech_prob": + 0.002417856128886342}, {"id": 785, "seek": 480048, "start": 4820.24, "end": 4825.599999999999, + "text": " years, I have been away from solar and I can say like a lot of things + I already implemented when they", "tokens": [51352, 924, 11, 286, 362, 668, 1314, + 490, 7936, 
293, 286, 393, 584, 411, 257, 688, 295, 721, 286, 1217, 12270, 562, 436, + 51620], "temperature": 0.0, "avg_logprob": -0.1445729804761482, "compression_ratio": + 1.736842105263158, "no_speech_prob": 0.002417856128886342}, {"id": 786, "seek": + 482560, "start": 4825.6, "end": 4833.68, "text": " came out in solar for point 10, + I think that''s what I started working on in 2012 or 14.", "tokens": [50364, 1361, + 484, 294, 7936, 337, 935, 1266, 11, 286, 519, 300, 311, 437, 286, 1409, 1364, 322, + 294, 9125, 420, 3499, 13, 50768], "temperature": 0.0, "avg_logprob": -0.13995523078768862, + "compression_ratio": 1.668141592920354, "no_speech_prob": 0.0028181944508105516}, + {"id": 787, "seek": 482560, "start": 4834.240000000001, "end": 4841.76, "text": + " And it was kind of this grudge that really like made me feel and took me on a + guilt trip that I", "tokens": [50796, 400, 309, 390, 733, 295, 341, 677, 16032, + 300, 534, 411, 1027, 385, 841, 293, 1890, 385, 322, 257, 20421, 4931, 300, 286, + 51172], "temperature": 0.0, "avg_logprob": -0.13995523078768862, "compression_ratio": + 1.668141592920354, "no_speech_prob": 0.0028181944508105516}, {"id": 788, "seek": + 482560, "start": 4841.76, "end": 4847.280000000001, "text": " could have done that + just that I did not have a knowledge of. 
So which is which is why I mean,", "tokens": + [51172, 727, 362, 1096, 300, 445, 300, 286, 630, 406, 362, 257, 3601, 295, 13, 407, + 597, 307, 597, 307, 983, 286, 914, 11, 51448], "temperature": 0.0, "avg_logprob": + -0.13995523078768862, "compression_ratio": 1.668141592920354, "no_speech_prob": + 0.0028181944508105516}, {"id": 789, "seek": 482560, "start": 4847.280000000001, + "end": 4852.64, "text": " I''m saying that if you think about that you could do + it, I mean, just think about the worst thing.", "tokens": [51448, 286, 478, 1566, + 300, 498, 291, 519, 466, 300, 291, 727, 360, 309, 11, 286, 914, 11, 445, 519, 466, + 264, 5855, 551, 13, 51716], "temperature": 0.0, "avg_logprob": -0.13995523078768862, + "compression_ratio": 1.668141592920354, "no_speech_prob": 0.0028181944508105516}, + {"id": 790, "seek": 485264, "start": 4852.72, "end": 4857.84, "text": " Worst is + that somebody would reject and ask you to rework on it. But from your side, you + would", "tokens": [50368, 26363, 372, 307, 300, 2618, 576, 8248, 293, 1029, 291, + 281, 48376, 322, 309, 13, 583, 490, 428, 1252, 11, 291, 576, 50624], "temperature": + 0.0, "avg_logprob": -0.1308232843875885, "compression_ratio": 1.830258302583026, + "no_speech_prob": 0.014976073056459427}, {"id": 791, "seek": 485264, "start": 4857.84, + "end": 4863.84, "text": " have the satisfaction that I thought this could be done + this way and I tried to you know, do it", "tokens": [50624, 362, 264, 18715, 300, + 286, 1194, 341, 727, 312, 1096, 341, 636, 293, 286, 3031, 281, 291, 458, 11, 360, + 309, 50924], "temperature": 0.0, "avg_logprob": -0.1308232843875885, "compression_ratio": + 1.830258302583026, "no_speech_prob": 0.014976073056459427}, {"id": 792, "seek": + 485264, "start": 4863.84, "end": 4869.04, "text": " this way. And I think that''s + what that''s how I am living now. 
I mean, if I have something on my mind,", "tokens": + [50924, 341, 636, 13, 400, 286, 519, 300, 311, 437, 300, 311, 577, 286, 669, 2647, + 586, 13, 286, 914, 11, 498, 286, 362, 746, 322, 452, 1575, 11, 51184], "temperature": + 0.0, "avg_logprob": -0.1308232843875885, "compression_ratio": 1.830258302583026, + "no_speech_prob": 0.014976073056459427}, {"id": 793, "seek": 485264, "start": 4869.04, + "end": 4874.88, "text": " sitting on my mind that this is how it should be done, + I just execute it because that is what is in my", "tokens": [51184, 3798, 322, 452, + 1575, 300, 341, 307, 577, 309, 820, 312, 1096, 11, 286, 445, 14483, 309, 570, 300, + 307, 437, 307, 294, 452, 51476], "temperature": 0.0, "avg_logprob": -0.1308232843875885, + "compression_ratio": 1.830258302583026, "no_speech_prob": 0.014976073056459427}, + {"id": 794, "seek": 485264, "start": 4874.88, "end": 4881.6, "text": " control. + Like I can execute it. Approving or not approving something is something that I + delegate to", "tokens": [51476, 1969, 13, 1743, 286, 393, 14483, 309, 13, 29551, + 798, 420, 406, 2075, 798, 746, 307, 746, 300, 286, 40999, 281, 51812], "temperature": + 0.0, "avg_logprob": -0.1308232843875885, "compression_ratio": 1.830258302583026, + "no_speech_prob": 0.014976073056459427}, {"id": 795, "seek": 488160, "start": 4881.6, + "end": 4888.88, "text": " the rest, the other person. 
So I think that is also something + that we brought up in the discussion", "tokens": [50364, 264, 1472, 11, 264, 661, + 954, 13, 407, 286, 519, 300, 307, 611, 746, 300, 321, 3038, 493, 294, 264, 5017, + 50728], "temperature": 0.0, "avg_logprob": -0.13668123881022134, "compression_ratio": + 1.6291666666666667, "no_speech_prob": 0.008469302207231522}, {"id": 796, "seek": + 488160, "start": 4888.88, "end": 4895.360000000001, "text": " with Sebastian that + Lord of Women, when discussing like how can we become like better public speakers,", + "tokens": [50728, 365, 31102, 300, 3257, 295, 11065, 11, 562, 10850, 411, 577, 393, + 321, 1813, 411, 1101, 1908, 9518, 11, 51052], "temperature": 0.0, "avg_logprob": + -0.13668123881022134, "compression_ratio": 1.6291666666666667, "no_speech_prob": + 0.008469302207231522}, {"id": 797, "seek": 488160, "start": 4895.360000000001, "end": + 4899.4400000000005, "text": " asked him that you know, what is a good topic to be + submitted in the conference? Like how do I", "tokens": [51052, 2351, 796, 300, 291, + 458, 11, 437, 307, 257, 665, 4829, 281, 312, 14405, 294, 264, 7586, 30, 1743, 577, + 360, 286, 51256], "temperature": 0.0, "avg_logprob": -0.13668123881022134, "compression_ratio": + 1.6291666666666667, "no_speech_prob": 0.008469302207231522}, {"id": 798, "seek": + 488160, "start": 4899.4400000000005, "end": 4906.240000000001, "text": " ensure + like my topic is accepted? And he made a way, I would say like a reasonable advice + that", "tokens": [51256, 5586, 411, 452, 4829, 307, 9035, 30, 400, 415, 1027, 257, + 636, 11, 286, 576, 584, 411, 257, 10585, 5192, 300, 51596], "temperature": 0.0, + "avg_logprob": -0.13668123881022134, "compression_ratio": 1.6291666666666667, "no_speech_prob": + 0.008469302207231522}, {"id": 799, "seek": 490624, "start": 4906.32, "end": 4912.0, + "text": " from your side, what you can do is like submit the proposal. 
I mean, accepting, + not accepting is", "tokens": [50368, 490, 428, 1252, 11, 437, 291, 393, 360, 307, + 411, 10315, 264, 11494, 13, 286, 914, 11, 17391, 11, 406, 17391, 307, 50652], "temperature": + 0.0, "avg_logprob": -0.1597775000113028, "compression_ratio": 1.8, "no_speech_prob": + 0.00816268939524889}, {"id": 800, "seek": 490624, "start": 4912.0, "end": 4919.76, + "text": " something, I mean, once it is you know, like submitted, it goes to the + next swimling and that''s when", "tokens": [50652, 746, 11, 286, 914, 11, 1564, + 309, 307, 291, 458, 11, 411, 14405, 11, 309, 1709, 281, 264, 958, 1693, 332, 1688, + 293, 300, 311, 562, 51040], "temperature": 0.0, "avg_logprob": -0.1597775000113028, + "compression_ratio": 1.8, "no_speech_prob": 0.00816268939524889}, {"id": 801, "seek": + 490624, "start": 4919.76, "end": 4924.0, "text": " you stop thinking about it. If + it comes back to you, that''s when you need to think about like what do", "tokens": + [51040, 291, 1590, 1953, 466, 309, 13, 759, 309, 1487, 646, 281, 291, 11, 300, 311, + 562, 291, 643, 281, 519, 466, 411, 437, 360, 51252], "temperature": 0.0, "avg_logprob": + -0.1597775000113028, "compression_ratio": 1.8, "no_speech_prob": 0.00816268939524889}, + {"id": 802, "seek": 490624, "start": 4924.0, "end": 4930.32, "text": " I need to + present to woo the crowd and making sure like you make a point. But then that''s + what even I do.", "tokens": [51252, 286, 643, 281, 1974, 281, 21657, 264, 6919, + 293, 1455, 988, 411, 291, 652, 257, 935, 13, 583, 550, 300, 311, 437, 754, 286, + 360, 13, 51568], "temperature": 0.0, "avg_logprob": -0.1597775000113028, "compression_ratio": + 1.8, "no_speech_prob": 0.00816268939524889}, {"id": 803, "seek": 493032, "start": + 4930.88, "end": 4938.24, "text": " Like if I think about that, this is how it should + be done, I just do it. 
I like maxes like it would", "tokens": [50392, 1743, 498, + 286, 519, 466, 300, 11, 341, 307, 577, 309, 820, 312, 1096, 11, 286, 445, 360, 309, + 13, 286, 411, 11469, 279, 411, 309, 576, 50760], "temperature": 0.0, "avg_logprob": + -0.15798392119231047, "compression_ratio": 1.7857142857142858, "no_speech_prob": + 0.04455038160085678}, {"id": 804, "seek": 493032, "start": 4938.24, "end": 4943.12, + "text": " be rejected. It would not be accepted or somebody would say like, oh, + you know what, this is wrong.", "tokens": [50760, 312, 15749, 13, 467, 576, 406, + 312, 9035, 420, 2618, 576, 584, 411, 11, 1954, 11, 291, 458, 437, 11, 341, 307, + 2085, 13, 51004], "temperature": 0.0, "avg_logprob": -0.15798392119231047, "compression_ratio": + 1.7857142857142858, "no_speech_prob": 0.04455038160085678}, {"id": 805, "seek": + 493032, "start": 4943.12, "end": 4949.92, "text": " This is now hot. This is not + how it should be done. That''s about it. You get a feedback. Either you get", "tokens": + [51004, 639, 307, 586, 2368, 13, 639, 307, 406, 577, 309, 820, 312, 1096, 13, 663, + 311, 466, 309, 13, 509, 483, 257, 5824, 13, 13746, 291, 483, 51344], "temperature": + 0.0, "avg_logprob": -0.15798392119231047, "compression_ratio": 1.7857142857142858, + "no_speech_prob": 0.04455038160085678}, {"id": 806, "seek": 493032, "start": 4949.92, + "end": 4956.719999999999, "text": " the recognition of the stuff that you''ve contributed + or you get feedback. Both of them are good.", "tokens": [51344, 264, 11150, 295, + 264, 1507, 300, 291, 600, 18434, 420, 291, 483, 5824, 13, 6767, 295, 552, 366, 665, + 13, 51684], "temperature": 0.0, "avg_logprob": -0.15798392119231047, "compression_ratio": + 1.7857142857142858, "no_speech_prob": 0.04455038160085678}, {"id": 807, "seek": + 495672, "start": 4957.4400000000005, "end": 4964.4800000000005, "text": " Yeah, + that''s beautifully put. I remember about one person who was like a Java champion. 
+ I don''t know", "tokens": [50400, 865, 11, 300, 311, 16525, 829, 13, 286, 1604, + 466, 472, 954, 567, 390, 411, 257, 10745, 10971, 13, 286, 500, 380, 458, 50752], + "temperature": 0.0, "avg_logprob": -0.14521085373078935, "compression_ratio": 1.6556016597510372, + "no_speech_prob": 0.07771903276443481}, {"id": 808, "seek": 495672, "start": 4964.4800000000005, + "end": 4970.320000000001, "text": " if you know of this title that was awarded at + some point by some microsystems. I think it''s called", "tokens": [50752, 498, 291, + 458, 295, 341, 4876, 300, 390, 19100, 412, 512, 935, 538, 512, 15547, 9321, 82, + 13, 286, 519, 309, 311, 1219, 51044], "temperature": 0.0, "avg_logprob": -0.14521085373078935, + "compression_ratio": 1.6556016597510372, "no_speech_prob": 0.07771903276443481}, + {"id": 809, "seek": 495672, "start": 4970.320000000001, "end": 4977.04, "text": + " Java champion. You know, people who really popularized Java and you know, talked + about it, written", "tokens": [51044, 10745, 10971, 13, 509, 458, 11, 561, 567, + 534, 3743, 1602, 10745, 293, 291, 458, 11, 2825, 466, 309, 11, 3720, 51380], "temperature": + 0.0, "avg_logprob": -0.14521085373078935, "compression_ratio": 1.6556016597510372, + "no_speech_prob": 0.07771903276443481}, {"id": 810, "seek": 495672, "start": 4977.04, + "end": 4985.52, "text": " books and contributed code and so on and so forth. 
And + he was saying, you know, whenever he received", "tokens": [51380, 3642, 293, 18434, + 3089, 293, 370, 322, 293, 370, 5220, 13, 400, 415, 390, 1566, 11, 291, 458, 11, + 5699, 415, 4613, 51804], "temperature": 0.0, "avg_logprob": -0.14521085373078935, + "compression_ratio": 1.6556016597510372, "no_speech_prob": 0.07771903276443481}, + {"id": 811, "seek": 498552, "start": 4985.52, "end": 4992.64, "text": " the question + and he had like 20, 25 years experience in this and he was saying, hey, when I hear", + "tokens": [50364, 264, 1168, 293, 415, 632, 411, 945, 11, 3552, 924, 1752, 294, + 341, 293, 415, 390, 1566, 11, 4177, 11, 562, 286, 1568, 50720], "temperature": 0.0, + "avg_logprob": -0.14754085540771483, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.012738942168653011}, {"id": 812, "seek": 498552, "start": 4992.64, "end": 4999.120000000001, + "text": " a question from a newcomer, doubting themselves million, million of times, + you know, should I apply", "tokens": [50720, 257, 1168, 490, 257, 40014, 260, 11, + 10831, 783, 2969, 2459, 11, 2459, 295, 1413, 11, 291, 458, 11, 820, 286, 3079, 51044], + "temperature": 0.0, "avg_logprob": -0.14754085540771483, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.012738942168653011}, {"id": 813, "seek": 498552, "start": 4999.120000000001, + "end": 5005.120000000001, "text": " for this job or should I become, should I commit + something here like code or whatever. I am just", "tokens": [51044, 337, 341, 1691, + 420, 820, 286, 1813, 11, 820, 286, 5599, 746, 510, 411, 3089, 420, 2035, 13, 286, + 669, 445, 51344], "temperature": 0.0, "avg_logprob": -0.14754085540771483, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.012738942168653011}, {"id": 814, "seek": + 498552, "start": 5005.120000000001, "end": 5012.56, "text": " afraid. He was saying, + okay, go and get your rejection. 
So it''s like that first leap that you take", "tokens": + [51344, 4638, 13, 634, 390, 1566, 11, 1392, 11, 352, 293, 483, 428, 26044, 13, 407, + 309, 311, 411, 300, 700, 19438, 300, 291, 747, 51716], "temperature": 0.0, "avg_logprob": + -0.14754085540771483, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.012738942168653011}, {"id": 815, "seek": 501256, "start": 5012.56, "end": 5019.84, + "text": " and it becomes bit more like a game. I mean, it''s like and he had a bit + of like humor here, right?", "tokens": [50364, 293, 309, 3643, 857, 544, 411, 257, + 1216, 13, 286, 914, 11, 309, 311, 411, 293, 415, 632, 257, 857, 295, 411, 14318, + 510, 11, 558, 30, 50728], "temperature": 0.0, "avg_logprob": -0.17050054235365783, + "compression_ratio": 1.6929824561403508, "no_speech_prob": 0.004583640024065971}, + {"id": 816, "seek": 501256, "start": 5019.84, "end": 5025.6, "text": " Okay. So + you''re doubting yourself, you think you will get rejected, then prove it, go and + get it.", "tokens": [50728, 1033, 13, 407, 291, 434, 10831, 783, 1803, 11, 291, + 519, 291, 486, 483, 15749, 11, 550, 7081, 309, 11, 352, 293, 483, 309, 13, 51016], + "temperature": 0.0, "avg_logprob": -0.17050054235365783, "compression_ratio": 1.6929824561403508, + "no_speech_prob": 0.004583640024065971}, {"id": 817, "seek": 501256, "start": 5025.6, + "end": 5031.6, "text": " Right. 
And as they go and get it, they will probably either + get it or succeed or get some partial", "tokens": [51016, 1779, 13, 400, 382, 436, + 352, 293, 483, 309, 11, 436, 486, 1391, 2139, 483, 309, 420, 7754, 420, 483, 512, + 14641, 51316], "temperature": 0.0, "avg_logprob": -0.17050054235365783, "compression_ratio": + 1.6929824561403508, "no_speech_prob": 0.004583640024065971}, {"id": 818, "seek": + 501256, "start": 5032.400000000001, "end": 5038.88, "text": " you know, stage and + to the partial stage and probably need to revise something and so on to", "tokens": + [51356, 291, 458, 11, 3233, 293, 281, 264, 14641, 3233, 293, 1391, 643, 281, 44252, + 746, 293, 370, 322, 281, 51680], "temperature": 0.0, "avg_logprob": -0.17050054235365783, + "compression_ratio": 1.6929824561403508, "no_speech_prob": 0.004583640024065971}, + {"id": 819, "seek": 503888, "start": 5039.52, "end": 5044.96, "text": " right. Right. + No, that is that is amazing. I mean, I love it. And I that also brings back like + I", "tokens": [50396, 558, 13, 1779, 13, 883, 11, 300, 307, 300, 307, 2243, 13, + 286, 914, 11, 286, 959, 309, 13, 400, 286, 300, 611, 5607, 646, 411, 286, 50668], + "temperature": 0.0, "avg_logprob": -0.2083697950983622, "compression_ratio": 1.5340314136125655, + "no_speech_prob": 0.03184819594025612}, {"id": 820, "seek": 503888, "start": 5044.96, + "end": 5051.36, "text": " used to have this email signature. I think in during the + time when I was in college that I never", "tokens": [50668, 1143, 281, 362, 341, + 3796, 13397, 13, 286, 519, 294, 1830, 264, 565, 562, 286, 390, 294, 3859, 300, 286, + 1128, 50988], "temperature": 0.0, "avg_logprob": -0.2083697950983622, "compression_ratio": + 1.5340314136125655, "no_speech_prob": 0.03184819594025612}, {"id": 821, "seek": + 503888, "start": 5051.36, "end": 5059.12, "text": " lose either when or I learn. + Yeah. That just brought back. I mean, I don''t know. 
I at what point in", "tokens": + [50988, 3624, 2139, 562, 420, 286, 1466, 13, 865, 13, 663, 445, 3038, 646, 13, 286, + 914, 11, 286, 500, 380, 458, 13, 286, 412, 437, 935, 294, 51376], "temperature": + 0.0, "avg_logprob": -0.2083697950983622, "compression_ratio": 1.5340314136125655, + "no_speech_prob": 0.03184819594025612}, {"id": 822, "seek": 505912, "start": 5059.12, + "end": 5068.16, "text": " time like I removed it, but I really lived by that line + that you either learn or you win. So", "tokens": [50364, 565, 411, 286, 7261, 309, + 11, 457, 286, 534, 5152, 538, 300, 1622, 300, 291, 2139, 1466, 420, 291, 1942, 13, + 407, 50816], "temperature": 0.0, "avg_logprob": -0.2006537975409092, "compression_ratio": + 1.6263736263736264, "no_speech_prob": 0.01986653357744217}, {"id": 823, "seek": + 505912, "start": 5069.36, "end": 5076.88, "text": " either way, it''s a win. Yeah. + And like, probably getting ahead of ourselves, but like even when you", "tokens": + [50876, 2139, 636, 11, 309, 311, 257, 1942, 13, 865, 13, 400, 411, 11, 1391, 1242, + 2286, 295, 4175, 11, 457, 411, 754, 562, 291, 51252], "temperature": 0.0, "avg_logprob": + -0.2006537975409092, "compression_ratio": 1.6263736263736264, "no_speech_prob": + 0.01986653357744217}, {"id": 824, "seek": 505912, "start": 5076.88, "end": 5083.68, + "text": " start getting these small wins, they have start and end really like by + the end of the win, you''re like,", "tokens": [51252, 722, 1242, 613, 1359, 10641, + 11, 436, 362, 722, 293, 917, 534, 411, 538, 264, 917, 295, 264, 1942, 11, 291, 434, + 411, 11, 51592], "temperature": 0.0, "avg_logprob": -0.2006537975409092, "compression_ratio": + 1.6263736263736264, "no_speech_prob": 0.01986653357744217}, {"id": 825, "seek": + 508368, "start": 5083.76, "end": 5089.04, "text": " okay, what should they do next? 
+ You know, next challenge for me, you know, and that''s probably you", "tokens": + [50368, 1392, 11, 437, 820, 436, 360, 958, 30, 509, 458, 11, 958, 3430, 337, 385, + 11, 291, 458, 11, 293, 300, 311, 1391, 291, 50632], "temperature": 0.0, "avg_logprob": + -0.14469994789312693, "compression_ratio": 1.8837209302325582, "no_speech_prob": + 0.006543614435940981}, {"id": 826, "seek": 508368, "start": 5089.04, "end": 5095.280000000001, + "text": " already had of yourself and you''re doing a lot of work already, but at + the same time, that", "tokens": [50632, 1217, 632, 295, 1803, 293, 291, 434, 884, + 257, 688, 295, 589, 1217, 11, 457, 412, 264, 912, 565, 11, 300, 50944], "temperature": + 0.0, "avg_logprob": -0.14469994789312693, "compression_ratio": 1.8837209302325582, + "no_speech_prob": 0.006543614435940981}, {"id": 827, "seek": 508368, "start": 5095.280000000001, + "end": 5100.72, "text": " challenge is always there. And that''s the part of the + game and part of the reinforcement learning.", "tokens": [50944, 3430, 307, 1009, + 456, 13, 400, 300, 311, 264, 644, 295, 264, 1216, 293, 644, 295, 264, 29280, 2539, + 13, 51216], "temperature": 0.0, "avg_logprob": -0.14469994789312693, "compression_ratio": + 1.8837209302325582, "no_speech_prob": 0.006543614435940981}, {"id": 828, "seek": + 508368, "start": 5101.360000000001, "end": 5106.96, "text": " And that is actually + the correct terminology. I was just looking for that term. I think reenforcement", + "tokens": [51248, 400, 300, 307, 767, 264, 3006, 27575, 13, 286, 390, 445, 1237, + 337, 300, 1433, 13, 286, 519, 319, 268, 9382, 51528], "temperature": 0.0, "avg_logprob": + -0.14469994789312693, "compression_ratio": 1.8837209302325582, "no_speech_prob": + 0.006543614435940981}, {"id": 829, "seek": 508368, "start": 5106.96, "end": 5113.200000000001, + "text": " learning is I think that''s exactly how it happens. 
I think for me, it''s + exactly how it''s going.", "tokens": [51528, 2539, 307, 286, 519, 300, 311, 2293, + 577, 309, 2314, 13, 286, 519, 337, 385, 11, 309, 311, 2293, 577, 309, 311, 516, + 13, 51840], "temperature": 0.0, "avg_logprob": -0.14469994789312693, "compression_ratio": + 1.8837209302325582, "no_speech_prob": 0.006543614435940981}, {"id": 830, "seek": + 511368, "start": 5113.68, "end": 5120.8, "text": " Like I contribute something, + then I get feedback and I present it. And I think I start, you know,", "tokens": + [50364, 1743, 286, 10586, 746, 11, 550, 286, 483, 5824, 293, 286, 1974, 309, 13, + 400, 286, 519, 286, 722, 11, 291, 458, 11, 50720], "temperature": 0.0, "avg_logprob": + -0.1287408653570681, "compression_ratio": 1.651063829787234, "no_speech_prob": 0.0020228810608386993}, + {"id": 831, "seek": 511368, "start": 5121.84, "end": 5126.88, "text": " becoming + more kind of like passionate about the other stuff. So it''s like, you know, throwing", + "tokens": [50772, 5617, 544, 733, 295, 411, 11410, 466, 264, 661, 1507, 13, 407, + 309, 311, 411, 11, 291, 458, 11, 10238, 51024], "temperature": 0.0, "avg_logprob": + -0.1287408653570681, "compression_ratio": 1.651063829787234, "no_speech_prob": 0.0020228810608386993}, + {"id": 832, "seek": 511368, "start": 5126.88, "end": 5133.76, "text": " your hat + over the wall. 
And the next time you throw it at, I mean, throw your hat at even + higher wall.", "tokens": [51024, 428, 2385, 670, 264, 2929, 13, 400, 264, 958, 565, + 291, 3507, 309, 412, 11, 286, 914, 11, 3507, 428, 2385, 412, 754, 2946, 2929, 13, + 51368], "temperature": 0.0, "avg_logprob": -0.1287408653570681, "compression_ratio": + 1.651063829787234, "no_speech_prob": 0.0020228810608386993}, {"id": 833, "seek": + 511368, "start": 5134.56, "end": 5140.240000000001, "text": " So I''m not sure if + you understand that terminology, like this is something that''s like taking", "tokens": + [51408, 407, 286, 478, 406, 988, 498, 291, 1223, 300, 27575, 11, 411, 341, 307, + 746, 300, 311, 411, 1940, 51692], "temperature": 0.0, "avg_logprob": -0.1287408653570681, + "compression_ratio": 1.651063829787234, "no_speech_prob": 0.0020228810608386993}, + {"id": 834, "seek": 514024, "start": 5140.24, "end": 5147.679999999999, "text": + " your chances. So it''s like throwing hat over the wall. So yeah, that''s that''s + so I''m trying to make", "tokens": [50364, 428, 10486, 13, 407, 309, 311, 411, 10238, + 2385, 670, 264, 2929, 13, 407, 1338, 11, 300, 311, 300, 311, 370, 286, 478, 1382, + 281, 652, 50736], "temperature": 0.0, "avg_logprob": -0.16652506873721168, "compression_ratio": + 1.6610878661087867, "no_speech_prob": 0.0249177236109972}, {"id": 835, "seek": 514024, + "start": 5147.679999999999, "end": 5152.639999999999, "text": " sure like all the + feedback that I got from demos and the presentations that I give so far. I mean,", + "tokens": [50736, 988, 411, 439, 264, 5824, 300, 286, 658, 490, 33788, 293, 264, + 18964, 300, 286, 976, 370, 1400, 13, 286, 914, 11, 50984], "temperature": 0.0, "avg_logprob": + -0.16652506873721168, "compression_ratio": 1.6610878661087867, "no_speech_prob": + 0.0249177236109972}, {"id": 836, "seek": 514024, "start": 5152.639999999999, "end": + 5158.5599999999995, "text": " I, you know, collect all of it, making sure like I + address everything. 
I have big presentation", "tokens": [50984, 286, 11, 291, 458, + 11, 2500, 439, 295, 309, 11, 1455, 988, 411, 286, 2985, 1203, 13, 286, 362, 955, + 5860, 51280], "temperature": 0.0, "avg_logprob": -0.16652506873721168, "compression_ratio": + 1.6610878661087867, "no_speech_prob": 0.0249177236109972}, {"id": 837, "seek": 514024, + "start": 5158.5599999999995, "end": 5167.36, "text": " again at Berlin Buzzwords + to give this year. Yeah, I think it''s kind of like something. If I look back,", + "tokens": [51280, 797, 412, 13848, 29209, 13832, 281, 976, 341, 1064, 13, 865, 11, + 286, 519, 309, 311, 733, 295, 411, 746, 13, 759, 286, 574, 646, 11, 51720], "temperature": + 0.0, "avg_logprob": -0.16652506873721168, "compression_ratio": 1.6610878661087867, + "no_speech_prob": 0.0249177236109972}, {"id": 838, "seek": 516736, "start": 5167.36, + "end": 5173.599999999999, "text": " like the first time when I spoke at Berlin Buzzword + and I think year 2020, then that''s when the", "tokens": [50364, 411, 264, 700, + 565, 562, 286, 7179, 412, 13848, 29209, 7462, 293, 286, 519, 1064, 4808, 11, 550, + 300, 311, 562, 264, 50676], "temperature": 0.0, "avg_logprob": -0.11706775125831065, + "compression_ratio": 1.6244725738396624, "no_speech_prob": 0.006397884339094162}, + {"id": 839, "seek": 516736, "start": 5173.599999999999, "end": 5181.2, "text": " + conference was like online. It was something I remember that, I mean, I went to + my office", "tokens": [50676, 7586, 390, 411, 2950, 13, 467, 390, 746, 286, 1604, + 300, 11, 286, 914, 11, 286, 1437, 281, 452, 3398, 51056], "temperature": 0.0, "avg_logprob": + -0.11706775125831065, "compression_ratio": 1.6244725738396624, "no_speech_prob": + 0.006397884339094162}, {"id": 840, "seek": 516736, "start": 5181.2, "end": 5186.48, + "text": " because my kids would disturb me. 
And I have this, you know, like I was + so nervous, I had like five", "tokens": [51056, 570, 452, 2301, 576, 18071, 385, + 13, 400, 286, 362, 341, 11, 291, 458, 11, 411, 286, 390, 370, 6296, 11, 286, 632, + 411, 1732, 51320], "temperature": 0.0, "avg_logprob": -0.11706775125831065, "compression_ratio": + 1.6244725738396624, "no_speech_prob": 0.006397884339094162}, {"id": 841, "seek": + 516736, "start": 5186.48, "end": 5191.5199999999995, "text": " bottles of water + next to me. And then I did not drink anything because I felt like I would not have", + "tokens": [51320, 15923, 295, 1281, 958, 281, 385, 13, 400, 550, 286, 630, 406, + 2822, 1340, 570, 286, 2762, 411, 286, 576, 406, 362, 51572], "temperature": 0.0, + "avg_logprob": -0.11706775125831065, "compression_ratio": 1.6244725738396624, "no_speech_prob": + 0.006397884339094162}, {"id": 842, "seek": 519152, "start": 5191.52, "end": 5197.68, + "text": " to leave the seat for the toilet break if I drink that that much water. + So I was so confused like,", "tokens": [50364, 281, 1856, 264, 6121, 337, 264, 11137, + 1821, 498, 286, 2822, 300, 300, 709, 1281, 13, 407, 286, 390, 370, 9019, 411, 11, + 50672], "temperature": 0.0, "avg_logprob": -0.1263441619873047, "compression_ratio": + 1.7518518518518518, "no_speech_prob": 0.00958150066435337}, {"id": 843, "seek": + 519152, "start": 5197.68, "end": 5201.76, "text": " I''m thirsty because I''m talking + so much, but then I would not drink them because I would have to,", "tokens": [50672, + 286, 478, 28115, 570, 286, 478, 1417, 370, 709, 11, 457, 550, 286, 576, 406, 2822, + 552, 570, 286, 576, 362, 281, 11, 50876], "temperature": 0.0, "avg_logprob": -0.1263441619873047, + "compression_ratio": 1.7518518518518518, "no_speech_prob": 0.00958150066435337}, + {"id": 844, "seek": 519152, "start": 5201.76, "end": 5207.4400000000005, "text": + " you know, go away because I was hosting the lightning talk session, which went + on for like the", "tokens": [50876, 291, 458, 11, 352, 
1314, 570, 286, 390, 16058, + 264, 16589, 751, 5481, 11, 597, 1437, 322, 337, 411, 264, 51160], "temperature": + 0.0, "avg_logprob": -0.1263441619873047, "compression_ratio": 1.7518518518518518, + "no_speech_prob": 0.00958150066435337}, {"id": 845, "seek": 519152, "start": 5207.4400000000005, + "end": 5214.56, "text": " longer than a usual talk session. So it was, it was like + from that point in time until now, I think,", "tokens": [51160, 2854, 813, 257, + 7713, 751, 5481, 13, 407, 309, 390, 11, 309, 390, 411, 490, 300, 935, 294, 565, + 1826, 586, 11, 286, 519, 11, 51516], "temperature": 0.0, "avg_logprob": -0.1263441619873047, + "compression_ratio": 1.7518518518518518, "no_speech_prob": 0.00958150066435337}, + {"id": 846, "seek": 519152, "start": 5214.56, "end": 5220.080000000001, "text": + " yes, it''s been quite a journey. Yeah, amazing. I mean, this, this is really like", + "tokens": [51516, 2086, 11, 309, 311, 668, 1596, 257, 4671, 13, 865, 11, 2243, 13, + 286, 914, 11, 341, 11, 341, 307, 534, 411, 51792], "temperature": 0.0, "avg_logprob": + -0.1263441619873047, "compression_ratio": 1.7518518518518518, "no_speech_prob": + 0.00958150066435337}, {"id": 847, "seek": 522008, "start": 5220.5599999999995, "end": + 5229.2, "text": " coming to this logical question that I usually ask, like the question + of why, and like you shared a", "tokens": [50388, 1348, 281, 341, 14978, 1168, 300, + 286, 2673, 1029, 11, 411, 264, 1168, 295, 983, 11, 293, 411, 291, 5507, 257, 50820], + "temperature": 0.0, "avg_logprob": -0.2049399728644384, "compression_ratio": 1.5549738219895288, + "no_speech_prob": 0.012516120448708534}, {"id": 848, "seek": 522008, "start": 5229.2, + "end": 5237.6, "text": " lot today, you know, about women and search, your own journey, + public speaking. 
And there is always", "tokens": [50820, 688, 965, 11, 291, 458, + 11, 466, 2266, 293, 3164, 11, 428, 1065, 4671, 11, 1908, 4124, 13, 400, 456, 307, + 1009, 51240], "temperature": 0.0, "avg_logprob": -0.2049399728644384, "compression_ratio": + 1.5549738219895288, "no_speech_prob": 0.012516120448708534}, {"id": 849, "seek": + 522008, "start": 5237.6, "end": 5244.5599999999995, "text": " something new coming + up, new projects, new blog, as a result of that project, maybe new struggle,", "tokens": + [51240, 746, 777, 1348, 493, 11, 777, 4455, 11, 777, 6968, 11, 382, 257, 1874, 295, + 300, 1716, 11, 1310, 777, 7799, 11, 51588], "temperature": 0.0, "avg_logprob": -0.2049399728644384, + "compression_ratio": 1.5549738219895288, "no_speech_prob": 0.012516120448708534}, + {"id": 850, "seek": 524456, "start": 5244.56, "end": 5251.280000000001, "text": + " new learning and so on. But is there something that keeps you in this profession + beyond these", "tokens": [50364, 777, 2539, 293, 370, 322, 13, 583, 307, 456, 746, + 300, 5965, 291, 294, 341, 7032, 4399, 613, 50700], "temperature": 0.0, "avg_logprob": + -0.13571672570215512, "compression_ratio": 1.5977653631284916, "no_speech_prob": + 0.004477645270526409}, {"id": 851, "seek": 524456, "start": 5251.280000000001, "end": + 5263.280000000001, "text": " challenges, beyond solutions? Or is it just that? 
I + think it''s mostly, I don''t know, it''s, it''s", "tokens": [50700, 4759, 11, 4399, + 6547, 30, 1610, 307, 309, 445, 300, 30, 286, 519, 309, 311, 5240, 11, 286, 500, + 380, 458, 11, 309, 311, 11, 309, 311, 51300], "temperature": 0.0, "avg_logprob": + -0.13571672570215512, "compression_ratio": 1.5977653631284916, "no_speech_prob": + 0.004477645270526409}, {"id": 852, "seek": 524456, "start": 5263.280000000001, "end": + 5272.64, "text": " something so engaging and the joy that I get in solving things, + I think that just keeps me going", "tokens": [51300, 746, 370, 11268, 293, 264, + 6258, 300, 286, 483, 294, 12606, 721, 11, 286, 519, 300, 445, 5965, 385, 516, 51768], + "temperature": 0.0, "avg_logprob": -0.13571672570215512, "compression_ratio": 1.5977653631284916, + "no_speech_prob": 0.004477645270526409}, {"id": 853, "seek": 527264, "start": 5272.72, + "end": 5281.12, "text": " on and on. And I think it''s also like a commitment that + I have to myself that learning things and", "tokens": [50368, 322, 293, 322, 13, + 400, 286, 519, 309, 311, 611, 411, 257, 8371, 300, 286, 362, 281, 2059, 300, 2539, + 721, 293, 50788], "temperature": 0.0, "avg_logprob": -0.13755440385374304, "compression_ratio": + 1.5668449197860963, "no_speech_prob": 0.018185634166002274}, {"id": 854, "seek": + 527264, "start": 5281.12, "end": 5288.4800000000005, "text": " experimenting with + stuff that really brings the best out of me. So I mean, I have this somehow", "tokens": + [50788, 29070, 365, 1507, 300, 534, 5607, 264, 1151, 484, 295, 385, 13, 407, 286, + 914, 11, 286, 362, 341, 6063, 51156], "temperature": 0.0, "avg_logprob": -0.13755440385374304, + "compression_ratio": 1.5668449197860963, "no_speech_prob": 0.018185634166002274}, + {"id": 855, "seek": 527264, "start": 5288.4800000000005, "end": 5295.04, "text": + " ambition, I want to be, you know, like I want to know everything and maybe language + analysis. 
I mean", "tokens": [51156, 22814, 11, 286, 528, 281, 312, 11, 291, 458, + 11, 411, 286, 528, 281, 458, 1203, 293, 1310, 2856, 5215, 13, 286, 914, 51484], + "temperature": 0.0, "avg_logprob": -0.13755440385374304, "compression_ratio": 1.5668449197860963, + "no_speech_prob": 0.018185634166002274}, {"id": 856, "seek": 529504, "start": 5295.12, + "end": 5303.04, "text": " that, you know, so much, you know, has like a passion + that I have for these that I want to be like", "tokens": [50368, 300, 11, 291, 458, + 11, 370, 709, 11, 291, 458, 11, 575, 411, 257, 5418, 300, 286, 362, 337, 613, 300, + 286, 528, 281, 312, 411, 50764], "temperature": 0.0, "avg_logprob": -0.16483158146569488, + "compression_ratio": 1.6694915254237288, "no_speech_prob": 0.05429725721478462}, + {"id": 857, "seek": 529504, "start": 5303.04, "end": 5308.48, "text": " a PhD someday + in that. And which is why I want to know everything and somehow, like just aiming + for", "tokens": [50764, 257, 14476, 19412, 294, 300, 13, 400, 597, 307, 983, 286, + 528, 281, 458, 1203, 293, 6063, 11, 411, 445, 20253, 337, 51036], "temperature": + 0.0, "avg_logprob": -0.16483158146569488, "compression_ratio": 1.6694915254237288, + "no_speech_prob": 0.05429725721478462}, {"id": 858, "seek": 529504, "start": 5308.48, + "end": 5314.88, "text": " that goal, I know I''m like maybe old, old for doing PhD. + But yeah, that''s the goal, like keep for", "tokens": [51036, 300, 3387, 11, 286, + 458, 286, 478, 411, 1310, 1331, 11, 1331, 337, 884, 14476, 13, 583, 1338, 11, 300, + 311, 264, 3387, 11, 411, 1066, 337, 51356], "temperature": 0.0, "avg_logprob": -0.16483158146569488, + "compression_ratio": 1.6694915254237288, "no_speech_prob": 0.05429725721478462}, + {"id": 859, "seek": 529504, "start": 5314.88, "end": 5322.64, "text": " myself. 
+ Like if I, I think a short story before we end is that people ask me like, what + does your", "tokens": [51356, 2059, 13, 1743, 498, 286, 11, 286, 519, 257, 2099, + 1657, 949, 321, 917, 307, 300, 561, 1029, 385, 411, 11, 437, 775, 428, 51744], "temperature": + 0.0, "avg_logprob": -0.16483158146569488, "compression_ratio": 1.6694915254237288, + "no_speech_prob": 0.05429725721478462}, {"id": 860, "seek": 532264, "start": 5322.64, + "end": 5327.92, "text": " name means? And I think if you''re not noticed, like my + name is a pellandrom, my first name and my last", "tokens": [50364, 1315, 1355, + 30, 400, 286, 519, 498, 291, 434, 406, 5694, 11, 411, 452, 1315, 307, 257, 520, + 285, 474, 4397, 11, 452, 700, 1315, 293, 452, 1036, 50628], "temperature": 0.0, + "avg_logprob": -0.18540033243470272, "compression_ratio": 1.691358024691358, "no_speech_prob": + 0.008039119653403759}, {"id": 861, "seek": 532264, "start": 5327.92, "end": 5335.04, + "text": " name. So it''s a, the ITA, and if you reverse it, it stays the same. Also + people ask me like, what does", "tokens": [50628, 1315, 13, 407, 309, 311, 257, + 11, 264, 6783, 32, 11, 293, 498, 291, 9943, 309, 11, 309, 10834, 264, 912, 13, 2743, + 561, 1029, 385, 411, 11, 437, 775, 50984], "temperature": 0.0, "avg_logprob": -0.18540033243470272, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.008039119653403759}, + {"id": 862, "seek": 532264, "start": 5335.04, "end": 5342.88, "text": " it mean? + So my name is actually a machine learning model. So I learn from the past. That''s + what my name", "tokens": [50984, 309, 914, 30, 407, 452, 1315, 307, 767, 257, 3479, + 2539, 2316, 13, 407, 286, 1466, 490, 264, 1791, 13, 663, 311, 437, 452, 1315, 51376], + "temperature": 0.0, "avg_logprob": -0.18540033243470272, "compression_ratio": 1.691358024691358, + "no_speech_prob": 0.008039119653403759}, {"id": 863, "seek": 532264, "start": 5342.88, + "end": 5348.56, "text": " means. 
And I want to just, you know, prove my name that + if my parents, you know, thought of something", "tokens": [51376, 1355, 13, 400, + 286, 528, 281, 445, 11, 291, 458, 11, 7081, 452, 1315, 300, 498, 452, 3152, 11, + 291, 458, 11, 1194, 295, 746, 51660], "temperature": 0.0, "avg_logprob": -0.18540033243470272, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.008039119653403759}, + {"id": 864, "seek": 534856, "start": 5348.64, "end": 5356.320000000001, "text": + " before naming me at the time, they get what they expected from me. And that''s + all about it.", "tokens": [50368, 949, 25290, 385, 412, 264, 565, 11, 436, 483, + 437, 436, 5176, 490, 385, 13, 400, 300, 311, 439, 466, 309, 13, 50752], "temperature": + 0.0, "avg_logprob": -0.2131419803785241, "compression_ratio": 1.5851528384279476, + "no_speech_prob": 0.012252261862158775}, {"id": 865, "seek": 534856, "start": 5356.96, + "end": 5363.200000000001, "text": " Oh, very beautiful answer. Really? It''s like + living to your, to your true self.", "tokens": [50784, 876, 11, 588, 2238, 1867, + 13, 4083, 30, 467, 311, 411, 2647, 281, 428, 11, 281, 428, 2074, 2698, 13, 51096], + "temperature": 0.0, "avg_logprob": -0.2131419803785241, "compression_ratio": 1.5851528384279476, + "no_speech_prob": 0.012252261862158775}, {"id": 866, "seek": 534856, "start": 5364.080000000001, + "end": 5369.52, "text": " Yes. Amazing. Is there something you want to announce? + Are you already said that you''re going", "tokens": [51140, 1079, 13, 14165, 13, + 1119, 456, 746, 291, 528, 281, 7478, 30, 2014, 291, 1217, 848, 300, 291, 434, 516, + 51412], "temperature": 0.0, "avg_logprob": -0.2131419803785241, "compression_ratio": + 1.5851528384279476, "no_speech_prob": 0.012252261862158775}, {"id": 867, "seek": + 534856, "start": 5369.52, "end": 5374.96, "text": " to present at Marine Bosworth''s? 
+ Is there something else you want to share with the audience that", "tokens": [51412, + 281, 1974, 412, 20415, 22264, 13136, 311, 30, 1119, 456, 746, 1646, 291, 528, 281, + 2073, 365, 264, 4034, 300, 51684], "temperature": 0.0, "avg_logprob": -0.2131419803785241, + "compression_ratio": 1.5851528384279476, "no_speech_prob": 0.012252261862158775}, + {"id": 868, "seek": 537496, "start": 5375.04, "end": 5381.84, "text": " they should + know about? Maybe course or something, something that they should, you know,", "tokens": + [50368, 436, 820, 458, 466, 30, 2704, 1164, 420, 746, 11, 746, 300, 436, 820, 11, + 291, 458, 11, 50708], "temperature": 0.0, "avg_logprob": -0.2285439686108661, "compression_ratio": + 1.6177777777777778, "no_speech_prob": 0.03229888528585434}, {"id": 869, "seek": + 537496, "start": 5383.04, "end": 5388.64, "text": " get them and start it with. + Oh, yeah, for sure. I think so the elastic search version of", "tokens": [50768, + 483, 552, 293, 722, 309, 365, 13, 876, 11, 1338, 11, 337, 988, 13, 286, 519, 370, + 264, 17115, 3164, 3037, 295, 51048], "temperature": 0.0, "avg_logprob": -0.2285439686108661, + "compression_ratio": 1.6177777777777778, "no_speech_prob": 0.03229888528585434}, + {"id": 870, "seek": 537496, "start": 5388.64, "end": 5394.0, "text": " courses also + out now. I mean, happy to take more questions. If you have anything, reach out to", + "tokens": [51048, 7712, 611, 484, 586, 13, 286, 914, 11, 2055, 281, 747, 544, 1651, + 13, 759, 291, 362, 1340, 11, 2524, 484, 281, 51316], "temperature": 0.0, "avg_logprob": + -0.2285439686108661, "compression_ratio": 1.6177777777777778, "no_speech_prob": + 0.03229888528585434}, {"id": 871, "seek": 537496, "start": 5394.0, "end": 5399.68, + "text": " me on Slack, of course. 
Other than that, there were certain things that + I wanted to rework.", "tokens": [51316, 385, 322, 37211, 11, 295, 1164, 13, 5358, + 813, 300, 11, 456, 645, 1629, 721, 300, 286, 1415, 281, 48376, 13, 51600], "temperature": + 0.0, "avg_logprob": -0.2285439686108661, "compression_ratio": 1.6177777777777778, + "no_speech_prob": 0.03229888528585434}, {"id": 872, "seek": 539968, "start": 5399.68, + "end": 5405.76, "text": " So we wanted to work on our image models. So that''s now + been fixed already. So happy to give", "tokens": [50364, 407, 321, 1415, 281, 589, + 322, 527, 3256, 5245, 13, 407, 300, 311, 586, 668, 6806, 1217, 13, 407, 2055, 281, + 976, 50668], "temperature": 0.0, "avg_logprob": -0.09350616733233134, "compression_ratio": + 1.6523605150214593, "no_speech_prob": 0.009951352141797543}, {"id": 873, "seek": + 539968, "start": 5405.76, "end": 5411.12, "text": " that try as well. Along with + that, we''re soon going to be sharing some more information about", "tokens": [50668, + 300, 853, 382, 731, 13, 17457, 365, 300, 11, 321, 434, 2321, 516, 281, 312, 5414, + 512, 544, 1589, 466, 50936], "temperature": 0.0, "avg_logprob": -0.09350616733233134, + "compression_ratio": 1.6523605150214593, "no_speech_prob": 0.009951352141797543}, + {"id": 874, "seek": 539968, "start": 5411.12, "end": 5416.88, "text": " fine tuning + the models. We''re already working with Gina AI on that. So hopefully we can come + up with", "tokens": [50936, 2489, 15164, 264, 5245, 13, 492, 434, 1217, 1364, 365, + 34711, 7318, 322, 300, 13, 407, 4696, 321, 393, 808, 493, 365, 51224], "temperature": + 0.0, "avg_logprob": -0.09350616733233134, "compression_ratio": 1.6523605150214593, + "no_speech_prob": 0.009951352141797543}, {"id": 875, "seek": 539968, "start": 5416.88, + "end": 5424.88, "text": " the blog post or something that demystifies that part + as well. 
Other than that, I have some plans", "tokens": [51224, 264, 6968, 2183, + 420, 746, 300, 1371, 38593, 11221, 300, 644, 382, 731, 13, 5358, 813, 300, 11, 286, + 362, 512, 5482, 51624], "temperature": 0.0, "avg_logprob": -0.09350616733233134, + "compression_ratio": 1.6523605150214593, "no_speech_prob": 0.009951352141797543}, + {"id": 876, "seek": 542488, "start": 5424.96, "end": 5431.76, "text": " also for + Hastag EU. I''m working on some case studies with the companies. I mean, if somebody + who''s", "tokens": [50368, 611, 337, 30987, 559, 10887, 13, 286, 478, 1364, 322, + 512, 1389, 5313, 365, 264, 3431, 13, 286, 914, 11, 498, 2618, 567, 311, 50708], + "temperature": 0.0, "avg_logprob": -0.14510599772135416, "compression_ratio": 1.6040816326530611, + "no_speech_prob": 0.03515405207872391}, {"id": 877, "seek": 542488, "start": 5431.76, + "end": 5437.2, "text": " listening to this podcast thinks that they could be one + of the candidates who wants to be involved", "tokens": [50708, 4764, 281, 341, 7367, + 7309, 300, 436, 727, 312, 472, 295, 264, 11255, 567, 2738, 281, 312, 3288, 50980], + "temperature": 0.0, "avg_logprob": -0.14510599772135416, "compression_ratio": 1.6040816326530611, + "no_speech_prob": 0.03515405207872391}, {"id": 878, "seek": 542488, "start": 5437.2, + "end": 5443.76, "text": " with this case study, we would talk about how involvement + of women change things at your workplace,", "tokens": [50980, 365, 341, 1389, 2979, + 11, 321, 576, 751, 466, 577, 17447, 295, 2266, 1319, 721, 412, 428, 15328, 11, 51308], + "temperature": 0.0, "avg_logprob": -0.14510599772135416, "compression_ratio": 1.6040816326530611, + "no_speech_prob": 0.03515405207872391}, {"id": 879, "seek": 542488, "start": 5443.76, + "end": 5450.64, "text": " feel free to connect with me on relevance like or whichever + way you think is the best LinkedIn.", "tokens": [51308, 841, 1737, 281, 1745, 365, + 385, 322, 32684, 411, 420, 24123, 636, 291, 519, 307, 264, 1151, 20657, 13, 51652], + 
"temperature": 0.0, "avg_logprob": -0.14510599772135416, "compression_ratio": 1.6040816326530611, + "no_speech_prob": 0.03515405207872391}, {"id": 880, "seek": 545064, "start": 5451.280000000001, + "end": 5458.240000000001, "text": " Yeah, I bet Twitter maybe. Oh, yeah, Twitter + for sure. Absolutely. Thanks for adding that in.", "tokens": [50396, 865, 11, 286, + 778, 5794, 1310, 13, 876, 11, 1338, 11, 5794, 337, 988, 13, 7021, 13, 2561, 337, + 5127, 300, 294, 13, 50744], "temperature": 0.0, "avg_logprob": -0.266349720954895, + "compression_ratio": 1.541062801932367, "no_speech_prob": 0.08135085552930832}, + {"id": 881, "seek": 545064, "start": 5458.240000000001, "end": 5464.64, "text": + " Fantastic. I really, really enjoyed this conversation at TITA. I learned something + today.", "tokens": [50744, 21320, 13, 286, 534, 11, 534, 4626, 341, 3761, 412, 314, + 3927, 32, 13, 286, 3264, 746, 965, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.266349720954895, "compression_ratio": 1.541062801932367, "no_speech_prob": 0.08135085552930832}, + {"id": 882, "seek": 545064, "start": 5465.360000000001, "end": 5470.56, "text": + " And I think that our listeners did as well. 
Keep improving your model.", "tokens": + [51100, 400, 286, 519, 300, 527, 23274, 630, 382, 731, 13, 5527, 11470, 428, 2316, + 13, 51360], "temperature": 0.0, "avg_logprob": -0.266349720954895, "compression_ratio": + 1.541062801932367, "no_speech_prob": 0.08135085552930832}, {"id": 883, "seek": 545064, + "start": 5471.52, "end": 5474.96, "text": " Keep generating more of the parts so + you can further your model.", "tokens": [51408, 5527, 17746, 544, 295, 264, 3166, + 370, 291, 393, 3052, 428, 2316, 13, 51580], "temperature": 0.0, "avg_logprob": -0.266349720954895, + "compression_ratio": 1.541062801932367, "no_speech_prob": 0.08135085552930832}, + {"id": 884, "seek": 547496, "start": 5475.2, "end": 5483.12, "text": " And I hope + to meet you as well at Hastag or some other meetup maybe online.", "tokens": [50376, + 400, 286, 1454, 281, 1677, 291, 382, 731, 412, 30987, 559, 420, 512, 661, 1677, + 1010, 1310, 2950, 13, 50772], "temperature": 0.0, "avg_logprob": -0.19185389171947131, + "compression_ratio": 1.5686274509803921, "no_speech_prob": 0.04549149423837662}, + {"id": 885, "seek": 547496, "start": 5484.8, "end": 5488.88, "text": " And yeah, + I mean, it was a really pleasure to talk to you today.", "tokens": [50856, 400, + 1338, 11, 286, 914, 11, 309, 390, 257, 534, 6834, 281, 751, 281, 291, 965, 13, 51060], + "temperature": 0.0, "avg_logprob": -0.19185389171947131, "compression_ratio": 1.5686274509803921, + "no_speech_prob": 0.04549149423837662}, {"id": 886, "seek": 547496, "start": 5489.44, + "end": 5495.76, "text": " Yeah, same here. It was amazing. 
And I mean, it was obviously + like a different level,", "tokens": [51088, 865, 11, 912, 510, 13, 467, 390, 2243, + 13, 400, 286, 914, 11, 309, 390, 2745, 411, 257, 819, 1496, 11, 51404], "temperature": + 0.0, "avg_logprob": -0.19185389171947131, "compression_ratio": 1.5686274509803921, + "no_speech_prob": 0.04549149423837662}, {"id": 887, "seek": 547496, "start": 5495.76, + "end": 5503.76, "text": " like being a subscriber and being on the show. So that + was the kind of like moment I had with", "tokens": [51404, 411, 885, 257, 26122, + 293, 885, 322, 264, 855, 13, 407, 300, 390, 264, 733, 295, 411, 1623, 286, 632, + 365, 51804], "temperature": 0.0, "avg_logprob": -0.19185389171947131, "compression_ratio": + 1.5686274509803921, "no_speech_prob": 0.04549149423837662}, {"id": 888, "seek": + 550376, "start": 5503.76, "end": 5510.8, "text": " myself today. Thank you for this + opportunity. Thanks so much. It was also the best for the forthcoming", "tokens": + [50364, 2059, 965, 13, 1044, 291, 337, 341, 2650, 13, 2561, 370, 709, 13, 467, 390, + 611, 264, 1151, 337, 264, 5220, 6590, 50716], "temperature": 0.0, "avg_logprob": + -0.2680434187253316, "compression_ratio": 1.4322033898305084, "no_speech_prob": + 0.0080722039565444}, {"id": 889, "seek": 550376, "start": 5510.8, "end": 5515.280000000001, + "text": " podcasts. Yeah, thank you so much. Thank you, TITA. Yeah, bye bye.", "tokens": + [50716, 24045, 13, 865, 11, 1309, 291, 370, 709, 13, 1044, 291, 11, 314, 3927, 32, + 13, 865, 11, 6543, 6543, 13, 50940], "temperature": 0.0, "avg_logprob": -0.2680434187253316, + "compression_ratio": 1.4322033898305084, "no_speech_prob": 0.0080722039565444}]' +--- + +Hello there, vector podcast. We are still in season 2. If you forgot, you can always go back and check season 1. We had some really awesome guests there. Today I have a big, big pleasure to talk to Atita Aurora, who is the search relevance consultant with open source connections. 
And she has spent quite a bit of time in the search field and in NLP. I'm really curious to learn about her journey and to talk about some of the topics that we usually cover on this podcast, like, you know, search, vector search, and also, you know, aspects of the profession.
Hey, Atita, how are you doing? Hi, pleasure to be here, to be invited. And I mean, before we start, I think a huge shout-out to you, because this is a great thing that you're doing. And I mean, I'm feeling pretty excited to be here today. Yeah, thanks, Atita. You're very kind.
And this is what also gives me energy, you know, when people like you say that this makes sense to continue doing. And I really enjoy it myself because I learn so much. I connect with all my guests on a different level in the podcast. And I hope that this is also informative for our listeners.
At least when I release all the podcasts, all the episodes so far, I kind of, you know, really remember that I learned something new. It's kind of cool. Yeah, I was just wondering, like, we usually traditionally start with your background. And this is where people can learn more.
But I know that you've been blogging and you've been speaking publicly at conferences. But still, it's very interesting to know, you know, how did you arrive at this profession? What was that journey? Yeah, I think that's actually an interesting question.
Absolutely all your episodes, and obviously whenever you publish, I'm probably among the first ones to check them out. So how I started is kind of interesting. I was a master's student and I was supposed to finish my master's, that is, a master's in computer application, in 2008.
However, our college, which is, I mean, I am from one of the top-notch institutes, which has like a common aptitude test. About 400K people take that test every year and about 100 people are selected. And I was one of them. So obviously it was already very prestigious.
And we had this culture that, you know, if the course is for like three years, that is, a full-time course, we would already get our placements in year two.
So the company that I got a placement with was a very small company, and I think I have some sort of, you know, radar that I'm always attracted to small companies, because I feel like I get a lot of accountability, a lot of things to do.
Apart from the stated role in my job offer, which is what I kind of like as well. So it was interesting that I also reached out to them in 2007.
So I was supposed to complete in 2008, but I reached out to them in 2007 itself, because I had to complete my industrial project, which is supposed to be like a dissertation thing that you do in a PhD. So it was supposed to be a real-life project.
And I reached out to them asking, can I join the company and kind of do the training? And I mean, it was really nice of them to let me come. However, they didn't really have any kind of training programs.
And they were experimenting with Solr and Lucene and Plone and Zope at that point in time. I'm not sure how many people would really know about Zope and Plone. They are like the Python-based ones. I don't, at least. Yeah, that's actually because it was really a thing back then.
So it is a content-based content management system, which is based on Python.
And at that point in time, you know, Java was really a thing, back when I started in 2007, 2008.
So having worked on Python, I was like, why am I working on something which is like Python? I mean, I want to work on Java, you know, J2EE. That was really a thing, like building cool applications.
But because I was a trainee and they could obviously, you know, kind of modulate, you know, my role, they asked me to research, you know, Solr and Lucene, because they were coming up with this social networking website application. So Orkut was leaving the space.
Facebook was coming in, and we were working on this Facebook application which could let two people talk without knowing each other's number.
And that was all through, like, what would pop up in my, you know, profile, like who are the people who are close to me? And this was all supposed to be based on Lucene and Solr.
So this is all, I mean, it started off with me literally pulling out all my hair at that point in time, because you can imagine how immature Solr was back then. We didn't even know which version of Lucene would go along with which version of Solr.
So we were trying and testing, and there were so many things which were missing as well. But I think I got pretty much, you know, soaked up, even though at first I did not find all of that interesting, because my friends were doing Java, J2EE and .NET.
So I was like, I'm missing out on something. I would, you know, catch up with them, and they would talk about all, you know, the cool applications that they're building and how database connections, etc. were working.
And I was talking about, yeah, I'm building this, you know, data thing, and we're trying to locate people on Google Maps, and yeah, Google is really a thing. So it was like I'm speaking a different language altogether, and people were like, what? So it was interesting though.
So you felt like an underdog or something? Yeah, I felt like that.
And on top of it, I think the bigger challenge was that we had this guy who was basically from the ontologies world. So the semantic web was very underplayed at that point in time.
I think right now it's coming up as if it's something really fancy, and all of these things that are now seen in a big light were not really known back then. So I was asked to find, you know, like, the application has this feature that I could place people in a circle.
Like FOAF was really a thing, friend of a friend. So the major, I think, the breakdown for me was, you know, dealing with the relationships. Every person is a document, and finding relationships between these documents was something that was given to me.
And I was like, why, God, why am I supposed to, you know, do this thing? So relationships and ontologies and visualizing this stuff. So we implemented this visual map using the cluster map API back then. And I mean, now when I look back, I feel like that was very cool stuff that I did back then.
It sounds very cool actually. It is. Modeling a graph using Lucene. It's not necessarily something people do, or at least I don't know about that. Actually, we did not really have any cluster monitoring tools as well. So we built something by ourselves as well.
So using GraphViz, we built, you know, like, a view of how each of the clusters is doing. So we had this thing that, obviously, clustering was not something that Solr supported back then, but we actually made our own clusters.
But one of the things that I would also like to mention here is that we did not really, I mean, at least I or my manager was not really aware that all of this could be contributed to open source.
So we were like living in our own world, trying to build something really cool only for the client, but not really for, I think, the public. And I think this is something that came way later in my life.
Yeah, I guess, I guess probably, like, before you contribute, at least how I felt myself when I also dabbled in Solr a bit, you know, in the beginning, and then it took 10 years of my life. I actually don't want to say "of my life", it sounds so negative. But like, you know, in the beginning, if you're a startup or something, you still need to figure out whether this works or not, right? Whether this solves some needs for your users, how much of this you want to still keep as a business secret, how much is okay to contribute, because you might see even more development in this, right? And get the feedback. Absolutely. Absolutely.

And, you know, having joined a company that was not really like, you know, the big companies back in India, I think maybe you would get an idea as to how cool or, you know, how small the company was, in that in my induction program, like the first day when I joined, I was asked to, you know, grab a cup of coffee and watch this movie. The movie was Pirates of Silicon Valley. So they said, you know, we don't want you to have any rocket-science skills. We will just make you, you know, learn all of that stuff. Just get this mindset. And I think that's what it aimed at.

I love that movie actually. I think there are two versions of it, right? The original and some kind of remake, if I'm not mistaken. Right. That's correct. I think I watched the original one. The original one is amazing. It's almost, you know, like a meditation. You go into that state of mind. Indeed. Yeah. We have a lot to learn from that movie.

Yeah. But I think, just like everyone else in India, I think we do have a lot of pressure of academically, you know, building, grooming ourselves. So when I started in 2008 and then, you know, got married in 2009, had my first kid in 2010, I think it was the time when I had to take a break.
But when I did come back in 2011, things had obviously changed. And someone, you know, told me that it would be a good idea to have more of, you know, a hands-off kind of role. And that made me think about going for an MBA. And I pursued an MBA in 2014. And I decided that I would leave development because it's too demanding and I cannot manage that with a child. And I took up a job as a manager. I did that for like two and a half months.

And I was pretty bugged because, I mean, obviously, you know, once a developer, always a developer. I felt, you know, more triggered, or more joy, in seeing how things really work, and not really in having said, you know, this is what we need to do with an application. Like, this is the client requirement, this is the BRD, like a business requirement document, and then go implement it. So I think that's why, after two and a half months, I just decided to come back to where I belong.

And was there something in the first place that prompted you to take the manager role? Was it just the fact that you were coming out of maternity leave and you thought you would do better in management? Was there something else going on? I think that's also interesting.

I think there are two sides to this answer. The first one was, you know, people usually associate, and this is actually true, that in a dev role back in India, because we have a lot of outsourcing work, the clients are usually based out of the US. We have long hours of working, and usually the client calls would happen in the evening, because we have like 10 or 12 hours of difference. So by the time you're ending your day, you have your client calls and you have to stay back in the office. And it's probably not like that for the managers. They have a few more perks.
And I think that's what someone suggested. And I think I tried to play along, although, I mean, I don't regret doing the MBA at all, because it just helped me understand what my manager is going through. I mean, how is he thinking, how should I behave? So that just gave me the context of the other side of the table. So I don't regret it.

But in some sense, if I capture it right, it sounds like maybe it wasn't the most natural move for you to take the manager role. Maybe it was just somewhat circumstantial, in a way, right? You thought it would be better, easier with your new responsibilities in the family, right? That is true. Yeah.

And I mean, I would say, like, you know, maybe things have obviously changed. It's been five years since I moved to Berlin now. But until I moved, all the women professionals that I know, my friends in India, I think they still have this problem of clearly communicating what they want in their job. I mean, if you can do that, I think you are already an awesome woman. I was not one of them. It was always something like, you know, people would see me as less if I'm asking for, like, I need to be at home with my kid, because he's too small, he may need me. But I always tried to, you know, keep things to myself and try to change myself, try to leave what I was passionate about just to fit into that frame, like how a woman should be, how a mother should be, or how a wife should be. So that was something I learned the very hard way. Yeah.

And it's something that, I'm sure we will touch on this topic later in the podcast, but it's something that is kind of implicit. And when we talk about men, maybe they don't feel that.
And again, it depends on the culture where you come from. You know, in my culture, men are also assigned this responsibility that you should be the man who earns money, and hence all your decisions need to be made in order to maximize the probability that you will be that person. But maybe you don't want that path, you know, maybe you still want to go and explore what it is that you like. And so it's interesting how culture and, you know, society shape us in that direction, and we just carry the momentum until we realize, hold on a second, am I going in the right direction? And this is what happened to you? True, true. That was the exact same thing.

And again, I think the major bump came when there was this company, or a training company, so to say, let's put it more precisely, who reached out to me, which was far away, at least from the place I lived in. And they said, like, could you remotely, you know, develop this Solr curriculum for us, for our training? And it was probably the first, you know, big thing that happened to me. And I was like, okay, I mean, I worked on applications that used Solr, I knew Solr before, things have changed now. And that was in 2014, when I came back, like, oh my god, Solr is still a thing. Like, people are still working on this. Okay, wow, amazing. That's when I realized, okay, Solr has really transformed. The community has grown. And it was interesting. It was more like, you know, meeting my old friend Solr in a whole new attire, like with a dinner jacket and suit and with a tie. And I was like, oh my god, dude, you are popular now. So that was the thing for me.

And preparing that course curriculum for them, I think that was when I learned about all the newly developed features that were available in Solr back then. I also learned about Elasticsearch back then as well.
But that training became such a hot cake, because I would give, you know, public webinars. Obviously, I was being paid for that too. There were almost like 400, 500 people on those webinars to see what this course is all about. Everyone wanted to become, so to say, a search engineer, there was no such role back then, like the engineer who knows about search, more or less. But we would take like 25 people only in that course, or maybe even less sometimes.

But I think preparing the course curriculum was one thing, and then conducting that course for the first time was completely next level, because I did not imagine that people who would come to that course would come with like 10 or 20 or 30 years of experience in Java. So to imagine, like, people are really asking me questions at a very low level, like, what is happening when, you know, faceting is happening? How is this, you know, variable, you know, is it in, like, memory or is it somewhere else? Like, what about performance-wise, can I improve this? And I was, I mean, literally stumped, because imagine, this was 2014. I started back in 2008 professionally after completing my studies. So six years, and that too with a break of like one and a half years, competing on the knowledge of low-level Java internals with a person who's been working on Java for like 25 years. That obviously was something.

And I would always say, and I hope people who are listening to this did not, you know, realize back then what I'm now saying out loud on a podcast, but I would say I have nine years of experience. But even nine years was less at that point in time, because these people were always very senior. Which made me, you know, take a break from my office work and literally understand from the code level how each of these features was working. Although it was solely done to, you know, literally save my reputation at that point in time.
But I realized the benefit, or the grooming, that it brought along was way bigger. I think that understanding was, I would say, a major breakthrough.

You reminded me of, I don't remember, was it 2011, probably, at Berlin Buzzwords. There was some raffle or whatever, and I won a book written by Rafał Kuć on Elasticsearch. And he actually wrote a couple of words there. And he said, if you don't find answers in this book, then read the code. And like, you know, the code is also open source. And this was such a big eye-opener to me in some sense, even though I was coding by then. I was like, hold on a second. So if I don't find... what does it mean? So this book doesn't contain all the answers, you know, like, major things? And it's a pretty thick book, you know. I was like, yeah, wow. So it just tells you how experimental you need to be, right? And that there are no given answers, right? Right. So true. Yeah, that's exactly how it is.

And then, while giving this training, I mean, I ran almost like seven, eight batches. One of the participants was a student of mine who recommended my profile to Lucidworks. And that's how I got into Lucidworks. And I discovered a whole new world of open source. Oh, so now you can write code and, you know, contribute, or, you know, shape the product as well. Like, I can really define how a Solr function would work. Like, I've always, you know, been on the other side, complaining, like, oh, you know what, I don't like how this shows on the UI, or I don't like why it forgets about this thing, or how about, you know, if I could change this behavior. So instead of just making that change in my local copy, I could actually open source stuff. I could actually contribute to how the product shapes. And I think that was, like, a eureka moment for sure for me. So, yeah.

And at that point, you moved to the US for that job, or were you already in the US?
So I moved briefly to the US, but as I said, by that time I already had my second kid, who was six months old. And it was not very, you know, practical for me to stay there by myself. Which is why I decided to come back and leave my job there at Lucidworks. And I started my own consulting company called Bistro Innovation Labs. I know the name sounds as if it's a restaurant. And the reason, I mean, there are a lot of things that people used to ask me, like, why is it called Bistro Innovation Labs? Like, you should have something like a sci-fi formula or some math algo in the name. Why would you call it a bistro? And I was like, because I'm so passionate about cooking. I think I forgot to mention that during my intro, I got so excited. But yeah, I think the food part is something that really, really, you know, brings out the best in me. If my stuff, or whatever I'm doing, is not really working, and you do not find me at my desk, you would find me in my kitchen. So that's mostly, you know, where my world lies, somewhere dangling between my desk and my kitchen, because I love it that way. And that's what I called my company as well.

But do you think there is some connection actually? I think at some point I even vlogged really briefly about this, as I was just learning how to cook. I guess there's some connection between how you write code from scratch and how you cook, before you learned how to cook that particular dish, right? Like, you can assemble from building blocks, and in that order.

Right. I think that's an interesting point. I mean, it does function the same way. And it's obviously the experimentation, like, you would experiment with different cuisines. Like, usually, I mean, I don't really do that, I don't change the basic nature of the food. I mean, if it is German food, it should be German food.
I mean, I would not try to Indianize the food. But I think somewhere I do that, and that experimentation is something that I would also connect with creating something. And, I mean, I never thought about it from that aspect, but yeah, I think good point, good correlation, so to say. Yeah. Absolutely.

And then what happened next? So you opened your Bistro Innovation Labs? True. And I think that landed me a job in Germany. And interestingly, I had never been to Germany before. And for me, I mean, I had been to London before, I had been to the US before. So to me, or precisely, so to say, you know, I'm trying to speak for all the Indians who would say, for us, in every foreign country, you know, we can speak in English. But I think the biggest trauma was when I landed in Germany, in Berlin, and I realized that English "geht nicht", it doesn't work. And I realized that, you know, it would be very difficult.

But I think I had a tough year in 2018 when I decided to move here, for several reasons, because I was trying to make sure my family settles here, while at the same time trying to, you know, have a little bit of a grip on the language as well, and on my work at that point in time. But I decided to close the company afterwards. Sadly, you couldn't keep it. I could not. Yeah, there are some German rules, so it doesn't work.

And did you, but you did have clients on it? Like, oh yes, I had clients. I had three pretty major clients back then. And I think one thing that somebody commented about yesterday as well, they said that I come across, you know, in a very subtle, very straightforward way. And this is something that people do not expect me to be. But I think that subtleness came the hard way to me. And I want to preserve it that way. So one thing that I try to set as an example, also for my kids, is that I don't lie. I try to make things as clear as possible.
So I tried communicating to my clients that, you know, I would not be able to work because I already have my hands full, and I'm trying to settle in a new country, trying to manage my family, also helping, you know, my husband, who did not have a job back then. But, I mean, if you can adjust to that, we can still keep working. But then I would not charge you for that, because, I mean, anyway, I would be paying taxes on it. So I think eventually, it was more like one or two calls a week, and then it transformed into one call a week, then one call in two weeks, and then one call a month. And then eventually I just lost all the clients. And I think that's when I decided to close it down. Yeah, yeah. That may happen.

But still, you have the, you know, affinity towards the search world, right? And development. I do have. I do have. And I am working as a search relevance consultant now with OpenSource Connections. I think one of the reasons that I decided to work with this company, I mean, they are a well-known search consulting company in the space, and the mission statement, you know, that they want to empower the search teams of the world, is something that really rings a bell, you know, or I would say it aligns with what I want to do. Basically, I mean, all of us have this mindset that we want to do something for money and something for what we are really passionate about. So I think that's really nice, like how I connect, because I feel like I resonate with the model of the company. I mean, I also want to, you know, empower people and share my knowledge as open source, or, I mean, if I'm getting paid for it, even better. But yeah.

And I think it is also different from how traditional consulting companies work. Like, I have been with a consulting company myself at the start of my career.
And I feel like the companies who are taking the services of these consulting companies are, you know, very closely tied. I mean, it's like, you know, being married to them forever and ever. Like, no turning back now, you're always going to be with us. And I think that's where OpenSource Connections is different. It's like job security for some businesses, probably, right? That is true. Kind of like a working model. And you're saying that, you know, in OpenSource Connections, it's the opposite. And it's actually clearly stated, what you said, empower search teams to learn and become independent if they want, right? Exactly. That kind of unties the whole situation. You don't need to really be tied in. Exactly.

And if you look at it, it's very natural. Like, you know, we help teams to, you know, fish their own fish. It's like, you know, we are unblocking them to achieve their goals. I mean, if you think of it from a video game point of view, they will then be stuck at some other point. I mean, by then the context would have changed, and they would have swum through their initial challenges. So, I mean, as a consultant, I would also get a new use case next time. So it's like, you know, we keep on learning with our clients. And I think that's what really excites me about my job.

That sounds great. And, I mean, when you think about search, really, how many companies are so publicly known and shining and doing so many things as OpenSource Connections, you know, with the Haystack conferences in Europe and in the US, with all the tooling, you know, like Quepid and beyond. I think it's in part why this podcast also exists: to keep going and discussing and keeping that connection open, that we talk and we develop the thought further and we share our experiences. And I think in many ways that's what OpenSource Connections has been so successfully doing. And what is your role there, in a little bit more detail?
So about my role, I think, and again, another thing that I love about my job is that I have independence in terms of what I choose to work on. So when I talk about an engagement, I mean, there have been cases when I've been a strategist, and I've also been an engineer. So I get, I would say, enough time to research stuff, I get time to also be hands-on, I get time to explore stuff and, you know, develop good solutions, also as a counsel to the companies that we work with. And I think individually as well, you know, you feel really valued.

Like, if I want to make some transition, for example, and you would, you know, like to bring this in because it's Vector Podcast: I added vector search to Chorus. So Chorus is, so to say, a small, you know, experimental webshop that a bunch of folks started, I think Eric Pugh and René and Paul and a bunch of other folks, who tried to bring together all the tooling which is needed to run a webshop, or e-commerce shop.

And with all this buzz that I was hearing whenever we meet different clients, they were like, okay, what about vectors? Obviously, we're consultants, people look up to us for advice. Like, is it something for me? Is it something that I can do? Is it something that we should go for? And, you know, I am a person, again, as I said, who has something that really stops me if I have to lie. So I would usually, you know, keep mum. I would not really say something, and I don't want to be in that situation for too long, because obviously the world is still, you know, getting ahead of itself. They are developing new solutions every day, and then ChatGPT came, and then, I mean, there's already a buzz about transformers and LLMs. I think it's just non-stop.
And everything is, you know, kind of, like, the boundaries are diminishing. So I remember last year, when I started with OpenSource Connections, in January in fact, I got a chance to work with this client, and they were all about, you know, Vespa, and then: we are also considering working with Weaviate, so what do you suggest? Like, are we making a good choice? I mean, is it something that we should be doing? And at that point in time, I was like, literally, okay, I think I don't know, I don't have enough context on this. And I think that's where it all started.

I mean, that presentation I gave at Berlin Buzzwords last year about Vespa was a condensed version of how I learned Vespa, and how it completely, you know, swept me off my feet: there is something that has existed, you know, since the time Solr existed, and it has been as solid as the rest of the traditional search engines, and it offers so many, you know, data science functionalities as well. So it was amazing. And I would say, sometimes that imposter syndrome that, you know, women have, it also, you know, transforms, or it's something that triggers us to do something that in the end comes out as, you know, shining or grooming us. And I think I liked it.

So I think that's where it all started. And I proposed that idea to some of the folks at OSC, that this is what I want to do. It took some time, of course, because I was on full-time engagements with the clients, but when I had the first, you know, pause, I tried to make some things work. I experimented with stuff, and that's what we came up with. So René really helped me with this, because I gave initial demos to him and Eric and Charlie. And I said, like, you know, I'm struggling with the fact that the embedding needs to be calculated somehow.
I mean, Solr already supports vectors, but the only challenge is that, you know, the vectors for the query need to be somehow calculated, you know, outside of Solr. And that's when, you know, René suggested it would be a good idea to maybe use Querqy for it. And I had not worked with Querqy before, and I had been wanting to do that. Somehow none of my clients were really at that stage that they could use Querqy. And that's how I got to, you know, touch the entire stack of OpenSource Connections.

I had already contributed to Quepid before, like adding visualizations and other stuff. Solved a lot of bugs. I think Eric would really thank me for that. And I would hate it. Why would you hate that? I mean, because it was, I don't know, like some sort of a charm: every time I would touch Quepid for some client requirement, I would find a bug. And I think it's been very nice, in principle, that he's a person who would say, oh, there's a bug, please log the bug, how about I offer you to work on it? I mean, initially it would feel like, because I reported it... but actually, if you look at it, it comes across as a very, you know, empowering thing. Like, he offers you to solve it the way you like. And I think not many people are as open. Yeah.

And if you turn it around, for him to know all the context details is super hard, because you encountered it in a very specific context and you have all the input to reproduce it, right? So you are the expert of that bug. That is true. I think that's another way to look at it. I mean, I never thought of it in that aspect. But yes, that's true. So that's something that triggered me to work on Quepid. So I worked on several bugs before I added visualisation.
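The constraint described above — Solr supports dense vectors, but the query vector has to be computed outside Solr (for example by a model service or a Querqy rewriter) — can be sketched as follows. The field name `vector` and the example vector are assumptions; only the `{!knn}` query-parser syntax is Solr's:

```python
# Sketch of issuing a Solr dense-vector (KNN) query. Solr's {!knn} query
# parser takes a pre-computed query vector, which is exactly why the
# embedding must be calculated outside Solr.

def knn_query(vector: list[float], field: str = "vector", top_k: int = 10) -> str:
    """Format a query string for Solr's {!knn} query parser."""
    vec = ", ".join(f"{v:g}" for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{vec}]"

# The embedding itself would come from a model outside Solr, e.g.:
#   query_vec = model.encode("rechargeable flashlight").tolist()
query_vec = [0.12, -0.53, 0.7]
print(knn_query(query_vec, top_k=5))
# {!knn f=vector topK=5}[0.12, -0.53, 0.7]
```

The resulting string is what you would send as the `q` parameter; in Chorus this hand-off is what the Querqy integration automates.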
And that too came through because otherwise I had to, you know, do the visualisation outside of Quepid, and I would then complain, like, I'm anyway using data from Quepid, how about, you know, if I could use the data that's coming from Quepid inside Quepid, somehow supporting the visualisation as well. And I think it was certainly groundbreaking that we added Python notebooks functionality to Quepid. And that just opened up a whole lot of, you know, the AI potential.

Yeah, that's amazing actually, because I also kept pushing Quepid in every job that I took, right? And not necessarily pushing as in, I am selling the tool and I don't care what's the purpose of using it. But I just know that for all these typical problems with search quality, instead of reinventing the wheel, why not take an open source tool like Quepid? And it's a commercially friendly license, you know, go ahead, deploy it, it's very easy to do so. Exactly. And then when I saw the notebooks, I was like, wow, this is so cool. I can now just say to my data scientists, hey, we've just labeled all these queries. Can you do your magic right here in the notebook? And I can actually access it as well, potentially. I mean, it's just so much easier than to scratch your head and think, okay, now I need to download all this data, all these annotations, and then push them somewhere else. I think it's an amazing feature. Thanks for doing it. Oh, my pleasure. Yeah.

So I think also during this course of discussion, we discovered some documentation bugs, things that were not mentioned in the Solr documentation. I contributed to that too. So I would say, in principle, we have a very supportive and encouraging culture in the company, which is what I really like. I think, I don't know, maybe I could talk a little more about this vector implementation.
I mean, there have been several talks about it as well. I also presented it, you know, in Kraków, at the Haystack on Tour. But I think it was something that had been sitting on my mind for too long, because I think we need a, you know, reasonable way to not, you know, dodge the question from the client. And at the same time, we want to address the question in the most, you know, explainable way. And I think this was the explainable thing that, you know, people can use. And all this while, you know, people have been discussing vectors, I think people charged money to show you that, you know, vectors could work in your search engine. And we do it for free. And I think, again, going by what my company does, we provide a lot of informational content, you know, for free and open source, lots and lots of things which usually would cost companies a lot of money. And I think this is where I really feel like I'm doing a good job, I'm making a difference. So that really brings a lot of, you know, satisfaction, that I'm doing a good job somewhere. So that's nice.

And I think now I'm working to also, you know, experiment with other stuff on the vector side. I presented that, you know, we would be working on improving the image models. So basically, I mean, I'm sure you know about it, but for some viewers it would be a new thing, that Chorus is a dummy shop. And we have a dataset that comes from Icecat. So the Icecat data basically is, you know, collated content from Amazon and other, you know, webshops. And usually that content is very, very structured. It has images as well, and loads of, you know, minute features or attributes of all the products. So which is why it's a good example.
Plus we have other content as well in Chorus in general, like how Quepid would work locally for your webshop, and how you can use Querqy and SMUI to do the searchandising part and, you know, manage the search rules. Like, if you want to bring some brand up in the search results, you could do that as well. So we promised that we would be working on the images side. And because we have access to images, as we usually do in e-commerce shops, we tried to, you know, leverage that as well. And if you look at the demos that we presented, even without fine-tuning, the results were breathtakingly unbelievable. We were like, wow, this is amazing. So in general, it was a very, you know, liberating experience that we could use vectors successfully in these shops which are using the traditional search engines.

We also recently contributed the Elasticsearch version, because in a lot of forums, when we posted about vectors in Solr and the Chorus Solr version, we got a lot of questions like, is it going to be supported also in Elasticsearch? So I just took a stab at that too.

And Chorus is implemented in which language? Is it Java, or? Yes. So Chorus is, yes, it's a combination, of course. I think I'm adding a lot of, you know, content to it as well. One of the more interesting things that I also contributed, and I felt like it's something that usually people would not share in the open source, is how you can convert your documents into vectors. So this is the part, the data encoding process, that people would really charge you a high amount of money for, to, you know, add vectors into your indexing pipeline. And I provide that, again, for free, in the Solr version. So I think that is also something that people could take advantage of. Yeah, I mean, just a couple of years ago, or was it exactly a year ago?
Eric traveled to Europe to meet you guys in Berlin, and then he also traveled to Finland to visit me. And he was in the hotel room, and he said, hey, let's work for a few hours. And he was actually asking some questions about vector search. We were writing this article in Search Insights, right? Oh, right. Right. And he was saying, in his very passionate, almost theatrical way, he was walking in the room and saying, okay, here is the pipeline. I have Solr. I have this. I have my Java client. So where will you compute these vectors, if all the models are accessible through Python, right? Yeah. And it wasn't just a question of, you might get the same question from your client. It was like, wait a second, do I even know myself what I would do? And I said, probably I don't. So I would start engineering something from scratch. That is true.

And I think that is actually one of the most obvious questions that keeps coming up, because lots of our clients are now interested in this. I mean, a typical work week for me has been like, I'm giving more than four or five demos in a week, to clients who want to know about this. Like, will it fit my use case? Like, I have this, you know, size of catalog, will it fit my use case? What do I need? And what kind of models do I need to choose? And obviously there are some things that we need to, you know, spend time on and, you know, charge money for. But then eventually it turns out that, you know, people bring in their concerns, and that basically shapes what it is that we need to contribute next to the open source. What is it that is confusing people? Why are people creating so much hype about this stuff? Like, this needs to be demystified. And I think that's what my company does the best. And I think I'm just learning from them. Yeah, I think it's an excellent spot.
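Eric's question above — where do you compute the vectors if the models live in Python — is essentially the encoding step of the indexing pipeline. A minimal sketch, with a toy deterministic encoder standing in for a real embedding model, and field names that are illustrative, not from Chorus:

```python
# Sketch of the document-encoding step: turning catalog documents into
# vectors during indexing. The toy `encode` below is a stand-in for a real
# embedding model (e.g. a sentence-transformers model); the output dicts
# have the shape you would POST to a search engine's update endpoint.

import hashlib

def encode(text: str, dims: int = 4) -> list[float]:
    """Toy deterministic 'embedding' -- replace with a real model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dims]]

def to_indexable_docs(products: list[dict]) -> list[dict]:
    """Attach a vector to each product, computed from its text fields."""
    return [
        {**p, "vector": encode(p["title"] + " " + p.get("description", ""))}
        for p in products
    ]

docs = to_indexable_docs([{"id": "1", "title": "rechargeable flashlight"}])
print(docs[0]["id"], len(docs[0]["vector"]))
```

The point of the sketch is the pipeline shape: the encoder runs in Python, outside the search engine, and the engine only ever sees finished vectors.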
And usually with all these hypes, things get overcomplicated. But in the end, if they prove their right to exist, +then I think many of these things will get simplified. They will probably, to some extent, even commoditize. And I think it was a recent LinkedIn post, one that blew up every single mind in search, by Doug Turnbull. +He says, you know, what's happening with Elasticsearch? What's happening with Solr? What's happening with these large language models? And, you know, so many people chipped in. And I was wondering too. Yes. +And they are still revolving around some of the more interesting, some more basic concepts that really are unsolved in many ways. +And, you know, how do you even deliver vectors to your database, and so on and so forth? +And one of the comments, I don't remember who said it, was: hey, you know, in the end, keyword search and vector search will both just be equal kinds of modalities that you can play with in any order, probably giving some weightage to one or the other depending on your use case. +And so complexity will shift away from these basic topics to something more domain-specific. That is actually right. And I'm glad you pointed it out. +And this is something that's been constantly asked when we present, when we give demos: you know, how do I fit this into my existing stack? I know it's cool. Because when we say that, you know, it is understanding the semantic meaning of your query, +that means that even things which are not described with similar vocabulary, your search engine can still find them. Which is a very powerful thing if you look at it. And also there are some people who really wanted to use all the machine learning magic. +I mean, I remember the craze in 2017 when LTR came out, like how many people really wanted to use LTR?
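The comment she recalls, keyword and vector as two equal modalities with a weight on each, boils down to a linear blend of the two scores. A minimal sketch, assuming both scores have already been normalized to a comparable 0–1 range (which a real system has to arrange first, since BM25 and cosine similarity live on different scales); the field names `kw` and `vec` are illustrative:

```python
def hybrid_score(keyword_score, vector_score, alpha=0.5):
    """Blend a normalized keyword score and vector score.
    alpha is the weight on the keyword side; 1 - alpha on the vector side."""
    return alpha * keyword_score + (1.0 - alpha) * vector_score

def hybrid_rank(candidates, alpha=0.5):
    """Sort candidate docs by their blended score, best first."""
    return sorted(candidates,
                  key=lambda c: hybrid_score(c["kw"], c["vec"], alpha),
                  reverse=True)
```

A shop that trusts exact matches would push `alpha` toward 1.0; a discovery-oriented use case would lower it, which is exactly the per-use-case weightage the comment describes.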
And then, you know, that kind of gave a lot of struggle to a lot of folks. So it's just that machine learning models have now become so easily accessible. +So people need to know, people deserve to know, that it is not rocket science anymore. It is very common, it's very obvious, it's the natural path that you should go down. And the thing that I wanted to point out earlier was that hybrid is the way forward. +There are so many things that I could think of that keyword search does best. And I think the sooner people realize it, hybrid is the way forward. Yeah. +And what is your take on hybrid if you were to offer it to a client? You know, I have heard from some clients in the past: okay, so we have a vector database, like Weaviate or Pinecone or whatever, or Qdrant. +And then we have the needs of an e-commerce application, like facets, and we cannot do that in some of these databases. So what should we do? Does hybrid mean that we will run two databases, like one Elasticsearch and one vector database, or do you have a better answer to that? I think not really. +I think that's also an interesting point if it's coming into the discussion here. I think soon a lot of these, you know, vector database and vector search engine companies are also realizing that they cannot lose what keyword search engines brought. They cannot just take it away. +I think one of the other things that the keyword-based search engines brought is that you have total control over what goes into the search engine. And this is something where, in the name of semantic understanding, you cannot just push your content into these search engines. +So you still have to massage the content, you still have to treat it, you still have to have control over this data. And somehow, if you say that, you know, synonyms are not needed anymore because the vector search engines would understand all of that.
+But what about stemming? What about, you know, the tons of other things that we do before putting data into the search engine? And I think that still stays relevant in a lot of different contexts, because this has developed, this has grown over a period of time. +You cannot throw it all out. +So I feel like there's a middle point that we have to come to; especially the traditional search engines and the new search engines which are emerging in the market have to come somewhere in between, where we try to, you know, bring the best of both worlds. +And I think that is going to be the way forward. Yeah, but I guess we are not quite there yet, right? I would say, I mean, the change has already started. All right. +In the presentations that we give and the demonstrations that we provide to customers, people ask us, you know, what is the smallest use case I could try with this? +So one of the suggestions that always comes from my side is: attack the cases that do not perform well with your traditional search engine. So for example, the long-tail queries; I think this is where any traditional search engine struggles. +This is probably the first thing: instead of, you know, running into a zero-results screen, it is better to have a chain of sorts, which would delegate your query, or the vector part of the query, to the vector search engine. +And this should be the smallest way to adopt the newest technologies and leverage what they bring. +So maybe, from the product perspective and business perspective, reducing the search abandonment rate, right? Because that is what actually takes a lot of money away from all these players, you know. Absolutely. +The abandonment: you just get pissed off, and you're like, I cannot find anything.
So why should I keep trying? The system does not even, like, respond. Absolutely. +I mean, I have been in that situation before, because I've also worked on a lot of product searches, and I would leverage something like, you know, relaxation of the tokens in the query. So I would keep dropping the tokens that don't make sense. +But then, I would say, it's not the easiest thing to do. Rather, it is easier to pass this query, like the long query, on to another system that is dealing with semantic similarity. And once that has proven its worth, I think that's the time we bring it forward. +And we take it more like a bottom-up approach. Yeah. So you think the message really to vector database companies is to think about what they can take from keyword search engines like Solr, Elasticsearch, and OpenSearch, I guess, as well, right? Absolutely. +And also a message to the existing keyword search engine companies as well: you're not old fashioned, you're not out of the market anyway. You have proven your worth over a period of time and you are here to stay. It's just about how quickly you can adapt to the change. +And I think that is happening. Yeah. Yeah. That's very interesting. I think you are calming many people down. People are like, oh, I'm losing, I'm losing the wave of innovation because we cannot, you know, introduce a vector database into the mix or whatever. +But I think you can, right? Like with Solr, Alessandro Benedetti is doing a lot of work implementing vector search there. And then in Elasticsearch, of course, Mayya Sharipova and others, and Julie, yes, +Julie Tibshirani, have been doing work there, right? But I do still feel like what Doug was saying in his post, the crux of it, is that it doesn't feel like this functionality is advertised well. I think that's actually the point. Yes.
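The two tactics discussed here, brute-force token relaxation versus simply delegating the long query to a semantic engine when keyword search comes up empty, can both be sketched in a few lines. This is an illustrative sketch only; `run_query`, `keyword_search`, and `vector_search` are assumed callables, not APIs from Chorus or any engine:

```python
from itertools import combinations

def relax_query(query, run_query):
    """Token relaxation: try the full query, then progressively shorter
    token subsets (original order kept), returning the first sub-query
    that gets hits. This gets combinatorially fiddly fast, which is her
    point about preferring delegation."""
    tokens = query.split()
    for keep in range(len(tokens), 0, -1):
        for subset in combinations(tokens, keep):
            hits = run_query(" ".join(subset))
            if hits:
                return " ".join(subset), hits
    return "", []

def routed_search(query, keyword_search, vector_search, min_hits=1):
    """The simpler chain: run the keyword engine first and delegate to
    the vector engine only on a (near) zero-results outcome."""
    hits = keyword_search(query)
    if len(hits) >= min_hits:
        return hits, "keyword"
    return vector_search(query), "vector"
```

The routing version also maps directly onto the abandonment argument: the fallback only fires on queries that would otherwise have shown a zero-results screen.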
+And that, not even from the marketing perspective, but more from the perspective of, hey, how do you get things done with this? Yeah. Like, getting all these basic questions answered. Yeah. I think that's actually a very good point. +And there's, I would say, a contrast you see here, because the companies which are bringing vector search in a database or search engine format are new. They're up-and-coming in the market. I think it is part of their marketing strategy that they have to talk about it. +They have to advertise it. It's just that, you know, the traditional search engine companies, or, well, I would not say that, since Solr is not really a company. Elastic probably is. +But then, because they are already very popular and people are using them, they don't need that mass, you know, publicity. +So to say. But I think we need to talk about it: okay, if that's a trend, we do it too, but we don't talk about it as much as we should be doing. +And I think that's actually an interesting point, which means that we should talk about Chorus more, because I think that basically exemplifies how easily this can be done with your search engine. +So you don't have to divorce your existing search engine to use some cool technology. Unless you really have a case where you are starting right from scratch, then I think you can consider using one of these. +Otherwise, if you're using something already, which has grown over a period of time, it doesn't make sense to throw everything out of the window just yet. Absolutely. +And with your Lucene mindset, what have you seen in Vespa that looked attractive? Yeah, I think I have been quite a Vespa fangirl, I would say, more for the reasons of the content, the kind of content that the Vespa team generates.
+ I think, you know, when you're talking about features, when you're talking about, you know, a search engine, or what can be enabled with a feature, you always, you know, think about: okay, how will it perform? How much query response time am I looking at? +What is the data set I'm looking at? And when you look at Vespa's content, I mean, you don't have to look any further; everything is summarized so well. +This is one thing I'm trying to add to my own writing style: that I assess everything well enough that I can say it out loud to the public, to the world, you know, this is how it performs. +And the very knowledgeable folks, I mean, especially Jo, I have been super impressed with how he describes stuff. And some of the things have really blown my mind as well, like, oh, this could be done in this way too. I think that's one of the things. +And while developing this presentation last year, I think I bugged him a lot. But he was always, always super responsive, even his team. There were some, you know, UI things that I found that were not working as expected. +And they were always, you know, very modest and, you know, acknowledging: okay, this is something that we will work on, and this is how it works. +They're always there somehow. I don't know how big the team is, but there are always some familiar faces responding to your messages, which is nice. +One of the other things that I would like to point out, a distinct, or I would say a nice thing, is that updates are one thing that, you know, you would struggle with in a search engine. +If you have a big catalog and you're expecting updates to come in, especially in e-commerce: at the company that I used to work for before, we used to have several updates and we used to club them up, a kind of batching process.
+And we would process them together, because we didn't really have the resources to process them one by one, or just as they came. I think Vespa really does that, through atomic updates, through partial updates, sorry, is what they do. +And I think this is something really, really cool. And that just takes away the need to reindex everything. Sometimes, you know, people complain: I have a really big catalog and it takes like six hours +if I reindex everything. I think that's somewhere they clearly stand out. When they claim that they are serving big data, I think they really get it done. Yeah. And I think it was also proven in the context of Yahoo's systems, right? Some of their large-scale systems. Yeah. +I mean, they're always the early adopters, and I think the way they write about this stuff and how they implement it, I think they really, you know, cover the topic when they're talking about it or implementing it. And that's what I really like about them. Yeah. +So if you would recommend something to someone who starts from scratch, would you recommend Vespa? I think I can. I think I can. You know, maybe I'm old fashioned, or, I don't know, this is not affiliated in any way. I mean, no one has paid me to say this. +But more, like, I feel it's not as fragile as so many systems that are coming up recently. I mean, obviously we have more computing power now, we are way stronger infrastructure-wise. But I think it is as solid as, you know, Solr or Elastic is. That's the kind of trust I have in them. +And also, like, clearly catching up to the trends and, you know, evolving quickly is also what they have. So I think that's really something remarkable. And, like, to think, I mean, it's a huge palette of systems. And it's not like one of them is the only winner here.
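The partial-update behavior she praises, writing only the changed fields rather than re-sending and reindexing whole documents, looks roughly like this at the data level. This is a toy in-memory model to show the field-level merge semantics, not Vespa's actual API:

```python
def apply_partial_updates(index, updates):
    """Apply a stream of (doc_id, changed_fields) updates in arrival
    order, merging each into the stored document instead of replacing
    the whole document. Unknown doc_ids are created fresh."""
    for doc_id, fields in updates:
        stored = index.get(doc_id, {})
        index[doc_id] = {**stored, **fields}
    return index
```

The contrast with her earlier batching story is that each update here is cheap enough to apply as it arrives: a price change touches one field, and the rest of the document (including, say, an expensive-to-compute vector field) stays as indexed.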
+Like, think about, you know, Lucene itself, which has been developed for how many years, 20? Yeah. Or more, actually. It has so many human languages, natural languages, supported that you probably cannot find in Vespa or other systems. +But again, it all depends on your market, where you're going. If it's English-speaking, probably you'll be fine. But if it's, like, Japanese, you know, some of the interesting tokenizers that have been contributed to Lucene probably still stand out. +Yeah, I think that's one good point: how much control do I need, and where exactly am I coming from? I think a lot depends on the context too. That is right. I mean, I would... Yeah. And, like, switching gears a bit. So we did touch on this topic. +But, like, your progression in the profession has been, from what it sounds like to me, I don't know, super tough competition, going from, like, one in 100,000 people to 100, something like that. It's just insane. It feels like, you know, a journey full of challenge. +But then, on top of these, there could be other challenges, I don't know, like gender inequality or whatever is happening in the world today, right? In the profession. +And this was one of the topics that really stood out at Haystack in Berlin last year, in September, right? Where you ran the session Women in Search, if I remember the title correctly. And you had some women in search invited on stage. +And some of them had sent in, pre-recorded, little presentations. I mean, this was very emotional. It was a learning experience for many. The crowd was speechless in some sense, applauding, of course. +What was going through your mind when you were preparing this session? How did you come up with this idea? I think, I mean, I would also acknowledge: when people raised hands after this session, I was expecting, oh my god, what kind of questions?
+Like, people would start asking me, you know, how did you come up with this? How did you come to this figure, and stuff? And they would ask me questions about what I presented. +But it was absolutely so heartwarming to see that each one of the people who raised their hands did it to appreciate and tell me how they felt about this session, and not really to put me on the spot. +And I think one of the other things that I wanted to achieve, because we had the first session of Women in Search at Haystack US last year. So Haystack is just around the corner while we're talking about this, or might have happened by the time we release this. +But yeah, I think Audrey presented the first Women in Search session in the US. And it was a panel discussion about what women really expect in a company, or what kind of qualities or features stand out for women when they decide to join a company. +And it was a long conversation. I think there were different kinds of feedback. Some people enjoyed it. Some people said, like, oh my god, it was kind of too much to sit in one place and listen to, like, white women talking. +So I think the idea came from the point that I would not have a panel discussion. It had to be something different. It had to be something solid, something that people can relate to, and something contributed by several women. +So I cannot bring everyone up on the stage, of course, but I wanted to make sure I had as many women as possible somehow talking about themselves, because each time I speak to fellow women in the search crowd, I feel like they very much, you know, underplay the contributions they make. +And I think this is something I keep telling people that I meet: you know, what you're doing is amazing. It's just that if the other person is failing to see what you've done, it's not that you've done anything less.
+And I think this is basically the message that I wanted to spread: that, you know, if you're not seen enough, maybe we just need to gather, you know, we as women, and we will be there to support each other. +We will be there to advocate for each other. We will be there to mentor and collaborate with each other. And I also said this in the session: one thing that has always surprised me is that men form groups to groom themselves, to develop themselves. +Women just don't do that. I don't know why. Women are often, you know, in their heads still competing with each other, because it's like, okay, only one can be Miss Universe. Only one of you can succeed. +There's always, you know, one best thing, and that basically triggers this competitive nature in us. And I think that needs to really go away. And people do it, people make us compete. +And this is something that somehow, you know, I'm trying to spread: that there's no point competing. +Let us use each other's strengths to become one solid strength, so that we can really become better versions of ourselves. So we're not competing with each other. We have to act as one. +So I hope that that message spreads. And I hate when companies say, like, you know, we have rolled out this position and we're expecting 33% of applications from women. We never do that for men. +We never say out loud the numbers, like, this much of our applications came from men. Why do we always have to explicitly talk about this number? Like, we have reservations. We don't need reservations. We need to be equally considered. +And I think there have been several sessions.
And, talking about the preparation, I think I literally soaked myself into that moment, you know, became one of the activists myself. I was attending so many different kinds of webinars and in-person sessions. +I don't know how many groups I joined afterwards, to listen to what people really talk about. And this was very specific, or critical, to me, because I want to make sure I deliver my 100% when I'm doing something, even if it is non-technical. +Although I was very skeptical about whether I should do something non-technical, because in my head I was still replaying that I don't want to be typecast as, oh, maybe she's not that technical, that's why she's talking about, you know, non-tech stuff. +So I was kind of confused about whether I should do it. I mean, I didn't want to be typecast. But in the end I was very happy, very surprised, happily surprised, that people took it well. People took it the way I expected them to take it. And a lot of women reached out to me as well. +A lot of companies reached out to me as well. And it's surprising how many people want to collaborate, and, you know, they tag me on several LinkedIn posts as well, when people make big claims. +And that gives me an opportunity to speak to different companies about, you know, what are they doing, and what is it that they want to do. And sometimes people expect that, you know, I would be coaching them on how they can add diversity, how they can bring more women into the team. +And I'm like, okay, I can help you with your, you know, search application; this is maybe not what I have mastered. But it's nice how many people want to be involved with this venture. There's this company that I recently spoke to. I will not name them. +I might present them, maybe at Haystack EU, who have got so convinced.
And even the CEO of the company got so convinced that they introduced supporting women and having women present in their company as one of the pillars of the company. Like, this is what we are defending. +This is what we are going to be talking about. Which was very heartwarming. People sent me messages like, oh, after your session, you know, we added two or three women to our team; in the next six months' time, we will let you know how this goes. +So it's very heartwarming to know that people are trusting me, people are, you know, reacting to it very positively, people are not typecasting me. And, I mean, it feels like I'm accountable to the rest of the women as well. +So I feel like, if I can somehow bring more women together... and the problem that they have is that they do not have the means to network, even though we are in such an advanced world. Some way we could mentor women. +I think, if you remember, the women that came up on the stage were not speaking that confidently either. I mean, they were not groomed in that manner; some of them were really shaking. +And some of these women, you know, felt more comfortable speaking in a recorded video, in a closed room. And I think that's where it was coming from. Some of them were obviously coming from another place altogether, last-minute additions as well. +So obviously they could not manage to travel. But most of them had this fear; they were like, you know, we would introduce ourselves in one line. And I was like, yeah, I mean, one line, two lines, three lines, just come up on the stage. +Like, let's just, you know, light up the stage, feel how it really feels. And, you were there in the session as well, none of the contributions they mentioned were less in any way.
It's just that we don't realize how big our contribution is. +I was blown away, literally. I think I even made a comment there. I don't remember the name of the lady, but she was saying, hey, I was just a student, and then I found an internship at one company and I started contributing to Solr, and my code was accepted. Yeah. Exactly. +She was still doubtful of herself. Like, what is this? Is this the right thing? Yeah. +And I remember myself, in 2010, '11, '12, something like that: I came up with some ideas, some code, and I was thinking, what if I contributed something to Solr? And I failed. I could never find myself a path there. +And then at some point I kind of gave up, in a way. And then this lady says, hey, I just did this small thing, right? And clearly underselling it. And by the way, yeah, that's true. +And by the way, I mean, if she's listening, or someone else is listening, I have used param sets in the vector implementation as well. There you go. +Yeah, I have tried to include all the bits and pieces. You know, I'm always touched by this open source contribution part. I feel like it gives you the opportunity to learn from experienced people. +And it takes effort to accept the feedback from such a large audience, so many experienced people. It gives you the opportunity to interact and have, you know, comments from people who've been working in the industry for such a long time. +And if you survive one contribution, even one contribution, I think that's where it starts. And it's like some addiction: once you start, you just keep doing it in every possible way. +You just want to make sure you're contributing something or other. It's so amazing. It feels amazing. Yeah, exactly.
+And it's nice to say this, because I hope that there will be more female listeners joining this podcast. According to my statistics, at least, there has been a domination of male listeners, if YouTube is not lying to me. But I don't think that should continue that way, for sure. +Because this world is much more diverse, much more interesting. And also we should remember to say that in the relevance and matching tech Slack, there is a secret group, right? Can you say something about that? Like a channel? You mean the Women in Search group? It's open. +I mean, it's open to women and allies. So if you consider yourself an ally for women in search, you're welcome to join it. There is no secret. You can happily be part of it. And we have a session once a month, every first Wednesday. And we still have that. +We are trying to bring in more useful content. So I talked about the mentoring part and the, you know, collaborating part, and how I'm seeing that, you know, women should contribute and collaborate more. +So I'm trying to have a pattern where we could have some tech sessions as well. Like, I think two months ago... because I have not been able to attend the last two myself, one when I was out of town, and the other, I think I was not well, so I did not attend them. +But what I'm trying to have here is more tech sessions, where people open up, to themselves and then, you know, to the others: this is what I'm trying to achieve. +So in the session that I had, like, two months ago, I was still implementing stuff in Chorus. +I was still kind of battling with, like, how do I make images work? +And it was so beautiful that during that call, everyone was, you know, just getting to know each other, and it's nice that every time we have a meeting, at least one or two new people join in.
+So one of them said, like, have you tried this? Have you tried that? And everyone got their, you know, Jupyter notebooks out, and then they all started: oh, this is going to work. Oh, maybe this one has a licensing cost. +Oh, maybe this one we should not consider because it supports only these many things. And by that time, that one hour became so productive that everyone left that meeting with some knowledge of vectors. And it was so amazing to see that. That's super cool. +I mean, I'm really glad to hear this, because it both connects to an overarching goal, some real purpose here, and also, you know, you create an environment of support and exchange and cross-pollination. I think this is something that men should do too. +Yeah, you're welcome to join the sessions. I mean, it's really nice. And I can say, like, you have a good reputation; people would love to have you. +We hosted Sebastian from Weaviate, who offered, after the Haystack EU session, to help women, you know, develop some public speaking skills. And I think we had an amazing session. We did not put that out on YouTube yet. +Not sure why, but the people who attended absolutely thanked us and said it was really useful. So we're planning to have more sessions like that. Maybe we could host you someday as well. That would be awesome, for sure. And so you have a YouTube channel for that? Or is it part of... Oh, we have it. +I think this is the reason why it was not put on YouTube: because we don't really have a channel, and we did not know where we should put it. But we recorded it, for sure. We have the recording. Yeah, I think today YouTube is probably one of the de facto platforms. +I don't want to over-advertise it, but I think it's a good place to be. Yeah, I think that's a good suggestion that we will certainly consider in future as well.
But yeah, I think the entire idea is that, you know, we need to make a positive impact, +be it with open source, be it with, you know, trying to push and bring minority people up front. +One of the things that I would like to highlight here, maybe we haven't touched on it, but while talking, it strikes me: we have a saying back in India that behind every successful man, there is a woman. +It kind of, you know, reverses itself in my case, because my story started... I mean, if I talk about it from my college days, my husband was also my classmate. So I'm one of those blessed people who has the company of someone who always supports me. +And I think I have always been, you know, bold and standing on the stage giving presentations, because behind that is someone encouraging a person who is, you know, very meek, very nervous, like, how would I do it? And he's the one who's always encouraging me to come up on these stages, telling me I can do it. +So it's interesting that behind me and every big step that I took, there was always a man. The first one, if I talk about it, started with my husband; the next one was obviously my manager at my first job. +And with open source as well, I would like to mention that there's a search engine that's probably not very popular, maybe a lot of people do not even know about it: OpenSearchServer. And it's not affiliated with AWS OpenSearch in any way. But this guy has recommended me also on LinkedIn. +And we bumped into each other, maybe when he was trying out, you know, other contributors or developers as well for his search engine. And we collaborated, and he showed me a whole new world of open source, of open source contributions. +He's a very critical person, I would say, code-wise; I think he's really good. And he worships, I would say, Mike McCandless, because every time I would not understand something in Lucene, he would show me something from Mike McCandless.
So, I mean, it was nice. +And then I started... I think, if I talk about it from the open source connection, the affiliated libraries as well: Chorus was mostly Eric and René, who kind of pushed me into, like, we should do something together. Quepid, again, +Eric promoted, sort of: you should try fixing some things also with Solr. It was Eric. And with OpenNLP, it was one of my colleagues at OpenSource Connections, who's not with OpenSource Connections anymore: Jeff, who is the chair of the OpenNLP project. +So there was this discussion we had, because we were trying to work on a use case together, and he suggested we could do this with OpenNLP. And I had been using OpenNLP before as well, like, in 2017. +And somehow I never really thought that I would contribute, but I pointed out to him that, you know, there's something that could be done in this direction. And he said, why don't you do it? And I was like, me? No way. And he was like, no, no, you can do it. +I mean, if you understand this, then certainly. And as I said before, it's an addiction; once you've started, you just don't want to stop. And I think that's how I started: I started using it more and more, contributing more, obviously. +And you became a committer as well, right? I became a committer, yes. Of OpenNLP, right? Congratulations. That's really great. That's right. Nice. Actually, nice. And the kind of people who review your code... you get to know, like, oh, this could be done this way as well. +I mean, I can say that it's a very rounded development opportunity that I got with open source, with open source contributions. So that's something I can highly, highly recommend to anyone. +One of the things that has really helped me, in principle, is starting with the documentation, or maybe the test cases. I think that's very classic advice that you would get from anyone who's been working and contributing.
But that's really the right way to go about it. +I think documentation really helps, because you cannot just write anything like that; you have to try things out yourself before writing about them. And I think that just gives you enough context to pick up some tickets and start solving them. +I am actually mentoring some people outside my job to contribute to open source. And just so you know, these are people who have been in the industry for like 20, 25 years. +So I mean, there are people who come with a lot of hesitation, like, is it even something that I should do? But it's nice, I mean, the kind of questions they bring in and the kind of discussions we have. It's amazing. So yeah. +Yeah, I mean, for sure, what you said makes so much sense, you know: you read the docs, and you will test it out, probably as part of your job, or of your research project, or whatever, or your curiosity, right? And something will definitely not be right. +I mean, whatever is said in the doc is no longer true, or it doesn't apply to your use case. +But how did you... you said, you know, this imposter syndrome, and like just general fear, or like you didn't have experience with it. How did you kind of leap from what you just encountered that doesn't work to actually going and contributing? I think that is something. +I mean, so I think the first time took kind of a bigger leap of faith than the rest of the times. Once it works... that's what I'm saying, like, the first time is always going to be the hardest. +And I think not being able to contribute to Solr for almost six years, I had been away from Solr, and I can say a lot of things I had already implemented when they came out in Solr 4.10; I think that's what I started working on in 2012 or '14.
+And it was kind of this grudge that really took me on a guilt trip, that I could have done that, it's just that I did not have the knowledge. Which is why, I mean, I'm saying that if you think you could do it, just think about the worst thing. +The worst is that somebody would reject it and ask you to rework it. But from your side, you would have the satisfaction that, I thought this could be done this way, and I tried to, you know, do it this way. And I think that's how I am living now. +I mean, if I have something sitting on my mind, that this is how it should be done, I just execute it, because that is what is in my control: I can execute it. Approving or not approving something is something that I delegate to the other person. +So I think that is also something that was brought up in the discussion with Sebastian, where a lot of women, when discussing how we can become better public speakers, asked him, you know, what is a good topic to submit to a conference? +Like, how do I ensure my topic is accepted? And he gave, I would say, reasonable advice: from your side, what you can do is submit the proposal. +I mean, accepting or not accepting is something... I mean, once it is, you know, submitted, it goes to the next swimlane, and that's when you stop thinking about it. +If it comes back to you, that's when you need to think about, what do I need to present to woo the crowd and make sure you make a point. But then, that's what even I do. If I think that this is how it should be done, I just do it. At max, it would be rejected. +It would not be accepted, or somebody would say, oh, you know what, this is wrong, this is not how it should be done. That's about it. You get feedback. Either you get the recognition for the stuff that you've contributed, or you get feedback. Both of them are good.
+Yeah, that's beautifully put. I remember one person who was a Java Champion; I don't know if you know of this title that was awarded at some point by Sun Microsystems. I think it's called Java Champion. +You know, people who really popularized Java and, you know, talked about it, wrote books, contributed code, and so on and so forth. + And he was saying, you know, whenever he received a question, and he had like 20, 25 years of experience in this, he was saying: hey, when I hear a question from a newcomer doubting themselves a million times, you know, should I apply for this job, or should I commit something here, like code or whatever, +I am just afraid. He was saying, okay, go and get your rejection. So it's like that first leap that you take, and it becomes a bit more like a game. I mean, he had a bit of humor here, right? Okay. +So you're doubting yourself, you think you will get rejected: then prove it, go and get it. Right. And as they go and get it, they will probably either succeed, or get to some partial stage and probably need to revise something, and so on, right. Right. +No, that is, that is amazing. I mean, I love it. And that also brings back... I used to have this email signature, I think during the time when I was in college, that I never lose: either I win or I learn. Yeah. That just brought it back. I mean, I don't know +at what point in time I removed it, but I really lived by that line, that you either learn or you win. So either way, it's a win. Yeah. +And, like, probably getting ahead of ourselves, but even when you start getting these small wins, they have a start and an end; really, by the end of the win, you're like, okay, what should I do next? +You know, the next challenge for me, you know. And you're probably already ahead of yourself and doing a lot of work already, but at the same time, that challenge is always there.
+And that's part of the game and part of the reinforcement learning. And that is actually the correct terminology; I was just looking for that term. Reinforcement learning, I think that's exactly how it happens. I think for me, it's exactly how it's going. +Like, I contribute something, then I get feedback, and I present it. And I think I start, you know, becoming more passionate about the other stuff. So it's like, you know, throwing your hat over the wall. And the next time, you throw your hat over an even higher wall. +So I'm not sure if you know that expression; it's something like taking your chances. So it's like throwing your hat over the wall. So yeah, that's... so I'm trying to make sure I use all the feedback that I got from the demos and the presentations that I have given so far. +I mean, I, you know, collect all of it, making sure I address everything. I have a big presentation again at Berlin Buzzwords to give this year. Yeah, I think it's kind of like something. +If I look back, the first time I spoke at Berlin Buzzwords was, I think, in 2020, when the conference was online. It was something I remember; I mean, I went to my office because my kids would disturb me. +And, you know, I was so nervous, I had like five bottles of water next to me. And then I did not drink anything, because I felt I would have to leave the seat for a toilet break if I drank that much water. +So I was so confused: I'm thirsty because I'm talking so much, but then I would not drink, because I would have to, you know, go away, and I was hosting the lightning talk session, which went on for longer than a usual talk session. +So it was, it was... from that point in time until now, I think, yes, it's been quite a journey. Yeah, amazing.
+I mean, this is really coming to the logical question that I usually ask, the question of why. And you shared a lot today, you know, about women in search, your own journey, public speaking. +And there is always something new coming up: new projects, a new blog as a result of that project, maybe a new struggle, new learning, and so on. +But is there something that keeps you in this profession beyond these challenges, beyond solutions? Or is it just that? I think it's mostly, I don't know, it's something so engaging, and the joy that I get in solving things, I think that just keeps me going on and on. +And I think it's also like a commitment that I have to myself, that learning things and experimenting with stuff really brings the best out of me. So I mean, I somehow have this ambition: I want to, you know, know everything in maybe language analysis. +I mean, it's such a passion that I have for this that I want to do a PhD in it someday. Which is why I want to know everything, and I'm somehow just aiming for that goal. I know I'm maybe too old for doing a PhD. +But yeah, that's the goal I keep for myself. And I think a short story before we end is that people ask me, like, what does your name mean? And if you've not noticed, my name is a palindrome, my first name and my last name. +So it's Atita, and if you reverse it, it stays the same. People also ask me, like, what does it mean? So my name is actually a machine learning model: I learn from the past. That's what my name means. +And I just want to, you know, live up to my name, so that if my parents, you know, thought of something before naming me at the time, they get what they expected from me. And that's all about it. Oh, a very beautiful answer, really. It's like living up to your true self. Yes. Amazing. +Is there something you want to announce?
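As a small aside, the "reverse it and it stays the same" property described here can be checked in a couple of lines. This is a minimal sketch, not part of the conversation; the sample strings assume the guest's first and last names are spelled "Atita" and "Arora" (the transcript itself garbles them):

```python
def is_palindrome(name: str) -> bool:
    """True if the name reads the same forwards and backwards (case-insensitive)."""
    s = name.lower()
    return s == s[::-1]

# Assumed spellings, used only as illustrative inputs.
print(is_palindrome("Atita"))   # True  ("atita" reversed is "atita")
print(is_palindrome("Arora"))   # True  ("arora" reversed is "arora")
print(is_palindrome("Berlin"))  # False
```

Both the first and last name pass the check, which matches the "my first name and my last name" remark.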
You already said that you're going to present at Berlin Buzzwords. Is there something else you want to share with the audience that they should know about? Maybe a course or something, something that they should, you know, go and get started with. +Oh, yeah, for sure. So the Elasticsearch version of the course is also out now. I mean, happy to take more questions; if you have anything, reach out to me on Slack, of course. Other than that, there were certain things that I wanted to rework, so we wanted to work on our image models. +That's now been fixed already, so happy for you to give that a try as well. Along with that, we're soon going to be sharing some more information about fine-tuning the models. We're already working with Jina AI on that. +So hopefully we can come up with a blog post or something that demystifies that part as well. Other than that, I have some plans also for Haystack EU. I'm working on some case studies with companies. + I mean, if somebody who's listening to this podcast thinks that they could be one of the candidates who wants to be involved with this case study, where we would talk about how the involvement of women changed things at your workplace, feel free to connect with me on Relevance Slack, or whichever way you think is best, LinkedIn. +Yeah, or Twitter maybe. Oh, yeah, Twitter for sure. Absolutely. Thanks for adding that in. Fantastic. I really, really enjoyed this conversation, Atita. I learned something today, and I think that our listeners did as well. Keep improving your model. +Keep generating more of the podcasts so you can further your model. And I hope to meet you as well at Haystack, or some other meetup, maybe online. And yeah, I mean, it was a real pleasure to talk to you today. Yeah, same here. It was amazing. +And I mean, it was obviously like a different level, like being a subscriber and being on the show. So that was the kind of moment I had with myself today. Thank you for this opportunity. Thanks so much.
Also, all the best for the forthcoming podcasts. Yeah, thank you so much. +Thank you, Atita. Yeah, bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md b/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md new file mode 100644 index 0000000..23986de --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-alessandro-benedetti-llms-in-solr.md @@ -0,0 +1,2019 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=PNB70TbQUBE

Alessandro''s + talk on Hybrid Search with Apache Solr Reciprocal Rank Fusion: https://www.youtube.com/watch?v=8x2cbT5CCEM&list=PLq-odUc2x7i8jHpa6PHGzmxfAPEz-c-on&index=5

00:00 + Intro

00:50 Alessandro''s take on the bbuzz''24 conference

01:25 What + and value of hybrid search

04:55 Explainability of vector search results to + users

09:27 Explainability of vector search results to search engineers

13:12 + State of hybrid search in Apache Solr

14:32 What''s in Reciprocal Rank Fusion + beyond round-robin?

18:30 Open source for LLMs

22:48 How we should approach + this issue in business and research

26:12 How to maintain the status of an + open-source LLM / system

30:06 Prompt engineering (hope and determinism)

34:03 + DSpy

35:16 What''s next in Solr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20241107_011152_a59e71acc05fe03f850677d583f5111a.png +pub_date: Thu, 07 Nov 2024 13:59:44 GMT +title: Berlin Buzzwords 2024 - Alessandro Benedetti - LLMs in Solr +url: https://rss.com/podcasts/vector-podcast/1741381 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 20.92, "text": " All + right, Dr. Podcast and here I have Alessandra Benedetti with me his second time + on the", "tokens": [50364, 1057, 558, 11, 2491, 13, 29972, 293, 510, 286, 362, 967, + 442, 18401, 39753, 12495, 365, 385, 702, 1150, 565, 322, 264, 51410], "temperature": + 0.0, "avg_logprob": -0.45390538471501046, "compression_ratio": 1.2936507936507937, + "no_speech_prob": 0.30223244428634644}, {"id": 1, "seek": 0, "start": 20.92, "end": + 27.8, "text": " podcast actually and exactly about same place we recorded two years + ago.", "tokens": [51410, 7367, 767, 293, 2293, 466, 912, 1081, 321, 8287, 732, 924, + 2057, 13, 51754], "temperature": 0.0, "avg_logprob": -0.45390538471501046, "compression_ratio": + 1.2936507936507937, "no_speech_prob": 0.30223244428634644}, {"id": 2, "seek": 2780, + "start": 27.8, "end": 32.4, "text": " I remember on Berlin was works. Yeah, we were + here. Yeah, I guess I got 22.", "tokens": [50364, 286, 1604, 322, 13848, 390, 1985, + 13, 865, 11, 321, 645, 510, 13, 865, 11, 286, 2041, 286, 658, 5853, 13, 50594], + "temperature": 0.0, "avg_logprob": -0.45363426208496094, "compression_ratio": 1.6929824561403508, + "no_speech_prob": 0.42002952098846436}, {"id": 3, "seek": 2780, "start": 32.4, "end": + 38.2, "text": " It was. 
It was by the way a lot no easier if you remember them now + but was", "tokens": [50594, 467, 390, 13, 467, 390, 538, 264, 636, 257, 688, 572, + 3571, 498, 291, 1604, 552, 586, 457, 390, 50884], "temperature": 0.0, "avg_logprob": + -0.45363426208496094, "compression_ratio": 1.6929824561403508, "no_speech_prob": + 0.42002952098846436}, {"id": 4, "seek": 2780, "start": 38.2, "end": 46.44, "text": + " closing day that we like people. Yeah, but I think it''s almost end of day as", + "tokens": [50884, 10377, 786, 300, 321, 411, 561, 13, 865, 11, 457, 286, 519, 309, + 311, 1920, 917, 295, 786, 382, 51296], "temperature": 0.0, "avg_logprob": -0.45363426208496094, + "compression_ratio": 1.6929824561403508, "no_speech_prob": 0.42002952098846436}, + {"id": 5, "seek": 2780, "start": 46.44, "end": 51.8, "text": " well here. First + day of the conference and yeah, I wanted to chat with you.", "tokens": [51296, 731, + 510, 13, 2386, 786, 295, 264, 7586, 293, 1338, 11, 286, 1415, 281, 5081, 365, 291, + 13, 51564], "temperature": 0.0, "avg_logprob": -0.45363426208496094, "compression_ratio": + 1.6929824561403508, "no_speech_prob": 0.42002952098846436}, {"id": 6, "seek": 2780, + "start": 51.8, "end": 56.88, "text": " How do you like the conference so far? 
So + has been so far like a great conference.", "tokens": [51564, 1012, 360, 291, 411, + 264, 7586, 370, 1400, 30, 407, 575, 668, 370, 1400, 411, 257, 869, 7586, 13, 51818], + "temperature": 0.0, "avg_logprob": -0.45363426208496094, "compression_ratio": 1.6929824561403508, + "no_speech_prob": 0.42002952098846436}, {"id": 7, "seek": 5688, "start": 56.88, + "end": 61.92, "text": " We''ve been seeing like many talks about the language modern + integration with search.", "tokens": [50364, 492, 600, 668, 2577, 411, 867, 6686, + 466, 264, 2856, 4363, 10980, 365, 3164, 13, 50616], "temperature": 0.0, "avg_logprob": + -0.4918078522184002, "compression_ratio": 1.6269430051813472, "no_speech_prob": + 0.013902271166443825}, {"id": 8, "seek": 5688, "start": 61.92, "end": 68.60000000000001, + "text": " So that''s the biggest new trend. Vector based search is still quite a + strong", "tokens": [50616, 407, 300, 311, 264, 3880, 777, 6028, 13, 691, 20814, + 2361, 3164, 307, 920, 1596, 257, 2068, 50950], "temperature": 0.0, "avg_logprob": + -0.4918078522184002, "compression_ratio": 1.6269430051813472, "no_speech_prob": + 0.013902271166443825}, {"id": 9, "seek": 5688, "start": 68.60000000000001, "end": + 75.0, "text": " topic and in general with testing also like evaluation and explainability", + "tokens": [50950, 4829, 293, 294, 2674, 365, 4997, 611, 411, 13344, 293, 2903, 2310, + 51270], "temperature": 0.0, "avg_logprob": -0.4918078522184002, "compression_ratio": + 1.6269430051813472, "no_speech_prob": 0.013902271166443825}, {"id": 10, "seek": + 5688, "start": 75.0, "end": 80.04, "text": " discussions around like vector based + search or in general language models. 
And", "tokens": [51270, 11088, 926, 411, 8062, + 2361, 3164, 420, 294, 2674, 2856, 5245, 13, 400, 51522], "temperature": 0.0, "avg_logprob": + -0.4918078522184002, "compression_ratio": 1.6269430051813472, "no_speech_prob": + 0.013902271166443825}, {"id": 11, "seek": 8004, "start": 80.48, "end": 88.32000000000001, + "text": " and my thoughts was about hybrid search. Hybrid search. Yeah, so you work + a lot on", "tokens": [50386, 293, 452, 4598, 390, 466, 13051, 3164, 13, 47088, 3164, + 13, 865, 11, 370, 291, 589, 257, 688, 322, 50778], "temperature": 0.0, "avg_logprob": + -0.3404139738816481, "compression_ratio": 1.5765306122448979, "no_speech_prob": + 0.01298623625189066}, {"id": 12, "seek": 8004, "start": 88.32000000000001, "end": + 91.80000000000001, "text": " on solar right that''s your kind of like playground + and that''s where you", "tokens": [50778, 322, 7936, 558, 300, 311, 428, 733, 295, + 411, 24646, 293, 300, 311, 689, 291, 50952], "temperature": 0.0, "avg_logprob": + -0.3404139738816481, "compression_ratio": 1.5765306122448979, "no_speech_prob": + 0.01298623625189066}, {"id": 13, "seek": 8004, "start": 91.80000000000001, "end": + 98.2, "text": " integrate things but also then I heard that like guys at Reddit + are using the", "tokens": [50952, 13365, 721, 457, 611, 550, 286, 2198, 300, 411, + 1074, 412, 32210, 366, 1228, 264, 51272], "temperature": 0.0, "avg_logprob": -0.3404139738816481, + "compression_ratio": 1.5765306122448979, "no_speech_prob": 0.01298623625189066}, + {"id": 14, "seek": 8004, "start": 98.2, "end": 105.12, "text": " work that you''ve + been doing also in solar. So that''s amazing. 
Tell me more a", "tokens": [51272, + 589, 300, 291, 600, 668, 884, 611, 294, 7936, 13, 407, 300, 311, 2243, 13, 5115, + 385, 544, 257, 51618], "temperature": 0.0, "avg_logprob": -0.3404139738816481, "compression_ratio": + 1.5765306122448979, "no_speech_prob": 0.01298623625189066}, {"id": 15, "seek": 10512, + "start": 105.12, "end": 109.92, "text": " bit more about what is hybrid search right? + How do you see it? What''s the value?", "tokens": [50364, 857, 544, 466, 437, 307, + 13051, 3164, 558, 30, 1012, 360, 291, 536, 309, 30, 708, 311, 264, 2158, 30, 50604], + "temperature": 0.0, "avg_logprob": -0.30045333949998876, "compression_ratio": 1.6891891891891893, + "no_speech_prob": 0.001479180995374918}, {"id": 16, "seek": 10512, "start": 109.92, + "end": 115.84, "text": " And and basically maybe what are the challenges that you + needed to solve and", "tokens": [50604, 400, 293, 1936, 1310, 437, 366, 264, 4759, + 300, 291, 2978, 281, 5039, 293, 50900], "temperature": 0.0, "avg_logprob": -0.30045333949998876, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.001479180995374918}, + {"id": 17, "seek": 10512, "start": 115.84, "end": 123.68, "text": " you still see + related to hybrid search? 
So the first point and the reason I", "tokens": [50900, + 291, 920, 536, 4077, 281, 13051, 3164, 30, 407, 264, 700, 935, 293, 264, 1778, 286, + 51292], "temperature": 0.0, "avg_logprob": -0.30045333949998876, "compression_ratio": + 1.6891891891891893, "no_speech_prob": 0.001479180995374918}, {"id": 18, "seek": + 10512, "start": 123.68, "end": 126.96000000000001, "text": " decided to start working + a little bit more in hybrid search and", "tokens": [51292, 3047, 281, 722, 1364, + 257, 707, 857, 544, 294, 13051, 3164, 293, 51456], "temperature": 0.0, "avg_logprob": + -0.30045333949998876, "compression_ratio": 1.6891891891891893, "no_speech_prob": + 0.001479180995374918}, {"id": 19, "seek": 10512, "start": 126.96000000000001, "end": + 131.4, "text": " contributing this even our rank vision to solar is because of the + limitations", "tokens": [51456, 19270, 341, 754, 527, 6181, 5201, 281, 7936, 307, + 570, 295, 264, 15705, 51678], "temperature": 0.0, "avg_logprob": -0.30045333949998876, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.001479180995374918}, + {"id": 20, "seek": 13140, "start": 131.4, "end": 136.56, "text": " of vector based + search. So vector based search of course introduces like the", "tokens": [50364, + 295, 8062, 2361, 3164, 13, 407, 8062, 2361, 3164, 295, 1164, 31472, 411, 264, 50622], + "temperature": 0.0, "avg_logprob": -0.3414122263590495, "compression_ratio": 1.6703296703296704, + "no_speech_prob": 0.002649236237630248}, {"id": 21, "seek": 13140, "start": 136.56, + "end": 145.48000000000002, "text": " ability of closing the semantic gaps with light + queries and documents with some", "tokens": [50622, 3485, 295, 10377, 264, 47982, + 15031, 365, 1442, 24109, 293, 8512, 365, 512, 51068], "temperature": 0.0, "avg_logprob": + -0.3414122263590495, "compression_ratio": 1.6703296703296704, "no_speech_prob": + 0.002649236237630248}, {"id": 22, "seek": 13140, "start": 145.48000000000002, "end": + 150.6, "text": " some limitations right? 
So the explainability bar for example is + an aspect I", "tokens": [51068, 512, 15705, 558, 30, 407, 264, 2903, 2310, 2159, + 337, 1365, 307, 364, 4171, 286, 51324], "temperature": 0.0, "avg_logprob": -0.3414122263590495, + "compression_ratio": 1.6703296703296704, "no_speech_prob": 0.002649236237630248}, + {"id": 23, "seek": 13140, "start": 150.6, "end": 156.84, "text": " care a lot and + it''s just very difficult to explain vector based search", "tokens": [51324, 1127, + 257, 688, 293, 309, 311, 445, 588, 2252, 281, 2903, 8062, 2361, 3164, 51636], "temperature": + 0.0, "avg_logprob": -0.3414122263590495, "compression_ratio": 1.6703296703296704, + "no_speech_prob": 0.002649236237630248}, {"id": 24, "seek": 15684, "start": 156.84, + "end": 161.96, "text": " results. Yeah, we have I dimensional. So many many dimensions + in the", "tokens": [50364, 3542, 13, 865, 11, 321, 362, 286, 18795, 13, 407, 867, + 867, 12819, 294, 264, 50620], "temperature": 0.0, "avg_logprob": -0.30894758534985917, + "compression_ratio": 1.8226600985221675, "no_speech_prob": 0.011387085542082787}, + {"id": 25, "seek": 15684, "start": 161.96, "end": 166.92000000000002, "text": " + vectors and humans are not really good in managing many dimensions. 
We live in", + "tokens": [50620, 18875, 293, 6255, 366, 406, 534, 665, 294, 11642, 867, 12819, + 13, 492, 1621, 294, 50868], "temperature": 0.0, "avg_logprob": -0.30894758534985917, + "compression_ratio": 1.8226600985221675, "no_speech_prob": 0.011387085542082787}, + {"id": 26, "seek": 15684, "start": 166.92000000000002, "end": 172.12, "text": " + a three dimensional world and this even difficult for us to to understand", "tokens": + [50868, 257, 1045, 18795, 1002, 293, 341, 754, 2252, 337, 505, 281, 281, 1223, 51128], + "temperature": 0.0, "avg_logprob": -0.30894758534985917, "compression_ratio": 1.8226600985221675, + "no_speech_prob": 0.011387085542082787}, {"id": 27, "seek": 15684, "start": 172.12, + "end": 181.0, "text": " life for dimensional life. Yeah, then we have like many + elements in those", "tokens": [51128, 993, 337, 18795, 993, 13, 865, 11, 550, 321, + 362, 411, 867, 4959, 294, 729, 51572], "temperature": 0.0, "avg_logprob": -0.30894758534985917, + "compression_ratio": 1.8226600985221675, "no_speech_prob": 0.011387085542082787}, + {"id": 28, "seek": 15684, "start": 181.0, "end": 186.44, "text": " vectors. So each + feature in the vector doesn''t have like a meaning for the", "tokens": [51572, 18875, + 13, 407, 1184, 4111, 294, 264, 8062, 1177, 380, 362, 411, 257, 3620, 337, 264, 51844], + "temperature": 0.0, "avg_logprob": -0.30894758534985917, "compression_ratio": 1.8226600985221675, + "no_speech_prob": 0.011387085542082787}, {"id": 29, "seek": 18644, "start": 186.44, + "end": 193.8, "text": " humans. 
So you have like 768 dimensions in your vectors + and there''s no single", "tokens": [50364, 6255, 13, 407, 291, 362, 411, 24733, + 23, 12819, 294, 428, 18875, 293, 456, 311, 572, 2167, 50732], "temperature": 0.0, + "avg_logprob": -0.2223291189774223, "compression_ratio": 1.6812227074235808, "no_speech_prob": + 0.0009225498070009053}, {"id": 30, "seek": 18644, "start": 193.8, "end": 198.35999999999999, + "text": " dimension that means something semantic. So it''s just the output of some", + "tokens": [50732, 10139, 300, 1355, 746, 47982, 13, 407, 309, 311, 445, 264, 5598, + 295, 512, 50960], "temperature": 0.0, "avg_logprob": -0.2223291189774223, "compression_ratio": + 1.6812227074235808, "no_speech_prob": 0.0009225498070009053}, {"id": 31, "seek": + 18644, "start": 198.35999999999999, "end": 204.36, "text": " machiner model but + we can interpret like what it is. And we can interpret what", "tokens": [50960, + 2246, 4564, 2316, 457, 321, 393, 7302, 411, 437, 309, 307, 13, 400, 321, 393, 7302, + 437, 51260], "temperature": 0.0, "avg_logprob": -0.2223291189774223, "compression_ratio": + 1.6812227074235808, "no_speech_prob": 0.0009225498070009053}, {"id": 32, "seek": + 18644, "start": 204.36, "end": 210.6, "text": " would happen if that feature goes + higher or lower. I mean does a higher value for", "tokens": [51260, 576, 1051, 498, + 300, 4111, 1709, 2946, 420, 3126, 13, 286, 914, 775, 257, 2946, 2158, 337, 51572], + "temperature": 0.0, "avg_logprob": -0.2223291189774223, "compression_ratio": 1.6812227074235808, + "no_speech_prob": 0.0009225498070009053}, {"id": 33, "seek": 18644, "start": 210.6, + "end": 215.64, "text": " that feature means higher relevance or not? 
You can''t + really do that with", "tokens": [51572, 300, 4111, 1355, 2946, 32684, 420, 406, + 30, 509, 393, 380, 534, 360, 300, 365, 51824], "temperature": 0.0, "avg_logprob": + -0.2223291189774223, "compression_ratio": 1.6812227074235808, "no_speech_prob": + 0.0009225498070009053}, {"id": 34, "seek": 21564, "start": 215.64, "end": 223.16, + "text": " vector based search. So these kind of problems. Yeah, start to have like + an", "tokens": [50364, 8062, 2361, 3164, 13, 407, 613, 733, 295, 2740, 13, 865, + 11, 722, 281, 362, 411, 364, 50740], "temperature": 0.0, "avg_logprob": -0.3771527189957468, + "compression_ratio": 1.6073298429319371, "no_speech_prob": 0.001931108650751412}, + {"id": 35, "seek": 21564, "start": 223.16, "end": 227.79999999999998, "text": " + input. Right. So you have like your clients using vector based search. They are", + "tokens": [50740, 4846, 13, 1779, 13, 407, 291, 362, 411, 428, 6982, 1228, 8062, + 2361, 3164, 13, 814, 366, 50972], "temperature": 0.0, "avg_logprob": -0.3771527189957468, + "compression_ratio": 1.6073298429319371, "no_speech_prob": 0.001931108650751412}, + {"id": 36, "seek": 21564, "start": 227.79999999999998, "end": 234.92, "text": " + happy and then they are not and they want to explain for example, yeah, what happens.", + "tokens": [50972, 2055, 293, 550, 436, 366, 406, 293, 436, 528, 281, 2903, 337, + 1365, 11, 1338, 11, 437, 2314, 13, 51328], "temperature": 0.0, "avg_logprob": -0.3771527189957468, + "compression_ratio": 1.6073298429319371, "no_speech_prob": 0.001931108650751412}, + {"id": 37, "seek": 21564, "start": 234.92, "end": 242.83999999999997, "text": " + Yeah, and another limitation is keyword based matching. 
So by the", "tokens": [51328, + 865, 11, 293, 1071, 27432, 307, 20428, 2361, 14324, 13, 407, 538, 264, 51724], "temperature": + 0.0, "avg_logprob": -0.3771527189957468, "compression_ratio": 1.6073298429319371, + "no_speech_prob": 0.001931108650751412}, {"id": 38, "seek": 24284, "start": 243.8, + "end": 249.16, "text": " vector based search try to solve the vocabulary in his + best problem. So if you have terms in", "tokens": [50412, 8062, 2361, 3164, 853, + 281, 5039, 264, 19864, 294, 702, 1151, 1154, 13, 407, 498, 291, 362, 2115, 294, + 50680], "temperature": 0.0, "avg_logprob": -0.2516821637565707, "compression_ratio": + 1.7548076923076923, "no_speech_prob": 0.0053239986300468445}, {"id": 39, "seek": + 24284, "start": 249.16, "end": 255.8, "text": " your vocabulary that''s different + from the vocabulary used for queries. Yeah. At the same time,", "tokens": [50680, + 428, 19864, 300, 311, 819, 490, 264, 19864, 1143, 337, 24109, 13, 865, 13, 1711, + 264, 912, 565, 11, 51012], "temperature": 0.0, "avg_logprob": -0.2516821637565707, + "compression_ratio": 1.7548076923076923, "no_speech_prob": 0.0053239986300468445}, + {"id": 40, "seek": 24284, "start": 255.8, "end": 263.16, "text": " users are used + to have keyword matching documents in their response. 
So when you don''t", "tokens": + [51012, 5022, 366, 1143, 281, 362, 20428, 14324, 8512, 294, 641, 4134, 13, 407, + 562, 291, 500, 380, 51380], "temperature": 0.0, "avg_logprob": -0.2516821637565707, + "compression_ratio": 1.7548076923076923, "no_speech_prob": 0.0053239986300468445}, + {"id": 41, "seek": 24284, "start": 263.16, "end": 268.6, "text": " provide keyword + matching document in their response, they''re going to be like problems and", "tokens": + [51380, 2893, 20428, 14324, 4166, 294, 641, 4134, 11, 436, 434, 516, 281, 312, 411, + 2740, 293, 51652], "temperature": 0.0, "avg_logprob": -0.2516821637565707, "compression_ratio": + 1.7548076923076923, "no_speech_prob": 0.0053239986300468445}, {"id": 42, "seek": + 26860, "start": 268.6, "end": 273.88, "text": " questions. Yeah. Why do I see this + now? And why I don''t see for example this title.", "tokens": [50364, 1651, 13, + 865, 13, 1545, 360, 286, 536, 341, 586, 30, 400, 983, 286, 500, 380, 536, 337, 1365, + 341, 4876, 13, 50628], "temperature": 0.0, "avg_logprob": -0.3079527034315952, "compression_ratio": + 1.5863636363636364, "no_speech_prob": 0.003633220912888646}, {"id": 43, "seek": + 26860, "start": 273.88, "end": 280.84000000000003, "text": " Oh yeah. So without + any search, the idea is to mitigate those problems. So mix up", "tokens": [50628, + 876, 1338, 13, 407, 1553, 604, 3164, 11, 264, 1558, 307, 281, 27336, 729, 2740, + 13, 407, 2890, 493, 50976], "temperature": 0.0, "avg_logprob": -0.3079527034315952, + "compression_ratio": 1.5863636363636364, "no_speech_prob": 0.003633220912888646}, + {"id": 44, "seek": 26860, "start": 281.48, "end": 287.96000000000004, "text": " + different query results sets. 
Potentially like vector based search results and traditional", + "tokens": [51008, 819, 14581, 3542, 6352, 13, 9145, 3137, 411, 8062, 2361, 3164, + 3542, 293, 5164, 51332], "temperature": 0.0, "avg_logprob": -0.3079527034315952, + "compression_ratio": 1.5863636363636364, "no_speech_prob": 0.003633220912888646}, + {"id": 45, "seek": 26860, "start": 287.96000000000004, "end": 293.64000000000004, + "text": " keyword based search results. Yeah. Get back one result set. Yeah. Let''s + try to combine both", "tokens": [51332, 20428, 2361, 3164, 3542, 13, 865, 13, 3240, + 646, 472, 1874, 992, 13, 865, 13, 961, 311, 853, 281, 10432, 1293, 51616], "temperature": + 0.0, "avg_logprob": -0.3079527034315952, "compression_ratio": 1.5863636363636364, + "no_speech_prob": 0.003633220912888646}, {"id": 46, "seek": 29364, "start": 293.64, + "end": 299.88, "text": " words. Interesting. And if we kind of step forward from + this, let''s say we deployed hybrid search,", "tokens": [50364, 2283, 13, 14711, + 13, 400, 498, 321, 733, 295, 1823, 2128, 490, 341, 11, 718, 311, 584, 321, 17826, + 13051, 3164, 11, 50676], "temperature": 0.0, "avg_logprob": -0.16302698113945094, + "compression_ratio": 1.5925925925925926, "no_speech_prob": 0.0038669852074235678}, + {"id": 47, "seek": 29364, "start": 301.15999999999997, "end": 309.4, "text": " so + now it basically takes some similar documents from keyword hits and then another + one from vector.", "tokens": [50740, 370, 586, 309, 1936, 2516, 512, 2531, 8512, + 490, 20428, 8664, 293, 550, 1071, 472, 490, 8062, 13, 51152], "temperature": 0.0, + "avg_logprob": -0.16302698113945094, "compression_ratio": 1.5925925925925926, "no_speech_prob": + 0.0038669852074235678}, {"id": 48, "seek": 29364, "start": 310.36, "end": 315.4, + "text": " You still get those documents that do not have keyword matches, right, + from the vector space.", "tokens": [51200, 509, 920, 483, 729, 8512, 300, 360, 406, + 362, 20428, 10676, 11, 558, 11, 490, 264, 8062, 1901, 13, 
51452], "temperature": + 0.0, "avg_logprob": -0.16302698113945094, "compression_ratio": 1.5925925925925926, + "no_speech_prob": 0.0038669852074235678}, {"id": 49, "seek": 29364, "start": 316.2, + "end": 321.56, "text": " Do you know or maybe you have employed some ways of explaining + to the user why they see them?", "tokens": [51492, 1144, 291, 458, 420, 1310, 291, + 362, 20115, 512, 2098, 295, 13468, 281, 264, 4195, 983, 436, 536, 552, 30, 51760], + "temperature": 0.0, "avg_logprob": -0.16302698113945094, "compression_ratio": 1.5925925925925926, + "no_speech_prob": 0.0038669852074235678}, {"id": 50, "seek": 32156, "start": 321.88, + "end": 329.0, "text": " So that''s an interesting point actually at the discussion + recently about how can we explain better", "tokens": [50380, 407, 300, 311, 364, + 1880, 935, 767, 412, 264, 5017, 3938, 466, 577, 393, 321, 2903, 1101, 50736], "temperature": + 0.0, "avg_logprob": -0.3099801502530537, "compression_ratio": 1.6256983240223464, + "no_speech_prob": 0.0008720722398720682}, {"id": 51, "seek": 32156, "start": 329.0, + "end": 335.4, "text": " by vector based search. So we mentioned already all the + problems. We''ve explained the what can we do", "tokens": [50736, 538, 8062, 2361, + 3164, 13, 407, 321, 2835, 1217, 439, 264, 2740, 13, 492, 600, 8825, 264, 437, 393, + 321, 360, 51056], "temperature": 0.0, "avg_logprob": -0.3099801502530537, "compression_ratio": + 1.6256983240223464, "no_speech_prob": 0.0008720722398720682}, {"id": 52, "seek": + 32156, "start": 335.4, "end": 342.76, "text": " bet. 
So there are other approaches + that just cure dense vector based search such as learned", "tokens": [51056, 778, + 13, 407, 456, 366, 661, 11587, 300, 445, 13698, 18011, 8062, 2361, 3164, 1270, 382, + 3264, 51424], "temperature": 0.0, "avg_logprob": -0.3099801502530537, "compression_ratio": + 1.6256983240223464, "no_speech_prob": 0.0008720722398720682}, {"id": 53, "seek": + 34276, "start": 342.76, "end": 350.68, "text": " sparse retrieval for example, where + you learn query or document expansion term candidates", "tokens": [50364, 637, 11668, + 19817, 3337, 337, 1365, 11, 689, 291, 1466, 14581, 420, 4166, 11260, 1433, 11255, + 50760], "temperature": 0.0, "avg_logprob": -0.18904660240052237, "compression_ratio": + 1.574468085106383, "no_speech_prob": 0.01907644234597683}, {"id": 54, "seek": 34276, + "start": 351.56, "end": 359.71999999999997, "text": " based on learned models. So + based on the probability you will expand your queries with additional terms.", "tokens": + [50804, 2361, 322, 3264, 5245, 13, 407, 2361, 322, 264, 8482, 291, 486, 5268, 428, + 24109, 365, 4497, 2115, 13, 51212], "temperature": 0.0, "avg_logprob": -0.18904660240052237, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.01907644234597683}, + {"id": 55, "seek": 34276, "start": 360.84, "end": 366.44, "text": " So that''s a + little bit more explainable because at least you get back from the machine learning + model", "tokens": [51268, 407, 300, 311, 257, 707, 857, 544, 2903, 712, 570, 412, + 1935, 291, 483, 646, 490, 264, 3479, 2539, 2316, 51548], "temperature": 0.0, "avg_logprob": + -0.18904660240052237, "compression_ratio": 1.574468085106383, "no_speech_prob": + 0.01907644234597683}, {"id": 56, "seek": 36644, "start": 366.52, "end": 373.48, + "text": " alternative terms for the queries and the document. Yeah. 
It''s still + a first layer of explainability.", "tokens": [50368, 8535, 2115, 337, 264, 24109, + 293, 264, 4166, 13, 865, 13, 467, 311, 920, 257, 700, 4583, 295, 2903, 2310, 13, + 50716], "temperature": 0.0, "avg_logprob": -0.3070507049560547, "compression_ratio": + 1.6130434782608696, "no_speech_prob": 0.009303648956120014}, {"id": 57, "seek": + 36644, "start": 373.48, "end": 378.36, "text": " So you have some that''s like additional + concepts. So it''s easier to understand.", "tokens": [50716, 407, 291, 362, 512, + 300, 311, 411, 4497, 10392, 13, 407, 309, 311, 3571, 281, 1223, 13, 50960], "temperature": + 0.0, "avg_logprob": -0.3070507049560547, "compression_ratio": 1.6130434782608696, + "no_speech_prob": 0.009303648956120014}, {"id": 58, "seek": 36644, "start": 378.36, + "end": 384.28, "text": " Still you have the probability assigned to their pair. + So if it goes wrong, you may end up with", "tokens": [50960, 8291, 291, 362, 264, + 8482, 13279, 281, 641, 6119, 13, 407, 498, 309, 1709, 2085, 11, 291, 815, 917, 493, + 365, 51256], "temperature": 0.0, "avg_logprob": -0.3070507049560547, "compression_ratio": + 1.6130434782608696, "no_speech_prob": 0.009303648956120014}, {"id": 59, "seek": + 36644, "start": 385.08, "end": 390.28, "text": " unreasonable terms. So not perfect. 
+ A little bit better, maybe a little bit more explainable.", "tokens": [51296, 41730, + 2115, 13, 407, 406, 2176, 13, 316, 707, 857, 1101, 11, 1310, 257, 707, 857, 544, + 2903, 712, 13, 51556], "temperature": 0.0, "avg_logprob": -0.3070507049560547, "compression_ratio": + 1.6130434782608696, "no_speech_prob": 0.009303648956120014}, {"id": 60, "seek": + 39028, "start": 391.23999999999995, "end": 399.23999999999995, "text": " And then + there are approaches such callvert where you encode your sentence, not just to just + one", "tokens": [50412, 400, 550, 456, 366, 11587, 1270, 818, 3281, 689, 291, 2058, + 1429, 428, 8174, 11, 406, 445, 281, 445, 472, 50812], "temperature": 0.0, "avg_logprob": + -0.35245581234202666, "compression_ratio": 1.6054054054054054, "no_speech_prob": + 0.015376309864223003}, {"id": 61, "seek": 39028, "start": 399.23999999999995, "end": + 405.55999999999995, "text": " vector back to a sequence of vectors. So multiple + vectors, one pair for an action. And you do the same", "tokens": [50812, 8062, 646, + 281, 257, 8310, 295, 18875, 13, 407, 3866, 18875, 11, 472, 6119, 337, 364, 3069, + 13, 400, 291, 360, 264, 912, 51128], "temperature": 0.0, "avg_logprob": -0.35245581234202666, + "compression_ratio": 1.6054054054054054, "no_speech_prob": 0.015376309864223003}, + {"id": 62, "seek": 39028, "start": 405.55999999999995, "end": 413.88, "text": " + for your documents. And then you you basically return results based on the similarity + between not", "tokens": [51128, 337, 428, 8512, 13, 400, 550, 291, 291, 1936, 2736, + 3542, 2361, 322, 264, 32194, 1296, 406, 51544], "temperature": 0.0, "avg_logprob": + -0.35245581234202666, "compression_ratio": 1.6054054054054054, "no_speech_prob": + 0.015376309864223003}, {"id": 63, "seek": 41388, "start": 413.88, "end": 419.64, + "text": " just a single query vector and the document vector, but multiple query + vectors. 
So each query", "tokens": [50364, 445, 257, 2167, 14581, 8062, 293, 264, + 4166, 8062, 11, 457, 3866, 14581, 18875, 13, 407, 1184, 14581, 50652], "temperature": + 0.0, "avg_logprob": -0.20008046139952956, "compression_ratio": 1.8413461538461537, + "no_speech_prob": 0.003499676939100027}, {"id": 64, "seek": 41388, "start": 419.64, + "end": 425.48, "text": " at each query vector, which is meant to be probably a term + with the terms in the document. So you", "tokens": [50652, 412, 1184, 14581, 8062, + 11, 597, 307, 4140, 281, 312, 1391, 257, 1433, 365, 264, 2115, 294, 264, 4166, 13, + 407, 291, 50944], "temperature": 0.0, "avg_logprob": -0.20008046139952956, "compression_ratio": + 1.8413461538461537, "no_speech_prob": 0.003499676939100027}, {"id": 65, "seek": + 41388, "start": 425.48, "end": 431.48, "text": " may be able to highlight the terms + in the document that are close to the terms in the query. Yeah.", "tokens": [50944, + 815, 312, 1075, 281, 5078, 264, 2115, 294, 264, 4166, 300, 366, 1998, 281, 264, + 2115, 294, 264, 14581, 13, 865, 13, 51244], "temperature": 0.0, "avg_logprob": -0.20008046139952956, + "compression_ratio": 1.8413461538461537, "no_speech_prob": 0.003499676939100027}, + {"id": 66, "seek": 41388, "start": 432.2, "end": 436.52, "text": " Also in this + case, of course, it''s just a first layer of explainability because then if this", + "tokens": [51280, 2743, 294, 341, 1389, 11, 295, 1164, 11, 309, 311, 445, 257, 700, + 4583, 295, 2903, 2310, 570, 550, 498, 341, 51496], "temperature": 0.0, "avg_logprob": + -0.20008046139952956, "compression_ratio": 1.8413461538461537, "no_speech_prob": + 0.003499676939100027}, {"id": 67, "seek": 43652, "start": 436.52, "end": 442.03999999999996, + "text": " goes wrong, of course, again, you have sequences of vectors. 
So you can + get like a sort of", "tokens": [50364, 1709, 2085, 11, 295, 1164, 11, 797, 11, 291, + 362, 22978, 295, 18875, 13, 407, 291, 393, 483, 411, 257, 1333, 295, 50640], "temperature": + 0.0, "avg_logprob": -0.24840266579075865, "compression_ratio": 1.5646551724137931, + "no_speech_prob": 0.002190630417317152}, {"id": 68, "seek": 43652, "start": 443.79999999999995, + "end": 452.44, "text": " itm up of what query terms match is like, more or less + the document ones, but still not perfect.", "tokens": [50728, 309, 76, 493, 295, + 437, 14581, 2115, 2995, 307, 411, 11, 544, 420, 1570, 264, 4166, 2306, 11, 457, + 920, 406, 2176, 13, 51160], "temperature": 0.0, "avg_logprob": -0.24840266579075865, + "compression_ratio": 1.5646551724137931, "no_speech_prob": 0.002190630417317152}, + {"id": 69, "seek": 43652, "start": 452.44, "end": 456.44, "text": " Yeah, sure. + Of course, it''s kind of like maybe experimentation that is required,", "tokens": + [51160, 865, 11, 988, 13, 2720, 1164, 11, 309, 311, 733, 295, 411, 1310, 37142, + 300, 307, 4739, 11, 51360], "temperature": 0.0, "avg_logprob": -0.24840266579075865, + "compression_ratio": 1.5646551724137931, "no_speech_prob": 0.002190630417317152}, + {"id": 70, "seek": 43652, "start": 456.44, "end": 463.56, "text": " right? What + works for you? What what what is the end product? But maybe one question is for + me", "tokens": [51360, 558, 30, 708, 1985, 337, 291, 30, 708, 437, 437, 307, 264, + 917, 1674, 30, 583, 1310, 472, 1168, 307, 337, 385, 51716], "temperature": 0.0, + "avg_logprob": -0.24840266579075865, "compression_ratio": 1.5646551724137931, "no_speech_prob": + 0.002190630417317152}, {"id": 71, "seek": 46356, "start": 463.56, "end": 469.56, + "text": " as a user, right? Let''s say I''m using solar and you offer hybrid search + now. 
Are you already", "tokens": [50364, 382, 257, 4195, 11, 558, 30, 961, 311, + 584, 286, 478, 1228, 7936, 293, 291, 2626, 13051, 3164, 586, 13, 2014, 291, 1217, + 50664], "temperature": 0.0, "avg_logprob": -0.14990426018124536, "compression_ratio": + 1.6464285714285714, "no_speech_prob": 0.004965724423527718}, {"id": 72, "seek": + 46356, "start": 469.56, "end": 476.6, "text": " offering or will you consider at + some point offering the capability to ingrain what you just said", "tokens": [50664, + 8745, 420, 486, 291, 1949, 412, 512, 935, 8745, 264, 13759, 281, 3957, 7146, 437, + 291, 445, 848, 51016], "temperature": 0.0, "avg_logprob": -0.14990426018124536, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.004965724423527718}, + {"id": 73, "seek": 46356, "start": 476.6, "end": 481.08, "text": " into let''s say + highlighter in solar that will it will actually build me the snippet", "tokens": + [51016, 666, 718, 311, 584, 40455, 294, 7936, 300, 486, 309, 486, 767, 1322, 385, + 264, 35623, 302, 51240], "temperature": 0.0, "avg_logprob": -0.14990426018124536, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.004965724423527718}, + {"id": 74, "seek": 46356, "start": 481.8, "end": 486.2, "text": " regardless of + the source of that document, whether it''s keyboard or vector. 
That''s a very", + "tokens": [51276, 10060, 295, 264, 4009, 295, 300, 4166, 11, 1968, 309, 311, 10186, + 420, 8062, 13, 663, 311, 257, 588, 51496], "temperature": 0.0, "avg_logprob": -0.14990426018124536, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.004965724423527718}, + {"id": 75, "seek": 46356, "start": 486.2, "end": 491.8, "text": " interesting question + because there are in my opinion two layers of explainability for engineers.", "tokens": + [51496, 1880, 1168, 570, 456, 366, 294, 452, 4800, 732, 7914, 295, 2903, 2310, 337, + 11955, 13, 51776], "temperature": 0.0, "avg_logprob": -0.14990426018124536, "compression_ratio": + 1.6464285714285714, "no_speech_prob": 0.004965724423527718}, {"id": 76, "seek": + 49180, "start": 492.04, "end": 496.2, "text": " So we need to work on the engine + and change the ranking, change the matching", "tokens": [50376, 407, 321, 643, 281, + 589, 322, 264, 2848, 293, 1319, 264, 17833, 11, 1319, 264, 14324, 50584], "temperature": + 0.0, "avg_logprob": -0.4303919954119988, "compression_ratio": 1.6653846153846155, + "no_speech_prob": 0.005520969163626432}, {"id": 77, "seek": 49180, "start": 497.8, + "end": 501.8, "text": " and user''s equipment. Yeah. 
So a user that just want to + know why for example,", "tokens": [50664, 293, 4195, 311, 5927, 13, 865, 13, 407, + 257, 4195, 300, 445, 528, 281, 458, 983, 337, 1365, 11, 50864], "temperature": 0.0, + "avg_logprob": -0.4303919954119988, "compression_ratio": 1.6653846153846155, "no_speech_prob": + 0.005520969163626432}, {"id": 78, "seek": 49180, "start": 501.8, "end": 507.08000000000004, + "text": " what is there and for user''s finability actually, my company, we design + and develop the", "tokens": [50864, 437, 307, 456, 293, 337, 4195, 311, 962, 2310, + 767, 11, 452, 2237, 11, 321, 1715, 293, 1499, 264, 51128], "temperature": 0.0, "avg_logprob": + -0.4303919954119988, "compression_ratio": 1.6653846153846155, "no_speech_prob": + 0.005520969163626432}, {"id": 79, "seek": 49180, "start": 507.08000000000004, "end": + 513.8, "text": " highlighter. We call the neural highlighter that takes in input + the wireless model and in the response,", "tokens": [51128, 40455, 13, 492, 818, + 264, 18161, 40455, 300, 2516, 294, 4846, 264, 14720, 2316, 293, 294, 264, 4134, + 11, 51464], "temperature": 0.0, "avg_logprob": -0.4303919954119988, "compression_ratio": + 1.6653846153846155, "no_speech_prob": 0.005520969163626432}, {"id": 80, "seek": + 49180, "start": 513.8, "end": 519.0, "text": " will I like the snippet for each + result in documents, not based on let''s say on match,", "tokens": [51464, 486, + 286, 411, 264, 35623, 302, 337, 1184, 1874, 294, 8512, 11, 406, 2361, 322, 718, + 311, 584, 322, 2995, 11, 51724], "temperature": 0.0, "avg_logprob": -0.4303919954119988, + "compression_ratio": 1.6653846153846155, "no_speech_prob": 0.005520969163626432}, + {"id": 81, "seek": 51900, "start": 519.0, "end": 524.2, "text": " but based on the + question as a system powered by a level model. Yeah. 
So in this way,", "tokens": + [50364, 457, 2361, 322, 264, 1168, 382, 257, 1185, 17786, 538, 257, 1496, 2316, + 13, 865, 13, 407, 294, 341, 636, 11, 50624], "temperature": 0.0, "avg_logprob": + -0.3864540713174002, "compression_ratio": 1.6692307692307693, "no_speech_prob": + 0.0030129007063806057}, {"id": 82, "seek": 51900, "start": 524.2, "end": 529.16, + "text": " you will be able to highlight part of the original document that are semantically + close to the", "tokens": [50624, 291, 486, 312, 1075, 281, 5078, 644, 295, 264, + 3380, 4166, 300, 366, 4361, 49505, 1998, 281, 264, 50872], "temperature": 0.0, "avg_logprob": + -0.3864540713174002, "compression_ratio": 1.6692307692307693, "no_speech_prob": + 0.0030129007063806057}, {"id": 83, "seek": 51900, "start": 529.16, "end": 532.36, + "text": " interesting place. Can you say the name again? What was the name? It''s + called the neural", "tokens": [50872, 1880, 1081, 13, 1664, 291, 584, 264, 1315, + 797, 30, 708, 390, 264, 1315, 30, 467, 311, 1219, 264, 18161, 51032], "temperature": + 0.0, "avg_logprob": -0.3864540713174002, "compression_ratio": 1.6692307692307693, + "no_speech_prob": 0.0030129007063806057}, {"id": 84, "seek": 51900, "start": 532.36, + "end": 537.0, "text": " highlighter. Neural neural highlighter. So it''s your proprietary + product right now. Yes.", "tokens": [51032, 40455, 13, 1734, 1807, 18161, 40455, + 13, 407, 309, 311, 428, 38992, 1674, 558, 586, 13, 1079, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.3864540713174002, "compression_ratio": 1.6692307692307693, + "no_speech_prob": 0.0030129007063806057}, {"id": 85, "seek": 51900, "start": 537.0, + "end": 542.04, "text": " It''s a lot of synths, right? Yes. Right now, yes. 
We may + contribute it to a", "tokens": [51264, 467, 311, 257, 688, 295, 10657, 82, 11, 558, + 30, 1079, 13, 1779, 586, 11, 2086, 13, 492, 815, 10586, 309, 281, 257, 51516], "temperature": + 0.0, "avg_logprob": -0.3864540713174002, "compression_ratio": 1.6692307692307693, + "no_speech_prob": 0.0030129007063806057}, {"id": 86, "seek": 54204, "start": 542.04, + "end": 547.3199999999999, "text": " open source integer. I don''t know. Right now + is one of our products. But I mean,", "tokens": [50364, 1269, 4009, 24922, 13, 286, + 500, 380, 458, 13, 1779, 586, 307, 472, 295, 527, 3383, 13, 583, 286, 914, 11, 50628], + "temperature": 0.0, "avg_logprob": -0.38971791501905095, "compression_ratio": 1.6772908366533865, + "no_speech_prob": 0.031038612127304077}, {"id": 87, "seek": 54204, "start": 547.3199999999999, + "end": 552.12, "text": " it''s a feature that you, is it offered as a standalone + component? It''s a plugin.", "tokens": [50628, 309, 311, 257, 4111, 300, 291, 11, + 307, 309, 8059, 382, 257, 37454, 6542, 30, 467, 311, 257, 23407, 13, 50868], "temperature": + 0.0, "avg_logprob": -0.38971791501905095, "compression_ratio": 1.6772908366533865, + "no_speech_prob": 0.031038612127304077}, {"id": 88, "seek": 54204, "start": 552.12, + "end": 556.1999999999999, "text": " It''s a plugin. So you ins it''s a plugin to + pull it. It''s a plugin. That''s the value", "tokens": [50868, 467, 311, 257, 23407, + 13, 407, 291, 1028, 309, 311, 257, 23407, 281, 2235, 309, 13, 467, 311, 257, 23407, + 13, 663, 311, 264, 2158, 51072], "temperature": 0.0, "avg_logprob": -0.38971791501905095, + "compression_ratio": 1.6772908366533865, "no_speech_prob": 0.031038612127304077}, + {"id": 89, "seek": 54204, "start": 556.1999999999999, "end": 561.16, "text": " prop + as well, right? It doesn''t always need to be open. 
It''s something I can plug in + and", "tokens": [51072, 2365, 382, 731, 11, 558, 30, 467, 1177, 380, 1009, 643, + 281, 312, 1269, 13, 467, 311, 746, 286, 393, 5452, 294, 293, 51320], "temperature": + 0.0, "avg_logprob": -0.38971791501905095, "compression_ratio": 1.6772908366533865, + "no_speech_prob": 0.031038612127304077}, {"id": 90, "seek": 54204, "start": 561.7199999999999, + "end": 566.28, "text": " exactly. It takes in input the wireless model. Yeah. It''s + a response point. So you can", "tokens": [51348, 2293, 13, 467, 2516, 294, 4846, + 264, 14720, 2316, 13, 865, 13, 467, 311, 257, 4134, 935, 13, 407, 291, 393, 51576], + "temperature": 0.0, "avg_logprob": -0.38971791501905095, "compression_ratio": 1.6772908366533865, + "no_speech_prob": 0.031038612127304077}, {"id": 91, "seek": 56628, "start": 567.0, + "end": 572.68, "text": " write. So that will help to explain results to the users, + right? And you also mentioned,", "tokens": [50400, 2464, 13, 407, 300, 486, 854, + 281, 2903, 3542, 281, 264, 5022, 11, 558, 30, 400, 291, 611, 2835, 11, 50684], "temperature": + 0.0, "avg_logprob": -0.2052457250397781, "compression_ratio": 1.841897233201581, + "no_speech_prob": 0.014821712858974934}, {"id": 92, "seek": 56628, "start": 572.68, + "end": 576.76, "text": " right? It''s thanks for doing this distinction, making + this distinction that there is also", "tokens": [50684, 558, 30, 467, 311, 3231, + 337, 884, 341, 16844, 11, 1455, 341, 16844, 300, 456, 307, 611, 50888], "temperature": + 0.0, "avg_logprob": -0.2052457250397781, "compression_ratio": 1.841897233201581, + "no_speech_prob": 0.014821712858974934}, {"id": 93, "seek": 56628, "start": 576.76, + "end": 582.04, "text": " explainability for the engineers that is also important. 
+ So can you a bit explain what you mean?", "tokens": [50888, 2903, 2310, 337, 264, + 11955, 300, 307, 611, 1021, 13, 407, 393, 291, 257, 857, 2903, 437, 291, 914, 30, + 51152], "temperature": 0.0, "avg_logprob": -0.2052457250397781, "compression_ratio": + 1.841897233201581, "no_speech_prob": 0.014821712858974934}, {"id": 94, "seek": 56628, + "start": 582.04, "end": 586.92, "text": " Explainability for the engineers because + I care about it a lot of force. But I used to be an", "tokens": [51152, 39574, 2310, + 337, 264, 11955, 570, 286, 1127, 466, 309, 257, 688, 295, 3464, 13, 583, 286, 1143, + 281, 312, 364, 51396], "temperature": 0.0, "avg_logprob": -0.2052457250397781, "compression_ratio": + 1.841897233201581, "no_speech_prob": 0.014821712858974934}, {"id": 95, "seek": 56628, + "start": 586.92, "end": 593.3199999999999, "text": " engineer full time. And I need + to know how, like, how to do it, how to tweak something. But also,", "tokens": [51396, + 11403, 1577, 565, 13, 400, 286, 643, 281, 458, 577, 11, 411, 11, 577, 281, 360, + 309, 11, 577, 281, 29879, 746, 13, 583, 611, 11, 51716], "temperature": 0.0, "avg_logprob": + -0.2052457250397781, "compression_ratio": 1.841897233201581, "no_speech_prob": 0.014821712858974934}, + {"id": 96, "seek": 59332, "start": 593.32, "end": 597.6400000000001, "text": " can + I explain to myself that what I tweaked is actually the right thing, right? So,", + "tokens": [50364, 393, 286, 2903, 281, 2059, 300, 437, 286, 6986, 7301, 307, 767, + 264, 558, 551, 11, 558, 30, 407, 11, 50580], "temperature": 0.0, "avg_logprob": + -0.17626971988887577, "compression_ratio": 1.6383928571428572, "no_speech_prob": + 0.0027192109264433384}, {"id": 97, "seek": 59332, "start": 597.6400000000001, "end": + 603.5600000000001, "text": " you know, kind of the process of engineering the. Yes. 
+ So in solar, for example, there is a debug", "tokens": [50580, 291, 458, 11, 733, + 295, 264, 1399, 295, 7043, 264, 13, 1079, 13, 407, 294, 7936, 11, 337, 1365, 11, + 456, 307, 257, 24083, 50876], "temperature": 0.0, "avg_logprob": -0.17626971988887577, + "compression_ratio": 1.6383928571428572, "no_speech_prob": 0.0027192109264433384}, + {"id": 98, "seek": 59332, "start": 603.5600000000001, "end": 611.5600000000001, + "text": " component that give you the ability to engineer to expand the response + with the information about", "tokens": [50876, 6542, 300, 976, 291, 264, 3485, 281, + 11403, 281, 5268, 264, 4134, 365, 264, 1589, 466, 51276], "temperature": 0.0, "avg_logprob": + -0.17626971988887577, "compression_ratio": 1.6383928571428572, "no_speech_prob": + 0.0027192109264433384}, {"id": 99, "seek": 59332, "start": 611.5600000000001, "end": + 617.1600000000001, "text": " how the score was calculated. So in solar, when you + have a query and you have a result,", "tokens": [51276, 577, 264, 6175, 390, 15598, + 13, 407, 294, 7936, 11, 562, 291, 362, 257, 14581, 293, 291, 362, 257, 1874, 11, + 51556], "temperature": 0.0, "avg_logprob": -0.17626971988887577, "compression_ratio": + 1.6383928571428572, "no_speech_prob": 0.0027192109264433384}, {"id": 100, "seek": + 61716, "start": 617.16, "end": 623.0, "text": " a score is calculated for that result + for that query. And this core will impact the ranking.", "tokens": [50364, 257, + 6175, 307, 15598, 337, 300, 1874, 337, 300, 14581, 13, 400, 341, 4965, 486, 2712, + 264, 17833, 13, 50656], "temperature": 0.0, "avg_logprob": -0.26591312885284424, + "compression_ratio": 1.7033492822966507, "no_speech_prob": 0.011101032607257366}, + {"id": 101, "seek": 61716, "start": 623.0, "end": 630.36, "text": " So, descending + order, literally, right from the highest core to the lowest. 
And normally,", "tokens": + [50656, 407, 11, 40182, 1668, 11, 3736, 11, 558, 490, 264, 6343, 4965, 281, 264, + 12437, 13, 400, 5646, 11, 51024], "temperature": 0.0, "avg_logprob": -0.26591312885284424, + "compression_ratio": 1.7033492822966507, "no_speech_prob": 0.011101032607257366}, + {"id": 102, "seek": 61716, "start": 630.36, "end": 637.56, "text": " this core is + explained showing why you get that mathematical calculations from the term", "tokens": + [51024, 341, 4965, 307, 8825, 4099, 983, 291, 483, 300, 18894, 20448, 490, 264, + 1433, 51384], "temperature": 0.0, "avg_logprob": -0.26591312885284424, "compression_ratio": + 1.7033492822966507, "no_speech_prob": 0.011101032607257366}, {"id": 103, "seek": + 61716, "start": 637.56, "end": 643.48, "text": " frequencies. Yeah. The length of + the document field, the average length of the field,", "tokens": [51384, 20250, + 13, 865, 13, 440, 4641, 295, 264, 4166, 2519, 11, 264, 4274, 4641, 295, 264, 2519, + 11, 51680], "temperature": 0.0, "avg_logprob": -0.26591312885284424, "compression_ratio": + 1.7033492822966507, "no_speech_prob": 0.011101032607257366}, {"id": 104, "seek": + 64348, "start": 643.48, "end": 649.08, "text": " the document, frequency, hour, + error, a term was, for example, and so on and so forth.", "tokens": [50364, 264, + 4166, 11, 7893, 11, 1773, 11, 6713, 11, 257, 1433, 390, 11, 337, 1365, 11, 293, + 370, 322, 293, 370, 5220, 13, 50644], "temperature": 0.0, "avg_logprob": -0.3125187590882018, + "compression_ratio": 1.6017316017316017, "no_speech_prob": 0.007342732511460781}, + {"id": 105, "seek": 64348, "start": 649.08, "end": 654.44, "text": " So long mathematical + expression that are readable to the user and you can understand, okay,", "tokens": + [50644, 407, 938, 18894, 6114, 300, 366, 49857, 281, 264, 4195, 293, 291, 393, 1223, + 11, 1392, 11, 50912], "temperature": 0.0, "avg_logprob": -0.3125187590882018, "compression_ratio": + 1.6017316017316017, "no_speech_prob": 0.007342732511460781}, 
{"id": 106, "seek": + 64348, "start": 654.44, "end": 660.52, "text": " I was aiming for this field to + impact the score. Let''s see, let''s see, really impact the score.", "tokens": [50912, + 286, 390, 20253, 337, 341, 2519, 281, 2712, 264, 6175, 13, 961, 311, 536, 11, 718, + 311, 536, 11, 534, 2712, 264, 6175, 13, 51216], "temperature": 0.0, "avg_logprob": + -0.3125187590882018, "compression_ratio": 1.6017316017316017, "no_speech_prob": + 0.007342732511460781}, {"id": 107, "seek": 64348, "start": 662.12, "end": 667.32, + "text": " With better research right now, the only explanation that you get from + an engineer perspective,", "tokens": [51296, 2022, 1101, 2132, 558, 586, 11, 264, + 787, 10835, 300, 291, 483, 490, 364, 11403, 4585, 11, 51556], "temperature": 0.0, + "avg_logprob": -0.3125187590882018, "compression_ratio": 1.6017316017316017, "no_speech_prob": + 0.007342732511460781}, {"id": 108, "seek": 66732, "start": 667.32, "end": 674.84, + "text": " literally is within the top K. So this document was within a top K with + a cosine similarity between", "tokens": [50364, 3736, 307, 1951, 264, 1192, 591, + 13, 407, 341, 4166, 390, 1951, 257, 1192, 591, 365, 257, 23565, 32194, 1296, 50740], + "temperature": 0.0, "avg_logprob": -0.23210259701343292, "compression_ratio": 1.6594827586206897, + "no_speech_prob": 0.010490495711565018}, {"id": 109, "seek": 66732, "start": 674.84, + "end": 680.2, "text": " the query vector and the document vector. That''s not really + helpful. It''s just confirming what''s", "tokens": [50740, 264, 14581, 8062, 293, + 264, 4166, 8062, 13, 663, 311, 406, 534, 4961, 13, 467, 311, 445, 42861, 437, 311, + 51008], "temperature": 0.0, "avg_logprob": -0.23210259701343292, "compression_ratio": + 1.6594827586206897, "no_speech_prob": 0.010490495711565018}, {"id": 110, "seek": + 66732, "start": 680.2, "end": 685.48, "text": " you know already, right? I mean, + yeah, it''s in the top K, it was written. 
So one of the ideas I was", "tokens": + [51008, 291, 458, 1217, 11, 558, 30, 286, 914, 11, 1338, 11, 309, 311, 294, 264, + 1192, 591, 11, 309, 390, 3720, 13, 407, 472, 295, 264, 3487, 286, 390, 51272], "temperature": + 0.0, "avg_logprob": -0.23210259701343292, "compression_ratio": 1.6594827586206897, + "no_speech_prob": 0.010490495711565018}, {"id": 111, "seek": 66732, "start": 685.48, + "end": 691.48, "text": " thinking of, because actually quite far from the implementation + is to explain the reason", "tokens": [51272, 1953, 295, 11, 570, 767, 1596, 1400, + 490, 264, 11420, 307, 281, 2903, 264, 1778, 51572], "temperature": 0.0, "avg_logprob": + -0.23210259701343292, "compression_ratio": 1.6594827586206897, "no_speech_prob": + 0.010490495711565018}, {"id": 112, "seek": 69148, "start": 691.64, "end": 700.76, + "text": " document in the results set, showing examples of, so the language models + used to return embedded,", "tokens": [50372, 4166, 294, 264, 3542, 992, 11, 4099, + 5110, 295, 11, 370, 264, 2856, 5245, 1143, 281, 2736, 16741, 11, 50828], "temperature": + 0.0, "avg_logprob": -0.3550327212311501, "compression_ratio": 1.7990867579908676, + "no_speech_prob": 0.0023939379025250673}, {"id": 113, "seek": 69148, "start": 701.4, + "end": 708.84, "text": " were fine tuned on sentence similarity. 
So this means there + were pair of sentences with similar", "tokens": [50860, 645, 2489, 10870, 322, 8174, + 32194, 13, 407, 341, 1355, 456, 645, 6119, 295, 16579, 365, 2531, 51232], "temperature": + 0.0, "avg_logprob": -0.3550327212311501, "compression_ratio": 1.7990867579908676, + "no_speech_prob": 0.0023939379025250673}, {"id": 114, "seek": 69148, "start": 708.84, + "end": 714.6, "text": " meaning and pair of sentences with this similar meaning + in a way that to learn how to encode this", "tokens": [51232, 3620, 293, 6119, 295, + 16579, 365, 341, 2531, 3620, 294, 257, 636, 300, 281, 1466, 577, 281, 2058, 1429, + 341, 51520], "temperature": 0.0, "avg_logprob": -0.3550327212311501, "compression_ratio": + 1.7990867579908676, "no_speech_prob": 0.0023939379025250673}, {"id": 115, "seek": + 69148, "start": 714.6, "end": 721.08, "text": " amount. So I think it could be very + interesting if to explain the reason a document is being returned.", "tokens": [51520, + 2372, 13, 407, 286, 519, 309, 727, 312, 588, 1880, 498, 281, 2903, 264, 1778, 257, + 4166, 307, 885, 8752, 13, 51844], "temperature": 0.0, "avg_logprob": -0.3550327212311501, + "compression_ratio": 1.7990867579908676, "no_speech_prob": 0.0023939379025250673}, + {"id": 116, "seek": 72148, "start": 721.72, "end": 729.4, "text": " Because of vector + cells, you show like a snippet, say, because there are, there is this similar", + "tokens": [50376, 1436, 295, 8062, 5438, 11, 291, 855, 411, 257, 35623, 302, 11, + 584, 11, 570, 456, 366, 11, 456, 307, 341, 2531, 50760], "temperature": 0.0, "avg_logprob": + -0.2672163780699385, "compression_ratio": 1.6565217391304348, "no_speech_prob": + 0.002177764428779483}, {"id": 117, "seek": 72148, "start": 729.4, "end": 737.0, + "text": " pairs of sentences and this is this similar sentence, in the way that + then potentially the engineer", "tokens": [50760, 15494, 295, 16579, 293, 341, 307, + 341, 2531, 8174, 11, 294, 264, 636, 300, 550, 7263, 264, 11403, 51140], 
"temperature": + 0.0, "avg_logprob": -0.2672163780699385, "compression_ratio": 1.6565217391304348, + "no_speech_prob": 0.002177764428779483}, {"id": 118, "seek": 72148, "start": 737.0, + "end": 741.48, "text": " can go back and realize, okay, let''s take a look to the + original training data, for example,", "tokens": [51140, 393, 352, 646, 293, 4325, + 11, 1392, 11, 718, 311, 747, 257, 574, 281, 264, 3380, 3097, 1412, 11, 337, 1365, + 11, 51364], "temperature": 0.0, "avg_logprob": -0.2672163780699385, "compression_ratio": + 1.6565217391304348, "no_speech_prob": 0.002177764428779483}, {"id": 119, "seek": + 72148, "start": 742.2, "end": 748.6, "text": " did I cover the example well or maybe + they are wrong? So I see like, oh, these two sentences", "tokens": [51400, 630, + 286, 2060, 264, 1365, 731, 420, 1310, 436, 366, 2085, 30, 407, 286, 536, 411, 11, + 1954, 11, 613, 732, 16579, 51720], "temperature": 0.0, "avg_logprob": -0.2672163780699385, + "compression_ratio": 1.6565217391304348, "no_speech_prob": 0.002177764428779483}, + {"id": 120, "seek": 74860, "start": 748.6800000000001, "end": 755.96, "text": " + are shown as similar, but they are not so. 
It''s just an idea, you know, study it.", + "tokens": [50368, 366, 4898, 382, 2531, 11, 457, 436, 366, 406, 370, 13, 467, 311, + 445, 364, 1558, 11, 291, 458, 11, 2979, 309, 13, 50732], "temperature": 0.0, "avg_logprob": + -0.2456328894502373, "compression_ratio": 1.6801801801801801, "no_speech_prob": + 0.015567686408758163}, {"id": 121, "seek": 74860, "start": 755.96, "end": 761.72, + "text": " Wow, that''s very interesting, because as you said, it''s very limiting + today to just know that", "tokens": [50732, 3153, 11, 300, 311, 588, 1880, 11, 570, + 382, 291, 848, 11, 309, 311, 588, 22083, 965, 281, 445, 458, 300, 51020], "temperature": + 0.0, "avg_logprob": -0.2456328894502373, "compression_ratio": 1.6801801801801801, + "no_speech_prob": 0.015567686408758163}, {"id": 122, "seek": 74860, "start": 761.72, + "end": 768.28, "text": " geometric search happened and this is the result. Yeah, + that''s amazing. I mean, it''s really interesting", "tokens": [51020, 33246, 3164, + 2011, 293, 341, 307, 264, 1874, 13, 865, 11, 300, 311, 2243, 13, 286, 914, 11, 309, + 311, 534, 1880, 51348], "temperature": 0.0, "avg_logprob": -0.2456328894502373, + "compression_ratio": 1.6801801801801801, "no_speech_prob": 0.015567686408758163}, + {"id": 123, "seek": 74860, "start": 768.28, "end": 774.36, "text": " that with this + work, you are not really just taking something and applying to implementation,", + "tokens": [51348, 300, 365, 341, 589, 11, 291, 366, 406, 534, 445, 1940, 746, 293, + 9275, 281, 11420, 11, 51652], "temperature": 0.0, "avg_logprob": -0.2456328894502373, + "compression_ratio": 1.6801801801801801, "no_speech_prob": 0.015567686408758163}, + {"id": 124, "seek": 77436, "start": 774.36, "end": 779.08, "text": " right? 
Like, + I mean, implementing a plugin, but you actually go into the space of exploring", + "tokens": [50364, 558, 30, 1743, 11, 286, 914, 11, 18114, 257, 23407, 11, 457, 291, + 767, 352, 666, 264, 1901, 295, 12736, 50600], "temperature": 0.0, "avg_logprob": + -0.11868926882743835, "compression_ratio": 1.6239669421487604, "no_speech_prob": + 0.005990014877170324}, {"id": 125, "seek": 77436, "start": 779.08, "end": 784.6, + "text": " thing because it''s not like everything is done, right? And maybe in some + companies, it has been done,", "tokens": [50600, 551, 570, 309, 311, 406, 411, 1203, + 307, 1096, 11, 558, 30, 400, 1310, 294, 512, 3431, 11, 309, 575, 668, 1096, 11, + 50876], "temperature": 0.0, "avg_logprob": -0.11868926882743835, "compression_ratio": + 1.6239669421487604, "no_speech_prob": 0.005990014877170324}, {"id": 126, "seek": + 77436, "start": 784.6, "end": 790.76, "text": " but they are not open sourcing, + right? And so you need to do the search, the search of the solution.", "tokens": + [50876, 457, 436, 366, 406, 1269, 11006, 2175, 11, 558, 30, 400, 370, 291, 643, + 281, 360, 264, 3164, 11, 264, 3164, 295, 264, 3827, 13, 51184], "temperature": 0.0, + "avg_logprob": -0.11868926882743835, "compression_ratio": 1.6239669421487604, "no_speech_prob": + 0.005990014877170324}, {"id": 127, "seek": 77436, "start": 790.76, "end": 799.0, + "text": " That''s very interesting. So in terms of functionality today, hybrid search + is already available in", "tokens": [51184, 663, 311, 588, 1880, 13, 407, 294, 2115, + 295, 14980, 965, 11, 13051, 3164, 307, 1217, 2435, 294, 51596], "temperature": 0.0, + "avg_logprob": -0.11868926882743835, "compression_ratio": 1.6239669421487604, "no_speech_prob": + 0.005990014877170324}, {"id": 128, "seek": 79900, "start": 799.0, "end": 806.44, + "text": " solar, right? Is it already released? And big portion? 
Yeah, so there + are different ways", "tokens": [50364, 7936, 11, 558, 30, 1119, 309, 1217, 4736, + 30, 400, 955, 8044, 30, 865, 11, 370, 456, 366, 819, 2098, 50736], "temperature": + 0.0, "avg_logprob": -0.27918898540994397, "compression_ratio": 1.6311111111111112, + "no_speech_prob": 0.002884520683437586}, {"id": 129, "seek": 79900, "start": 806.44, + "end": 814.36, "text": " I need search can be performed in solar. So right now, + we saw 9.6. There are ways of combining", "tokens": [50736, 286, 643, 3164, 393, + 312, 10332, 294, 7936, 13, 407, 558, 586, 11, 321, 1866, 1722, 13, 21, 13, 821, + 366, 2098, 295, 21928, 51132], "temperature": 0.0, "avg_logprob": -0.27918898540994397, + "compression_ratio": 1.6311111111111112, "no_speech_prob": 0.002884520683437586}, + {"id": 130, "seek": 79900, "start": 814.36, "end": 820.84, "text": " results from + electrical search and vector-based search and then re-rank them, for example,", + "tokens": [51132, 3542, 490, 12147, 3164, 293, 8062, 12, 6032, 3164, 293, 550, 319, + 12, 20479, 552, 11, 337, 1365, 11, 51456], "temperature": 0.0, "avg_logprob": -0.27918898540994397, + "compression_ratio": 1.6311111111111112, "no_speech_prob": 0.002884520683437586}, + {"id": 131, "seek": 79900, "start": 820.84, "end": 827.4, "text": " using learning + pranks. So you give like different ways to different factors. So for example,", + "tokens": [51456, 1228, 2539, 582, 14592, 13, 407, 291, 976, 411, 819, 2098, 281, + 819, 6771, 13, 407, 337, 1365, 11, 51784], "temperature": 0.0, "avg_logprob": -0.27918898540994397, + "compression_ratio": 1.6311111111111112, "no_speech_prob": 0.002884520683437586}, + {"id": 132, "seek": 82740, "start": 827.4, "end": 832.92, "text": " the vector-based + core or the traditional core. Yeah. 
What is coming next, which was the topic of + my", "tokens": [50364, 264, 8062, 12, 6032, 4965, 420, 264, 5164, 4965, 13, 865, + 13, 708, 307, 1348, 958, 11, 597, 390, 264, 4829, 295, 452, 50640], "temperature": + 0.0, "avg_logprob": -0.21674437661772794, "compression_ratio": 1.5991902834008098, + "no_speech_prob": 0.009263226762413979}, {"id": 133, "seek": 82740, "start": 832.92, + "end": 840.28, "text": " talk is the receiver rank user. So that''s coming with + solar 9.7. So I guess in a couple of months,", "tokens": [50640, 751, 307, 264, + 20086, 6181, 4195, 13, 407, 300, 311, 1348, 365, 7936, 1722, 13, 22, 13, 407, 286, + 2041, 294, 257, 1916, 295, 2493, 11, 51008], "temperature": 0.0, "avg_logprob": + -0.21674437661772794, "compression_ratio": 1.5991902834008098, "no_speech_prob": + 0.009263226762413979}, {"id": 134, "seek": 82740, "start": 840.28, "end": 845.9599999999999, + "text": " we''re going to release it. Nice. And that is a way of adding hybrid search + that is independent on", "tokens": [51008, 321, 434, 516, 281, 4374, 309, 13, 5490, + 13, 400, 300, 307, 257, 636, 295, 5127, 13051, 3164, 300, 307, 6695, 322, 51292], + "temperature": 0.0, "avg_logprob": -0.21674437661772794, "compression_ratio": 1.5991902834008098, + "no_speech_prob": 0.009263226762413979}, {"id": 135, "seek": 82740, "start": 845.9599999999999, + "end": 853.0799999999999, "text": " this core and just based on the ranking of the + results. So you mix the different rank lists. Yeah,", "tokens": [51292, 341, 4965, + 293, 445, 2361, 322, 264, 17833, 295, 264, 3542, 13, 407, 291, 2890, 264, 819, 6181, + 14511, 13, 865, 11, 51648], "temperature": 0.0, "avg_logprob": -0.21674437661772794, + "compression_ratio": 1.5991902834008098, "no_speech_prob": 0.009263226762413979}, + {"id": 136, "seek": 85308, "start": 853.1600000000001, "end": 859.08, "text": " + they can be two, maybe more. Yeah. It''s support more supported, not just two. 
And + then you combine them", "tokens": [50368, 436, 393, 312, 732, 11, 1310, 544, 13, + 865, 13, 467, 311, 1406, 544, 8104, 11, 406, 445, 732, 13, 400, 550, 291, 10432, + 552, 50664], "temperature": 0.0, "avg_logprob": -0.23221522790414315, "compression_ratio": + 1.7300380228136882, "no_speech_prob": 0.0019908398389816284}, {"id": 137, "seek": + 85308, "start": 859.08, "end": 864.2, "text": " based on the position of the documents + in the different rank list. Yeah. The higher the position", "tokens": [50664, 2361, + 322, 264, 2535, 295, 264, 8512, 294, 264, 819, 6181, 1329, 13, 865, 13, 440, 2946, + 264, 2535, 50920], "temperature": 0.0, "avg_logprob": -0.23221522790414315, "compression_ratio": + 1.7300380228136882, "no_speech_prob": 0.0019908398389816284}, {"id": 138, "seek": + 85308, "start": 864.2, "end": 869.5600000000001, "text": " in the ranking, the best + the probability that the document is going to end up in a higher", "tokens": [50920, + 294, 264, 17833, 11, 264, 1151, 264, 8482, 300, 264, 4166, 307, 516, 281, 917, 493, + 294, 257, 2946, 51188], "temperature": 0.0, "avg_logprob": -0.23221522790414315, + "compression_ratio": 1.7300380228136882, "no_speech_prob": 0.0019908398389816284}, + {"id": 139, "seek": 85308, "start": 869.5600000000001, "end": 875.0, "text": " final + result set. Yeah, yeah. 
Actually, when I was maybe you can help me understand this, + but", "tokens": [51188, 2572, 1874, 992, 13, 865, 11, 1338, 13, 5135, 11, 562, 286, + 390, 1310, 291, 393, 854, 385, 1223, 341, 11, 457, 51460], "temperature": 0.0, "avg_logprob": + -0.23221522790414315, "compression_ratio": 1.7300380228136882, "no_speech_prob": + 0.0019908398389816284}, {"id": 140, "seek": 85308, "start": 875.0, "end": 880.9200000000001, + "text": " when we were trying reciprocal rank fusion with another search engine,", + "tokens": [51460, 562, 321, 645, 1382, 46948, 6181, 23100, 365, 1071, 3164, 2848, + 11, 51756], "temperature": 0.0, "avg_logprob": -0.23221522790414315, "compression_ratio": + 1.7300380228136882, "no_speech_prob": 0.0019908398389816284}, {"id": 141, "seek": + 88308, "start": 883.32, "end": 889.24, "text": " we actually found implementation. + So we could kind of plug it in and Python code, very quickly.", "tokens": [50376, + 321, 767, 1352, 11420, 13, 407, 321, 727, 733, 295, 5452, 309, 294, 293, 15329, + 3089, 11, 588, 2661, 13, 50672], "temperature": 0.0, "avg_logprob": -0.2510400325693983, + "compression_ratio": 1.5591836734693878, "no_speech_prob": 0.0028847292996942997}, + {"id": 142, "seek": 88308, "start": 889.64, "end": 894.44, "text": " But then when + we looked at the code, one of my engineers said, this looks like round,", "tokens": + [50692, 583, 550, 562, 321, 2956, 412, 264, 3089, 11, 472, 295, 452, 11955, 848, + 11, 341, 1542, 411, 3098, 11, 50932], "temperature": 0.0, "avg_logprob": -0.2510400325693983, + "compression_ratio": 1.5591836734693878, "no_speech_prob": 0.0028847292996942997}, + {"id": 143, "seek": 88308, "start": 894.44, "end": 901.4000000000001, "text": " + raw, and algorithm essentially. 
There is nothing particularly peculiar about it + or tunable about it,", "tokens": [50932, 8936, 11, 293, 9284, 4476, 13, 821, 307, + 1825, 4098, 27149, 466, 309, 420, 4267, 712, 466, 309, 11, 51280], "temperature": + 0.0, "avg_logprob": -0.2510400325693983, "compression_ratio": 1.5591836734693878, + "no_speech_prob": 0.0028847292996942997}, {"id": 144, "seek": 88308, "start": 901.4000000000001, + "end": 907.96, "text": " which probably is not true, but I''m not sure what''s your + take on this. So it felt like you have two", "tokens": [51280, 597, 1391, 307, 406, + 2074, 11, 457, 286, 478, 406, 988, 437, 311, 428, 747, 322, 341, 13, 407, 309, 2762, + 411, 291, 362, 732, 51608], "temperature": 0.0, "avg_logprob": -0.2510400325693983, + "compression_ratio": 1.5591836734693878, "no_speech_prob": 0.0028847292996942997}, + {"id": 145, "seek": 90796, "start": 908.44, "end": 913.48, "text": " lists and you + basically just take the starting from the top, you take like in order, you know,", + "tokens": [50388, 14511, 293, 291, 1936, 445, 747, 264, 2891, 490, 264, 1192, 11, + 291, 747, 411, 294, 1668, 11, 291, 458, 11, 50640], "temperature": 0.0, "avg_logprob": + -0.17123609973538306, "compression_ratio": 1.7785977859778597, "no_speech_prob": + 0.012419326230883598}, {"id": 146, "seek": 90796, "start": 913.48, "end": 919.24, + "text": " these documents and you combine a blend at least, right? 
But if you wanted + to pay attention to", "tokens": [50640, 613, 8512, 293, 291, 10432, 257, 10628, + 412, 1935, 11, 558, 30, 583, 498, 291, 1415, 281, 1689, 3202, 281, 50928], "temperature": + 0.0, "avg_logprob": -0.17123609973538306, "compression_ratio": 1.7785977859778597, + "no_speech_prob": 0.012419326230883598}, {"id": 147, "seek": 90796, "start": 919.24, + "end": 923.96, "text": " some signals from these documents, you know, based on their + features or or maybe you wanted to", "tokens": [50928, 512, 12354, 490, 613, 8512, + 11, 291, 458, 11, 2361, 322, 641, 4122, 420, 420, 1310, 291, 1415, 281, 51164], + "temperature": 0.0, "avg_logprob": -0.17123609973538306, "compression_ratio": 1.7785977859778597, + "no_speech_prob": 0.012419326230883598}, {"id": 148, "seek": 90796, "start": 923.96, + "end": 928.6800000000001, "text": " introduce a logic on top of this, right? So + you want to say, let''s say in the context of geographic", "tokens": [51164, 5366, + 257, 9952, 322, 1192, 295, 341, 11, 558, 30, 407, 291, 528, 281, 584, 11, 718, 311, + 584, 294, 264, 4319, 295, 32318, 51400], "temperature": 0.0, "avg_logprob": -0.17123609973538306, + "compression_ratio": 1.7785977859778597, "no_speech_prob": 0.012419326230883598}, + {"id": 149, "seek": 90796, "start": 928.6800000000001, "end": 936.36, "text": " + search, I want to find in top three results, I want to see a super popular B.O.I. + and I know what", "tokens": [51400, 3164, 11, 286, 528, 281, 915, 294, 1192, 1045, + 3542, 11, 286, 528, 281, 536, 257, 1687, 3743, 363, 13, 46, 13, 40, 13, 293, 286, + 458, 437, 51784], "temperature": 0.0, "avg_logprob": -0.17123609973538306, "compression_ratio": + 1.7785977859778597, "no_speech_prob": 0.012419326230883598}, {"id": 150, "seek": + 93636, "start": 936.36, "end": 942.44, "text": " popular means. 
Another second result + could be, I don''t know, the closest one or maybe vice versa,", "tokens": [50364, + 3743, 1355, 13, 3996, 1150, 1874, 727, 312, 11, 286, 500, 380, 458, 11, 264, 13699, + 472, 420, 1310, 11964, 25650, 11, 50668], "temperature": 0.0, "avg_logprob": -0.22435011044897216, + "compression_ratio": 1.5362903225806452, "no_speech_prob": 0.001863835728727281}, + {"id": 151, "seek": 93636, "start": 942.44, "end": 949.48, "text": " depends. And + so on so forth. So I have some kind of rules in embed and then maybe it stops becoming", + "tokens": [50668, 5946, 13, 400, 370, 322, 370, 5220, 13, 407, 286, 362, 512, 733, + 295, 4474, 294, 12240, 293, 550, 1310, 309, 10094, 5617, 51020], "temperature": + 0.0, "avg_logprob": -0.22435011044897216, "compression_ratio": 1.5362903225806452, + "no_speech_prob": 0.001863835728727281}, {"id": 152, "seek": 93636, "start": 949.48, + "end": 957.64, "text": " RRE, already, right? But I still go going, taking a step + backwards. Did I explain it right?", "tokens": [51020, 497, 3850, 11, 1217, 11, + 558, 30, 583, 286, 920, 352, 516, 11, 1940, 257, 1823, 12204, 13, 2589, 286, 2903, + 309, 558, 30, 51428], "temperature": 0.0, "avg_logprob": -0.22435011044897216, "compression_ratio": + 1.5362903225806452, "no_speech_prob": 0.001863835728727281}, {"id": 153, "seek": + 93636, "start": 957.64, "end": 963.24, "text": " Or other some parameters and RRE + that I could kind of be tuning a bit to have the different", "tokens": [51428, 1610, + 661, 512, 9834, 293, 497, 3850, 300, 286, 727, 733, 295, 312, 15164, 257, 857, 281, + 362, 264, 819, 51708], "temperature": 0.0, "avg_logprob": -0.22435011044897216, + "compression_ratio": 1.5362903225806452, "no_speech_prob": 0.001863835728727281}, + {"id": 154, "seek": 96324, "start": 963.24, "end": 970.76, "text": " outcome? There''s + not much to tune to be honest. So you got it right. 
It''s not only around", "tokens": + [50364, 9700, 30, 821, 311, 406, 709, 281, 10864, 281, 312, 3245, 13, 407, 291, + 658, 309, 558, 13, 467, 311, 406, 787, 926, 50740], "temperature": 0.0, "avg_logprob": + -0.27971843083699544, "compression_ratio": 1.7718631178707225, "no_speech_prob": + 0.010089404881000519}, {"id": 155, "seek": 96324, "start": 970.76, "end": 975.96, + "text": " roaming, because what you do is basically you give a new score to the + documents that are based on", "tokens": [50740, 42680, 11, 570, 437, 291, 360, 307, + 1936, 291, 976, 257, 777, 6175, 281, 264, 8512, 300, 366, 2361, 322, 51000], "temperature": + 0.0, "avg_logprob": -0.27971843083699544, "compression_ratio": 1.7718631178707225, + "no_speech_prob": 0.010089404881000519}, {"id": 156, "seek": 96324, "start": 975.96, + "end": 981.88, "text": " all the rankings of that document in the results list. + So it''s not like in Perliving where, for", "tokens": [51000, 439, 264, 36550, 295, + 300, 4166, 294, 264, 3542, 1329, 13, 407, 309, 311, 406, 411, 294, 3026, 75, 2123, + 689, 11, 337, 51296], "temperature": 0.0, "avg_logprob": -0.27971843083699544, "compression_ratio": + 1.7718631178707225, "no_speech_prob": 0.010089404881000519}, {"id": 157, "seek": + 96324, "start": 981.88, "end": 986.12, "text": " example, you go with one document, + you pick from one range of lists and then to the other", "tokens": [51296, 1365, + 11, 291, 352, 365, 472, 4166, 11, 291, 1888, 490, 472, 3613, 295, 14511, 293, 550, + 281, 264, 661, 51508], "temperature": 0.0, "avg_logprob": -0.27971843083699544, + "compression_ratio": 1.7718631178707225, "no_speech_prob": 0.010089404881000519}, + {"id": 158, "seek": 96324, "start": 986.12, "end": 991.72, "text": " list, you pick + another and then you choose which one should I go next. 
It''s more about life,", + "tokens": [51508, 1329, 11, 291, 1888, 1071, 293, 550, 291, 2826, 597, 472, 820, + 286, 352, 958, 13, 467, 311, 544, 466, 993, 11, 51788], "temperature": 0.0, "avg_logprob": + -0.27971843083699544, "compression_ratio": 1.7718631178707225, "no_speech_prob": + 0.010089404881000519}, {"id": 159, "seek": 99172, "start": 991.8000000000001, "end": + 996.2, "text": " let''s see this document how many times it appears in the ranking + list and where it appears in", "tokens": [50368, 718, 311, 536, 341, 4166, 577, + 867, 1413, 309, 7038, 294, 264, 17833, 1329, 293, 689, 309, 7038, 294, 50588], "temperature": + 0.0, "avg_logprob": -0.1817004893085744, "compression_ratio": 1.8319327731092436, + "no_speech_prob": 0.006877677980810404}, {"id": 160, "seek": 99172, "start": 996.2, + "end": 1002.44, "text": " the ranking list and let''s build this new score. So the + more you are in the top positions,", "tokens": [50588, 264, 17833, 1329, 293, 718, + 311, 1322, 341, 777, 6175, 13, 407, 264, 544, 291, 366, 294, 264, 1192, 8432, 11, + 50900], "temperature": 0.0, "avg_logprob": -0.1817004893085744, "compression_ratio": + 1.8319327731092436, "no_speech_prob": 0.006877677980810404}, {"id": 161, "seek": + 99172, "start": 1002.44, "end": 1006.2, "text": " the more likely you end up in + the top position of the final result list.", "tokens": [50900, 264, 544, 3700, 291, + 917, 493, 294, 264, 1192, 2535, 295, 264, 2572, 1874, 1329, 13, 51088], "temperature": + 0.0, "avg_logprob": -0.1817004893085744, "compression_ratio": 1.8319327731092436, + "no_speech_prob": 0.006877677980810404}, {"id": 162, "seek": 99172, "start": 1007.24, + "end": 1012.52, "text": " Given that, you''re absolutely right that if you want + to be like more advanced ranking", "tokens": [51140, 18600, 300, 11, 291, 434, 3122, + 558, 300, 498, 291, 528, 281, 312, 411, 544, 7339, 17833, 51404], "temperature": + 0.0, "avg_logprob": -0.1817004893085744, "compression_ratio": 1.8319327731092436, + 
"no_speech_prob": 0.006877677980810404}, {"id": 163, "seek": 99172, "start": 1012.52, + "end": 1018.36, "text": " systems, potentially like with different phases, different + steps, it makes complete sense to", "tokens": [51404, 3652, 11, 7263, 411, 365, + 819, 18764, 11, 819, 4439, 11, 309, 1669, 3566, 2020, 281, 51696], "temperature": + 0.0, "avg_logprob": -0.1817004893085744, "compression_ratio": 1.8319327731092436, + "no_speech_prob": 0.006877677980810404}, {"id": 164, "seek": 101836, "start": 1018.36, + "end": 1025.8, "text": " maybe build your original candidate sets with receiver + or infusion. And then you re-rank,", "tokens": [50364, 1310, 1322, 428, 3380, 11532, + 6352, 365, 20086, 420, 1536, 5704, 13, 400, 550, 291, 319, 12, 20479, 11, 50736], + "temperature": 0.0, "avg_logprob": -0.3586759567260742, "compression_ratio": 1.6697247706422018, + "no_speech_prob": 0.0059024677611887455}, {"id": 165, "seek": 101836, "start": 1026.52, + "end": 1032.2, "text": " for example, using learning to rank and many features where + you can have like, again, maybe", "tokens": [50772, 337, 1365, 11, 1228, 2539, 281, + 6181, 293, 867, 4122, 689, 291, 393, 362, 411, 11, 797, 11, 1310, 51056], "temperature": + 0.0, "avg_logprob": -0.3586759567260742, "compression_ratio": 1.6697247706422018, + "no_speech_prob": 0.0059024677611887455}, {"id": 166, "seek": 101836, "start": 1032.2, + "end": 1037.88, "text": " the vector distances one feature, the similarity we want + to feel from a expert perspective,", "tokens": [51056, 264, 8062, 22182, 472, 4111, + 11, 264, 32194, 321, 528, 281, 841, 490, 257, 5844, 4585, 11, 51340], "temperature": + 0.0, "avg_logprob": -0.3586759567260742, "compression_ratio": 1.6697247706422018, + "no_speech_prob": 0.0059024677611887455}, {"id": 167, "seek": 101836, "start": 1038.44, + "end": 1045.08, "text": " popularity, geographical distance and many other features. 
+ And then you apply learning for", "tokens": [51368, 19301, 11, 39872, 4560, 293, + 867, 661, 4122, 13, 400, 550, 291, 3079, 2539, 337, 51700], "temperature": 0.0, + "avg_logprob": -0.3586759567260742, "compression_ratio": 1.6697247706422018, "no_speech_prob": + 0.0059024677611887455}, {"id": 168, "seek": 104508, "start": 1045.08, "end": 1050.12, + "text": " example, so you train a machine learning model to identify these weights. + It makes perfect sense", "tokens": [50364, 1365, 11, 370, 291, 3847, 257, 3479, + 2539, 2316, 281, 5876, 613, 17443, 13, 467, 1669, 2176, 2020, 50616], "temperature": + 0.0, "avg_logprob": -0.31095361272129446, "compression_ratio": 1.6797153024911031, + "no_speech_prob": 0.004169780295342207}, {"id": 169, "seek": 104508, "start": 1050.12, + "end": 1056.36, "text": " in my opinion. I believe receiver, rank, fusion and in + general, like let''s call them simple", "tokens": [50616, 294, 452, 4800, 13, 286, + 1697, 20086, 11, 6181, 11, 23100, 293, 294, 2674, 11, 411, 718, 311, 818, 552, 2199, + 50928], "temperature": 0.0, "avg_logprob": -0.31095361272129446, "compression_ratio": + 1.6797153024911031, "no_speech_prob": 0.004169780295342207}, {"id": 170, "seek": + 104508, "start": 1056.36, "end": 1060.6, "text": " approaches with our research, + because if you take a look to the algorithm of receiver or rank fusion,", "tokens": + [50928, 11587, 365, 527, 2132, 11, 570, 498, 291, 747, 257, 574, 281, 264, 9284, + 295, 20086, 420, 6181, 23100, 11, 51140], "temperature": 0.0, "avg_logprob": -0.31095361272129446, + "compression_ratio": 1.6797153024911031, "no_speech_prob": 0.004169780295342207}, + {"id": 171, "seek": 104508, "start": 1060.6, "end": 1066.84, "text": " it''s not + the core, it''s actually open and open algorithm from 2009. 
But this opened the", + "tokens": [51140, 309, 311, 406, 264, 4965, 11, 309, 311, 767, 1269, 293, 1269, + 9284, 490, 11453, 13, 583, 341, 5625, 264, 51452], "temperature": 0.0, "avg_logprob": + -0.31095361272129446, "compression_ratio": 1.6797153024911031, "no_speech_prob": + 0.004169780295342207}, {"id": 172, "seek": 104508, "start": 1066.84, "end": 1071.96, + "text": " doors in my opinion to build your original candidate set and then potentially + like, yeah, you", "tokens": [51452, 8077, 294, 452, 4800, 281, 1322, 428, 3380, + 11532, 992, 293, 550, 7263, 411, 11, 1338, 11, 291, 51708], "temperature": 0.0, + "avg_logprob": -0.31095361272129446, "compression_ratio": 1.6797153024911031, "no_speech_prob": + 0.004169780295342207}, {"id": 173, "seek": 107196, "start": 1071.96, "end": 1076.76, + "text": " re-rank it okay. Yeah, yeah, yeah, she''s not random, it''s okay, she''s + already some", "tokens": [50364, 319, 12, 20479, 309, 1392, 13, 865, 11, 1338, 11, + 1338, 11, 750, 311, 406, 4974, 11, 309, 311, 1392, 11, 750, 311, 1217, 512, 50604], + "temperature": 0.0, "avg_logprob": -0.2948504647055825, "compression_ratio": 1.5948275862068966, + "no_speech_prob": 0.014064047485589981}, {"id": 174, "seek": 107196, "start": 1076.76, + "end": 1084.68, "text": " reasons to be there. And of course, like in any case, + those without saying that we do need to have", "tokens": [50604, 4112, 281, 312, + 456, 13, 400, 295, 1164, 11, 411, 294, 604, 1389, 11, 729, 1553, 1566, 300, 321, + 360, 643, 281, 362, 51000], "temperature": 0.0, "avg_logprob": -0.2948504647055825, + "compression_ratio": 1.5948275862068966, "no_speech_prob": 0.014064047485589981}, + {"id": 175, "seek": 107196, "start": 1084.68, "end": 1091.48, "text": " some method + of combining these completely disparate spaces of scores, right? Into one. 
And that + could", "tokens": [51000, 512, 3170, 295, 21928, 613, 2584, 14548, 473, 7673, 295, + 13444, 11, 558, 30, 23373, 472, 13, 400, 300, 727, 51340], "temperature": 0.0, "avg_logprob": + -0.2948504647055825, "compression_ratio": 1.5948275862068966, "no_speech_prob": + 0.014064047485589981}, {"id": 176, "seek": 107196, "start": 1091.48, "end": 1096.76, + "text": " be actually even like different search engines operating on keyword level + because they", "tokens": [51340, 312, 767, 754, 411, 819, 3164, 12982, 7447, 322, + 20428, 1496, 570, 436, 51604], "temperature": 0.0, "avg_logprob": -0.2948504647055825, + "compression_ratio": 1.5948275862068966, "no_speech_prob": 0.014064047485589981}, + {"id": 177, "seek": 109676, "start": 1096.76, "end": 1102.28, "text": " output different + scores, right? So maybe even potentially I''m thinking separate charts of your", + "tokens": [50364, 5598, 819, 13444, 11, 558, 30, 407, 1310, 754, 7263, 286, 478, + 1953, 4994, 17767, 295, 428, 50640], "temperature": 0.0, "avg_logprob": -0.25208108880546654, + "compression_ratio": 1.5672268907563025, "no_speech_prob": 0.005187679082155228}, + {"id": 178, "seek": 109676, "start": 1102.28, "end": 1109.32, "text": " data that + also have their own idea, right? Local idea. So, yeah, incomparable, right? 
Awesome.", + "tokens": [50640, 1412, 300, 611, 362, 641, 1065, 1558, 11, 558, 30, 22755, 1558, + 13, 407, 11, 1338, 11, 14036, 42012, 11, 558, 30, 10391, 13, 50992], "temperature": + 0.0, "avg_logprob": -0.25208108880546654, "compression_ratio": 1.5672268907563025, + "no_speech_prob": 0.005187679082155228}, {"id": 179, "seek": 109676, "start": 1110.36, + "end": 1117.0, "text": " We also, not related to this completely different topic, + like there was also a keynote today about", "tokens": [51044, 492, 611, 11, 406, + 4077, 281, 341, 2584, 819, 4829, 11, 411, 456, 390, 611, 257, 33896, 965, 466, 51376], + "temperature": 0.0, "avg_logprob": -0.25208108880546654, "compression_ratio": 1.5672268907563025, + "no_speech_prob": 0.005187679082155228}, {"id": 180, "seek": 109676, "start": 1118.36, + "end": 1123.96, "text": " sort of what open source means, right? And without, of + course, criticizing, but some", "tokens": [51444, 1333, 295, 437, 1269, 4009, 1355, + 11, 558, 30, 400, 1553, 11, 295, 1164, 11, 45474, 11, 457, 512, 51724], "temperature": + 0.0, "avg_logprob": -0.25208108880546654, "compression_ratio": 1.5672268907563025, + "no_speech_prob": 0.005187679082155228}, {"id": 181, "seek": 112396, "start": 1123.96, + "end": 1130.1200000000001, "text": " companies were mentioned on this context where + they claim that the LLMs are open source, but when", "tokens": [50364, 3431, 645, + 2835, 322, 341, 4319, 689, 436, 3932, 300, 264, 441, 43, 26386, 366, 1269, 4009, + 11, 457, 562, 50672], "temperature": 0.0, "avg_logprob": -0.18427793369736784, "compression_ratio": + 1.6785714285714286, "no_speech_prob": 0.002929717767983675}, {"id": 182, "seek": + 112396, "start": 1130.1200000000001, "end": 1136.52, "text": " you look at the licenses, + they are restrictive, they actually do not allow you to use them", "tokens": [50672, + 291, 574, 412, 264, 32821, 11, 436, 366, 43220, 11, 436, 767, 360, 406, 2089, 291, + 281, 764, 552, 50992], "temperature": 0.0, "avg_logprob": 
-0.18427793369736784, + "compression_ratio": 1.6785714285714286, "no_speech_prob": 0.002929717767983675}, + {"id": 183, "seek": 112396, "start": 1136.52, "end": 1142.76, "text": " independently, + right? And kind of go and serve your customers. But you also just mentioned", "tokens": + [50992, 21761, 11, 558, 30, 400, 733, 295, 352, 293, 4596, 428, 4581, 13, 583, 291, + 611, 445, 2835, 51304], "temperature": 0.0, "avg_logprob": -0.18427793369736784, + "compression_ratio": 1.6785714285714286, "no_speech_prob": 0.002929717767983675}, + {"id": 184, "seek": 112396, "start": 1142.76, "end": 1148.92, "text": " what the + code was started recording is that there are also cases where model can be open + source,", "tokens": [51304, 437, 264, 3089, 390, 1409, 6613, 307, 300, 456, 366, + 611, 3331, 689, 2316, 393, 312, 1269, 4009, 11, 51612], "temperature": 0.0, "avg_logprob": + -0.18427793369736784, "compression_ratio": 1.6785714285714286, "no_speech_prob": + 0.002929717767983675}, {"id": 185, "seek": 114892, "start": 1149.0, "end": 1154.6000000000001, + "text": " and it''s kind of like more or less abiding the principles of open source + spirit, but then", "tokens": [50368, 293, 309, 311, 733, 295, 411, 544, 420, 1570, + 410, 2819, 264, 9156, 295, 1269, 4009, 3797, 11, 457, 550, 50648], "temperature": + 0.0, "avg_logprob": -0.12047401889339908, "compression_ratio": 1.832512315270936, + "no_speech_prob": 0.0020599726121872663}, {"id": 186, "seek": 114892, "start": 1156.04, + "end": 1161.96, "text": " contract, but then the data that it was trained on is + not open source or the methods that were", "tokens": [50720, 4364, 11, 457, 550, + 264, 1412, 300, 309, 390, 8895, 322, 307, 406, 1269, 4009, 420, 264, 7150, 300, + 645, 51016], "temperature": 0.0, "avg_logprob": -0.12047401889339908, "compression_ratio": + 1.832512315270936, "no_speech_prob": 0.0020599726121872663}, {"id": 187, "seek": + 114892, "start": 1161.96, "end": 1169.24, "text": " applied to the data are not + open 
source, right? So to me, it sounds so important to keep kind of", "tokens": + [51016, 6456, 281, 264, 1412, 366, 406, 1269, 4009, 11, 558, 30, 407, 281, 385, + 11, 309, 3263, 370, 1021, 281, 1066, 733, 295, 51380], "temperature": 0.0, "avg_logprob": + -0.12047401889339908, "compression_ratio": 1.832512315270936, "no_speech_prob": + 0.0020599726121872663}, {"id": 188, "seek": 114892, "start": 1169.24, "end": 1175.16, + "text": " declaring what open source is, what are the principles, right? And maybe + this keynote also", "tokens": [51380, 40374, 437, 1269, 4009, 307, 11, 437, 366, + 264, 9156, 11, 558, 30, 400, 1310, 341, 33896, 611, 51676], "temperature": 0.0, + "avg_logprob": -0.12047401889339908, "compression_ratio": 1.832512315270936, "no_speech_prob": + 0.0020599726121872663}, {"id": 189, "seek": 117516, "start": 1175.24, "end": 1180.1200000000001, + "text": " shed some light, but you also, it seems like this topic is also very close + to you, and you", "tokens": [50368, 14951, 512, 1442, 11, 457, 291, 611, 11, 309, + 2544, 411, 341, 4829, 307, 611, 588, 1998, 281, 291, 11, 293, 291, 50612], "temperature": + 0.0, "avg_logprob": -0.20703210001406464, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.008307372219860554}, {"id": 190, "seek": 117516, "start": 1180.1200000000001, + "end": 1186.44, "text": " are in the open source contributing a lot, you are the + commuter, like can you can you share your", "tokens": [50612, 366, 294, 264, 1269, + 4009, 19270, 257, 688, 11, 291, 366, 264, 800, 20314, 11, 411, 393, 291, 393, 291, + 2073, 428, 50928], "temperature": 0.0, "avg_logprob": -0.20703210001406464, "compression_ratio": + 1.6724890829694323, "no_speech_prob": 0.008307372219860554}, {"id": 191, "seek": + 117516, "start": 1186.44, "end": 1192.68, "text": " vision on what is open source, + what are the implications for how this field should be developing?", "tokens": [50928, + 5201, 322, 437, 307, 1269, 4009, 11, 437, 366, 264, 16602, 337, 577, 341, 
2519, + 820, 312, 6416, 30, 51240], "temperature": 0.0, "avg_logprob": -0.20703210001406464, + "compression_ratio": 1.6724890829694323, "no_speech_prob": 0.008307372219860554}, + {"id": 192, "seek": 117516, "start": 1194.0400000000002, "end": 1200.44, "text": + " I think it''s a huge problem, especially because nowadays open washing, which + is like the practice", "tokens": [51308, 286, 519, 309, 311, 257, 2603, 1154, 11, + 2318, 570, 13434, 1269, 13836, 11, 597, 307, 411, 264, 3124, 51628], "temperature": + 0.0, "avg_logprob": -0.20703210001406464, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.008307372219860554}, {"id": 193, "seek": 120044, "start": 1200.52, + "end": 1206.2, "text": " of associating openness to something that potentially is + not really fully open, is happening a lot.", "tokens": [50368, 295, 4180, 990, 36200, + 281, 746, 300, 7263, 307, 406, 534, 4498, 1269, 11, 307, 2737, 257, 688, 13, 50652], + "temperature": 0.0, "avg_logprob": -0.21035854790800362, "compression_ratio": 1.751131221719457, + "no_speech_prob": 0.0067875441163778305}, {"id": 194, "seek": 120044, "start": 1206.2, + "end": 1212.76, "text": " Here''s open source is cool, open source show like a good + habit, so you''re the good guys if you", "tokens": [50652, 1692, 311, 1269, 4009, + 307, 1627, 11, 1269, 4009, 855, 411, 257, 665, 7164, 11, 370, 291, 434, 264, 665, + 1074, 498, 291, 50980], "temperature": 0.0, "avg_logprob": -0.21035854790800362, + "compression_ratio": 1.751131221719457, "no_speech_prob": 0.0067875441163778305}, + {"id": 195, "seek": 120044, "start": 1212.76, "end": 1219.0800000000002, "text": + " if you do open source. 
So as you said, we are not going to make names of companies + or association", "tokens": [50980, 498, 291, 360, 1269, 4009, 13, 407, 382, 291, + 848, 11, 321, 366, 406, 516, 281, 652, 5288, 295, 3431, 420, 14598, 51296], "temperature": + 0.0, "avg_logprob": -0.21035854790800362, "compression_ratio": 1.751131221719457, + "no_speech_prob": 0.0067875441163778305}, {"id": 196, "seek": 120044, "start": 1219.0800000000002, + "end": 1224.04, "text": " that claim, for example, they lar language model were + open source, but lar language models are", "tokens": [51296, 300, 3932, 11, 337, + 1365, 11, 436, 1613, 2856, 2316, 645, 1269, 4009, 11, 457, 1613, 2856, 5245, 366, + 51544], "temperature": 0.0, "avg_logprob": -0.21035854790800362, "compression_ratio": + 1.751131221719457, "no_speech_prob": 0.0067875441163778305}, {"id": 197, "seek": + 122404, "start": 1224.12, "end": 1231.08, "text": " complex systems. So the outputs, + the final light waves on the neural network is just one little part", "tokens": + [50368, 3997, 3652, 13, 407, 264, 23930, 11, 264, 2572, 1442, 9417, 322, 264, 18161, + 3209, 307, 445, 472, 707, 644, 50716], "temperature": 0.0, "avg_logprob": -0.21566313963669997, + "compression_ratio": 1.7155555555555555, "no_speech_prob": 0.007961480878293514}, + {"id": 198, "seek": 122404, "start": 1231.08, "end": 1237.8, "text": " of the entire + picture. Those lar language models are normally pre-trained on huge quantities of", + "tokens": [50716, 295, 264, 2302, 3036, 13, 3950, 1613, 2856, 5245, 366, 5646, 659, + 12, 17227, 2001, 322, 2603, 22927, 295, 51052], "temperature": 0.0, "avg_logprob": + -0.21566313963669997, "compression_ratio": 1.7155555555555555, "no_speech_prob": + 0.007961480878293514}, {"id": 199, "seek": 122404, "start": 1237.8, "end": 1246.84, + "text": " data with a pre-training algorithm. 
So the pre-training data and the pre-training + code,", "tokens": [51052, 1412, 365, 257, 659, 12, 17227, 1760, 9284, 13, 407, 264, + 659, 12, 17227, 1760, 1412, 293, 264, 659, 12, 17227, 1760, 3089, 11, 51504], "temperature": + 0.0, "avg_logprob": -0.21566313963669997, "compression_ratio": 1.7155555555555555, + "no_speech_prob": 0.007961480878293514}, {"id": 200, "seek": 122404, "start": 1247.96, + "end": 1253.8, "text": " is it open? Is it not? I mean, many times it''s not, not + only not, it''s not open, it''s not even known,", "tokens": [51560, 307, 309, 1269, + 30, 1119, 309, 406, 30, 286, 914, 11, 867, 1413, 309, 311, 406, 11, 406, 787, 406, + 11, 309, 311, 406, 1269, 11, 309, 311, 406, 754, 2570, 11, 51852], "temperature": + 0.0, "avg_logprob": -0.21566313963669997, "compression_ratio": 1.7155555555555555, + "no_speech_prob": 0.007961480878293514}, {"id": 201, "seek": 125404, "start": 1254.04, + "end": 1261.48, "text": " what kind of data is just generic internet scale data. + What about the fine tuning them? 
So", "tokens": [50364, 437, 733, 295, 1412, 307, + 445, 19577, 4705, 4373, 1412, 13, 708, 466, 264, 2489, 15164, 552, 30, 407, 50736], + "temperature": 0.0, "avg_logprob": -0.18889763534709972, "compression_ratio": 1.619047619047619, + "no_speech_prob": 0.0021331042516976595}, {"id": 202, "seek": 125404, "start": 1261.48, + "end": 1265.6399999999999, "text": " once you get the pre-training, which is the + unsupervised part, where you just explore the web,", "tokens": [50736, 1564, 291, + 483, 264, 659, 12, 17227, 1760, 11, 597, 307, 264, 2693, 12879, 24420, 644, 11, + 689, 291, 445, 6839, 264, 3670, 11, 50944], "temperature": 0.0, "avg_logprob": -0.18889763534709972, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.0021331042516976595}, + {"id": 203, "seek": 125404, "start": 1265.6399999999999, "end": 1271.6399999999999, + "text": " that''s pretty simple, then you want to fine tune for specific tasks, + like sentence similarity or", "tokens": [50944, 300, 311, 1238, 2199, 11, 550, 291, + 528, 281, 2489, 10864, 337, 2685, 9608, 11, 411, 8174, 32194, 420, 51244], "temperature": + 0.0, "avg_logprob": -0.18889763534709972, "compression_ratio": 1.619047619047619, + "no_speech_prob": 0.0021331042516976595}, {"id": 204, "seek": 125404, "start": 1272.28, + "end": 1277.72, "text": " instruction following or, I don''t know, summarization, + any kind of task you want to use the", "tokens": [51276, 10951, 3480, 420, 11, 286, + 500, 380, 458, 11, 14611, 2144, 11, 604, 733, 295, 5633, 291, 528, 281, 764, 264, + 51548], "temperature": 0.0, "avg_logprob": -0.18889763534709972, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.0021331042516976595}, {"id": 205, "seek": + 127772, "start": 1278.68, "end": 1284.84, "text": " and to do that normally using + an additional training data set that is particularly designed for", "tokens": [50412, + 293, 281, 360, 300, 5646, 1228, 364, 4497, 3097, 1412, 992, 300, 307, 4098, 4761, + 337, 50720], "temperature": 0.0, 
"avg_logprob": -0.20213103020328216, "compression_ratio": + 1.835820895522388, "no_speech_prob": 0.005818348843604326}, {"id": 206, "seek": + 127772, "start": 1284.84, "end": 1291.16, "text": " that fine tuning task. And again, + is that open? So do you communicate and then you make it available?", "tokens": + [50720, 300, 2489, 15164, 5633, 13, 400, 797, 11, 307, 300, 1269, 30, 407, 360, + 291, 7890, 293, 550, 291, 652, 309, 2435, 30, 51036], "temperature": 0.0, "avg_logprob": + -0.20213103020328216, "compression_ratio": 1.835820895522388, "no_speech_prob": + 0.005818348843604326}, {"id": 207, "seek": 127772, "start": 1292.1200000000001, + "end": 1298.28, "text": " And the code for fine tuning, do you make it available? + The output of the pre-training,", "tokens": [51084, 400, 264, 3089, 337, 2489, 15164, + 11, 360, 291, 652, 309, 2435, 30, 440, 5598, 295, 264, 659, 12, 17227, 1760, 11, + 51392], "temperature": 0.0, "avg_logprob": -0.20213103020328216, "compression_ratio": + 1.835820895522388, "no_speech_prob": 0.005818348843604326}, {"id": 208, "seek": + 127772, "start": 1298.28, "end": 1303.32, "text": " do you make it available separately + from the output of the fine, the documentation,", "tokens": [51392, 360, 291, 652, + 309, 2435, 14759, 490, 264, 5598, 295, 264, 2489, 11, 264, 14333, 11, 51644], "temperature": + 0.0, "avg_logprob": -0.20213103020328216, "compression_ratio": 1.835820895522388, + "no_speech_prob": 0.005818348843604326}, {"id": 209, "seek": 130332, "start": 1303.3999999999999, + "end": 1310.28, "text": " any data that explains what is done, why you found it? 
+ So I''ve read like an interesting paper that", "tokens": [50368, 604, 1412, 300, + 13948, 437, 307, 1096, 11, 983, 291, 1352, 309, 30, 407, 286, 600, 1401, 411, 364, + 1880, 3035, 300, 50712], "temperature": 0.0, "avg_logprob": -0.43238083745392275, + "compression_ratio": 1.5336787564766838, "no_speech_prob": 0.022216973826289177}, + {"id": 210, "seek": 130332, "start": 1310.28, "end": 1316.9199999999998, "text": + " I guess we can share, like, as a comment from a university, they were like comparing + all these aspects", "tokens": [50712, 286, 2041, 321, 393, 2073, 11, 411, 11, 382, + 257, 2871, 490, 257, 5454, 11, 436, 645, 411, 15763, 439, 613, 7270, 51044], "temperature": + 0.0, "avg_logprob": -0.43238083745392275, "compression_ratio": 1.5336787564766838, + "no_speech_prob": 0.022216973826289177}, {"id": 211, "seek": 130332, "start": 1316.9199999999998, + "end": 1324.76, "text": " for the MS models and how famous like open source of the + MS models actually behaved on each of", "tokens": [51044, 337, 264, 7395, 5245, + 293, 577, 4618, 411, 1269, 4009, 295, 264, 7395, 5245, 767, 48249, 322, 1184, 295, + 51436], "temperature": 0.0, "avg_logprob": -0.43238083745392275, "compression_ratio": + 1.5336787564766838, "no_speech_prob": 0.022216973826289177}, {"id": 212, "seek": + 132476, "start": 1324.76, "end": 1332.68, "text": " these columns and would be surprising + how a small percentage of these like, you know, big layers", "tokens": [50364, 613, + 13766, 293, 576, 312, 8830, 577, 257, 1359, 9668, 295, 613, 411, 11, 291, 458, 11, + 955, 7914, 50760], "temperature": 0.0, "avg_logprob": -0.237389407315097, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.004554250277578831}, {"id": 213, "seek": + 132476, "start": 1332.68, "end": 1337.48, "text": " are actually open sourcing everything. 
+ So it''s not just the license that as you said correctly,", "tokens": [50760, 366, + 767, 1269, 11006, 2175, 1203, 13, 407, 309, 311, 406, 445, 264, 10476, 300, 382, + 291, 848, 8944, 11, 51000], "temperature": 0.0, "avg_logprob": -0.237389407315097, + "compression_ratio": 1.6551724137931034, "no_speech_prob": 0.004554250277578831}, + {"id": 214, "seek": 132476, "start": 1337.48, "end": 1342.76, "text": " sometimes + it''s limiting, but literally like the components shirt, sometimes it''s just the + final", "tokens": [51000, 2171, 309, 311, 22083, 11, 457, 3736, 411, 264, 6677, + 8336, 11, 2171, 309, 311, 445, 264, 2572, 51264], "temperature": 0.0, "avg_logprob": + -0.237389407315097, "compression_ratio": 1.6551724137931034, "no_speech_prob": 0.004554250277578831}, + {"id": 215, "seek": 132476, "start": 1344.2, "end": 1348.68, "text": " which is, + is it helpful? I mean, in open source, you want to cooperate, you want to improve + the", "tokens": [51336, 597, 307, 11, 307, 309, 4961, 30, 286, 914, 11, 294, 1269, + 4009, 11, 291, 528, 281, 26667, 11, 291, 528, 281, 3470, 264, 51560], "temperature": + 0.0, "avg_logprob": -0.237389407315097, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.004554250277578831}, {"id": 216, "seek": 134868, "start": 1348.68, + "end": 1354.92, "text": " code, like in normal code, you have access to everything + and you can like improve, you can", "tokens": [50364, 3089, 11, 411, 294, 2710, + 3089, 11, 291, 362, 2105, 281, 1203, 293, 291, 393, 411, 3470, 11, 291, 393, 50676], + "temperature": 0.0, "avg_logprob": -0.331269619511623, "compression_ratio": 1.6218487394957983, + "no_speech_prob": 0.003783289110288024}, {"id": 217, "seek": 134868, "start": 1354.92, + "end": 1360.68, "text": " help the community. 
If you just access the ways, you can + use it, but can you, for example, improve", "tokens": [50676, 854, 264, 1768, 13, + 759, 291, 445, 2105, 264, 2098, 11, 291, 393, 764, 309, 11, 457, 393, 291, 11, 337, + 1365, 11, 3470, 50964], "temperature": 0.0, "avg_logprob": -0.331269619511623, "compression_ratio": + 1.6218487394957983, "no_speech_prob": 0.003783289110288024}, {"id": 218, "seek": + 134868, "start": 1360.68, "end": 1367.16, "text": " it? Can you understand if it''s + fair? In the data you was there? Yes. Yeah, it''s really difficult.", "tokens": + [50964, 309, 30, 1664, 291, 1223, 498, 309, 311, 3143, 30, 682, 264, 1412, 291, + 390, 456, 30, 1079, 13, 865, 11, 309, 311, 534, 2252, 13, 51288], "temperature": + 0.0, "avg_logprob": -0.331269619511623, "compression_ratio": 1.6218487394957983, + "no_speech_prob": 0.003783289110288024}, {"id": 219, "seek": 134868, "start": 1367.16, + "end": 1373.8, "text": " Yeah. And so, what do you think these discussions should + start or maybe it''s ongoing? Are you part", "tokens": [51288, 865, 13, 400, 370, + 11, 437, 360, 291, 519, 613, 11088, 820, 722, 420, 1310, 309, 311, 10452, 30, 2014, + 291, 644, 51620], "temperature": 0.0, "avg_logprob": -0.331269619511623, "compression_ratio": + 1.6218487394957983, "no_speech_prob": 0.003783289110288024}, {"id": 220, "seek": + 137380, "start": 1373.8, "end": 1381.48, "text": " of some discussion? And how does + it impact business and maybe research? Right? Because there are", "tokens": [50364, + 295, 512, 5017, 30, 400, 577, 775, 309, 2712, 1606, 293, 1310, 2132, 30, 1779, 30, + 1436, 456, 366, 50748], "temperature": 0.0, "avg_logprob": -0.20948582310830394, + "compression_ratio": 1.5668016194331984, "no_speech_prob": 0.019500738009810448}, + {"id": 221, "seek": 137380, "start": 1381.48, "end": 1390.28, "text": " different + sides of this coin. 
Many of these things emerge in the academia space, but then + they move", "tokens": [50748, 819, 4881, 295, 341, 11464, 13, 5126, 295, 613, 721, + 21511, 294, 264, 28937, 1901, 11, 457, 550, 436, 1286, 51188], "temperature": 0.0, + "avg_logprob": -0.20948582310830394, "compression_ratio": 1.5668016194331984, "no_speech_prob": + 0.019500738009810448}, {"id": 222, "seek": 137380, "start": 1390.28, "end": 1396.84, + "text": " to create value on the business side, but it could also be vice versa. + So what do you think?", "tokens": [51188, 281, 1884, 2158, 322, 264, 1606, 1252, + 11, 457, 309, 727, 611, 312, 11964, 25650, 13, 407, 437, 360, 291, 519, 30, 51516], + "temperature": 0.0, "avg_logprob": -0.20948582310830394, "compression_ratio": 1.5668016194331984, + "no_speech_prob": 0.019500738009810448}, {"id": 223, "seek": 137380, "start": 1397.6399999999999, + "end": 1403.24, "text": " What are we going to address this? So I know that the + open source initiative, which is a group of", "tokens": [51556, 708, 366, 321, 516, + 281, 2985, 341, 30, 407, 286, 458, 300, 264, 1269, 4009, 11552, 11, 597, 307, 257, + 1594, 295, 51836], "temperature": 0.0, "avg_logprob": -0.20948582310830394, "compression_ratio": + 1.5668016194331984, "no_speech_prob": 0.019500738009810448}, {"id": 224, "seek": + 140324, "start": 1403.24, "end": 1409.4, "text": " people that directly directly + open source manifest, so I try to basically think to ways of the", "tokens": [50364, + 561, 300, 3838, 3838, 1269, 4009, 10067, 11, 370, 286, 853, 281, 1936, 519, 281, + 2098, 295, 264, 50672], "temperature": 0.0, "avg_logprob": -0.2789993502876975, + "compression_ratio": 1.7416267942583732, "no_speech_prob": 0.005888727493584156}, + {"id": 225, "seek": 140324, "start": 1409.4, "end": 1416.44, "text": " finding open + source is they are working on a definition of open source for AI models.", "tokens": + [50672, 5006, 1269, 4009, 307, 436, 366, 1364, 322, 257, 7123, 295, 1269, 4009, + 337, 7318, 5245, 13, 
51024], "temperature": 0.0, "avg_logprob": -0.2789993502876975, + "compression_ratio": 1.7416267942583732, "no_speech_prob": 0.005888727493584156}, + {"id": 226, "seek": 140324, "start": 1417.24, "end": 1423.56, "text": " We are going + to see hopefully soon enough, a definition of what it means for a model to be", + "tokens": [51064, 492, 366, 516, 281, 536, 4696, 2321, 1547, 11, 257, 7123, 295, + 437, 309, 1355, 337, 257, 2316, 281, 312, 51380], "temperature": 0.0, "avg_logprob": + -0.2789993502876975, "compression_ratio": 1.7416267942583732, "no_speech_prob": + 0.005888727493584156}, {"id": 227, "seek": 140324, "start": 1423.56, "end": 1429.0, + "text": " open source. And that is going to be great, because at that point it''s + not a matter of like,", "tokens": [51380, 1269, 4009, 13, 400, 300, 307, 516, 281, + 312, 869, 11, 570, 412, 300, 935, 309, 311, 406, 257, 1871, 295, 411, 11, 51652], + "temperature": 0.0, "avg_logprob": -0.2789993502876975, "compression_ratio": 1.7416267942583732, + "no_speech_prob": 0.005888727493584156}, {"id": 228, "seek": 142900, "start": 1429.56, + "end": 1436.84, "text": " I believe it''s open, and I claim it''s open. It''s open + for its notes. And everything is covered", "tokens": [50392, 286, 1697, 309, 311, + 1269, 11, 293, 286, 3932, 309, 311, 1269, 13, 467, 311, 1269, 337, 1080, 5570, 13, + 400, 1203, 307, 5343, 50756], "temperature": 0.0, "avg_logprob": -0.2756321716308594, + "compression_ratio": 1.5604395604395604, "no_speech_prob": 0.010949679650366306}, + {"id": 229, "seek": 142900, "start": 1436.84, "end": 1443.8, "text": " by a license + that is going to be open or not. 
In terms of like impolination between like the", + "tokens": [50756, 538, 257, 10476, 300, 307, 516, 281, 312, 1269, 420, 406, 13, + 682, 2115, 295, 411, 704, 401, 2486, 1296, 411, 264, 51104], "temperature": 0.0, + "avg_logprob": -0.2756321716308594, "compression_ratio": 1.5604395604395604, "no_speech_prob": + 0.010949679650366306}, {"id": 230, "seek": 142900, "start": 1443.8, "end": 1453.16, + "text": " academia and the industry, I think probably that''s the most, I mean, + this period is so important", "tokens": [51104, 28937, 293, 264, 3518, 11, 286, + 519, 1391, 300, 311, 264, 881, 11, 286, 914, 11, 341, 2896, 307, 370, 1021, 51572], + "temperature": 0.0, "avg_logprob": -0.2756321716308594, "compression_ratio": 1.5604395604395604, + "no_speech_prob": 0.010949679650366306}, {"id": 231, "seek": 145316, "start": 1453.88, + "end": 1459.5600000000002, "text": " to see like cross pollination, because there + are like many models that for example are", "tokens": [50400, 281, 536, 411, 3278, + 6418, 2486, 11, 570, 456, 366, 411, 867, 5245, 300, 337, 1365, 366, 50684], "temperature": + 0.0, "avg_logprob": -0.28062552940554736, "compression_ratio": 1.7962085308056872, + "no_speech_prob": 0.013462143950164318}, {"id": 232, "seek": 145316, "start": 1459.5600000000002, + "end": 1466.1200000000001, "text": " designed and contributed by the academia that + must then be used by organizations and the other", "tokens": [50684, 4761, 293, + 18434, 538, 264, 28937, 300, 1633, 550, 312, 1143, 538, 6150, 293, 264, 661, 51012], + "temperature": 0.0, "avg_logprob": -0.28062552940554736, "compression_ratio": 1.7962085308056872, + "no_speech_prob": 0.013462143950164318}, {"id": 233, "seek": 145316, "start": 1466.1200000000001, + "end": 1470.28, "text": " way around, because of course there are like a lot of + money involved in training and free training", "tokens": [51012, 636, 926, 11, 570, + 295, 1164, 456, 366, 411, 257, 688, 295, 1460, 3288, 294, 3097, 293, 1737, 3097, + 51220], 
"temperature": 0.0, "avg_logprob": -0.28062552940554736, "compression_ratio": + 1.7962085308056872, "no_speech_prob": 0.013462143950164318}, {"id": 234, "seek": + 145316, "start": 1470.28, "end": 1477.24, "text": " and fine training on the algorithm + of the models. So many, I mean, only few actually organizations", "tokens": [51220, + 293, 2489, 3097, 322, 264, 9284, 295, 264, 5245, 13, 407, 867, 11, 286, 914, 11, + 787, 1326, 767, 6150, 51568], "temperature": 0.0, "avg_logprob": -0.28062552940554736, + "compression_ratio": 1.7962085308056872, "no_speech_prob": 0.013462143950164318}, + {"id": 235, "seek": 147724, "start": 1477.24, "end": 1483.48, "text": " are able + to do this. So they should try, I mean, ideally to make it as open as possible in + a way", "tokens": [50364, 366, 1075, 281, 360, 341, 13, 407, 436, 820, 853, 11, + 286, 914, 11, 22915, 281, 652, 309, 382, 1269, 382, 1944, 294, 257, 636, 50676], + "temperature": 0.0, "avg_logprob": -0.3184212154812283, "compression_ratio": 1.5991379310344827, + "no_speech_prob": 0.00343546480871737}, {"id": 236, "seek": 147724, "start": 1483.48, + "end": 1488.6, "text": " that then universities can focus on small components and + potentially help in some more.", "tokens": [50676, 300, 550, 11779, 393, 1879, 322, + 1359, 6677, 293, 7263, 854, 294, 512, 544, 13, 50932], "temperature": 0.0, "avg_logprob": + -0.3184212154812283, "compression_ratio": 1.5991379310344827, "no_speech_prob": + 0.00343546480871737}, {"id": 237, "seek": 147724, "start": 1488.6, "end": 1495.88, + "text": " Yeah, yeah. 
I mean, you know, pre-training and internet scale is incredibly + expensive from", "tokens": [50932, 865, 11, 1338, 13, 286, 914, 11, 291, 458, 11, + 659, 12, 17227, 1760, 293, 4705, 4373, 307, 6252, 5124, 490, 51296], "temperature": + 0.0, "avg_logprob": -0.3184212154812283, "compression_ratio": 1.5991379310344827, + "no_speech_prob": 0.00343546480871737}, {"id": 238, "seek": 147724, "start": 1495.88, + "end": 1505.0, "text": " energy perspective especially. So I hope, you know, we + reach the point where everything is open", "tokens": [51296, 2281, 4585, 2318, 13, + 407, 286, 1454, 11, 291, 458, 11, 321, 2524, 264, 935, 689, 1203, 307, 1269, 51752], + "temperature": 0.0, "avg_logprob": -0.3184212154812283, "compression_ratio": 1.5991379310344827, + "no_speech_prob": 0.00343546480871737}, {"id": 239, "seek": 150500, "start": 1505.8, + "end": 1511.48, "text": " enough for also smaller organizations and academia organizations + to 12.", "tokens": [50404, 1547, 337, 611, 4356, 6150, 293, 28937, 6150, 281, 2272, + 13, 50688], "temperature": 0.0, "avg_logprob": -0.2461254657843174, "compression_ratio": + 1.5980861244019138, "no_speech_prob": 0.009635263122618198}, {"id": 240, "seek": + 150500, "start": 1511.48, "end": 1515.32, "text": " Yeah, it''s very interesting + because there is always going to be this kind of", "tokens": [50688, 865, 11, 309, + 311, 588, 1880, 570, 456, 307, 1009, 516, 281, 312, 341, 733, 295, 50880], "temperature": + 0.0, "avg_logprob": -0.2461254657843174, "compression_ratio": 1.5980861244019138, + "no_speech_prob": 0.009635263122618198}, {"id": 241, "seek": 150500, "start": 1516.84, + "end": 1522.6, "text": " play between, okay, this big company has all the servers, + they need to train the model.", "tokens": [50956, 862, 1296, 11, 1392, 11, 341, + 955, 2237, 575, 439, 264, 15909, 11, 436, 643, 281, 3847, 264, 2316, 13, 51244], + "temperature": 0.0, "avg_logprob": -0.2461254657843174, "compression_ratio": 1.5980861244019138, + 
"no_speech_prob": 0.009635263122618198}, {"id": 242, "seek": 150500, "start": 1522.6, + "end": 1529.72, "text": " So they can also decide how they will do it and not kind + of disclose, but then maybe the question", "tokens": [51244, 407, 436, 393, 611, + 4536, 577, 436, 486, 360, 309, 293, 406, 733, 295, 36146, 11, 457, 550, 1310, 264, + 1168, 51600], "temperature": 0.0, "avg_logprob": -0.2461254657843174, "compression_ratio": + 1.5980861244019138, "no_speech_prob": 0.009635263122618198}, {"id": 243, "seek": + 152972, "start": 1529.8, "end": 1535.32, "text": " that we need to be disputing + and sort of discussing is that they still don''t have all the data", "tokens": [50368, + 300, 321, 643, 281, 312, 37669, 278, 293, 1333, 295, 10850, 307, 300, 436, 920, + 500, 380, 362, 439, 264, 1412, 50644], "temperature": 0.0, "avg_logprob": -0.12686287459506784, + "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.005662213079631329}, + {"id": 244, "seek": 152972, "start": 1535.32, "end": 1540.04, "text": " to train + on, right? Potentially. Like there have been some cases mentioned in the keynote, + you know,", "tokens": [50644, 281, 3847, 322, 11, 558, 30, 9145, 3137, 13, 1743, + 456, 362, 668, 512, 3331, 2835, 294, 264, 33896, 11, 291, 458, 11, 50880], "temperature": + 0.0, "avg_logprob": -0.12686287459506784, "compression_ratio": 1.6986899563318778, + "no_speech_prob": 0.005662213079631329}, {"id": 245, "seek": 152972, "start": 1540.04, + "end": 1548.52, "text": " when some company, we will not name the company, it goes + and trains it on some articles of famous", "tokens": [50880, 562, 512, 2237, 11, + 321, 486, 406, 1315, 264, 2237, 11, 309, 1709, 293, 16329, 309, 322, 512, 11290, + 295, 4618, 51304], "temperature": 0.0, "avg_logprob": -0.12686287459506784, "compression_ratio": + 1.6986899563318778, "no_speech_prob": 0.005662213079631329}, {"id": 246, "seek": + 152972, "start": 1548.52, "end": 1553.64, "text": " publishing house, right? 
And + now that publishing house is unhappy because they say you took our", "tokens": [51304, + 17832, 1782, 11, 558, 30, 400, 586, 300, 17832, 1782, 307, 22172, 570, 436, 584, + 291, 1890, 527, 51560], "temperature": 0.0, "avg_logprob": -0.12686287459506784, + "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.005662213079631329}, + {"id": 247, "seek": 155364, "start": 1553.72, "end": 1559.0, "text": " articles + without us knowing this. Now, it now it kind of evokes this question, okay,", "tokens": + [50368, 11290, 1553, 505, 5276, 341, 13, 823, 11, 309, 586, 309, 733, 295, 1073, + 8606, 341, 1168, 11, 1392, 11, 50632], "temperature": 0.0, "avg_logprob": -0.20279652731759207, + "compression_ratio": 1.7056603773584906, "no_speech_prob": 0.009502504952251911}, + {"id": 248, "seek": 155364, "start": 1560.2, "end": 1564.44, "text": " when I was + reading this article, there was probably some license which said you can", "tokens": + [50692, 562, 286, 390, 3760, 341, 7222, 11, 456, 390, 1391, 512, 10476, 597, 848, + 291, 393, 50904], "temperature": 0.0, "avg_logprob": -0.20279652731759207, "compression_ratio": + 1.7056603773584906, "no_speech_prob": 0.009502504952251911}, {"id": 249, "seek": + 155364, "start": 1564.44, "end": 1570.1200000000001, "text": " not, you can do this, + but not this, maybe there is something hidden, right? But only now we started", + "tokens": [50904, 406, 11, 291, 393, 360, 341, 11, 457, 406, 341, 11, 1310, 456, + 307, 746, 7633, 11, 558, 30, 583, 787, 586, 321, 1409, 51188], "temperature": 0.0, + "avg_logprob": -0.20279652731759207, "compression_ratio": 1.7056603773584906, "no_speech_prob": + 0.009502504952251911}, {"id": 250, "seek": 155364, "start": 1570.1200000000001, + "end": 1575.48, "text": " discussing these things, right? 
And that''s very interesting + topic, but do you think that,", "tokens": [51188, 10850, 613, 721, 11, 558, 30, + 400, 300, 311, 588, 1880, 4829, 11, 457, 360, 291, 519, 300, 11, 51456], "temperature": + 0.0, "avg_logprob": -0.20279652731759207, "compression_ratio": 1.7056603773584906, + "no_speech_prob": 0.009502504952251911}, {"id": 251, "seek": 155364, "start": 1576.76, + "end": 1582.5200000000002, "text": " you know, when the companies will be, let''s + say, they have open source to model and they have", "tokens": [51520, 291, 458, + 11, 562, 264, 3431, 486, 312, 11, 718, 311, 584, 11, 436, 362, 1269, 4009, 281, + 2316, 293, 436, 362, 51808], "temperature": 0.0, "avg_logprob": -0.20279652731759207, + "compression_ratio": 1.7056603773584906, "no_speech_prob": 0.009502504952251911}, + {"id": 252, "seek": 158364, "start": 1583.64, "end": 1591.5600000000002, "text": + " checked everything on that manifesto or on that contract. Do you think that there + will be still a", "tokens": [50364, 10033, 1203, 322, 300, 10067, 78, 420, 322, + 300, 4364, 13, 1144, 291, 519, 300, 456, 486, 312, 920, 257, 50760], "temperature": + 0.0, "avg_logprob": -0.17116400689789743, "compression_ratio": 1.5736842105263158, + "no_speech_prob": 0.0038147710729390383}, {"id": 253, "seek": 158364, "start": 1591.5600000000002, + "end": 1598.92, "text": " need for some maybe tooling or some process to kind of + continuously maintain the status of this", "tokens": [50760, 643, 337, 512, 1310, + 46593, 420, 512, 1399, 281, 733, 295, 15684, 6909, 264, 6558, 295, 341, 51128], + "temperature": 0.0, "avg_logprob": -0.17116400689789743, "compression_ratio": 1.5736842105263158, + "no_speech_prob": 0.0038147710729390383}, {"id": 254, "seek": 158364, "start": 1598.92, + "end": 1605.8000000000002, "text": " model as open source because it may well happen + that, you know, either the company or research institute,", "tokens": [51128, 2316, + 382, 1269, 4009, 570, 309, 815, 731, 1051, 300, 11, 291, 458, 11, 
2139, 264, 2237, + 420, 2132, 26860, 11, 51472], "temperature": 0.0, "avg_logprob": -0.17116400689789743, + "compression_ratio": 1.5736842105263158, "no_speech_prob": 0.0038147710729390383}, + {"id": 255, "seek": 160580, "start": 1605.8, "end": 1613.0, "text": " they go and + accidentally use some data that doesn''t anymore confirm, confirm, like,", "tokens": + [50364, 436, 352, 293, 15715, 764, 512, 1412, 300, 1177, 380, 3602, 9064, 11, 9064, + 11, 411, 11, 50724], "temperature": 0.0, "avg_logprob": -0.27423079470370676, "compression_ratio": + 1.6017316017316017, "no_speech_prob": 0.024271303787827492}, {"id": 256, "seek": + 160580, "start": 1613.0, "end": 1619.56, "text": " comply with this contract, right? + First of all, without other lands, do you think such thing", "tokens": [50724, 27956, + 365, 341, 4364, 11, 558, 30, 2386, 295, 439, 11, 1553, 661, 5949, 11, 360, 291, + 519, 1270, 551, 51052], "temperature": 0.0, "avg_logprob": -0.27423079470370676, + "compression_ratio": 1.6017316017316017, "no_speech_prob": 0.024271303787827492}, + {"id": 257, "seek": 160580, "start": 1619.56, "end": 1626.52, "text": " exists, + would say, for Apache Solar or you see that no one will find a library that is not + the", "tokens": [51052, 8198, 11, 576, 584, 11, 337, 46597, 22385, 420, 291, 536, + 300, 572, 472, 486, 915, 257, 6405, 300, 307, 406, 264, 51400], "temperature": 0.0, + "avg_logprob": -0.27423079470370676, "compression_ratio": 1.6017316017316017, "no_speech_prob": + 0.024271303787827492}, {"id": 258, "seek": 160580, "start": 1626.52, "end": 1632.04, + "text": " license that it has to be, plugs it in and we do a release of you seeing + a solar. 
I think there is", "tokens": [51400, 10476, 300, 309, 575, 281, 312, 11, + 33899, 309, 294, 293, 321, 360, 257, 4374, 295, 291, 2577, 257, 7936, 13, 286, 519, + 456, 307, 51676], "temperature": 0.0, "avg_logprob": -0.27423079470370676, "compression_ratio": + 1.6017316017316017, "no_speech_prob": 0.024271303787827492}, {"id": 259, "seek": + 163204, "start": 1632.04, "end": 1639.08, "text": " some checker, right? Yeah. So + these applies to certain extent to code as well, right? So you", "tokens": [50364, + 512, 1520, 260, 11, 558, 30, 865, 13, 407, 613, 13165, 281, 1629, 8396, 281, 3089, + 382, 731, 11, 558, 30, 407, 291, 50716], "temperature": 0.0, "avg_logprob": -0.2993272997669338, + "compression_ratio": 1.5601659751037344, "no_speech_prob": 0.012124263681471348}, + {"id": 260, "seek": 163204, "start": 1639.08, "end": 1645.72, "text": " are a contributor. + When you sign basically the, let''s say contract with the Apache Solar", "tokens": + [50716, 366, 257, 42859, 13, 1133, 291, 1465, 1936, 264, 11, 718, 311, 584, 4364, + 365, 264, 46597, 22385, 51048], "temperature": 0.0, "avg_logprob": -0.2993272997669338, + "compression_ratio": 1.5601659751037344, "no_speech_prob": 0.012124263681471348}, + {"id": 261, "seek": 163204, "start": 1645.72, "end": 1651.6399999999999, "text": + " Foundation, you are sure that any kind of contribution you do is your own. So + there''s not", "tokens": [51048, 10335, 11, 291, 366, 988, 300, 604, 733, 295, 13150, + 291, 360, 307, 428, 1065, 13, 407, 456, 311, 406, 51344], "temperature": 0.0, "avg_logprob": + -0.2993272997669338, "compression_ratio": 1.5601659751037344, "no_speech_prob": + 0.012124263681471348}, {"id": 262, "seek": 163204, "start": 1651.6399999999999, + "end": 1656.76, "text": " being COVID, for example, that was not COVID-rided and + the sort of thing. 
It''s genuinely created by you.", "tokens": [51344, 885, 4566, + 11, 337, 1365, 11, 300, 390, 406, 4566, 12, 81, 2112, 293, 264, 1333, 295, 551, + 13, 467, 311, 17839, 2942, 538, 291, 13, 51600], "temperature": 0.0, "avg_logprob": + -0.2993272997669338, "compression_ratio": 1.5601659751037344, "no_speech_prob": + 0.012124263681471348}, {"id": 263, "seek": 165676, "start": 1656.76, "end": 1663.72, + "text": " It''s genuinely created by you. So to certain extent, that would be a + similar thing to", "tokens": [50364, 467, 311, 17839, 2942, 538, 291, 13, 407, 281, + 1629, 8396, 11, 300, 576, 312, 257, 2531, 551, 281, 50712], "temperature": 0.0, + "avg_logprob": -0.2459466912773218, "compression_ratio": 1.7264150943396226, "no_speech_prob": + 0.008565830998122692}, {"id": 264, "seek": 165676, "start": 1664.28, "end": 1670.04, + "text": " potentially add some training data. I think probably it''s a little bit + less likely that", "tokens": [50740, 7263, 909, 512, 3097, 1412, 13, 286, 519, 1391, + 309, 311, 257, 707, 857, 1570, 3700, 300, 51028], "temperature": 0.0, "avg_logprob": + -0.2459466912773218, "compression_ratio": 1.7264150943396226, "no_speech_prob": + 0.008565830998122692}, {"id": 265, "seek": 165676, "start": 1670.04, "end": 1674.92, + "text": " like in an existing large-language model, for example, someone would contribute + a little more", "tokens": [51028, 411, 294, 364, 6741, 2416, 12, 25241, 20473, 2316, + 11, 337, 1365, 11, 1580, 576, 10586, 257, 707, 544, 51272], "temperature": 0.0, + "avg_logprob": -0.2459466912773218, "compression_ratio": 1.7264150943396226, "no_speech_prob": + 0.008565830998122692}, {"id": 266, "seek": 165676, "start": 1674.92, "end": 1679.8, + "text": " data. 
I mean, it''s more likely that maybe you you would change a little + bit the code, for example,", "tokens": [51272, 1412, 13, 286, 914, 11, 309, 311, + 544, 3700, 300, 1310, 291, 291, 576, 1319, 257, 707, 857, 264, 3089, 11, 337, 1365, + 11, 51516], "temperature": 0.0, "avg_logprob": -0.2459466912773218, "compression_ratio": + 1.7264150943396226, "no_speech_prob": 0.008565830998122692}, {"id": 267, "seek": + 167980, "start": 1679.8, "end": 1687.08, "text": " responsible of fine tuning and + it sort of things. But still, I think there will be this layer of", "tokens": [50364, + 6250, 295, 2489, 15164, 293, 309, 1333, 295, 721, 13, 583, 920, 11, 286, 519, 456, + 486, 312, 341, 4583, 295, 50728], "temperature": 0.0, "avg_logprob": -0.3745559345592152, + "compression_ratio": 1.651063829787234, "no_speech_prob": 0.012674327939748764}, + {"id": 268, "seek": 167980, "start": 1687.08, "end": 1693.6399999999999, "text": + " responsibility that wouldn''t wait on the shoulders of the contributors because + of course, you kind of", "tokens": [50728, 6357, 300, 2759, 380, 1699, 322, 264, + 10245, 295, 264, 45627, 570, 295, 1164, 11, 291, 733, 295, 51056], "temperature": + 0.0, "avg_logprob": -0.3745559345592152, "compression_ratio": 1.651063829787234, + "no_speech_prob": 0.012674327939748764}, {"id": 269, "seek": 167980, "start": 1693.6399999999999, + "end": 1701.8799999999999, "text": " have control on these single individuals. And + you need to have like this sort of layer where the", "tokens": [51056, 362, 1969, + 322, 613, 2167, 5346, 13, 400, 291, 643, 281, 362, 411, 341, 1333, 295, 4583, 689, + 264, 51468], "temperature": 0.0, "avg_logprob": -0.3745559345592152, "compression_ratio": + 1.651063829787234, "no_speech_prob": 0.012674327939748764}, {"id": 270, "seek": + 167980, "start": 1701.8799999999999, "end": 1708.2, "text": " no-profit, the Schopen + source project protects itself from. Yeah. 
Because I can imagine that", "tokens": + [51468, 572, 12, 14583, 11, 264, 2065, 15752, 4009, 1716, 22583, 2564, 490, 13, + 865, 13, 1436, 286, 393, 3811, 300, 51784], "temperature": 0.0, "avg_logprob": -0.3745559345592152, + "compression_ratio": 1.651063829787234, "no_speech_prob": 0.012674327939748764}, + {"id": 271, "seek": 170820, "start": 1708.2, "end": 1713.48, "text": " again, it''s + probably putting it to extremes, but there could be eventually some tooling where", + "tokens": [50364, 797, 11, 309, 311, 1391, 3372, 309, 281, 41119, 11, 457, 456, + 727, 312, 4728, 512, 46593, 689, 50628], "temperature": 0.0, "avg_logprob": -0.14282908682095802, + "compression_ratio": 1.7017543859649122, "no_speech_prob": 0.011533819139003754}, + {"id": 272, "seek": 170820, "start": 1713.48, "end": 1718.68, "text": " you take + the model and you introspect its behavior and you can make a guess on which data + it was", "tokens": [50628, 291, 747, 264, 2316, 293, 291, 560, 28713, 1080, 5223, + 293, 291, 393, 652, 257, 2041, 322, 597, 1412, 309, 390, 50888], "temperature": + 0.0, "avg_logprob": -0.14282908682095802, "compression_ratio": 1.7017543859649122, + "no_speech_prob": 0.011533819139003754}, {"id": 273, "seek": 170820, "start": 1718.68, + "end": 1723.56, "text": " trained. Potentially. Or at least find some similarities + with how it produces. I mean, there", "tokens": [50888, 8895, 13, 9145, 3137, 13, + 1610, 412, 1935, 915, 512, 24197, 365, 577, 309, 14725, 13, 286, 914, 11, 456, 51132], + "temperature": 0.0, "avg_logprob": -0.14282908682095802, "compression_ratio": 1.7017543859649122, + "no_speech_prob": 0.011533819139003754}, {"id": 274, "seek": 170820, "start": 1723.56, + "end": 1729.96, "text": " been some attacks, so to say, right? 
So you can actually + probe the model and see what it outputs,", "tokens": [51132, 668, 512, 8122, 11, + 370, 281, 584, 11, 558, 30, 407, 291, 393, 767, 22715, 264, 2316, 293, 536, 437, + 309, 23930, 11, 51452], "temperature": 0.0, "avg_logprob": -0.14282908682095802, + "compression_ratio": 1.7017543859649122, "no_speech_prob": 0.011533819139003754}, + {"id": 275, "seek": 170820, "start": 1729.96, "end": 1735.64, "text": " right? You + can even break some models sometimes. That''s true. So that''s more like on the + hacker side or", "tokens": [51452, 558, 30, 509, 393, 754, 1821, 512, 5245, 2171, + 13, 663, 311, 2074, 13, 407, 300, 311, 544, 411, 322, 264, 38155, 1252, 420, 51736], + "temperature": 0.0, "avg_logprob": -0.14282908682095802, "compression_ratio": 1.7017543859649122, + "no_speech_prob": 0.011533819139003754}, {"id": 276, "seek": 173564, "start": 1736.2, + "end": 1741.96, "text": " the the bad hacker side. But I mean, there probably will + be tooling. Do you think it''s possible that", "tokens": [50392, 264, 264, 1578, + 38155, 1252, 13, 583, 286, 914, 11, 456, 1391, 486, 312, 46593, 13, 1144, 291, 519, + 309, 311, 1944, 300, 50680], "temperature": 0.0, "avg_logprob": -0.16953527927398682, + "compression_ratio": 1.6638655462184875, "no_speech_prob": 0.001762103638611734}, + {"id": 277, "seek": 173564, "start": 1741.96, "end": 1747.16, "text": " there will + be tooling kind of checking the model and and making some hypothesis. And as you + said,", "tokens": [50680, 456, 486, 312, 46593, 733, 295, 8568, 264, 2316, 293, + 293, 1455, 512, 17291, 13, 400, 382, 291, 848, 11, 50940], "temperature": 0.0, "avg_logprob": + -0.16953527927398682, "compression_ratio": 1.6638655462184875, "no_speech_prob": + 0.001762103638611734}, {"id": 278, "seek": 173564, "start": 1747.88, "end": 1754.8400000000001, + "text": " once caught, that organization will kind of lose its trust, right? 
So + obviously, everyone wants to be", "tokens": [50976, 1564, 5415, 11, 300, 4475, 486, + 733, 295, 3624, 1080, 3361, 11, 558, 30, 407, 2745, 11, 1518, 2738, 281, 312, 51324], + "temperature": 0.0, "avg_logprob": -0.16953527927398682, "compression_ratio": 1.6638655462184875, + "no_speech_prob": 0.001762103638611734}, {"id": 279, "seek": 173564, "start": 1754.8400000000001, + "end": 1760.44, "text": " kind of accountable and so on. But then there could be + a flip side of that that you can kind of", "tokens": [51324, 733, 295, 18024, 293, + 370, 322, 13, 583, 550, 456, 727, 312, 257, 7929, 1252, 295, 300, 300, 291, 393, + 733, 295, 51604], "temperature": 0.0, "avg_logprob": -0.16953527927398682, "compression_ratio": + 1.6638655462184875, "no_speech_prob": 0.001762103638611734}, {"id": 280, "seek": + 176044, "start": 1761.16, "end": 1769.3200000000002, "text": " accidentally assume + that they did it, but that''s not true, right? Now that becomes a very hard", "tokens": + [50400, 15715, 6552, 300, 436, 630, 309, 11, 457, 300, 311, 406, 2074, 11, 558, + 30, 823, 300, 3643, 257, 588, 1152, 50808], "temperature": 0.0, "avg_logprob": -0.25752168231540257, + "compression_ratio": 1.471502590673575, "no_speech_prob": 0.002589970361441374}, + {"id": 281, "seek": 176044, "start": 1769.3200000000002, "end": 1778.76, "text": + " debate, right? So it''s an area which I think deserves exploration and study. 
+ And I believe that''s", "tokens": [50808, 7958, 11, 558, 30, 407, 309, 311, 364, + 1859, 597, 286, 519, 17037, 16197, 293, 2979, 13, 400, 286, 1697, 300, 311, 51280], + "temperature": 0.0, "avg_logprob": -0.25752168231540257, "compression_ratio": 1.471502590673575, + "no_speech_prob": 0.002589970361441374}, {"id": 282, "seek": 176044, "start": 1780.6000000000001, + "end": 1787.16, "text": " being accountable of like the data you use and disclosing + it, of course, is the first step.", "tokens": [51372, 885, 18024, 295, 411, 264, + 1412, 291, 764, 293, 17092, 6110, 309, 11, 295, 1164, 11, 307, 264, 700, 1823, 13, + 51700], "temperature": 0.0, "avg_logprob": -0.25752168231540257, "compression_ratio": + 1.471502590673575, "no_speech_prob": 0.002589970361441374}, {"id": 283, "seek": + 178716, "start": 1787.8000000000002, "end": 1793.96, "text": " But then also validating + that companies send the truth, for example, I think it''s going to be", "tokens": + [50396, 583, 550, 611, 7363, 990, 300, 3431, 2845, 264, 3494, 11, 337, 1365, 11, + 286, 519, 309, 311, 516, 281, 312, 50704], "temperature": 0.0, "avg_logprob": -0.2601023244333791, + "compression_ratio": 1.5458333333333334, "no_speech_prob": 0.05368422344326973}, + {"id": 284, "seek": 178716, "start": 1793.96, "end": 1801.24, "text": " important + to build trust and to make sure that what you display is actually what happens.", + "tokens": [50704, 1021, 281, 1322, 3361, 293, 281, 652, 988, 300, 437, 291, 4674, + 307, 767, 437, 2314, 13, 51068], "temperature": 0.0, "avg_logprob": -0.2601023244333791, + "compression_ratio": 1.5458333333333334, "no_speech_prob": 0.05368422344326973}, + {"id": 285, "seek": 178716, "start": 1801.72, "end": 1807.64, "text": " Because + we never know. It''s very interesting. 
Was there some other topic you wanted to + cover?", "tokens": [51092, 1436, 321, 1128, 458, 13, 467, 311, 588, 1880, 13, 3027, + 456, 512, 661, 4829, 291, 1415, 281, 2060, 30, 51388], "temperature": 0.0, "avg_logprob": + -0.2601023244333791, "compression_ratio": 1.5458333333333334, "no_speech_prob": + 0.05368422344326973}, {"id": 286, "seek": 178716, "start": 1807.64, "end": 1815.5600000000002, + "text": " I mean, are you also working on Raga or anything of that or evaluating + the LLAM based search?", "tokens": [51388, 286, 914, 11, 366, 291, 611, 1364, 322, + 497, 9286, 420, 1340, 295, 300, 420, 27479, 264, 441, 43, 2865, 2361, 3164, 30, + 51784], "temperature": 0.0, "avg_logprob": -0.2601023244333791, "compression_ratio": + 1.5458333333333334, "no_speech_prob": 0.05368422344326973}, {"id": 287, "seek": + 181556, "start": 1815.56, "end": 1820.76, "text": " We are working on many different + integration with LLAM models. Retriol passage generation is one", "tokens": [50364, + 492, 366, 1364, 322, 867, 819, 10980, 365, 441, 43, 2865, 5245, 13, 11495, 470, + 401, 11497, 5125, 307, 472, 50624], "temperature": 0.0, "avg_logprob": -0.3106827069354314, + "compression_ratio": 1.606694560669456, "no_speech_prob": 0.0042394609190523624}, + {"id": 288, "seek": 181556, "start": 1820.76, "end": 1826.6, "text": " of it. Nugro + language parsing, for example, is another so moving from Nugro language to structured", + "tokens": [50624, 295, 309, 13, 426, 697, 340, 2856, 21156, 278, 11, 337, 1365, + 11, 307, 1071, 370, 2684, 490, 426, 697, 340, 2856, 281, 18519, 50916], "temperature": + 0.0, "avg_logprob": -0.3106827069354314, "compression_ratio": 1.606694560669456, + "no_speech_prob": 0.0042394609190523624}, {"id": 289, "seek": 181556, "start": 1826.6, + "end": 1832.44, "text": " queries. Yeah. 
Probably the last thing we can discuss, + the last topic we can discuss is prompt", "tokens": [50916, 24109, 13, 865, 13, + 9210, 264, 1036, 551, 321, 393, 2248, 11, 264, 1036, 4829, 321, 393, 2248, 307, + 12391, 51208], "temperature": 0.0, "avg_logprob": -0.3106827069354314, "compression_ratio": + 1.606694560669456, "no_speech_prob": 0.0042394609190523624}, {"id": 290, "seek": + 181556, "start": 1832.44, "end": 1838.9199999999998, "text": " engineering. Yeah. + Briefly, because yes, it''s this naming convention is something that really", "tokens": + [51208, 7043, 13, 865, 13, 39805, 356, 11, 570, 2086, 11, 309, 311, 341, 25290, + 10286, 307, 746, 300, 534, 51532], "temperature": 0.0, "avg_logprob": -0.3106827069354314, + "compression_ratio": 1.606694560669456, "no_speech_prob": 0.0042394609190523624}, + {"id": 291, "seek": 183892, "start": 1838.92, "end": 1844.28, "text": " hurts me + because it''s not engineering at all in my opinion because you''re just attempting + to", "tokens": [50364, 11051, 385, 570, 309, 311, 406, 7043, 412, 439, 294, 452, + 4800, 570, 291, 434, 445, 22001, 281, 50632], "temperature": 0.0, "avg_logprob": + -0.2131224782843339, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.02166379988193512}, {"id": 292, "seek": 183892, "start": 1844.28, "end": 1851.0800000000002, + "text": " communicate with something and you don''t know what to expect. Because + I''ve seen, I mean, I''ve seen", "tokens": [50632, 7890, 365, 746, 293, 291, 500, + 380, 458, 437, 281, 2066, 13, 1436, 286, 600, 1612, 11, 286, 914, 11, 286, 600, + 1612, 50972], "temperature": 0.0, "avg_logprob": -0.2131224782843339, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.02166379988193512}, {"id": 293, "seek": + 183892, "start": 1851.0800000000002, "end": 1857.88, "text": " tools today with + people saying, you write this prompt and you hope you get this response. 
Yeah.", + "tokens": [50972, 3873, 965, 365, 561, 1566, 11, 291, 2464, 341, 12391, 293, 291, + 1454, 291, 483, 341, 4134, 13, 865, 13, 51312], "temperature": 0.0, "avg_logprob": + -0.2131224782843339, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.02166379988193512}, {"id": 294, "seek": 183892, "start": 1857.88, "end": 1863.8000000000002, + "text": " You type this prompt and you ask, please give me the response. She is, + to me, something that is,", "tokens": [51312, 509, 2010, 341, 12391, 293, 291, 1029, + 11, 1767, 976, 385, 264, 4134, 13, 1240, 307, 11, 281, 385, 11, 746, 300, 307, 11, + 51608], "temperature": 0.0, "avg_logprob": -0.2131224782843339, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.02166379988193512}, {"id": 295, "seek": + 186380, "start": 1864.2, "end": 1869.3999999999999, "text": " not scientific at + all. It''s not scientific. It''s not scientific. It''s not science. You can''t just", + "tokens": [50384, 406, 8134, 412, 439, 13, 467, 311, 406, 8134, 13, 467, 311, 406, + 8134, 13, 467, 311, 406, 3497, 13, 509, 393, 380, 445, 50644], "temperature": 0.0, + "avg_logprob": -0.3429580373862355, "compression_ratio": 1.7370892018779343, "no_speech_prob": + 0.05421893671154976}, {"id": 296, "seek": 186380, "start": 1869.3999999999999, "end": + 1875.96, "text": " be comfortable. Yeah, you can be comfortable. Yeah. 
So there''s, + in my opinion, just to give", "tokens": [50644, 312, 4619, 13, 865, 11, 291, 393, + 312, 4619, 13, 865, 13, 407, 456, 311, 11, 294, 452, 4800, 11, 445, 281, 976, 50972], + "temperature": 0.0, "avg_logprob": -0.3429580373862355, "compression_ratio": 1.7370892018779343, + "no_speech_prob": 0.05421893671154976}, {"id": 297, "seek": 186380, "start": 1875.96, + "end": 1884.12, "text": " you short, a big margin of improvement there to interact + with LLAM in a more program idea.", "tokens": [50972, 291, 2099, 11, 257, 955, 10270, + 295, 10444, 456, 281, 4648, 365, 441, 43, 2865, 294, 257, 544, 1461, 1558, 13, 51380], + "temperature": 0.0, "avg_logprob": -0.3429580373862355, "compression_ratio": 1.7370892018779343, + "no_speech_prob": 0.05421893671154976}, {"id": 298, "seek": 186380, "start": 1885.24, + "end": 1891.8799999999999, "text": " I want to specify it as with rules and get + back a response that satisfies those rules. If", "tokens": [51436, 286, 528, 281, + 16500, 309, 382, 365, 4474, 293, 483, 646, 257, 4134, 300, 44271, 729, 4474, 13, + 759, 51768], "temperature": 0.0, "avg_logprob": -0.3429580373862355, "compression_ratio": + 1.7370892018779343, "no_speech_prob": 0.05421893671154976}, {"id": 299, "seek": + 189188, "start": 1891.88, "end": 1896.5200000000002, "text": " I want to select + an item from a list, I want to select an item from the list. I wonder", "tokens": + [50364, 286, 528, 281, 3048, 364, 3174, 490, 257, 1329, 11, 286, 528, 281, 3048, + 364, 3174, 490, 264, 1329, 13, 286, 2441, 50596], "temperature": 0.0, "avg_logprob": + -0.22088693633792908, "compression_ratio": 1.9572649572649572, "no_speech_prob": + 0.019875723868608475}, {"id": 300, "seek": 189188, "start": 1896.5200000000002, + "end": 1902.1200000000001, "text": " LLAM is more than to be able to just select + the item from the list. 
I''m not 80% of the time,", "tokens": [50596, 441, 43, 2865, + 307, 544, 813, 281, 312, 1075, 281, 445, 3048, 264, 3174, 490, 264, 1329, 13, 286, + 478, 406, 4688, 4, 295, 264, 565, 11, 50876], "temperature": 0.0, "avg_logprob": + -0.22088693633792908, "compression_ratio": 1.9572649572649572, "no_speech_prob": + 0.019875723868608475}, {"id": 301, "seek": 189188, "start": 1902.1200000000001, + "end": 1907.0800000000002, "text": " select the item on the list and 20% of the + time, select the item and give me an explanation.", "tokens": [50876, 3048, 264, + 3174, 322, 264, 1329, 293, 945, 4, 295, 264, 565, 11, 3048, 264, 3174, 293, 976, + 385, 364, 10835, 13, 51124], "temperature": 0.0, "avg_logprob": -0.22088693633792908, + "compression_ratio": 1.9572649572649572, "no_speech_prob": 0.019875723868608475}, + {"id": 302, "seek": 189188, "start": 1907.96, "end": 1914.44, "text": " I just wanted + the item from the list. Yeah. And right now, I''ve seen, and we will see", "tokens": + [51168, 286, 445, 1415, 264, 3174, 490, 264, 1329, 13, 865, 13, 400, 558, 586, 11, + 286, 600, 1612, 11, 293, 321, 486, 536, 51492], "temperature": 0.0, "avg_logprob": + -0.22088693633792908, "compression_ratio": 1.9572649572649572, "no_speech_prob": + 0.019875723868608475}, {"id": 303, "seek": 189188, "start": 1914.44, "end": 1919.0, + "text": " the conference because I''ve seen in the agenda, there are a lot of many + talks about trying to solve", "tokens": [51492, 264, 7586, 570, 286, 600, 1612, + 294, 264, 9829, 11, 456, 366, 257, 688, 295, 867, 6686, 466, 1382, 281, 5039, 51720], + "temperature": 0.0, "avg_logprob": -0.22088693633792908, "compression_ratio": 1.9572649572649572, + "no_speech_prob": 0.019875723868608475}, {"id": 304, "seek": 191900, "start": 1919.08, + "end": 1925.72, "text": " this problem. 
But right now, what I''ve seen as a possible + solution is just like you post validates", "tokens": [50368, 341, 1154, 13, 583, + 558, 586, 11, 437, 286, 600, 1612, 382, 257, 1944, 3827, 307, 445, 411, 291, 2183, + 7363, 1024, 50700], "temperature": 0.0, "avg_logprob": -0.22727550677399136, "compression_ratio": + 1.7025089605734767, "no_speech_prob": 0.02385595068335533}, {"id": 305, "seek": + 191900, "start": 1926.52, "end": 1932.52, "text": " the response and you go back. + Like, okay, yeah, I asked for a specific JSON in the response.", "tokens": [50740, + 264, 4134, 293, 291, 352, 646, 13, 1743, 11, 1392, 11, 1338, 11, 286, 2351, 337, + 257, 2685, 31828, 294, 264, 4134, 13, 51040], "temperature": 0.0, "avg_logprob": + -0.22727550677399136, "compression_ratio": 1.7025089605734767, "no_speech_prob": + 0.02385595068335533}, {"id": 306, "seek": 191900, "start": 1932.52, "end": 1938.92, + "text": " There are mistakes. It''s like, it''s not a possible JSON. I say, I go + back to the LLAM and I say,", "tokens": [51040, 821, 366, 8038, 13, 467, 311, 411, + 11, 309, 311, 406, 257, 1944, 31828, 13, 286, 584, 11, 286, 352, 646, 281, 264, + 441, 43, 2865, 293, 286, 584, 11, 51360], "temperature": 0.0, "avg_logprob": -0.22727550677399136, + "compression_ratio": 1.7025089605734767, "no_speech_prob": 0.02385595068335533}, + {"id": 307, "seek": 191900, "start": 1938.92, "end": 1943.48, "text": " this is + not a possible JSON. Can you fix it? And again, and again, and again, which is not + really", "tokens": [51360, 341, 307, 406, 257, 1944, 31828, 13, 1664, 291, 3191, + 309, 30, 400, 797, 11, 293, 797, 11, 293, 797, 11, 597, 307, 406, 534, 51588], "temperature": + 0.0, "avg_logprob": -0.22727550677399136, "compression_ratio": 1.7025089605734767, + "no_speech_prob": 0.02385595068335533}, {"id": 308, "seek": 191900, "start": 1943.48, + "end": 1948.6, "text": " something you want to go to production. 
So in short, in + my opinion, like using LLAM with", "tokens": [51588, 746, 291, 528, 281, 352, 281, + 4265, 13, 407, 294, 2099, 11, 294, 452, 4800, 11, 411, 1228, 441, 43, 2865, 365, + 51844], "temperature": 0.0, "avg_logprob": -0.22727550677399136, "compression_ratio": + 1.7025089605734767, "no_speech_prob": 0.02385595068335533}, {"id": 309, "seek": + 194860, "start": 1948.6, "end": 1954.6799999999998, "text": " models, program, I + right now is full for approval concepts. But would I bring to production like", + "tokens": [50364, 5245, 11, 1461, 11, 286, 558, 586, 307, 1577, 337, 13317, 10392, + 13, 583, 576, 286, 1565, 281, 4265, 411, 50668], "temperature": 0.0, "avg_logprob": + -0.3336199788213934, "compression_ratio": 1.5872340425531914, "no_speech_prob": + 0.008671130985021591}, {"id": 310, "seek": 194860, "start": 1955.1599999999999, + "end": 1959.56, "text": " out of the box like these sort of approaches? I want, + because I wouldn''t, you know,", "tokens": [50692, 484, 295, 264, 2424, 411, 613, + 1333, 295, 11587, 30, 286, 528, 11, 570, 286, 2759, 380, 11, 291, 458, 11, 50912], + "temperature": 0.0, "avg_logprob": -0.3336199788213934, "compression_ratio": 1.5872340425531914, + "no_speech_prob": 0.008671130985021591}, {"id": 311, "seek": 194860, "start": 1959.56, + "end": 1967.3999999999999, "text": " brings, I want to bring something that is deterministic. + Yeah. It does what I want to do 100%", "tokens": [50912, 5607, 11, 286, 528, 281, + 1565, 746, 300, 307, 15957, 3142, 13, 865, 13, 467, 775, 437, 286, 528, 281, 360, + 2319, 4, 51304], "temperature": 0.0, "avg_logprob": -0.3336199788213934, "compression_ratio": + 1.5872340425531914, "no_speech_prob": 0.008671130985021591}, {"id": 312, "seek": + 194860, "start": 1967.3999999999999, "end": 1973.32, "text": " of the time. Sometimes. + And I don''t want to hope. It''s a good thing. 
Well, I want to make it work.", "tokens": + [51304, 295, 264, 565, 13, 4803, 13, 400, 286, 500, 380, 528, 281, 1454, 13, 467, + 311, 257, 665, 551, 13, 1042, 11, 286, 528, 281, 652, 309, 589, 13, 51600], "temperature": + 0.0, "avg_logprob": -0.3336199788213934, "compression_ratio": 1.5872340425531914, + "no_speech_prob": 0.008671130985021591}, {"id": 313, "seek": 197332, "start": 1973.8799999999999, + "end": 1979.6399999999999, "text": " Yeah, but it''s also like I see it''s very + interesting topic, by the way, but I also see some level of", "tokens": [50392, + 865, 11, 457, 309, 311, 611, 411, 286, 536, 309, 311, 588, 1880, 4829, 11, 538, + 264, 636, 11, 457, 286, 611, 536, 512, 1496, 295, 50680], "temperature": 0.0, "avg_logprob": + -0.15326243952700966, "compression_ratio": 1.7136752136752136, "no_speech_prob": + 0.040132369846105576}, {"id": 314, "seek": 197332, "start": 1979.6399999999999, + "end": 1987.0, "text": " contradiction that to like between non deterministic and + hallucinating model essentially hallucinating", "tokens": [50680, 34937, 300, 281, + 411, 1296, 2107, 15957, 3142, 293, 35212, 8205, 2316, 4476, 35212, 8205, 51048], + "temperature": 0.0, "avg_logprob": -0.15326243952700966, "compression_ratio": 1.7136752136752136, + "no_speech_prob": 0.040132369846105576}, {"id": 315, "seek": 197332, "start": 1987.0, + "end": 1993.32, "text": " by design because it keeps predicting the terms, right? + And some level of determinism as you just", "tokens": [51048, 538, 1715, 570, 309, + 5965, 32884, 264, 2115, 11, 558, 30, 400, 512, 1496, 295, 15957, 1434, 382, 291, + 445, 51364], "temperature": 0.0, "avg_logprob": -0.15326243952700966, "compression_ratio": + 1.7136752136752136, "no_speech_prob": 0.040132369846105576}, {"id": 316, "seek": + 197332, "start": 1993.32, "end": 1999.8, "text": " explained, right? 
But I guess, + but I guess at the same time, someone might say that our life is not", "tokens": + [51364, 8825, 11, 558, 30, 583, 286, 2041, 11, 457, 286, 2041, 412, 264, 912, 565, + 11, 1580, 1062, 584, 300, 527, 993, 307, 406, 51688], "temperature": 0.0, "avg_logprob": + -0.15326243952700966, "compression_ratio": 1.7136752136752136, "no_speech_prob": + 0.040132369846105576}, {"id": 317, "seek": 199980, "start": 1999.8, "end": 2005.8, + "text": " that deterministic, many moving parts and we still find a way to, I don''t + know, leave it and then", "tokens": [50364, 300, 15957, 3142, 11, 867, 2684, 3166, + 293, 321, 920, 915, 257, 636, 281, 11, 286, 500, 380, 458, 11, 1856, 309, 293, 550, + 50664], "temperature": 0.0, "avg_logprob": -0.21954883915362972, "compression_ratio": + 1.5659574468085107, "no_speech_prob": 0.004681337624788284}, {"id": 318, "seek": + 199980, "start": 2005.8, "end": 2010.36, "text": " build something, right? Yeah, + there''s something that moves. I think, you know, it''s the first,", "tokens": [50664, + 1322, 746, 11, 558, 30, 865, 11, 456, 311, 746, 300, 6067, 13, 286, 519, 11, 291, + 458, 11, 309, 311, 264, 700, 11, 50892], "temperature": 0.0, "avg_logprob": -0.21954883915362972, + "compression_ratio": 1.5659574468085107, "no_speech_prob": 0.004681337624788284}, + {"id": 319, "seek": 199980, "start": 2010.36, "end": 2015.8799999999999, "text": + " anyway, we are experiencing, in my opinion, at first, that in these new worlds,", + "tokens": [50892, 4033, 11, 321, 366, 11139, 11, 294, 452, 4800, 11, 412, 700, 11, + 300, 294, 613, 777, 13401, 11, 51168], "temperature": 0.0, "avg_logprob": -0.21954883915362972, + "compression_ratio": 1.5659574468085107, "no_speech_prob": 0.004681337624788284}, + {"id": 320, "seek": 199980, "start": 2016.52, "end": 2024.28, "text": " of AI big + models. So I think it''s fair. 
They were born to auto complete text, to generate + text.", "tokens": [51200, 295, 7318, 955, 5245, 13, 407, 286, 519, 309, 311, 3143, + 13, 814, 645, 4232, 281, 8399, 3566, 2487, 11, 281, 8460, 2487, 13, 51588], "temperature": + 0.0, "avg_logprob": -0.21954883915362972, "compression_ratio": 1.5659574468085107, + "no_speech_prob": 0.004681337624788284}, {"id": 321, "seek": 202428, "start": 2024.36, + "end": 2031.16, "text": " And now we are trying to use them to do tasks. Yes. Which + is okay. We as humans use language to,", "tokens": [50368, 400, 586, 321, 366, 1382, + 281, 764, 552, 281, 360, 9608, 13, 1079, 13, 3013, 307, 1392, 13, 492, 382, 6255, + 764, 2856, 281, 11, 50708], "temperature": 0.0, "avg_logprob": -0.29556469800995616, + "compression_ratio": 1.6093189964157706, "no_speech_prob": 0.007484468165785074}, + {"id": 322, "seek": 202428, "start": 2031.72, "end": 2037.56, "text": " to task. + Yeah. So I just guess, and then we end up with programming. Yes.", "tokens": [50736, + 281, 5633, 13, 865, 13, 407, 286, 445, 2041, 11, 293, 550, 321, 917, 493, 365, 9410, + 13, 1079, 13, 51028], "temperature": 0.0, "avg_logprob": -0.29556469800995616, "compression_ratio": + 1.6093189964157706, "no_speech_prob": 0.007484468165785074}, {"id": 323, "seek": + 202428, "start": 2037.56, "end": 2041.96, "text": " Computers, right? So let''s + play it in a little bit more that would be programmable. Yeah.", "tokens": [51028, + 37804, 433, 11, 558, 30, 407, 718, 311, 862, 309, 294, 257, 707, 857, 544, 300, + 576, 312, 37648, 712, 13, 865, 13, 51248], "temperature": 0.0, "avg_logprob": -0.29556469800995616, + "compression_ratio": 1.6093189964157706, "no_speech_prob": 0.007484468165785074}, + {"id": 324, "seek": 202428, "start": 2042.52, "end": 2048.12, "text": " I mean, + it doesn''t remind a bit. 
I haven''t explored it, but to mention GSPY, the packaging,", + "tokens": [51276, 286, 914, 11, 309, 1177, 380, 4160, 257, 857, 13, 286, 2378, 380, + 24016, 309, 11, 457, 281, 2152, 460, 27921, 56, 11, 264, 16836, 11, 51556], "temperature": + 0.0, "avg_logprob": -0.29556469800995616, "compression_ratio": 1.6093189964157706, + "no_speech_prob": 0.007484468165785074}, {"id": 325, "seek": 202428, "start": 2048.12, + "end": 2053.0, "text": " probably you heard about it, right? Which replaces the + prompt engineering with more programmable", "tokens": [51556, 1391, 291, 2198, 466, + 309, 11, 558, 30, 3013, 46734, 264, 12391, 7043, 365, 544, 37648, 712, 51800], "temperature": + 0.0, "avg_logprob": -0.29556469800995616, "compression_ratio": 1.6093189964157706, + "no_speech_prob": 0.007484468165785074}, {"id": 326, "seek": 205300, "start": 2053.0, + "end": 2057.48, "text": " sort of way of doing it. I still don''t know how it works, + but I know that some of the engineers", "tokens": [50364, 1333, 295, 636, 295, 884, + 309, 13, 286, 920, 500, 380, 458, 577, 309, 1985, 11, 457, 286, 458, 300, 512, 295, + 264, 11955, 50588], "temperature": 0.0, "avg_logprob": -0.2574625748854417, "compression_ratio": + 1.59915611814346, "no_speech_prob": 0.006268244236707687}, {"id": 327, "seek": 205300, + "start": 2057.48, "end": 2064.44, "text": " in my team applied it quite successfully + to generate some synthetic queries. So that was very", "tokens": [50588, 294, 452, + 1469, 6456, 309, 1596, 10727, 281, 8460, 512, 23420, 24109, 13, 407, 300, 390, 588, + 50936], "temperature": 0.0, "avg_logprob": -0.2574625748854417, "compression_ratio": + 1.59915611814346, "no_speech_prob": 0.006268244236707687}, {"id": 328, "seek": 205300, + "start": 2064.44, "end": 2071.24, "text": " interesting. Have you played with it? + Do you know? 
So my team, we''ve been playing with it for one", "tokens": [50936, + 1880, 13, 3560, 291, 3737, 365, 309, 30, 1144, 291, 458, 30, 407, 452, 1469, 11, + 321, 600, 668, 2433, 365, 309, 337, 472, 51276], "temperature": 0.0, "avg_logprob": + -0.2574625748854417, "compression_ratio": 1.59915611814346, "no_speech_prob": 0.006268244236707687}, + {"id": 329, "seek": 205300, "start": 2071.24, "end": 2076.36, "text": " of the bros + of conceptual concepts for doing other language processing, for structure solar", + "tokens": [51276, 295, 264, 738, 329, 295, 24106, 10392, 337, 884, 661, 2856, 9007, + 11, 337, 3877, 7936, 51532], "temperature": 0.0, "avg_logprob": -0.2574625748854417, + "compression_ratio": 1.59915611814346, "no_speech_prob": 0.006268244236707687}, + {"id": 330, "seek": 207636, "start": 2076.36, "end": 2085.7200000000003, "text": + " queries. And I think it''s a nice first step. Still is giving you like an in-direction + between", "tokens": [50364, 24109, 13, 400, 286, 519, 309, 311, 257, 1481, 700, + 1823, 13, 8291, 307, 2902, 291, 411, 364, 294, 12, 18267, 882, 1296, 50832], "temperature": + 0.0, "avg_logprob": -0.2512031212831155, "compression_ratio": 1.529100529100529, + "no_speech_prob": 0.017910394817590714}, {"id": 331, "seek": 207636, "start": 2086.28, + "end": 2094.2000000000003, "text": " the prompt and the way to write a prompt. So + you have like classes, the mimic, the programming", "tokens": [50860, 264, 12391, + 293, 264, 636, 281, 2464, 257, 12391, 13, 407, 291, 362, 411, 5359, 11, 264, 31075, + 11, 264, 9410, 51256], "temperature": 0.0, "avg_logprob": -0.2512031212831155, "compression_ratio": + 1.529100529100529, "no_speech_prob": 0.017910394817590714}, {"id": 332, "seek": + 207636, "start": 2094.2000000000003, "end": 2100.6800000000003, "text": " language, + but then ends up as prompt. Yeah. I see. 
You are not sure that you will get what + you want.", "tokens": [51256, 2856, 11, 457, 550, 5314, 493, 382, 12391, 13, 865, + 13, 286, 536, 13, 509, 366, 406, 988, 300, 291, 486, 483, 437, 291, 528, 13, 51580], + "temperature": 0.0, "avg_logprob": -0.2512031212831155, "compression_ratio": 1.529100529100529, + "no_speech_prob": 0.017910394817590714}, {"id": 333, "seek": 210068, "start": 2101.56, + "end": 2106.3599999999997, "text": " But it''s a first attempt. Yeah. Yeah. I think + it''s okay. I mean, we will improve that.", "tokens": [50408, 583, 309, 311, 257, + 700, 5217, 13, 865, 13, 865, 13, 286, 519, 309, 311, 1392, 13, 286, 914, 11, 321, + 486, 3470, 300, 13, 50648], "temperature": 0.0, "avg_logprob": -0.34599586633535534, + "compression_ratio": 1.6803652968036529, "no_speech_prob": 0.09725619107484818}, + {"id": 334, "seek": 210068, "start": 2106.3599999999997, "end": 2113.08, "text": + " It feels maybe like maybe first baby step in a way that it''s not a work to the + state that you mentioned.", "tokens": [50648, 467, 3417, 1310, 411, 1310, 700, 3186, + 1823, 294, 257, 636, 300, 309, 311, 406, 257, 589, 281, 264, 1785, 300, 291, 2835, + 13, 50984], "temperature": 0.0, "avg_logprob": -0.34599586633535534, "compression_ratio": + 1.6803652968036529, "no_speech_prob": 0.09725619107484818}, {"id": 335, "seek": + 210068, "start": 2113.08, "end": 2118.44, "text": " Yeah. That''s not a conflict + solution, but it''s great. So what''s next that you''re working on?", "tokens": + [50984, 865, 13, 663, 311, 406, 257, 6596, 3827, 11, 457, 309, 311, 869, 13, 407, + 437, 311, 958, 300, 291, 434, 1364, 322, 30, 51252], "temperature": 0.0, "avg_logprob": + -0.34599586633535534, "compression_ratio": 1.6803652968036529, "no_speech_prob": + 0.09725619107484818}, {"id": 336, "seek": 210068, "start": 2118.44, "end": 2126.04, + "text": " That you want to disclose? Yeah. 
So first of all, I want you to bring + and merge the", "tokens": [51252, 663, 291, 528, 281, 36146, 30, 865, 13, 407, 700, + 295, 439, 11, 286, 528, 291, 281, 1565, 293, 22183, 264, 51632], "temperature": + 0.0, "avg_logprob": -0.34599586633535534, "compression_ratio": 1.6803652968036529, + "no_speech_prob": 0.09725619107484818}, {"id": 337, "seek": 212604, "start": 2126.12, + "end": 2131.48, "text": " hybrid search receiver rank fusion to solar, which is + coming nine to seven. So I''m very close to", "tokens": [50368, 13051, 3164, 20086, + 6181, 23100, 281, 7936, 11, 597, 307, 1348, 4949, 281, 3407, 13, 407, 286, 478, + 588, 1998, 281, 50636], "temperature": 0.0, "avg_logprob": -0.30727564903997606, + "compression_ratio": 1.634453781512605, "no_speech_prob": 0.02329128421843052}, + {"id": 338, "seek": 212604, "start": 2131.48, "end": 2138.44, "text": " the men. + Awesome. For that. We are, we got some funding from the European Union to work on + solar.", "tokens": [50636, 264, 1706, 13, 10391, 13, 1171, 300, 13, 492, 366, 11, + 321, 658, 512, 6137, 490, 264, 6473, 8133, 281, 589, 322, 7936, 13, 50984], "temperature": + 0.0, "avg_logprob": -0.30727564903997606, "compression_ratio": 1.634453781512605, + "no_speech_prob": 0.02329128421843052}, {"id": 339, "seek": 212604, "start": 2138.44, + "end": 2146.12, "text": " So that''s like that. 
So we''re going to be able to contribute + more vector-based search capabilities,", "tokens": [50984, 407, 300, 311, 411, 300, + 13, 407, 321, 434, 516, 281, 312, 1075, 281, 10586, 544, 8062, 12, 6032, 3164, 10862, + 11, 51368], "temperature": 0.0, "avg_logprob": -0.30727564903997606, "compression_ratio": + 1.634453781512605, "no_speech_prob": 0.02329128421843052}, {"id": 340, "seek": 212604, + "start": 2146.12, "end": 2150.84, "text": " better integrations we''ve learned into + rank, better integration with like inference and points", "tokens": [51368, 1101, + 3572, 763, 321, 600, 3264, 666, 6181, 11, 1101, 10980, 365, 411, 38253, 293, 2793, + 51604], "temperature": 0.0, "avg_logprob": -0.30727564903997606, "compression_ratio": + 1.634453781512605, "no_speech_prob": 0.02329128421843052}, {"id": 341, "seek": 215084, + "start": 2151.0, "end": 2158.6000000000004, "text": " to make it a little bit more + transparent. That''s it. And still in the work like multivalued supports", "tokens": + [50372, 281, 652, 309, 257, 707, 857, 544, 12737, 13, 663, 311, 309, 13, 400, 920, + 294, 264, 589, 411, 2120, 3576, 5827, 9346, 50752], "temperature": 0.0, "avg_logprob": + -0.2574143608411153, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0034876083955168724}, {"id": 342, "seek": 215084, "start": 2159.48, "end": 2166.04, + "text": " for vector-based search in solar. And there are like some pieces in losing + to speed up and", "tokens": [50796, 337, 8062, 12, 6032, 3164, 294, 7936, 13, 400, + 456, 366, 411, 512, 3755, 294, 7027, 281, 3073, 493, 293, 51124], "temperature": + 0.0, "avg_logprob": -0.2574143608411153, "compression_ratio": 1.7419354838709677, + "no_speech_prob": 0.0034876083955168724}, {"id": 343, "seek": 215084, "start": 2166.04, + "end": 2172.52, "text": " improve optimized vector-based search that are not yet + in solar. 
And that''s among my top", "tokens": [51124, 3470, 26941, 8062, 12, 6032, + 3164, 300, 366, 406, 1939, 294, 7936, 13, 400, 300, 311, 3654, 452, 1192, 51448], + "temperature": 0.0, "avg_logprob": -0.2574143608411153, "compression_ratio": 1.7419354838709677, + "no_speech_prob": 0.0034876083955168724}, {"id": 344, "seek": 215084, "start": 2172.52, + "end": 2177.96, "text": " priority. So this is in short. This is fantastic. This + is fantastic. And of course, it''s all open", "tokens": [51448, 9365, 13, 407, 341, + 307, 294, 2099, 13, 639, 307, 5456, 13, 639, 307, 5456, 13, 400, 295, 1164, 11, + 309, 311, 439, 1269, 51720], "temperature": 0.0, "avg_logprob": -0.2574143608411153, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0034876083955168724}, + {"id": 345, "seek": 217796, "start": 2177.96, "end": 2184.92, "text": " source and + you know, yeah, and then like anyone can join, but maybe we can also make a call + out and", "tokens": [50364, 4009, 293, 291, 458, 11, 1338, 11, 293, 550, 411, 2878, + 393, 3917, 11, 457, 1310, 321, 393, 611, 652, 257, 818, 484, 293, 50712], "temperature": + 0.0, "avg_logprob": -0.2784245129927848, "compression_ratio": 1.6512605042016806, + "no_speech_prob": 0.019747542217373848}, {"id": 346, "seek": 217796, "start": 2184.92, + "end": 2191.96, "text": " say that, I mean, everyone that wants to contribute, you + know, the more the merrier. Yeah, I actually enjoy", "tokens": [50712, 584, 300, + 11, 286, 914, 11, 1518, 300, 2738, 281, 10586, 11, 291, 458, 11, 264, 544, 264, + 3551, 7326, 13, 865, 11, 286, 767, 2103, 51064], "temperature": 0.0, "avg_logprob": + -0.2784245129927848, "compression_ratio": 1.6512605042016806, "no_speech_prob": + 0.019747542217373848}, {"id": 347, "seek": 217796, "start": 2191.96, "end": 2200.6, + "text": " like even though I don''t do solar or only seen today, I am still reading + the main news. 
And", "tokens": [51064, 411, 754, 1673, 286, 500, 380, 360, 7936, + 420, 787, 1612, 965, 11, 286, 669, 920, 3760, 264, 2135, 2583, 13, 400, 51496], + "temperature": 0.0, "avg_logprob": -0.2784245129927848, "compression_ratio": 1.6512605042016806, + "no_speech_prob": 0.019747542217373848}, {"id": 348, "seek": 217796, "start": 2200.6, + "end": 2205.4, "text": " and for the most part, I''m reading the lesson one. So + sometimes I see your discussion as well", "tokens": [51496, 293, 337, 264, 881, + 644, 11, 286, 478, 3760, 264, 6898, 472, 13, 407, 2171, 286, 536, 428, 5017, 382, + 731, 51736], "temperature": 0.0, "avg_logprob": -0.2784245129927848, "compression_ratio": + 1.6512605042016806, "no_speech_prob": 0.019747542217373848}, {"id": 349, "seek": + 220540, "start": 2205.48, "end": 2209.7200000000003, "text": " and you where you + say, actually, by the way, I''m working on this hybrid search. I did this", "tokens": + [50368, 293, 291, 689, 291, 584, 11, 767, 11, 538, 264, 636, 11, 286, 478, 1364, + 322, 341, 13051, 3164, 13, 286, 630, 341, 50580], "temperature": 0.0, "avg_logprob": + -0.209297700361772, "compression_ratio": 1.7269372693726937, "no_speech_prob": 0.01697641797363758}, + {"id": 350, "seek": 220540, "start": 2209.7200000000003, "end": 2216.12, "text": + " and this. So maybe it will influence you. And I also love the the culture where + you do not really", "tokens": [50580, 293, 341, 13, 407, 1310, 309, 486, 6503, 291, + 13, 400, 286, 611, 959, 264, 264, 3713, 689, 291, 360, 406, 534, 50900], "temperature": + 0.0, "avg_logprob": -0.209297700361772, "compression_ratio": 1.7269372693726937, + "no_speech_prob": 0.01697641797363758}, {"id": 351, "seek": 220540, "start": 2216.12, + "end": 2221.8, "text": " enforce or impose your solution. You just say just for + you to know, maybe it will be useful. 
And", "tokens": [50900, 24825, 420, 26952, + 428, 3827, 13, 509, 445, 584, 445, 337, 291, 281, 458, 11, 1310, 309, 486, 312, + 4420, 13, 400, 51184], "temperature": 0.0, "avg_logprob": -0.209297700361772, "compression_ratio": + 1.7269372693726937, "no_speech_prob": 0.01697641797363758}, {"id": 352, "seek": + 220540, "start": 2221.8, "end": 2227.1600000000003, "text": " someone says, yeah, + awesome. I especially love that that discussion. I forgot the particular topic,", + "tokens": [51184, 1580, 1619, 11, 1338, 11, 3476, 13, 286, 2318, 959, 300, 300, + 5017, 13, 286, 5298, 264, 1729, 4829, 11, 51452], "temperature": 0.0, "avg_logprob": + -0.209297700361772, "compression_ratio": 1.7269372693726937, "no_speech_prob": 0.01697641797363758}, + {"id": 353, "seek": 220540, "start": 2227.1600000000003, "end": 2234.12, "text": + " but I remember it was a recent one. It''s a recent one. So keep up your great + work.", "tokens": [51452, 457, 286, 1604, 309, 390, 257, 5162, 472, 13, 467, 311, + 257, 5162, 472, 13, 407, 1066, 493, 428, 869, 589, 13, 51800], "temperature": 0.0, + "avg_logprob": -0.209297700361772, "compression_ratio": 1.7269372693726937, "no_speech_prob": + 0.01697641797363758}, {"id": 354, "seek": 223412, "start": 2234.12, "end": 2239.48, + "text": " And it''s always a pleasure to talk to you and it looks like it''s a tradition + that we started", "tokens": [50364, 400, 309, 311, 1009, 257, 6834, 281, 751, 281, + 291, 293, 309, 1542, 411, 309, 311, 257, 6994, 300, 321, 1409, 50632], "temperature": + 0.0, "avg_logprob": -0.5495441436767579, "compression_ratio": 1.3529411764705883, + "no_speech_prob": 0.12901513278484344}, {"id": 355, "seek": 223412, "start": 2239.48, + "end": 2245.64, "text": " meeting in the same sort of early flood words. In the + end now. 
Yeah, so every two years, we", "tokens": [50632, 3440, 294, 264, 912, 1333, 295, 2440, 10481, 2283, 13, 682, 264, 917, 586, 13, 865, 11, 370, 633, 732, 924, 11, 321, 50940], "temperature": 0.0, "avg_logprob": -0.5495441436767579, "compression_ratio":
+ 1.3529411764705883, "no_speech_prob": 0.12901513278484344}, {"id": 356, "seek":
+ 224564, "start": 2245.64, "end": 2250.04, "text": " see a lot of people. Yeah, there
+ are a lot of people. So fantastic. Thank you so much,", "tokens": [50364, 536, 257,
+ 688, 295, 561, 13, 865, 11, 456, 366, 257, 688, 295, 561, 13, 407, 5456, 13, 1044,
+ 291, 370, 709, 11, 50584], "temperature": 0.0, "avg_logprob": -0.9803078969319662,
+ "compression_ratio": 1.162162162162162, "no_speech_prob": 0.5407794117927551}, {"id":
+ 357, "seek": 225004, "start": 2250.04, "end": 2253.4, "text": " LeSando. And anyway,
+ my reaction to the conference. Thank you. Thank you.", "tokens": [50364, 1456, 50,
+ 1806, 13, 400, 4033, 11, 452, 5480, 281, 264, 7586, 13, 1044, 291, 13, 1044, 291,
+ 13, 50532], "temperature": 0.0, "avg_logprob": -0.6104109504006126, "compression_ratio":
+ 1.028169014084507, "no_speech_prob": 0.5935039520263672}]'
+---

All right, Dr. Podcast, and here I have Alessandro Benedetti with me, his second time on the podcast actually, and at exactly the same place we recorded two years ago. I remember, at Berlin Buzzwords. Yeah, we were here. Yeah, I guess it was 2022. It was.
It was, by the way, a lot noisier, if you remember, than now, but it was the closing day, with a lot of people around. Yeah, but I think it's almost the end of the day as well here, the first day of the conference, and yeah, I wanted to chat with you.
How do you like the conference so far? So it has been a great conference so far. We've been seeing many talks about language model integration with search. So that's the biggest new trend.
Vector-based search is still quite a strong topic, and in general there are also discussions around testing, evaluation and explainability of vector-based search, and of language models in general. And my talk was about hybrid search. Hybrid search.
Yeah, so you work a lot on Solr, right? That's your kind of playground, and that's where you integrate things. But also, I heard that the guys at Reddit are using the work that you've been doing in Solr. So that's amazing.
Tell me a bit more about what hybrid search is, right? How do you see it? What's the value? And basically, maybe, what are the challenges that you needed to solve, and that you still see, related to hybrid search?
So the first point, and the reason I decided to start working a little bit more on hybrid search and contributing the reciprocal rank fusion to Solr, is the limitations of vector-based search.
So vector-based search of course introduces the ability of closing the semantic gap between queries and documents, but with some limitations, right? So explainability, for example, is an aspect I care a lot about, and it's just very difficult to explain vector-based search results.
Yeah, we have high-dimensional... so many, many dimensions in the vectors, and humans are not really good at managing many dimensions. We live in a three-dimensional world, and it's even difficult for us to understand, like, a four-dimensional one. Yeah, and then we have many elements in those vectors.
So each feature in the vector doesn't have a meaning for humans. You have, like, 768 dimensions in your vectors, and there's no single dimension that means something semantic. It's just the output of some machine learning model, but we can't interpret what it is.
And we can't interpret what would happen if that feature went higher or lower. I mean, does a higher value for that feature mean higher relevance or not? You can't really do that with vector-based search.
So these kinds of problems, yeah, start to have an impact, right?
So you have clients using vector-based search. They are happy, and then they are not, and they want an explanation, for example, yeah, of what happens. Yeah. And another limitation is keyword-based matching. So while vector-based search tries to solve the vocabulary mismatch problem —
so, if you have terms in your vocabulary that are different from the vocabulary used for queries — yeah, at the same time, users are used to having keyword-matching documents in their response.
So when you don't provide keyword-matching documents in the response, there are going to be problems and questions. Yeah. Why do I see this now? And why don't I see, for example, this title? Oh yeah. So with hybrid search, the idea is to mitigate those problems.
So mix up different query result sets. Potentially, vector-based search results and traditional keyword-based search results. Yeah. Get back one result set. Yeah. Let's try to combine both worlds. Interesting.
And if we step forward from this, let's say we deployed hybrid search, so now it basically takes some documents from keyword hits and then others from the vector side. You still get those documents that do not have keyword matches, right, from the vector space.
Do you know, or maybe have you employed, some ways of explaining to the user why they see them? So that's an interesting point, actually. There was a discussion recently about how we can better explain vector-based search. So we mentioned already all the problems. Beyond that, what can we do better?
So there are other approaches beyond pure dense vector-based search, such as learned sparse retrieval, for example, where you learn query or document expansion term candidates based on learned models. So based on the probability, you expand your queries with additional terms.
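The learned expansion idea above can be sketched in a few lines. This is an illustrative toy, not a real retrieval model: the expansion table is invented for the example, where a real system would obtain these weighted terms from a learned expander (e.g. a SPLADE-style model).

```python
# Toy sketch of learned sparse retrieval's query expansion step.
# LEARNED_EXPANSIONS is a made-up stand-in for a learned model's output:
# for each query term, a set of weighted, human-readable expansion terms.
LEARNED_EXPANSIONS = {
    "car": {"car": 1.0, "vehicle": 0.8, "automobile": 0.7},
    "cheap": {"cheap": 1.0, "affordable": 0.6, "budget": 0.5},
}

def expand_query(query_terms):
    """Expand a keyword query into a weighted sparse term set."""
    expanded = {}
    for term in query_terms:
        candidates = LEARNED_EXPANSIONS.get(term, {term: 1.0})
        for candidate, weight in candidates.items():
            # Keep the strongest weight if a term is produced twice.
            expanded[candidate] = max(expanded.get(candidate, 0.0), weight)
    return expanded

weights = expand_query(["cheap", "car"])
# Unlike a 768-dimensional dense vector, every entry here is a readable
# term with a weight, so a wrong expansion is easy to spot and debug.
```

That readability is exactly the "first layer of explainability" mentioned above: a bad result can be traced back to a bad expansion term rather than to an opaque vector.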
So that's a little bit more explainable, because at least you get back from the machine learning model alternative terms for the queries and the documents. Yeah. It's still only a first layer of explainability. So you have, like, additional concepts, so it's easier to understand.
Still, you have a probability assigned to each pair. So if it goes wrong, you may end up with unreasonable terms. So not perfect. A little bit better, maybe a little bit more explainable.
And then there are approaches such as ColBERT, where you encode your sentence not into just one vector but into a sequence of vectors. So multiple vectors, roughly one per token. And you do the same for your documents.
And then you basically return results based on the similarity between not just a single query vector and the document vector, but multiple query vectors. So each query vector, which is meant to correspond roughly to a term, is matched with the terms in the document.
So you may be able to highlight the terms in the document that are close to the terms in the query. Yeah. Also in this case, of course, it's just a first layer of explainability, because if this goes wrong, again, you have sequences of vectors.
So you can get, like, a sort of heat map of which query terms match, more or less, the document ones, but still not perfect. Yeah, sure.
Of course, it's kind of like, maybe, experimentation is required, right? What works for you? What is the end product? But maybe one question, for me as a user, right: let's say I'm using Solr and you offer hybrid search now.
Are you already offering, or will you consider at some point offering, the capability to ingrain what you just said into, let's say, the highlighter in Solr, so that it will actually build me the snippet regardless of the source of that document, whether it's keyword or vector?
That's a very interesting question, because there are, in my opinion, two layers of explainability.
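The late-interaction scoring described above can be sketched as follows. This is a toy illustration, not ColBERT itself: the two-dimensional "token embeddings" are made up for the example, and real models use hundreds of dimensions per token.

```python
# Sketch of late interaction (MaxSim): query and document are each a
# sequence of per-token vectors; relevance is the sum, over query tokens,
# of the best-matching document token's dot product.

def maxsim_score(query_vecs, doc_vecs):
    score = 0.0
    matches = []  # (query_index, best_doc_index): basis for a term "heat map"
    for qi, q in enumerate(query_vecs):
        sims = [sum(a * b for a, b in zip(q, d)) for d in doc_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((qi, best))
        score += sims[best]
    return score, matches

query = [(1.0, 0.0), (0.0, 1.0)]            # two query "tokens"
doc = [(0.9, 0.1), (0.1, 0.8), (0.5, 0.5)]  # three document "tokens"
score, matches = maxsim_score(query, doc)
# `matches` records which document token each query token aligned with,
# which is the first layer of explainability discussed above.
```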
For engineers, we need to work on the engine and change the ranking, change the matching; and then there is the users' side. Yeah.
So, for a user that just wants to know why, for example, something is there — for users' explainability, actually, at my company we designed and developed a highlighter.
We call it the neural highlighter. It takes in input a large language model, and in the response it will highlight a snippet for each result document, based not on, let's say, a keyword match, but on the query, powered by the large language model. Yeah.
So in this way, you will be able to highlight the parts of the original document that are semantically close to the query. Can you say the name again? What was the name? It's called the neural highlighter. Neural, neural highlighter. So it's your proprietary product right now? Yes.
It's closed source, right? Yes. Right now, yes. We may contribute it to open source in the future, I don't know. Right now it's one of our products. But I mean, is it offered as a standalone component? It's a plugin. It's a plugin. So you install it as a plugin.
It's a plugin. That's the value prop as well, right? It doesn't always need to be open. It's something I can plug in. Exactly. It takes in input the large language model, yeah, and it enriches the response. So, right.
So that will help explain results to the users, right? And you also mentioned — thanks for making this distinction — that there is also explainability for the engineers, which is also important.
So can you explain a bit what you mean? Explainability for the engineers, because I care about it a lot, of course. I used to be an engineer full time. And I need to know, like, how to do it, how to tweak something.
But also, can I explain to myself that what I tweaked is actually the right thing, right? So, you know, kind of the process of engineering it. Yes.
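A generic version of that semantic-highlighting idea (not the proprietary plugin discussed above, whose internals are not public) could look like this sketch: embed the query and each candidate sentence of a result document, then surface the sentence closest to the query as the highlight. The `toy` embedding table is invented for the example; a real system would use a sentence-embedding model.

```python
# Generic sketch of semantic snippet selection with cosine similarity.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def best_snippet(embed, query, sentences):
    """Return the document sentence semantically closest to the query."""
    q = embed(query)
    return max(sentences, key=lambda s: cosine(q, embed(s)))

# Made-up 2-d "embeddings" keyed by text, standing in for a real model.
toy = {
    "how to fuse rankings": (1.0, 0.1),
    "RRF combines ranked lists.": (0.9, 0.2),
    "The weather was nice.": (0.0, 1.0),
}
snippet = best_snippet(toy.get, "how to fuse rankings",
                       ["RRF combines ranked lists.", "The weather was nice."])
```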
So in Solr, for example, there is a debug component that gives the engineer the ability to expand the response with information about how the score was calculated. So in Solr, when you have a query and you have a result, a score is calculated for that result for that query.
And this score will impact the ranking. So, descending order, literally, right, from the highest score to the lowest. And normally, this score is explained by showing the mathematical calculations behind it: the term frequencies, yeah,
the length of the document field, the average length of the field, the document frequency — how rare a term was, for example — and so on and so forth. So, long mathematical expressions that are readable by the user, and you can understand: okay, I was aiming for this field to impact the score; let's see if it really impacts the score. With vector-based search right now, the only explanation that you get, from an engineer's perspective, is literally that it was within the top K. So this document was within the top K, with a cosine similarity between the query vector and the document vector.
That's not really helpful. It's just confirming what you know already, right? I mean, yeah, it's in the top K, it was returned.
So one of the ideas I was thinking of — although it's actually quite far from implementation — is to explain the reason a document is in the result set by showing examples. So, the language models used to generate the embeddings were fine-tuned on sentence similarity.
So this means there were pairs of sentences with similar meaning and pairs of sentences with dissimilar meaning, and from these the model learned how to encode. So I think it could be very interesting, to explain the reason a document is being returned.
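The kind of lexical score breakdown described above can be sketched with a BM25-style term score whose factors are all named and inspectable. This is a hedged approximation of the quantities a Lucene/Solr explain shows, not the actual debug output; `k1=1.2` and `b=0.75` are the common Lucene defaults.

```python
import math

# Sketch: one term's BM25-style contribution, decomposed into the
# components an engineer can actually read and reason about.

def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    # idf: how rare the term is across the corpus.
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # tf_norm: term frequency, dampened by the field length relative
    # to the average field length.
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return {"idf": idf, "tf_norm": tf_norm, "score": idf * tf_norm}

parts = bm25_term_score(tf=3, doc_len=120, avg_doc_len=100,
                        df=10, num_docs=10_000)
# Every factor is inspectable, which is exactly what a bare
# "cosine similarity put it in the top K" cannot offer.
```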
Because with vector search, you could show, like, a snippet and say: there are these similar pairs of sentences, and this is the similar sentence — in a way that the engineer can then go back and realize: okay, let's take a look at the original training data. For example, did I cover the example well, or maybe are they wrong?
So I see, like: oh, these two sentences are shown as similar, but they are not.
It's just an idea, you know, to study. Wow, that's very interesting, because as you said, it's very limiting today to just know that a geometric search happened and this is the result. Yeah, that's amazing.
I mean, it's really interesting that with this work you are not just taking something and applying it to an implementation, right? Like, I mean, implementing a plugin; you actually go into the space of exploring things, because it's not like everything is done, right?
And maybe in some companies it has been done, but they are not open sourcing it, right? So you need to do the research, the search for the solution.
That's very interesting. So in terms of functionality today, hybrid search is already available in Solr, right? Is a big portion of it already released? Yeah, so there are different ways hybrid search can be performed in Solr. So right now, with Solr 9.6,
there are ways of combining results from lexical search and vector-based search and then re-ranking them, for example, using learning to rank. So you give, like, different weights to different factors — for example, the vector-based score or the traditional score. Yeah.
What is coming next, which was the topic of my talk, is the reciprocal rank fusion. So that's coming with Solr 9.7. So I guess in a couple of months we're going to release it. Nice. And that is a way of doing hybrid search that is independent of the score, and based just on the ranking of the results.
So you mix the different rank lists. Yeah, they can be two, maybe more. Yeah.
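A minimal sketch of the score-based combination described for Solr 9.6 — assumed mechanics for illustration, not Solr's actual code: normalize each retriever's raw scores into a comparable range, then blend them with a tunable weight.

```python
# Sketch: weighted blending of lexical and vector scores after
# min-max normalization, since raw BM25 and cosine scores live on
# completely different scales.

def blend_scores(lexical, vector, weight=0.5):
    """lexical/vector: dicts of doc_id -> raw score from each retriever."""
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on ties
        return {d: (s - lo) / span for d, s in scores.items()}

    lex, vec = minmax(lexical), minmax(vector)
    docs = set(lex) | set(vec)
    blended = {d: weight * lex.get(d, 0.0) + (1 - weight) * vec.get(d, 0.0)
               for d in docs}
    return sorted(blended, key=blended.get, reverse=True)

ranking = blend_scores({"d1": 12.3, "d2": 7.1},   # e.g. BM25 scores
                       {"d2": 0.93, "d3": 0.88})  # e.g. cosine scores
# d3 scores lowest in both normalized lists, so it lands last.
```

The `weight` parameter is the tunable knob mentioned above: how much the lexical side counts versus the vector side.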
More are supported, not just two. And then you combine them based on the position of the documents in the different rank lists. Yeah.
The higher the position in the ranking, the better the probability that the document is going to end up higher in the final result set. Yeah, yeah.
Actually — maybe you can help me understand this — when we were trying reciprocal rank fusion with another search engine, we actually found an implementation. So we could kind of plug it in, in Python code, very quickly.
But then, when we looked at the code, one of my engineers said: this looks like round-robin, essentially, as an algorithm. There is nothing particularly peculiar about it or tunable about it — which probably is not true, but I'm not sure. What's your take on this?
So it felt like you have two lists and you basically just, starting from the top, take the documents in order, you know, and you blend the lists, right?
But what if you wanted to pay attention to some signals from these documents, you know, based on their features, or maybe you wanted to introduce a logic on top of this, right?
So you want to say — let's say in the context of geographic search — in the top three results I want to see a super popular POI, and I know what popular means. The second result could be, I don't know, the closest one — or maybe vice versa, it depends. And so on and so forth. So I have some kind of rules embedded, and then maybe it stops being RRF already, right? But still, taking a step backwards:
did I explain it right? Or are there some parameters in RRF that I could be tuning a bit to get a different outcome? There's not much to tune, to be honest. So you got it right.
But it's not really round-robin, because what you do is basically give a new score to the documents, based on all the rankings of that document in the result lists.
So it's not like interleaving, where, for example, you go document by document: you pick from one ranked list, then from the other list you pick another, and then you choose which one to take next.
It's more like: let's see, for this document, how many times it appears in the ranked lists and where it appears in the ranked lists, and let's build this new score. So the more you are in the top positions, the more likely you end up in the top positions of the final result list.
Given that, you're absolutely right that if you want to build more advanced ranking systems, potentially with different phases, different steps, it makes complete sense to maybe build your initial candidate set with reciprocal rank fusion.
And then you re-rank, for example, using learning to rank and many features, where you can have, again, maybe the vector distance as one feature, the similarity with one field from a domain-expert perspective, popularity, geographical distance and many other features.
And then you apply learning, for example — you train a machine learning model to identify these weights. It makes perfect sense, in my opinion.
I believe reciprocal rank fusion and, in general, let's call them simple approaches, work for hybrid search, because if you take a look at the algorithm of reciprocal rank fusion, it's not complicated — it's actually an open algorithm from 2009.
But this opens the door, in my opinion, to building your initial candidate set, and then potentially, yeah, you re-rank it: okay, yeah, this is not random, it's okay, there are already some reasons for it to be there.
And of course, in any case, it goes without saying that we do need some method of combining these completely disparate spaces of scores, right? Into one.
And that could actually even be different search engines operating on the keyword level, because they output different scores, right?
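The position-based fusion described above can be sketched in a few lines. This mirrors the published 2009 formulation (each document's fused score is a sum of 1/(k + rank) over the lists it appears in, with k = 60 in the paper), not Solr's specific implementation.

```python
# Minimal sketch of Reciprocal Rank Fusion (RRF): documents get a new
# score based only on their rank positions in the input lists, never on
# the original (incomparable) raw scores.

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse any number of ranked lists of document ids into one list."""
    scores = {}
    for ranking in ranked_lists:
        for position, doc_id in enumerate(ranking, start=1):
            # The higher a document ranks anywhere, and the more lists
            # it appears in, the higher its fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]  # e.g. BM25 results
vector = ["d3", "d1", "d4"]   # e.g. kNN vector results
fused = reciprocal_rank_fusion([lexical, vector])
# d1 and d3 appear high in both lists, so they outrank d2 and d4.
```

Note there is indeed almost nothing to tune beyond `k`, which only damps how much the very top ranks dominate — consistent with the "simple approach" framing above.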
So maybe even, potentially, I'm thinking, separate shards of your data, which also have their own IDF, right? Local IDF. So, yeah, incomparable, right? Awesome.
Also — not related to this, a completely different topic — there was also a keynote today about, sort of, what open source means, right?
And without, of course, criticizing, some companies were mentioned in this context which claim that their LLMs are open source; but when you look at the licenses, they are restrictive — they actually do not allow you to use the models independently, right, and kind of go and serve your customers.
But you also mentioned, just before we started recording, that there are also cases where a model can be open source, more or less abiding by the principles of the open source spirit, but then the data it was trained on is not open source, or the methods that were applied to the data are not open source, right?
So to me, it sounds so important to keep declaring what open source is, what the principles are, right?
And maybe this keynote also shed some light. But it seems like this topic is also very close to you, and you are contributing a lot in open source, you are a committer. Can you share your vision on what open source is, and what the implications are for how this field should be developing?
I think it's a huge problem, especially because nowadays open washing — which is the practice of associating openness with something that potentially is not really fully open — is happening a lot.
Because open source is cool; open source looks like a good habit, so you're the good guys if you do open source.
So, as you said, we are not going to name companies or associations that claim, for example, that their large language models are open source. But large language models are complex systems. The output, the final weights of the neural network, is just one little part of the entire picture.
Those large language models are normally pre-trained on huge quantities of data with a pre-training algorithm.
So the pre-training data and the pre-training code — are they open? Are they not? I mean, many times they're not; not only is the data not open, it's not even known what kind of data it is, just generic internet-scale data.
What about the fine-tuning, then?
So once you get the pre-training, which is the unsupervised part, where you just explore the web — that's pretty simple — then you want to fine-tune for specific tasks, like sentence similarity or instruction following or, I don't know, summarization, any kind of task you want to use the model for. And to do that, you normally use an additional training data set that is specifically designed for that fine-tuning task.
And again, is that open? So do you document it and then make it available? And the code for fine-tuning, do you make it available?
The output of the pre-training, do you make it available separately from the output of the fine-tuning? The documentation, any data that explains what was done and why you fine-tuned it?
So I've read an interesting paper — that I guess we can share as a comment — from a university; they were comparing all these aspects for LLMs, and how famous, supposedly open source LLMs actually behaved on each of these columns. And it would be surprising how small a percentage of these, you know, big players are actually open sourcing everything.
So it's not just the license — which, as you said correctly, is sometimes limiting — but literally which components are shared; sometimes it's just the final weights. And is that helpful?
I mean, in open source you want to cooperate, you want to improve the code; like with normal code, you have access to everything, and you can improve it, you can help the community.
If you just access the weights, you can use them, but can you, for example, improve them? Can you understand if it's fair? And the data that was in there? Yes. Yeah, it's really difficult. Yeah.
And so, where do you think these discussions should start — or maybe they're already ongoing? Are you part of some discussion? And how does it impact business, and maybe research, right? Because there are different sides of this coin.
Many of these things emerge in the academia space, but then they move to create value on the business side — though it could also be vice versa.
So what do you think? How are we going to address this?
So I know that the Open Source Initiative, which is the group of people that drafted the original open source definition — so they basically think about ways of defining open source — they are working on a definition of open source for AI models.
We are going to see, hopefully soon enough, a definition of what it means for a model to be open source. And that is going to be great, because at that point it's not a matter of, like, I believe it's open, so I claim it's open. Either it's open or it's not.
And everything is covered by a license that is going to be open or not.
In terms of cross-pollination between academia and the industry, I think, probably — I mean, in this period it's so important to see cross-pollination, because there are many models that, for example, are designed and contributed by academia that can then be used by organizations, and the other way around, because of course there is a lot of money involved in training — pre-training and fine-tuning — of the models.
So, I mean, only a few organizations are actually able to do this. So they should try, ideally, to make it as open as possible, in a way that universities can then focus on small components and potentially help some more. Yeah, yeah.
I mean, you know, pre-training at internet scale is incredibly expensive, from the energy perspective especially. So I hope, you know, we reach the point where everything is open enough for smaller organizations and academic organizations to contribute as well.
Yeah, it's very interesting, because there is always going to be this kind of play between: okay, this big company has all the servers they need to train the model.
So they can also decide how they will do it and not disclose it. But then maybe the question that we need to be disputing, and sort of discussing, is that they still don't have all the data to train on, right? Potentially.
Like, there have been some cases mentioned in the keynote, you know, when some company — we will not name the company — goes and trains on some articles of a famous publishing house, right? And now that publishing house is unhappy, because they say: you took our articles without us knowing.
Now it kind of evokes this question: okay, when I was reading this article, there was probably some license which said you can do this, but not this — maybe there is something hidden, right? But only now have we started discussing these things, right?
And that's a very interesting topic. But do you think that, you know, when companies have, let's say, open sourced the model and they have checked everything on that manifesto, or on that contract —
do you think there will still be a need for some tooling, maybe, or some process, to kind of continuously maintain the status of this model as open source? Because it may well happen that, you know, either the company or the research institute goes and accidentally uses some data that no longer complies with that contract, right?
First of all, do you think such a thing exists, let's say, for Apache Solr? Or how do you make sure that no one takes a library that is not under the license it has to be, plugs it in, and we do a release of, let's say, Solr?
I think there is some checker, right? Yeah. So this applies, to a certain extent, to code as well, right? So you are a contributor.
When you sign, basically, the, let's say, contract with the Apache Software Foundation, you are sure that any kind of contribution you do is your own.
+So there's no code, for example, that was copyrighted by someone else and that sort of thing. It's genuinely created by you. So to a certain extent, a similar thing could potentially apply to training data.
+I think it's probably a little bit less likely that, in an existing large language model, for example, someone would contribute a little more data. I mean, it's more likely that maybe you would change the code a little bit, for example the code responsible for fine-tuning and that sort of thing.
+But still, I think there will be this layer of responsibility that would weigh on the shoulders of the contributors, because of course you can't have control over every single individual.
+And you need to have this sort of layer where the non-profit, the open source project, protects itself. Yeah.
+Because I can imagine that, again, it's probably putting it to extremes, but there could eventually be some tooling where you take the model and you introspect its behavior, and you can make a guess about which data it was trained on. Potentially. Or at least find some similarities with what it produces.
+I mean, there have been some attacks, so to say, right? So you can actually probe the model and see what it outputs, right? You can even break some models sometimes. That's true. So that's more like on the hacker side, or the bad hacker side. But I mean, there probably will be tooling.
+Do you think it's possible that there will be tooling kind of checking the model and making some hypotheses? And as you said, once caught, that organization will kind of lose its trust, right? So obviously everyone wants to be accountable and so on.
+But then there could be a flip side of that, that you can kind of accidentally assume that they did it, but that's not true, right?
Now that becomes a very hard debate, right? So it's an area which I think deserves exploration and study.
+And I believe that being accountable for the data you use and disclosing it, of course, is the first step. But then also validating that companies tell the truth, for example, I think is going to be important to build trust and to make sure that what you declare is actually what happens.
+Because we never know. It's very interesting. Was there some other topic you wanted to cover? I mean, are you also working on RAG or anything like that, or evaluating LLM-based search? We are working on many different integrations with LLM models. Retrieval-augmented generation is one of them.
+Natural language parsing, for example, is another, so moving from natural language to structured queries. Yeah. Probably the last topic we can discuss is prompt engineering. Yeah.
+Briefly, because yes, this naming convention is something that really hurts me, because it's not engineering at all, in my opinion, because you're just attempting to communicate with something and you don't know what to expect.
+Because I've seen, I mean, I've seen tools today with people saying, you write this prompt and you hope you get this response. Yeah. You type this prompt and you ask, please give me the response. This is, to me, something that is not scientific at all. It's not scientific.
+It's not science. You can't just be comfortable with hoping. So there's, in my opinion, just to keep it short, a big margin of improvement there, to interact with LLMs in a more programmatic way.
+I want to specify a task with rules and get back a response that satisfies those rules. If I want to select an item from a list, I want it to select an item from the list. I want the LLM to do no more than just select the item from the list.
+Not 80% of the time select the item from the list and 20% of the time select the item and give me an explanation. I just want the item from the list. Yeah.
+And right now, I've seen, and we will see at the conference, because I've seen in the agenda there are many talks about trying to solve this problem. But right now, what I've seen as a possible solution is just that you post-validate the response and you go back.
+Like, okay, yeah, I asked for a specific JSON in the response. There are mistakes. It's not parsable JSON. So I go back to the LLM and I say, this is not parsable JSON. Can you fix it? And again, and again, and again, which is not really something you want to take to production.
+So in short, in my opinion, using LLM models programmatically right now is good for proof of concepts. But would I bring these sorts of approaches to production out of the box? I wouldn't, because I want to bring something that is deterministic. Yeah.
+Something that does what I want it to do 100% of the time. And I don't want to hope it works. I want to make it work.
+Yeah, and it's a very interesting topic, by the way, but I also see some level of contradiction between a non-deterministic, hallucinating model, essentially hallucinating by design because it keeps predicting the next terms, right?
+And some level of determinism, as you just explained, right? But I guess, at the same time, someone might say that our life is not that deterministic, many moving parts, and we still find a way to, I don't know, live in it and then build something, right? Yeah, there's something that moves.
+I think, you know, we are experiencing, in my opinion, the first steps in these new worlds of big AI models. So I think it's fair. They were born to auto-complete text, to generate text. And now we are trying to use them to do tasks. Yes. Which is okay.
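The post-validate-and-retry pattern described in this exchange can be sketched in a few lines. This is only an illustration of the loop, not a production recipe; `call_llm` is a hypothetical stand-in for whatever model client is used, and the retry budget keeps it from looping forever:

```python
import json

def generate_json(call_llm, prompt, max_retries=3):
    """Ask a model for JSON and re-prompt with the parse error until it parses.

    Sketch of the validate-and-go-back loop discussed above: `call_llm` is a
    hypothetical callable mapping a prompt string to a reply string.
    """
    reply = call_llm(prompt)
    for _ in range(max_retries):
        try:
            return json.loads(reply)  # success: hand back the parsed object
        except json.JSONDecodeError as err:
            # Feed the failure back to the model ("this is not parsable JSON,
            # can you fix it?") and try again.
            reply = call_llm(
                f"{prompt}\n\nYour previous answer was not valid JSON "
                f"({err}). Reply with valid JSON only:\n{reply}"
            )
    raise ValueError("model never produced parsable JSON")

# Tiny fake model for demonstration: fails once, then returns valid JSON.
_replies = iter(['{"item": "a",', '{"item": "a"}'])
result = generate_json(lambda p: next(_replies), "Pick an item")
```

As the conversation points out, the loop is non-deterministic: nothing guarantees the retry converges, which is exactly the objection to shipping it as-is.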
+We as humans use language to do tasks. Yeah. So I just guess, and then we end up programming computers, right? So let's shape it a little bit more so that it would be programmable. Yeah. I mean, doesn't it remind you a bit.
+I haven't explored it, but to mention DSPy, the package, you've probably heard about it, right? Which replaces prompt engineering with a more programmable sort of way of doing it.
+I still don't know how it works, but I know that some of the engineers in my team applied it quite successfully to generate some synthetic queries. So that was very interesting.
+Have you played with it? Do you know it? So my team, we've been playing with it for one of the proofs of concept for doing natural language processing, for structured Solr queries. And I think it's a nice first step.
+It still gives you an indirection between the prompt and the way you write the prompt. So you have classes that mimic a programming language, but it then ends up as a prompt. Yeah. I see. You are not sure that you will get what you want. But it's a first attempt. Yeah. Yeah.
+I think it's okay. I mean, we will improve that. It feels like maybe a first baby step, in a way that it doesn't yet work to the state that you mentioned. Yeah. It's not a complete solution, but it's great. So what's next that you're working on? That you want to disclose? Yeah.
+So first of all, I want to bring and merge hybrid search with reciprocal rank fusion into Solr, which is coming in 9.7. So I'm very close to merging that. Awesome.
+We got some funding from the European Union to work on Solr. So that's great.
+So we're going to be able to contribute more vector-based search capabilities, better integrations with learning to rank, better integration with inference endpoints to make it a little bit more transparent. That's it.
+And still in the works, multi-valued support for vector-based search in Solr.
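Reciprocal rank fusion, the hybrid-search combination method mentioned here, is simple enough to sketch: each document scores the sum of 1/(k + rank) over the ranked lists it appears in. This is a generic illustration, not Solr's implementation; k=60 is the constant commonly used in the literature, and the document IDs are made up:

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists with reciprocal rank fusion.

    Documents ranked highly by multiple retrievers accumulate the largest
    scores and float to the top of the fused list.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: one lexical (BM25) list, one vector (kNN) list.
lexical = ["d1", "d2", "d3"]
vector = ["d3", "d1", "d4"]
fused = rrf([lexical, vector])  # d1 and d3 appear in both lists, so they lead
```

The appeal for hybrid search is that RRF only needs ranks, not scores, so it sidesteps calibrating BM25 scores against vector similarities.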
And there are some pieces in Lucene to speed up and optimize vector-based search that are not yet in Solr. And that's among my top priorities. So this is it, in short. This is fantastic.
+And of course, it's all open source, and you know, yeah, anyone can join, but maybe we can also make a call-out and say that, I mean, everyone that wants to contribute, you know, the more the merrier.
+Yeah, I actually enjoy, even though I don't do Solr or Lucene today, that I still read the mailing lists. And for the most part, I'm reading the Lucene one. So sometimes I see your discussions as well, where you say, actually, by the way, I'm working on this hybrid search.
+I did this and this. So maybe it will influence you. And I also love the culture where you do not really enforce or impose your solution. You just say, just for you to know, maybe it will be useful. And someone says, yeah, awesome. I especially loved that discussion.
+I forgot the particular topic, but I remember it was a recent one. So keep up your great work. It's always a pleasure to talk to you, and it looks like it's a tradition that we started, meeting in the same hall at Berlin Buzzwords.
+Yeah, so every two years. We see a lot of people. Yeah, there are a lot of people. So fantastic. Thank you so much, Alessandro. And enjoy the rest of the conference. Thank you. Thank you. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md b/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md new file mode 100644 index 0000000..c2736a8 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-doug-turnbull-learning-in-public.md @@ -0,0 +1,1675 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=fIPC_xzqJ0o

00:00 + Intro

00:30 Greets for Doug

01:46 Apache Solr and stuff

03:08 + Hello LTR project

04:42 Secret sauce of Doug''s continuous blogging

08:50 + SearchArray

13:22 Running complex ML experiments

17:29 Efficient search + orgs

22:58 Writing a book on search and AI

Show notes:

- + Doug''s talk on Learning To Rank at Reddit delivered at the Berlin Buzzwords 2024 + conference: https://www.youtube.com/watch?v=gUtF1gyHsSM

- + Hello LTR: https://github.com/o19s/hello-ltr

- + Lexical search for pandas with SearchArray: https://github.com/softwaredoug/searcharray

- + https://softwaredoug.com/

- + What AI Engineers Should Know about Search: https://softwaredoug.com/blog/2024/06/25/what-ai-engineers-need-to-know-search

- + AI Powered Search: https://www.manning.com/books/ai-powered-search

- + Quepid: https://github.com/o19s/quepid

- + Branching out in your ML / search experiments: https://dvc.org/doc/use-cases

- + Doug on Twitter: https://x.com/softwaredoug

- + Doug on LinkedIn: https://www.linkedin.com/in/softwaredoug/

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240718_110721_6a250f534a47b913cfe9ab7513e63b01.png +pub_date: Thu, 18 Jul 2024 11:10:42 GMT +title: Berlin Buzzwords 2024 - Doug Turnbull - Learning in Public +url: https://rss.com/podcasts/vector-podcast/1572886 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 13.52, "text": " A", + "tokens": [50364, 316, 51040], "temperature": 1.0, "avg_logprob": -4.316565619574653, + "compression_ratio": 0.5294117647058824, "no_speech_prob": 0.48129671812057495}, + {"id": 1, "seek": 0, "start": 13.52, "end": 20.14, "text": " timeick", "tokens": + [51040, 565, 618, 51371], "temperature": 1.0, "avg_logprob": -4.316565619574653, + "compression_ratio": 0.5294117647058824, "no_speech_prob": 0.48129671812057495}, + {"id": 2, "seek": 2014, "start": 20.14, "end": 30.14, "text": " Cool.", "tokens": + [50364, 8561, 13, 50864], "temperature": 0.0, "avg_logprob": -0.3635210477388822, + "compression_ratio": 1.4, "no_speech_prob": 0.37665465474128723}, {"id": 3, "seek": + 2014, "start": 30.14, "end": 34.14, "text": " Yeah.", "tokens": [50864, 865, 13, + 51064], "temperature": 0.0, "avg_logprob": -0.3635210477388822, "compression_ratio": + 1.4, "no_speech_prob": 0.37665465474128723}, {"id": 4, "seek": 2014, "start": 34.14, + "end": 42.14, "text": " Hello, how are you? Hi, Doug. It''s great meeting you at + Berlin Boswords. Yeah, I can see you. 
Yeah, great to see you.", "tokens": [51064, + 2425, 11, 577, 366, 291, 30, 2421, 11, 12742, 13, 467, 311, 869, 3440, 291, 412, + 13848, 22264, 86, 5703, 13, 865, 11, 286, 393, 536, 291, 13, 865, 11, 869, 281, + 536, 291, 13, 51464], "temperature": 0.0, "avg_logprob": -0.3635210477388822, "compression_ratio": + 1.4, "no_speech_prob": 0.37665465474128723}, {"id": 5, "seek": 2014, "start": 42.14, + "end": 46.14, "text": " It''s your second time on the podcast and yeah, excited + to be back.", "tokens": [51464, 467, 311, 428, 1150, 565, 322, 264, 7367, 293, 1338, + 11, 2919, 281, 312, 646, 13, 51664], "temperature": 0.0, "avg_logprob": -0.3635210477388822, + "compression_ratio": 1.4, "no_speech_prob": 0.37665465474128723}, {"id": 6, "seek": + 4614, "start": 46.14, "end": 51.14, "text": " Yeah, awesome. I think it''s like + two years or only one. Yeah, I think so.", "tokens": [50364, 865, 11, 3476, 13, + 286, 519, 309, 311, 411, 732, 924, 420, 787, 472, 13, 865, 11, 286, 519, 370, 13, + 50614], "temperature": 0.0, "avg_logprob": -0.2507284849117964, "compression_ratio": + 1.502824858757062, "no_speech_prob": 0.08665119856595993}, {"id": 7, "seek": 4614, + "start": 51.14, "end": 61.14, "text": " But how have you been? I wasn''t going. + I''ve been great. Just been doing traditional learning to rank over at Reddit.", + "tokens": [50614, 583, 577, 362, 291, 668, 30, 286, 2067, 380, 516, 13, 286, 600, + 668, 869, 13, 1449, 668, 884, 5164, 2539, 281, 6181, 670, 412, 32210, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.2507284849117964, "compression_ratio": 1.502824858757062, + "no_speech_prob": 0.08665119856595993}, {"id": 8, "seek": 4614, "start": 61.14, + "end": 68.14, "text": " And it''s been a lot of fun. A lot of it''s just meat and + potato stuff. 
Yeah.", "tokens": [51114, 400, 309, 311, 668, 257, 688, 295, 1019, + 13, 316, 688, 295, 309, 311, 445, 4615, 293, 7445, 1507, 13, 865, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.2507284849117964, "compression_ratio": 1.502824858757062, + "no_speech_prob": 0.08665119856595993}, {"id": 9, "seek": 6814, "start": 68.14, + "end": 73.14, "text": " The stuff that I think is really important like your training + data with search.", "tokens": [50364, 440, 1507, 300, 286, 519, 307, 534, 1021, + 411, 428, 3097, 1412, 365, 3164, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.16343793869018555, "compression_ratio": 1.7768595041322315, "no_speech_prob": + 0.10796502977609634}, {"id": 10, "seek": 6814, "start": 73.14, "end": 78.14, "text": + " And you''re getting your features right and that sort of thing.", "tokens": [50614, + 400, 291, 434, 1242, 428, 4122, 558, 293, 300, 1333, 295, 551, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.16343793869018555, "compression_ratio": 1.7768595041322315, + "no_speech_prob": 0.10796502977609634}, {"id": 11, "seek": 6814, "start": 78.14, + "end": 84.14, "text": " Not actually too much vector search lately. 
So kind of.", + "tokens": [50864, 1726, 767, 886, 709, 8062, 3164, 12881, 13, 407, 733, 295, 13, + 51164], "temperature": 0.0, "avg_logprob": -0.16343793869018555, "compression_ratio": + 1.7768595041322315, "no_speech_prob": 0.10796502977609634}, {"id": 12, "seek": 6814, + "start": 84.14, "end": 95.14, "text": " Having a path in the ranking model space + and I still think that''s really important for if you''re building a rag app or + if you''re building a lot of these things, a lot of people are sort of discovering + this through the vector route.", "tokens": [51164, 389, 6152, 257, 3100, 294, 264, + 17833, 2316, 1901, 293, 286, 920, 519, 300, 311, 534, 1021, 337, 498, 291, 434, + 2390, 257, 17539, 724, 420, 498, 291, 434, 2390, 257, 688, 295, 613, 721, 11, 257, + 688, 295, 561, 366, 1333, 295, 24773, 341, 807, 264, 8062, 7955, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.16343793869018555, "compression_ratio": 1.7768595041322315, + "no_speech_prob": 0.10796502977609634}, {"id": 13, "seek": 9514, "start": 95.14, + "end": 100.14, "text": " They''re like realizing there''s a small other side of + information retrieval. Yeah, that''s important.", "tokens": [50364, 814, 434, 411, + 16734, 456, 311, 257, 1359, 661, 1252, 295, 1589, 19817, 3337, 13, 865, 11, 300, + 311, 1021, 13, 50614], "temperature": 0.0, "avg_logprob": -0.20446961205284875, + "compression_ratio": 1.613899613899614, "no_speech_prob": 0.4473632574081421}, {"id": + 14, "seek": 9514, "start": 100.14, "end": 106.14, "text": " And that''s that''s + really exciting to me because I think a lot of new ideas that suffer coming in the + space. 
Yeah, yeah.", "tokens": [50614, 400, 300, 311, 300, 311, 534, 4670, 281, + 385, 570, 286, 519, 257, 688, 295, 777, 3487, 300, 9753, 1348, 294, 264, 1901, 13, + 865, 11, 1338, 13, 50914], "temperature": 0.0, "avg_logprob": -0.20446961205284875, + "compression_ratio": 1.613899613899614, "no_speech_prob": 0.4473632574081421}, {"id": + 15, "seek": 9514, "start": 106.14, "end": 110.14, "text": " Yeah, amazing talk as + well. I''m sure we''ll link it.", "tokens": [50914, 865, 11, 2243, 751, 382, 731, + 13, 286, 478, 988, 321, 603, 2113, 309, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.20446961205284875, "compression_ratio": 1.613899613899614, "no_speech_prob": + 0.4473632574081421}, {"id": 16, "seek": 9514, "start": 110.14, "end": 118.14, "text": + " What was it''s published? The one you just gave. And you also reminded me of time + as I told you, you know, of the time when I was working on solar.", "tokens": [51114, + 708, 390, 309, 311, 6572, 30, 440, 472, 291, 445, 2729, 13, 400, 291, 611, 15920, + 385, 295, 565, 382, 286, 1907, 291, 11, 291, 458, 11, 295, 264, 565, 562, 286, 390, + 1364, 322, 7936, 13, 51514], "temperature": 0.0, "avg_logprob": -0.20446961205284875, + "compression_ratio": 1.613899613899614, "no_speech_prob": 0.4473632574081421}, {"id": + 17, "seek": 11814, "start": 118.14, "end": 126.14, "text": " Starting at version + one. 
It was each to ask you which version you''re running, but then I was like, + what will it matter to me?", "tokens": [50364, 16217, 412, 3037, 472, 13, 467, 390, + 1184, 281, 1029, 291, 597, 3037, 291, 434, 2614, 11, 457, 550, 286, 390, 411, 11, + 437, 486, 309, 1871, 281, 385, 30, 50764], "temperature": 0.0, "avg_logprob": -0.2614104644111965, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.13754825294017792}, + {"id": 18, "seek": 11814, "start": 126.14, "end": 135.14, "text": " Well, it''s + we were running solar seven until recently and one of the things you didn''t talk + about in the talk was having performance problems with solar seven.", "tokens": + [50764, 1042, 11, 309, 311, 321, 645, 2614, 7936, 3407, 1826, 3938, 293, 472, 295, + 264, 721, 291, 994, 380, 751, 466, 294, 264, 751, 390, 1419, 3389, 2740, 365, 7936, + 3407, 13, 51214], "temperature": 0.0, "avg_logprob": -0.2614104644111965, "compression_ratio": + 1.6623931623931625, "no_speech_prob": 0.13754825294017792}, {"id": 19, "seek": 11814, + "start": 135.14, "end": 142.14, "text": " Yeah. And moving to solar nine fixed it. 
+ So it helped with a lot of stability and performance problems.", "tokens": [51214, + 865, 13, 400, 2684, 281, 7936, 4949, 6806, 309, 13, 407, 309, 4254, 365, 257, 688, + 295, 11826, 293, 3389, 2740, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2614104644111965, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.13754825294017792}, + {"id": 20, "seek": 14214, "start": 142.14, "end": 151.14, "text": " And that''s + just one of those things that a lot of these projects machine like not just like + learning to rank is like a lot of machine learning projects.", "tokens": [50364, + 400, 300, 311, 445, 472, 295, 729, 721, 300, 257, 688, 295, 613, 4455, 3479, 411, + 406, 445, 411, 2539, 281, 6181, 307, 411, 257, 688, 295, 3479, 2539, 4455, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.20529462640935725, "compression_ratio": 1.8372093023255813, + "no_speech_prob": 0.01549532637000084}, {"id": 21, "seek": 14214, "start": 151.14, + "end": 160.14, "text": " You what I find is especially learning to rank you''re + often like building out and scaling up infrastructure for a certain problem at the + same time you''re doing machine learning.", "tokens": [50814, 509, 437, 286, 915, + 307, 2318, 2539, 281, 6181, 291, 434, 2049, 411, 2390, 484, 293, 21589, 493, 6896, + 337, 257, 1629, 1154, 412, 264, 912, 565, 291, 434, 884, 3479, 2539, 13, 51264], + "temperature": 0.0, "avg_logprob": -0.20529462640935725, "compression_ratio": 1.8372093023255813, + "no_speech_prob": 0.01549532637000084}, {"id": 22, "seek": 14214, "start": 160.14, + "end": 161.14, "text": " Yeah.", "tokens": [51264, 865, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.20529462640935725, "compression_ratio": 1.8372093023255813, + "no_speech_prob": 0.01549532637000084}, {"id": 23, "seek": 14214, "start": 161.14, + "end": 170.14, "text": " So it''s it''s you''re finding these problems. Yeah, and + you will spend weeks. Yeah, or month like why is this slow? 
It''s unexpectedly slow.", + "tokens": [51314, 407, 309, 311, 309, 311, 291, 434, 5006, 613, 2740, 13, 865, 11, + 293, 291, 486, 3496, 3259, 13, 865, 11, 420, 1618, 411, 983, 307, 341, 2964, 30, + 467, 311, 40452, 2964, 13, 51764], "temperature": 0.0, "avg_logprob": -0.20529462640935725, + "compression_ratio": 1.8372093023255813, "no_speech_prob": 0.01549532637000084}, + {"id": 24, "seek": 17014, "start": 170.14, "end": 176.14, "text": " What''s behind + it is and then you realize, oh, solar nine doesn''t have this problem and will resolve + it. Yeah.", "tokens": [50364, 708, 311, 2261, 309, 307, 293, 550, 291, 4325, 11, + 1954, 11, 7936, 4949, 1177, 380, 362, 341, 1154, 293, 486, 14151, 309, 13, 865, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.28207980178472564, "compression_ratio": + 1.6654545454545455, "no_speech_prob": 0.08824250847101212}, {"id": 25, "seek": 17014, + "start": 176.14, "end": 181.14, "text": " And we were already upgrading. So like, + okay, we can put this. We don''t have to stress out about this performance problem.", + "tokens": [50664, 400, 321, 645, 1217, 36249, 13, 407, 411, 11, 1392, 11, 321, 393, + 829, 341, 13, 492, 500, 380, 362, 281, 4244, 484, 466, 341, 3389, 1154, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.28207980178472564, "compression_ratio": 1.6654545454545455, + "no_speech_prob": 0.08824250847101212}, {"id": 26, "seek": 17014, "start": 181.14, + "end": 184.14, "text": " That''s why it takes a year to year.", "tokens": [50914, + 663, 311, 983, 309, 2516, 257, 1064, 281, 1064, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.28207980178472564, "compression_ratio": 1.6654545454545455, "no_speech_prob": + 0.08824250847101212}, {"id": 27, "seek": 17014, "start": 184.14, "end": 187.14, + "text": " For these projects. 
Yeah, yeah, start to show.", "tokens": [51064, 1171, + 613, 4455, 13, 865, 11, 1338, 11, 722, 281, 855, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.28207980178472564, "compression_ratio": 1.6654545454545455, "no_speech_prob": + 0.08824250847101212}, {"id": 28, "seek": 17014, "start": 187.14, "end": 188.14, + "text": " Yeah.", "tokens": [51214, 865, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.28207980178472564, "compression_ratio": 1.6654545454545455, "no_speech_prob": + 0.08824250847101212}, {"id": 29, "seek": 17014, "start": 188.14, "end": 195.14, + "text": " I also would like to say thank you for your project that I think you started + back at OEC Open Source Connections.", "tokens": [51264, 286, 611, 576, 411, 281, + 584, 1309, 291, 337, 428, 1716, 300, 286, 519, 291, 1409, 646, 412, 422, 8140, 7238, + 29629, 11653, 626, 13, 51614], "temperature": 0.0, "avg_logprob": -0.28207980178472564, + "compression_ratio": 1.6654545454545455, "no_speech_prob": 0.08824250847101212}, + {"id": 30, "seek": 17014, "start": 195.14, "end": 197.14, "text": " Yeah. Hello + LTR. Yeah.", "tokens": [51614, 865, 13, 2425, 441, 25936, 13, 865, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.28207980178472564, "compression_ratio": 1.6654545454545455, + "no_speech_prob": 0.08824250847101212}, {"id": 31, "seek": 19714, "start": 197.14, + "end": 207.14, "text": " I think it''s still out there and it''s out. That''s a + great project. 
Yeah, it really allowed me to quickly, you know, jump on the on the + train and start moving because I was actually alone on the team.", "tokens": [50364, + 286, 519, 309, 311, 920, 484, 456, 293, 309, 311, 484, 13, 663, 311, 257, 869, 1716, + 13, 865, 11, 309, 534, 4350, 385, 281, 2661, 11, 291, 458, 11, 3012, 322, 264, 322, + 264, 3847, 293, 722, 2684, 570, 286, 390, 767, 3312, 322, 264, 1469, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.20264286465115017, "compression_ratio": 1.5850622406639003, + "no_speech_prob": 0.22208599746227264}, {"id": 32, "seek": 19714, "start": 207.14, + "end": 216.14, "text": " I did do search before, but it wasn''t related to a mile + at all, right? It was like feature engineering button at different side of things + and.", "tokens": [50864, 286, 630, 360, 3164, 949, 11, 457, 309, 2067, 380, 4077, + 281, 257, 12620, 412, 439, 11, 558, 30, 467, 390, 411, 4111, 7043, 2960, 412, 819, + 1252, 295, 721, 293, 13, 51314], "temperature": 0.0, "avg_logprob": -0.20264286465115017, + "compression_ratio": 1.5850622406639003, "no_speech_prob": 0.22208599746227264}, + {"id": 33, "seek": 19714, "start": 216.14, "end": 218.14, "text": " Yeah, so thanks + for that. Really? 
Yeah.", "tokens": [51314, 865, 11, 370, 3231, 337, 300, 13, 4083, + 30, 865, 13, 51414], "temperature": 0.0, "avg_logprob": -0.20264286465115017, "compression_ratio": + 1.5850622406639003, "no_speech_prob": 0.22208599746227264}, {"id": 34, "seek": 21814, + "start": 218.14, "end": 226.14, "text": " I think I think it''s really important + and one thing I think it''s the career advice that''s helped me is to learn in public.", + "tokens": [50364, 286, 519, 286, 519, 309, 311, 534, 1021, 293, 472, 551, 286, 519, + 309, 311, 264, 3988, 5192, 300, 311, 4254, 385, 307, 281, 1466, 294, 1908, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.11759167823238649, "compression_ratio": 1.4795321637426901, + "no_speech_prob": 0.507870078086853}, {"id": 35, "seek": 21814, "start": 226.14, + "end": 232.14, "text": " So a lot of hello LTR came up when I was learning how to + do LTR.", "tokens": [50764, 407, 257, 688, 295, 7751, 441, 25936, 1361, 493, 562, + 286, 390, 2539, 577, 281, 360, 441, 25936, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.11759167823238649, "compression_ratio": 1.4795321637426901, "no_speech_prob": + 0.507870078086853}, {"id": 36, "seek": 21814, "start": 232.14, "end": 237.14, "text": + " And I had to get some examples and like try different things out.", "tokens": + [51064, 400, 286, 632, 281, 483, 512, 5110, 293, 411, 853, 819, 721, 484, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.11759167823238649, "compression_ratio": 1.4795321637426901, + "no_speech_prob": 0.507870078086853}, {"id": 37, "seek": 23714, "start": 237.14, + "end": 249.14, "text": " And then as I made mistakes, those mistakes became lessons + for the LTR training that came out of hello LTR, also called hello LTR that open + source connection it does.", "tokens": [50364, 400, 550, 382, 286, 1027, 8038, 11, + 729, 8038, 3062, 8820, 337, 264, 441, 25936, 3097, 300, 1361, 484, 295, 7751, 441, + 25936, 11, 611, 1219, 7751, 441, 25936, 300, 1269, 4009, 4984, 309, 775, 13, 50964], + 
"temperature": 0.0, "avg_logprob": -0.1723451858911759, "compression_ratio": 1.6308411214953271, + "no_speech_prob": 0.21308627724647522}, {"id": 38, "seek": 23714, "start": 249.14, + "end": 255.14, "text": " So it''s I really encourage people like the best way the + best teachers are often people actively learning.", "tokens": [50964, 407, 309, + 311, 286, 534, 5373, 561, 411, 264, 1151, 636, 264, 1151, 6023, 366, 2049, 561, + 13022, 2539, 13, 51264], "temperature": 0.0, "avg_logprob": -0.1723451858911759, + "compression_ratio": 1.6308411214953271, "no_speech_prob": 0.21308627724647522}, + {"id": 39, "seek": 23714, "start": 255.14, "end": 259.14, "text": " Yeah, because + you will encounter the mistakes that the experts forgot about.", "tokens": [51264, + 865, 11, 570, 291, 486, 8593, 264, 8038, 300, 264, 8572, 5298, 466, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.1723451858911759, "compression_ratio": 1.6308411214953271, + "no_speech_prob": 0.21308627724647522}, {"id": 40, "seek": 25914, "start": 259.14, + "end": 264.14, "text": " I couldn''t tell you about how to learn how a loop worked + from Python because I''ve done too long.", "tokens": [50364, 286, 2809, 380, 980, + 291, 466, 577, 281, 1466, 577, 257, 6367, 2732, 490, 15329, 570, 286, 600, 1096, + 886, 938, 13, 50614], "temperature": 0.0, "avg_logprob": -0.2932789049650493, "compression_ratio": + 1.6557377049180328, "no_speech_prob": 0.276696115732193}, {"id": 41, "seek": 25914, + "start": 264.14, "end": 274.14, "text": " But the person who would teach, have the + empathy to teach that really well to someone learning that''s scratch would be probably + my son if he was learning Python for so.", "tokens": [50614, 583, 264, 954, 567, + 576, 2924, 11, 362, 264, 18701, 281, 2924, 300, 534, 731, 281, 1580, 2539, 300, + 311, 8459, 576, 312, 1391, 452, 1872, 498, 415, 390, 2539, 15329, 337, 370, 13, + 51114], "temperature": 0.0, "avg_logprob": -0.2932789049650493, "compression_ratio": + 
1.6557377049180328, "no_speech_prob": 0.276696115732193}, {"id": 42, "seek": 25914, + "start": 274.14, "end": 283.14, "text": " So I really encourage like be out there + speaking, blogging, yeah, because you''ll have insight to how to teach a product + that expert won''t.", "tokens": [51114, 407, 286, 534, 5373, 411, 312, 484, 456, + 4124, 11, 6968, 3249, 11, 1338, 11, 570, 291, 603, 362, 11269, 281, 577, 281, 2924, + 257, 1674, 300, 5844, 1582, 380, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.2932789049650493, "compression_ratio": 1.6557377049180328, "no_speech_prob": + 0.276696115732193}, {"id": 43, "seek": 28314, "start": 283.14, "end": 291.14, "text": + " Yeah, that''s that''s another side of your professional life that amazes me is + that how do you find time to block so it''s like.", "tokens": [50364, 865, 11, 300, + 311, 300, 311, 1071, 1252, 295, 428, 4843, 993, 300, 669, 921, 279, 385, 307, 300, + 577, 360, 291, 915, 565, 281, 3461, 370, 309, 311, 411, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.19301744154941888, "compression_ratio": 1.5660377358490567, + "no_speech_prob": 0.16326384246349335}, {"id": 44, "seek": 28314, "start": 291.14, + "end": 301.14, "text": " And that those are really deep things sometimes you go + into detail with its code or you offer some thought model like do you sleep at all.", + "tokens": [50764, 400, 300, 729, 366, 534, 2452, 721, 2171, 291, 352, 666, 2607, + 365, 1080, 3089, 420, 291, 2626, 512, 1194, 2316, 411, 360, 291, 2817, 412, 439, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.19301744154941888, "compression_ratio": + 1.5660377358490567, "no_speech_prob": 0.16326384246349335}, {"id": 45, "seek": 28314, + "start": 301.14, "end": 306.14, "text": " I think I just have a high tolerance for + making mistakes in public.", "tokens": [51264, 286, 519, 286, 445, 362, 257, 1090, + 23368, 337, 1455, 8038, 294, 1908, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.19301744154941888, "compression_ratio": 
1.5660377358490567, "no_speech_prob": + 0.16326384246349335}, {"id": 46, "seek": 30614, "start": 306.14, "end": 311.14, + "text": " And also I think a lot of it has to do with having a history degree.", + "tokens": [50364, 400, 611, 286, 519, 257, 688, 295, 309, 575, 281, 360, 365, 1419, + 257, 2503, 4314, 13, 50614], "temperature": 0.0, "avg_logprob": -0.24321552387719014, + "compression_ratio": 1.8227272727272728, "no_speech_prob": 0.12327217310667038}, + {"id": 47, "seek": 30614, "start": 311.14, "end": 316.14, "text": " Oh really, I + didn''t know that. Yeah, history and computer science. So when you get when you + do history.", "tokens": [50614, 876, 534, 11, 286, 994, 380, 458, 300, 13, 865, + 11, 2503, 293, 3820, 3497, 13, 407, 562, 291, 483, 562, 291, 360, 2503, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.24321552387719014, "compression_ratio": 1.8227272727272728, + "no_speech_prob": 0.12327217310667038}, {"id": 48, "seek": 30614, "start": 316.14, + "end": 325.14, "text": " It''s a lot of writing writing writing, writing, reporting + writing and then a lot of it''s also when you get to this your senior level history.", + "tokens": [50864, 467, 311, 257, 688, 295, 3579, 3579, 3579, 11, 3579, 11, 10031, + 3579, 293, 550, 257, 688, 295, 309, 311, 611, 562, 291, 483, 281, 341, 428, 7965, + 1496, 2503, 13, 51314], "temperature": 0.0, "avg_logprob": -0.24321552387719014, + "compression_ratio": 1.8227272727272728, "no_speech_prob": 0.12327217310667038}, + {"id": 49, "seek": 30614, "start": 325.14, "end": 331.14, "text": " It''s like not + just writing an essay, but can you write your argument in a single page.", "tokens": + [51314, 467, 311, 411, 406, 445, 3579, 364, 16238, 11, 457, 393, 291, 2464, 428, + 6770, 294, 257, 2167, 3028, 13, 51614], "temperature": 0.0, "avg_logprob": -0.24321552387719014, + "compression_ratio": 1.8227272727272728, "no_speech_prob": 0.12327217310667038}, + {"id": 50, "seek": 33114, "start": 331.14, "end": 341.14, "text": " And 
so that''s + which is funny because you think when you''re a student, you think I''m going to + make the margins big and I''m going to make the text big so I can take up more space.", + "tokens": [50364, 400, 370, 300, 311, 597, 307, 4074, 570, 291, 519, 562, 291, 434, + 257, 3107, 11, 291, 519, 286, 478, 516, 281, 652, 264, 30317, 955, 293, 286, 478, + 516, 281, 652, 264, 2487, 955, 370, 286, 393, 747, 493, 544, 1901, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.20271820068359375, "compression_ratio": 1.7439613526570048, + "no_speech_prob": 0.10355423390865326}, {"id": 51, "seek": 33114, "start": 341.14, + "end": 346.14, "text": " Yeah, when you start writing a lot, you tend to write you + tend to get really verbose.", "tokens": [50864, 865, 11, 562, 291, 722, 3579, 257, + 688, 11, 291, 3928, 281, 2464, 291, 3928, 281, 483, 534, 9595, 541, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.20271820068359375, "compression_ratio": 1.7439613526570048, + "no_speech_prob": 0.10355423390865326}, {"id": 52, "seek": 33114, "start": 346.14, + "end": 349.14, "text": " Yeah, then you have to learn to make your arguments like + exactly.", "tokens": [51114, 865, 11, 550, 291, 362, 281, 1466, 281, 652, 428, 12869, + 411, 2293, 13, 51264], "temperature": 0.0, "avg_logprob": -0.20271820068359375, + "compression_ratio": 1.7439613526570048, "no_speech_prob": 0.10355423390865326}, + {"id": 53, "seek": 33114, "start": 349.14, "end": 351.14, "text": " Yes, and shorter.", + "tokens": [51264, 1079, 11, 293, 11639, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.20271820068359375, "compression_ratio": 1.7439613526570048, "no_speech_prob": + 0.10355423390865326}, {"id": 54, "seek": 33114, "start": 351.14, "end": 352.14, + "text": " And yeah, so.", "tokens": [51364, 400, 1338, 11, 370, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.20271820068359375, "compression_ratio": 1.7439613526570048, + "no_speech_prob": 0.10355423390865326}, {"id": 55, "seek": 35214, "start": 
352.14, + "end": 361.14, "text": " Yeah, it also now when I''m doing the product management + role, I have I do not have a history degree like you, but I have to write some things + in a concise way.", "tokens": [50364, 865, 11, 309, 611, 586, 562, 286, 478, 884, + 264, 1674, 4592, 3090, 11, 286, 362, 286, 360, 406, 362, 257, 2503, 4314, 411, 291, + 11, 457, 286, 362, 281, 2464, 512, 721, 294, 257, 44882, 636, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.20056160916103405, "compression_ratio": 1.5825688073394495, + "no_speech_prob": 0.40724697709083557}, {"id": 56, "seek": 35214, "start": 361.14, + "end": 366.14, "text": " Sometimes they say you have to remove half of the page + because you''re not feeding the page limit.", "tokens": [50814, 4803, 436, 584, + 291, 362, 281, 4159, 1922, 295, 264, 3028, 570, 291, 434, 406, 12919, 264, 3028, + 4948, 13, 51064], "temperature": 0.0, "avg_logprob": -0.20056160916103405, "compression_ratio": + 1.5825688073394495, "no_speech_prob": 0.40724697709083557}, {"id": 57, "seek": 35214, + "start": 366.14, "end": 368.14, "text": " I make that mistake all the time.", "tokens": + [51064, 286, 652, 300, 6146, 439, 264, 565, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.20056160916103405, "compression_ratio": 1.5825688073394495, "no_speech_prob": + 0.40724697709083557}, {"id": 58, "seek": 35214, "start": 368.14, "end": 371.14, + "text": " Yeah, how many one pageers are exactly like 10 pages.", "tokens": [51164, + 865, 11, 577, 867, 472, 3028, 433, 366, 2293, 411, 1266, 7183, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.20056160916103405, "compression_ratio": 1.5825688073394495, + "no_speech_prob": 0.40724697709083557}, {"id": 59, "seek": 37114, "start": 371.14, + "end": 378.14, "text": " And another thing is like never talk about hypothetical + future because you don''t even know yourself what it will happen or not, right.", + "tokens": [50364, 400, 1071, 551, 307, 411, 1128, 751, 466, 33053, 2027, 570, 291, + 500, 
380, 754, 458, 1803, 437, 309, 486, 1051, 420, 406, 11, 558, 13, 50714], "temperature": + 0.0, "avg_logprob": -0.26978559153420584, "compression_ratio": 1.7657992565055762, + "no_speech_prob": 0.7693508267402649}, {"id": 60, "seek": 37114, "start": 378.14, + "end": 384.14, "text": " Yeah, only talk either talk about things that you''re absolutely + certain have happened or you certain that they have planned already, right.", "tokens": + [50714, 865, 11, 787, 751, 2139, 751, 466, 721, 300, 291, 434, 3122, 1629, 362, + 2011, 420, 291, 1629, 300, 436, 362, 8589, 1217, 11, 558, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.26978559153420584, "compression_ratio": 1.7657992565055762, + "no_speech_prob": 0.7693508267402649}, {"id": 61, "seek": 37114, "start": 384.14, + "end": 386.14, "text": " Yeah, that''s how we do the product management.", "tokens": + [51014, 865, 11, 300, 311, 577, 321, 360, 264, 1674, 4592, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.26978559153420584, "compression_ratio": 1.7657992565055762, + "no_speech_prob": 0.7693508267402649}, {"id": 62, "seek": 37114, "start": 386.14, + "end": 394.14, "text": " Yeah, it teaches that that''s the side of things, but I + guess what I do is out and I then go to blogging and and and and use you as a great + example there.", "tokens": [51114, 865, 11, 309, 16876, 300, 300, 311, 264, 1252, + 295, 721, 11, 457, 286, 2041, 437, 286, 360, 307, 484, 293, 286, 550, 352, 281, + 6968, 3249, 293, 293, 293, 293, 764, 291, 382, 257, 869, 1365, 456, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.26978559153420584, "compression_ratio": 1.7657992565055762, + "no_speech_prob": 0.7693508267402649}, {"id": 63, "seek": 39414, "start": 394.14, + "end": 398.14, "text": " You go and unleash yourself and blogging and you write + what you want, right.", "tokens": [50364, 509, 352, 293, 49814, 1803, 293, 6968, + 3249, 293, 291, 2464, 437, 291, 528, 11, 558, 13, 50564], "temperature": 0.0, "avg_logprob": + 
-0.2016909339211204, "compression_ratio": 1.6939890710382515, "no_speech_prob": + 0.7486097812652588}, {"id": 64, "seek": 39414, "start": 398.14, "end": 413.14, "text": + " But you still need to what you want said that you became more successful blogger + at the moment you actually started modeling that specific person you''re writing + to not an abstract audience and not yourself because you''re not writing.", "tokens": + [50564, 583, 291, 920, 643, 281, 437, 291, 528, 848, 300, 291, 3062, 544, 4406, + 6968, 1321, 412, 264, 1623, 291, 767, 1409, 15983, 300, 2685, 954, 291, 434, 3579, + 281, 406, 364, 12649, 4034, 293, 406, 1803, 570, 291, 434, 406, 3579, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.2016909339211204, "compression_ratio": 1.6939890710382515, + "no_speech_prob": 0.7486097812652588}, {"id": 65, "seek": 41314, "start": 413.14, + "end": 416.14, "text": " And so is this how you still perceive it.", "tokens": [50364, + 400, 370, 307, 341, 577, 291, 920, 20281, 309, 13, 50514], "temperature": 0.0, "avg_logprob": + -0.16255636889525135, "compression_ratio": 1.5609756097560976, "no_speech_prob": + 0.7266706228256226}, {"id": 66, "seek": 41314, "start": 416.14, "end": 426.14, "text": + " Well, I definitely write to myself six months from now, but I also write the audience + I imagine is like a close group of friends.", "tokens": [50514, 1042, 11, 286, 2138, + 2464, 281, 2059, 2309, 2493, 490, 586, 11, 457, 286, 611, 2464, 264, 4034, 286, + 3811, 307, 411, 257, 1998, 1594, 295, 1855, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.16255636889525135, "compression_ratio": 1.5609756097560976, "no_speech_prob": + 0.7266706228256226}, {"id": 67, "seek": 41314, "start": 426.14, "end": 429.14, "text": + " So I almost think about blogging as sometimes and this is easy.", "tokens": [51014, + 407, 286, 1920, 519, 466, 6968, 3249, 382, 2171, 293, 341, 307, 1858, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.16255636889525135, "compression_ratio": 
1.5609756097560976, + "no_speech_prob": 0.7266706228256226}, {"id": 68, "seek": 41314, "start": 429.14, + "end": 434.14, "text": " It''s easy for people often to imagine sitting down and + writing a long email or Slack message.", "tokens": [51164, 467, 311, 1858, 337, + 561, 2049, 281, 3811, 3798, 760, 293, 3579, 257, 938, 3796, 420, 37211, 3636, 13, + 51414], "temperature": 0.0, "avg_logprob": -0.16255636889525135, "compression_ratio": + 1.5609756097560976, "no_speech_prob": 0.7266706228256226}, {"id": 69, "seek": 41314, + "start": 434.14, "end": 437.14, "text": " And what if you just turn that into a + blog post.", "tokens": [51414, 400, 437, 498, 291, 445, 1261, 300, 666, 257, 6968, + 2183, 13, 51564], "temperature": 0.0, "avg_logprob": -0.16255636889525135, "compression_ratio": + 1.5609756097560976, "no_speech_prob": 0.7266706228256226}, {"id": 70, "seek": 41314, + "start": 437.14, "end": 438.14, "text": " Yeah.", "tokens": [51564, 865, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.16255636889525135, "compression_ratio": 1.5609756097560976, + "no_speech_prob": 0.7266706228256226}, {"id": 71, "seek": 43814, "start": 438.14, + "end": 446.14, "text": " And to me, that''s that''s an inspiration for this so many + times you get excited about something and want to send a message to your friends.", + "tokens": [50364, 400, 281, 385, 11, 300, 311, 300, 311, 364, 10249, 337, 341, 370, + 867, 1413, 291, 483, 2919, 466, 746, 293, 528, 281, 2845, 257, 3636, 281, 428, 1855, + 13, 50764], "temperature": 0.0, "avg_logprob": -0.21016060678582443, "compression_ratio": + 1.678294573643411, "no_speech_prob": 0.01751577854156494}, {"id": 72, "seek": 43814, + "start": 446.14, "end": 448.14, "text": " Yeah, and share it.", "tokens": [50764, + 865, 11, 293, 2073, 309, 13, 50864], "temperature": 0.0, "avg_logprob": -0.21016060678582443, + "compression_ratio": 1.678294573643411, "no_speech_prob": 0.01751577854156494}, + {"id": 73, "seek": 43814, "start": 448.14, "end": 453.14, 
"text": " Well, turn that + enthusiasm and that message into a blog post and then that those are the best blog + posts.", "tokens": [50864, 1042, 11, 1261, 300, 23417, 293, 300, 3636, 666, 257, + 6968, 2183, 293, 550, 300, 729, 366, 264, 1151, 6968, 12300, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.21016060678582443, "compression_ratio": 1.678294573643411, + "no_speech_prob": 0.01751577854156494}, {"id": 74, "seek": 43814, "start": 453.14, + "end": 457.14, "text": " And I also think it''s really important to remember it''s + blogging.", "tokens": [51114, 400, 286, 611, 519, 309, 311, 534, 1021, 281, 1604, + 309, 311, 6968, 3249, 13, 51314], "temperature": 0.0, "avg_logprob": -0.21016060678582443, + "compression_ratio": 1.678294573643411, "no_speech_prob": 0.01751577854156494}, + {"id": 75, "seek": 43814, "start": 457.14, "end": 461.14, "text": " It''s like it''s + a step above writing us a from a post.", "tokens": [51314, 467, 311, 411, 309, 311, + 257, 1823, 3673, 3579, 505, 257, 490, 257, 2183, 13, 51514], "temperature": 0.0, + "avg_logprob": -0.21016060678582443, "compression_ratio": 1.678294573643411, "no_speech_prob": + 0.01751577854156494}, {"id": 76, "seek": 43814, "start": 461.14, "end": 464.14, + "text": " It''s very informal. 
Don''t take it too seriously.", "tokens": [51514, + 467, 311, 588, 24342, 13, 1468, 380, 747, 309, 886, 6638, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.21016060678582443, "compression_ratio": 1.678294573643411, + "no_speech_prob": 0.01751577854156494}, {"id": 77, "seek": 46414, "start": 464.14, + "end": 466.14, "text": " You will make mistakes exactly.", "tokens": [50364, 509, + 486, 652, 8038, 2293, 13, 50464], "temperature": 0.0, "avg_logprob": -0.20486576757698416, + "compression_ratio": 1.8798076923076923, "no_speech_prob": 0.46770596504211426}, + {"id": 78, "seek": 46414, "start": 466.14, "end": 470.14, "text": " You will it''s + and it''s a do it for fun.", "tokens": [50464, 509, 486, 309, 311, 293, 309, 311, + 257, 360, 309, 337, 1019, 13, 50664], "temperature": 0.0, "avg_logprob": -0.20486576757698416, + "compression_ratio": 1.8798076923076923, "no_speech_prob": 0.46770596504211426}, + {"id": 79, "seek": 46414, "start": 470.14, "end": 471.14, "text": " Yeah.", "tokens": + [50664, 865, 13, 50714], "temperature": 0.0, "avg_logprob": -0.20486576757698416, + "compression_ratio": 1.8798076923076923, "no_speech_prob": 0.46770596504211426}, + {"id": 80, "seek": 46414, "start": 471.14, "end": 474.14, "text": " But yeah, I + do it a lot because I want.", "tokens": [50714, 583, 1338, 11, 286, 360, 309, 257, + 688, 570, 286, 528, 13, 50864], "temperature": 0.0, "avg_logprob": -0.20486576757698416, + "compression_ratio": 1.8798076923076923, "no_speech_prob": 0.46770596504211426}, + {"id": 81, "seek": 46414, "start": 474.14, "end": 479.14, "text": " I think I I + there''s a meme of like someone starting out on something.", "tokens": [50864, 286, + 519, 286, 286, 456, 311, 257, 21701, 295, 411, 1580, 2891, 484, 322, 746, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.20486576757698416, "compression_ratio": 1.8798076923076923, + "no_speech_prob": 0.46770596504211426}, {"id": 82, "seek": 46414, "start": 479.14, + "end": 484.14, "text": " Yeah, someone being 
very senior and then or someone being + like mid career and then someone.", "tokens": [51114, 865, 11, 1580, 885, 588, 7965, + 293, 550, 420, 1580, 885, 411, 2062, 3988, 293, 550, 1580, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.20486576757698416, "compression_ratio": 1.8798076923076923, + "no_speech_prob": 0.46770596504211426}, {"id": 83, "seek": 46414, "start": 484.14, + "end": 486.14, "text": " Super senior in their career.", "tokens": [51364, 4548, + 7965, 294, 641, 3988, 13, 51464], "temperature": 0.0, "avg_logprob": -0.20486576757698416, + "compression_ratio": 1.8798076923076923, "no_speech_prob": 0.46770596504211426}, + {"id": 84, "seek": 46414, "start": 486.14, "end": 490.14, "text": " And often the + like starting out this meme starting out super senior are the same.", "tokens": + [51464, 400, 2049, 264, 411, 2891, 484, 341, 21701, 2891, 484, 1687, 7965, 366, + 264, 912, 13, 51664], "temperature": 0.0, "avg_logprob": -0.20486576757698416, "compression_ratio": + 1.8798076923076923, "no_speech_prob": 0.46770596504211426}, {"id": 85, "seek": 49014, + "start": 490.14, "end": 498.14, "text": " And it''s like my version of that is doing + when you start out you code and do stuff to impress your friends like in high school + or whatever.", "tokens": [50364, 400, 309, 311, 411, 452, 3037, 295, 300, 307, 884, + 562, 291, 722, 484, 291, 3089, 293, 360, 1507, 281, 6729, 428, 1855, 411, 294, 1090, + 1395, 420, 2035, 13, 50764], "temperature": 0.0, "avg_logprob": -0.11580961476201597, + "compression_ratio": 1.8532818532818534, "no_speech_prob": 0.10690049082040787}, + {"id": 86, "seek": 49014, "start": 498.14, "end": 502.14, "text": " And then you + get like all worried about like having some big impact and like.", "tokens": [50764, + 400, 550, 291, 483, 411, 439, 5804, 466, 411, 1419, 512, 955, 2712, 293, 411, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.11580961476201597, "compression_ratio": + 1.8532818532818534, "no_speech_prob": 
0.10690049082040787}, {"id": 87, "seek": 49014, + "start": 502.14, "end": 504.14, "text": " Impressing the whole world.", "tokens": + [50964, 8270, 18605, 264, 1379, 1002, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.11580961476201597, "compression_ratio": 1.8532818532818534, "no_speech_prob": + 0.10690049082040787}, {"id": 88, "seek": 49014, "start": 504.14, "end": 509.14, + "text": " And then when you get super senior again, you''re just like I just want + to like do full stuff to impress my friends.", "tokens": [51064, 400, 550, 562, + 291, 483, 1687, 7965, 797, 11, 291, 434, 445, 411, 286, 445, 528, 281, 411, 360, + 1577, 1507, 281, 6729, 452, 1855, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.11580961476201597, "compression_ratio": 1.8532818532818534, "no_speech_prob": + 0.10690049082040787}, {"id": 89, "seek": 49014, "start": 509.14, "end": 510.14, + "text": " Yeah.", "tokens": [51314, 865, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.11580961476201597, "compression_ratio": 1.8532818532818534, "no_speech_prob": + 0.10690049082040787}, {"id": 90, "seek": 49014, "start": 510.14, "end": 515.14, + "text": " And which actually turns out also to be stuff that the whole world cares + about because usually your friends are.", "tokens": [51364, 400, 597, 767, 4523, + 484, 611, 281, 312, 1507, 300, 264, 1379, 1002, 12310, 466, 570, 2673, 428, 1855, + 366, 13, 51614], "temperature": 0.0, "avg_logprob": -0.11580961476201597, "compression_ratio": + 1.8532818532818534, "no_speech_prob": 0.10690049082040787}, {"id": 91, "seek": 51514, + "start": 515.14, "end": 521.14, "text": " Like doing cool stuff themselves like + you know vector searcher or doing cool AI stuff.", "tokens": [50364, 1743, 884, + 1627, 1507, 2969, 411, 291, 458, 8062, 3164, 260, 420, 884, 1627, 7318, 1507, 13, + 50664], "temperature": 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": + 1.7829457364341086, "no_speech_prob": 0.14798229932785034}, {"id": 92, "seek": 51514, + 
"start": 521.14, "end": 522.14, "text": " Yeah.", "tokens": [50664, 865, 13, 50714], + "temperature": 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": 1.7829457364341086, + "no_speech_prob": 0.14798229932785034}, {"id": 93, "seek": 51514, "start": 522.14, + "end": 525.14, "text": " So it turns out that the rest of the world finds that it''s + interesting to.", "tokens": [50714, 407, 309, 4523, 484, 300, 264, 1472, 295, 264, + 1002, 10704, 300, 309, 311, 1880, 281, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.20795553922653198, "compression_ratio": 1.7829457364341086, "no_speech_prob": + 0.14798229932785034}, {"id": 94, "seek": 51514, "start": 525.14, "end": 528.14, + "text": " But I think that''s a really important thing to have an authentic voice.", + "tokens": [50864, 583, 286, 519, 300, 311, 257, 534, 1021, 551, 281, 362, 364, 12466, + 3177, 13, 51014], "temperature": 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": + 1.7829457364341086, "no_speech_prob": 0.14798229932785034}, {"id": 95, "seek": 51514, + "start": 528.14, "end": 529.14, "text": " Yeah.", "tokens": [51014, 865, 13, 51064], + "temperature": 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": 1.7829457364341086, + "no_speech_prob": 0.14798229932785034}, {"id": 96, "seek": 51514, "start": 529.14, + "end": 532.14, "text": " So it''s also part of the building up your profile.", "tokens": + [51064, 407, 309, 311, 611, 644, 295, 264, 2390, 493, 428, 7964, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": 1.7829457364341086, + "no_speech_prob": 0.14798229932785034}, {"id": 97, "seek": 51514, "start": 532.14, + "end": 533.14, "text": " Yeah.", "tokens": [51214, 865, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": 1.7829457364341086, + "no_speech_prob": 0.14798229932785034}, {"id": 98, "seek": 51514, "start": 533.14, + "end": 534.14, "text": " Yeah.", "tokens": [51264, 865, 13, 
51314], "temperature": + 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": 1.7829457364341086, + "no_speech_prob": 0.14798229932785034}, {"id": 99, "seek": 51514, "start": 534.14, + "end": 540.14, "text": " So I think like Steve Jobs, I think I think said like computer + is a bicycle for the mind, right?", "tokens": [51314, 407, 286, 519, 411, 7466, + 29169, 11, 286, 519, 286, 519, 848, 411, 3820, 307, 257, 20888, 337, 264, 1575, + 11, 558, 30, 51614], "temperature": 0.0, "avg_logprob": -0.20795553922653198, "compression_ratio": + 1.7829457364341086, "no_speech_prob": 0.14798229932785034}, {"id": 100, "seek": + 51514, "start": 540.14, "end": 542.14, "text": " And so blogging is also in a way + bicycle for the mind.", "tokens": [51614, 400, 370, 6968, 3249, 307, 611, 294, 257, + 636, 20888, 337, 264, 1575, 13, 51714], "temperature": 0.0, "avg_logprob": -0.20795553922653198, + "compression_ratio": 1.7829457364341086, "no_speech_prob": 0.14798229932785034}, + {"id": 101, "seek": 54214, "start": 542.14, "end": 544.14, "text": " You have to + rework yourself.", "tokens": [50364, 509, 362, 281, 48376, 1803, 13, 50464], "temperature": + 0.0, "avg_logprob": -0.211898926765688, "compression_ratio": 1.5520361990950227, + "no_speech_prob": 0.08107094466686249}, {"id": 102, "seek": 54214, "start": 544.14, + "end": 545.14, "text": " Right.", "tokens": [50464, 1779, 13, 50514], "temperature": + 0.0, "avg_logprob": -0.211898926765688, "compression_ratio": 1.5520361990950227, + "no_speech_prob": 0.08107094466686249}, {"id": 103, "seek": 54214, "start": 545.14, + "end": 550.14, "text": " It''s it''s also the programming that you do on the site + like searcher rate.", "tokens": [50514, 467, 311, 309, 311, 611, 264, 9410, 300, + 291, 360, 322, 264, 3621, 411, 3164, 260, 3314, 13, 50764], "temperature": 0.0, + "avg_logprob": -0.211898926765688, "compression_ratio": 1.5520361990950227, "no_speech_prob": + 0.08107094466686249}, {"id": 104, "seek": 54214, "start": 550.14, 
"end": 553.14, + "text": " So tell me more about what was the motivation.", "tokens": [50764, 407, + 980, 385, 544, 466, 437, 390, 264, 12335, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.211898926765688, "compression_ratio": 1.5520361990950227, "no_speech_prob": 0.08107094466686249}, + {"id": 105, "seek": 54214, "start": 553.14, "end": 555.14, "text": " Why did you + start working?", "tokens": [50914, 1545, 630, 291, 722, 1364, 30, 51014], "temperature": + 0.0, "avg_logprob": -0.211898926765688, "compression_ratio": 1.5520361990950227, + "no_speech_prob": 0.08107094466686249}, {"id": 106, "seek": 54214, "start": 555.14, + "end": 559.14, "text": " So I think I like to go against the grain a little bit.", + "tokens": [51014, 407, 286, 519, 286, 411, 281, 352, 1970, 264, 12837, 257, 707, + 857, 13, 51214], "temperature": 0.0, "avg_logprob": -0.211898926765688, "compression_ratio": + 1.5520361990950227, "no_speech_prob": 0.08107094466686249}, {"id": 107, "seek": + 54214, "start": 559.14, "end": 567.14, "text": " So I actually had worked on different + versions of vector search for a long time before this for craze.", "tokens": [51214, + 407, 286, 767, 632, 2732, 322, 819, 9606, 295, 8062, 3164, 337, 257, 938, 565, 949, + 341, 337, 2094, 1381, 13, 51614], "temperature": 0.0, "avg_logprob": -0.211898926765688, + "compression_ratio": 1.5520361990950227, "no_speech_prob": 0.08107094466686249}, + {"id": 108, "seek": 56714, "start": 567.14, "end": 575.14, "text": " And different + hacks and things to make vector search work in a in the current in the solar or + last search world.", "tokens": [50364, 400, 819, 33617, 293, 721, 281, 652, 8062, + 3164, 589, 294, 257, 294, 264, 2190, 294, 264, 7936, 420, 1036, 3164, 1002, 13, + 50764], "temperature": 0.0, "avg_logprob": -0.21895347322736466, "compression_ratio": + 1.5842105263157895, "no_speech_prob": 0.017902947962284088}, {"id": 109, "seek": + 56714, "start": 575.14, "end": 578.14, "text": " So everyone''s in vector 
search.", + "tokens": [50764, 407, 1518, 311, 294, 8062, 3164, 13, 50914], "temperature": 0.0, + "avg_logprob": -0.21895347322736466, "compression_ratio": 1.5842105263157895, "no_speech_prob": + 0.017902947962284088}, {"id": 110, "seek": 56714, "start": 578.14, "end": 585.14, + "text": " And I decided that in part because I wanted to get do a little bit more + native programming.", "tokens": [50914, 400, 286, 3047, 300, 294, 644, 570, 286, + 1415, 281, 483, 360, 257, 707, 857, 544, 8470, 9410, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.21895347322736466, "compression_ratio": 1.5842105263157895, + "no_speech_prob": 0.017902947962284088}, {"id": 111, "seek": 56714, "start": 585.14, + "end": 586.14, "text": " Get facts.", "tokens": [51264, 3240, 9130, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.21895347322736466, "compression_ratio": 1.5842105263157895, + "no_speech_prob": 0.017902947962284088}, {"id": 112, "seek": 56714, "start": 586.14, + "end": 587.14, "text": " I used to do that.", "tokens": [51314, 286, 1143, 281, + 360, 300, 13, 51364], "temperature": 0.0, "avg_logprob": -0.21895347322736466, "compression_ratio": + 1.5842105263157895, "no_speech_prob": 0.017902947962284088}, {"id": 113, "seek": + 56714, "start": 587.14, "end": 588.14, "text": " I used to be a C programmer.", + "tokens": [51364, 286, 1143, 281, 312, 257, 383, 32116, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.21895347322736466, "compression_ratio": 1.5842105263157895, + "no_speech_prob": 0.017902947962284088}, {"id": 114, "seek": 56714, "start": 588.14, + "end": 590.14, "text": " Yeah.", "tokens": [51414, 865, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.21895347322736466, "compression_ratio": 1.5842105263157895, + "no_speech_prob": 0.017902947962284088}, {"id": 115, "seek": 59014, "start": 590.14, + "end": 593.14, "text": " And I found that.", "tokens": [50364, 400, 286, 1352, 300, + 13, 50514], "temperature": 0.0, "avg_logprob": -0.20218599888316371, 
"compression_ratio": + 1.7098039215686274, "no_speech_prob": 0.002256857231259346}, {"id": 116, "seek": + 59014, "start": 593.14, "end": 598.14, "text": " Vector search is very welcoming + to machine learning engineer data science community.", "tokens": [50514, 691, 20814, + 3164, 307, 588, 17378, 281, 3479, 2539, 11403, 1412, 3497, 1768, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.20218599888316371, "compression_ratio": 1.7098039215686274, + "no_speech_prob": 0.002256857231259346}, {"id": 117, "seek": 59014, "start": 598.14, + "end": 601.14, "text": " But the traditional lexical search engines like solar and + elastic search.", "tokens": [50764, 583, 264, 5164, 476, 87, 804, 3164, 12982, 411, + 7936, 293, 17115, 3164, 13, 50914], "temperature": 0.0, "avg_logprob": -0.20218599888316371, + "compression_ratio": 1.7098039215686274, "no_speech_prob": 0.002256857231259346}, + {"id": 118, "seek": 59014, "start": 601.14, "end": 602.14, "text": " They''re very + weird.", "tokens": [50914, 814, 434, 588, 3657, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.20218599888316371, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.002256857231259346}, {"id": 119, "seek": 59014, "start": 602.14, "end": 603.14, + "text": " Yeah.", "tokens": [50964, 865, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.20218599888316371, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.002256857231259346}, {"id": 120, "seek": 59014, "start": 603.14, "end": 605.14, + "text": " They''re very like you have to know this weird query DSL.", "tokens": + [51014, 814, 434, 588, 411, 291, 362, 281, 458, 341, 3657, 14581, 15816, 43, 13, + 51114], "temperature": 0.0, "avg_logprob": -0.20218599888316371, "compression_ratio": + 1.7098039215686274, "no_speech_prob": 0.002256857231259346}, {"id": 121, "seek": + 59014, "start": 605.14, "end": 607.14, "text": " You have to understand these things.", + "tokens": [51114, 509, 362, 281, 1223, 613, 721, 13, 51214], "temperature": 
0.0, + "avg_logprob": -0.20218599888316371, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.002256857231259346}, {"id": 122, "seek": 59014, "start": 607.14, "end": 608.14, + "text": " Organization.", "tokens": [51214, 23979, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.20218599888316371, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.002256857231259346}, {"id": 123, "seek": 59014, "start": 608.14, "end": 610.14, + "text": " So I wanted to take that.", "tokens": [51264, 407, 286, 1415, 281, 747, + 300, 13, 51364], "temperature": 0.0, "avg_logprob": -0.20218599888316371, "compression_ratio": + 1.7098039215686274, "no_speech_prob": 0.002256857231259346}, {"id": 124, "seek": + 59014, "start": 610.14, "end": 617.14, "text": " And take that lexical world and + bring it into a data science or data high data sort of environment.", "tokens": + [51364, 400, 747, 300, 476, 87, 804, 1002, 293, 1565, 309, 666, 257, 1412, 3497, + 420, 1412, 1090, 1412, 1333, 295, 2823, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.20218599888316371, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.002256857231259346}, {"id": 125, "seek": 61714, "start": 617.14, "end": 623.14, + "text": " And then I found that it''s very comfortable to machine learning people + and data science.", "tokens": [50364, 400, 550, 286, 1352, 300, 309, 311, 588, 4619, + 281, 3479, 2539, 561, 293, 1412, 3497, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.31037235260009766, "compression_ratio": 1.596638655462185, "no_speech_prob": + 0.013651189394295216}, {"id": 126, "seek": 61714, "start": 623.14, "end": 624.14, + "text": " Yeah.", "tokens": [50664, 865, 13, 50714], "temperature": 0.0, "avg_logprob": + -0.31037235260009766, "compression_ratio": 1.596638655462185, "no_speech_prob": + 0.013651189394295216}, {"id": 127, "seek": 61714, "start": 624.14, "end": 626.14, + "text": " So I built search array.", "tokens": [50714, 407, 286, 3094, 3164, 10225, + 13, 
50814], "temperature": 0.0, "avg_logprob": -0.31037235260009766, "compression_ratio": + 1.596638655462185, "no_speech_prob": 0.013651189394295216}, {"id": 128, "seek": + 61714, "start": 626.14, "end": 627.14, "text": " And what search array.", "tokens": + [50814, 400, 437, 3164, 10225, 13, 50864], "temperature": 0.0, "avg_logprob": -0.31037235260009766, + "compression_ratio": 1.596638655462185, "no_speech_prob": 0.013651189394295216}, + {"id": 129, "seek": 61714, "start": 627.14, "end": 632.14, "text": " The reason + I built that what that does is it''s basically a lexical extension to pandas.", + "tokens": [50864, 440, 1778, 286, 3094, 300, 437, 300, 775, 307, 309, 311, 1936, + 257, 476, 87, 804, 10320, 281, 4565, 296, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.31037235260009766, "compression_ratio": 1.596638655462185, "no_speech_prob": + 0.013651189394295216}, {"id": 130, "seek": 61714, "start": 632.14, "end": 634.14, + "text": " So if I have text.", "tokens": [51114, 407, 498, 286, 362, 2487, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.31037235260009766, "compression_ratio": 1.596638655462185, + "no_speech_prob": 0.013651189394295216}, {"id": 131, "seek": 61714, "start": 634.14, + "end": 638.14, "text": " I can make a pandas column that''s just like tokenized + text.", "tokens": [51214, 286, 393, 652, 257, 4565, 296, 7738, 300, 311, 445, 411, + 14862, 1602, 2487, 13, 51414], "temperature": 0.0, "avg_logprob": -0.31037235260009766, + "compression_ratio": 1.596638655462185, "no_speech_prob": 0.013651189394295216}, + {"id": 132, "seek": 61714, "start": 638.14, "end": 643.14, "text": " And then I + can ask it to score against a keyword and get a BM25 score.", "tokens": [51414, + 400, 550, 286, 393, 1029, 309, 281, 6175, 1970, 257, 20428, 293, 483, 257, 15901, + 6074, 6175, 13, 51664], "temperature": 0.0, "avg_logprob": -0.31037235260009766, + "compression_ratio": 1.596638655462185, "no_speech_prob": 0.013651189394295216}, + {"id": 133, "seek": 
64314, "start": 643.14, "end": 646.14, "text": " And this was + so in a co lab notebook or something.", "tokens": [50364, 400, 341, 390, 370, 294, + 257, 598, 2715, 21060, 420, 746, 13, 50514], "temperature": 0.0, "avg_logprob": + -0.2264869213104248, "compression_ratio": 1.6329588014981273, "no_speech_prob": + 0.07405898720026016}, {"id": 134, "seek": 64314, "start": 646.14, "end": 650.14, + "text": " I can quickly further set ideas while having to stand up solar elastic + search.", "tokens": [50514, 286, 393, 2661, 3052, 992, 3487, 1339, 1419, 281, 1463, + 493, 7936, 17115, 3164, 13, 50714], "temperature": 0.0, "avg_logprob": -0.2264869213104248, + "compression_ratio": 1.6329588014981273, "no_speech_prob": 0.07405898720026016}, + {"id": 135, "seek": 64314, "start": 650.14, "end": 651.14, "text": " Yeah.", "tokens": + [50714, 865, 13, 50764], "temperature": 0.0, "avg_logprob": -0.2264869213104248, + "compression_ratio": 1.6329588014981273, "no_speech_prob": 0.07405898720026016}, + {"id": 136, "seek": 64314, "start": 651.14, "end": 653.14, "text": " Or think about + Docker container and all this stuff.", "tokens": [50764, 1610, 519, 466, 33772, + 10129, 293, 439, 341, 1507, 13, 50864], "temperature": 0.0, "avg_logprob": -0.2264869213104248, + "compression_ratio": 1.6329588014981273, "no_speech_prob": 0.07405898720026016}, + {"id": 137, "seek": 64314, "start": 653.14, "end": 657.14, "text": " And I said + I could see like, OK, I want to tokenize things a certain way.", "tokens": [50864, + 400, 286, 848, 286, 727, 536, 411, 11, 2264, 11, 286, 528, 281, 14862, 1125, 721, + 257, 1629, 636, 13, 51064], "temperature": 0.0, "avg_logprob": -0.2264869213104248, + "compression_ratio": 1.6329588014981273, "no_speech_prob": 0.07405898720026016}, + {"id": 138, "seek": 64314, "start": 657.14, "end": 661.14, "text": " I want to score + change the BM25 scoring to be a certain way.", "tokens": [51064, 286, 528, 281, + 6175, 1319, 264, 15901, 6074, 22358, 281, 312, 257, 1629, 636, 
13, 51264], "temperature": + 0.0, "avg_logprob": -0.2264869213104248, "compression_ratio": 1.6329588014981273, + "no_speech_prob": 0.07405898720026016}, {"id": 139, "seek": 64314, "start": 661.14, + "end": 662.14, "text": " Yeah.", "tokens": [51264, 865, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.2264869213104248, "compression_ratio": 1.6329588014981273, + "no_speech_prob": 0.07405898720026016}, {"id": 140, "seek": 64314, "start": 662.14, + "end": 670.14, "text": " And which is you think about like 90% of what you do in + lexical search engine is tweaking the tokenization.", "tokens": [51314, 400, 597, + 307, 291, 519, 466, 411, 4289, 4, 295, 437, 291, 360, 294, 476, 87, 804, 3164, 2848, + 307, 6986, 2456, 264, 14862, 2144, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.2264869213104248, "compression_ratio": 1.6329588014981273, "no_speech_prob": + 0.07405898720026016}, {"id": 141, "seek": 67014, "start": 670.14, "end": 671.14, + "text": " Yeah.", "tokens": [50364, 865, 13, 50414], "temperature": 0.0, "avg_logprob": + -0.17158998342660758, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.01854291930794716}, {"id": 142, "seek": 67014, "start": 671.14, "end": 674.14, + "text": " Finging the scoring, trying to index something new and search against + that.", "tokens": [50414, 479, 8716, 264, 22358, 11, 1382, 281, 8186, 746, 777, + 293, 3164, 1970, 300, 13, 50564], "temperature": 0.0, "avg_logprob": -0.17158998342660758, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.01854291930794716}, + {"id": 143, "seek": 67014, "start": 674.14, "end": 676.14, "text": " Like, oh, I + have any recognition field now.", "tokens": [50564, 1743, 11, 1954, 11, 286, 362, + 604, 11150, 2519, 586, 13, 50664], "temperature": 0.0, "avg_logprob": -0.17158998342660758, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.01854291930794716}, + {"id": 144, "seek": 67014, "start": 676.14, "end": 677.14, "text": " No.", "tokens": + [50664, 883, 
13, 50714], "temperature": 0.0, "avg_logprob": -0.17158998342660758, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.01854291930794716}, + {"id": 145, "seek": 67014, "start": 677.14, "end": 680.14, "text": " Oh, it''s and + the other thing is what.", "tokens": [50714, 876, 11, 309, 311, 293, 264, 661, 551, + 307, 437, 13, 50864], "temperature": 0.0, "avg_logprob": -0.17158998342660758, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.01854291930794716}, {"id": 146, "seek": + 67014, "start": 680.14, "end": 684.14, "text": " The other thing you do a lot in + lexical search engines is I want to boost by recency.", "tokens": [50864, 440, 661, + 551, 291, 360, 257, 688, 294, 476, 87, 804, 3164, 12982, 307, 286, 528, 281, 9194, + 538, 850, 3020, 13, 51064], "temperature": 0.0, "avg_logprob": -0.17158998342660758, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.01854291930794716}, + {"id": 147, "seek": 67014, "start": 684.14, "end": 685.14, "text": " And I want + to.", "tokens": [51064, 400, 286, 528, 281, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.17158998342660758, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.01854291930794716}, {"id": 148, "seek": 67014, "start": 685.14, "end": 686.14, + "text": " Do these other things.", "tokens": [51114, 1144, 613, 661, 721, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.17158998342660758, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.01854291930794716}, {"id": 149, "seek": 67014, "start": 686.14, + "end": 691.14, "text": " And a lot of these things can just be done with a really + fast like I take this column.", "tokens": [51164, 400, 257, 688, 295, 613, 721, + 393, 445, 312, 1096, 365, 257, 534, 2370, 411, 286, 747, 341, 7738, 13, 51414], + "temperature": 0.0, "avg_logprob": -0.17158998342660758, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.01854291930794716}, {"id": 150, "seek": 67014, "start": 691.14, + "end": 693.14, 
"text": " New miracle date column.", "tokens": [51414, 1873, 14660, + 4002, 7738, 13, 51514], "temperature": 0.0, "avg_logprob": -0.17158998342660758, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.01854291930794716}, + {"id": 151, "seek": 67014, "start": 693.14, "end": 695.14, "text": " In pandas.", + "tokens": [51514, 682, 4565, 296, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.17158998342660758, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.01854291930794716}, {"id": 152, "seek": 67014, "start": 695.14, "end": 696.14, + "text": " Yeah.", "tokens": [51614, 865, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.17158998342660758, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.01854291930794716}, {"id": 153, "seek": 69614, "start": 696.14, "end": 699.14, + "text": " And I want to add a score, a num tier ray that''s a BM25 score.", "tokens": + [50364, 400, 286, 528, 281, 909, 257, 6175, 11, 257, 1031, 12362, 18592, 300, 311, + 257, 15901, 6074, 6175, 13, 50514], "temperature": 0.0, "avg_logprob": -0.3289938173093996, + "compression_ratio": 1.6412698412698412, "no_speech_prob": 0.1377331167459488}, + {"id": 154, "seek": 69614, "start": 699.14, "end": 701.14, "text": " I multiply + them together.", "tokens": [50514, 286, 12972, 552, 1214, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 155, "seek": 69614, "start": 701.14, + "end": 702.14, "text": " Yeah.", "tokens": [50614, 865, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 156, "seek": 69614, "start": 702.14, + "end": 704.14, "text": " I have a score that''s a recency weighted.", "tokens": + [50664, 286, 362, 257, 6175, 300, 311, 257, 850, 3020, 32807, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.3289938173093996, 
"compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 157, "seek": 69614, "start": 704.14, + "end": 705.14, "text": " Yeah.", "tokens": [50764, 865, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 158, "seek": 69614, "start": 705.14, + "end": 708.14, "text": " What does that look like in terms of my offline metrics?", + "tokens": [50814, 708, 775, 300, 574, 411, 294, 2115, 295, 452, 21857, 16367, 30, + 50964], "temperature": 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": + 1.6412698412698412, "no_speech_prob": 0.1377331167459488}, {"id": 159, "seek": 69614, + "start": 708.14, "end": 709.14, "text": " Yeah.", "tokens": [50964, 865, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 160, "seek": 69614, "start": 709.14, + "end": 710.14, "text": " Interesting.", "tokens": [51014, 14711, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 161, "seek": 69614, "start": 710.14, + "end": 713.14, "text": " Without having to go off to like, you have to go out elastic + search and like that.", "tokens": [51064, 9129, 1419, 281, 352, 766, 281, 411, 11, + 291, 362, 281, 352, 484, 17115, 3164, 293, 411, 300, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": 1.6412698412698412, + "no_speech_prob": 0.1377331167459488}, {"id": 162, "seek": 69614, "start": 713.14, + "end": 714.14, "text": " Assuteric stuff.", "tokens": [51214, 6281, 20314, 299, + 1507, 13, 51264], "temperature": 0.0, "avg_logprob": -0.3289938173093996, "compression_ratio": + 1.6412698412698412, "no_speech_prob": 0.1377331167459488}, {"id": 163, "seek": 69614, + "start": 714.14, "end": 718.14, "text": " So 
basically it allows you to try out + some ideas really quickly, right?", "tokens": [51264, 407, 1936, 309, 4045, 291, + 281, 853, 484, 512, 3487, 534, 2661, 11, 558, 30, 51464], "temperature": 0.0, "avg_logprob": + -0.3289938173093996, "compression_ratio": 1.6412698412698412, "no_speech_prob": + 0.1377331167459488}, {"id": 164, "seek": 69614, "start": 718.14, "end": 725.14, + "text": " But then there will be some kind of offset compared with the reality because + tokenizing is in solar probably work differently.", "tokens": [51464, 583, 550, + 456, 486, 312, 512, 733, 295, 18687, 5347, 365, 264, 4103, 570, 14862, 3319, 307, + 294, 7936, 1391, 589, 7614, 13, 51814], "temperature": 0.0, "avg_logprob": -0.3289938173093996, + "compression_ratio": 1.6412698412698412, "no_speech_prob": 0.1377331167459488}, + {"id": 165, "seek": 72514, "start": 725.14, "end": 726.14, "text": " Yeah.", "tokens": + [50364, 865, 13, 50414], "temperature": 0.0, "avg_logprob": -0.35322280757683366, + "compression_ratio": 1.643939393939394, "no_speech_prob": 0.013285395689308643}, + {"id": 166, "seek": 72514, "start": 726.14, "end": 728.14, "text": " But that would + be different probably will be close enough right?", "tokens": [50414, 583, 300, + 576, 312, 819, 1391, 486, 312, 1998, 1547, 558, 30, 50514], "temperature": 0.0, + "avg_logprob": -0.35322280757683366, "compression_ratio": 1.643939393939394, "no_speech_prob": + 0.013285395689308643}, {"id": 167, "seek": 72514, "start": 728.14, "end": 737.14, + "text": " So it''s also if you nailed the signal that like you explained today in + your presentation, you know, what about a number of comments or what about you know + the recent sea and so on.", "tokens": [50514, 407, 309, 311, 611, 498, 291, 30790, + 264, 6358, 300, 411, 291, 8825, 965, 294, 428, 5860, 11, 291, 458, 11, 437, 466, + 257, 1230, 295, 3053, 420, 437, 466, 291, 458, 264, 5162, 4158, 293, 370, 322, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.35322280757683366, 
"compression_ratio": + 1.643939393939394, "no_speech_prob": 0.013285395689308643}, {"id": 168, "seek": + 72514, "start": 737.14, "end": 739.14, "text": " Yeah, all of these be on the board.", + "tokens": [50964, 865, 11, 439, 295, 613, 312, 322, 264, 3150, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.35322280757683366, "compression_ratio": 1.643939393939394, + "no_speech_prob": 0.013285395689308643}, {"id": 169, "seek": 72514, "start": 739.14, + "end": 741.14, "text": " And I''ll like try that really quickly.", "tokens": [51064, + 400, 286, 603, 411, 853, 300, 534, 2661, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.35322280757683366, "compression_ratio": 1.643939393939394, "no_speech_prob": + 0.013285395689308643}, {"id": 170, "seek": 72514, "start": 741.14, "end": 742.14, + "text": " Yeah.", "tokens": [51164, 865, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.35322280757683366, "compression_ratio": 1.643939393939394, "no_speech_prob": + 0.013285395689308643}, {"id": 171, "seek": 72514, "start": 742.14, "end": 743.14, + "text": " Yeah.", "tokens": [51214, 865, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.35322280757683366, "compression_ratio": 1.643939393939394, "no_speech_prob": + 0.013285395689308643}, {"id": 172, "seek": 72514, "start": 743.14, "end": 748.14, + "text": " And a lot of times it''s, it''s a big effort to index some new data into + the search engine.", "tokens": [51264, 400, 257, 688, 295, 1413, 309, 311, 11, 309, + 311, 257, 955, 4630, 281, 8186, 512, 777, 1412, 666, 264, 3164, 2848, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.35322280757683366, "compression_ratio": 1.643939393939394, + "no_speech_prob": 0.013285395689308643}, {"id": 173, "seek": 72514, "start": 748.14, + "end": 749.14, "text": " Yeah.", "tokens": [51514, 865, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.35322280757683366, "compression_ratio": 1.643939393939394, + "no_speech_prob": 0.013285395689308643}, {"id": 174, "seek": 74914, "start": 
749.14, + "end": 754.14, "text": " And you have to go is the upstream system fast enough to + handle those load and then stay up to date.", "tokens": [50364, 400, 291, 362, 281, + 352, 307, 264, 33915, 1185, 2370, 1547, 281, 4813, 729, 3677, 293, 550, 1754, 493, + 281, 4002, 13, 50614], "temperature": 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": + 1.6571428571428573, "no_speech_prob": 0.002668525092303753}, {"id": 175, "seek": + 74914, "start": 754.14, "end": 756.14, "text": " And really to justify a project.", + "tokens": [50614, 400, 534, 281, 20833, 257, 1716, 13, 50714], "temperature": 0.0, + "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, "no_speech_prob": + 0.002668525092303753}, {"id": 176, "seek": 74914, "start": 756.14, "end": 757.14, + "text": " Yeah.", "tokens": [50714, 865, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.1895805278294523, "compression_ratio": 1.6571428571428573, "no_speech_prob": + 0.002668525092303753}, {"id": 177, "seek": 74914, "start": 757.14, "end": 761.14, + "text": " You might start with a prototype and say, OK, I just pull for a small.", + "tokens": [50764, 509, 1062, 722, 365, 257, 19475, 293, 584, 11, 2264, 11, 286, + 445, 2235, 337, 257, 1359, 13, 50964], "temperature": 0.0, "avg_logprob": -0.1895805278294523, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.002668525092303753}, + {"id": 178, "seek": 74914, "start": 761.14, "end": 762.14, "text": " Small chest + set of data.", "tokens": [50964, 15287, 7443, 992, 295, 1412, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.002668525092303753}, {"id": 179, "seek": 74914, "start": 762.14, + "end": 763.14, "text": " Yeah.", "tokens": [51014, 865, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.002668525092303753}, {"id": 180, "seek": 74914, "start": 
763.14, + "end": 764.14, "text": " I pulled in some data.", "tokens": [51064, 286, 7373, 294, + 512, 1412, 13, 51114], "temperature": 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": + 1.6571428571428573, "no_speech_prob": 0.002668525092303753}, {"id": 181, "seek": + 74914, "start": 764.14, "end": 765.14, "text": " It seems like there''s some signal + here.", "tokens": [51114, 467, 2544, 411, 456, 311, 512, 6358, 510, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.002668525092303753}, {"id": 182, "seek": 74914, "start": 765.14, + "end": 766.14, "text": " Yeah.", "tokens": [51164, 865, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.002668525092303753}, {"id": 183, "seek": 74914, "start": 766.14, + "end": 767.14, "text": " Let''s plan a project around it.", "tokens": [51214, 961, + 311, 1393, 257, 1716, 926, 309, 13, 51264], "temperature": 0.0, "avg_logprob": -0.1895805278294523, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.002668525092303753}, + {"id": 184, "seek": 74914, "start": 767.14, "end": 768.14, "text": " Yeah.", "tokens": + [51264, 865, 13, 51314], "temperature": 0.0, "avg_logprob": -0.1895805278294523, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.002668525092303753}, + {"id": 185, "seek": 74914, "start": 768.14, "end": 771.14, "text": " And so this + is to me.", "tokens": [51314, 400, 370, 341, 307, 281, 385, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.002668525092303753}, {"id": 186, "seek": 74914, "start": 771.14, + "end": 774.14, "text": " And actually I''ll, my sees the conference after blue and + buzzwords.", "tokens": [51464, 400, 767, 286, 603, 11, 452, 8194, 264, 7586, 934, + 3344, 293, 13036, 13832, 13, 51614], "temperature": 0.0, 
"avg_logprob": -0.1895805278294523, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.002668525092303753}, + {"id": 187, "seek": 74914, "start": 774.14, "end": 776.14, "text": " I''ll talk + about planning.", "tokens": [51614, 286, 603, 751, 466, 5038, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.1895805278294523, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.002668525092303753}, {"id": 188, "seek": 77614, "start": 776.14, + "end": 788.14, "text": " But to me, a lot of this like how do we build better prototypes + to build plans and have ideas and have conversations between engineers data scientists + and product managers is really one of the inspirations for search array.", "tokens": + [50364, 583, 281, 385, 11, 257, 688, 295, 341, 411, 577, 360, 321, 1322, 1101, 42197, + 281, 1322, 5482, 293, 362, 3487, 293, 362, 7315, 1296, 11955, 1412, 7708, 293, 1674, + 14084, 307, 534, 472, 295, 264, 17432, 763, 337, 3164, 10225, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.2560033117021833, "compression_ratio": 1.6830188679245284, + "no_speech_prob": 0.002589531010016799}, {"id": 189, "seek": 77614, "start": 788.14, + "end": 794.14, "text": " Because to do this before I''d be like, OK, I have to stand + up some examples, certain custom.", "tokens": [50964, 1436, 281, 360, 341, 949, + 286, 1116, 312, 411, 11, 2264, 11, 286, 362, 281, 1463, 493, 512, 5110, 11, 1629, + 2375, 13, 51264], "temperature": 0.0, "avg_logprob": -0.2560033117021833, "compression_ratio": + 1.6830188679245284, "no_speech_prob": 0.002589531010016799}, {"id": 190, "seek": + 77614, "start": 794.14, "end": 796.14, "text": " Yeah, yeah, yeah, yeah, spend time + on that.", "tokens": [51264, 865, 11, 1338, 11, 1338, 11, 1338, 11, 3496, 565, 322, + 300, 13, 51364], "temperature": 0.0, "avg_logprob": -0.2560033117021833, "compression_ratio": + 1.6830188679245284, "no_speech_prob": 0.002589531010016799}, {"id": 191, "seek": + 77614, "start": 796.14, "end": 799.14, 
"text": " Yeah, how am I going to absolutely + locate a cluster and whatnot.", "tokens": [51364, 865, 11, 577, 669, 286, 516, 281, + 3122, 22370, 257, 13630, 293, 25882, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.2560033117021833, "compression_ratio": 1.6830188679245284, "no_speech_prob": + 0.002589531010016799}, {"id": 192, "seek": 77614, "start": 799.14, "end": 800.14, + "text": " Yeah, exactly.", "tokens": [51514, 865, 11, 2293, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.2560033117021833, "compression_ratio": 1.6830188679245284, + "no_speech_prob": 0.002589531010016799}, {"id": 193, "seek": 77614, "start": 800.14, + "end": 801.14, "text": " Yeah.", "tokens": [51564, 865, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.2560033117021833, "compression_ratio": 1.6830188679245284, + "no_speech_prob": 0.002589531010016799}, {"id": 194, "seek": 80114, "start": 801.14, + "end": 805.14, "text": " And I think I actually this reminded me when I was working + on the learning to rank.", "tokens": [50364, 400, 286, 519, 286, 767, 341, 15920, + 385, 562, 286, 390, 1364, 322, 264, 2539, 281, 6181, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.2879345005956189, "compression_ratio": 1.533596837944664, + "no_speech_prob": 0.21029916405677795}, {"id": 195, "seek": 80114, "start": 805.14, + "end": 809.14, "text": " And this was the last project I did in my 10 year tenure + at Alpha Cent.", "tokens": [50564, 400, 341, 390, 264, 1036, 1716, 286, 630, 294, + 452, 1266, 1064, 32256, 412, 20588, 3408, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.2879345005956189, "compression_ratio": 1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 196, "seek": 80114, "start": 809.14, "end": 813.14, "text": " I said, Hey, + can I have this really expensive laptop?", "tokens": [50764, 286, 848, 11, 1911, + 11, 393, 286, 362, 341, 534, 5124, 10732, 30, 50964], "temperature": 0.0, "avg_logprob": + -0.2879345005956189, "compression_ratio": 
1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 197, "seek": 80114, "start": 813.14, "end": 817.14, "text": " So it would + be like 30 gig around one terabyte drive.", "tokens": [50964, 407, 309, 576, 312, + 411, 2217, 8741, 926, 472, 1796, 34529, 3332, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.2879345005956189, "compression_ratio": 1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 198, "seek": 80114, "start": 817.14, "end": 820.14, "text": " I thought I + need so much space for some reason.", "tokens": [51164, 286, 1194, 286, 643, 370, + 709, 1901, 337, 512, 1778, 13, 51314], "temperature": 0.0, "avg_logprob": -0.2879345005956189, + "compression_ratio": 1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 199, "seek": 80114, "start": 820.14, "end": 822.14, "text": " And it was + SSD.", "tokens": [51314, 400, 309, 390, 30262, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.2879345005956189, "compression_ratio": 1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 200, "seek": 80114, "start": 822.14, "end": 824.14, "text": " And and I got + it approved.", "tokens": [51414, 400, 293, 286, 658, 309, 10826, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.2879345005956189, "compression_ratio": 1.533596837944664, + "no_speech_prob": 0.21029916405677795}, {"id": 201, "seek": 80114, "start": 824.14, + "end": 825.14, "text": " It was like, oh, my God.", "tokens": [51514, 467, 390, + 411, 11, 1954, 11, 452, 1265, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2879345005956189, + "compression_ratio": 1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 202, "seek": 80114, "start": 825.14, "end": 826.14, "text": " So many.", + "tokens": [51564, 407, 867, 13, 51614], "temperature": 0.0, "avg_logprob": -0.2879345005956189, + "compression_ratio": 1.533596837944664, "no_speech_prob": 0.21029916405677795}, + {"id": 203, "seek": 82614, "start": 826.14, "end": 833.14, "text": " So 
and I spent + like a year working on it and was kind of like bloated version of search array because + I could do everything on the laptop.", "tokens": [50364, 407, 293, 286, 4418, 411, + 257, 1064, 1364, 322, 309, 293, 390, 733, 295, 411, 1749, 770, 3037, 295, 3164, + 10225, 570, 286, 727, 360, 1203, 322, 264, 10732, 13, 50714], "temperature": 0.0, + "avg_logprob": -0.25445001315226595, "compression_ratio": 1.6863468634686347, "no_speech_prob": + 0.09308959543704987}, {"id": 204, "seek": 82614, "start": 833.14, "end": 834.14, + "text": " This connected right.", "tokens": [50714, 639, 4582, 558, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.25445001315226595, "compression_ratio": 1.6863468634686347, + "no_speech_prob": 0.09308959543704987}, {"id": 205, "seek": 82614, "start": 834.14, + "end": 840.14, "text": " The only problem I remember was tracking my experiment + tree because I would like.", "tokens": [50764, 440, 787, 1154, 286, 1604, 390, 11603, + 452, 5120, 4230, 570, 286, 576, 411, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.25445001315226595, "compression_ratio": 1.6863468634686347, "no_speech_prob": + 0.09308959543704987}, {"id": 206, "seek": 82614, "start": 840.14, "end": 843.14, + "text": " Let''s the right word for Kate or like kind of branch out.", "tokens": + [51064, 961, 311, 264, 558, 1349, 337, 16251, 420, 411, 733, 295, 9819, 484, 13, + 51214], "temperature": 0.0, "avg_logprob": -0.25445001315226595, "compression_ratio": + 1.6863468634686347, "no_speech_prob": 0.09308959543704987}, {"id": 207, "seek": + 82614, "start": 843.14, "end": 844.14, "text": " That''s right.", "tokens": [51214, + 663, 311, 558, 13, 51264], "temperature": 0.0, "avg_logprob": -0.25445001315226595, + "compression_ratio": 1.6863468634686347, "no_speech_prob": 0.09308959543704987}, + {"id": 208, "seek": 82614, "start": 844.14, "end": 853.14, "text": " And then like, + OK, should I go back now because retreat because looks looks like I went down the + rabbit hole 
and it doesn''t give at any value.", "tokens": [51264, 400, 550, 411, + 11, 2264, 11, 820, 286, 352, 646, 586, 570, 15505, 570, 1542, 1542, 411, 286, 1437, + 760, 264, 19509, 5458, 293, 309, 1177, 380, 976, 412, 604, 2158, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.25445001315226595, "compression_ratio": 1.6863468634686347, + "no_speech_prob": 0.09308959543704987}, {"id": 209, "seek": 85314, "start": 853.14, + "end": 859.14, "text": " So you go back to that state when it was that bigger, you + know, and this is you or whatever, right?", "tokens": [50364, 407, 291, 352, 646, + 281, 300, 1785, 562, 309, 390, 300, 3801, 11, 291, 458, 11, 293, 341, 307, 291, + 420, 2035, 11, 558, 30, 50664], "temperature": 0.0, "avg_logprob": -0.37838138442441643, + "compression_ratio": 1.5776699029126213, "no_speech_prob": 0.07478807866573334}, + {"id": 210, "seek": 85314, "start": 859.14, "end": 871.14, "text": " So instead + of from there, that sounds a lot of like what some of the functionality in cute + that the sort of like search relevance turning tool to you, you, you, PID.", "tokens": + [50664, 407, 2602, 295, 490, 456, 11, 300, 3263, 257, 688, 295, 411, 437, 512, 295, + 264, 14980, 294, 4052, 300, 264, 1333, 295, 411, 3164, 32684, 6246, 2290, 281, 291, + 11, 291, 11, 291, 11, 430, 2777, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.37838138442441643, "compression_ratio": 1.5776699029126213, "no_speech_prob": + 0.07478807866573334}, {"id": 211, "seek": 85314, "start": 871.14, "end": 875.14, + "text": " Because I remember when I was building that many years ago.", "tokens": + [51264, 1436, 286, 1604, 562, 286, 390, 2390, 300, 867, 924, 2057, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.37838138442441643, "compression_ratio": 1.5776699029126213, + "no_speech_prob": 0.07478807866573334}, {"id": 212, "seek": 87514, "start": 875.14, + "end": 883.14, "text": " It was very much like every time you submit like you tweak + the query and it saves that as a try.", 
"tokens": [50364, 467, 390, 588, 709, 411, + 633, 565, 291, 10315, 411, 291, 29879, 264, 14581, 293, 309, 19155, 300, 382, 257, + 853, 13, 50764], "temperature": 0.0, "avg_logprob": -0.20984344482421874, "compression_ratio": + 1.6575342465753424, "no_speech_prob": 0.5611611604690552}, {"id": 213, "seek": 87514, + "start": 883.14, "end": 889.14, "text": " Yeah, another time you can''t fork off + stuff, but you can go back and be like, oh, this thing didn''t work out. I''m going + to go back.", "tokens": [50764, 865, 11, 1071, 565, 291, 393, 380, 17716, 766, 1507, + 11, 457, 291, 393, 352, 646, 293, 312, 411, 11, 1954, 11, 341, 551, 994, 380, 589, + 484, 13, 286, 478, 516, 281, 352, 646, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.20984344482421874, "compression_ratio": 1.6575342465753424, "no_speech_prob": + 0.5611611604690552}, {"id": 214, "seek": 87514, "start": 889.14, "end": 896.14, + "text": " And yeah, it''s it is funny how yeah, I have the same feeling. And even + in a notebook environment.", "tokens": [51064, 400, 1338, 11, 309, 311, 309, 307, + 4074, 577, 1338, 11, 286, 362, 264, 912, 2633, 13, 400, 754, 294, 257, 21060, 2823, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.20984344482421874, "compression_ratio": + 1.6575342465753424, "no_speech_prob": 0.5611611604690552}, {"id": 215, "seek": 87514, + "start": 896.14, "end": 898.14, "text": " I don''t have that because notebooks.", + "tokens": [51414, 286, 500, 380, 362, 300, 570, 43782, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.20984344482421874, "compression_ratio": 1.6575342465753424, + "no_speech_prob": 0.5611611604690552}, {"id": 216, "seek": 89814, "start": 898.14, + "end": 906.14, "text": " And you tweak a little bit, you forget what happened. You + like, why did I thought my end ECG was good and I got bad. 
What did I do wrong?", + "tokens": [50364, 400, 291, 29879, 257, 707, 857, 11, 291, 2870, 437, 2011, 13, + 509, 411, 11, 983, 630, 286, 1194, 452, 917, 19081, 38, 390, 665, 293, 286, 658, + 1578, 13, 708, 630, 286, 360, 2085, 30, 50764], "temperature": 0.0, "avg_logprob": + -0.2177843451499939, "compression_ratio": 1.631578947368421, "no_speech_prob": 0.3629343509674072}, + {"id": 217, "seek": 89814, "start": 906.14, "end": 912.14, "text": " You wish that + you were like somehow the notebook was like versioning itself as you were going.", + "tokens": [50764, 509, 3172, 300, 291, 645, 411, 6063, 264, 21060, 390, 411, 3037, + 278, 2564, 382, 291, 645, 516, 13, 51064], "temperature": 0.0, "avg_logprob": -0.2177843451499939, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.3629343509674072}, {"id": + 218, "seek": 89814, "start": 912.14, "end": 914.14, "text": " Yeah, exactly.", "tokens": + [51064, 865, 11, 2293, 13, 51164], "temperature": 0.0, "avg_logprob": -0.2177843451499939, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.3629343509674072}, {"id": + 219, "seek": 89814, "start": 914.14, "end": 920.14, "text": " And but and like somehow + the whole environment was version, but yeah, that doesn''t exist. I wish that kind + of thing existed.", "tokens": [51164, 400, 457, 293, 411, 6063, 264, 1379, 2823, + 390, 3037, 11, 457, 1338, 11, 300, 1177, 380, 2514, 13, 286, 3172, 300, 733, 295, + 551, 13135, 13, 51464], "temperature": 0.0, "avg_logprob": -0.2177843451499939, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.3629343509674072}, {"id": + 220, "seek": 92014, "start": 920.14, "end": 928.14, "text": " Yeah, there was there + was a tool we were using, but then it was acquired by some company. 
It was called + spell dot run.", "tokens": [50364, 865, 11, 456, 390, 456, 390, 257, 2290, 321, + 645, 1228, 11, 457, 550, 309, 390, 17554, 538, 512, 2237, 13, 467, 390, 1219, 9827, + 5893, 1190, 13, 50764], "temperature": 0.0, "avg_logprob": -0.14957671165466307, + "compression_ratio": 1.5938697318007662, "no_speech_prob": 0.03698601573705673}, + {"id": 221, "seek": 92014, "start": 928.14, "end": 934.14, "text": " So basically + was like a integrated Python notebook environment that runs a cluster.", "tokens": + [50764, 407, 1936, 390, 411, 257, 10919, 15329, 21060, 2823, 300, 6676, 257, 13630, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.14957671165466307, "compression_ratio": + 1.5938697318007662, "no_speech_prob": 0.03698601573705673}, {"id": 222, "seek": + 92014, "start": 934.14, "end": 938.14, "text": " And they were heading in the direction. + I was giving a lot of feedback to them.", "tokens": [51064, 400, 436, 645, 9864, + 294, 264, 3513, 13, 286, 390, 2902, 257, 688, 295, 5824, 281, 552, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.14957671165466307, "compression_ratio": 1.5938697318007662, + "no_speech_prob": 0.03698601573705673}, {"id": 223, "seek": 92014, "start": 938.14, + "end": 946.14, "text": " Hey, can you actually build an infrastructure, which will + allow me to also, you know, maintain my branched out, you know, experiment.", "tokens": + [51264, 1911, 11, 393, 291, 767, 1322, 364, 6896, 11, 597, 486, 2089, 385, 281, + 611, 11, 291, 458, 11, 6909, 452, 9819, 292, 484, 11, 291, 458, 11, 5120, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.14957671165466307, "compression_ratio": 1.5938697318007662, + "no_speech_prob": 0.03698601573705673}, {"id": 224, "seek": 94614, "start": 946.14, + "end": 954.14, "text": " And I think they got acquired probably short before they + could do this and probably they continue doing this. 
I don''t know.", "tokens": + [50364, 400, 286, 519, 436, 658, 17554, 1391, 2099, 949, 436, 727, 360, 341, 293, + 1391, 436, 2354, 884, 341, 13, 286, 500, 380, 458, 13, 50764], "temperature": 0.0, + "avg_logprob": -0.2650107984189634, "compression_ratio": 1.7176470588235293, "no_speech_prob": + 0.17509548366069794}, {"id": 225, "seek": 94614, "start": 954.14, "end": 962.14, + "text": " But there is another project called DVC, I think, which allows you to + basically maintain your experiments as deep hashes, right.", "tokens": [50764, 583, + 456, 307, 1071, 1716, 1219, 17021, 34, 11, 286, 519, 11, 597, 4045, 291, 281, 1936, + 6909, 428, 12050, 382, 2452, 575, 8076, 11, 558, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.2650107984189634, "compression_ratio": 1.7176470588235293, "no_speech_prob": + 0.17509548366069794}, {"id": 226, "seek": 94614, "start": 962.14, "end": 967.14, + "text": " So you basically, you can, you can, you can get cash your code along with + your data.", "tokens": [51164, 407, 291, 1936, 11, 291, 393, 11, 291, 393, 11, 291, + 393, 483, 6388, 428, 3089, 2051, 365, 428, 1412, 13, 51414], "temperature": 0.0, + "avg_logprob": -0.2650107984189634, "compression_ratio": 1.7176470588235293, "no_speech_prob": + 0.17509548366069794}, {"id": 227, "seek": 94614, "start": 967.14, "end": 973.14, + "text": " And then you upload your data. Let''s say to some cloud, I don''t know, + drive some drive abstract one.", "tokens": [51414, 400, 550, 291, 6580, 428, 1412, + 13, 961, 311, 584, 281, 512, 4588, 11, 286, 500, 380, 458, 11, 3332, 512, 3332, + 12649, 472, 13, 51714], "temperature": 0.0, "avg_logprob": -0.2650107984189634, + "compression_ratio": 1.7176470588235293, "no_speech_prob": 0.17509548366069794}, + {"id": 228, "seek": 97314, "start": 973.14, "end": 980.14, "text": " And then you + have your code associated with that. 
So you can basically restore you or someone + else that can restore the experiment.", "tokens": [50364, 400, 550, 291, 362, 428, + 3089, 6615, 365, 300, 13, 407, 291, 393, 1936, 15227, 291, 420, 1580, 1646, 300, + 393, 15227, 264, 5120, 13, 50714], "temperature": 0.0, "avg_logprob": -0.2212868118286133, + "compression_ratio": 1.6311475409836065, "no_speech_prob": 0.024939825758337975}, + {"id": 229, "seek": 97314, "start": 980.14, "end": 984.14, "text": " I think if + that if that was frictionless, right.", "tokens": [50714, 286, 519, 498, 300, 498, + 300, 390, 17710, 1832, 11, 558, 13, 50914], "temperature": 0.0, "avg_logprob": -0.2212868118286133, + "compression_ratio": 1.6311475409836065, "no_speech_prob": 0.024939825758337975}, + {"id": 230, "seek": 97314, "start": 984.14, "end": 993.14, "text": " Or even the + matter of it existing, right, because I had to literally like write something down + on a piece of paper to remember what I need to do.", "tokens": [50914, 1610, 754, + 264, 1871, 295, 309, 6741, 11, 558, 11, 570, 286, 632, 281, 3736, 411, 2464, 746, + 760, 322, 257, 2522, 295, 3035, 281, 1604, 437, 286, 643, 281, 360, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.2212868118286133, "compression_ratio": 1.6311475409836065, + "no_speech_prob": 0.024939825758337975}, {"id": 231, "seek": 97314, "start": 993.14, + "end": 998.14, "text": " You know, sometimes I would go crazy at shopify. So we + had a, we had a,", "tokens": [51364, 509, 458, 11, 2171, 286, 576, 352, 3219, 412, + 3945, 2505, 13, 407, 321, 632, 257, 11, 321, 632, 257, 11, 51614], "temperature": + 0.0, "avg_logprob": -0.2212868118286133, "compression_ratio": 1.6311475409836065, + "no_speech_prob": 0.024939825758337975}, {"id": 232, "seek": 99814, "start": 998.14, + "end": 1005.14, "text": " our testing, search testing infrastructure, the notebooks. 
+ So everything was in a monorepo.", "tokens": [50364, 527, 4997, 11, 3164, 4997, + 6896, 11, 264, 43782, 13, 407, 1203, 390, 294, 257, 1108, 418, 2259, 13, 50714], + "temperature": 0.0, "avg_logprob": -0.33198479683168475, "compression_ratio": 1.6460176991150441, + "no_speech_prob": 0.010275633074343204}, {"id": 233, "seek": 99814, "start": 1005.14, + "end": 1012.14, "text": " So you could stand up elastic search. And you could have, + we had a, this is a rails environment.", "tokens": [50714, 407, 291, 727, 1463, + 493, 17115, 3164, 13, 400, 291, 727, 362, 11, 321, 632, 257, 11, 341, 307, 257, + 27649, 2823, 13, 51064], "temperature": 0.0, "avg_logprob": -0.33198479683168475, + "compression_ratio": 1.6460176991150441, "no_speech_prob": 0.010275633074343204}, + {"id": 234, "seek": 99814, "start": 1012.14, "end": 1023.14, "text": " There''s + a, all the relevance logic was in a Ruby library that we, a rails monolith would + load and so call in the network call, a lasso search and do whatever, pre and post + processing.", "tokens": [51064, 821, 311, 257, 11, 439, 264, 32684, 9952, 390, 294, + 257, 19907, 6405, 300, 321, 11, 257, 27649, 1108, 29131, 576, 3677, 293, 370, 818, + 294, 264, 3209, 818, 11, 257, 2439, 539, 3164, 293, 360, 2035, 11, 659, 293, 2183, + 9007, 13, 51614], "temperature": 0.0, "avg_logprob": -0.33198479683168475, "compression_ratio": + 1.6460176991150441, "no_speech_prob": 0.010275633074343204}, {"id": 235, "seek": + 102314, "start": 1023.14, "end": 1037.1399999999999, "text": " But when we stood + up the test environment, we would say, we wanted to load this. 
We wanted to load + the right configs, but we would basically put the company commit tasks of the repo + that we,", "tokens": [50364, 583, 562, 321, 9371, 493, 264, 1500, 2823, 11, 321, + 576, 584, 11, 321, 1415, 281, 3677, 341, 13, 492, 1415, 281, 3677, 264, 558, 6662, + 82, 11, 457, 321, 576, 1936, 829, 264, 2237, 5599, 9608, 295, 264, 49040, 300, 321, + 11, 51064], "temperature": 0.0, "avg_logprob": -0.25421034495035805, "compression_ratio": + 1.7380952380952381, "no_speech_prob": 0.2891211211681366}, {"id": 236, "seek": 102314, + "start": 1037.1399999999999, "end": 1045.1399999999999, "text": " that it was supposed + to be and it would load the config and it would be amazing. But yeah, it''s, it''s + getting reproducible environments.", "tokens": [51064, 300, 309, 390, 3442, 281, + 312, 293, 309, 576, 3677, 264, 6662, 293, 309, 576, 312, 2243, 13, 583, 1338, 11, + 309, 311, 11, 309, 311, 1242, 11408, 32128, 12388, 13, 51464], "temperature": 0.0, + "avg_logprob": -0.25421034495035805, "compression_ratio": 1.7380952380952381, "no_speech_prob": + 0.2891211211681366}, {"id": 237, "seek": 102314, "start": 1045.1399999999999, "end": + 1048.1399999999999, "text": " Yeah, and experiments is a challenge.", "tokens": + [51464, 865, 11, 293, 12050, 307, 257, 3430, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.25421034495035805, "compression_ratio": 1.7380952380952381, "no_speech_prob": + 0.2891211211681366}, {"id": 238, "seek": 104814, "start": 1048.14, "end": 1058.14, + "text": " This is where, I mean, at some point, basically like your experiment rate + will be, you know, trumped by how quickly you can deploy this or how quickly you + can like,", "tokens": [50364, 639, 307, 689, 11, 286, 914, 11, 412, 512, 935, 11, + 1936, 411, 428, 5120, 3314, 486, 312, 11, 291, 458, 11, 21779, 292, 538, 577, 2661, + 291, 393, 7274, 341, 420, 577, 2661, 291, 393, 411, 11, 50864], "temperature": 0.0, + "avg_logprob": -0.32446371031201576, "compression_ratio": 1.830827067669173, 
"no_speech_prob": + 0.10876153409481049}, {"id": 239, "seek": 104814, "start": 1058.14, "end": 1065.14, + "text": " shuffle things, right? So yeah, I think so. This is where infrastructure + comes as a big topic and in your talk.", "tokens": [50864, 39426, 721, 11, 558, + 30, 407, 1338, 11, 286, 519, 370, 13, 639, 307, 689, 6896, 1487, 382, 257, 955, + 4829, 293, 294, 428, 751, 13, 51214], "temperature": 0.0, "avg_logprob": -0.32446371031201576, + "compression_ratio": 1.830827067669173, "no_speech_prob": 0.10876153409481049}, + {"id": 240, "seek": 104814, "start": 1065.14, "end": 1077.14, "text": " Yeah, I + think you spent a good, yeah. Yeah. I think it''s like partnership, right? So we + had a, there has to be like a big steam in my career too as partnerships like partnerships + with PM partnerships with data.", "tokens": [51214, 865, 11, 286, 519, 291, 4418, + 257, 665, 11, 1338, 13, 865, 13, 286, 519, 309, 311, 411, 9982, 11, 558, 30, 407, + 321, 632, 257, 11, 456, 575, 281, 312, 411, 257, 955, 11952, 294, 452, 3988, 886, + 382, 18245, 411, 18245, 365, 12499, 18245, 365, 1412, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.32446371031201576, "compression_ratio": 1.830827067669173, + "no_speech_prob": 0.10876153409481049}, {"id": 241, "seek": 107714, "start": 1077.14, + "end": 1089.14, "text": " Yeah, partnerships with infrastructure. 
You really have + to have one cohesive team and one of the anti patterns is when they''re so separate + that it creates.", "tokens": [50364, 865, 11, 18245, 365, 6896, 13, 509, 534, 362, + 281, 362, 472, 43025, 1469, 293, 472, 295, 264, 6061, 8294, 307, 562, 436, 434, + 370, 4994, 300, 309, 7829, 13, 50964], "temperature": 0.0, "avg_logprob": -0.16034182380227482, + "compression_ratio": 1.649789029535865, "no_speech_prob": 0.0009954140987247229}, + {"id": 242, "seek": 107714, "start": 1089.14, "end": 1090.14, "text": " I agree.", + "tokens": [50964, 286, 3986, 13, 51014], "temperature": 0.0, "avg_logprob": -0.16034182380227482, + "compression_ratio": 1.649789029535865, "no_speech_prob": 0.0009954140987247229}, + {"id": 243, "seek": 107714, "start": 1090.14, "end": 1092.14, "text": " You have + to throw a requirement over the fence. Yeah.", "tokens": [51014, 509, 362, 281, + 3507, 257, 11695, 670, 264, 15422, 13, 865, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.16034182380227482, "compression_ratio": 1.649789029535865, "no_speech_prob": + 0.0009954140987247229}, {"id": 244, "seek": 107714, "start": 1092.14, "end": 1096.14, + "text": " And then a month later, maybe you get something back, but it''s not quite + what you want.", "tokens": [51114, 400, 550, 257, 1618, 1780, 11, 1310, 291, 483, + 746, 646, 11, 457, 309, 311, 406, 1596, 437, 291, 528, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.16034182380227482, "compression_ratio": 1.649789029535865, + "no_speech_prob": 0.0009954140987247229}, {"id": 245, "seek": 107714, "start": 1096.14, + "end": 1100.14, "text": " You really have to act like one team. 
Yeah.", "tokens": + [51314, 509, 534, 362, 281, 605, 411, 472, 1469, 13, 865, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.16034182380227482, "compression_ratio": 1.649789029535865, + "no_speech_prob": 0.0009954140987247229}, {"id": 246, "seek": 107714, "start": 1100.14, + "end": 1102.14, "text": " And search is so multi functional.", "tokens": [51514, + 400, 3164, 307, 370, 4825, 11745, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.16034182380227482, "compression_ratio": 1.649789029535865, "no_speech_prob": + 0.0009954140987247229}, {"id": 247, "seek": 107714, "start": 1102.14, "end": 1103.14, + "text": " Yeah.", "tokens": [51614, 865, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.16034182380227482, "compression_ratio": 1.649789029535865, "no_speech_prob": + 0.0009954140987247229}, {"id": 248, "seek": 110314, "start": 1103.14, "end": 1111.14, + "text": " And I''ve seen it at Shopify, the challenge was like infrastructure was + a different or.", "tokens": [50364, 400, 286, 600, 1612, 309, 412, 43991, 11, 264, + 3430, 390, 411, 6896, 390, 257, 819, 420, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.23146100791103869, "compression_ratio": 1.5825242718446602, "no_speech_prob": + 0.003333881963044405}, {"id": 249, "seek": 110314, "start": 1111.14, "end": 1119.14, + "text": " And so we would throw things back and forth over the fence and be not + quite right. 
We want to and we have to figure out the right way to partner.", "tokens": + [50764, 400, 370, 321, 576, 3507, 721, 646, 293, 5220, 670, 264, 15422, 293, 312, + 406, 1596, 558, 13, 492, 528, 281, 293, 321, 362, 281, 2573, 484, 264, 558, 636, + 281, 4975, 13, 51164], "temperature": 0.0, "avg_logprob": -0.23146100791103869, + "compression_ratio": 1.5825242718446602, "no_speech_prob": 0.003333881963044405}, + {"id": 250, "seek": 110314, "start": 1119.14, "end": 1126.14, "text": " At Reddit, + it''s a bit more, we have a bit more of a challenge that data is a different group.", + "tokens": [51164, 1711, 32210, 11, 309, 311, 257, 857, 544, 11, 321, 362, 257, 857, + 544, 295, 257, 3430, 300, 1412, 307, 257, 819, 1594, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.23146100791103869, "compression_ratio": 1.5825242718446602, + "no_speech_prob": 0.003333881963044405}, {"id": 251, "seek": 112614, "start": 1126.14, + "end": 1135.14, "text": " So we''re sort of throwing things over the fence, getting + things back. Yeah. And so we have to like actively work to make sure those partnerships + are like are healthy. Yeah.", "tokens": [50364, 407, 321, 434, 1333, 295, 10238, + 721, 670, 264, 15422, 11, 1242, 721, 646, 13, 865, 13, 400, 370, 321, 362, 281, + 411, 13022, 589, 281, 652, 988, 729, 18245, 366, 411, 366, 4627, 13, 865, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.1591036779838696, "compression_ratio": 1.7429577464788732, + "no_speech_prob": 0.00650776457041502}, {"id": 252, "seek": 112614, "start": 1135.14, + "end": 1144.14, "text": " But it''s a big challenge. 
And I think like organizationally, + there are reasons that companies separate things out that are beyond search.", "tokens": + [50814, 583, 309, 311, 257, 955, 3430, 13, 400, 286, 519, 411, 4475, 379, 11, 456, + 366, 4112, 300, 3431, 4994, 721, 484, 300, 366, 4399, 3164, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.1591036779838696, "compression_ratio": 1.7429577464788732, + "no_speech_prob": 0.00650776457041502}, {"id": 253, "seek": 112614, "start": 1144.14, + "end": 1153.14, "text": " So it''s, it''s not like there''s a easy solution, but + it''s, it''s definitely when you get to search with these data products, like not + just search for recommendations and feed and things.", "tokens": [51264, 407, 309, + 311, 11, 309, 311, 406, 411, 456, 311, 257, 1858, 3827, 11, 457, 309, 311, 11, 309, + 311, 2138, 562, 291, 483, 281, 3164, 365, 613, 1412, 3383, 11, 411, 406, 445, 3164, + 337, 10434, 293, 3154, 293, 721, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.1591036779838696, "compression_ratio": 1.7429577464788732, "no_speech_prob": + 0.00650776457041502}, {"id": 254, "seek": 115314, "start": 1153.14, "end": 1166.14, + "text": " These become having cross functional partnerships and not only cross functional + partnerships, but individuals who can work beyond their domain and get them for + like their multiple hats.", "tokens": [50364, 1981, 1813, 1419, 3278, 11745, 18245, + 293, 406, 787, 3278, 11745, 18245, 11, 457, 5346, 567, 393, 589, 4399, 641, 9274, + 293, 483, 552, 337, 411, 641, 3866, 20549, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.29346837997436526, "compression_ratio": 1.6018518518518519, "no_speech_prob": + 0.049247436225414276}, {"id": 255, "seek": 115314, "start": 1166.14, "end": 1167.14, + "text": " Yeah.", "tokens": [51014, 865, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.29346837997436526, "compression_ratio": 1.6018518518518519, "no_speech_prob": + 0.049247436225414276}, {"id": 256, "seek": 115314, "start": 1167.14, 
"end": 1169.14, + "text": " Is really important.", "tokens": [51064, 1119, 534, 1021, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.29346837997436526, "compression_ratio": 1.6018518518518519, + "no_speech_prob": 0.049247436225414276}, {"id": 257, "seek": 115314, "start": 1169.14, + "end": 1179.14, "text": " Yeah, I think you''re absolutely spot on the, you know, + and in back in the previous company, declined of silo AI and now I don''t know.", + "tokens": [51164, 865, 11, 286, 519, 291, 434, 3122, 4008, 322, 264, 11, 291, 458, + 11, 293, 294, 646, 294, 264, 3894, 2237, 11, 29213, 295, 3425, 78, 7318, 293, 586, + 286, 500, 380, 458, 13, 51664], "temperature": 0.0, "avg_logprob": -0.29346837997436526, + "compression_ratio": 1.6018518518518519, "no_speech_prob": 0.049247436225414276}, + {"id": 258, "seek": 117914, "start": 1179.14, "end": 1195.14, "text": " I feel sort + of like the same, but one thing I found after like breaking some arrows in the beginning, + I found that if you can try to find the mutual benefits, that they will be driven + as well as you.", "tokens": [50364, 286, 841, 1333, 295, 411, 264, 912, 11, 457, + 472, 551, 286, 1352, 934, 411, 7697, 512, 19669, 294, 264, 2863, 11, 286, 1352, + 300, 498, 291, 393, 853, 281, 915, 264, 16917, 5311, 11, 300, 436, 486, 312, 9555, + 382, 731, 382, 291, 13, 51164], "temperature": 0.0, "avg_logprob": -0.2047961378750736, + "compression_ratio": 1.5228426395939085, "no_speech_prob": 0.3792873024940491}, + {"id": 259, "seek": 117914, "start": 1195.14, "end": 1199.14, "text": " Yeah, you + don''t know what''s the outcome going to be because experiments are always like + that. 
Right.", "tokens": [51164, 865, 11, 291, 500, 380, 458, 437, 311, 264, 9700, + 516, 281, 312, 570, 12050, 366, 1009, 411, 300, 13, 1779, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.2047961378750736, "compression_ratio": 1.5228426395939085, + "no_speech_prob": 0.3792873024940491}, {"id": 260, "seek": 119914, "start": 1199.14, + "end": 1212.14, "text": " Yeah, but the fact that we are having an experiment cross + department, this is amazing. Then you go to all these meetings with executives and + you say you, you praise them and they probably some way of praise you, like your + team.", "tokens": [50364, 865, 11, 457, 264, 1186, 300, 321, 366, 1419, 364, 5120, + 3278, 5882, 11, 341, 307, 2243, 13, 1396, 291, 352, 281, 439, 613, 8410, 365, 28485, + 293, 291, 584, 291, 11, 291, 13286, 552, 293, 436, 1391, 512, 636, 295, 13286, 291, + 11, 411, 428, 1469, 13, 51014], "temperature": 0.0, "avg_logprob": -0.20556675946270977, + "compression_ratio": 1.7011494252873562, "no_speech_prob": 0.618987500667572}, {"id": + 261, "seek": 119914, "start": 1212.14, "end": 1215.14, "text": " And, and that''s + how you get the right.", "tokens": [51014, 400, 11, 293, 300, 311, 577, 291, 483, + 264, 558, 13, 51164], "temperature": 0.0, "avg_logprob": -0.20556675946270977, "compression_ratio": + 1.7011494252873562, "no_speech_prob": 0.618987500667572}, {"id": 262, "seek": 119914, + "start": 1215.14, "end": 1219.14, "text": " But, but things happen things happen. + You just need to be persistent. I guess.", "tokens": [51164, 583, 11, 457, 721, + 1051, 721, 1051, 13, 509, 445, 643, 281, 312, 24315, 13, 286, 2041, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.20556675946270977, "compression_ratio": 1.7011494252873562, + "no_speech_prob": 0.618987500667572}, {"id": 263, "seek": 119914, "start": 1219.14, + "end": 1224.14, "text": " Yeah, things happen all the time. 
And like, you know, + organizational changes are like the weather.", "tokens": [51364, 865, 11, 721, 1051, + 439, 264, 565, 13, 400, 411, 11, 291, 458, 11, 24730, 2962, 366, 411, 264, 5503, + 13, 51614], "temperature": 0.0, "avg_logprob": -0.20556675946270977, "compression_ratio": + 1.7011494252873562, "no_speech_prob": 0.618987500667572}, {"id": 264, "seek": 122414, + "start": 1224.14, "end": 1226.14, "text": " You never know.", "tokens": [50364, + 509, 1128, 458, 13, 50464], "temperature": 0.0, "avg_logprob": -0.3169645083847866, + "compression_ratio": 1.6051502145922747, "no_speech_prob": 0.048401039093732834}, + {"id": 265, "seek": 122414, "start": 1226.14, "end": 1231.14, "text": " There''s + going to be a reward for someone comes in with a new perspective or whatever.", + "tokens": [50464, 821, 311, 516, 281, 312, 257, 7782, 337, 1580, 1487, 294, 365, + 257, 777, 4585, 420, 2035, 13, 50714], "temperature": 0.0, "avg_logprob": -0.3169645083847866, + "compression_ratio": 1.6051502145922747, "no_speech_prob": 0.048401039093732834}, + {"id": 266, "seek": 122414, "start": 1231.14, "end": 1237.14, "text": " And that''s + another career lesson is not to get too caught up like emotionally.", "tokens": + [50714, 400, 300, 311, 1071, 3988, 6898, 307, 406, 281, 483, 886, 5415, 493, 411, + 17991, 13, 51014], "temperature": 0.0, "avg_logprob": -0.3169645083847866, "compression_ratio": + 1.6051502145922747, "no_speech_prob": 0.048401039093732834}, {"id": 267, "seek": + 122414, "start": 1237.14, "end": 1239.14, "text": " Oh, yeah, something happens.", + "tokens": [51014, 876, 11, 1338, 11, 746, 2314, 13, 51114], "temperature": 0.0, + "avg_logprob": -0.3169645083847866, "compression_ratio": 1.6051502145922747, "no_speech_prob": + 0.048401039093732834}, {"id": 268, "seek": 122414, "start": 1239.14, "end": 1248.14, + "text": " Yeah, it''s not a lot of times it''s just that so many things are on your + control that are just like out in the politics or whatever organization 
changes + of the home.", "tokens": [51114, 865, 11, 309, 311, 406, 257, 688, 295, 1413, 309, + 311, 445, 300, 370, 867, 721, 366, 322, 428, 1969, 300, 366, 445, 411, 484, 294, + 264, 7341, 420, 2035, 4475, 2962, 295, 264, 1280, 13, 51564], "temperature": 0.0, + "avg_logprob": -0.3169645083847866, "compression_ratio": 1.6051502145922747, "no_speech_prob": + 0.048401039093732834}, {"id": 269, "seek": 124814, "start": 1248.14, "end": 1256.14, + "text": " Yeah, I think it''s sort of mental model called circles of interest in + the inner one. It''s like your direct control.", "tokens": [50364, 865, 11, 286, + 519, 309, 311, 1333, 295, 4973, 2316, 1219, 13040, 295, 1179, 294, 264, 7284, 472, + 13, 467, 311, 411, 428, 2047, 1969, 13, 50764], "temperature": 0.4, "avg_logprob": + -0.36006989350190033, "compression_ratio": 1.5692307692307692, "no_speech_prob": + 0.6002137064933777}, {"id": 270, "seek": 124814, "start": 1256.14, "end": 1259.14, + "text": " It''s probably you your time and whatever.", "tokens": [50764, 467, 311, + 1391, 291, 428, 565, 293, 2035, 13, 50914], "temperature": 0.4, "avg_logprob": -0.36006989350190033, + "compression_ratio": 1.5692307692307692, "no_speech_prob": 0.6002137064933777}, + {"id": 271, "seek": 124814, "start": 1259.14, "end": 1267.14, "text": " Where you + work specific tasks. Yeah, another one is like inference. So you cannot control. + But you can influence people or things or whatever it is.", "tokens": [50914, 2305, + 291, 589, 2685, 9608, 13, 865, 11, 1071, 472, 307, 411, 38253, 13, 407, 291, 2644, + 1969, 13, 583, 291, 393, 6503, 561, 420, 721, 420, 2035, 309, 307, 13, 51314], "temperature": + 0.4, "avg_logprob": -0.36006989350190033, "compression_ratio": 1.5692307692307692, + "no_speech_prob": 0.6002137064933777}, {"id": 272, "seek": 126714, "start": 1267.14, + "end": 1274.14, "text": " And the last one is that even if it bothers you, but you + cannot do anything. 
So it''s in all control area.", "tokens": [50364, 400, 264, + 1036, 472, 307, 300, 754, 498, 309, 33980, 291, 11, 457, 291, 2644, 360, 1340, 13, + 407, 309, 311, 294, 439, 1969, 1859, 13, 50714], "temperature": 0.0, "avg_logprob": + -0.24816614931279962, "compression_ratio": 1.5594059405940595, "no_speech_prob": + 0.5769426226615906}, {"id": 273, "seek": 126714, "start": 1274.14, "end": 1281.14, + "text": " Yeah. So you have to accept it or move on or do something. But don''t + get stuck on that. So it''s have.", "tokens": [50714, 865, 13, 407, 291, 362, 281, + 3241, 309, 420, 1286, 322, 420, 360, 746, 13, 583, 500, 380, 483, 5541, 322, 300, + 13, 407, 309, 311, 362, 13, 51064], "temperature": 0.0, "avg_logprob": -0.24816614931279962, + "compression_ratio": 1.5594059405940595, "no_speech_prob": 0.5769426226615906}, + {"id": 274, "seek": 126714, "start": 1281.14, "end": 1284.14, "text": " And things + of course keep moving between the.", "tokens": [51064, 400, 721, 295, 1164, 1066, + 2684, 1296, 264, 13, 51214], "temperature": 0.0, "avg_logprob": -0.24816614931279962, + "compression_ratio": 1.5594059405940595, "no_speech_prob": 0.5769426226615906}, + {"id": 275, "seek": 126714, "start": 1284.14, "end": 1288.14, "text": " It''s just + like dynamic system, but still good to be aware of.", "tokens": [51214, 467, 311, + 445, 411, 8546, 1185, 11, 457, 920, 665, 281, 312, 3650, 295, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.24816614931279962, "compression_ratio": 1.5594059405940595, + "no_speech_prob": 0.5769426226615906}, {"id": 276, "seek": 128814, "start": 1288.14, + "end": 1298.14, "text": " And you know, with your way of blogging and book writing + and yeah, projects you actually have that way of kind of okay, this is stuck. 
I''m + going to, you know,", "tokens": [50364, 400, 291, 458, 11, 365, 428, 636, 295, 6968, + 3249, 293, 1446, 3579, 293, 1338, 11, 4455, 291, 767, 362, 300, 636, 295, 733, 295, + 1392, 11, 341, 307, 5541, 13, 286, 478, 516, 281, 11, 291, 458, 11, 50864], "temperature": + 0.0, "avg_logprob": -0.37427713812851326, "compression_ratio": 1.5918367346938775, + "no_speech_prob": 0.7087719440460205}, {"id": 277, "seek": 128814, "start": 1298.14, + "end": 1302.14, "text": " relieve stress by blogging, even though writing is.", + "tokens": [50864, 30450, 4244, 538, 6968, 3249, 11, 754, 1673, 3579, 307, 13, 51064], + "temperature": 0.0, "avg_logprob": -0.37427713812851326, "compression_ratio": 1.5918367346938775, + "no_speech_prob": 0.7087719440460205}, {"id": 278, "seek": 128814, "start": 1302.14, + "end": 1309.14, "text": " Oh, you''re right. I do think like the other career advice + is like you get hyper focused on one thing.", "tokens": [51064, 876, 11, 291, 434, + 558, 13, 286, 360, 519, 411, 264, 661, 3988, 5192, 307, 411, 291, 483, 9848, 5178, + 322, 472, 551, 13, 51414], "temperature": 0.0, "avg_logprob": -0.37427713812851326, + "compression_ratio": 1.5918367346938775, "no_speech_prob": 0.7087719440460205}, + {"id": 279, "seek": 130914, "start": 1309.14, "end": 1319.14, "text": " You lose + the forest for the trees. Yes. And take a step back. Maybe there''s a project that + you really like that got canceled or something. 
Yes.", "tokens": [50364, 509, 3624, + 264, 6719, 337, 264, 5852, 13, 1079, 13, 400, 747, 257, 1823, 646, 13, 2704, 456, + 311, 257, 1716, 300, 291, 534, 411, 300, 658, 24839, 420, 746, 13, 1079, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.13649994243275035, "compression_ratio": 1.7666666666666666, + "no_speech_prob": 0.4097462296485901}, {"id": 280, "seek": 130914, "start": 1319.14, + "end": 1322.14, "text": " We''ll take a step back and.", "tokens": [50864, 492, + 603, 747, 257, 1823, 646, 293, 13, 51014], "temperature": 0.0, "avg_logprob": -0.13649994243275035, + "compression_ratio": 1.7666666666666666, "no_speech_prob": 0.4097462296485901}, + {"id": 281, "seek": 130914, "start": 1322.14, "end": 1324.14, "text": " First of + all, a year, you don''t remember.", "tokens": [51014, 2386, 295, 439, 11, 257, 1064, + 11, 291, 500, 380, 1604, 13, 51114], "temperature": 0.0, "avg_logprob": -0.13649994243275035, + "compression_ratio": 1.7666666666666666, "no_speech_prob": 0.4097462296485901}, + {"id": 282, "seek": 130914, "start": 1324.14, "end": 1335.14, "text": " But there + are so many interesting things to work on. And I think people forget that that there + is. There''s so many interesting things to work on. 
And I, you know, I had a brief + break between Shopify and Reddit.", "tokens": [51114, 583, 456, 366, 370, 867, 1880, + 721, 281, 589, 322, 13, 400, 286, 519, 561, 2870, 300, 300, 456, 307, 13, 821, 311, + 370, 867, 1880, 721, 281, 589, 322, 13, 400, 286, 11, 291, 458, 11, 286, 632, 257, + 5353, 1821, 1296, 43991, 293, 32210, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.13649994243275035, "compression_ratio": 1.7666666666666666, "no_speech_prob": + 0.4097462296485901}, {"id": 283, "seek": 133514, "start": 1335.14, "end": 1341.14, + "text": " And I, I realized what life would be like when I was retired, because + I would get up and.", "tokens": [50364, 400, 286, 11, 286, 5334, 437, 993, 576, + 312, 411, 562, 286, 390, 16776, 11, 570, 286, 576, 483, 493, 293, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.1767823455530569, "compression_ratio": 1.718487394957983, + "no_speech_prob": 0.08263206481933594}, {"id": 284, "seek": 133514, "start": 1341.14, + "end": 1348.14, "text": " It''s not like I just laid around and nothing. I was just + like, Oh, what could I play with? What could I do? Oh, there''s a problem. There''s + that interest and problem.", "tokens": [50664, 467, 311, 406, 411, 286, 445, 9897, + 926, 293, 1825, 13, 286, 390, 445, 411, 11, 876, 11, 437, 727, 286, 862, 365, 30, + 708, 727, 286, 360, 30, 876, 11, 456, 311, 257, 1154, 13, 821, 311, 300, 1179, 293, + 1154, 13, 51014], "temperature": 0.0, "avg_logprob": -0.1767823455530569, "compression_ratio": + 1.718487394957983, "no_speech_prob": 0.08263206481933594}, {"id": 285, "seek": 133514, + "start": 1348.14, "end": 1358.14, "text": " And that really opens your eyes to that. + There''s always there''s sort of like more fish in the sea. 
So to speak of like + problems to work on or cool stuff.", "tokens": [51014, 400, 300, 534, 9870, 428, + 2575, 281, 300, 13, 821, 311, 1009, 456, 311, 1333, 295, 411, 544, 3506, 294, 264, + 4158, 13, 407, 281, 1710, 295, 411, 2740, 281, 589, 322, 420, 1627, 1507, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.1767823455530569, "compression_ratio": 1.718487394957983, + "no_speech_prob": 0.08263206481933594}, {"id": 286, "seek": 135814, "start": 1358.14, + "end": 1369.14, "text": " So yeah, there was even a study that when people retire + because they got money or like a lottery or some other wise, they go enjoy life, + but they also age much quicker.", "tokens": [50364, 407, 1338, 11, 456, 390, 754, + 257, 2979, 300, 562, 561, 10731, 570, 436, 658, 1460, 420, 411, 257, 27391, 420, + 512, 661, 10829, 11, 436, 352, 2103, 993, 11, 457, 436, 611, 3205, 709, 16255, 13, + 50914], "temperature": 0.0, "avg_logprob": -0.21416019698948535, "compression_ratio": + 1.68359375, "no_speech_prob": 0.5870619416236877}, {"id": 287, "seek": 135814, "start": + 1369.14, "end": 1374.14, "text": " And sometimes they unfortunately die quicker + because they have nothing to sort of strive for.", "tokens": [50914, 400, 2171, + 436, 7015, 978, 16255, 570, 436, 362, 1825, 281, 1333, 295, 23829, 337, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.21416019698948535, "compression_ratio": 1.68359375, + "no_speech_prob": 0.5870619416236877}, {"id": 288, "seek": 135814, "start": 1374.14, + "end": 1375.14, "text": " Yeah.", "tokens": [51164, 865, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.21416019698948535, "compression_ratio": 1.68359375, "no_speech_prob": + 0.5870619416236877}, {"id": 289, "seek": 135814, "start": 1375.14, "end": 1386.14, + "text": " So that''s that''s really really cool advice. 
And if this was not enough, + all this cute bit search array, blogging, of course, work and other things podcasting + now.", "tokens": [51214, 407, 300, 311, 300, 311, 534, 534, 1627, 5192, 13, 400, + 498, 341, 390, 406, 1547, 11, 439, 341, 4052, 857, 3164, 10225, 11, 6968, 3249, + 11, 295, 1164, 11, 589, 293, 661, 721, 7367, 278, 586, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.21416019698948535, "compression_ratio": 1.68359375, "no_speech_prob": + 0.5870619416236877}, {"id": 290, "seek": 138614, "start": 1386.14, "end": 1396.14, + "text": " You also write a book tell me a bit more about that before we close. Oh, + yeah, yeah, it''s it''s it''s you just joked on the stage that the idea came to + 2018.", "tokens": [50364, 509, 611, 2464, 257, 1446, 980, 385, 257, 857, 544, 466, + 300, 949, 321, 1998, 13, 876, 11, 1338, 11, 1338, 11, 309, 311, 309, 311, 309, 311, + 291, 445, 361, 9511, 322, 264, 3233, 300, 264, 1558, 1361, 281, 6096, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.28647143805204933, "compression_ratio": 1.4871794871794872, + "no_speech_prob": 0.14315122365951538}, {"id": 291, "seek": 138614, "start": 1396.14, + "end": 1401.14, "text": " Yeah, so tray tray and a share of the book. So trace the + primary author and.", "tokens": [50864, 865, 11, 370, 16027, 16027, 293, 257, 2073, + 295, 264, 1446, 13, 407, 13508, 264, 6194, 3793, 293, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.28647143805204933, "compression_ratio": 1.4871794871794872, + "no_speech_prob": 0.14315122365951538}, {"id": 292, "seek": 140114, "start": 1401.14, + "end": 1415.14, "text": " tray came to me and I think 2018 said, Hey, I want to + let you know I''m writing a search book. I think I''ll be done and I want to really, + you know, for work and my wife and everything and family. 
I want to be done in six + months and I''m stressed everyone out.", "tokens": [50364, 16027, 1361, 281, 385, + 293, 286, 519, 6096, 848, 11, 1911, 11, 286, 528, 281, 718, 291, 458, 286, 478, + 3579, 257, 3164, 1446, 13, 286, 519, 286, 603, 312, 1096, 293, 286, 528, 281, 534, + 11, 291, 458, 11, 337, 589, 293, 452, 3836, 293, 1203, 293, 1605, 13, 286, 528, + 281, 312, 1096, 294, 2309, 2493, 293, 286, 478, 14471, 1518, 484, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.1892619640269178, "compression_ratio": 1.6650246305418719, + "no_speech_prob": 0.4966784715652466}, {"id": 293, "seek": 140114, "start": 1415.14, + "end": 1420.14, "text": " And here we are. It''s, you know, there was a pandemic. + There was a lot of stuff.", "tokens": [51064, 400, 510, 321, 366, 13, 467, 311, + 11, 291, 458, 11, 456, 390, 257, 5388, 13, 821, 390, 257, 688, 295, 1507, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.1892619640269178, "compression_ratio": 1.6650246305418719, + "no_speech_prob": 0.4966784715652466}, {"id": 294, "seek": 142014, "start": 1420.14, + "end": 1432.14, "text": " But it''s 2024 and it''s funny because the nature what + you might refer to as AI in 2018, of course, is now is LLM''s and these things.", + "tokens": [50364, 583, 309, 311, 45237, 293, 309, 311, 4074, 570, 264, 3687, 437, + 291, 1062, 2864, 281, 382, 7318, 294, 6096, 11, 295, 1164, 11, 307, 586, 307, 441, + 43, 44, 311, 293, 613, 721, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2590762107603012, + "compression_ratio": 1.3757575757575757, "no_speech_prob": 0.4854569137096405}, + {"id": 295, "seek": 142014, "start": 1432.14, "end": 1441.14, "text": " But it''s + really exciting. 
I think like a lot of the things in the book are timeless techniques.", + "tokens": [50964, 583, 309, 311, 534, 4670, 13, 286, 519, 411, 257, 688, 295, 264, + 721, 294, 264, 1446, 366, 41200, 7512, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.2590762107603012, "compression_ratio": 1.3757575757575757, "no_speech_prob": + 0.4854569137096405}, {"id": 296, "seek": 144114, "start": 1441.14, "end": 1449.14, + "text": " We initially focused the book on solar, but we''re taking we took a step + back when we said let''s make this applicable to many search engines. Yeah.", "tokens": + [50364, 492, 9105, 5178, 264, 1446, 322, 7936, 11, 457, 321, 434, 1940, 321, 1890, + 257, 1823, 646, 562, 321, 848, 718, 311, 652, 341, 21142, 281, 867, 3164, 12982, + 13, 865, 13, 50764], "temperature": 0.0, "avg_logprob": -0.25826309124628705, "compression_ratio": + 1.6311475409836065, "no_speech_prob": 0.35062384605407715}, {"id": 297, "seek": + 144114, "start": 1449.14, "end": 1459.14, "text": " And there are examples being + worked on for many platforms. It''s the ecosystem is so huge now. There''s all kinds + of vector databases that are even adding lexical sir.", "tokens": [50764, 400, 456, + 366, 5110, 885, 2732, 322, 337, 867, 9473, 13, 467, 311, 264, 11311, 307, 370, 2603, + 586, 13, 821, 311, 439, 3685, 295, 8062, 22380, 300, 366, 754, 5127, 476, 87, 804, + 4735, 13, 51264], "temperature": 0.0, "avg_logprob": -0.25826309124628705, "compression_ratio": + 1.6311475409836065, "no_speech_prob": 0.35062384605407715}, {"id": 298, "seek": + 144114, "start": 1459.14, "end": 1463.14, "text": " And then there''s of course + solar elastic search. There''s open search. 
There''s a best.", "tokens": [51264, + 400, 550, 456, 311, 295, 1164, 7936, 17115, 3164, 13, 821, 311, 1269, 3164, 13, + 821, 311, 257, 1151, 13, 51464], "temperature": 0.0, "avg_logprob": -0.25826309124628705, + "compression_ratio": 1.6311475409836065, "no_speech_prob": 0.35062384605407715}, + {"id": 299, "seek": 146314, "start": 1463.14, "end": 1473.14, "text": " Yeah. In + this more and more traditional space. And so like I worked primarily on the learning + terrain content.", "tokens": [50364, 865, 13, 682, 341, 544, 293, 544, 5164, 1901, + 13, 400, 370, 411, 286, 2732, 10029, 322, 264, 2539, 17674, 2701, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.1641262875327581, "compression_ratio": 1.6729857819905214, + "no_speech_prob": 0.2525365948677063}, {"id": 300, "seek": 146314, "start": 1473.14, + "end": 1489.14, "text": " And so a lot of the things about how you get training + data or train a model or how you evaluate these things, how you expose users to + search results that are maybe a bit novel, like you do a little exploration to build + out your training data.", "tokens": [50864, 400, 370, 257, 688, 295, 264, 721, 466, + 577, 291, 483, 3097, 1412, 420, 3847, 257, 2316, 420, 577, 291, 13059, 613, 721, + 11, 577, 291, 19219, 5022, 281, 3164, 3542, 300, 366, 1310, 257, 857, 7613, 11, + 411, 291, 360, 257, 707, 16197, 281, 1322, 484, 428, 3097, 1412, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.1641262875327581, "compression_ratio": 1.6729857819905214, + "no_speech_prob": 0.2525365948677063}, {"id": 301, "seek": 148914, "start": 1489.14, + "end": 1498.14, "text": " And so these things are regardless of where search goes + or where rag goes for whatever.", "tokens": [50364, 400, 370, 613, 721, 366, 10060, + 295, 689, 3164, 1709, 420, 689, 17539, 1709, 337, 2035, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.18054664668752185, "compression_ratio": 1.5792349726775956, + "no_speech_prob": 0.1436457633972168}, {"id": 302, "seek": 148914, "start": 
1498.14, + "end": 1514.14, "text": " It''s there are still very relevant. And it feels like + in a way, a lot of how users are interacting with with the world and with products + is through some kind of search or some kind of retrieval system.", "tokens": [50814, + 467, 311, 456, 366, 920, 588, 7340, 13, 400, 309, 3417, 411, 294, 257, 636, 11, + 257, 688, 295, 577, 5022, 366, 18017, 365, 365, 264, 1002, 293, 365, 3383, 307, + 807, 512, 733, 295, 3164, 420, 512, 733, 295, 19817, 3337, 1185, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.18054664668752185, "compression_ratio": 1.5792349726775956, + "no_speech_prob": 0.1436457633972168}, {"id": 303, "seek": 151414, "start": 1514.14, + "end": 1522.14, "text": " Even if it''s a recommendation system or a feature some + that''s becoming feeling more like search where it''s like real time and I''m getting + the stuff updated in real time.", "tokens": [50364, 2754, 498, 309, 311, 257, 11879, + 1185, 420, 257, 4111, 512, 300, 311, 5617, 2633, 544, 411, 3164, 689, 309, 311, + 411, 957, 565, 293, 286, 478, 1242, 264, 1507, 10588, 294, 957, 565, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.29161275443384205, "compression_ratio": 1.475609756097561, + "no_speech_prob": 0.17734618484973907}, {"id": 304, "seek": 151414, "start": 1522.14, + "end": 1527.14, "text": " And of course, rag is searched. 
So I think search is still + there, right?", "tokens": [50764, 400, 295, 1164, 11, 17539, 307, 22961, 13, 407, + 286, 519, 3164, 307, 920, 456, 11, 558, 30, 51014], "temperature": 0.0, "avg_logprob": + -0.29161275443384205, "compression_ratio": 1.475609756097561, "no_speech_prob": + 0.17734618484973907}, {"id": 305, "seek": 152714, "start": 1527.14, "end": 1538.14, + "text": " And I think it''s going over the world and on which I realize some crowd + is now gathering to have lunch and we will have lunch soon as well.", "tokens": + [50364, 400, 286, 519, 309, 311, 516, 670, 264, 1002, 293, 322, 597, 286, 4325, + 512, 6919, 307, 586, 13519, 281, 362, 6349, 293, 321, 486, 362, 6349, 2321, 382, + 731, 13, 50914], "temperature": 0.0, "avg_logprob": -0.4734039306640625, "compression_ratio": + 1.555, "no_speech_prob": 0.8607131838798523}, {"id": 306, "seek": 152714, "start": + 1538.14, "end": 1551.14, "text": " It''s always a pleasure to talk to you and finally + in person I think we''ve never met but I think if you''ve been to a new senior revolution + in 2003, seeing around an island.", "tokens": [50914, 467, 311, 1009, 257, 6834, + 281, 751, 281, 291, 293, 2721, 294, 954, 286, 519, 321, 600, 1128, 1131, 457, 286, + 519, 498, 291, 600, 668, 281, 257, 777, 7965, 8894, 294, 16416, 11, 2577, 926, 364, + 6077, 13, 51564], "temperature": 0.0, "avg_logprob": -0.4734039306640625, "compression_ratio": + 1.555, "no_speech_prob": 0.8607131838798523}, {"id": 307, "seek": 155114, "start": + 1551.14, "end": 1559.14, "text": " I''ve been there as well. Is that the one in + San Diego or no, no, in Dublin, in Dublin, yeah, I''ve been there. Yes, I wasn''t. 
+ Yeah, yeah.", "tokens": [50364, 286, 600, 668, 456, 382, 731, 13, 1119, 300, 264, + 472, 294, 5271, 16377, 420, 572, 11, 572, 11, 294, 42323, 11, 294, 42323, 11, 1338, + 11, 286, 600, 668, 456, 13, 1079, 11, 286, 2067, 380, 13, 865, 11, 1338, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.4421159602977611, "compression_ratio": 1.5963302752293578, + "no_speech_prob": 0.7104085087776184}, {"id": 308, "seek": 155114, "start": 1559.14, + "end": 1566.14, "text": " Then I need in there to, you know, to go high. But you + know, that''s when I introduced Kupit actually.", "tokens": [50764, 1396, 286, 643, + 294, 456, 281, 11, 291, 458, 11, 281, 352, 1090, 13, 583, 291, 458, 11, 300, 311, + 562, 286, 7268, 591, 1010, 270, 767, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.4421159602977611, "compression_ratio": 1.5963302752293578, "no_speech_prob": + 0.7104085087776184}, {"id": 309, "seek": 155114, "start": 1566.14, "end": 1568.14, + "text": " Yeah, I know.", "tokens": [51114, 865, 11, 286, 458, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.4421159602977611, "compression_ratio": 1.5963302752293578, + "no_speech_prob": 0.7104085087776184}, {"id": 310, "seek": 155114, "start": 1568.14, + "end": 1574.14, "text": " It''s amazing. The workter. Yeah. That''s a still relevant + project. 
It''s running 2013 JavaScript.", "tokens": [51214, 467, 311, 2243, 13, + 440, 589, 391, 13, 865, 13, 663, 311, 257, 920, 7340, 1716, 13, 467, 311, 2614, + 9012, 15778, 13, 51514], "temperature": 0.0, "avg_logprob": -0.4421159602977611, + "compression_ratio": 1.5963302752293578, "no_speech_prob": 0.7104085087776184}, + {"id": 311, "seek": 157414, "start": 1574.14, "end": 1579.14, "text": " Yeah, and + I''ve been consistently deploying it in every company I agree.", "tokens": [50364, + 865, 11, 293, 286, 600, 668, 14961, 34198, 309, 294, 633, 2237, 286, 3986, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.27200904846191404, "compression_ratio": 1.5020242914979758, + "no_speech_prob": 0.6817588210105896}, {"id": 312, "seek": 157414, "start": 1579.14, + "end": 1586.14, "text": " So in Tom Tom, we just released a new algorithm to production. + Thanks to Kupit in two weeks.", "tokens": [50614, 407, 294, 5041, 5041, 11, 321, + 445, 4736, 257, 777, 9284, 281, 4265, 13, 2561, 281, 591, 1010, 270, 294, 732, 3259, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.27200904846191404, "compression_ratio": + 1.5020242914979758, "no_speech_prob": 0.6817588210105896}, {"id": 313, "seek": 157414, + "start": 1586.14, "end": 1592.14, "text": " It was, it was on the bookshelf for + quite some time because the team couldn''t figure out how to test equality.", "tokens": + [50964, 467, 390, 11, 309, 390, 322, 264, 1446, 46626, 337, 1596, 512, 565, 570, + 264, 1469, 2809, 380, 2573, 484, 577, 281, 1500, 14949, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.27200904846191404, "compression_ratio": 1.5020242914979758, + "no_speech_prob": 0.6817588210105896}, {"id": 314, "seek": 157414, "start": 1592.14, + "end": 1598.14, "text": " And I said, okay, let''s just do labeling, right? And + let''s use Q. 
Yeah, just simple labels and.", "tokens": [51264, 400, 286, 848, 11, + 1392, 11, 718, 311, 445, 360, 40244, 11, 558, 30, 400, 718, 311, 764, 1249, 13, + 865, 11, 445, 2199, 16949, 293, 13, 51564], "temperature": 0.0, "avg_logprob": -0.27200904846191404, + "compression_ratio": 1.5020242914979758, "no_speech_prob": 0.6817588210105896}, + {"id": 315, "seek": 159814, "start": 1598.14, "end": 1607.14, "text": " And we saw + like more than 10% increase in precision with new algorithm. And they said, as a + product manager, I approved the release. Let''s go.", "tokens": [50364, 400, 321, + 1866, 411, 544, 813, 1266, 4, 3488, 294, 18356, 365, 777, 9284, 13, 400, 436, 848, + 11, 382, 257, 1674, 6598, 11, 286, 10826, 264, 4374, 13, 961, 311, 352, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.2732205240350021, "compression_ratio": 1.5493562231759657, + "no_speech_prob": 0.276660293340683}, {"id": 316, "seek": 159814, "start": 1607.14, + "end": 1610.14, "text": " That''s also that''s great. Thanks for creating the tool.", + "tokens": [50814, 663, 311, 611, 300, 311, 869, 13, 2561, 337, 4084, 264, 2290, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.2732205240350021, "compression_ratio": + 1.5493562231759657, "no_speech_prob": 0.276660293340683}, {"id": 317, "seek": 159814, + "start": 1610.14, "end": 1619.14, "text": " Great. Sure. Happy to have a lot of + making tools. Yeah. Thanks for a time, Doug. Enjoy the conference. Thank you. Stay + in Berlin. Yes. Thank you. Awesome. 
Thanks.", "tokens": [50964, 3769, 13, 4894, + 13, 8277, 281, 362, 257, 688, 295, 1455, 3873, 13, 865, 13, 2561, 337, 257, 565, + 11, 12742, 13, 15411, 264, 7586, 13, 1044, 291, 13, 8691, 294, 13848, 13, 1079, + 13, 1044, 291, 13, 10391, 13, 2561, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.2732205240350021, "compression_ratio": 1.5493562231759657, "no_speech_prob": + 0.276660293340683}, {"id": 318, "seek": 162814, "start": 1628.14, "end": 1631.14, + "text": " Thank you.", "tokens": [50364, 1044, 291, 13, 50514], "temperature": 0.0, + "avg_logprob": -0.7868684927622477, "compression_ratio": 0.5555555555555556, "no_speech_prob": + 0.9280984997749329}]' +--- + +A timeick Cool. Yeah. Hello, how are you? Hi, Doug. It's great meeting you at Berlin Boswords. Yeah, I can see you. Yeah, great to see you. It's your second time on the podcast and yeah, excited to be back. Yeah, awesome. I think it's like two years or only one. Yeah, I think so. +But how have you been? I wasn't going. I've been great. Just been doing traditional learning to rank over at Reddit. And it's been a lot of fun. A lot of it's just meat and potato stuff. Yeah. The stuff that I think is really important like your training data with search. +And you're getting your features right and that sort of thing. Not actually too much vector search lately. So kind of. +Having a path in the ranking model space and I still think that's really important for if you're building a rag app or if you're building a lot of these things, a lot of people are sort of discovering this through the vector route. +They're like realizing there's a small other side of information retrieval. Yeah, that's important. And that's that's really exciting to me because I think a lot of new ideas that suffer coming in the space. Yeah, yeah. Yeah, amazing talk as well. I'm sure we'll link it. +What was it's published? The one you just gave. 
And you also reminded me, as I told you, of the time when I was working on Solr, starting at version one. I was itching to ask you which version you're running, but then I was like, why would it matter to me? Well, we were running Solr 7 until recently, and one of the things you didn't talk about in the talk was having performance problems with Solr 7. Yeah. And moving to Solr 9 fixed it. So it helped with a lot of stability and performance problems. And that's just one of those things with a lot of these projects, not just learning to rank but a lot of machine learning projects. What I find, especially with learning to rank, is you're often building out and scaling up infrastructure for a certain problem at the same time you're doing machine learning. Yeah. So you're finding these problems. Yeah, and you will spend weeks, yeah, or months, like, why is this slow? It's unexpectedly slow. What's behind it? And then you realize, oh, Solr 9 doesn't have this problem and will resolve it. Yeah. And we were already upgrading. So like, okay, we can punt on this. We don't have to stress out about this performance problem. That's why it takes a year or two for these projects, yeah, yeah, for results to show. Yeah. I also would like to say thank you for your project that I think you started back at OSC, Open Source Connections. Yeah. Hello LTR. Yeah. I think it's still out there. That's a great project. Yeah, it really allowed me to quickly, you know, jump on the train and start moving, because I was actually alone on the team. I did do search before, but it wasn't related to ML at all, right? It was like feature engineering, but on a different side of things. Yeah, so thanks for that. Really? Yeah. I think it's really important, and one piece of career advice that's helped me is to learn in public.
So a lot of Hello LTR came up when I was learning how to do LTR. I had to get some examples and try different things out. And then as I made mistakes, those mistakes became lessons for the LTR training that came out of Hello LTR, also called Hello LTR, that Open Source Connections does. So I really encourage people: the best teachers are often people actively learning. Yeah, because you will encounter the mistakes that the experts forgot about. I couldn't tell you how to learn how a loop works in Python, because I've done it too long. But the person who would have the empathy to teach that really well to someone learning from scratch would probably be my son, if he was learning Python. So I really encourage you: be out there speaking, blogging, yeah, because you'll have insight into how to teach it that an expert won't. Yeah, that's another side of your professional life that amazes me: how do you find time to blog? And those are really deep things; sometimes you go into detail with code, or you offer some thought model. Do you sleep at all? I think I just have a high tolerance for making mistakes in public. And also I think a lot of it has to do with having a history degree. Oh really, I didn't know that. Yeah, history and computer science. When you do history, it's a lot of writing, writing, writing, and then when you get to the senior level of history, it's not just writing an essay, but can you write your argument in a single page? Which is funny, because when you're a student, you think, I'm going to make the margins big and the text big so I can take up more space. Yeah, when you start writing a lot, you tend to get really verbose. Yeah, then you have to learn to make your arguments exact. Yes, and shorter. And yeah, so.
Yeah, also now that I'm doing the product management role, I do not have a history degree like you, but I have to write some things in a concise way. Sometimes they say you have to remove half of the page because you're not fitting the page limit. I make that mistake all the time. Yeah, how many one-pagers are actually like 10 pages? And another thing is, never talk about the hypothetical future, because you don't even know yourself whether it will happen or not, right? Yeah, only talk about things that you're absolutely certain have happened, or that you're certain are already planned, right? Yeah, that's how we do product management. Yeah, it teaches that side of things, but I guess what I do is then go to blogging, and I'll use you as a great example there. You go and unleash yourself in blogging and you write what you want, right? But you still said that you became a more successful blogger the moment you actually started modeling the specific person you're writing to, not an abstract audience and not yourself. Is this how you still perceive it? Well, I definitely write to myself six months from now, but the audience I imagine is like a close group of friends. So I almost think about blogging like this: it's easy for people to imagine sitting down and writing a long email or Slack message. What if you just turn that into a blog post? Yeah. And to me, that's an inspiration: so many times you get excited about something and want to send a message to your friends, yeah, and share it. Well, turn that enthusiasm and that message into a blog post, and those are the best blog posts. And I also think it's really important to remember it's blogging. It's a step above writing a forum post. It's very informal. Don't take it too seriously. You will make mistakes, exactly.
You will, and do it for fun. Yeah. But yeah, I do it a lot because I want to. There's a meme of someone starting out on something, yeah, someone being mid-career, and then someone super senior in their career. And often in this meme, starting out and super senior are the same. My version of that is: when you start out, you code and do stuff to impress your friends, like in high school or whatever. And then you get all worried about having some big impact and impressing the whole world. And then when you get super senior again, you're just like, I just want to do cool stuff to impress my friends. Yeah. Which actually turns out also to be stuff that the whole world cares about, because usually your friends are doing cool stuff themselves, like, you know, vector search or cool AI stuff. Yeah. So it turns out the rest of the world finds it interesting too. But I think that's a really important thing, to have an authentic voice. Yeah. So it's also part of building up your profile. Yeah. Yeah. So I think, like, Steve Jobs said the computer is a bicycle for the mind, right? And blogging is also, in a way, a bicycle for the mind. You have to rework yourself, right? It's also the programming that you do on the side, like SearchArray. So tell me more, what was the motivation? Why did you start working on it? So I think I like to go against the grain a little bit. I actually had worked on different versions of vector search for a long time before this, and different hacks and things to make vector search work in the current Solr or Elasticsearch world. So everyone's in vector search. And I decided, in part because I wanted to do a little bit more native programming. In fact, I used to do that, I used to be a C programmer. Yeah. And I found that.
Vector search is very welcoming to the machine learning engineer and data science community. But the traditional lexical search engines like Solr and Elasticsearch, they're very weird. Yeah. You have to know this weird query DSL, you have to understand things like tokenization. So I wanted to take that lexical world and bring it into a data science, PyData sort of environment. And then I found that it's very comfortable for machine learning and data science people. Yeah. So I built SearchArray. And what SearchArray does is it's basically a lexical extension to pandas. So if I have text, I can make a pandas column that's just tokenized text, and then I can ask it to score against a keyword and get a BM25 score. And so in a Colab notebook or something, I can quickly test ideas without having to stand up Solr or Elasticsearch, yeah, or think about Docker containers and all this stuff. And I could see, OK, I want to tokenize things a certain way, I want to change the BM25 scoring to be a certain way. Yeah. When you think about it, like 90% of what you do in a lexical search engine is tweaking the tokenization, yeah, fiddling with the scoring, trying to index something new and search against that. Like, oh, I have an entity recognition field now. And the other thing you do a lot in lexical search engines is, I want to boost by recency, and I want to do these other things. And a lot of these things can just be done really fast: I take this numerical date column in pandas, yeah, and I have a score, a NumPy array that's a BM25 score. I multiply them together. Yeah. I have a score that's recency-weighted. Yeah. What does that look like in terms of my offline metrics? Yeah. Interesting. Without having to go off to Elasticsearch and all that esoteric stuff.
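The workflow Doug describes here, a BM25 score computed against a tokenized pandas column and then multiplied by a recency weight, can be sketched roughly as follows. This is a hand-rolled illustration of the idea, not SearchArray's actual API; the function and column names are made up for the example.

```python
# Sketch of "lexical scoring in pandas": hand-rolled BM25 over a tokenized
# column, multiplied by a recency weight. Illustrative only -- not SearchArray.
import math
import numpy as np
import pandas as pd

def bm25_scores(tokenized: pd.Series, term: str, k1: float = 1.2, b: float = 0.75) -> np.ndarray:
    """BM25 score of `term` against each tokenized document."""
    tf = tokenized.apply(lambda toks: toks.count(term)).to_numpy(dtype=float)
    doc_len = tokenized.str.len().to_numpy(dtype=float)
    avg_len = doc_len.mean()
    n = len(tokenized)
    df = int((tf > 0).sum())  # number of docs containing the term
    idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

docs = pd.DataFrame({
    "text": ["solr is a search engine", "pandas is a dataframe library", "search with pandas"],
    "published": pd.to_datetime(["2024-05-01", "2020-01-01", "2023-11-15"]),
})
docs["tokens"] = docs["text"].str.split()  # trivially tokenized text column

# Recency boost: exponential decay by document age in days.
age_days = (pd.Timestamp("2024-06-01") - docs["published"]).dt.days
recency = np.exp(-age_days / 365.0)

# Multiply the two score arrays together -- the recency-weighted score
# you can then sanity-check against offline metrics.
docs["score"] = bm25_scores(docs["tokens"], "search") * recency
print(docs.sort_values("score", ascending=False)[["text", "score"]])
```

The point of the sketch is the turnaround time: tokenization, scoring, and boosting are all plain pandas/NumPy operations you can iterate on in a notebook before committing to a real search-engine deployment.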
So basically it allows you to try out some ideas really quickly, right? But then there will be some kind of offset compared with reality, because tokenizing in Solr probably works differently. Yeah, it would be different, but probably close enough, right? So it's also, if you nail the signal, like you explained today in your presentation, you know, what about the number of comments, or what about the recency and so on. Yeah, all of these signals. And I can try that really quickly. Yeah. Yeah. And a lot of times it's a big effort to index some new data into the search engine. Yeah. And you have to ask, is the upstream system fast enough to handle the load and then stay up to date? And really, to justify a project, yeah, you might start with a prototype and say, OK, I just pulled in a small test set of data. Yeah. It seems like there's some signal here. Yeah. Let's plan a project around it. Yeah. And actually at MICES, the conference after Berlin Buzzwords, I'll talk about planning. But to me, a lot of this, how do we build better prototypes to build plans and have ideas and have conversations between engineers, data scientists, and product managers, is really one of the inspirations for SearchArray. Because to do this before, I'd be like, OK, I have to stand up some environment, something custom. Yeah, yeah, spend time on that. Yeah, how am I going to allocate a cluster and whatnot. Yeah, exactly. Yeah. And this actually reminded me of when I was working on learning to rank. This was the last project I did in my 10-year tenure at AlphaSense. I said, hey, can I have this really expensive laptop? It would be like 30 gigs of RAM and a one-terabyte drive. I thought I needed so much space for some reason. And it was SSD. And I got it approved. It was like, oh my God. So many.
So I spent like a year working on it, and it was kind of a bloated version of SearchArray, because I could do everything on the laptop. Disconnected, right. The only problem I remember was tracking my experiment tree, because I would, what's the right word, kind of branch out. That's right. And then, OK, should I go back now, retreat, because it looks like I went down the rabbit hole and it doesn't give any value? So you go back to that state when it was better, you know. That sounds a lot like some of the functionality in Quepid, the sort of search relevance tuning tool. Because I remember when I was building that many years ago, it was very much like, every time you tweak the query and submit, it saves that as a try. Yeah, you can't fork off stuff, but you can go back and be like, oh, this thing didn't work out, I'm going to go back. And yeah, it is funny, I have the same feeling. Even in a notebook environment, I don't have that, because with notebooks you tweak a little bit and you forget what happened. You're like, why did I think my NDCG was good and now it's bad? What did I do wrong? You wish the notebook was somehow versioning itself as you were going. Yeah, exactly. And somehow the whole environment was versioned, but yeah, that doesn't exist. I wish that kind of thing existed. Yeah, there was a tool we were using, but then it was acquired by some company. It was called spell.run. It was basically an integrated Python notebook environment that runs on a cluster. And they were heading in that direction. I was giving a lot of feedback to them: hey, can you actually build infrastructure which will allow me to maintain my branched-out experiments?
And I think they got acquired probably shortly before they could do this, and maybe they're continuing it, I don't know. But there is another project called DVC, I think, which allows you to basically maintain your experiments as Git hashes, right. So you can Git-hash your code along with your data, and then you upload your data, let's say to some cloud drive, some abstract one. And then you have your code associated with that, so you, or someone else, can restore the experiment. If only that was frictionless, right. Or even just existed, right, because I had to literally write something down on a piece of paper to remember what I needed to do. You know, sometimes I would go crazy. At Shopify, we had our search testing infrastructure in notebooks. Everything was in a monorepo, so you could stand up Elasticsearch. And this was a Rails environment; all the relevance logic was in a Ruby library that a Rails monolith would load, and it would make the network call to Elasticsearch and do whatever pre- and post-processing. When we stood up the test environment, we wanted to load the right configs, so we would basically pin the commit hash of the repo that it was supposed to be, and it would load the config, and it would be amazing. But yeah, getting reproducible environments, yeah, and experiments, is a challenge. This is where, I mean, at some point your experiment rate will be, you know, trumped by how quickly you can deploy or shuffle things, right? So yeah, I think so. This is where infrastructure comes in as a big topic, and in your talk, yeah, I think you spent a good, yeah. Yeah. I think it's like partnership, right?
There has to be, and it's been a big theme in my career too, partnership: partnerships with PM, partnerships with data, yeah, partnerships with infrastructure. You really have to have one cohesive team, and one of the anti-patterns is when they're so separate that it creates friction. I agree. You have to throw a requirement over the fence, yeah, and then a month later maybe you get something back, but it's not quite what you want. You really have to act like one team. Yeah. And search is so multi-functional. Yeah. And I've seen it at Shopify; the challenge was that infrastructure was a different org. And so we would throw things back and forth over the fence and they'd be not quite right, and we had to figure out the right way to partner. At Reddit, we have a bit more of a challenge in that data is a different group. So we're sort of throwing things over the fence, getting things back. Yeah. And so we have to actively work to make sure those partnerships are healthy. Yeah. But it's a big challenge. And I think organizationally there are reasons that companies separate things out that are beyond search. So it's not like there's an easy solution, but definitely when you get to search and these data products, not just search but recommendations and feeds and things, having cross-functional partnerships, and not only cross-functional partnerships but individuals who can work beyond their domain and wear multiple hats, yeah, is really important. Yeah, I think you're absolutely spot on, you know, back in my previous company, and now at Silo AI, I feel sort of the same. But one thing I found, after breaking some arrows in the beginning, is that if you can find the mutual benefits, they will be as driven as you. Yeah, you don't know what the outcome is going to be, because experiments are always like that.
Right. Yeah, but the fact that we are having an experiment cross-department, this is amazing. Then you go to all these meetings with executives and you praise them, and they probably in some way praise you and your team. And that's how you get there, right. But things happen, things happen. You just need to be persistent, I guess. Yeah, things happen all the time. And like, you know, organizational changes are like the weather. You never know. There's going to be a re-org, or someone comes in with a new perspective or whatever. And that's another career lesson: not to get too caught up emotionally. Oh yeah, something happens. Yeah, a lot of times it's just that so many things are out of your control, just politics or organizational changes and so on. Yeah, there's a sort of mental model called circles of influence. The inner one is your direct control: it's probably you, your time, the specific tasks you work on. Yeah, another one is influence: things you cannot control, but you can influence people or things or whatever it is. And the last one is things that bother you but that you cannot do anything about, so it's the no-control area. Yeah. So you have to accept it, or move on, or do something, but don't get stuck on that. And things of course keep moving between the circles. It's a dynamic system, but still good to be aware of. And you know, with your way of blogging and book writing and, yeah, projects, you actually have that way of kind of, okay, this is stuck, I'm going to, you know, relieve stress by blogging, even though writing is hard. Oh, you're right. I do think the other career advice is: you get hyper-focused on one thing, you lose the forest for the trees. Yes. Take a step back. Maybe there's a project that you really liked that got canceled or something. Yes. Take a step back and, after a year, you won't even remember.
But there are so many interesting things to work on. And I think people forget that: there are so many interesting things to work on. +And, you know, I had a brief break between Shopify and Reddit, and I realized what life would be like when I was retired, because I would get up and... it's not like I just laid around and did nothing. I was just like, oh, what could I play with? What could I do? Oh, there's a problem, +there's an interesting problem. And that really opens your eyes to the fact that there are always, sort of, more fish in the sea, so to speak, of problems to work on, or cool stuff. +Yeah, there was even a study that when people retire because they got money, from a lottery or otherwise, they go enjoy life, but they also age much quicker. And sometimes they unfortunately die quicker, because they have nothing to strive for. Yeah. +So that's really, really cool advice. And if all this was not enough, Quepid, search, blogging, of course work and other things, podcasting now, you also write a book. Tell me a bit more about that before we close. +Oh, yeah, yeah. You just joked on the stage that the idea came in 2018. Yeah, so Trey, Trey and I share the book. Trey's the primary author, and Trey came to me in, I think, 2018 and said, hey, I want to let you know I'm writing a search book. +I think I'll be done, and I want to, really, you know, for work and my wife and everything and family, I want to be done in six months. And I've stressed everyone out. And here we are. You know, there was a pandemic, there was a lot of stuff. +But it's 2024, and it's funny because what you might have referred to as AI in 2018, of course, is now LLMs and these things. But it's really exciting. I think a lot of the things in the book are timeless techniques.
+We initially focused the book on Solr, but we took a step back and said, let's make this applicable to many search engines. Yeah. And there are examples being worked on for many platforms. The ecosystem is so huge now. +There's all kinds of vector databases that are even adding lexical search. And then there's of course Solr, Elasticsearch, there's OpenSearch, there's Vespa. Yeah. And this more and more traditional space. And so I worked primarily on the learning-to-rank content. +And so a lot of the things about how you get training data, or train a model, or how you evaluate these things, how you expose users to search results that are maybe a bit novel, like you do a little exploration to build out your training data. +And so these things, regardless of where search goes or where RAG goes or whatever, are still very relevant. And it feels like, in a way, a lot of how users are interacting with the world and with products is through some kind of search or some kind of retrieval system. +Even if it's a recommendation system or a feed, something that's becoming, feeling more like search, where it's real time and I'm getting the stuff updated in real time. And of course, RAG is search. +So I think search is still there, right? And I think it's taking over the world. On which note, I realize some crowd is now gathering to have lunch, and we will have lunch soon as well. +It's always a pleasure to talk to you, and finally in person. I think we've never met, but I think you've been to Lucene/Solr Revolution in 2013, I've seen you around. I've been there as well. Is that the one in San Diego? No, no, in Dublin, in Dublin. Yeah, I've been there. Yes, I was. +Yeah, yeah. Then I needed, you know, to go say hi. But you know, that's when I introduced Quepid, actually. Yeah, I know. It's amazing. It works. Yeah. That's a still-relevant project. It's running 2013 JavaScript.
Yeah, and I've been consistently deploying it in every company. I agree. +So at TomTom, we just released a new algorithm to production, thanks to Quepid, in two weeks. It was on the bookshelf for quite some time because the team couldn't figure out how to test quality. And I said, okay, let's just do labeling, right? And let's use Quepid. Yeah, just simple labels. +And we saw more than a 10% increase in precision with the new algorithm. And I said, as a product manager, I approve the release. Let's go. That's great. Thanks for creating the tool. Great. Sure, happy to. I love making tools. Yeah. Thanks for your time, Doug. +Enjoy the conference. Thank you. Stay in Berlin. Yes. Thank you. Awesome. Thanks. Thank you. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md b/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md new file mode 100644 index 0000000..af3608d --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/berlin-buzzwords-2024-sonam-pankaj-embedanything.md @@ -0,0 +1,1143 @@ +--- +description: '

This episode on YouTube: https://youtu.be/dVIPBxHJ1kQ

00:00 + Intro

00:15 Greets for Sonam

01:02 Importance of metric learning

03:37 + Sonam''s background: Rasa, Qdrant

4:31 What''s EmbedAnything

5:52 What + a user gets

8:48 Do I need to know Rust?

10:18 Call-out to the community

10:35 + Multimodality

12:32 How to evaluate quality of LLM-based systems

16:38 + QA for multimodal use cases

18:17 Place for a human in the LLM craze

19:00 + Use cases for EmbedAnything

20:54 Closing theme (a longer one - enjoy!)

Show + notes:

- GitHub: https://github.com/StarlightSearch/EmbedAnything

- + HuggingFace Candle: https://github.com/huggingface/candle

- + Sonam''s talk on Berlin Buzzwords 2024: https://www.youtube.com/watch?v=YfR3kuSo-XQ

- + Removing GIL from Python: https://peps.python.org/pep-0703

- + Blind pairs in CLIP: https://arxiv.org/abs/2401.06209

- + Dark matter of intelligence: https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/

- + Rasa chatbots: https://github.com/RasaHQ/rasa

- + Prometheus: https://github.com/prometheus-eval/prometheus-eval

- + Dino: https://github.com/facebookresearch/dino

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240919_060938_934c8351e1fe4c81a354cd419d0a3307.png +pub_date: Thu, 19 Sep 2024 11:02:40 GMT +title: Berlin Buzzwords 2024 - Sonam Pankaj - EmbedAnything +url: https://rss.com/podcasts/vector-podcast/1663042 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 21.44, "text": " Hello + there, vector podcast and I''m here accompanied with Sonan. Sonan you are the, I + guess,", "tokens": [50364, 2425, 456, 11, 8062, 7367, 293, 286, 478, 510, 24202, + 365, 5185, 282, 13, 5185, 282, 291, 366, 264, 11, 286, 2041, 11, 51436], "temperature": + 0.0, "avg_logprob": -0.36618917815539304, "compression_ratio": 1.335820895522388, + "no_speech_prob": 0.13012650609016418}, {"id": 1, "seek": 0, "start": 21.44, "end": + 26.92, "text": " visitor of the conference. Are you also giving a talk? Yes, I''m + giving a talk tomorrow", "tokens": [51436, 28222, 295, 264, 7586, 13, 2014, 291, + 611, 2902, 257, 751, 30, 1079, 11, 286, 478, 2902, 257, 751, 4153, 51710], "temperature": + 0.0, "avg_logprob": -0.36618917815539304, "compression_ratio": 1.335820895522388, + "no_speech_prob": 0.13012650609016418}, {"id": 2, "seek": 2692, "start": 26.92, + "end": 33.84, "text": " on metric learning. Yeah, what''s your topic? I''m not talking + metric learning tomorrow, but I''m", "tokens": [50364, 322, 20678, 2539, 13, 865, + 11, 437, 311, 428, 4829, 30, 286, 478, 406, 1417, 20678, 2539, 4153, 11, 457, 286, + 478, 50710], "temperature": 0.0, "avg_logprob": -0.35272098541259767, "compression_ratio": + 1.6506550218340612, "no_speech_prob": 0.43288251757621765}, {"id": 3, "seek": 2692, + "start": 33.84, "end": 39.68, "text": " very excited about what we are building + at and better than anything on starlight. 
So yeah,", "tokens": [50710, 588, 2919, + 466, 437, 321, 366, 2390, 412, 293, 1101, 813, 1340, 322, 3543, 2764, 13, 407, 1338, + 11, 51002], "temperature": 0.0, "avg_logprob": -0.35272098541259767, "compression_ratio": + 1.6506550218340612, "no_speech_prob": 0.43288251757621765}, {"id": 4, "seek": 2692, + "start": 39.68, "end": 46.160000000000004, "text": " awesome. And is it your first + time at the conference? Yes, it''s the first time, but that''s one of", "tokens": + [51002, 3476, 13, 400, 307, 309, 428, 700, 565, 412, 264, 7586, 30, 1079, 11, 309, + 311, 264, 700, 565, 11, 457, 300, 311, 472, 295, 51326], "temperature": 0.0, "avg_logprob": + -0.35272098541259767, "compression_ratio": 1.6506550218340612, "no_speech_prob": + 0.43288251757621765}, {"id": 5, "seek": 2692, "start": 46.160000000000004, "end": + 51.96, "text": " the best conferences. Awesome. Yeah, I love it. I''ve been here + first time in 2011 and I still,", "tokens": [51326, 264, 1151, 22032, 13, 10391, + 13, 865, 11, 286, 959, 309, 13, 286, 600, 668, 510, 700, 565, 294, 10154, 293, 286, + 920, 11, 51616], "temperature": 0.0, "avg_logprob": -0.35272098541259767, "compression_ratio": + 1.6506550218340612, "no_speech_prob": 0.43288251757621765}, {"id": 6, "seek": 5196, + "start": 51.96, "end": 56.68, "text": " I still love coming back once in a while. + It''s really good. I can see why you want to come back", "tokens": [50364, 286, + 920, 959, 1348, 646, 1564, 294, 257, 1339, 13, 467, 311, 534, 665, 13, 286, 393, + 536, 983, 291, 528, 281, 808, 646, 50600], "temperature": 0.0, "avg_logprob": -0.20331482660202754, + "compression_ratio": 1.6244725738396624, "no_speech_prob": 0.04114510864019394}, + {"id": 7, "seek": 5196, "start": 56.68, "end": 65.32, "text": " again and again. + Yeah, exactly. Yeah. Awesome. 
And you work mostly on what I, well, we had an", "tokens": + [50600, 797, 293, 797, 13, 865, 11, 2293, 13, 865, 13, 10391, 13, 400, 291, 589, + 5240, 322, 437, 286, 11, 731, 11, 321, 632, 364, 51032], "temperature": 0.0, "avg_logprob": + -0.20331482660202754, "compression_ratio": 1.6244725738396624, "no_speech_prob": + 0.04114510864019394}, {"id": 8, "seek": 5196, "start": 65.32, "end": 72.36, "text": + " episode actually with quadrants on metric learning. I will, I will make sure to + link it. Tell me a", "tokens": [51032, 3500, 767, 365, 10787, 10968, 322, 20678, + 2539, 13, 286, 486, 11, 286, 486, 652, 988, 281, 2113, 309, 13, 5115, 385, 257, + 51384], "temperature": 0.0, "avg_logprob": -0.20331482660202754, "compression_ratio": + 1.6244725738396624, "no_speech_prob": 0.04114510864019394}, {"id": 9, "seek": 5196, + "start": 72.36, "end": 77.72, "text": " bit more about metric learning if you will. + Like in a, in a, why shouldn''t everyone care that he", "tokens": [51384, 857, 544, + 466, 20678, 2539, 498, 291, 486, 13, 1743, 294, 257, 11, 294, 257, 11, 983, 4659, + 380, 1518, 1127, 300, 415, 51652], "temperature": 0.0, "avg_logprob": -0.20331482660202754, + "compression_ratio": 1.6244725738396624, "no_speech_prob": 0.04114510864019394}, + {"id": 10, "seek": 7772, "start": 77.72, "end": 83.88, "text": " seems to think + that they should use maybe yes. So a lot of people just think about like, you know,", + "tokens": [50364, 2544, 281, 519, 300, 436, 820, 764, 1310, 2086, 13, 407, 257, + 688, 295, 561, 445, 519, 466, 411, 11, 291, 458, 11, 50672], "temperature": 0.0, + "avg_logprob": -0.25106561183929443, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.05657637491822243}, {"id": 11, "seek": 7772, "start": 83.88, "end": 89.8, "text": + " we can do a check distance and then you know, we''ll get the similarity. 
But the + thing is,", "tokens": [50672, 321, 393, 360, 257, 1520, 4560, 293, 550, 291, 458, + 11, 321, 603, 483, 264, 32194, 13, 583, 264, 551, 307, 11, 50968], "temperature": + 0.0, "avg_logprob": -0.25106561183929443, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.05657637491822243}, {"id": 12, "seek": 7772, "start": 89.8, + "end": 96.2, "text": " even though you change the distance, it won''t make any difference + because those embeddings are", "tokens": [50968, 754, 1673, 291, 1319, 264, 4560, + 11, 309, 1582, 380, 652, 604, 2649, 570, 729, 12240, 29432, 366, 51288], "temperature": + 0.0, "avg_logprob": -0.25106561183929443, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.05657637491822243}, {"id": 13, "seek": 7772, "start": 96.2, + "end": 102.6, "text": " already in the space. So it''s already relative. So if you''re + doing a co-science similarity,", "tokens": [51288, 1217, 294, 264, 1901, 13, 407, + 309, 311, 1217, 4972, 13, 407, 498, 291, 434, 884, 257, 598, 12, 82, 6699, 32194, + 11, 51608], "temperature": 0.0, "avg_logprob": -0.25106561183929443, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.05657637491822243}, {"id": 14, "seek": 10260, + "start": 102.67999999999999, "end": 108.91999999999999, "text": " which I love pizza + and I do not love pizza, that''s your 90% similarity. Right? And", "tokens": [50368, + 597, 286, 959, 8298, 293, 286, 360, 406, 959, 8298, 11, 300, 311, 428, 4289, 4, + 32194, 13, 1779, 30, 400, 50680], "temperature": 0.0, "avg_logprob": -0.31419731398760264, + "compression_ratio": 1.6545454545454545, "no_speech_prob": 0.026558207347989082}, + {"id": 15, "seek": 10260, "start": 108.91999999999999, "end": 114.75999999999999, + "text": " to the other distance will not make any sense. 
So the thing is with metric + learning, you can", "tokens": [50680, 281, 264, 661, 4560, 486, 406, 652, 604, 2020, + 13, 407, 264, 551, 307, 365, 20678, 2539, 11, 291, 393, 50972], "temperature": 0.0, + "avg_logprob": -0.31419731398760264, "compression_ratio": 1.6545454545454545, "no_speech_prob": + 0.026558207347989082}, {"id": 16, "seek": 10260, "start": 114.75999999999999, "end": + 120.6, "text": " build your own data set and then the train there when embedding + model again for giving you", "tokens": [50972, 1322, 428, 1065, 1412, 992, 293, + 550, 264, 3847, 456, 562, 12240, 3584, 2316, 797, 337, 2902, 291, 51264], "temperature": + 0.0, "avg_logprob": -0.31419731398760264, "compression_ratio": 1.6545454545454545, + "no_speech_prob": 0.026558207347989082}, {"id": 17, "seek": 10260, "start": 120.6, + "end": 127.08, "text": " right. Yeah. I mean, I still try to understand it, but + it''s basically like, like on one hand,", "tokens": [51264, 558, 13, 865, 13, 286, + 914, 11, 286, 920, 853, 281, 1223, 309, 11, 457, 309, 311, 1936, 411, 11, 411, 322, + 472, 1011, 11, 51588], "temperature": 0.0, "avg_logprob": -0.31419731398760264, + "compression_ratio": 1.6545454545454545, "no_speech_prob": 0.026558207347989082}, + {"id": 18, "seek": 10260, "start": 127.08, "end": 131.56, "text": " you have your + data and then you choose the model and that model should be pre-trained for you,", + "tokens": [51588, 291, 362, 428, 1412, 293, 550, 291, 2826, 264, 2316, 293, 300, + 2316, 820, 312, 659, 12, 17227, 2001, 337, 291, 11, 51812], "temperature": 0.0, + "avg_logprob": -0.31419731398760264, "compression_ratio": 1.6545454545454545, "no_speech_prob": + 0.026558207347989082}, {"id": 19, "seek": 13156, "start": 132.04, "end": 137.96, + "text": " you could also fine tune it on your data if you want. 
And then inherently, + it will have its own", "tokens": [50388, 291, 727, 611, 2489, 10864, 309, 322, 428, + 1412, 498, 291, 528, 13, 400, 550, 27993, 11, 309, 486, 362, 1080, 1065, 50684], + "temperature": 0.0, "avg_logprob": -0.157344298892551, "compression_ratio": 1.7293577981651376, + "no_speech_prob": 0.0022104955278337}, {"id": 20, "seek": 13156, "start": 139.16, + "end": 145.0, "text": " measure of similarity. So it''s not something you can easily + control. Yeah. But then metric learning", "tokens": [50744, 3481, 295, 32194, 13, + 407, 309, 311, 406, 746, 291, 393, 3612, 1969, 13, 865, 13, 583, 550, 20678, 2539, + 51036], "temperature": 0.0, "avg_logprob": -0.157344298892551, "compression_ratio": + 1.7293577981651376, "no_speech_prob": 0.0022104955278337}, {"id": 21, "seek": 13156, + "start": 145.0, "end": 150.04, "text": " opposes this by saying that you should + be in control of your metric. Yeah. It''s all your", "tokens": [51036, 1458, 4201, + 341, 538, 1566, 300, 291, 820, 312, 294, 1969, 295, 428, 20678, 13, 865, 13, 467, + 311, 439, 428, 51288], "temperature": 0.0, "avg_logprob": -0.157344298892551, "compression_ratio": + 1.7293577981651376, "no_speech_prob": 0.0022104955278337}, {"id": 22, "seek": 13156, + "start": 150.04, "end": 155.32, "text": " similarity measure, not just the metric + itself, but the similarity measure, which means that", "tokens": [51288, 32194, + 3481, 11, 406, 445, 264, 20678, 2564, 11, 457, 264, 32194, 3481, 11, 597, 1355, + 300, 51552], "temperature": 0.0, "avg_logprob": -0.157344298892551, "compression_ratio": + 1.7293577981651376, "no_speech_prob": 0.0022104955278337}, {"id": 23, "seek": 15532, + "start": 155.88, "end": 162.28, "text": " I should kind of like drop the model, + just get my data and start training some new network,", "tokens": [50392, 286, 820, + 733, 295, 411, 3270, 264, 2316, 11, 445, 483, 452, 1412, 293, 722, 3097, 512, 777, + 3209, 11, 50712], "temperature": 0.0, "avg_logprob": -0.24773545100771147, 
"compression_ratio": + 1.5829787234042554, "no_speech_prob": 0.007064012344926596}, {"id": 24, "seek": + 15532, "start": 162.28, "end": 169.4, "text": " right? So that I can find the basically + fine tuning the embedding model. What with your data?", "tokens": [50712, 558, 30, + 407, 300, 286, 393, 915, 264, 1936, 2489, 15164, 264, 12240, 3584, 2316, 13, 708, + 365, 428, 1412, 30, 51068], "temperature": 0.0, "avg_logprob": -0.24773545100771147, + "compression_ratio": 1.5829787234042554, "no_speech_prob": 0.007064012344926596}, + {"id": 25, "seek": 15532, "start": 169.4, "end": 176.12, "text": " So yeah, suppose + you''re finding intense. Yes. Okay. Where does metric learning really shine?", "tokens": + [51068, 407, 1338, 11, 7297, 291, 434, 5006, 9447, 13, 1079, 13, 1033, 13, 2305, + 775, 20678, 2539, 534, 12207, 30, 51404], "temperature": 0.0, "avg_logprob": -0.24773545100771147, + "compression_ratio": 1.5829787234042554, "no_speech_prob": 0.007064012344926596}, + {"id": 26, "seek": 15532, "start": 177.07999999999998, "end": 183.07999999999998, + "text": " It''s classification versus similarity again. If you are doing classification, + you are limited", "tokens": [51452, 467, 311, 21538, 5717, 32194, 797, 13, 759, + 291, 366, 884, 21538, 11, 291, 366, 5567, 51752], "temperature": 0.0, "avg_logprob": + -0.24773545100771147, "compression_ratio": 1.5829787234042554, "no_speech_prob": + 0.007064012344926596}, {"id": 27, "seek": 18308, "start": 183.08, "end": 190.28, + "text": " up to certain classes, right? Suppose, yeah, particular intense. Yeah. 
+ It''s not scalable at", "tokens": [50364, 493, 281, 1629, 5359, 11, 558, 30, 21360, + 11, 1338, 11, 1729, 9447, 13, 865, 13, 467, 311, 406, 38481, 412, 50724], "temperature": + 0.0, "avg_logprob": -0.30882939425381745, "compression_ratio": 1.6952789699570816, + "no_speech_prob": 0.005887746810913086}, {"id": 28, "seek": 18308, "start": 190.28, + "end": 197.0, "text": " like a million scale, you cannot keep adding adding addicts, + but with similarity search and metric", "tokens": [50724, 411, 257, 2459, 4373, + 11, 291, 2644, 1066, 5127, 5127, 22072, 82, 11, 457, 365, 32194, 3164, 293, 20678, + 51060], "temperature": 0.0, "avg_logprob": -0.30882939425381745, "compression_ratio": + 1.6952789699570816, "no_speech_prob": 0.005887746810913086}, {"id": 29, "seek": + 18308, "start": 197.0, "end": 203.16000000000003, "text": " learning, you can add + any intense, very keen solution. Yeah. Yeah. So it''s not limited. Yeah. That''s + one", "tokens": [51060, 2539, 11, 291, 393, 909, 604, 9447, 11, 588, 20297, 3827, + 13, 865, 13, 865, 13, 407, 309, 311, 406, 5567, 13, 865, 13, 663, 311, 472, 51368], + "temperature": 0.0, "avg_logprob": -0.30882939425381745, "compression_ratio": 1.6952789699570816, + "no_speech_prob": 0.005887746810913086}, {"id": 30, "seek": 18308, "start": 203.16000000000003, + "end": 211.0, "text": " of the, you know, classical way to view that metric learning + plays much, much better role at scale,", "tokens": [51368, 295, 264, 11, 291, 458, + 11, 13735, 636, 281, 1910, 300, 20678, 2539, 5749, 709, 11, 709, 1101, 3090, 412, + 4373, 11, 51760], "temperature": 0.0, "avg_logprob": -0.30882939425381745, "compression_ratio": + 1.6952789699570816, "no_speech_prob": 0.005887746810913086}, {"id": 31, "seek": + 21100, "start": 211.0, "end": 219.32, "text": " and that''s why vector database + can scale this much. Sure. Yeah. 
And tell me a bit more about yourself.", "tokens": + [50364, 293, 300, 311, 983, 8062, 8149, 393, 4373, 341, 709, 13, 4894, 13, 865, + 13, 400, 980, 385, 257, 857, 544, 466, 1803, 13, 50780], "temperature": 0.0, "avg_logprob": + -0.22737432207380023, "compression_ratio": 1.5916666666666666, "no_speech_prob": + 0.005757349543273449}, {"id": 32, "seek": 21100, "start": 219.32, "end": 223.88, + "text": " How did you end up in this space? Like, what was your pet? I know you + worked at Thrasa as well,", "tokens": [50780, 1012, 630, 291, 917, 493, 294, 341, + 1901, 30, 1743, 11, 437, 390, 428, 3817, 30, 286, 458, 291, 2732, 412, 334, 3906, + 64, 382, 731, 11, 51008], "temperature": 0.0, "avg_logprob": -0.22737432207380023, + "compression_ratio": 1.5916666666666666, "no_speech_prob": 0.005757349543273449}, + {"id": 33, "seek": 21100, "start": 223.88, "end": 229.56, "text": " which is also + an open source project. Yes. I once looked at and but now you work for another", + "tokens": [51008, 597, 307, 611, 364, 1269, 4009, 1716, 13, 1079, 13, 286, 1564, + 2956, 412, 293, 457, 586, 291, 589, 337, 1071, 51292], "temperature": 0.0, "avg_logprob": + -0.22737432207380023, "compression_ratio": 1.5916666666666666, "no_speech_prob": + 0.005757349543273449}, {"id": 34, "seek": 21100, "start": 229.56, "end": 238.04, + "text": " company like, what was your journey? And yeah. So I worked at as an AI + researcher at Sama,", "tokens": [51292, 2237, 411, 11, 437, 390, 428, 4671, 30, + 400, 1338, 13, 407, 286, 2732, 412, 382, 364, 7318, 21751, 412, 318, 2404, 11, 51716], + "temperature": 0.0, "avg_logprob": -0.22737432207380023, "compression_ratio": 1.5916666666666666, + "no_speech_prob": 0.005757349543273449}, {"id": 35, "seek": 23804, "start": 238.12, + "end": 243.16, "text": " so we were mostly in clinical trials. 
So, you know, Pfizer + and the world is,", "tokens": [50368, 370, 321, 645, 5240, 294, 9115, 12450, 13, + 407, 11, 291, 458, 11, 34694, 293, 264, 1002, 307, 11, 50620], "temperature": 0.0, + "avg_logprob": -0.29561863774838654, "compression_ratio": 1.521186440677966, "no_speech_prob": + 0.055215124040842056}, {"id": 36, "seek": 23804, "start": 243.16, "end": 249.32, + "text": " it does this clinical trials for 10 to 12 years and we had like those + massive data and we wanted", "tokens": [50620, 309, 775, 341, 9115, 12450, 337, + 1266, 281, 2272, 924, 293, 321, 632, 411, 729, 5994, 1412, 293, 321, 1415, 50928], + "temperature": 0.0, "avg_logprob": -0.29561863774838654, "compression_ratio": 1.521186440677966, + "no_speech_prob": 0.055215124040842056}, {"id": 37, "seek": 23804, "start": 249.32, + "end": 255.0, "text": " to find out the subjects could drop out of the studies. + I also published paper before. That''s", "tokens": [50928, 281, 915, 484, 264, 13066, + 727, 3270, 484, 295, 264, 5313, 13, 286, 611, 6572, 3035, 949, 13, 663, 311, 51212], + "temperature": 0.0, "avg_logprob": -0.29561863774838654, "compression_ratio": 1.521186440677966, + "no_speech_prob": 0.055215124040842056}, {"id": 38, "seek": 23804, "start": 255.0, + "end": 262.12, "text": " well-versed in this AI research and AI area. Yeah. And + then I joined Raza for conversation,", "tokens": [51212, 731, 12, 840, 292, 294, + 341, 7318, 2132, 293, 7318, 1859, 13, 865, 13, 400, 550, 286, 6869, 497, 12257, + 337, 3761, 11, 51568], "temperature": 0.0, "avg_logprob": -0.29561863774838654, + "compression_ratio": 1.521186440677966, "no_speech_prob": 0.055215124040842056}, + {"id": 39, "seek": 26212, "start": 262.12, "end": 270.76, "text": " AI, I love conversation, + AI. 
And then I joined four friends recently and I got into this embedding", "tokens": + [50364, 7318, 11, 286, 959, 3761, 11, 7318, 13, 400, 550, 286, 6869, 1451, 1855, + 3938, 293, 286, 658, 666, 341, 12240, 3584, 50796], "temperature": 0.0, "avg_logprob": + -0.36772735804727635, "compression_ratio": 1.5077720207253886, "no_speech_prob": + 0.02951665408909321}, {"id": 40, "seek": 26212, "start": 270.76, "end": 276.52, + "text": " space. And now I have my own open source project called embedding a thing + in which you can use", "tokens": [50796, 1901, 13, 400, 586, 286, 362, 452, 1065, + 1269, 4009, 1716, 1219, 12240, 3584, 257, 551, 294, 597, 291, 393, 764, 51084], + "temperature": 0.0, "avg_logprob": -0.36772735804727635, "compression_ratio": 1.5077720207253886, + "no_speech_prob": 0.02951665408909321}, {"id": 41, "seek": 26212, "start": 276.52, + "end": 283.8, "text": " very different multi-moder sources and structure sources, + speed, you know, you get embed it in 40", "tokens": [51084, 588, 819, 4825, 12, + 8014, 260, 7139, 293, 3877, 7139, 11, 3073, 11, 291, 458, 11, 291, 483, 12240, 309, + 294, 3356, 51448], "temperature": 0.0, "avg_logprob": -0.36772735804727635, "compression_ratio": + 1.5077720207253886, "no_speech_prob": 0.02951665408909321}, {"id": 42, "seek": 28380, + "start": 283.8, "end": 290.92, "text": " x faster speed than any other presence + by planes. Wow. How did you do that? That is rust.", "tokens": [50364, 2031, 4663, + 3073, 813, 604, 661, 6814, 538, 14952, 13, 3153, 13, 1012, 630, 291, 360, 300, 30, + 663, 307, 15259, 13, 50720], "temperature": 0.0, "avg_logprob": -0.3309457334753585, + "compression_ratio": 1.4789473684210526, "no_speech_prob": 0.02470666728913784}, + {"id": 43, "seek": 28380, "start": 291.72, "end": 297.32, "text": " It''s all available. 
+ It''s all open source because I have like a used supporter of open source.", "tokens": + [50760, 467, 311, 439, 2435, 13, 467, 311, 439, 1269, 4009, 570, 286, 362, 411, + 257, 1143, 28600, 295, 1269, 4009, 13, 51040], "temperature": 0.0, "avg_logprob": + -0.3309457334753585, "compression_ratio": 1.4789473684210526, "no_speech_prob": + 0.02470666728913784}, {"id": 44, "seek": 28380, "start": 298.84000000000003, "end": + 306.84000000000003, "text": " So what we do is we have built this cluster in rust + from PDF while it is going towards embedding.", "tokens": [51116, 407, 437, 321, + 360, 307, 321, 362, 3094, 341, 13630, 294, 15259, 490, 17752, 1339, 309, 307, 516, + 3030, 12240, 3584, 13, 51516], "temperature": 0.0, "avg_logprob": -0.3309457334753585, + "compression_ratio": 1.4789473684210526, "no_speech_prob": 0.02470666728913784}, + {"id": 45, "seek": 30684, "start": 306.84, "end": 314.28, "text": " So one of the + analogy that I use most is embedding models are, yeah, they are like really,", "tokens": + [50364, 407, 472, 295, 264, 21663, 300, 286, 764, 881, 307, 12240, 3584, 5245, 366, + 11, 1338, 11, 436, 366, 411, 534, 11, 50736], "temperature": 0.0, "avg_logprob": + -0.3068643935183261, "compression_ratio": 1.6473214285714286, "no_speech_prob": + 0.014807434752583504}, {"id": 46, "seek": 30684, "start": 314.28, "end": 320.59999999999997, + "text": " really cool. They are becoming faster and everything. But if you want + to drive a Porsche,", "tokens": [50736, 534, 1627, 13, 814, 366, 5617, 4663, 293, + 1203, 13, 583, 498, 291, 528, 281, 3332, 257, 31044, 11, 51052], "temperature": + 0.0, "avg_logprob": -0.3068643935183261, "compression_ratio": 1.6473214285714286, + "no_speech_prob": 0.014807434752583504}, {"id": 47, "seek": 30684, "start": 320.59999999999997, + "end": 327.71999999999997, "text": " would you like to drive it on a national highway + for a road full of quadruples? 
So that''s the", "tokens": [51052, 576, 291, 411, + 281, 3332, 309, 322, 257, 4048, 17205, 337, 257, 3060, 1577, 295, 10787, 894, 2622, + 30, 407, 300, 311, 264, 51408], "temperature": 0.0, "avg_logprob": -0.3068643935183261, + "compression_ratio": 1.6473214285714286, "no_speech_prob": 0.014807434752583504}, + {"id": 48, "seek": 30684, "start": 327.71999999999997, "end": 334.28, "text": " + analogy being used. We are giving you a high for the price for driving your embedding + model or", "tokens": [51408, 21663, 885, 1143, 13, 492, 366, 2902, 291, 257, 1090, + 337, 264, 3218, 337, 4840, 428, 12240, 3584, 2316, 420, 51736], "temperature": 0.0, + "avg_logprob": -0.3068643935183261, "compression_ratio": 1.6473214285714286, "no_speech_prob": + 0.014807434752583504}, {"id": 49, "seek": 33428, "start": 334.28, "end": 341.55999999999995, + "text": " Porsche, you know, in a very sophisticated and like, yeah, no tech depth, + you call it by", "tokens": [50364, 31044, 11, 291, 458, 11, 294, 257, 588, 16950, + 293, 411, 11, 1338, 11, 572, 7553, 7161, 11, 291, 818, 309, 538, 50728], "temperature": + 0.0, "avg_logprob": -0.3562089134665096, "compression_ratio": 1.6105769230769231, + "no_speech_prob": 0.0064294240437448025}, {"id": 50, "seek": 33428, "start": 341.55999999999995, + "end": 347.88, "text": " blind for embedding. Interesting. So you are basically + building an infrastructure", "tokens": [50728, 6865, 337, 12240, 3584, 13, 14711, + 13, 407, 291, 366, 1936, 2390, 364, 6896, 51044], "temperature": 0.0, "avg_logprob": + -0.3562089134665096, "compression_ratio": 1.6105769230769231, "no_speech_prob": + 0.0064294240437448025}, {"id": 51, "seek": 33428, "start": 348.67999999999995, "end": + 354.59999999999997, "text": " where or infrastructure for this is a very model. 
+ So what as a user, what can I do on this", "tokens": [51084, 689, 420, 6896, 337, + 341, 307, 257, 588, 2316, 13, 407, 437, 382, 257, 4195, 11, 437, 393, 286, 360, + 322, 341, 51380], "temperature": 0.0, "avg_logprob": -0.3562089134665096, "compression_ratio": + 1.6105769230769231, "no_speech_prob": 0.0064294240437448025}, {"id": 52, "seek": + 33428, "start": 355.64, "end": 361.08, "text": " project? Yeah, very good question. + So we are very production ready. Yeah.", "tokens": [51432, 1716, 30, 865, 11, 588, + 665, 1168, 13, 407, 321, 366, 588, 4265, 1919, 13, 865, 13, 51704], "temperature": + 0.0, "avg_logprob": -0.3562089134665096, "compression_ratio": 1.6105769230769231, + "no_speech_prob": 0.0064294240437448025}, {"id": 53, "seek": 36108, "start": 362.03999999999996, + "end": 368.12, "text": " And we do not use any kind of heavy library, right? They + are lip torches. So if you have to embed", "tokens": [50412, 400, 321, 360, 406, + 764, 604, 733, 295, 4676, 6405, 11, 558, 30, 814, 366, 8280, 3930, 3781, 13, 407, + 498, 291, 362, 281, 12240, 50716], "temperature": 0.0, "avg_logprob": -0.40969002765157947, + "compression_ratio": 1.6590038314176245, "no_speech_prob": 0.201416477560997}, {"id": + 54, "seek": 36108, "start": 368.12, "end": 373.88, "text": " something, the first + go on hugging phase, use sentence, transformers, and then you will download", "tokens": + [50716, 746, 11, 264, 700, 352, 322, 41706, 5574, 11, 764, 8174, 11, 4088, 433, + 11, 293, 550, 291, 486, 5484, 51004], "temperature": 0.0, "avg_logprob": -0.40969002765157947, + "compression_ratio": 1.6590038314176245, "no_speech_prob": 0.201416477560997}, {"id": + 55, "seek": 36108, "start": 373.88, "end": 379.47999999999996, "text": " that 2.5 + TV library and stuff like which will come with lip torches and stuff like that.", + "tokens": [51004, 300, 568, 13, 20, 3558, 6405, 293, 1507, 411, 597, 486, 808, 365, + 8280, 3930, 3781, 293, 1507, 411, 300, 13, 51284], "temperature": 0.0, 
"avg_logprob": + -0.40969002765157947, "compression_ratio": 1.6590038314176245, "no_speech_prob": + 0.201416477560997}, {"id": 56, "seek": 36108, "start": 379.47999999999996, "end": + 381.96, "text": " Yeah. And we have removed all those dependents. All right.", "tokens": + [51284, 865, 13, 400, 321, 362, 7261, 439, 729, 5672, 791, 13, 1057, 558, 13, 51408], + "temperature": 0.0, "avg_logprob": -0.40969002765157947, "compression_ratio": 1.6590038314176245, + "no_speech_prob": 0.201416477560997}, {"id": 57, "seek": 36108, "start": 383.0, + "end": 388.44, "text": " That''s a good lighter. Yeah. We have liked it. Yeah. But + of candle from the hugging phase,", "tokens": [51460, 663, 311, 257, 665, 11546, + 13, 865, 13, 492, 362, 4501, 309, 13, 865, 13, 583, 295, 17968, 490, 264, 41706, + 5574, 11, 51732], "temperature": 0.0, "avg_logprob": -0.40969002765157947, "compression_ratio": + 1.6590038314176245, "no_speech_prob": 0.201416477560997}, {"id": 58, "seek": 38844, + "start": 388.44, "end": 393.88, "text": " because candle also uses rust and because + we are also building rust, it''s much easier to integrate", "tokens": [50364, 570, + 17968, 611, 4960, 15259, 293, 570, 321, 366, 611, 2390, 15259, 11, 309, 311, 709, + 3571, 281, 13365, 50636], "temperature": 0.0, "avg_logprob": -0.2641873449649451, + "compression_ratio": 1.6625, "no_speech_prob": 0.021137645468115807}, {"id": 59, + "seek": 38844, "start": 393.88, "end": 400.36, "text": " with candle. So yeah, so + it''s much lighter, much faster, you know, way of creating. What is candle?", "tokens": + [50636, 365, 17968, 13, 407, 1338, 11, 370, 309, 311, 709, 11546, 11, 709, 4663, + 11, 291, 458, 11, 636, 295, 4084, 13, 708, 307, 17968, 30, 50960], "temperature": + 0.0, "avg_logprob": -0.2641873449649451, "compression_ratio": 1.6625, "no_speech_prob": + 0.021137645468115807}, {"id": 60, "seek": 38844, "start": 401.0, "end": 406.68, + "text": " Candle is basically, basically, inference on GPU and CPU. Oh, I see. Yeah. 
+ Yeah. And it''s also open source.", "tokens": [50992, 20466, 306, 307, 1936, 11, + 1936, 11, 38253, 322, 18407, 293, 13199, 13, 876, 11, 286, 536, 13, 865, 13, 865, + 13, 400, 309, 311, 611, 1269, 4009, 13, 51276], "temperature": 0.0, "avg_logprob": + -0.2641873449649451, "compression_ratio": 1.6625, "no_speech_prob": 0.021137645468115807}, + {"id": 61, "seek": 38844, "start": 406.68, "end": 412.2, "text": " Yeah, it''s also. + Okay. So you do everything unconventionally in a rust, even though everyone", "tokens": + [51276, 865, 11, 309, 311, 611, 13, 1033, 13, 407, 291, 360, 1203, 35847, 6411, + 379, 294, 257, 15259, 11, 754, 1673, 1518, 51552], "temperature": 0.0, "avg_logprob": + -0.2641873449649451, "compression_ratio": 1.6625, "no_speech_prob": 0.021137645468115807}, + {"id": 62, "seek": 41220, "start": 412.2, "end": 419.96, "text": " else is doing + it in Python. Because it''s, you know, multi-treting is like so much embedded in + rust.", "tokens": [50364, 1646, 307, 884, 309, 294, 15329, 13, 1436, 309, 311, 11, + 291, 458, 11, 4825, 12, 3599, 783, 307, 411, 370, 709, 16741, 294, 15259, 13, 50752], + "temperature": 0.0, "avg_logprob": -0.427581103342884, "compression_ratio": 1.738938053097345, + "no_speech_prob": 0.03050844930112362}, {"id": 63, "seek": 41220, "start": 419.96, + "end": 425.71999999999997, "text": " Like people will tell you that Python can also + do multi-treting, but that''s not too multi-treting", "tokens": [50752, 1743, 561, + 486, 980, 291, 300, 15329, 393, 611, 360, 4825, 12, 3599, 783, 11, 457, 300, 311, + 406, 886, 4825, 12, 3599, 783, 51040], "temperature": 0.0, "avg_logprob": -0.427581103342884, + "compression_ratio": 1.738938053097345, "no_speech_prob": 0.03050844930112362}, + {"id": 64, "seek": 41220, "start": 425.71999999999997, "end": 433.96, "text": " + because the global, global, and global, and global law. Yeah. 
And rust tells you + mutable log.", "tokens": [51040, 570, 264, 4338, 11, 4338, 11, 293, 4338, 11, 293, + 4338, 2101, 13, 865, 13, 400, 15259, 5112, 291, 5839, 712, 3565, 13, 51452], "temperature": + 0.0, "avg_logprob": -0.427581103342884, "compression_ratio": 1.738938053097345, + "no_speech_prob": 0.03050844930112362}, {"id": 65, "seek": 41220, "start": 433.96, + "end": 439.8, "text": " So you can do like achieve a tool multi-treting just with + rust. Yeah. They promised actually to solve", "tokens": [51452, 407, 291, 393, 360, + 411, 4584, 257, 2290, 4825, 12, 3599, 783, 445, 365, 15259, 13, 865, 13, 814, 10768, + 767, 281, 5039, 51744], "temperature": 0.0, "avg_logprob": -0.427581103342884, "compression_ratio": + 1.738938053097345, "no_speech_prob": 0.03050844930112362}, {"id": 66, "seek": 43980, + "start": 439.8, "end": 445.88, "text": " geolproblem in Python next version. Yeah. + They already are through them. Oh, wow. I don''t know when", "tokens": [50364, 1519, + 401, 4318, 1113, 294, 15329, 958, 3037, 13, 865, 13, 814, 1217, 366, 807, 552, 13, + 876, 11, 6076, 13, 286, 500, 380, 458, 562, 50668], "temperature": 0.0, "avg_logprob": + -0.27065517666103606, "compression_ratio": 1.5305343511450382, "no_speech_prob": + 0.007175566628575325}, {"id": 67, "seek": 43980, "start": 445.88, "end": 451.56, + "text": " it will materialize, but. Okay. And so, okay. 
But if I look at it from + the perspective, let''s say,", "tokens": [50668, 309, 486, 2527, 1125, 11, 457, + 13, 1033, 13, 400, 370, 11, 1392, 13, 583, 498, 286, 574, 412, 309, 490, 264, 4585, + 11, 718, 311, 584, 11, 50952], "temperature": 0.0, "avg_logprob": -0.27065517666103606, + "compression_ratio": 1.5305343511450382, "no_speech_prob": 0.007175566628575325}, + {"id": 68, "seek": 43980, "start": 451.56, "end": 459.08000000000004, "text": " + of building some product, being a chatbot or like search engine, you know, blend + it with vector search,", "tokens": [50952, 295, 2390, 512, 1674, 11, 885, 257, 5081, + 18870, 420, 411, 3164, 2848, 11, 291, 458, 11, 10628, 309, 365, 8062, 3164, 11, + 51328], "temperature": 0.0, "avg_logprob": -0.27065517666103606, "compression_ratio": + 1.5305343511450382, "no_speech_prob": 0.007175566628575325}, {"id": 69, "seek": + 43980, "start": 459.08000000000004, "end": 466.68, "text": " or something like that. + So what is my typical sort of like, like, pipeline, how does it look like?", "tokens": + [51328, 420, 746, 411, 300, 13, 407, 437, 307, 452, 7476, 1333, 295, 411, 11, 411, + 11, 15517, 11, 577, 775, 309, 574, 411, 30, 51708], "temperature": 0.0, "avg_logprob": + -0.27065517666103606, "compression_ratio": 1.5305343511450382, "no_speech_prob": + 0.007175566628575325}, {"id": 70, "seek": 46668, "start": 466.68, "end": 471.48, + "text": " Right. So what will I do? Let''s say I have my data. And then maybe I''ve + chosen a model,", "tokens": [50364, 1779, 13, 407, 437, 486, 286, 360, 30, 961, + 311, 584, 286, 362, 452, 1412, 13, 400, 550, 1310, 286, 600, 8614, 257, 2316, 11, + 50604], "temperature": 0.0, "avg_logprob": -0.1421315532085324, "compression_ratio": + 1.725868725868726, "no_speech_prob": 0.043925996869802475}, {"id": 71, "seek": 46668, + "start": 471.48, "end": 478.12, "text": " but that model is okay. Maybe it''s not + the fastest one. What should I do? 
Will I turn to your platform", "tokens": [50604, + 457, 300, 2316, 307, 1392, 13, 2704, 309, 311, 406, 264, 14573, 472, 13, 708, 820, + 286, 360, 30, 3099, 286, 1261, 281, 428, 3663, 50936], "temperature": 0.0, "avg_logprob": + -0.1421315532085324, "compression_ratio": 1.725868725868726, "no_speech_prob": 0.043925996869802475}, + {"id": 72, "seek": 46668, "start": 478.12, "end": 482.12, "text": " to speed it + up? Will I turn to your platform to do some other things as well?", "tokens": [50936, + 281, 3073, 309, 493, 30, 3099, 286, 1261, 281, 428, 3663, 281, 360, 512, 661, 721, + 382, 731, 30, 51136], "temperature": 0.0, "avg_logprob": -0.1421315532085324, "compression_ratio": + 1.725868725868726, "no_speech_prob": 0.043925996869802475}, {"id": 73, "seek": 46668, + "start": 483.48, "end": 490.04, "text": " So we are not doing any changes in the + model itself. We are not quantizing. Even though we can", "tokens": [51204, 407, + 321, 366, 406, 884, 604, 2962, 294, 264, 2316, 2564, 13, 492, 366, 406, 4426, 3319, + 13, 2754, 1673, 321, 393, 51532], "temperature": 0.0, "avg_logprob": -0.1421315532085324, + "compression_ratio": 1.725868725868726, "no_speech_prob": 0.043925996869802475}, + {"id": 74, "seek": 46668, "start": 490.04, "end": 495.88, "text": " use those models, + so candle gives you a certain list of models that you can use and", "tokens": [51532, + 764, 729, 5245, 11, 370, 17968, 2709, 291, 257, 1629, 1329, 295, 5245, 300, 291, + 393, 764, 293, 51824], "temperature": 0.0, "avg_logprob": -0.1421315532085324, "compression_ratio": + 1.725868725868726, "no_speech_prob": 0.043925996869802475}, {"id": 75, "seek": 49588, + "start": 495.88, "end": 501.8, "text": " create with us. Yeah. Basically. So whatever + candle supports, we support. Yeah. 
Whatever candle", "tokens": [50364, 1884, 365, + 505, 13, 865, 13, 8537, 13, 407, 2035, 17968, 9346, 11, 321, 1406, 13, 865, 13, + 8541, 17968, 50660], "temperature": 0.0, "avg_logprob": -0.2131784439086914, "compression_ratio": + 1.7393364928909953, "no_speech_prob": 0.009811174124479294}, {"id": 76, "seek": + 49588, "start": 501.8, "end": 505.88, "text": " doesn''t support, we cannot support + because we are basically dependent of them. Yeah.", "tokens": [50660, 1177, 380, + 1406, 11, 321, 2644, 1406, 570, 321, 366, 1936, 12334, 295, 552, 13, 865, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.2131784439086914, "compression_ratio": 1.7393364928909953, + "no_speech_prob": 0.009811174124479294}, {"id": 77, "seek": 49588, "start": 506.92, + "end": 514.28, "text": " So if and we are not doing anything in the model itself, + we are doing it on the extraction", "tokens": [50916, 407, 498, 293, 321, 366, 406, + 884, 1340, 294, 264, 2316, 2564, 11, 321, 366, 884, 309, 322, 264, 30197, 51284], + "temperature": 0.0, "avg_logprob": -0.2131784439086914, "compression_ratio": 1.7393364928909953, + "no_speech_prob": 0.009811174124479294}, {"id": 78, "seek": 49588, "start": 514.28, + "end": 520.44, "text": " in parsing part of the data. Right. If you have different + videos, different MDs, I will extract", "tokens": [51284, 294, 21156, 278, 644, + 295, 264, 1412, 13, 1779, 13, 759, 291, 362, 819, 2145, 11, 819, 22521, 82, 11, + 286, 486, 8947, 51592], "temperature": 0.0, "avg_logprob": -0.2131784439086914, + "compression_ratio": 1.7393364928909953, "no_speech_prob": 0.009811174124479294}, + {"id": 79, "seek": 52044, "start": 520.44, "end": 526.6800000000001, "text": " junk + and parse them. And then build this like extra fast. Yeah. 
Yeah.", "tokens": [50364, + 19109, 293, 48377, 552, 13, 400, 550, 1322, 341, 411, 2857, 2370, 13, 865, 13, 865, + 13, 50676], "temperature": 0.0, "avg_logprob": -0.2846457842484261, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.03815966844558716}, {"id": 80, "seek": 52044, + "start": 528.2, "end": 534.9200000000001, "text": " And then let''s say if I want + to go to production, but I also have some other components which", "tokens": [50752, + 400, 550, 718, 311, 584, 498, 286, 528, 281, 352, 281, 4265, 11, 457, 286, 611, + 362, 512, 661, 6677, 597, 51088], "temperature": 0.0, "avg_logprob": -0.2846457842484261, + "compression_ratio": 1.5867768595041323, "no_speech_prob": 0.03815966844558716}, + {"id": 81, "seek": 52044, "start": 534.9200000000001, "end": 539.48, "text": " maybe + you wouldn''t integrate, right? I know my search cluster and something else.", "tokens": + [51088, 1310, 291, 2759, 380, 13365, 11, 558, 30, 286, 458, 452, 3164, 13630, 293, + 746, 1646, 13, 51316], "temperature": 0.0, "avg_logprob": -0.2846457842484261, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.03815966844558716}, {"id": 82, "seek": 52044, + "start": 539.48, "end": 544.84, "text": " My services. So can I also go to production + with your platform? Yeah.", "tokens": [51316, 1222, 3328, 13, 407, 393, 286, 611, + 352, 281, 4265, 365, 428, 3663, 30, 865, 13, 51584], "temperature": 0.0, "avg_logprob": + -0.2846457842484261, "compression_ratio": 1.5867768595041323, "no_speech_prob": + 0.03815966844558716}, {"id": 83, "seek": 52044, "start": 545.48, "end": 549.32, + "text": " Like, how will it look like exactly? Is it a docker? 
Is it the unit?", + "tokens": [51616, 1743, 11, 577, 486, 309, 574, 411, 2293, 30, 1119, 309, 257, 360, + 9178, 30, 1119, 309, 264, 4985, 30, 51808], "temperature": 0.0, "avg_logprob": -0.2846457842484261, + "compression_ratio": 1.5867768595041323, "no_speech_prob": 0.03815966844558716}, + {"id": 84, "seek": 54932, "start": 549.8000000000001, "end": 554.7600000000001, + "text": " It''s you do not need to first of all code in Rust. A lot of developers + come out to reach me and", "tokens": [50388, 467, 311, 291, 360, 406, 643, 281, + 700, 295, 439, 3089, 294, 34952, 13, 316, 688, 295, 8849, 808, 484, 281, 2524, 385, + 293, 50636], "temperature": 0.0, "avg_logprob": -0.3261859348842076, "compression_ratio": + 1.691304347826087, "no_speech_prob": 0.05555598437786102}, {"id": 85, "seek": 54932, + "start": 554.7600000000001, "end": 560.7600000000001, "text": " like, you know, + they ask, do I need to know Rust to contribute to embed anything? I''m like, no,", + "tokens": [50636, 411, 11, 291, 458, 11, 436, 1029, 11, 360, 286, 643, 281, 458, + 34952, 281, 10586, 281, 12240, 1340, 30, 286, 478, 411, 11, 572, 11, 50936], "temperature": + 0.0, "avg_logprob": -0.3261859348842076, "compression_ratio": 1.691304347826087, + "no_speech_prob": 0.05555598437786102}, {"id": 86, "seek": 54932, "start": 560.7600000000001, + "end": 568.5200000000001, "text": " you do not need. We have, because let''s fire. + We have like worked like for building this wrapper", "tokens": [50936, 291, 360, + 406, 643, 13, 492, 362, 11, 570, 718, 311, 2610, 13, 492, 362, 411, 2732, 411, 337, + 2390, 341, 46906, 51324], "temperature": 0.0, "avg_logprob": -0.3261859348842076, + "compression_ratio": 1.691304347826087, "no_speech_prob": 0.05555598437786102}, + {"id": 87, "seek": 54932, "start": 568.5200000000001, "end": 574.36, "text": " around + Rust so that you know, you can easily create it with Python. 
Oh, so you have a Python + wrapper", "tokens": [51324, 926, 34952, 370, 300, 291, 458, 11, 291, 393, 3612, + 1884, 309, 365, 15329, 13, 876, 11, 370, 291, 362, 257, 15329, 46906, 51616], "temperature": + 0.0, "avg_logprob": -0.3261859348842076, "compression_ratio": 1.691304347826087, + "no_speech_prob": 0.05555598437786102}, {"id": 88, "seek": 57436, "start": 574.84, + "end": 581.32, "text": " of your own. Yeah. You only need to know Python. You do + not need to know Rust at all.", "tokens": [50388, 295, 428, 1065, 13, 865, 13, 509, + 787, 643, 281, 458, 15329, 13, 509, 360, 406, 643, 281, 458, 34952, 412, 439, 13, + 50712], "temperature": 0.0, "avg_logprob": -0.25760720481335275, "compression_ratio": + 1.538888888888889, "no_speech_prob": 0.01238335482776165}, {"id": 89, "seek": 57436, + "start": 581.32, "end": 588.76, "text": " How interesting. Yeah. So do you have + any like instances where companies have already built", "tokens": [50712, 1012, + 1880, 13, 865, 13, 407, 360, 291, 362, 604, 411, 14519, 689, 3431, 362, 1217, 3094, + 51084], "temperature": 0.0, "avg_logprob": -0.25760720481335275, "compression_ratio": + 1.538888888888889, "no_speech_prob": 0.01238335482776165}, {"id": 90, "seek": 57436, + "start": 588.76, "end": 595.88, "text": " POCs with your platform? Or do you already + have someone going to production? Yeah. So I get so many", "tokens": [51084, 22299, + 33290, 365, 428, 3663, 30, 1610, 360, 291, 1217, 362, 1580, 516, 281, 4265, 30, + 865, 13, 407, 286, 483, 370, 867, 51440], "temperature": 0.0, "avg_logprob": -0.25760720481335275, + "compression_ratio": 1.538888888888889, "no_speech_prob": 0.01238335482776165}, + {"id": 91, "seek": 59588, "start": 595.88, "end": 604.4399999999999, "text": " requests + on like acquisition part of things and stuff like that. 
But we are like, you know,", + "tokens": [50364, 12475, 322, 411, 21668, 644, 295, 721, 293, 1507, 411, 300, 13, + 583, 321, 366, 411, 11, 291, 458, 11, 50792], "temperature": 0.0, "avg_logprob": + -0.2676670809826219, "compression_ratio": 1.6237623762376239, "no_speech_prob": + 0.01055728830397129}, {"id": 92, "seek": 59588, "start": 604.4399999999999, "end": + 611.24, "text": " it''s we are one my whole company and we have nothing company + project. But we have got six", "tokens": [50792, 309, 311, 321, 366, 472, 452, 1379, + 2237, 293, 321, 362, 1825, 2237, 1716, 13, 583, 321, 362, 658, 2309, 51132], "temperature": + 0.0, "avg_logprob": -0.2676670809826219, "compression_ratio": 1.6237623762376239, + "no_speech_prob": 0.01055728830397129}, {"id": 93, "seek": 59588, "start": 611.24, + "end": 617.56, "text": " key downloads, but we have gotten to production yet. But + hopefully next two, three months.", "tokens": [51132, 2141, 36553, 11, 457, 321, + 362, 5768, 281, 4265, 1939, 13, 583, 4696, 958, 732, 11, 1045, 2493, 13, 51448], + "temperature": 0.0, "avg_logprob": -0.2676670809826219, "compression_ratio": 1.6237623762376239, + "no_speech_prob": 0.01055728830397129}, {"id": 94, "seek": 59588, "start": 617.56, + "end": 621.96, "text": " Nice. And do you need any help from the community? 
Yes.", + "tokens": [51448, 5490, 13, 400, 360, 291, 643, 604, 854, 490, 264, 1768, 30, 1079, + 13, 51668], "temperature": 0.0, "avg_logprob": -0.2676670809826219, "compression_ratio": + 1.6237623762376239, "no_speech_prob": 0.01055728830397129}, {"id": 95, "seek": 62196, + "start": 622.9200000000001, "end": 628.9200000000001, "text": " If you''re interested + in building the infrastructure for UnityBI and Rust,", "tokens": [50412, 759, 291, + 434, 3102, 294, 2390, 264, 6896, 337, 27913, 11291, 293, 34952, 11, 50712], "temperature": + 0.0, "avg_logprob": -0.28461087870801616, "compression_ratio": 1.5643939393939394, + "no_speech_prob": 0.011971337720751762}, {"id": 96, "seek": 62196, "start": 628.9200000000001, + "end": 634.6, "text": " Python. So to connect with us, we''re left to have you on + board.", "tokens": [50712, 15329, 13, 407, 281, 1745, 365, 505, 11, 321, 434, 1411, + 281, 362, 291, 322, 3150, 13, 50996], "temperature": 0.0, "avg_logprob": -0.28461087870801616, + "compression_ratio": 1.5643939393939394, "no_speech_prob": 0.011971337720751762}, + {"id": 97, "seek": 62196, "start": 634.6, "end": 639.72, "text": " All right. But + let''s go back a little bit. So you also said that there is multi-modality", "tokens": + [50996, 1057, 558, 13, 583, 718, 311, 352, 646, 257, 707, 857, 13, 407, 291, 611, + 848, 300, 456, 307, 4825, 12, 8014, 1860, 51252], "temperature": 0.0, "avg_logprob": + -0.28461087870801616, "compression_ratio": 1.5643939393939394, "no_speech_prob": + 0.011971337720751762}, {"id": 98, "seek": 62196, "start": 639.72, "end": 644.6800000000001, + "text": " element of it. Yeah. 
So I will tell you the way I see it, but please correct + me or augment me.", "tokens": [51252, 4478, 295, 309, 13, 865, 13, 407, 286, 486, + 980, 291, 264, 636, 286, 536, 309, 11, 457, 1767, 3006, 385, 420, 29919, 385, 13, + 51500], "temperature": 0.0, "avg_logprob": -0.28461087870801616, "compression_ratio": + 1.5643939393939394, "no_speech_prob": 0.011971337720751762}, {"id": 99, "seek": + 62196, "start": 644.6800000000001, "end": 650.0400000000001, "text": " So I think + a couple of years ago, two years ago, we gave a talk here at Berlin Bosnol. It''s", + "tokens": [51500, 407, 286, 519, 257, 1916, 295, 924, 2057, 11, 732, 924, 2057, + 11, 321, 2729, 257, 751, 510, 412, 13848, 22264, 77, 401, 13, 467, 311, 51768], + "temperature": 0.0, "avg_logprob": -0.28461087870801616, "compression_ratio": 1.5643939393939394, + "no_speech_prob": 0.011971337720751762}, {"id": 100, "seek": 65004, "start": 650.04, + "end": 657.0, "text": " basically showing a system where you can search images and + text, whatever you want.", "tokens": [50364, 1936, 4099, 257, 1185, 689, 291, 393, + 3164, 5267, 293, 2487, 11, 2035, 291, 528, 13, 50712], "temperature": 0.0, "avg_logprob": + -0.16903860569000245, "compression_ratio": 1.59375, "no_speech_prob": 0.0017836998449638486}, + {"id": 101, "seek": 65004, "start": 657.7199999999999, "end": 665.88, "text": " + And if you have images that do not have textual metadata, then that''s your gateway + into finding", "tokens": [50748, 400, 498, 291, 362, 5267, 300, 360, 406, 362, 2487, + 901, 26603, 11, 550, 300, 311, 428, 28532, 666, 5006, 51156], "temperature": 0.0, + "avg_logprob": -0.16903860569000245, "compression_ratio": 1.59375, "no_speech_prob": + 0.0017836998449638486}, {"id": 102, "seek": 65004, "start": 665.88, "end": 671.56, + "text": " these images because neural networks will understand and extract the content + using plebe,", "tokens": [51156, 613, 5267, 570, 18161, 9590, 486, 1223, 293, 8947, + 264, 2701, 1228, 3362, 650, 11, 
51440], "temperature": 0.0, "avg_logprob": -0.16903860569000245, + "compression_ratio": 1.59375, "no_speech_prob": 0.0017836998449638486}, {"id": 103, + "seek": 65004, "start": 671.56, "end": 676.28, "text": " right? Yeah. And so we + were able to show some really interesting examples. For example,", "tokens": [51440, + 558, 30, 865, 13, 400, 370, 321, 645, 1075, 281, 855, 512, 534, 1880, 5110, 13, + 1171, 1365, 11, 51676], "temperature": 0.0, "avg_logprob": -0.16903860569000245, + "compression_ratio": 1.59375, "no_speech_prob": 0.0017836998449638486}, {"id": 104, + "seek": 67628, "start": 676.28, "end": 682.6, "text": " you could find in the context + of the commerce, a long sleeveless dress,", "tokens": [50364, 291, 727, 915, 294, + 264, 4319, 295, 264, 26320, 11, 257, 938, 12931, 779, 442, 5231, 11, 50680], "temperature": + 0.0, "avg_logprob": -0.24652152683423914, "compression_ratio": 1.6395348837209303, + "no_speech_prob": 0.005261875223368406}, {"id": 105, "seek": 67628, "start": 682.6, + "end": 687.4, "text": " striped, whatever color and so on and so forth. And it worked. + And even some audience members asked", "tokens": [50680, 3575, 3452, 11, 2035, 2017, + 293, 370, 322, 293, 370, 5220, 13, 400, 309, 2732, 13, 400, 754, 512, 4034, 2679, + 2351, 50920], "temperature": 0.0, "avg_logprob": -0.24652152683423914, "compression_ratio": + 1.6395348837209303, "no_speech_prob": 0.005261875223368406}, {"id": 106, "seek": + 67628, "start": 687.4, "end": 694.8399999999999, "text": " us to demo on their queries + and it still worked. So that showed the power of multi-modality,", "tokens": [50920, + 505, 281, 10723, 322, 641, 24109, 293, 309, 920, 2732, 13, 407, 300, 4712, 264, + 1347, 295, 4825, 12, 8014, 1860, 11, 51292], "temperature": 0.0, "avg_logprob": + -0.24652152683423914, "compression_ratio": 1.6395348837209303, "no_speech_prob": + 0.005261875223368406}, {"id": 107, "seek": 67628, "start": 694.8399999999999, "end": + 698.52, "text": " right? 
And we didn''t even need to fine-tune delete it. It was + just out of the box.", "tokens": [51292, 558, 30, 400, 321, 994, 380, 754, 643, + 281, 2489, 12, 83, 2613, 12097, 309, 13, 467, 390, 445, 484, 295, 264, 2424, 13, + 51476], "temperature": 0.0, "avg_logprob": -0.24652152683423914, "compression_ratio": + 1.6395348837209303, "no_speech_prob": 0.005261875223368406}, {"id": 108, "seek": + 67628, "start": 699.24, "end": 703.72, "text": " But I guess the reason for it to + multi-modality, what else are you thinking", "tokens": [51512, 583, 286, 2041, 264, + 1778, 337, 309, 281, 4825, 12, 8014, 1860, 11, 437, 1646, 366, 291, 1953, 51736], + "temperature": 0.0, "avg_logprob": -0.24652152683423914, "compression_ratio": 1.6395348837209303, + "no_speech_prob": 0.005261875223368406}, {"id": 109, "seek": 70372, "start": 704.28, + "end": 711.8000000000001, "text": " that''s part of your blog? Great question. Even + though images like Clare is known for multi-modality", "tokens": [50392, 300, 311, + 644, 295, 428, 6968, 30, 3769, 1168, 13, 2754, 1673, 5267, 411, 2033, 543, 307, + 2570, 337, 4825, 12, 8014, 1860, 50768], "temperature": 0.0, "avg_logprob": -0.29357982434724506, + "compression_ratio": 1.5879828326180256, "no_speech_prob": 0.009344922378659248}, + {"id": 110, "seek": 70372, "start": 711.8000000000001, "end": 718.12, "text": " + research, one of the best use cases of Clare is when you''re doing the need of short + classification,", "tokens": [50768, 2132, 11, 472, 295, 264, 1151, 764, 3331, 295, + 2033, 543, 307, 562, 291, 434, 884, 264, 643, 295, 2099, 21538, 11, 51084], "temperature": + 0.0, "avg_logprob": -0.29357982434724506, "compression_ratio": 1.5879828326180256, + "no_speech_prob": 0.009344922378659248}, {"id": 111, "seek": 70372, "start": 718.12, + "end": 723.24, "text": " right? 
It doesn''t need the previous data at all, even + if it is searching images,", "tokens": [51084, 558, 30, 467, 1177, 380, 643, 264, + 3894, 1412, 412, 439, 11, 754, 498, 309, 307, 10808, 5267, 11, 51340], "temperature": + 0.0, "avg_logprob": -0.29357982434724506, "compression_ratio": 1.5879828326180256, + "no_speech_prob": 0.009344922378659248}, {"id": 112, "seek": 70372, "start": 723.24, + "end": 728.36, "text": " if it is searching through text. And it''s like so powerful, + right? So we have a different", "tokens": [51340, 498, 309, 307, 10808, 807, 2487, + 13, 400, 309, 311, 411, 370, 4005, 11, 558, 30, 407, 321, 362, 257, 819, 51596], + "temperature": 0.0, "avg_logprob": -0.29357982434724506, "compression_ratio": 1.5879828326180256, + "no_speech_prob": 0.009344922378659248}, {"id": 113, "seek": 72836, "start": 728.44, + "end": 734.76, "text": " example with it. But coming to a question, we have audio, + wanted to embed audio graphs,", "tokens": [50368, 1365, 365, 309, 13, 583, 1348, + 281, 257, 1168, 11, 321, 362, 6278, 11, 1415, 281, 12240, 6278, 24877, 11, 50684], + "temperature": 0.0, "avg_logprob": -0.36870470279600565, "compression_ratio": 1.545945945945946, + "no_speech_prob": 0.02618482895195484}, {"id": 114, "seek": 72836, "start": 734.76, + "end": 742.2, "text": " et cetera. So all these are in five times. But right now, + we are only embedding text and images.", "tokens": [50684, 1030, 11458, 13, 407, + 439, 613, 366, 294, 1732, 1413, 13, 583, 558, 586, 11, 321, 366, 787, 12240, 3584, + 2487, 293, 5267, 13, 51056], "temperature": 0.0, "avg_logprob": -0.36870470279600565, + "compression_ratio": 1.545945945945946, "no_speech_prob": 0.02618482895195484}, + {"id": 115, "seek": 72836, "start": 743.64, "end": 749.72, "text": " And are you + using Clip? Yeah, we''re using Clip. Clip, that''s right. 
You know, that one thing + that you", "tokens": [51128, 400, 366, 291, 1228, 2033, 647, 30, 865, 11, 321, 434, + 1228, 2033, 647, 13, 2033, 647, 11, 300, 311, 558, 13, 509, 458, 11, 300, 472, 551, + 300, 291, 51432], "temperature": 0.0, "avg_logprob": -0.36870470279600565, "compression_ratio": + 1.545945945945946, "no_speech_prob": 0.02618482895195484}, {"id": 116, "seek": 74972, + "start": 749.8000000000001, "end": 759.0, "text": " cannot get. I also wanted to + ask you a bit if you may share your insight on evaluating this", "tokens": [50368, + 2644, 483, 13, 286, 611, 1415, 281, 1029, 291, 257, 857, 498, 291, 815, 2073, 428, + 11269, 322, 27479, 341, 50828], "temperature": 0.0, "avg_logprob": -0.3041996955871582, + "compression_ratio": 1.543778801843318, "no_speech_prob": 0.009660112671554089}, + {"id": 117, "seek": 74972, "start": 759.0, "end": 765.88, "text": " system. So one + of the feedbacks that I have gotten for, for IoT,", "tokens": [50828, 1185, 13, + 407, 472, 295, 264, 5824, 82, 300, 286, 362, 5768, 337, 11, 337, 30112, 11, 51172], + "temperature": 0.0, "avg_logprob": -0.3041996955871582, "compression_ratio": 1.543778801843318, + "no_speech_prob": 0.009660112671554089}, {"id": 118, "seek": 74972, "start": 765.88, + "end": 771.08, "text": " between or anything, like basically, so let''s say I have + my LLN based application,", "tokens": [51172, 1296, 420, 1340, 11, 411, 1936, 11, + 370, 718, 311, 584, 286, 362, 452, 441, 43, 45, 2361, 3861, 11, 51432], "temperature": + 0.0, "avg_logprob": -0.3041996955871582, "compression_ratio": 1.543778801843318, + "no_speech_prob": 0.009660112671554089}, {"id": 119, "seek": 74972, "start": 771.08, + "end": 776.28, "text": " you know, how do I evaluate it? 
Because one of the feedbacks + is that sometimes it gives perfect", "tokens": [51432, 291, 458, 11, 577, 360, 286, + 13059, 309, 30, 1436, 472, 295, 264, 5824, 82, 307, 300, 2171, 309, 2709, 2176, + 51692], "temperature": 0.0, "avg_logprob": -0.3041996955871582, "compression_ratio": + 1.543778801843318, "no_speech_prob": 0.009660112671554089}, {"id": 120, "seek": + 77628, "start": 776.28, "end": 783.0799999999999, "text": " results, sometimes it + gives awful results, right? So now there is nothing in between, right?", "tokens": + [50364, 3542, 11, 2171, 309, 2709, 11232, 3542, 11, 558, 30, 407, 586, 456, 307, + 1825, 294, 1296, 11, 558, 30, 50704], "temperature": 0.0, "avg_logprob": -0.18651909663759428, + "compression_ratio": 1.7194244604316546, "no_speech_prob": 0.005421963054686785}, + {"id": 121, "seek": 77628, "start": 783.0799999999999, "end": 787.64, "text": " + Or not barely. So how would you solve this? Of course, you do start with your metric + learning and", "tokens": [50704, 1610, 406, 10268, 13, 407, 577, 576, 291, 5039, + 341, 30, 2720, 1164, 11, 291, 360, 722, 365, 428, 20678, 2539, 293, 50932], "temperature": + 0.0, "avg_logprob": -0.18651909663759428, "compression_ratio": 1.7194244604316546, + "no_speech_prob": 0.005421963054686785}, {"id": 122, "seek": 77628, "start": 787.64, + "end": 792.52, "text": " some other techniques, right? 
But there is still the other + side of things when you go to production,", "tokens": [50932, 512, 661, 7512, 11, + 558, 30, 583, 456, 307, 920, 264, 661, 1252, 295, 721, 562, 291, 352, 281, 4265, + 11, 51176], "temperature": 0.0, "avg_logprob": -0.18651909663759428, "compression_ratio": + 1.7194244604316546, "no_speech_prob": 0.005421963054686785}, {"id": 123, "seek": + 77628, "start": 792.52, "end": 797.24, "text": " as you know, like in Rasa and Quadrant + and many other companies, you care about quality.", "tokens": [51176, 382, 291, + 458, 11, 411, 294, 497, 9994, 293, 29619, 7541, 293, 867, 661, 3431, 11, 291, 1127, + 466, 3125, 13, 51412], "temperature": 0.0, "avg_logprob": -0.18651909663759428, + "compression_ratio": 1.7194244604316546, "no_speech_prob": 0.005421963054686785}, + {"id": 124, "seek": 77628, "start": 797.9599999999999, "end": 802.76, "text": " + So how do you have any insight on that? Are you maybe planning to build something + along the lines", "tokens": [51448, 407, 577, 360, 291, 362, 604, 11269, 322, 300, + 30, 2014, 291, 1310, 5038, 281, 1322, 746, 2051, 264, 3876, 51688], "temperature": + 0.0, "avg_logprob": -0.18651909663759428, "compression_ratio": 1.7194244604316546, + "no_speech_prob": 0.005421963054686785}, {"id": 125, "seek": 80276, "start": 802.76, + "end": 810.52, "text": " of evaluation? That''s a great question. You know, but + great part of the great response to it is,", "tokens": [50364, 295, 13344, 30, 663, + 311, 257, 869, 1168, 13, 509, 458, 11, 457, 869, 644, 295, 264, 869, 4134, 281, + 309, 307, 11, 50752], "temperature": 0.0, "avg_logprob": -0.3141882194662994, "compression_ratio": + 1.721461187214612, "no_speech_prob": 0.032879944890737534}, {"id": 126, "seek": + 80276, "start": 810.52, "end": 817.3199999999999, "text": " so LLN is one of the + examples of that. 
So, you know, LLN gives you a bright answer, but it also", "tokens": + [50752, 370, 441, 43, 45, 307, 472, 295, 264, 5110, 295, 300, 13, 407, 11, 291, + 458, 11, 441, 43, 45, 2709, 291, 257, 4730, 1867, 11, 457, 309, 611, 51092], "temperature": + 0.0, "avg_logprob": -0.3141882194662994, "compression_ratio": 1.721461187214612, + "no_speech_prob": 0.032879944890737534}, {"id": 127, "seek": 80276, "start": 817.3199999999999, + "end": 823.56, "text": " gives you hallucination. But a lot of people see hallucination + as a bug, but I see it as a feature,", "tokens": [51092, 2709, 291, 35212, 2486, + 13, 583, 257, 688, 295, 561, 536, 35212, 2486, 382, 257, 7426, 11, 457, 286, 536, + 309, 382, 257, 4111, 11, 51404], "temperature": 0.0, "avg_logprob": -0.3141882194662994, + "compression_ratio": 1.721461187214612, "no_speech_prob": 0.032879944890737534}, + {"id": 128, "seek": 80276, "start": 823.56, "end": 831.72, "text": " because it + won''t be able to do that creative job, but it can do with hallucinations.", "tokens": + [51404, 570, 309, 1582, 380, 312, 1075, 281, 360, 300, 5880, 1691, 11, 457, 309, + 393, 360, 365, 35212, 10325, 13, 51812], "temperature": 0.0, "avg_logprob": -0.3141882194662994, + "compression_ratio": 1.721461187214612, "no_speech_prob": 0.032879944890737534}, + {"id": 129, "seek": 83276, "start": 833.08, "end": 839.4, "text": " One of the, + there are so many tools, right, to measure retrieval part like", "tokens": [50380, + 1485, 295, 264, 11, 456, 366, 370, 867, 3873, 11, 558, 11, 281, 3481, 19817, 3337, + 644, 411, 50696], "temperature": 0.0, "avg_logprob": -0.3032225010006927, "compression_ratio": + 1.6395939086294415, "no_speech_prob": 0.017986256629228592}, {"id": 130, "seek": + 83276, "start": 840.2, "end": 847.16, "text": " Braggas, Prometheus, right? And + there are so many tools, too. 
But still, I think recall measure,", "tokens": [50736, + 4991, 1615, 296, 11, 2114, 649, 42209, 11, 558, 30, 400, 456, 366, 370, 867, 3873, + 11, 886, 13, 583, 920, 11, 286, 519, 9901, 3481, 11, 51084], "temperature": 0.0, + "avg_logprob": -0.3032225010006927, "compression_ratio": 1.6395939086294415, "no_speech_prob": + 0.017986256629228592}, {"id": 131, "seek": 83276, "start": 847.16, "end": 852.04, + "text": " what we call it, like, you know, measure how LLN recall is working,", + "tokens": [51084, 437, 321, 818, 309, 11, 411, 11, 291, 458, 11, 3481, 577, 441, + 43, 45, 9901, 307, 1364, 11, 51328], "temperature": 0.0, "avg_logprob": -0.3032225010006927, + "compression_ratio": 1.6395939086294415, "no_speech_prob": 0.017986256629228592}, + {"id": 132, "seek": 83276, "start": 852.04, "end": 858.2, "text": " where basically + extracting most relevant information, not like rubbish information.", "tokens": + [51328, 689, 1936, 49844, 881, 7340, 1589, 11, 406, 411, 29978, 1589, 13, 51636], + "temperature": 0.0, "avg_logprob": -0.3032225010006927, "compression_ratio": 1.6395939086294415, + "no_speech_prob": 0.017986256629228592}, {"id": 133, "seek": 85820, "start": 858.2800000000001, + "end": 864.76, "text": " So those things are like really important, and a lot of + research is going on, but we are more", "tokens": [50368, 407, 729, 721, 366, 411, + 534, 1021, 11, 293, 257, 688, 295, 2132, 307, 516, 322, 11, 457, 321, 366, 544, + 50692], "temperature": 0.0, "avg_logprob": -0.2634161435640775, "compression_ratio": + 1.6227272727272728, "no_speech_prob": 0.026854360476136208}, {"id": 134, "seek": + 85820, "start": 864.76, "end": 872.2, "text": " like focused on the infrastructure, + and we are keeping it up, trying to keep it up, but yeah,", "tokens": [50692, 411, + 5178, 322, 264, 6896, 11, 293, 321, 366, 5145, 309, 493, 11, 1382, 281, 1066, 309, + 493, 11, 457, 1338, 11, 51064], "temperature": 0.0, "avg_logprob": -0.2634161435640775, + "compression_ratio": 
1.6227272727272728, "no_speech_prob": 0.026854360476136208}, + {"id": 135, "seek": 85820, "start": 872.2, "end": 881.24, "text": " so mostly I + would go for classical testing ways like precision, recall, yeah.", "tokens": [51064, + 370, 5240, 286, 576, 352, 337, 13735, 4997, 2098, 411, 18356, 11, 9901, 11, 1338, + 13, 51516], "temperature": 0.0, "avg_logprob": -0.2634161435640775, "compression_ratio": + 1.6227272727272728, "no_speech_prob": 0.026854360476136208}, {"id": 136, "seek": + 85820, "start": 881.24, "end": 886.2, "text": " But basically like, okay, you do + test, and you see that sometimes once in a while it fails.", "tokens": [51516, 583, + 1936, 411, 11, 1392, 11, 291, 360, 1500, 11, 293, 291, 536, 300, 2171, 1564, 294, + 257, 1339, 309, 18199, 13, 51764], "temperature": 0.0, "avg_logprob": -0.2634161435640775, + "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.026854360476136208}, + {"id": 137, "seek": 88620, "start": 886.84, "end": 891.96, "text": " So first of + all, of course, catching that is important, right? People are going to production.", + "tokens": [50396, 407, 700, 295, 439, 11, 295, 1164, 11, 16124, 300, 307, 1021, + 11, 558, 30, 3432, 366, 516, 281, 4265, 13, 50652], "temperature": 0.0, "avg_logprob": + -0.27672962077613017, "compression_ratio": 1.5669291338582678, "no_speech_prob": + 0.012739204801619053}, {"id": 138, "seek": 88620, "start": 891.96, "end": 895.8000000000001, + "text": " Yes, but what is your way backwards to fixing this?", "tokens": [50652, + 1079, 11, 457, 437, 307, 428, 636, 12204, 281, 19442, 341, 30, 50844], "temperature": + 0.0, "avg_logprob": -0.27672962077613017, "compression_ratio": 1.5669291338582678, + "no_speech_prob": 0.012739204801619053}, {"id": 139, "seek": 88620, "start": 896.5200000000001, + "end": 899.6400000000001, "text": " Thank you, Chef. 
Yeah, from from from finding + that bug.", "tokens": [50880, 1044, 291, 11, 14447, 13, 865, 11, 490, 490, 490, + 5006, 300, 7426, 13, 51036], "temperature": 0.0, "avg_logprob": -0.27672962077613017, + "compression_ratio": 1.5669291338582678, "no_speech_prob": 0.012739204801619053}, + {"id": 140, "seek": 88620, "start": 901.8000000000001, "end": 908.6800000000001, + "text": " Okay, let me think about that. So, data set, maybe you can give some example + where you have fixed", "tokens": [51144, 1033, 11, 718, 385, 519, 466, 300, 13, + 407, 11, 1412, 992, 11, 1310, 291, 393, 976, 512, 1365, 689, 291, 362, 6806, 51488], + "temperature": 0.0, "avg_logprob": -0.27672962077613017, "compression_ratio": 1.5669291338582678, + "no_speech_prob": 0.012739204801619053}, {"id": 141, "seek": 88620, "start": 908.6800000000001, + "end": 913.72, "text": " a ratio, you know, reported by someone, not necessarily + as part of your platform, but previously,", "tokens": [51488, 257, 8509, 11, 291, + 458, 11, 7055, 538, 1580, 11, 406, 4725, 382, 644, 295, 428, 3663, 11, 457, 8046, + 11, 51740], "temperature": 0.0, "avg_logprob": -0.27672962077613017, "compression_ratio": + 1.5669291338582678, "no_speech_prob": 0.012739204801619053}, {"id": 142, "seek": + 91372, "start": 913.72, "end": 922.0400000000001, "text": " I don''t know, what + have I done? Yeah, so I was working with this, that''s why this talk came", "tokens": + [50364, 286, 500, 380, 458, 11, 437, 362, 286, 1096, 30, 865, 11, 370, 286, 390, + 1364, 365, 341, 11, 300, 311, 983, 341, 751, 1361, 50780], "temperature": 0.0, "avg_logprob": + -0.3175241661071777, "compression_ratio": 1.615702479338843, "no_speech_prob": 0.020426224917173386}, + {"id": 143, "seek": 91372, "start": 922.0400000000001, "end": 928.0400000000001, + "text": " into my mind, right? The negation problem, the negation problem is so + huge. 
You will always find", "tokens": [50780, 666, 452, 1575, 11, 558, 30, 440, + 2485, 399, 1154, 11, 264, 2485, 399, 1154, 307, 370, 2603, 13, 509, 486, 1009, 915, + 51080], "temperature": 0.0, "avg_logprob": -0.3175241661071777, "compression_ratio": + 1.615702479338843, "no_speech_prob": 0.020426224917173386}, {"id": 144, "seek": + 91372, "start": 928.0400000000001, "end": 935.88, "text": " sentences with not every + domain, read biomedical, read law and everything, and it still gives you the", "tokens": + [51080, 16579, 365, 406, 633, 9274, 11, 1401, 49775, 11, 1401, 2101, 293, 1203, + 11, 293, 309, 920, 2709, 291, 264, 51472], "temperature": 0.0, "avg_logprob": -0.3175241661071777, + "compression_ratio": 1.615702479338843, "no_speech_prob": 0.020426224917173386}, + {"id": 145, "seek": 91372, "start": 935.88, "end": 941.96, "text": " same similarity, + even though you do not have to be a language, you know, expert to understand these.", + "tokens": [51472, 912, 32194, 11, 754, 1673, 291, 360, 406, 362, 281, 312, 257, + 2856, 11, 291, 458, 11, 5844, 281, 1223, 613, 13, 51776], "temperature": 0.0, "avg_logprob": + -0.3175241661071777, "compression_ratio": 1.615702479338843, "no_speech_prob": 0.020426224917173386}, + {"id": 146, "seek": 94196, "start": 941.96, "end": 947.32, "text": " Different things. + Yeah, yeah, yeah. So different things, right? 
That''s that''s when the trick", "tokens": + [50364, 20825, 721, 13, 865, 11, 1338, 11, 1338, 13, 407, 819, 721, 11, 558, 30, + 663, 311, 300, 311, 562, 264, 4282, 50632], "temperature": 0.0, "avg_logprob": -0.31341826121012367, + "compression_ratio": 1.6926605504587156, "no_speech_prob": 0.04503000155091286}, + {"id": 147, "seek": 94196, "start": 947.32, "end": 952.2800000000001, "text": " + learning comes, that''s when inference started to come into mind, because inference + is very important.", "tokens": [50632, 2539, 1487, 11, 300, 311, 562, 38253, 1409, + 281, 808, 666, 1575, 11, 570, 38253, 307, 588, 1021, 13, 50880], "temperature": + 0.0, "avg_logprob": -0.31341826121012367, "compression_ratio": 1.6926605504587156, + "no_speech_prob": 0.04503000155091286}, {"id": 148, "seek": 94196, "start": 952.2800000000001, + "end": 958.6800000000001, "text": " Like a lot of people have played with SNLI and + stuff, and then they understand that to understand", "tokens": [50880, 1743, 257, + 688, 295, 561, 362, 3737, 365, 13955, 48718, 293, 1507, 11, 293, 550, 436, 1223, + 300, 281, 1223, 51200], "temperature": 0.0, "avg_logprob": -0.31341826121012367, + "compression_ratio": 1.6926605504587156, "no_speech_prob": 0.04503000155091286}, + {"id": 149, "seek": 94196, "start": 958.6800000000001, "end": 963.64, "text": " + negation, you first need to understand inferences. So there''s a way to like,", + "tokens": [51200, 2485, 399, 11, 291, 700, 643, 281, 1223, 13596, 2667, 13, 407, + 456, 311, 257, 636, 281, 411, 11, 51448], "temperature": 0.0, "avg_logprob": -0.31341826121012367, + "compression_ratio": 1.6926605504587156, "no_speech_prob": 0.04503000155091286}, + {"id": 150, "seek": 96364, "start": 964.6, "end": 971.64, "text": " method, right? 
+ Yeah, yeah, yeah, and entailment, contradiction, neutral, two sentences could be + neutral,", "tokens": [50412, 3170, 11, 558, 30, 865, 11, 1338, 11, 1338, 11, 293, + 948, 864, 518, 11, 34937, 11, 10598, 11, 732, 16579, 727, 312, 10598, 11, 50764], + "temperature": 0.0, "avg_logprob": -0.37822208404541013, "compression_ratio": 1.7488584474885844, + "no_speech_prob": 0.061677590012550354}, {"id": 151, "seek": 96364, "start": 971.64, + "end": 977.64, "text": " yeah, unrelated to each other, two centers could be contradictory, + contradictory to each other.", "tokens": [50764, 1338, 11, 38967, 281, 1184, 661, + 11, 732, 10898, 727, 312, 49555, 11, 49555, 281, 1184, 661, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.37822208404541013, "compression_ratio": 1.7488584474885844, + "no_speech_prob": 0.061677590012550354}, {"id": 152, "seek": 96364, "start": 977.64, + "end": 983.0, "text": " So, yeah, which means that you need a purpose of data somehow + labeled, yes, yes, yes,", "tokens": [51064, 407, 11, 1338, 11, 597, 1355, 300, 291, + 643, 257, 4334, 295, 1412, 6063, 21335, 11, 2086, 11, 2086, 11, 2086, 11, 51332], + "temperature": 0.0, "avg_logprob": -0.37822208404541013, "compression_ratio": 1.7488584474885844, + "no_speech_prob": 0.061677590012550354}, {"id": 153, "seek": 96364, "start": 983.0, + "end": 988.92, "text": " logically reasoned through, right, using an algorithm. 
+ Yes, so that''s that''s what SNLI is, like,", "tokens": [51332, 38887, 1778, 292, + 807, 11, 558, 11, 1228, 364, 9284, 13, 1079, 11, 370, 300, 311, 300, 311, 437, 13955, + 48718, 307, 11, 411, 11, 51628], "temperature": 0.0, "avg_logprob": -0.37822208404541013, + "compression_ratio": 1.7488584474885844, "no_speech_prob": 0.061677590012550354}, + {"id": 154, "seek": 98892, "start": 988.92, "end": 995.0799999999999, "text": " + SNLI is a data set, particularly for this, yeah, particular questions, problems, + so yeah,", "tokens": [50364, 13955, 48718, 307, 257, 1412, 992, 11, 4098, 337, 341, + 11, 1338, 11, 1729, 1651, 11, 2740, 11, 370, 1338, 11, 50672], "temperature": 0.0, + "avg_logprob": -0.3972225755748182, "compression_ratio": 1.6106194690265487, "no_speech_prob": + 0.05291461944580078}, {"id": 155, "seek": 98892, "start": 995.0799999999999, "end": + 1000.5999999999999, "text": " if you, you''ve re-tune it with, it was fun. So that + would be for text, and what about other", "tokens": [50672, 498, 291, 11, 291, 600, + 319, 12, 83, 2613, 309, 365, 11, 309, 390, 1019, 13, 407, 300, 576, 312, 337, 2487, + 11, 293, 437, 466, 661, 50948], "temperature": 0.0, "avg_logprob": -0.3972225755748182, + "compression_ratio": 1.6106194690265487, "no_speech_prob": 0.05291461944580078}, + {"id": 156, "seek": 98892, "start": 1000.5999999999999, "end": 1009.9599999999999, + "text": " modalities? I don''t think it was, that''s what it was. 
Like images, I + know sometimes a model may", "tokens": [50948, 1072, 16110, 30, 286, 500, 380, 519, + 309, 390, 11, 300, 311, 437, 309, 390, 13, 1743, 5267, 11, 286, 458, 2171, 257, + 2316, 815, 51416], "temperature": 0.0, "avg_logprob": -0.3972225755748182, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.05291461944580078}, {"id": 157, "seek": + 98892, "start": 1009.9599999999999, "end": 1013.56, "text": " hallucinate that there + is something in the real world, but there is nothing like that.", "tokens": [51416, + 35212, 13923, 300, 456, 307, 746, 294, 264, 957, 1002, 11, 457, 456, 307, 1825, + 411, 300, 13, 51596], "temperature": 0.0, "avg_logprob": -0.3972225755748182, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.05291461944580078}, {"id": 158, "seek": + 101356, "start": 1014.52, "end": 1020.76, "text": " Oh, that''s one thing, but I + guess there are many. So there are things like, there''s a study called", "tokens": + [50412, 876, 11, 300, 311, 472, 551, 11, 457, 286, 2041, 456, 366, 867, 13, 407, + 456, 366, 721, 411, 11, 456, 311, 257, 2979, 1219, 50724], "temperature": 0.0, "avg_logprob": + -0.3783814239501953, "compression_ratio": 1.669683257918552, "no_speech_prob": 0.04725123196840286}, + {"id": 159, "seek": 101356, "start": 1020.76, "end": 1027.72, "text": " blind pairs + in clip, that was done by, I''m sorry, I forgot the name, which were like,", "tokens": + [50724, 6865, 15494, 294, 7353, 11, 300, 390, 1096, 538, 11, 286, 478, 2597, 11, + 286, 5298, 264, 1315, 11, 597, 645, 411, 11, 51072], "temperature": 0.0, "avg_logprob": + -0.3783814239501953, "compression_ratio": 1.669683257918552, "no_speech_prob": 0.04725123196840286}, + {"id": 160, "seek": 101356, "start": 1027.72, "end": 1035.8, "text": " people say + find that. 
Yeah, so they found out, clip actually has blind pairs, like you cannot", + "tokens": [51072, 561, 584, 915, 300, 13, 865, 11, 370, 436, 1352, 484, 11, 7353, + 767, 575, 6865, 15494, 11, 411, 291, 2644, 51476], "temperature": 0.0, "avg_logprob": + -0.3783814239501953, "compression_ratio": 1.669683257918552, "no_speech_prob": 0.04725123196840286}, + {"id": 161, "seek": 101356, "start": 1036.52, "end": 1043.32, "text": " segment + things really well, like, cats sleeping on the, on the car, or something, and then", + "tokens": [51512, 9469, 721, 534, 731, 11, 411, 11, 11111, 8296, 322, 264, 11, 322, + 264, 1032, 11, 420, 746, 11, 293, 550, 51852], "temperature": 0.0, "avg_logprob": + -0.3783814239501953, "compression_ratio": 1.669683257918552, "no_speech_prob": 0.04725123196840286}, + {"id": 162, "seek": 104332, "start": 1043.48, "end": 1051.08, "text": " something + else will give you the same description or something. So there, Dino comes in. So + Dino", "tokens": [50372, 746, 1646, 486, 976, 291, 264, 912, 3855, 420, 746, 13, + 407, 456, 11, 413, 2982, 1487, 294, 13, 407, 413, 2982, 50752], "temperature": 0.0, + "avg_logprob": -0.40670801558584535, "compression_ratio": 1.7123287671232876, "no_speech_prob": + 0.0034034820273518562}, {"id": 163, "seek": 104332, "start": 1051.08, "end": 1057.0, + "text": " there''s a segmentation with self-supervised learning, self-supervised + learning, I think is the", "tokens": [50752, 456, 311, 257, 9469, 399, 365, 2698, + 12, 48172, 24420, 2539, 11, 2698, 12, 48172, 24420, 2539, 11, 286, 519, 307, 264, + 51048], "temperature": 0.0, "avg_logprob": -0.40670801558584535, "compression_ratio": + 1.7123287671232876, "no_speech_prob": 0.0034034820273518562}, {"id": 164, "seek": + 104332, "start": 1057.0, "end": 1062.12, "text": " best invention of like, for this + AI, and now it thumbs up. 
Yeah, in the open source.", "tokens": [51048, 1151, 22265, + 295, 411, 11, 337, 341, 7318, 11, 293, 586, 309, 8838, 493, 13, 865, 11, 294, 264, + 1269, 4009, 13, 51304], "temperature": 0.0, "avg_logprob": -0.40670801558584535, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.0034034820273518562}, + {"id": 165, "seek": 104332, "start": 1062.6799999999998, "end": 1069.48, "text": + " That''s supervised learning. Is it open source? A Dino, you say? Yeah, it is. + Yeah, it''s from Meta,", "tokens": [51332, 663, 311, 46533, 2539, 13, 1119, 309, + 1269, 4009, 30, 316, 413, 2982, 11, 291, 584, 30, 865, 11, 309, 307, 13, 865, 11, + 309, 311, 490, 6377, 64, 11, 51672], "temperature": 0.0, "avg_logprob": -0.40670801558584535, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.0034034820273518562}, + {"id": 166, "seek": 106948, "start": 1069.96, "end": 1078.76, "text": " and I think + Likun is one of, and when, you know, he has written this folder, what you call it,", + "tokens": [50388, 293, 286, 519, 441, 1035, 409, 307, 472, 295, 11, 293, 562, 11, + 291, 458, 11, 415, 575, 3720, 341, 10820, 11, 437, 291, 818, 309, 11, 50828], "temperature": + 0.0, "avg_logprob": -0.29083048502604164, "compression_ratio": 1.738532110091743, + "no_speech_prob": 0.05210966244339943}, {"id": 167, "seek": 106948, "start": 1078.76, + "end": 1084.76, "text": " white paper on the dark matter of intelligence that is + self-supervised learning. So they are doing a", "tokens": [50828, 2418, 3035, 322, + 264, 2877, 1871, 295, 7599, 300, 307, 2698, 12, 48172, 24420, 2539, 13, 407, 436, + 366, 884, 257, 51128], "temperature": 0.0, "avg_logprob": -0.29083048502604164, + "compression_ratio": 1.738532110091743, "no_speech_prob": 0.05210966244339943}, + {"id": 168, "seek": 106948, "start": 1084.76, "end": 1090.1200000000001, "text": + " lot of work in self-supervised learning. 
You know, make data, make the model learn + from the data", "tokens": [51128, 688, 295, 589, 294, 2698, 12, 48172, 24420, 2539, + 13, 509, 458, 11, 652, 1412, 11, 652, 264, 2316, 1466, 490, 264, 1412, 51396], "temperature": + 0.0, "avg_logprob": -0.29083048502604164, "compression_ratio": 1.738532110091743, + "no_speech_prob": 0.05210966244339943}, {"id": 169, "seek": 106948, "start": 1090.1200000000001, + "end": 1095.88, "text": " itself. You do not need to label it. Yeah, that''s like + in the self-supervised sort of.", "tokens": [51396, 2564, 13, 509, 360, 406, 643, + 281, 7645, 309, 13, 865, 11, 300, 311, 411, 294, 264, 2698, 12, 48172, 24420, 1333, + 295, 13, 51684], "temperature": 0.0, "avg_logprob": -0.29083048502604164, "compression_ratio": + 1.738532110091743, "no_speech_prob": 0.05210966244339943}, {"id": 170, "seek": 109588, + "start": 1096.8400000000001, "end": 1102.68, "text": " Okay, then maybe another + question I have is, where do you embed a human in this process? Do you ever,", "tokens": + [50412, 1033, 11, 550, 1310, 1071, 1168, 286, 362, 307, 11, 689, 360, 291, 12240, + 257, 1952, 294, 341, 1399, 30, 1144, 291, 1562, 11, 50704], "temperature": 0.0, + "avg_logprob": -0.28615981457280176, "compression_ratio": 1.5546875, "no_speech_prob": + 0.010997063480317593}, {"id": 171, "seek": 109588, "start": 1102.68, "end": 1109.0800000000002, + "text": " like, I don''t know, to check quality or give feedback? Exactly. So one + other thing in metric learning is", "tokens": [50704, 411, 11, 286, 500, 380, 458, + 11, 281, 1520, 3125, 420, 976, 5824, 30, 7587, 13, 407, 472, 661, 551, 294, 20678, + 2539, 307, 51024], "temperature": 0.0, "avg_logprob": -0.28615981457280176, "compression_ratio": + 1.5546875, "no_speech_prob": 0.010997063480317593}, {"id": 172, "seek": 109588, + "start": 1109.0800000000002, "end": 1114.6000000000001, "text": " everyone thinks + it''s self-supervised, or data, it will learn with the data. 
It doesn''t need", + "tokens": [51024, 1518, 7309, 309, 311, 2698, 12, 48172, 24420, 11, 420, 1412, 11, + 309, 486, 1466, 365, 264, 1412, 13, 467, 1177, 380, 643, 51300], "temperature": + 0.0, "avg_logprob": -0.28615981457280176, "compression_ratio": 1.5546875, "no_speech_prob": + 0.010997063480317593}, {"id": 173, "seek": 109588, "start": 1114.6000000000001, + "end": 1122.3600000000001, "text": " label. But when the contrastive learning happens, + who is making that negative mining fare? It''s the", "tokens": [51300, 7645, 13, + 583, 562, 264, 8712, 488, 2539, 2314, 11, 567, 307, 1455, 300, 3671, 15512, 11994, + 30, 467, 311, 264, 51688], "temperature": 0.0, "avg_logprob": -0.28615981457280176, + "compression_ratio": 1.5546875, "no_speech_prob": 0.010997063480317593}, {"id": + 174, "seek": 112236, "start": 1122.36, "end": 1128.12, "text": " huge, it''s the + kind of making that negative one. Very, very crisp one, right? Exactly, where does + it", "tokens": [50364, 2603, 11, 309, 311, 264, 733, 295, 1455, 300, 3671, 472, + 13, 4372, 11, 588, 22952, 472, 11, 558, 30, 7587, 11, 689, 775, 309, 50652], "temperature": + 0.0, "avg_logprob": -0.5678520512774707, "compression_ratio": 1.6833976833976834, + "no_speech_prob": 0.030655188485980034}, {"id": 175, "seek": 112236, "start": 1128.12, + "end": 1132.36, "text": " is learning? Not just random negative, but like synonymically + negative.", "tokens": [50652, 307, 2539, 30, 1726, 445, 4974, 3671, 11, 457, 411, + 5451, 12732, 984, 3671, 13, 50864], "temperature": 0.0, "avg_logprob": -0.5678520512774707, + "compression_ratio": 1.6833976833976834, "no_speech_prob": 0.030655188485980034}, + {"id": 176, "seek": 112236, "start": 1132.36, "end": 1137.7199999999998, "text": + " Thimmedically negative. Yeah. 
So there, they mean very common and like, you know.", + "tokens": [50864, 334, 332, 1912, 984, 3671, 13, 865, 13, 407, 456, 11, 436, 914, + 588, 2689, 293, 411, 11, 291, 458, 13, 51132], "temperature": 0.0, "avg_logprob": + -0.5678520512774707, "compression_ratio": 1.6833976833976834, "no_speech_prob": + 0.030655188485980034}, {"id": 177, "seek": 112236, "start": 1137.7199999999998, + "end": 1143.3999999999999, "text": " For sure. Yeah. So, and so today, let''s say + if someone wants to use your platform, you said,", "tokens": [51132, 1171, 988, + 13, 865, 13, 407, 11, 293, 370, 965, 11, 718, 311, 584, 498, 1580, 2738, 281, 764, + 428, 3663, 11, 291, 848, 11, 51416], "temperature": 0.0, "avg_logprob": -0.5678520512774707, + "compression_ratio": 1.6833976833976834, "no_speech_prob": 0.030655188485980034}, + {"id": 178, "seek": 112236, "start": 1143.3999999999999, "end": 1147.24, "text": + " embedding and embedding. Embedding thing. Embedding thing. It''s on GitHub, I''m + guessing.", "tokens": [51416, 12240, 3584, 293, 12240, 3584, 13, 24234, 292, 3584, + 551, 13, 24234, 292, 3584, 551, 13, 467, 311, 322, 23331, 11, 286, 478, 17939, 13, + 51608], "temperature": 0.0, "avg_logprob": -0.5678520512774707, "compression_ratio": + 1.6833976833976834, "no_speech_prob": 0.030655188485980034}, {"id": 179, "seek": + 114724, "start": 1147.72, "end": 1154.1200000000001, "text": " We''ll link it. Yeah. 
+ And so how do you, like, part of the story that becomes successful, I guess,", "tokens": + [50388, 492, 603, 2113, 309, 13, 865, 13, 400, 370, 577, 360, 291, 11, 411, 11, + 644, 295, 264, 1657, 300, 3643, 4406, 11, 286, 2041, 11, 50708], "temperature": + 0.0, "avg_logprob": -0.20942884502988873, "compression_ratio": 1.6213991769547325, + "no_speech_prob": 0.03451335057616234}, {"id": 180, "seek": 114724, "start": 1154.1200000000001, + "end": 1161.16, "text": " is that you can map out your path from use cases to your + library, to your project, which would", "tokens": [50708, 307, 300, 291, 393, 4471, + 484, 428, 3100, 490, 764, 3331, 281, 428, 6405, 11, 281, 428, 1716, 11, 597, 576, + 51060], "temperature": 0.0, "avg_logprob": -0.20942884502988873, "compression_ratio": + 1.6213991769547325, "no_speech_prob": 0.03451335057616234}, {"id": 181, "seek": + 114724, "start": 1161.16, "end": 1168.04, "text": " probably be one of the components + in the overall picture. So which scenarios and use cases do you see", "tokens": + [51060, 1391, 312, 472, 295, 264, 6677, 294, 264, 4787, 3036, 13, 407, 597, 15077, + 293, 764, 3331, 360, 291, 536, 51404], "temperature": 0.0, "avg_logprob": -0.20942884502988873, + "compression_ratio": 1.6213991769547325, "no_speech_prob": 0.03451335057616234}, + {"id": 182, "seek": 114724, "start": 1168.68, "end": 1175.4, "text": " where your + platform can give value? Is it chat box? Is it vector search? 
Is it completely anything?", + "tokens": [51436, 689, 428, 3663, 393, 976, 2158, 30, 1119, 309, 5081, 2424, 30, + 1119, 309, 8062, 3164, 30, 1119, 309, 2584, 1340, 30, 51772], "temperature": 0.0, + "avg_logprob": -0.20942884502988873, "compression_ratio": 1.6213991769547325, "no_speech_prob": + 0.03451335057616234}, {"id": 183, "seek": 117540, "start": 1176.2, "end": 1183.24, + "text": " In anywhere where embeddings are used, multi-model embeddings, my library + will be like,", "tokens": [50404, 682, 4992, 689, 12240, 29432, 366, 1143, 11, 4825, + 12, 8014, 338, 12240, 29432, 11, 452, 6405, 486, 312, 411, 11, 50756], "temperature": + 0.0, "avg_logprob": -0.25034148352486746, "compression_ratio": 1.6926070038910506, + "no_speech_prob": 0.009070082567632198}, {"id": 184, "seek": 117540, "start": 1183.24, + "end": 1188.6000000000001, "text": " well, I want my library to be the infrastructure + where people use different.", "tokens": [50756, 731, 11, 286, 528, 452, 6405, 281, + 312, 264, 6896, 689, 561, 764, 819, 13, 51024], "temperature": 0.0, "avg_logprob": + -0.25034148352486746, "compression_ratio": 1.6926070038910506, "no_speech_prob": + 0.009070082567632198}, {"id": 185, "seek": 117540, "start": 1188.6000000000001, + "end": 1190.0400000000002, "text": " Awesome. Yeah. Yeah. Yeah.", "tokens": [51024, + 10391, 13, 865, 13, 865, 13, 865, 13, 51096], "temperature": 0.0, "avg_logprob": + -0.25034148352486746, "compression_ratio": 1.6926070038910506, "no_speech_prob": + 0.009070082567632198}, {"id": 186, "seek": 117540, "start": 1190.0400000000002, + "end": 1194.52, "text": " Well, this sounds really cool. 
And I wish you all the + best in this project.", "tokens": [51096, 1042, 11, 341, 3263, 534, 1627, 13, 400, + 286, 3172, 291, 439, 264, 1151, 294, 341, 1716, 13, 51320], "temperature": 0.0, + "avg_logprob": -0.25034148352486746, "compression_ratio": 1.6926070038910506, "no_speech_prob": + 0.009070082567632198}, {"id": 187, "seek": 117540, "start": 1194.52, "end": 1199.72, + "text": " Thank you. I hope that some of my listeners will go and check out and + maybe you will even get", "tokens": [51320, 1044, 291, 13, 286, 1454, 300, 512, + 295, 452, 23274, 486, 352, 293, 1520, 484, 293, 1310, 291, 486, 754, 483, 51580], + "temperature": 0.0, "avg_logprob": -0.25034148352486746, "compression_ratio": 1.6926070038910506, + "no_speech_prob": 0.009070082567632198}, {"id": 188, "seek": 117540, "start": 1199.72, + "end": 1204.2800000000002, "text": " some contributors or, you know, whoever users + who can create the tickets.", "tokens": [51580, 512, 45627, 420, 11, 291, 458, 11, + 11387, 5022, 567, 393, 1884, 264, 12628, 13, 51808], "temperature": 0.0, "avg_logprob": + -0.25034148352486746, "compression_ratio": 1.6926070038910506, "no_speech_prob": + 0.009070082567632198}, {"id": 189, "seek": 120428, "start": 1205.24, "end": 1210.84, + "text": " Yeah. Yeah. I would love to see some issues. 
And, you know, even if you + want to raise", "tokens": [50412, 865, 13, 865, 13, 286, 576, 959, 281, 536, 512, + 2663, 13, 400, 11, 291, 458, 11, 754, 498, 291, 528, 281, 5300, 50692], "temperature": + 0.0, "avg_logprob": -0.3032776184082031, "compression_ratio": 1.7228915662650603, + "no_speech_prob": 0.023313026875257492}, {"id": 190, "seek": 120428, "start": 1210.84, + "end": 1216.12, "text": " some issues, go ahead or add any feature, you can add + it as a full request and we can take a look", "tokens": [50692, 512, 2663, 11, 352, + 2286, 420, 909, 604, 4111, 11, 291, 393, 909, 309, 382, 257, 1577, 5308, 293, 321, + 393, 747, 257, 574, 50956], "temperature": 0.0, "avg_logprob": -0.3032776184082031, + "compression_ratio": 1.7228915662650603, "no_speech_prob": 0.023313026875257492}, + {"id": 191, "seek": 120428, "start": 1216.12, "end": 1221.96, "text": " at it. We + are really, really excited. And a lot of developers just feature to me, do I need + to", "tokens": [50956, 412, 309, 13, 492, 366, 534, 11, 534, 2919, 13, 400, 257, + 688, 295, 8849, 445, 4111, 281, 385, 11, 360, 286, 643, 281, 51248], "temperature": + 0.0, "avg_logprob": -0.3032776184082031, "compression_ratio": 1.7228915662650603, + "no_speech_prob": 0.023313026875257492}, {"id": 192, "seek": 120428, "start": 1221.96, + "end": 1226.92, "text": " need no rest? No. Yeah. If you do not need to need no + rest at all. Yeah.", "tokens": [51248, 643, 572, 1472, 30, 883, 13, 865, 13, 759, + 291, 360, 406, 643, 281, 643, 572, 1472, 412, 439, 13, 865, 13, 51496], "temperature": + 0.0, "avg_logprob": -0.3032776184082031, "compression_ratio": 1.7228915662650603, + "no_speech_prob": 0.023313026875257492}, {"id": 193, "seek": 120428, "start": 1226.92, + "end": 1231.48, "text": " So you can be, let''s say, writing the library and still + can trigger it. 
Yeah.", "tokens": [51496, 407, 291, 393, 312, 11, 718, 311, 584, + 11, 3579, 264, 6405, 293, 920, 393, 7875, 309, 13, 865, 13, 51724], "temperature": + 0.0, "avg_logprob": -0.3032776184082031, "compression_ratio": 1.7228915662650603, + "no_speech_prob": 0.023313026875257492}, {"id": 194, "seek": 123148, "start": 1231.48, + "end": 1235.72, "text": " Oh, nice. Awesome. Maybe you can use chat GTP as well + to convert your Python to rest,", "tokens": [50364, 876, 11, 1481, 13, 10391, 13, + 2704, 291, 393, 764, 5081, 460, 16804, 382, 731, 281, 7620, 428, 15329, 281, 1472, + 11, 50576], "temperature": 0.0, "avg_logprob": -0.31774391384299744, "compression_ratio": + 1.738396624472574, "no_speech_prob": 0.047574255615472794}, {"id": 195, "seek": + 123148, "start": 1235.72, "end": 1240.68, "text": " but that''s another story. Awesome. + And I look forward to your presentation. I will not be", "tokens": [50576, 457, + 300, 311, 1071, 1657, 13, 10391, 13, 400, 286, 574, 2128, 281, 428, 5860, 13, 286, + 486, 406, 312, 50824], "temperature": 0.0, "avg_logprob": -0.31774391384299744, + "compression_ratio": 1.738396624472574, "no_speech_prob": 0.047574255615472794}, + {"id": 196, "seek": 123148, "start": 1240.68, "end": 1246.52, "text": " there, but + I will watch the recording and I will also link this episode and the recording of + your", "tokens": [50824, 456, 11, 457, 286, 486, 1159, 264, 6613, 293, 286, 486, + 611, 2113, 341, 3500, 293, 264, 6613, 295, 428, 51116], "temperature": 0.0, "avg_logprob": + -0.31774391384299744, "compression_ratio": 1.738396624472574, "no_speech_prob": + 0.047574255615472794}, {"id": 197, "seek": 123148, "start": 1246.52, "end": 1249.64, + "text": " talk. Thank you. So good luck with that and thank you so much. 
Thank you.", + "tokens": [51116, 751, 13, 1044, 291, 13, 407, 665, 3668, 365, 300, 293, 1309, 291, + 370, 709, 13, 1044, 291, 13, 51272], "temperature": 0.0, "avg_logprob": -0.31774391384299744, + "compression_ratio": 1.738396624472574, "no_speech_prob": 0.047574255615472794}, + {"id": 198, "seek": 123148, "start": 1249.64, "end": 1252.52, "text": " Thank you + so much. Enjoy it. Enjoy the content. Yeah. Thank you.", "tokens": [51272, 1044, + 291, 370, 709, 13, 15411, 309, 13, 15411, 264, 2701, 13, 865, 13, 1044, 291, 13, + 51416], "temperature": 0.0, "avg_logprob": -0.31774391384299744, "compression_ratio": + 1.738396624472574, "no_speech_prob": 0.047574255615472794}]' +--- + +Hello there, vector podcast and I'm here accompanied with Sonan. Sonan you are the, I guess, visitor of the conference. Are you also giving a talk? Yes, I'm giving a talk tomorrow on metric learning. +Yeah, what's your topic? I'm not talking metric learning tomorrow, but I'm very excited about what we are building at and better than anything on starlight. So yeah, awesome. And is it your first time at the conference? Yes, it's the first time, but that's one of the best conferences. Awesome. +Yeah, I love it. I've been here first time in 2011 and I still, I still love coming back once in a while. It's really good. I can see why you want to come back again and again. Yeah, exactly. Yeah. Awesome. +And you work mostly on what I, well, we had an episode actually with quadrants on metric learning. I will, I will make sure to link it. Tell me a bit more about metric learning if you will. Like in a, in a, why shouldn't everyone care that he seems to think that they should use maybe yes. +So a lot of people just think about like, you know, we can do a check distance and then you know, we'll get the similarity. But the thing is, even though you change the distance, it won't make any difference because those embeddings are already in the space. So it's already relative. 
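A quick aside to pin down that point: once the embeddings are fixed, the choice of distance barely matters. For L2-normalized vectors, ||a-b||² = 2 − 2·cos(a,b), so ranking by cosine similarity and ranking by Euclidean distance are identical. A minimal sketch with made-up toy vectors (not real model embeddings):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

# Toy, made-up "embeddings", L2-normalized like most embedding models emit.
rng = np.random.default_rng(0)
docs = [v / np.linalg.norm(v) for v in rng.normal(size=(5, 8))]
query = docs[0] + 0.1 * rng.normal(size=8)
query /= np.linalg.norm(query)

# For unit vectors, ||a-b||^2 = 2 - 2*cos(a,b), so both rankings agree.
by_cos = sorted(range(5), key=lambda i: -cosine(query, docs[i]))
by_euc = sorted(range(5), key=lambda i: euclidean(query, docs[i]))
print(by_cos == by_euc)  # True: swapping the metric doesn't change the ranking
```

Which is why metric learning attacks the embeddings themselves rather than the distance function on top of them.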
So if you're doing a cosine similarity, where "I love pizza" and "I do not love pizza" give you 90% similarity, right? Then any other distance will not make any difference either.
So the thing is, with metric learning you can build your own data set and then train the embedding model again so that it gives you the right similarity. Yeah.
I mean, I still try to understand it, but it's basically like, on one hand, you have your data and then you choose the model, and that model should be pre-trained for you; you could also fine-tune it on your data if you want. And then inherently, it will have its own measure of similarity.
So it's not something you can easily control. Yeah. But then metric learning opposes this by saying that you should be in control of your metric. Yeah.
And of your similarity measure, not just the metric itself, but the similarity measure. Which means that I should kind of, like, drop the model, just get my data and start training some new network, right? So that I'm basically fine-tuning the embedding model.
With your data? So yeah, suppose you're fine-tuning for intents. Yes. Okay. Where does metric learning really shine? It's classification versus similarity again. If you are doing classification, you are limited to certain classes, right? Suppose, yeah, particular intents. Yeah.
It's not scalable at, like, a million scale; you cannot keep adding and adding intents. But with similarity search and metric learning, you can add any intent, it's a very clean solution. Yeah. Yeah. So it's not limited. Yeah.
That's one of the, you know, classical ways to see that metric learning plays a much, much better role at scale, and that's why vector databases can scale this much. Sure. Yeah. And tell me a bit more about yourself.
How did you end up in this space? Like, what was your path? I know you worked at Rasa as well, which is also an open source project. Yes. I once looked at it. But now you work for another company, so what was your journey? And yeah.
+
So I worked as an AI researcher at Sama, so we were mostly in clinical trials. So, you know, Pfizer and the like run these clinical trials for 10 to 12 years, and we had this massive data, and we wanted to find out which subjects could drop out of the studies.
+I also published papers before, so I'm well-versed in this AI research area. Yeah. And then I joined Rasa for conversational AI, I love conversational AI. And then I joined another company recently and I got into this embedding space.
+And now I have my own open source project called EmbedAnything, in which you can use very different multi-modal sources and structured sources, and, you know, you get them embedded at 40x faster speed than other existing pipelines. Wow. How did you do that? That is Rust. It's all available.
+It's all open source, because I'm a huge supporter of open source. So what we do is we have built this pipeline in Rust, from the PDF all the way to the embedding. So one of the analogies that I use most is: embedding models are, yeah, they are really, really cool.
+They are becoming faster and everything. But if you have a Porsche, would you like to drive it on a national highway or on a road full of potholes? So that's the analogy being used.
+We are giving you a highway for driving your embedding model, your Porsche, you know, in a very sophisticated way with, yeah, no tech debt; you could call it a pipeline for embedding. Interesting. So you are basically building an infrastructure for these very models.
+So as a user, what can I do with this project? Yeah, very good question. So we are very production ready. Yeah. And we do not use any kind of heavy library, right, like libtorch.
+So if you have to embed something, you first go on Hugging Face, use sentence-transformers, and then you will download this 2.5 GB library which will come with libtorch and stuff like that. Yeah.
And we have removed all those dependencies. All right, that's good, lighter. Yeah.
+We have adopted Candle from Hugging Face, because Candle also uses Rust, and because we are also building in Rust, it's much easier to integrate with Candle. So yeah, it's a much lighter, much faster, you know, way of creating embeddings.
+What is Candle? Candle is basically inference on GPU and CPU. Oh, I see. Yeah. Yeah. And it's also open source. Yeah, it's also open source. Okay. So you do everything unconventionally in Rust, even though everyone else is doing it in Python.
+Because, you know, multi-threading is so much embedded in Rust. Like, people will tell you that Python can also do multi-threading, but that's not true multi-threading, because of the global interpreter lock. Yeah. And Rust gives you mutable borrows and locks.
+So you can achieve true multi-threading just with Rust. Yeah. They actually promised to solve the GIL problem in the next Python version. Yeah, they are already working on it. Oh, wow. I don't know when it will materialize, but. Okay. And so, okay.
+But if I look at it from the perspective, let's say, of building some product, be it a chatbot or, like, a search engine, you know, blended with vector search, or something like that. What is my typical pipeline, how does it look? Right.
+So what will I do? Let's say I have my data. And then maybe I've chosen a model, but that model is okay. Maybe it's not the fastest one.
+What should I do? Will I turn to your platform to speed it up? Will I turn to your platform to do some other things as well? So we are not doing any changes in the model itself. We are not quantizing.
+We can use those models: Candle gives you a certain list of models that you can use and create embeddings with us. Yeah. Basically. So whatever Candle supports, we support. Yeah. Whatever Candle doesn't support, we cannot support, because we are basically dependent on them. Yeah.
+
So we are not doing anything in the model itself, we are doing it in the extraction and parsing part of the data. Right. If you have different videos, different PDFs, different markdown files, we will extract, chunk, and parse them, and then build this extra fast. Yeah. Yeah.
+And then, let's say I want to go to production, but I also have some other components which maybe you wouldn't integrate, right? Like my search cluster and something else, my services. So can I also go to production with your platform? Yeah.
+Like, how will it look exactly? Is it a Docker container? Is it a unit? First of all, you do not need to code in Rust. A lot of developers reach out to me and, you know, they ask, do I need to know Rust to contribute to EmbedAnything? I'm like, no, you do not need to.
+We have worked on building this wrapper around Rust so that, you know, you can easily use it from Python. Oh, so you have a Python wrapper of your own. Yeah. You only need to know Python. You do not need to know Rust at all. How interesting. Yeah.
+So do you have any instances where companies have already built POCs with your platform? Or do you already have someone going to production? Yeah. So I get so many requests on, like, the acquisition part of things and stuff like that.
+But, you know, it's a one-person company, and it's a non-company project. We have got 6K downloads, but we haven't gotten to production yet. But hopefully in the next two, three months. Nice. And do you need any help from the community? Yes.
+If you're interested in building the infrastructure for AI in Rust and Python, do connect with us, we would love to have you on board. All right. But let's go back a little bit. So you also said that there is a multi-modality element to it. Yeah.
+So I will tell you the way I see it, but please correct me or augment me. So I think a couple of years ago, two years ago, we gave a talk here at Berlin Buzzwords.
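The extract-chunk-embed pipeline described above can be sketched in plain Python. Everything here is a toy: `fake_embed` is a stand-in where a real tool would call an actual embedding model, and the chunking is a naive fixed word window.

```python
def chunk_text(text, max_words=6):
    # Split an extracted document into fixed-size word chunks.
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def fake_embed(chunk):
    # Stand-in for a real embedding model: a 26-dim letter-count vector.
    vec = [0.0] * 26
    for ch in chunk.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def embed_pipeline(doc):
    # extract -> chunk -> embed: the shape of the pipeline described above.
    return [(chunk, fake_embed(chunk)) for chunk in chunk_text(doc)]

doc = "metric learning lets you fine tune the similarity of your embedding space"
for chunk, vec in embed_pipeline(doc):
    print(chunk, "->", len(vec), "dims")
```

The speed claims in the conversation come from doing this loop in compiled, multi-threaded Rust rather than in Python, but the stages are the same.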
It was basically showing a system where you can search images and text, whatever you want.
+And if you have images that do not have textual metadata, then that's your gateway into finding these images, because neural networks will understand and extract the content using CLIP, right? Yeah. And so we were able to show some really interesting examples.
+For example, you could find, in an e-commerce context, a long sleeveless dress, striped, in whatever color, and so on and so forth. And it worked. And even some audience members asked us to demo on their queries, and it still worked.
+So that showed the power of multi-modality, right? And we didn't even need to fine-tune CLIP, it was just out of the box. But beyond that, with multi-modality, what else are you thinking of as part of your roadmap? Great question.
+Even though CLIP is known for multi-modality research, one of the best use cases of CLIP is when you're doing zero-shot classification, right? It doesn't need the previous data at all, whether it is searching images or searching through text.
+And it's, like, so powerful, right? So we have different examples with it. But coming to your question, we want to embed audio, graphs, et cetera. So all these are in the pipeline. But right now, we are only embedding text and images. And are you using CLIP? Yeah, we're using CLIP.
+CLIP, that's right. I also wanted to ask you a bit if you may share your insight on evaluating these systems.
+So one of the pieces of feedback that I have gotten is, let's say I have my LLM-based application, you know, how do I evaluate it? Because one of the complaints is that sometimes it gives perfect results, sometimes it gives awful results, right?
+And there is nothing in between, right? Or barely.
+So how would you solve this?
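The zero-shot classification idea mentioned above can be sketched without any training: embed the candidate labels, embed the query, and pick the nearest label. The encoder here is a toy letter-count vector, a made-up stand-in for CLIP's real text encoder; the label phrasing mimics CLIP-style prompts.

```python
import math

def embed(text):
    # Toy stand-in for CLIP's text encoder: a 26-dim letter-count vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot(query, labels):
    # No task-specific training: embed every label, pick the nearest one.
    return max(labels, key=lambda label: cosine(embed(query), embed(label)))

labels = ["a photo of a dress", "a photo of a bicycle"]
print(zero_shot("long sleeveless striped dress", labels))
```

Because classes are just embedded strings, adding a new class is adding one more label to the list, which is the "not limited to fixed classes" point made earlier.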
Of course, you do start with your metric learning and some other techniques, right? But there is still the other side of things when you go to production, as you know, like in Rasa and Quadrant and many other companies, you care about quality. +So how do you have any insight on that? Are you maybe planning to build something along the lines of evaluation? That's a great question. You know, but great part of the great response to it is, so LLN is one of the examples of that. +So, you know, LLN gives you a bright answer, but it also gives you hallucination. But a lot of people see hallucination as a bug, but I see it as a feature, because it won't be able to do that creative job, but it can do with hallucinations. +One of the, there are so many tools, right, to measure retrieval part like Braggas, Prometheus, right? And there are so many tools, too. +But still, I think recall measure, what we call it, like, you know, measure how LLN recall is working, where basically extracting most relevant information, not like rubbish information. +So those things are like really important, and a lot of research is going on, but we are more like focused on the infrastructure, and we are keeping it up, trying to keep it up, but yeah, so mostly I would go for classical testing ways like precision, recall, yeah. +But basically like, okay, you do test, and you see that sometimes once in a while it fails. So first of all, of course, catching that is important, right? People are going to production. Yes, but what is your way backwards to fixing this? Thank you, Chef. Yeah, from from from finding that bug. +Okay, let me think about that. +So, data set, maybe you can give some example where you have fixed a ratio, you know, reported by someone, not necessarily as part of your platform, but previously, I don't know, what have I done? Yeah, so I was working with this, that's why this talk came into my mind, right? +The negation problem, the negation problem is so huge. 
+
You will always find sentences with "not" in every domain, be it biomedical, be it law, everything, and it still gives you the same similarity, even though you do not have to be a language expert to understand that these are different things. Yeah, yeah, yeah.
+So, different things, right? That's when metric learning comes in, that's when inference started to come to mind, because inference is very important.
+Like, a lot of people have played with SNLI and such, and then they understand that to understand negation, you first need to understand inference.
+So there's a method for that, right? Yeah, yeah, yeah: entailment, contradiction, neutral. Two sentences could be neutral, yeah, unrelated to each other, and two sentences could be contradictory to each other.
+So, yeah, which means that you need a corpus of data somehow labeled, yes, logically reasoned through, right?
+Yes, so that's what SNLI is: SNLI is a dataset particularly for this, yeah, for these particular problems. So yeah, you can fine-tune with it, it was fun. So that would be for text, and what about other modalities?
+Like images. I know sometimes a model may hallucinate that there is something in the real world, but there is nothing like that. Oh, that's one thing, but I guess there are many.
+So there are things like, there's a study on blind pairs in CLIP, that was done by, I'm sorry, I forgot the name, where people found that.
+Yeah, so they found out CLIP actually has blind pairs: you cannot segment things really well, like, a cat sleeping on the car or something, and then something else will give you the same description. So there, DINO comes in.
+So DINO does segmentation with self-supervised learning, and self-supervised learning, I think, is the best invention for this AI era, and it's in the open source.
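The SNLI-style labels above map naturally onto metric-learning training data: entailment gives positive pairs, contradiction gives hard negatives. A toy sketch with made-up rows (the real SNLI corpus has hundreds of thousands of human-labeled pairs):

```python
# Made-up NLI-labeled rows in SNLI style: (premise, hypothesis, label).
nli_rows = [
    ("I love pizza", "Pizza is a food I enjoy", "entailment"),
    ("I love pizza", "I do not love pizza", "contradiction"),
    ("I love pizza", "The sky is blue", "neutral"),
]

def to_contrastive_pairs(rows):
    # Entailment -> positive pair (pull embeddings together);
    # contradiction -> hard negative pair (push them apart);
    # neutral pairs carry no strong signal here, so they are skipped.
    positives, negatives = [], []
    for premise, hypothesis, label in rows:
        if label == "entailment":
            positives.append((premise, hypothesis))
        elif label == "contradiction":
            negatives.append((premise, hypothesis))
    return positives, negatives

positives, negatives = to_contrastive_pairs(nli_rows)
print(len(positives), "positive /", len(negatives), "negative pairs")
```

Training a contrastive loss on pairs built this way is one way an embedding model can learn that "I love pizza" and "I do not love pizza" should sit far apart.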
Self-supervised learning. Is it open source? DINO, you say? Yeah, it is.
+Yeah, it's from Meta, and I think LeCun is one of its authors, and, you know, he has written this, what do you call it, white paper on the dark matter of intelligence, which is self-supervised learning. So they are doing a lot of work in self-supervised learning.
+You know, make the model learn from the data itself. You do not need to label it. Yeah, that's the self-supervised sort of thing.
+Okay, then maybe another question I have is, where do you embed a human in this process? Do you ever, like, I don't know, check quality or give feedback? Exactly. So one other thing in metric learning is everyone thinks it's self-supervised, that it will learn from the data.
+It doesn't need labels. But when the contrastive learning happens, who is making that negative mining fair? It's the human, it's the human making that negative, a very, very crisp one, right? Exactly, that's where it is learning. Not just a random negative, but, like, a semantically hard negative.
+Semantically negative. Yeah. So there, humans are very common, you know. For sure. Yeah. And so today, let's say someone wants to use your platform, you said, EmbedAnything. EmbedAnything. It's on GitHub, I'm guessing. We'll link it. Yeah.
+And so, part of the story of becoming successful, I guess, is that you can map out the path from use cases to your library, to your project, which would probably be one of the components in the overall picture.
+So which scenarios and use cases do you see where your platform can give value? Is it chatbots? Is it vector search? Is it completely anything?
+Anywhere where embeddings are used, multi-modal embeddings. I want my library to be the infrastructure where people use different embeddings.
+Awesome. Yeah. Yeah. Well, this sounds really cool. And I wish you all the best in this project. Thank you.
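The hard-negative mining the speakers describe, picking a semantically close negative rather than a random one, can be sketched with made-up embeddings (the vectors and sentences below are toy data, not output of any real model):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mine_hard_negative(anchor_vec, candidates):
    # Pick the negative example *closest* to the anchor: the hardest one,
    # instead of sampling a random (easy) negative.
    return max(candidates, key=lambda item: cosine(anchor_vec, item[1]))

# Made-up embeddings; the anchor stands for "I love pizza".
anchor = [0.90, 0.40, 0.10]
candidates = [
    ("I do not love pizza", [0.88, 0.45, 0.10]),  # semantically close: hard
    ("The sky is blue", [0.10, 0.20, 0.95]),      # unrelated: easy
]
name, _ = mine_hard_negative(anchor, candidates)
print(name)
```

This is where the human comes back in: someone has to judge that the mined negative really is a contradiction and not a paraphrase, which is the "making the negative mining fair" point above.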
I hope that some of my listeners will go and check it out, and maybe you will even get some contributors or, you know, users who can create tickets. Yeah. Yeah.
+I would love to see some issues. And, you know, if you want to raise some issues, go ahead, or add any feature, you can add it as a pull request and we can take a look at it. We are really, really excited. And a lot of developers ask me, do I need to know Rust? No. Yeah.
+You do not need to know Rust at all. Yeah. So you can be, let's say, writing Python and still contribute to the library. Yeah. Oh, nice. Awesome. Maybe you can use ChatGPT as well to convert your Python to Rust, but that's another story. Awesome. And I look forward to your presentation.
+I will not be there, but I will watch the recording, and I will also link this episode and the recording of your talk. Thank you. So good luck with that and thank you so much. Thank you. Thank you so much. Enjoy it. Enjoy the content. Yeah. Thank you. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md b/transcripts_with_timestamps/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md new file mode 100644 index 0000000..26069a7 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/bob-van-luijt-ceo-semi-on-the-weaviate-vector-search-engine.md @@ -0,0 +1,4459 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=iHC5oeAN29o

Show + notes:

1. Layering problem: www.edge.org/conversation/sean_…-layers-of-reality

2. + Podcast with Etienne Dilocker (SeMI Technologies Co-Founder & CTO): www.youtube.com/watch?v=6lkanzOqhDs

3. + SOC2: linfordco.com/blog/soc-1-vs-soc-2-audit-reports/

4. + Dmitry''s post on 7 Vector Databases: towardsdatascience.com/milvus-pineco…-9c65a3bd0696

5. + Billion-Scale ANN Challenge: big-ann-benchmarks.com/index.html

6. + Weaviate Introduction: www.semi.technology/developers/weaviate/current/ + Newsletter: www.semi.technology/newsletter/

7. + Use case: Scalable Knowledge Graph Search for 60+ million academic papers with Weaviate: + medium.com/keenious/knowledge-…aviate-7964657ec911

8. + Bob''s Twitter: twitter.com/bobvanluijt

9. + Dmitry''s Twitter: twitter.com/DmitryKan

10. + Dmitry''s tech blog: dmitry-kan.medium.com/

' +image_url: https://media.rss.com/vector-podcast/20211223_011215_3e84d5201cd172cc4c9a7c3057bf900a.jpg +pub_date: Thu, 23 Dec 2021 13:17:15 GMT +title: Bob van Luijt (CEO, Semi) on the Weaviate vector search engine +url: https://rss.com/podcasts/vector-podcast/347461 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 28.16, "text": " on", + "tokens": [50364, 322, 51772], "temperature": 1.0, "avg_logprob": -2.3076030731201174, + "compression_ratio": 0.2, "no_speech_prob": 0.3324229121208191}, {"id": 1, "seek": + 2816, "start": 28.16, "end": 31.92, "text": " database and I''m sure Bob will talk + more about it, what it is, what it isn''t.", "tokens": [50364, 8149, 293, 286, 478, + 988, 6085, 486, 751, 544, 466, 309, 11, 437, 309, 307, 11, 437, 309, 1943, 380, + 13, 50552], "temperature": 0.0, "avg_logprob": -0.29662054777145386, "compression_ratio": + 1.6472727272727272, "no_speech_prob": 0.22794686257839203}, {"id": 2, "seek": 2816, + "start": 32.64, "end": 33.14, "text": " Hey Bob.", "tokens": [50588, 1911, 6085, + 13, 50613], "temperature": 0.0, "avg_logprob": -0.29662054777145386, "compression_ratio": + 1.6472727272727272, "no_speech_prob": 0.22794686257839203}, {"id": 3, "seek": 2816, + "start": 34.32, "end": 36.0, "text": " Hey, thanks for having me.", "tokens": [50672, + 1911, 11, 3231, 337, 1419, 385, 13, 50756], "temperature": 0.0, "avg_logprob": -0.29662054777145386, + "compression_ratio": 1.6472727272727272, "no_speech_prob": 0.22794686257839203}, + {"id": 4, "seek": 2816, "start": 36.44, "end": 37.04, "text": " Cool to be here.", + "tokens": [50778, 8561, 281, 312, 510, 13, 50808], "temperature": 0.0, "avg_logprob": + -0.29662054777145386, "compression_ratio": 1.6472727272727272, "no_speech_prob": + 0.22794686257839203}, {"id": 5, "seek": 2816, "start": 37.44, "end": 38.92, "text": + " Yeah, thanks for joining.", "tokens": [50828, 865, 11, 3231, 337, 5549, 13, 50902], + "temperature": 0.0, "avg_logprob": -0.29662054777145386, 
"compression_ratio": 1.6472727272727272, + "no_speech_prob": 0.22794686257839203}, {"id": 6, "seek": 2816, "start": 38.92, + "end": 44.480000000000004, "text": " I know you have a haptic schedule, but it''s + always nice to, you know,", "tokens": [50902, 286, 458, 291, 362, 257, 324, 32307, + 7567, 11, 457, 309, 311, 1009, 1481, 281, 11, 291, 458, 11, 51180], "temperature": + 0.0, "avg_logprob": -0.29662054777145386, "compression_ratio": 1.6472727272727272, + "no_speech_prob": 0.22794686257839203}, {"id": 7, "seek": 2816, "start": 44.480000000000004, + "end": 46.480000000000004, "text": " pause a little bit and talk about things.", + "tokens": [51180, 10465, 257, 707, 857, 293, 751, 466, 721, 13, 51280], "temperature": + 0.0, "avg_logprob": -0.29662054777145386, "compression_ratio": 1.6472727272727272, + "no_speech_prob": 0.22794686257839203}, {"id": 8, "seek": 2816, "start": 46.96, + "end": 50.72, "text": " And I was thinking maybe we can start off by introduction.", + "tokens": [51304, 400, 286, 390, 1953, 1310, 321, 393, 722, 766, 538, 9339, 13, + 51492], "temperature": 0.0, "avg_logprob": -0.29662054777145386, "compression_ratio": + 1.6472727272727272, "no_speech_prob": 0.22794686257839203}, {"id": 9, "seek": 2816, + "start": 50.72, "end": 53.84, "text": " Like if you can introduce yourself your + background and kind of like,", "tokens": [51492, 1743, 498, 291, 393, 5366, 1803, + 428, 3678, 293, 733, 295, 411, 11, 51648], "temperature": 0.0, "avg_logprob": -0.29662054777145386, + "compression_ratio": 1.6472727272727272, "no_speech_prob": 0.22794686257839203}, + {"id": 10, "seek": 2816, "start": 54.04, "end": 57.56, "text": " how did you end + up working for this product and company?", "tokens": [51658, 577, 630, 291, 917, + 493, 1364, 337, 341, 1674, 293, 2237, 30, 51834], "temperature": 0.0, "avg_logprob": + -0.29662054777145386, "compression_ratio": 1.6472727272727272, "no_speech_prob": + 0.22794686257839203}, {"id": 11, "seek": 5816, "start": 58.8, "end": 
59.279999999999994, + "text": " Yeah, sure.", "tokens": [50396, 865, 11, 988, 13, 50420], "temperature": + 0.0, "avg_logprob": -0.21017656917065647, "compression_ratio": 1.6502057613168724, + "no_speech_prob": 0.003455210942775011}, {"id": 12, "seek": 5816, "start": 59.279999999999994, + "end": 63.839999999999996, "text": " So I''ve been, so I started my career as a, + as a software engineer and,", "tokens": [50420, 407, 286, 600, 668, 11, 370, 286, + 1409, 452, 3988, 382, 257, 11, 382, 257, 4722, 11403, 293, 11, 50648], "temperature": + 0.0, "avg_logprob": -0.21017656917065647, "compression_ratio": 1.6502057613168724, + "no_speech_prob": 0.003455210942775011}, {"id": 13, "seek": 5816, "start": 63.879999999999995, + "end": 68.0, "text": " and later I moved to a more IT and software consultancy.", + "tokens": [50650, 293, 1780, 286, 4259, 281, 257, 544, 6783, 293, 4722, 7189, 6717, + 13, 50856], "temperature": 0.0, "avg_logprob": -0.21017656917065647, "compression_ratio": + 1.6502057613168724, "no_speech_prob": 0.003455210942775011}, {"id": 14, "seek": + 5816, "start": 68.72, "end": 73.2, "text": " And one of the things that I was working + with a lot of unstructured data.", "tokens": [50892, 400, 472, 295, 264, 721, 300, + 286, 390, 1364, 365, 257, 688, 295, 18799, 46847, 1412, 13, 51116], "temperature": + 0.0, "avg_logprob": -0.21017656917065647, "compression_ratio": 1.6502057613168724, + "no_speech_prob": 0.003455210942775011}, {"id": 15, "seek": 5816, "start": 73.39999999999999, + "end": 76.0, "text": " And we''re probably going to talk way more about that.", + "tokens": [51126, 400, 321, 434, 1391, 516, 281, 751, 636, 544, 466, 300, 13, 51256], + "temperature": 0.0, "avg_logprob": -0.21017656917065647, "compression_ratio": 1.6502057613168724, + "no_speech_prob": 0.003455210942775011}, {"id": 16, "seek": 5816, "start": 77.52, + "end": 83.36, "text": " But the story that I have is that the years ago I was at + a,", "tokens": [51332, 583, 264, 1657, 300, 286, 362, 
307, 300, 264, 924, 2057, + 286, 390, 412, 257, 11, 51624], "temperature": 0.0, "avg_logprob": -0.21017656917065647, + "compression_ratio": 1.6502057613168724, "no_speech_prob": 0.003455210942775011}, + {"id": 17, "seek": 5816, "start": 83.8, "end": 87.47999999999999, "text": " at a + conference in San Francisco and it was a, it was a cloud conference.", "tokens": + [51646, 412, 257, 7586, 294, 5271, 12279, 293, 309, 390, 257, 11, 309, 390, 257, + 4588, 7586, 13, 51830], "temperature": 0.0, "avg_logprob": -0.21017656917065647, + "compression_ratio": 1.6502057613168724, "no_speech_prob": 0.003455210942775011}, + {"id": 18, "seek": 8816, "start": 88.24, "end": 93.92, "text": " And back then it + was just announced that there was a change in the,", "tokens": [50368, 400, 646, + 550, 309, 390, 445, 7548, 300, 456, 390, 257, 1319, 294, 264, 11, 50652], "temperature": + 0.0, "avg_logprob": -0.3187188222898659, "compression_ratio": 1.7949640287769784, + "no_speech_prob": 0.001919409609399736}, {"id": 19, "seek": 8816, "start": 94.0, + "end": 95.44, "text": " in the Google search algorithm.", "tokens": [50656, 294, + 264, 3329, 3164, 9284, 13, 50728], "temperature": 0.0, "avg_logprob": -0.3187188222898659, + "compression_ratio": 1.7949640287769784, "no_speech_prob": 0.001919409609399736}, + {"id": 20, "seek": 8816, "start": 95.52, "end": 98.96, "text": " And you have to, + you have to bear my, this is this pre dating,", "tokens": [50732, 400, 291, 362, + 281, 11, 291, 362, 281, 6155, 452, 11, 341, 307, 341, 659, 10689, 11, 50904], "temperature": + 0.0, "avg_logprob": -0.3187188222898659, "compression_ratio": 1.7949640287769784, + "no_speech_prob": 0.001919409609399736}, {"id": 21, "seek": 8816, "start": 99.03999999999999, + "end": 101.75999999999999, "text": " like you''ve seen the remote like transformers + and those kind of things.", "tokens": [50908, 411, 291, 600, 1612, 264, 8607, 411, + 4088, 433, 293, 729, 733, 295, 721, 13, 51044], "temperature": 0.0, "avg_logprob": 
+ -0.3187188222898659, "compression_ratio": 1.7949640287769784, "no_speech_prob": + 0.001919409609399736}, {"id": 22, "seek": 8816, "start": 101.75999999999999, "end": + 103.39999999999999, "text": " This was the time that I think,", "tokens": [51044, + 639, 390, 264, 565, 300, 286, 519, 11, 51126], "temperature": 0.0, "avg_logprob": + -0.3187188222898659, "compression_ratio": 1.7949640287769784, "no_speech_prob": + 0.001919409609399736}, {"id": 23, "seek": 8816, "start": 103.88, "end": 106.47999999999999, + "text": " glass was the, was the, the biggest thing around.", "tokens": [51150, + 4276, 390, 264, 11, 390, 264, 11, 264, 3880, 551, 926, 13, 51280], "temperature": + 0.0, "avg_logprob": -0.3187188222898659, "compression_ratio": 1.7949640287769784, + "no_speech_prob": 0.001919409609399736}, {"id": 24, "seek": 8816, "start": 106.67999999999999, + "end": 110.4, "text": " And, and they made it change and they said, like, well,", + "tokens": [51290, 400, 11, 293, 436, 1027, 309, 1319, 293, 436, 848, 11, 411, 11, + 731, 11, 51476], "temperature": 0.0, "avg_logprob": -0.3187188222898659, "compression_ratio": + 1.7949640287769784, "no_speech_prob": 0.001919409609399736}, {"id": 25, "seek": + 8816, "start": 110.4, "end": 112.24, "text": " we''re going to go more to contextual + search.", "tokens": [51476, 321, 434, 516, 281, 352, 544, 281, 35526, 3164, 13, + 51568], "temperature": 0.0, "avg_logprob": -0.3187188222898659, "compression_ratio": + 1.7949640287769784, "no_speech_prob": 0.001919409609399736}, {"id": 26, "seek": + 8816, "start": 112.24, "end": 114.64, "text": " We''re going to go on the way from + what they call them,", "tokens": [51568, 492, 434, 516, 281, 352, 322, 264, 636, + 490, 437, 436, 818, 552, 11, 51688], "temperature": 0.0, "avg_logprob": -0.3187188222898659, + "compression_ratio": 1.7949640287769784, "no_speech_prob": 0.001919409609399736}, + {"id": 27, "seek": 8816, "start": 115.0, "end": 116.6, "text": " a page rank to + rank brain.", "tokens": 
[51706, 257, 3028, 6181, 281, 6181, 3567, 13, 51786], "temperature": + 0.0, "avg_logprob": -0.3187188222898659, "compression_ratio": 1.7949640287769784, + "no_speech_prob": 0.001919409609399736}, {"id": 28, "seek": 11660, "start": 116.67999999999999, + "end": 119.08, "text": " And one of the things that I was looking into is like,", + "tokens": [50368, 400, 472, 295, 264, 721, 300, 286, 390, 1237, 666, 307, 411, 11, + 50488], "temperature": 0.0, "avg_logprob": -0.1938600046881314, "compression_ratio": + 1.788679245283019, "no_speech_prob": 0.0003896997368428856}, {"id": 29, "seek": + 11660, "start": 119.36, "end": 122.39999999999999, "text": " you know, is there + any company or are these cloud providers?", "tokens": [50502, 291, 458, 11, 307, + 456, 604, 2237, 420, 366, 613, 4588, 11330, 30, 50654], "temperature": 0.0, "avg_logprob": + -0.1938600046881314, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.0003896997368428856}, + {"id": 30, "seek": 11660, "start": 122.44, "end": 127.16, "text": " Are they going + to provide database technology or search engine technology", "tokens": [50656, 2014, + 436, 516, 281, 2893, 8149, 2899, 420, 3164, 2848, 2899, 50892], "temperature": 0.0, + "avg_logprob": -0.1938600046881314, "compression_ratio": 1.788679245283019, "no_speech_prob": + 0.0003896997368428856}, {"id": 31, "seek": 11660, "start": 127.16, "end": 131.84, + "text": " that actually deals with a similar type of search.", "tokens": [50892, + 300, 767, 11215, 365, 257, 2531, 2010, 295, 3164, 13, 51126], "temperature": 0.0, + "avg_logprob": -0.1938600046881314, "compression_ratio": 1.788679245283019, "no_speech_prob": + 0.0003896997368428856}, {"id": 32, "seek": 11660, "start": 131.84, "end": 134.44, + "text": " So that becomes easier to search through unstructured data.", "tokens": + [51126, 407, 300, 3643, 3571, 281, 3164, 807, 18799, 46847, 1412, 13, 51256], "temperature": + 0.0, "avg_logprob": -0.1938600046881314, "compression_ratio": 1.788679245283019, 
+ "no_speech_prob": 0.0003896997368428856}, {"id": 33, "seek": 11660, "start": 134.56, + "end": 140.16, "text": " And, and the answer was actually like they, they weren''t + looking into it.", "tokens": [51262, 400, 11, 293, 264, 1867, 390, 767, 411, 436, + 11, 436, 4999, 380, 1237, 666, 309, 13, 51542], "temperature": 0.0, "avg_logprob": + -0.1938600046881314, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.0003896997368428856}, + {"id": 34, "seek": 11660, "start": 140.16, "end": 142.0, "text": " Or maybe they + were, but they weren''t sharing it.", "tokens": [51542, 1610, 1310, 436, 645, 11, + 457, 436, 4999, 380, 5414, 309, 13, 51634], "temperature": 0.0, "avg_logprob": -0.1938600046881314, + "compression_ratio": 1.788679245283019, "no_speech_prob": 0.0003896997368428856}, + {"id": 35, "seek": 11660, "start": 142.0, "end": 145.28, "text": " So I was actually + at the airport of San Francisco.", "tokens": [51634, 407, 286, 390, 767, 412, 264, + 10155, 295, 5271, 12279, 13, 51798], "temperature": 0.0, "avg_logprob": -0.1938600046881314, + "compression_ratio": 1.788679245283019, "no_speech_prob": 0.0003896997368428856}, + {"id": 36, "seek": 14528, "start": 145.28, "end": 147.2, "text": " And I just started + to work on this idea.", "tokens": [50364, 400, 286, 445, 1409, 281, 589, 322, 341, + 1558, 13, 50460], "temperature": 0.0, "avg_logprob": -0.17131434300149134, "compression_ratio": + 1.8888888888888888, "no_speech_prob": 0.001989940647035837}, {"id": 37, "seek": + 14528, "start": 147.2, "end": 150.24, "text": " And it was like, it''s coming from + a, from a lot of directions back then.", "tokens": [50460, 400, 309, 390, 411, 11, + 309, 311, 1348, 490, 257, 11, 490, 257, 688, 295, 11095, 646, 550, 13, 50612], "temperature": + 0.0, "avg_logprob": -0.17131434300149134, "compression_ratio": 1.8888888888888888, + "no_speech_prob": 0.001989940647035837}, {"id": 38, "seek": 14528, "start": 150.24, + "end": 153.04, "text": " So it was a lot happening, 
knowledge graphs are happening.", + "tokens": [50612, 407, 309, 390, 257, 688, 2737, 11, 3601, 24877, 366, 2737, 13, + 50752], "temperature": 0.0, "avg_logprob": -0.17131434300149134, "compression_ratio": + 1.8888888888888888, "no_speech_prob": 0.001989940647035837}, {"id": 39, "seek": + 14528, "start": 153.6, "end": 155.08, "text": " And the machine learning was growing.", + "tokens": [50780, 400, 264, 3479, 2539, 390, 4194, 13, 50854], "temperature": 0.0, + "avg_logprob": -0.17131434300149134, "compression_ratio": 1.8888888888888888, "no_speech_prob": + 0.001989940647035837}, {"id": 40, "seek": 14528, "start": 155.08, "end": 159.52, + "text": " And at some point, I thought like, hey, actually,", "tokens": [50854, + 400, 412, 512, 935, 11, 286, 1194, 411, 11, 4177, 11, 767, 11, 51076], "temperature": + 0.0, "avg_logprob": -0.17131434300149134, "compression_ratio": 1.8888888888888888, + "no_speech_prob": 0.001989940647035837}, {"id": 41, "seek": 14528, "start": 159.52, + "end": 163.48, "text": " I do think that there''s an opportunity in the market for + this.", "tokens": [51076, 286, 360, 519, 300, 456, 311, 364, 2650, 294, 264, 2142, + 337, 341, 13, 51274], "temperature": 0.0, "avg_logprob": -0.17131434300149134, "compression_ratio": + 1.8888888888888888, "no_speech_prob": 0.001989940647035837}, {"id": 42, "seek": + 14528, "start": 163.48, "end": 165.6, "text": " And so I started to work on this.", + "tokens": [51274, 400, 370, 286, 1409, 281, 589, 322, 341, 13, 51380], "temperature": + 0.0, "avg_logprob": -0.17131434300149134, "compression_ratio": 1.8888888888888888, + "no_speech_prob": 0.001989940647035837}, {"id": 43, "seek": 14528, "start": 165.6, + "end": 168.6, "text": " So I started to gather a team around me.", "tokens": [51380, + 407, 286, 1409, 281, 5448, 257, 1469, 926, 385, 13, 51530], "temperature": 0.0, + "avg_logprob": -0.17131434300149134, "compression_ratio": 1.8888888888888888, "no_speech_prob": + 0.001989940647035837}, {"id": 44, "seek": 14528, 
"start": 168.6, "end": 172.64, + "text": " And what then happened was that a lot happened in the machine learning + space.", "tokens": [51530, 400, 437, 550, 2011, 390, 300, 257, 688, 2011, 294, 264, + 3479, 2539, 1901, 13, 51732], "temperature": 0.0, "avg_logprob": -0.17131434300149134, + "compression_ratio": 1.8888888888888888, "no_speech_prob": 0.001989940647035837}, + {"id": 45, "seek": 17264, "start": 172.64, "end": 176.35999999999999, "text": " + So think about these transformers models were released.", "tokens": [50364, 407, + 519, 466, 613, 4088, 433, 5245, 645, 4736, 13, 50550], "temperature": 0.0, "avg_logprob": + -0.27756854248046875, "compression_ratio": 1.7214285714285715, "no_speech_prob": + 0.0026335567235946655}, {"id": 46, "seek": 17264, "start": 176.35999999999999, "end": + 177.83999999999997, "text": " They were getting better and better.", "tokens": [50550, + 814, 645, 1242, 1101, 293, 1101, 13, 50624], "temperature": 0.0, "avg_logprob": + -0.27756854248046875, "compression_ratio": 1.7214285714285715, "no_speech_prob": + 0.0026335567235946655}, {"id": 47, "seek": 17264, "start": 177.83999999999997, "end": + 181.51999999999998, "text": " And back then, we were still looking at like having + like these", "tokens": [50624, 400, 646, 550, 11, 321, 645, 920, 1237, 412, 411, + 1419, 411, 613, 50808], "temperature": 0.0, "avg_logprob": -0.27756854248046875, + "compression_ratio": 1.7214285714285715, "no_speech_prob": 0.0026335567235946655}, + {"id": 48, "seek": 17264, "start": 182.88, "end": 186.88, "text": " factory presentations + that we can talk a little bit more, you know, about in a bit,", "tokens": [50876, + 9265, 18964, 300, 321, 393, 751, 257, 707, 857, 544, 11, 291, 458, 11, 466, 294, + 257, 857, 11, 51076], "temperature": 0.0, "avg_logprob": -0.27756854248046875, "compression_ratio": + 1.7214285714285715, "no_speech_prob": 0.0026335567235946655}, {"id": 49, "seek": + 17264, "start": 186.88, "end": 190.72, "text": " like on the site, but we 
actually + don''t like, hey, actually, if we use this,", "tokens": [51076, 411, 322, 264, 3621, + 11, 457, 321, 767, 500, 380, 411, 11, 4177, 11, 767, 11, 498, 321, 764, 341, 11, + 51268], "temperature": 0.0, "avg_logprob": -0.27756854248046875, "compression_ratio": + 1.7214285714285715, "no_speech_prob": 0.0026335567235946655}, {"id": 50, "seek": + 17264, "start": 190.72, "end": 195.83999999999997, "text": " we can just solve new + use cases and we can build a completely new database or new search engine.", "tokens": + [51268, 321, 393, 445, 5039, 777, 764, 3331, 293, 321, 393, 1322, 257, 2584, 777, + 8149, 420, 777, 3164, 2848, 13, 51524], "temperature": 0.0, "avg_logprob": -0.27756854248046875, + "compression_ratio": 1.7214285714285715, "no_speech_prob": 0.0026335567235946655}, + {"id": 51, "seek": 17264, "start": 195.83999999999997, "end": 197.64, "text": " + And so that is the origin story.", "tokens": [51524, 400, 370, 300, 307, 264, 4957, + 1657, 13, 51614], "temperature": 0.0, "avg_logprob": -0.27756854248046875, "compression_ratio": + 1.7214285714285715, "no_speech_prob": 0.0026335567235946655}, {"id": 52, "seek": + 17264, "start": 197.64, "end": 199.44, "text": " So that''s where I''m coming from + and", "tokens": [51614, 407, 300, 311, 689, 286, 478, 1348, 490, 293, 51704], "temperature": + 0.0, "avg_logprob": -0.27756854248046875, "compression_ratio": 1.7214285714285715, + "no_speech_prob": 0.0026335567235946655}, {"id": 53, "seek": 19944, "start": 200.44, + "end": 204.44, "text": " why we started because on structure data was a problem.", + "tokens": [50414, 983, 321, 1409, 570, 322, 3877, 1412, 390, 257, 1154, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.22488713897435011, "compression_ratio": 1.7, + "no_speech_prob": 0.029879890382289886}, {"id": 54, "seek": 19944, "start": 204.44, + "end": 206.44, "text": " It is still a problem.", "tokens": [50614, 467, 307, 920, + 257, 1154, 13, 50714], "temperature": 0.0, "avg_logprob": 
-0.22488713897435011, + "compression_ratio": 1.7, "no_speech_prob": 0.029879890382289886}, {"id": 55, "seek": + 19944, "start": 206.44, "end": 211.44, "text": " And I strongly believe that these + kinds of vector search technologies are helping", "tokens": [50714, 400, 286, 10613, + 1697, 300, 613, 3685, 295, 8062, 3164, 7943, 366, 4315, 50964], "temperature": 0.0, + "avg_logprob": -0.22488713897435011, "compression_ratio": 1.7, "no_speech_prob": + 0.029879890382289886}, {"id": 56, "seek": 19944, "start": 211.44, "end": 218.44, + "text": " in solving these problems, not only in text, but also basically anything + you can vectorize.", "tokens": [50964, 294, 12606, 613, 2740, 11, 406, 787, 294, + 2487, 11, 457, 611, 1936, 1340, 291, 393, 8062, 1125, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.22488713897435011, "compression_ratio": 1.7, "no_speech_prob": + 0.029879890382289886}, {"id": 57, "seek": 19944, "start": 218.44, "end": 222.44, + "text": " So they can be images, they can be audio, but that could also be, I don''t + know,", "tokens": [51314, 407, 436, 393, 312, 5267, 11, 436, 393, 312, 6278, 11, + 457, 300, 727, 611, 312, 11, 286, 500, 380, 458, 11, 51514], "temperature": 0.0, + "avg_logprob": -0.22488713897435011, "compression_ratio": 1.7, "no_speech_prob": + 0.029879890382289886}, {"id": 58, "seek": 19944, "start": 222.44, "end": 223.44, + "text": " the human genome, you name it.", "tokens": [51514, 264, 1952, 21953, 11, + 291, 1315, 309, 13, 51564], "temperature": 0.0, "avg_logprob": -0.22488713897435011, + "compression_ratio": 1.7, "no_speech_prob": 0.029879890382289886}, {"id": 59, "seek": + 19944, "start": 223.44, "end": 227.44, "text": " All these things can be vectorized + and it gives another perspective to search through the data.", "tokens": [51564, + 1057, 613, 721, 393, 312, 8062, 1602, 293, 309, 2709, 1071, 4585, 281, 3164, 807, + 264, 1412, 13, 51764], "temperature": 0.0, "avg_logprob": -0.22488713897435011, + "compression_ratio": 1.7, 
"no_speech_prob": 0.029879890382289886}, {"id": 60, "seek": + 22744, "start": 227.44, "end": 229.44, "text": " So that''s the origin story.", + "tokens": [50364, 407, 300, 311, 264, 4957, 1657, 13, 50464], "temperature": 0.0, + "avg_logprob": -0.16761112919560187, "compression_ratio": 1.8327137546468402, "no_speech_prob": + 0.007461612578481436}, {"id": 61, "seek": 22744, "start": 229.44, "end": 230.44, + "text": " Yeah, that''s awesome.", "tokens": [50464, 865, 11, 300, 311, 3476, 13, + 50514], "temperature": 0.0, "avg_logprob": -0.16761112919560187, "compression_ratio": + 1.8327137546468402, "no_speech_prob": 0.007461612578481436}, {"id": 62, "seek": + 22744, "start": 230.44, "end": 231.44, "text": " That''s awesome to hear.", "tokens": + [50514, 663, 311, 3476, 281, 1568, 13, 50564], "temperature": 0.0, "avg_logprob": + -0.16761112919560187, "compression_ratio": 1.8327137546468402, "no_speech_prob": + 0.007461612578481436}, {"id": 63, "seek": 22744, "start": 231.44, "end": 235.44, + "text": " And like, you know, like this field is still in many ways emerging, right?", + "tokens": [50564, 400, 411, 11, 291, 458, 11, 411, 341, 2519, 307, 920, 294, 867, + 2098, 14989, 11, 558, 30, 50764], "temperature": 0.0, "avg_logprob": -0.16761112919560187, + "compression_ratio": 1.8327137546468402, "no_speech_prob": 0.007461612578481436}, + {"id": 64, "seek": 22744, "start": 235.44, "end": 244.44, "text": " Like the field + of let''s say vector data basis per say as products, but also the field of applying + them, you know, for different use cases.", "tokens": [50764, 1743, 264, 2519, 295, + 718, 311, 584, 8062, 1412, 5143, 680, 584, 382, 3383, 11, 457, 611, 264, 2519, 295, + 9275, 552, 11, 291, 458, 11, 337, 819, 764, 3331, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.16761112919560187, "compression_ratio": 1.8327137546468402, "no_speech_prob": + 0.007461612578481436}, {"id": 65, "seek": 22744, "start": 244.44, "end": 246.44, + "text": " But you know, like, it''s 
interesting.", "tokens": [51214, 583, 291, 458, + 11, 411, 11, 309, 311, 1880, 13, 51314], "temperature": 0.0, "avg_logprob": -0.16761112919560187, + "compression_ratio": 1.8327137546468402, "no_speech_prob": 0.007461612578481436}, + {"id": 66, "seek": 22744, "start": 246.44, "end": 252.44, "text": " You touched + on, like, you, you know, you knew about, you know, Google kind of disclosing something.", + "tokens": [51314, 509, 9828, 322, 11, 411, 11, 291, 11, 291, 458, 11, 291, 2586, + 466, 11, 291, 458, 11, 3329, 733, 295, 17092, 6110, 746, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.16761112919560187, "compression_ratio": 1.8327137546468402, + "no_speech_prob": 0.007461612578481436}, {"id": 67, "seek": 22744, "start": 252.44, + "end": 255.44, "text": " And then you knew that the models have been also developing, + right?", "tokens": [51614, 400, 550, 291, 2586, 300, 264, 5245, 362, 668, 611, 6416, + 11, 558, 30, 51764], "temperature": 0.0, "avg_logprob": -0.16761112919560187, "compression_ratio": + 1.8327137546468402, "no_speech_prob": 0.007461612578481436}, {"id": 68, "seek": + 25544, "start": 255.44, "end": 259.44, "text": " Let''s say, basically you predated + that, but then bird came out, right?", "tokens": [50364, 961, 311, 584, 11, 1936, + 291, 3852, 770, 300, 11, 457, 550, 5255, 1361, 484, 11, 558, 30, 50564], "temperature": + 0.0, "avg_logprob": -0.12467643778811219, "compression_ratio": 1.5833333333333333, + "no_speech_prob": 0.01840498298406601}, {"id": 69, "seek": 25544, "start": 259.44, + "end": 268.44, "text": " And then in other fields, you know, let''s say computer + vision, automatic speech recognition, they also been vectorizing in some way.", + "tokens": [50564, 400, 550, 294, 661, 7909, 11, 291, 458, 11, 718, 311, 584, 3820, + 5201, 11, 12509, 6218, 11150, 11, 436, 611, 668, 8062, 3319, 294, 512, 636, 13, + 51014], "temperature": 0.0, "avg_logprob": -0.12467643778811219, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 
0.01840498298406601}, {"id": 70, "seek": 25544, + "start": 268.44, "end": 273.44, "text": " Maybe signal processing wasn''t vectorizing, + but then I guess they started doing it.", "tokens": [51014, 2704, 6358, 9007, 2067, + 380, 8062, 3319, 11, 457, 550, 286, 2041, 436, 1409, 884, 309, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.12467643778811219, "compression_ratio": 1.5833333333333333, + "no_speech_prob": 0.01840498298406601}, {"id": 71, "seek": 25544, "start": 273.44, + "end": 275.44, "text": " And like, it''s interesting.", "tokens": [51264, 400, 411, + 11, 309, 311, 1880, 13, 51364], "temperature": 0.0, "avg_logprob": -0.12467643778811219, + "compression_ratio": 1.5833333333333333, "no_speech_prob": 0.01840498298406601}, + {"id": 72, "seek": 25544, "start": 275.44, "end": 280.44, "text": " Do you think + that you kind of like coincided?", "tokens": [51364, 1144, 291, 519, 300, 291, 733, + 295, 411, 13001, 2112, 30, 51614], "temperature": 0.0, "avg_logprob": -0.12467643778811219, + "compression_ratio": 1.5833333333333333, "no_speech_prob": 0.01840498298406601}, + {"id": 73, "seek": 28044, "start": 280.44, "end": 283.44, "text": " Like you basically + predicted this field, right?", "tokens": [50364, 1743, 291, 1936, 19147, 341, 2519, + 11, 558, 30, 50514], "temperature": 0.0, "avg_logprob": -0.12483051651758505, "compression_ratio": + 1.7342657342657342, "no_speech_prob": 0.2935367822647095}, {"id": 74, "seek": 28044, + "start": 283.44, "end": 285.44, "text": " Like you didn''t know it will happen.", + "tokens": [50514, 1743, 291, 994, 380, 458, 309, 486, 1051, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.12483051651758505, "compression_ratio": 1.7342657342657342, + "no_speech_prob": 0.2935367822647095}, {"id": 75, "seek": 28044, "start": 285.44, + "end": 287.44, "text": " You felt that it will happen.", "tokens": [50614, 509, + 2762, 300, 309, 486, 1051, 13, 50714], "temperature": 0.0, "avg_logprob": -0.12483051651758505, + "compression_ratio": 
1.7342657342657342, "no_speech_prob": 0.2935367822647095}, + {"id": 76, "seek": 28044, "start": 287.44, "end": 289.44, "text": " But it wasn''t + at the same scale as it is now today.", "tokens": [50714, 583, 309, 2067, 380, 412, + 264, 912, 4373, 382, 309, 307, 586, 965, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.12483051651758505, "compression_ratio": 1.7342657342657342, "no_speech_prob": + 0.2935367822647095}, {"id": 77, "seek": 28044, "start": 289.44, "end": 291.44, "text": + " We have so many models, right?", "tokens": [50814, 492, 362, 370, 867, 5245, 11, + 558, 30, 50914], "temperature": 0.0, "avg_logprob": -0.12483051651758505, "compression_ratio": + 1.7342657342657342, "no_speech_prob": 0.2935367822647095}, {"id": 78, "seek": 28044, + "start": 291.44, "end": 295.44, "text": " Like, I don''t know, hugging face, making + a product out of it and so on.", "tokens": [50914, 1743, 11, 286, 500, 380, 458, + 11, 41706, 1851, 11, 1455, 257, 1674, 484, 295, 309, 293, 370, 322, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.12483051651758505, "compression_ratio": 1.7342657342657342, + "no_speech_prob": 0.2935367822647095}, {"id": 79, "seek": 28044, "start": 295.44, + "end": 297.44, "text": " But like, do you think there was a real need?", "tokens": + [51114, 583, 411, 11, 360, 291, 519, 456, 390, 257, 957, 643, 30, 51214], "temperature": + 0.0, "avg_logprob": -0.12483051651758505, "compression_ratio": 1.7342657342657342, + "no_speech_prob": 0.2935367822647095}, {"id": 80, "seek": 28044, "start": 297.44, + "end": 300.44, "text": " Or was it kind of coinciding that, yes, now there are models.", + "tokens": [51214, 1610, 390, 309, 733, 295, 13001, 2819, 300, 11, 2086, 11, 586, + 456, 366, 5245, 13, 51364], "temperature": 0.0, "avg_logprob": -0.12483051651758505, + "compression_ratio": 1.7342657342657342, "no_speech_prob": 0.2935367822647095}, + {"id": 81, "seek": 28044, "start": 300.44, "end": 306.44, "text": " We are addressing + the similar problems, you 
know, but using different technique.", "tokens": [51364, + 492, 366, 14329, 264, 2531, 2740, 11, 291, 458, 11, 457, 1228, 819, 6532, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.12483051651758505, "compression_ratio": 1.7342657342657342, + "no_speech_prob": 0.2935367822647095}, {"id": 82, "seek": 28044, "start": 306.44, + "end": 309.44, "text": " And now you are there with your idea.", "tokens": [51664, + 400, 586, 291, 366, 456, 365, 428, 1558, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.12483051651758505, "compression_ratio": 1.7342657342657342, "no_speech_prob": + 0.2935367822647095}, {"id": 83, "seek": 30944, "start": 309.44, "end": 313.44, "text": + " Yeah, so I mean, there are two sides.", "tokens": [50364, 865, 11, 370, 286, 914, + 11, 456, 366, 732, 4881, 13, 50564], "temperature": 0.0, "avg_logprob": -0.2016582489013672, + "compression_ratio": 1.6101694915254237, "no_speech_prob": 0.005780835635960102}, + {"id": 84, "seek": 30944, "start": 313.44, "end": 315.44, "text": " Of the going + to answer this question.", "tokens": [50564, 2720, 264, 516, 281, 1867, 341, 1168, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.2016582489013672, "compression_ratio": + 1.6101694915254237, "no_speech_prob": 0.005780835635960102}, {"id": 85, "seek": + 30944, "start": 315.44, "end": 318.44, "text": " So I want to end that''s more the + about the need.", "tokens": [50664, 407, 286, 528, 281, 917, 300, 311, 544, 264, + 466, 264, 643, 13, 50814], "temperature": 0.0, "avg_logprob": -0.2016582489013672, + "compression_ratio": 1.6101694915254237, "no_speech_prob": 0.005780835635960102}, + {"id": 86, "seek": 30944, "start": 318.44, "end": 323.44, "text": " And then secondly, + about when I knew when I sold a value.", "tokens": [50814, 400, 550, 26246, 11, + 466, 562, 286, 2586, 562, 286, 3718, 257, 2158, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.2016582489013672, "compression_ratio": 1.6101694915254237, "no_speech_prob": + 0.005780835635960102}, 
{"id": 87, "seek": 30944, "start": 323.44, "end": 326.44, + "text": " And so, so let me, let me start with the first thing.", "tokens": [51064, + 400, 370, 11, 370, 718, 385, 11, 718, 385, 722, 365, 264, 700, 551, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.2016582489013672, "compression_ratio": 1.6101694915254237, + "no_speech_prob": 0.005780835635960102}, {"id": 88, "seek": 30944, "start": 326.44, + "end": 327.44, "text": " So.", "tokens": [51214, 407, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.2016582489013672, "compression_ratio": 1.6101694915254237, + "no_speech_prob": 0.005780835635960102}, {"id": 89, "seek": 30944, "start": 327.44, + "end": 329.44, "text": " On structure data is huge.", "tokens": [51264, 1282, 3877, + 1412, 307, 2603, 13, 51364], "temperature": 0.0, "avg_logprob": -0.2016582489013672, + "compression_ratio": 1.6101694915254237, "no_speech_prob": 0.005780835635960102}, + {"id": 90, "seek": 30944, "start": 329.44, "end": 336.44, "text": " And the problem + that we currently have with search is that if you know what you''re looking for, + you can find it.", "tokens": [51364, 400, 264, 1154, 300, 321, 4362, 362, 365, 3164, + 307, 300, 498, 291, 458, 437, 291, 434, 1237, 337, 11, 291, 393, 915, 309, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.2016582489013672, "compression_ratio": 1.6101694915254237, + "no_speech_prob": 0.005780835635960102}, {"id": 91, "seek": 33644, "start": 336.44, + "end": 339.44, "text": " If you don''t know what you''re looking for, you can''t.", + "tokens": [50364, 759, 291, 500, 380, 458, 437, 291, 434, 1237, 337, 11, 291, 393, + 380, 13, 50514], "temperature": 0.0, "avg_logprob": -0.15100869824809413, "compression_ratio": + 1.8804780876494025, "no_speech_prob": 0.03290649503469467}, {"id": 92, "seek": 33644, + "start": 339.44, "end": 346.44, "text": " So to make that very simple, if you have + a webshop, for example, just sort of a grocery store or something like that.", "tokens": + [50514, 407, 
281, 652, 300, 588, 2199, 11, 498, 291, 362, 257, 2859, 9050, 11, 337, + 1365, 11, 445, 1333, 295, 257, 14410, 3531, 420, 746, 411, 300, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.15100869824809413, "compression_ratio": 1.8804780876494025, + "no_speech_prob": 0.03290649503469467}, {"id": 93, "seek": 33644, "start": 346.44, + "end": 350.44, "text": " And you''re looking for medicine because you have a headache.", + "tokens": [50864, 400, 291, 434, 1237, 337, 7195, 570, 291, 362, 257, 23520, 13, + 51064], "temperature": 0.0, "avg_logprob": -0.15100869824809413, "compression_ratio": + 1.8804780876494025, "no_speech_prob": 0.03290649503469467}, {"id": 94, "seek": 33644, + "start": 350.44, "end": 358.44, "text": " Then you must somehow or use the name + of the product or somebody needs to tag the product to find it.", "tokens": [51064, + 1396, 291, 1633, 6063, 420, 764, 264, 1315, 295, 264, 1674, 420, 2618, 2203, 281, + 6162, 264, 1674, 281, 915, 309, 13, 51464], "temperature": 0.0, "avg_logprob": -0.15100869824809413, + "compression_ratio": 1.8804780876494025, "no_speech_prob": 0.03290649503469467}, + {"id": 95, "seek": 33644, "start": 358.44, "end": 365.44, "text": " Right. 
So if + you have like, I don''t know, a aspirin, then somebody has to add the keyword, you + know, and headache or something like that.", "tokens": [51464, 1779, 13, 407, 498, + 291, 362, 411, 11, 286, 500, 380, 458, 11, 257, 20003, 259, 11, 550, 2618, 575, + 281, 909, 264, 20428, 11, 291, 458, 11, 293, 23520, 420, 746, 411, 300, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.15100869824809413, "compression_ratio": 1.8804780876494025, + "no_speech_prob": 0.03290649503469467}, {"id": 96, "seek": 36544, "start": 365.44, + "end": 369.44, "text": " Or a pain killer and then even with pain killer, you know, + etc.", "tokens": [50364, 1610, 257, 1822, 13364, 293, 550, 754, 365, 1822, 13364, + 11, 291, 458, 11, 5183, 13, 50564], "temperature": 0.0, "avg_logprob": -0.22718788146972657, + "compression_ratio": 1.6981981981981982, "no_speech_prob": 0.004936631303280592}, + {"id": 97, "seek": 36544, "start": 369.44, "end": 371.44, "text": " What these models + solve.", "tokens": [50564, 708, 613, 5245, 5039, 13, 50664], "temperature": 0.0, + "avg_logprob": -0.22718788146972657, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 0.004936631303280592}, {"id": 98, "seek": 36544, "start": 371.44, "end": 377.44, + "text": " And then we only talk about the NLP and the natural language person models + is the.", "tokens": [50664, 400, 550, 321, 787, 751, 466, 264, 426, 45196, 293, + 264, 3303, 2856, 954, 5245, 307, 264, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.22718788146972657, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 0.004936631303280592}, {"id": 99, "seek": 36544, "start": 377.44, "end": 381.44, + "text": " Is that they solve is that you can look in the vicinity of these of these + words.", "tokens": [50964, 1119, 300, 436, 5039, 307, 300, 291, 393, 574, 294, 264, + 42387, 295, 613, 295, 613, 2283, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.22718788146972657, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 
0.004936631303280592}, {"id": 100, "seek": 36544, "start": 381.44, "end": 388.44, + "text": " And what I often give is an example to think about it just as a as a mental + model.", "tokens": [51164, 400, 437, 286, 2049, 976, 307, 364, 1365, 281, 519, 466, + 309, 445, 382, 257, 382, 257, 4973, 2316, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.22718788146972657, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 0.004936631303280592}, {"id": 101, "seek": 36544, "start": 388.44, "end": 391.44, + "text": " Is a is an actual physical grocery store.", "tokens": [51514, 1119, 257, + 307, 364, 3539, 4001, 14410, 3531, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.22718788146972657, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 0.004936631303280592}, {"id": 102, "seek": 39144, "start": 391.44, "end": 397.44, + "text": " So the example that I always gave is that I say like, well, let''s say + that I like a.", "tokens": [50364, 407, 264, 1365, 300, 286, 1009, 2729, 307, 300, + 286, 584, 411, 11, 731, 11, 718, 311, 584, 300, 286, 411, 257, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.16648197902067927, "compression_ratio": 1.8885017421602788, + "no_speech_prob": 0.027734197676181793}, {"id": 103, "seek": 39144, "start": 397.44, + "end": 403.44, "text": " I have a shopping list and a shopping list says like apple + banana washing powder.", "tokens": [50664, 286, 362, 257, 8688, 1329, 293, 257, + 8688, 1329, 1619, 411, 10606, 14194, 13836, 6341, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.16648197902067927, "compression_ratio": 1.8885017421602788, "no_speech_prob": + 0.027734197676181793}, {"id": 104, "seek": 39144, "start": 403.44, "end": 411.44, + "text": " If you would have a traditional data, if you would have a store that it''s + organized as a traditional database, then it could be for example on an alphabetical + order.", "tokens": [50964, 759, 291, 576, 362, 257, 5164, 1412, 11, 498, 291, 576, + 362, 257, 3531, 300, 
309, 311, 9983, 382, 257, 5164, 8149, 11, 550, 309, 727, 312, + 337, 1365, 322, 364, 23339, 804, 1668, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.16648197902067927, "compression_ratio": 1.8885017421602788, "no_speech_prob": + 0.027734197676181793}, {"id": 105, "seek": 39144, "start": 411.44, "end": 420.44, + "text": " It''s going to be pretty difficult to actually find what you''re looking + for because maybe at the A you might not find the apple, but you have to look to + the G because you''re looking for a granny Smith app, etc.", "tokens": [51364, 467, + 311, 516, 281, 312, 1238, 2252, 281, 767, 915, 437, 291, 434, 1237, 337, 570, 1310, + 412, 264, 316, 291, 1062, 406, 915, 264, 10606, 11, 457, 291, 362, 281, 574, 281, + 264, 460, 570, 291, 434, 1237, 337, 257, 44797, 8538, 724, 11, 5183, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.16648197902067927, "compression_ratio": 1.8885017421602788, + "no_speech_prob": 0.027734197676181793}, {"id": 106, "seek": 42044, "start": 420.44, + "end": 428.44, "text": " And what these these factor models do is that they they''re + basically a form of like a hyper space, right.", "tokens": [50364, 400, 437, 613, + 613, 5952, 5245, 360, 307, 300, 436, 436, 434, 1936, 257, 1254, 295, 411, 257, 9848, + 1901, 11, 558, 13, 50764], "temperature": 0.0, "avg_logprob": -0.23647483022589433, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.006148704327642918}, + {"id": 107, "seek": 42044, "start": 428.44, "end": 432.44, "text": " So they have + you can envision them as as a as a tree dimensional space.", "tokens": [50764, 407, + 436, 362, 291, 393, 24739, 552, 382, 382, 257, 382, 257, 4230, 18795, 1901, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.23647483022589433, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.006148704327642918}, {"id": 108, "seek": + 42044, "start": 432.44, "end": 446.44, "text": " So if you walk to the food department + in the integrator and you find an apple, then 
you know that a banana will be closer + by than the washing powder is and if you move towards the washing powder, you move + away from these.", "tokens": [50964, 407, 498, 291, 1792, 281, 264, 1755, 5882, + 294, 264, 3572, 1639, 293, 291, 915, 364, 10606, 11, 550, 291, 458, 300, 257, 14194, + 486, 312, 4966, 538, 813, 264, 13836, 6341, 307, 293, 498, 291, 1286, 3030, 264, + 13836, 6341, 11, 291, 1286, 1314, 490, 613, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.23647483022589433, "compression_ratio": 1.7272727272727273, "no_speech_prob": + 0.006148704327642918}, {"id": 109, "seek": 44644, "start": 446.44, "end": 456.44, + "text": " And from the from the food section and that brings me to the second part + of my answer when I knew that is.", "tokens": [50364, 400, 490, 264, 490, 264, 1755, + 3541, 293, 300, 5607, 385, 281, 264, 1150, 644, 295, 452, 1867, 562, 286, 2586, + 300, 307, 13, 50864], "temperature": 0.0, "avg_logprob": -0.24453353881835938, "compression_ratio": + 1.4296875, "no_speech_prob": 0.02022716961801052}, {"id": 110, "seek": 44644, "start": + 456.44, "end": 464.44, "text": " This potential was because I made this very simple + super super super simple.", "tokens": [50864, 639, 3995, 390, 570, 286, 1027, 341, + 588, 2199, 1687, 1687, 1687, 2199, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.24453353881835938, "compression_ratio": 1.4296875, "no_speech_prob": 0.02022716961801052}, + {"id": 111, "seek": 46444, "start": 464.44, "end": 472.44, "text": " And I was like, + what''s the type which was based back then on glove and big problem was that people + say like there''s a problem with disambiguation.", "tokens": [50364, 400, 286, 390, + 411, 11, 437, 311, 264, 2010, 597, 390, 2361, 646, 550, 322, 26928, 293, 955, 1154, + 390, 300, 561, 584, 411, 456, 311, 257, 1154, 365, 717, 2173, 328, 16073, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.35625419616699217, "compression_ratio": 1.6484375, + "no_speech_prob": 0.4307038486003876}, 
{"id": 112, "seek": 46444, "start": 472.44, + "end": 476.44, "text": " So if I have a word with effect representation, for example, + or apple.", "tokens": [50764, 407, 498, 286, 362, 257, 1349, 365, 1802, 10290, 11, + 337, 1365, 11, 420, 10606, 13, 50964], "temperature": 0.0, "avg_logprob": -0.35625419616699217, + "compression_ratio": 1.6484375, "no_speech_prob": 0.4307038486003876}, {"id": 113, + "seek": 46444, "start": 476.44, "end": 480.44, "text": " Is that related to the + fruit apple or to the company apple.", "tokens": [50964, 1119, 300, 4077, 281, 264, + 6773, 10606, 420, 281, 264, 2237, 10606, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.35625419616699217, "compression_ratio": 1.6484375, "no_speech_prob": 0.4307038486003876}, + {"id": 114, "seek": 46444, "start": 480.44, "end": 486.44, "text": " So I did something + very simple. I said like well, if I have a document or sentence.", "tokens": [51164, + 407, 286, 630, 746, 588, 2199, 13, 286, 848, 411, 731, 11, 498, 286, 362, 257, 4166, + 420, 8174, 13, 51464], "temperature": 0.0, "avg_logprob": -0.35625419616699217, + "compression_ratio": 1.6484375, "no_speech_prob": 0.4307038486003876}, {"id": 115, + "seek": 46444, "start": 486.44, "end": 492.44, "text": " And again, at bear mind, + this is predating transformers and.", "tokens": [51464, 400, 797, 11, 412, 6155, + 1575, 11, 341, 307, 3852, 990, 4088, 433, 293, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.35625419616699217, "compression_ratio": 1.6484375, "no_speech_prob": 0.4307038486003876}, + {"id": 116, "seek": 49244, "start": 492.44, "end": 498.44, "text": " I said well, + what if I take these individual works. 
So I wrote a very simple script that took + these individual words.", "tokens": [50364, 286, 848, 731, 11, 437, 498, 286, 747, + 613, 2609, 1985, 13, 407, 286, 4114, 257, 588, 2199, 5755, 300, 1890, 613, 2609, + 2283, 13, 50664], "temperature": 0.0, "avg_logprob": -0.183945411074478, "compression_ratio": + 1.7698412698412698, "no_speech_prob": 0.0077958544716238976}, {"id": 117, "seek": + 49244, "start": 498.44, "end": 503.44, "text": " I said, I''m going to calculate + a new factory presentation, just a centroid based on these words.", "tokens": [50664, + 286, 848, 11, 286, 478, 516, 281, 8873, 257, 777, 9265, 5860, 11, 445, 257, 1489, + 6490, 2361, 322, 613, 2283, 13, 50914], "temperature": 0.0, "avg_logprob": -0.183945411074478, + "compression_ratio": 1.7698412698412698, "no_speech_prob": 0.0077958544716238976}, + {"id": 118, "seek": 49244, "start": 503.44, "end": 507.44, "text": " So now I said, + OK, I have a company with an apple.", "tokens": [50914, 407, 586, 286, 848, 11, + 2264, 11, 286, 362, 257, 2237, 365, 364, 10606, 13, 51114], "temperature": 0.0, + "avg_logprob": -0.183945411074478, "compression_ratio": 1.7698412698412698, "no_speech_prob": + 0.0077958544716238976}, {"id": 119, "seek": 49244, "start": 507.44, "end": 514.44, + "text": " So I take all these individual words calculated centroid. 
And now I see + if I can somehow.", "tokens": [51114, 407, 286, 747, 439, 613, 2609, 2283, 15598, + 1489, 6490, 13, 400, 586, 286, 536, 498, 286, 393, 6063, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.183945411074478, "compression_ratio": 1.7698412698412698, + "no_speech_prob": 0.0077958544716238976}, {"id": 120, "seek": 49244, "start": 514.44, + "end": 520.44, "text": " Make the work, you know, the sentence less ambiguous and + that turned actually out to work.", "tokens": [51464, 4387, 264, 589, 11, 291, 458, + 11, 264, 8174, 1570, 39465, 293, 300, 3574, 767, 484, 281, 589, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.183945411074478, "compression_ratio": 1.7698412698412698, + "no_speech_prob": 0.0077958544716238976}, {"id": 121, "seek": 52044, "start": 520.44, + "end": 523.44, "text": " And then I thought, well, not extremely well, but rather + well.", "tokens": [50364, 400, 550, 286, 1194, 11, 731, 11, 406, 4664, 731, 11, + 457, 2831, 731, 13, 50514], "temperature": 0.0, "avg_logprob": -0.25809712373009025, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.16515830159187317}, + {"id": 122, "seek": 52044, "start": 523.44, "end": 525.44, "text": " Again, we''re + talking years back now.", "tokens": [50514, 3764, 11, 321, 434, 1417, 924, 646, + 586, 13, 50614], "temperature": 0.0, "avg_logprob": -0.25809712373009025, "compression_ratio": + 1.7234848484848484, "no_speech_prob": 0.16515830159187317}, {"id": 123, "seek": + 52044, "start": 525.44, "end": 536.44, "text": " And then I knew, OK, this here + is value because I could think of so many things that you now can index and you + can search in the air quotes vicinity of it.", "tokens": [50614, 400, 550, 286, + 2586, 11, 2264, 11, 341, 510, 307, 2158, 570, 286, 727, 519, 295, 370, 867, 721, + 300, 291, 586, 393, 8186, 293, 291, 393, 3164, 294, 264, 1988, 19963, 42387, 295, + 309, 13, 51164], "temperature": 0.0, "avg_logprob": -0.25809712373009025, "compression_ratio": + 
1.7234848484848484, "no_speech_prob": 0.16515830159187317}, {"id": 124, "seek": + 52044, "start": 536.44, "end": 540.44, "text": " In your, in your, in your vector + space, it made it easier to find things.", "tokens": [51164, 682, 428, 11, 294, + 428, 11, 294, 428, 8062, 1901, 11, 309, 1027, 309, 3571, 281, 915, 721, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.25809712373009025, "compression_ratio": 1.7234848484848484, + "no_speech_prob": 0.16515830159187317}, {"id": 125, "seek": 52044, "start": 540.44, + "end": 544.44, "text": " It made it easier to classify things, et cetera, et cetera.", + "tokens": [51364, 467, 1027, 309, 3571, 281, 33872, 721, 11, 1030, 11458, 11, 1030, + 11458, 13, 51564], "temperature": 0.0, "avg_logprob": -0.25809712373009025, "compression_ratio": + 1.7234848484848484, "no_speech_prob": 0.16515830159187317}, {"id": 126, "seek": + 52044, "start": 544.44, "end": 545.44, "text": " So that''s.", "tokens": [51564, + 407, 300, 311, 13, 51614], "temperature": 0.0, "avg_logprob": -0.25809712373009025, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.16515830159187317}, + {"id": 127, "seek": 52044, "start": 545.44, "end": 549.44, "text": " No, basically, + be my answer and how I, how I see that.", "tokens": [51614, 883, 11, 1936, 11, 312, + 452, 1867, 293, 577, 286, 11, 577, 286, 536, 300, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.25809712373009025, "compression_ratio": 1.7234848484848484, "no_speech_prob": + 0.16515830159187317}, {"id": 128, "seek": 54944, "start": 549.44, "end": 552.44, + "text": " That''s a great answer. 
So it''s like the way I thought about it.", "tokens": + [50364, 663, 311, 257, 869, 1867, 13, 407, 309, 311, 411, 264, 636, 286, 1194, 466, + 309, 13, 50514], "temperature": 0.0, "avg_logprob": -0.14869526920155582, "compression_ratio": + 1.6958174904942966, "no_speech_prob": 0.009087065234780312}, {"id": 129, "seek": + 54944, "start": 552.44, "end": 557.44, "text": " It''s like, um, you bring context + to your data, right?", "tokens": [50514, 467, 311, 411, 11, 1105, 11, 291, 1565, + 4319, 281, 428, 1412, 11, 558, 30, 50764], "temperature": 0.0, "avg_logprob": -0.14869526920155582, + "compression_ratio": 1.6958174904942966, "no_speech_prob": 0.009087065234780312}, + {"id": 130, "seek": 54944, "start": 557.44, "end": 566.44, "text": " If we stay + on the text side for the moment, you said apple and banana, you know, they are related + because they are both fruit, right?", "tokens": [50764, 759, 321, 1754, 322, 264, + 2487, 1252, 337, 264, 1623, 11, 291, 848, 10606, 293, 14194, 11, 291, 458, 11, 436, + 366, 4077, 570, 436, 366, 1293, 6773, 11, 558, 30, 51214], "temperature": 0.0, "avg_logprob": + -0.14869526920155582, "compression_ratio": 1.6958174904942966, "no_speech_prob": + 0.009087065234780312}, {"id": 131, "seek": 54944, "start": 566.44, "end": 577.44, + "text": " But there could be some other related items now data set, we just don''t + know about as long as we encoded them and with the right kind of distance metric, + we can figure it out how close they are.", "tokens": [51214, 583, 456, 727, 312, + 512, 661, 4077, 4754, 586, 1412, 992, 11, 321, 445, 500, 380, 458, 466, 382, 938, + 382, 321, 2058, 12340, 552, 293, 365, 264, 558, 733, 295, 4560, 20678, 11, 321, + 393, 2573, 309, 484, 577, 1998, 436, 366, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.14869526920155582, "compression_ratio": 1.6958174904942966, "no_speech_prob": + 0.009087065234780312}, {"id": 132, "seek": 57744, "start": 577.44, "end": 590.44, + "text": " So it sounds like coming back to 
your previous example where we have used, + let''s say, inverted index, you know, we would just store all our items in some + alphabetical order and hope for the best.", "tokens": [50364, 407, 309, 3263, 411, + 1348, 646, 281, 428, 3894, 1365, 689, 321, 362, 1143, 11, 718, 311, 584, 11, 38969, + 8186, 11, 291, 458, 11, 321, 576, 445, 3531, 439, 527, 4754, 294, 512, 23339, 804, + 1668, 293, 1454, 337, 264, 1151, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.13571473956108093, "compression_ratio": 1.4162162162162162, "no_speech_prob": + 0.0037030382081866264}, {"id": 133, "seek": 57744, "start": 590.44, "end": 596.44, + "text": " And that order, I think inherently didn''t have the context, right?", + "tokens": [51014, 400, 300, 1668, 11, 286, 519, 27993, 994, 380, 362, 264, 4319, + 11, 558, 30, 51314], "temperature": 0.0, "avg_logprob": -0.13571473956108093, "compression_ratio": + 1.4162162162162162, "no_speech_prob": 0.0037030382081866264}, {"id": 134, "seek": + 59644, "start": 596.44, "end": 610.44, "text": " The context was kind of in, it + was kind of represented in a different way, like in specifically in the case of + inverted index, you deal with addictionary of terms pointing to a posting list, + right?", "tokens": [50364, 440, 4319, 390, 733, 295, 294, 11, 309, 390, 733, 295, + 10379, 294, 257, 819, 636, 11, 411, 294, 4682, 294, 264, 1389, 295, 38969, 8186, + 11, 291, 2028, 365, 909, 4105, 822, 295, 2115, 12166, 281, 257, 15978, 1329, 11, + 558, 30, 51064], "temperature": 0.0, "avg_logprob": -0.23753495649857956, "compression_ratio": + 1.6883116883116882, "no_speech_prob": 0.1270771622657776}, {"id": 135, "seek": 59644, + "start": 610.44, "end": 614.44, "text": " Speaking in the scene, search engine, + bingo for the moment here.", "tokens": [51064, 13069, 294, 264, 4145, 11, 3164, + 2848, 11, 272, 18459, 337, 264, 1623, 510, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.23753495649857956, "compression_ratio": 1.6883116883116882, "no_speech_prob": 
+ 0.1270771622657776}, {"id": 136, "seek": 59644, "start": 614.44, "end": 619.44, + "text": " So, um, and that posting list is just an order, order list of document + IDs.", "tokens": [51264, 407, 11, 1105, 11, 293, 300, 15978, 1329, 307, 445, 364, + 1668, 11, 1668, 1329, 295, 4166, 48212, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.23753495649857956, "compression_ratio": 1.6883116883116882, "no_speech_prob": + 0.1270771622657776}, {"id": 137, "seek": 59644, "start": 619.44, "end": 623.44, + "text": " So you don''t have much context there either, right?", "tokens": [51514, + 407, 291, 500, 380, 362, 709, 4319, 456, 2139, 11, 558, 30, 51714], "temperature": + 0.0, "avg_logprob": -0.23753495649857956, "compression_ratio": 1.6883116883116882, + "no_speech_prob": 0.1270771622657776}, {"id": 138, "seek": 62344, "start": 623.44, + "end": 643.44, "text": " So, exactly, and that is, that is how it, how it brings + context and the, and again, going back to that, to that mental model or that, that + idea that you can have about it is that what I, what I said, I, so if you take the + building, where you have to grow, research in the building, the building would be + the database basically.", "tokens": [50364, 407, 11, 2293, 11, 293, 300, 307, 11, + 300, 307, 577, 309, 11, 577, 309, 5607, 4319, 293, 264, 11, 293, 797, 11, 516, 646, + 281, 300, 11, 281, 300, 4973, 2316, 420, 300, 11, 300, 1558, 300, 291, 393, 362, + 466, 309, 307, 300, 437, 286, 11, 437, 286, 848, 11, 286, 11, 370, 498, 291, 747, + 264, 2390, 11, 689, 291, 362, 281, 1852, 11, 2132, 294, 264, 2390, 11, 264, 2390, + 576, 312, 264, 8149, 1936, 13, 51364], "temperature": 0.0, "avg_logprob": -0.3201186770484561, + "compression_ratio": 1.7771739130434783, "no_speech_prob": 0.02788649871945381}, + {"id": 139, "seek": 64344, "start": 643.44, "end": 649.44, "text": " And the model + tells you where to put stuff in that building.", "tokens": [50364, 400, 264, 2316, + 5112, 291, 689, 281, 829, 1507, 294, 300, 
2390, 13, 50664], "temperature": 0.0, + "avg_logprob": -0.15664255924713918, "compression_ratio": 1.7005347593582887, "no_speech_prob": + 0.0040156166069209576}, {"id": 140, "seek": 64344, "start": 649.44, "end": 664.44, + "text": " So that''s how it''s giving that context and then the only thing that + we need to do, well, I make it sound very simple, but the thing that we need to + do in a database is, make it possible as easy as possible for the end user to navigate + through that building.", "tokens": [50664, 407, 300, 311, 577, 309, 311, 2902, 300, + 4319, 293, 550, 264, 787, 551, 300, 321, 643, 281, 360, 11, 731, 11, 286, 652, 309, + 1626, 588, 2199, 11, 457, 264, 551, 300, 321, 643, 281, 360, 294, 257, 8149, 307, + 11, 652, 309, 1944, 382, 1858, 382, 1944, 337, 264, 917, 4195, 281, 12350, 807, + 300, 2390, 13, 51414], "temperature": 0.0, "avg_logprob": -0.15664255924713918, + "compression_ratio": 1.7005347593582887, "no_speech_prob": 0.0040156166069209576}, + {"id": 141, "seek": 66444, "start": 664.44, "end": 679.44, "text": " And that is + basically what the factor database is doing. 
So it''s taking the data and we can + also talk a little bit more about the features that that we have in the waviate + because that''s also something that we, we don''t only store factors, we also store + data objects.", "tokens": [50364, 400, 300, 307, 1936, 437, 264, 5952, 8149, 307, + 884, 13, 407, 309, 311, 1940, 264, 1412, 293, 321, 393, 611, 751, 257, 707, 857, + 544, 466, 264, 4122, 300, 300, 321, 362, 294, 264, 261, 706, 13024, 570, 300, 311, + 611, 746, 300, 321, 11, 321, 500, 380, 787, 3531, 6771, 11, 321, 611, 3531, 1412, + 6565, 13, 51114], "temperature": 0.0, "avg_logprob": -0.21065492527459256, "compression_ratio": + 1.7155555555555555, "no_speech_prob": 0.0838177278637886}, {"id": 142, "seek": 66444, + "start": 679.44, "end": 687.44, "text": " But basically, if you bring a data object + to waviate, you tell it and take this part of that information to factorize.", "tokens": + [51114, 583, 1936, 11, 498, 291, 1565, 257, 1412, 2657, 281, 261, 706, 13024, 11, + 291, 980, 309, 293, 747, 341, 644, 295, 300, 1589, 281, 5952, 1125, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.21065492527459256, "compression_ratio": 1.7155555555555555, + "no_speech_prob": 0.0838177278637886}, {"id": 143, "seek": 68744, "start": 687.44, + "end": 701.44, "text": " So for example, if you have, well, a product, for example, + then you could say, well, I want to factorize the title and description that is + factorized and then the model tells waviate where in that database or in that factor + space to place that data object.", "tokens": [50364, 407, 337, 1365, 11, 498, 291, + 362, 11, 731, 11, 257, 1674, 11, 337, 1365, 11, 550, 291, 727, 584, 11, 731, 11, + 286, 528, 281, 5952, 1125, 264, 4876, 293, 3855, 300, 307, 5952, 1602, 293, 550, + 264, 2316, 5112, 261, 706, 13024, 689, 294, 300, 8149, 420, 294, 300, 5952, 1901, + 281, 1081, 300, 1412, 2657, 13, 51064], "temperature": 0.0, "avg_logprob": -0.13586607073793316, + "compression_ratio": 1.7933884297520661, 
"no_speech_prob": 0.025173673406243324}, + {"id": 144, "seek": 68744, "start": 701.44, "end": 713.44, "text": " And that is + what we tried to optimize for as much as we can so that you can search through to + hundreds of millions of data objects in using that model in just mere milliseconds.", + "tokens": [51064, 400, 300, 307, 437, 321, 3031, 281, 19719, 337, 382, 709, 382, + 321, 393, 370, 300, 291, 393, 3164, 807, 281, 6779, 295, 6803, 295, 1412, 6565, + 294, 1228, 300, 2316, 294, 445, 8401, 34184, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.13586607073793316, "compression_ratio": 1.7933884297520661, "no_speech_prob": + 0.025173673406243324}, {"id": 145, "seek": 71344, "start": 713.44, "end": 723.44, + "text": " Yeah, that''s fantastic. And I think before we move to the like kind of + what you are focusing on as a product, which is super exciting. And I mean, you''re + doing a ton of work.", "tokens": [50364, 865, 11, 300, 311, 5456, 13, 400, 286, + 519, 949, 321, 1286, 281, 264, 411, 733, 295, 437, 291, 366, 8416, 322, 382, 257, + 1674, 11, 597, 307, 1687, 4670, 13, 400, 286, 914, 11, 291, 434, 884, 257, 2952, + 295, 589, 13, 50864], "temperature": 0.0, "avg_logprob": -0.1426563546209052, "compression_ratio": + 1.6486486486486487, "no_speech_prob": 0.10206887871026993}, {"id": 146, "seek": + 71344, "start": 723.44, "end": 742.44, "text": " I just wanted to close off on that + on that line of thought that maybe, just maybe we are on the verge of closing the + inverted index data structure, because it existed since I think 15th century, like + the first book where they published the index space.", "tokens": [50864, 286, 445, + 1415, 281, 1998, 766, 322, 300, 322, 300, 1622, 295, 1194, 300, 1310, 11, 445, 1310, + 321, 366, 322, 264, 37164, 295, 10377, 264, 38969, 8186, 1412, 3877, 11, 570, 309, + 13135, 1670, 286, 519, 2119, 392, 4901, 11, 411, 264, 700, 1446, 689, 436, 6572, + 264, 8186, 1901, 13, 51814], "temperature": 0.0, "avg_logprob": 
-0.1426563546209052, + "compression_ratio": 1.6486486486486487, "no_speech_prob": 0.10206887871026993}, + {"id": 147, "seek": 74244, "start": 742.44, "end": 752.44, "text": " And it''s the + index page in the end, it''s an inverted index because it said, okay, this word + occurs on this page, that''s an inverted index, right.", "tokens": [50364, 400, + 309, 311, 264, 8186, 3028, 294, 264, 917, 11, 309, 311, 364, 38969, 8186, 570, 309, + 848, 11, 1392, 11, 341, 1349, 11843, 322, 341, 3028, 11, 300, 311, 364, 38969, 8186, + 11, 558, 13, 50864], "temperature": 0.0, "avg_logprob": -0.19104375558740952, "compression_ratio": + 1.5813953488372092, "no_speech_prob": 0.012077906168997288}, {"id": 148, "seek": + 74244, "start": 752.44, "end": 761.44, "text": " And so it existed for multiple + centuries. And so you think we are on the verge of replacing it with contextualized + embeddings.", "tokens": [50864, 400, 370, 309, 13135, 337, 3866, 13926, 13, 400, + 370, 291, 519, 321, 366, 322, 264, 37164, 295, 19139, 309, 365, 35526, 1602, 12240, + 29432, 13, 51314], "temperature": 0.0, "avg_logprob": -0.19104375558740952, "compression_ratio": + 1.5813953488372092, "no_speech_prob": 0.012077906168997288}, {"id": 149, "seek": + 76144, "start": 761.44, "end": 772.44, "text": " That is certainly that isn''t that + isn''t an exciting thought. 
I have to know that there are a few things from a from + a from a technical perspective, really inverted index is still being used.", "tokens": + [50364, 663, 307, 3297, 300, 1943, 380, 300, 1943, 380, 364, 4670, 1194, 13, 286, + 362, 281, 458, 300, 456, 366, 257, 1326, 721, 490, 257, 490, 257, 490, 257, 6191, + 4585, 11, 534, 38969, 8186, 307, 920, 885, 1143, 13, 50914], "temperature": 0.0, + "avg_logprob": -0.19043764613923572, "compression_ratio": 1.6555023923444976, "no_speech_prob": + 0.0914943739771843}, {"id": 150, "seek": 76144, "start": 772.44, "end": 780.44, + "text": " But one of the things that we''ve done, for example, we''ve hit is that + we said like we double down on the factor, you know, on the on the contextual search.", + "tokens": [50914, 583, 472, 295, 264, 721, 300, 321, 600, 1096, 11, 337, 1365, 11, + 321, 600, 2045, 307, 300, 321, 848, 411, 321, 3834, 760, 322, 264, 5952, 11, 291, + 458, 11, 322, 264, 322, 264, 35526, 3164, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.19043764613923572, "compression_ratio": 1.6555023923444976, "no_speech_prob": + 0.0914943739771843}, {"id": 151, "seek": 78044, "start": 780.44, "end": 802.44, + "text": " And yes, every now and then. 
So for example, if you say, show me, if you + have a product database and you say like show me products to for outdoor sporting, + for example, but they have to be more expensive than 10 bucks, then, you know, both + types of indexes kick in, but it definitely starts from the perspective of actor + search and I like your idea.", "tokens": [50364, 400, 2086, 11, 633, 586, 293, 550, + 13, 407, 337, 1365, 11, 498, 291, 584, 11, 855, 385, 11, 498, 291, 362, 257, 1674, + 8149, 293, 291, 584, 411, 855, 385, 3383, 281, 337, 15942, 32366, 11, 337, 1365, + 11, 457, 436, 362, 281, 312, 544, 5124, 813, 1266, 11829, 11, 550, 11, 291, 458, + 11, 1293, 3467, 295, 8186, 279, 4437, 294, 11, 457, 309, 2138, 3719, 490, 264, 4585, + 295, 8747, 3164, 293, 286, 411, 428, 1558, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.20068588710966564, "compression_ratio": 1.6308411214953271, "no_speech_prob": + 0.024972466751933098}, {"id": 152, "seek": 80244, "start": 802.44, "end": 811.44, + "text": " The amount of also research has been released that we of course also benefit + from is amazing. So I like that. I like that idea.", "tokens": [50364, 440, 2372, + 295, 611, 2132, 575, 668, 4736, 300, 321, 295, 1164, 611, 5121, 490, 307, 2243, + 13, 407, 286, 411, 300, 13, 286, 411, 300, 1558, 13, 50814], "temperature": 0.0, + "avg_logprob": -0.17827012803819445, "compression_ratio": 1.6631205673758864, "no_speech_prob": + 0.2819056212902069}, {"id": 153, "seek": 80244, "start": 811.44, "end": 823.44, + "text": " Yeah, so I think on the next lecture, I also a little bit like teach students + in the local university here. 
And when I explained some basic building blocks of + sort of, you know, classical search engine architecture.", "tokens": [50814, 865, + 11, 370, 286, 519, 322, 264, 958, 7991, 11, 286, 611, 257, 707, 857, 411, 2924, + 1731, 294, 264, 2654, 5454, 510, 13, 400, 562, 286, 8825, 512, 3875, 2390, 8474, + 295, 1333, 295, 11, 291, 458, 11, 13735, 3164, 2848, 9482, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.17827012803819445, "compression_ratio": 1.6631205673758864, + "no_speech_prob": 0.2819056212902069}, {"id": 154, "seek": 80244, "start": 823.44, + "end": 831.44, "text": " And I explained the inverted index. Then I ask, I puzzle + them with this question. Do you know how old this data structure is?", "tokens": + [51414, 400, 286, 8825, 264, 38969, 8186, 13, 1396, 286, 1029, 11, 286, 12805, 552, + 365, 341, 1168, 13, 1144, 291, 458, 577, 1331, 341, 1412, 3877, 307, 30, 51814], + "temperature": 0.0, "avg_logprob": -0.17827012803819445, "compression_ratio": 1.6631205673758864, + "no_speech_prob": 0.2819056212902069}, {"id": 155, "seek": 83144, "start": 831.44, + "end": 843.44, "text": " The students are actually from the linguistic department. + So they they are not as kind of, you know, IT people who care only about code, they + also care about the rest of life in many ways.", "tokens": [50364, 440, 1731, 366, + 767, 490, 264, 43002, 5882, 13, 407, 436, 436, 366, 406, 382, 733, 295, 11, 291, + 458, 11, 6783, 561, 567, 1127, 787, 466, 3089, 11, 436, 611, 1127, 466, 264, 1472, + 295, 993, 294, 867, 2098, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2190199567560564, + "compression_ratio": 1.6753731343283582, "no_speech_prob": 0.013299851678311825}, + {"id": 156, "seek": 83144, "start": 843.44, "end": 858.44, "text": " So no, I don''t + want to play the IT guys, but I''m just saying they are kind of very multi-dimensional, + you know, and they and they are just puzzled and they say, OK, maybe 18th century, + they don''t know. 
But then I just bring the screenshot of a really old book.", "tokens": + [50964, 407, 572, 11, 286, 500, 380, 528, 281, 862, 264, 6783, 1074, 11, 457, 286, + 478, 445, 1566, 436, 366, 733, 295, 588, 4825, 12, 18759, 11, 291, 458, 11, 293, + 436, 293, 436, 366, 445, 18741, 1493, 293, 436, 584, 11, 2264, 11, 1310, 2443, 392, + 4901, 11, 436, 500, 380, 458, 13, 583, 550, 286, 445, 1565, 264, 27712, 295, 257, + 534, 1331, 1446, 13, 51714], "temperature": 0.0, "avg_logprob": -0.2190199567560564, + "compression_ratio": 1.6753731343283582, "no_speech_prob": 0.013299851678311825}, + {"id": 157, "seek": 85844, "start": 858.44, "end": 868.44, "text": " 18th century + and they''re like really? So I just I sort of make that connection that hey, we + are still using the tech that was invented in 15th century.", "tokens": [50364, + 2443, 392, 4901, 293, 436, 434, 411, 534, 30, 407, 286, 445, 286, 1333, 295, 652, + 300, 4984, 300, 4177, 11, 321, 366, 920, 1228, 264, 7553, 300, 390, 14479, 294, + 2119, 392, 4901, 13, 50864], "temperature": 0.0, "avg_logprob": -0.20540832070743337, + "compression_ratio": 1.5395348837209302, "no_speech_prob": 0.07067399471998215}, + {"id": 158, "seek": 85844, "start": 868.44, "end": 880.44, "text": " Yeah, yeah, + yeah, nobody that''s I mean, I agree with you and that is that is extremely exciting. + And I think we''ll also get into that, but but you also see emerging is like these.", + "tokens": [50864, 865, 11, 1338, 11, 1338, 11, 5079, 300, 311, 286, 914, 11, 286, + 3986, 365, 291, 293, 300, 307, 300, 307, 4664, 4670, 13, 400, 286, 519, 321, 603, + 611, 483, 666, 300, 11, 457, 457, 291, 611, 536, 14989, 307, 411, 613, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.20540832070743337, "compression_ratio": 1.5395348837209302, + "no_speech_prob": 0.07067399471998215}, {"id": 159, "seek": 88044, "start": 880.44, + "end": 899.44, "text": " The use cases where those kinds of database and search + engines are do that are kind of solved. 
I mean, of course, it''s so fair, so it + can always be better and always be more, but those kind of are kind of solved. But + what we see actually with these fact search engines is that new use cases and new + options actually pop up.", "tokens": [50364, 440, 764, 3331, 689, 729, 3685, 295, + 8149, 293, 3164, 12982, 366, 360, 300, 366, 733, 295, 13041, 13, 286, 914, 11, 295, + 1164, 11, 309, 311, 370, 3143, 11, 370, 309, 393, 1009, 312, 1101, 293, 1009, 312, + 544, 11, 457, 729, 733, 295, 366, 733, 295, 13041, 13, 583, 437, 321, 536, 767, + 365, 613, 1186, 3164, 12982, 307, 300, 777, 764, 3331, 293, 777, 3956, 767, 1665, + 493, 13, 51314], "temperature": 0.0, "avg_logprob": -0.20245326649058948, "compression_ratio": + 1.7787610619469028, "no_speech_prob": 0.23349213600158691}, {"id": 160, "seek": + 88044, "start": 899.44, "end": 903.44, "text": " So we can do new things with it. + So, and I think that''s very exciting as well.", "tokens": [51314, 407, 321, 393, + 360, 777, 721, 365, 309, 13, 407, 11, 293, 286, 519, 300, 311, 588, 4670, 382, 731, + 13, 51514], "temperature": 0.0, "avg_logprob": -0.20245326649058948, "compression_ratio": + 1.7787610619469028, "no_speech_prob": 0.23349213600158691}, {"id": 161, "seek": + 90344, "start": 903.44, "end": 917.44, "text": " Yeah, absolutely. That''s an exciting + way to kind of approach this new emerging field is to look for use cases. 
And I + was really wondering like what is that that you are building in the company your + B IV 8 database engine.", "tokens": [50364, 865, 11, 3122, 13, 663, 311, 364, 4670, + 636, 281, 733, 295, 3109, 341, 777, 14989, 2519, 307, 281, 574, 337, 764, 3331, + 13, 400, 286, 390, 534, 6359, 411, 437, 307, 300, 300, 291, 366, 2390, 294, 264, + 2237, 428, 363, 15967, 1649, 8149, 2848, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.1627356679815995, "compression_ratio": 1.5793650793650793, "no_speech_prob": + 0.24730977416038513}, {"id": 162, "seek": 90344, "start": 917.44, "end": 928.44, + "text": " So you said that you have an you had an idea, you know, you started assembling + the team. Now you give vision, you drive a lot of things on the open source, you''re + super active.", "tokens": [51064, 407, 291, 848, 300, 291, 362, 364, 291, 632, 364, + 1558, 11, 291, 458, 11, 291, 1409, 43867, 264, 1469, 13, 823, 291, 976, 5201, 11, + 291, 3332, 257, 688, 295, 721, 322, 264, 1269, 4009, 11, 291, 434, 1687, 4967, 13, + 51614], "temperature": 0.0, "avg_logprob": -0.1627356679815995, "compression_ratio": + 1.5793650793650793, "no_speech_prob": 0.24730977416038513}, {"id": 163, "seek": + 92844, "start": 928.44, "end": 936.44, "text": " What is it that you are focusing + on, you know, for your users and maybe you can also go into use cases part.", "tokens": + [50364, 708, 307, 309, 300, 291, 366, 8416, 322, 11, 291, 458, 11, 337, 428, 5022, + 293, 1310, 291, 393, 611, 352, 666, 764, 3331, 644, 13, 50764], "temperature": 0.0, + "avg_logprob": -0.20297931588214377, "compression_ratio": 1.6238095238095238, "no_speech_prob": + 0.04310104250907898}, {"id": 164, "seek": 92844, "start": 936.44, "end": 949.44, + "text": " Sure. So it''s important to bear in mind that if you look at the solution + like we''ve had you can take two angles to look at it. So you can, as I like to + go, you can look at it, but himself. So that''s really true. 
The core technology.", + "tokens": [50764, 4894, 13, 407, 309, 311, 1021, 281, 6155, 294, 1575, 300, 498, + 291, 574, 412, 264, 3827, 411, 321, 600, 632, 291, 393, 747, 732, 14708, 281, 574, + 412, 309, 13, 407, 291, 393, 11, 382, 286, 411, 281, 352, 11, 291, 393, 574, 412, + 309, 11, 457, 3647, 13, 407, 300, 311, 534, 2074, 13, 440, 4965, 2899, 13, 51414], + "temperature": 0.0, "avg_logprob": -0.20297931588214377, "compression_ratio": 1.6238095238095238, + "no_speech_prob": 0.04310104250907898}, {"id": 165, "seek": 94944, "start": 949.44, + "end": 969.44, "text": " That''s how you can look at, but you can also look top + down. And so that is from the, from the use case perspective. And there are like + that are people in working on we''ve yet. And as you mentioned, it''s also it''s + open source that are working and talking about like this from the bottom self approach. + And I like to take a little bit more top down. So this like, so what are the things + that we can do with it. And.", "tokens": [50364, 663, 311, 577, 291, 393, 574, 412, + 11, 457, 291, 393, 611, 574, 1192, 760, 13, 400, 370, 300, 307, 490, 264, 11, 490, + 264, 764, 1389, 4585, 13, 400, 456, 366, 411, 300, 366, 561, 294, 1364, 322, 321, + 600, 1939, 13, 400, 382, 291, 2835, 11, 309, 311, 611, 309, 311, 1269, 4009, 300, + 366, 1364, 293, 1417, 466, 411, 341, 490, 264, 2767, 2698, 3109, 13, 400, 286, 411, + 281, 747, 257, 707, 857, 544, 1192, 760, 13, 407, 341, 411, 11, 370, 437, 366, 264, + 721, 300, 321, 393, 360, 365, 309, 13, 400, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.21746826171875, "compression_ratio": 1.8274336283185841, "no_speech_prob": 0.3155849874019623}, + {"id": 166, "seek": 96944, "start": 969.44, "end": 975.44, "text": " So let me explain + to you what so what we''re building. 
So at the core so you can see this like a.", + "tokens": [50364, 407, 718, 385, 2903, 281, 291, 437, 370, 437, 321, 434, 2390, + 13, 407, 412, 264, 4965, 370, 291, 393, 536, 341, 411, 257, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.2730829287797977, "compression_ratio": 1.6576086956521738, + "no_speech_prob": 0.019254447892308235}, {"id": 167, "seek": 96944, "start": 975.44, + "end": 980.44, "text": " They''re like tree layers basically. So the first layer + is the database itself.", "tokens": [50664, 814, 434, 411, 4230, 7914, 1936, 13, + 407, 264, 700, 4583, 307, 264, 8149, 2564, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.2730829287797977, "compression_ratio": 1.6576086956521738, "no_speech_prob": + 0.019254447892308235}, {"id": 168, "seek": 96944, "start": 980.44, "end": 989.44, + "text": " So you can find that database on on on get up, you can find it on the + documentation or website that is just called the weave yet.", "tokens": [50914, + 407, 291, 393, 915, 300, 8149, 322, 322, 322, 483, 493, 11, 291, 393, 915, 309, + 322, 264, 14333, 420, 3144, 300, 307, 445, 1219, 264, 29145, 1939, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.2730829287797977, "compression_ratio": 1.6576086956521738, + "no_speech_prob": 0.019254447892308235}, {"id": 169, "seek": 98944, "start": 989.44, + "end": 1000.44, "text": " And the fact that the fact that search in is the core + database. We see people use the database just to store data objects and their own + factory presentations.", "tokens": [50364, 400, 264, 1186, 300, 264, 1186, 300, + 3164, 294, 307, 264, 4965, 8149, 13, 492, 536, 561, 764, 264, 8149, 445, 281, 3531, + 1412, 6565, 293, 641, 1065, 9265, 18964, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.22667072665306828, "compression_ratio": 1.8059071729957805, "no_speech_prob": + 0.09771913290023804}, {"id": 170, "seek": 98944, "start": 1000.44, "end": 1017.44, + "text": " Now what is very important to know from a use case perspective. 
And I''m + now starting at the lowest level is that we thought that it was very important from + the get go to make sure that people can not only store the factors, but also the + data that they are representing.", "tokens": [50914, 823, 437, 307, 588, 1021, 281, + 458, 490, 257, 764, 1389, 4585, 13, 400, 286, 478, 586, 2891, 412, 264, 12437, 1496, + 307, 300, 321, 1194, 300, 309, 390, 588, 1021, 490, 264, 483, 352, 281, 652, 988, + 300, 561, 393, 406, 787, 3531, 264, 6771, 11, 457, 611, 264, 1412, 300, 436, 366, + 13460, 13, 51764], "temperature": 0.0, "avg_logprob": -0.22667072665306828, "compression_ratio": + 1.8059071729957805, "no_speech_prob": 0.09771913290023804}, {"id": 171, "seek": + 101744, "start": 1017.44, "end": 1025.44, "text": " So if you look at the data, + you can see that they are going to be a lot more effective to do an example of the + product, but it will be an article or what have you.", "tokens": [50364, 407, 498, + 291, 574, 412, 264, 1412, 11, 291, 393, 536, 300, 436, 366, 516, 281, 312, 257, + 688, 544, 4942, 281, 360, 364, 1365, 295, 264, 1674, 11, 457, 309, 486, 312, 364, + 7222, 420, 437, 362, 291, 13, 50764], "temperature": 0.2, "avg_logprob": -0.498880438848373, + "compression_ratio": 1.74609375, "no_speech_prob": 0.016633901745080948}, {"id": + 172, "seek": 101744, "start": 1025.44, "end": 1033.44, "text": " You can actually + store the product. So the price, the name, the description and those kinds of things. 
+ And it can say this product has this factor.", "tokens": [50764, 509, 393, 767, + 3531, 264, 1674, 13, 407, 264, 3218, 11, 264, 1315, 11, 264, 3855, 293, 729, 3685, + 295, 721, 13, 400, 309, 393, 584, 341, 1674, 575, 341, 5952, 13, 51164], "temperature": + 0.2, "avg_logprob": -0.498880438848373, "compression_ratio": 1.74609375, "no_speech_prob": + 0.016633901745080948}, {"id": 173, "seek": 101744, "start": 1033.44, "end": 1041.44, + "text": " And on top of that, we also said like we want to be able to connect these + data objects together in a more air quotes traditional graph.", "tokens": [51164, + 400, 322, 1192, 295, 300, 11, 321, 611, 848, 411, 321, 528, 281, 312, 1075, 281, + 1745, 613, 1412, 6565, 1214, 294, 257, 544, 1988, 19963, 5164, 4295, 13, 51564], + "temperature": 0.2, "avg_logprob": -0.498880438848373, "compression_ratio": 1.74609375, + "no_speech_prob": 0.016633901745080948}, {"id": 174, "seek": 104144, "start": 1041.44, + "end": 1050.44, "text": " We just not a graph database, but it has a graph data + model. Now, when we go into use case, I can I will share a few cool things that + I that you can do with that.", "tokens": [50364, 492, 445, 406, 257, 4295, 8149, + 11, 457, 309, 575, 257, 4295, 1412, 2316, 13, 823, 11, 562, 321, 352, 666, 764, + 1389, 11, 286, 393, 286, 486, 2073, 257, 1326, 1627, 721, 300, 286, 300, 291, 393, + 360, 365, 300, 13, 50814], "temperature": 0.0, "avg_logprob": -0.1920662307739258, + "compression_ratio": 1.735042735042735, "no_speech_prob": 0.07124494761228561}, + {"id": 175, "seek": 104144, "start": 1050.44, "end": 1065.44, "text": " But that''s + so that is at the core at the heart that is the database. And what does that database + focus on. 
It''s focused on being a database so that you can really have create right + update and delete functionality, which is easier set and done.", "tokens": [50814, + 583, 300, 311, 370, 300, 307, 412, 264, 4965, 412, 264, 1917, 300, 307, 264, 8149, + 13, 400, 437, 775, 300, 8149, 1879, 322, 13, 467, 311, 5178, 322, 885, 257, 8149, + 370, 300, 291, 393, 534, 362, 1884, 558, 5623, 293, 12097, 14980, 11, 597, 307, + 3571, 992, 293, 1096, 13, 51564], "temperature": 0.0, "avg_logprob": -0.1920662307739258, + "compression_ratio": 1.735042735042735, "no_speech_prob": 0.07124494761228561}, + {"id": 176, "seek": 106544, "start": 1065.44, "end": 1072.44, "text": " And there''s + a lot of content also that my my colleague agenda talks about online if you really + want to get into the mid degree.", "tokens": [50364, 400, 456, 311, 257, 688, 295, + 2701, 611, 300, 452, 452, 13532, 9829, 6686, 466, 2950, 498, 291, 534, 528, 281, + 483, 666, 264, 2062, 4314, 13, 50714], "temperature": 0.0, "avg_logprob": -0.38052363511992665, + "compression_ratio": 1.5377358490566038, "no_speech_prob": 0.06891190260648727}, + {"id": 177, "seek": 106544, "start": 1072.44, "end": 1079.44, "text": " So, but + it''s the database that you''re used to you use, we''ve eaten a similar fashion. 
+ So you take the container to spin it up.", "tokens": [50714, 407, 11, 457, 309, + 311, 264, 8149, 300, 291, 434, 1143, 281, 291, 764, 11, 321, 600, 12158, 257, 2531, + 6700, 13, 407, 291, 747, 264, 10129, 281, 6060, 309, 493, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.38052363511992665, "compression_ratio": 1.5377358490566038, + "no_speech_prob": 0.06891190260648727}, {"id": 178, "seek": 106544, "start": 1079.44, + "end": 1083.44, "text": " APIs become available or the rest of the APIs of the crop + channel APIs.", "tokens": [51064, 21445, 1813, 2435, 420, 264, 1472, 295, 264, 21445, + 295, 264, 9086, 2269, 21445, 13, 51264], "temperature": 0.0, "avg_logprob": -0.38052363511992665, + "compression_ratio": 1.5377358490566038, "no_speech_prob": 0.06891190260648727}, + {"id": 179, "seek": 108344, "start": 1083.44, "end": 1088.44, "text": " Client available, + Python, go job, what have you.", "tokens": [50364, 2033, 1196, 2435, 11, 15329, + 11, 352, 1691, 11, 437, 362, 291, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.256954288482666, "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.035427410155534744}, + {"id": 180, "seek": 108344, "start": 1088.44, "end": 1097.44, "text": " That you + can connect to the database. So if you''re used to working with a database or a + search engine, it''s the same function there. That is one that sits at the core.", + "tokens": [50614, 663, 291, 393, 1745, 281, 264, 8149, 13, 407, 498, 291, 434, 1143, + 281, 1364, 365, 257, 8149, 420, 257, 3164, 2848, 11, 309, 311, 264, 912, 2445, 456, + 13, 663, 307, 472, 300, 12696, 412, 264, 4965, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.256954288482666, "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.035427410155534744}, + {"id": 181, "seek": 108344, "start": 1097.44, "end": 1103.44, "text": " Then around + that we have our first layer or a second layer. 
And that are those are modules.", + "tokens": [51064, 1396, 926, 300, 321, 362, 527, 700, 4583, 420, 257, 1150, 4583, + 13, 400, 300, 366, 729, 366, 16679, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.256954288482666, "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.035427410155534744}, + {"id": 182, "seek": 110344, "start": 1103.44, "end": 1116.44, "text": " And what + these modules do they do a few things. So we''ve seen like, hey, there are actually + certain types of, for example, machinery models that people keep using over and + over to get these vector representations. Why not bundle them.", "tokens": [50364, + 400, 437, 613, 16679, 360, 436, 360, 257, 1326, 721, 13, 407, 321, 600, 1612, 411, + 11, 4177, 11, 456, 366, 767, 1629, 3467, 295, 11, 337, 1365, 11, 27302, 5245, 300, + 561, 1066, 1228, 670, 293, 670, 281, 483, 613, 8062, 33358, 13, 1545, 406, 24438, + 552, 13, 51014], "temperature": 0.0, "avg_logprob": -0.17209521543632433, "compression_ratio": + 1.7222222222222223, "no_speech_prob": 0.01635173335671425}, {"id": 183, "seek": + 110344, "start": 1116.44, "end": 1130.44, "text": " So think about the text to fact + models that we have. So we have different types for different use cases where you + can say, well, I''m going to throw in that product, but automatically take a model + to create a vector representation.", "tokens": [51014, 407, 519, 466, 264, 2487, + 281, 1186, 5245, 300, 321, 362, 13, 407, 321, 362, 819, 3467, 337, 819, 764, 3331, + 689, 291, 393, 584, 11, 731, 11, 286, 478, 516, 281, 3507, 294, 300, 1674, 11, 457, + 6772, 747, 257, 2316, 281, 1884, 257, 8062, 10290, 13, 51714], "temperature": 0.0, + "avg_logprob": -0.17209521543632433, "compression_ratio": 1.7222222222222223, "no_speech_prob": + 0.01635173335671425}, {"id": 184, "seek": 113044, "start": 1130.44, "end": 1142.44, + "text": " So we have a question answering models spell check models. You can create + your own models. 
Sorry, I''m saying models and then modules. Sorry, this is a little + bit.", "tokens": [50364, 407, 321, 362, 257, 1168, 13430, 5245, 9827, 1520, 5245, + 13, 509, 393, 1884, 428, 1065, 5245, 13, 4919, 11, 286, 478, 1566, 5245, 293, 550, + 16679, 13, 4919, 11, 341, 307, 257, 707, 857, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.3167739232381185, "compression_ratio": 1.5933333333333333, "no_speech_prob": + 0.03884444758296013}, {"id": 185, "seek": 113044, "start": 1142.44, "end": 1148.44, + "text": " And it''s so models and modules. So I meant modules. And those are available.", + "tokens": [50964, 400, 309, 311, 370, 5245, 293, 16679, 13, 407, 286, 4140, 16679, + 13, 400, 729, 366, 2435, 13, 51264], "temperature": 0.0, "avg_logprob": -0.3167739232381185, + "compression_ratio": 1.5933333333333333, "no_speech_prob": 0.03884444758296013}, + {"id": 186, "seek": 114844, "start": 1148.44, "end": 1156.44, "text": " And open + source as well. And my colleague Laura made a great video also on like how you can + build your own your own modules.", "tokens": [50364, 400, 1269, 4009, 382, 731, + 13, 400, 452, 13532, 13220, 1027, 257, 869, 960, 611, 322, 411, 577, 291, 393, 1322, + 428, 1065, 428, 1065, 16679, 13, 50764], "temperature": 0.0, "avg_logprob": -0.2405545711517334, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.14247308671474457}, + {"id": 187, "seek": 114844, "start": 1156.44, "end": 1168.44, "text": " And then + we have like a another layer around that. And then we go a little bit outside of + the realm of the software per se itself. 
And those are more in the package use cases.", + "tokens": [50764, 400, 550, 321, 362, 411, 257, 1071, 4583, 926, 300, 13, 400, 550, + 321, 352, 257, 707, 857, 2380, 295, 264, 15355, 295, 264, 4722, 680, 369, 2564, + 13, 400, 729, 366, 544, 294, 264, 7372, 764, 3331, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.2405545711517334, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.14247308671474457}, {"id": 188, "seek": 116844, "start": 1168.44, "end": 1178.44, + "text": " We see that there''s a lot of value in in retail wholesale e-commerce + in the medical space.", "tokens": [50364, 492, 536, 300, 456, 311, 257, 688, 295, + 2158, 294, 294, 10800, 43982, 308, 12, 26926, 294, 264, 4625, 1901, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.23175764083862305, "compression_ratio": 1.3823529411764706, + "no_speech_prob": 0.03516121953725815}, {"id": 189, "seek": 116844, "start": 1178.44, + "end": 1183.44, "text": " Data management space, those kind of spaces and what we''re + doing and that''s mostly also my focus.", "tokens": [50864, 11888, 4592, 1901, 11, + 729, 733, 295, 7673, 293, 437, 321, 434, 884, 293, 300, 311, 5240, 611, 452, 1879, + 13, 51114], "temperature": 0.0, "avg_logprob": -0.23175764083862305, "compression_ratio": + 1.3823529411764706, "no_speech_prob": 0.03516121953725815}, {"id": 190, "seek": + 118344, "start": 1183.44, "end": 1197.44, "text": " And if we have at the core, + we have this one singular database. What are these package things that we can do + around it? 
And that is also where we''re, you know, where we make a distinction + between our users and our customers.", "tokens": [50364, 400, 498, 321, 362, 412, + 264, 4965, 11, 321, 362, 341, 472, 20010, 8149, 13, 708, 366, 613, 7372, 721, 300, + 321, 393, 360, 926, 309, 30, 400, 300, 307, 611, 689, 321, 434, 11, 291, 458, 11, + 689, 321, 652, 257, 16844, 1296, 527, 5022, 293, 527, 4581, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.2787527251012117, "compression_ratio": 1.6356877323420074, + "no_speech_prob": 0.2780300974845886}, {"id": 191, "seek": 118344, "start": 1197.44, + "end": 1212.44, "text": " So our customers are mostly interested in these packages. + Right. I can say, OK, I have a PRP classification problem. Oh, great. You can actually + do that with V8 with that specifically for companies in your industry.", "tokens": + [51064, 407, 527, 4581, 366, 5240, 3102, 294, 613, 17401, 13, 1779, 13, 286, 393, + 584, 11, 2264, 11, 286, 362, 257, 11568, 47, 21538, 1154, 13, 876, 11, 869, 13, + 509, 393, 767, 360, 300, 365, 691, 23, 365, 300, 4682, 337, 3431, 294, 428, 3518, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.2787527251012117, "compression_ratio": + 1.6356877323420074, "no_speech_prob": 0.2780300974845886}, {"id": 192, "seek": 121244, + "start": 1212.44, "end": 1220.44, "text": " You can also be for document search + for medical use case image use cases, image similarities.", "tokens": [50364, 509, + 393, 611, 312, 337, 4166, 3164, 337, 4625, 764, 1389, 3256, 764, 3331, 11, 3256, + 24197, 13, 50764], "temperature": 0.0, "avg_logprob": -0.2510874668757121, "compression_ratio": + 1.6910569105691058, "no_speech_prob": 0.0035822431091219187}, {"id": 193, "seek": + 121244, "start": 1220.44, "end": 1228.44, "text": " And so we package them together. + So that would be the last there. 
Sometimes there are software and folks, for example, + in the form of plugins and those kind of things.", "tokens": [50764, 400, 370, 321, + 7372, 552, 1214, 13, 407, 300, 576, 312, 264, 1036, 456, 13, 4803, 456, 366, 4722, + 293, 4024, 11, 337, 1365, 11, 294, 264, 1254, 295, 33759, 293, 729, 733, 295, 721, + 13, 51164], "temperature": 0.0, "avg_logprob": -0.2510874668757121, "compression_ratio": + 1.6910569105691058, "no_speech_prob": 0.0035822431091219187}, {"id": 194, "seek": + 121244, "start": 1228.44, "end": 1238.44, "text": " But that is the that is the + outer layer. So that is what we''ve had looks like. And that''s what you''re constantly + building because as you mentioned before.", "tokens": [51164, 583, 300, 307, 264, + 300, 307, 264, 10847, 4583, 13, 407, 300, 307, 437, 321, 600, 632, 1542, 411, 13, + 400, 300, 311, 437, 291, 434, 6460, 2390, 570, 382, 291, 2835, 949, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.2510874668757121, "compression_ratio": 1.6910569105691058, + "no_speech_prob": 0.0035822431091219187}, {"id": 195, "seek": 123844, "start": 1238.44, + "end": 1257.44, "text": " The fact that it''s kind of a new thing right to actually + deal with that. And that a question might be like, so what''s the, so what''s actually + the new thing right. 
So the other day somebody asked me said like, well, it was + a data scientist is like, well, but if I have my factors, I can just store them + like in my memory and do some similarities or something.", "tokens": [50364, 440, + 1186, 300, 309, 311, 733, 295, 257, 777, 551, 558, 281, 767, 2028, 365, 300, 13, + 400, 300, 257, 1168, 1062, 312, 411, 11, 370, 437, 311, 264, 11, 370, 437, 311, + 767, 264, 777, 551, 558, 13, 407, 264, 661, 786, 2618, 2351, 385, 848, 411, 11, + 731, 11, 309, 390, 257, 1412, 12662, 307, 411, 11, 731, 11, 457, 498, 286, 362, + 452, 6771, 11, 286, 393, 445, 3531, 552, 411, 294, 452, 4675, 293, 360, 512, 24197, + 420, 746, 13, 51314], "temperature": 0.0, "avg_logprob": -0.3063915859569203, "compression_ratio": + 1.6966824644549763, "no_speech_prob": 0.12385619431734085}, {"id": 196, "seek": + 125744, "start": 1257.44, "end": 1271.44, "text": " And we can absolutely do that + is it like now. But what if you want to do that for a product catalog that might + have like 50 K products that are constantly changing and those kind of so then that + becomes problematic.", "tokens": [50364, 400, 321, 393, 3122, 360, 300, 307, 309, + 411, 586, 13, 583, 437, 498, 291, 528, 281, 360, 300, 337, 257, 1674, 19746, 300, + 1062, 362, 411, 2625, 591, 3383, 300, 366, 6460, 4473, 293, 729, 733, 295, 370, + 550, 300, 3643, 19011, 13, 51064], "temperature": 0.0, "avg_logprob": -0.14475071165296766, + "compression_ratio": 1.7231404958677685, "no_speech_prob": 0.5286182165145874}, + {"id": 197, "seek": 125744, "start": 1271.44, "end": 1283.44, "text": " So we actually + help you to bring these models to production and what you actually see is that the + new use cases that come out of that are tremendously big and we''re just constantly + uncovering new ones.", "tokens": [51064, 407, 321, 767, 854, 291, 281, 1565, 613, + 5245, 281, 4265, 293, 437, 291, 767, 536, 307, 300, 264, 777, 764, 3331, 300, 808, + 484, 295, 300, 366, 27985, 955, 293, 321, 434, 445, 6460, 21694, 
278, 777, 2306, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.14475071165296766, "compression_ratio": + 1.7231404958677685, "no_speech_prob": 0.5286182165145874}, {"id": 198, "seek": 128344, + "start": 1283.44, "end": 1297.44, "text": " Let me give you one example. So the + let''s stay with the ecommerce example. So if I have and we''ve had a data object + that has a product and a product has effective representation that it got from a, + for example, transformer model.", "tokens": [50364, 961, 385, 976, 291, 472, 1365, + 13, 407, 264, 718, 311, 1754, 365, 264, 308, 26926, 1365, 13, 407, 498, 286, 362, + 293, 321, 600, 632, 257, 1412, 2657, 300, 575, 257, 1674, 293, 257, 1674, 575, 4942, + 10290, 300, 309, 658, 490, 257, 11, 337, 1365, 11, 31782, 2316, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.2136343800744345, "compression_ratio": 1.6587677725118484, + "no_speech_prob": 0.0407446064054966}, {"id": 199, "seek": 128344, "start": 1297.44, + "end": 1306.44, "text": " Then we can also say we get what I have a card. I just + a shopping cart. 
Now if people add products to the shopping cart.", "tokens": [51064, + 1396, 321, 393, 611, 584, 321, 483, 437, 286, 362, 257, 2920, 13, 286, 445, 257, + 8688, 5467, 13, 823, 498, 561, 909, 3383, 281, 264, 8688, 5467, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.2136343800744345, "compression_ratio": 1.6587677725118484, + "no_speech_prob": 0.0407446064054966}, {"id": 200, "seek": 130644, "start": 1306.44, + "end": 1311.44, "text": " We can real time calculate new vector representations + based on what people have these cards.", "tokens": [50364, 492, 393, 957, 565, 8873, + 777, 8062, 33358, 2361, 322, 437, 561, 362, 613, 5632, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.11621410927076019, "compression_ratio": 1.7545454545454546, + "no_speech_prob": 0.10231925547122955}, {"id": 201, "seek": 130644, "start": 1311.44, + "end": 1318.44, "text": " So now we can say, hey, based on what you have in your + cards, you might be interested in this or that product as well.", "tokens": [50614, + 407, 586, 321, 393, 584, 11, 4177, 11, 2361, 322, 437, 291, 362, 294, 428, 5632, + 11, 291, 1062, 312, 3102, 294, 341, 420, 300, 1674, 382, 731, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.11621410927076019, "compression_ratio": 1.7545454545454546, + "no_speech_prob": 0.10231925547122955}, {"id": 202, "seek": 130644, "start": 1318.44, + "end": 1329.44, "text": " So now you have these real time now all of a sudden it + it changed from a search engine where you can find products into a recommendation + engine for ecommerce just all in one.", "tokens": [50964, 407, 586, 291, 362, 613, + 957, 565, 586, 439, 295, 257, 3990, 309, 309, 3105, 490, 257, 3164, 2848, 689, 291, + 393, 915, 3383, 666, 257, 11879, 2848, 337, 308, 26926, 445, 439, 294, 472, 13, + 51514], "temperature": 0.0, "avg_logprob": -0.11621410927076019, "compression_ratio": + 1.7545454545454546, "no_speech_prob": 0.10231925547122955}, {"id": 203, "seek": + 132944, "start": 1329.44, "end": 1340.44, "text": 
" And those kind of things for + constantly uncovering and there''s so much more that we that we can do from from + very concrete things like ecommerce to on all the other ends of the spectrum.", + "tokens": [50364, 400, 729, 733, 295, 721, 337, 6460, 21694, 278, 293, 456, 311, + 370, 709, 544, 300, 321, 300, 321, 393, 360, 490, 490, 588, 9859, 721, 411, 308, + 26926, 281, 322, 439, 264, 661, 5314, 295, 264, 11143, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.1722391860125816, "compression_ratio": 1.711340206185567, + "no_speech_prob": 0.08899862319231033}, {"id": 204, "seek": 132944, "start": 1340.44, + "end": 1349.44, "text": " Things like with effects representation that people are + calculating for like a genomes and those kind of things. So that is just that keeps + just.", "tokens": [50914, 9514, 411, 365, 5065, 10290, 300, 561, 366, 28258, 337, + 411, 257, 1049, 18168, 293, 729, 733, 295, 721, 13, 407, 300, 307, 445, 300, 5965, + 445, 13, 51364], "temperature": 0.0, "avg_logprob": -0.1722391860125816, "compression_ratio": + 1.711340206185567, "no_speech_prob": 0.08899862319231033}, {"id": 205, "seek": 134944, + "start": 1349.44, "end": 1355.44, "text": " And the use case keep turning up, you + know, almost on a daily basis.", "tokens": [50364, 400, 264, 764, 1389, 1066, 6246, + 493, 11, 291, 458, 11, 1920, 322, 257, 5212, 5143, 13, 50664], "temperature": 0.0, + "avg_logprob": -0.20690717061360678, "compression_ratio": 1.5384615384615385, "no_speech_prob": + 0.21303260326385498}, {"id": 206, "seek": 134944, "start": 1355.44, "end": 1363.44, + "text": " Yeah, yeah, that''s I want to that that''s so great like a dive in I was + kind of a little bit I wanted to unpack a little bit like things.", "tokens": [50664, + 865, 11, 1338, 11, 300, 311, 286, 528, 281, 300, 300, 311, 370, 869, 411, 257, 9192, + 294, 286, 390, 733, 295, 257, 707, 857, 286, 1415, 281, 26699, 257, 707, 857, 411, + 721, 13, 51064], "temperature": 0.0, "avg_logprob": 
-0.20690717061360678, "compression_ratio": + 1.5384615384615385, "no_speech_prob": 0.21303260326385498}, {"id": 207, "seek": + 134944, "start": 1363.44, "end": 1368.44, "text": " So I understand them well enough + and maybe our listeners will too as well.", "tokens": [51064, 407, 286, 1223, 552, + 731, 1547, 293, 1310, 527, 23274, 486, 886, 382, 731, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.20690717061360678, "compression_ratio": 1.5384615384615385, + "no_speech_prob": 0.21303260326385498}, {"id": 208, "seek": 136844, "start": 1368.44, + "end": 1380.44, "text": " So you know, like when you said models and modules, you + know, let''s say I''m a researcher, I have a model in letting model right that I''ve + been using and the battle testing.", "tokens": [50364, 407, 291, 458, 11, 411, 562, + 291, 848, 5245, 293, 16679, 11, 291, 458, 11, 718, 311, 584, 286, 478, 257, 21751, + 11, 286, 362, 257, 2316, 294, 8295, 2316, 558, 300, 286, 600, 668, 1228, 293, 264, + 4635, 4997, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2259809779024672, + "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.07171843945980072}, + {"id": 209, "seek": 136844, "start": 1380.44, "end": 1391.44, "text": " Now, if + I want to introduce that model into the aviate, I will have to create a module which + is using this model is that right is that.", "tokens": [50964, 823, 11, 498, 286, + 528, 281, 5366, 300, 2316, 666, 264, 1305, 13024, 11, 286, 486, 362, 281, 1884, + 257, 10088, 597, 307, 1228, 341, 2316, 307, 300, 558, 307, 300, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.2259809779024672, "compression_ratio": 1.7083333333333333, + "no_speech_prob": 0.07171843945980072}, {"id": 210, "seek": 136844, "start": 1391.44, + "end": 1393.44, "text": " Okay, that''s great.", "tokens": [51514, 1033, 11, 300, + 311, 869, 13, 51614], "temperature": 0.0, "avg_logprob": -0.2259809779024672, "compression_ratio": + 1.7083333333333333, "no_speech_prob": 0.07171843945980072}, {"id": 
211, "seek": + 139344, "start": 1393.44, "end": 1399.44, "text": " And I need to extend some API + right that you provide.", "tokens": [50364, 400, 286, 643, 281, 10101, 512, 9362, + 558, 300, 291, 2893, 13, 50664], "temperature": 0.0, "avg_logprob": -0.18335829109981142, + "compression_ratio": 1.558139534883721, "no_speech_prob": 0.014425094239413738}, + {"id": 212, "seek": 139344, "start": 1399.44, "end": 1411.44, "text": " Yes, and + that is something that we spent a lot of time on and that''s the API design because + the I am a strong believer in developer UX.", "tokens": [50664, 1079, 11, 293, 300, + 307, 746, 300, 321, 4418, 257, 688, 295, 565, 322, 293, 300, 311, 264, 9362, 1715, + 570, 264, 286, 669, 257, 2068, 23892, 294, 10754, 40176, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.18335829109981142, "compression_ratio": 1.558139534883721, + "no_speech_prob": 0.014425094239413738}, {"id": 213, "seek": 139344, "start": 1411.44, + "end": 1420.44, "text": " So it needs to be as clean and as easy as possible. 
So + one of the things, for example, that we''ve done is that we''ve adopted graph QLS + interface.", "tokens": [51264, 407, 309, 2203, 281, 312, 382, 2541, 293, 382, 1858, + 382, 1944, 13, 407, 472, 295, 264, 721, 11, 337, 1365, 11, 300, 321, 600, 1096, + 307, 300, 321, 600, 12175, 4295, 1249, 19198, 9226, 13, 51714], "temperature": 0.0, + "avg_logprob": -0.18335829109981142, "compression_ratio": 1.558139534883721, "no_speech_prob": + 0.014425094239413738}, {"id": 214, "seek": 142044, "start": 1420.44, "end": 1430.44, + "text": " So sometimes people ask is it like well, why graph QL and not something + more expressive like spark or something like that, which is a good question.", "tokens": + [50364, 407, 2171, 561, 1029, 307, 309, 411, 731, 11, 983, 4295, 1249, 43, 293, + 406, 746, 544, 40189, 411, 9908, 420, 746, 411, 300, 11, 597, 307, 257, 665, 1168, + 13, 50864], "temperature": 0.0, "avg_logprob": -0.23101471465768167, "compression_ratio": + 1.793774319066148, "no_speech_prob": 0.003542538033798337}, {"id": 215, "seek": + 142044, "start": 1430.44, "end": 1438.44, "text": " One of the things that we know + is like well, if we focus on being an effective database and we just want to show + these data objects with effect to be presentations.", "tokens": [50864, 1485, 295, + 264, 721, 300, 321, 458, 307, 411, 731, 11, 498, 321, 1879, 322, 885, 364, 4942, + 8149, 293, 321, 445, 528, 281, 855, 613, 1412, 6565, 365, 1802, 281, 312, 18964, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.23101471465768167, "compression_ratio": + 1.793774319066148, "no_speech_prob": 0.003542538033798337}, {"id": 216, "seek": + 142044, "start": 1438.44, "end": 1446.44, "text": " And you know, sometimes it''s + possible we have these as these graph relations connections in them, but we''re + not focusing on being a graph database.", "tokens": [51264, 400, 291, 458, 11, 2171, + 309, 311, 1944, 321, 362, 613, 382, 613, 4295, 2299, 9271, 294, 552, 11, 457, 321, + 434, 406, 8416, 322, 
885, 257, 4295, 8149, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.23101471465768167, "compression_ratio": 1.793774319066148, "no_speech_prob": + 0.003542538033798337}, {"id": 217, "seek": 144644, "start": 1446.44, "end": 1453.44, + "text": " I think actually graph QL does a job and it''s easy for people to understand + it''s very intuitive for people to understand.", "tokens": [50364, 286, 519, 767, + 4295, 1249, 43, 775, 257, 1691, 293, 309, 311, 1858, 337, 561, 281, 1223, 309, 311, + 588, 21769, 337, 561, 281, 1223, 13, 50714], "temperature": 0.0, "avg_logprob": + -0.17538372944977323, "compression_ratio": 1.7752808988764044, "no_speech_prob": + 0.02529383823275566}, {"id": 218, "seek": 144644, "start": 1453.44, "end": 1460.44, + "text": " And I think that these kind of things are very important and so to get + back to your to your point that your question.", "tokens": [50714, 400, 286, 519, + 300, 613, 733, 295, 721, 366, 588, 1021, 293, 370, 281, 483, 646, 281, 428, 281, + 428, 935, 300, 428, 1168, 13, 51064], "temperature": 0.0, "avg_logprob": -0.17538372944977323, + "compression_ratio": 1.7752808988764044, "no_speech_prob": 0.02529383823275566}, + {"id": 219, "seek": 144644, "start": 1460.44, "end": 1467.44, "text": " So what + we try to do is make it as easy as possible to actually bring these if you have + your own models to production.", "tokens": [51064, 407, 437, 321, 853, 281, 360, + 307, 652, 309, 382, 1858, 382, 1944, 281, 767, 1565, 613, 498, 291, 362, 428, 1065, + 5245, 281, 4265, 13, 51414], "temperature": 0.0, "avg_logprob": -0.17538372944977323, + "compression_ratio": 1.7752808988764044, "no_speech_prob": 0.02529383823275566}, + {"id": 220, "seek": 144644, "start": 1467.44, "end": 1473.44, "text": " And it''s + like well, I don''t have any models, but I just want to do you know, semantics search + through I don''t know.", "tokens": [51414, 400, 309, 311, 411, 731, 11, 286, 500, + 380, 362, 604, 5245, 11, 457, 286, 445, 528, 281, 360, 291, 
458, 11, 4361, 45298, + 3164, 807, 286, 500, 380, 458, 13, 51714], "temperature": 0.0, "avg_logprob": -0.17538372944977323, + "compression_ratio": 1.7752808988764044, "no_speech_prob": 0.02529383823275566}, + {"id": 221, "seek": 147344, "start": 1473.44, "end": 1482.44, "text": " So if you''re + a person who is interested in the data science, you know, you know, you know, just + make something of the shelf should it in 3D API and you could develop.", "tokens": + [50364, 407, 498, 291, 434, 257, 954, 567, 307, 3102, 294, 264, 1412, 3497, 11, + 291, 458, 11, 291, 458, 11, 291, 458, 11, 445, 652, 746, 295, 264, 15222, 820, 309, + 294, 805, 35, 9362, 293, 291, 727, 1499, 13, 50814], "temperature": 0.4, "avg_logprob": + -0.6508289895406584, "compression_ratio": 1.570754716981132, "no_speech_prob": 0.16641034185886383}, + {"id": 222, "seek": 147344, "start": 1482.44, "end": 1493.44, "text": " So let''s + say and also this is kind of like which I think is very important in today''s world, + even though a lot of machine learning and data science happens in Python.", "tokens": + [50814, 407, 718, 311, 584, 293, 611, 341, 307, 733, 295, 411, 597, 286, 519, 307, + 588, 1021, 294, 965, 311, 1002, 11, 754, 1673, 257, 688, 295, 3479, 2539, 293, 1412, + 3497, 2314, 294, 15329, 13, 51364], "temperature": 0.4, "avg_logprob": -0.6508289895406584, + "compression_ratio": 1.570754716981132, "no_speech_prob": 0.16641034185886383}, + {"id": 223, "seek": 149344, "start": 1493.44, "end": 1505.44, "text": " So you go + let''s say to web scale sometimes you cannot use Python anymore like you need to + use let''s say go right or maybe see bindings and go and things like that.", "tokens": + [50364, 407, 291, 352, 718, 311, 584, 281, 3670, 4373, 2171, 291, 2644, 764, 15329, + 3602, 411, 291, 643, 281, 764, 718, 311, 584, 352, 558, 420, 1310, 536, 14786, 1109, + 293, 352, 293, 721, 411, 300, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2374725780267825, + "compression_ratio": 
1.6280193236714975, "no_speech_prob": 0.5318252444267273}, + {"id": 224, "seek": 149344, "start": 1505.44, "end": 1520.44, "text": " So your + API is it kind of cross-lingual meaning I have model in Python or maybe in go can + I kind of plug it in or do I need to rewrite some layer on top of it to be company.", + "tokens": [50964, 407, 428, 9362, 307, 309, 733, 295, 3278, 12, 1688, 901, 3620, + 286, 362, 2316, 294, 15329, 420, 1310, 294, 352, 393, 286, 733, 295, 5452, 309, + 294, 420, 360, 286, 643, 281, 28132, 512, 4583, 322, 1192, 295, 309, 281, 312, 2237, + 13, 51714], "temperature": 0.0, "avg_logprob": -0.2374725780267825, "compression_ratio": + 1.6280193236714975, "no_speech_prob": 0.5318252444267273}, {"id": 225, "seek": 152044, + "start": 1521.44, "end": 1539.44, "text": " That is a great question and here comes + the especially the expertise of the development team in so the what they have done + is that is like well, we know that that''s it that that center right that database + that just needs to be optimized as far as possible because.", "tokens": [50414, + 663, 307, 257, 869, 1168, 293, 510, 1487, 264, 2318, 264, 11769, 295, 264, 3250, + 1469, 294, 370, 264, 437, 436, 362, 1096, 307, 300, 307, 411, 731, 11, 321, 458, + 300, 300, 311, 309, 300, 300, 3056, 558, 300, 8149, 300, 445, 2203, 281, 312, 26941, + 382, 1400, 382, 1944, 570, 13, 51314], "temperature": 0.0, "avg_logprob": -0.21740741060491195, + "compression_ratio": 1.5529411764705883, "no_speech_prob": 0.03504478931427002}, + {"id": 226, "seek": 153944, "start": 1539.44, "end": 1556.44, "text": " I''m you + know let''s stick with the ecommerce example if you use it in production and hundreds + of people are searching and you want to give these recommendations you need to be + able to scale it so you need to choose a language and an and an and an architecture + that actually supports that so in our case that''s that''s go.", "tokens": [50364, + 286, 478, 291, 458, 718, 311, 2897, 365, 264, 308, 
26926, 1365, 498, 291, 764, 309, + 294, 4265, 293, 6779, 295, 561, 366, 10808, 293, 291, 528, 281, 976, 613, 10434, + 291, 643, 281, 312, 1075, 281, 4373, 309, 370, 291, 643, 281, 2826, 257, 2856, 293, + 364, 293, 364, 293, 364, 9482, 300, 767, 9346, 300, 370, 294, 527, 1389, 300, 311, + 300, 311, 352, 13, 51214], "temperature": 0.0, "avg_logprob": -0.24600630746760838, + "compression_ratio": 1.687830687830688, "no_speech_prob": 0.20347857475280762}, + {"id": 227, "seek": 155644, "start": 1556.44, "end": 1571.44, "text": " But even + if you if you look to the to the GitHub repository you will even find the assembly + optimizations for certain things in there but we also knew that we said like well + maybe if you want to use model that''s written in.", "tokens": [50364, 583, 754, + 498, 291, 498, 291, 574, 281, 264, 281, 264, 23331, 25841, 291, 486, 754, 915, 264, + 12103, 5028, 14455, 337, 1629, 721, 294, 456, 457, 321, 611, 2586, 300, 321, 848, + 411, 731, 1310, 498, 291, 528, 281, 764, 2316, 300, 311, 3720, 294, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.17484464832380706, "compression_ratio": 1.5067567567567568, + "no_speech_prob": 0.14116960763931274}, {"id": 228, "seek": 157144, "start": 1572.44, + "end": 1599.44, "text": " For example and that has bindings in Python for example + or you like to work in in Python so one of the things that we did there so it''s + like well the way that the modules work is that the modules are containerized so + there are APIs going between V8 and the different modules and as long as you adhere + to these APIs you can choose with any whatever language you want to build to build + a module.", "tokens": [50414, 1171, 1365, 293, 300, 575, 14786, 1109, 294, 15329, + 337, 1365, 420, 291, 411, 281, 589, 294, 294, 15329, 370, 472, 295, 264, 721, 300, + 321, 630, 456, 370, 309, 311, 411, 731, 264, 636, 300, 264, 16679, 589, 307, 300, + 264, 16679, 366, 10129, 1602, 370, 456, 366, 21445, 516, 1296, 691, 23, 293, 264, + 819, 
16679, 293, 382, 938, 382, 291, 33584, 281, 613, 21445, 291, 393, 2826, 365, + 604, 2035, 2856, 291, 528, 281, 1322, 281, 1322, 257, 10088, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.2237170427695088, "compression_ratio": 1.7945205479452055, + "no_speech_prob": 0.1733960658311844}, {"id": 229, "seek": 159944, "start": 1599.44, + "end": 1613.44, "text": " So in the case of V8 V8 itself is completely written and + go and as like even with the assembly optimization those kind of things and the + modules we have a few modules that are for example written in Python because we + use.", "tokens": [50364, 407, 294, 264, 1389, 295, 691, 23, 691, 23, 2564, 307, + 2584, 3720, 293, 352, 293, 382, 411, 754, 365, 264, 12103, 19618, 729, 733, 295, + 721, 293, 264, 16679, 321, 362, 257, 1326, 16679, 300, 366, 337, 1365, 3720, 294, + 15329, 570, 321, 764, 13, 51064], "temperature": 0.0, "avg_logprob": -0.2259008770897275, + "compression_ratio": 1.6550218340611353, "no_speech_prob": 0.0024508782662451267}, + {"id": 230, "seek": 159944, "start": 1613.44, "end": 1624.44, "text": " A specific + types of transformer modules that just you know run well within within Python so + you can do whatever you want within it when it comes to using V8.", "tokens": [51064, + 316, 2685, 3467, 295, 31782, 16679, 300, 445, 291, 458, 1190, 731, 1951, 1951, 15329, + 370, 291, 393, 360, 2035, 291, 528, 1951, 309, 562, 309, 1487, 281, 1228, 691, 23, + 13, 51614], "temperature": 0.0, "avg_logprob": -0.2259008770897275, "compression_ratio": + 1.6550218340611353, "no_speech_prob": 0.0024508782662451267}, {"id": 231, "seek": + 162444, "start": 1624.44, "end": 1649.44, "text": " So you have the database running + and you can pick a client for example the Python client and have the Python client + interact with V8 wherever it sits but if you''re building a front end application + people use for example the JavaScript clients we have I''ve seen people build react + applications with the JavaScript 
clients so that''s why we structure it like that + that it''s easy to use in production.", "tokens": [50364, 407, 291, 362, 264, 8149, + 2614, 293, 291, 393, 1888, 257, 6423, 337, 1365, 264, 15329, 6423, 293, 362, 264, + 15329, 6423, 4648, 365, 691, 23, 8660, 309, 12696, 457, 498, 291, 434, 2390, 257, + 1868, 917, 3861, 561, 764, 337, 1365, 264, 15778, 6982, 321, 362, 286, 600, 1612, + 561, 1322, 4515, 5821, 365, 264, 15778, 6982, 370, 300, 311, 983, 321, 3877, 309, + 411, 300, 300, 309, 311, 1858, 281, 764, 294, 4265, 13, 51614], "temperature": 0.0, + "avg_logprob": -0.1620959997177124, "compression_ratio": 1.8294930875576036, "no_speech_prob": + 0.014952750876545906}, {"id": 232, "seek": 164944, "start": 1649.44, "end": 1678.44, + "text": " Yeah that''s amazing and you know like what you touched on it''s so important + and I mean close to my heart as well I''ve been building APIs kind of in my free + time for a long time and you know like what I''ve noticed with the users is that + the lower you put the boundary to kind of enter right like meaning let''s say you + have an API and you have published the sample clients to use this API on all possible + languages let''s say kind of the main street.", "tokens": [50364, 865, 300, 311, + 2243, 293, 291, 458, 411, 437, 291, 9828, 322, 309, 311, 370, 1021, 293, 286, 914, + 1998, 281, 452, 1917, 382, 731, 286, 600, 668, 2390, 21445, 733, 295, 294, 452, + 1737, 565, 337, 257, 938, 565, 293, 291, 458, 411, 437, 286, 600, 5694, 365, 264, + 5022, 307, 300, 264, 3126, 291, 829, 264, 12866, 281, 733, 295, 3242, 558, 411, + 3620, 718, 311, 584, 291, 362, 364, 9362, 293, 291, 362, 6572, 264, 6889, 6982, + 281, 764, 341, 9362, 322, 439, 1944, 8650, 718, 311, 584, 733, 295, 264, 2135, 4838, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.10351850490758915, "compression_ratio": + 1.7738095238095237, "no_speech_prob": 0.12112387269735336}, {"id": 233, "seek": + 167844, "start": 1678.44, "end": 1693.44, "text": " Kind of the main 
stream once + at least you know it will lower the threshold to enter for them so that they will + never even contact you they will start using it right that''s that''s the win that''s + the win right.", "tokens": [50364, 9242, 295, 264, 2135, 4309, 1564, 412, 1935, + 291, 458, 309, 486, 3126, 264, 14678, 281, 3242, 337, 552, 370, 300, 436, 486, 1128, + 754, 3385, 291, 436, 486, 722, 1228, 309, 558, 300, 311, 300, 311, 264, 1942, 300, + 311, 264, 1942, 558, 13, 51114], "temperature": 0.0, "avg_logprob": -0.17621721540178573, + "compression_ratio": 1.6124031007751938, "no_speech_prob": 0.012404090724885464}, + {"id": 234, "seek": 169344, "start": 1693.44, "end": 1714.44, "text": " And I even + believe the so to sidestep a little bit from the discussion but it''s my it''s interesting + to talk about this because the I am a strong believer and I''m I''m I''m shooting + my horn for like years already about this that is it like the overlap between the + tech and the business science is in my opinion expressed in the API layer.", "tokens": + [50364, 400, 286, 754, 1697, 264, 370, 281, 20822, 377, 595, 257, 707, 857, 490, + 264, 5017, 457, 309, 311, 452, 309, 311, 1880, 281, 751, 466, 341, 570, 264, 286, + 669, 257, 2068, 23892, 293, 286, 478, 286, 478, 286, 478, 5942, 452, 13482, 337, + 411, 924, 1217, 466, 341, 300, 307, 309, 411, 264, 19959, 1296, 264, 7553, 293, + 264, 1606, 3497, 307, 294, 452, 4800, 12675, 294, 264, 9362, 4583, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.17476645382967862, "compression_ratio": 1.6183574879227054, + "no_speech_prob": 0.3173954486846924}, {"id": 235, "seek": 171444, "start": 1714.44, + "end": 1740.44, "text": " So if you feel the onion of a tech business you can go + as deep as go unless you have a graphical user interface but if you talk about database + technology those kind of things even bigger platforms if you look at the API the + API describes to you in human language what is that what it''s exposing and therefore + what 
the value is that it''s creating and the only thing that you need to do is + business is it''s right to capture that value.", "tokens": [50364, 407, 498, 291, + 841, 264, 10916, 295, 257, 7553, 1606, 291, 393, 352, 382, 2452, 382, 352, 5969, + 291, 362, 257, 35942, 4195, 9226, 457, 498, 291, 751, 466, 8149, 2899, 729, 733, + 295, 721, 754, 3801, 9473, 498, 291, 574, 412, 264, 9362, 264, 9362, 15626, 281, + 291, 294, 1952, 2856, 437, 307, 300, 437, 309, 311, 33178, 293, 4412, 437, 264, + 2158, 307, 300, 309, 311, 4084, 293, 264, 787, 551, 300, 291, 643, 281, 360, 307, + 1606, 307, 309, 311, 558, 281, 7983, 300, 2158, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.1861043950562836, "compression_ratio": 1.7745901639344261, "no_speech_prob": + 0.06396190077066422}, {"id": 236, "seek": 174044, "start": 1740.44, "end": 1763.44, + "text": " Yeah, exactly and it''s like I think the the sort of the same behind the + API platform I was using back then they were saying like software is eating the + world but APIs are eating software so they like decomposing the software that was + sitting on the shelves somewhere in those big companies small companies whatever.", + "tokens": [50364, 865, 11, 2293, 293, 309, 311, 411, 286, 519, 264, 264, 1333, 295, + 264, 912, 2261, 264, 9362, 3663, 286, 390, 1228, 646, 550, 436, 645, 1566, 411, + 4722, 307, 3936, 264, 1002, 457, 21445, 366, 3936, 4722, 370, 436, 411, 22867, 6110, + 264, 4722, 300, 390, 3798, 322, 264, 24349, 4079, 294, 729, 955, 3431, 1359, 3431, + 2035, 13, 51514], "temperature": 0.0, "avg_logprob": -0.16526305675506592, "compression_ratio": + 1.6844919786096257, "no_speech_prob": 0.027160661295056343}, {"id": 237, "seek": + 176344, "start": 1763.44, "end": 1782.44, "text": " And now it''s introducing the + network right so like everyone wants to expose their value through an API and you + can easily consume that value through an API right and then you add all this you + know payment you know layers and what not to 
actually make it economically feasible.", + "tokens": [50364, 400, 586, 309, 311, 15424, 264, 3209, 558, 370, 411, 1518, 2738, + 281, 19219, 641, 2158, 807, 364, 9362, 293, 291, 393, 3612, 14732, 300, 2158, 807, + 364, 9362, 558, 293, 550, 291, 909, 439, 341, 291, 458, 10224, 291, 458, 7914, 293, + 437, 406, 281, 767, 652, 309, 26811, 26648, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.11941264356885638, "compression_ratio": 1.5953757225433527, "no_speech_prob": + 0.020946044474840164}, {"id": 238, "seek": 178244, "start": 1782.44, "end": 1806.44, + "text": " So that''s that''s I think that''s an exciting direction and I''m happy + to hear that you guys are pursuing that model like basically making it an API in + many ways database should be an API right sitting somewhere and I can connect to + it with my client of you know in the language of my choice and you know handle all + the cases I need to handle.", "tokens": [50364, 407, 300, 311, 300, 311, 286, 519, + 300, 311, 364, 4670, 3513, 293, 286, 478, 2055, 281, 1568, 300, 291, 1074, 366, + 20222, 300, 2316, 411, 1936, 1455, 309, 364, 9362, 294, 867, 2098, 8149, 820, 312, + 364, 9362, 558, 3798, 4079, 293, 286, 393, 1745, 281, 309, 365, 452, 6423, 295, + 291, 458, 294, 264, 2856, 295, 452, 3922, 293, 291, 458, 4813, 439, 264, 3331, 286, + 643, 281, 4813, 13, 51564], "temperature": 0.0, "avg_logprob": -0.09418901644254986, + "compression_ratio": 1.6346153846153846, "no_speech_prob": 0.033035021275281906}, + {"id": 239, "seek": 180644, "start": 1806.44, "end": 1822.44, "text": " Exactly + and I think so if you look at a nice car for example right there are two ways that + you can look at the car so the bottom up way if you compare it''s all right the + bottom up way is that the first thing that you do you open the hood and you you + you know you look at the beauty of the engine and maybe you want to know how the + engine works on the hood.", "tokens": [50364, 7587, 293, 286, 519, 370, 498, 291, + 574, 412, 
257, 1481, 1032, 337, 1365, 558, 456, 366, 732, 2098, 300, 291, 393, 574, + 412, 264, 1032, 370, 264, 2767, 493, 636, 498, 291, 6794, 309, 311, 439, 558, 264, + 2767, 493, 636, 307, 300, 264, 700, 551, 300, 291, 360, 291, 1269, 264, 13376, 293, + 291, 291, 291, 458, 291, 574, 412, 264, 6643, 295, 264, 2848, 293, 1310, 291, 528, + 281, 458, 577, 264, 2848, 1985, 322, 264, 13376, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.15906146554385914, "compression_ratio": 1.8835978835978835, "no_speech_prob": + 0.07339145988225937}, {"id": 240, "seek": 182244, "start": 1822.44, "end": 1848.44, + "text": " The top down you''re looking at it is just opening the door sitting in + the chair holding the interface of the car steering wheel and you''re like oh this + this car drives you know it drives fantastic it drives amazing amazing and my argument + person and I''m not everybody agrees but that''s my point of view is that I said + like if you have an amazing engine but you know she is steering wheel nobody''s + going to drive your car.", "tokens": [50364, 440, 1192, 760, 291, 434, 1237, 412, + 309, 307, 445, 5193, 264, 2853, 3798, 294, 264, 6090, 5061, 264, 9226, 295, 264, + 1032, 14823, 5589, 293, 291, 434, 411, 1954, 341, 341, 1032, 11754, 291, 458, 309, + 11754, 5456, 309, 11754, 2243, 2243, 293, 452, 6770, 954, 293, 286, 478, 406, 2201, + 26383, 457, 300, 311, 452, 935, 295, 1910, 307, 300, 286, 848, 411, 498, 291, 362, + 364, 2243, 2848, 457, 291, 458, 750, 307, 14823, 5589, 5079, 311, 516, 281, 3332, + 428, 1032, 13, 51664], "temperature": 0.0, "avg_logprob": -0.2529309590657552, "compression_ratio": + 1.8146551724137931, "no_speech_prob": 0.3252430558204651}, {"id": 241, "seek": 184844, + "start": 1848.44, "end": 1875.44, "text": " I mean the other way around also true + that if you have a beautiful interface in your car and you have a shitty engine + that also doesn''t work but that needs to play well together and that''s again why + I''m strongly for in that in 
that in the in the UX the experience that you have + in using the technology because of course an experience is not limited to a graphical + use in space that can also sit in a in an in an API of course.", "tokens": [50364, + 286, 914, 264, 661, 636, 926, 611, 2074, 300, 498, 291, 362, 257, 2238, 9226, 294, + 428, 1032, 293, 291, 362, 257, 30748, 2848, 300, 611, 1177, 380, 589, 457, 300, + 2203, 281, 862, 731, 1214, 293, 300, 311, 797, 983, 286, 478, 10613, 337, 294, 300, + 294, 300, 294, 264, 294, 264, 40176, 264, 1752, 300, 291, 362, 294, 1228, 264, 2899, + 570, 295, 1164, 364, 1752, 307, 406, 5567, 281, 257, 35942, 764, 294, 1901, 300, + 393, 611, 1394, 294, 257, 294, 364, 294, 364, 9362, 295, 1164, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.1725081995913857, "compression_ratio": 1.7603305785123966, + "no_speech_prob": 0.0029241242446005344}, {"id": 242, "seek": 187544, "start": 1875.8400000000001, + "end": 1876.8400000000001, "text": " Yeah absolutely.", "tokens": [50384, 865, 3122, + 13, 50434], "temperature": 0.0, "avg_logprob": -0.1750178913493733, "compression_ratio": + 1.7130434782608697, "no_speech_prob": 0.02999264746904373}, {"id": 243, "seek": + 187544, "start": 1878.44, "end": 1905.24, "text": " And you know like coming back + to some of the use cases you brought up I mean you mentioned the shopping cart I + was actually chatting to Eric pew hey Eric if you''re listening to this from open + source connections you know he was saying like out of his you know it''s kind of + out of blue sky he was saying hey so let''s imagine a use case it''s a bit of delivery + and and you want to.", "tokens": [50514, 400, 291, 458, 411, 1348, 646, 281, 512, + 295, 264, 764, 3331, 291, 3038, 493, 286, 914, 291, 2835, 264, 8688, 5467, 286, + 390, 767, 24654, 281, 9336, 25889, 4177, 9336, 498, 291, 434, 4764, 281, 341, 490, + 1269, 4009, 9271, 291, 458, 415, 390, 1566, 411, 484, 295, 702, 291, 458, 309, 311, + 733, 295, 484, 295, 3344, 5443, 415, 390, 1566, 4177, 
370, 718, 311, 3811, 257, + 764, 1389, 309, 311, 257, 857, 295, 8982, 293, 293, 291, 528, 281, 13, 51854], "temperature": + 0.0, "avg_logprob": -0.1750178913493733, "compression_ratio": 1.7130434782608697, + "no_speech_prob": 0.02999264746904373}, {"id": 244, "seek": 190544, "start": 1905.44, + "end": 1931.8400000000001, "text": " Encode the journey you know of your delivery + person with no left turns right so like only right most or like right turns or you + know forward backward but no left turns just for whatever reason and this was like + very interesting use case like can I actually express that in the form of embedding + probably I do I can''t right so and then do some kind of.", "tokens": [50364, 29584, + 1429, 264, 4671, 291, 458, 295, 428, 8982, 954, 365, 572, 1411, 4523, 558, 370, + 411, 787, 558, 881, 420, 411, 558, 4523, 420, 291, 458, 2128, 23897, 457, 572, 1411, + 4523, 445, 337, 2035, 1778, 293, 341, 390, 411, 588, 1880, 764, 1389, 411, 393, + 286, 767, 5109, 300, 294, 264, 1254, 295, 12240, 3584, 1391, 286, 360, 286, 393, + 380, 558, 370, 293, 550, 360, 512, 733, 295, 13, 51684], "temperature": 0.0, "avg_logprob": + -0.16893866187647769, "compression_ratio": 1.6875, "no_speech_prob": 0.002184092765673995}, + {"id": 245, "seek": 193184, "start": 1931.84, "end": 1953.24, "text": " Geographical + search and like say okay what''s the most similar sort of journey that will bring + the speed from A to B sounds a little bit crazy but you know like that''s what i''m + thinking when a use case exists the sort of the journey is to go backward from it + to to the embedding space right and that''s that''s very interesting yeah.", "tokens": + [50364, 2876, 48434, 3164, 293, 411, 584, 1392, 437, 311, 264, 881, 2531, 1333, + 295, 4671, 300, 486, 1565, 264, 3073, 490, 316, 281, 363, 3263, 257, 707, 857, 3219, + 457, 291, 458, 411, 300, 311, 437, 741, 478, 1953, 562, 257, 764, 1389, 8198, 264, + 1333, 295, 264, 4671, 307, 281, 352, 23897, 490, 309, 281, 281, 264, 
12240, 3584, + 1901, 558, 293, 300, 311, 300, 311, 588, 1880, 1338, 13, 51434], "temperature": + 0.0, "avg_logprob": -0.2017428207397461, "compression_ratio": 1.5789473684210527, + "no_speech_prob": 0.044917333871126175}, {"id": 246, "seek": 195324, "start": 1953.96, + "end": 1978.92, "text": " Yeah but so let''s build a little bit further on the so + I like the example right so so but let''s let it''s an it''s an it''s a good example + but it''s an abstract example so we might even make it a little bit more concrete + right so let''s say you have a pizza delivery surface and from the moment somebody + orders pizza you have certain data right so you have data about like what''s on + the pizza it''s coming from or the person living etc etc.", "tokens": [50400, 865, + 457, 370, 718, 311, 1322, 257, 707, 857, 3052, 322, 264, 370, 286, 411, 264, 1365, + 558, 370, 370, 457, 718, 311, 718, 309, 311, 364, 309, 311, 364, 309, 311, 257, + 665, 1365, 457, 309, 311, 364, 12649, 1365, 370, 321, 1062, 754, 652, 309, 257, + 707, 857, 544, 9859, 558, 370, 718, 311, 584, 291, 362, 257, 8298, 8982, 3753, 293, + 490, 264, 1623, 2618, 9470, 8298, 291, 362, 1629, 1412, 558, 370, 291, 362, 1412, + 466, 411, 437, 311, 322, 264, 8298, 309, 311, 1348, 490, 420, 264, 954, 2647, 5183, + 5183, 13, 51648], "temperature": 0.0, "avg_logprob": -0.20077918779731976, "compression_ratio": + 1.9035087719298245, "no_speech_prob": 0.11443346738815308}, {"id": 247, "seek": + 197892, "start": 1978.92, "end": 2008.68, "text": " There are two things that you + can do right you can and go that information in the factory presentation and if + you are fast enough in comparing factors with old orders you can say something about + that order and that it comes interesting so what you know can do is you can say + for example you might be able to say something about delivery times so you can say + okay I we''ve sold that say that you''re like a big pizza chain you know we''ll + sold like a million dollars.", 
"tokens": [50412, 821, 366, 732, 721, 300, 291, 393, + 360, 558, 291, 393, 293, 352, 300, 1589, 294, 264, 9265, 5860, 293, 498, 291, 366, + 2370, 1547, 294, 15763, 6771, 365, 1331, 9470, 291, 393, 584, 746, 466, 300, 1668, + 293, 300, 309, 1487, 1880, 370, 437, 291, 458, 393, 360, 307, 291, 393, 584, 337, + 1365, 291, 1062, 312, 1075, 281, 584, 746, 466, 8982, 1413, 370, 291, 393, 584, + 1392, 286, 321, 600, 3718, 300, 584, 300, 291, 434, 411, 257, 955, 8298, 5021, 291, + 458, 321, 603, 3718, 411, 257, 2459, 3808, 13, 51852], "temperature": 0.0, "avg_logprob": + -0.26838186598315683, "compression_ratio": 1.9053497942386832, "no_speech_prob": + 0.012885377742350101}, {"id": 248, "seek": 200892, "start": 2008.92, "end": 2038.8400000000001, + "text": " Pizza''s in the past now based on this request we real time calculate + the fact pre-presentations for this order we do a real time comparison for orders + in the past and I said like hey we see that the average of the last 10 orders that + are similar was like 18 minutes so now real time you can say something about that + so this is just an example of use case that where these factor databases might be + extremely valuable and that''s just", "tokens": [50364, 24469, 311, 294, 264, 1791, + 586, 2361, 322, 341, 5308, 321, 957, 565, 8873, 264, 1186, 659, 12, 79, 11662, 763, + 337, 341, 1668, 321, 360, 257, 957, 565, 9660, 337, 9470, 294, 264, 1791, 293, 286, + 848, 411, 4177, 321, 536, 300, 264, 4274, 295, 264, 1036, 1266, 9470, 300, 366, + 2531, 390, 411, 2443, 2077, 370, 586, 957, 565, 291, 393, 584, 746, 466, 300, 370, + 341, 307, 445, 364, 1365, 295, 764, 1389, 300, 689, 613, 5952, 22380, 1062, 312, + 4664, 8263, 293, 300, 311, 445, 51860], "temperature": 0.0, "avg_logprob": -0.22680512718532397, + "compression_ratio": 1.7211155378486056, "no_speech_prob": 0.006326334085315466}, + {"id": 249, "seek": 203892, "start": 2038.92, "end": 2049.64, "text": " A example + and that just more and more of these kinds of cases are 
popping up so that''s that''s + extremely exciting in my opinion.", "tokens": [50364, 316, 1365, 293, 300, 445, + 544, 293, 544, 295, 613, 3685, 295, 3331, 366, 18374, 493, 370, 300, 311, 300, 311, + 4664, 4670, 294, 452, 4800, 13, 50900], "temperature": 0.0, "avg_logprob": -0.1620473743956766, + "compression_ratio": 1.592274678111588, "no_speech_prob": 0.0017187005141749978}, + {"id": 250, "seek": 203892, "start": 2049.64, "end": 2067.7200000000003, "text": + " Yeah I mean it sounds so interesting because like you know there are so many products + that still revolve around the idea of let''s say kind of for simplistic terms inverted + index like I''m using that 15th century model to represent my data right", "tokens": + [50900, 865, 286, 914, 309, 3263, 370, 1880, 570, 411, 291, 458, 456, 366, 370, + 867, 3383, 300, 920, 16908, 303, 926, 264, 1558, 295, 718, 311, 584, 733, 295, 337, + 44199, 2115, 38969, 8186, 411, 286, 478, 1228, 300, 2119, 392, 4901, 2316, 281, + 2906, 452, 1412, 558, 51804], "temperature": 0.0, "avg_logprob": -0.1620473743956766, + "compression_ratio": 1.592274678111588, "no_speech_prob": 0.0017187005141749978}, + {"id": 251, "seek": 206772, "start": 2067.72, "end": 2078.2799999999997, "text": + " then if I have images I''m like oops what should I do now okay maybe I can use + some extension on top of Lucine if I''m using Lucine let''s say for the sake of + it right", "tokens": [50364, 550, 498, 286, 362, 5267, 286, 478, 411, 34166, 437, + 820, 286, 360, 586, 1392, 1310, 286, 393, 764, 512, 10320, 322, 1192, 295, 9593, + 533, 498, 286, 478, 1228, 9593, 533, 718, 311, 584, 337, 264, 9717, 295, 309, 558, + 50892], "temperature": 0.0, "avg_logprob": -0.1577076446719286, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.032959673553705215}, {"id": 252, "seek": + 206772, "start": 2078.2799999999997, "end": 2089.24, "text": " but it''s still kind + of a limiting experience it''s kind of like okay I''m I''m I''m I''m solving my + my task 
with the wrong tool right in many ways", "tokens": [50892, 457, 309, 311, + 920, 733, 295, 257, 22083, 1752, 309, 311, 733, 295, 411, 1392, 286, 478, 286, 478, + 286, 478, 286, 478, 12606, 452, 452, 5633, 365, 264, 2085, 2290, 558, 294, 867, + 2098, 51440], "temperature": 0.0, "avg_logprob": -0.1577076446719286, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.032959673553705215}, {"id": 253, "seek": + 208924, "start": 2089.3199999999997, "end": 2099.7999999999997, "text": " and maybe + I should not be like let''s say Lucine since I mentioned it I can kind of continue + using it for the spa search for the for the normal inverted index retrieval", "tokens": + [50368, 293, 1310, 286, 820, 406, 312, 411, 718, 311, 584, 9593, 533, 1670, 286, + 2835, 309, 286, 393, 733, 295, 2354, 1228, 309, 337, 264, 32543, 3164, 337, 264, + 337, 264, 2710, 38969, 8186, 19817, 3337, 50892], "temperature": 0.0, "avg_logprob": + -0.16088639668055943, "compression_ratio": 1.6141304347826086, "no_speech_prob": + 0.0020280599128454924}, {"id": 254, "seek": 208924, "start": 2099.7999999999997, + "end": 2111.4799999999996, "text": " but I can unlock so many new use cases with + the vector search right and like find kind of new ways to show the value to the + users", "tokens": [50892, 457, 286, 393, 11634, 370, 867, 777, 764, 3331, 365, 264, + 8062, 3164, 558, 293, 411, 915, 733, 295, 777, 2098, 281, 855, 264, 2158, 281, 264, + 5022, 51476], "temperature": 0.0, "avg_logprob": -0.16088639668055943, "compression_ratio": + 1.6141304347826086, "no_speech_prob": 0.0020280599128454924}, {"id": 255, "seek": + 211148, "start": 2111.56, "end": 2125.0, "text": " because especially you know like + in traditional search engines if you return let''s say 100 results you know users + don''t have time sometimes to go through them right you basically offload a lot + of do you feel that way or", "tokens": [50368, 570, 2318, 291, 458, 411, 294, 5164, + 3164, 12982, 498, 291, 2736, 718, 311, 584, 
2319, 3542, 291, 458, 5022, 500, 380, + 362, 565, 2171, 281, 352, 807, 552, 558, 291, 1936, 766, 2907, 257, 688, 295, 360, + 291, 841, 300, 636, 420, 51040], "temperature": 0.0, "avg_logprob": -0.13281125907438346, + "compression_ratio": 1.6415929203539823, "no_speech_prob": 0.014823603443801403}, + {"id": 256, "seek": 211148, "start": 2125.0, "end": 2135.4, "text": " yeah certainly + and there''s a lot in there are a lot of interesting things in which you''re saying + so I mean so so again it depends on how you look at it", "tokens": [51040, 1338, + 3297, 293, 456, 311, 257, 688, 294, 456, 366, 257, 688, 295, 1880, 721, 294, 597, + 291, 434, 1566, 370, 286, 914, 370, 370, 797, 309, 5946, 322, 577, 291, 574, 412, + 309, 51560], "temperature": 0.0, "avg_logprob": -0.13281125907438346, "compression_ratio": + 1.6415929203539823, "no_speech_prob": 0.014823603443801403}, {"id": 257, "seek": + 213540, "start": 2135.4, "end": 2145.56, "text": " bottoms up or top down so so + you can make an argument from the from the bottoms up approach that you say okay + we have to infer the index or we have to", "tokens": [50364, 43413, 493, 420, 1192, + 760, 370, 370, 291, 393, 652, 364, 6770, 490, 264, 490, 264, 43413, 493, 3109, 300, + 291, 584, 1392, 321, 362, 281, 13596, 264, 8186, 420, 321, 362, 281, 50872], "temperature": + 0.0, "avg_logprob": -0.19414031505584717, "compression_ratio": 2.0043290043290045, + "no_speech_prob": 0.0022386072669178247}, {"id": 258, "seek": 213540, "start": 2145.56, + "end": 2154.84, "text": " de facto index and we can do certain things to this but + but I also like to do is that I like to look at it from from a top down perspective + and what I mean with it is this so", "tokens": [50872, 368, 42225, 8186, 293, 321, + 393, 360, 1629, 721, 281, 341, 457, 457, 286, 611, 411, 281, 360, 307, 300, 286, + 411, 281, 574, 412, 309, 490, 490, 257, 1192, 760, 4585, 293, 437, 286, 914, 365, + 309, 307, 341, 370, 51336], "temperature": 0.0, "avg_logprob": 
-0.19414031505584717, + "compression_ratio": 2.0043290043290045, "no_speech_prob": 0.0022386072669178247}, + {"id": 259, "seek": 213540, "start": 2157.8, "end": 2165.32, "text": " if you work + with a you want to build on a project for yourself you want to build on a project + for you know with your students you want to", "tokens": [51484, 498, 291, 589, 365, + 257, 291, 528, 281, 1322, 322, 257, 1716, 337, 1803, 291, 528, 281, 1322, 322, 257, + 1716, 337, 291, 458, 365, 428, 1731, 291, 528, 281, 51860], "temperature": 0.0, + "avg_logprob": -0.19414031505584717, "compression_ratio": 2.0043290043290045, "no_speech_prob": + 0.0022386072669178247}, {"id": 260, "seek": 216532, "start": 2165.32, "end": 2175.4, + "text": " build on a project for for I don''t know for your boss or for customer + whatever how often do you actually say okay that the tool that I''m not going to + use to store my data isn''t", "tokens": [50364, 1322, 322, 257, 1716, 337, 337, + 286, 500, 380, 458, 337, 428, 5741, 420, 337, 5474, 2035, 577, 2049, 360, 291, 767, + 584, 1392, 300, 264, 2290, 300, 286, 478, 406, 516, 281, 764, 281, 3531, 452, 1412, + 1943, 380, 50868], "temperature": 0.0, "avg_logprob": -0.14631717059077048, "compression_ratio": + 1.7695652173913043, "no_speech_prob": 0.0006344420253299177}, {"id": 261, "seek": + 216532, "start": 2175.4, "end": 2182.92, "text": " inferred index it probably doesn''t + happen that often I mean it''s like it''s like if you go to one of the um", "tokens": + [50868, 13596, 986, 8186, 309, 1391, 1177, 380, 1051, 300, 2049, 286, 914, 309, + 311, 411, 309, 311, 411, 498, 291, 352, 281, 472, 295, 264, 1105, 51244], "temperature": + 0.0, "avg_logprob": -0.14631717059077048, "compression_ratio": 1.7695652173913043, + "no_speech_prob": 0.0006344420253299177}, {"id": 262, "seek": 216532, "start": 2184.76, + "end": 2192.1200000000003, "text": " the websites of these famous big companies + that build databases around inferred indexes they don''t go like this is 
the best", + "tokens": [51336, 264, 12891, 295, 613, 4618, 955, 3431, 300, 1322, 22380, 926, + 13596, 986, 8186, 279, 436, 500, 380, 352, 411, 341, 307, 264, 1151, 51704], "temperature": + 0.0, "avg_logprob": -0.14631717059077048, "compression_ratio": 1.7695652173913043, + "no_speech_prob": 0.0006344420253299177}, {"id": 263, "seek": 219212, "start": 2192.12, + "end": 2200.8399999999997, "text": " inverted index around use us they say something + else right so and they say like hey we help you with with enterprise search or we + help you with with", "tokens": [50364, 38969, 8186, 926, 764, 505, 436, 584, 746, + 1646, 558, 370, 293, 436, 584, 411, 4177, 321, 854, 291, 365, 365, 14132, 3164, + 420, 321, 854, 291, 365, 365, 50800], "temperature": 0.0, "avg_logprob": -0.1444112573351179, + "compression_ratio": 1.965, "no_speech_prob": 0.0007665205630473793}, {"id": 264, + "seek": 219212, "start": 2200.8399999999997, "end": 2207.72, "text": " logging or + we help you with service security needs and those kind of things and what I think + where we are right now in the", "tokens": [50800, 27991, 420, 321, 854, 291, 365, + 2643, 3825, 2203, 293, 729, 733, 295, 721, 293, 437, 286, 519, 689, 321, 366, 558, + 586, 294, 264, 51144], "temperature": 0.0, "avg_logprob": -0.1444112573351179, "compression_ratio": + 1.965, "no_speech_prob": 0.0007665205630473793}, {"id": 265, "seek": 219212, "start": + 2207.72, "end": 2215.3199999999997, "text": " cutting edge of like where where effective + searches is like we are talking about it like you would be talking about these", + "tokens": [51144, 6492, 4691, 295, 411, 689, 689, 4942, 26701, 307, 411, 321, 366, + 1417, 466, 309, 411, 291, 576, 312, 1417, 466, 613, 51524], "temperature": 0.0, + "avg_logprob": -0.1444112573351179, "compression_ratio": 1.965, "no_speech_prob": + 0.0007665205630473793}, {"id": 266, "seek": 221532, "start": 2215.32, "end": 2222.52, + "text": " inverted indexes but I hope and that''s one of the things that we 
try + to do um at with you know at some", "tokens": [50364, 38969, 8186, 279, 457, 286, + 1454, 293, 300, 311, 472, 295, 264, 721, 300, 321, 853, 281, 360, 1105, 412, 365, + 291, 458, 412, 512, 50724], "temperature": 0.0, "avg_logprob": -0.19413080921879522, + "compression_ratio": 1.8127659574468085, "no_speech_prob": 0.0009962993208318949}, + {"id": 267, "seek": 221532, "start": 2222.52, "end": 2228.52, "text": " technologies + around we''ve yet it''s to also talk about these new things and this is what you + do with it so", "tokens": [50724, 7943, 926, 321, 600, 1939, 309, 311, 281, 611, + 751, 466, 613, 777, 721, 293, 341, 307, 437, 291, 360, 365, 309, 370, 51024], "temperature": + 0.0, "avg_logprob": -0.19413080921879522, "compression_ratio": 1.8127659574468085, + "no_speech_prob": 0.0009962993208318949}, {"id": 268, "seek": 221532, "start": 2228.52, + "end": 2233.7200000000003, "text": " it''s like hey we can''t you cannot you have + like um these recommendation systems in e-commerce you can do", "tokens": [51024, + 309, 311, 411, 4177, 321, 393, 380, 291, 2644, 291, 362, 411, 1105, 613, 11879, + 3652, 294, 308, 12, 26926, 291, 393, 360, 51284], "temperature": 0.0, "avg_logprob": + -0.19413080921879522, "compression_ratio": 1.8127659574468085, "no_speech_prob": + 0.0009962993208318949}, {"id": 269, "seek": 221532, "start": 2233.7200000000003, + "end": 2239.32, "text": " contextual search to e-commerce I don''t know why it''s + stuck with e-commerce but I keep getting but or you can do", "tokens": [51284, 35526, + 3164, 281, 308, 12, 26926, 286, 500, 380, 458, 983, 309, 311, 5541, 365, 308, 12, + 26926, 457, 286, 1066, 1242, 457, 420, 291, 393, 360, 51564], "temperature": 0.0, + "avg_logprob": -0.19413080921879522, "compression_ratio": 1.8127659574468085, "no_speech_prob": + 0.0009962993208318949}, {"id": 270, "seek": 223932, "start": 2239.32, "end": 2245.1600000000003, + "text": " contextual search to documents and those um in those kind of things we + I was 
talking about this this", "tokens": [50364, 35526, 3164, 281, 8512, 293, 729, + 1105, 294, 729, 733, 295, 721, 321, 286, 390, 1417, 466, 341, 341, 50656], "temperature": + 0.0, "avg_logprob": -0.18894100189208984, "compression_ratio": 1.8511904761904763, + "no_speech_prob": 0.000754200795199722}, {"id": 271, "seek": 223932, "start": 2245.1600000000003, + "end": 2252.84, "text": " this amazing use case that had to do with a um um with + with a with a with a resume and had to do about this", "tokens": [50656, 341, 2243, + 764, 1389, 300, 632, 281, 360, 365, 257, 1105, 1105, 365, 365, 257, 365, 257, 365, + 257, 15358, 293, 632, 281, 360, 466, 341, 51040], "temperature": 0.0, "avg_logprob": + -0.18894100189208984, "compression_ratio": 1.8511904761904763, "no_speech_prob": + 0.000754200795199722}, {"id": 272, "seek": 223932, "start": 2252.84, "end": 2261.96, + "text": " so there was like yet a resume and and let''s say that in the resume it + says like um I''m an IT director", "tokens": [51040, 370, 456, 390, 411, 1939, 257, + 15358, 293, 293, 718, 311, 584, 300, 294, 264, 15358, 309, 1619, 411, 1105, 286, + 478, 364, 6783, 5391, 51496], "temperature": 0.0, "avg_logprob": -0.18894100189208984, + "compression_ratio": 1.8511904761904763, "no_speech_prob": 0.000754200795199722}, + {"id": 273, "seek": 226196, "start": 2261.96, "end": 2268.84, "text": " and I played + in the um the national Olympic Beach volleyball team that''s what it says and now + the", "tokens": [50364, 293, 286, 3737, 294, 264, 1105, 264, 4048, 19169, 14866, + 35887, 1469, 300, 311, 437, 309, 1619, 293, 586, 264, 50708], "temperature": 0.0, + "avg_logprob": -0.15635645520556105, "compression_ratio": 1.680327868852459, "no_speech_prob": + 0.0008161693695001304}, {"id": 274, "seek": 226196, "start": 2268.84, "end": 2274.36, + "text": " request is like there''s they''re looking for somebody who''s an IT director + and who is interested in playing sports", "tokens": [50708, 5308, 307, 411, 456, + 311, 436, 
434, 1237, 337, 2618, 567, 311, 364, 6783, 5391, 293, 567, 307, 3102, + 294, 2433, 6573, 50984], "temperature": 0.0, "avg_logprob": -0.15635645520556105, + "compression_ratio": 1.680327868852459, "no_speech_prob": 0.0008161693695001304}, + {"id": 275, "seek": 226196, "start": 2275.4, "end": 2282.12, "text": " you''re not + going to find that person with an uh inferred index but instead of talking about + that from", "tokens": [51036, 291, 434, 406, 516, 281, 915, 300, 954, 365, 364, + 2232, 13596, 986, 8186, 457, 2602, 295, 1417, 466, 300, 490, 51372], "temperature": + 0.0, "avg_logprob": -0.15635645520556105, "compression_ratio": 1.680327868852459, + "no_speech_prob": 0.0008161693695001304}, {"id": 276, "seek": 226196, "start": 2282.12, + "end": 2286.36, "text": " the perspective of like the inferred index can''t find + it because there''s no relationship between", "tokens": [51372, 264, 4585, 295, + 411, 264, 13596, 986, 8186, 393, 380, 915, 309, 570, 456, 311, 572, 2480, 1296, + 51584], "temperature": 0.0, "avg_logprob": -0.15635645520556105, "compression_ratio": + 1.680327868852459, "no_speech_prob": 0.0008161693695001304}, {"id": 277, "seek": + 228636, "start": 2287.1600000000003, "end": 2293.56, "text": " directly between + sports and and and uh being in the Olympic Beach volleyball team but with the with", + "tokens": [50404, 3838, 1296, 6573, 293, 293, 293, 2232, 885, 294, 264, 19169, 14866, + 35887, 1469, 457, 365, 264, 365, 50724], "temperature": 0.0, "avg_logprob": -0.170625368754069, + "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.0003510898386593908}, + {"id": 278, "seek": 228636, "start": 2293.56, "end": 2300.28, "text": " the factor + index we can I actually like to find more words and better language to actually + talk", "tokens": [50724, 264, 5952, 8186, 321, 393, 286, 767, 411, 281, 915, 544, + 2283, 293, 1101, 2856, 281, 767, 751, 51060], "temperature": 0.0, "avg_logprob": + -0.170625368754069, "compression_ratio": 
1.6594827586206897, "no_speech_prob": 0.0003510898386593908}, + {"id": 279, "seek": 228636, "start": 2300.28, "end": 2304.52, "text": " about these + it''s from the perspective of the use case like contextual search, mental search + and", "tokens": [51060, 466, 613, 309, 311, 490, 264, 4585, 295, 264, 764, 1389, + 411, 35526, 3164, 11, 4973, 3164, 293, 51272], "temperature": 0.0, "avg_logprob": + -0.170625368754069, "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.0003510898386593908}, + {"id": 280, "seek": 228636, "start": 2304.52, "end": 2312.28, "text": " those kind + of things which I even think are still abstracting into a lot of people''s ears + but", "tokens": [51272, 729, 733, 295, 721, 597, 286, 754, 519, 366, 920, 12649, + 278, 666, 257, 688, 295, 561, 311, 8798, 457, 51660], "temperature": 0.0, "avg_logprob": + -0.170625368754069, "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.0003510898386593908}, + {"id": 281, "seek": 231228, "start": 2312.28, "end": 2316.92, "text": " I think + it''s also very exciting so there''s this new thing goes for you as well right so + you''re", "tokens": [50364, 286, 519, 309, 311, 611, 588, 4670, 370, 456, 311, 341, + 777, 551, 1709, 337, 291, 382, 731, 558, 370, 291, 434, 50596], "temperature": 0.0, + "avg_logprob": -0.12841141810182666, "compression_ratio": 1.9959349593495934, "no_speech_prob": + 0.0028913619462400675}, {"id": 282, "seek": 231228, "start": 2316.92, "end": 2321.4, + "text": " also helping with that you''re helping to let the world know like hey + look there''s this new thing look", "tokens": [50596, 611, 4315, 365, 300, 291, + 434, 4315, 281, 718, 264, 1002, 458, 411, 4177, 574, 456, 311, 341, 777, 551, 574, + 50820], "temperature": 0.0, "avg_logprob": -0.12841141810182666, "compression_ratio": + 1.9959349593495934, "no_speech_prob": 0.0028913619462400675}, {"id": 283, "seek": + 231228, "start": 2321.4, "end": 2325.8, "text": " what the things are that you can + do with it so I 
think the point that I''m trying to make is that I", "tokens": [50820, + 437, 264, 721, 366, 300, 291, 393, 360, 365, 309, 370, 286, 519, 264, 935, 300, + 286, 478, 1382, 281, 652, 307, 300, 286, 51040], "temperature": 0.0, "avg_logprob": + -0.12841141810182666, "compression_ratio": 1.9959349593495934, "no_speech_prob": + 0.0028913619462400675}, {"id": 284, "seek": 231228, "start": 2326.6000000000004, + "end": 2331.48, "text": " I it''s not that I disagree with your point I agree with + your point but if you compare it with successful", "tokens": [51080, 286, 309, 311, + 406, 300, 286, 14091, 365, 428, 935, 286, 3986, 365, 428, 935, 457, 498, 291, 6794, + 309, 365, 4406, 51324], "temperature": 0.0, "avg_logprob": -0.12841141810182666, + "compression_ratio": 1.9959349593495934, "no_speech_prob": 0.0028913619462400675}, + {"id": 285, "seek": 231228, "start": 2331.48, "end": 2341.7200000000003, "text": + " search engines now that might be based on uh um inferred indexes but that''s not + how how we", "tokens": [51324, 3164, 12982, 586, 300, 1062, 312, 2361, 322, 2232, + 1105, 13596, 986, 8186, 279, 457, 300, 311, 406, 577, 577, 321, 51836], "temperature": + 0.0, "avg_logprob": -0.12841141810182666, "compression_ratio": 1.9959349593495934, + "no_speech_prob": 0.0028913619462400675}, {"id": 286, "seek": 234172, "start": 2341.7999999999997, + "end": 2348.12, "text": " talk about them and um I really think that say that''s + that we''re like at the at the cost of", "tokens": [50368, 751, 466, 552, 293, 1105, + 286, 534, 519, 300, 584, 300, 311, 300, 321, 434, 411, 412, 264, 412, 264, 2063, + 295, 50684], "temperature": 0.0, "avg_logprob": -0.08950500697880001, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.000763413670938462}, {"id": 287, "seek": + 234172, "start": 2348.12, "end": 2353.64, "text": " that change that people select + that they start to talk more in these in these um from the perspective", "tokens": + [50684, 300, 1319, 300, 561, 3048, 300, 
436, 722, 281, 751, 544, 294, 613, 294, + 613, 1105, 490, 264, 4585, 50960], "temperature": 0.0, "avg_logprob": -0.08950500697880001, + "compression_ratio": 1.7692307692307692, "no_speech_prob": 0.000763413670938462}, + {"id": 288, "seek": 234172, "start": 2353.64, "end": 2360.04, "text": " of the use + cases and the things that you can build with them yeah absolutely I mean it''s it''s + just", "tokens": [50960, 295, 264, 764, 3331, 293, 264, 721, 300, 291, 393, 1322, + 365, 552, 1338, 3122, 286, 914, 309, 311, 309, 311, 445, 51280], "temperature": + 0.0, "avg_logprob": -0.08950500697880001, "compression_ratio": 1.7692307692307692, + "no_speech_prob": 0.000763413670938462}, {"id": 289, "seek": 234172, "start": 2360.04, + "end": 2365.7999999999997, "text": " you know like my engineering mind always kicks + in and says like hey but you are basically offering", "tokens": [51280, 291, 458, + 411, 452, 7043, 1575, 1009, 21293, 294, 293, 1619, 411, 4177, 457, 291, 366, 1936, + 8745, 51568], "temperature": 0.0, "avg_logprob": -0.08950500697880001, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.000763413670938462}, {"id": 290, "seek": + 236580, "start": 2365.8, "end": 2373.1600000000003, "text": " me to replace a dirty + index with like vector search data structure um but you are totally right", "tokens": + [50364, 385, 281, 7406, 257, 9360, 8186, 365, 411, 8062, 3164, 1412, 3877, 1105, + 457, 291, 366, 3879, 558, 50732], "temperature": 0.0, "avg_logprob": -0.11322447311046512, + "compression_ratio": 1.6437768240343347, "no_speech_prob": 0.0012354031205177307}, + {"id": 291, "seek": 236580, "start": 2373.1600000000003, "end": 2380.6800000000003, + "text": " like if an electric company would say hey buy our cars because we have + the best battery and look", "tokens": [50732, 411, 498, 364, 5210, 2237, 576, 584, + 4177, 2256, 527, 5163, 570, 321, 362, 264, 1151, 5809, 293, 574, 51108], "temperature": + 0.0, "avg_logprob": -0.11322447311046512, 
"compression_ratio": 1.6437768240343347, + "no_speech_prob": 0.0012354031205177307}, {"id": 292, "seek": 236580, "start": 2380.6800000000003, + "end": 2386.52, "text": " how good it is and they they supply some diagrams there + and showing how well it conserves the", "tokens": [51108, 577, 665, 309, 307, 293, + 436, 436, 5847, 512, 36709, 456, 293, 4099, 577, 731, 309, 1014, 9054, 264, 51400], + "temperature": 0.0, "avg_logprob": -0.11322447311046512, "compression_ratio": 1.6437768240343347, + "no_speech_prob": 0.0012354031205177307}, {"id": 293, "seek": 236580, "start": 2386.52, + "end": 2393.8, "text": " energy and stuff right maybe yeah we''ll appeal to some + clients who want to save the planet let''s", "tokens": [51400, 2281, 293, 1507, + 558, 1310, 1338, 321, 603, 13668, 281, 512, 6982, 567, 528, 281, 3155, 264, 5054, + 718, 311, 51764], "temperature": 0.0, "avg_logprob": -0.11322447311046512, "compression_ratio": + 1.6437768240343347, "no_speech_prob": 0.0012354031205177307}, {"id": 294, "seek": + 239380, "start": 2393.8, "end": 2399.5600000000004, "text": " say right but the + rest of the clients they will say okay why should I buy your car if it''s slower", + "tokens": [50364, 584, 558, 457, 264, 1472, 295, 264, 6982, 436, 486, 584, 1392, + 983, 820, 286, 2256, 428, 1032, 498, 309, 311, 14009, 50652], "temperature": 0.0, + "avg_logprob": -0.07283744604691215, "compression_ratio": 1.7431192660550459, "no_speech_prob": + 0.001930070691742003}, {"id": 295, "seek": 239380, "start": 2399.5600000000004, + "end": 2406.44, "text": " right like you didn''t focus on the use case I''m I''m + kind of like uh advocating for right so", "tokens": [50652, 558, 411, 291, 994, + 380, 1879, 322, 264, 764, 1389, 286, 478, 286, 478, 733, 295, 411, 2232, 32050, + 337, 558, 370, 50996], "temperature": 0.0, "avg_logprob": -0.07283744604691215, + "compression_ratio": 1.7431192660550459, "no_speech_prob": 0.001930070691742003}, + {"id": 296, "seek": 239380, "start": 2406.44, "end": 
2411.7200000000003, "text": + " and you you should always listen to your users on that one yes and so what you''re + saying is very", "tokens": [50996, 293, 291, 291, 820, 1009, 2140, 281, 428, 5022, + 322, 300, 472, 2086, 293, 370, 437, 291, 434, 1566, 307, 588, 51260], "temperature": + 0.0, "avg_logprob": -0.07283744604691215, "compression_ratio": 1.7431192660550459, + "no_speech_prob": 0.001930070691742003}, {"id": 297, "seek": 239380, "start": 2411.7200000000003, + "end": 2417.8, "text": " interesting and this this is something um I was inspired + by something which is called the the", "tokens": [51260, 1880, 293, 341, 341, 307, + 746, 1105, 286, 390, 7547, 538, 746, 597, 307, 1219, 264, 264, 51564], "temperature": + 0.0, "avg_logprob": -0.07283744604691215, "compression_ratio": 1.7431192660550459, + "no_speech_prob": 0.001930070691742003}, {"id": 298, "seek": 241780, "start": 2417.96, + "end": 2423.96, "text": " layering problem which basically means that in the past + and I was so this was for me the case too", "tokens": [50372, 40754, 1154, 597, + 1936, 1355, 300, 294, 264, 1791, 293, 286, 390, 370, 341, 390, 337, 385, 264, 1389, + 886, 50672], "temperature": 0.0, "avg_logprob": -0.08807768992015294, "compression_ratio": + 2.0041152263374484, "no_speech_prob": 0.0012937169522047043}, {"id": 299, "seek": + 241780, "start": 2423.96, "end": 2429.8, "text": " like I I think like you know + maybe 10 years ago or something that you know if I just go deeper", "tokens": [50672, + 411, 286, 286, 519, 411, 291, 458, 1310, 1266, 924, 2057, 420, 746, 300, 291, 458, + 498, 286, 445, 352, 7731, 50964], "temperature": 0.0, "avg_logprob": -0.08807768992015294, + "compression_ratio": 2.0041152263374484, "no_speech_prob": 0.0012937169522047043}, + {"id": 300, "seek": 241780, "start": 2429.8, "end": 2435.0800000000004, "text": + " drill down deeper deeper deeper deeper deeper and I understand how something works + at the core that", "tokens": [50964, 11392, 760, 7731, 7731, 
7731, 7731, 7731, 293, + 286, 1223, 577, 746, 1985, 412, 264, 4965, 300, 51228], "temperature": 0.0, "avg_logprob": + -0.08807768992015294, "compression_ratio": 2.0041152263374484, "no_speech_prob": + 0.0012937169522047043}, {"id": 301, "seek": 241780, "start": 2435.0800000000004, + "end": 2441.1600000000003, "text": " means that I understand the whole concept of + something and the more I''m learning about this and the", "tokens": [51228, 1355, + 300, 286, 1223, 264, 1379, 3410, 295, 746, 293, 264, 544, 286, 478, 2539, 466, 341, + 293, 264, 51532], "temperature": 0.0, "avg_logprob": -0.08807768992015294, "compression_ratio": + 2.0041152263374484, "no_speech_prob": 0.0012937169522047043}, {"id": 302, "seek": + 241780, "start": 2441.1600000000003, "end": 2445.48, "text": " more I''m working + on it the more I think that that''s not the case so let me give you an example", + "tokens": [51532, 544, 286, 478, 1364, 322, 309, 264, 544, 286, 519, 300, 300, 311, + 406, 264, 1389, 370, 718, 385, 976, 291, 364, 1365, 51748], "temperature": 0.0, + "avg_logprob": -0.08807768992015294, "compression_ratio": 2.0041152263374484, "no_speech_prob": + 0.0012937169522047043}, {"id": 303, "seek": 244548, "start": 2446.12, "end": 2455.2400000000002, + "text": " so that was this I saw this tweet coming by on the day that coinbase did + an IPO and I''m paraphrasing", "tokens": [50396, 370, 300, 390, 341, 286, 1866, + 341, 15258, 1348, 538, 322, 264, 786, 300, 11464, 17429, 630, 364, 50220, 293, 286, + 478, 36992, 1703, 3349, 50852], "temperature": 0.0, "avg_logprob": -0.09765164930741865, + "compression_ratio": 1.6977777777777778, "no_speech_prob": 0.0009086131467483938}, + {"id": 304, "seek": 244548, "start": 2455.2400000000002, "end": 2461.72, "text": + " here what was in the tweet because I don''t remember exactly but somebody found + the hacker news", "tokens": [50852, 510, 437, 390, 294, 264, 15258, 570, 286, 500, + 380, 1604, 2293, 457, 2618, 1352, 264, 38155, 2583, 51176], 
"temperature": 0.0, + "avg_logprob": -0.09765164930741865, "compression_ratio": 1.6977777777777778, "no_speech_prob": + 0.0009086131467483938}, {"id": 305, "seek": 244548, "start": 2461.72, "end": 2467.72, + "text": " post or somebody announced that they were working on coinbase it might + not even have been cold", "tokens": [51176, 2183, 420, 2618, 7548, 300, 436, 645, + 1364, 322, 11464, 17429, 309, 1062, 406, 754, 362, 668, 3554, 51476], "temperature": + 0.0, "avg_logprob": -0.09765164930741865, "compression_ratio": 1.6977777777777778, + "no_speech_prob": 0.0009086131467483938}, {"id": 306, "seek": 244548, "start": 2467.72, + "end": 2472.36, "text": " coinbase back then and he said okay I''m thinking of building + a platform blah blah blah it''s", "tokens": [51476, 11464, 17429, 646, 550, 293, + 415, 848, 1392, 286, 478, 1953, 295, 2390, 257, 3663, 12288, 12288, 12288, 309, + 311, 51708], "temperature": 0.0, "avg_logprob": -0.09765164930741865, "compression_ratio": + 1.6977777777777778, "no_speech_prob": 0.0009086131467483938}, {"id": 307, "seek": + 247236, "start": 2472.36, "end": 2476.6800000000003, "text": " something like that + well you should have seen the responses there because people like no", "tokens": + [50364, 746, 411, 300, 731, 291, 820, 362, 1612, 264, 13019, 456, 570, 561, 411, + 572, 50580], "temperature": 0.0, "avg_logprob": -0.11192685655019816, "compression_ratio": + 1.790874524714829, "no_speech_prob": 0.0018920288421213627}, {"id": 308, "seek": + 247236, "start": 2477.4, "end": 2481.56, "text": " wants to use that and that''s + not where these blockchain technologies are you know made for", "tokens": [50616, + 2738, 281, 764, 300, 293, 300, 311, 406, 689, 613, 17176, 7943, 366, 291, 458, 1027, + 337, 50824], "temperature": 0.0, "avg_logprob": -0.11192685655019816, "compression_ratio": + 1.790874524714829, "no_speech_prob": 0.0018920288421213627}, {"id": 309, "seek": + 247236, "start": 2481.56, "end": 2487.8, "text": " because we actually 
want to decentralize + things blah blah blah regardless of the fact if you", "tokens": [50824, 570, 321, + 767, 528, 281, 26515, 1125, 721, 12288, 12288, 12288, 10060, 295, 264, 1186, 498, + 291, 51136], "temperature": 0.0, "avg_logprob": -0.11192685655019816, "compression_ratio": + 1.790874524714829, "no_speech_prob": 0.0018920288421213627}, {"id": 310, "seek": + 247236, "start": 2487.8, "end": 2492.2000000000003, "text": " agree or disagree + with the statement I think we can agree on the fact that coinbase is doing pretty", + "tokens": [51136, 3986, 420, 14091, 365, 264, 5629, 286, 519, 321, 393, 3986, 322, + 264, 1186, 300, 11464, 17429, 307, 884, 1238, 51356], "temperature": 0.0, "avg_logprob": + -0.11192685655019816, "compression_ratio": 1.790874524714829, "no_speech_prob": + 0.0018920288421213627}, {"id": 311, "seek": 247236, "start": 2492.2000000000003, + "end": 2497.2400000000002, "text": " well and bringing a lot of value to people + so the point that I''m trying to make with this story is", "tokens": [51356, 731, + 293, 5062, 257, 688, 295, 2158, 281, 561, 370, 264, 935, 300, 286, 478, 1382, 281, + 652, 365, 341, 1657, 307, 51608], "temperature": 0.0, "avg_logprob": -0.11192685655019816, + "compression_ratio": 1.790874524714829, "no_speech_prob": 0.0018920288421213627}, + {"id": 312, "seek": 249724, "start": 2497.24, "end": 2506.6, "text": " that I think + that the the risk we run in constantly doing that deep dive and making the deep + dive", "tokens": [50364, 300, 286, 519, 300, 264, 264, 3148, 321, 1190, 294, 6460, + 884, 300, 2452, 9192, 293, 1455, 264, 2452, 9192, 50832], "temperature": 0.0, "avg_logprob": + -0.17663486613783724, "compression_ratio": 1.794392523364486, "no_speech_prob": + 0.0013929307460784912}, {"id": 313, "seek": 249724, "start": 2506.6, "end": 2511.0, + "text": " comparison which is important and which needs to happen and where we need + to think on in the product", "tokens": [50832, 9660, 597, 307, 1021, 293, 597, 2203, + 281, 
1051, 293, 689, 321, 643, 281, 519, 322, 294, 264, 1674, 51052], "temperature": + 0.0, "avg_logprob": -0.17663486613783724, "compression_ratio": 1.794392523364486, + "no_speech_prob": 0.0013929307460784912}, {"id": 314, "seek": 249724, "start": 2511.0, + "end": 2515.8799999999997, "text": " of self-p and we also need to think in these + other layers like how will people actually", "tokens": [51052, 295, 2698, 12, 79, + 293, 321, 611, 643, 281, 519, 294, 613, 661, 7914, 411, 577, 486, 561, 767, 51296], + "temperature": 0.0, "avg_logprob": -0.17663486613783724, "compression_ratio": 1.794392523364486, + "no_speech_prob": 0.0013929307460784912}, {"id": 315, "seek": 249724, "start": 2517.56, + "end": 2522.8399999999997, "text": " use that because don''t bear in mind that the + people currently that are involved in the discussion", "tokens": [51380, 764, 300, + 570, 500, 380, 6155, 294, 1575, 300, 264, 561, 4362, 300, 366, 3288, 294, 264, 5017, + 51644], "temperature": 0.0, "avg_logprob": -0.17663486613783724, "compression_ratio": + 1.794392523364486, "no_speech_prob": 0.0013929307460784912}, {"id": 316, "seek": + 252284, "start": 2522.84, "end": 2527.96, "text": " talking about the expected database + and who are very vocal about it are people are extremely", "tokens": [50364, 1417, + 466, 264, 5176, 8149, 293, 567, 366, 588, 11657, 466, 309, 366, 561, 366, 4664, + 50620], "temperature": 0.0, "avg_logprob": -0.20534529379748423, "compression_ratio": + 1.8233082706766917, "no_speech_prob": 0.0030439544934779406}, {"id": 317, "seek": + 252284, "start": 2527.96, "end": 2532.28, "text": " knowledgeable about what''s + happening in the hood but with it what if you''re just a you know you''re", "tokens": + [50620, 33800, 466, 437, 311, 2737, 294, 264, 13376, 457, 365, 309, 437, 498, 291, + 434, 445, 257, 291, 458, 291, 434, 50836], "temperature": 0.0, "avg_logprob": -0.20534529379748423, + "compression_ratio": 1.8233082706766917, "no_speech_prob": 0.0030439544934779406}, + 
{"id": 318, "seek": 252284, "start": 2532.28, "end": 2537.6400000000003, "text": + " just working in the company and you''re just like a normal software engineer and + somebody says like", "tokens": [50836, 445, 1364, 294, 264, 2237, 293, 291, 434, + 445, 411, 257, 2710, 4722, 11403, 293, 2618, 1619, 411, 51104], "temperature": 0.0, + "avg_logprob": -0.20534529379748423, "compression_ratio": 1.8233082706766917, "no_speech_prob": + 0.0030439544934779406}, {"id": 319, "seek": 252284, "start": 2537.6400000000003, + "end": 2543.96, "text": " hey I want to do better product search and you do a Google + search on that you find a solution like", "tokens": [51104, 4177, 286, 528, 281, + 360, 1101, 1674, 3164, 293, 291, 360, 257, 3329, 3164, 322, 300, 291, 915, 257, + 3827, 411, 51420], "temperature": 0.0, "avg_logprob": -0.20534529379748423, "compression_ratio": + 1.8233082706766917, "no_speech_prob": 0.0030439544934779406}, {"id": 320, "seek": + 252284, "start": 2543.96, "end": 2549.32, "text": " we''ve yet you might be interested + in knowing what''s happening on the web but there''s a limit to", "tokens": [51420, + 321, 600, 1939, 291, 1062, 312, 3102, 294, 5276, 437, 311, 2737, 322, 264, 3670, + 457, 456, 311, 257, 4948, 281, 51688], "temperature": 0.0, "avg_logprob": -0.20534529379748423, + "compression_ratio": 1.8233082706766917, "no_speech_prob": 0.0030439544934779406}, + {"id": 321, "seek": 254932, "start": 2549.48, "end": 2555.88, "text": " that you + also you you come to it through the use case not not bottoms up but that''s how + I how I look at it", "tokens": [50372, 300, 291, 611, 291, 291, 808, 281, 309, 807, + 264, 764, 1389, 406, 406, 43413, 493, 457, 300, 311, 577, 286, 577, 286, 574, 412, + 309, 50692], "temperature": 0.0, "avg_logprob": -0.18421757739523184, "compression_ratio": + 1.8436018957345972, "no_speech_prob": 0.002182973315939307}, {"id": 322, "seek": + 254932, "start": 2555.88, "end": 2562.92, "text": " so that yeah that''s the point + that I 
want to make I really like I really like your approach", "tokens": [50692, + 370, 300, 1338, 300, 311, 264, 935, 300, 286, 528, 281, 652, 286, 534, 411, 286, + 534, 411, 428, 3109, 51044], "temperature": 0.0, "avg_logprob": -0.18421757739523184, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.002182973315939307}, + {"id": 323, "seek": 254932, "start": 2562.92, "end": 2569.4, "text": " like because + I mean like yeah I mean I was kind of joining this industry or entering this", "tokens": + [51044, 411, 570, 286, 914, 411, 1338, 286, 914, 286, 390, 733, 295, 5549, 341, + 3518, 420, 11104, 341, 51368], "temperature": 0.0, "avg_logprob": -0.18421757739523184, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.002182973315939307}, + {"id": 324, "seek": 254932, "start": 2569.4, "end": 2574.76, "text": " industry + as an engineer and then I kind of progressed more like to work closer to product + management", "tokens": [51368, 3518, 382, 364, 11403, 293, 550, 286, 733, 295, 36789, + 544, 411, 281, 589, 4966, 281, 1674, 4592, 51636], "temperature": 0.0, "avg_logprob": + -0.18421757739523184, "compression_ratio": 1.8436018957345972, "no_speech_prob": + 0.002182973315939307}, {"id": 325, "seek": 257476, "start": 2574.84, "end": 2581.7200000000003, + "text": " I didn''t I didn''t become a product manager but when I talk to them they + really kind of want to", "tokens": [50368, 286, 994, 380, 286, 994, 380, 1813, 257, + 1674, 6598, 457, 562, 286, 751, 281, 552, 436, 534, 733, 295, 528, 281, 50712], + "temperature": 0.0, "avg_logprob": -0.08652338090833726, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.0022634316701442003}, {"id": 326, "seek": 257476, "start": 2581.7200000000003, + "end": 2586.6800000000003, "text": " hear too much about the algorithms right because + it''s not what they think about daily they think", "tokens": [50712, 1568, 886, + 709, 466, 264, 14642, 558, 570, 309, 311, 406, 437, 436, 519, 466, 5212, 436, 519, + 
50960], "temperature": 0.0, "avg_logprob": -0.08652338090833726, "compression_ratio": + 1.8130841121495327, "no_speech_prob": 0.0022634316701442003}, {"id": 327, "seek": + 257476, "start": 2586.6800000000003, "end": 2595.7200000000003, "text": " about + solving user use cases right and and and sometimes they may ask how can I do this + is this", "tokens": [50960, 466, 12606, 4195, 764, 3331, 558, 293, 293, 293, 2171, + 436, 815, 1029, 577, 393, 286, 360, 341, 307, 341, 51412], "temperature": 0.0, "avg_logprob": + -0.08652338090833726, "compression_ratio": 1.8130841121495327, "no_speech_prob": + 0.0022634316701442003}, {"id": 328, "seek": 257476, "start": 2595.7200000000003, + "end": 2601.2400000000002, "text": " possible right and they give you a task right + and then you go and like you come back to your toolbox", "tokens": [51412, 1944, + 558, 293, 436, 976, 291, 257, 5633, 558, 293, 550, 291, 352, 293, 411, 291, 808, + 646, 281, 428, 44593, 51688], "temperature": 0.0, "avg_logprob": -0.08652338090833726, + "compression_ratio": 1.8130841121495327, "no_speech_prob": 0.0022634316701442003}, + {"id": 329, "seek": 260124, "start": 2601.24, "end": 2608.6, "text": " and you''re + like okay what do I have here a couple of databases I have this queue system okay + let me", "tokens": [50364, 293, 291, 434, 411, 1392, 437, 360, 286, 362, 510, 257, + 1916, 295, 22380, 286, 362, 341, 18639, 1185, 1392, 718, 385, 50732], "temperature": + 0.0, "avg_logprob": -0.10174992320301769, "compression_ratio": 1.7072072072072073, + "no_speech_prob": 0.0018512050155550241}, {"id": 330, "seek": 260124, "start": 2608.6, + "end": 2614.2799999999997, "text": " stitch things together and maybe this will + work out right but you I agree to I agree with you that", "tokens": [50732, 5635, + 721, 1214, 293, 1310, 341, 486, 589, 484, 558, 457, 291, 286, 3986, 281, 286, 3986, + 365, 291, 300, 51016], "temperature": 0.0, "avg_logprob": -0.10174992320301769, + "compression_ratio": 1.7072072072072073, 
"no_speech_prob": 0.0018512050155550241}, + {"id": 331, "seek": 260124, "start": 2614.8399999999997, "end": 2620.6, "text": + " I think in many ways we we have that risk kind of in engineering kind of focusing + too much", "tokens": [51044, 286, 519, 294, 867, 2098, 321, 321, 362, 300, 3148, + 733, 295, 294, 7043, 733, 295, 8416, 886, 709, 51332], "temperature": 0.0, "avg_logprob": + -0.10174992320301769, "compression_ratio": 1.7072072072072073, "no_speech_prob": + 0.0018512050155550241}, {"id": 332, "seek": 260124, "start": 2621.24, "end": 2629.08, + "text": " on what''s closer to us right let''s say I enjoy using this IDE I enjoy + using this compiler", "tokens": [51364, 322, 437, 311, 4966, 281, 505, 558, 718, + 311, 584, 286, 2103, 1228, 341, 40930, 286, 2103, 1228, 341, 31958, 51756], "temperature": + 0.0, "avg_logprob": -0.10174992320301769, "compression_ratio": 1.7072072072072073, + "no_speech_prob": 0.0018512050155550241}, {"id": 333, "seek": 262908, "start": 2629.08, + "end": 2636.2, "text": " but what value it produces beyond me enjoying using it + right the end result right that", "tokens": [50364, 457, 437, 2158, 309, 14725, + 4399, 385, 9929, 1228, 309, 558, 264, 917, 1874, 558, 300, 50720], "temperature": + 0.0, "avg_logprob": -0.143104520337335, "compression_ratio": 1.663677130044843, + "no_speech_prob": 0.001715904800221324}, {"id": 334, "seek": 262908, "start": 2636.2, + "end": 2642.92, "text": " matters no absolutely and don''t get me wrong as so I + mean sometimes when we apply on our internal", "tokens": [50720, 7001, 572, 3122, + 293, 500, 380, 483, 385, 2085, 382, 370, 286, 914, 2171, 562, 321, 3079, 322, 527, + 6920, 51056], "temperature": 0.0, "avg_logprob": -0.143104520337335, "compression_ratio": + 1.663677130044843, "no_speech_prob": 0.001715904800221324}, {"id": 335, "seek": + 262908, "start": 2642.92, "end": 2648.44, "text": " slack channel when there''s + something you released in the software or then I enjoy that as well", "tokens": + 
[51056, 29767, 2269, 562, 456, 311, 746, 291, 4736, 294, 264, 4722, 420, 550, 286, + 2103, 300, 382, 731, 51332], "temperature": 0.0, "avg_logprob": -0.143104520337335, + "compression_ratio": 1.663677130044843, "no_speech_prob": 0.001715904800221324}, + {"id": 336, "seek": 262908, "start": 2648.44, "end": 2652.92, "text": " and or I + can you know then I you know I play around with it and I go like it''s amazing that", + "tokens": [51332, 293, 420, 286, 393, 291, 458, 550, 286, 291, 458, 286, 862, 926, + 365, 309, 293, 286, 352, 411, 309, 311, 2243, 300, 51556], "temperature": 0.0, "avg_logprob": + -0.143104520337335, "compression_ratio": 1.663677130044843, "no_speech_prob": 0.001715904800221324}, + {"id": 337, "seek": 265292, "start": 2652.92, "end": 2660.6800000000003, "text": + " this works or that we can do this or how fast things are or scalable they are + don''t get me wrong", "tokens": [50364, 341, 1985, 420, 300, 321, 393, 360, 341, + 420, 577, 2370, 721, 366, 420, 38481, 436, 366, 500, 380, 483, 385, 2085, 50752], + "temperature": 0.0, "avg_logprob": -0.15094008048375449, "compression_ratio": 1.7772727272727273, + "no_speech_prob": 0.003019541036337614}, {"id": 338, "seek": 265292, "start": 2660.6800000000003, + "end": 2667.64, "text": " I enjoy that a lot but the things but I what I also enjoy + and I think that''s also the role that I have", "tokens": [50752, 286, 2103, 300, + 257, 688, 457, 264, 721, 457, 286, 437, 286, 611, 2103, 293, 286, 519, 300, 311, + 611, 264, 3090, 300, 286, 362, 51100], "temperature": 0.0, "avg_logprob": -0.15094008048375449, + "compression_ratio": 1.7772727272727273, "no_speech_prob": 0.003019541036337614}, + {"id": 339, "seek": 265292, "start": 2667.64, "end": 2672.44, "text": " in this + in this company and the role that I''m trying to play for us when it comes to vector + search is", "tokens": [51100, 294, 341, 294, 341, 2237, 293, 264, 3090, 300, 286, + 478, 1382, 281, 862, 337, 505, 562, 309, 1487, 281, 8062, 3164, 307, 
51340], "temperature": + 0.0, "avg_logprob": -0.15094008048375449, "compression_ratio": 1.7772727272727273, + "no_speech_prob": 0.003019541036337614}, {"id": 340, "seek": 265292, "start": 2673.2400000000002, + "end": 2679.32, "text": " if you for example have these product managers to actually + listen to them and select what", "tokens": [51380, 498, 291, 337, 1365, 362, 613, + 1674, 14084, 281, 767, 2140, 281, 552, 293, 3048, 437, 51684], "temperature": 0.0, + "avg_logprob": -0.15094008048375449, "compression_ratio": 1.7772727272727273, "no_speech_prob": + 0.003019541036337614}, {"id": 341, "seek": 267932, "start": 2679.96, "end": 2686.04, + "text": " what problem can we solve and I don''t think it''s the responsibility + of the product manager", "tokens": [50396, 437, 1154, 393, 321, 5039, 293, 286, + 500, 380, 519, 309, 311, 264, 6357, 295, 264, 1674, 6598, 50700], "temperature": + 0.0, "avg_logprob": -0.07143635469324448, "compression_ratio": 1.758139534883721, + "no_speech_prob": 0.0016546325059607625}, {"id": 342, "seek": 267932, "start": 2686.04, + "end": 2693.32, "text": " to take an example to understand how vector search might + apply to their use case no we need to", "tokens": [50700, 281, 747, 364, 1365, 281, + 1223, 577, 8062, 3164, 1062, 3079, 281, 641, 764, 1389, 572, 321, 643, 281, 51064], + "temperature": 0.0, "avg_logprob": -0.07143635469324448, "compression_ratio": 1.758139534883721, + "no_speech_prob": 0.0016546325059607625}, {"id": 343, "seek": 267932, "start": 2693.32, + "end": 2698.76, "text": " be able to express through the product managers how we + can bring value to them because I don''t think", "tokens": [51064, 312, 1075, 281, + 5109, 807, 264, 1674, 14084, 577, 321, 393, 1565, 2158, 281, 552, 570, 286, 500, + 380, 519, 51336], "temperature": 0.0, "avg_logprob": -0.07143635469324448, "compression_ratio": + 1.758139534883721, "no_speech_prob": 0.0016546325059607625}, {"id": 344, "seek": + 267932, "start": 2698.76, "end": 2704.28, "text": 
" I mean of course there are product + managers that say like okay for the next product we need", "tokens": [51336, 286, + 914, 295, 1164, 456, 366, 1674, 14084, 300, 584, 411, 1392, 337, 264, 958, 1674, + 321, 643, 51612], "temperature": 0.0, "avg_logprob": -0.07143635469324448, "compression_ratio": + 1.758139534883721, "no_speech_prob": 0.0016546325059607625}, {"id": 345, "seek": + 270428, "start": 2704.28, "end": 2709.5600000000004, "text": " to cut some of our + database but I don''t think there are many asking a question they ask a question", + "tokens": [50364, 281, 1723, 512, 295, 527, 8149, 457, 286, 500, 380, 519, 456, + 366, 867, 3365, 257, 1168, 436, 1029, 257, 1168, 50628], "temperature": 0.0, "avg_logprob": + -0.14162561940211876, "compression_ratio": 1.7644787644787645, "no_speech_prob": + 0.0033369429875165224}, {"id": 346, "seek": 270428, "start": 2709.5600000000004, + "end": 2714.1200000000003, "text": " like okay you know we can absolutely it''s + a lot of data we can never lose the data", "tokens": [50628, 411, 1392, 291, 458, + 321, 393, 3122, 309, 311, 257, 688, 295, 1412, 321, 393, 1128, 3624, 264, 1412, + 50856], "temperature": 0.0, "avg_logprob": -0.14162561940211876, "compression_ratio": + 1.7644787644787645, "no_speech_prob": 0.0033369429875165224}, {"id": 347, "seek": + 270428, "start": 2715.32, "end": 2720.6800000000003, "text": " architect or or engineer + how are we gonna solve that and it''s so it''s different language to", "tokens": + [50916, 6331, 420, 420, 11403, 577, 366, 321, 799, 5039, 300, 293, 309, 311, 370, + 309, 311, 819, 2856, 281, 51184], "temperature": 0.0, "avg_logprob": -0.14162561940211876, + "compression_ratio": 1.7644787644787645, "no_speech_prob": 0.0033369429875165224}, + {"id": 348, "seek": 270428, "start": 2720.6800000000003, "end": 2725.4, "text": + " talk about these problems and what we now start to see is that that there''s this + wave coming that", "tokens": [51184, 751, 466, 613, 2740, 293, 437, 321, 586, 
722, + 281, 536, 307, 300, 300, 456, 311, 341, 5772, 1348, 300, 51420], "temperature": + 0.0, "avg_logprob": -0.14162561940211876, "compression_ratio": 1.7644787644787645, + "no_speech_prob": 0.0033369429875165224}, {"id": 349, "seek": 270428, "start": 2725.4, + "end": 2730.52, "text": " people express problems from a product manager perspective + business owner perspective", "tokens": [51420, 561, 5109, 2740, 490, 257, 1674, + 6598, 4585, 1606, 7289, 4585, 51676], "temperature": 0.0, "avg_logprob": -0.14162561940211876, + "compression_ratio": 1.7644787644787645, "no_speech_prob": 0.0033369429875165224}, + {"id": 350, "seek": 273052, "start": 2731.16, "end": 2736.7599999999998, "text": + " entrepreneur perspective that they that they say things or problems that they + have for example", "tokens": [50396, 14307, 4585, 300, 436, 300, 436, 584, 721, + 420, 2740, 300, 436, 362, 337, 1365, 50676], "temperature": 0.0, "avg_logprob": + -0.15235280990600586, "compression_ratio": 1.6801801801801801, "no_speech_prob": + 0.004758896771818399}, {"id": 351, "seek": 273052, "start": 2737.24, "end": 2742.7599999999998, + "text": " hey somebody keeps typing and I''m having a headache in my search bar + but they don''t see", "tokens": [50700, 4177, 2618, 5965, 18444, 293, 286, 478, + 1419, 257, 23520, 294, 452, 3164, 2159, 457, 436, 500, 380, 536, 50976], "temperature": + 0.0, "avg_logprob": -0.15235280990600586, "compression_ratio": 1.6801801801801801, + "no_speech_prob": 0.004758896771818399}, {"id": 352, "seek": 273052, "start": 2742.7599999999998, + "end": 2749.24, "text": " aspirin and then we need to go boom let''s use case for + we get yeah absolutely and it''s like", "tokens": [50976, 20003, 259, 293, 550, + 321, 643, 281, 352, 9351, 718, 311, 764, 1389, 337, 321, 483, 1338, 3122, 293, 309, + 311, 411, 51300], "temperature": 0.0, "avg_logprob": -0.15235280990600586, "compression_ratio": + 1.6801801801801801, "no_speech_prob": 0.004758896771818399}, {"id": 353, "seek": + 
273052, "start": 2750.04, "end": 2755.4, "text": " that''s a great segue actually + to the second part of the of our show you know like product managers", "tokens": + [51340, 300, 311, 257, 869, 33850, 767, 281, 264, 1150, 644, 295, 264, 295, 527, + 855, 291, 458, 411, 1674, 14084, 51608], "temperature": 0.0, "avg_logprob": -0.15235280990600586, + "compression_ratio": 1.6801801801801801, "no_speech_prob": 0.004758896771818399}, + {"id": 354, "seek": 275540, "start": 2756.28, "end": 2763.56, "text": " answer the + question what we''re building right and engineers answer the question how and so", + "tokens": [50408, 1867, 264, 1168, 437, 321, 434, 2390, 558, 293, 11955, 1867, 264, + 1168, 577, 293, 370, 50772], "temperature": 0.0, "avg_logprob": -0.1803202039740059, + "compression_ratio": 1.727699530516432, "no_speech_prob": 0.010061069391667843}, + {"id": 355, "seek": 275540, "start": 2764.2000000000003, "end": 2771.32, "text": + " I wanted you to kind of go and talk a little bit about how you implemented the + avid and I", "tokens": [50804, 286, 1415, 291, 281, 733, 295, 352, 293, 751, 257, + 707, 857, 466, 577, 291, 12270, 264, 1305, 327, 293, 286, 51160], "temperature": + 0.0, "avg_logprob": -0.1803202039740059, "compression_ratio": 1.727699530516432, + "no_speech_prob": 0.010061069391667843}, {"id": 356, "seek": 275540, "start": 2771.32, + "end": 2776.92, "text": " understand that hn maybe could also talk about it and + I think he talked about it recently in", "tokens": [51160, 1223, 300, 276, 77, 1310, + 727, 611, 751, 466, 309, 293, 286, 519, 415, 2825, 466, 309, 3938, 294, 51440], + "temperature": 0.0, "avg_logprob": -0.1803202039740059, "compression_ratio": 1.727699530516432, + "no_speech_prob": 0.010061069391667843}, {"id": 357, "seek": 275540, "start": 2776.92, + "end": 2783.88, "text": " a podcast and I''ll be I''ll make sure to link that in + the show notes but what caught my eye and", "tokens": [51440, 257, 7367, 293, 286, + 603, 312, 286, 603, 652, 
988, 281, 2113, 300, 294, 264, 855, 5570, 457, 437, 5415, + 452, 3313, 293, 51788], "temperature": 0.0, "avg_logprob": -0.1803202039740059, + "compression_ratio": 1.727699530516432, "no_speech_prob": 0.010061069391667843}, + {"id": 358, "seek": 278388, "start": 2783.88, "end": 2791.7200000000003, "text": + " you know if you look at the landscape of the vector databases some of them are + close source", "tokens": [50364, 291, 458, 498, 291, 574, 412, 264, 9661, 295, 264, + 8062, 22380, 512, 295, 552, 366, 1998, 4009, 50756], "temperature": 0.0, "avg_logprob": + -0.08241633574167888, "compression_ratio": 1.8300970873786409, "no_speech_prob": + 0.0007727486663497984}, {"id": 359, "seek": 278388, "start": 2792.44, "end": 2799.0, + "text": " most of them up to now are open source and it''s interesting distinction + because some businesses", "tokens": [50792, 881, 295, 552, 493, 281, 586, 366, 1269, + 4009, 293, 309, 311, 1880, 16844, 570, 512, 6011, 51120], "temperature": 0.0, "avg_logprob": + -0.08241633574167888, "compression_ratio": 1.8300970873786409, "no_speech_prob": + 0.0007727486663497984}, {"id": 360, "seek": 278388, "start": 2799.0, "end": 2806.44, + "text": " decide you know we will keep it close because it''s at the core of what + we offer and you know", "tokens": [51120, 4536, 291, 458, 321, 486, 1066, 309, 1998, + 570, 309, 311, 412, 264, 4965, 295, 437, 321, 2626, 293, 291, 458, 51492], "temperature": + 0.0, "avg_logprob": -0.08241633574167888, "compression_ratio": 1.8300970873786409, + "no_speech_prob": 0.0007727486663497984}, {"id": 361, "seek": 278388, "start": 2806.44, + "end": 2811.1600000000003, "text": " maybe there are some risk elements involved + for them maybe something else but that''s that that''s", "tokens": [51492, 1310, + 456, 366, 512, 3148, 4959, 3288, 337, 552, 1310, 746, 1646, 457, 300, 311, 300, + 300, 311, 51728], "temperature": 0.0, "avg_logprob": -0.08241633574167888, "compression_ratio": + 1.8300970873786409, "no_speech_prob": 
0.0007727486663497984}, {"id": 362, "seek": + 281116, "start": 2811.16, "end": 2819.24, "text": " their choice your choice was + to open source v avid can you talk a bit more about it um yeah sure", "tokens": + [50364, 641, 3922, 428, 3922, 390, 281, 1269, 4009, 371, 1305, 327, 393, 291, 751, + 257, 857, 544, 466, 309, 1105, 1338, 988, 50768], "temperature": 0.0, "avg_logprob": + -0.16863766761675272, "compression_ratio": 1.7804878048780488, "no_speech_prob": + 0.0011968525359407067}, {"id": 363, "seek": 281116, "start": 2819.24, "end": 2826.8399999999997, + "text": " yeah sure sure so the um so that goes back to us like so if we so you + can if you have a use case", "tokens": [50768, 1338, 988, 988, 370, 264, 1105, 370, + 300, 1709, 646, 281, 505, 411, 370, 498, 321, 370, 291, 393, 498, 291, 362, 257, + 764, 1389, 51148], "temperature": 0.0, "avg_logprob": -0.16863766761675272, "compression_ratio": + 1.7804878048780488, "no_speech_prob": 0.0011968525359407067}, {"id": 364, "seek": + 281116, "start": 2826.8399999999997, "end": 2833.3199999999997, "text": " you can + package things together right that goes from the the the lowest level of the technology + so", "tokens": [51148, 291, 393, 7372, 721, 1214, 558, 300, 1709, 490, 264, 264, + 264, 12437, 1496, 295, 264, 2899, 370, 51472], "temperature": 0.0, "avg_logprob": + -0.16863766761675272, "compression_ratio": 1.7804878048780488, "no_speech_prob": + 0.0011968525359407067}, {"id": 365, "seek": 283332, "start": 2833.4, "end": 2840.76, + "text": " just you know where the the the bits by begin and then where the where + the index sits and how", "tokens": [50368, 445, 291, 458, 689, 264, 264, 264, 9239, + 538, 1841, 293, 550, 689, 264, 689, 264, 8186, 12696, 293, 577, 50736], "temperature": + 0.0, "avg_logprob": -0.15365365965176472, "compression_ratio": 2.0638297872340425, + "no_speech_prob": 0.00046489431406371295}, {"id": 366, "seek": 283332, "start": + 2840.76, "end": 2845.8, "text": " how it works and how it''s 
optimized and how it''s + scalable and then you go up on up and then you get", "tokens": [50736, 577, 309, + 1985, 293, 577, 309, 311, 26941, 293, 577, 309, 311, 38481, 293, 550, 291, 352, + 493, 322, 493, 293, 550, 291, 483, 50988], "temperature": 0.0, "avg_logprob": -0.15365365965176472, + "compression_ratio": 2.0638297872340425, "no_speech_prob": 0.00046489431406371295}, + {"id": 367, "seek": 283332, "start": 2845.8, "end": 2850.92, "text": " to to these + to these modules that you might want to use and then you get to these packages of", + "tokens": [50988, 281, 281, 613, 281, 613, 16679, 300, 291, 1062, 528, 281, 764, + 293, 550, 291, 483, 281, 613, 17401, 295, 51244], "temperature": 0.0, "avg_logprob": + -0.15365365965176472, "compression_ratio": 2.0638297872340425, "no_speech_prob": + 0.00046489431406371295}, {"id": 368, "seek": 283332, "start": 2850.92, "end": 2856.52, + "text": " additional tools that you might want to use for specific use case and + then the question sits like", "tokens": [51244, 4497, 3873, 300, 291, 1062, 528, + 281, 764, 337, 2685, 764, 1389, 293, 550, 264, 1168, 12696, 411, 51524], "temperature": + 0.0, "avg_logprob": -0.15365365965176472, "compression_ratio": 2.0638297872340425, + "no_speech_prob": 0.00046489431406371295}, {"id": 369, "seek": 283332, "start": + 2856.52, "end": 2863.2400000000002, "text": " okay where does the most value come + from and what do people need to actually use this in production", "tokens": [51524, + 1392, 689, 775, 264, 881, 2158, 808, 490, 293, 437, 360, 561, 643, 281, 767, 764, + 341, 294, 4265, 51860], "temperature": 0.0, "avg_logprob": -0.15365365965176472, + "compression_ratio": 2.0638297872340425, "no_speech_prob": 0.00046489431406371295}, + {"id": 370, "seek": 286324, "start": 2863.3999999999996, "end": 2870.6, "text": + " and um what you try to do is that you try to somehow capture that value and then + there are two things", "tokens": [50372, 293, 1105, 437, 291, 853, 281, 360, 307, + 300, 291, 
853, 281, 6063, 7983, 300, 2158, 293, 550, 456, 366, 732, 721, 50732], + "temperature": 0.0, "avg_logprob": -0.17146490038055734, "compression_ratio": 1.7456140350877194, + "no_speech_prob": 0.0007775037083774805}, {"id": 371, "seek": 286324, "start": 2870.6, + "end": 2876.12, "text": " that we see in the case of vv8 because vv8 of course all + these also with our competitors we evolve", "tokens": [50732, 300, 321, 536, 294, + 264, 1389, 295, 371, 85, 23, 570, 371, 85, 23, 295, 1164, 439, 613, 611, 365, 527, + 18333, 321, 16693, 51008], "temperature": 0.0, "avg_logprob": -0.17146490038055734, + "compression_ratio": 1.7456140350877194, "no_speech_prob": 0.0007775037083774805}, + {"id": 372, "seek": 286324, "start": 2876.12, "end": 2882.9199999999996, "text": + " in different directions right which is good I think um uh is that we said like + well there''s a lot", "tokens": [51008, 294, 819, 11095, 558, 597, 307, 665, 286, + 519, 1105, 2232, 307, 300, 321, 848, 411, 731, 456, 311, 257, 688, 51348], "temperature": + 0.0, "avg_logprob": -0.17146490038055734, "compression_ratio": 1.7456140350877194, + "no_speech_prob": 0.0007775037083774805}, {"id": 373, "seek": 286324, "start": 2882.9199999999996, + "end": 2888.3599999999997, "text": " of value so if you look at our enterprise customers + right what''s very important for them is that um", "tokens": [51348, 295, 2158, + 370, 498, 291, 574, 412, 527, 14132, 4581, 558, 437, 311, 588, 1021, 337, 552, 307, + 300, 1105, 51620], "temperature": 0.0, "avg_logprob": -0.17146490038055734, "compression_ratio": + 1.7456140350877194, "no_speech_prob": 0.0007775037083774805}, {"id": 374, "seek": + 288836, "start": 2889.32, "end": 2894.04, "text": " uh that they want to have certain + SLAs that they want to have certain um sometimes they want to have", "tokens": [50412, + 2232, 300, 436, 528, 281, 362, 1629, 22999, 10884, 300, 436, 528, 281, 362, 1629, + 1105, 2171, 436, 528, 281, 362, 50648], "temperature": 0.0, "avg_logprob": 
-0.21355586581759983, + "compression_ratio": 1.9899497487437185, "no_speech_prob": 0.0014493642374873161}, + {"id": 375, "seek": 288836, "start": 2894.04, "end": 2898.6, "text": " a size of + things sometimes they want to use things that are in these packages sometimes they + want to", "tokens": [50648, 257, 2744, 295, 721, 2171, 436, 528, 281, 764, 721, + 300, 366, 294, 613, 17401, 2171, 436, 528, 281, 50876], "temperature": 0.0, "avg_logprob": + -0.21355586581759983, "compression_ratio": 1.9899497487437185, "no_speech_prob": + 0.0014493642374873161}, {"id": 376, "seek": 288836, "start": 2898.6, "end": 2905.1600000000003, + "text": " have specific models they all can do that uh but um that is where the + most of the value for them", "tokens": [50876, 362, 2685, 5245, 436, 439, 393, 360, + 300, 2232, 457, 1105, 300, 307, 689, 264, 881, 295, 264, 2158, 337, 552, 51204], + "temperature": 0.0, "avg_logprob": -0.21355586581759983, "compression_ratio": 1.9899497487437185, + "no_speech_prob": 0.0014493642374873161}, {"id": 377, "seek": 288836, "start": 2905.1600000000003, + "end": 2912.92, "text": " is coming from they need vv8 to do it''s always a vv8 + that''s hard but it um uh seldomly the people", "tokens": [51204, 307, 1348, 490, + 436, 643, 371, 85, 23, 281, 360, 309, 311, 1009, 257, 371, 85, 23, 300, 311, 1152, + 457, 309, 1105, 2232, 47717, 356, 264, 561, 51592], "temperature": 0.0, "avg_logprob": + -0.21355586581759983, "compression_ratio": 1.9899497487437185, "no_speech_prob": + 0.0014493642374873161}, {"id": 378, "seek": 291292, "start": 2912.92, "end": 2918.92, + "text": " specifically ask what effect the search engine so if you go back to that + example what I gave like", "tokens": [50364, 4682, 1029, 437, 1802, 264, 3164, 2848, + 370, 498, 291, 352, 646, 281, 300, 1365, 437, 286, 2729, 411, 50664], "temperature": + 0.0, "avg_logprob": -0.15758454695991847, "compression_ratio": 1.8888888888888888, + "no_speech_prob": 0.006236704532057047}, {"id": 379, 
"seek": 291292, "start": 2919.56, + "end": 2924.92, "text": " these famous search engines that you now have around are + not promoted as like we have the best", "tokens": [50696, 613, 4618, 3164, 12982, + 300, 291, 586, 362, 926, 366, 406, 21162, 382, 411, 321, 362, 264, 1151, 50964], + "temperature": 0.0, "avg_logprob": -0.15758454695991847, "compression_ratio": 1.8888888888888888, + "no_speech_prob": 0.006236704532057047}, {"id": 380, "seek": 291292, "start": 2924.92, + "end": 2930.36, "text": " infertile index uh uh piece of software that exists that + goes a little bit the same for us so", "tokens": [50964, 13596, 83, 794, 8186, 2232, + 2232, 2522, 295, 4722, 300, 8198, 300, 1709, 257, 707, 857, 264, 912, 337, 505, + 370, 51236], "temperature": 0.0, "avg_logprob": -0.15758454695991847, "compression_ratio": + 1.8888888888888888, "no_speech_prob": 0.006236704532057047}, {"id": 381, "seek": + 291292, "start": 2931.0, "end": 2937.56, "text": " then you can select well if that''s + the case we could consider open sourcing it and then you", "tokens": [51268, 550, + 291, 393, 3048, 731, 498, 300, 311, 264, 1389, 321, 727, 1949, 1269, 11006, 2175, + 309, 293, 550, 291, 51596], "temperature": 0.0, "avg_logprob": -0.15758454695991847, + "compression_ratio": 1.8888888888888888, "no_speech_prob": 0.006236704532057047}, + {"id": 382, "seek": 291292, "start": 2937.56, "end": 2942.36, "text": " can say + well I can make a pro list and I can make a con list of open sourcing so because + I have a", "tokens": [51596, 393, 584, 731, 286, 393, 652, 257, 447, 1329, 293, + 286, 393, 652, 257, 416, 1329, 295, 1269, 11006, 2175, 370, 570, 286, 362, 257, + 51836], "temperature": 0.0, "avg_logprob": -0.15758454695991847, "compression_ratio": + 1.8888888888888888, "no_speech_prob": 0.006236704532057047}, {"id": 383, "seek": + 294236, "start": 2942.36, "end": 2949.2400000000002, "text": " business model right + I know I want to build my business so what would be the the um a pro of open", 
"tokens": + [50364, 1606, 2316, 558, 286, 458, 286, 528, 281, 1322, 452, 1606, 370, 437, 576, + 312, 264, 264, 1105, 257, 447, 295, 1269, 50708], "temperature": 0.0, "avg_logprob": + -0.12487714017023806, "compression_ratio": 1.8935361216730038, "no_speech_prob": + 0.0010102689266204834}, {"id": 384, "seek": 294236, "start": 2949.2400000000002, + "end": 2954.92, "text": " sourcing it well one is transparency so you say like well + we''re building something completely new", "tokens": [50708, 11006, 2175, 309, 731, + 472, 307, 17131, 370, 291, 584, 411, 731, 321, 434, 2390, 746, 2584, 777, 50992], + "temperature": 0.0, "avg_logprob": -0.12487714017023806, "compression_ratio": 1.8935361216730038, + "no_speech_prob": 0.0010102689266204834}, {"id": 385, "seek": 294236, "start": 2954.92, + "end": 2960.92, "text": " it sounds all very fancy we''re gonna show the world that + we''re not you know we can actually uh do this", "tokens": [50992, 309, 3263, 439, + 588, 10247, 321, 434, 799, 855, 264, 1002, 300, 321, 434, 406, 291, 458, 321, 393, + 767, 2232, 360, 341, 51292], "temperature": 0.0, "avg_logprob": -0.12487714017023806, + "compression_ratio": 1.8935361216730038, "no_speech_prob": 0.0010102689266204834}, + {"id": 386, "seek": 294236, "start": 2960.92, "end": 2966.44, "text": " so I I very + fancy told you like I was like well we even have parts of assembly in the in the + uh in", "tokens": [51292, 370, 286, 286, 588, 10247, 1907, 291, 411, 286, 390, 411, + 731, 321, 754, 362, 3166, 295, 12103, 294, 264, 294, 264, 2232, 294, 51568], "temperature": + 0.0, "avg_logprob": -0.12487714017023806, "compression_ratio": 1.8935361216730038, + "no_speech_prob": 0.0010102689266204834}, {"id": 387, "seek": 294236, "start": 2966.44, + "end": 2970.36, "text": " the code well you can actually see that right so you can + see how it''s optimized you can see how it''s", "tokens": [51568, 264, 3089, 731, + 291, 393, 767, 536, 300, 558, 370, 291, 393, 536, 577, 309, 311, 26941, 291, 393, 
+ 536, 577, 309, 311, 51764], "temperature": 0.0, "avg_logprob": -0.12487714017023806, + "compression_ratio": 1.8935361216730038, "no_speech_prob": 0.0010102689266204834}, + {"id": 388, "seek": 297036, "start": 2970.36, "end": 2977.88, "text": " optimized + as one so what that has uh as an effect is that it builds trust so the second thing + that", "tokens": [50364, 26941, 382, 472, 370, 437, 300, 575, 2232, 382, 364, 1802, + 307, 300, 309, 15182, 3361, 370, 264, 1150, 551, 300, 50740], "temperature": 0.0, + "avg_logprob": -0.18695352243822674, "compression_ratio": 1.7625570776255708, "no_speech_prob": + 0.0007752158562652767}, {"id": 389, "seek": 297036, "start": 2977.88, "end": 2982.92, + "text": " happens is as I mentioned before we need to learn what these use cases + are that people are building", "tokens": [50740, 2314, 307, 382, 286, 2835, 949, + 321, 643, 281, 1466, 437, 613, 764, 3331, 366, 300, 561, 366, 2390, 50992], "temperature": + 0.0, "avg_logprob": -0.18695352243822674, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.0007752158562652767}, {"id": 390, "seek": 297036, "start": 2982.92, + "end": 2989.48, "text": " with vector search so we see people are building like + crazy so our downloads are going up up", "tokens": [50992, 365, 8062, 3164, 370, + 321, 536, 561, 366, 2390, 411, 3219, 370, 527, 36553, 366, 516, 493, 493, 51320], + "temperature": 0.0, "avg_logprob": -0.18695352243822674, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.0007752158562652767}, {"id": 391, "seek": 297036, "start": 2989.48, + "end": 2996.76, "text": " over time and the other day somebody just published a + great uh article about how they how they", "tokens": [51320, 670, 565, 293, 264, + 661, 786, 2618, 445, 6572, 257, 869, 2232, 7222, 466, 577, 436, 577, 436, 51684], + "temperature": 0.0, "avg_logprob": -0.18695352243822674, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.0007752158562652767}, {"id": 392, "seek": 299676, 
"start": 2996.76, + "end": 3002.28, "text": " index 60 million data objects than we''ve yet were open + source users but what we have what we", "tokens": [50364, 8186, 4060, 2459, 1412, + 6565, 813, 321, 600, 1939, 645, 1269, 4009, 5022, 457, 437, 321, 362, 437, 321, + 50640], "temperature": 0.0, "avg_logprob": -0.283942379794278, "compression_ratio": + 1.7924528301886793, "no_speech_prob": 0.0014376367907971144}, {"id": 393, "seek": + 299676, "start": 3002.28, "end": 3007.7200000000003, "text": " learned from it we + have this we they it''s like um they''re they''re so kind to do they''re basically", + "tokens": [50640, 3264, 490, 309, 321, 362, 341, 321, 436, 309, 311, 411, 1105, + 436, 434, 436, 434, 370, 733, 281, 360, 436, 434, 1936, 50912], "temperature": 0.0, + "avg_logprob": -0.283942379794278, "compression_ratio": 1.7924528301886793, "no_speech_prob": + 0.0014376367907971144}, {"id": 394, "seek": 299676, "start": 3007.7200000000003, + "end": 3013.4, "text": " promoting we''ve yet they get a seat back they gave us + help etc but there''s also another thing um", "tokens": [50912, 16383, 321, 600, + 1939, 436, 483, 257, 6121, 646, 436, 2729, 505, 854, 5183, 457, 456, 311, 611, 1071, + 551, 1105, 51196], "temperature": 0.0, "avg_logprob": -0.283942379794278, "compression_ratio": + 1.7924528301886793, "no_speech_prob": 0.0014376367907971144}, {"id": 395, "seek": + 299676, "start": 3014.2000000000003, "end": 3020.6000000000004, "text": " sometimes + an open source user finds a bug or finds something and the way that of course that", + "tokens": [51236, 2171, 364, 1269, 4009, 4195, 10704, 257, 7426, 420, 10704, 746, + 293, 264, 636, 300, 295, 1164, 300, 51556], "temperature": 0.0, "avg_logprob": -0.283942379794278, + "compression_ratio": 1.7924528301886793, "no_speech_prob": 0.0014376367907971144}, + {"id": 396, "seek": 302060, "start": 3021.0, "end": 3026.7599999999998, "text": + " that the that the the the the software ecosystem is structured is at the moment + 
that the fix comes in", "tokens": [50384, 300, 264, 300, 264, 264, 264, 264, 4722, + 11311, 307, 18519, 307, 412, 264, 1623, 300, 264, 3191, 1487, 294, 50672], "temperature": + 0.0, "avg_logprob": -0.15340190098203463, "compression_ratio": 1.9411764705882353, + "no_speech_prob": 0.00046640701475553215}, {"id": 397, "seek": 302060, "start": + 3028.52, "end": 3035.4, "text": " our customers have that fix as well so it''s a + it''s a win-win so and the thing is customers don''t", "tokens": [50760, 527, 4581, + 362, 300, 3191, 382, 731, 370, 309, 311, 257, 309, 311, 257, 1942, 12, 9136, 370, + 293, 264, 551, 307, 4581, 500, 380, 51104], "temperature": 0.0, "avg_logprob": -0.15340190098203463, + "compression_ratio": 1.9411764705882353, "no_speech_prob": 0.00046640701475553215}, + {"id": 398, "seek": 302060, "start": 3035.4, "end": 3040.36, "text": " mind that + are open source users because if I have a customer that says like hey or prospect + because", "tokens": [51104, 1575, 300, 366, 1269, 4009, 5022, 570, 498, 286, 362, + 257, 5474, 300, 1619, 411, 4177, 420, 15005, 570, 51352], "temperature": 0.0, "avg_logprob": + -0.15340190098203463, "compression_ratio": 1.9411764705882353, "no_speech_prob": + 0.00046640701475553215}, {"id": 399, "seek": 302060, "start": 3040.36, "end": 3045.08, + "text": " hey Bob can I also use the open source version to say of course you can + but if you manage it yourself", "tokens": [51352, 4177, 6085, 393, 286, 611, 764, + 264, 1269, 4009, 3037, 281, 584, 295, 1164, 291, 393, 457, 498, 291, 3067, 309, + 1803, 51588], "temperature": 0.0, "avg_logprob": -0.15340190098203463, "compression_ratio": + 1.9411764705882353, "no_speech_prob": 0.00046640701475553215}, {"id": 400, "seek": + 302060, "start": 3045.08, "end": 3048.8399999999997, "text": " you''re stuck with + the open source license if something goes wrong and you can choose the sound", "tokens": + [51588, 291, 434, 5541, 365, 264, 1269, 4009, 10476, 498, 746, 1709, 2085, 293, + 291, 393, 
2826, 264, 1626, 51776], "temperature": 0.0, "avg_logprob": -0.15340190098203463, + "compression_ratio": 1.9411764705882353, "no_speech_prob": 0.00046640701475553215}, + {"id": 401, "seek": 304884, "start": 3048.92, "end": 3055.1600000000003, "text": + " software and then they go like well we we want all that so then it''s interesting + for them to um", "tokens": [50368, 4722, 293, 550, 436, 352, 411, 731, 321, 321, + 528, 439, 300, 370, 550, 309, 311, 1880, 337, 552, 281, 1105, 50680], "temperature": + 0.0, "avg_logprob": -0.23480790236900592, "compression_ratio": 1.7190476190476192, + "no_speech_prob": 0.0013952752342447639}, {"id": 402, "seek": 304884, "start": 3055.1600000000003, + "end": 3063.88, "text": " um uh uh uh to to to buy license what''s also important + to know is that these companies", "tokens": [50680, 1105, 2232, 2232, 2232, 281, + 281, 281, 2256, 10476, 437, 311, 611, 1021, 281, 458, 307, 300, 613, 3431, 51116], + "temperature": 0.0, "avg_logprob": -0.23480790236900592, "compression_ratio": 1.7190476190476192, + "no_speech_prob": 0.0013952752342447639}, {"id": 403, "seek": 304884, "start": 3064.76, + "end": 3070.52, "text": " like ours were young companies so you''re also tried to + position yourself in the field and you", "tokens": [51160, 411, 11896, 645, 2037, + 3431, 370, 291, 434, 611, 3031, 281, 2535, 1803, 294, 264, 2519, 293, 291, 51448], + "temperature": 0.0, "avg_logprob": -0.23480790236900592, "compression_ratio": 1.7190476190476192, + "no_speech_prob": 0.0013952752342447639}, {"id": 404, "seek": 304884, "start": 3070.52, + "end": 3078.6000000000004, "text": " try to show what you can do and I think that + open source is an amazing vehicle um uh", "tokens": [51448, 853, 281, 855, 437, + 291, 393, 360, 293, 286, 519, 300, 1269, 4009, 307, 364, 2243, 5864, 1105, 2232, + 51852], "temperature": 0.0, "avg_logprob": -0.23480790236900592, "compression_ratio": + 1.7190476190476192, "no_speech_prob": 0.0013952752342447639}, {"id": 405, "seek": 
+ 307860, "start": 3079.0, "end": 3085.08, "text": " because as you probably know + the the the the the open source community can be very direct", "tokens": [50384, + 570, 382, 291, 1391, 458, 264, 264, 264, 264, 264, 1269, 4009, 1768, 393, 312, 588, + 2047, 50688], "temperature": 0.0, "avg_logprob": -0.10953863871466253, "compression_ratio": + 1.8625592417061612, "no_speech_prob": 0.0003226997214369476}, {"id": 406, "seek": + 307860, "start": 3085.72, "end": 3091.0, "text": " and that is great because then + you learn from it and you can make things better so we''ve learned a lot", "tokens": + [50720, 293, 300, 307, 869, 570, 550, 291, 1466, 490, 309, 293, 291, 393, 652, 721, + 1101, 370, 321, 600, 3264, 257, 688, 50984], "temperature": 0.0, "avg_logprob": + -0.10953863871466253, "compression_ratio": 1.8625592417061612, "no_speech_prob": + 0.0003226997214369476}, {"id": 407, "seek": 307860, "start": 3091.0, "end": 3100.92, + "text": " from the from the um community so all in all it''s currently is it is + a net win to have it open source", "tokens": [50984, 490, 264, 490, 264, 1105, 1768, + 370, 439, 294, 439, 309, 311, 4362, 307, 309, 307, 257, 2533, 1942, 281, 362, 309, + 1269, 4009, 51480], "temperature": 0.0, "avg_logprob": -0.10953863871466253, "compression_ratio": + 1.8625592417061612, "no_speech_prob": 0.0003226997214369476}, {"id": 408, "seek": + 307860, "start": 3100.92, "end": 3107.16, "text": " and it''s because it''s not + it''s it''s helping us from an outreach point of view it helps us to build", "tokens": + [51480, 293, 309, 311, 570, 309, 311, 406, 309, 311, 309, 311, 4315, 505, 490, 364, + 19638, 935, 295, 1910, 309, 3665, 505, 281, 1322, 51792], "temperature": 0.0, "avg_logprob": + -0.10953863871466253, "compression_ratio": 1.8625592417061612, "no_speech_prob": + 0.0003226997214369476}, {"id": 409, "seek": 310716, "start": 3107.16, "end": 3116.3599999999997, + "text": " community and um it''s not biting our uh business strategy yeah but that''s + 
well put uh but I", "tokens": [50364, 1768, 293, 1105, 309, 311, 406, 32912, 527, + 2232, 1606, 5206, 1338, 457, 300, 311, 731, 829, 2232, 457, 286, 50824], "temperature": + 0.0, "avg_logprob": -0.1198518303003204, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.00029823093791492283}, {"id": 410, "seek": 310716, "start": + 3116.3599999999997, "end": 3120.68, "text": " wanted to still come back on the open + source a little bit you know like you did mention these key", "tokens": [50824, + 1415, 281, 920, 808, 646, 322, 264, 1269, 4009, 257, 707, 857, 291, 458, 411, 291, + 630, 2152, 613, 2141, 51040], "temperature": 0.0, "avg_logprob": -0.1198518303003204, + "compression_ratio": 1.7625570776255708, "no_speech_prob": 0.00029823093791492283}, + {"id": 411, "seek": 310716, "start": 3120.68, "end": 3127.16, "text": " elements + that are not positive for you and they natively embed into your business model so + to say", "tokens": [51040, 4959, 300, 366, 406, 3353, 337, 291, 293, 436, 8470, + 356, 12240, 666, 428, 1606, 2316, 370, 281, 584, 51364], "temperature": 0.0, "avg_logprob": + -0.1198518303003204, "compression_ratio": 1.7625570776255708, "no_speech_prob": + 0.00029823093791492283}, {"id": 412, "seek": 310716, "start": 3127.16, "end": 3135.24, + "text": " right um but there is also one element that the open source like if you + compare this this to close", "tokens": [51364, 558, 1105, 457, 456, 307, 611, 472, + 4478, 300, 264, 1269, 4009, 411, 498, 291, 6794, 341, 341, 281, 1998, 51768], "temperature": + 0.0, "avg_logprob": -0.1198518303003204, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.00029823093791492283}, {"id": 413, "seek": 313524, "start": + 3135.24, "end": 3141.24, "text": " source let''s say the way this would look inside + your company is that you have internal you know", "tokens": [50364, 4009, 718, 311, + 584, 264, 636, 341, 576, 574, 1854, 428, 2237, 307, 300, 291, 362, 6920, 291, 458, + 50664], "temperature": 0.0, 
"avg_logprob": -0.13683911837064303, "compression_ratio": + 1.5945945945945945, "no_speech_prob": 0.0012639162596315145}, {"id": 414, "seek": + 313524, "start": 3141.24, "end": 3147.8799999999997, "text": " roadmap planning + and you chagelong just releasing stuff right and then you go directly through your", + "tokens": [50664, 35738, 5038, 293, 291, 417, 559, 338, 556, 445, 16327, 1507, 558, + 293, 550, 291, 352, 3838, 807, 428, 50996], "temperature": 0.0, "avg_logprob": -0.13683911837064303, + "compression_ratio": 1.5945945945945945, "no_speech_prob": 0.0012639162596315145}, + {"id": 415, "seek": 313524, "start": 3147.8799999999997, "end": 3155.7999999999997, + "text": " sales you know to upgrade this installation so you become like a uh kind + of corporate type of thing", "tokens": [50996, 5763, 291, 458, 281, 11484, 341, + 13260, 370, 291, 1813, 411, 257, 2232, 733, 295, 10896, 2010, 295, 551, 51392], + "temperature": 0.0, "avg_logprob": -0.13683911837064303, "compression_ratio": 1.5945945945945945, + "no_speech_prob": 0.0012639162596315145}, {"id": 416, "seek": 315580, "start": 3155.88, + "end": 3165.48, "text": " to be deployments right um on the open source side you + need to do an upfront work to maintain the", "tokens": [50368, 281, 312, 7274, 1117, + 558, 1105, 322, 264, 1269, 4009, 1252, 291, 643, 281, 360, 364, 30264, 589, 281, + 6909, 264, 50848], "temperature": 0.0, "avg_logprob": -0.09270276342119489, "compression_ratio": + 1.7017543859649122, "no_speech_prob": 0.00421164883300662}, {"id": 417, "seek": + 315580, "start": 3165.48, "end": 3169.6400000000003, "text": " connection you''ve + built the community but you need to keep talking to the community right that''s + a", "tokens": [50848, 4984, 291, 600, 3094, 264, 1768, 457, 291, 643, 281, 1066, + 1417, 281, 264, 1768, 558, 300, 311, 257, 51056], "temperature": 0.0, "avg_logprob": + -0.09270276342119489, "compression_ratio": 1.7017543859649122, "no_speech_prob": + 0.00421164883300662}, {"id": 418, 
"seek": 315580, "start": 3169.6400000000003, "end": + 3177.1600000000003, "text": " lot of work as well so how do you see that part of + the story yeah so that''s a very interesting", "tokens": [51056, 688, 295, 589, + 382, 731, 370, 577, 360, 291, 536, 300, 644, 295, 264, 1657, 1338, 370, 300, 311, + 257, 588, 1880, 51432], "temperature": 0.0, "avg_logprob": -0.09270276342119489, + "compression_ratio": 1.7017543859649122, "no_speech_prob": 0.00421164883300662}, + {"id": 419, "seek": 317716, "start": 3177.16, "end": 3187.24, "text": " question + actually so the um so what you see with it so I so in the end I think that community", + "tokens": [50364, 1168, 767, 370, 264, 1105, 370, 437, 291, 536, 365, 309, 370, + 286, 370, 294, 264, 917, 286, 519, 300, 1768, 50868], "temperature": 0.0, "avg_logprob": + -0.1713361144065857, "compression_ratio": 1.5773809523809523, "no_speech_prob": + 0.0005544294253922999}, {"id": 420, "seek": 317716, "start": 3187.24, "end": 3196.2, + "text": " is not something that is um just a open source thing so let me give you + an example", "tokens": [50868, 307, 406, 746, 300, 307, 1105, 445, 257, 1269, 4009, + 551, 370, 718, 385, 976, 291, 364, 1365, 51316], "temperature": 0.0, "avg_logprob": + -0.1713361144065857, "compression_ratio": 1.5773809523809523, "no_speech_prob": + 0.0005544294253922999}, {"id": 421, "seek": 317716, "start": 3198.12, "end": 3205.16, + "text": " a database company that I find very interesting is how they operate is + a snowflake right", "tokens": [51412, 257, 8149, 2237, 300, 286, 915, 588, 1880, + 307, 577, 436, 9651, 307, 257, 44124, 619, 558, 51764], "temperature": 0.0, "avg_logprob": + -0.1713361144065857, "compression_ratio": 1.5773809523809523, "no_speech_prob": + 0.0005544294253922999}, {"id": 422, "seek": 320516, "start": 3205.16, "end": 3209.64, + "text": " can easily set up because of data warehouse you know no no no vector search + so I can easily say that", "tokens": [50364, 393, 3612, 992, 493, 570, 
295, 1412, + 22244, 291, 458, 572, 572, 572, 8062, 3164, 370, 286, 393, 3612, 584, 300, 50588], + "temperature": 0.0, "avg_logprob": -0.1774881567273821, "compression_ratio": 1.8082706766917294, + "no_speech_prob": 0.001721435459330678}, {"id": 423, "seek": 320516, "start": 3210.68, + "end": 3215.24, "text": " no but so I find it very interesting and I sometimes talk + to people and they tell me you know", "tokens": [50640, 572, 457, 370, 286, 915, + 309, 588, 1880, 293, 286, 2171, 751, 281, 561, 293, 436, 980, 385, 291, 458, 50868], + "temperature": 0.0, "avg_logprob": -0.1774881567273821, "compression_ratio": 1.8082706766917294, + "no_speech_prob": 0.001721435459330678}, {"id": 424, "seek": 320516, "start": 3215.96, + "end": 3221.3199999999997, "text": " what''s an amazing company in how they build + and help us build partnerships that''s snowflake", "tokens": [50904, 437, 311, 364, + 2243, 2237, 294, 577, 436, 1322, 293, 854, 505, 1322, 18245, 300, 311, 44124, 619, + 51172], "temperature": 0.0, "avg_logprob": -0.1774881567273821, "compression_ratio": + 1.8082706766917294, "no_speech_prob": 0.001721435459330678}, {"id": 425, "seek": + 320516, "start": 3221.3199999999997, "end": 3225.48, "text": " never like wow that''s + interesting so they explain to me how they do it so the point that I''m trying to", + "tokens": [51172, 1128, 411, 6076, 300, 311, 1880, 370, 436, 2903, 281, 385, 577, + 436, 360, 309, 370, 264, 935, 300, 286, 478, 1382, 281, 51380], "temperature": 0.0, + "avg_logprob": -0.1774881567273821, "compression_ratio": 1.8082706766917294, "no_speech_prob": + 0.001721435459330678}, {"id": 426, "seek": 320516, "start": 3225.48, "end": 3231.16, + "text": " make they apparently are doing a great job in building a community uh + but they''re of course", "tokens": [51380, 652, 436, 7970, 366, 884, 257, 869, 1691, + 294, 2390, 257, 1768, 2232, 457, 436, 434, 295, 1164, 51664], "temperature": 0.0, + "avg_logprob": -0.1774881567273821, "compression_ratio": 
1.8082706766917294, "no_speech_prob": + 0.001721435459330678}, {"id": 427, "seek": 323116, "start": 3231.16, "end": 3239.72, + "text": " completely closed source so you need to build a community you know either + way you need to have if", "tokens": [50364, 2584, 5395, 4009, 370, 291, 643, 281, + 1322, 257, 1768, 291, 458, 2139, 636, 291, 643, 281, 362, 498, 50792], "temperature": + 0.0, "avg_logprob": -0.1565284216275779, "compression_ratio": 1.7818181818181817, + "no_speech_prob": 0.0012625670060515404}, {"id": 428, "seek": 323116, "start": 3239.72, + "end": 3247.24, "text": " people don''t like your stuff um they''ll move away and + we know of a very famous database company", "tokens": [50792, 561, 500, 380, 411, + 428, 1507, 1105, 436, 603, 1286, 1314, 293, 321, 458, 295, 257, 588, 4618, 8149, + 2237, 51168], "temperature": 0.0, "avg_logprob": -0.1565284216275779, "compression_ratio": + 1.7818181818181817, "no_speech_prob": 0.0012625670060515404}, {"id": 429, "seek": + 323116, "start": 3247.24, "end": 3254.92, "text": " where that is happening and + uh so it''s a um it''s an old-fashioned company so that''s fine but that", "tokens": + [51168, 689, 300, 307, 2737, 293, 2232, 370, 309, 311, 257, 1105, 309, 311, 364, + 1331, 12, 37998, 2237, 370, 300, 311, 2489, 457, 300, 51552], "temperature": 0.0, + "avg_logprob": -0.1565284216275779, "compression_ratio": 1.7818181818181817, "no_speech_prob": + 0.0012625670060515404}, {"id": 430, "seek": 323116, "start": 3254.92, "end": 3258.52, + "text": " way actually learned from them like actually you want to be you want to + have a community you want to", "tokens": [51552, 636, 767, 3264, 490, 552, 411, + 767, 291, 528, 281, 312, 291, 528, 281, 362, 257, 1768, 291, 528, 281, 51732], "temperature": + 0.0, "avg_logprob": -0.1565284216275779, "compression_ratio": 1.7818181818181817, + "no_speech_prob": 0.0012625670060515404}, {"id": 431, "seek": 325852, "start": 3258.52, + "end": 3265.4, "text": " be nice you want to be great 
great products um because + some people you know then the best", "tokens": [50364, 312, 1481, 291, 528, 281, + 312, 869, 869, 3383, 1105, 570, 512, 561, 291, 458, 550, 264, 1151, 50708], "temperature": + 0.0, "avg_logprob": -0.17656687615622937, "compression_ratio": 1.572192513368984, + "no_speech_prob": 0.0003750216565094888}, {"id": 432, "seek": 325852, "start": 3265.4, + "end": 3271.56, "text": " marketing is basically worth of money and so the point + I''m trying to make is like I don''t think we''re", "tokens": [50708, 6370, 307, + 1936, 3163, 295, 1460, 293, 370, 264, 935, 286, 478, 1382, 281, 652, 307, 411, 286, + 500, 380, 519, 321, 434, 51016], "temperature": 0.0, "avg_logprob": -0.17656687615622937, + "compression_ratio": 1.572192513368984, "no_speech_prob": 0.0003750216565094888}, + {"id": 433, "seek": 325852, "start": 3271.56, "end": 3282.2, "text": " limited to + um uh uh so sorry I meant community is not a thing only for open source you somehow + want to", "tokens": [51016, 5567, 281, 1105, 2232, 2232, 370, 2597, 286, 4140, 1768, + 307, 406, 257, 551, 787, 337, 1269, 4009, 291, 6063, 528, 281, 51548], "temperature": + 0.0, "avg_logprob": -0.17656687615622937, "compression_ratio": 1.572192513368984, + "no_speech_prob": 0.0003750216565094888}, {"id": 434, "seek": 328220, "start": 3282.2, + "end": 3288.68, "text": " show uh uh uh failure and then you build community about + people using your technology saying something", "tokens": [50364, 855, 2232, 2232, + 2232, 7763, 293, 550, 291, 1322, 1768, 466, 561, 1228, 428, 2899, 1566, 746, 50688], + "temperature": 0.0, "avg_logprob": -0.15447090027180124, "compression_ratio": 1.7241379310344827, + "no_speech_prob": 0.0011803137604147196}, {"id": 435, "seek": 328220, "start": 3288.68, + "end": 3296.12, "text": " about your technology um etc yeah I mean absolutely I + mean it''s it''s just the I guess the essence of", "tokens": [50688, 466, 428, 2899, + 1105, 5183, 1338, 286, 914, 3122, 286, 914, 309, 311, 309, 
311, 445, 264, 286, 2041, + 264, 12801, 295, 51060], "temperature": 0.0, "avg_logprob": -0.15447090027180124, + "compression_ratio": 1.7241379310344827, "no_speech_prob": 0.0011803137604147196}, + {"id": 436, "seek": 328220, "start": 3296.12, "end": 3303.56, "text": " my question + was kind of like you know like if I if I maintain it as a closed source I can maintain", + "tokens": [51060, 452, 1168, 390, 733, 295, 411, 291, 458, 411, 498, 286, 498, 286, + 6909, 309, 382, 257, 5395, 4009, 286, 393, 6909, 51432], "temperature": 0.0, "avg_logprob": + -0.15447090027180124, "compression_ratio": 1.7241379310344827, "no_speech_prob": + 0.0011803137604147196}, {"id": 437, "seek": 328220, "start": 3303.56, "end": 3309.72, + "text": " my own standards and I can be let''s say soc to compliant right for the + auditing part of things so", "tokens": [51432, 452, 1065, 7787, 293, 286, 393, 312, + 718, 311, 584, 370, 66, 281, 36248, 558, 337, 264, 2379, 1748, 644, 295, 721, 370, + 51740], "temperature": 0.0, "avg_logprob": -0.15447090027180124, "compression_ratio": + 1.7241379310344827, "no_speech_prob": 0.0011803137604147196}, {"id": 438, "seek": + 330972, "start": 3309.72, "end": 3315.56, "text": " my business moves forward but + when I''m open source I need to maintain a different level of", "tokens": [50364, + 452, 1606, 6067, 2128, 457, 562, 286, 478, 1269, 4009, 286, 643, 281, 6909, 257, + 819, 1496, 295, 50656], "temperature": 0.0, "avg_logprob": -0.13052457571029663, + "compression_ratio": 1.672811059907834, "no_speech_prob": 0.0002788118436001241}, + {"id": 439, "seek": 330972, "start": 3315.56, "end": 3323.48, "text": " standard + like like documentation you know um code style you know uh the process of submitting", + "tokens": [50656, 3832, 411, 411, 14333, 291, 458, 1105, 3089, 3758, 291, 458, 2232, + 264, 1399, 295, 31836, 51052], "temperature": 0.0, "avg_logprob": -0.13052457571029663, + "compression_ratio": 1.672811059907834, "no_speech_prob": 
0.0002788118436001241}, + {"id": 440, "seek": 330972, "start": 3324.04, "end": 3329.7999999999997, "text": + " you know pull requests and how how can I influence the VIV direction and other + things you know", "tokens": [51080, 291, 458, 2235, 12475, 293, 577, 577, 393, 286, + 6503, 264, 691, 10375, 3513, 293, 661, 721, 291, 458, 51368], "temperature": 0.0, + "avg_logprob": -0.13052457571029663, "compression_ratio": 1.672811059907834, "no_speech_prob": + 0.0002788118436001241}, {"id": 441, "seek": 330972, "start": 3329.7999999999997, + "end": 3335.0, "text": " like it''s a lot of support on your side you basically + you support the clients right", "tokens": [51368, 411, 309, 311, 257, 688, 295, + 1406, 322, 428, 1252, 291, 1936, 291, 1406, 264, 6982, 558, 51628], "temperature": + 0.0, "avg_logprob": -0.13052457571029663, "compression_ratio": 1.672811059907834, + "no_speech_prob": 0.0002788118436001241}, {"id": 442, "seek": 333500, "start": 3335.08, + "end": 3341.08, "text": " like those that choose your deployments your hosted version + your cloud and then you need to", "tokens": [50368, 411, 729, 300, 2826, 428, 7274, + 1117, 428, 19204, 3037, 428, 4588, 293, 550, 291, 643, 281, 50668], "temperature": + 0.0, "avg_logprob": -0.10927551103674847, "compression_ratio": 1.79182156133829, + "no_speech_prob": 0.0040559167973697186}, {"id": 443, "seek": 333500, "start": 3341.08, + "end": 3345.72, "text": " support the community and I mean I''m not saying this + is a bad thing I''m saying this is like a", "tokens": [50668, 1406, 264, 1768, 293, + 286, 914, 286, 478, 406, 1566, 341, 307, 257, 1578, 551, 286, 478, 1566, 341, 307, + 411, 257, 50900], "temperature": 0.0, "avg_logprob": -0.10927551103674847, "compression_ratio": + 1.79182156133829, "no_speech_prob": 0.0040559167973697186}, {"id": 444, "seek": + 333500, "start": 3345.72, "end": 3351.08, "text": " portion of your business model + of your day-to-day life that is dedicated to that and you are doing", "tokens": + 
[50900, 8044, 295, 428, 1606, 2316, 295, 428, 786, 12, 1353, 12, 810, 993, 300, + 307, 8374, 281, 300, 293, 291, 366, 884, 51168], "temperature": 0.0, "avg_logprob": + -0.10927551103674847, "compression_ratio": 1.79182156133829, "no_speech_prob": 0.0040559167973697186}, + {"id": 445, "seek": 333500, "start": 3351.08, "end": 3356.36, "text": " a great + job at that by the way I''m like super amazed like positive you are always like + welcoming", "tokens": [51168, 257, 869, 1691, 412, 300, 538, 264, 636, 286, 478, + 411, 1687, 20507, 411, 3353, 291, 366, 1009, 411, 17378, 51432], "temperature": + 0.0, "avg_logprob": -0.10927551103674847, "compression_ratio": 1.79182156133829, + "no_speech_prob": 0.0040559167973697186}, {"id": 446, "seek": 333500, "start": 3356.36, + "end": 3362.68, "text": " on slack and the count keeps increasing like regularly + when I go back to the IVH Slack I''m like okay", "tokens": [51432, 322, 29767, 293, + 264, 1207, 5965, 5662, 411, 11672, 562, 286, 352, 646, 281, 264, 15967, 39, 37211, + 286, 478, 411, 1392, 51748], "temperature": 0.0, "avg_logprob": -0.10927551103674847, + "compression_ratio": 1.79182156133829, "no_speech_prob": 0.0040559167973697186}, + {"id": 447, "seek": 336268, "start": 3362.68, "end": 3367.96, "text": " just few + weeks ago it was 150 now it''s over over 200 like what''s going on so you know what + was", "tokens": [50364, 445, 1326, 3259, 2057, 309, 390, 8451, 586, 309, 311, 670, + 670, 2331, 411, 437, 311, 516, 322, 370, 291, 458, 437, 390, 50628], "temperature": + 0.0, "avg_logprob": -0.12872831638042742, "compression_ratio": 1.7309417040358743, + "no_speech_prob": 0.0009625297388993204}, {"id": 448, "seek": 336268, "start": 3367.96, + "end": 3374.04, "text": " doing a great job and and the whole team um but I mean + it''s work it''s work that''s what I''m trying", "tokens": [50628, 884, 257, 869, + 1691, 293, 293, 264, 1379, 1469, 1105, 457, 286, 914, 309, 311, 589, 309, 311, 589, + 300, 311, 437, 286, 478, 1382, 
50932], "temperature": 0.0, "avg_logprob": -0.12872831638042742, + "compression_ratio": 1.7309417040358743, "no_speech_prob": 0.0009625297388993204}, + {"id": 449, "seek": 336268, "start": 3374.04, "end": 3381.3999999999996, "text": + " yeah yeah well I mean running a startup in general is it''s a lot of work but + I so I hear your", "tokens": [50932, 1338, 1338, 731, 286, 914, 2614, 257, 18578, + 294, 2674, 307, 309, 311, 257, 688, 295, 589, 457, 286, 370, 286, 1568, 428, 51300], + "temperature": 0.0, "avg_logprob": -0.12872831638042742, "compression_ratio": 1.7309417040358743, + "no_speech_prob": 0.0009625297388993204}, {"id": 450, "seek": 336268, "start": 3381.3999999999996, + "end": 3387.24, "text": " argument but I''m just not I don''t 100% agree with the + argument so let me explain let me explain why", "tokens": [51300, 6770, 457, 286, + 478, 445, 406, 286, 500, 380, 2319, 4, 3986, 365, 264, 6770, 370, 718, 385, 2903, + 718, 385, 2903, 983, 51592], "temperature": 0.0, "avg_logprob": -0.12872831638042742, + "compression_ratio": 1.7309417040358743, "no_speech_prob": 0.0009625297388993204}, + {"id": 451, "seek": 338724, "start": 3387.8799999999997, "end": 3394.8399999999997, + "text": " so first of all like with the the simple example I spoke to I don''t know + if everybody knows", "tokens": [50396, 370, 700, 295, 439, 411, 365, 264, 264, 2199, + 1365, 286, 7179, 281, 286, 500, 380, 458, 498, 2201, 3255, 50744], "temperature": + 0.0, "avg_logprob": -0.2091286094100387, "compression_ratio": 1.8492063492063493, + "no_speech_prob": 0.00275393552146852}, {"id": 452, "seek": 338724, "start": 3394.8399999999997, + "end": 3398.68, "text": " what suck to is but there was listening to the podcast + but it''s like it''s the center right", "tokens": [50744, 437, 9967, 281, 307, 457, + 456, 390, 4764, 281, 264, 7367, 457, 309, 311, 411, 309, 311, 264, 3056, 558, 50936], + "temperature": 0.0, "avg_logprob": -0.2091286094100387, "compression_ratio": 1.8492063492063493, + 
"no_speech_prob": 0.00275393552146852}, {"id": 453, "seek": 338724, "start": 3398.68, + "end": 3404.2799999999997, "text": " you can have an open source product that is + suck to complying which is really is interesting", "tokens": [50936, 291, 393, 362, + 364, 1269, 4009, 1674, 300, 307, 9967, 281, 715, 7310, 597, 307, 534, 307, 1880, + 51216], "temperature": 0.0, "avg_logprob": -0.2091286094100387, "compression_ratio": + 1.8492063492063493, "no_speech_prob": 0.00275393552146852}, {"id": 454, "seek": + 338724, "start": 3404.2799999999997, "end": 3408.9199999999996, "text": " again + in from a business model perspective so you can say if you use this software open + source", "tokens": [51216, 797, 294, 490, 257, 1606, 2316, 4585, 370, 291, 393, + 584, 498, 291, 764, 341, 4722, 1269, 4009, 51448], "temperature": 0.0, "avg_logprob": + -0.2091286094100387, "compression_ratio": 1.8492063492063493, "no_speech_prob": + 0.00275393552146852}, {"id": 455, "seek": 338724, "start": 3409.4799999999996, "end": + 3413.3999999999996, "text": " it''s not suck to complying but if you use the exact + same software with the different license it", "tokens": [51476, 309, 311, 406, 9967, + 281, 715, 7310, 457, 498, 291, 764, 264, 1900, 912, 4722, 365, 264, 819, 10476, + 309, 51672], "temperature": 0.0, "avg_logprob": -0.2091286094100387, "compression_ratio": + 1.8492063492063493, "no_speech_prob": 0.00275393552146852}, {"id": 456, "seek": + 341340, "start": 3413.4, "end": 3419.32, "text": " is suck to complying so it''s + it''s a part of the open source business so that that''s one thing", "tokens": [50364, + 307, 9967, 281, 715, 7310, 370, 309, 311, 309, 311, 257, 644, 295, 264, 1269, 4009, + 1606, 370, 300, 300, 311, 472, 551, 50660], "temperature": 0.0, "avg_logprob": -0.17038284860006192, + "compression_ratio": 1.81, "no_speech_prob": 0.0010351674864068627}, {"id": 457, + "seek": 341340, "start": 3419.32, "end": 3425.0, "text": " the second thing about + for example maintaining 
documentation that is true but the thing is if you", "tokens": + [50660, 264, 1150, 551, 466, 337, 1365, 14916, 14333, 300, 307, 2074, 457, 264, + 551, 307, 498, 291, 50944], "temperature": 0.0, "avg_logprob": -0.17038284860006192, + "compression_ratio": 1.81, "no_speech_prob": 0.0010351674864068627}, {"id": 458, + "seek": 341340, "start": 3425.0, "end": 3432.12, "text": " have a a close source + solution somebody somehow needs to use the APIs as well so you still need", "tokens": + [50944, 362, 257, 257, 1998, 4009, 3827, 2618, 6063, 2203, 281, 764, 264, 21445, + 382, 731, 370, 291, 920, 643, 51300], "temperature": 0.0, "avg_logprob": -0.17038284860006192, + "compression_ratio": 1.81, "no_speech_prob": 0.0010351674864068627}, {"id": 459, + "seek": 341340, "start": 3433.48, "end": 3438.04, "text": " documentation for it + so you still need to maintain the documentation so um", "tokens": [51368, 14333, + 337, 309, 370, 291, 920, 643, 281, 6909, 264, 14333, 370, 1105, 51596], "temperature": + 0.0, "avg_logprob": -0.17038284860006192, "compression_ratio": 1.81, "no_speech_prob": + 0.0010351674864068627}, {"id": 460, "seek": 343804, "start": 3439.0, "end": 3445.0, + "text": " so there I''m not sure if that argument still if that if that argument + still still holds", "tokens": [50412, 370, 456, 286, 478, 406, 988, 498, 300, 6770, + 920, 498, 300, 498, 300, 6770, 920, 920, 9190, 50712], "temperature": 0.0, "avg_logprob": + -0.19377730109474875, "compression_ratio": 2.015957446808511, "no_speech_prob": + 0.001879061688669026}, {"id": 461, "seek": 343804, "start": 3445.88, "end": 3452.2, + "text": " the only thing that is sometimes difficult is that people ask a lot of + questions so you sometimes", "tokens": [50756, 264, 787, 551, 300, 307, 2171, 2252, + 307, 300, 561, 1029, 257, 688, 295, 1651, 370, 291, 2171, 51072], "temperature": + 0.0, "avg_logprob": -0.19377730109474875, "compression_ratio": 2.015957446808511, + "no_speech_prob": 0.001879061688669026}, {"id": 462, 
"seek": 343804, "start": 3452.2, + "end": 3457.08, "text": " see those on a slack they ask a lot of questions so you + want to be friendly and you want to um", "tokens": [51072, 536, 729, 322, 257, 29767, + 436, 1029, 257, 688, 295, 1651, 370, 291, 528, 281, 312, 9208, 293, 291, 528, 281, + 1105, 51316], "temperature": 0.0, "avg_logprob": -0.19377730109474875, "compression_ratio": + 2.015957446808511, "no_speech_prob": 0.001879061688669026}, {"id": 463, "seek": + 343804, "start": 3457.88, "end": 3464.68, "text": " um you know you want to answer + these questions there are two things related to that so one is like", "tokens": + [51356, 1105, 291, 458, 291, 528, 281, 1867, 613, 1651, 456, 366, 732, 721, 4077, + 281, 300, 370, 472, 307, 411, 51696], "temperature": 0.0, "avg_logprob": -0.19377730109474875, + "compression_ratio": 2.015957446808511, "no_speech_prob": 0.001879061688669026}, + {"id": 464, "seek": 346468, "start": 3464.8399999999997, "end": 3469.7999999999997, + "text": " at some point of course you know okay sneak people ask like a lot of questions + and they keep asking", "tokens": [50372, 412, 512, 935, 295, 1164, 291, 458, 1392, + 13164, 561, 1029, 411, 257, 688, 295, 1651, 293, 436, 1066, 3365, 50620], "temperature": + 0.0, "avg_logprob": -0.1851328380072295, "compression_ratio": 1.993220338983051, + "no_speech_prob": 0.002198383677750826}, {"id": 465, "seek": 346468, "start": 3469.7999999999997, + "end": 3474.04, "text": " over and sometimes you''re friendly say you know maybe + just watch this video first or read this part", "tokens": [50620, 670, 293, 2171, + 291, 434, 9208, 584, 291, 458, 1310, 445, 1159, 341, 960, 700, 420, 1401, 341, 644, + 50832], "temperature": 0.0, "avg_logprob": -0.1851328380072295, "compression_ratio": + 1.993220338983051, "no_speech_prob": 0.002198383677750826}, {"id": 466, "seek": + 346468, "start": 3474.04, "end": 3479.72, "text": " of documentation first or sometimes + I also do that in the end and then just kindly just 
you know", "tokens": [50832, + 295, 14333, 700, 420, 2171, 286, 611, 360, 300, 294, 264, 917, 293, 550, 445, 29736, + 445, 291, 458, 51116], "temperature": 0.0, "avg_logprob": -0.1851328380072295, "compression_ratio": + 1.993220338983051, "no_speech_prob": 0.002198383677750826}, {"id": 467, "seek": + 346468, "start": 3479.72, "end": 3484.2799999999997, "text": " and direct them in + it said I maybe you want to start you want to start there so that that is one", + "tokens": [51116, 293, 2047, 552, 294, 309, 848, 286, 1310, 291, 528, 281, 722, + 291, 528, 281, 722, 456, 370, 300, 300, 307, 472, 51344], "temperature": 0.0, "avg_logprob": + -0.1851328380072295, "compression_ratio": 1.993220338983051, "no_speech_prob": 0.002198383677750826}, + {"id": 468, "seek": 346468, "start": 3484.2799999999997, "end": 3489.3199999999997, + "text": " thing um there was something else that I wanted to say related to this + uh on the oh yeah and the", "tokens": [51344, 551, 1105, 456, 390, 746, 1646, 300, + 286, 1415, 281, 584, 4077, 281, 341, 2232, 322, 264, 1954, 1338, 293, 264, 51596], + "temperature": 0.0, "avg_logprob": -0.1851328380072295, "compression_ratio": 1.993220338983051, + "no_speech_prob": 0.002198383677750826}, {"id": 469, "seek": 346468, "start": 3489.3199999999997, + "end": 3493.64, "text": " second thing and it''s also something that I always tell + the team is that sometimes an open source", "tokens": [51596, 1150, 551, 293, 309, + 311, 611, 746, 300, 286, 1009, 980, 264, 1469, 307, 300, 2171, 364, 1269, 4009, + 51812], "temperature": 0.0, "avg_logprob": -0.1851328380072295, "compression_ratio": + 1.993220338983051, "no_speech_prob": 0.002198383677750826}, {"id": 470, "seek": + 349364, "start": 3493.64, "end": 3499.48, "text": " user might ask complicated questions + not complicated as in that the question itself is complicated", "tokens": [50364, + 4195, 1062, 1029, 6179, 1651, 406, 6179, 382, 294, 300, 264, 1168, 2564, 307, 6179, + 50656], "temperature": 0.0, 
"avg_logprob": -0.1365509149504871, "compression_ratio": + 1.8446601941747574, "no_speech_prob": 0.00113924709148705}, {"id": 471, "seek": + 349364, "start": 3499.48, "end": 3506.2, "text": " but just oh another question + or why is he or she asking this but the thing is that I strongly", "tokens": [50656, + 457, 445, 1954, 1071, 1168, 420, 983, 307, 415, 420, 750, 3365, 341, 457, 264, 551, + 307, 300, 286, 10613, 50992], "temperature": 0.0, "avg_logprob": -0.1365509149504871, + "compression_ratio": 1.8446601941747574, "no_speech_prob": 0.00113924709148705}, + {"id": 472, "seek": 349364, "start": 3506.2, "end": 3514.2, "text": " believe that + every question that you get has a core of truth to it so so if somebody makes a", + "tokens": [50992, 1697, 300, 633, 1168, 300, 291, 483, 575, 257, 4965, 295, 3494, + 281, 309, 370, 370, 498, 2618, 1669, 257, 51392], "temperature": 0.0, "avg_logprob": + -0.1365509149504871, "compression_ratio": 1.8446601941747574, "no_speech_prob": + 0.00113924709148705}, {"id": 473, "seek": 349364, "start": 3514.2, "end": 3518.92, + "text": " fuss about somebody asks a question then probably others have that problem + as well and the the", "tokens": [51392, 34792, 466, 2618, 8962, 257, 1168, 550, + 1391, 2357, 362, 300, 1154, 382, 731, 293, 264, 264, 51628], "temperature": 0.0, + "avg_logprob": -0.1365509149504871, "compression_ratio": 1.8446601941747574, "no_speech_prob": + 0.00113924709148705}, {"id": 474, "seek": 351892, "start": 3518.92, "end": 3524.12, + "text": " upside that you have from open source is that there''s a lower barrier + to entry people start to ask", "tokens": [50364, 14119, 300, 291, 362, 490, 1269, + 4009, 307, 300, 456, 311, 257, 3126, 13357, 281, 8729, 561, 722, 281, 1029, 50624], + "temperature": 0.0, "avg_logprob": -0.11529972678736637, "compression_ratio": 1.730909090909091, + "no_speech_prob": 0.0014783248770982027}, {"id": 475, "seek": 351892, "start": 3524.12, + "end": 3529.56, "text": " these questions and 
you you know you learn from them and + I think it''s completely fine to", "tokens": [50624, 613, 1651, 293, 291, 291, 458, + 291, 1466, 490, 552, 293, 286, 519, 309, 311, 2584, 2489, 281, 50896], "temperature": + 0.0, "avg_logprob": -0.11529972678736637, "compression_ratio": 1.730909090909091, + "no_speech_prob": 0.0014783248770982027}, {"id": 476, "seek": 351892, "start": 3529.56, + "end": 3534.12, "text": " in return I sometimes if people ask specific questions + I just ask them not on the public select", "tokens": [50896, 294, 2736, 286, 2171, + 498, 561, 1029, 2685, 1651, 286, 445, 1029, 552, 406, 322, 264, 1908, 3048, 51124], + "temperature": 0.0, "avg_logprob": -0.11529972678736637, "compression_ratio": 1.730909090909091, + "no_speech_prob": 0.0014783248770982027}, {"id": 477, "seek": 351892, "start": 3534.12, + "end": 3537.7200000000003, "text": " of maybe in the DMs like hey may I ask what + are you building with me yet because then there''s", "tokens": [51124, 295, 1310, + 294, 264, 15322, 82, 411, 4177, 815, 286, 1029, 437, 366, 291, 2390, 365, 385, 1939, + 570, 550, 456, 311, 51304], "temperature": 0.0, "avg_logprob": -0.11529972678736637, + "compression_ratio": 1.730909090909091, "no_speech_prob": 0.0014783248770982027}, + {"id": 478, "seek": 351892, "start": 3537.7200000000003, "end": 3545.96, "text": + " a feedback loop and we''re learning from it as well so it''s a I hear your point + but I do think that", "tokens": [51304, 257, 5824, 6367, 293, 321, 434, 2539, 490, + 309, 382, 731, 370, 309, 311, 257, 286, 1568, 428, 935, 457, 286, 360, 519, 300, + 51716], "temperature": 0.0, "avg_logprob": -0.11529972678736637, "compression_ratio": + 1.730909090909091, "no_speech_prob": 0.0014783248770982027}, {"id": 479, "seek": + 354596, "start": 3545.96, "end": 3552.52, "text": " open source is evolving and + the business models around it are you evolving as well and and we''re", "tokens": + [50364, 1269, 4009, 307, 21085, 293, 264, 1606, 5245, 926, 309, 366, 
291, 21085, + 382, 731, 293, 293, 321, 434, 50692], "temperature": 0.0, "avg_logprob": -0.16141234052942155, + "compression_ratio": 1.737991266375546, "no_speech_prob": 0.002879726467654109}, + {"id": 480, "seek": 354596, "start": 3552.52, "end": 3558.84, "text": " trying to + benefit from it and again for now it''s a net positive yeah thanks Bob it''s a it''s + very clear", "tokens": [50692, 1382, 281, 5121, 490, 309, 293, 797, 337, 586, 309, + 311, 257, 2533, 3353, 1338, 3231, 6085, 309, 311, 257, 309, 311, 588, 1850, 51008], + "temperature": 0.0, "avg_logprob": -0.16141234052942155, "compression_ratio": 1.737991266375546, + "no_speech_prob": 0.002879726467654109}, {"id": 481, "seek": 354596, "start": 3558.84, + "end": 3565.56, "text": " you know the the reason I''m asking this question is because + there is there''s always something behind", "tokens": [51008, 291, 458, 264, 264, + 1778, 286, 478, 3365, 341, 1168, 307, 570, 456, 307, 456, 311, 1009, 746, 2261, + 51344], "temperature": 0.0, "avg_logprob": -0.16141234052942155, "compression_ratio": + 1.737991266375546, "no_speech_prob": 0.002879726467654109}, {"id": 482, "seek": + 354596, "start": 3565.56, "end": 3572.04, "text": " your choice right and it''s + like it''s it''s supporting your idea you drive in it but you know there", "tokens": + [51344, 428, 3922, 558, 293, 309, 311, 411, 309, 311, 309, 311, 7231, 428, 1558, + 291, 3332, 294, 309, 457, 291, 458, 456, 51668], "temperature": 0.0, "avg_logprob": + -0.16141234052942155, "compression_ratio": 1.737991266375546, "no_speech_prob": + 0.002879726467654109}, {"id": 483, "seek": 357204, "start": 3572.04, "end": 3577.64, + "text": " is an alternative model as well you didn''t consider it because you didn''t + want to go that path right", "tokens": [50364, 307, 364, 8535, 2316, 382, 731, 291, + 994, 380, 1949, 309, 570, 291, 994, 380, 528, 281, 352, 300, 3100, 558, 50644], + "temperature": 0.0, "avg_logprob": -0.08980850506854314, "compression_ratio": 
1.8790697674418604, + "no_speech_prob": 0.003568131709471345}, {"id": 484, "seek": 357204, "start": 3577.64, + "end": 3584.6, "text": " you didn''t want want to go the close source path for your + database because the way you as you said", "tokens": [50644, 291, 994, 380, 528, + 528, 281, 352, 264, 1998, 4009, 3100, 337, 428, 8149, 570, 264, 636, 291, 382, 291, + 848, 50992], "temperature": 0.0, "avg_logprob": -0.08980850506854314, "compression_ratio": + 1.8790697674418604, "no_speech_prob": 0.003568131709471345}, {"id": 485, "seek": + 357204, "start": 3584.6, "end": 3590.44, "text": " you want to get more feedback + loops right with the community you want to learn more about the day use", "tokens": + [50992, 291, 528, 281, 483, 544, 5824, 16121, 558, 365, 264, 1768, 291, 528, 281, + 1466, 544, 466, 264, 786, 764, 51284], "temperature": 0.0, "avg_logprob": -0.08980850506854314, + "compression_ratio": 1.8790697674418604, "no_speech_prob": 0.003568131709471345}, + {"id": 486, "seek": 357204, "start": 3590.44, "end": 3597.56, "text": " cases and + this is fantastic way of getting it right like you you show it just transparently + on the web", "tokens": [51284, 3331, 293, 341, 307, 5456, 636, 295, 1242, 309, 558, + 411, 291, 291, 855, 309, 445, 7132, 6420, 322, 264, 3670, 51640], "temperature": + 0.0, "avg_logprob": -0.08980850506854314, "compression_ratio": 1.8790697674418604, + "no_speech_prob": 0.003568131709471345}, {"id": 487, "seek": 359756, "start": 3597.56, + "end": 3603.7999999999997, "text": " you can either download it and host yourself + or and probably that will happen when you run into", "tokens": [50364, 291, 393, + 2139, 5484, 309, 293, 3975, 1803, 420, 293, 1391, 300, 486, 1051, 562, 291, 1190, + 666, 50676], "temperature": 0.0, "avg_logprob": -0.09101206461588542, "compression_ratio": + 1.775229357798165, "no_speech_prob": 0.002231579041108489}, {"id": 488, "seek": + 359756, "start": 3603.7999999999997, "end": 3608.6, "text": " some some issues here + 
and there we will we will be there to support you right and and and you can", "tokens": + [50676, 512, 512, 2663, 510, 293, 456, 321, 486, 321, 486, 312, 456, 281, 1406, + 291, 558, 293, 293, 293, 291, 393, 50916], "temperature": 0.0, "avg_logprob": -0.09101206461588542, + "compression_ratio": 1.775229357798165, "no_speech_prob": 0.002231579041108489}, + {"id": 489, "seek": 359756, "start": 3608.6, "end": 3614.44, "text": " you can contribute + back if you get inspired by the tech itself your deep and tech you want to fix", + "tokens": [50916, 291, 393, 10586, 646, 498, 291, 483, 7547, 538, 264, 7553, 2564, + 428, 2452, 293, 7553, 291, 528, 281, 3191, 51208], "temperature": 0.0, "avg_logprob": + -0.09101206461588542, "compression_ratio": 1.775229357798165, "no_speech_prob": + 0.002231579041108489}, {"id": 490, "seek": 359756, "start": 3614.44, "end": 3621.4, + "text": " some things right or introduce a feature that is amazing yeah and don''t + get me wrong as I don''t", "tokens": [51208, 512, 721, 558, 420, 5366, 257, 4111, + 300, 307, 2243, 1338, 293, 500, 380, 483, 385, 2085, 382, 286, 500, 380, 51556], + "temperature": 0.0, "avg_logprob": -0.09101206461588542, "compression_ratio": 1.775229357798165, + "no_speech_prob": 0.002231579041108489}, {"id": 491, "seek": 362140, "start": 3621.4, + "end": 3627.32, "text": " have any problems with close source even if I I mean I + can make an argument for close source as well", "tokens": [50364, 362, 604, 2740, + 365, 1998, 4009, 754, 498, 286, 286, 914, 286, 393, 652, 364, 6770, 337, 1998, 4009, + 382, 731, 50660], "temperature": 0.0, "avg_logprob": -0.14588837671761562, "compression_ratio": + 1.8708133971291867, "no_speech_prob": 0.002905852161347866}, {"id": 492, "seek": + 362140, "start": 3629.7200000000003, "end": 3636.92, "text": " but I but I but I + do think is that the that it plays a role in your identity as a company so what", + "tokens": [50780, 457, 286, 457, 286, 457, 286, 360, 519, 307, 300, 264, 300, 309, + 
5749, 257, 3090, 294, 428, 6575, 382, 257, 2237, 370, 437, 51140], "temperature": + 0.0, "avg_logprob": -0.14588837671761562, "compression_ratio": 1.8708133971291867, + "no_speech_prob": 0.002905852161347866}, {"id": 493, "seek": 362140, "start": 3636.92, + "end": 3642.28, "text": " kind of company do we want to be and how do we want to + show that to the outside world and and yes", "tokens": [51140, 733, 295, 2237, 360, + 321, 528, 281, 312, 293, 577, 360, 321, 528, 281, 855, 300, 281, 264, 2380, 1002, + 293, 293, 2086, 51408], "temperature": 0.0, "avg_logprob": -0.14588837671761562, + "compression_ratio": 1.8708133971291867, "no_speech_prob": 0.002905852161347866}, + {"id": 494, "seek": 362140, "start": 3642.28, "end": 3648.28, "text": " that is + that that comes with the complexity of needing to deal with it but in the end it + it it", "tokens": [51408, 300, 307, 300, 300, 1487, 365, 264, 14024, 295, 18006, + 281, 2028, 365, 309, 457, 294, 264, 917, 309, 309, 309, 51708], "temperature": 0.0, + "avg_logprob": -0.14588837671761562, "compression_ratio": 1.8708133971291867, "no_speech_prob": + 0.002905852161347866}, {"id": 495, "seek": 364828, "start": 3648.28, "end": 3652.6000000000004, + "text": " works well so for example we also see so go back to these product managers + right so what we see is", "tokens": [50364, 1985, 731, 370, 337, 1365, 321, 611, + 536, 370, 352, 646, 281, 613, 1674, 14084, 558, 370, 437, 321, 536, 307, 50580], + "temperature": 0.0, "avg_logprob": -0.21280218328087075, "compression_ratio": 1.8384615384615384, + "no_speech_prob": 0.0014366165269166231}, {"id": 496, "seek": 364828, "start": 3652.6000000000004, + "end": 3658.84, "text": " that sometimes you know you have developers around the + table and if developers see that it''s", "tokens": [50580, 300, 2171, 291, 458, + 291, 362, 8849, 926, 264, 3199, 293, 498, 8849, 536, 300, 309, 311, 50892], "temperature": + 0.0, "avg_logprob": -0.21280218328087075, "compression_ratio": 
1.8384615384615384, + "no_speech_prob": 0.0014366165269166231}, {"id": 497, "seek": 364828, "start": 3659.5600000000004, + "end": 3665.88, "text": " they sometimes especially with with corporates the developers + expect that we have the close source", "tokens": [50928, 436, 2171, 2318, 365, 365, + 6804, 1024, 264, 8849, 2066, 300, 321, 362, 264, 1998, 4009, 51244], "temperature": + 0.0, "avg_logprob": -0.21280218328087075, "compression_ratio": 1.8384615384615384, + "no_speech_prob": 0.0014366165269166231}, {"id": 498, "seek": 364828, "start": 3665.88, + "end": 3670.36, "text": " solution so then they see we''ve here they see there''s + actually open source that makes them very", "tokens": [51244, 3827, 370, 550, 436, + 536, 321, 600, 510, 436, 536, 456, 311, 767, 1269, 4009, 300, 1669, 552, 588, 51468], + "temperature": 0.0, "avg_logprob": -0.21280218328087075, "compression_ratio": 1.8384615384615384, + "no_speech_prob": 0.0014366165269166231}, {"id": 499, "seek": 364828, "start": 3670.36, + "end": 3674.92, "text": " enthusiastic about it kind of this great then I you know + I did an installation I played run", "tokens": [51468, 28574, 466, 309, 733, 295, + 341, 869, 550, 286, 291, 458, 286, 630, 364, 13260, 286, 3737, 1190, 51696], "temperature": + 0.0, "avg_logprob": -0.21280218328087075, "compression_ratio": 1.8384615384615384, + "no_speech_prob": 0.0014366165269166231}, {"id": 500, "seek": 367492, "start": 3674.92, + "end": 3679.64, "text": " with it this is great which is then a positive feedback + look back to the product manager and then", "tokens": [50364, 365, 309, 341, 307, + 869, 597, 307, 550, 257, 3353, 5824, 574, 646, 281, 264, 1674, 6598, 293, 550, 50600], + "temperature": 0.0, "avg_logprob": -0.11172244128058939, "compression_ratio": 1.847926267281106, + "no_speech_prob": 0.008203844539821148}, {"id": 501, "seek": 367492, "start": 3679.64, + "end": 3687.48, "text": " everybody you know everybody''s happy so it''s a it again + it''s a it''s a it''s a 
it''s a it''s currently a net", "tokens": [50600, 2201, + 291, 458, 2201, 311, 2055, 370, 309, 311, 257, 309, 797, 309, 311, 257, 309, 311, + 257, 309, 311, 257, 309, 311, 257, 309, 311, 4362, 257, 2533, 50992], "temperature": + 0.0, "avg_logprob": -0.11172244128058939, "compression_ratio": 1.847926267281106, + "no_speech_prob": 0.008203844539821148}, {"id": 502, "seek": 367492, "start": 3689.32, + "end": 3695.64, "text": " positive and also I think when you build something new + so you try to new niche yeah you create a new", "tokens": [51084, 3353, 293, 611, + 286, 519, 562, 291, 1322, 746, 777, 370, 291, 853, 281, 777, 19956, 1338, 291, 1884, + 257, 777, 51400], "temperature": 0.0, "avg_logprob": -0.11172244128058939, "compression_ratio": + 1.847926267281106, "no_speech_prob": 0.008203844539821148}, {"id": 503, "seek": + 367492, "start": 3695.64, "end": 3702.6, "text": " niche and we''re not we''re not + alone but it''s it''s also not very crowded right you need to somehow", "tokens": + [51400, 19956, 293, 321, 434, 406, 321, 434, 406, 3312, 457, 309, 311, 309, 311, + 611, 406, 588, 21634, 558, 291, 643, 281, 6063, 51748], "temperature": 0.0, "avg_logprob": + -0.11172244128058939, "compression_ratio": 1.847926267281106, "no_speech_prob": + 0.008203844539821148}, {"id": 504, "seek": 370260, "start": 3702.6, "end": 3708.12, + "text": " show the world what that niches and what that niche can do in as many + ways as possible so I think", "tokens": [50364, 855, 264, 1002, 437, 300, 25570, + 279, 293, 437, 300, 19956, 393, 360, 294, 382, 867, 2098, 382, 1944, 370, 286, 519, + 50640], "temperature": 0.0, "avg_logprob": -0.15750358451125968, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.00991271436214447}, {"id": 505, "seek": + 370260, "start": 3708.12, "end": 3714.52, "text": " that I I dare to bet you a nice + ball of wine that''s for the 10 people that told me that they really", "tokens": + [50640, 300, 286, 286, 8955, 281, 778, 291, 257, 1481, 2594, 
295, 7209, 300, 311, + 337, 264, 1266, 561, 300, 1907, 385, 300, 436, 534, 50960], "temperature": 0.0, + "avg_logprob": -0.15750358451125968, "compression_ratio": 1.8333333333333333, "no_speech_prob": + 0.00991271436214447}, {"id": 506, "seek": 370260, "start": 3714.52, "end": 3720.44, + "text": " liked the fact that we''ve hit this open source only one of them actually + looked at the software", "tokens": [50960, 4501, 264, 1186, 300, 321, 600, 2045, + 341, 1269, 4009, 787, 472, 295, 552, 767, 2956, 412, 264, 4722, 51256], "temperature": + 0.0, "avg_logprob": -0.15750358451125968, "compression_ratio": 1.8333333333333333, + "no_speech_prob": 0.00991271436214447}, {"id": 507, "seek": 370260, "start": 3720.44, + "end": 3726.04, "text": " itself right went in the folders looked at how it was + written people do that and people get feedback", "tokens": [51256, 2564, 558, 1437, + 294, 264, 31082, 2956, 412, 577, 309, 390, 3720, 561, 360, 300, 293, 561, 483, 5824, + 51536], "temperature": 0.0, "avg_logprob": -0.15750358451125968, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.00991271436214447}, {"id": 508, "seek": + 370260, "start": 3726.04, "end": 3731.88, "text": " on it but for a lot of people + they just say hey this is great literature literature so open about it", "tokens": + [51536, 322, 309, 457, 337, 257, 688, 295, 561, 436, 445, 584, 4177, 341, 307, 869, + 10394, 10394, 370, 1269, 466, 309, 51828], "temperature": 0.0, "avg_logprob": -0.15750358451125968, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.00991271436214447}, + {"id": 509, "seek": 373188, "start": 3731.88, "end": 3735.48, "text": " we got you + what you know we understand it but this model is great", "tokens": [50364, 321, + 658, 291, 437, 291, 458, 321, 1223, 309, 457, 341, 2316, 307, 869, 50544], "temperature": + 0.0, "avg_logprob": -0.17114014852614629, "compression_ratio": 1.7320574162679425, + "no_speech_prob": 0.0021559966262429953}, {"id": 510, "seek": 373188, 
"start": 3737.7200000000003, + "end": 3744.92, "text": " this is this is working so it''s it''s building a it''s + a friendly way of approaching the market basically", "tokens": [50656, 341, 307, + 341, 307, 1364, 370, 309, 311, 309, 311, 2390, 257, 309, 311, 257, 9208, 636, 295, + 14908, 264, 2142, 1936, 51016], "temperature": 0.0, "avg_logprob": -0.17114014852614629, + "compression_ratio": 1.7320574162679425, "no_speech_prob": 0.0021559966262429953}, + {"id": 511, "seek": 373188, "start": 3745.56, "end": 3751.88, "text": " I would + argue yeah yeah and I think to close off on this like you know like in my previous + company", "tokens": [51048, 286, 576, 9695, 1338, 1338, 293, 286, 519, 281, 1998, + 766, 322, 341, 411, 291, 458, 411, 294, 452, 3894, 2237, 51364], "temperature": + 0.0, "avg_logprob": -0.17114014852614629, "compression_ratio": 1.7320574162679425, + "no_speech_prob": 0.0021559966262429953}, {"id": 512, "seek": 373188, "start": 3751.88, + "end": 3759.48, "text": " when we were extending solar Apache solar you know the + reason we were extending it is because", "tokens": [51364, 562, 321, 645, 24360, + 7936, 46597, 7936, 291, 458, 264, 1778, 321, 645, 24360, 309, 307, 570, 51744], + "temperature": 0.0, "avg_logprob": -0.17114014852614629, "compression_ratio": 1.7320574162679425, + "no_speech_prob": 0.0021559966262429953}, {"id": 513, "seek": 375948, "start": 3759.48, + "end": 3767.56, "text": " we had a very specific use case that wasn''t solved by + the community right and you know like as you", "tokens": [50364, 321, 632, 257, + 588, 2685, 764, 1389, 300, 2067, 380, 13041, 538, 264, 1768, 558, 293, 291, 458, + 411, 382, 291, 50768], "temperature": 0.0, "avg_logprob": -0.10894387108939034, + "compression_ratio": 1.6727272727272726, "no_speech_prob": 0.0008469332242384553}, + {"id": 514, "seek": 375948, "start": 3768.04, "end": 3773.72, "text": " as you go + into the Apache solar documentation you couldn''t find a lot of material there on + that", 
"tokens": [50792, 382, 291, 352, 666, 264, 46597, 7936, 14333, 291, 2809, + 380, 915, 257, 688, 295, 2527, 456, 322, 300, 51076], "temperature": 0.0, "avg_logprob": + -0.10894387108939034, "compression_ratio": 1.6727272727272726, "no_speech_prob": + 0.0008469332242384553}, {"id": 515, "seek": 375948, "start": 3773.72, "end": 3778.52, + "text": " specific topic right and so what I had to resort to is reading the source + code right", "tokens": [51076, 2685, 4829, 558, 293, 370, 437, 286, 632, 281, 19606, + 281, 307, 3760, 264, 4009, 3089, 558, 51316], "temperature": 0.0, "avg_logprob": + -0.10894387108939034, "compression_ratio": 1.6727272727272726, "no_speech_prob": + 0.0008469332242384553}, {"id": 516, "seek": 375948, "start": 3779.4, "end": 3785.64, + "text": " and this is something that actually one of the elastic search book authors + I think Rafal", "tokens": [51360, 293, 341, 307, 746, 300, 767, 472, 295, 264, 17115, + 3164, 1446, 16552, 286, 519, 29611, 304, 51672], "temperature": 0.0, "avg_logprob": + -0.10894387108939034, "compression_ratio": 1.6727272727272726, "no_speech_prob": + 0.0008469332242384553}, {"id": 517, "seek": 378564, "start": 3785.72, "end": 3792.3599999999997, + "text": " could said you know if you have a question and nobody answers it on the + mailing list or", "tokens": [50368, 727, 848, 291, 458, 498, 291, 362, 257, 1168, + 293, 5079, 6338, 309, 322, 264, 41612, 1329, 420, 50700], "temperature": 0.0, "avg_logprob": + -0.10912927850946649, "compression_ratio": 1.8811475409836065, "no_speech_prob": + 0.004319323226809502}, {"id": 518, "seek": 378564, "start": 3792.3599999999997, + "end": 3798.92, "text": " documentation doesn''t have an answer go and read the + code that''s your answer right and if it''s", "tokens": [50700, 14333, 1177, 380, + 362, 364, 1867, 352, 293, 1401, 264, 3089, 300, 311, 428, 1867, 558, 293, 498, 309, + 311, 51028], "temperature": 0.0, "avg_logprob": -0.10912927850946649, "compression_ratio": + 1.8811475409836065, 
"no_speech_prob": 0.004319323226809502}, {"id": 519, "seek": + 378564, "start": 3798.92, "end": 3804.04, "text": " if it wasn''t open source what + would I do I would have to engage through some sales loop or what", "tokens": [51028, + 498, 309, 2067, 380, 1269, 4009, 437, 576, 286, 360, 286, 576, 362, 281, 4683, 807, + 512, 5763, 6367, 420, 437, 51284], "temperature": 0.0, "avg_logprob": -0.10912927850946649, + "compression_ratio": 1.8811475409836065, "no_speech_prob": 0.004319323226809502}, + {"id": 520, "seek": 378564, "start": 3804.04, "end": 3809.4, "text": " I would kind + of like it would it would it would put the threshold to enter it so high", "tokens": + [51284, 286, 576, 733, 295, 411, 309, 576, 309, 576, 309, 576, 829, 264, 14678, + 281, 3242, 309, 370, 1090, 51552], "temperature": 0.0, "avg_logprob": -0.10912927850946649, + "compression_ratio": 1.8811475409836065, "no_speech_prob": 0.004319323226809502}, + {"id": 521, "seek": 378564, "start": 3810.12, "end": 3814.7599999999998, "text": + " that I would just kind of unbearably high I would say okay I will find something + else or maybe", "tokens": [51588, 300, 286, 576, 445, 733, 295, 517, 26738, 1188, + 1090, 286, 576, 584, 1392, 286, 486, 915, 746, 1646, 420, 1310, 51820], "temperature": + 0.0, "avg_logprob": -0.10912927850946649, "compression_ratio": 1.8811475409836065, + "no_speech_prob": 0.004319323226809502}, {"id": 522, "seek": 381476, "start": 3814.76, + "end": 3821.0, "text": " I will stop working on this problem right yeah exactly + and just knowing that that''s there even if", "tokens": [50364, 286, 486, 1590, + 1364, 322, 341, 1154, 558, 1338, 2293, 293, 445, 5276, 300, 300, 311, 456, 754, + 498, 50676], "temperature": 0.0, "avg_logprob": -0.1426586411957048, "compression_ratio": + 1.8282442748091603, "no_speech_prob": 0.0003130204859189689}, {"id": 523, "seek": + 381476, "start": 3821.0, "end": 3826.36, "text": " you don''t need it often works + as a benefit don''t what I what I don''t have 
is and that''s", "tokens": [50676, + 291, 500, 380, 643, 309, 2049, 1985, 382, 257, 5121, 500, 380, 437, 286, 437, 286, + 500, 380, 362, 307, 293, 300, 311, 50944], "temperature": 0.0, "avg_logprob": -0.1426586411957048, + "compression_ratio": 1.8282442748091603, "no_speech_prob": 0.0003130204859189689}, + {"id": 524, "seek": 381476, "start": 3826.36, "end": 3831.32, "text": " it surprisingly + a lot of people ask me this but I don''t have any moral reason to have something", + "tokens": [50944, 309, 17600, 257, 688, 295, 561, 1029, 385, 341, 457, 286, 500, + 380, 362, 604, 9723, 1778, 281, 362, 746, 51192], "temperature": 0.0, "avg_logprob": + -0.1426586411957048, "compression_ratio": 1.8282442748091603, "no_speech_prob": + 0.0003130204859189689}, {"id": 525, "seek": 381476, "start": 3831.32, "end": 3835.88, + "text": " open source it''s just something is that works very well for us and how + we want to position ourselves", "tokens": [51192, 1269, 4009, 309, 311, 445, 746, + 307, 300, 1985, 588, 731, 337, 505, 293, 577, 321, 528, 281, 2535, 4175, 51420], + "temperature": 0.0, "avg_logprob": -0.1426586411957048, "compression_ratio": 1.8282442748091603, + "no_speech_prob": 0.0003130204859189689}, {"id": 526, "seek": 381476, "start": 3835.88, + "end": 3842.92, "text": " so but it''s a great question but it''s a I think to recap + it just to position a vector search and", "tokens": [51420, 370, 457, 309, 311, + 257, 869, 1168, 457, 309, 311, 257, 286, 519, 281, 20928, 309, 445, 281, 2535, 257, + 8062, 3164, 293, 51772], "temperature": 0.0, "avg_logprob": -0.1426586411957048, + "compression_ratio": 1.8282442748091603, "no_speech_prob": 0.0003130204859189689}, + {"id": 527, "seek": 384292, "start": 3842.92, "end": 3848.04, "text": " with that + we''ve basically in the world to show people that this is something that they can + do", "tokens": [50364, 365, 300, 321, 600, 1936, 294, 264, 1002, 281, 855, 561, + 300, 341, 307, 746, 300, 436, 393, 360, 50620], "temperature": 
0.0, "avg_logprob": + -0.16914291815324264, "compression_ratio": 1.720524017467249, "no_speech_prob": + 0.0011127536417916417}, {"id": 528, "seek": 384292, "start": 3848.04, "end": 3855.2400000000002, + "text": " and I think it''s is working wonders yeah absolutely and maybe we can + kind of cover another topic that", "tokens": [50620, 293, 286, 519, 309, 311, 307, + 1364, 27348, 1338, 3122, 293, 1310, 321, 393, 733, 295, 2060, 1071, 4829, 300, 50980], + "temperature": 0.0, "avg_logprob": -0.16914291815324264, "compression_ratio": 1.720524017467249, + "no_speech_prob": 0.0011127536417916417}, {"id": 529, "seek": 384292, "start": 3856.04, + "end": 3862.12, "text": " you''ve you''ve mentioned that you know you you are using + at the core of the IVAT you''re using certain", "tokens": [51020, 291, 600, 291, + 600, 2835, 300, 291, 458, 291, 291, 366, 1228, 412, 264, 4965, 295, 264, 15967, + 2218, 291, 434, 1228, 1629, 51324], "temperature": 0.0, "avg_logprob": -0.16914291815324264, + "compression_ratio": 1.720524017467249, "no_speech_prob": 0.0011127536417916417}, + {"id": 530, "seek": 384292, "start": 3862.12, "end": 3868.04, "text": " algorithms + you know for the vector search itself like building the index and the search algorithm", + "tokens": [51324, 14642, 291, 458, 337, 264, 8062, 3164, 2564, 411, 2390, 264, 8186, + 293, 264, 3164, 9284, 51620], "temperature": 0.0, "avg_logprob": -0.16914291815324264, + "compression_ratio": 1.720524017467249, "no_speech_prob": 0.0011127536417916417}, + {"id": 531, "seek": 386804, "start": 3868.04, "end": 3873.72, "text": " and you + have mentioned to me in private that you are using like H&S double view which is + a", "tokens": [50364, 293, 291, 362, 2835, 281, 385, 294, 4551, 300, 291, 366, 1228, + 411, 389, 5, 50, 3834, 1910, 597, 307, 257, 50648], "temperature": 0.0, "avg_logprob": + -0.11734984560710628, "compression_ratio": 1.6986301369863013, "no_speech_prob": + 0.0009948965162038803}, {"id": 532, "seek": 386804, "start": 
3874.36, "end": 3882.12, + "text": " hierarchical navigable small world graph algorithm right that you have + customized right", "tokens": [50680, 35250, 804, 7407, 712, 1359, 1002, 4295, 9284, + 558, 300, 291, 362, 30581, 558, 51068], "temperature": 0.0, "avg_logprob": -0.11734984560710628, + "compression_ratio": 1.6986301369863013, "no_speech_prob": 0.0009948965162038803}, + {"id": 533, "seek": 386804, "start": 3882.12, "end": 3889.0, "text": " can you talk + a bit more why you did it like and you did mention crude right so that you needed + to", "tokens": [51068, 393, 291, 751, 257, 857, 544, 983, 291, 630, 309, 411, 293, + 291, 630, 2152, 30796, 558, 370, 300, 291, 2978, 281, 51412], "temperature": 0.0, + "avg_logprob": -0.11734984560710628, "compression_ratio": 1.6986301369863013, "no_speech_prob": + 0.0009948965162038803}, {"id": 534, "seek": 386804, "start": 3889.0, "end": 3894.6, + "text": " add crude yesterday I was checking their repository and they actually + the original authors they", "tokens": [51412, 909, 30796, 5186, 286, 390, 8568, + 641, 25841, 293, 436, 767, 264, 3380, 16552, 436, 51692], "temperature": 0.0, "avg_logprob": + -0.11734984560710628, "compression_ratio": 1.6986301369863013, "no_speech_prob": + 0.0009948965162038803}, {"id": 535, "seek": 389460, "start": 3894.6, "end": 3899.64, + "text": " already added crude there because probably there''ve been some other use + case coming from elsewhere", "tokens": [50364, 1217, 3869, 30796, 456, 570, 1391, + 456, 600, 668, 512, 661, 764, 1389, 1348, 490, 14517, 50616], "temperature": 0.0, + "avg_logprob": -0.11579390468760434, "compression_ratio": 1.8202247191011236, "no_speech_prob": + 0.0009836568497121334}, {"id": 536, "seek": 389460, "start": 3899.64, "end": 3905.16, + "text": " you know can you add it and and I was coming with my new use case by the + way that I was saying can", "tokens": [50616, 291, 458, 393, 291, 909, 309, 293, + 293, 286, 390, 1348, 365, 452, 777, 764, 1389, 538, 264, 
636, 300, 286, 390, 1566, + 393, 50892], "temperature": 0.0, "avg_logprob": -0.11579390468760434, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.0009836568497121334}, {"id": 537, "seek": + 389460, "start": 3905.16, "end": 3910.68, "text": " I load the first layer of the + graph somehow with one single right and then I have to go and read", "tokens": [50892, + 286, 3677, 264, 700, 4583, 295, 264, 4295, 6063, 365, 472, 2167, 558, 293, 550, + 286, 362, 281, 352, 293, 1401, 51168], "temperature": 0.0, "avg_logprob": -0.11579390468760434, + "compression_ratio": 1.8202247191011236, "no_speech_prob": 0.0009836568497121334}, + {"id": 538, "seek": 389460, "start": 3910.68, "end": 3917.16, "text": " the code + this is another beauty of the source code I can read it right but can you talk a + bit more", "tokens": [51168, 264, 3089, 341, 307, 1071, 6643, 295, 264, 4009, 3089, + 286, 393, 1401, 309, 558, 457, 393, 291, 751, 257, 857, 544, 51492], "temperature": + 0.0, "avg_logprob": -0.11579390468760434, "compression_ratio": 1.8202247191011236, + "no_speech_prob": 0.0009836568497121334}, {"id": 539, "seek": 389460, "start": 3917.96, + "end": 3923.72, "text": " why why you customized H&S double you and did you implement + it in go in the end yeah yeah so", "tokens": [51532, 983, 983, 291, 30581, 389, + 5, 50, 3834, 291, 293, 630, 291, 4445, 309, 294, 352, 294, 264, 917, 1338, 1338, + 370, 51820], "temperature": 0.0, "avg_logprob": -0.11579390468760434, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.0009836568497121334}, {"id": 540, "seek": + 392372, "start": 3923.72, "end": 3930.52, "text": " so two parts of that so the + I did not do that implementation so the the we earlier referred", "tokens": [50364, + 370, 732, 3166, 295, 300, 370, 264, 286, 630, 406, 360, 300, 11420, 370, 264, 264, + 321, 3071, 10839, 50704], "temperature": 0.0, "avg_logprob": -0.14451200557204913, + "compression_ratio": 1.8634538152610443, "no_speech_prob": 
0.0020104669965803623}, + {"id": 541, "seek": 392372, "start": 3930.52, "end": 3935.3199999999997, "text": + " that other podcaster listeners really want to go into the near degree about it + then I would highly", "tokens": [50704, 300, 661, 2497, 42640, 23274, 534, 528, + 281, 352, 666, 264, 2651, 4314, 466, 309, 550, 286, 576, 5405, 50944], "temperature": + 0.0, "avg_logprob": -0.14451200557204913, "compression_ratio": 1.8634538152610443, + "no_speech_prob": 0.0020104669965803623}, {"id": 542, "seek": 392372, "start": 3935.3199999999997, + "end": 3941.0, "text": " recommend listening to that podcast but I believe that + you''re also going to link it so and the", "tokens": [50944, 2748, 4764, 281, 300, + 7367, 457, 286, 1697, 300, 291, 434, 611, 516, 281, 2113, 309, 370, 293, 264, 51228], + "temperature": 0.0, "avg_logprob": -0.14451200557204913, "compression_ratio": 1.8634538152610443, + "no_speech_prob": 0.0020104669965803623}, {"id": 543, "seek": 392372, "start": 3941.0, + "end": 3946.2, "text": " answer to your question was basically already in your in + your question right so the the thing is", "tokens": [51228, 1867, 281, 428, 1168, + 390, 1936, 1217, 294, 428, 294, 428, 1168, 558, 370, 264, 264, 551, 307, 51488], + "temperature": 0.0, "avg_logprob": -0.14451200557204913, "compression_ratio": 1.8634538152610443, + "no_speech_prob": 0.0020104669965803623}, {"id": 544, "seek": 392372, "start": 3946.2, + "end": 3951.24, "text": " that we the problem that that we needed to solve is that + if you so you can take a", "tokens": [51488, 300, 321, 264, 1154, 300, 300, 321, + 2978, 281, 5039, 307, 300, 498, 291, 370, 291, 393, 747, 257, 51740], "temperature": + 0.0, "avg_logprob": -0.14451200557204913, "compression_ratio": 1.8634538152610443, + "no_speech_prob": 0.0020104669965803623}, {"id": 545, "seek": 395124, "start": 3952.2, + "end": 3958.04, "text": " an in an library but some of them are immutable and then + the problem is like so then if you change", "tokens": 
[50412, 364, 294, 364, 6405, + 457, 512, 295, 552, 366, 3397, 32148, 293, 550, 264, 1154, 307, 411, 370, 550, 498, + 291, 1319, 50704], "temperature": 0.0, "avg_logprob": -0.1566848933139694, "compression_ratio": + 1.8775510204081634, "no_speech_prob": 0.004392927046865225}, {"id": 546, "seek": + 395124, "start": 3958.04, "end": 3962.52, "text": " something you need to rebuild + it again that is something you don''t want in a database because if", "tokens": + [50704, 746, 291, 643, 281, 16877, 309, 797, 300, 307, 746, 291, 500, 380, 528, + 294, 257, 8149, 570, 498, 50928], "temperature": 0.0, "avg_logprob": -0.1566848933139694, + "compression_ratio": 1.8775510204081634, "no_speech_prob": 0.004392927046865225}, + {"id": 547, "seek": 395124, "start": 3962.52, "end": 3967.56, "text": " you go back + to that use case of for example the recommendation engine you somehow want to real", + "tokens": [50928, 291, 352, 646, 281, 300, 764, 1389, 295, 337, 1365, 264, 11879, + 2848, 291, 6063, 528, 281, 957, 51180], "temperature": 0.0, "avg_logprob": -0.1566848933139694, + "compression_ratio": 1.8775510204081634, "no_speech_prob": 0.004392927046865225}, + {"id": 548, "seek": 395124, "start": 3967.56, "end": 3972.4399999999996, "text": + " time add a product to a card and somehow real time want to deal with that so then + you are", "tokens": [51180, 565, 909, 257, 1674, 281, 257, 2920, 293, 6063, 957, + 565, 528, 281, 2028, 365, 300, 370, 550, 291, 366, 51424], "temperature": 0.0, "avg_logprob": + -0.1566848933139694, "compression_ratio": 1.8775510204081634, "no_speech_prob": + 0.004392927046865225}, {"id": 549, "seek": 395124, "start": 3973.0, "end": 3979.9599999999996, + "text": " air quotes limited to a an algorithm that supports that and that was for + us and", "tokens": [51452, 1988, 19963, 5567, 281, 257, 364, 9284, 300, 9346, 300, + 293, 300, 390, 337, 505, 293, 51800], "temperature": 0.0, "avg_logprob": -0.1566848933139694, + "compression_ratio": 1.8775510204081634, 
"no_speech_prob": 0.004392927046865225}, + {"id": 550, "seek": 397996, "start": 3980.52, "end": 3988.68, "text": " and hnsw + was the was the right fit however you can actually see that also in our documentation", + "tokens": [50392, 293, 276, 3695, 86, 390, 264, 390, 264, 558, 3318, 4461, 291, + 393, 767, 536, 300, 611, 294, 527, 14333, 50800], "temperature": 0.0, "avg_logprob": + -0.2928977157130386, "compression_ratio": 1.8056872037914693, "no_speech_prob": + 0.0043419538997113705}, {"id": 551, "seek": 397996, "start": 3989.2400000000002, + "end": 3996.84, "text": " we''d not only have modules but the a n n is actually + also plug-in so currently we only have hnsw", "tokens": [50828, 321, 1116, 406, + 787, 362, 16679, 457, 264, 257, 297, 297, 307, 767, 611, 5452, 12, 259, 370, 4362, + 321, 787, 362, 276, 3695, 86, 51208], "temperature": 0.0, "avg_logprob": -0.2928977157130386, + "compression_ratio": 1.8056872037914693, "no_speech_prob": 0.0043419538997113705}, + {"id": 552, "seek": 397996, "start": 3996.84, "end": 4001.48, "text": " but there''s + no reason we''re looking at others as well at the future we''re going to release + other", "tokens": [51208, 457, 456, 311, 572, 1778, 321, 434, 1237, 412, 2357, 382, + 731, 412, 264, 2027, 321, 434, 516, 281, 4374, 661, 51440], "temperature": 0.0, + "avg_logprob": -0.2928977157130386, "compression_ratio": 1.8056872037914693, "no_speech_prob": + 0.0043419538997113705}, {"id": 553, "seek": 397996, "start": 4003.2400000000002, + "end": 4007.64, "text": " a n n plug-ins as well within we''ve hit it you as an + enthusiast can actually choose what you", "tokens": [51528, 257, 297, 297, 5452, + 12, 1292, 382, 731, 1951, 321, 600, 2045, 309, 291, 382, 364, 18076, 525, 393, 767, + 2826, 437, 291, 51748], "temperature": 0.0, "avg_logprob": -0.2928977157130386, + "compression_ratio": 1.8056872037914693, "no_speech_prob": 0.0043419538997113705}, + {"id": 554, "seek": 400764, "start": 4007.8799999999997, "end": 4013.4, "text": + " 
want to use but the only the the requirement that we have for such an", "tokens": + [50376, 528, 281, 764, 457, 264, 787, 264, 264, 11695, 300, 321, 362, 337, 1270, + 364, 50652], "temperature": 0.0, "avg_logprob": -0.21656504544344815, "compression_ratio": + 1.7857142857142858, "no_speech_prob": 0.002649795962497592}, {"id": 555, "seek": + 400764, "start": 4017.72, "end": 4022.44, "text": " an algorithm is there basically + when you say like okay we need to somehow have that crot support", "tokens": [50868, + 364, 9284, 307, 456, 1936, 562, 291, 584, 411, 1392, 321, 643, 281, 6063, 362, 300, + 941, 310, 1406, 51104], "temperature": 0.0, "avg_logprob": -0.21656504544344815, + "compression_ratio": 1.7857142857142858, "no_speech_prob": 0.002649795962497592}, + {"id": 556, "seek": 400764, "start": 4022.44, "end": 4027.48, "text": " and or we + need to build add crot support to it could even be the case that in the future it''s", + "tokens": [51104, 293, 420, 321, 643, 281, 1322, 909, 941, 310, 1406, 281, 309, + 727, 754, 312, 264, 1389, 300, 294, 264, 2027, 309, 311, 51356], "temperature": + 0.0, "avg_logprob": -0.21656504544344815, "compression_ratio": 1.7857142857142858, + "no_speech_prob": 0.002649795962497592}, {"id": 557, "seek": 400764, "start": 4028.44, + "end": 4031.72, "text": " you know we''re going to support other use case for that''s + not the case but that''s for now", "tokens": [51404, 291, 458, 321, 434, 516, 281, + 1406, 661, 764, 1389, 337, 300, 311, 406, 264, 1389, 457, 300, 311, 337, 586, 51568], + "temperature": 0.0, "avg_logprob": -0.21656504544344815, "compression_ratio": 1.7857142857142858, + "no_speech_prob": 0.002649795962497592}, {"id": 558, "seek": 403172, "start": 4032.3599999999997, + "end": 4038.68, "text": " and to the second part of your question yes that''s actually + a customary build which you can of", "tokens": [50396, 293, 281, 264, 1150, 644, + 295, 428, 1168, 2086, 300, 311, 767, 257, 2375, 822, 1322, 597, 291, 393, 295, 50712], 
+ "temperature": 0.0, "avg_logprob": -0.14676471423077328, "compression_ratio": 1.755980861244019, + "no_speech_prob": 0.005430617369711399}, {"id": 559, "seek": 403172, "start": 4038.68, + "end": 4047.9599999999996, "text": " course see in the pit of repo you see a full + circle that''s the value all along right so", "tokens": [50712, 1164, 536, 294, + 264, 10147, 295, 49040, 291, 536, 257, 1577, 6329, 300, 311, 264, 2158, 439, 2051, + 558, 370, 51176], "temperature": 0.0, "avg_logprob": -0.14676471423077328, "compression_ratio": + 1.755980861244019, "no_speech_prob": 0.005430617369711399}, {"id": 560, "seek": + 403172, "start": 4049.08, "end": 4053.9599999999996, "text": " and hopefully you + know like in the end in the end of the day it''s like if you guys and then", "tokens": + [51232, 293, 4696, 291, 458, 411, 294, 264, 917, 294, 264, 917, 295, 264, 786, 309, + 311, 411, 498, 291, 1074, 293, 550, 51476], "temperature": 0.0, "avg_logprob": -0.14676471423077328, + "compression_ratio": 1.755980861244019, "no_speech_prob": 0.005430617369711399}, + {"id": 561, "seek": 403172, "start": 4053.9599999999996, "end": 4061.24, "text": + " something in part of hnsw that you implemented in in in glow and you published + it as as code", "tokens": [51476, 746, 294, 644, 295, 276, 3695, 86, 300, 291, 12270, + 294, 294, 294, 17513, 293, 291, 6572, 309, 382, 382, 3089, 51840], "temperature": + 0.0, "avg_logprob": -0.14676471423077328, "compression_ratio": 1.755980861244019, + "no_speech_prob": 0.005430617369711399}, {"id": 562, "seek": 406124, "start": 4061.24, + "end": 4066.2799999999997, "text": " the original authors may also look at it right + and they might you know take that idea and bring", "tokens": [50364, 264, 3380, + 16552, 815, 611, 574, 412, 309, 558, 293, 436, 1062, 291, 458, 747, 300, 1558, 293, + 1565, 50616], "temperature": 0.0, "avg_logprob": -0.09309166933582948, "compression_ratio": + 1.9108527131782946, "no_speech_prob": 0.0012780899414792657}, {"id": 563, 
"seek": + 406124, "start": 4066.2799999999997, "end": 4073.16, "text": " it back to the implementation + and then that of course you know just as a side product will benefit some", "tokens": + [50616, 309, 646, 281, 264, 11420, 293, 550, 300, 295, 1164, 291, 458, 445, 382, + 257, 1252, 1674, 486, 5121, 512, 50960], "temperature": 0.0, "avg_logprob": -0.09309166933582948, + "compression_ratio": 1.9108527131782946, "no_speech_prob": 0.0012780899414792657}, + {"id": 564, "seek": 406124, "start": 4073.16, "end": 4078.52, "text": " other part + of the community but you will be there as well right because like you know it''s + the", "tokens": [50960, 661, 644, 295, 264, 1768, 457, 291, 486, 312, 456, 382, + 731, 558, 570, 411, 291, 458, 309, 311, 264, 51228], "temperature": 0.0, "avg_logprob": + -0.09309166933582948, "compression_ratio": 1.9108527131782946, "no_speech_prob": + 0.0012780899414792657}, {"id": 565, "seek": 406124, "start": 4078.52, "end": 4084.52, + "text": " authorship or you know the credit that that will be given to you because + you did it right so I mean", "tokens": [51228, 6979, 14752, 420, 291, 458, 264, + 5397, 300, 300, 486, 312, 2212, 281, 291, 570, 291, 630, 309, 558, 370, 286, 914, + 51528], "temperature": 0.0, "avg_logprob": -0.09309166933582948, "compression_ratio": + 1.9108527131782946, "no_speech_prob": 0.0012780899414792657}, {"id": 566, "seek": + 406124, "start": 4084.52, "end": 4090.6, "text": " that that that''s very interesting + like you can benefit and reach out to you to even like new users", "tokens": [51528, + 300, 300, 300, 311, 588, 1880, 411, 291, 393, 5121, 293, 2524, 484, 281, 291, 281, + 754, 411, 777, 5022, 51832], "temperature": 0.0, "avg_logprob": -0.09309166933582948, + "compression_ratio": 1.9108527131782946, "no_speech_prob": 0.0012780899414792657}, + {"id": 567, "seek": 409060, "start": 4090.6, "end": 4097.24, "text": " potentially + right or they will know about your existence through this through this link and", + 
"tokens": [50364, 7263, 558, 420, 436, 486, 458, 466, 428, 9123, 807, 341, 807, + 341, 2113, 293, 50696], "temperature": 0.0, "avg_logprob": -0.19756551889272836, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.0013870676048099995}, + {"id": 568, "seek": 409060, "start": 4097.24, "end": 4103.16, "text": " no exactly + and I think so and and the thing is what happens with with a solution like like + we", "tokens": [50696, 572, 2293, 293, 286, 519, 370, 293, 293, 264, 551, 307, 437, + 2314, 365, 365, 257, 3827, 411, 411, 321, 50992], "temperature": 0.0, "avg_logprob": + -0.19756551889272836, "compression_ratio": 1.8436018957345972, "no_speech_prob": + 0.0013870676048099995}, {"id": 569, "seek": 409060, "start": 4103.16, "end": 4110.44, + "text": " fit is that it so it''s like yes it has that in an algorithm at its at + its heart at its core but there''s", "tokens": [50992, 3318, 307, 300, 309, 370, + 309, 311, 411, 2086, 309, 575, 300, 294, 364, 9284, 412, 1080, 412, 1080, 1917, + 412, 1080, 4965, 457, 456, 311, 51356], "temperature": 0.0, "avg_logprob": -0.19756551889272836, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.0013870676048099995}, + {"id": 570, "seek": 409060, "start": 4110.44, "end": 4117.24, "text": " so much + around it with with with the scalability capability set it has with the way of also + storing", "tokens": [51356, 370, 709, 926, 309, 365, 365, 365, 264, 15664, 2310, + 13759, 992, 309, 575, 365, 264, 636, 295, 611, 26085, 51696], "temperature": 0.0, + "avg_logprob": -0.19756551889272836, "compression_ratio": 1.8436018957345972, "no_speech_prob": + 0.0013870676048099995}, {"id": 571, "seek": 411724, "start": 4117.8, "end": 4125.8, + "text": " the data objects that actually building that yourself kind of it''s very + comparable to for example", "tokens": [50392, 264, 1412, 6565, 300, 767, 2390, 300, + 1803, 733, 295, 309, 311, 588, 25323, 281, 337, 1365, 50792], "temperature": 0.0, + "avg_logprob": 
-0.2164826494582156, "compression_ratio": 1.7410714285714286, "no_speech_prob": + 0.0036490638740360737}, {"id": 572, "seek": 411724, "start": 4125.8, "end": 4131.719999999999, + "text": " Lucine and solar or Lucine and Elasticsearch so how I like to talk about + it in a comparison as I", "tokens": [50792, 9593, 533, 293, 7936, 420, 9593, 533, + 293, 2699, 2750, 405, 1178, 370, 577, 286, 411, 281, 751, 466, 309, 294, 257, 9660, + 382, 286, 51088], "temperature": 0.0, "avg_logprob": -0.2164826494582156, "compression_ratio": + 1.7410714285714286, "no_speech_prob": 0.0036490638740360737}, {"id": 573, "seek": + 411724, "start": 4131.719999999999, "end": 4138.599999999999, "text": " said I like + to about take the an n algorithm as the as in your mind as Lucine I mean the comparison", + "tokens": [51088, 848, 286, 411, 281, 466, 747, 264, 364, 297, 9284, 382, 264, 382, + 294, 428, 1575, 382, 9593, 533, 286, 914, 264, 9660, 51432], "temperature": 0.0, + "avg_logprob": -0.2164826494582156, "compression_ratio": 1.7410714285714286, "no_speech_prob": + 0.0036490638740360737}, {"id": 574, "seek": 411724, "start": 4138.599999999999, + "end": 4143.48, "text": " isn''t a hundred percent correct but to make my point + and then that whole thing that for example", "tokens": [51432, 1943, 380, 257, 3262, + 3043, 3006, 457, 281, 652, 452, 935, 293, 550, 300, 1379, 551, 300, 337, 1365, 51676], + "temperature": 0.0, "avg_logprob": -0.2164826494582156, "compression_ratio": 1.7410714285714286, + "no_speech_prob": 0.0036490638740360737}, {"id": 575, "seek": 414348, "start": 4143.48, + "end": 4148.599999999999, "text": " solar or Elasticsearch build around it that + is what we''re trying to do with these with these algorithms", "tokens": [50364, + 7936, 420, 2699, 2750, 405, 1178, 1322, 926, 309, 300, 307, 437, 321, 434, 1382, + 281, 360, 365, 613, 365, 613, 14642, 50620], "temperature": 0.0, "avg_logprob": + -0.180419331973361, "compression_ratio": 1.7268722466960353, "no_speech_prob": 
0.0019367618951946497}, + {"id": 576, "seek": 414348, "start": 4150.36, "end": 4156.36, "text": " so you could + that''s how you could kind of compare it right Rathlin just that we said like we + give", "tokens": [50708, 370, 291, 727, 300, 311, 577, 291, 727, 733, 295, 6794, + 309, 558, 497, 998, 5045, 445, 300, 321, 848, 411, 321, 976, 51008], "temperature": + 0.0, "avg_logprob": -0.180419331973361, "compression_ratio": 1.7268722466960353, + "no_speech_prob": 0.0019367618951946497}, {"id": 577, "seek": 414348, "start": 4156.36, + "end": 4162.919999999999, "text": " you this out of the box but I do want to reiterate + that yes we have these power users that really", "tokens": [51008, 291, 341, 484, + 295, 264, 2424, 457, 286, 360, 528, 281, 33528, 300, 2086, 321, 362, 613, 1347, + 5022, 300, 534, 51336], "temperature": 0.0, "avg_logprob": -0.180419331973361, "compression_ratio": + 1.7268722466960353, "no_speech_prob": 0.0019367618951946497}, {"id": 578, "seek": + 414348, "start": 4162.919999999999, "end": 4168.36, "text": " want to know that + nitty-gritty that want to make those changes but the majority of users the", "tokens": + [51336, 528, 281, 458, 300, 297, 10016, 12, 861, 10016, 300, 528, 281, 652, 729, + 2962, 457, 264, 6286, 295, 5022, 264, 51608], "temperature": 0.0, "avg_logprob": + -0.180419331973361, "compression_ratio": 1.7268722466960353, "no_speech_prob": 0.0019367618951946497}, + {"id": 579, "seek": 416836, "start": 4169.0, "end": 4175.5599999999995, "text": + " I like to call them the silent majority of users they just have like okay I have + a hundred thousand", "tokens": [50396, 286, 411, 281, 818, 552, 264, 12784, 6286, + 295, 5022, 436, 445, 362, 411, 1392, 286, 362, 257, 3262, 4714, 50724], "temperature": + 0.0, "avg_logprob": -0.13489304819414694, "compression_ratio": 1.7244444444444444, + "no_speech_prob": 0.0020694753620773554}, {"id": 580, "seek": 416836, "start": 4175.5599999999995, + "end": 4181.719999999999, "text": " documents I know 
a hugging phase model that + I like how am I gonna quickly search through it", "tokens": [50724, 8512, 286, 458, + 257, 41706, 5574, 2316, 300, 286, 411, 577, 669, 286, 799, 2661, 3164, 807, 309, + 51032], "temperature": 0.0, "avg_logprob": -0.13489304819414694, "compression_ratio": + 1.7244444444444444, "no_speech_prob": 0.0020694753620773554}, {"id": 581, "seek": + 416836, "start": 4182.599999999999, "end": 4189.32, "text": " period and then they + find we did so it''s a that''s that''s the the majority of users and they probably", + "tokens": [51076, 2896, 293, 550, 436, 915, 321, 630, 370, 309, 311, 257, 300, 311, + 300, 311, 264, 264, 6286, 295, 5022, 293, 436, 1391, 51412], "temperature": 0.0, + "avg_logprob": -0.13489304819414694, "compression_ratio": 1.7244444444444444, "no_speech_prob": + 0.0020694753620773554}, {"id": 582, "seek": 416836, "start": 4189.32, "end": 4195.639999999999, + "text": " don''t even know what hnsw is which is fine right so that''s perfectly + fine because they do other", "tokens": [51412, 500, 380, 754, 458, 437, 276, 3695, + 86, 307, 597, 307, 2489, 558, 370, 300, 311, 6239, 2489, 570, 436, 360, 661, 51728], + "temperature": 0.0, "avg_logprob": -0.13489304819414694, "compression_ratio": 1.7244444444444444, + "no_speech_prob": 0.0020694753620773554}, {"id": 583, "seek": 419564, "start": 4195.72, + "end": 4200.280000000001, "text": " things they might be edits that layering what + I said like they just sit in another layer that they", "tokens": [50368, 721, 436, + 1062, 312, 41752, 300, 40754, 437, 286, 848, 411, 436, 445, 1394, 294, 1071, 4583, + 300, 436, 50596], "temperature": 0.0, "avg_logprob": -0.11021300879391757, "compression_ratio": + 1.8798076923076923, "no_speech_prob": 0.0022335818503051996}, {"id": 584, "seek": + 419564, "start": 4200.280000000001, "end": 4207.4800000000005, "text": " look at + it so I think the cool thing of our modular system is that we can make these power + users at", "tokens": [50596, 574, 412, 309, 
370, 286, 519, 264, 1627, 551, 295, + 527, 31111, 1185, 307, 300, 321, 393, 652, 613, 1347, 5022, 412, 50956], "temperature": + 0.0, "avg_logprob": -0.11021300879391757, "compression_ratio": 1.8798076923076923, + "no_speech_prob": 0.0022335818503051996}, {"id": 585, "seek": 419564, "start": 4207.4800000000005, + "end": 4213.56, "text": " the core happy but we can also have these more generic + developers or full stack developers we can", "tokens": [50956, 264, 4965, 2055, + 457, 321, 393, 611, 362, 613, 544, 19577, 8849, 420, 1577, 8630, 8849, 321, 393, + 51260], "temperature": 0.0, "avg_logprob": -0.11021300879391757, "compression_ratio": + 1.8798076923076923, "no_speech_prob": 0.0022335818503051996}, {"id": 586, "seek": + 419564, "start": 4213.56, "end": 4218.68, "text": " make them happy or even in the + outer layer we can make these product managers happy and that''s", "tokens": [51260, + 652, 552, 2055, 420, 754, 294, 264, 10847, 4583, 321, 393, 652, 613, 1674, 14084, + 2055, 293, 300, 311, 51516], "temperature": 0.0, "avg_logprob": -0.11021300879391757, + "compression_ratio": 1.8798076923076923, "no_speech_prob": 0.0022335818503051996}, + {"id": 587, "seek": 421868, "start": 4219.56, "end": 4226.68, "text": " you know + what we focus on but all through a single core and to a set of modules that are", + "tokens": [50408, 291, 458, 437, 321, 1879, 322, 457, 439, 807, 257, 2167, 4965, + 293, 281, 257, 992, 295, 16679, 300, 366, 50764], "temperature": 0.0, "avg_logprob": + -0.25956942240397135, "compression_ratio": 1.6981132075471699, "no_speech_prob": + 0.0033635995350778103}, {"id": 588, "seek": 421868, "start": 4227.240000000001, + "end": 4231.64, "text": " those are immutable basically so it''s not that we have + like two types of weviators", "tokens": [50792, 729, 366, 3397, 32148, 1936, 370, + 309, 311, 406, 300, 321, 362, 411, 732, 3467, 295, 321, 4917, 3391, 51012], "temperature": + 0.0, "avg_logprob": -0.25956942240397135, "compression_ratio": 
1.6981132075471699, + "no_speech_prob": 0.0033635995350778103}, {"id": 589, "seek": 421868, "start": 4231.64, + "end": 4236.84, "text": " something it''s just one weviate where we support all + these use cases yeah and I mean you know like", "tokens": [51012, 746, 309, 311, + 445, 472, 321, 4917, 473, 689, 321, 1406, 439, 613, 764, 3331, 1338, 293, 286, 914, + 291, 458, 411, 51272], "temperature": 0.0, "avg_logprob": -0.25956942240397135, + "compression_ratio": 1.6981132075471699, "no_speech_prob": 0.0033635995350778103}, + {"id": 590, "seek": 421868, "start": 4238.12, "end": 4243.240000000001, "text": + " for those of us who really want to go deep into detail and you know like the analogy + that", "tokens": [51336, 337, 729, 295, 505, 567, 534, 528, 281, 352, 2452, 666, + 2607, 293, 291, 458, 411, 264, 21663, 300, 51592], "temperature": 0.0, "avg_logprob": + -0.25956942240397135, "compression_ratio": 1.6981132075471699, "no_speech_prob": + 0.0033635995350778103}, {"id": 591, "seek": 424324, "start": 4243.24, "end": 4249.16, + "text": " just kind of came to my mind is that if you take my sequel or some some + sequel database right when", "tokens": [50364, 445, 733, 295, 1361, 281, 452, 1575, + 307, 300, 498, 291, 747, 452, 20622, 420, 512, 512, 20622, 8149, 558, 562, 50660], + "temperature": 0.0, "avg_logprob": -0.12086176621286493, "compression_ratio": 1.8761904761904762, + "no_speech_prob": 0.0016586068086326122}, {"id": 592, "seek": 424324, "start": 4249.16, + "end": 4256.76, "text": " you choose the type of the field that you index right + and you''re thinking okay it''s going to be a B3", "tokens": [50660, 291, 2826, + 264, 2010, 295, 264, 2519, 300, 291, 8186, 558, 293, 291, 434, 1953, 1392, 309, + 311, 516, 281, 312, 257, 363, 18, 51040], "temperature": 0.0, "avg_logprob": -0.12086176621286493, + "compression_ratio": 1.8761904761904762, "no_speech_prob": 0.0016586068086326122}, + {"id": 593, "seek": 424324, "start": 4256.76, "end": 4262.679999999999, "text": 
+ " or it''s going to be full text or it''s going to be some other data structure + that sequel database", "tokens": [51040, 420, 309, 311, 516, 281, 312, 1577, 2487, + 420, 309, 311, 516, 281, 312, 512, 661, 1412, 3877, 300, 20622, 8149, 51336], "temperature": + 0.0, "avg_logprob": -0.12086176621286493, "compression_ratio": 1.8761904761904762, + "no_speech_prob": 0.0016586068086326122}, {"id": 594, "seek": 424324, "start": 4262.679999999999, + "end": 4268.04, "text": " offers to you that''s when you start asking questions + what is the trade off right of choosing that", "tokens": [51336, 7736, 281, 291, + 300, 311, 562, 291, 722, 3365, 1651, 437, 307, 264, 4923, 766, 558, 295, 10875, + 300, 51604], "temperature": 0.0, "avg_logprob": -0.12086176621286493, "compression_ratio": + 1.8761904761904762, "no_speech_prob": 0.0016586068086326122}, {"id": 595, "seek": + 426804, "start": 4268.04, "end": 4275.0, "text": " version or the other version + but you may also kind of just index the data and then kind of solve", "tokens": + [50364, 3037, 420, 264, 661, 3037, 457, 291, 815, 611, 733, 295, 445, 8186, 264, + 1412, 293, 550, 733, 295, 5039, 50712], "temperature": 0.0, "avg_logprob": -0.10668880238252527, + "compression_ratio": 1.7612612612612613, "no_speech_prob": 0.001136161619797349}, + {"id": 596, "seek": 426804, "start": 4275.0, "end": 4279.96, "text": " your use + case first right and only when your product manager comes back to you and says hey + why is", "tokens": [50712, 428, 764, 1389, 700, 558, 293, 787, 562, 428, 1674, 6598, + 1487, 646, 281, 291, 293, 1619, 4177, 983, 307, 50960], "temperature": 0.0, "avg_logprob": + -0.10668880238252527, "compression_ratio": 1.7612612612612613, "no_speech_prob": + 0.001136161619797349}, {"id": 597, "seek": 426804, "start": 4279.96, "end": 4287.88, + "text": " this slower than yesterday can you improve right exactly exactly and what + I find interesting and", "tokens": [50960, 341, 14009, 813, 5186, 393, 291, 3470, + 558, 
2293, 2293, 293, 437, 286, 915, 1880, 293, 51356], "temperature": 0.0, "avg_logprob": + -0.10668880238252527, "compression_ratio": 1.7612612612612613, "no_speech_prob": + 0.001136161619797349}, {"id": 598, "seek": 426804, "start": 4287.88, "end": 4294.5199999999995, + "text": " important where we now are in the in in the cutting edge where where vector + search in general sits", "tokens": [51356, 1021, 689, 321, 586, 366, 294, 264, 294, + 294, 264, 6492, 4691, 689, 689, 8062, 3164, 294, 2674, 12696, 51688], "temperature": + 0.0, "avg_logprob": -0.10668880238252527, "compression_ratio": 1.7612612612612613, + "no_speech_prob": 0.001136161619797349}, {"id": 599, "seek": 429452, "start": 4294.6, + "end": 4303.160000000001, "text": " is that yes it''s very important to talk about + to be transparent about to to share about", "tokens": [50368, 307, 300, 2086, 309, + 311, 588, 1021, 281, 751, 466, 281, 312, 12737, 466, 281, 281, 2073, 466, 50796], + "temperature": 0.0, "avg_logprob": -0.18686670280364623, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.004268129356205463}, {"id": 600, "seek": 429452, "start": 4303.160000000001, + "end": 4309.8, "text": " or or not share depending on your open or close energy + about how these things work what kind", "tokens": [50796, 420, 420, 406, 2073, 5413, + 322, 428, 1269, 420, 1998, 2281, 466, 577, 613, 721, 589, 437, 733, 51128], "temperature": + 0.0, "avg_logprob": -0.18686670280364623, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.004268129356205463}, {"id": 601, "seek": 429452, "start": 4309.8, + "end": 4314.92, "text": " of algorithms are used on it etc that is very important + and that''s also as you mentioned", "tokens": [51128, 295, 14642, 366, 1143, 322, + 309, 5183, 300, 307, 588, 1021, 293, 300, 311, 611, 382, 291, 2835, 51384], "temperature": + 0.0, "avg_logprob": -0.18686670280364623, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.004268129356205463}, {"id": 602, "seek": 
429452, "start": 4314.92, + "end": 4320.360000000001, "text": " earlier something that we are very active in + by the way not only as so for example my colleague", "tokens": [51384, 3071, 746, + 300, 321, 366, 588, 4967, 294, 538, 264, 636, 406, 787, 382, 370, 337, 1365, 452, + 13532, 51656], "temperature": 0.0, "avg_logprob": -0.18686670280364623, "compression_ratio": + 1.7464114832535884, "no_speech_prob": 0.004268129356205463}, {"id": 603, "seek": + 432036, "start": 4320.44, "end": 4325.24, "text": " Janice doing that from the from + the core of vv8 but my other colleague Laura is more doing that", "tokens": [50368, + 4956, 573, 884, 300, 490, 264, 490, 264, 4965, 295, 371, 85, 23, 457, 452, 661, + 13532, 13220, 307, 544, 884, 300, 50608], "temperature": 0.0, "avg_logprob": -0.24360090753306513, + "compression_ratio": 1.6902654867256637, "no_speech_prob": 0.004861225374042988}, + {"id": 604, "seek": 432036, "start": 4325.24, "end": 4331.48, "text": " from graph + QL perspectives and like how can you build these queries when you get etc but I + think", "tokens": [50608, 490, 4295, 1249, 43, 16766, 293, 411, 577, 393, 291, 1322, + 613, 24109, 562, 291, 483, 5183, 457, 286, 519, 50920], "temperature": 0.0, "avg_logprob": + -0.24360090753306513, "compression_ratio": 1.6902654867256637, "no_speech_prob": + 0.004861225374042988}, {"id": 605, "seek": 432036, "start": 4331.48, "end": 4338.92, + "text": " it''s also important to if you we take your my sequel example what do + people use my sequel for", "tokens": [50920, 309, 311, 611, 1021, 281, 498, 291, + 321, 747, 428, 452, 20622, 1365, 437, 360, 561, 764, 452, 20622, 337, 51292], "temperature": + 0.0, "avg_logprob": -0.24360090753306513, "compression_ratio": 1.6902654867256637, + "no_speech_prob": 0.004861225374042988}, {"id": 606, "seek": 432036, "start": 4338.92, + "end": 4344.28, "text": " or in our case what do they use vector search for and + how can I communicate to these people who", "tokens": [51292, 420, 
294, 527, 1389, + 437, 360, 436, 764, 8062, 3164, 337, 293, 577, 393, 286, 7890, 281, 613, 561, 567, + 51560], "temperature": 0.0, "avg_logprob": -0.24360090753306513, "compression_ratio": + 1.6902654867256637, "no_speech_prob": 0.004861225374042988}, {"id": 607, "seek": + 434428, "start": 4344.36, "end": 4352.5199999999995, "text": " absolutely no idea + what an what an an algorithm is and and I think that those are the tree pillars", + "tokens": [50368, 3122, 572, 1558, 437, 364, 437, 364, 364, 9284, 307, 293, 293, + 286, 519, 300, 729, 366, 264, 4230, 26729, 50776], "temperature": 0.0, "avg_logprob": + -0.21334413169086844, "compression_ratio": 1.7469879518072289, "no_speech_prob": + 0.0027838971000164747}, {"id": 608, "seek": 434428, "start": 4352.5199999999995, + "end": 4360.04, "text": " that we stand on so the the the core the interface and + the use cases and we try to cover these", "tokens": [50776, 300, 321, 1463, 322, + 370, 264, 264, 264, 4965, 264, 9226, 293, 264, 764, 3331, 293, 321, 853, 281, 2060, + 613, 51152], "temperature": 0.0, "avg_logprob": -0.21334413169086844, "compression_ratio": + 1.7469879518072289, "no_speech_prob": 0.0027838971000164747}, {"id": 609, "seek": + 434428, "start": 4360.04, "end": 4367.719999999999, "text": " tree pillars within + a semi based on one code dates yeah that''s fantastic I mean what what you do", + "tokens": [51152, 4230, 26729, 1951, 257, 12909, 2361, 322, 472, 3089, 11691, 1338, + 300, 311, 5456, 286, 914, 437, 437, 291, 360, 51536], "temperature": 0.0, "avg_logprob": + -0.21334413169086844, "compression_ratio": 1.7469879518072289, "no_speech_prob": + 0.0027838971000164747}, {"id": 610, "seek": 436772, "start": 4367.96, "end": 4376.4400000000005, + "text": " there and like I will link the like self what is it selfish plug so the + the block was basically", "tokens": [50376, 456, 293, 411, 286, 486, 2113, 264, + 411, 2698, 437, 307, 309, 19074, 5452, 370, 264, 264, 3461, 390, 1936, 50800], "temperature": + 0.0, 
"avg_logprob": -0.19245086397443498, "compression_ratio": 1.7592592592592593, + "no_speech_prob": 0.010973384603857994}, {"id": 611, "seek": 436772, "start": 4376.4400000000005, + "end": 4382.92, "text": " explains you know a few details of six vector databases + couple of witch or close source the rest", "tokens": [50800, 13948, 291, 458, 257, + 1326, 4365, 295, 2309, 8062, 22380, 1916, 295, 14867, 420, 1998, 4009, 264, 1472, + 51124], "temperature": 0.0, "avg_logprob": -0.19245086397443498, "compression_ratio": + 1.7592592592592593, "no_speech_prob": 0.010973384603857994}, {"id": 612, "seek": + 436772, "start": 4382.92, "end": 4388.6, "text": " of open source you know you can + actually see you know for yourself like what what is happening", "tokens": [51124, + 295, 1269, 4009, 291, 458, 291, 393, 767, 536, 291, 458, 337, 1803, 411, 437, 437, + 307, 2737, 51408], "temperature": 0.0, "avg_logprob": -0.19245086397443498, "compression_ratio": + 1.7592592592592593, "no_speech_prob": 0.010973384603857994}, {"id": 613, "seek": + 436772, "start": 4388.6, "end": 4395.320000000001, "text": " there in those databases + and then so much material there and like don''t go too technical yet", "tokens": + [51408, 456, 294, 729, 22380, 293, 550, 370, 709, 2527, 456, 293, 411, 500, 380, + 352, 886, 6191, 1939, 51744], "temperature": 0.0, "avg_logprob": -0.19245086397443498, + "compression_ratio": 1.7592592592592593, "no_speech_prob": 0.010973384603857994}, + {"id": 614, "seek": 439532, "start": 4395.4, "end": 4400.2, "text": " kind of stay + in the use case part and kind of like and and I and I hope I we can like", "tokens": + [50368, 733, 295, 1754, 294, 264, 764, 1389, 644, 293, 733, 295, 411, 293, 293, + 286, 293, 286, 1454, 286, 321, 393, 411, 50608], "temperature": 0.0, "avg_logprob": + -0.12857346475860218, "compression_ratio": 1.8316831683168318, "no_speech_prob": + 0.0008220384479500353}, {"id": 615, "seek": 439532, "start": 4400.84, "end": 4406.5199999999995, + "text": " 
as a community collaborate more in bringing these use cases kind of highlighting + them I just", "tokens": [50640, 382, 257, 1768, 18338, 544, 294, 5062, 613, 764, + 3331, 733, 295, 26551, 552, 286, 445, 50924], "temperature": 0.0, "avg_logprob": + -0.12857346475860218, "compression_ratio": 1.8316831683168318, "no_speech_prob": + 0.0008220384479500353}, {"id": 616, "seek": 439532, "start": 4406.5199999999995, + "end": 4415.08, "text": " alluded to you know similarity code search or you know + encoding some software viruses into that", "tokens": [50924, 33919, 281, 291, 458, + 32194, 3089, 3164, 420, 291, 458, 43430, 512, 4722, 21785, 666, 300, 51352], "temperature": + 0.0, "avg_logprob": -0.12857346475860218, "compression_ratio": 1.8316831683168318, + "no_speech_prob": 0.0008220384479500353}, {"id": 617, "seek": 439532, "start": 4415.08, + "end": 4421.0, "text": " representation and then searching similar viruses when + you need to do that and you''re like yeah", "tokens": [51352, 10290, 293, 550, 10808, + 2531, 21785, 562, 291, 643, 281, 360, 300, 293, 291, 434, 411, 1338, 51648], "temperature": + 0.0, "avg_logprob": -0.12857346475860218, "compression_ratio": 1.8316831683168318, + "no_speech_prob": 0.0008220384479500353}, {"id": 618, "seek": 442100, "start": 4421.0, + "end": 4428.28, "text": " so I mean there are so many use cases that it''s our job + in many ways to connect and you''re doing", "tokens": [50364, 370, 286, 914, 456, + 366, 370, 867, 764, 3331, 300, 309, 311, 527, 1691, 294, 867, 2098, 281, 1745, 293, + 291, 434, 884, 50728], "temperature": 0.0, "avg_logprob": -0.11699887980585513, + "compression_ratio": 1.7981220657276995, "no_speech_prob": 0.00217351783066988}, + {"id": 619, "seek": 442100, "start": 4428.28, "end": 4436.2, "text": " a great job + there really I need to connect the the use cases with the tech right so like don''t", + "tokens": [50728, 257, 869, 1691, 456, 534, 286, 643, 281, 1745, 264, 264, 764, + 3331, 365, 264, 7553, 558, 370, 
411, 500, 380, 51124], "temperature": 0.0, "avg_logprob": + -0.11699887980585513, "compression_ratio": 1.7981220657276995, "no_speech_prob": + 0.00217351783066988}, {"id": 620, "seek": 442100, "start": 4436.2, "end": 4442.28, + "text": " fixate on the tech yet because tech isn''t going to improve over time + there will be new cool", "tokens": [51124, 3191, 473, 322, 264, 7553, 1939, 570, + 7553, 1943, 380, 516, 281, 3470, 670, 565, 456, 486, 312, 777, 1627, 51428], "temperature": + 0.0, "avg_logprob": -0.11699887980585513, "compression_ratio": 1.7981220657276995, + "no_speech_prob": 0.00217351783066988}, {"id": 621, "seek": 442100, "start": 4442.28, + "end": 4448.36, "text": " algos by the way the billion scale competition going on + there will be probably new algorithms right", "tokens": [51428, 3501, 329, 538, + 264, 636, 264, 5218, 4373, 6211, 516, 322, 456, 486, 312, 1391, 777, 14642, 558, + 51732], "temperature": 0.0, "avg_logprob": -0.11699887980585513, "compression_ratio": + 1.7981220657276995, "no_speech_prob": 0.00217351783066988}, {"id": 622, "seek": + 444836, "start": 4448.36, "end": 4453.16, "text": " that will beat in performance + and then eventually performance will stop mattering in a way right", "tokens": [50364, + 300, 486, 4224, 294, 3389, 293, 550, 4728, 3389, 486, 1590, 1871, 278, 294, 257, + 636, 558, 50604], "temperature": 0.0, "avg_logprob": -0.15147077387029476, "compression_ratio": + 1.85546875, "no_speech_prob": 0.003950993996113539}, {"id": 623, "seek": 444836, + "start": 4453.16, "end": 4459.4, "text": " like it will be something else it will + be like okay you know what can I do yes yeah so and let me", "tokens": [50604, 411, + 309, 486, 312, 746, 1646, 309, 486, 312, 411, 1392, 291, 458, 437, 393, 286, 360, + 2086, 1338, 370, 293, 718, 385, 50916], "temperature": 0.0, "avg_logprob": -0.15147077387029476, + "compression_ratio": 1.85546875, "no_speech_prob": 0.003950993996113539}, {"id": + 624, "seek": 444836, "start": 4459.4, "end": 
4464.839999999999, "text": " if I may + make it a quick metaphor there I mean probably this metaphor is used many times + for", "tokens": [50916, 498, 286, 815, 652, 309, 257, 1702, 19157, 456, 286, 914, + 1391, 341, 19157, 307, 1143, 867, 1413, 337, 51188], "temperature": 0.0, "avg_logprob": + -0.15147077387029476, "compression_ratio": 1.85546875, "no_speech_prob": 0.003950993996113539}, + {"id": 625, "seek": 444836, "start": 4464.839999999999, "end": 4471.96, "text": + " for the technology but just to to to make it as well so you know the other day + I ate at a great", "tokens": [51188, 337, 264, 2899, 457, 445, 281, 281, 281, 652, + 309, 382, 731, 370, 291, 458, 264, 661, 786, 286, 8468, 412, 257, 869, 51544], "temperature": + 0.0, "avg_logprob": -0.15147077387029476, "compression_ratio": 1.85546875, "no_speech_prob": + 0.003950993996113539}, {"id": 626, "seek": 444836, "start": 4471.96, "end": 4476.599999999999, + "text": " restaurant with a friend of mine with great amazing food it was great + atmosphere great food", "tokens": [51544, 6383, 365, 257, 1277, 295, 3892, 365, + 869, 2243, 1755, 309, 390, 869, 8018, 869, 1755, 51776], "temperature": 0.0, "avg_logprob": + -0.15147077387029476, "compression_ratio": 1.85546875, "no_speech_prob": 0.003950993996113539}, + {"id": 627, "seek": 447660, "start": 4476.6, "end": 4482.76, "text": " everything + right so that is the that it would be the the metaphor the use case that you have + but", "tokens": [50364, 1203, 558, 370, 300, 307, 264, 300, 309, 576, 312, 264, + 264, 19157, 264, 764, 1389, 300, 291, 362, 457, 50672], "temperature": 0.0, "avg_logprob": + -0.1291228895602019, "compression_ratio": 1.8177570093457944, "no_speech_prob": + 0.004145899787545204}, {"id": 628, "seek": 447660, "start": 4482.76, "end": 4489.160000000001, + "text": " my friend you know she said like I actually want to know how they make + this so she also bought a", "tokens": [50672, 452, 1277, 291, 458, 750, 848, 411, + 286, 767, 528, 281, 
458, 577, 436, 652, 341, 370, 750, 611, 4243, 257, 50992], "temperature": + 0.0, "avg_logprob": -0.1291228895602019, "compression_ratio": 1.8177570093457944, + "no_speech_prob": 0.004145899787545204}, {"id": 629, "seek": 447660, "start": 4489.160000000001, + "end": 4494.280000000001, "text": " cookbook right with the recipes in it so now + she could go into the book and she could actually go", "tokens": [50992, 2543, 2939, + 558, 365, 264, 13035, 294, 309, 370, 586, 750, 727, 352, 666, 264, 1446, 293, 750, + 727, 767, 352, 51248], "temperature": 0.0, "avg_logprob": -0.1291228895602019, "compression_ratio": + 1.8177570093457944, "no_speech_prob": 0.004145899787545204}, {"id": 630, "seek": + 447660, "start": 4494.280000000001, "end": 4501.88, "text": " even a level deeper + and actually see how the chef prepared the the dishes which is fine I was not", + "tokens": [51248, 754, 257, 1496, 7731, 293, 767, 536, 577, 264, 10530, 4927, 264, + 264, 10814, 597, 307, 2489, 286, 390, 406, 51628], "temperature": 0.0, "avg_logprob": + -0.1291228895602019, "compression_ratio": 1.8177570093457944, "no_speech_prob": + 0.004145899787545204}, {"id": 631, "seek": 450188, "start": 4502.2, "end": 4506.6, + "text": " I was I just wanted to have like some nice food in a nice glass wine she + want to know a little bit more", "tokens": [50380, 286, 390, 286, 445, 1415, 281, + 362, 411, 512, 1481, 1755, 294, 257, 1481, 4276, 7209, 750, 528, 281, 458, 257, + 707, 857, 544, 50600], "temperature": 0.0, "avg_logprob": -0.11536905135231457, + "compression_ratio": 1.6930232558139535, "no_speech_prob": 0.006095044780522585}, + {"id": 632, "seek": 450188, "start": 4507.400000000001, "end": 4514.28, "text": + " and I think if we''re smart about this and I think also that is the where", "tokens": + [50640, 293, 286, 519, 498, 321, 434, 4069, 466, 341, 293, 286, 519, 611, 300, 307, + 264, 689, 50984], "temperature": 0.0, "avg_logprob": -0.11536905135231457, "compression_ratio": + 1.6930232558139535, 
"no_speech_prob": 0.006095044780522585}, {"id": 633, "seek": + 450188, "start": 4515.96, "end": 4522.28, "text": " open source business development + is now in you know 2021 almost 2022 is that you can actually", "tokens": [51068, + 1269, 4009, 1606, 3250, 307, 586, 294, 291, 458, 7201, 1920, 20229, 307, 300, 291, + 393, 767, 51384], "temperature": 0.0, "avg_logprob": -0.11536905135231457, "compression_ratio": + 1.6930232558139535, "no_speech_prob": 0.006095044780522585}, {"id": 634, "seek": + 450188, "start": 4522.28, "end": 4529.32, "text": " cater that to that whole stack + if you do that smart and because they you know they they click", "tokens": [51384, + 21557, 300, 281, 300, 1379, 8630, 498, 291, 360, 300, 4069, 293, 570, 436, 291, + 458, 436, 436, 2052, 51736], "temperature": 0.0, "avg_logprob": -0.11536905135231457, + "compression_ratio": 1.6930232558139535, "no_speech_prob": 0.006095044780522585}, + {"id": 635, "seek": 452932, "start": 4529.32, "end": 4537.5599999999995, "text": + " into each other so I so the point is like you said I don''t think that the the + technology and talk", "tokens": [50364, 666, 1184, 661, 370, 286, 370, 264, 935, + 307, 411, 291, 848, 286, 500, 380, 519, 300, 264, 264, 2899, 293, 751, 50776], "temperature": + 0.0, "avg_logprob": -0.1370929929945204, "compression_ratio": 1.9896907216494846, + "no_speech_prob": 0.003353449981659651}, {"id": 636, "seek": 452932, "start": 4537.5599999999995, + "end": 4543.719999999999, "text": " about technology is also very important but + I don''t think it''s the only thing we should talk about", "tokens": [50776, 466, + 2899, 307, 611, 588, 1021, 457, 286, 500, 380, 519, 309, 311, 264, 787, 551, 321, + 820, 751, 466, 51084], "temperature": 0.0, "avg_logprob": -0.1370929929945204, "compression_ratio": + 1.9896907216494846, "no_speech_prob": 0.003353449981659651}, {"id": 637, "seek": + 452932, "start": 4543.719999999999, "end": 4548.12, "text": " I think we should + make sure that we talk about 
both and that they''re constantly aligned that''s also", + "tokens": [51084, 286, 519, 321, 820, 652, 988, 300, 321, 751, 466, 1293, 293, 300, + 436, 434, 6460, 17962, 300, 311, 611, 51304], "temperature": 0.0, "avg_logprob": + -0.1370929929945204, "compression_ratio": 1.9896907216494846, "no_speech_prob": + 0.003353449981659651}, {"id": 638, "seek": 452932, "start": 4548.12, "end": 4554.12, + "text": " within how we talk about we get internally that is that is that is aligned + but people use", "tokens": [51304, 1951, 577, 321, 751, 466, 321, 483, 19501, 300, + 307, 300, 307, 300, 307, 17962, 457, 561, 764, 51604], "temperature": 0.0, "avg_logprob": + -0.1370929929945204, "compression_ratio": 1.9896907216494846, "no_speech_prob": + 0.003353449981659651}, {"id": 639, "seek": 455412, "start": 4554.12, "end": 4559.4, + "text": " different words and different ways to describe the technology of people + that are you know helping", "tokens": [50364, 819, 2283, 293, 819, 2098, 281, 6786, + 264, 2899, 295, 561, 300, 366, 291, 458, 4315, 50628], "temperature": 0.0, "avg_logprob": + -0.14932140735311245, "compression_ratio": 1.8208955223880596, "no_speech_prob": + 0.006128443870693445}, {"id": 640, "seek": 455412, "start": 4559.4, "end": 4565.8, + "text": " me on the business development they use different words they never talk + about H&W like the people", "tokens": [50628, 385, 322, 264, 1606, 3250, 436, 764, + 819, 2283, 436, 1128, 751, 466, 389, 5, 54, 411, 264, 561, 50948], "temperature": + 0.0, "avg_logprob": -0.14932140735311245, "compression_ratio": 1.8208955223880596, + "no_speech_prob": 0.006128443870693445}, {"id": 641, "seek": 455412, "start": 4565.8, + "end": 4570.84, "text": " in the cortex team do but they have a great understanding + about the use case that are being sold", "tokens": [50948, 294, 264, 33312, 1469, + 360, 457, 436, 362, 257, 869, 3701, 466, 264, 764, 1389, 300, 366, 885, 3718, 51200], + "temperature": 0.0, "avg_logprob": -0.14932140735311245, 
"compression_ratio": 1.8208955223880596, + "no_speech_prob": 0.006128443870693445}, {"id": 642, "seek": 455412, "start": 4570.84, + "end": 4575.5599999999995, "text": " so and I think that all needs to come together + and we are an amazing point in time where that''s", "tokens": [51200, 370, 293, + 286, 519, 300, 439, 2203, 281, 808, 1214, 293, 321, 366, 364, 2243, 935, 294, 565, + 689, 300, 311, 51436], "temperature": 0.0, "avg_logprob": -0.14932140735311245, + "compression_ratio": 1.8208955223880596, "no_speech_prob": 0.006128443870693445}, + {"id": 643, "seek": 455412, "start": 4575.5599999999995, "end": 4580.68, "text": + " happening for vectors search so I think that''s just that''s just amazing and + by the way kudos to you", "tokens": [51436, 2737, 337, 18875, 3164, 370, 286, 519, + 300, 311, 445, 300, 311, 445, 2243, 293, 538, 264, 636, 350, 35063, 281, 291, 51692], + "temperature": 0.0, "avg_logprob": -0.14932140735311245, "compression_ratio": 1.8208955223880596, + "no_speech_prob": 0.006128443870693445}, {"id": 644, "seek": 458068, "start": 4580.68, + "end": 4585.88, "text": " as well right so you''re you''re you''re carving out your + own niche as well there so good for you", "tokens": [50364, 382, 731, 558, 370, + 291, 434, 291, 434, 291, 434, 31872, 484, 428, 1065, 19956, 382, 731, 456, 370, + 665, 337, 291, 50624], "temperature": 0.0, "avg_logprob": -0.17119889406813787, + "compression_ratio": 1.7972350230414746, "no_speech_prob": 0.019607143476605415}, + {"id": 645, "seek": 458068, "start": 4585.88, "end": 4590.84, "text": " that''s + nice you know and it''s cool to see yeah being independent kind of in this field + because", "tokens": [50624, 300, 311, 1481, 291, 458, 293, 309, 311, 1627, 281, + 536, 1338, 885, 6695, 733, 295, 294, 341, 2519, 570, 50872], "temperature": 0.0, + "avg_logprob": -0.17119889406813787, "compression_ratio": 1.7972350230414746, "no_speech_prob": + 0.019607143476605415}, {"id": 646, "seek": 458068, "start": 4591.88, "end": 
4598.92, + "text": " it also opens opens doors to talk to to guys like you really and and and + you know like if I was your", "tokens": [50924, 309, 611, 9870, 9870, 8077, 281, + 751, 281, 281, 1074, 411, 291, 534, 293, 293, 293, 291, 458, 411, 498, 286, 390, + 428, 51276], "temperature": 0.0, "avg_logprob": -0.17119889406813787, "compression_ratio": + 1.7972350230414746, "no_speech_prob": 0.019607143476605415}, {"id": 647, "seek": + 458068, "start": 4598.92, "end": 4605.0, "text": " competitor maybe you wouldn''t + want to talk to me this early not not on it''s right it''s a discussion", "tokens": + [51276, 27266, 1310, 291, 2759, 380, 528, 281, 751, 281, 385, 341, 2440, 406, 406, + 322, 309, 311, 558, 309, 311, 257, 5017, 51580], "temperature": 0.0, "avg_logprob": + -0.17119889406813787, "compression_ratio": 1.7972350230414746, "no_speech_prob": + 0.019607143476605415}, {"id": 648, "seek": 460500, "start": 4605.0, "end": 4611.72, + "text": " it''s a different discussion yes and at the same time as I said in the + first episode I''m actually", "tokens": [50364, 309, 311, 257, 819, 5017, 2086, + 293, 412, 264, 912, 565, 382, 286, 848, 294, 264, 700, 3500, 286, 478, 767, 50700], + "temperature": 0.0, "avg_logprob": -0.1871852054390856, "compression_ratio": 1.7361111111111112, + "no_speech_prob": 0.0026016789488494396}, {"id": 649, "seek": 460500, "start": 4611.72, + "end": 4618.36, "text": " educating myself a lot on this so in the process of this + I hope to share you know the learnings", "tokens": [50700, 28835, 2059, 257, 688, + 322, 341, 370, 294, 264, 1399, 295, 341, 286, 1454, 281, 2073, 291, 458, 264, 2539, + 82, 51032], "temperature": 0.0, "avg_logprob": -0.1871852054390856, "compression_ratio": + 1.7361111111111112, "no_speech_prob": 0.0026016789488494396}, {"id": 650, "seek": + 460500, "start": 4618.36, "end": 4623.48, "text": " and and benefit everyone including + myself so so that''s that''s that''s the that''s the way to", "tokens": [51032, + 293, 293, 
5121, 1518, 3009, 2059, 370, 370, 300, 311, 300, 311, 300, 311, 264, 300, + 311, 264, 636, 281, 51288], "temperature": 0.0, "avg_logprob": -0.1871852054390856, + "compression_ratio": 1.7361111111111112, "no_speech_prob": 0.0026016789488494396}, + {"id": 651, "seek": 460500, "start": 4623.48, "end": 4630.44, "text": " to go forward + so kind of on the open source side right so I''m open sourcing and yes exactly", + "tokens": [51288, 281, 352, 2128, 370, 733, 295, 322, 264, 1269, 4009, 1252, 558, + 370, 286, 478, 1269, 11006, 2175, 293, 2086, 2293, 51636], "temperature": 0.0, "avg_logprob": + -0.1871852054390856, "compression_ratio": 1.7361111111111112, "no_speech_prob": + 0.0026016789488494396}, {"id": 652, "seek": 463044, "start": 4630.759999999999, + "end": 4638.44, "text": " hey Bob so you you you really shared so much inside in + in what you do on the product side", "tokens": [50380, 4177, 6085, 370, 291, 291, + 291, 534, 5507, 370, 709, 1854, 294, 294, 437, 291, 360, 322, 264, 1674, 1252, 50764], + "temperature": 0.0, "avg_logprob": -0.18755880634436448, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.009918213821947575}, {"id": 653, "seek": 463044, "start": 4639.08, + "end": 4645.08, "text": " in at Bavid as well as technology I want to drill into + more into this kind of philosophical", "tokens": [50796, 294, 412, 363, 706, 327, + 382, 731, 382, 2899, 286, 528, 281, 11392, 666, 544, 666, 341, 733, 295, 25066, + 51096], "temperature": 0.0, "avg_logprob": -0.18755880634436448, "compression_ratio": + 1.7464114832535884, "no_speech_prob": 0.009918213821947575}, {"id": 654, "seek": + 463044, "start": 4645.799999999999, "end": 4651.5599999999995, "text": " level why + you do this and when I say why like I mean you personally right and that that", + "tokens": [51132, 1496, 983, 291, 360, 341, 293, 562, 286, 584, 983, 411, 286, 914, + 291, 5665, 558, 293, 300, 300, 51420], "temperature": 0.0, "avg_logprob": -0.18755880634436448, + "compression_ratio": 
1.7464114832535884, "no_speech_prob": 0.009918213821947575}, + {"id": 655, "seek": 463044, "start": 4651.5599999999995, "end": 4656.919999999999, + "text": " probably propagates to your team as well and we can even ask everyone + on your team why you guys", "tokens": [51420, 1391, 12425, 1024, 281, 428, 1469, + 382, 731, 293, 321, 393, 754, 1029, 1518, 322, 428, 1469, 983, 291, 1074, 51688], + "temperature": 0.0, "avg_logprob": -0.18755880634436448, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.009918213821947575}, {"id": 656, "seek": 465692, "start": 4656.92, + "end": 4663.4800000000005, "text": " do it do this right but you are the visionary + you are the core of this like what brings you to this", "tokens": [50364, 360, 309, + 360, 341, 558, 457, 291, 366, 264, 49442, 291, 366, 264, 4965, 295, 341, 411, 437, + 5607, 291, 281, 341, 50692], "temperature": 0.0, "avg_logprob": -0.11447913116878933, + "compression_ratio": 1.760233918128655, "no_speech_prob": 0.001652879873290658}, + {"id": 657, "seek": 465692, "start": 4663.4800000000005, "end": 4670.92, "text": + " field you are the forefront of it yeah so it''s like the that''s a great question + by the way so thanks for", "tokens": [50692, 2519, 291, 366, 264, 27287, 295, 309, + 1338, 370, 309, 311, 411, 264, 300, 311, 257, 869, 1168, 538, 264, 636, 370, 3231, + 337, 51064], "temperature": 0.0, "avg_logprob": -0.11447913116878933, "compression_ratio": + 1.760233918128655, "no_speech_prob": 0.001652879873290658}, {"id": 658, "seek": + 465692, "start": 4670.92, "end": 4680.68, "text": " asking so the so when I was + working on just experimenting with these with these models and these", "tokens": + [51064, 3365, 370, 264, 370, 562, 286, 390, 1364, 322, 445, 29070, 365, 613, 365, + 613, 5245, 293, 613, 51552], "temperature": 0.0, "avg_logprob": -0.11447913116878933, + "compression_ratio": 1.760233918128655, "no_speech_prob": 0.001652879873290658}, + {"id": 659, "seek": 468068, "start": 4680.68, "end": 
4687.320000000001, "text": + " and these representations and I started to do research on actually how much how + big is this how", "tokens": [50364, 293, 613, 33358, 293, 286, 1409, 281, 360, 2132, + 322, 767, 577, 709, 577, 955, 307, 341, 577, 50696], "temperature": 0.0, "avg_logprob": + -0.1724344341234229, "compression_ratio": 1.8151658767772512, "no_speech_prob": + 0.0031267397571355104}, {"id": 660, "seek": 468068, "start": 4687.320000000001, + "end": 4692.6, "text": " much unstructured data is there actually that we could + potentially help it then I found that this", "tokens": [50696, 709, 18799, 46847, + 1412, 307, 456, 767, 300, 321, 727, 7263, 854, 309, 550, 286, 1352, 300, 341, 50960], + "temperature": 0.0, "avg_logprob": -0.1724344341234229, "compression_ratio": 1.8151658767772512, + "no_speech_prob": 0.0031267397571355104}, {"id": 661, "seek": 468068, "start": 4692.6, + "end": 4698.200000000001, "text": " was so big and at some point I was just walking + down the street literally and I was and if you", "tokens": [50960, 390, 370, 955, + 293, 412, 512, 935, 286, 390, 445, 4494, 760, 264, 4838, 3736, 293, 286, 390, 293, + 498, 291, 51240], "temperature": 0.0, "avg_logprob": -0.1724344341234229, "compression_ratio": + 1.8151658767772512, "no_speech_prob": 0.0031267397571355104}, {"id": 662, "seek": + 468068, "start": 4698.200000000001, "end": 4703.88, "text": " just walk down street + it''s like like in the matrix when when you see everything like that it''s", "tokens": + [51240, 445, 1792, 760, 4838, 309, 311, 411, 411, 294, 264, 8141, 562, 562, 291, + 536, 1203, 411, 300, 309, 311, 51524], "temperature": 0.0, "avg_logprob": -0.1724344341234229, + "compression_ratio": 1.8151658767772512, "no_speech_prob": 0.0031267397571355104}, + {"id": 663, "seek": 470388, "start": 4703.88, "end": 4710.52, "text": " matrix I + was like wow every company that I see every truck that I see driving by an airplane", + "tokens": [50364, 8141, 286, 390, 411, 6076, 633, 2237, 
300, 286, 536, 633, 5898, + 300, 286, 536, 4840, 538, 364, 17130, 50696], "temperature": 0.0, "avg_logprob": + -0.13557289458893157, "compression_ratio": 1.9476439790575917, "no_speech_prob": + 0.0016128345159813762}, {"id": 664, "seek": 470388, "start": 4710.52, "end": 4716.36, + "text": " that I see coming over a warehouse that I see they could potentially use + with you", "tokens": [50696, 300, 286, 536, 1348, 670, 257, 22244, 300, 286, 536, + 436, 727, 7263, 764, 365, 291, 50988], "temperature": 0.0, "avg_logprob": -0.13557289458893157, + "compression_ratio": 1.9476439790575917, "no_speech_prob": 0.0016128345159813762}, + {"id": 665, "seek": 470388, "start": 4717.88, "end": 4724.36, "text": " and that''s + that the dream that I just can walk down the street and I was like oh you see the + truck", "tokens": [51064, 293, 300, 311, 300, 264, 3055, 300, 286, 445, 393, 1792, + 760, 264, 4838, 293, 286, 390, 411, 1954, 291, 536, 264, 5898, 51388], "temperature": + 0.0, "avg_logprob": -0.13557289458893157, "compression_ratio": 1.9476439790575917, + "no_speech_prob": 0.0016128345159813762}, {"id": 666, "seek": 470388, "start": 4724.36, + "end": 4729.4800000000005, "text": " there yeah the company uses us to do X oh you + see the hospital over there yeah they use us to do Y", "tokens": [51388, 456, 1338, + 264, 2237, 4960, 505, 281, 360, 1783, 1954, 291, 536, 264, 4530, 670, 456, 1338, + 436, 764, 505, 281, 360, 398, 51644], "temperature": 0.0, "avg_logprob": -0.13557289458893157, + "compression_ratio": 1.9476439790575917, "no_speech_prob": 0.0016128345159813762}, + {"id": 667, "seek": 472948, "start": 4730.04, "end": 4738.5199999999995, "text": + " etc etc and that is such a thrill to be in that in a new niche and trying to", + "tokens": [50392, 5183, 5183, 293, 300, 307, 1270, 257, 32935, 281, 312, 294, 300, + 294, 257, 777, 19956, 293, 1382, 281, 50816], "temperature": 0.0, "avg_logprob": + -0.1285938673381564, "compression_ratio": 1.9655172413793103, 
"no_speech_prob": + 0.00478304922580719}, {"id": 668, "seek": 472948, "start": 4740.599999999999, "end": + 4746.5199999999995, "text": " trying to build that product and and try to build + it a solid product and bring that to", "tokens": [50920, 1382, 281, 1322, 300, 1674, + 293, 293, 853, 281, 1322, 309, 257, 5100, 1674, 293, 1565, 300, 281, 51216], "temperature": + 0.0, "avg_logprob": -0.1285938673381564, "compression_ratio": 1.9655172413793103, + "no_speech_prob": 0.00478304922580719}, {"id": 669, "seek": 472948, "start": 4746.5199999999995, + "end": 4752.12, "text": " you know that new product to people solve new problems + that is a personal driver plus", "tokens": [51216, 291, 458, 300, 777, 1674, 281, + 561, 5039, 777, 2740, 300, 307, 257, 2973, 6787, 1804, 51496], "temperature": 0.0, + "avg_logprob": -0.1285938673381564, "compression_ratio": 1.9655172413793103, "no_speech_prob": + 0.00478304922580719}, {"id": 670, "seek": 472948, "start": 4752.679999999999, "end": + 4755.959999999999, "text": " something that is just something that I''m personally + very interested in and that''s something", "tokens": [51524, 746, 300, 307, 445, + 746, 300, 286, 478, 5665, 588, 3102, 294, 293, 300, 311, 746, 51688], "temperature": + 0.0, "avg_logprob": -0.1285938673381564, "compression_ratio": 1.9655172413793103, + "no_speech_prob": 0.00478304922580719}, {"id": 671, "seek": 475596, "start": 4755.96, + "end": 4761.96, "text": " that you already I guess noticed by the way that I you + know present my answers and that grew over time", "tokens": [50364, 300, 291, 1217, + 286, 2041, 5694, 538, 264, 636, 300, 286, 291, 458, 1974, 452, 6338, 293, 300, 6109, + 670, 565, 50664], "temperature": 0.0, "avg_logprob": -0.1547947625319163, "compression_ratio": + 1.9611650485436893, "no_speech_prob": 0.0024568962398916483}, {"id": 672, "seek": + 475596, "start": 4761.96, "end": 4769.24, "text": " is that I became interested + in that that layer between you have to the tech and how people 
use it", "tokens": + [50664, 307, 300, 286, 3062, 3102, 294, 300, 300, 4583, 1296, 291, 362, 281, 264, + 7553, 293, 577, 561, 764, 309, 51028], "temperature": 0.0, "avg_logprob": -0.1547947625319163, + "compression_ratio": 1.9611650485436893, "no_speech_prob": 0.0024568962398916483}, + {"id": 673, "seek": 475596, "start": 4769.24, "end": 4775.24, "text": " there''s + like and there''s like this this overlapped there I''m interested in that overlapped + and like how", "tokens": [51028, 456, 311, 411, 293, 456, 311, 411, 341, 341, 670, + 875, 3320, 456, 286, 478, 3102, 294, 300, 670, 875, 3320, 293, 411, 577, 51328], + "temperature": 0.0, "avg_logprob": -0.1547947625319163, "compression_ratio": 1.9611650485436893, + "no_speech_prob": 0.0024568962398916483}, {"id": 674, "seek": 475596, "start": 4775.24, + "end": 4779.96, "text": " do people use the technology how does that create value + and how can we bring it to them and how can", "tokens": [51328, 360, 561, 764, 264, + 2899, 577, 775, 300, 1884, 2158, 293, 577, 393, 321, 1565, 309, 281, 552, 293, 577, + 393, 51564], "temperature": 0.0, "avg_logprob": -0.1547947625319163, "compression_ratio": + 1.9611650485436893, "no_speech_prob": 0.0024568962398916483}, {"id": 675, "seek": + 477996, "start": 4779.96, "end": 4786.44, "text": " we capture some of them and + that is something that I''m extremely interested in and and this is", "tokens": + [50364, 321, 7983, 512, 295, 552, 293, 300, 307, 746, 300, 286, 478, 4664, 3102, + 294, 293, 293, 341, 307, 50688], "temperature": 0.0, "avg_logprob": -0.16177059741730385, + "compression_ratio": 1.8792270531400965, "no_speech_prob": 0.001684472430497408}, + {"id": 676, "seek": 477996, "start": 4786.44, "end": 4791.56, "text": " just you + know some of my technologies and then we''ve got this product is vehicle to do that + so it''s", "tokens": [50688, 445, 291, 458, 512, 295, 452, 7943, 293, 550, 321, + 600, 658, 341, 1674, 307, 5864, 281, 360, 300, 370, 309, 311, 50944], 
"temperature": + 0.0, "avg_logprob": -0.16177059741730385, "compression_ratio": 1.8792270531400965, + "no_speech_prob": 0.001684472430497408}, {"id": 677, "seek": 477996, "start": 4791.56, + "end": 4797.56, "text": " like if we then it just you know if we think big and we + think about this new niche with new database", "tokens": [50944, 411, 498, 321, + 550, 309, 445, 291, 458, 498, 321, 519, 955, 293, 321, 519, 466, 341, 777, 19956, + 365, 777, 8149, 51244], "temperature": 0.0, "avg_logprob": -0.16177059741730385, + "compression_ratio": 1.8792270531400965, "no_speech_prob": 0.001684472430497408}, + {"id": 678, "seek": 477996, "start": 4797.56, "end": 4803.56, "text": " technology + then just let''s just go all in and just see you know how far we can bring this + and", "tokens": [51244, 2899, 550, 445, 718, 311, 445, 352, 439, 294, 293, 445, + 536, 291, 458, 577, 1400, 321, 393, 1565, 341, 293, 51544], "temperature": 0.0, + "avg_logprob": -0.16177059741730385, "compression_ratio": 1.8792270531400965, "no_speech_prob": + 0.001684472430497408}, {"id": 679, "seek": 480356, "start": 4803.64, "end": 4811.56, + "text": " that''s way more to say about this but the it''s such an exciting time + to work in this and so that''s", "tokens": [50368, 300, 311, 636, 544, 281, 584, + 466, 341, 457, 264, 309, 311, 1270, 364, 4670, 565, 281, 589, 294, 341, 293, 370, + 300, 311, 50764], "temperature": 0.0, "avg_logprob": -0.14675349990526834, "compression_ratio": + 1.8309859154929577, "no_speech_prob": 0.004342799074947834}, {"id": 680, "seek": + 480356, "start": 4811.56, "end": 4819.56, "text": " my personal reason why I do + this and so yeah so that yeah since since you are so big on use cases is", "tokens": + [50764, 452, 2973, 1778, 983, 286, 360, 341, 293, 370, 1338, 370, 300, 1338, 1670, + 1670, 291, 366, 370, 955, 322, 764, 3331, 307, 51164], "temperature": 0.0, "avg_logprob": + -0.14675349990526834, "compression_ratio": 1.8309859154929577, "no_speech_prob": + 
0.004342799074947834}, {"id": 681, "seek": 480356, "start": 4819.56, "end": 4826.4400000000005, + "text": " there a specific use case that drives you the that gets solved maybe it + wasn''t solved yet maybe", "tokens": [51164, 456, 257, 2685, 764, 1389, 300, 11754, + 291, 264, 300, 2170, 13041, 1310, 309, 2067, 380, 13041, 1939, 1310, 51508], "temperature": + 0.0, "avg_logprob": -0.14675349990526834, "compression_ratio": 1.8309859154929577, + "no_speech_prob": 0.004342799074947834}, {"id": 682, "seek": 480356, "start": 4826.4400000000005, + "end": 4832.4400000000005, "text": " it was already solved by the way in your in + your videos you know like you always kind of quite", "tokens": [51508, 309, 390, + 1217, 13041, 538, 264, 636, 294, 428, 294, 428, 2145, 291, 458, 411, 291, 1009, + 733, 295, 1596, 51808], "temperature": 0.0, "avg_logprob": -0.14675349990526834, + "compression_ratio": 1.8309859154929577, "no_speech_prob": 0.004342799074947834}, + {"id": 683, "seek": 483244, "start": 4832.44, "end": 4839.639999999999, "text": + " frequently you say okay imagine a wine store right thinking I''m thinking probably + there is a good", "tokens": [50364, 10374, 291, 584, 1392, 3811, 257, 7209, 3531, + 558, 1953, 286, 478, 1953, 1391, 456, 307, 257, 665, 50724], "temperature": 0.0, + "avg_logprob": -0.1882639649093792, "compression_ratio": 1.8101851851851851, "no_speech_prob": + 0.0007089157588779926}, {"id": 684, "seek": 483244, "start": 4839.639999999999, + "end": 4845.639999999999, "text": " why wine in the Holland I should when when I + travel let''s let''s get together and drink some good wine", "tokens": [50724, 983, + 7209, 294, 264, 27201, 286, 820, 562, 562, 286, 3147, 718, 311, 718, 311, 483, 1214, + 293, 2822, 512, 665, 7209, 51024], "temperature": 0.0, "avg_logprob": -0.1882639649093792, + "compression_ratio": 1.8101851851851851, "no_speech_prob": 0.0007089157588779926}, + {"id": 685, "seek": 483244, "start": 4846.36, "end": 4851.4, "text": " but you know + 
like is that the use case that drives you is there something else that you think", + "tokens": [51060, 457, 291, 458, 411, 307, 300, 264, 764, 1389, 300, 11754, 291, + 307, 456, 746, 1646, 300, 291, 519, 51312], "temperature": 0.0, "avg_logprob": -0.1882639649093792, + "compression_ratio": 1.8101851851851851, "no_speech_prob": 0.0007089157588779926}, + {"id": 686, "seek": 483244, "start": 4851.4, "end": 4861.08, "text": " could be + solved by the aviate oh no so it wasn''t yeah so the what what drives me is the + there are", "tokens": [51312, 727, 312, 13041, 538, 264, 1305, 13024, 1954, 572, + 370, 309, 2067, 380, 1338, 370, 264, 437, 437, 11754, 385, 307, 264, 456, 366, 51796], + "temperature": 0.0, "avg_logprob": -0.1882639649093792, "compression_ratio": 1.8101851851851851, + "no_speech_prob": 0.0007089157588779926}, {"id": 687, "seek": 486108, "start": 4861.08, + "end": 4870.04, "text": " certain use cases that could be I think and I''m now doing + this from the top of my head that you could", "tokens": [50364, 1629, 764, 3331, + 300, 727, 312, 286, 519, 293, 286, 478, 586, 884, 341, 490, 264, 1192, 295, 452, + 1378, 300, 291, 727, 50812], "temperature": 0.0, "avg_logprob": -0.12727249186971915, + "compression_ratio": 1.7716894977168949, "no_speech_prob": 0.0005894072819501162}, + {"id": 688, "seek": 486108, "start": 4872.44, "end": 4879.72, "text": " you could + look at them from the perspective of size so that a that big you know large corporates", + "tokens": [50932, 291, 727, 574, 412, 552, 490, 264, 4585, 295, 2744, 370, 300, + 257, 300, 955, 291, 458, 2416, 6804, 1024, 51296], "temperature": 0.0, "avg_logprob": + -0.12727249186971915, "compression_ratio": 1.7716894977168949, "no_speech_prob": + 0.0005894072819501162}, {"id": 689, "seek": 486108, "start": 4879.72, "end": 4884.68, + "text": " are working with VGA trying to solve problems that I go like this is amazing + that they use this", "tokens": [51296, 366, 1364, 365, 691, 12570, 1382, 281, 5039, + 
2740, 300, 286, 352, 411, 341, 307, 2243, 300, 436, 764, 341, 51544], "temperature": + 0.0, "avg_logprob": -0.12727249186971915, "compression_ratio": 1.7716894977168949, + "no_speech_prob": 0.0005894072819501162}, {"id": 690, "seek": 486108, "start": 4885.32, + "end": 4890.5199999999995, "text": " right that is that is something that I want + hand that drives me the other hand what drives me", "tokens": [51576, 558, 300, + 307, 300, 307, 746, 300, 286, 528, 1011, 300, 11754, 385, 264, 661, 1011, 437, 11754, + 385, 51836], "temperature": 0.0, "avg_logprob": -0.12727249186971915, "compression_ratio": + 1.7716894977168949, "no_speech_prob": 0.0005894072819501162}, {"id": 691, "seek": + 489052, "start": 4890.52, "end": 4898.360000000001, "text": " is that people were + looking at vv8 to to use it to where it has an impact in people''s lives so that", + "tokens": [50364, 307, 300, 561, 645, 1237, 412, 371, 85, 23, 281, 281, 764, 309, + 281, 689, 309, 575, 364, 2712, 294, 561, 311, 2909, 370, 300, 50756], "temperature": + 0.0, "avg_logprob": -0.11230423012558295, "compression_ratio": 1.7149122807017543, + "no_speech_prob": 0.0005562844453379512}, {"id": 692, "seek": 489052, "start": 4898.360000000001, + "end": 4906.200000000001, "text": " can be medical use cases or even in even go + as far as the HR example that I gave right and I''m", "tokens": [50756, 393, 312, + 4625, 764, 3331, 420, 754, 294, 754, 352, 382, 1400, 382, 264, 19460, 1365, 300, + 286, 2729, 558, 293, 286, 478, 51148], "temperature": 0.0, "avg_logprob": -0.11230423012558295, + "compression_ratio": 1.7149122807017543, "no_speech_prob": 0.0005562844453379512}, + {"id": 693, "seek": 489052, "start": 4906.200000000001, "end": 4911.88, "text": + " a little bit vague about these use case because we still working on them but so + when they''re big", "tokens": [51148, 257, 707, 857, 24247, 466, 613, 764, 1389, + 570, 321, 920, 1364, 322, 552, 457, 370, 562, 436, 434, 955, 51432], "temperature": + 0.0, 
"avg_logprob": -0.11230423012558295, "compression_ratio": 1.7149122807017543, + "no_speech_prob": 0.0005562844453379512}, {"id": 694, "seek": 489052, "start": 4911.88, + "end": 4919.4800000000005, "text": " or when they have an important positive impact + on people''s lives that is amazing so if I present to", "tokens": [51432, 420, 562, + 436, 362, 364, 1021, 3353, 2712, 322, 561, 311, 2909, 300, 307, 2243, 370, 498, + 286, 1974, 281, 51812], "temperature": 0.0, "avg_logprob": -0.11230423012558295, + "compression_ratio": 1.7149122807017543, "no_speech_prob": 0.0005562844453379512}, + {"id": 695, "seek": 491948, "start": 4919.48, "end": 4927.24, "text": " certain + people results for for these big use cases and you see these you know people their + eyes", "tokens": [50364, 1629, 561, 3542, 337, 337, 613, 955, 764, 3331, 293, 291, + 536, 613, 291, 458, 561, 641, 2575, 50752], "temperature": 0.0, "avg_logprob": -0.1283216584812511, + "compression_ratio": 1.8979591836734695, "no_speech_prob": 0.0008573782979510725}, + {"id": 696, "seek": 491948, "start": 4927.24, "end": 4932.36, "text": " they go + open they go like wow okay I actually do that that''s amazing that is that is the + most", "tokens": [50752, 436, 352, 1269, 436, 352, 411, 6076, 1392, 286, 767, 360, + 300, 300, 311, 2243, 300, 307, 300, 307, 264, 881, 51008], "temperature": 0.0, "avg_logprob": + -0.1283216584812511, "compression_ratio": 1.8979591836734695, "no_speech_prob": + 0.0008573782979510725}, {"id": 697, "seek": 491948, "start": 4934.04, "end": 4942.04, + "text": " that''s that''s the coolest thing that''s that''s around and and there''s + also the there''s something in", "tokens": [51092, 300, 311, 300, 311, 264, 22013, + 551, 300, 311, 300, 311, 926, 293, 293, 456, 311, 611, 264, 456, 311, 746, 294, + 51492], "temperature": 0.0, "avg_logprob": -0.1283216584812511, "compression_ratio": + 1.8979591836734695, "no_speech_prob": 0.0008573782979510725}, {"id": 698, "seek": + 491948, "start": 4943.32, 
"end": 4947.24, "text": " and I don''t want to sound too + vague about it but that''s something in exciting about", "tokens": [51556, 293, + 286, 500, 380, 528, 281, 1626, 886, 24247, 466, 309, 457, 300, 311, 746, 294, 4670, + 466, 51752], "temperature": 0.0, "avg_logprob": -0.1283216584812511, "compression_ratio": + 1.8979591836734695, "no_speech_prob": 0.0008573782979510725}, {"id": 699, "seek": + 494948, "start": 4950.36, "end": 4957.0, "text": " with with with with machine learning + I guess and especially with NLP so one of the things that we", "tokens": [50408, + 365, 365, 365, 365, 3479, 2539, 286, 2041, 293, 2318, 365, 426, 45196, 370, 472, + 295, 264, 721, 300, 321, 50740], "temperature": 0.0, "avg_logprob": -0.28901233673095705, + "compression_ratio": 1.5614973262032086, "no_speech_prob": 0.008545104414224625}, + {"id": 700, "seek": 494948, "start": 4957.0, "end": 4963.32, "text": " are working + on and we hopefully gonna release for be soon as it as a demo that I said is that", + "tokens": [50740, 366, 1364, 322, 293, 321, 4696, 799, 4374, 337, 312, 2321, 382, + 309, 382, 257, 10723, 300, 286, 848, 307, 300, 51056], "temperature": 0.0, "avg_logprob": + -0.28901233673095705, "compression_ratio": 1.5614973262032086, "no_speech_prob": + 0.008545104414224625}, {"id": 701, "seek": 494948, "start": 4964.44, "end": 4970.44, + "text": " we loaded the complete Wikipedia into vh8 just the whole thing and so + we''re not talking about almost", "tokens": [51112, 321, 13210, 264, 3566, 28999, + 666, 371, 71, 23, 445, 264, 1379, 551, 293, 370, 321, 434, 406, 1417, 466, 1920, + 51412], "temperature": 0.0, "avg_logprob": -0.28901233673095705, "compression_ratio": + 1.5614973262032086, "no_speech_prob": 0.008545104414224625}, {"id": 702, "seek": + 497044, "start": 4970.44, "end": 4980.36, "text": " 100 million paragraphs and I + watched an and an Anthony Hopkins move the other day and I typed", "tokens": [50364, + 2319, 2459, 48910, 293, 286, 6337, 364, 293, 364, 15853, 
29999, 1286, 264, 661, + 786, 293, 286, 33941, 50860], "temperature": 0.0, "avg_logprob": -0.3143172768985524, + "compression_ratio": 1.6860986547085202, "no_speech_prob": 0.002411526395007968}, + {"id": 703, "seek": 497044, "start": 4980.36, "end": 4987.4, "text": " into we''ve + got just an aircraft you''re upgraded like you know and which actor played Hannibal", + "tokens": [50860, 666, 321, 600, 658, 445, 364, 9465, 291, 434, 24133, 411, 291, + 458, 293, 597, 8747, 3737, 33461, 34764, 51212], "temperature": 0.0, "avg_logprob": + -0.3143172768985524, "compression_ratio": 1.6860986547085202, "no_speech_prob": + 0.002411526395007968}, {"id": 704, "seek": 497044, "start": 4987.4, "end": 4993.16, + "text": " actor or something and then in a few milliseconds it says Anthony Hopkins + and they go like whoa", "tokens": [51212, 8747, 420, 746, 293, 550, 294, 257, 1326, + 34184, 309, 1619, 15853, 29999, 293, 436, 352, 411, 13310, 51500], "temperature": + 0.0, "avg_logprob": -0.3143172768985524, "compression_ratio": 1.6860986547085202, + "no_speech_prob": 0.002411526395007968}, {"id": 705, "seek": 497044, "start": 4993.799999999999, + "end": 4998.5199999999995, "text": " and then it''s so cool if that actually works + and if that happens that is very yeah that just", "tokens": [51532, 293, 550, 309, + 311, 370, 1627, 498, 300, 767, 1985, 293, 498, 300, 2314, 300, 307, 588, 1338, 300, + 445, 51768], "temperature": 0.0, "avg_logprob": -0.3143172768985524, "compression_ratio": + 1.6860986547085202, "no_speech_prob": 0.002411526395007968}, {"id": 706, "seek": + 499852, "start": 4998.52, "end": 5003.96, "text": " gives me a thrill and so I would + say these three things are why I''m why I''m doing yeah that''s", "tokens": [50364, + 2709, 385, 257, 32935, 293, 370, 286, 576, 584, 613, 1045, 721, 366, 983, 286, 478, + 983, 286, 478, 884, 1338, 300, 311, 50636], "temperature": 0.0, "avg_logprob": -0.12560460759305406, + "compression_ratio": 1.6339285714285714, "no_speech_prob": 
0.0034380005672574043}, + {"id": 707, "seek": 499852, "start": 5003.96, "end": 5009.72, "text": " super exciting + it''s like you know like the if I was asking the same question to me then the word", + "tokens": [50636, 1687, 4670, 309, 311, 411, 291, 458, 411, 264, 498, 286, 390, + 3365, 264, 912, 1168, 281, 385, 550, 264, 1349, 50924], "temperature": 0.0, "avg_logprob": + -0.12560460759305406, "compression_ratio": 1.6339285714285714, "no_speech_prob": + 0.0034380005672574043}, {"id": 708, "seek": 499852, "start": 5009.72, "end": 5015.96, + "text": " semantics right the similarity yes that would drive me because I actually + did my PhD in in", "tokens": [50924, 4361, 45298, 558, 264, 32194, 2086, 300, 576, + 3332, 385, 570, 286, 767, 630, 452, 14476, 294, 294, 51236], "temperature": 0.0, + "avg_logprob": -0.12560460759305406, "compression_ratio": 1.6339285714285714, "no_speech_prob": + 0.0034380005672574043}, {"id": 709, "seek": 499852, "start": 5017.080000000001, + "end": 5023.56, "text": " machine translation and my supervisor developed a semantic + parser you know it wasn''t", "tokens": [51292, 3479, 12853, 293, 452, 24610, 4743, + 257, 47982, 21156, 260, 291, 458, 309, 2067, 380, 51616], "temperature": 0.0, "avg_logprob": + -0.12560460759305406, "compression_ratio": 1.6339285714285714, "no_speech_prob": + 0.0034380005672574043}, {"id": 710, "seek": 502356, "start": 5023.96, "end": 5030.6, + "text": " syntactic parser and I cannot still find an analogy on the market for + for the this work but like", "tokens": [50384, 23980, 19892, 21156, 260, 293, 286, + 2644, 920, 915, 364, 21663, 322, 264, 2142, 337, 337, 264, 341, 589, 457, 411, 50716], + "temperature": 0.0, "avg_logprob": -0.16221372704756887, "compression_ratio": 1.7035398230088497, + "no_speech_prob": 0.006883376743644476}, {"id": 711, "seek": 502356, "start": 5031.240000000001, + "end": 5037.64, "text": " it was driving me that the way he was explaining this + is that hey I really now read Tolstoy", 
"tokens": [50748, 309, 390, 4840, 385, 300, + 264, 636, 415, 390, 13468, 341, 307, 300, 4177, 286, 534, 586, 1401, 21402, 372, + 939, 51068], "temperature": 0.0, "avg_logprob": -0.16221372704756887, "compression_ratio": + 1.7035398230088497, "no_speech_prob": 0.006883376743644476}, {"id": 712, "seek": + 502356, "start": 5038.280000000001, "end": 5045.080000000001, "text": " you know + with my parser every single day right and it fails I fix it it fails I fix it but + sometimes", "tokens": [51100, 291, 458, 365, 452, 21156, 260, 633, 2167, 786, 558, + 293, 309, 18199, 286, 3191, 309, 309, 18199, 286, 3191, 309, 457, 2171, 51440], + "temperature": 0.0, "avg_logprob": -0.16221372704756887, "compression_ratio": 1.7035398230088497, + "no_speech_prob": 0.006883376743644476}, {"id": 713, "seek": 502356, "start": 5045.080000000001, + "end": 5052.280000000001, "text": " it amazes me because Tolstoy tends to create + so long sentences that they can take several pages", "tokens": [51440, 309, 669, + 921, 279, 385, 570, 21402, 372, 939, 12258, 281, 1884, 370, 938, 16579, 300, 436, + 393, 747, 2940, 7183, 51800], "temperature": 0.0, "avg_logprob": -0.16221372704756887, + "compression_ratio": 1.7035398230088497, "no_speech_prob": 0.006883376743644476}, + {"id": 714, "seek": 505356, "start": 5053.64, "end": 5057.320000000001, "text": + " you know in Russian right I don''t know how it books in the translation by the + way I''ve never seen", "tokens": [50368, 291, 458, 294, 7220, 558, 286, 500, 380, + 458, 577, 309, 3642, 294, 264, 12853, 538, 264, 636, 286, 600, 1128, 1612, 50552], + "temperature": 0.0, "avg_logprob": -0.11779004649112099, "compression_ratio": 1.8697318007662835, + "no_speech_prob": 0.005563628394156694}, {"id": 715, "seek": 505356, "start": 5057.320000000001, + "end": 5063.64, "text": " the translation but in the Russian source and Tolstoy + has written his book nine times his books it''s", "tokens": [50552, 264, 12853, + 457, 294, 264, 7220, 4009, 293, 21402, 
372, 939, 575, 3720, 702, 1446, 4949, 1413, + 702, 3642, 309, 311, 50868], "temperature": 0.0, "avg_logprob": -0.11779004649112099, + "compression_ratio": 1.8697318007662835, "no_speech_prob": 0.005563628394156694}, + {"id": 716, "seek": 505356, "start": 5063.64, "end": 5070.52, "text": " like several + books right like the the war in peace I mean and like you know so like he was basically", + "tokens": [50868, 411, 2940, 3642, 558, 411, 264, 264, 1516, 294, 4336, 286, 914, + 293, 411, 291, 458, 370, 411, 415, 390, 1936, 51212], "temperature": 0.0, "avg_logprob": + -0.11779004649112099, "compression_ratio": 1.8697318007662835, "no_speech_prob": + 0.005563628394156694}, {"id": 717, "seek": 505356, "start": 5070.52, "end": 5075.88, + "text": " compiling the language using his parser and he was fascinated by this + like okay this is the", "tokens": [51212, 715, 4883, 264, 2856, 1228, 702, 21156, + 260, 293, 415, 390, 24597, 538, 341, 411, 1392, 341, 307, 264, 51480], "temperature": + 0.0, "avg_logprob": -0.11779004649112099, "compression_ratio": 1.8697318007662835, + "no_speech_prob": 0.005563628394156694}, {"id": 718, "seek": 505356, "start": 5075.88, + "end": 5081.240000000001, "text": " semantic layer and and I was constantly thinking + you know I defended my thesis in 2011 and I was", "tokens": [51480, 47982, 4583, + 293, 293, 286, 390, 6460, 1953, 291, 458, 286, 34135, 452, 22288, 294, 10154, 293, + 286, 390, 51748], "temperature": 0.0, "avg_logprob": -0.11779004649112099, "compression_ratio": + 1.8697318007662835, "no_speech_prob": 0.005563628394156694}, {"id": 719, "seek": + 508124, "start": 5081.24, "end": 5086.76, "text": " thinking okay how can I apply + this tech in real life and it was very difficult because this parser", "tokens": + [50364, 1953, 1392, 577, 393, 286, 3079, 341, 7553, 294, 957, 993, 293, 309, 390, + 588, 2252, 570, 341, 21156, 260, 50640], "temperature": 0.0, "avg_logprob": -0.08910894788001195, + "compression_ratio": 1.7934782608695652, 
"no_speech_prob": 0.004402803257107735}, + {"id": 720, "seek": 508124, "start": 5086.76, "end": 5092.44, "text": " is kind + of implemented in force and I don''t code in force I don''t know if you heard of + this language", "tokens": [50640, 307, 733, 295, 12270, 294, 3464, 293, 286, 500, + 380, 3089, 294, 3464, 286, 500, 380, 458, 498, 291, 2198, 295, 341, 2856, 50924], + "temperature": 0.0, "avg_logprob": -0.08910894788001195, "compression_ratio": 1.7934782608695652, + "no_speech_prob": 0.004402803257107735}, {"id": 721, "seek": 508124, "start": 5092.44, + "end": 5099.16, "text": " it''s like it''s used in the industry it''s high performance + and like but it''s functional you know it''s", "tokens": [50924, 309, 311, 411, + 309, 311, 1143, 294, 264, 3518, 309, 311, 1090, 3389, 293, 411, 457, 309, 311, 11745, + 291, 458, 309, 311, 51260], "temperature": 0.0, "avg_logprob": -0.08910894788001195, + "compression_ratio": 1.7934782608695652, "no_speech_prob": 0.004402803257107735}, + {"id": 722, "seek": 508124, "start": 5099.88, "end": 5105.08, "text": " yeah so + like you can express many things with just one single word and then it just unwinds + there", "tokens": [51296, 1338, 370, 411, 291, 393, 5109, 867, 721, 365, 445, 472, + 2167, 1349, 293, 550, 309, 445, 517, 12199, 82, 456, 51556], "temperature": 0.0, + "avg_logprob": -0.08910894788001195, "compression_ratio": 1.7934782608695652, "no_speech_prob": + 0.004402803257107735}, {"id": 723, "seek": 508124, "start": 5105.08, "end": 5109.88, + "text": " behind the scenes and you''re like okay how do I debug this right so and + there was a port to Java", "tokens": [51556, 2261, 264, 8026, 293, 291, 434, 411, + 1392, 577, 360, 286, 24083, 341, 558, 370, 293, 456, 390, 257, 2436, 281, 10745, + 51796], "temperature": 0.0, "avg_logprob": -0.08910894788001195, "compression_ratio": + 1.7934782608695652, "no_speech_prob": 0.004402803257107735}, {"id": 724, "seek": + 510988, "start": 5110.04, "end": 5115.32, "text": " well 
done by another student + but I was kind of constantly fascinated by this field like how can I", "tokens": + [50372, 731, 1096, 538, 1071, 3107, 457, 286, 390, 733, 295, 6460, 24597, 538, 341, + 2519, 411, 577, 393, 286, 50636], "temperature": 0.0, "avg_logprob": -0.16268492988918137, + "compression_ratio": 1.8396946564885497, "no_speech_prob": 0.010553337633609772}, + {"id": 725, "seek": 510988, "start": 5115.32, "end": 5122.12, "text": " bring semantics + into the word of numbers and into the word of like well let''s put it inverted in", + "tokens": [50636, 1565, 4361, 45298, 666, 264, 1349, 295, 3547, 293, 666, 264, 1349, + 295, 411, 731, 718, 311, 829, 309, 38969, 294, 50976], "temperature": 0.0, "avg_logprob": + -0.16268492988918137, "compression_ratio": 1.8396946564885497, "no_speech_prob": + 0.010553337633609772}, {"id": 726, "seek": 510988, "start": 5122.12, "end": 5127.400000000001, + "text": " this is so what that''s like since I mentioned it meant yeah no but I + like this extra much what", "tokens": [50976, 341, 307, 370, 437, 300, 311, 411, + 1670, 286, 2835, 309, 4140, 1338, 572, 457, 286, 411, 341, 2857, 709, 437, 51240], + "temperature": 0.0, "avg_logprob": -0.16268492988918137, "compression_ratio": 1.8396946564885497, + "no_speech_prob": 0.010553337633609772}, {"id": 727, "seek": 510988, "start": 5127.400000000001, + "end": 5132.36, "text": " you''re saying and it''s like I and this goes again a + little bit out of the realm of the technology", "tokens": [51240, 291, 434, 1566, + 293, 309, 311, 411, 286, 293, 341, 1709, 797, 257, 707, 857, 484, 295, 264, 15355, + 295, 264, 2899, 51488], "temperature": 0.0, "avg_logprob": -0.16268492988918137, + "compression_ratio": 1.8396946564885497, "no_speech_prob": 0.010553337633609772}, + {"id": 728, "seek": 510988, "start": 5132.36, "end": 5136.68, "text": " I mean you + use the technology but it goes out of it I remember that there was before the whole", + "tokens": [51488, 286, 914, 291, 764, 264, 2899, 
457, 309, 1709, 484, 295, 309, + 286, 1604, 300, 456, 390, 949, 264, 1379, 51704], "temperature": 0.0, "avg_logprob": + -0.16268492988918137, "compression_ratio": 1.8396946564885497, "no_speech_prob": + 0.010553337633609772}, {"id": 729, "seek": 513668, "start": 5136.68, "end": 5143.16, + "text": " pandemic it there was like I was speaking at a conference in London and + they were so nice to book", "tokens": [50364, 5388, 309, 456, 390, 411, 286, 390, + 4124, 412, 257, 7586, 294, 7042, 293, 436, 645, 370, 1481, 281, 1446, 50688], "temperature": + 0.0, "avg_logprob": -0.15837916419619605, "compression_ratio": 1.7048458149779735, + "no_speech_prob": 0.005646928679198027}, {"id": 730, "seek": 513668, "start": 5143.16, + "end": 5149.8, "text": " me at the or to have me at a large stage and so I had like + this big screen and I''m talking about VF8", "tokens": [50688, 385, 412, 264, 420, + 281, 362, 385, 412, 257, 2416, 3233, 293, 370, 286, 632, 411, 341, 955, 2568, 293, + 286, 478, 1417, 466, 691, 37, 23, 51020], "temperature": 0.0, "avg_logprob": -0.15837916419619605, + "compression_ratio": 1.7048458149779735, "no_speech_prob": 0.005646928679198027}, + {"id": 731, "seek": 513668, "start": 5149.8, "end": 5157.16, "text": " and I''m + giving a demo and I could see the first few front rows and that what you said before", + "tokens": [51020, 293, 286, 478, 2902, 257, 10723, 293, 286, 727, 536, 264, 700, + 1326, 1868, 13241, 293, 300, 437, 291, 848, 949, 51388], "temperature": 0.0, "avg_logprob": + -0.15837916419619605, "compression_ratio": 1.7048458149779735, "no_speech_prob": + 0.005646928679198027}, {"id": 732, "seek": 513668, "start": 5157.16, "end": 5165.400000000001, + "text": " the Q&A example so I do just I give a Q&A example and I just give a real + demo and what I always", "tokens": [51388, 264, 1249, 5, 32, 1365, 370, 286, 360, + 445, 286, 976, 257, 1249, 5, 32, 1365, 293, 286, 445, 976, 257, 957, 10723, 293, + 437, 286, 1009, 51800], "temperature": 0.0, 
"avg_logprob": -0.15837916419619605, + "compression_ratio": 1.7048458149779735, "no_speech_prob": 0.005646928679198027}, + {"id": 733, "seek": 516540, "start": 5165.4, "end": 5170.5199999999995, "text": + " do is then the moment when I present an audience and I click like to you know + to execute the", "tokens": [50364, 360, 307, 550, 264, 1623, 562, 286, 1974, 364, + 4034, 293, 286, 2052, 411, 281, 291, 458, 281, 14483, 264, 50620], "temperature": + 0.0, "avg_logprob": -0.15348892911858514, "compression_ratio": 1.9170124481327802, + "no_speech_prob": 0.0032468941062688828}, {"id": 734, "seek": 516540, "start": 5170.5199999999995, + "end": 5175.96, "text": " query and get the response back then I always look at + the people that I''m presenting to and you", "tokens": [50620, 14581, 293, 483, + 264, 4134, 646, 550, 286, 1009, 574, 412, 264, 561, 300, 286, 478, 15578, 281, 293, + 291, 50892], "temperature": 0.0, "avg_logprob": -0.15348892911858514, "compression_ratio": + 1.9170124481327802, "no_speech_prob": 0.0032468941062688828}, {"id": 735, "seek": + 516540, "start": 5175.96, "end": 5178.92, "text": " see these people go like they + sit like in order to just watch you and they go like", "tokens": [50892, 536, 613, + 561, 352, 411, 436, 1394, 411, 294, 1668, 281, 445, 1159, 291, 293, 436, 352, 411, + 51040], "temperature": 0.0, "avg_logprob": -0.15348892911858514, "compression_ratio": + 1.9170124481327802, "no_speech_prob": 0.0032468941062688828}, {"id": 736, "seek": + 516540, "start": 5180.839999999999, "end": 5187.719999999999, "text": " and that''s + like and I go like that''s such a cool thing so something that that we as a team + came", "tokens": [51136, 293, 300, 311, 411, 293, 286, 352, 411, 300, 311, 1270, + 257, 1627, 551, 370, 746, 300, 300, 321, 382, 257, 1469, 1361, 51480], "temperature": + 0.0, "avg_logprob": -0.15348892911858514, "compression_ratio": 1.9170124481327802, + "no_speech_prob": 0.0032468941062688828}, {"id": 737, "seek": 516540, "start": 
5187.719999999999, + "end": 5193.32, "text": " up with and everybody participated in building the thing + and then people enjoy seeing that and", "tokens": [51480, 493, 365, 293, 2201, 17978, + 294, 2390, 264, 551, 293, 550, 561, 2103, 2577, 300, 293, 51760], "temperature": + 0.0, "avg_logprob": -0.15348892911858514, "compression_ratio": 1.9170124481327802, + "no_speech_prob": 0.0032468941062688828}, {"id": 738, "seek": 519332, "start": 5193.4, + "end": 5202.04, "text": " enjoy using that that''s that''s amazing that''s like + yeah that''s just you know that''s just fantastic so", "tokens": [50368, 2103, 1228, + 300, 300, 311, 300, 311, 2243, 300, 311, 411, 1338, 300, 311, 445, 291, 458, 300, + 311, 445, 5456, 370, 50800], "temperature": 0.0, "avg_logprob": -0.17480669755202075, + "compression_ratio": 1.7875, "no_speech_prob": 0.002708281157538295}, {"id": 739, + "seek": 519332, "start": 5202.04, "end": 5212.759999999999, "text": " and language + has that additional element to it right so that people you know it does this how + we", "tokens": [50800, 293, 2856, 575, 300, 4497, 4478, 281, 309, 558, 370, 300, + 561, 291, 458, 309, 775, 341, 577, 321, 51336], "temperature": 0.0, "avg_logprob": + -0.17480669755202075, "compression_ratio": 1.7875, "no_speech_prob": 0.002708281157538295}, + {"id": 740, "seek": 519332, "start": 5212.759999999999, "end": 5218.36, "text": + " communicate and and we get closer to have the machines communicate you know that + we can", "tokens": [51336, 7890, 293, 293, 321, 483, 4966, 281, 362, 264, 8379, + 7890, 291, 458, 300, 321, 393, 51616], "temperature": 0.0, "avg_logprob": -0.17480669755202075, + "compression_ratio": 1.7875, "no_speech_prob": 0.002708281157538295}, {"id": 741, + "seek": 521836, "start": 5218.36, "end": 5223.32, "text": " communicate more natural + language based with people yeah that''s that''s that''s amazing so that", "tokens": + [50364, 7890, 544, 3303, 2856, 2361, 365, 561, 1338, 300, 311, 300, 311, 300, 311, + 
2243, 370, 300, 50612], "temperature": 0.0, "avg_logprob": -0.18224133578213778, + "compression_ratio": 1.8795180722891567, "no_speech_prob": 0.0013401308096945286}, + {"id": 742, "seek": 521836, "start": 5223.88, "end": 5229.88, "text": " that would + be another that would be a fourth one of doing it yeah that''s fantastic that''s + so", "tokens": [50640, 300, 576, 312, 1071, 300, 576, 312, 257, 6409, 472, 295, + 884, 309, 1338, 300, 311, 5456, 300, 311, 370, 50940], "temperature": 0.0, "avg_logprob": + -0.18224133578213778, "compression_ratio": 1.8795180722891567, "no_speech_prob": + 0.0013401308096945286}, {"id": 743, "seek": 521836, "start": 5229.88, "end": 5234.599999999999, + "text": " so deep and you know like I''m happy that we also connect with you on + that topic you know", "tokens": [50940, 370, 2452, 293, 291, 458, 411, 286, 478, + 2055, 300, 321, 611, 1745, 365, 291, 322, 300, 4829, 291, 458, 51176], "temperature": + 0.0, "avg_logprob": -0.18224133578213778, "compression_ratio": 1.8795180722891567, + "no_speech_prob": 0.0013401308096945286}, {"id": 744, "seek": 521836, "start": 5234.599999999999, + "end": 5239.88, "text": " with the semantics when you you can park all other items + like technology use cases product but", "tokens": [51176, 365, 264, 4361, 45298, + 562, 291, 291, 393, 3884, 439, 661, 4754, 411, 2899, 764, 3331, 1674, 457, 51440], + "temperature": 0.0, "avg_logprob": -0.18224133578213778, "compression_ratio": 1.8795180722891567, + "no_speech_prob": 0.0013401308096945286}, {"id": 745, "seek": 521836, "start": 5239.88, + "end": 5245.48, "text": " the semantics part way we''re driving this thing it''s + it''s just amazing and I''m happy for you guys", "tokens": [51440, 264, 4361, 45298, + 644, 636, 321, 434, 4840, 341, 551, 309, 311, 309, 311, 445, 2243, 293, 286, 478, + 2055, 337, 291, 1074, 51720], "temperature": 0.0, "avg_logprob": -0.18224133578213778, + "compression_ratio": 1.8795180722891567, "no_speech_prob": 0.0013401308096945286}, + 
{"id": 746, "seek": 524548, "start": 5245.48, "end": 5252.5199999999995, "text": + " to be doing this and I think part of the excitement of being at the edge of of + doing things is also", "tokens": [50364, 281, 312, 884, 341, 293, 286, 519, 644, + 295, 264, 14755, 295, 885, 412, 264, 4691, 295, 295, 884, 721, 307, 611, 50716], + "temperature": 0.0, "avg_logprob": -0.11195479991824128, "compression_ratio": 1.8287037037037037, + "no_speech_prob": 0.004559936001896858}, {"id": 747, "seek": 524548, "start": 5253.24, + "end": 5259.0, "text": " kind of you know launching things and like kind of like + announcing something is there something", "tokens": [50752, 733, 295, 291, 458, + 18354, 721, 293, 411, 733, 295, 411, 28706, 746, 307, 456, 746, 51040], "temperature": + 0.0, "avg_logprob": -0.11195479991824128, "compression_ratio": 1.8287037037037037, + "no_speech_prob": 0.004559936001896858}, {"id": 748, "seek": 524548, "start": 5259.0, + "end": 5266.679999999999, "text": " you would like to announce yeah so yeah certainly + so I think the so one thing that already mentioned a", "tokens": [51040, 291, 576, + 411, 281, 7478, 1338, 370, 1338, 3297, 370, 286, 519, 264, 370, 472, 551, 300, 1217, + 2835, 257, 51424], "temperature": 0.0, "avg_logprob": -0.11195479991824128, "compression_ratio": + 1.8287037037037037, "no_speech_prob": 0.004559936001896858}, {"id": 749, "seek": + 524548, "start": 5266.679999999999, "end": 5273.48, "text": " bit so we we''re gonna + launch that huge dataset just the whole Wikipedia with everything and also", "tokens": + [51424, 857, 370, 321, 321, 434, 799, 4025, 300, 2603, 28872, 445, 264, 1379, 28999, + 365, 1203, 293, 611, 51764], "temperature": 0.0, "avg_logprob": -0.11195479991824128, + "compression_ratio": 1.8287037037037037, "no_speech_prob": 0.004559936001896858}, + {"id": 750, "seek": 527348, "start": 5273.48, "end": 5278.04, "text": " for people + to try it out and to play around with so that they can see actually how big it gets + 
and", "tokens": [50364, 337, 561, 281, 853, 309, 484, 293, 281, 862, 926, 365, 370, + 300, 436, 393, 536, 767, 577, 955, 309, 2170, 293, 50592], "temperature": 0.0, "avg_logprob": + -0.11625218391418457, "compression_ratio": 1.808411214953271, "no_speech_prob": + 0.003716119332239032}, {"id": 751, "seek": 527348, "start": 5278.04, "end": 5283.48, + "text": " related to that and we already actually have an pre-release but it''s + gonna be released very soon", "tokens": [50592, 4077, 281, 300, 293, 321, 1217, + 767, 362, 364, 659, 12, 265, 1122, 457, 309, 311, 799, 312, 4736, 588, 2321, 50864], + "temperature": 0.0, "avg_logprob": -0.11625218391418457, "compression_ratio": 1.808411214953271, + "no_speech_prob": 0.003716119332239032}, {"id": 752, "seek": 527348, "start": 5283.48, + "end": 5290.28, "text": " just as a standard release is everything related to horizontal + scalability so that people can now", "tokens": [50864, 445, 382, 257, 3832, 4374, + 307, 1203, 4077, 281, 12750, 15664, 2310, 370, 300, 561, 393, 586, 51204], "temperature": + 0.0, "avg_logprob": -0.11625218391418457, "compression_ratio": 1.808411214953271, + "no_speech_prob": 0.003716119332239032}, {"id": 753, "seek": 527348, "start": 5290.28, + "end": 5296.839999999999, "text": " you know scale from into the millions into the + into the billions and we starting to get these", "tokens": [51204, 291, 458, 4373, + 490, 666, 264, 6803, 666, 264, 666, 264, 17375, 293, 321, 2891, 281, 483, 613, 51532], + "temperature": 0.0, "avg_logprob": -0.11625218391418457, "compression_ratio": 1.808411214953271, + "no_speech_prob": 0.003716119332239032}, {"id": 754, "seek": 529684, "start": 5296.84, + "end": 5303.32, "text": " kinds of questions in and we''re very close and it''s + like I was in like very very very very close", "tokens": [50364, 3685, 295, 1651, + 294, 293, 321, 434, 588, 1998, 293, 309, 311, 411, 286, 390, 294, 411, 588, 588, + 588, 588, 1998, 50688], "temperature": 0.0, "avg_logprob": 
-0.19633106432462993, + "compression_ratio": 1.7410714285714286, "no_speech_prob": 0.003268136642873287}, + {"id": 755, "seek": 529684, "start": 5303.32, "end": 5309.32, "text": " because + the pre-release is already out there and and that then goes full circle back to + people ask", "tokens": [50688, 570, 264, 659, 12, 265, 1122, 307, 1217, 484, 456, + 293, 293, 300, 550, 1709, 1577, 6329, 646, 281, 561, 1029, 50988], "temperature": + 0.0, "avg_logprob": -0.19633106432462993, "compression_ratio": 1.7410714285714286, + "no_speech_prob": 0.003268136642873287}, {"id": 756, "seek": 529684, "start": 5309.32, + "end": 5315.72, "text": " I get sometimes emails from people say yeah so we have + like you know a x billion factors but you", "tokens": [50988, 286, 483, 2171, 12524, + 490, 561, 584, 1338, 370, 321, 362, 411, 291, 458, 257, 2031, 5218, 6771, 457, 291, + 51308], "temperature": 0.0, "avg_logprob": -0.19633106432462993, "compression_ratio": + 1.7410714285714286, "no_speech_prob": 0.003268136642873287}, {"id": 757, "seek": + 529684, "start": 5315.72, "end": 5322.04, "text": " probably can it''s a well all + choices and we probably can and I go like but can you also store the", "tokens": + [51308, 1391, 393, 309, 311, 257, 731, 439, 7994, 293, 321, 1391, 393, 293, 286, + 352, 411, 457, 393, 291, 611, 3531, 264, 51624], "temperature": 0.0, "avg_logprob": + -0.19633106432462993, "compression_ratio": 1.7410714285714286, "no_speech_prob": + 0.003268136642873287}, {"id": 758, "seek": 532204, "start": 5322.04, "end": 5327.16, + "text": " metadata you can also store the metadata and then people go they''re really + excited so it''s just", "tokens": [50364, 26603, 291, 393, 611, 3531, 264, 26603, + 293, 550, 561, 352, 436, 434, 534, 2919, 370, 309, 311, 445, 50620], "temperature": + 0.0, "avg_logprob": -0.12320057143512954, "compression_ratio": 1.8358778625954197, + "no_speech_prob": 0.006810296326875687}, {"id": 759, "seek": 532204, "start": 5327.16, + "end": 5332.68, 
"text": " that just keeps going and going and so those are the big + two things that I that I want to share because", "tokens": [50620, 300, 445, 5965, + 516, 293, 516, 293, 370, 729, 366, 264, 955, 732, 721, 300, 286, 300, 286, 528, + 281, 2073, 570, 50896], "temperature": 0.0, "avg_logprob": -0.12320057143512954, + "compression_ratio": 1.8358778625954197, "no_speech_prob": 0.006810296326875687}, + {"id": 760, "seek": 532204, "start": 5334.68, "end": 5339.88, "text": " there''s + a lot of people asking for this and so we''re probably gonna make a lot of people + happy", "tokens": [50996, 456, 311, 257, 688, 295, 561, 3365, 337, 341, 293, 370, + 321, 434, 1391, 799, 652, 257, 688, 295, 561, 2055, 51256], "temperature": 0.0, + "avg_logprob": -0.12320057143512954, "compression_ratio": 1.8358778625954197, "no_speech_prob": + 0.006810296326875687}, {"id": 761, "seek": 532204, "start": 5339.88, "end": 5346.5199999999995, + "text": " when it''s when it''s out of the pre-release yeah that''s fantastic I + mean it both sound so big", "tokens": [51256, 562, 309, 311, 562, 309, 311, 484, + 295, 264, 659, 12, 265, 1122, 1338, 300, 311, 5456, 286, 914, 309, 1293, 1626, 370, + 955, 51588], "temperature": 0.0, "avg_logprob": -0.12320057143512954, "compression_ratio": + 1.8358778625954197, "no_speech_prob": 0.006810296326875687}, {"id": 762, "seek": + 532204, "start": 5347.08, "end": 5351.88, "text": " and you know I''m actually spending + my time now figuring out how to scale I participate in the", "tokens": [51616, 293, + 291, 458, 286, 478, 767, 6434, 452, 565, 586, 15213, 484, 577, 281, 4373, 286, 8197, + 294, 264, 51856], "temperature": 0.0, "avg_logprob": -0.12320057143512954, "compression_ratio": + 1.8358778625954197, "no_speech_prob": 0.006810296326875687}, {"id": 763, "seek": + 535188, "start": 5351.88, "end": 5358.28, "text": " billion scale competition actually + by the way and I like I''m I''m waiting with excitement for this", "tokens": [50364, + 5218, 4373, 6211, 
767, 538, 264, 636, 293, 286, 411, 286, 478, 286, 478, 3806, 365, + 14755, 337, 341, 50684], "temperature": 0.0, "avg_logprob": -0.11097455657688918, + "compression_ratio": 1.9831223628691983, "no_speech_prob": 0.003432776080444455}, + {"id": 764, "seek": 535188, "start": 5358.28, "end": 5363.72, "text": " release + because I would like to learn things from you guys as well and like if you solve + that", "tokens": [50684, 4374, 570, 286, 576, 411, 281, 1466, 721, 490, 291, 1074, + 382, 731, 293, 411, 498, 291, 5039, 300, 50956], "temperature": 0.0, "avg_logprob": + -0.11097455657688918, "compression_ratio": 1.9831223628691983, "no_speech_prob": + 0.003432776080444455}, {"id": 765, "seek": 535188, "start": 5363.72, "end": 5368.76, + "text": " and you open source many things and you will open source this part as + well so it''s it''s it''s", "tokens": [50956, 293, 291, 1269, 4009, 867, 721, 293, + 291, 486, 1269, 4009, 341, 644, 382, 731, 370, 309, 311, 309, 311, 309, 311, 51208], + "temperature": 0.0, "avg_logprob": -0.11097455657688918, "compression_ratio": 1.9831223628691983, + "no_speech_prob": 0.003432776080444455}, {"id": 766, "seek": 535188, "start": 5368.76, + "end": 5373.400000000001, "text": " amazing you know what you do and and I''m waiting + with excitement to learn what you''ve done", "tokens": [51208, 2243, 291, 458, 437, + 291, 360, 293, 293, 286, 478, 3806, 365, 14755, 281, 1466, 437, 291, 600, 1096, + 51440], "temperature": 0.0, "avg_logprob": -0.11097455657688918, "compression_ratio": + 1.9831223628691983, "no_speech_prob": 0.003432776080444455}, {"id": 767, "seek": + 535188, "start": 5374.76, "end": 5381.4800000000005, "text": " so thanks so much + and thanks so much thanks so much for your time as well I mean we we went so", "tokens": + [51508, 370, 3231, 370, 709, 293, 3231, 370, 709, 3231, 370, 709, 337, 428, 565, + 382, 731, 286, 914, 321, 321, 1437, 370, 51844], "temperature": 0.0, "avg_logprob": + -0.11097455657688918, "compression_ratio": 
1.9831223628691983, "no_speech_prob": + 0.003432776080444455}, {"id": 768, "seek": 538148, "start": 5381.48, "end": 5388.12, + "text": " deep today and in many areas and I''m sure we can talk more at some point + down the road", "tokens": [50364, 2452, 965, 293, 294, 867, 3179, 293, 286, 478, + 988, 321, 393, 751, 544, 412, 512, 935, 760, 264, 3060, 50696], "temperature": 0.0, + "avg_logprob": -0.16398428748635685, "compression_ratio": 1.7401960784313726, "no_speech_prob": + 0.0016065994277596474}, {"id": 769, "seek": 538148, "start": 5388.919999999999, + "end": 5393.16, "text": " probably we probably can and thank you so much for having + me and keep up the great work because", "tokens": [50736, 1391, 321, 1391, 393, + 293, 1309, 291, 370, 709, 337, 1419, 385, 293, 1066, 493, 264, 869, 589, 570, 50948], + "temperature": 0.0, "avg_logprob": -0.16398428748635685, "compression_ratio": 1.7401960784313726, + "no_speech_prob": 0.0016065994277596474}, {"id": 770, "seek": 538148, "start": 5393.16, + "end": 5398.599999999999, "text": " you''re doing a great you know job in the you + know in the community and in the industry yeah", "tokens": [50948, 291, 434, 884, + 257, 869, 291, 458, 1691, 294, 264, 291, 458, 294, 264, 1768, 293, 294, 264, 3518, + 1338, 51220], "temperature": 0.0, "avg_logprob": -0.16398428748635685, "compression_ratio": + 1.7401960784313726, "no_speech_prob": 0.0016065994277596474}, {"id": 771, "seek": + 538148, "start": 5398.599999999999, "end": 5409.639999999999, "text": " thanks so + much Bob my pleasure and yeah see you next time bye cool thank you bye", "tokens": + [51220, 3231, 370, 709, 6085, 452, 6834, 293, 1338, 536, 291, 958, 565, 6543, 1627, + 1309, 291, 6543, 51772], "temperature": 0.0, "avg_logprob": -0.16398428748635685, + "compression_ratio": 1.7401960784313726, "no_speech_prob": 0.0016065994277596474}]' +--- + +on database and I'm sure Bob will talk more about it, what it is, what it isn't. Hey Bob. Hey, thanks for having me. 
Cool to be here. Yeah, thanks for joining. I know you have a hectic schedule, but it's always nice to, you know, pause a little bit and talk about things.
+And I was thinking maybe we can start off with an introduction. Can you introduce yourself, your background, and how you ended up working on this product and company? Yeah, sure.
+So I started my career as a software engineer, and later I moved more into IT and software consultancy. And one of the things there was that I was working with a lot of unstructured data. We're probably going to talk way more about that.
+But the story that I have is that years ago I was at a conference in San Francisco, a cloud conference. And back then it had just been announced that there was a change in the Google search algorithm.
+And you have to bear in mind, this is predating transformers and those kinds of things. This was the time when, I think, Glass was the biggest thing around.
+And they made that change and they said, well, we're going to go more towards contextual search. We're going to move from what they called PageRank to RankBrain.
+And one of the things that I was looking into is, you know, is there any company, or are these cloud providers going to provide database technology or search engine technology that actually deals with a similar type of search?
+So that it becomes easier to search through unstructured data. And the answer was actually that they weren't looking into it. Or maybe they were, but they weren't sharing it. So I was actually at the airport of San Francisco, and I just started to work on this idea.
+And it was coming from a lot of directions back then. There was a lot happening: knowledge graphs were happening, and machine learning was growing.
And at some point I thought, hey, actually, I do think there's an opportunity in the market for this.
+And so I started to work on this, and I started to gather a team around me. And what then happened was that a lot happened in the machine learning space. Think about it: these transformer models were released, and they were getting better and better.
+And back then we were still looking at having these vector representations, which we can talk a little bit more about in a bit, on the side, but then we thought, hey, actually, if we use this, we can solve new use cases and we can build a completely new database, a new search engine.
+And so that is the origin story. That's where I'm coming from and why we started: because unstructured data was a problem. It is still a problem.
+And I strongly believe that these kinds of vector search technologies are helping to solve these problems, not only in text, but basically anything you can vectorize. They can be images, they can be audio, but it could also be, I don't know, the human genome, you name it.
+All these things can be vectorized, and it gives another perspective for searching through the data. So that's the origin story. Yeah, that's awesome. That's awesome to hear.
+And you know, this field is still in many ways emerging, right? The field of, let's say, vector databases per se as products, but also the field of applying them for different use cases. But you know, it's interesting.
+You touched on how you knew about Google kind of disclosing something.
+And then you knew that the models had also been developing, right? Basically you predated that, but then BERT came out, right? And then in other fields, let's say computer vision and automatic speech recognition, they have also been vectorizing in some way.
+Maybe signal processing wasn't vectorizing, but then I guess they started doing it. And it's interesting. Do you think that you kind of coincided? You basically predicted this field, right? You didn't know it would happen; you felt that it would happen.
+But it wasn't at the same scale as it is today. We have so many models now, right? Like, I don't know, Hugging Face making a product out of it and so on. But do you think there was a real need? Or was it kind of a coincidence that, yes, now there are models,
+we are addressing similar problems, you know, but using a different technique, and now you are there with your idea? Yeah, so I mean, there are two sides to how I'm going to answer this question. One is more about the need, and then secondly, about when I knew, when I saw the value.
+So let me start with the first thing. Unstructured data is huge. And the problem that we currently have with search is that if you know what you're looking for, you can find it.
+If you don't know what you're looking for, you can't. To make that very simple: if you have a webshop, for example, sort of a grocery store or something like that, and you're looking for medicine because you have a headache,
+then you must somehow either use the name of the product, or somebody needs to tag the product, to find it. Right? So if you have, I don't know, an aspirin, then somebody has to add the keyword, you know, headache or something like that.
+Or painkiller, and then even with painkiller, you know, etc. What these models solve, and here we're only talking about the NLP, the natural language processing models, is that you can look in the vicinity of these words.
+And what I often give as an example to think about it, just as a mental model, is an actual physical grocery store. So the example that I always gave is that I say, well, let's say that
I have a shopping list, and the shopping list says apple, banana, washing powder.
+If you would have a store that is organized as a traditional database, then it could, for example, be in alphabetical order.
+It's going to be pretty difficult to actually find what you're looking for, because maybe at the A you might not find the apple, but you have to look at the G because you're looking for a Granny Smith apple, etc.
+And what these vector models do is that they're basically a form of a hyperspace, right? You can envision them as a three-dimensional space.
+So if you walk to the food department in the grocery store and you find an apple, then you know that a banana will be closer by than the washing powder is, and if you move towards the washing powder, you move away from the food section.
+And that brings me to the second part of my answer, when I knew that this had potential: because I made this very, very simple, super simple.
+There was a prototype, which was based back then on GloVe, and the big problem was that people said there's a problem with disambiguation. So if I have a word with a vector representation, for example apple, is that related to the fruit apple or to the company Apple?
+So I did something very simple. I said, well, what if I have a document or a sentence? And again, bear in mind, this is predating transformers. I said, well, what if I take these individual words? So I wrote a very simple script that took these individual words.
+I said, I'm going to calculate a new vector representation, just a centroid based on these words. So now, say I have a document with apple in it. I take all these individual words, calculate the centroid, and now I see if I can somehow
+make the word, you know, the sentence, less ambiguous. And that actually turned out to work. And then I thought, well, not extremely well, but rather well.
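The centroid trick described here, averaging the vectors of a sentence's individual words and comparing the result against candidate senses, can be sketched in a few lines. The 3-d vectors below are invented for illustration; real GloVe embeddings have hundreds of dimensions, but the arithmetic is identical.

```python
from math import sqrt

# Toy 3-d vectors standing in for real GloVe embeddings.
VECTORS = {
    "eat":    (0.8, 0.2, 0.1),
    "apple":  (0.9, 0.8, 0.1),   # deliberately ambiguous: fruit-ish AND company-ish
    "banana": (1.0, 0.1, 0.0),
    "launch": (0.2, 0.8, 0.3),
    "iphone": (0.1, 0.9, 0.2),
}
FRUIT_SENSE = (1.0, 0.0, 0.0)
COMPANY_SENSE = (0.0, 1.0, 0.0)

def centroid(words):
    """The 'new vector representation, just a centroid based on these words'."""
    known = [VECTORS[w] for w in words if w in VECTORS]
    return tuple(sum(v[i] for v in known) / len(known) for i in range(3))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# The surrounding words pull the ambiguous "apple" towards the right sense:
c1 = centroid(["eat", "apple", "banana"])
c2 = centroid(["launch", "apple", "iphone"])
print(cosine(c1, FRUIT_SENSE) > cosine(c1, COMPANY_SENSE))    # True: fruit sense
print(cosine(c2, COMPANY_SENSE) > cosine(c2, FRUIT_SENSE))    # True: company sense
```

The centroid of a food-flavoured sentence lands near the fruit sense, and a product-launch sentence lands near the company sense, which is all the disambiguation the script needed.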
Again, we're talking years back now.
+And then I knew, OK, there is value here, because I could think of so many things that you could now index and search in the, air quotes, vicinity of, in your vector space. It made it easier to find things, it made it easier to classify things, et cetera, et cetera.
+So that's basically my answer and how I see that. That's a great answer. It's like the way I thought about it:
+you bring context to your data, right? If we stay on the text side for the moment, you said apple and banana are related because they are both fruit, right?
+But there could be some other related items in our data set that we just don't know about; as long as we encoded them, then with the right kind of distance metric, we can figure out how close they are.
+So coming back to your previous example, where we have used, let's say, an inverted index, you know, we would just store all our items in some alphabetical order and hope for the best.
+And that order, I think, inherently didn't have the context, right? The context was represented in a different way: specifically, in the case of an inverted index, you deal with a dictionary of terms pointing to a posting list, right?
+Speaking in search engine lingo for the moment here.
+And that posting list is just an ordered list of document IDs.
+So you don't have much context there either, right?
+Exactly, and that is how it brings context. And again, going back to that mental model, that idea that you can have about it: if you take the building where you do your groceries, the building would be the database, basically.
+And the model tells you where to put stuff in that building.
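The dictionary-of-terms structure described here is small enough to sketch directly. This is a toy index built from hypothetical documents, not the postings format a real engine like Lucene uses:

```python
from collections import defaultdict

# A minimal inverted index: a dictionary of terms, each pointing to a
# posting list, i.e. an ordered list of document IDs.
docs = {
    1: "apple banana washing powder",
    2: "apple iphone launch event",
    3: "banana bread recipe",
}

index = defaultdict(list)
for doc_id in sorted(docs):                  # visiting docs in ID order...
    for term in sorted(set(docs[doc_id].split())):
        index[term].append(doc_id)           # ...keeps each posting list sorted

print(index["apple"])     # [1, 2]
print(index["banana"])    # [1, 3]

# An AND query is an intersection of posting lists:
print(sorted(set(index["apple"]) & set(index["banana"])))    # [1]
```

As the conversation notes, the posting lists carry no context at all: document 1 appears under "apple" only because the literal term occurs in it, which is exactly the gap vector search fills.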
+So that's how it's giving that context. And then the only thing that we need to do, well, I make it sound very simple, but the thing that we need to do in a database is make it as easy as possible for the end user to navigate through that building.
+And that is basically what the vector database is doing. So it's taking the data, and we can also talk a little bit more about the features that we have in Weaviate, because that's also something: we don't only store vectors, we also store data objects.
+But basically, if you bring a data object to Weaviate, you tell it to take this part of the information to vectorize.
+So for example, if you have a product, then you could say, well, I want to vectorize the title and description. That is vectorized, and then the model tells Weaviate where in that database, or in that vector space, to place that data object.
+And that is what we try to optimize as much as we can, so that you can search through hundreds of millions of data objects using that model in just mere milliseconds. Yeah, that's fantastic.
+And I think before we move on to what you are focusing on as a product, which is super exciting, and I mean, you're doing a ton of work,
+I just wanted to close off on that line of thought: that maybe, just maybe, we are on the verge of closing the chapter on the inverted index data structure, because it has existed since, I think, the 15th century, like the first book where they published the index page.
+And the index page, in the end, is an inverted index, because it says, okay, this word occurs on this page; that's an inverted index, right? And so it has existed for multiple centuries. And so you think we are on the verge of replacing it with contextualized embeddings?
+That is certainly, isn't that an exciting thought? I have to note that there are a few things from a technical perspective where the inverted index is still being used.
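The flow described here, store the whole data object but vectorize only the chosen fields, then answer queries by nearest-neighbour search, can be sketched roughly as follows. The `embed` function and the tiny 2-d word vectors are made up for illustration; Weaviate's real API, models, and indexing are far more sophisticated.

```python
# Invented 2-d word vectors; a real embedding model would produce these.
WORD_VECS = {
    "aspirin": (0.9, 0.1), "headache": (0.8, 0.2), "relief": (0.7, 0.3),
    "washing": (0.1, 0.9), "powder": (0.2, 0.8), "laundry": (0.1, 0.9),
}

def embed(text):
    """Stand-in for a real model: centroid of the known word vectors."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(2))

store = []  # each entry: (full data object, its vector)

def add_object(obj, vectorize_fields=("title", "description")):
    # The whole object is stored; only the chosen fields are vectorized.
    text = " ".join(obj[f] for f in vectorize_fields if f in obj)
    store.append((obj, embed(text)))

def near_text(query, k=1):
    # Brute-force nearest neighbour by dot-product score (unnormalised,
    # which is fine for a sketch; real engines use ANN indexes).
    qv = embed(query)
    ranked = sorted(store, key=lambda e: -sum(a * b for a, b in zip(e[1], qv)))
    return [obj for obj, _ in ranked[:k]]

add_object({"title": "Aspirin 500mg", "description": "headache relief tablets", "price": 4.99})
add_object({"title": "Washing powder", "description": "laundry detergent", "price": 12.50})
print(near_text("headache relief")[0]["title"])    # Aspirin 500mg
```

Note that nobody tagged the aspirin with "headache": the query finds it purely because its vectorized fields sit nearby in the toy vector space.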
But one of the things that we've done in Weaviate, for example, is that we doubled down on the vector side, on the contextual search. And yes, every now and then both come together. So for example, if you have a product database and you say: show me products for outdoor sporting, but they have to be more expensive than 10 bucks, then both types of indexes kick in. But it definitely starts from the perspective of vector search, and I like your idea. The amount of research that has been released, which we of course also benefit from, is amazing. So I like that idea.

Yeah, so on that note: I also teach students a little at the local university here. When I explain some basic building blocks of a classical search engine architecture, and I explain the inverted index, I puzzle them with this question: do you know how old this data structure is? The students are actually from the linguistics department, so they are not the kind of IT people who care only about code; they also care about the rest of life in many ways. I don't want to pick on the IT guys, I'm just saying they are very multi-dimensional. And they are puzzled, and they say: OK, maybe 18th century? They don't know. But then I bring up a screenshot of a really old book from the 15th century, and they're like: really? So I make that connection: hey, we are still using tech that was invented in the 15th century.

Yeah, I agree with you, and that is extremely exciting. And I think we'll also get into that, but what you also see emerging is these use cases. The use cases that those kinds of databases and search engines address are kind of solved. I mean, of course, it's software, so it can always be better and do more, but those are kind of solved.
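The "both indexes kick in" query from a moment ago, semantic match on "outdoor sporting" plus a structured price filter, can be sketched as a toy hybrid search. The structured filter plays the role of the inverted/secondary index, and the similarity ranking plays the role of the vector index (vectors again hand-made):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

products = [
    {"name": "trail shoes",  "price": 80.0,  "vec": [0.9, 0.1]},
    {"name": "camping mug",  "price": 8.0,   "vec": [0.8, 0.2]},  # too cheap
    {"name": "office chair", "price": 120.0, "vec": [0.1, 0.9]},
]

def hybrid_search(query_vec, min_price, products):
    # 1) Structured filter (the inverted/secondary index's job).
    candidates = [p for p in products if p["price"] > min_price]
    # 2) Similarity ranking over the survivors (the vector index's job).
    return sorted(candidates,
                  key=lambda p: cosine(query_vec, p["vec"]), reverse=True)

outdoor_query = [0.9, 0.1]  # pretend embedding of "outdoor sporting"
for p in hybrid_search(outdoor_query, 10.0, products):
    print(p["name"])  # trail shoes first, office chair last; mug filtered out
```

In a production engine the filter would of course use an index rather than a linear scan, but the division of labour is the same.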
But what we see with these vector search engines is that new use cases and new options actually pop up. We can do new things with it. And I think that's very exciting as well.

Yeah, absolutely. That's an exciting way to approach this new emerging field: to look for use cases. And I was really wondering what it is that you are building in the company, your Weaviate database engine. You said that you had an idea, you started assembling the team, now you give the vision, you drive a lot of things in open source, you're super active. What is it that you are focusing on for your users? And maybe you can also go into the use cases part.

Sure. So it's important to bear in mind that if you look at a solution like Weaviate, you can take two angles. You can look at it, as I like to call it, bottoms-up, so really through the core technology. But you can also look top-down, from the use case perspective. There are people working on Weaviate, and as you mentioned, it's open source, who talk about it from the bottoms-up approach, and I like to take it a little more top-down: what are the things that we can do with it?

So let me explain what we're building. At the core you can see it as three layers, basically. The first layer is the database itself. You can find that database on GitHub, you can find the documentation on the website; it is just called Weaviate, and the vector search engine is the core database. We see people use the database just to store data objects and their vector representations. Now, what is very important to know from a use case perspective.
And, starting at the lowest level: we thought it was very important from the get-go to make sure that people can not only store the vectors, but also the data they represent. So to stay with the example of a product, though it could be an article or what have you: you can actually store the product, so the price, the name, the description and those kinds of things, and you can say: this product has this vector. And on top of that, we also said: we want to be able to connect these data objects together in a more, air quotes, traditional graph. We're just not a graph database, but it has a graph data model. When we get into use cases, I will share a few cool things you can do with that. So that is at the core, at the heart: that is the database. And what does that database focus on? It's focused on being a database, so that you really have create, read, update and delete functionality, which is easier said than done. There's a lot of content that my colleague Etienne talks about online if you really want to get into the nitty-gritty. But it's the database as you're used to using one; you use Weaviate in a similar fashion. You take the container, spin it up, APIs become available, the RESTful APIs and the GraphQL APIs, and there are clients available: Python, Go, Java, what have you, that you can connect to the database. So if you're used to working with a database or a search engine, it functions the same. That is what sits at the core.

Then around that we have a second layer, and those are modules. What these modules do is a few things. We've seen that there are certain types of, for example, machine learning models that people keep using over and over to get these vector representations. Why not bundle them? So think about the text-to-vector modules that we have.
We have different types for different use cases, where you can say: I'll throw in that product and automatically have a model create a vector representation. We also have question answering modules, spell check modules, and you can create your own modules. Sorry, I keep saying models when I mean modules; models and modules, it gets a little confusing. So I meant modules. Those are available open source as well, and my colleague Laura made a great video on how you can build your own modules.

And then we have another layer around that, and there we go a little outside the realm of the software per se. Those are the more packaged use cases. We see that there's a lot of value in retail, wholesale, e-commerce, in the medical space, the data management space, those kinds of spaces. And what we're doing there, and that's mostly also my focus: at the core we have this one singular database; what are these packaged things that we can do around it? That is also where we make a distinction between our users and our customers. Our customers are mostly interested in these packages. They can say: OK, I have a classification problem. Great, you can do that with Weaviate, with a package specifically for companies in your industry. It can also be document search for a medical use case, or image use cases, image similarity. So we package them together. Sometimes those ship as software, for example in the form of plugins and those kinds of things. But that is the outer layer; that is what Weaviate looks like. And that's what we're constantly building, because, as you mentioned before, vector search is kind of a new thing, right? And then a fair question might be: so what's actually the new thing, right?
So the other day somebody, a data scientist, asked me: well, if I have my vectors, I can just store them in memory and do some similarity calculations. And you can absolutely do that. But what if you want to do that for a product catalog that might have, say, 50K products that are constantly changing, and those kinds of things? Then that becomes problematic. So we actually help you bring these models to production, and what you see is that the new use cases that come out of that are tremendously big, and we're just constantly uncovering new ones.

Let me give you one example; let's stay with the e-commerce case. Say I have, in Weaviate, a data object for a product, and the product has a vector representation that it got from, for example, a transformer model. Then we can also say: I have a shopping cart. Now, as people add products to the shopping cart, we can calculate, in real time, new vector representations based on what people have in their carts. So now we can say: hey, based on what you have in your cart, you might be interested in this or that product as well. So all of a sudden it changed from a search engine where you can find products into a recommendation engine for e-commerce, all in one. Those kinds of things we're constantly uncovering, and there's so much more we can do, from very concrete things like e-commerce to the other end of the spectrum, things like vector representations that people are calculating for genomes and those kinds of things. The use cases keep turning up almost on a daily basis.

Yeah, that's so great, such a deep dive. I wanted to unpack a few things a little, so I understand them well enough, and maybe our listeners will too.
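The shopping-cart idea just described can be sketched as: average the vectors of the items in the cart, then recommend the catalogue items nearest to that average that are not already in the cart. Again a toy with hand-made vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

catalog = {
    "hiking boots": [0.9, 0.1, 0.0],
    "rain jacket":  [0.8, 0.3, 0.0],
    "tent":         [0.7, 0.2, 0.1],
    "desk lamp":    [0.0, 0.1, 0.9],
}

def recommend(cart, catalog, k=1):
    # Real-time "cart vector": the mean of the cart items' vectors.
    dims = len(next(iter(catalog.values())))
    cart_vec = [sum(catalog[item][d] for item in cart) / len(cart)
                for d in range(dims)]
    # Rank everything not already in the cart by similarity to the cart vector.
    candidates = [name for name in catalog if name not in cart]
    return sorted(candidates,
                  key=lambda n: cosine(cart_vec, catalog[n]), reverse=True)[:k]

print(recommend({"hiking boots", "tent"}, catalog))  # -> ['rain jacket']
```

The same stored vectors thus serve two purposes: product search and recommendations, which is the "all in one" point being made.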
So when you said models and modules: let's say I'm a researcher, and I have a model, an embedding model, that I've been using and battle-testing. Now, if I want to introduce that model into Weaviate, I will have to create a module which uses this model, is that right? OK, that's great. And I need to implement some API that you provide, right?

Yes, and that is something that we spent a lot of time on, the API design, because I am a strong believer in developer UX. It needs to be as clean and as easy as possible. One of the things, for example, that we've done is adopt GraphQL as the interface. Sometimes people ask: why GraphQL and not something more expressive, like SPARQL or something like that? Which is a good question. One of the things we know is: we focus on being a vector database, and we just want to show these data objects with their vector representations. Sometimes we have these graph relations, connections, in them, but we're not focusing on being a graph database. I think GraphQL actually does the job, and it's easy for people to understand, very intuitive. And I think these kinds of things are very important. So to get back to your question: what we try to do is make it as easy as possible to bring your own models, if you have them, to production. And if you say: well, I don't have any models, but I just want to do semantic search; then, even if you're not deep into data science, you just take something off the shelf, feed it in through the API, and you can develop.

So let's say, and this I think is very important in today's world: even though a lot of machine learning and data science happens in Python
when you go to, let's say, web scale, sometimes you cannot use Python anymore; you need to use, let's say, Go, or maybe C bindings in Go, and things like that. So is your API cross-lingual, meaning: I have a model in Python, or maybe in Go, can I just plug it in, or do I need to rewrite some layer on top of it to be compatible?

That is a great question, and here especially the expertise of the development team comes in. What they have done is this. We know that that center, that database, just needs to be optimized as far as possible, because, to stick with the e-commerce example, if you use it in production and hundreds of people are searching and you want to give these recommendations, you need to be able to scale it. So you need to choose a language and an architecture that actually supports that; in our case that's Go. If you look in the GitHub repository, you will even find assembly optimizations for certain things in there. But we also knew that you might want to use a model that's written in, for example, Python, or that has bindings in Python, or you just like to work in Python. So one of the things that we did there: the modules are containerized. There are APIs going between Weaviate and the different modules, and as long as you adhere to these APIs, you can choose whatever language you want to build a module. So Weaviate itself is completely written in Go, even with the assembly optimizations and those kinds of things, while we have a few modules that are, for example, written in Python, because we use specific types of transformer models that just run well within Python. So you can do whatever you want within it when it comes to using Weaviate.
So you have the database running, and you can pick a client, for example the Python client, and have the Python client interact with Weaviate wherever it sits. But if you're building a front-end application, people use, for example, the JavaScript client; I've seen people build React applications with the JavaScript client. That's why we structured it like that: it's easy to use in production.

Yeah, that's amazing, and what you touched on is so important, and close to my heart as well. I've been building APIs in my free time for a long time, and what I've noticed with users is this: the lower you put the barrier to entry, meaning, let's say, you have an API and you have published sample clients for it in all the mainstream languages, at least, the lower the threshold for people to enter, to the point that they will never even contact you, they will just start using it. That's the win, right?

And I even believe, to sidestep a little from the discussion, but it's interesting to talk about this, because I am a strong believer, and I've been tooting this horn for years already, that the overlap between the tech side and the business side is, in my opinion, expressed in the API layer. If you peel the onion of a tech business, you can go as deep as you want, unless you have a graphical user interface; but if you talk about database technology, those kinds of things, even bigger platforms: the API describes to you in human language what it's exposing, and therefore what the value is that it's creating, and the only thing that you need to do as a business is to capture that value.
Yeah, exactly. I think the saying at the API platform I was using back then was: software is eating the world, but APIs are eating software. They were decomposing the software that was sitting on shelves somewhere in those big companies, small companies, whatever, and introducing it to the network. Everyone wants to expose their value through an API, and you can easily consume that value through an API, and then you add all these payment layers and whatnot to actually make it economically feasible. So I think that's an exciting direction, and I'm happy to hear that you guys are pursuing that model: basically making it an API. In many ways a database should be an API, right? Sitting somewhere, where I can connect to it with a client in the language of my choice and handle all the cases I need to handle.

Exactly. And if you look at a nice car, for example, there are two ways you can look at it. The bottoms-up way, to continue the comparison, is that the first thing you do is open the hood, and you look at the beauty of the engine, and maybe you want to know how the engine works under the hood. The top-down way of looking at it is just opening the door, sitting in the seat, holding the interface of the car, the steering wheel, and going: oh, this car drives fantastic, it drives amazing. And my argument, and not everybody agrees, but that's my point of view, is: if you have an amazing engine but a shitty steering wheel, nobody's going to drive your car.
I mean, the other way around is also true: if you have a beautiful interface in your car and a shitty engine, that doesn't work either. It needs to play well together, and that's again why I'm so strongly for the UX, the experience that you have in using the technology. Because of course an experience is not limited to a graphical interface; it can also sit in an API, of course.

Yeah, absolutely. And coming back to some of the use cases you brought up: you mentioned the shopping cart. I was actually chatting to Eric Pugh, hey Eric, if you're listening to this, from OpenSource Connections, and he was saying, kind of out of the blue sky: let's imagine a use case. It's a pizza delivery, and you want to encode the journey of your delivery person with no left turns. So only right turns, or forward and backward, but no left turns, just for whatever reason. And this was a very interesting use case: can I actually express that in the form of an embedding? Probably I can, right? And then do some kind of geographical search and say: OK, what's the most similar journey that will bring the pizza from A to B? Sounds a little crazy, but that's what I'm thinking: when a use case exists, the journey is to go backward from it to the embedding space, right? And that's very interesting.

Yeah, so let's build a little further on that. I like the example, but it's an abstract example, so we might make it a little more concrete. Let's say you have a pizza delivery service, and from the moment somebody orders a pizza you have certain data: data about what's on the pizza, where it's coming from, where the person lives, et cetera, et cetera.
There are things you can do with that, right? You can encode that information in the vector representation, and if you are fast enough in comparing vectors with old orders, you can say something about the new order, and then it becomes interesting. For example, you might be able to say something about delivery times. Say you're a big pizza chain: we've sold, say, a million pizzas in the past. Now, based on this request, we calculate the vector representation for this order in real time, we do a real-time comparison with orders in the past, and we see that the average of the last 10 similar orders was 18 minutes. So now, in real time, you can say something about this one. This is just an example of a use case where these vector databases might be extremely valuable, and it's just one example; more and more of these kinds of cases are popping up. So that's extremely exciting, in my opinion.
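The delivery-time estimate just described is essentially k-nearest-neighbour regression over order vectors. A toy sketch with made-up 2-d order vectors (a real system would embed pizza type, addresses, time of day, and so on):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# (order_vector, observed_delivery_minutes) for past orders.
past_orders = [
    ([0.10, 0.20], 15), ([0.20, 0.10], 17), ([0.15, 0.25], 20),
    ([0.90, 0.80], 40), ([0.85, 0.90], 45),
]

def estimate_delivery(order_vec, past_orders, k=3):
    # Average the delivery times of the k most similar past orders.
    nearest = sorted(past_orders, key=lambda o: euclidean(order_vec, o[0]))[:k]
    return sum(minutes for _, minutes in nearest) / k

# A new order near the first cluster gets an estimate near 15-20 minutes.
print(estimate_delivery([0.12, 0.18], past_orders))
```

The point of a vector database in this scenario is doing that "find the k most similar past orders" step fast over millions of orders, rather than the linear scan used here.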
Yeah, it sounds so interesting, because there are so many products that still revolve around the idea of, in simplistic terms, the inverted index. I'm using that 15th-century model to represent my data, and then if I have images, I'm like: oops, what should I do now? OK, maybe I can use some extension on top of Lucene, if I'm using Lucene, let's say, for the sake of argument. But it's still a limiting experience. It's like: in many ways I'm solving my task with the wrong tool. And maybe, since I mentioned Lucene, I can continue using it for the sparse search, for the normal inverted-index retrieval, but I can unlock so many new use cases with vector search, and find new ways to show value to the users. Because especially in traditional search engines, if you return, let's say, 100 results, users don't always have time to go through them, right? You basically offload a lot onto them. Do you feel that way?

Yeah, certainly, and there are a lot of interesting things in what you're saying. So again, it depends on how you look at it, bottoms-up or top-down. You can make an argument from the bottoms-up approach and say: OK, we have the inverted index, or we have the vector index, and we can do certain things with each. But what I also like to do is look at it from a top-down perspective, and what I mean is this: if you want to build a project for yourself, for your students, for your boss, or for a customer, whatever, how often do you actually say: OK, the tool that I'm going to use to store my data is an inverted index? It probably doesn't happen that often. I mean, if you go to the websites of these famous
big companies that build databases around inverted indexes, they don't go: this is the best inverted index around, use us. They say something else, right? They say: hey, we help you with enterprise search, or we help you with logging, or with your security needs, those kinds of things. And where we are right now, at the cutting edge of vector search, is that we are still talking about it the way you would talk about those inverted indexes. But I hope, and that's one of the things that we try to do at SeMI Technologies around Weaviate, to also talk about these new things and what you do with them. So: you can have these recommendation systems in e-commerce, you can do contextual search for e-commerce, I don't know why I'm stuck on e-commerce, but I keep coming back to it, or you can do contextual search through documents, those kinds of things. I was talking about this amazing use case that had to do with a resume. Let's say the resume says: I'm an IT director, and I played in the national Olympic beach volleyball team. That's what it says. And now the request is: they're looking for somebody who's an IT director and who is interested in playing sports. You're not going to find that person with an inverted index, because there's no direct relationship between "sports" and being in the Olympic beach volleyball team; but with the vector index, we can. So instead of talking about it from the perspective of what the inverted index can't find, I would actually like to find more words and better language to talk about this from the perspective of the use case: contextual search, semantic search, those kinds of things, which I think still sound abstract in a
lot of people's ears, but I think it's also very exciting. And this new thing goes for you as well, right? You're also helping with that: helping to let the world know, hey, look, there's this new thing, look at the things you can do with it. So the point that I'm trying to make is, it's not that I disagree with your point, I agree with it, but if you compare it with the successful search engines now, they might be based on inverted indexes, but that's not how we talk about them. And I really think that we're at the cusp of that change, where people start to talk more from the perspective of the use cases and the things that you can build with them.

Yeah, absolutely. It's just that my engineering mind always kicks in and says: hey, but you are basically offering to replace a dusty old index with a vector search data structure. But you are totally right. If an electric car company would say: buy our cars because we have the best battery, and look how good it is, and they supply some diagrams showing how well it conserves energy and so on, maybe that will appeal to some clients who want to save the planet, let's say. But the rest of the clients will say: OK, why should I buy your car if it's slower? You didn't focus on the use case. That's what I'm advocating for, right? You should always listen to your users on that one.

Yes. And what you're saying is very interesting. I was inspired by something which is called the layering problem, which basically means that in the past, and this was the case for me too, maybe 10 years ago or so, I thought that if I just drill down deeper and deeper and I understand how something works at the core, that means that I understand the
whole concept of something. And the more I learn and the more I work on this, the more I think that's not the case. Let me give you an example. I saw this tweet coming by on the day that Coinbase did its IPO, and I'm paraphrasing here because I don't remember exactly, but somebody had found the Hacker News post where somebody announced that they were working on Coinbase. It might not even have been called Coinbase back then, and he said: OK, I'm thinking of building a platform, blah blah blah, something like that. Well, you should have seen the responses, because people were like: nobody wants to use that, and that's not what these blockchain technologies are made for, because we actually want to decentralize things, blah blah blah. Regardless of whether you agree or disagree with that statement, I think we can agree that Coinbase is doing pretty well and bringing a lot of value to people. The point I'm trying to make with this story is that there is a risk we run in constantly doing that deep dive and making the deep-dive comparison, which is important and needs to happen, and which we need to do on the product itself; but we also need to think in these other layers: how will people actually use it? Because bear in mind that the people currently involved in the discussion about vector databases, who are very vocal about it, are extremely knowledgeable about what's happening under the hood. But what if you're just working at a company, and you're just a normal software engineer, and somebody says: hey, I want to do better product search, and you do a Google search on that and you find a solution like Weaviate? You might be interested in knowing what's happening under the hood, but there's a limit to that. You come to it through the use case, not bottoms-up. That's how I look at it.
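Going back to the resume example from a couple of exchanges ago, it is a neat litmus test for the two retrieval styles. A toy contrast below: the keyword lookup needs the literal term "sports", while hand-made 2-d "embeddings" (which simply assert that beach volleyball is sport-like; a real model would learn this from data) let the similarity come out high anyway:

```python
import math

resume = "IT director, played in the national Olympic beach volleyball team"
query_term = "sports"

# Keyword (inverted-index style) matching: needs the literal term.
resume_terms = set(resume.lower().replace(",", "").split())
print("keyword hit:", query_term in resume_terms)  # -> keyword hit: False

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hand-made embeddings placing "beach volleyball" near "sports" in the space.
embeddings = {
    "sports":           [0.90, 0.10],
    "beach volleyball": [0.85, 0.20],
    "spreadsheets":     [0.10, 0.90],
}
print(cosine(embeddings["sports"], embeddings["beach volleyball"]))  # high
print(cosine(embeddings["sports"], embeddings["spreadsheets"]))      # low
```

The exact-term lookup fails even though the resume is a perfect semantic match, which is the gap the vector index closes.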
So that's the point that I want to make.

I really like your approach. I entered this industry as an engineer and then progressed to work closer to product management. I didn't become a product manager, but when I talk to them, they really don't want to hear too much about the algorithms, because it's not what they think about daily. They think about solving user use cases, and sometimes they may ask: how can I do this, is this possible? They give you a task, and then you go back to your toolbox and you're like: OK, what do I have here? A couple of databases, this queue system; OK, let me stitch things together and maybe this will work out. But I agree with you that in many ways we have that risk in engineering, of focusing too much on what's closest to us. Let's say I enjoy using this IDE, I enjoy using this compiler, but what value does it produce beyond me enjoying using it? The end result is what matters.

No, absolutely, and don't get me wrong: sometimes, when something is released in the software and it comes by on our internal Slack channel, I enjoy that too. I play around with it and I go: it's amazing that this works, or that we can do this, or how fast things are, or how scalable they are. Don't get me wrong, I enjoy that a lot. But what I also enjoy, and I think that's also the role that I have in this company, and the role that I'm trying to play when it comes to vector search, is, if you have these product managers, to actually listen to them and see what problem we can solve. Because I don't think it's the responsibility of the product manager to understand how vector search might apply to their use case. No, we need
to be able to express to the product managers how we can bring value to them. Of course there are product managers who say: OK, for the next product we need such-and-such a database, but I don't think there are many. They ask a question like: OK, we have a lot of data and we can absolutely never lose it; architect, or engineer, how are we going to solve that? It's a different language for talking about these problems, and what we now start to see is this wave coming where people express problems from a product manager perspective, a business owner perspective, an entrepreneur perspective. They describe the problems they have, for example: hey, somebody keeps typing "I'm having a headache" into my search bar, but they don't see aspirin. And then we go: boom, that's a use case for Weaviate.

Yeah, absolutely. And that's a great segue, actually, to the second part of our show. Product managers answer the question of what we're building, and engineers answer the question of how. So I wanted you to talk a little about how you implemented Weaviate, and I understand that Etienne could maybe also talk about it; I think he talked about it recently on a podcast, and I'll make sure to link that in the show notes. But here's what caught my eye: if you look at the landscape of vector databases, some of them are closed source, and most of them, up to now, are open source. It's an interesting distinction, because some businesses decide: we will keep it closed, because it's at the core of what we offer, and maybe there are some risk elements involved for them, maybe something else. But that's their choice. Your choice was to open-source Weaviate. Can you talk a bit more about that?

Yeah, sure, sure. So that goes back to this: if you have a use case, you can package things
together, right. That goes from the lowest level of the technology, so just, you know, where the bits and bytes begin, and then where the index sits and how it works and how it's optimized and how it's scalable, and then you go up and up, and then you get to these modules that you might want to use, and then you get to these packages of additional tools that you might want to use for a specific use case. And then the question sits: okay, where does the most value come from, and what do people need to actually use this in production? And what you try to do is somehow capture that value. And then there are two things that we see in the case of Weaviate, because Weaviate, of course, also with our competitors, we all evolve in different directions, right, which is good, I think. One is that we said: well, there's a lot of value, so if you look at our enterprise customers, what's very important for them is that they want to have certain SLAs; sometimes they want a certain size of things, sometimes they want to use things that are in these packages, sometimes they want to have specific models. They all can do that, and that is where most of the value for them is coming from; they need Weaviate to do it. It's always Weaviate that's at the heart, but seldom do people specifically ask what's inside the search engine. So if you go back to that example I gave: the famous search engines that you now have around are not promoted as "we have the best inverted-index piece of software that exists." It goes a little bit the same for us. So then you can say: well, if that's the case, we could consider open-sourcing it. And then you can make a pro list and a con list of open-sourcing, because I have a business model, right, I know I want to build my business. So what would be a pro of open-sourcing it? Well, one is transparency. You say: well, we're building something completely new, it sounds all very fancy, and we're going to show the world that we can actually do this. I very fancily told you, like, we even have parts of assembly in the code; well, you can actually see that, right? You can see how it's optimized. So what that has as an effect is that it builds trust. The second thing that happens is, as I mentioned before, we need to learn what these use cases are that people are building with vector search. And we see people are building like crazy; our downloads are going up and up over time, and the other day somebody just published a great article about how they indexed 60 million data objects. They were open-source users, but look what we learned from it. And they're so kind: they're basically promoting Weaviate, they give us feedback, they give us help, etc. But there's also another thing. Sometimes an open-source user finds a bug or finds something, and the way the software ecosystem is structured, the moment the fix comes in, our customers have that fix as well. So it's a win-win. And the thing is, customers don't mind that there are open-source users, because if I have a customer, or a prospect, that says: hey Bob, can I also use the open-source version? I say: of course you can, but if you manage it yourself, you're stuck with the open-source license if something goes wrong, and you can choose the same software with support. And then they go: well, we want all that. So then it's interesting for them to buy a license. What's also important to know is that companies like ours are young companies, so you're also trying to position yourself in the field and you try to show what you can do, and I think that open source is an amazing vehicle, because, as you probably know,
the open-source community can be very direct, and that is great, because then you learn from it and you can make things better. So we've learned a lot from the community. All in all, it's currently a net win to have it open source, because it's helping us from an outreach point of view, it helps us build community, and it's not biting our business strategy.

Yeah, that's well put, but I wanted to come back to the open source a little bit. You did mention these key elements that are positive for you and that natively embed into your business model, so to say. But there is also one element: compare this to closed source. Let's say, closed source, the way this would look inside your company is that you have internal roadmap planning and you just keep releasing stuff, and then you go directly through your sales to upgrade installations, so you become kind of a corporate type of setup with your deployments. On the open-source side, you need to do upfront work to maintain the connection: you've built the community, but you need to keep talking to the community, right? That's a lot of work as well. So how do you see that part of the story?

Yeah, so that's a very interesting question, actually. So in the end, I think that community is not something that is just an open-source thing. Let me give you an example: a database company that I find very interesting in how it operates is Snowflake. You can easily say: Bob, that's a data warehouse, no vector search. No, no, but I find it very interesting, and I sometimes talk to people and they tell me: you know what's an amazing company in how they build partnerships and help us build partnerships? That's Snowflake. And I'm like: wow, that's interesting, so explain to me how they do it. The point that I'm trying to make: they apparently are doing a great job in building a community, but they're of course completely closed source. So you need to build a community either way. If people don't like your stuff, they'll move away, and we know of a very famous database company where that is happening; it's an old-fashioned company, so that's fine, but that way we actually learned from them. You want to have a community, you want to be nice, you want to have great products, because the best marketing is basically word of mouth. So, sorry, the point I'm trying to make is: community is not a thing only for open source. You somehow want to show what you're doing, and then you build community around people using your technology, saying something about your technology, etc.

Yeah, I mean, absolutely. I guess the essence of my question was: if I maintain it as closed source, I can maintain my own standards, and I can be, let's say, SOC 2 compliant for the auditing part of things, so my business moves forward. But when I'm open source, I need to maintain a different level of standard: documentation, code style, the process of submitting pull requests, how people can influence Weaviate's direction, and other things. It's a lot of support on your side. You basically support the clients, those that choose your deployments, your hosted version, your cloud, and then you need to support the community. And I'm not saying this is a bad thing; I'm saying this is a portion of your business model, of your day-to-day life, that is dedicated to that. And you are doing a great job at that, by the way. I'm super amazed, positively; you are always welcoming on Slack, and the member count keeps increasing regularly. When I go back to the Weaviate
Slack, I'm like: okay, just a few weeks ago it was 150, now it's over 200; what's going on? So you know, you're doing a great job, you and the whole team. But I mean, it's work, it's work, that's what I'm trying to say.

Yeah, well, running a startup in general is a lot of work. I hear your argument, but I just don't 100% agree with it, so let me explain why. First of all, take the simple example you spoke about. I don't know if everybody who is listening to the podcast knows what SOC 2 is, but it's a security and compliance standard, right? You can have an open-source product where SOC 2 compliance is, again, interesting from a business-model perspective. You can say: if you use this software open source, it's not SOC 2 compliant, but if you use the exact same software with a different license, it is. So that's part of the open-source business; that's one thing. The second thing, about for example maintaining documentation: that is true, but the thing is, if you have a closed-source solution, somebody somehow needs to use the APIs as well, so you still need documentation for it, so you still need to maintain the documentation. So there I'm not sure if that argument still holds. The only thing that is sometimes difficult is that people ask a lot of questions; you sometimes see that on our Slack. You want to be friendly and you want to answer these questions. There are two things related to that. One is that at some point, of course, people ask a lot of questions and they keep asking, and sometimes you say, in a friendly way: you know, maybe just watch this video first, or read this part of the documentation first. Sometimes I also do that, and just kindly direct them: maybe you want to start there. So that is one thing. There was something else that I wanted to say related to this. Oh yeah, the second thing, and it's also something that I always tell the team, is that sometimes an open-source user might ask complicated questions. Not complicated as in the question itself being complicated, but just: oh, another question, or: why is he or she asking this? But the thing is, I strongly believe that every question that you get has a core of truth to it. So if somebody makes a fuss, if somebody asks a question, then probably others have that problem as well. And the upside that you have from open source is that there's a lower barrier to entry: people start to ask these questions, and you learn from them. And I think it's completely fine, in return, if people ask specific questions, to ask them, not on the public Slack but maybe in the DMs: hey, may I ask what you are building with Weaviate? Because then there's a feedback loop and we're learning from it as well. So I hear your point, but I do think that open source is evolving, and the business models around it are evolving as well, and we're trying to benefit from it, and again, for now it's a net positive.

Yeah, thanks Bob, that's very clear. The reason I'm asking this question is because there's always something behind your choice, right? It's supporting your idea, you're driving it, but there was an alternative model as well, and you didn't consider it because you didn't want to go that path. You didn't want to go the closed-source path for your database, because, as you said, you want to get more feedback loops with the community, you want to learn more about the use cases, and this is a fantastic way of getting that. You show it transparently on the web: you can either download it and host it yourself, and probably, when you run into some issues here and there, we
will be there to support you; and you can contribute back if you get inspired by the tech itself, if you're deep in tech and you want to fix some things, or introduce a feature. That is amazing.

Yeah, and don't get me wrong: I don't have any problems with closed source. I can make an argument for closed source as well. But what I do think is that it plays a role in your identity as a company: what kind of company do we want to be, and how do we want to show that to the outside world? And yes, that comes with the complexity of needing to deal with it, but in the end it works well. For example, go back to these product managers, right: what we see is that sometimes you have developers around the table, and, especially with corporates, the developers expect that we have a closed-source solution. Then they see Weaviate, they see it's actually open source, and that makes them very enthusiastic about it: this is great, you know, I did an installation, I played around with it, this is great. Which is then a positive feedback loop back to the product manager, and then everybody's happy. So again, it's currently a net positive. And also, I think, when you build something new, you create a new niche, and we're not alone, but it's also not very crowded, right? You need to somehow show the world what that niche is and what that niche can do, in as many ways as possible. So I dare to bet you a nice bottle of wine that of the ten people who told me that they really liked the fact that Weaviate is open source, only one of them actually looked at the software itself, went into the folders, looked at how it was written. Some people do that, and people give feedback on it, but a lot of people just say: hey, this is great, you're literally so open about it, we get you, you know, we understand it, this model is great, this is working. So it's a friendly way of approaching the market, basically, I would argue.

Yeah, and I think, to close off on this: in my previous company, when we were extending Apache Solr, the reason we were extending it is because we had a very specific use case that wasn't solved by the community, and as you go into the Apache Solr documentation, you couldn't find a lot of material on that specific topic. So what I had to resort to was reading the source code, and this is something that one of the Elasticsearch book authors, I think Rafał Kuć, said: if you have a question and nobody answers it on the mailing list, and the documentation doesn't have an answer, go and read the code; that's your answer. And if it wasn't open source, what would I do? I would have to engage through some sales loop, or it would put the threshold to enter so unbearably high that I would say: okay, I will find something else, or maybe I will stop working on this problem.

Yeah, exactly, and just knowing that it's there, even if you don't need it, often works as a benefit. What I don't have, and surprisingly a lot of people ask me this, is a moral reason to have something open source. It's just something that works very well for us and for how we want to position ourselves. But it's a great question. I think, to recap: it's to position vector search, and with that, Weaviate, in the world, and to show people that this is something they can do, and I think it's working wonders.

Yeah, absolutely. And maybe we can cover another topic that you've mentioned: at the core of Weaviate you are using certain
algorithms, you know, for the vector search itself, for building the index and for the search algorithm. And you have mentioned to me in private that you are using HNSW, the hierarchical navigable small world graph algorithm, right, and that you have customized it. Can you talk a bit more about why you did that? You did mention CRUD, that you needed to add CRUD. Yesterday I was checking the original authors' repository, and they have actually already added CRUD there, probably because some other use case came in from elsewhere: can you add it? And I was coming with my own new use case, by the way: I was asking, can I load the first layer of the graph somehow with one single read? And then I can go and read the code; this is another beauty of open source, I can read it. But can you talk a bit more about why you customized HNSW, and did you implement it in Go in the end?

Yeah, so, two parts to that. I did not do that implementation myself. We earlier referred to that other podcast; if listeners really want to go into the nitty-gritty about it, then I would highly recommend listening to that podcast, and I believe you're also going to link it. And the answer to your question was basically already in your question, right? The problem that we needed to solve is: you can take an ANN library, but some of them are immutable, and then the problem is that if you change something, you need to rebuild the index again. That is something you don't want in a database, because if you go back to that use case of, for example, the recommendation engine, you somehow want to add a product to a cart in real time and deal with that in real time. So then you are, air quotes, limited to an algorithm that supports that, and for us HNSW was the right fit. However, and you can actually see this in our documentation as well, we not only have modules, the ANN algorithm is actually also a plugin. Currently we only have HNSW, but we're looking at others as well, and in the future we're going to release other ANN plugins within Weaviate, so you as an enthusiast can actually choose what you want to use. The only requirement that we have for such an ANN algorithm is basically that it needs to have that CRUD support, or that we can add CRUD support to it. It could even be the case that in the future we're going to support other use cases where that's not the case, but that's for now. And to the second part of your question: yes, it's actually a custom build, which you can of course see in the GitHub repo.

You see, full circle, that's the value all along, right? And hopefully, at the end of the day, if you guys change something in the part of HNSW that you implemented in Go and you published it as code, the original authors may also look at it, and they might take that idea and bring it back to their implementation, and then, just as a side product, that will benefit some other part of the community. But you will be there as well, because of the authorship, the credit that will be given to you, because you did it. So that's very interesting: you can benefit, and even new users can potentially reach out to you, or they will know about your existence through this link.

No, exactly. And the thing is, what happens with a solution like Weaviate is that, yes, it has that ANN algorithm at its heart, at its core, but there's so much around it, with the scalability capability set it has, with the way of also storing the data objects, that actually building that yourself, kind of, it's very comparable to, for example, Lucene and Solr, or Lucene and
Elasticsearch. So, how I like to talk about it, as a comparison: take the ANN algorithm, in your mind, as Lucene. I mean, the comparison isn't a hundred percent correct, but to make my point: that whole thing that, for example, Solr or Elasticsearch build around it, that is what we're trying to do around these algorithms. That's how you could compare it. Rather, it's just that we said: we give you this out of the box. But I do want to reiterate: yes, we have these power users that really want to know the nitty-gritty, that want to make those changes, but the majority of users, I like to call them the silent majority of users, they just say: okay, I have a hundred thousand documents, I know a Hugging Face model that I like, how am I going to quickly search through them, period. And then they find Weaviate. That's the majority of users, and they probably don't even know what HNSW is, which is fine, right? That's perfectly fine, because they do other things; they might be at that layering that I mentioned, they just sit in another layer that they look at it from. So I think the cool thing of our modular system is that we can make these power users at the core happy, but we can also make these more generic developers, or full-stack developers, happy, and even in the outer layer we can make these product managers happy. And that's what we focus on, but all through a single core and a set of modules. So it's not that we have, like, two types of Weaviate or something; it's just one Weaviate where we support all these use cases.

Yeah, and, you know, for those of us who really want to go deep into detail, the analogy that just came to my mind is: if you take MySQL, or some SQL database, when you choose the type of the field that you index, you're thinking: okay, is it going to be a B-tree, or is it going to be full-text, or is it going to be some other data structure that the SQL database offers? That's when you start asking questions about the trade-off of choosing one version or the other. But you may also just index the data and solve your use case first, and only when your product manager comes back to you and says, hey, why is this slower than yesterday, can you improve it?

Exactly, exactly. And what I find interesting and important, at the cutting edge where vector search in general now sits, is that yes, it's very important to talk about, to be transparent about, to share about, or not share, depending on your open- or closed-source strategy, how these things work, what kind of algorithms are used in it, etc. That is very important, and it's also, as you mentioned earlier, something that we are very active in. By the way, not only me: for example, my colleague Janice is doing that from the core of Weaviate, and my other colleague Laura is doing that more from the GraphQL perspective, like how can you build these queries, what do you get back, etc. But I think it's also important, if we take your MySQL example: what do people use MySQL for, or in our case, what do they use vector search for, and how can I communicate with the people who have absolutely no idea what an ANN algorithm is? And I think those are the three pillars that we stand on: the core, the interface, and the use cases, and we try to cover these three pillars, all based on one codebase.

Yeah, that's fantastic, what you do there. And I will link, what is it, a shameless plug: the blog post that basically explains a few details of six vector databases, a couple of which are closed source, the rest open source. You can actually see for yourself what is happening in those databases, and there's so much material there. And, like, don't go too
technical yet; kind of stay in the use-case part. And I hope we can, as a community, collaborate more in bringing these use cases forward, highlighting them. I just alluded to, you know, similarity code search, or encoding software viruses into that representation and then searching for similar viruses when you need to. There are so many use cases that it's our job, in many ways, and you're doing a great job there, really, to connect the use cases with the tech. So don't fixate on the tech yet, because the tech is going to improve over time. There will be new cool algorithms; by the way, there's the billion-scale ANN competition going on, and there will probably be new algorithms that beat the current ones in performance. And then, eventually, performance will stop mattering, in a way; it will be something else, it will be: okay, what can I do with it?

Yes, yeah. And let me, if I may, make a quick metaphor there. Probably this metaphor has been used many times for technology, but just to make it anyway: the other day I ate at a great restaurant with a friend of mine. Amazing food, great atmosphere, everything, right? That would be the metaphor for the use case that you have. But my friend, she said: I actually want to know how they make this. So she also bought a cookbook, with the recipes in it. So now she could go into the book and actually go a level deeper and see how the chef prepared the dishes. Which is fine; I just wanted to have some nice food and a nice glass of wine, she wanted to know a little bit more. And I think, if we're smart about this, and I think that is also where open-source business development is now, in, you know, 2021, almost 2022, you can actually cater to that whole stack if you do it smart, because the layers click into each other. So the point is, like you said: talking about technology is also very important, but I don't think it's the only thing we should talk about. I think we should make sure that we talk about both, and that they're constantly aligned. That's also how we talk about Weaviate internally: it is aligned, but people use different words and different ways to describe the technology. The people that are helping me on the business development side use different words; they never talk about HNSW like the people in the core tech team do, but they have a great understanding of the use cases that are being solved. I think that all needs to come together, and we are at an amazing point in time where that's happening for vector search, so I think that's just amazing. And by the way, kudos to you as well: you're carving out your own niche there, so good for you, that's nice to see.

Yeah, being independent in this field is cool, because it also opens doors to talk to guys like you, really. If I were your competitor, maybe you wouldn't want to talk to me this early; it's a different discussion.

Yes.

And at the same time, as I said in the first episode, I'm actually educating myself a lot on this, and in the process I hope to share the learnings and benefit everyone, including myself. So that's the way to go forward, kind of on the open-source side, right? I'm open-sourcing myself.

Yes, exactly.

Hey Bob, you really shared so much insight into what you do on the product side at Weaviate, as well as the technology. I want to drill more into a kind of philosophical level: why you do this. And when I say why, I mean you personally, and that probably propagates to your team as well; we could even ask everyone on your team why
you guys do this. But you are the visionary, you are at the core of this. What brings you to this field? You are at the forefront of it.

Yeah, so that's a great question, by the way, thanks for asking. So, when I was just experimenting with these models and these representations, I started to do research on how big this actually is: how much unstructured data is there, actually, that we could potentially help with? And I found that this was so big. At some point I was literally just walking down the street, and, like in The Matrix, when you see everything in code, I was like: wow, every company that I see, every truck that I see driving by, an airplane that I see coming over, a warehouse that I see, they could all potentially use Weaviate. And that's the dream: that I can just walk down the street and go: oh, you see the truck there? Yeah, that company uses us to do X. You see the hospital over there? Yeah, they use us to do Y. Etc., etc. It is such a thrill to be in a new niche, trying to build that product, and to build it as a solid product, and bring that new product to people, to solve new problems. That is a personal driver, plus it's just something that I'm personally very interested in. And something that you probably already noticed, by the way, in how I present my answers, and that grew over time, is that I became interested in the layer between the tech and how people use it. There's this overlap there, and I'm interested in that overlap: how do people use the technology, how does that create value, how can we bring it to them, and how can we capture some of that? That is something that I'm extremely interested in, and this is just, you know, my kind of technology, and this product is the vehicle to do that. So, if we think big, and we think about this new niche with new database technology, then let's just go all in and see how far we can bring this. There's way more to say about it, but it's such an exciting time to work in this, and that's my personal reason why I do this.

So, since you are so big on use cases, is there a specific use case that drives you, one that gets solved, maybe it wasn't solved yet, maybe it was already solved? By the way, in your videos, you quite frequently say: okay, imagine a wine store, right? I'm thinking there's probably good wine in Holland; when I travel, let's get together and drink some good wine. But is that the use case that drives you, or is there something else that you think could be solved by Weaviate?

Oh no, so, what drives me, there are certain use cases, and I'm now doing this from the top of my head, that you could look at from the perspective of size: big, large corporates working with Weaviate, trying to solve problems, and I go: it's amazing that they use this. That is something that drives me on the one hand. On the other hand, what drives me is people looking at Weaviate to use it where it has an impact on people's lives. That can be medical use cases, or it can even go as far as the HR example that I gave, and I'm a little bit vague about these use cases because we're still working on them. But when they're big, or when they have an important positive impact on people's lives, that is amazing. If I present results for these big use cases to certain people, and you see their eyes go open, and they go: wow, okay, we can actually do that, that's amazing. That is the most, that's the coolest thing
that's around. And there's also something, and I don't want to sound too vague about it, but there's something exciting about machine learning, I guess, and especially NLP. One of the things that we are working on, and hopefully are going to release soon as a demo, is that we loaded the complete Wikipedia into Weaviate, just the whole thing, so we're talking about almost 100 million paragraphs. And I watched an Anthony Hopkins movie the other day, and I typed into Weaviate just, you know, "which actor played Hannibal Lecter", or something, and in a few milliseconds it says Anthony Hopkins, and I go, like, whoa. It's so cool when that actually works; that just gives me a thrill. So I would say these three things are why I'm doing this.

Yeah, that's super exciting. If I asked the same question of myself, then the word "semantics", the similarity, yes, that would drive me, because I actually did my PhD in machine translation, and my supervisor developed a semantic parser, you know, not a syntactic parser, and I still cannot find an analogue on the market for this work. What was driving me, the way he explained it, is: hey, I now really read Tolstoy with my parser, every single day, and it fails, I fix it, it fails, I fix it. But sometimes it amazes me, because Tolstoy tends to create such long sentences that they can take several pages. In Russian, that is; I don't know how it looks in translation, by the way, I've never seen the translation, but in the Russian source. And Tolstoy rewrote his books nine times, several books, like War and Peace. So he was basically compiling the language using his parser, and he was fascinated by this: okay, this is the semantic layer. And I was constantly thinking, I defended my thesis in 2011, okay, how can I apply this tech in real life? And it was very difficult, because this parser is implemented in Forth, and I don't code in Forth. I don't know if you've heard of this language; it's used in industry, it's high performance, and it's, well, functional in a way: you can express many things with just one single word, and then it just unwinds behind the scenes, and you're like: okay, how do I debug this? There was a port to Java, well done by another student, but I was constantly fascinated by this field: how can I bring semantics into the world of numbers, and into the world of, let's put it, inverted indices, since I mentioned them?

Yeah, no, I like this very much, what you're saying. And this goes, again, a little bit out of the realm of the technology; I mean, you use the technology, but it goes beyond it. I remember, before the whole pandemic, I was speaking at a conference in London, and they were so nice to have me on a large stage, so I had this big screen, and I'm talking about Weaviate and I'm giving a demo, and I could see the first few front rows. And, like what you said before with the Q&A example, I give a Q&A example, a real demo, and what I always do in the moment when I present to an audience, when I click to execute the query and get the response back, is look at the people that I'm presenting to. And you see these people sit up to watch, and they go "ooh", and I go, like: that's such a cool thing. Something that we as a team came up with, that everybody participated in building, and then people enjoy seeing it and enjoy using it. That's amazing, that's just fantastic.
and language has that additional element to it right so that people you know it does this how we communicate and and we get closer to have the machines communicate you know that we can communicate more natural language based with people yeah that's that's that's amazing so that that would be another that would be a fourth one of doing it yeah that's fantastic that's so so deep and you know like I'm happy that we also connect with you on that topic you know with the semantics when you you can park all other items like technology use cases product but the semantics part way we're driving this thing it's it's just amazing and I'm happy for you guys to be doing this and I think part of the excitement of being at the edge of of doing things is also kind of you know launching things and like kind of like announcing something is there something you would like to announce yeah so yeah certainly so I think the so one thing that already mentioned a bit so we we're gonna launch that huge dataset just the whole Wikipedia with everything and also for people to try it out and to play around with so that they can see actually how big it gets and related to that and we already actually have an pre-release but it's gonna be released very soon just as a standard release is everything related to horizontal scalability so that people can now you know scale from into the millions into the into the billions and we starting to get these kinds of questions in and we're very close and it's like I was in like very very very very close because the pre-release is already out there and and that then goes full circle back to people ask I get sometimes emails from people say yeah so we have like you know a x billion factors but you probably can it's a well all choices and we probably can and I go like but can you also store the metadata you can also store the metadata and then people go they're really excited so it's just that just keeps going and going and so those are the big two things that I 
want to share, because there are a lot of people asking for this, and so we're probably going to make a lot of people happy when it's out of pre-release. Yeah, that's fantastic; they both sound so big. I'm actually spending my time now figuring out how to scale, and I'm participating in the billion-scale competition, actually, by the way, so I'm waiting with excitement for this release, because I would like to learn things from you guys as well. You solved that, and you open-source many things, and you will open-source this part as well, so it's amazing what you do, and I'm waiting with excitement to learn what you've done. So thanks so much, and thanks so much for your time as well. I mean, we went so deep today, in many areas, and I'm sure we can talk more at some point down the road. We probably can, and thank you so much for having me, and keep up the great work, because you're doing a great job in the community and in the industry. Yeah, thanks so much, Bob, my pleasure, and see you next time, bye. Cool, thank you, bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md b/transcripts_with_timestamps/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md new file mode 100644 index 0000000..910a579 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/code-search-copilot-llm-prompting-with-empathy-and-artifacts-with-john-berryman.md @@ -0,0 +1,3520 @@ +--- +description: '

Vector Podcast website: https://vectorpodcast.com

Get your copy of John''s new book "Prompt Engineering for LLMs: The Art and Science of Building Large Language Model–Based Applications": https://amzn.to/4fMj2Ef

John Berryman is the founder and principal consultant of Arcturus Labs, where he specializes in AI application development (Agency and RAG). As an early engineer on GitHub Copilot, John contributed to the development of its completions and chat functionalities, working at the forefront of AI-assisted coding tools. John is coauthor of "Prompt Engineering for LLMs" (O''Reilly). Before his work on Copilot, John''s focus was search technology. His diverse experience includes helping to develop a next-generation search system for the US Patent Office, building search and recommendations for Eventbrite, and contributing to GitHub''s code search infrastructure. John is also coauthor of "Relevant Search" (Manning), a book that distills his expertise in the field. John''s unique background, spanning both cutting-edge AI applications and foundational search technologies, positions him at the forefront of innovation in LLM applications and information retrieval.

00:00 Intro

02:19 John''s background and story in search and ML

06:03 Is RAG just a prompt engineering technique?

10:15 John''s progression from a search engineer to ML researcher

13:40 LLM predictability vs more traditional programming

22:31 Code assist with GitHub Copilot

29:44 Role of keyword search for code at GitHub

35:01 GenAI: existential risk or pure magic? AI Natives

39:40 What are Artifacts

46:59 Demo!

55:13 Typed artifacts, tools, accordion artifacts

56:21 From Web 2.0 to Idea exchange

57:51 Spam will transform into Slop

58:56 John''s new book and Arcturus Labs intro

Show notes:

- John Berryman on X: https://x.com/JnBrymn

- Arcturus Labs: https://arcturus-labs.com/

- John''s blog on Artifacts (see demo in the episode): https://arcturus-labs.com/blog/2024/11/11/cut-the-chit-chat-with-artifacts/

YouTube: https://youtu.be/60HAtHVBYj8

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250209_090249_4151453caa902e94e1bbf399c57f535b.png +pub_date: Mon, 10 Feb 2025 03:21:48 GMT +title: Code search, Copilot, LLM prompting with empathy and Artifacts with John Berryman +url: https://rss.com/podcasts/vector-podcast/1888857 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 21.44, "text": " Hello + everyone, Vector podcast is back. Season 3. We are wrapping up the season with some", + "tokens": [50364, 2425, 1518, 11, 691, 20814, 7367, 307, 646, 13, 16465, 805, 13, + 492, 366, 21993, 493, 264, 3196, 365, 512, 51436], "temperature": 0.0, "avg_logprob": + -0.24522976253343665, "compression_ratio": 1.2919708029197081, "no_speech_prob": + 0.2688979208469391}, {"id": 1, "seek": 0, "start": 21.44, "end": 26.88, "text": + " really, really juicy episodes. I''m sure you will love this one. I have the privilege + of", "tokens": [51436, 534, 11, 534, 24696, 9313, 13, 286, 478, 988, 291, 486, 959, + 341, 472, 13, 286, 362, 264, 12122, 295, 51708], "temperature": 0.0, "avg_logprob": + -0.24522976253343665, "compression_ratio": 1.2919708029197081, "no_speech_prob": + 0.2688979208469391}, {"id": 2, "seek": 2688, "start": 26.88, "end": 34.8, "text": + " talking to John Barryman today. He is an ex senior machine learning researcher + who worked on", "tokens": [50364, 1417, 281, 2619, 21639, 1601, 965, 13, 634, 307, + 364, 454, 7965, 3479, 2539, 21751, 567, 2732, 322, 50760], "temperature": 0.0, "avg_logprob": + -0.23774630402865476, "compression_ratio": 1.4019607843137254, "no_speech_prob": + 0.26456302404403687}, {"id": 3, "seek": 2688, "start": 34.8, "end": 42.879999999999995, + "text": " GitHub Copilot. Currently, he runs his own consultancy actress labs. 
I''m + sure he will talk more", "tokens": [50760, 23331, 11579, 31516, 13, 19964, 11, 415, + 6676, 702, 1065, 7189, 6717, 15410, 20339, 13, 286, 478, 988, 415, 486, 751, 544, + 51164], "temperature": 0.0, "avg_logprob": -0.23774630402865476, "compression_ratio": + 1.4019607843137254, "no_speech_prob": 0.26456302404403687}, {"id": 4, "seek": 2688, + "start": 42.879999999999995, "end": 52.64, "text": " about that. Yeah, welcome, + John. Good to be here. How''s it going? Awesome. I actually just picked", "tokens": + [51164, 466, 300, 13, 865, 11, 2928, 11, 2619, 13, 2205, 281, 312, 510, 13, 1012, + 311, 309, 516, 30, 10391, 13, 286, 767, 445, 6183, 51652], "temperature": 0.0, "avg_logprob": + -0.23774630402865476, "compression_ratio": 1.4019607843137254, "no_speech_prob": + 0.26456302404403687}, {"id": 5, "seek": 5264, "start": 52.64, "end": 60.160000000000004, + "text": " the book of yours and the book that you and Dr. Moll have written together. + I''ve interviewed", "tokens": [50364, 264, 1446, 295, 6342, 293, 264, 1446, 300, + 291, 293, 2491, 13, 376, 1833, 362, 3720, 1214, 13, 286, 600, 19770, 50740], "temperature": + 0.0, "avg_logprob": -0.1765301648308249, "compression_ratio": 1.6101694915254237, + "no_speech_prob": 0.18279415369033813}, {"id": 6, "seek": 5264, "start": 60.160000000000004, + "end": 66.8, "text": " Doug a couple of times already on the podcast. He has a lot + to say. And I realized you''ve written", "tokens": [50740, 12742, 257, 1916, 295, + 1413, 1217, 322, 264, 7367, 13, 634, 575, 257, 688, 281, 584, 13, 400, 286, 5334, + 291, 600, 3720, 51072], "temperature": 0.0, "avg_logprob": -0.1765301648308249, + "compression_ratio": 1.6101694915254237, "no_speech_prob": 0.18279415369033813}, + {"id": 7, "seek": 5264, "start": 66.8, "end": 73.6, "text": " this book together. + It''s my go-to source of wisdom on search. 
Do you still remember which chapters", + "tokens": [51072, 341, 1446, 1214, 13, 467, 311, 452, 352, 12, 1353, 4009, 295, + 10712, 322, 3164, 13, 1144, 291, 920, 1604, 597, 20013, 51412], "temperature": 0.0, + "avg_logprob": -0.1765301648308249, "compression_ratio": 1.6101694915254237, "no_speech_prob": + 0.18279415369033813}, {"id": 8, "seek": 5264, "start": 73.6, "end": 82.24000000000001, + "text": " you covered? Oh my gosh. It''s been a long time, I''m sure. Yeah, if you + told me the chapter", "tokens": [51412, 291, 5343, 30, 876, 452, 6502, 13, 467, + 311, 668, 257, 938, 565, 11, 286, 478, 988, 13, 865, 11, 498, 291, 1907, 385, 264, + 7187, 51844], "temperature": 0.0, "avg_logprob": -0.1765301648308249, "compression_ratio": + 1.6101694915254237, "no_speech_prob": 0.18279415369033813}, {"id": 9, "seek": 8224, + "start": 82.24, "end": 87.6, "text": " title, I could probably say whether it is + mere Doug. I did all the fun ones that did all hard ones.", "tokens": [50364, 4876, + 11, 286, 727, 1391, 584, 1968, 309, 307, 8401, 12742, 13, 286, 630, 439, 264, 1019, + 2306, 300, 630, 439, 1152, 2306, 13, 50632], "temperature": 0.0, "avg_logprob": + -0.2895876432719984, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.02608640119433403}, {"id": 10, "seek": 8224, "start": 89.11999999999999, "end": + 93.44, "text": " And we both did chapter one in our own times. 
I mean, we were all + in chapter twice.", "tokens": [50708, 400, 321, 1293, 630, 7187, 472, 294, 527, + 1065, 1413, 13, 286, 914, 11, 321, 645, 439, 294, 7187, 6091, 13, 50924], "temperature": + 0.0, "avg_logprob": -0.2895876432719984, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.02608640119433403}, {"id": 11, "seek": 8224, "start": 93.44, + "end": 98.39999999999999, "text": " I want to read maybe everything, but in the + search relevance problems, search under the hood,", "tokens": [50924, 286, 528, + 281, 1401, 1310, 1203, 11, 457, 294, 264, 3164, 32684, 2740, 11, 3164, 833, 264, + 13376, 11, 51172], "temperature": 0.0, "avg_logprob": -0.2895876432719984, "compression_ratio": + 1.6724890829694323, "no_speech_prob": 0.02608640119433403}, {"id": 12, "seek": 8224, + "start": 98.39999999999999, "end": 106.72, "text": " debugging, relevance problem, + tame in tokens, basic multi-field search, how you build relevance function,", "tokens": + [51172, 45592, 11, 32684, 1154, 11, 45774, 294, 22667, 11, 3875, 4825, 12, 7610, + 3164, 11, 577, 291, 1322, 32684, 2445, 11, 51588], "temperature": 0.0, "avg_logprob": + -0.2895876432719984, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.02608640119433403}, {"id": 13, "seek": 10672, "start": 106.72, "end": 116.8, "text": + " feed relevance feedback. Yeah, relevance centered enterprise. That''s interesting. + And then", "tokens": [50364, 3154, 32684, 5824, 13, 865, 11, 32684, 18988, 14132, + 13, 663, 311, 1880, 13, 400, 550, 50868], "temperature": 0.0, "avg_logprob": -0.33205181278594553, + "compression_ratio": 1.4922279792746114, "no_speech_prob": 0.013508201576769352}, + {"id": 14, "seek": 10672, "start": 116.8, "end": 125.2, "text": " semantic and personalized + search. Wow. Back in when was this published? 
I think we published that", "tokens": + [50868, 47982, 293, 28415, 3164, 13, 3153, 13, 5833, 294, 562, 390, 341, 6572, 30, + 286, 519, 321, 6572, 300, 51288], "temperature": 0.0, "avg_logprob": -0.33205181278594553, + "compression_ratio": 1.4922279792746114, "no_speech_prob": 0.013508201576769352}, + {"id": 15, "seek": 10672, "start": 125.2, "end": 133.12, "text": " in 2016, I see. + Yeah, 2016. Yeah, well, it''s been almost 10 years. Yeah, that''s the version I + have.", "tokens": [51288, 294, 6549, 11, 286, 536, 13, 865, 11, 6549, 13, 865, 11, + 731, 11, 309, 311, 668, 1920, 1266, 924, 13, 865, 11, 300, 311, 264, 3037, 286, + 362, 13, 51684], "temperature": 0.0, "avg_logprob": -0.33205181278594553, "compression_ratio": + 1.4922279792746114, "no_speech_prob": 0.013508201576769352}, {"id": 16, "seek": + 13312, "start": 133.6, "end": 140.16, "text": " So you do have semantic search in + the end there. Yeah, awesome. Yeah. But yeah, Joan,", "tokens": [50388, 407, 291, + 360, 362, 47982, 3164, 294, 264, 917, 456, 13, 865, 11, 3476, 13, 865, 13, 583, + 1338, 11, 25748, 11, 50716], "temperature": 0.0, "avg_logprob": -0.204515147518802, + "compression_ratio": 1.4248704663212435, "no_speech_prob": 0.015206818468868732}, + {"id": 17, "seek": 13312, "start": 140.16, "end": 149.76, "text": " it''s interesting + to introduce yourself to our audience. What''s your background? How you got here?", + "tokens": [50716, 309, 311, 1880, 281, 5366, 1803, 281, 527, 4034, 13, 708, 311, + 428, 3678, 30, 1012, 291, 658, 510, 30, 51196], "temperature": 0.0, "avg_logprob": + -0.204515147518802, "compression_ratio": 1.4248704663212435, "no_speech_prob": 0.015206818468868732}, + {"id": 18, "seek": 13312, "start": 149.76, "end": 156.72, "text": " What are you + up to? Oh, well, I guess that''s a long story. 
I''ve had a very circuitous path.", + "tokens": [51196, 708, 366, 291, 493, 281, 30, 876, 11, 731, 11, 286, 2041, 300, + 311, 257, 938, 1657, 13, 286, 600, 632, 257, 588, 9048, 563, 3100, 13, 51544], "temperature": + 0.0, "avg_logprob": -0.204515147518802, "compression_ratio": 1.4248704663212435, + "no_speech_prob": 0.015206818468868732}, {"id": 19, "seek": 15672, "start": 157.68, + "end": 164.8, "text": " I started out in aerospace engineering because I like the + math. And as I got into the field,", "tokens": [50412, 286, 1409, 484, 294, 46817, + 7043, 570, 286, 411, 264, 5221, 13, 400, 382, 286, 658, 666, 264, 2519, 11, 50768], + "temperature": 0.0, "avg_logprob": -0.17873286399520746, "compression_ratio": 1.7745454545454546, + "no_speech_prob": 0.05893902853131294}, {"id": 20, "seek": 15672, "start": 164.8, + "end": 169.44, "text": " I found that that''s a thing that I really liked once the + math and was the software. You could do", "tokens": [50768, 286, 1352, 300, 300, + 311, 257, 551, 300, 286, 534, 4501, 1564, 264, 5221, 293, 390, 264, 4722, 13, 509, + 727, 360, 51000], "temperature": 0.0, "avg_logprob": -0.17873286399520746, "compression_ratio": + 1.7745454545454546, "no_speech_prob": 0.05893902853131294}, {"id": 21, "seek": 15672, + "start": 169.44, "end": 173.68, "text": " anything with those. And so while everyone + was geeking out about satellites and stuff, I thought", "tokens": [51000, 1340, + 365, 729, 13, 400, 370, 1339, 1518, 390, 36162, 278, 484, 466, 24960, 293, 1507, + 11, 286, 1194, 51212], "temperature": 0.0, "avg_logprob": -0.17873286399520746, + "compression_ratio": 1.7745454545454546, "no_speech_prob": 0.05893902853131294}, + {"id": 22, "seek": 15672, "start": 173.68, "end": 178.48, "text": " that was really + cool. 
But I realized that there''s a big, big world out there that you could address", + "tokens": [51212, 300, 390, 534, 1627, 13, 583, 286, 5334, 300, 456, 311, 257, 955, + 11, 955, 1002, 484, 456, 300, 291, 727, 2985, 51452], "temperature": 0.0, "avg_logprob": + -0.17873286399520746, "compression_ratio": 1.7745454545454546, "no_speech_prob": + 0.05893902853131294}, {"id": 23, "seek": 15672, "start": 178.48, "end": 185.28, + "text": " the whole thing with software and math. So I breached out and got that + book in your hand. My next big", "tokens": [51452, 264, 1379, 551, 365, 4722, 293, + 5221, 13, 407, 286, 1403, 15095, 484, 293, 658, 300, 1446, 294, 428, 1011, 13, 1222, + 958, 955, 51792], "temperature": 0.0, "avg_logprob": -0.17873286399520746, "compression_ratio": + 1.7745454545454546, "no_speech_prob": 0.05893902853131294}, {"id": 24, "seek": 18528, + "start": 186.16, "end": 192.32, "text": " adventure was into search. I joined a + concerted consultancy in Charlottesville, Virginia,", "tokens": [50408, 9868, 390, + 666, 3164, 13, 286, 6869, 257, 8543, 292, 7189, 6717, 294, 14130, 1521, 279, 8386, + 11, 10956, 11, 50716], "temperature": 0.0, "avg_logprob": -0.19640815258026123, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.003105930984020233}, + {"id": 25, "seek": 18528, "start": 192.96, "end": 198.72, "text": " worked with + Doug Turnbull. I did had amazing adventures, hop on planes. And I talked to Zappos,", + "tokens": [50748, 2732, 365, 12742, 7956, 37290, 13, 286, 630, 632, 2243, 20905, + 11, 3818, 322, 14952, 13, 400, 286, 2825, 281, 1176, 1746, 329, 11, 51036], "temperature": + 0.0, "avg_logprob": -0.19640815258026123, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.003105930984020233}, {"id": 26, "seek": 18528, "start": 199.36, + "end": 205.2, "text": " shoe sales and worked with a patent office. 
And then I got + the opportunity to write that book with Doug.", "tokens": [51068, 12796, 5763, 293, + 2732, 365, 257, 20495, 3398, 13, 400, 550, 286, 658, 264, 2650, 281, 2464, 300, + 1446, 365, 12742, 13, 51360], "temperature": 0.0, "avg_logprob": -0.19640815258026123, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.003105930984020233}, + {"id": 27, "seek": 18528, "start": 205.2, "end": 211.04, "text": " So that pushed + me all really, really far. I got the opportunity to start working for some really", + "tokens": [51360, 407, 300, 9152, 385, 439, 534, 11, 534, 1400, 13, 286, 658, 264, + 2650, 281, 722, 1364, 337, 512, 534, 51652], "temperature": 0.0, "avg_logprob": + -0.19640815258026123, "compression_ratio": 1.6538461538461537, "no_speech_prob": + 0.003105930984020233}, {"id": 28, "seek": 21104, "start": 211.04, "end": 217.28, + "text": " interesting companies or for Eventbrite and built out their search and + recommendation twice.", "tokens": [50364, 1880, 3431, 420, 337, 13222, 1443, 642, + 293, 3094, 484, 641, 3164, 293, 11879, 6091, 13, 50676], "temperature": 0.0, "avg_logprob": + -0.19951191167721802, "compression_ratio": 1.7293577981651376, "no_speech_prob": + 0.0012346090516075492}, {"id": 29, "seek": 21104, "start": 218.23999999999998, "end": + 225.44, "text": " And then I got a chance to parly that into GitHub. So I went to + GitHub and built out their", "tokens": [50724, 400, 550, 286, 658, 257, 2931, 281, + 971, 356, 300, 666, 23331, 13, 407, 286, 1437, 281, 23331, 293, 3094, 484, 641, + 51084], "temperature": 0.0, "avg_logprob": -0.19951191167721802, "compression_ratio": + 1.7293577981651376, "no_speech_prob": 0.0012346090516075492}, {"id": 30, "seek": + 21104, "start": 225.44, "end": 231.68, "text": " last search-based code search infrastructure. 
+ The old search infrastructure had smoke coming out", "tokens": [51084, 1036, 3164, + 12, 6032, 3089, 3164, 6896, 13, 440, 1331, 3164, 6896, 632, 8439, 1348, 484, 51396], + "temperature": 0.0, "avg_logprob": -0.19951191167721802, "compression_ratio": 1.7293577981651376, + "no_speech_prob": 0.0012346090516075492}, {"id": 31, "seek": 21104, "start": 231.68, + "end": 240.23999999999998, "text": " of it. So we came in, rebuilt infrastructure + from ground up. And after a while, I was search was", "tokens": [51396, 295, 309, + 13, 407, 321, 1361, 294, 11, 38532, 6896, 490, 2727, 493, 13, 400, 934, 257, 1339, + 11, 286, 390, 3164, 390, 51824], "temperature": 0.0, "avg_logprob": -0.19951191167721802, + "compression_ratio": 1.7293577981651376, "no_speech_prob": 0.0012346090516075492}, + {"id": 32, "seek": 24024, "start": 240.24, "end": 245.84, "text": " fun. But I was + always trying to get a little bit back towards math, towards data science tips.", + "tokens": [50364, 1019, 13, 583, 286, 390, 1009, 1382, 281, 483, 257, 707, 857, + 646, 3030, 5221, 11, 3030, 1412, 3497, 6082, 13, 50644], "temperature": 0.0, "avg_logprob": + -0.283728274670276, "compression_ratio": 1.5474137931034482, "no_speech_prob": 0.0016589078586548567}, + {"id": 33, "seek": 24024, "start": 246.64000000000001, "end": 252.48000000000002, + "text": " And in about 2021, I got my chance to make the leak to data science.", + "tokens": [50684, 400, 294, 466, 7201, 11, 286, 658, 452, 2931, 281, 652, 264, 17143, + 281, 1412, 3497, 13, 50976], "temperature": 0.0, "avg_logprob": -0.283728274670276, + "compression_ratio": 1.5474137931034482, "no_speech_prob": 0.0016589078586548567}, + {"id": 34, "seek": 24024, "start": 253.12, "end": 259.36, "text": " Join data science + at GitHub. 
And from there, it ended up getting the opportunity, just right", "tokens": + [51008, 19642, 1412, 3497, 412, 23331, 13, 400, 490, 456, 11, 309, 4590, 493, 1242, + 264, 2650, 11, 445, 558, 51320], "temperature": 0.0, "avg_logprob": -0.283728274670276, + "compression_ratio": 1.5474137931034482, "no_speech_prob": 0.0016589078586548567}, + {"id": 35, "seek": 24024, "start": 259.36, "end": 265.36, "text": " place, right + time to join Copilot. Because that was kind of, you know, ML machine learning type + stuff.", "tokens": [51320, 1081, 11, 558, 565, 281, 3917, 11579, 31516, 13, 1436, + 300, 390, 733, 295, 11, 291, 458, 11, 21601, 3479, 2539, 2010, 1507, 13, 51620], + "temperature": 0.0, "avg_logprob": -0.283728274670276, "compression_ratio": 1.5474137931034482, + "no_speech_prob": 0.0016589078586548567}, {"id": 36, "seek": 26536, "start": 265.92, + "end": 273.84000000000003, "text": " And I was in the data science group to that + point. And I was, I came on to Copilot", "tokens": [50392, 400, 286, 390, 294, 264, + 1412, 3497, 1594, 281, 300, 935, 13, 400, 286, 390, 11, 286, 1361, 322, 281, 11579, + 31516, 50788], "temperature": 0.0, "avg_logprob": -0.1416790783405304, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 0.0052206916734576225}, {"id": 37, "seek": + 26536, "start": 274.64, "end": 281.44, "text": " after the research team had wrapped + up. There was a research team, brilliant people from GitHub next.", "tokens": [50828, + 934, 264, 2132, 1469, 632, 14226, 493, 13, 821, 390, 257, 2132, 1469, 11, 10248, + 561, 490, 23331, 958, 13, 51168], "temperature": 0.0, "avg_logprob": -0.1416790783405304, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0052206916734576225}, + {"id": 38, "seek": 26536, "start": 281.44, "end": 285.76, "text": " They said, while + look at these large language models, they''re going to do amazing things. 
I think", + "tokens": [51168, 814, 848, 11, 1339, 574, 412, 613, 2416, 2856, 5245, 11, 436, + 434, 516, 281, 360, 2243, 721, 13, 286, 519, 51384], "temperature": 0.0, "avg_logprob": + -0.1416790783405304, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.0052206916734576225}, {"id": 39, "seek": 26536, "start": 285.76, "end": 293.2, + "text": " it''s time. And they built this prototype. And then I came in on the team + that was there when it", "tokens": [51384, 309, 311, 565, 13, 400, 436, 3094, 341, + 19475, 13, 400, 550, 286, 1361, 294, 322, 264, 1469, 300, 390, 456, 562, 309, 51756], + "temperature": 0.0, "avg_logprob": -0.1416790783405304, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0052206916734576225}, {"id": 40, "seek": 29320, "start": 293.2, + "end": 298.64, "text": " was going into production. So how to get this shipped to + everyone, how to start improving it, how to", "tokens": [50364, 390, 516, 666, 4265, + 13, 407, 577, 281, 483, 341, 25312, 281, 1518, 11, 577, 281, 722, 11470, 309, 11, + 577, 281, 50636], "temperature": 0.0, "avg_logprob": -0.14121625029925003, "compression_ratio": + 1.6837606837606838, "no_speech_prob": 0.002984345890581608}, {"id": 41, "seek": + 29320, "start": 298.64, "end": 304.88, "text": " measure, you know, what was working + and what wasn''t not working. And then from there, I went into", "tokens": [50636, + 3481, 11, 291, 458, 11, 437, 390, 1364, 293, 437, 2067, 380, 406, 1364, 13, 400, + 550, 490, 456, 11, 286, 1437, 666, 50948], "temperature": 0.0, "avg_logprob": -0.14121625029925003, + "compression_ratio": 1.6837606837606838, "no_speech_prob": 0.002984345890581608}, + {"id": 42, "seek": 29320, "start": 305.84, "end": 313.2, "text": " chat, Copilot + chat. I was working with some of those features inside the web app. 
And finally,", + "tokens": [50996, 5081, 11, 11579, 31516, 5081, 13, 286, 390, 1364, 365, 512, 295, + 729, 4122, 1854, 264, 3670, 724, 13, 400, 2721, 11, 51364], "temperature": 0.0, + "avg_logprob": -0.14121625029925003, "compression_ratio": 1.6837606837606838, "no_speech_prob": + 0.002984345890581608}, {"id": 43, "seek": 29320, "start": 313.2, "end": 317.68, + "text": " I was like, well, you know, I''ve got a little bit of knowledge in my + head now, time to write another", "tokens": [51364, 286, 390, 411, 11, 731, 11, + 291, 458, 11, 286, 600, 658, 257, 707, 857, 295, 3601, 294, 452, 1378, 586, 11, + 565, 281, 2464, 1071, 51588], "temperature": 0.0, "avg_logprob": -0.14121625029925003, + "compression_ratio": 1.6837606837606838, "no_speech_prob": 0.002984345890581608}, + {"id": 44, "seek": 31768, "start": 318.24, "end": 325.36, "text": " book. And I + connected with one of the research scientists that was on the original team. Albert", + "tokens": [50392, 1446, 13, 400, 286, 4582, 365, 472, 295, 264, 2132, 7708, 300, + 390, 322, 264, 3380, 1469, 13, 20812, 50748], "temperature": 0.0, "avg_logprob": + -0.25054440778844494, "compression_ratio": 1.4615384615384615, "no_speech_prob": + 0.0027968345675617456}, {"id": 45, "seek": 31768, "start": 325.36, "end": 330.64, + "text": " Ziegler, we wrote the book, Prompt Engineering for Elements. It''s about + building the Elements", "tokens": [50748, 1176, 20408, 1918, 11, 321, 4114, 264, + 1446, 11, 15833, 662, 16215, 337, 8024, 1117, 13, 467, 311, 466, 2390, 264, 8024, + 1117, 51012], "temperature": 0.0, "avg_logprob": -0.25054440778844494, "compression_ratio": + 1.4615384615384615, "no_speech_prob": 0.0027968345675617456}, {"id": 46, "seek": + 31768, "start": 330.64, "end": 339.92, "text": " applications. 
And with that, just + published two weeks ago, officially published, I have started", "tokens": [51012, + 5821, 13, 400, 365, 300, 11, 445, 6572, 732, 3259, 2057, 11, 12053, 6572, 11, 286, + 362, 1409, 51476], "temperature": 0.0, "avg_logprob": -0.25054440778844494, "compression_ratio": + 1.4615384615384615, "no_speech_prob": 0.0027968345675617456}, {"id": 47, "seek": + 33992, "start": 340.0, "end": 347.68, "text": " out on a new adventure. Yet again, + I am running Arturus Labs. I''m an indie consultant. And I''m", "tokens": [50368, + 484, 322, 257, 777, 9868, 13, 10890, 797, 11, 286, 669, 2614, 5735, 374, 301, 40047, + 13, 286, 478, 364, 33184, 24676, 13, 400, 286, 478, 50752], "temperature": 0.0, + "avg_logprob": -0.22500356655676387, "compression_ratio": 1.5378486055776892, "no_speech_prob": + 0.03571538254618645}, {"id": 48, "seek": 33992, "start": 347.68, "end": 354.48, + "text": " focusing on everything, large language models, Prompt Engineering, how + to build applications, you", "tokens": [50752, 8416, 322, 1203, 11, 2416, 2856, + 5245, 11, 15833, 662, 16215, 11, 577, 281, 1322, 5821, 11, 291, 51092], "temperature": + 0.0, "avg_logprob": -0.22500356655676387, "compression_ratio": 1.5378486055776892, + "no_speech_prob": 0.03571538254618645}, {"id": 49, "seek": 33992, "start": 354.48, + "end": 360.32, "text": " know, it''s feasibility, evaluations, stuff like that. + Kind of anything you want at this point. And", "tokens": [51092, 458, 11, 309, 311, + 21781, 2841, 11, 43085, 11, 1507, 411, 300, 13, 9242, 295, 1340, 291, 528, 412, + 341, 935, 13, 400, 51384], "temperature": 0.0, "avg_logprob": -0.22500356655676387, + "compression_ratio": 1.5378486055776892, "no_speech_prob": 0.03571538254618645}, + {"id": 50, "seek": 33992, "start": 360.32, "end": 367.84000000000003, "text": " + it''s a blast. Oh, well, fantastic journey. Yeah, thanks for sharing that. 
It''s + very, you know,", "tokens": [51384, 309, 311, 257, 12035, 13, 876, 11, 731, 11, + 5456, 4671, 13, 865, 11, 3231, 337, 5414, 300, 13, 467, 311, 588, 11, 291, 458, + 11, 51760], "temperature": 0.0, "avg_logprob": -0.22500356655676387, "compression_ratio": + 1.5378486055776892, "no_speech_prob": 0.03571538254618645}, {"id": 51, "seek": 36784, + "start": 368.79999999999995, "end": 375.11999999999995, "text": " it says a lot + there. You will believe it or not. But I actually advertised your recent books,", + "tokens": [50412, 309, 1619, 257, 688, 456, 13, 509, 486, 1697, 309, 420, 406, 13, + 583, 286, 767, 42310, 428, 5162, 3642, 11, 50728], "temperature": 0.0, "avg_logprob": + -0.20105866711549084, "compression_ratio": 1.5826446280991735, "no_speech_prob": + 0.03132854402065277}, {"id": 52, "seek": 36784, "start": 375.11999999999995, "end": + 382.08, "text": " the Prompt Engineering, to my students on the recent course that + we caught up with my former", "tokens": [50728, 264, 15833, 662, 16215, 11, 281, + 452, 1731, 322, 264, 5162, 1164, 300, 321, 5415, 493, 365, 452, 5819, 51076], "temperature": + 0.0, "avg_logprob": -0.20105866711549084, "compression_ratio": 1.5826446280991735, + "no_speech_prob": 0.03132854402065277}, {"id": 53, "seek": 36784, "start": 382.79999999999995, + "end": 389.52, "text": " colleagues on LLMs and Generative AI. So I took the chapter + on the rag. And I thought that rag", "tokens": [51112, 7734, 322, 441, 43, 26386, + 293, 15409, 1166, 7318, 13, 407, 286, 1890, 264, 7187, 322, 264, 17539, 13, 400, + 286, 1194, 300, 17539, 51448], "temperature": 0.0, "avg_logprob": -0.20105866711549084, + "compression_ratio": 1.5826446280991735, "no_speech_prob": 0.03132854402065277}, + {"id": 54, "seek": 36784, "start": 389.52, "end": 396.4, "text": " is nothing else + than Prompt Engineering, really. Well, yeah, it''s interesting. 
I mean, that''s + a topic", "tokens": [51448, 307, 1825, 1646, 813, 15833, 662, 16215, 11, 534, 13, + 1042, 11, 1338, 11, 309, 311, 1880, 13, 286, 914, 11, 300, 311, 257, 4829, 51792], + "temperature": 0.0, "avg_logprob": -0.20105866711549084, "compression_ratio": 1.5826446280991735, + "no_speech_prob": 0.03132854402065277}, {"id": 55, "seek": 39640, "start": 396.47999999999996, + "end": 404.96, "text": " in and of itself. Are we going to open that kind of worms? + Of course. Sure. Yeah, rag is an interesting", "tokens": [50368, 294, 293, 295, + 2564, 13, 2014, 321, 516, 281, 1269, 300, 733, 295, 28271, 30, 2720, 1164, 13, 4894, + 13, 865, 11, 17539, 307, 364, 1880, 50792], "temperature": 0.0, "avg_logprob": -0.19188976287841797, + "compression_ratio": 1.672340425531915, "no_speech_prob": 0.010984758846461773}, + {"id": 56, "seek": 39640, "start": 404.96, "end": 412.71999999999997, "text": " + thing because everyone talks about rag as if it''s own entity, that it''s a special + thing. But if", "tokens": [50792, 551, 570, 1518, 6686, 466, 17539, 382, 498, 309, + 311, 1065, 13977, 11, 300, 309, 311, 257, 2121, 551, 13, 583, 498, 51180], "temperature": + 0.0, "avg_logprob": -0.19188976287841797, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.010984758846461773}, {"id": 57, "seek": 39640, "start": 412.71999999999997, + "end": 417.2, "text": " you like look at it, especially from my background, which + has been searched and then large language", "tokens": [51180, 291, 411, 574, 412, + 309, 11, 2318, 490, 452, 3678, 11, 597, 575, 668, 22961, 293, 550, 2416, 2856, 51404], + "temperature": 0.0, "avg_logprob": -0.19188976287841797, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.010984758846461773}, {"id": 58, "seek": 39640, "start": 417.2, + "end": 423.84, "text": " models, you click look at rag and it is search and then + the large language models. 
And if you", "tokens": [51404, 5245, 11, 291, 2052, 574, + 412, 17539, 293, 309, 307, 3164, 293, 550, 264, 2416, 2856, 5245, 13, 400, 498, + 291, 51736], "temperature": 0.0, "avg_logprob": -0.19188976287841797, "compression_ratio": + 1.672340425531915, "no_speech_prob": 0.010984758846461773}, {"id": 59, "seek": 42384, + "start": 423.84, "end": 429.44, "text": " combine them both, then it''s really hard + to get a good understanding of what''s working and what''s", "tokens": [50364, 10432, + 552, 1293, 11, 550, 309, 311, 534, 1152, 281, 483, 257, 665, 3701, 295, 437, 311, + 1364, 293, 437, 311, 50644], "temperature": 0.0, "avg_logprob": -0.13690835686140163, + "compression_ratio": 1.6176470588235294, "no_speech_prob": 0.0008702923078089952}, + {"id": 60, "seek": 42384, "start": 429.44, "end": 436.23999999999995, "text": " + not working. You just, you know, you throw up the basic chain application, connect + the data source.", "tokens": [50644, 406, 1364, 13, 509, 445, 11, 291, 458, 11, + 291, 3507, 493, 264, 3875, 5021, 3861, 11, 1745, 264, 1412, 4009, 13, 50984], "temperature": + 0.0, "avg_logprob": -0.13690835686140163, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.0008702923078089952}, {"id": 61, "seek": 42384, "start": 436.79999999999995, + "end": 443.2, "text": " And I guess you just pray that it works. 
But really, what + it breaks, if you break it down to its", "tokens": [51012, 400, 286, 2041, 291, + 445, 3690, 300, 309, 1985, 13, 583, 534, 11, 437, 309, 9857, 11, 498, 291, 1821, + 309, 760, 281, 1080, 51332], "temperature": 0.0, "avg_logprob": -0.13690835686140163, + "compression_ratio": 1.6176470588235294, "no_speech_prob": 0.0008702923078089952}, + {"id": 62, "seek": 42384, "start": 443.84, "end": 449.91999999999996, "text": " + components, then you''ve got a search application and the Prompt Engineering large + language", "tokens": [51364, 6677, 11, 550, 291, 600, 658, 257, 3164, 3861, 293, + 264, 15833, 662, 16215, 2416, 2856, 51668], "temperature": 0.0, "avg_logprob": -0.13690835686140163, + "compression_ratio": 1.6176470588235294, "no_speech_prob": 0.0008702923078089952}, + {"id": 63, "seek": 44992, "start": 449.92, "end": 456.8, "text": " application that + it overlaps. But a lot of it''s kind of downstream. And if you can look at those", + "tokens": [50364, 3861, 300, 309, 15986, 2382, 13, 583, 257, 688, 295, 309, 311, + 733, 295, 30621, 13, 400, 498, 291, 393, 574, 412, 729, 50708], "temperature": 0.0, + "avg_logprob": -0.17091063795418576, "compression_ratio": 1.791044776119403, "no_speech_prob": + 0.00626370171085}, {"id": 64, "seek": 44992, "start": 456.8, "end": 463.20000000000005, + "text": " two chunks separately, it becomes a lot easier to debug problems. Rather + than saying, you know,", "tokens": [50708, 732, 24004, 14759, 11, 309, 3643, 257, + 688, 3571, 281, 24083, 2740, 13, 16571, 813, 1566, 11, 291, 458, 11, 51028], "temperature": + 0.0, "avg_logprob": -0.17091063795418576, "compression_ratio": 1.791044776119403, + "no_speech_prob": 0.00626370171085}, {"id": 65, "seek": 44992, "start": 463.20000000000005, + "end": 467.52000000000004, "text": " user asks this question, I got a garbage answer. 
+ You can say the user asks this question, the", "tokens": [51028, 4195, 8962, 341, + 1168, 11, 286, 658, 257, 14150, 1867, 13, 509, 393, 584, 264, 4195, 8962, 341, 1168, + 11, 264, 51244], "temperature": 0.0, "avg_logprob": -0.17091063795418576, "compression_ratio": + 1.791044776119403, "no_speech_prob": 0.00626370171085}, {"id": 66, "seek": 44992, + "start": 467.52000000000004, "end": 473.84000000000003, "text": " large language + model interpreted it as this search, this search, return these results. And maybe", + "tokens": [51244, 2416, 2856, 2316, 26749, 309, 382, 341, 3164, 11, 341, 3164, 11, + 2736, 613, 3542, 13, 400, 1310, 51560], "temperature": 0.0, "avg_logprob": -0.17091063795418576, + "compression_ratio": 1.791044776119403, "no_speech_prob": 0.00626370171085}, {"id": + 67, "seek": 44992, "start": 473.84000000000003, "end": 477.84000000000003, "text": + " that''s the, maybe that''s where the problem was. And you can start debugging + that. And the search", "tokens": [51560, 300, 311, 264, 11, 1310, 300, 311, 689, + 264, 1154, 390, 13, 400, 291, 393, 722, 45592, 300, 13, 400, 264, 3164, 51760], + "temperature": 0.0, "avg_logprob": -0.17091063795418576, "compression_ratio": 1.791044776119403, + "no_speech_prob": 0.00626370171085}, {"id": 68, "seek": 47784, "start": 477.84, + "end": 482.32, "text": " results got interpreted this way. 
And maybe you''re not + presenting it right to the model.", "tokens": [50364, 3542, 658, 26749, 341, 636, + 13, 400, 1310, 291, 434, 406, 15578, 309, 558, 281, 264, 2316, 13, 50588], "temperature": + 0.0, "avg_logprob": -0.13551448324452275, "compression_ratio": 1.711191335740072, + "no_speech_prob": 0.01667843759059906}, {"id": 69, "seek": 47784, "start": 483.03999999999996, + "end": 487.76, "text": " So always the name of the game with it''s probably everything + we''re going to talk about today is,", "tokens": [50624, 407, 1009, 264, 1315, 295, + 264, 1216, 365, 309, 311, 1391, 1203, 321, 434, 516, 281, 751, 466, 965, 307, 11, + 50860], "temperature": 0.0, "avg_logprob": -0.13551448324452275, "compression_ratio": + 1.711191335740072, "no_speech_prob": 0.01667843759059906}, {"id": 70, "seek": 47784, + "start": 487.76, "end": 492.71999999999997, "text": " you know, figured out how + to take this giant black box and break it down into components and figure", "tokens": + [50860, 291, 458, 11, 8932, 484, 577, 281, 747, 341, 7410, 2211, 2424, 293, 1821, + 309, 760, 666, 6677, 293, 2573, 51108], "temperature": 0.0, "avg_logprob": -0.13551448324452275, + "compression_ratio": 1.711191335740072, "no_speech_prob": 0.01667843759059906}, + {"id": 71, "seek": 47784, "start": 492.71999999999997, "end": 498.71999999999997, + "text": " out what is, you know, what''s it made of and what possibly is going wrong + and put sensors there", "tokens": [51108, 484, 437, 307, 11, 291, 458, 11, 437, + 311, 309, 1027, 295, 293, 437, 6264, 307, 516, 2085, 293, 829, 14840, 456, 51408], + "temperature": 0.0, "avg_logprob": -0.13551448324452275, "compression_ratio": 1.711191335740072, + "no_speech_prob": 0.01667843759059906}, {"id": 72, "seek": 47784, "start": 498.71999999999997, + "end": 504.47999999999996, "text": " and actually debugger. Yeah, you''re absolutely + right. 
And in the, in the lecture, I actually", "tokens": [51408, 293, 767, 24083, + 1321, 13, 865, 11, 291, 434, 3122, 558, 13, 400, 294, 264, 11, 294, 264, 7991, 11, + 286, 767, 51696], "temperature": 0.0, "avg_logprob": -0.13551448324452275, "compression_ratio": + 1.711191335740072, "no_speech_prob": 0.01667843759059906}, {"id": 73, "seek": 50448, + "start": 504.56, "end": 511.12, "text": " longed code from someone, I forgot their + name, but I''ll make sure to link it. We''ve built a rag", "tokens": [50368, 938, + 292, 3089, 490, 1580, 11, 286, 5298, 641, 1315, 11, 457, 286, 603, 652, 988, 281, + 2113, 309, 13, 492, 600, 3094, 257, 17539, 50696], "temperature": 0.0, "avg_logprob": + -0.25309247877991314, "compression_ratio": 1.517509727626459, "no_speech_prob": + 0.004667668137699366}, {"id": 74, "seek": 50448, "start": 511.12, "end": 516.08, + "text": " ground up without using any framework whatsoever. You didn''t mention + Langchain, that''s one way of", "tokens": [50696, 2727, 493, 1553, 1228, 604, 8388, + 17076, 13, 509, 994, 380, 2152, 13313, 339, 491, 11, 300, 311, 472, 636, 295, 50944], + "temperature": 0.0, "avg_logprob": -0.25309247877991314, "compression_ratio": 1.517509727626459, + "no_speech_prob": 0.004667668137699366}, {"id": 75, "seek": 50448, "start": 516.08, + "end": 524.5600000000001, "text": " doing it for sure. But we just really built, + you know, naive, can and search and just use the model", "tokens": [50944, 884, + 309, 337, 988, 13, 583, 321, 445, 534, 3094, 11, 291, 458, 11, 29052, 11, 393, 293, + 3164, 293, 445, 764, 264, 2316, 51368], "temperature": 0.0, "avg_logprob": -0.25309247877991314, + "compression_ratio": 1.517509727626459, "no_speech_prob": 0.004667668137699366}, + {"id": 76, "seek": 50448, "start": 524.5600000000001, "end": 531.2, "text": " out + of the box, sentence, bird. 
And then I''ve noticed that because we did use dot product + there,", "tokens": [51368, 484, 295, 264, 2424, 11, 8174, 11, 5255, 13, 400, 550, + 286, 600, 5694, 300, 570, 321, 630, 764, 5893, 1674, 456, 11, 51700], "temperature": + 0.0, "avg_logprob": -0.25309247877991314, "compression_ratio": 1.517509727626459, + "no_speech_prob": 0.004667668137699366}, {"id": 77, "seek": 53120, "start": 531.2, + "end": 537.76, "text": " I''ve noticed that it would favor longer passages over + shorter ones, right? For example, it would", "tokens": [50364, 286, 600, 5694, 300, + 309, 576, 2294, 2854, 31589, 670, 11639, 2306, 11, 558, 30, 1171, 1365, 11, 309, + 576, 50692], "temperature": 0.0, "avg_logprob": -0.12641617624383222, "compression_ratio": + 1.6092436974789917, "no_speech_prob": 0.003477048361673951}, {"id": 78, "seek": + 53120, "start": 537.76, "end": 547.76, "text": " pull up an appendix of a AI powered + book, AI powered search book. And I was like, like, you could", "tokens": [50692, + 2235, 493, 364, 34116, 970, 295, 257, 7318, 17786, 1446, 11, 7318, 17786, 3164, + 1446, 13, 400, 286, 390, 411, 11, 411, 11, 291, 727, 51192], "temperature": 0.0, + "avg_logprob": -0.12641617624383222, "compression_ratio": 1.6092436974789917, "no_speech_prob": + 0.003477048361673951}, {"id": 79, "seek": 53120, "start": 547.76, "end": 553.6, + "text": " clearly see that it''s missing the point. It''s not able to pull up one + short sentence where the", "tokens": [51192, 4448, 536, 300, 309, 311, 5361, 264, + 935, 13, 467, 311, 406, 1075, 281, 2235, 493, 472, 2099, 8174, 689, 264, 51484], + "temperature": 0.0, "avg_logprob": -0.12641617624383222, "compression_ratio": 1.6092436974789917, + "no_speech_prob": 0.003477048361673951}, {"id": 80, "seek": 53120, "start": 553.6, + "end": 559.6800000000001, "text": " answer lies. It just pulls something else remotely + related. 
And that''s exactly what you said,", "tokens": [51484, 1867, 9134, 13, + 467, 445, 16982, 746, 1646, 20824, 4077, 13, 400, 300, 311, 2293, 437, 291, 848, + 11, 51788], "temperature": 0.0, "avg_logprob": -0.12641617624383222, "compression_ratio": + 1.6092436974789917, "no_speech_prob": 0.003477048361673951}, {"id": 81, "seek": + 55968, "start": 559.68, "end": 564.0799999999999, "text": " right? Like you need + to start debugging what''s going on there. And you need to start fixing on", "tokens": + [50364, 558, 30, 1743, 291, 643, 281, 722, 45592, 437, 311, 516, 322, 456, 13, 400, + 291, 643, 281, 722, 19442, 322, 50584], "temperature": 0.0, "avg_logprob": -0.13390055156889416, + "compression_ratio": 1.7830882352941178, "no_speech_prob": 0.025069870054721832}, + {"id": 82, "seek": 55968, "start": 564.0799999999999, "end": 570.7199999999999, + "text": " figuring out maybe change the model, maybe change the chunking. But yeah, + I agree. It felt a bit like", "tokens": [50584, 15213, 484, 1310, 1319, 264, 2316, + 11, 1310, 1319, 264, 16635, 278, 13, 583, 1338, 11, 286, 3986, 13, 467, 2762, 257, + 857, 411, 50916], "temperature": 0.0, "avg_logprob": -0.13390055156889416, "compression_ratio": + 1.7830882352941178, "no_speech_prob": 0.025069870054721832}, {"id": 83, "seek": + 55968, "start": 570.7199999999999, "end": 576.4, "text": " black box, but less so + when you implement it ground up, right? So you don''t depend on any framework.", + "tokens": [50916, 2211, 2424, 11, 457, 1570, 370, 562, 291, 4445, 309, 2727, 493, + 11, 558, 30, 407, 291, 500, 380, 5672, 322, 604, 8388, 13, 51200], "temperature": + 0.0, "avg_logprob": -0.13390055156889416, "compression_ratio": 1.7830882352941178, + "no_speech_prob": 0.025069870054721832}, {"id": 84, "seek": 55968, "start": 578.0799999999999, + "end": 582.88, "text": " And when you implement it ground up, you find out that + it''s not all of that complicated. 
And", "tokens": [51284, 400, 562, 291, 4445, + 309, 2727, 493, 11, 291, 915, 484, 300, 309, 311, 406, 439, 295, 300, 6179, 13, + 400, 51524], "temperature": 0.0, "avg_logprob": -0.13390055156889416, "compression_ratio": + 1.7830882352941178, "no_speech_prob": 0.025069870054721832}, {"id": 85, "seek": + 55968, "start": 582.88, "end": 587.04, "text": " once you''ve built every piece + of it, like, you know, I mean, you''ve already seen the black box", "tokens": [51524, + 1564, 291, 600, 3094, 633, 2522, 295, 309, 11, 411, 11, 291, 458, 11, 286, 914, + 11, 291, 600, 1217, 1612, 264, 2211, 2424, 51732], "temperature": 0.0, "avg_logprob": + -0.13390055156889416, "compression_ratio": 1.7830882352941178, "no_speech_prob": + 0.025069870054721832}, {"id": 86, "seek": 58704, "start": 587.04, "end": 591.04, + "text": " broken down to its sub pieces. It''s not a black box anymore. So yeah, + that''s", "tokens": [50364, 5463, 760, 281, 1080, 1422, 3755, 13, 467, 311, 406, + 257, 2211, 2424, 3602, 13, 407, 1338, 11, 300, 311, 50564], "temperature": 0.0, + "avg_logprob": -0.12920911095359108, "compression_ratio": 1.650735294117647, "no_speech_prob": + 0.007051675580441952}, {"id": 87, "seek": 58704, "start": 592.0799999999999, "end": + 596.8, "text": " typically since the whole industry now is sorting itself, trying + to figure out what tools are", "tokens": [50616, 5850, 1670, 264, 1379, 3518, 586, + 307, 32411, 2564, 11, 1382, 281, 2573, 484, 437, 3873, 366, 50852], "temperature": + 0.0, "avg_logprob": -0.12920911095359108, "compression_ratio": 1.650735294117647, + "no_speech_prob": 0.007051675580441952}, {"id": 88, "seek": 58704, "start": 596.8, + "end": 603.5999999999999, "text": " useful and what tools are not going to be useful, + I often advocate that people start as close to", "tokens": [50852, 4420, 293, 437, + 3873, 366, 406, 516, 281, 312, 4420, 11, 286, 2049, 14608, 300, 561, 722, 382, 1998, + 281, 51192], "temperature": 0.0, "avg_logprob": -0.12920911095359108, 
"compression_ratio": + 1.650735294117647, "no_speech_prob": 0.007051675580441952}, {"id": 89, "seek": 58704, + "start": 603.5999999999999, "end": 607.8399999999999, "text": " the metal as possible. + Because these models are actually pretty friendly, pretty fun to play with.", "tokens": + [51192, 264, 5760, 382, 1944, 13, 1436, 613, 5245, 366, 767, 1238, 9208, 11, 1238, + 1019, 281, 862, 365, 13, 51404], "temperature": 0.0, "avg_logprob": -0.12920911095359108, + "compression_ratio": 1.650735294117647, "no_speech_prob": 0.007051675580441952}, + {"id": 90, "seek": 58704, "start": 607.8399999999999, "end": 612.64, "text": " Don''t + put layers on top of it that obfuscate, you know, what''s actually happening.", + "tokens": [51404, 1468, 380, 829, 7914, 322, 1192, 295, 309, 300, 1111, 69, 32601, + 473, 11, 291, 458, 11, 437, 311, 767, 2737, 13, 51644], "temperature": 0.0, "avg_logprob": + -0.12920911095359108, "compression_ratio": 1.650735294117647, "no_speech_prob": + 0.007051675580441952}, {"id": 91, "seek": 61264, "start": 613.4399999999999, "end": + 620.56, "text": " Yeah, absolutely. I''m really itching to ask you more about now, + like your time at GitHub.", "tokens": [50404, 865, 11, 3122, 13, 286, 478, 534, + 309, 17354, 281, 1029, 291, 544, 466, 586, 11, 411, 428, 565, 412, 23331, 13, 50760], + "temperature": 0.0, "avg_logprob": -0.20421616784457503, "compression_ratio": 1.6459143968871595, + "no_speech_prob": 0.05906231328845024}, {"id": 92, "seek": 61264, "start": 620.56, + "end": 624.64, "text": " But before that, I also want to like a little bit take + your,", "tokens": [50760, 583, 949, 300, 11, 286, 611, 528, 281, 411, 257, 707, + 857, 747, 428, 11, 50964], "temperature": 0.0, "avg_logprob": -0.20421616784457503, + "compression_ratio": 1.6459143968871595, "no_speech_prob": 0.05906231328845024}, + {"id": 93, "seek": 61264, "start": 626.16, "end": 630.56, "text": " you know, take + a look at your approach, how you view your career, right? 
So you look,", "tokens": + [51040, 291, 458, 11, 747, 257, 574, 412, 428, 3109, 11, 577, 291, 1910, 428, 3988, + 11, 558, 30, 407, 291, 574, 11, 51260], "temperature": 0.0, "avg_logprob": -0.20421616784457503, + "compression_ratio": 1.6459143968871595, "no_speech_prob": 0.05906231328845024}, + {"id": 94, "seek": 61264, "start": 630.56, "end": 636.08, "text": " you worked on + search, but then you ended up in the hottest place in the way, applying", "tokens": + [51260, 291, 2732, 322, 3164, 11, 457, 550, 291, 4590, 493, 294, 264, 32780, 1081, + 294, 264, 636, 11, 9275, 51536], "temperature": 0.0, "avg_logprob": -0.20421616784457503, + "compression_ratio": 1.6459143968871595, "no_speech_prob": 0.05906231328845024}, + {"id": 95, "seek": 61264, "start": 636.08, "end": 641.2, "text": " all the lamps, + right? And you needed to convert in a way to an ML engineer. Do you view it that + way?", "tokens": [51536, 439, 264, 34887, 11, 558, 30, 400, 291, 2978, 281, 7620, + 294, 257, 636, 281, 364, 21601, 11403, 13, 1144, 291, 1910, 309, 300, 636, 30, 51792], + "temperature": 0.0, "avg_logprob": -0.20421616784457503, "compression_ratio": 1.6459143968871595, + "no_speech_prob": 0.05906231328845024}, {"id": 96, "seek": 64120, "start": 641.2, + "end": 647.84, "text": " And also if you do, how did you prepare yourself to become + a machine learning researcher,", "tokens": [50364, 400, 611, 498, 291, 360, 11, + 577, 630, 291, 5940, 1803, 281, 1813, 257, 3479, 2539, 21751, 11, 50696], "temperature": + 0.0, "avg_logprob": -0.16394857830471463, "compression_ratio": 1.5546218487394958, + "no_speech_prob": 0.009676489047706127}, {"id": 97, "seek": 64120, "start": 647.84, + "end": 653.6, "text": " actually not even an engineer, right? You are focusing on + research aspects of things. 
So you", "tokens": [50696, 767, 406, 754, 364, 11403, + 11, 558, 30, 509, 366, 8416, 322, 2132, 7270, 295, 721, 13, 407, 291, 50984], "temperature": + 0.0, "avg_logprob": -0.16394857830471463, "compression_ratio": 1.5546218487394958, + "no_speech_prob": 0.009676489047706127}, {"id": 98, "seek": 64120, "start": 653.6, + "end": 660.96, "text": " needed to move the needle in the research space. I don''t + know if I have a good answer for you.", "tokens": [50984, 2978, 281, 1286, 264, + 11037, 294, 264, 2132, 1901, 13, 286, 500, 380, 458, 498, 286, 362, 257, 665, 1867, + 337, 291, 13, 51352], "temperature": 0.0, "avg_logprob": -0.16394857830471463, "compression_ratio": + 1.5546218487394958, "no_speech_prob": 0.009676489047706127}, {"id": 99, "seek": + 64120, "start": 660.96, "end": 665.76, "text": " Like if anyone thinks my career + has been successful, which in many ways I''ve done all right,", "tokens": [51352, + 1743, 498, 2878, 7309, 452, 3988, 575, 668, 4406, 11, 597, 294, 867, 2098, 286, + 600, 1096, 439, 558, 11, 51592], "temperature": 0.0, "avg_logprob": -0.16394857830471463, + "compression_ratio": 1.5546218487394958, "no_speech_prob": 0.009676489047706127}, + {"id": 100, "seek": 66576, "start": 666.72, "end": 672.0, "text": " it''s been luckily + like tripping and falling uphill. Every time I fall down, it''s like in the uphill", + "tokens": [50412, 309, 311, 668, 22880, 411, 1376, 3759, 293, 7440, 39132, 13, 2048, + 565, 286, 2100, 760, 11, 309, 311, 411, 294, 264, 39132, 50676], "temperature": + 0.0, "avg_logprob": -0.18522885867527553, "compression_ratio": 1.5101010101010102, + "no_speech_prob": 0.002982220146805048}, {"id": 101, "seek": 66576, "start": 672.0, + "end": 679.92, "text": " direction. And I don''t, I''m the hand of Providence. 
And + so what do I do with any of these crazy jumps", "tokens": [50676, 3513, 13, 400, + 286, 500, 380, 11, 286, 478, 264, 1011, 295, 15685, 2778, 13, 400, 370, 437, 360, + 286, 360, 365, 604, 295, 613, 3219, 16704, 51072], "temperature": 0.0, "avg_logprob": + -0.18522885867527553, "compression_ratio": 1.5101010101010102, "no_speech_prob": + 0.002982220146805048}, {"id": 102, "seek": 66576, "start": 679.92, "end": 688.88, + "text": " that I make to prepare? Pretty much, I just take the jump. I think I''m + going to say how I''m going", "tokens": [51072, 300, 286, 652, 281, 5940, 30, 10693, + 709, 11, 286, 445, 747, 264, 3012, 13, 286, 519, 286, 478, 516, 281, 584, 577, 286, + 478, 516, 51520], "temperature": 0.0, "avg_logprob": -0.18522885867527553, "compression_ratio": + 1.5101010101010102, "no_speech_prob": 0.002982220146805048}, {"id": 103, "seek": + 68888, "start": 689.28, "end": 696.4, "text": " to prepare for the next jump. I + take, I see the jump. And then I jump into it and like almost", "tokens": [50384, + 281, 5940, 337, 264, 958, 3012, 13, 286, 747, 11, 286, 536, 264, 3012, 13, 400, + 550, 286, 3012, 666, 309, 293, 411, 1920, 50740], "temperature": 0.0, "avg_logprob": + -0.2002301279703776, "compression_ratio": 1.5555555555555556, "no_speech_prob": + 0.000904655666090548}, {"id": 104, "seek": 68888, "start": 696.4, "end": 706.8, + "text": " drown every single time by surviving. 
So in this particular case, yeah, + the move towards AI", "tokens": [50740, 20337, 633, 2167, 565, 538, 24948, 13, 407, + 294, 341, 1729, 1389, 11, 1338, 11, 264, 1286, 3030, 7318, 51260], "temperature": + 0.0, "avg_logprob": -0.2002301279703776, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.000904655666090548}, {"id": 105, "seek": 68888, "start": 707.76, + "end": 715.12, "text": " researcher, I mean, there''s a lot in that, there''s a + lot of weight in that phrase that maybe I", "tokens": [51308, 21751, 11, 286, 914, + 11, 456, 311, 257, 688, 294, 300, 11, 456, 311, 257, 688, 295, 3364, 294, 300, 9535, + 300, 1310, 286, 51676], "temperature": 0.0, "avg_logprob": -0.2002301279703776, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.000904655666090548}, + {"id": 106, "seek": 71512, "start": 715.12, "end": 721.2, "text": " don''t necessarily + feel in my own career. By beginning search for so long and by wanting to do", "tokens": + [50364, 500, 380, 4725, 841, 294, 452, 1065, 3988, 13, 3146, 2863, 3164, 337, 370, + 938, 293, 538, 7935, 281, 360, 50668], "temperature": 0.0, "avg_logprob": -0.13878718289462003, + "compression_ratio": 1.7205882352941178, "no_speech_prob": 0.0011525267036631703}, + {"id": 107, "seek": 71512, "start": 721.2, "end": 727.36, "text": " data science + for so long, I made myself, you know, over time, pretty aware of how things were,", + "tokens": [50668, 1412, 3497, 337, 370, 938, 11, 286, 1027, 2059, 11, 291, 458, + 11, 670, 565, 11, 1238, 3650, 295, 577, 721, 645, 11, 50976], "temperature": 0.0, + "avg_logprob": -0.13878718289462003, "compression_ratio": 1.7205882352941178, "no_speech_prob": + 0.0011525267036631703}, {"id": 108, "seek": 71512, "start": 727.36, "end": 732.48, + "text": " you know, just the typical approach to the model. 
So I was never caught + any of this in school,", "tokens": [50976, 291, 458, 11, 445, 264, 7476, 3109, 281, + 264, 2316, 13, 407, 286, 390, 1128, 5415, 604, 295, 341, 294, 1395, 11, 51232], + "temperature": 0.0, "avg_logprob": -0.13878718289462003, "compression_ratio": 1.7205882352941178, + "no_speech_prob": 0.0011525267036631703}, {"id": 109, "seek": 71512, "start": 732.48, + "end": 736.88, "text": " but you know, you read, you read the right books and you + know, go through the right examples.", "tokens": [51232, 457, 291, 458, 11, 291, + 1401, 11, 291, 1401, 264, 558, 3642, 293, 291, 458, 11, 352, 807, 264, 558, 5110, + 13, 51452], "temperature": 0.0, "avg_logprob": -0.13878718289462003, "compression_ratio": + 1.7205882352941178, "no_speech_prob": 0.0011525267036631703}, {"id": 110, "seek": + 71512, "start": 738.0, "end": 743.6, "text": " Yeah, so I have gained, I wouldn''t + say just an absolute comfort with any of this even now.", "tokens": [51508, 865, + 11, 370, 286, 362, 12634, 11, 286, 2759, 380, 584, 445, 364, 8236, 3400, 365, 604, + 295, 341, 754, 586, 13, 51788], "temperature": 0.0, "avg_logprob": -0.13878718289462003, + "compression_ratio": 1.7205882352941178, "no_speech_prob": 0.0011525267036631703}, + {"id": 111, "seek": 74360, "start": 744.24, "end": 749.36, "text": " But you know, + familiarity of being around it for periods of this point. 
And then when I jumped", + "tokens": [50396, 583, 291, 458, 11, 49828, 295, 885, 926, 309, 337, 13804, 295, + 341, 935, 13, 400, 550, 562, 286, 13864, 50652], "temperature": 0.0, "avg_logprob": + -0.1570124894045712, "compression_ratio": 1.548780487804878, "no_speech_prob": 0.0022936412133276463}, + {"id": 112, "seek": 74360, "start": 749.36, "end": 756.96, "text": " into the large + language modeling stuff, it''s actually kind of interesting because it''s a different + type", "tokens": [50652, 666, 264, 2416, 2856, 15983, 1507, 11, 309, 311, 767, 733, + 295, 1880, 570, 309, 311, 257, 819, 2010, 51032], "temperature": 0.0, "avg_logprob": + -0.1570124894045712, "compression_ratio": 1.548780487804878, "no_speech_prob": 0.0022936412133276463}, + {"id": 113, "seek": 74360, "start": 756.96, "end": 763.2, "text": " of AI expert + than we''ve had before and maybe an easier entrance for a lot of people.", "tokens": + [51032, 295, 7318, 5844, 813, 321, 600, 632, 949, 293, 1310, 364, 3571, 12014, 337, + 257, 688, 295, 561, 13, 51344], "temperature": 0.0, "avg_logprob": -0.1570124894045712, + "compression_ratio": 1.548780487804878, "no_speech_prob": 0.0022936412133276463}, + {"id": 114, "seek": 74360, "start": 763.84, "end": 769.9200000000001, "text": " + Much of my career, I have been an engineer and really I still, I predominantly think + of myself as", "tokens": [51376, 12313, 295, 452, 3988, 11, 286, 362, 668, 364, + 11403, 293, 534, 286, 920, 11, 286, 29893, 519, 295, 2059, 382, 51680], "temperature": + 0.0, "avg_logprob": -0.1570124894045712, "compression_ratio": 1.548780487804878, + "no_speech_prob": 0.0022936412133276463}, {"id": 115, "seek": 76992, "start": 769.92, + "end": 776.56, "text": " an engineering mindset. 
And so when you come into, you + know, large language models, it''s actually", "tokens": [50364, 364, 7043, 12543, + 13, 400, 370, 562, 291, 808, 666, 11, 291, 458, 11, 2416, 2856, 5245, 11, 309, 311, + 767, 50696], "temperature": 0.0, "avg_logprob": -0.15691834307731467, "compression_ratio": + 1.6440677966101696, "no_speech_prob": 0.0017042603576555848}, {"id": 116, "seek": + 76992, "start": 776.56, "end": 785.12, "text": " really approachable. You don''t + have to immediately know everything about, you know, what choice of", "tokens": + [50696, 534, 3109, 712, 13, 509, 500, 380, 362, 281, 4258, 458, 1203, 466, 11, 291, + 458, 11, 437, 3922, 295, 51124], "temperature": 0.0, "avg_logprob": -0.15691834307731467, + "compression_ratio": 1.6440677966101696, "no_speech_prob": 0.0017042603576555848}, + {"id": 117, "seek": 76992, "start": 785.12, "end": 790.0799999999999, "text": " + models to use and like, you know, how to train and have the whole outside and evaluate. + And", "tokens": [51124, 5245, 281, 764, 293, 411, 11, 291, 458, 11, 577, 281, 3847, + 293, 362, 264, 1379, 2380, 293, 13059, 13, 400, 51372], "temperature": 0.0, "avg_logprob": + -0.15691834307731467, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0017042603576555848}, {"id": 118, "seek": 76992, "start": 791.92, "end": 798.7199999999999, + "text": " you can just go to work and at first, at least, just experiment and I + really encourage people to do", "tokens": [51464, 291, 393, 445, 352, 281, 589, + 293, 412, 700, 11, 412, 1935, 11, 445, 5120, 293, 286, 534, 5373, 561, 281, 360, + 51804], "temperature": 0.0, "avg_logprob": -0.15691834307731467, "compression_ratio": + 1.6440677966101696, "no_speech_prob": 0.0017042603576555848}, {"id": 119, "seek": + 79872, "start": 798.72, "end": 802.32, "text": " this when they''re building on, + they''re on an application. 
Rather than, you know, thinking about all", "tokens": + [50364, 341, 562, 436, 434, 2390, 322, 11, 436, 434, 322, 364, 3861, 13, 16571, + 813, 11, 291, 458, 11, 1953, 466, 439, 50544], "temperature": 0.0, "avg_logprob": + -0.1820230975593488, "compression_ratio": 1.606694560669456, "no_speech_prob": 0.0031977645121514797}, + {"id": 120, "seek": 79872, "start": 802.32, "end": 807.12, "text": " the evaluations + and stuff at front, you''ll need, don''t worry, you''ll get to them. But just get + your", "tokens": [50544, 264, 43085, 293, 1507, 412, 1868, 11, 291, 603, 643, 11, + 500, 380, 3292, 11, 291, 603, 483, 281, 552, 13, 583, 445, 483, 428, 50784], "temperature": + 0.0, "avg_logprob": -0.1820230975593488, "compression_ratio": 1.606694560669456, + "no_speech_prob": 0.0031977645121514797}, {"id": 121, "seek": 79872, "start": 807.12, + "end": 814.96, "text": " hands, hands dirty, start using the, the APIs and build + up some intuition and a weird way", "tokens": [50784, 2377, 11, 2377, 9360, 11, + 722, 1228, 264, 11, 264, 21445, 293, 1322, 493, 512, 24002, 293, 257, 3657, 636, + 51176], "temperature": 0.0, "avg_logprob": -0.1820230975593488, "compression_ratio": + 1.606694560669456, "no_speech_prob": 0.0031977645121514797}, {"id": 122, "seek": + 79872, "start": 815.52, "end": 823.44, "text": " empathy for these large language + models. Yeah, yeah, this is brilliantly said. I just recently", "tokens": [51204, + 18701, 337, 613, 2416, 2856, 5245, 13, 865, 11, 1338, 11, 341, 307, 8695, 42580, + 848, 13, 286, 445, 3938, 51600], "temperature": 0.0, "avg_logprob": -0.1820230975593488, + "compression_ratio": 1.606694560669456, "no_speech_prob": 0.0031977645121514797}, + {"id": 123, "seek": 82344, "start": 824.32, "end": 833.6, "text": " listened to + the episode of Lex Friedman with the ontropic team. 
So the CEO and some of the", + "tokens": [50408, 13207, 281, 264, 3500, 295, 24086, 17605, 1601, 365, 264, 6592, + 39173, 1469, 13, 407, 264, 9282, 293, 512, 295, 264, 50872], "temperature": 0.0, + "avg_logprob": -0.19162458769032653, "compression_ratio": 1.598901098901099, "no_speech_prob": + 0.03295965492725372}, {"id": 124, "seek": 82344, "start": 833.6, "end": 841.6, "text": + " researchers there. And one of them said, yeah, exactly. And one of them said that + you, along the", "tokens": [50872, 10309, 456, 13, 400, 472, 295, 552, 848, 11, + 1338, 11, 2293, 13, 400, 472, 295, 552, 848, 300, 291, 11, 2051, 264, 51272], "temperature": + 0.0, "avg_logprob": -0.19162458769032653, "compression_ratio": 1.598901098901099, + "no_speech_prob": 0.03295965492725372}, {"id": 125, "seek": 82344, "start": 841.6, + "end": 846.4000000000001, "text": " lines of what you just said about empathy towards + the model that when you know where model succeeds and", "tokens": [51272, 3876, + 295, 437, 291, 445, 848, 466, 18701, 3030, 264, 2316, 300, 562, 291, 458, 689, 2316, + 49263, 293, 51512], "temperature": 0.0, "avg_logprob": -0.19162458769032653, "compression_ratio": + 1.598901098901099, "no_speech_prob": 0.03295965492725372}, {"id": 126, "seek": 84640, + "start": 846.4, "end": 852.16, "text": " where it kind of fails, you learn how to + prompt it. Right. 
You know, like which risks you will", "tokens": [50364, 689, 309, + 733, 295, 18199, 11, 291, 1466, 577, 281, 12391, 309, 13, 1779, 13, 509, 458, 11, + 411, 597, 10888, 291, 486, 50652], "temperature": 0.0, "avg_logprob": -0.20678904182032534, + "compression_ratio": 1.4742268041237114, "no_speech_prob": 0.01795823872089386}, + {"id": 127, "seek": 84640, "start": 852.8, "end": 858.72, "text": " encounter and + you should be okay with those, but you don''t tilt towards more risky areas,", "tokens": + [50684, 8593, 293, 291, 820, 312, 1392, 365, 729, 11, 457, 291, 500, 380, 18446, + 3030, 544, 21137, 3179, 11, 50980], "temperature": 0.0, "avg_logprob": -0.20678904182032534, + "compression_ratio": 1.4742268041237114, "no_speech_prob": 0.01795823872089386}, + {"id": 128, "seek": 84640, "start": 859.6, "end": 866.3199999999999, "text": " in + the west to succeed in some specific thing. So I don''t know, I like that. But what + is your take on", "tokens": [51024, 294, 264, 7009, 281, 7754, 294, 512, 2685, 551, + 13, 407, 286, 500, 380, 458, 11, 286, 411, 300, 13, 583, 437, 307, 428, 747, 322, + 51360], "temperature": 0.0, "avg_logprob": -0.20678904182032534, "compression_ratio": + 1.4742268041237114, "no_speech_prob": 0.01795823872089386}, {"id": 129, "seek": + 86632, "start": 867.0400000000001, "end": 877.6, "text": " LLM unpredictability + compared to more, if you will, you know, traditional programming per se. 
Right.", + "tokens": [50400, 441, 43, 44, 28341, 2310, 5347, 281, 544, 11, 498, 291, 486, 11, + 291, 458, 11, 5164, 9410, 680, 369, 13, 1779, 13, 50928], "temperature": 0.0, "avg_logprob": + -0.1997564371349742, "compression_ratio": 1.5925925925925926, "no_speech_prob": + 0.018304863944649696}, {"id": 130, "seek": 86632, "start": 877.6, "end": 883.2800000000001, + "text": " So for example, when you, when we used to, when you used to write code, + and I don''t know, C++", "tokens": [50928, 407, 337, 1365, 11, 562, 291, 11, 562, + 321, 1143, 281, 11, 562, 291, 1143, 281, 2464, 3089, 11, 293, 286, 500, 380, 458, + 11, 383, 25472, 51212], "temperature": 0.0, "avg_logprob": -0.1997564371349742, + "compression_ratio": 1.5925925925925926, "no_speech_prob": 0.018304863944649696}, + {"id": 131, "seek": 86632, "start": 883.2800000000001, "end": 888.72, "text": " + Java, what have you? It was very deterministic in many ways. Maybe there have been + some things", "tokens": [51212, 10745, 11, 437, 362, 291, 30, 467, 390, 588, 15957, + 3142, 294, 867, 2098, 13, 2704, 456, 362, 668, 512, 721, 51484], "temperature": + 0.0, "avg_logprob": -0.1997564371349742, "compression_ratio": 1.5925925925925926, + "no_speech_prob": 0.018304863944649696}, {"id": 132, "seek": 86632, "start": 888.72, + "end": 895.0400000000001, "text": " non deterministic like runtime and so on, but + still you felt like you, you are in control of many", "tokens": [51484, 2107, 15957, + 3142, 411, 34474, 293, 370, 322, 11, 457, 920, 291, 2762, 411, 291, 11, 291, 366, + 294, 1969, 295, 867, 51800], "temperature": 0.0, "avg_logprob": -0.1997564371349742, + "compression_ratio": 1.5925925925925926, "no_speech_prob": 0.018304863944649696}, + {"id": 133, "seek": 89504, "start": 895.04, "end": 902.7199999999999, "text": " + things, right. With LLM, it''s different. 
For example, when you ask an LLM to summarize + a document for", "tokens": [50364, 721, 11, 558, 13, 2022, 441, 43, 44, 11, 309, + 311, 819, 13, 1171, 1365, 11, 562, 291, 1029, 364, 441, 43, 44, 281, 20858, 257, + 4166, 337, 50748], "temperature": 0.0, "avg_logprob": -0.10019888833304431, "compression_ratio": + 1.6890756302521008, "no_speech_prob": 0.006261066067963839}, {"id": 134, "seek": + 89504, "start": 902.7199999999999, "end": 909.12, "text": " you, and then you ask + second time, the answer will be different. It will be, you know, in subtle ways,", + "tokens": [50748, 291, 11, 293, 550, 291, 1029, 1150, 565, 11, 264, 1867, 486, 312, + 819, 13, 467, 486, 312, 11, 291, 458, 11, 294, 13743, 2098, 11, 51068], "temperature": + 0.0, "avg_logprob": -0.10019888833304431, "compression_ratio": 1.6890756302521008, + "no_speech_prob": 0.006261066067963839}, {"id": 135, "seek": 89504, "start": 909.12, + "end": 916.48, "text": " it will be different. And so that also creates, in my mind, + some issues around, okay, if I have", "tokens": [51068, 309, 486, 312, 819, 13, + 400, 370, 300, 611, 7829, 11, 294, 452, 1575, 11, 512, 2663, 926, 11, 1392, 11, + 498, 286, 362, 51436], "temperature": 0.0, "avg_logprob": -0.10019888833304431, + "compression_ratio": 1.6890756302521008, "no_speech_prob": 0.006261066067963839}, + {"id": 136, "seek": 89504, "start": 916.48, "end": 920.8, "text": " several users + accessing the same document, should they compute the summary on the fly, or should + they", "tokens": [51436, 2940, 5022, 26440, 264, 912, 4166, 11, 820, 436, 14722, + 264, 12691, 322, 264, 3603, 11, 420, 820, 436, 51652], "temperature": 0.0, "avg_logprob": + -0.10019888833304431, "compression_ratio": 1.6890756302521008, "no_speech_prob": + 0.006261066067963839}, {"id": 137, "seek": 92080, "start": 920.88, "end": 926.7199999999999, + "text": " compute it once and store it and then show the same copy to all of them, + right. 
But that also means", "tokens": [50368, 14722, 309, 1564, 293, 3531, 309, + 293, 550, 855, 264, 912, 5055, 281, 439, 295, 552, 11, 558, 13, 583, 300, 611, 1355, + 50660], "temperature": 0.0, "avg_logprob": -0.15744239027782153, "compression_ratio": + 1.628691983122363, "no_speech_prob": 0.006235597655177116}, {"id": 138, "seek": + 92080, "start": 926.7199999999999, "end": 931.52, "text": " that if the original + summary was not good enough for some reason and subsequent versions were better,", + "tokens": [50660, 300, 498, 264, 3380, 12691, 390, 406, 665, 1547, 337, 512, 1778, + 293, 19962, 9606, 645, 1101, 11, 50900], "temperature": 0.0, "avg_logprob": -0.15744239027782153, + "compression_ratio": 1.628691983122363, "no_speech_prob": 0.006235597655177116}, + {"id": 139, "seek": 92080, "start": 931.52, "end": 935.8399999999999, "text": " + I will never show those better versions. Right. So like, you start asking all these + like", "tokens": [50900, 286, 486, 1128, 855, 729, 1101, 9606, 13, 1779, 13, 407, + 411, 11, 291, 722, 3365, 439, 613, 411, 51116], "temperature": 0.0, "avg_logprob": + -0.15744239027782153, "compression_ratio": 1.628691983122363, "no_speech_prob": + 0.006235597655177116}, {"id": 140, "seek": 92080, "start": 935.8399999999999, "end": + 943.28, "text": " multitude of questions, or am I asking the wrong questions? It''s + such a challenge. And I don''t,", "tokens": [51116, 36358, 295, 1651, 11, 420, 669, + 286, 3365, 264, 2085, 1651, 30, 467, 311, 1270, 257, 3430, 13, 400, 286, 500, 380, + 11, 51488], "temperature": 0.0, "avg_logprob": -0.15744239027782153, "compression_ratio": + 1.628691983122363, "no_speech_prob": 0.006235597655177116}, {"id": 141, "seek": + 94328, "start": 943.8399999999999, "end": 953.52, "text": " yeah, it''s it''s a + period. Right. 
Like if you''re used to doing something with Python, it''s going", + "tokens": [50392, 1338, 11, 309, 311, 309, 311, 257, 2896, 13, 1779, 13, 1743, 498, + 291, 434, 1143, 281, 884, 746, 365, 15329, 11, 309, 311, 516, 50876], "temperature": + 0.0, "avg_logprob": -0.2122426466508345, "compression_ratio": 1.5755102040816327, + "no_speech_prob": 0.010112723335623741}, {"id": 142, "seek": 94328, "start": 953.52, + "end": 959.12, "text": " to be the exact same answer every single time. With these + models, it''s just like, you know,", "tokens": [50876, 281, 312, 264, 1900, 912, + 1867, 633, 2167, 565, 13, 2022, 613, 5245, 11, 309, 311, 445, 411, 11, 291, 458, + 11, 51156], "temperature": 0.0, "avg_logprob": -0.2122426466508345, "compression_ratio": + 1.5755102040816327, "no_speech_prob": 0.010112723335623741}, {"id": 143, "seek": + 94328, "start": 960.4, "end": 965.36, "text": " a very finicky person that keeps + changing their opinion. And you ask them the same question twice,", "tokens": [51220, + 257, 588, 962, 20539, 954, 300, 5965, 4473, 641, 4800, 13, 400, 291, 1029, 552, + 264, 912, 1168, 6091, 11, 51468], "temperature": 0.0, "avg_logprob": -0.2122426466508345, + "compression_ratio": 1.5755102040816327, "no_speech_prob": 0.010112723335623741}, + {"id": 144, "seek": 94328, "start": 965.36, "end": 969.92, "text": " and they''ve + forgotten what they just said. Because it''s a new session, so they literally don''t + have", "tokens": [51468, 293, 436, 600, 11832, 437, 436, 445, 848, 13, 1436, 309, + 311, 257, 777, 5481, 11, 370, 436, 3736, 500, 380, 362, 51696], "temperature": 0.0, + "avg_logprob": -0.2122426466508345, "compression_ratio": 1.5755102040816327, "no_speech_prob": + 0.010112723335623741}, {"id": 145, "seek": 96992, "start": 969.92, "end": 977.1999999999999, + "text": " them. You''re fully just that they start over. 
I think we''re going to + see a shift in this is not", "tokens": [50364, 552, 13, 509, 434, 4498, 445, 300, + 436, 722, 670, 13, 286, 519, 321, 434, 516, 281, 536, 257, 5513, 294, 341, 307, + 406, 50728], "temperature": 0.0, "avg_logprob": -0.21017076454910577, "compression_ratio": + 1.7636363636363637, "no_speech_prob": 0.0027035088278353214}, {"id": 146, "seek": + 96992, "start": 977.1999999999999, "end": 985.5999999999999, "text": " going to + change anytime soon. Just it''s almost as if you kind of plug a fake human into + the circuit.", "tokens": [50728, 516, 281, 1319, 13038, 2321, 13, 1449, 309, 311, + 1920, 382, 498, 291, 733, 295, 5452, 257, 7592, 1952, 666, 264, 9048, 13, 51148], + "temperature": 0.0, "avg_logprob": -0.21017076454910577, "compression_ratio": 1.7636363636363637, + "no_speech_prob": 0.0027035088278353214}, {"id": 147, "seek": 96992, "start": 985.5999999999999, + "end": 990.24, "text": " It''s like it''s going to be independent. That''s the nature + of it. And that nature is not going to", "tokens": [51148, 467, 311, 411, 309, 311, + 516, 281, 312, 6695, 13, 663, 311, 264, 3687, 295, 309, 13, 400, 300, 3687, 307, + 406, 516, 281, 51380], "temperature": 0.0, "avg_logprob": -0.21017076454910577, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.0027035088278353214}, + {"id": 148, "seek": 96992, "start": 990.24, "end": 999.68, "text": " change anytime + soon. So I think what you''re going to see is a modification in the way we build", + "tokens": [51380, 1319, 13038, 2321, 13, 407, 286, 519, 437, 291, 434, 516, 281, + 536, 307, 257, 26747, 294, 264, 636, 321, 1322, 51852], "temperature": 0.0, "avg_logprob": + -0.21017076454910577, "compression_ratio": 1.7636363636363637, "no_speech_prob": + 0.0027035088278353214}, {"id": 149, "seek": 99968, "start": 999.68, "end": 1005.68, + "text": " code around these things. 
I think the pain point is when you assume that + it''s going to be as", "tokens": [50364, 3089, 926, 613, 721, 13, 286, 519, 264, + 1822, 935, 307, 562, 291, 6552, 300, 309, 311, 516, 281, 312, 382, 50664], "temperature": + 0.0, "avg_logprob": -0.09679614067077637, "compression_ratio": 1.5805084745762712, + "no_speech_prob": 0.0008801878429949284}, {"id": 150, "seek": 99968, "start": 1005.68, + "end": 1010.4799999999999, "text": " predictable as a code that you''re used to. + But once you get over it, you realize that, okay, well,", "tokens": [50664, 27737, + 382, 257, 3089, 300, 291, 434, 1143, 281, 13, 583, 1564, 291, 483, 670, 309, 11, + 291, 4325, 300, 11, 1392, 11, 731, 11, 50904], "temperature": 0.0, "avg_logprob": + -0.09679614067077637, "compression_ratio": 1.5805084745762712, "no_speech_prob": + 0.0008801878429949284}, {"id": 151, "seek": 99968, "start": 1010.4799999999999, + "end": 1014.7199999999999, "text": " if I just literally had a human in the loop, + there''s like an API to connect to a human,", "tokens": [50904, 498, 286, 445, 3736, + 632, 257, 1952, 294, 264, 6367, 11, 456, 311, 411, 364, 9362, 281, 1745, 281, 257, + 1952, 11, 51116], "temperature": 0.0, "avg_logprob": -0.09679614067077637, "compression_ratio": + 1.5805084745762712, "no_speech_prob": 0.0008801878429949284}, {"id": 152, "seek": + 99968, "start": 1015.4399999999999, "end": 1023.3599999999999, "text": " then I + have to be build a user experience that is somehow tolerant to that. And so let''s + see.", "tokens": [51152, 550, 286, 362, 281, 312, 1322, 257, 4195, 1752, 300, 307, + 6063, 45525, 281, 300, 13, 400, 370, 718, 311, 536, 13, 51548], "temperature": 0.0, + "avg_logprob": -0.09679614067077637, "compression_ratio": 1.5805084745762712, "no_speech_prob": + 0.0008801878429949284}, {"id": 153, "seek": 102336, "start": 1024.32, "end": 1032.24, + "text": " A lot of times people are hoping the first phrase into interacting with + these things. 
They say,", "tokens": [50412, 316, 688, 295, 1413, 561, 366, 7159, + 264, 700, 9535, 666, 18017, 365, 613, 721, 13, 814, 584, 11, 50808], "temperature": + 0.0, "avg_logprob": -0.1808109680811564, "compression_ratio": 1.4894736842105263, + "no_speech_prob": 0.0036443881690502167}, {"id": 154, "seek": 102336, "start": 1033.6, + "end": 1039.28, "text": " here''s a specification, build this code, and they expect + the answer to just forward. Now,", "tokens": [50876, 510, 311, 257, 31256, 11, 1322, + 341, 3089, 11, 293, 436, 2066, 264, 1867, 281, 445, 2128, 13, 823, 11, 51160], "temperature": + 0.0, "avg_logprob": -0.1808109680811564, "compression_ratio": 1.4894736842105263, + "no_speech_prob": 0.0036443881690502167}, {"id": 155, "seek": 102336, "start": 1039.28, + "end": 1048.08, "text": " that can fail in one of two big ways. One way is that + it''s just too complex. The model you can do", "tokens": [51160, 300, 393, 3061, + 294, 472, 295, 732, 955, 2098, 13, 1485, 636, 307, 300, 309, 311, 445, 886, 3997, + 13, 440, 2316, 291, 393, 360, 51600], "temperature": 0.0, "avg_logprob": -0.1808109680811564, + "compression_ratio": 1.4894736842105263, "no_speech_prob": 0.0036443881690502167}, + {"id": 156, "seek": 104808, "start": 1048.1599999999999, "end": 1053.28, "text": + " chain of thought reasoning, 01 has it built in and it''s magic. And it''s going + to get better. 
But with", "tokens": [50368, 5021, 295, 1194, 21577, 11, 23185, 575, + 309, 3094, 294, 293, 309, 311, 5585, 13, 400, 309, 311, 516, 281, 483, 1101, 13, + 583, 365, 50624], "temperature": 0.0, "avg_logprob": -0.1606650451819102, "compression_ratio": + 1.6866952789699572, "no_speech_prob": 0.0043530636467039585}, {"id": 157, "seek": + 104808, "start": 1053.28, "end": 1061.9199999999998, "text": " any sufficiently + large request, complex request, since you''re just appending one token at a time,", + "tokens": [50624, 604, 31868, 2416, 5308, 11, 3997, 5308, 11, 1670, 291, 434, 445, + 724, 2029, 472, 14862, 412, 257, 565, 11, 51056], "temperature": 0.0, "avg_logprob": + -0.1606650451819102, "compression_ratio": 1.6866952789699572, "no_speech_prob": + 0.0043530636467039585}, {"id": 158, "seek": 104808, "start": 1062.3999999999999, + "end": 1067.76, "text": " it''s just too easy to paint yourself into a corner. So + models will get better and they''ll be less", "tokens": [51080, 309, 311, 445, 886, + 1858, 281, 4225, 1803, 666, 257, 4538, 13, 407, 5245, 486, 483, 1101, 293, 436, + 603, 312, 1570, 51348], "temperature": 0.0, "avg_logprob": -0.1606650451819102, + "compression_ratio": 1.6866952789699572, "no_speech_prob": 0.0043530636467039585}, + {"id": 159, "seek": 104808, "start": 1067.76, "end": 1072.3999999999999, "text": + " and less likely to paint themselves into a corner. But it''ll always be the case + with sufficient", "tokens": [51348, 293, 1570, 3700, 281, 4225, 2969, 666, 257, + 4538, 13, 583, 309, 603, 1009, 312, 264, 1389, 365, 11563, 51580], "temperature": + 0.0, "avg_logprob": -0.1606650451819102, "compression_ratio": 1.6866952789699572, + "no_speech_prob": 0.0043530636467039585}, {"id": 160, "seek": 107240, "start": 1072.4, + "end": 1080.0, "text": " complexity. 
The other issue that you run into and why we''ll + never ever get there is because", "tokens": [50364, 14024, 13, 440, 661, 2734, 300, + 291, 1190, 666, 293, 983, 321, 603, 1128, 1562, 483, 456, 307, 570, 50744], "temperature": + 0.0, "avg_logprob": -0.14833736419677734, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.012517302297055721}, {"id": 161, "seek": 107240, "start": 1082.0800000000002, + "end": 1090.96, "text": " when I describe something, the domain of possible implementations, + possible completions that", "tokens": [50848, 562, 286, 6786, 746, 11, 264, 9274, + 295, 1944, 4445, 763, 11, 1944, 1557, 626, 300, 51292], "temperature": 0.0, "avg_logprob": + -0.14833736419677734, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.012517302297055721}, {"id": 162, "seek": 107240, "start": 1090.96, "end": 1096.0, + "text": " match that input is so much larger than whatever I have in my head right + now. And so if you have", "tokens": [51292, 2995, 300, 4846, 307, 370, 709, 4833, + 813, 2035, 286, 362, 294, 452, 1378, 558, 586, 13, 400, 370, 498, 291, 362, 51544], + "temperature": 0.0, "avg_logprob": -0.14833736419677734, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.012517302297055721}, {"id": 163, "seek": 107240, "start": 1096.0, + "end": 1101.3600000000001, "text": " a company that''s like, you know, we''re going + to have like, you say the specification or code and", "tokens": [51544, 257, 2237, + 300, 311, 411, 11, 291, 458, 11, 321, 434, 516, 281, 362, 411, 11, 291, 584, 264, + 31256, 420, 3089, 293, 51812], "temperature": 0.0, "avg_logprob": -0.14833736419677734, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.012517302297055721}, + {"id": 164, "seek": 110136, "start": 1101.36, "end": 1105.6799999999998, "text": + " it will just always make the code. 
It''s like, you don''t realize into the codes + written which you", "tokens": [50364, 309, 486, 445, 1009, 652, 264, 3089, 13, 467, + 311, 411, 11, 291, 500, 380, 4325, 666, 264, 14211, 3720, 597, 291, 50580], "temperature": + 0.0, "avg_logprob": -0.1755941076186097, "compression_ratio": 1.7212389380530972, + "no_speech_prob": 0.0015101423487067223}, {"id": 165, "seek": 110136, "start": 1105.6799999999998, + "end": 1110.32, "text": " even wanted. You don''t, and then you go back and change + it. You don''t realize the codes written", "tokens": [50580, 754, 1415, 13, 509, + 500, 380, 11, 293, 550, 291, 352, 646, 293, 1319, 309, 13, 509, 500, 380, 4325, + 264, 14211, 3720, 50812], "temperature": 0.0, "avg_logprob": -0.1755941076186097, + "compression_ratio": 1.7212389380530972, "no_speech_prob": 0.0015101423487067223}, + {"id": 166, "seek": 110136, "start": 1111.4399999999998, "end": 1116.1599999999999, + "text": " and written incorrectly, you know, that, oh, that it''s doing what I said. + That''s not what I meant.", "tokens": [50868, 293, 3720, 42892, 11, 291, 458, 11, + 300, 11, 1954, 11, 300, 309, 311, 884, 437, 286, 848, 13, 663, 311, 406, 437, 286, + 4140, 13, 51104], "temperature": 0.0, "avg_logprob": -0.1755941076186097, "compression_ratio": + 1.7212389380530972, "no_speech_prob": 0.0015101423487067223}, {"id": 167, "seek": + 110136, "start": 1118.32, "end": 1125.6, "text": " So what does this all mean? 
I + think that future implementations have to do a lot to keep the user", "tokens": + [51212, 407, 437, 775, 341, 439, 914, 30, 286, 519, 300, 2027, 4445, 763, 362, 281, + 360, 257, 688, 281, 1066, 264, 4195, 51576], "temperature": 0.0, "avg_logprob": + -0.1755941076186097, "compression_ratio": 1.7212389380530972, "no_speech_prob": + 0.0015101423487067223}, {"id": 168, "seek": 112560, "start": 1126.08, "end": 1131.76, + "text": " in the loop and make the experience so that the user doesn''t feel like + they''re just shouting", "tokens": [50388, 294, 264, 6367, 293, 652, 264, 1752, + 370, 300, 264, 4195, 1177, 380, 841, 411, 436, 434, 445, 20382, 50672], "temperature": + 0.0, "avg_logprob": -0.09021330570829088, "compression_ratio": 1.5698324022346368, + "no_speech_prob": 0.0050799851305782795}, {"id": 169, "seek": 112560, "start": 1131.76, + "end": 1138.6399999999999, "text": " instructions at a thing and then hoping that + it works. But the user has to be interacting with", "tokens": [50672, 9415, 412, + 257, 551, 293, 550, 7159, 300, 309, 1985, 13, 583, 264, 4195, 575, 281, 312, 18017, + 365, 51016], "temperature": 0.0, "avg_logprob": -0.09021330570829088, "compression_ratio": + 1.5698324022346368, "no_speech_prob": 0.0050799851305782795}, {"id": 170, "seek": + 112560, "start": 1138.6399999999999, "end": 1148.48, "text": " this thing and, you + know, converging towards a solution. So you see this in a couple of ways.", "tokens": + [51016, 341, 551, 293, 11, 291, 458, 11, 9652, 3249, 3030, 257, 3827, 13, 407, 291, + 536, 341, 294, 257, 1916, 295, 2098, 13, 51508], "temperature": 0.0, "avg_logprob": + -0.09021330570829088, "compression_ratio": 1.5698324022346368, "no_speech_prob": + 0.0050799851305782795}, {"id": 171, "seek": 114848, "start": 1149.28, "end": 1156.88, + "text": " One way is like with the assistant interface. 
And cursor, forgive me, + GitHub, per se, the cursor is", "tokens": [50404, 1485, 636, 307, 411, 365, 264, + 10994, 9226, 13, 400, 28169, 11, 10718, 385, 11, 23331, 11, 680, 369, 11, 264, 28169, + 307, 50784], "temperature": 0.0, "avg_logprob": -0.2207514338132714, "compression_ratio": + 1.778181818181818, "no_speech_prob": 0.007209557108581066}, {"id": 172, "seek": + 114848, "start": 1157.44, "end": 1162.88, "text": " just a really good example here + where you feel like you''re chatting with someone that is working", "tokens": [50812, + 445, 257, 534, 665, 1365, 510, 689, 291, 841, 411, 291, 434, 24654, 365, 1580, 300, + 307, 1364, 51084], "temperature": 0.0, "avg_logprob": -0.2207514338132714, "compression_ratio": + 1.778181818181818, "no_speech_prob": 0.007209557108581066}, {"id": 173, "seek": + 114848, "start": 1162.88, "end": 1167.92, "text": " with you to, to, on this code. + It gets into something I hope we talk about a little bit later. Art", "tokens": + [51084, 365, 291, 281, 11, 281, 11, 322, 341, 3089, 13, 467, 2170, 666, 746, 286, + 1454, 321, 751, 466, 257, 707, 857, 1780, 13, 5735, 51336], "temperature": 0.0, + "avg_logprob": -0.2207514338132714, "compression_ratio": 1.778181818181818, "no_speech_prob": + 0.007209557108581066}, {"id": 174, "seek": 114848, "start": 1167.92, "end": 1173.2, + "text": " of facts, you know, they''re, you''re having this conversation here, but + you''re working on these", "tokens": [51336, 295, 9130, 11, 291, 458, 11, 436, 434, + 11, 291, 434, 1419, 341, 3761, 510, 11, 457, 291, 434, 1364, 322, 613, 51600], "temperature": + 0.0, "avg_logprob": -0.2207514338132714, "compression_ratio": 1.778181818181818, + "no_speech_prob": 0.007209557108581066}, {"id": 175, "seek": 114848, "start": 1173.2, + "end": 1178.08, "text": " artifacts. You''re working on these things. 
And these + assistants under, you understand what they''re", "tokens": [51600, 24617, 13, 509, + 434, 1364, 322, 613, 721, 13, 400, 613, 34949, 833, 11, 291, 1223, 437, 436, 434, + 51844], "temperature": 0.0, "avg_logprob": -0.2207514338132714, "compression_ratio": + 1.778181818181818, "no_speech_prob": 0.007209557108581066}, {"id": 176, "seek": + 117808, "start": 1178.08, "end": 1182.6399999999999, "text": " looking at. Whenever + they make a recommendation to change something, you understand how it''s", "tokens": + [50364, 1237, 412, 13, 14159, 436, 652, 257, 11879, 281, 1319, 746, 11, 291, 1223, + 577, 309, 311, 50592], "temperature": 0.0, "avg_logprob": -0.16628951633099429, + "compression_ratio": 1.6382978723404256, "no_speech_prob": 0.00010489464330021292}, + {"id": 177, "seek": 117808, "start": 1183.28, "end": 1188.32, "text": " going to + change your code. You are still in control as a human, say yes or no for all this + stuff.", "tokens": [50624, 516, 281, 1319, 428, 3089, 13, 509, 366, 920, 294, 1969, + 382, 257, 1952, 11, 584, 2086, 420, 572, 337, 439, 341, 1507, 13, 50876], "temperature": + 0.0, "avg_logprob": -0.16628951633099429, "compression_ratio": 1.6382978723404256, + "no_speech_prob": 0.00010489464330021292}, {"id": 178, "seek": 117808, "start": + 1188.8799999999999, "end": 1194.08, "text": " And that''s one way that they keep + the users in the loop. 
The other way that we keep users in the", "tokens": [50904, + 400, 300, 311, 472, 636, 300, 436, 1066, 264, 5022, 294, 264, 6367, 13, 440, 661, + 636, 300, 321, 1066, 5022, 294, 264, 51164], "temperature": 0.0, "avg_logprob": + -0.16628951633099429, "compression_ratio": 1.6382978723404256, "no_speech_prob": + 0.00010489464330021292}, {"id": 179, "seek": 117808, "start": 1194.08, "end": 1203.28, + "text": " loop, and I promise I''ll shut up soon, is there''s a assistant type behavior + and then there''s like", "tokens": [51164, 6367, 11, 293, 286, 6228, 286, 603, 5309, + 493, 2321, 11, 307, 456, 311, 257, 10994, 2010, 5223, 293, 550, 456, 311, 411, 51624], + "temperature": 0.0, "avg_logprob": -0.16628951633099429, "compression_ratio": 1.6382978723404256, + "no_speech_prob": 0.00010489464330021292}, {"id": 180, "seek": 120328, "start": + 1203.28, "end": 1210.56, "text": " workflows where a human is, it''s still in the + loop. But there is a human at the beginning that", "tokens": [50364, 43461, 689, + 257, 1952, 307, 11, 309, 311, 920, 294, 264, 6367, 13, 583, 456, 307, 257, 1952, + 412, 264, 2863, 300, 50728], "temperature": 0.0, "avg_logprob": -0.15281468994763434, + "compression_ratio": 1.5949367088607596, "no_speech_prob": 0.009639846161007881}, + {"id": 181, "seek": 120328, "start": 1210.56, "end": 1216.32, "text": " designed + a workflow as like a set of steps. 
You can''t just say look at this website and + pull out", "tokens": [50728, 4761, 257, 20993, 382, 411, 257, 992, 295, 4439, 13, + 509, 393, 380, 445, 584, 574, 412, 341, 3144, 293, 2235, 484, 51016], "temperature": + 0.0, "avg_logprob": -0.15281468994763434, "compression_ratio": 1.5949367088607596, + "no_speech_prob": 0.009639846161007881}, {"id": 182, "seek": 120328, "start": 1217.28, + "end": 1222.48, "text": " all the phone numbers, all of the menu items, all of the, + you know, the structure content.", "tokens": [51064, 439, 264, 2593, 3547, 11, 439, + 295, 264, 6510, 4754, 11, 439, 295, 264, 11, 291, 458, 11, 264, 3877, 2701, 13, + 51324], "temperature": 0.0, "avg_logprob": -0.15281468994763434, "compression_ratio": + 1.5949367088607596, "no_speech_prob": 0.009639846161007881}, {"id": 183, "seek": + 120328, "start": 1223.2, "end": 1229.2, "text": " And always expected to work. Sometimes + it''s better to say, let''s take this big thing and have a", "tokens": [51360, 400, + 1009, 5176, 281, 589, 13, 4803, 309, 311, 1101, 281, 584, 11, 718, 311, 747, 341, + 955, 551, 293, 362, 257, 51660], "temperature": 0.0, "avg_logprob": -0.15281468994763434, + "compression_ratio": 1.5949367088607596, "no_speech_prob": 0.009639846161007881}, + {"id": 184, "seek": 122920, "start": 1229.2, "end": 1237.52, "text": " human, a + human in this loop is defining all the steps that it''s going to take to implement + this workflow.", "tokens": [50364, 1952, 11, 257, 1952, 294, 341, 6367, 307, 17827, + 439, 264, 4439, 300, 309, 311, 516, 281, 747, 281, 4445, 341, 20993, 13, 50780], + "temperature": 0.0, "avg_logprob": -0.15759757070830374, "compression_ratio": 1.7085201793721974, + "no_speech_prob": 0.0026067395228892565}, {"id": 185, "seek": 122920, "start": 1237.52, + "end": 1243.28, "text": " And that way it''s still, you can make something that + is recoverable, you know, that there''s", "tokens": [50780, 400, 300, 636, 309, + 311, 920, 11, 291, 393, 652, 746, 300, 307, 8114, 712, 
11, 291, 458, 11, 300, 456, + 311, 51068], "temperature": 0.0, "avg_logprob": -0.15759757070830374, "compression_ratio": + 1.7085201793721974, "no_speech_prob": 0.0026067395228892565}, {"id": 186, "seek": + 122920, "start": 1243.28, "end": 1248.0800000000002, "text": " airstates for some + of these steps and you can get out of them, pass it back up to a real human.", "tokens": + [51068, 1988, 372, 1024, 337, 512, 295, 613, 4439, 293, 291, 393, 483, 484, 295, + 552, 11, 1320, 309, 646, 493, 281, 257, 957, 1952, 13, 51308], "temperature": 0.0, + "avg_logprob": -0.15759757070830374, "compression_ratio": 1.7085201793721974, "no_speech_prob": + 0.0026067395228892565}, {"id": 187, "seek": 122920, "start": 1249.68, "end": 1255.6000000000001, + "text": " But yeah, all along the way of saying these things are going to remain + hard to predict,", "tokens": [51388, 583, 1338, 11, 439, 2051, 264, 636, 295, 1566, + 613, 721, 366, 516, 281, 6222, 1152, 281, 6069, 11, 51684], "temperature": 0.0, + "avg_logprob": -0.15759757070830374, "compression_ratio": 1.7085201793721974, "no_speech_prob": + 0.0026067395228892565}, {"id": 188, "seek": 125560, "start": 1256.24, "end": 1260.9599999999998, + "text": " but the code that''s built around them, I think, is going to become very + tolerant of that and", "tokens": [50396, 457, 264, 3089, 300, 311, 3094, 926, 552, + 11, 286, 519, 11, 307, 516, 281, 1813, 588, 45525, 295, 300, 293, 50632], "temperature": + 0.0, "avg_logprob": -0.18884249566828162, "compression_ratio": 1.6291666666666667, + "no_speech_prob": 0.0264284685254097}, {"id": 189, "seek": 125560, "start": 1260.9599999999998, + "end": 1266.8, "text": " by pulling the users into the conversation constantly. 
+ Yeah, so you basically, if I got your idea", "tokens": [50632, 538, 8407, 264, 5022, + 666, 264, 3761, 6460, 13, 865, 11, 370, 291, 1936, 11, 498, 286, 658, 428, 1558, + 50924], "temperature": 0.0, "avg_logprob": -0.18884249566828162, "compression_ratio": + 1.6291666666666667, "no_speech_prob": 0.0264284685254097}, {"id": 190, "seek": 125560, + "start": 1266.8, "end": 1276.0, "text": " right, is that you put the user in the + driver''s seat, right? And the model or whatever LLM app is still,", "tokens": [50924, + 558, 11, 307, 300, 291, 829, 264, 4195, 294, 264, 6787, 311, 6121, 11, 558, 30, + 400, 264, 2316, 420, 2035, 441, 43, 44, 724, 307, 920, 11, 51384], "temperature": + 0.0, "avg_logprob": -0.18884249566828162, "compression_ratio": 1.6291666666666667, + "no_speech_prob": 0.0264284685254097}, {"id": 191, "seek": 125560, "start": 1277.12, + "end": 1283.84, "text": " it''s kind of like an assistant, as you said, or companion, + whatever you want to call it, right?", "tokens": [51440, 309, 311, 733, 295, 411, + 364, 10994, 11, 382, 291, 848, 11, 420, 22363, 11, 2035, 291, 528, 281, 818, 309, + 11, 558, 30, 51776], "temperature": 0.0, "avg_logprob": -0.18884249566828162, "compression_ratio": + 1.6291666666666667, "no_speech_prob": 0.0264284685254097}, {"id": 192, "seek": 128384, + "start": 1283.84, "end": 1291.84, "text": " But you, like, you still, I guess we + are still at that point in time when we need to know exactly", "tokens": [50364, + 583, 291, 11, 411, 11, 291, 920, 11, 286, 2041, 321, 366, 920, 412, 300, 935, 294, + 565, 562, 321, 643, 281, 458, 2293, 50764], "temperature": 0.0, "avg_logprob": -0.11282600806309627, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.006080771796405315}, + {"id": 193, "seek": 128384, "start": 1291.84, "end": 1300.1599999999999, "text": + " what we want, right? As users. 
And I think we also need to know how to get it + out of the model,", "tokens": [50764, 437, 321, 528, 11, 558, 30, 1018, 5022, 13, + 400, 286, 519, 321, 611, 643, 281, 458, 577, 281, 483, 309, 484, 295, 264, 2316, + 11, 51180], "temperature": 0.0, "avg_logprob": -0.11282600806309627, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.006080771796405315}, {"id": 194, "seek": + 128384, "start": 1300.1599999999999, "end": 1307.28, "text": " right? Because sometimes + no matter what you know, it''s not somehow achievable, maybe because you", "tokens": + [51180, 558, 30, 1436, 2171, 572, 1871, 437, 291, 458, 11, 309, 311, 406, 6063, + 3538, 17915, 11, 1310, 570, 291, 51536], "temperature": 0.0, "avg_logprob": -0.11282600806309627, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.006080771796405315}, + {"id": 195, "seek": 128384, "start": 1307.28, "end": 1313.6799999999998, "text": + " don''t know how to prompt well or, you know, you just go into the loops, I frequently + go there,", "tokens": [51536, 500, 380, 458, 577, 281, 12391, 731, 420, 11, 291, + 458, 11, 291, 445, 352, 666, 264, 16121, 11, 286, 10374, 352, 456, 11, 51856], "temperature": + 0.0, "avg_logprob": -0.11282600806309627, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.006080771796405315}, {"id": 196, "seek": 131368, "start": 1313.68, + "end": 1320.3200000000002, "text": " when I, for example, chat to, I don''t know, + chat GPT or it could be any other tool. 
When it just", "tokens": [50364, 562, 286, + 11, 337, 1365, 11, 5081, 281, 11, 286, 500, 380, 458, 11, 5081, 26039, 51, 420, + 309, 727, 312, 604, 661, 2290, 13, 1133, 309, 445, 50696], "temperature": 0.0, "avg_logprob": + -0.13618355327182347, "compression_ratio": 1.596774193548387, "no_speech_prob": + 0.008403436280786991}, {"id": 197, "seek": 131368, "start": 1320.3200000000002, + "end": 1326.16, "text": " keeps going and returning to the point which didn''t work + already, because the alternative doesn''t work", "tokens": [50696, 5965, 516, 293, + 12678, 281, 264, 935, 597, 994, 380, 589, 1217, 11, 570, 264, 8535, 1177, 380, 589, + 50988], "temperature": 0.0, "avg_logprob": -0.13618355327182347, "compression_ratio": + 1.596774193548387, "no_speech_prob": 0.008403436280786991}, {"id": 198, "seek": + 131368, "start": 1326.16, "end": 1331.1200000000001, "text": " now. And I''m like, + okay, neither work. Like what you propose just doesn''t work. What should I do?", + "tokens": [50988, 586, 13, 400, 286, 478, 411, 11, 1392, 11, 9662, 589, 13, 1743, + 437, 291, 17421, 445, 1177, 380, 589, 13, 708, 820, 286, 360, 30, 51236], "temperature": + 0.0, "avg_logprob": -0.13618355327182347, "compression_ratio": 1.596774193548387, + "no_speech_prob": 0.008403436280786991}, {"id": 199, "seek": 131368, "start": 1332.16, + "end": 1338.16, "text": " But still, I feel like I became much more productive as + a, I don''t write code every day, you know,", "tokens": [51288, 583, 920, 11, 286, + 841, 411, 286, 3062, 709, 544, 13304, 382, 257, 11, 286, 500, 380, 2464, 3089, 633, + 786, 11, 291, 458, 11, 51588], "temperature": 0.0, "avg_logprob": -0.13618355327182347, + "compression_ratio": 1.596774193548387, "no_speech_prob": 0.008403436280786991}, + {"id": 200, "seek": 133816, "start": 1338.24, "end": 1345.0400000000002, "text": + " for my work anymore, but for leaving. 
But when I do, I feel like I saved, I don''t + know,", "tokens": [50368, 337, 452, 589, 3602, 11, 457, 337, 5012, 13, 583, 562, + 286, 360, 11, 286, 841, 411, 286, 6624, 11, 286, 500, 380, 458, 11, 50708], "temperature": + 0.0, "avg_logprob": -0.14045983842275675, "compression_ratio": 1.5263157894736843, + "no_speech_prob": 0.014122546650469303}, {"id": 201, "seek": 133816, "start": 1345.0400000000002, + "end": 1352.3200000000002, "text": " three, five days of my time by using these + tools. But there is still this kind of unpredictable", "tokens": [50708, 1045, 11, + 1732, 1708, 295, 452, 565, 538, 1228, 613, 3873, 13, 583, 456, 307, 920, 341, 733, + 295, 31160, 51072], "temperature": 0.0, "avg_logprob": -0.14045983842275675, "compression_ratio": + 1.5263157894736843, "no_speech_prob": 0.014122546650469303}, {"id": 202, "seek": + 133816, "start": 1352.3200000000002, "end": 1357.92, "text": " component to it, + you know, I''ll give you one example, very specific one. So I was building like", + "tokens": [51072, 6542, 281, 309, 11, 291, 458, 11, 286, 603, 976, 291, 472, 1365, + 11, 588, 2685, 472, 13, 407, 286, 390, 2390, 411, 51352], "temperature": 0.0, "avg_logprob": + -0.14045983842275675, "compression_ratio": 1.5263157894736843, "no_speech_prob": + 0.014122546650469303}, {"id": 203, "seek": 133816, "start": 1359.0400000000002, + "end": 1366.24, "text": " like simple Python code, which would draw a diagram. And + on the x-axis, it would need to put, you", "tokens": [51408, 411, 2199, 15329, 3089, + 11, 597, 576, 2642, 257, 10686, 13, 400, 322, 264, 2031, 12, 24633, 11, 309, 576, + 643, 281, 829, 11, 291, 51768], "temperature": 0.0, "avg_logprob": -0.14045983842275675, + "compression_ratio": 1.5263157894736843, "no_speech_prob": 0.014122546650469303}, + {"id": 204, "seek": 136624, "start": 1366.24, "end": 1376.16, "text": " know, these + values like 1, 1.5, 2, 2.5 and so on. 
And so the model made a mistake by rounding + all", "tokens": [50364, 458, 11, 613, 4190, 411, 502, 11, 502, 13, 20, 11, 568, + 11, 568, 13, 20, 293, 370, 322, 13, 400, 370, 264, 2316, 1027, 257, 6146, 538, 48237, + 439, 50860], "temperature": 0.0, "avg_logprob": -0.13836764452750222, "compression_ratio": + 1.640495867768595, "no_speech_prob": 0.00875595211982727}, {"id": 205, "seek": 136624, + "start": 1376.16, "end": 1382.08, "text": " these values to an integer. And so when + x-axis, all of a sudden, I saw the same values, right? And", "tokens": [50860, 613, + 4190, 281, 364, 24922, 13, 400, 370, 562, 2031, 12, 24633, 11, 439, 295, 257, 3990, + 11, 286, 1866, 264, 912, 4190, 11, 558, 30, 400, 51156], "temperature": 0.0, "avg_logprob": + -0.13836764452750222, "compression_ratio": 1.640495867768595, "no_speech_prob": + 0.00875595211982727}, {"id": 206, "seek": 136624, "start": 1382.08, "end": 1387.04, + "text": " the model doesn''t have the reasoning component to realize that it made + a mistake. Or at least call", "tokens": [51156, 264, 2316, 1177, 380, 362, 264, + 21577, 6542, 281, 4325, 300, 309, 1027, 257, 6146, 13, 1610, 412, 1935, 818, 51404], + "temperature": 0.0, "avg_logprob": -0.13836764452750222, "compression_ratio": 1.640495867768595, + "no_speech_prob": 0.00875595211982727}, {"id": 207, "seek": 136624, "start": 1387.04, + "end": 1393.28, "text": " it out and say, do you want it this way? Or should I do + it another way? I had to correct it because I", "tokens": [51404, 309, 484, 293, + 584, 11, 360, 291, 528, 309, 341, 636, 30, 1610, 820, 286, 360, 309, 1071, 636, + 30, 286, 632, 281, 3006, 309, 570, 286, 51716], "temperature": 0.0, "avg_logprob": + -0.13836764452750222, "compression_ratio": 1.640495867768595, "no_speech_prob": + 0.00875595211982727}, {"id": 208, "seek": 139328, "start": 1393.28, "end": 1399.92, + "text": " knew that I needed to cast it to float. 
But if I didn''t know programming, + I wouldn''t be able to do", "tokens": [50364, 2586, 300, 286, 2978, 281, 4193, 309, + 281, 15706, 13, 583, 498, 286, 994, 380, 458, 9410, 11, 286, 2759, 380, 312, 1075, + 281, 360, 50696], "temperature": 0.0, "avg_logprob": -0.1118520164489746, "compression_ratio": + 1.646808510638298, "no_speech_prob": 0.019360436126589775}, {"id": 209, "seek": + 139328, "start": 1399.92, "end": 1408.24, "text": " that, right? I would be stuck + right there. And so that''s the level lake of sophistication we are", "tokens": + [50696, 300, 11, 558, 30, 286, 576, 312, 5541, 558, 456, 13, 400, 370, 300, 311, + 264, 1496, 11001, 295, 15572, 399, 321, 366, 51112], "temperature": 0.0, "avg_logprob": + -0.1118520164489746, "compression_ratio": 1.646808510638298, "no_speech_prob": 0.019360436126589775}, + {"id": 210, "seek": 139328, "start": 1408.24, "end": 1413.12, "text": " in still, + right? If we''re talking about code completion, but I wonder what you feel about + this?", "tokens": [51112, 294, 920, 11, 558, 30, 759, 321, 434, 1417, 466, 3089, + 19372, 11, 457, 286, 2441, 437, 291, 841, 466, 341, 30, 51356], "temperature": 0.0, + "avg_logprob": -0.1118520164489746, "compression_ratio": 1.646808510638298, "no_speech_prob": + 0.019360436126589775}, {"id": 211, "seek": 139328, "start": 1413.12, "end": 1419.84, + "text": " What do you think about code complete? You did call out cursor as the + tool you use the probably", "tokens": [51356, 708, 360, 291, 519, 466, 3089, 3566, + 30, 509, 630, 818, 484, 28169, 382, 264, 2290, 291, 764, 264, 1391, 51692], "temperature": + 0.0, "avg_logprob": -0.1118520164489746, "compression_ratio": 1.646808510638298, + "no_speech_prob": 0.019360436126589775}, {"id": 212, "seek": 141984, "start": 1419.84, + "end": 1426.9599999999998, "text": " more often now, but you did work on that in + GitHub, Copilot team. 
And what was your sense of", "tokens": [50364, 544, 2049, + 586, 11, 457, 291, 630, 589, 322, 300, 294, 23331, 11, 11579, 31516, 1469, 13, 400, + 437, 390, 428, 2020, 295, 50720], "temperature": 0.0, "avg_logprob": -0.16853753725687662, + "compression_ratio": 1.5375, "no_speech_prob": 0.03003728576004505}, {"id": 213, + "seek": 141984, "start": 1428.24, "end": 1434.9599999999998, "text": " its quality + and like challenges around it? And in general, how did you approach the task that", + "tokens": [50784, 1080, 3125, 293, 411, 4759, 926, 309, 30, 400, 294, 2674, 11, + 577, 630, 291, 3109, 264, 5633, 300, 51120], "temperature": 0.0, "avg_logprob": + -0.16853753725687662, "compression_ratio": 1.5375, "no_speech_prob": 0.03003728576004505}, + {"id": 214, "seek": 141984, "start": 1434.9599999999998, "end": 1443.28, "text": + " research challenge? I can speak a little bit of that. There''s two ways in which + I will be", "tokens": [51120, 2132, 3430, 30, 286, 393, 1710, 257, 707, 857, 295, + 300, 13, 821, 311, 732, 2098, 294, 597, 286, 486, 312, 51536], "temperature": 0.0, + "avg_logprob": -0.16853753725687662, "compression_ratio": 1.5375, "no_speech_prob": + 0.03003728576004505}, {"id": 215, "seek": 141984, "start": 1443.28, "end": 1447.6799999999998, + "text": " unsatisfactory here. One, I can''t get into all the details probably. + And another way is I''ve", "tokens": [51536, 2693, 25239, 21840, 510, 13, 1485, + 11, 286, 393, 380, 483, 666, 439, 264, 4365, 1391, 13, 400, 1071, 636, 307, 286, + 600, 51756], "temperature": 0.0, "avg_logprob": -0.16853753725687662, "compression_ratio": + 1.5375, "no_speech_prob": 0.03003728576004505}, {"id": 216, "seek": 144768, "start": + 1447.68, "end": 1455.3600000000001, "text": " been gone since May. So I''m sure + that that makes an amazing change since then. 
But this", "tokens": [50364, 668, + 2780, 1670, 1891, 13, 407, 286, 478, 988, 300, 300, 1669, 364, 2243, 1319, 1670, + 550, 13, 583, 341, 50748], "temperature": 0.0, "avg_logprob": -0.3235819267504143, + "compression_ratio": 1.603448275862069, "no_speech_prob": 0.00283340853638947}, + {"id": 217, "seek": 144768, "start": 1455.3600000000001, "end": 1462.3200000000002, + "text": " Copilot completions was one of the first successful applications of large + language models. And", "tokens": [50748, 11579, 31516, 1557, 626, 390, 472, 295, + 264, 700, 4406, 5821, 295, 2416, 2856, 5245, 13, 400, 51096], "temperature": 0.0, + "avg_logprob": -0.3235819267504143, "compression_ratio": 1.603448275862069, "no_speech_prob": + 0.00283340853638947}, {"id": 218, "seek": 144768, "start": 1463.1200000000001, "end": + 1469.3600000000001, "text": " outside of the pure model, chat to BT, a large language + model as a large language model service.", "tokens": [51136, 2380, 295, 264, 6075, + 2316, 11, 5081, 281, 31144, 11, 257, 2416, 2856, 2316, 382, 257, 2416, 2856, 2316, + 2643, 13, 51448], "temperature": 0.0, "avg_logprob": -0.3235819267504143, "compression_ratio": + 1.603448275862069, "no_speech_prob": 0.00283340853638947}, {"id": 219, "seek": 146936, + "start": 1469.36, "end": 1478.8, "text": " Like this is, this was just the, I guess + it was the first. So the implementation was actually", "tokens": [50364, 1743, 341, + 307, 11, 341, 390, 445, 264, 11, 286, 2041, 309, 390, 264, 700, 13, 407, 264, 11420, + 390, 767, 50836], "temperature": 0.0, "avg_logprob": -0.18281660218169724, "compression_ratio": + 1.5698324022346368, "no_speech_prob": 0.012653964571654797}, {"id": 220, "seek": + 146936, "start": 1478.8, "end": 1486.4799999999998, "text": " fairly simplistic. + Basically, they, we weren''t using chat models at the time. 
Those didn''t", "tokens": + [50836, 6457, 44199, 13, 8537, 11, 436, 11, 321, 4999, 380, 1228, 5081, 5245, 412, + 264, 565, 13, 3950, 994, 380, 51220], "temperature": 0.0, "avg_logprob": -0.18281660218169724, + "compression_ratio": 1.5698324022346368, "no_speech_prob": 0.012653964571654797}, + {"id": 221, "seek": 146936, "start": 1486.4799999999998, "end": 1495.1999999999998, + "text": " exist. We were only using completion models. Completion models, basically, + I mean, your audience", "tokens": [51220, 2514, 13, 492, 645, 787, 1228, 19372, + 5245, 13, 31804, 313, 5245, 11, 1936, 11, 286, 914, 11, 428, 4034, 51656], "temperature": + 0.0, "avg_logprob": -0.18281660218169724, "compression_ratio": 1.5698324022346368, + "no_speech_prob": 0.012653964571654797}, {"id": 222, "seek": 149520, "start": 1495.2, + "end": 1501.76, "text": " probably knows this, but given the top part of a document, + then all the model does. And it''s", "tokens": [50364, 1391, 3255, 341, 11, 457, + 2212, 264, 1192, 644, 295, 257, 4166, 11, 550, 439, 264, 2316, 775, 13, 400, 309, + 311, 50692], "temperature": 0.0, "avg_logprob": -0.11065304881394511, "compression_ratio": + 1.748878923766816, "no_speech_prob": 0.006104898639023304}, {"id": 223, "seek": + 149520, "start": 1501.76, "end": 1507.6000000000001, "text": " useful to think of + the model this way. It simplifies things. All the model does is it picks the next", + "tokens": [50692, 4420, 281, 519, 295, 264, 2316, 341, 636, 13, 467, 6883, 11221, + 721, 13, 1057, 264, 2316, 775, 307, 309, 16137, 264, 958, 50984], "temperature": + 0.0, "avg_logprob": -0.11065304881394511, "compression_ratio": 1.748878923766816, + "no_speech_prob": 0.006104898639023304}, {"id": 224, "seek": 149520, "start": 1507.6000000000001, + "end": 1512.64, "text": " token. What is the most likely token based on all these + words before it? 
What''s the next token?", "tokens": [50984, 14862, 13, 708, 307, + 264, 881, 3700, 14862, 2361, 322, 439, 613, 2283, 949, 309, 30, 708, 311, 264, 958, + 14862, 30, 51236], "temperature": 0.0, "avg_logprob": -0.11065304881394511, "compression_ratio": + 1.748878923766816, "no_speech_prob": 0.006104898639023304}, {"id": 225, "seek": + 149520, "start": 1512.64, "end": 1518.64, "text": " And then you append that one + and you did it again and again. And so the big aha moment that happened", "tokens": + [51236, 400, 550, 291, 34116, 300, 472, 293, 291, 630, 309, 797, 293, 797, 13, 400, + 370, 264, 955, 47340, 1623, 300, 2011, 51536], "temperature": 0.0, "avg_logprob": + -0.11065304881394511, "compression_ratio": 1.748878923766816, "no_speech_prob": + 0.006104898639023304}, {"id": 226, "seek": 151864, "start": 1519.44, "end": 1527.5200000000002, + "text": " probably in 2019, as well before my time on Copilot was, look, I can take + this top half the code", "tokens": [50404, 1391, 294, 6071, 11, 382, 731, 949, 452, + 565, 322, 11579, 31516, 390, 11, 574, 11, 286, 393, 747, 341, 1192, 1922, 264, 3089, + 50808], "temperature": 0.0, "avg_logprob": -0.2215027364095052, "compression_ratio": + 1.4623115577889447, "no_speech_prob": 0.0026177691761404276}, {"id": 227, "seek": + 151864, "start": 1527.5200000000002, "end": 1534.8000000000002, "text": " down to + the function. And the answer, you know, the completion that it makes is surprisingly + good.", "tokens": [50808, 760, 281, 264, 2445, 13, 400, 264, 1867, 11, 291, 458, + 11, 264, 19372, 300, 309, 1669, 307, 17600, 665, 13, 51172], "temperature": 0.0, + "avg_logprob": -0.2215027364095052, "compression_ratio": 1.4623115577889447, "no_speech_prob": + 0.0026177691761404276}, {"id": 228, "seek": 151864, "start": 1536.0, "end": 1543.68, + "text": " So like maybe it''s time to just wrap or wrap up for application around + it. 
And then after that,", "tokens": [51232, 407, 411, 1310, 309, 311, 565, 281, + 445, 7019, 420, 7019, 493, 337, 3861, 926, 309, 13, 400, 550, 934, 300, 11, 51616], + "temperature": 0.0, "avg_logprob": -0.2215027364095052, "compression_ratio": 1.4623115577889447, + "no_speech_prob": 0.0026177691761404276}, {"id": 229, "seek": 154368, "start": 1544.64, + "end": 1552.24, "text": " everybody''s learning these lessons at this point. But + it''s all about the context that you put", "tokens": [50412, 2201, 311, 2539, 613, + 8820, 412, 341, 935, 13, 583, 309, 311, 439, 466, 264, 4319, 300, 291, 829, 50792], + "temperature": 0.0, "avg_logprob": -0.14130511965070452, "compression_ratio": 1.5573770491803278, + "no_speech_prob": 0.0016411576652899384}, {"id": 230, "seek": 154368, "start": 1552.24, + "end": 1558.16, "text": " around it and how you present it so that the model can + make sense of it. At the time that I started", "tokens": [50792, 926, 309, 293, + 577, 291, 1974, 309, 370, 300, 264, 2316, 393, 652, 2020, 295, 309, 13, 1711, 264, + 565, 300, 286, 1409, 51088], "temperature": 0.0, "avg_logprob": -0.14130511965070452, + "compression_ratio": 1.5573770491803278, "no_speech_prob": 0.0016411576652899384}, + {"id": 231, "seek": 154368, "start": 1558.16, "end": 1564.8, "text": " with Copilot, + we were still using the completion models. And it was, the context itself was", + "tokens": [51088, 365, 11579, 31516, 11, 321, 645, 920, 1228, 264, 19372, 5245, + 13, 400, 309, 390, 11, 264, 4319, 2564, 390, 51420], "temperature": 0.0, "avg_logprob": + -0.14130511965070452, "compression_ratio": 1.5573770491803278, "no_speech_prob": + 0.0016411576652899384}, {"id": 232, "seek": 156480, "start": 1565.44, "end": 1576.24, + "text": " 2048 tokens, I think. So just tiny, tiny, tiny window. 
And so a huge focus + at the time was how to", "tokens": [50396, 945, 13318, 22667, 11, 286, 519, 13, + 407, 445, 5870, 11, 5870, 11, 5870, 4910, 13, 400, 370, 257, 2603, 1879, 412, 264, + 565, 390, 577, 281, 50936], "temperature": 0.0, "avg_logprob": -0.10980531484773844, + "compression_ratio": 1.6196581196581197, "no_speech_prob": 0.005099656525999308}, + {"id": 233, "seek": 156480, "start": 1576.24, "end": 1582.1599999999999, "text": + " take all the things that we thought might be useful and squeeze it down into this + tiny space,", "tokens": [50936, 747, 439, 264, 721, 300, 321, 1194, 1062, 312, 4420, + 293, 13578, 309, 760, 666, 341, 5870, 1901, 11, 51232], "temperature": 0.0, "avg_logprob": + -0.10980531484773844, "compression_ratio": 1.6196581196581197, "no_speech_prob": + 0.005099656525999308}, {"id": 234, "seek": 156480, "start": 1582.1599999999999, + "end": 1588.24, "text": " just, you know, actually make sure you''ve nailed it. + Because not only do you have to fit the", "tokens": [51232, 445, 11, 291, 458, 11, + 767, 652, 988, 291, 600, 30790, 309, 13, 1436, 406, 787, 360, 291, 362, 281, 3318, + 264, 51536], "temperature": 0.0, "avg_logprob": -0.10980531484773844, "compression_ratio": + 1.6196581196581197, "no_speech_prob": 0.005099656525999308}, {"id": 235, "seek": + 156480, "start": 1588.8799999999999, "end": 1594.48, "text": " prompt into this + 2048 tokens, but whatever the completions are, that''s, you know, that they''re", + "tokens": [51568, 12391, 666, 341, 945, 13318, 22667, 11, 457, 2035, 264, 1557, + 626, 366, 11, 300, 311, 11, 291, 458, 11, 300, 436, 434, 51848], "temperature": + 0.0, "avg_logprob": -0.10980531484773844, "compression_ratio": 1.6196581196581197, + "no_speech_prob": 0.005099656525999308}, {"id": 236, "seek": 159448, "start": 1594.48, + "end": 1600.4, "text": " sharing the same windows. You can move that line up and + down, but it''s always in 2048. 
So there wasn''t,", "tokens": [50364, 5414, 264, + 912, 9309, 13, 509, 393, 1286, 300, 1622, 493, 293, 760, 11, 457, 309, 311, 1009, + 294, 945, 13318, 13, 407, 456, 2067, 380, 11, 50660], "temperature": 0.0, "avg_logprob": + -0.24075753081078624, "compression_ratio": 1.613821138211382, "no_speech_prob": + 0.0001363903866149485}, {"id": 237, "seek": 159448, "start": 1600.4, "end": 1608.96, + "text": " there wasn''t. The ingredients were pretty simple. The file that you''re + looking at is obviously the", "tokens": [50660, 456, 2067, 380, 13, 440, 6952, 645, + 1238, 2199, 13, 440, 3991, 300, 291, 434, 1237, 412, 307, 2745, 264, 51088], "temperature": + 0.0, "avg_logprob": -0.24075753081078624, "compression_ratio": 1.613821138211382, + "no_speech_prob": 0.0001363903866149485}, {"id": 238, "seek": 159448, "start": 1608.96, + "end": 1614.64, "text": " most important thing. If the file is long, which I''ll + often I''ll log in to that 48, then the text", "tokens": [51088, 881, 1021, 551, + 13, 759, 264, 3991, 307, 938, 11, 597, 286, 603, 2049, 286, 603, 3565, 294, 281, + 300, 11174, 11, 550, 264, 2487, 51372], "temperature": 0.0, "avg_logprob": -0.24075753081078624, + "compression_ratio": 1.613821138211382, "no_speech_prob": 0.0001363903866149485}, + {"id": 239, "seek": 159448, "start": 1614.64, "end": 1623.3600000000001, "text": + " right above the cursor is an important thing. 
There are some initial work with + like the, they''re", "tokens": [51372, 558, 3673, 264, 28169, 307, 364, 1021, 551, + 13, 821, 366, 512, 5883, 589, 365, 411, 264, 11, 436, 434, 51808], "temperature": + 0.0, "avg_logprob": -0.24075753081078624, "compression_ratio": 1.613821138211382, + "no_speech_prob": 0.0001363903866149485}, {"id": 240, "seek": 162336, "start": 1623.36, + "end": 1627.1999999999998, "text": " still called fill in the middle models, which + they don''t need this anymore because all the models", "tokens": [50364, 920, 1219, + 2836, 294, 264, 2808, 5245, 11, 597, 436, 500, 380, 643, 341, 3602, 570, 439, 264, + 5245, 50556], "temperature": 0.0, "avg_logprob": -0.21598333653395738, "compression_ratio": + 1.7536764705882353, "no_speech_prob": 0.004661222919821739}, {"id": 241, "seek": + 162336, "start": 1627.1999999999998, "end": 1632.1599999999999, "text": " are so + free. You don''t need a specialized model for this. But you could, you know, you + could say the", "tokens": [50556, 366, 370, 1737, 13, 509, 500, 380, 643, 257, 19813, + 2316, 337, 341, 13, 583, 291, 727, 11, 291, 458, 11, 291, 727, 584, 264, 50804], + "temperature": 0.0, "avg_logprob": -0.21598333653395738, "compression_ratio": 1.7536764705882353, + "no_speech_prob": 0.004661222919821739}, {"id": 242, "seek": 162336, "start": 1632.1599999999999, + "end": 1636.56, "text": " prefix and the suffix, and it would do a good job about + filling in the middle. So the suffix was", "tokens": [50804, 46969, 293, 264, 3889, + 970, 11, 293, 309, 576, 360, 257, 665, 1691, 466, 10623, 294, 264, 2808, 13, 407, + 264, 3889, 970, 390, 51024], "temperature": 0.0, "avg_logprob": -0.21598333653395738, + "compression_ratio": 1.7536764705882353, "no_speech_prob": 0.004661222919821739}, + {"id": 243, "seek": 162336, "start": 1636.56, "end": 1642.3999999999999, "text": + " also an important part of the context. Where do you stop this thing? 
And then + as the model,", "tokens": [51024, 611, 364, 1021, 644, 295, 264, 4319, 13, 2305, + 360, 291, 1590, 341, 551, 30, 400, 550, 382, 264, 2316, 11, 51316], "temperature": + 0.0, "avg_logprob": -0.21598333653395738, "compression_ratio": 1.7536764705882353, + "no_speech_prob": 0.004661222919821739}, {"id": 244, "seek": 162336, "start": 1643.12, + "end": 1648.0, "text": " crew is a context-based crew a little bit, we can start + sticking in extra things. And so,", "tokens": [51352, 7260, 307, 257, 4319, 12, + 6032, 7260, 257, 707, 857, 11, 321, 393, 722, 13465, 294, 2857, 721, 13, 400, 370, + 11, 51596], "temperature": 0.0, "avg_logprob": -0.21598333653395738, "compression_ratio": + 1.7536764705882353, "no_speech_prob": 0.004661222919821739}, {"id": 245, "seek": + 164800, "start": 1648.8, "end": 1653.92, "text": " you know, you start with little + bitty things. These models were not trained on,", "tokens": [50404, 291, 458, 11, + 291, 722, 365, 707, 272, 10016, 721, 13, 1981, 5245, 645, 406, 8895, 322, 11, 50660], + "temperature": 0.0, "avg_logprob": -0.17921436916698108, "compression_ratio": 1.638095238095238, + "no_speech_prob": 0.0006231710431165993}, {"id": 246, "seek": 164800, "start": 1656.48, + "end": 1662.4, "text": " these models were trained on code, but they didn''t necessarily + have the context around the code.", "tokens": [50788, 613, 5245, 645, 8895, 322, + 3089, 11, 457, 436, 994, 380, 4725, 362, 264, 4319, 926, 264, 3089, 13, 51084], + "temperature": 0.0, "avg_logprob": -0.17921436916698108, "compression_ratio": 1.638095238095238, + "no_speech_prob": 0.0006231710431165993}, {"id": 247, "seek": 164800, "start": 1662.4, + "end": 1666.08, "text": " So the first easy thing to stick in is you could do a + shabang at the top,", "tokens": [51084, 407, 264, 700, 1858, 551, 281, 2897, 294, + 307, 291, 727, 360, 257, 402, 455, 656, 412, 264, 1192, 11, 51268], "temperature": + 0.0, "avg_logprob": -0.17921436916698108, "compression_ratio": 
1.638095238095238, + "no_speech_prob": 0.0006231710431165993}, {"id": 248, "seek": 164800, "start": 1666.08, + "end": 1674.4, "text": " protecting a comment that says, here''s the path for this, + this file. And that gives the model", "tokens": [51268, 12316, 257, 2871, 300, 1619, + 11, 510, 311, 264, 3100, 337, 341, 11, 341, 3991, 13, 400, 300, 2709, 264, 2316, + 51684], "temperature": 0.0, "avg_logprob": -0.17921436916698108, "compression_ratio": + 1.638095238095238, "no_speech_prob": 0.0006231710431165993}, {"id": 249, "seek": + 167440, "start": 1674.4, "end": 1680.24, "text": " context about where this lives + in the context of everything else. A big breakthrough that", "tokens": [50364, 4319, + 466, 689, 341, 2909, 294, 264, 4319, 295, 1203, 1646, 13, 316, 955, 22397, 300, + 50656], "temperature": 0.0, "avg_logprob": -0.22268356244588636, "compression_ratio": + 1.526530612244898, "no_speech_prob": 0.0016933761071413755}, {"id": 250, "seek": + 167440, "start": 1681.8400000000001, "end": 1688.88, "text": " Albert Dealer, Mike + Coother, pioneered was the neighboring tab stuff. And I think this is all", "tokens": + [50736, 20812, 1346, 17148, 11, 6602, 3066, 802, 11, 19761, 4073, 390, 264, 31521, + 4421, 1507, 13, 400, 286, 519, 341, 307, 439, 51088], "temperature": 0.0, "avg_logprob": + -0.22268356244588636, "compression_ratio": 1.526530612244898, "no_speech_prob": + 0.0016933761071413755}, {"id": 251, "seek": 167440, "start": 1689.6000000000001, + "end": 1695.1200000000001, "text": " common sense these days. 
But basically, when + you, as a human, are using an IDE, you open up the", "tokens": [51124, 2689, 2020, + 613, 1708, 13, 583, 1936, 11, 562, 291, 11, 382, 257, 1952, 11, 366, 1228, 364, + 40930, 11, 291, 1269, 493, 264, 51400], "temperature": 0.0, "avg_logprob": -0.22268356244588636, + "compression_ratio": 1.526530612244898, "no_speech_prob": 0.0016933761071413755}, + {"id": 252, "seek": 167440, "start": 1695.1200000000001, "end": 1701.2, "text": + " file you''re working on. But you also open up other files for reference. So, duh, + why don''t we,", "tokens": [51400, 3991, 291, 434, 1364, 322, 13, 583, 291, 611, + 1269, 493, 661, 7098, 337, 6408, 13, 407, 11, 43763, 11, 983, 500, 380, 321, 11, + 51704], "temperature": 0.0, "avg_logprob": -0.22268356244588636, "compression_ratio": + 1.526530612244898, "no_speech_prob": 0.0016933761071413755}, {"id": 253, "seek": + 170120, "start": 1701.2, "end": 1705.44, "text": " you know, do that ourselves. + And the initial implementations of this that, you know, probably got", "tokens": + [50364, 291, 458, 11, 360, 300, 4175, 13, 400, 264, 5883, 4445, 763, 295, 341, 300, + 11, 291, 458, 11, 1391, 658, 50576], "temperature": 0.0, "avg_logprob": -0.1722081164096264, + "compression_ratio": 1.6488888888888888, "no_speech_prob": 0.0063216807320714}, + {"id": 254, "seek": 170120, "start": 1705.44, "end": 1711.04, "text": " not better + at this point. It was simple. It was like, look at the text right around the cursor.", + "tokens": [50576, 406, 1101, 412, 341, 935, 13, 467, 390, 2199, 13, 467, 390, 411, + 11, 574, 412, 264, 2487, 558, 926, 264, 28169, 13, 50856], "temperature": 0.0, "avg_logprob": + -0.1722081164096264, "compression_ratio": 1.6488888888888888, "no_speech_prob": + 0.0063216807320714}, {"id": 255, "seek": 170120, "start": 1711.04, "end": 1718.0800000000002, + "text": " And then search these files for similar text. 
And in your timing, 2048 + token space,", "tokens": [50856, 400, 550, 3164, 613, 7098, 337, 2531, 2487, 13, + 400, 294, 428, 10822, 11, 945, 13318, 14862, 1901, 11, 51208], "temperature": 0.0, + "avg_logprob": -0.1722081164096264, "compression_ratio": 1.6488888888888888, "no_speech_prob": + 0.0063216807320714}, {"id": 256, "seek": 170120, "start": 1718.0800000000002, "end": + 1723.1200000000001, "text": " you have any room for any of these snippets, then + you can chunk other stuff into the context.", "tokens": [51208, 291, 362, 604, 1808, + 337, 604, 295, 613, 35623, 1385, 11, 550, 291, 393, 16635, 661, 1507, 666, 264, + 4319, 13, 51460], "temperature": 0.0, "avg_logprob": -0.1722081164096264, "compression_ratio": + 1.6488888888888888, "no_speech_prob": 0.0063216807320714}, {"id": 257, "seek": 172312, + "start": 1723.6, "end": 1731.84, "text": " You have to be careful how you present + that. You can''t just, you know, have random scraps of text", "tokens": [50388, + 509, 362, 281, 312, 5026, 577, 291, 1974, 300, 13, 509, 393, 380, 445, 11, 291, + 458, 11, 362, 4974, 45204, 295, 2487, 50800], "temperature": 0.0, "avg_logprob": + -0.19006152905915913, "compression_ratio": 1.6891891891891893, "no_speech_prob": + 0.011846904642879963}, {"id": 258, "seek": 172312, "start": 1732.8799999999999, + "end": 1738.1599999999999, "text": " that are like, you know, partial function implementations. + Because that will prime the model to", "tokens": [50852, 300, 366, 411, 11, 291, + 458, 11, 14641, 2445, 4445, 763, 13, 1436, 300, 486, 5835, 264, 2316, 281, 51116], + "temperature": 0.0, "avg_logprob": -0.19006152905915913, "compression_ratio": 1.6891891891891893, + "no_speech_prob": 0.011846904642879963}, {"id": 259, "seek": 172312, "start": 1739.04, + "end": 1743.6799999999998, "text": " implement partial functions. 
Like, it''ll, + you know, it''ll just iterate the same gross pattern.", "tokens": [51160, 4445, + 14641, 6828, 13, 1743, 11, 309, 603, 11, 291, 458, 11, 309, 603, 445, 44497, 264, + 912, 11367, 5102, 13, 51392], "temperature": 0.0, "avg_logprob": -0.19006152905915913, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.011846904642879963}, + {"id": 260, "seek": 172312, "start": 1743.6799999999998, "end": 1750.32, "text": + " It seems above. So you do things that make it look more like code. You say, here + is an", "tokens": [51392, 467, 2544, 3673, 13, 407, 291, 360, 721, 300, 652, 309, + 574, 544, 411, 3089, 13, 509, 584, 11, 510, 307, 364, 51724], "temperature": 0.0, + "avg_logprob": -0.19006152905915913, "compression_ratio": 1.6891891891891893, "no_speech_prob": + 0.011846904642879963}, {"id": 261, "seek": 175032, "start": 1750.32, "end": 1757.28, + "text": " interesting, you skip a code from this file in the comment so that it''s + still, you know,", "tokens": [50364, 1880, 11, 291, 10023, 257, 3089, 490, 341, + 3991, 294, 264, 2871, 370, 300, 309, 311, 920, 11, 291, 458, 11, 50712], "temperature": + 0.0, "avg_logprob": -0.17527279955275515, "compression_ratio": 1.5854700854700854, + "no_speech_prob": 0.0009059800067916512}, {"id": 262, "seek": 175032, "start": 1757.28, + "end": 1764.56, "text": " importantly, so it''s still valid syntax at the end of + it. And voila, the rest of this history,", "tokens": [50712, 8906, 11, 370, 309, + 311, 920, 7363, 28431, 412, 264, 917, 295, 309, 13, 400, 45565, 11, 264, 1472, 295, + 341, 2503, 11, 51076], "temperature": 0.0, "avg_logprob": -0.17527279955275515, + "compression_ratio": 1.5854700854700854, "no_speech_prob": 0.0009059800067916512}, + {"id": 263, "seek": 175032, "start": 1764.56, "end": 1773.4399999999998, "text": + " we came out with a really impactful product that no one had seen anything like + it before. 
And", "tokens": [51076, 321, 1361, 484, 365, 257, 534, 30842, 1674, 300, + 572, 472, 632, 1612, 1340, 411, 309, 949, 13, 400, 51520], "temperature": 0.0, "avg_logprob": + -0.17527279955275515, "compression_ratio": 1.5854700854700854, "no_speech_prob": + 0.0009059800067916512}, {"id": 264, "seek": 175032, "start": 1773.4399999999998, + "end": 1779.6799999999998, "text": " it''s certainly changed the way I code. I''m + much quicker and probably dumber at the same time.", "tokens": [51520, 309, 311, + 3297, 3105, 264, 636, 286, 3089, 13, 286, 478, 709, 16255, 293, 1391, 274, 4182, + 412, 264, 912, 565, 13, 51832], "temperature": 0.0, "avg_logprob": -0.17527279955275515, + "compression_ratio": 1.5854700854700854, "no_speech_prob": 0.0009059800067916512}, + {"id": 265, "seek": 178032, "start": 1781.12, "end": 1784.3999999999999, "text": + " Yeah, it''s been an interesting experience.", "tokens": [50404, 865, 11, 309, + 311, 668, 364, 1880, 1752, 13, 50568], "temperature": 0.0, "avg_logprob": -0.2303636683974155, + "compression_ratio": 1.6515151515151516, "no_speech_prob": 0.028755618259310722}, + {"id": 266, "seek": 178032, "start": 1784.3999999999999, "end": 1790.96, "text": + " Oh, maybe more smart because you get to do more things, right? 
Like you can, I + guess you can,", "tokens": [50568, 876, 11, 1310, 544, 4069, 570, 291, 483, 281, + 360, 544, 721, 11, 558, 30, 1743, 291, 393, 11, 286, 2041, 291, 393, 11, 50896], + "temperature": 0.0, "avg_logprob": -0.2303636683974155, "compression_ratio": 1.6515151515151516, + "no_speech_prob": 0.028755618259310722}, {"id": 267, "seek": 178032, "start": 1790.96, + "end": 1796.72, "text": " you can get hired, like you can achieve, you know, larger + heights and then, like experiment", "tokens": [50896, 291, 393, 483, 13144, 11, + 411, 291, 393, 4584, 11, 291, 458, 11, 4833, 25930, 293, 550, 11, 411, 5120, 51184], + "temperature": 0.0, "avg_logprob": -0.2303636683974155, "compression_ratio": 1.6515151515151516, + "no_speech_prob": 0.028755618259310722}, {"id": 268, "seek": 178032, "start": 1796.72, + "end": 1802.56, "text": " way, you need to experiment, right? And not where it feels + maybe more mundane. As long as the code", "tokens": [51184, 636, 11, 291, 643, 281, + 5120, 11, 558, 30, 400, 406, 689, 309, 3417, 1310, 544, 43497, 13, 1018, 938, 382, + 264, 3089, 51476], "temperature": 0.0, "avg_logprob": -0.2303636683974155, "compression_ratio": + 1.6515151515151516, "no_speech_prob": 0.028755618259310722}, {"id": 269, "seek": + 180256, "start": 1802.56, "end": 1808.8799999999999, "text": " works and like, I + don''t know, there are no security holes in it and stuff like that, which", "tokens": + [50364, 1985, 293, 411, 11, 286, 500, 380, 458, 11, 456, 366, 572, 3825, 8118, 294, + 309, 293, 1507, 411, 300, 11, 597, 50680], "temperature": 0.0, "avg_logprob": -0.14798387568047705, + "compression_ratio": 1.5416666666666667, "no_speech_prob": 0.1485767811536789}, + {"id": 270, "seek": 180256, "start": 1809.9199999999998, "end": 1814.8, "text": + " would need to be checked separately, I guess. Anyway, that''s very interesting. 
+ But to close", "tokens": [50732, 576, 643, 281, 312, 10033, 14759, 11, 286, 2041, + 13, 5684, 11, 300, 311, 588, 1880, 13, 583, 281, 1998, 50976], "temperature": 0.0, + "avg_logprob": -0.14798387568047705, "compression_ratio": 1.5416666666666667, "no_speech_prob": + 0.1485767811536789}, {"id": 271, "seek": 180256, "start": 1814.8, "end": 1820.0, + "text": " up the loop there, like I''m just trying to understand, you said you focused + on keyword search,", "tokens": [50976, 493, 264, 6367, 456, 11, 411, 286, 478, 445, + 1382, 281, 1223, 11, 291, 848, 291, 5178, 322, 20428, 3164, 11, 51236], "temperature": + 0.0, "avg_logprob": -0.14798387568047705, "compression_ratio": 1.5416666666666667, + "no_speech_prob": 0.1485767811536789}, {"id": 272, "seek": 180256, "start": 1820.0, + "end": 1825.2, "text": " right? So you, you owned the elastic search sort of pipeline. + Can you, if you''re comfortable", "tokens": [51236, 558, 30, 407, 291, 11, 291, + 11684, 264, 17115, 3164, 1333, 295, 15517, 13, 1664, 291, 11, 498, 291, 434, 4619, + 51496], "temperature": 0.0, "avg_logprob": -0.14798387568047705, "compression_ratio": + 1.5416666666666667, "no_speech_prob": 0.1485767811536789}, {"id": 273, "seek": 182520, + "start": 1825.2, "end": 1834.8, "text": " disclosing that, like, would that index + the visible code in the ID somehow so that you can,", "tokens": [50364, 17092, 6110, + 300, 11, 411, 11, 576, 300, 8186, 264, 8974, 3089, 294, 264, 7348, 6063, 370, 300, + 291, 393, 11, 50844], "temperature": 0.0, "avg_logprob": -0.21798915247763356, "compression_ratio": + 1.56, "no_speech_prob": 0.0045714848674833775}, {"id": 274, "seek": 182520, "start": + 1834.8, "end": 1838.24, "text": " or what was the role of that in the whole chain, + hope, pipeline?", "tokens": [50844, 420, 437, 390, 264, 3090, 295, 300, 294, 264, + 1379, 5021, 11, 1454, 11, 15517, 30, 51016], "temperature": 0.0, "avg_logprob": + -0.21798915247763356, "compression_ratio": 1.56, "no_speech_prob": 
0.0045714848674833775}, + {"id": 275, "seek": 182520, "start": 1840.8, "end": 1846.48, "text": " You''re asking + a lot of questions that don''t quite seek well on my actual experience. Let me see,", + "tokens": [51144, 509, 434, 3365, 257, 688, 295, 1651, 300, 500, 380, 1596, 8075, + 731, 322, 452, 3539, 1752, 13, 961, 385, 536, 11, 51428], "temperature": 0.0, "avg_logprob": + -0.21798915247763356, "compression_ratio": 1.56, "no_speech_prob": 0.0045714848674833775}, + {"id": 276, "seek": 182520, "start": 1846.48, "end": 1851.6000000000001, "text": + " if I can take your question and you take it just a little bit. When I came to + GitHub, I worked on", "tokens": [51428, 498, 286, 393, 747, 428, 1168, 293, 291, + 747, 309, 445, 257, 707, 857, 13, 1133, 286, 1361, 281, 23331, 11, 286, 2732, 322, + 51684], "temperature": 0.0, "avg_logprob": -0.21798915247763356, "compression_ratio": + 1.56, "no_speech_prob": 0.0045714848674833775}, {"id": 277, "seek": 185160, "start": + 1851.6, "end": 1858.24, "text": " code search, which was keyword, like, school search + for the entire code corpus.", "tokens": [50364, 3089, 3164, 11, 597, 390, 20428, + 11, 411, 11, 1395, 3164, 337, 264, 2302, 3089, 1181, 31624, 13, 50696], "temperature": + 0.0, "avg_logprob": -0.17702709544788708, "compression_ratio": 1.662037037037037, + "no_speech_prob": 0.0016760448925197124}, {"id": 278, "seek": 185160, "start": 1859.76, + "end": 1865.28, "text": " And that was really cool work. But that has since moved + to that, they''ve rebuilt the", "tokens": [50772, 400, 300, 390, 534, 1627, 589, + 13, 583, 300, 575, 1670, 4259, 281, 300, 11, 436, 600, 38532, 264, 51048], "temperature": + 0.0, "avg_logprob": -0.17702709544788708, "compression_ratio": 1.662037037037037, + "no_speech_prob": 0.0016760448925197124}, {"id": 279, "seek": 185160, "start": 1865.28, + "end": 1873.28, "text": " whole system yet again. 
And it''s a really amazing engine, + the proprietary engine that''s effectively", "tokens": [51048, 1379, 1185, 1939, + 797, 13, 400, 309, 311, 257, 534, 2243, 2848, 11, 264, 38992, 2848, 300, 311, 8659, + 51448], "temperature": 0.0, "avg_logprob": -0.17702709544788708, "compression_ratio": + 1.662037037037037, "no_speech_prob": 0.0016760448925197124}, {"id": 280, "seek": + 185160, "start": 1874.0, "end": 1880.8799999999999, "text": " grip at fantastically + massive scale. But that said, that code engine, the one that I built in,", "tokens": + [51484, 12007, 412, 4115, 22808, 5994, 4373, 13, 583, 300, 848, 11, 300, 3089, 2848, + 11, 264, 472, 300, 286, 3094, 294, 11, 51828], "temperature": 0.0, "avg_logprob": + -0.17702709544788708, "compression_ratio": 1.662037037037037, "no_speech_prob": + 0.0016760448925197124}, {"id": 281, "seek": 188088, "start": 1880.88, "end": 1887.1200000000001, + "text": " even the one that came after it, are not the things that are most beneficial + for some of the", "tokens": [50364, 754, 264, 472, 300, 1361, 934, 309, 11, 366, + 406, 264, 721, 300, 366, 881, 14072, 337, 512, 295, 264, 50676], "temperature": + 0.0, "avg_logprob": -0.2272092718827097, "compression_ratio": 1.5921052631578947, + "no_speech_prob": 0.00038533390033990145}, {"id": 282, "seek": 188088, "start": + 1887.1200000000001, "end": 1891.6000000000001, "text": " applications that KhoPy + that has in the editor. 
And they do different things for that.", "tokens": [50676, + 5821, 300, 591, 1289, 47, 88, 300, 575, 294, 264, 9839, 13, 400, 436, 360, 819, + 721, 337, 300, 13, 50900], "temperature": 0.0, "avg_logprob": -0.2272092718827097, + "compression_ratio": 1.5921052631578947, "no_speech_prob": 0.00038533390033990145}, + {"id": 283, "seek": 188088, "start": 1892.96, "end": 1901.5200000000002, "text": + " They''re, for example, if you''re on the web app side, there are things, now I + need even in the", "tokens": [50968, 814, 434, 11, 337, 1365, 11, 498, 291, 434, + 322, 264, 3670, 724, 1252, 11, 456, 366, 721, 11, 586, 286, 643, 754, 294, 264, + 51396], "temperature": 0.0, "avg_logprob": -0.2272092718827097, "compression_ratio": + 1.5921052631578947, "no_speech_prob": 0.00038533390033990145}, {"id": 284, "seek": + 188088, "start": 1901.5200000000002, "end": 1908.48, "text": " ID, I''m remembering + stuff from six months ago, they do just in time like vector embedding", "tokens": + [51396, 7348, 11, 286, 478, 20719, 1507, 490, 2309, 2493, 2057, 11, 436, 360, 445, + 294, 565, 411, 8062, 12240, 3584, 51744], "temperature": 0.0, "avg_logprob": -0.2272092718827097, + "compression_ratio": 1.5921052631578947, "no_speech_prob": 0.00038533390033990145}, + {"id": 285, "seek": 190848, "start": 1908.48, "end": 1915.84, "text": " vector storage + and stuff. Vectors are a lot better for certain types of code search where you''re", + "tokens": [50364, 8062, 6725, 293, 1507, 13, 691, 557, 830, 366, 257, 688, 1101, + 337, 1629, 3467, 295, 3089, 3164, 689, 291, 434, 50732], "temperature": 0.0, "avg_logprob": + -0.15836210250854493, "compression_ratio": 1.7567567567567568, "no_speech_prob": + 0.001108266762457788}, {"id": 286, "seek": 190848, "start": 1915.84, "end": 1922.16, + "text": " finding code that is about something. 
Whereas, lexical search is a lot + better when you''re finding code", "tokens": [50732, 5006, 3089, 300, 307, 466, + 746, 13, 13813, 11, 476, 87, 804, 3164, 307, 257, 688, 1101, 562, 291, 434, 5006, + 3089, 51048], "temperature": 0.0, "avg_logprob": -0.15836210250854493, "compression_ratio": + 1.7567567567567568, "no_speech_prob": 0.001108266762457788}, {"id": 287, "seek": + 190848, "start": 1922.16, "end": 1929.92, "text": " that matches this exact string. + And I think everyone in code outside of code, everyone everywhere", "tokens": [51048, + 300, 10676, 341, 1900, 6798, 13, 400, 286, 519, 1518, 294, 3089, 2380, 295, 3089, + 11, 1518, 5315, 51436], "temperature": 0.0, "avg_logprob": -0.15836210250854493, + "compression_ratio": 1.7567567567567568, "no_speech_prob": 0.001108266762457788}, + {"id": 288, "seek": 190848, "start": 1929.92, "end": 1936.48, "text": " is still + kind of wrestling with this. There''s no one data structure that does all that stuff", + "tokens": [51436, 307, 920, 733, 295, 19274, 365, 341, 13, 821, 311, 572, 472, 1412, + 3877, 300, 775, 439, 300, 1507, 51764], "temperature": 0.0, "avg_logprob": -0.15836210250854493, + "compression_ratio": 1.7567567567567568, "no_speech_prob": 0.001108266762457788}, + {"id": 289, "seek": 193648, "start": 1936.56, "end": 1942.8, "text": " ideally. 
+ And I think we were wrestling with that inside KhoPylet as well.", "tokens": [50368, + 22915, 13, 400, 286, 519, 321, 645, 19274, 365, 300, 1854, 591, 1289, 47, 88, 2631, + 382, 731, 13, 50680], "temperature": 0.0, "avg_logprob": -0.19335080218571488, "compression_ratio": + 1.5598290598290598, "no_speech_prob": 0.02517448365688324}, {"id": 290, "seek": + 193648, "start": 1943.52, "end": 1947.28, "text": " Yeah, but I guess, yeah, I understood + your point and I probably missed that in your explanation", "tokens": [50716, 865, + 11, 457, 286, 2041, 11, 1338, 11, 286, 7320, 428, 935, 293, 286, 1391, 6721, 300, + 294, 428, 10835, 50904], "temperature": 0.0, "avg_logprob": -0.19335080218571488, + "compression_ratio": 1.5598290598290598, "no_speech_prob": 0.02517448365688324}, + {"id": 291, "seek": 193648, "start": 1947.28, "end": 1952.08, "text": " that you + worked on code search and not on the generation. That''s why in code search, you + did use", "tokens": [50904, 300, 291, 2732, 322, 3089, 3164, 293, 406, 322, 264, + 5125, 13, 663, 311, 983, 294, 3089, 3164, 11, 291, 630, 764, 51144], "temperature": + 0.0, "avg_logprob": -0.19335080218571488, "compression_ratio": 1.5598290598290598, + "no_speech_prob": 0.02517448365688324}, {"id": 292, "seek": 193648, "start": 1952.08, + "end": 1958.0, "text": " the elastic search index. 
But like what I was imagining + and I''m completely clueless in this topic,", "tokens": [51144, 264, 17115, 3164, + 8186, 13, 583, 411, 437, 286, 390, 27798, 293, 286, 478, 2584, 596, 3483, 442, 294, + 341, 4829, 11, 51440], "temperature": 0.0, "avg_logprob": -0.19335080218571488, + "compression_ratio": 1.5598290598290598, "no_speech_prob": 0.02517448365688324}, + {"id": 293, "seek": 195800, "start": 1958.0, "end": 1967.52, "text": " is that by + the virtue of LLM being trained on bunch of code, let''s say open source code that", + "tokens": [50364, 307, 300, 538, 264, 20816, 295, 441, 43, 44, 885, 8895, 322, 3840, + 295, 3089, 11, 718, 311, 584, 1269, 4009, 3089, 300, 50840], "temperature": 0.0, + "avg_logprob": -0.13879903625039494, "compression_ratio": 1.588235294117647, "no_speech_prob": + 0.0033970868680626154}, {"id": 294, "seek": 195800, "start": 1967.52, "end": 1975.44, + "text": " you can train on license wise, if the user is asking something that reminds + the code that had", "tokens": [50840, 291, 393, 3847, 322, 10476, 10829, 11, 498, + 264, 4195, 307, 3365, 746, 300, 12025, 264, 3089, 300, 632, 51236], "temperature": + 0.0, "avg_logprob": -0.13879903625039494, "compression_ratio": 1.588235294117647, + "no_speech_prob": 0.0033970868680626154}, {"id": 295, "seek": 195800, "start": 1975.44, + "end": 1984.24, "text": " written before, wouldn''t it make sense to try to find + that code and kind of somehow", "tokens": [51236, 3720, 949, 11, 2759, 380, 309, + 652, 2020, 281, 853, 281, 915, 300, 3089, 293, 733, 295, 6063, 51676], "temperature": + 0.0, "avg_logprob": -0.13879903625039494, "compression_ratio": 1.588235294117647, + "no_speech_prob": 0.0033970868680626154}, {"id": 296, "seek": 198424, "start": 1984.8, + "end": 1993.44, "text": " you know, rag on it with LLM or is it completely different + than how you did it?", "tokens": [50392, 291, 458, 11, 17539, 322, 309, 365, 441, + 43, 44, 420, 307, 309, 2584, 819, 813, 577, 291, 630, 309, 30, 50824], 
"temperature": + 0.0, "avg_logprob": -0.38978754679361977, "compression_ratio": 1.4673913043478262, + "no_speech_prob": 0.03426869958639145}, {"id": 297, "seek": 198424, "start": 1996.0, + "end": 2003.92, "text": " The like at this point, we''ve moved to much, it''s you + know, as of May my left, they''ve moved to", "tokens": [50952, 440, 411, 412, 341, + 935, 11, 321, 600, 4259, 281, 709, 11, 309, 311, 291, 458, 11, 382, 295, 1891, 452, + 1411, 11, 436, 600, 4259, 281, 51348], "temperature": 0.0, "avg_logprob": -0.38978754679361977, + "compression_ratio": 1.4673913043478262, "no_speech_prob": 0.03426869958639145}, + {"id": 298, "seek": 198424, "start": 2003.92, "end": 2012.32, "text": " much larger + models. And then the models themselves have read not only all the code and GitHub,", + "tokens": [51348, 709, 4833, 5245, 13, 400, 550, 264, 5245, 2969, 362, 1401, 406, + 787, 439, 264, 3089, 293, 23331, 11, 51768], "temperature": 0.0, "avg_logprob": + -0.38978754679361977, "compression_ratio": 1.4673913043478262, "no_speech_prob": + 0.03426869958639145}, {"id": 299, "seek": 201232, "start": 2012.32, "end": 2017.76, + "text": " but also it''s read the internet five times or something. So they read + all the blog posts about code.", "tokens": [50364, 457, 611, 309, 311, 1401, 264, + 4705, 1732, 1413, 420, 746, 13, 407, 436, 1401, 439, 264, 6968, 12300, 466, 3089, + 13, 50636], "temperature": 0.0, "avg_logprob": -0.19740371704101561, "compression_ratio": + 1.7857142857142858, "no_speech_prob": 0.0029065452981740236}, {"id": 300, "seek": + 201232, "start": 2020.24, "end": 2026.0, "text": " It''s amazing, right? It''s what + times you live in. 
So whenever you''re typing something and it kind", "tokens": + [50760, 467, 311, 2243, 11, 558, 30, 467, 311, 437, 1413, 291, 1621, 294, 13, 407, + 5699, 291, 434, 18444, 746, 293, 309, 733, 51048], "temperature": 0.0, "avg_logprob": + -0.19740371704101561, "compression_ratio": 1.7857142857142858, "no_speech_prob": + 0.0029065452981740236}, {"id": 301, "seek": 201232, "start": 2026.0, "end": 2033.6, + "text": " of smells like something it''s a thing before, it doesn''t, it doesn''t + necessarily need rag to go get", "tokens": [51048, 295, 10036, 411, 746, 309, 311, + 257, 551, 949, 11, 309, 1177, 380, 11, 309, 1177, 380, 4725, 643, 17539, 281, 352, + 483, 51428], "temperature": 0.0, "avg_logprob": -0.19740371704101561, "compression_ratio": + 1.7857142857142858, "no_speech_prob": 0.0029065452981740236}, {"id": 302, "seek": + 201232, "start": 2033.6, "end": 2037.6799999999998, "text": " you know, common motifs, + common, you know, here''s what you''re doing, here''s what I think you''re doing", + "tokens": [51428, 291, 458, 11, 2689, 2184, 18290, 11, 2689, 11, 291, 458, 11, 510, + 311, 437, 291, 434, 884, 11, 510, 311, 437, 286, 519, 291, 434, 884, 51632], "temperature": + 0.0, "avg_logprob": -0.19740371704101561, "compression_ratio": 1.7857142857142858, + "no_speech_prob": 0.0029065452981740236}, {"id": 303, "seek": 203768, "start": 2037.76, + "end": 2042.4, "text": " a code right now and it can piece code together from all + the code it''s ever learned from and extract", "tokens": [50368, 257, 3089, 558, + 586, 293, 309, 393, 2522, 3089, 1214, 490, 439, 264, 3089, 309, 311, 1562, 3264, + 490, 293, 8947, 50600], "temperature": 0.0, "avg_logprob": -0.1648193935178361, + "compression_ratio": 1.6752136752136753, "no_speech_prob": 0.0017650466179475188}, + {"id": 304, "seek": 203768, "start": 2042.4, "end": 2051.36, "text": " late outside + of it. 
But if it is and you know, this is me talking about how maybe I would build + a", "tokens": [50600, 3469, 2380, 295, 309, 13, 583, 498, 309, 307, 293, 291, 458, + 11, 341, 307, 385, 1417, 466, 577, 1310, 286, 576, 1322, 257, 51048], "temperature": + 0.0, "avg_logprob": -0.1648193935178361, "compression_ratio": 1.6752136752136753, + "no_speech_prob": 0.0017650466179475188}, {"id": 305, "seek": 203768, "start": 2051.36, + "end": 2059.04, "text": " co-pilot. At some point I guess, you know, you need to + see if it''s if the user''s typing code that", "tokens": [51048, 598, 12, 79, 31516, + 13, 1711, 512, 935, 286, 2041, 11, 291, 458, 11, 291, 643, 281, 536, 498, 309, 311, + 498, 264, 4195, 311, 18444, 3089, 300, 51432], "temperature": 0.0, "avg_logprob": + -0.1648193935178361, "compression_ratio": 1.6752136752136753, "no_speech_prob": + 0.0017650466179475188}, {"id": 306, "seek": 203768, "start": 2059.04, "end": 2065.6800000000003, + "text": " is so similar to code in this code base that it''s worth bringing it in. + And we kind of did that", "tokens": [51432, 307, 370, 2531, 281, 3089, 294, 341, + 3089, 3096, 300, 309, 311, 3163, 5062, 309, 294, 13, 400, 321, 733, 295, 630, 300, + 51764], "temperature": 0.0, "avg_logprob": -0.1648193935178361, "compression_ratio": + 1.6752136752136753, "no_speech_prob": 0.0017650466179475188}, {"id": 307, "seek": + 206568, "start": 2065.7599999999998, "end": 2070.8799999999997, "text": " in a rudimentary + way with the neighboring tabs. You''ve already got the tabs open. And that ended", + "tokens": [50368, 294, 257, 32109, 2328, 822, 636, 365, 264, 31521, 20743, 13, 509, + 600, 1217, 658, 264, 20743, 1269, 13, 400, 300, 4590, 50624], "temperature": 0.0, + "avg_logprob": -0.1338464135993017, "compression_ratio": 1.5235602094240839, "no_speech_prob": + 0.0007026331732049584}, {"id": 308, "seek": 206568, "start": 2070.8799999999997, + "end": 2081.2, "text": " up being super useful. 
I think there''s probably a kind + of a decreased efficacy, there''s work for this,", "tokens": [50624, 493, 885, 1687, + 4420, 13, 286, 519, 456, 311, 1391, 257, 733, 295, 257, 24436, 33492, 11, 456, 311, + 589, 337, 341, 11, 51140], "temperature": 0.0, "avg_logprob": -0.1338464135993017, + "compression_ratio": 1.5235602094240839, "no_speech_prob": 0.0007026331732049584}, + {"id": 309, "seek": 206568, "start": 2083.2, "end": 2090.3999999999996, "text": + " where if you''re doing a rag search over the entire code base, probably the code + that you''re", "tokens": [51240, 689, 498, 291, 434, 884, 257, 17539, 3164, 670, + 264, 2302, 3089, 3096, 11, 1391, 264, 3089, 300, 291, 434, 51600], "temperature": + 0.0, "avg_logprob": -0.1338464135993017, "compression_ratio": 1.5235602094240839, + "no_speech_prob": 0.0007026331732049584}, {"id": 310, "seek": 209040, "start": 2090.4, + "end": 2095.92, "text": " going to find is already code that''s open in the tabs + right beside them. So maybe it''s useful to", "tokens": [50364, 516, 281, 915, 307, + 1217, 3089, 300, 311, 1269, 294, 264, 20743, 558, 15726, 552, 13, 407, 1310, 309, + 311, 4420, 281, 50640], "temperature": 0.0, "avg_logprob": -0.19042962096458257, + "compression_ratio": 1.4851485148514851, "no_speech_prob": 0.007279460318386555}, + {"id": 311, "seek": 209040, "start": 2095.92, "end": 2105.76, "text": " do that + maybe it''s not. But I don''t know. Yeah, interesting. I think code is like, as + you said, it''s the", "tokens": [50640, 360, 300, 1310, 309, 311, 406, 13, 583, + 286, 500, 380, 458, 13, 865, 11, 1880, 13, 286, 519, 3089, 307, 411, 11, 382, 291, + 848, 11, 309, 311, 264, 51132], "temperature": 0.0, "avg_logprob": -0.19042962096458257, + "compression_ratio": 1.4851485148514851, "no_speech_prob": 0.007279460318386555}, + {"id": 312, "seek": 209040, "start": 2105.76, "end": 2112.64, "text": " first successful + LLM application. Probably some companies will say, no, no, no, Dr. 
Boog''s was the", + "tokens": [51132, 700, 4406, 441, 43, 44, 3861, 13, 9210, 512, 3431, 486, 584, 11, + 572, 11, 572, 11, 572, 11, 2491, 13, 3286, 664, 311, 390, 264, 51476], "temperature": + 0.0, "avg_logprob": -0.19042962096458257, "compression_ratio": 1.4851485148514851, + "no_speech_prob": 0.007279460318386555}, {"id": 313, "seek": 211264, "start": 2112.64, + "end": 2120.08, "text": " first successful LLM application. But I, but I, there + were some, maybe it was the first successful", "tokens": [50364, 700, 4406, 441, + 43, 44, 3861, 13, 583, 286, 11, 457, 286, 11, 456, 645, 512, 11, 1310, 309, 390, + 264, 700, 4406, 50736], "temperature": 0.0, "avg_logprob": -0.2961770645295731, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.012393930926918983}, + {"id": 314, "seek": 211264, "start": 2120.08, "end": 2125.92, "text": " neural search + application. And then co-pilot was the first LLM application, successful LLM application.", + "tokens": [50736, 18161, 3164, 3861, 13, 400, 550, 598, 12, 79, 31516, 390, 264, + 700, 441, 43, 44, 3861, 11, 4406, 441, 43, 44, 3861, 13, 51028], "temperature": + 0.0, "avg_logprob": -0.2961770645295731, "compression_ratio": 1.8333333333333333, + "no_speech_prob": 0.012393930926918983}, {"id": 315, "seek": 211264, "start": 2125.92, + "end": 2133.6, "text": " And there''s plan nine. 
Yeah, there was another company + that was out there actually before us,", "tokens": [51028, 400, 456, 311, 1393, + 4949, 13, 865, 11, 456, 390, 1071, 2237, 300, 390, 484, 456, 767, 949, 505, 11, + 51412], "temperature": 0.0, "avg_logprob": -0.2961770645295731, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.012393930926918983}, {"id": 316, "seek": + 211264, "start": 2133.6, "end": 2138.24, "text": " but they just didn''t have quite + the same, they weren''t only my Microsoft at the time, that probably", "tokens": + [51412, 457, 436, 445, 994, 380, 362, 1596, 264, 912, 11, 436, 4999, 380, 787, 452, + 8116, 412, 264, 565, 11, 300, 1391, 51644], "temperature": 0.0, "avg_logprob": -0.2961770645295731, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.012393930926918983}, + {"id": 317, "seek": 213824, "start": 2138.3199999999997, "end": 2148.56, "text": + " helped a bit. Yeah, budget wise. I''m guessing. Yeah. Yeah, but I still, I still + feel like it feels like", "tokens": [50368, 4254, 257, 857, 13, 865, 11, 4706, 10829, + 13, 286, 478, 17939, 13, 865, 13, 865, 11, 457, 286, 920, 11, 286, 920, 841, 411, + 309, 3417, 411, 50880], "temperature": 0.0, "avg_logprob": -0.22762206582462086, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.03456605225801468}, + {"id": 318, "seek": 213824, "start": 2148.56, "end": 2156.3199999999997, "text": + " magic, right? Like, judgey PT also felt magic and scary in the beginning. 
Like + when I saw it for the", "tokens": [50880, 5585, 11, 558, 30, 1743, 11, 6995, 88, + 35460, 611, 2762, 5585, 293, 6958, 294, 264, 2863, 13, 1743, 562, 286, 1866, 309, + 337, 264, 51268], "temperature": 0.0, "avg_logprob": -0.22762206582462086, "compression_ratio": + 1.5555555555555556, "no_speech_prob": 0.03456605225801468}, {"id": 319, "seek": + 213824, "start": 2156.3199999999997, "end": 2162.24, "text": " first time and I + saw it produce code, I thought that my job is done, even though I was not a programmer", + "tokens": [51268, 700, 565, 293, 286, 1866, 309, 5258, 3089, 11, 286, 1194, 300, + 452, 1691, 307, 1096, 11, 754, 1673, 286, 390, 406, 257, 32116, 51564], "temperature": + 0.0, "avg_logprob": -0.22762206582462086, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.03456605225801468}, {"id": 320, "seek": 216224, "start": 2162.3199999999997, + "end": 2170.9599999999996, "text": " anymore by then, but I felt the existential, + well, not crisis of fear that basically many of us,", "tokens": [50368, 3602, 538, + 550, 11, 457, 286, 2762, 264, 37133, 11, 731, 11, 406, 5869, 295, 4240, 300, 1936, + 867, 295, 505, 11, 50800], "temperature": 0.0, "avg_logprob": -0.16170352113013173, + "compression_ratio": 1.6125, "no_speech_prob": 0.02611222118139267}, {"id": 321, + "seek": 216224, "start": 2171.6, "end": 2178.0, "text": " and especially junior + developers, like probably not needed anymore. But then as I was, you know,", "tokens": + [50832, 293, 2318, 16195, 8849, 11, 411, 1391, 406, 2978, 3602, 13, 583, 550, 382, + 286, 390, 11, 291, 458, 11, 51152], "temperature": 0.0, "avg_logprob": -0.16170352113013173, + "compression_ratio": 1.6125, "no_speech_prob": 0.02611222118139267}, {"id": 322, + "seek": 216224, "start": 2178.0, "end": 2184.9599999999996, "text": " overcoming + my fear and I was like, now let me try this thing. It''s probably a toy. 
I found + some,", "tokens": [51152, 38047, 452, 4240, 293, 286, 390, 411, 11, 586, 718, 385, + 853, 341, 551, 13, 467, 311, 1391, 257, 12058, 13, 286, 1352, 512, 11, 51500], "temperature": + 0.0, "avg_logprob": -0.16170352113013173, "compression_ratio": 1.6125, "no_speech_prob": + 0.02611222118139267}, {"id": 323, "seek": 216224, "start": 2184.9599999999996, "end": + 2189.04, "text": " what I explained, you know, some edge cases, which just doesn''t + work. It goes in loops. And so I", "tokens": [51500, 437, 286, 8825, 11, 291, 458, + 11, 512, 4691, 3331, 11, 597, 445, 1177, 380, 589, 13, 467, 1709, 294, 16121, 13, + 400, 370, 286, 51704], "temperature": 0.0, "avg_logprob": -0.16170352113013173, + "compression_ratio": 1.6125, "no_speech_prob": 0.02611222118139267}, {"id": 324, + "seek": 218904, "start": 2189.04, "end": 2195.2, "text": " was like, okay, it seems + like another tool under my belt. So I better master it and not,", "tokens": [50364, + 390, 411, 11, 1392, 11, 309, 2544, 411, 1071, 2290, 833, 452, 10750, 13, 407, 286, + 1101, 4505, 309, 293, 406, 11, 50672], "temperature": 0.0, "avg_logprob": -0.16682620578342014, + "compression_ratio": 1.668141592920354, "no_speech_prob": 0.01401131134480238}, + {"id": 325, "seek": 218904, "start": 2196.32, "end": 2204.96, "text": " you know, + walk away from it. 
But the code generation still feels like magic because you can + explain,", "tokens": [50728, 291, 458, 11, 1792, 1314, 490, 309, 13, 583, 264, 3089, + 5125, 920, 3417, 411, 5585, 570, 291, 393, 2903, 11, 51160], "temperature": 0.0, + "avg_logprob": -0.16682620578342014, "compression_ratio": 1.668141592920354, "no_speech_prob": + 0.01401131134480238}, {"id": 326, "seek": 218904, "start": 2204.96, "end": 2210.24, + "text": " like you can use tap tab and like on a method signature complete, complete + something or on the", "tokens": [51160, 411, 291, 393, 764, 5119, 4421, 293, 411, + 322, 257, 3170, 13397, 3566, 11, 3566, 746, 420, 322, 264, 51424], "temperature": + 0.0, "avg_logprob": -0.16682620578342014, "compression_ratio": 1.668141592920354, + "no_speech_prob": 0.01401131134480238}, {"id": 327, "seek": 218904, "start": 2210.24, + "end": 2214.72, "text": " comment complete something, but you could also write natural + language, right? You could say,", "tokens": [51424, 2871, 3566, 746, 11, 457, 291, + 727, 611, 2464, 3303, 2856, 11, 558, 30, 509, 727, 584, 11, 51648], "temperature": + 0.0, "avg_logprob": -0.16682620578342014, "compression_ratio": 1.668141592920354, + "no_speech_prob": 0.01401131134480238}, {"id": 328, "seek": 221472, "start": 2215.3599999999997, + "end": 2220.64, "text": " generate test cases for me or something like that, right? + And then it will understand it and", "tokens": [50396, 8460, 1500, 3331, 337, 385, + 420, 746, 411, 300, 11, 558, 30, 400, 550, 309, 486, 1223, 309, 293, 50660], "temperature": + 0.0, "avg_logprob": -0.1816089201946648, "compression_ratio": 1.7422222222222221, + "no_speech_prob": 0.005565757397562265}, {"id": 329, "seek": 221472, "start": 2220.64, + "end": 2226.9599999999996, "text": " will read your code and will reason about it + and produce the test cases. 
I mean, that feels really", "tokens": [50660, 486, 1401, + 428, 3089, 293, 486, 1778, 466, 309, 293, 5258, 264, 1500, 3331, 13, 286, 914, 11, + 300, 3417, 534, 50976], "temperature": 0.0, "avg_logprob": -0.1816089201946648, + "compression_ratio": 1.7422222222222221, "no_speech_prob": 0.005565757397562265}, + {"id": 330, "seek": 221472, "start": 2226.9599999999996, "end": 2234.24, "text": + " magical. It''s the time we''re wandering into right now is going to feel like + magic for a while until", "tokens": [50976, 12066, 13, 467, 311, 264, 565, 321, + 434, 26396, 666, 558, 586, 307, 516, 281, 841, 411, 5585, 337, 257, 1339, 1826, + 51340], "temperature": 0.0, "avg_logprob": -0.1816089201946648, "compression_ratio": + 1.7422222222222221, "no_speech_prob": 0.005565757397562265}, {"id": 331, "seek": + 221472, "start": 2234.24, "end": 2239.12, "text": " we''ve got to get used to the + exponent, it''s just going to keep going up and going up more, going up.", "tokens": + [51340, 321, 600, 658, 281, 483, 1143, 281, 264, 37871, 11, 309, 311, 445, 516, + 281, 1066, 516, 493, 293, 516, 493, 544, 11, 516, 493, 13, 51584], "temperature": + 0.0, "avg_logprob": -0.1816089201946648, "compression_ratio": 1.7422222222222221, + "no_speech_prob": 0.005565757397562265}, {"id": 332, "seek": 223912, "start": 2239.6, + "end": 2247.6, "text": " But you know, I''ve had those existential pains myself, + but then I realized when I start using", "tokens": [50388, 583, 291, 458, 11, 286, + 600, 632, 729, 37133, 29774, 2059, 11, 457, 550, 286, 5334, 562, 286, 722, 1228, + 50788], "temperature": 0.0, "avg_logprob": -0.1469578656283292, "compression_ratio": + 1.7241379310344827, "no_speech_prob": 0.006377519108355045}, {"id": 333, "seek": + 223912, "start": 2247.6, "end": 2255.12, "text": " these new tools the way that + they want me to use them, I have superpowers. 
I think what we''re actually,", "tokens": + [50788, 613, 777, 3873, 264, 636, 300, 436, 528, 385, 281, 764, 552, 11, 286, 362, + 1687, 47953, 13, 286, 519, 437, 321, 434, 767, 11, 51164], "temperature": 0.0, "avg_logprob": + -0.1469578656283292, "compression_ratio": 1.7241379310344827, "no_speech_prob": + 0.006377519108355045}, {"id": 334, "seek": 223912, "start": 2255.12, "end": 2259.8399999999997, + "text": " you got to have the right mindset. If your mindset is like, oh, my cobalt + job is over, you might be", "tokens": [51164, 291, 658, 281, 362, 264, 558, 12543, + 13, 759, 428, 12543, 307, 411, 11, 1954, 11, 452, 598, 2645, 83, 1691, 307, 670, + 11, 291, 1062, 312, 51400], "temperature": 0.0, "avg_logprob": -0.1469578656283292, + "compression_ratio": 1.7241379310344827, "no_speech_prob": 0.006377519108355045}, + {"id": 335, "seek": 223912, "start": 2259.8399999999997, "end": 2267.04, "text": + " right, your cobalt job is probably over. But if your mindset is like, oh, wow, + I can do things I never", "tokens": [51400, 558, 11, 428, 598, 2645, 83, 1691, 307, + 1391, 670, 13, 583, 498, 428, 12543, 307, 411, 11, 1954, 11, 6076, 11, 286, 393, + 360, 721, 286, 1128, 51760], "temperature": 0.0, "avg_logprob": -0.1469578656283292, + "compression_ratio": 1.7241379310344827, "no_speech_prob": 0.006377519108355045}, + {"id": 336, "seek": 226704, "start": 2267.04, "end": 2274.88, "text": " could do + before. I, John Berryman, put together the HTML from my website and built a react + app", "tokens": [50364, 727, 360, 949, 13, 286, 11, 2619, 34084, 1601, 11, 829, + 1214, 264, 17995, 490, 452, 3144, 293, 3094, 257, 4515, 724, 50756], "temperature": + 0.0, "avg_logprob": -0.17659191170124092, "compression_ratio": 1.5139442231075697, + "no_speech_prob": 0.002981428988277912}, {"id": 337, "seek": 226704, "start": 2274.88, + "end": 2280.24, "text": " in this like, like I thought I''d have to have a PhD to + do something like that. 
But it''s amazing.", "tokens": [50756, 294, 341, 411, 11, + 411, 286, 1194, 286, 1116, 362, 281, 362, 257, 14476, 281, 360, 746, 411, 300, 13, + 583, 309, 311, 2243, 13, 51024], "temperature": 0.0, "avg_logprob": -0.17659191170124092, + "compression_ratio": 1.5139442231075697, "no_speech_prob": 0.002981428988277912}, + {"id": 338, "seek": 226704, "start": 2281.04, "end": 2288.08, "text": " And what + you''re seeing is an emergence of a new group of people that are, they call us the + AI", "tokens": [51064, 400, 437, 291, 434, 2577, 307, 364, 36211, 295, 257, 777, + 1594, 295, 561, 300, 366, 11, 436, 818, 505, 264, 7318, 51416], "temperature": 0.0, + "avg_logprob": -0.17659191170124092, "compression_ratio": 1.5139442231075697, "no_speech_prob": + 0.002981428988277912}, {"id": 339, "seek": 226704, "start": 2288.08, "end": 2294.96, + "text": " natives, AI native development. And I''ve heard, you know, code composers + rather than like just", "tokens": [51416, 47964, 11, 7318, 8470, 3250, 13, 400, + 286, 600, 2198, 11, 291, 458, 11, 3089, 43872, 2831, 813, 411, 445, 51760], "temperature": + 0.0, "avg_logprob": -0.17659191170124092, "compression_ratio": 1.5139442231075697, + "no_speech_prob": 0.002981428988277912}, {"id": 340, "seek": 229496, "start": 2294.96, + "end": 2302.8, "text": " coders. And you have people that are technically savvy. 
+ You can''t, you have to have, you know,", "tokens": [50364, 17656, 433, 13, 400, + 291, 362, 561, 300, 366, 12120, 47506, 13, 509, 393, 380, 11, 291, 362, 281, 362, + 11, 291, 458, 11, 50756], "temperature": 0.0, "avg_logprob": -0.22497196917263967, + "compression_ratio": 1.670995670995671, "no_speech_prob": 0.006318431347608566}, + {"id": 341, "seek": 229496, "start": 2302.8, "end": 2306.96, "text": " some ability + to, to recode still at this point, to debuck some stuff like you were talking about.", + "tokens": [50756, 512, 3485, 281, 11, 281, 319, 22332, 920, 412, 341, 935, 11, 281, + 3001, 1134, 512, 1507, 411, 291, 645, 1417, 466, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.22497196917263967, "compression_ratio": 1.670995670995671, "no_speech_prob": + 0.006318431347608566}, {"id": 342, "seek": 229496, "start": 2307.52, "end": 2314.4, + "text": " But they all go out at a screen, do this thing for me. And they have, + it just takes a little bit", "tokens": [50992, 583, 436, 439, 352, 484, 412, 257, + 2568, 11, 360, 341, 551, 337, 385, 13, 400, 436, 362, 11, 309, 445, 2516, 257, 707, + 857, 51336], "temperature": 0.0, "avg_logprob": -0.22497196917263967, "compression_ratio": + 1.670995670995671, "no_speech_prob": 0.006318431347608566}, {"id": 343, "seek": + 229496, "start": 2314.4, "end": 2320.16, "text": " of experience to learn how to + shout at the screen in the right way. 
You got to, you know, you''ve", "tokens": + [51336, 295, 1752, 281, 1466, 577, 281, 8043, 412, 264, 2568, 294, 264, 558, 636, + 13, 509, 658, 281, 11, 291, 458, 11, 291, 600, 51624], "temperature": 0.0, "avg_logprob": + -0.22497196917263967, "compression_ratio": 1.670995670995671, "no_speech_prob": + 0.006318431347608566}, {"id": 344, "seek": 232016, "start": 2320.64, "end": 2325.92, + "text": " you still got to have the human ability to, you have to think about how + this is structured,", "tokens": [50388, 291, 920, 658, 281, 362, 264, 1952, 3485, + 281, 11, 291, 362, 281, 519, 466, 577, 341, 307, 18519, 11, 50652], "temperature": + 0.0, "avg_logprob": -0.1668251714398784, "compression_ratio": 1.6736111111111112, + "no_speech_prob": 0.0022325259633362293}, {"id": 345, "seek": 232016, "start": 2325.92, + "end": 2331.2, "text": " how to modularize stuff. There, there is a craft to it + still. But you, you can start building up", "tokens": [50652, 577, 281, 31111, 1125, + 1507, 13, 821, 11, 456, 307, 257, 8448, 281, 309, 920, 13, 583, 291, 11, 291, 393, + 722, 2390, 493, 50916], "temperature": 0.0, "avg_logprob": -0.1668251714398784, + "compression_ratio": 1.6736111111111112, "no_speech_prob": 0.0022325259633362293}, + {"id": 346, "seek": 232016, "start": 2331.2, "end": 2336.48, "text": " pieces. 
Even + if you''re not technically savvy, if you''ve been building it in chunks when one + of these", "tokens": [50916, 3755, 13, 2754, 498, 291, 434, 406, 12120, 47506, 11, + 498, 291, 600, 668, 2390, 309, 294, 24004, 562, 472, 295, 613, 51180], "temperature": + 0.0, "avg_logprob": -0.1668251714398784, "compression_ratio": 1.6736111111111112, + "no_speech_prob": 0.0022325259633362293}, {"id": 347, "seek": 232016, "start": 2336.48, + "end": 2340.7999999999997, "text": " pieces messes up gloriously and you''ve got + your floating point numbers that I don''t work in", "tokens": [51180, 3755, 2082, + 279, 493, 26623, 8994, 293, 291, 600, 658, 428, 12607, 935, 3547, 300, 286, 500, + 380, 589, 294, 51396], "temperature": 0.0, "avg_logprob": -0.1668251714398784, "compression_ratio": + 1.6736111111111112, "no_speech_prob": 0.0022325259633362293}, {"id": 348, "seek": + 232016, "start": 2340.7999999999997, "end": 2346.3199999999997, "text": " out like + your example, then at least you can say, I''m going to delete back to here. I''m + going to try", "tokens": [51396, 484, 411, 428, 1365, 11, 550, 412, 1935, 291, 393, + 584, 11, 286, 478, 516, 281, 12097, 646, 281, 510, 13, 286, 478, 516, 281, 853, + 51672], "temperature": 0.0, "avg_logprob": -0.1668251714398784, "compression_ratio": + 1.6736111111111112, "no_speech_prob": 0.0022325259633362293}, {"id": 349, "seek": + 234632, "start": 2346.48, "end": 2352.32, "text": " a different route. See if I + can just bump it out of this. And often you can. And people in every", "tokens": + [50372, 257, 819, 7955, 13, 3008, 498, 286, 393, 445, 9961, 309, 484, 295, 341, + 13, 400, 2049, 291, 393, 13, 400, 561, 294, 633, 50664], "temperature": 0.0, "avg_logprob": + -0.18462447400362986, "compression_ratio": 1.6416666666666666, "no_speech_prob": + 0.013191438280045986}, {"id": 350, "seek": 234632, "start": 2352.32, "end": 2360.8, + "text": " walk of life are are much more effective and efficient at creating. 
And + it''s, you know, you don''t get", "tokens": [50664, 1792, 295, 993, 366, 366, 709, + 544, 4942, 293, 7148, 412, 4084, 13, 400, 309, 311, 11, 291, 458, 11, 291, 500, + 380, 483, 51088], "temperature": 0.0, "avg_logprob": -0.18462447400362986, "compression_ratio": + 1.6416666666666666, "no_speech_prob": 0.013191438280045986}, {"id": 351, "seek": + 234632, "start": 2360.8, "end": 2366.6400000000003, "text": " this, you don''t always + get to solve the nitpahee little, you know, if you really love debugging and", "tokens": + [51088, 341, 11, 291, 500, 380, 1009, 483, 281, 5039, 264, 10900, 79, 545, 1653, + 707, 11, 291, 458, 11, 498, 291, 534, 959, 45592, 293, 51380], "temperature": 0.0, + "avg_logprob": -0.18462447400362986, "compression_ratio": 1.6416666666666666, "no_speech_prob": + 0.013191438280045986}, {"id": 352, "seek": 234632, "start": 2366.6400000000003, + "end": 2373.36, "text": " writing tests, I''m sorry. I think that''s your days might + be numbered. But if you love creating,", "tokens": [51380, 3579, 6921, 11, 286, + 478, 2597, 13, 286, 519, 300, 311, 428, 1708, 1062, 312, 40936, 13, 583, 498, 291, + 959, 4084, 11, 51716], "temperature": 0.0, "avg_logprob": -0.18462447400362986, + "compression_ratio": 1.6416666666666666, "no_speech_prob": 0.013191438280045986}, + {"id": 353, "seek": 237336, "start": 2373.36, "end": 2378.2400000000002, "text": + " that''s I think we''re approaching a new golden age and it''s exponential. We''re + going to keep", "tokens": [50364, 300, 311, 286, 519, 321, 434, 14908, 257, 777, + 9729, 3205, 293, 309, 311, 21510, 13, 492, 434, 516, 281, 1066, 50608], "temperature": + 0.0, "avg_logprob": -0.1968018737020372, "compression_ratio": 1.5737704918032787, + "no_speech_prob": 0.009339823387563229}, {"id": 354, "seek": 237336, "start": 2378.2400000000002, + "end": 2384.2400000000002, "text": " approaching new golden ages for a while. 
Yeah, + I think in my career, if I can reflect a little bit,", "tokens": [50608, 14908, + 777, 9729, 12357, 337, 257, 1339, 13, 865, 11, 286, 519, 294, 452, 3988, 11, 498, + 286, 393, 5031, 257, 707, 857, 11, 50908], "temperature": 0.0, "avg_logprob": -0.1968018737020372, + "compression_ratio": 1.5737704918032787, "no_speech_prob": 0.009339823387563229}, + {"id": 355, "seek": 237336, "start": 2384.2400000000002, "end": 2391.6800000000003, + "text": " I, I love creating much more for sure. But then back then, we didn''t + have a lamp, so didn''t have", "tokens": [50908, 286, 11, 286, 959, 4084, 709, 544, + 337, 988, 13, 583, 550, 646, 550, 11, 321, 994, 380, 362, 257, 12684, 11, 370, 994, + 380, 362, 51280], "temperature": 0.0, "avg_logprob": -0.1968018737020372, "compression_ratio": + 1.5737704918032787, "no_speech_prob": 0.009339823387563229}, {"id": 356, "seek": + 239168, "start": 2391.68, "end": 2400.8799999999997, "text": " compilates. We had + to do pay a programming, right? And that was our command. Yeah. 
And but the,", "tokens": + [50364, 715, 388, 1024, 13, 492, 632, 281, 360, 1689, 257, 9410, 11, 558, 30, 400, + 300, 390, 527, 5622, 13, 865, 13, 400, 457, 264, 11, 50824], "temperature": 0.0, + "avg_logprob": -0.2759245258488067, "compression_ratio": 1.508108108108108, "no_speech_prob": + 0.026053477078676224}, {"id": 357, "seek": 239168, "start": 2400.8799999999997, + "end": 2409.04, "text": " the, the, that notion that you just said about creativity, + I think that drove much more", "tokens": [50824, 264, 11, 264, 11, 300, 10710, 300, + 291, 445, 848, 466, 12915, 11, 286, 519, 300, 13226, 709, 544, 51232], "temperature": + 0.0, "avg_logprob": -0.2759245258488067, "compression_ratio": 1.508108108108108, + "no_speech_prob": 0.026053477078676224}, {"id": 358, "seek": 239168, "start": 2410.16, + "end": 2417.3599999999997, "text": " forward than us going into the rabbit down + the rabbit holes, you know, of debugging that thing.", "tokens": [51288, 2128, 813, + 505, 516, 666, 264, 19509, 760, 264, 19509, 8118, 11, 291, 458, 11, 295, 45592, + 300, 551, 13, 51648], "temperature": 0.0, "avg_logprob": -0.2759245258488067, "compression_ratio": + 1.508108108108108, "no_speech_prob": 0.026053477078676224}, {"id": 359, "seek": + 241736, "start": 2417.36, "end": 2422.7200000000003, "text": " However important + that thing was, you know, of course, you need to debug and so on. But it didn''t + feel,", "tokens": [50364, 2908, 1021, 300, 551, 390, 11, 291, 458, 11, 295, 1164, + 11, 291, 643, 281, 24083, 293, 370, 322, 13, 583, 309, 994, 380, 841, 11, 50632], + "temperature": 0.0, "avg_logprob": -0.2384022813502366, "compression_ratio": 1.6981818181818182, + "no_speech_prob": 0.012571687810122967}, {"id": 360, "seek": 241736, "start": 2423.52, + "end": 2428.08, "text": " like you, you would just feel exhausted after that. You + know, like, yeah, I fixed that bug. 
Finally,", "tokens": [50672, 411, 291, 11, 291, + 576, 445, 841, 17992, 934, 300, 13, 509, 458, 11, 411, 11, 1338, 11, 286, 6806, + 300, 7426, 13, 6288, 11, 50900], "temperature": 0.0, "avg_logprob": -0.2384022813502366, + "compression_ratio": 1.6981818181818182, "no_speech_prob": 0.012571687810122967}, + {"id": 361, "seek": 241736, "start": 2428.08, "end": 2434.1600000000003, "text": + " I squashed it. Move on because you, you want to build stuff, right? You, and I + think it was it,", "tokens": [50900, 286, 2339, 12219, 309, 13, 10475, 322, 570, + 291, 11, 291, 528, 281, 1322, 1507, 11, 558, 30, 509, 11, 293, 286, 519, 309, 390, + 309, 11, 51204], "temperature": 0.0, "avg_logprob": -0.2384022813502366, "compression_ratio": + 1.6981818181818182, "no_speech_prob": 0.012571687810122967}, {"id": 362, "seek": + 241736, "start": 2434.7200000000003, "end": 2440.56, "text": " the extra who said, + if debugging is the process of removing, finding and removing bugs, then", "tokens": + [51232, 264, 2857, 567, 848, 11, 498, 45592, 307, 264, 1399, 295, 12720, 11, 5006, + 293, 12720, 15120, 11, 550, 51524], "temperature": 0.0, "avg_logprob": -0.2384022813502366, + "compression_ratio": 1.6981818181818182, "no_speech_prob": 0.012571687810122967}, + {"id": 363, "seek": 241736, "start": 2441.2000000000003, "end": 2446.7200000000003, + "text": " programming must be the process of introducing bugs. And so that''s right.", + "tokens": [51556, 9410, 1633, 312, 264, 1399, 295, 15424, 15120, 13, 400, 370, 300, + 311, 558, 13, 51832], "temperature": 0.0, "avg_logprob": -0.2384022813502366, "compression_ratio": + 1.6981818181818182, "no_speech_prob": 0.012571687810122967}, {"id": 364, "seek": + 244736, "start": 2447.36, "end": 2449.04, "text": " Yeah. That''s a vicious circle. 
+ Yeah.", "tokens": [50364, 865, 13, 663, 311, 257, 30093, 6329, 13, 865, 13, 50448], + "temperature": 0.0, "avg_logprob": -0.2122407219626687, "compression_ratio": 1.5670498084291187, + "no_speech_prob": 0.003264091443270445}, {"id": 365, "seek": 244736, "start": 2451.36, + "end": 2456.96, "text": " You, you already touched on that topic a bit earlier about + artifacts. I''ve read your blog posts,", "tokens": [50564, 509, 11, 291, 1217, 9828, + 322, 300, 4829, 257, 857, 3071, 466, 24617, 13, 286, 600, 1401, 428, 6968, 12300, + 11, 50844], "temperature": 0.0, "avg_logprob": -0.2122407219626687, "compression_ratio": + 1.5670498084291187, "no_speech_prob": 0.003264091443270445}, {"id": 366, "seek": + 244736, "start": 2456.96, "end": 2463.2000000000003, "text": " which will, will + definitely link, link in and I, I got inspired by that. I have to say,", "tokens": + [50844, 597, 486, 11, 486, 2138, 2113, 11, 2113, 294, 293, 286, 11, 286, 658, 7547, + 538, 300, 13, 286, 362, 281, 584, 11, 51156], "temperature": 0.0, "avg_logprob": + -0.2122407219626687, "compression_ratio": 1.5670498084291187, "no_speech_prob": + 0.003264091443270445}, {"id": 367, "seek": 244736, "start": 2463.84, "end": 2469.04, + "text": " because oftentimes when I go to the set applications, you know, chat, + GPT or perplexity,", "tokens": [51188, 570, 18349, 562, 286, 352, 281, 264, 992, + 5821, 11, 291, 458, 11, 5081, 11, 26039, 51, 420, 680, 18945, 507, 11, 51448], "temperature": + 0.0, "avg_logprob": -0.2122407219626687, "compression_ratio": 1.5670498084291187, + "no_speech_prob": 0.003264091443270445}, {"id": 368, "seek": 244736, "start": 2469.04, + "end": 2476.8, "text": " what have you, and you have a longer conversation there, + it is hard to then sort of trace back and", "tokens": [51448, 437, 362, 291, 11, + 293, 291, 362, 257, 2854, 3761, 456, 11, 309, 307, 1152, 281, 550, 1333, 295, 13508, + 646, 293, 51836], "temperature": 0.0, "avg_logprob": -0.2122407219626687, "compression_ratio": + 
1.5670498084291187, "no_speech_prob": 0.003264091443270445}, {"id": 369, "seek": + 247680, "start": 2476.88, "end": 2482.0800000000004, "text": " think, okay, I branched + here and, okay, what was my thinking again? What did I produce at that", "tokens": + [50368, 519, 11, 1392, 11, 286, 9819, 292, 510, 293, 11, 1392, 11, 437, 390, 452, + 1953, 797, 30, 708, 630, 286, 5258, 412, 300, 50628], "temperature": 0.0, "avg_logprob": + -0.13843217830068058, "compression_ratio": 1.6167400881057268, "no_speech_prob": + 0.008451452478766441}, {"id": 370, "seek": 247680, "start": 2482.0800000000004, + "end": 2488.0800000000004, "text": " point? There is nothing to hold on to except + scrolling back and forth. And that''s what you", "tokens": [50628, 935, 30, 821, + 307, 1825, 281, 1797, 322, 281, 3993, 29053, 646, 293, 5220, 13, 400, 300, 311, + 437, 291, 50928], "temperature": 0.0, "avg_logprob": -0.13843217830068058, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.008451452478766441}, {"id": 371, "seek": + 247680, "start": 2488.0800000000004, "end": 2493.84, "text": " really put. 
And I + want you to open, like, you basically proposed something new, I believe.", "tokens": + [50928, 534, 829, 13, 400, 286, 528, 291, 281, 1269, 11, 411, 11, 291, 1936, 10348, + 746, 777, 11, 286, 1697, 13, 51216], "temperature": 0.0, "avg_logprob": -0.13843217830068058, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.008451452478766441}, + {"id": 372, "seek": 247680, "start": 2494.7200000000003, "end": 2501.6800000000003, + "text": " I wonder if you are the creator of this or like, in any case, you carry + this idea forward.", "tokens": [51260, 286, 2441, 498, 291, 366, 264, 14181, 295, + 341, 420, 411, 11, 294, 604, 1389, 11, 291, 3985, 341, 1558, 2128, 13, 51608], "temperature": + 0.0, "avg_logprob": -0.13843217830068058, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.008451452478766441}, {"id": 373, "seek": 250168, "start": 2501.68, + "end": 2504.64, "text": " Can you explain what do you mean by artifacts?", "tokens": + [50364, 1664, 291, 2903, 437, 360, 291, 914, 538, 24617, 30, 50512], "temperature": + 0.0, "avg_logprob": -0.17453065732630288, "compression_ratio": 1.5566037735849056, + "no_speech_prob": 0.005332770757377148}, {"id": 374, "seek": 250168, "start": 2505.8399999999997, + "end": 2513.12, "text": " I will carry the idea forward. I think there is what we''re + seeing is some convergence around", "tokens": [50572, 286, 486, 3985, 264, 1558, + 2128, 13, 286, 519, 456, 307, 437, 321, 434, 2577, 307, 512, 32181, 926, 50936], + "temperature": 0.0, "avg_logprob": -0.17453065732630288, "compression_ratio": 1.5566037735849056, + "no_speech_prob": 0.005332770757377148}, {"id": 375, "seek": 250168, "start": 2513.12, + "end": 2521.52, "text": " the notion that put into my blog post. 
For example, with + anthropics, artifacts, so that they,", "tokens": [50936, 264, 10710, 300, 829, 666, + 452, 6968, 2183, 13, 1171, 1365, 11, 365, 22727, 1167, 11, 24617, 11, 370, 300, + 436, 11, 51356], "temperature": 0.0, "avg_logprob": -0.17453065732630288, "compression_ratio": + 1.5566037735849056, "no_speech_prob": 0.005332770757377148}, {"id": 376, "seek": + 250168, "start": 2521.52, "end": 2526.8799999999997, "text": " they splash something + that I think is getting at what I''m talking about. But if you dig a little", "tokens": + [51356, 436, 25757, 746, 300, 286, 519, 307, 1242, 412, 437, 286, 478, 1417, 466, + 13, 583, 498, 291, 2528, 257, 707, 51624], "temperature": 0.0, "avg_logprob": -0.17453065732630288, + "compression_ratio": 1.5566037735849056, "no_speech_prob": 0.005332770757377148}, + {"id": 377, "seek": 252688, "start": 2526.88, "end": 2533.84, "text": " at the end, + it''s not quite what I''m talking about. Whenever you engaged in a conversation + with", "tokens": [50364, 412, 264, 917, 11, 309, 311, 406, 1596, 437, 286, 478, + 1417, 466, 13, 14159, 291, 8237, 294, 257, 3761, 365, 50712], "temperature": 0.0, + "avg_logprob": -0.23977725982666015, "compression_ratio": 1.6041666666666667, "no_speech_prob": + 0.0029198743868619204}, {"id": 378, "seek": 252688, "start": 2535.28, "end": 2542.4, + "text": " an assistant, LM experience, they just want to chat. And so we''ve done + good over time by giving", "tokens": [50784, 364, 10994, 11, 441, 44, 1752, 11, + 436, 445, 528, 281, 5081, 13, 400, 370, 321, 600, 1096, 665, 670, 565, 538, 2902, + 51140], "temperature": 0.0, "avg_logprob": -0.23977725982666015, "compression_ratio": + 1.6041666666666667, "no_speech_prob": 0.0029198743868619204}, {"id": 379, "seek": + 252688, "start": 2542.4, "end": 2547.36, "text": " them like tools. 
So now it''s + rather than just like being your therapist, they can go inducing for", "tokens": + [51140, 552, 411, 3873, 13, 407, 586, 309, 311, 2831, 813, 445, 411, 885, 428, 19830, + 11, 436, 393, 352, 13716, 2175, 337, 51388], "temperature": 0.0, "avg_logprob": + -0.23977725982666015, "compression_ratio": 1.6041666666666667, "no_speech_prob": + 0.0029198743868619204}, {"id": 380, "seek": 252688, "start": 2547.36, "end": 2553.04, + "text": " you. So that''s nice. But still, it''s a linear flow. And whenever you''re + talking about something,", "tokens": [51388, 291, 13, 407, 300, 311, 1481, 13, 583, + 920, 11, 309, 311, 257, 8213, 3095, 13, 400, 5699, 291, 434, 1417, 466, 746, 11, + 51672], "temperature": 0.0, "avg_logprob": -0.23977725982666015, "compression_ratio": + 1.6041666666666667, "no_speech_prob": 0.0029198743868619204}, {"id": 381, "seek": + 255304, "start": 2553.92, "end": 2560.32, "text": " it flows back into the backstroll. + Most of the time, when you are getting work done, you''re getting", "tokens": [50408, + 309, 12867, 646, 666, 264, 646, 372, 3970, 13, 4534, 295, 264, 565, 11, 562, 291, + 366, 1242, 589, 1096, 11, 291, 434, 1242, 50728], "temperature": 0.0, "avg_logprob": + -0.2807449722290039, "compression_ratio": 1.742081447963801, "no_speech_prob": 0.007474581710994244}, + {"id": 382, "seek": 255304, "start": 2560.32, "end": 2566.96, "text": " work done + on something. And artifact, there is a staple, I really wanted to call it a", "tokens": + [50728, 589, 1096, 322, 746, 13, 400, 34806, 11, 456, 307, 257, 32361, 11, 286, + 534, 1415, 281, 818, 309, 257, 51060], "temperature": 0.0, "avg_logprob": -0.2807449722290039, + "compression_ratio": 1.742081447963801, "no_speech_prob": 0.007474581710994244}, + {"id": 383, "seek": 255304, "start": 2566.96, "end": 2573.12, "text": " staple object + of discourse because it isn''t object. It''s staple because it may change. 
And it''s", + "tokens": [51060, 32361, 2657, 295, 23938, 570, 309, 1943, 380, 2657, 13, 467, 311, + 32361, 570, 309, 815, 1319, 13, 400, 309, 311, 51368], "temperature": 0.0, "avg_logprob": + -0.2807449722290039, "compression_ratio": 1.742081447963801, "no_speech_prob": 0.007474581710994244}, + {"id": 384, "seek": 255304, "start": 2573.12, "end": 2579.52, "text": " it''s the + object of the discourse. But artifacts is not just easy to say. But this is what + we deal with.", "tokens": [51368, 309, 311, 264, 2657, 295, 264, 23938, 13, 583, + 24617, 307, 406, 445, 1858, 281, 584, 13, 583, 341, 307, 437, 321, 2028, 365, 13, + 51688], "temperature": 0.0, "avg_logprob": -0.2807449722290039, "compression_ratio": + 1.742081447963801, "no_speech_prob": 0.007474581710994244}, {"id": 385, "seek": + 257952, "start": 2580.24, "end": 2584.48, "text": " Whenever we''re paraprogramming + on something, it''s me and you looking at this piece of code,", "tokens": [50400, + 14159, 321, 434, 36992, 340, 1342, 2810, 322, 746, 11, 309, 311, 385, 293, 291, + 1237, 412, 341, 2522, 295, 3089, 11, 50612], "temperature": 0.0, "avg_logprob": + -0.14734056260850695, "compression_ratio": 1.7216117216117217, "no_speech_prob": + 0.02397976815700531}, {"id": 386, "seek": 257952, "start": 2584.48, "end": 2588.32, + "text": " and you make a recommendation about this. And I say, that''s good. We + go back and forth.", "tokens": [50612, 293, 291, 652, 257, 11879, 466, 341, 13, + 400, 286, 584, 11, 300, 311, 665, 13, 492, 352, 646, 293, 5220, 13, 50804], "temperature": + 0.0, "avg_logprob": -0.14734056260850695, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.02397976815700531}, {"id": 387, "seek": 257952, "start": 2588.88, + "end": 2595.12, "text": " And anything that you can imagine can be addressed like + that. 
The situation becomes a particular", "tokens": [50832, 400, 1340, 300, 291, + 393, 3811, 393, 312, 13847, 411, 300, 13, 440, 2590, 3643, 257, 1729, 51144], "temperature": + 0.0, "avg_logprob": -0.14734056260850695, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.02397976815700531}, {"id": 388, "seek": 257952, "start": 2595.92, + "end": 2603.52, "text": " point yet, when every you''re dealing with multiple artifacts. + So if you''re saying, I really like", "tokens": [51184, 935, 1939, 11, 562, 633, + 291, 434, 6260, 365, 3866, 24617, 13, 407, 498, 291, 434, 1566, 11, 286, 534, 411, + 51564], "temperature": 0.0, "avg_logprob": -0.14734056260850695, "compression_ratio": + 1.7216117216117217, "no_speech_prob": 0.02397976815700531}, {"id": 389, "seek": + 257952, "start": 2603.52, "end": 2608.8, "text": " this thing over here. And I wonder + how it would fit in with this thing over here. You''re having,", "tokens": [51564, + 341, 551, 670, 510, 13, 400, 286, 2441, 577, 309, 576, 3318, 294, 365, 341, 551, + 670, 510, 13, 509, 434, 1419, 11, 51828], "temperature": 0.0, "avg_logprob": -0.14734056260850695, + "compression_ratio": 1.7216117216117217, "no_speech_prob": 0.02397976815700531}, + {"id": 390, "seek": 260880, "start": 2608.8, "end": 2613.36, "text": " as a human, + you''re having to refer to more than one thing that exists outside of this linear", + "tokens": [50364, 382, 257, 1952, 11, 291, 434, 1419, 281, 2864, 281, 544, 813, + 472, 551, 300, 8198, 2380, 295, 341, 8213, 50592], "temperature": 0.0, "avg_logprob": + -0.1477616917003285, "compression_ratio": 1.6077586206896552, "no_speech_prob": + 0.0016633168561384082}, {"id": 391, "seek": 260880, "start": 2613.36, "end": 2620.1600000000003, + "text": " conversation. And you''re talking about how they relate to one another. 
+ And so the blog post,", "tokens": [50592, 3761, 13, 400, 291, 434, 1417, 466, 577, + 436, 10961, 281, 472, 1071, 13, 400, 370, 264, 6968, 2183, 11, 50932], "temperature": + 0.0, "avg_logprob": -0.1477616917003285, "compression_ratio": 1.6077586206896552, + "no_speech_prob": 0.0016633168561384082}, {"id": 392, "seek": 260880, "start": 2620.1600000000003, + "end": 2625.1200000000003, "text": " which I hope you guys all read, arterislabs.com, + we''ll do this again in a second, right?", "tokens": [50932, 597, 286, 1454, 291, + 1074, 439, 1401, 11, 30455, 271, 75, 17243, 13, 1112, 11, 321, 603, 360, 341, 797, + 294, 257, 1150, 11, 558, 30, 51180], "temperature": 0.0, "avg_logprob": -0.1477616917003285, + "compression_ratio": 1.6077586206896552, "no_speech_prob": 0.0016633168561384082}, + {"id": 393, "seek": 260880, "start": 2626.0, "end": 2636.0800000000004, "text": + " It gets into what I think of as an artifact. It talks about how to build a prompt + so that you have", "tokens": [51224, 467, 2170, 666, 437, 286, 519, 295, 382, 364, + 34806, 13, 467, 6686, 466, 577, 281, 1322, 257, 12391, 370, 300, 291, 362, 51728], + "temperature": 0.0, "avg_logprob": -0.1477616917003285, "compression_ratio": 1.6077586206896552, + "no_speech_prob": 0.0016633168561384082}, {"id": 394, "seek": 263608, "start": 2636.08, + "end": 2642.64, "text": " space for this linear conversation. 
But you also draw + the models attention to a chunk at the top,", "tokens": [50364, 1901, 337, 341, + 8213, 3761, 13, 583, 291, 611, 2642, 264, 5245, 3202, 281, 257, 16635, 412, 264, + 1192, 11, 50692], "temperature": 0.0, "avg_logprob": -0.14141454299290976, "compression_ratio": + 1.6425531914893616, "no_speech_prob": 0.001587264589034021}, {"id": 395, "seek": + 263608, "start": 2642.64, "end": 2646.72, "text": " usually, you might put it all + through, you might put it at the bottom to have an experiment with it.", "tokens": + [50692, 2673, 11, 291, 1062, 829, 309, 439, 807, 11, 291, 1062, 829, 309, 412, 264, + 2767, 281, 362, 364, 5120, 365, 309, 13, 50896], "temperature": 0.0, "avg_logprob": + -0.14141454299290976, "compression_ratio": 1.6425531914893616, "no_speech_prob": + 0.001587264589034021}, {"id": 396, "seek": 263608, "start": 2646.72, "end": 2652.24, + "text": " But a static chunk, which is like, here is all the things on the table + that people can refer to.", "tokens": [50896, 583, 257, 13437, 16635, 11, 597, 307, + 411, 11, 510, 307, 439, 264, 721, 322, 264, 3199, 300, 561, 393, 2864, 281, 13, + 51172], "temperature": 0.0, "avg_logprob": -0.14141454299290976, "compression_ratio": + 1.6425531914893616, "no_speech_prob": 0.001587264589034021}, {"id": 397, "seek": + 263608, "start": 2653.04, "end": 2662.3199999999997, "text": " Each object, each + artifact, has importantly an ID to be referred to. And I''ve noticed that", "tokens": + [51212, 6947, 2657, 11, 1184, 34806, 11, 575, 8906, 364, 7348, 281, 312, 10839, + 281, 13, 400, 286, 600, 5694, 300, 51676], "temperature": 0.0, "avg_logprob": -0.14141454299290976, + "compression_ratio": 1.6425531914893616, "no_speech_prob": 0.001587264589034021}, + {"id": 398, "seek": 266232, "start": 2662.8, "end": 2670.0, "text": " these models + do really well with arbitrary hexadecimal IDs. 
So I''ll just give them a random + ID.", "tokens": [50388, 613, 5245, 360, 534, 731, 365, 23211, 23291, 762, 66, 10650, + 48212, 13, 407, 286, 603, 445, 976, 552, 257, 4974, 7348, 13, 50748], "temperature": + 0.0, "avg_logprob": -0.1527665891145405, "compression_ratio": 1.6033755274261603, + "no_speech_prob": 0.004685103893280029}, {"id": 399, "seek": 266232, "start": 2670.7200000000003, + "end": 2674.48, "text": " But they''re really good at referring to those and not + like, you know, they don''t seem to hallucinate", "tokens": [50784, 583, 436, 434, + 534, 665, 412, 13761, 281, 729, 293, 406, 411, 11, 291, 458, 11, 436, 500, 380, + 1643, 281, 35212, 13923, 50972], "temperature": 0.0, "avg_logprob": -0.1527665891145405, + "compression_ratio": 1.6033755274261603, "no_speech_prob": 0.004685103893280029}, + {"id": 400, "seek": 266232, "start": 2674.96, "end": 2683.1200000000003, "text": + " these IDs, which surprised me. And so if you have a prompt with these artifacts + at the top,", "tokens": [50996, 613, 48212, 11, 597, 6100, 385, 13, 400, 370, 498, + 291, 362, 257, 12391, 365, 613, 24617, 412, 264, 1192, 11, 51404], "temperature": + 0.0, "avg_logprob": -0.1527665891145405, "compression_ratio": 1.6033755274261603, + "no_speech_prob": 0.004685103893280029}, {"id": 401, "seek": 266232, "start": 2683.1200000000003, + "end": 2687.84, "text": " and you have a system message that explains to the model + how to interact with these things,", "tokens": [51404, 293, 291, 362, 257, 1185, + 3636, 300, 13948, 281, 264, 2316, 577, 281, 4648, 365, 613, 721, 11, 51640], "temperature": + 0.0, "avg_logprob": -0.1527665891145405, "compression_ratio": 1.6033755274261603, + "no_speech_prob": 0.004685103893280029}, {"id": 402, "seek": 268784, "start": 2688.4, + "end": 2694.56, "text": " then my experience is that they obey the instructions + really well. 
They talk", "tokens": [50392, 550, 452, 1752, 307, 300, 436, 19297, + 264, 9415, 534, 731, 13, 814, 751, 50700], "temperature": 0.0, "avg_logprob": -0.13474469184875487, + "compression_ratio": 1.6431924882629108, "no_speech_prob": 0.004789172206073999}, + {"id": 403, "seek": 268784, "start": 2696.96, "end": 2703.6000000000004, "text": + " humans are used to using pronouns and names and nicknames and, you know, other + pointers", "tokens": [50820, 6255, 366, 1143, 281, 1228, 35883, 293, 5288, 293, + 15416, 77, 1632, 293, 11, 291, 458, 11, 661, 44548, 51152], "temperature": 0.0, + "avg_logprob": -0.13474469184875487, "compression_ratio": 1.6431924882629108, "no_speech_prob": + 0.004789172206073999}, {"id": 404, "seek": 268784, "start": 2704.2400000000002, + "end": 2710.8, "text": " that refer to the real thing. And these models having read + all the human text that they could", "tokens": [51184, 300, 2864, 281, 264, 957, + 551, 13, 400, 613, 5245, 1419, 1401, 439, 264, 1952, 2487, 300, 436, 727, 51512], + "temperature": 0.0, "avg_logprob": -0.13474469184875487, "compression_ratio": 1.6431924882629108, + "no_speech_prob": 0.004789172206073999}, {"id": 405, "seek": 268784, "start": 2710.8, + "end": 2717.52, "text": " get their hands on, the internet five times, they also + understand what you mean about using", "tokens": [51512, 483, 641, 2377, 322, 11, + 264, 4705, 1732, 1413, 11, 436, 611, 1223, 437, 291, 914, 466, 1228, 51848], "temperature": + 0.0, "avg_logprob": -0.13474469184875487, "compression_ratio": 1.6431924882629108, + "no_speech_prob": 0.004789172206073999}, {"id": 406, "seek": 271752, "start": 2718.0, + "end": 2722.8, "text": " using pronouns and stuff. 
So you can say, you know, dear + model, there''s this thing called artifact.", "tokens": [50388, 1228, 35883, 293, + 1507, 13, 407, 291, 393, 584, 11, 291, 458, 11, 6875, 2316, 11, 456, 311, 341, 551, + 1219, 34806, 13, 50628], "temperature": 0.0, "avg_logprob": -0.2086505709954028, + "compression_ratio": 1.5991735537190082, "no_speech_prob": 0.0008889439632184803}, + {"id": 407, "seek": 271752, "start": 2722.8, "end": 2732.16, "text": " They have + these IDs. When you refer to them, then use these, use anchors like in HML because", + "tokens": [50628, 814, 362, 613, 48212, 13, 1133, 291, 2864, 281, 552, 11, 550, + 764, 613, 11, 764, 12723, 830, 411, 294, 389, 12683, 570, 51096], "temperature": + 0.0, "avg_logprob": -0.2086505709954028, "compression_ratio": 1.5991735537190082, + "no_speech_prob": 0.0008889439632184803}, {"id": 408, "seek": 271752, "start": 2732.16, + "end": 2737.84, "text": " they''ve seen a lot of those. And in the href tag, refer + to it. And here''s an example. And they just,", "tokens": [51096, 436, 600, 1612, + 257, 688, 295, 729, 13, 400, 294, 264, 276, 33115, 6162, 11, 2864, 281, 309, 13, + 400, 510, 311, 364, 1365, 13, 400, 436, 445, 11, 51380], "temperature": 0.0, "avg_logprob": + -0.2086505709954028, "compression_ratio": 1.5991735537190082, "no_speech_prob": + 0.0008889439632184803}, {"id": 409, "seek": 271752, "start": 2739.52, "end": 2744.16, + "text": " I haven''t done any like formal like, you know, reinforced testing, but + in my experience with,", "tokens": [51464, 286, 2378, 380, 1096, 604, 411, 9860, + 411, 11, 291, 458, 11, 31365, 4997, 11, 457, 294, 452, 1752, 365, 11, 51696], "temperature": + 0.0, "avg_logprob": -0.2086505709954028, "compression_ratio": 1.5991735537190082, + "no_speech_prob": 0.0008889439632184803}, {"id": 410, "seek": 274416, "start": 2744.24, + "end": 2749.68, "text": " they just haven''t gone wrong. They are comfortable referring + to these things. 
And it provides a really", "tokens": [50368, 436, 445, 2378, 380, + 2780, 2085, 13, 814, 366, 4619, 13761, 281, 613, 721, 13, 400, 309, 6417, 257, 534, + 50640], "temperature": 0.0, "avg_logprob": -0.11439032554626465, "compression_ratio": + 1.7885304659498207, "no_speech_prob": 0.0037125013768672943}, {"id": 411, "seek": + 274416, "start": 2750.3199999999997, "end": 2756.96, "text": " slick experience, + I think, for the user. The user at the end of this conversation is looking at a", + "tokens": [50672, 37406, 1752, 11, 286, 519, 11, 337, 264, 4195, 13, 440, 4195, + 412, 264, 917, 295, 341, 3761, 307, 1237, 412, 257, 51004], "temperature": 0.0, + "avg_logprob": -0.11439032554626465, "compression_ratio": 1.7885304659498207, "no_speech_prob": + 0.0037125013768672943}, {"id": 412, "seek": 274416, "start": 2756.96, "end": 2760.96, + "text": " conversation that they don''t have to scroll back up to. They''re looking + at artifacts on the right.", "tokens": [51004, 3761, 300, 436, 500, 380, 362, 281, + 11369, 646, 493, 281, 13, 814, 434, 1237, 412, 24617, 322, 264, 558, 13, 51204], + "temperature": 0.0, "avg_logprob": -0.11439032554626465, "compression_ratio": 1.7885304659498207, + "no_speech_prob": 0.0037125013768672943}, {"id": 413, "seek": 274416, "start": 2760.96, + "end": 2766.64, "text": " And they can, they can grab the ones they need. The artifacts + themselves, you know, the application", "tokens": [51204, 400, 436, 393, 11, 436, + 393, 4444, 264, 2306, 436, 643, 13, 440, 24617, 2969, 11, 291, 458, 11, 264, 3861, + 51488], "temperature": 0.0, "avg_logprob": -0.11439032554626465, "compression_ratio": + 1.7885304659498207, "no_speech_prob": 0.0037125013768672943}, {"id": 414, "seek": + 274416, "start": 2766.64, "end": 2770.56, "text": " developer, you''re in charge + of how you want to present these things. 
If it''s its text, you can just", "tokens": + [51488, 10754, 11, 291, 434, 294, 4602, 295, 577, 291, 528, 281, 1974, 613, 721, + 13, 759, 309, 311, 1080, 2487, 11, 291, 393, 445, 51684], "temperature": 0.0, "avg_logprob": + -0.11439032554626465, "compression_ratio": 1.7885304659498207, "no_speech_prob": + 0.0037125013768672943}, {"id": 415, "seek": 277056, "start": 2770.56, "end": 2775.2, + "text": " make it text. But if it''s like a home listing, you know, in the background, + it can really be", "tokens": [50364, 652, 309, 2487, 13, 583, 498, 309, 311, 411, + 257, 1280, 22161, 11, 291, 458, 11, 294, 264, 3678, 11, 309, 393, 534, 312, 50596], + "temperature": 0.0, "avg_logprob": -0.1703711918422154, "compression_ratio": 1.7720588235294117, + "no_speech_prob": 0.0019141300581395626}, {"id": 416, "seek": 277056, "start": 2775.2, + "end": 2781.68, "text": " represented by, you know, JSON. But you present the user, + you know, picture the home and the, you know,", "tokens": [50596, 10379, 538, 11, + 291, 458, 11, 31828, 13, 583, 291, 1974, 264, 4195, 11, 291, 458, 11, 3036, 264, + 1280, 293, 264, 11, 291, 458, 11, 50920], "temperature": 0.0, "avg_logprob": -0.1703711918422154, + "compression_ratio": 1.7720588235294117, "no_speech_prob": 0.0019141300581395626}, + {"id": 417, "seek": 277056, "start": 2781.68, "end": 2787.36, "text": " scrollable + tab and maybe a scheduling button. 
You can do all these rich things with artifacts", + "tokens": [50920, 11369, 712, 4421, 293, 1310, 257, 29055, 2960, 13, 509, 393, 360, + 439, 613, 4593, 721, 365, 24617, 51204], "temperature": 0.0, "avg_logprob": -0.1703711918422154, + "compression_ratio": 1.7720588235294117, "no_speech_prob": 0.0019141300581395626}, + {"id": 418, "seek": 277056, "start": 2787.92, "end": 2791.44, "text": " that you + can''t do if you''re just having a chit chat conversation and it''s all just scrolling", + "tokens": [51232, 300, 291, 393, 380, 360, 498, 291, 434, 445, 1419, 257, 417, 270, + 5081, 3761, 293, 309, 311, 439, 445, 11369, 278, 51408], "temperature": 0.0, "avg_logprob": + -0.1703711918422154, "compression_ratio": 1.7720588235294117, "no_speech_prob": + 0.0019141300581395626}, {"id": 419, "seek": 277056, "start": 2791.44, "end": 2798.48, + "text": " back into the back. So I think it''s a cool enough idea. I think there''s + some indications that it''s", "tokens": [51408, 646, 666, 264, 646, 13, 407, 286, + 519, 309, 311, 257, 1627, 1547, 1558, 13, 286, 519, 456, 311, 512, 44450, 300, 309, + 311, 51760], "temperature": 0.0, "avg_logprob": -0.1703711918422154, "compression_ratio": + 1.7720588235294117, "no_speech_prob": 0.0019141300581395626}, {"id": 420, "seek": + 279848, "start": 2798.48, "end": 2804.4, "text": " coming into existence with, you + know, Anthropocardic factor, GPT, OpenAI''s Canvas.", "tokens": [50364, 1348, 666, + 9123, 365, 11, 291, 458, 11, 12727, 1513, 905, 515, 299, 5952, 11, 26039, 51, 11, + 7238, 48698, 311, 25725, 13, 50660], "temperature": 0.0, "avg_logprob": -0.2982587192369544, + "compression_ratio": 1.51528384279476, "no_speech_prob": 0.003516318742185831}, + {"id": 421, "seek": 279848, "start": 2808.0, "end": 2812.4, "text": " Persure is + actually implicitly doing a really good job with somehow they''re doing this.", + "tokens": [50840, 14006, 540, 307, 767, 26947, 356, 884, 257, 534, 665, 1691, 365, + 6063, 436, 434, 884, 341, 13, 51060], 
"temperature": 0.0, "avg_logprob": -0.2982587192369544, + "compression_ratio": 1.51528384279476, "no_speech_prob": 0.003516318742185831}, + {"id": 422, "seek": 279848, "start": 2813.12, "end": 2818.64, "text": " So it''ll + come into reality, I think, at some point. It just gives you an idea.", "tokens": + [51096, 407, 309, 603, 808, 666, 4103, 11, 286, 519, 11, 412, 512, 935, 13, 467, + 445, 2709, 291, 364, 1558, 13, 51372], "temperature": 0.0, "avg_logprob": -0.2982587192369544, + "compression_ratio": 1.51528384279476, "no_speech_prob": 0.003516318742185831}, + {"id": 423, "seek": 279848, "start": 2818.64, "end": 2824.88, "text": " Yeah. It + feels like it structures the interaction with the element. It doesn''t feel like + you lost", "tokens": [51372, 865, 13, 467, 3417, 411, 309, 9227, 264, 9285, 365, + 264, 4478, 13, 467, 1177, 380, 841, 411, 291, 2731, 51684], "temperature": 0.0, + "avg_logprob": -0.2982587192369544, "compression_ratio": 1.51528384279476, "no_speech_prob": + 0.003516318742185831}, {"id": 424, "seek": 282488, "start": 2824.88, "end": 2829.92, + "text": " your time in a way that you, like, it''s like you need to summarize it + for your conversation, right?", "tokens": [50364, 428, 565, 294, 257, 636, 300, + 291, 11, 411, 11, 309, 311, 411, 291, 643, 281, 20858, 309, 337, 428, 3761, 11, + 558, 30, 50616], "temperature": 0.0, "avg_logprob": -0.21321314175923664, "compression_ratio": + 1.700374531835206, "no_speech_prob": 0.0097951740026474}, {"id": 425, "seek": 282488, + "start": 2829.92, "end": 2834.96, "text": " To go back and like tell you what was + important, right? 
But how does it all know what is important?", "tokens": [50616, + 1407, 352, 646, 293, 411, 980, 291, 437, 390, 1021, 11, 558, 30, 583, 577, 775, + 309, 439, 458, 437, 307, 1021, 30, 50868], "temperature": 0.0, "avg_logprob": -0.21321314175923664, + "compression_ratio": 1.700374531835206, "no_speech_prob": 0.0097951740026474}, {"id": + 426, "seek": 282488, "start": 2834.96, "end": 2839.2000000000003, "text": " You + know, but you already forgot. And so if you have this artifacts, you can refer to + them.", "tokens": [50868, 509, 458, 11, 457, 291, 1217, 5298, 13, 400, 370, 498, + 291, 362, 341, 24617, 11, 291, 393, 2864, 281, 552, 13, 51080], "temperature": 0.0, + "avg_logprob": -0.21321314175923664, "compression_ratio": 1.700374531835206, "no_speech_prob": + 0.0097951740026474}, {"id": 427, "seek": 282488, "start": 2841.2000000000003, "end": + 2848.1600000000003, "text": " But it''s interesting that I think these artifacts + can you use them? And by the way, I don''t know,", "tokens": [51180, 583, 309, 311, + 1880, 300, 286, 519, 613, 24617, 393, 291, 764, 552, 30, 400, 538, 264, 636, 11, + 286, 500, 380, 458, 11, 51528], "temperature": 0.0, "avg_logprob": -0.21321314175923664, + "compression_ratio": 1.700374531835206, "no_speech_prob": 0.0097951740026474}, {"id": + 428, "seek": 282488, "start": 2848.1600000000003, "end": 2850.8, "text": " if you + can demo something quickly, I saw a demo on your website.", "tokens": [51528, 498, + 291, 393, 10723, 746, 2661, 11, 286, 1866, 257, 10723, 322, 428, 3144, 13, 51660], + "temperature": 0.0, "avg_logprob": -0.21321314175923664, "compression_ratio": 1.700374531835206, + "no_speech_prob": 0.0097951740026474}, {"id": 429, "seek": 285080, "start": 2851.76, + "end": 2859.36, "text": " All right. So this is how do you go to my website? 
Oh, + and you know, check this out.", "tokens": [50412, 1057, 558, 13, 407, 341, 307, + 577, 360, 291, 352, 281, 452, 3144, 30, 876, 11, 293, 291, 458, 11, 1520, 341, 484, + 13, 50792], "temperature": 0.0, "avg_logprob": -0.30674143040433843, "compression_ratio": + 1.4881516587677726, "no_speech_prob": 0.11062902212142944}, {"id": 430, "seek": + 285080, "start": 2859.36, "end": 2867.2000000000003, "text": " This website was + me and like chat GPT and cursor just kind of hanging out, teaching me some HTML.", + "tokens": [50792, 639, 3144, 390, 385, 293, 411, 5081, 26039, 51, 293, 28169, 445, + 733, 295, 8345, 484, 11, 4571, 385, 512, 17995, 13, 51184], "temperature": 0.0, + "avg_logprob": -0.30674143040433843, "compression_ratio": 1.4881516587677726, "no_speech_prob": + 0.11062902212142944}, {"id": 431, "seek": 285080, "start": 2867.2000000000003, "end": + 2873.76, "text": " But yeah, you go to my blog. Wait a second. Wait a second. You''ve + built this site with an LLM.", "tokens": [51184, 583, 1338, 11, 291, 352, 281, 452, + 6968, 13, 3802, 257, 1150, 13, 3802, 257, 1150, 13, 509, 600, 3094, 341, 3621, 365, + 364, 441, 43, 44, 13, 51512], "temperature": 0.0, "avg_logprob": -0.30674143040433843, + "compression_ratio": 1.4881516587677726, "no_speech_prob": 0.11062902212142944}, + {"id": 432, "seek": 285080, "start": 2875.04, "end": 2877.44, "text": " Correct? + Yeah. That''s what you said.", "tokens": [51576, 12753, 30, 865, 13, 663, 311, 437, + 291, 848, 13, 51696], "temperature": 0.0, "avg_logprob": -0.30674143040433843, "compression_ratio": + 1.4881516587677726, "no_speech_prob": 0.11062902212142944}, {"id": 433, "seek": + 287744, "start": 2877.44, "end": 2882.4, "text": " Well, it was me and a large language + ball. 
It wouldn''t be just saying build a website.", "tokens": [50364, 1042, 11, + 309, 390, 385, 293, 257, 2416, 2856, 2594, 13, 467, 2759, 380, 312, 445, 1566, 1322, + 257, 3144, 13, 50612], "temperature": 0.0, "avg_logprob": -0.27813111490278103, + "compression_ratio": 1.7601476014760147, "no_speech_prob": 0.025666004046797752}, + {"id": 434, "seek": 287744, "start": 2882.4, "end": 2886.08, "text": " Of course. + It''s going to, it''s the, it''s what''s going to happen in our future.", "tokens": + [50612, 2720, 1164, 13, 467, 311, 516, 281, 11, 309, 311, 264, 11, 309, 311, 437, + 311, 516, 281, 1051, 294, 527, 2027, 13, 50796], "temperature": 0.0, "avg_logprob": + -0.27813111490278103, "compression_ratio": 1.7601476014760147, "no_speech_prob": + 0.025666004046797752}, {"id": 435, "seek": 287744, "start": 2886.08, "end": 2890.32, + "text": " It was everything is going to be a conversation working on this with a + large language ball.", "tokens": [50796, 467, 390, 1203, 307, 516, 281, 312, 257, + 3761, 1364, 322, 341, 365, 257, 2416, 2856, 2594, 13, 51008], "temperature": 0.0, + "avg_logprob": -0.27813111490278103, "compression_ratio": 1.7601476014760147, "no_speech_prob": + 0.025666004046797752}, {"id": 436, "seek": 287744, "start": 2890.32, "end": 2895.12, + "text": " It''s a beautiful website. I have to say, yeah, amazing. And the logo.", + "tokens": [51008, 467, 311, 257, 2238, 3144, 13, 286, 362, 281, 584, 11, 1338, 11, + 2243, 13, 400, 264, 9699, 13, 51248], "temperature": 0.0, "avg_logprob": -0.27813111490278103, + "compression_ratio": 1.7601476014760147, "no_speech_prob": 0.025666004046797752}, + {"id": 437, "seek": 287744, "start": 2895.12, "end": 2901.6, "text": " Even the + sniffy little logo was generated. AI. Oh, amazing. Okay. 
This is ridiculous.", "tokens": + [51248, 2754, 264, 31101, 88, 707, 9699, 390, 10833, 13, 7318, 13, 876, 11, 2243, + 13, 1033, 13, 639, 307, 11083, 13, 51572], "temperature": 0.0, "avg_logprob": -0.27813111490278103, + "compression_ratio": 1.7601476014760147, "no_speech_prob": 0.025666004046797752}, + {"id": 438, "seek": 287744, "start": 2901.6, "end": 2904.0, "text": " I''m going + to take up just a little bit of your time. It''s okay.", "tokens": [51572, 286, + 478, 516, 281, 747, 493, 445, 257, 707, 857, 295, 428, 565, 13, 467, 311, 1392, + 13, 51692], "temperature": 0.0, "avg_logprob": -0.27813111490278103, "compression_ratio": + 1.7601476014760147, "no_speech_prob": 0.025666004046797752}, {"id": 439, "seek": + 290400, "start": 2904.96, "end": 2910.56, "text": " Oh, it''s fine. This logo right + here. Check out how many cool things out. There''s,", "tokens": [50412, 876, 11, + 309, 311, 2489, 13, 639, 9699, 558, 510, 13, 6881, 484, 577, 867, 1627, 721, 484, + 13, 821, 311, 11, 50692], "temperature": 0.0, "avg_logprob": -0.20681597636296198, + "compression_ratio": 1.6079295154185023, "no_speech_prob": 0.012884764932096004}, + {"id": 440, "seek": 290400, "start": 2910.56, "end": 2914.72, "text": " there''s + a bunch of little bits in here. And then I''ll make give you a quiz so you can find + the last", "tokens": [50692, 456, 311, 257, 3840, 295, 707, 9239, 294, 510, 13, + 400, 550, 286, 603, 652, 976, 291, 257, 15450, 370, 291, 393, 915, 264, 1036, 50900], + "temperature": 0.0, "avg_logprob": -0.20681597636296198, "compression_ratio": 1.6079295154185023, + "no_speech_prob": 0.012884764932096004}, {"id": 441, "seek": 290400, "start": 2916.4, + "end": 2922.48, "text": " thing hiding in this. Arcturus is a star in the Northern + hemisphere. 
It''s a navigational", "tokens": [50984, 551, 10596, 294, 341, 13, 1587, + 349, 374, 301, 307, 257, 3543, 294, 264, 14335, 38453, 13, 467, 311, 257, 7407, + 1478, 51288], "temperature": 0.0, "avg_logprob": -0.20681597636296198, "compression_ratio": + 1.6079295154185023, "no_speech_prob": 0.012884764932096004}, {"id": 442, "seek": + 290400, "start": 2922.48, "end": 2929.44, "text": " star. It''s a brightest star. + And it means guardian of the bear. And so with my cubo logo here,", "tokens": [51288, + 3543, 13, 467, 311, 257, 36271, 3543, 13, 400, 309, 1355, 30355, 295, 264, 6155, + 13, 400, 370, 365, 452, 10057, 78, 9699, 510, 11, 51636], "temperature": 0.0, "avg_logprob": + -0.20681597636296198, "compression_ratio": 1.6079295154185023, "no_speech_prob": + 0.012884764932096004}, {"id": 443, "seek": 292944, "start": 2929.44, "end": 2934.4, + "text": " you''ve got the a you got the bear. The a is kind of serves. It''s a little + looks like", "tokens": [50364, 291, 600, 658, 264, 257, 291, 658, 264, 6155, 13, + 440, 257, 307, 733, 295, 13451, 13, 467, 311, 257, 707, 1542, 411, 50612], "temperature": + 0.0, "avg_logprob": -0.24754694529942103, "compression_ratio": 1.6487603305785123, + "no_speech_prob": 0.0011738804168999195}, {"id": 444, "seek": 292944, "start": 2934.4, + "end": 2939.12, "text": " guardian. The bears represent of the big hairy problem. + That''s powerful. And but I''m going to,", "tokens": [50612, 30355, 13, 440, 17276, + 2906, 295, 264, 955, 42346, 1154, 13, 663, 311, 4005, 13, 400, 457, 286, 478, 516, + 281, 11, 50848], "temperature": 0.0, "avg_logprob": -0.24754694529942103, "compression_ratio": + 1.6487603305785123, "no_speech_prob": 0.0011738804168999195}, {"id": 445, "seek": + 292944, "start": 2939.12, "end": 2943.52, "text": " I''m going to help you out. + The stars are all for for pointed. 
It''s navigational.", "tokens": [50848, 286, + 478, 516, 281, 854, 291, 484, 13, 440, 6105, 366, 439, 337, 337, 10932, 13, 467, + 311, 7407, 1478, 13, 51068], "temperature": 0.0, "avg_logprob": -0.24754694529942103, + "compression_ratio": 1.6487603305785123, "no_speech_prob": 0.0011738804168999195}, + {"id": 446, "seek": 292944, "start": 2944.64, "end": 2951.76, "text": " There''s + one more little uh, uh, Easter egg in this that I didn''t notice until I finished + building it.", "tokens": [51124, 821, 311, 472, 544, 707, 2232, 11, 2232, 11, 9403, + 3777, 294, 341, 300, 286, 994, 380, 3449, 1826, 286, 4335, 2390, 309, 13, 51480], + "temperature": 0.0, "avg_logprob": -0.24754694529942103, "compression_ratio": 1.6487603305785123, + "no_speech_prob": 0.0011738804168999195}, {"id": 447, "seek": 292944, "start": 2951.76, + "end": 2953.36, "text": " I didn''t design it. It just emerged.", "tokens": [51480, + 286, 994, 380, 1715, 309, 13, 467, 445, 20178, 13, 51560], "temperature": 0.0, "avg_logprob": + -0.24754694529942103, "compression_ratio": 1.6487603305785123, "no_speech_prob": + 0.0011738804168999195}, {"id": 448, "seek": 295336, "start": 2954.08, "end": 2958.88, + "text": " And if, yeah, I''ll start doing it.", "tokens": [50400, 400, 498, 11, + 1338, 11, 286, 603, 722, 884, 309, 13, 50640], "temperature": 0.0, "avg_logprob": + -0.2758225003878276, "compression_ratio": 1.6648936170212767, "no_speech_prob": + 0.036097023636102676}, {"id": 449, "seek": 295336, "start": 2961.6, "end": 2967.28, + "text": " If you''re a good computer scientist, especially, oh yeah, then yeah, + yeah, a star search,", "tokens": [50776, 759, 291, 434, 257, 665, 3820, 12662, 11, + 2318, 11, 1954, 1338, 11, 550, 1338, 11, 1338, 11, 257, 3543, 3164, 11, 51060], + "temperature": 0.0, "avg_logprob": -0.2758225003878276, "compression_ratio": 1.6648936170212767, + "no_speech_prob": 0.036097023636102676}, {"id": 450, "seek": 295336, "start": 2967.28, + "end": 2973.04, "text": " a star search. 
You got it. You got it. You got it. I didn''t + even think about it. I just thought", "tokens": [51060, 257, 3543, 3164, 13, 509, + 658, 309, 13, 509, 658, 309, 13, 509, 658, 309, 13, 286, 994, 380, 754, 519, 466, + 309, 13, 286, 445, 1194, 51348], "temperature": 0.0, "avg_logprob": -0.2758225003878276, + "compression_ratio": 1.6648936170212767, "no_speech_prob": 0.036097023636102676}, + {"id": 451, "seek": 295336, "start": 2973.04, "end": 2977.92, "text": " this needs + kind of a star over here. And I looked at it and it''s a star, which is, you know,", + "tokens": [51348, 341, 2203, 733, 295, 257, 3543, 670, 510, 13, 400, 286, 2956, + 412, 309, 293, 309, 311, 257, 3543, 11, 597, 307, 11, 291, 458, 11, 51592], "temperature": + 0.0, "avg_logprob": -0.2758225003878276, "compression_ratio": 1.6648936170212767, + "no_speech_prob": 0.036097023636102676}, {"id": 452, "seek": 297792, "start": 2978.0, + "end": 2982.16, "text": " optimal, near optimal navigation of the difficult domain + ahead.", "tokens": [50368, 16252, 11, 2651, 16252, 17346, 295, 264, 2252, 9274, + 2286, 13, 50576], "temperature": 0.0, "avg_logprob": -0.4031072096391158, "compression_ratio": + 1.5089285714285714, "no_speech_prob": 0.01714983582496643}, {"id": 453, "seek": + 297792, "start": 2983.6, "end": 2989.6, "text": " LMS is a good at creating Easter + eggs then. Yeah, very terrible jazz. 
Yeah.", "tokens": [50648, 441, 10288, 307, + 257, 665, 412, 4084, 9403, 6466, 550, 13, 865, 11, 588, 6237, 15066, 13, 865, 13, + 50948], "temperature": 0.0, "avg_logprob": -0.4031072096391158, "compression_ratio": + 1.5089285714285714, "no_speech_prob": 0.01714983582496643}, {"id": 454, "seek": + 297792, "start": 2991.44, "end": 2998.08, "text": " So anyway, sorry, sorry for + the, also the stars that was, I mean, these stars are amazing as well.", "tokens": + [51040, 407, 4033, 11, 2597, 11, 2597, 337, 264, 11, 611, 264, 6105, 300, 390, 11, + 286, 914, 11, 613, 6105, 366, 2243, 382, 731, 13, 51372], "temperature": 0.0, "avg_logprob": + -0.4031072096391158, "compression_ratio": 1.5089285714285714, "no_speech_prob": + 0.01714983582496643}, {"id": 455, "seek": 297792, "start": 2998.08, "end": 3005.36, + "text": " You can just stare at them, right? And Marvel, they move, they look a + bit like snowflakes sometimes", "tokens": [51372, 509, 393, 445, 22432, 412, 552, + 11, 558, 30, 400, 13837, 11, 436, 1286, 11, 436, 574, 257, 857, 411, 44124, 3419, + 2171, 51736], "temperature": 0.0, "avg_logprob": -0.4031072096391158, "compression_ratio": + 1.5089285714285714, "no_speech_prob": 0.01714983582496643}, {"id": 456, "seek": + 300536, "start": 3005.44, "end": 3012.1600000000003, "text": " as well. Yep, they + do. 
All right, so thank you for the digression.", "tokens": [50368, 382, 731, 13, + 7010, 11, 436, 360, 13, 1057, 558, 11, 370, 1309, 291, 337, 264, 2528, 2775, 13, + 50704], "temperature": 0.0, "avg_logprob": -0.16619229524031928, "compression_ratio": + 1.6885245901639345, "no_speech_prob": 0.0031185667030513287}, {"id": 457, "seek": + 300536, "start": 3013.52, "end": 3018.56, "text": " We''re looking through my blog + and we''re looking through, uh, cut the chit chat with artifacts.", "tokens": [50772, + 492, 434, 1237, 807, 452, 6968, 293, 321, 434, 1237, 807, 11, 2232, 11, 1723, 264, + 417, 270, 5081, 365, 24617, 13, 51024], "temperature": 0.0, "avg_logprob": -0.16619229524031928, + "compression_ratio": 1.6885245901639345, "no_speech_prob": 0.0031185667030513287}, + {"id": 458, "seek": 300536, "start": 3018.56, "end": 3022.6400000000003, "text": + " One thing I''m trying to do recently with my blog, and I hope you guys will, you + know,", "tokens": [51024, 1485, 551, 286, 478, 1382, 281, 360, 3938, 365, 452, 6968, + 11, 293, 286, 1454, 291, 1074, 486, 11, 291, 458, 11, 51228], "temperature": 0.0, + "avg_logprob": -0.16619229524031928, "compression_ratio": 1.6885245901639345, "no_speech_prob": + 0.0031185667030513287}, {"id": 459, "seek": 300536, "start": 3022.6400000000003, + "end": 3027.6, "text": " there''s plenty of place where you can, uh, like, subscribe + for this. I''m trying to put in", "tokens": [51228, 456, 311, 7140, 295, 1081, 689, + 291, 393, 11, 2232, 11, 411, 11, 3022, 337, 341, 13, 286, 478, 1382, 281, 829, 294, + 51476], "temperature": 0.0, "avg_logprob": -0.16619229524031928, "compression_ratio": + 1.6885245901639345, "no_speech_prob": 0.0031185667030513287}, {"id": 460, "seek": + 300536, "start": 3028.8, "end": 3033.76, "text": " plenty of examples. 
And here''s + the kind of built-in example of it working.", "tokens": [51536, 7140, 295, 5110, + 13, 400, 510, 311, 264, 733, 295, 3094, 12, 259, 1365, 295, 309, 1364, 13, 51784], + "temperature": 0.0, "avg_logprob": -0.16619229524031928, "compression_ratio": 1.6885245901639345, + "no_speech_prob": 0.0031185667030513287}, {"id": 461, "seek": 303536, "start": 3035.6800000000003, + "end": 3042.96, "text": " Let''s see. You know what, this, we might very well edit + this out, but I''m going to go down to", "tokens": [50380, 961, 311, 536, 13, 509, + 458, 437, 11, 341, 11, 321, 1062, 588, 731, 8129, 341, 484, 11, 457, 286, 478, 516, + 281, 352, 760, 281, 50744], "temperature": 0.0, "avg_logprob": -0.193627709740991, + "compression_ratio": 1.6483050847457628, "no_speech_prob": 0.0037873743567615747}, + {"id": 462, "seek": 303536, "start": 3042.96, "end": 3051.52, "text": " the now + you try a bit right here. Oh, if this is, uh, in a naive approach, uh, let''s say + that I''m", "tokens": [50744, 264, 586, 291, 853, 257, 857, 558, 510, 13, 876, 11, + 498, 341, 307, 11, 2232, 11, 294, 257, 29052, 3109, 11, 2232, 11, 718, 311, 584, + 300, 286, 478, 51172], "temperature": 0.0, "avg_logprob": -0.193627709740991, "compression_ratio": + 1.6483050847457628, "no_speech_prob": 0.0037873743567615747}, {"id": 463, "seek": + 303536, "start": 3051.52, "end": 3058.2400000000002, "text": " building like a real + estate, uh, helper assistant. I help real estate agents. And the real estate agent", + "tokens": [51172, 2390, 411, 257, 957, 9749, 11, 2232, 11, 36133, 10994, 13, 286, + 854, 957, 9749, 12554, 13, 400, 264, 957, 9749, 9461, 51508], "temperature": 0.0, + "avg_logprob": -0.193627709740991, "compression_ratio": 1.6483050847457628, "no_speech_prob": + 0.0037873743567615747}, {"id": 464, "seek": 303536, "start": 3058.2400000000002, + "end": 3063.1200000000003, "text": " says, I want to put together an email for client + about, and I''m listed on Oak Street. 
Can you", "tokens": [51508, 1619, 11, 286, + 528, 281, 829, 1214, 364, 3796, 337, 6423, 466, 11, 293, 286, 478, 10052, 322, 19692, + 7638, 13, 1664, 291, 51752], "temperature": 0.0, "avg_logprob": -0.193627709740991, + "compression_ratio": 1.6483050847457628, "no_speech_prob": 0.0037873743567615747}, + {"id": 465, "seek": 306312, "start": 3063.12, "end": 3069.12, "text": " hold a listing? + And so the thing has some tools built in. Uh, it''s got a get listing tool. And + so", "tokens": [50364, 1797, 257, 22161, 30, 400, 370, 264, 551, 575, 512, 3873, + 3094, 294, 13, 4019, 11, 309, 311, 658, 257, 483, 22161, 2290, 13, 400, 370, 50664], + "temperature": 0.0, "avg_logprob": -0.13347744260515484, "compression_ratio": 1.9254901960784314, + "no_speech_prob": 0.003623353084549308}, {"id": 466, "seek": 306312, "start": 3069.12, + "end": 3076.4, "text": " you can see all the garbage that puts in there. And it''s + got this listing, um, but like, I don''t,", "tokens": [50664, 291, 393, 536, 439, + 264, 14150, 300, 8137, 294, 456, 13, 400, 309, 311, 658, 341, 22161, 11, 1105, 11, + 457, 411, 11, 286, 500, 380, 11, 51028], "temperature": 0.0, "avg_logprob": -0.13347744260515484, + "compression_ratio": 1.9254901960784314, "no_speech_prob": 0.003623353084549308}, + {"id": 467, "seek": 306312, "start": 3076.4, "end": 3080.7999999999997, "text": + " I''ve got the listing. It says it''s got the listing. 
Somehow all this garbage, + there''s a listing,", "tokens": [51028, 286, 600, 658, 264, 22161, 13, 467, 1619, + 309, 311, 658, 264, 22161, 13, 28357, 439, 341, 14150, 11, 456, 311, 257, 22161, + 11, 51248], "temperature": 0.0, "avg_logprob": -0.13347744260515484, "compression_ratio": + 1.9254901960784314, "no_speech_prob": 0.003623353084549308}, {"id": 468, "seek": + 306312, "start": 3080.7999999999997, "end": 3086.64, "text": " but I don''t know + what the listing''s really about, um, and so I could ask about it, but then it''s,", + "tokens": [51248, 457, 286, 500, 380, 458, 437, 264, 22161, 311, 534, 466, 11, 1105, + 11, 293, 370, 286, 727, 1029, 466, 309, 11, 457, 550, 309, 311, 11, 51540], "temperature": + 0.0, "avg_logprob": -0.13347744260515484, "compression_ratio": 1.9254901960784314, + "no_speech_prob": 0.003623353084549308}, {"id": 469, "seek": 306312, "start": 3086.64, + "end": 3090.7999999999997, "text": " it''s a filter. I don''t have the thing that + came from the database. I have this weird filter in front", "tokens": [51540, 309, + 311, 257, 6608, 13, 286, 500, 380, 362, 264, 551, 300, 1361, 490, 264, 8149, 13, + 286, 362, 341, 3657, 6608, 294, 1868, 51748], "temperature": 0.0, "avg_logprob": + -0.13347744260515484, "compression_ratio": 1.9254901960784314, "no_speech_prob": + 0.003623353084549308}, {"id": 470, "seek": 309080, "start": 3090.8, "end": 3096.4, + "text": " of it. Uh, can you pull an email template and draft a new email in another + tool that it has?", "tokens": [50364, 295, 309, 13, 4019, 11, 393, 291, 2235, 364, + 3796, 12379, 293, 11206, 257, 777, 3796, 294, 1071, 2290, 300, 309, 575, 30, 50644], + "temperature": 0.0, "avg_logprob": -0.3757033348083496, "compression_ratio": 1.2713178294573644, + "no_speech_prob": 0.0034126616083085537}, {"id": 471, "seek": 309080, "start": 3105.36, + "end": 3108.1600000000003, "text": " I guess it''s going to take its sweet time + to do it. 
Oh, of course.", "tokens": [51092, 286, 2041, 309, 311, 516, 281, 747, + 1080, 3844, 565, 281, 360, 309, 13, 876, 11, 295, 1164, 13, 51232], "temperature": + 0.0, "avg_logprob": -0.3757033348083496, "compression_ratio": 1.2713178294573644, + "no_speech_prob": 0.0034126616083085537}, {"id": 472, "seek": 309080, "start": 3109.6000000000004, + "end": 3109.84, "text": " Hmm.", "tokens": [51304, 8239, 13, 51316], "temperature": + 0.0, "avg_logprob": -0.3757033348083496, "compression_ratio": 1.2713178294573644, + "no_speech_prob": 0.0034126616083085537}, {"id": 473, "seek": 310984, "start": 3110.0, + "end": 3110.8, "text": " Hmm.", "tokens": [50372, 8239, 13, 50412], "temperature": + 0.0, "avg_logprob": -0.34123297660581525, "compression_ratio": 1.5487179487179488, + "no_speech_prob": 0.00704128947108984}, {"id": 474, "seek": 310984, "start": 3117.04, + "end": 3124.6400000000003, "text": " Okay. Um, so it drafts, it drafts an email, + but oh, look, I''ve forgotten this, the, the buyer''s name.", "tokens": [50724, + 1033, 13, 3301, 11, 370, 309, 11206, 82, 11, 309, 11206, 82, 364, 3796, 11, 457, + 1954, 11, 574, 11, 286, 600, 11832, 341, 11, 264, 11, 264, 24645, 311, 1315, 13, + 51104], "temperature": 0.0, "avg_logprob": -0.34123297660581525, "compression_ratio": + 1.5487179487179488, "no_speech_prob": 0.00704128947108984}, {"id": 475, "seek": + 310984, "start": 3124.6400000000003, "end": 3130.1600000000003, "text": " So this + is one version of the email that is relevant to this thing right here. 
Uh, but, + you know,", "tokens": [51104, 407, 341, 307, 472, 3037, 295, 264, 3796, 300, 307, + 7340, 281, 341, 551, 558, 510, 13, 4019, 11, 457, 11, 291, 458, 11, 51380], "temperature": + 0.0, "avg_logprob": -0.34123297660581525, "compression_ratio": 1.5487179487179488, + "no_speech_prob": 0.00704128947108984}, {"id": 476, "seek": 310984, "start": 3130.1600000000003, + "end": 3134.88, "text": " I''ve forgotten to tell you his name is Tim Cersei and + my company''s name is Artie Tristral Estate.", "tokens": [51380, 286, 600, 11832, + 281, 980, 291, 702, 1315, 307, 7172, 26402, 43665, 293, 452, 2237, 311, 1315, 307, + 5735, 414, 1765, 468, 2155, 48097, 13, 51616], "temperature": 0.0, "avg_logprob": + -0.34123297660581525, "compression_ratio": 1.5487179487179488, "no_speech_prob": + 0.00704128947108984}, {"id": 477, "seek": 313488, "start": 3135.84, "end": 3141.84, + "text": " Uh, it goes back to this and so it fills it in and then I''m left at the + end of the conversation,", "tokens": [50412, 4019, 11, 309, 1709, 646, 281, 341, + 293, 370, 309, 22498, 309, 294, 293, 550, 286, 478, 1411, 412, 264, 917, 295, 264, + 3761, 11, 50712], "temperature": 0.0, "avg_logprob": -0.12002897974270493, "compression_ratio": + 1.7527272727272727, "no_speech_prob": 0.012477729469537735}, {"id": 478, "seek": + 313488, "start": 3141.84, "end": 3146.8, "text": " you know, copy and paste in this + out. 
If this is what I want, I''m going to paste this in the user''s", "tokens": + [50712, 291, 458, 11, 5055, 293, 9163, 294, 341, 484, 13, 759, 341, 307, 437, 286, + 528, 11, 286, 478, 516, 281, 9163, 341, 294, 264, 4195, 311, 50960], "temperature": + 0.0, "avg_logprob": -0.12002897974270493, "compression_ratio": 1.7527272727272727, + "no_speech_prob": 0.012477729469537735}, {"id": 479, "seek": 313488, "start": 3146.8, + "end": 3151.2000000000003, "text": " email and be really embarrassed when it''s + got this little string at the top because I''ve copied that", "tokens": [50960, + 3796, 293, 312, 534, 16843, 562, 309, 311, 658, 341, 707, 6798, 412, 264, 1192, + 570, 286, 600, 25365, 300, 51180], "temperature": 0.0, "avg_logprob": -0.12002897974270493, + "compression_ratio": 1.7527272727272727, "no_speech_prob": 0.012477729469537735}, + {"id": 480, "seek": 313488, "start": 3151.2000000000003, "end": 3156.56, "text": + " out. And if I wanted to do anything else like modify the template or do anything, + it''s,", "tokens": [51180, 484, 13, 400, 498, 286, 1415, 281, 360, 1340, 1646, 411, + 16927, 264, 12379, 420, 360, 1340, 11, 309, 311, 11, 51448], "temperature": 0.0, + "avg_logprob": -0.12002897974270493, "compression_ratio": 1.7527272727272727, "no_speech_prob": + 0.012477729469537735}, {"id": 481, "seek": 313488, "start": 3157.28, "end": 3161.6, + "text": " it''s, it''s just, it''s not there for me. All right. So let''s, let''s + do a similar experience with,", "tokens": [51484, 309, 311, 11, 309, 311, 445, 11, + 309, 311, 406, 456, 337, 385, 13, 1057, 558, 13, 407, 718, 311, 11, 718, 311, 360, + 257, 2531, 1752, 365, 11, 51700], "temperature": 0.0, "avg_logprob": -0.12002897974270493, + "compression_ratio": 1.7527272727272727, "no_speech_prob": 0.012477729469537735}, + {"id": 482, "seek": 316160, "start": 3162.08, "end": 3167.36, "text": " with this. + I want to, uh, again, pull out that listing per, for Oak Street. 
Oh, I have an interesting.", + "tokens": [50388, 365, 341, 13, 286, 528, 281, 11, 2232, 11, 797, 11, 2235, 484, + 300, 22161, 680, 11, 337, 19692, 7638, 13, 876, 11, 286, 362, 364, 1880, 13, 50652], + "temperature": 0.0, "avg_logprob": -0.26006824269014245, "compression_ratio": 1.4567307692307692, + "no_speech_prob": 0.005329612176865339}, {"id": 483, "seek": 316160, "start": 3174.08, + "end": 3179.12, "text": " All right. So in this time, I''m still showing that it, + it knows how to use tools, but every time it", "tokens": [50988, 1057, 558, 13, + 407, 294, 341, 565, 11, 286, 478, 920, 4099, 300, 309, 11, 309, 3255, 577, 281, + 764, 3873, 11, 457, 633, 565, 309, 51240], "temperature": 0.0, "avg_logprob": -0.26006824269014245, + "compression_ratio": 1.4567307692307692, "no_speech_prob": 0.005329612176865339}, + {"id": 484, "seek": 316160, "start": 3179.12, "end": 3185.44, "text": " tries to + spit out this like JSON stuff, it''s actually getting substituted in with an HRF + that points", "tokens": [51240, 9898, 281, 22127, 484, 341, 411, 31828, 1507, 11, + 309, 311, 767, 1242, 26441, 4866, 294, 365, 364, 389, 49, 37, 300, 2793, 51556], + "temperature": 0.0, "avg_logprob": -0.26006824269014245, "compression_ratio": 1.4567307692307692, + "no_speech_prob": 0.005329612176865339}, {"id": 485, "seek": 318544, "start": 3185.52, + "end": 3191.6, "text": " to it. And what is it point to where you click on it and + it automatically loads, uh, this", "tokens": [50368, 281, 309, 13, 400, 437, 307, + 309, 935, 281, 689, 291, 2052, 322, 309, 293, 309, 6772, 12668, 11, 2232, 11, 341, + 50672], "temperature": 0.0, "avg_logprob": -0.2031742654195646, "compression_ratio": + 1.6379310344827587, "no_speech_prob": 0.016665233299136162}, {"id": 486, "seek": + 318544, "start": 3191.6, "end": 3196.56, "text": " scar right here. 
Now, um, I didn''t + take time to make a real pretty interface, but you can,", "tokens": [50672, 10569, + 558, 510, 13, 823, 11, 1105, 11, 286, 994, 380, 747, 565, 281, 652, 257, 957, 1238, + 9226, 11, 457, 291, 393, 11, 50920], "temperature": 0.0, "avg_logprob": -0.2031742654195646, + "compression_ratio": 1.6379310344827587, "no_speech_prob": 0.016665233299136162}, + {"id": 487, "seek": 318544, "start": 3196.56, "end": 3200.7200000000003, "text": + " imagine this is JSON. You can make this look like anything you want to. You can + make it link out", "tokens": [50920, 3811, 341, 307, 31828, 13, 509, 393, 652, 341, + 574, 411, 1340, 291, 528, 281, 13, 509, 393, 652, 309, 2113, 484, 51128], "temperature": + 0.0, "avg_logprob": -0.2031742654195646, "compression_ratio": 1.6379310344827587, + "no_speech_prob": 0.016665233299136162}, {"id": 488, "seek": 318544, "start": 3200.7200000000003, + "end": 3205.52, "text": " to the database and do all sorts of things. All right. + I''m going to put together that email template", "tokens": [51128, 281, 264, 8149, + 293, 360, 439, 7527, 295, 721, 13, 1057, 558, 13, 286, 478, 516, 281, 829, 1214, + 300, 3796, 12379, 51368], "temperature": 0.0, "avg_logprob": -0.2031742654195646, + "compression_ratio": 1.6379310344827587, "no_speech_prob": 0.016665233299136162}, + {"id": 489, "seek": 318544, "start": 3205.52, "end": 3211.84, "text": " again. Yeah. + I guess especially when you build a dedicated LAM application, right? 
You know what", + "tokens": [51368, 797, 13, 865, 13, 286, 2041, 2318, 562, 291, 1322, 257, 8374, + 441, 2865, 3861, 11, 558, 30, 509, 458, 437, 51684], "temperature": 0.0, "avg_logprob": + -0.2031742654195646, "compression_ratio": 1.6379310344827587, "no_speech_prob": + 0.016665233299136162}, {"id": 490, "seek": 321184, "start": 3211.84, "end": 3217.04, + "text": " type of, what types of objects you''re going to be interacting with and + you can build the, you know,", "tokens": [50364, 2010, 295, 11, 437, 3467, 295, + 6565, 291, 434, 516, 281, 312, 18017, 365, 293, 291, 393, 1322, 264, 11, 291, 458, + 11, 50624], "temperature": 0.0, "avg_logprob": -0.2142804219172551, "compression_ratio": + 1.7376425855513309, "no_speech_prob": 0.0027152097318321466}, {"id": 491, "seek": + 321184, "start": 3217.76, "end": 3221.84, "text": " I can do UI around those, right? + But yeah, a very flexible,", "tokens": [50660, 286, 393, 360, 15682, 926, 729, 11, + 558, 30, 583, 1338, 11, 257, 588, 11358, 11, 50864], "temperature": 0.0, "avg_logprob": + -0.2142804219172551, "compression_ratio": 1.7376425855513309, "no_speech_prob": + 0.0027152097318321466}, {"id": 492, "seek": 321184, "start": 3221.84, "end": 3226.08, + "text": " manable interface. The interface is whatever the user needs it to be potentially. + Yeah. All right.", "tokens": [50864, 587, 712, 9226, 13, 440, 9226, 307, 2035, 264, + 4195, 2203, 309, 281, 312, 7263, 13, 865, 13, 1057, 558, 13, 51076], "temperature": + 0.0, "avg_logprob": -0.2142804219172551, "compression_ratio": 1.7376425855513309, + "no_speech_prob": 0.0027152097318321466}, {"id": 493, "seek": 321184, "start": 3226.08, + "end": 3233.28, "text": " So it''s, uh, it''s, uh, it''s, uh, got this customized + email draft. 
Now, uh, you know, I was looking", "tokens": [51076, 407, 309, 311, + 11, 2232, 11, 309, 311, 11, 2232, 11, 309, 311, 11, 2232, 11, 658, 341, 30581, 3796, + 11206, 13, 823, 11, 2232, 11, 291, 458, 11, 286, 390, 1237, 51436], "temperature": + 0.0, "avg_logprob": -0.2142804219172551, "compression_ratio": 1.7376425855513309, + "no_speech_prob": 0.0027152097318321466}, {"id": 494, "seek": 321184, "start": 3233.28, + "end": 3237.84, "text": " here, there''s no email draft here, but there is here + on, on the side of the screen. And you can see", "tokens": [51436, 510, 11, 456, + 311, 572, 3796, 11206, 510, 11, 457, 456, 307, 510, 322, 11, 322, 264, 1252, 295, + 264, 2568, 13, 400, 291, 393, 536, 51664], "temperature": 0.0, "avg_logprob": -0.2142804219172551, + "compression_ratio": 1.7376425855513309, "no_speech_prob": 0.0027152097318321466}, + {"id": 495, "seek": 323784, "start": 3237.84, "end": 3245.6800000000003, "text": + " unfortunately, I forgot, uh, to stick in the user''s names. So, uh, let''s see. + Here''s the template", "tokens": [50364, 7015, 11, 286, 5298, 11, 2232, 11, 281, + 2897, 294, 264, 4195, 311, 5288, 13, 407, 11, 2232, 11, 718, 311, 536, 13, 1692, + 311, 264, 12379, 50756], "temperature": 0.0, "avg_logprob": -0.12056072737819465, + "compression_ratio": 1.7647058823529411, "no_speech_prob": 0.0036527165211737156}, + {"id": 496, "seek": 323784, "start": 3245.6800000000003, "end": 3250.0, "text": + " that it used. We didn''t see that in the last example. You can see how it wants + to put together", "tokens": [50756, 300, 309, 1143, 13, 492, 994, 380, 536, 300, + 294, 264, 1036, 1365, 13, 509, 393, 536, 577, 309, 2738, 281, 829, 1214, 50972], + "temperature": 0.0, "avg_logprob": -0.12056072737819465, "compression_ratio": 1.7647058823529411, + "no_speech_prob": 0.0036527165211737156}, {"id": 497, "seek": 323784, "start": 3250.0, + "end": 3256.48, "text": " stuff. You can see how it actually put together stuff. 
+ And whenever I said I forgot his name,", "tokens": [50972, 1507, 13, 509, 393, 536, + 577, 309, 767, 829, 1214, 1507, 13, 400, 5699, 286, 848, 286, 5298, 702, 1315, 11, + 51296], "temperature": 0.0, "avg_logprob": -0.12056072737819465, "compression_ratio": + 1.7647058823529411, "no_speech_prob": 0.0036527165211737156}, {"id": 498, "seek": + 323784, "start": 3256.48, "end": 3260.88, "text": " it said, Oh, okay, I''ve updated + that artifact for you. So you don''t have like multiple versions", "tokens": [51296, + 309, 848, 11, 876, 11, 1392, 11, 286, 600, 10588, 300, 34806, 337, 291, 13, 407, + 291, 500, 380, 362, 411, 3866, 9606, 51516], "temperature": 0.0, "avg_logprob": + -0.12056072737819465, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0036527165211737156}, {"id": 499, "seek": 323784, "start": 3260.88, "end": 3266.48, + "text": " scaling up. You just got this. And you could do even interesting things + like I could say, uh, you", "tokens": [51516, 21589, 493, 13, 509, 445, 658, 341, + 13, 400, 291, 727, 360, 754, 1880, 721, 411, 286, 727, 584, 11, 2232, 11, 291, 51796], + "temperature": 0.0, "avg_logprob": -0.12056072737819465, "compression_ratio": 1.7647058823529411, + "no_speech_prob": 0.0036527165211737156}, {"id": 500, "seek": 326648, "start": 3266.96, + "end": 3273.12, "text": " know, this is much better if I say gone bare, you man + right here. And that is now part of the artifact", "tokens": [50388, 458, 11, 341, + 307, 709, 1101, 498, 286, 584, 2780, 6949, 11, 291, 587, 558, 510, 13, 400, 300, + 307, 586, 644, 295, 264, 34806, 50696], "temperature": 0.0, "avg_logprob": -0.22544487061039095, + "compression_ratio": 1.6942446043165467, "no_speech_prob": 0.001434390083886683}, + {"id": 501, "seek": 326648, "start": 3273.12, "end": 3278.16, "text": " that the + assistant sees. 
Uh, it''s, it''s in that artifact section at the top of the prompt.", + "tokens": [50696, 300, 264, 10994, 8194, 13, 4019, 11, 309, 311, 11, 309, 311, 294, + 300, 34806, 3541, 412, 264, 1192, 295, 264, 12391, 13, 50948], "temperature": 0.0, + "avg_logprob": -0.22544487061039095, "compression_ratio": 1.6942446043165467, "no_speech_prob": + 0.001434390083886683}, {"id": 502, "seek": 326648, "start": 3278.64, "end": 3282.8, + "text": " You can have it say, please change my email prompt forever to say something + out of like this.", "tokens": [50972, 509, 393, 362, 309, 584, 11, 1767, 1319, 452, + 3796, 12391, 5680, 281, 584, 746, 484, 295, 411, 341, 13, 51180], "temperature": + 0.0, "avg_logprob": -0.22544487061039095, "compression_ratio": 1.6942446043165467, + "no_speech_prob": 0.001434390083886683}, {"id": 503, "seek": 326648, "start": 3282.8, + "end": 3287.76, "text": " And you can work on this and say that back to the day. + It just, oh, it opens up a lot of", "tokens": [51180, 400, 291, 393, 589, 322, 341, + 293, 584, 300, 646, 281, 264, 786, 13, 467, 445, 11, 1954, 11, 309, 9870, 493, 257, + 688, 295, 51428], "temperature": 0.0, "avg_logprob": -0.22544487061039095, "compression_ratio": + 1.6942446043165467, "no_speech_prob": 0.001434390083886683}, {"id": 504, "seek": + 326648, "start": 3288.56, "end": 3294.2400000000002, "text": " possibilities for + a user experience that is easier. Because when we get work done, we get work", "tokens": + [51468, 12178, 337, 257, 4195, 1752, 300, 307, 3571, 13, 1436, 562, 321, 483, 589, + 1096, 11, 321, 483, 589, 51752], "temperature": 0.0, "avg_logprob": -0.22544487061039095, + "compression_ratio": 1.6942446043165467, "no_speech_prob": 0.001434390083886683}, + {"id": 505, "seek": 329424, "start": 3294.72, "end": 3301.68, "text": " on things, + not just check. Yeah. 
Your reason, your reason around artifacts and you work with + them", "tokens": [50388, 322, 721, 11, 406, 445, 1520, 13, 865, 13, 2260, 1778, + 11, 428, 1778, 926, 24617, 293, 291, 589, 365, 552, 50736], "temperature": 0.0, + "avg_logprob": -0.1835279228273502, "compression_ratio": 1.7946768060836502, "no_speech_prob": + 0.021373102441430092}, {"id": 506, "seek": 329424, "start": 3301.68, "end": 3306.72, + "text": " like as if they were physical objects almost, right? You can take away + this thing with you and", "tokens": [50736, 411, 382, 498, 436, 645, 4001, 6565, + 1920, 11, 558, 30, 509, 393, 747, 1314, 341, 551, 365, 291, 293, 50988], "temperature": + 0.0, "avg_logprob": -0.1835279228273502, "compression_ratio": 1.7946768060836502, + "no_speech_prob": 0.021373102441430092}, {"id": 507, "seek": 329424, "start": 3306.72, + "end": 3312.9599999999996, "text": " go proceed with your task. Yep. You refer to + them. You modify them. We use them to do things.", "tokens": [50988, 352, 8991, + 365, 428, 5633, 13, 7010, 13, 509, 2864, 281, 552, 13, 509, 16927, 552, 13, 492, + 764, 552, 281, 360, 721, 13, 51300], "temperature": 0.0, "avg_logprob": -0.1835279228273502, + "compression_ratio": 1.7946768060836502, "no_speech_prob": 0.021373102441430092}, + {"id": 508, "seek": 329424, "start": 3313.9199999999996, "end": 3318.56, "text": + " And you could, I''m guessing. I''m really guessing. I''m new to this topic. You + could maybe even", "tokens": [51348, 400, 291, 727, 11, 286, 478, 17939, 13, 286, + 478, 534, 17939, 13, 286, 478, 777, 281, 341, 4829, 13, 509, 727, 1310, 754, 51580], + "temperature": 0.0, "avg_logprob": -0.1835279228273502, "compression_ratio": 1.7946768060836502, + "no_speech_prob": 0.021373102441430092}, {"id": 509, "seek": 329424, "start": 3318.56, + "end": 3322.7999999999997, "text": " condition the model on these things, right? 
+ You could say given this artifact, I want to do", "tokens": [51580, 4188, 264, 2316, + 322, 613, 721, 11, 558, 30, 509, 727, 584, 2212, 341, 34806, 11, 286, 528, 281, + 360, 51792], "temperature": 0.0, "avg_logprob": -0.1835279228273502, "compression_ratio": + 1.7946768060836502, "no_speech_prob": 0.021373102441430092}, {"id": 510, "seek": + 332280, "start": 3322.8, "end": 3329.04, "text": " something else with it, like + rewrite some parts. You know, would that work? I mean,", "tokens": [50364, 746, + 1646, 365, 309, 11, 411, 28132, 512, 3166, 13, 509, 458, 11, 576, 300, 589, 30, + 286, 914, 11, 50676], "temperature": 0.0, "avg_logprob": -0.1401319806537931, "compression_ratio": + 1.7944664031620554, "no_speech_prob": 0.002486889250576496}, {"id": 511, "seek": + 332280, "start": 3331.04, "end": 3335.36, "text": " this kind of sky is the limit. + Uh, it''s, it''s kind of been a fun thing to think about. But", "tokens": [50776, + 341, 733, 295, 5443, 307, 264, 4948, 13, 4019, 11, 309, 311, 11, 309, 311, 733, + 295, 668, 257, 1019, 551, 281, 519, 466, 13, 583, 50992], "temperature": 0.0, "avg_logprob": + -0.1401319806537931, "compression_ratio": 1.7944664031620554, "no_speech_prob": + 0.002486889250576496}, {"id": 512, "seek": 332280, "start": 3336.5600000000004, + "end": 3340.96, "text": " you could have typed artifacts. And then when you have + a certain type of artifact, you could", "tokens": [51052, 291, 727, 362, 33941, + 24617, 13, 400, 550, 562, 291, 362, 257, 1629, 2010, 295, 34806, 11, 291, 727, 51272], + "temperature": 0.0, "avg_logprob": -0.1401319806537931, "compression_ratio": 1.7944664031620554, + "no_speech_prob": 0.002486889250576496}, {"id": 513, "seek": 332280, "start": 3341.52, + "end": 3347.28, "text": " introduce the tools. 
Uh, so like, you know, if we need + to modify this artifact artifact, we can,", "tokens": [51300, 5366, 264, 3873, 13, + 4019, 11, 370, 411, 11, 291, 458, 11, 498, 321, 643, 281, 16927, 341, 34806, 34806, + 11, 321, 393, 11, 51588], "temperature": 0.0, "avg_logprob": -0.1401319806537931, + "compression_ratio": 1.7944664031620554, "no_speech_prob": 0.002486889250576496}, + {"id": 514, "seek": 332280, "start": 3347.28, "end": 3351.1200000000003, "text": + " we can know how to deal with it. You can have, it''s kind of what I did with my + next post,", "tokens": [51588, 321, 393, 458, 577, 281, 2028, 365, 309, 13, 509, + 393, 362, 11, 309, 311, 733, 295, 437, 286, 630, 365, 452, 958, 2183, 11, 51780], + "temperature": 0.0, "avg_logprob": -0.1401319806537931, "compression_ratio": 1.7944664031620554, + "no_speech_prob": 0.002486889250576496}, {"id": 515, "seek": 335112, "start": 3351.68, + "end": 3355.8399999999997, "text": " the roaming rag. You can have artifacts that + are like accordions. They''re, they''re bigger than", "tokens": [50392, 264, 42680, + 17539, 13, 509, 393, 362, 24617, 300, 366, 411, 18640, 626, 13, 814, 434, 11, 436, + 434, 3801, 813, 50600], "temperature": 0.0, "avg_logprob": -0.1719035179384293, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.004317078739404678}, + {"id": 516, "seek": 335112, "start": 3355.8399999999997, "end": 3363.2, "text": + " fit in the prompt. But you can say, you know, here''s summarized outline everything + in every piece", "tokens": [50600, 3318, 294, 264, 12391, 13, 583, 291, 393, 584, + 11, 291, 458, 11, 510, 311, 14611, 1602, 16387, 1203, 294, 633, 2522, 50968], "temperature": + 0.0, "avg_logprob": -0.1719035179384293, "compression_ratio": 1.772563176895307, + "no_speech_prob": 0.004317078739404678}, {"id": 517, "seek": 335112, "start": 3363.2, + "end": 3368.08, "text": " of that summary, uh, the model effectively can click on + it and expand it. 
And it''s just another", "tokens": [50968, 295, 300, 12691, 11, + 2232, 11, 264, 2316, 8659, 393, 2052, 322, 309, 293, 5268, 309, 13, 400, 309, 311, + 445, 1071, 51212], "temperature": 0.0, "avg_logprob": -0.1719035179384293, "compression_ratio": + 1.772563176895307, "no_speech_prob": 0.004317078739404678}, {"id": 518, "seek": + 335112, "start": 3368.08, "end": 3374.24, "text": " ID and, you know, a tool expand + the section. So it can read docs that are bigger than fits in its", "tokens": [51212, + 7348, 293, 11, 291, 458, 11, 257, 2290, 5268, 264, 3541, 13, 407, 309, 393, 1401, + 45623, 300, 366, 3801, 813, 9001, 294, 1080, 51520], "temperature": 0.0, "avg_logprob": + -0.1719035179384293, "compression_ratio": 1.772563176895307, "no_speech_prob": 0.004317078739404678}, + {"id": 519, "seek": 335112, "start": 3374.24, "end": 3380.0, "text": " context. + There''s just a lot of neat things that I think you can do with artifacts at the + starting point.", "tokens": [51520, 4319, 13, 821, 311, 445, 257, 688, 295, 10654, + 721, 300, 286, 519, 291, 393, 360, 365, 24617, 412, 264, 2891, 935, 13, 51808], + "temperature": 0.0, "avg_logprob": -0.1719035179384293, "compression_ratio": 1.772563176895307, + "no_speech_prob": 0.004317078739404678}, {"id": 520, "seek": 338000, "start": 3380.72, + "end": 3386.32, "text": " It''s very interesting. 
Don''t you think that just one + thought across my mind is that when we", "tokens": [50400, 467, 311, 588, 1880, + 13, 1468, 380, 291, 519, 300, 445, 472, 1194, 2108, 452, 1575, 307, 300, 562, 321, + 50680], "temperature": 0.0, "avg_logprob": -0.15813912285698783, "compression_ratio": + 1.61864406779661, "no_speech_prob": 0.0071645695716142654}, {"id": 521, "seek": + 338000, "start": 3386.32, "end": 3392.4, "text": " transitioned from static web + to like web 2.0, I guess, so what is what was it called when you can", "tokens": + [50680, 47346, 490, 13437, 3670, 281, 411, 3670, 568, 13, 15, 11, 286, 2041, 11, + 370, 437, 307, 437, 390, 309, 1219, 562, 291, 393, 50984], "temperature": 0.0, "avg_logprob": + -0.15813912285698783, "compression_ratio": 1.61864406779661, "no_speech_prob": 0.0071645695716142654}, + {"id": 522, "seek": 338000, "start": 3392.4, "end": 3397.44, "text": " actually + modify things on the web, right? You could send a comment, you could, you could + do stuff.", "tokens": [50984, 767, 16927, 721, 322, 264, 3670, 11, 558, 30, 509, + 727, 2845, 257, 2871, 11, 291, 727, 11, 291, 727, 360, 1507, 13, 51236], "temperature": + 0.0, "avg_logprob": -0.15813912285698783, "compression_ratio": 1.61864406779661, + "no_speech_prob": 0.0071645695716142654}, {"id": 523, "seek": 338000, "start": 3398.24, + "end": 3405.2, "text": " Uh, now it feels like we''ve transitioned into the new + phase when we do the same to the ideas.", "tokens": [51276, 4019, 11, 586, 309, + 3417, 411, 321, 600, 47346, 666, 264, 777, 5574, 562, 321, 360, 264, 912, 281, 264, + 3487, 13, 51624], "temperature": 0.0, "avg_logprob": -0.15813912285698783, "compression_ratio": + 1.61864406779661, "no_speech_prob": 0.0071645695716142654}, {"id": 524, "seek": + 340520, "start": 3405.8399999999997, "end": 3413.2, "text": " We like exchange ideas + and we can like modify them, you know, prior on them, prompt with them,", "tokens": + [50396, 492, 411, 7742, 3487, 293, 321, 393, 411, 16927, 552, 11, 
291, 458, 11, + 4059, 322, 552, 11, 12391, 365, 552, 11, 50764], "temperature": 0.0, "avg_logprob": + -0.15131521224975586, "compression_ratio": 1.6056338028169015, "no_speech_prob": + 0.019538797438144684}, {"id": 525, "seek": 340520, "start": 3414.24, "end": 3420.0, + "text": " uh, take away a store. So it becomes more on the concept level.", "tokens": + [50816, 2232, 11, 747, 1314, 257, 3531, 13, 407, 309, 3643, 544, 322, 264, 3410, + 1496, 13, 51104], "temperature": 0.0, "avg_logprob": -0.15131521224975586, "compression_ratio": + 1.6056338028169015, "no_speech_prob": 0.019538797438144684}, {"id": 526, "seek": + 340520, "start": 3422.24, "end": 3427.2799999999997, "text": " I think everything''s + going to get really weird, uh, going forward. I think we''ve been used to", "tokens": + [51216, 286, 519, 1203, 311, 516, 281, 483, 534, 3657, 11, 2232, 11, 516, 2128, + 13, 286, 519, 321, 600, 668, 1143, 281, 51468], "temperature": 0.0, "avg_logprob": + -0.15131521224975586, "compression_ratio": 1.6056338028169015, "no_speech_prob": + 0.019538797438144684}, {"id": 527, "seek": 340520, "start": 3427.2799999999997, + "end": 3431.6, "text": " going to the internet and going to web pages. And even + if we could interact a little bit,", "tokens": [51468, 516, 281, 264, 4705, 293, + 516, 281, 3670, 7183, 13, 400, 754, 498, 321, 727, 4648, 257, 707, 857, 11, 51684], + "temperature": 0.0, "avg_logprob": -0.15131521224975586, "compression_ratio": 1.6056338028169015, + "no_speech_prob": 0.019538797438144684}, {"id": 528, "seek": 343160, "start": 3431.68, + "end": 3436.72, "text": " it''s nothing like you''re about to see. 
I wonder if a + lot of the internet experiences,", "tokens": [50368, 309, 311, 1825, 411, 291, 434, + 466, 281, 536, 13, 286, 2441, 498, 257, 688, 295, 264, 4705, 5235, 11, 50620], "temperature": + 0.0, "avg_logprob": -0.1799073259369666, "compression_ratio": 1.7992424242424243, + "no_speech_prob": 0.002159588737413287}, {"id": 529, "seek": 343160, "start": 3436.72, + "end": 3442.0, "text": " you know, they''re worried about all the text going away, + uh, because like we were, we''d run out,", "tokens": [50620, 291, 458, 11, 436, + 434, 5804, 466, 439, 264, 2487, 516, 1314, 11, 2232, 11, 570, 411, 321, 645, 11, + 321, 1116, 1190, 484, 11, 50884], "temperature": 0.0, "avg_logprob": -0.1799073259369666, + "compression_ratio": 1.7992424242424243, "no_speech_prob": 0.002159588737413287}, + {"id": 530, "seek": 343160, "start": 3442.0, "end": 3446.64, "text": " run out of + the text, the internet, training these giant giant models. Maybe the future of the", + "tokens": [50884, 1190, 484, 295, 264, 2487, 11, 264, 4705, 11, 3097, 613, 7410, + 7410, 5245, 13, 2704, 264, 2027, 295, 264, 51116], "temperature": 0.0, "avg_logprob": + -0.1799073259369666, "compression_ratio": 1.7992424242424243, "no_speech_prob": + 0.002159588737413287}, {"id": 531, "seek": 343160, "start": 3446.64, "end": 3453.44, + "text": " internet is going to be replaced by just conversations. 
The, you''re going + to go to a place that is a", "tokens": [51116, 4705, 307, 516, 281, 312, 10772, + 538, 445, 7315, 13, 440, 11, 291, 434, 516, 281, 352, 281, 257, 1081, 300, 307, + 257, 51456], "temperature": 0.0, "avg_logprob": -0.1799073259369666, "compression_ratio": + 1.7992424242424243, "no_speech_prob": 0.002159588737413287}, {"id": 532, "seek": + 343160, "start": 3453.44, "end": 3458.72, "text": " sensible, you know, starting + point, but the whole website is going to become whatever reality you", "tokens": + [51456, 25380, 11, 291, 458, 11, 2891, 935, 11, 457, 264, 1379, 3144, 307, 516, + 281, 1813, 2035, 4103, 291, 51720], "temperature": 0.0, "avg_logprob": -0.1799073259369666, + "compression_ratio": 1.7992424242424243, "no_speech_prob": 0.002159588737413287}, + {"id": 533, "seek": 345872, "start": 3458.72, "end": 3464.48, "text": " need it + to be at the time. And I have no idea how we harvest the text of that train of future + models.", "tokens": [50364, 643, 309, 281, 312, 412, 264, 565, 13, 400, 286, 362, + 572, 1558, 577, 321, 11917, 264, 2487, 295, 300, 3847, 295, 2027, 5245, 13, 50652], + "temperature": 0.0, "avg_logprob": -0.28321976788276065, "compression_ratio": 1.5153846153846153, + "no_speech_prob": 0.01129200030118227}, {"id": 534, "seek": 345872, "start": 3464.48, + "end": 3470.64, "text": " It might be crazy, but I think I think we''re getting + ready for a future we cannot possibly predict.", "tokens": [50652, 467, 1062, 312, + 3219, 11, 457, 286, 519, 286, 519, 321, 434, 1242, 1919, 337, 257, 2027, 321, 2644, + 6264, 6069, 13, 50960], "temperature": 0.0, "avg_logprob": -0.28321976788276065, + "compression_ratio": 1.5153846153846153, "no_speech_prob": 0.01129200030118227}, + {"id": 535, "seek": 345872, "start": 3471.3599999999997, "end": 3475.7599999999998, + "text": " Yeah, and I think spam will be replaced by slope, right? 
I don''t know + if you heard of this,", "tokens": [50996, 865, 11, 293, 286, 519, 24028, 486, 312, + 10772, 538, 13525, 11, 558, 30, 286, 500, 380, 458, 498, 291, 2198, 295, 341, 11, + 51216], "temperature": 0.0, "avg_logprob": -0.28321976788276065, "compression_ratio": + 1.5153846153846153, "no_speech_prob": 0.01129200030118227}, {"id": 536, "seek": + 345872, "start": 3476.8799999999997, "end": 3487.52, "text": " YouTube. No, slow, + slow, slow is, uh, SLOP. So it''s basically an unverified output of an LLAM model.", + "tokens": [51272, 3088, 13, 883, 11, 2964, 11, 2964, 11, 2964, 307, 11, 2232, 11, + 22999, 12059, 13, 407, 309, 311, 1936, 364, 517, 331, 2587, 5598, 295, 364, 441, + 43, 2865, 2316, 13, 51804], "temperature": 0.0, "avg_logprob": -0.28321976788276065, + "compression_ratio": 1.5153846153846153, "no_speech_prob": 0.01129200030118227}, + {"id": 537, "seek": 348752, "start": 3487.52, "end": 3493.52, "text": " So something + that got produced back to your question, you don''t, you have no idea if it''s true + or not,", "tokens": [50364, 407, 746, 300, 658, 7126, 646, 281, 428, 1168, 11, 291, + 500, 380, 11, 291, 362, 572, 1558, 498, 309, 311, 2074, 420, 406, 11, 50664], "temperature": + 0.0, "avg_logprob": -0.18153953552246094, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.013459110632538795}, {"id": 538, "seek": 348752, "start": 3493.52, + "end": 3498.96, "text": " you go and paste it somewhere in the web and then LLAM + goes and scraps it and learns from it.", "tokens": [50664, 291, 352, 293, 9163, + 309, 4079, 294, 264, 3670, 293, 550, 441, 43, 2865, 1709, 293, 45204, 309, 293, + 27152, 490, 309, 13, 50936], "temperature": 0.0, "avg_logprob": -0.18153953552246094, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.013459110632538795}, + {"id": 539, "seek": 348752, "start": 3499.68, "end": 3506.48, "text": " So you spam + the model. And so there is a call out. If this feedback effect. 
Yeah, exactly.", + "tokens": [50972, 407, 291, 24028, 264, 2316, 13, 400, 370, 456, 307, 257, 818, + 484, 13, 759, 341, 5824, 1802, 13, 865, 11, 2293, 13, 51312], "temperature": 0.0, + "avg_logprob": -0.18153953552246094, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.013459110632538795}, {"id": 540, "seek": 348752, "start": 3506.48, "end": 3513.04, + "text": " And there is a call out that, hey, let''s not spam or let''s not post + slope on the web because that will", "tokens": [51312, 400, 456, 307, 257, 818, + 484, 300, 11, 4177, 11, 718, 311, 406, 24028, 420, 718, 311, 406, 2183, 13525, 322, + 264, 3670, 570, 300, 486, 51640], "temperature": 0.0, "avg_logprob": -0.18153953552246094, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.013459110632538795}, + {"id": 541, "seek": 351304, "start": 3513.04, "end": 3521.84, "text": " bite us + because we are moving so far ahead in the LLAM. And who is obeying that recommendation?", + "tokens": [50364, 7988, 505, 570, 321, 366, 2684, 370, 1400, 2286, 294, 264, 441, + 43, 2865, 13, 400, 567, 307, 36346, 1840, 300, 11879, 30, 50804], "temperature": + 0.0, "avg_logprob": -0.18967651006743663, "compression_ratio": 1.5695364238410596, + "no_speech_prob": 0.032904285937547684}, {"id": 542, "seek": 351304, "start": 3521.84, + "end": 3527.2799999999997, "text": " Exactly. Probably not the companies that need + content produced. Yeah. Yeah. The moment you say,", "tokens": [50804, 7587, 13, + 9210, 406, 264, 3431, 300, 643, 2701, 7126, 13, 865, 13, 865, 13, 440, 1623, 291, + 584, 11, 51076], "temperature": 0.0, "avg_logprob": -0.18967651006743663, "compression_ratio": + 1.5695364238410596, "no_speech_prob": 0.032904285937547684}, {"id": 543, "seek": + 351304, "start": 3527.2799999999997, "end": 3532.08, "text": " don''t do something, + there will be a bunch of people saying, oh, let''s try. 
That sounds like fine.", + "tokens": [51076, 500, 380, 360, 746, 11, 456, 486, 312, 257, 3840, 295, 561, 1566, + 11, 1954, 11, 718, 311, 853, 13, 663, 3263, 411, 2489, 13, 51316], "temperature": + 0.0, "avg_logprob": -0.18967651006743663, "compression_ratio": 1.5695364238410596, + "no_speech_prob": 0.032904285937547684}, {"id": 544, "seek": 351304, "start": 3532.08, + "end": 3537.12, "text": " Oh, that''s a good idea. And then we need to invent a + solution for that. Hey, Jonathan, it was", "tokens": [51316, 876, 11, 300, 311, + 257, 665, 1558, 13, 400, 550, 321, 643, 281, 7962, 257, 3827, 337, 300, 13, 1911, + 11, 15471, 11, 309, 390, 51568], "temperature": 0.0, "avg_logprob": -0.18967651006743663, + "compression_ratio": 1.5695364238410596, "no_speech_prob": 0.032904285937547684}, + {"id": 545, "seek": 351304, "start": 3537.12, "end": 3542.0, "text": " really exciting. + And I''ve known it like a ton by talking to you. I feel like we can record", "tokens": + [51568, 534, 4670, 13, 400, 286, 600, 2570, 309, 411, 257, 2952, 538, 1417, 281, + 291, 13, 286, 841, 411, 321, 393, 2136, 51812], "temperature": 0.0, "avg_logprob": + -0.18967651006743663, "compression_ratio": 1.5695364238410596, "no_speech_prob": + 0.032904285937547684}, {"id": 546, "seek": 354200, "start": 3542.0, "end": 3549.36, + "text": " probably like like three months style episode, you know, four or five + hours before we get exhausted.", "tokens": [50364, 1391, 411, 411, 1045, 2493, 3758, + 3500, 11, 291, 458, 11, 1451, 420, 1732, 2496, 949, 321, 483, 17992, 13, 50732], + "temperature": 0.0, "avg_logprob": -0.25740442396719243, "compression_ratio": 1.5549738219895288, + "no_speech_prob": 0.04945411905646324}, {"id": 547, "seek": 354200, "start": 3549.36, + "end": 3557.84, "text": " But I also wanted to give you a chance to, you know, go + on stage and sort of and talk about your book", "tokens": [50732, 583, 286, 611, + 1415, 281, 976, 291, 257, 2931, 281, 11, 291, 458, 11, 352, 322, 3233, 293, 1333, 
+ 295, 293, 751, 466, 428, 1446, 51156], "temperature": 0.0, "avg_logprob": -0.25740442396719243, + "compression_ratio": 1.5549738219895288, "no_speech_prob": 0.04945411905646324}, + {"id": 548, "seek": 354200, "start": 3558.8, "end": 3565.84, "text": " way. Like, + why do you think everyone needs to read it? I want to read it. If I get a chance + to", "tokens": [51204, 636, 13, 1743, 11, 983, 360, 291, 519, 1518, 2203, 281, 1401, + 309, 30, 286, 528, 281, 1401, 309, 13, 759, 286, 483, 257, 2931, 281, 51556], "temperature": + 0.0, "avg_logprob": -0.25740442396719243, "compression_ratio": 1.5549738219895288, + "no_speech_prob": 0.04945411905646324}, {"id": 549, "seek": 356584, "start": 3566.4, + "end": 3572.4, "text": " get my hands on it, hopefully soon. Everyone needs to read + it because every time I make a sale,", "tokens": [50392, 483, 452, 2377, 322, 309, + 11, 4696, 2321, 13, 5198, 2203, 281, 1401, 309, 570, 633, 565, 286, 652, 257, 8680, + 11, 50692], "temperature": 0.0, "avg_logprob": -0.21144647418328053, "compression_ratio": + 1.6385542168674698, "no_speech_prob": 0.02142273634672165}, {"id": 550, "seek": + 356584, "start": 3572.4, "end": 3577.84, "text": " I get one cup of coffee. So that''s + why everyone needs to read it. Of course. Yeah, that''s a good reason.", "tokens": + [50692, 286, 483, 472, 4414, 295, 4982, 13, 407, 300, 311, 983, 1518, 2203, 281, + 1401, 309, 13, 2720, 1164, 13, 865, 11, 300, 311, 257, 665, 1778, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.21144647418328053, "compression_ratio": 1.6385542168674698, + "no_speech_prob": 0.02142273634672165}, {"id": 551, "seek": 356584, "start": 3579.84, + "end": 3587.36, "text": " But then also, yeah, go ahead. 
No, I wanted also you to + give you a chance to talk about your company.", "tokens": [51064, 583, 550, 611, + 11, 1338, 11, 352, 2286, 13, 883, 11, 286, 1415, 611, 291, 281, 976, 291, 257, 2931, + 281, 751, 466, 428, 2237, 13, 51440], "temperature": 0.0, "avg_logprob": -0.21144647418328053, + "compression_ratio": 1.6385542168674698, "no_speech_prob": 0.02142273634672165}, + {"id": 552, "seek": 356584, "start": 3587.36, "end": 3595.28, "text": " Because + I know that feeling of starting something new on your own, you call yourself an + indie consultant,", "tokens": [51440, 1436, 286, 458, 300, 2633, 295, 2891, 746, + 777, 322, 428, 1065, 11, 291, 818, 1803, 364, 33184, 24676, 11, 51836], "temperature": + 0.0, "avg_logprob": -0.21144647418328053, "compression_ratio": 1.6385542168674698, + "no_speech_prob": 0.02142273634672165}, {"id": 553, "seek": 359528, "start": 3595.28, + "end": 3604.32, "text": " right? At the same time, you have so much with you and + your luggage, right? Like you, the knowledge", "tokens": [50364, 558, 30, 1711, + 264, 912, 565, 11, 291, 362, 370, 709, 365, 291, 293, 428, 27744, 11, 558, 30, 1743, + 291, 11, 264, 3601, 50816], "temperature": 0.0, "avg_logprob": -0.1874878908458509, + "compression_ratio": 1.5128205128205128, "no_speech_prob": 0.006824716459959745}, + {"id": 554, "seek": 359528, "start": 3604.32, "end": 3610.7200000000003, "text": + " of the experience. And so why not share it in a different way through your company. + But I wanted", "tokens": [50816, 295, 264, 1752, 13, 400, 370, 983, 406, 2073, 309, + 294, 257, 819, 636, 807, 428, 2237, 13, 583, 286, 1415, 51136], "temperature": 0.0, + "avg_logprob": -0.1874878908458509, "compression_ratio": 1.5128205128205128, "no_speech_prob": + 0.006824716459959745}, {"id": 555, "seek": 359528, "start": 3610.7200000000003, + "end": 3619.52, "text": " to learn a bit more. What is your vision for the company? 
+ What do you think you will offer like in", "tokens": [51136, 281, 1466, 257, 857, + 544, 13, 708, 307, 428, 5201, 337, 264, 2237, 30, 708, 360, 291, 519, 291, 486, + 2626, 411, 294, 51576], "temperature": 0.0, "avg_logprob": -0.1874878908458509, + "compression_ratio": 1.5128205128205128, "no_speech_prob": 0.006824716459959745}, + {"id": 556, "seek": 361952, "start": 3619.52, "end": 3624.8, "text": " midterm? + Where do you create the value for the customers? And maybe there will be some customers", + "tokens": [50364, 2062, 7039, 30, 2305, 360, 291, 1884, 264, 2158, 337, 264, 4581, + 30, 400, 1310, 456, 486, 312, 512, 4581, 50628], "temperature": 0.0, "avg_logprob": + -0.14651490960802352, "compression_ratio": 1.718045112781955, "no_speech_prob": + 0.016938241198658943}, {"id": 557, "seek": 361952, "start": 3624.8, "end": 3630.64, + "text": " listening in this podcast, hopefully. Sure. Well, okay, let''s go through + both of those then.", "tokens": [50628, 4764, 294, 341, 7367, 11, 4696, 13, 4894, + 13, 1042, 11, 1392, 11, 718, 311, 352, 807, 1293, 295, 729, 550, 13, 50920], "temperature": + 0.0, "avg_logprob": -0.14651490960802352, "compression_ratio": 1.718045112781955, + "no_speech_prob": 0.016938241198658943}, {"id": 558, "seek": 361952, "start": 3631.7599999999998, + "end": 3636.4, "text": " I hope I hope everyone reads the book. I hope they enjoy + it. 
I hope they learn from it.", "tokens": [50976, 286, 1454, 286, 1454, 1518, 15700, + 264, 1446, 13, 286, 1454, 436, 2103, 309, 13, 286, 1454, 436, 1466, 490, 309, 13, + 51208], "temperature": 0.0, "avg_logprob": -0.14651490960802352, "compression_ratio": + 1.718045112781955, "no_speech_prob": 0.016938241198658943}, {"id": 559, "seek": + 361952, "start": 3638.16, "end": 3642.48, "text": " Working with large language + models is a very different beast from what you''re used to.", "tokens": [51296, + 18337, 365, 2416, 2856, 5245, 307, 257, 588, 819, 13464, 490, 437, 291, 434, 1143, + 281, 13, 51512], "temperature": 0.0, "avg_logprob": -0.14651490960802352, "compression_ratio": + 1.718045112781955, "no_speech_prob": 0.016938241198658943}, {"id": 560, "seek": + 361952, "start": 3643.04, "end": 3648.88, "text": " I think, you know, three years + from now, everyone will be a large language model application", "tokens": [51540, + 286, 519, 11, 291, 458, 11, 1045, 924, 490, 586, 11, 1518, 486, 312, 257, 2416, + 2856, 2316, 3861, 51832], "temperature": 0.0, "avg_logprob": -0.14651490960802352, + "compression_ratio": 1.718045112781955, "no_speech_prob": 0.016938241198658943}, + {"id": 561, "seek": 364888, "start": 3648.88, "end": 3655.6800000000003, "text": + " developer because they''re becoming so prevalent everywhere. So start early. Get + your hands dirty,", "tokens": [50364, 10754, 570, 436, 434, 5617, 370, 30652, 5315, + 13, 407, 722, 2440, 13, 3240, 428, 2377, 9360, 11, 50704], "temperature": 0.0, "avg_logprob": + -0.11855846168720617, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.004083811305463314}, {"id": 562, "seek": 364888, "start": 3655.6800000000003, + "end": 3661.76, "text": " interact with these things. 
And my book helps kind of + take, you know, give you the training wells", "tokens": [50704, 4648, 365, 613, + 721, 13, 400, 452, 1446, 3665, 733, 295, 747, 11, 291, 458, 11, 976, 291, 264, 3097, + 30984, 51008], "temperature": 0.0, "avg_logprob": -0.11855846168720617, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.004083811305463314}, {"id": 563, "seek": + 364888, "start": 3661.76, "end": 3666.32, "text": " at first to understand here + are a bunch of the problems that you run into. Here''s how here''s", "tokens": [51008, + 412, 700, 281, 1223, 510, 366, 257, 3840, 295, 264, 2740, 300, 291, 1190, 666, 13, + 1692, 311, 577, 510, 311, 51236], "temperature": 0.0, "avg_logprob": -0.11855846168720617, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.004083811305463314}, + {"id": 564, "seek": 364888, "start": 3666.32, "end": 3671.28, "text": " how model + works. That''s there''s actually a lot of good intuition and just understanding + the tool", "tokens": [51236, 577, 2316, 1985, 13, 663, 311, 456, 311, 767, 257, + 688, 295, 665, 24002, 293, 445, 3701, 264, 2290, 51484], "temperature": 0.0, "avg_logprob": + -0.11855846168720617, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.004083811305463314}, {"id": 565, "seek": 364888, "start": 3671.28, "end": 3676.4, + "text": " that you''re interacting with. Here''s how to organize a prompt. And that''s + not always easy. You", "tokens": [51484, 300, 291, 434, 18017, 365, 13, 1692, 311, + 577, 281, 13859, 257, 12391, 13, 400, 300, 311, 406, 1009, 1858, 13, 509, 51740], + "temperature": 0.0, "avg_logprob": -0.11855846168720617, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.004083811305463314}, {"id": 566, "seek": 367640, "start": 3676.4, + "end": 3681.52, "text": " got to figure out what''s all the stuff you might use. 
+ And you can''t use all of it because it", "tokens": [50364, 658, 281, 2573, 484, + 437, 311, 439, 264, 1507, 291, 1062, 764, 13, 400, 291, 393, 380, 764, 439, 295, + 309, 570, 309, 50620], "temperature": 0.0, "avg_logprob": -0.1298813819885254, "compression_ratio": + 1.853846153846154, "no_speech_prob": 0.0006139843608252704}, {"id": 567, "seek": + 367640, "start": 3681.52, "end": 3685.44, "text": " doesn''t all fit or because + you don''t want to wait for the latency. You know, it tells you how to,", "tokens": + [50620, 1177, 380, 439, 3318, 420, 570, 291, 500, 380, 528, 281, 1699, 337, 264, + 27043, 13, 509, 458, 11, 309, 5112, 291, 577, 281, 11, 50816], "temperature": 0.0, + "avg_logprob": -0.1298813819885254, "compression_ratio": 1.853846153846154, "no_speech_prob": + 0.0006139843608252704}, {"id": 568, "seek": 367640, "start": 3685.44, "end": 3690.0, + "text": " you know, fit that into a prompt, present it to the model in a way that + kind of empathetically,", "tokens": [50816, 291, 458, 11, 3318, 300, 666, 257, 12391, + 11, 1974, 309, 281, 264, 2316, 294, 257, 636, 300, 733, 295, 27155, 22652, 11, 51044], + "temperature": 0.0, "avg_logprob": -0.1298813819885254, "compression_ratio": 1.853846153846154, + "no_speech_prob": 0.0006139843608252704}, {"id": 569, "seek": 367640, "start": 3690.0, + "end": 3695.12, "text": " the model is going to understand the model is not psychic. + You need to talk to the model as if", "tokens": [51044, 264, 2316, 307, 516, 281, + 1223, 264, 2316, 307, 406, 35406, 13, 509, 643, 281, 751, 281, 264, 2316, 382, 498, + 51300], "temperature": 0.0, "avg_logprob": -0.1298813819885254, "compression_ratio": + 1.853846153846154, "no_speech_prob": 0.0006139843608252704}, {"id": 570, "seek": + 367640, "start": 3695.12, "end": 3700.8, "text": " you''re talking to, you know, + someone that you''re working with. 
And then towards the end of the book,", "tokens": + [51300, 291, 434, 1417, 281, 11, 291, 458, 11, 1580, 300, 291, 434, 1364, 365, 13, + 400, 550, 3030, 264, 917, 295, 264, 1446, 11, 51584], "temperature": 0.0, "avg_logprob": + -0.1298813819885254, "compression_ratio": 1.853846153846154, "no_speech_prob": 0.0006139843608252704}, + {"id": 571, "seek": 370080, "start": 3700.8, "end": 3706.2400000000002, "text": + " it gets outside of a single prompt and it talks about, you know, like this tool + magic word we''ve", "tokens": [50364, 309, 2170, 2380, 295, 257, 2167, 12391, 293, + 309, 6686, 466, 11, 291, 458, 11, 411, 341, 2290, 5585, 1349, 321, 600, 50636], + "temperature": 0.0, "avg_logprob": -0.1326928734779358, "compression_ratio": 1.7962962962962963, + "no_speech_prob": 0.0006211837171576917}, {"id": 572, "seek": 370080, "start": 3706.2400000000002, + "end": 3713.36, "text": " got right now, agency and how to build a assistant behavior + with tools and how to, you know, build a", "tokens": [50636, 658, 558, 586, 11, + 7934, 293, 577, 281, 1322, 257, 10994, 5223, 365, 3873, 293, 577, 281, 11, 291, + 458, 11, 1322, 257, 50992], "temperature": 0.0, "avg_logprob": -0.1326928734779358, + "compression_ratio": 1.7962962962962963, "no_speech_prob": 0.0006211837171576917}, + {"id": 573, "seek": 370080, "start": 3713.36, "end": 3718.88, "text": " more sophisticated + thinking steps with it in review of, you know, what''s happened. 
And it talks", + "tokens": [50992, 544, 16950, 1953, 4439, 365, 309, 294, 3131, 295, 11, 291, 458, + 11, 437, 311, 2011, 13, 400, 309, 6686, 51268], "temperature": 0.0, "avg_logprob": + -0.1326928734779358, "compression_ratio": 1.7962962962962963, "no_speech_prob": + 0.0006211837171576917}, {"id": 574, "seek": 370080, "start": 3718.88, "end": 3724.0800000000004, + "text": " about workflows, which is another type of agency really about how to, + you know, take an input,", "tokens": [51268, 466, 43461, 11, 597, 307, 1071, 2010, + 295, 7934, 534, 466, 577, 281, 11, 291, 458, 11, 747, 364, 4846, 11, 51528], "temperature": + 0.0, "avg_logprob": -0.1326928734779358, "compression_ratio": 1.7962962962962963, + "no_speech_prob": 0.0006211837171576917}, {"id": 575, "seek": 372408, "start": 3724.08, + "end": 3730.88, "text": " bunch of data, pick it apart, do the right steps to get + a job done with hopefully not going off", "tokens": [50364, 3840, 295, 1412, 11, + 1888, 309, 4936, 11, 360, 264, 558, 4439, 281, 483, 257, 1691, 1096, 365, 4696, + 406, 516, 766, 50704], "temperature": 0.0, "avg_logprob": -0.11226291839893048, + "compression_ratio": 1.6099585062240664, "no_speech_prob": 0.00508774584159255}, + {"id": 576, "seek": 372408, "start": 3730.88, "end": 3736.56, "text": " track too + much. We talk a little bit about evaluation and we wrap it up by saying holy cow,", + "tokens": [50704, 2837, 886, 709, 13, 492, 751, 257, 707, 857, 466, 13344, 293, + 321, 7019, 309, 493, 538, 1566, 10622, 8408, 11, 50988], "temperature": 0.0, "avg_logprob": + -0.11226291839893048, "compression_ratio": 1.6099585062240664, "no_speech_prob": + 0.00508774584159255}, {"id": 577, "seek": 372408, "start": 3736.56, "end": 3741.7599999999998, + "text": " look at the future we''re going into. This is going to be amazing. 
So + I hope you get a chance to read", "tokens": [50988, 574, 412, 264, 2027, 321, 434, + 516, 666, 13, 639, 307, 516, 281, 312, 2243, 13, 407, 286, 1454, 291, 483, 257, + 2931, 281, 1401, 51248], "temperature": 0.0, "avg_logprob": -0.11226291839893048, + "compression_ratio": 1.6099585062240664, "no_speech_prob": 0.00508774584159255}, + {"id": 578, "seek": 372408, "start": 3741.7599999999998, "end": 3748.64, "text": + " the book and I hope you enjoy it. I hope it''s as enjoyable to you as it was painful + to me to write.", "tokens": [51248, 264, 1446, 293, 286, 1454, 291, 2103, 309, 13, + 286, 1454, 309, 311, 382, 20305, 281, 291, 382, 309, 390, 11697, 281, 385, 281, + 2464, 13, 51592], "temperature": 0.0, "avg_logprob": -0.11226291839893048, "compression_ratio": + 1.6099585062240664, "no_speech_prob": 0.00508774584159255}, {"id": 579, "seek": + 374864, "start": 3749.52, "end": 3758.8799999999997, "text": " And then yes, I am + out of my own now. I''m an indie consultant at Arturus Labs. I''m specializing", + "tokens": [50408, 400, 550, 2086, 11, 286, 669, 484, 295, 452, 1065, 586, 13, 286, + 478, 364, 33184, 24676, 412, 5735, 374, 301, 40047, 13, 286, 478, 2121, 3319, 50876], + "temperature": 0.0, "avg_logprob": -0.1967130777787189, "compression_ratio": 1.616326530612245, + "no_speech_prob": 0.0049796137027442455}, {"id": 580, "seek": 374864, "start": 3758.8799999999997, + "end": 3763.92, "text": " in all things just like the book, prompt engineering, + large language model application development.", "tokens": [50876, 294, 439, 721, + 445, 411, 264, 1446, 11, 12391, 7043, 11, 2416, 2856, 2316, 3861, 3250, 13, 51128], + "temperature": 0.0, "avg_logprob": -0.1967130777787189, "compression_ratio": 1.616326530612245, + "no_speech_prob": 0.0049796137027442455}, {"id": 581, "seek": 374864, "start": 3764.96, + "end": 3771.3599999999997, "text": " I think we''re going into a very different + world as far as like how you build things. 
You''ve got to", "tokens": [51180, 286, + 519, 321, 434, 516, 666, 257, 588, 819, 1002, 382, 1400, 382, 411, 577, 291, 1322, + 721, 13, 509, 600, 658, 281, 51500], "temperature": 0.0, "avg_logprob": -0.1967130777787189, + "compression_ratio": 1.616326530612245, "no_speech_prob": 0.0049796137027442455}, + {"id": 582, "seek": 374864, "start": 3771.3599999999997, "end": 3776.0, "text": + " build it like we had earlier in this conversation. You''ve got to build these + components to deal with.", "tokens": [51500, 1322, 309, 411, 321, 632, 3071, 294, + 341, 3761, 13, 509, 600, 658, 281, 1322, 613, 6677, 281, 2028, 365, 13, 51732], + "temperature": 0.0, "avg_logprob": -0.1967130777787189, "compression_ratio": 1.616326530612245, + "no_speech_prob": 0.0049796137027442455}, {"id": 583, "seek": 377600, "start": 3776.0, + "end": 3781.92, "text": " You''ve got to build it web apps to deal with these components + that are very undependable. I", "tokens": [50364, 509, 600, 658, 281, 1322, 309, + 3670, 7733, 281, 2028, 365, 613, 6677, 300, 366, 588, 674, 4217, 712, 13, 286, 50660], + "temperature": 0.0, "avg_logprob": -0.16883624778999076, "compression_ratio": 1.5958333333333334, + "no_speech_prob": 0.0033530977088958025}, {"id": 584, "seek": 377600, "start": 3781.92, + "end": 3785.92, "text": " do make them as dependable as possible. How do you make + the user experience where they trust what''s", "tokens": [50660, 360, 652, 552, + 382, 5672, 712, 382, 1944, 13, 1012, 360, 291, 652, 264, 4195, 1752, 689, 436, 3361, + 437, 311, 50860], "temperature": 0.0, "avg_logprob": -0.16883624778999076, "compression_ratio": + 1.5958333333333334, "no_speech_prob": 0.0033530977088958025}, {"id": 585, "seek": + 377600, "start": 3785.92, "end": 3793.04, "text": " happening? And that''s tricky. 
+ So I offer a whole range of things from just education, going in", "tokens": [50860, + 2737, 30, 400, 300, 311, 12414, 13, 407, 286, 2626, 257, 1379, 3613, 295, 721, 490, + 445, 3309, 11, 516, 294, 51216], "temperature": 0.0, "avg_logprob": -0.16883624778999076, + "compression_ratio": 1.5958333333333334, "no_speech_prob": 0.0033530977088958025}, + {"id": 586, "seek": 377600, "start": 3793.04, "end": 3800.0, "text": " and training + companies. I like going and working with them to think through what product they''re", + "tokens": [51216, 293, 3097, 3431, 13, 286, 411, 516, 293, 1364, 365, 552, 281, + 519, 807, 437, 1674, 436, 434, 51564], "temperature": 0.0, "avg_logprob": -0.16883624778999076, + "compression_ratio": 1.5958333333333334, "no_speech_prob": 0.0033530977088958025}, + {"id": 587, "seek": 380000, "start": 3800.0, "end": 3809.6, "text": " working on + right now with their next big goal. I can say this is a great idea. You''re on the", + "tokens": [50364, 1364, 322, 558, 586, 365, 641, 958, 955, 3387, 13, 286, 393, 584, + 341, 307, 257, 869, 1558, 13, 509, 434, 322, 264, 50844], "temperature": 0.0, "avg_logprob": + -0.1603565621883311, "compression_ratio": 1.5560165975103735, "no_speech_prob": + 0.005064779426902533}, {"id": 588, "seek": 380000, "start": 3809.6, "end": 3814.96, + "text": " right track. This is not quite feasible, but we can fix it. That''s the + product type stuff I like", "tokens": [50844, 558, 2837, 13, 639, 307, 406, 1596, + 26648, 11, 457, 321, 393, 3191, 309, 13, 663, 311, 264, 1674, 2010, 1507, 286, 411, + 51112], "temperature": 0.0, "avg_logprob": -0.1603565621883311, "compression_ratio": + 1.5560165975103735, "no_speech_prob": 0.005064779426902533}, {"id": 589, "seek": + 380000, "start": 3814.96, "end": 3822.56, "text": " thinking through. 
And then as + we get to a longer engagement, I just love working with these", "tokens": [51112, + 1953, 807, 13, 400, 550, 382, 321, 483, 281, 257, 2854, 8742, 11, 286, 445, 959, + 1364, 365, 613, 51492], "temperature": 0.0, "avg_logprob": -0.1603565621883311, + "compression_ratio": 1.5560165975103735, "no_speech_prob": 0.005064779426902533}, + {"id": 590, "seek": 380000, "start": 3822.56, "end": 3829.36, "text": " companies, + especially like startups. Just sit down, pair with them, do transfer of knowledge,", + "tokens": [51492, 3431, 11, 2318, 411, 28041, 13, 1449, 1394, 760, 11, 6119, 365, + 552, 11, 360, 5003, 295, 3601, 11, 51832], "temperature": 0.0, "avg_logprob": -0.1603565621883311, + "compression_ratio": 1.5560165975103735, "no_speech_prob": 0.005064779426902533}, + {"id": 591, "seek": 382936, "start": 3829.36, "end": 3835.28, "text": " type stuff. + It''s just really neat to see what people are up to. A lot of creative ideas right + now.", "tokens": [50364, 2010, 1507, 13, 467, 311, 445, 534, 10654, 281, 536, 437, + 561, 366, 493, 281, 13, 316, 688, 295, 5880, 3487, 558, 586, 13, 50660], "temperature": + 0.0, "avg_logprob": -0.21692513446418607, "compression_ratio": 1.5701754385964912, + "no_speech_prob": 0.008672201074659824}, {"id": 592, "seek": 382936, "start": 3836.6400000000003, + "end": 3839.6, "text": " And then finally, yeah, please make sure you check out + my website,", "tokens": [50728, 400, 550, 2721, 11, 1338, 11, 1767, 652, 988, 291, + 1520, 484, 452, 3144, 11, 50876], "temperature": 0.0, "avg_logprob": -0.21692513446418607, + "compression_ratio": 1.5701754385964912, "no_speech_prob": 0.008672201074659824}, + {"id": 593, "seek": 382936, "start": 3839.6, "end": 3849.52, "text": " www.artrisslabs.com. 
+ I''m going to throw together a lot more blog posts like the one we didn''t", "tokens": + [50876, 12520, 13, 446, 81, 891, 75, 17243, 13, 1112, 13, 286, 478, 516, 281, 3507, + 1214, 257, 688, 544, 6968, 12300, 411, 264, 472, 321, 994, 380, 51372], "temperature": + 0.0, "avg_logprob": -0.21692513446418607, "compression_ratio": 1.5701754385964912, + "no_speech_prob": 0.008672201074659824}, {"id": 594, "seek": 382936, "start": 3849.52, + "end": 3855.92, "text": " know today. I''m trying right now to make sure every blog + post has something juicy, a piece of code", "tokens": [51372, 458, 965, 13, 286, + 478, 1382, 558, 586, 281, 652, 988, 633, 6968, 2183, 575, 746, 24696, 11, 257, 2522, + 295, 3089, 51692], "temperature": 0.0, "avg_logprob": -0.21692513446418607, "compression_ratio": + 1.5701754385964912, "no_speech_prob": 0.008672201074659824}, {"id": 595, "seek": + 385592, "start": 3856.0, "end": 3860.8, "text": " that actually works and you can + experience the thing that was running to my mind at the time.", "tokens": [50368, + 300, 767, 1985, 293, 291, 393, 1752, 264, 551, 300, 390, 2614, 281, 452, 1575, 412, + 264, 565, 13, 50608], "temperature": 0.0, "avg_logprob": -0.179567860622032, "compression_ratio": + 1.7309417040358743, "no_speech_prob": 0.17391560971736908}, {"id": 596, "seek": + 385592, "start": 3861.28, "end": 3867.6800000000003, "text": " So try it out. I''m + really engaging on Twitter. Tell me what you think. And yeah, I''d like to get to", + "tokens": [50632, 407, 853, 309, 484, 13, 286, 478, 534, 11268, 322, 5794, 13, 5115, + 385, 437, 291, 519, 13, 400, 1338, 11, 286, 1116, 411, 281, 483, 281, 50952], "temperature": + 0.0, "avg_logprob": -0.179567860622032, "compression_ratio": 1.7309417040358743, + "no_speech_prob": 0.17391560971736908}, {"id": 597, "seek": 385592, "start": 3868.32, + "end": 3877.04, "text": " to know you guys too. Yeah, amazing, amazing. 
And I wish + you all the best with your new adventure,", "tokens": [50984, 281, 458, 291, 1074, + 886, 13, 865, 11, 2243, 11, 2243, 13, 400, 286, 3172, 291, 439, 264, 1151, 365, + 428, 777, 9868, 11, 51420], "temperature": 0.0, "avg_logprob": -0.179567860622032, + "compression_ratio": 1.7309417040358743, "no_speech_prob": 0.17391560971736908}, + {"id": 598, "seek": 385592, "start": 3877.04, "end": 3883.28, "text": " your new + venture. And yeah, we will link everything. We will link the book. We will link + your", "tokens": [51420, 428, 777, 18474, 13, 400, 1338, 11, 321, 486, 2113, 1203, + 13, 492, 486, 2113, 264, 1446, 13, 492, 486, 2113, 428, 51732], "temperature": 0.0, + "avg_logprob": -0.179567860622032, "compression_ratio": 1.7309417040358743, "no_speech_prob": + 0.17391560971736908}, {"id": 599, "seek": 388328, "start": 3883.28, "end": 3890.0, + "text": " site and blogs for sure. Thanks so much for spending time with me and + educating me and", "tokens": [50364, 3621, 293, 31038, 337, 988, 13, 2561, 370, + 709, 337, 6434, 565, 365, 385, 293, 28835, 385, 293, 50700], "temperature": 0.0, + "avg_logprob": -0.18006372451782227, "compression_ratio": 1.6710526315789473, "no_speech_prob": + 0.06044189631938934}, {"id": 600, "seek": 388328, "start": 3891.2000000000003, "end": + 3897.1200000000003, "text": " keeping up with Mike sometimes, you know, and obvious + questions. It was really, really a pleasure", "tokens": [50760, 5145, 493, 365, + 6602, 2171, 11, 291, 458, 11, 293, 6322, 1651, 13, 467, 390, 534, 11, 534, 257, + 6834, 51056], "temperature": 0.0, "avg_logprob": -0.18006372451782227, "compression_ratio": + 1.6710526315789473, "no_speech_prob": 0.06044189631938934}, {"id": 601, "seek": + 388328, "start": 3897.1200000000003, "end": 3902.8, "text": " to talk to you. 
And + I really, really hope that we can record sometime soon because you seem to be", + "tokens": [51056, 281, 751, 281, 291, 13, 400, 286, 534, 11, 534, 1454, 300, 321, + 393, 2136, 15053, 2321, 570, 291, 1643, 281, 312, 51340], "temperature": 0.0, "avg_logprob": + -0.18006372451782227, "compression_ratio": 1.6710526315789473, "no_speech_prob": + 0.06044189631938934}, {"id": 602, "seek": 388328, "start": 3902.8, "end": 3912.4, + "text": " cooking a lot of ideas. And you take from what I gather, you take really + practical view of things.", "tokens": [51340, 6361, 257, 688, 295, 3487, 13, 400, + 291, 747, 490, 437, 286, 5448, 11, 291, 747, 534, 8496, 1910, 295, 721, 13, 51820], + "temperature": 0.0, "avg_logprob": -0.18006372451782227, "compression_ratio": 1.6710526315789473, + "no_speech_prob": 0.06044189631938934}, {"id": 603, "seek": 391240, "start": 3912.4, + "end": 3918.7200000000003, "text": " And you''ve been like, and you are an engineer + and researcher. And so that''s very dear to my heart", "tokens": [50364, 400, 291, + 600, 668, 411, 11, 293, 291, 366, 364, 11403, 293, 21751, 13, 400, 370, 300, 311, + 588, 6875, 281, 452, 1917, 50680], "temperature": 0.0, "avg_logprob": -0.12263016641875844, + "compression_ratio": 1.4867724867724867, "no_speech_prob": 0.03289446979761124}, + {"id": 604, "seek": 391240, "start": 3918.7200000000003, "end": 3926.48, "text": + " to see. And I can''t wait to see what you come up with next. Me too. Well, thanks + so much for", "tokens": [50680, 281, 536, 13, 400, 286, 393, 380, 1699, 281, 536, + 437, 291, 808, 493, 365, 958, 13, 1923, 886, 13, 1042, 11, 3231, 370, 709, 337, + 51068], "temperature": 0.0, "avg_logprob": -0.12263016641875844, "compression_ratio": + 1.4867724867724867, "no_speech_prob": 0.03289446979761124}, {"id": 605, "seek": + 391240, "start": 3926.48, "end": 3932.64, "text": " having me on. It''s been great + talking to you. So yeah, let''s do this again sometime. 
Yeah,", "tokens": [51068, + 1419, 385, 322, 13, 467, 311, 668, 869, 1417, 281, 291, 13, 407, 1338, 11, 718, + 311, 360, 341, 797, 15053, 13, 865, 11, 51376], "temperature": 0.0, "avg_logprob": + -0.12263016641875844, "compression_ratio": 1.4867724867724867, "no_speech_prob": + 0.03289446979761124}, {"id": 606, "seek": 393264, "start": 3932.64, "end": 3934.48, + "text": " thanks, John. Have a good day.", "tokens": [50368, 3231, 11, 2619, 13, + 3560, 257, 665, 786, 13, 50456], "temperature": 0.0, "avg_logprob": -0.3358224232991536, + "compression_ratio": 0.7894736842105263, "no_speech_prob": 0.2616181969642639}]' +--- + +Hello everyone, Vector Podcast is back. Season 3. We are wrapping up the season with some really, really juicy episodes. I'm sure you will love this one. I have the privilege of talking to John Berryman today. He is an ex senior machine learning researcher who worked on GitHub Copilot. +Currently, he runs his own consultancy, Arturus Labs. I'm sure he will talk more about that. Yeah, welcome, John. Good to be here. How's it going? Awesome. I actually just picked up the book of yours, the book that you and Doug Turnbull have written together. +I've interviewed Doug a couple of times already on the podcast. He has a lot to say. And I realized you've written this book together. It's my go-to source of wisdom on search. Do you still remember which chapters you covered? Oh my gosh. It's been a long time, I'm sure. +Yeah, if you told me the chapter title, I could probably say whether it was me or Doug. I did all the fun ones, Doug did all the hard ones. And we both did chapter one in our own time. I mean, we were each on chapter one twice. +I won't read maybe everything, but: the search relevance problem, search under the hood, debugging your first relevance problem, taming tokens, basic multi-field search, how you build the relevance function, relevance feedback. Yeah, the relevance-centered enterprise. That's interesting. +And then semantic and personalized search. Wow.
Back when was this published? I think we published that in 2016, I see. Yeah, 2016. Yeah, well, it's been almost 10 years. Yeah, that's the version I have. So you do have semantic search in the end there. Yeah, awesome. Yeah. +But yeah, John, it's interesting to introduce yourself to our audience. What's your background? How did you get here? What are you up to? Oh, well, I guess that's a long story. I've had a very circuitous path. I started out in aerospace engineering because I liked the math. +And as I got into the field, I found that the thing that I really liked wasn't the math, it was the software. You could do anything with those. And so while everyone was geeking out about satellites and stuff, I thought that was really cool. +But I realized that there's a big, big world out there that you could address the whole thing with software and math. So I branched out, and that led to that book in your hand. My next big adventure was into search. I joined a search consultancy in Charlottesville, Virginia, worked with Doug Turnbull. +I had amazing adventures, hopping on planes. I worked with Zappos on shoe sales and worked with the patent office. And then I got the opportunity to write that book with Doug. So that pushed me along really, really far. +I got the opportunity to start working for some really interesting companies. For Eventbrite, I built out their search and recommendations twice. And then I got a chance to parlay that into GitHub. So I went to GitHub and built out their Elasticsearch-based code search infrastructure. +The old search infrastructure had smoke coming out of it. So we came in, rebuilt the infrastructure from the ground up. And after a while, well, search was fun, but I was always trying to get a little bit back towards math, towards data science. +And in about 2021, I got my chance to make the leap to data science. I joined data science at GitHub. And from there, I ended up getting the opportunity, just right place, right time, to join Copilot.
Because that was kind of, you know, ML, machine learning type stuff. +And I was in the data science group at that point. And I came on to Copilot after the research team had wrapped up. There was a research team, brilliant people from GitHub Next. They said, whoa, look at these large language models, they're going to do amazing things. I think it's time. +And they built this prototype. And then I came in on the team that was there when it was going into production. So how to get this shipped to everyone, how to start improving it, how to measure, you know, what was working and what wasn't working. +And then from there, I went into chat, Copilot Chat. I was working with some of those features inside the web app. And finally, I was like, well, you know, I've got a little bit of knowledge in my head now, time to write another book. +And I connected with one of the research scientists that was on the original team, Albert Ziegler. We wrote the book, Prompt Engineering for LLMs. It's about building LLM applications. +And with that just published two weeks ago, officially published, I have started out on a new adventure. Yet again, I am running Arturus Labs. I'm an indie consultant. +And I'm focusing on everything large language models: prompt engineering, how to build applications, you know, feasibility, evaluations, stuff like that. Kind of anything you want at this point. And it's a blast. Oh, well, fantastic journey. Yeah, thanks for sharing that. +It's, you know, there's a lot there. Believe it or not, I actually advertised your recent book, the Prompt Engineering one, to my students on a recent course that I taught with my former colleagues on LLMs and Generative AI. So I took the chapter on RAG. +And I thought that RAG is nothing else than prompt engineering, really. Well, yeah, it's interesting. I mean, that's a topic in and of itself. Are we going to open that can of worms? Of course. Sure.
+Yeah, RAG is an interesting thing because everyone talks about RAG as if it's its own entity, that it's a special thing. +But if you like look at it, especially from my background, which has been search and then large language models, you can look at RAG and it is search and then large language models. +And if you combine them both, then it's really hard to get a good understanding of what's working and what's not working. You just, you know, throw up the basic chain application, connect the data source. And I guess you just pray that it works. +But really, if you break it down to its components, then you've got a search application and a prompt-engineered large language model application, and they overlap. But a lot of it's kind of downstream. +And if you can look at those two chunks separately, it becomes a lot easier to debug problems. Rather than saying, you know, the user asked this question, I got a garbage answer. +You can say the user asked this question, the large language model interpreted it as this search, the search returned these results. And maybe that's, maybe that's where the problem was. And you can start debugging that. And the search results got interpreted this way. +And maybe you're not presenting it right to the model. + So always the name of the game, with probably everything we're going to talk about today, is, you know, figuring out how to take this giant black box, break it down into components, figure out what it's made of and what possibly is going wrong, and put sensors there and actually debug it. +Yeah, you're absolutely right. And in the, in the lecture, I actually borrowed code from someone, I forgot their name, but I'll make sure to link it. We built a RAG from the ground up without using any framework whatsoever. You didn't mention LangChain, that's one way of doing it for sure. +But we just really built, you know, a naive k-NN search and just used the model out of the box, Sentence-BERT.
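The decomposition described here, each RAG stage as a separate, observable component, can be sketched in a few lines. This is a minimal illustration, not code from the conversation: the function names, the toy corpus, and the keyword scoring (standing in for a real k-NN search) are all made up, and the LLM call is stubbed out.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("rag")

# Toy corpus standing in for a real search index.
CORPUS = {
    "doc1": "RAG combines search with a large language model.",
    "doc2": "Dot product scores can favor longer passages.",
}

def rewrite_query(question: str) -> str:
    # Stage 1: how the question was interpreted as a search query.
    query = question.lower().rstrip("?")
    log.info("query rewrite: %r -> %r", question, query)
    return query

def search(query: str, k: int = 2) -> list[str]:
    # Stage 2: naive keyword scoring standing in for a real k-NN search.
    scored = sorted(
        CORPUS,
        key=lambda d: sum(w in CORPUS[d].lower() for w in query.split()),
        reverse=True,
    )
    hits = scored[:k]
    log.info("search results: %s", hits)
    return hits

def build_prompt(question: str, hits: list[str]) -> str:
    # Stage 3: how the retrieved results are presented to the model.
    context = "\n".join(CORPUS[d] for d in hits)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    log.info("prompt:\n%s", prompt)
    return prompt

def answer(question: str) -> str:
    # Stage 4: the LLM call itself, stubbed out here.
    prompt = build_prompt(question, search(rewrite_query(question)))
    return f"<LLM answer for {len(prompt)}-char prompt>"

print(answer("What is RAG?"))
```

Because every stage logs its input and output, a garbage answer can be traced to the stage that produced it, the "sensors" on the black box, instead of debugging the pipeline end to end.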
+And then I've noticed that because we did use dot product there, I've noticed that it would favor longer passages over shorter ones, right? For example, it would pull up an appendix of the AI-Powered Search book. +And I was like, you could clearly see that it's missing the point. It's not able to pull up one short sentence where the answer lies. It just pulls something else remotely related. And that's exactly what you said, right? Like you need to start debugging what's going on there. +And you need to start figuring out: maybe change the model, maybe change the chunking. But yeah, I agree. It felt a bit like a black box, but less so when you implement it ground up, right? So you don't depend on any framework. +And when you implement it ground up, you find out that it's not all that complicated. And once you've built every piece of it, like, you know, I mean, you've already seen the black box broken down to its sub-pieces. It's not a black box anymore. +So yeah, typically, since the whole industry now is sorting itself out, trying to figure out what tools are useful and what tools are not going to be useful, I often advocate that people start as close to the metal as possible. +Because these models are actually pretty friendly, pretty fun to play with. Don't put layers on top of it that obfuscate, you know, what's actually happening. Yeah, absolutely. I'm really itching to ask you more now about, like, your time at GitHub. +But before that, I also want to take a little look at your approach, how you view your career, right? So you worked on search, but then you ended up in the hottest place, in a way, applying all the LLMs, right? +And you needed to convert, in a way, to an ML engineer. +Do you view it that way? And also, if you do, how did you prepare yourself to become a machine learning researcher, actually, not even an engineer, right? You were focusing on research aspects of things.
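The dot-product bias mentioned above is easy to see in a toy example. With unnormalized embeddings, dot product rewards vector magnitude as well as direction, so passages whose embeddings happen to have large norms can outrank short, on-topic ones; cosine similarity normalizes the magnitude away. The two-dimensional vectors here are contrived for illustration, not real embeddings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

query = [1.0, 0.0]
short_relevant = [0.9, 0.1]   # points the same way as the query
long_appendix = [2.0, 2.0]    # larger magnitude, less aligned

# Dot product rewards magnitude, so the "long appendix" wins...
assert dot(query, long_appendix) > dot(query, short_relevant)
# ...while cosine rewards direction, so the short relevant passage wins.
assert cosine(query, short_relevant) > cosine(query, long_appendix)
```

Whether to score with dot product or cosine depends on how the embedding model was trained, which is exactly the kind of knob you can only see and debug when you've built the pipeline close to the metal.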
So you needed to move the needle in the research space. +I don't know if I have a good answer for you. Like, if anyone thinks my career has been successful, which in many ways I've done all right, it's been luckily like tripping and falling uphill. Every time I fall down, it's like in the uphill direction. And I don't know, it's the hand of Providence. +And so what do I do with any of these crazy jumps that I make, to prepare? Pretty much, I just take the jump. I don't think about how I'm going to prepare for the next jump. I see the jump, and then I jump into it and, like, almost drown every single time, but survive. +So in this particular case, yeah, the move towards AI researcher, I mean, there's a lot of weight in that phrase that maybe I don't necessarily feel in my own career. +By being in search for so long and by wanting to do data science for so long, I made myself, you know, over time, pretty aware of how things were, you know, just the typical approaches to modeling. +So I was never taught any of this in school, but, you know, you read the right books and go through the right examples. Yeah, so I have gained, I wouldn't say an absolute comfort with any of this, even now, +but, you know, a familiarity from being around it for long periods at this point. And then when I jumped into the large language modeling stuff, it's actually kind of interesting, because it's a different type of AI expertise than we've had before and maybe an easier entrance for a lot of people. +Much of my career, I have been an engineer, and really I still predominantly think of myself as having an engineering mindset. And so when you come into, you know, large language models, it's actually really approachable. +You don't have to immediately know everything about, you know, what choice of models to use and, like, you know, how to train and hold out data and evaluate.
+And you can just go to work and, at first at least, just experiment, and I really encourage people to do this when they're building their own application. Rather than, you know, thinking about all the evaluations and stuff up front, don't worry, you'll get to them. +But just get your hands dirty, start using the APIs and build up some intuition and, in a weird way, empathy for these large language models. Yeah, yeah, this is brilliantly said. I just recently listened to the episode of Lex Fridman with the Anthropic team. +So the CEO and some of the researchers there. And one of them said, along the lines of what you just said about empathy towards the model, that when you know where the model succeeds and where it kind of fails, you learn how to prompt it. Right. +You know, like, which risks you will encounter, and you should be okay with those, but you don't tilt towards more risky areas when you want to succeed at some specific thing. So I don't know, I like that. +But what is your take on LLM unpredictability compared to more, if you will, you know, traditional programming per se? Right. So for example, when we used to write code in, I don't know, C++, Java, what have you, it was very deterministic in many ways. +Maybe there have been some things non-deterministic, like runtime and so on, but still you felt like you are in control of many things, right? With LLMs, it's different. For example, when you ask an LLM to summarize a document for you, and then you ask a second time, the answer will be different. +It will be, you know, in subtle ways, it will be different. +And so that also creates, in my mind, some issues around: okay, if I have several users accessing the same document, should I compute the summary on the fly, or should I compute it once and store it and then show the same copy to all of them, right?
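The compute-once-and-store option raised above is commonly handled with a cache keyed on the document. A minimal sketch, under the assumption that `llm_summarize` is a hypothetical stand-in for the real model call: keying on a version string alongside the document text means a better prompt or model can deliberately invalidate old summaries, rather than hiding the improved versions forever.

```python
import hashlib

_summary_cache = {}

def summarize(document, llm_summarize, model_version="v1"):
    """llm_summarize is a stand-in for the real LLM call.
    Keying on (model_version, document) lets a new prompt or model
    recompute summaries instead of being shadowed by stale ones."""
    key = hashlib.sha256((model_version + "\x00" + document).encode()).hexdigest()
    if key not in _summary_cache:
        _summary_cache[key] = llm_summarize(document)
    return _summary_cache[key]
```

Every user then sees the same stored summary for a given document and version, and bumping `model_version` is the explicit lever for rolling out better summaries.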
+But that also means that if the original summary was not good enough for some reason and subsequent versions were better, I will never show those better versions. Right. So, like, you start asking all these, like, a multitude of questions. Or am I asking the wrong questions? It's such a challenge. +And I don't, yeah, it's a shift, right? Like, if you're used to doing something with Python, it's going to be the exact same answer every single time. With these models, it's just like, you know, a very finicky person that keeps changing their opinion. +And you ask them the same question twice, and they've forgotten what they just said. Because it's a new session, so they literally don't remember. It's not you, it's just that they start over. I don't think this is going to change anytime soon. +It's almost as if you kind of plug a fake human into the circuit. It's going to be unpredictable. That's the nature of it. And that nature is not going to change anytime soon. So I think what you're going to see is a modification in the way we build code around these things. +I think the pain point is when you assume that it's going to be as predictable as the code that you're used to. +But once you get over it, you realize that, okay, well, if I just literally had a human in the loop, like there's an API to connect to a human, then I have to build a user experience that is somehow tolerant to that. And so let's see. +A lot of times, people's hope on their first foray into interacting with these things, they say: here's a specification, build this code, and they expect the answer to just come forward. Now, that can fail in one of two big ways. One way is that it's just too complex. +The model can do chain-of-thought reasoning, o1 has it built in, and it's magic. And it's going to get better.
But with any sufficiently large request, complex request, since you're just appending one token at a time, it's just too easy to paint yourself into a corner. +So models will get better, and they'll be less and less likely to paint themselves into a corner. But it'll always be the case with sufficient complexity. +The other issue that you run into, and why we'll never ever get there, is because when I describe something, the domain of possible implementations, possible completions that match that input, is so much larger than whatever I have in my head right now. +And so if you have a company that's like, you know, you say the specification for code and it will just always make the code, it's like, you don't realize, until the code's written, what you even wanted. And then you go back and change it. +You don't realize, until the code's written, and written correctly, you know, that, oh, it's doing what I said, but that's not what I meant. +So what does this all mean? I think that future implementations have to do a lot to keep the user in the loop and make the experience so that the user doesn't feel like they're just shouting instructions at a thing and then hoping that it works. +But the user has to be interacting with this thing and, you know, converging towards a solution. So you see this in a couple of ways. One way is like with the assistant interface. +And Cursor, forgive me, GitHub, but Cursor is just a really good example here, where you feel like you're chatting with someone that is working with you on this code. It gets into something I hope we talk about a little bit later: +artifacts. You know, you're having this conversation here, but you're working on these artifacts. You're working on these things. And with these assistants, you understand what they're looking at. +Whenever they make a recommendation to change something, you understand how it's going to change your code.
You are still in control as a human, to say yes or no to all this stuff. And that's one way that they keep the users in the loop. +The other way that we keep users in the loop, and I promise I'll shut up soon, is, there's assistant-type behavior, and then there's, like, workflows where a human is still in the loop, but there is a human at the beginning that designed the workflow as, like, a set of steps. +You can't just say: look at this website and pull out all the phone numbers, all of the menu items, all of the, you know, the structured content, and always expect it to work. +Sometimes it's better to say: let's take this big thing, and have a human, a human in this loop, define all the steps that it's going to take to implement this workflow. +And that way, you can make something that is recoverable, you know, there are error states for some of these steps, and you can get out of them, pass it back up to a real human. +But yeah, all along, the way of saying it is: these things are going to remain hard to predict, but the code that's built around them, I think, is going to become very tolerant of that, by pulling the users into the conversation constantly. +Yeah, so basically, if I got your idea right, you put the user in the driver's seat, right? And the model, or whatever LLM app, is still kind of like an assistant, as you said, or companion, whatever you want to call it, right? +But, like, I guess we are still at that point in time when we need to know exactly what we want, right? As users. +And I think we also need to know how to get it out of the model, right? +Because sometimes, no matter what you know, it's somehow not achievable, maybe because you don't know how to prompt well, or, you know, you just go into loops. I frequently go there when I, for example, chat with, I don't know, ChatGPT, or it could be any other tool.
+When it just keeps going and returning to the point which didn't work already, because the alternative doesn't work either. And I'm like, okay, neither works. Like, what you propose just doesn't work. +What should I do? But still, I feel like I became much more productive. I don't write code every day for my work anymore, not for a living. But when I do, I feel like I save, I don't know, three, five days of my time by using these tools. +But there is still this kind of unpredictable component to it. You know, I'll give you one example, a very specific one. So I was building, like, simple Python code which would draw a diagram. And on the x-axis, it would need to put, you know, these values like 1, 1.5, 2, 2.5 and so on. +And the model made a mistake by rounding all these values to an integer. And so on the x-axis, all of a sudden, I saw the same values repeated, right? And the model doesn't have the reasoning component to realize that it made a mistake. +Or at least call it out and say: do you want it this way, or should I do it another way? I had to correct it, because I knew that I needed to cast it to float. But if I didn't know programming, I wouldn't be able to do that, right? I would be stuck right there. +And so that's the level of sophistication we are at still, right? If we're talking about code completion. But I wonder how you feel about this. What do you think about code completion? +You did call out Cursor as the tool you probably use more often now, but you did work on that in GitHub, on the Copilot team. +And what was your sense of its quality and, like, challenges around it? And in general, how did you approach that research challenge? I can speak a little bit to that. There's two ways in which I will be unsatisfactory here. One, I can't get into all the details, probably. +And another way is, I've been gone since May. So I'm sure that it has changed amazingly since then.
But Copilot completions was one of the first successful applications of large language models, +outside of the pure model, ChatGPT, a large language model as a large language model service. Like, this was just, I guess it was the first. So the implementation was actually fairly simplistic. Basically, we weren't using chat models at the time. Those didn't exist. +We were only using completion models. Completion models, basically, I mean, your audience probably knows this, but given the top part of a document, then all the model does, and it's useful to think of the model this way, it simplifies things, all the model does is it picks the next token. +What is the most likely token based on all these words before it? What's the next token? And then you append that one, and you do it again and again. +And so the big aha moment that happened, probably in 2019, well before my time on Copilot, was: look, I can take this top half of the code, down to the function, and the completion that it makes is surprisingly good. +So, like, maybe it's time to just wrap an application around it. And after that, everybody's learned these lessons at this point, but it's all about the context that you put around it and how you present it so that the model can make sense of it. +At the time that I started with Copilot, we were still using the completion models. And the context itself was 2048 tokens, I think. So just a tiny, tiny, tiny window. +And so a huge focus at the time was how to take all the things that we thought might be useful and squeeze them down into this tiny space, and, you know, actually make sure you've nailed it. +Because not only do you have to fit the prompt into this 2048 tokens, but whatever the completion is, you know, it's sharing the same window. You can move that line up and down, but it's always within 2048. So the ingredients were pretty simple.
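The pick-append-repeat loop described above can be written down in a few lines. This is an illustrative sketch of how a completion model is driven, not any vendor's API: `next_token` is a hypothetical stand-in for one forward pass of the model returning the most likely next token.

```python
def complete(prompt, next_token, max_tokens=16, stop="\n\n"):
    """Drive a completion model: pick the most likely next token given
    everything so far, append it, and repeat until a stop sequence or
    the token budget is hit. `next_token` stands in for one model call."""
    text = prompt
    for _ in range(max_tokens):
        text += next_token(text)
        if text.endswith(stop):
            break
    return text[len(prompt):]  # return only the generated continuation
```

Thinking of the model this way, as nothing but a next-token chooser in a loop, is what makes the prompt-squeezing problem concrete: both the prompt and every generated token compete for the same fixed window.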
+The file that you're looking at is obviously the most important thing. If the file is long, which often it will overflow that 2048, then the text right above the cursor is the important thing. +There was some initial work with, like, what are still called fill-in-the-middle models, which you don't need anymore because all the models are so good now. You don't need a specialized model for this. +But you could, you know, give the prefix and the suffix, and it would do a good job of filling in the middle. So the suffix was also an important part of the context: +where do you stop this thing? And then as the models grew, as the context window grew a little bit, we could start sticking in extra things. And so, you know, you start with little bitty things. +These models were trained on code, but they didn't necessarily have the context around the code. So the first easy thing to stick in is, you could do a shebang at the top, or inject a comment that says: here's the path for this file. +And that gives the model context about where this lives in the context of everything else. A big breakthrough that Albert Ziegler and Mike Coother pioneered was the neighboring tabs stuff. And I think this is all common sense these days. +But basically, when you, as a human, are using an IDE, you open up the file you're working on, but you also open up other files for reference. So, duh, why don't we, you know, do that ourselves. And the initial implementations of this have, you know, probably gotten a lot better at this point. +It was simple. It was, like: look at the text right around the cursor, and then search these files for similar text. And if, in your tiny 2048-token space, you have any room for any of these snippets, then you can chunk other stuff into the context. You have to be careful how you present that. +You can't just, you know, have random scraps of text that are, like, you know, partial function implementations.
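The budget juggling described above, prefix, suffix, and completion all sharing one small window, can be sketched as a trimming routine. This is an illustrative toy, not Copilot's actual prompt builder: token counting is faked with a whitespace word count (a real implementation would use the model's tokenizer), and the `<PRE>`/`<SUF>`/`<MID>` sentinels are hypothetical, not any particular model's fill-in-the-middle format.

```python
def build_fim_prompt(prefix, suffix, budget_tokens=2048, reserved_for_completion=256):
    """Assemble a fill-in-the-middle style prompt under a fixed token budget.
    The text nearest the cursor matters most, so trim the prefix from the
    top and the suffix from the bottom until the prompt fits."""
    def n_tokens(s):
        return len(s.split())  # crude stand-in for a real tokenizer

    budget = budget_tokens - reserved_for_completion
    prefix_lines, suffix_lines = prefix.splitlines(), suffix.splitlines()
    while prefix_lines and n_tokens("\n".join(prefix_lines + suffix_lines)) > budget:
        prefix_lines.pop(0)   # drop the oldest line above the cursor
    while suffix_lines and n_tokens("\n".join(prefix_lines + suffix_lines)) > budget:
        suffix_lines.pop()    # drop the furthest line below the cursor
    return "<PRE>" + "\n".join(prefix_lines) + "<SUF>" + "\n".join(suffix_lines) + "<MID>"
```

The key design point matches the interview: reserve room for the completion itself, and when something has to go, sacrifice the text furthest from the cursor first.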
Because that will prime the model to implement partial functions. Like, it'll, you know, just reiterate the same gross pattern it sees above. +So you do things that make it look more like code. You say: here is an interesting snippet of code from this file, in a comment, so that it's still, you know, importantly, so it's still valid syntax at the end of it. +And voila, the rest is history. We came out with a really impactful product; no one had seen anything like it before. And it certainly changed the way I code. I'm much quicker and probably dumber at the same time. Yeah, it's been an interesting experience. +Oh, maybe smarter, because you get to do more things, right? Like, you can achieve, you know, greater heights, and experiment where you need to experiment, right? And not where it feels maybe more mundane. +As long as the code works and, like, I don't know, there are no security holes in it and stuff like that, which would need to be checked separately, I guess. Anyway, that's very interesting. +But to close the loop there, like, I'm just trying to understand: you said you focused on keyword search, right? So you owned the Elasticsearch sort of pipeline. +Can you, if you're comfortable disclosing that, like, would that index the visible code in the IDE somehow, so that you can, or what was the role of that in the whole chain, pipeline? You're asking a lot of questions that don't quite map well onto my actual experience. +Let me see if I can take your question and tweak it just a little bit. When I came to GitHub, I worked on code search, which was keyword, lexical search for the entire code corpus. And that was really cool work. But they've since rebuilt the whole system yet again. +And it's a really amazing engine, a proprietary engine that's effectively grep at fantastically massive scale.
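The snippet-as-comment trick described earlier, wrapping neighboring-tab text in comments so the prompt stays valid syntax instead of priming the model with dangling partial functions, can be sketched like this. The framing text and helper name are illustrative assumptions, not Copilot's actual prompt format.

```python
def snippet_as_comment(path, snippet, comment_prefix="# "):
    """Present a snippet from a neighboring file as a comment block, so the
    surrounding prompt remains syntactically valid code rather than a pile
    of partial function implementations. Framing text is illustrative."""
    header = f"{comment_prefix}Compare this snippet from {path}:"
    body = "\n".join(comment_prefix + line for line in snippet.splitlines())
    return header + "\n" + body
```

Because every injected line is a comment, the model still sees a well-formed file, and including the source path doubles as the location hint discussed above.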
+But that said, that code engine, the one that I built on, even the one that came after it, are not the things that are most beneficial for some of the applications that Copilot has in the editor. And they do different things for that. +For example, if you're on the web app side, there are things, and even in the IDE, I'm remembering stuff from six months ago, they do just-in-time, like, vector embedding, vector storage and stuff. +Vectors are a lot better for certain types of code search, where you're finding code that is about something. Whereas lexical search is a lot better when you're finding code that matches this exact string. +And I think everyone in code search, and outside of code search, everyone everywhere, is still kind of wrestling with this. There's no one data structure that does all that stuff ideally. And I think we were wrestling with that inside Copilot as well. +Yeah, but I guess, yeah, I understood your point, and I probably missed that in your explanation, that you worked on code search and not on the generation. That's why, in code search, you did use the Elasticsearch index. + But, like, what I was imagining, and I'm completely clueless in this topic, is that by virtue of the LLM being trained on a bunch of code, let's say open source code that you can train on license-wise, if the user is asking something that resembles code that had been written before, wouldn't it make sense to try to find that code and kind of somehow, you know, RAG on it with the LLM? Or is it completely different from how you did it? +Like, at this point, as of May, when I left, they'd moved to much larger models. +And the models themselves have read not only all the code on GitHub, but they've also read the internet five times or something. So they've read all the blog posts about code. It's amazing, right? What times you live in.
+ So whenever you're typing something and it kind of smells like something it's seen before, it doesn't necessarily need RAG to go get, you know, common motifs, common, you know, here's what I think you're doing in code right now, and it can piece code together from all the code it's ever learned from and extrapolate out of it. +But, you know, this is me talking about how maybe I would build a Copilot. At some point, I guess, you know, you need to see if the user's typing code that is so similar to code in this code base that it's worth bringing it in. +And we kind of did that in a rudimentary way with the neighboring tabs. You've already got the tabs open. And that ended up being super useful. +I think there's probably a kind of decreased efficacy there, where if you're doing a RAG search over the entire code base, probably the code that you're going to find is already code that's open in the tabs right beside you. So maybe it's useful to do that, maybe it's not. +But I don't know. Yeah, interesting. I think Copilot is, like, as you said, the first successful LLM application. Probably some companies will say, no, no, no, Dr. Boog's was the first successful LLM application. +But maybe it was the first successful neural search application, and then Copilot was the first successful LLM application. And there was Tabnine. +Yeah, there was another company that was out there actually before us, but they just didn't have quite the same, well, they weren't owned by Microsoft at the time. That probably helped a bit. Yeah, budget-wise, I'm guessing. Yeah. +Yeah, but I still feel like it feels like magic, right? Like, ChatGPT also felt magic and scary in the beginning.
+Like, when I saw it for the first time and I saw it produce code, I thought that my job is done, even though I was not a programmer anymore by then. But I felt the existential, well, not crisis but fear, that basically many of us, and especially junior developers, are probably not needed anymore. +But then, as I was overcoming my fear, I was like, now let me try this thing. It's probably a toy. And I found, as I explained, you know, some edge cases where it just doesn't work. It goes in loops. And so I was like, okay, it seems like another tool under my belt. +So I'd better master it and not, you know, walk away from it. +But the code generation still feels like magic, because you can use Tab to complete something on a method signature or on a comment, but you could also write natural language, right? +You could say, generate test cases for me, or something like that, right? And then it will understand it and will read your code and will reason about it and produce the test cases. +I mean, that feels really magical. The time we're wandering into right now is going to feel like magic for a while, until we get used to the exponent. It's just going to keep going up and going up more. +But, you know, I've had those existential pains myself. But then I realized, when I start using these new tools the way that they want me to use them, I have superpowers. You've got to have the right mindset. +If your mindset is like, oh, my COBOL job is over, you might be right, your COBOL job is probably over. But if your mindset is like, oh, wow, I can do things I never could do before. +I, John Berryman, put together the HTML for my website and built a React app. Like, I thought I'd have to have a PhD to do something like that. But it's amazing.
+And what you're seeing is the emergence of a new group of people. They call us the AI natives, AI-native development. And I've heard, you know, "code composers" rather than just coders. And you have people that are technically savvy. +You have to have, you know, some ability to read code still at this point, to debug some stuff like you were talking about. But they yell out at a screen: do this thing for me. +And it just takes a little bit of experience to learn how to shout at the screen in the right way. You've still got to have the human ability to think about how this is structured, how to modularize stuff. There is a craft to it still. +But you can start building up pieces. +Even if you're not technically savvy, if you've been building it in chunks, when one of these pieces messes up gloriously and you've got your floating-point numbers that don't work out, like your example, then at least you can say: I'm going to delete back to here. +I'm going to try a different route, see if I can just bump it out of this. And often you can. And people in every walk of life are much more effective and efficient at creating. +And, you know, you don't always get to solve the nitpicky little things. You know, if you really love debugging and writing tests, I'm sorry, I think your days might be numbered. +But if you love creating, I think we're approaching a new golden age, and it's exponential. We're going to keep approaching new golden ages for a while. Yeah, I think in my career, if I can reflect a little bit, I love creating much more, for sure. +But back then, we didn't have LLMs, we didn't have Copilots. We had to do pair programming, right? And that was our companion. Yeah.
+But that notion that you just said about creativity, I think that drove us much more forward than going down the rabbit holes, you know, of debugging that thing. However important that thing was, you know, of course, you need to debug and so on. +But you would just feel exhausted after that. You know, like, yeah, I fixed that bug. Finally, I squashed it. +Move on, because you want to build stuff, right? And I think it was, was it Dijkstra who said, if debugging is the process of finding and removing bugs, then programming must be the process of introducing bugs? And so, that's right. Yeah. That's a vicious circle. Yeah. +You already touched on that topic a bit earlier, about artifacts. I've read your blog post, which we'll definitely link, and I got inspired by that. +I have to say, because oftentimes, when I go to these chat applications, you know, ChatGPT or Perplexity, what have you, and you have a longer conversation there, it is hard to then sort of trace back and think: okay, I branched here, and, okay, what was my thinking again? +What did I produce at that point? There is nothing to hold on to except scrolling back and forth. +And that's what you really address. You basically proposed something new, I believe. I wonder if you are the creator of this, or, like, in any case, you carry this idea forward. Can you explain what you mean by artifacts? I will carry the idea forward. +I think what we're seeing is some convergence around the notion I put into my blog post. For example, with Anthropic's Artifacts, they splashed something that I think is getting at what I'm talking about. +But if you dig a little, in the end, it's not quite what I'm talking about. Whenever you engage in a conversation with an assistant LLM experience, it just wants to chat. And so we've done good over time by giving them, like, tools.
+So now, rather than just, like, being your therapist, they can go do things for you. So that's nice. But still, it's a linear flow. And whenever you're talking about something, it flows back into the backscroll. +Most of the time, when you are getting work done, you're getting work done on something. An artifact, I really wanted to call it a "stable object of discourse," because it is an object. It's stable, though it may change. And it's the object of the discourse. +But "artifacts" is just easier to say. And this is what we deal with. Whenever we're pair programming on something, it's me and you looking at this piece of code, and you make a recommendation about this. And I say, that's good. We go back and forth. +And anything that you can imagine can be addressed like that. The situation becomes particularly potent when you're dealing with multiple artifacts. So if you're saying, I really like this thing over here, and I wonder how it would fit in with this thing over here, +you're having, as a human, to refer to more than one thing that exists outside of this linear conversation. And you're talking about how they relate to one another. And so the blog post, which I hope you guys all read, at arcturus-labs.com, we'll look at this again in a second, right, gets into what I think of as an artifact. It talks about how to build a prompt so that you have space for this linear conversation, +but you also draw the model's attention to a chunk at the top, usually; you might put it at the bottom, you have to experiment with it. But a static chunk which is, like: here are all the things on the table that people can refer to. +Each object, each artifact, importantly, has an ID to be referred to. And I've noticed that these models do really well with arbitrary hexadecimal IDs. So I'll just give them a random ID.
+But they're really good at referring to those, and, you know, they don't seem to hallucinate these IDs, which surprised me. +And so if you have a prompt with these artifacts at the top, and you have a system message that explains to the model how to interact with these things, then my experience is that they obey the instructions really well. +Humans are used to using pronouns and names and nicknames and, you know, other pointers that refer to the real thing. +And these models, having read all the human text that they could get their hands on, the internet five times, they also understand what you mean when using pronouns and stuff. So you can say, you know: dear model, there's this thing called an artifact. They have these IDs. +When you refer to them, use anchors like in HTML, because they've seen a lot of those, and in the href tag, refer to it. And here's an example. And they just, I haven't done any, like, formal, you know, rigorous testing, but in my experience, they just haven't gone wrong. +They are comfortable referring to these things. And it provides a really slick experience, I think, for the user. The user at the end of this conversation is looking at a conversation that they don't have to scroll back up through. They're looking at artifacts on the right. +And they can grab the ones they need. The artifacts themselves, you know, as the application developer, you're in charge of how you want to present these things. If it's text, you can just make it text. +But if it's, like, a home listing, you know, in the background, it can really be represented by, you know, JSON. But you present the user, you know, a picture of the home in a, you know, scrollable tab, and maybe a scheduling button. +You can do all these rich things with artifacts that you can't do if you're just having a chit-chat conversation and it's all just scrolling back into the backscroll. So I think it's a cool enough idea.
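The prompt shape described above, a static artifact block with random hex IDs up top and the linear conversation below, can be sketched as follows. This is a minimal illustration of the idea, not the blog post's actual code: the `<artifact>` tag names and the anchor-reference instruction are illustrative choices, not a standard.

```python
import secrets

def new_artifact(content):
    # Short random hex ID the model can refer back to without hallucinating.
    return {"id": secrets.token_hex(4), "content": content}

def build_prompt(artifacts, conversation, system):
    """Static artifact block up top, linear chat below. Tag names and the
    anchor-style reference convention are illustrative, not a standard."""
    block = "\n".join(
        f'<artifact id="{a["id"]}">\n{a["content"]}\n</artifact>' for a in artifacts
    )
    instructions = (
        system
        + "\nWhen you mention an artifact, refer to it with an HTML-style "
        + 'anchor, e.g. <a href="#ID">name</a>, using the artifact\'s id.'
    )
    return instructions + "\n\n" + block + "\n\n" + conversation
```

On the application side, those anchors in the model's replies can then be rendered as rich widgets (the home listing card, the scheduling button) instead of scrolling away into the backscroll.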
+I think there are some indications that it's coming into existence with, you know, Anthropic's artifacts, OpenAI's Canvas. Cursor is actually implicitly doing a really good job of this somehow. So it'll come into reality, I think, at some point. It just gives you an idea. Yeah. +It feels like it structures the interaction with the element. +It doesn't feel like you lost your time, in the sense that otherwise you'd need the model to summarize the conversation for you, right? To go back and tell you what was important. But how does it know what is important? And you've already forgotten. +Whereas if you have these artifacts, you can refer to them. But it's interesting how you can use these artifacts. And by the way, I don't know if you can demo something quickly. I saw a demo on your website. All right. +So this is, how do you get to my website? Oh, and you know, check this out. This website was me and ChatGPT and Cursor just kind of hanging out, teaching me some HTML. But yeah, you go to my blog. Wait a second. Wait a second. You built this site with an LLM. Correct? Yeah. +That's what you said. Well, it was me and a large language model. It wasn't just saying "build a website". Of course. It's what's going to happen in our future. Everything is going to be a conversation, working on things with a large language model. +It's a beautiful website, I have to say. Yeah, amazing. And the logo. Even the nifty little logo was AI-generated. Oh, amazing. Okay. This is ridiculous. I'm going to take up just a little bit of your time. It's okay. Oh, it's fine. This logo right here. Check out how many cool things are in it. +There are a bunch of little bits in here. And then I'll give you a quiz so you can find the last thing hiding in it. Arcturus is a star in the Northern Hemisphere. It's a navigational star. It's a very bright star. And it means "guardian of the bear".
+And so in my logo here, you've got the A, you've got the bear. The A kind of serves as, it looks a little like, a guardian. The bear represents the big hairy problem. That's powerful. But I'm going to help you out: the stars are all four-pointed. It's navigational. +There's one more little Easter egg in this that I didn't notice until I finished building it. I didn't design it, it just emerged. If you're a good computer scientist, especially... oh yeah, A-star search, A-star search. You got it. +You got it. I didn't even think about it. I just thought this needs kind of a star over here. And I looked at it and it's an A-star, which is, you know, optimal, near-optimal navigation of the difficult domain ahead. LLMs are good at creating Easter eggs, then. Yeah. +So anyway, sorry for the digression. Also the stars, I mean, these stars are amazing as well. You can just stare at them and marvel, right? They move, they look a bit like snowflakes sometimes as well. Yep, they do. All right, so thank you for the digression. +We're looking through my blog, and we're looking through "Cut the Chit-Chat with Artifacts". One thing I'm trying to do recently with my blog, and I hope you guys will, you know, there are plenty of places where you can subscribe to this, is I'm trying to put in plenty of examples. +And here's the kind of built-in example of it working. Let's see. You know what, we might very well edit this out, but I'm going to go down to the "now you try" bit right here. First, the naive approach. Let's say that I'm building a real estate helper assistant. +I help real estate agents. And the real estate agent says: I want to put together an email for a client about my listing on Oak Street, can you pull the listing? And so the thing has some tools built in. It's got a get-listing tool.
And so you can see all the garbage that it puts in there. +And it's got this listing, but it just says it's got the listing. Somewhere in all this garbage there's a listing, but I don't know what the listing's really about. I could ask about it, but then it's filtered. +I don't have the thing that came from the database; I have this weird filter in front of it. Can you pull an email template and draft a new email? That's another tool it has. I guess it's going to take its sweet time to do it. Oh, of course. Hmm. Okay. +So it drafts an email, but oh, look, I've forgotten the buyer's name. So this is one version of the email that is relevant to this thing right here. But, you know, I forgot to tell you: his name is Tim Cersei and my company's name is Artie Tristral Estate. +It goes back and fills that in, and then I'm left at the end of the conversation copying and pasting this out. +If this is what I want, I'm going to paste this into the user's email and be really embarrassed when it's got this little string at the top, because I've copied that out too. And if I wanted to do anything else, like modify the template, it's just not there for me. +All right. So let's do a similar experience with this. I want to, again, pull out that listing for Oak Street. All right. +So this time, I'm still showing that it knows how to use tools, but every time it tries to spit out this JSON stuff, it's actually getting substituted with an href that points to it. And where does it point? You click on it and it automatically loads this card right here. +Now, I didn't take time to make a really pretty interface, but you can imagine: this is JSON, you can make this look like anything you want to. You can make it link out to the database and do all sorts of things. All right.
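The substitution step in the demo might look roughly like this (a sketch under assumed names; the real app's code isn't shown in the episode): when a tool returns a raw JSON blob, stash it as an artifact and show only an anchor in the chat, leaving the UI free to render the artifact as a card in a side panel.

```python
import json
import secrets

artifacts: dict[str, dict] = {}  # rendered as cards in a side panel, not in the chat

def stash_tool_output(raw_json: str, label: str) -> str:
    """Store a tool's JSON blob as an artifact; show only an anchor in the chat."""
    aid = secrets.token_hex(4)
    artifacts[aid] = json.loads(raw_json)
    return f'<a href="#{aid}">{label}</a>'

raw = '{"address": "123 Oak St", "beds": 3, "baths": 2, "price": 450000}'
visible = stash_tool_output(raw, "Oak Street listing")
```

The user sees a clickable link instead of the "garbage", and the original database record survives unfiltered behind the ID.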
I'm going to put together that email template again. Yeah. +I guess especially when you build a dedicated LLM application, right? You know what types of objects you're going to be interacting with, and you can build UI around those, right? But yeah, a very flexible, malleable interface. +The interface is whatever the user needs it to be, potentially. Yeah. All right. So it's got this customized email draft. Now, you know, I was looking here, there's no email draft in the chat, but there is here on the side of the screen. +And you can see, unfortunately, I forgot to stick in the user's name. So let's see. Here's the template that it used. We didn't see that in the last example. You can see how it wants to put together stuff; you can see how it actually put together stuff. +And when I said I forgot his name, it said: oh, okay, I've updated that artifact for you. So you don't have multiple versions stacking up. You've just got this one. And you could do even more interesting things, like I could say, you know, this is much better if I just go in and edit it right here. +And that is now part of the artifact that the assistant sees. It's in that artifact section at the top of the prompt. You can have it say, please change my email template to always say something like this, and you can work on it and save it back to the database. +It just opens up a lot of possibilities for a user experience that is easier. Because when we get work done, we work on things, we don't just chat. Yeah. +You reason around artifacts and you work with them almost as if they were physical objects, right? You can take this thing away with you and proceed with your task. Yep. You refer to them. You modify them. We use them to do things. And you could, I'm guessing. +I'm really guessing, I'm new to this topic. You could maybe even condition the model on these things, right?
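The in-place update described above could be wired as a tool like this (a sketch; the tool name, ID, and return strings are assumptions): the model calls an update tool with the artifact's ID, and the stored body is overwritten instead of a new copy piling up in the chat.

```python
artifacts = {"1a2b3c4d": "Hi ____, thanks for asking about 123 Oak St. ..."}

def update_artifact(artifact_id: str, new_body: str) -> str:
    """Overwrite the stored body; the side panel always shows the latest version."""
    if artifact_id not in artifacts:
        return f"error: unknown artifact {artifact_id}"
    artifacts[artifact_id] = new_body
    return f"updated {artifact_id}"

# e.g. after the user says "I forgot, his name is Tim", the model calls:
result = update_artifact("1a2b3c4d", "Hi Tim, thanks for asking about 123 Oak St. ...")
```

Because the artifact block sits at the top of the prompt, the assistant sees the edited version on the next turn, whether the model or the user made the edit.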
You could say: given this artifact, I want to do something else with it, like rewrite some parts. Would that work? I mean, the sky is kind of the limit here. +It's been a fun thing to think about. You could have typed artifacts, and when you have a certain type of artifact, you could introduce tools for it. So if we need to modify this artifact, we know how to deal with it. +It's kind of what I did with my next post, "Roaming RAG". You can have artifacts that are like accordions. They're bigger than fits in the prompt. +But you can say: here's a summarized outline of everything, and on every piece of that summary the model can effectively click and expand it. It's just another ID and, you know, a tool to expand the section. So it can read docs that are bigger than fit in its context. +There are just a lot of neat things that I think you can do with artifacts as a starting point. It's very interesting. Don't you think, just one thought that crossed my mind, that when we transitioned from the static web to, like, Web 2. +0, I guess, what was it called, when you could actually modify things on the web, right? You could post a comment, you could do stuff. Now it feels like we've transitioned into a new phase where we do the same to ideas. +We exchange ideas and we can modify them, you know, build on them, prompt with them, take them away and store them. So it becomes more at the concept level. I think everything's going to get really weird going forward. I think we've been used to going to the internet and going to web pages. +And even if we could interact a little bit, it's nothing like what you're about to see. I wonder about a lot of the internet experiences, you know, they're worried about all the text going away, because we've run out of text on the internet for training these giant models.
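The accordion idea might look roughly like this (a sketch with assumed names; this is not the actual Roaming RAG code): the prompt holds only a summarized outline whose headings carry section IDs, and an expand tool returns a section's full text when the model "clicks" it.

```python
# The full document lives outside the prompt, keyed by section ID.
SECTIONS = {
    "a1b2c3d4": ("Installation", "Step-by-step install instructions ..."),
    "e5f6a7b8": ("Configuration", "Every configuration option explained ..."),
}

def outline() -> str:
    """Collapsed 'accordion' placed in the prompt: headings plus IDs only."""
    return "\n".join(
        f'- <a href="#{sid}">{title}</a>' for sid, (title, _) in SECTIONS.items()
    )

def expand_section(section_id: str) -> str:
    """Tool the model calls to 'click' a heading and read the full text."""
    title, body = SECTIONS[section_id]
    return f"# {title}\n{body}"
```

The model can thus work through documents larger than its context window by expanding only the sections it actually needs.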
+Maybe the future of the internet is going to be replaced by just conversations. You're going to go to a place that is a sensible starting point, but the whole website is going to become whatever reality you need it to be at the time. +And I have no idea how we harvest the text of that to train future models. It might be crazy, but I think we're getting ready for a future we cannot possibly predict. Yeah, and I think spam will be replaced by slop, right? I don't know if you've heard of this. +No? Slop, S-L-O-P. It's basically unverified output of an LLM. So, something that got produced, and, back to your question, you have no idea if it's true or not, you go and paste it somewhere on the web, and then an LLM goes and scrapes it and learns from it. +So you spam the model. It's this feedback effect. Yeah, exactly. And there is a call-out that, hey, let's not spam, let's not post slop on the web, because that will bite us as we move further ahead with LLMs. And who is obeying that recommendation? Exactly. +Probably not the companies that need content produced. Yeah. The moment you say, don't do something, there will be a bunch of people saying, oh, let's try that. That sounds like fun. Oh, that's a good idea. And then we need to invent a solution for that. Hey, Jonathan, it was really exciting. +And I've learned like a ton by talking to you. I feel like we could probably record a marathon-style episode, you know, four or five hours, before we get exhausted. But I also wanted to give you a chance to, you know, go on stage and talk about your book. +Like, why do you think everyone needs to read it? I want to read it, if I get a chance to get my hands on it, hopefully soon. Everyone needs to read it because every time I make a sale, I get one cup of coffee. So that's why everyone needs to read it. Of course. Yeah, that's a good reason.
+But then also, yeah, go ahead. No, I also wanted to give you a chance to talk about your company. +Because I know that feeling of starting something new on your own. You call yourself an indie consultant, right? At the same time, you carry so much with you in your luggage, right? The knowledge, the experience. And so why not share it in a different way, through your company. +But I wanted to learn a bit more. What is your vision for the company? What do you think you will offer in the mid term? Where do you create the value for the customers? And maybe there will be some customers listening to this podcast, hopefully. Sure. +Well, okay, let's go through both of those then. I hope everyone reads the book. I hope they enjoy it. I hope they learn from it. Working with large language models is a very different beast from what you're used to. +I think, you know, three years from now, everyone will be a large language model application developer, because they're becoming so prevalent everywhere. So start early. Get your hands dirty, interact with these things. +And my book helps kind of give you the training wheels at first, to understand: here are a bunch of the problems that you run into, here's how the model works. There's actually a lot of good intuition in just understanding the tool that you're interacting with. +Here's how to organize a prompt. And that's not always easy. You've got to figure out all the stuff you might use, and you can't use all of it, because it doesn't all fit, or because you don't want to wait for the latency. +You know, it tells you how to fit that into a prompt and present it to the model in a way that, empathetically, the model is going to understand. The model is not psychic. You need to talk to the model as if you're talking to someone that you're working with.
+ And then towards the end of the book, it gets outside of a single prompt and talks about, you know, this magic word we've got right now, agents: how to build assistant behavior with tools, and how to build more sophisticated thinking steps with it and review of what's happened. +And it talks about workflows, which is really another type of agent, about how to take an input, a bunch of data, pick it apart, and do the right steps to get a job done, hopefully without going off track too much. +We talk a little bit about evaluation, and we wrap it up by saying: holy cow, look at the future we're going into. This is going to be amazing. So I hope you get a chance to read the book and I hope you enjoy it. I hope it's as enjoyable for you as it was painful for me to write. +And then yes, I am out on my own now. I'm an indie consultant at Arcturus Labs, specializing in all the things just like the book: prompt engineering, large language model application development. I think we're going into a very different world as far as how you build things. +You've got to build it like we said earlier in this conversation. You've got to build web apps to deal with these components that are very undependable, and to make them as dependable as possible. +How do you make a user experience where they trust what's happening? And that's tricky. So I offer a whole range of things, from just education, going in and training companies, to working with them to think through what product they're working on right now and their next big goal. +I can say: this is a great idea, you're on the right track; or: this is not quite feasible, but we can fix it. That's the product-type stuff I like thinking through. And then as we get to a longer engagement, I just love working with these companies, especially startups. +Just sit down, pair with them, do transfer-of-knowledge type stuff.
It's just really neat to see what people are up to. A lot of creative ideas right now. And then finally, yeah, please make sure you check out my website, www.arcturus-labs.com. +I'm going to throw together a lot more blog posts like the one we demoed today. I'm trying right now to make sure every blog post has something juicy, a piece of code that actually works, so you can experience the thing that was running through my mind at the time. So try it out. +I'm really engaged on Twitter. Tell me what you think. And yeah, I'd like to get to know you guys too. Yeah, amazing, amazing. And I wish you all the best with your new adventure, your new venture. And yeah, we will link everything. We will link the book. +We will link your site and blogs for sure. Thanks so much for spending time with me and educating me and keeping up with my sometimes, you know, naive and obvious questions. It was really, really a pleasure to talk to you. +And I really, really hope that we can record again sometime soon, because you seem to be cooking up a lot of ideas. And from what I gather, you take a really practical view of things. And you are an engineer and a researcher, and so that's very dear to my heart to see. +And I can't wait to see what you come up with next. Me too. Well, thanks so much for having me on. It's been great talking to you. So yeah, let's do this again sometime. Yeah, thanks, John. Have a good day. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md b/transcripts_with_timestamps/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md new file mode 100644 index 0000000..ff7173e --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/connor-shorten-phd-researcher-florida-atlantic-university-founder-at-henry-ai-labs.md @@ -0,0 +1,4312 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=FQAT6E3EX6g

Show + notes:

- On the Measure of Intelligence by François Chollet - Part 1: Foundations + (Paper Explained) [YouTube](https://www.youtube.com/watch?v=3_qGr...)

- + [2108.07258 On the Opportunities and Risks of Foundation Models](https://arxiv.org/abs/2108.07258)

- + [2005.11401 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401)

- + Negative Data Augmentation: https://arxiv.org/abs/2102.05113

- + Beyond Accuracy: Behavioral Testing of NLP models with CheckList: [2005.04118 Beyond + Accuracy: Behavioral Testing of NLP models with CheckList](https://arxiv.org/abs/2005.04118)

- + Symbolic AI vs Deep Learning battle https://www.technologyreview.com/2020...

- + Dense Passage Retrieval for Open-Domain Question Answering https://arxiv.org/abs/2004.04906

- + Data Augmentation Can Improve Robustness https://arxiv.org/abs/2111.05328

- + Contrastive Loss Explained. Contrastive loss has been used recently… | by Brian + Williams | Towards Data Science https://towardsdatascience.com/contra...

- + Keras Code examples https://keras.io/examples/

- + https://you.com/ + -- new web search engine by Richard Socher

- The Book of Why: The New + Science of Cause and Effect: Pearl, Judea, Mackenzie, Dana: 9780465097609: Amazon.com: + Books https://www.amazon.com/Book-Why-Scien...

- + Chelsea Finn: https://twitter.com/chelseabfinn

- + Jeff Clune: https://twitter.com/jeffclune

- + Michael Bronstein (Geometric Deep Learning): https://twitter.com/mmbronstein + https://arxiv.org/abs/2104.13478

- + Connor''s Twitter: https://twitter.com/CShorten30

- + Dmitry''s Twitter: https://twitter.com/DmitryKan

' +image_url: https://media.rss.com/vector-podcast/20211223_011252_c0a8e84bf74cac993f87600e13f3d942.jpg +pub_date: Thu, 23 Dec 2021 13:32:52 GMT +title: Connor Shorten - PhD Researcher - Florida Atlantic University & Founder at + Henry AI Labs +url: https://rss.com/podcasts/vector-podcast/347472 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 13.84, "text": " Hey + everyone, Dr. Podgas here.", "tokens": [50364, 1911, 1518, 11, 2491, 13, 12646, + 10549, 510, 13, 51056], "temperature": 0.0, "avg_logprob": -0.35951926909297344, + "compression_ratio": 1.425233644859813, "no_speech_prob": 0.18691717088222504}, + {"id": 1, "seek": 0, "start": 13.84, "end": 20.16, "text": " And today we have Connor + Shorten with me who will talk a bit about his research about", "tokens": [51056, + 400, 965, 321, 362, 33133, 16881, 268, 365, 385, 567, 486, 751, 257, 857, 466, 702, + 2132, 466, 51372], "temperature": 0.0, "avg_logprob": -0.35951926909297344, "compression_ratio": + 1.425233644859813, "no_speech_prob": 0.18691717088222504}, {"id": 2, "seek": 0, + "start": 20.16, "end": 23.28, "text": " lecture databases, about YouTube hopefully + as well.", "tokens": [51372, 7991, 22380, 11, 466, 3088, 4696, 382, 731, 13, 51528], + "temperature": 0.0, "avg_logprob": -0.35951926909297344, "compression_ratio": 1.425233644859813, + "no_speech_prob": 0.18691717088222504}, {"id": 3, "seek": 0, "start": 23.28, "end": + 25.48, "text": " So I''m expecting a really nice discussion today.", "tokens": [51528, + 407, 286, 478, 9650, 257, 534, 1481, 5017, 965, 13, 51638], "temperature": 0.0, + "avg_logprob": -0.35951926909297344, "compression_ratio": 1.425233644859813, "no_speech_prob": + 0.18691717088222504}, {"id": 4, "seek": 0, "start": 25.48, "end": 27.080000000000002, + "text": " Hey Connor, how are you doing?", "tokens": [51638, 1911, 33133, 11, 577, + 366, 291, 884, 30, 51718], "temperature": 0.0, "avg_logprob": -0.35951926909297344, + "compression_ratio": 1.425233644859813, 
"no_speech_prob": 0.18691717088222504}, + {"id": 5, "seek": 0, "start": 27.080000000000002, "end": 29.96, "text": " Hey Dmitra, + thanks so much for having me on the podcast.", "tokens": [51718, 1911, 413, 3508, + 424, 11, 3231, 370, 709, 337, 1419, 385, 322, 264, 7367, 13, 51862], "temperature": + 0.0, "avg_logprob": -0.35951926909297344, "compression_ratio": 1.425233644859813, + "no_speech_prob": 0.18691717088222504}, {"id": 6, "seek": 2996, "start": 29.96, + "end": 34.68, "text": " I''m really excited to continue our episode and maybe dive + more into the deep learning research", "tokens": [50364, 286, 478, 534, 2919, 281, + 2354, 527, 3500, 293, 1310, 9192, 544, 666, 264, 2452, 2539, 2132, 50600], "temperature": + 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 7, "seek": 2996, "start": 34.68, + "end": 35.68, "text": " side.", "tokens": [50600, 1252, 13, 50650], "temperature": + 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 8, "seek": 2996, "start": 35.68, + "end": 40.2, "text": " I think our first podcast on Henry AI labs went really into + the detail and the practical", "tokens": [50650, 286, 519, 527, 700, 7367, 322, + 11085, 7318, 20339, 1437, 534, 666, 264, 2607, 293, 264, 8496, 50876], "temperature": + 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 9, "seek": 2996, "start": 40.2, "end": + 44.72, "text": " implementation and the history of Burton Elasticsearch and then + all the different vector databases", "tokens": [50876, 11420, 293, 264, 2503, 295, + 46011, 2699, 2750, 405, 1178, 293, 550, 439, 264, 819, 8062, 22380, 51102], "temperature": + 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 10, "seek": 2996, "start": 44.72, + 
"end": 49.84, "text": " and I think so now we can kind of maybe look more in the + research side of things and sort", "tokens": [51102, 293, 286, 519, 370, 586, 321, + 393, 733, 295, 1310, 574, 544, 294, 264, 2132, 1252, 295, 721, 293, 1333, 51358], + "temperature": 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 11, "seek": 2996, "start": 49.84, + "end": 53.64, "text": " of discuss together about where we think all this vector + search engine stuff is headed.", "tokens": [51358, 295, 2248, 1214, 466, 689, 321, + 519, 439, 341, 8062, 3164, 2848, 1507, 307, 12798, 13, 51548], "temperature": 0.0, + "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, "no_speech_prob": + 0.07932904362678528}, {"id": 12, "seek": 2996, "start": 53.64, "end": 54.64, "text": + " Oh yeah, absolutely.", "tokens": [51548, 876, 1338, 11, 3122, 13, 51598], "temperature": + 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 13, "seek": 2996, "start": 54.64, + "end": 59.88, "text": " And it''s exciting to be recording based on the day when + you actually released that video.", "tokens": [51598, 400, 309, 311, 4670, 281, + 312, 6613, 2361, 322, 264, 786, 562, 291, 767, 4736, 300, 960, 13, 51860], "temperature": + 0.0, "avg_logprob": -0.20426280681903547, "compression_ratio": 1.764525993883792, + "no_speech_prob": 0.07932904362678528}, {"id": 14, "seek": 5988, "start": 59.88, + "end": 65.92, "text": " So obviously we will link it so for our listeners and our + audiences.", "tokens": [50364, 407, 2745, 321, 486, 2113, 309, 370, 337, 527, 23274, + 293, 527, 15479, 13, 50666], "temperature": 0.0, "avg_logprob": -0.21021656195322672, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.00705825025215745}, + {"id": 15, "seek": 5988, "start": 65.92, "end": 69.36, "text": " And hey, could + you please introduce 
yourself?", "tokens": [50666, 400, 4177, 11, 727, 291, 1767, + 5366, 1803, 30, 50838], "temperature": 0.0, "avg_logprob": -0.21021656195322672, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.00705825025215745}, + {"id": 16, "seek": 5988, "start": 69.36, "end": 71.12, "text": " Yeah, great.", + "tokens": [50838, 865, 11, 869, 13, 50926], "temperature": 0.0, "avg_logprob": -0.21021656195322672, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.00705825025215745}, + {"id": 17, "seek": 5988, "start": 71.12, "end": 76.32000000000001, "text": " So + to say to introduce myself, I guess I would like to kind of like be reintroducing", + "tokens": [50926, 407, 281, 584, 281, 5366, 2059, 11, 286, 2041, 286, 576, 411, + 281, 733, 295, 411, 312, 319, 38132, 2175, 51186], "temperature": 0.0, "avg_logprob": + -0.21021656195322672, "compression_ratio": 1.7123893805309736, "no_speech_prob": + 0.00705825025215745}, {"id": 18, "seek": 5988, "start": 76.32000000000001, "end": + 78.32000000000001, "text": " myself almost every like year.", "tokens": [51186, + 2059, 1920, 633, 411, 1064, 13, 51286], "temperature": 0.0, "avg_logprob": -0.21021656195322672, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.00705825025215745}, + {"id": 19, "seek": 5988, "start": 78.32000000000001, "end": 82.96000000000001, "text": + " So as obviously I make these YouTube videos and I''m kind of like still discovering + my", "tokens": [51286, 407, 382, 2745, 286, 652, 613, 3088, 2145, 293, 286, 478, + 733, 295, 411, 920, 24773, 452, 51518], "temperature": 0.0, "avg_logprob": -0.21021656195322672, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.00705825025215745}, + {"id": 20, "seek": 5988, "start": 82.96000000000001, "end": 85.88, "text": " role + in deep learning research and still learning myself.", "tokens": [51518, 3090, 294, + 2452, 2539, 2132, 293, 920, 2539, 2059, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.21021656195322672, 
"compression_ratio": 1.7123893805309736, "no_speech_prob": + 0.00705825025215745}, {"id": 21, "seek": 8588, "start": 86.88, "end": 90.36, "text": + " In my journey, I''m in my second year of my PhD.", "tokens": [50414, 682, 452, + 4671, 11, 286, 478, 294, 452, 1150, 1064, 295, 452, 14476, 13, 50588], "temperature": + 0.0, "avg_logprob": -0.21294747458563912, "compression_ratio": 1.692883895131086, + "no_speech_prob": 0.00791763886809349}, {"id": 22, "seek": 8588, "start": 90.36, + "end": 94.24, "text": " I finished my master''s degree where I got started with + research on generative adversarial", "tokens": [50588, 286, 4335, 452, 4505, 311, + 4314, 689, 286, 658, 1409, 365, 2132, 322, 1337, 1166, 17641, 44745, 50782], "temperature": + 0.0, "avg_logprob": -0.21294747458563912, "compression_ratio": 1.692883895131086, + "no_speech_prob": 0.00791763886809349}, {"id": 23, "seek": 8588, "start": 94.24, + "end": 99.16, "text": " networks and data augmentation, published literature reviews + on data augmentation for", "tokens": [50782, 9590, 293, 1412, 14501, 19631, 11, + 6572, 10394, 10229, 322, 1412, 14501, 19631, 337, 51028], "temperature": 0.0, "avg_logprob": + -0.21294747458563912, "compression_ratio": 1.692883895131086, "no_speech_prob": + 0.00791763886809349}, {"id": 24, "seek": 8588, "start": 99.16, "end": 100.44, "text": + " images and text.", "tokens": [51028, 5267, 293, 2487, 13, 51092], "temperature": + 0.0, "avg_logprob": -0.21294747458563912, "compression_ratio": 1.692883895131086, + "no_speech_prob": 0.00791763886809349}, {"id": 25, "seek": 8588, "start": 100.44, + "end": 105.03999999999999, "text": " And this has really been my research focus + is data augmentation, the idea.", "tokens": [51092, 400, 341, 575, 534, 668, 452, + 2132, 1879, 307, 1412, 14501, 19631, 11, 264, 1558, 13, 51322], "temperature": 0.0, + "avg_logprob": -0.21294747458563912, "compression_ratio": 1.692883895131086, "no_speech_prob": + 0.00791763886809349}, {"id": 26, "seek": 8588, 
"start": 105.03999999999999, "end": + 109.84, "text": " Primarily my interest was I started out with when I first learned + about deep learning right", "tokens": [51322, 19671, 3289, 452, 1179, 390, 286, + 1409, 484, 365, 562, 286, 700, 3264, 466, 2452, 2539, 558, 51562], "temperature": + 0.0, "avg_logprob": -0.21294747458563912, "compression_ratio": 1.692883895131086, + "no_speech_prob": 0.00791763886809349}, {"id": 27, "seek": 8588, "start": 109.84, + "end": 112.28, "text": " away, I come from being a basketball player.", "tokens": + [51562, 1314, 11, 286, 808, 490, 885, 257, 11767, 4256, 13, 51684], "temperature": + 0.0, "avg_logprob": -0.21294747458563912, "compression_ratio": 1.692883895131086, + "no_speech_prob": 0.00791763886809349}, {"id": 28, "seek": 11228, "start": 112.28, + "end": 116.88, "text": " I played basketball in college and I was ready to go deep + learning for basketball.", "tokens": [50364, 286, 3737, 11767, 294, 3859, 293, 286, + 390, 1919, 281, 352, 2452, 2539, 337, 11767, 13, 50594], "temperature": 0.0, "avg_logprob": + -0.1899503019989514, "compression_ratio": 1.7751677852348993, "no_speech_prob": + 0.017529329285025597}, {"id": 29, "seek": 11228, "start": 116.88, "end": 119.04, + "text": " How can this improve basketball?", "tokens": [50594, 1012, 393, 341, 3470, + 11767, 30, 50702], "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": + 1.7751677852348993, "no_speech_prob": 0.017529329285025597}, {"id": 30, "seek": + 11228, "start": 119.04, "end": 123.12, "text": " So one thing about basketball is + when you''re playing, you want to have a highlight mix", "tokens": [50702, 407, + 472, 551, 466, 11767, 307, 562, 291, 434, 2433, 11, 291, 528, 281, 362, 257, 5078, + 2890, 50906], "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": + 1.7751677852348993, "no_speech_prob": 0.017529329285025597}, {"id": 31, "seek": + 11228, "start": 123.12, "end": 127.12, "text": " tape where you have all your 
best + moves and helps you get the college scholarship.", "tokens": [50906, 7314, 689, + 291, 362, 439, 428, 1151, 6067, 293, 3665, 291, 483, 264, 3859, 16178, 13, 51106], + "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": 1.7751677852348993, + "no_speech_prob": 0.017529329285025597}, {"id": 32, "seek": 11228, "start": 127.12, + "end": 130.96, "text": " And so I was really familiar with that process of what + it takes to be recruited to play college", "tokens": [51106, 400, 370, 286, 390, + 534, 4963, 365, 300, 1399, 295, 437, 309, 2516, 281, 312, 33004, 281, 862, 3859, + 51298], "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": + 1.7751677852348993, "no_speech_prob": 0.017529329285025597}, {"id": 33, "seek": + 11228, "start": 130.96, "end": 132.16, "text": " basketball.", "tokens": [51298, + 11767, 13, 51358], "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": + 1.7751677852348993, "no_speech_prob": 0.017529329285025597}, {"id": 34, "seek": + 11228, "start": 132.16, "end": 136.56, "text": " So I wanted to build this computer + vision system that would crop out, you know, you''re", "tokens": [51358, 407, 286, + 1415, 281, 1322, 341, 3820, 5201, 1185, 300, 576, 9086, 484, 11, 291, 458, 11, 291, + 434, 51578], "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": + 1.7751677852348993, "no_speech_prob": 0.017529329285025597}, {"id": 35, "seek": + 11228, "start": 136.56, "end": 139.84, "text": " made baskets from full game tapes + automatically.", "tokens": [51578, 1027, 42853, 490, 1577, 1216, 31349, 6772, 13, + 51742], "temperature": 0.0, "avg_logprob": -0.1899503019989514, "compression_ratio": + 1.7751677852348993, "no_speech_prob": 0.017529329285025597}, {"id": 36, "seek": + 13984, "start": 139.84, "end": 143.72, "text": " And so I came into this problem + that everyone has seen where if you try to do supervised", "tokens": [50364, 400, + 370, 286, 1361, 666, 
341, 1154, 300, 1518, 575, 1612, 689, 498, 291, 853, 281, 360, + 46533, 50558], "temperature": 0.0, "avg_logprob": -0.21070571567701257, "compression_ratio": + 1.7722772277227723, "no_speech_prob": 0.004523335490375757}, {"id": 37, "seek": + 13984, "start": 143.72, "end": 147.12, "text": " learning with small data sets, + it does not work.", "tokens": [50558, 2539, 365, 1359, 1412, 6352, 11, 309, 775, + 406, 589, 13, 50728], "temperature": 0.0, "avg_logprob": -0.21070571567701257, "compression_ratio": + 1.7722772277227723, "no_speech_prob": 0.004523335490375757}, {"id": 38, "seek": + 13984, "start": 147.12, "end": 151.16, "text": " So like annotating data is extremely + difficult.", "tokens": [50728, 407, 411, 25339, 990, 1412, 307, 4664, 2252, 13, + 50930], "temperature": 0.0, "avg_logprob": -0.21070571567701257, "compression_ratio": + 1.7722772277227723, "no_speech_prob": 0.004523335490375757}, {"id": 39, "seek": + 13984, "start": 151.16, "end": 155.08, "text": " Like you can, if you''re doing + it yourself, you can probably get yourself like, you know,", "tokens": [50930, 1743, + 291, 393, 11, 498, 291, 434, 884, 309, 1803, 11, 291, 393, 1391, 483, 1803, 411, + 11, 291, 458, 11, 51126], "temperature": 0.0, "avg_logprob": -0.21070571567701257, + "compression_ratio": 1.7722772277227723, "no_speech_prob": 0.004523335490375757}, + {"id": 40, "seek": 13984, "start": 155.08, "end": 158.96, "text": " in my case, + I was annotating made baskets and video clips, which is already high dimensional", + "tokens": [51126, 294, 452, 1389, 11, 286, 390, 25339, 990, 1027, 42853, 293, 960, + 13117, 11, 597, 307, 1217, 1090, 18795, 51320], "temperature": 0.0, "avg_logprob": + -0.21070571567701257, "compression_ratio": 1.7722772277227723, "no_speech_prob": + 0.004523335490375757}, {"id": 41, "seek": 13984, "start": 158.96, "end": 162.64000000000001, + "text": " data already, you know, paying to store all that data.", "tokens": [51320, + 1412, 1217, 11, 291, 458, 11, 6229, 281, 
3531, 439, 300, 1412, 13, 51504], "temperature": + 0.0, "avg_logprob": -0.21070571567701257, "compression_ratio": 1.7722772277227723, + "no_speech_prob": 0.004523335490375757}, {"id": 42, "seek": 13984, "start": 162.64000000000001, + "end": 165.04, "text": " So, you know, and labeling it was a problem.", "tokens": + [51504, 407, 11, 291, 458, 11, 293, 40244, 309, 390, 257, 1154, 13, 51624], "temperature": + 0.0, "avg_logprob": -0.21070571567701257, "compression_ratio": 1.7722772277227723, + "no_speech_prob": 0.004523335490375757}, {"id": 43, "seek": 13984, "start": 165.04, + "end": 169.16, "text": " So I said, maybe data augmentation because I''m overfitting + this data.", "tokens": [51624, 407, 286, 848, 11, 1310, 1412, 14501, 19631, 570, + 286, 478, 670, 69, 2414, 341, 1412, 13, 51830], "temperature": 0.0, "avg_logprob": + -0.21070571567701257, "compression_ratio": 1.7722772277227723, "no_speech_prob": + 0.004523335490375757}, {"id": 44, "seek": 16916, "start": 169.16, "end": 173.44, + "text": " So I can try to rotate it, crop it horizontally, flip it, increase the + brightness, this whole", "tokens": [50364, 407, 286, 393, 853, 281, 13121, 309, + 11, 9086, 309, 33796, 11, 7929, 309, 11, 3488, 264, 21367, 11, 341, 1379, 50578], + "temperature": 0.0, "avg_logprob": -0.27823940543241277, "compression_ratio": 1.7331081081081081, + "no_speech_prob": 0.0008667344227433205}, {"id": 45, "seek": 16916, "start": 173.44, + "end": 175.44, "text": " package of things you can do.", "tokens": [50578, 7372, + 295, 721, 291, 393, 360, 13, 50678], "temperature": 0.0, "avg_logprob": -0.27823940543241277, + "compression_ratio": 1.7331081081081081, "no_speech_prob": 0.0008667344227433205}, + {"id": 46, "seek": 16916, "start": 175.44, "end": 176.44, "text": " Yeah, and orientation.", + "tokens": [50678, 865, 11, 293, 14764, 13, 50728], "temperature": 0.0, "avg_logprob": + -0.27823940543241277, "compression_ratio": 1.7331081081081081, "no_speech_prob": + 0.0008667344227433205}, {"id": 
47, "seek": 16916, "start": 176.44, "end": 177.44, + "text": " Yeah.", "tokens": [50728, 865, 13, 50778], "temperature": 0.0, "avg_logprob": + -0.27823940543241277, "compression_ratio": 1.7331081081081081, "no_speech_prob": + 0.0008667344227433205}, {"id": 48, "seek": 16916, "start": 177.44, "end": 178.44, + "text": " Right.", "tokens": [50778, 1779, 13, 50828], "temperature": 0.0, "avg_logprob": + -0.27823940543241277, "compression_ratio": 1.7331081081081081, "no_speech_prob": + 0.0008667344227433205}, {"id": 49, "seek": 16916, "start": 178.44, "end": 179.44, + "text": " And so it worked pretty well.", "tokens": [50828, 400, 370, 309, 2732, + 1238, 731, 13, 50878], "temperature": 0.0, "avg_logprob": -0.27823940543241277, + "compression_ratio": 1.7331081081081081, "no_speech_prob": 0.0008667344227433205}, + {"id": 50, "seek": 16916, "start": 179.44, "end": 182.28, "text": " So I was pretty + inspired by this idea of data augmentation.", "tokens": [50878, 407, 286, 390, 1238, + 7547, 538, 341, 1558, 295, 1412, 14501, 19631, 13, 51020], "temperature": 0.0, "avg_logprob": + -0.27823940543241277, "compression_ratio": 1.7331081081081081, "no_speech_prob": + 0.0008667344227433205}, {"id": 51, "seek": 16916, "start": 182.28, "end": 187.51999999999998, + "text": " I really like papers like Francois Chalets on the measure of intelligence + where discussing", "tokens": [51020, 286, 534, 411, 10577, 411, 34695, 271, 761, + 304, 1385, 322, 264, 3481, 295, 7599, 689, 10850, 51282], "temperature": 0.0, "avg_logprob": + -0.27823940543241277, "compression_ratio": 1.7331081081081081, "no_speech_prob": + 0.0008667344227433205}, {"id": 52, "seek": 16916, "start": 187.51999999999998, "end": + 193.56, "text": " the ideas of like system centric generalization developer, where + generalization known unknowns", "tokens": [51282, 264, 3487, 295, 411, 1185, 1489, + 1341, 2674, 2144, 10754, 11, 689, 2674, 2144, 2570, 46048, 51584], "temperature": + 0.0, "avg_logprob": -0.27823940543241277, 
"compression_ratio": 1.7331081081081081, + "no_speech_prob": 0.0008667344227433205}, {"id": 53, "seek": 16916, "start": 193.56, + "end": 198.56, "text": " is kind of, you know, matrix of known and unknowns with + generalization cases.", "tokens": [51584, 307, 733, 295, 11, 291, 458, 11, 8141, + 295, 2570, 293, 46048, 365, 2674, 2144, 3331, 13, 51834], "temperature": 0.0, "avg_logprob": + -0.27823940543241277, "compression_ratio": 1.7331081081081081, "no_speech_prob": + 0.0008667344227433205}, {"id": 54, "seek": 19856, "start": 198.56, "end": 203.12, + "text": " So I hold the belief that we can kind of steer the data in the direction + that enables", "tokens": [50364, 407, 286, 1797, 264, 7107, 300, 321, 393, 733, + 295, 30814, 264, 1412, 294, 264, 3513, 300, 17077, 50592], "temperature": 0.0, "avg_logprob": + -0.1719206907810309, "compression_ratio": 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, + {"id": 55, "seek": 19856, "start": 203.12, "end": 204.12, "text": " more generalization.", + "tokens": [50592, 544, 2674, 2144, 13, 50642], "temperature": 0.0, "avg_logprob": + -0.1719206907810309, "compression_ratio": 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, + {"id": 56, "seek": 19856, "start": 204.12, "end": 207.32, "text": " And the key + to mocking more generalization is mostly going to be in the data space.", "tokens": + [50642, 400, 264, 2141, 281, 17362, 278, 544, 2674, 2144, 307, 5240, 516, 281, 312, + 294, 264, 1412, 1901, 13, 50802], "temperature": 0.0, "avg_logprob": -0.1719206907810309, + "compression_ratio": 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, + {"id": 57, "seek": 19856, "start": 207.32, "end": 211.36, "text": " So I''d say + I''m in this data centric AI category, which is, you know, lately become one", "tokens": + [50802, 407, 286, 1116, 584, 286, 478, 294, 341, 1412, 1489, 1341, 7318, 7719, 11, + 597, 307, 11, 291, 458, 11, 12881, 1813, 472, 51004], "temperature": 0.0, "avg_logprob": + 
-0.1719206907810309, "compression_ratio": 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, + {"id": 58, "seek": 19856, "start": 211.36, "end": 213.52, "text": " of the buzzwords + of where your camp is.", "tokens": [51004, 295, 264, 13036, 13832, 295, 689, 428, + 2255, 307, 13, 51112], "temperature": 0.0, "avg_logprob": -0.1719206907810309, "compression_ratio": + 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, {"id": 59, "seek": + 19856, "start": 213.52, "end": 217.12, "text": " I love things like neural architecture + search and different learning strategies and all", "tokens": [51112, 286, 959, 721, + 411, 18161, 9482, 3164, 293, 819, 2539, 9029, 293, 439, 51292], "temperature": 0.0, + "avg_logprob": -0.1719206907810309, "compression_ratio": 1.773371104815864, "no_speech_prob": + 0.0006448252242989838}, {"id": 60, "seek": 19856, "start": 217.12, "end": 218.96, + "text": " that, but I really love the data augmentation.", "tokens": [51292, 300, + 11, 457, 286, 534, 959, 264, 1412, 14501, 19631, 13, 51384], "temperature": 0.0, + "avg_logprob": -0.1719206907810309, "compression_ratio": 1.773371104815864, "no_speech_prob": + 0.0006448252242989838}, {"id": 61, "seek": 19856, "start": 218.96, "end": 222.68, + "text": " I think there''s so much opportunity and research to explore this further.", + "tokens": [51384, 286, 519, 456, 311, 370, 709, 2650, 293, 2132, 281, 6839, 341, + 3052, 13, 51570], "temperature": 0.0, "avg_logprob": -0.1719206907810309, "compression_ratio": + 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, {"id": 62, "seek": + 19856, "start": 222.68, "end": 224.0, "text": " And then.", "tokens": [51570, 400, + 550, 13, 51636], "temperature": 0.0, "avg_logprob": -0.1719206907810309, "compression_ratio": + 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, {"id": 63, "seek": + 19856, "start": 224.0, "end": 227.44, "text": " And so yeah, so I have a few ideas + of how this could intersect with vector search 
engines", "tokens": [51636, 400, + 370, 1338, 11, 370, 286, 362, 257, 1326, 3487, 295, 577, 341, 727, 27815, 365, 8062, + 3164, 12982, 51808], "temperature": 0.0, "avg_logprob": -0.1719206907810309, "compression_ratio": + 1.773371104815864, "no_speech_prob": 0.0006448252242989838}, {"id": 64, "seek": + 22744, "start": 227.44, "end": 229.68, "text": " and vector representation learning.", + "tokens": [50364, 293, 8062, 10290, 2539, 13, 50476], "temperature": 0.0, "avg_logprob": + -0.15516703658633763, "compression_ratio": 1.8631921824104234, "no_speech_prob": + 0.0009783224668353796}, {"id": 65, "seek": 22744, "start": 229.68, "end": 230.68, + "text": " So that''s on one end.", "tokens": [50476, 407, 300, 311, 322, 472, 917, + 13, 50526], "temperature": 0.0, "avg_logprob": -0.15516703658633763, "compression_ratio": + 1.8631921824104234, "no_speech_prob": 0.0009783224668353796}, {"id": 66, "seek": + 22744, "start": 230.68, "end": 234.2, "text": " So that''s kind of, you know, my + research interest is in data augmentation and a bit of a background", "tokens": + [50526, 407, 300, 311, 733, 295, 11, 291, 458, 11, 452, 2132, 1179, 307, 294, 1412, + 14501, 19631, 293, 257, 857, 295, 257, 3678, 50702], "temperature": 0.0, "avg_logprob": + -0.15516703658633763, "compression_ratio": 1.8631921824104234, "no_speech_prob": + 0.0009783224668353796}, {"id": 67, "seek": 22744, "start": 234.2, "end": 237.48, + "text": " about how I became so inspired in data augmentation.", "tokens": [50702, + 466, 577, 286, 3062, 370, 7547, 294, 1412, 14501, 19631, 13, 50866], "temperature": + 0.0, "avg_logprob": -0.15516703658633763, "compression_ratio": 1.8631921824104234, + "no_speech_prob": 0.0009783224668353796}, {"id": 68, "seek": 22744, "start": 237.48, + "end": 242.2, "text": " So then to say kind of what I''m doing right now is, you + know, I''ve, so I''ve started doing", "tokens": [50866, 407, 550, 281, 584, 733, + 295, 437, 286, 478, 884, 558, 586, 307, 11, 291, 458, 11, 286, 600, 11, 
370, 286, + 600, 1409, 884, 51102], "temperature": 0.0, "avg_logprob": -0.15516703658633763, + "compression_ratio": 1.8631921824104234, "no_speech_prob": 0.0009783224668353796}, + {"id": 69, "seek": 22744, "start": 242.2, "end": 243.72, "text": " some experiment + papers.", "tokens": [51102, 512, 5120, 10577, 13, 51178], "temperature": 0.0, "avg_logprob": + -0.15516703658633763, "compression_ratio": 1.8631921824104234, "no_speech_prob": + 0.0009783224668353796}, {"id": 70, "seek": 22744, "start": 243.72, "end": 247.48, + "text": " Most of my computing is managed with Google collab, which is pretty nice.", + "tokens": [51178, 4534, 295, 452, 15866, 307, 6453, 365, 3329, 44228, 11, 597, 307, + 1238, 1481, 13, 51366], "temperature": 0.0, "avg_logprob": -0.15516703658633763, + "compression_ratio": 1.8631921824104234, "no_speech_prob": 0.0009783224668353796}, + {"id": 71, "seek": 22744, "start": 247.48, "end": 251.52, "text": " You know, like, + you have the Google collab notebooks and then you have the Google Drive", "tokens": + [51366, 509, 458, 11, 411, 11, 291, 362, 264, 3329, 44228, 43782, 293, 550, 291, + 362, 264, 3329, 15622, 51568], "temperature": 0.0, "avg_logprob": -0.15516703658633763, + "compression_ratio": 1.8631921824104234, "no_speech_prob": 0.0009783224668353796}, + {"id": 72, "seek": 22744, "start": 251.52, "end": 255.72, "text": " integration + for persistence and, you know, you can make it pretty far without putting", "tokens": + [51568, 10980, 337, 37617, 293, 11, 291, 458, 11, 291, 393, 652, 309, 1238, 1400, + 1553, 3372, 51778], "temperature": 0.0, "avg_logprob": -0.15516703658633763, "compression_ratio": + 1.8631921824104234, "no_speech_prob": 0.0009783224668353796}, {"id": 73, "seek": + 25572, "start": 255.72, "end": 260.44, "text": " a dent in your wallet by doing + it by getting too carried away.", "tokens": [50364, 257, 7059, 294, 428, 16599, + 538, 884, 309, 538, 1242, 886, 9094, 1314, 13, 50600], "temperature": 0.0, "avg_logprob": + 
-0.19090166546049572, "compression_ratio": 1.6236933797909407, "no_speech_prob": + 0.0007440527551807463}, {"id": 74, "seek": 25572, "start": 260.44, "end": 263.64, + "text": " And so that''s kind of how I''m setting that up.", "tokens": [50600, 400, + 370, 300, 311, 733, 295, 577, 286, 478, 3287, 300, 493, 13, 50760], "temperature": + 0.0, "avg_logprob": -0.19090166546049572, "compression_ratio": 1.6236933797909407, + "no_speech_prob": 0.0007440527551807463}, {"id": 75, "seek": 25572, "start": 263.64, + "end": 267.4, "text": " And, you know, I have, you know, I can tell people about + like, as I mentioned, beginning", "tokens": [50760, 400, 11, 291, 458, 11, 286, + 362, 11, 291, 458, 11, 286, 393, 980, 561, 466, 411, 11, 382, 286, 2835, 11, 2863, + 50948], "temperature": 0.0, "avg_logprob": -0.19090166546049572, "compression_ratio": + 1.6236933797909407, "no_speech_prob": 0.0007440527551807463}, {"id": 76, "seek": + 25572, "start": 267.4, "end": 270.12, "text": " trying to reintroduce myself and + figure out my role.", "tokens": [50948, 1382, 281, 319, 38132, 384, 2059, 293, 2573, + 484, 452, 3090, 13, 51084], "temperature": 0.0, "avg_logprob": -0.19090166546049572, + "compression_ratio": 1.6236933797909407, "no_speech_prob": 0.0007440527551807463}, + {"id": 77, "seek": 25572, "start": 270.12, "end": 273.96, "text": " So I had kind + of like recently, like a high of achieving the best student paper at this", "tokens": + [51084, 407, 286, 632, 733, 295, 411, 3938, 11, 411, 257, 1090, 295, 19626, 264, + 1151, 3107, 3035, 412, 341, 51276], "temperature": 0.0, "avg_logprob": -0.19090166546049572, + "compression_ratio": 1.6236933797909407, "no_speech_prob": 0.0007440527551807463}, + {"id": 78, "seek": 25572, "start": 273.96, "end": 278.16, "text": " ICT AI conference + on something about inductive biases.", "tokens": [51276, 286, 10259, 7318, 7586, + 322, 746, 466, 31612, 488, 32152, 13, 51486], "temperature": 0.0, "avg_logprob": + -0.19090166546049572, 
"compression_ratio": 1.6236933797909407, "no_speech_prob": + 0.0007440527551807463}, {"id": 79, "seek": 25572, "start": 278.16, "end": 282.96, + "text": " And then the next day I get my ICLR reviews back, which were not great.", + "tokens": [51486, 400, 550, 264, 958, 786, 286, 483, 452, 14360, 31722, 10229, 646, + 11, 597, 645, 406, 869, 13, 51726], "temperature": 0.0, "avg_logprob": -0.19090166546049572, + "compression_ratio": 1.6236933797909407, "no_speech_prob": 0.0007440527551807463}, + {"id": 80, "seek": 28296, "start": 282.96, "end": 288.12, "text": " So, you know, + and that''s kind of the journey of this, you know, I''m just setting, setting", + "tokens": [50364, 407, 11, 291, 458, 11, 293, 300, 311, 733, 295, 264, 4671, 295, + 341, 11, 291, 458, 11, 286, 478, 445, 3287, 11, 3287, 50622], "temperature": 0.0, + "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, "no_speech_prob": + 0.1636214703321457}, {"id": 81, "seek": 28296, "start": 288.12, "end": 293.35999999999996, + "text": " forward to ICML and trying to just bounce back and stay on this journey + of figuring out", "tokens": [50622, 2128, 281, 14360, 12683, 293, 1382, 281, 445, + 15894, 646, 293, 1754, 322, 341, 4671, 295, 15213, 484, 50884], "temperature": 0.0, + "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, "no_speech_prob": + 0.1636214703321457}, {"id": 82, "seek": 28296, "start": 293.35999999999996, "end": + 294.68, "text": " how to do deep learning research.", "tokens": [50884, 577, 281, + 360, 2452, 2539, 2132, 13, 50950], "temperature": 0.0, "avg_logprob": -0.22737705524151142, + "compression_ratio": 1.7876447876447876, "no_speech_prob": 0.1636214703321457}, + {"id": 83, "seek": 28296, "start": 294.68, "end": 296.88, "text": " So it''s definitely + a high.", "tokens": [50950, 407, 309, 311, 2138, 257, 1090, 13, 51060], "temperature": + 0.0, "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, + 
"no_speech_prob": 0.1636214703321457}, {"id": 84, "seek": 28296, "start": 296.88, + "end": 297.88, "text": " Isn''t it?", "tokens": [51060, 6998, 380, 309, 30, 51110], + "temperature": 0.0, "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, + "no_speech_prob": 0.1636214703321457}, {"id": 85, "seek": 28296, "start": 297.88, + "end": 301.76, "text": " It''s like almost always like that, you know, like in machine + learning, nothing is predictable", "tokens": [51110, 467, 311, 411, 1920, 1009, + 411, 300, 11, 291, 458, 11, 411, 294, 3479, 2539, 11, 1825, 307, 27737, 51304], + "temperature": 0.0, "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, + "no_speech_prob": 0.1636214703321457}, {"id": 86, "seek": 28296, "start": 301.76, + "end": 306.4, "text": " and nothing is given, you know, like, and you need to be + kind of averse to that.", "tokens": [51304, 293, 1825, 307, 2212, 11, 291, 458, + 11, 411, 11, 293, 291, 643, 281, 312, 733, 295, 257, 4308, 281, 300, 13, 51536], + "temperature": 0.0, "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, + "no_speech_prob": 0.1636214703321457}, {"id": 87, "seek": 28296, "start": 306.4, + "end": 308.44, "text": " Well, not averse, but resistant, right?", "tokens": [51536, + 1042, 11, 406, 257, 4308, 11, 457, 20383, 11, 558, 30, 51638], "temperature": 0.0, + "avg_logprob": -0.22737705524151142, "compression_ratio": 1.7876447876447876, "no_speech_prob": + 0.1636214703321457}, {"id": 88, "seek": 30844, "start": 308.44, "end": 309.56, "text": + " Like, okay, I''m fine.", "tokens": [50364, 1743, 11, 1392, 11, 286, 478, 2489, + 13, 50420], "temperature": 0.0, "avg_logprob": -0.24192097981770833, "compression_ratio": + 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, {"id": 89, "seek": 30844, + "start": 309.56, "end": 313.32, "text": " I can take risks, but it''s like a marathon.", + "tokens": [50420, 286, 393, 747, 10888, 11, 457, 309, 311, 411, 
257, 27601, 13, + 50608], "temperature": 0.0, "avg_logprob": -0.24192097981770833, "compression_ratio": + 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, {"id": 90, "seek": 30844, + "start": 313.32, "end": 314.32, "text": " It''s not a sprint.", "tokens": [50608, + 467, 311, 406, 257, 25075, 13, 50658], "temperature": 0.0, "avg_logprob": -0.24192097981770833, + "compression_ratio": 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, + {"id": 91, "seek": 30844, "start": 314.32, "end": 316.52, "text": " Oh, yeah, definitely.", + "tokens": [50658, 876, 11, 1338, 11, 2138, 13, 50768], "temperature": 0.0, "avg_logprob": + -0.24192097981770833, "compression_ratio": 1.7090301003344481, "no_speech_prob": + 0.19697165489196777}, {"id": 92, "seek": 30844, "start": 316.52, "end": 321.72, + "text": " And just the disappointment of investing a month or two into a research + project and then", "tokens": [50768, 400, 445, 264, 28175, 295, 10978, 257, 1618, + 420, 732, 666, 257, 2132, 1716, 293, 550, 51028], "temperature": 0.0, "avg_logprob": + -0.24192097981770833, "compression_ratio": 1.7090301003344481, "no_speech_prob": + 0.19697165489196777}, {"id": 93, "seek": 30844, "start": 321.72, "end": 323.24, + "text": " you just start running the experiments.", "tokens": [51028, 291, 445, + 722, 2614, 264, 12050, 13, 51104], "temperature": 0.0, "avg_logprob": -0.24192097981770833, + "compression_ratio": 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, + {"id": 94, "seek": 30844, "start": 323.24, "end": 325.76, "text": " And you''re + like, oh, this is not working.", "tokens": [51104, 400, 291, 434, 411, 11, 1954, + 11, 341, 307, 406, 1364, 13, 51230], "temperature": 0.0, "avg_logprob": -0.24192097981770833, + "compression_ratio": 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, + {"id": 95, "seek": 30844, "start": 325.76, "end": 329.28, "text": " And your advisor''s + on the phone twice a week and saying, how''s it going?", "tokens": [51230, 
400, + 428, 19161, 311, 322, 264, 2593, 6091, 257, 1243, 293, 1566, 11, 577, 311, 309, + 516, 30, 51406], "temperature": 0.0, "avg_logprob": -0.24192097981770833, "compression_ratio": + 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, {"id": 96, "seek": 30844, + "start": 329.28, "end": 330.72, "text": " And you''re like, not good.", "tokens": + [51406, 400, 291, 434, 411, 11, 406, 665, 13, 51478], "temperature": 0.0, "avg_logprob": + -0.24192097981770833, "compression_ratio": 1.7090301003344481, "no_speech_prob": + 0.19697165489196777}, {"id": 97, "seek": 30844, "start": 330.72, "end": 332.64, + "text": " You know, like, so that''s stressful.", "tokens": [51478, 509, 458, 11, + 411, 11, 370, 300, 311, 19108, 13, 51574], "temperature": 0.0, "avg_logprob": -0.24192097981770833, + "compression_ratio": 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, + {"id": 98, "seek": 30844, "start": 332.64, "end": 335.88, "text": " And, you know, + anyone else going through that, I can definitely relate to that kind of", "tokens": + [51574, 400, 11, 291, 458, 11, 2878, 1646, 516, 807, 300, 11, 286, 393, 2138, 10961, + 281, 300, 733, 295, 51736], "temperature": 0.0, "avg_logprob": -0.24192097981770833, + "compression_ratio": 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, + {"id": 99, "seek": 30844, "start": 335.88, "end": 336.88, "text": " struggle.", + "tokens": [51736, 7799, 13, 51786], "temperature": 0.0, "avg_logprob": -0.24192097981770833, + "compression_ratio": 1.7090301003344481, "no_speech_prob": 0.19697165489196777}, + {"id": 100, "seek": 33688, "start": 336.88, "end": 340.68, "text": " Is this by + the way, why you do YouTube show Henry A. 
L.A.", "tokens": [50364, 1119, 341, 538, + 264, 636, 11, 983, 291, 360, 3088, 855, 11085, 316, 13, 441, 13, 32, 13, 50554], + "temperature": 0.0, "avg_logprob": -0.24558462816126206, "compression_ratio": 1.628930817610063, + "no_speech_prob": 0.007410705089569092}, {"id": 101, "seek": 33688, "start": 340.68, + "end": 341.68, "text": " Labs?", "tokens": [50554, 40047, 30, 50604], "temperature": + 0.0, "avg_logprob": -0.24558462816126206, "compression_ratio": 1.628930817610063, + "no_speech_prob": 0.007410705089569092}, {"id": 102, "seek": 33688, "start": 341.68, + "end": 342.68, "text": " Is this why you do it?", "tokens": [50604, 1119, 341, 983, + 291, 360, 309, 30, 50654], "temperature": 0.0, "avg_logprob": -0.24558462816126206, + "compression_ratio": 1.628930817610063, "no_speech_prob": 0.007410705089569092}, + {"id": 103, "seek": 33688, "start": 342.68, "end": 343.68, "text": " Or is there + something else as well?", "tokens": [50654, 1610, 307, 456, 746, 1646, 382, 731, + 30, 50704], "temperature": 0.0, "avg_logprob": -0.24558462816126206, "compression_ratio": + 1.628930817610063, "no_speech_prob": 0.007410705089569092}, {"id": 104, "seek": + 33688, "start": 343.68, "end": 348.08, "text": " I just wanted to kind of tap into + the psychological element of it if you thought about it.", "tokens": [50704, 286, + 445, 1415, 281, 733, 295, 5119, 666, 264, 14346, 4478, 295, 309, 498, 291, 1194, + 466, 309, 13, 50924], "temperature": 0.0, "avg_logprob": -0.24558462816126206, "compression_ratio": + 1.628930817610063, "no_speech_prob": 0.007410705089569092}, {"id": 105, "seek": + 33688, "start": 348.08, "end": 350.08, "text": " Yeah, yeah, I love to talk about + it.", "tokens": [50924, 865, 11, 1338, 11, 286, 959, 281, 751, 466, 309, 13, 51024], + "temperature": 0.0, "avg_logprob": -0.24558462816126206, "compression_ratio": 1.628930817610063, + "no_speech_prob": 0.007410705089569092}, {"id": 106, "seek": 33688, "start": 350.08, + "end": 355.48, "text": " I mean, my 
inspiration for YouTube came from, I guess I + was just like one of these people", "tokens": [51024, 286, 914, 11, 452, 10249, + 337, 3088, 1361, 490, 11, 286, 2041, 286, 390, 445, 411, 472, 295, 613, 561, 51294], + "temperature": 0.0, "avg_logprob": -0.24558462816126206, "compression_ratio": 1.628930817610063, + "no_speech_prob": 0.007410705089569092}, {"id": 107, "seek": 33688, "start": 355.48, + "end": 361.48, "text": " who really enjoyed like we would have guest lectures come + to Florida Atlantic University.", "tokens": [51294, 567, 534, 4626, 411, 321, 576, + 362, 8341, 16564, 808, 281, 9117, 20233, 3535, 13, 51594], "temperature": 0.0, "avg_logprob": + -0.24558462816126206, "compression_ratio": 1.628930817610063, "no_speech_prob": + 0.007410705089569092}, {"id": 108, "seek": 33688, "start": 361.48, "end": 365.08, + "text": " One that stood out to me more than anything else is researchers from Johns + Hopkins came", "tokens": [51594, 1485, 300, 9371, 484, 281, 385, 544, 813, 1340, + 1646, 307, 10309, 490, 37016, 29999, 1361, 51774], "temperature": 0.0, "avg_logprob": + -0.24558462816126206, "compression_ratio": 1.628930817610063, "no_speech_prob": + 0.007410705089569092}, {"id": 109, "seek": 36508, "start": 365.08, "end": 370.47999999999996, + "text": " to, they had built a prosthetic limb that connects to a brain computer + interface.", "tokens": [50364, 281, 11, 436, 632, 3094, 257, 39976, 3532, 30390, + 300, 16967, 281, 257, 3567, 3820, 9226, 13, 50634], "temperature": 0.0, "avg_logprob": + -0.21546588766163793, "compression_ratio": 1.821917808219178, "no_speech_prob": + 0.015819894149899483}, {"id": 110, "seek": 36508, "start": 370.47999999999996, "end": + 374.8, "text": " And they have people who have lost their limbs and they can, you + know, blindfolded touch", "tokens": [50634, 400, 436, 362, 561, 567, 362, 2731, + 641, 29315, 293, 436, 393, 11, 291, 458, 11, 44846, 292, 2557, 50850], "temperature": + 0.0, "avg_logprob": -0.21546588766163793, 
"compression_ratio": 1.821917808219178, + "no_speech_prob": 0.015819894149899483}, {"id": 111, "seek": 36508, "start": 374.8, + "end": 378.52, "text": " an orange and say, this is an orange, this is an apple, + this is a banana.", "tokens": [50850, 364, 7671, 293, 584, 11, 341, 307, 364, 7671, + 11, 341, 307, 364, 10606, 11, 341, 307, 257, 14194, 13, 51036], "temperature": 0.0, + "avg_logprob": -0.21546588766163793, "compression_ratio": 1.821917808219178, "no_speech_prob": + 0.015819894149899483}, {"id": 112, "seek": 36508, "start": 378.52, "end": 380.56, + "text": " And they came to talk to us at Florida Atlantic.", "tokens": [51036, 400, + 436, 1361, 281, 751, 281, 505, 412, 9117, 20233, 13, 51138], "temperature": 0.0, + "avg_logprob": -0.21546588766163793, "compression_ratio": 1.821917808219178, "no_speech_prob": + 0.015819894149899483}, {"id": 113, "seek": 36508, "start": 380.56, "end": 382.56, + "text": " And I mean, it was, it was inspiring.", "tokens": [51138, 400, 286, 914, + 11, 309, 390, 11, 309, 390, 15883, 13, 51238], "temperature": 0.0, "avg_logprob": + -0.21546588766163793, "compression_ratio": 1.821917808219178, "no_speech_prob": + 0.015819894149899483}, {"id": 114, "seek": 36508, "start": 382.56, "end": 386.32, + "text": " I like, I love these kind of seminars and just, I guess like falling in + love with this", "tokens": [51238, 286, 411, 11, 286, 959, 613, 733, 295, 43112, + 293, 445, 11, 286, 2041, 411, 7440, 294, 959, 365, 341, 51426], "temperature": 0.0, + "avg_logprob": -0.21546588766163793, "compression_ratio": 1.821917808219178, "no_speech_prob": + 0.015819894149899483}, {"id": 115, "seek": 36508, "start": 386.32, "end": 388.32, + "text": " kind of presentation.", "tokens": [51426, 733, 295, 5860, 13, 51526], + "temperature": 0.0, "avg_logprob": -0.21546588766163793, "compression_ratio": 1.821917808219178, + "no_speech_prob": 0.015819894149899483}, {"id": 116, "seek": 36508, "start": 388.32, + "end": 391.52, "text": " It''s almost like, say 
like to me, it''s kind of like an + allegace to like maybe like stand-up", "tokens": [51526, 467, 311, 1920, 411, 11, + 584, 411, 281, 385, 11, 309, 311, 733, 295, 411, 364, 10364, 617, 281, 411, 1310, + 411, 1463, 12, 1010, 51686], "temperature": 0.0, "avg_logprob": -0.21546588766163793, + "compression_ratio": 1.821917808219178, "no_speech_prob": 0.015819894149899483}, + {"id": 117, "seek": 39152, "start": 391.52, "end": 396.32, "text": " comedy, how + you have someone who gets up on stage and puts the show on, you know, the", "tokens": + [50364, 13394, 11, 577, 291, 362, 1580, 567, 2170, 493, 322, 3233, 293, 8137, 264, + 855, 322, 11, 291, 458, 11, 264, 50604], "temperature": 0.0, "avg_logprob": -0.17862069189965307, + "compression_ratio": 1.916083916083916, "no_speech_prob": 0.030704699456691742}, + {"id": 118, "seek": 39152, "start": 396.32, "end": 398.0, "text": " benefit of the + slides behind them.", "tokens": [50604, 5121, 295, 264, 9788, 2261, 552, 13, 50688], + "temperature": 0.0, "avg_logprob": -0.17862069189965307, "compression_ratio": 1.916083916083916, + "no_speech_prob": 0.030704699456691742}, {"id": 119, "seek": 39152, "start": 398.0, + "end": 401.24, "text": " And, you know, I really like these, these kind of talks.", + "tokens": [50688, 400, 11, 291, 458, 11, 286, 534, 411, 613, 11, 613, 733, 295, + 6686, 13, 50850], "temperature": 0.0, "avg_logprob": -0.17862069189965307, "compression_ratio": + 1.916083916083916, "no_speech_prob": 0.030704699456691742}, {"id": 120, "seek": + 39152, "start": 401.24, "end": 404.91999999999996, "text": " And that''s kind of, + so that''s kind of like the art of it is what I really like about YouTube.", "tokens": + [50850, 400, 300, 311, 733, 295, 11, 370, 300, 311, 733, 295, 411, 264, 1523, 295, + 309, 307, 437, 286, 534, 411, 466, 3088, 13, 51034], "temperature": 0.0, "avg_logprob": + -0.17862069189965307, "compression_ratio": 1.916083916083916, "no_speech_prob": + 0.030704699456691742}, {"id": 121, "seek": 39152, 
"start": 404.91999999999996, "end": + 409.4, "text": " I mean, I definitely believe in YouTube as the medium for communicating + these ideas", "tokens": [51034, 286, 914, 11, 286, 2138, 1697, 294, 3088, 382, 264, + 6399, 337, 17559, 613, 3487, 51258], "temperature": 0.0, "avg_logprob": -0.17862069189965307, + "compression_ratio": 1.916083916083916, "no_speech_prob": 0.030704699456691742}, + {"id": 122, "seek": 39152, "start": 409.4, "end": 410.4, "text": " right now.", + "tokens": [51258, 558, 586, 13, 51308], "temperature": 0.0, "avg_logprob": -0.17862069189965307, + "compression_ratio": 1.916083916083916, "no_speech_prob": 0.030704699456691742}, + {"id": 123, "seek": 39152, "start": 410.4, "end": 416.52, "text": " You know, like, + and we''ll get into talking about writing on medium and like, yeah, like", "tokens": + [51308, 509, 458, 11, 411, 11, 293, 321, 603, 483, 666, 1417, 466, 3579, 322, 6399, + 293, 411, 11, 1338, 11, 411, 51614], "temperature": 0.0, "avg_logprob": -0.17862069189965307, + "compression_ratio": 1.916083916083916, "no_speech_prob": 0.030704699456691742}, + {"id": 124, "seek": 39152, "start": 416.52, "end": 420.24, "text": " the different + ways you can write on Twitter, you can write on medium, you can record podcasts", + "tokens": [51614, 264, 819, 2098, 291, 393, 2464, 322, 5794, 11, 291, 393, 2464, + 322, 6399, 11, 291, 393, 2136, 24045, 51800], "temperature": 0.0, "avg_logprob": + -0.17862069189965307, "compression_ratio": 1.916083916083916, "no_speech_prob": + 0.030704699456691742}, {"id": 125, "seek": 42024, "start": 420.24, "end": 425.0, + "text": " and put it on Spotify, Apple, and you can write these research papers + obviously just,", "tokens": [50364, 293, 829, 309, 322, 29036, 11, 6373, 11, 293, + 291, 393, 2464, 613, 2132, 10577, 2745, 445, 11, 50602], "temperature": 0.0, "avg_logprob": + -0.32970752716064455, "compression_ratio": 1.6150943396226416, "no_speech_prob": + 0.008928782306611538}, {"id": 126, "seek": 42024, "start": 
425.0, "end": 431.96000000000004, + "text": " you know, upload it to archive, treat it like a medium of the number of + user on archive", "tokens": [50602, 291, 458, 11, 6580, 309, 281, 23507, 11, 2387, + 309, 411, 257, 6399, 295, 264, 1230, 295, 4195, 322, 23507, 50950], "temperature": + 0.0, "avg_logprob": -0.32970752716064455, "compression_ratio": 1.6150943396226416, + "no_speech_prob": 0.008928782306611538}, {"id": 127, "seek": 42024, "start": 431.96000000000004, + "end": 434.56, "text": " is probably less than what you get on YouTube.", "tokens": + [50950, 307, 1391, 1570, 813, 437, 291, 483, 322, 3088, 13, 51080], "temperature": + 0.0, "avg_logprob": -0.32970752716064455, "compression_ratio": 1.6150943396226416, + "no_speech_prob": 0.008928782306611538}, {"id": 128, "seek": 42024, "start": 434.56, + "end": 436.04, "text": " The content is different too.", "tokens": [51080, 440, + 2701, 307, 819, 886, 13, 51154], "temperature": 0.0, "avg_logprob": -0.32970752716064455, + "compression_ratio": 1.6150943396226416, "no_speech_prob": 0.008928782306611538}, + {"id": 129, "seek": 42024, "start": 436.04, "end": 437.04, "text": " So, yeah.", + "tokens": [51154, 407, 11, 1338, 13, 51204], "temperature": 0.0, "avg_logprob": + -0.32970752716064455, "compression_ratio": 1.6150943396226416, "no_speech_prob": + 0.008928782306611538}, {"id": 130, "seek": 42024, "start": 437.04, "end": 438.04, + "text": " Yeah.", "tokens": [51204, 865, 13, 51254], "temperature": 0.0, "avg_logprob": + -0.32970752716064455, "compression_ratio": 1.6150943396226416, "no_speech_prob": + 0.008928782306611538}, {"id": 131, "seek": 42024, "start": 438.04, "end": 443.24, + "text": " So, yeah, I really believe in the medium and then I just want to see the + art form develop", "tokens": [51254, 407, 11, 1338, 11, 286, 534, 1697, 294, 264, + 6399, 293, 550, 286, 445, 528, 281, 536, 264, 1523, 1254, 1499, 51514], "temperature": + 0.0, "avg_logprob": -0.32970752716064455, "compression_ratio": 
1.6150943396226416, + "no_speech_prob": 0.008928782306611538}, {"id": 132, "seek": 42024, "start": 443.24, + "end": 444.24, "text": " further.", "tokens": [51514, 3052, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.32970752716064455, "compression_ratio": 1.6150943396226416, + "no_speech_prob": 0.008928782306611538}, {"id": 133, "seek": 42024, "start": 444.24, + "end": 446.08, "text": " Like, I''m really impressed with what Yannick Kiltch was + doing.", "tokens": [51564, 1743, 11, 286, 478, 534, 11679, 365, 437, 398, 969, 618, + 591, 2352, 339, 390, 884, 13, 51656], "temperature": 0.0, "avg_logprob": -0.32970752716064455, + "compression_ratio": 1.6150943396226416, "no_speech_prob": 0.008928782306611538}, + {"id": 134, "seek": 44608, "start": 446.08, "end": 450.71999999999997, "text": " + Like, right now he''s just released auto regressive diffusion models and, you know, + I''m excited", "tokens": [50364, 1743, 11, 558, 586, 415, 311, 445, 4736, 8399, + 1121, 22733, 25242, 5245, 293, 11, 291, 458, 11, 286, 478, 2919, 50596], "temperature": + 0.0, "avg_logprob": -0.22870539275693222, "compression_ratio": 1.6795774647887325, + "no_speech_prob": 0.4000794291496277}, {"id": 135, "seek": 44608, "start": 450.71999999999997, + "end": 454.96, "text": " to watch it and that''s, and that''s the fun about it is, + is you have this excitement about", "tokens": [50596, 281, 1159, 309, 293, 300, + 311, 11, 293, 300, 311, 264, 1019, 466, 309, 307, 11, 307, 291, 362, 341, 14755, + 466, 50808], "temperature": 0.0, "avg_logprob": -0.22870539275693222, "compression_ratio": + 1.6795774647887325, "no_speech_prob": 0.4000794291496277}, {"id": 136, "seek": 44608, + "start": 454.96, "end": 455.96, "text": " it.", "tokens": [50808, 309, 13, 50858], + "temperature": 0.0, "avg_logprob": -0.22870539275693222, "compression_ratio": 1.6795774647887325, + "no_speech_prob": 0.4000794291496277}, {"id": 137, "seek": 44608, "start": 455.96, + "end": 456.96, "text": " Let''s link that as well.", 
"tokens": [50858, 961, 311, + 2113, 300, 382, 731, 13, 50908], "temperature": 0.0, "avg_logprob": -0.22870539275693222, + "compression_ratio": 1.6795774647887325, "no_speech_prob": 0.4000794291496277}, + {"id": 138, "seek": 44608, "start": 456.96, "end": 460.56, "text": " It''s a YouTube + as well or like another show you mentioned.", "tokens": [50908, 467, 311, 257, 3088, + 382, 731, 420, 411, 1071, 855, 291, 2835, 13, 51088], "temperature": 0.0, "avg_logprob": + -0.22870539275693222, "compression_ratio": 1.6795774647887325, "no_speech_prob": + 0.4000794291496277}, {"id": 139, "seek": 44608, "start": 460.56, "end": 461.56, + "text": " Yeah, yeah.", "tokens": [51088, 865, 11, 1338, 13, 51138], "temperature": + 0.0, "avg_logprob": -0.22870539275693222, "compression_ratio": 1.6795774647887325, + "no_speech_prob": 0.4000794291496277}, {"id": 140, "seek": 44608, "start": 461.56, + "end": 466.79999999999995, "text": " I think just YouTube, Yannick Kiltch, I think + most of our viewers will know what we''re talking", "tokens": [51138, 286, 519, + 445, 3088, 11, 398, 969, 618, 591, 2352, 339, 11, 286, 519, 881, 295, 527, 8499, + 486, 458, 437, 321, 434, 1417, 51400], "temperature": 0.0, "avg_logprob": -0.22870539275693222, + "compression_ratio": 1.6795774647887325, "no_speech_prob": 0.4000794291496277}, + {"id": 141, "seek": 44608, "start": 466.79999999999995, "end": 467.79999999999995, + "text": " about.", "tokens": [51400, 466, 13, 51450], "temperature": 0.0, "avg_logprob": + -0.22870539275693222, "compression_ratio": 1.6795774647887325, "no_speech_prob": + 0.4000794291496277}, {"id": 142, "seek": 44608, "start": 467.79999999999995, "end": + 470.28, "text": " I just want to make sure that I will also educate myself.", "tokens": + [51450, 286, 445, 528, 281, 652, 988, 300, 286, 486, 611, 16092, 2059, 13, 51574], + "temperature": 0.0, "avg_logprob": -0.22870539275693222, "compression_ratio": 1.6795774647887325, + "no_speech_prob": 0.4000794291496277}, {"id": 143, "seek": 
44608, "start": 470.28, + "end": 472.36, "text": " So, so let''s link that.", "tokens": [51574, 407, 11, 370, + 718, 311, 2113, 300, 13, 51678], "temperature": 0.0, "avg_logprob": -0.22870539275693222, + "compression_ratio": 1.6795774647887325, "no_speech_prob": 0.4000794291496277}, + {"id": 144, "seek": 44608, "start": 472.36, "end": 473.36, "text": " Awesome.", + "tokens": [51678, 10391, 13, 51728], "temperature": 0.0, "avg_logprob": -0.22870539275693222, + "compression_ratio": 1.6795774647887325, "no_speech_prob": 0.4000794291496277}, + {"id": 145, "seek": 47336, "start": 473.36, "end": 478.28000000000003, "text": " + Yeah, I mean, and so yeah, you said that data augmentation is one thing you worked + on and", "tokens": [50364, 865, 11, 286, 914, 11, 293, 370, 1338, 11, 291, 848, + 300, 1412, 14501, 19631, 307, 472, 551, 291, 2732, 322, 293, 50610], "temperature": + 0.0, "avg_logprob": -0.21103688596769143, "compression_ratio": 1.6895306859205776, + "no_speech_prob": 0.10372499376535416}, {"id": 146, "seek": 47336, "start": 478.28000000000003, + "end": 480.2, "text": " I guess continue working on.", "tokens": [50610, 286, 2041, + 2354, 1364, 322, 13, 50706], "temperature": 0.0, "avg_logprob": -0.21103688596769143, + "compression_ratio": 1.6895306859205776, "no_speech_prob": 0.10372499376535416}, + {"id": 147, "seek": 47336, "start": 480.2, "end": 485.28000000000003, "text": " + It''s actually interesting that you did that in CV space, but there is also somehow + connection", "tokens": [50706, 467, 311, 767, 1880, 300, 291, 630, 300, 294, 22995, + 1901, 11, 457, 456, 307, 611, 6063, 4984, 50960], "temperature": 0.0, "avg_logprob": + -0.21103688596769143, "compression_ratio": 1.6895306859205776, "no_speech_prob": + 0.10372499376535416}, {"id": 148, "seek": 47336, "start": 485.28000000000003, "end": + 486.28000000000003, "text": " in text, right?", "tokens": [50960, 294, 2487, 11, + 558, 30, 51010], "temperature": 0.0, "avg_logprob": -0.21103688596769143, 
"compression_ratio": + 1.6895306859205776, "no_speech_prob": 0.10372499376535416}, {"id": 149, "seek": + 47336, "start": 486.28000000000003, "end": 488.68, "text": " Can you tell a bit + more about that?", "tokens": [51010, 1664, 291, 980, 257, 857, 544, 466, 300, 30, + 51130], "temperature": 0.0, "avg_logprob": -0.21103688596769143, "compression_ratio": + 1.6895306859205776, "no_speech_prob": 0.10372499376535416}, {"id": 150, "seek": + 47336, "start": 488.68, "end": 494.76, "text": " Yeah, so I, so I spent the, I think + it was this, sorry, I''m getting my dates wrong.", "tokens": [51130, 865, 11, 370, + 286, 11, 370, 286, 4418, 264, 11, 286, 519, 309, 390, 341, 11, 2597, 11, 286, 478, + 1242, 452, 11691, 2085, 13, 51434], "temperature": 0.0, "avg_logprob": -0.21103688596769143, + "compression_ratio": 1.6895306859205776, "no_speech_prob": 0.10372499376535416}, + {"id": 151, "seek": 47336, "start": 494.76, "end": 495.76, "text": " It''s currently + the fall.", "tokens": [51434, 467, 311, 4362, 264, 2100, 13, 51484], "temperature": + 0.0, "avg_logprob": -0.21103688596769143, "compression_ratio": 1.6895306859205776, + "no_speech_prob": 0.10372499376535416}, {"id": 152, "seek": 47336, "start": 495.76, + "end": 499.96000000000004, "text": " So, I think I spent the summer spring of last + year trying to transition these ideas into", "tokens": [51484, 407, 11, 286, 519, + 286, 4418, 264, 4266, 5587, 295, 1036, 1064, 1382, 281, 6034, 613, 3487, 666, 51694], + "temperature": 0.0, "avg_logprob": -0.21103688596769143, "compression_ratio": 1.6895306859205776, + "no_speech_prob": 0.10372499376535416}, {"id": 153, "seek": 47336, "start": 499.96000000000004, + "end": 500.96000000000004, "text": " text.", "tokens": [51694, 2487, 13, 51744], + "temperature": 0.0, "avg_logprob": -0.21103688596769143, "compression_ratio": 1.6895306859205776, + "no_speech_prob": 0.10372499376535416}, {"id": 154, "seek": 50096, "start": 500.96, + "end": 505.71999999999997, "text": " I did the image 
data augmentation survey in + 2019 where the sentiment was still extremely", "tokens": [50364, 286, 630, 264, + 3256, 1412, 14501, 19631, 8984, 294, 6071, 689, 264, 16149, 390, 920, 4664, 50602], + "temperature": 0.0, "avg_logprob": -0.22838417912872744, "compression_ratio": 1.655688622754491, + "no_speech_prob": 0.025469627231359482}, {"id": 155, "seek": 50096, "start": 505.71999999999997, + "end": 508.08, "text": " hot around GANs, gender, vatricero networks.", "tokens": + [50602, 2368, 926, 460, 1770, 82, 11, 7898, 11, 371, 267, 1341, 2032, 9590, 13, + 50720], "temperature": 0.0, "avg_logprob": -0.22838417912872744, "compression_ratio": + 1.655688622754491, "no_speech_prob": 0.025469627231359482}, {"id": 156, "seek": + 50096, "start": 508.08, "end": 510.96, "text": " Everyone was really excited about + this real fake loss.", "tokens": [50720, 5198, 390, 534, 2919, 466, 341, 957, 7592, + 4470, 13, 50864], "temperature": 0.0, "avg_logprob": -0.22838417912872744, "compression_ratio": + 1.655688622754491, "no_speech_prob": 0.025469627231359482}, {"id": 157, "seek": + 50096, "start": 510.96, "end": 514.52, "text": " We can generate data and then add + that to the data set and then, you know, suddenly we", "tokens": [50864, 492, 393, + 8460, 1412, 293, 550, 909, 300, 281, 264, 1412, 992, 293, 550, 11, 291, 458, 11, + 5800, 321, 51042], "temperature": 0.0, "avg_logprob": -0.22838417912872744, "compression_ratio": + 1.655688622754491, "no_speech_prob": 0.025469627231359482}, {"id": 158, "seek": + 50096, "start": 514.52, "end": 519.4, "text": " have this very broad coverage for + interpolation in our data space.", "tokens": [51042, 362, 341, 588, 4152, 9645, + 337, 44902, 399, 294, 527, 1412, 1901, 13, 51286], "temperature": 0.0, "avg_logprob": + -0.22838417912872744, "compression_ratio": 1.655688622754491, "no_speech_prob": + 0.025469627231359482}, {"id": 159, "seek": 50096, "start": 519.4, "end": 521.64, + "text": " So then I was trying to look into text.", "tokens": 
[51286, 407, 550, + 286, 390, 1382, 281, 574, 666, 2487, 13, 51398], "temperature": 0.0, "avg_logprob": + -0.22838417912872744, "compression_ratio": 1.655688622754491, "no_speech_prob": + 0.025469627231359482}, {"id": 160, "seek": 50096, "start": 521.64, "end": 525.92, + "text": " Text is, I say the key lesson I learned is that it''s harder to be labeled + preserving.", "tokens": [51398, 18643, 307, 11, 286, 584, 264, 2141, 6898, 286, + 3264, 307, 300, 309, 311, 6081, 281, 312, 21335, 33173, 13, 51612], "temperature": + 0.0, "avg_logprob": -0.22838417912872744, "compression_ratio": 1.655688622754491, + "no_speech_prob": 0.025469627231359482}, {"id": 161, "seek": 50096, "start": 525.92, + "end": 530.04, "text": " When you''re forming the X prime Y, it''s less likely that + the Y is going to have that", "tokens": [51612, 1133, 291, 434, 15745, 264, 1783, + 5835, 398, 11, 309, 311, 1570, 3700, 300, 264, 398, 307, 516, 281, 362, 300, 51818], + "temperature": 0.0, "avg_logprob": -0.22838417912872744, "compression_ratio": 1.655688622754491, + "no_speech_prob": 0.025469627231359482}, {"id": 162, "seek": 53004, "start": 530.04, + "end": 536.4399999999999, "text": " same high level class labels as you''re trying + to do things like say, like the starter kit", "tokens": [50364, 912, 1090, 1496, + 1508, 16949, 382, 291, 434, 1382, 281, 360, 721, 411, 584, 11, 411, 264, 22465, + 8260, 50684], "temperature": 0.0, "avg_logprob": -0.20935244208214268, "compression_ratio": + 1.8543046357615893, "no_speech_prob": 0.0050623067654669285}, {"id": 163, "seek": + 53004, "start": 536.4399999999999, "end": 542.4399999999999, "text": " would be + random swapping, random insertion, random deletion, those kind of things.", "tokens": + [50684, 576, 312, 4974, 1693, 10534, 11, 4974, 8969, 313, 11, 4974, 1103, 302, 313, + 11, 729, 733, 295, 721, 13, 50984], "temperature": 0.0, "avg_logprob": -0.20935244208214268, + "compression_ratio": 1.8543046357615893, "no_speech_prob": 
0.0050623067654669285}, + {"id": 164, "seek": 53004, "start": 542.4399999999999, "end": 546.28, "text": " + And then you kind of transition into maybe trying to use a knowledge graph to better + guide", "tokens": [50984, 400, 550, 291, 733, 295, 6034, 666, 1310, 1382, 281, 764, + 257, 3601, 4295, 281, 1101, 5934, 51176], "temperature": 0.0, "avg_logprob": -0.20935244208214268, + "compression_ratio": 1.8543046357615893, "no_speech_prob": 0.0050623067654669285}, + {"id": 165, "seek": 53004, "start": 546.28, "end": 548.68, "text": " the text you''re + replacing.", "tokens": [51176, 264, 2487, 291, 434, 19139, 13, 51296], "temperature": + 0.0, "avg_logprob": -0.20935244208214268, "compression_ratio": 1.8543046357615893, + "no_speech_prob": 0.0050623067654669285}, {"id": 166, "seek": 53004, "start": 548.68, + "end": 552.28, "text": " And then ideas like say mix up where you cut and paste + and glue sentences together.", "tokens": [51296, 400, 550, 3487, 411, 584, 2890, + 493, 689, 291, 1723, 293, 9163, 293, 8998, 16579, 1214, 13, 51476], "temperature": + 0.0, "avg_logprob": -0.20935244208214268, "compression_ratio": 1.8543046357615893, + "no_speech_prob": 0.0050623067654669285}, {"id": 167, "seek": 53004, "start": 552.28, + "end": 554.7199999999999, "text": " I''m not like a huge fan of that, but it''s + kind of interesting.", "tokens": [51476, 286, 478, 406, 411, 257, 2603, 3429, 295, + 300, 11, 457, 309, 311, 733, 295, 1880, 13, 51598], "temperature": 0.0, "avg_logprob": + -0.20935244208214268, "compression_ratio": 1.8543046357615893, "no_speech_prob": + 0.0050623067654669285}, {"id": 168, "seek": 53004, "start": 554.7199999999999, "end": + 555.7199999999999, "text": " Yes.", "tokens": [51598, 1079, 13, 51648], "temperature": + 0.0, "avg_logprob": -0.20935244208214268, "compression_ratio": 1.8543046357615893, + "no_speech_prob": 0.0050623067654669285}, {"id": 169, "seek": 53004, "start": 555.7199999999999, + "end": 556.7199999999999, "text": " Might as well have like 
drop out.", "tokens": + [51648, 23964, 382, 731, 362, 411, 3270, 484, 13, 51698], "temperature": 0.0, "avg_logprob": + -0.20935244208214268, "compression_ratio": 1.8543046357615893, "no_speech_prob": + 0.0050623067654669285}, {"id": 170, "seek": 53004, "start": 556.7199999999999, "end": + 559.7199999999999, "text": " It''s kind of like a, you know, like I don''t think + there''s a lot of intuition in the", "tokens": [51698, 467, 311, 733, 295, 411, + 257, 11, 291, 458, 11, 411, 286, 500, 380, 519, 456, 311, 257, 688, 295, 24002, + 294, 264, 51848], "temperature": 0.0, "avg_logprob": -0.20935244208214268, "compression_ratio": + 1.8543046357615893, "no_speech_prob": 0.0050623067654669285}, {"id": 171, "seek": + 55972, "start": 559.72, "end": 563.44, "text": " data space of why just smashing + them together would work so well.", "tokens": [50364, 1412, 1901, 295, 983, 445, + 43316, 552, 1214, 576, 589, 370, 731, 13, 50550], "temperature": 0.0, "avg_logprob": + -0.18025071070744442, "compression_ratio": 1.8083067092651757, "no_speech_prob": + 0.00034681532997637987}, {"id": 172, "seek": 55972, "start": 563.44, "end": 565.0, + "text": " But it does kind of work.", "tokens": [50550, 583, 309, 775, 733, 295, + 589, 13, 50628], "temperature": 0.0, "avg_logprob": -0.18025071070744442, "compression_ratio": + 1.8083067092651757, "no_speech_prob": 0.00034681532997637987}, {"id": 173, "seek": + 55972, "start": 565.0, "end": 570.0400000000001, "text": " And then, and then I + really like this category of generative data augmentation is obviously", "tokens": + [50628, 400, 550, 11, 293, 550, 286, 534, 411, 341, 7719, 295, 1337, 1166, 1412, + 14501, 19631, 307, 2745, 50880], "temperature": 0.0, "avg_logprob": -0.18025071070744442, + "compression_ratio": 1.8083067092651757, "no_speech_prob": 0.00034681532997637987}, + {"id": 174, "seek": 55972, "start": 570.0400000000001, "end": 571.96, "text": " + mentioning my start in gendered adversarial networks.", "tokens": [50880, 18315, + 
452, 722, 294, 7898, 292, 17641, 44745, 9590, 13, 50976], "temperature": 0.0, "avg_logprob": + -0.18025071070744442, "compression_ratio": 1.8083067092651757, "no_speech_prob": + 0.00034681532997637987}, {"id": 175, "seek": 55972, "start": 571.96, "end": 574.36, + "text": " And this idea that you learn the data distribution.", "tokens": [50976, + 400, 341, 1558, 300, 291, 1466, 264, 1412, 7316, 13, 51096], "temperature": 0.0, + "avg_logprob": -0.18025071070744442, "compression_ratio": 1.8083067092651757, "no_speech_prob": + 0.00034681532997637987}, {"id": 176, "seek": 55972, "start": 574.36, "end": 578.88, + "text": " So you sample from the data distribution to learn classifiers and kind + of classifiers", "tokens": [51096, 407, 291, 6889, 490, 264, 1412, 7316, 281, 1466, + 1508, 23463, 293, 733, 295, 1508, 23463, 51322], "temperature": 0.0, "avg_logprob": + -0.18025071070744442, "compression_ratio": 1.8083067092651757, "no_speech_prob": + 0.00034681532997637987}, {"id": 177, "seek": 55972, "start": 578.88, "end": 584.48, + "text": " being almost like a appendage of the generative model, which is, which + is like what we''re talking", "tokens": [51322, 885, 1920, 411, 257, 34116, 609, + 295, 264, 1337, 1166, 2316, 11, 597, 307, 11, 597, 307, 411, 437, 321, 434, 1417, + 51602], "temperature": 0.0, "avg_logprob": -0.18025071070744442, "compression_ratio": + 1.8083067092651757, "no_speech_prob": 0.00034681532997637987}, {"id": 178, "seek": + 55972, "start": 584.48, "end": 588.4, "text": " about with the modules, the supervised + learning tasks that you append onto the vector search", "tokens": [51602, 466, 365, + 264, 16679, 11, 264, 46533, 2539, 9608, 300, 291, 34116, 3911, 264, 8062, 3164, + 51798], "temperature": 0.0, "avg_logprob": -0.18025071070744442, "compression_ratio": + 1.8083067092651757, "no_speech_prob": 0.00034681532997637987}, {"id": 179, "seek": + 58840, "start": 588.4, "end": 590.36, "text": " engine database.", "tokens": [50364, + 2848, 8149, 13, 
50462], "temperature": 0.0, "avg_logprob": -0.21727121793306792, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, + {"id": 180, "seek": 58840, "start": 590.36, "end": 595.8, "text": " It''s like this + task of having a generative model or say a representative vector space is", "tokens": + [50462, 467, 311, 411, 341, 5633, 295, 1419, 257, 1337, 1166, 2316, 420, 584, 257, + 12424, 8062, 1901, 307, 50734], "temperature": 0.0, "avg_logprob": -0.21727121793306792, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, + {"id": 181, "seek": 58840, "start": 595.8, "end": 599.9599999999999, "text": " kind + of like the real context that built into the supervised learning task.", "tokens": + [50734, 733, 295, 411, 264, 957, 4319, 300, 3094, 666, 264, 46533, 2539, 5633, 13, + 50942], "temperature": 0.0, "avg_logprob": -0.21727121793306792, "compression_ratio": + 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, {"id": 182, "seek": + 58840, "start": 599.9599999999999, "end": 601.4399999999999, "text": " Or at least + that''s the way I see it.", "tokens": [50942, 1610, 412, 1935, 300, 311, 264, 636, + 286, 536, 309, 13, 51016], "temperature": 0.0, "avg_logprob": -0.21727121793306792, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, + {"id": 183, "seek": 58840, "start": 601.4399999999999, "end": 604.8, "text": " And, + you know, maybe anyone can leave a comment if they are have a different idea about", + "tokens": [51016, 400, 11, 291, 458, 11, 1310, 2878, 393, 1856, 257, 2871, 498, + 436, 366, 362, 257, 819, 1558, 466, 51184], "temperature": 0.0, "avg_logprob": -0.21727121793306792, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, + {"id": 184, "seek": 58840, "start": 604.8, "end": 605.8, "text": " that.", "tokens": + [51184, 300, 13, 51234], "temperature": 0.0, "avg_logprob": -0.21727121793306792, + "compression_ratio": 
1.6805555555555556, "no_speech_prob": 0.012114069424569607}, + {"id": 185, "seek": 58840, "start": 605.8, "end": 606.8, "text": " I think it''s + ill aimed.", "tokens": [51234, 286, 519, 309, 311, 3171, 20540, 13, 51284], "temperature": + 0.0, "avg_logprob": -0.21727121793306792, "compression_ratio": 1.6805555555555556, + "no_speech_prob": 0.012114069424569607}, {"id": 186, "seek": 58840, "start": 606.8, + "end": 609.36, "text": " But so that''s kind of how I see those two things integrating.", + "tokens": [51284, 583, 370, 300, 311, 733, 295, 577, 286, 536, 729, 732, 721, 26889, + 13, 51412], "temperature": 0.0, "avg_logprob": -0.21727121793306792, "compression_ratio": + 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, {"id": 187, "seek": + 58840, "start": 609.36, "end": 614.16, "text": " So to connect this back to text, + what we can do is text is we can use things like GPT", "tokens": [51412, 407, 281, + 1745, 341, 646, 281, 2487, 11, 437, 321, 393, 360, 307, 2487, 307, 321, 393, 764, + 721, 411, 26039, 51, 51652], "temperature": 0.0, "avg_logprob": -0.21727121793306792, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.012114069424569607}, + {"id": 188, "seek": 61416, "start": 614.16, "end": 618.8, "text": " three or more + so what they do is you would prompt GPT three.", "tokens": [50364, 1045, 420, 544, + 370, 437, 436, 360, 307, 291, 576, 12391, 26039, 51, 1045, 13, 50596], "temperature": + 0.0, "avg_logprob": -0.1784707886832101, "compression_ratio": 1.8108108108108107, + "no_speech_prob": 0.097793348133564}, {"id": 189, "seek": 61416, "start": 618.8, + "end": 623.6, "text": " So you''d say, you know, please finish this movie review + with a positive sentiment as", "tokens": [50596, 407, 291, 1116, 584, 11, 291, 458, + 11, 1767, 2413, 341, 3169, 3131, 365, 257, 3353, 16149, 382, 50836], "temperature": + 0.0, "avg_logprob": -0.1784707886832101, "compression_ratio": 1.8108108108108107, + "no_speech_prob": 0.097793348133564}, {"id": 
190, "seek": 61416, "start": 623.6, + "end": 624.76, "text": " the prompt.", "tokens": [50836, 264, 12391, 13, 50894], + "temperature": 0.0, "avg_logprob": -0.1784707886832101, "compression_ratio": 1.8108108108108107, + "no_speech_prob": 0.097793348133564}, {"id": 191, "seek": 61416, "start": 624.76, + "end": 627.6, "text": " And then you can just remove whatever you want from the + original data point.", "tokens": [50894, 400, 550, 291, 393, 445, 4159, 2035, 291, + 528, 490, 264, 3380, 1412, 935, 13, 51036], "temperature": 0.0, "avg_logprob": -0.1784707886832101, + "compression_ratio": 1.8108108108108107, "no_speech_prob": 0.097793348133564}, {"id": + 192, "seek": 61416, "start": 627.6, "end": 629.76, "text": " And GPT three can generate + a new movie review.", "tokens": [51036, 400, 26039, 51, 1045, 393, 8460, 257, 777, + 3169, 3131, 13, 51144], "temperature": 0.0, "avg_logprob": -0.1784707886832101, + "compression_ratio": 1.8108108108108107, "no_speech_prob": 0.097793348133564}, {"id": + 193, "seek": 61416, "start": 629.76, "end": 634.04, "text": " And then you can blow + up your data set size, avoid the pitfalls of overfitting and that", "tokens": [51144, + 400, 550, 291, 393, 6327, 493, 428, 1412, 992, 2744, 11, 5042, 264, 10147, 18542, + 295, 670, 69, 2414, 293, 300, 51358], "temperature": 0.0, "avg_logprob": -0.1784707886832101, + "compression_ratio": 1.8108108108108107, "no_speech_prob": 0.097793348133564}, {"id": + 194, "seek": 61416, "start": 634.04, "end": 635.56, "text": " kind of promise of + data augmentation.", "tokens": [51358, 733, 295, 6228, 295, 1412, 14501, 19631, + 13, 51434], "temperature": 0.0, "avg_logprob": -0.1784707886832101, "compression_ratio": + 1.8108108108108107, "no_speech_prob": 0.097793348133564}, {"id": 195, "seek": 61416, + "start": 635.56, "end": 639.36, "text": " So hopefully that kind of answers the + question of how I did this transition from image to", "tokens": [51434, 407, 4696, + 300, 733, 295, 6338, 264, 1168, 295, 577, 
286, 630, 341, 6034, 490, 3256, 281, 51624], + "temperature": 0.0, "avg_logprob": -0.1784707886832101, "compression_ratio": 1.8108108108108107, + "no_speech_prob": 0.097793348133564}, {"id": 196, "seek": 61416, "start": 639.36, + "end": 640.36, "text": " text data augmentation.", "tokens": [51624, 2487, 1412, + 14501, 19631, 13, 51674], "temperature": 0.0, "avg_logprob": -0.1784707886832101, + "compression_ratio": 1.8108108108108107, "no_speech_prob": 0.097793348133564}, {"id": + 197, "seek": 61416, "start": 640.36, "end": 641.36, "text": " Yeah, it does.", "tokens": + [51674, 865, 11, 309, 775, 13, 51724], "temperature": 0.0, "avg_logprob": -0.1784707886832101, + "compression_ratio": 1.8108108108108107, "no_speech_prob": 0.097793348133564}, {"id": + 198, "seek": 64136, "start": 641.36, "end": 646.92, "text": " And I mean, why I''m + asking also is because, you know, you can also treat these two sources", "tokens": + [50364, 400, 286, 914, 11, 983, 286, 478, 3365, 611, 307, 570, 11, 291, 458, 11, + 291, 393, 611, 2387, 613, 732, 7139, 50642], "temperature": 0.0, "avg_logprob": + -0.19987475430523907, "compression_ratio": 1.6778242677824269, "no_speech_prob": + 0.046540241688489914}, {"id": 199, "seek": 64136, "start": 646.92, "end": 651.4, + "text": " of data in like kind of in a joint training task, right?", "tokens": [50642, + 295, 1412, 294, 411, 733, 295, 294, 257, 7225, 3097, 5633, 11, 558, 30, 50866], + "temperature": 0.0, "avg_logprob": -0.19987475430523907, "compression_ratio": 1.6778242677824269, + "no_speech_prob": 0.046540241688489914}, {"id": 200, "seek": 64136, "start": 651.4, + "end": 654.48, "text": " So you can kind of train the joint neural network.", "tokens": + [50866, 407, 291, 393, 733, 295, 3847, 264, 7225, 18161, 3209, 13, 51020], "temperature": + 0.0, "avg_logprob": -0.19987475430523907, "compression_ratio": 1.6778242677824269, + "no_speech_prob": 0.046540241688489914}, {"id": 201, "seek": 64136, "start": 654.48, + "end": 659.44, "text": " 
And for example, when you watch, let''s say watch using + the algorithm, you watch the movie", "tokens": [51020, 400, 337, 1365, 11, 562, + 291, 1159, 11, 718, 311, 584, 1159, 1228, 264, 9284, 11, 291, 1159, 264, 3169, 51268], + "temperature": 0.0, "avg_logprob": -0.19987475430523907, "compression_ratio": 1.6778242677824269, + "no_speech_prob": 0.046540241688489914}, {"id": 202, "seek": 64136, "start": 659.44, + "end": 666.12, "text": " or cartoon and you see some scene where, you know, one + hero is kind of crying.", "tokens": [51268, 420, 18569, 293, 291, 536, 512, 4145, + 689, 11, 291, 458, 11, 472, 5316, 307, 733, 295, 8554, 13, 51602], "temperature": + 0.0, "avg_logprob": -0.19987475430523907, "compression_ratio": 1.6778242677824269, + "no_speech_prob": 0.046540241688489914}, {"id": 203, "seek": 64136, "start": 666.12, + "end": 668.32, "text": " The other one is cheering him up.", "tokens": [51602, 440, + 661, 472, 307, 11060, 796, 493, 13, 51712], "temperature": 0.0, "avg_logprob": -0.19987475430523907, + "compression_ratio": 1.6778242677824269, "no_speech_prob": 0.046540241688489914}, + {"id": 204, "seek": 66832, "start": 668.32, "end": 671.0400000000001, "text": " + You know, now where do you pay attention to?", "tokens": [50364, 509, 458, 11, 586, + 689, 360, 291, 1689, 3202, 281, 30, 50500], "temperature": 0.0, "avg_logprob": -0.2420017791516853, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.12082384526729584}, + {"id": 205, "seek": 66832, "start": 671.0400000000001, "end": 672.36, "text": " + It''s also important, right?", "tokens": [50500, 467, 311, 611, 1021, 11, 558, 30, + 50566], "temperature": 0.0, "avg_logprob": -0.2420017791516853, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.12082384526729584}, {"id": 206, "seek": + 66832, "start": 672.36, "end": 673.7600000000001, "text": " Because it''s the whole + scene.", "tokens": [50566, 1436, 309, 311, 264, 1379, 4145, 13, 50636], "temperature": + 0.0, "avg_logprob": 
-0.2420017791516853, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.12082384526729584}, {"id": 207, "seek": 66832, "start": 673.7600000000001, + "end": 678.32, "text": " Now you need to pay attention, maybe just to that pin on + his neck, you know, that he''s", "tokens": [50636, 823, 291, 643, 281, 1689, 3202, + 11, 1310, 445, 281, 300, 5447, 322, 702, 6189, 11, 291, 458, 11, 300, 415, 311, + 50864], "temperature": 0.0, "avg_logprob": -0.2420017791516853, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.12082384526729584}, {"id": 208, "seek": + 66832, "start": 678.32, "end": 680.12, "text": " not happy about.", "tokens": [50864, + 406, 2055, 466, 13, 50954], "temperature": 0.0, "avg_logprob": -0.2420017791516853, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.12082384526729584}, + {"id": 209, "seek": 66832, "start": 680.12, "end": 681.48, "text": " And you know, + things like that.", "tokens": [50954, 400, 291, 458, 11, 721, 411, 300, 13, 51022], + "temperature": 0.0, "avg_logprob": -0.2420017791516853, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.12082384526729584}, {"id": 210, "seek": 66832, "start": 681.48, + "end": 683.6400000000001, "text": " So have you thought about that as well?", "tokens": + [51022, 407, 362, 291, 1194, 466, 300, 382, 731, 30, 51130], "temperature": 0.0, + "avg_logprob": -0.2420017791516853, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.12082384526729584}, {"id": 211, "seek": 66832, "start": 683.6400000000001, "end": + 686.36, "text": " Or are you still considering them as independent?", "tokens": + [51130, 1610, 366, 291, 920, 8079, 552, 382, 6695, 30, 51266], "temperature": 0.0, + "avg_logprob": -0.2420017791516853, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.12082384526729584}, {"id": 212, "seek": 66832, "start": 686.36, "end": 687.96, + "text": " Yeah, I know.", "tokens": [51266, 865, 11, 286, 458, 13, 51346], "temperature": + 0.0, 
"avg_logprob": -0.2420017791516853, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.12082384526729584}, {"id": 213, "seek": 66832, "start": 687.96, + "end": 689.48, "text": " Yeah, I love that idea.", "tokens": [51346, 865, 11, 286, + 959, 300, 1558, 13, 51422], "temperature": 0.0, "avg_logprob": -0.2420017791516853, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.12082384526729584}, + {"id": 214, "seek": 66832, "start": 689.48, "end": 695.32, "text": " Like I think + what we''re the word that most people are using is multimodal learning.", "tokens": + [51422, 1743, 286, 519, 437, 321, 434, 264, 1349, 300, 881, 561, 366, 1228, 307, + 32972, 378, 304, 2539, 13, 51714], "temperature": 0.0, "avg_logprob": -0.2420017791516853, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.12082384526729584}, + {"id": 215, "seek": 69532, "start": 695.32, "end": 698.2, "text": " And I''d call + that paper multimodal data augmentation.", "tokens": [50364, 400, 286, 1116, 818, + 300, 3035, 32972, 378, 304, 1412, 14501, 19631, 13, 50508], "temperature": 0.0, + "avg_logprob": -0.2481349686444816, "compression_ratio": 1.583629893238434, "no_speech_prob": + 0.052273061126470566}, {"id": 216, "seek": 69532, "start": 698.2, "end": 703.4000000000001, + "text": " And you know, just last night Microsoft released a new 2.5 billion parameter + image text", "tokens": [50508, 400, 291, 458, 11, 445, 1036, 1818, 8116, 4736, 257, + 777, 568, 13, 20, 5218, 13075, 3256, 2487, 50768], "temperature": 0.0, "avg_logprob": + -0.2481349686444816, "compression_ratio": 1.583629893238434, "no_speech_prob": 0.052273061126470566}, + {"id": 217, "seek": 69532, "start": 703.4000000000001, "end": 704.4000000000001, + "text": " embedding space.", "tokens": [50768, 12240, 3584, 1901, 13, 50818], "temperature": + 0.0, "avg_logprob": -0.2481349686444816, "compression_ratio": 1.583629893238434, + "no_speech_prob": 0.052273061126470566}, {"id": 218, "seek": 69532, "start": 
704.4000000000001, + "end": 709.98, "text": " You know, everyone''s knows about OpenAI''s clip image + text spaces and the dolly, the avocado", "tokens": [50818, 509, 458, 11, 1518, 311, + 3255, 466, 7238, 48698, 311, 7353, 3256, 2487, 7673, 293, 264, 2722, 88, 11, 264, + 27041, 51097], "temperature": 0.0, "avg_logprob": -0.2481349686444816, "compression_ratio": + 1.583629893238434, "no_speech_prob": 0.052273061126470566}, {"id": 219, "seek": + 69532, "start": 709.98, "end": 711.7600000000001, "text": " shaved armchair generation.", + "tokens": [51097, 37980, 3726, 17892, 5125, 13, 51186], "temperature": 0.0, "avg_logprob": + -0.2481349686444816, "compression_ratio": 1.583629893238434, "no_speech_prob": 0.052273061126470566}, + {"id": 220, "seek": 69532, "start": 711.7600000000001, "end": 713.8000000000001, + "text": " Everyone likes that.", "tokens": [51186, 5198, 5902, 300, 13, 51288], + "temperature": 0.0, "avg_logprob": -0.2481349686444816, "compression_ratio": 1.583629893238434, + "no_speech_prob": 0.052273061126470566}, {"id": 221, "seek": 69532, "start": 713.8000000000001, + "end": 717.8000000000001, "text": " So yeah, I mean, multimodal learning is so exciting.", + "tokens": [51288, 407, 1338, 11, 286, 914, 11, 32972, 378, 304, 2539, 307, 370, + 4670, 13, 51488], "temperature": 0.0, "avg_logprob": -0.2481349686444816, "compression_ratio": + 1.583629893238434, "no_speech_prob": 0.052273061126470566}, {"id": 222, "seek": + 69532, "start": 717.8000000000001, "end": 723.7600000000001, "text": " Yeah, I''d + say it''s going to be an interesting thing with the computation of it and what kind", + "tokens": [51488, 865, 11, 286, 1116, 584, 309, 311, 516, 281, 312, 364, 1880, 551, + 365, 264, 24903, 295, 309, 293, 437, 733, 51786], "temperature": 0.0, "avg_logprob": + -0.2481349686444816, "compression_ratio": 1.583629893238434, "no_speech_prob": 0.052273061126470566}, + {"id": 223, "seek": 72376, "start": 724.76, "end": 727.68, "text": " of in what + the computation 
requires, we''re setting up these kind of tests.", "tokens": [50414, + 295, 294, 437, 264, 24903, 7029, 11, 321, 434, 3287, 493, 613, 733, 295, 6921, 13, + 50560], "temperature": 0.0, "avg_logprob": -0.22855460475867903, "compression_ratio": + 1.7085889570552146, "no_speech_prob": 0.009539502672851086}, {"id": 224, "seek": + 72376, "start": 727.68, "end": 731.3199999999999, "text": " I''d say, especially + with video data, like you just mentioned, I, you know, I wouldn''t", "tokens": [50560, + 286, 1116, 584, 11, 2318, 365, 960, 1412, 11, 411, 291, 445, 2835, 11, 286, 11, + 291, 458, 11, 286, 2759, 380, 50742], "temperature": 0.0, "avg_logprob": -0.22855460475867903, + "compression_ratio": 1.7085889570552146, "no_speech_prob": 0.009539502672851086}, + {"id": 225, "seek": 72376, "start": 731.3199999999999, "end": 735.4, "text": " really + want to play around with video data with my collab Google Drive workflow that I + mentioned", "tokens": [50742, 534, 528, 281, 862, 926, 365, 960, 1412, 365, 452, + 44228, 3329, 15622, 20993, 300, 286, 2835, 50946], "temperature": 0.0, "avg_logprob": + -0.22855460475867903, "compression_ratio": 1.7085889570552146, "no_speech_prob": + 0.009539502672851086}, {"id": 226, "seek": 72376, "start": 735.4, "end": 736.4, + "text": " earlier.", "tokens": [50946, 3071, 13, 50996], "temperature": 0.0, "avg_logprob": + -0.22855460475867903, "compression_ratio": 1.7085889570552146, "no_speech_prob": + 0.009539502672851086}, {"id": 227, "seek": 72376, "start": 736.4, "end": 737.4, + "text": " Yeah.", "tokens": [50996, 865, 13, 51046], "temperature": 0.0, "avg_logprob": + -0.22855460475867903, "compression_ratio": 1.7085889570552146, "no_speech_prob": + 0.009539502672851086}, {"id": 228, "seek": 72376, "start": 737.4, "end": 738.4, + "text": " Yeah.", "tokens": [51046, 865, 13, 51096], "temperature": 0.0, "avg_logprob": + -0.22855460475867903, "compression_ratio": 1.7085889570552146, "no_speech_prob": + 0.009539502672851086}, {"id": 229, "seek": 
72376, "start": 738.4, "end": 743.0, + "text": " But it''s interesting also that big players, like you mentioned Microsoft + and I mean, others,", "tokens": [51096, 583, 309, 311, 1880, 611, 300, 955, 4150, + 11, 411, 291, 2835, 8116, 293, 286, 914, 11, 2357, 11, 51326], "temperature": 0.0, + "avg_logprob": -0.22855460475867903, "compression_ratio": 1.7085889570552146, "no_speech_prob": + 0.009539502672851086}, {"id": 230, "seek": 72376, "start": 743.0, "end": 746.92, + "text": " they''re moving in direction of increasing number of parameters in the + model.", "tokens": [51326, 436, 434, 2684, 294, 3513, 295, 5662, 1230, 295, 9834, + 294, 264, 2316, 13, 51522], "temperature": 0.0, "avg_logprob": -0.22855460475867903, + "compression_ratio": 1.7085889570552146, "no_speech_prob": 0.009539502672851086}, + {"id": 231, "seek": 72376, "start": 746.92, "end": 751.16, "text": " But when you + go to practice and you need to build a classifier, you know, you don''t have", "tokens": + [51522, 583, 562, 291, 352, 281, 3124, 293, 291, 643, 281, 1322, 257, 1508, 9902, + 11, 291, 458, 11, 291, 500, 380, 362, 51734], "temperature": 0.0, "avg_logprob": + -0.22855460475867903, "compression_ratio": 1.7085889570552146, "no_speech_prob": + 0.009539502672851086}, {"id": 232, "seek": 72376, "start": 751.16, "end": 752.68, + "text": " that much capacity.", "tokens": [51734, 300, 709, 6042, 13, 51810], "temperature": + 0.0, "avg_logprob": -0.22855460475867903, "compression_ratio": 1.7085889570552146, + "no_speech_prob": 0.009539502672851086}, {"id": 233, "seek": 75268, "start": 752.68, + "end": 757.64, "text": " Like you don''t want to spend that much capacity really + unless you''re building like a terminator", "tokens": [50364, 1743, 291, 500, 380, + 528, 281, 3496, 300, 709, 6042, 534, 5969, 291, 434, 2390, 411, 257, 10761, 1639, + 50612], "temperature": 0.0, "avg_logprob": -0.23132730141664162, "compression_ratio": + 1.6285714285714286, "no_speech_prob": 0.016156353056430817}, {"id": 
234, "seek": + 75268, "start": 757.64, "end": 761.3199999999999, "text": " level AI, which will + handle all tasks that you have.", "tokens": [50612, 1496, 7318, 11, 597, 486, 4813, + 439, 9608, 300, 291, 362, 13, 50796], "temperature": 0.0, "avg_logprob": -0.23132730141664162, + "compression_ratio": 1.6285714285714286, "no_speech_prob": 0.016156353056430817}, + {"id": 235, "seek": 75268, "start": 761.3199999999999, "end": 764.7199999999999, + "text": " But probably you won''t do that because it''s still not there.", "tokens": + [50796, 583, 1391, 291, 1582, 380, 360, 300, 570, 309, 311, 920, 406, 456, 13, 50966], + "temperature": 0.0, "avg_logprob": -0.23132730141664162, "compression_ratio": 1.6285714285714286, + "no_speech_prob": 0.016156353056430817}, {"id": 236, "seek": 75268, "start": 764.7199999999999, + "end": 769.64, "text": " So do you also think about that kind of the practical element + or are you still kind of", "tokens": [50966, 407, 360, 291, 611, 519, 466, 300, + 733, 295, 264, 8496, 4478, 420, 366, 291, 920, 733, 295, 51212], "temperature": + 0.0, "avg_logprob": -0.23132730141664162, "compression_ratio": 1.6285714285714286, + "no_speech_prob": 0.016156353056430817}, {"id": 237, "seek": 75268, "start": 769.64, + "end": 772.88, "text": " fencing the beauty of these complex models?", "tokens": + [51212, 283, 13644, 264, 6643, 295, 613, 3997, 5245, 30, 51374], "temperature": + 0.0, "avg_logprob": -0.23132730141664162, "compression_ratio": 1.6285714285714286, + "no_speech_prob": 0.016156353056430817}, {"id": 238, "seek": 75268, "start": 772.88, + "end": 775.3599999999999, "text": " Well, wait, do you see that?", "tokens": [51374, + 1042, 11, 1699, 11, 360, 291, 536, 300, 30, 51498], "temperature": 0.0, "avg_logprob": + -0.23132730141664162, "compression_ratio": 1.6285714285714286, "no_speech_prob": + 0.016156353056430817}, {"id": 239, "seek": 75268, "start": 775.3599999999999, "end": + 781.16, "text": " Yeah, well, I''ll stake my flag in the same campus, 
the foundation + models, researchers,", "tokens": [51498, 865, 11, 731, 11, 286, 603, 10407, 452, + 7166, 294, 264, 912, 4828, 11, 264, 7030, 5245, 11, 10309, 11, 51788], "temperature": + 0.0, "avg_logprob": -0.23132730141664162, "compression_ratio": 1.6285714285714286, + "no_speech_prob": 0.016156353056430817}, {"id": 240, "seek": 78116, "start": 781.16, + "end": 783.1999999999999, "text": " and I think it was mostly Stanford.", "tokens": + [50364, 293, 286, 519, 309, 390, 5240, 20374, 13, 50466], "temperature": 0.0, "avg_logprob": + -0.2473928145779908, "compression_ratio": 1.6277602523659307, "no_speech_prob": + 0.015554750338196754}, {"id": 241, "seek": 78116, "start": 783.1999999999999, "end": + 787.76, "text": " They published this paper titled on the opportunities and risk + of foundation models, some title like", "tokens": [50466, 814, 6572, 341, 3035, + 19841, 322, 264, 4786, 293, 3148, 295, 7030, 5245, 11, 512, 4876, 411, 50694], "temperature": + 0.0, "avg_logprob": -0.2473928145779908, "compression_ratio": 1.6277602523659307, + "no_speech_prob": 0.015554750338196754}, {"id": 242, "seek": 78116, "start": 787.76, + "end": 788.76, "text": " that.", "tokens": [50694, 300, 13, 50744], "temperature": + 0.0, "avg_logprob": -0.2473928145779908, "compression_ratio": 1.6277602523659307, + "no_speech_prob": 0.015554750338196754}, {"id": 243, "seek": 78116, "start": 788.76, + "end": 789.76, "text": " I''m sorry, it''s not exactly correct.", "tokens": [50744, + 286, 478, 2597, 11, 309, 311, 406, 2293, 3006, 13, 50794], "temperature": 0.0, "avg_logprob": + -0.2473928145779908, "compression_ratio": 1.6277602523659307, "no_speech_prob": + 0.015554750338196754}, {"id": 244, "seek": 78116, "start": 789.76, "end": 794.4, + "text": " But, you know, this kind of ideology that big companies like Microsoft + and Vitya Google", "tokens": [50794, 583, 11, 291, 458, 11, 341, 733, 295, 23101, + 300, 955, 3431, 411, 8116, 293, 691, 507, 64, 3329, 51026], "temperature": 0.0, + 
"avg_logprob": -0.2473928145779908, "compression_ratio": 1.6277602523659307, "no_speech_prob": + 0.015554750338196754}, {"id": 245, "seek": 78116, "start": 794.4, "end": 797.24, + "text": " Facebook, they''ll build these big, big models.", "tokens": [51026, 4384, + 11, 436, 603, 1322, 613, 955, 11, 955, 5245, 13, 51168], "temperature": 0.0, "avg_logprob": + -0.2473928145779908, "compression_ratio": 1.6277602523659307, "no_speech_prob": + 0.015554750338196754}, {"id": 246, "seek": 78116, "start": 797.24, "end": 800.0799999999999, + "text": " And then what we''ll do is we''ll use this knowledge distillation interface + to compress", "tokens": [51168, 400, 550, 437, 321, 603, 360, 307, 321, 603, 764, + 341, 3601, 42923, 399, 9226, 281, 14778, 51310], "temperature": 0.0, "avg_logprob": + -0.2473928145779908, "compression_ratio": 1.6277602523659307, "no_speech_prob": + 0.015554750338196754}, {"id": 247, "seek": 78116, "start": 800.0799999999999, "end": + 802.16, "text": " it into practical use cases.", "tokens": [51310, 309, 666, 8496, + 764, 3331, 13, 51414], "temperature": 0.0, "avg_logprob": -0.2473928145779908, "compression_ratio": + 1.6277602523659307, "no_speech_prob": 0.015554750338196754}, {"id": 248, "seek": + 78116, "start": 802.16, "end": 807.12, "text": " And so we''ve seen, I''d say this + started with Colin Raffle and the people who worked on", "tokens": [51414, 400, + 370, 321, 600, 1612, 11, 286, 1116, 584, 341, 1409, 365, 29253, 497, 29264, 293, + 264, 561, 567, 2732, 322, 51662], "temperature": 0.0, "avg_logprob": -0.2473928145779908, + "compression_ratio": 1.6277602523659307, "no_speech_prob": 0.015554750338196754}, + {"id": 249, "seek": 80712, "start": 807.12, "end": 811.88, "text": " my paper with + the text to text transfer transform of the T5 model, not showed how you could", + "tokens": [50364, 452, 3035, 365, 264, 2487, 281, 2487, 5003, 4088, 295, 264, 314, + 20, 2316, 11, 406, 4712, 577, 291, 727, 50602], "temperature": 0.0, "avg_logprob": + 
-0.22747027079264323, "compression_ratio": 1.910344827586207, "no_speech_prob": + 0.005559222307056189}, {"id": 250, "seek": 80712, "start": 811.88, "end": 816.88, + "text": " unify all text supervised learning tasks through the same kind of language + modeling style", "tokens": [50602, 517, 2505, 439, 2487, 46533, 2539, 9608, 807, + 264, 912, 733, 295, 2856, 15983, 3758, 50852], "temperature": 0.0, "avg_logprob": + -0.22747027079264323, "compression_ratio": 1.910344827586207, "no_speech_prob": + 0.005559222307056189}, {"id": 251, "seek": 80712, "start": 816.88, "end": 817.88, + "text": " interface.", "tokens": [50852, 9226, 13, 50902], "temperature": 0.0, "avg_logprob": + -0.22747027079264323, "compression_ratio": 1.910344827586207, "no_speech_prob": + 0.005559222307056189}, {"id": 252, "seek": 80712, "start": 817.88, "end": 821.08, + "text": " You just prompt it with, you know, natural language inference and then + you give it the", "tokens": [50902, 509, 445, 12391, 309, 365, 11, 291, 458, 11, + 3303, 2856, 38253, 293, 550, 291, 976, 309, 264, 51062], "temperature": 0.0, "avg_logprob": + -0.22747027079264323, "compression_ratio": 1.910344827586207, "no_speech_prob": + 0.005559222307056189}, {"id": 253, "seek": 80712, "start": 821.08, "end": 825.36, + "text": " input or you say, answer this question, give it the input or you say, + re-rank these documents", "tokens": [51062, 4846, 420, 291, 584, 11, 1867, 341, + 1168, 11, 976, 309, 264, 4846, 420, 291, 584, 11, 319, 12, 20479, 613, 8512, 51276], + "temperature": 0.0, "avg_logprob": -0.22747027079264323, "compression_ratio": 1.910344827586207, + "no_speech_prob": 0.005559222307056189}, {"id": 254, "seek": 80712, "start": 825.36, + "end": 826.36, "text": " and give them the document.", "tokens": [51276, 293, 976, + 552, 264, 4166, 13, 51326], "temperature": 0.0, "avg_logprob": -0.22747027079264323, + "compression_ratio": 1.910344827586207, "no_speech_prob": 0.005559222307056189}, + {"id": 255, "seek": 80712, 
"start": 826.36, "end": 829.6, "text": " So it''s the + same interface for every supervised learning task.", "tokens": [51326, 407, 309, + 311, 264, 912, 9226, 337, 633, 46533, 2539, 5633, 13, 51488], "temperature": 0.0, + "avg_logprob": -0.22747027079264323, "compression_ratio": 1.910344827586207, "no_speech_prob": + 0.005559222307056189}, {"id": 256, "seek": 80712, "start": 829.6, "end": 835.44, + "text": " So yeah, I''m, and then just one more thing to kind of put in the citation + context is this", "tokens": [51488, 407, 1338, 11, 286, 478, 11, 293, 550, 445, + 472, 544, 551, 281, 733, 295, 829, 294, 264, 45590, 4319, 307, 341, 51780], "temperature": + 0.0, "avg_logprob": -0.22747027079264323, "compression_ratio": 1.910344827586207, + "no_speech_prob": 0.005559222307056189}, {"id": 257, "seek": 83544, "start": 835.44, + "end": 839.6, "text": " general purpose, like, opening iClip and it looks like Microsoft, + I think they''re calling", "tokens": [50364, 2674, 4334, 11, 411, 11, 5193, 741, + 9966, 647, 293, 309, 1542, 411, 8116, 11, 286, 519, 436, 434, 5141, 50572], "temperature": + 0.0, "avg_logprob": -0.2112394650777181, "compression_ratio": 1.726384364820847, + "no_speech_prob": 0.0013060539495199919}, {"id": 258, "seek": 83544, "start": 839.6, + "end": 841.96, "text": " it bletchly or something like that.", "tokens": [50572, + 309, 888, 7858, 356, 420, 746, 411, 300, 13, 50690], "temperature": 0.0, "avg_logprob": + -0.2112394650777181, "compression_ratio": 1.726384364820847, "no_speech_prob": 0.0013060539495199919}, + {"id": 259, "seek": 83544, "start": 841.96, "end": 848.08, "text": " But this idea + of just having two vector embedding spaces and then using the contrast of alignment", + "tokens": [50690, 583, 341, 1558, 295, 445, 1419, 732, 8062, 12240, 3584, 7673, + 293, 550, 1228, 264, 8712, 295, 18515, 50996], "temperature": 0.0, "avg_logprob": + -0.2112394650777181, "compression_ratio": 1.726384364820847, "no_speech_prob": 0.0013060539495199919}, + 
{"id": 260, "seek": 83544, "start": 848.08, "end": 851.6400000000001, "text": " + as the general interface for any kind of task, because as we mentioned, you can + put any", "tokens": [50996, 382, 264, 2674, 9226, 337, 604, 733, 295, 5633, 11, + 570, 382, 321, 2835, 11, 291, 393, 829, 604, 51174], "temperature": 0.0, "avg_logprob": + -0.2112394650777181, "compression_ratio": 1.726384364820847, "no_speech_prob": 0.0013060539495199919}, + {"id": 261, "seek": 83544, "start": 851.6400000000001, "end": 855.6, "text": " task + into natural language, any task that you''re going to do with supervised learning + could", "tokens": [51174, 5633, 666, 3303, 2856, 11, 604, 5633, 300, 291, 434, 516, + 281, 360, 365, 46533, 2539, 727, 51372], "temperature": 0.0, "avg_logprob": -0.2112394650777181, + "compression_ratio": 1.726384364820847, "no_speech_prob": 0.0013060539495199919}, + {"id": 262, "seek": 83544, "start": 855.6, "end": 857.08, "text": " be described + with natural language.", "tokens": [51372, 312, 7619, 365, 3303, 2856, 13, 51446], + "temperature": 0.0, "avg_logprob": -0.2112394650777181, "compression_ratio": 1.726384364820847, + "no_speech_prob": 0.0013060539495199919}, {"id": 263, "seek": 83544, "start": 857.08, + "end": 861.32, "text": " So you have that kind of interface and the Allen Institute + has another architecture called", "tokens": [51446, 407, 291, 362, 300, 733, 295, + 9226, 293, 264, 17160, 9446, 575, 1071, 9482, 1219, 51658], "temperature": 0.0, + "avg_logprob": -0.2112394650777181, "compression_ratio": 1.726384364820847, "no_speech_prob": + 0.0013060539495199919}, {"id": 264, "seek": 86132, "start": 861.32, "end": 865.96, + "text": " general purpose vision systems that, you know, unifies all these tasks, + object detection,", "tokens": [50364, 2674, 4334, 5201, 3652, 300, 11, 291, 458, + 11, 517, 11221, 439, 613, 9608, 11, 2657, 17784, 11, 50596], "temperature": 0.0, + "avg_logprob": -0.1865214226951062, "compression_ratio": 1.859375, 
"no_speech_prob": + 0.002084535313770175}, {"id": 265, "seek": 86132, "start": 865.96, "end": 870.4000000000001, + "text": " semantic segmentation, service, normal estimation, all these kind of ideas + are unifying one architecture", "tokens": [50596, 47982, 9469, 399, 11, 2643, 11, + 2710, 35701, 11, 439, 613, 733, 295, 3487, 366, 517, 5489, 472, 9482, 50818], "temperature": + 0.0, "avg_logprob": -0.1865214226951062, "compression_ratio": 1.859375, "no_speech_prob": + 0.002084535313770175}, {"id": 266, "seek": 86132, "start": 870.4000000000001, "end": + 871.4000000000001, "text": " interface.", "tokens": [50818, 9226, 13, 50868], "temperature": + 0.0, "avg_logprob": -0.1865214226951062, "compression_ratio": 1.859375, "no_speech_prob": + 0.002084535313770175}, {"id": 267, "seek": 86132, "start": 871.4000000000001, "end": + 875.24, "text": " So to kind of wrap up my answer to the question, I think it''s + going to be Microsoft and", "tokens": [50868, 407, 281, 733, 295, 7019, 493, 452, + 1867, 281, 264, 1168, 11, 286, 519, 309, 311, 516, 281, 312, 8116, 293, 51060], + "temperature": 0.0, "avg_logprob": -0.1865214226951062, "compression_ratio": 1.859375, + "no_speech_prob": 0.002084535313770175}, {"id": 268, "seek": 86132, "start": 875.24, + "end": 877.6, "text": " them scaling up like crazy.", "tokens": [51060, 552, 21589, + 493, 411, 3219, 13, 51178], "temperature": 0.0, "avg_logprob": -0.1865214226951062, + "compression_ratio": 1.859375, "no_speech_prob": 0.002084535313770175}, {"id": 269, + "seek": 86132, "start": 877.6, "end": 880.44, "text": " Maybe they''re going to + run it out of internet scale data eventually.", "tokens": [51178, 2704, 436, 434, + 516, 281, 1190, 309, 484, 295, 4705, 4373, 1412, 4728, 13, 51320], "temperature": + 0.0, "avg_logprob": -0.1865214226951062, "compression_ratio": 1.859375, "no_speech_prob": + 0.002084535313770175}, {"id": 270, "seek": 86132, "start": 880.44, "end": 884.2800000000001, + "text": " I think Microsoft has said that 
they can train like a 32 trillion parameter + model if they", "tokens": [51320, 286, 519, 8116, 575, 848, 300, 436, 393, 3847, + 411, 257, 8858, 18723, 13075, 2316, 498, 436, 51512], "temperature": 0.0, "avg_logprob": + -0.1865214226951062, "compression_ratio": 1.859375, "no_speech_prob": 0.002084535313770175}, + {"id": 271, "seek": 86132, "start": 884.2800000000001, "end": 886.08, "text": " + were motivated to do so.", "tokens": [51512, 645, 14515, 281, 360, 370, 13, 51602], + "temperature": 0.0, "avg_logprob": -0.1865214226951062, "compression_ratio": 1.859375, + "no_speech_prob": 0.002084535313770175}, {"id": 272, "seek": 86132, "start": 886.08, + "end": 890.2800000000001, "text": " So I think they''re going to run out of internet + scale data and then the data augmentation", "tokens": [51602, 407, 286, 519, 436, + 434, 516, 281, 1190, 484, 295, 4705, 4373, 1412, 293, 550, 264, 1412, 14501, 19631, + 51812], "temperature": 0.0, "avg_logprob": -0.1865214226951062, "compression_ratio": + 1.859375, "no_speech_prob": 0.002084535313770175}, {"id": 273, "seek": 89028, "start": + 890.28, "end": 893.8399999999999, "text": " will be the next step from going from + say like the 400 million image taxpayers that are", "tokens": [50364, 486, 312, + 264, 958, 1823, 490, 516, 490, 584, 411, 264, 8423, 2459, 3256, 38205, 300, 366, + 50542], "temperature": 0.0, "avg_logprob": -0.19603939545460236, "compression_ratio": + 1.6232394366197183, "no_speech_prob": 0.0009415590320713818}, {"id": 274, "seek": + 89028, "start": 893.8399999999999, "end": 899.8399999999999, "text": " now open + sourced or Luther AI has the pile, which is like 800 gigabytes of raw text if you", + "tokens": [50542, 586, 1269, 11006, 1232, 420, 20693, 7318, 575, 264, 14375, 11, + 597, 307, 411, 13083, 42741, 295, 8936, 2487, 498, 291, 50842], "temperature": 0.0, + "avg_logprob": -0.19603939545460236, "compression_ratio": 1.6232394366197183, "no_speech_prob": + 0.0009415590320713818}, {"id": 275, "seek": 89028, 
"start": 899.8399999999999, "end": + 902.76, "text": " want to do something with that.", "tokens": [50842, 528, 281, + 360, 746, 365, 300, 13, 50988], "temperature": 0.0, "avg_logprob": -0.19603939545460236, + "compression_ratio": 1.6232394366197183, "no_speech_prob": 0.0009415590320713818}, + {"id": 276, "seek": 89028, "start": 902.76, "end": 907.1999999999999, "text": " + So I think eventually as you go into the 32 trillion parameter and on, they''re + going to", "tokens": [50988, 407, 286, 519, 4728, 382, 291, 352, 666, 264, 8858, + 18723, 13075, 293, 322, 11, 436, 434, 516, 281, 51210], "temperature": 0.0, "avg_logprob": + -0.19603939545460236, "compression_ratio": 1.6232394366197183, "no_speech_prob": + 0.0009415590320713818}, {"id": 277, "seek": 89028, "start": 907.1999999999999, "end": + 912.12, "text": " use data augmentation to have these inductive biases about how + we can keep scaling the", "tokens": [51210, 764, 1412, 14501, 19631, 281, 362, 613, + 31612, 488, 32152, 466, 577, 321, 393, 1066, 21589, 264, 51456], "temperature": + 0.0, "avg_logprob": -0.19603939545460236, "compression_ratio": 1.6232394366197183, + "no_speech_prob": 0.0009415590320713818}, {"id": 278, "seek": 89028, "start": 912.12, + "end": 913.52, "text": " data side of it.", "tokens": [51456, 1412, 1252, 295, 309, + 13, 51526], "temperature": 0.0, "avg_logprob": -0.19603939545460236, "compression_ratio": + 1.6232394366197183, "no_speech_prob": 0.0009415590320713818}, {"id": 279, "seek": + 89028, "start": 913.52, "end": 917.36, "text": " So yeah, so I think they can scale + the models for a while.", "tokens": [51526, 407, 1338, 11, 370, 286, 519, 436, 393, + 4373, 264, 5245, 337, 257, 1339, 13, 51718], "temperature": 0.0, "avg_logprob": + -0.19603939545460236, "compression_ratio": 1.6232394366197183, "no_speech_prob": + 0.0009415590320713818}, {"id": 280, "seek": 91736, "start": 917.48, "end": 921.5600000000001, + "text": " Yeah, I guess they probably they are doing an amazing job, but 
like they + are probably", "tokens": [50370, 865, 11, 286, 2041, 436, 1391, 436, 366, 884, 364, + 2243, 1691, 11, 457, 411, 436, 366, 1391, 50574], "temperature": 0.0, "avg_logprob": + -0.22442102432250977, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.031633563339710236}, {"id": 281, "seek": 91736, "start": 921.5600000000001, "end": + 926.36, "text": " still writing the horse of what Peter Norby called the unreasonable + effectiveness of data,", "tokens": [50574, 920, 3579, 264, 6832, 295, 437, 6508, + 6966, 2322, 1219, 264, 41730, 21208, 295, 1412, 11, 50814], "temperature": 0.0, + "avg_logprob": -0.22442102432250977, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.031633563339710236}, {"id": 282, "seek": 91736, "start": 926.36, "end": 927.36, + "text": " right?", "tokens": [50814, 558, 30, 50864], "temperature": 0.0, "avg_logprob": + -0.22442102432250977, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.031633563339710236}, {"id": 283, "seek": 91736, "start": 927.36, "end": 933.04, + "text": " So like your algorithm might not be kind of as as nuanced as your data + is and so just", "tokens": [50864, 407, 411, 428, 9284, 1062, 406, 312, 733, 295, + 382, 382, 45115, 382, 428, 1412, 307, 293, 370, 445, 51148], "temperature": 0.0, + "avg_logprob": -0.22442102432250977, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.031633563339710236}, {"id": 284, "seek": 91736, "start": 933.04, "end": 937.6800000000001, + "text": " give it to the machine learning algorithm as much as possible and then + kind of it will", "tokens": [51148, 976, 309, 281, 264, 3479, 2539, 9284, 382, 709, + 382, 1944, 293, 550, 733, 295, 309, 486, 51380], "temperature": 0.0, "avg_logprob": + -0.22442102432250977, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.031633563339710236}, {"id": 285, "seek": 91736, "start": 937.6800000000001, "end": + 938.92, "text": " learn, right?", "tokens": [51380, 1466, 11, 558, 30, 51442], 
"temperature": + 0.0, "avg_logprob": -0.22442102432250977, "compression_ratio": 1.6912280701754385, + "no_speech_prob": 0.031633563339710236}, {"id": 286, "seek": 91736, "start": 938.92, + "end": 941.88, "text": " But you know, like in practical situations, this is what + I alluded to.", "tokens": [51442, 583, 291, 458, 11, 411, 294, 8496, 6851, 11, 341, + 307, 437, 286, 33919, 281, 13, 51590], "temperature": 0.0, "avg_logprob": -0.22442102432250977, + "compression_ratio": 1.6912280701754385, "no_speech_prob": 0.031633563339710236}, + {"id": 287, "seek": 91736, "start": 941.88, "end": 944.16, "text": " Like you just + don''t have that much data.", "tokens": [51590, 1743, 291, 445, 500, 380, 362, 300, + 709, 1412, 13, 51704], "temperature": 0.0, "avg_logprob": -0.22442102432250977, + "compression_ratio": 1.6912280701754385, "no_speech_prob": 0.031633563339710236}, + {"id": 288, "seek": 94416, "start": 944.16, "end": 948.28, "text": " On the other + hand, you don''t want you don''t have that much choice and you also mentioned", + "tokens": [50364, 1282, 264, 661, 1011, 11, 291, 500, 380, 528, 291, 500, 380, 362, + 300, 709, 3922, 293, 291, 611, 2835, 50570], "temperature": 0.0, "avg_logprob": + -0.21474220569317157, "compression_ratio": 1.7162162162162162, "no_speech_prob": + 0.041352272033691406}, {"id": 289, "seek": 94416, "start": 948.28, "end": 949.28, + "text": " this.", "tokens": [50570, 341, 13, 50620], "temperature": 0.0, "avg_logprob": + -0.21474220569317157, "compression_ratio": 1.7162162162162162, "no_speech_prob": + 0.041352272033691406}, {"id": 290, "seek": 94416, "start": 949.28, "end": 952.68, + "text": " This is a very interesting topic of data augmentation in text because + in images, you can do like", "tokens": [50620, 639, 307, 257, 588, 1880, 4829, 295, + 1412, 14501, 19631, 294, 2487, 570, 294, 5267, 11, 291, 393, 360, 411, 50790], "temperature": + 0.0, "avg_logprob": -0.21474220569317157, "compression_ratio": 1.7162162162162162, + 
"no_speech_prob": 0.041352272033691406}, {"id": 291, "seek": 94416, "start": 952.68, + "end": 955.48, "text": " cropping rotation and huge changes and whatnot.", "tokens": + [50790, 4848, 3759, 12447, 293, 2603, 2962, 293, 25882, 13, 50930], "temperature": + 0.0, "avg_logprob": -0.21474220569317157, "compression_ratio": 1.7162162162162162, + "no_speech_prob": 0.041352272033691406}, {"id": 292, "seek": 94416, "start": 955.48, + "end": 958.28, "text": " In text, you can do that like so easily.", "tokens": [50930, + 682, 2487, 11, 291, 393, 360, 300, 411, 370, 3612, 13, 51070], "temperature": 0.0, + "avg_logprob": -0.21474220569317157, "compression_ratio": 1.7162162162162162, "no_speech_prob": + 0.041352272033691406}, {"id": 293, "seek": 94416, "start": 958.28, "end": 962.16, + "text": " For example, if you say you have a sentence London is the capital of Great + Britain, you cannot", "tokens": [51070, 1171, 1365, 11, 498, 291, 584, 291, 362, + 257, 8174, 7042, 307, 264, 4238, 295, 3769, 12960, 11, 291, 2644, 51264], "temperature": + 0.0, "avg_logprob": -0.21474220569317157, "compression_ratio": 1.7162162162162162, + "no_speech_prob": 0.041352272033691406}, {"id": 294, "seek": 94416, "start": 962.16, + "end": 963.4, "text": " put Barcelona there.", "tokens": [51264, 829, 21247, 456, + 13, 51326], "temperature": 0.0, "avg_logprob": -0.21474220569317157, "compression_ratio": + 1.7162162162162162, "no_speech_prob": 0.041352272033691406}, {"id": 295, "seek": + 94416, "start": 963.4, "end": 965.36, "text": " It will not make sense.", "tokens": + [51326, 467, 486, 406, 652, 2020, 13, 51424], "temperature": 0.0, "avg_logprob": + -0.21474220569317157, "compression_ratio": 1.7162162162162162, "no_speech_prob": + 0.041352272033691406}, {"id": 296, "seek": 94416, "start": 965.36, "end": 970.12, + "text": " So, you know, but like you can still find another example where you could + probably swap", "tokens": [51424, 407, 11, 291, 458, 11, 457, 411, 291, 393, 920, + 915, 1071, 
1365, 689, 291, 727, 1391, 18135, 51662], "temperature": 0.0, "avg_logprob": + -0.21474220569317157, "compression_ratio": 1.7162162162162162, "no_speech_prob": + 0.041352272033691406}, {"id": 297, "seek": 97012, "start": 970.12, "end": 974.32, + "text": " cities and that''s how you build, you know, the augmentation.", "tokens": + [50364, 6486, 293, 300, 311, 577, 291, 1322, 11, 291, 458, 11, 264, 14501, 19631, + 13, 50574], "temperature": 0.0, "avg_logprob": -0.19449284293434838, "compression_ratio": + 1.7529411764705882, "no_speech_prob": 0.004695868119597435}, {"id": 298, "seek": + 97012, "start": 974.32, "end": 975.72, "text": " But then there are other things.", + "tokens": [50574, 583, 550, 456, 366, 661, 721, 13, 50644], "temperature": 0.0, + "avg_logprob": -0.19449284293434838, "compression_ratio": 1.7529411764705882, "no_speech_prob": + 0.004695868119597435}, {"id": 299, "seek": 97012, "start": 975.72, "end": 980.68, + "text": " For example, if you take machine translation, you know, it suffers from + hallucination problem.", "tokens": [50644, 1171, 1365, 11, 498, 291, 747, 3479, + 12853, 11, 291, 458, 11, 309, 33776, 490, 35212, 2486, 1154, 13, 50892], "temperature": + 0.0, "avg_logprob": -0.19449284293434838, "compression_ratio": 1.7529411764705882, + "no_speech_prob": 0.004695868119597435}, {"id": 300, "seek": 97012, "start": 980.68, + "end": 985.88, "text": " I don''t know if you heard about it, but like if you have + certain like distortion in your", "tokens": [50892, 286, 500, 380, 458, 498, 291, + 2198, 466, 309, 11, 457, 411, 498, 291, 362, 1629, 411, 28426, 294, 428, 51152], + "temperature": 0.0, "avg_logprob": -0.19449284293434838, "compression_ratio": 1.7529411764705882, + "no_speech_prob": 0.004695868119597435}, {"id": 301, "seek": 97012, "start": 985.88, + "end": 991.84, "text": " data, for example, you call the websites and you also called + erroneously the advertisement.", "tokens": [51152, 1412, 11, 337, 1365, 11, 291, + 818, 264, 12891, 
293, 291, 611, 1219, 1189, 26446, 5098, 264, 31370, 13, 51450], + "temperature": 0.0, "avg_logprob": -0.19449284293434838, "compression_ratio": 1.7529411764705882, + "no_speech_prob": 0.004695868119597435}, {"id": 302, "seek": 97012, "start": 991.84, + "end": 998.64, "text": " So you glued the advertisement to the source pair, source + target pair, right?", "tokens": [51450, 407, 291, 28008, 264, 31370, 281, 264, 4009, + 6119, 11, 4009, 3779, 6119, 11, 558, 30, 51790], "temperature": 0.0, "avg_logprob": + -0.19449284293434838, "compression_ratio": 1.7529411764705882, "no_speech_prob": + 0.004695868119597435}, {"id": 303, "seek": 99864, "start": 998.64, "end": 1003.1999999999999, + "text": " Now your model is hallucinating about that advertisement when the student + has, right?", "tokens": [50364, 823, 428, 2316, 307, 35212, 8205, 466, 300, 31370, + 562, 264, 3107, 575, 11, 558, 30, 50592], "temperature": 0.0, "avg_logprob": -0.21392417535549257, + "compression_ratio": 1.6618181818181819, "no_speech_prob": 0.007361919619143009}, + {"id": 304, "seek": 99864, "start": 1003.1999999999999, "end": 1005.6, "text": " + So, and it''s flipping facts.", "tokens": [50592, 407, 11, 293, 309, 311, 26886, + 9130, 13, 50712], "temperature": 0.0, "avg_logprob": -0.21392417535549257, "compression_ratio": + 1.6618181818181819, "no_speech_prob": 0.007361919619143009}, {"id": 305, "seek": + 99864, "start": 1005.6, "end": 1009.24, "text": " It''s also switching, you know, + object and subject easily.", "tokens": [50712, 467, 311, 611, 16493, 11, 291, 458, + 11, 2657, 293, 3983, 3612, 13, 50894], "temperature": 0.0, "avg_logprob": -0.21392417535549257, + "compression_ratio": 1.6618181818181819, "no_speech_prob": 0.007361919619143009}, + {"id": 306, "seek": 99864, "start": 1009.24, "end": 1010.4, "text": " So it''s not + something.", "tokens": [50894, 407, 309, 311, 406, 746, 13, 50952], "temperature": + 0.0, "avg_logprob": -0.21392417535549257, "compression_ratio": 1.6618181818181819, + 
"no_speech_prob": 0.007361919619143009}, {"id": 307, "seek": 99864, "start": 1010.4, + "end": 1014.08, "text": " And again, now I''m stepping on the territory of the model + itself, right?", "tokens": [50952, 400, 797, 11, 586, 286, 478, 16821, 322, 264, + 11360, 295, 264, 2316, 2564, 11, 558, 30, 51136], "temperature": 0.0, "avg_logprob": + -0.21392417535549257, "compression_ratio": 1.6618181818181819, "no_speech_prob": + 0.007361919619143009}, {"id": 308, "seek": 99864, "start": 1014.08, "end": 1017.04, + "text": " But like, and model robustness.", "tokens": [51136, 583, 411, 11, 293, + 2316, 13956, 1287, 13, 51284], "temperature": 0.0, "avg_logprob": -0.21392417535549257, + "compression_ratio": 1.6618181818181819, "no_speech_prob": 0.007361919619143009}, + {"id": 309, "seek": 99864, "start": 1017.04, "end": 1021.76, "text": " But I think + data augmentation plays a key role in actually making sure that your model", "tokens": + [51284, 583, 286, 519, 1412, 14501, 19631, 5749, 257, 2141, 3090, 294, 767, 1455, + 988, 300, 428, 2316, 51520], "temperature": 0.0, "avg_logprob": -0.21392417535549257, + "compression_ratio": 1.6618181818181819, "no_speech_prob": 0.007361919619143009}, + {"id": 310, "seek": 99864, "start": 1021.76, "end": 1026.44, "text": " can kind + of at least not hiccup on some very basic things, right?", "tokens": [51520, 393, + 733, 295, 412, 1935, 406, 23697, 16794, 322, 512, 588, 3875, 721, 11, 558, 30, 51754], + "temperature": 0.0, "avg_logprob": -0.21392417535549257, "compression_ratio": 1.6618181818181819, + "no_speech_prob": 0.007361919619143009}, {"id": 311, "seek": 99864, "start": 1026.44, + "end": 1027.44, "text": " So.", "tokens": [51754, 407, 13, 51804], "temperature": + 0.0, "avg_logprob": -0.21392417535549257, "compression_ratio": 1.6618181818181819, + "no_speech_prob": 0.007361919619143009}, {"id": 312, "seek": 102744, "start": 1028.04, + "end": 1030.16, "text": " Yeah, and we''re completely in agreement with that.", + "tokens": 
[50394, 865, 11, 293, 321, 434, 2584, 294, 8106, 365, 300, 13, 50500], + "temperature": 0.0, "avg_logprob": -0.17013627688090008, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.011538082733750343}, {"id": 313, "seek": 102744, "start": 1030.16, + "end": 1036.6000000000001, "text": " I think one other part to that story will be + how, say, so Facebook has this model called", "tokens": [50500, 286, 519, 472, 661, + 644, 281, 300, 1657, 486, 312, 577, 11, 584, 11, 370, 4384, 575, 341, 2316, 1219, + 50822], "temperature": 0.0, "avg_logprob": -0.17013627688090008, "compression_ratio": + 1.7615658362989324, "no_speech_prob": 0.011538082733750343}, {"id": 314, "seek": + 102744, "start": 1036.6000000000001, "end": 1041.1200000000001, "text": " retrieval + augmented generation, where the whole idea is to add more context to avoid this", + "tokens": [50822, 19817, 3337, 36155, 5125, 11, 689, 264, 1379, 1558, 307, 281, + 909, 544, 4319, 281, 5042, 341, 51048], "temperature": 0.0, "avg_logprob": -0.17013627688090008, + "compression_ratio": 1.7615658362989324, "no_speech_prob": 0.011538082733750343}, + {"id": 315, "seek": 102744, "start": 1041.1200000000001, "end": 1042.44, "text": + " hallucination problem.", "tokens": [51048, 35212, 2486, 1154, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.17013627688090008, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.011538082733750343}, {"id": 316, "seek": 102744, "start": 1042.44, + "end": 1045.6000000000001, "text": " So to kind of break down three things, you + just said, I want to start off with the, yeah,", "tokens": [51114, 407, 281, 733, + 295, 1821, 760, 1045, 721, 11, 291, 445, 848, 11, 286, 528, 281, 722, 766, 365, + 264, 11, 1338, 11, 51272], "temperature": 0.0, "avg_logprob": -0.17013627688090008, + "compression_ratio": 1.7615658362989324, "no_speech_prob": 0.011538082733750343}, + {"id": 317, "seek": 102744, "start": 1045.6000000000001, "end": 1048.04, "text": + " the hallucination thing 
and transitioning right into that.", "tokens": [51272, + 264, 35212, 2486, 551, 293, 33777, 558, 666, 300, 13, 51394], "temperature": 0.0, + "avg_logprob": -0.17013627688090008, "compression_ratio": 1.7615658362989324, "no_speech_prob": + 0.011538082733750343}, {"id": 318, "seek": 102744, "start": 1048.04, "end": 1053.44, + "text": " So, so I think the idea of adding more context is our best solution to + stopping hallucination", "tokens": [51394, 407, 11, 370, 286, 519, 264, 1558, 295, + 5127, 544, 4319, 307, 527, 1151, 3827, 281, 12767, 35212, 2486, 51664], "temperature": + 0.0, "avg_logprob": -0.17013627688090008, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.011538082733750343}, {"id": 319, "seek": 105344, "start": 1053.52, + "end": 1058.56, "text": " and maybe using consistency, contrastive loss, loss functions + for the fine tuning to,", "tokens": [50368, 293, 1310, 1228, 14416, 11, 8712, 488, + 4470, 11, 4470, 6828, 337, 264, 2489, 15164, 281, 11, 50620], "temperature": 0.0, + "avg_logprob": -0.27125413450476243, "compression_ratio": 1.771513353115727, "no_speech_prob": + 0.0026054223999381065}, {"id": 320, "seek": 105344, "start": 1058.56, "end": 1060.0800000000002, + "text": " to make sure they''re attending on the context.", "tokens": [50620, 281, + 652, 988, 436, 434, 15862, 322, 264, 4319, 13, 50696], "temperature": 0.0, "avg_logprob": + -0.27125413450476243, "compression_ratio": 1.771513353115727, "no_speech_prob": + 0.0026054223999381065}, {"id": 321, "seek": 105344, "start": 1060.0800000000002, + "end": 1065.16, "text": " Because like I recently reviewed a paper on my channel + titled open, open, open challenges", "tokens": [50696, 1436, 411, 286, 3938, 18429, + 257, 3035, 322, 452, 2269, 19841, 1269, 11, 1269, 11, 1269, 4759, 50950], "temperature": + 0.0, "avg_logprob": -0.27125413450476243, "compression_ratio": 1.771513353115727, + "no_speech_prob": 0.0026054223999381065}, {"id": 322, "seek": 105344, "start": 1065.16, + "end": 
1069.64, "text": " in open domain generalization, some title like that, where, + um, where yeah, these models,", "tokens": [50950, 294, 1269, 9274, 2674, 2144, 11, + 512, 4876, 411, 300, 11, 689, 11, 1105, 11, 689, 1338, 11, 613, 5245, 11, 51174], + "temperature": 0.0, "avg_logprob": -0.27125413450476243, "compression_ratio": 1.771513353115727, + "no_speech_prob": 0.0026054223999381065}, {"id": 323, "seek": 105344, "start": 1069.64, + "end": 1070.8, "text": " you get them the context.", "tokens": [51174, 291, 483, + 552, 264, 4319, 13, 51232], "temperature": 0.0, "avg_logprob": -0.27125413450476243, + "compression_ratio": 1.771513353115727, "no_speech_prob": 0.0026054223999381065}, + {"id": 324, "seek": 105344, "start": 1070.8, "end": 1073.48, "text": " So they have + additional context in the input, but they just don''t read it.", "tokens": [51232, + 407, 436, 362, 4497, 4319, 294, 264, 4846, 11, 457, 436, 445, 500, 380, 1401, 309, + 13, 51366], "temperature": 0.0, "avg_logprob": -0.27125413450476243, "compression_ratio": + 1.771513353115727, "no_speech_prob": 0.0026054223999381065}, {"id": 325, "seek": + 105344, "start": 1073.48, "end": 1075.64, "text": " And they just generalize as + if it''s not there.", "tokens": [51366, 400, 436, 445, 2674, 1125, 382, 498, 309, + 311, 406, 456, 13, 51474], "temperature": 0.0, "avg_logprob": -0.27125413450476243, + "compression_ratio": 1.771513353115727, "no_speech_prob": 0.0026054223999381065}, + {"id": 326, "seek": 105344, "start": 1075.64, "end": 1078.3600000000001, "text": + " So fixing that problem is definitely step one.", "tokens": [51474, 407, 19442, + 300, 1154, 307, 2138, 1823, 472, 13, 51610], "temperature": 0.0, "avg_logprob": + -0.27125413450476243, "compression_ratio": 1.771513353115727, "no_speech_prob": + 0.0026054223999381065}, {"id": 327, "seek": 105344, "start": 1078.3600000000001, + "end": 1083.04, "text": " And so then to go into the second thing that you mentioned + where you replaced London with", 
"tokens": [51610, 400, 370, 550, 281, 352, 666, + 264, 1150, 551, 300, 291, 2835, 689, 291, 10772, 7042, 365, 51844], "temperature": + 0.0, "avg_logprob": -0.27125413450476243, "compression_ratio": 1.771513353115727, + "no_speech_prob": 0.0026054223999381065}, {"id": 328, "seek": 108304, "start": 1083.04, + "end": 1088.36, "text": " Barcelona and that''s the thing about tech data augmentation + is, it''s, it''s not label", "tokens": [50364, 21247, 293, 300, 311, 264, 551, 466, + 7553, 1412, 14501, 19631, 307, 11, 309, 311, 11, 309, 311, 406, 7645, 50630], "temperature": + 0.0, "avg_logprob": -0.2060009258896557, "compression_ratio": 1.7373737373737375, + "no_speech_prob": 0.0012733318144455552}, {"id": 329, "seek": 108304, "start": 1088.36, + "end": 1089.36, "text": " preserving really.", "tokens": [50630, 33173, 534, 13, + 50680], "temperature": 0.0, "avg_logprob": -0.2060009258896557, "compression_ratio": + 1.7373737373737375, "no_speech_prob": 0.0012733318144455552}, {"id": 330, "seek": + 108304, "start": 1089.36, "end": 1091.24, "text": " It''s harder to find symmetries + in the space.", "tokens": [50680, 467, 311, 6081, 281, 915, 14232, 302, 2244, 294, + 264, 1901, 13, 50774], "temperature": 0.0, "avg_logprob": -0.2060009258896557, "compression_ratio": + 1.7373737373737375, "no_speech_prob": 0.0012733318144455552}, {"id": 331, "seek": + 108304, "start": 1091.24, "end": 1092.84, "text": " It''s easier to find these differences.", + "tokens": [50774, 467, 311, 3571, 281, 915, 613, 7300, 13, 50854], "temperature": + 0.0, "avg_logprob": -0.2060009258896557, "compression_ratio": 1.7373737373737375, + "no_speech_prob": 0.0012733318144455552}, {"id": 332, "seek": 108304, "start": 1092.84, + "end": 1094.32, "text": " So there''s one paper.", "tokens": [50854, 407, 456, 311, + 472, 3035, 13, 50928], "temperature": 0.0, "avg_logprob": -0.2060009258896557, "compression_ratio": + 1.7373737373737375, "no_speech_prob": 0.0012733318144455552}, {"id": 333, "seek": + 108304, 
"start": 1094.32, "end": 1098.1599999999999, "text": " Maybe I''d like to + point readers to titled on negative data augmentation.", "tokens": [50928, 2704, + 286, 1116, 411, 281, 935, 17147, 281, 19841, 322, 3671, 1412, 14501, 19631, 13, + 51120], "temperature": 0.0, "avg_logprob": -0.2060009258896557, "compression_ratio": + 1.7373737373737375, "no_speech_prob": 0.0012733318144455552}, {"id": 334, "seek": + 108304, "start": 1098.1599999999999, "end": 1101.6399999999999, "text": " And so + they''re kind of flipping the, so it''s like, how do we use augmented data?", "tokens": + [51120, 400, 370, 436, 434, 733, 295, 26886, 264, 11, 370, 309, 311, 411, 11, 577, + 360, 321, 764, 36155, 1412, 30, 51294], "temperature": 0.0, "avg_logprob": -0.2060009258896557, + "compression_ratio": 1.7373737373737375, "no_speech_prob": 0.0012733318144455552}, + {"id": 335, "seek": 108304, "start": 1101.6399999999999, "end": 1106.6399999999999, + "text": " Should we just keep using this, you know, kale divergence between the + one hot class", "tokens": [51294, 6454, 321, 445, 1066, 1228, 341, 11, 291, 458, + 11, 34699, 47387, 1296, 264, 472, 2368, 1508, 51544], "temperature": 0.0, "avg_logprob": + -0.2060009258896557, "compression_ratio": 1.7373737373737375, "no_speech_prob": + 0.0012733318144455552}, {"id": 336, "seek": 108304, "start": 1106.6399999999999, + "end": 1109.8799999999999, "text": " vectors or should we do something different + with the augmented data?", "tokens": [51544, 18875, 420, 820, 321, 360, 746, 819, + 365, 264, 36155, 1412, 30, 51706], "temperature": 0.0, "avg_logprob": -0.2060009258896557, + "compression_ratio": 1.7373737373737375, "no_speech_prob": 0.0012733318144455552}, + {"id": 337, "seek": 110988, "start": 1109.88, "end": 1113.7600000000002, "text": + " I mentioned consistency losses where the loss would be, you know, the representations", + "tokens": [50364, 286, 2835, 14416, 15352, 689, 264, 4470, 576, 312, 11, 291, 458, + 11, 264, 33358, 50558], 
"temperature": 0.0, "avg_logprob": -0.24675430520607608, + "compression_ratio": 1.7665615141955835, "no_speech_prob": 0.2044392228126526}, + {"id": 338, "seek": 110988, "start": 1113.7600000000002, "end": 1118.92, "text": + " of X and X prime ignoring whatever the Y label is and negative data augmentation + is saying,", "tokens": [50558, 295, 1783, 293, 1783, 5835, 26258, 2035, 264, 398, + 7645, 307, 293, 3671, 1412, 14501, 19631, 307, 1566, 11, 50816], "temperature": + 0.0, "avg_logprob": -0.24675430520607608, "compression_ratio": 1.7665615141955835, + "no_speech_prob": 0.2044392228126526}, {"id": 339, "seek": 110988, "start": 1118.92, + "end": 1120.2, "text": " you know, push them apart.", "tokens": [50816, 291, 458, + 11, 2944, 552, 4936, 13, 50880], "temperature": 0.0, "avg_logprob": -0.24675430520607608, + "compression_ratio": 1.7665615141955835, "no_speech_prob": 0.2044392228126526}, + {"id": 340, "seek": 110988, "start": 1120.2, "end": 1121.44, "text": " These are + not the same label.", "tokens": [50880, 1981, 366, 406, 264, 912, 7645, 13, 50942], + "temperature": 0.0, "avg_logprob": -0.24675430520607608, "compression_ratio": 1.7665615141955835, + "no_speech_prob": 0.2044392228126526}, {"id": 341, "seek": 110988, "start": 1121.44, + "end": 1123.2, "text": " We''ve switched London with Barcelona.", "tokens": [50942, + 492, 600, 16858, 7042, 365, 21247, 13, 51030], "temperature": 0.0, "avg_logprob": + -0.24675430520607608, "compression_ratio": 1.7665615141955835, "no_speech_prob": + 0.2044392228126526}, {"id": 342, "seek": 110988, "start": 1123.2, "end": 1128.0800000000002, + "text": " And so then I think the last thing, as we''re talking about, like the + practical implementation,", "tokens": [51030, 400, 370, 550, 286, 519, 264, 1036, + 551, 11, 382, 321, 434, 1417, 466, 11, 411, 264, 8496, 11420, 11, 51274], "temperature": + 0.0, "avg_logprob": -0.24675430520607608, "compression_ratio": 1.7665615141955835, + "no_speech_prob": 0.2044392228126526}, {"id": 
343, "seek": 110988, "start": 1128.0800000000002, + "end": 1131.4, "text": " I think you say two things, there''s like two directions + that which are really interesting.", "tokens": [51274, 286, 519, 291, 584, 732, + 721, 11, 456, 311, 411, 732, 11095, 300, 597, 366, 534, 1880, 13, 51440], "temperature": + 0.0, "avg_logprob": -0.24675430520607608, "compression_ratio": 1.7665615141955835, + "no_speech_prob": 0.2044392228126526}, {"id": 344, "seek": 110988, "start": 1131.4, + "end": 1134.6000000000001, "text": " And I think what you''re getting to with the + data augmentation is, is you want to prevent", "tokens": [51440, 400, 286, 519, + 437, 291, 434, 1242, 281, 365, 264, 1412, 14501, 19631, 307, 11, 307, 291, 528, + 281, 4871, 51600], "temperature": 0.0, "avg_logprob": -0.24675430520607608, "compression_ratio": + 1.7665615141955835, "no_speech_prob": 0.2044392228126526}, {"id": 345, "seek": 110988, + "start": 1134.6000000000001, "end": 1135.6000000000001, "text": " overfitting.", + "tokens": [51600, 670, 69, 2414, 13, 51650], "temperature": 0.0, "avg_logprob": + -0.24675430520607608, "compression_ratio": 1.7665615141955835, "no_speech_prob": + 0.2044392228126526}, {"id": 346, "seek": 113560, "start": 1135.6, "end": 1140.24, + "text": " And if you have, if you''re, you know, grabbing Microsoft''s 32 trillion + parameter model,", "tokens": [50364, 400, 498, 291, 362, 11, 498, 291, 434, 11, + 291, 458, 11, 23771, 8116, 311, 8858, 18723, 13075, 2316, 11, 50596], "temperature": + 0.0, "avg_logprob": -0.19803947126361685, "compression_ratio": 1.6696428571428572, + "no_speech_prob": 0.0108503932133317}, {"id": 347, "seek": 113560, "start": 1140.24, + "end": 1144.52, "text": " and you''ve only got 100 labeled examples, there''s no + way that''s going to work.", "tokens": [50596, 293, 291, 600, 787, 658, 2319, 21335, + 5110, 11, 456, 311, 572, 636, 300, 311, 516, 281, 589, 13, 50810], "temperature": + 0.0, "avg_logprob": -0.19803947126361685, "compression_ratio": 
1.6696428571428572, + "no_speech_prob": 0.0108503932133317}, {"id": 348, "seek": 113560, "start": 1144.52, + "end": 1145.52, "text": " So you want to prevent overfitting.", "tokens": [50810, + 407, 291, 528, 281, 4871, 670, 69, 2414, 13, 50860], "temperature": 0.0, "avg_logprob": + -0.19803947126361685, "compression_ratio": 1.6696428571428572, "no_speech_prob": + 0.0108503932133317}, {"id": 349, "seek": 113560, "start": 1145.52, "end": 1148.6799999999998, + "text": " And then I think kind of the second part to that story when people talk + about this kind", "tokens": [50860, 400, 550, 286, 519, 733, 295, 264, 1150, 644, + 281, 300, 1657, 562, 561, 751, 466, 341, 733, 51018], "temperature": 0.0, "avg_logprob": + -0.19803947126361685, "compression_ratio": 1.6696428571428572, "no_speech_prob": + 0.0108503932133317}, {"id": 350, "seek": 113560, "start": 1148.6799999999998, "end": + 1152.9199999999998, "text": " of topic is, is like storage and inference cost and + obviously training costs.", "tokens": [51018, 295, 4829, 307, 11, 307, 411, 6725, + 293, 38253, 2063, 293, 2745, 3097, 5497, 13, 51230], "temperature": 0.0, "avg_logprob": + -0.19803947126361685, "compression_ratio": 1.6696428571428572, "no_speech_prob": + 0.0108503932133317}, {"id": 351, "seek": 113560, "start": 1152.9199999999998, "end": + 1154.08, "text": " You''re going to fine tune this.", "tokens": [51230, 509, 434, + 516, 281, 2489, 10864, 341, 13, 51288], "temperature": 0.0, "avg_logprob": -0.19803947126361685, + "compression_ratio": 1.6696428571428572, "no_speech_prob": 0.0108503932133317}, + {"id": 352, "seek": 113560, "start": 1154.08, "end": 1157.04, "text": " So maybe + training costs has been solved with prompting where you don''t actually need to", + "tokens": [51288, 407, 1310, 3097, 5497, 575, 668, 13041, 365, 12391, 278, 689, + 291, 500, 380, 767, 643, 281, 51436], "temperature": 0.0, "avg_logprob": -0.19803947126361685, + "compression_ratio": 1.6696428571428572, "no_speech_prob": 
0.0108503932133317}, + {"id": 353, "seek": 113560, "start": 1157.04, "end": 1158.4399999999998, "text": + " do any grading to send updates.", "tokens": [51436, 360, 604, 35540, 281, 2845, + 9205, 13, 51506], "temperature": 0.0, "avg_logprob": -0.19803947126361685, "compression_ratio": + 1.6696428571428572, "no_speech_prob": 0.0108503932133317}, {"id": 354, "seek": 113560, + "start": 1158.4399999999998, "end": 1161.8799999999999, "text": " You just give + more in the input context.", "tokens": [51506, 509, 445, 976, 544, 294, 264, 4846, + 4319, 13, 51678], "temperature": 0.0, "avg_logprob": -0.19803947126361685, "compression_ratio": + 1.6696428571428572, "no_speech_prob": 0.0108503932133317}, {"id": 355, "seek": 116188, + "start": 1161.88, "end": 1165.92, "text": " But then I think inference cost is solved + with this knowledge installation interface.", "tokens": [50364, 583, 550, 286, 519, + 38253, 2063, 307, 13041, 365, 341, 3601, 13260, 9226, 13, 50566], "temperature": + 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": 1.744360902255639, + "no_speech_prob": 0.03740179166197777}, {"id": 356, "seek": 116188, "start": 1165.92, + "end": 1172.8000000000002, "text": " And I think hugging face, man, I think the + name of their product is lightning or something", "tokens": [50566, 400, 286, 519, + 41706, 1851, 11, 587, 11, 286, 519, 264, 1315, 295, 641, 1674, 307, 16589, 420, + 746, 50910], "temperature": 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": + 1.744360902255639, "no_speech_prob": 0.03740179166197777}, {"id": 357, "seek": 116188, + "start": 1172.8000000000002, "end": 1175.48, "text": " like that where it''s about + inference acceleration.", "tokens": [50910, 411, 300, 689, 309, 311, 466, 38253, + 17162, 13, 51044], "temperature": 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": + 1.744360902255639, "no_speech_prob": 0.03740179166197777}, {"id": 358, "seek": 116188, + "start": 1175.48, "end": 1178.5600000000002, "text": " 
And it looks like they''re, + you know, they''re doing it pretty well.", "tokens": [51044, 400, 309, 1542, 411, + 436, 434, 11, 291, 458, 11, 436, 434, 884, 309, 1238, 731, 13, 51198], "temperature": + 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": 1.744360902255639, + "no_speech_prob": 0.03740179166197777}, {"id": 359, "seek": 116188, "start": 1178.5600000000002, + "end": 1181.7600000000002, "text": " So I certainly bet on hugging face to solve + that problem.", "tokens": [51198, 407, 286, 3297, 778, 322, 41706, 1851, 281, 5039, + 300, 1154, 13, 51358], "temperature": 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": + 1.744360902255639, "no_speech_prob": 0.03740179166197777}, {"id": 360, "seek": 116188, + "start": 1181.7600000000002, "end": 1182.7600000000002, "text": " Oh, yeah, absolutely.", + "tokens": [51358, 876, 11, 1338, 11, 3122, 13, 51408], "temperature": 0.0, "avg_logprob": + -0.2702792718158505, "compression_ratio": 1.744360902255639, "no_speech_prob": 0.03740179166197777}, + {"id": 361, "seek": 116188, "start": 1182.7600000000002, "end": 1185.16, "text": + " I think they call it infinity, you know?", "tokens": [51408, 286, 519, 436, 818, + 309, 13202, 11, 291, 458, 30, 51528], "temperature": 0.0, "avg_logprob": -0.2702792718158505, + "compression_ratio": 1.744360902255639, "no_speech_prob": 0.03740179166197777}, + {"id": 362, "seek": 116188, "start": 1185.16, "end": 1186.16, "text": " Infinity.", + "tokens": [51528, 34762, 13, 51578], "temperature": 0.0, "avg_logprob": -0.2702792718158505, + "compression_ratio": 1.744360902255639, "no_speech_prob": 0.03740179166197777}, + {"id": 363, "seek": 116188, "start": 1186.16, "end": 1187.96, "text": " Yeah, sorry + about that.", "tokens": [51578, 865, 11, 2597, 466, 300, 13, 51668], "temperature": + 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": 1.744360902255639, + "no_speech_prob": 0.03740179166197777}, {"id": 364, "seek": 116188, "start": 1187.96, + "end": 1188.96, 
"text": " Oh, it''s okay.", "tokens": [51668, 876, 11, 309, 311, + 1392, 13, 51718], "temperature": 0.0, "avg_logprob": -0.2702792718158505, "compression_ratio": + 1.744360902255639, "no_speech_prob": 0.03740179166197777}, {"id": 365, "seek": 118896, + "start": 1189.3600000000001, "end": 1193.52, "text": " It''s also like testing your + memory, you know, like we remember.", "tokens": [50384, 467, 311, 611, 411, 4997, + 428, 4675, 11, 291, 458, 11, 411, 321, 1604, 13, 50592], "temperature": 0.0, "avg_logprob": + -0.19283623788871018, "compression_ratio": 1.634703196347032, "no_speech_prob": + 0.07675180584192276}, {"id": 366, "seek": 118896, "start": 1193.52, "end": 1198.08, + "text": " And I think it''s still also like at some point, and I think Elon Musk + is afraid of it.", "tokens": [50592, 400, 286, 519, 309, 311, 920, 611, 411, 412, + 512, 935, 11, 293, 286, 519, 28498, 26019, 307, 4638, 295, 309, 13, 50820], "temperature": + 0.0, "avg_logprob": -0.19283623788871018, "compression_ratio": 1.634703196347032, + "no_speech_prob": 0.07675180584192276}, {"id": 367, "seek": 118896, "start": 1198.08, + "end": 1204.52, "text": " Hey, Elon, if you''re listening to this, hello, you know, + like he''s afraid of that our", "tokens": [50820, 1911, 11, 28498, 11, 498, 291, + 434, 4764, 281, 341, 11, 7751, 11, 291, 458, 11, 411, 415, 311, 4638, 295, 300, + 527, 51142], "temperature": 0.0, "avg_logprob": -0.19283623788871018, "compression_ratio": + 1.634703196347032, "no_speech_prob": 0.07675180584192276}, {"id": 368, "seek": 118896, + "start": 1204.52, "end": 1207.04, "text": " interface is way too slow, right?", + "tokens": [51142, 9226, 307, 636, 886, 2964, 11, 558, 30, 51268], "temperature": + 0.0, "avg_logprob": -0.19283623788871018, "compression_ratio": 1.634703196347032, + "no_speech_prob": 0.07675180584192276}, {"id": 369, "seek": 118896, "start": 1207.04, + "end": 1214.3600000000001, "text": " And so eventually I will basically supersede + us, which I don''t think so, 
but let''s see.", "tokens": [51268, 400, 370, 4728, + 286, 486, 1936, 37906, 4858, 505, 11, 597, 286, 500, 380, 519, 370, 11, 457, 718, + 311, 536, 13, 51634], "temperature": 0.0, "avg_logprob": -0.19283623788871018, "compression_ratio": + 1.634703196347032, "no_speech_prob": 0.07675180584192276}, {"id": 370, "seek": 121436, + "start": 1214.36, "end": 1219.08, "text": " But also like what''s interesting, I + was thinking that maybe a little bit like developing this", "tokens": [50364, 583, + 611, 411, 437, 311, 1880, 11, 286, 390, 1953, 300, 1310, 257, 707, 857, 411, 6416, + 341, 50600], "temperature": 0.0, "avg_logprob": -0.24631410890871341, "compression_ratio": + 1.676923076923077, "no_speech_prob": 0.006588254123926163}, {"id": 371, "seek": + 121436, "start": 1219.08, "end": 1225.1599999999999, "text": " topic further, but + it sounds you have so much knowledge on this and it''s so packed, what", "tokens": + [50600, 4829, 3052, 11, 457, 309, 3263, 291, 362, 370, 709, 3601, 322, 341, 293, + 309, 311, 370, 13265, 11, 437, 50904], "temperature": 0.0, "avg_logprob": -0.24631410890871341, + "compression_ratio": 1.676923076923077, "no_speech_prob": 0.006588254123926163}, + {"id": 372, "seek": 121436, "start": 1225.1599999999999, "end": 1230.84, "text": + " you said, you know, like, for example, if we could use the language model itself + to help", "tokens": [50904, 291, 848, 11, 291, 458, 11, 411, 11, 337, 1365, 11, + 498, 321, 727, 764, 264, 2856, 2316, 2564, 281, 854, 51188], "temperature": 0.0, + "avg_logprob": -0.24631410890871341, "compression_ratio": 1.676923076923077, "no_speech_prob": + 0.006588254123926163}, {"id": 373, "seek": 121436, "start": 1230.84, "end": 1233.36, + "text": " us generate, you said GPT, right?", "tokens": [51188, 505, 8460, 11, 291, + 848, 26039, 51, 11, 558, 30, 51314], "temperature": 0.0, "avg_logprob": -0.24631410890871341, + "compression_ratio": 1.676923076923077, "no_speech_prob": 0.006588254123926163}, + {"id": 374, "seek": 121436, 
"start": 1233.36, "end": 1238.12, "text": " It''s generative + model, but there could be some others, which will kind of help us to generate", + "tokens": [51314, 467, 311, 1337, 1166, 2316, 11, 457, 456, 727, 312, 512, 2357, + 11, 597, 486, 733, 295, 854, 505, 281, 8460, 51552], "temperature": 0.0, "avg_logprob": + -0.24631410890871341, "compression_ratio": 1.676923076923077, "no_speech_prob": + 0.006588254123926163}, {"id": 375, "seek": 121436, "start": 1238.12, "end": 1240.8, + "text": " things and then augment the dataset.", "tokens": [51552, 721, 293, 550, + 29919, 264, 28872, 13, 51686], "temperature": 0.0, "avg_logprob": -0.24631410890871341, + "compression_ratio": 1.676923076923077, "no_speech_prob": 0.006588254123926163}, + {"id": 376, "seek": 124080, "start": 1241.2, "end": 1243.9199999999998, "text": + " But there is one beautiful that I don''t know if you''ve read this paper.", "tokens": + [50384, 583, 456, 307, 472, 2238, 300, 286, 500, 380, 458, 498, 291, 600, 1401, + 341, 3035, 13, 50520], "temperature": 0.0, "avg_logprob": -0.24566272774127998, + "compression_ratio": 1.6068702290076335, "no_speech_prob": 0.017959708347916603}, + {"id": 377, "seek": 124080, "start": 1243.9199999999998, "end": 1248.24, "text": + " It''s called what bird is not lessons from a new suite of cycling,", "tokens": + [50520, 467, 311, 1219, 437, 5255, 307, 406, 8820, 490, 257, 777, 14205, 295, 22425, + 11, 50736], "temperature": 0.0, "avg_logprob": -0.24566272774127998, "compression_ratio": + 1.6068702290076335, "no_speech_prob": 0.017959708347916603}, {"id": 378, "seek": + 124080, "start": 1248.24, "end": 1250.52, "text": " holistic diagnostics for language + models.", "tokens": [50736, 30334, 43215, 1167, 337, 2856, 5245, 13, 50850], "temperature": + 0.0, "avg_logprob": -0.24566272774127998, "compression_ratio": 1.6068702290076335, + "no_speech_prob": 0.017959708347916603}, {"id": 379, "seek": 124080, "start": 1250.52, + "end": 1258.28, "text": " And so basically the paper 
essentially claims that bird + does not distinguish the negations.", "tokens": [50850, 400, 370, 1936, 264, 3035, + 4476, 9441, 300, 5255, 775, 406, 20206, 264, 2485, 763, 13, 51238], "temperature": + 0.0, "avg_logprob": -0.24566272774127998, "compression_ratio": 1.6068702290076335, + "no_speech_prob": 0.017959708347916603}, {"id": 380, "seek": 124080, "start": 1258.28, + "end": 1262.8799999999999, "text": " And that can be super, super sensitive, like + in sentiment analysis, right?", "tokens": [51238, 400, 300, 393, 312, 1687, 11, + 1687, 9477, 11, 411, 294, 16149, 5215, 11, 558, 30, 51468], "temperature": 0.0, + "avg_logprob": -0.24566272774127998, "compression_ratio": 1.6068702290076335, "no_speech_prob": + 0.017959708347916603}, {"id": 381, "seek": 124080, "start": 1262.8799999999999, + "end": 1267.1599999999999, "text": " At least, but also like in machine translation + and other downstream tasks.", "tokens": [51468, 1711, 1935, 11, 457, 611, 411, 294, + 3479, 12853, 293, 661, 30621, 9608, 13, 51682], "temperature": 0.0, "avg_logprob": + -0.24566272774127998, "compression_ratio": 1.6068702290076335, "no_speech_prob": + 0.017959708347916603}, {"id": 382, "seek": 126716, "start": 1267.16, "end": 1268.96, + "text": " So have you thought about this?", "tokens": [50364, 407, 362, 291, 1194, + 466, 341, 30, 50454], "temperature": 0.0, "avg_logprob": -0.2627053774320162, "compression_ratio": + 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, {"id": 383, "seek": + 126716, "start": 1268.96, "end": 1273.2, "text": " Like basically there is actually + a now a development.", "tokens": [50454, 1743, 1936, 456, 307, 767, 257, 586, 257, + 3250, 13, 50666], "temperature": 0.0, "avg_logprob": -0.2627053774320162, "compression_ratio": + 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, {"id": 384, "seek": + 126716, "start": 1273.2, "end": 1278.96, "text": " I think it''s also on Microsoft + side to try to bring knowledge into the language model.", 
"tokens": [50666, 286, + 519, 309, 311, 611, 322, 8116, 1252, 281, 853, 281, 1565, 3601, 666, 264, 2856, + 2316, 13, 50954], "temperature": 0.0, "avg_logprob": -0.2627053774320162, "compression_ratio": + 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, {"id": 385, "seek": + 126716, "start": 1278.96, "end": 1281.88, "text": " And you can do it in a variety + of ways you mentioned knowledge graph, but there are other", "tokens": [50954, 400, + 291, 393, 360, 309, 294, 257, 5673, 295, 2098, 291, 2835, 3601, 4295, 11, 457, 456, + 366, 661, 51100], "temperature": 0.0, "avg_logprob": -0.2627053774320162, "compression_ratio": + 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, {"id": 386, "seek": + 126716, "start": 1281.88, "end": 1284.4, "text": " ways kind of to bring in the + structured knowledge.", "tokens": [51100, 2098, 733, 295, 281, 1565, 294, 264, 18519, + 3601, 13, 51226], "temperature": 0.0, "avg_logprob": -0.2627053774320162, "compression_ratio": + 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, {"id": 387, "seek": + 126716, "start": 1284.4, "end": 1286.6000000000001, "text": " So any thoughts on + that on that topic?", "tokens": [51226, 407, 604, 4598, 322, 300, 322, 300, 4829, + 30, 51336], "temperature": 0.0, "avg_logprob": -0.2627053774320162, "compression_ratio": + 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, {"id": 388, "seek": + 126716, "start": 1286.6000000000001, "end": 1291.3600000000001, "text": " Yeah, + and this is where I''m just starting getting back into we V8 because I think we", + "tokens": [51336, 865, 11, 293, 341, 307, 689, 286, 478, 445, 2891, 1242, 646, 666, + 321, 691, 23, 570, 286, 519, 321, 51574], "temperature": 0.0, "avg_logprob": -0.2627053774320162, + "compression_ratio": 1.7077922077922079, "no_speech_prob": 0.021291537210345268}, + {"id": 389, "seek": 126716, "start": 1291.3600000000001, "end": 1295.4, "text": + " V8 is going to be a huge part of solving that problem and adding 
the additional + context.", "tokens": [51574, 691, 23, 307, 516, 281, 312, 257, 2603, 644, 295, 12606, + 300, 1154, 293, 5127, 264, 4497, 4319, 13, 51776], "temperature": 0.0, "avg_logprob": + -0.2627053774320162, "compression_ratio": 1.7077922077922079, "no_speech_prob": + 0.021291537210345268}, {"id": 390, "seek": 129540, "start": 1295.4, "end": 1297.64, + "text": " But first I want to raise you one paper.", "tokens": [50364, 583, 700, + 286, 528, 281, 5300, 291, 472, 3035, 13, 50476], "temperature": 0.0, "avg_logprob": + -0.2511287689208984, "compression_ratio": 1.717607973421927, "no_speech_prob": 0.00789932906627655}, + {"id": 391, "seek": 129540, "start": 1297.64, "end": 1301.8400000000001, "text": + " So from the psycholinguistic thing, I want to point readers in the direction of + viewers", "tokens": [50476, 407, 490, 264, 4681, 401, 7050, 3142, 551, 11, 286, + 528, 281, 935, 17147, 294, 264, 3513, 295, 8499, 50686], "temperature": 0.0, "avg_logprob": + -0.2511287689208984, "compression_ratio": 1.717607973421927, "no_speech_prob": 0.00789932906627655}, + {"id": 392, "seek": 129540, "start": 1301.8400000000001, "end": 1303.64, "text": + " in the direction of checklist.", "tokens": [50686, 294, 264, 3513, 295, 30357, + 13, 50776], "temperature": 0.0, "avg_logprob": -0.2511287689208984, "compression_ratio": + 1.717607973421927, "no_speech_prob": 0.00789932906627655}, {"id": 393, "seek": 129540, + "start": 1303.64, "end": 1307.0400000000002, "text": " It was one of the best paper + awards at a recent ACL conference.", "tokens": [50776, 467, 390, 472, 295, 264, + 1151, 3035, 15193, 412, 257, 5162, 43873, 7586, 13, 50946], "temperature": 0.0, + "avg_logprob": -0.2511287689208984, "compression_ratio": 1.717607973421927, "no_speech_prob": + 0.00789932906627655}, {"id": 394, "seek": 129540, "start": 1307.0400000000002, "end": + 1313.68, "text": " ACL is I think ACL EM and OP, like the top NLP conferences checklist + is exactly what you", "tokens": [50946, 43873, 
307, 286, 519, 43873, 16237, 293, + 23324, 11, 411, 264, 1192, 426, 45196, 22032, 30357, 307, 2293, 437, 291, 51278], + "temperature": 0.0, "avg_logprob": -0.2511287689208984, "compression_ratio": 1.717607973421927, + "no_speech_prob": 0.00789932906627655}, {"id": 395, "seek": 129540, "start": 1313.68, + "end": 1314.68, "text": " say.", "tokens": [51278, 584, 13, 51328], "temperature": + 0.0, "avg_logprob": -0.2511287689208984, "compression_ratio": 1.717607973421927, + "no_speech_prob": 0.00789932906627655}, {"id": 396, "seek": 129540, "start": 1314.68, + "end": 1318.96, "text": " It''s a complete suite of tests for negations named entity + swapping.", "tokens": [51328, 467, 311, 257, 3566, 14205, 295, 6921, 337, 2485, + 763, 4926, 13977, 1693, 10534, 13, 51542], "temperature": 0.0, "avg_logprob": -0.2511287689208984, + "compression_ratio": 1.717607973421927, "no_speech_prob": 0.00789932906627655}, + {"id": 397, "seek": 129540, "start": 1318.96, "end": 1320.2, "text": " And it''s + really nice to use.", "tokens": [51542, 400, 309, 311, 534, 1481, 281, 764, 13, + 51604], "temperature": 0.0, "avg_logprob": -0.2511287689208984, "compression_ratio": + 1.717607973421927, "no_speech_prob": 0.00789932906627655}, {"id": 398, "seek": 129540, + "start": 1320.2, "end": 1321.2, "text": " It''s on GitHub.", "tokens": [51604, 467, + 311, 322, 23331, 13, 51654], "temperature": 0.0, "avg_logprob": -0.2511287689208984, + "compression_ratio": 1.717607973421927, "no_speech_prob": 0.00789932906627655}, + {"id": 399, "seek": 129540, "start": 1321.2, "end": 1325.2, "text": " So yeah, so + they have the interfaces for testing for that kind of thing, which I think", "tokens": + [51654, 407, 1338, 11, 370, 436, 362, 264, 28416, 337, 4997, 337, 300, 733, 295, + 551, 11, 597, 286, 519, 51854], "temperature": 0.0, "avg_logprob": -0.2511287689208984, + "compression_ratio": 1.717607973421927, "no_speech_prob": 0.00789932906627655}, + {"id": 400, "seek": 132520, "start": 1325.2, "end": 1328.8, 
"text": " once you have + the test, you can start hacking away, it''s solving it.", "tokens": [50364, 1564, + 291, 362, 264, 1500, 11, 291, 393, 722, 31422, 1314, 11, 309, 311, 12606, 309, 13, + 50544], "temperature": 0.0, "avg_logprob": -0.28172068235253084, "compression_ratio": + 1.735408560311284, "no_speech_prob": 0.00039121831650845706}, {"id": 401, "seek": + 132520, "start": 1328.8, "end": 1330.76, "text": " It''s not theoretically grounded.", + "tokens": [50544, 467, 311, 406, 29400, 23535, 13, 50642], "temperature": 0.0, "avg_logprob": + -0.28172068235253084, "compression_ratio": 1.735408560311284, "no_speech_prob": + 0.00039121831650845706}, {"id": 402, "seek": 132520, "start": 1330.76, "end": 1334.56, + "text": " If you have the right test, you could hack away until you pass the test.", + "tokens": [50642, 759, 291, 362, 264, 558, 1500, 11, 291, 727, 10339, 1314, 1826, + 291, 1320, 264, 1500, 13, 50832], "temperature": 0.0, "avg_logprob": -0.28172068235253084, + "compression_ratio": 1.735408560311284, "no_speech_prob": 0.00039121831650845706}, + {"id": 403, "seek": 132520, "start": 1334.56, "end": 1338.88, "text": " So checklist + is the test for that.", "tokens": [50832, 407, 30357, 307, 264, 1500, 337, 300, + 13, 51048], "temperature": 0.0, "avg_logprob": -0.28172068235253084, "compression_ratio": + 1.735408560311284, "no_speech_prob": 0.00039121831650845706}, {"id": 404, "seek": + 132520, "start": 1338.88, "end": 1344.16, "text": " But then so yeah, so then the + idea of context and and we V8.", "tokens": [51048, 583, 550, 370, 1338, 11, 370, + 550, 264, 1558, 295, 4319, 293, 293, 321, 691, 23, 13, 51312], "temperature": 0.0, + "avg_logprob": -0.28172068235253084, "compression_ratio": 1.735408560311284, "no_speech_prob": + 0.00039121831650845706}, {"id": 405, "seek": 132520, "start": 1344.16, "end": 1349.72, + "text": " So so V8 is so the vector search engine part and you know, Facebook paper + dense passage", "tokens": [51312, 407, 370, 691, 23, 307, 
370, 264, 8062, 3164, + 2848, 644, 293, 291, 458, 11, 4384, 3035, 18011, 11497, 51590], "temperature": 0.0, + "avg_logprob": -0.28172068235253084, "compression_ratio": 1.735408560311284, "no_speech_prob": + 0.00039121831650845706}, {"id": 406, "seek": 132520, "start": 1349.72, "end": 1353.2, + "text": " retrieval is their current approach where they have, you know, the text + embeddings, the", "tokens": [51590, 19817, 3337, 307, 641, 2190, 3109, 689, 436, + 362, 11, 291, 458, 11, 264, 2487, 12240, 29432, 11, 264, 51764], "temperature": + 0.0, "avg_logprob": -0.28172068235253084, "compression_ratio": 1.735408560311284, + "no_speech_prob": 0.00039121831650845706}, {"id": 407, "seek": 135320, "start": + 1353.2, "end": 1356.6000000000001, "text": " documents and they''re going to go + retrieve the context so that you can avoid hallucination,", "tokens": [50364, 8512, + 293, 436, 434, 516, 281, 352, 30254, 264, 4319, 370, 300, 291, 393, 5042, 35212, + 2486, 11, 50534], "temperature": 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": + 1.6317567567567568, "no_speech_prob": 0.004736820701509714}, {"id": 408, "seek": + 135320, "start": 1356.6000000000001, "end": 1360.0800000000002, "text": " hopefully + avoid these kind of vulnerabilities through robustness.", "tokens": [50534, 4696, + 5042, 613, 733, 295, 37633, 807, 13956, 1287, 13, 50708], "temperature": 0.0, "avg_logprob": + -0.21401780169943105, "compression_ratio": 1.6317567567567568, "no_speech_prob": + 0.004736820701509714}, {"id": 409, "seek": 135320, "start": 1360.0800000000002, + "end": 1365.0, "text": " But so vector search engines is what I see as being a huge + player in solving that particular", "tokens": [50708, 583, 370, 8062, 3164, 12982, + 307, 437, 286, 536, 382, 885, 257, 2603, 4256, 294, 12606, 300, 1729, 50954], "temperature": + 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": 1.6317567567567568, + "no_speech_prob": 0.004736820701509714}, {"id": 410, "seek": 135320, 
"start": 1365.0, + "end": 1366.0, "text": " problem.", "tokens": [50954, 1154, 13, 51004], "temperature": + 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": 1.6317567567567568, + "no_speech_prob": 0.004736820701509714}, {"id": 411, "seek": 135320, "start": 1366.0, + "end": 1370.4, "text": " And I see that transitioning not just from text, but image + text of video text like the idea", "tokens": [51004, 400, 286, 536, 300, 33777, + 406, 445, 490, 2487, 11, 457, 3256, 2487, 295, 960, 2487, 411, 264, 1558, 51224], + "temperature": 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": 1.6317567567567568, + "no_speech_prob": 0.004736820701509714}, {"id": 412, "seek": 135320, "start": 1370.4, + "end": 1374.8400000000001, "text": " that you want to add some more context from + your database to the current inference.", "tokens": [51224, 300, 291, 528, 281, + 909, 512, 544, 4319, 490, 428, 8149, 281, 264, 2190, 38253, 13, 51446], "temperature": + 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": 1.6317567567567568, + "no_speech_prob": 0.004736820701509714}, {"id": 413, "seek": 135320, "start": 1374.8400000000001, + "end": 1375.8400000000001, "text": " Yeah, yeah.", "tokens": [51446, 865, 11, 1338, + 13, 51496], "temperature": 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": + 1.6317567567567568, "no_speech_prob": 0.004736820701509714}, {"id": 414, "seek": + 135320, "start": 1375.8400000000001, "end": 1379.48, "text": " I mean, V8 is doing + fantastic work.", "tokens": [51496, 286, 914, 11, 691, 23, 307, 884, 5456, 589, + 13, 51678], "temperature": 0.0, "avg_logprob": -0.21401780169943105, "compression_ratio": + 1.6317567567567568, "no_speech_prob": 0.004736820701509714}, {"id": 415, "seek": + 137948, "start": 1379.48, "end": 1384.8, "text": " Actually, we have a podcast recorded + with mob and so, you know, my listener''s can actually", "tokens": [50364, 5135, + 11, 321, 362, 257, 7367, 8287, 365, 4298, 293, 370, 11, 291, 
458, 11, 452, 31569, + 311, 393, 767, 50630], "temperature": 0.0, "avg_logprob": -0.23937352498372397, + "compression_ratio": 1.6692607003891051, "no_speech_prob": 0.03639225661754608}, + {"id": 416, "seek": 137948, "start": 1384.8, "end": 1388.88, "text": " watch it + and then we also had an episode with you where we covered some of the things.", + "tokens": [50630, 1159, 309, 293, 550, 321, 611, 632, 364, 3500, 365, 291, 689, + 321, 5343, 512, 295, 264, 721, 13, 50834], "temperature": 0.0, "avg_logprob": -0.23937352498372397, + "compression_ratio": 1.6692607003891051, "no_speech_prob": 0.03639225661754608}, + {"id": 417, "seek": 137948, "start": 1388.88, "end": 1393.8, "text": " And you also + recorded a bunch of videos like walking through the feature set.", "tokens": [50834, + 400, 291, 611, 8287, 257, 3840, 295, 2145, 411, 4494, 807, 264, 4111, 992, 13, 51080], + "temperature": 0.0, "avg_logprob": -0.23937352498372397, "compression_ratio": 1.6692607003891051, + "no_speech_prob": 0.03639225661754608}, {"id": 418, "seek": 137948, "start": 1393.8, + "end": 1400.56, "text": " What caught your attention in V8 when kind of if you can + slightly compare to other database", "tokens": [51080, 708, 5415, 428, 3202, 294, + 691, 23, 562, 733, 295, 498, 291, 393, 4748, 6794, 281, 661, 8149, 51418], "temperature": + 0.0, "avg_logprob": -0.23937352498372397, "compression_ratio": 1.6692607003891051, + "no_speech_prob": 0.03639225661754608}, {"id": 419, "seek": 137948, "start": 1400.56, + "end": 1401.56, "text": " vendors?", "tokens": [51418, 22056, 30, 51468], "temperature": + 0.0, "avg_logprob": -0.23937352498372397, "compression_ratio": 1.6692607003891051, + "no_speech_prob": 0.03639225661754608}, {"id": 420, "seek": 137948, "start": 1401.56, + "end": 1408.52, "text": " Okay, well, I don''t have much of a comparison to other + database vendors.", "tokens": [51468, 1033, 11, 731, 11, 286, 500, 380, 362, 709, + 295, 257, 9660, 281, 661, 8149, 22056, 13, 51816], "temperature": 
0.0, "avg_logprob": + -0.23937352498372397, "compression_ratio": 1.6692607003891051, "no_speech_prob": + 0.03639225661754608}, {"id": 421, "seek": 140852, "start": 1408.52, "end": 1412.76, + "text": " And so I''m, you know, apologies to everyone out there working on this.", + "tokens": [50364, 400, 370, 286, 478, 11, 291, 458, 11, 34929, 281, 1518, 484, 456, + 1364, 322, 341, 13, 50576], "temperature": 0.0, "avg_logprob": -0.1736291940661444, + "compression_ratio": 1.7226027397260273, "no_speech_prob": 0.0009971614927053452}, + {"id": 422, "seek": 140852, "start": 1412.76, "end": 1416.8799999999999, "text": + " My experience with it doesn''t come from the practical software engineering side + of it.", "tokens": [50576, 1222, 1752, 365, 309, 1177, 380, 808, 490, 264, 8496, + 4722, 7043, 1252, 295, 309, 13, 50782], "temperature": 0.0, "avg_logprob": -0.1736291940661444, + "compression_ratio": 1.7226027397260273, "no_speech_prob": 0.0009971614927053452}, + {"id": 423, "seek": 140852, "start": 1416.8799999999999, "end": 1420.6399999999999, + "text": " It comes from reading these research papers and then being familiar with + these ideas.", "tokens": [50782, 467, 1487, 490, 3760, 613, 2132, 10577, 293, 550, + 885, 4963, 365, 613, 3487, 13, 50970], "temperature": 0.0, "avg_logprob": -0.1736291940661444, + "compression_ratio": 1.7226027397260273, "no_speech_prob": 0.0009971614927053452}, + {"id": 424, "seek": 140852, "start": 1420.6399999999999, "end": 1424.6, "text": + " And then, I mean, V8 is easy to use.", "tokens": [50970, 400, 550, 11, 286, 914, + 11, 691, 23, 307, 1858, 281, 764, 13, 51168], "temperature": 0.0, "avg_logprob": + -0.1736291940661444, "compression_ratio": 1.7226027397260273, "no_speech_prob": + 0.0009971614927053452}, {"id": 425, "seek": 140852, "start": 1424.6, "end": 1427.56, + "text": " It''s really well, the documentation is great.", "tokens": [51168, 467, + 311, 534, 731, 11, 264, 14333, 307, 869, 13, 51316], "temperature": 0.0, "avg_logprob": + 
-0.1736291940661444, "compression_ratio": 1.7226027397260273, "no_speech_prob": + 0.0009971614927053452}, {"id": 426, "seek": 140852, "start": 1427.56, "end": 1428.84, + "text": " It''s easy to get started with it.", "tokens": [51316, 467, 311, 1858, + 281, 483, 1409, 365, 309, 13, 51380], "temperature": 0.0, "avg_logprob": -0.1736291940661444, + "compression_ratio": 1.7226027397260273, "no_speech_prob": 0.0009971614927053452}, + {"id": 427, "seek": 140852, "start": 1428.84, "end": 1433.04, "text": " So that + was a huge thing for me is, you know, when I first met Bob, first of all, you", + "tokens": [51380, 407, 300, 390, 257, 2603, 551, 337, 385, 307, 11, 291, 458, 11, + 562, 286, 700, 1131, 6085, 11, 700, 295, 439, 11, 291, 51590], "temperature": 0.0, + "avg_logprob": -0.1736291940661444, "compression_ratio": 1.7226027397260273, "no_speech_prob": + 0.0009971614927053452}, {"id": 428, "seek": 140852, "start": 1433.04, "end": 1435.96, + "text": " know, he''s a great guy and, you know, meeting this team.", "tokens": + [51590, 458, 11, 415, 311, 257, 869, 2146, 293, 11, 291, 458, 11, 3440, 341, 1469, + 13, 51736], "temperature": 0.0, "avg_logprob": -0.1736291940661444, "compression_ratio": + 1.7226027397260273, "no_speech_prob": 0.0009971614927053452}, {"id": 429, "seek": + 143596, "start": 1435.96, "end": 1439.4, "text": " They''re all really on top of + everything and their slack chat is really great.", "tokens": [50364, 814, 434, 439, + 534, 322, 1192, 295, 1203, 293, 641, 29767, 5081, 307, 534, 869, 13, 50536], "temperature": + 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": 1.7967479674796747, + "no_speech_prob": 0.025702012702822685}, {"id": 430, "seek": 143596, "start": 1439.4, + "end": 1442.52, "text": " People, you know, pitching in their problems and it''s + just a great community.", "tokens": [50536, 3432, 11, 291, 458, 11, 37499, 294, + 641, 2740, 293, 309, 311, 445, 257, 869, 1768, 13, 50692], "temperature": 0.0, "avg_logprob": + 
-0.22765669389204546, "compression_ratio": 1.7967479674796747, "no_speech_prob": + 0.025702012702822685}, {"id": 431, "seek": 143596, "start": 1442.52, "end": 1447.76, + "text": " But, you know, what, what did it for me is, so I met Bob and then I spent + about two weeks", "tokens": [50692, 583, 11, 291, 458, 11, 437, 11, 437, 630, 309, + 337, 385, 307, 11, 370, 286, 1131, 6085, 293, 550, 286, 4418, 466, 732, 3259, 50954], + "temperature": 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": 1.7967479674796747, + "no_speech_prob": 0.025702012702822685}, {"id": 432, "seek": 143596, "start": 1447.76, + "end": 1450.8, "text": " going through their documentation, the quick start, the + installation set up, you know,", "tokens": [50954, 516, 807, 641, 14333, 11, 264, + 1702, 722, 11, 264, 13260, 992, 493, 11, 291, 458, 11, 51106], "temperature": 0.0, + "avg_logprob": -0.22765669389204546, "compression_ratio": 1.7967479674796747, "no_speech_prob": + 0.025702012702822685}, {"id": 433, "seek": 143596, "start": 1450.8, "end": 1452.32, + "text": " get my data sets in there.", "tokens": [51106, 483, 452, 1412, 6352, 294, + 456, 13, 51182], "temperature": 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": + 1.7967479674796747, "no_speech_prob": 0.025702012702822685}, {"id": 434, "seek": + 143596, "start": 1452.32, "end": 1453.72, "text": " And it''s just really easy to + use.", "tokens": [51182, 400, 309, 311, 445, 534, 1858, 281, 764, 13, 51252], "temperature": + 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": 1.7967479674796747, + "no_speech_prob": 0.025702012702822685}, {"id": 435, "seek": 143596, "start": 1453.72, + "end": 1456.88, "text": " So I, and then, and then learning about all these other + things like the Python client.", "tokens": [51252, 407, 286, 11, 293, 550, 11, 293, + 550, 2539, 466, 439, 613, 661, 721, 411, 264, 15329, 6423, 13, 51410], "temperature": + 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": 
1.7967479674796747, + "no_speech_prob": 0.025702012702822685}, {"id": 436, "seek": 143596, "start": 1456.88, + "end": 1460.76, "text": " Like as we talk about fetching the context, I mean, we + want to ingrate that into a training", "tokens": [51410, 1743, 382, 321, 751, 466, + 23673, 278, 264, 4319, 11, 286, 914, 11, 321, 528, 281, 3957, 4404, 300, 666, 257, + 3097, 51604], "temperature": 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": + 1.7967479674796747, "no_speech_prob": 0.025702012702822685}, {"id": 437, "seek": + 143596, "start": 1460.76, "end": 1465.08, "text": " loop where say Facebook also + recently released internet augmented generation where they''re", "tokens": [51604, + 6367, 689, 584, 4384, 611, 3938, 4736, 4705, 36155, 5125, 689, 436, 434, 51820], + "temperature": 0.0, "avg_logprob": -0.22765669389204546, "compression_ratio": 1.7967479674796747, + "no_speech_prob": 0.025702012702822685}, {"id": 438, "seek": 146508, "start": 1465.08, + "end": 1469.8, "text": " using the Bing API to bring in the context and then learn + with that extra training.", "tokens": [50364, 1228, 264, 30755, 9362, 281, 1565, + 294, 264, 4319, 293, 550, 1466, 365, 300, 2857, 3097, 13, 50600], "temperature": + 0.0, "avg_logprob": -0.16515919470017956, "compression_ratio": 1.7560975609756098, + "no_speech_prob": 0.00039996570558287203}, {"id": 439, "seek": 146508, "start": + 1469.8, "end": 1474.24, "text": " So they have a Python client that lets you integrate + that into your model workflows.", "tokens": [50600, 407, 436, 362, 257, 15329, 6423, + 300, 6653, 291, 13365, 300, 666, 428, 2316, 43461, 13, 50822], "temperature": 0.0, + "avg_logprob": -0.16515919470017956, "compression_ratio": 1.7560975609756098, "no_speech_prob": + 0.00039996570558287203}, {"id": 440, "seek": 146508, "start": 1474.24, "end": 1478.28, + "text": " And then something we talked about in our last podcast, I love the GraphQL + interface.", "tokens": [50822, 400, 550, 746, 321, 2825, 466, 
294, 527, 1036, 7367, + 11, 286, 959, 264, 21884, 13695, 9226, 13, 51024], "temperature": 0.0, "avg_logprob": + -0.16515919470017956, "compression_ratio": 1.7560975609756098, "no_speech_prob": + 0.00039996570558287203}, {"id": 441, "seek": 146508, "start": 1478.28, "end": 1479.36, + "text": " I think it''s really cool.", "tokens": [51024, 286, 519, 309, 311, 534, + 1627, 13, 51078], "temperature": 0.0, "avg_logprob": -0.16515919470017956, "compression_ratio": + 1.7560975609756098, "no_speech_prob": 0.00039996570558287203}, {"id": 442, "seek": + 146508, "start": 1479.36, "end": 1481.28, "text": " And I love the web demo.", "tokens": + [51078, 400, 286, 959, 264, 3670, 10723, 13, 51174], "temperature": 0.0, "avg_logprob": + -0.16515919470017956, "compression_ratio": 1.7560975609756098, "no_speech_prob": + 0.00039996570558287203}, {"id": 443, "seek": 146508, "start": 1481.28, "end": 1486.08, + "text": " So you can, you know, get started with the GraphQL interface and you can + practice your", "tokens": [51174, 407, 291, 393, 11, 291, 458, 11, 483, 1409, 365, + 264, 21884, 13695, 9226, 293, 291, 393, 3124, 428, 51414], "temperature": 0.0, "avg_logprob": + -0.16515919470017956, "compression_ratio": 1.7560975609756098, "no_speech_prob": + 0.00039996570558287203}, {"id": 444, "seek": 146508, "start": 1486.08, "end": 1491.1599999999999, + "text": " queries, you know, you know, learn it quickly before you make any commitment + of installing", "tokens": [51414, 24109, 11, 291, 458, 11, 291, 458, 11, 1466, 309, + 2661, 949, 291, 652, 604, 8371, 295, 20762, 51668], "temperature": 0.0, "avg_logprob": + -0.16515919470017956, "compression_ratio": 1.7560975609756098, "no_speech_prob": + 0.00039996570558287203}, {"id": 445, "seek": 146508, "start": 1491.1599999999999, + "end": 1492.96, "text": " your mouse database.", "tokens": [51668, 428, 9719, 8149, + 13, 51758], "temperature": 0.0, "avg_logprob": -0.16515919470017956, "compression_ratio": + 1.7560975609756098, "no_speech_prob": 
0.00039996570558287203}, {"id": 446, "seek": + 149296, "start": 1492.96, "end": 1498.8, "text": " So yeah, and I just think we + be it is like a beautiful technology that''s making my, my", "tokens": [50364, 407, + 1338, 11, 293, 286, 445, 519, 321, 312, 309, 307, 411, 257, 2238, 2899, 300, 311, + 1455, 452, 11, 452, 50656], "temperature": 0.0, "avg_logprob": -0.24151722352896163, + "compression_ratio": 1.674911660777385, "no_speech_prob": 0.015586864203214645}, + {"id": 447, "seek": 149296, "start": 1498.8, "end": 1501.64, "text": " life is trying + to do deep learning research just a lot easier.", "tokens": [50656, 993, 307, 1382, + 281, 360, 2452, 2539, 2132, 445, 257, 688, 3571, 13, 50798], "temperature": 0.0, + "avg_logprob": -0.24151722352896163, "compression_ratio": 1.674911660777385, "no_speech_prob": + 0.015586864203214645}, {"id": 448, "seek": 149296, "start": 1501.64, "end": 1506.1200000000001, + "text": " So, you know, it''s awesome that they''re willing to support Henry AI + labs and help me continue", "tokens": [50798, 407, 11, 291, 458, 11, 309, 311, 3476, + 300, 436, 434, 4950, 281, 1406, 11085, 7318, 20339, 293, 854, 385, 2354, 51022], + "temperature": 0.0, "avg_logprob": -0.24151722352896163, "compression_ratio": 1.674911660777385, + "no_speech_prob": 0.015586864203214645}, {"id": 449, "seek": 149296, "start": 1506.1200000000001, + "end": 1507.44, "text": " making content on YouTube.", "tokens": [51022, 1455, 2701, + 322, 3088, 13, 51088], "temperature": 0.0, "avg_logprob": -0.24151722352896163, + "compression_ratio": 1.674911660777385, "no_speech_prob": 0.015586864203214645}, + {"id": 450, "seek": 149296, "start": 1507.44, "end": 1512.24, "text": " Well, at + the same time, it''s a, you know, it''s a tool that helps me do what I want to do", + "tokens": [51088, 1042, 11, 412, 264, 912, 565, 11, 309, 311, 257, 11, 291, 458, + 11, 309, 311, 257, 2290, 300, 3665, 385, 360, 437, 286, 528, 281, 360, 51328], "temperature": + 0.0, "avg_logprob": 
-0.24151722352896163, "compression_ratio": 1.674911660777385, + "no_speech_prob": 0.015586864203214645}, {"id": 451, "seek": 149296, "start": 1512.24, + "end": 1513.8, "text": " with this kind of research.", "tokens": [51328, 365, 341, + 733, 295, 2132, 13, 51406], "temperature": 0.0, "avg_logprob": -0.24151722352896163, + "compression_ratio": 1.674911660777385, "no_speech_prob": 0.015586864203214645}, + {"id": 452, "seek": 149296, "start": 1513.8, "end": 1514.8, "text": " Yeah.", "tokens": + [51406, 865, 13, 51456], "temperature": 0.0, "avg_logprob": -0.24151722352896163, + "compression_ratio": 1.674911660777385, "no_speech_prob": 0.015586864203214645}, + {"id": 453, "seek": 149296, "start": 1514.8, "end": 1519.04, "text": " And are you + like already using via V8 in your research or planning to use?", "tokens": [51456, + 400, 366, 291, 411, 1217, 1228, 5766, 691, 23, 294, 428, 2132, 420, 5038, 281, 764, + 30, 51668], "temperature": 0.0, "avg_logprob": -0.24151722352896163, "compression_ratio": + 1.674911660777385, "no_speech_prob": 0.015586864203214645}, {"id": 454, "seek": + 149296, "start": 1519.04, "end": 1520.04, "text": " Yeah.", "tokens": [51668, 865, + 13, 51718], "temperature": 0.0, "avg_logprob": -0.24151722352896163, "compression_ratio": + 1.674911660777385, "no_speech_prob": 0.015586864203214645}, {"id": 455, "seek": + 152004, "start": 1520.6399999999999, "end": 1523.68, "text": " So I haven''t really + made a Henry AI labs video on this yet, but it''s something I''m really", "tokens": + [50394, 407, 286, 2378, 380, 534, 1027, 257, 11085, 7318, 20339, 960, 322, 341, + 1939, 11, 457, 309, 311, 746, 286, 478, 534, 50546], "temperature": 0.0, "avg_logprob": + -0.3061877809721848, "compression_ratio": 1.710344827586207, "no_speech_prob": 0.43473872542381287}, + {"id": 456, "seek": 152004, "start": 1523.68, "end": 1524.68, "text": " excited + about.", "tokens": [50546, 2919, 466, 13, 50596], "temperature": 0.0, "avg_logprob": + -0.3061877809721848, 
"compression_ratio": 1.710344827586207, "no_speech_prob": 0.43473872542381287}, + {"id": 457, "seek": 152004, "start": 1524.68, "end": 1531.28, "text": " So one paper + I recently had accepted in ICML A, not quite ICML, but ICML A, it''s application", + "tokens": [50596, 407, 472, 3035, 286, 3938, 632, 9035, 294, 14360, 12683, 316, + 11, 406, 1596, 14360, 12683, 11, 457, 14360, 12683, 316, 11, 309, 311, 3861, 50926], + "temperature": 0.0, "avg_logprob": -0.3061877809721848, "compression_ratio": 1.710344827586207, + "no_speech_prob": 0.43473872542381287}, {"id": 458, "seek": 152004, "start": 1531.28, + "end": 1532.28, "text": " to add it to it.", "tokens": [50926, 281, 909, 309, 281, + 309, 13, 50976], "temperature": 0.0, "avg_logprob": -0.3061877809721848, "compression_ratio": + 1.710344827586207, "no_speech_prob": 0.43473872542381287}, {"id": 459, "seek": 152004, + "start": 1532.28, "end": 1537.12, "text": " But it''s a, it''s a caros, Bert is + the title of the paper and it''s about, you know, language", "tokens": [50976, 583, + 309, 311, 257, 11, 309, 311, 257, 1032, 329, 11, 29594, 307, 264, 4876, 295, 264, + 3035, 293, 309, 311, 466, 11, 291, 458, 11, 2856, 51218], "temperature": 0.0, "avg_logprob": + -0.3061877809721848, "compression_ratio": 1.710344827586207, "no_speech_prob": 0.43473872542381287}, + {"id": 460, "seek": 152004, "start": 1537.12, "end": 1541.12, "text": " modeling + with caros documentation and caros code examples and, you know, like Syek", "tokens": + [51218, 15983, 365, 1032, 329, 14333, 293, 1032, 329, 3089, 5110, 293, 11, 291, + 458, 11, 411, 3902, 916, 51418], "temperature": 0.0, "avg_logprob": -0.3061877809721848, + "compression_ratio": 1.710344827586207, "no_speech_prob": 0.43473872542381287}, + {"id": 461, "seek": 152004, "start": 1541.12, "end": 1544.8799999999999, "text": + " Paul, Franceschal Leigh, they''re going crazy with these caros code examples.", + "tokens": [51418, 4552, 11, 31441, 339, 304, 1456, 910, 11, 436, 434, 516, 
3219, + 365, 613, 1032, 329, 3089, 5110, 13, 51606], "temperature": 0.0, "avg_logprob": + -0.3061877809721848, "compression_ratio": 1.710344827586207, "no_speech_prob": 0.43473872542381287}, + {"id": 462, "seek": 152004, "start": 1544.8799999999999, "end": 1546.3999999999999, + "text": " And there''s so many examples.", "tokens": [51606, 400, 456, 311, 370, + 867, 5110, 13, 51682], "temperature": 0.0, "avg_logprob": -0.3061877809721848, "compression_ratio": + 1.710344827586207, "no_speech_prob": 0.43473872542381287}, {"id": 463, "seek": 154640, + "start": 1546.4, "end": 1551.3600000000001, "text": " Like you could, you have like + a PhD and more organized completely online on this caros", "tokens": [50364, 1743, + 291, 727, 11, 291, 362, 411, 257, 14476, 293, 544, 9983, 2584, 2950, 322, 341, 1032, + 329, 50612], "temperature": 0.0, "avg_logprob": -0.2208238425829732, "compression_ratio": + 1.863481228668942, "no_speech_prob": 0.024185435846447945}, {"id": 464, "seek": + 154640, "start": 1551.3600000000001, "end": 1552.3600000000001, "text": " code examples + to me.", "tokens": [50612, 3089, 5110, 281, 385, 13, 50662], "temperature": 0.0, + "avg_logprob": -0.2208238425829732, "compression_ratio": 1.863481228668942, "no_speech_prob": + 0.024185435846447945}, {"id": 465, "seek": 154640, "start": 1552.3600000000001, + "end": 1555.96, "text": " It''s like the most interesting collection of deep learning + information on the internet", "tokens": [50662, 467, 311, 411, 264, 881, 1880, 5765, + 295, 2452, 2539, 1589, 322, 264, 4705, 50842], "temperature": 0.0, "avg_logprob": + -0.2208238425829732, "compression_ratio": 1.863481228668942, "no_speech_prob": 0.024185435846447945}, + {"id": 466, "seek": 154640, "start": 1555.96, "end": 1557.96, "text": " as the caros + code examples.", "tokens": [50842, 382, 264, 1032, 329, 3089, 5110, 13, 50942], + "temperature": 0.0, "avg_logprob": -0.2208238425829732, "compression_ratio": 1.863481228668942, + "no_speech_prob": 
0.024185435846447945}, {"id": 467, "seek": 154640, "start": 1557.96, + "end": 1561.88, "text": " So from there, there''s like two ideas is like, can we + build a language model that can", "tokens": [50942, 407, 490, 456, 11, 456, 311, + 411, 732, 3487, 307, 411, 11, 393, 321, 1322, 257, 2856, 2316, 300, 393, 51138], + "temperature": 0.0, "avg_logprob": -0.2208238425829732, "compression_ratio": 1.863481228668942, + "no_speech_prob": 0.024185435846447945}, {"id": 468, "seek": 154640, "start": 1561.88, + "end": 1565.48, "text": " like debug your caros code for you and, you know, open + AI code X. Everyone knows that", "tokens": [51138, 411, 24083, 428, 1032, 329, 3089, + 337, 291, 293, 11, 291, 458, 11, 1269, 7318, 3089, 1783, 13, 5198, 3255, 300, 51318], + "temperature": 0.0, "avg_logprob": -0.2208238425829732, "compression_ratio": 1.863481228668942, + "no_speech_prob": 0.024185435846447945}, {"id": 469, "seek": 154640, "start": 1565.48, + "end": 1568.2, "text": " it looks like the answer to that is yes.", "tokens": [51318, + 309, 1542, 411, 264, 1867, 281, 300, 307, 2086, 13, 51454], "temperature": 0.0, + "avg_logprob": -0.2208238425829732, "compression_ratio": 1.863481228668942, "no_speech_prob": + 0.024185435846447945}, {"id": 470, "seek": 154640, "start": 1568.2, "end": 1571.0800000000002, + "text": " And you know, they have the lead code, they have data sets of like lead + code.", "tokens": [51454, 400, 291, 458, 11, 436, 362, 264, 1477, 3089, 11, 436, + 362, 1412, 6352, 295, 411, 1477, 3089, 13, 51598], "temperature": 0.0, "avg_logprob": + -0.2208238425829732, "compression_ratio": 1.863481228668942, "no_speech_prob": 0.024185435846447945}, + {"id": 471, "seek": 154640, "start": 1571.0800000000002, "end": 1573.0800000000002, + "text": " I know everyone loves lead code.", "tokens": [51598, 286, 458, 1518, 6752, + 1477, 3089, 13, 51698], "temperature": 0.0, "avg_logprob": -0.2208238425829732, + "compression_ratio": 1.863481228668942, "no_speech_prob": 
0.024185435846447945}, + {"id": 472, "seek": 157308, "start": 1573.08, "end": 1576.32, "text": " And everyone + is looking for a job.", "tokens": [50364, 400, 1518, 307, 1237, 337, 257, 1691, + 13, 50526], "temperature": 0.0, "avg_logprob": -0.2806231490964812, "compression_ratio": + 1.7338709677419355, "no_speech_prob": 0.029828401282429695}, {"id": 473, "seek": + 157308, "start": 1576.32, "end": 1582.12, "text": " Yeah, code X is, you know, able + to pass these lead code tests.", "tokens": [50526, 865, 11, 3089, 1783, 307, 11, + 291, 458, 11, 1075, 281, 1320, 613, 1477, 3089, 6921, 13, 50816], "temperature": + 0.0, "avg_logprob": -0.2806231490964812, "compression_ratio": 1.7338709677419355, + "no_speech_prob": 0.029828401282429695}, {"id": 474, "seek": 157308, "start": 1582.12, + "end": 1585.8, "text": " So, you know, and I, you know, I''d say some lead code + tests are harder than the deep learning", "tokens": [50816, 407, 11, 291, 458, 11, + 293, 286, 11, 291, 458, 11, 286, 1116, 584, 512, 1477, 3089, 6921, 366, 6081, 813, + 264, 2452, 2539, 51000], "temperature": 0.0, "avg_logprob": -0.2806231490964812, + "compression_ratio": 1.7338709677419355, "no_speech_prob": 0.029828401282429695}, + {"id": 475, "seek": 157308, "start": 1585.8, "end": 1586.8, "text": " debugging.", + "tokens": [51000, 45592, 13, 51050], "temperature": 0.0, "avg_logprob": -0.2806231490964812, + "compression_ratio": 1.7338709677419355, "no_speech_prob": 0.029828401282429695}, + {"id": 476, "seek": 157308, "start": 1586.8, "end": 1591.4399999999998, "text": + " So, you know, it looks like it looks like a pretty promising solution.", "tokens": + [51050, 407, 11, 291, 458, 11, 309, 1542, 411, 309, 1542, 411, 257, 1238, 20257, + 3827, 13, 51282], "temperature": 0.0, "avg_logprob": -0.2806231490964812, "compression_ratio": + 1.7338709677419355, "no_speech_prob": 0.029828401282429695}, {"id": 477, "seek": + 157308, "start": 1591.4399999999998, "end": 1595.6, "text": " And so in the second + 
project I have that I''m integrating WeeVeate, what to help me", "tokens": [51282, + 400, 370, 294, 264, 1150, 1716, 286, 362, 300, 286, 478, 26889, 492, 68, 53, 68, + 473, 11, 437, 281, 854, 385, 51490], "temperature": 0.0, "avg_logprob": -0.2806231490964812, + "compression_ratio": 1.7338709677419355, "no_speech_prob": 0.029828401282429695}, + {"id": 478, "seek": 157308, "start": 1595.6, "end": 1600.56, "text": " do is, is, + you know, Facebook is big on unsupervised machine translation.", "tokens": [51490, + 360, 307, 11, 307, 11, 291, 458, 11, 4384, 307, 955, 322, 2693, 12879, 24420, 3479, + 12853, 13, 51738], "temperature": 0.0, "avg_logprob": -0.2806231490964812, "compression_ratio": + 1.7338709677419355, "no_speech_prob": 0.029828401282429695}, {"id": 479, "seek": + 160056, "start": 1600.56, "end": 1605.76, "text": " They did a paper where they''re + translating between Python and JavaScript without any annotation.", "tokens": [50364, + 814, 630, 257, 3035, 689, 436, 434, 35030, 1296, 15329, 293, 15778, 1553, 604, 48654, + 13, 50624], "temperature": 0.0, "avg_logprob": -0.21026254918453466, "compression_ratio": + 1.7725752508361203, "no_speech_prob": 0.03237887844443321}, {"id": 480, "seek": + 160056, "start": 1605.76, "end": 1611.6799999999998, "text": " So maybe we can translate + between caros and PyTorch without needing to, or PyTorch and", "tokens": [50624, + 407, 1310, 321, 393, 13799, 1296, 1032, 329, 293, 9953, 51, 284, 339, 1553, 18006, + 281, 11, 420, 9953, 51, 284, 339, 293, 50920], "temperature": 0.0, "avg_logprob": + -0.21026254918453466, "compression_ratio": 1.7725752508361203, "no_speech_prob": + 0.03237887844443321}, {"id": 481, "seek": 160056, "start": 1611.6799999999998, "end": + 1615.6399999999999, "text": " Jack''s even to, without, you know, somehow without + much labeling.", "tokens": [50920, 4718, 311, 754, 281, 11, 1553, 11, 291, 458, + 11, 6063, 1553, 709, 40244, 13, 51118], "temperature": 0.0, "avg_logprob": -0.21026254918453466, + 
"compression_ratio": 1.7725752508361203, "no_speech_prob": 0.03237887844443321}, + {"id": 482, "seek": 160056, "start": 1615.6399999999999, "end": 1619.04, "text": + " And this is very much an infant research project.", "tokens": [51118, 400, 341, + 307, 588, 709, 364, 16757, 2132, 1716, 13, 51288], "temperature": 0.0, "avg_logprob": + -0.21026254918453466, "compression_ratio": 1.7725752508361203, "no_speech_prob": + 0.03237887844443321}, {"id": 483, "seek": 160056, "start": 1619.04, "end": 1623.1599999999999, + "text": " But if you have that, if you could bring the caros code examples to PyTorch + and Jack''s", "tokens": [51288, 583, 498, 291, 362, 300, 11, 498, 291, 727, 1565, + 264, 1032, 329, 3089, 5110, 281, 9953, 51, 284, 339, 293, 4718, 311, 51494], "temperature": + 0.0, "avg_logprob": -0.21026254918453466, "compression_ratio": 1.7725752508361203, + "no_speech_prob": 0.03237887844443321}, {"id": 484, "seek": 160056, "start": 1623.1599999999999, + "end": 1626.32, "text": " and just, you know, help people share this knowledge.", + "tokens": [51494, 293, 445, 11, 291, 458, 11, 854, 561, 2073, 341, 3601, 13, 51652], + "temperature": 0.0, "avg_logprob": -0.21026254918453466, "compression_ratio": 1.7725752508361203, + "no_speech_prob": 0.03237887844443321}, {"id": 485, "seek": 160056, "start": 1626.32, + "end": 1630.44, "text": " So, so this is like two of my personal projects that I''ve + started integrating WeeVeate in", "tokens": [51652, 407, 11, 370, 341, 307, 411, + 732, 295, 452, 2973, 4455, 300, 286, 600, 1409, 26889, 492, 68, 53, 68, 473, 294, + 51858], "temperature": 0.0, "avg_logprob": -0.21026254918453466, "compression_ratio": + 1.7725752508361203, "no_speech_prob": 0.03237887844443321}, {"id": 486, "seek": + 163044, "start": 1630.44, "end": 1634.0800000000002, "text": " and then one of the + project that I''m, you know, extremely passionate about and really", "tokens": [50364, + 293, 550, 472, 295, 264, 1716, 300, 286, 478, 11, 291, 458, 11, 4664, 
11410, 466, + 293, 534, 50546], "temperature": 0.0, "avg_logprob": -0.21051079322551858, "compression_ratio": + 1.723127035830619, "no_speech_prob": 0.0003647717530839145}, {"id": 487, "seek": + 163044, "start": 1634.0800000000002, "end": 1637.0, "text": " into with my involvement + with the university.", "tokens": [50546, 666, 365, 452, 17447, 365, 264, 5454, 13, + 50692], "temperature": 0.0, "avg_logprob": -0.21051079322551858, "compression_ratio": + 1.723127035830619, "no_speech_prob": 0.0003647717530839145}, {"id": 488, "seek": + 163044, "start": 1637.0, "end": 1641.4, "text": " And this is kind of a separate + thing that I''m not too heavy on because I don''t want to", "tokens": [50692, 400, + 341, 307, 733, 295, 257, 4994, 551, 300, 286, 478, 406, 886, 4676, 322, 570, 286, + 500, 380, 528, 281, 50912], "temperature": 0.0, "avg_logprob": -0.21051079322551858, + "compression_ratio": 1.723127035830619, "no_speech_prob": 0.0003647717530839145}, + {"id": 489, "seek": 163044, "start": 1641.4, "end": 1643.6000000000001, "text": + " like kind of push the commercial interest too much.", "tokens": [50912, 411, 733, + 295, 2944, 264, 6841, 1179, 886, 709, 13, 51022], "temperature": 0.0, "avg_logprob": + -0.21051079322551858, "compression_ratio": 1.723127035830619, "no_speech_prob": + 0.0003647717530839145}, {"id": 490, "seek": 163044, "start": 1643.6000000000001, + "end": 1645.76, "text": " It''s, you know, and WeeVeate is open source.", "tokens": + [51022, 467, 311, 11, 291, 458, 11, 293, 492, 68, 53, 68, 473, 307, 1269, 4009, + 13, 51130], "temperature": 0.0, "avg_logprob": -0.21051079322551858, "compression_ratio": + 1.723127035830619, "no_speech_prob": 0.0003647717530839145}, {"id": 491, "seek": + 163044, "start": 1645.76, "end": 1647.3600000000001, "text": " So it''s an open + source software.", "tokens": [51130, 407, 309, 311, 364, 1269, 4009, 4722, 13, 51210], + "temperature": 0.0, "avg_logprob": -0.21051079322551858, "compression_ratio": 1.723127035830619, + 
"no_speech_prob": 0.0003647717530839145}, {"id": 492, "seek": 163044, "start": 1647.3600000000001, + "end": 1650.0800000000002, "text": " We have, we can download it from GitHub and + we have it.", "tokens": [51210, 492, 362, 11, 321, 393, 5484, 309, 490, 23331, 293, + 321, 362, 309, 13, 51346], "temperature": 0.0, "avg_logprob": -0.21051079322551858, + "compression_ratio": 1.723127035830619, "no_speech_prob": 0.0003647717530839145}, + {"id": 493, "seek": 163044, "start": 1650.0800000000002, "end": 1652.4, "text": + " So they can''t, you know, take it away.", "tokens": [51346, 407, 436, 393, 380, + 11, 291, 458, 11, 747, 309, 1314, 13, 51462], "temperature": 0.0, "avg_logprob": + -0.21051079322551858, "compression_ratio": 1.723127035830619, "no_speech_prob": + 0.0003647717530839145}, {"id": 494, "seek": 163044, "start": 1652.4, "end": 1657.76, + "text": " And so, so this other project is, we''re trying to build patient information + retrieval", "tokens": [51462, 400, 370, 11, 370, 341, 661, 1716, 307, 11, 321, 434, + 1382, 281, 1322, 4537, 1589, 19817, 3337, 51730], "temperature": 0.0, "avg_logprob": + -0.21051079322551858, "compression_ratio": 1.723127035830619, "no_speech_prob": + 0.0003647717530839145}, {"id": 495, "seek": 165776, "start": 1657.76, "end": 1663.08, + "text": " systems where you, you know, you come to the hospital and they start to + record your, you", "tokens": [50364, 3652, 689, 291, 11, 291, 458, 11, 291, 808, + 281, 264, 4530, 293, 436, 722, 281, 2136, 428, 11, 291, 50630], "temperature": 0.0, + "avg_logprob": -0.32687178654457205, "compression_ratio": 1.7233333333333334, "no_speech_prob": + 0.07071597874164581}, {"id": 496, "seek": 165776, "start": 1663.08, "end": 1668.24, + "text": " know, coagulation studies, they, all the physiological markers and the + genetic history.", "tokens": [50630, 458, 11, 598, 559, 2776, 5313, 11, 436, 11, + 439, 264, 41234, 19175, 293, 264, 12462, 2503, 13, 50888], "temperature": 0.0, "avg_logprob": + 
-0.32687178654457205, "compression_ratio": 1.7233333333333334, "no_speech_prob": + 0.07071597874164581}, {"id": 497, "seek": 165776, "start": 1668.24, "end": 1670.04, + "text": " And we want to go query the literature maybe.", "tokens": [50888, 400, + 321, 528, 281, 352, 14581, 264, 10394, 1310, 13, 50978], "temperature": 0.0, "avg_logprob": + -0.32687178654457205, "compression_ratio": 1.7233333333333334, "no_speech_prob": + 0.07071597874164581}, {"id": 498, "seek": 165776, "start": 1670.04, "end": 1674.84, + "text": " So this is, you know, as a research project and the on Institute has been + pioneering this", "tokens": [50978, 407, 341, 307, 11, 291, 458, 11, 382, 257, 2132, + 1716, 293, 264, 322, 9446, 575, 668, 19761, 1794, 341, 51218], "temperature": 0.0, + "avg_logprob": -0.32687178654457205, "compression_ratio": 1.7233333333333334, "no_speech_prob": + 0.07071597874164581}, {"id": 499, "seek": 165776, "start": 1674.84, "end": 1679.28, + "text": " with data sets like core 19 and their system called sub.ai.", "tokens": + [51218, 365, 1412, 6352, 411, 4965, 1294, 293, 641, 1185, 1219, 1422, 13, 1301, + 13, 51440], "temperature": 0.0, "avg_logprob": -0.32687178654457205, "compression_ratio": + 1.7233333333333334, "no_speech_prob": 0.07071597874164581}, {"id": 500, "seek": + 165776, "start": 1679.28, "end": 1681.72, "text": " Salesforce research had a system + called co-search.", "tokens": [51440, 40398, 2132, 632, 257, 1185, 1219, 598, 12, + 405, 1178, 13, 51562], "temperature": 0.0, "avg_logprob": -0.32687178654457205, + "compression_ratio": 1.7233333333333334, "no_speech_prob": 0.07071597874164581}, + {"id": 501, "seek": 165776, "start": 1681.72, "end": 1683.32, "text": " I''m just + kind of naming things for people.", "tokens": [51562, 286, 478, 445, 733, 295, 25290, + 721, 337, 561, 13, 51642], "temperature": 0.0, "avg_logprob": -0.32687178654457205, + "compression_ratio": 1.7233333333333334, "no_speech_prob": 0.07071597874164581}, + {"id": 502, "seek": 
165776, "start": 1683.32, "end": 1685.8, "text": " Oh my god, + I''m not going to describe these things.", "tokens": [51642, 876, 452, 3044, 11, + 286, 478, 406, 516, 281, 6786, 613, 721, 13, 51766], "temperature": 0.0, "avg_logprob": + -0.32687178654457205, "compression_ratio": 1.7233333333333334, "no_speech_prob": + 0.07071597874164581}, {"id": 503, "seek": 168580, "start": 1685.9199999999998, "end": + 1689.96, "text": " So these are like literature scientific literature mining systems + where you, you know,", "tokens": [50370, 407, 613, 366, 411, 10394, 8134, 10394, + 15512, 3652, 689, 291, 11, 291, 458, 11, 50572], "temperature": 0.0, "avg_logprob": + -0.24364545004708427, "compression_ratio": 1.797583081570997, "no_speech_prob": + 0.005606834311038256}, {"id": 504, "seek": 168580, "start": 1689.96, "end": 1694.76, + "text": " you want information about say COVID-19 and or, you know, someone''s coming + in there", "tokens": [50572, 291, 528, 1589, 466, 584, 4566, 12, 3405, 293, 420, + 11, 291, 458, 11, 1580, 311, 1348, 294, 456, 50812], "temperature": 0.0, "avg_logprob": + -0.24364545004708427, "compression_ratio": 1.797583081570997, "no_speech_prob": + 0.005606834311038256}, {"id": 505, "seek": 168580, "start": 1694.76, "end": 1697.76, + "text": " with some obscure disease, you want to be able to query the literature + with particular", "tokens": [50812, 365, 512, 34443, 4752, 11, 291, 528, 281, 312, + 1075, 281, 14581, 264, 10394, 365, 1729, 50962], "temperature": 0.0, "avg_logprob": + -0.24364545004708427, "compression_ratio": 1.797583081570997, "no_speech_prob": + 0.005606834311038256}, {"id": 506, "seek": 168580, "start": 1697.76, "end": 1699.0, + "text": " information about this patient.", "tokens": [50962, 1589, 466, 341, 4537, + 13, 51024], "temperature": 0.0, "avg_logprob": -0.24364545004708427, "compression_ratio": + 1.797583081570997, "no_speech_prob": 0.005606834311038256}, {"id": 507, "seek": + 168580, "start": 1699.0, "end": 1702.1599999999999, 
"text": " And so this is the + information retrieval problem that, you know, we''re super interested", "tokens": + [51024, 400, 370, 341, 307, 264, 1589, 19817, 3337, 1154, 300, 11, 291, 458, 11, + 321, 434, 1687, 3102, 51182], "temperature": 0.0, "avg_logprob": -0.24364545004708427, + "compression_ratio": 1.797583081570997, "no_speech_prob": 0.005606834311038256}, + {"id": 508, "seek": 168580, "start": 1702.1599999999999, "end": 1703.84, "text": + " in as spectrature search engine people.", "tokens": [51182, 294, 382, 6177, 81, + 1503, 3164, 2848, 561, 13, 51266], "temperature": 0.0, "avg_logprob": -0.24364545004708427, + "compression_ratio": 1.797583081570997, "no_speech_prob": 0.005606834311038256}, + {"id": 509, "seek": 168580, "start": 1703.84, "end": 1709.44, "text": " So we''re + trying to turn these patients into, which is what I have is mostly tabular data.", + "tokens": [51266, 407, 321, 434, 1382, 281, 1261, 613, 4209, 666, 11, 597, 307, + 437, 286, 362, 307, 5240, 4421, 1040, 1412, 13, 51546], "temperature": 0.0, "avg_logprob": + -0.24364545004708427, "compression_ratio": 1.797583081570997, "no_speech_prob": + 0.005606834311038256}, {"id": 510, "seek": 168580, "start": 1709.44, "end": 1713.96, + "text": " You might get a little bit of medical images, some clinical reports for + some text, but,", "tokens": [51546, 509, 1062, 483, 257, 707, 857, 295, 4625, 5267, + 11, 512, 9115, 7122, 337, 512, 2487, 11, 457, 11, 51772], "temperature": 0.0, "avg_logprob": + -0.24364545004708427, "compression_ratio": 1.797583081570997, "no_speech_prob": + 0.005606834311038256}, {"id": 511, "seek": 171396, "start": 1714.64, "end": 1715.76, + "text": " yeah, mostly tabular data.", "tokens": [50398, 1338, 11, 5240, 4421, 1040, + 1412, 13, 50454], "temperature": 0.0, "avg_logprob": -0.21258800617162732, "compression_ratio": + 1.679245283018868, "no_speech_prob": 0.007347135804593563}, {"id": 512, "seek": + 171396, "start": 1715.76, "end": 1720.16, "text": " So we want to 
encode that into + vectors, send those vectors into the scientific literature,", "tokens": [50454, + 407, 321, 528, 281, 2058, 1429, 300, 666, 18875, 11, 2845, 729, 18875, 666, 264, + 8134, 10394, 11, 50674], "temperature": 0.0, "avg_logprob": -0.21258800617162732, + "compression_ratio": 1.679245283018868, "no_speech_prob": 0.007347135804593563}, + {"id": 513, "seek": 171396, "start": 1720.16, "end": 1724.8, "text": " and then + maybe there''s some clinical trial, you know, because it''s so much data.", "tokens": + [50674, 293, 550, 1310, 456, 311, 512, 9115, 7308, 11, 291, 458, 11, 570, 309, 311, + 370, 709, 1412, 13, 50906], "temperature": 0.0, "avg_logprob": -0.21258800617162732, + "compression_ratio": 1.679245283018868, "no_speech_prob": 0.007347135804593563}, + {"id": 514, "seek": 171396, "start": 1724.8, "end": 1728.96, "text": " Once you + really download, like say the core 19 data set from the on Institute,", "tokens": + [50906, 3443, 291, 534, 5484, 11, 411, 584, 264, 4965, 1294, 1412, 992, 490, 264, + 322, 9446, 11, 51114], "temperature": 0.0, "avg_logprob": -0.21258800617162732, + "compression_ratio": 1.679245283018868, "no_speech_prob": 0.007347135804593563}, + {"id": 515, "seek": 171396, "start": 1728.96, "end": 1734.48, "text": " you''ll + realize that, you know, 500,000 papers about COVID is nothing anyone could read.", + "tokens": [51114, 291, 603, 4325, 300, 11, 291, 458, 11, 5923, 11, 1360, 10577, + 466, 4566, 307, 1825, 2878, 727, 1401, 13, 51390], "temperature": 0.0, "avg_logprob": + -0.21258800617162732, "compression_ratio": 1.679245283018868, "no_speech_prob": + 0.007347135804593563}, {"id": 516, "seek": 171396, "start": 1734.48, "end": 1737.52, + "text": " You know, I already know this from reading deep learning papers.", "tokens": + [51390, 509, 458, 11, 286, 1217, 458, 341, 490, 3760, 2452, 2539, 10577, 13, 51542], + "temperature": 0.0, "avg_logprob": -0.21258800617162732, "compression_ratio": 1.679245283018868, + "no_speech_prob": 
0.007347135804593563}, {"id": 517, "seek": 171396, "start": 1737.52, + "end": 1739.24, "text": " It''s like no one can read this.", "tokens": [51542, 467, + 311, 411, 572, 472, 393, 1401, 341, 13, 51628], "temperature": 0.0, "avg_logprob": + -0.21258800617162732, "compression_ratio": 1.679245283018868, "no_speech_prob": + 0.007347135804593563}, {"id": 518, "seek": 171396, "start": 1739.24, "end": 1743.2, + "text": " And even like, if you go traditional way, and I wanted those at the top", + "tokens": [51628, 400, 754, 411, 11, 498, 291, 352, 5164, 636, 11, 293, 286, 1415, + 729, 412, 264, 1192, 51826], "temperature": 0.0, "avg_logprob": -0.21258800617162732, + "compression_ratio": 1.679245283018868, "no_speech_prob": 0.007347135804593563}, + {"id": 519, "seek": 174320, "start": 1743.24, "end": 1748.1200000000001, "text": + " in, in this area, you know, like if you go traditional way, let''s say you have + a keyword look up,", "tokens": [50366, 294, 11, 294, 341, 1859, 11, 291, 458, 11, + 411, 498, 291, 352, 5164, 636, 11, 718, 311, 584, 291, 362, 257, 20428, 574, 493, + 11, 50610], "temperature": 0.0, "avg_logprob": -0.22133786538067987, "compression_ratio": + 1.775438596491228, "no_speech_prob": 0.003284771926701069}, {"id": 520, "seek": + 174320, "start": 1748.1200000000001, "end": 1752.68, "text": " right? 
So keyword + search, you would have to build like some kind of synonym layer,", "tokens": [50610, + 558, 30, 407, 20428, 3164, 11, 291, 576, 362, 281, 1322, 411, 512, 733, 295, 5451, + 12732, 4583, 11, 50838], "temperature": 0.0, "avg_logprob": -0.22133786538067987, + "compression_ratio": 1.775438596491228, "no_speech_prob": 0.003284771926701069}, + {"id": 521, "seek": 174320, "start": 1752.68, "end": 1757.76, "text": " which means + you need to understand what you''re doing, or you will need to hire somebody to + do that.", "tokens": [50838, 597, 1355, 291, 643, 281, 1223, 437, 291, 434, 884, + 11, 420, 291, 486, 643, 281, 11158, 2618, 281, 360, 300, 13, 51092], "temperature": + 0.0, "avg_logprob": -0.22133786538067987, "compression_ratio": 1.775438596491228, + "no_speech_prob": 0.003284771926701069}, {"id": 522, "seek": 174320, "start": 1757.76, + "end": 1762.92, "text": " And that''s like an additional step, which kind of like, + you know, doesn''t reduce the journey", "tokens": [51092, 400, 300, 311, 411, 364, + 4497, 1823, 11, 597, 733, 295, 411, 11, 291, 458, 11, 1177, 380, 5407, 264, 4671, + 51350], "temperature": 0.0, "avg_logprob": -0.22133786538067987, "compression_ratio": + 1.775438596491228, "no_speech_prob": 0.003284771926701069}, {"id": 523, "seek": + 174320, "start": 1762.92, "end": 1768.4, "text": " for you. 
You have to do that + and this is that you feel like you have more control, maybe,", "tokens": [51350, + 337, 291, 13, 509, 362, 281, 360, 300, 293, 341, 307, 300, 291, 841, 411, 291, 362, + 544, 1969, 11, 1310, 11, 51624], "temperature": 0.0, "avg_logprob": -0.22133786538067987, + "compression_ratio": 1.775438596491228, "no_speech_prob": 0.003284771926701069}, + {"id": 524, "seek": 174320, "start": 1768.4, "end": 1770.8400000000001, "text": + " but at the same time, it''s very laborious.", "tokens": [51624, 457, 412, 264, + 912, 565, 11, 309, 311, 588, 5938, 851, 13, 51746], "temperature": 0.0, "avg_logprob": + -0.22133786538067987, "compression_ratio": 1.775438596491228, "no_speech_prob": + 0.003284771926701069}, {"id": 525, "seek": 177084, "start": 1770.9599999999998, + "end": 1775.1599999999999, "text": " So at the same time, similarity search kind + of doesn''t have that boundary, right?", "tokens": [50370, 407, 412, 264, 912, 565, + 11, 32194, 3164, 733, 295, 1177, 380, 362, 300, 12866, 11, 558, 30, 50580], "temperature": + 0.0, "avg_logprob": -0.20636556662765204, "compression_ratio": 1.78125, "no_speech_prob": + 0.00159863056614995}, {"id": 526, "seek": 177084, "start": 1775.1599999999999, "end": + 1780.32, "text": " So essentially you haven''t coded it and now you, you know, now + that the challenge,", "tokens": [50580, 407, 4476, 291, 2378, 380, 34874, 309, 293, + 586, 291, 11, 291, 458, 11, 586, 300, 264, 3430, 11, 50838], "temperature": 0.0, + "avg_logprob": -0.20636556662765204, "compression_ratio": 1.78125, "no_speech_prob": + 0.00159863056614995}, {"id": 527, "seek": 177084, "start": 1780.32, "end": 1784.3999999999999, + "text": " the complexity moves more into the space of choosing the right neural + network", "tokens": [50838, 264, 14024, 6067, 544, 666, 264, 1901, 295, 10875, 264, + 558, 18161, 3209, 51042], "temperature": 0.0, "avg_logprob": -0.20636556662765204, + "compression_ratio": 1.78125, "no_speech_prob": 0.00159863056614995}, {"id": 528, 
+ "seek": 177084, "start": 1784.3999999999999, "end": 1787.04, "text": " and then + choosing the right database.", "tokens": [51042, 293, 550, 10875, 264, 558, 8149, + 13, 51174], "temperature": 0.0, "avg_logprob": -0.20636556662765204, "compression_ratio": + 1.78125, "no_speech_prob": 0.00159863056614995}, {"id": 529, "seek": 177084, "start": + 1787.04, "end": 1790.08, "text": " Everyone knows which is the right database.", + "tokens": [51174, 5198, 3255, 597, 307, 264, 558, 8149, 13, 51326], "temperature": + 0.0, "avg_logprob": -0.20636556662765204, "compression_ratio": 1.78125, "no_speech_prob": + 0.00159863056614995}, {"id": 530, "seek": 177084, "start": 1790.08, "end": 1795.08, + "text": " So, but anyway, but I''m just saying, like, but, but I''m just saying, + like,", "tokens": [51326, 407, 11, 457, 4033, 11, 457, 286, 478, 445, 1566, 11, + 411, 11, 457, 11, 457, 286, 478, 445, 1566, 11, 411, 11, 51576], "temperature": + 0.0, "avg_logprob": -0.20636556662765204, "compression_ratio": 1.78125, "no_speech_prob": + 0.00159863056614995}, {"id": 531, "seek": 179508, "start": 1795.12, "end": 1801.6799999999998, + "text": " do you think that similarity search will completely supersede keyword", + "tokens": [50366, 360, 291, 519, 300, 32194, 3164, 486, 2584, 37906, 4858, 20428, + 50694], "temperature": 0.0, "avg_logprob": -0.24039471274928043, "compression_ratio": + 1.5739910313901346, "no_speech_prob": 0.003325967350974679}, {"id": 532, "seek": + 179508, "start": 1801.6799999999998, "end": 1804.56, "text": " or you still see + some synergy between them?", "tokens": [50694, 420, 291, 920, 536, 512, 50163, 1296, + 552, 30, 50838], "temperature": 0.0, "avg_logprob": -0.24039471274928043, "compression_ratio": + 1.5739910313901346, "no_speech_prob": 0.003325967350974679}, {"id": 533, "seek": + 179508, "start": 1807.12, "end": 1812.08, "text": " Yeah. 
And well, I like, before + I get into saying my opinion on this,", "tokens": [50966, 865, 13, 400, 731, 11, + 286, 411, 11, 949, 286, 483, 666, 1566, 452, 4800, 322, 341, 11, 51214], "temperature": + 0.0, "avg_logprob": -0.24039471274928043, "compression_ratio": 1.5739910313901346, + "no_speech_prob": 0.003325967350974679}, {"id": 534, "seek": 179508, "start": 1812.08, + "end": 1814.24, "text": " I''d say that I''m not the expert on keyword search.", + "tokens": [51214, 286, 1116, 584, 300, 286, 478, 406, 264, 5844, 322, 20428, 3164, + 13, 51322], "temperature": 0.0, "avg_logprob": -0.24039471274928043, "compression_ratio": + 1.5739910313901346, "no_speech_prob": 0.003325967350974679}, {"id": 535, "seek": + 179508, "start": 1814.24, "end": 1816.1999999999998, "text": " So, so here''s my + opinion on it.", "tokens": [51322, 407, 11, 370, 510, 311, 452, 4800, 322, 309, + 13, 51420], "temperature": 0.0, "avg_logprob": -0.24039471274928043, "compression_ratio": + 1.5739910313901346, "no_speech_prob": 0.003325967350974679}, {"id": 536, "seek": + 179508, "start": 1816.1999999999998, "end": 1821.08, "text": " I, you know, we V8 + has a symbolic filtering where you can still do symbolic searches.", "tokens": [51420, + 286, 11, 291, 458, 11, 321, 691, 23, 575, 257, 25755, 30822, 689, 291, 393, 920, + 360, 25755, 26701, 13, 51664], "temperature": 0.0, "avg_logprob": -0.24039471274928043, + "compression_ratio": 1.5739910313901346, "no_speech_prob": 0.003325967350974679}, + {"id": 537, "seek": 182108, "start": 1821.08, "end": 1822.6399999999999, "text": + " You can still do the keyword filtering.", "tokens": [50364, 509, 393, 920, 360, + 264, 20428, 30822, 13, 50442], "temperature": 0.0, "avg_logprob": -0.19698003927866617, + "compression_ratio": 1.7846153846153847, "no_speech_prob": 0.0011335811577737331}, + {"id": 538, "seek": 182108, "start": 1822.6399999999999, "end": 1825.6799999999998, + "text": " You can still have these symbolic characteristics.", "tokens": [50442, + 
509, 393, 920, 362, 613, 25755, 10891, 13, 50594], "temperature": 0.0, "avg_logprob": + -0.19698003927866617, "compression_ratio": 1.7846153846153847, "no_speech_prob": + 0.0011335811577737331}, {"id": 539, "seek": 182108, "start": 1825.6799999999998, + "end": 1829.4399999999998, "text": " And, you know, I''m in the same, I believe + things like what Gary Marcus talks about,", "tokens": [50594, 400, 11, 291, 458, + 11, 286, 478, 294, 264, 912, 11, 286, 1697, 721, 411, 437, 13788, 26574, 6686, 466, + 11, 50782], "temperature": 0.0, "avg_logprob": -0.19698003927866617, "compression_ratio": + 1.7846153846153847, "no_speech_prob": 0.0011335811577737331}, {"id": 540, "seek": + 182108, "start": 1829.4399999999998, "end": 1832.1999999999998, "text": " about, + you know, it''s not really robust to these symbolic queries.", "tokens": [50782, + 466, 11, 291, 458, 11, 309, 311, 406, 534, 13956, 281, 613, 25755, 24109, 13, 50920], + "temperature": 0.0, "avg_logprob": -0.19698003927866617, "compression_ratio": 1.7846153846153847, + "no_speech_prob": 0.0011335811577737331}, {"id": 541, "seek": 182108, "start": 1832.1999999999998, + "end": 1836.32, "text": " What we mentioned earlier, where you insert negation and + it might completely throw it off.", "tokens": [50920, 708, 321, 2835, 3071, 11, + 689, 291, 8969, 2485, 399, 293, 309, 1062, 2584, 3507, 309, 766, 13, 51126], "temperature": + 0.0, "avg_logprob": -0.19698003927866617, "compression_ratio": 1.7846153846153847, + "no_speech_prob": 0.0011335811577737331}, {"id": 542, "seek": 182108, "start": 1836.32, + "end": 1840.28, "text": " So robustness is like not completely solved that.", "tokens": + [51126, 407, 13956, 1287, 307, 411, 406, 2584, 13041, 300, 13, 51324], "temperature": + 0.0, "avg_logprob": -0.19698003927866617, "compression_ratio": 1.7846153846153847, + "no_speech_prob": 0.0011335811577737331}, {"id": 543, "seek": 182108, "start": 1840.28, + "end": 1844.04, "text": " I was reading a paper this morning called from 
DeepMind + Researcher''s data augmentation", "tokens": [51324, 286, 390, 3760, 257, 3035, 341, + 2446, 1219, 490, 14895, 44, 471, 10303, 260, 311, 1412, 14501, 19631, 51512], "temperature": + 0.0, "avg_logprob": -0.19698003927866617, "compression_ratio": 1.7846153846153847, + "no_speech_prob": 0.0011335811577737331}, {"id": 544, "seek": 182108, "start": 1844.04, + "end": 1845.08, "text": " can help robustness.", "tokens": [51512, 393, 854, 13956, + 1287, 13, 51564], "temperature": 0.0, "avg_logprob": -0.19698003927866617, "compression_ratio": + 1.7846153846153847, "no_speech_prob": 0.0011335811577737331}, {"id": 545, "seek": + 182108, "start": 1845.08, "end": 1849.4399999999998, "text": " It was like such + a on the nose title, like that, like data augmentation helps robustness.", "tokens": + [51564, 467, 390, 411, 1270, 257, 322, 264, 6690, 4876, 11, 411, 300, 11, 411, 1412, + 14501, 19631, 3665, 13956, 1287, 13, 51782], "temperature": 0.0, "avg_logprob": + -0.19698003927866617, "compression_ratio": 1.7846153846153847, "no_speech_prob": + 0.0011335811577737331}, {"id": 546, "seek": 184944, "start": 1849.44, "end": 1851.28, + "text": " So, so yeah, solving robustness.", "tokens": [50364, 407, 11, 370, 1338, + 11, 12606, 13956, 1287, 13, 50456], "temperature": 0.0, "avg_logprob": -0.2322935042842742, + "compression_ratio": 1.8410596026490067, "no_speech_prob": 0.0035861677024513483}, + {"id": 547, "seek": 184944, "start": 1851.28, "end": 1855.56, "text": " And I''m, + you know, I saw a, I''m not like, I still think solving robustness is a huge issue + for this.", "tokens": [50456, 400, 286, 478, 11, 291, 458, 11, 286, 1866, 257, 11, + 286, 478, 406, 411, 11, 286, 920, 519, 12606, 13956, 1287, 307, 257, 2603, 2734, + 337, 341, 13, 50670], "temperature": 0.0, "avg_logprob": -0.2322935042842742, "compression_ratio": + 1.8410596026490067, "no_speech_prob": 0.0035861677024513483}, {"id": 548, "seek": + 184944, "start": 1855.56, "end": 1858.6000000000001, "text": " It''s 
not completely + put together yet.", "tokens": [50670, 467, 311, 406, 2584, 829, 1214, 1939, 13, + 50822], "temperature": 0.0, "avg_logprob": -0.2322935042842742, "compression_ratio": + 1.8410596026490067, "no_speech_prob": 0.0035861677024513483}, {"id": 549, "seek": + 184944, "start": 1858.6000000000001, "end": 1860.04, "text": " Yeah, absolutely. + I agree. I agree.", "tokens": [50822, 865, 11, 3122, 13, 286, 3986, 13, 286, 3986, + 13, 50894], "temperature": 0.0, "avg_logprob": -0.2322935042842742, "compression_ratio": + 1.8410596026490067, "no_speech_prob": 0.0035861677024513483}, {"id": 550, "seek": + 184944, "start": 1860.04, "end": 1863.48, "text": " So, but like, yeah, you mentioned + you are not an expert on keyword search,", "tokens": [50894, 407, 11, 457, 411, + 11, 1338, 11, 291, 2835, 291, 366, 406, 364, 5844, 322, 20428, 3164, 11, 51066], + "temperature": 0.0, "avg_logprob": -0.2322935042842742, "compression_ratio": 1.8410596026490067, + "no_speech_prob": 0.0035861677024513483}, {"id": 551, "seek": 184944, "start": 1863.48, + "end": 1866.8, "text": " but at the same time, I think you were the expert of using + like Google, right?", "tokens": [51066, 457, 412, 264, 912, 565, 11, 286, 519, 291, + 645, 264, 5844, 295, 1228, 411, 3329, 11, 558, 30, 51232], "temperature": 0.0, "avg_logprob": + -0.2322935042842742, "compression_ratio": 1.8410596026490067, "no_speech_prob": + 0.0035861677024513483}, {"id": 552, "seek": 184944, "start": 1866.8, "end": 1868.64, + "text": " So like you still type keywords.", "tokens": [51232, 407, 411, 291, 920, + 2010, 21009, 13, 51324], "temperature": 0.0, "avg_logprob": -0.2322935042842742, + "compression_ratio": 1.8410596026490067, "no_speech_prob": 0.0035861677024513483}, + {"id": 553, "seek": 184944, "start": 1868.64, "end": 1871.64, "text": " And, and + I think psychologically, you still expect, you know,", "tokens": [51324, 400, 11, + 293, 286, 519, 41387, 11, 291, 920, 2066, 11, 291, 458, 11, 51474], "temperature": + 
0.0, "avg_logprob": -0.2322935042842742, "compression_ratio": 1.8410596026490067, + "no_speech_prob": 0.0035861677024513483}, {"id": 554, "seek": 184944, "start": 1871.64, + "end": 1877.8, "text": " the snippets to contain some of your keywords as a validation + that the search engine got it, right?", "tokens": [51474, 264, 35623, 1385, 281, + 5304, 512, 295, 428, 21009, 382, 257, 24071, 300, 264, 3164, 2848, 658, 309, 11, + 558, 30, 51782], "temperature": 0.0, "avg_logprob": -0.2322935042842742, "compression_ratio": + 1.8410596026490067, "no_speech_prob": 0.0035861677024513483}, {"id": 555, "seek": + 187780, "start": 1877.8, "end": 1883.8, "text": " So like otherwise, search engine + maybe that just, you know, returns you garbage in return to what you want.", "tokens": + [50364, 407, 411, 5911, 11, 3164, 2848, 1310, 300, 445, 11, 291, 458, 11, 11247, + 291, 14150, 294, 2736, 281, 437, 291, 528, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.2674149685218686, "compression_ratio": 1.7534722222222223, "no_speech_prob": + 0.004555961582809687}, {"id": 556, "seek": 187780, "start": 1883.8, "end": 1889.0, + "text": " Yeah, and that''s why I think like like the page rank, transition dynamic + matrices,", "tokens": [50664, 865, 11, 293, 300, 311, 983, 286, 519, 411, 411, 264, + 3028, 6181, 11, 6034, 8546, 32284, 11, 50924], "temperature": 0.0, "avg_logprob": + -0.2674149685218686, "compression_ratio": 1.7534722222222223, "no_speech_prob": + 0.004555961582809687}, {"id": 557, "seek": 187780, "start": 1889.0, "end": 1894.9199999999998, + "text": " though, those kind of things that that''s like, it won''t be enough to + just have the vector search engine probably.", "tokens": [50924, 1673, 11, 729, + 733, 295, 721, 300, 300, 311, 411, 11, 309, 1582, 380, 312, 1547, 281, 445, 362, + 264, 8062, 3164, 2848, 1391, 13, 51220], "temperature": 0.0, "avg_logprob": -0.2674149685218686, + "compression_ratio": 1.7534722222222223, "no_speech_prob": 0.004555961582809687}, + {"id": 
558, "seek": 187780, "start": 1894.9199999999998, "end": 1898.2, "text": + " You''ll probably need some kind of like tuning layer.", "tokens": [51220, 509, + 603, 1391, 643, 512, 733, 295, 411, 15164, 4583, 13, 51384], "temperature": 0.0, + "avg_logprob": -0.2674149685218686, "compression_ratio": 1.7534722222222223, "no_speech_prob": + 0.004555961582809687}, {"id": 559, "seek": 187780, "start": 1898.2, "end": 1900.6, + "text": " And that''s why, so we''ve got has the Python client.", "tokens": [51384, + 400, 300, 311, 983, 11, 370, 321, 600, 658, 575, 264, 15329, 6423, 13, 51504], "temperature": + 0.0, "avg_logprob": -0.2674149685218686, "compression_ratio": 1.7534722222222223, + "no_speech_prob": 0.004555961582809687}, {"id": 560, "seek": 187780, "start": 1900.6, + "end": 1905.32, "text": " As I mentioned previously, a research project for this + would be to integrate that Python client", "tokens": [51504, 1018, 286, 2835, 8046, + 11, 257, 2132, 1716, 337, 341, 576, 312, 281, 13365, 300, 15329, 6423, 51740], "temperature": + 0.0, "avg_logprob": -0.2674149685218686, "compression_ratio": 1.7534722222222223, + "no_speech_prob": 0.004555961582809687}, {"id": 561, "seek": 190532, "start": 1905.32, + "end": 1909.32, "text": " into the training loop of the, you know, whatever is doing + the supervised learning task.", "tokens": [50364, 666, 264, 3097, 6367, 295, 264, + 11, 291, 458, 11, 2035, 307, 884, 264, 46533, 2539, 5633, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.18089821759392233, "compression_ratio": 1.8827361563517915, + "no_speech_prob": 0.0028386542107909918}, {"id": 562, "seek": 190532, "start": 1909.32, + "end": 1912.04, "text": " So it kind of isn''t just retrieving.", "tokens": [50564, + 407, 309, 733, 295, 1943, 380, 445, 19817, 798, 13, 50700], "temperature": 0.0, + "avg_logprob": -0.18089821759392233, "compression_ratio": 1.8827361563517915, "no_speech_prob": + 0.0028386542107909918}, {"id": 563, "seek": 190532, "start": 1912.04, "end": 1916.2, 
+ "text": " It''s like when we talked about the difference in information retrieval + and approximate nearest neighbor search,", "tokens": [50700, 467, 311, 411, 562, + 321, 2825, 466, 264, 2649, 294, 1589, 19817, 3337, 293, 30874, 23831, 5987, 3164, + 11, 50908], "temperature": 0.0, "avg_logprob": -0.18089821759392233, "compression_ratio": + 1.8827361563517915, "no_speech_prob": 0.0028386542107909918}, {"id": 564, "seek": + 190532, "start": 1916.2, "end": 1919.3999999999999, "text": " it''s kind of like + the semantics differences between the things you''re encoding,", "tokens": [50908, + 309, 311, 733, 295, 411, 264, 4361, 45298, 7300, 1296, 264, 721, 291, 434, 43430, + 11, 51068], "temperature": 0.0, "avg_logprob": -0.18089821759392233, "compression_ratio": + 1.8827361563517915, "no_speech_prob": 0.0028386542107909918}, {"id": 565, "seek": + 190532, "start": 1919.3999999999999, "end": 1923.8799999999999, "text": " where + you might be encoding a like the email title and then the email body.", "tokens": + [51068, 689, 291, 1062, 312, 43430, 257, 411, 264, 3796, 4876, 293, 550, 264, 3796, + 1772, 13, 51292], "temperature": 0.0, "avg_logprob": -0.18089821759392233, "compression_ratio": + 1.8827361563517915, "no_speech_prob": 0.0028386542107909918}, {"id": 566, "seek": + 190532, "start": 1923.8799999999999, "end": 1928.6, "text": " And so you have these + different kind of like transitions between the categories of objects you''re encoding.", + "tokens": [51292, 400, 370, 291, 362, 613, 819, 733, 295, 411, 23767, 1296, 264, + 10479, 295, 6565, 291, 434, 43430, 13, 51528], "temperature": 0.0, "avg_logprob": + -0.18089821759392233, "compression_ratio": 1.8827361563517915, "no_speech_prob": + 0.0028386542107909918}, {"id": 567, "seek": 190532, "start": 1928.6, "end": 1935.08, + "text": " So, so yeah, like the, you know, I still think that there''s like a layer + of,", "tokens": [51528, 407, 11, 370, 1338, 11, 411, 264, 11, 291, 458, 11, 286, + 920, 519, 300, 456, 311, 
411, 257, 4583, 295, 11, 51852], "temperature": 0.0, "avg_logprob": + -0.18089821759392233, "compression_ratio": 1.8827361563517915, "no_speech_prob": + 0.0028386542107909918}, {"id": 568, "seek": 193508, "start": 1935.1599999999999, + "end": 1940.12, "text": " I don''t know how to describe it, maybe like that system + one system two, I know people like that analogy,", "tokens": [50368, 286, 500, 380, + 458, 577, 281, 6786, 309, 11, 1310, 411, 300, 1185, 472, 1185, 732, 11, 286, 458, + 561, 411, 300, 21663, 11, 50616], "temperature": 0.0, "avg_logprob": -0.21974966094249815, + "compression_ratio": 1.68259385665529, "no_speech_prob": 0.0015949123771861196}, + {"id": 569, "seek": 193508, "start": 1940.12, "end": 1945.72, "text": " but there''s + some kind of layer between keyword search and vector neural representations.", "tokens": + [50616, 457, 456, 311, 512, 733, 295, 4583, 1296, 20428, 3164, 293, 8062, 18161, + 33358, 13, 50896], "temperature": 0.0, "avg_logprob": -0.21974966094249815, "compression_ratio": + 1.68259385665529, "no_speech_prob": 0.0015949123771861196}, {"id": 570, "seek": + 193508, "start": 1945.72, "end": 1947.24, "text": " There''s something in the middle + of that.", "tokens": [50896, 821, 311, 746, 294, 264, 2808, 295, 300, 13, 50972], + "temperature": 0.0, "avg_logprob": -0.21974966094249815, "compression_ratio": 1.68259385665529, + "no_speech_prob": 0.0015949123771861196}, {"id": 571, "seek": 193508, "start": 1947.24, + "end": 1950.36, "text": " And, you know, I don''t know what it is, but yeah, I guess + page rank.", "tokens": [50972, 400, 11, 291, 458, 11, 286, 500, 380, 458, 437, 309, + 307, 11, 457, 1338, 11, 286, 2041, 3028, 6181, 13, 51128], "temperature": 0.0, "avg_logprob": + -0.21974966094249815, "compression_ratio": 1.68259385665529, "no_speech_prob": 0.0015949123771861196}, + {"id": 572, "seek": 193508, "start": 1950.36, "end": 1951.24, "text": " Yeah.", + "tokens": [51128, 865, 13, 51172], "temperature": 0.0, "avg_logprob": 
-0.21974966094249815, + "compression_ratio": 1.68259385665529, "no_speech_prob": 0.0015949123771861196}, + {"id": 573, "seek": 193508, "start": 1951.24, "end": 1957.8799999999999, "text": + " Yeah, like basically you''re talking about sort of even, even after vector database + has returned to the", "tokens": [51172, 865, 11, 411, 1936, 291, 434, 1417, 466, + 1333, 295, 754, 11, 754, 934, 8062, 8149, 575, 8752, 281, 264, 51504], "temperature": + 0.0, "avg_logprob": -0.21974966094249815, "compression_ratio": 1.68259385665529, + "no_speech_prob": 0.0015949123771861196}, {"id": 574, "seek": 193508, "start": 1957.8799999999999, + "end": 1964.4399999999998, "text": " nearest neighbors, you still have a sort of + liberty to apply a re-runker, right?", "tokens": [51504, 23831, 12512, 11, 291, + 920, 362, 257, 1333, 295, 22849, 281, 3079, 257, 319, 12, 12997, 5767, 11, 558, + 30, 51832], "temperature": 0.0, "avg_logprob": -0.21974966094249815, "compression_ratio": + 1.68259385665529, "no_speech_prob": 0.0015949123771861196}, {"id": 575, "seek": + 196444, "start": 1964.52, "end": 1969.72, "text": " Because and that''s where your + business logic kicks in, like the rules, the product, the vision,", "tokens": [50368, + 1436, 293, 300, 311, 689, 428, 1606, 9952, 21293, 294, 11, 411, 264, 4474, 11, 264, + 1674, 11, 264, 5201, 11, 50628], "temperature": 0.0, "avg_logprob": -0.2012007854602955, + "compression_ratio": 1.608365019011407, "no_speech_prob": 0.0014215527335181832}, + {"id": 576, "seek": 196444, "start": 1969.72, "end": 1974.1200000000001, "text": + " the design, there are so many inputs into that process of ranking.", "tokens": + [50628, 264, 1715, 11, 456, 366, 370, 867, 15743, 666, 300, 1399, 295, 17833, 13, + 50848], "temperature": 0.0, "avg_logprob": -0.2012007854602955, "compression_ratio": + 1.608365019011407, "no_speech_prob": 0.0014215527335181832}, {"id": 577, "seek": + 196444, "start": 1974.1200000000001, "end": 1979.16, "text": " And then ranking + obviously 
is like a huge research area as well, you know, with the click", "tokens": + [50848, 400, 550, 17833, 2745, 307, 411, 257, 2603, 2132, 1859, 382, 731, 11, 291, + 458, 11, 365, 264, 2052, 51100], "temperature": 0.0, "avg_logprob": -0.2012007854602955, + "compression_ratio": 1.608365019011407, "no_speech_prob": 0.0014215527335181832}, + {"id": 578, "seek": 196444, "start": 1979.16, "end": 1981.48, "text": " biasing + and things like that, right?", "tokens": [51100, 3228, 3349, 293, 721, 411, 300, + 11, 558, 30, 51216], "temperature": 0.0, "avg_logprob": -0.2012007854602955, "compression_ratio": + 1.608365019011407, "no_speech_prob": 0.0014215527335181832}, {"id": 579, "seek": + 196444, "start": 1982.76, "end": 1984.76, "text": " Yeah, I mean, and it''s also + interesting.", "tokens": [51280, 865, 11, 286, 914, 11, 293, 309, 311, 611, 1880, + 13, 51380], "temperature": 0.0, "avg_logprob": -0.2012007854602955, "compression_ratio": + 1.608365019011407, "no_speech_prob": 0.0014215527335181832}, {"id": 580, "seek": + 196444, "start": 1984.76, "end": 1991.4, "text": " I just crossed my mind that yesterday, + Richard Sorter announced his search engine and U.com.", "tokens": [51380, 286, 445, + 14622, 452, 1575, 300, 5186, 11, 9809, 318, 6122, 7548, 702, 3164, 2848, 293, 624, + 13, 1112, 13, 51712], "temperature": 0.0, "avg_logprob": -0.2012007854602955, "compression_ratio": + 1.608365019011407, "no_speech_prob": 0.0014215527335181832}, {"id": 581, "seek": + 199140, "start": 1991.64, "end": 1994.44, "text": " And did you have a chance to + check it out?", "tokens": [50376, 400, 630, 291, 362, 257, 2931, 281, 1520, 309, + 484, 30, 50516], "temperature": 0.0, "avg_logprob": -0.12663925545556204, "compression_ratio": + 1.6627906976744187, "no_speech_prob": 0.015483730472624302}, {"id": 582, "seek": + 199140, "start": 1994.44, "end": 1999.5600000000002, "text": " Basically for listeners + who didn''t check it out yet, so it''s a search engine which summarizes", "tokens": + 
[50516, 8537, 337, 23274, 567, 994, 380, 1520, 309, 484, 1939, 11, 370, 309, 311, + 257, 3164, 2848, 597, 14611, 5660, 50772], "temperature": 0.0, "avg_logprob": -0.12663925545556204, + "compression_ratio": 1.6627906976744187, "no_speech_prob": 0.015483730472624302}, + {"id": 583, "seek": 199140, "start": 1999.5600000000002, "end": 2005.24, "text": + " the web pages and the kind of documents and so on. And so you are kind of, it + makes it actionable.", "tokens": [50772, 264, 3670, 7183, 293, 264, 733, 295, 8512, + 293, 370, 322, 13, 400, 370, 291, 366, 733, 295, 11, 309, 1669, 309, 45098, 13, + 51056], "temperature": 0.0, "avg_logprob": -0.12663925545556204, "compression_ratio": + 1.6627906976744187, "no_speech_prob": 0.015483730472624302}, {"id": 584, "seek": + 199140, "start": 2005.24, "end": 2011.4, "text": " So just one example, they can + find you a code snippet on Stack Overflow that you can actually", "tokens": [51056, + 407, 445, 472, 1365, 11, 436, 393, 915, 291, 257, 3089, 35623, 302, 322, 37649, + 4886, 10565, 300, 291, 393, 767, 51364], "temperature": 0.0, "avg_logprob": -0.12663925545556204, + "compression_ratio": 1.6627906976744187, "no_speech_prob": 0.015483730472624302}, + {"id": 585, "seek": 199140, "start": 2011.4, "end": 2016.68, "text": " copy paste. + And that''s just one example, right? But there are plenty of more. Any thoughts + on this?", "tokens": [51364, 5055, 9163, 13, 400, 300, 311, 445, 472, 1365, 11, + 558, 30, 583, 456, 366, 7140, 295, 544, 13, 2639, 4598, 322, 341, 30, 51628], "temperature": + 0.0, "avg_logprob": -0.12663925545556204, "compression_ratio": 1.6627906976744187, + "no_speech_prob": 0.015483730472624302}, {"id": 586, "seek": 201668, "start": 2017.64, + "end": 2022.92, "text": " Yeah, well, I mean, first of all, Richard Sacher, his + research has been incredible. 
And as I", "tokens": [50412, 865, 11, 731, 11, 286, + 914, 11, 700, 295, 439, 11, 9809, 318, 4062, 11, 702, 2132, 575, 668, 4651, 13, + 400, 382, 286, 50676], "temperature": 0.0, "avg_logprob": -0.21306838019419524, + "compression_ratio": 1.5985663082437276, "no_speech_prob": 0.04407363012433052}, + {"id": 587, "seek": 201668, "start": 2022.92, "end": 2027.16, "text": " mentioned + earlier in the podcast, I was listening to systems co-search from Salesforce Research + was,", "tokens": [50676, 2835, 3071, 294, 264, 7367, 11, 286, 390, 4764, 281, 3652, + 598, 12, 405, 1178, 490, 40398, 10303, 390, 11, 50888], "temperature": 0.0, "avg_logprob": + -0.21306838019419524, "compression_ratio": 1.5985663082437276, "no_speech_prob": + 0.04407363012433052}, {"id": 588, "seek": 201668, "start": 2028.1200000000001, "end": + 2030.52, "text": " he was one of the authors, I don''t know who led the project.", + "tokens": [50936, 415, 390, 472, 295, 264, 16552, 11, 286, 500, 380, 458, 567, 4684, + 264, 1716, 13, 51056], "temperature": 0.0, "avg_logprob": -0.21306838019419524, + "compression_ratio": 1.5985663082437276, "no_speech_prob": 0.04407363012433052}, + {"id": 589, "seek": 201668, "start": 2031.8, "end": 2039.8, "text": " So yeah, U.com, + I mean, it looks crazy. Like, have I used it quite not really yet, but I definitely", + "tokens": [51120, 407, 1338, 11, 624, 13, 1112, 11, 286, 914, 11, 309, 1542, 3219, + 13, 1743, 11, 362, 286, 1143, 309, 1596, 406, 534, 1939, 11, 457, 286, 2138, 51520], + "temperature": 0.0, "avg_logprob": -0.21306838019419524, "compression_ratio": 1.5985663082437276, + "no_speech_prob": 0.04407363012433052}, {"id": 590, "seek": 201668, "start": 2039.8, + "end": 2046.04, "text": " believe in the concept and yeah, the research is pointing + in that direction. 
It''s exciting.", "tokens": [51520, 1697, 294, 264, 3410, 293, + 1338, 11, 264, 2132, 307, 12166, 294, 300, 3513, 13, 467, 311, 4670, 13, 51832], + "temperature": 0.0, "avg_logprob": -0.21306838019419524, "compression_ratio": 1.5985663082437276, + "no_speech_prob": 0.04407363012433052}, {"id": 591, "seek": 204668, "start": 2047.64, + "end": 2053.48, "text": " But do I think like, solely neural system? Yeah, I mean, + designing new interfaces around", "tokens": [50412, 583, 360, 286, 519, 411, 11, + 23309, 18161, 1185, 30, 865, 11, 286, 914, 11, 14685, 777, 28416, 926, 50704], "temperature": + 0.0, "avg_logprob": -0.20864842732747396, "compression_ratio": 1.6464285714285714, + "no_speech_prob": 0.0007735115359537303}, {"id": 592, "seek": 204668, "start": 2053.48, + "end": 2056.76, "text": " search, started to go around that a little bit as I''m + trying to like think, well, I talk, but", "tokens": [50704, 3164, 11, 1409, 281, + 352, 926, 300, 257, 707, 857, 382, 286, 478, 1382, 281, 411, 519, 11, 731, 11, 286, + 751, 11, 457, 50868], "temperature": 0.0, "avg_logprob": -0.20864842732747396, "compression_ratio": + 1.6464285714285714, "no_speech_prob": 0.0007735115359537303}, {"id": 593, "seek": + 204668, "start": 2057.32, "end": 2063.16, "text": " yeah, the U.com thing is exciting. + New spaces for search engines. 
It''s hard to even completely", "tokens": [50896, + 1338, 11, 264, 624, 13, 1112, 551, 307, 4670, 13, 1873, 7673, 337, 3164, 12982, + 13, 467, 311, 1152, 281, 754, 2584, 51188], "temperature": 0.0, "avg_logprob": -0.20864842732747396, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.0007735115359537303}, + {"id": 594, "seek": 204668, "start": 2063.16, "end": 2068.04, "text": " conceptualize + it, I think because it''s such a, you think of Google as like this giant,", "tokens": + [51188, 24106, 1125, 309, 11, 286, 519, 570, 309, 311, 1270, 257, 11, 291, 519, + 295, 3329, 382, 411, 341, 7410, 11, 51432], "temperature": 0.0, "avg_logprob": -0.20864842732747396, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.0007735115359537303}, + {"id": 595, "seek": 204668, "start": 2068.04, "end": 2073.0, "text": " undistructible + search engine, but that''s really not the story. There really is a ton of research", + "tokens": [51432, 674, 468, 1757, 964, 3164, 2848, 11, 457, 300, 311, 534, 406, + 264, 1657, 13, 821, 534, 307, 257, 2952, 295, 2132, 51680], "temperature": 0.0, + "avg_logprob": -0.20864842732747396, "compression_ratio": 1.6464285714285714, "no_speech_prob": + 0.0007735115359537303}, {"id": 596, "seek": 207300, "start": 2073.0, "end": 2078.2, + "text": " and search engines. Yeah, yeah, but actually, I''m currently working for + WebScale. 
So,", "tokens": [50364, 293, 3164, 12982, 13, 865, 11, 1338, 11, 457, + 767, 11, 286, 478, 4362, 1364, 337, 9573, 16806, 1220, 13, 407, 11, 50624], "temperature": + 0.0, "avg_logprob": -0.20867106119791667, "compression_ratio": 1.5231788079470199, + "no_speech_prob": 0.008089895360171795}, {"id": 597, "seek": 207300, "start": 2078.2, + "end": 2084.28, "text": " Changes, which I cannot mention because it''s my client + on the NDA, but we basically have all the", "tokens": [50624, 761, 10350, 11, 597, + 286, 2644, 2152, 570, 309, 311, 452, 6423, 322, 264, 426, 7509, 11, 457, 321, 1936, + 362, 439, 264, 50928], "temperature": 0.0, "avg_logprob": -0.20867106119791667, + "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.008089895360171795}, + {"id": 598, "seek": 207300, "start": 2084.28, "end": 2090.28, "text": " charts and + we know that Google is like 97%. And then everyone else is close to the bottom.", + "tokens": [50928, 17767, 293, 321, 458, 300, 3329, 307, 411, 23399, 6856, 400, 550, + 1518, 1646, 307, 1998, 281, 264, 2767, 13, 51228], "temperature": 0.0, "avg_logprob": + -0.20867106119791667, "compression_ratio": 1.5231788079470199, "no_speech_prob": + 0.008089895360171795}, {"id": 599, "seek": 207300, "start": 2091.16, "end": 2096.52, + "text": " Unfortunately, well, of course, Bing has a couple percent of the market. + And then it kind of,", "tokens": [51272, 8590, 11, 731, 11, 295, 1164, 11, 30755, + 575, 257, 1916, 3043, 295, 264, 2142, 13, 400, 550, 309, 733, 295, 11, 51540], "temperature": + 0.0, "avg_logprob": -0.20867106119791667, "compression_ratio": 1.5231788079470199, + "no_speech_prob": 0.008089895360171795}, {"id": 600, "seek": 207300, "start": 2096.52, + "end": 2100.52, "text": " if you go inside a specific country, the split might be + different. 
Like, if you take Russia,", "tokens": [51540, 498, 291, 352, 1854, 257, + 2685, 1941, 11, 264, 7472, 1062, 312, 819, 13, 1743, 11, 498, 291, 747, 6797, 11, + 51740], "temperature": 0.0, "avg_logprob": -0.20867106119791667, "compression_ratio": + 1.5231788079470199, "no_speech_prob": 0.008089895360171795}, {"id": 601, "seek": + 210052, "start": 2100.52, "end": 2104.68, "text": " for example, Yandex is on top + and then Google is following them, but very closely, you know,", "tokens": [50364, + 337, 1365, 11, 398, 474, 3121, 307, 322, 1192, 293, 550, 3329, 307, 3480, 552, 11, + 457, 588, 8185, 11, 291, 458, 11, 50572], "temperature": 0.0, "avg_logprob": -0.15013101722012048, + "compression_ratio": 1.7651515151515151, "no_speech_prob": 0.010685698129236698}, + {"id": 602, "seek": 210052, "start": 2105.8, "end": 2112.7599999999998, "text": + " but overall, globally, Google is just somewhere beyond the sky. So, you need to + kind of", "tokens": [50628, 457, 4787, 11, 18958, 11, 3329, 307, 445, 4079, 4399, + 264, 5443, 13, 407, 11, 291, 643, 281, 733, 295, 50976], "temperature": 0.0, "avg_logprob": + -0.15013101722012048, "compression_ratio": 1.7651515151515151, "no_speech_prob": + 0.010685698129236698}, {"id": 603, "seek": 210052, "start": 2112.7599999999998, + "end": 2118.04, "text": " differentiate a lot, you know, like you don''t want to + build another Google. 
It''s almost like Peter", "tokens": [50976, 23203, 257, 688, + 11, 291, 458, 11, 411, 291, 500, 380, 528, 281, 1322, 1071, 3329, 13, 467, 311, + 1920, 411, 6508, 51240], "temperature": 0.0, "avg_logprob": -0.15013101722012048, + "compression_ratio": 1.7651515151515151, "no_speech_prob": 0.010685698129236698}, + {"id": 604, "seek": 210052, "start": 2118.04, "end": 2123.0, "text": " Tills book, + you know, zero to one where he says, if you are building another Facebook, you''re + not", "tokens": [51240, 314, 2565, 1446, 11, 291, 458, 11, 4018, 281, 472, 689, + 415, 1619, 11, 498, 291, 366, 2390, 1071, 4384, 11, 291, 434, 406, 51488], "temperature": + 0.0, "avg_logprob": -0.15013101722012048, "compression_ratio": 1.7651515151515151, + "no_speech_prob": 0.010685698129236698}, {"id": 605, "seek": 210052, "start": 2123.0, + "end": 2126.84, "text": " learning anything from Mark Zuckerberg or if you''re building + another Google, you''re not,", "tokens": [51488, 2539, 1340, 490, 3934, 34032, 6873, + 420, 498, 291, 434, 2390, 1071, 3329, 11, 291, 434, 406, 11, 51680], "temperature": + 0.0, "avg_logprob": -0.15013101722012048, "compression_ratio": 1.7651515151515151, + "no_speech_prob": 0.010685698129236698}, {"id": 606, "seek": 212684, "start": 2126.84, + "end": 2131.32, "text": " you''re not learning anything from the Google founders. + Like, you need to build that one, right? And", "tokens": [50364, 291, 434, 406, + 2539, 1340, 490, 264, 3329, 25608, 13, 1743, 11, 291, 643, 281, 1322, 300, 472, + 11, 558, 30, 400, 50588], "temperature": 0.0, "avg_logprob": -0.19744549278451615, + "compression_ratio": 1.6472602739726028, "no_speech_prob": 0.00458178436383605}, + {"id": 607, "seek": 212684, "start": 2131.32, "end": 2136.6000000000004, "text": + " I think Richard is trying to build that one probably. 
So, yeah, I mean, it''s + an interesting", "tokens": [50588, 286, 519, 9809, 307, 1382, 281, 1322, 300, 472, + 1391, 13, 407, 11, 1338, 11, 286, 914, 11, 309, 311, 364, 1880, 50852], "temperature": + 0.0, "avg_logprob": -0.19744549278451615, "compression_ratio": 1.6472602739726028, + "no_speech_prob": 0.00458178436383605}, {"id": 608, "seek": 212684, "start": 2136.6000000000004, + "end": 2143.0, "text": " direction that he''s trying to involve the AI much deeper + in the process, probably already surfacing,", "tokens": [50852, 3513, 300, 415, + 311, 1382, 281, 9494, 264, 7318, 709, 7731, 294, 264, 1399, 11, 1391, 1217, 9684, + 5615, 11, 51172], "temperature": 0.0, "avg_logprob": -0.19744549278451615, "compression_ratio": + 1.6472602739726028, "no_speech_prob": 0.00458178436383605}, {"id": 609, "seek": + 212684, "start": 2143.0, "end": 2149.8, "text": " you know, users. That''s fantastic. + Yeah, yeah, I don''t have anything to add other than just", "tokens": [51172, 291, + 458, 11, 5022, 13, 663, 311, 5456, 13, 865, 11, 1338, 11, 286, 500, 380, 362, 1340, + 281, 909, 661, 813, 445, 51512], "temperature": 0.0, "avg_logprob": -0.19744549278451615, + "compression_ratio": 1.6472602739726028, "no_speech_prob": 0.00458178436383605}, + {"id": 610, "seek": 212684, "start": 2149.8, "end": 2156.1200000000003, "text": + " shared excitement about what you.com will become. It''s certainly exciting. Yeah, + absolutely. All", "tokens": [51512, 5507, 14755, 466, 437, 291, 13, 1112, 486, 1813, + 13, 467, 311, 3297, 4670, 13, 865, 11, 3122, 13, 1057, 51828], "temperature": 0.0, + "avg_logprob": -0.19744549278451615, "compression_ratio": 1.6472602739726028, "no_speech_prob": + 0.00458178436383605}, {"id": 611, "seek": 215612, "start": 2156.12, "end": 2165.16, + "text": " the best Richard. 
Yeah, and you actually I wanted to make a slight segue + into you shared like", "tokens": [50364, 264, 1151, 9809, 13, 865, 11, 293, 291, + 767, 286, 1415, 281, 652, 257, 4036, 33850, 666, 291, 5507, 411, 50816], "temperature": + 0.0, "avg_logprob": -0.18123321801843778, "compression_ratio": 1.513089005235602, + "no_speech_prob": 0.00575678888708353}, {"id": 612, "seek": 215612, "start": 2165.16, + "end": 2172.92, "text": " a ton of information today. I wonder how do you keep up + with so much stuff happening? Like, what", "tokens": [50816, 257, 2952, 295, 1589, + 965, 13, 286, 2441, 577, 360, 291, 1066, 493, 365, 370, 709, 1507, 2737, 30, 1743, + 11, 437, 51204], "temperature": 0.0, "avg_logprob": -0.18123321801843778, "compression_ratio": + 1.513089005235602, "no_speech_prob": 0.00575678888708353}, {"id": 613, "seek": 215612, + "start": 2172.92, "end": 2177.7999999999997, "text": " are your preferred sources + of information? Like, obviously YouTube is one, but, you know, there is", "tokens": + [51204, 366, 428, 16494, 7139, 295, 1589, 30, 1743, 11, 2745, 3088, 307, 472, 11, + 457, 11, 291, 458, 11, 456, 307, 51448], "temperature": 0.0, "avg_logprob": -0.18123321801843778, + "compression_ratio": 1.513089005235602, "no_speech_prob": 0.00575678888708353}, + {"id": 614, "seek": 217780, "start": 2177.8, "end": 2184.6000000000004, "text": + " also medium. There is publications themselves. 
How did you structure your sort + of consumption,", "tokens": [50364, 611, 6399, 13, 821, 307, 25618, 2969, 13, 1012, + 630, 291, 3877, 428, 1333, 295, 12126, 11, 50704], "temperature": 0.0, "avg_logprob": + -0.24583379357261995, "compression_ratio": 1.5986159169550174, "no_speech_prob": + 0.013005629181861877}, {"id": 615, "seek": 217780, "start": 2185.2400000000002, + "end": 2190.6000000000004, "text": " you know, parts like the pacing and kind of + where to pay, put your attention and so on?", "tokens": [50736, 291, 458, 11, 3166, + 411, 264, 43285, 293, 733, 295, 689, 281, 1689, 11, 829, 428, 3202, 293, 370, 322, + 30, 51004], "temperature": 0.0, "avg_logprob": -0.24583379357261995, "compression_ratio": + 1.5986159169550174, "no_speech_prob": 0.013005629181861877}, {"id": 616, "seek": + 217780, "start": 2192.6000000000004, "end": 2197.5600000000004, "text": " Yeah, + that''s a great question. And, you know, early days of my podcast, I was doing the", + "tokens": [51104, 865, 11, 300, 311, 257, 869, 1168, 13, 400, 11, 291, 458, 11, + 2440, 1708, 295, 452, 7367, 11, 286, 390, 884, 264, 51352], "temperature": 0.0, + "avg_logprob": -0.24583379357261995, "compression_ratio": 1.5986159169550174, "no_speech_prob": + 0.013005629181861877}, {"id": 617, "seek": 217780, "start": 2197.5600000000004, + "end": 2202.36, "text": " Machine Learning Street Talk with Tim Scarf and Yana Kiltcher + and Tim asked Jonathan Frank,", "tokens": [51352, 22155, 15205, 7638, 8780, 365, + 7172, 23181, 69, 293, 398, 2095, 591, 2352, 6759, 293, 7172, 2351, 15471, 6823, + 11, 51592], "temperature": 0.0, "avg_logprob": -0.24583379357261995, "compression_ratio": + 1.5986159169550174, "no_speech_prob": 0.013005629181861877}, {"id": 618, "seek": + 217780, "start": 2202.36, "end": 2207.0, "text": " the author of the lottery ticket + hypothesis, the same question. 
Like, what''s your information diet?", "tokens": + [51592, 264, 3793, 295, 264, 27391, 10550, 17291, 11, 264, 912, 1168, 13, 1743, + 11, 437, 311, 428, 1589, 6339, 30, 51824], "temperature": 0.0, "avg_logprob": -0.24583379357261995, + "compression_ratio": 1.5986159169550174, "no_speech_prob": 0.013005629181861877}, + {"id": 619, "seek": 220700, "start": 2207.0, "end": 2213.08, "text": " And I thought + it''s a really interesting question. So mine is, you know, like most people out + there", "tokens": [50364, 400, 286, 1194, 309, 311, 257, 534, 1880, 1168, 13, 407, + 3892, 307, 11, 291, 458, 11, 411, 881, 561, 484, 456, 50668], "temperature": 0.0, + "avg_logprob": -0.13902139282226564, "compression_ratio": 1.7234042553191489, "no_speech_prob": + 0.004591033793985844}, {"id": 620, "seek": 220700, "start": 2213.08, "end": 2217.8, + "text": " trying to be good at something. It''s chaotic and it gets overwhelming + and I get really stressed", "tokens": [50668, 1382, 281, 312, 665, 412, 746, 13, + 467, 311, 27013, 293, 309, 2170, 13373, 293, 286, 483, 534, 14471, 50904], "temperature": + 0.0, "avg_logprob": -0.13902139282226564, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.004591033793985844}, {"id": 621, "seek": 220700, "start": 2217.8, + "end": 2224.36, "text": " out sometimes. 
So I don''t know if this is the best advice + to follow, but like, here''s what I do.", "tokens": [50904, 484, 2171, 13, 407, + 286, 500, 380, 458, 498, 341, 307, 264, 1151, 5192, 281, 1524, 11, 457, 411, 11, + 510, 311, 437, 286, 360, 13, 51232], "temperature": 0.0, "avg_logprob": -0.13902139282226564, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.004591033793985844}, + {"id": 622, "seek": 220700, "start": 2224.36, "end": 2229.88, "text": " So I, you + know, I''m very active on Twitter, like maybe to the point of detrimental to my + health,", "tokens": [51232, 407, 286, 11, 291, 458, 11, 286, 478, 588, 4967, 322, + 5794, 11, 411, 1310, 281, 264, 935, 295, 45694, 281, 452, 1585, 11, 51508], "temperature": + 0.0, "avg_logprob": -0.13902139282226564, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.004591033793985844}, {"id": 623, "seek": 220700, "start": 2229.88, + "end": 2234.76, "text": " like I checked Twitter, like, all the time. Like, so I''m + always refreshing Twitter and seeing the", "tokens": [51508, 411, 286, 10033, 5794, + 11, 411, 11, 439, 264, 565, 13, 1743, 11, 370, 286, 478, 1009, 19772, 5794, 293, + 2577, 264, 51752], "temperature": 0.0, "avg_logprob": -0.13902139282226564, "compression_ratio": + 1.7234042553191489, "no_speech_prob": 0.004591033793985844}, {"id": 624, "seek": + 223476, "start": 2234.76, "end": 2239.96, "text": " new headlines. And so I, when + I see like an archive link, I''ll try to, like, if I like it,", "tokens": [50364, + 777, 23867, 13, 400, 370, 286, 11, 562, 286, 536, 411, 364, 23507, 2113, 11, 286, + 603, 853, 281, 11, 411, 11, 498, 286, 411, 309, 11, 50624], "temperature": 0.0, + "avg_logprob": -0.1379076385498047, "compression_ratio": 1.610344827586207, "no_speech_prob": + 0.0018856344977393746}, {"id": 625, "seek": 223476, "start": 2239.96, "end": 2243.5600000000004, + "text": " I''ve tried to discipline myself to be like, don''t just like it. 
Like + read the abstract,", "tokens": [50624, 286, 600, 3031, 281, 13635, 2059, 281, 312, + 411, 11, 500, 380, 445, 411, 309, 13, 1743, 1401, 264, 12649, 11, 50804], "temperature": + 0.0, "avg_logprob": -0.1379076385498047, "compression_ratio": 1.610344827586207, + "no_speech_prob": 0.0018856344977393746}, {"id": 626, "seek": 223476, "start": 2243.5600000000004, + "end": 2248.84, "text": " like get a couple sentences in because clearly, you know, + the titles caught your attention. So,", "tokens": [50804, 411, 483, 257, 1916, 16579, + 294, 570, 4448, 11, 291, 458, 11, 264, 12992, 5415, 428, 3202, 13, 407, 11, 51068], + "temperature": 0.0, "avg_logprob": -0.1379076385498047, "compression_ratio": 1.610344827586207, + "no_speech_prob": 0.0018856344977393746}, {"id": 627, "seek": 223476, "start": 2248.84, + "end": 2254.0400000000004, "text": " so Twitter is really where I get all my news. + And then the art form of making these YouTube videos,", "tokens": [51068, 370, 5794, + 307, 534, 689, 286, 483, 439, 452, 2583, 13, 400, 550, 264, 1523, 1254, 295, 1455, + 613, 3088, 2145, 11, 51328], "temperature": 0.0, "avg_logprob": -0.1379076385498047, + "compression_ratio": 1.610344827586207, "no_speech_prob": 0.0018856344977393746}, + {"id": 628, "seek": 223476, "start": 2254.0400000000004, "end": 2259.0, "text": + " I mean, like Yana Kiltcher and Tim Scarf that I mentioned, the Machine Learning + Street Talk,", "tokens": [51328, 286, 914, 11, 411, 398, 2095, 591, 2352, 6759, + 293, 7172, 23181, 69, 300, 286, 2835, 11, 264, 22155, 15205, 7638, 8780, 11, 51576], + "temperature": 0.0, "avg_logprob": -0.1379076385498047, "compression_ratio": 1.610344827586207, + "no_speech_prob": 0.0018856344977393746}, {"id": 629, "seek": 225900, "start": 2259.08, + "end": 2265.16, "text": " these kind of, this kind of medium. It''s, I watch that. + It''s pretty good. 
I think I watch it on", "tokens": [50368, 613, 733, 295, 11, + 341, 733, 295, 6399, 13, 467, 311, 11, 286, 1159, 300, 13, 467, 311, 1238, 665, + 13, 286, 519, 286, 1159, 309, 322, 50672], "temperature": 0.0, "avg_logprob": -0.21075267222390245, + "compression_ratio": 1.6778523489932886, "no_speech_prob": 0.028601376339793205}, + {"id": 630, "seek": 225900, "start": 2265.16, "end": 2270.36, "text": " like, Exploratory + Street also Alexa, Miss Coffee Bean to kind of go on the list, you know, they''re + not", "tokens": [50672, 411, 11, 12514, 284, 4745, 7638, 611, 22595, 11, 5275, 25481, + 38454, 281, 733, 295, 352, 322, 264, 1329, 11, 291, 458, 11, 436, 434, 406, 50932], + "temperature": 0.0, "avg_logprob": -0.21075267222390245, "compression_ratio": 1.6778523489932886, + "no_speech_prob": 0.028601376339793205}, {"id": 631, "seek": 225900, "start": 2270.36, + "end": 2274.68, "text": " the only ones doing it well. A lot of people are starting + to make really great YouTube videos. And I", "tokens": [50932, 264, 787, 2306, 884, + 309, 731, 13, 316, 688, 295, 561, 366, 2891, 281, 652, 534, 869, 3088, 2145, 13, + 400, 286, 51148], "temperature": 0.0, "avg_logprob": -0.21075267222390245, "compression_ratio": + 1.6778523489932886, "no_speech_prob": 0.028601376339793205}, {"id": 632, "seek": + 225900, "start": 2274.68, "end": 2280.2, "text": " love that kind of medium of showing + these things. 
So on my, my work, my like, my workout, say I''m a", "tokens": [51148, + 959, 300, 733, 295, 6399, 295, 4099, 613, 721, 13, 407, 322, 452, 11, 452, 589, + 11, 452, 411, 11, 452, 12169, 11, 584, 286, 478, 257, 51424], "temperature": 0.0, + "avg_logprob": -0.21075267222390245, "compression_ratio": 1.6778523489932886, "no_speech_prob": + 0.028601376339793205}, {"id": 633, "seek": 225900, "start": 2280.2, "end": 2286.04, + "text": " basketball player and I''ve got to work on my deep learning skills is + it''s mostly about reading these", "tokens": [51424, 11767, 4256, 293, 286, 600, + 658, 281, 589, 322, 452, 2452, 2539, 3942, 307, 309, 311, 5240, 466, 3760, 613, + 51716], "temperature": 0.0, "avg_logprob": -0.21075267222390245, "compression_ratio": + 1.6778523489932886, "no_speech_prob": 0.028601376339793205}, {"id": 634, "seek": + 228604, "start": 2286.04, "end": 2292.2, "text": " papers. My experiments, I''d + say the coding part is not super challenging. Thanks to things like", "tokens": + [50364, 10577, 13, 1222, 12050, 11, 286, 1116, 584, 264, 17720, 644, 307, 406, 1687, + 7595, 13, 2561, 281, 721, 411, 50672], "temperature": 0.0, "avg_logprob": -0.18756159069468675, + "compression_ratio": 1.603305785123967, "no_speech_prob": 0.0033280516508966684}, + {"id": 635, "seek": 228604, "start": 2292.2, "end": 2296.84, "text": " Keras coding + examples and like thanks to them, major thanks to them because that saves me so + much", "tokens": [50672, 591, 6985, 17720, 5110, 293, 411, 3231, 281, 552, 11, 2563, + 3231, 281, 552, 570, 300, 19155, 385, 370, 709, 50904], "temperature": 0.0, "avg_logprob": + -0.18756159069468675, "compression_ratio": 1.603305785123967, "no_speech_prob": + 0.0033280516508966684}, {"id": 636, "seek": 228604, "start": 2296.84, "end": 2303.0, + "text": " headache in just getting running. 
So, so yeah, I try to, I try to read + like five papers at a time.", "tokens": [50904, 23520, 294, 445, 1242, 2614, 13, + 407, 11, 370, 1338, 11, 286, 853, 281, 11, 286, 853, 281, 1401, 411, 1732, 10577, + 412, 257, 565, 13, 51212], "temperature": 0.0, "avg_logprob": -0.18756159069468675, + "compression_ratio": 1.603305785123967, "no_speech_prob": 0.0033280516508966684}, + {"id": 637, "seek": 228604, "start": 2303.0, "end": 2311.64, "text": " I tried to + switch, I try to set 20 minute timers, drink a lot of coffee. And what else do I + do?", "tokens": [51212, 286, 3031, 281, 3679, 11, 286, 853, 281, 992, 945, 3456, + 524, 433, 11, 2822, 257, 688, 295, 4982, 13, 400, 437, 1646, 360, 286, 360, 30, + 51644], "temperature": 0.0, "avg_logprob": -0.18756159069468675, "compression_ratio": + 1.603305785123967, "no_speech_prob": 0.0033280516508966684}, {"id": 638, "seek": + 231164, "start": 2312.52, "end": 2316.52, "text": " Yeah, I guess that''s it really + reading the really reading the papers. I mean, if you make paper", "tokens": [50408, + 865, 11, 286, 2041, 300, 311, 309, 534, 3760, 264, 534, 3760, 264, 10577, 13, 286, + 914, 11, 498, 291, 652, 3035, 50608], "temperature": 0.0, "avg_logprob": -0.16035889867526382, + "compression_ratio": 1.6723549488054608, "no_speech_prob": 0.009077411144971848}, + {"id": 639, "seek": 231164, "start": 2316.52, "end": 2321.24, "text": " summary + videos and write blog posts, that''s also a huge way to retain it. I try to talk + to a lot of", "tokens": [50608, 12691, 2145, 293, 2464, 6968, 12300, 11, 300, 311, + 611, 257, 2603, 636, 281, 18340, 309, 13, 286, 853, 281, 751, 281, 257, 688, 295, + 50844], "temperature": 0.0, "avg_logprob": -0.16035889867526382, "compression_ratio": + 1.6723549488054608, "no_speech_prob": 0.009077411144971848}, {"id": 640, "seek": + 231164, "start": 2321.24, "end": 2327.16, "text": " people also just, you know, + I try to keep a lot of contact. 
Like I''m organized all this through", "tokens": + [50844, 561, 611, 445, 11, 291, 458, 11, 286, 853, 281, 1066, 257, 688, 295, 3385, + 13, 1743, 286, 478, 9983, 439, 341, 807, 51140], "temperature": 0.0, "avg_logprob": + -0.16035889867526382, "compression_ratio": 1.6723549488054608, "no_speech_prob": + 0.009077411144971848}, {"id": 641, "seek": 231164, "start": 2327.16, "end": 2334.3599999999997, + "text": " Twitter. So like, you know, I might just send messages to say, Syek Paul + from who makes, I think", "tokens": [51140, 5794, 13, 407, 411, 11, 291, 458, 11, + 286, 1062, 445, 2845, 7897, 281, 584, 11, 3902, 916, 4552, 490, 567, 1669, 11, 286, + 519, 51500], "temperature": 0.0, "avg_logprob": -0.16035889867526382, "compression_ratio": + 1.6723549488054608, "no_speech_prob": 0.009077411144971848}, {"id": 642, "seek": + 231164, "start": 2334.3599999999997, "end": 2339.64, "text": " he works at Cardid + and he makes, he''s one of the leaders of Keras code examples. I''ll send him ideas.", + "tokens": [51500, 415, 1985, 412, 11877, 327, 293, 415, 1669, 11, 415, 311, 472, + 295, 264, 3523, 295, 591, 6985, 3089, 5110, 13, 286, 603, 2845, 796, 3487, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.16035889867526382, "compression_ratio": 1.6723549488054608, + "no_speech_prob": 0.009077411144971848}, {"id": 643, "seek": 233964, "start": 2339.64, + "end": 2343.08, "text": " I''ll be like, you know, I saw this paper on Twitter. + I think, you know, this reminds me of what", "tokens": [50364, 286, 603, 312, 411, + 11, 291, 458, 11, 286, 1866, 341, 3035, 322, 5794, 13, 286, 519, 11, 291, 458, 11, + 341, 12025, 385, 295, 437, 50536], "temperature": 0.0, "avg_logprob": -0.14780119436758535, + "compression_ratio": 1.7805755395683454, "no_speech_prob": 0.06597090512514114}, + {"id": 644, "seek": 233964, "start": 2343.08, "end": 2347.64, "text": " you''re + doing. And, and yes, I guess overall, that''s my information diet. 
I''m probably + leaving", "tokens": [50536, 291, 434, 884, 13, 400, 11, 293, 2086, 11, 286, 2041, + 4787, 11, 300, 311, 452, 1589, 6339, 13, 286, 478, 1391, 5012, 50764], "temperature": + 0.0, "avg_logprob": -0.14780119436758535, "compression_ratio": 1.7805755395683454, + "no_speech_prob": 0.06597090512514114}, {"id": 645, "seek": 233964, "start": 2347.64, + "end": 2351.3199999999997, "text": " something that I didn''t really, you know, + prepare something for this, but no, it''s okay. I mean, it''s", "tokens": [50764, + 746, 300, 286, 994, 380, 534, 11, 291, 458, 11, 5940, 746, 337, 341, 11, 457, 572, + 11, 309, 311, 1392, 13, 286, 914, 11, 309, 311, 50948], "temperature": 0.0, "avg_logprob": + -0.14780119436758535, "compression_ratio": 1.7805755395683454, "no_speech_prob": + 0.06597090512514114}, {"id": 646, "seek": 233964, "start": 2351.3199999999997, "end": + 2356.52, "text": " also, it''s also great that you''re speaking your mind, but and + things that really stick, you know,", "tokens": [50948, 611, 11, 309, 311, 611, + 869, 300, 291, 434, 4124, 428, 1575, 11, 457, 293, 721, 300, 534, 2897, 11, 291, + 458, 11, 51208], "temperature": 0.0, "avg_logprob": -0.14780119436758535, "compression_ratio": + 1.7805755395683454, "no_speech_prob": 0.06597090512514114}, {"id": 647, "seek": + 233964, "start": 2356.52, "end": 2361.8799999999997, "text": " you mentioned them, + right? But where on that scale, you would put medium, you know, the blogging platform", + "tokens": [51208, 291, 2835, 552, 11, 558, 30, 583, 689, 322, 300, 4373, 11, 291, + 576, 829, 6399, 11, 291, 458, 11, 264, 6968, 3249, 3663, 51476], "temperature": + 0.0, "avg_logprob": -0.14780119436758535, "compression_ratio": 1.7805755395683454, + "no_speech_prob": 0.06597090512514114}, {"id": 648, "seek": 236188, "start": 2361.88, + "end": 2369.8, "text": " where it kind of thrives with tutorials. 
And sometimes + these tutorials, they''re kind of okay,", "tokens": [50364, 689, 309, 733, 295, + 23949, 977, 365, 17616, 13, 400, 2171, 613, 17616, 11, 436, 434, 733, 295, 1392, + 11, 50760], "temperature": 0.0, "avg_logprob": -0.0996705334762047, "compression_ratio": + 1.7253521126760563, "no_speech_prob": 0.06096908077597618}, {"id": 649, "seek": + 236188, "start": 2369.8, "end": 2374.52, "text": " but you kind of like, okay, are + they going deep enough? But then there are other things where they", "tokens": [50760, + 457, 291, 733, 295, 411, 11, 1392, 11, 366, 436, 516, 2452, 1547, 30, 583, 550, + 456, 366, 661, 721, 689, 436, 50996], "temperature": 0.0, "avg_logprob": -0.0996705334762047, + "compression_ratio": 1.7253521126760563, "no_speech_prob": 0.06096908077597618}, + {"id": 650, "seek": 236188, "start": 2374.52, "end": 2379.4, "text": " summarize + papers in such a way that they actually try to explain it. It''s almost like popularizing", + "tokens": [50996, 20858, 10577, 294, 1270, 257, 636, 300, 436, 767, 853, 281, 2903, + 309, 13, 467, 311, 1920, 411, 3743, 3319, 51240], "temperature": 0.0, "avg_logprob": + -0.0996705334762047, "compression_ratio": 1.7253521126760563, "no_speech_prob": + 0.06096908077597618}, {"id": 651, "seek": 236188, "start": 2379.4, "end": 2385.1600000000003, + "text": " science because you do want to breed that next, you know, generation as + well. 
And maybe you will", "tokens": [51240, 3497, 570, 291, 360, 528, 281, 18971, + 300, 958, 11, 291, 458, 11, 5125, 382, 731, 13, 400, 1310, 291, 486, 51528], "temperature": + 0.0, "avg_logprob": -0.0996705334762047, "compression_ratio": 1.7253521126760563, + "no_speech_prob": 0.06096908077597618}, {"id": 652, "seek": 236188, "start": 2385.1600000000003, + "end": 2390.6, "text": " have some feedback to your ideas because don''t you think + when you publish a research paper, you know,", "tokens": [51528, 362, 512, 5824, + 281, 428, 3487, 570, 500, 380, 291, 519, 562, 291, 11374, 257, 2132, 3035, 11, 291, + 458, 11, 51800], "temperature": 0.0, "avg_logprob": -0.0996705334762047, "compression_ratio": + 1.7253521126760563, "no_speech_prob": 0.06096908077597618}, {"id": 653, "seek": + 239060, "start": 2390.6, "end": 2396.04, "text": " for the most part of the humanity, + it''s dry text. For some, it''s just Greek, right? They will not", "tokens": [50364, + 337, 264, 881, 644, 295, 264, 10243, 11, 309, 311, 4016, 2487, 13, 1171, 512, 11, + 309, 311, 445, 10281, 11, 558, 30, 814, 486, 406, 50636], "temperature": 0.0, "avg_logprob": + -0.16995952636238157, "compression_ratio": 1.8262548262548262, "no_speech_prob": + 0.03476051613688469}, {"id": 654, "seek": 239060, "start": 2396.04, "end": 2400.8399999999997, + "text": " even understand it. They will never, they will never read it. And so, + but they still might be curious,", "tokens": [50636, 754, 1223, 309, 13, 814, 486, + 1128, 11, 436, 486, 1128, 1401, 309, 13, 400, 370, 11, 457, 436, 920, 1062, 312, + 6369, 11, 50876], "temperature": 0.0, "avg_logprob": -0.16995952636238157, "compression_ratio": + 1.8262548262548262, "no_speech_prob": 0.03476051613688469}, {"id": 655, "seek": + 239060, "start": 2400.8399999999997, "end": 2405.72, "text": " like, okay, how, + you know, robots make decisions or something like that. 
You know, so,", "tokens": + [50876, 411, 11, 1392, 11, 577, 11, 291, 458, 11, 14733, 652, 5327, 420, 746, 411, + 300, 13, 509, 458, 11, 370, 11, 51120], "temperature": 0.0, "avg_logprob": -0.16995952636238157, + "compression_ratio": 1.8262548262548262, "no_speech_prob": 0.03476051613688469}, + {"id": 656, "seek": 239060, "start": 2406.2799999999997, "end": 2411.3199999999997, + "text": " how does my car, how does my car keep the lane keeps the lane? And actually + today I was driving,", "tokens": [51148, 577, 775, 452, 1032, 11, 577, 775, 452, + 1032, 1066, 264, 12705, 5965, 264, 12705, 30, 400, 767, 965, 286, 390, 4840, 11, + 51400], "temperature": 0.0, "avg_logprob": -0.16995952636238157, "compression_ratio": + 1.8262548262548262, "no_speech_prob": 0.03476051613688469}, {"id": 657, "seek": + 239060, "start": 2411.3199999999997, "end": 2416.6, "text": " I was driving to work + and I was like, my car actually switched to the lane keeping mode.", "tokens": [51400, + 286, 390, 4840, 281, 589, 293, 286, 390, 411, 11, 452, 1032, 767, 16858, 281, 264, + 12705, 5145, 4391, 13, 51664], "temperature": 0.0, "avg_logprob": -0.16995952636238157, + "compression_ratio": 1.8262548262548262, "no_speech_prob": 0.03476051613688469}, + {"id": 658, "seek": 241660, "start": 2417.24, "end": 2422.2799999999997, "text": + " And it was telling me that I should not, you know, steer to the left that much. + So it was actually", "tokens": [50396, 400, 309, 390, 3585, 385, 300, 286, 820, + 406, 11, 291, 458, 11, 30814, 281, 264, 1411, 300, 709, 13, 407, 309, 390, 767, + 50648], "temperature": 0.0, "avg_logprob": -0.155682006249061, "compression_ratio": + 1.775735294117647, "no_speech_prob": 0.08642230182886124}, {"id": 659, "seek": 241660, + "start": 2422.2799999999997, "end": 2426.68, "text": " steering to the right. 
But + the moment it noticed that I put my hands away from the steering wheel,", "tokens": + [50648, 14823, 281, 264, 558, 13, 583, 264, 1623, 309, 5694, 300, 286, 829, 452, + 2377, 1314, 490, 264, 14823, 5589, 11, 50868], "temperature": 0.0, "avg_logprob": + -0.155682006249061, "compression_ratio": 1.775735294117647, "no_speech_prob": 0.08642230182886124}, + {"id": 660, "seek": 241660, "start": 2426.68, "end": 2431.64, "text": " it actually + started alarming me and saying, hey, I''ll sleep or something, you know. So it''s + also", "tokens": [50868, 309, 767, 1409, 44043, 385, 293, 1566, 11, 4177, 11, 286, + 603, 2817, 420, 746, 11, 291, 458, 13, 407, 309, 311, 611, 51116], "temperature": + 0.0, "avg_logprob": -0.155682006249061, "compression_ratio": 1.775735294117647, + "no_speech_prob": 0.08642230182886124}, {"id": 661, "seek": 241660, "start": 2431.64, + "end": 2436.52, "text": " like kind of caring for you, right? In a way, so it''s + not trying to do so much more work,", "tokens": [51116, 411, 733, 295, 15365, 337, + 291, 11, 558, 30, 682, 257, 636, 11, 370, 309, 311, 406, 1382, 281, 360, 370, 709, + 544, 589, 11, 51360], "temperature": 0.0, "avg_logprob": -0.155682006249061, "compression_ratio": + 1.775735294117647, "no_speech_prob": 0.08642230182886124}, {"id": 662, "seek": 241660, + "start": 2437.56, "end": 2444.2799999999997, "text": " in that sense. 
Yeah, like, + the idea of popular science, I mean, you know, I''m recording my podcast", "tokens": + [51412, 294, 300, 2020, 13, 865, 11, 411, 11, 264, 1558, 295, 3743, 3497, 11, 286, + 914, 11, 291, 458, 11, 286, 478, 6613, 452, 7367, 51748], "temperature": 0.0, "avg_logprob": + -0.155682006249061, "compression_ratio": 1.775735294117647, "no_speech_prob": 0.08642230182886124}, + {"id": 663, "seek": 244428, "start": 2444.36, "end": 2451.2400000000002, "text": + " behind a bookshelf, like it makes me look smarter, but I only really, I only really + read books like,", "tokens": [50368, 2261, 257, 1446, 46626, 11, 411, 309, 1669, + 385, 574, 20294, 11, 457, 286, 787, 534, 11, 286, 787, 534, 1401, 3642, 411, 11, + 50712], "temperature": 0.0, "avg_logprob": -0.19112039581546938, "compression_ratio": + 1.8275862068965518, "no_speech_prob": 0.00562828965485096}, {"id": 664, "seek": + 244428, "start": 2451.2400000000002, "end": 2454.6800000000003, "text": " you know, + like the book of, I mean, the book of why is a bad example. That''s a really great + book,", "tokens": [50712, 291, 458, 11, 411, 264, 1446, 295, 11, 286, 914, 11, 264, + 1446, 295, 983, 307, 257, 1578, 1365, 13, 663, 311, 257, 534, 869, 1446, 11, 50884], + "temperature": 0.0, "avg_logprob": -0.19112039581546938, "compression_ratio": 1.8275862068965518, + "no_speech_prob": 0.00562828965485096}, {"id": 665, "seek": 244428, "start": 2454.6800000000003, + "end": 2459.88, "text": " like technical and I really really like that one. 
But + most of these like popular science books,", "tokens": [50884, 411, 6191, 293, 286, + 534, 534, 411, 300, 472, 13, 583, 881, 295, 613, 411, 3743, 3497, 3642, 11, 51144], + "temperature": 0.0, "avg_logprob": -0.19112039581546938, "compression_ratio": 1.8275862068965518, + "no_speech_prob": 0.00562828965485096}, {"id": 666, "seek": 244428, "start": 2459.88, + "end": 2464.44, "text": " I''d have to be like on an airplane or something like + I, or are in the same with the category", "tokens": [51144, 286, 1116, 362, 281, + 312, 411, 322, 364, 17130, 420, 746, 411, 286, 11, 420, 366, 294, 264, 912, 365, + 264, 7719, 51372], "temperature": 0.0, "avg_logprob": -0.19112039581546938, "compression_ratio": + 1.8275862068965518, "no_speech_prob": 0.00562828965485096}, {"id": 667, "seek": + 244428, "start": 2464.44, "end": 2469.7200000000003, "text": " of medium articles + that are popular science. Like, you know, I read research papers only,", "tokens": + [51372, 295, 6399, 11290, 300, 366, 3743, 3497, 13, 1743, 11, 291, 458, 11, 286, + 1401, 2132, 10577, 787, 11, 51636], "temperature": 0.0, "avg_logprob": -0.19112039581546938, + "compression_ratio": 1.8275862068965518, "no_speech_prob": 0.00562828965485096}, + {"id": 668, "seek": 246972, "start": 2470.68, "end": 2474.52, "text": " not to like + be dismissive of anything else, but that''s just like the question of what particularly", + "tokens": [50412, 406, 281, 411, 312, 16974, 488, 295, 1340, 1646, 11, 457, 300, + 311, 445, 411, 264, 1168, 295, 437, 4098, 50604], "temperature": 0.0, "avg_logprob": + -0.25637316509960145, "compression_ratio": 1.5765472312703583, "no_speech_prob": + 0.004839413333684206}, {"id": 669, "seek": 246972, "start": 2474.52, "end": 2481.64, + "text": " do I study. And in my approach is very people-centric. 
Like, you know, + like when, say, Chelsea Finn", "tokens": [50604, 360, 286, 2979, 13, 400, 294, 452, + 3109, 307, 588, 561, 12, 45300, 13, 1743, 11, 291, 458, 11, 411, 562, 11, 584, 11, + 26527, 21066, 50960], "temperature": 0.0, "avg_logprob": -0.25637316509960145, "compression_ratio": + 1.5765472312703583, "no_speech_prob": 0.004839413333684206}, {"id": 670, "seek": + 246972, "start": 2482.52, "end": 2487.8799999999997, "text": " publishes a new paper + on Twitter, I''ll go read that because I kind of have been following her", "tokens": + [51004, 11374, 279, 257, 777, 3035, 322, 5794, 11, 286, 603, 352, 1401, 300, 570, + 286, 733, 295, 362, 668, 3480, 720, 51272], "temperature": 0.0, "avg_logprob": -0.25637316509960145, + "compression_ratio": 1.5765472312703583, "no_speech_prob": 0.004839413333684206}, + {"id": 671, "seek": 246972, "start": 2487.8799999999997, "end": 2493.7999999999997, + "text": " thinking, like Jeff Cloon is another example with the AIGA''s or Fran\u00e7ois + Shalide. These kind of", "tokens": [51272, 1953, 11, 411, 7506, 31901, 266, 307, + 1071, 1365, 365, 264, 7318, 12570, 311, 420, 1526, 12368, 7376, 1160, 304, 482, + 13, 1981, 733, 295, 51568], "temperature": 0.0, "avg_logprob": -0.25637316509960145, + "compression_ratio": 1.5765472312703583, "no_speech_prob": 0.004839413333684206}, + {"id": 672, "seek": 246972, "start": 2493.7999999999997, "end": 2498.2, "text": + " people like I, like Michael Bronson with the geometric deep learning is another + great example.", "tokens": [51568, 561, 411, 286, 11, 411, 5116, 1603, 892, 266, + 365, 264, 33246, 2452, 2539, 307, 1071, 869, 1365, 13, 51788], "temperature": 0.0, + "avg_logprob": -0.25637316509960145, "compression_ratio": 1.5765472312703583, "no_speech_prob": + 0.004839413333684206}, {"id": 673, "seek": 249820, "start": 2498.9199999999996, + "end": 2503.64, "text": " I hate doing these lists. 
I never like to do these lists + because it''s so endless, like the vocabulary", "tokens": [50400, 286, 4700, 884, + 613, 14511, 13, 286, 1128, 411, 281, 360, 613, 14511, 570, 309, 311, 370, 16144, + 11, 411, 264, 19864, 50636], "temperature": 0.0, "avg_logprob": -0.1472941607963748, + "compression_ratio": 1.794776119402985, "no_speech_prob": 0.04324910417199135}, + {"id": 674, "seek": 249820, "start": 2503.64, "end": 2509.72, "text": " you need + to kind of assess, like I''ve left off so many people, but you know, I like the + people''s", "tokens": [50636, 291, 643, 281, 733, 295, 5877, 11, 411, 286, 600, + 1411, 766, 370, 867, 561, 11, 457, 291, 458, 11, 286, 411, 264, 561, 311, 50940], + "temperature": 0.0, "avg_logprob": -0.1472941607963748, "compression_ratio": 1.794776119402985, + "no_speech_prob": 0.04324910417199135}, {"id": 675, "seek": 249820, "start": 2509.72, + "end": 2513.8799999999997, "text": " centric focus and I try to get to know these + people and understand like how they think of these", "tokens": [50940, 1489, 1341, + 1879, 293, 286, 853, 281, 483, 281, 458, 613, 561, 293, 1223, 411, 577, 436, 519, + 295, 613, 51148], "temperature": 0.0, "avg_logprob": -0.1472941607963748, "compression_ratio": + 1.794776119402985, "no_speech_prob": 0.04324910417199135}, {"id": 676, "seek": 249820, + "start": 2513.8799999999997, "end": 2519.48, "text": " things. It''s like the same + thing as you go to the conference. Sometimes you don''t go to that", "tokens": [51148, + 721, 13, 467, 311, 411, 264, 912, 551, 382, 291, 352, 281, 264, 7586, 13, 4803, + 291, 500, 380, 352, 281, 300, 51428], "temperature": 0.0, "avg_logprob": -0.1472941607963748, + "compression_ratio": 1.794776119402985, "no_speech_prob": 0.04324910417199135}, + {"id": 677, "seek": 249820, "start": 2519.48, "end": 2525.3199999999997, "text": + " specific topic. 
Maybe when you''re a little bit more junior, you do, but later + in your career,", "tokens": [51428, 2685, 4829, 13, 2704, 562, 291, 434, 257, 707, + 857, 544, 16195, 11, 291, 360, 11, 457, 1780, 294, 428, 3988, 11, 51720], "temperature": + 0.0, "avg_logprob": -0.1472941607963748, "compression_ratio": 1.794776119402985, + "no_speech_prob": 0.04324910417199135}, {"id": 678, "seek": 252532, "start": 2525.4, + "end": 2531.2400000000002, "text": " like academic or industrial, you actually go + to listen to that person because they might not", "tokens": [50368, 411, 7778, 420, + 9987, 11, 291, 767, 352, 281, 2140, 281, 300, 954, 570, 436, 1062, 406, 50660], + "temperature": 0.0, "avg_logprob": -0.1715004298997962, "compression_ratio": 1.7107142857142856, + "no_speech_prob": 0.022812463343143463}, {"id": 679, "seek": 252532, "start": 2531.2400000000002, + "end": 2538.2000000000003, "text": " give you any novel idea, but they might give + you so much experience that you daily, like really need,", "tokens": [50660, 976, + 291, 604, 7613, 1558, 11, 457, 436, 1062, 976, 291, 370, 709, 1752, 300, 291, 5212, + 11, 411, 534, 643, 11, 51008], "temperature": 0.0, "avg_logprob": -0.1715004298997962, + "compression_ratio": 1.7107142857142856, "no_speech_prob": 0.022812463343143463}, + {"id": 680, "seek": 252532, "start": 2538.2000000000003, "end": 2544.84, "text": + " right? Yeah, and just following the timeline of their work, it helped, like their + newest work will", "tokens": [51008, 558, 30, 865, 11, 293, 445, 3480, 264, 12933, + 295, 641, 589, 11, 309, 4254, 11, 411, 641, 17569, 589, 486, 51340], "temperature": + 0.0, "avg_logprob": -0.1715004298997962, "compression_ratio": 1.7107142857142856, + "no_speech_prob": 0.022812463343143463}, {"id": 681, "seek": 252532, "start": 2544.84, + "end": 2549.32, "text": " help you realize, oh, that''s their thinking in the past + work too. 
I kind of see how they''re", "tokens": [51340, 854, 291, 4325, 11, 1954, + 11, 300, 311, 641, 1953, 294, 264, 1791, 589, 886, 13, 286, 733, 295, 536, 577, + 436, 434, 51564], "temperature": 0.0, "avg_logprob": -0.1715004298997962, "compression_ratio": + 1.7107142857142856, "no_speech_prob": 0.022812463343143463}, {"id": 682, "seek": + 252532, "start": 2549.32, "end": 2553.88, "text": " thinking about these things. + And it''s like, you know, everybody thinks so abstract. They have", "tokens": [51564, + 1953, 466, 613, 721, 13, 400, 309, 311, 411, 11, 291, 458, 11, 2201, 7309, 370, + 12649, 13, 814, 362, 51792], "temperature": 0.0, "avg_logprob": -0.1715004298997962, + "compression_ratio": 1.7107142857142856, "no_speech_prob": 0.022812463343143463}, + {"id": 683, "seek": 255388, "start": 2553.88, "end": 2559.56, "text": " this idea, + this vision, and it can be hard to communicate the vision in writing or videos. + So", "tokens": [50364, 341, 1558, 11, 341, 5201, 11, 293, 309, 393, 312, 1152, 281, + 7890, 264, 5201, 294, 3579, 420, 2145, 13, 407, 50648], "temperature": 0.0, "avg_logprob": + -0.214063028494517, "compression_ratio": 1.5679012345679013, "no_speech_prob": 0.013443772681057453}, + {"id": 684, "seek": 255388, "start": 2559.56, "end": 2566.2000000000003, "text": + " yeah, just like you said, I think just repeated exposure to the same person is + like, hopefully", "tokens": [50648, 1338, 11, 445, 411, 291, 848, 11, 286, 519, + 445, 10477, 10420, 281, 264, 912, 954, 307, 411, 11, 4696, 50980], "temperature": + 0.0, "avg_logprob": -0.214063028494517, "compression_ratio": 1.5679012345679013, + "no_speech_prob": 0.013443772681057453}, {"id": 685, "seek": 255388, "start": 2566.2000000000003, + "end": 2572.36, "text": " that''s Henry AI last thing. Yeah, absolutely. I''m pretty + sure. 
I saw some really great comments", "tokens": [50980, 300, 311, 11085, 7318, + 1036, 551, 13, 865, 11, 3122, 13, 286, 478, 1238, 988, 13, 286, 1866, 512, 534, + 869, 3053, 51288], "temperature": 0.0, "avg_logprob": -0.214063028494517, "compression_ratio": + 1.5679012345679013, "no_speech_prob": 0.013443772681057453}, {"id": 686, "seek": + 255388, "start": 2572.36, "end": 2577.08, "text": " underneath your videos, you + know, some people were saying, I can''t wait for the next one. So you", "tokens": + [51288, 7223, 428, 2145, 11, 291, 458, 11, 512, 561, 645, 1566, 11, 286, 393, 380, + 1699, 337, 264, 958, 472, 13, 407, 291, 51524], "temperature": 0.0, "avg_logprob": + -0.214063028494517, "compression_ratio": 1.5679012345679013, "no_speech_prob": 0.013443772681057453}, + {"id": 687, "seek": 257708, "start": 2577.16, "end": 2584.04, "text": " definitely + doing great job there. So could as to you for doing that for so long actually. I", + "tokens": [50368, 2138, 884, 869, 1691, 456, 13, 407, 727, 382, 281, 291, 337, 884, + 300, 337, 370, 938, 767, 13, 286, 50712], "temperature": 0.0, "avg_logprob": -0.1759114424387614, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.03125854954123497}, + {"id": 688, "seek": 257708, "start": 2584.04, "end": 2588.2799999999997, "text": + " don''t know for how long you''ve been doing this, but you have a ton of videos. + Yeah, and I really", "tokens": [50712, 500, 380, 458, 337, 577, 938, 291, 600, 668, + 884, 341, 11, 457, 291, 362, 257, 2952, 295, 2145, 13, 865, 11, 293, 286, 534, 50924], + "temperature": 0.0, "avg_logprob": -0.1759114424387614, "compression_ratio": 1.7272727272727273, + "no_speech_prob": 0.03125854954123497}, {"id": 689, "seek": 257708, "start": 2588.2799999999997, + "end": 2592.92, "text": " appreciate it. 
You know, the people who keep commenting, + I, you know, I recognize your profiles,", "tokens": [50924, 4449, 309, 13, 509, + 458, 11, 264, 561, 567, 1066, 29590, 11, 286, 11, 291, 458, 11, 286, 5521, 428, + 23693, 11, 51156], "temperature": 0.0, "avg_logprob": -0.1759114424387614, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.03125854954123497}, {"id": 690, "seek": + 257708, "start": 2592.92, "end": 2598.84, "text": " and I do really, really appreciate + it. So it helps me keep making the videos and staying convinced of", "tokens": [51156, + 293, 286, 360, 534, 11, 534, 4449, 309, 13, 407, 309, 3665, 385, 1066, 1455, 264, + 2145, 293, 7939, 12561, 295, 51452], "temperature": 0.0, "avg_logprob": -0.1759114424387614, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.03125854954123497}, + {"id": 691, "seek": 257708, "start": 2598.84, "end": 2604.92, "text": " that medium + of YouTube being one of the ways to express these ideas. I''d say like even,", "tokens": + [51452, 300, 6399, 295, 3088, 885, 472, 295, 264, 2098, 281, 5109, 613, 3487, 13, + 286, 1116, 584, 411, 754, 11, 51756], "temperature": 0.0, "avg_logprob": -0.1759114424387614, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.03125854954123497}, + {"id": 692, "seek": 260492, "start": 2605.8, "end": 2611.0, "text": " even more + so than writing papers that you submit to these conferences. Sometimes I, you know,", + "tokens": [50408, 754, 544, 370, 813, 3579, 10577, 300, 291, 10315, 281, 613, 22032, + 13, 4803, 286, 11, 291, 458, 11, 50668], "temperature": 0.0, "avg_logprob": -0.13820074438079585, + "compression_ratio": 1.685121107266436, "no_speech_prob": 0.00940753985196352}, + {"id": 693, "seek": 260492, "start": 2611.0, "end": 2616.6800000000003, "text": + " I think making a YouTube video can be a powerful way to share ideas. 
I don''t + know if I want to", "tokens": [50668, 286, 519, 1455, 257, 3088, 960, 393, 312, + 257, 4005, 636, 281, 2073, 3487, 13, 286, 500, 380, 458, 498, 286, 528, 281, 50952], + "temperature": 0.0, "avg_logprob": -0.13820074438079585, "compression_ratio": 1.685121107266436, + "no_speech_prob": 0.00940753985196352}, {"id": 694, "seek": 260492, "start": 2616.6800000000003, + "end": 2620.76, "text": " completely put my flag on that idea because I, you know, + these reviews, you do get some really good", "tokens": [50952, 2584, 829, 452, 7166, + 322, 300, 1558, 570, 286, 11, 291, 458, 11, 613, 10229, 11, 291, 360, 483, 512, + 534, 665, 51156], "temperature": 0.0, "avg_logprob": -0.13820074438079585, "compression_ratio": + 1.685121107266436, "no_speech_prob": 0.00940753985196352}, {"id": 695, "seek": 260492, + "start": 2620.76, "end": 2624.92, "text": " reviews. Like as I mentioned previously + at the beginning of the video, I, you know, I literally got", "tokens": [51156, + 10229, 13, 1743, 382, 286, 2835, 8046, 412, 264, 2863, 295, 264, 960, 11, 286, 11, + 291, 458, 11, 286, 3736, 658, 51364], "temperature": 0.0, "avg_logprob": -0.13820074438079585, + "compression_ratio": 1.685121107266436, "no_speech_prob": 0.00940753985196352}, + {"id": 696, "seek": 260492, "start": 2624.92, "end": 2632.04, "text": " smashed + on my ICLR reviews. They were not good, but I got, I got really high quality feedback. + So,", "tokens": [51364, 33269, 322, 452, 14360, 31722, 10229, 13, 814, 645, 406, + 665, 11, 457, 286, 658, 11, 286, 658, 534, 1090, 3125, 5824, 13, 407, 11, 51720], + "temperature": 0.0, "avg_logprob": -0.13820074438079585, "compression_ratio": 1.685121107266436, + "no_speech_prob": 0.00940753985196352}, {"id": 697, "seek": 263204, "start": 2632.04, + "end": 2637.16, "text": " yes. You know, you''re learning from it. You''re learning. + Right. Yeah. 
Actually, one of my managers used", "tokens": [50364, 2086, 13, 509, + 458, 11, 291, 434, 2539, 490, 309, 13, 509, 434, 2539, 13, 1779, 13, 865, 13, 5135, + 11, 472, 295, 452, 14084, 1143, 50620], "temperature": 0.0, "avg_logprob": -0.1615606689453125, + "compression_ratio": 1.7857142857142858, "no_speech_prob": 0.015546012669801712}, + {"id": 698, "seek": 263204, "start": 2637.16, "end": 2644.68, "text": " to say feedback + is gold. So even if it feels painful, take it because because the problem is that", + "tokens": [50620, 281, 584, 5824, 307, 3821, 13, 407, 754, 498, 309, 3417, 11697, + 11, 747, 309, 570, 570, 264, 1154, 307, 300, 50996], "temperature": 0.0, "avg_logprob": + -0.1615606689453125, "compression_ratio": 1.7857142857142858, "no_speech_prob": + 0.015546012669801712}, {"id": 699, "seek": 263204, "start": 2644.68, "end": 2649.56, + "text": " sometimes, especially as you grow in your career, you know, at some point + you will be the role model", "tokens": [50996, 2171, 11, 2318, 382, 291, 1852, 294, + 428, 3988, 11, 291, 458, 11, 412, 512, 935, 291, 486, 312, 264, 3090, 2316, 51240], + "temperature": 0.0, "avg_logprob": -0.1615606689453125, "compression_ratio": 1.7857142857142858, + "no_speech_prob": 0.015546012669801712}, {"id": 700, "seek": 263204, "start": 2649.56, + "end": 2654.6, "text": " for some other people. Now, where do you get the feedback + from nowhere? Because you''re the person", "tokens": [51240, 337, 512, 661, 561, + 13, 823, 11, 689, 360, 291, 483, 264, 5824, 490, 11159, 30, 1436, 291, 434, 264, + 954, 51492], "temperature": 0.0, "avg_logprob": -0.1615606689453125, "compression_ratio": + 1.7857142857142858, "no_speech_prob": 0.015546012669801712}, {"id": 701, "seek": + 263204, "start": 2654.6, "end": 2660.12, "text": " giving feedback. But you still + need to grow. 
You still have pains, you have doubts, you have ideas,", "tokens": + [51492, 2902, 5824, 13, 583, 291, 920, 643, 281, 1852, 13, 509, 920, 362, 29774, + 11, 291, 362, 22618, 11, 291, 362, 3487, 11, 51768], "temperature": 0.0, "avg_logprob": + -0.1615606689453125, "compression_ratio": 1.7857142857142858, "no_speech_prob": + 0.015546012669801712}, {"id": 702, "seek": 266012, "start": 2660.12, "end": 2664.44, + "text": " you need validation. And maybe you''re doing something wrong as well at + some point. Maybe somebody", "tokens": [50364, 291, 643, 24071, 13, 400, 1310, 291, + 434, 884, 746, 2085, 382, 731, 412, 512, 935, 13, 2704, 2618, 50580], "temperature": + 0.0, "avg_logprob": -0.16474604399307916, "compression_ratio": 1.721254355400697, + "no_speech_prob": 0.013061142526566982}, {"id": 703, "seek": 266012, "start": 2664.44, + "end": 2669.48, "text": " is intimidated to tell you that because you are at the + top. You are like the boss or whatever. You", "tokens": [50580, 307, 40234, 281, + 980, 291, 300, 570, 291, 366, 412, 264, 1192, 13, 509, 366, 411, 264, 5741, 420, + 2035, 13, 509, 50832], "temperature": 0.0, "avg_logprob": -0.16474604399307916, + "compression_ratio": 1.721254355400697, "no_speech_prob": 0.013061142526566982}, + {"id": 704, "seek": 266012, "start": 2669.48, "end": 2676.8399999999997, "text": + " know, like who gives you feedback at that point? They actually recommend to turn + to, you know,", "tokens": [50832, 458, 11, 411, 567, 2709, 291, 5824, 412, 300, + 935, 30, 814, 767, 2748, 281, 1261, 281, 11, 291, 458, 11, 51200], "temperature": + 0.0, "avg_logprob": -0.16474604399307916, "compression_ratio": 1.721254355400697, + "no_speech_prob": 0.013061142526566982}, {"id": 705, "seek": 266012, "start": 2676.8399999999997, + "end": 2681.7999999999997, "text": " professional coaches and kind of those people + who can actually steer you in some direction. 
Right.", "tokens": [51200, 4843, 17503, + 293, 733, 295, 729, 561, 567, 393, 767, 30814, 291, 294, 512, 3513, 13, 1779, 13, + 51448], "temperature": 0.0, "avg_logprob": -0.16474604399307916, "compression_ratio": + 1.721254355400697, "no_speech_prob": 0.013061142526566982}, {"id": 706, "seek": + 266012, "start": 2681.7999999999997, "end": 2687.72, "text": " Oh, maybe you can + unload your thoughts. Have you found yourself in that situation? Or what, what do + you", "tokens": [51448, 876, 11, 1310, 291, 393, 32165, 428, 4598, 13, 3560, 291, + 1352, 1803, 294, 300, 2590, 30, 1610, 437, 11, 437, 360, 291, 51744], "temperature": + 0.0, "avg_logprob": -0.16474604399307916, "compression_ratio": 1.721254355400697, + "no_speech_prob": 0.013061142526566982}, {"id": 707, "seek": 268772, "start": 2687.72, + "end": 2697.24, "text": " think? Yeah. Well, I mean, I''m in a lucky situation where + I do have a formal PhD advisor that,", "tokens": [50364, 519, 30, 865, 13, 1042, + 11, 286, 914, 11, 286, 478, 294, 257, 6356, 2590, 689, 286, 360, 362, 257, 9860, + 14476, 19161, 300, 11, 50840], "temperature": 0.0, "avg_logprob": -0.14555221973079266, + "compression_ratio": 1.5666666666666667, "no_speech_prob": 0.001687857904471457}, + {"id": 708, "seek": 268772, "start": 2697.24, "end": 2703.3999999999996, "text": + " as I mentioned, I speak on the phone with very often. And, and you know, my PhD + advisor and I", "tokens": [50840, 382, 286, 2835, 11, 286, 1710, 322, 264, 2593, + 365, 588, 2049, 13, 400, 11, 293, 291, 458, 11, 452, 14476, 19161, 293, 286, 51148], + "temperature": 0.0, "avg_logprob": -0.14555221973079266, "compression_ratio": 1.5666666666666667, + "no_speech_prob": 0.001687857904471457}, {"id": 709, "seek": 268772, "start": 2703.3999999999996, + "end": 2708.04, "text": " had a relationship for so long that he like introduced + machine learning to me. 
So it''s like,", "tokens": [51148, 632, 257, 2480, 337, + 370, 938, 300, 415, 411, 7268, 3479, 2539, 281, 385, 13, 407, 309, 311, 411, 11, + 51380], "temperature": 0.0, "avg_logprob": -0.14555221973079266, "compression_ratio": + 1.5666666666666667, "no_speech_prob": 0.001687857904471457}, {"id": 710, "seek": + 268772, "start": 2708.04, "end": 2714.4399999999996, "text": " I was a basketball + player, you know, taking classes. And I, and so this was my introduction to", "tokens": + [51380, 286, 390, 257, 11767, 4256, 11, 291, 458, 11, 1940, 5359, 13, 400, 286, + 11, 293, 370, 341, 390, 452, 9339, 281, 51700], "temperature": 0.0, "avg_logprob": + -0.14555221973079266, "compression_ratio": 1.5666666666666667, "no_speech_prob": + 0.001687857904471457}, {"id": 711, "seek": 271444, "start": 2714.44, "end": 2720.76, + "text": " machine learning. I like, I hardly understood like, you know, like a tea + test statistical regression", "tokens": [50364, 3479, 2539, 13, 286, 411, 11, 286, + 13572, 7320, 411, 11, 291, 458, 11, 411, 257, 5817, 1500, 22820, 24590, 50680], + "temperature": 0.0, "avg_logprob": -0.130888427734375, "compression_ratio": 1.7392857142857143, + "no_speech_prob": 0.004426426719874144}, {"id": 712, "seek": 271444, "start": 2720.76, + "end": 2725.64, "text": " analysis before this class. So it''s like, so I''m, I''ve + had the same advisor for a long time in", "tokens": [50680, 5215, 949, 341, 1508, + 13, 407, 309, 311, 411, 11, 370, 286, 478, 11, 286, 600, 632, 264, 912, 19161, 337, + 257, 938, 565, 294, 50924], "temperature": 0.0, "avg_logprob": -0.130888427734375, + "compression_ratio": 1.7392857142857143, "no_speech_prob": 0.004426426719874144}, + {"id": 713, "seek": 271444, "start": 2725.64, "end": 2730.68, "text": " that regard, + like a formal academic advisor. 
And then meeting people like Bob and, you know,", + "tokens": [50924, 300, 3843, 11, 411, 257, 9860, 7778, 19161, 13, 400, 550, 3440, + 561, 411, 6085, 293, 11, 291, 458, 11, 51176], "temperature": 0.0, "avg_logprob": + -0.130888427734375, "compression_ratio": 1.7392857142857143, "no_speech_prob": 0.004426426719874144}, + {"id": 714, "seek": 271444, "start": 2730.68, "end": 2736.04, "text": " you and + I as we talk now, I, you know, trying to reach out and pick the brains of people + and see", "tokens": [51176, 291, 293, 286, 382, 321, 751, 586, 11, 286, 11, 291, + 458, 11, 1382, 281, 2524, 484, 293, 1888, 264, 15442, 295, 561, 293, 536, 51444], + "temperature": 0.0, "avg_logprob": -0.130888427734375, "compression_ratio": 1.7392857142857143, + "no_speech_prob": 0.004426426719874144}, {"id": 715, "seek": 271444, "start": 2736.04, + "end": 2741.4, "text": " what they think. I guess. Yeah. So basically they are like, + they become like, you might have multiple", "tokens": [51444, 437, 436, 519, 13, + 286, 2041, 13, 865, 13, 407, 1936, 436, 366, 411, 11, 436, 1813, 411, 11, 291, 1062, + 362, 3866, 51712], "temperature": 0.0, "avg_logprob": -0.130888427734375, "compression_ratio": + 1.7392857142857143, "no_speech_prob": 0.004426426719874144}, {"id": 716, "seek": + 274140, "start": 2741.4, "end": 2746.2000000000003, "text": " role models. And sometimes, + you know, like they also say, you do not need a physical person with", "tokens": + [50364, 3090, 5245, 13, 400, 2171, 11, 291, 458, 11, 411, 436, 611, 584, 11, 291, + 360, 406, 643, 257, 4001, 954, 365, 50604], "temperature": 0.0, "avg_logprob": -0.09971632379474062, + "compression_ratio": 1.6802721088435375, "no_speech_prob": 0.03275059908628464}, + {"id": 717, "seek": 274140, "start": 2746.2000000000003, "end": 2751.64, "text": + " whom you talk, but it could be some kind of online person. 
Like for me, it used + to be for a long", "tokens": [50604, 7101, 291, 751, 11, 457, 309, 727, 312, 512, + 733, 295, 2950, 954, 13, 1743, 337, 385, 11, 309, 1143, 281, 312, 337, 257, 938, + 50876], "temperature": 0.0, "avg_logprob": -0.09971632379474062, "compression_ratio": + 1.6802721088435375, "no_speech_prob": 0.03275059908628464}, {"id": 718, "seek": + 274140, "start": 2751.64, "end": 2757.48, "text": " time, Elon Musk, because I''ve + been focusing on building startups. And, and his approach to startups", "tokens": + [50876, 565, 11, 28498, 26019, 11, 570, 286, 600, 668, 8416, 322, 2390, 28041, 13, + 400, 11, 293, 702, 3109, 281, 28041, 51168], "temperature": 0.0, "avg_logprob": + -0.09971632379474062, "compression_ratio": 1.6802721088435375, "no_speech_prob": + 0.03275059908628464}, {"id": 719, "seek": 274140, "start": 2757.48, "end": 2763.64, + "text": " was not like, hey, you know, go unleash yourself, get rid of your doubt + and just do it. No, he''s so", "tokens": [51168, 390, 406, 411, 11, 4177, 11, 291, + 458, 11, 352, 49814, 1803, 11, 483, 3973, 295, 428, 6385, 293, 445, 360, 309, 13, + 883, 11, 415, 311, 370, 51476], "temperature": 0.0, "avg_logprob": -0.09971632379474062, + "compression_ratio": 1.6802721088435375, "no_speech_prob": 0.03275059908628464}, + {"id": 720, "seek": 274140, "start": 2763.64, "end": 2768.6, "text": " deep into + what he does. 
Like at some point, I want to record a podcast where I would like + to talk to", "tokens": [51476, 2452, 666, 437, 415, 775, 13, 1743, 412, 512, 935, + 11, 286, 528, 281, 2136, 257, 7367, 689, 286, 576, 411, 281, 751, 281, 51724], "temperature": + 0.0, "avg_logprob": -0.09971632379474062, "compression_ratio": 1.6802721088435375, + "no_speech_prob": 0.03275059908628464}, {"id": 721, "seek": 276860, "start": 2768.6, + "end": 2773.72, "text": " you or talk to somebody to actually explain and kind of + does it resonate with you, like he''s", "tokens": [50364, 291, 420, 751, 281, 2618, + 281, 767, 2903, 293, 733, 295, 775, 309, 34285, 365, 291, 11, 411, 415, 311, 50620], + "temperature": 0.0, "avg_logprob": -0.15541689736502512, "compression_ratio": 1.7464285714285714, + "no_speech_prob": 0.0054707396775484085}, {"id": 722, "seek": 276860, "start": 2773.72, + "end": 2778.12, "text": " thinking, like, first, you need to try this before automating + this. You need to repeat it several", "tokens": [50620, 1953, 11, 411, 11, 700, + 11, 291, 643, 281, 853, 341, 949, 3553, 990, 341, 13, 509, 643, 281, 7149, 309, + 2940, 50840], "temperature": 0.0, "avg_logprob": -0.15541689736502512, "compression_ratio": + 1.7464285714285714, "no_speech_prob": 0.0054707396775484085}, {"id": 723, "seek": + 276860, "start": 2778.12, "end": 2783.16, "text": " times to learn new mistakes + and blah, blah, blah. So it''s like an amazing way. And he like build this", "tokens": + [50840, 1413, 281, 1466, 777, 8038, 293, 12288, 11, 12288, 11, 12288, 13, 407, 309, + 311, 411, 364, 2243, 636, 13, 400, 415, 411, 1322, 341, 51092], "temperature": 0.0, + "avg_logprob": -0.15541689736502512, "compression_ratio": 1.7464285714285714, "no_speech_prob": + 0.0054707396775484085}, {"id": 724, "seek": 276860, "start": 2783.16, "end": 2789.08, + "text": " kind of, you know, a thought machinery that he applies to any problem, + right? 
So any problem that", "tokens": [51092, 733, 295, 11, 291, 458, 11, 257, + 1194, 27302, 300, 415, 13165, 281, 604, 1154, 11, 558, 30, 407, 604, 1154, 300, + 51388], "temperature": 0.0, "avg_logprob": -0.15541689736502512, "compression_ratio": + 1.7464285714285714, "no_speech_prob": 0.0054707396775484085}, {"id": 725, "seek": + 276860, "start": 2789.08, "end": 2795.56, "text": " lands in his hands, he''s like, + I can try it step by step like that and see what happens. And maybe", "tokens": + [51388, 5949, 294, 702, 2377, 11, 415, 311, 411, 11, 286, 393, 853, 309, 1823, 538, + 1823, 411, 300, 293, 536, 437, 2314, 13, 400, 1310, 51712], "temperature": 0.0, + "avg_logprob": -0.15541689736502512, "compression_ratio": 1.7464285714285714, "no_speech_prob": + 0.0054707396775484085}, {"id": 726, "seek": 279556, "start": 2795.56, "end": 2799.48, + "text": " at some point it just drops out and you''re like, okay, I''m done here. + I''m moving to the next one,", "tokens": [50364, 412, 512, 935, 309, 445, 11438, + 484, 293, 291, 434, 411, 11, 1392, 11, 286, 478, 1096, 510, 13, 286, 478, 2684, + 281, 264, 958, 472, 11, 50560], "temperature": 0.0, "avg_logprob": -0.16007035573323566, + "compression_ratio": 1.6346153846153846, "no_speech_prob": 0.009614890441298485}, + {"id": 727, "seek": 279556, "start": 2799.48, "end": 2802.92, "text": " right? So + I''m not going to waste my time. And he''s a super productive guy, as we know.", + "tokens": [50560, 558, 30, 407, 286, 478, 406, 516, 281, 5964, 452, 565, 13, 400, + 415, 311, 257, 1687, 13304, 2146, 11, 382, 321, 458, 13, 50732], "temperature": + 0.0, "avg_logprob": -0.16007035573323566, "compression_ratio": 1.6346153846153846, + "no_speech_prob": 0.009614890441298485}, {"id": 728, "seek": 279556, "start": 2804.12, + "end": 2807.88, "text": " So I mean, sometimes it could be just an online person + that you follow. 
And as you said,", "tokens": [50792, 407, 286, 914, 11, 2171, 309, + 727, 312, 445, 364, 2950, 954, 300, 291, 1524, 13, 400, 382, 291, 848, 11, 50980], + "temperature": 0.0, "avg_logprob": -0.16007035573323566, "compression_ratio": 1.6346153846153846, + "no_speech_prob": 0.009614890441298485}, {"id": 729, "seek": 279556, "start": 2808.44, + "end": 2812.84, "text": " you do this on Twitter, like you said, like maniacally + refreshing the tweeters.", "tokens": [51008, 291, 360, 341, 322, 5794, 11, 411, + 291, 848, 11, 411, 47193, 379, 19772, 264, 6986, 6202, 13, 51228], "temperature": + 0.0, "avg_logprob": -0.16007035573323566, "compression_ratio": 1.6346153846153846, + "no_speech_prob": 0.009614890441298485}, {"id": 730, "seek": 279556, "start": 2814.52, + "end": 2820.68, "text": " So just stay stay safe as well there. But at the same + time, I think the", "tokens": [51312, 407, 445, 1754, 1754, 3273, 382, 731, 456, + 13, 583, 412, 264, 912, 565, 11, 286, 519, 264, 51620], "temperature": 0.0, "avg_logprob": + -0.16007035573323566, "compression_ratio": 1.6346153846153846, "no_speech_prob": + 0.009614890441298485}, {"id": 731, "seek": 282068, "start": 2820.68, "end": 2826.6, + "text": " respiratory time in your life, when you''re learning a ton. And later + in your life, you will be kind", "tokens": [50364, 27038, 565, 294, 428, 993, 11, + 562, 291, 434, 2539, 257, 2952, 13, 400, 1780, 294, 428, 993, 11, 291, 486, 312, + 733, 50660], "temperature": 0.0, "avg_logprob": -0.1959641436313061, "compression_ratio": + 1.6919642857142858, "no_speech_prob": 0.031405240297317505}, {"id": 732, "seek": + 282068, "start": 2826.6, "end": 2832.52, "text": " of generating fruit out of it + mostly. 
Or maybe you will be telling to other people and maybe", "tokens": [50660, + 295, 17746, 6773, 484, 295, 309, 5240, 13, 1610, 1310, 291, 486, 312, 3585, 281, + 661, 561, 293, 1310, 50956], "temperature": 0.0, "avg_logprob": -0.1959641436313061, + "compression_ratio": 1.6919642857142858, "no_speech_prob": 0.031405240297317505}, + {"id": 733, "seek": 282068, "start": 2832.52, "end": 2838.44, "text": " inspiring + them more and more. And then leading some research groups and the work, you know,", + "tokens": [50956, 15883, 552, 544, 293, 544, 13, 400, 550, 5775, 512, 2132, 3935, + 293, 264, 589, 11, 291, 458, 11, 51252], "temperature": 0.0, "avg_logprob": -0.1959641436313061, + "compression_ratio": 1.6919642857142858, "no_speech_prob": 0.031405240297317505}, + {"id": 734, "seek": 282068, "start": 2838.44, "end": 2846.2799999999997, "text": + " teams. And that''s that''s totally fine. But I also wanted to call out your idea + that I think is", "tokens": [51252, 5491, 13, 400, 300, 311, 300, 311, 3879, 2489, + 13, 583, 286, 611, 1415, 281, 818, 484, 428, 1558, 300, 286, 519, 307, 51644], "temperature": + 0.0, "avg_logprob": -0.1959641436313061, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.031405240297317505}, {"id": 735, "seek": 284628, "start": 2846.28, + "end": 2851.48, "text": " quite instructive for many of us. 
And hopefully to our + listeners that yes, do go go and read", "tokens": [50364, 1596, 7232, 488, 337, + 867, 295, 505, 13, 400, 4696, 281, 527, 23274, 300, 2086, 11, 360, 352, 352, 293, + 1401, 50624], "temperature": 0.0, "avg_logprob": -0.12946812006143424, "compression_ratio": + 1.6566523605150214, "no_speech_prob": 0.04035257175564766}, {"id": 736, "seek": + 284628, "start": 2851.48, "end": 2858.76, "text": " papers because as Andrew Ang + put it, he said, if you read a paper every weekend, let''s say you have", "tokens": + [50624, 10577, 570, 382, 10110, 4521, 829, 309, 11, 415, 848, 11, 498, 291, 1401, + 257, 3035, 633, 6711, 11, 718, 311, 584, 291, 362, 50988], "temperature": 0.0, "avg_logprob": + -0.12946812006143424, "compression_ratio": 1.6566523605150214, "no_speech_prob": + 0.04035257175564766}, {"id": 737, "seek": 284628, "start": 2858.76, "end": 2865.0, + "text": " a full-time job, you don''t have time to read it, you can read it on the + weekend. At some point,", "tokens": [50988, 257, 1577, 12, 3766, 1691, 11, 291, + 500, 380, 362, 565, 281, 1401, 309, 11, 291, 393, 1401, 309, 322, 264, 6711, 13, + 1711, 512, 935, 11, 51300], "temperature": 0.0, "avg_logprob": -0.12946812006143424, + "compression_ratio": 1.6566523605150214, "no_speech_prob": 0.04035257175564766}, + {"id": 738, "seek": 284628, "start": 2866.0400000000004, "end": 2870.76, "text": + " and he also recommended to start coding, you know, like actually you didn''t find + the code for it,", "tokens": [51352, 293, 415, 611, 9628, 281, 722, 17720, 11, 291, + 458, 11, 411, 767, 291, 994, 380, 915, 264, 3089, 337, 309, 11, 51588], "temperature": + 0.0, "avg_logprob": -0.12946812006143424, "compression_ratio": 1.6566523605150214, + "no_speech_prob": 0.04035257175564766}, {"id": 739, "seek": 287076, "start": 2870.76, + "end": 2876.6000000000004, "text": " just try to implement the idea, right? 
At some + point, after reading the papers, you will actually", "tokens": [50364, 445, 853, + 281, 4445, 264, 1558, 11, 558, 30, 1711, 512, 935, 11, 934, 3760, 264, 10577, 11, + 291, 486, 767, 50656], "temperature": 0.0, "avg_logprob": -0.14235063129001194, + "compression_ratio": 1.634453781512605, "no_speech_prob": 0.002434796653687954}, + {"id": 740, "seek": 287076, "start": 2876.6000000000004, "end": 2883.0800000000004, + "text": " start generating ideas because you will find gaps in the thinking of the + authors on all of these", "tokens": [50656, 722, 17746, 3487, 570, 291, 486, 915, + 15031, 294, 264, 1953, 295, 264, 16552, 322, 439, 295, 613, 50980], "temperature": + 0.0, "avg_logprob": -0.14235063129001194, "compression_ratio": 1.634453781512605, + "no_speech_prob": 0.002434796653687954}, {"id": 741, "seek": 287076, "start": 2883.0800000000004, + "end": 2888.44, "text": " papers. And nobody is doing perfect job there. They''re + doing the publishable work, right? And so", "tokens": [50980, 10577, 13, 400, 5079, + 307, 884, 2176, 1691, 456, 13, 814, 434, 884, 264, 11374, 712, 589, 11, 558, 30, + 400, 370, 51248], "temperature": 0.0, "avg_logprob": -0.14235063129001194, "compression_ratio": + 1.634453781512605, "no_speech_prob": 0.002434796653687954}, {"id": 742, "seek": + 287076, "start": 2889.2400000000002, "end": 2896.6000000000004, "text": " I think + that resonates with you as well. Yeah, definitely. 
You definitely like switch gears + where", "tokens": [51288, 286, 519, 300, 41051, 365, 291, 382, 731, 13, 865, 11, + 2138, 13, 509, 2138, 411, 3679, 20915, 689, 51656], "temperature": 0.0, "avg_logprob": + -0.14235063129001194, "compression_ratio": 1.634453781512605, "no_speech_prob": + 0.002434796653687954}, {"id": 743, "seek": 289660, "start": 2897.16, "end": 2901.96, + "text": " you become an idea machine like you say where you read a paper and you''ll + have like a billion ideas", "tokens": [50392, 291, 1813, 364, 1558, 3479, 411, 291, + 584, 689, 291, 1401, 257, 3035, 293, 291, 603, 362, 411, 257, 5218, 3487, 50632], + "temperature": 0.0, "avg_logprob": -0.15720769130822385, "compression_ratio": 1.810035842293907, + "no_speech_prob": 0.004141695331782103}, {"id": 744, "seek": 289660, "start": 2901.96, + "end": 2907.48, "text": " for how to extend it. And then you''ll transition to this + part, which is what I''m learning now. And,", "tokens": [50632, 337, 577, 281, 10101, + 309, 13, 400, 550, 291, 603, 6034, 281, 341, 644, 11, 597, 307, 437, 286, 478, 2539, + 586, 13, 400, 11, 50908], "temperature": 0.0, "avg_logprob": -0.15720769130822385, + "compression_ratio": 1.810035842293907, "no_speech_prob": 0.004141695331782103}, + {"id": 745, "seek": 289660, "start": 2907.48, "end": 2913.0, "text": " you know, + as I''m in my last year, I''ve been two years in my PhD and the transition for me + is going", "tokens": [50908, 291, 458, 11, 382, 286, 478, 294, 452, 1036, 1064, + 11, 286, 600, 668, 732, 924, 294, 452, 14476, 293, 264, 6034, 337, 385, 307, 516, + 51184], "temperature": 0.0, "avg_logprob": -0.15720769130822385, "compression_ratio": + 1.810035842293907, "no_speech_prob": 0.004141695331782103}, {"id": 746, "seek": + 289660, "start": 2913.0, "end": 2918.7599999999998, "text": " from idea machine + to, okay, can you really build the idea for real? 
Do you really know how to test + this?", "tokens": [51184, 490, 1558, 3479, 281, 11, 1392, 11, 393, 291, 534, 1322, + 264, 1558, 337, 957, 30, 1144, 291, 534, 458, 577, 281, 1500, 341, 30, 51472], "temperature": + 0.0, "avg_logprob": -0.15720769130822385, "compression_ratio": 1.810035842293907, + "no_speech_prob": 0.004141695331782103}, {"id": 747, "seek": 289660, "start": 2918.7599999999998, + "end": 2925.24, "text": " And so, and that transition isn''t super obvious. And + it''s painful to be going back and forth between,", "tokens": [51472, 400, 370, + 11, 293, 300, 6034, 1943, 380, 1687, 6322, 13, 400, 309, 311, 11697, 281, 312, 516, + 646, 293, 5220, 1296, 11, 51796], "temperature": 0.0, "avg_logprob": -0.15720769130822385, + "compression_ratio": 1.810035842293907, "no_speech_prob": 0.004141695331782103}, + {"id": 748, "seek": 292524, "start": 2925.8799999999997, "end": 2930.52, "text": + " you know, theoretical idea machine. I''m reading these papers because like in + terms of like that", "tokens": [50396, 291, 458, 11, 20864, 1558, 3479, 13, 286, + 478, 3760, 613, 10577, 570, 411, 294, 2115, 295, 411, 300, 50628], "temperature": + 0.0, "avg_logprob": -0.16231918334960938, "compression_ratio": 1.8181818181818181, + "no_speech_prob": 0.006498075556010008}, {"id": 749, "seek": 292524, "start": 2930.52, + "end": 2935.08, "text": " flow state of creativity that you get into when you''re + when you''re working on things, for me,", "tokens": [50628, 3095, 1785, 295, 12915, + 300, 291, 483, 666, 562, 291, 434, 562, 291, 434, 1364, 322, 721, 11, 337, 385, + 11, 50856], "temperature": 0.0, "avg_logprob": -0.16231918334960938, "compression_ratio": + 1.8181818181818181, "no_speech_prob": 0.006498075556010008}, {"id": 750, "seek": + 292524, "start": 2935.08, "end": 2940.3599999999997, "text": " personally, reading + papers is like the most satisfying thing. 
I feel very like productive when I''m", + "tokens": [50856, 5665, 11, 3760, 10577, 307, 411, 264, 881, 18348, 551, 13, 286, + 841, 588, 411, 13304, 562, 286, 478, 51120], "temperature": 0.0, "avg_logprob": + -0.16231918334960938, "compression_ratio": 1.8181818181818181, "no_speech_prob": + 0.006498075556010008}, {"id": 751, "seek": 292524, "start": 2940.3599999999997, + "end": 2945.8799999999997, "text": " reading papers. I might, you know, I feel good. + But when I''m engineering things, I feel more", "tokens": [51120, 3760, 10577, 13, + 286, 1062, 11, 291, 458, 11, 286, 841, 665, 13, 583, 562, 286, 478, 7043, 721, 11, + 286, 841, 544, 51396], "temperature": 0.0, "avg_logprob": -0.16231918334960938, + "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.006498075556010008}, + {"id": 752, "seek": 292524, "start": 2945.8799999999997, "end": 2952.3599999999997, + "text": " pain, man, because it''s more painful, I''d say. Yes, yes. And this is + where, of course, you do want", "tokens": [51396, 1822, 11, 587, 11, 570, 309, 311, + 544, 11697, 11, 286, 1116, 584, 13, 1079, 11, 2086, 13, 400, 341, 307, 689, 11, + 295, 1164, 11, 291, 360, 528, 51720], "temperature": 0.0, "avg_logprob": -0.16231918334960938, + "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.006498075556010008}, + {"id": 753, "seek": 295236, "start": 2952.36, "end": 2957.1600000000003, "text": + " to have those oiled well, well, well, oiled software systems that you don''t need + to waste your time", "tokens": [50364, 281, 362, 729, 3184, 292, 731, 11, 731, 11, + 731, 11, 3184, 292, 4722, 3652, 300, 291, 500, 380, 643, 281, 5964, 428, 565, 50604], + "temperature": 0.0, "avg_logprob": -0.2874270500020778, "compression_ratio": 1.592274678111588, + "no_speech_prob": 0.010654205456376076}, {"id": 754, "seek": 295236, "start": 2957.1600000000003, + "end": 2964.76, "text": " setting things up or running out of disco, whatever, you + know, heaven so, so frequently.", "tokens": [50604, 3287, 
721, 493, 420, 2614, 484, + 295, 3622, 11, 2035, 11, 291, 458, 11, 7162, 370, 11, 370, 10374, 13, 50984], "temperature": + 0.0, "avg_logprob": -0.2874270500020778, "compression_ratio": 1.592274678111588, + "no_speech_prob": 0.010654205456376076}, {"id": 755, "seek": 295236, "start": 2965.8, + "end": 2970.92, "text": " So like even the innocuous things like before I had integrated + Google Drive with Google collab,", "tokens": [51036, 407, 411, 754, 264, 10843, + 12549, 721, 411, 949, 286, 632, 10919, 3329, 15622, 365, 3329, 44228, 11, 51292], + "temperature": 0.0, "avg_logprob": -0.2874270500020778, "compression_ratio": 1.592274678111588, + "no_speech_prob": 0.010654205456376076}, {"id": 756, "seek": 295236, "start": 2970.92, + "end": 2975.96, "text": " and it would crash. And I feel like I''ve just lost 10 + hours of running this thing. So,", "tokens": [51292, 293, 309, 576, 8252, 13, 400, + 286, 841, 411, 286, 600, 445, 2731, 1266, 2496, 295, 2614, 341, 551, 13, 407, 11, + 51544], "temperature": 0.0, "avg_logprob": -0.2874270500020778, "compression_ratio": + 1.592274678111588, "no_speech_prob": 0.010654205456376076}, {"id": 757, "seek": + 297596, "start": 2976.6, "end": 2982.76, "text": " and that is not good. 
Like, this + is I think what Joel Spolski said at some point, you know,", "tokens": [50396, 293, + 300, 307, 406, 665, 13, 1743, 11, 341, 307, 286, 519, 437, 21522, 1738, 401, 18020, + 848, 412, 512, 935, 11, 291, 458, 11, 50704], "temperature": 0.0, "avg_logprob": + -0.21380058167472718, "compression_ratio": 1.7427536231884058, "no_speech_prob": + 0.02069966495037079}, {"id": 758, "seek": 297596, "start": 2982.76, "end": 2989.4, + "text": " the co-founder of Stack or Flow, you know, he said like, imagine that + you want to print a piece of", "tokens": [50704, 264, 598, 12, 33348, 295, 37649, + 420, 32792, 11, 291, 458, 11, 415, 848, 411, 11, 3811, 300, 291, 528, 281, 4482, + 257, 2522, 295, 51036], "temperature": 0.0, "avg_logprob": -0.21380058167472718, + "compression_ratio": 1.7427536231884058, "no_speech_prob": 0.02069966495037079}, + {"id": 759, "seek": 297596, "start": 2989.4, "end": 2995.32, "text": " paper and + you log into your computer and it says, please upgrade the driver. So you upgrade + the", "tokens": [51036, 3035, 293, 291, 3565, 666, 428, 3820, 293, 309, 1619, 11, + 1767, 11484, 264, 6787, 13, 407, 291, 11484, 264, 51332], "temperature": 0.0, "avg_logprob": + -0.21380058167472718, "compression_ratio": 1.7427536231884058, "no_speech_prob": + 0.02069966495037079}, {"id": 760, "seek": 297596, "start": 2995.32, "end": 3000.12, + "text": " driver and then operating system says I need to reboot. So it reboots + and it basically waits 10", "tokens": [51332, 6787, 293, 550, 7447, 1185, 1619, + 286, 643, 281, 33818, 13, 407, 309, 26802, 1971, 293, 309, 1936, 40597, 1266, 51572], + "temperature": 0.0, "avg_logprob": -0.21380058167472718, "compression_ratio": 1.7427536231884058, + "no_speech_prob": 0.02069966495037079}, {"id": 761, "seek": 297596, "start": 3000.12, + "end": 3005.64, "text": " minutes of your time. 
And then you, and then again, it + says, hey, actually, I cannot print because", "tokens": [51572, 2077, 295, 428, + 565, 13, 400, 550, 291, 11, 293, 550, 797, 11, 309, 1619, 11, 4177, 11, 767, 11, + 286, 2644, 4482, 570, 51848], "temperature": 0.0, "avg_logprob": -0.21380058167472718, + "compression_ratio": 1.7427536231884058, "no_speech_prob": 0.02069966495037079}, + {"id": 762, "seek": 300564, "start": 3005.64, "end": 3010.7599999999998, "text": + " you ran out of something now. Again, it installs them. And you like, instead of + solving the problem,", "tokens": [50364, 291, 5872, 484, 295, 746, 586, 13, 3764, + 11, 309, 3625, 82, 552, 13, 400, 291, 411, 11, 2602, 295, 12606, 264, 1154, 11, + 50620], "temperature": 0.0, "avg_logprob": -0.1803751770330935, "compression_ratio": + 1.6270491803278688, "no_speech_prob": 0.0008852415485307574}, {"id": 763, "seek": + 300564, "start": 3010.7599999999998, "end": 3016.92, "text": " you become the administrator + of your computer, right? And that''s the same, the same thing can happen", "tokens": + [50620, 291, 1813, 264, 25529, 295, 428, 3820, 11, 558, 30, 400, 300, 311, 264, + 912, 11, 264, 912, 551, 393, 1051, 50928], "temperature": 0.0, "avg_logprob": -0.1803751770330935, + "compression_ratio": 1.6270491803278688, "no_speech_prob": 0.0008852415485307574}, + {"id": 764, "seek": 300564, "start": 3016.92, "end": 3025.08, "text": " so much, + so often in software, you know, development and research as well, because, because + I think", "tokens": [50928, 370, 709, 11, 370, 2049, 294, 4722, 11, 291, 458, 11, + 3250, 293, 2132, 382, 731, 11, 570, 11, 570, 286, 519, 51336], "temperature": 0.0, + "avg_logprob": -0.1803751770330935, "compression_ratio": 1.6270491803278688, "no_speech_prob": + 0.0008852415485307574}, {"id": 765, "seek": 300564, "start": 3025.96, "end": 3033.7999999999997, + "text": " somebody will put on Twitter, we do not actually choose between big and + small, like do a lot of", "tokens": [51380, 2618, 486, 
829, 322, 5794, 11, 321, + 360, 406, 767, 2826, 1296, 955, 293, 1359, 11, 411, 360, 257, 688, 295, 51772], + "temperature": 0.0, "avg_logprob": -0.1803751770330935, "compression_ratio": 1.6270491803278688, + "no_speech_prob": 0.0008852415485307574}, {"id": 766, "seek": 303380, "start": 3033.8, + "end": 3038.6800000000003, "text": " things and do like small amount of things. + We usually choose between small and nothing.", "tokens": [50364, 721, 293, 360, + 411, 1359, 2372, 295, 721, 13, 492, 2673, 2826, 1296, 1359, 293, 1825, 13, 50608], + "temperature": 0.0, "avg_logprob": -0.18173430138027546, "compression_ratio": 1.6304347826086956, + "no_speech_prob": 0.0045081875286996365}, {"id": 767, "seek": 303380, "start": 3039.32, + "end": 3045.7200000000003, "text": " And so I guess when those things eating a lot + of your small time, right, to nothing, you''re like", "tokens": [50640, 400, 370, + 286, 2041, 562, 729, 721, 3936, 257, 688, 295, 428, 1359, 565, 11, 558, 11, 281, + 1825, 11, 291, 434, 411, 50960], "temperature": 0.0, "avg_logprob": -0.18173430138027546, + "compression_ratio": 1.6304347826086956, "no_speech_prob": 0.0045081875286996365}, + {"id": 768, "seek": 303380, "start": 3045.7200000000003, "end": 3051.2400000000002, + "text": " frustrated and you''re like, okay, I''m just down the rabbit hole. What + am I doing? 
And so I think", "tokens": [50960, 15751, 293, 291, 434, 411, 11, 1392, + 11, 286, 478, 445, 760, 264, 19509, 5458, 13, 708, 669, 286, 884, 30, 400, 370, + 286, 519, 51236], "temperature": 0.0, "avg_logprob": -0.18173430138027546, "compression_ratio": + 1.6304347826086956, "no_speech_prob": 0.0045081875286996365}, {"id": 769, "seek": + 303380, "start": 3051.2400000000002, "end": 3057.4, "text": " tools like VEVIates + save a ton of time and everybody who is innovating in this space from the", "tokens": + [51236, 3873, 411, 691, 36, 25322, 1024, 3155, 257, 2952, 295, 565, 293, 2201, 567, + 307, 5083, 990, 294, 341, 1901, 490, 264, 51544], "temperature": 0.0, "avg_logprob": + -0.18173430138027546, "compression_ratio": 1.6304347826086956, "no_speech_prob": + 0.0045081875286996365}, {"id": 770, "seek": 305740, "start": 3057.4, "end": 3063.88, + "text": " direction of usability, you know, like and saving time, shaving those + minutes off of, you know,", "tokens": [50364, 3513, 295, 46878, 11, 291, 458, 11, + 411, 293, 6816, 565, 11, 36481, 729, 2077, 766, 295, 11, 291, 458, 11, 50688], "temperature": + 0.0, "avg_logprob": -0.21558744399273982, "compression_ratio": 1.6618705035971224, + "no_speech_prob": 0.005872700363397598}, {"id": 771, "seek": 305740, "start": 3063.88, + "end": 3067.96, "text": " your experience, I think that will save so much time for + your thinking as well.", "tokens": [50688, 428, 1752, 11, 286, 519, 300, 486, 3155, + 370, 709, 565, 337, 428, 1953, 382, 731, 13, 50892], "temperature": 0.0, "avg_logprob": + -0.21558744399273982, "compression_ratio": 1.6618705035971224, "no_speech_prob": + 0.005872700363397598}, {"id": 772, "seek": 305740, "start": 3069.1600000000003, + "end": 3075.08, "text": " Yeah. 
And before VEVIate, I was doing a little bit of + the sponsored content work, and which for me", "tokens": [50952, 865, 13, 400, 949, + 691, 36, 25322, 473, 11, 286, 390, 884, 257, 707, 857, 295, 264, 16621, 2701, 589, + 11, 293, 597, 337, 385, 51248], "temperature": 0.0, "avg_logprob": -0.21558744399273982, + "compression_ratio": 1.6618705035971224, "no_speech_prob": 0.005872700363397598}, + {"id": 773, "seek": 305740, "start": 3075.08, "end": 3081.08, "text": " is great + because I get to talk to these people and they teach me a lot. And so this is with", + "tokens": [51248, 307, 869, 570, 286, 483, 281, 751, 281, 613, 561, 293, 436, 2924, + 385, 257, 688, 13, 400, 370, 341, 307, 365, 51548], "temperature": 0.0, "avg_logprob": + -0.21558744399273982, "compression_ratio": 1.6618705035971224, "no_speech_prob": + 0.005872700363397598}, {"id": 774, "seek": 305740, "start": 3081.08, "end": 3085.8, + "text": " the term in AI, which is now a part of you who have packered. And so yeah, + they''re building the", "tokens": [51548, 264, 1433, 294, 7318, 11, 597, 307, 586, + 257, 644, 295, 291, 567, 362, 2844, 4073, 13, 400, 370, 1338, 11, 436, 434, 2390, + 264, 51784], "temperature": 0.0, "avg_logprob": -0.21558744399273982, "compression_ratio": + 1.6618705035971224, "no_speech_prob": 0.005872700363397598}, {"id": 775, "seek": + 308580, "start": 3085.8, "end": 3090.92, "text": " hyper-pram, like distributed + training hyper-pram, reorganization, which what we''re talking about, like", "tokens": + [50364, 9848, 12, 1424, 335, 11, 411, 12631, 3097, 9848, 12, 1424, 335, 11, 41203, + 2144, 11, 597, 437, 321, 434, 1417, 466, 11, 411, 50620], "temperature": 0.0, "avg_logprob": + -0.2799730023134102, "compression_ratio": 1.565040650406504, "no_speech_prob": 0.0037699940148741007}, + {"id": 776, "seek": 308580, "start": 3090.92, "end": 3096.28, "text": " the administer + of the system, they''re doing a lot of this work. 
And you know, as anyone, I''m + sure", "tokens": [50620, 264, 22096, 295, 264, 1185, 11, 436, 434, 884, 257, 688, + 295, 341, 589, 13, 400, 291, 458, 11, 382, 2878, 11, 286, 478, 988, 50888], "temperature": + 0.0, "avg_logprob": -0.2799730023134102, "compression_ratio": 1.565040650406504, + "no_speech_prob": 0.0037699940148741007}, {"id": 777, "seek": 308580, "start": 3096.28, + "end": 3100.2000000000003, "text": " people listening to this have gotten smoked + with the cost of one of these experiments too.", "tokens": [50888, 561, 4764, 281, + 341, 362, 5768, 27205, 365, 264, 2063, 295, 472, 295, 613, 12050, 886, 13, 51084], + "temperature": 0.0, "avg_logprob": -0.2799730023134102, "compression_ratio": 1.565040650406504, + "no_speech_prob": 0.0037699940148741007}, {"id": 778, "seek": 308580, "start": 3100.92, + "end": 3106.52, "text": " So it''s not just your time. It''s not fun.", "tokens": + [51120, 407, 309, 311, 406, 445, 428, 565, 13, 467, 311, 406, 1019, 13, 51400], + "temperature": 0.0, "avg_logprob": -0.2799730023134102, "compression_ratio": 1.565040650406504, + "no_speech_prob": 0.0037699940148741007}, {"id": 779, "seek": 308580, "start": 3106.52, + "end": 3112.04, "text": " Yeah, actually, you reminded me of on Google Cloud.", + "tokens": [51400, 865, 11, 767, 11, 291, 15920, 385, 295, 322, 3329, 8061, 13, 51676], + "temperature": 0.0, "avg_logprob": -0.2799730023134102, "compression_ratio": 1.565040650406504, + "no_speech_prob": 0.0037699940148741007}, {"id": 780, "seek": 311204, "start": 3112.04, + "end": 3119.08, "text": " It was a tutorial, like a workshop. It was a free one. 
+ They even like gave us food.", "tokens": [50364, 467, 390, 257, 7073, 11, 411, 257, + 13541, 13, 467, 390, 257, 1737, 472, 13, 814, 754, 411, 2729, 505, 1755, 13, 50716], + "temperature": 0.0, "avg_logprob": -0.224522705078125, "compression_ratio": 1.635135135135135, + "no_speech_prob": 0.02530473656952381}, {"id": 781, "seek": 311204, "start": 3121.08, + "end": 3128.04, "text": " So you just show up, they video and then they tell you + things. And it was a practical one.", "tokens": [50816, 407, 291, 445, 855, 493, + 11, 436, 960, 293, 550, 436, 980, 291, 721, 13, 400, 309, 390, 257, 8496, 472, 13, + 51164], "temperature": 0.0, "avg_logprob": -0.224522705078125, "compression_ratio": + 1.635135135135135, "no_speech_prob": 0.02530473656952381}, {"id": 782, "seek": 311204, + "start": 3128.04, "end": 3133.24, "text": " And I remember one of the instructors, + he was not an employee of Google, but he was certified.", "tokens": [51164, 400, + 286, 1604, 472, 295, 264, 28367, 11, 415, 390, 406, 364, 10738, 295, 3329, 11, 457, + 415, 390, 18580, 13, 51424], "temperature": 0.0, "avg_logprob": -0.224522705078125, + "compression_ratio": 1.635135135135135, "no_speech_prob": 0.02530473656952381}, + {"id": 783, "seek": 311204, "start": 3133.88, "end": 3139.8, "text": " And you know, + like he said, hey, now we''re gonna spin the Spanner cluster. And Spanner is the", + "tokens": [51456, 400, 291, 458, 11, 411, 415, 848, 11, 4177, 11, 586, 321, 434, + 799, 6060, 264, 1738, 9805, 13630, 13, 400, 1738, 9805, 307, 264, 51752], "temperature": + 0.0, "avg_logprob": -0.224522705078125, "compression_ratio": 1.635135135135135, + "no_speech_prob": 0.02530473656952381}, {"id": 784, "seek": 313980, "start": 3139.8, + "end": 3146.2000000000003, "text": " my SQL planet scale with all the consistency + and semantic guarantees using atomic clocks. 
And", "tokens": [50364, 452, 19200, + 5054, 4373, 365, 439, 264, 14416, 293, 47982, 32567, 1228, 22275, 41528, 13, 400, + 50684], "temperature": 0.0, "avg_logprob": -0.15851313344548257, "compression_ratio": + 1.5793991416309012, "no_speech_prob": 0.0028392812237143517}, {"id": 785, "seek": + 313980, "start": 3146.2000000000003, "end": 3151.88, "text": " there is like a fantastic + presentation by one of its engineers that I have in my recordings. I", "tokens": + [50684, 456, 307, 411, 257, 5456, 5860, 538, 472, 295, 1080, 11955, 300, 286, 362, + 294, 452, 25162, 13, 286, 50968], "temperature": 0.0, "avg_logprob": -0.15851313344548257, + "compression_ratio": 1.5793991416309012, "no_speech_prob": 0.0028392812237143517}, + {"id": 786, "seek": 313980, "start": 3151.88, "end": 3156.04, "text": " have not + published yet because I don''t know if Google will try to sue me. But you know,", + "tokens": [50968, 362, 406, 6572, 1939, 570, 286, 500, 380, 458, 498, 3329, 486, + 853, 281, 20416, 385, 13, 583, 291, 458, 11, 51176], "temperature": 0.0, "avg_logprob": + -0.15851313344548257, "compression_ratio": 1.5793991416309012, "no_speech_prob": + 0.0028392812237143517}, {"id": 787, "seek": 313980, "start": 3156.84, "end": 3162.6000000000004, + "text": " the idea is that it''s a fantastic system. And there is a paper as well. + And then the guide,", "tokens": [51216, 264, 1558, 307, 300, 309, 311, 257, 5456, + 1185, 13, 400, 456, 307, 257, 3035, 382, 731, 13, 400, 550, 264, 5934, 11, 51504], + "temperature": 0.0, "avg_logprob": -0.15851313344548257, "compression_ratio": 1.5793991416309012, + "no_speech_prob": 0.0028392812237143517}, {"id": 788, "seek": 316260, "start": 3162.6, + "end": 3169.56, "text": " the teacher, he said, well, hold on. 
Don''t spin too many + of them because I get the bill.", "tokens": [50364, 264, 5027, 11, 415, 848, 11, + 731, 11, 1797, 322, 13, 1468, 380, 6060, 886, 867, 295, 552, 570, 286, 483, 264, + 2961, 13, 50712], "temperature": 0.0, "avg_logprob": -0.19513399784381574, "compression_ratio": + 1.5565610859728507, "no_speech_prob": 0.020596634596586227}, {"id": 789, "seek": + 316260, "start": 3169.56, "end": 3175.7999999999997, "text": " And last month, I + got a bill of $4,000. And Google could not reimburse it because they said,", "tokens": + [50712, 400, 1036, 1618, 11, 286, 658, 257, 2961, 295, 1848, 19, 11, 1360, 13, 400, + 3329, 727, 406, 41685, 309, 570, 436, 848, 11, 51024], "temperature": 0.0, "avg_logprob": + -0.19513399784381574, "compression_ratio": 1.5565610859728507, "no_speech_prob": + 0.020596634596586227}, {"id": 790, "seek": 316260, "start": 3175.7999999999997, + "end": 3181.7999999999997, "text": " you''re not an internal employee. So he was + like, it''s fun. But you know, to the point when you might.", "tokens": [51024, + 291, 434, 406, 364, 6920, 10738, 13, 407, 415, 390, 411, 11, 309, 311, 1019, 13, + 583, 291, 458, 11, 281, 264, 935, 562, 291, 1062, 13, 51324], "temperature": 0.0, + "avg_logprob": -0.19513399784381574, "compression_ratio": 1.5565610859728507, "no_speech_prob": + 0.020596634596586227}, {"id": 791, "seek": 316260, "start": 3183.08, "end": 3187.24, + "text": " Yeah, it''s funny. It''s funny now, but it''s not funny at all.", "tokens": + [51388, 865, 11, 309, 311, 4074, 13, 467, 311, 4074, 586, 11, 457, 309, 311, 406, + 4074, 412, 439, 13, 51596], "temperature": 0.0, "avg_logprob": -0.19513399784381574, + "compression_ratio": 1.5565610859728507, "no_speech_prob": 0.020596634596586227}, + {"id": 792, "seek": 318724, "start": 3187.24, "end": 3195.8799999999997, "text": + " Yeah, that determined AI calls it lunch and learn. 
There are this kind of concept + for deep learning,", "tokens": [50364, 865, 11, 300, 9540, 7318, 5498, 309, 6349, + 293, 1466, 13, 821, 366, 341, 733, 295, 3410, 337, 2452, 2539, 11, 50796], "temperature": + 0.0, "avg_logprob": -0.18992400938464749, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.00892741046845913}, {"id": 793, "seek": 318724, "start": 3195.8799999999997, + "end": 3201.24, "text": " or like I''d say to science content, like even like, you + know, with physics and they''re going to be", "tokens": [50796, 420, 411, 286, 1116, + 584, 281, 3497, 2701, 11, 411, 754, 411, 11, 291, 458, 11, 365, 10649, 293, 436, + 434, 516, 281, 312, 51064], "temperature": 0.0, "avg_logprob": -0.18992400938464749, + "compression_ratio": 1.6610169491525424, "no_speech_prob": 0.00892741046845913}, + {"id": 794, "seek": 318724, "start": 3201.24, "end": 3205.8799999999997, "text": + " doing experiments where it''s expensive. So we''re not going to each be doing + it. We''re going to", "tokens": [51064, 884, 12050, 689, 309, 311, 5124, 13, 407, + 321, 434, 406, 516, 281, 1184, 312, 884, 309, 13, 492, 434, 516, 281, 51296], "temperature": + 0.0, "avg_logprob": -0.18992400938464749, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.00892741046845913}, {"id": 795, "seek": 318724, "start": 3205.8799999999997, + "end": 3210.8399999999997, "text": " watch one person do it and kind of gather around + as a community. And yeah, I see that as being a", "tokens": [51296, 1159, 472, 954, + 360, 309, 293, 733, 295, 5448, 926, 382, 257, 1768, 13, 400, 1338, 11, 286, 536, + 300, 382, 885, 257, 51544], "temperature": 0.0, "avg_logprob": -0.18992400938464749, + "compression_ratio": 1.6610169491525424, "no_speech_prob": 0.00892741046845913}, + {"id": 796, "seek": 318724, "start": 3210.8399999999997, "end": 3216.52, "text": + " huge part. Just like Uber eats coupons, I think is a brilliant interface for it. 
+ And then everyone", "tokens": [51544, 2603, 644, 13, 1449, 411, 21839, 18109, 8682, + 892, 11, 286, 519, 307, 257, 10248, 9226, 337, 309, 13, 400, 550, 1518, 51828], + "temperature": 0.0, "avg_logprob": -0.18992400938464749, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.00892741046845913}, {"id": 797, "seek": 321652, "start": 3216.52, + "end": 3221.48, "text": " attends the thing. But yeah, I love that kind of. And + then just quickly, so like one thing we''re", "tokens": [50364, 49837, 264, 551, + 13, 583, 1338, 11, 286, 959, 300, 733, 295, 13, 400, 550, 445, 2661, 11, 370, 411, + 472, 551, 321, 434, 50612], "temperature": 0.0, "avg_logprob": -0.23003231532989987, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.0013813048135489225}, + {"id": 798, "seek": 321652, "start": 3221.48, "end": 3226.84, "text": " working + on at Weve 8. And as people have seen with hugging face data sets and the Kaggle + competitions,", "tokens": [50612, 1364, 322, 412, 492, 303, 1649, 13, 400, 382, + 561, 362, 1612, 365, 41706, 1851, 1412, 6352, 293, 264, 48751, 22631, 26185, 11, + 50880], "temperature": 0.0, "avg_logprob": -0.23003231532989987, "compression_ratio": + 1.7234042553191489, "no_speech_prob": 0.0013813048135489225}, {"id": 799, "seek": + 321652, "start": 3226.84, "end": 3233.8, "text": " well, hugging face data is a + little different, but it is hosting the demos cheaply. So that so", "tokens": [50880, + 731, 11, 41706, 1851, 1412, 307, 257, 707, 819, 11, 457, 309, 307, 16058, 264, 33788, + 7084, 356, 13, 407, 300, 370, 51228], "temperature": 0.0, "avg_logprob": -0.23003231532989987, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.0013813048135489225}, + {"id": 800, "seek": 321652, "start": 3233.8, "end": 3238.52, "text": " in Weve 8, + we''re working on this. 
The wiki data is going to be the next big release where + we have", "tokens": [51228, 294, 492, 303, 1649, 11, 321, 434, 1364, 322, 341, 13, + 440, 261, 9850, 1412, 307, 516, 281, 312, 264, 958, 955, 4374, 689, 321, 362, 51464], + "temperature": 0.0, "avg_logprob": -0.23003231532989987, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.0013813048135489225}, {"id": 801, "seek": 321652, "start": 3238.52, + "end": 3243.56, "text": " the pie torch, big graph embeddings, which is the graph + structure makes it different from say", "tokens": [51464, 264, 1730, 27822, 11, + 955, 4295, 12240, 29432, 11, 597, 307, 264, 4295, 3877, 1669, 309, 819, 490, 584, + 51716], "temperature": 0.0, "avg_logprob": -0.23003231532989987, "compression_ratio": + 1.7234042553191489, "no_speech_prob": 0.0013813048135489225}, {"id": 802, "seek": + 324356, "start": 3243.56, "end": 3248.2, "text": " Wikipedia, because it''s really + good at entity embedding. As we mentioned London and Barcelona,", "tokens": [50364, + 28999, 11, 570, 309, 311, 534, 665, 412, 13977, 12240, 3584, 13, 1018, 321, 2835, + 7042, 293, 21247, 11, 50596], "temperature": 0.0, "avg_logprob": -0.1814723368044253, + "compression_ratio": 1.6775362318840579, "no_speech_prob": 0.004239059053361416}, + {"id": 803, "seek": 324356, "start": 3248.2, "end": 3253.24, "text": " if you construct + a knowledge graph of Barcelona compared to London, that''s going to have a better", + "tokens": [50596, 498, 291, 7690, 257, 3601, 4295, 295, 21247, 5347, 281, 7042, + 11, 300, 311, 516, 281, 362, 257, 1101, 50848], "temperature": 0.0, "avg_logprob": + -0.1814723368044253, "compression_ratio": 1.6775362318840579, "no_speech_prob": + 0.004239059053361416}, {"id": 804, "seek": 324356, "start": 3253.24, "end": 3257.4, + "text": " entity representation using learning techniques like deep walk or note + to veck or maybe", "tokens": [50848, 13977, 10290, 1228, 2539, 7512, 411, 2452, + 1792, 420, 3637, 281, 1241, 547, 420, 1310, 51056], 
"temperature": 0.0, "avg_logprob": + -0.1814723368044253, "compression_ratio": 1.6775362318840579, "no_speech_prob": + 0.004239059053361416}, {"id": 805, "seek": 324356, "start": 3258.7599999999998, + "end": 3262.68, "text": " maybe like a graph convolutional network with an auto + encoder loss, but probably deep walk or", "tokens": [51124, 1310, 411, 257, 4295, + 45216, 304, 3209, 365, 364, 8399, 2058, 19866, 4470, 11, 457, 1391, 2452, 1792, + 420, 51320], "temperature": 0.0, "avg_logprob": -0.1814723368044253, "compression_ratio": + 1.6775362318840579, "no_speech_prob": 0.004239059053361416}, {"id": 806, "seek": + 324356, "start": 3262.68, "end": 3267.72, "text": " note to veck is what I would + say is, I mean, I''m not completely caught up with that, but", "tokens": [51320, + 3637, 281, 1241, 547, 307, 437, 286, 576, 584, 307, 11, 286, 914, 11, 286, 478, + 406, 2584, 5415, 493, 365, 300, 11, 457, 51572], "temperature": 0.0, "avg_logprob": + -0.1814723368044253, "compression_ratio": 1.6775362318840579, "no_speech_prob": + 0.004239059053361416}, {"id": 807, "seek": 326772, "start": 3268.3599999999997, + "end": 3274.4399999999996, "text": " anyway, so having that kind of data set, the + wiki data, and now it''s cheaper. That''s the huge", "tokens": [50396, 4033, 11, + 370, 1419, 300, 733, 295, 1412, 992, 11, 264, 261, 9850, 1412, 11, 293, 586, 309, + 311, 12284, 13, 663, 311, 264, 2603, 50700], "temperature": 0.0, "avg_logprob": + -0.1558747725053267, "compression_ratio": 1.7295373665480427, "no_speech_prob": + 0.003404415911063552}, {"id": 808, "seek": 326772, "start": 3274.4399999999996, + "end": 3279.08, "text": " difference. 
That''s the change in deep learning is hugging + face is hosting all these data sets,", "tokens": [50700, 2649, 13, 663, 311, 264, + 1319, 294, 2452, 2539, 307, 41706, 1851, 307, 16058, 439, 613, 1412, 6352, 11, 50932], + "temperature": 0.0, "avg_logprob": -0.1558747725053267, "compression_ratio": 1.7295373665480427, + "no_speech_prob": 0.003404415911063552}, {"id": 809, "seek": 326772, "start": 3279.08, + "end": 3283.48, "text": " so you don''t have to host them yourself. You can just + quickly access them. And with Weve 8, it''s", "tokens": [50932, 370, 291, 500, 380, + 362, 281, 3975, 552, 1803, 13, 509, 393, 445, 2661, 2105, 552, 13, 400, 365, 492, + 303, 1649, 11, 309, 311, 51152], "temperature": 0.0, "avg_logprob": -0.1558747725053267, + "compression_ratio": 1.7295373665480427, "no_speech_prob": 0.003404415911063552}, + {"id": 810, "seek": 326772, "start": 3283.48, "end": 3288.68, "text": " even more + exciting, in my opinion, because they''re hosting a vector search engine with model + inference,", "tokens": [51152, 754, 544, 4670, 11, 294, 452, 4800, 11, 570, 436, + 434, 16058, 257, 8062, 3164, 2848, 365, 2316, 38253, 11, 51412], "temperature": + 0.0, "avg_logprob": -0.1558747725053267, "compression_ratio": 1.7295373665480427, + "no_speech_prob": 0.003404415911063552}, {"id": 811, "seek": 326772, "start": 3288.68, + "end": 3292.2, "text": " I mean, hugging face is doing model inference too, as we + talked about infinity where they''ve got", "tokens": [51412, 286, 914, 11, 41706, + 1851, 307, 884, 2316, 38253, 886, 11, 382, 321, 2825, 466, 13202, 689, 436, 600, + 658, 51588], "temperature": 0.0, "avg_logprob": -0.1558747725053267, "compression_ratio": + 1.7295373665480427, "no_speech_prob": 0.003404415911063552}, {"id": 812, "seek": + 329220, "start": 3292.4399999999996, "end": 3298.2799999999997, "text": " inference + time data like milliseconds for these massive models is, yeah, is you don''t have + to pay", "tokens": [50376, 38253, 565, 1412, 411, 34184, 
337, 613, 5994, 5245, 307, + 11, 1338, 11, 307, 291, 500, 380, 362, 281, 1689, 50668], "temperature": 0.0, "avg_logprob": + -0.23698940905895863, "compression_ratio": 1.6986899563318778, "no_speech_prob": + 0.014457867480814457}, {"id": 813, "seek": 329220, "start": 3298.2799999999997, + "end": 3305.16, "text": " for the hosting of these things, which is obviously good. + Absolutely, absolutely. And also not", "tokens": [50668, 337, 264, 16058, 295, 613, + 721, 11, 597, 307, 2745, 665, 13, 7021, 11, 3122, 13, 400, 611, 406, 51012], "temperature": + 0.0, "avg_logprob": -0.23698940905895863, "compression_ratio": 1.6986899563318778, + "no_speech_prob": 0.014457867480814457}, {"id": 814, "seek": 329220, "start": 3305.16, + "end": 3311.24, "text": " like massive with hosting things, because that''s also + the cost of maintaining is the cost not to", "tokens": [51012, 411, 5994, 365, 16058, + 721, 11, 570, 300, 311, 611, 264, 2063, 295, 14916, 307, 264, 2063, 406, 281, 51316], + "temperature": 0.0, "avg_logprob": -0.23698940905895863, "compression_ratio": 1.6986899563318778, + "no_speech_prob": 0.014457867480814457}, {"id": 815, "seek": 329220, "start": 3311.24, + "end": 3317.7999999999997, "text": " neglect. So absolutely. Yeah, yeah. Absolutely. + Hey, it was such a packed conversation. 
I think the", "tokens": [51316, 17745, 13, + 407, 3122, 13, 865, 11, 1338, 13, 7021, 13, 1911, 11, 309, 390, 1270, 257, 13265, + 3761, 13, 286, 519, 264, 51644], "temperature": 0.0, "avg_logprob": -0.23698940905895863, + "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.014457867480814457}, + {"id": 816, "seek": 331780, "start": 3317.8, "end": 3323.48, "text": " show notes + will be infinite, because you mentioned so many names, so many articles, and that''s + fantastic.", "tokens": [50364, 855, 5570, 486, 312, 13785, 11, 570, 291, 2835, 370, + 867, 5288, 11, 370, 867, 11290, 11, 293, 300, 311, 5456, 13, 50648], "temperature": + 0.0, "avg_logprob": -0.13224154403529217, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.022737151011824608}, {"id": 817, "seek": 331780, "start": 3323.48, + "end": 3329.88, "text": " Thanks so much for doing this. I wanted to just still + kind of end on kind of a little bit like that", "tokens": [50648, 2561, 370, 709, + 337, 884, 341, 13, 286, 1415, 281, 445, 920, 733, 295, 917, 322, 733, 295, 257, + 707, 857, 411, 300, 50968], "temperature": 0.0, "avg_logprob": -0.13224154403529217, + "compression_ratio": 1.672340425531915, "no_speech_prob": 0.022737151011824608}, + {"id": 818, "seek": 331780, "start": 3329.88, "end": 3334.6800000000003, "text": + " philosophical stance, which I usually do. And I think we touched a lot on that + and thanks for doing", "tokens": [50968, 25066, 21033, 11, 597, 286, 2673, 360, + 13, 400, 286, 519, 321, 9828, 257, 688, 322, 300, 293, 3231, 337, 884, 51208], "temperature": + 0.0, "avg_logprob": -0.13224154403529217, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.022737151011824608}, {"id": 819, "seek": 331780, "start": 3334.6800000000003, + "end": 3340.6000000000004, "text": " this. But like in summary, what drives you? + Why are you doing this? 
What you are doing?", "tokens": [51208, 341, 13, 583, 411, + 294, 12691, 11, 437, 11754, 291, 30, 1545, 366, 291, 884, 341, 30, 708, 291, 366, + 884, 30, 51504], "temperature": 0.0, "avg_logprob": -0.13224154403529217, "compression_ratio": + 1.672340425531915, "no_speech_prob": 0.022737151011824608}, {"id": 820, "seek": + 334780, "start": 3347.88, "end": 3355.48, "text": " That''s great question. I mean, + I guess like, and I''ve heard, as you mentioned, Elon Musk, I''ve heard", "tokens": + [50368, 663, 311, 869, 1168, 13, 286, 914, 11, 286, 2041, 411, 11, 293, 286, 600, + 2198, 11, 382, 291, 2835, 11, 28498, 26019, 11, 286, 600, 2198, 50748], "temperature": + 0.0, "avg_logprob": -0.19293104927494842, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.029761046171188354}, {"id": 821, "seek": 334780, "start": 3355.48, + "end": 3362.2000000000003, "text": " that he says, like, I want to be useful. That''s + one thing he says. Yeah. And I guess in the same", "tokens": [50748, 300, 415, 1619, + 11, 411, 11, 286, 528, 281, 312, 4420, 13, 663, 311, 472, 551, 415, 1619, 13, 865, + 13, 400, 286, 2041, 294, 264, 912, 51084], "temperature": 0.0, "avg_logprob": -0.19293104927494842, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.029761046171188354}, + {"id": 822, "seek": 334780, "start": 3362.2000000000003, "end": 3371.32, "text": + " way, trying to do the useful thing. 
And I guess like, obviously, I like these + big grandiose visions of", "tokens": [51084, 636, 11, 1382, 281, 360, 264, 4420, + 551, 13, 400, 286, 2041, 411, 11, 2745, 11, 286, 411, 613, 955, 45155, 541, 30746, + 295, 51540], "temperature": 0.0, "avg_logprob": -0.19293104927494842, "compression_ratio": + 1.7234042553191489, "no_speech_prob": 0.029761046171188354}, {"id": 823, "seek": + 334780, "start": 3371.32, "end": 3377.7200000000003, "text": " things like helping + with health care and self-driving cars and helping with poverty and creating housing", + "tokens": [51540, 721, 411, 4315, 365, 1585, 1127, 293, 2698, 12, 47094, 5163, 293, + 4315, 365, 10958, 293, 4084, 6849, 51860], "temperature": 0.0, "avg_logprob": -0.19293104927494842, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.029761046171188354}, + {"id": 824, "seek": 337772, "start": 3377.72, "end": 3383.0, "text": " climate science, + all these kind of things, obviously. So obviously, there are these big grandiose", + "tokens": [50364, 5659, 3497, 11, 439, 613, 733, 295, 721, 11, 2745, 13, 407, 2745, + 11, 456, 366, 613, 955, 45155, 541, 50628], "temperature": 0.0, "avg_logprob": -0.1253102099309202, + "compression_ratio": 1.7509025270758123, "no_speech_prob": 0.0016097030602395535}, + {"id": 825, "seek": 337772, "start": 3383.0, "end": 3387.7999999999997, "text": + " goals that I think we all share truthfully. But then it''s more of a question + of how do you stay in", "tokens": [50628, 5493, 300, 286, 519, 321, 439, 2073, 3494, + 2277, 13, 583, 550, 309, 311, 544, 295, 257, 1168, 295, 577, 360, 291, 1754, 294, + 50868], "temperature": 0.0, "avg_logprob": -0.1253102099309202, "compression_ratio": + 1.7509025270758123, "no_speech_prob": 0.0016097030602395535}, {"id": 826, "seek": + 337772, "start": 3387.7999999999997, "end": 3394.2799999999997, "text": " the grind + of it? And how do you keep waking up and keep getting at it? 
And so I''d say that + kind of", "tokens": [50868, 264, 16700, 295, 309, 30, 400, 577, 360, 291, 1066, + 20447, 493, 293, 1066, 1242, 412, 309, 30, 400, 370, 286, 1116, 584, 300, 733, 295, + 51192], "temperature": 0.0, "avg_logprob": -0.1253102099309202, "compression_ratio": + 1.7509025270758123, "no_speech_prob": 0.0016097030602395535}, {"id": 827, "seek": + 337772, "start": 3394.2799999999997, "end": 3398.8399999999997, "text": " heuristic + of just trying to do useful things every day is actually a pretty good guide. And + so", "tokens": [51192, 415, 374, 3142, 295, 445, 1382, 281, 360, 4420, 721, 633, + 786, 307, 767, 257, 1238, 665, 5934, 13, 400, 370, 51420], "temperature": 0.0, "avg_logprob": + -0.1253102099309202, "compression_ratio": 1.7509025270758123, "no_speech_prob": + 0.0016097030602395535}, {"id": 828, "seek": 337772, "start": 3399.72, "end": 3405.64, + "text": " we all share these big visions. But we need the motivation to pick ourselves + off the couch and", "tokens": [51464, 321, 439, 2073, 613, 955, 30746, 13, 583, + 321, 643, 264, 12335, 281, 1888, 4175, 766, 264, 16511, 293, 51760], "temperature": + 0.0, "avg_logprob": -0.1253102099309202, "compression_ratio": 1.7509025270758123, + "no_speech_prob": 0.0016097030602395535}, {"id": 829, "seek": 340564, "start": 3405.64, + "end": 3410.6, "text": " achieve to do that. Yeah, absolutely. And it also sounds + like you mentioned you played basketball", "tokens": [50364, 4584, 281, 360, 300, + 13, 865, 11, 3122, 13, 400, 309, 611, 3263, 411, 291, 2835, 291, 3737, 11767, 50612], + "temperature": 0.0, "avg_logprob": -0.17737797113854115, "compression_ratio": 1.776978417266187, + "no_speech_prob": 0.020927393808960915}, {"id": 830, "seek": 340564, "start": 3410.6, + "end": 3417.0, "text": " and you continue playing that, right? 
So that thing, when + you do the sport, you need to be persistent,", "tokens": [50612, 293, 291, 2354, + 2433, 300, 11, 558, 30, 407, 300, 551, 11, 562, 291, 360, 264, 7282, 11, 291, 643, + 281, 312, 24315, 11, 50932], "temperature": 0.0, "avg_logprob": -0.17737797113854115, + "compression_ratio": 1.776978417266187, "no_speech_prob": 0.020927393808960915}, + {"id": 831, "seek": 340564, "start": 3417.0, "end": 3421.7999999999997, "text": + " right? And your body sometimes doesn''t want to do it, maybe. But you know in + your mind that you", "tokens": [50932, 558, 30, 400, 428, 1772, 2171, 1177, 380, + 528, 281, 360, 309, 11, 1310, 13, 583, 291, 458, 294, 428, 1575, 300, 291, 51172], + "temperature": 0.0, "avg_logprob": -0.17737797113854115, "compression_ratio": 1.776978417266187, + "no_speech_prob": 0.020927393808960915}, {"id": 832, "seek": 340564, "start": 3421.7999999999997, + "end": 3428.04, "text": " do want to do it. And so that persistence, I think, also + translates into, you know, the research", "tokens": [51172, 360, 528, 281, 360, + 309, 13, 400, 370, 300, 37617, 11, 286, 519, 11, 611, 28468, 666, 11, 291, 458, + 11, 264, 2132, 51484], "temperature": 0.0, "avg_logprob": -0.17737797113854115, + "compression_ratio": 1.776978417266187, "no_speech_prob": 0.020927393808960915}, + {"id": 833, "seek": 340564, "start": 3428.04, "end": 3433.4, "text": " and keeping + up with things, right? Yeah. Yeah. 
And to stay on that kind of analogy, I''d say + like the", "tokens": [51484, 293, 5145, 493, 365, 721, 11, 558, 30, 865, 13, 865, + 13, 400, 281, 1754, 322, 300, 733, 295, 21663, 11, 286, 1116, 584, 411, 264, 51752], + "temperature": 0.0, "avg_logprob": -0.17737797113854115, "compression_ratio": 1.776978417266187, + "no_speech_prob": 0.020927393808960915}, {"id": 834, "seek": 343340, "start": 3433.56, + "end": 3437.56, "text": " physical pain of basketball is like, you might hurt your + knee, you might have some tendonitis,", "tokens": [50372, 4001, 1822, 295, 11767, + 307, 411, 11, 291, 1062, 4607, 428, 9434, 11, 291, 1062, 362, 512, 46479, 16074, + 11, 50572], "temperature": 0.0, "avg_logprob": -0.1353157483614408, "compression_ratio": + 1.8571428571428572, "no_speech_prob": 0.0008999018464237452}, {"id": 835, "seek": + 343340, "start": 3437.56, "end": 3442.2000000000003, "text": " is that kind of physical + pain or the physical pain of when you''re doing conditioning and you can''t", "tokens": + [50572, 307, 300, 733, 295, 4001, 1822, 420, 264, 4001, 1822, 295, 562, 291, 434, + 884, 21901, 293, 291, 393, 380, 50804], "temperature": 0.0, "avg_logprob": -0.1353157483614408, + "compression_ratio": 1.8571428571428572, "no_speech_prob": 0.0008999018464237452}, + {"id": 836, "seek": 343340, "start": 3442.2000000000003, "end": 3447.7200000000003, + "text": " breathe, that you''re going to have that same kind of analog with this + kind of mental work. And", "tokens": [50804, 10192, 11, 300, 291, 434, 516, 281, + 362, 300, 912, 733, 295, 16660, 365, 341, 733, 295, 4973, 589, 13, 400, 51080], + "temperature": 0.0, "avg_logprob": -0.1353157483614408, "compression_ratio": 1.8571428571428572, + "no_speech_prob": 0.0008999018464237452}, {"id": 837, "seek": 343340, "start": 3447.7200000000003, + "end": 3453.4, "text": " it''ll manifest itself in like depression and burnout. 
+ And so you have to be like, as you do more", "tokens": [51080, 309, 603, 10067, + 2564, 294, 411, 10799, 293, 44841, 13, 400, 370, 291, 362, 281, 312, 411, 11, 382, + 291, 360, 544, 51364], "temperature": 0.0, "avg_logprob": -0.1353157483614408, "compression_ratio": + 1.8571428571428572, "no_speech_prob": 0.0008999018464237452}, {"id": 838, "seek": + 343340, "start": 3453.4, "end": 3458.28, "text": " training, you get better at the + pain of the injuries. So to say like it''s like injuries to your", "tokens": [51364, + 3097, 11, 291, 483, 1101, 412, 264, 1822, 295, 264, 14799, 13, 407, 281, 584, 411, + 309, 311, 411, 14799, 281, 428, 51608], "temperature": 0.0, "avg_logprob": -0.1353157483614408, + "compression_ratio": 1.8571428571428572, "no_speech_prob": 0.0008999018464237452}, + {"id": 839, "seek": 345828, "start": 3458.28, "end": 3464.76, "text": " mind and + the same kind of analog as physical injuries would be. And I think understanding + that", "tokens": [50364, 1575, 293, 264, 912, 733, 295, 16660, 382, 4001, 14799, + 576, 312, 13, 400, 286, 519, 3701, 300, 50688], "temperature": 0.0, "avg_logprob": + -0.2401774525642395, "compression_ratio": 1.582995951417004, "no_speech_prob": 0.004428073298186064}, + {"id": 840, "seek": 345828, "start": 3464.76, "end": 3470.1200000000003, "text": + " and accepting it and dealing with it is important as well. 
And then it kind of + translates into maybe", "tokens": [50688, 293, 17391, 309, 293, 6260, 365, 309, + 307, 1021, 382, 731, 13, 400, 550, 309, 733, 295, 28468, 666, 1310, 50956], "temperature": + 0.0, "avg_logprob": -0.2401774525642395, "compression_ratio": 1.582995951417004, + "no_speech_prob": 0.004428073298186064}, {"id": 841, "seek": 345828, "start": 3470.1200000000003, + "end": 3476.84, "text": " some other region of your brain when you have this page + from like, you know, reviews or like", "tokens": [50956, 512, 661, 4458, 295, 428, + 3567, 562, 291, 362, 341, 3028, 490, 411, 11, 291, 458, 11, 10229, 420, 411, 51292], + "temperature": 0.0, "avg_logprob": -0.2401774525642395, "compression_ratio": 1.582995951417004, + "no_speech_prob": 0.004428073298186064}, {"id": 842, "seek": 345828, "start": 3476.84, + "end": 3482.92, "text": " your experiment going, hey, why are you can make? Oh yeah. + Oh yeah. Fine. I got to get a cup of coffee", "tokens": [51292, 428, 5120, 516, + 11, 4177, 11, 983, 366, 291, 393, 652, 30, 876, 1338, 13, 876, 1338, 13, 12024, + 13, 286, 658, 281, 483, 257, 4414, 295, 4982, 51596], "temperature": 0.0, "avg_logprob": + -0.2401774525642395, "compression_ratio": 1.582995951417004, "no_speech_prob": 0.004428073298186064}, + {"id": 843, "seek": 348292, "start": 3483.08, "end": 3489.64, "text": " and you + know, in five minutes, I''m okay. Maybe. Yeah. The coffee is the key supplement.", + "tokens": [50372, 293, 291, 458, 11, 294, 1732, 2077, 11, 286, 478, 1392, 13, 2704, + 13, 865, 13, 440, 4982, 307, 264, 2141, 15436, 13, 50700], "temperature": 0.0, "avg_logprob": + -0.21079522736218512, "compression_ratio": 1.488, "no_speech_prob": 0.013565145432949066}, + {"id": 844, "seek": 348292, "start": 3492.12, "end": 3498.36, "text": " Absolutely. + Corner, thanks so much. This was such a fantastic conversation. 
I''m pretty sure + we", "tokens": [50824, 7021, 13, 42391, 11, 3231, 370, 709, 13, 639, 390, 1270, + 257, 5456, 3761, 13, 286, 478, 1238, 988, 321, 51136], "temperature": 0.0, "avg_logprob": + -0.21079522736218512, "compression_ratio": 1.488, "no_speech_prob": 0.013565145432949066}, + {"id": 845, "seek": 348292, "start": 3498.36, "end": 3504.76, "text": " can repeat + it. Have another one. And I can''t wait to see what development you''re doing with + VIAVIAT", "tokens": [51136, 393, 7149, 309, 13, 3560, 1071, 472, 13, 400, 286, 393, + 380, 1699, 281, 536, 437, 3250, 291, 434, 884, 365, 691, 6914, 25322, 2218, 51456], + "temperature": 0.0, "avg_logprob": -0.21079522736218512, "compression_ratio": 1.488, + "no_speech_prob": 0.013565145432949066}, {"id": 846, "seek": 348292, "start": 3504.76, + "end": 3509.8, "text": " and also in all your research projects. You know, stay + active, stay hungry, stay foolish,", "tokens": [51456, 293, 611, 294, 439, 428, + 2132, 4455, 13, 509, 458, 11, 1754, 4967, 11, 1754, 8067, 11, 1754, 23478, 11, 51708], + "temperature": 0.0, "avg_logprob": -0.21079522736218512, "compression_ratio": 1.488, + "no_speech_prob": 0.013565145432949066}, {"id": 847, "seek": 350980, "start": 3509.8, + "end": 3514.28, "text": " as Steve Jobs used to say. And I think that''s fantastic + what you''re doing. Thanks so much.", "tokens": [50364, 382, 7466, 29169, 1143, + 281, 584, 13, 400, 286, 519, 300, 311, 5456, 437, 291, 434, 884, 13, 2561, 370, + 709, 13, 50588], "temperature": 0.0, "avg_logprob": -0.2781473875045776, "compression_ratio": + 1.2300884955752212, "no_speech_prob": 0.01643732562661171}, {"id": 848, "seek": + 350980, "start": 3515.0, "end": 3518.1200000000003, "text": " Thank you so much + for having me to meet me. 
Bye.", "tokens": [50624, 1044, 291, 370, 709, 337, 1419, + 385, 281, 1677, 385, 13, 4621, 13, 50780], "temperature": 0.0, "avg_logprob": -0.2781473875045776, + "compression_ratio": 1.2300884955752212, "no_speech_prob": 0.01643732562661171}]' +--- + +Hey everyone, Dr. Podgas here. And today we have Connor Shorten with me who will talk a bit about his research about lecture databases, about YouTube hopefully as well. So I'm expecting a really nice discussion today. +Hey Connor, how are you doing? Hey Dmitra, thanks so much for having me on the podcast. I'm really excited to continue our episode and maybe dive more into the deep learning research side. + I think our first podcast on Henry AI labs went really into the detail and the practical implementation and the history of Burton Elasticsearch and then all the different vector databases and I think so now we can kind of maybe look more in the research side of things and sort of discuss together about where we think all this vector search engine stuff is headed. +Oh yeah, absolutely. And it's exciting to be recording based on the day when you actually released that video. So obviously we will link it so for our listeners and our audiences. And hey, could you please introduce yourself? Yeah, great. +So to say to introduce myself, I guess I would like to kind of like be reintroducing myself almost every like year. So as obviously I make these YouTube videos and I'm kind of like still discovering my role in deep learning research and still learning myself. +In my journey, I'm in my second year of my PhD. I finished my master's degree where I got started with research on generative adversarial networks and data augmentation, published literature reviews on data augmentation for images and text. +And this has really been my research focus is data augmentation, the idea. Primarily my interest was I started out with when I first learned about deep learning right away, I come from being a basketball player. 
I played basketball in college, and I was ready to apply deep learning to basketball.
How can this improve basketball? So one thing about basketball is, when you're playing, you want to have a highlight mixtape with all your best moves, and it helps you get the college scholarship.
And so I was really familiar with that process of what it takes to be recruited to play college basketball. So I wanted to build this computer vision system that would crop out your made baskets from full game tapes automatically.
And so I ran into this problem that everyone has seen: if you try to do supervised learning with small data sets, it does not work. Annotating data is extremely difficult.
If you're doing it yourself, you can probably get yourself — in my case, I was annotating made baskets in video clips, which is already high-dimensional data, and you're paying to store all that data. So labeling it was a problem.
So I said, maybe data augmentation, because I'm overfitting this data. I can try to rotate it, crop it, horizontally flip it, increase the brightness — this whole package of things you can do. Yeah, and orientation. Yeah. Right. And so it worked pretty well.
So I was pretty inspired by this idea of data augmentation.
I really like papers like François Chollet's On the Measure of Intelligence, which discusses ideas like system-centric and developer-aware generalization, where you have this kind of matrix of known and unknown generalization cases.
So I hold the belief that we can steer the data in the direction that enables more generalization, and the key to unlocking more generalization is mostly going to be in the data space.
So I'd say I'm in this data-centric AI category, which has lately become one of the buzzwords for where your camp is.
I love things like neural architecture search and different learning strategies and all that, but I really love data augmentation.
I think there's so much opportunity in research to explore this further. And so, yeah, I have a few ideas of how this could intersect with vector search engines and vector representation learning. So that's on one end.
So that's my research interest, data augmentation, and a bit of background about how I became so inspired by it. Then, to say what I'm doing right now: I've started doing some experiment papers.
Most of my computing is managed with Google Colab, which is pretty nice.
You have the Colab notebooks, and then you have the Google Drive integration for persistence, and you can make it pretty far without putting a dent in your wallet, as long as you don't get too carried away. So that's how I'm setting that up.
And, as I mentioned at the beginning, trying to reintroduce myself and figure out my role: I recently had the high of winning the best student paper award at the ICTAI conference, on something about inductive biases.
And then the next day I got my ICLR reviews back, which were not great. So that's the journey of this: I'm just pressing forward to ICML and trying to bounce back and stay on this journey of figuring out how to do deep learning research.
So it's definitely highs and lows. Isn't it almost always like that in machine learning? Nothing is predictable and nothing is given, and you need to be kind of — well, not averse, but resilient, right? Like, okay, I'm fine.
I can take risks, but it's a marathon, not a sprint. Oh, yeah, definitely.
And just the disappointment of investing a month or two into a research project, and then you start running the experiments and you're like, oh, this is not working.
And your advisor's on the phone twice a week saying, how's it going? And you're like, not good. So that's stressful. And anyone else going through that — I can definitely relate to that kind of struggle. Is this, by the way, why you do the YouTube show Henry AI
Labs? Is this why you do it? Or is there something else as well? I just wanted to tap into the psychological element of it, if you've thought about it. Yeah, yeah, I'd love to talk about it.
I mean, my inspiration for YouTube came from — I guess I was just one of these people who really enjoyed when we would have guest lecturers come to Florida Atlantic University.
One that stood out to me more than anything else: researchers from Johns Hopkins came, and they had built a prosthetic limb that connects to a brain-computer interface.
And they have people who have lost their limbs who can, blindfolded, touch an orange and say, this is an orange, this is an apple, this is a banana. And they came to talk to us at Florida Atlantic. And I mean, it was inspiring.
I love these kinds of seminars, and I guess I fell in love with this kind of presentation.
To me it's almost analogous to stand-up comedy: you have someone who gets up on stage and puts the show on, with the benefit of the slides behind them. And I really like these kinds of talks.
So that's the art of it, which is what I really like about YouTube. I definitely believe in YouTube as the medium for communicating these ideas right now.
You know, and we'll get into talking about writing on Medium and the different ways you can write: you can write on Twitter, you can write on Medium, you can record podcasts and put them on Spotify and Apple, and you can write research papers and upload them to arXiv. Treated as a medium, the number of users on arXiv is probably less than what you get on YouTube.
The content is different too. So, yeah. I really believe in the medium, and I just want to see the art form develop further. Like, I'm really impressed with what Yannick Kilcher is doing.
Right now he's just released a video on Autoregressive Diffusion Models, and I'm excited to watch it — and that's the fun of it, you have this excitement about it. Let's link that as well. Is it a YouTube channel as well, or another show you mentioned? Yeah, yeah.
I think just YouTube, Yannick Kilcher. I think most of our viewers will know what we're talking about. I just want to make sure that I also educate myself. So let's link that. Awesome.
Yeah, so you said data augmentation is one thing you worked on and, I guess, continue working on.
It's actually interesting that you did that in the CV space, but there is also somehow a connection in text, right? Can you tell a bit more about that? Yeah, so I spent the — I think it was, sorry, I'm getting my dates wrong. It's currently the fall.
So I think I spent the spring and summer of last year trying to transition these ideas into text. I did the image data augmentation survey in 2019, when the sentiment was still extremely hot around GANs, generative adversarial networks. Everyone was really excited about this real/fake loss.
We can generate data and then add that to the data set, and then suddenly we have this very broad coverage for interpolation in our data space. So then I was trying to look into text.
With text, I'd say the key lesson I learned is that it's harder to be label-preserving.
When you're forming the (x′, y) pair, it's less likely that the y is going to keep that same high-level class label when you're doing things like, say, the starter kit: random swapping, random insertion, random deletion, those kinds of things.
And then you transition into maybe trying to use a knowledge graph to better guide the text you're replacing. And then ideas like mixup, where you cut and paste and glue sentences together. I'm not a huge fan of that, but it's kind of interesting. Yes.
It's kind of like dropout — I don't think there's a lot of intuition in the data space for why just smashing them together would work so well. But it does kind of work.
And then I really like this category of generative data augmentation, as I mentioned with my start in generative adversarial networks. And this idea that you learn the data distribution.
So you sample from the data distribution to learn classifiers, with classifiers being almost like an appendage of the generative model — which is like what we're talking about with the modules, the supervised learning tasks that you append onto the vector search engine database.
It's like the task of having a generative model, or say a representative vector space, is kind of the real context that's built into the supervised learning task. Or at least that's the way I see it. And anyone can leave a comment if they have a different idea about that.
Maybe it's ill-aimed. But that's how I see those two things integrating. So to connect this back to text: what we can do with text is use things like GPT-3 — or more precisely, you would prompt GPT-3.
So you'd say, "please finish this movie review with a positive sentiment" as the prompt.
And then you can just remove whatever you want from the original data point, and GPT-3 can generate a new movie review.
And then you can blow up your data set size and avoid the pitfalls of overfitting — that's the promise of data augmentation. So hopefully that answers the question of how I made this transition from image to text data augmentation. Yeah, it does.
And I mean, why I'm asking is also because you can treat these two sources of data in kind of a joint training task, right? So you can train a joint neural network.
And for example, let's say the algorithm watches a movie or a cartoon and sees some scene where one hero is crying and the other one is cheering him up.
Now, where do you pay attention? That's also important, right? Because it's the whole scene. Maybe you need to pay attention just to that pin on his neck that he's not happy about. And things like that.
So have you thought about that as well? Or are you still considering them as independent? Yeah, I love that idea. I think the word most people are using is multimodal learning, and I'd call that paper multimodal data augmentation.
And just last night Microsoft released a new 2.5-billion-parameter image-text embedding space. Everyone knows about OpenAI's CLIP image-text spaces and DALL-E, the avocado-shaped-armchair generation. Everyone likes that. So yeah, multimodal learning is so exciting.
I'd say it's going to be an interesting thing with the computation of it, and what the computation requires as we're setting up these kinds of tests.
Especially with video data, like you just mentioned — I wouldn't really want to play around with video data with my Colab-plus-Google-Drive workflow that I mentioned earlier. Yeah. Yeah.
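The "starter kit" text augmentation operations mentioned above — random swap, insertion, and deletion — and the GPT-3 prompting recipe can be sketched in plain Python. This is a minimal illustration, not any specific library: the function names and the prompt template are invented, and the prompt would be sent to a large language model rather than printed.

```python
import random

def random_swap(tokens, n=1):
    # Swap two random token positions n times.
    tokens = tokens.copy()
    for _ in range(n):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    # Drop each token with probability p, but keep at least one token.
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]

def random_insertion(tokens, n=1):
    # Re-insert random tokens from the sentence at random positions.
    tokens = tokens.copy()
    for _ in range(n):
        tokens.insert(random.randrange(len(tokens) + 1), random.choice(tokens))
    return tokens

def gpt3_style_prompt(label, review_stub):
    # Illustrative prompt for generative augmentation: ask a large LM to
    # finish a review with a given sentiment, then add the generated
    # (review, label) pair back into the training set.
    return (f"Please finish this movie review with a {label} sentiment:\n"
            f"{review_stub}")

sentence = "the movie was surprisingly good".split()
print(random_swap(sentence))
print(gpt3_style_prompt("positive", "I walked in expecting very little, but"))
```

Note that, as discussed above, none of these operations is guaranteed to be label-preserving for text; that caveat is exactly where the negative-data-augmentation idea later in the conversation comes in.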
But it's interesting also that the big players — you mentioned Microsoft, and there are others — are moving in the direction of increasing the number of parameters in the model. But when you go into practice and you need to build a classifier, you don't have that much capacity.
You don't want to spend that much capacity unless you're building a Terminator-level AI that will handle all the tasks you have. But you probably won't do that, because it's still not there.
So do you also think about that practical element, or are you still admiring the beauty of these complex models? How do you see that? Yeah, well, I'll stake my flag in the same camp as the foundation models researchers — I think it was mostly Stanford.
They published this paper titled On the Opportunities and Risks of Foundation Models, some title like that; I'm sorry if it's not exactly correct. But the ideology is that big companies like Microsoft, NVIDIA, Google, Facebook will build these big, big models.
And then what we'll do is use this knowledge distillation interface to compress them into practical use cases.
And we've seen — I'd say this started with Colin Raffel and the people who worked on the T5 paper, the Text-to-Text Transfer Transformer — that showed how you could unify all text supervised learning tasks through the same kind of language-modeling-style interface.
You just prompt it with, say, natural language inference and then give it the input, or you say, answer this question, and give it the input, or you say, re-rank these documents, and give it the documents. So it's the same interface for every supervised learning task.
And then just one more thing to put in the citation context is this general-purpose idea, like OpenAI's CLIP — and it looks like Microsoft has one too, I think they're calling it Turing Bletchley or something like that.
But this idea of just having two vector embedding spaces and then using the contrastive alignment as the general interface for any kind of task — because, as we mentioned, you can put any task into natural language; any task that you're going to do with supervised learning could be described with natural language.
So you have that kind of interface, and the Allen Institute has another architecture called general-purpose vision systems that unifies all these tasks — object detection, semantic segmentation, surface normal estimation — all these ideas of unifying one architecture interface.
So to wrap up my answer to the question: I think it's going to be Microsoft and the others scaling up like crazy. Maybe they're going to run out of internet-scale data eventually. I think Microsoft has said that they could train a 32-trillion-parameter model if they were motivated to do so.
So I think they're going to run out of internet-scale data, and then data augmentation will be the next step beyond, say, the 400 million image-text pairs that are now open-sourced — or EleutherAI has The Pile, which is like 800 gigabytes of raw text if you want to do something with that.
So I think eventually, as you go to 32 trillion parameters and beyond, they're going to use data augmentation to build in these inductive biases about how we can keep scaling the data side of it. So yeah, I think they can scale the models for a while.
Yeah, they are probably doing an amazing job, but they are probably still riding the horse of what Peter Norvig called the unreasonable effectiveness of data, right?
So your algorithm might not be as nuanced as your data is, so just give the machine learning algorithm as much data as possible and it will learn, right? But in practical situations — this is what I alluded to.
You just don't have that much data.
On the other hand, you don't have that much choice, and you also mentioned this. This is a very interesting topic, data augmentation in text, because in images you can do cropping, rotation, huge changes, and whatnot.
In text, you can't do that so easily. For example, if you have the sentence "London is the capital of Great Britain", you cannot put Barcelona there. It will not make sense.
But you can still find another example where you could probably swap cities, and that's how you build the augmentation. But then there are other things. For example, if you take machine translation, it suffers from the hallucination problem.
I don't know if you've heard about it, but if you have a certain distortion in your data — for example, you crawl websites and you also erroneously crawl the advertisements.
So you've glued the advertisement to the source-target pair, right? Now your model is hallucinating about that advertisement even when the input doesn't contain it, right? And it's flipping facts. It's also easily switching object and subject. So it's not a small thing.
And again, now I'm stepping onto the territory of the model itself, and model robustness. But I think data augmentation plays a key role in making sure that your model at least doesn't hiccup on some very basic things, right? So.
Yeah, and we're completely in agreement on that. I think one other part of that story will be how — say, Facebook has this model called retrieval-augmented generation, where the whole idea is to add more context to avoid this hallucination problem.
So to break down the three things you just said, I want to start off with the hallucination thing and transition right into that.
So I think the idea of adding more context is our best solution to stopping hallucination, maybe along with using consistency and contrastive loss functions for the fine-tuning, to make sure the models are attending to the context.
Because I recently reviewed a paper on my channel titled something like Open Challenges in Open-Domain Generalization — some title like that — where, yeah, you give these models the context, so they have additional context in the input, but they just don't read it.
And they just generate as if it's not there. So fixing that problem is definitely step one. Then, to go to the second thing you mentioned, where you replaced London with Barcelona: that's the thing about text data augmentation — it's not really label-preserving.
It's harder to find symmetries in the space; it's easier to find these differences. So there's one paper I'd like to point readers to, titled Negative Data Augmentation.
And they're kind of flipping the question: how do we use augmented data? Should we just keep using this KL divergence between the one-hot class vectors, or should we do something different with the augmented data?
I mentioned consistency losses, where the loss would be between the representations of x and x′, ignoring whatever the y label is — and negative data augmentation is saying: push them apart.
These are not the same label; we've switched London with Barcelona. And then I think the last thing, as we're talking about the practical implementation — you said two things, and there are two directions which are really interesting.
And I think what you're getting at with the data augmentation is that you want to prevent overfitting. If you're grabbing Microsoft's 32-trillion-parameter model and you've only got 100 labeled examples, there's no way that's going to work.
So you want to prevent overfitting.
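The two ways of using an augmented pair (x, x′) contrasted here — a consistency loss that pulls their representations together, versus negative data augmentation that pushes them apart — can be sketched in a few lines. This is a toy illustration: the squared-Euclidean distance and the hinge margin are one common choice, not the specific formulation of the Negative Data Augmentation paper.

```python
import math

def consistency_loss(z, z_prime):
    # Pull the representations of x and its augmentation x' together,
    # ignoring the class label entirely (squared Euclidean distance).
    return sum((a - b) ** 2 for a, b in zip(z, z_prime))

def negative_aug_loss(z, z_prime, margin=1.0):
    # Negative data augmentation: the edit changed the meaning
    # (London -> Barcelona), so push the pair apart, up to a margin.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z_prime)))
    return max(0.0, margin - dist) ** 2

z = [1.0, 0.0]
z_aug = [0.9, 0.1]                 # label-preserving augmentation: small loss
print(consistency_loss(z, z_aug))
print(negative_aug_loss(z, z))     # identical points get the full margin penalty
```

The practical decision is exactly the one described above: for a label-preserving augmentation you minimize the first loss; for a meaning-changing edit you minimize the second.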
And then I think the second part of that story, when people talk about this kind of topic, is storage and inference cost — and obviously training cost if you're going to fine-tune this.
So maybe training cost has been solved with prompting, where you don't actually need to do any gradient descent updates; you just give more in the input context. And then I think inference cost is solved with this knowledge distillation interface.
And I think Hugging Face — man, I think the name of their product is Lightning or something like that — it's about inference acceleration, and it looks like they're doing it pretty well. So I'd certainly bet on Hugging Face to solve that problem. Oh, yeah, absolutely.
I think they call it Infinity, you know? Infinity. Yeah, sorry about that. Oh, it's okay. It's also testing your memory — do we remember. And I think at some point — and I think Elon Musk is afraid of this.
Hey, Elon, if you're listening to this, hello. He's afraid that our interface is way too slow, right, and that eventually AI will basically supersede us. I don't think so, but let's see.
But also, what's interesting — I was thinking of developing this topic a little further, because you have so much knowledge on this and what you said is so packed. For example, we could use the language model itself to help us generate. You said GPT, right?
It's a generative model, but there could be others which would help us generate things and then augment the dataset.
But there is one beautiful paper — I don't know if you've read it. It's called What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models. And basically the paper claims that BERT does not distinguish negations.
And that can be super, super sensitive, like in sentiment analysis, right?
At least there, but also in machine translation and other downstream tasks. So have you thought about this? Basically, there is actually now a development —
I think it's also on Microsoft's side — to try to bring knowledge into the language model. And you can do it in a variety of ways; you mentioned knowledge graphs, but there are other ways to bring in structured knowledge.
So, any thoughts on that topic? Yeah, and this is where I'm just starting to get back into Weaviate, because I think Weaviate is going to be a huge part of solving that problem and adding the additional context. But first I want to raise you one paper.
On the psycholinguistics thing, I want to point readers — viewers — in the direction of CheckList. It was one of the best paper awards at a recent ACL conference. ACL and, I think, EMNLP are like the top NLP conferences. CheckList is exactly what you say.
It's a complete suite of tests for negations, named entity swapping, and so on. And it's really nice to use; it's on GitHub. So they have the interfaces for testing for that kind of thing, and I think once you have the test, you can start hacking away at solving it.
It's not theoretically grounded, but if you have the right test, you can hack away until you pass the test. So CheckList is the test for that. And then, the idea of context and Weaviate.
So Weaviate is the vector search engine part, and Facebook's paper Dense Passage Retrieval is their current approach, where they have the text embeddings of the documents and they're going to go retrieve the context so that you can avoid hallucination, and hopefully avoid these kinds of vulnerabilities, through robustness.
So vector search engines are what I see as being a huge player in solving that particular problem.
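A CheckList-style perturbation test for the negation failure discussed above might look like the sketch below. The toy `predict_sentiment` classifier is a hypothetical stand-in for whatever model you are testing — it deliberately has the bag-of-words failure mode being described — and none of this is CheckList's actual API.

```python
def predict_sentiment(text):
    # Hypothetical stand-in classifier with the failure mode discussed
    # above: a bag-of-words model that ignores negation entirely.
    positive = {"good", "great", "fantastic"}
    hits = sum(1 for w in text.lower().split() if w.strip(".,!") in positive)
    return "positive" if hits > 0 else "negative"

def negation_flips_label(sentence, negated_sentence):
    # CheckList-style behavioral test: negating the sentence
    # should flip the predicted label.
    return predict_sentiment(sentence) != predict_sentiment(negated_sentence)

print(negation_flips_label("The movie was good.", "The movie was not good."))
# The bag-of-words stand-in fails this test -- which is exactly the kind
# of behavior a suite like CheckList is designed to surface.
```

The real CheckList library generates many such perturbed pairs from templates; the point here is only the shape of the test: same input, minimally edited, with an expectation about how the prediction must change.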
And I see that transitioning not just to text, but to image-text, video-text — the idea that you want to add some more context from your database to the current inference. Yeah, yeah.
I mean, Weaviate is doing fantastic work. Actually, we have a podcast recorded with Bob, so my listeners can watch it, and we also had an episode with you where we covered some of these things. And you also recorded a bunch of videos walking through the feature set.
What caught your attention in Weaviate, if you can slightly compare it to other database vendors? Okay, well, I don't have much of a comparison to other database vendors — so, apologies to everyone out there working on this.
My experience with it doesn't come from the practical software engineering side; it comes from reading these research papers and being familiar with these ideas. And then, I mean, Weaviate is easy to use. The documentation is great. It's easy to get started with it.
So that was a huge thing for me. When I first met Bob — first of all, he's a great guy — and meeting this team, they're all really on top of everything, and their Slack chat is really great: people pitching in with their problems, it's just a great community.
But what did it for me is, I met Bob and then I spent about two weeks going through their documentation, the quick start, the installation setup, getting my data sets in there. And it's just really easy to use.
And then learning about all these other things, like the Python client.
As we talk about fetching the context, we want to integrate that into a training loop — say, Facebook also recently released internet-augmented generation, where they're using the Bing API to bring in the context and then learn with that extra training signal.
So they have a Python client that lets you integrate that into your model workflows.
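To give a rough sense of what such a client call retrieves: Weaviate exposes a GraphQL interface, and a `nearText`-style `Get` query can be sketched as a plain string. The class name `Paper`, the fields, and the concept below are invented for illustration, and the exact operator names and argument shapes should be checked against Weaviate's own documentation for the version you run.

```python
def near_text_query(class_name, fields, concepts, limit=3):
    # Build a Weaviate-style GraphQL Get query with a nearText operator.
    # This only constructs the query text; a client would POST it to the
    # server's /v1/graphql endpoint.
    concept_list = ", ".join(f'"{c}"' for c in concepts)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: [{concept_list}]}}, limit: {limit}) "
        f"{{ {' '.join(fields)} }} "
        "} }"
    )

query = near_text_query("Paper", ["title", "abstract"], ["data augmentation"])
print(query)
```

Because the query is just text, you can prototype it in a web console first and only later wire it into a Python training loop — which is the workflow described here.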
And then something we talked about in our last podcast: I love the GraphQL interface. I think it's really cool. And I love the web demo.
So you can get started with the GraphQL interface and practice your queries — learn it quickly — before you make any commitment to installing the whole database.
So yeah, I just think Weaviate is a beautiful technology that's making my life, trying to do deep learning research, a lot easier. So it's awesome that they're willing to support Henry AI Labs and help me continue making content on YouTube.
And at the same time, it's a tool that helps me do what I want to do with this kind of research. Yeah. And are you already using Weaviate in your research, or planning to? Yeah.
So I haven't really made a Henry AI Labs video on this yet, but it's something I'm really excited about. One paper I recently had accepted at ICMLA — not quite ICML, but ICMLA, the applications side of it.
Keras BERT is the title of the paper, and it's about language modeling with the Keras documentation and Keras code examples. And, you know, people like Sayak Paul and François Chollet are going crazy with these Keras code examples. There are so many examples.
You could get something like a PhD, more or less completely online, from these Keras code examples. To me, it's the most interesting collection of deep learning information on the internet, the Keras code examples.
So from there, there are two ideas. One is: can we build a language model that can debug your Keras code for you? And, you know, OpenAI Codex — everyone knows it looks like the answer to that is yes. And they have LeetCode, they have data sets of LeetCode problems.
I know everyone loves LeetCode. And everyone is looking for a job. Yeah, Codex is able to pass these LeetCode tests.
And I'd say some LeetCode tests are harder than the deep learning debugging.
So it looks like a pretty promising solution. And the second project where I'm integrating Weaviate — what I want it to help me do is, you know, Facebook is big on unsupervised machine translation.
They did a paper where they're translating between Python and JavaScript without any annotation. So maybe we can translate between Keras and PyTorch, or even PyTorch and JAX, somehow without much labeling. And this is very much an infant research project.
But if you had that, you could bring the Keras code examples to PyTorch and JAX and just help people share this knowledge.
So those are two of my personal projects that I've started integrating Weaviate into, and then there's one project that I'm extremely passionate about and really into through my involvement with the university.
And this is kind of a separate thing that I'm not too heavy on, because I don't want to push the commercial interest too much. And Weaviate is open source — it's open source software. We can download it from GitHub and we have it.
So they can't, you know, take it away.
And so this other project is, we're trying to build patient information retrieval systems, where you come to the hospital and they start to record your coagulation studies, all the physiological markers, and the genetic history.
And we want to go query the literature, maybe. So this is a research project, and the Allen Institute has been pioneering this with data sets like CORD-19 and their system called supp.ai. Salesforce Research had a system called CO-Search. I'm just naming things for people.
Oh my god, I'm not going to describe all these things.
So these are scientific literature mining systems where you want information about, say, COVID-19 — or someone's coming in with some obscure disease and you want to be able to query the literature with particular information about this patient.
And so this is the information retrieval problem that we're super interested in as vector search engine people. So we're trying to turn these patients — what I have is mostly tabular data.
You might get a little bit of medical images, some clinical reports for text, but yeah, mostly tabular data. So we want to encode that into vectors, send those vectors into the scientific literature, and then maybe there's some clinical trial — you know, because it's so much data.
Once you really download, say, the CORD-19 data set from the Allen Institute, you'll realize that 500,000 papers about COVID is nothing anyone could read. I already know this from reading deep learning papers: no one can read all of this.
And even if you go the traditional way — and I've worked with the traditional approaches in this area — let's say you have keyword lookup, right?
So with keyword search, you would have to build some kind of synonym layer, which means you need to understand what you're doing, or you will need to hire somebody to do that.
And that's an additional step, which doesn't really shorten the journey for you. You have to do it; the upside is that you feel like you have more control, maybe, but at the same time, it's very laborious.
Similarity search kind of doesn't have that boundary, right? Essentially you have encoded it, and now the challenge — the complexity — moves more into the space of choosing the right neural network and then choosing the right database.
Everyone knows which is the right database.
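One minimal reading of "encode tabular patient data into vectors" is to standardize the numeric markers into a fixed-length array and use that as the query vector. The marker names and population statistics below are invented for illustration; a real system would learn the encoding rather than hand-craft z-scores.

```python
# Illustrative (invented) population statistics per physiological marker:
# marker name -> (population mean, population standard deviation).
MARKER_STATS = {
    "age": (50.0, 15.0),
    "inr": (1.1, 0.3),          # a coagulation-study marker
    "platelets": (250.0, 60.0),
}

def encode_patient(record):
    # Standardize each marker into a z-score so every patient maps to a
    # fixed-length vector; a missing marker falls back to the mean (z = 0).
    # This vector would then be the query sent into the literature index.
    return [
        (record.get(name, mean) - mean) / std
        for name, (mean, std) in MARKER_STATS.items()
    ]

print(encode_patient({"age": 65.0, "inr": 1.7}))
```

The fixed ordering of `MARKER_STATS` is what makes vectors from different patients comparable — the same dimension always means the same marker, which is the property nearest-neighbor search over these vectors relies on.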
So, but anyway, I'm just saying, like, do you think that similarity search will completely supersede keyword search, or do you still see some synergy between them? Yeah. +Well, before I get into saying my opinion on this, I'd say that I'm not the expert on keyword search. So here's my opinion on it. You know, Weaviate has symbolic filtering where you can still do symbolic searches. You can still do the keyword filtering. +You can still have these symbolic characteristics. And, you know, I'm in the same camp, I believe things like what Gary Marcus talks about, about, you know, it's not really robust to these symbolic queries. What we mentioned earlier, where you insert negation and it might completely throw it off. +So robustness is, like, not completely solved yet. I was reading a paper this morning from DeepMind researchers called "Data Augmentation Can Help Robustness". It was such an on-the-nose title, like, data augmentation helps robustness. So, yeah, solving robustness. +And I'm, you know, I still think solving robustness is a huge issue for this. It's not completely put together yet. Yeah, absolutely. I agree. I agree. +But like, yeah, you mentioned you are not an expert on keyword search, but at the same time, I think you are an expert at using, like, Google, right? So you still type keywords. +And I think psychologically, you still expect, you know, the snippets to contain some of your keywords as a validation that the search engine got it, right? Like otherwise, maybe the search engine just, you know, returns you garbage instead of what you want. +Yeah, and that's why I think, like, the PageRank transition matrices, those kinds of things, it won't be enough to just have the vector search engine, probably. You'll probably need some kind of, like, tuning layer. +And that's why, so, Weaviate has the Python client. 
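The synergy being debated here, keyword signals layered on top of similarity search, is often implemented as a hybrid score. A toy version, assuming a crude term-overlap stand-in for BM25 and a fixed blending weight `alpha` (both illustrative, not any particular engine's formula):

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def keyword_overlap(query_terms, doc_terms):
    # Fraction of query terms present in the document -- a crude
    # stand-in for a real BM25/TF-IDF keyword score.
    return sum(1 for t in query_terms if t in doc_terms) / len(query_terms)

def hybrid_rank(query_terms, query_vec, docs, alpha=0.5):
    # docs: {doc_id: (set_of_terms, embedding)}.
    # alpha blends the symbolic (keyword) and dense (vector) signals.
    scored = [
        (doc_id,
         alpha * keyword_overlap(query_terms, terms)
         + (1 - alpha) * cosine(query_vec, vec))
        for doc_id, (terms, vec) in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `alpha=1.0` this degenerates to pure keyword search, with `alpha=0.0` to pure vector search, which is one simple way to expose the "tuning layer" mentioned above.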
As I mentioned previously, a research project for this would be to integrate that Python client into the training loop of, you know, whatever is doing the supervised learning task. So it kind of isn't just retrieving. +It's like when we talked about the difference between information retrieval and approximate nearest neighbor search, it's kind of like the semantic differences between the things you're encoding, where you might be encoding, like, the email title and then the email body. +And so you have these different kinds of, like, transitions between the categories of objects you're encoding. +So yeah, like, you know, I still think that there's, like, a layer of, I don't know how to describe it, maybe like that System 1 / System 2, I know people like that analogy, but there's some kind of layer between keyword search and vector neural representations. +There's something in the middle of that. And, you know, I don't know what it is, but yeah, I guess PageRank. Yeah. +Yeah, basically you're talking about, sort of, even after the vector database has returned the nearest neighbors, you still have the liberty to apply a re-ranker, right? +Because that's where your business logic kicks in, like the rules, the product, the vision, the design, there are so many inputs into that process of ranking. +And then ranking obviously is like a huge research area as well, you know, with the click bias and things like that, right? Yeah, I mean, and it's also interesting. It just crossed my mind that yesterday, Richard Socher announced his search engine, You.com. +Did you have a chance to check it out? Basically, for listeners who didn't check it out yet, it's a search engine which summarizes the web pages and the kinds of documents and so on. And so it kind of makes it actionable. +So just one example, they can find you a code snippet on Stack Overflow that you can actually copy-paste. And that's just one example, right? 
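The re-ranking step described here, business logic applied after the vector database returns its nearest neighbors, can be as simple as adjusting candidate scores before the final sort. The additive boost table below is a hypothetical example of such logic (recency bonuses, editorial rules, click-model corrections), not any specific product's ranking function:

```python
def rerank(candidates, boosts):
    """Re-rank ANN candidates with business-logic score boosts.

    candidates: list of (doc_id, similarity) pairs from the vector index.
    boosts: dict mapping doc_id to an additive bonus encoding rules,
            product requirements, recency, etc.
    """
    return sorted(
        candidates,
        key=lambda pair: pair[1] + boosts.get(pair[0], 0.0),
        reverse=True,
    )
```

Keeping the boosts separate from the similarity scores keeps the "tuning layer" auditable: the raw retrieval order is always recoverable by passing an empty boost table.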
But there are plenty more. Any thoughts on this? Yeah, well, I mean, first of all, Richard Socher, his research has been incredible. +And as I mentioned earlier in the podcast, the CO-Search system from Salesforce Research, he was one of the authors, I don't know who led the project. So yeah, You.com, I mean, it looks crazy. +Like, have I used it? Not really yet, but I definitely believe in the concept, and yeah, the research is pointing in that direction. It's exciting. +But do I think, like, a solely neural system? Yeah, I mean, designing new interfaces around search, I started to go around that a little bit as I try to think while I talk, but yeah, the You.com thing is exciting. New spaces for search engines. +It's hard to even completely conceptualize it, I think, because you think of Google as, like, this giant, indestructible search engine, but that's really not the story. There really is a ton of research in search engines. Yeah, yeah, but actually, I'm currently working on web-scale search. +For a client which I cannot mention because I'm under NDA, but we basically have all the charts and we know that Google is like 97%. And then everyone else is close to the bottom. Unfortunately, well, of course, Bing has a couple percent of the market. +And then, if you go inside a specific country, the split might be different. Like, if you take Russia, for example, Yandex is on top and then Google is following them, but very closely, you know, but overall, globally, Google is just somewhere beyond the sky. +So you need to kind of differentiate a lot, you know, like, you don't want to build another Google. +It's almost like Peter Thiel's book, you know, Zero to One, where he says, if you are building another Facebook, you're not learning anything from Mark Zuckerberg, or if you're building another Google, you're not learning anything from the Google founders. +Like, you need to build the next one, right? 
And I think Richard is trying to build that one, probably. So yeah, I mean, it's an interesting direction that he's trying to involve the AI much deeper in the process, probably already surfacing answers to users. That's fantastic. +Yeah, yeah, I don't have anything to add other than just shared excitement about what You.com will become. It's certainly exciting. Yeah, absolutely. All the best, Richard. Yeah, and actually I wanted to make a slight segue: you shared, like, a ton of information today. +I wonder, how do you keep up with so much stuff happening? Like, what are your preferred sources of information? Obviously YouTube is one, but, you know, there is also Medium. There are the publications themselves. +How did you structure your sort of consumption, you know, parts like the pacing and kind of where to put your attention and so on? Yeah, that's a great question. +And, you know, in the early days of my podcast, I was doing Machine Learning Street Talk with Tim Scarfe and Yannic Kilcher, and Tim asked Jonathan Frankle, the author of the lottery ticket hypothesis, the same question. Like, what's your information diet? And I thought it's a really interesting question. +So mine is, you know, like most people out there trying to be good at something. It's chaotic and it gets overwhelming and I get really stressed out sometimes. So I don't know if this is the best advice to follow, but here's what I do. +So I, you know, I'm very active on Twitter, like maybe to the point of it being detrimental to my health. I check Twitter, like, all the time. So I'm always refreshing Twitter and seeing the new headlines. +And when I see, like, an arXiv link, if I like it, I've tried to discipline myself to be like, don't just like it. Read the abstract, get a couple sentences in, because clearly, you know, the title caught your attention. +So Twitter is really where I get all my news. 
And then the art form of making these YouTube videos, I mean, like Yannic Kilcher and Tim Scarfe that I mentioned, the Machine Learning Street Talk, this kind of medium. I watch that. It's pretty good. +I think, also, Ms. Coffee Bean kind of goes on the list, you know, they're not the only ones doing it well. A lot of people are starting to make really great YouTube videos. And I love that kind of medium for showing these things. +So my workout, say I'm a basketball player and I've got to work on my deep learning skills, is mostly about reading these papers. For my experiments, I'd say the coding part is not super challenging. +Thanks to things like the Keras code examples, and, like, major thanks to them, because that saves me so much headache in just getting running. So yeah, I try to read like five papers at a time. I try to switch, I try to set 20-minute timers, drink a lot of coffee. +And what else do I do? Yeah, I guess that's it, really reading the papers. I mean, if you make paper summary videos and write blog posts, that's also a huge way to retain it. I try to talk to a lot of people also, you know, I try to keep a lot of contacts. +I organize all this through Twitter. So, like, you know, I might just send messages to, say, Sayak Paul, who, I think he works at Carted, and he's one of the leaders of the Keras code examples. I'll send him ideas. I'll be like, you know, I saw this paper on Twitter. +I think, you know, this reminds me of what you're doing. And yes, I guess overall, that's my information diet. I'm probably leaving something out, I didn't really, you know, prepare something for this, but no, it's okay. +I mean, it's also great that you're speaking your mind, and the things that really stick, you know, you mentioned them, right? 
But where on that scale would you put Medium, you know, the blogging platform, where it kind of thrives with tutorials? +And sometimes these tutorials are kind of okay, but you kind of wonder, okay, are they going deep enough? But then there are other things where they summarize papers in such a way that they actually try to explain them. +It's almost like popularizing science, because you do want to breed that next, you know, generation as well. And maybe you will have some feedback on your ideas, because, don't you think, when you publish a research paper, you know, for the most part of humanity, it's dry text. +For some, it's just Greek, right? They will not even understand it. They will never read it. But they still might be curious, like, okay, how do, you know, robots make decisions, or something like that. +You know, like, how does my car keep the lane? And actually today I was driving to work, and my car actually switched to the lane-keeping mode. And it was telling me that I should not, you know, steer to the left that much. +So it was actually steering to the right. But the moment it noticed that I put my hands away from the steering wheel, it actually started alarming me and saying, hey, don't sleep, or something, you know. +So it's also kind of caring for you, right? In a way, so you're not trying to do so much more work, in that sense. +Yeah, like, the idea of popular science, I mean, you know, I'm recording my podcast behind a bookshelf, like it makes me look smarter, but I only really read books like, you know, well, The Book of Why is a bad example. +That's a really great book, technical, and I really, really like that one. But for most of these, like, popular science books, I'd have to be, like, on an airplane or something, and the same goes for the category of Medium articles that are popular science. 
+Like, you know, I read research papers only, not to be dismissive of anything else, but that's just the question of what particularly I study. And my approach is very people-centric. +Like, you know, when, say, Chelsea Finn publishes a new paper on Twitter, I'll go read that because I kind of have been following her thinking. Jeff Clune is another example, with the AI-GAs, or François Chollet. +These kinds of people, like Michael Bronstein with the geometric deep learning, is another great example. I hate doing these lists. +I never like to do these lists because it's so endless, like, the vocabulary you need to kind of assess, like, I've left off so many people, but you know, I like the people-centric focus and I try to get to know these people and understand how they think of these things. +It's like the same thing as when you go to a conference. Sometimes you don't go for that specific topic. +Maybe when you're a little bit more junior, you do, but later in your career, like academic or industrial, you actually go to listen to that person, because they might not give you any novel idea, but they might give you so much experience that you daily, like, really need, right? +Yeah, and just following the timeline of their work, it helps. Like, their newest work will help you realize, oh, that was their thinking in the past work too. +I kind of see how they're thinking about these things. And it's like, you know, everybody thinks so abstractly. They have this idea, this vision, and it can be hard to communicate the vision in writing or videos. +So yeah, just like you said, I think just repeated exposure to the same person, hopefully that's the Henry AI Labs thing. Yeah, absolutely. I'm pretty sure. I saw some really great comments underneath your videos, you know, some people were saying, I can't wait for the next one. +So you're definitely doing a great job there. So kudos to you for doing that for so long, actually. 
I don't know for how long you've been doing this, but you have a ton of videos. Yeah, and I really appreciate it. +You know, the people who keep commenting, I, you know, I recognize your profiles, and I do really, really appreciate it. It helps me keep making the videos and staying convinced of that medium of YouTube being one of the ways to express these ideas. +I'd say even more so than writing papers that you submit to these conferences. Sometimes, you know, I think making a YouTube video can be a powerful way to share ideas. +I don't know if I want to completely put my flag on that idea because, you know, with these reviews, you do get some really good reviews. Like, as I mentioned at the beginning of the video, I, you know, I literally got smashed on my ICLR reviews. +They were not good, but I got really high-quality feedback. So, yes. You know, you're learning from it. You're learning. Right. Yeah. Actually, one of my managers used to say, feedback is gold. +So even if it feels painful, take it, because the problem is that sometimes, especially as you grow in your career, you know, at some point you will be the role model for some other people. Now, where do you get the feedback? From nowhere? Because you're the person giving feedback. +But you still need to grow. You still have pains, you have doubts, you have ideas, you need validation. And maybe you're doing something wrong as well at some point. Maybe somebody is intimidated to tell you that because you are at the top. You are, like, the boss or whatever. +You know, like, who gives you feedback at that point? They actually recommend turning to, you know, professional coaches and those kinds of people who can actually steer you in some direction. Right. Or maybe you can unload your thoughts. +Have you found yourself in that situation? Or what do you think? Yeah. 
Well, I mean, I'm in a lucky situation where I do have a formal PhD advisor that, as I mentioned, I speak on the phone with very often. +And, you know, my PhD advisor and I have had a relationship for so long that he, like, introduced machine learning to me. So it's like, I was a basketball player, you know, taking classes. And so this was my introduction to machine learning. +Like, I hardly understood, you know, like, a t-test, statistical regression analysis, before this class. So I've had the same advisor for a long time in that regard, like a formal academic advisor. +And then meeting people like Bob, and, you know, you and I as we talk now, I, you know, trying to reach out and pick the brains of people and see what they think, I guess. Yeah. So basically they become, like, you might have multiple role models. +And sometimes, you know, like, they also say, you do not need a physical person with whom you talk, but it could be some kind of online person. Like for me, it used to be, for a long time, Elon Musk, because I've been focusing on building startups. +And his approach to startups was not like, hey, you know, go unleash yourself, get rid of your doubt and just do it. No, he's so deep into what he does. +Like at some point, I want to record a podcast where I would like to talk to you or to somebody to actually explain it, and kind of, does it resonate with you, his thinking: like, first, you need to try this before automating this. +You need to repeat it several times to learn new mistakes, and so on. So it's like an amazing way. +And he, like, built this kind of, you know, thought machinery that he applies to any problem, right? So any problem that lands in his hands, he's like, I can try it step by step like that and see what happens. And maybe at some point it just drops out and you're like, okay, I'm done here. +I'm moving to the next one, right? So I'm not going to waste my time. 
And he's a super productive guy, as we know. So I mean, sometimes it could be just an online person that you follow. And as you said, you do this on Twitter, like you said, like, maniacally refreshing Twitter. +So just stay safe as well there. But at the same time, I think there is this preparatory time in your life when you're learning a ton. And later in your life, you will be kind of generating fruit out of it, mostly. Or maybe you will be telling other people and maybe inspiring them more and more. +And then leading some research groups and the work, you know, teams. And that's totally fine. But I also wanted to call out your idea that I think is quite instructive for many of us. +And hopefully to our listeners: yes, do go and read papers, because, as Andrew Ng put it, read a paper every weekend; let's say you have a full-time job, you don't have time to read it during the week, you can read it on the weekend. +And he also recommended to start coding, you know, like, even if you didn't find the code for it, just try to implement the idea, right? +At some point, after reading the papers, you will actually start generating ideas, because you will find gaps in the thinking of the authors of all of these papers. +And nobody is doing a perfect job there. They're doing the publishable work, right? And so I think that resonates with you as well. Yeah, definitely. +You definitely, like, switch gears where you become an idea machine, like you say, where you read a paper and you'll have, like, a billion ideas for how to extend it. And then you'll transition to this part, which is what I'm learning now. +And, you know, as I'm in my last year, I've been two years in my PhD, and the transition for me is going from idea machine to, okay, can you really build the idea for real? Do you really know how to test this? And that transition isn't super obvious. +And it's painful to be going back and forth between, you know, theoretical idea machine. 
+I'm reading these papers because, like, in terms of that flow state of creativity that you get into when you're working on things, for me, personally, reading papers is like the most satisfying thing. I feel very, like, productive when I'm reading papers. +I might, you know, I feel good. But when I'm engineering things, I feel more pain, man, because it's more painful, I'd say. Yes, yes. +And this is where, of course, you do want to have those well-oiled software systems, so that you don't need to waste your time setting things up or running out of disk or whatever, you know, which happens so, so frequently. +So, like, even the innocuous things: before I had integrated Google Drive with Google Colab, it would crash, and I would feel like I've just lost 10 hours of running this thing. And that is not good. +This is, I think, what Joel Spolsky said at some point, you know, the co-founder of Stack Overflow. He said, like, imagine that you want to print a piece of paper, and you log into your computer and it says, please upgrade the driver. +So you upgrade the driver, and then the operating system says, I need to reboot. So it reboots and it basically wastes 10 minutes of your time. And then again, it says, hey, actually, I cannot print because you ran out of something now. Again, it installs things. +And you, like, instead of solving the problem, you become the administrator of your computer, right? +And the same thing can happen so often in software, you know, development and research as well, because, because I think somebody put it on Twitter: we do not actually choose between big and small, like, doing a lot of things and doing a small amount of things. +We usually choose between small and nothing. And so I guess when those things eat a lot of your "small" time, turning it to nothing, you're frustrated and you're like, okay, I'm just down the rabbit hole. +What am I doing? 
+And so I think tools like Weaviate save a ton of time, and everybody who is innovating in this space from the direction of usability, you know, like, saving time, shaving those minutes off of, you know, your experience, I think that will save so much time for your thinking as well. +Yeah. And before Weaviate, I was doing a little bit of the sponsored content work, which for me is great because I get to talk to these people and they teach me a lot. And this was with Determined AI, which is now a part of Hewlett Packard Enterprise. +And so yeah, they're building the distributed training, hyperparameter optimization, which is what we're talking about, like the administration of the system, they're doing a lot of this work. +And you know, I'm sure people listening to this have gotten smoked with the cost of one of these experiments too. So it's not just your time. It's not fun. Yeah, actually, you reminded me of Google Cloud. There was a tutorial, like a workshop. It was a free one. +They even, like, gave us food. So you just show up, they feed you, and then they tell you things. And it was a practical one. And I remember one of the instructors, he was not an employee of Google, but he was certified. And you know, like, he said, hey, now we're gonna spin up the Spanner cluster. +And Spanner is the SQL database at planet scale, with all the consistency and semantic guarantees, using atomic clocks. And there is, like, a fantastic presentation by one of its engineers that I have in my recordings. I have not published it yet because I don't know if Google will try to sue me. +But you know, the idea is that it's a fantastic system. And there is a paper as well. And then the guy, the teacher, he said, well, hold on. Don't spin up too many of them, because I get the bill. And last month, I got a bill of $4,000. +And Google could not reimburse it, because they said, you're not an internal employee. So he was like, it's fun. But you know, to the point when you might. 
Yeah, it's funny. It's funny now, but it's not funny at all. Yeah, Determined AI calls it "lunch and learn". +There is this kind of concept for deep learning, or, I'd say, science content in general, like even with physics, where they're going to be doing experiments that are expensive. So we're not going to each be doing it. +We're going to watch one person do it and kind of gather around as a community. And yeah, I see that as being a huge part. Just like Uber Eats coupons, I think, is a brilliant interface for it. And then everyone attends the thing. But yeah, I love that kind of thing. +And then just quickly, one thing we're working on at Weaviate, and as people have seen with Hugging Face datasets and the Kaggle competitions, well, Hugging Face data is a little different, but it is hosting the demos cheaply. So in Weaviate, we're working on this. +The Wikidata set is going to be the next big release, where we have the PyTorch-BigGraph embeddings, where the graph structure makes it different from, say, Wikipedia, because it's really good at entity embedding. + As we mentioned London and Barcelona: if you construct a knowledge graph of Barcelona compared to London, that's going to have a better entity representation, using learning techniques like DeepWalk or node2vec, or maybe, like, a graph convolutional network with an autoencoder loss, but probably DeepWalk or node2vec is what I would say. I mean, I'm not completely caught up with that, but anyway, so having that kind of dataset, the Wikidata, and now it's cheaper. +That's the huge difference. That's the change in deep learning: Hugging Face is hosting all these datasets, so you don't have to host them yourself. You can just quickly access them. 
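The DeepWalk/node2vec idea mentioned here boils down to turning a graph into "sentences" of node IDs via random walks and then feeding those walks to a word2vec-style skip-gram model to learn entity embeddings. A minimal walk generator over a toy adjacency list (the graph and parameters are illustrative, not the actual Wikidata pipeline):

```python
import random

def random_walks(graph, walk_len=4, walks_per_node=2, seed=0):
    # graph: adjacency list, e.g. {"Barcelona": ["Spain", ...], ...}.
    # Each walk is a "sentence" of node IDs; training a skip-gram
    # model on these sentences yields DeepWalk-style embeddings.
    rng = random.Random(seed)
    walks = []
    for node in graph:
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_len:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

The resulting walks would then typically be handed to a word2vec implementation (e.g. gensim's `Word2Vec`) so that nodes sharing walk contexts end up with nearby vectors.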
+ And with Weaviate, it's even more exciting, in my opinion, because they're hosting a vector search engine with model inference. I mean, Hugging Face is doing model inference too, as we talked about with Infinity, where they've got inference times of, like, milliseconds for these massive models. So you don't have to pay for the hosting of these things, which is obviously good. +Absolutely, absolutely. And also not messing with hosting things, because the cost of maintaining is also a cost not to neglect. So absolutely. Yeah, yeah. Absolutely. Hey, it was such a packed conversation. +I think the show notes will be infinite, because you mentioned so many names, so many articles, and that's fantastic. Thanks so much for doing this. I wanted to still kind of end on a little bit of that philosophical stance, which I usually do. +And I think we touched a lot on that, and thanks for doing this. But, in summary, what drives you? Why are you doing what you are doing? That's a great question. I mean, I guess, and I've heard, as you mentioned, Elon Musk, I've heard that he says, like, I want to be useful. +That's one thing he says. Yeah. And I guess in the same way, I'm trying to do the useful thing. +And I guess, obviously, I like these big grandiose visions of things like helping with health care and self-driving cars and helping with poverty and creating housing, climate science, all these kinds of things, obviously. +So obviously, there are these big grandiose goals that I think we all share, truthfully. +But then it's more of a question of, how do you stay in the grind of it? And how do you keep waking up and keep getting at it? And so I'd say that kind of heuristic of just trying to do useful things every day is actually a pretty good guide. And so we all share these big visions. +But we need the motivation to pick ourselves off the couch and achieve that. Yeah, absolutely. 
+And it also sounds like, you mentioned you played basketball and you continue playing, right? So that thing, when you do the sport, you need to be persistent, right? And your body sometimes doesn't want to do it, maybe. But you know in your mind that you do want to do it. +And so that persistence, I think, also translates into, you know, the research and keeping up with things, right? Yeah. Yeah. + And to stay on that kind of analogy, I'd say, like, the physical pain of basketball, you might hurt your knee, you might have some tendonitis, that kind of physical pain, or the physical pain of when you're doing conditioning and you can't breathe, you're going to have that same kind of analog with this kind of mental work. +And it'll manifest itself in, like, depression and burnout. And so you have to be, like, as you do more training, you get better at the pain of the injuries. So to say, it's like injuries to your mind, the same kind of analog as physical injuries would be. +And I think understanding that and accepting it and dealing with it is important as well. And then it kind of translates into maybe some other region of your brain, when you have this pain from, like, you know, reviews, or, like, your experiment going wrong, and you're like: oh yeah, oh yeah, fine. +I'll go get a cup of coffee and, you know, in five minutes, I'm okay. Maybe. Yeah. The coffee is the key supplement. Absolutely. Connor, thanks so much. This was such a fantastic conversation. I'm pretty sure we can repeat it. Have another one. +And I can't wait to see what development you're doing with Weaviate and also in all your research projects. You know, stay active, stay hungry, stay foolish, as Steve Jobs used to say. And I think that's fantastic, what you're doing. Thanks so much. Thank you so much for having me, Dmitry. Bye. 
\ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md b/transcripts_with_timestamps/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md new file mode 100644 index 0000000..0730920 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/connor-shorten-research-scientist-weaviate-chatgpt-llms-form-vs-meaning.md @@ -0,0 +1,4895 @@ +--- +description: '

Topics:

00:00 Intro

01:54 Things Connor learnt in the + past year that changed his perception of Vector Search

02:42 Is search becoming + conversational?

05:46 Connor asks Dmitry: How Large Language Models will change + Search?

08:39 Vector Search Pyramid

09:53 Large models, data, Form vs + Meaning and octopus underneath the ocean

13:25 Examples of getting help from + ChatGPT and how it compares to web search today

18:32 Classical search engines + with URLs for verification vs ChatGPT-style answers

20:15 Hybrid search: keywords + + semantic retrieval

23:12 Connor asks Dmitry about his experience with sparse + retrieval

28:08 SPLADE vectors

34:10 OOD-DiskANN: handling the out-of-distribution + queries, and nuances of sparse vs dense indexing and search

39:54 Ways to + debug a query case in dense retrieval (spoiler: it is a challenge!)

44:47 + Intricacies of teaching ML models to understand your data and re-vectorization

49:23 + Local IDF vs global IDF and how dense search can approach this issue

54:00 + Realtime index

59:01 Natural language to SQL

1:04:47 Turning text into + a causal DAG

1:10:41 Engineering and Research as two highly intelligent disciplines

1:18:34 + Podcast search

1:25:24 Ref2Vec for recommender systems

1:29:48 Announcements

For + Show Notes, please check out the YouTube episode below.

This episode on YouTube: + https://www.youtube.com/watch?v=2Q-7taLZ374

Podcast + design: Saurabh Rai: https://twitter.com/srvbhr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230311_070307_5788fcdf763e7dd822dd4b0bbb59f9b6.jpg +pub_date: Sat, 11 Mar 2023 19:38:10 GMT +title: Connor Shorten - Research Scientist, Weaviate - ChatGPT, LLMs, Form vs Meaning +url: https://rss.com/podcasts/vector-podcast/861832 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 28.46, "text": " Hello + there, vector podcasts. Season 2, and to them super super super excited to have", + "tokens": [50364, 2425, 456, 11, 8062, 24045, 13, 16465, 568, 11, 293, 281, 552, + 1687, 1687, 1687, 2919, 281, 362, 51787], "temperature": 0.0, "avg_logprob": -0.5331543142145331, + "compression_ratio": 1.1333333333333333, "no_speech_prob": 0.11617506295442581}, + {"id": 1, "seek": 2846, "start": 28.46, "end": 35.58, "text": " a reappearance of + corner shortened on vector podcasts. We recorded like a year ago about that time.", + "tokens": [50364, 257, 35638, 14881, 719, 295, 4538, 45183, 322, 8062, 24045, 13, + 492, 8287, 411, 257, 1064, 2057, 466, 300, 565, 13, 50720], "temperature": 0.0, + "avg_logprob": -0.3013858393618935, "compression_ratio": 1.578512396694215, "no_speech_prob": + 0.28037235140800476}, {"id": 2, "seek": 2846, "start": 35.58, "end": 43.74, "text": + " Something''s changed. 
He is a research scientist at Semi Technologies, the company behind VB8.", "tokens": [50720, 6595, 311, 3105, 13, 634, 307, 257, 2132, 12662, 412, 318, 13372, 46993, 11, 264, 2237, 2261, 691, 33, 23, 13, 51128], "temperature": 0.0, "avg_logprob": -0.3013858393618935, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.28037235140800476}, {"id": 3, "seek": 2846, "start": 44.38, "end": 49.019999999999996, "text": " Here you can see an episode with Bob, and here you can see the episode with Connor as well.", "tokens": [51160, 1692, 291, 393, 536, 364, 3500, 365, 6085, 11, 293, 510, 291, 393, 536, 264, 3500, 365, 33133, 382, 731, 13, 51392], "temperature": 0.0, "avg_logprob": -0.3013858393618935, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.28037235140800476}, {"id": 4, "seek": 2846, "start": 51.019999999999996, "end": 56.94, "text": " And back then when we were talking, Connor, you''ve been a lot into basketball. Do you still play", "tokens": [51492, 400, 646, 550, 562, 321, 645, 1417, 11, 33133, 11, 291, 600, 668, 257, 688, 666, 11767, 13, 1144, 291, 920, 862, 51788], "temperature": 0.0, "avg_logprob": -0.3013858393618935, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.28037235140800476}, {"id": 5, "seek": 5694, "start": 56.94, "end": 63.099999999999994, "text": " basketball? Yeah, I still play a little bit. And I''ll add also to that that I think you also", "tokens": [50364, 11767, 30, 865, 11, 286, 920, 862, 257, 707, 857, 13, 400, 286, 603, 909, 611, 281, 300, 300, 286, 519, 291, 611, 50672], "temperature": 0.0, "avg_logprob": -0.30982985245554073, "compression_ratio": 1.55, "no_speech_prob": 0.02928188629448414}, {"id": 6, "seek": 5694, "start": 63.099999999999994, "end": 69.5, "text": " have podcasts with Eddie and Laura, also in the queue of we''ve read. We''ll add that. Exactly.", "tokens": [50672, 362, 24045, 365, 23911, 293, 13220, 11, 611, 294, 264, 18639, 295, 321, 600, 1401, 13, 492, 603, 909, 300, 13, 7587, 13, 50992], "temperature": 0.0, "avg_logprob": -0.30982985245554073, "compression_ratio": 1.55, "no_speech_prob": 0.02928188629448414}, {"id": 7, "seek": 5694, "start": 70.22, "end": 76.62, "text": " And I remember like you''ve been big on computer vision, data augmentation back then, and you", "tokens": [51028, 400, 286, 1604, 411, 291, 600, 668, 955, 322, 3820, 5201, 11, 1412, 14501, 19631, 646, 550, 11, 293, 291, 51348], "temperature": 0.0, "avg_logprob": -0.30982985245554073, "compression_ratio": 1.55, "no_speech_prob": 0.02928188629448414}, {"id": 8, "seek": 5694, "start": 76.62, "end": 83.42, "text": " first like guinea pig task was you know some capturing baskets in the basketball game. And I", "tokens": [51348, 700, 411, 695, 31940, 8120, 5633, 390, 291, 458, 512, 23384, 42853, 294, 264, 11767, 1216, 13, 400, 286, 51688], "temperature": 0.0, "avg_logprob": -0.30982985245554073, "compression_ratio": 1.55, "no_speech_prob": 0.02928188629448414}, {"id": 9, "seek": 8342, "start": 83.42, "end": 89.42, "text": " wonder if you continued working on that at some point. Yeah, I think about it every now and then,", "tokens": [50364, 2441, 498, 291, 7014, 1364, 322, 300, 412, 512, 935, 13, 865, 11, 286, 519, 466, 309, 633, 586, 293, 550, 11, 50664], "temperature": 0.0, "avg_logprob": -0.1597790410441737, "compression_ratio": 1.7359154929577465, "no_speech_prob": 0.01220800168812275}, {"id": 10, "seek": 8342, "start": 89.42, "end": 94.46000000000001, "text": " but I''ve been so captivated by the natural language processing and the tech search honestly. I", "tokens": [50664, 457, 286, 600, 668, 370, 40769, 770, 538, 264, 3303, 2856, 9007, 293, 264, 7553, 3164, 6095, 13, 286, 50916], "temperature": 0.0, "avg_logprob": -0.1597790410441737, "compression_ratio": 1.7359154929577465, "no_speech_prob": 0.01220800168812275}, {"id": 11, "seek": 8342, "start": 94.46000000000001, "end": 101.42, "text": " still think about image search a bit, but yeah, the tech search to me is just it''s just so exciting.", "tokens": [50916, 920, 519, 466, 3256, 3164, 257, 857, 11, 457, 1338, 11, 264, 7553, 3164, 281, 385, 307, 445, 309, 311, 445, 370, 4670, 13, 51264], "temperature": 0.0, "avg_logprob": -0.1597790410441737, "compression_ratio": 1.7359154929577465, "no_speech_prob": 0.01220800168812275}, {"id": 12, "seek": 8342, "start": 101.42, "end": 105.74000000000001, "text": " It feels like there''s so much that you can do with it. And yeah, it''s really been it''s been an", "tokens": [51264, 467, 3417, 411, 456, 311, 370, 709, 300, 291, 393, 360, 365, 309, 13, 400, 1338, 11, 309, 311, 534, 668, 309, 311, 668, 364, 51480], "temperature": 0.0, "avg_logprob": -0.1597790410441737, "compression_ratio": 1.7359154929577465, "no_speech_prob": 0.01220800168812275}, {"id": 13, "seek": 8342, "start": 105.74000000000001, "end": 110.7, "text": " intense year. I''ve learned so much and I think it''ll be a totally different podcast with respect to like", "tokens": [51480, 9447, 1064, 13, 286, 600, 3264, 370, 709, 293, 286, 519, 309, 603, 312, 257, 3879, 819, 7367, 365, 3104, 281, 411, 51728], "temperature": 0.0, "avg_logprob": -0.1597790410441737, "compression_ratio": 1.7359154929577465, "no_speech_prob": 0.01220800168812275}, {"id": 14, "seek": 11070, "start": 110.86, "end": 117.98, "text": " what I''m this talking about. Yeah, yeah, absolutely. I actually love to start also by asking you,", "tokens": [50372, 437, 286, 478, 341, 1417, 466, 13, 865, 11, 1338, 11, 3122, 13, 286, 767, 959, 281, 722, 611, 538, 3365, 291, 11, 50728], "temperature": 0.0, "avg_logprob": -0.19961551867033306, "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.009259650483727455}, {"id": 15, "seek": 11070, "start": 117.98, "end": 123.66, "text": " what do you feel you''ve learned in this year that has changed something fundamentally in how you", "tokens": [50728, 437, 360, 291, 841, 291, 600, 3264, 294, 341, 1064, 300, 575, 3105, 746, 17879, 294, 577, 291, 51012], "temperature": 0.0, "avg_logprob": -0.19961551867033306, "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.009259650483727455}, {"id": 16, "seek": 11070, "start": 123.66, "end": 131.98000000000002, "text": " perceive vector search today versus back then and year ago? Yeah, that''s a big question. I think", "tokens": [51012, 20281, 8062, 3164, 965, 5717, 646, 550, 293, 1064, 2057, 30, 865, 11, 300, 311, 257, 955, 1168, 13, 286, 519, 51428], "temperature": 0.0, "avg_logprob": -0.19961551867033306, "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.009259650483727455}, {"id": 17, "seek": 11070, "start": 131.98000000000002, "end": 137.26, "text": " I''m definitely with we V8. I''ve learned a lot about having like kind of the user focus, the", "tokens": [51428, 286, 478, 2138, 365, 321, 691, 23, 13, 286, 600, 3264, 257, 688, 466, 1419, 411, 733, 295, 264, 4195, 1879, 11, 264, 51692], "temperature": 0.0, "avg_logprob": -0.19961551867033306, "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.009259650483727455}, {"id": 18, "seek": 13726, "start": 137.26, "end": 142.14, "text": " product focus definitely way more engineering understanding of the distributed data system,", "tokens": [50364, 1674, 1879, 2138, 636, 544, 7043, 3701, 295, 264, 12631, 1412, 1185, 11, 50608], "temperature": 0.0, "avg_logprob": -0.1706287120950633, "compression_ratio": 1.7112462006079028, "no_speech_prob": 0.0024359251838177443}, {"id": 19, "seek": 13726, "start": 142.14, "end": 147.1, "text": " replication, cap theorem, all these kind of things. So like the knowledge of the engineering", "tokens": [50608, 39911, 11, 1410, 20904, 11, 439, 613, 733, 295, 721, 13, 407, 411, 264, 3601, 295, 264, 7043, 50856], "temperature": 0.0, "avg_logprob": -0.1706287120950633, "compression_ratio": 1.7112462006079028, "no_speech_prob": 0.0024359251838177443}, {"id": 20, "seek": 13726, "start": 147.1, "end": 152.22, "text": " around it in addition to sort of the machine learning research about like how to vector representations", "tokens": [50856, 926, 309, 294, 4500, 281, 1333, 295, 264, 3479, 2539, 2132, 466, 411, 577, 281, 8062, 33358, 51112], "temperature": 0.0, "avg_logprob": -0.1706287120950633, "compression_ratio": 1.7112462006079028, "no_speech_prob": 0.0024359251838177443}, {"id": 21, "seek": 13726, "start": 152.22, "end": 156.7, "text": " get optimized with deep learning models and then you know, this whole retrieve and read research.", "tokens": [51112, 483, 26941, 365, 2452, 2539, 5245, 293, 550, 291, 458, 11, 341, 1379, 30254, 293, 1401, 2132, 13, 51336], "temperature": 0.0, "avg_logprob": -0.1706287120950633, "compression_ratio": 1.7112462006079028, "no_speech_prob": 0.0024359251838177443}, {"id": 22, "seek": 13726, "start": 156.7, "end": 161.34, "text": " And overall the space is evolved in such an amazing way and it''s just really exciting.", "tokens": [51336, 400, 4787, 264, 1901, 307, 14178, 294, 1270, 364, 2243, 636, 293, 309, 311, 445, 534, 4670, 13, 51568], "temperature": 0.0, "avg_logprob": -0.1706287120950633, "compression_ratio": 1.7112462006079028, "no_speech_prob": 0.0024359251838177443}, {"id": 23, "seek": 13726, "start": 161.89999999999998, "end": 166.62, "text": " Yeah, absolutely. I''ve been I''ve been also following all different things reading papers,", "tokens": [51596, 865, 11, 3122, 13, 286, 600, 668, 286, 600, 668, 611, 3480, 439, 819, 721, 3760, 10577, 11, 51832], "temperature": 0.0, "avg_logprob": -0.1706287120950633, "compression_ratio": 1.7112462006079028, "no_speech_prob": 0.0024359251838177443}, {"id": 24, "seek": 16662, "start": 166.62, "end": 171.98000000000002, "text": " you know, implementing clip, but I still feel like I miss out on so many things and I really hope", "tokens": [50364, 291, 458, 11, 18114, 7353, 11, 457, 286, 920, 841, 411, 286, 1713, 484, 322, 370, 867, 721, 293, 286, 534, 1454, 50632], "temperature": 0.0, "avg_logprob": -0.11819670372402545, "compression_ratio": 1.5959183673469388, "no_speech_prob": 0.001943264389410615}, {"id": 25, "seek": 16662, "start": 171.98000000000002, "end": 181.1, "text": " we will cover some of them today. And we on the verge of I think maybe witnessing a change in", "tokens": [50632, 321, 486, 2060, 512, 295, 552, 965, 13, 400, 321, 322, 264, 37164, 295, 286, 519, 1310, 39233, 257, 1319, 294, 51088], "temperature": 0.0, "avg_logprob": -0.11819670372402545, "compression_ratio": 1.5959183673469388, "no_speech_prob": 0.001943264389410615}, {"id": 26, "seek": 16662, "start": 181.1, "end": 188.62, "text": " the search paradigm, you know, with chat GPT.
I first I wanted to sort of get your first reaction", "tokens": [51088, 264, 3164, 24709, 11, 291, 458, 11, 365, 5081, 26039, 51, 13, 286, 700, 286, 1415, 281, 1333, 295, 483, 428, 700, 5480, 51464], "temperature": 0.0, "avg_logprob": -0.11819670372402545, "compression_ratio": 1.5959183673469388, "no_speech_prob": 0.001943264389410615}, {"id": 27, "seek": 16662, "start": 188.62, "end": 195.34, "text": " on this. Obviously you tested it. I also tested it actually with when I published my recent blog post", "tokens": [51464, 322, 341, 13, 7580, 291, 8246, 309, 13, 286, 611, 8246, 309, 767, 365, 562, 286, 6572, 452, 5162, 6968, 2183, 51800], "temperature": 0.0, "avg_logprob": -0.11819670372402545, "compression_ratio": 1.5959183673469388, "no_speech_prob": 0.001943264389410615}, {"id": 28, "seek": 19534, "start": 195.34, "end": 201.5, "text": " on neural search frameworks. And I was like just stuck on creating a title and I asked chat GPT,", "tokens": [50364, 322, 18161, 3164, 29834, 13, 400, 286, 390, 411, 445, 5541, 322, 4084, 257, 4876, 293, 286, 2351, 5081, 26039, 51, 11, 50672], "temperature": 0.0, "avg_logprob": -0.17986619472503662, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0016417858423665166}, {"id": 29, "seek": 19534, "start": 201.5, "end": 206.78, "text": " can you come up with a title and came up with a reasonably good title and I actually used it without", "tokens": [50672, 393, 291, 808, 493, 365, 257, 4876, 293, 1361, 493, 365, 257, 23551, 665, 4876, 293, 286, 767, 1143, 309, 1553, 50936], "temperature": 0.0, "avg_logprob": -0.17986619472503662, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0016417858423665166}, {"id": 30, "seek": 19534, "start": 206.78, "end": 212.38, "text": " editing. And I read a bunch of other stories, you know, like for example, how you can avoid", "tokens": [50936, 10000, 13, 400, 286, 1401, 257, 3840, 295, 661, 3676, 11, 291, 458, 11, 411, 337, 1365, 11, 577, 291, 393, 5042, 51216], "temperature": 0.0, "avg_logprob": -0.17986619472503662, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0016417858423665166}, {"id": 31, "seek": 19534, "start": 213.02, "end": 220.7, "text": " fines, for wrong parking and stuff. But then there is this discussion going on, you know, like", "tokens": [51248, 37989, 11, 337, 2085, 9893, 293, 1507, 13, 583, 550, 456, 307, 341, 5017, 516, 322, 11, 291, 458, 11, 411, 51632], "temperature": 0.0, "avg_logprob": -0.17986619472503662, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0016417858423665166}, {"id": 32, "seek": 22070, "start": 220.78, "end": 224.29999999999998, "text": " how it may change search. But before that, what was your impression of chat GPT?", "tokens": [50368, 577, 309, 815, 1319, 3164, 13, 583, 949, 300, 11, 437, 390, 428, 9995, 295, 5081, 26039, 51, 30, 50544], "temperature": 0.0, "avg_logprob": -0.14310882831441946, "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.018956175073981285}, {"id": 33, "seek": 22070, "start": 225.82, "end": 232.7, "text": " Yeah, well, I think like everyone else sort of in in this like reading about say Google''s", "tokens": [50620, 865, 11, 731, 11, 286, 519, 411, 1518, 1646, 1333, 295, 294, 294, 341, 411, 3760, 466, 584, 3329, 311, 50964], "temperature": 0.0, "avg_logprob": -0.14310882831441946, "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.018956175073981285}, {"id": 34, "seek": 22070, "start": 232.7, "end": 237.89999999999998, "text": " flan model or, you know, that we''ve been kind of reading about a lot of these large language", "tokens": [50964, 932, 282, 2316, 420, 11, 291, 458, 11, 300, 321, 600, 668, 733, 295, 3760, 466, 257, 688, 295, 613, 2416, 2856, 51224], "temperature": 0.0, "avg_logprob": -0.14310882831441946, "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.018956175073981285}, {"id": 35, "seek": 22070, "start": 237.89999999999998, "end": 242.62, "text": " models, but we haven''t actually really gotten to use them. I think Facebook''s OPT model was on", "tokens": [51224, 5245, 11, 457, 321, 2378, 380, 767, 534, 5768, 281, 764, 552, 13, 286, 519, 4384, 311, 23324, 51, 2316, 390, 322, 51460], "temperature": 0.0, "avg_logprob": -0.14310882831441946, "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.018956175073981285}, {"id": 36, "seek": 22070, "start": 242.62, "end": 247.89999999999998, "text": " hugging face and I played with that and back in back at the time, I was mostly like the few", "tokens": [51460, 41706, 1851, 293, 286, 3737, 365, 300, 293, 646, 294, 646, 412, 264, 565, 11, 286, 390, 5240, 411, 264, 1326, 51724], "temperature": 0.0, "avg_logprob": -0.14310882831441946, "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.018956175073981285}, {"id": 37, "seek": 24790, "start": 247.9, "end": 251.98000000000002, "text": " shot learning part was like the part that was so exciting where you could, you know, give it like", "tokens": [50364, 3347, 2539, 644, 390, 411, 264, 644, 300, 390, 370, 4670, 689, 291, 727, 11, 291, 458, 11, 976, 309, 411, 50568], "temperature": 0.0, "avg_logprob": -0.15204935488493546, "compression_ratio": 1.721830985915493, "no_speech_prob": 0.004537984728813171}, {"id": 38, "seek": 24790, "start": 251.98000000000002, "end": 256.14, "text": " for example, of a task and then it could just instantly learn the task and that''s like pretty", "tokens": [50568, 337, 1365, 11, 295, 257, 5633, 293, 550, 309, 727, 445, 13518, 1466, 264, 5633, 293, 300, 311, 411, 1238, 50776], "temperature": 0.0, "avg_logprob": -0.15204935488493546, "compression_ratio": 1.721830985915493, "no_speech_prob": 0.004537984728813171}, {"id": 39, "seek": 24790, "start": 256.14, "end": 261.1, "text": " surprising for people who''ve been doing supervised learning optimization for a long time. And so", "tokens": [50776, 8830, 337, 561, 567, 600, 668, 884, 46533, 2539, 19618, 337, 257, 938, 565, 13, 400, 370, 51024], "temperature": 0.0, "avg_logprob": -0.15204935488493546, "compression_ratio": 1.721830985915493, "no_speech_prob": 0.004537984728813171}, {"id": 40, "seek": 24790, "start": 261.1, "end": 266.22, "text": " mostly my thinking was a few shot learning, but this chat GPT thing, this reinforced learning from", "tokens": [51024, 5240, 452, 1953, 390, 257, 1326, 3347, 2539, 11, 457, 341, 5081, 26039, 51, 551, 11, 341, 31365, 2539, 490, 51280], "temperature": 0.0, "avg_logprob": -0.15204935488493546, "compression_ratio": 1.721830985915493, "no_speech_prob": 0.004537984728813171}, {"id": 41, "seek": 24790, "start": 266.22, "end": 273.42, "text": " human feedback, this like I mean the way that it can talk is just mind blowing. It''s I''m so amazed by", "tokens": [51280, 1952, 5824, 11, 341, 411, 286, 914, 264, 636, 300, 309, 393, 751, 307, 445, 1575, 15068, 13, 467, 311, 286, 478, 370, 20507, 538, 51640], "temperature": 0.0, "avg_logprob": -0.15204935488493546, "compression_ratio": 1.721830985915493, "no_speech_prob": 0.004537984728813171}, {"id": 42, "seek": 27342, "start": 273.90000000000003, "end": 279.42, "text": " and I think yeah, it''s really unlocked a lot of thinking about the importance of prompting to", "tokens": [50388, 293, 286, 519, 1338, 11, 309, 311, 534, 30180, 257, 688, 295, 1953, 466, 264, 7379, 295, 12391, 278, 281, 50664], "temperature": 0.0, "avg_logprob": -0.11184623461811483, "compression_ratio": 1.7619047619047619, "no_speech_prob": 0.004762864205986261}, {"id": 43, "seek": 27342, "start": 279.42, "end": 283.98, "text": " me and what prompting means. I used to think that was just kind of like a task description idea,", "tokens": [50664, 385, 293, 437, 12391, 278, 1355, 13, 286, 1143, 281, 519, 300, 390, 445, 733, 295, 411, 257, 5633, 3855, 1558, 11, 50892], "temperature": 0.0, "avg_logprob": -0.11184623461811483, "compression_ratio": 1.7619047619047619, "no_speech_prob": 0.004762864205986261}, {"id": 44, "seek": 27342, "start": 283.98, "end": 289.1, "text": " which it still kind of is, but it''s like the nuances of it are so much. And yeah, I''d really", "tokens": [50892, 597, 309, 920, 733, 295, 307, 11, 457, 309, 311, 411, 264, 38775, 295, 309, 366, 370, 709, 13, 400, 1338, 11, 286, 1116, 534, 51148], "temperature": 0.0, "avg_logprob": -0.11184623461811483, "compression_ratio": 1.7619047619047619, "no_speech_prob": 0.004762864205986261}, {"id": 45, "seek": 27342, "start": 289.1, "end": 294.54, "text": " love to like dive into this topic of large language models and search and I have a few different", "tokens": [51148, 959, 281, 411, 9192, 666, 341, 4829, 295, 2416, 2856, 5245, 293, 3164, 293, 286, 362, 257, 1326, 819, 51420], "temperature": 0.0, "avg_logprob": -0.11184623461811483, "compression_ratio": 1.7619047619047619, "no_speech_prob": 0.004762864205986261}, {"id": 46, "seek": 27342, "start": 294.54, "end": 299.26, "text": " dimensions of how I''m kind of thinking about these two things relating to each other, but since I''ve", "tokens": [51420, 12819, 295, 577, 286, 478, 733, 295, 1953, 466, 613, 732, 721, 23968, 281, 1184, 661, 11, 457, 1670, 286, 600, 51656], "temperature": 0.0, "avg_logprob": -0.11184623461811483, "compression_ratio": 1.7619047619047619, "no_speech_prob": 0.004762864205986261}, {"id": 47, "seek": 29926, "start": 299.65999999999997, "end": 305.65999999999997, "text": " brought up prompting, I kind of want to stay on this one quickly. So Bob and Jerry Lou showed me this", "tokens": [50384, 3038, 493, 12391, 278, 11, 286, 733, 295, 528, 281, 1754, 322, 341, 472, 2661, 13, 407, 6085, 293, 17454, 7272, 4712, 385, 341, 50684], "temperature": 0.0, "avg_logprob": -0.14329574067713852, "compression_ratio": 1.7377622377622377, "no_speech_prob": 0.0004865841183345765}, {"id": 48, "seek": 29926, "start": 305.65999999999997, "end": 312.21999999999997, "text": " thing called GPT index and GPT index has this strategy for prompting GPT for summarization. It has", "tokens": [50684, 551, 1219, 26039, 51, 8186, 293, 26039, 51, 8186, 575, 341, 5206, 337, 12391, 278, 26039, 51, 337, 14611, 2144, 13, 467, 575, 51012], "temperature": 0.0, "avg_logprob": -0.14329574067713852, "compression_ratio": 1.7377622377622377, "no_speech_prob": 0.0004865841183345765}, {"id": 49, "seek": 29926, "start": 312.21999999999997, "end": 316.38, "text": " other things, but this is one thing that just really stood out to me and there are like two strategies", "tokens": [51012, 661, 721, 11, 457, 341, 307, 472, 551, 300, 445, 534, 9371, 484, 281, 385, 293, 456, 366, 411, 732, 9029, 51220], "temperature": 0.0, "avg_logprob": -0.14329574067713852, "compression_ratio": 1.7377622377622377, "no_speech_prob": 0.0004865841183345765}, {"id": 50, "seek": 29926, "start": 316.38, "end": 323.18, "text": " you can use to summarize long text with the large language model.
You can either create and refine", "tokens": [51220, 291, 393, 764, 281, 20858, 938, 2487, 365, 264, 2416, 2856, 2316, 13, 509, 393, 2139, 1884, 293, 33906, 51560], "temperature": 0.0, "avg_logprob": -0.14329574067713852, "compression_ratio": 1.7377622377622377, "no_speech_prob": 0.0004865841183345765}, {"id": 51, "seek": 29926, "start": 323.18, "end": 327.42, "text": " where you go paragraph by paragraph and you say like you start up by please write a summary of", "tokens": [51560, 689, 291, 352, 18865, 538, 18865, 293, 291, 584, 411, 291, 722, 493, 538, 1767, 2464, 257, 12691, 295, 51772], "temperature": 0.0, "avg_logprob": -0.14329574067713852, "compression_ratio": 1.7377622377622377, "no_speech_prob": 0.0004865841183345765}, {"id": 52, "seek": 32742, "start": 327.42, "end": 332.38, "text": " this long text, you''ll receive a paragraph by paragraph and then it iteratively updates a summary", "tokens": [50364, 341, 938, 2487, 11, 291, 603, 4774, 257, 18865, 538, 18865, 293, 550, 309, 17138, 19020, 9205, 257, 12691, 50612], "temperature": 0.0, "avg_logprob": -0.10319620931250417, "compression_ratio": 1.8631178707224334, "no_speech_prob": 0.0023818169720470905}, {"id": 53, "seek": 32742, "start": 332.38, "end": 336.7, "text": " or you can have this tree where you you know you chunk it up like you know as a tree and then you", "tokens": [50612, 420, 291, 393, 362, 341, 4230, 689, 291, 291, 458, 291, 16635, 309, 493, 411, 291, 458, 382, 257, 4230, 293, 550, 291, 50828], "temperature": 0.0, "avg_logprob": -0.10319620931250417, "compression_ratio": 1.8631178707224334, "no_speech_prob": 0.0023818169720470905}, {"id": 54, "seek": 32742, "start": 336.7, "end": 342.86, "text": " couple it like recursively and then build up the summary that way. So this kind of thing about like", "tokens": [50828, 1916, 309, 411, 20560, 3413, 293, 550, 1322, 493, 264, 12691, 300, 636, 13, 407, 341, 733, 295, 551, 466, 411, 51136], "temperature": 0.0, "avg_logprob": -0.10319620931250417, "compression_ratio": 1.8631178707224334, "no_speech_prob": 0.0023818169720470905}, {"id": 55, "seek": 32742, "start": 342.86, "end": 348.3, "text": " how we use these large language models all of it is so interesting and so I guess kind of yeah,", "tokens": [51136, 577, 321, 764, 613, 2416, 2856, 5245, 439, 295, 309, 307, 370, 1880, 293, 370, 286, 2041, 733, 295, 1338, 11, 51408], "temperature": 0.0, "avg_logprob": -0.10319620931250417, "compression_ratio": 1.8631178707224334, "no_speech_prob": 0.0023818169720470905}, {"id": 56, "seek": 32742, "start": 348.3, "end": 353.02000000000004, "text": " let me pass it back to you and I''m curious like how do you think large language models will change", "tokens": [51408, 718, 385, 1320, 309, 646, 281, 291, 293, 286, 478, 6369, 411, 577, 360, 291, 519, 2416, 2856, 5245, 486, 1319, 51644], "temperature": 0.0, "avg_logprob": -0.10319620931250417, "compression_ratio": 1.8631178707224334, "no_speech_prob": 0.0023818169720470905}, {"id": 57, "seek": 35302, "start": 353.41999999999996, "end": 361.65999999999997, "text": " search? Yeah, I mean I''m still kind of learning it and I having you know built search engine before", "tokens": [50384, 3164, 30, 865, 11, 286, 914, 286, 478, 920, 733, 295, 2539, 309, 293, 286, 1419, 291, 458, 3094, 3164, 2848, 949, 50796], "temperature": 0.0, "avg_logprob": -0.15071534073871115, "compression_ratio": 1.6134453781512605, "no_speech_prob": 0.0015915316762402654}, {"id": 58, "seek": 35302, "start": 361.65999999999997, "end": 369.5, "text": " vector search you know using like TFIDF basically. I knew the cost of doing it wrong you know", "tokens": [50796, 8062, 3164, 291, 458, 1228, 411, 40964, 2777, 37, 1936, 13, 286, 2586, 264, 2063, 295, 884, 309, 2085, 291, 458, 51188], "temperature": 0.0, "avg_logprob": -0.15071534073871115, "compression_ratio": 1.6134453781512605, "no_speech_prob": 0.0015915316762402654}, {"id": 59, "seek": 35302, "start": 369.5, "end": 377.34, "text": " or sort of focusing too much on precision and then paying a huge bill because of that. So like", "tokens": [51188, 420, 1333, 295, 8416, 886, 709, 322, 18356, 293, 550, 6229, 257, 2603, 2961, 570, 295, 300, 13, 407, 411, 51580], "temperature": 0.0, "avg_logprob": -0.15071534073871115, "compression_ratio": 1.6134453781512605, "no_speech_prob": 0.0015915316762402654}, {"id": 60, "seek": 35302, "start": 377.34, "end": 382.46, "text": " our search engine for example back in the days when we indexed on sentence level in alpha sense", "tokens": [51580, 527, 3164, 2848, 337, 1365, 646, 294, 264, 1708, 562, 321, 8186, 292, 322, 8174, 1496, 294, 8961, 2020, 51836], "temperature": 0.0, "avg_logprob": -0.15071534073871115, "compression_ratio": 1.6134453781512605, "no_speech_prob": 0.0015915316762402654}, {"id": 61, "seek": 38246, "start": 382.62, "end": 390.21999999999997, "text": " would eat something like half a terabyte memory and you know memory was never cheap like it was", "tokens": [50372, 576, 1862, 746, 411, 1922, 257, 1796, 34529, 4675, 293, 291, 458, 4675, 390, 1128, 7084, 411, 309, 390, 50752], "temperature": 0.0, "avg_logprob": -0.1213303876210408, "compression_ratio": 1.6724137931034482, "no_speech_prob": 0.0009250330622307956}, {"id": 62, "seek": 38246, "start": 390.21999999999997, "end": 398.38, "text": " very expensive even back then and so we had to figure out ways to retain precision, not lose recall", "tokens": [50752, 588, 5124, 754, 646, 550, 293, 370, 321, 632, 281, 2573, 484, 2098, 281, 18340, 18356, 11, 406, 3624, 9901, 51160], "temperature": 0.0, "avg_logprob": -0.1213303876210408, "compression_ratio": 1.6724137931034482, "no_speech_prob": 0.0009250330622307956}, {"id": 63, "seek": 38246, "start": 398.38, "end": 402.7, "text": " or maybe even increase recall because there was a problem with this precision oriented search", "tokens": [51160, 420, 1310, 754, 3488, 9901, 570, 456, 390, 257, 1154, 365, 341, 18356, 21841, 3164, 51376], "temperature": 0.0, "avg_logprob": -0.1213303876210408, "compression_ratio": 1.6724137931034482, "no_speech_prob": 0.0009250330622307956}, {"id": 64, "seek": 38246, "start": 403.58, "end": 409.82, "text": " and stay within the budget right so when I think about language models myself and I also worked at", "tokens": [51420, 293, 1754, 1951, 264, 4706, 558, 370, 562, 286, 519, 466, 2856, 5245, 2059, 293, 286, 611, 2732, 412, 51732], "temperature": 0.0, "avg_logprob": -0.1213303876210408, "compression_ratio": 1.6724137931034482, "no_speech_prob": 0.0009250330622307956}, {"id": 65, "seek": 40982, "start": 409.82, "end": 417.5, "text": " silo AI at one large client you know applying these models at web scale. The problem at web", "tokens": [50364, 3425, 78, 7318, 412, 472, 2416, 6423, 291, 458, 9275, 613, 5245, 412, 3670, 4373, 13, 440, 1154, 412, 3670, 50748], "temperature": 0.0, "avg_logprob": -0.11600063112046984, "compression_ratio": 1.7112068965517242, "no_speech_prob": 0.0005339415511116385}, {"id": 66, "seek": 40982, "start": 417.5, "end": 423.34, "text": " scale is that you really need to go sub-second and not just sub-second you need to go like 10 milliseconds", "tokens": [50748, 4373, 307, 300, 291, 534, 643, 281, 352, 1422, 12, 27375, 293, 406, 445, 1422, 12, 27375, 291, 643, 281, 352, 411, 1266, 34184, 51040], "temperature": 0.0, "avg_logprob": -0.11600063112046984, "compression_ratio": 1.7112068965517242, "no_speech_prob": 0.0005339415511116385}, {"id": 67, "seek": 40982, "start": 423.34, "end": 429.1, "text": " or so because all of these adapt because you have so many components in the search engine it''s also", "tokens": [51040, 420, 370, 570, 439, 295, 613, 6231, 570, 291, 362, 370, 867, 6677, 294, 264, 3164, 2848, 309, 311, 611, 51328], "temperature": 0.0, "avg_logprob": -0.11600063112046984, "compression_ratio": 1.7112068965517242, "no_speech_prob": 0.0005339415511116385}, {"id": 68, "seek": 40982, "start": 429.1, "end": 436.46, "text": " multilingual it''s also serving a specific country you know with that specific latency requirements", "tokens": [51328, 2120, 38219, 309, 311, 611, 8148, 257, 2685, 1941, 291, 458, 365, 300, 2685, 27043, 7728, 51696], "temperature": 0.0, "avg_logprob": -0.11600063112046984, "compression_ratio": 1.7112068965517242, "no_speech_prob": 0.0005339415511116385}, {"id": 69, "seek": 43646, "start": 436.46, "end": 442.62, "text": " and stuff and and then there is indexing how quickly you can index things right because you may", "tokens": [50364, 293, 1507, 293, 293, 550, 456, 307, 8186, 278, 577, 2661, 291, 393, 8186, 721, 558, 570, 291, 815, 50672], "temperature": 0.0, "avg_logprob": -0.11366842325451304, "compression_ratio": 1.7962962962962963, "no_speech_prob": 0.0016883693169802427}, {"id": 70, "seek": 43646, "start": 442.62, "end": 447.5, "text": " also face bottlenecks there so these these are the things that I keep thinking about but also the", "tokens": [50672, 611, 1851, 44641, 2761, 456, 370, 613, 613, 366, 264, 721, 300, 286, 1066, 1953, 466, 457, 611, 264, 50916], "temperature": 0.0, "avg_logprob": -0.11366842325451304, "compression_ratio": 1.7962962962962963, "no_speech_prob": 0.0016883693169802427}, {"id": 71, "seek": 43646, "start": 447.5, "end": 453.97999999999996, "text": " thing that we talked a year ago in the port in the same podcast vector podcast is that you know the", "tokens": [50916, 551, 300, 321, 2825, 257, 1064, 2057, 294, 264, 2436, 294, 264, 912, 7367, 8062, 7367, 307, 300, 291, 458, 264, 51240], "temperature": 0.0, "avg_logprob": -0.11366842325451304, "compression_ratio": 1.7962962962962963, "no_speech_prob": 0.0016883693169802427}, {"id": 72, "seek": 43646, "start": 453.97999999999996, "end": 460.85999999999996, "text": " models like trained by Microsoft for instance I can hardly imagine deploying them today in my", "tokens": [51240, 5245, 411, 8895, 538, 8116, 337, 5197, 286, 393, 13572, 3811, 34198, 552, 965, 294, 452, 51584], "temperature": 0.0, "avg_logprob": -0.11366842325451304, "compression_ratio": 1.7962962962962963, "no_speech_prob": 0.0016883693169802427}, {"id": 73, "seek": 43646, "start": 460.85999999999996, "end": 466.14, "text": " practical setting because they will have like billions of parameters and so they will be probably", "tokens": [51584, 8496, 3287, 570, 436, 486, 362, 411, 17375, 295, 9834, 293, 370, 436, 486, 312, 1391, 51848], "temperature": 0.0, "avg_logprob": -0.11366842325451304, "compression_ratio": 1.7962962962962963, "no_speech_prob": 0.0016883693169802427}, {"id": 74, "seek": 46614, "start": 466.14, "end": 471.9, "text": " slower and also how do I fine tune them how much server capacity I will need to fine tune them and", "tokens": [50364, 14009, 293, 611, 577, 360, 286, 2489, 10864, 552, 577, 709, 7154, 6042, 286, 486, 643, 281, 2489, 10864, 552, 293, 50652], "temperature": 0.0, "avg_logprob": -0.10158433697440407, "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.0009660437935963273}, {"id": 75, "seek": 46614, "start": 471.9, "end": 479.34, "text": " so that''s why I thought you know from the discussion with multi-peach right he pointed me to the", "tokens": [50652, 370, 300, 311, 983, 286, 1194, 291, 458, 490, 264, 5017, 365, 4825, 12, 494, 608, 558, 415, 10932, 385, 281, 264, 51024], "temperature": 0.0, "avg_logprob": -0.10158433697440407, "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.0009660437935963273}, {"id": 76, "seek": 46614, "start": 479.34, "end": 487.5, "text": " Atlas paper where they basically are able to with a few examples fine tune the model so quickly", "tokens": [51024, 32485, 3035, 689, 436, 1936, 366, 1075, 281, 365, 257, 1326, 5110, 2489, 10864, 264, 2316, 370, 2661, 51432], "temperature": 0.0, "avg_logprob": -0.10158433697440407, "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.0009660437935963273}, {"id": 77, "seek": 46614, "start": 488.38, "end": 493.41999999999996, "text": " and it will have substantially less parameters so it becomes more practical you know both on fine", "tokens": [51476, 293, 309, 486, 362, 30797, 1570, 9834, 370, 309, 3643, 544, 8496, 291, 458, 1293, 322, 2489, 51728], "temperature": 0.0, "avg_logprob": -0.10158433697440407, "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.0009660437935963273}, {"id": 78, "seek": 49342, "start": 493.42, "end": 499.18, "text": " tuning side and also on serving side and these are the topics that I keep thinking before I enter the", "tokens": [50364, 15164, 1252, 293, 611, 322, 8148, 1252, 293, 613, 366, 264,
8378, 300, 286, 1066, 1953, 949, 286, 3242, 264, 50652], + "temperature": 0.0, "avg_logprob": -0.11291148519923544, "compression_ratio": 1.8228782287822878, + "no_speech_prob": 0.010798160918056965}, {"id": 79, "seek": 49342, "start": 499.82, + "end": 504.94, "text": " is it chat GPT is it sexy is it cool is it answering my + questions you know can I actually", "tokens": [50684, 307, 309, 5081, 26039, 51, + 307, 309, 13701, 307, 309, 1627, 307, 309, 13430, 452, 1651, 291, 458, 393, 286, + 767, 50940], "temperature": 0.0, "avg_logprob": -0.11291148519923544, "compression_ratio": + 1.8228782287822878, "no_speech_prob": 0.010798160918056965}, {"id": 80, "seek": + 49342, "start": 504.94, "end": 511.1, "text": " deploy it and not have angry faces + from DevOps saying hey you just crossed all the like we are low", "tokens": [50940, + 7274, 309, 293, 406, 362, 6884, 8475, 490, 43051, 1566, 4177, 291, 445, 14622, 439, + 264, 411, 321, 366, 2295, 51248], "temperature": 0.0, "avg_logprob": -0.11291148519923544, + "compression_ratio": 1.8228782287822878, "no_speech_prob": 0.010798160918056965}, + {"id": 81, "seek": 49342, "start": 511.1, "end": 516.3000000000001, "text": " margin + on search and you are just you know way above that so sorry we cannot deploy this + so these are", "tokens": [51248, 10270, 322, 3164, 293, 291, 366, 445, 291, 458, + 636, 3673, 300, 370, 2597, 321, 2644, 7274, 341, 370, 613, 366, 51508], "temperature": + 0.0, "avg_logprob": -0.11291148519923544, "compression_ratio": 1.8228782287822878, + "no_speech_prob": 0.010798160918056965}, {"id": 82, "seek": 49342, "start": 516.3000000000001, + "end": 522.62, "text": " the questions I''m thinking about a lot yeah that I think + there''s a couple things in pack and no one''s", "tokens": [51508, 264, 1651, 286, + 478, 1953, 466, 257, 688, 1338, 300, 286, 519, 456, 311, 257, 1916, 721, 294, 2844, + 293, 572, 472, 311, 51824], "temperature": 0.0, "avg_logprob": -0.11291148519923544, + "compression_ratio": 
1.8228782287822878, "no_speech_prob": 0.010798160918056965}, + {"id": 83, "seek": 52262, "start": 522.62, "end": 527.26, "text": " helped me develop + the abstraction around the end-to-end search framework more than you so thank you", + "tokens": [50364, 4254, 385, 1499, 264, 37765, 926, 264, 917, 12, 1353, 12, 521, + 3164, 8388, 544, 813, 291, 370, 1309, 291, 50596], "temperature": 0.0, "avg_logprob": + -0.1812877655029297, "compression_ratio": 1.798780487804878, "no_speech_prob": 0.004123232793062925}, + {"id": 84, "seek": 52262, "start": 527.26, "end": 531.1, "text": " so with the with + the pyramid diagrams and these kind of things it''s so helpful and yeah you", "tokens": + [50596, 370, 365, 264, 365, 264, 25950, 36709, 293, 613, 733, 295, 721, 309, 311, + 370, 4961, 293, 1338, 291, 50788], "temperature": 0.0, "avg_logprob": -0.1812877655029297, + "compression_ratio": 1.798780487804878, "no_speech_prob": 0.004123232793062925}, + {"id": 85, "seek": 52262, "start": 531.1, "end": 535.74, "text": " mentioned like + the approximate nearest neighbor then one up you have where I see is the information", + "tokens": [50788, 2835, 411, 264, 30874, 23831, 5987, 550, 472, 493, 291, 362, 689, + 286, 536, 307, 264, 1589, 51020], "temperature": 0.0, "avg_logprob": -0.1812877655029297, + "compression_ratio": 1.798780487804878, "no_speech_prob": 0.004123232793062925}, + {"id": 86, "seek": 52262, "start": 535.74, "end": 540.94, "text": " retrieval layer + where you have the you know dense vector search BM25 split covert that layer and", + "tokens": [51020, 19817, 3337, 4583, 689, 291, 362, 264, 291, 458, 18011, 8062, + 3164, 15901, 6074, 7472, 45985, 300, 4583, 293, 51280], "temperature": 0.0, "avg_logprob": + -0.1812877655029297, "compression_ratio": 1.798780487804878, "no_speech_prob": 0.004123232793062925}, + {"id": 87, "seek": 52262, "start": 540.94, "end": 546.54, "text": " then at the + top you have like what I think is going to be the chat GPT layer that''s like that + 
would", "tokens": [51280, 550, 412, 264, 1192, 291, 362, 411, 437, 286, 519, 307, + 516, 281, 312, 264, 5081, 26039, 51, 4583, 300, 311, 411, 300, 576, 51560], "temperature": + 0.0, "avg_logprob": -0.1812877655029297, "compression_ratio": 1.798780487804878, + "no_speech_prob": 0.004123232793062925}, {"id": 88, "seek": 52262, "start": 546.54, + "end": 549.5, "text": " be my current predict and we''re going to talk about neural + search frameworks that they can do more on", "tokens": [51560, 312, 452, 2190, 6069, + 293, 321, 434, 516, 281, 751, 466, 18161, 3164, 29834, 300, 436, 393, 360, 544, + 322, 51708], "temperature": 0.0, "avg_logprob": -0.1812877655029297, "compression_ratio": + 1.798780487804878, "no_speech_prob": 0.004123232793062925}, {"id": 89, "seek": 54950, + "start": 549.5, "end": 560.62, "text": " the wv8 podcast yeah well maybe to just + say a little bit one of our favorite partners that we''ve", "tokens": [50364, 264, + 261, 85, 23, 7367, 1338, 731, 1310, 281, 445, 584, 257, 707, 857, 472, 295, 527, + 2954, 4462, 300, 321, 600, 50920], "temperature": 0.0, "avg_logprob": -0.1459484100341797, + "compression_ratio": 1.5685483870967742, "no_speech_prob": 0.0014912751503288746}, + {"id": 90, "seek": 54950, "start": 560.62, "end": 566.38, "text": " been working + with is neural magic and neural magic is doing sparsity inference acceleration where", + "tokens": [50920, 668, 1364, 365, 307, 18161, 5585, 293, 18161, 5585, 307, 884, + 637, 685, 507, 38253, 17162, 689, 51208], "temperature": 0.0, "avg_logprob": -0.1459484100341797, + "compression_ratio": 1.5685483870967742, "no_speech_prob": 0.0014912751503288746}, + {"id": 91, "seek": 54950, "start": 566.38, "end": 571.9, "text": " they''ve recently + one of their papers is about getting the 175 billion parameter GPT model to run", + "tokens": [51208, 436, 600, 3938, 472, 295, 641, 10577, 307, 466, 1242, 264, 41165, + 5218, 13075, 26039, 51, 2316, 281, 1190, 51484], "temperature": 0.0, "avg_logprob": + 
-0.1459484100341797, "compression_ratio": 1.5685483870967742, "no_speech_prob": + 0.0014912751503288746}, {"id": 92, "seek": 54950, "start": 571.9, "end": 577.74, + "text": " on a single GPU I know that you know you can probably compile these large + language models on like", "tokens": [51484, 322, 257, 2167, 18407, 286, 458, 300, + 291, 458, 291, 393, 1391, 31413, 613, 2416, 2856, 5245, 322, 411, 51776], "temperature": + 0.0, "avg_logprob": -0.1459484100341797, "compression_ratio": 1.5685483870967742, + "no_speech_prob": 0.0014912751503288746}, {"id": 93, "seek": 57774, "start": 577.74, + "end": 583.66, "text": " Nvidia Triton server and do it that way but I think that + this sparsity acceleration for CPUs", "tokens": [50364, 46284, 1765, 270, 266, 7154, + 293, 360, 309, 300, 636, 457, 286, 519, 300, 341, 637, 685, 507, 17162, 337, 13199, + 82, 50660], "temperature": 0.0, "avg_logprob": -0.08642537173102884, "compression_ratio": + 1.6101694915254237, "no_speech_prob": 0.0023990909103304148}, {"id": 94, "seek": + 57774, "start": 583.66, "end": 588.0600000000001, "text": " is just incredibly exciting + for that particular dimension of it and yeah I think what you said", "tokens": [50660, + 307, 445, 6252, 4670, 337, 300, 1729, 10139, 295, 309, 293, 1338, 286, 519, 437, + 291, 848, 50880], "temperature": 0.0, "avg_logprob": -0.08642537173102884, "compression_ratio": + 1.6101694915254237, "no_speech_prob": 0.0023990909103304148}, {"id": 95, "seek": + 57774, "start": 588.0600000000001, "end": 596.86, "text": " inspired so many ideas + yeah I sort of like like what I value in your approach is that you run", "tokens": + [50880, 7547, 370, 867, 3487, 1338, 286, 1333, 295, 411, 411, 437, 286, 2158, 294, + 428, 3109, 307, 300, 291, 1190, 51320], "temperature": 0.0, "avg_logprob": -0.08642537173102884, + "compression_ratio": 1.6101694915254237, "no_speech_prob": 0.0023990909103304148}, + {"id": 96, "seek": 57774, "start": 597.58, "end": 602.78, "text": " probably like + a 
basketball player converted into a marathon runner with the same capacity you + have", "tokens": [51356, 1391, 411, 257, 11767, 4256, 16424, 666, 257, 27601, 24376, + 365, 264, 912, 6042, 291, 362, 51616], "temperature": 0.0, "avg_logprob": -0.08642537173102884, + "compression_ratio": 1.6101694915254237, "no_speech_prob": 0.0023990909103304148}, + {"id": 97, "seek": 60278, "start": 602.78, "end": 609.66, "text": " to play a game + you know that you basically run super quick and fast and long distances you know", + "tokens": [50364, 281, 862, 257, 1216, 291, 458, 300, 291, 1936, 1190, 1687, 1702, + 293, 2370, 293, 938, 22182, 291, 458, 50708], "temperature": 0.0, "avg_logprob": + -0.08211185281926936, "compression_ratio": 1.7065217391304348, "no_speech_prob": + 0.004629852715879679}, {"id": 98, "seek": 60278, "start": 609.66, "end": 614.4599999999999, + "text": " on the research side and I love this approach really really because it + opens up a lot of", "tokens": [50708, 322, 264, 2132, 1252, 293, 286, 959, 341, + 3109, 534, 534, 570, 309, 9870, 493, 257, 688, 295, 50948], "temperature": 0.0, + "avg_logprob": -0.08211185281926936, "compression_ratio": 1.7065217391304348, "no_speech_prob": + 0.004629852715879679}, {"id": 99, "seek": 60278, "start": 614.4599999999999, "end": + 620.86, "text": " opportunities I sort of like because I come from the engineering + background yeah I did my PhD", "tokens": [50948, 4786, 286, 1333, 295, 411, 570, + 286, 808, 490, 264, 7043, 3678, 1338, 286, 630, 452, 14476, 51268], "temperature": + 0.0, "avg_logprob": -0.08211185281926936, "compression_ratio": 1.7065217391304348, + "no_speech_prob": 0.004629852715879679}, {"id": 100, "seek": 60278, "start": 620.86, + "end": 627.02, "text": " but it was like 11 years ago so I most of my time I spent + in production you know great systems", "tokens": [51268, 457, 309, 390, 411, 2975, + 924, 2057, 370, 286, 881, 295, 452, 565, 286, 4418, 294, 4265, 291, 458, 869, 3652, + 51576], "temperature": 
0.0, "avg_logprob": -0.08211185281926936, "compression_ratio": + 1.7065217391304348, "no_speech_prob": 0.004629852715879679}, {"id": 101, "seek": + 60278, "start": 627.02, "end": 632.4599999999999, "text": " and every time you just + try to move a little bit like okay let''s add this and oh the cost is this", "tokens": + [51576, 293, 633, 565, 291, 445, 853, 281, 1286, 257, 707, 857, 411, 1392, 718, + 311, 909, 341, 293, 1954, 264, 2063, 307, 341, 51848], "temperature": 0.0, "avg_logprob": + -0.08211185281926936, "compression_ratio": 1.7065217391304348, "no_speech_prob": + 0.004629852715879679}, {"id": 102, "seek": 63246, "start": 632.46, "end": 639.1800000000001, + "text": " oh sorry okay it will take me now to two more weeks to index my content + so and we have", "tokens": [50364, 1954, 2597, 1392, 309, 486, 747, 385, 586, 281, + 732, 544, 3259, 281, 8186, 452, 2701, 370, 293, 321, 362, 50700], "temperature": + 0.0, "avg_logprob": -0.09236274368461521, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.001577185234054923}, {"id": 103, "seek": 63246, "start": 639.1800000000001, + "end": 645.4200000000001, "text": " for this what is the use case so you trickle + back to like almost like product level management", "tokens": [50700, 337, 341, + 437, 307, 264, 764, 1389, 370, 291, 4282, 306, 646, 281, 411, 1920, 411, 1674, 1496, + 4592, 51012], "temperature": 0.0, "avg_logprob": -0.09236274368461521, "compression_ratio": + 1.709090909090909, "no_speech_prob": 0.001577185234054923}, {"id": 104, "seek": + 63246, "start": 646.38, "end": 651.02, "text": " and so you will get these questions + inevitably like okay why are we doing this like what''s the", "tokens": [51060, + 293, 370, 291, 486, 483, 613, 1651, 28171, 411, 1392, 983, 366, 321, 884, 341, 411, + 437, 311, 264, 51292], "temperature": 0.0, "avg_logprob": -0.09236274368461521, + "compression_ratio": 1.709090909090909, "no_speech_prob": 0.001577185234054923}, + {"id": 105, "seek": 63246, "start": 
651.02, "end": 657.4200000000001, "text": " + actual trade off what''s the benefit of bringing this into production right and + but at the same time", "tokens": [51292, 3539, 4923, 766, 437, 311, 264, 5121, 295, + 5062, 341, 666, 4265, 558, 293, 457, 412, 264, 912, 565, 51612], "temperature": + 0.0, "avg_logprob": -0.09236274368461521, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.001577185234054923}, {"id": 106, "seek": 65742, "start": 657.42, + "end": 663.26, "text": " I''m fascinated by this I mean this will not stop for sure + right would you agree to that statement", "tokens": [50364, 286, 478, 24597, 538, + 341, 286, 914, 341, 486, 406, 1590, 337, 988, 558, 576, 291, 3986, 281, 300, 5629, + 50656], "temperature": 0.0, "avg_logprob": -0.21674811045328776, "compression_ratio": + 1.6455696202531647, "no_speech_prob": 0.000719190516974777}, {"id": 107, "seek": + 65742, "start": 664.86, "end": 672.2199999999999, "text": " yeah I think and there''s + uh so I know Hagen Faces recently published their they open source of data", "tokens": + [50736, 1338, 286, 519, 293, 456, 311, 2232, 370, 286, 458, 389, 4698, 479, 2116, + 3938, 6572, 641, 436, 1269, 4009, 295, 1412, 51104], "temperature": 0.0, "avg_logprob": + -0.21674811045328776, "compression_ratio": 1.6455696202531647, "no_speech_prob": + 0.000719190516974777}, {"id": 108, "seek": 65742, "start": 672.2199999999999, "end": + 678.4599999999999, "text": " said they did with surge AI on getting these um human + annotations to train the reward model in", "tokens": [51104, 848, 436, 630, 365, + 18989, 7318, 322, 1242, 613, 1105, 1952, 25339, 763, 281, 3847, 264, 7782, 2316, + 294, 51416], "temperature": 0.0, "avg_logprob": -0.21674811045328776, "compression_ratio": + 1.6455696202531647, "no_speech_prob": 0.000719190516974777}, {"id": 109, "seek": + 65742, "start": 678.4599999999999, "end": 684.2199999999999, "text": " the reinforced + learning human feedback strategy so I think they''ll they''ll be an 
open sourcing + of", "tokens": [51416, 264, 31365, 2539, 1952, 5824, 5206, 370, 286, 519, 436, 603, + 436, 603, 312, 364, 1269, 11006, 2175, 295, 51704], "temperature": 0.0, "avg_logprob": + -0.21674811045328776, "compression_ratio": 1.6455696202531647, "no_speech_prob": + 0.000719190516974777}, {"id": 110, "seek": 68422, "start": 684.22, "end": 688.46, + "text": " the data of the data that you need to train the models and then yeah I + think pretty soon they''ll", "tokens": [50364, 264, 1412, 295, 264, 1412, 300, 291, + 643, 281, 3847, 264, 5245, 293, 550, 1338, 286, 519, 1238, 2321, 436, 603, 50576], + "temperature": 0.0, "avg_logprob": -0.10111456650954026, "compression_ratio": 1.789090909090909, + "no_speech_prob": 0.0008320531924255192}, {"id": 111, "seek": 68422, "start": 688.46, + "end": 694.86, "text": " be open source versions of it I think open AI um I I''m + very curious about this like kind of data", "tokens": [50576, 312, 1269, 4009, 9606, + 295, 309, 286, 519, 1269, 7318, 1105, 286, 286, 478, 588, 6369, 466, 341, 411, 733, + 295, 1412, 50896], "temperature": 0.0, "avg_logprob": -0.10111456650954026, "compression_ratio": + 1.789090909090909, "no_speech_prob": 0.0008320531924255192}, {"id": 112, "seek": + 68422, "start": 694.86, "end": 699.9, "text": " flywheel idea whereby open sourcing + the model they spend a ton of money on letting you use it for free", "tokens": [50896, + 3603, 22830, 1558, 36998, 1269, 11006, 2175, 264, 2316, 436, 3496, 257, 2952, 295, + 1460, 322, 8295, 291, 764, 309, 337, 1737, 51148], "temperature": 0.0, "avg_logprob": + -0.10111456650954026, "compression_ratio": 1.789090909090909, "no_speech_prob": + 0.0008320531924255192}, {"id": 113, "seek": 68422, "start": 699.9, "end": 705.4200000000001, + "text": " but then they get the data of how you want to use it and so very curious + how that leads to more", "tokens": [51148, 457, 550, 436, 483, 264, 1412, 295, 577, + 291, 528, 281, 764, 309, 293, 370, 588, 6369, 577, 300, 6689, 281, 
544, 51424], + "temperature": 0.0, "avg_logprob": -0.10111456650954026, "compression_ratio": 1.789090909090909, + "no_speech_prob": 0.0008320531924255192}, {"id": 114, "seek": 68422, "start": 706.22, + "end": 711.6600000000001, "text": " to a better model my PhD advisor is a world + class expert in class imbalance like understanding that", "tokens": [51464, 281, + 257, 1101, 2316, 452, 14476, 19161, 307, 257, 1002, 1508, 5844, 294, 1508, 43007, + 411, 3701, 300, 51736], "temperature": 0.0, "avg_logprob": -0.10111456650954026, + "compression_ratio": 1.789090909090909, "no_speech_prob": 0.0008320531924255192}, + {"id": 115, "seek": 71166, "start": 711.66, "end": 716.4599999999999, "text": " + machine learning models they do not perform well on long tail you know if you have + an imbalance", "tokens": [50364, 3479, 2539, 5245, 436, 360, 406, 2042, 731, 322, + 938, 6838, 291, 458, 498, 291, 362, 364, 43007, 50604], "temperature": 0.0, "avg_logprob": + -0.13602939673832484, "compression_ratio": 1.718978102189781, "no_speech_prob": + 0.0011256723664700985}, {"id": 116, "seek": 71166, "start": 716.4599999999999, "end": + 721.66, "text": " data so it''s a lot of like the bias discussion things like that + so I''m I''m curious maybe it helps", "tokens": [50604, 1412, 370, 309, 311, 257, + 688, 295, 411, 264, 12577, 5017, 721, 411, 300, 370, 286, 478, 286, 478, 6369, 1310, + 309, 3665, 50864], "temperature": 0.0, "avg_logprob": -0.13602939673832484, "compression_ratio": + 1.718978102189781, "no_speech_prob": 0.0011256723664700985}, {"id": 117, "seek": + 71166, "start": 721.66, "end": 728.06, "text": " the long tail getting all this + data yeah it''s still not exactly how it will get better I think", "tokens": [50864, + 264, 938, 6838, 1242, 439, 341, 1412, 1338, 309, 311, 920, 406, 2293, 577, 309, + 486, 483, 1101, 286, 519, 51184], "temperature": 0.0, "avg_logprob": -0.13602939673832484, + "compression_ratio": 1.718978102189781, "no_speech_prob": 0.0011256723664700985}, + 
{"id": 118, "seek": 71166, "start": 729.02, "end": 733.5799999999999, "text": " + one thing I''ve said previously is like there was this paper from Emily Bender and + um", "tokens": [51232, 472, 551, 286, 600, 848, 8046, 307, 411, 456, 390, 341, 3035, + 490, 15034, 363, 3216, 293, 1105, 51460], "temperature": 0.0, "avg_logprob": -0.13602939673832484, + "compression_ratio": 1.718978102189781, "no_speech_prob": 0.0011256723664700985}, + {"id": 119, "seek": 71166, "start": 734.38, "end": 740.38, "text": " caller is the + last name sorry it but it''s called unmeaning understanding in big data and it makes", + "tokens": [51500, 48324, 307, 264, 1036, 1315, 2597, 309, 457, 309, 311, 1219, 517, + 1398, 8415, 3701, 294, 955, 1412, 293, 309, 1669, 51800], "temperature": 0.0, "avg_logprob": + -0.13602939673832484, "compression_ratio": 1.718978102189781, "no_speech_prob": + 0.0011256723664700985}, {"id": 120, "seek": 74038, "start": 740.38, "end": 744.7, + "text": " this argument that it''s like language models by predicting the next token + will never achieve", "tokens": [50364, 341, 6770, 300, 309, 311, 411, 2856, 5245, + 538, 32884, 264, 958, 14862, 486, 1128, 4584, 50580], "temperature": 0.0, "avg_logprob": + -0.11707884986121375, "compression_ratio": 1.84375, "no_speech_prob": 0.0022324423771351576}, + {"id": 121, "seek": 74038, "start": 744.7, "end": 751.18, "text": " meaning because + it''s like an octopus underneath the ocean of two stranded islanders and it''s just", + "tokens": [50580, 3620, 570, 309, 311, 411, 364, 27962, 7223, 264, 7810, 295, 732, + 44394, 6077, 433, 293, 309, 311, 445, 50904], "temperature": 0.0, "avg_logprob": + -0.11707884986121375, "compression_ratio": 1.84375, "no_speech_prob": 0.0022324423771351576}, + {"id": 122, "seek": 74038, "start": 751.18, "end": 756.06, "text": " mimicking their + language but if it if something like a bear is to show up on the island and it goes", + "tokens": [50904, 12247, 10401, 641, 2856, 457, 498, 309, 498, 746, 
411, 257, 6155, + 307, 281, 855, 493, 322, 264, 6077, 293, 309, 1709, 51148], "temperature": 0.0, + "avg_logprob": -0.11707884986121375, "compression_ratio": 1.84375, "no_speech_prob": + 0.0022324423771351576}, {"id": 123, "seek": 74038, "start": 756.06, "end": 759.98, + "text": " help a bear then the octopus is like oh I don''t know what a bear is like + yeah I''ll do more", "tokens": [51148, 854, 257, 6155, 550, 264, 27962, 307, 411, + 1954, 286, 500, 380, 458, 437, 257, 6155, 307, 411, 1338, 286, 603, 360, 544, 51344], + "temperature": 0.0, "avg_logprob": -0.11707884986121375, "compression_ratio": 1.84375, + "no_speech_prob": 0.0022324423771351576}, {"id": 124, "seek": 74038, "start": 761.26, + "end": 765.34, "text": " but I think what we''re seeing with the reinforcement learning + thing is that it''s like it''s", "tokens": [51408, 457, 286, 519, 437, 321, 434, + 2577, 365, 264, 29280, 2539, 551, 307, 300, 309, 311, 411, 309, 311, 51612], "temperature": + 0.0, "avg_logprob": -0.11707884986121375, "compression_ratio": 1.84375, "no_speech_prob": + 0.0022324423771351576}, {"id": 125, "seek": 76534, "start": 765.34, "end": 769.34, + "text": " acting it''s there''s there''s this other paper called experience grounds + language", "tokens": [50364, 6577, 309, 311, 456, 311, 456, 311, 341, 661, 3035, + 1219, 1752, 19196, 2856, 50564], "temperature": 0.0, "avg_logprob": -0.1400560292330655, + "compression_ratio": 1.9036144578313252, "no_speech_prob": 0.0016971861477941275}, + {"id": 126, "seek": 76534, "start": 769.34, "end": 776.14, "text": " it''s about + you you need to it''s like the levels of sort of developing meaning and one of it + is", "tokens": [50564, 309, 311, 466, 291, 291, 643, 281, 309, 311, 411, 264, 4358, + 295, 1333, 295, 6416, 3620, 293, 472, 295, 309, 307, 50904], "temperature": 0.0, + "avg_logprob": -0.1400560292330655, "compression_ratio": 1.9036144578313252, "no_speech_prob": + 0.0016971861477941275}, {"id": 127, "seek": 76534, "start": 776.14, 
"end": 781.6600000000001, + "text": " like about the importance of acting acting in your environment I''m I''m + kind of going around right", "tokens": [50904, 411, 466, 264, 7379, 295, 6577, 6577, + 294, 428, 2823, 286, 478, 286, 478, 733, 295, 516, 926, 558, 51180], "temperature": + 0.0, "avg_logprob": -0.1400560292330655, "compression_ratio": 1.9036144578313252, + "no_speech_prob": 0.0016971861477941275}, {"id": 128, "seek": 76534, "start": 781.6600000000001, + "end": 786.86, "text": " here but I also see like this causal inference stuff and + uh Judea Pearl has this ladder of causality", "tokens": [51180, 510, 457, 286, 611, + 536, 411, 341, 38755, 38253, 1507, 293, 2232, 36521, 64, 24639, 575, 341, 18325, + 295, 3302, 1860, 51440], "temperature": 0.0, "avg_logprob": -0.1400560292330655, + "compression_ratio": 1.9036144578313252, "no_speech_prob": 0.0016971861477941275}, + {"id": 129, "seek": 76534, "start": 786.86, "end": 792.7800000000001, "text": " + where uh it''s you act you make interventions but then the the the the top of the + ladder of causality", "tokens": [51440, 689, 2232, 309, 311, 291, 605, 291, 652, + 20924, 457, 550, 264, 264, 264, 264, 1192, 295, 264, 18325, 295, 3302, 1860, 51736], + "temperature": 0.0, "avg_logprob": -0.1400560292330655, "compression_ratio": 1.9036144578313252, + "no_speech_prob": 0.0016971861477941275}, {"id": 130, "seek": 79278, "start": 792.86, + "end": 798.22, "text": " is you can understand uh counterfactuals and so that last + part I have no idea how that''s going to", "tokens": [50368, 307, 291, 393, 1223, + 2232, 5682, 44919, 901, 82, 293, 370, 300, 1036, 644, 286, 362, 572, 1558, 577, + 300, 311, 516, 281, 50636], "temperature": 0.0, "avg_logprob": -0.1769057880748402, + "compression_ratio": 1.7030075187969924, "no_speech_prob": 0.0030927034094929695}, + {"id": 131, "seek": 79278, "start": 798.22, "end": 802.86, "text": " be achieved + yet but I clearly chat GPT is now like acting so it''s different from the", 
"tokens": + [50636, 312, 11042, 1939, 457, 286, 4448, 5081, 26039, 51, 307, 586, 411, 6577, + 370, 309, 311, 819, 490, 264, 50868], "temperature": 0.0, "avg_logprob": -0.1769057880748402, + "compression_ratio": 1.7030075187969924, "no_speech_prob": 0.0030927034094929695}, + {"id": 132, "seek": 79278, "start": 802.86, "end": 808.22, "text": " yeah yeah the + next word thing yeah I think what coming back to chat GPT like what um", "tokens": + [50868, 1338, 1338, 264, 958, 1349, 551, 1338, 286, 519, 437, 1348, 646, 281, 5081, + 26039, 51, 411, 437, 1105, 51136], "temperature": 0.0, "avg_logprob": -0.1769057880748402, + "compression_ratio": 1.7030075187969924, "no_speech_prob": 0.0030927034094929695}, + {"id": 133, "seek": 79278, "start": 808.9399999999999, "end": 813.9, "text": " impressed + me maybe the most is uh so I had I had this problem uh I was I was working on", + "tokens": [51172, 11679, 385, 1310, 264, 881, 307, 2232, 370, 286, 632, 286, 632, + 341, 1154, 2232, 286, 390, 286, 390, 1364, 322, 51420], "temperature": 0.0, "avg_logprob": + -0.1769057880748402, "compression_ratio": 1.7030075187969924, "no_speech_prob": + 0.0030927034094929695}, {"id": 134, "seek": 79278, "start": 813.9, "end": 820.38, + "text": " billion scale and then search algorithm with with the group of researchers + and and engineers like", "tokens": [51420, 5218, 4373, 293, 550, 3164, 9284, 365, + 365, 264, 1594, 295, 10309, 293, 293, 11955, 411, 51744], "temperature": 0.0, "avg_logprob": + -0.1769057880748402, "compression_ratio": 1.7030075187969924, "no_speech_prob": + 0.0030927034094929695}, {"id": 135, "seek": 82038, "start": 820.46, "end": 827.02, + "text": " almost a year ago so I invented this this algorithm I called it candy + like of course you know", "tokens": [50368, 1920, 257, 1064, 2057, 370, 286, 14479, + 341, 341, 9284, 286, 1219, 309, 11237, 411, 295, 1164, 291, 458, 50696], "temperature": + 0.0, "avg_logprob": -0.1444146736808445, "compression_ratio": 1.71875, 
"no_speech_prob": + 0.0003906735510099679}, {"id": 136, "seek": 82038, "start": 827.02, "end": 833.9, + "text": " not not meaning my surname but in any case with a k um it''s all open + source and GitHub I''ll make", "tokens": [50696, 406, 406, 3620, 452, 50152, 457, + 294, 604, 1389, 365, 257, 350, 1105, 309, 311, 439, 1269, 4009, 293, 23331, 286, + 603, 652, 51040], "temperature": 0.0, "avg_logprob": -0.1444146736808445, "compression_ratio": + 1.71875, "no_speech_prob": 0.0003906735510099679}, {"id": 137, "seek": 82038, "start": + 833.9, "end": 839.9, "text": " sure to link it and so the the problem was that it + it would work on 10 million vectors it would work", "tokens": [51040, 988, 281, + 2113, 309, 293, 370, 264, 264, 1154, 390, 300, 309, 309, 576, 589, 322, 1266, 2459, + 18875, 309, 576, 589, 51340], "temperature": 0.0, "avg_logprob": -0.1444146736808445, + "compression_ratio": 1.71875, "no_speech_prob": 0.0003906735510099679}, {"id": 138, + "seek": 82038, "start": 839.9, "end": 845.34, "text": " on 100 million vectors but + it would choke on one billion it would basically run out of memory", "tokens": [51340, + 322, 2319, 2459, 18875, 457, 309, 576, 34427, 322, 472, 5218, 309, 576, 1936, 1190, + 484, 295, 4675, 51612], "temperature": 0.0, "avg_logprob": -0.1444146736808445, + "compression_ratio": 1.71875, "no_speech_prob": 0.0003906735510099679}, {"id": 139, + "seek": 84534, "start": 846.14, "end": 851.34, "text": " uh and and I did it entirely + in python right so maybe I I should have chosen in retrospect some", "tokens": [50404, + 2232, 293, 293, 286, 630, 309, 7696, 294, 38797, 558, 370, 1310, 286, 286, 820, + 362, 8614, 294, 34997, 512, 50664], "temperature": 0.0, "avg_logprob": -0.11971082492750518, + "compression_ratio": 1.6208333333333333, "no_speech_prob": 0.003358519170433283}, + {"id": 140, "seek": 84534, "start": 851.34, "end": 856.94, "text": " other language + but in any case I wanted to make this work um I couldn''t I ran out of time and + 
I ran", "tokens": [50664, 661, 2856, 457, 294, 604, 1389, 286, 1415, 281, 652, 341, + 589, 1105, 286, 2809, 380, 286, 5872, 484, 295, 565, 293, 286, 5872, 50944], "temperature": + 0.0, "avg_logprob": -0.11971082492750518, "compression_ratio": 1.6208333333333333, + "no_speech_prob": 0.003358519170433283}, {"id": 141, "seek": 84534, "start": 856.94, + "end": 863.1800000000001, "text": " out of computer resource because it was given + to us by Microsoft um for a limited period of time", "tokens": [50944, 484, 295, + 3820, 7684, 570, 309, 390, 2212, 281, 505, 538, 8116, 1105, 337, 257, 5567, 2896, + 295, 565, 51256], "temperature": 0.0, "avg_logprob": -0.11971082492750518, "compression_ratio": + 1.6208333333333333, "no_speech_prob": 0.003358519170433283}, {"id": 142, "seek": + 84534, "start": 864.0600000000001, "end": 870.86, "text": " so what I did is that + I pasted that code into chat chat GPT and I said yeah first of all I tried", "tokens": + [51300, 370, 437, 286, 630, 307, 300, 286, 1791, 292, 300, 3089, 666, 5081, 5081, + 26039, 51, 293, 286, 848, 1338, 700, 295, 439, 286, 3031, 51640], "temperature": + 0.0, "avg_logprob": -0.11971082492750518, "compression_ratio": 1.6208333333333333, + "no_speech_prob": 0.003358519170433283}, {"id": 143, "seek": 87086, "start": 871.02, + "end": 875.66, "text": " to paste the whole thing but it said well it''s too long + so I had to focus on a specific part where I", "tokens": [50372, 281, 9163, 264, + 1379, 551, 457, 309, 848, 731, 309, 311, 886, 938, 370, 286, 632, 281, 1879, 322, + 257, 2685, 644, 689, 286, 50604], "temperature": 0.0, "avg_logprob": -0.10811215038447416, + "compression_ratio": 1.8021582733812949, "no_speech_prob": 0.004470229148864746}, + {"id": 144, "seek": 87086, "start": 875.66, "end": 881.98, "text": " think the the + problem you know kind of lurks and and it gave me the answer it said okay maybe + try", "tokens": [50604, 519, 264, 264, 1154, 291, 458, 733, 295, 35583, 1694, 293, + 293, 309, 2729, 385, 
264, 1867, 309, 848, 1392, 1310, 853, 50920], "temperature": + 0.0, "avg_logprob": -0.10811215038447416, "compression_ratio": 1.8021582733812949, + "no_speech_prob": 0.004470229148864746}, {"id": 145, "seek": 87086, "start": 881.98, + "end": 888.46, "text": " avoid using non-py arrays as much as you do try to pre-allocate + them try to reset them and actually", "tokens": [50920, 5042, 1228, 2107, 12, 8200, + 41011, 382, 709, 382, 291, 360, 853, 281, 659, 12, 336, 42869, 552, 853, 281, 14322, + 552, 293, 767, 51244], "temperature": 0.0, "avg_logprob": -0.10811215038447416, + "compression_ratio": 1.8021582733812949, "no_speech_prob": 0.004470229148864746}, + {"id": 146, "seek": 87086, "start": 888.46, "end": 893.02, "text": " I did that + I just didn''t paste that portion of the code which was doing this so the the system + didn''t", "tokens": [51244, 286, 630, 300, 286, 445, 994, 380, 9163, 300, 8044, + 295, 264, 3089, 597, 390, 884, 341, 370, 264, 264, 1185, 994, 380, 51472], "temperature": + 0.0, "avg_logprob": -0.10811215038447416, "compression_ratio": 1.8021582733812949, + "no_speech_prob": 0.004470229148864746}, {"id": 147, "seek": 87086, "start": 893.02, + "end": 899.1800000000001, "text": " know that but it was on the right on the right + track but then when I did it a year sorry a day later", "tokens": [51472, 458, 300, + 457, 309, 390, 322, 264, 558, 322, 264, 558, 2837, 457, 550, 562, 286, 630, 309, + 257, 1064, 2597, 257, 786, 1780, 51780], "temperature": 0.0, "avg_logprob": -0.10811215038447416, + "compression_ratio": 1.8021582733812949, "no_speech_prob": 0.004470229148864746}, + {"id": 148, "seek": 89918, "start": 899.7399999999999, "end": 905.7399999999999, + "text": " the answer changed the question was exactly same but the answer changed + and that kind of make me", "tokens": [50392, 264, 1867, 3105, 264, 1168, 390, 2293, + 912, 457, 264, 1867, 3105, 293, 300, 733, 295, 652, 385, 50692], "temperature": + 0.0, "avg_logprob": -0.1409247966294878, 
"compression_ratio": 1.7300884955752212, + "no_speech_prob": 0.004194737412035465}, {"id": 149, "seek": 89918, "start": 906.3, + "end": 913.26, "text": " really like uh what''s going on like is it learning as + it goes can you explain this part like have", "tokens": [50720, 534, 411, 2232, + 437, 311, 516, 322, 411, 307, 309, 2539, 382, 309, 1709, 393, 291, 2903, 341, 644, + 411, 362, 51068], "temperature": 0.0, "avg_logprob": -0.1409247966294878, "compression_ratio": + 1.7300884955752212, "no_speech_prob": 0.004194737412035465}, {"id": 150, "seek": + 89918, "start": 913.26, "end": 919.7399999999999, "text": " you seen this in his + behavior like was the casting generation of the yeah chat GPT sorry I was like", + "tokens": [51068, 291, 1612, 341, 294, 702, 5223, 411, 390, 264, 17301, 5125, 295, + 264, 1338, 5081, 26039, 51, 2597, 286, 390, 411, 51392], "temperature": 0.0, "avg_logprob": + -0.1409247966294878, "compression_ratio": 1.7300884955752212, "no_speech_prob": + 0.004194737412035465}, {"id": 151, "seek": 89918, "start": 919.7399999999999, "end": + 923.42, "text": " I was trying to follow along with the I think we''re going to + talk about like approximation error", "tokens": [51392, 286, 390, 1382, 281, 1524, + 2051, 365, 264, 286, 519, 321, 434, 516, 281, 751, 466, 411, 28023, 6713, 51576], + "temperature": 0.0, "avg_logprob": -0.1409247966294878, "compression_ratio": 1.7300884955752212, + "no_speech_prob": 0.004194737412035465}, {"id": 152, "seek": 92342, "start": 923.5, + "end": 929.9799999999999, "text": " with the AN and search as we scale it and I + know we''re coming back to the chat GPT but I''ll be uh", "tokens": [50368, 365, + 264, 5252, 293, 3164, 382, 321, 4373, 309, 293, 286, 458, 321, 434, 1348, 646, 281, + 264, 5081, 26039, 51, 457, 286, 603, 312, 2232, 50692], "temperature": 0.0, "avg_logprob": + -0.19311071868635651, "compression_ratio": 1.7445255474452555, "no_speech_prob": + 0.01498115062713623}, {"id": 153, "seek": 92342, "start": 
929.9799999999999, "end": + 936.38, "text": " yeah so it''s like uh it''s like a tree decoding where uh it it + has a probability density on the", "tokens": [50692, 1338, 370, 309, 311, 411, 2232, + 309, 311, 411, 257, 4230, 979, 8616, 689, 2232, 309, 309, 575, 257, 8482, 10305, + 322, 264, 51012], "temperature": 0.0, "avg_logprob": -0.19311071868635651, "compression_ratio": + 1.7445255474452555, "no_speech_prob": 0.01498115062713623}, {"id": 154, "seek": + 92342, "start": 936.38, "end": 940.62, "text": " length of vocabulary and you can + take several paths through that tree for what you''re going to", "tokens": [51012, + 4641, 295, 19864, 293, 291, 393, 747, 2940, 14518, 807, 300, 4230, 337, 437, 291, + 434, 516, 281, 51224], "temperature": 0.0, "avg_logprob": -0.19311071868635651, + "compression_ratio": 1.7445255474452555, "no_speech_prob": 0.01498115062713623}, + {"id": 155, "seek": 92342, "start": 940.62, "end": 946.86, "text": " output and + uh you often randomly sample through the through the tree if that makes sense like + um", "tokens": [51224, 5598, 293, 2232, 291, 2049, 16979, 6889, 807, 264, 807, 264, + 4230, 498, 300, 1669, 2020, 411, 1105, 51536], "temperature": 0.0, "avg_logprob": + -0.19311071868635651, "compression_ratio": 1.7445255474452555, "no_speech_prob": + 0.01498115062713623}, {"id": 156, "seek": 92342, "start": 946.86, "end": 951.5, + "text": " yeah yeah me does but I mean the answer was kind of like in some sense + these two answers were", "tokens": [51536, 1338, 1338, 385, 775, 457, 286, 914, + 264, 1867, 390, 733, 295, 411, 294, 512, 2020, 613, 732, 6338, 645, 51768], "temperature": + 0.0, "avg_logprob": -0.19311071868635651, "compression_ratio": 1.7445255474452555, + "no_speech_prob": 0.01498115062713623}, {"id": 157, "seek": 95150, "start": 951.5, + "end": 956.78, "text": " complementary to each other right and and maybe I could + go on and say hey what do you mean by", "tokens": [50364, 40705, 281, 1184, 661, + 558, 293, 293, 1310, 
286, 727, 352, 322, 293, 584, 4177, 437, 360, 291, 914, 538, + 50628], "temperature": 0.0, "avg_logprob": -0.12264800071716309, "compression_ratio": + 1.719298245614035, "no_speech_prob": 0.001135017373599112}, {"id": 158, "seek": + 95150, "start": 956.78, "end": 963.02, "text": " resetting can you because it didn''t + provide any uh code examples it would just say reset and I was like", "tokens": + [50628, 14322, 783, 393, 291, 570, 309, 994, 380, 2893, 604, 2232, 3089, 5110, 309, + 576, 445, 584, 14322, 293, 286, 390, 411, 50940], "temperature": 0.0, "avg_logprob": + -0.12264800071716309, "compression_ratio": 1.719298245614035, "no_speech_prob": + 0.001135017373599112}, {"id": 159, "seek": 95150, "start": 963.02, "end": 970.06, + "text": " what do you mean by reset I don''t have such a method like like like so + I I think that that was maybe", "tokens": [50940, 437, 360, 291, 914, 538, 14322, + 286, 500, 380, 362, 1270, 257, 3170, 411, 411, 411, 370, 286, 286, 519, 300, 300, + 390, 1310, 51292], "temperature": 0.0, "avg_logprob": -0.12264800071716309, "compression_ratio": + 1.719298245614035, "no_speech_prob": 0.001135017373599112}, {"id": 160, "seek": + 95150, "start": 970.7, "end": 977.5, "text": " impressive part of chat GPT and um + just to close off on that there was a recent discussion on", "tokens": [51324, 8992, + 644, 295, 5081, 26039, 51, 293, 1105, 445, 281, 1998, 766, 322, 300, 456, 390, 257, + 5162, 5017, 322, 51664], "temperature": 0.0, "avg_logprob": -0.12264800071716309, + "compression_ratio": 1.719298245614035, "no_speech_prob": 0.001135017373599112}, + {"id": 161, "seek": 97750, "start": 977.98, "end": 983.26, "text": " on relevancy + and matching text like where a lot of these search people see uh there was um", + "tokens": [50388, 322, 25916, 6717, 293, 14324, 2487, 411, 689, 257, 688, 295, 613, + 3164, 561, 536, 2232, 456, 390, 1105, 50652], "temperature": 0.0, "avg_logprob": + -0.14471053051692184, "compression_ratio": 1.7432432432432432, 
"no_speech_prob": + 0.0019175204215571284}, {"id": 162, "seek": 97750, "start": 983.26, "end": 992.3, + "text": " there was this argument against chat GPT that let''s say if you go um + you know use uh duck duck go", "tokens": [50652, 456, 390, 341, 6770, 1970, 5081, + 26039, 51, 300, 718, 311, 584, 498, 291, 352, 1105, 291, 458, 764, 2232, 12482, + 12482, 352, 51104], "temperature": 0.0, "avg_logprob": -0.14471053051692184, "compression_ratio": + 1.7432432432432432, "no_speech_prob": 0.0019175204215571284}, {"id": 163, "seek": + 97750, "start": 992.3, "end": 998.86, "text": " today you will see the links right + you can go and examine the links and you can actually verify the", "tokens": [51104, + 965, 291, 486, 536, 264, 6123, 558, 291, 393, 352, 293, 17496, 264, 6123, 293, 291, + 393, 767, 16888, 264, 51432], "temperature": 0.0, "avg_logprob": -0.14471053051692184, + "compression_ratio": 1.7432432432432432, "no_speech_prob": 0.0019175204215571284}, + {"id": 164, "seek": 97750, "start": 998.86, "end": 1004.22, "text": " information + to some extent maybe not to full extent but to some extent in chat GPT you can do + that", "tokens": [51432, 1589, 281, 512, 8396, 1310, 406, 281, 1577, 8396, 457, + 281, 512, 8396, 294, 5081, 26039, 51, 291, 393, 360, 300, 51700], "temperature": + 0.0, "avg_logprob": -0.14471053051692184, "compression_ratio": 1.7432432432432432, + "no_speech_prob": 0.0019175204215571284}, {"id": 165, "seek": 100422, "start": 1004.22, + "end": 1012.78, "text": " there is an answer that''s it so it''s it''s quite a jump + from being able to kind of seemingly check", "tokens": [50364, 456, 307, 364, 1867, + 300, 311, 309, 370, 309, 311, 309, 311, 1596, 257, 3012, 490, 885, 1075, 281, 733, + 295, 18709, 1520, 50792], "temperature": 0.0, "avg_logprob": -0.09570505402304909, + "compression_ratio": 1.7568807339449541, "no_speech_prob": 0.00084763701306656}, + {"id": 166, "seek": 100422, "start": 1012.78, "end": 1018.22, "text": " the is it + trustworthy to 
well you have no way to do that what do you think of this aspect", + "tokens": [50792, 264, 307, 309, 39714, 281, 731, 291, 362, 572, 636, 281, 360, + 300, 437, 360, 291, 519, 295, 341, 4171, 51064], "temperature": 0.0, "avg_logprob": + -0.09570505402304909, "compression_ratio": 1.7568807339449541, "no_speech_prob": + 0.00084763701306656}, {"id": 167, "seek": 100422, "start": 1019.5, "end": 1024.78, + "text": " yeah that''s brilliant I it makes me think about like well very broadly + it makes me think about", "tokens": [51128, 1338, 300, 311, 10248, 286, 309, 1669, + 385, 519, 466, 411, 731, 588, 19511, 309, 1669, 385, 519, 466, 51392], "temperature": + 0.0, "avg_logprob": -0.09570505402304909, "compression_ratio": 1.7568807339449541, + "no_speech_prob": 0.00084763701306656}, {"id": 168, "seek": 100422, "start": 1024.78, + "end": 1030.38, "text": " artificial general intelligence compared to super intelligence + sort so to say and like I think about", "tokens": [51392, 11677, 2674, 7599, 5347, + 281, 1687, 7599, 1333, 370, 281, 584, 293, 411, 286, 519, 466, 51672], "temperature": + 0.0, "avg_logprob": -0.09570505402304909, "compression_ratio": 1.7568807339449541, + "no_speech_prob": 0.00084763701306656}, {"id": 169, "seek": 103038, "start": 1030.46, + "end": 1035.5800000000002, "text": " the artificial general intelligence and like + because open AI they''ve published web GPT and", "tokens": [50368, 264, 11677, 2674, + 7599, 293, 411, 570, 1269, 7318, 436, 600, 6572, 3670, 26039, 51, 293, 50624], "temperature": + 0.0, "avg_logprob": -0.16619372817705264, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.0011353311128914356}, {"id": 170, "seek": 103038, "start": 1035.5800000000002, + "end": 1039.5800000000002, "text": " instruct GPT so instruct GPT is like the reinforcement + learning from human feedback part", "tokens": [50624, 7232, 26039, 51, 370, 7232, + 26039, 51, 307, 411, 264, 29280, 2539, 490, 1952, 5824, 644, 50824], "temperature": + 0.0, 
"avg_logprob": -0.16619372817705264, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.0011353311128914356}, {"id": 171, "seek": 103038, "start": 1039.5800000000002, + "end": 1044.3000000000002, "text": " and then web GPT is like the like the whole + idea that we''re super excited about at wevea where", "tokens": [50824, 293, 550, + 3670, 26039, 51, 307, 411, 264, 411, 264, 1379, 1558, 300, 321, 434, 1687, 2919, + 466, 412, 321, 303, 64, 689, 51060], "temperature": 0.0, "avg_logprob": -0.16619372817705264, + "compression_ratio": 1.709090909090909, "no_speech_prob": 0.0011353311128914356}, + {"id": 172, "seek": 103038, "start": 1044.3000000000002, "end": 1050.3000000000002, + "text": " you search for context to append to the input and then like if you say + like please uh ground your", "tokens": [51060, 291, 3164, 337, 4319, 281, 34116, + 281, 264, 4846, 293, 550, 411, 498, 291, 584, 411, 1767, 2232, 2727, 428, 51360], + "temperature": 0.0, "avg_logprob": -0.16619372817705264, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.0011353311128914356}, {"id": 173, "seek": 103038, "start": 1050.3000000000002, + "end": 1055.42, "text": " answer in this information and then it''s a paragraph + about like how the BM25 algorithm works like", "tokens": [51360, 1867, 294, 341, + 1589, 293, 550, 309, 311, 257, 18865, 466, 411, 577, 264, 15901, 6074, 9284, 1985, + 411, 51616], "temperature": 0.0, "avg_logprob": -0.16619372817705264, "compression_ratio": + 1.709090909090909, "no_speech_prob": 0.0011353311128914356}, {"id": 174, "seek": + 105542, "start": 1055.42, "end": 1060.22, "text": " I use this personally that way + to hybrid search and understanding it and so like if you give it", "tokens": [50364, + 286, 764, 341, 5665, 300, 636, 281, 13051, 3164, 293, 3701, 309, 293, 370, 411, + 498, 291, 976, 309, 50604], "temperature": 0.0, "avg_logprob": -0.1789126638638771, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.000558963161893189}, + 
{"id": 175, "seek": 105542, "start": 1060.22, "end": 1066.14, "text": " the context + it''s so much better and so I think I suspect that chat GBC under the hood does + something", "tokens": [50604, 264, 4319, 309, 311, 370, 709, 1101, 293, 370, 286, + 519, 286, 9091, 300, 5081, 460, 7869, 833, 264, 13376, 775, 746, 50900], "temperature": + 0.0, "avg_logprob": -0.1789126638638771, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.000558963161893189}, {"id": 176, "seek": 105542, "start": 1066.14, + "end": 1073.5800000000002, "text": " like a Google or a Bing API search and so it''s + like general old but um yeah this idea like so", "tokens": [50900, 411, 257, 3329, + 420, 257, 30755, 9362, 3164, 293, 370, 309, 311, 411, 2674, 1331, 457, 1105, 1338, + 341, 1558, 411, 370, 51272], "temperature": 0.0, "avg_logprob": -0.1789126638638771, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.000558963161893189}, + {"id": 177, "seek": 105542, "start": 1073.5800000000002, "end": 1077.9, "text": + " so so then this idea of super intelligence it uh because I''ve been like can I + use chat GPT to help", "tokens": [51272, 370, 370, 550, 341, 1558, 295, 1687, 7599, + 309, 2232, 570, 286, 600, 668, 411, 393, 286, 764, 5081, 26039, 51, 281, 854, 51488], + "temperature": 0.0, "avg_logprob": -0.1789126638638771, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.000558963161893189}, {"id": 178, "seek": 105542, "start": 1077.9, + "end": 1083.02, "text": " me write like you know blog post survey papers things + like that are relevant for trying to be a master", "tokens": [51488, 385, 2464, + 411, 291, 458, 6968, 2183, 8984, 10577, 721, 411, 300, 366, 7340, 337, 1382, 281, + 312, 257, 4505, 51744], "temperature": 0.0, "avg_logprob": -0.1789126638638771, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.000558963161893189}, + {"id": 179, "seek": 108302, "start": 1083.02, "end": 1088.94, "text": " of search + and what I need from it is more 
so like citation recommendation right like I needed + to", "tokens": [50364, 295, 3164, 293, 437, 286, 643, 490, 309, 307, 544, 370, 411, + 45590, 11879, 558, 411, 286, 2978, 281, 50660], "temperature": 0.0, "avg_logprob": + -0.1324155807495117, "compression_ratio": 1.6986899563318778, "no_speech_prob": + 0.002491435268893838}, {"id": 180, "seek": 108302, "start": 1089.98, "end": 1096.46, + "text": " go into like uh Leo Boystov''s publications and parse it out for me and + help me understand what he''s", "tokens": [50712, 352, 666, 411, 2232, 19344, 9486, + 372, 5179, 311, 25618, 293, 48377, 309, 484, 337, 385, 293, 854, 385, 1223, 437, + 415, 311, 51036], "temperature": 0.0, "avg_logprob": -0.1324155807495117, "compression_ratio": + 1.6986899563318778, "no_speech_prob": 0.002491435268893838}, {"id": 181, "seek": + 108302, "start": 1096.46, "end": 1104.46, "text": " done so it''s like the specific + information and then yeah the real I mean u.com also has a really", "tokens": [51036, + 1096, 370, 309, 311, 411, 264, 2685, 1589, 293, 550, 1338, 264, 957, 286, 914, 344, + 13, 1112, 611, 575, 257, 534, 51436], "temperature": 0.0, "avg_logprob": -0.1324155807495117, + "compression_ratio": 1.6986899563318778, "no_speech_prob": 0.002491435268893838}, + {"id": 182, "seek": 108302, "start": 1104.46, "end": 1109.98, "text": " brilliant + thing where it''s uh search engine on this panel and then the chat GBC on this side + so", "tokens": [51436, 10248, 551, 689, 309, 311, 2232, 3164, 2848, 322, 341, 4831, + 293, 550, 264, 5081, 460, 7869, 322, 341, 1252, 370, 51712], "temperature": 0.0, + "avg_logprob": -0.1324155807495117, "compression_ratio": 1.6986899563318778, "no_speech_prob": + 0.002491435268893838}, {"id": 183, "seek": 110998, "start": 1109.98, "end": 1115.74, + "text": " it''s like a user interface problem I think yeah yeah but but I mean maybe + even yeah I totally", "tokens": [50364, 309, 311, 411, 257, 4195, 9226, 1154, 286, + 519, 1338, 1338, 457, 457, 286, 914, 
1310, 754, 1338, 286, 3879, 50652], "temperature": + 0.0, "avg_logprob": -0.10471352378090659, "compression_ratio": 1.8309859154929577, + "no_speech_prob": 0.0036140696611255407}, {"id": 184, "seek": 110998, "start": 1115.74, + "end": 1122.14, "text": " agree with you that user interface definitely creates + the bias uh how we like how you use traffic", "tokens": [50652, 3986, 365, 291, + 300, 4195, 9226, 2138, 7829, 264, 12577, 2232, 577, 321, 411, 577, 291, 764, 6419, + 50972], "temperature": 0.0, "avg_logprob": -0.10471352378090659, "compression_ratio": + 1.8309859154929577, "no_speech_prob": 0.0036140696611255407}, {"id": 185, "seek": + 110998, "start": 1122.14, "end": 1128.14, "text": " lights today they go like red + you know yellow and green they don''t go upside down right and like", "tokens": + [50972, 5811, 965, 436, 352, 411, 2182, 291, 458, 5566, 293, 3092, 436, 500, 380, + 352, 14119, 760, 558, 293, 411, 51272], "temperature": 0.0, "avg_logprob": -0.10471352378090659, + "compression_ratio": 1.8309859154929577, "no_speech_prob": 0.0036140696611255407}, + {"id": 186, "seek": 110998, "start": 1128.14, "end": 1133.18, "text": " if you see + an upside down you will you will think well this is a wrong uh traffic light uh + I''d rather", "tokens": [51272, 498, 291, 536, 364, 14119, 760, 291, 486, 291, 486, + 519, 731, 341, 307, 257, 2085, 2232, 6419, 1442, 2232, 286, 1116, 2831, 51524], + "temperature": 0.0, "avg_logprob": -0.10471352378090659, "compression_ratio": 1.8309859154929577, + "no_speech_prob": 0.0036140696611255407}, {"id": 187, "seek": 113318, "start": 1133.26, + "end": 1140.46, "text": " not cross here you know but like it''s kind of like similar + here like with the search engines we are", "tokens": [50368, 406, 3278, 510, 291, + 458, 457, 411, 309, 311, 733, 295, 411, 2531, 510, 411, 365, 264, 3164, 12982, 321, + 366, 50728], "temperature": 0.0, "avg_logprob": -0.14627052635274906, "compression_ratio": + 1.7035398230088497, "no_speech_prob": 
0.003619111143052578}, {"id": 188, "seek": + 113318, "start": 1140.46, "end": 1146.78, "text": " used to seeing you know URLs + and and being able to click there but of course if you take Google or", "tokens": + [50728, 1143, 281, 2577, 291, 458, 43267, 293, 293, 885, 1075, 281, 2052, 456, 457, + 295, 1164, 498, 291, 747, 3329, 420, 51044], "temperature": 0.0, "avg_logprob": + -0.14627052635274906, "compression_ratio": 1.7035398230088497, "no_speech_prob": + 0.003619111143052578}, {"id": 189, "seek": 113318, "start": 1146.78, "end": 1151.66, + "text": " I guess being does that too they also pre-generate this answers answer + boxes right so you can", "tokens": [51044, 286, 2041, 885, 775, 300, 886, 436, 611, + 659, 12, 21848, 473, 341, 6338, 1867, 9002, 558, 370, 291, 393, 51288], "temperature": + 0.0, "avg_logprob": -0.14627052635274906, "compression_ratio": 1.7035398230088497, + "no_speech_prob": 0.003619111143052578}, {"id": 190, "seek": 113318, "start": 1151.66, + "end": 1157.18, "text": " answer you can click there but I don''t think you have + a URL to verify you know the source of", "tokens": [51288, 1867, 291, 393, 2052, + 456, 457, 286, 500, 380, 519, 291, 362, 257, 12905, 281, 16888, 291, 458, 264, 4009, + 295, 51564], "temperature": 0.0, "avg_logprob": -0.14627052635274906, "compression_ratio": + 1.7035398230088497, "no_speech_prob": 0.003619111143052578}, {"id": 191, "seek": + 115718, "start": 1157.26, "end": 1163.3400000000001, "text": " this information + if I''m not wrong yeah yeah so they already playing with incorporating this", "tokens": + [50368, 341, 1589, 498, 286, 478, 406, 2085, 1338, 1338, 370, 436, 1217, 2433, 365, + 33613, 341, 50672], "temperature": 0.0, "avg_logprob": -0.10445472429383476, "compression_ratio": + 1.758364312267658, "no_speech_prob": 0.004276286344975233}, {"id": 192, "seek": + 115718, "start": 1163.3400000000001, "end": 1168.8600000000001, "text": " knowledge + from a language model right and they they they look at you and 
of course they also + want you", "tokens": [50672, 3601, 490, 257, 2856, 2316, 558, 293, 436, 436, 436, + 574, 412, 291, 293, 295, 1164, 436, 611, 528, 291, 50948], "temperature": 0.0, "avg_logprob": + -0.10445472429383476, "compression_ratio": 1.758364312267658, "no_speech_prob": + 0.004276286344975233}, {"id": 193, "seek": 115718, "start": 1168.8600000000001, + "end": 1173.1000000000001, "text": " to spend more time on their page which is probably + not good but we''ll not discuss that", "tokens": [50948, 281, 3496, 544, 565, 322, + 641, 3028, 597, 307, 1391, 406, 665, 457, 321, 603, 406, 2248, 300, 51160], "temperature": + 0.0, "avg_logprob": -0.10445472429383476, "compression_ratio": 1.758364312267658, + "no_speech_prob": 0.004276286344975233}, {"id": 194, "seek": 115718, "start": 1174.78, + "end": 1179.74, "text": " so they don''t share the traffic further but but the thing + is you know they still play with the", "tokens": [51244, 370, 436, 500, 380, 2073, + 264, 6419, 3052, 457, 457, 264, 551, 307, 291, 458, 436, 920, 862, 365, 264, 51492], + "temperature": 0.0, "avg_logprob": -0.10445472429383476, "compression_ratio": 1.758364312267658, + "no_speech_prob": 0.004276286344975233}, {"id": 195, "seek": 115718, "start": 1179.74, + "end": 1186.38, "text": " idea okay what if we try to answer not just with the URL + and summary but actually with the actual", "tokens": [51492, 1558, 1392, 437, 498, + 321, 853, 281, 1867, 406, 445, 365, 264, 12905, 293, 12691, 457, 767, 365, 264, + 3539, 51824], "temperature": 0.0, "avg_logprob": -0.10445472429383476, "compression_ratio": + 1.758364312267658, "no_speech_prob": 0.004276286344975233}, {"id": 196, "seek": + 118638, "start": 1186.38, "end": 1193.5, "text": " thing right with the actual answer + oh so that comes into like the extractive versus abstractive", "tokens": [50364, + 551, 558, 365, 264, 3539, 1867, 1954, 370, 300, 1487, 666, 411, 264, 8947, 488, + 5717, 12649, 488, 50720], "temperature": 0.0, "avg_logprob": 
-0.12067663322374658, + "compression_ratio": 1.821011673151751, "no_speech_prob": 0.0007553909672424197}, + {"id": 197, "seek": 118638, "start": 1193.5, "end": 1198.46, "text": " and like + whether you want the question answering models that classify the answers in the + context", "tokens": [50720, 293, 411, 1968, 291, 528, 264, 1168, 13430, 5245, 300, + 33872, 264, 6338, 294, 264, 4319, 50968], "temperature": 0.0, "avg_logprob": -0.12067663322374658, + "compression_ratio": 1.821011673151751, "no_speech_prob": 0.0007553909672424197}, + {"id": 198, "seek": 118638, "start": 1199.42, "end": 1203.8200000000002, "text": + " yeah and yeah I think that still has a place for sure I mean it''s super lightweight + as I mentioned", "tokens": [51016, 1338, 293, 1338, 286, 519, 300, 920, 575, 257, + 1081, 337, 988, 286, 914, 309, 311, 1687, 22052, 382, 286, 2835, 51236], "temperature": + 0.0, "avg_logprob": -0.12067663322374658, "compression_ratio": 1.821011673151751, + "no_speech_prob": 0.0007553909672424197}, {"id": 199, "seek": 118638, "start": 1203.8200000000002, + "end": 1209.3400000000001, "text": " Neural Magic they just did a sparse question + answering model that can run on CPU super fast and", "tokens": [51236, 1734, 1807, + 16154, 436, 445, 630, 257, 637, 11668, 1168, 13430, 2316, 300, 393, 1190, 322, 13199, + 1687, 2370, 293, 51512], "temperature": 0.0, "avg_logprob": -0.12067663322374658, + "compression_ratio": 1.821011673151751, "no_speech_prob": 0.0007553909672424197}, + {"id": 200, "seek": 118638, "start": 1209.8200000000002, "end": 1214.7, "text": + " yeah I think that approach is also just gonna be more cost effective for a while", + "tokens": [51536, 1338, 286, 519, 300, 3109, 307, 611, 445, 799, 312, 544, 2063, + 4942, 337, 257, 1339, 51780], "temperature": 0.0, "avg_logprob": -0.12067663322374658, + "compression_ratio": 1.821011673151751, "no_speech_prob": 0.0007553909672424197}, + {"id": 201, "seek": 121470, "start": 1215.02, "end": 1222.3, "text": " yeah 
exactly + but you mentioned BM25 and I''m curious like I''ve been trying to approach this + hybrid", "tokens": [50380, 1338, 2293, 457, 291, 2835, 15901, 6074, 293, 286, 478, + 6369, 411, 286, 600, 668, 1382, 281, 3109, 341, 13051, 50744], "temperature": 0.0, + "avg_logprob": -0.14191165500217015, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.003626579651609063}, {"id": 202, "seek": 121470, "start": 1222.3, "end": 1228.38, + "text": " search topic but I think you went ahead all right so and I was just wondering + like what''s your", "tokens": [50744, 3164, 4829, 457, 286, 519, 291, 1437, 2286, + 439, 558, 370, 293, 286, 390, 445, 6359, 411, 437, 311, 428, 51048], "temperature": + 0.0, "avg_logprob": -0.14191165500217015, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.003626579651609063}, {"id": 203, "seek": 121470, "start": 1228.38, + "end": 1235.98, "text": " take on this topic like can you a little like intro it + to our listeners but also why do you think", "tokens": [51048, 747, 322, 341, 4829, + 411, 393, 291, 257, 707, 411, 12897, 309, 281, 527, 23274, 457, 611, 983, 360, 291, + 519, 51428], "temperature": 0.0, "avg_logprob": -0.14191165500217015, "compression_ratio": + 1.6724890829694323, "no_speech_prob": 0.003626579651609063}, {"id": 204, "seek": + 121470, "start": 1235.98, "end": 1241.82, "text": " it''s a good idea to to build + like a hybrid search you know combining keyboard retrieval with", "tokens": [51428, + 309, 311, 257, 665, 1558, 281, 281, 1322, 411, 257, 13051, 3164, 291, 458, 21928, + 10186, 19817, 3337, 365, 51720], "temperature": 0.0, "avg_logprob": -0.14191165500217015, + "compression_ratio": 1.6724890829694323, "no_speech_prob": 0.003626579651609063}, + {"id": 205, "seek": 124182, "start": 1241.82, "end": 1248.7, "text": " it''s with + a you know dense retrieval yeah awesome I started by saying this has just been like + the", "tokens": [50364, 309, 311, 365, 257, 291, 458, 18011, 19817, 3337, 1338, + 3476, 
286, 1409, 538, 1566, 341, 575, 445, 668, 411, 264, 50708], "temperature": + 0.0, "avg_logprob": -0.1660457926058988, "compression_ratio": 1.7062937062937062, + "no_speech_prob": 0.0014643922913819551}, {"id": 206, "seek": 124182, "start": 1248.7, + "end": 1253.34, "text": " most satisfying project I''ve worked on since I''ve joined + Wevegate and just being a part of this team", "tokens": [50708, 881, 18348, 1716, + 286, 600, 2732, 322, 1670, 286, 600, 6869, 492, 303, 22514, 293, 445, 885, 257, + 644, 295, 341, 1469, 50940], "temperature": 0.0, "avg_logprob": -0.1660457926058988, + "compression_ratio": 1.7062937062937062, "no_speech_prob": 0.0014643922913819551}, + {"id": 207, "seek": 124182, "start": 1253.34, "end": 1258.1399999999999, "text": + " and it''s been you know like a big team working on hybrid search and it''s just + been an incredible", "tokens": [50940, 293, 309, 311, 668, 291, 458, 411, 257, 955, + 1469, 1364, 322, 13051, 3164, 293, 309, 311, 445, 668, 364, 4651, 51180], "temperature": + 0.0, "avg_logprob": -0.1660457926058988, "compression_ratio": 1.7062937062937062, + "no_speech_prob": 0.0014643922913819551}, {"id": 208, "seek": 124182, "start": 1258.1399999999999, + "end": 1264.06, "text": " experience so I guess starting yeah the motivation is + BM25 has this it builds on term frequency", "tokens": [51180, 1752, 370, 286, 2041, + 2891, 1338, 264, 12335, 307, 15901, 6074, 575, 341, 309, 15182, 322, 1433, 7893, + 51476], "temperature": 0.0, "avg_logprob": -0.1660457926058988, "compression_ratio": + 1.7062937062937062, "no_speech_prob": 0.0014643922913819551}, {"id": 209, "seek": + 124182, "start": 1264.06, "end": 1268.7, "text": " inverse document frequency by + adding like this binary independence model and the IDF calculation", "tokens": [51476, + 17340, 4166, 7893, 538, 5127, 411, 341, 17434, 14640, 2316, 293, 264, 7348, 37, + 17108, 51708], "temperature": 0.0, "avg_logprob": -0.1660457926058988, "compression_ratio": + 1.7062937062937062, 
"no_speech_prob": 0.0014643922913819551}, {"id": 210, "seek": + 126870, "start": 1268.7, "end": 1272.78, "text": " and then you also normalize it + for the length of the document that''s just like these subtle", "tokens": [50364, + 293, 550, 291, 611, 2710, 1125, 309, 337, 264, 4641, 295, 264, 4166, 300, 311, 445, + 411, 613, 13743, 50568], "temperature": 0.0, "avg_logprob": -0.12430707478927354, + "compression_ratio": 1.7933579335793357, "no_speech_prob": 0.0008278696914203465}, + {"id": 211, "seek": 126870, "start": 1272.78, "end": 1277.26, "text": " differences + that make it different than TF IDF but you could also use TF IDF in hybrid search + if", "tokens": [50568, 7300, 300, 652, 309, 819, 813, 40964, 7348, 37, 457, 291, + 727, 611, 764, 40964, 7348, 37, 294, 13051, 3164, 498, 50792], "temperature": 0.0, + "avg_logprob": -0.12430707478927354, "compression_ratio": 1.7933579335793357, "no_speech_prob": + 0.0008278696914203465}, {"id": 212, "seek": 126870, "start": 1277.26, "end": 1282.06, + "text": " that''s what you were after and so then you also have the vector search + and then you have this rank", "tokens": [50792, 300, 311, 437, 291, 645, 934, 293, + 370, 550, 291, 611, 362, 264, 8062, 3164, 293, 550, 291, 362, 341, 6181, 51032], + "temperature": 0.0, "avg_logprob": -0.12430707478927354, "compression_ratio": 1.7933579335793357, + "no_speech_prob": 0.0008278696914203465}, {"id": 213, "seek": 126870, "start": 1282.06, + "end": 1287.02, "text": " fusion so so we look we found this paper where they have + seven different strategies for rank fusion", "tokens": [51032, 23100, 370, 370, + 321, 574, 321, 1352, 341, 3035, 689, 436, 362, 3407, 819, 9029, 337, 6181, 23100, + 51280], "temperature": 0.0, "avg_logprob": -0.12430707478927354, "compression_ratio": + 1.7933579335793357, "no_speech_prob": 0.0008278696914203465}, {"id": 214, "seek": + 126870, "start": 1287.02, "end": 1294.14, "text": " it''s like rrf board uh I don''t + know come some but in the end we 
just went with rrf reciprocal rank", "tokens": + [51280, 309, 311, 411, 367, 81, 69, 3150, 2232, 286, 500, 380, 458, 808, 512, 457, + 294, 264, 917, 321, 445, 1437, 365, 367, 81, 69, 46948, 6181, 51636], "temperature": + 0.0, "avg_logprob": -0.12430707478927354, "compression_ratio": 1.7933579335793357, + "no_speech_prob": 0.0008278696914203465}, {"id": 215, "seek": 129414, "start": 1294.14, + "end": 1299.0200000000002, "text": " fusion which is just erica''s recently published + a blog post that shows the equation and just", "tokens": [50364, 23100, 597, 307, + 445, 1189, 2262, 311, 3938, 6572, 257, 6968, 2183, 300, 3110, 264, 5367, 293, 445, + 50608], "temperature": 0.0, "avg_logprob": -0.09832319223655844, "compression_ratio": + 1.7168458781362008, "no_speech_prob": 0.008709307760000229}, {"id": 216, "seek": + 129414, "start": 1299.0200000000002, "end": 1303.74, "text": " tells some of our + thinking around it but it''s where you just combine the ranks compared to say", + "tokens": [50608, 5112, 512, 295, 527, 1953, 926, 309, 457, 309, 311, 689, 291, + 445, 10432, 264, 21406, 5347, 281, 584, 50844], "temperature": 0.0, "avg_logprob": + -0.09832319223655844, "compression_ratio": 1.7168458781362008, "no_speech_prob": + 0.008709307760000229}, {"id": 217, "seek": 129414, "start": 1303.74, "end": 1308.14, + "text": " combining the scores because you know BM25 has a score particularly and + vector search has like a", "tokens": [50844, 21928, 264, 13444, 570, 291, 458, 15901, + 6074, 575, 257, 6175, 4098, 293, 8062, 3164, 575, 411, 257, 51064], "temperature": + 0.0, "avg_logprob": -0.09832319223655844, "compression_ratio": 1.7168458781362008, + "no_speech_prob": 0.008709307760000229}, {"id": 218, "seek": 129414, "start": 1308.14, + "end": 1313.5800000000002, "text": " distance at return so you might look at some + way of like linearly or non-linearly combining those", "tokens": [51064, 4560, 412, + 2736, 370, 291, 1062, 574, 412, 512, 636, 295, 411, 43586, 420, 2107, 
12, 28263, + 356, 21928, 729, 51336], "temperature": 0.0, "avg_logprob": -0.09832319223655844, + "compression_ratio": 1.7168458781362008, "no_speech_prob": 0.008709307760000229}, + {"id": 219, "seek": 129414, "start": 1313.5800000000002, "end": 1319.5800000000002, + "text": " scores and I''ve done some experiments with with kind of my thinking around + it was like okay what", "tokens": [51336, 13444, 293, 286, 600, 1096, 512, 12050, + 365, 365, 733, 295, 452, 1953, 926, 309, 390, 411, 1392, 437, 51636], "temperature": + 0.0, "avg_logprob": -0.09832319223655844, "compression_ratio": 1.7168458781362008, + "no_speech_prob": 0.008709307760000229}, {"id": 220, "seek": 131958, "start": 1319.58, + "end": 1325.34, "text": " would be like an optimal alpha per query would that be + you know maybe like a conditional model like", "tokens": [50364, 576, 312, 411, + 364, 16252, 8961, 680, 14581, 576, 300, 312, 291, 458, 1310, 411, 257, 27708, 2316, + 411, 50652], "temperature": 0.0, "avg_logprob": -0.09418844772597491, "compression_ratio": + 1.7711267605633803, "no_speech_prob": 0.0009331719484180212}, {"id": 221, "seek": + 131958, "start": 1325.34, "end": 1330.62, "text": " so I tried this with the few + shot learning of gbt3 where you you run a few examples of the optimal alpha", "tokens": + [50652, 370, 286, 3031, 341, 365, 264, 1326, 3347, 2539, 295, 290, 4517, 18, 689, + 291, 291, 1190, 257, 1326, 5110, 295, 264, 16252, 8961, 50916], "temperature": 0.0, + "avg_logprob": -0.09418844772597491, "compression_ratio": 1.7711267605633803, "no_speech_prob": + 0.0009331719484180212}, {"id": 222, "seek": 131958, "start": 1330.62, "end": 1335.58, + "text": " and then you try to see uh you know how would you like to wait BM25 and + dense vector search given", "tokens": [50916, 293, 550, 291, 853, 281, 536, 2232, + 291, 458, 577, 576, 291, 411, 281, 1699, 15901, 6074, 293, 18011, 8062, 3164, 2212, + 51164], "temperature": 0.0, "avg_logprob": -0.09418844772597491, "compression_ratio": + 
1.7711267605633803, "no_speech_prob": 0.0009331719484180212}, {"id": 223, "seek": + 131958, "start": 1335.58, "end": 1340.3799999999999, "text": " this query and see + if that is productive but I found and this is a very interesting thing because I", + "tokens": [51164, 341, 14581, 293, 536, 498, 300, 307, 13304, 457, 286, 1352, 293, + 341, 307, 257, 588, 1880, 551, 570, 286, 51404], "temperature": 0.0, "avg_logprob": + -0.09418844772597491, "compression_ratio": 1.7711267605633803, "no_speech_prob": + 0.0009331719484180212}, {"id": 224, "seek": 131958, "start": 1340.3799999999999, + "end": 1345.8999999999999, "text": " think people have this idea that BM25 is like + very interpretable but in my experience it hasn''t been", "tokens": [51404, 519, + 561, 362, 341, 1558, 300, 15901, 6074, 307, 411, 588, 7302, 712, 457, 294, 452, + 1752, 309, 6132, 380, 668, 51680], "temperature": 0.0, "avg_logprob": -0.09418844772597491, + "compression_ratio": 1.7711267605633803, "no_speech_prob": 0.0009331719484180212}, + {"id": 225, "seek": 134590, "start": 1345.9, "end": 1351.3400000000001, "text": + " that I when you do when you''re doing longish queries in long documents and maybe + we can talk about", "tokens": [50364, 300, 286, 562, 291, 360, 562, 291, 434, 884, + 938, 742, 24109, 294, 938, 8512, 293, 1310, 321, 393, 751, 466, 50636], "temperature": + 0.0, "avg_logprob": -0.12145776748657226, "compression_ratio": 1.7545126353790614, + "no_speech_prob": 0.00052179693011567}, {"id": 226, "seek": 134590, "start": 1351.3400000000001, + "end": 1358.7, "text": " long queries or short queries but I find that trying to + decode why it why BM25 was better than dense", "tokens": [50636, 938, 24109, 420, + 2099, 24109, 457, 286, 915, 300, 1382, 281, 979, 1429, 983, 309, 983, 15901, 6074, + 390, 1101, 813, 18011, 51004], "temperature": 0.0, "avg_logprob": -0.12145776748657226, + "compression_ratio": 1.7545126353790614, "no_speech_prob": 0.00052179693011567}, + {"id": 227, "seek": 134590, 
"start": 1358.7, "end": 1363.3400000000001, "text": + " search for some particular query I find that to be impossible and maybe someone + will prove", "tokens": [51004, 3164, 337, 512, 1729, 14581, 286, 915, 300, 281, + 312, 6243, 293, 1310, 1580, 486, 7081, 51236], "temperature": 0.0, "avg_logprob": + -0.12145776748657226, "compression_ratio": 1.7545126353790614, "no_speech_prob": + 0.00052179693011567}, {"id": 228, "seek": 134590, "start": 1363.3400000000001, "end": + 1368.46, "text": " it wrong and I''ll look forward to seeing that honestly but like + there''s this example that we have", "tokens": [51236, 309, 2085, 293, 286, 603, + 574, 2128, 281, 2577, 300, 6095, 457, 411, 456, 311, 341, 1365, 300, 321, 362, 51492], + "temperature": 0.0, "avg_logprob": -0.12145776748657226, "compression_ratio": 1.7545126353790614, + "no_speech_prob": 0.00052179693011567}, {"id": 229, "seek": 134590, "start": 1368.46, + "end": 1372.5400000000002, "text": " as you know erica was developing the weviate + error demonstration of hybrid search where the query", "tokens": [51492, 382, 291, + 458, 1189, 2262, 390, 6416, 264, 321, 4917, 473, 6713, 16520, 295, 13051, 3164, + 689, 264, 14581, 51696], "temperature": 0.0, "avg_logprob": -0.12145776748657226, + "compression_ratio": 1.7545126353790614, "no_speech_prob": 0.00052179693011567}, + {"id": 230, "seek": 137254, "start": 1372.94, "end": 1377.58, "text": " how to catch + an elaskin Pollock and the idea being that the dense vector search contributes", + "tokens": [50384, 577, 281, 3745, 364, 806, 3863, 259, 31304, 1560, 293, 264, 1558, + 885, 300, 264, 18011, 8062, 3164, 32035, 50616], "temperature": 0.0, "avg_logprob": + -0.17256110055106028, "compression_ratio": 1.7202797202797202, "no_speech_prob": + 0.003874595509842038}, {"id": 231, "seek": 137254, "start": 1377.58, "end": 1383.26, + "text": " the disambiguation of catch that it prefers to fishing and that BM25 is + specific to elaskin Pollock", "tokens": [50616, 264, 717, 2173, 
328, 16073, 295, + 3745, 300, 309, 44334, 281, 10180, 293, 300, 15901, 6074, 307, 2685, 281, 806, 3863, + 259, 31304, 1560, 50900], "temperature": 0.0, "avg_logprob": -0.17256110055106028, + "compression_ratio": 1.7202797202797202, "no_speech_prob": 0.003874595509842038}, + {"id": 232, "seek": 137254, "start": 1383.98, "end": 1388.3799999999999, "text": + " but I haven''t been able to just like inspect that kind of behavior as I look + through the beer benchmarks", "tokens": [50936, 457, 286, 2378, 380, 668, 1075, + 281, 445, 411, 15018, 300, 733, 295, 5223, 382, 286, 574, 807, 264, 8795, 43751, + 51156], "temperature": 0.0, "avg_logprob": -0.17256110055106028, "compression_ratio": + 1.7202797202797202, "no_speech_prob": 0.003874595509842038}, {"id": 233, "seek": + 137254, "start": 1388.3799999999999, "end": 1393.58, "text": " that just I''m super + excited to talk about that and how we''ve been evaluating it but you know let", + "tokens": [51156, 300, 445, 286, 478, 1687, 2919, 281, 751, 466, 300, 293, 577, + 321, 600, 668, 27479, 309, 457, 291, 458, 718, 51416], "temperature": 0.0, "avg_logprob": + -0.17256110055106028, "compression_ratio": 1.7202797202797202, "no_speech_prob": + 0.003874595509842038}, {"id": 234, "seek": 137254, "start": 1393.58, "end": 1398.46, + "text": " me let me pass it back to you and ask about your experience with them + BM25 or like the keyword and", "tokens": [51416, 385, 718, 385, 1320, 309, 646, + 281, 291, 293, 1029, 466, 428, 1752, 365, 552, 15901, 6074, 420, 411, 264, 20428, + 293, 51660], "temperature": 0.0, "avg_logprob": -0.17256110055106028, "compression_ratio": + 1.7202797202797202, "no_speech_prob": 0.003874595509842038}, {"id": 235, "seek": + 139846, "start": 1398.46, "end": 1401.98, "text": " the dense search particularly + because then I''d like to kind of take the topic to just", "tokens": [50364, 264, + 18011, 3164, 4098, 570, 550, 286, 1116, 411, 281, 733, 295, 747, 264, 4829, 281, + 445, 50540], "temperature": 0.0, 
"avg_logprob": -0.19791771116710843, "compression_ratio": + 1.5478260869565217, "no_speech_prob": 0.0031987582333385944}, {"id": 236, "seek": + 139846, "start": 1401.98, "end": 1407.82, "text": " arbitrary combinations of retrieval + methods not just be in 25 and say DPR or whatever", "tokens": [50540, 23211, 21267, + 295, 19817, 3337, 7150, 406, 445, 312, 294, 3552, 293, 584, 413, 15958, 420, 2035, + 50832], "temperature": 0.0, "avg_logprob": -0.19791771116710843, "compression_ratio": + 1.5478260869565217, "no_speech_prob": 0.0031987582333385944}, {"id": 237, "seek": + 139846, "start": 1408.3, "end": 1414.14, "text": " yeah I think I remember even + before the dense search appeared on the scene we were", "tokens": [50856, 1338, + 286, 519, 286, 1604, 754, 949, 264, 18011, 3164, 8516, 322, 264, 4145, 321, 645, + 51148], "temperature": 0.0, "avg_logprob": -0.19791771116710843, "compression_ratio": + 1.5478260869565217, "no_speech_prob": 0.0031987582333385944}, {"id": 238, "seek": + 139846, "start": 1414.94, "end": 1423.1000000000001, "text": " experimenting with + sort of like making TFI DF which is BM25 is like an addon like BM25 I think stands", + "tokens": [51188, 29070, 365, 1333, 295, 411, 1455, 314, 38568, 48336, 597, 307, + 15901, 6074, 307, 411, 364, 909, 266, 411, 15901, 6074, 286, 519, 7382, 51596], + "temperature": 0.0, "avg_logprob": -0.19791771116710843, "compression_ratio": 1.5478260869565217, + "no_speech_prob": 0.0031987582333385944}, {"id": 239, "seek": 142310, "start": 1423.1, + "end": 1435.1, "text": " for best match so period so solved problem solved but you + know like one of the questions the", "tokens": [50364, 337, 1151, 2995, 370, 2896, + 370, 13041, 1154, 13041, 457, 291, 458, 411, 472, 295, 264, 1651, 264, 50964], "temperature": + 0.0, "avg_logprob": -0.152951251628787, "compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.0009885996114462614}, {"id": 240, "seek": 142310, "start": 1435.1, + "end": 1439.6599999999999, "text": " the 
reason I love working with product managers + and at the moment I am a product manager so I", "tokens": [50964, 264, 1778, 286, + 959, 1364, 365, 1674, 14084, 293, 412, 264, 1623, 286, 669, 257, 1674, 6598, 370, + 286, 51192], "temperature": 0.0, "avg_logprob": -0.152951251628787, "compression_ratio": + 1.8235294117647058, "no_speech_prob": 0.0009885996114462614}, {"id": 241, "seek": + 142310, "start": 1439.6599999999999, "end": 1443.98, "text": " took the other side + of this thing maybe we can talk more about it in the VV8 podcast but", "tokens": + [51192, 1890, 264, 661, 1252, 295, 341, 551, 1310, 321, 393, 751, 544, 466, 309, + 294, 264, 691, 53, 23, 7367, 457, 51408], "temperature": 0.0, "avg_logprob": -0.152951251628787, + "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0009885996114462614}, + {"id": 242, "seek": 142310, "start": 1444.78, "end": 1449.1799999999998, "text": + " you know the reason I love talking to product managers is because they don''t + know anything maybe", "tokens": [51448, 291, 458, 264, 1778, 286, 959, 1417, 281, + 1674, 14084, 307, 570, 436, 500, 380, 458, 1340, 1310, 51668], "temperature": 0.0, + "avg_logprob": -0.152951251628787, "compression_ratio": 1.8235294117647058, "no_speech_prob": + 0.0009885996114462614}, {"id": 243, "seek": 144918, "start": 1449.18, "end": 1455.74, + "text": " they don''t know that much about algorithms as you and they don''t code + maybe as much as you", "tokens": [50364, 436, 500, 380, 458, 300, 709, 466, 14642, + 382, 291, 293, 436, 500, 380, 3089, 1310, 382, 709, 382, 291, 50692], "temperature": + 0.0, "avg_logprob": -0.09844538929698231, "compression_ratio": 1.7488372093023257, + "no_speech_prob": 0.008044441230595112}, {"id": 244, "seek": 144918, "start": 1456.3, + "end": 1462.8600000000001, "text": " but they do care for they are the stakeholders + of the end result right so when they go out talk", "tokens": [50720, 457, 436, 360, + 1127, 337, 436, 366, 264, 17779, 295, 264, 917, 1874, 
558, 370, 562, 436, 352, 484, + 751, 51048], "temperature": 0.0, "avg_logprob": -0.09844538929698231, "compression_ratio": + 1.7488372093023257, "no_speech_prob": 0.008044441230595112}, {"id": 245, "seek": + 144918, "start": 1462.8600000000001, "end": 1468.78, "text": " to sales or to the + end users they will not get a question which alpha you have used now coming", "tokens": + [51048, 281, 5763, 420, 281, 264, 917, 5022, 436, 486, 406, 483, 257, 1168, 597, + 8961, 291, 362, 1143, 586, 1348, 51344], "temperature": 0.0, "avg_logprob": -0.09844538929698231, + "compression_ratio": 1.7488372093023257, "no_speech_prob": 0.008044441230595112}, + {"id": 246, "seek": 144918, "start": 1468.78, "end": 1476.7, "text": " back to your + to your example right they will say hey I typed cat three times in my query and + I", "tokens": [51344, 646, 281, 428, 281, 428, 1365, 558, 436, 486, 584, 4177, 286, + 33941, 3857, 1045, 1413, 294, 452, 14581, 293, 286, 51740], "temperature": 0.0, + "avg_logprob": -0.09844538929698231, "compression_ratio": 1.7488372093023257, "no_speech_prob": + 0.008044441230595112}, {"id": 247, "seek": 147670, "start": 1476.7, "end": 1481.5, + "text": " still see that the document that mentions it once is at the top how can + you explain this", "tokens": [50364, 920, 536, 300, 264, 4166, 300, 23844, 309, + 1564, 307, 412, 264, 1192, 577, 393, 291, 2903, 341, 50604], "temperature": 0.0, + "avg_logprob": -0.16222241867420284, "compression_ratio": 1.6592920353982301, "no_speech_prob": + 0.006021835841238499}, {"id": 248, "seek": 147670, "start": 1482.8600000000001, + "end": 1488.7, "text": " I will try to link there is a consulting company I think + they''re based in Boston actually by the way", "tokens": [50672, 286, 486, 853, + 281, 2113, 456, 307, 257, 23682, 2237, 286, 519, 436, 434, 2361, 294, 12333, 767, + 538, 264, 636, 50964], "temperature": 0.0, "avg_logprob": -0.16222241867420284, + "compression_ratio": 1.6592920353982301, "no_speech_prob": 
0.006021835841238499}, + {"id": 249, "seek": 147670, "start": 1489.5800000000002, "end": 1496.8600000000001, + "text": " I just forget their name key and via something so they have a really great + presentation on", "tokens": [51008, 286, 445, 2870, 641, 1315, 2141, 293, 5766, + 746, 370, 436, 362, 257, 534, 869, 5860, 322, 51372], "temperature": 0.0, "avg_logprob": + -0.16222241867420284, "compression_ratio": 1.6592920353982301, "no_speech_prob": + 0.006021835841238499}, {"id": 250, "seek": 147670, "start": 1496.8600000000001, + "end": 1504.22, "text": " haystack life I believe where they go super deep and I + recommend you watch it super super deep", "tokens": [51372, 4842, 372, 501, 993, + 286, 1697, 689, 436, 352, 1687, 2452, 293, 286, 2748, 291, 1159, 309, 1687, 1687, + 2452, 51740], "temperature": 0.0, "avg_logprob": -0.16222241867420284, "compression_ratio": + 1.6592920353982301, "no_speech_prob": 0.006021835841238499}, {"id": 251, "seek": + 150422, "start": 1504.22, "end": 1510.78, "text": " on how TF IDF screws up our + understanding of how things should work what they don''t you know", "tokens": [50364, + 322, 577, 40964, 7348, 37, 13050, 493, 527, 3701, 295, 577, 721, 820, 589, 437, + 436, 500, 380, 291, 458, 50692], "temperature": 0.0, "avg_logprob": -0.110720269033842, + "compression_ratio": 1.8492063492063493, "no_speech_prob": 0.0060023353435099125}, + {"id": 252, "seek": 150422, "start": 1510.78, "end": 1516.8600000000001, "text": + " and they go by you know how many times you know the word cat is mentioned in the + document versus", "tokens": [50692, 293, 436, 352, 538, 291, 458, 577, 867, 1413, + 291, 458, 264, 1349, 3857, 307, 2835, 294, 264, 4166, 5717, 50996], "temperature": + 0.0, "avg_logprob": -0.110720269033842, "compression_ratio": 1.8492063492063493, + "no_speech_prob": 0.0060023353435099125}, {"id": 253, "seek": 150422, "start": 1516.8600000000001, + "end": 1521.1000000000001, "text": " how many times it''s it''s mentioned in the + query 
and you can do all this combinatorial you know", "tokens": [50996, 577, 867, + 1413, 309, 311, 309, 311, 2835, 294, 264, 14581, 293, 291, 393, 360, 439, 341, 2512, + 31927, 831, 291, 458, 51208], "temperature": 0.0, "avg_logprob": -0.110720269033842, + "compression_ratio": 1.8492063492063493, "no_speech_prob": 0.0060023353435099125}, + {"id": 254, "seek": 150422, "start": 1521.1000000000001, "end": 1527.02, "text": + " combinations and then they kind of like explain what you would do to kind of solve + it right", "tokens": [51208, 21267, 293, 550, 436, 733, 295, 411, 2903, 437, 291, + 576, 360, 281, 733, 295, 5039, 309, 558, 51504], "temperature": 0.0, "avg_logprob": + -0.110720269033842, "compression_ratio": 1.8492063492063493, "no_speech_prob": 0.0060023353435099125}, + {"id": 255, "seek": 150422, "start": 1527.58, "end": 1533.18, "text": " and you + you basically develop this situation another another thing is that I found useful", + "tokens": [51532, 293, 291, 291, 1936, 1499, 341, 2590, 1071, 1071, 551, 307, 300, + 286, 1352, 4420, 51812], "temperature": 0.0, "avg_logprob": -0.110720269033842, + "compression_ratio": 1.8492063492063493, "no_speech_prob": 0.0060023353435099125}, + {"id": 256, "seek": 153318, "start": 1534.0600000000002, "end": 1539.5, "text": + " and it also mentioned in the relevant search book by Dr. 
Nbal and Jerry Bareman", + "tokens": [50408, 293, 309, 611, 2835, 294, 264, 7340, 3164, 1446, 538, 2491, 13, + 426, 2645, 293, 17454, 4156, 15023, 50680], "temperature": 0.0, "avg_logprob": -0.20245776456945083, + "compression_ratio": 1.7096774193548387, "no_speech_prob": 0.0024194736033678055}, + {"id": 257, "seek": 153318, "start": 1540.7, "end": 1547.26, "text": " that you + can you can use like if you would use like let''s say elastic search or similar + system", "tokens": [50740, 300, 291, 393, 291, 393, 764, 411, 498, 291, 576, 764, + 411, 718, 311, 584, 17115, 3164, 420, 2531, 1185, 51068], "temperature": 0.0, "avg_logprob": + -0.20245776456945083, "compression_ratio": 1.7096774193548387, "no_speech_prob": + 0.0024194736033678055}, {"id": 258, "seek": 153318, "start": 1547.26, "end": 1554.3, + "text": " or solar you could actually build a function which explains the query + step by step right so it", "tokens": [51068, 420, 7936, 291, 727, 767, 1322, 257, + 2445, 597, 13948, 264, 14581, 1823, 538, 1823, 558, 370, 309, 51420], "temperature": + 0.0, "avg_logprob": -0.20245776456945083, "compression_ratio": 1.7096774193548387, + "no_speech_prob": 0.0024194736033678055}, {"id": 259, "seek": 153318, "start": 1554.3, + "end": 1560.8600000000001, "text": " basically prints you the tree of how it actually + came up with that final answer with that final score", "tokens": [51420, 1936, 22305, + 291, 264, 4230, 295, 577, 309, 767, 1361, 493, 365, 300, 2572, 1867, 365, 300, 2572, + 6175, 51748], "temperature": 0.0, "avg_logprob": -0.20245776456945083, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.0024194736033678055}, {"id": 260, "seek": + 156086, "start": 1561.5, "end": 1566.4599999999998, "text": " and how you know that + specific field like for example at TomTom we would I cannot go into much", "tokens": + [50396, 293, 577, 291, 458, 300, 2685, 2519, 411, 337, 1365, 412, 5041, 23442, 321, + 576, 286, 2644, 352, 666, 709, 50644], "temperature": 0.0, 
"avg_logprob": -0.15901314128528943, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.001898866961710155}, + {"id": 261, "seek": 156086, "start": 1566.4599999999998, "end": 1571.58, "text": + " specifics what we do at TomTom but basically the geographic search right so you + type some", "tokens": [50644, 28454, 437, 321, 360, 412, 5041, 23442, 457, 1936, + 264, 32318, 3164, 558, 370, 291, 2010, 512, 50900], "temperature": 0.0, "avg_logprob": + -0.15901314128528943, "compression_ratio": 1.574468085106383, "no_speech_prob": + 0.001898866961710155}, {"id": 262, "seek": 156086, "start": 1571.58, "end": 1579.1799999999998, + "text": " destination let''s say an address or maybe a P.O.I name point of interest + like a shop and it''s", "tokens": [50900, 12236, 718, 311, 584, 364, 2985, 420, + 1310, 257, 430, 13, 46, 13, 40, 1315, 935, 295, 1179, 411, 257, 3945, 293, 309, + 311, 51280], "temperature": 0.0, "avg_logprob": -0.15901314128528943, "compression_ratio": + 1.574468085106383, "no_speech_prob": 0.001898866961710155}, {"id": 263, "seek": + 156086, "start": 1579.1799999999998, "end": 1585.6599999999999, "text": " multilingual + as well right so obviously your query may hit by accident sometimes in a wrong", + "tokens": [51280, 2120, 38219, 382, 731, 558, 370, 2745, 428, 14581, 815, 2045, + 538, 6398, 2171, 294, 257, 2085, 51604], "temperature": 0.0, "avg_logprob": -0.15901314128528943, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.001898866961710155}, + {"id": 264, "seek": 158566, "start": 1586.38, "end": 1595.42, "text": " language + field and so the only way to know this is to print that query execution formula + if you", "tokens": [50400, 2856, 2519, 293, 370, 264, 787, 636, 281, 458, 341, 307, + 281, 4482, 300, 14581, 15058, 8513, 498, 291, 50852], "temperature": 0.0, "avg_logprob": + -0.139484571373981, "compression_ratio": 1.6756756756756757, "no_speech_prob": 0.0042036147788167}, + {"id": 265, "seek": 158566, "start": 1595.42, "end": 
1602.0600000000002, "text": + " will right and so you will see okay ah it hit in that in that let''s say I don''t + know a French", "tokens": [50852, 486, 558, 293, 370, 291, 486, 536, 1392, 3716, + 309, 2045, 294, 300, 294, 300, 718, 311, 584, 286, 500, 380, 458, 257, 5522, 51184], + "temperature": 0.0, "avg_logprob": -0.139484571373981, "compression_ratio": 1.6756756756756757, + "no_speech_prob": 0.0042036147788167}, {"id": 266, "seek": 158566, "start": 1602.0600000000002, + "end": 1607.8200000000002, "text": " uh but I wasn''t intending French I was doing + German or something why did it do that and you", "tokens": [51184, 2232, 457, 286, + 2067, 380, 560, 2029, 5522, 286, 390, 884, 6521, 420, 746, 983, 630, 309, 360, 300, + 293, 291, 51472], "temperature": 0.0, "avg_logprob": -0.139484571373981, "compression_ratio": + 1.6756756756756757, "no_speech_prob": 0.0042036147788167}, {"id": 267, "seek": 158566, + "start": 1607.8200000000002, "end": 1613.3400000000001, "text": " start reasoning + about how did I create the tokens because when you tokenize your text it''s", "tokens": + [51472, 722, 21577, 466, 577, 630, 286, 1884, 264, 22667, 570, 562, 291, 14862, + 1125, 428, 2487, 309, 311, 51748], "temperature": 0.0, "avg_logprob": -0.139484571373981, + "compression_ratio": 1.6756756756756757, "no_speech_prob": 0.0042036147788167}, + {"id": 268, "seek": 161334, "start": 1613.34, "end": 1619.5, "text": " same problem + is as in then search in a way like when you split text into paragraphs or sentences", + "tokens": [50364, 912, 1154, 307, 382, 294, 550, 3164, 294, 257, 636, 411, 562, + 291, 7472, 2487, 666, 48910, 420, 16579, 50672], "temperature": 0.0, "avg_logprob": + -0.10424212975935503, "compression_ratio": 1.9441624365482233, "no_speech_prob": + 0.0008838446228764951}, {"id": 269, "seek": 161334, "start": 1619.5, "end": 1626.3799999999999, + "text": " there you need to split the tokens how you do split the tokens is dependent + on how you model", "tokens": 
[50672, 456, 291, 643, 281, 7472, 264, 22667, 577, + 291, 360, 7472, 264, 22667, 307, 12334, 322, 577, 291, 2316, 51016], "temperature": + 0.0, "avg_logprob": -0.10424212975935503, "compression_ratio": 1.9441624365482233, + "no_speech_prob": 0.0008838446228764951}, {"id": 270, "seek": 161334, "start": 1626.3799999999999, + "end": 1631.82, "text": " the semantics of what you are converting to to a token + so you should not convert string to a token", "tokens": [51016, 264, 4361, 45298, + 295, 437, 291, 366, 29942, 281, 281, 257, 14862, 370, 291, 820, 406, 7620, 6798, + 281, 257, 14862, 51288], "temperature": 0.0, "avg_logprob": -0.10424212975935503, + "compression_ratio": 1.9441624365482233, "no_speech_prob": 0.0008838446228764951}, + {"id": 271, "seek": 161334, "start": 1631.82, "end": 1638.22, "text": " you should + convert meaning to a token so if you capture meaning in that token then you''re + done", "tokens": [51288, 291, 820, 7620, 3620, 281, 257, 14862, 370, 498, 291, 7983, + 3620, 294, 300, 14862, 550, 291, 434, 1096, 51608], "temperature": 0.0, "avg_logprob": + -0.10424212975935503, "compression_ratio": 1.9441624365482233, "no_speech_prob": + 0.0008838446228764951}, {"id": 272, "seek": 163822, "start": 1638.3, "end": 1643.98, + "text": " in a way but then coming back to your question I cannot answer it fully + now but I highly recommend", "tokens": [50368, 294, 257, 636, 457, 550, 1348, 646, + 281, 428, 1168, 286, 2644, 1867, 309, 4498, 586, 457, 286, 5405, 2748, 50652], "temperature": + 0.0, "avg_logprob": -0.11653264720788163, "compression_ratio": 1.6228813559322033, + "no_speech_prob": 0.004954421427100897}, {"id": 273, "seek": 163822, "start": 1643.98, + "end": 1650.7, "text": " that that talk um by can be so you know like you need to + you need to see how term frequencies", "tokens": [50652, 300, 300, 751, 1105, 538, + 393, 312, 370, 291, 458, 411, 291, 643, 281, 291, 643, 281, 536, 577, 1433, 20250, + 50988], "temperature": 0.0, "avg_logprob": 
-0.11653264720788163, "compression_ratio": + 1.6228813559322033, "no_speech_prob": 0.004954421427100897}, {"id": 274, "seek": + 163822, "start": 1650.7, "end": 1656.8600000000001, "text": " and inverse document + frequencies play together and also like in BM25 versus TFID if you have the", "tokens": + [50988, 293, 17340, 4166, 20250, 862, 1214, 293, 611, 411, 294, 15901, 6074, 5717, + 40964, 2777, 498, 291, 362, 264, 51296], "temperature": 0.0, "avg_logprob": -0.11653264720788163, + "compression_ratio": 1.6228813559322033, "no_speech_prob": 0.004954421427100897}, + {"id": 275, "seek": 163822, "start": 1656.8600000000001, "end": 1662.78, "text": + " term saturation issue which is kind of mitigated in BM25 to some extent right + so meeting that", "tokens": [51296, 1433, 27090, 2734, 597, 307, 733, 295, 15699, + 770, 294, 15901, 6074, 281, 512, 8396, 558, 370, 3440, 300, 51592], "temperature": + 0.0, "avg_logprob": -0.11653264720788163, "compression_ratio": 1.6228813559322033, + "no_speech_prob": 0.004954421427100897}, {"id": 276, "seek": 166278, "start": 1662.86, + "end": 1670.06, "text": " if you have two documents um sorry if you have two terms + which occur like one is like million times", "tokens": [50368, 498, 291, 362, 732, + 8512, 1105, 2597, 498, 291, 362, 732, 2115, 597, 5160, 411, 472, 307, 411, 2459, + 1413, 50728], "temperature": 0.0, "avg_logprob": -0.14323609808216925, "compression_ratio": + 1.6554621848739495, "no_speech_prob": 0.0007610110333189368}, {"id": 277, "seek": + 166278, "start": 1670.06, "end": 1676.46, "text": " and the other one one million + plus one TFID will be unable to distinguish between these two but like", "tokens": + [50728, 293, 264, 661, 472, 472, 2459, 1804, 472, 40964, 2777, 486, 312, 11299, + 281, 20206, 1296, 613, 732, 457, 411, 51048], "temperature": 0.0, "avg_logprob": + -0.14323609808216925, "compression_ratio": 1.6554621848739495, "no_speech_prob": + 0.0007610110333189368}, {"id": 278, "seek": 166278, "start": 1676.46, 
"end": 1682.1399999999999, + "text": " BM25 is still sensitive to these things and that''s why it''s a little + better right so I think it", "tokens": [51048, 15901, 6074, 307, 920, 9477, 281, + 613, 721, 293, 300, 311, 983, 309, 311, 257, 707, 1101, 558, 370, 286, 519, 309, + 51332], "temperature": 0.0, "avg_logprob": -0.14323609808216925, "compression_ratio": + 1.6554621848739495, "no_speech_prob": 0.0007610110333189368}, {"id": 279, "seek": + 166278, "start": 1682.1399999999999, "end": 1688.86, "text": " solves this term + saturation issue I don''t know if I answered your question but no yeah I think um", + "tokens": [51332, 39890, 341, 1433, 27090, 2734, 286, 500, 380, 458, 498, 286, 10103, + 428, 1168, 457, 572, 1338, 286, 519, 1105, 51668], "temperature": 0.0, "avg_logprob": + -0.14323609808216925, "compression_ratio": 1.6554621848739495, "no_speech_prob": + 0.0007610110333189368}, {"id": 280, "seek": 168886, "start": 1688.9399999999998, + "end": 1695.5, "text": " so yeah a couple things I really want to continue on this + TFID versus BM25 and then", "tokens": [50368, 370, 1338, 257, 1916, 721, 286, 534, + 528, 281, 2354, 322, 341, 40964, 2777, 5717, 15901, 6074, 293, 550, 50696], "temperature": + 0.0, "avg_logprob": -0.16922562986939818, "compression_ratio": 1.6608695652173913, + "no_speech_prob": 0.004990188404917717}, {"id": 281, "seek": 168886, "start": 1695.5, + "end": 1701.9799999999998, "text": " adverse displayed to it I think you''re I think + this like pseudo relevance feedback is that like the", "tokens": [50696, 27590, + 16372, 281, 309, 286, 519, 291, 434, 286, 519, 341, 411, 35899, 32684, 5824, 307, + 300, 411, 264, 51020], "temperature": 0.0, "avg_logprob": -0.16922562986939818, + "compression_ratio": 1.6608695652173913, "no_speech_prob": 0.004990188404917717}, + {"id": 282, "seek": 168886, "start": 1701.9799999999998, "end": 1708.06, "text": + " phrase I give to show that like um if you''re searching with BM25 you say if you + had added this 
key", "tokens": [51020, 9535, 286, 976, 281, 855, 300, 411, 1105, + 498, 291, 434, 10808, 365, 15901, 6074, 291, 584, 498, 291, 632, 3869, 341, 2141, + 51324], "temperature": 0.0, "avg_logprob": -0.16922562986939818, "compression_ratio": + 1.6608695652173913, "no_speech_prob": 0.004990188404917717}, {"id": 283, "seek": + 168886, "start": 1708.06, "end": 1713.02, "text": " like you have the gold document + and you''re like how would I have modified the query to produce that", "tokens": + [51324, 411, 291, 362, 264, 3821, 4166, 293, 291, 434, 411, 577, 576, 286, 362, + 15873, 264, 14581, 281, 5258, 300, 51572], "temperature": 0.0, "avg_logprob": -0.16922562986939818, + "compression_ratio": 1.6608695652173913, "no_speech_prob": 0.004990188404917717}, + {"id": 284, "seek": 171302, "start": 1713.1, "end": 1718.7, "text": " document is + it so I think that yeah that''s one way another way is to how would you modify the", + "tokens": [50368, 4166, 307, 309, 370, 286, 519, 300, 1338, 300, 311, 472, 636, + 1071, 636, 307, 281, 577, 576, 291, 16927, 264, 50648], "temperature": 0.0, "avg_logprob": + -0.16671668342922044, "compression_ratio": 2.0672268907563027, "no_speech_prob": + 0.004055447410792112}, {"id": 285, "seek": 171302, "start": 1718.7, "end": 1724.78, + "text": " indexing that''s more in your control right so how you would modify the + indexing for example you would", "tokens": [50648, 8186, 278, 300, 311, 544, 294, + 428, 1969, 558, 370, 577, 291, 576, 16927, 264, 8186, 278, 337, 1365, 291, 576, + 50952], "temperature": 0.0, "avg_logprob": -0.16671668342922044, "compression_ratio": + 2.0672268907563027, "no_speech_prob": 0.004055447410792112}, {"id": 286, "seek": + 171302, "start": 1724.78, "end": 1729.18, "text": " in some cases you can remove + the applicates or something right so like you don''t you don''t need them", "tokens": + [50952, 294, 512, 3331, 291, 393, 4159, 264, 2580, 1024, 420, 746, 558, 370, 411, + 291, 500, 380, 291, 500, 380, 643, 552, 51172], 
"temperature": 0.0, "avg_logprob": + -0.16671668342922044, "compression_ratio": 2.0672268907563027, "no_speech_prob": + 0.004055447410792112}, {"id": 287, "seek": 171302, "start": 1729.18, "end": 1733.82, + "text": " or something like that you can you can or you can split the term by numbers + or something right if", "tokens": [51172, 420, 746, 411, 300, 291, 393, 291, 393, + 420, 291, 393, 7472, 264, 1433, 538, 3547, 420, 746, 558, 498, 51404], "temperature": + 0.0, "avg_logprob": -0.16671668342922044, "compression_ratio": 2.0672268907563027, + "no_speech_prob": 0.004055447410792112}, {"id": 288, "seek": 171302, "start": 1733.82, + "end": 1738.46, "text": " they happen to occur inside the term something like I''m + making these examples but I''m saying that", "tokens": [51404, 436, 1051, 281, 5160, + 1854, 264, 1433, 746, 411, 286, 478, 1455, 613, 5110, 457, 286, 478, 1566, 300, + 51636], "temperature": 0.0, "avg_logprob": -0.16671668342922044, "compression_ratio": + 2.0672268907563027, "no_speech_prob": 0.004055447410792112}, {"id": 289, "seek": + 173846, "start": 1738.46, "end": 1742.8600000000001, "text": " you have more control + in the indexing than in the query but in the in the query you can model", "tokens": + [50364, 291, 362, 544, 1969, 294, 264, 8186, 278, 813, 294, 264, 14581, 457, 294, + 264, 294, 264, 14581, 291, 393, 2316, 50584], "temperature": 0.0, "avg_logprob": + -0.16090306082924644, "compression_ratio": 1.8317307692307692, "no_speech_prob": + 0.0031507625244557858}, {"id": 290, "seek": 173846, "start": 1742.8600000000001, + "end": 1749.3400000000001, "text": " like query similarity for example right so + yeah oh that''s super interesting yeah the the way that", "tokens": [50584, 411, + 14581, 32194, 337, 1365, 558, 370, 1338, 1954, 300, 311, 1687, 1880, 1338, 264, + 264, 636, 300, 50908], "temperature": 0.0, "avg_logprob": -0.16090306082924644, + "compression_ratio": 1.8317307692307692, "no_speech_prob": 0.0031507625244557858}, + {"id": 291, 
"seek": 173846, "start": 1749.3400000000001, "end": 1754.6200000000001, + "text": " you do like the text preprocessing like stemming stopper removal all that + all that that bag of", "tokens": [50908, 291, 360, 411, 264, 2487, 2666, 340, 780, + 278, 411, 12312, 2810, 1590, 610, 17933, 439, 300, 439, 300, 300, 3411, 295, 51172], + "temperature": 0.0, "avg_logprob": -0.16090306082924644, "compression_ratio": 1.8317307692307692, + "no_speech_prob": 0.0031507625244557858}, {"id": 292, "seek": 173846, "start": 1755.98, + "end": 1761.9, "text": " that''s what I hope dense vector search can kill all that + I hope you can just like anything can", "tokens": [51240, 300, 311, 437, 286, 1454, + 18011, 8062, 3164, 393, 1961, 439, 300, 286, 1454, 291, 393, 445, 411, 1340, 393, + 51536], "temperature": 0.0, "avg_logprob": -0.16090306082924644, "compression_ratio": + 1.8317307692307692, "no_speech_prob": 0.0031507625244557858}, {"id": 293, "seek": + 176190, "start": 1761.9, "end": 1769.98, "text": " go into it yeah and but um yeah + and so I think there''s this this thing called like decoding the", "tokens": [50364, + 352, 666, 309, 1338, 293, 457, 1105, 1338, 293, 370, 286, 519, 456, 311, 341, 341, + 551, 1219, 411, 979, 8616, 264, 50768], "temperature": 0.0, "avg_logprob": -0.11056596827956866, + "compression_ratio": 1.9061224489795918, "no_speech_prob": 7.760502921883017e-05}, + {"id": 294, "seek": 176190, "start": 1769.98, "end": 1773.9, "text": " latent space + of a vector search model on that other idea of what query would have produced this", + "tokens": [50768, 48994, 1901, 295, 257, 8062, 3164, 2316, 322, 300, 661, 1558, + 295, 437, 14581, 576, 362, 7126, 341, 50964], "temperature": 0.0, "avg_logprob": + -0.11056596827956866, "compression_ratio": 1.9061224489795918, "no_speech_prob": + 7.760502921883017e-05}, {"id": 295, "seek": 176190, "start": 1773.9, "end": 1779.02, + "text": " where you would take the you would train a language model on document + query pairs and 
then", "tokens": [50964, 689, 291, 576, 747, 264, 291, 576, 3847, + 257, 2856, 2316, 322, 4166, 14581, 15494, 293, 550, 51220], "temperature": 0.0, + "avg_logprob": -0.11056596827956866, "compression_ratio": 1.9061224489795918, "no_speech_prob": + 7.760502921883017e-05}, {"id": 296, "seek": 176190, "start": 1779.02, "end": 1783.1000000000001, + "text": " it would generate a query that would have matched the document maybe that''s + useful but", "tokens": [51220, 309, 576, 8460, 257, 14581, 300, 576, 362, 21447, + 264, 4166, 1310, 300, 311, 4420, 457, 51424], "temperature": 0.0, "avg_logprob": + -0.11056596827956866, "compression_ratio": 1.9061224489795918, "no_speech_prob": + 7.760502921883017e-05}, {"id": 297, "seek": 176190, "start": 1783.1000000000001, + "end": 1788.7800000000002, "text": " but I''m also I''m very curious what you think + about this idea of split vectors so split vectors is", "tokens": [51424, 457, 286, + 478, 611, 286, 478, 588, 6369, 437, 291, 519, 466, 341, 1558, 295, 7472, 18875, + 370, 7472, 18875, 307, 51708], "temperature": 0.0, "avg_logprob": -0.11056596827956866, + "compression_ratio": 1.9061224489795918, "no_speech_prob": 7.760502921883017e-05}, + {"id": 298, "seek": 178878, "start": 1788.78, "end": 1795.98, "text": " like you + keep the mass language modeling head and so you encode the thing into the vectors + so the", "tokens": [50364, 411, 291, 1066, 264, 2758, 2856, 15983, 1378, 293, 370, + 291, 2058, 1429, 264, 551, 666, 264, 18875, 370, 264, 50724], "temperature": 0.0, + "avg_logprob": -0.07786177689174437, "compression_ratio": 2.0638297872340425, "no_speech_prob": + 0.0008971040369942784}, {"id": 299, "seek": 178878, "start": 1795.98, "end": 1801.18, + "text": " mass language modeling head always only takes in a vector as input you + always would mask out whatever", "tokens": [50724, 2758, 2856, 15983, 1378, 1009, + 787, 2516, 294, 257, 8062, 382, 4846, 291, 1009, 576, 6094, 484, 2035, 50984], "temperature": + 0.0, 
"avg_logprob": -0.07786177689174437, "compression_ratio": 2.0638297872340425, + "no_speech_prob": 0.0008971040369942784}, {"id": 300, "seek": 178878, "start": 1801.18, + "end": 1806.1399999999999, "text": " the mass token was and then send just that + vector to the mass language modeling head that will produce", "tokens": [50984, + 264, 2758, 14862, 390, 293, 550, 2845, 445, 300, 8062, 281, 264, 2758, 2856, 15983, + 1378, 300, 486, 5258, 51232], "temperature": 0.0, "avg_logprob": -0.07786177689174437, + "compression_ratio": 2.0638297872340425, "no_speech_prob": 0.0008971040369942784}, + {"id": 301, "seek": 178878, "start": 1806.1399999999999, "end": 1812.1399999999999, + "text": " like a sparse distribution over what would replace it and so I think the + the idea behind", "tokens": [51232, 411, 257, 637, 11668, 7316, 670, 437, 576, 7406, + 309, 293, 370, 286, 519, 264, 264, 1558, 2261, 51532], "temperature": 0.0, "avg_logprob": + -0.07786177689174437, "compression_ratio": 2.0638297872340425, "no_speech_prob": + 0.0008971040369942784}, {"id": 302, "seek": 178878, "start": 1812.1399999999999, + "end": 1816.46, "text": " split is that you do that for each token and then you + just kind of average all the vocabulary", "tokens": [51532, 7472, 307, 300, 291, + 360, 300, 337, 1184, 14862, 293, 550, 291, 445, 733, 295, 4274, 439, 264, 19864, + 51748], "temperature": 0.0, "avg_logprob": -0.07786177689174437, "compression_ratio": + 2.0638297872340425, "no_speech_prob": 0.0008971040369942784}, {"id": 303, "seek": + 181646, "start": 1816.54, "end": 1823.02, "text": " distributions and that gives + you a sparse vector that represents like the like happy euphoric", "tokens": [50368, + 37870, 293, 300, 2709, 291, 257, 637, 11668, 8062, 300, 8855, 411, 264, 411, 2055, + 2228, 950, 16345, 50692], "temperature": 0.0, "avg_logprob": -0.1260408688617009, + "compression_ratio": 1.7880184331797235, "no_speech_prob": 0.011306462809443474}, + {"id": 304, "seek": 181646, "start": 1823.02, 
"end": 1830.6200000000001, "text": + " ecstatic like the kind of synonyms behind it do you like that kind of idea yeah + yeah uh uh I like that", "tokens": [50692, 11437, 34632, 411, 264, 733, 295, 5451, + 2526, 2592, 2261, 309, 360, 291, 411, 300, 733, 295, 1558, 1338, 1338, 2232, 2232, + 286, 411, 300, 51072], "temperature": 0.0, "avg_logprob": -0.1260408688617009, "compression_ratio": + 1.7880184331797235, "no_speech_prob": 0.011306462809443474}, {"id": 305, "seek": + 181646, "start": 1832.14, "end": 1838.54, "text": " the fact that I think we can + step back from like this dense vector limitations and go back and", "tokens": [51148, + 264, 1186, 300, 286, 519, 321, 393, 1823, 646, 490, 411, 341, 18011, 8062, 15705, + 293, 352, 646, 293, 51468], "temperature": 0.0, "avg_logprob": -0.1260408688617009, + "compression_ratio": 1.7880184331797235, "no_speech_prob": 0.011306462809443474}, + {"id": 306, "seek": 181646, "start": 1838.54, "end": 1844.14, "text": " try to capture + what sparse vectors do because if I don''t know if you watch the episode with duck", + "tokens": [51468, 853, 281, 7983, 437, 637, 11668, 18875, 360, 570, 498, 286, 500, + 380, 458, 498, 291, 1159, 264, 3500, 365, 12482, 51748], "temperature": 0.0, "avg_logprob": + -0.1260408688617009, "compression_ratio": 1.7880184331797235, "no_speech_prob": + 0.011306462809443474}, {"id": 307, "seek": 184414, "start": 1844.14, "end": 1849.1000000000001, + "text": " Turnbull but he actually shed the light on on this really well by saying + hey if you if you take the", "tokens": [50364, 7956, 37290, 457, 415, 767, 14951, + 264, 1442, 322, 322, 341, 534, 731, 538, 1566, 4177, 498, 291, 498, 291, 747, 264, + 50612], "temperature": 0.0, "avg_logprob": -0.14839756604537224, "compression_ratio": + 1.8221343873517786, "no_speech_prob": 0.0030631099361926317}, {"id": 308, "seek": + 184414, "start": 1849.1000000000001, "end": 1856.22, "text": " keyword retrieval + inverted index you deal with like probably hundreds of 
thousands of dimensions", + "tokens": [50612, 20428, 19817, 3337, 38969, 8186, 291, 2028, 365, 411, 1391, 6779, + 295, 5383, 295, 12819, 50968], "temperature": 0.0, "avg_logprob": -0.14839756604537224, + "compression_ratio": 1.8221343873517786, "no_speech_prob": 0.0030631099361926317}, + {"id": 309, "seek": 184414, "start": 1856.22, "end": 1863.1000000000001, "text": + " unless millions unless billions like in some of the indexes we had at least million + per term right", "tokens": [50968, 5969, 6803, 5969, 17375, 411, 294, 512, 295, + 264, 8186, 279, 321, 632, 412, 1935, 2459, 680, 1433, 558, 51312], "temperature": + 0.0, "avg_logprob": -0.14839756604537224, "compression_ratio": 1.8221343873517786, + "no_speech_prob": 0.0030631099361926317}, {"id": 310, "seek": 184414, "start": 1863.1000000000001, + "end": 1868.0600000000002, "text": " so that''s like million positions most of which + are zeros because this term doesn''t occur", "tokens": [51312, 370, 300, 311, 411, + 2459, 8432, 881, 295, 597, 366, 35193, 570, 341, 1433, 1177, 380, 5160, 51560], + "temperature": 0.0, "avg_logprob": -0.14839756604537224, "compression_ratio": 1.8221343873517786, + "no_speech_prob": 0.0030631099361926317}, {"id": 311, "seek": 184414, "start": 1869.0200000000002, + "end": 1873.9, "text": " you know in in specific doc but like doc id but like it + occurs like in a few", "tokens": [51608, 291, 458, 294, 294, 2685, 3211, 457, 411, + 3211, 4496, 457, 411, 309, 11843, 411, 294, 257, 1326, 51852], "temperature": 0.0, + "avg_logprob": -0.14839756604537224, "compression_ratio": 1.8221343873517786, "no_speech_prob": + 0.0030631099361926317}, {"id": 312, "seek": 187390, "start": 1873.98, "end": 1880.5400000000002, + "text": " and so in dense retrieval you sort of like compress all of these to let''s + say 256 dimensions", "tokens": [50368, 293, 370, 294, 18011, 19817, 3337, 291, 1333, + 295, 411, 14778, 439, 295, 613, 281, 718, 311, 584, 3552, 21, 12819, 50696], "temperature": + 0.0, 
"avg_logprob": -0.15437527611142113, "compression_ratio": 1.7598425196850394, + "no_speech_prob": 5.92911419516895e-05}, {"id": 313, "seek": 187390, "start": 1880.5400000000002, + "end": 1886.14, "text": " and inherently you lose the precision right so it becomes + more like recall oriented", "tokens": [50696, 293, 27993, 291, 3624, 264, 18356, + 558, 370, 309, 3643, 544, 411, 9901, 21841, 50976], "temperature": 0.0, "avg_logprob": + -0.15437527611142113, "compression_ratio": 1.7598425196850394, "no_speech_prob": + 5.92911419516895e-05}, {"id": 314, "seek": 187390, "start": 1887.1000000000001, + "end": 1892.3000000000002, "text": " rather than you know in sparse you you basically + like what also it means spars is that", "tokens": [51024, 2831, 813, 291, 458, 294, + 637, 11668, 291, 291, 1936, 411, 437, 611, 309, 1355, 637, 685, 307, 300, 51284], + "temperature": 0.0, "avg_logprob": -0.15437527611142113, "compression_ratio": 1.7598425196850394, + "no_speech_prob": 5.92911419516895e-05}, {"id": 315, "seek": 187390, "start": 1893.18, + "end": 1898.0600000000002, "text": " this is probably like a little bit like going + back to n and algorithms right so like an inverted", "tokens": [51328, 341, 307, + 1391, 411, 257, 707, 857, 411, 516, 646, 281, 297, 293, 14642, 558, 370, 411, 364, + 38969, 51572], "temperature": 0.0, "avg_logprob": -0.15437527611142113, "compression_ratio": + 1.7598425196850394, "no_speech_prob": 5.92911419516895e-05}, {"id": 316, "seek": + 187390, "start": 1898.0600000000002, "end": 1903.1000000000001, "text": " index + it''s basically like a hash table so I have this term it''s like order one look + up", "tokens": [51572, 8186, 309, 311, 1936, 411, 257, 22019, 3199, 370, 286, 362, + 341, 1433, 309, 311, 411, 1668, 472, 574, 493, 51824], "temperature": 0.0, "avg_logprob": + -0.15437527611142113, "compression_ratio": 1.7598425196850394, "no_speech_prob": + 5.92911419516895e-05}, {"id": 317, "seek": 190310, "start": 1903.6599999999999, + "end": 
1909.26, "text": " in the hash table and then you leapfrog you use this leapfrog + algorithm implemented really well", "tokens": [50392, 294, 264, 22019, 3199, 293, + 550, 291, 19438, 69, 6675, 291, 764, 341, 19438, 69, 6675, 9284, 12270, 534, 731, + 50672], "temperature": 0.0, "avg_logprob": -0.14221199904337967, "compression_ratio": + 1.7256637168141593, "no_speech_prob": 0.0008186838822439313}, {"id": 318, "seek": + 190310, "start": 1909.26, "end": 1917.1799999999998, "text": " in leucine for example + how you jump over long strides of consecutive doc id''s because you don''t", "tokens": + [50672, 294, 476, 1311, 533, 337, 1365, 577, 291, 3012, 670, 938, 1056, 1875, 295, + 30497, 3211, 4496, 311, 570, 291, 500, 380, 51068], "temperature": 0.0, "avg_logprob": + -0.14221199904337967, "compression_ratio": 1.7256637168141593, "no_speech_prob": + 0.0008186838822439313}, {"id": 319, "seek": 190310, "start": 1917.1799999999998, + "end": 1923.1, "text": " really need to examine them in an antiquity let''s say + if it''s cat and dog you know you know that cat", "tokens": [51068, 534, 643, 281, + 17496, 552, 294, 364, 41036, 507, 718, 311, 584, 498, 309, 311, 3857, 293, 3000, + 291, 458, 291, 458, 300, 3857, 51364], "temperature": 0.0, "avg_logprob": -0.14221199904337967, + "compression_ratio": 1.7256637168141593, "no_speech_prob": 0.0008186838822439313}, + {"id": 320, "seek": 190310, "start": 1923.1, "end": 1930.78, "text": " occurs in + the document id5 well I don''t know like 10 let''s say and for dog you are on on + on three", "tokens": [51364, 11843, 294, 264, 4166, 4496, 20, 731, 286, 500, 380, + 458, 411, 1266, 718, 311, 584, 293, 337, 3000, 291, 366, 322, 322, 322, 1045, 51748], + "temperature": 0.0, "avg_logprob": -0.14221199904337967, "compression_ratio": 1.7256637168141593, + "no_speech_prob": 0.0008186838822439313}, {"id": 321, "seek": 193078, "start": 1930.78, + "end": 1936.3, "text": " so you can leapfrog all the way to 10 you don''t really + need to check 
all this in because they will", "tokens": [50364, 370, 291, 393, 19438, + 69, 6675, 439, 264, 636, 281, 1266, 291, 500, 380, 534, 643, 281, 1520, 439, 341, + 294, 570, 436, 486, 50640], "temperature": 0.0, "avg_logprob": -0.10259765233748998, + "compression_ratio": 1.879245283018868, "no_speech_prob": 0.0011099160183221102}, + {"id": 322, "seek": 193078, "start": 1936.3, "end": 1942.78, "text": " never occur + together so for or query that''s another story because that''s a union but for and + query", "tokens": [50640, 1128, 5160, 1214, 370, 337, 420, 14581, 300, 311, 1071, + 1657, 570, 300, 311, 257, 11671, 457, 337, 293, 14581, 50964], "temperature": 0.0, + "avg_logprob": -0.10259765233748998, "compression_ratio": 1.879245283018868, "no_speech_prob": + 0.0011099160183221102}, {"id": 323, "seek": 193078, "start": 1942.78, "end": 1947.5, + "text": " it''s an intersection so you always need an intersection you can then + stop early because you don''t need", "tokens": [50964, 309, 311, 364, 15236, 370, + 291, 1009, 643, 364, 15236, 291, 393, 550, 1590, 2440, 570, 291, 500, 380, 643, + 51200], "temperature": 0.0, "avg_logprob": -0.10259765233748998, "compression_ratio": + 1.879245283018868, "no_speech_prob": 0.0011099160183221102}, {"id": 324, "seek": + 193078, "start": 1947.5, "end": 1953.82, "text": " 100,000 results on the screen + right and I''m still actually curious on how would you know when to", "tokens": + [51200, 2319, 11, 1360, 3542, 322, 264, 2568, 558, 293, 286, 478, 920, 767, 6369, + 322, 577, 576, 291, 458, 562, 281, 51516], "temperature": 0.0, "avg_logprob": -0.10259765233748998, + "compression_ratio": 1.879245283018868, "no_speech_prob": 0.0011099160183221102}, + {"id": 325, "seek": 193078, "start": 1953.82, "end": 1959.74, "text": " stop because + what if you didn''t find the document that is even more relevant that what you have + seen", "tokens": [51516, 1590, 570, 437, 498, 291, 994, 380, 915, 264, 4166, 300, + 307, 754, 544, 7340, 300, 437, 291, 
362, 1612, 51812], "temperature": 0.0, "avg_logprob": + -0.10259765233748998, "compression_ratio": 1.879245283018868, "no_speech_prob": + 0.0011099160183221102}, {"id": 326, "seek": 195974, "start": 1959.74, "end": 1964.06, + "text": " so far but that''s like a matter of debate I guess but then you start + scoring them and then", "tokens": [50364, 370, 1400, 457, 300, 311, 411, 257, 1871, + 295, 7958, 286, 2041, 457, 550, 291, 722, 22358, 552, 293, 550, 50580], "temperature": + 0.0, "avg_logprob": -0.17003371498801492, "compression_ratio": 1.7827715355805243, + "no_speech_prob": 0.011692658066749573}, {"id": 327, "seek": 195974, "start": 1964.06, + "end": 1969.42, "text": " sorting them by relevance right yeah sorry if I''m a + little behind them so is this referring to", "tokens": [50580, 32411, 552, 645, + 32684, 558, 1338, 2597, 498, 286, 478, 257, 707, 2261, 552, 370, 307, 341, 13761, + 281, 50848], "temperature": 0.0, "avg_logprob": -0.17003371498801492, "compression_ratio": + 1.7827715355805243, "no_speech_prob": 0.011692658066749573}, {"id": 328, "seek": + 195974, "start": 1969.42, "end": 1975.02, "text": " how you can use like an inverted + index to calculate the BM25 scores so I would you know with my", "tokens": [50848, + 577, 291, 393, 764, 411, 364, 38969, 8186, 281, 8873, 264, 15901, 6074, 13444, 370, + 286, 576, 291, 458, 365, 452, 51128], "temperature": 0.0, "avg_logprob": -0.17003371498801492, + "compression_ratio": 1.7827715355805243, "no_speech_prob": 0.011692658066749573}, + {"id": 329, "seek": 195974, "start": 1975.02, "end": 1980.38, "text": " document + collection if dog appears I you know dog and the documents so that when I''m calculating", + "tokens": [51128, 4166, 5765, 498, 3000, 7038, 286, 291, 458, 3000, 293, 264, 8512, + 370, 300, 562, 286, 478, 28258, 51396], "temperature": 0.0, "avg_logprob": -0.17003371498801492, + "compression_ratio": 1.7827715355805243, "no_speech_prob": 0.011692658066749573}, + {"id": 330, "seek": 195974,
"start": 1980.94, "end": 1987.74, "text": " yeah yeah + but like the the the the I guess the comparison I wanted to make to dance search + that", "tokens": [51424, 1338, 1338, 457, 411, 264, 264, 264, 264, 286, 2041, 264, + 9660, 286, 1415, 281, 652, 281, 4489, 3164, 300, 51764], "temperature": 0.0, "avg_logprob": + -0.17003371498801492, "compression_ratio": 1.7827715355805243, "no_speech_prob": + 0.011692658066749573}, {"id": 331, "seek": 198774, "start": 1987.82, "end": 1993.26, + "text": " like an old vector search is that they are on the on the base data structure + first of all you have a", "tokens": [50368, 411, 364, 1331, 8062, 3164, 307, 300, + 436, 366, 322, 264, 322, 264, 3096, 1412, 3877, 700, 295, 439, 291, 362, 257, 50640], + "temperature": 0.0, "avg_logprob": -0.11560722224968524, "compression_ratio": 1.871212121212121, + "no_speech_prob": 0.0028415711130946875}, {"id": 332, "seek": 198774, "start": 1993.26, + "end": 1999.5, "text": " choice of the algorithm you want to use but let''s say + we take hnsw which is the most popular right", "tokens": [50640, 3922, 295, 264, + 9284, 291, 528, 281, 764, 457, 718, 311, 584, 321, 747, 276, 3695, 86, 597, 307, + 264, 881, 3743, 558, 50952], "temperature": 0.0, "avg_logprob": -0.11560722224968524, + "compression_ratio": 1.871212121212121, "no_speech_prob": 0.0028415711130946875}, + {"id": 333, "seek": 198774, "start": 1999.5, "end": 2005.98, "text": " also implemented + in v8 I know but like you don''t know when you enter the first layer you don''t + know", "tokens": [50952, 611, 12270, 294, 371, 23, 286, 458, 457, 411, 291, 500, + 380, 458, 562, 291, 3242, 264, 700, 4583, 291, 500, 380, 458, 51276], "temperature": + 0.0, "avg_logprob": -0.11560722224968524, "compression_ratio": 1.871212121212121, + "no_speech_prob": 0.0028415711130946875}, {"id": 334, "seek": 198774, "start": 2005.98, + "end": 2011.82, "text": " where exactly you will end up like so like with hash table + I know exactly where I''m entering and 
I", "tokens": [51276, 689, 2293, 291, 486, + 917, 493, 411, 370, 411, 365, 22019, 3199, 286, 458, 2293, 689, 286, 478, 11104, + 293, 286, 51568], "temperature": 0.0, "avg_logprob": -0.11560722224968524, "compression_ratio": + 1.871212121212121, "no_speech_prob": 0.0028415711130946875}, {"id": 335, "seek": + 198774, "start": 2011.82, "end": 2016.86, "text": " know that I''m exactly in the + right place right and you know you can also expand your query with", "tokens": [51568, + 458, 300, 286, 478, 2293, 294, 264, 558, 1081, 558, 293, 291, 458, 291, 393, 611, + 5268, 428, 14581, 365, 51820], "temperature": 0.0, "avg_logprob": -0.11560722224968524, + "compression_ratio": 1.871212121212121, "no_speech_prob": 0.0028415711130946875}, + {"id": 336, "seek": 201686, "start": 2016.86, "end": 2021.6599999999999, "text": + " synonyms then you enter more more points in the hash table and you start traversing + all of them", "tokens": [50364, 5451, 2526, 2592, 550, 291, 3242, 544, 544, 2793, + 294, 264, 22019, 3199, 293, 291, 722, 23149, 278, 439, 295, 552, 50604], "temperature": + 0.0, "avg_logprob": -0.1212303990903108, "compression_ratio": 1.8028673835125448, + "no_speech_prob": 0.00046189434942789376}, {"id": 337, "seek": 201686, "start": + 2022.2199999999998, "end": 2028.78, "text": " in parallel and you come up with the + answer but in dance search you need to like accept the uncertainty", "tokens": [50632, + 294, 8952, 293, 291, 808, 493, 365, 264, 1867, 457, 294, 4489, 3164, 291, 643, 281, + 411, 3241, 264, 15697, 50960], "temperature": 0.0, "avg_logprob": -0.1212303990903108, + "compression_ratio": 1.8028673835125448, "no_speech_prob": 0.00046189434942789376}, + {"id": 338, "seek": 201686, "start": 2028.78, "end": 2034.54, "text": " of navigating + that graph you don''t know where it will land it has certain limitations and trade-offs", + "tokens": [50960, 295, 32054, 300, 4295, 291, 500, 380, 458, 689, 309, 486, 2117, + 309, 575, 1629, 15705, 293, 4923, 12, 19231, 
51248], "temperature": 0.0, "avg_logprob": + -0.1212303990903108, "compression_ratio": 1.8028673835125448, "no_speech_prob": + 0.00046189434942789376}, {"id": 339, "seek": 201686, "start": 2035.1799999999998, + "end": 2040.86, "text": " and then it will pull up you know some nearest neighbors + and probably you should be happy with them", "tokens": [51280, 293, 550, 309, 486, + 2235, 493, 291, 458, 512, 23831, 12512, 293, 1391, 291, 820, 312, 2055, 365, 552, + 51564], "temperature": 0.0, "avg_logprob": -0.1212303990903108, "compression_ratio": + 1.8028673835125448, "no_speech_prob": 0.00046189434942789376}, {"id": 340, "seek": + 201686, "start": 2040.86, "end": 2046.06, "text": " because oh otherwise you need + to do it twice so that price and so on you see what I mean right so like", "tokens": + [51564, 570, 1954, 5911, 291, 643, 281, 360, 309, 6091, 370, 300, 3218, 293, 370, + 322, 291, 536, 437, 286, 914, 558, 370, 411, 51824], "temperature": 0.0, "avg_logprob": + -0.1212303990903108, "compression_ratio": 1.8028673835125448, "no_speech_prob": + 0.00046189434942789376}, {"id": 341, "seek": 204606, "start": 2046.06, "end": 2052.06, + "text": " they are fundamentally different also on search side oh in like this stochastic + nature of the", "tokens": [50364, 436, 366, 17879, 819, 611, 322, 3164, 1252, 1954, + 294, 411, 341, 342, 8997, 2750, 3687, 295, 264, 50664], "temperature": 0.0, "avg_logprob": + -0.26909617787783907, "compression_ratio": 1.6540084388185654, "no_speech_prob": + 0.0017145555466413498}, {"id": 342, "seek": 204606, "start": 2053.42, "end": 2059.5, + "text": " yeah and also I read this paper called OOD disganan that talks about how + much the distribution shift can", "tokens": [50732, 1338, 293, 611, 286, 1401, 341, + 3035, 1219, 422, 14632, 717, 1275, 282, 300, 6686, 466, 577, 709, 264, 7316, 5513, + 393, 51036], "temperature": 0.0, "avg_logprob": -0.26909617787783907, "compression_ratio": + 1.6540084388185654, "no_speech_prob": 
0.0017145555466413498}, {"id": 343, "seek": + 204606, "start": 2059.5, "end": 2065.58, "text": " impact the graph based Vamana + so Vamana is like hnsw but you flatten it so there''s no longer the", "tokens": + [51036, 2712, 264, 4295, 2361, 371, 36497, 370, 371, 36497, 307, 411, 276, 3695, + 86, 457, 291, 24183, 309, 370, 456, 311, 572, 2854, 264, 51340], "temperature": + 0.0, "avg_logprob": -0.26909617787783907, "compression_ratio": 1.6540084388185654, + "no_speech_prob": 0.0017145555466413498}, {"id": 344, "seek": 204606, "start": 2065.58, + "end": 2069.82, "text": " hierarchy of layers it''s like all the same thing and + then you can put it on disk and it''s like a", "tokens": [51340, 22333, 295, 7914, + 309, 311, 411, 439, 264, 912, 551, 293, 550, 291, 393, 829, 309, 322, 12355, 293, + 309, 311, 411, 257, 51552], "temperature": 0.0, "avg_logprob": -0.26909617787783907, + "compression_ratio": 1.6540084388185654, "no_speech_prob": 0.0017145555466413498}, + {"id": 345, "seek": 206982, "start": 2069.82, "end": 2077.82, "text": " little cheaper + run I think yeah it''s fascinating the whole indexing the part that that''s like + kind", "tokens": [50364, 707, 12284, 1190, 286, 519, 1338, 309, 311, 10343, 264, + 1379, 8186, 278, 264, 644, 300, 300, 311, 411, 733, 50764], "temperature": 0.0, + "avg_logprob": -0.2313916761796553, "compression_ratio": 1.737556561085973, "no_speech_prob": + 0.0035739107988774776}, {"id": 346, "seek": 206982, "start": 2077.82, "end": 2085.5, + "text": " of the the meat of this especially from Weaviate aspect of that''s where I + see and in addition to", "tokens": [50764, 295, 264, 264, 4615, 295, 341, 2318, + 490, 261, 15498, 4171, 295, 300, 311, 689, 286, 536, 293, 294, 4500, 281, 51148], + "temperature": 0.0, "avg_logprob": -0.2313916761796553, "compression_ratio": 1.737556561085973, + "no_speech_prob": 0.0035739107988774776}, {"id": 347, "seek": 206982, "start": 2085.5, + "end": 2090.06, "text": " you know the ux and making it like a very
developer friendly + to well there''s a few sides to it", "tokens": [51148, 291, 458, 264, 344, 87, 293, + 1455, 309, 411, 257, 588, 10754, 9208, 281, 731, 456, 311, 257, 1326, 4881, 281, + 309, 51376], "temperature": 0.0, "avg_logprob": -0.2313916761796553, "compression_ratio": + 1.737556561085973, "no_speech_prob": 0.0035739107988774776}, {"id": 348, "seek": + 206982, "start": 2090.06, "end": 2095.1800000000003, "text": " because there''s + also the distributed database part and you know all the written and go laying the", + "tokens": [51376, 570, 456, 311, 611, 264, 12631, 8149, 644, 293, 291, 458, 439, + 264, 3720, 293, 352, 14903, 264, 51632], "temperature": 0.0, "avg_logprob": -0.2313916761796553, + "compression_ratio": 1.737556561085973, "no_speech_prob": 0.0035739107988774776}, + {"id": 349, "seek": 209518, "start": 2095.18, "end": 2100.94, "text": " concurrency + control you know the replication of the backups like all these kind of things like + that", "tokens": [50364, 23702, 10457, 1969, 291, 458, 264, 39911, 295, 264, 50160, + 411, 439, 613, 733, 295, 721, 411, 300, 50652], "temperature": 0.0, "avg_logprob": + -0.16611063591787747, "compression_ratio": 1.7925925925925925, "no_speech_prob": + 0.0009586364612914622}, {"id": 350, "seek": 209518, "start": 2100.94, "end": 2105.02, + "text": " so it''s definitely like some things to but that approximate nearest neighbor + search and I know that", "tokens": [50652, 370, 309, 311, 2138, 411, 512, 721, 281, + 457, 300, 30874, 23831, 5987, 3164, 293, 286, 458, 300, 50856], "temperature": 0.0, + "avg_logprob": -0.16611063591787747, "compression_ratio": 1.7925925925925925, "no_speech_prob": + 0.0009586364612914622}, {"id": 351, "seek": 209518, "start": 2105.02, "end": 2108.7, + "text": " you have this experience with you know I''ve listened to a ton of your + talks and you''re you", "tokens": [50856, 291, 362, 341, 1752, 365, 291, 458, 286, + 600, 13207, 281, 257, 2952, 295, 428, 6686, 293, 291, 434, 291, 51040], 
"temperature": + 0.0, "avg_logprob": -0.16611063591787747, "compression_ratio": 1.7925925925925925, + "no_speech_prob": 0.0009586364612914622}, {"id": 352, "seek": 209518, "start": 2108.7, + "end": 2118.62, "text": " introduced me to the a and n benchmarks but yeah that + I see that there is being like three levels", "tokens": [51040, 7268, 385, 281, + 264, 257, 293, 297, 43751, 457, 1338, 300, 286, 536, 300, 456, 307, 885, 411, 1045, + 4358, 51536], "temperature": 0.0, "avg_logprob": -0.16611063591787747, "compression_ratio": + 1.7925925925925925, "no_speech_prob": 0.0009586364612914622}, {"id": 353, "seek": + 209518, "start": 2118.62, "end": 2124.7, "text": " of errors that come that propagate + up there''s the errors from hnsw and say product quantization", "tokens": [51536, + 295, 13603, 300, 808, 300, 48256, 493, 456, 311, 264, 13603, 490, 276, 3695, 86, + 293, 584, 1674, 4426, 2144, 51840], "temperature": 0.0, "avg_logprob": -0.16611063591787747, + "compression_ratio": 1.7925925925925925, "no_speech_prob": 0.0009586364612914622}, + {"id": 354, "seek": 212518, "start": 2125.58, "end": 2129.8999999999996, "text": + " then there''s the errors from the vector representations to begin with and then + there''s maybe", "tokens": [50384, 550, 456, 311, 264, 13603, 490, 264, 8062, 33358, + 281, 1841, 365, 293, 550, 456, 311, 1310, 50600], "temperature": 0.0, "avg_logprob": + -0.09709692446984977, "compression_ratio": 1.8113207547169812, "no_speech_prob": + 0.0017972294008359313}, {"id": 355, "seek": 212518, "start": 2129.8999999999996, + "end": 2134.62, "text": " the errors and like the question answering model so if + you wanted to have like you know natural", "tokens": [50600, 264, 13603, 293, 411, + 264, 1168, 13430, 2316, 370, 498, 291, 1415, 281, 362, 411, 291, 458, 3303, 50836], + "temperature": 0.0, "avg_logprob": -0.09709692446984977, "compression_ratio": 1.8113207547169812, + "no_speech_prob": 0.0017972294008359313}, {"id": 356, "seek": 212518, "start": 
2134.62, + "end": 2139.74, "text": " questions open domain qa you''re looking at like three + layers of cascading errors that are sort of", "tokens": [50836, 1651, 1269, 9274, + 9505, 64, 291, 434, 1237, 412, 411, 1045, 7914, 295, 3058, 66, 8166, 13603, 300, + 366, 1333, 295, 51092], "temperature": 0.0, "avg_logprob": -0.09709692446984977, + "compression_ratio": 1.8113207547169812, "no_speech_prob": 0.0017972294008359313}, + {"id": 357, "seek": 212518, "start": 2139.74, "end": 2146.8599999999997, "text": + " unrelated to each other yeah exactly people really brilliantly that you like and + I think if I may", "tokens": [51092, 38967, 281, 1184, 661, 1338, 2293, 561, 534, + 8695, 42580, 300, 291, 411, 293, 286, 519, 498, 286, 815, 51448], "temperature": + 0.0, "avg_logprob": -0.09709692446984977, "compression_ratio": 1.8113207547169812, + "no_speech_prob": 0.0017972294008359313}, {"id": 358, "seek": 212518, "start": 2146.8599999999997, + "end": 2152.7799999999997, "text": " summarize it you know I anyway to you know + kind of where this hat of the person who is creating", "tokens": [51448, 20858, + 309, 291, 458, 286, 4033, 281, 291, 458, 733, 295, 689, 341, 2385, 295, 264, 954, + 567, 307, 4084, 51744], "temperature": 0.0, "avg_logprob": -0.09709692446984977, + "compression_ratio": 1.8113207547169812, "no_speech_prob": 0.0017972294008359313}, + {"id": 359, "seek": 215278, "start": 2153.26, "end": 2157.5, "text": " this vector + search pyramid and stuff I''m not the only guy doing this but I keep doing this + because", "tokens": [50388, 341, 4631, 3164, 25950, 293, 1507, 286, 478, 406, 264, + 787, 2146, 884, 341, 457, 286, 1066, 884, 341, 570, 50600], "temperature": 0.0, + "avg_logprob": -0.11128556100945723, "compression_ratio": 1.7467248908296944, "no_speech_prob": + 0.005856184288859367}, {"id": 360, "seek": 215278, "start": 2157.5, "end": 2163.6600000000003, + "text": " it helps me to stay comfortable in the topic and sort of okay I''m looking + at it from this
angle and", "tokens": [50600, 309, 3665, 385, 281, 1754, 4619, 294, + 264, 4829, 293, 1333, 295, 1392, 286, 478, 1237, 412, 309, 490, 341, 5802, 293, + 50908], "temperature": 0.0, "avg_logprob": -0.11128556100945723, "compression_ratio": + 1.7467248908296944, "no_speech_prob": 0.005856184288859367}, {"id": 361, "seek": + 215278, "start": 2163.6600000000003, "end": 2168.46, "text": " if you accept it + stay with me if you don''t you know you may you may as well augment it or something", + "tokens": [50908, 498, 291, 3241, 309, 1754, 365, 385, 498, 291, 500, 380, 291, + 458, 291, 815, 291, 815, 382, 731, 29919, 309, 420, 746, 51148], "temperature": + 0.0, "avg_logprob": -0.11128556100945723, "compression_ratio": 1.7467248908296944, + "no_speech_prob": 0.005856184288859367}, {"id": 362, "seek": 215278, "start": 2168.46, + "end": 2177.02, "text": " like you did earlier with some levels and you know like + it''s just you need to accept that uncertainty", "tokens": [51148, 411, 291, 630, + 3071, 365, 512, 4358, 293, 291, 458, 411, 309, 311, 445, 291, 643, 281, 3241, 300, + 15697, 51576], "temperature": 0.0, "avg_logprob": -0.11128556100945723, "compression_ratio": + 1.7467248908296944, "no_speech_prob": 0.005856184288859367}, {"id": 363, "seek": + 217702, "start": 2177.02, "end": 2183.18, "text": " like you explained and also + that uncertainty that you know like in this can and paper they they", "tokens": + [50364, 411, 291, 8825, 293, 611, 300, 15697, 300, 291, 458, 411, 294, 341, 393, + 293, 3035, 436, 436, 50672], "temperature": 0.0, "avg_logprob": -0.14449211608531864, + "compression_ratio": 1.7612612612612613, "no_speech_prob": 0.0013236373197287321}, + {"id": 364, "seek": 217702, "start": 2183.18, "end": 2191.42, "text": " explicitly + show that in hnsw you may have unreachable nodes and they counted something like + 1000", "tokens": [50672, 20803, 855, 300, 294, 276, 3695, 86, 291, 815, 362, 517, + 16226, 712, 13891, 293, 436, 20150, 746, 411, 9714, 51084], 
"temperature": 0.0, + "avg_logprob": -0.14449211608531864, "compression_ratio": 1.7612612612612613, "no_speech_prob": + 0.0013236373197287321}, {"id": 365, "seek": 217702, "start": 2191.42, "end": 2196.38, + "text": " nodes were completely unreachable from any point in the graph like no + matter how you search how long", "tokens": [51084, 13891, 645, 2584, 517, 16226, + 712, 490, 604, 935, 294, 264, 4295, 411, 572, 1871, 577, 291, 3164, 577, 938, 51332], + "temperature": 0.0, "avg_logprob": -0.14449211608531864, "compression_ratio": 1.7612612612612613, + "no_speech_prob": 0.0013236373197287321}, {"id": 366, "seek": 217702, "start": 2196.38, + "end": 2202.54, "text": " you search what are the values for your e f and m parameters + during index construction and search", "tokens": [51332, 291, 3164, 437, 366, 264, + 4190, 337, 428, 308, 283, 293, 275, 9834, 1830, 8186, 6435, 293, 3164, 51640], "temperature": + 0.0, "avg_logprob": -0.14449211608531864, "compression_ratio": 1.7612612612612613, + "no_speech_prob": 0.0013236373197287321}, {"id": 367, "seek": 220254, "start": 2202.54, + "end": 2209.74, "text": " you just don''t reach them and and that''s I think that''s + somewhat similar to the inverted index", "tokens": [50364, 291, 445, 500, 380, 2524, + 552, 293, 293, 300, 311, 286, 519, 300, 311, 8344, 2531, 281, 264, 38969, 8186, + 50724], "temperature": 0.0, "avg_logprob": -0.08930057829076593, "compression_ratio": + 1.7897196261682242, "no_speech_prob": 0.0024230084381997585}, {"id": 368, "seek": + 220254, "start": 2209.74, "end": 2217.5, "text": " search where you have like one + million uh doc IDs per term how do you know when to stop it''s also", "tokens": + [50724, 3164, 689, 291, 362, 411, 472, 2459, 2232, 3211, 48212, 680, 1433, 577, + 360, 291, 458, 562, 281, 1590, 309, 311, 611, 51112], "temperature": 0.0, "avg_logprob": + -0.08930057829076593, "compression_ratio": 1.7897196261682242, "no_speech_prob": + 0.0024230084381997585}, {"id": 369, "seek": 220254, 
"start": 2217.5, "end": 2223.5, + "text": " like you may never reach the documents that you should have visited but + you just deliberately", "tokens": [51112, 411, 291, 815, 1128, 2524, 264, 8512, + 300, 291, 820, 362, 11220, 457, 291, 445, 23506, 51412], "temperature": 0.0, "avg_logprob": + -0.08930057829076593, "compression_ratio": 1.7897196261682242, "no_speech_prob": + 0.0024230084381997585}, {"id": 370, "seek": 220254, "start": 2223.5, "end": 2229.1, + "text": " decided to stop you know prematurely because you don''t have time you + have to you know return the", "tokens": [51412, 3047, 281, 1590, 291, 458, 34877, + 356, 570, 291, 500, 380, 362, 565, 291, 362, 281, 291, 458, 2736, 264, 51692], "temperature": + 0.0, "avg_logprob": -0.08930057829076593, "compression_ratio": 1.7897196261682242, + "no_speech_prob": 0.0024230084381997585}, {"id": 371, "seek": 222910, "start": 2229.1, + "end": 2234.46, "text": " documents within night and 10 milliseconds so you have + to make trade-offs um but they are", "tokens": [50364, 8512, 1951, 1818, 293, 1266, + 34184, 370, 291, 362, 281, 652, 4923, 12, 19231, 1105, 457, 436, 366, 50632], "temperature": + 0.0, "avg_logprob": -0.18803446220629144, "compression_ratio": 1.8611111111111112, + "no_speech_prob": 0.0033102077431976795}, {"id": 372, "seek": 222910, "start": 2235.1, + "end": 2240.46, "text": " ordered naturally in in the increasing order of doc IDs + right they''re not ordered by does", "tokens": [50664, 8866, 8195, 294, 294, 264, + 5662, 1668, 295, 3211, 48212, 558, 436, 434, 406, 8866, 538, 775, 50932], "temperature": + 0.0, "avg_logprob": -0.18803446220629144, "compression_ratio": 1.8611111111111112, + "no_speech_prob": 0.0033102077431976795}, {"id": 373, "seek": 222910, "start": 2240.46, + "end": 2245.2599999999998, "text": " this question answer anything does this does + this document know anything about cats or it just", "tokens": [50932, 341, 1168, + 1867, 1340, 775, 341, 775, 341, 4166, 458, 1340, 466, 11111, 
420, 309, 445, 51172], + "temperature": 0.0, "avg_logprob": -0.18803446220629144, "compression_ratio": 1.8611111111111112, + "no_speech_prob": 0.0033102077431976795}, {"id": 374, "seek": 222910, "start": 2245.2599999999998, + "end": 2250.38, "text": " not mentions them and passing you know does this document + knows anything about tweeter does it", "tokens": [51172, 406, 23844, 552, 293, 8437, + 291, 458, 775, 341, 4166, 3255, 1340, 466, 6986, 2398, 775, 309, 51428], "temperature": + 0.0, "avg_logprob": -0.18803446220629144, "compression_ratio": 1.8611111111111112, + "no_speech_prob": 0.0033102077431976795}, {"id": 375, "seek": 222910, "start": 2250.38, + "end": 2256.7, "text": " describe tweeter or just says you know please contact me + on twitter here is my twitter handle right", "tokens": [51428, 6786, 6986, 2398, + 420, 445, 1619, 291, 458, 1767, 3385, 385, 322, 21439, 510, 307, 452, 21439, 4813, + 558, 51744], "temperature": 0.0, "avg_logprob": -0.18803446220629144, "compression_ratio": + 1.8611111111111112, "no_speech_prob": 0.0033102077431976795}, {"id": 376, "seek": + 225670, "start": 2256.7799999999997, "end": 2263.1, "text": " like complete noise + uh so so you see what I mean right so like there are I think in both approaches", + "tokens": [50368, 411, 3566, 5658, 2232, 370, 370, 291, 536, 437, 286, 914, 558, + 370, 411, 456, 366, 286, 519, 294, 1293, 11587, 50684], "temperature": 0.0, "avg_logprob": + -0.12651849080281086, "compression_ratio": 1.7207207207207207, "no_speech_prob": + 0.005413600243628025}, {"id": 377, "seek": 225670, "start": 2263.1, "end": 2269.5, + "text": " like on fundamental level on data structure level we deal with this fundamental + limitations", "tokens": [50684, 411, 322, 8088, 1496, 322, 1412, 3877, 1496, 321, + 2028, 365, 341, 8088, 15705, 51004], "temperature": 0.0, "avg_logprob": -0.12651849080281086, + "compression_ratio": 1.7207207207207207, "no_speech_prob": 0.005413600243628025}, + {"id": 378, "seek": 225670, "start": 
2269.5, "end": 2276.7799999999997, "text": + " like gravity law like you cannot jump off and and fly to moon or to Mars right + without additional", "tokens": [51004, 411, 12110, 2101, 411, 291, 2644, 3012, 766, + 293, 293, 3603, 281, 7135, 420, 281, 9692, 558, 1553, 4497, 51368], "temperature": + 0.0, "avg_logprob": -0.12651849080281086, "compression_ratio": 1.7207207207207207, + "no_speech_prob": 0.005413600243628025}, {"id": 379, "seek": 225670, "start": 2276.7799999999997, + "end": 2283.02, "text": " like thrust and devices and stuff yeah so do you feel + the same like does it resonate oh yeah", "tokens": [51368, 411, 24030, 293, 5759, + 293, 1507, 1338, 370, 360, 291, 841, 264, 912, 411, 775, 309, 34285, 1954, 1338, + 51680], "temperature": 0.0, "avg_logprob": -0.12651849080281086, "compression_ratio": + 1.7207207207207207, "no_speech_prob": 0.005413600243628025}, {"id": 380, "seek": + 228302, "start": 2283.02, "end": 2288.3, "text": " well firstly thank you that you + just explained that concept to me for the first time I''m just", "tokens": [50364, + 731, 27376, 1309, 291, 300, 291, 445, 8825, 300, 3410, 281, 385, 337, 264, 700, + 565, 286, 478, 445, 50628], "temperature": 0.0, "avg_logprob": -0.13358652482339, + "compression_ratio": 1.8807692307692307, "no_speech_prob": 0.002361142775043845}, + {"id": 381, "seek": 228302, "start": 2288.3, "end": 2294.54, "text": " I''m just + now alive on the podcast understanding that concept but yeah it''s very it''s very + cool like", "tokens": [50628, 286, 478, 445, 586, 5465, 322, 264, 7367, 3701, 300, + 3410, 457, 1338, 309, 311, 588, 309, 311, 588, 1627, 411, 50940], "temperature": + 0.0, "avg_logprob": -0.13358652482339, "compression_ratio": 1.8807692307692307, + "no_speech_prob": 0.002361142775043845}, {"id": 382, "seek": 228302, "start": 2294.54, + "end": 2301.18, "text": " the um sorting the inverted index to prioritize documents + maybe by clicks like clicks would be like", "tokens": [50940, 264, 1105, 32411, + 
264, 38969, 8186, 281, 25164, 8512, 1310, 538, 18521, 411, 18521, 576, 312, 411, + 51272], "temperature": 0.0, "avg_logprob": -0.13358652482339, "compression_ratio": + 1.8807692307692307, "no_speech_prob": 0.002361142775043845}, {"id": 383, "seek": + 228302, "start": 2301.18, "end": 2306.3, "text": " like the most sensible thing + if it''s like web pages so to say and you sort the documents and then", "tokens": + [51272, 411, 264, 881, 25380, 551, 498, 309, 311, 411, 3670, 7183, 370, 281, 584, + 293, 291, 1333, 264, 8512, 293, 550, 51528], "temperature": 0.0, "avg_logprob": + -0.13358652482339, "compression_ratio": 1.8807692307692307, "no_speech_prob": 0.002361142775043845}, + {"id": 384, "seek": 228302, "start": 2307.18, "end": 2311.98, "text": " you yeah + you have some kind you could probably calculate how much time you have to search + and how", "tokens": [51572, 291, 1338, 291, 362, 512, 733, 291, 727, 1391, 8873, + 577, 709, 565, 291, 362, 281, 3164, 293, 577, 51812], "temperature": 0.0, "avg_logprob": + -0.13358652482339, "compression_ratio": 1.8807692307692307, "no_speech_prob": 0.002361142775043845}, + {"id": 385, "seek": 231198, "start": 2311.98, "end": 2318.62, "text": " much that + lets you go into the invert index yeah super interesting I I think it''s very interesting + for", "tokens": [50364, 709, 300, 6653, 291, 352, 666, 264, 33966, 8186, 1338, 1687, + 1880, 286, 286, 519, 309, 311, 588, 1880, 337, 50696], "temperature": 0.0, "avg_logprob": + -0.14204045523584416, "compression_ratio": 1.8254545454545454, "no_speech_prob": + 0.00039322589873336256}, {"id": 386, "seek": 231198, "start": 2318.62, "end": 2323.82, + "text": " wevia with the with the hybrid search in the beam 25 index because I I + know the inverted index has", "tokens": [50696, 321, 11617, 365, 264, 365, 264, + 13051, 3164, 294, 264, 14269, 3552, 8186, 570, 286, 286, 458, 264, 38969, 8186, + 575, 50956], "temperature": 0.0, "avg_logprob": -0.14204045523584416, "compression_ratio": + 
1.8254545454545454, "no_speech_prob": 0.00039322589873336256}, {"id": 387, "seek": + 231198, "start": 2323.82, "end": 2329.42, "text": " been explored because we have + this uh like neuro symbolic search where you would annotate properties", "tokens": + [50956, 668, 24016, 570, 321, 362, 341, 2232, 411, 16499, 25755, 3164, 689, 291, + 576, 25339, 473, 7221, 51236], "temperature": 0.0, "avg_logprob": -0.14204045523584416, + "compression_ratio": 1.8254545454545454, "no_speech_prob": 0.00039322589873336256}, + {"id": 388, "seek": 231198, "start": 2329.42, "end": 2334.7, "text": " like you + you''re searching through let''s say you have a billion sneaker images but you''ve + also labeled", "tokens": [51236, 411, 291, 291, 434, 10808, 807, 718, 311, 584, + 291, 362, 257, 5218, 9244, 4003, 5267, 457, 291, 600, 611, 21335, 51500], "temperature": + 0.0, "avg_logprob": -0.14204045523584416, "compression_ratio": 1.8254545454545454, + "no_speech_prob": 0.00039322589873336256}, {"id": 389, "seek": 231198, "start": + 2335.26, "end": 2341.1, "text": " the color they are so you have red is the color + and then you can use that to filter the search so", "tokens": [51528, 264, 2017, + 436, 366, 370, 291, 362, 2182, 307, 264, 2017, 293, 550, 291, 393, 764, 300, 281, + 6608, 264, 3164, 370, 51820], "temperature": 0.0, "avg_logprob": -0.14204045523584416, + "compression_ratio": 1.8254545454545454, "no_speech_prob": 0.00039322589873336256}, + {"id": 390, "seek": 234110, "start": 2341.1, "end": 2346.2999999999997, "text": + " there''s definitely been some foundation in pre filtering and integrating uh these + kind of symbolic", "tokens": [50364, 456, 311, 2138, 668, 512, 7030, 294, 659, 30822, + 293, 26889, 2232, 613, 733, 295, 25755, 50624], "temperature": 0.0, "avg_logprob": + -0.16630622796845018, "compression_ratio": 1.719298245614035, "no_speech_prob": + 0.0005538312834687531}, {"id": 391, "seek": 234110, "start": 2346.2999999999997, + "end": 2350.86, "text": " inverted indexes 
with h and s w so it''s not like the + first time we''ve yet it''s ever exploring that", "tokens": [50624, 38969, 8186, + 279, 365, 276, 293, 262, 261, 370, 309, 311, 406, 411, 264, 700, 565, 321, 600, + 1939, 309, 311, 1562, 12736, 300, 50852], "temperature": 0.0, "avg_logprob": -0.16630622796845018, + "compression_ratio": 1.719298245614035, "no_speech_prob": 0.0005538312834687531}, + {"id": 392, "seek": 234110, "start": 2350.86, "end": 2356.7799999999997, "text": + " but I yeah there''s definitely nuances with the beam 25 because of the cardinality + of how many terms", "tokens": [50852, 457, 286, 1338, 456, 311, 2138, 38775, 365, + 264, 14269, 3552, 570, 295, 264, 2920, 259, 1860, 295, 577, 867, 2115, 51148], "temperature": + 0.0, "avg_logprob": -0.16630622796845018, "compression_ratio": 1.719298245614035, + "no_speech_prob": 0.0005538312834687531}, {"id": 393, "seek": 234110, "start": 2356.7799999999997, + "end": 2362.22, "text": " you like with the document I think you''re splitting it + I don''t know 300 words right like 300", "tokens": [51148, 291, 411, 365, 264, 4166, + 286, 519, 291, 434, 30348, 309, 286, 500, 380, 458, 6641, 2283, 558, 411, 6641, + 51420], "temperature": 0.0, "avg_logprob": -0.16630622796845018, "compression_ratio": + 1.719298245614035, "no_speech_prob": 0.0005538312834687531}, {"id": 394, "seek": + 234110, "start": 2362.22, "end": 2369.1, "text": " 300 words per property so the + just the size of it um I mean starting to go into the thinking around", "tokens": + [51420, 6641, 2283, 680, 4707, 370, 264, 445, 264, 2744, 295, 309, 1105, 286, 914, + 2891, 281, 352, 666, 264, 1953, 926, 51764], "temperature": 0.0, "avg_logprob": + -0.16630622796845018, "compression_ratio": 1.719298245614035, "no_speech_prob": + 0.0005538312834687531}, {"id": 395, "seek": 236910, "start": 2369.1, "end": 2373.66, + "text": " like the sizes of things it inspired me when when you''re mentioning the + compression bottleneck from", "tokens": [50364, 411, 264, 
11602, 295, 721, 309, + 7547, 385, 562, 562, 291, 434, 18315, 264, 19355, 44641, 547, 490, 50592], "temperature": + 0.0, "avg_logprob": -0.1977031628290812, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0011094966903328896}, {"id": 396, "seek": 236910, "start": 2373.66, + "end": 2380.06, "text": " sparse to dense I was thinking like okay let''s say we + have 384 dimensional vectors that have 32 bits", "tokens": [50592, 637, 11668, 281, + 18011, 286, 390, 1953, 411, 1392, 718, 311, 584, 321, 362, 12843, 19, 18795, 18875, + 300, 362, 8858, 9239, 50912], "temperature": 0.0, "avg_logprob": -0.1977031628290812, + "compression_ratio": 1.6629213483146068, "no_speech_prob": 0.0011094966903328896}, + {"id": 397, "seek": 236910, "start": 2380.06, "end": 2391.02, "text": " per uh vector + position like what is that is that is that 384 or 324 or 32 or you know like that", + "tokens": [50912, 680, 2232, 8062, 2535, 411, 437, 307, 300, 307, 300, 307, 300, + 12843, 19, 420, 8858, 19, 420, 8858, 420, 291, 458, 411, 300, 51460], "temperature": + 0.0, "avg_logprob": -0.1977031628290812, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0011094966903328896}, {"id": 398, "seek": 239102, "start": 2391.1, + "end": 2399.5, "text": " it''s still a massive common tutorial space right yes exactly + exactly and and and you said like", "tokens": [50368, 309, 311, 920, 257, 5994, + 2689, 7073, 1901, 558, 2086, 2293, 2293, 293, 293, 293, 291, 848, 411, 50788], "temperature": + 0.0, "avg_logprob": -0.17165842721628588, "compression_ratio": 1.768181818181818, + "no_speech_prob": 0.0045879557728767395}, {"id": 399, "seek": 239102, "start": 2399.5, + "end": 2405.82, "text": " is it even the model that captures everything we need + to capture right it in all of these are numbers", "tokens": [50788, 307, 309, 754, + 264, 2316, 300, 27986, 1203, 321, 643, 281, 7983, 558, 309, 294, 439, 295, 613, + 366, 3547, 51104], "temperature": 0.0, "avg_logprob": -0.17165842721628588, 
"compression_ratio": + 1.768181818181818, "no_speech_prob": 0.0045879557728767395}, {"id": 400, "seek": + 239102, "start": 2405.82, "end": 2411.1, "text": " of course it''s kind of number + representation of the models understanding well understanding in", "tokens": [51104, + 295, 1164, 309, 311, 733, 295, 1230, 10290, 295, 264, 5245, 3701, 731, 3701, 294, + 51368], "temperature": 0.0, "avg_logprob": -0.17165842721628588, "compression_ratio": + 1.768181818181818, "no_speech_prob": 0.0045879557728767395}, {"id": 401, "seek": + 239102, "start": 2411.1, "end": 2419.18, "text": " quotes of of the objects that + we index uh but I guess like for me like um and you''re way ahead in", "tokens": + [51368, 19963, 295, 295, 264, 6565, 300, 321, 8186, 2232, 457, 286, 2041, 411, 337, + 385, 411, 1105, 293, 291, 434, 636, 2286, 294, 51772], "temperature": 0.0, "avg_logprob": + -0.17165842721628588, "compression_ratio": 1.768181818181818, "no_speech_prob": + 0.0045879557728767395}, {"id": 402, "seek": 241918, "start": 2419.18, "end": 2427.2599999999998, + "text": " this I feel like that uh with VBA development like um of me you know what + matters to me when I was", "tokens": [50364, 341, 286, 841, 411, 300, 2232, 365, + 691, 9295, 3250, 411, 1105, 295, 385, 291, 458, 437, 7001, 281, 385, 562, 286, 390, + 50768], "temperature": 0.0, "avg_logprob": -0.09768871684650798, "compression_ratio": + 1.6694214876033058, "no_speech_prob": 0.0009292639442719519}, {"id": 403, "seek": + 241918, "start": 2427.2599999999998, "end": 2434.3799999999997, "text": " like a + search engineer day to day is what tools not necessarily tools as in specific programs + but like", "tokens": [50768, 411, 257, 3164, 11403, 786, 281, 786, 307, 437, 3873, + 406, 4725, 3873, 382, 294, 2685, 4268, 457, 411, 51124], "temperature": 0.0, "avg_logprob": + -0.09768871684650798, "compression_ratio": 1.6694214876033058, "no_speech_prob": + 0.0009292639442719519}, {"id": 404, "seek": 241918, "start": 2434.3799999999997, 
+ "end": 2441.74, "text": " tools as in algorithms approaches I have to control the + process right so if somebody comes up and says", "tokens": [51124, 3873, 382, 294, + 14642, 11587, 286, 362, 281, 1969, 264, 1399, 558, 370, 498, 2618, 1487, 493, 293, + 1619, 51492], "temperature": 0.0, "avg_logprob": -0.09768871684650798, "compression_ratio": + 1.6694214876033058, "no_speech_prob": 0.0009292639442719519}, {"id": 405, "seek": + 241918, "start": 2441.74, "end": 2447.4199999999996, "text": " hey can you look + in this query can you debug it first of all like explain queries one brilliant way", + "tokens": [51492, 4177, 393, 291, 574, 294, 341, 14581, 393, 291, 24083, 309, 700, + 295, 439, 411, 2903, 24109, 472, 10248, 636, 51776], "temperature": 0.0, "avg_logprob": + -0.09768871684650798, "compression_ratio": 1.6694214876033058, "no_speech_prob": + 0.0009292639442719519}, {"id": 406, "seek": 244742, "start": 2447.58, "end": 2452.3, + "text": " of doing it and that''s where you start but then once you understood aha + there is a problem that it", "tokens": [50372, 295, 884, 309, 293, 300, 311, 689, + 291, 722, 457, 550, 1564, 291, 7320, 47340, 456, 307, 257, 1154, 300, 309, 50608], + "temperature": 0.0, "avg_logprob": -0.09644802411397298, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.0033991020172834396}, {"id": 407, "seek": 244742, "start": 2452.3, + "end": 2459.26, "text": " hits this field or I give too much of a boost uh in this + situation what should I do so you start like", "tokens": [50608, 8664, 341, 2519, + 420, 286, 976, 886, 709, 295, 257, 9194, 2232, 294, 341, 2590, 437, 820, 286, 360, + 370, 291, 722, 411, 50956], "temperature": 0.0, "avg_logprob": -0.09644802411397298, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.0033991020172834396}, + {"id": 408, "seek": 244742, "start": 2459.26, "end": 2464.54, "text": " tweaking + these parameters and you have these tools in your hands right you can do that in + vector search", 
"tokens": [50956, 6986, 2456, 613, 9834, 293, 291, 362, 613, 3873, + 294, 428, 2377, 558, 291, 393, 360, 300, 294, 8062, 3164, 51220], "temperature": + 0.0, "avg_logprob": -0.09644802411397298, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.0033991020172834396}, {"id": 409, "seek": 244742, "start": 2464.54, + "end": 2470.62, "text": " I I don''t know like I have like probably fine tuning + as one tool right so like if clip stops working", "tokens": [51220, 286, 286, 500, + 380, 458, 411, 286, 362, 411, 1391, 2489, 15164, 382, 472, 2290, 558, 370, 411, + 498, 7353, 10094, 1364, 51524], "temperature": 0.0, "avg_logprob": -0.09644802411397298, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.0033991020172834396}, + {"id": 410, "seek": 247062, "start": 2470.7, "end": 2478.8599999999997, "text": + " on these images I can go and fine tune or bird um but what else do I have like + I can also tune some", "tokens": [50368, 322, 613, 5267, 286, 393, 352, 293, 2489, + 10864, 420, 5255, 1105, 457, 437, 1646, 360, 286, 362, 411, 286, 393, 611, 10864, + 512, 50776], "temperature": 0.0, "avg_logprob": -0.19422662024404488, "compression_ratio": + 1.7117903930131004, "no_speech_prob": 0.0026084675919264555}, {"id": 411, "seek": + 247062, "start": 2478.8599999999997, "end": 2485.5, "text": " parameters in hnsw + or gskn and so or something I can make all these thousand nodes reachable like", + "tokens": [50776, 9834, 294, 276, 77, 82, 86, 420, 290, 5161, 77, 293, 370, 420, + 746, 286, 393, 652, 439, 613, 4714, 13891, 2524, 712, 411, 51108], "temperature": + 0.0, "avg_logprob": -0.19422662024404488, "compression_ratio": 1.7117903930131004, + "no_speech_prob": 0.0026084675919264555}, {"id": 412, "seek": 247062, "start": 2485.5, + "end": 2492.94, "text": " they didn''t this can and I can choose disk over RAM if + I want to save on you know on cost and stuff", "tokens": [51108, 436, 994, 380, + 341, 393, 293, 286, 393, 2826, 12355, 670, 14561, 498, 286, 
528, 281, 3155, 322, + 291, 458, 322, 2063, 293, 1507, 51480], "temperature": 0.0, "avg_logprob": -0.19422662024404488, + "compression_ratio": 1.7117903930131004, "no_speech_prob": 0.0026084675919264555}, + {"id": 413, "seek": 247062, "start": 2492.94, "end": 2499.66, "text": " but what + else do I have as a control to actually go and debug and fix that specific query + like", "tokens": [51480, 457, 437, 1646, 360, 286, 362, 382, 257, 1969, 281, 767, + 352, 293, 24083, 293, 3191, 300, 2685, 14581, 411, 51816], "temperature": 0.0, "avg_logprob": + -0.19422662024404488, "compression_ratio": 1.7117903930131004, "no_speech_prob": + 0.0026084675919264555}, {"id": 414, "seek": 249966, "start": 2499.8199999999997, + "end": 2507.74, "text": " what has been your experience on that or maybe thinking + uh yeah I think you''ve named them all", "tokens": [50372, 437, 575, 668, 428, 1752, + 322, 300, 420, 1310, 1953, 2232, 1338, 286, 519, 291, 600, 4926, 552, 439, 50768], + "temperature": 0.0, "avg_logprob": -0.14236458672417535, "compression_ratio": 1.6578947368421053, + "no_speech_prob": 0.0008320417255163193}, {"id": 415, "seek": 249966, "start": 2510.62, + "end": 2518.94, "text": " I mean I know I''ve seen like um like the tuning of the + EF construction as you mentioned with hnsw and", "tokens": [50912, 286, 914, 286, + 458, 286, 600, 1612, 411, 1105, 411, 264, 15164, 295, 264, 462, 37, 6435, 382, 291, + 2835, 365, 276, 77, 82, 86, 293, 51328], "temperature": 0.0, "avg_logprob": -0.14236458672417535, + "compression_ratio": 1.6578947368421053, "no_speech_prob": 0.0008320417255163193}, + {"id": 416, "seek": 249966, "start": 2518.94, "end": 2522.8599999999997, "text": + " I guess something that I''m really excited about with these beer benchmarks and + maybe I can", "tokens": [51328, 286, 2041, 746, 300, 286, 478, 534, 2919, 466, 365, + 613, 8795, 43751, 293, 1310, 286, 393, 51524], "temperature": 0.0, "avg_logprob": + -0.14236458672417535, "compression_ratio": 
1.6578947368421053, "no_speech_prob": + 0.0008320417255163193}, {"id": 417, "seek": 249966, "start": 2522.8599999999997, + "end": 2526.94, "text": " introduce it now because I think it helps with this idea + of model selection in terms of the", "tokens": [51524, 5366, 309, 586, 570, 286, + 519, 309, 3665, 365, 341, 1558, 295, 2316, 9450, 294, 2115, 295, 264, 51728], "temperature": + 0.0, "avg_logprob": -0.14236458672417535, "compression_ratio": 1.6578947368421053, + "no_speech_prob": 0.0008320417255163193}, {"id": 418, "seek": 252694, "start": 2526.94, + "end": 2531.7400000000002, "text": " user''s perspective on how can I debug my system + how do I fix my search system so the beer", "tokens": [50364, 4195, 311, 4585, 322, + 577, 393, 286, 24083, 452, 1185, 577, 360, 286, 3191, 452, 3164, 1185, 370, 264, + 8795, 50604], "temperature": 0.0, "avg_logprob": -0.18612978751199288, "compression_ratio": + 1.7715355805243447, "no_speech_prob": 0.0014933764468878508}, {"id": 419, "seek": + 252694, "start": 2531.7400000000002, "end": 2537.18, "text": " benchmarks is it''s + about diverse text retrieval so you know it''s like arguana NF corpus track", "tokens": + [50604, 43751, 307, 309, 311, 466, 9521, 2487, 19817, 3337, 370, 291, 458, 309, + 311, 411, 3882, 84, 2095, 13576, 1181, 31624, 2837, 50876], "temperature": 0.0, + "avg_logprob": -0.18612978751199288, "compression_ratio": 1.7715355805243447, "no_speech_prob": + 0.0014933764468878508}, {"id": 420, "seek": 252694, "start": 2537.18, "end": 2542.54, + "text": " covid is the difference is instead of saying that the search image net + is going to be ms marco", "tokens": [50876, 25616, 307, 264, 2649, 307, 2602, 295, + 1566, 300, 264, 3164, 3256, 2533, 307, 516, 281, 312, 275, 82, 1849, 1291, 51144], + "temperature": 0.0, "avg_logprob": -0.18612978751199288, "compression_ratio": 1.7715355805243447, + "no_speech_prob": 0.0014933764468878508}, {"id": 421, "seek": 252694, "start": 2542.54, + "end": 2547.42, "text": " which is 
you know like 10 million being passages and like + a million labeled query so it''s like the", "tokens": [51144, 597, 307, 291, 458, + 411, 1266, 2459, 885, 31589, 293, 411, 257, 2459, 21335, 14581, 370, 309, 311, 411, + 264, 51388], "temperature": 0.0, "avg_logprob": -0.18612978751199288, "compression_ratio": + 1.7715355805243447, "no_speech_prob": 0.0014933764468878508}, {"id": 422, "seek": + 252694, "start": 2547.42, "end": 2552.62, "text": " image net idea of like this + general source of it like image net is like a massive collection of", "tokens": + [51388, 3256, 2533, 1558, 295, 411, 341, 2674, 4009, 295, 309, 411, 3256, 2533, + 307, 411, 257, 5994, 5765, 295, 51648], "temperature": 0.0, "avg_logprob": -0.18612978751199288, + "compression_ratio": 1.7715355805243447, "no_speech_prob": 0.0014933764468878508}, + {"id": 423, "seek": 255262, "start": 2552.7, "end": 2558.22, "text": " images labeled + in a bunch of categories so it''s like it''s like is ms marco the search image net", + "tokens": [50368, 5267, 21335, 294, 257, 3840, 295, 10479, 370, 309, 311, 411, 309, + 311, 411, 307, 275, 82, 1849, 1291, 264, 3164, 3256, 2533, 50644], "temperature": + 0.0, "avg_logprob": -0.1820291356837496, "compression_ratio": 1.7117117117117118, + "no_speech_prob": 0.001304032513871789}, {"id": 424, "seek": 255262, "start": 2558.22, + "end": 2562.94, "text": " but it seems like instead we''re going for diversity with + beer and I think also if we all if we", "tokens": [50644, 457, 309, 2544, 411, 2602, + 321, 434, 516, 337, 8811, 365, 8795, 293, 286, 519, 611, 498, 321, 439, 498, 321, + 50880], "temperature": 0.0, "avg_logprob": -0.1820291356837496, "compression_ratio": + 1.7117117117117118, "no_speech_prob": 0.001304032513871789}, {"id": 425, "seek": + 255262, "start": 2562.94, "end": 2569.02, "text": " want to talk about intent intents + and instructions further I think actually beer is I think beer had", "tokens": [50880, + 528, 281, 751, 466, 8446, 560, 791, 293, 9415, 
3052, 286, 519, 767, 8795, 307, 286, + 519, 8795, 632, 51184], "temperature": 0.0, "avg_logprob": -0.1820291356837496, + "compression_ratio": 1.7117117117117118, "no_speech_prob": 0.001304032513871789}, + {"id": 426, "seek": 255262, "start": 2569.02, "end": 2577.74, "text": " another + there''s latte like L O T T E capital T''s like they go they go beverages right + so", "tokens": [51184, 1071, 456, 311, 37854, 411, 441, 422, 314, 314, 462, 4238, + 314, 311, 411, 436, 352, 436, 352, 47401, 558, 370, 51620], "temperature": 0.0, + "avg_logprob": -0.1820291356837496, "compression_ratio": 1.7117117117117118, "no_speech_prob": + 0.001304032513871789}, {"id": 427, "seek": 257774, "start": 2577.8199999999997, + "end": 2584.7799999999997, "text": " yeah so there''s like an equivalent to beer + and then there''s also miracle which is for", "tokens": [50368, 1338, 370, 456, + 311, 411, 364, 10344, 281, 8795, 293, 550, 456, 311, 611, 14660, 597, 307, 337, + 50716], "temperature": 0.0, "avg_logprob": -0.10939262531421802, "compression_ratio": + 1.8825910931174088, "no_speech_prob": 0.0031901560723781586}, {"id": 428, "seek": + 257774, "start": 2584.7799999999997, "end": 2589.8999999999996, "text": " multilingual + so there''s a lot of these like diverse text retrieval and then and then it''s", + "tokens": [50716, 2120, 38219, 370, 456, 311, 257, 688, 295, 613, 411, 9521, 2487, + 19817, 3337, 293, 550, 293, 550, 309, 311, 50972], "temperature": 0.0, "avg_logprob": + -0.10939262531421802, "compression_ratio": 1.8825910931174088, "no_speech_prob": + 0.0031901560723781586}, {"id": 429, "seek": 257774, "start": 2589.8999999999996, + "end": 2594.7, "text": " expanding where you would label it with the instructions + as well and I don''t remember the names", "tokens": [50972, 14702, 689, 291, 576, + 7645, 309, 365, 264, 9415, 382, 731, 293, 286, 500, 380, 1604, 264, 5288, 51212], + "temperature": 0.0, "avg_logprob": -0.10939262531421802, "compression_ratio": 1.8825910931174088, + 
"no_speech_prob": 0.0031901560723781586}, {"id": 430, "seek": 257774, "start": 2594.7, + "end": 2598.2999999999997, "text": " of these data sets off the top of my head because + it''s very new but I know this paper called task", "tokens": [51212, 295, 613, 1412, + 6352, 766, 264, 1192, 295, 452, 1378, 570, 309, 311, 588, 777, 457, 286, 458, 341, + 3035, 1219, 5633, 51392], "temperature": 0.0, "avg_logprob": -0.10939262531421802, + "compression_ratio": 1.8825910931174088, "no_speech_prob": 0.0031901560723781586}, + {"id": 431, "seek": 257774, "start": 2598.2999999999997, "end": 2602.8599999999997, + "text": " aware retrieval with instructions and I think there''s a model another + paper with a model called", "tokens": [51392, 3650, 19817, 3337, 365, 9415, 293, + 286, 519, 456, 311, 257, 2316, 1071, 3035, 365, 257, 2316, 1219, 51620], "temperature": + 0.0, "avg_logprob": -0.10939262531421802, "compression_ratio": 1.8825910931174088, + "no_speech_prob": 0.0031901560723781586}, {"id": 432, "seek": 260286, "start": 2602.86, + "end": 2607.6600000000003, "text": " instructor so this is idea where you also label + with the intent but but anyways let me go back", "tokens": [50364, 18499, 370, 341, + 307, 1558, 689, 291, 611, 7645, 365, 264, 8446, 457, 457, 13448, 718, 385, 352, + 646, 50604], "temperature": 0.0, "avg_logprob": -0.11106062293948984, "compression_ratio": + 1.9266666666666667, "no_speech_prob": 0.007401135750114918}, {"id": 433, "seek": + 260286, "start": 2607.6600000000003, "end": 2612.46, "text": " to the focus on like + how does a user debug the search system and say how can I fix it so the idea", "tokens": + [50604, 281, 264, 1879, 322, 411, 577, 775, 257, 4195, 24083, 264, 3164, 1185, 293, + 584, 577, 393, 286, 3191, 309, 370, 264, 1558, 50844], "temperature": 0.0, "avg_logprob": + -0.11106062293948984, "compression_ratio": 1.9266666666666667, "no_speech_prob": + 0.007401135750114918}, {"id": 434, "seek": 260286, "start": 2612.46, "end": 2616.86, + 
"text": " with the beer benchmarks like one idea would be that we could test several + different models and you", "tokens": [50844, 365, 264, 8795, 43751, 411, 472, 1558, + 576, 312, 300, 321, 727, 1500, 2940, 819, 5245, 293, 291, 51064], "temperature": + 0.0, "avg_logprob": -0.11106062293948984, "compression_ratio": 1.9266666666666667, + "no_speech_prob": 0.007401135750114918}, {"id": 435, "seek": 260286, "start": 2616.86, + "end": 2622.7000000000003, "text": " could maybe say like okay well I''m building + a nutrition I''m building a nutrition search apps I''m", "tokens": [51064, 727, + 1310, 584, 411, 1392, 731, 286, 478, 2390, 257, 14718, 286, 478, 2390, 257, 14718, + 3164, 7733, 286, 478, 51356], "temperature": 0.0, "avg_logprob": -0.11106062293948984, + "compression_ratio": 1.9266666666666667, "no_speech_prob": 0.007401135750114918}, + {"id": 436, "seek": 260286, "start": 2622.7000000000003, "end": 2627.26, "text": + " like I''m like bodybuilding.com or something like that and so you would look at + the NF corpus", "tokens": [51356, 411, 286, 478, 411, 1772, 28126, 13, 1112, 420, + 746, 411, 300, 293, 370, 291, 576, 574, 412, 264, 13576, 1181, 31624, 51584], "temperature": + 0.0, "avg_logprob": -0.11106062293948984, "compression_ratio": 1.9266666666666667, + "no_speech_prob": 0.007401135750114918}, {"id": 437, "seek": 260286, "start": 2628.06, + "end": 2631.5, "text": " results and you would see the performance of the different + models and that would maybe help you", "tokens": [51624, 3542, 293, 291, 576, 536, + 264, 3389, 295, 264, 819, 5245, 293, 300, 576, 1310, 854, 291, 51796], "temperature": + 0.0, "avg_logprob": -0.11106062293948984, "compression_ratio": 1.9266666666666667, + "no_speech_prob": 0.007401135750114918}, {"id": 438, "seek": 263150, "start": 2631.5, + "end": 2637.42, "text": " take a different model off the shelf but then what you''re + saying with like fine tuning it I suspect", "tokens": [50364, 747, 257, 819, 2316, + 766, 264, 15222, 457, 
550, 437, 291, 434, 1566, 365, 411, 2489, 15164, 309, 286, + 9091, 50660], "temperature": 0.0, "avg_logprob": -0.1886826145405672, "compression_ratio": + 1.6752136752136753, "no_speech_prob": 0.0004842080525122583}, {"id": 439, "seek": + 263150, "start": 2637.42, "end": 2643.58, "text": " that fine tuning is going to + be a super powerful lever I think if you find like and maybe later", "tokens": [50660, + 300, 2489, 15164, 307, 516, 281, 312, 257, 1687, 4005, 12451, 286, 519, 498, 291, + 915, 411, 293, 1310, 1780, 50968], "temperature": 0.0, "avg_logprob": -0.1886826145405672, + "compression_ratio": 1.6752136752136753, "no_speech_prob": 0.0004842080525122583}, + {"id": 440, "seek": 263150, "start": 2645.34, "end": 2652.14, "text": " there''s + so many topics I want to talk to you about. Like with the idea of I''ve been building + a", "tokens": [51056, 456, 311, 370, 867, 8378, 286, 528, 281, 751, 281, 291, 466, + 13, 1743, 365, 264, 1558, 295, 286, 600, 668, 2390, 257, 51396], "temperature": + 0.0, "avg_logprob": -0.1886826145405672, "compression_ratio": 1.6752136752136753, + "no_speech_prob": 0.0004842080525122583}, {"id": 441, "seek": 263150, "start": 2652.14, + "end": 2656.86, "text": " Wevey-A demo of the podcast search so I''ve been taking + the Wevey-A podcast parsing the transcriptions", "tokens": [51396, 492, 5603, 12, + 32, 10723, 295, 264, 7367, 3164, 370, 286, 600, 668, 1940, 264, 492, 5603, 12, 32, + 7367, 21156, 278, 264, 24444, 626, 51632], "temperature": 0.0, "avg_logprob": -0.1886826145405672, + "compression_ratio": 1.6752136752136753, "no_speech_prob": 0.0004842080525122583}, + {"id": 442, "seek": 265686, "start": 2656.86, "end": 2662.7000000000003, "text": + " and putting them in there in my temptation to like fine tune it and start thinking + about this", "tokens": [50364, 293, 3372, 552, 294, 456, 294, 452, 30423, 281, 411, + 2489, 10864, 309, 293, 722, 1953, 466, 341, 50656], "temperature": 0.0, "avg_logprob": + -0.14244754757501382, 
"compression_ratio": 1.7818181818181817, "no_speech_prob": + 0.0026202411390841007}, {"id": 443, "seek": 265686, "start": 2663.34, "end": 2668.94, + "text": " positive negative construction for that I mean I think in general with + Wevey-A we''re kind of you", "tokens": [50688, 3353, 3671, 6435, 337, 300, 286, + 914, 286, 519, 294, 2674, 365, 492, 5603, 12, 32, 321, 434, 733, 295, 291, 50968], + "temperature": 0.0, "avg_logprob": -0.14244754757501382, "compression_ratio": 1.7818181818181817, + "no_speech_prob": 0.0026202411390841007}, {"id": 444, "seek": 265686, "start": 2668.94, + "end": 2675.02, "text": " know letting like you know we use open AI models co-hear + models, hugging face models and it''s like", "tokens": [50968, 458, 8295, 411, 291, + 458, 321, 764, 1269, 7318, 5245, 598, 12, 675, 289, 5245, 11, 41706, 1851, 5245, + 293, 309, 311, 411, 51272], "temperature": 0.0, "avg_logprob": -0.14244754757501382, + "compression_ratio": 1.7818181818181817, "no_speech_prob": 0.0026202411390841007}, + {"id": 445, "seek": 265686, "start": 2675.02, "end": 2679.82, "text": " we''re not + really training the models but it''s just such an interesting thing to tune I know + Gina AI''s", "tokens": [51272, 321, 434, 406, 534, 3097, 264, 5245, 457, 309, 311, + 445, 1270, 364, 1880, 551, 281, 10864, 286, 458, 34711, 7318, 311, 51512], "temperature": + 0.0, "avg_logprob": -0.14244754757501382, "compression_ratio": 1.7818181818181817, + "no_speech_prob": 0.0026202411390841007}, {"id": 446, "seek": 265686, "start": 2679.82, + "end": 2685.26, "text": " fine tuner is extremely interesting that I do find myself + like constantly pulled in that direction", "tokens": [51512, 2489, 4267, 260, 307, + 4664, 1880, 300, 286, 360, 915, 2059, 411, 6460, 7373, 294, 300, 3513, 51784], "temperature": + 0.0, "avg_logprob": -0.14244754757501382, "compression_ratio": 1.7818181818181817, + "no_speech_prob": 0.0026202411390841007}, {"id": 447, "seek": 268526, "start": 2685.26, + "end": 
2692.5400000000004, "text": " of like wanting to train models. Yeah absolutely + I''ve been when we when we presented Mewves", "tokens": [50364, 295, 411, 7935, + 281, 3847, 5245, 13, 865, 3122, 286, 600, 668, 562, 321, 562, 321, 8212, 376, 1023, + 977, 50728], "temperature": 0.0, "avg_logprob": -0.20249184141767787, "compression_ratio": + 1.6379310344827587, "no_speech_prob": 0.002138295443728566}, {"id": 448, "seek": + 268526, "start": 2692.5400000000004, "end": 2699.1000000000004, "text": " at Berlin + Buzzwords last year now we actually said we also have Mewver which is the component + to", "tokens": [50728, 412, 13848, 29209, 13832, 1036, 1064, 586, 321, 767, 848, + 321, 611, 362, 376, 1023, 331, 597, 307, 264, 6542, 281, 51056], "temperature": + 0.0, "avg_logprob": -0.20249184141767787, "compression_ratio": 1.6379310344827587, + "no_speech_prob": 0.002138295443728566}, {"id": 449, "seek": 268526, "start": 2699.7400000000002, + "end": 2705.1800000000003, "text": " allowing you to fine tune a model we kind of + like don''t have it for prime time but I''ve been like", "tokens": [51088, 8293, + 291, 281, 2489, 10864, 257, 2316, 321, 733, 295, 411, 500, 380, 362, 309, 337, 5835, + 565, 457, 286, 600, 668, 411, 51360], "temperature": 0.0, "avg_logprob": -0.20249184141767787, + "compression_ratio": 1.6379310344827587, "no_speech_prob": 0.002138295443728566}, + {"id": 450, "seek": 268526, "start": 2705.1800000000003, "end": 2711.5800000000004, + "text": " really fascinated kind of coding a bit of that and and checking how well + it can can work in a", "tokens": [51360, 534, 24597, 733, 295, 17720, 257, 857, + 295, 300, 293, 293, 8568, 577, 731, 309, 393, 393, 589, 294, 257, 51680], "temperature": + 0.0, "avg_logprob": -0.20249184141767787, "compression_ratio": 1.6379310344827587, + "no_speech_prob": 0.002138295443728566}, {"id": 451, "seek": 271158, "start": 2711.58, + "end": 2719.66, "text": " more generic way you know because I think fine tuner allows + you to plug 
in several models you know", "tokens": [50364, 544, 19577, 636, 291, + 458, 570, 286, 519, 2489, 4267, 260, 4045, 291, 281, 5452, 294, 2940, 5245, 291, + 458, 50768], "temperature": 0.0, "avg_logprob": -0.13279383669617356, "compression_ratio": + 1.8679245283018868, "no_speech_prob": 0.0018931633094325662}, {"id": 452, "seek": + 271158, "start": 2719.66, "end": 2724.94, "text": " and like because different models + have different inputs they have different like setting to train", "tokens": [50768, + 293, 411, 570, 819, 5245, 362, 819, 15743, 436, 362, 819, 411, 3287, 281, 3847, + 51032], "temperature": 0.0, "avg_logprob": -0.13279383669617356, "compression_ratio": + 1.8679245283018868, "no_speech_prob": 0.0018931633094325662}, {"id": 453, "seek": + 271158, "start": 2724.94, "end": 2730.2999999999997, "text": " and fine tune and + so you need to be aware of that like Clip is that is a kind of two tower in a way", + "tokens": [51032, 293, 2489, 10864, 293, 370, 291, 643, 281, 312, 3650, 295, 300, + 411, 2033, 647, 307, 300, 307, 257, 733, 295, 732, 10567, 294, 257, 636, 51300], + "temperature": 0.0, "avg_logprob": -0.13279383669617356, "compression_ratio": 1.8679245283018868, + "no_speech_prob": 0.0018931633094325662}, {"id": 454, "seek": 271158, "start": 2730.2999999999997, + "end": 2739.18, "text": " right so you do need text you do need the image but I + think I feel like coming back to the question", "tokens": [51300, 558, 370, 291, + 360, 643, 2487, 291, 360, 643, 264, 3256, 457, 286, 519, 286, 841, 411, 1348, 646, + 281, 264, 1168, 51744], "temperature": 0.0, "avg_logprob": -0.13279383669617356, + "compression_ratio": 1.8679245283018868, "no_speech_prob": 0.0018931633094325662}, + {"id": 455, "seek": 273918, "start": 2739.2599999999998, "end": 2744.54, "text": + " like what tools I have I feel like fine tuning and I feel like you agree to that + the fine tuning is", "tokens": [50368, 411, 437, 3873, 286, 362, 286, 841, 411, + 2489, 15164, 293, 286, 841, 411, 
291, 3986, 281, 300, 264, 2489, 15164, 307, 50632], + "temperature": 0.0, "avg_logprob": -0.08822442094484965, "compression_ratio": 1.9704433497536946, + "no_speech_prob": 0.00464021647349}, {"id": 456, "seek": 273918, "start": 2744.54, + "end": 2751.98, "text": " one way that should be more available to the masses should + be more available to the users in a way", "tokens": [50632, 472, 636, 300, 820, + 312, 544, 2435, 281, 264, 23935, 820, 312, 544, 2435, 281, 264, 5022, 294, 257, + 636, 51004], "temperature": 0.0, "avg_logprob": -0.08822442094484965, "compression_ratio": + 1.9704433497536946, "no_speech_prob": 0.00464021647349}, {"id": 457, "seek": 273918, + "start": 2751.98, "end": 2759.98, "text": " that they are aware of this tool and + they know you know best sort of like know how how to use them", "tokens": [51004, + 300, 436, 366, 3650, 295, 341, 2290, 293, 436, 458, 291, 458, 1151, 1333, 295, 411, + 458, 577, 577, 281, 764, 552, 51404], "temperature": 0.0, "avg_logprob": -0.08822442094484965, + "compression_ratio": 1.9704433497536946, "no_speech_prob": 0.00464021647349}, {"id": + 458, "seek": 273918, "start": 2759.98, "end": 2765.2599999999998, "text": " and + also pitfalls you may fall into and I think this is what you brilliantly described + like a year ago", "tokens": [51404, 293, 611, 10147, 18542, 291, 815, 2100, 666, + 293, 286, 519, 341, 307, 437, 291, 8695, 42580, 7619, 411, 257, 1064, 2057, 51668], + "temperature": 0.0, "avg_logprob": -0.08822442094484965, "compression_ratio": 1.9704433497536946, + "no_speech_prob": 0.00464021647349}, {"id": 459, "seek": 276526, "start": 2765.42, + "end": 2771.5, "text": " in the context of computer vision like data augmentation + right so like it''s one thing that you can feed", "tokens": [50372, 294, 264, 4319, + 295, 3820, 5201, 411, 1412, 14501, 19631, 558, 370, 411, 309, 311, 472, 551, 300, + 291, 393, 3154, 50676], "temperature": 0.0, "avg_logprob": -0.09499218820155352, + "compression_ratio": 
1.8254716981132075, "no_speech_prob": 0.004000214394181967}, + {"id": 460, "seek": 276526, "start": 2773.5800000000004, "end": 2779.26, "text": + " you can feed some manual examples but how far you can go and like in your basketball + example", "tokens": [50780, 291, 393, 3154, 512, 9688, 5110, 457, 577, 1400, 291, + 393, 352, 293, 411, 294, 428, 11767, 1365, 51064], "temperature": 0.0, "avg_logprob": + -0.09499218820155352, "compression_ratio": 1.8254716981132075, "no_speech_prob": + 0.004000214394181967}, {"id": 461, "seek": 276526, "start": 2779.26, "end": 2784.38, + "text": " like you''ve been manually labeling some examples like you run out of + patience in a way right", "tokens": [51064, 411, 291, 600, 668, 16945, 40244, 512, + 5110, 411, 291, 1190, 484, 295, 14826, 294, 257, 636, 558, 51320], "temperature": + 0.0, "avg_logprob": -0.09499218820155352, "compression_ratio": 1.8254716981132075, + "no_speech_prob": 0.004000214394181967}, {"id": 462, "seek": 276526, "start": 2784.38, + "end": 2791.26, "text": " okay you can hire people to do that but is that scalable + probably not and also new trends come up", "tokens": [51320, 1392, 291, 393, 11158, + 561, 281, 360, 300, 457, 307, 300, 38481, 1391, 406, 293, 611, 777, 13892, 808, + 493, 51664], "temperature": 0.0, "avg_logprob": -0.09499218820155352, "compression_ratio": + 1.8254716981132075, "no_speech_prob": 0.004000214394181967}, {"id": 463, "seek": + 279126, "start": 2791.26, "end": 2796.86, "text": " like if you take a business + specifically working on e-commerce or I don''t know full text document", "tokens": + [50364, 411, 498, 291, 747, 257, 1606, 4682, 1364, 322, 308, 12, 26926, 420, 286, + 500, 380, 458, 1577, 2487, 4166, 50644], "temperature": 0.0, "avg_logprob": -0.20285599807213092, + "compression_ratio": 1.6380090497737556, "no_speech_prob": 0.00987403653562069}, + {"id": 464, "seek": 279126, "start": 2796.86, "end": 2802.38, "text": " search you + know things come up every week maybe right so 
like I don''t know Tesla releasing", + "tokens": [50644, 3164, 291, 458, 721, 808, 493, 633, 1243, 1310, 558, 370, 411, + 286, 500, 380, 458, 13666, 16327, 50920], "temperature": 0.0, "avg_logprob": -0.20285599807213092, + "compression_ratio": 1.6380090497737556, "no_speech_prob": 0.00987403653562069}, + {"id": 465, "seek": 279126, "start": 2802.38, "end": 2807.6600000000003, "text": + " cyber truck and you don''t have it in the in the model so it actually like in + your example", "tokens": [50920, 13411, 5898, 293, 291, 500, 380, 362, 309, 294, + 264, 294, 264, 2316, 370, 309, 767, 411, 294, 428, 1365, 51184], "temperature": + 0.0, "avg_logprob": -0.20285599807213092, "compression_ratio": 1.6380090497737556, + "no_speech_prob": 0.00987403653562069}, {"id": 466, "seek": 279126, "start": 2808.2200000000003, + "end": 2815.42, "text": " what was it with the ocean and like yeah I hear you say + like how to catch an Alaska", "tokens": [51212, 437, 390, 309, 365, 264, 7810, 293, + 411, 1338, 286, 1568, 291, 584, 411, 577, 281, 3745, 364, 19553, 51572], "temperature": + 0.0, "avg_logprob": -0.20285599807213092, "compression_ratio": 1.6380090497737556, + "no_speech_prob": 0.00987403653562069}, {"id": 467, "seek": 281542, "start": 2815.58, + "end": 2822.06, "text": " Pollock and then let''s pretend that Alaska Pollock is + a new fish that like you maybe with", "tokens": [50372, 31304, 1560, 293, 550, 718, + 311, 11865, 300, 19553, 31304, 1560, 307, 257, 777, 3506, 300, 411, 291, 1310, 365, + 50696], "temperature": 0.0, "avg_logprob": -0.18749222225613063, "compression_ratio": + 1.6755555555555555, "no_speech_prob": 0.004145725630223751}, {"id": 468, "seek": + 281542, "start": 2822.06, "end": 2828.38, "text": " vector search you may try to + find what could be the most similar object but it may also be wrong", "tokens": + [50696, 8062, 3164, 291, 815, 853, 281, 915, 437, 727, 312, 264, 881, 2531, 2657, + 457, 309, 815, 611, 312, 2085, 51012], "temperature": 0.0, 
"avg_logprob": -0.18749222225613063, + "compression_ratio": 1.6755555555555555, "no_speech_prob": 0.004145725630223751}, + {"id": 469, "seek": 281542, "start": 2828.38, "end": 2836.38, "text": " right or + in the case when the distance is so big that it doesn''t make sense anymore to consider", + "tokens": [51012, 558, 420, 294, 264, 1389, 562, 264, 4560, 307, 370, 955, 300, + 309, 1177, 380, 652, 2020, 3602, 281, 1949, 51412], "temperature": 0.0, "avg_logprob": + -0.18749222225613063, "compression_ratio": 1.6755555555555555, "no_speech_prob": + 0.004145725630223751}, {"id": 470, "seek": 281542, "start": 2836.38, "end": 2842.86, + "text": " this as a candidate right so yeah so this is this is very interesting + like and I hear that you", "tokens": [51412, 341, 382, 257, 11532, 558, 370, 1338, + 370, 341, 307, 341, 307, 588, 1880, 411, 293, 286, 1568, 300, 291, 51736], "temperature": + 0.0, "avg_logprob": -0.18749222225613063, "compression_ratio": 1.6755555555555555, + "no_speech_prob": 0.004145725630223751}, {"id": 471, "seek": 284286, "start": 2843.34, + "end": 2851.02, "text": " you really want to like dive into fine tuning topic as + well right yeah well that idea is", "tokens": [50388, 291, 534, 528, 281, 411, 9192, + 666, 2489, 15164, 4829, 382, 731, 558, 1338, 731, 300, 1558, 307, 50772], "temperature": + 0.0, "avg_logprob": -0.12928794897519624, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.0054238829761743546}, {"id": 472, "seek": 284286, "start": 2851.02, + "end": 2856.3, "text": " amazing because there this argument and I also when I interviewed + multi-peach he gave me these", "tokens": [50772, 2243, 570, 456, 341, 6770, 293, + 286, 611, 562, 286, 19770, 4825, 12, 494, 608, 415, 2729, 385, 613, 51036], "temperature": + 0.0, "avg_logprob": -0.12928794897519624, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.0054238829761743546}, {"id": 473, "seek": 284286, "start": 2856.3, + "end": 2860.86, "text": " three reasons to favor 
the retrieve then read approach + to large language models and one of which", "tokens": [51036, 1045, 4112, 281, 2294, + 264, 30254, 550, 1401, 3109, 281, 2416, 2856, 5245, 293, 472, 295, 597, 51264], + "temperature": 0.0, "avg_logprob": -0.12928794897519624, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.0054238829761743546}, {"id": 474, "seek": 284286, "start": 2860.86, + "end": 2865.1, "text": " was this idea that you can swap out the information to + update it with new information cyber truck", "tokens": [51264, 390, 341, 1558, 300, + 291, 393, 18135, 484, 264, 1589, 281, 5623, 309, 365, 777, 1589, 13411, 5898, 51476], + "temperature": 0.0, "avg_logprob": -0.12928794897519624, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.0054238829761743546}, {"id": 475, "seek": 284286, "start": 2865.1, + "end": 2868.46, "text": " becomes a new thing and then you can put it in the context + and now the language model just has", "tokens": [51476, 3643, 257, 777, 551, 293, + 550, 291, 393, 829, 309, 294, 264, 4319, 293, 586, 264, 2856, 2316, 445, 575, 51644], + "temperature": 0.0, "avg_logprob": -0.12928794897519624, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.0054238829761743546}, {"id": 476, "seek": 286846, "start": 2868.46, + "end": 2873.5, "text": " the reason across the context but then as you say the embedding + model doesn''t know about the new", "tokens": [50364, 264, 1778, 2108, 264, 4319, + 457, 550, 382, 291, 584, 264, 12240, 3584, 2316, 1177, 380, 458, 466, 264, 777, + 50616], "temperature": 0.0, "avg_logprob": -0.13318120914956796, "compression_ratio": + 1.7397260273972603, "no_speech_prob": 0.0051767039112746716}, {"id": 477, "seek": + 286846, "start": 2873.5, "end": 2878.94, "text": " thing so the embedding model + you know also isn''t going to pick it up and so yeah I think that", "tokens": [50616, + 551, 370, 264, 12240, 3584, 2316, 291, 458, 611, 1943, 380, 516, 281, 1888, 309, + 493, 293, 370, 1338, 286, 519, 
300, 50888], "temperature": 0.0, "avg_logprob": -0.13318120914956796, + "compression_ratio": 1.7397260273972603, "no_speech_prob": 0.0051767039112746716}, + {"id": 478, "seek": 286846, "start": 2878.94, "end": 2883.7400000000002, "text": + " continuous updating one idea that I''m just incredibly excited about I haven''t + figured out how to", "tokens": [50888, 10957, 25113, 472, 1558, 300, 286, 478, 445, + 6252, 2919, 466, 286, 2378, 380, 8932, 484, 577, 281, 51128], "temperature": 0.0, + "avg_logprob": -0.13318120914956796, "compression_ratio": 1.7397260273972603, "no_speech_prob": + 0.0051767039112746716}, {"id": 479, "seek": 286846, "start": 2883.7400000000002, + "end": 2890.86, "text": " make this work yet but the idea would be you you''re the + ML ops problem of this is you need to", "tokens": [51128, 652, 341, 589, 1939, 457, + 264, 1558, 576, 312, 291, 291, 434, 264, 21601, 44663, 1154, 295, 341, 307, 291, + 643, 281, 51484], "temperature": 0.0, "avg_logprob": -0.13318120914956796, "compression_ratio": + 1.7397260273972603, "no_speech_prob": 0.0051767039112746716}, {"id": 480, "seek": + 289086, "start": 2890.94, "end": 2902.46, "text": " re-vectorize your data set which + yeah so the solution maybe is that you could vectorize like a", "tokens": [50368, + 319, 12, 303, 1672, 1125, 428, 1412, 992, 597, 1338, 370, 264, 3827, 1310, 307, + 300, 291, 727, 8062, 1125, 411, 257, 50944], "temperature": 0.0, "avg_logprob": + -0.26922842172475964, "compression_ratio": 1.5051546391752577, "no_speech_prob": + 0.0041251578368246555}, {"id": 481, "seek": 289086, "start": 2902.46, "end": 2909.6600000000003, + "text": " thousand representative documents and my hypothesis is that the proximity + graph from I want to say", "tokens": [50944, 4714, 12424, 8512, 293, 452, 17291, + 307, 300, 264, 27632, 4295, 490, 286, 528, 281, 584, 51304], "temperature": 0.0, + "avg_logprob": -0.26922842172475964, "compression_ratio": 1.5051546391752577, "no_speech_prob": + 
0.0041251578368246555}, {"id": 482, "seek": 289086, "start": 2909.6600000000003, + "end": 2915.82, "text": " Vamanum or Southern H and SW because I barely understand + graph neural networks let alone trying to", "tokens": [51304, 691, 6147, 449, 420, + 13724, 389, 293, 20346, 570, 286, 10268, 1223, 4295, 18161, 9590, 718, 3312, 1382, + 281, 51612], "temperature": 0.0, "avg_logprob": -0.26922842172475964, "compression_ratio": + 1.5051546391752577, "no_speech_prob": 0.0041251578368246555}, {"id": 483, "seek": + 291582, "start": 2915.82, "end": 2922.38, "text": " make it a hierarchical neural + network but like if it''s if it''s the proximity graph maybe you can", "tokens": + [50364, 652, 309, 257, 35250, 804, 18161, 3209, 457, 411, 498, 309, 311, 498, 309, + 311, 264, 27632, 4295, 1310, 291, 393, 50692], "temperature": 0.0, "avg_logprob": + -0.11983980451311384, "compression_ratio": 2.086206896551724, "no_speech_prob": + 0.003155354643240571}, {"id": 484, "seek": 291582, "start": 2923.1800000000003, + "end": 2928.06, "text": " it''s like it''s like it''s like a psycho again it''s + very similar to like image to image translation", "tokens": [50732, 309, 311, 411, + 309, 311, 411, 309, 311, 411, 257, 33355, 797, 309, 311, 588, 2531, 281, 411, 3256, + 281, 3256, 12853, 50976], "temperature": 0.0, "avg_logprob": -0.11983980451311384, + "compression_ratio": 2.086206896551724, "no_speech_prob": 0.003155354643240571}, + {"id": 485, "seek": 291582, "start": 2928.06, "end": 2932.7000000000003, "text": + " or any kind of you know it''s a vector space to vector space translation and so + you you know you", "tokens": [50976, 420, 604, 733, 295, 291, 458, 309, 311, 257, + 8062, 1901, 281, 8062, 1901, 12853, 293, 370, 291, 291, 458, 291, 51208], "temperature": + 0.0, "avg_logprob": -0.11983980451311384, "compression_ratio": 2.086206896551724, + "no_speech_prob": 0.003155354643240571}, {"id": 486, "seek": 291582, "start": 2932.7000000000003, + "end": 2937.1000000000004, "text": " 
input the vector output the change in vector + and so can you vectorize like a thousand and then", "tokens": [51208, 4846, 264, + 8062, 5598, 264, 1319, 294, 8062, 293, 370, 393, 291, 8062, 1125, 411, 257, 4714, + 293, 550, 51428], "temperature": 0.0, "avg_logprob": -0.11983980451311384, "compression_ratio": + 2.086206896551724, "no_speech_prob": 0.003155354643240571}, {"id": 487, "seek": + 291582, "start": 2937.1000000000004, "end": 2943.6600000000003, "text": " propagate + that throughout the graph it or throughout the corpus and maybe that proximity graph + has", "tokens": [51428, 48256, 300, 3710, 264, 4295, 309, 420, 3710, 264, 1181, + 31624, 293, 1310, 300, 27632, 4295, 575, 51756], "temperature": 0.0, "avg_logprob": + -0.11983980451311384, "compression_ratio": 2.086206896551724, "no_speech_prob": + 0.003155354643240571}, {"id": 488, "seek": 294366, "start": 2943.66, "end": 2949.8199999999997, + "text": " some kind of bias that facilitates the optimization task or maybe the + graph neural network thing is", "tokens": [50364, 512, 733, 295, 12577, 300, 10217, + 30035, 264, 19618, 5633, 420, 1310, 264, 4295, 18161, 3209, 551, 307, 50672], "temperature": + 0.0, "avg_logprob": -0.12414924109854349, "compression_ratio": 1.6398305084745763, + "no_speech_prob": 0.0008932074997574091}, {"id": 489, "seek": 294366, "start": 2949.8199999999997, + "end": 2954.2999999999997, "text": " too much overhead and you''re better off just + having like a transformer that takes into vector outputs", "tokens": [50672, 886, + 709, 19922, 293, 291, 434, 1101, 766, 445, 1419, 411, 257, 31782, 300, 2516, 666, + 8062, 23930, 50896], "temperature": 0.0, "avg_logprob": -0.12414924109854349, "compression_ratio": + 1.6398305084745763, "no_speech_prob": 0.0008932074997574091}, {"id": 490, "seek": + 294366, "start": 2954.2999999999997, "end": 2960.94, "text": " a vector but yeah + that this idea of like how do you continually update your embedding models", "tokens": + [50896, 257, 8062, 457, 
1338, 300, 341, 1558, 295, 411, 577, 360, 291, 22277, 5623, + 428, 12240, 3584, 5245, 51228], "temperature": 0.0, "avg_logprob": -0.12414924109854349, + "compression_ratio": 1.6398305084745763, "no_speech_prob": 0.0008932074997574091}, + {"id": 491, "seek": 294366, "start": 2961.58, "end": 2966.54, "text": " it''s fascinating + right yeah yeah especially the ML ops aspect of it as you''ve mentioned like", "tokens": + [51260, 309, 311, 10343, 558, 1338, 1338, 2318, 264, 21601, 44663, 4171, 295, 309, + 382, 291, 600, 2835, 411, 51508], "temperature": 0.0, "avg_logprob": -0.12414924109854349, + "compression_ratio": 1.6398305084745763, "no_speech_prob": 0.0008932074997574091}, + {"id": 492, "seek": 296654, "start": 2966.62, "end": 2974.62, "text": " if if we + were to insert new neighbors into the existing graph right would that change", "tokens": + [50368, 498, 498, 321, 645, 281, 8969, 777, 12512, 666, 264, 6741, 4295, 558, 576, + 300, 1319, 50768], "temperature": 0.0, "avg_logprob": -0.13443925163962625, "compression_ratio": + 1.6403508771929824, "no_speech_prob": 0.002118273638188839}, {"id": 493, "seek": + 296654, "start": 2975.9, "end": 2981.34, "text": " it favors something more recent + or would it like break something that we didn''t want to break and", "tokens": [50832, + 309, 40554, 746, 544, 5162, 420, 576, 309, 411, 1821, 746, 300, 321, 994, 380, 528, + 281, 1821, 293, 51104], "temperature": 0.0, "avg_logprob": -0.13443925163962625, + "compression_ratio": 1.6403508771929824, "no_speech_prob": 0.002118273638188839}, + {"id": 494, "seek": 296654, "start": 2981.34, "end": 2985.74, "text": " things but + but in some sense if you think about coming back like we are still in the realm + of", "tokens": [51104, 721, 457, 457, 294, 512, 2020, 498, 291, 519, 466, 1348, + 646, 411, 321, 366, 920, 294, 264, 15355, 295, 51324], "temperature": 0.0, "avg_logprob": + -0.13443925163962625, "compression_ratio": 1.6403508771929824, "no_speech_prob": + 0.002118273638188839}, 
{"id": 495, "seek": 296654, "start": 2985.74, "end": 2995.74, + "text": " this hybrid search topic in a way right if you look at BM25 OTF idea of + approach right so if you", "tokens": [51324, 341, 13051, 3164, 4829, 294, 257, 636, + 558, 498, 291, 574, 412, 15901, 6074, 38617, 37, 1558, 295, 3109, 558, 370, 498, + 291, 51824], "temperature": 0.0, "avg_logprob": -0.13443925163962625, "compression_ratio": + 1.6403508771929824, "no_speech_prob": 0.002118273638188839}, {"id": 496, "seek": + 299574, "start": 2995.74, "end": 3002.62, "text": " compute so you''re I so you + term frequency is only dependent on this document right so that''s fine", "tokens": + [50364, 14722, 370, 291, 434, 286, 370, 291, 1433, 7893, 307, 787, 12334, 322, 341, + 4166, 558, 370, 300, 311, 2489, 50708], "temperature": 0.0, "avg_logprob": -0.10091669419232537, + "compression_ratio": 1.8232558139534885, "no_speech_prob": 0.0011172413360327482}, + {"id": 497, "seek": 299574, "start": 3002.62, "end": 3008.62, "text": " it''s kind + of the independent of all other documents but your inverse document frequency is + dependent", "tokens": [50708, 309, 311, 733, 295, 264, 6695, 295, 439, 661, 8512, + 457, 428, 17340, 4166, 7893, 307, 12334, 51008], "temperature": 0.0, "avg_logprob": + -0.10091669419232537, "compression_ratio": 1.8232558139534885, "no_speech_prob": + 0.0011172413360327482}, {"id": 498, "seek": 299574, "start": 3008.62, "end": 3013.74, + "text": " on the whole corpus which is indexed in that chart by the way that''s + another like big topic which", "tokens": [51008, 322, 264, 1379, 1181, 31624, 597, + 307, 8186, 292, 294, 300, 6927, 538, 264, 636, 300, 311, 1071, 411, 955, 4829, 597, + 51264], "temperature": 0.0, "avg_logprob": -0.10091669419232537, "compression_ratio": + 1.8232558139534885, "no_speech_prob": 0.0011172413360327482}, {"id": 499, "seek": + 299574, "start": 3013.74, "end": 3020.8599999999997, "text": " is kind of like crossing + the boundary of is this just infrastructure 
issue in slash engineering", "tokens": + [51264, 307, 733, 295, 411, 14712, 264, 12866, 295, 307, 341, 445, 6896, 2734, 294, + 17330, 7043, 51620], "temperature": 0.0, "avg_logprob": -0.10091669419232537, "compression_ratio": + 1.8232558139534885, "no_speech_prob": 0.0011172413360327482}, {"id": 500, "seek": + 302086, "start": 3021.02, "end": 3029.1800000000003, "text": " is this kind of like + research issue and it''s like it''s fuzzy it''s it''s it''s it''s a blend and so + for", "tokens": [50372, 307, 341, 733, 295, 411, 2132, 2734, 293, 309, 311, 411, + 309, 311, 34710, 309, 311, 309, 311, 309, 311, 309, 311, 257, 10628, 293, 370, 337, + 50780], "temperature": 0.0, "avg_logprob": -0.1481343078613281, "compression_ratio": + 1.6647727272727273, "no_speech_prob": 0.0016965331742540002}, {"id": 501, "seek": + 302086, "start": 3029.1800000000003, "end": 3041.7400000000002, "text": " that chart + you''re gonna have that local idea unless you build a a higher level cache which + will", "tokens": [50780, 300, 6927, 291, 434, 799, 362, 300, 2654, 1558, 5969, 291, + 1322, 257, 257, 2946, 1496, 19459, 597, 486, 51408], "temperature": 0.0, "avg_logprob": + -0.1481343078613281, "compression_ratio": 1.6647727272727273, "no_speech_prob": + 0.0016965331742540002}, {"id": 502, "seek": 302086, "start": 3041.7400000000002, + "end": 3048.06, "text": " keep track of each individual chart''s idea and roll it + up to the global idea and like if you look", "tokens": [51408, 1066, 2837, 295, + 1184, 2609, 6927, 311, 1558, 293, 3373, 309, 493, 281, 264, 4338, 1558, 293, 411, + 498, 291, 574, 51724], "temperature": 0.0, "avg_logprob": -0.1481343078613281, "compression_ratio": + 1.6647727272727273, "no_speech_prob": 0.0016965331742540002}, {"id": 503, "seek": + 304806, "start": 3048.7799999999997, "end": 3055.74, "text": " at Apache Solar I + think I believe they had a country module or something implementing this", "tokens": + [50400, 412, 46597, 22385, 286, 519, 286, 1697, 436, 632, 
257, 1941, 10088, 420, + 746, 18114, 341, 50748], "temperature": 0.0, "avg_logprob": -0.17180637831098577, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.002612438751384616}, + {"id": 504, "seek": 304806, "start": 3057.02, "end": 3062.22, "text": " where you + can actually implement a global cache with IDF which will live on top of the", "tokens": + [50812, 689, 291, 393, 767, 4445, 257, 4338, 19459, 365, 7348, 37, 597, 486, 1621, + 322, 1192, 295, 264, 51072], "temperature": 0.0, "avg_logprob": -0.17180637831098577, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.002612438751384616}, + {"id": 505, "seek": 304806, "start": 3062.22, "end": 3067.18, "text": " chart and + now you''re coming back to MLOBS you need to make sure it never dies because if + it dies you", "tokens": [51072, 6927, 293, 586, 291, 434, 1348, 646, 281, 376, 20184, + 8176, 291, 643, 281, 652, 988, 309, 1128, 2714, 570, 498, 309, 2714, 291, 51320], + "temperature": 0.0, "avg_logprob": -0.17180637831098577, "compression_ratio": 1.5932203389830508, + "no_speech_prob": 0.002612438751384616}, {"id": 506, "seek": 304806, "start": 3067.18, + "end": 3074.7, "text": " go back to like the chart level IDF and so that becomes + dependent on okay I have managed to stuff", "tokens": [51320, 352, 646, 281, 411, + 264, 6927, 1496, 7348, 37, 293, 370, 300, 3643, 12334, 322, 1392, 286, 362, 6453, + 281, 1507, 51696], "temperature": 0.0, "avg_logprob": -0.17180637831098577, "compression_ratio": + 1.5932203389830508, "no_speech_prob": 0.002612438751384616}, {"id": 507, "seek": + 307470, "start": 3075.66, "end": 3083.2599999999998, "text": " stuff a lot of documents + about cats in this chart so the IDF is like this and then I", "tokens": [50412, + 1507, 257, 688, 295, 8512, 466, 11111, 294, 341, 6927, 370, 264, 7348, 37, 307, + 411, 341, 293, 550, 286, 50792], "temperature": 0.0, "avg_logprob": -0.1194376641131462, + "compression_ratio": 1.7757009345794392, "no_speech_prob": 
0.002432736800983548}, + {"id": 508, "seek": 307470, "start": 3083.2599999999998, "end": 3088.2999999999997, + "text": " stuff a lot of documents on dogs here so they become like unbalanced if + you if you know what I mean", "tokens": [50792, 1507, 257, 688, 295, 8512, 322, + 7197, 510, 370, 436, 1813, 411, 517, 40251, 498, 291, 498, 291, 458, 437, 286, 914, + 51044], "temperature": 0.0, "avg_logprob": -0.1194376641131462, "compression_ratio": + 1.7757009345794392, "no_speech_prob": 0.002432736800983548}, {"id": 509, "seek": + 307470, "start": 3088.2999999999997, "end": 3097.98, "text": " so they it''s not + a healthy mix of term statistics in your collection right and that will influence + a", "tokens": [51044, 370, 436, 309, 311, 406, 257, 4627, 2890, 295, 1433, 12523, + 294, 428, 5765, 558, 293, 300, 486, 6503, 257, 51528], "temperature": 0.0, "avg_logprob": + -0.1194376641131462, "compression_ratio": 1.7757009345794392, "no_speech_prob": + 0.002432736800983548}, {"id": 510, "seek": 307470, "start": 3097.98, "end": 3104.46, + "text": " lot of things like you may say in some cases it''s okay but in some other + cases it may not work", "tokens": [51528, 688, 295, 721, 411, 291, 815, 584, 294, + 512, 3331, 309, 311, 1392, 457, 294, 512, 661, 3331, 309, 815, 406, 589, 51852], + "temperature": 0.0, "avg_logprob": -0.1194376641131462, "compression_ratio": 1.7757009345794392, + "no_speech_prob": 0.002432736800983548}, {"id": 511, "seek": 310446, "start": 3104.46, + "end": 3110.78, "text": " if your query contains you know both concepts and they + are unequally represented somehow", "tokens": [50364, 498, 428, 14581, 8306, 291, + 458, 1293, 10392, 293, 436, 366, 2251, 358, 379, 10379, 6063, 50680], "temperature": + 0.0, "avg_logprob": -0.10266880872772961, "compression_ratio": 1.691588785046729, + "no_speech_prob": 0.0019075166201218963}, {"id": 512, "seek": 310446, "start": 3111.42, + "end": 3117.5, "text": " in your in your collection right so so I mean does it make + 
sense I mean so it''s like you do have", "tokens": [50712, 294, 428, 294, 428, 5765, + 558, 370, 370, 286, 914, 775, 309, 652, 2020, 286, 914, 370, 309, 311, 411, 291, + 360, 362, 51016], "temperature": 0.0, "avg_logprob": -0.10266880872772961, "compression_ratio": + 1.691588785046729, "no_speech_prob": 0.0019075166201218963}, {"id": 513, "seek": + 310446, "start": 3117.5, "end": 3122.86, "text": " limitations also and and not + limitations but maybe I should pose it in more positive way", "tokens": [51016, + 15705, 611, 293, 293, 406, 15705, 457, 1310, 286, 820, 10774, 309, 294, 544, 3353, + 636, 51284], "temperature": 0.0, "avg_logprob": -0.10266880872772961, "compression_ratio": + 1.691588785046729, "no_speech_prob": 0.0019075166201218963}, {"id": 514, "seek": + 310446, "start": 3122.86, "end": 3129.18, "text": " research tasks right so such + challenges what should we do and I hope that in some sense", "tokens": [51284, 2132, + 9608, 558, 370, 1270, 4759, 437, 820, 321, 360, 293, 286, 1454, 300, 294, 512, 2020, + 51600], "temperature": 0.0, "avg_logprob": -0.10266880872772961, "compression_ratio": + 1.691588785046729, "no_speech_prob": 0.0019075166201218963}, {"id": 515, "seek": + 312918, "start": 3129.2599999999998, "end": 3135.58, "text": " dense search is pushing + us to think more and more about this and maybe some things will back lash", "tokens": + [50368, 18011, 3164, 307, 7380, 505, 281, 519, 544, 293, 544, 466, 341, 293, 1310, + 512, 721, 486, 646, 35275, 50684], "temperature": 0.0, "avg_logprob": -0.1685879718826478, + "compression_ratio": 1.7242990654205608, "no_speech_prob": 0.001397663843818009}, + {"id": 516, "seek": 312918, "start": 3135.58, "end": 3143.74, "text": " from vector + search back to the you know classical keyword retrieval and maybe some new data", + "tokens": [50684, 490, 8062, 3164, 646, 281, 264, 291, 458, 13735, 20428, 19817, + 3337, 293, 1310, 512, 777, 1412, 51092], "temperature": 0.0, "avg_logprob": -0.1685879718826478, + 
"compression_ratio": 1.7242990654205608, "no_speech_prob": 0.001397663843818009}, + {"id": 517, "seek": 312918, "start": 3143.74, "end": 3150.14, "text": " structures + will even emerge to to tackle these things yeah I think that idea that you''re", + "tokens": [51092, 9227, 486, 754, 21511, 281, 281, 14896, 613, 721, 1338, 286, 519, + 300, 1558, 300, 291, 434, 51412], "temperature": 0.0, "avg_logprob": -0.1685879718826478, + "compression_ratio": 1.7242990654205608, "no_speech_prob": 0.001397663843818009}, + {"id": 518, "seek": 312918, "start": 3150.54, "end": 3157.74, "text": " describing + on the IDF caching it''s super interesting I think it is it is inspiring me just", + "tokens": [51432, 16141, 322, 264, 7348, 37, 269, 2834, 309, 311, 1687, 1880, 286, + 519, 309, 307, 309, 307, 15883, 385, 445, 51792], "temperature": 0.0, "avg_logprob": + -0.1685879718826478, "compression_ratio": 1.7242990654205608, "no_speech_prob": + 0.001397663843818009}, {"id": 519, "seek": 315774, "start": 3157.74, "end": 3160.9399999999996, + "text": " thinking generally about how we''re trading knowledge on this thing in + general and having this", "tokens": [50364, 1953, 5101, 466, 577, 321, 434, 9529, + 3601, 322, 341, 551, 294, 2674, 293, 1419, 341, 50524], "temperature": 0.0, "avg_logprob": + -0.11796992840153156, "compression_ratio": 1.7822878228782288, "no_speech_prob": + 0.0029816513415426016}, {"id": 520, "seek": 315774, "start": 3160.9399999999996, + "end": 3165.5, "text": " podcast and having this content and this communication + and how we''ve you know done like our first", "tokens": [50524, 7367, 293, 1419, + 341, 2701, 293, 341, 6101, 293, 577, 321, 600, 291, 458, 1096, 411, 527, 700, 50752], + "temperature": 0.0, "avg_logprob": -0.11796992840153156, "compression_ratio": 1.7822878228782288, + "no_speech_prob": 0.0029816513415426016}, {"id": 521, "seek": 315774, "start": 3165.5, + "end": 3170.7799999999997, "text": " iteration of BN25 and yeah like learning so + much about 
the index structure it is really really", "tokens": [50752, 24784, 295, + 363, 45, 6074, 293, 1338, 411, 2539, 370, 709, 466, 264, 8186, 3877, 309, 307, 534, + 534, 51016], "temperature": 0.0, "avg_logprob": -0.11796992840153156, "compression_ratio": + 1.7822878228782288, "no_speech_prob": 0.0029816513415426016}, {"id": 522, "seek": + 315774, "start": 3170.7799999999997, "end": 3176.22, "text": " interesting I was + thinking about like oh well how about displayed vectors could could we just kind", + "tokens": [51016, 1880, 286, 390, 1953, 466, 411, 1954, 731, 577, 466, 16372, 18875, + 727, 727, 321, 445, 733, 51288], "temperature": 0.0, "avg_logprob": -0.11796992840153156, + "compression_ratio": 1.7822878228782288, "no_speech_prob": 0.0029816513415426016}, + {"id": 523, "seek": 315774, "start": 3176.22, "end": 3180.2999999999997, "text": + " of update the mass language modeling head to get the new terms and would that + be easier than this", "tokens": [51288, 295, 5623, 264, 2758, 2856, 15983, 1378, + 281, 483, 264, 777, 2115, 293, 576, 300, 312, 3571, 813, 341, 51492], "temperature": + 0.0, "avg_logprob": -0.11796992840153156, "compression_ratio": 1.7822878228782288, + "no_speech_prob": 0.0029816513415426016}, {"id": 524, "seek": 318030, "start": 3180.38, + "end": 3187.5800000000004, "text": " kind of global cache I like idea and is it + more forward thinking and then yeah it''s really", "tokens": [50368, 733, 295, 4338, + 19459, 286, 411, 1558, 293, 307, 309, 544, 2128, 1953, 293, 550, 1338, 309, 311, + 534, 50728], "temperature": 0.0, "avg_logprob": -0.11349851351517898, "compression_ratio": + 1.8695652173913044, "no_speech_prob": 0.0012161463964730501}, {"id": 525, "seek": + 318030, "start": 3187.5800000000004, "end": 3192.38, "text": " interesting I think + maybe one other idea is this thing called colbert which is like a token level", + "tokens": [50728, 1880, 286, 519, 1310, 472, 661, 1558, 307, 341, 551, 1219, 1173, + 4290, 597, 307, 411, 257, 14862, 1496, 
50968], "temperature": 0.0, "avg_logprob": + -0.11349851351517898, "compression_ratio": 1.8695652173913044, "no_speech_prob": + 0.0012161463964730501}, {"id": 526, "seek": 318030, "start": 3192.38, "end": 3197.7400000000002, + "text": " representation thing where it''s like they call it late interaction where + first you do the you know", "tokens": [50968, 10290, 551, 689, 309, 311, 411, 436, + 818, 309, 3469, 9285, 689, 700, 291, 360, 264, 291, 458, 51236], "temperature": + 0.0, "avg_logprob": -0.11349851351517898, "compression_ratio": 1.8695652173913044, + "no_speech_prob": 0.0012161463964730501}, {"id": 527, "seek": 318030, "start": 3197.7400000000002, + "end": 3202.0600000000004, "text": " the standard vector search but then you keep + the token vectors for each of the vectors and", "tokens": [51236, 264, 3832, 8062, + 3164, 457, 550, 291, 1066, 264, 14862, 18875, 337, 1184, 295, 264, 18875, 293, 51452], + "temperature": 0.0, "avg_logprob": -0.11349851351517898, "compression_ratio": 1.8695652173913044, + "no_speech_prob": 0.0012161463964730501}, {"id": 528, "seek": 318030, "start": 3202.0600000000004, + "end": 3205.82, "text": " and then you do that and then they''ve had efficiency + improvements on that so like I think they", "tokens": [51452, 293, 550, 291, 360, + 300, 293, 550, 436, 600, 632, 10493, 13797, 322, 300, 370, 411, 286, 519, 436, 51640], + "temperature": 0.0, "avg_logprob": -0.11349851351517898, "compression_ratio": 1.8695652173913044, + "no_speech_prob": 0.0012161463964730501}, {"id": 529, "seek": 320582, "start": 3205.9, + "end": 3210.94, "text": " in the original colbert they they''ve recently published + this paper I know Christopher Pots and", "tokens": [50368, 294, 264, 3380, 1173, + 4290, 436, 436, 600, 3938, 6572, 341, 3035, 286, 458, 20649, 430, 1971, 293, 50620], + "temperature": 0.0, "avg_logprob": -0.2349657041836629, "compression_ratio": 1.7269372693726937, + "no_speech_prob": 0.00920638907700777}, {"id": 530, "seek": 320582, "start": 
3210.94, + "end": 3216.06, "text": " Omar Kataba I''m sorry I don''t know like I''ll show my + best like no give everyone credit all the time", "tokens": [50620, 33784, 8365, + 5509, 286, 478, 2597, 286, 500, 380, 458, 411, 286, 603, 855, 452, 1151, 411, 572, + 976, 1518, 5397, 439, 264, 565, 50876], "temperature": 0.0, "avg_logprob": -0.2349657041836629, + "compression_ratio": 1.7269372693726937, "no_speech_prob": 0.00920638907700777}, + {"id": 531, "seek": 320582, "start": 3217.7400000000002, "end": 3222.1400000000003, + "text": " but in this paper they describe it like the original colbert is like in + 154 gigabyte", "tokens": [50960, 457, 294, 341, 3035, 436, 6786, 309, 411, 264, + 3380, 1173, 4290, 307, 411, 294, 2119, 19, 8741, 34529, 51180], "temperature": 0.0, + "avg_logprob": -0.2349657041836629, "compression_ratio": 1.7269372693726937, "no_speech_prob": + 0.00920638907700777}, {"id": 532, "seek": 320582, "start": 3223.9, "end": 3229.02, + "text": " index compared to like one gigabyte with other methods and so so yeah + like efficient indexing", "tokens": [51268, 8186, 5347, 281, 411, 472, 8741, 34529, + 365, 661, 7150, 293, 370, 370, 1338, 411, 7148, 8186, 278, 51524], "temperature": + 0.0, "avg_logprob": -0.2349657041836629, "compression_ratio": 1.7269372693726937, + "no_speech_prob": 0.00920638907700777}, {"id": 533, "seek": 320582, "start": 3229.02, + "end": 3234.54, "text": " and I''m definitely running a van but it is like a big + thing to unpack there''s so much depth to", "tokens": [51524, 293, 286, 478, 2138, + 2614, 257, 3161, 457, 309, 307, 411, 257, 955, 551, 281, 26699, 456, 311, 370, 709, + 7161, 281, 51800], "temperature": 0.0, "avg_logprob": -0.2349657041836629, "compression_ratio": + 1.7269372693726937, "no_speech_prob": 0.00920638907700777}, {"id": 534, "seek": + 323454, "start": 3234.54, "end": 3239.18, "text": " this and that''s what makes + working in this field so exciting is that there''s so much opportunity", "tokens": + [50364, 
341, 293, 300, 311, 437, 1669, 1364, 294, 341, 2519, 370, 4670, 307, 300, + 456, 311, 370, 709, 2650, 50596], "temperature": 0.0, "avg_logprob": -0.11373717384000795, + "compression_ratio": 1.8052434456928839, "no_speech_prob": 0.0044932859018445015}, + {"id": 535, "seek": 323454, "start": 3239.18, "end": 3245.2599999999998, "text": + " so much to explore yeah yeah and so much unsolved as well I don''t I don''t know + if you wanted to", "tokens": [50596, 370, 709, 281, 6839, 1338, 1338, 293, 370, + 709, 2693, 29110, 382, 731, 286, 500, 380, 286, 500, 380, 458, 498, 291, 1415, 281, + 50900], "temperature": 0.0, "avg_logprob": -0.11373717384000795, "compression_ratio": + 1.8052434456928839, "no_speech_prob": 0.0044932859018445015}, {"id": 536, "seek": + 323454, "start": 3245.2599999999998, "end": 3251.58, "text": " continue a thought + oh no sorry yeah I was just yeah I mean we are branching out but like actually", + "tokens": [50900, 2354, 257, 1194, 1954, 572, 2597, 1338, 286, 390, 445, 1338, 286, + 914, 321, 366, 9819, 278, 484, 457, 411, 767, 51216], "temperature": 0.0, "avg_logprob": + -0.11373717384000795, "compression_ratio": 1.8052434456928839, "no_speech_prob": + 0.0044932859018445015}, {"id": 537, "seek": 323454, "start": 3251.58, "end": 3257.1, + "text": " one thing that you just reminded me there was a maybe I should start writing + a book or something", "tokens": [51216, 472, 551, 300, 291, 445, 15920, 385, 456, + 390, 257, 1310, 286, 820, 722, 3579, 257, 1446, 420, 746, 51492], "temperature": + 0.0, "avg_logprob": -0.11373717384000795, "compression_ratio": 1.8052434456928839, + "no_speech_prob": 0.0044932859018445015}, {"id": 538, "seek": 323454, "start": 3257.1, + "end": 3261.66, "text": " because like the moment I remember this I should write + a chapter and then keep adding and then", "tokens": [51492, 570, 411, 264, 1623, + 286, 1604, 341, 286, 820, 2464, 257, 7187, 293, 550, 1066, 5127, 293, 550, 51720], + "temperature": 0.0, "avg_logprob": 
-0.11373717384000795, "compression_ratio": 1.8052434456928839, + "no_speech_prob": 0.0044932859018445015}, {"id": 539, "seek": 326166, "start": 3261.66, + "end": 3269.3399999999997, "text": " publish it maybe you can be my author or something + yeah that was just thinking it was was it like", "tokens": [50364, 11374, 309, 1310, + 291, 393, 312, 452, 3793, 420, 746, 1338, 300, 390, 445, 1953, 309, 390, 390, 309, + 411, 50748], "temperature": 0.0, "avg_logprob": -0.165740966796875, "compression_ratio": + 1.56, "no_speech_prob": 0.0037940277252346277}, {"id": 540, "seek": 326166, "start": + 3269.3399999999997, "end": 3274.8599999999997, "text": " 10 years ago on Berlin + buzzwords there was a presentation by one of the engineers at Twitter I don''t", + "tokens": [50748, 1266, 924, 2057, 322, 13848, 13036, 13832, 456, 390, 257, 5860, + 538, 472, 295, 264, 11955, 412, 5794, 286, 500, 380, 51024], "temperature": 0.0, + "avg_logprob": -0.165740966796875, "compression_ratio": 1.56, "no_speech_prob": + 0.0037940277252346277}, {"id": 541, "seek": 326166, "start": 3274.8599999999997, + "end": 3281.3399999999997, "text": " know if he''s still a Twitter and I forgot + his name I remember he was German working out from", "tokens": [51024, 458, 498, + 415, 311, 920, 257, 5794, 293, 286, 5298, 702, 1315, 286, 1604, 415, 390, 6521, + 1364, 484, 490, 51348], "temperature": 0.0, "avg_logprob": -0.165740966796875, "compression_ratio": + 1.56, "no_speech_prob": 0.0037940277252346277}, {"id": 542, "seek": 326166, "start": + 3281.3399999999997, "end": 3289.8199999999997, "text": " San Francisco and he basically + coming back to that issue with you know sorted document ideas right", "tokens": + [51348, 5271, 12279, 293, 415, 1936, 1348, 646, 281, 300, 2734, 365, 291, 458, 25462, + 4166, 3487, 558, 51772], "temperature": 0.0, "avg_logprob": -0.165740966796875, + "compression_ratio": 1.56, "no_speech_prob": 0.0037940277252346277}, {"id": 543, + "seek": 329166, "start": 3292.14, "end": 
3298.2999999999997, "text": " what they + did at Twitter first of all you know the scale of Twitter is such that you cannot + possibly store", "tokens": [50388, 437, 436, 630, 412, 5794, 700, 295, 439, 291, + 458, 264, 4373, 295, 5794, 307, 1270, 300, 291, 2644, 6264, 3531, 50696], "temperature": + 0.0, "avg_logprob": -0.11831770081450974, "compression_ratio": 1.6424581005586592, + "no_speech_prob": 0.0027366860304027796}, {"id": 544, "seek": 329166, "start": 3299.8999999999996, + "end": 3306.3799999999997, "text": " Lucine index on disk and then go and retrieve + it because well it''s just way too slow right", "tokens": [50776, 9593, 533, 8186, + 322, 12355, 293, 550, 352, 293, 30254, 309, 570, 731, 309, 311, 445, 636, 886, 2964, + 558, 51100], "temperature": 0.0, "avg_logprob": -0.11831770081450974, "compression_ratio": + 1.6424581005586592, "no_speech_prob": 0.0027366860304027796}, {"id": 545, "seek": + 329166, "start": 3308.2999999999997, "end": 3315.42, "text": " what they did is + that they moved the whole index into memory right so they had to rewrite Lucine", + "tokens": [51196, 437, 436, 630, 307, 300, 436, 4259, 264, 1379, 8186, 666, 4675, + 558, 370, 436, 632, 281, 28132, 9593, 533, 51552], "temperature": 0.0, "avg_logprob": + -0.11831770081450974, "compression_ratio": 1.6424581005586592, "no_speech_prob": + 0.0027366860304027796}, {"id": 546, "seek": 331542, "start": 3316.14, "end": 3324.14, + "text": " to kind of like this memory friendly data structure and one thing they + did in particular", "tokens": [50400, 281, 733, 295, 411, 341, 4675, 9208, 1412, + 3877, 293, 472, 551, 436, 630, 294, 1729, 50800], "temperature": 0.0, "avg_logprob": + -0.12157423794269562, "compression_ratio": 1.6467065868263473, "no_speech_prob": + 0.002458452247083187}, {"id": 547, "seek": 331542, "start": 3324.14, "end": 3329.82, + "text": " is that as tweets come in each tweet is a document it gets its unique + document they deem", "tokens": [50800, 307, 300, 382, 25671, 808, 
294, 1184, 15258, + 307, 257, 4166, 309, 2170, 1080, 3845, 4166, 436, 368, 443, 51084], "temperature": + 0.0, "avg_logprob": -0.12157423794269562, "compression_ratio": 1.6467065868263473, + "no_speech_prob": 0.002458452247083187}, {"id": 548, "seek": 331542, "start": 3330.94, + "end": 3341.66, "text": " and they would append this new document ID to the postings + list in the end right so for this term", "tokens": [51140, 293, 436, 576, 34116, + 341, 777, 4166, 7348, 281, 264, 2183, 1109, 1329, 294, 264, 917, 558, 370, 337, + 341, 1433, 51676], "temperature": 0.0, "avg_logprob": -0.12157423794269562, "compression_ratio": + 1.6467065868263473, "no_speech_prob": 0.002458452247083187}, {"id": 549, "seek": + 334166, "start": 3341.74, "end": 3347.98, "text": " so they would decompose it into + terms back and then they would know okay I need now to update that", "tokens": [50368, + 370, 436, 576, 22867, 541, 309, 666, 2115, 646, 293, 550, 436, 576, 458, 1392, 286, + 643, 586, 281, 5623, 300, 50680], "temperature": 0.0, "avg_logprob": -0.1141032636835334, + "compression_ratio": 1.8894230769230769, "no_speech_prob": 0.0034279569517821074}, + {"id": 550, "seek": 334166, "start": 3347.98, "end": 3354.54, "text": " specific + terms posting list so the posting list is just the array of dock IDs so they would + put", "tokens": [50680, 2685, 2115, 15978, 1329, 370, 264, 15978, 1329, 307, 445, + 264, 10225, 295, 20929, 48212, 370, 436, 576, 829, 51008], "temperature": 0.0, "avg_logprob": + -0.1141032636835334, "compression_ratio": 1.8894230769230769, "no_speech_prob": + 0.0034279569517821074}, {"id": 551, "seek": 334166, "start": 3354.54, "end": 3362.2999999999997, + "text": " that Twitter tweets dock ID in the end and as the new searcher comes in + searching tweets they would", "tokens": [51008, 300, 5794, 25671, 20929, 7348, 294, + 264, 917, 293, 382, 264, 777, 3164, 260, 1487, 294, 10808, 25671, 436, 576, 51396], + "temperature": 0.0, "avg_logprob": -0.1141032636835334, 
"compression_ratio": 1.8894230769230769, + "no_speech_prob": 0.0034279569517821074}, {"id": 552, "seek": 334166, "start": 3362.2999999999997, + "end": 3369.3399999999997, "text": " read backwards from the end they wouldn''t + read from the beginning of so basically what they did is", "tokens": [51396, 1401, + 12204, 490, 264, 917, 436, 2759, 380, 1401, 490, 264, 2863, 295, 370, 1936, 437, + 436, 630, 307, 51748], "temperature": 0.0, "avg_logprob": -0.1141032636835334, "compression_ratio": + 1.8894230769230769, "no_speech_prob": 0.0034279569517821074}, {"id": 553, "seek": + 336934, "start": 3369.34, "end": 3377.1800000000003, "text": " that they kind of + like encoded the temporal nature of tweets and people what end users wanting to", + "tokens": [50364, 300, 436, 733, 295, 411, 2058, 12340, 264, 30881, 3687, 295, 25671, + 293, 561, 437, 917, 5022, 7935, 281, 50756], "temperature": 0.0, "avg_logprob": + -0.1760385943130708, "compression_ratio": 1.6954022988505748, "no_speech_prob": + 0.0023509797174483538}, {"id": 554, "seek": 336934, "start": 3377.1800000000003, + "end": 3382.1400000000003, "text": " search and view the tweets which are the most + fresh so like like I don''t know if like you are the", "tokens": [50756, 3164, 293, + 1910, 264, 25671, 597, 366, 264, 881, 4451, 370, 411, 411, 286, 500, 380, 458, 498, + 411, 291, 366, 264, 51004], "temperature": 0.0, "avg_logprob": -0.1760385943130708, + "compression_ratio": 1.6954022988505748, "no_speech_prob": 0.0023509797174483538}, + {"id": 555, "seek": 336934, "start": 3382.1400000000003, "end": 3393.02, "text": + " heavy user of Twitter I do you know like on Twitter like when I log in and I check + my timeline like", "tokens": [51004, 4676, 4195, 295, 5794, 286, 360, 291, 458, + 411, 322, 5794, 411, 562, 286, 3565, 294, 293, 286, 1520, 452, 12933, 411, 51548], + "temperature": 0.0, "avg_logprob": -0.1760385943130708, "compression_ratio": 1.6954022988505748, + "no_speech_prob": 0.0023509797174483538}, {"id": 556, 
"seek": 339302, "start": 3393.02, + "end": 3399.66, "text": " usually I see something super fresh and then I keep scrolling + but like not like anti-props to", "tokens": [50364, 2673, 286, 536, 746, 1687, 4451, + 293, 550, 286, 1066, 29053, 457, 411, 406, 411, 6061, 12, 4318, 1878, 281, 50696], + "temperature": 0.0, "avg_logprob": -0.10612969813139542, "compression_ratio": 1.6885964912280702, + "no_speech_prob": 0.003580134827643633}, {"id": 557, "seek": 339302, "start": 3399.66, + "end": 3404.94, "text": " Twitter but it''s it''s a nightmare to search on Twitter + like when I search something I know existed", "tokens": [50696, 5794, 457, 309, + 311, 309, 311, 257, 18724, 281, 3164, 322, 5794, 411, 562, 286, 3164, 746, 286, + 458, 13135, 50960], "temperature": 0.0, "avg_logprob": -0.10612969813139542, "compression_ratio": + 1.6885964912280702, "no_speech_prob": 0.003580134827643633}, {"id": 558, "seek": + 339302, "start": 3404.94, "end": 3413.18, "text": " like a week ago there is no + way for me to find it unless I know the exact tweet ID right and so at", "tokens": + [50960, 411, 257, 1243, 2057, 456, 307, 572, 636, 337, 385, 281, 915, 309, 5969, + 286, 458, 264, 1900, 15258, 7348, 558, 293, 370, 412, 51372], "temperature": 0.0, + "avg_logprob": -0.10612969813139542, "compression_ratio": 1.6885964912280702, "no_speech_prob": + 0.003580134827643633}, {"id": 559, "seek": 339302, "start": 3413.18, "end": 3419.18, + "text": " some point I was even indexing tweets actually direct messages I had with + few people you know", "tokens": [51372, 512, 935, 286, 390, 754, 8186, 278, 25671, + 767, 2047, 7897, 286, 632, 365, 1326, 561, 291, 458, 51672], "temperature": 0.0, + "avg_logprob": -0.10612969813139542, "compression_ratio": 1.6885964912280702, "no_speech_prob": + 0.003580134827643633}, {"id": 560, "seek": 341918, "start": 3419.5, "end": 3426.8599999999997, + "text": " in solar and then basically searching them so because it was way faster + than searching them on", 
"tokens": [50380, 294, 7936, 293, 550, 1936, 10808, 552, + 370, 570, 309, 390, 636, 4663, 813, 10808, 552, 322, 50748], "temperature": 0.0, + "avg_logprob": -0.10192070450893669, "compression_ratio": 1.7217391304347827, "no_speech_prob": + 0.006082917097955942}, {"id": 561, "seek": 341918, "start": 3426.8599999999997, + "end": 3432.94, "text": " Twitter because like if you have 5,000 direct messages + scrolling through them will take half a day", "tokens": [50748, 5794, 570, 411, + 498, 291, 362, 1025, 11, 1360, 2047, 7897, 29053, 807, 552, 486, 747, 1922, 257, + 786, 51052], "temperature": 0.0, "avg_logprob": -0.10192070450893669, "compression_ratio": + 1.7217391304347827, "no_speech_prob": 0.006082917097955942}, {"id": 562, "seek": + 341918, "start": 3432.94, "end": 3437.74, "text": " so because they keep loading + and loading so basically what I''m trying to say is that they optimize the", "tokens": + [51052, 370, 570, 436, 1066, 15114, 293, 15114, 370, 1936, 437, 286, 478, 1382, + 281, 584, 307, 300, 436, 19719, 264, 51292], "temperature": 0.0, "avg_logprob": + -0.10192070450893669, "compression_ratio": 1.7217391304347827, "no_speech_prob": + 0.006082917097955942}, {"id": 563, "seek": 341918, "start": 3437.74, "end": 3443.98, + "text": " data structure for the nature of usage of Twitter in such a way that they + bias to the recent tweets", "tokens": [51292, 1412, 3877, 337, 264, 3687, 295, 14924, + 295, 5794, 294, 1270, 257, 636, 300, 436, 12577, 281, 264, 5162, 25671, 51604], + "temperature": 0.0, "avg_logprob": -0.10192070450893669, "compression_ratio": 1.7217391304347827, + "no_speech_prob": 0.006082917097955942}, {"id": 564, "seek": 344398, "start": 3443.98, + "end": 3450.78, "text": " and they don''t care if you will have to spend a day retrieving + like super all tweet like it''s", "tokens": [50364, 293, 436, 500, 380, 1127, 498, + 291, 486, 362, 281, 3496, 257, 786, 19817, 798, 411, 1687, 439, 15258, 411, 309, + 311, 50704], "temperature": 0.0, 
"avg_logprob": -0.15530645030818574, "compression_ratio": + 1.6132596685082874, "no_speech_prob": 0.0012151519767940044}, {"id": 565, "seek": + 344398, "start": 3450.78, "end": 3457.26, "text": " like so my new user use case + for them for the majority of users 99% of users will only want to see", "tokens": + [50704, 411, 370, 452, 777, 4195, 764, 1389, 337, 552, 337, 264, 6286, 295, 5022, + 11803, 4, 295, 5022, 486, 787, 528, 281, 536, 51028], "temperature": 0.0, "avg_logprob": + -0.15530645030818574, "compression_ratio": 1.6132596685082874, "no_speech_prob": + 0.0012151519767940044}, {"id": 566, "seek": 344398, "start": 3457.26, "end": 3465.1, + "text": " and consume the latest thing so in some sense this is kind of the effect + of optimizing to the usage", "tokens": [51028, 293, 14732, 264, 6792, 551, 370, + 294, 512, 2020, 341, 307, 733, 295, 264, 1802, 295, 40425, 281, 264, 14924, 51420], + "temperature": 0.0, "avg_logprob": -0.15530645030818574, "compression_ratio": 1.6132596685082874, + "no_speech_prob": 0.0012151519767940044}, {"id": 567, "seek": 346510, "start": 3465.3399999999997, + "end": 3474.46, "text": " like what you say we could optimize you know like split + or or similar you know sparse lllm or something", "tokens": [50376, 411, 437, 291, + 584, 321, 727, 19719, 291, 458, 411, 7472, 420, 420, 2531, 291, 458, 637, 11668, + 287, 285, 76, 420, 746, 50832], "temperature": 0.0, "avg_logprob": -0.16317145029703775, + "compression_ratio": 1.7085714285714286, "no_speech_prob": 0.030532527714967728}, + {"id": 568, "seek": 346510, "start": 3475.18, "end": 3482.86, "text": " to kind + of like learn you know that latest beat and maybe there is a high chance of it being", + "tokens": [50868, 281, 733, 295, 411, 1466, 291, 458, 300, 6792, 4224, 293, 1310, + 456, 307, 257, 1090, 2931, 295, 309, 885, 51252], "temperature": 0.0, "avg_logprob": + -0.16317145029703775, "compression_ratio": 1.7085714285714286, "no_speech_prob": + 0.030532527714967728}, {"id": 569, 
"seek": 346510, "start": 3482.86, "end": 3488.38, + "text": " retrieved as well so we might as well bias the system to that but then + of course there is catastrophic", "tokens": [51252, 19817, 937, 382, 731, 370, 321, + 1062, 382, 731, 12577, 264, 1185, 281, 300, 457, 550, 295, 1164, 456, 307, 34915, + 51528], "temperature": 0.0, "avg_logprob": -0.16317145029703775, "compression_ratio": + 1.7085714285714286, "no_speech_prob": 0.030532527714967728}, {"id": 570, "seek": + 348838, "start": 3488.38, "end": 3497.1800000000003, "text": " forgetting thing + and stuff like that. Yeah there''s no is yes not an easy problem to continually + update", "tokens": [50364, 25428, 551, 293, 1507, 411, 300, 13, 865, 456, 311, 572, + 307, 2086, 406, 364, 1858, 1154, 281, 22277, 5623, 50804], "temperature": 0.0, "avg_logprob": + -0.22320730968188213, "compression_ratio": 1.7456140350877194, "no_speech_prob": + 0.03158165514469147}, {"id": 571, "seek": 348838, "start": 3497.1800000000003, "end": + 3503.02, "text": " the mlm head either it would be maybe worth adding that this + mlm head insplay doesn''t need to be like", "tokens": [50804, 264, 23271, 76, 1378, + 2139, 309, 576, 312, 1310, 3163, 5127, 300, 341, 23271, 76, 1378, 1028, 2858, 1177, + 380, 643, 281, 312, 411, 51096], "temperature": 0.0, "avg_logprob": -0.22320730968188213, + "compression_ratio": 1.7456140350877194, "no_speech_prob": 0.03158165514469147}, + {"id": 572, "seek": 348838, "start": 3503.02, "end": 3508.78, "text": " a billion + parameters well maybe a billion would be good but it doesn''t need to be a hundred + billion", "tokens": [51096, 257, 5218, 9834, 731, 1310, 257, 5218, 576, 312, 665, + 457, 309, 1177, 380, 643, 281, 312, 257, 3262, 5218, 51384], "temperature": 0.0, + "avg_logprob": -0.22320730968188213, "compression_ratio": 1.7456140350877194, "no_speech_prob": + 0.03158165514469147}, {"id": 573, "seek": 348838, "start": 3508.78, "end": 3513.42, + "text": " or like yeah that''s such a fascinating nugget 
of system design you just + shared at the Twitter", "tokens": [51384, 420, 411, 1338, 300, 311, 1270, 257, 10343, + 30279, 847, 295, 1185, 1715, 291, 445, 5507, 412, 264, 5794, 51616], "temperature": + 0.0, "avg_logprob": -0.22320730968188213, "compression_ratio": 1.7456140350877194, + "no_speech_prob": 0.03158165514469147}, {"id": 574, "seek": 351342, "start": 3513.82, + "end": 3519.7400000000002, "text": " thing and yeah it''s really interesting I''ve + seen this other company called perplexity AI that", "tokens": [50384, 551, 293, + 1338, 309, 311, 534, 1880, 286, 600, 1612, 341, 661, 2237, 1219, 680, 18945, 507, + 7318, 300, 50680], "temperature": 0.0, "avg_logprob": -0.23074384706210246, "compression_ratio": + 1.6821428571428572, "no_speech_prob": 0.04397216811776161}, {"id": 575, "seek": + 351342, "start": 3519.7400000000002, "end": 3526.46, "text": " Ravine Shrinivas + is I think he''s a founder CEO of it and it''s cool because he was he worked on", + "tokens": [50680, 497, 706, 533, 1160, 12629, 24759, 307, 286, 519, 415, 311, 257, + 14917, 9282, 295, 309, 293, 309, 311, 1627, 570, 415, 390, 415, 2732, 322, 51016], + "temperature": 0.0, "avg_logprob": -0.23074384706210246, "compression_ratio": 1.6821428571428572, + "no_speech_prob": 0.04397216811776161}, {"id": 576, "seek": 351342, "start": 3526.46, + "end": 3531.58, "text": " curl with Peter Rebele on this contrastive representation + learning for robotics where they''re", "tokens": [51016, 22591, 365, 6508, 1300, + 650, 306, 322, 341, 8712, 488, 10290, 2539, 337, 34145, 689, 436, 434, 51272], "temperature": + 0.0, "avg_logprob": -0.23074384706210246, "compression_ratio": 1.6821428571428572, + "no_speech_prob": 0.04397216811776161}, {"id": 577, "seek": 351342, "start": 3531.58, + "end": 3535.98, "text": " you know they''re doing the same kind of idea vector optimization + to learn a state space for", "tokens": [51272, 291, 458, 436, 434, 884, 264, 912, + 733, 295, 1558, 8062, 19618, 281, 1466, 257, 1785, 
1901, 337, 51492], "temperature": + 0.0, "avg_logprob": -0.23074384706210246, "compression_ratio": 1.6821428571428572, + "no_speech_prob": 0.04397216811776161}, {"id": 578, "seek": 351342, "start": 3535.98, + "end": 3539.42, "text": " robotic control on so I think it''s really cool that now + he''s working on the search space too but", "tokens": [51492, 30468, 1969, 322, + 370, 286, 519, 309, 311, 534, 1627, 300, 586, 415, 311, 1364, 322, 264, 3164, 1901, + 886, 457, 51664], "temperature": 0.0, "avg_logprob": -0.23074384706210246, "compression_ratio": + 1.6821428571428572, "no_speech_prob": 0.04397216811776161}, {"id": 579, "seek": + 353942, "start": 3540.2200000000003, "end": 3546.38, "text": " they have it''s like + the other approach is like natural language to SQL it''s something like that where", + "tokens": [50404, 436, 362, 309, 311, 411, 264, 661, 3109, 307, 411, 3303, 2856, + 281, 19200, 309, 311, 746, 411, 300, 689, 50712], "temperature": 0.0, "avg_logprob": + -0.12814520741557026, "compression_ratio": 1.7757847533632287, "no_speech_prob": + 0.002866260940209031}, {"id": 580, "seek": 353942, "start": 3546.38, "end": 3551.82, + "text": " like instead of and I''m getting a little off topic but it''s like kind + of related to Twitter and", "tokens": [50712, 411, 2602, 295, 293, 286, 478, 1242, + 257, 707, 766, 4829, 457, 309, 311, 411, 733, 295, 4077, 281, 5794, 293, 50984], + "temperature": 0.0, "avg_logprob": -0.12814520741557026, "compression_ratio": 1.7757847533632287, + "no_speech_prob": 0.002866260940209031}, {"id": 581, "seek": 353942, "start": 3551.82, + "end": 3557.42, "text": " it''s about like putting tweets into you know data stores + and then parsing natural language queries", "tokens": [50984, 309, 311, 466, 411, + 3372, 25671, 666, 291, 458, 1412, 9512, 293, 550, 21156, 278, 3303, 2856, 24109, + 51264], "temperature": 0.0, "avg_logprob": -0.12814520741557026, "compression_ratio": + 1.7757847533632287, "no_speech_prob": 0.002866260940209031}, 
{"id": 582, "seek": + 353942, "start": 3557.42, "end": 3563.7400000000002, "text": " into the SQL but + so that''s like another idea I guess is like you would parse the query yeah I think", + "tokens": [51264, 666, 264, 19200, 457, 370, 300, 311, 411, 1071, 1558, 286, 2041, + 307, 411, 291, 576, 48377, 264, 14581, 1338, 286, 519, 51580], "temperature": 0.0, + "avg_logprob": -0.12814520741557026, "compression_ratio": 1.7757847533632287, "no_speech_prob": + 0.002866260940209031}, {"id": 583, "seek": 356374, "start": 3563.74, "end": 3568.9399999999996, + "text": " I''m already explaining what do you think about that idea like you you + take the query and you turn", "tokens": [50364, 286, 478, 1217, 13468, 437, 360, + 291, 519, 466, 300, 1558, 411, 291, 291, 747, 264, 14581, 293, 291, 1261, 50624], + "temperature": 0.0, "avg_logprob": -0.18142625057336056, "compression_ratio": 1.7399103139013452, + "no_speech_prob": 0.0068636685609817505}, {"id": 584, "seek": 356374, "start": 3568.9399999999996, + "end": 3576.9399999999996, "text": " it into an SQL query in like that''s it yes + yes I know what you mean it''s like it''s very similar I", "tokens": [50624, 309, + 666, 364, 19200, 14581, 294, 411, 300, 311, 309, 2086, 2086, 286, 458, 437, 291, + 914, 309, 311, 411, 309, 311, 588, 2531, 286, 51024], "temperature": 0.0, "avg_logprob": + -0.18142625057336056, "compression_ratio": 1.7399103139013452, "no_speech_prob": + 0.0068636685609817505}, {"id": 585, "seek": 356374, "start": 3576.9399999999996, + "end": 3585.1, "text": " think deep said did that right so you can or maybe it''s + opposite I''m not sure but like if you have a", "tokens": [51024, 519, 2452, 848, + 630, 300, 558, 370, 291, 393, 420, 1310, 309, 311, 6182, 286, 478, 406, 988, 457, + 411, 498, 291, 362, 257, 51432], "temperature": 0.0, "avg_logprob": -0.18142625057336056, + "compression_ratio": 1.7399103139013452, "no_speech_prob": 0.0068636685609817505}, + {"id": 586, "seek": 356374, "start": 
3586.4599999999996, "end": 3592.7, "text": + " probably the same if you have like a table right you know with fields and rows + I don''t know", "tokens": [51500, 1391, 264, 912, 498, 291, 362, 411, 257, 3199, + 558, 291, 458, 365, 7909, 293, 13241, 286, 500, 380, 458, 51812], "temperature": + 0.0, "avg_logprob": -0.18142625057336056, "compression_ratio": 1.7399103139013452, + "no_speech_prob": 0.0068636685609817505}, {"id": 587, "seek": 359270, "start": 3592.7, + "end": 3600.3799999999997, "text": " let''s say list of mountains with their heights + and so on so you can actually have a question what", "tokens": [50364, 718, 311, + 584, 1329, 295, 10233, 365, 641, 25930, 293, 370, 322, 370, 291, 393, 767, 362, + 257, 1168, 437, 50748], "temperature": 0.0, "avg_logprob": -0.1367633125998757, + "compression_ratio": 1.4057971014492754, "no_speech_prob": 0.0006770919426344335}, + {"id": 588, "seek": 359270, "start": 3600.3799999999997, "end": 3608.62, "text": + " is the tallest mountain in Europe or Asia you could turn that query in natural + language into SQL", "tokens": [50748, 307, 264, 42075, 6937, 294, 3315, 420, 10038, + 291, 727, 1261, 300, 14581, 294, 3303, 2856, 666, 19200, 51160], "temperature": + 0.0, "avg_logprob": -0.1367633125998757, "compression_ratio": 1.4057971014492754, + "no_speech_prob": 0.0006770919426344335}, {"id": 589, "seek": 360862, "start": 3608.62, + "end": 3616.7, "text": " command and say you know select you know mountains from + this mountains table", "tokens": [50364, 5622, 293, 584, 291, 458, 3048, 291, 458, + 10233, 490, 341, 10233, 3199, 50768], "temperature": 0.0, "avg_logprob": -0.18023523124488625, + "compression_ratio": 1.4396551724137931, "no_speech_prob": 0.020022448152303696}, + {"id": 590, "seek": 360862, "start": 3619.5, "end": 3630.14, "text": " order by + height reverse right descending and so I like this idea and in fact actually I''ve", + "tokens": [50908, 1668, 538, 6681, 9943, 558, 40182, 293, 370, 286, 411, 341, 1558, + 
293, 294, 1186, 767, 286, 600, 51440], "temperature": 0.0, "avg_logprob": -0.18023523124488625, + "compression_ratio": 1.4396551724137931, "no_speech_prob": 0.020022448152303696}, + {"id": 591, "seek": 363014, "start": 3631.1, "end": 3637.98, "text": " I think first + of all this is already doable right so I''m fine just stood with like with the deep", + "tokens": [50412, 286, 519, 700, 295, 439, 341, 307, 1217, 41183, 558, 370, 286, + 478, 2489, 445, 9371, 365, 411, 365, 264, 2452, 50756], "temperature": 0.0, "avg_logprob": + -0.19213569770425054, "compression_ratio": 1.4753086419753085, "no_speech_prob": + 0.003279930679127574}, {"id": 592, "seek": 363014, "start": 3637.98, "end": 3643.74, + "text": " set doing that in haystack but I also came across this idea", "tokens": + [50756, 992, 884, 300, 294, 4842, 372, 501, 457, 286, 611, 1361, 2108, 341, 1558, + 51044], "temperature": 0.0, "avg_logprob": -0.19213569770425054, "compression_ratio": + 1.4753086419753085, "no_speech_prob": 0.003279930679127574}, {"id": 593, "seek": + 363014, "start": 3646.94, "end": 3654.7799999999997, "text": " during my PhD research + because so the problem there I believe was that it was like", "tokens": [51204, + 1830, 452, 14476, 2132, 570, 370, 264, 1154, 456, 286, 1697, 390, 300, 309, 390, + 411, 51596], "temperature": 0.0, "avg_logprob": -0.19213569770425054, "compression_ratio": + 1.4753086419753085, "no_speech_prob": 0.003279930679127574}, {"id": 594, "seek": + 365478, "start": 3655.7400000000002, "end": 3664.78, "text": " these engineers working + on building aircrafts and so they had to read a ton of manuals", "tokens": [50412, + 613, 11955, 1364, 322, 2390, 9465, 82, 293, 370, 436, 632, 281, 1401, 257, 2952, + 295, 9688, 82, 50864], "temperature": 0.0, "avg_logprob": -0.10964665105265956, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.0014665609924122691}, + {"id": 595, "seek": 365478, "start": 3665.6600000000003, "end": 3671.26, "text": + " but once you read 
the manual you still need to go and look up that specific number + somewhere in", "tokens": [50908, 457, 1564, 291, 1401, 264, 9688, 291, 920, 643, + 281, 352, 293, 574, 493, 300, 2685, 1230, 4079, 294, 51188], "temperature": 0.0, + "avg_logprob": -0.10964665105265956, "compression_ratio": 1.631578947368421, "no_speech_prob": + 0.0014665609924122691}, {"id": 596, "seek": 365478, "start": 3671.26, "end": 3678.3, + "text": " the database right so so basically they do like multiple multiple hop + approach and that may take", "tokens": [51188, 264, 8149, 558, 370, 370, 1936, 436, + 360, 411, 3866, 3866, 3818, 3109, 293, 300, 815, 747, 51540], "temperature": 0.0, + "avg_logprob": -0.10964665105265956, "compression_ratio": 1.631578947368421, "no_speech_prob": + 0.0014665609924122691}, {"id": 597, "seek": 367830, "start": 3678.38, "end": 3684.86, + "text": " like forever like you first of all you need to crunch through a ton of + you know text material and then", "tokens": [50368, 411, 5680, 411, 291, 700, 295, + 439, 291, 643, 281, 13386, 807, 257, 2952, 295, 291, 458, 2487, 2527, 293, 550, + 50692], "temperature": 0.0, "avg_logprob": -0.0821292241414388, "compression_ratio": + 1.720524017467249, "no_speech_prob": 0.0033134587574750185}, {"id": 598, "seek": + 367830, "start": 3684.86, "end": 3689.6600000000003, "text": " somehow summarize + it and then okay now I need to go and look up that that number in the database", + "tokens": [50692, 6063, 20858, 309, 293, 550, 1392, 586, 286, 643, 281, 352, 293, + 574, 493, 300, 300, 1230, 294, 264, 8149, 50932], "temperature": 0.0, "avg_logprob": + -0.0821292241414388, "compression_ratio": 1.720524017467249, "no_speech_prob": 0.0033134587574750185}, + {"id": 599, "seek": 367830, "start": 3689.6600000000003, "end": 3697.42, "text": + " but what if you could ask a natural language question to the manuals then convert + that to a SQL", "tokens": [50932, 457, 437, 498, 291, 727, 1029, 257, 3303, 2856, + 1168, 281, 264, 9688, 82, 550, 
7620, 300, 281, 257, 19200, 51320], "temperature": + 0.0, "avg_logprob": -0.0821292241414388, "compression_ratio": 1.720524017467249, + "no_speech_prob": 0.0033134587574750185}, {"id": 600, "seek": 367830, "start": 3697.42, + "end": 3703.5, "text": " command which would know to go and look up in that specific + database table and give you the answer", "tokens": [51320, 5622, 597, 576, 458, + 281, 352, 293, 574, 493, 294, 300, 2685, 8149, 3199, 293, 976, 291, 264, 1867, 51624], + "temperature": 0.0, "avg_logprob": -0.0821292241414388, "compression_ratio": 1.720524017467249, + "no_speech_prob": 0.0033134587574750185}, {"id": 601, "seek": 370350, "start": 3703.5, + "end": 3707.26, "text": " so like the manual doesn''t have it but it has some instructions + how to find it", "tokens": [50364, 370, 411, 264, 9688, 1177, 380, 362, 309, 457, + 309, 575, 512, 9415, 577, 281, 915, 309, 50552], "temperature": 0.0, "avg_logprob": + -0.14709170452960127, "compression_ratio": 1.6443298969072164, "no_speech_prob": + 0.004668624140322208}, {"id": 602, "seek": 370350, "start": 3708.06, "end": 3713.42, + "text": " and then you would kind of like convert that into through this metal language + convert that into", "tokens": [50592, 293, 550, 291, 576, 733, 295, 411, 7620, 300, + 666, 807, 341, 5760, 2856, 7620, 300, 666, 50860], "temperature": 0.0, "avg_logprob": + -0.14709170452960127, "compression_ratio": 1.6443298969072164, "no_speech_prob": + 0.004668624140322208}, {"id": 603, "seek": 370350, "start": 3713.42, "end": 3720.62, + "text": " SQL and then get that answer right and this was like pre-dense retrieval + in era obviously", "tokens": [50860, 19200, 293, 550, 483, 300, 1867, 558, 293, + 341, 390, 411, 659, 12, 67, 1288, 19817, 3337, 294, 4249, 2745, 51220], "temperature": + 0.0, "avg_logprob": -0.14709170452960127, "compression_ratio": 1.6443298969072164, + "no_speech_prob": 0.004668624140322208}, {"id": 604, "seek": 370350, "start": 3720.62, + "end": 3724.38, "text": " but I 
think I still feel like it has the merit to like", + "tokens": [51220, 457, 286, 519, 286, 920, 841, 411, 309, 575, 264, 24527, 281, + 411, 51408], "temperature": 0.0, "avg_logprob": -0.14709170452960127, "compression_ratio": + 1.6443298969072164, "no_speech_prob": 0.004668624140322208}, {"id": 605, "seek": + 372438, "start": 3724.54, "end": 3735.9, "text": " well I guess two things I think + first there''s this problem where you search like for error line manual", "tokens": + [50372, 731, 286, 2041, 732, 721, 286, 519, 700, 456, 311, 341, 1154, 689, 291, + 3164, 411, 337, 6713, 1622, 9688, 50940], "temperature": 0.0, "avg_logprob": -0.1503126621246338, + "compression_ratio": 1.730593607305936, "no_speech_prob": 0.012029404751956463}, + {"id": 606, "seek": 372438, "start": 3735.9, "end": 3741.9, "text": " some specific + detail and it''s like in result seven like it almost got it like it''s not like", + "tokens": [50940, 512, 2685, 2607, 293, 309, 311, 411, 294, 1874, 3407, 411, 309, + 1920, 658, 309, 411, 309, 311, 406, 411, 51240], "temperature": 0.0, "avg_logprob": + -0.1503126621246338, "compression_ratio": 1.730593607305936, "no_speech_prob": 0.012029404751956463}, + {"id": 607, "seek": 372438, "start": 3741.9, "end": 3747.34, "text": " not in the + top 100 but it''s seven and to that problem is where I think this GPT index", "tokens": + [51240, 406, 294, 264, 1192, 2319, 457, 309, 311, 3407, 293, 281, 300, 1154, 307, + 689, 286, 519, 341, 26039, 51, 8186, 51512], "temperature": 0.0, "avg_logprob": + -0.1503126621246338, "compression_ratio": 1.730593607305936, "no_speech_prob": 0.012029404751956463}, + {"id": 608, "seek": 372438, "start": 3748.38, "end": 3753.42, "text": " like recursive + summarization or create and refine summarization I think that''ll solve that problem", + "tokens": [51564, 411, 20560, 488, 14611, 2144, 420, 1884, 293, 33906, 14611, 2144, + 286, 519, 300, 603, 5039, 300, 1154, 51816], "temperature": 0.0, "avg_logprob": + 
-0.1503126621246338, "compression_ratio": 1.730593607305936, "no_speech_prob": 0.012029404751956463}, + {"id": 609, "seek": 375438, "start": 3755.1, "end": 3760.2200000000003, "text": + " and yeah well so I then coming back to this idea of natural language to SQL and + like structured", "tokens": [50400, 293, 1338, 731, 370, 286, 550, 1348, 646, 281, + 341, 1558, 295, 3303, 2856, 281, 19200, 293, 411, 18519, 50656], "temperature": + 0.0, "avg_logprob": -0.0855837253609089, "compression_ratio": 1.8653061224489795, + "no_speech_prob": 0.001480748294852674}, {"id": 610, "seek": 375438, "start": 3760.2200000000003, + "end": 3765.5, "text": " unstructured data on the other end you can also parse the + tables into text and so I''ve seen that", "tokens": [50656, 18799, 46847, 1412, + 322, 264, 661, 917, 291, 393, 611, 48377, 264, 8020, 666, 2487, 293, 370, 286, 600, + 1612, 300, 50920], "temperature": 0.0, "avg_logprob": -0.0855837253609089, "compression_ratio": + 1.8653061224489795, "no_speech_prob": 0.001480748294852674}, {"id": 611, "seek": + 375438, "start": 3765.5, "end": 3771.42, "text": " done too there''s like wiki tables + to text and so me personally my favorite application is", "tokens": [50920, 1096, + 886, 456, 311, 411, 261, 9850, 8020, 281, 2487, 293, 370, 385, 5665, 452, 2954, + 3861, 307, 51216], "temperature": 0.0, "avg_logprob": -0.0855837253609089, "compression_ratio": + 1.8653061224489795, "no_speech_prob": 0.001480748294852674}, {"id": 612, "seek": + 375438, "start": 3772.2200000000003, "end": 3775.5, "text": " is scientific literature + mining and searching through scientific papers and so", "tokens": [51256, 307, 8134, + 10394, 15512, 293, 10808, 807, 8134, 10577, 293, 370, 51420], "temperature": 0.0, + "avg_logprob": -0.0855837253609089, "compression_ratio": 1.8653061224489795, "no_speech_prob": + 0.001480748294852674}, {"id": 613, "seek": 375438, "start": 3776.2200000000003, + "end": 3782.06, "text": " you could parse out the tables to turn like 
the results + tables to turn it into natural language", "tokens": [51456, 291, 727, 48377, 484, + 264, 8020, 281, 1261, 411, 264, 3542, 8020, 281, 1261, 309, 666, 3303, 2856, 51748], + "temperature": 0.0, "avg_logprob": -0.0855837253609089, "compression_ratio": 1.8653061224489795, + "no_speech_prob": 0.001480748294852674}, {"id": 614, "seek": 378206, "start": 3782.14, + "end": 3786.2999999999997, "text": " and I mean there''s so many fascinating things + so it''s like with a knowledge let''s say like a", "tokens": [50368, 293, 286, 914, + 456, 311, 370, 867, 10343, 721, 370, 309, 311, 411, 365, 257, 3601, 718, 311, 584, + 411, 257, 50576], "temperature": 0.0, "avg_logprob": -0.14234181876494506, "compression_ratio": + 1.7898832684824904, "no_speech_prob": 0.002435698639601469}, {"id": 615, "seek": + 378206, "start": 3786.2999999999997, "end": 3791.74, "text": " knowledge graph the + idea of the knowledge graph is if I have Demetri Khan host the vector", "tokens": + [50576, 3601, 4295, 264, 1558, 295, 264, 3601, 4295, 307, 498, 286, 362, 4686, 302, + 470, 18136, 3975, 264, 8062, 50848], "temperature": 0.0, "avg_logprob": -0.14234181876494506, + "compression_ratio": 1.7898832684824904, "no_speech_prob": 0.002435698639601469}, + {"id": 616, "seek": 378206, "start": 3791.74, "end": 3798.14, "text": " podcast + is a product manager at Tom Tom I with knowledge graph I can you know I compress + the", "tokens": [50848, 7367, 307, 257, 1674, 6598, 412, 5041, 5041, 286, 365, 3601, + 4295, 286, 393, 291, 458, 286, 14778, 264, 51168], "temperature": 0.0, "avg_logprob": + -0.14234181876494506, "compression_ratio": 1.7898832684824904, "no_speech_prob": + 0.002435698639601469}, {"id": 617, "seek": 378206, "start": 3798.14, "end": 3803.82, + "text": " representation of all these facts into one structure compared to having + the set of sentences", "tokens": [51168, 10290, 295, 439, 613, 9130, 666, 472, 3877, + 5347, 281, 1419, 264, 992, 295, 16579, 51452], "temperature": 0.0, 
"avg_logprob": + -0.14234181876494506, "compression_ratio": 1.7898832684824904, "no_speech_prob": + 0.002435698639601469}, {"id": 618, "seek": 378206, "start": 3803.82, "end": 3810.7799999999997, + "text": " right and yeah so maybe if I can kind of plug something I''ve done so + I have this paper that", "tokens": [51452, 558, 293, 1338, 370, 1310, 498, 286, + 393, 733, 295, 5452, 746, 286, 600, 1096, 370, 286, 362, 341, 3035, 300, 51800], + "temperature": 0.0, "avg_logprob": -0.14234181876494506, "compression_ratio": 1.7898832684824904, + "no_speech_prob": 0.002435698639601469}, {"id": 619, "seek": 381078, "start": 3810.78, + "end": 3817.1800000000003, "text": " will be published pretty soon it''s about it''s + in the Florida Atlantic University PhD it''s an", "tokens": [50364, 486, 312, 6572, + 1238, 2321, 309, 311, 466, 309, 311, 294, 264, 9117, 20233, 3535, 14476, 309, 311, + 364, 50684], "temperature": 0.0, "avg_logprob": -0.09633703685942151, "compression_ratio": + 1.6091205211726385, "no_speech_prob": 0.0012176918098703027}, {"id": 620, "seek": + 381078, "start": 3817.1800000000003, "end": 3821.82, "text": " interdisciplinary + team with the College of Nursing and a local healthcare system so we have electronic", + "tokens": [50684, 38280, 1469, 365, 264, 6745, 295, 42655, 293, 257, 2654, 8884, + 1185, 370, 321, 362, 10092, 50916], "temperature": 0.0, "avg_logprob": -0.09633703685942151, + "compression_ratio": 1.6091205211726385, "no_speech_prob": 0.0012176918098703027}, + {"id": 621, "seek": 381078, "start": 3821.82, "end": 3827.02, "text": " health records + that describe COVID-19 patients and we''re trying to predict survival outcome treatment", + "tokens": [50916, 1585, 7724, 300, 6786, 4566, 12, 3405, 4209, 293, 321, 434, 1382, + 281, 6069, 12559, 9700, 5032, 51176], "temperature": 0.0, "avg_logprob": -0.09633703685942151, + "compression_ratio": 1.6091205211726385, "no_speech_prob": 0.0012176918098703027}, + {"id": 622, "seek": 381078, "start": 3827.02, 
"end": 3833.1800000000003, "text": + " forecasting prognosis all that kind of stuff and so the the thing that we explored + in this paper is", "tokens": [51176, 44331, 447, 4568, 8211, 439, 300, 733, 295, + 1507, 293, 370, 264, 264, 551, 300, 321, 24016, 294, 341, 3035, 307, 51484], "temperature": + 0.0, "avg_logprob": -0.09633703685942151, "compression_ratio": 1.6091205211726385, + "no_speech_prob": 0.0012176918098703027}, {"id": 623, "seek": 381078, "start": 3833.1800000000003, + "end": 3837.82, "text": " let''s switch from the structured tabular data to parsing + it into natural language text and let''s", "tokens": [51484, 718, 311, 3679, 490, + 264, 18519, 4421, 1040, 1412, 281, 21156, 278, 309, 666, 3303, 2856, 2487, 293, + 718, 311, 51716], "temperature": 0.0, "avg_logprob": -0.09633703685942151, "compression_ratio": + 1.6091205211726385, "no_speech_prob": 0.0012176918098703027}, {"id": 624, "seek": + 383782, "start": 3837.82, "end": 3842.6200000000003, "text": " turn it into like + clinical narratives or let''s do this thing where you do if X if feature name", + "tokens": [50364, 1261, 309, 666, 411, 9115, 28016, 420, 718, 311, 360, 341, 551, + 689, 291, 360, 498, 1783, 498, 4111, 1315, 50604], "temperature": 0.0, "avg_logprob": + -0.16866170528323152, "compression_ratio": 1.6725663716814159, "no_speech_prob": + 0.0018300743540748954}, {"id": 625, "seek": 383782, "start": 3842.6200000000003, + "end": 3848.86, "text": " equals if feature name equals then label right yeah so + there''s a paper from the University of", "tokens": [50604, 6915, 498, 4111, 1315, + 6915, 550, 7645, 558, 1338, 370, 456, 311, 257, 3035, 490, 264, 3535, 295, 50916], + "temperature": 0.0, "avg_logprob": -0.16866170528323152, "compression_ratio": 1.6725663716814159, + "no_speech_prob": 0.0018300743540748954}, {"id": 626, "seek": 383782, "start": 3848.86, + "end": 3854.06, "text": " Wisconsin called language interface fine tuning where + they do that same idea but it''s you know like", 
"tokens": [50916, 17977, 1219, + 2856, 9226, 2489, 15164, 689, 436, 360, 300, 912, 1558, 457, 309, 311, 291, 458, + 411, 51176], "temperature": 0.0, "avg_logprob": -0.16866170528323152, "compression_ratio": + 1.6725663716814159, "no_speech_prob": 0.0018300743540748954}, {"id": 627, "seek": + 383782, "start": 3854.06, "end": 3859.34, "text": " the UCI machine learning repository + data sets so so I think I know that I''ve taken like a", "tokens": [51176, 264, + 14079, 40, 3479, 2539, 25841, 1412, 6352, 370, 370, 286, 519, 286, 458, 300, 286, + 600, 2726, 411, 257, 51440], "temperature": 0.0, "avg_logprob": -0.16866170528323152, + "compression_ratio": 1.6725663716814159, "no_speech_prob": 0.0018300743540748954}, + {"id": 628, "seek": 385934, "start": 3860.3, "end": 3865.9, "text": " walker and + also to think it''s cool it''s cool I''m sure now listeners will be like what", + "tokens": [50412, 1792, 260, 293, 611, 281, 519, 309, 311, 1627, 309, 311, 1627, + 286, 478, 988, 586, 23274, 486, 312, 411, 437, 50692], "temperature": 0.0, "avg_logprob": + -0.23830837899066032, "compression_ratio": 1.7096774193548387, "no_speech_prob": + 0.00744063314050436}, {"id": 629, "seek": 385934, "start": 3867.58, "end": 3874.54, + "text": " but I know like it''s it''s also what I heard from my listeners for example + in the podcast is that", "tokens": [50776, 457, 286, 458, 411, 309, 311, 309, 311, + 611, 437, 286, 2198, 490, 452, 23274, 337, 1365, 294, 264, 7367, 307, 300, 51124], + "temperature": 0.0, "avg_logprob": -0.23830837899066032, "compression_ratio": 1.7096774193548387, + "no_speech_prob": 0.00744063314050436}, {"id": 630, "seek": 385934, "start": 3874.54, + "end": 3880.94, "text": " they actually do use this episode as an educational material + so that''s why you know if we can", "tokens": [51124, 436, 767, 360, 764, 341, 3500, + 382, 364, 10189, 2527, 370, 300, 311, 983, 291, 458, 498, 321, 393, 51444], "temperature": + 0.0, "avg_logprob": -0.23830837899066032, 
"compression_ratio": 1.7096774193548387, + "no_speech_prob": 0.00744063314050436}, {"id": 631, "seek": 385934, "start": 3880.94, + "end": 3888.1400000000003, "text": " stuff as many links to papers and your work + they can go and study this yeah go go go I do some", "tokens": [51444, 1507, 382, + 867, 6123, 281, 10577, 293, 428, 589, 436, 393, 352, 293, 2979, 341, 1338, 352, + 352, 352, 286, 360, 512, 51804], "temperature": 0.0, "avg_logprob": -0.23830837899066032, + "compression_ratio": 1.7096774193548387, "no_speech_prob": 0.00744063314050436}, + {"id": 632, "seek": 388814, "start": 3888.14, "end": 3892.06, "text": " rise I guess + the question is like how are we thinking about structured and unstructured data", + "tokens": [50364, 6272, 286, 2041, 264, 1168, 307, 411, 577, 366, 321, 1953, 466, + 18519, 293, 18799, 46847, 1412, 50560], "temperature": 0.0, "avg_logprob": -0.09133323838439168, + "compression_ratio": 1.896551724137931, "no_speech_prob": 0.0011382680386304855}, + {"id": 633, "seek": 388814, "start": 3895.98, "end": 3900.22, "text": " the deep + learning systems you could parse out the structure into unstructure and then you + have", "tokens": [50756, 264, 2452, 2539, 3652, 291, 727, 48377, 484, 264, 3877, + 666, 18799, 2885, 293, 550, 291, 362, 50968], "temperature": 0.0, "avg_logprob": + -0.09133323838439168, "compression_ratio": 1.896551724137931, "no_speech_prob": + 0.0011382680386304855}, {"id": 634, "seek": 388814, "start": 3900.22, "end": 3906.22, + "text": " the transfer learning is really easy right yeah yes or you can keep the + structure and then maybe you", "tokens": [50968, 264, 5003, 2539, 307, 534, 1858, + 558, 1338, 2086, 420, 291, 393, 1066, 264, 3877, 293, 550, 1310, 291, 51268], "temperature": + 0.0, "avg_logprob": -0.09133323838439168, "compression_ratio": 1.896551724137931, + "no_speech_prob": 0.0011382680386304855}, {"id": 635, "seek": 388814, "start": 3906.22, + "end": 3911.9, "text": " can learn a better representation thanks 
to the structure + and with that question my interest has", "tokens": [51268, 393, 1466, 257, 1101, + 10290, 3231, 281, 264, 3877, 293, 365, 300, 1168, 452, 1179, 575, 51552], "temperature": + 0.0, "avg_logprob": -0.09133323838439168, "compression_ratio": 1.896551724137931, + "no_speech_prob": 0.0011382680386304855}, {"id": 636, "seek": 391190, "start": 3911.9, + "end": 3919.34, "text": " been really heavily in these causal digs and this idea + of creating structured causal relationships", "tokens": [50364, 668, 534, 10950, + 294, 613, 38755, 2528, 82, 293, 341, 1558, 295, 4084, 18519, 38755, 6159, 50736], + "temperature": 0.0, "avg_logprob": -0.12853460533674374, "compression_ratio": 1.7589285714285714, + "no_speech_prob": 0.0002419095835648477}, {"id": 637, "seek": 391190, "start": 3919.34, + "end": 3926.06, "text": " between variables I still have no idea how that really + how you can take like Wikipedia text and turn", "tokens": [50736, 1296, 9102, 286, + 920, 362, 572, 1558, 577, 300, 534, 577, 291, 393, 747, 411, 28999, 2487, 293, 1261, + 51072], "temperature": 0.0, "avg_logprob": -0.12853460533674374, "compression_ratio": + 1.7589285714285714, "no_speech_prob": 0.0002419095835648477}, {"id": 638, "seek": + 391190, "start": 3926.06, "end": 3931.7400000000002, "text": " it into a causal + diagram but I have an idea of like if and it comes back to this agi versus super", + "tokens": [51072, 309, 666, 257, 38755, 10686, 457, 286, 362, 364, 1558, 295, 411, + 498, 293, 309, 1487, 646, 281, 341, 623, 72, 5717, 1687, 51356], "temperature": + 0.0, "avg_logprob": -0.12853460533674374, "compression_ratio": 1.7589285714285714, + "no_speech_prob": 0.0002419095835648477}, {"id": 639, "seek": 391190, "start": 3931.7400000000002, + "end": 3937.98, "text": " intelligence idea if I have a super intelligence and it''s + reading search literature I want it to", "tokens": [51356, 7599, 1558, 498, 286, + 362, 257, 1687, 7599, 293, 309, 311, 3760, 3164, 10394, 286, 528, 309, 281, 
51668], + "temperature": 0.0, "avg_logprob": -0.12853460533674374, "compression_ratio": 1.7589285714285714, + "no_speech_prob": 0.0002419095835648477}, {"id": 640, "seek": 393798, "start": 3937.98, + "end": 3944.3, "text": " have some kind of causal diagram of our current model of + search stuff so like it has some model", "tokens": [50364, 362, 512, 733, 295, 38755, + 10686, 295, 527, 2190, 2316, 295, 3164, 1507, 370, 411, 309, 575, 512, 2316, 50680], + "temperature": 0.0, "avg_logprob": -0.15768590084342068, "compression_ratio": 1.6652360515021458, + "no_speech_prob": 0.0013338078279048204}, {"id": 641, "seek": 393798, "start": 3944.3, + "end": 3950.94, "text": " of how BM25 is index the limitations of it''s blade this + representation this MLOVs problem it has", "tokens": [50680, 295, 577, 15901, 6074, + 307, 8186, 264, 15705, 295, 309, 311, 10959, 341, 10290, 341, 21601, 46, 53, 82, + 1154, 309, 575, 51012], "temperature": 0.0, "avg_logprob": -0.15768590084342068, + "compression_ratio": 1.6652360515021458, "no_speech_prob": 0.0013338078279048204}, + {"id": 642, "seek": 393798, "start": 3950.94, "end": 3954.94, "text": " like some + structured representation of all these problems such that when the new batch of + archive", "tokens": [51012, 411, 512, 18519, 10290, 295, 439, 613, 2740, 1270, 300, + 562, 264, 777, 15245, 295, 23507, 51212], "temperature": 0.0, "avg_logprob": -0.15768590084342068, + "compression_ratio": 1.6652360515021458, "no_speech_prob": 0.0013338078279048204}, + {"id": 643, "seek": 393798, "start": 3954.94, "end": 3961.98, "text": " papers or + tweets you know however the news is coming into it or experiments right it looks + at its", "tokens": [51212, 10577, 420, 25671, 291, 458, 4461, 264, 2583, 307, 1348, + 666, 309, 420, 12050, 558, 309, 1542, 412, 1080, 51564], "temperature": 0.0, "avg_logprob": + -0.15768590084342068, "compression_ratio": 1.6652360515021458, "no_speech_prob": + 0.0013338078279048204}, {"id": 644, "seek": 396198, "start": 
3961.98, "end": 3967.82, + "text": " causal diagram to say like this violated my this this claim like because + that''s the thing you see", "tokens": [50364, 38755, 10686, 281, 584, 411, 341, + 33239, 452, 341, 341, 3932, 411, 570, 300, 311, 264, 551, 291, 536, 50656], "temperature": + 0.0, "avg_logprob": -0.1391257480182479, "compression_ratio": 1.9173228346456692, + "no_speech_prob": 0.001286407234147191}, {"id": 645, "seek": 396198, "start": 3967.82, + "end": 3973.5, "text": " a paper like autoregressive models as as search engines + or you see like what''s the name of that", "tokens": [50656, 257, 3035, 411, 1476, + 418, 3091, 488, 5245, 382, 382, 3164, 12982, 420, 291, 536, 411, 437, 311, 264, + 1315, 295, 300, 50940], "temperature": 0.0, "avg_logprob": -0.1391257480182479, + "compression_ratio": 1.9173228346456692, "no_speech_prob": 0.001286407234147191}, + {"id": 646, "seek": 396198, "start": 3973.5, "end": 3978.22, "text": " where it''s + like transformers is a differentiable search index like you see some title like + that that", "tokens": [50940, 689, 309, 311, 411, 4088, 433, 307, 257, 819, 9364, + 3164, 8186, 411, 291, 536, 512, 4876, 411, 300, 300, 51176], "temperature": 0.0, + "avg_logprob": -0.1391257480182479, "compression_ratio": 1.9173228346456692, "no_speech_prob": + 0.001286407234147191}, {"id": 647, "seek": 396198, "start": 3978.22, "end": 3983.26, + "text": " violates your causal diagram of why things are the way they are and that''s + what like inspires your", "tokens": [51176, 3448, 1024, 428, 38755, 10686, 295, + 983, 721, 366, 264, 636, 436, 366, 293, 300, 311, 437, 411, 32566, 428, 51428], + "temperature": 0.0, "avg_logprob": -0.1391257480182479, "compression_ratio": 1.9173228346456692, + "no_speech_prob": 0.001286407234147191}, {"id": 648, "seek": 396198, "start": 3983.26, + "end": 3990.78, "text": " interest so that''s that particular angle of it is yeah + yeah I''m not mostly thinking I haven''t", "tokens": [51428, 1179, 370, 300, 311, 
+ 300, 1729, 5802, 295, 309, 307, 1338, 1338, 286, 478, 406, 5240, 1953, 286, 2378, + 380, 51804], "temperature": 0.0, "avg_logprob": -0.1391257480182479, "compression_ratio": + 1.9173228346456692, "no_speech_prob": 0.001286407234147191}, {"id": 649, "seek": + 399078, "start": 3990.78, "end": 3997.9, "text": " explored this topic myself yet + but so let''s say if you take a language model like bird which was", "tokens": [50364, + 24016, 341, 4829, 2059, 1939, 457, 370, 718, 311, 584, 498, 291, 747, 257, 2856, + 2316, 411, 5255, 597, 390, 50720], "temperature": 0.0, "avg_logprob": -0.09836694929334852, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0019612188916653395}, + {"id": 650, "seek": 399078, "start": 3997.9, "end": 4006.7000000000003, "text": + " kind of like you could say statically trained once on Wikipedia or news content + right but the world", "tokens": [50720, 733, 295, 411, 291, 727, 584, 2219, 984, + 8895, 1564, 322, 28999, 420, 2583, 2701, 558, 457, 264, 1002, 51160], "temperature": + 0.0, "avg_logprob": -0.09836694929334852, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.0019612188916653395}, {"id": 651, "seek": 399078, "start": 4006.7000000000003, + "end": 4013.02, "text": " is changing every single day right your model doesn''t + so what you could do is that you could", "tokens": [51160, 307, 4473, 633, 2167, + 786, 558, 428, 2316, 1177, 380, 370, 437, 291, 727, 360, 307, 300, 291, 727, 51476], + "temperature": 0.0, "avg_logprob": -0.09836694929334852, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.0019612188916653395}, {"id": 652, "seek": 399078, "start": 4013.02, + "end": 4020.0600000000004, "text": " introduce knowledge back to the model and I''m + still like on the on the brisk of kind of exploring this", "tokens": [51476, 5366, + 3601, 646, 281, 264, 2316, 293, 286, 478, 920, 411, 322, 264, 322, 264, 738, 7797, + 295, 733, 295, 12736, 341, 51828], "temperature": 0.0, "avg_logprob": 
-0.09836694929334852, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0019612188916653395}, + {"id": 653, "seek": 402006, "start": 4020.06, "end": 4027.02, "text": " I think + new tremors talked about it recently like how you can incorporate knowledge in the", + "tokens": [50364, 286, 519, 777, 7813, 830, 2825, 466, 309, 3938, 411, 577, 291, + 393, 16091, 3601, 294, 264, 50712], "temperature": 0.0, "avg_logprob": -0.1502657166446548, + "compression_ratio": 1.6637554585152838, "no_speech_prob": 0.0011547771282494068}, + {"id": 654, "seek": 402006, "start": 4027.02, "end": 4032.46, "text": " language + model so for instance what like the way I see this before I even like read this + paper", "tokens": [50712, 2856, 2316, 370, 337, 5197, 437, 411, 264, 636, 286, 536, + 341, 949, 286, 754, 411, 1401, 341, 3035, 50984], "temperature": 0.0, "avg_logprob": + -0.1502657166446548, "compression_ratio": 1.6637554585152838, "no_speech_prob": + 0.0011547771282494068}, {"id": 655, "seek": 402006, "start": 4032.46, "end": 4039.2599999999998, + "text": " so I could probably try to invent reinvent the wheel is that so the language + model might figure out", "tokens": [50984, 370, 286, 727, 1391, 853, 281, 7962, + 33477, 264, 5589, 307, 300, 370, 264, 2856, 2316, 1062, 2573, 484, 51324], "temperature": + 0.0, "avg_logprob": -0.1502657166446548, "compression_ratio": 1.6637554585152838, + "no_speech_prob": 0.0011547771282494068}, {"id": 656, "seek": 402006, "start": 4040.14, + "end": 4045.58, "text": " that the question is about the president of the United + States that specific one let''s say Obama", "tokens": [51368, 300, 264, 1168, 307, + 466, 264, 3868, 295, 264, 2824, 3040, 300, 2685, 472, 718, 311, 584, 9560, 51640], + "temperature": 0.0, "avg_logprob": -0.1502657166446548, "compression_ratio": 1.6637554585152838, + "no_speech_prob": 0.0011547771282494068}, {"id": 657, "seek": 404558, "start": 4045.66, + "end": 4053.02, "text": " something but then the question is 
is Obama still the + president of the United States and so now the", "tokens": [50368, 746, 457, 550, + 264, 1168, 307, 307, 9560, 920, 264, 3868, 295, 264, 2824, 3040, 293, 370, 586, + 264, 50736], "temperature": 0.0, "avg_logprob": -0.18145058606121992, "compression_ratio": + 1.5326633165829147, "no_speech_prob": 0.004877268336713314}, {"id": 658, "seek": + 404558, "start": 4053.02, "end": 4058.54, "text": " language model is kind of like + hentik app that says well I actually don''t have last I know like chat", "tokens": + [50736, 2856, 2316, 307, 733, 295, 411, 276, 317, 1035, 724, 300, 1619, 731, 286, + 767, 500, 380, 362, 1036, 286, 458, 411, 5081, 51012], "temperature": 0.0, "avg_logprob": + -0.18145058606121992, "compression_ratio": 1.5326633165829147, "no_speech_prob": + 0.004877268336713314}, {"id": 659, "seek": 404558, "start": 4058.54, "end": 4067.9, + "text": " gpd does that right like I was trained by 2021 so I have no idea what + happened in 2022 sorry goodbye but", "tokens": [51012, 290, 79, 67, 775, 300, 558, + 411, 286, 390, 8895, 538, 7201, 370, 286, 362, 572, 1558, 437, 2011, 294, 20229, + 2597, 12084, 457, 51480], "temperature": 0.0, "avg_logprob": -0.18145058606121992, + "compression_ratio": 1.5326633165829147, "no_speech_prob": 0.004877268336713314}, + {"id": 660, "seek": 406790, "start": 4067.9, "end": 4075.42, "text": " like it could + actually say it could say I figured out the context I know roughly what you''re + asking", "tokens": [50364, 411, 309, 727, 767, 584, 309, 727, 584, 286, 8932, 484, + 264, 4319, 286, 458, 9810, 437, 291, 434, 3365, 50740], "temperature": 0.0, "avg_logprob": + -0.10059656567043729, "compression_ratio": 1.8714285714285714, "no_speech_prob": + 0.005412782076746225}, {"id": 661, "seek": 406790, "start": 4075.42, "end": 4081.9, + "text": " this is the person I know this person I know that what what the president + means I know the the", "tokens": [50740, 341, 307, 264, 954, 286, 458, 341, 954, + 286, 458, 300, 
437, 437, 264, 3868, 1355, 286, 458, 264, 264, 51064], "temperature": + 0.0, "avg_logprob": -0.10059656567043729, "compression_ratio": 1.8714285714285714, + "no_speech_prob": 0.005412782076746225}, {"id": 662, "seek": 406790, "start": 4081.9, + "end": 4087.26, "text": " country United States but you''re asking me a factual + question so what it could do is actually it could", "tokens": [51064, 1941, 2824, + 3040, 457, 291, 434, 3365, 385, 257, 48029, 1168, 370, 437, 309, 727, 360, 307, + 767, 309, 727, 51332], "temperature": 0.0, "avg_logprob": -0.10059656567043729, + "compression_ratio": 1.8714285714285714, "no_speech_prob": 0.005412782076746225}, + {"id": 663, "seek": 406790, "start": 4087.26, "end": 4094.78, "text": " go and ask + a knowledge graph which is updated without recalculating the the embeddings which + is", "tokens": [51332, 352, 293, 1029, 257, 3601, 4295, 597, 307, 10588, 1553, 850, + 304, 2444, 990, 264, 264, 12240, 29432, 597, 307, 51708], "temperature": 0.0, "avg_logprob": + -0.10059656567043729, "compression_ratio": 1.8714285714285714, "no_speech_prob": + 0.005412782076746225}, {"id": 664, "seek": 409478, "start": 4095.5, "end": 4101.18, + "text": " solving them all of this problem right so it''s it''s it''s another data + structure you know it''s", "tokens": [50400, 12606, 552, 439, 295, 341, 1154, 558, + 370, 309, 311, 309, 311, 309, 311, 1071, 1412, 3877, 291, 458, 309, 311, 50684], + "temperature": 0.0, "avg_logprob": -0.1401343510068696, "compression_ratio": 1.8862745098039215, + "no_speech_prob": 0.006909992545843124}, {"id": 665, "seek": 409478, "start": 4101.18, + "end": 4109.18, "text": " a knowledge graph it''s being updated as we go and so + it goes and says hey let''s coming back to your", "tokens": [50684, 257, 3601, 4295, + 309, 311, 885, 10588, 382, 321, 352, 293, 370, 309, 1709, 293, 1619, 4177, 718, + 311, 1348, 646, 281, 428, 51084], "temperature": 0.0, "avg_logprob": -0.1401343510068696, + "compression_ratio": 
1.8862745098039215, "no_speech_prob": 0.006909992545843124}, + {"id": 666, "seek": 409478, "start": 4109.18, "end": 4113.66, "text": " question + on on structured language like in in graph systems you also need to form your query", + "tokens": [51084, 1168, 322, 322, 18519, 2856, 411, 294, 294, 4295, 3652, 291, 611, + 643, 281, 1254, 428, 14581, 51308], "temperature": 0.0, "avg_logprob": -0.1401343510068696, + "compression_ratio": 1.8862745098039215, "no_speech_prob": 0.006909992545843124}, + {"id": 667, "seek": 409478, "start": 4113.66, "end": 4118.7, "text": " in a certain + way so it forms the query in a certain way and traverses the graph and then checks", + "tokens": [51308, 294, 257, 1629, 636, 370, 309, 6422, 264, 14581, 294, 257, 1629, + 636, 293, 23149, 279, 264, 4295, 293, 550, 13834, 51560], "temperature": 0.0, "avg_logprob": + -0.1401343510068696, "compression_ratio": 1.8862745098039215, "no_speech_prob": + 0.006909992545843124}, {"id": 668, "seek": 409478, "start": 4118.7, "end": 4124.22, + "text": " is Obama the president the answer is no it goes back all the way to maybe + a language model I don''t", "tokens": [51560, 307, 9560, 264, 3868, 264, 1867, 307, + 572, 309, 1709, 646, 439, 264, 636, 281, 1310, 257, 2856, 2316, 286, 500, 380, 51836], + "temperature": 0.0, "avg_logprob": -0.1401343510068696, "compression_ratio": 1.8862745098039215, + "no_speech_prob": 0.006909992545843124}, {"id": 669, "seek": 412422, "start": 4124.22, + "end": 4129.66, "text": " mean some other layer and basically presents the answer + to the user right yeah so that''s just one", "tokens": [50364, 914, 512, 661, 4583, + 293, 1936, 13533, 264, 1867, 281, 264, 4195, 558, 1338, 370, 300, 311, 445, 472, + 50636], "temperature": 0.0, "avg_logprob": -0.18993494645604547, "compression_ratio": + 1.6866197183098592, "no_speech_prob": 0.0010349360527470708}, {"id": 670, "seek": + 412422, "start": 4129.66, "end": 4136.860000000001, "text": " thought before even + dove into this topic 
of incorporating knowledge in elabs I would probably think", + "tokens": [50636, 1194, 949, 754, 23287, 666, 341, 4829, 295, 33613, 3601, 294, + 806, 17243, 286, 576, 1391, 519, 50996], "temperature": 0.0, "avg_logprob": -0.18993494645604547, + "compression_ratio": 1.6866197183098592, "no_speech_prob": 0.0010349360527470708}, + {"id": 671, "seek": 412422, "start": 4136.860000000001, "end": 4143.18, "text": + " like that yeah I love that you brother that knowledge graph it''s like and that''s + kind of like", "tokens": [50996, 411, 300, 1338, 286, 959, 300, 291, 3708, 300, + 3601, 4295, 309, 311, 411, 293, 300, 311, 733, 295, 411, 51312], "temperature": + 0.0, "avg_logprob": -0.18993494645604547, "compression_ratio": 1.6866197183098592, + "no_speech_prob": 0.0010349360527470708}, {"id": 672, "seek": 412422, "start": 4143.18, + "end": 4147.02, "text": " GBT index as well as laying chain I can''t believe I haven''t + brought that up until now we can talk", "tokens": [51312, 460, 33853, 8186, 382, + 731, 382, 14903, 5021, 286, 393, 380, 1697, 286, 2378, 380, 3038, 300, 493, 1826, + 586, 321, 393, 751, 51504], "temperature": 0.0, "avg_logprob": -0.18993494645604547, + "compression_ratio": 1.6866197183098592, "no_speech_prob": 0.0010349360527470708}, + {"id": 673, "seek": 412422, "start": 4147.02, "end": 4150.62, "text": " about that + more in the neural search frameworks discussion on the review podcast but like", + "tokens": [51504, 466, 300, 544, 294, 264, 18161, 3164, 29834, 5017, 322, 264, 3131, + 7367, 457, 411, 51684], "temperature": 0.0, "avg_logprob": -0.18993494645604547, + "compression_ratio": 1.6866197183098592, "no_speech_prob": 0.0010349360527470708}, + {"id": 674, "seek": 415062, "start": 4151.26, "end": 4157.74, "text": " this idea + of different kinds of external memory and I don''t know what''s wrong with my brain + today", "tokens": [50396, 341, 1558, 295, 819, 3685, 295, 8320, 4675, 293, 286, + 500, 380, 458, 437, 311, 2085, 365, 452, 3567, 965, 
50720], "temperature": 0.0, + "avg_logprob": -0.18490333557128907, "compression_ratio": 1.7155555555555555, "no_speech_prob": + 0.004891827702522278}, {"id": 675, "seek": 415062, "start": 4157.74, "end": 4163.42, + "text": " and I keep like branching into completely I don''t think it''s wrong I + think it''s the right setting", "tokens": [50720, 293, 286, 1066, 411, 9819, 278, + 666, 2584, 286, 500, 380, 519, 309, 311, 2085, 286, 519, 309, 311, 264, 558, 3287, + 51004], "temperature": 0.0, "avg_logprob": -0.18490333557128907, "compression_ratio": + 1.7155555555555555, "no_speech_prob": 0.004891827702522278}, {"id": 676, "seek": + 415062, "start": 4164.14, "end": 4171.98, "text": " it''s just not suitable with + the coding or something but um like so I was recently talking with", "tokens": [51040, + 309, 311, 445, 406, 12873, 365, 264, 17720, 420, 746, 457, 1105, 411, 370, 286, + 390, 3938, 1417, 365, 51432], "temperature": 0.0, "avg_logprob": -0.18490333557128907, + "compression_ratio": 1.7155555555555555, "no_speech_prob": 0.004891827702522278}, + {"id": 677, "seek": 415062, "start": 4171.98, "end": 4178.14, "text": " Shukri who + just joined we''ve got as well about um about this idea of metadata re-ranking so + one", "tokens": [51432, 1160, 2034, 470, 567, 445, 6869, 321, 600, 658, 382, 731, + 466, 1105, 466, 341, 1558, 295, 26603, 319, 12, 20479, 278, 370, 472, 51740], "temperature": + 0.0, "avg_logprob": -0.18490333557128907, "compression_ratio": 1.7155555555555555, + "no_speech_prob": 0.004891827702522278}, {"id": 678, "seek": 417814, "start": 4178.14, + "end": 4183.34, "text": " approach is you have the xg boost re-ranker where you + take in the bm25 score the vector", "tokens": [50364, 3109, 307, 291, 362, 264, + 2031, 70, 9194, 319, 12, 20479, 260, 689, 291, 747, 294, 264, 272, 76, 6074, 6175, + 264, 8062, 50624], "temperature": 0.0, "avg_logprob": -0.1727287483215332, "compression_ratio": + 1.755868544600939, "no_speech_prob": 0.00125303294043988}, {"id": 
679, "seek": 417814, + "start": 4183.34, "end": 4190.9400000000005, "text": " distance and then also symbolic + features as the input to the re to the xg boost re-ranker", "tokens": [50624, 4560, + 293, 550, 611, 25755, 4122, 382, 264, 4846, 281, 264, 319, 281, 264, 2031, 70, 9194, + 319, 12, 20479, 260, 51004], "temperature": 0.0, "avg_logprob": -0.1727287483215332, + "compression_ratio": 1.755868544600939, "no_speech_prob": 0.00125303294043988}, + {"id": 680, "seek": 417814, "start": 4191.900000000001, "end": 4198.62, "text": + " so the thing he was okay do we want to store this metadata in weveate as well + or do we go get it", "tokens": [51052, 370, 264, 551, 415, 390, 1392, 360, 321, + 528, 281, 3531, 341, 26603, 294, 321, 303, 473, 382, 731, 420, 360, 321, 352, 483, + 309, 51388], "temperature": 0.0, "avg_logprob": -0.1727287483215332, "compression_ratio": + 1.755868544600939, "no_speech_prob": 0.00125303294043988}, {"id": 681, "seek": 417814, + "start": 4198.62, "end": 4203.58, "text": " from redis or feature store something + like that where we get that kind of property and so it''s like", "tokens": [51388, + 490, 2182, 271, 420, 4111, 3531, 746, 411, 300, 689, 321, 483, 300, 733, 295, 4707, + 293, 370, 309, 311, 411, 51636], "temperature": 0.0, "avg_logprob": -0.1727287483215332, + "compression_ratio": 1.755868544600939, "no_speech_prob": 0.00125303294043988}, + {"id": 682, "seek": 420358, "start": 4203.66, "end": 4208.46, "text": " the knowledge + graph the idea connects to that because it''s like okay are we going to build", + "tokens": [50368, 264, 3601, 4295, 264, 1558, 16967, 281, 300, 570, 309, 311, 411, + 1392, 366, 321, 516, 281, 1322, 50608], "temperature": 0.0, "avg_logprob": -0.10494468171717757, + "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00548475980758667}, + {"id": 683, "seek": 420358, "start": 4208.46, "end": 4214.0599999999995, "text": + " the knowledge graph in weveate should it live in weveate or should we plug weveate 
+ in with something", "tokens": [50608, 264, 3601, 4295, 294, 321, 303, 473, 820, + 309, 1621, 294, 321, 303, 473, 420, 820, 321, 5452, 321, 303, 473, 294, 365, 746, + 50888], "temperature": 0.0, "avg_logprob": -0.10494468171717757, "compression_ratio": + 1.8409090909090908, "no_speech_prob": 0.00548475980758667}, {"id": 684, "seek": + 420358, "start": 4214.0599999999995, "end": 4219.74, "text": " like Neo4j or or + is it a top level controller like the neural search frameworks thing you''re describing", + "tokens": [50888, 411, 24458, 19, 73, 420, 420, 307, 309, 257, 1192, 1496, 10561, + 411, 264, 18161, 3164, 29834, 551, 291, 434, 16141, 51172], "temperature": 0.0, + "avg_logprob": -0.10494468171717757, "compression_ratio": 1.8409090909090908, "no_speech_prob": + 0.00548475980758667}, {"id": 685, "seek": 420358, "start": 4219.74, "end": 4226.38, + "text": " where it''s you know something that hooks into weveate and hooks into + Neo4j relational AI tiger", "tokens": [51172, 689, 309, 311, 291, 458, 746, 300, + 26485, 666, 321, 303, 473, 293, 26485, 666, 24458, 19, 73, 38444, 7318, 21432, 51504], + "temperature": 0.0, "avg_logprob": -0.10494468171717757, "compression_ratio": 1.8409090909090908, + "no_speech_prob": 0.00548475980758667}, {"id": 686, "seek": 420358, "start": 4226.38, + "end": 4232.7, "text": " graph I don''t know all the rdf ontology technologies but + you know like it has separate and it''s", "tokens": [51504, 4295, 286, 500, 380, + 458, 439, 264, 367, 45953, 6592, 1793, 7943, 457, 291, 458, 411, 309, 575, 4994, + 293, 309, 311, 51820], "temperature": 0.0, "avg_logprob": -0.10494468171717757, + "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00548475980758667}, + {"id": 687, "seek": 423270, "start": 4232.7, "end": 4237.099999999999, "text": " + a higher level that picks between the indexes so it''s yeah it''s like what kind + of technology is", "tokens": [50364, 257, 2946, 1496, 300, 16137, 1296, 264, 8186, + 279, 370, 309, 311, 1338, 
309, 311, 411, 437, 733, 295, 2899, 307, 50584], "temperature": + 0.0, "avg_logprob": -0.1382554520008176, "compression_ratio": 1.732394366197183, + "no_speech_prob": 0.004590496886521578}, {"id": 688, "seek": 423270, "start": 4237.099999999999, + "end": 4243.74, "text": " built and weveate and that''s not even really up to me + you know exactly but I think it''s kind of", "tokens": [50584, 3094, 293, 321, 303, + 473, 293, 300, 311, 406, 754, 534, 493, 281, 385, 291, 458, 2293, 457, 286, 519, + 309, 311, 733, 295, 50916], "temperature": 0.0, "avg_logprob": -0.1382554520008176, + "compression_ratio": 1.732394366197183, "no_speech_prob": 0.004590496886521578}, + {"id": 689, "seek": 423270, "start": 4243.74, "end": 4251.42, "text": " fun to brainstorm + with you like what like we kind of like intuitively find this limitations", "tokens": + [50916, 1019, 281, 35245, 365, 291, 411, 437, 411, 321, 733, 295, 411, 46506, 915, + 341, 15705, 51300], "temperature": 0.0, "avg_logprob": -0.1382554520008176, "compression_ratio": + 1.732394366197183, "no_speech_prob": 0.004590496886521578}, {"id": 690, "seek": + 423270, "start": 4251.42, "end": 4256.7, "text": " together and at the same time + this limitations may lead to future discoveries like on", "tokens": [51300, 1214, + 293, 412, 264, 912, 565, 341, 15705, 815, 1477, 281, 2027, 28400, 411, 322, 51564], + "temperature": 0.0, "avg_logprob": -0.1382554520008176, "compression_ratio": 1.732394366197183, + "no_speech_prob": 0.004590496886521578}, {"id": 691, "seek": 425670, "start": 4256.94, + "end": 4263.9, "text": " engineering and research and like when I was giving this + keynote at Haystack where by the way", "tokens": [50376, 7043, 293, 2132, 293, 411, + 562, 286, 390, 2902, 341, 33896, 412, 8721, 372, 501, 689, 538, 264, 636, 50724], + "temperature": 0.0, "avg_logprob": -0.21135190604389578, "compression_ratio": 1.7212121212121212, + "no_speech_prob": 0.0046159327030181885}, {"id": 692, "seek": 425670, "start": 4263.9, + 
"end": 4273.82, "text": " weveate guys will surprise and another guys as well like + I didn''t I didn''t feel bold enough to", "tokens": [50724, 321, 303, 473, 1074, + 486, 6365, 293, 1071, 1074, 382, 731, 411, 286, 994, 380, 286, 994, 380, 841, 11928, + 1547, 281, 51220], "temperature": 0.0, "avg_logprob": -0.21135190604389578, "compression_ratio": + 1.7212121212121212, "no_speech_prob": 0.0046159327030181885}, {"id": 693, "seek": + 425670, "start": 4273.82, "end": 4280.54, "text": " say this but I think I will + say this now at least that I feel like engineering and research are", "tokens": + [51220, 584, 341, 457, 286, 519, 286, 486, 584, 341, 586, 412, 1935, 300, 286, 841, + 411, 7043, 293, 2132, 366, 51556], "temperature": 0.0, "avg_logprob": -0.21135190604389578, + "compression_ratio": 1.7212121212121212, "no_speech_prob": 0.0046159327030181885}, + {"id": 694, "seek": 428054, "start": 4280.54, "end": 4288.14, "text": " kind of + like indistinguishable in the amount of intelligent power you need to put into this + to", "tokens": [50364, 733, 295, 411, 1016, 468, 7050, 742, 712, 294, 264, 2372, + 295, 13232, 1347, 291, 643, 281, 829, 666, 341, 281, 50744], "temperature": 0.0, + "avg_logprob": -0.10643237760697288, "compression_ratio": 1.6946902654867257, "no_speech_prob": + 0.004193662665784359}, {"id": 695, "seek": 428054, "start": 4288.14, "end": 4294.94, + "text": " solve it because it''s not like given right like if this data structure + inverted index is designed", "tokens": [50744, 5039, 309, 570, 309, 311, 406, 411, + 2212, 558, 411, 498, 341, 1412, 3877, 38969, 8186, 307, 4761, 51084], "temperature": + 0.0, "avg_logprob": -0.10643237760697288, "compression_ratio": 1.6946902654867257, + "no_speech_prob": 0.004193662665784359}, {"id": 696, "seek": 428054, "start": 4294.94, + "end": 4302.62, "text": " like this and you do have the the issue of early termination + because you cannot like waste so many", "tokens": [51084, 411, 341, 293, 291, 360, + 362, 264, 
264, 2734, 295, 2440, 1433, 2486, 570, 291, 2644, 411, 5964, 370, 867, + 51468], "temperature": 0.0, "avg_logprob": -0.10643237760697288, "compression_ratio": + 1.6946902654867257, "no_speech_prob": 0.004193662665784359}, {"id": 697, "seek": + 428054, "start": 4302.62, "end": 4309.98, "text": " CPU cycles then like okay without + reading papers can you go and solve it like being just an", "tokens": [51468, 13199, + 17796, 550, 411, 1392, 1553, 3760, 10577, 393, 291, 352, 293, 5039, 309, 411, 885, + 445, 364, 51836], "temperature": 0.0, "avg_logprob": -0.10643237760697288, "compression_ratio": + 1.6946902654867257, "no_speech_prob": 0.004193662665784359}, {"id": 698, "seek": + 430998, "start": 4309.98, "end": 4316.379999999999, "text": " engineer so to say + no you can''t it''s like it''s it''s super hard like you need to start coming up + with", "tokens": [50364, 11403, 370, 281, 584, 572, 291, 393, 380, 309, 311, 411, + 309, 311, 309, 311, 1687, 1152, 411, 291, 643, 281, 722, 1348, 493, 365, 50684], + "temperature": 0.0, "avg_logprob": -0.12398037543663612, "compression_ratio": 1.6896551724137931, + "no_speech_prob": 0.005147337447851896}, {"id": 699, "seek": 430998, "start": 4316.379999999999, + "end": 4322.54, "text": " like new vector space model which was invented when in + 60s 70s I don''t know so like can you come", "tokens": [50684, 411, 777, 8062, 1901, + 2316, 597, 390, 14479, 562, 294, 4060, 82, 5285, 82, 286, 500, 380, 458, 370, 411, + 393, 291, 808, 50992], "temperature": 0.0, "avg_logprob": -0.12398037543663612, + "compression_ratio": 1.6896551724137931, "no_speech_prob": 0.005147337447851896}, + {"id": 700, "seek": 430998, "start": 4322.54, "end": 4330.86, "text": " up with + like completing your model it''s it''s it''s equally hard as in research when okay + you know", "tokens": [50992, 493, 365, 411, 19472, 428, 2316, 309, 311, 309, 311, + 309, 311, 12309, 1152, 382, 294, 2132, 562, 1392, 291, 458, 51408], "temperature": + 0.0, "avg_logprob": 
-0.12398037543663612, "compression_ratio": 1.6896551724137931, + "no_speech_prob": 0.005147337447851896}, {"id": 701, "seek": 430998, "start": 4330.86, + "end": 4337.259999999999, "text": " that SOTA is now this can I beat it somehow + but it''s not like you''re just beating sort of for the", "tokens": [51408, 300, + 318, 5068, 32, 307, 586, 341, 393, 286, 4224, 309, 6063, 457, 309, 311, 406, 411, + 291, 434, 445, 13497, 1333, 295, 337, 264, 51728], "temperature": 0.0, "avg_logprob": + -0.12398037543663612, "compression_ratio": 1.6896551724137931, "no_speech_prob": + 0.005147337447851896}, {"id": 702, "seek": 433726, "start": 4337.26, "end": 4342.7, + "text": " sake of it maybe some people do but like I would take a stance of not + doing that like I would", "tokens": [50364, 9717, 295, 309, 1310, 512, 561, 360, + 457, 411, 286, 576, 747, 257, 21033, 295, 406, 884, 300, 411, 286, 576, 50636], + "temperature": 0.0, "avg_logprob": -0.09483429693406628, "compression_ratio": 1.7212389380530972, + "no_speech_prob": 0.004677057731896639}, {"id": 703, "seek": 433726, "start": 4343.42, + "end": 4348.7, "text": " try to solve an existing problem right so I do want to + surface as you said more relevant document", "tokens": [50672, 853, 281, 5039, 364, + 6741, 1154, 558, 370, 286, 360, 528, 281, 3753, 382, 291, 848, 544, 7340, 4166, + 50936], "temperature": 0.0, "avg_logprob": -0.09483429693406628, "compression_ratio": + 1.7212389380530972, "no_speech_prob": 0.004677057731896639}, {"id": 704, "seek": + 433726, "start": 4348.7, "end": 4356.3, "text": " to the top or maybe even the passage + maybe in a number so I keep pushing for that so both of these", "tokens": [50936, + 281, 264, 1192, 420, 1310, 754, 264, 11497, 1310, 294, 257, 1230, 370, 286, 1066, + 7380, 337, 300, 370, 1293, 295, 613, 51316], "temperature": 0.0, "avg_logprob": + -0.09483429693406628, "compression_ratio": 1.7212389380530972, "no_speech_prob": + 0.004677057731896639}, {"id": 705, "seek": 433726, "start": 
4356.3, "end": 4363.5, + "text": " to me they''re like they require so much intelligence so that they become + indistinguishable in some", "tokens": [51316, 281, 385, 436, 434, 411, 436, 3651, + 370, 709, 7599, 370, 300, 436, 1813, 1016, 468, 7050, 742, 712, 294, 512, 51676], + "temperature": 0.0, "avg_logprob": -0.09483429693406628, "compression_ratio": 1.7212389380530972, + "no_speech_prob": 0.004677057731896639}, {"id": 706, "seek": 436350, "start": 4363.5, + "end": 4370.06, "text": " sense like what exactly are you now solving the MLOPS + problem are you solving the you know the", "tokens": [50364, 2020, 411, 437, 2293, + 366, 291, 586, 12606, 264, 21601, 46, 6273, 1154, 366, 291, 12606, 264, 291, 458, + 264, 50692], "temperature": 0.0, "avg_logprob": -0.2587367466517857, "compression_ratio": + 1.7914691943127963, "no_speech_prob": 0.006465723272413015}, {"id": 707, "seek": + 436350, "start": 4370.06, "end": 4375.1, "text": " inverted index data structure + limitation problem or are you solving how do I retrain the", "tokens": [50692, 38969, + 8186, 1412, 3877, 27432, 1154, 420, 366, 291, 12606, 577, 360, 286, 1533, 7146, + 264, 50944], "temperature": 0.0, "avg_logprob": -0.2587367466517857, "compression_ratio": + 1.7914691943127963, "no_speech_prob": 0.006465723272413015}, {"id": 708, "seek": + 436350, "start": 4375.1, "end": 4380.38, "text": " embeddings how did you train + the model or fine tune the model and I don''t recompute the embeddings", "tokens": + [50944, 12240, 29432, 577, 630, 291, 3847, 264, 2316, 420, 2489, 10864, 264, 2316, + 293, 286, 500, 380, 48000, 1169, 264, 12240, 29432, 51208], "temperature": 0.0, + "avg_logprob": -0.2587367466517857, "compression_ratio": 1.7914691943127963, "no_speech_prob": + 0.006465723272413015}, {"id": 709, "seek": 436350, "start": 4380.38, "end": 4389.26, + "text": " because it''s a way to expand so it''s to pay the bill yes does it does + it resonate with you like", "tokens": [51208, 570, 309, 311, 257, 636, 281, 
5268, + 370, 309, 311, 281, 1689, 264, 2961, 2086, 775, 309, 775, 309, 34285, 365, 291, + 411, 51652], "temperature": 0.0, "avg_logprob": -0.2587367466517857, "compression_ratio": + 1.7914691943127963, "no_speech_prob": 0.006465723272413015}, {"id": 710, "seek": + 438926, "start": 4389.34, "end": 4395.66, "text": " what what are your thoughts + of that yeah our ct oeddy and delocca has written about product engineering", "tokens": + [50368, 437, 437, 366, 428, 4598, 295, 300, 1338, 527, 269, 83, 277, 292, 3173, + 293, 1103, 905, 496, 575, 3720, 466, 1674, 7043, 50684], "temperature": 0.0, "avg_logprob": + -0.2882191875193379, "compression_ratio": 1.7612612612612613, "no_speech_prob": + 0.017625700682401657}, {"id": 711, "seek": 438926, "start": 4395.66, "end": 4401.18, + "text": " and like on this meta on this meta of like how do these decisions get + made and it''s like I think", "tokens": [50684, 293, 411, 322, 341, 19616, 322, + 341, 19616, 295, 411, 577, 360, 613, 5327, 483, 1027, 293, 309, 311, 411, 286, 519, + 50960], "temperature": 0.0, "avg_logprob": -0.2882191875193379, "compression_ratio": + 1.7612612612612613, "no_speech_prob": 0.017625700682401657}, {"id": 712, "seek": + 438926, "start": 4401.18, "end": 4407.9800000000005, "text": " there''s a book called + change my office I have a bookshelf behind me I used to be in podcasts and", "tokens": + [50960, 456, 311, 257, 1446, 1219, 1319, 452, 3398, 286, 362, 257, 1446, 46626, + 2261, 385, 286, 1143, 281, 312, 294, 24045, 293, 51300], "temperature": 0.0, "avg_logprob": + -0.2882191875193379, "compression_ratio": 1.7612612612612613, "no_speech_prob": + 0.017625700682401657}, {"id": 713, "seek": 438926, "start": 4407.9800000000005, + "end": 4415.9800000000005, "text": " I''d be like it''s that yeah yeah I still have + it actually yes but it''s like it''s like ask your", "tokens": [51300, 286, 1116, + 312, 411, 309, 311, 300, 1338, 1338, 286, 920, 362, 309, 767, 2086, 457, 309, 311, + 411, 309, 311, 411, 1029, 
428, 51700], "temperature": 0.0, "avg_logprob": -0.2882191875193379, + "compression_ratio": 1.7612612612612613, "no_speech_prob": 0.017625700682401657}, + {"id": 714, "seek": 441598, "start": 4415.98, "end": 4422.0599999999995, "text": + " developer is a title something like that about and well okay so that maybe maybe + I got a little", "tokens": [50364, 10754, 307, 257, 4876, 746, 411, 300, 466, 293, + 731, 1392, 370, 300, 1310, 1310, 286, 658, 257, 707, 50668], "temperature": 0.0, + "avg_logprob": -0.13517168830422793, "compression_ratio": 1.8333333333333333, "no_speech_prob": + 0.0015687869163230062}, {"id": 715, "seek": 441598, "start": 4422.0599999999995, + "end": 4429.099999999999, "text": " off with this idea of research and engineering + I think the the scientist is very like a metrics", "tokens": [50668, 766, 365, 341, + 1558, 295, 2132, 293, 7043, 286, 519, 264, 264, 12662, 307, 588, 411, 257, 16367, + 51020], "temperature": 0.0, "avg_logprob": -0.13517168830422793, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.0015687869163230062}, {"id": 716, "seek": + 441598, "start": 4429.099999999999, "end": 4435.58, "text": " oriented in a different + way like the the engineer like the the diversity of the tests and the data", "tokens": + [51020, 21841, 294, 257, 819, 636, 411, 264, 264, 11403, 411, 264, 264, 8811, 295, + 264, 6921, 293, 264, 1412, 51344], "temperature": 0.0, "avg_logprob": -0.13517168830422793, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0015687869163230062}, + {"id": 717, "seek": 441598, "start": 4435.58, "end": 4441.82, "text": " collection + is more important when you''re the when you''re the scientist sort of uh yeah the + the", "tokens": [51344, 5765, 307, 544, 1021, 562, 291, 434, 264, 562, 291, 434, + 264, 12662, 1333, 295, 2232, 1338, 264, 264, 51656], "temperature": 0.0, "avg_logprob": + -0.13517168830422793, "compression_ratio": 1.8333333333333333, "no_speech_prob": + 0.0015687869163230062}, {"id": 
718, "seek": 444182, "start": 4441.82, "end": 4447.34, + "text": " engineer needs to build like smoke tests sort of where whereas I see the + scientist needs to like", "tokens": [50364, 11403, 2203, 281, 1322, 411, 8439, 6921, + 1333, 295, 689, 9735, 286, 536, 264, 12662, 2203, 281, 411, 50640], "temperature": + 0.0, "avg_logprob": -0.1600402726067437, "compression_ratio": 1.8846153846153846, + "no_speech_prob": 0.00013123801909387112}, {"id": 719, "seek": 444182, "start": + 4447.34, "end": 4451.9, "text": " have a very rigorous data collection kind of because + that''s sort of how I see the distinction", "tokens": [50640, 362, 257, 588, 29882, + 1412, 5765, 733, 295, 570, 300, 311, 1333, 295, 577, 286, 536, 264, 16844, 50868], + "temperature": 0.0, "avg_logprob": -0.1600402726067437, "compression_ratio": 1.8846153846153846, + "no_speech_prob": 0.00013123801909387112}, {"id": 720, "seek": 444182, "start": + 4451.9, "end": 4457.98, "text": " and responsibility sort of is that makes sense + yeah it does it does actually yeah you you uh you gave a", "tokens": [50868, 293, + 6357, 1333, 295, 307, 300, 1669, 2020, 1338, 309, 775, 309, 775, 767, 1338, 291, + 291, 2232, 291, 2729, 257, 51172], "temperature": 0.0, "avg_logprob": -0.1600402726067437, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 0.00013123801909387112}, + {"id": 721, "seek": 444182, "start": 4457.98, "end": 4464.54, "text": " very good + distinctive you know feature what I was trying to say is that like in engineering + you still", "tokens": [51172, 588, 665, 27766, 291, 458, 4111, 437, 286, 390, 1382, + 281, 584, 307, 300, 411, 294, 7043, 291, 920, 51500], "temperature": 0.0, "avg_logprob": + -0.1600402726067437, "compression_ratio": 1.8846153846153846, "no_speech_prob": + 0.00013123801909387112}, {"id": 722, "seek": 444182, "start": 4464.54, "end": 4470.78, + "text": " have a plethora of options like it''s combinatorial explosions in certain + cases there are also", "tokens": [51500, 362, 
257, 499, 302, 7013, 295, 3956, 411, + 309, 311, 2512, 31927, 831, 36872, 294, 1629, 3331, 456, 366, 611, 51812], "temperature": + 0.0, "avg_logprob": -0.1600402726067437, "compression_ratio": 1.8846153846153846, + "no_speech_prob": 0.00013123801909387112}, {"id": 723, "seek": 447078, "start": + 4470.78, "end": 4476.38, "text": " mundane parts in both of these right so like + we are not talking about them but like they do exist", "tokens": [50364, 43497, + 3166, 294, 1293, 295, 613, 558, 370, 411, 321, 366, 406, 1417, 466, 552, 457, 411, + 436, 360, 2514, 50644], "temperature": 0.0, "avg_logprob": -0.06802671220567491, + "compression_ratio": 1.724890829694323, "no_speech_prob": 0.0008137279073707759}, + {"id": 724, "seek": 447078, "start": 4476.38, "end": 4483.34, "text": " but like + you do have these points like okay should I branch this way or that way should I + step back", "tokens": [50644, 457, 411, 291, 360, 362, 613, 2793, 411, 1392, 820, + 286, 9819, 341, 636, 420, 300, 636, 820, 286, 1823, 646, 50992], "temperature": + 0.0, "avg_logprob": -0.06802671220567491, "compression_ratio": 1.724890829694323, + "no_speech_prob": 0.0008137279073707759}, {"id": 725, "seek": 447078, "start": 4483.34, + "end": 4490.139999999999, "text": " and rethink and and that''s yeah but I agree + I agree you you gave a really good example of like in", "tokens": [50992, 293, 34595, + 293, 293, 300, 311, 1338, 457, 286, 3986, 286, 3986, 291, 291, 2729, 257, 534, 665, + 1365, 295, 411, 294, 51332], "temperature": 0.0, "avg_logprob": -0.06802671220567491, + "compression_ratio": 1.724890829694323, "no_speech_prob": 0.0008137279073707759}, + {"id": 726, "seek": 447078, "start": 4490.139999999999, "end": 4498.219999999999, + "text": " research I do care about data so much in engineering it''s probably the + quality assurance department", "tokens": [51332, 2132, 286, 360, 1127, 466, 1412, + 370, 709, 294, 7043, 309, 311, 1391, 264, 3125, 32189, 5882, 51736], "temperature": + 0.0, 
"avg_logprob": -0.06802671220567491, "compression_ratio": 1.724890829694323, + "no_speech_prob": 0.0008137279073707759}, {"id": 727, "seek": 449822, "start": 4498.7, + "end": 4504.22, "text": " is going to worry about okay what data we''re going to + feed into the system to try to kind of maybe", "tokens": [50388, 307, 516, 281, + 3292, 466, 1392, 437, 1412, 321, 434, 516, 281, 3154, 666, 264, 1185, 281, 853, + 281, 733, 295, 1310, 50664], "temperature": 0.0, "avg_logprob": -0.12567006780746134, + "compression_ratio": 1.7292576419213974, "no_speech_prob": 0.0021686244290322065}, + {"id": 728, "seek": 449822, "start": 4504.22, "end": 4511.26, "text": " break it + and see limits and where it breaks what do we need to fix um or is it kind of like + stable", "tokens": [50664, 1821, 309, 293, 536, 10406, 293, 689, 309, 9857, 437, + 360, 321, 643, 281, 3191, 1105, 420, 307, 309, 733, 295, 411, 8351, 51016], "temperature": + 0.0, "avg_logprob": -0.12567006780746134, "compression_ratio": 1.7292576419213974, + "no_speech_prob": 0.0021686244290322065}, {"id": 729, "seek": 449822, "start": 4511.26, + "end": 4517.5, "text": " what it proved enough to release you know things like that + so but yeah I think if I can stay on this", "tokens": [51016, 437, 309, 14617, 1547, + 281, 4374, 291, 458, 721, 411, 300, 370, 457, 1338, 286, 519, 498, 286, 393, 1754, + 322, 341, 51328], "temperature": 0.0, "avg_logprob": -0.12567006780746134, "compression_ratio": + 1.7292576419213974, "no_speech_prob": 0.0021686244290322065}, {"id": 730, "seek": + 449822, "start": 4517.5, "end": 4523.02, "text": " a little more I think this like + generalization testing like the industry of quality assurance but", "tokens": [51328, + 257, 707, 544, 286, 519, 341, 411, 2674, 2144, 4997, 411, 264, 3518, 295, 3125, + 32189, 457, 51604], "temperature": 0.0, "avg_logprob": -0.12567006780746134, "compression_ratio": + 1.7292576419213974, "no_speech_prob": 0.0021686244290322065}, {"id": 731, "seek": + 452302, 
"start": 4523.02, "end": 4529.42, "text": " 4D learning is is going to be + really fascinating I''m like excited like how I think when we first met", "tokens": + [50364, 1017, 35, 2539, 307, 307, 516, 281, 312, 534, 10343, 286, 478, 411, 2919, + 411, 577, 286, 519, 562, 321, 700, 1131, 50684], "temperature": 0.0, "avg_logprob": + -0.1162019177017925, "compression_ratio": 1.8059701492537314, "no_speech_prob": + 0.005159751046448946}, {"id": 732, "seek": 452302, "start": 4529.42, "end": 4534.860000000001, + "text": " you had written this um not all vector databases are equal and I thought + that was so insightful", "tokens": [50684, 291, 632, 3720, 341, 1105, 406, 439, + 8062, 22380, 366, 2681, 293, 286, 1194, 300, 390, 370, 46401, 50956], "temperature": + 0.0, "avg_logprob": -0.1162019177017925, "compression_ratio": 1.8059701492537314, + "no_speech_prob": 0.005159751046448946}, {"id": 733, "seek": 452302, "start": 4534.860000000001, + "end": 4539.740000000001, "text": " because it was like a you told the story of + an emerging market and that was so interesting I", "tokens": [50956, 570, 309, 390, + 411, 257, 291, 1907, 264, 1657, 295, 364, 14989, 2142, 293, 300, 390, 370, 1880, + 286, 51200], "temperature": 0.0, "avg_logprob": -0.1162019177017925, "compression_ratio": + 1.8059701492537314, "no_speech_prob": 0.005159751046448946}, {"id": 734, "seek": + 452302, "start": 4539.740000000001, "end": 4544.3, "text": " really look forward + to seeing like the story of the emerging market around generalization testing", + "tokens": [51200, 534, 574, 2128, 281, 2577, 411, 264, 1657, 295, 264, 14989, 2142, + 926, 2674, 2144, 4997, 51428], "temperature": 0.0, "avg_logprob": -0.1162019177017925, + "compression_ratio": 1.8059701492537314, "no_speech_prob": 0.005159751046448946}, + {"id": 735, "seek": 452302, "start": 4544.3, "end": 4550.38, "text": " I think like + um like with the beer benchmarks that kind of thing where it''s like you create + some", "tokens": [51428, 286, 
519, 411, 1105, 411, 365, 264, 8795, 43751, 300, 733, + 295, 551, 689, 309, 311, 411, 291, 1884, 512, 51732], "temperature": 0.0, "avg_logprob": + -0.1162019177017925, "compression_ratio": 1.8059701492537314, "no_speech_prob": + 0.005159751046448946}, {"id": 736, "seek": 455038, "start": 4550.38, "end": 4556.7, + "text": " million scale data set and have the NDCG recall precision with all these + queries I think maybe", "tokens": [50364, 2459, 4373, 1412, 992, 293, 362, 264, + 426, 25619, 38, 9901, 18356, 365, 439, 613, 24109, 286, 519, 1310, 50680], "temperature": + 0.0, "avg_logprob": -0.1972413106007619, "compression_ratio": 1.6597222222222223, + "no_speech_prob": 0.0011497298255562782}, {"id": 737, "seek": 455038, "start": 4556.7, + "end": 4563.02, "text": " also this idea of like AB testing with models is going + to be more popular I was when I went to", "tokens": [50680, 611, 341, 1558, 295, + 411, 13838, 4997, 365, 5245, 307, 516, 281, 312, 544, 3743, 286, 390, 562, 286, + 1437, 281, 50996], "temperature": 0.0, "avg_logprob": -0.1972413106007619, "compression_ratio": + 1.6597222222222223, "no_speech_prob": 0.0011497298255562782}, {"id": 738, "seek": + 455038, "start": 4563.02, "end": 4568.7, "text": " Neurops this year and there is + this talk from Dr. 
Juhau came about interaction centric AI and", "tokens": [50996, + 1734, 374, 3370, 341, 1064, 293, 456, 307, 341, 751, 490, 2491, 13, 508, 3232, 1459, + 1361, 466, 9285, 1489, 1341, 7318, 293, 51280], "temperature": 0.0, "avg_logprob": + -0.1972413106007619, "compression_ratio": 1.6597222222222223, "no_speech_prob": + 0.0011497298255562782}, {"id": 739, "seek": 455038, "start": 4568.7, "end": 4573.9800000000005, + "text": " how that might differ from the first paradigm of model centric AI where + say you judge the image", "tokens": [51280, 577, 300, 1062, 743, 490, 264, 700, + 24709, 295, 2316, 1489, 1341, 7318, 689, 584, 291, 6995, 264, 3256, 51544], "temperature": + 0.0, "avg_logprob": -0.1972413106007619, "compression_ratio": 1.6597222222222223, + "no_speech_prob": 0.0011497298255562782}, {"id": 740, "seek": 455038, "start": 4573.9800000000005, + "end": 4579.9800000000005, "text": " generation model purely based on like inception + score for shade is tends to feature spaces in real", "tokens": [51544, 5125, 2316, + 17491, 2361, 322, 411, 49834, 6175, 337, 11466, 307, 12258, 281, 4111, 7673, 294, + 957, 51844], "temperature": 0.0, "avg_logprob": -0.1972413106007619, "compression_ratio": + 1.6597222222222223, "no_speech_prob": 0.0011497298255562782}, {"id": 741, "seek": + 457998, "start": 4579.98, "end": 4586.299999999999, "text": " images and then to + data centric AI which is like I think snorkel AI is very responsible for like", + "tokens": [50364, 5267, 293, 550, 281, 1412, 1489, 1341, 7318, 597, 307, 411, 286, + 519, 2406, 284, 7124, 7318, 307, 588, 6250, 337, 411, 50680], "temperature": 0.0, + "avg_logprob": -0.12746190216581701, "compression_ratio": 1.8403041825095057, "no_speech_prob": + 0.000681009201798588}, {"id": 742, "seek": 457998, "start": 4586.94, "end": 4591.099999999999, + "text": " branding that term and making it so popular but it''s like you''re really + focusing on the curation", "tokens": [50712, 27279, 300, 1433, 293, 1455, 309, 370, + 
3743, 457, 309, 311, 411, 291, 434, 534, 8416, 322, 264, 1262, 399, 50920], "temperature": + 0.0, "avg_logprob": -0.12746190216581701, "compression_ratio": 1.8403041825095057, + "no_speech_prob": 0.000681009201798588}, {"id": 743, "seek": 457998, "start": 4591.099999999999, + "end": 4596.94, "text": " of data like your language model is like mosaic and oslatus + pub med gpt it''s about like you have", "tokens": [50920, 295, 1412, 411, 428, 2856, + 2316, 307, 411, 275, 42261, 293, 3003, 75, 37926, 1535, 1205, 290, 662, 309, 311, + 466, 411, 291, 362, 51212], "temperature": 0.0, "avg_logprob": -0.12746190216581701, + "compression_ratio": 1.8403041825095057, "no_speech_prob": 0.000681009201798588}, + {"id": 744, "seek": 457998, "start": 4596.94, "end": 4602.139999999999, "text": + " this particular data and you like clean it and you make it awesome and then I + think interaction", "tokens": [51212, 341, 1729, 1412, 293, 291, 411, 2541, 309, + 293, 291, 652, 309, 3476, 293, 550, 286, 519, 9285, 51472], "temperature": 0.0, + "avg_logprob": -0.12746190216581701, "compression_ratio": 1.8403041825095057, "no_speech_prob": + 0.000681009201798588}, {"id": 745, "seek": 457998, "start": 4602.139999999999, "end": + 4608.7, "text": " centric AI is like a new way to evaluate models where it''s like + AB testing driven kind of or like", "tokens": [51472, 1489, 1341, 7318, 307, 411, + 257, 777, 636, 281, 13059, 5245, 689, 309, 311, 411, 13838, 4997, 9555, 733, 295, + 420, 411, 51800], "temperature": 0.0, "avg_logprob": -0.12746190216581701, "compression_ratio": + 1.8403041825095057, "no_speech_prob": 0.000681009201798588}, {"id": 746, "seek": + 460870, "start": 4608.78, "end": 4613.099999999999, "text": " how quickly can you + perform a test I don''t know if I''ve gotten too else topic but", "tokens": [50368, + 577, 2661, 393, 291, 2042, 257, 1500, 286, 500, 380, 458, 498, 286, 600, 5768, 886, + 1646, 4829, 457, 50584], "temperature": 0.0, "avg_logprob": -0.1621275169904842, + 
"compression_ratio": 1.6807511737089202, "no_speech_prob": 0.006173889618366957}, + {"id": 747, "seek": 460870, "start": 4613.099999999999, "end": 4621.26, "text": + " no I think it''s it''s exactly the topic to focus on if we are serious about you + know putting", "tokens": [50584, 572, 286, 519, 309, 311, 309, 311, 2293, 264, 4829, + 281, 1879, 322, 498, 321, 366, 3156, 466, 291, 458, 3372, 50992], "temperature": + 0.0, "avg_logprob": -0.1621275169904842, "compression_ratio": 1.6807511737089202, + "no_speech_prob": 0.006173889618366957}, {"id": 748, "seek": 460870, "start": 4621.26, + "end": 4626.46, "text": " these things out in production like you do need you do + need to have and provide an evidence", "tokens": [50992, 613, 721, 484, 294, 4265, + 411, 291, 360, 643, 291, 360, 643, 281, 362, 293, 2893, 364, 4467, 51252], "temperature": + 0.0, "avg_logprob": -0.1621275169904842, "compression_ratio": 1.6807511737089202, + "no_speech_prob": 0.006173889618366957}, {"id": 749, "seek": 460870, "start": 4626.46, + "end": 4633.179999999999, "text": " to the stakeholders that and to yourself that + this dust hold water and we can release it and", "tokens": [51252, 281, 264, 17779, + 300, 293, 281, 1803, 300, 341, 8634, 1797, 1281, 293, 321, 393, 4374, 309, 293, + 51588], "temperature": 0.0, "avg_logprob": -0.1621275169904842, "compression_ratio": + 1.6807511737089202, "no_speech_prob": 0.006173889618366957}, {"id": 750, "seek": + 463318, "start": 4633.18, "end": 4638.780000000001, "text": " it''s not going to + show something you know in discriminate to the users that they will be", "tokens": + [50364, 309, 311, 406, 516, 281, 855, 746, 291, 458, 294, 47833, 281, 264, 5022, + 300, 436, 486, 312, 50644], "temperature": 0.0, "avg_logprob": -0.1542003059387207, + "compression_ratio": 1.7756653992395437, "no_speech_prob": 0.002740477677434683}, + {"id": 751, "seek": 463318, "start": 4638.780000000001, "end": 4644.38, "text": + " completely you know puzzled and stuff or 
maybe you know there are all these numerous + examples when", "tokens": [50644, 2584, 291, 458, 18741, 1493, 293, 1507, 420, 1310, + 291, 458, 456, 366, 439, 613, 12546, 5110, 562, 50924], "temperature": 0.0, "avg_logprob": + -0.1542003059387207, "compression_ratio": 1.7756653992395437, "no_speech_prob": + 0.002740477677434683}, {"id": 752, "seek": 463318, "start": 4645.5, "end": 4650.3, + "text": " like Google search when they I think incorporated some distilled version + of bird when they", "tokens": [50980, 411, 3329, 3164, 562, 436, 286, 519, 21654, + 512, 1483, 6261, 3037, 295, 5255, 562, 436, 51220], "temperature": 0.0, "avg_logprob": + -0.1542003059387207, "compression_ratio": 1.7756653992395437, "no_speech_prob": + 0.002740477677434683}, {"id": 753, "seek": 463318, "start": 4650.3, "end": 4654.62, + "text": " would flip the meaning and they would say you do take this medicine but + actually in the", "tokens": [51220, 576, 7929, 264, 3620, 293, 436, 576, 584, 291, + 360, 747, 341, 7195, 457, 767, 294, 264, 51436], "temperature": 0.0, "avg_logprob": + -0.1542003059387207, "compression_ratio": 1.7756653992395437, "no_speech_prob": + 0.002740477677434683}, {"id": 754, "seek": 463318, "start": 4655.42, "end": 4661.1, + "text": " prescription it says you do not take that medicine or vice versa you know + because it''s not sensitive", "tokens": [51476, 22456, 309, 1619, 291, 360, 406, + 747, 300, 7195, 420, 11964, 25650, 291, 458, 570, 309, 311, 406, 9477, 51760], "temperature": + 0.0, "avg_logprob": -0.1542003059387207, "compression_ratio": 1.7756653992395437, + "no_speech_prob": 0.002740477677434683}, {"id": 755, "seek": 466110, "start": 4661.1, + "end": 4668.780000000001, "text": " to negations and stuff so like I totally agree + I''m with you on that like how do we QA", "tokens": [50364, 281, 2485, 763, 293, + 1507, 370, 411, 286, 3879, 3986, 286, 478, 365, 291, 322, 300, 411, 577, 360, 321, + 1249, 32, 50748], "temperature": 0.0, "avg_logprob": 
-0.1914225589023547, "compression_ratio": + 1.6306306306306306, "no_speech_prob": 0.003131803823634982}, {"id": 756, "seek": + 466110, "start": 4670.38, "end": 4675.9800000000005, "text": " quality of sure you + know that the systems that release and I think the open AI", "tokens": [50828, 3125, + 295, 988, 291, 458, 300, 264, 3652, 300, 4374, 293, 286, 519, 264, 1269, 7318, 51108], + "temperature": 0.0, "avg_logprob": -0.1914225589023547, "compression_ratio": 1.6306306306306306, + "no_speech_prob": 0.003131803823634982}, {"id": 757, "seek": 466110, "start": 4675.9800000000005, + "end": 4682.3, "text": " team did that brilliant trick in a way that they said hey + here is the chat GPT go test it and they", "tokens": [51108, 1469, 630, 300, 10248, + 4282, 294, 257, 636, 300, 436, 848, 4177, 510, 307, 264, 5081, 26039, 51, 352, 1500, + 309, 293, 436, 51424], "temperature": 0.0, "avg_logprob": -0.1914225589023547, "compression_ratio": + 1.6306306306306306, "no_speech_prob": 0.003131803823634982}, {"id": 758, "seek": + 466110, "start": 4682.3, "end": 4690.54, "text": " get like million users in the + first few days because they actually do need some extra brains to do", "tokens": + [51424, 483, 411, 2459, 5022, 294, 264, 700, 1326, 1708, 570, 436, 767, 360, 643, + 512, 2857, 15442, 281, 360, 51836], "temperature": 0.0, "avg_logprob": -0.1914225589023547, + "compression_ratio": 1.6306306306306306, "no_speech_prob": 0.003131803823634982}, + {"id": 759, "seek": 469110, "start": 4691.1, "end": 4697.18, "text": " go and test + in different like scenarios and see where it breaks maybe it doesn''t make sense + anymore so", "tokens": [50364, 352, 293, 1500, 294, 819, 411, 15077, 293, 536, 689, + 309, 9857, 1310, 309, 1177, 380, 652, 2020, 3602, 370, 50668], "temperature": 0.0, + "avg_logprob": -0.22550534142388237, "compression_ratio": 1.7004405286343611, "no_speech_prob": + 0.0018624988151714206}, {"id": 760, "seek": 469110, "start": 4697.18, "end": 4704.3, + "text": " yeah 
it''s my understanding that''s how like scale AI became the kings + is that you know like labeled data", "tokens": [50668, 1338, 309, 311, 452, 3701, + 300, 311, 577, 411, 4373, 7318, 3062, 264, 21581, 307, 300, 291, 458, 411, 21335, + 1412, 51024], "temperature": 0.0, "avg_logprob": -0.22550534142388237, "compression_ratio": + 1.7004405286343611, "no_speech_prob": 0.0018624988151714206}, {"id": 761, "seek": + 469110, "start": 4705.34, "end": 4709.5, "text": " like mechanical Turk I think + Sir J.I. is something that''s emerging that I''ve been seeing", "tokens": [51076, + 411, 12070, 15714, 286, 519, 6144, 508, 13, 40, 13, 307, 746, 300, 311, 14989, 300, + 286, 600, 668, 2577, 51284], "temperature": 0.0, "avg_logprob": -0.22550534142388237, + "compression_ratio": 1.7004405286343611, "no_speech_prob": 0.0018624988151714206}, + {"id": 762, "seek": 469110, "start": 4710.22, "end": 4718.54, "text": " yeah it''s + really interesting yeah exactly um yeah um I was I was wondering um you you also", + "tokens": [51320, 1338, 309, 311, 534, 1880, 1338, 2293, 1105, 1338, 1105, 286, + 390, 286, 390, 6359, 1105, 291, 291, 611, 51736], "temperature": 0.0, "avg_logprob": + -0.22550534142388237, "compression_ratio": 1.7004405286343611, "no_speech_prob": + 0.0018624988151714206}, {"id": 763, "seek": 471854, "start": 4719.18, "end": 4725.58, + "text": " worked on this podcast search and you had the opinion that Whisper has + some bottlenecks I", "tokens": [50396, 2732, 322, 341, 7367, 3164, 293, 291, 632, + 264, 4800, 300, 41132, 610, 575, 512, 44641, 2761, 286, 50716], "temperature": 0.0, + "avg_logprob": -0.1904923915863037, "compression_ratio": 1.71875, "no_speech_prob": + 0.005402225535362959}, {"id": 764, "seek": 471854, "start": 4725.58, "end": 4730.7, + "text": " wonder if you if you want to like tap into that a little bit yeah so I''d + love to tell this story so", "tokens": [50716, 2441, 498, 291, 498, 291, 528, 281, + 411, 5119, 666, 300, 257, 707, 857, 1338, 370, 286, 
1116, 959, 281, 980, 341, 1657, + 370, 50972], "temperature": 0.0, "avg_logprob": -0.1904923915863037, "compression_ratio": + 1.71875, "no_speech_prob": 0.005402225535362959}, {"id": 765, "seek": 471854, "start": + 4730.7, "end": 4737.9, "text": " uh so it comes the kind of story behind it is uh + so Boris power at open AI tweeted uh so they", "tokens": [50972, 2232, 370, 309, + 1487, 264, 733, 295, 1657, 2261, 309, 307, 2232, 370, 27158, 1347, 412, 1269, 7318, + 25646, 2232, 370, 436, 51332], "temperature": 0.0, "avg_logprob": -0.1904923915863037, + "compression_ratio": 1.71875, "no_speech_prob": 0.005402225535362959}, {"id": 766, + "seek": 471854, "start": 4737.9, "end": 4743.58, "text": " they cut the prices for + the open AI embeddings and and Boris is pointing out how cheap it would be to", + "tokens": [51332, 436, 1723, 264, 7901, 337, 264, 1269, 7318, 12240, 29432, 293, + 293, 27158, 307, 12166, 484, 577, 7084, 309, 576, 312, 281, 51616], "temperature": + 0.0, "avg_logprob": -0.1904923915863037, "compression_ratio": 1.71875, "no_speech_prob": + 0.005402225535362959}, {"id": 767, "seek": 474358, "start": 4743.58, "end": 4749.5, + "text": " index a massive podcast like the Joe Rogan podcast so that''s how I was + like hey I have a podcast", "tokens": [50364, 8186, 257, 5994, 7367, 411, 264, 6807, + 11860, 282, 7367, 370, 300, 311, 577, 286, 390, 411, 4177, 286, 362, 257, 7367, + 50660], "temperature": 0.0, "avg_logprob": -0.202735349867079, "compression_ratio": + 1.7832512315270936, "no_speech_prob": 0.0017639321740716696}, {"id": 768, "seek": + 474358, "start": 4754.54, "end": 4759.0199999999995, "text": " and you have a vector + podcast and we did also but so I started to be", "tokens": [50912, 293, 291, 362, + 257, 8062, 7367, 293, 321, 630, 611, 457, 370, 286, 1409, 281, 312, 51136], "temperature": + 0.0, "avg_logprob": -0.202735349867079, "compression_ratio": 1.7832512315270936, + "no_speech_prob": 0.0017639321740716696}, {"id": 769, "seek": 474358, 
"start": 4760.38, + "end": 4764.0599999999995, "text": " you know I started doing this where you you + know you take the audio files then you put them into", "tokens": [51204, 291, 458, + 286, 1409, 884, 341, 689, 291, 291, 458, 291, 747, 264, 6278, 7098, 550, 291, 829, + 552, 666, 51388], "temperature": 0.0, "avg_logprob": -0.202735349867079, "compression_ratio": + 1.7832512315270936, "no_speech_prob": 0.0017639321740716696}, {"id": 770, "seek": + 474358, "start": 4764.0599999999995, "end": 4768.14, "text": " Whisper I also tried + like uh descript is something that I like a lot I''ve been using descript for a", + "tokens": [51388, 41132, 610, 286, 611, 3031, 411, 2232, 31280, 307, 746, 300, 286, + 411, 257, 688, 286, 600, 668, 1228, 31280, 337, 257, 51592], "temperature": 0.0, + "avg_logprob": -0.202735349867079, "compression_ratio": 1.7832512315270936, "no_speech_prob": + 0.0017639321740716696}, {"id": 771, "seek": 476814, "start": 4768.14, "end": 4777.5, + "text": " long time for editing videos and so it''s like you still because you it''s + very the podcast transcriptions", "tokens": [50364, 938, 565, 337, 10000, 2145, + 293, 370, 309, 311, 411, 291, 920, 570, 291, 309, 311, 588, 264, 7367, 24444, 626, + 50832], "temperature": 0.0, "avg_logprob": -0.09911531017672631, "compression_ratio": + 1.8130841121495327, "no_speech_prob": 0.0007956991321407259}, {"id": 772, "seek": + 476814, "start": 4777.5, "end": 4783.26, "text": " you still want to edit them a + bit you you have like uh and like like if you were yes how I''m", "tokens": [50832, + 291, 920, 528, 281, 8129, 552, 257, 857, 291, 291, 362, 411, 2232, 293, 411, 411, + 498, 291, 645, 2086, 577, 286, 478, 51120], "temperature": 0.0, "avg_logprob": -0.09911531017672631, + "compression_ratio": 1.8130841121495327, "no_speech_prob": 0.0007956991321407259}, + {"id": 773, "seek": 476814, "start": 4783.26, "end": 4788.46, "text": " pausing + right now I''m talking about but the transcriptions it''s not quite what 
you want + to like", "tokens": [51120, 2502, 7981, 558, 586, 286, 478, 1417, 466, 457, 264, + 24444, 626, 309, 311, 406, 1596, 437, 291, 528, 281, 411, 51380], "temperature": + 0.0, "avg_logprob": -0.09911531017672631, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.0007956991321407259}, {"id": 774, "seek": 476814, "start": 4788.46, + "end": 4794.46, "text": " index to this idea of like how do we create a knowledge + base from these podcasts because these", "tokens": [51380, 8186, 281, 341, 1558, + 295, 411, 577, 360, 321, 1884, 257, 3601, 3096, 490, 613, 24045, 570, 613, 51680], + "temperature": 0.0, "avg_logprob": -0.09911531017672631, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.0007956991321407259}, {"id": 775, "seek": 479446, "start": 4794.46, + "end": 4799.66, "text": " podcasts is so like we''ve covered so many topics and + it''s so it''s kind of easier to do it like", "tokens": [50364, 24045, 307, 370, + 411, 321, 600, 5343, 370, 867, 8378, 293, 309, 311, 370, 309, 311, 733, 295, 3571, + 281, 360, 309, 411, 50624], "temperature": 0.0, "avg_logprob": -0.11640775100044583, + "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.00045416783541440964}, + {"id": 776, "seek": 479446, "start": 4799.66, "end": 4805.18, "text": " this than + to be writing all this down and then and also it''s very collaborative uh like the + podcast", "tokens": [50624, 341, 813, 281, 312, 3579, 439, 341, 760, 293, 550, 293, + 611, 309, 311, 588, 16555, 2232, 411, 264, 7367, 50900], "temperature": 0.0, "avg_logprob": + -0.11640775100044583, "compression_ratio": 1.8171641791044777, "no_speech_prob": + 0.00045416783541440964}, {"id": 777, "seek": 479446, "start": 4805.18, "end": 4811.18, + "text": " you get more people involved it''s like a community building thing is + so yeah that idea of creating", "tokens": [50900, 291, 483, 544, 561, 3288, 309, + 311, 411, 257, 1768, 2390, 551, 307, 370, 1338, 300, 1558, 295, 4084, 51200], "temperature": + 
0.0, "avg_logprob": -0.11640775100044583, "compression_ratio": 1.8171641791044777, + "no_speech_prob": 0.00045416783541440964}, {"id": 778, "seek": 479446, "start": + 4811.18, "end": 4815.9800000000005, "text": " knowledge bases out of podcasts like + what would you write your interest on a scale of one to 10", "tokens": [51200, 3601, + 17949, 484, 295, 24045, 411, 437, 576, 291, 2464, 428, 1179, 322, 257, 4373, 295, + 472, 281, 1266, 51440], "temperature": 0.0, "avg_logprob": -0.11640775100044583, + "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.00045416783541440964}, + {"id": 779, "seek": 479446, "start": 4815.9800000000005, "end": 4823.9, "text": + " of having a vector podcast I mean I would love to join to join the you know to + join the geek here", "tokens": [51440, 295, 1419, 257, 8062, 7367, 286, 914, 286, + 576, 959, 281, 3917, 281, 3917, 264, 291, 458, 281, 3917, 264, 36162, 510, 51836], + "temperature": 0.0, "avg_logprob": -0.11640775100044583, "compression_ratio": 1.8171641791044777, + "no_speech_prob": 0.00045416783541440964}, {"id": 780, "seek": 482390, "start": + 4823.9, "end": 4829.98, "text": " so because I do like I was rewatching the episode + with you Danny here we go and you were like", "tokens": [50364, 370, 570, 286, 360, + 411, 286, 390, 319, 15219, 278, 264, 3500, 365, 291, 16682, 510, 321, 352, 293, + 291, 645, 411, 50668], "temperature": 0.0, "avg_logprob": -0.12673758593472567, + "compression_ratio": 1.859375, "no_speech_prob": 0.00475694052875042}, {"id": 781, + "seek": 482390, "start": 4829.98, "end": 4835.179999999999, "text": " exploding + with knowledge right like in a way you''re branching out a lot today as well exploding", + "tokens": [50668, 35175, 365, 3601, 558, 411, 294, 257, 636, 291, 434, 9819, 278, + 484, 257, 688, 965, 382, 731, 35175, 50928], "temperature": 0.0, "avg_logprob": + -0.12673758593472567, "compression_ratio": 1.859375, "no_speech_prob": 0.00475694052875042}, + {"id": 782, "seek": 482390, "start": 
4835.179999999999, "end": 4840.62, "text": + " with knowledge because you you read all these papers you try things you share + you know like google", "tokens": [50928, 365, 3601, 570, 291, 291, 1401, 439, 613, + 10577, 291, 853, 721, 291, 2073, 291, 458, 411, 20742, 51200], "temperature": 0.0, + "avg_logprob": -0.12673758593472567, "compression_ratio": 1.859375, "no_speech_prob": + 0.00475694052875042}, {"id": 783, "seek": 482390, "start": 4840.62, "end": 4847.58, + "text": " collapse and stuff but like like how do I tap into this knowledge like + it''s it''s very synchronous", "tokens": [51200, 15584, 293, 1507, 457, 411, 411, + 577, 360, 286, 5119, 666, 341, 3601, 411, 309, 311, 309, 311, 588, 44743, 51548], + "temperature": 0.0, "avg_logprob": -0.12673758593472567, "compression_ratio": 1.859375, + "no_speech_prob": 0.00475694052875042}, {"id": 784, "seek": 482390, "start": 4847.58, + "end": 4853.259999999999, "text": " right I have to like there is no way to like + random jump into hey where did he talk about", "tokens": [51548, 558, 286, 362, + 281, 411, 456, 307, 572, 636, 281, 411, 4974, 3012, 666, 4177, 689, 630, 415, 751, + 466, 51832], "temperature": 0.0, "avg_logprob": -0.12673758593472567, "compression_ratio": + 1.859375, "no_speech_prob": 0.00475694052875042}, {"id": 785, "seek": 485390, "start": + 4854.0599999999995, "end": 4860.219999999999, "text": " you know that model from + Microsoft like I don''t know unless I have the time code I don''t have a", "tokens": + [50372, 291, 458, 300, 2316, 490, 8116, 411, 286, 500, 380, 458, 5969, 286, 362, + 264, 565, 3089, 286, 500, 380, 362, 257, 50680], "temperature": 0.0, "avg_logprob": + -0.147407560932393, "compression_ratio": 1.669603524229075, "no_speech_prob": 0.003405550494790077}, + {"id": 786, "seek": 485390, "start": 4860.219999999999, "end": 4867.9, "text": " + way to do that right so yeah yeah and I what that''s what inspires me so much with + I want to fine-tune", "tokens": [50680, 636, 281, 360, 
300, 558, 370, 1338, 1338, + 293, 286, 437, 300, 311, 437, 32566, 385, 370, 709, 365, 286, 528, 281, 2489, 12, + 83, 2613, 51064], "temperature": 0.0, "avg_logprob": -0.147407560932393, "compression_ratio": + 1.669603524229075, "no_speech_prob": 0.003405550494790077}, {"id": 787, "seek": + 485390, "start": 4867.9, "end": 4874.78, "text": " these models so badly just on + the uh turn taking as the positive labeling and yeah I think", "tokens": [51064, + 613, 5245, 370, 13425, 445, 322, 264, 2232, 1261, 1940, 382, 264, 3353, 40244, 293, + 1338, 286, 519, 51408], "temperature": 0.0, "avg_logprob": -0.147407560932393, "compression_ratio": + 1.669603524229075, "no_speech_prob": 0.003405550494790077}, {"id": 788, "seek": + 485390, "start": 4875.82, "end": 4880.379999999999, "text": " can you expand a bit + more on that what do you mean uh okay Conor says I want to talk about", "tokens": + [51460, 393, 291, 5268, 257, 857, 544, 322, 300, 437, 360, 291, 914, 2232, 1392, + 2656, 284, 1619, 286, 528, 281, 751, 466, 51688], "temperature": 0.0, "avg_logprob": + -0.147407560932393, "compression_ratio": 1.669603524229075, "no_speech_prob": 0.003405550494790077}, + {"id": 789, "seek": 488038, "start": 4880.38, "end": 4884.78, "text": " the turn + taking Demetri can you expand on that more on what that means Conor okay it''s like", + "tokens": [50364, 264, 1261, 1940, 4686, 302, 470, 393, 291, 5268, 322, 300, 544, + 322, 437, 300, 1355, 2656, 284, 1392, 309, 311, 411, 50584], "temperature": 0.0, + "avg_logprob": -0.29456416420314624, "compression_ratio": 1.8009478672985781, "no_speech_prob": + 0.0036261503119021654}, {"id": 790, "seek": 488038, "start": 4886.86, "end": 4893.34, + "text": " this is how you do the positive thing that''s like potentially like that + yeah yeah yeah and like if", "tokens": [50688, 341, 307, 577, 291, 360, 264, 3353, + 551, 300, 311, 411, 7263, 411, 300, 1338, 1338, 1338, 293, 411, 498, 51012], "temperature": + 0.0, "avg_logprob": -0.29456416420314624, 
"compression_ratio": 1.8009478672985781, + "no_speech_prob": 0.0036261503119021654}, {"id": 791, "seek": 488038, "start": 4893.34, + "end": 4898.22, "text": " you want to have more examples of what Conor said like + you could like augment with Conor''s", "tokens": [51012, 291, 528, 281, 362, 544, + 5110, 295, 437, 2656, 284, 848, 411, 291, 727, 411, 29919, 365, 2656, 284, 311, + 51256], "temperature": 0.0, "avg_logprob": -0.29456416420314624, "compression_ratio": + 1.8009478672985781, "no_speech_prob": 0.0036261503119021654}, {"id": 792, "seek": + 488038, "start": 4898.22, "end": 4905.58, "text": " uh statements uh oh like sentences + yeah yeah and just the I feel like the potential of it is crazy", "tokens": [51256, + 2232, 12363, 2232, 1954, 411, 16579, 1338, 1338, 293, 445, 264, 286, 841, 411, 264, + 3995, 295, 309, 307, 3219, 51624], "temperature": 0.0, "avg_logprob": -0.29456416420314624, + "compression_ratio": 1.8009478672985781, "no_speech_prob": 0.0036261503119021654}, + {"id": 793, "seek": 490558, "start": 4906.14, "end": 4911.1, "text": " I also think + like we''re gonna see it like hooked into say Spotify are these big platforms that", + "tokens": [50392, 286, 611, 519, 411, 321, 434, 799, 536, 309, 411, 20410, 666, + 584, 29036, 366, 613, 955, 9473, 300, 50640], "temperature": 0.0, "avg_logprob": + -0.10304287627891258, "compression_ratio": 1.9291338582677164, "no_speech_prob": + 0.0051459879614412785}, {"id": 794, "seek": 490558, "start": 4911.1, "end": 4915.82, + "text": " organize podcasts and I think it''ll help you like discover like because + something else it''s like", "tokens": [50640, 13859, 24045, 293, 286, 519, 309, + 603, 854, 291, 411, 4411, 411, 570, 746, 1646, 309, 311, 411, 50876], "temperature": + 0.0, "avg_logprob": -0.10304287627891258, "compression_ratio": 1.9291338582677164, + "no_speech_prob": 0.0051459879614412785}, {"id": 795, "seek": 490558, "start": 4915.82, + "end": 4920.86, "text": " I love how you do this vector search 
podcast and I''m + also doing a vector search podcast and it''s like", "tokens": [50876, 286, 959, + 577, 291, 360, 341, 8062, 3164, 7367, 293, 286, 478, 611, 884, 257, 8062, 3164, + 7367, 293, 309, 311, 411, 51128], "temperature": 0.0, "avg_logprob": -0.10304287627891258, + "compression_ratio": 1.9291338582677164, "no_speech_prob": 0.0051459879614412785}, + {"id": 796, "seek": 490558, "start": 4920.86, "end": 4926.14, "text": " who else + is out there doing like maybe a recommendation podcast or like you know like it''s + like", "tokens": [51128, 567, 1646, 307, 484, 456, 884, 411, 1310, 257, 11879, 7367, + 420, 411, 291, 458, 411, 309, 311, 411, 51392], "temperature": 0.0, "avg_logprob": + -0.10304287627891258, "compression_ratio": 1.9291338582677164, "no_speech_prob": + 0.0051459879614412785}, {"id": 797, "seek": 490558, "start": 4926.14, "end": 4931.26, + "text": " this kind of discovery about the people because podcasting is very like + collaborative it is a medium", "tokens": [51392, 341, 733, 295, 12114, 466, 264, + 561, 570, 7367, 278, 307, 588, 411, 16555, 309, 307, 257, 6399, 51648], "temperature": + 0.0, "avg_logprob": -0.10304287627891258, "compression_ratio": 1.9291338582677164, + "no_speech_prob": 0.0051459879614412785}, {"id": 798, "seek": 493126, "start": 4931.34, + "end": 4937.26, "text": " right like it is not like you you can''t do it by yourself + no like it''s like it''s it''s almost like", "tokens": [50368, 558, 411, 309, 307, + 406, 411, 291, 291, 393, 380, 360, 309, 538, 1803, 572, 411, 309, 311, 411, 309, + 311, 309, 311, 1920, 411, 50664], "temperature": 0.0, "avg_logprob": -0.13954882277655847, + "compression_ratio": 1.7782805429864252, "no_speech_prob": 0.009568842127919197}, + {"id": 799, "seek": 493126, "start": 4937.26, "end": 4942.860000000001, "text": + " the thing uh like stand up comedian so anyone who is presenting you do need the + audience because", "tokens": [50664, 264, 551, 2232, 411, 1463, 493, 30212, 370, + 2878, 567, 307, 
15578, 291, 360, 643, 264, 4034, 570, 50944], "temperature": 0.0, + "avg_logprob": -0.13954882277655847, "compression_ratio": 1.7782805429864252, "no_speech_prob": + 0.009568842127919197}, {"id": 800, "seek": 493126, "start": 4942.860000000001, "end": + 4951.34, "text": " you simply do not generate the 3d-ness of your thoughts in absence + of people like it''s very hard", "tokens": [50944, 291, 2935, 360, 406, 8460, 264, + 805, 67, 12, 1287, 295, 428, 4598, 294, 17145, 295, 561, 411, 309, 311, 588, 1152, + 51368], "temperature": 0.0, "avg_logprob": -0.13954882277655847, "compression_ratio": + 1.7782805429864252, "no_speech_prob": 0.009568842127919197}, {"id": 801, "seek": + 493126, "start": 4951.34, "end": 4958.06, "text": " to do and then same thing happens + here right now like when we exchange like I like I have like a full", "tokens": + [51368, 281, 360, 293, 550, 912, 551, 2314, 510, 558, 586, 411, 562, 321, 7742, + 411, 286, 411, 286, 362, 411, 257, 1577, 51704], "temperature": 0.0, "avg_logprob": + -0.13954882277655847, "compression_ratio": 1.7782805429864252, "no_speech_prob": + 0.009568842127919197}, {"id": 802, "seek": 495806, "start": 4958.14, "end": 4962.860000000001, + "text": " shade of these notes and stuff right so I wouldn''t be able like what + like do I do you know", "tokens": [50368, 11466, 295, 613, 5570, 293, 1507, 558, + 370, 286, 2759, 380, 312, 1075, 411, 437, 411, 360, 286, 360, 291, 458, 50604], + "temperature": 0.0, "avg_logprob": -0.17972396047491776, "compression_ratio": 1.7735849056603774, + "no_speech_prob": 0.02152528241276741}, {"id": 803, "seek": 495806, "start": 4962.860000000001, + "end": 4969.660000000001, "text": " these things do I know some of these things + you know it''s like a vote if working in your memory but", "tokens": [50604, 613, + 721, 360, 286, 458, 512, 295, 613, 721, 291, 458, 309, 311, 411, 257, 4740, 498, + 1364, 294, 428, 4675, 457, 50944], "temperature": 0.0, "avg_logprob": -0.17972396047491776, + 
"compression_ratio": 1.7735849056603774, "no_speech_prob": 0.02152528241276741}, + {"id": 804, "seek": 495806, "start": 4969.660000000001, "end": 4975.42, "text": + " like coming back to whisper like just to get it right you you''re saying it''s + still a bottle neck", "tokens": [50944, 411, 1348, 646, 281, 26018, 411, 445, 281, + 483, 309, 558, 291, 291, 434, 1566, 309, 311, 920, 257, 7817, 6189, 51232], "temperature": + 0.0, "avg_logprob": -0.17972396047491776, "compression_ratio": 1.7735849056603774, + "no_speech_prob": 0.02152528241276741}, {"id": 805, "seek": 495806, "start": 4975.42, + "end": 4981.1, "text": " in your opinion in what way okay well I''d hate to be like + quoted as saying it''s not good", "tokens": [51232, 294, 428, 4800, 294, 437, 636, + 1392, 731, 286, 1116, 4700, 281, 312, 411, 30047, 382, 1566, 309, 311, 406, 665, + 51516], "temperature": 0.0, "avg_logprob": -0.17972396047491776, "compression_ratio": + 1.7735849056603774, "no_speech_prob": 0.02152528241276741}, {"id": 806, "seek": + 498110, "start": 4981.1, "end": 4989.26, "text": " if it''s not the same thing which + I value you know it''s not yeah so if you''re creating a podcast", "tokens": [50364, + 498, 309, 311, 406, 264, 912, 551, 597, 286, 2158, 291, 458, 309, 311, 406, 1338, + 370, 498, 291, 434, 4084, 257, 7367, 50772], "temperature": 0.0, "avg_logprob": + -0.22395217895507813, "compression_ratio": 1.7953488372093023, "no_speech_prob": + 0.008322718553245068}, {"id": 807, "seek": 498110, "start": 4989.26, "end": 4994.700000000001, + "text": " search app you like there''s still needs to be a little more parsing I + don''t know if you need to find", "tokens": [50772, 3164, 724, 291, 411, 456, 311, + 920, 2203, 281, 312, 257, 707, 544, 21156, 278, 286, 500, 380, 458, 498, 291, 643, + 281, 915, 51044], "temperature": 0.0, "avg_logprob": -0.22395217895507813, "compression_ratio": + 1.7953488372093023, "no_speech_prob": 0.008322718553245068}, {"id": 808, "seek": + 498110, "start": 
4994.700000000001, "end": 5001.42, "text": " I don''t know if you + need to correct one and then fine tune so because I''ve also been playing a", "tokens": + [51044, 286, 500, 380, 458, 498, 291, 643, 281, 3006, 472, 293, 550, 2489, 10864, + 370, 570, 286, 600, 611, 668, 2433, 257, 51380], "temperature": 0.0, "avg_logprob": + -0.22395217895507813, "compression_ratio": 1.7953488372093023, "no_speech_prob": + 0.008322718553245068}, {"id": 809, "seek": 498110, "start": 5001.42, "end": 5005.740000000001, + "text": " little bit more about chat gbc and as as I''ve been learning about this + kind of like sequential", "tokens": [51380, 707, 857, 544, 466, 5081, 290, 65, 66, + 293, 382, 382, 286, 600, 668, 2539, 466, 341, 733, 295, 411, 42881, 51596], "temperature": + 0.0, "avg_logprob": -0.22395217895507813, "compression_ratio": 1.7953488372093023, + "no_speech_prob": 0.008322718553245068}, {"id": 810, "seek": 500574, "start": 5005.74, + "end": 5011.9, "text": " prompting from gbc index and chain about learning like + how you can get chat gbc to maybe clean", "tokens": [50364, 12391, 278, 490, 290, + 65, 66, 8186, 293, 5021, 466, 2539, 411, 577, 291, 393, 483, 5081, 290, 65, 66, + 281, 1310, 2541, 50672], "temperature": 0.0, "avg_logprob": -0.1080804583670079, + "compression_ratio": 1.6711711711711712, "no_speech_prob": 0.00440682889893651}, + {"id": 811, "seek": 500574, "start": 5011.9, "end": 5017.98, "text": " up a podcast + transcription but there''s like still a pretty fat pretty difficult manual", "tokens": + [50672, 493, 257, 7367, 35288, 457, 456, 311, 411, 920, 257, 1238, 4046, 1238, 2252, + 9688, 50976], "temperature": 0.0, "avg_logprob": -0.1080804583670079, "compression_ratio": + 1.6711711711711712, "no_speech_prob": 0.00440682889893651}, {"id": 812, "seek": + 500574, "start": 5017.98, "end": 5025.0199999999995, "text": " cleaning effort in + the middle of that yeah actually I can resonate with that like I''ve I''ve worked", + "tokens": [50976, 8924, 4630, 294, 
264, 2808, 295, 300, 1338, 767, 286, 393, 34285, + 365, 300, 411, 286, 600, 286, 600, 2732, 51328], "temperature": 0.0, "avg_logprob": + -0.1080804583670079, "compression_ratio": 1.6711711711711712, "no_speech_prob": + 0.00440682889893651}, {"id": 813, "seek": 500574, "start": 5025.9, "end": 5032.54, + "text": " with one startup helping them to do speech to text right and first of + all one one issue is", "tokens": [51372, 365, 472, 18578, 4315, 552, 281, 360, 6218, + 281, 2487, 558, 293, 700, 295, 439, 472, 472, 2734, 307, 51704], "temperature": + 0.0, "avg_logprob": -0.1080804583670079, "compression_ratio": 1.6711711711711712, + "no_speech_prob": 0.00440682889893651}, {"id": 814, "seek": 503254, "start": 5032.54, + "end": 5038.7, "text": " very similar with low resource so to say languages in an + OPs that if you don''t have a model", "tokens": [50364, 588, 2531, 365, 2295, 7684, + 370, 281, 584, 8650, 294, 364, 422, 23043, 300, 498, 291, 500, 380, 362, 257, 2316, + 50672], "temperature": 0.0, "avg_logprob": -0.1805364415886697, "compression_ratio": + 1.669603524229075, "no_speech_prob": 0.00454159639775753}, {"id": 815, "seek": 503254, + "start": 5038.7, "end": 5044.62, "text": " trained on a lot of examples or maybe + they''ve been trained on some TV shows and you are doing", "tokens": [50672, 8895, + 322, 257, 688, 295, 5110, 420, 1310, 436, 600, 668, 8895, 322, 512, 3558, 3110, + 293, 291, 366, 884, 50968], "temperature": 0.0, "avg_logprob": -0.1805364415886697, + "compression_ratio": 1.669603524229075, "no_speech_prob": 0.00454159639775753}, + {"id": 816, "seek": 503254, "start": 5045.42, "end": 5053.18, "text": " and a user + speech stuff you know the topics are different the style is different everything + is", "tokens": [51008, 293, 257, 4195, 6218, 1507, 291, 458, 264, 8378, 366, 819, + 264, 3758, 307, 819, 1203, 307, 51396], "temperature": 0.0, "avg_logprob": -0.1805364415886697, + "compression_ratio": 1.669603524229075, "no_speech_prob": 
0.00454159639775753}, + {"id": 817, "seek": 503254, "start": 5053.18, "end": 5059.58, "text": " different + and so it breaks and so I was also eluding to the topic of fine tuning there but + exactly", "tokens": [51396, 819, 293, 370, 309, 9857, 293, 370, 286, 390, 611, 806, + 33703, 281, 264, 4829, 295, 2489, 15164, 456, 457, 2293, 51716], "temperature": + 0.0, "avg_logprob": -0.1805364415886697, "compression_ratio": 1.669603524229075, + "no_speech_prob": 0.00454159639775753}, {"id": 818, "seek": 505958, "start": 5059.58, + "end": 5066.38, "text": " what you said the problem was the output was so noisy + that I had to write and what I called like", "tokens": [50364, 437, 291, 848, 264, + 1154, 390, 264, 5598, 390, 370, 24518, 300, 286, 632, 281, 2464, 293, 437, 286, + 1219, 411, 50704], "temperature": 0.0, "avg_logprob": -0.12566911797774466, "compression_ratio": + 1.6943231441048034, "no_speech_prob": 0.01362245436757803}, {"id": 819, "seek": + 505958, "start": 5066.38, "end": 5073.42, "text": " an LPLayer which would go and + you know change things for instance if you say 25 and it actually", "tokens": [50704, + 364, 441, 21593, 11167, 597, 576, 352, 293, 291, 458, 1319, 721, 337, 5197, 498, + 291, 584, 3552, 293, 309, 767, 51056], "temperature": 0.0, "avg_logprob": -0.12566911797774466, + "compression_ratio": 1.6943231441048034, "no_speech_prob": 0.01362245436757803}, + {"id": 820, "seek": 505958, "start": 5073.42, "end": 5079.82, "text": " spells it + out with letters you you will collapse that to a number you know but sometimes it + would", "tokens": [51056, 25053, 309, 484, 365, 7825, 291, 291, 486, 15584, 300, + 281, 257, 1230, 291, 458, 457, 2171, 309, 576, 51376], "temperature": 0.0, "avg_logprob": + -0.12566911797774466, "compression_ratio": 1.6943231441048034, "no_speech_prob": + 0.01362245436757803}, {"id": 821, "seek": 505958, "start": 5079.82, "end": 5085.9, + "text": " do it in problematic places and you''re like oh no don''t do that don''t + do it 
here you know so like", "tokens": [51376, 360, 309, 294, 19011, 3190, 293, + 291, 434, 411, 1954, 572, 500, 380, 360, 300, 500, 380, 360, 309, 510, 291, 458, + 370, 411, 51680], "temperature": 0.0, "avg_logprob": -0.12566911797774466, "compression_ratio": + 1.6943231441048034, "no_speech_prob": 0.01362245436757803}, {"id": 822, "seek": + 508590, "start": 5085.9, "end": 5094.86, "text": " it''s just like an aftermath + you know thing and you would wish that the model having enough context", "tokens": + [50364, 309, 311, 445, 411, 364, 34095, 291, 458, 551, 293, 291, 576, 3172, 300, + 264, 2316, 1419, 1547, 4319, 50812], "temperature": 0.0, "avg_logprob": -0.19235638512505426, + "compression_ratio": 1.7477477477477477, "no_speech_prob": 0.005314115434885025}, + {"id": 823, "seek": 508590, "start": 5094.86, "end": 5102.299999999999, "text": + " and knowledge about the world should do it right as it transcribes rather than + you do this as", "tokens": [50812, 293, 3601, 466, 264, 1002, 820, 360, 309, 558, + 382, 309, 1145, 1142, 6446, 2831, 813, 291, 360, 341, 382, 51184], "temperature": + 0.0, "avg_logprob": -0.19235638512505426, "compression_ratio": 1.7477477477477477, + "no_speech_prob": 0.005314115434885025}, {"id": 824, "seek": 508590, "start": 5102.299999999999, + "end": 5108.219999999999, "text": " as a aftermath yeah yeah exactly I''m thinking + the same way and he''s like it''s a text layer afterwards", "tokens": [51184, 382, + 257, 34095, 1338, 1338, 2293, 286, 478, 1953, 264, 912, 636, 293, 415, 311, 411, + 309, 311, 257, 2487, 4583, 10543, 51480], "temperature": 0.0, "avg_logprob": -0.19235638512505426, + "compression_ratio": 1.7477477477477477, "no_speech_prob": 0.005314115434885025}, + {"id": 825, "seek": 508590, "start": 5108.7, "end": 5113.58, "text": " yeah yeah + exactly yeah super cool and then maybe like as we''re wrapping up the podcast if + you", "tokens": [51504, 1338, 1338, 2293, 1338, 1687, 1627, 293, 550, 1310, 411, + 382, 321, 434, 21993, 
493, 264, 7367, 498, 291, 51748], "temperature": 0.0, "avg_logprob": + -0.19235638512505426, "compression_ratio": 1.7477477477477477, "no_speech_prob": + 0.005314115434885025}, {"id": 826, "seek": 511358, "start": 5113.58, "end": 5118.46, + "text": " let me quickly tell you about ref to veck and sort of the pivot into recommendation + and well so to", "tokens": [50364, 718, 385, 2661, 980, 291, 466, 1895, 281, 1241, + 547, 293, 1333, 295, 264, 14538, 666, 11879, 293, 731, 370, 281, 50608], "temperature": + 0.0, "avg_logprob": -0.209107967691684, "compression_ratio": 1.8588235294117648, + "no_speech_prob": 0.00044177277595736086}, {"id": 827, "seek": 511358, "start": + 5118.46, "end": 5123.26, "text": " start off ref to veck is about and it''s about + utilizing we VH data model a little more so", "tokens": [50608, 722, 766, 1895, + 281, 1241, 547, 307, 466, 293, 309, 311, 466, 26775, 321, 691, 39, 1412, 2316, 257, + 707, 544, 370, 50848], "temperature": 0.0, "avg_logprob": -0.209107967691684, "compression_ratio": + 1.8588235294117648, "no_speech_prob": 0.00044177277595736086}, {"id": 828, "seek": + 511358, "start": 5123.9, "end": 5128.22, "text": " we VH data model is designed + where you have different classes so this class could be", "tokens": [50880, 321, + 691, 39, 1412, 2316, 307, 4761, 689, 291, 362, 819, 5359, 370, 341, 1508, 727, 312, + 51096], "temperature": 0.0, "avg_logprob": -0.209107967691684, "compression_ratio": + 1.8588235294117648, "no_speech_prob": 0.00044177277595736086}, {"id": 829, "seek": + 511358, "start": 5129.0199999999995, "end": 5135.1, "text": " products this class + could be user so you know like tables and SQL we have different data objects like", + "tokens": [51136, 3383, 341, 1508, 727, 312, 4195, 370, 291, 458, 411, 8020, 293, + 19200, 321, 362, 819, 1412, 6565, 411, 51440], "temperature": 0.0, "avg_logprob": + -0.209107967691684, "compression_ratio": 1.8588235294117648, "no_speech_prob": 0.00044177277595736086}, + {"id": 830, 
"seek": 511358, "start": 5135.1, "end": 5140.54, "text": " the high-level + ideas of designing data objects and then you have graph relations between them like", + "tokens": [51440, 264, 1090, 12, 12418, 3487, 295, 14685, 1412, 6565, 293, 550, + 291, 362, 4295, 2299, 1296, 552, 411, 51712], "temperature": 0.0, "avg_logprob": + -0.209107967691684, "compression_ratio": 1.8588235294117648, "no_speech_prob": 0.00044177277595736086}, + {"id": 831, "seek": 514054, "start": 5140.54, "end": 5148.78, "text": " user-like + products so the simplest thing is that then you can represent the user as the average", + "tokens": [50364, 4195, 12, 4092, 3383, 370, 264, 22811, 551, 307, 300, 550, 291, + 393, 2906, 264, 4195, 382, 264, 4274, 50776], "temperature": 0.0, "avg_logprob": + -0.11452786675814924, "compression_ratio": 1.9896373056994818, "no_speech_prob": + 0.000347577384673059}, {"id": 832, "seek": 514054, "start": 5148.78, "end": 5154.38, + "text": " vector of the products that the user liked and then you can rank with + you can re-rank with that", "tokens": [50776, 8062, 295, 264, 3383, 300, 264, 4195, + 4501, 293, 550, 291, 393, 6181, 365, 291, 393, 319, 12, 20479, 365, 300, 51056], + "temperature": 0.0, "avg_logprob": -0.11452786675814924, "compression_ratio": 1.9896373056994818, + "no_speech_prob": 0.000347577384673059}, {"id": 833, "seek": 514054, "start": 5154.38, + "end": 5158.06, "text": " or you could just search with that vector that that could + be your search vector or you could have", "tokens": [51056, 420, 291, 727, 445, + 3164, 365, 300, 8062, 300, 300, 727, 312, 428, 3164, 8062, 420, 291, 727, 362, 51240], + "temperature": 0.0, "avg_logprob": -0.11452786675814924, "compression_ratio": 1.9896373056994818, + "no_speech_prob": 0.000347577384673059}, {"id": 834, "seek": 514054, "start": 5158.06, + "end": 5165.1, "text": " some other search like restaurants in Boston and because + I live in Boston and you know like oh", "tokens": [51240, 512, 661, 3164, 411, 
11486, + 294, 12333, 293, 570, 286, 1621, 294, 12333, 293, 291, 458, 411, 1954, 51592], "temperature": + 0.0, "avg_logprob": -0.11452786675814924, "compression_ratio": 1.9896373056994818, + "no_speech_prob": 0.000347577384673059}, {"id": 835, "seek": 516510, "start": 5165.26, + "end": 5171.820000000001, "text": " sorry I didn''t mean to give away Boston in + the query say I my my query is Italian restaurants", "tokens": [50372, 2597, 286, + 994, 380, 914, 281, 976, 1314, 12333, 294, 264, 14581, 584, 286, 452, 452, 14581, + 307, 10003, 11486, 50700], "temperature": 0.0, "avg_logprob": -0.1631138406950852, + "compression_ratio": 1.9038461538461537, "no_speech_prob": 0.0005800747312605381}, + {"id": 836, "seek": 516510, "start": 5171.820000000001, "end": 5177.18, "text": + " and because it sees that Connor likes restaurant I don''t know like some north + and Italian restaurants", "tokens": [50700, 293, 570, 309, 8194, 300, 33133, 5902, + 6383, 286, 500, 380, 458, 411, 512, 6830, 293, 10003, 11486, 50968], "temperature": + 0.0, "avg_logprob": -0.1631138406950852, "compression_ratio": 1.9038461538461537, + "no_speech_prob": 0.0005800747312605381}, {"id": 837, "seek": 516510, "start": 5177.18, + "end": 5182.38, "text": " one way that like it knows that I''m in Boston so it will + it can personalize just using that vector", "tokens": [50968, 472, 636, 300, 411, + 309, 3255, 300, 286, 478, 294, 12333, 370, 309, 486, 309, 393, 2973, 1125, 445, + 1228, 300, 8062, 51228], "temperature": 0.0, "avg_logprob": -0.1631138406950852, + "compression_ratio": 1.9038461538461537, "no_speech_prob": 0.0005800747312605381}, + {"id": 838, "seek": 516510, "start": 5183.58, "end": 5187.5, "text": " to re-rank + to only show me restaurants in Boston because if you show me a restaurant in Chicago + it''s", "tokens": [51288, 281, 319, 12, 20479, 281, 787, 855, 385, 11486, 294, 12333, + 570, 498, 291, 855, 385, 257, 6383, 294, 9525, 309, 311, 51484], "temperature": + 0.0, "avg_logprob": 
-0.1631138406950852, "compression_ratio": 1.9038461538461537, + "no_speech_prob": 0.0005800747312605381}, {"id": 839, "seek": 516510, "start": 5187.5, + "end": 5194.38, "text": " like useless so so so that''s kind of the first idea is + this kind of like average the vectors to get", "tokens": [51484, 411, 14115, 370, + 370, 370, 300, 311, 733, 295, 264, 700, 1558, 307, 341, 733, 295, 411, 4274, 264, + 18875, 281, 483, 51828], "temperature": 0.0, "avg_logprob": -0.1631138406950852, + "compression_ratio": 1.9038461538461537, "no_speech_prob": 0.0005800747312605381}, + {"id": 840, "seek": 519438, "start": 5194.38, "end": 5199.1, "text": " the centroid + but then there''s this idea where and I learned this from talking to Martin Grootendorce", + "tokens": [50364, 264, 1489, 6490, 457, 550, 456, 311, 341, 1558, 689, 293, 286, + 3264, 341, 490, 1417, 281, 9184, 12981, 310, 521, 284, 384, 50600], "temperature": + 0.0, "avg_logprob": -0.13192123124579422, "compression_ratio": 1.7392857142857143, + "no_speech_prob": 0.004371500574052334}, {"id": 841, "seek": 519438, "start": 5199.1, + "end": 5203.18, "text": " about his burtopic library and I highly recommend people + check that out it''s such a cool way of", "tokens": [50600, 466, 702, 2779, 83, + 40216, 6405, 293, 286, 5405, 2748, 561, 1520, 300, 484, 309, 311, 1270, 257, 1627, + 636, 295, 50804], "temperature": 0.0, "avg_logprob": -0.13192123124579422, "compression_ratio": + 1.7392857142857143, "no_speech_prob": 0.004371500574052334}, {"id": 842, "seek": + 519438, "start": 5203.18, "end": 5208.86, "text": " visualizing vector spaces but + this like HDB scan clustering so he was describing the difference", "tokens": [50804, + 5056, 3319, 8062, 7673, 457, 341, 411, 12149, 33, 11049, 596, 48673, 370, 415, 390, + 16141, 264, 2649, 51088], "temperature": 0.0, "avg_logprob": -0.13192123124579422, + "compression_ratio": 1.7392857142857143, "no_speech_prob": 0.004371500574052334}, + {"id": 843, "seek": 519438, "start": 5208.86, 
"end": 5213.900000000001, "text": + " between HDB scan and k-means clustering for how they produce centroids but and + so HDB scan has", "tokens": [51088, 1296, 12149, 33, 11049, 293, 350, 12, 1398, + 599, 596, 48673, 337, 577, 436, 5258, 1489, 340, 3742, 457, 293, 370, 12149, 33, + 11049, 575, 51340], "temperature": 0.0, "avg_logprob": -0.13192123124579422, "compression_ratio": + 1.7392857142857143, "no_speech_prob": 0.004371500574052334}, {"id": 844, "seek": + 519438, "start": 5213.900000000001, "end": 5218.22, "text": " this very cool like + density clustering thing but regardless of the clustering you use I just I like", + "tokens": [51340, 341, 588, 1627, 411, 10305, 596, 48673, 551, 457, 10060, 295, + 264, 596, 48673, 291, 764, 286, 445, 286, 411, 51556], "temperature": 0.0, "avg_logprob": + -0.13192123124579422, "compression_ratio": 1.7392857142857143, "no_speech_prob": + 0.004371500574052334}, {"id": 845, "seek": 521822, "start": 5218.22, "end": 5223.66, + "text": " HDB scan a lot but so let''s say we get three centroids like I like Nike + shoes a data", "tokens": [50364, 12149, 33, 11049, 257, 688, 457, 370, 718, 311, + 584, 321, 483, 1045, 1489, 340, 3742, 411, 286, 411, 30397, 6654, 257, 1412, 50636], + "temperature": 0.0, "avg_logprob": -0.14020287139075144, "compression_ratio": 1.7969348659003832, + "no_speech_prob": 0.0009895404800772667}, {"id": 846, "seek": 521822, "start": 5223.66, + "end": 5228.860000000001, "text": " shoes and Jordan shoes and you have these three + centroids so you can use those three centroids", "tokens": [50636, 6654, 293, 10979, + 6654, 293, 291, 362, 613, 1045, 1489, 340, 3742, 370, 291, 393, 764, 729, 1045, + 1489, 340, 3742, 50896], "temperature": 0.0, "avg_logprob": -0.14020287139075144, + "compression_ratio": 1.7969348659003832, "no_speech_prob": 0.0009895404800772667}, + {"id": 847, "seek": 521822, "start": 5228.860000000001, "end": 5233.5, "text": " + is three average vectors from their respective clusters to re-rank 
with as well + have some kind", "tokens": [50896, 307, 1045, 4274, 18875, 490, 641, 23649, 23313, + 281, 319, 12, 20479, 365, 382, 731, 362, 512, 733, 51128], "temperature": 0.0, "avg_logprob": + -0.14020287139075144, "compression_ratio": 1.7969348659003832, "no_speech_prob": + 0.0009895404800772667}, {"id": 848, "seek": 521822, "start": 5233.5, "end": 5239.1, + "text": " of diversity and results and then there is just this thinking around so + so yes there that that''s", "tokens": [51128, 295, 8811, 293, 3542, 293, 550, 456, + 307, 445, 341, 1953, 926, 370, 370, 2086, 456, 300, 300, 311, 51408], "temperature": + 0.0, "avg_logprob": -0.14020287139075144, "compression_ratio": 1.7969348659003832, + "no_speech_prob": 0.0009895404800772667}, {"id": 849, "seek": 521822, "start": 5239.1, + "end": 5245.02, "text": " the recommendation pivot and then there''s this idea of + like top level index and I''m stealing that", "tokens": [51408, 264, 11879, 14538, + 293, 550, 456, 311, 341, 1558, 295, 411, 1192, 1496, 8186, 293, 286, 478, 19757, + 300, 51704], "temperature": 0.0, "avg_logprob": -0.14020287139075144, "compression_ratio": + 1.7969348659003832, "no_speech_prob": 0.0009895404800772667}, {"id": 850, "seek": + 524502, "start": 5245.02, "end": 5251.5, "text": " kind of terminology from gpt + index because what gpt index does is to represent a long document", "tokens": [50364, + 733, 295, 27575, 490, 290, 662, 8186, 570, 437, 290, 662, 8186, 775, 307, 281, 2906, + 257, 938, 4166, 50688], "temperature": 0.0, "avg_logprob": -0.1530308370236997, + "compression_ratio": 1.830827067669173, "no_speech_prob": 0.0008654801640659571}, + {"id": 851, "seek": 524502, "start": 5251.5, "end": 5256.540000000001, "text": " + you have again that tree summarization where you could say this is for obviously + right and it''s", "tokens": [50688, 291, 362, 797, 300, 4230, 14611, 2144, 689, + 291, 727, 584, 341, 307, 337, 2745, 558, 293, 309, 311, 50940], "temperature": 0.0, + "avg_logprob": 
-0.1530308370236997, "compression_ratio": 1.830827067669173, "no_speech_prob": + 0.0008654801640659571}, {"id": 852, "seek": 524502, "start": 5256.540000000001, + "end": 5261.5, "text": " for and you summarize these two and then summarize one + and see if this like top level index where you", "tokens": [50940, 337, 293, 291, + 20858, 613, 732, 293, 550, 20858, 472, 293, 536, 498, 341, 411, 1192, 1496, 8186, + 689, 291, 51188], "temperature": 0.0, "avg_logprob": -0.1530308370236997, "compression_ratio": + 1.830827067669173, "no_speech_prob": 0.0008654801640659571}, {"id": 853, "seek": + 524502, "start": 5261.5, "end": 5265.42, "text": " search through this layer first + and this layer and so it''s like if you''re asking question like", "tokens": [51188, + 3164, 807, 341, 4583, 700, 293, 341, 4583, 293, 370, 309, 311, 411, 498, 291, 434, + 3365, 1168, 411, 51384], "temperature": 0.0, "avg_logprob": -0.1530308370236997, + "compression_ratio": 1.830827067669173, "no_speech_prob": 0.0008654801640659571}, + {"id": 854, "seek": 524502, "start": 5265.42, "end": 5270.3, "text": " what was + Barack Obama''s legacy and then you have the symbolic filter of the titles of the + Wikipedia", "tokens": [51384, 437, 390, 31705, 9560, 311, 11711, 293, 550, 291, + 362, 264, 25755, 6608, 295, 264, 12992, 295, 264, 28999, 51628], "temperature": + 0.0, "avg_logprob": -0.1530308370236997, "compression_ratio": 1.830827067669173, + "no_speech_prob": 0.0008654801640659571}, {"id": 855, "seek": 527030, "start": 5270.3, + "end": 5276.3, "text": " pages and you have where title equals Barack Obama like + that top level search will like super", "tokens": [50364, 7183, 293, 291, 362, 689, + 4876, 6915, 31705, 9560, 411, 300, 1192, 1496, 3164, 486, 411, 1687, 50664], "temperature": + 0.0, "avg_logprob": -0.18939610062358536, "compression_ratio": 1.7954545454545454, + "no_speech_prob": 0.002997712232172489}, {"id": 856, "seek": 527030, "start": 5276.3, + "end": 5280.3, "text": " simplified the 
search space because now you''re just looking + in the Barack Obama article and there", "tokens": [50664, 26335, 264, 3164, 1901, + 570, 586, 291, 434, 445, 1237, 294, 264, 31705, 9560, 7222, 293, 456, 50864], "temperature": + 0.0, "avg_logprob": -0.18939610062358536, "compression_ratio": 1.7954545454545454, + "no_speech_prob": 0.002997712232172489}, {"id": 857, "seek": 527030, "start": 5280.3, + "end": 5287.5, "text": " at all Wikipedia so I think reftivect also in the use of + having constructing top level indexes by", "tokens": [50864, 412, 439, 28999, 370, + 286, 519, 1895, 83, 592, 557, 611, 294, 264, 764, 295, 1419, 39969, 1192, 1496, + 8186, 279, 538, 51224], "temperature": 0.0, "avg_logprob": -0.18939610062358536, + "compression_ratio": 1.7954545454545454, "no_speech_prob": 0.002997712232172489}, + {"id": 858, "seek": 527030, "start": 5288.7, "end": 5293.26, "text": " you know + having document has passage has passage has passage again in the we get data model", + "tokens": [51284, 291, 458, 1419, 4166, 575, 11497, 575, 11497, 575, 11497, 797, + 294, 264, 321, 483, 1412, 2316, 51512], "temperature": 0.0, "avg_logprob": -0.18939610062358536, + "compression_ratio": 1.7954545454545454, "no_speech_prob": 0.002997712232172489}, + {"id": 859, "seek": 527030, "start": 5293.820000000001, "end": 5298.3, "text": " + it can be it''s just like I think it''s a really interesting way that we''re trying + to use this", "tokens": [51540, 309, 393, 312, 309, 311, 445, 411, 286, 519, 309, + 311, 257, 534, 1880, 636, 300, 321, 434, 1382, 281, 764, 341, 51764], "temperature": + 0.0, "avg_logprob": -0.18939610062358536, "compression_ratio": 1.7954545454545454, + "no_speech_prob": 0.002997712232172489}, {"id": 860, "seek": 529830, "start": 5298.3, + "end": 5303.02, "text": " cross-reference graph structure to move embeddings through + the graph another idea and I know", "tokens": [50364, 3278, 12, 265, 5158, 4295, + 3877, 281, 1286, 12240, 29432, 807, 264, 4295, 1071, 1558, 293, 
286, 458, 50600], + "temperature": 0.0, "avg_logprob": -0.14686332430158341, "compression_ratio": 1.936, + "no_speech_prob": 0.002331739291548729}, {"id": 861, "seek": 529830, "start": 5303.02, + "end": 5308.14, "text": " like doing a thousand ideas like it could be having like + a graph convolutional network where okay so", "tokens": [50600, 411, 884, 257, 4714, + 3487, 411, 309, 727, 312, 1419, 411, 257, 4295, 45216, 304, 3209, 689, 1392, 370, + 50856], "temperature": 0.0, "avg_logprob": -0.14686332430158341, "compression_ratio": + 1.936, "no_speech_prob": 0.002331739291548729}, {"id": 862, "seek": 529830, "start": + 5308.14, "end": 5316.22, "text": " you have user-like product has brand okay let''s + just make it a three-class graph like that and so", "tokens": [50856, 291, 362, + 4195, 12, 4092, 1674, 575, 3360, 1392, 718, 311, 445, 652, 309, 257, 1045, 12, 11665, + 4295, 411, 300, 293, 370, 51260], "temperature": 0.0, "avg_logprob": -0.14686332430158341, + "compression_ratio": 1.936, "no_speech_prob": 0.002331739291548729}, {"id": 863, + "seek": 529830, "start": 5316.22, "end": 5322.06, "text": " you have this this graph + and you need to send the you need to aggregate the embeddings through the", "tokens": + [51260, 291, 362, 341, 341, 4295, 293, 291, 643, 281, 2845, 264, 291, 643, 281, + 26118, 264, 12240, 29432, 807, 264, 51552], "temperature": 0.0, "avg_logprob": -0.14686332430158341, + "compression_ratio": 1.936, "no_speech_prob": 0.002331739291548729}, {"id": 864, + "seek": 529830, "start": 5322.06, "end": 5326.78, "text": " graph so now it''s like + should we just average should we try some kind of like nonlinear graph", "tokens": + [51552, 4295, 370, 586, 309, 311, 411, 820, 321, 445, 4274, 820, 321, 853, 512, + 733, 295, 411, 2107, 28263, 4295, 51788], "temperature": 0.0, "avg_logprob": -0.14686332430158341, + "compression_ratio": 1.936, "no_speech_prob": 0.002331739291548729}, {"id": 865, + "seek": 532678, "start": 5326.86, "end": 5331.58, "text": " 
convolutional network + and the graph convolutional network being beneficial because a graph network", "tokens": + [50368, 45216, 304, 3209, 293, 264, 4295, 45216, 304, 3209, 885, 14072, 570, 257, + 4295, 3209, 50604], "temperature": 0.0, "avg_logprob": -0.170864847780184, "compression_ratio": + 1.869281045751634, "no_speech_prob": 0.005384357180446386}, {"id": 866, "seek": + 532678, "start": 5331.58, "end": 5336.7, "text": " can handle like arbitrary number + of inputs that''s sort of like isn''t like a fixed input size like", "tokens": [50604, + 393, 4813, 411, 23211, 1230, 295, 15743, 300, 311, 1333, 295, 411, 1943, 380, 411, + 257, 6806, 4846, 2744, 411, 50860], "temperature": 0.0, "avg_logprob": -0.170864847780184, + "compression_ratio": 1.869281045751634, "no_speech_prob": 0.005384357180446386}, + {"id": 867, "seek": 532678, "start": 5336.7, "end": 5341.98, "text": " transformers + you would like zero-pad it to 512 tokens or the convolutional network is it''s", + "tokens": [50860, 4088, 433, 291, 576, 411, 4018, 12, 13647, 309, 281, 1025, 4762, + 22667, 420, 264, 45216, 304, 3209, 307, 309, 311, 51124], "temperature": 0.0, "avg_logprob": + -0.170864847780184, "compression_ratio": 1.869281045751634, "no_speech_prob": 0.005384357180446386}, + {"id": 868, "seek": 532678, "start": 5341.98, "end": 5346.94, "text": " like kind + of flexible but generally it''s like very flexible to the number of inputs and so + I hope", "tokens": [51124, 411, 733, 295, 11358, 457, 5101, 309, 311, 411, 588, + 11358, 281, 264, 1230, 295, 15743, 293, 370, 286, 1454, 51372], "temperature": 0.0, + "avg_logprob": -0.170864847780184, "compression_ratio": 1.869281045751634, "no_speech_prob": + 0.005384357180446386}, {"id": 869, "seek": 532678, "start": 5346.94, "end": 5351.42, + "text": " that was an okay tour of reftivect and I know I''m trying to get in a + little bit start it''s amazing", "tokens": [51372, 300, 390, 364, 1392, 3512, 295, + 1895, 83, 592, 557, 293, 286, 458, 286, 478, 
1382, 281, 483, 294, 257, 707, 857, + 722, 309, 311, 2243, 51596], "temperature": 0.0, "avg_logprob": -0.170864847780184, + "compression_ratio": 1.869281045751634, "no_speech_prob": 0.005384357180446386}, + {"id": 870, "seek": 532678, "start": 5351.42, "end": 5356.38, "text": " actually + and I think I hope we can maybe discuss in subsequent episodes as well because", + "tokens": [51596, 767, 293, 286, 519, 286, 1454, 321, 393, 1310, 2248, 294, 19962, + 9313, 382, 731, 570, 51844], "temperature": 0.0, "avg_logprob": -0.170864847780184, + "compression_ratio": 1.869281045751634, "no_speech_prob": 0.005384357180446386}, + {"id": 871, "seek": 535678, "start": 5356.94, "end": 5362.62, "text": " the topic + of personalization is also very interesting and like for someone who says okay we + just", "tokens": [50372, 264, 4829, 295, 2973, 2144, 307, 611, 588, 1880, 293, 411, + 337, 1580, 567, 1619, 1392, 321, 445, 50656], "temperature": 0.0, "avg_logprob": + -0.11284814722397749, "compression_ratio": 1.6896551724137931, "no_speech_prob": + 0.001495433272793889}, {"id": 872, "seek": 535678, "start": 5362.62, "end": 5367.82, + "text": " have this fixed vectors computed from the content how the hell we can + actually bring the user", "tokens": [50656, 362, 341, 6806, 18875, 40610, 490, 264, + 2701, 577, 264, 4921, 321, 393, 767, 1565, 264, 4195, 50916], "temperature": 0.0, + "avg_logprob": -0.11284814722397749, "compression_ratio": 1.6896551724137931, "no_speech_prob": + 0.001495433272793889}, {"id": 873, "seek": 535678, "start": 5368.46, "end": 5374.46, + "text": " and this is what you''ve described this is what I perceive from it I think + this is an excellent topic", "tokens": [50948, 293, 341, 307, 437, 291, 600, 7619, + 341, 307, 437, 286, 20281, 490, 309, 286, 519, 341, 307, 364, 7103, 4829, 51248], + "temperature": 0.0, "avg_logprob": -0.11284814722397749, "compression_ratio": 1.6896551724137931, + "no_speech_prob": 0.001495433272793889}, {"id": 874, "seek": 535678, 
"start": 5374.46, + "end": 5383.34, "text": " and this kind of opens up opportunities for vector search + to appeal to to the you know search engine", "tokens": [51248, 293, 341, 733, 295, + 9870, 493, 4786, 337, 8062, 3164, 281, 13668, 281, 281, 264, 291, 458, 3164, 2848, + 51692], "temperature": 0.0, "avg_logprob": -0.11284814722397749, "compression_ratio": + 1.6896551724137931, "no_speech_prob": 0.001495433272793889}, {"id": 875, "seek": + 538334, "start": 5383.42, "end": 5391.42, "text": " builder so maybe some other + engines like recommendation and so but I think we have a ton of material", "tokens": + [50368, 27377, 370, 1310, 512, 661, 12982, 411, 11879, 293, 370, 457, 286, 519, + 321, 362, 257, 2952, 295, 2527, 50768], "temperature": 0.0, "avg_logprob": -0.267131980406035, + "compression_ratio": 1.7526881720430108, "no_speech_prob": 0.010554177686572075}, + {"id": 876, "seek": 538334, "start": 5391.42, "end": 5396.62, "text": " I really + love talking to you maybe before we close off is there something you wanted to announce", + "tokens": [50768, 286, 534, 959, 1417, 281, 291, 1310, 949, 321, 1998, 766, 307, + 456, 746, 291, 1415, 281, 7478, 51028], "temperature": 0.0, "avg_logprob": -0.267131980406035, + "compression_ratio": 1.7526881720430108, "no_speech_prob": 0.010554177686572075}, + {"id": 877, "seek": 538334, "start": 5396.62, "end": 5402.3, "text": " to to the + audience of vector podcast oh yeah I think it was so we have toured a lot of things + but", "tokens": [51028, 281, 281, 264, 4034, 295, 8062, 7367, 1954, 1338, 286, 519, + 309, 390, 370, 321, 362, 10095, 986, 257, 688, 295, 721, 457, 51312], "temperature": + 0.0, "avg_logprob": -0.267131980406035, "compression_ratio": 1.7526881720430108, + "no_speech_prob": 0.010554177686572075}, {"id": 878, "seek": 538334, "start": 5402.3, + "end": 5407.58, "text": " I really hope that you check out the weve 8 beer benchmarks + repository so this is a recent effort", "tokens": [51312, 286, 534, 1454, 300, 
291, + 1520, 484, 264, 321, 303, 1649, 8795, 43751, 25841, 370, 341, 307, 257, 5162, 4630, + 51576], "temperature": 0.0, "avg_logprob": -0.267131980406035, "compression_ratio": + 1.7526881720430108, "no_speech_prob": 0.010554177686572075}, {"id": 879, "seek": + 538334, "start": 5407.58, "end": 5412.06, "text": " around hybrid search coming + back to that in a long conversation I really thought it was forever", "tokens": + [51576, 926, 13051, 3164, 1348, 646, 281, 300, 294, 257, 938, 3761, 286, 534, 1194, + 309, 390, 5680, 51800], "temperature": 0.0, "avg_logprob": -0.267131980406035, "compression_ratio": + 1.7526881720430108, "no_speech_prob": 0.010554177686572075}, {"id": 880, "seek": + 541206, "start": 5412.06, "end": 5420.54, "text": " ago but like the hybrid search + thing has been tested with with the beer benchmarks and so there''s", "tokens": + [50364, 2057, 457, 411, 264, 13051, 3164, 551, 575, 668, 8246, 365, 365, 264, 8795, + 43751, 293, 370, 456, 311, 50788], "temperature": 0.0, "avg_logprob": -0.1590147664991476, + "compression_ratio": 2.0121951219512195, "no_speech_prob": 0.002758374437689781}, + {"id": 881, "seek": 541206, "start": 5420.54, "end": 5425.02, "text": " there''s + like scales there''s like small scale beer medium scale larger scale so right now + there''s", "tokens": [50788, 456, 311, 411, 17408, 456, 311, 411, 1359, 4373, 8795, + 6399, 4373, 4833, 4373, 370, 558, 586, 456, 311, 51012], "temperature": 0.0, "avg_logprob": + -0.1590147664991476, "compression_ratio": 2.0121951219512195, "no_speech_prob": + 0.002758374437689781}, {"id": 882, "seek": 541206, "start": 5425.02, "end": 5429.580000000001, + "text": " the larger scale and some medium scale I''m at smaller scale and some + medium scale and right now we''re", "tokens": [51012, 264, 4833, 4373, 293, 512, + 6399, 4373, 286, 478, 412, 4356, 4373, 293, 512, 6399, 4373, 293, 558, 586, 321, + 434, 51240], "temperature": 0.0, "avg_logprob": -0.1590147664991476, "compression_ratio": + 
2.0121951219512195, "no_speech_prob": 0.002758374437689781}, {"id": 883, "seek": + 541206, "start": 5429.580000000001, "end": 5435.900000000001, "text": " working + on the backups but this is all based on so we''ve got 1.15 had backups where you + can you know", "tokens": [51240, 1364, 322, 264, 50160, 457, 341, 307, 439, 2361, + 322, 370, 321, 600, 658, 502, 13, 5211, 632, 50160, 689, 291, 393, 291, 458, 51556], + "temperature": 0.0, "avg_logprob": -0.1590147664991476, "compression_ratio": 2.0121951219512195, + "no_speech_prob": 0.002758374437689781}, {"id": 884, "seek": 541206, "start": 5436.54, + "end": 5441.740000000001, "text": " back up the weve 8 instance to have like a file + that lets you just restore the weve 8 instance so", "tokens": [51588, 646, 493, + 264, 321, 303, 1649, 5197, 281, 362, 411, 257, 3991, 300, 6653, 291, 445, 15227, + 264, 321, 303, 1649, 5197, 370, 51848], "temperature": 0.0, "avg_logprob": -0.1590147664991476, + "compression_ratio": 2.0121951219512195, "no_speech_prob": 0.002758374437689781}, + {"id": 885, "seek": 544174, "start": 5441.74, "end": 5445.26, "text": " you don''t + need to import the data it''s like you know it''s like with the face indexes how + you can", "tokens": [50364, 291, 500, 380, 643, 281, 974, 264, 1412, 309, 311, 411, + 291, 458, 309, 311, 411, 365, 264, 1851, 8186, 279, 577, 291, 393, 50540], "temperature": + 0.0, "avg_logprob": -0.0960767210022477, "compression_ratio": 1.8643410852713178, + "no_speech_prob": 0.0001514465839136392}, {"id": 886, "seek": 544174, "start": 5445.26, + "end": 5451.82, "text": " just read index so so now what you can do is you can just + load the weve 8 index and so why this", "tokens": [50540, 445, 1401, 8186, 370, + 370, 586, 437, 291, 393, 360, 307, 291, 393, 445, 3677, 264, 321, 303, 1649, 8186, + 293, 370, 983, 341, 50868], "temperature": 0.0, "avg_logprob": -0.0960767210022477, + "compression_ratio": 1.8643410852713178, "no_speech_prob": 0.0001514465839136392}, + {"id": 887, 
"seek": 544174, "start": 5451.82, "end": 5456.219999999999, "text": + " is so exciting to me is I''ve I''ve always been really interested in like hug + and face data sets or", "tokens": [50868, 307, 370, 4670, 281, 385, 307, 286, 600, + 286, 600, 1009, 668, 534, 3102, 294, 411, 8777, 293, 1851, 1412, 6352, 420, 51088], + "temperature": 0.0, "avg_logprob": -0.0960767210022477, "compression_ratio": 1.8643410852713178, + "no_speech_prob": 0.0001514465839136392}, {"id": 888, "seek": 544174, "start": 5456.219999999999, + "end": 5463.42, "text": " papers with codes papers with data like this organization + of data and I used to think with like", "tokens": [51088, 10577, 365, 14211, 10577, + 365, 1412, 411, 341, 4475, 295, 1412, 293, 286, 1143, 281, 519, 365, 411, 51448], + "temperature": 0.0, "avg_logprob": -0.0960767210022477, "compression_ratio": 1.8643410852713178, + "no_speech_prob": 0.0001514465839136392}, {"id": 889, "seek": 544174, "start": 5463.42, + "end": 5468.78, "text": " weve 8''s Wikipedia demo that it would need to be like + live always hosted like you click try it", "tokens": [51448, 321, 303, 1649, 311, + 28999, 10723, 300, 309, 576, 643, 281, 312, 411, 1621, 1009, 19204, 411, 291, 2052, + 853, 309, 51716], "temperature": 0.0, "avg_logprob": -0.0960767210022477, "compression_ratio": + 1.8643410852713178, "no_speech_prob": 0.0001514465839136392}, {"id": 890, "seek": + 546878, "start": 5468.78, "end": 5473.34, "text": " now and then it''s like boom + you''re in the console and you can query it but I think with these with", "tokens": + [50364, 586, 293, 550, 309, 311, 411, 9351, 291, 434, 294, 264, 11076, 293, 291, + 393, 14581, 309, 457, 286, 519, 365, 613, 365, 50592], "temperature": 0.0, "avg_logprob": + -0.09469001019587282, "compression_ratio": 1.850187265917603, "no_speech_prob": + 0.0037916619330644608}, {"id": 891, "seek": 546878, "start": 5473.34, "end": 5478.38, + "text": " this repo where you just download the Docker file for weve 8 it''s like + 
three it''s like two lines of", "tokens": [50592, 341, 49040, 689, 291, 445, 5484, + 264, 33772, 3991, 337, 321, 303, 1649, 309, 311, 411, 1045, 309, 311, 411, 732, + 3876, 295, 50844], "temperature": 0.0, "avg_logprob": -0.09469001019587282, "compression_ratio": + 1.850187265917603, "no_speech_prob": 0.0037916619330644608}, {"id": 892, "seek": + 546878, "start": 5478.38, "end": 5483.9, "text": " code where you do Docker compose + up and then Python restore the name of the data set you want and I", "tokens": [50844, + 3089, 689, 291, 360, 33772, 35925, 493, 293, 550, 15329, 15227, 264, 1315, 295, + 264, 1412, 992, 291, 528, 293, 286, 51120], "temperature": 0.0, "avg_logprob": -0.09469001019587282, + "compression_ratio": 1.850187265917603, "no_speech_prob": 0.0037916619330644608}, + {"id": 893, "seek": 546878, "start": 5483.9, "end": 5490.86, "text": " think that''s + just as easy as having some always hosted demo so yeah I hope and I think the other", + "tokens": [51120, 519, 300, 311, 445, 382, 1858, 382, 1419, 512, 1009, 19204, 10723, + 370, 1338, 286, 1454, 293, 286, 519, 264, 661, 51468], "temperature": 0.0, "avg_logprob": + -0.09469001019587282, "compression_ratio": 1.850187265917603, "no_speech_prob": + 0.0037916619330644608}, {"id": 894, "seek": 546878, "start": 5490.86, "end": 5495.5, + "text": " thing is with with hybrid search another thing that excites me so much + is it''s like if it''s vector", "tokens": [51468, 551, 307, 365, 365, 13051, 3164, + 1071, 551, 300, 1624, 3324, 385, 370, 709, 307, 309, 311, 411, 498, 309, 311, 8062, + 51700], "temperature": 0.0, "avg_logprob": -0.09469001019587282, "compression_ratio": + 1.850187265917603, "no_speech_prob": 0.0037916619330644608}, {"id": 895, "seek": + 549550, "start": 5495.5, "end": 5500.46, "text": " search only it''s like you could + argue well why don''t I just use the face index then but I think", "tokens": [50364, + 3164, 787, 309, 311, 411, 291, 727, 9695, 731, 983, 500, 380, 286, 445, 764, 264, + 
1851, 8186, 550, 457, 286, 519, 50612], "temperature": 0.0, "avg_logprob": -0.08975315960970792, + "compression_ratio": 1.7572463768115942, "no_speech_prob": 0.0017455018823966384}, + {"id": 896, "seek": 549550, "start": 5500.46, "end": 5506.38, "text": " because + it''s got the BM25 and the vector search is starting to offer more value with like + how it", "tokens": [50612, 570, 309, 311, 658, 264, 15901, 6074, 293, 264, 8062, + 3164, 307, 2891, 281, 2626, 544, 2158, 365, 411, 577, 309, 50908], "temperature": + 0.0, "avg_logprob": -0.08975315960970792, "compression_ratio": 1.7572463768115942, + "no_speech_prob": 0.0017455018823966384}, {"id": 897, "seek": 549550, "start": 5506.38, + "end": 5511.26, "text": " can help you with your information retrieval research + yeah and in general that''s just something", "tokens": [50908, 393, 854, 291, 365, + 428, 1589, 19817, 3337, 2132, 1338, 293, 294, 2674, 300, 311, 445, 746, 51152], + "temperature": 0.0, "avg_logprob": -0.08975315960970792, "compression_ratio": 1.7572463768115942, + "no_speech_prob": 0.0017455018823966384}, {"id": 898, "seek": 549550, "start": 5511.26, + "end": 5515.18, "text": " that is very important to me is trying to figure out how + to connect with the information retrieval", "tokens": [51152, 300, 307, 588, 1021, + 281, 385, 307, 1382, 281, 2573, 484, 577, 281, 1745, 365, 264, 1589, 19817, 3337, + 51348], "temperature": 0.0, "avg_logprob": -0.08975315960970792, "compression_ratio": + 1.7572463768115942, "no_speech_prob": 0.0017455018823966384}, {"id": 899, "seek": + 549550, "start": 5515.18, "end": 5519.9, "text": " research I think the beer benchmarks + presents a really exciting way to do it I do have some ideas", "tokens": [51348, + 2132, 286, 519, 264, 8795, 43751, 13533, 257, 534, 4670, 636, 281, 360, 309, 286, + 360, 362, 512, 3487, 51584], "temperature": 0.0, "avg_logprob": -0.08975315960970792, + "compression_ratio": 1.7572463768115942, "no_speech_prob": 0.0017455018823966384}, + {"id": 900, 
"seek": 551990, "start": 5519.9, "end": 5526.299999999999, "text": " + on how users would be interested in it because I think the idea of beer benchmarks + is maybe you look", "tokens": [50364, 322, 577, 5022, 576, 312, 3102, 294, 309, + 570, 286, 519, 264, 1558, 295, 8795, 43751, 307, 1310, 291, 574, 50684], "temperature": + 0.0, "avg_logprob": -0.13011157954180683, "compression_ratio": 1.7392857142857143, + "no_speech_prob": 0.005386746022850275}, {"id": 901, "seek": 551990, "start": 5526.299999999999, + "end": 5532.299999999999, "text": " at it and you say okay NF corpus or trec covid + or natural questions like very similar to what the", "tokens": [50684, 412, 309, + 293, 291, 584, 1392, 13576, 1181, 31624, 420, 2192, 66, 25616, 420, 3303, 1651, + 411, 588, 2531, 281, 437, 264, 50984], "temperature": 0.0, "avg_logprob": -0.13011157954180683, + "compression_ratio": 1.7392857142857143, "no_speech_prob": 0.005386746022850275}, + {"id": 902, "seek": 551990, "start": 5532.299999999999, "end": 5538.379999999999, + "text": " app that I''m building but I think with chat gbt you could probably loop + through your documents and", "tokens": [50984, 724, 300, 286, 478, 2390, 457, 286, + 519, 365, 5081, 290, 4517, 291, 727, 1391, 6367, 807, 428, 8512, 293, 51288], "temperature": + 0.0, "avg_logprob": -0.13011157954180683, "compression_ratio": 1.7392857142857143, + "no_speech_prob": 0.005386746022850275}, {"id": 903, "seek": 551990, "start": 5538.379999999999, + "end": 5543.66, "text": " generate queries gold like those would be the gold documents + for those queries and you can do", "tokens": [51288, 8460, 24109, 3821, 411, 729, + 576, 312, 264, 3821, 8512, 337, 729, 24109, 293, 291, 393, 360, 51552], "temperature": + 0.0, "avg_logprob": -0.13011157954180683, "compression_ratio": 1.7392857142857143, + "no_speech_prob": 0.005386746022850275}, {"id": 904, "seek": 551990, "start": 5543.66, + "end": 5547.099999999999, "text": " the same kind of evaluation testing where as + you 
mentioned you want to see how that approximate", "tokens": [51552, 264, 912, + 733, 295, 13344, 4997, 689, 382, 291, 2835, 291, 528, 281, 536, 577, 300, 30874, + 51724], "temperature": 0.0, "avg_logprob": -0.13011157954180683, "compression_ratio": + 1.7392857142857143, "no_speech_prob": 0.005386746022850275}, {"id": 905, "seek": + 554710, "start": 5547.1, "end": 5551.660000000001, "text": " nearest neighbor error + cascades into the representation error and see what that means for your", "tokens": + [50364, 23831, 5987, 6713, 3058, 66, 2977, 666, 264, 10290, 6713, 293, 536, 437, + 300, 1355, 337, 428, 50592], "temperature": 0.0, "avg_logprob": -0.15286540985107422, + "compression_ratio": 1.7105263157894737, "no_speech_prob": 0.01797335408627987}, + {"id": 906, "seek": 554710, "start": 5551.660000000001, "end": 5557.900000000001, + "text": " particular problem so I hope people check it out I hope find an interesting + yeah that''s a that''s a", "tokens": [50592, 1729, 1154, 370, 286, 1454, 561, 1520, + 309, 484, 286, 1454, 915, 364, 1880, 1338, 300, 311, 257, 300, 311, 257, 50904], + "temperature": 0.0, "avg_logprob": -0.15286540985107422, "compression_ratio": 1.7105263157894737, + "no_speech_prob": 0.01797335408627987}, {"id": 907, "seek": 554710, "start": 5557.900000000001, + "end": 5565.18, "text": " ton super packed thanks so much what I like in this discussion + compared to to the last 20 years ago", "tokens": [50904, 2952, 1687, 13265, 3231, + 370, 709, 437, 286, 411, 294, 341, 5017, 5347, 281, 281, 264, 1036, 945, 924, 2057, + 51268], "temperature": 0.0, "avg_logprob": -0.15286540985107422, "compression_ratio": + 1.7105263157894737, "no_speech_prob": 0.01797335408627987}, {"id": 908, "seek": + 554710, "start": 5565.18, "end": 5572.54, "text": " is that you continue to explode + with knowledge and I hope you will continue doing that thanks so", "tokens": [51268, + 307, 300, 291, 2354, 281, 21411, 365, 3601, 293, 286, 1454, 291, 486, 2354, 884, + 300, 3231, 
370, 51636], "temperature": 0.0, "avg_logprob": -0.15286540985107422,
+ "compression_ratio": 1.7105263157894737, "no_speech_prob": 0.01797335408627987},
+ {"id": 909, "seek": 557254, "start": 5572.54, "end": 5578.7, "text": " much for
+ your time today corner and yeah looking forward to talk more yeah thank you so much",
+ "tokens": [50364, 709, 337, 428, 565, 965, 4538, 293, 1338, 1237, 2128, 281, 751,
+ 544, 1338, 1309, 291, 370, 709, 50672], "temperature": 0.0, "avg_logprob": -0.29194930504108296,
+ "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.03523525595664978},
+ {"id": 910, "seek": 557254, "start": 5578.7, "end": 5586.46, "text": " to me try
+ to feel like the vector podcast is like the super bowl of search podcast so thank
+ you so", "tokens": [50672, 281, 385, 853, 281, 841, 411, 264, 8062, 7367, 307, 411,
+ 264, 1687, 6571, 295, 3164, 7367, 370, 1309, 291, 370, 51060], "temperature": 0.0,
+ "avg_logprob": -0.29194930504108296, "compression_ratio": 1.7054794520547945, "no_speech_prob":
+ 0.03523525595664978}, {"id": 911, "seek": 557254, "start": 5586.46, "end": 5592.38,
+ "text": " much thank you so much Connor yeah enjoy your day bye bye", "tokens":
+ [51060, 709, 1309, 291, 370, 709, 33133, 1338, 2103, 428, 786, 6543, 6543, 51356],
+ "temperature": 0.0, "avg_logprob": -0.29194930504108296, "compression_ratio": 1.7054794520547945,
+ "no_speech_prob": 0.03523525595664978}]'
+---
+
+Hello there, Vector Podcast! Season 2, and I'm super, super excited to have a reappearance of Connor Shorten on Vector Podcast. We recorded like a year ago, around that time. Something's changed. He is a research scientist at SeMI Technologies, the company behind Weaviate.
+Here you can see an episode with Bob, and here you can see the episode with Connor as well. And back then, when we were talking, Connor, you'd been a lot into basketball. Do you still play basketball? Yeah, I still play a little bit.
+And I'll add also to that that I think you also have podcasts with Eddie and Laura, also in the Weaviate queue. We'll add that. Exactly.
+And I remember, like, you've been big on computer vision, data augmentation back then, and your first, like, guinea pig task was, you know, capturing baskets in a basketball game. And I wonder if you continued working on that at some point.
+Yeah, I think about it every now and then, but I've been so captivated by the natural language processing and the text search, honestly. I still think about image search a bit, but yeah, the text search to me is just so exciting. It feels like there's so much that you can do with it.
+And yeah, it's really been an intense year. I've learned so much, and I think it'll be a totally different podcast with respect to, like, what I'll be talking about. Yeah, yeah, absolutely.
+I'd actually love to start by asking you, what do you feel you've learned in this year that has changed something fundamentally in how you perceive vector search today versus back then, a year ago? Yeah, that's a big question. I think, definitely, with Weaviate,
+I've learned a lot about having, like, kind of the user focus, the product focus, definitely way more engineering understanding of distributed data systems, replication, the CAP theorem, all these kinds of things.
+So, like, the knowledge of the engineering around it, in addition to sort of the machine learning research about, like, how vector representations get optimized with deep learning models, and then, you know, this whole retrieve-and-read research.
+And overall, the space has evolved in such an amazing way, and it's just really exciting. Yeah, absolutely.
+I've been also following all different things, reading papers, you know, implementing CLIP, but I still feel like I miss out on so many things, and I really hope we will cover some of them today.
+And we're on the verge, I think, of maybe witnessing a change in the search paradigm, you know, with ChatGPT. First, I wanted to sort of get your first reaction on this. Obviously you tested it. I also tested it, actually, when I published my recent blog post on neural search frameworks.
+And I was, like, just stuck on creating a title, and I asked ChatGPT, can you come up with a title, and it came up with a reasonably good title, and I actually used it without editing. And I read a bunch of other stories, you know, like, for example, how you can avoid fines for wrong parking and stuff.
+But then there is this discussion going on, you know, like, how it may change search.
+But before that, what was your impression of ChatGPT? Yeah, well, I think, like everyone else, sort of, in this, like, reading about, say, Google's Flan model, or, you know, we've been kind of reading about a lot of these large language models, but we haven't actually really gotten to use them.
+ I think Facebook's OPT model was on Hugging Face and I played with that, and back at the time, mostly the few-shot learning part was, like, the part that was so exciting, where you could, you know, give it, like, a few examples of a task and then it could just instantly learn the task, and that's, like, pretty surprising for people who've been doing supervised learning optimization for a long time.
+And so mostly my thinking was few-shot learning, but this ChatGPT thing, this reinforcement learning from human feedback, this, like, I mean, the way that it can talk is just mind-blowing.
+I'm so amazed by it, and I think, yeah, it's really unlocked a lot of thinking for me about the importance of prompting and what prompting means. I used to think that was just kind of, like, a task-description idea, which it still kind of is, but, like, the nuances of it are so much more.
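The few-shot pattern described above (give the model a handful of worked examples and let it pick up the task from context) is just string assembly; a minimal sketch, with a hypothetical sentiment task and made-up labels, not taken from OPT or ChatGPT specifically:

```python
# Build a few-shot prompt: a short task description followed by worked
# examples, then the new input whose "Output:" the model is asked to complete.
def few_shot_prompt(task, examples, query):
    lines = [task]
    for text, label in examples:
        lines.append(f"Input: {text}\nOutput: {label}")
    lines.append(f"Input: {query}\nOutput:")  # the model continues from here
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great sound quality!", "positive"),
     ("Battery died after a week.", "negative")],
    "The screen is gorgeous.",
)
print(prompt)
```

The final `Output:` is left dangling on purpose: the model's continuation of that line is its prediction for the new input.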
+And yeah, I'd really love to, like, dive into this topic of large language models and search, and I have a few different dimensions of how I'm kind of thinking about these two things relating to each other, but since I've brought up prompting, I kind of want to stay on this one quickly.
+So Bob and Jerry Liu showed me this thing called GPT Index, and GPT Index has this strategy for prompting GPT for summarization.
+It has other things, but this is one thing that just really stood out to me, and there are, like, two strategies you can use to summarize long text with a large language model.
+ You can either create and refine, where you go paragraph by paragraph, and you start off by saying, please write a summary of this long text, you'll receive it paragraph by paragraph, and then it iteratively updates the summary; or you can have this tree, where you, you know, chunk it up, like, as a tree, and then roll it up, like, recursively, and build up the summary that way.
+So this kind of thing about, like, how we use these large language models, all of it is so interesting. And so, I guess, kind of, yeah, let me pass it back to you, and I'm curious, like, how do you think large language models will change search?
+Yeah, I mean, I'm still kind of learning it, and, having, you know, built a search engine before vector search, using, like, TF-IDF basically,
+I knew the cost of doing it wrong, you know, or sort of focusing too much on precision and then paying a huge bill because of that.
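The create-and-refine summarization strategy described a moment ago can be sketched as a simple loop; the `llm` callable and the prompt wording here are hypothetical stand-ins, not GPT Index's actual API:

```python
# Iteratively refine a running summary one chunk at a time, in the spirit of
# the "create and refine" strategy: the model sees the summary so far plus
# the next paragraph and emits an updated summary.
def refine_summarize(chunks, llm):
    summary = llm(f"Please write a summary of this text:\n{chunks[0]}")
    for chunk in chunks[1:]:
        summary = llm(
            "Here is a summary so far:\n"
            f"{summary}\n"
            "Refine it to also cover this new paragraph:\n"
            f"{chunk}"
        )
    return summary

# A stub "model" that records each prompt and returns a versioned string,
# so the control flow can be exercised without a real LLM.
calls = []
def stub_llm(prompt):
    calls.append(prompt)
    return f"summary-v{len(calls)}"

result = refine_summarize(["para one", "para two", "para three"], stub_llm)
print(result)  # one LLM call per chunk, ending at "summary-v3"
```

The tree strategy instead summarizes chunks independently and then recursively summarizes the summaries, which parallelizes better for very long transcripts.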
+ So, like, our search engine, for example, back in the days when we indexed on sentence level at AlphaSense, would eat something like half a terabyte of memory, and, you know, memory was never cheap, it was very expensive even back then, and so we had to figure out ways to retain precision, not lose recall, or maybe even increase recall, because there was a problem with this precision-oriented search, and stay within the budget, right? So when I think about language models myself, I also worked at Silo AI with one large client, you know, applying these models at web scale.
+ The problem at web scale is that you really need to go sub-second, and not just sub-second, you need to go like 10 milliseconds or so, because you have so many components in the search engine; it's also multilingual, it's also serving a specific country, you know, with specific latency requirements and stuff. And then there is indexing, how quickly you can index things, right, because you may also face bottlenecks there. So these are the things that I keep thinking about. But also the thing that we talked about a year ago, on the pod, in the same podcast, Vector Podcast, is that, you know, the models trained by Microsoft, for instance, I can hardly imagine deploying them today in my practical setting, because they will have, like, billions of parameters, and so they will probably be slower, and also, how do I fine-tune them, how much server capacity will I need to fine-tune them? And so that's why I thought, you know, from the discussion with multi-peach, right, he pointed me to the Atlas paper, where they basically are able to fine-tune the model so quickly with just a few examples, and it will have substantially fewer parameters, so it becomes more practical, you know, both on the fine-tuning side and also on the serving side. And these are the topics that I keep thinking about before I even get to, is ChatGPT sexy, is it cool, is it answering my questions; you know, can I actually deploy it and not have angry faces from DevOps saying, hey, you just crossed all the, like, we are low-margin on search and you are just, you know, way above that, so sorry, we cannot deploy this. So these are the questions I'm thinking about a lot. Yeah, I think there's a couple things to unpack, and no one's helped me develop the abstraction around the end-to-end search framework more than you, so thank you, with the pyramid diagrams and these kinds of things, it's so helpful. And yeah, you mentioned, like, the approximate nearest neighbor; then one level up you have what I see as the information retrieval layer, where you have the, you know, dense vector search, BM25, SPLADE, ColBERT, that layer, and then at the top you have what I think is going to be the ChatGPT layer. That would be my current prediction, and we're going to talk about neural search frameworks and what they can do more on the Weaviate podcast. Yeah, well, maybe to just say a little bit, one of our favorite partners that we've been working with is Neural Magic, and Neural Magic is doing sparsity inference acceleration, where recently one of their papers is about getting the 175-billion-parameter GPT model to run on a single GPU. I know that, you know, you can probably compile these large language models on, like, NVIDIA Triton server and do it that way, but I think that this sparsity acceleration for CPUs is just incredibly exciting for that particular dimension of it. And yeah, I think what you said inspired so many ideas. Yeah, what I value in your approach is that you run probably like a basketball player converted into a marathon runner: with the same capacity you have to play a game, you know, you basically run super quick and fast, and long distances, you know, on the research side. And I love this approach, really, because it opens up a lot of opportunities. Because I come from the engineering background, yeah, I did my PhD, but it was like 11 years ago, so most of my time I spent in production, you know, great systems, and every time you just try to move a little bit, like, okay, let's add this, and, oh, the cost is this, oh, sorry, okay, it will now take me two more weeks to index my content, and what do we have this for, what is the use case? So you trickle back to, like, almost product-level management, and so you will get these questions inevitably, like, okay, why are we doing this, what's the actual trade-off, what's the benefit of bringing this into production, right? But at the same time I'm fascinated by this. I mean, this will not stop, for sure, right? Would you agree with that statement? Yeah, I think, and there's, uh, so I know Hugging Face recently open-sourced the dataset they did with Surge AI on getting these, um, human annotations to train the reward model in the reinforcement learning from human feedback strategy. So I think there will be an open-sourcing of the data that you need to train the models, and then, yeah, I think pretty soon there will be open-source versions of it. I think OpenAI, um, I'm very curious about this, like, kind of data flywheel idea, where, by opening up the model, they spend a ton of money on letting you use it for free, but then they get the data of how you want to use it, and so I'm very curious how that leads to a better model. My PhD advisor is a world-class expert in class imbalance, like, understanding that machine learning models do not perform well on the long tail, you know, if you have imbalanced data, so it's a lot of, like, the bias discussion, things like that. So I'm curious, maybe it helps the long tail, getting all this data. Yeah, it's still not exactly clear how it will get better. I think one thing I've said previously is, like, there was this paper from Emily Bender and, um, Koller is the last name, sorry, but it's called "On Meaning, Form, and Understanding in the Age of Data", and it makes this argument that language models, by predicting the next token, will never achieve meaning, because it's like an octopus underneath the ocean of two
stranded islanders, and it's just mimicking their language, but if something like a bear shows up on the island and one of them goes, "help, a bear!", then the octopus is like, oh, I don't know what a bear is. But I think what we're seeing with the reinforcement learning thing is that it's, like, it's acting. There's this other paper called "Experience Grounds Language"; it's about, like, the levels of sort of developing meaning, and one of them is about the importance of acting in your environment. I'm kind of going around here, but I also see, like, this causal inference stuff, and, uh, Judea Pearl has this ladder of causality, where you act, you make interventions, but then the top of the ladder of causality is you can understand, uh, counterfactuals, and that last part, I have no idea how that's going to be achieved yet, but clearly ChatGPT is now, like, acting, so it's different from the, yeah, yeah, the next-word thing. Yeah, I think, coming back to ChatGPT, what, um, impressed me maybe the most is, uh, so I had this problem: I was working on a billion-scale ANN search algorithm with a group of researchers and engineers, like, almost a year ago. So I invented this algorithm, I called it "candy", like, of course, you know, not meaning my surname, but in any case with a K. Um, it's all open source on GitHub, I'll make sure to link it. And the problem was that it would work on 10 million vectors, it would work on 100 million vectors, but it would choke on one billion; it would basically run out of memory. Uh, and I did it entirely in Python, right, so maybe, in retrospect, I should have chosen some other language, but in any case I wanted to make this work. Um, I couldn't; I ran out of time and I ran out of compute resources, because it was given to us by Microsoft, um, for a limited period of time. So what I did is that I pasted that code into ChatGPT, and I said, yeah, first of all, I tried to paste the whole thing, but it said, well, it's too long, so I had to focus on a specific part where I think the problem, you know, kind of lurks. And it gave me the answer; it said, okay, maybe try to avoid using NumPy arrays as much as you do, try to pre-allocate them, try to reset them. And actually I did that; I just didn't paste the portion of the code which was doing this, so the system didn't know that, but it was on the right track. But then, when I did it a year, sorry, a day later, the answer changed. The question was exactly the same, but the answer changed, and that kind of made me really, like, uh, what's going on, like, is it learning as it goes? Can you explain this part, like, have you seen this in its behavior, like, was it the stochastic generation of the... Yeah, ChatGPT, sorry, I was trying to follow along with the, I think we're going to talk about, like, approximation error with the ANN search as we scale it, and I know we're coming back to ChatGPT, but, uh, yeah, so it's like a tree decoding, where it has a probability distribution over the vocabulary, and you can take several paths through that tree for what you're going to output, and, uh, you often randomly sample through the tree, if that makes sense. Like, um, yeah, yeah, it does, but I mean, the answer was kind of, like, in some sense these two answers were complementary to each other, right? And maybe I could go on and say, hey, what do you mean by resetting, because it didn't provide any, uh, code examples; it would just say reset, and I was like, what do you mean by reset, I don't have such a method. So I think that was maybe the impressive part of ChatGPT. And, um, just to close off on that, there was a recent discussion on relevancy and matching text, like, where a lot of these search people, see, uh, there was, um, there was this argument against ChatGPT that, let's say, if you go, um, you know, use, uh, DuckDuckGo today, you will see the
links, right? You can go and examine the links, and you can actually verify the information to some extent, maybe not to the full extent, but to some extent. In ChatGPT you can't do that: there is an answer, that's it. So it's quite a jump from being able to kind of seemingly check, is it trustworthy, to, well, you have no way to do that. What do you think of this aspect? Yeah, that's brilliant. It makes me think about, like, well, very broadly, it makes me think about artificial general intelligence compared to superintelligence, so to say. And, like, I think about the artificial general intelligence, and, like, because OpenAI, they've published WebGPT and InstructGPT, so InstructGPT is like the reinforcement learning from human feedback part, and then WebGPT is like the whole idea that we're super excited about at Weaviate, where you search for context to append to the input, and then, like, if you say, please, uh, ground your answer in this information, and then it's a paragraph about, like, how the BM25 algorithm works. Like, I use this personally that way for hybrid search and understanding it, and so, like, if you give it the context, it's so much better. And so I suspect that ChatGPT under the hood does something like a Google or a Bing API search, and so it's like, general, old, but, um, yeah, so then this idea of superintelligence, uh, because I've been like, can I use ChatGPT to help me write, like, you know, blog posts, survey papers, things like that, that are relevant for trying to be a master of search, and what I need from it is more so, like, citation recommendation, right? Like, I needed it to go into, like, uh, Leo Boytsov's publications and parse them out for me and help me understand what he's done, so it's like the specific information. And then, yeah, the real, I mean,
+you.com also has a really brilliant thing where it's, uh, a search engine on this panel and then ChatGPT on this side, so it's like a user interface problem, I think. Yeah, yeah, but, I mean, maybe even, yeah, I totally agree with you that user interface definitely creates the bias, uh, like, how you use traffic lights today: they go, like, red, you know, yellow, and green; they don't go upside down, right? And, like, if you see an upside-down one, you will think, well, this is a wrong, uh, traffic light, I'd rather not cross here, you know. But, like, it's kind of similar here, like, with the search engines: we are used to seeing, you know, URLs and being able to click there. But of course, if you take Google, or I guess Bing does that too, they also pre-generate these answers, answer boxes, right? So you can click there, but I don't think you have a URL to verify, you know, the source of this information, if I'm not wrong. Yeah, yeah. So they're already playing with incorporating this knowledge from a language model, right? And they look at you, and of course they also want you to spend more time on their page, which is probably not good, but we'll not discuss that, so they don't share the traffic further. But the thing is, you know, they still play with the idea: okay, what if we try to answer not just with the URL and summary, but actually with the actual thing, right, with the actual answer. Oh, so that comes into, like, the extractive versus abstractive, and, like, whether you want the question-answering models that classify the answers in the context. Yeah, and yeah, I think that still has a place, for sure. I mean, it's super lightweight; as I mentioned, Neural Magic, they just did a sparse question-answering model that can run on CPU super fast, and yeah, I think that approach is also just gonna be more cost-effective for a while. Yeah, exactly. But you mentioned BM25, and I'm curious, like, I've been trying to approach this hybrid search topic, but I think you went ahead, all right, so, and I was
just wondering like what's your take on this topic like can you a little like intro it to our listeners but also why do you think it's a good idea to to build like a hybrid search you know combining keyboard retrieval with it's with a you know dense retrieval yeah awesome I started by saying this has just been like the most satisfying project I've worked on since I've joined Wevegate and just being a part of this team and it's been you know like a big team working on hybrid search and it's just been an incredible experience so I guess starting yeah the motivation is BM25 has this it builds on term frequency inverse document frequency by adding like this binary independence model and the IDF calculation and then you also normalize it for the length of the document that's just like these subtle differences that make it different than TF IDF but you could also use TF IDF in hybrid search if that's what you were after and so then you also have the vector search and then you have this rank fusion so so we look we found this paper where they have seven different strategies for rank fusion it's like rrf board uh I don't know come some but in the end we just went with rrf reciprocal rank fusion which is just erica's recently published a blog post that shows the equation and just tells some of our thinking around it but it's where you just combine the ranks compared to say combining the scores because you know BM25 has a score particularly and vector search has like a distance at return so you might look at some way of like linearly or non-linearly combining those scores and I've done some experiments with with kind of my thinking around it was like okay what would be like an optimal alpha per query would that be you know maybe like a conditional model like so I tried this with the few shot learning of gbt3 where you you run a few examples of the optimal alpha and then you try to see uh you know how would you like to wait BM25 and dense vector search given this query and 
see if that is productive. But I found, and this is very interesting, because I think people have this idea that BM25 is very interpretable, but in my experience it hasn't been. When you're doing longish queries on long documents, and maybe we can talk about long versus short queries, I find that trying to decode why BM25 was better than dense search for some particular query is impossible. Maybe someone will prove me wrong, and I'll look forward to seeing that, honestly. There's this example we have, as Erika was developing the Weaviate demonstration of hybrid search, where the query is "how to catch an Alaskan pollock," and the idea is that the dense vector search contributes the disambiguation of "catch," that it refers to fishing, and that BM25 is specific to "Alaskan pollock." But I haven't been able to just inspect that kind of behavior as I look through the BEIR benchmarks, which I'm super excited to talk about, along with how we've been evaluating it. But let me pass it back to you and ask about your experience with BM25, or keyword and dense search particularly, because then I'd like to take the topic to arbitrary combinations of retrieval methods, not just BM25 and, say, DPR or whatever. Yeah, I remember even before dense search appeared on the scene, we were experimenting with TF-IDF, of which BM25 is like an add-on; BM25 stands for "best match," so, problem solved, period. But you know, one of the reasons I love working with product managers, and at the moment I am a product manager, so I took the other side of this thing, and maybe we can talk more about it on the Weaviate podcast, the reason I love talking to product managers is that maybe they don't know that much about algorithms compared to you, and they don't code as much as you, but they do
care; they are the stakeholders of the end result, right? So when they go out and talk to sales or to the end users, they will not get a question about which alpha you have used. Coming back to your example, they will say: hey, I typed "cat" three times in my query, and I still see that the document that mentions it once is at the top; how can you explain this? There is a consulting company, I think they're based in Boston actually, and I just forget their name, "Key-and-Via" or something, and they have a really great presentation on Haystack LIVE, I believe, where they go super, super deep, and I recommend you watch it, on how TF-IDF screws up our understanding of how things should work. They go through how many times the word "cat" is mentioned in the document versus how many times it's mentioned in the query, and you can do all these combinatorial combinations, and then they explain what you would do to solve it. Another thing I found useful, and it's also mentioned in the Relevant Search book by Doug Turnbull and John Berryman, is that if you use, let's say, Elasticsearch or a similar system, or Solr, you can actually have the engine explain the query step by step: it basically prints you the tree of how it came up with that final score, and how a specific field contributed. For example at TomTom, and I cannot go into much specifics about what we do at TomTom, but it's basically geographic search, right? You type some destination, let's say an address, or maybe a P.O.
I., a point-of-interest name like a shop. And it's multilingual as well, right? So obviously your query may sometimes accidentally hit a field in the wrong language, and the only way to know this is to print that query execution formula, if you will. Then you will see: okay, ah, it hit in that, let's say, French field, but I wasn't intending French, I was doing German or something; why did it do that? And you start reasoning about how you created the tokens, because when you tokenize your text, it's the same problem as in dense search in a way: like when you split text into paragraphs or sentences there, here you need to split into tokens, and how you split the tokens depends on how you model the semantics of what you are converting to a token. You should not convert a string to a token; you should convert meaning to a token. If you capture meaning in that token, then you're done, in a way. But coming back to your question, I cannot answer it fully now, but I highly recommend that talk, on how term frequencies and inverse document frequencies play together. Also, in BM25 versus TF-IDF you have the term saturation issue, which BM25 mitigates to some extent: if a term occurs, say, a million times in one document and a million and one times in another, TF-IDF still treats that extra occurrence as new evidence, while in BM25 the term frequency component saturates, so additional occurrences contribute less and less, and that's why it's a little better; it handles this term saturation issue. I don't know if I answered your question. No, yeah, a couple of things. I really want to continue on this TF-IDF versus BM25, and then SPLADE adjacent to it. I think pseudo-relevance feedback is the phrase for the idea that, if you're searching with BM25 and you have the gold document, and
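The term saturation point can be seen numerically. A small sketch in plain Python, using BM25's standard term-frequency component with `k1 = 1.2` and, for clarity, ignoring IDF and document-length normalization:

```python
def bm25_tf(tf, k1=1.2):
    # BM25's saturating term-frequency component (length norm omitted).
    # As tf grows, the value approaches k1 + 1 asymptotically.
    return tf * (k1 + 1) / (tf + k1)

# Raw TF keeps growing linearly with more occurrences;
# BM25's contribution flattens out almost immediately.
for tf in (1, 2, 10, 1_000_000):
    print(tf, round(bm25_tf(tf), 4))
```

Going from one million occurrences to one million and one changes the BM25 contribution by an amount that is numerically negligible, whereas a raw term-frequency score would still grow.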
you're like, how would I have modified the query to produce that document? So that's one way. Another way is: how would you modify the indexing? That's more in your control, right? For example, in some cases you can remove the duplicates, or you can split a term on numbers if they happen to occur inside the term; I'm making these examples up, but the point is you have more control in the indexing than in the query, though in the query you can model, for example, query similarity. Oh, that's super interesting. Yeah, the way you do the text preprocessing, the stemming, stop-word removal, all that bag of tricks, that's what I hope dense vector search can kill; I hope anything can just go into it. And there's this thing, decoding the latent space of a vector search model, on that other idea of "what query would have produced this," where you train a language model on document-query pairs and it generates a query that would have matched the document; maybe that's useful. But I'm also very curious what you think about this idea of SPLADE vectors. So SPLADE is where you keep the masked language modeling head, and you encode the input into vectors; the masked language modeling head always takes a vector as input, you mask out whatever the masked token was and send just that vector to the masked language modeling head, and that produces a sparse distribution over what would replace it. And I think the idea behind SPLADE is that you do that for each token, and then you just kind of average all the vocabulary distributions, and that gives you a sparse vector that represents, say for "happy,"
the kind of synonyms behind it, like "euphoric, ecstatic." Do you like that kind of idea? Yeah, I like that we can step back from these dense vector limitations and go back and try to capture what sparse vectors do. I don't know if you watched the episode with Doug Turnbull, but he shed light on this really well by saying: if you take keyword retrieval with an inverted index, you deal with probably hundreds of thousands of dimensions, if not millions, if not billions; in some of the indexes we had, at least a million per term, right? So that's a million positions, most of which are zeros, because a given term doesn't occur in most doc IDs, it occurs in a few. In dense retrieval you compress all of these to, let's say, 256 dimensions, and inherently you lose precision, so it becomes more recall-oriented. What sparsity also means, and this is a little bit like going back to ANN algorithms, is that an inverted index is basically like a hash table: I have this term, it's an O(1) lookup in the hash table, and then you leapfrog. You use this leapfrog algorithm, implemented really well in Lucene for example, where you jump over long strides of consecutive doc IDs, because you don't really need to examine them in an AND query. Let's say the query is "cat AND dog": you know that "cat" occurs in document ID 10, say, and for "dog" you are on 3, so you can leapfrog all the way to 10; you don't need to check everything in between, because they will never occur together there. For an OR query that's another story, because that's a union, but an AND query is an intersection, so you always need an intersection. You can then stop early, because you don't need 100,000 results on the screen, right? And I'm still actually curious how you would know when to stop, because
what if you didn't find a document that is even more relevant than what you have seen so far? But that's a matter of debate, I guess. And then you start scoring them and sorting them by relevance, right? Yeah, sorry if I'm a little behind: is this referring to how you can use an inverted index to calculate the BM25 scores, so with my document collection, if "dog" appears, I know the documents, so that when I'm calculating... Yeah, but the comparison I wanted to make to dense search, to vector search, is on the base data structure. First of all, you have a choice of which algorithm you want to use, but let's say we take HNSW, which is the most popular, and also implemented in Weaviate, I know. When you enter the first layer, you don't know where exactly you will end up. With a hash table, I know exactly where I'm entering, and I know I'm in exactly the right place. And you can also expand your query with synonyms; then you enter at more points in the hash table, traverse all of them in parallel, and come up with the answer. But in dense search you need to accept the uncertainty of navigating that graph: you don't know where it will land, it has certain limitations and trade-offs, and then it will pull up some nearest neighbors, and you should probably be happy with them, because otherwise you need to do it twice and pay that price, and so on. You see what I mean? They are fundamentally different on the search side too. Oh, like the stochastic nature of it, yeah. And also, I read this paper called OOD-DiskANN that talks about how much distribution shift can impact the graph-based Vamana; Vamana is like HNSW, but you flatten it, so there's no longer the hierarchy of layers, it's all one layer, and then you can put it on disk, and it's a little cheaper to run, I think. Yeah, it's
fascinating, the whole indexing part; that's kind of the meat of this, especially from the Weaviate aspect, that's where I see it, in addition to the UX and making it very developer friendly. Well, there are a few sides to it, because there's also the distributed database part: everything written in Go, the concurrency control, the replication, the backups, all these kinds of things. But on that approximate nearest neighbor search, and I know you have this experience, I've listened to a ton of your talks and you introduced me to the ANN benchmarks, I see there being three levels of errors that propagate up: there are the errors from HNSW and, say, product quantization; then there are the errors from the vector representations to begin with; and then there are maybe the errors in the question answering model. So if you wanted to do, say, Natural Questions open-domain QA, you're looking at three layers of cascading errors that are sort of unrelated to each other. Yeah, exactly, really brilliantly put. And if I may summarize it, to wear this hat of the person who is creating this vector search pyramid and so on, I'm not the only guy doing this, but I keep doing it because it helps me stay comfortable in the topic: okay, I'm looking at it from this angle, and if you accept it, stay with me; if you don't, you may as well augment it, like you did earlier with some levels. You just need to accept that uncertainty, like you explained, and also the uncertainty that, in this DiskANN paper, they explicitly show that in HNSW you may have unreachable nodes; they counted something like 1,000 nodes that were completely unreachable from any point in the graph, no matter how you search,
how long you search, whatever the values of your ef and M parameters during index construction and search: you just don't reach them. And I think that's somewhat similar to inverted index search, where you have, say, one million doc IDs per term: how do you know when to stop? You may never reach the documents you should have visited, but you deliberately decided to stop prematurely, because you don't have time; you have to return the documents within, say, 10 milliseconds, so you have to make trade-offs. And the postings are naturally ordered in increasing order of doc ID, right? They're not ordered by "does this document answer anything": does this document know anything about cats, or does it merely mention them in passing; does it actually describe Twitter, or does it just say "please contact me on Twitter, here is my Twitter handle," which is complete noise. So you see what I mean: in both approaches, on a fundamental level, on the data structure level, we deal with fundamental limitations, like the law of gravity; you cannot jump up and fly to the Moon or to Mars without additional thrust and devices and so on. So do you feel the same, does it resonate? Oh yeah. Well, firstly, thank you, you just explained that concept to me for the first time; I'm just now, live on the podcast, understanding it. But yeah, it's very cool, like sorting the inverted index to prioritize documents, maybe by clicks; clicks would be the most sensible thing if it's web pages, so to say. You sort the documents, and then you could probably calculate how much time you have to search and how far that lets you go into the inverted index. Super interesting. I think it's very interesting for Weaviate with the hybrid search and the BM25 index,
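The leapfrog (skip-based) intersection over sorted postings lists described above can be sketched as follows. This is a simplified illustration of the idea, not Lucene's actual implementation, with invented postings:

```python
import bisect

def intersect(postings_a, postings_b):
    # Postings are sorted lists of doc IDs. Instead of stepping one ID
    # at a time, binary-search each list for the other's current
    # candidate, leapfrogging over long runs that cannot match.
    result, i, j = [], 0, 0
    while i < len(postings_a) and j < len(postings_b):
        a, b = postings_a[i], postings_b[j]
        if a == b:
            result.append(a)
            i += 1
            j += 1
        elif a < b:
            # Skip ahead in A to the first doc ID >= b.
            i = bisect.bisect_left(postings_a, b, i + 1)
        else:
            j = bisect.bisect_left(postings_b, a, j + 1)
    return result

cat = [3, 10, 50, 900]
dog = [3, 7, 10, 400, 900]
print(intersect(cat, dog))  # [3, 10, 900]
```

This is why an AND query can be evaluated without touching most doc IDs, while an OR query, being a union, cannot skip in the same way.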
because I know the inverted index has been explored; we have this neuro-symbolic search where you annotate properties. Say you have a billion sneaker images, but you've also labeled the color they are, so "red" is the color, and then you can use that to filter the search. So there's definitely some foundation in pre-filtering and integrating these kinds of symbolic inverted indexes with HNSW; it's not the first time Weaviate is exploring that. But there are definitely nuances with BM25 because of the cardinality, how many terms you get per document; I think you're splitting at, I don't know, 300 words per property, so just the size of it. And starting to think about the sizes of things: it inspired me when you mentioned the compression bottleneck from sparse to dense. I was thinking, okay, let's say we have 384-dimensional vectors with 32 bits per vector position; whatever that works out to, it's still a massive combinatorial space, right? Yes, exactly, and you asked: is it even the model that captures everything we need to capture? All of these are numbers, of course; it's a numeric representation of the model's understanding, well, "understanding" in quotes, of the objects we index. But I guess for me, and you're way ahead of me in this with Weaviate development, what mattered to me when I was a search engineer day to day is what tools, not necessarily tools as in specific programs, but tools as in algorithms and approaches, I have to control the process. So if somebody comes up and says, hey, can you look at this query, can you debug it: first of all, explain queries are one brilliant way of doing it, and that's where you start, but then once you've understood, aha,
there is a problem, it hits this field, or I give too much of a boost in this situation, what should I do? You start tweaking these parameters, and you have these tools in your hands. Can you do that in vector search? I don't know. I have fine-tuning as one tool: if CLIP stops working on these images, I can go and fine-tune it, or BERT. But what else do I have? I can tune some parameters in HNSW or DiskANN or something; I can make all those thousand nodes reachable, like they did in the DiskANN paper; and I can choose disk over RAM if I want to save on cost. But what else do I have as a control to actually go and debug and fix that specific query? What has been your experience or thinking on that? Yeah, I think you've named them all. I've seen the tuning of efConstruction, as you mentioned, with HNSW. And something I'm really excited about is these BEIR benchmarks, and maybe I can introduce them now, because I think it helps with this idea of model selection, from the user's perspective of "how can I debug my system, how do I fix my search system." So BEIR is about diverse text retrieval: ArguAna, NFCorpus, TREC-COVID. The difference is, instead of saying that the search ImageNet is going to be MS MARCO, which is, you know, 10 million Bing passages and about a million labeled queries, so it's the ImageNet idea of one general source; ImageNet is a massive collection of images labeled in a bunch of categories, so is MS MARCO the search ImageNet? It seems like instead we're going for diversity with BEIR. And if we want to talk about intents and instructions further: there's also LoTTE, L-o-T-T-E with capital T's, they're going with beverages, right? So there's an equivalent to
BEIR, and then there's also MIRACL, which is multilingual, so there are a lot of these diverse text retrieval benchmarks. And then it's expanding to where you label with the instructions as well; I don't remember the names of these datasets off the top of my head because it's very new, but I know this paper called "Task-aware Retrieval with Instructions," and I think there's another paper with a model called Instructor; this is the idea where you also label with the intent. But anyway, let me go back to the focus on how a user debugs the search system and how they can fix it. So with the BEIR benchmarks, one idea would be that we could test several different models, and you could say: okay, I'm building a nutrition search app, I'm like bodybuilding.com or something, so you would look at the NFCorpus results, see the performance of the different models, and that would maybe help you take a different model off the shelf. But then, with what you're saying about fine-tuning, I suspect that fine-tuning is going to be a super powerful lever. There are so many topics I want to talk to you about; with this idea, I've been building a Weaviate demo of podcast search, taking the Weaviate podcast, parsing the transcriptions and putting them in there, and I'm tempted to fine-tune it and start thinking about the positive/negative pair construction for that. I think in general with Weaviate, we use OpenAI models, Cohere models, Hugging Face models, and we're not really training the models ourselves, but it's just such an interesting thing to tune; I know Jina AI's fine-tuner is extremely interesting, and I do find myself constantly pulled in the direction of wanting to train models.
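The positive/negative pair construction mentioned here typically feeds an in-batch contrastive objective (often called multiple-negatives ranking loss). A toy numpy sketch of that loss, assuming query and passage embeddings have already been computed; this is not any specific library's API:

```python
import numpy as np

def in_batch_contrastive_loss(q, p, temperature=0.05):
    """q, p: (batch, dim) arrays where q[i] pairs with passage p[i];
    every other p[j] in the batch serves as an implicit negative."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    sims = q @ p.T / temperature  # (batch, batch) cosine similarities
    # Cross-entropy with the diagonal (the true pairs) as the labels.
    log_softmax = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
loss_random = in_batch_contrastive_loss(q, rng.normal(size=(4, 8)))
loss_aligned = in_batch_contrastive_loss(q, q)  # perfectly matched pairs
```

Fine-tuning drives the model from something like `loss_random` toward `loss_aligned`: queries move close to their gold passages and away from the other passages in the batch.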
Yeah, absolutely. When we presented Muves at Berlin Buzzwords last year, we actually said we also have "Muver," the component allowing you to fine-tune a model; we don't quite have it ready for prime time, but I've been really fascinated, coding a bit of it and checking how well it can work in a more generic way, because I think a fine-tuner should allow you to plug in several models, and because different models have different inputs, they have different settings to train and fine-tune, so you need to be aware of that; CLIP is a kind of two-tower model in a way, right, so you do need the text and you do need the image. But coming back to the question of what tools I have: I feel, and I think you agree, that fine-tuning is one tool that should be more available to the masses, more available to the users, in a way that they are aware of this tool and know how best to use it, and also the pitfalls they may fall into. And I think this is what you brilliantly described, like a year ago, in the context of computer vision: data augmentation. It's one thing to feed in some manual examples, but how far can you go? In your basketball example, you'd been manually labeling some examples and you ran out of patience, in a way. Okay, you can hire people to do that, but is that scalable? Probably not. And also new things come up: if you take a business specifically working on e-commerce, or, I don't know, full-text document search, things come up every week, maybe; say Tesla releases the Cybertruck and you don't have it in the model. Actually, in your example, what was it, with the ocean? Yeah, "how to catch an Alaskan pollock": let's pretend the Alaskan pollock is a new fish; maybe with vector
search you may try to find what could be the most similar object, but it may also be wrong, or the distance may be so big that it doesn't make sense anymore to consider it as a candidate. So this is very interesting, and I hear that you really want to dive into the fine-tuning topic as well. Yeah, well, that idea is amazing, because there's this argument; when I interviewed Malte Pietsch, he gave me three reasons to favor the retrieve-then-read approach to large language models, and one of them was this idea that you can swap out the information to update it: the Cybertruck becomes a new thing, you put it in the context, and now the language model just has to reason across the context. But then, as you say, the embedding model doesn't know about the new thing either, so the embedding model isn't going to pick it up. So yeah, that continuous updating is one idea I'm incredibly excited about. I haven't figured out how to make this work yet, but the MLOps problem here is that you need to re-vectorize your dataset. So the solution maybe is that you could re-vectorize, say, a thousand representative documents, and my hypothesis is that the proximity graph, from, I want to say, Vamana or HNSW, because I barely understand graph neural networks, let alone trying to make one hierarchical, but if it's the proximity graph, maybe you can, it's like a CycleGAN again, it's very similar to image-to-image translation, or any kind of vector-space-to-vector-space translation: you input the old vector and output the change in the vector. So can you re-vectorize a thousand documents and then propagate that throughout the graph, or throughout the corpus? Maybe that proximity graph has some kind of bias that facilitates the optimization task, or maybe the graph neural
network thing is too much overhead and you're better off just having a transformer that takes in a vector and outputs a vector. But yeah, this idea of how you continually update your embedding models is fascinating. Yeah, especially the MLOps aspect of it, as you've mentioned: if we were to insert new neighbors into the existing graph, would that change it in favor of something more recent, or would it break something we didn't want to break? But in some sense, we are still in the realm of this hybrid search topic: if you look at the BM25 or TF-IDF approach, your term frequency only depends on the document itself, so that's fine, it's independent of all other documents; but your inverse document frequency depends on the whole corpus that is indexed in that shard. By the way, that's another big topic, crossing the boundary of "is this just an infrastructure slash engineering issue, or is it a research issue"; it's fuzzy, it's a blend. So for that shard you're going to have a local IDF, unless you build a higher-level cache which keeps track of each individual shard's IDF and rolls it up to a global IDF. If you look at Apache Solr, I believe they had a contrib module or something implementing this, where you can actually have a global IDF cache that lives on top of the shards. And now, coming back to MLOps, you need to make sure it never dies, because if it dies, you fall back to shard-level IDF, and then it depends on, okay, I managed to stuff a lot of documents about cats into this shard, so the IDF is like this, and I stuffed a lot of documents about dogs over here, so they become unbalanced, if you know what I mean; it's not a healthy mix of term statistics in your collection,
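The shard-local versus global IDF issue can be made concrete with a small sketch; plain Python with a standard smoothed IDF formula, and a shard layout invented purely for illustration:

```python
import math

def idf(term, docs):
    # docs: iterable of sets of terms; smoothed IDF.
    df = sum(term in d for d in docs)
    return math.log((len(docs) + 1) / (df + 1))

shard_a = [{"cat"}, {"cat"}, {"cat"}, {"dog"}]  # cat-heavy shard
shard_b = [{"dog"}, {"dog"}, {"dog"}, {"cat"}]  # dog-heavy shard
global_corpus = shard_a + shard_b

# Locally, "cat" looks common on shard A but rare on shard B;
# a global IDF cache would give every shard the same middle value.
print(idf("cat", shard_a), idf("cat", shard_b), idf("cat", global_corpus))
```

The same term thus gets a different weight depending on which shard scores the document, which is exactly the inconsistency the global IDF cache is meant to remove.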
right? And that will influence a lot of things. In some cases it's okay, but in other cases it may not work, if your query contains both concepts and they are unequally represented in your collection. So, does that make sense? You do have limitations, or, let me pose it in a more positive way, research tasks, such challenges: what should we do? I hope that in some sense dense search is pushing us to think more and more about this, and maybe some things will flow back from vector search to classical keyword retrieval, and maybe some new data structures will even emerge to tackle these things. Yeah, I think that idea you're describing with the IDF caching is super interesting; it's inspiring me to think generally about how we're trading knowledge on this, having this podcast, having this content and this communication, and how we've done our first iteration of BM25 and learned so much about the index structure. It's really interesting. I was thinking: how about SPLADE vectors, could we just update the masked language modeling head to get the new terms, and would that be easier than this kind of global cache idea, and is it more forward-thinking? And then maybe one other idea is this thing called ColBERT, which is a token-level representation; they call it late interaction, where first you do the standard vector search, but then you keep the token vectors for each of the documents, and then they've had efficiency improvements on that. They've recently published a paper on the original ColBERT; I know Christopher Potts and Omar Khattab are on it, and I'm sorry, I don't know everyone, I'll try my best to give everyone credit
all the time, but in this paper they describe that the original ColBERT had something like a 154-gigabyte index, compared to about one gigabyte with other methods. So yeah, efficient indexing; I'm definitely rambling a bit, but it is a big thing to unpack. There's so much depth to this, and that's what makes working in this field so exciting: there's so much opportunity, so much to explore. Yeah, and so much unsolved as well. I don't know if you wanted to continue a thought? Oh no, sorry. I mean, we are branching out, but one thing you just reminded me of, and maybe I should start writing a book or something, because the moment I remember things like this I should write a chapter, keep adding, and then publish it, and maybe you can be my co-author or something. It was maybe 10 years ago, at Berlin Buzzwords, there was a presentation by one of the engineers at Twitter; I don't know if he's still at Twitter, and I forgot his name, I remember he was German, working out of San Francisco. Coming back to that issue with sorted document IDs: what they did at Twitter, first of all, the scale of Twitter is such that you cannot possibly store a Lucene index on disk and then go and retrieve it, because it's just way too slow. So they moved the whole index into memory; they had to rewrite Lucene into this memory-friendly data structure. And one thing they did in particular is that as tweets come in, each tweet is a document and gets its unique document ID, and they would append this new document ID to the end of the postings list: they would decompose the tweet into terms, and for each term they would update that specific term's postings list; the postings list is just an array of doc IDs, so they would put that tweet's doc ID at the end. And as a new searcher comes in,
searching tweets, they would read backwards from the end; they wouldn't read from the beginning. So basically, they encoded the temporal nature of tweets, and of what end users want: to search and view the freshest tweets. I don't know if you are a heavy user of Twitter; when I log in and check my timeline, I usually see something super fresh and keep scrolling. But, no anti-props to Twitter, it's a nightmare to search on Twitter: when I search for something I know existed a week ago, there is no way for me to find it unless I know the exact tweet ID. At some point I was even indexing tweets, actually direct messages I had with a few people, in Solr, and then searching them there, because it was way faster than searching on Twitter; if you have 5,000 direct messages, scrolling through them will take half a day, because they keep loading and loading. So what I'm trying to say is that they optimized the data structure for the nature of Twitter usage, in such a way that they bias toward the recent tweets, and they don't care if you have to spend a day retrieving a super old tweet; that's such a minor use case for them, because 99% of users will only want to see and consume the latest thing. In some sense this is the effect of optimizing for the usage. Like you say, we could optimize SPLADE, or a similar sparse model, to learn that latest bit, and since there's a high chance of it being retrieved, we might as well bias the system toward it; but then of course there is the catastrophic forgetting thing and stuff like that.
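A toy version of the append-and-read-backwards trick described here; a sketch of the idea only, not Twitter's actual in-memory Lucene rewrite:

```python
from collections import defaultdict

class RecencyIndex:
    """Append-only inverted index. Doc IDs are assigned in arrival
    order, so reading each postings list backwards yields newest-first
    results, and early termination favors recent documents."""
    def __init__(self):
        self.postings = defaultdict(list)
        self.next_id = 0

    def add(self, text):
        doc_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)  # appended in ID order
        return doc_id

    def search(self, term, limit=2):
        # Scan from the end: most recent docs first, stopping early.
        return list(reversed(self.postings[term.lower()][-limit:]))

idx = RecencyIndex()
for tweet in ["search is fun", "dense search", "hybrid search rocks"]:
    idx.add(tweet)
print(idx.search("search"))  # newest two matching doc IDs: [2, 1]
```

The bias is structural: old matches are only ever reached if the scan budget allows walking far enough back through the list.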
Yeah, it's not an easy problem to continually update the MLM head either. It's maybe worth adding that this MLM head in SPLADE doesn't need to be a hundred billion parameters; maybe a billion would be good, but it doesn't need to be huge. That's such a fascinating nugget of system design you just shared with the Twitter thing. It's really interesting. I've seen this other company called Perplexity AI; Aravind Srinivas is, I think, the founder and CEO of it, and it's cool because he worked on CURL with Pieter Abbeel, contrastive representation learning for robotics, where they're doing the same kind of vector optimization idea to learn a state space for robotic control. So I think it's really cool that now he's working in the search space too. And there's this other approach, natural language to SQL, or something like that. I'm getting a little off topic, but it's kind of related to Twitter: it's about putting tweets into data stores and then parsing natural language queries into SQL. So that's another idea: you would parse the query. What do you think about that idea, where you take the query and turn it into a SQL query? Yes, I know what you mean, it's very similar. I think deepset did that, right? Or maybe it's the opposite, I'm not sure. But if you have a table with fields and rows, say a list of mountains with their heights, you could take a question in natural language, "what is the tallest mountain in Europe or Asia", and turn it into a SQL command: select from this mountains table, order by height descending. I like this idea, and in fact I think, first
of all, this is already doable; I'm fairly sure deepset is doing that in Haystack. But I also came across this idea during my PhD research, because the problem there, I believe, was that there were engineers working on building aircraft, and they had to read a ton of manuals, but once you've read the manual you still need to go and look up a specific number somewhere in a database. So they take a multi-hop approach, and that can take forever: first you need to crunch through a ton of text material and somehow summarize it, and then go and look up that number in the database. But what if you could ask a natural language question of the manuals, then convert that to a SQL command which would know to look in that specific database table and give you the answer? The manual doesn't have the number, but it has instructions on how to find it, and you convert that, through this meta language, into SQL and get the answer. This was pre-dense-retrieval era, obviously, but I still feel it has merit. Well, I guess two things. First, there's this problem where you search, say, an airline manual for some specific detail and it's in result seven: it almost got it, it's not outside the top 100, but it's seventh. For that problem, I think this GPT Index recursive summarization, or create-and-refine summarization, will be the solution. And then, coming back to this idea of natural language to SQL and structured versus unstructured data: on the other end, you can also parse tables into text, and I've seen that done too; there's work on wiki tables to text. Personally, my favorite application is scientific literature mining and searching through scientific papers, so you could parse out the tables to
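The mountains example above is easy to make concrete. Below is a minimal sketch of the target side of text-to-SQL, using an in-memory SQLite table; the translation from the natural-language question to the SQL string is the part a model (e.g. deepset's work in Haystack) would handle, so here it is simply hard-coded.

```python
import sqlite3

# Toy table a text-to-SQL system might query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mountains (name TEXT, continent TEXT, height_m INTEGER)")
conn.executemany(
    "INSERT INTO mountains VALUES (?, ?, ?)",
    [("Everest", "Asia", 8849), ("Elbrus", "Europe", 5642), ("Mont Blanc", "Europe", 4808)],
)

# The SQL a model might emit for "what is the tallest mountain in Europe?":
row = conn.execute(
    "SELECT name FROM mountains WHERE continent = ? ORDER BY height_m DESC LIMIT 1",
    ("Europe",),
).fetchone()
print(row[0])  # → Elbrus
```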
turn the results tables into natural language. And there are so many fascinating things here. With a knowledge graph, the idea is that if I have "Dmitry Kan hosts the Vector Podcast, is a product manager at TomTom", the knowledge graph compresses the representation of all these facts into one structure, compared to having a set of sentences. And maybe I can plug something I've done: I have a paper that will be published pretty soon. It's from my Florida Atlantic University PhD, an interdisciplinary team with the College of Nursing and a local healthcare system. We have electronic health records that describe COVID-19 patients, and we're trying to predict survival outcomes, treatment forecasting, prognosis, that kind of stuff. The thing we explored in this paper is: let's switch from the structured tabular data to parsing it into natural language text, turning it into clinical narratives, or doing this thing where you write "if feature name equals value, then label". There's a paper from the University of Wisconsin called language-interfaced fine-tuning where they do that same idea, but on the UCI machine learning repository datasets. So I know I've taken us on quite a walk, and I'm sure listeners will be like, what? But it's cool. It's also what I've heard from my listeners about the podcast: they actually use these episodes as educational material, so if we can stuff in as many links to papers and your work as possible, they can go and study it. Yeah, go, go. I guess the question is: how are we thinking about structured and unstructured data in deep learning systems? You could parse the structured data into unstructured text, and then transfer learning is really easy, right?
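The table-to-text idea is simple to sketch. Below, a structured record is serialized into a short narrative sentence; the field names are hypothetical stand-ins, not the actual schema from the paper.

```python
# Serialize a structured record into a natural-language "narrative"
# that a language model can consume directly.
def row_to_text(row):
    return (
        f"The patient is {row['age']} years old, "
        f"{'was' if row['ventilated'] else 'was not'} ventilated, "
        f"and has an oxygen saturation of {row['spo2']}%."
    )

record = {"age": 67, "ventilated": True, "spo2": 91}
print(row_to_text(record))
# → The patient is 67 years old, was ventilated, and has an oxygen saturation of 91%.
```

The "if feature equals value, then label" prompting style mentioned above is the same serialization trick with a different template.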
Yeah. Or you can keep the structure, and then maybe you can learn a better representation thanks to the structure. And with that question, my interest has been heavily in these causal DAGs, this idea of creating structured causal relationships between variables. I still have no idea how you can take, say, Wikipedia text and turn it into a causal diagram, but I have an idea, and it comes back to this AGI versus superintelligence idea: if I have a superintelligence and it's reading the search literature, I want it to have some kind of causal diagram of our current model of search. So it has some model of how BM25 indexes and its limitations, the SPLADE representation, this MLM problem; it has some structured representation of all these problems, such that when the new batch of arXiv papers or tweets or experiments comes in, however the news reaches it, it looks at its causal diagram and says: this violated this claim of mine. Because that's the thing: you see a paper like "autoregressive models as search engines", or, what's the name of it, the one where transformers are a differentiable search index; you see some title like that which violates your causal diagram of why things are the way they are, and that's what inspires your interest. So that's that particular angle of it. Yeah, I haven't explored this topic myself yet, but say you take a language model like BERT, which was, you could say, statically trained once on Wikipedia or news content. The world is changing every single day; your model isn't. So what you could do is introduce knowledge back into the model, and I'm still on the brink of exploring this. I think Nils Reimers talked recently about how you can incorporate knowledge into the language model. The way I saw this before I even read that
paper, so I may be reinventing the wheel, is this: the language model might figure out that the question is about the president of the United States, a specific one, say Obama. But then the question is: is Obama still the president of the United States? And now the language model is kind of stuck; it says, well, I actually don't have the latest data. ChatGPT does that, right: "I was trained on data up to 2021, so I have no idea what happened in 2022, sorry, goodbye." But it could actually say: I figured out the context, I know roughly what you're asking, I know this person, I know what "president" means, I know the country United States, but you're asking me a factual question. So what it could do is go and ask a knowledge graph, which is updated without recalculating the embeddings, which solves that whole problem. It's another data structure, a knowledge graph, being updated as we go. And, coming back to your question on structured language: in graph systems you also need to form your query in a certain way, so it forms the query in a certain way, traverses the graph, and checks: is Obama the president? The answer is no. It goes back all the way, maybe to a language model or some other layer, and presents the answer to the user. So that's just one thought; before even diving into this topic of incorporating knowledge into LLMs, I would probably think like that. Yeah, I love that you brought up the knowledge graph; that's kind of like GPT Index, and also LangChain, I can't believe I haven't brought that up until now. We can talk about that more in the neural search frameworks discussion on the review podcast. But this idea of different kinds of external memory... I don't know what's wrong with my brain today, I keep branching into
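The delegation step described above, a model handing a time-sensitive factual question to an updatable external store, can be sketched with a dict standing in for the knowledge graph; the triple layout and lookup function here are invented for illustration.

```python
# Toy stand-in for a knowledge graph that is updated as the world changes,
# without retraining the language model or recomputing any embeddings.
knowledge_graph = {
    ("United States", "president"): "Joe Biden",
}

def answer_factual(entity, relation, candidate):
    """Check a candidate answer against the current state of the graph."""
    current = knowledge_graph.get((entity, relation))
    if current is None:
        return "unknown"
    return "yes" if current == candidate else f"no, it is {current}"

# The language model has parsed the question down to (entity, relation, candidate):
ans = answer_factual("United States", "president", "Barack Obama")
print(ans)  # → no, it is Joe Biden
```

A real system would traverse a graph store with a proper query language rather than a dict, but the control flow, parse, look up, compare, report back, is the same.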
completely different directions. I don't think it's wrong, I think it's the right setting; it's just not suitable for coding or something. So, I was recently talking with Shukri, who just joined Weaviate as well, about this idea of metadata re-ranking. One approach is the XGBoost re-ranker, where you take the BM25 score, the vector distance, and also symbolic features as the input to the XGBoost re-ranker. The thing he was asking is: okay, do we want to store this metadata in Weaviate as well, or do we go get it from Redis or a feature store, something like that? The knowledge graph idea connects to this, because it's like: are we going to build the knowledge graph in Weaviate, should it live in Weaviate, or should we plug Weaviate in with something like Neo4j? Or is it a top-level controller, like the neural search frameworks thing you're describing, something that hooks into Weaviate and hooks into Neo4j, RelationalAI, TigerGraph, I don't know all the RDF ontology technologies, but it keeps them separate and sits a level higher, picking between the indexes. So it's a question of what kind of technology gets built into Weaviate, and that's not even really up to me. Exactly, but I think it's fun to brainstorm with you: we kind of intuitively find these limitations together, and at the same time these limitations may lead to future discoveries in engineering and research. When I was giving this keynote at Haystack, where by the way the Weaviate guys surprised me, and others as well, I didn't feel bold enough to say this, but I will say it now: I feel like engineering and research are indistinguishable in the amount of intellectual power you need to put in to solve a problem. Because it's not a given: if this data structure, the inverted index, is designed like this, and you do have
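The metadata re-ranking idea mentioned above can be sketched with a hand-weighted linear scorer standing in for the trained XGBoost model; the feature names and weights below are illustrative assumptions, not anything from Weaviate.

```python
# Each candidate carries a BM25 score, a vector distance, and a symbolic
# metadata feature; a learned model maps them to one relevance score.
# The fixed weights are a stand-in for what XGBoost would learn from clicks.
def rerank(candidates, weights=(0.5, -0.4, 0.1)):
    w_bm25, w_dist, w_fresh = weights

    def score(c):
        return (w_bm25 * c["bm25"]
                + w_dist * c["vector_distance"]   # smaller distance is better
                + w_fresh * c["freshness"])       # symbolic metadata feature

    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"id": "a", "bm25": 2.0, "vector_distance": 0.9, "freshness": 0.1},
    {"id": "b", "bm25": 1.5, "vector_distance": 0.2, "freshness": 0.9},
]
print([c["id"] for c in rerank(candidates)])  # → ['b', 'a']
```

Whether the metadata features live next to the vectors or in a separate feature store only changes where `candidates` is assembled; the re-ranking step itself is unchanged.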
the issue of early termination, because you cannot waste so many CPU cycles, then, okay, without reading papers, can you go and solve it as "just an engineer", so to say? No, you can't. It's super hard; you'd need to start coming up with something like a new vector space model, which was invented back in the 60s or 70s. Can you come up with a completely new model? It's equally hard as in research, where you know the SOTA is now this, can I beat it somehow? But it's not like you're beating SOTA just for the sake of it; maybe some people do, but I would take the stance of not doing that, of trying to solve an existing problem. I do want to surface, as you said, a more relevant document to the top, or maybe even the passage, maybe even a number, so I keep pushing for that. Both of these require so much intelligence that they become indistinguishable in some sense. What exactly are you solving now: the MLM problem, the inverted index data structure limitation, or how to retrain the embeddings, how to fine-tune the model without recomputing the embeddings, because that's way too expensive, who pays the bill? Does that resonate with you, what are your thoughts on that? Yeah, our CTO Etienne Dilocker has written about product engineering, on this meta-question of how these decisions get made. I think there's a book, I used to have a bookshelf behind me in podcasts and I'd point and say, it's that one. Yeah, I still have it actually. It's "Ask Your Developer", the title is something like that. Well, okay, maybe I got a little off track with this idea of research and engineering. I think the scientist is metrics-oriented in a different way; for the engineer, the diversity of the tests and
the data collection is more important when you're the scientist. The engineer needs to build smoke tests, sort of, whereas I see the scientist as needing very rigorous data collection. That's how I see the distinction in responsibility; does that make sense? Yeah, it does actually. You gave a very good distinguishing feature. What I was trying to say is that in engineering you still have a plethora of options; it's a combinatorial explosion in certain cases. There are also mundane parts in both of these, right, we're not talking about them, but they do exist. But you do have these decision points: okay, should I branch this way or that way, should I step back and rethink? But I agree, you gave a really good example: in research I care about data so much, while in engineering it's probably the quality assurance department that worries about what data we're going to feed into the system to try to break it, to see the limits, where it breaks, what we need to fix, or whether it's stable and proven enough to release, things like that. But yeah, if I can stay on this a little more, I think this generalization testing, the industry of quality assurance but for deep learning, is going to be really fascinating. When we first met, you had written this "not all vector databases are equal" piece, and I thought that was so insightful, because you told the story of an emerging market, and that was so interesting. I really look forward to seeing the story of the emerging market around generalization testing, like with the BEIR benchmarks, that kind of thing, where you create some million-scale dataset and have the NDCG, recall, precision with all these queries. I think maybe also this idea of A/B
testing with models is going to be more popular. When I went to NeurIPS this year, there was this talk from Dr. Juho Kim about interaction-centric AI, and how that might differ from the first paradigm of model-centric AI, where, say, you judge an image generation model purely on something like Inception score or Fréchet distances between feature spaces and real images; and then from data-centric AI, which, I think Snorkel AI is very responsible for branding that term and making it so popular, is where you really focus on the curation of data. Your language model is like Mosaic's PubMed GPT: you have this particular data, you clean it, and you make it awesome. And then I think interaction-centric AI is a new way to evaluate models that is A/B-testing driven, or about how quickly you can perform a test. I don't know if I've gotten too off topic. No, I think it's exactly the topic to focus on if we are serious about putting these things out in production. You do need to provide evidence, to the stakeholders and to yourself, that this holds water and we can release it, and that it's not going to show something indiscriminate to the users that leaves them completely puzzled. There are all these numerous examples, like Google Search when they incorporated, I think, some distilled version of BERT, where it would flip the meaning: it would say "you do take this medicine" when the prescription actually says "you do not take that medicine", or vice versa, because it's not sensitive to negations. So I totally agree, I'm with you on that: how do we QA the quality of the systems we release? And I think the OpenAI team did a brilliant trick, in a way: they said, hey, here is ChatGPT, go test it, and they got something like a million users in the first few days, because they
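For reference, the NDCG metric behind the BEIR-style evaluation mentioned above fits in a few lines of plain Python (graded relevance judgments per query, discounted by rank position):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a list of graded relevances."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranking, qrels, k=10):
    """NDCG@k of a ranked list of doc IDs against judgments {doc_id: grade}."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

qrels = {"d1": 2, "d2": 1}                       # judgments for one query
print(ndcg_at_k(["d1", "d2", "d3"], qrels))      # → 1.0 (ideal ordering)
print(round(ndcg_at_k(["d3", "d2", "d1"], qrels), 2))  # → 0.62
```

Averaging this over all queries of a BEIR dataset gives the headline number reported in the benchmark tables.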
actually do need some extra brains to go and test it in different scenarios and see where it breaks, or where it stops making sense. Yeah, it's my understanding that's how Scale AI became the kings, with labeled data, like Mechanical Turk, and I think Surge AI is something similar that's emerging, that I've been seeing. Yeah, it's really interesting, exactly. I was wondering: you also worked on this podcast search, and you had the opinion that Whisper has some bottlenecks. I wonder if you want to tap into that a little bit. Yeah, I'd love to tell this story. The story behind it is that Boris Power at OpenAI tweeted: they cut the prices for the OpenAI embeddings, and Boris was pointing out how cheap it would be to index a massive podcast like the Joe Rogan podcast. So I was like, hey, I have a podcast and you have the Vector Podcast. So I started doing this thing where you take the audio files and put them into Whisper. I also tried Descript, which is something I like a lot; I've been using Descript for a long time for editing videos. But with the podcast transcriptions you still want to edit them a bit; you have things like the way I'm pausing right now while I'm talking. The raw transcriptions are not quite what you want to index, for this idea of creating a knowledge base from these podcasts, because in these podcasts we've covered so many topics, and it's kind of easier to do it like this than to write it all down, and it's also very collaborative: with a podcast you get more people involved, it's a community-building thing. So, that idea of creating knowledge bases out of podcasts: how would you rate your interest, on a scale of one to ten, in having one for the Vector Podcast? I mean, I
would love to join the geekery here, because I was rewatching the episode with you and Danny, and you were exploding with knowledge, in a way you're branching out a lot today as well, exploding with knowledge, because you read all these papers, you try things, you share Google Colabs and stuff. But how do I tap into this knowledge? It's very synchronous, right? There is no way to randomly jump in: hey, where did he talk about that model from Microsoft? Unless I have the time code, I don't have a way to do that. Yeah, and that's what inspires me so much: I want to fine-tune these models so badly, just on the turn-taking as the positive labeling. Can you expand a bit more on that, what do you mean? Okay: Connor says "I want to talk about the turn taking", Dmitry says "can you expand on that more, what does that mean, Connor?". That's how you get the positive pairs, potentially. Yeah, and if you want more examples of what Connor said, you could augment with Connor's statements, his sentences. Yeah, and I feel like the potential of it is crazy. I also think we're going to see it hooked into, say, Spotify, these big platforms that organize podcasts, and I think it'll help you discover, because, something else: I love how you do this vector search podcast and I'm also doing a vector search podcast, and it's like, who else is out there doing maybe a recommendation podcast? It's this kind of discovery about the people, because podcasting is a very collaborative medium, right? You can't do it by yourself. No, it's almost like being a stand-up comedian: anyone who is presenting needs the audience, because you simply do
not generate the 3D-ness of your thoughts in the absence of people; it's very hard to do. And the same thing happens here right now: when we exchange, I have a full sheet of these notes, right, so on my own I wouldn't know, do I know these things? It's all sort of working memory. But coming back to Whisper, just to get it right: you're saying it's still a bottleneck, in your opinion? In what way? Okay, well, I'd hate to be quoted as saying it's not good; that's not the same thing, and I do value it. So: if you're creating a podcast search app, there still needs to be a little more parsing. I don't know if you need to correct it and then fine-tune. I've also been playing a little more with ChatGPT, and as I've been learning about this kind of sequential prompting from GPT Index and LangChain, learning how you can get ChatGPT to maybe clean up a podcast transcription, there's still a pretty difficult manual cleaning effort in the middle of that. Yeah, actually I can resonate with that. I've worked with one startup, helping them do speech-to-text, and first of all, one issue is very similar to low-resource languages in NLP: if you don't have a model trained on a lot of examples, or maybe it was trained on TV shows and you are transcribing end-user speech, the topics are different, the style is different, everything is different, and so it breaks. So I was also alluding to the topic of fine-tuning there. But exactly as you said, the problem was that the output was so noisy that I had to write what I called an NLP layer, which would go and change things. For instance, if you say 25 and the model spells it out with letters, you collapse that to a number. But sometimes it would do that
in problematic places, and you're like: oh no, don't do that, don't do it here. It's an aftermath fix, and you would wish that the model, having enough context and knowledge about the world, would do it right as it transcribes, rather than you doing it afterwards. Yeah, exactly, I'm thinking the same way: it's a text layer afterwards. Super cool. Then maybe, as we're wrapping up the podcast, let me quickly tell you about Ref2Vec and the pivot into recommendation. So to start off, Ref2Vec is about utilizing the Weaviate data model a little more. The Weaviate data model is designed so that you have different classes: this class could be products, this class could be users, like tables in SQL; you have different data objects, the high-level idea of designing data objects, and then you have graph relations between them, like user-likes-product. The simplest thing is that you can then represent the user as the average vector of the products the user liked, and then you can re-rank with that, or you could just search with that vector, it could be your search vector. Or you could have some other search, like restaurants in Boston. Because I live in Boston... oh sorry, I didn't mean to give away Boston in the query. Say my query is "Italian restaurants", and because it sees that Connor likes certain North End Italian restaurants, it knows I'm in Boston, so it can personalize, just using that vector to re-rank, to only show me restaurants in Boston; if you show me a restaurant in Chicago, it's useless. So that's the first idea, this kind of averaging of the vectors to get the centroid. But then there's another idea, and I learned this from talking to Maarten Grootendorst about his BERTopic library, and I highly recommend people
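The "NLP layer" described above might look like this toy sketch: collapse spelled-out numbers in ASR output to digits, but skip protected spans where the rewrite would be wrong. The word list and the quoted-span protection rule are illustrative assumptions, not the startup's actual rules.

```python
import re

# Phrase -> digits rewrites the post-processing layer applies.
WORDS = {"twenty five": "25", "five thousand": "5000"}

def normalize_numbers(text):
    # Split on quoted spans (capturing group keeps them) and leave those
    # untouched -- the "don't do it here" cases from the conversation.
    parts = re.split(r'("[^"]*")', text)
    for i, part in enumerate(parts):
        if part.startswith('"'):
            continue
        for phrase, digits in WORDS.items():
            part = part.replace(phrase, digits)
        parts[i] = part
    return "".join(parts)

out = normalize_numbers('I had five thousand messages about "twenty five ways to plan"')
print(out)  # → I had 5000 messages about "twenty five ways to plan"
```

As noted above, this is an aftermath fix; a model with enough world knowledge would ideally emit the right form during transcription and make the layer unnecessary.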
check that out, it's such a cool way of visualizing vector spaces: this HDBSCAN clustering. He was describing the difference between HDBSCAN and k-means clustering in how they produce centroids. HDBSCAN has this very cool density clustering thing, but regardless of the clustering you use, and I like HDBSCAN a lot, let's say we get three centroids: I like Nike shoes, Adidas shoes, and Jordan shoes, and you have these three centroids. You can use those three centroids, three average vectors from their respective clusters, to re-rank with as well, to have some diversity in the results. So yes, that's the recommendation pivot. And then there's this idea of a top-level index, and I'm stealing that terminology from GPT Index, because what GPT Index does to represent a long document is, again, that tree summarization: you summarize pairs of chunks, then summarize the summaries, and you get this top-level index where you search through the top layer first, then the next layer. So if you're asking a question like "what was Barack Obama's legacy", and you have the symbolic filter over the titles of the Wikipedia pages, with "where title equals Barack Obama", that top-level search massively simplifies the search space, because now you're just looking in the Barack Obama article rather than all of Wikipedia. So I think Ref2Vec can also be used for constructing top-level indexes, by having document-has-passage, has-passage, has-passage, again in the Weaviate data model. I think it's a really interesting way that we're trying to use this cross-reference graph structure to move embeddings through the graph. Another idea, and I know I'm throwing out a thousand ideas, is to have a graph convolutional network: say you have user-likes-product, product-has-brand, okay,
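The core Ref2Vec step described above, averaging liked-product vectors into a user vector and re-ranking with it, fits in a few lines. This is a toy sketch with 2-d vectors, not Weaviate's implementation; using several cluster centroids (e.g. from HDBSCAN) instead of one mean is the diversity variant mentioned above.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

liked = [[1.0, 0.0], [0.8, 0.2]]      # vectors of products the user liked
user_vec = mean_vector(liked)         # the "centroid" user representation

# Candidate results from some query, re-ranked by similarity to the user.
results = {"boston_italian": [0.9, 0.1], "chicago_italian": [0.1, 0.9]}
reranked = sorted(results, key=lambda d: dot(user_vec, results[d]), reverse=True)
print(reranked)  # → ['boston_italian', 'chicago_italian']
```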
let's just make it a three-class graph like that. So you have this graph and you need to aggregate the embeddings through it. Now the question is: should we just average, or should we try some kind of nonlinear graph convolutional network? The graph convolutional network is appealing because a graph network can handle an arbitrary number of inputs; it isn't a fixed input size like transformers, where you would zero-pad to 512 tokens. It's generally very flexible in the number of inputs. So I hope that was an okay tour of Ref2Vec, and I know I'm trying to squeeze a lot in. It's amazing, actually, and I hope we can discuss it in subsequent episodes as well, because the topic of personalization is also very interesting: for someone who says, okay, we just have these fixed vectors computed from the content, how the hell do we actually bring in the user? And this is what you've described, this is what I perceive from it. I think this is an excellent topic, and it opens up opportunities for vector search to appeal to the search engine builders, and maybe to other engines, like recommendation. But I think we have a ton of material. I really love talking to you. Maybe before we close off: is there something you wanted to announce to the audience of the Vector Podcast? Oh yeah. We have toured a lot of things, but I really hope that you check out the Weaviate BEIR benchmarks repository. This is a recent effort around hybrid search, coming back to that earlier conversation, which already feels like forever ago: the hybrid search thing has been tested with the BEIR benchmarks, and there are scales: small-scale BEIR, medium scale, larger scale. Right now there's the smaller scale and some medium scale, and right now
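The simplest aggregation choice discussed above, mean pooling, already has the flexible-input property: it accepts any number of neighbour embeddings without zero-padding to a fixed length. A learned graph convolution would add trainable weights and a nonlinearity around the same operation; this sketch shows only the plain mean.

```python
def aggregate(neighbour_vectors):
    """Mean-pool an arbitrary number of neighbour embeddings into one vector."""
    n = len(neighbour_vectors)
    dim = len(neighbour_vectors[0])
    return [sum(v[i] for v in neighbour_vectors) / n for i in range(dim)]

# Works for one neighbour or many -- no padding to a fixed input size needed.
print(aggregate([[1.0, 0.0]]))                              # → [1.0, 0.0]
print(aggregate([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]))      # → [1.0, 1.0]
```

In the three-class graph above, the same function would run once to pull brand vectors into products and again to pull product vectors into the user.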
we're working on the backups. This is all based on Weaviate 1.15, which added backups: you can back up the Weaviate instance to a file that lets you just restore it, so you don't need to re-import the data. It's like with Faiss indexes, how you can just read an index; so now you can just load the Weaviate index. Why this is so exciting to me: I've always been really interested in Hugging Face Datasets, or Papers with Code, papers with data, this organization of data. I used to think, with Weaviate's Wikipedia demo, that it would need to be live, always hosted: you click "try it now" and boom, you're in the console and you can query it. But with this repo, where you just download the Docker file for Weaviate, it's two lines of code: you do docker compose up, and then a Python restore with the name of the dataset you want, and I think that's just as easy as having an always-hosted demo. And the other thing, with hybrid search, another thing that excites me so much: if it's vector search only, you could argue, well, why don't I just use a Faiss index then? But because it's got BM25 plus vector search, it's starting to offer more value, in how it can help you with your information retrieval research. In general, something that is very important to me is trying to figure out how to connect with information retrieval research, and I think the BEIR benchmarks present a really exciting way to do it. I do have some ideas on how users would be interested in it, because with the BEIR benchmarks, maybe you look at them and say, okay, NFCorpus or TREC-COVID or Natural Questions is very similar to the app that I'm building. But also, with ChatGPT, you could probably loop through your documents and generate queries, and
those would be the gold documents for those queries, and you can do the same kind of evaluation testing where, as you mentioned, you want to see how the approximate nearest neighbor error cascades into the representation error, and see what that means for your particular problem. So I hope people check it out, I hope they find it interesting. Yeah, that's a ton, super packed, thanks so much. What I like in this discussion, compared to our last one, is that you continue to explode with knowledge, and I hope you will keep doing that. Thanks so much for your time today, Connor, and looking forward to talking more. Yeah, thank you so much, Dmitry. I feel like the Vector Podcast is like the Super Bowl of search podcasts, so thank you so much. Thank you so much, Connor. Enjoy your day, bye bye.
\ No newline at end of file
diff --git a/transcripts_with_timestamps/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md b/transcripts_with_timestamps/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md
new file mode 100644
index 0000000..7b9308d
--- /dev/null
+++ b/transcripts_with_timestamps/vector-podcast/daniel-tunkelang-leading-search-consultant-leveraging-ml-for-query-and-content-understanding.md
@@ -0,0 +1,3273 @@
---
description: '

YouTube: https://www.youtube.com/watch?v=GyXggc4LNKI

Topics:

00:00 Kick-off by Judy Zhu

01:33 Introduction by Dmitry Kan and his bio!

03:03 Daniel’s background

04:46 “Science is the difference between instinct and strategy”

07:41 Search as a personal learning experience

11:53 Why do we need Machine Learning in Search, or can we use manually curated features?

16:47 Swimming up-stream from relevancy: query / content understanding and where to start?

23:49 Rule-based vs Machine Learning approaches to Query Understanding: Pareto principle

29:05 How content understanding can significantly improve your search engine experience

32:02 Available datasets, tools and algorithms to train models for content understanding

38:20 Daniel’s take on the role of vector search in modern search engine design as the path to language of users

45:17 Mystical question of WHY: what drives Daniel in the search space today

49:50 Announcements from Daniel

51:15 Questions from the audience

Show notes:

[What is Content Understanding?. Content understanding is the foundation… | by Daniel Tunkelang | Content Understanding | Medium](https://medium.com/content-understanding/what-is-content-understanding-4da20e925974)

Query Understanding: An Introduction - https://queryunderstanding.com/introduction-c98740502103

Science as Strategy [YouTube](https://www.youtube.com/watch?v=dftt6Yqgnuw)

Search Fundamentals course - https://corise.com/course/search-fundamentals

Search with ML course - https://corise.com/course/search-with-machine-learning

Books:

Faceted Search, by Daniel Tunkelang: https://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999

Modern Information Retrieval: The Concepts and Technology Behind Search, by Ricardo Baeza-Yates: https://www.amazon.com/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910/ref=sr11?qid=1653144684&refinements=p_27%3ARicardo+Baeza-Yates&s=books&sr=1-1

Introduction to Information Retrieval, by Chris Manning: https://www.amazon.com/Introduction-Information-Retrieval-Christopher-Manning/dp/0521865719/ref=sr1fkmr0_1?crid=2GIR19OTZ8QFJ&keywords=chris+manning+information+retrieval&qid=1653144967&s=books&sprefix=chris+manning+information+retrieval%2Cstripbooks-intl-ship%2C141&sr=1-1-fkmr0

Query Understanding for Search Engines, by Yi Chang and Hongbo Deng: https://www.amazon.com/Understanding-Search-Engines-Information-Retrieval/dp/3030583333

' +image_url: https://media.rss.com/vector-podcast/20220522_070529_71d7f3ebca3858a656066fb337b207c1.jpg +pub_date: Mon, 23 May 2022 13:00:19 GMT +title: Daniel Tunkelang - Leading Search Consultant - Leveraging ML for query and + content understanding +url: https://rss.com/podcasts/vector-podcast/494873 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 25.0, "text": " We can + get started. So I can kick us off even though you see are really the star of", "tokens": + [50364, 492, 393, 483, 1409, 13, 407, 286, 393, 4437, 505, 766, 754, 1673, 291, + 536, 366, 534, 264, 3543, 295, 51614], "temperature": 0.0, "avg_logprob": -0.3865584929784139, + "compression_ratio": 1.064102564102564, "no_speech_prob": 0.022780779749155045}, + {"id": 1, "seek": 2500, "start": 25.0, "end": 32.2, "text": " the show. So hi everyone. + Welcome to our fireside chat with Dimeji Conan,", "tokens": [50364, 264, 855, 13, + 407, 4879, 1518, 13, 4027, 281, 527, 15044, 482, 5081, 365, 413, 1312, 4013, 2656, + 282, 11, 50724], "temperature": 0.0, "avg_logprob": -0.34980154573247674, "compression_ratio": + 1.5270935960591132, "no_speech_prob": 0.29082417488098145}, {"id": 2, "seek": 2500, + "start": 32.2, "end": 40.6, "text": " Daniel Tungerling. This fireside chat is on + search and all the interesting topics", "tokens": [50724, 8033, 314, 1063, 260, + 1688, 13, 639, 15044, 482, 5081, 307, 322, 3164, 293, 439, 264, 1880, 8378, 51144], + "temperature": 0.0, "avg_logprob": -0.34980154573247674, "compression_ratio": 1.5270935960591132, + "no_speech_prob": 0.29082417488098145}, {"id": 3, "seek": 2500, "start": 40.6, "end": + 45.44, "text": " that Dimeji and Daniel will talk about. 
And it''s a series that''s + hosted by", "tokens": [51144, 300, 413, 1312, 4013, 293, 8033, 486, 751, 466, 13, + 400, 309, 311, 257, 2638, 300, 311, 19204, 538, 51386], "temperature": 0.0, "avg_logprob": + -0.34980154573247674, "compression_ratio": 1.5270935960591132, "no_speech_prob": + 0.29082417488098145}, {"id": 4, "seek": 2500, "start": 45.44, "end": 52.120000000000005, + "text": " Kauarais. Kauarais just do my plug here and we''re a new education platform + that", "tokens": [51386, 591, 1459, 289, 1527, 13, 591, 1459, 289, 1527, 445, 360, + 452, 5452, 510, 293, 321, 434, 257, 777, 3309, 3663, 300, 51720], "temperature": + 0.0, "avg_logprob": -0.34980154573247674, "compression_ratio": 1.5270935960591132, + "no_speech_prob": 0.29082417488098145}, {"id": 5, "seek": 5212, "start": 52.12, + "end": 58.32, "text": " transforms the way professionals build technical high demand + skills through top industry", "tokens": [50364, 35592, 264, 636, 11954, 1322, 6191, + 1090, 4733, 3942, 807, 1192, 3518, 50674], "temperature": 0.0, "avg_logprob": -0.26839455314304517, + "compression_ratio": 1.599078341013825, "no_speech_prob": 0.002250786405056715}, + {"id": 6, "seek": 5212, "start": 58.32, "end": 66.44, "text": " leaders such as + Daniel and collective peer learning such as Demetri, the format of our", "tokens": + [50674, 3523, 1270, 382, 8033, 293, 12590, 15108, 2539, 1270, 382, 4686, 302, 470, + 11, 264, 7877, 295, 527, 51080], "temperature": 0.0, "avg_logprob": -0.26839455314304517, + "compression_ratio": 1.599078341013825, "no_speech_prob": 0.002250786405056715}, + {"id": 7, "seek": 5212, "start": 66.44, "end": 72.24, "text": " courses or pretty + innovative because we mix live instructor sessions with real world", "tokens": [51080, + 7712, 420, 1238, 12999, 570, 321, 2890, 1621, 18499, 11081, 365, 957, 1002, 51370], + "temperature": 0.0, "avg_logprob": -0.26839455314304517, "compression_ratio": 1.599078341013825, + "no_speech_prob": 0.002250786405056715}, {"id": 8, 
"seek": 5212, "start": 72.24, + "end": 78.64, "text": " projects and fireside chats like these with operators that + are experts in their field.", "tokens": [51370, 4455, 293, 15044, 482, 38057, 411, + 613, 365, 19077, 300, 366, 8572, 294, 641, 2519, 13, 51690], "temperature": 0.0, + "avg_logprob": -0.26839455314304517, "compression_ratio": 1.599078341013825, "no_speech_prob": + 0.002250786405056715}, {"id": 9, "seek": 7864, "start": 78.76, "end": 84.96000000000001, + "text": " And actually I see a couple of students from both the search class and + from other classes", "tokens": [50370, 400, 767, 286, 536, 257, 1916, 295, 1731, + 490, 1293, 264, 3164, 1508, 293, 490, 661, 5359, 50680], "temperature": 0.0, "avg_logprob": + -0.21418877865405792, "compression_ratio": 1.577092511013216, "no_speech_prob": + 0.020690934732556343}, {"id": 10, "seek": 7864, "start": 84.96000000000001, "end": + 91.4, "text": " are in the audience. So welcome back to you guys and welcome to + everyone else here. So with", "tokens": [50680, 366, 294, 264, 4034, 13, 407, 2928, + 646, 281, 291, 1074, 293, 2928, 281, 1518, 1646, 510, 13, 407, 365, 51002], "temperature": + 0.0, "avg_logprob": -0.21418877865405792, "compression_ratio": 1.577092511013216, + "no_speech_prob": 0.020690934732556343}, {"id": 11, "seek": 7864, "start": 91.4, + "end": 99.2, "text": " that, I''ll pass on to Demetri. Awesome. Thanks, Judy. And + hello, everyone. As they usually", "tokens": [51002, 300, 11, 286, 603, 1320, 322, + 281, 4686, 302, 470, 13, 10391, 13, 2561, 11, 24577, 13, 400, 7751, 11, 1518, 13, + 1018, 436, 2673, 51392], "temperature": 0.0, "avg_logprob": -0.21418877865405792, + "compression_ratio": 1.577092511013216, "no_speech_prob": 0.020690934732556343}, + {"id": 12, "seek": 7864, "start": 99.2, "end": 105.12, "text": " say, hey, they + are vector podcasts is here. 
And today I have like a luminary guest, a", "tokens": + [51392, 584, 11, 4177, 11, 436, 366, 8062, 24045, 307, 510, 13, 400, 965, 286, 362, + 411, 257, 24635, 4066, 8341, 11, 257, 51688], "temperature": 0.0, "avg_logprob": + -0.21418877865405792, "compression_ratio": 1.577092511013216, "no_speech_prob": + 0.020690934732556343}, {"id": 13, "seek": 10512, "start": 105.12, "end": 111.08, + "text": " mogul in search world, Daniel Tankele and beyond excited to be talking + to him and discussing", "tokens": [50364, 13172, 425, 294, 3164, 1002, 11, 8033, + 17046, 330, 306, 293, 4399, 2919, 281, 312, 1417, 281, 796, 293, 10850, 50662], + "temperature": 0.0, "avg_logprob": -0.2821318458108341, "compression_ratio": 1.5677966101694916, + "no_speech_prob": 0.013967824168503284}, {"id": 14, "seek": 10512, "start": 111.08, + "end": 116.08000000000001, "text": " the, you know, favorite he''s in mind topics + in queer understanding and content understanding.", "tokens": [50662, 264, 11, 291, + 458, 11, 2954, 415, 311, 294, 1575, 8378, 294, 20323, 3701, 293, 2701, 3701, 13, + 50912], "temperature": 0.0, "avg_logprob": -0.2821318458108341, "compression_ratio": + 1.5677966101694916, "no_speech_prob": 0.013967824168503284}, {"id": 15, "seek": + 10512, "start": 116.96000000000001, "end": 123.76, "text": " And traditionally, + I will introduce myself for the first time on the podcast. And well,", "tokens": + [50956, 400, 19067, 11, 286, 486, 5366, 2059, 337, 264, 700, 565, 322, 264, 7367, + 13, 400, 731, 11, 51296], "temperature": 0.0, "avg_logprob": -0.2821318458108341, + "compression_ratio": 1.5677966101694916, "no_speech_prob": 0.013967824168503284}, + {"id": 16, "seek": 10512, "start": 123.76, "end": 130.4, "text": " what I want to + say is I have PhD in natural language processing. 
I work the machine translation", + "tokens": [51296, 437, 286, 528, 281, 584, 307, 286, 362, 14476, 294, 3303, 2856, + 9007, 13, 286, 589, 264, 3479, 12853, 51628], "temperature": 0.0, "avg_logprob": + -0.2821318458108341, "compression_ratio": 1.5677966101694916, "no_speech_prob": + 0.013967824168503284}, {"id": 17, "seek": 13040, "start": 130.48000000000002, "end": + 137.52, "text": " back in the days. Currently in two roles, principal AI scientist + with silo AI. It''s a consulting", "tokens": [50368, 646, 294, 264, 1708, 13, 19964, + 294, 732, 9604, 11, 9716, 7318, 12662, 365, 3425, 78, 7318, 13, 467, 311, 257, 23682, + 50720], "temperature": 0.0, "avg_logprob": -0.17352279575391746, "compression_ratio": + 1.5577689243027888, "no_speech_prob": 0.00030771075398661196}, {"id": 18, "seek": + 13040, "start": 137.52, "end": 143.76, "text": " gig. And recently I entered the + job as a senior product manager at the company called Tom Tom,", "tokens": [50720, + 8741, 13, 400, 3938, 286, 9065, 264, 1691, 382, 257, 7965, 1674, 6598, 412, 264, + 2237, 1219, 5041, 5041, 11, 51032], "temperature": 0.0, "avg_logprob": -0.17352279575391746, + "compression_ratio": 1.5577689243027888, "no_speech_prob": 0.00030771075398661196}, + {"id": 19, "seek": 13040, "start": 143.76, "end": 150.56, "text": " which produces + maps and map search and navigation. I have 16 years of experience in developing + search", "tokens": [51032, 597, 14725, 11317, 293, 4471, 3164, 293, 17346, 13, 286, + 362, 3165, 924, 295, 1752, 294, 6416, 3164, 51372], "temperature": 0.0, "avg_logprob": + -0.17352279575391746, "compression_ratio": 1.5577689243027888, "no_speech_prob": + 0.00030771075398661196}, {"id": 20, "seek": 13040, "start": 150.56, "end": 157.28, + "text": " engines for startups and multinational technology giants. 
Most recently, + I worked on multilingual", "tokens": [51372, 12982, 337, 28041, 293, 45872, 1478, + 2899, 31894, 13, 4534, 3938, 11, 286, 2732, 322, 2120, 38219, 51708], "temperature": + 0.0, "avg_logprob": -0.17352279575391746, "compression_ratio": 1.5577689243027888, + "no_speech_prob": 0.00030771075398661196}, {"id": 21, "seek": 15728, "start": 157.28, + "end": 164.72, "text": " web scale search. I also claim to be an expert in vector + search engines. And I''m the host of", "tokens": [50364, 3670, 4373, 3164, 13, 286, + 611, 3932, 281, 312, 364, 5844, 294, 8062, 3164, 12982, 13, 400, 286, 478, 264, + 3975, 295, 50736], "temperature": 0.0, "avg_logprob": -0.13374224953029468, "compression_ratio": + 1.617117117117117, "no_speech_prob": 0.008377227932214737}, {"id": 22, "seek": 15728, + "start": 164.72, "end": 170.08, "text": " vector podcast, which focuses on this + tech, but also beyond that on search at large. I''m also", "tokens": [50736, 8062, + 7367, 11, 597, 16109, 322, 341, 7553, 11, 457, 611, 4399, 300, 322, 3164, 412, 2416, + 13, 286, 478, 611, 51004], "temperature": 0.0, "avg_logprob": -0.13374224953029468, + "compression_ratio": 1.617117117117117, "no_speech_prob": 0.008377227932214737}, + {"id": 23, "seek": 15728, "start": 170.08, "end": 178.24, "text": " blogging on + medium. And as I said, I''m beyond excited to be talking to Daniel today. 
And as + a", "tokens": [51004, 6968, 3249, 322, 6399, 13, 400, 382, 286, 848, 11, 286, 478, + 4399, 2919, 281, 312, 1417, 281, 8033, 965, 13, 400, 382, 257, 51412], "temperature": + 0.0, "avg_logprob": -0.13374224953029468, "compression_ratio": 1.617117117117117, + "no_speech_prob": 0.008377227932214737}, {"id": 24, "seek": 15728, "start": 178.24, + "end": 182.64, "text": " tradition, Daniel, could you please introduce yourself + to me and our audience?", "tokens": [51412, 6994, 11, 8033, 11, 727, 291, 1767, + 5366, 1803, 281, 385, 293, 527, 4034, 30, 51632], "temperature": 0.0, "avg_logprob": + -0.13374224953029468, "compression_ratio": 1.617117117117117, "no_speech_prob": + 0.008377227932214737}, {"id": 25, "seek": 18264, "start": 183.27999999999997, "end": + 191.11999999999998, "text": " Sure, do you make me? Thank you for that. I''m Daniel + Tungaling. And I''ve been working in search", "tokens": [50396, 4894, 11, 360, 291, + 652, 385, 30, 1044, 291, 337, 300, 13, 286, 478, 8033, 314, 1063, 4270, 13, 400, + 286, 600, 668, 1364, 294, 3164, 50788], "temperature": 0.0, "avg_logprob": -0.2670934511267621, + "compression_ratio": 1.4979253112033195, "no_speech_prob": 0.017736561596393585}, + {"id": 26, "seek": 18264, "start": 191.11999999999998, "end": 198.48, "text": " + for, I guess, a little bit over two decades. 
I started after completing my PhD, + not", "tokens": [50788, 337, 11, 286, 2041, 11, 257, 707, 857, 670, 732, 7878, 13, + 286, 1409, 934, 19472, 452, 14476, 11, 406, 51156], "temperature": 0.0, "avg_logprob": + -0.2670934511267621, "compression_ratio": 1.4979253112033195, "no_speech_prob": + 0.017736561596393585}, {"id": 27, "seek": 18264, "start": 198.48, "end": 203.67999999999998, + "text": " anything to do with search information retrieval, but actually in network + visualization.", "tokens": [51156, 1340, 281, 360, 365, 3164, 1589, 19817, 3337, + 11, 457, 767, 294, 3209, 25801, 13, 51416], "temperature": 0.0, "avg_logprob": -0.2670934511267621, + "compression_ratio": 1.4979253112033195, "no_speech_prob": 0.017736561596393585}, + {"id": 28, "seek": 18264, "start": 204.72, "end": 212.39999999999998, "text": " + I shortly ended up teaming up with a few folks to start a company called Indeka + back in 1999", "tokens": [51468, 286, 13392, 4590, 493, 1469, 278, 493, 365, 257, + 1326, 4024, 281, 722, 257, 2237, 1219, 2333, 36361, 646, 294, 19952, 51852], "temperature": + 0.0, "avg_logprob": -0.2670934511267621, "compression_ratio": 1.4979253112033195, + "no_speech_prob": 0.017736561596393585}, {"id": 29, "seek": 21240, "start": 212.4, + "end": 218.8, "text": " that ended up focusing on e-commerce search and to some + degree enterprise search in general.", "tokens": [50364, 300, 4590, 493, 8416, 322, + 308, 12, 26926, 3164, 293, 281, 512, 4314, 14132, 3164, 294, 2674, 13, 50684], "temperature": + 0.0, "avg_logprob": -0.23994242489992917, "compression_ratio": 1.5909090909090908, + "no_speech_prob": 0.00037589113344438374}, {"id": 30, "seek": 21240, "start": 218.8, + "end": 226.88, "text": " Site search has there for 10 years of the chief scientist. 
+ And then I went to Google where in fact,", "tokens": [50684, 34027, 3164, 575, 456, + 337, 1266, 924, 295, 264, 9588, 12662, 13, 400, 550, 286, 1437, 281, 3329, 689, + 294, 1186, 11, 51088], "temperature": 0.0, "avg_logprob": -0.23994242489992917, + "compression_ratio": 1.5909090909090908, "no_speech_prob": 0.00037589113344438374}, + {"id": 31, "seek": 21240, "start": 226.88, "end": 234.48000000000002, "text": " + I worked on search in local search part of the maps search team as a tech lead moved + ironically", "tokens": [51088, 286, 2732, 322, 3164, 294, 2654, 3164, 644, 295, + 264, 11317, 3164, 1469, 382, 257, 7553, 1477, 4259, 41082, 51468], "temperature": + 0.0, "avg_logprob": -0.23994242489992917, "compression_ratio": 1.5909090909090908, + "no_speech_prob": 0.00037589113344438374}, {"id": 32, "seek": 21240, "start": 234.48000000000002, + "end": 240.16, "text": " from the East Coast. I''ve been living in New York to now + to do to leave Google and join LinkedIn", "tokens": [51468, 490, 264, 6747, 14960, + 13, 286, 600, 668, 2647, 294, 1873, 3609, 281, 586, 281, 360, 281, 1856, 3329, 293, + 3917, 20657, 51752], "temperature": 0.0, "avg_logprob": -0.23994242489992917, "compression_ratio": + 1.5909090909090908, "no_speech_prob": 0.00037589113344438374}, {"id": 33, "seek": + 24016, "start": 240.88, "end": 245.84, "text": " where I first ran the product data + science team, but ended up coming back to my first", "tokens": [50400, 689, 286, + 700, 5872, 264, 1674, 1412, 3497, 1469, 11, 457, 4590, 493, 1348, 646, 281, 452, + 700, 50648], "temperature": 0.0, "avg_logprob": -0.1310109306784237, "compression_ratio": + 1.6327433628318584, "no_speech_prob": 0.0014622381422668695}, {"id": 34, "seek": + 24016, "start": 245.84, "end": 252.0, "text": " bulb of search. 
And it was at LinkedIn + where I started a query understanding team and shifted my", "tokens": [50648, 21122, + 295, 3164, 13, 400, 309, 390, 412, 20657, 689, 286, 1409, 257, 14581, 3701, 1469, + 293, 18892, 452, 50956], "temperature": 0.0, "avg_logprob": -0.1310109306784237, + "compression_ratio": 1.6327433628318584, "no_speech_prob": 0.0014622381422668695}, + {"id": 35, "seek": 24016, "start": 252.0, "end": 258.0, "text": " focus, which had + really been more around faceted search and interaction to query understanding.", + "tokens": [50956, 1879, 11, 597, 632, 534, 668, 544, 926, 1915, 10993, 3164, 293, + 9285, 281, 14581, 3701, 13, 51256], "temperature": 0.0, "avg_logprob": -0.1310109306784237, + "compression_ratio": 1.6327433628318584, "no_speech_prob": 0.0014622381422668695}, + {"id": 36, "seek": 24016, "start": 258.8, "end": 264.56, "text": " After leaving + LinkedIn, I decided to go off on my own and for the past six or seven years,", "tokens": + [51296, 2381, 5012, 20657, 11, 286, 3047, 281, 352, 766, 322, 452, 1065, 293, 337, + 264, 1791, 2309, 420, 3407, 924, 11, 51584], "temperature": 0.0, "avg_logprob": + -0.1310109306784237, "compression_ratio": 1.6327433628318584, "no_speech_prob": + 0.0014622381422668695}, {"id": 37, "seek": 26456, "start": 264.64, "end": 271.28000000000003, + "text": " I think what I like to call a high-class consultant trying to bring the + search to everybody who", "tokens": [50368, 286, 519, 437, 286, 411, 281, 818, 257, + 1090, 12, 11665, 24676, 1382, 281, 1565, 264, 3164, 281, 2201, 567, 50700], "temperature": + 0.0, "avg_logprob": -0.3030942551633145, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.001881706528365612}, {"id": 38, "seek": 26456, "start": 271.28000000000003, + "end": 278.24, "text": " needs it, which turns out to be a lot of people. 
And then + last year, I discovered the wonderful", "tokens": [50700, 2203, 309, 11, 597, 4523, + 484, 281, 312, 257, 688, 295, 561, 13, 400, 550, 1036, 1064, 11, 286, 6941, 264, + 3715, 51048], "temperature": 0.0, "avg_logprob": -0.3030942551633145, "compression_ratio": + 1.5555555555555556, "no_speech_prob": 0.001881706528365612}, {"id": 39, "seek": + 26456, "start": 278.24, "end": 285.2, "text": " folks at Co-Rise and started teaching + these classes with my friend and colleague, Raddey herself.", "tokens": [51048, + 4024, 412, 3066, 12, 49, 908, 293, 1409, 4571, 613, 5359, 365, 452, 1277, 293, 13532, + 11, 9654, 1479, 88, 7530, 13, 51396], "temperature": 0.0, "avg_logprob": -0.3030942551633145, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.001881706528365612}, + {"id": 40, "seek": 26456, "start": 286.32, "end": 291.44, "text": " Fantastic. And + I can add to that, the course being having been a student at your course.", "tokens": + [51452, 21320, 13, 400, 286, 393, 909, 281, 300, 11, 264, 1164, 885, 1419, 668, + 257, 3107, 412, 428, 1164, 13, 51708], "temperature": 0.0, "avg_logprob": -0.3030942551633145, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.001881706528365612}, + {"id": 41, "seek": 29144, "start": 292.08, "end": 298.08, "text": " Fantastic course, + I''ve learned a lot. And yeah, I''m a happy owner of this certificate as well.", + "tokens": [50396, 21320, 1164, 11, 286, 600, 3264, 257, 688, 13, 400, 1338, 11, + 286, 478, 257, 2055, 7289, 295, 341, 15953, 382, 731, 13, 50696], "temperature": + 0.0, "avg_logprob": -0.18993362008708797, "compression_ratio": 1.4314720812182742, + "no_speech_prob": 0.005266885738819838}, {"id": 42, "seek": 29144, "start": 298.08, + "end": 306.8, "text": " So I can prove to future job employers that I have passed + it. 
And actually, I watched one", "tokens": [50696, 407, 286, 393, 7081, 281, 2027, + 1691, 16744, 300, 286, 362, 4678, 309, 13, 400, 767, 11, 286, 6337, 472, 51132], + "temperature": 0.0, "avg_logprob": -0.18993362008708797, "compression_ratio": 1.4314720812182742, + "no_speech_prob": 0.005266885738819838}, {"id": 43, "seek": 29144, "start": 306.8, + "end": 313.6, "text": " presentation you gave at the CIO Summit 10 years ago. And + one key phrase that I took away from it", "tokens": [51132, 5860, 291, 2729, 412, + 264, 383, 15167, 28726, 1266, 924, 2057, 13, 400, 472, 2141, 9535, 300, 286, 1890, + 1314, 490, 309, 51472], "temperature": 0.0, "avg_logprob": -0.18993362008708797, + "compression_ratio": 1.4314720812182742, "no_speech_prob": 0.005266885738819838}, + {"id": 44, "seek": 31360, "start": 314.48, "end": 321.12, "text": " or suggestion, + you said science is the difference between instinct and strategy.", "tokens": [50408, + 420, 16541, 11, 291, 848, 3497, 307, 264, 2649, 1296, 16556, 293, 5206, 13, 50740], + "temperature": 0.0, "avg_logprob": -0.11567894617716472, "compression_ratio": 1.4689265536723164, + "no_speech_prob": 0.005901699420064688}, {"id": 45, "seek": 31360, "start": 322.0, + "end": 330.40000000000003, "text": " And I wanted to a little bit like ask you to + talk to the role of science in every day", "tokens": [50784, 400, 286, 1415, 281, + 257, 707, 857, 411, 1029, 291, 281, 751, 281, 264, 3090, 295, 3497, 294, 633, 786, + 51204], "temperature": 0.0, "avg_logprob": -0.11567894617716472, "compression_ratio": + 1.4689265536723164, "no_speech_prob": 0.005901699420064688}, {"id": 46, "seek": + 31360, "start": 331.6, "end": 336.8, "text": " search engine development and research. 
+ Do you continue to view it that way 10 years forward?", "tokens": [51264, 3164, + 2848, 3250, 293, 2132, 13, 1144, 291, 2354, 281, 1910, 309, 300, 636, 1266, 924, + 2128, 30, 51524], "temperature": 0.0, "avg_logprob": -0.11567894617716472, "compression_ratio": + 1.4689265536723164, "no_speech_prob": 0.005901699420064688}, {"id": 47, "seek": + 33680, "start": 337.76, "end": 342.32, "text": " I do, it''s funny because if you, + anybody who watches that, that is probably the only", "tokens": [50412, 286, 360, + 11, 309, 311, 4074, 570, 498, 291, 11, 4472, 567, 17062, 300, 11, 300, 307, 1391, + 264, 787, 50640], "temperature": 0.0, "avg_logprob": -0.21029680547579913, "compression_ratio": + 1.5055555555555555, "no_speech_prob": 0.004106548149138689}, {"id": 48, "seek": + 33680, "start": 342.96000000000004, "end": 351.6, "text": " recording of me wearing + a suit on any sort of video is the science strategy talk. And the", "tokens": [50672, + 6613, 295, 385, 4769, 257, 5722, 322, 604, 1333, 295, 960, 307, 264, 3497, 5206, + 751, 13, 400, 264, 51104], "temperature": 0.0, "avg_logprob": -0.21029680547579913, + "compression_ratio": 1.5055555555555555, "no_speech_prob": 0.004106548149138689}, + {"id": 49, "seek": 33680, "start": 353.2, "end": 360.48, "text": " when I was thinking + at the time, you know, as a data scientist, a big part of my job was getting", "tokens": + [51184, 562, 286, 390, 1953, 412, 264, 565, 11, 291, 458, 11, 382, 257, 1412, 12662, + 11, 257, 955, 644, 295, 452, 1691, 390, 1242, 51548], "temperature": 0.0, "avg_logprob": + -0.21029680547579913, "compression_ratio": 1.5055555555555555, "no_speech_prob": + 0.004106548149138689}, {"id": 50, "seek": 36048, "start": 360.48, "end": 367.68, + "text": " people to use the scientific method and there were whether that was a + A B testing or having,", "tokens": [50364, 561, 281, 764, 264, 8134, 3170, 293, + 456, 645, 1968, 300, 390, 257, 316, 363, 4997, 420, 1419, 11, 50724], "temperature": + 0.0, 
"avg_logprob": -0.16615703541745422, "compression_ratio": 1.609442060085837, + "no_speech_prob": 0.0005702024791389704}, {"id": 51, "seek": 36048, "start": 367.68, + "end": 374.96000000000004, "text": " you know, clear falseifiable hypotheses and + so forth. Now, it don''t get me wrong, instincts matter", "tokens": [50724, 291, + 458, 11, 1850, 7908, 30876, 49969, 293, 370, 5220, 13, 823, 11, 309, 500, 380, 483, + 385, 2085, 11, 38997, 1871, 51088], "temperature": 0.0, "avg_logprob": -0.16615703541745422, + "compression_ratio": 1.609442060085837, "no_speech_prob": 0.0005702024791389704}, + {"id": 52, "seek": 36048, "start": 374.96000000000004, "end": 383.20000000000005, + "text": " a lot. For example, if you go to a search engine and you''re not happy + with what you''re seeing,", "tokens": [51088, 257, 688, 13, 1171, 1365, 11, 498, + 291, 352, 281, 257, 3164, 2848, 293, 291, 434, 406, 2055, 365, 437, 291, 434, 2577, + 11, 51500], "temperature": 0.0, "avg_logprob": -0.16615703541745422, "compression_ratio": + 1.609442060085837, "no_speech_prob": 0.0005702024791389704}, {"id": 53, "seek": + 36048, "start": 383.20000000000005, "end": 390.0, "text": " your instincts are probably + right. There''s probably something wrong. But if you say, oh,", "tokens": [51500, + 428, 38997, 366, 1391, 558, 13, 821, 311, 1391, 746, 2085, 13, 583, 498, 291, 584, + 11, 1954, 11, 51840], "temperature": 0.0, "avg_logprob": -0.16615703541745422, "compression_ratio": + 1.609442060085837, "no_speech_prob": 0.0005702024791389704}, {"id": 54, "seek": + 39048, "start": 391.20000000000005, "end": 397.28000000000003, "text": " I''m not + seeing the results I like, I''m going to add a simple inventory. 
I''m going to turn + up one of", "tokens": [50400, 286, 478, 406, 2577, 264, 3542, 286, 411, 11, 286, + 478, 516, 281, 909, 257, 2199, 14228, 13, 286, 478, 516, 281, 1261, 493, 472, 295, + 50704], "temperature": 0.0, "avg_logprob": -0.19382472748452045, "compression_ratio": + 1.7232142857142858, "no_speech_prob": 0.0010777412680909038}, {"id": 55, "seek": + 39048, "start": 397.28000000000003, "end": 403.92, "text": " these knobs to see + what I get. Then don''t get me wrong, you''ll sometimes get improvements,", "tokens": + [50704, 613, 46999, 281, 536, 437, 286, 483, 13, 1396, 500, 380, 483, 385, 2085, + 11, 291, 603, 2171, 483, 13797, 11, 51036], "temperature": 0.0, "avg_logprob": -0.19382472748452045, + "compression_ratio": 1.7232142857142858, "no_speech_prob": 0.0010777412680909038}, + {"id": 56, "seek": 39048, "start": 403.92, "end": 410.96000000000004, "text": " + instincts are not useless. But you won''t have a way of being certain that you''re + getting improvements.", "tokens": [51036, 38997, 366, 406, 14115, 13, 583, 291, + 1582, 380, 362, 257, 636, 295, 885, 1629, 300, 291, 434, 1242, 13797, 13, 51388], + "temperature": 0.0, "avg_logprob": -0.19382472748452045, "compression_ratio": 1.7232142857142858, + "no_speech_prob": 0.0010777412680909038}, {"id": 57, "seek": 39048, "start": 410.96000000000004, + "end": 417.68, "text": " And you may sometimes get improvements that happen to work + in that particular moment at that", "tokens": [51388, 400, 291, 815, 2171, 483, + 13797, 300, 1051, 281, 589, 294, 300, 1729, 1623, 412, 300, 51724], "temperature": + 0.0, "avg_logprob": -0.19382472748452045, "compression_ratio": 1.7232142857142858, + "no_speech_prob": 0.0010777412680909038}, {"id": 58, "seek": 41768, "start": 417.68, + "end": 427.92, "text": " particular time, but which you can''t explain or sustain. 
+ And so science is about using techniques", "tokens": [50364, 1729, 565, 11, 457, + 597, 291, 393, 380, 2903, 420, 6769, 13, 400, 370, 3497, 307, 466, 1228, 7512, 50876], + "temperature": 0.0, "avg_logprob": -0.13091498407824287, "compression_ratio": 1.5809128630705394, + "no_speech_prob": 0.000538010848686099}, {"id": 59, "seek": 41768, "start": 427.92, + "end": 434.56, "text": " like with other people might call randomize control trials, + but we like to call A B tests. The science", "tokens": [50876, 411, 365, 661, 561, + 1062, 818, 4974, 1125, 1969, 12450, 11, 457, 321, 411, 281, 818, 316, 363, 6921, + 13, 440, 3497, 51208], "temperature": 0.0, "avg_logprob": -0.13091498407824287, + "compression_ratio": 1.5809128630705394, "no_speech_prob": 0.000538010848686099}, + {"id": 60, "seek": 41768, "start": 434.56, "end": 440.8, "text": " it poses a certain + amount of discipline and it keeps you honest, which I do think is the", "tokens": + [51208, 309, 26059, 257, 1629, 2372, 295, 13635, 293, 309, 5965, 291, 3245, 11, + 597, 286, 360, 519, 307, 264, 51520], "temperature": 0.0, "avg_logprob": -0.13091498407824287, + "compression_ratio": 1.5809128630705394, "no_speech_prob": 0.000538010848686099}, + {"id": 61, "seek": 41768, "start": 440.8, "end": 446.8, "text": " difference between + running on instincts that may or may not work and being able to pursue a", "tokens": + [51520, 2649, 1296, 2614, 322, 38997, 300, 815, 420, 815, 406, 589, 293, 885, 1075, + 281, 12392, 257, 51820], "temperature": 0.0, "avg_logprob": -0.13091498407824287, + "compression_ratio": 1.5809128630705394, "no_speech_prob": 0.000538010848686099}, + {"id": 62, "seek": 44680, "start": 446.8, "end": 454.0, "text": " strategy where + you not only can see whether or how things work, you can measure this as well", + "tokens": [50364, 5206, 689, 291, 406, 787, 393, 536, 1968, 420, 577, 721, 589, + 11, 291, 393, 3481, 341, 382, 731, 50724], "temperature": 0.0, "avg_logprob": -0.19737652073735776, + 
"compression_ratio": 1.5814977973568283, "no_speech_prob": 0.005458125378936529}, + {"id": 63, "seek": 44680, "start": 454.0, "end": 459.6, "text": " and repeat what + you do. So I still hold to that with regard to this.", "tokens": [50724, 293, 7149, + 437, 291, 360, 13, 407, 286, 920, 1797, 281, 300, 365, 3843, 281, 341, 13, 51004], + "temperature": 0.0, "avg_logprob": -0.19737652073735776, "compression_ratio": 1.5814977973568283, + "no_speech_prob": 0.005458125378936529}, {"id": 64, "seek": 44680, "start": 460.32, + "end": 465.84000000000003, "text": " Yeah, this is fantastic. And I highly recommend + also to watch that video, even though it was for", "tokens": [51040, 865, 11, 341, + 307, 5456, 13, 400, 286, 5405, 2748, 611, 281, 1159, 300, 960, 11, 754, 1673, 309, + 390, 337, 51316], "temperature": 0.0, "avg_logprob": -0.19737652073735776, "compression_ratio": + 1.5814977973568283, "no_speech_prob": 0.005458125378936529}, {"id": 65, "seek": + 44680, "start": 465.84000000000003, "end": 473.44, "text": " high top executives, + there is a lot of logical elements to it that you can apply in day-to-day work.", + "tokens": [51316, 1090, 1192, 28485, 11, 456, 307, 257, 688, 295, 14978, 4959, 281, + 309, 300, 291, 393, 3079, 294, 786, 12, 1353, 12, 810, 589, 13, 51696], "temperature": + 0.0, "avg_logprob": -0.19737652073735776, "compression_ratio": 1.5814977973568283, + "no_speech_prob": 0.005458125378936529}, {"id": 66, "seek": 47344, "start": 473.92, + "end": 481.84, "text": " And yeah, I remember also one quote from the book called + How Google Works that if we argue and", "tokens": [50388, 400, 1338, 11, 286, 1604, + 611, 472, 6513, 490, 264, 1446, 1219, 1012, 3329, 27914, 300, 498, 321, 9695, 293, + 50784], "temperature": 0.0, "avg_logprob": -0.16534114333818545, "compression_ratio": + 1.6296296296296295, "no_speech_prob": 0.010980566963553429}, {"id": 67, "seek": + 47344, "start": 482.56, "end": 488.72, "text": " we have data, let''s look at that + data. 
But if we go by opinions, let''s go with mine. And it was", "tokens": [50820, + 321, 362, 1412, 11, 718, 311, 574, 412, 300, 1412, 13, 583, 498, 321, 352, 538, + 11819, 11, 718, 311, 352, 365, 3892, 13, 400, 309, 390, 51128], "temperature": 0.0, + "avg_logprob": -0.16534114333818545, "compression_ratio": 1.6296296296296295, "no_speech_prob": + 0.010980566963553429}, {"id": 68, "seek": 47344, "start": 488.72, "end": 495.68, + "text": " written by Vice President of that area. So basically, he''s a hippo or + sort of top that letter. So why not", "tokens": [51128, 3720, 538, 13276, 3117, + 295, 300, 1859, 13, 407, 1936, 11, 415, 311, 257, 27745, 78, 420, 1333, 295, 1192, + 300, 5063, 13, 407, 983, 406, 51476], "temperature": 0.0, "avg_logprob": -0.16534114333818545, + "compression_ratio": 1.6296296296296295, "no_speech_prob": 0.010980566963553429}, + {"id": 69, "seek": 47344, "start": 495.68, "end": 501.76, "text": " why not actually + follow the hierarchy there? But yeah, I agree that if you have data, look at it + if", "tokens": [51476, 983, 406, 767, 1524, 264, 22333, 456, 30, 583, 1338, 11, + 286, 3986, 300, 498, 291, 362, 1412, 11, 574, 412, 309, 498, 51780], "temperature": + 0.0, "avg_logprob": -0.16534114333818545, "compression_ratio": 1.6296296296296295, + "no_speech_prob": 0.010980566963553429}, {"id": 70, "seek": 50176, "start": 501.76, + "end": 511.12, "text": " you don''t try to collect it. Yeah. 
Yeah, I mean, indeed, + I mean, data is what is the equalizer,", "tokens": [50364, 291, 500, 380, 853, 281, + 2500, 309, 13, 865, 13, 865, 11, 286, 914, 11, 6451, 11, 286, 914, 11, 1412, 307, + 437, 307, 264, 2681, 6545, 11, 50832], "temperature": 0.0, "avg_logprob": -0.16739084302764579, + "compression_ratio": 1.5429864253393666, "no_speech_prob": 0.000536280800588429}, + {"id": 71, "seek": 50176, "start": 511.12, "end": 516.24, "text": " but it''s for + those of us who are not CEOs, it''s how we get things done.", "tokens": [50832, + 457, 309, 311, 337, 729, 295, 505, 567, 366, 406, 40736, 11, 309, 311, 577, 321, + 483, 721, 1096, 13, 51088], "temperature": 0.0, "avg_logprob": -0.16739084302764579, + "compression_ratio": 1.5429864253393666, "no_speech_prob": 0.000536280800588429}, + {"id": 72, "seek": 50176, "start": 516.96, "end": 522.8, "text": " Yeah, absolutely. + By the way, I wanted to also say a couple of words on logistics. Please", "tokens": + [51124, 865, 11, 3122, 13, 3146, 264, 636, 11, 286, 1415, 281, 611, 584, 257, 1916, + 295, 2283, 322, 27420, 13, 2555, 51416], "temperature": 0.0, "avg_logprob": -0.16739084302764579, + "compression_ratio": 1.5429864253393666, "no_speech_prob": 0.000536280800588429}, + {"id": 73, "seek": 50176, "start": 522.8, "end": 528.0, "text": " send your questions + on the chat and we will handle them in the end of this session.", "tokens": [51416, + 2845, 428, 1651, 322, 264, 5081, 293, 321, 486, 4813, 552, 294, 264, 917, 295, 341, + 5481, 13, 51676], "temperature": 0.0, "avg_logprob": -0.16739084302764579, "compression_ratio": + 1.5429864253393666, "no_speech_prob": 0.000536280800588429}, {"id": 74, "seek": + 52800, "start": 528.24, "end": 537.44, "text": " Yeah, and 10 years forward, I''ve + read your message on LinkedIn where you said a little bit on sad", "tokens": [50376, + 865, 11, 293, 1266, 924, 2128, 11, 286, 600, 1401, 428, 3636, 322, 20657, 689, 291, + 848, 257, 707, 857, 322, 4227, 50836], "temperature": 0.0, 
"avg_logprob": -0.10977302802788032, + "compression_ratio": 1.5201612903225807, "no_speech_prob": 0.013968931511044502}, + {"id": 75, "seek": 52800, "start": 537.44, "end": 545.44, "text": " tone, not everyone + shares my passion for search. But I suspect that many would be more excited", "tokens": + [50836, 8027, 11, 406, 1518, 12182, 452, 5418, 337, 3164, 13, 583, 286, 9091, 300, + 867, 576, 312, 544, 2919, 51236], "temperature": 0.0, "avg_logprob": -0.10977302802788032, + "compression_ratio": 1.5201612903225807, "no_speech_prob": 0.013968931511044502}, + {"id": 76, "seek": 52800, "start": 545.44, "end": 551.04, "text": " about search + if they understood it better. Was it just a moment of despair or was it a moment", + "tokens": [51236, 466, 3164, 498, 436, 7320, 309, 1101, 13, 3027, 309, 445, 257, + 1623, 295, 25763, 420, 390, 309, 257, 1623, 51516], "temperature": 0.0, "avg_logprob": + -0.10977302802788032, "compression_ratio": 1.5201612903225807, "no_speech_prob": + 0.013968931511044502}, {"id": 77, "seek": 52800, "start": 551.04, "end": 556.8, + "text": " that you thought, okay, I need to approach it differently. I can keep + blogging about query", "tokens": [51516, 300, 291, 1194, 11, 1392, 11, 286, 643, + 281, 3109, 309, 7614, 13, 286, 393, 1066, 6968, 3249, 466, 14581, 51804], "temperature": + 0.0, "avg_logprob": -0.10977302802788032, "compression_ratio": 1.5201612903225807, + "no_speech_prob": 0.013968931511044502}, {"id": 78, "seek": 55680, "start": 556.8, + "end": 563.28, "text": " understanding and content understanding, but how can I + actually open the doors to the minds", "tokens": [50364, 3701, 293, 2701, 3701, + 11, 457, 577, 393, 286, 767, 1269, 264, 8077, 281, 264, 9634, 50688], "temperature": + 0.0, "avg_logprob": -0.19784293636198966, "compression_ratio": 1.625, "no_speech_prob": + 0.002573759062215686}, {"id": 79, "seek": 55680, "start": 564.16, "end": 570.4, + "text": " of new people, potentially students in this field? 
What was going through + your mind when you wrote", "tokens": [50732, 295, 777, 561, 11, 7263, 1731, 294, + 341, 2519, 30, 708, 390, 516, 807, 428, 1575, 562, 291, 4114, 51044], "temperature": + 0.0, "avg_logprob": -0.19784293636198966, "compression_ratio": 1.625, "no_speech_prob": + 0.002573759062215686}, {"id": 80, "seek": 55680, "start": 570.4, "end": 579.1999999999999, + "text": " that? So I''m an off to this. So I''m, you know, if two years of a pandemic + and now the global crisis", "tokens": [51044, 300, 30, 407, 286, 478, 364, 766, + 281, 341, 13, 407, 286, 478, 11, 291, 458, 11, 498, 732, 924, 295, 257, 5388, 293, + 586, 264, 4338, 5869, 51484], "temperature": 0.0, "avg_logprob": -0.19784293636198966, + "compression_ratio": 1.625, "no_speech_prob": 0.002573759062215686}, {"id": 81, + "seek": 55680, "start": 579.1999999999999, "end": 584.0799999999999, "text": " can + get me out, I''m certainly not going to despair just because not enough people are + excited about", "tokens": [51484, 393, 483, 385, 484, 11, 286, 478, 3297, 406, 516, + 281, 25763, 445, 570, 406, 1547, 561, 366, 2919, 466, 51728], "temperature": 0.0, + "avg_logprob": -0.19784293636198966, "compression_ratio": 1.625, "no_speech_prob": + 0.002573759062215686}, {"id": 82, "seek": 58408, "start": 584.1600000000001, "end": + 591.9200000000001, "text": " search. 
But I have seen that, you know, our technology + industry tends to have certain kinds of", "tokens": [50368, 3164, 13, 583, 286, + 362, 1612, 300, 11, 291, 458, 11, 527, 2899, 3518, 12258, 281, 362, 1629, 3685, + 295, 50756], "temperature": 0.0, "avg_logprob": -0.22533723831176758, "compression_ratio": + 1.6160337552742616, "no_speech_prob": 0.0027991170063614845}, {"id": 83, "seek": + 58408, "start": 591.9200000000001, "end": 597.6800000000001, "text": " fads and + say, in fact, back in the 90s, everybody was excited about search for those of you + all", "tokens": [50756, 283, 5834, 293, 584, 11, 294, 1186, 11, 646, 294, 264, 4289, + 82, 11, 2201, 390, 2919, 466, 3164, 337, 729, 295, 291, 439, 51044], "temperature": + 0.0, "avg_logprob": -0.22533723831176758, "compression_ratio": 1.6160337552742616, + "no_speech_prob": 0.0027991170063614845}, {"id": 84, "seek": 58408, "start": 597.6800000000001, + "end": 602.64, "text": " to have to remember, Google was not the first major search + engine that we''re using all of this,", "tokens": [51044, 281, 362, 281, 1604, 11, + 3329, 390, 406, 264, 700, 2563, 3164, 2848, 300, 321, 434, 1228, 439, 295, 341, + 11, 51292], "temperature": 0.0, "avg_logprob": -0.22533723831176758, "compression_ratio": + 1.6160337552742616, "no_speech_prob": 0.0027991170063614845}, {"id": 85, "seek": + 58408, "start": 602.64, "end": 608.72, "text": " that we''re using, gahoo and so + forth. And then after Google took on the scene, many people said,", "tokens": [51292, + 300, 321, 434, 1228, 11, 290, 545, 1986, 293, 370, 5220, 13, 400, 550, 934, 3329, + 1890, 322, 264, 4145, 11, 867, 561, 848, 11, 51596], "temperature": 0.0, "avg_logprob": + -0.22533723831176758, "compression_ratio": 1.6160337552742616, "no_speech_prob": + 0.0027991170063614845}, {"id": 86, "seek": 60872, "start": 608.72, "end": 614.88, + "text": " oh, search is done. 
Now, I happen to not be one of those people because + I was at a startup,", "tokens": [50364, 1954, 11, 3164, 307, 1096, 13, 823, 11, + 286, 1051, 281, 406, 312, 472, 295, 729, 561, 570, 286, 390, 412, 257, 18578, 11, + 50672], "temperature": 0.0, "avg_logprob": -0.17417240944229254, "compression_ratio": + 1.6341463414634145, "no_speech_prob": 0.0014558032853528857}, {"id": 87, "seek": + 60872, "start": 614.88, "end": 620.64, "text": " which actually also started in + 1999 working on search. And they said, no, search isn''t done at all.", "tokens": + [50672, 597, 767, 611, 1409, 294, 19952, 1364, 322, 3164, 13, 400, 436, 848, 11, + 572, 11, 3164, 1943, 380, 1096, 412, 439, 13, 50960], "temperature": 0.0, "avg_logprob": + -0.17417240944229254, "compression_ratio": 1.6341463414634145, "no_speech_prob": + 0.0014558032853528857}, {"id": 88, "seek": 60872, "start": 620.64, "end": 626.08, + "text": " I mean, we were trying to help e-commerce companies and we saw that there''s + a lot to do on search.", "tokens": [50960, 286, 914, 11, 321, 645, 1382, 281, 854, + 308, 12, 26926, 3431, 293, 321, 1866, 300, 456, 311, 257, 688, 281, 360, 322, 3164, + 13, 51232], "temperature": 0.0, "avg_logprob": -0.17417240944229254, "compression_ratio": + 1.6341463414634145, "no_speech_prob": 0.0014558032853528857}, {"id": 89, "seek": + 60872, "start": 626.08, "end": 632.1600000000001, "text": " Now you might think + that 20 years later, search would finally be done. 
But interestingly,", "tokens": + [51232, 823, 291, 1062, 519, 300, 945, 924, 1780, 11, 3164, 576, 2721, 312, 1096, + 13, 583, 25873, 11, 51536], "temperature": 0.0, "avg_logprob": -0.17417240944229254, + "compression_ratio": 1.6341463414634145, "no_speech_prob": 0.0014558032853528857}, + {"id": 90, "seek": 60872, "start": 633.0400000000001, "end": 638.0, "text": " there + are still so many opportunities, in fact, using some of the latest developments + in", "tokens": [51580, 456, 366, 920, 370, 867, 4786, 11, 294, 1186, 11, 1228, 512, + 295, 264, 6792, 20862, 294, 51828], "temperature": 0.0, "avg_logprob": -0.17417240944229254, + "compression_ratio": 1.6341463414634145, "no_speech_prob": 0.0014558032853528857}, + {"id": 91, "seek": 63800, "start": 638.0, "end": 646.56, "text": " machine learning + to do so. But what I''ve seen is that people don''t necessarily gravitate to", "tokens": + [50364, 3479, 2539, 281, 360, 370, 13, 583, 437, 286, 600, 1612, 307, 300, 561, + 500, 380, 4725, 7427, 8086, 281, 50792], "temperature": 0.0, "avg_logprob": -0.16482418141466507, + "compression_ratio": 1.652542372881356, "no_speech_prob": 0.00015165680088102818}, + {"id": 92, "seek": 63800, "start": 646.56, "end": 653.2, "text": " search as an + exciting problem. 
They''re excited about voice recognition about what they perceive + as", "tokens": [50792, 3164, 382, 364, 4670, 1154, 13, 814, 434, 2919, 466, 3177, + 11150, 466, 437, 436, 20281, 382, 51124], "temperature": 0.0, "avg_logprob": -0.16482418141466507, + "compression_ratio": 1.652542372881356, "no_speech_prob": 0.00015165680088102818}, + {"id": 93, "seek": 63800, "start": 653.2, "end": 658.0, "text": " AI in general, + which they may see as question-answered, which at the heart of it has lots to do + with", "tokens": [51124, 7318, 294, 2674, 11, 597, 436, 815, 536, 382, 1168, 12, + 43904, 292, 11, 597, 412, 264, 1917, 295, 309, 575, 3195, 281, 360, 365, 51364], + "temperature": 0.0, "avg_logprob": -0.16482418141466507, "compression_ratio": 1.652542372881356, + "no_speech_prob": 0.00015165680088102818}, {"id": 94, "seek": 63800, "start": 658.0, + "end": 664.48, "text": " search as well. But they don''t realize that, you know, + that humble little search box in which they", "tokens": [51364, 3164, 382, 731, + 13, 583, 436, 500, 380, 4325, 300, 11, 291, 458, 11, 300, 16735, 707, 3164, 2424, + 294, 597, 436, 51688], "temperature": 0.0, "avg_logprob": -0.16482418141466507, + "compression_ratio": 1.652542372881356, "no_speech_prob": 0.00015165680088102818}, + {"id": 95, "seek": 66448, "start": 664.64, "end": 670.72, "text": " are typing and + everything going on between it is still an extremely exciting area of development.", + "tokens": [50372, 366, 18444, 293, 1203, 516, 322, 1296, 309, 307, 920, 364, 4664, + 4670, 1859, 295, 3250, 13, 50676], "temperature": 0.0, "avg_logprob": -0.16641165899193805, + "compression_ratio": 1.6049382716049383, "no_speech_prob": 0.0017511165933683515}, + {"id": 96, "seek": 66448, "start": 670.72, "end": 676.88, "text": " I think it''s + because it does look so simple that they don''t imagine what can you do? 
Change + the", "tokens": [50676, 286, 519, 309, 311, 570, 309, 775, 574, 370, 2199, 300, + 436, 500, 380, 3811, 437, 393, 291, 360, 30, 15060, 264, 50984], "temperature": + 0.0, "avg_logprob": -0.16641165899193805, "compression_ratio": 1.6049382716049383, + "no_speech_prob": 0.0017511165933683515}, {"id": 97, "seek": 66448, "start": 676.88, + "end": 682.08, "text": " size of the search box, you know, change the font of what''s + actually going on between. You''ll be", "tokens": [50984, 2744, 295, 264, 3164, + 2424, 11, 291, 458, 11, 1319, 264, 10703, 295, 437, 311, 767, 516, 322, 1296, 13, + 509, 603, 312, 51244], "temperature": 0.0, "avg_logprob": -0.16641165899193805, + "compression_ratio": 1.6049382716049383, "no_speech_prob": 0.0017511165933683515}, + {"id": 98, "seek": 66448, "start": 682.08, "end": 690.5600000000001, "text": " behind + that. So my hope is that when people see the complexity involved and the way in + which search", "tokens": [51244, 2261, 300, 13, 407, 452, 1454, 307, 300, 562, 561, + 536, 264, 14024, 3288, 293, 264, 636, 294, 597, 3164, 51668], "temperature": 0.0, + "avg_logprob": -0.16641165899193805, "compression_ratio": 1.6049382716049383, "no_speech_prob": + 0.0017511165933683515}, {"id": 99, "seek": 69056, "start": 690.64, "end": 697.28, + "text": " is amenable to the very techniques that they are excited about, they''ll + then say, oh wow,", "tokens": [50368, 307, 18497, 712, 281, 264, 588, 7512, 300, + 436, 366, 2919, 466, 11, 436, 603, 550, 584, 11, 1954, 6076, 11, 50700], "temperature": + 0.0, "avg_logprob": -0.20883768847864917, "compression_ratio": 1.6258741258741258, + "no_speech_prob": 0.016233112663030624}, {"id": 100, "seek": 69056, "start": 697.28, + "end": 701.5999999999999, "text": " this is great. 
And then on the top of everything + else, it''s a place where I can have a huge", "tokens": [50700, 341, 307, 869, 13, + 400, 550, 322, 264, 1192, 295, 1203, 1646, 11, 309, 311, 257, 1081, 689, 286, 393, + 362, 257, 2603, 50916], "temperature": 0.0, "avg_logprob": -0.20883768847864917, + "compression_ratio": 1.6258741258741258, "no_speech_prob": 0.016233112663030624}, + {"id": 101, "seek": 69056, "start": 701.5999999999999, "end": 708.16, "text": " + about impact, a netable impact on the way that people interact in machines. So I + know, no despair,", "tokens": [50916, 466, 2712, 11, 257, 2533, 712, 2712, 322, + 264, 636, 300, 561, 4648, 294, 8379, 13, 407, 286, 458, 11, 572, 25763, 11, 51244], + "temperature": 0.0, "avg_logprob": -0.20883768847864917, "compression_ratio": 1.6258741258741258, + "no_speech_prob": 0.016233112663030624}, {"id": 102, "seek": 69056, "start": 708.16, + "end": 712.4, "text": " just maybe sometimes a little bit of sadness that people + don''t share my excitement enough.", "tokens": [51244, 445, 1310, 2171, 257, 707, + 857, 295, 22462, 300, 561, 500, 380, 2073, 452, 14755, 1547, 13, 51456], "temperature": + 0.0, "avg_logprob": -0.20883768847864917, "compression_ratio": 1.6258741258741258, + "no_speech_prob": 0.016233112663030624}, {"id": 103, "seek": 69056, "start": 713.04, + "end": 718.56, "text": " Yeah, absolutely. I mean, search is a fantastic field if + you''re not there, consider entering,", "tokens": [51488, 865, 11, 3122, 13, 286, + 914, 11, 3164, 307, 257, 5456, 2519, 498, 291, 434, 406, 456, 11, 1949, 11104, 11, + 51764], "temperature": 0.0, "avg_logprob": -0.20883768847864917, "compression_ratio": + 1.6258741258741258, "no_speech_prob": 0.016233112663030624}, {"id": 104, "seek": + 71856, "start": 718.56, "end": 725.8399999999999, "text": " or at least studying + and evaluating, but it''s very deep. 
I remember actually myself like 20", "tokens": + [50364, 420, 412, 1935, 7601, 293, 27479, 11, 457, 309, 311, 588, 2452, 13, 286, + 1604, 767, 2059, 411, 945, 50728], "temperature": 0.0, "avg_logprob": -0.13599475388674392, + "compression_ratio": 1.5418326693227091, "no_speech_prob": 0.003103128634393215}, + {"id": 105, "seek": 71856, "start": 725.8399999999999, "end": 732.88, "text": " + years ago, still in the university, I was asking a friend of mine, how do the search + engine works?", "tokens": [50728, 924, 2057, 11, 920, 294, 264, 5454, 11, 286, 390, + 3365, 257, 1277, 295, 3892, 11, 577, 360, 264, 3164, 2848, 1985, 30, 51080], "temperature": + 0.0, "avg_logprob": -0.13599475388674392, "compression_ratio": 1.5418326693227091, + "no_speech_prob": 0.003103128634393215}, {"id": 106, "seek": 71856, "start": 732.88, + "end": 739.92, "text": " And he was majoring in information retrieval back then. + I knew nothing about the field. And he said,", "tokens": [51080, 400, 415, 390, + 2563, 278, 294, 1589, 19817, 3337, 646, 550, 13, 286, 2586, 1825, 466, 264, 2519, + 13, 400, 415, 848, 11, 51432], "temperature": 0.0, "avg_logprob": -0.13599475388674392, + "compression_ratio": 1.5418326693227091, "no_speech_prob": 0.003103128634393215}, + {"id": 107, "seek": 71856, "start": 739.92, "end": 745.04, "text": " well, we we + use inverted index. That''s how we represent the documents. But then I was still + not", "tokens": [51432, 731, 11, 321, 321, 764, 38969, 8186, 13, 663, 311, 577, + 321, 2906, 264, 8512, 13, 583, 550, 286, 390, 920, 406, 51688], "temperature": 0.0, + "avg_logprob": -0.13599475388674392, "compression_ratio": 1.5418326693227091, "no_speech_prob": + 0.003103128634393215}, {"id": 108, "seek": 74504, "start": 745.04, "end": 750.7199999999999, + "text": " satisfied. 
I asked him, hey, how can actually search engine know what + I want to find when I don''t", "tokens": [50364, 11239, 13, 286, 2351, 796, 11, + 4177, 11, 577, 393, 767, 3164, 2848, 458, 437, 286, 528, 281, 915, 562, 286, 500, + 380, 50648], "temperature": 0.0, "avg_logprob": -0.10626036621803461, "compression_ratio": + 1.6870748299319729, "no_speech_prob": 0.012129832990467548}, {"id": 109, "seek": + 74504, "start": 750.7199999999999, "end": 755.76, "text": " know myself? Like, if + I start typing something in the keyword box, it''s like a chicken act problem.", + "tokens": [50648, 458, 2059, 30, 1743, 11, 498, 286, 722, 18444, 746, 294, 264, + 20428, 2424, 11, 309, 311, 411, 257, 4662, 605, 1154, 13, 50900], "temperature": + 0.0, "avg_logprob": -0.10626036621803461, "compression_ratio": 1.6870748299319729, + "no_speech_prob": 0.012129832990467548}, {"id": 110, "seek": 74504, "start": 755.76, + "end": 761.4399999999999, "text": " It means that I know something already of the + subject, right? But what if I know nothing about it?", "tokens": [50900, 467, 1355, + 300, 286, 458, 746, 1217, 295, 264, 3983, 11, 558, 30, 583, 437, 498, 286, 458, + 1825, 466, 309, 30, 51184], "temperature": 0.0, "avg_logprob": -0.10626036621803461, + "compression_ratio": 1.6870748299319729, "no_speech_prob": 0.012129832990467548}, + {"id": 111, "seek": 74504, "start": 761.4399999999999, "end": 767.5999999999999, + "text": " And so in my mind, I started hypothesizing that maybe we can build a search + engine, which will kind", "tokens": [51184, 400, 370, 294, 452, 1575, 11, 286, 1409, + 14276, 3319, 300, 1310, 321, 393, 1322, 257, 3164, 2848, 11, 597, 486, 733, 51492], + "temperature": 0.0, "avg_logprob": -0.10626036621803461, "compression_ratio": 1.6870748299319729, + "no_speech_prob": 0.012129832990467548}, {"id": 112, "seek": 74504, "start": 767.5999999999999, + "end": 772.24, "text": " of refine the query. 
I didn''t know how to do it, but I + was just, you know, thinking in my mind that", "tokens": [51492, 295, 33906, 264, + 14581, 13, 286, 994, 380, 458, 577, 281, 360, 309, 11, 457, 286, 390, 445, 11, 291, + 458, 11, 1953, 294, 452, 1575, 300, 51724], "temperature": 0.0, "avg_logprob": -0.10626036621803461, + "compression_ratio": 1.6870748299319729, "no_speech_prob": 0.012129832990467548}, + {"id": 113, "seek": 77224, "start": 772.24, "end": 778.72, "text": " it''s possible. + And now so many years fast forward, we apply machine learning to search. And what + I", "tokens": [50364, 309, 311, 1944, 13, 400, 586, 370, 867, 924, 2370, 2128, 11, + 321, 3079, 3479, 2539, 281, 3164, 13, 400, 437, 286, 50688], "temperature": 0.0, + "avg_logprob": -0.13511588838365343, "compression_ratio": 1.6596638655462186, "no_speech_prob": + 0.0032055301126092672}, {"id": 114, "seek": 77224, "start": 778.72, "end": 785.52, + "text": " wanted to ask you, why do you think we need machine learning in search + today? Like, there are other", "tokens": [50688, 1415, 281, 1029, 291, 11, 983, + 360, 291, 519, 321, 643, 3479, 2539, 294, 3164, 965, 30, 1743, 11, 456, 366, 661, + 51028], "temperature": 0.0, "avg_logprob": -0.13511588838365343, "compression_ratio": + 1.6596638655462186, "no_speech_prob": 0.0032055301126092672}, {"id": 115, "seek": + 77224, "start": 785.52, "end": 793.44, "text": " ways to satisfy the user intent + on sort of calculated, understand it. Then there are other things", "tokens": [51028, + 2098, 281, 19319, 264, 4195, 8446, 322, 1333, 295, 15598, 11, 1223, 309, 13, 1396, + 456, 366, 661, 721, 51424], "temperature": 0.0, "avg_logprob": -0.13511588838365343, + "compression_ratio": 1.6596638655462186, "no_speech_prob": 0.0032055301126092672}, + {"id": 116, "seek": 77224, "start": 793.44, "end": 798.72, "text": " like established + techniques with manual boosting and manual features that you can curate. 
And many", + "tokens": [51424, 411, 7545, 7512, 365, 9688, 43117, 293, 9688, 4122, 300, 291, + 393, 1262, 473, 13, 400, 867, 51688], "temperature": 0.0, "avg_logprob": -0.13511588838365343, + "compression_ratio": 1.6596638655462186, "no_speech_prob": 0.0032055301126092672}, + {"id": 117, "seek": 79872, "start": 798.72, "end": 804.8000000000001, "text": " + companies, I think, still do it. But like, what''s your take on machine learning + role in search today?", "tokens": [50364, 3431, 11, 286, 519, 11, 920, 360, 309, + 13, 583, 411, 11, 437, 311, 428, 747, 322, 3479, 2539, 3090, 294, 3164, 965, 30, + 50668], "temperature": 0.0, "avg_logprob": -0.15857026953446238, "compression_ratio": + 1.6440677966101696, "no_speech_prob": 0.005017874296754599}, {"id": 118, "seek": + 79872, "start": 805.9200000000001, "end": 813.6800000000001, "text": " So certainly + when I was working on search back in 1999, I didn''t give a no much machine learning.", + "tokens": [50724, 407, 3297, 562, 286, 390, 1364, 322, 3164, 646, 294, 19952, 11, + 286, 994, 380, 976, 257, 572, 709, 3479, 2539, 13, 51112], "temperature": 0.0, "avg_logprob": + -0.15857026953446238, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.005017874296754599}, {"id": 119, "seek": 79872, "start": 813.6800000000001, "end": + 820.48, "text": " I take a class that''s highly theoretical, but I managed to help + build search so clearly it''s", "tokens": [51112, 286, 747, 257, 1508, 300, 311, + 5405, 20864, 11, 457, 286, 6453, 281, 854, 1322, 3164, 370, 4448, 309, 311, 51452], + "temperature": 0.0, "avg_logprob": -0.15857026953446238, "compression_ratio": 1.6440677966101696, + "no_speech_prob": 0.005017874296754599}, {"id": 120, "seek": 79872, "start": 820.48, + "end": 825.44, "text": " possible to do it without machine learning. 
And as you + said, many people still are working with", "tokens": [51452, 1944, 281, 360, 309, + 1553, 3479, 2539, 13, 400, 382, 291, 848, 11, 867, 561, 920, 366, 1364, 365, 51700], + "temperature": 0.0, "avg_logprob": -0.15857026953446238, "compression_ratio": 1.6440677966101696, + "no_speech_prob": 0.005017874296754599}, {"id": 121, "seek": 82544, "start": 825.44, + "end": 832.0, "text": " completely hand-to-systems. I think that machine learning + plays a few roles though in,", "tokens": [50364, 2584, 1011, 12, 1353, 12, 28215, + 82, 13, 286, 519, 300, 3479, 2539, 5749, 257, 1326, 9604, 1673, 294, 11, 50692], + "temperature": 0.0, "avg_logprob": -0.23150289176714303, "compression_ratio": 1.6211453744493391, + "no_speech_prob": 0.0003461708838585764}, {"id": 122, "seek": 82544, "start": 832.8800000000001, + "end": 836.48, "text": " I think what you could say is modernize in search, but + what we''re making it do things you really", "tokens": [50736, 286, 519, 437, 291, + 727, 584, 307, 4363, 1125, 294, 3164, 11, 457, 437, 321, 434, 1455, 309, 360, 721, + 291, 534, 50916], "temperature": 0.0, "avg_logprob": -0.23150289176714303, "compression_ratio": + 1.6211453744493391, "no_speech_prob": 0.0003461708838585764}, {"id": 123, "seek": + 82544, "start": 836.48, "end": 844.0, "text": " couldn''t do before. 
So for one + thing, when you''re doing all of these hand-to-tune boosts,", "tokens": [50916, + 2809, 380, 360, 949, 13, 407, 337, 472, 551, 11, 562, 291, 434, 884, 439, 295, 613, + 1011, 12, 1353, 12, 83, 2613, 9194, 82, 11, 51292], "temperature": 0.0, "avg_logprob": + -0.23150289176714303, "compression_ratio": 1.6211453744493391, "no_speech_prob": + 0.0003461708838585764}, {"id": 124, "seek": 82544, "start": 844.0, "end": 849.36, + "text": " right, which you''re typically saying, oh, I''m going to have a bunch + of factors that affect me.", "tokens": [51292, 558, 11, 597, 291, 434, 5850, 1566, + 11, 1954, 11, 286, 478, 516, 281, 362, 257, 3840, 295, 6771, 300, 3345, 385, 13, + 51560], "temperature": 0.0, "avg_logprob": -0.23150289176714303, "compression_ratio": + 1.6211453744493391, "no_speech_prob": 0.0003461708838585764}, {"id": 125, "seek": + 84936, "start": 849.44, "end": 855.44, "text": " I''ll change the weights on those. + I''ll see what can improve. Effectively, you''re performing", "tokens": [50368, + 286, 603, 1319, 264, 17443, 322, 729, 13, 286, 603, 536, 437, 393, 3470, 13, 17764, + 3413, 11, 291, 434, 10205, 50668], "temperature": 0.0, "avg_logprob": -0.10475630455828727, + "compression_ratio": 1.7285067873303168, "no_speech_prob": 0.0015364597784355283}, + {"id": 126, "seek": 84936, "start": 855.44, "end": 860.5600000000001, "text": " + an optimization problem, but you''re doing it by hand, or you''re saying, let me + go a little bit", "tokens": [50668, 364, 19618, 1154, 11, 457, 291, 434, 884, 309, + 538, 1011, 11, 420, 291, 434, 1566, 11, 718, 385, 352, 257, 707, 857, 50924], "temperature": + 0.0, "avg_logprob": -0.10475630455828727, "compression_ratio": 1.7285067873303168, + "no_speech_prob": 0.0015364597784355283}, {"id": 127, "seek": 84936, "start": 860.5600000000001, + "end": 866.88, "text": " in this direction, a little bit in that direction, let + me see what it can do. 
Well, the main technique", "tokens": [50924, 294, 341, 3513, + 11, 257, 707, 857, 294, 300, 3513, 11, 718, 385, 536, 437, 309, 393, 360, 13, 1042, + 11, 264, 2135, 6532, 51240], "temperature": 0.0, "avg_logprob": -0.10475630455828727, + "compression_ratio": 1.7285067873303168, "no_speech_prob": 0.0015364597784355283}, + {"id": 128, "seek": 84936, "start": 866.88, "end": 876.64, "text": " that machine + learning does is optimization only that it does so by formalizing the objective", + "tokens": [51240, 300, 3479, 2539, 775, 307, 19618, 787, 300, 309, 775, 370, 538, + 9860, 3319, 264, 10024, 51728], "temperature": 0.0, "avg_logprob": -0.10475630455828727, + "compression_ratio": 1.7285067873303168, "no_speech_prob": 0.0015364597784355283}, + {"id": 129, "seek": 87664, "start": 876.64, "end": 884.0, "text": " that you''re + optimizing for, and then using mathematical techniques, like variations of gradient + descent,", "tokens": [50364, 300, 291, 434, 40425, 337, 11, 293, 550, 1228, 18894, + 7512, 11, 411, 17840, 295, 16235, 23475, 11, 50732], "temperature": 0.0, "avg_logprob": + -0.117306883617114, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0004638682003132999}, + {"id": 130, "seek": 87664, "start": 884.0, "end": 890.56, "text": " to look for + the place that is optimal. Well, it would be silly for you to do things by hand + that", "tokens": [50732, 281, 574, 337, 264, 1081, 300, 307, 16252, 13, 1042, 11, + 309, 576, 312, 11774, 337, 291, 281, 360, 721, 538, 1011, 300, 51060], "temperature": + 0.0, "avg_logprob": -0.117306883617114, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.0004638682003132999}, {"id": 131, "seek": 87664, "start": 890.56, + "end": 896.16, "text": " there is an existing architecture to do that. 
But the other + thing is that when you do things by hand,", "tokens": [51060, 456, 307, 364, 6741, + 9482, 281, 360, 300, 13, 583, 264, 661, 551, 307, 300, 562, 291, 360, 721, 538, + 1011, 11, 51340], "temperature": 0.0, "avg_logprob": -0.117306883617114, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.0004638682003132999}, {"id": 132, "seek": + 87664, "start": 896.72, "end": 902.88, "text": " you''re very unlikely to be able + to move too many knobs by hand. Three or four factors,", "tokens": [51368, 291, + 434, 588, 17518, 281, 312, 1075, 281, 1286, 886, 867, 46999, 538, 1011, 13, 6244, + 420, 1451, 6771, 11, 51676], "temperature": 0.0, "avg_logprob": -0.117306883617114, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0004638682003132999}, + {"id": 133, "seek": 90288, "start": 902.88, "end": 909.68, "text": " you can handle + a hundred factors, almost certainly not. And that tends to be the big win of machine", + "tokens": [50364, 291, 393, 4813, 257, 3262, 6771, 11, 1920, 3297, 406, 13, 400, + 300, 12258, 281, 312, 264, 955, 1942, 295, 3479, 50704], "temperature": 0.0, "avg_logprob": + -0.18642120799799075, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.0008062999113462865}, {"id": 134, "seek": 90288, "start": 909.68, "end": 915.12, + "text": " learning is that because of the win that it automates what you would otherwise + do by hand,", "tokens": [50704, 2539, 307, 300, 570, 295, 264, 1942, 300, 309, 3553, + 1024, 437, 291, 576, 5911, 360, 538, 1011, 11, 50976], "temperature": 0.0, "avg_logprob": + -0.18642120799799075, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.0008062999113462865}, {"id": 135, "seek": 90288, "start": 915.12, "end": 921.28, + "text": " it allows you to do things at a much larger scale and yet keep your weights + about you.", "tokens": [50976, 309, 4045, 291, 281, 360, 721, 412, 257, 709, 4833, + 4373, 293, 1939, 1066, 428, 17443, 466, 291, 13, 51284], "temperature": 0.0, 
"avg_logprob": + -0.18642120799799075, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.0008062999113462865}, {"id": 136, "seek": 90288, "start": 921.28, "end": 925.68, + "text": " And that''s usually what people do when they''re concerned about ranking. + But the other", "tokens": [51284, 400, 300, 311, 2673, 437, 561, 360, 562, 436, + 434, 5922, 466, 367, 29411, 13, 583, 264, 661, 51504], "temperature": 0.0, "avg_logprob": + -0.18642120799799075, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.0008062999113462865}, {"id": 137, "seek": 92568, "start": 925.76, "end": 932.2399999999999, + "text": " the other way to break through in machine learning today is that in areas + like query and content", "tokens": [50368, 264, 661, 636, 281, 1821, 807, 294, 3479, + 2539, 965, 307, 300, 294, 3179, 411, 14581, 293, 2701, 50692], "temperature": 0.0, + "avg_logprob": -0.14459993781113042, "compression_ratio": 1.6888888888888889, "no_speech_prob": + 0.002450924599543214}, {"id": 138, "seek": 92568, "start": 932.2399999999999, "end": + 939.04, "text": " understanding, machine learning often allows you to solve problems. 
+ You''d have been very unlikely", "tokens": [50692, 3701, 11, 3479, 2539, 2049, 4045, + 291, 281, 5039, 2740, 13, 509, 1116, 362, 668, 588, 17518, 51032], "temperature": + 0.0, "avg_logprob": -0.14459993781113042, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.002450924599543214}, {"id": 139, "seek": 92568, "start": 939.04, + "end": 944.8, "text": " to solve by hand to recognize when a query comes in that''s + in a particular category or that", "tokens": [51032, 281, 5039, 538, 1011, 281, + 5521, 562, 257, 14581, 1487, 294, 300, 311, 294, 257, 1729, 7719, 420, 300, 51320], + "temperature": 0.0, "avg_logprob": -0.14459993781113042, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.002450924599543214}, {"id": 140, "seek": 92568, "start": 945.4399999999999, + "end": 952.3199999999999, "text": " particular entities, people, brands and so forth + are mentioned in that query or to figure out", "tokens": [51352, 1729, 16667, 11, + 561, 11, 11324, 293, 370, 5220, 366, 2835, 294, 300, 14581, 420, 281, 2573, 484, + 51696], "temperature": 0.0, "avg_logprob": -0.14459993781113042, "compression_ratio": + 1.6888888888888889, "no_speech_prob": 0.002450924599543214}, {"id": 141, "seek": + 95232, "start": 952.32, "end": 957.6800000000001, "text": " what a piece of content + is about and get a representation that you can then use to inform", "tokens": [50364, + 437, 257, 2522, 295, 2701, 307, 466, 293, 483, 257, 10290, 300, 291, 393, 550, 764, + 281, 1356, 50632], "temperature": 0.0, "avg_logprob": -0.14151190639881606, "compression_ratio": + 1.6478260869565218, "no_speech_prob": 0.0005761187640018761}, {"id": 142, "seek": + 95232, "start": 957.6800000000001, "end": 964.8000000000001, "text": " what to be + returned. 
And that''s an area where it''s not new that you can use machine learning + there,", "tokens": [50632, 437, 281, 312, 8752, 13, 400, 300, 311, 364, 1859, 689, + 309, 311, 406, 777, 300, 291, 393, 764, 3479, 2539, 456, 11, 50988], "temperature": + 0.0, "avg_logprob": -0.14151190639881606, "compression_ratio": 1.6478260869565218, + "no_speech_prob": 0.0005761187640018761}, {"id": 143, "seek": 95232, "start": 964.8000000000001, + "end": 973.2800000000001, "text": " but the ability assistance to do so using the + more modern AI techniques of word embedding to the", "tokens": [50988, 457, 264, + 3485, 9683, 281, 360, 370, 1228, 264, 544, 4363, 7318, 7512, 295, 1349, 12240, 3584, + 281, 264, 51412], "temperature": 0.0, "avg_logprob": -0.14151190639881606, "compression_ratio": + 1.6478260869565218, "no_speech_prob": 0.0005761187640018761}, {"id": 144, "seek": + 95232, "start": 973.2800000000001, "end": 978.8800000000001, "text": " like, have + dramatically changed the quality of that. And it''s a breakthrough that I can only", + "tokens": [51412, 411, 11, 362, 17548, 3105, 264, 3125, 295, 300, 13, 400, 309, + 311, 257, 22397, 300, 286, 393, 787, 51692], "temperature": 0.0, "avg_logprob": + -0.14151190639881606, "compression_ratio": 1.6478260869565218, "no_speech_prob": + 0.0005761187640018761}, {"id": 145, "seek": 97888, "start": 979.52, "end": 985.04, + "text": " think of comparing to, you know, speech recognition has been around for + a while. 
But if you use", "tokens": [50396, 519, 295, 15763, 281, 11, 291, 458, + 11, 6218, 11150, 575, 668, 926, 337, 257, 1339, 13, 583, 498, 291, 764, 50672], + "temperature": 0.0, "avg_logprob": -0.13728494479738432, "compression_ratio": 1.6689419795221843, + "no_speech_prob": 0.013131290674209595}, {"id": 146, "seek": 97888, "start": 985.04, + "end": 990.56, "text": " speech recognition systems in the 1980s or 1990s, you thought + other very cute, but they''ll never be", "tokens": [50672, 6218, 11150, 3652, 294, + 264, 13626, 82, 420, 13384, 82, 11, 291, 1194, 661, 588, 4052, 11, 457, 436, 603, + 1128, 312, 50948], "temperature": 0.0, "avg_logprob": -0.13728494479738432, "compression_ratio": + 1.6689419795221843, "no_speech_prob": 0.013131290674209595}, {"id": 147, "seek": + 97888, "start": 990.56, "end": 996.08, "text": " useful. And today we take for granted + that they work well enough that people who had no other", "tokens": [50948, 4420, + 13, 400, 965, 321, 747, 337, 12344, 300, 436, 589, 731, 1547, 300, 561, 567, 632, + 572, 661, 51224], "temperature": 0.0, "avg_logprob": -0.13728494479738432, "compression_ratio": + 1.6689419795221843, "no_speech_prob": 0.013131290674209595}, {"id": 148, "seek": + 97888, "start": 996.08, "end": 1000.56, "text": " option could actually manage with + them. 
And I would say that machine learning in search has now", "tokens": [51224, + 3614, 727, 767, 3067, 365, 552, 13, 400, 286, 576, 584, 300, 3479, 2539, 294, 3164, + 575, 586, 51448], "temperature": 0.0, "avg_logprob": -0.13728494479738432, "compression_ratio": + 1.6689419795221843, "no_speech_prob": 0.013131290674209595}, {"id": 149, "seek": + 97888, "start": 1000.56, "end": 1006.4, "text": " reached a point where it would + be silly not to use it if you have the possibility of the data to do so.", "tokens": + [51448, 6488, 257, 935, 689, 309, 576, 312, 11774, 406, 281, 764, 309, 498, 291, + 362, 264, 7959, 295, 264, 1412, 281, 360, 370, 13, 51740], "temperature": 0.0, "avg_logprob": + -0.13728494479738432, "compression_ratio": 1.6689419795221843, "no_speech_prob": + 0.013131290674209595}, {"id": 150, "seek": 100640, "start": 1007.36, "end": 1011.36, + "text": " Yeah, absolutely, especially if you sit on a pile of data, right? As they + used to say in the", "tokens": [50412, 865, 11, 3122, 11, 2318, 498, 291, 1394, + 322, 257, 14375, 295, 1412, 11, 558, 30, 1018, 436, 1143, 281, 584, 294, 264, 50612], + "temperature": 0.0, "avg_logprob": -0.12986039661225818, "compression_ratio": 1.6722689075630253, + "no_speech_prob": 0.02888287603855133}, {"id": 151, "seek": 100640, "start": 1011.36, + "end": 1018.8, "text": " age of big data. But of course, there are still niche areas + where you, let''s say you launch a startup,", "tokens": [50612, 3205, 295, 955, + 1412, 13, 583, 295, 1164, 11, 456, 366, 920, 19956, 3179, 689, 291, 11, 718, 311, + 584, 291, 4025, 257, 18578, 11, 50984], "temperature": 0.0, "avg_logprob": -0.12986039661225818, + "compression_ratio": 1.6722689075630253, "no_speech_prob": 0.02888287603855133}, + {"id": 152, "seek": 100640, "start": 1018.8, "end": 1024.4, "text": " so you don''t + have clicks. 
Maybe you can measure clicks some way, but let''s say you don''t have + clicks.", "tokens": [50984, 370, 291, 500, 380, 362, 18521, 13, 2704, 291, 393, + 3481, 18521, 512, 636, 11, 457, 718, 311, 584, 291, 500, 380, 362, 18521, 13, 51264], + "temperature": 0.0, "avg_logprob": -0.12986039661225818, "compression_ratio": 1.6722689075630253, + "no_speech_prob": 0.02888287603855133}, {"id": 153, "seek": 100640, "start": 1024.4, + "end": 1030.6399999999999, "text": " You don''t have any user feedback yet. I think + at that point, you could still apply machine learning,", "tokens": [51264, 509, + 500, 380, 362, 604, 4195, 5824, 1939, 13, 286, 519, 412, 300, 935, 11, 291, 727, + 920, 3079, 3479, 2539, 11, 51576], "temperature": 0.0, "avg_logprob": -0.12986039661225818, + "compression_ratio": 1.6722689075630253, "no_speech_prob": 0.02888287603855133}, + {"id": 154, "seek": 103064, "start": 1030.64, "end": 1037.2, "text": " right? Like + deep learning, hopefully we''ll get there. But before that, I think when people + talk", "tokens": [50364, 558, 30, 1743, 2452, 2539, 11, 4696, 321, 603, 483, 456, + 13, 583, 949, 300, 11, 286, 519, 562, 561, 751, 50692], "temperature": 0.0, "avg_logprob": + -0.1420192023118337, "compression_ratio": 1.5854700854700854, "no_speech_prob": + 0.007552722934633493}, {"id": 155, "seek": 103064, "start": 1037.2, "end": 1043.92, + "text": " about ML in search context, they quite often mean, you know, machine learning + based relevancy,", "tokens": [50692, 466, 21601, 294, 3164, 4319, 11, 436, 1596, + 2049, 914, 11, 291, 458, 11, 3479, 2539, 2361, 25916, 6717, 11, 51028], "temperature": + 0.0, "avg_logprob": -0.1420192023118337, "compression_ratio": 1.5854700854700854, + "no_speech_prob": 0.007552722934633493}, {"id": 156, "seek": 103064, "start": 1044.88, + "end": 1049.8400000000001, "text": " you know, learning to rank, like you learn + a function which will rank your documents.", "tokens": [51076, 291, 458, 11, 2539, + 281, 6181, 11, 411, 291, 
1466, 257, 2445, 597, 486, 6181, 428, 8512, 13, 51324], + "temperature": 0.0, "avg_logprob": -0.1420192023118337, "compression_ratio": 1.5854700854700854, + "no_speech_prob": 0.007552722934633493}, {"id": 157, "seek": 103064, "start": 1051.2, + "end": 1056.5600000000002, "text": " But in a way, what can rank or find by itself, + not much, if the data is not there, if it''s not", "tokens": [51392, 583, 294, 257, + 636, 11, 437, 393, 6181, 420, 915, 538, 2564, 11, 406, 709, 11, 498, 264, 1412, + 307, 406, 456, 11, 498, 309, 311, 406, 51660], "temperature": 0.0, "avg_logprob": + -0.1420192023118337, "compression_ratio": 1.5854700854700854, "no_speech_prob": + 0.007552722934633493}, {"id": 158, "seek": 105656, "start": 1056.56, "end": 1063.9199999999998, + "text": " categorized. So what''s your view on where machine learning can bring + a lot of benefit upstream?", "tokens": [50364, 19250, 1602, 13, 407, 437, 311, 428, + 1910, 322, 689, 3479, 2539, 393, 1565, 257, 688, 295, 5121, 33915, 30, 50732], "temperature": + 0.0, "avg_logprob": -0.24452687292983852, "compression_ratio": 1.5810276679841897, + "no_speech_prob": 0.015148076228797436}, {"id": 159, "seek": 105656, "start": 1063.9199999999998, + "end": 1068.3999999999999, "text": " And I think you touched on it like query understanding + and content understanding. 
Can you drill a", "tokens": [50732, 400, 286, 519, 291, + 9828, 322, 309, 411, 14581, 3701, 293, 2701, 3701, 13, 1664, 291, 11392, 257, 50956], + "temperature": 0.0, "avg_logprob": -0.24452687292983852, "compression_ratio": 1.5810276679841897, + "no_speech_prob": 0.015148076228797436}, {"id": 160, "seek": 105656, "start": 1068.3999999999999, + "end": 1073.9199999999998, "text": " little bit more into that, especially from + the point of view, how you approach the task, where you start?", "tokens": [50956, + 707, 857, 544, 666, 300, 11, 2318, 490, 264, 935, 295, 1910, 11, 577, 291, 3109, + 264, 5633, 11, 689, 291, 722, 30, 51232], "temperature": 0.0, "avg_logprob": -0.24452687292983852, + "compression_ratio": 1.5810276679841897, "no_speech_prob": 0.015148076228797436}, + {"id": 161, "seek": 105656, "start": 1075.12, "end": 1082.48, "text": " Sure. Well, + as you know, and anybody listening to this, is ready what I have to say. I like + ranking,", "tokens": [51292, 4894, 13, 1042, 11, 382, 291, 458, 11, 293, 4472, 4764, + 281, 341, 11, 307, 1919, 437, 286, 362, 281, 584, 13, 286, 411, 17833, 11, 51660], + "temperature": 0.0, "avg_logprob": -0.24452687292983852, "compression_ratio": 1.5810276679841897, + "no_speech_prob": 0.015148076228797436}, {"id": 162, "seek": 108248, "start": 1082.72, + "end": 1086.88, "text": " some of my best friends, but do ranking and even I do + it occasionally. 
But I think that", "tokens": [50376, 512, 295, 452, 1151, 1855, + 11, 457, 360, 17833, 293, 754, 286, 360, 309, 16895, 13, 583, 286, 519, 300, 50584], + "temperature": 0.0, "avg_logprob": -0.2371140169293693, "compression_ratio": 1.6415929203539823, + "no_speech_prob": 0.0009505892521701753}, {"id": 163, "seek": 108248, "start": 1087.44, + "end": 1092.88, "text": " ranking has been over emphasized in search in general + and in machine learning, the power and", "tokens": [50612, 17833, 575, 668, 670, + 34068, 294, 3164, 294, 2674, 293, 294, 3479, 2539, 11, 264, 1347, 293, 50884], "temperature": + 0.0, "avg_logprob": -0.2371140169293693, "compression_ratio": 1.6415929203539823, + "no_speech_prob": 0.0009505892521701753}, {"id": 164, "seek": 108248, "start": 1092.88, + "end": 1100.16, "text": " search in particular. So if we think of what ranking does, + it says, but we have a search query,", "tokens": [50884, 3164, 294, 1729, 13, 407, + 498, 321, 519, 295, 437, 17833, 775, 11, 309, 1619, 11, 457, 321, 362, 257, 3164, + 14581, 11, 51248], "temperature": 0.0, "avg_logprob": -0.2371140169293693, "compression_ratio": + 1.6415929203539823, "no_speech_prob": 0.0009505892521701753}, {"id": 165, "seek": + 108248, "start": 1100.88, "end": 1107.6, "text": " we have a potential result, and + we score it with a function that will then determine the order", "tokens": [51284, + 321, 362, 257, 3995, 1874, 11, 293, 321, 6175, 309, 365, 257, 2445, 300, 486, 550, + 6997, 264, 1668, 51620], "temperature": 0.0, "avg_logprob": -0.2371140169293693, + "compression_ratio": 1.6415929203539823, "no_speech_prob": 0.0009505892521701753}, + {"id": 166, "seek": 110760, "start": 1107.6, "end": 1113.4399999999998, "text": + " which we present it, assuming that that result is as a candidate to be considered. 
+ And if you go", "tokens": [50364, 597, 321, 1974, 309, 11, 11926, 300, 300, 1874, + 307, 382, 257, 11532, 281, 312, 4888, 13, 400, 498, 291, 352, 50656], "temperature": + 0.0, "avg_logprob": -0.18647420037653029, "compression_ratio": 1.6367521367521367, + "no_speech_prob": 0.007548121735453606}, {"id": 167, "seek": 110760, "start": 1113.4399999999998, + "end": 1120.3999999999999, "text": " back to the original models of information + retrieval, they act as if every document in your corpus", "tokens": [50656, 646, + 281, 264, 3380, 5245, 295, 1589, 19817, 3337, 11, 436, 605, 382, 498, 633, 4166, + 294, 428, 1181, 31624, 51004], "temperature": 0.0, "avg_logprob": -0.18647420037653029, + "compression_ratio": 1.6367521367521367, "no_speech_prob": 0.007548121735453606}, + {"id": 168, "seek": 110760, "start": 1120.3999999999999, "end": 1125.52, "text": + " could be scored. The only reason you don''t do that is you can''t forward to it''s + too expensive.", "tokens": [51004, 727, 312, 18139, 13, 440, 787, 1778, 291, 500, + 380, 360, 300, 307, 291, 393, 380, 2128, 281, 309, 311, 886, 5124, 13, 51260], "temperature": + 0.0, "avg_logprob": -0.18647420037653029, "compression_ratio": 1.6367521367521367, + "no_speech_prob": 0.007548121735453606}, {"id": 169, "seek": 110760, "start": 1125.52, + "end": 1132.48, "text": " But that that''s the, the gist of it, the scoring function + on the query and document, then in", "tokens": [51260, 583, 300, 300, 311, 264, + 11, 264, 290, 468, 295, 309, 11, 264, 22358, 2445, 322, 264, 14581, 293, 4166, 11, + 550, 294, 51608], "temperature": 0.0, "avg_logprob": -0.18647420037653029, "compression_ratio": + 1.6367521367521367, "no_speech_prob": 0.007548121735453606}, {"id": 170, "seek": + 113248, "start": 1132.48, "end": 1140.48, "text": " some cases, even on the user. + Now, that''s a lot of input into a function. 
It''s quite a different", "tokens": + [50364, 512, 3331, 11, 754, 322, 264, 4195, 13, 823, 11, 300, 311, 257, 688, 295, + 4846, 666, 257, 2445, 13, 467, 311, 1596, 257, 819, 50764], "temperature": 0.0, + "avg_logprob": -0.11915471602459343, "compression_ratio": 1.7342342342342343, "no_speech_prob": + 0.0022390810772776604}, {"id": 171, "seek": 113248, "start": 1140.48, "end": 1147.04, + "text": " way that you might approach the problem is to say, I have a query. I''m + going to try to represent that", "tokens": [50764, 636, 300, 291, 1062, 3109, 264, + 1154, 307, 281, 584, 11, 286, 362, 257, 14581, 13, 286, 478, 516, 281, 853, 281, + 2906, 300, 51092], "temperature": 0.0, "avg_logprob": -0.11915471602459343, "compression_ratio": + 1.7342342342342343, "no_speech_prob": 0.0022390810772776604}, {"id": 172, "seek": + 113248, "start": 1147.04, "end": 1153.52, "text": " query as useful as possible + without looking at any documents first. Also, before I even see", "tokens": [51092, + 14581, 382, 4420, 382, 1944, 1553, 1237, 412, 604, 8512, 700, 13, 2743, 11, 949, + 286, 754, 536, 51416], "temperature": 0.0, "avg_logprob": -0.11915471602459343, + "compression_ratio": 1.7342342342342343, "no_speech_prob": 0.0022390810772776604}, + {"id": 173, "seek": 113248, "start": 1153.52, "end": 1159.44, "text": " any queries, + I have documents. 
I''m going to try to represent them as well as possible before + I", "tokens": [51416, 604, 24109, 11, 286, 362, 8512, 13, 286, 478, 516, 281, 853, + 281, 2906, 552, 382, 731, 382, 1944, 949, 286, 51712], "temperature": 0.0, "avg_logprob": + -0.11915471602459343, "compression_ratio": 1.7342342342342343, "no_speech_prob": + 0.0022390810772776604}, {"id": 174, "seek": 115944, "start": 1159.44, "end": 1164.0, + "text": " see any queries, or at the very least before I see the particular query + that someone''s going to", "tokens": [50364, 536, 604, 24109, 11, 420, 412, 264, + 588, 1935, 949, 286, 536, 264, 1729, 14581, 300, 1580, 311, 516, 281, 50592], "temperature": + 0.0, "avg_logprob": -0.15814796952176685, "compression_ratio": 1.7464788732394365, + "no_speech_prob": 0.0010841075563803315}, {"id": 175, "seek": 115944, "start": 1164.0, + "end": 1168.96, "text": " make. I might have something, I might use the history + of queries to, you know, inform my approach.", "tokens": [50592, 652, 13, 286, 1062, + 362, 746, 11, 286, 1062, 764, 264, 2503, 295, 24109, 281, 11, 291, 458, 11, 1356, + 452, 3109, 13, 50840], "temperature": 0.0, "avg_logprob": -0.15814796952176685, + "compression_ratio": 1.7464788732394365, "no_speech_prob": 0.0010841075563803315}, + {"id": 176, "seek": 115944, "start": 1169.52, "end": 1175.8400000000001, "text": + " So now we''ve factored out this original scoring problem that said, throw everything + at a scoring", "tokens": [50868, 407, 586, 321, 600, 1186, 2769, 484, 341, 3380, + 22358, 1154, 300, 848, 11, 3507, 1203, 412, 257, 22358, 51184], "temperature": 0.0, + "avg_logprob": -0.15814796952176685, "compression_ratio": 1.7464788732394365, "no_speech_prob": + 0.0010841075563803315}, {"id": 177, "seek": 115944, "start": 1175.8400000000001, + "end": 1181.92, "text": " function and said, no, no, no, we''re first going to say, + let''s understand the query in a representation", "tokens": [51184, 2445, 293, 848, + 11, 572, 11, 572, 11, 572, 11, 321, 
434, 700, 516, 281, 584, 11, 718, 311, 1223, + 264, 14581, 294, 257, 10290, 51488], "temperature": 0.0, "avg_logprob": -0.15814796952176685, + "compression_ratio": 1.7464788732394365, "no_speech_prob": 0.0010841075563803315}, + {"id": 178, "seek": 115944, "start": 1181.92, "end": 1187.6000000000001, "text": + " that distills it to its assets. We have already understood the content, the documents, + in a way that", "tokens": [51488, 300, 1483, 2565, 309, 281, 1080, 9769, 13, 492, + 362, 1217, 7320, 264, 2701, 11, 264, 8512, 11, 294, 257, 636, 300, 51772], "temperature": + 0.0, "avg_logprob": -0.15814796952176685, "compression_ratio": 1.7464788732394365, + "no_speech_prob": 0.0010841075563803315}, {"id": 179, "seek": 118760, "start": 1187.6, + "end": 1195.4399999999998, "text": " distills them to their assets. And now, when + we even decide what to retrieve, we''re going to use", "tokens": [50364, 1483, 2565, + 552, 281, 641, 9769, 13, 400, 586, 11, 562, 321, 754, 4536, 437, 281, 30254, 11, + 321, 434, 516, 281, 764, 50756], "temperature": 0.0, "avg_logprob": -0.08857977867126465, + "compression_ratio": 1.7330316742081449, "no_speech_prob": 0.00011061402619816363}, + {"id": 180, "seek": 118760, "start": 1195.4399999999998, "end": 1201.04, "text": + " those representations that already have done some of the work for us. In the case + of the documents,", "tokens": [50756, 729, 33358, 300, 1217, 362, 1096, 512, 295, + 264, 589, 337, 505, 13, 682, 264, 1389, 295, 264, 8512, 11, 51036], "temperature": + 0.0, "avg_logprob": -0.08857977867126465, "compression_ratio": 1.7330316742081449, + "no_speech_prob": 0.00011061402619816363}, {"id": 181, "seek": 118760, "start": + 1201.04, "end": 1206.1599999999999, "text": " we did it offline. 
In the case of + the query, we have to wait till we see it unless it''s a maybe", "tokens": [51036, + 321, 630, 309, 21857, 13, 682, 264, 1389, 295, 264, 14581, 11, 321, 362, 281, 1699, + 4288, 321, 536, 309, 5969, 309, 311, 257, 1310, 51292], "temperature": 0.0, "avg_logprob": + -0.08857977867126465, "compression_ratio": 1.7330316742081449, "no_speech_prob": + 0.00011061402619816363}, {"id": 182, "seek": 118760, "start": 1206.1599999999999, + "end": 1211.04, "text": " ahead query we''ve seen before, but we do it once for + the query, not once for every result.", "tokens": [51292, 2286, 14581, 321, 600, + 1612, 949, 11, 457, 321, 360, 309, 1564, 337, 264, 14581, 11, 406, 1564, 337, 633, + 1874, 13, 51536], "temperature": 0.0, "avg_logprob": -0.08857977867126465, "compression_ratio": + 1.7330316742081449, "no_speech_prob": 0.00011061402619816363}, {"id": 183, "seek": + 121104, "start": 1212.0, "end": 1220.32, "text": " And we can use that to then say, + well, roughly speaking, if we have represented the query", "tokens": [50412, 400, + 321, 393, 764, 300, 281, 550, 584, 11, 731, 11, 9810, 4124, 11, 498, 321, 362, 10379, + 264, 14581, 50828], "temperature": 0.0, "avg_logprob": -0.12126280187250493, "compression_ratio": + 1.6981981981981982, "no_speech_prob": 0.003150768345221877}, {"id": 184, "seek": + 121104, "start": 1220.32, "end": 1227.2, "text": " and the content in a similar + space retrieval that is deciding what documents we should look at,", "tokens": [50828, + 293, 264, 2701, 294, 257, 2531, 1901, 19817, 3337, 300, 307, 17990, 437, 8512, 321, + 820, 574, 412, 11, 51172], "temperature": 0.0, "avg_logprob": -0.12126280187250493, + "compression_ratio": 1.6981981981981982, "no_speech_prob": 0.003150768345221877}, + {"id": 185, "seek": 121104, "start": 1227.2, "end": 1234.0, "text": " is much more + of a matching problem. 
In fact, if the space uses a similar schema, for example, + if", "tokens": [51172, 307, 709, 544, 295, 257, 14324, 1154, 13, 682, 1186, 11, + 498, 264, 1901, 4960, 257, 2531, 34078, 11, 337, 1365, 11, 498, 51512], "temperature": + 0.0, "avg_logprob": -0.12126280187250493, "compression_ratio": 1.6981981981981982, + "no_speech_prob": 0.003150768345221877}, {"id": 186, "seek": 121104, "start": 1234.0, + "end": 1239.2, "text": " the query is mapped to a category or a set of categories, + and the documents having categorized", "tokens": [51512, 264, 14581, 307, 33318, + 281, 257, 7719, 420, 257, 992, 295, 10479, 11, 293, 264, 8512, 1419, 19250, 1602, + 51772], "temperature": 0.0, "avg_logprob": -0.12126280187250493, "compression_ratio": + 1.6981981981981982, "no_speech_prob": 0.003150768345221877}, {"id": 187, "seek": + 123920, "start": 1239.28, "end": 1245.2, "text": " using the same set, we can say, + well, we should probably retrieve documents from those categories,", "tokens": [50368, + 1228, 264, 912, 992, 11, 321, 393, 584, 11, 731, 11, 321, 820, 1391, 30254, 8512, + 490, 729, 10479, 11, 50664], "temperature": 0.0, "avg_logprob": -0.1349897066752116, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0010063352528959513}, + {"id": 188, "seek": 123920, "start": 1245.2, "end": 1252.0800000000002, "text": + " or we may have other structured data we can use that way. 
And what is happening + is that a lot of the", "tokens": [50664, 420, 321, 815, 362, 661, 18519, 1412, 321, + 393, 764, 300, 636, 13, 400, 437, 307, 2737, 307, 300, 257, 688, 295, 264, 51008], + "temperature": 0.0, "avg_logprob": -0.1349897066752116, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.0010063352528959513}, {"id": 189, "seek": 123920, "start": 1252.0800000000002, + "end": 1258.16, "text": " work that ranking was doing, which was essentially trying + to say, should I retrieve this document", "tokens": [51008, 589, 300, 17833, 390, + 884, 11, 597, 390, 4476, 1382, 281, 584, 11, 820, 286, 30254, 341, 4166, 51312], + "temperature": 0.0, "avg_logprob": -0.1349897066752116, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.0010063352528959513}, {"id": 190, "seek": 123920, "start": 1258.16, + "end": 1264.72, "text": " at all? Is this document relevant enough to the query + that it should be in consideration? This", "tokens": [51312, 412, 439, 30, 1119, + 341, 4166, 7340, 1547, 281, 264, 14581, 300, 309, 820, 312, 294, 12381, 30, 639, + 51640], "temperature": 0.0, "avg_logprob": -0.1349897066752116, "compression_ratio": + 1.696969696969697, "no_speech_prob": 0.0010063352528959513}, {"id": 191, "seek": + 126472, "start": 1265.1200000000001, "end": 1273.52, "text": " query-dependent aspect + of ranking can be solved as saying, basically, once I have the query,", "tokens": + [50384, 14581, 12, 36763, 317, 4171, 295, 17833, 393, 312, 13041, 382, 1566, 11, + 1936, 11, 1564, 286, 362, 264, 14581, 11, 50804], "temperature": 0.0, "avg_logprob": + -0.1750607763017927, "compression_ratio": 1.5698924731182795, "no_speech_prob": + 0.0012051842641085386}, {"id": 192, "seek": 126472, "start": 1273.52, "end": 1281.76, + "text": " and the content represented in the same space, do they know? Is that overlocked + there? 
So we''re basically", "tokens": [50804, 293, 264, 2701, 10379, 294, 264, + 912, 1901, 11, 360, 436, 458, 30, 1119, 300, 670, 4102, 292, 456, 30, 407, 321, + 434, 1936, 51216], "temperature": 0.0, "avg_logprob": -0.1750607763017927, "compression_ratio": + 1.5698924731182795, "no_speech_prob": 0.0012051842641085386}, {"id": 193, "seek": + 126472, "start": 1281.76, "end": 1289.28, "text": " changing the first order bits, + the higher order bits of ranking into more of a classification", "tokens": [51216, + 4473, 264, 700, 1668, 9239, 11, 264, 2946, 1668, 9239, 295, 17833, 666, 544, 295, + 257, 21538, 51592], "temperature": 0.0, "avg_logprob": -0.1750607763017927, "compression_ratio": + 1.5698924731182795, "no_speech_prob": 0.0012051842641085386}, {"id": 194, "seek": + 128928, "start": 1289.28, "end": 1295.36, "text": " problem, which we''re experiencing + is really, look, once we have the query and the content in the same", "tokens": + [50364, 1154, 11, 597, 321, 434, 11139, 307, 534, 11, 574, 11, 1564, 321, 362, 264, + 14581, 293, 264, 2701, 294, 264, 912, 50668], "temperature": 0.0, "avg_logprob": + -0.23976987079509254, "compression_ratio": 1.7359307359307359, "no_speech_prob": + 0.006301970221102238}, {"id": 195, "seek": 128928, "start": 1295.36, "end": 1302.0, + "text": " space, figuring out if, you know, what is the content that matches the + general gist of the query,", "tokens": [50668, 1901, 11, 15213, 484, 498, 11, 291, + 458, 11, 437, 307, 264, 2701, 300, 10676, 264, 2674, 290, 468, 295, 264, 14581, + 11, 51000], "temperature": 0.0, "avg_logprob": -0.23976987079509254, "compression_ratio": + 1.7359307359307359, "no_speech_prob": 0.006301970221102238}, {"id": 196, "seek": + 128928, "start": 1302.0, "end": 1307.6, "text": " should be an easier problem. 
And + then ranking is more, oh, well, a lot of things matched, but some", "tokens": [51000, + 820, 312, 364, 3571, 1154, 13, 400, 550, 17833, 307, 544, 11, 1954, 11, 731, 11, + 257, 688, 295, 721, 21447, 11, 457, 512, 51280], "temperature": 0.0, "avg_logprob": + -0.23976987079509254, "compression_ratio": 1.7359307359307359, "no_speech_prob": + 0.006301970221102238}, {"id": 197, "seek": 128928, "start": 1307.6, "end": 1315.44, + "text": " are better than others. And that''s, of course, the word, but she learning + that, well, machine learning", "tokens": [51280, 366, 1101, 813, 2357, 13, 400, + 300, 311, 11, 295, 1164, 11, 264, 1349, 11, 457, 750, 2539, 300, 11, 731, 11, 3479, + 2539, 51672], "temperature": 0.0, "avg_logprob": -0.23976987079509254, "compression_ratio": + 1.7359307359307359, "no_speech_prob": 0.006301970221102238}, {"id": 198, "seek": + 131544, "start": 1315.6000000000001, "end": 1320.56, "text": " is how we get those + representations. It''s how we turn the query into a more useful representation,", + "tokens": [50372, 307, 577, 321, 483, 729, 33358, 13, 467, 311, 577, 321, 1261, + 264, 14581, 666, 257, 544, 4420, 10290, 11, 50620], "temperature": 0.0, "avg_logprob": + -0.2478546378886805, "compression_ratio": 1.8129770992366412, "no_speech_prob": + 0.007636616472154856}, {"id": 199, "seek": 131544, "start": 1320.56, "end": 1326.48, + "text": " how we turn the content into a more useful representation. 
But it is, + by treating those things in", "tokens": [50620, 577, 321, 1261, 264, 2701, 666, + 257, 544, 4420, 10290, 13, 583, 309, 307, 11, 538, 15083, 729, 721, 294, 50916], + "temperature": 0.0, "avg_logprob": -0.2478546378886805, "compression_ratio": 1.8129770992366412, + "no_speech_prob": 0.007636616472154856}, {"id": 200, "seek": 131544, "start": 1326.48, + "end": 1333.68, "text": " something of an acceleration, it allows us to be a lot + more directed than we are with ranking,", "tokens": [50916, 746, 295, 364, 17162, + 11, 309, 4045, 505, 281, 312, 257, 688, 544, 12898, 813, 321, 366, 365, 17833, 11, + 51276], "temperature": 0.0, "avg_logprob": -0.2478546378886805, "compression_ratio": + 1.8129770992366412, "no_speech_prob": 0.007636616472154856}, {"id": 201, "seek": + 131544, "start": 1333.68, "end": 1339.76, "text": " and in my experience, it''s + far better results. Yeah, I remember like the course,", "tokens": [51276, 293, 294, + 452, 1752, 11, 309, 311, 1400, 1101, 3542, 13, 865, 11, 286, 1604, 411, 264, 1164, + 11, 51580], "temperature": 0.0, "avg_logprob": -0.2478546378886805, "compression_ratio": + 1.8129770992366412, "no_speech_prob": 0.007636616472154856}, {"id": 202, "seek": + 131544, "start": 1340.48, "end": 1345.28, "text": " search with a meltot by you + and grunting yourself. 
You gave that brilliant example that stuck with me.", "tokens": + [51616, 3164, 365, 257, 10083, 310, 538, 291, 293, 677, 14559, 1803, 13, 509, 2729, + 300, 10248, 1365, 300, 5541, 365, 385, 13, 51856], "temperature": 0.0, "avg_logprob": + -0.2478546378886805, "compression_ratio": 1.8129770992366412, "no_speech_prob": + 0.007636616472154856}, {"id": 203, "seek": 134528, "start": 1345.28, "end": 1349.84, + "text": " I believe it was best by implementing, correct me if I''m wrong, implementing + the query", "tokens": [50364, 286, 1697, 309, 390, 1151, 538, 18114, 11, 3006, 385, + 498, 286, 478, 2085, 11, 18114, 264, 14581, 50592], "temperature": 0.0, "avg_logprob": + -0.17405315925335063, "compression_ratio": 1.740909090909091, "no_speech_prob": + 0.0008455596398562193}, {"id": 204, "seek": 134528, "start": 1349.84, "end": 1356.3999999999999, + "text": " functionality, query understanding functionality where if you typed iPhone + or some product that they", "tokens": [50592, 14980, 11, 14581, 3701, 14980, 689, + 498, 291, 33941, 7252, 420, 512, 1674, 300, 436, 50920], "temperature": 0.0, "avg_logprob": + -0.17405315925335063, "compression_ratio": 1.740909090909091, "no_speech_prob": + 0.0008455596398562193}, {"id": 205, "seek": 134528, "start": 1356.3999999999999, + "end": 1361.6, "text": " didn''t have at the moment, they would actually use query + understanding to tell you that they don''t", "tokens": [50920, 994, 380, 362, 412, + 264, 1623, 11, 436, 576, 767, 764, 14581, 3701, 281, 980, 291, 300, 436, 500, 380, + 51180], "temperature": 0.0, "avg_logprob": -0.17405315925335063, "compression_ratio": + 1.740909090909091, "no_speech_prob": 0.0008455596398562193}, {"id": 206, "seek": + 134528, "start": 1361.6, "end": 1369.12, "text": " have a product they would not + even go in search. 
And I mean, the example was this B&H photo with", "tokens": [51180, + 362, 257, 1674, 436, 576, 406, 754, 352, 294, 3164, 13, 400, 286, 914, 11, 264, + 1365, 390, 341, 363, 5, 39, 5052, 365, 51556], "temperature": 0.0, "avg_logprob": + -0.17405315925335063, "compression_ratio": 1.740909090909091, "no_speech_prob": + 0.0008455596398562193}, {"id": 207, "seek": 136912, "start": 1369.12, "end": 1376.0, + "text": " iPhones, but an example that''s even more fun is with Netflix, where I + don''t have much inside", "tokens": [50364, 43793, 11, 457, 364, 1365, 300, 311, + 754, 544, 1019, 307, 365, 12778, 11, 689, 286, 500, 380, 362, 709, 1854, 50708], + "temperature": 0.0, "avg_logprob": -0.19880712702033226, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.004584470763802528}, {"id": 208, "seek": 136912, "start": 1376.0, + "end": 1381.1999999999998, "text": " as to the channels there that haven''t been + one of my clients. But the Netflix has", "tokens": [50708, 382, 281, 264, 9235, + 456, 300, 2378, 380, 668, 472, 295, 452, 6982, 13, 583, 264, 12778, 575, 50968], + "temperature": 0.0, "avg_logprob": -0.19880712702033226, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.004584470763802528}, {"id": 209, "seek": 136912, "start": 1382.4799999999998, + "end": 1387.84, "text": " perspective about limited catalog. They don''t, for example, + get movies from folks like Disney that", "tokens": [51032, 4585, 466, 5567, 19746, + 13, 814, 500, 380, 11, 337, 1365, 11, 483, 6233, 490, 4024, 411, 8653, 300, 51300], + "temperature": 0.0, "avg_logprob": -0.19880712702033226, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.004584470763802528}, {"id": 210, "seek": 136912, "start": 1387.84, + "end": 1397.6799999999998, "text": " is quite protected of its catalog. 
But Netflix + knows when you''re looking for a Disney children''s movie.", "tokens": [51300, 307, + 1596, 10594, 295, 1080, 19746, 13, 583, 12778, 3255, 562, 291, 434, 1237, 337, 257, + 8653, 2227, 311, 3169, 13, 51792], "temperature": 0.0, "avg_logprob": -0.19880712702033226, + "compression_ratio": 1.584033613445378, "no_speech_prob": 0.004584470763802528}, + {"id": 211, "seek": 139768, "start": 1398.48, "end": 1404.0, "text": " And when + you do that, rather than trying to simply match the text of your query, they show + some", "tokens": [50404, 400, 562, 291, 360, 300, 11, 2831, 813, 1382, 281, 2935, + 2995, 264, 2487, 295, 428, 14581, 11, 436, 855, 512, 50680], "temperature": 0.0, + "avg_logprob": -0.1178204738176786, "compression_ratio": 1.7509293680297398, "no_speech_prob": + 0.002015321049839258}, {"id": 212, "seek": 139768, "start": 1404.0, "end": 1412.48, + "text": " of their children''s movies. So it''s an example where you clearly split + out the work of understanding,", "tokens": [50680, 295, 641, 2227, 311, 6233, 13, + 407, 309, 311, 364, 1365, 689, 291, 4448, 7472, 484, 264, 589, 295, 3701, 11, 51104], + "temperature": 0.0, "avg_logprob": -0.1178204738176786, "compression_ratio": 1.7509293680297398, + "no_speech_prob": 0.002015321049839258}, {"id": 213, "seek": 139768, "start": 1412.48, + "end": 1417.28, "text": " the searchers intent, and query understanding from simply + retrieving and scoring results,", "tokens": [51104, 264, 3164, 433, 8446, 11, 293, + 14581, 3701, 490, 2935, 19817, 798, 293, 22358, 3542, 11, 51344], "temperature": + 0.0, "avg_logprob": -0.1178204738176786, "compression_ratio": 1.7509293680297398, + "no_speech_prob": 0.002015321049839258}, {"id": 214, "seek": 139768, "start": 1417.28, + "end": 1421.2, "text": " because that improved representation. 
They know you''re + looking for a children''s movie,", "tokens": [51344, 570, 300, 9689, 10290, 13, + 814, 458, 291, 434, 1237, 337, 257, 2227, 311, 3169, 11, 51540], "temperature": + 0.0, "avg_logprob": -0.1178204738176786, "compression_ratio": 1.7509293680297398, + "no_speech_prob": 0.002015321049839258}, {"id": 215, "seek": 139768, "start": 1421.2, + "end": 1426.48, "text": " and they have children''s movie is far more powerful than + the traditional ways in which you might", "tokens": [51540, 293, 436, 362, 2227, + 311, 3169, 307, 1400, 544, 4005, 813, 264, 5164, 2098, 294, 597, 291, 1062, 51804], + "temperature": 0.0, "avg_logprob": -0.1178204738176786, "compression_ratio": 1.7509293680297398, + "no_speech_prob": 0.002015321049839258}, {"id": 216, "seek": 142648, "start": 1426.48, + "end": 1433.3600000000001, "text": " score a grand result. Yeah, I''m personally + fascinated by the field of query understanding,", "tokens": [50364, 6175, 257, 2697, + 1874, 13, 865, 11, 286, 478, 5665, 24597, 538, 264, 2519, 295, 14581, 3701, 11, + 50708], "temperature": 0.0, "avg_logprob": -0.2769624945822726, "compression_ratio": + 1.5899581589958158, "no_speech_prob": 0.004823252558708191}, {"id": 217, "seek": + 142648, "start": 1433.92, "end": 1440.96, "text": " having implemented it with my + team in a web scale. We worked on job search engine, as vertical", "tokens": [50736, + 1419, 12270, 309, 365, 452, 1469, 294, 257, 3670, 4373, 13, 492, 2732, 322, 1691, + 3164, 2848, 11, 382, 9429, 51088], "temperature": 0.0, "avg_logprob": -0.2769624945822726, + "compression_ratio": 1.5899581589958158, "no_speech_prob": 0.004823252558708191}, + {"id": 218, "seek": 142648, "start": 1440.96, "end": 1446.88, "text": " search engine, + no power by the web scale search engine. First of all, it was multilingual. 
Second", + "tokens": [51088, 3164, 2848, 11, 572, 1347, 538, 264, 3670, 4373, 3164, 2848, 13, + 2386, 295, 439, 11, 309, 390, 2120, 38219, 13, 5736, 51384], "temperature": 0.0, + "avg_logprob": -0.2769624945822726, "compression_ratio": 1.5899581589958158, "no_speech_prob": + 0.004823252558708191}, {"id": 219, "seek": 142648, "start": 1446.88, "end": 1454.24, + "text": " is that you have to figure out this semantic subtleties when users type + opening hours or working", "tokens": [51384, 307, 300, 291, 362, 281, 2573, 484, + 341, 47982, 7257, 2631, 530, 562, 5022, 2010, 5193, 2496, 420, 1364, 51752], "temperature": + 0.0, "avg_logprob": -0.2769624945822726, "compression_ratio": 1.5899581589958158, + "no_speech_prob": 0.004823252558708191}, {"id": 220, "seek": 145424, "start": 1454.32, + "end": 1460.16, "text": " hours, whatever the way they phrase it in their language. + And that''s not a query you want to execute", "tokens": [50368, 2496, 11, 2035, + 264, 636, 436, 9535, 309, 294, 641, 2856, 13, 400, 300, 311, 406, 257, 14581, 291, + 528, 281, 14483, 50660], "temperature": 0.0, "avg_logprob": -0.15605067478791448, + "compression_ratio": 1.7377622377622377, "no_speech_prob": 0.01846051774919033}, + {"id": 221, "seek": 145424, "start": 1460.16, "end": 1467.2, "text": " in the job + search. But if they set jobs in IT in London, that''s okay. And you can use that + and pass it", "tokens": [50660, 294, 264, 1691, 3164, 13, 583, 498, 436, 992, 4782, + 294, 6783, 294, 7042, 11, 300, 311, 1392, 13, 400, 291, 393, 764, 300, 293, 1320, + 309, 51012], "temperature": 0.0, "avg_logprob": -0.15605067478791448, "compression_ratio": + 1.7377622377622377, "no_speech_prob": 0.01846051774919033}, {"id": 222, "seek": + 145424, "start": 1467.2, "end": 1472.08, "text": " through the filter. So query + understanding kind of worked as a filter in a way. 
But then it also,", "tokens": + [51012, 807, 264, 6608, 13, 407, 14581, 3701, 733, 295, 2732, 382, 257, 6608, 294, + 257, 636, 13, 583, 550, 309, 611, 11, 51256], "temperature": 0.0, "avg_logprob": + -0.15605067478791448, "compression_ratio": 1.7377622377622377, "no_speech_prob": + 0.01846051774919033}, {"id": 223, "seek": 145424, "start": 1472.08, "end": 1476.96, + "text": " or a classify, you could say, right? But then it would also give us this + reach semantics that we", "tokens": [51256, 420, 257, 33872, 11, 291, 727, 584, + 11, 558, 30, 583, 550, 309, 576, 611, 976, 505, 341, 2524, 4361, 45298, 300, 321, + 51500], "temperature": 0.0, "avg_logprob": -0.15605067478791448, "compression_ratio": + 1.7377622377622377, "no_speech_prob": 0.01846051774919033}, {"id": 224, "seek": + 145424, "start": 1476.96, "end": 1481.28, "text": " could apply in fields. Let''s + say if it''s London as a city, you don''t want to search that work just", "tokens": + [51500, 727, 3079, 294, 7909, 13, 961, 311, 584, 498, 309, 311, 7042, 382, 257, + 2307, 11, 291, 500, 380, 528, 281, 3164, 300, 589, 445, 51716], "temperature": 0.0, + "avg_logprob": -0.15605067478791448, "compression_ratio": 1.7377622377622377, "no_speech_prob": + 0.01846051774919033}, {"id": 225, "seek": 148128, "start": 1481.28, "end": 1485.84, + "text": " in the description. You can apply it in the field of the city on the document.", + "tokens": [50364, 294, 264, 3855, 13, 509, 393, 3079, 309, 294, 264, 2519, 295, + 264, 2307, 322, 264, 4166, 13, 50592], "temperature": 0.0, "avg_logprob": -0.1876436192938622, + "compression_ratio": 1.5630252100840336, "no_speech_prob": 0.0027316163759678602}, + {"id": 226, "seek": 148128, "start": 1487.76, "end": 1495.36, "text": " And I mean, + this was like back then we applied rule-based approach. 
And it worked fine, but + it was", "tokens": [50688, 400, 286, 914, 11, 341, 390, 411, 646, 550, 321, 6456, + 4978, 12, 6032, 3109, 13, 400, 309, 2732, 2489, 11, 457, 309, 390, 51068], "temperature": + 0.0, "avg_logprob": -0.1876436192938622, "compression_ratio": 1.5630252100840336, + "no_speech_prob": 0.0027316163759678602}, {"id": 227, "seek": 148128, "start": 1495.36, + "end": 1501.76, "text": " very maybe conservative in a way, right? Especially for + languages like Turkish, where they have the word", "tokens": [51068, 588, 1310, + 13780, 294, 257, 636, 11, 558, 30, 8545, 337, 8650, 411, 18565, 11, 689, 436, 362, + 264, 1349, 51388], "temperature": 0.0, "avg_logprob": -0.1876436192938622, "compression_ratio": + 1.5630252100840336, "no_speech_prob": 0.0027316163759678602}, {"id": 228, "seek": + 148128, "start": 1501.76, "end": 1507.84, "text": " ish, which is a, you know, overloaded, + semantically overloaded word and used in different", "tokens": [51388, 307, 71, + 11, 597, 307, 257, 11, 291, 458, 11, 28777, 292, 11, 4361, 49505, 28777, 292, 1349, + 293, 1143, 294, 819, 51692], "temperature": 0.0, "avg_logprob": -0.1876436192938622, + "compression_ratio": 1.5630252100840336, "no_speech_prob": 0.0027316163759678602}, + {"id": 229, "seek": 150784, "start": 1507.84, "end": 1513.4399999999998, "text": + " contexts. It may mean a bank card. It may mean a job search and a number of other + meanings.", "tokens": [50364, 30628, 13, 467, 815, 914, 257, 3765, 2920, 13, 467, + 815, 914, 257, 1691, 3164, 293, 257, 1230, 295, 661, 28138, 13, 50644], "temperature": + 0.0, "avg_logprob": -0.16248978389782853, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.005474993959069252}, {"id": 230, "seek": 150784, "start": 1515.36, + "end": 1522.56, "text": " But would you advocate for using machine learning and + query understanding? 
I know, by the way,", "tokens": [50740, 583, 576, 291, 14608, + 337, 1228, 3479, 2539, 293, 14581, 3701, 30, 286, 458, 11, 538, 264, 636, 11, 51100], + "temperature": 0.0, "avg_logprob": -0.16248978389782853, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.005474993959069252}, {"id": 231, "seek": 150784, "start": 1522.56, + "end": 1530.24, "text": " you wrote a brilliant series of blog posts on medium drilling + into so many subtopics of query", "tokens": [51100, 291, 4114, 257, 10248, 2638, + 295, 6968, 12300, 322, 6399, 26290, 666, 370, 867, 7257, 404, 1167, 295, 14581, + 51484], "temperature": 0.0, "avg_logprob": -0.16248978389782853, "compression_ratio": + 1.631578947368421, "no_speech_prob": 0.005474993959069252}, {"id": 232, "seek": + 150784, "start": 1530.24, "end": 1534.8, "text": " understanding, and especially + like that you can actually utilize it in autocorrelate. I was", "tokens": [51484, + 3701, 11, 293, 2318, 411, 300, 291, 393, 767, 16117, 309, 294, 45833, 284, 4419, + 473, 13, 286, 390, 51712], "temperature": 0.0, "avg_logprob": -0.16248978389782853, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.005474993959069252}, + {"id": 233, "seek": 153480, "start": 1534.8, "end": 1539.36, "text": " actually + fascinated that you connected those two and I highly recommend everyone to take + a look at", "tokens": [50364, 767, 24597, 300, 291, 4582, 729, 732, 293, 286, 5405, + 2748, 1518, 281, 747, 257, 574, 412, 50592], "temperature": 0.0, "avg_logprob": + -0.1641335958962912, "compression_ratio": 1.7782805429864252, "no_speech_prob": + 0.001908169942907989}, {"id": 234, "seek": 153480, "start": 1539.36, "end": 1545.68, + "text": " that. So what''s your take on sort of rule-based versus machine learning? 
+ Would you start with rule-based?", "tokens": [50592, 300, 13, 407, 437, 311, 428, + 747, 322, 1333, 295, 4978, 12, 6032, 5717, 3479, 2539, 30, 6068, 291, 722, 365, + 4978, 12, 6032, 30, 50908], "temperature": 0.0, "avg_logprob": -0.1641335958962912, + "compression_ratio": 1.7782805429864252, "no_speech_prob": 0.001908169942907989}, + {"id": 235, "seek": 153480, "start": 1545.68, "end": 1550.24, "text": " And then + as you learn, go to machine learning, or would you start head-on with machine learning?", + "tokens": [50908, 400, 550, 382, 291, 1466, 11, 352, 281, 3479, 2539, 11, 420, 576, + 291, 722, 1378, 12, 266, 365, 3479, 2539, 30, 51136], "temperature": 0.0, "avg_logprob": + -0.1641335958962912, "compression_ratio": 1.7782805429864252, "no_speech_prob": + 0.001908169942907989}, {"id": 236, "seek": 153480, "start": 1550.24, "end": 1557.52, + "text": " So I certainly see a lot of value of machine learning inquiry understanding + for some of the", "tokens": [51136, 407, 286, 3297, 536, 257, 688, 295, 2158, 295, + 3479, 2539, 25736, 3701, 337, 512, 295, 264, 51500], "temperature": 0.0, "avg_logprob": + -0.1641335958962912, "compression_ratio": 1.7782805429864252, "no_speech_prob": + 0.001908169942907989}, {"id": 237, "seek": 155752, "start": 1557.52, "end": 1564.48, + "text": " reasons I was saying before. But with that said, I think that there''s + often a sort of a", "tokens": [50364, 4112, 286, 390, 1566, 949, 13, 583, 365, 300, + 848, 11, 286, 519, 300, 456, 311, 2049, 257, 1333, 295, 257, 50712], "temperature": + 0.0, "avg_logprob": -0.23375516949277936, "compression_ratio": 1.5578512396694215, + "no_speech_prob": 0.00826354417949915}, {"id": 238, "seek": 155752, "start": 1564.48, + "end": 1570.08, "text": " burrito principle in 80, 20 in search problems. 
And when + I go to people, especially folks in small", "tokens": [50712, 2779, 17492, 8665, + 294, 4688, 11, 945, 294, 3164, 2740, 13, 400, 562, 286, 352, 281, 561, 11, 2318, + 4024, 294, 1359, 50992], "temperature": 0.0, "avg_logprob": -0.23375516949277936, + "compression_ratio": 1.5578512396694215, "no_speech_prob": 0.00826354417949915}, + {"id": 239, "seek": 155752, "start": 1570.08, "end": 1576.72, "text": " organizations, + I tell them, look, let''s say for example you''re trying to figure out would be,", + "tokens": [50992, 6150, 11, 286, 980, 552, 11, 574, 11, 718, 311, 584, 337, 1365, + 291, 434, 1382, 281, 2573, 484, 576, 312, 11, 51324], "temperature": 0.0, "avg_logprob": + -0.23375516949277936, "compression_ratio": 1.5578512396694215, "no_speech_prob": + 0.00826354417949915}, {"id": 240, "seek": 155752, "start": 1576.72, "end": 1580.8, + "text": " well, use your job of job search examples. And so I spent a few years + up like that and it''s very", "tokens": [51324, 731, 11, 764, 428, 1691, 295, 1691, + 3164, 5110, 13, 400, 370, 286, 4418, 257, 1326, 924, 493, 411, 300, 293, 309, 311, + 588, 51528], "temperature": 0.0, "avg_logprob": -0.23375516949277936, "compression_ratio": + 1.5578512396694215, "no_speech_prob": 0.00826354417949915}, {"id": 241, "seek": + 158080, "start": 1580.8, "end": 1587.68, "text": " close to my heart. 
They knew + on to know, well, somebody, for example, looking for a job title,", "tokens": [50364, + 1998, 281, 452, 1917, 13, 814, 2586, 322, 281, 458, 11, 731, 11, 2618, 11, 337, + 1365, 11, 1237, 337, 257, 1691, 4876, 11, 50708], "temperature": 0.0, "avg_logprob": + -0.22905848724673492, "compression_ratio": 1.7, "no_speech_prob": 0.0014563280856236815}, + {"id": 242, "seek": 158080, "start": 1587.68, "end": 1595.68, "text": " or are they + looking for, in say, lengthens case someone''s name, or in the case of, say, language,", + "tokens": [50708, 420, 366, 436, 1237, 337, 11, 294, 584, 11, 4641, 694, 1389, 1580, + 311, 1315, 11, 420, 294, 264, 1389, 295, 11, 584, 11, 2856, 11, 51108], "temperature": + 0.0, "avg_logprob": -0.22905848724673492, "compression_ratio": 1.7, "no_speech_prob": + 0.0014563280856236815}, {"id": 243, "seek": 158080, "start": 1595.68, "end": 1600.72, + "text": " maybe you''re not sure what language they''re searching in you''re trying + to do language identification.", "tokens": [51108, 1310, 291, 434, 406, 988, 437, + 2856, 436, 434, 10808, 294, 291, 434, 1382, 281, 360, 2856, 22065, 13, 51360], "temperature": + 0.0, "avg_logprob": -0.22905848724673492, "compression_ratio": 1.7, "no_speech_prob": + 0.0014563280856236815}, {"id": 244, "seek": 158080, "start": 1601.84, "end": 1608.6399999999999, + "text": " You could start by looking at the most common queries that you see, and + then just having people,", "tokens": [51416, 509, 727, 722, 538, 1237, 412, 264, + 881, 2689, 24109, 300, 291, 536, 11, 293, 550, 445, 1419, 561, 11, 51756], "temperature": + 0.0, "avg_logprob": -0.22905848724673492, "compression_ratio": 1.7, "no_speech_prob": + 0.0014563280856236815}, {"id": 245, "seek": 160864, "start": 1608.64, "end": 1614.3200000000002, + "text": " your own employees, a hired crowd, what have you to say, look, can you + just label these?", "tokens": [50364, 428, 1065, 6619, 11, 257, 13144, 6919, 11, + 437, 362, 291, 281, 584, 11, 574, 11, 393, 
291, 445, 7645, 613, 30, 50648], "temperature": + 0.0, "avg_logprob": -0.14993516540527344, "compression_ratio": 1.7462686567164178, + "no_speech_prob": 0.013557023368775845}, {"id": 246, "seek": 160864, "start": 1614.3200000000002, + "end": 1619.0400000000002, "text": " I''m not going to label more than hundreds + or maybe thousands of these queries that way.", "tokens": [50648, 286, 478, 406, + 516, 281, 7645, 544, 813, 6779, 420, 1310, 5383, 295, 613, 24109, 300, 636, 13, + 50884], "temperature": 0.0, "avg_logprob": -0.14993516540527344, "compression_ratio": + 1.7462686567164178, "no_speech_prob": 0.013557023368775845}, {"id": 247, "seek": + 160864, "start": 1619.0400000000002, "end": 1624.0, "text": " At the hundreds of + thousands it starts getting a bit silly. But you can do that and you can say,", + "tokens": [50884, 1711, 264, 6779, 295, 5383, 309, 3719, 1242, 257, 857, 11774, + 13, 583, 291, 393, 360, 300, 293, 291, 393, 584, 11, 51132], "temperature": 0.0, + "avg_logprob": -0.14993516540527344, "compression_ratio": 1.7462686567164178, "no_speech_prob": + 0.013557023368775845}, {"id": 248, "seek": 160864, "start": 1624.0, "end": 1630.3200000000002, + "text": " okay, maybe you have now handled 20, 30% of my traffic that way. It''s + not uncommon that in 10,000", "tokens": [51132, 1392, 11, 1310, 291, 362, 586, 18033, + 945, 11, 2217, 4, 295, 452, 6419, 300, 636, 13, 467, 311, 406, 29289, 300, 294, + 1266, 11, 1360, 51448], "temperature": 0.0, "avg_logprob": -0.14993516540527344, + "compression_ratio": 1.7462686567164178, "no_speech_prob": 0.013557023368775845}, + {"id": 249, "seek": 160864, "start": 1630.3200000000002, "end": 1635.76, "text": + " queries you easily get to that. And you can see great. 
Now that I''ve done that, + now that I know,", "tokens": [51448, 24109, 291, 3612, 483, 281, 300, 13, 400, 291, + 393, 536, 869, 13, 823, 300, 286, 600, 1096, 300, 11, 586, 300, 286, 458, 11, 51720], + "temperature": 0.0, "avg_logprob": -0.14993516540527344, "compression_ratio": 1.7462686567164178, + "no_speech_prob": 0.013557023368775845}, {"id": 250, "seek": 163576, "start": 1636.72, + "end": 1642.4, "text": " that this person is looking for a job title, that the language + is Turkish, or what have you,", "tokens": [50412, 300, 341, 954, 307, 1237, 337, + 257, 1691, 4876, 11, 300, 264, 2856, 307, 18565, 11, 420, 437, 362, 291, 11, 50696], + "temperature": 0.0, "avg_logprob": -0.16622227755459873, "compression_ratio": 1.7359307359307359, + "no_speech_prob": 0.0031928035896271467}, {"id": 251, "seek": 163576, "start": 1643.04, + "end": 1649.84, "text": " what would I do with that? And I''m like, well, I''m going + to have a particular search experience in mind.", "tokens": [50728, 437, 576, 286, + 360, 365, 300, 30, 400, 286, 478, 411, 11, 731, 11, 286, 478, 516, 281, 362, 257, + 1729, 3164, 1752, 294, 1575, 13, 51068], "temperature": 0.0, "avg_logprob": -0.16622227755459873, + "compression_ratio": 1.7359307359307359, "no_speech_prob": 0.0031928035896271467}, + {"id": 252, "seek": 163576, "start": 1650.72, "end": 1655.2, "text": " If I know + that it''s going to be in jobs, I won''t look in people, or I won''t look in my + content posts.", "tokens": [51112, 759, 286, 458, 300, 309, 311, 516, 281, 312, + 294, 4782, 11, 286, 1582, 380, 574, 294, 561, 11, 420, 286, 1582, 380, 574, 294, + 452, 2701, 12300, 13, 51336], "temperature": 0.0, "avg_logprob": -0.16622227755459873, + "compression_ratio": 1.7359307359307359, "no_speech_prob": 0.0031928035896271467}, + {"id": 253, "seek": 163576, "start": 1655.2, "end": 1661.6, "text": " If I know + what language it is, I''m going to grab from that repository and so forth. 
And you + can learn", "tokens": [51336, 759, 286, 458, 437, 2856, 309, 307, 11, 286, 478, + 516, 281, 4444, 490, 300, 25841, 293, 370, 5220, 13, 400, 291, 393, 1466, 51656], + "temperature": 0.0, "avg_logprob": -0.16622227755459873, "compression_ratio": 1.7359307359307359, + "no_speech_prob": 0.0031928035896271467}, {"id": 254, "seek": 166160, "start": 1661.6799999999998, + "end": 1667.52, "text": " what you would do there. Now, this won''t scale into the + tail of your distribution.", "tokens": [50368, 437, 291, 576, 360, 456, 13, 823, + 11, 341, 1582, 380, 4373, 666, 264, 6838, 295, 428, 7316, 13, 50660], "temperature": + 0.0, "avg_logprob": -0.151982239314488, "compression_ratio": 1.7105263157894737, + "no_speech_prob": 0.025371821597218513}, {"id": 255, "seek": 166160, "start": 1668.1599999999999, + "end": 1673.04, "text": " But you can learn what happens with that sort of experience. + And that''s actually really important,", "tokens": [50692, 583, 291, 393, 1466, + 437, 2314, 365, 300, 1333, 295, 1752, 13, 400, 300, 311, 767, 534, 1021, 11, 50936], + "temperature": 0.0, "avg_logprob": -0.151982239314488, "compression_ratio": 1.7105263157894737, + "no_speech_prob": 0.025371821597218513}, {"id": 256, "seek": 166160, "start": 1673.04, + "end": 1678.48, "text": " because sometimes you don''t know what people react to + until you show it. 
There''s a bit of a", "tokens": [50936, 570, 2171, 291, 500, + 380, 458, 437, 561, 4515, 281, 1826, 291, 855, 309, 13, 821, 311, 257, 857, 295, + 257, 51208], "temperature": 0.0, "avg_logprob": -0.151982239314488, "compression_ratio": + 1.7105263157894737, "no_speech_prob": 0.025371821597218513}, {"id": 257, "seek": + 166160, "start": 1678.48, "end": 1683.6, "text": " chicken in head in these things + as to what is the quality of your data, but also what is the experience.", "tokens": + [51208, 4662, 294, 1378, 294, 613, 721, 382, 281, 437, 307, 264, 3125, 295, 428, + 1412, 11, 457, 611, 437, 307, 264, 1752, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.151982239314488, "compression_ratio": 1.7105263157894737, "no_speech_prob": 0.025371821597218513}, + {"id": 258, "seek": 166160, "start": 1683.6, "end": 1687.84, "text": " But once + you''ve decided, okay, I am going to pursue this sort of experience.", "tokens": + [51464, 583, 1564, 291, 600, 3047, 11, 1392, 11, 286, 669, 516, 281, 12392, 341, + 1333, 295, 1752, 13, 51676], "temperature": 0.0, "avg_logprob": -0.151982239314488, + "compression_ratio": 1.7105263157894737, "no_speech_prob": 0.025371821597218513}, + {"id": 259, "seek": 168784, "start": 1688.8, "end": 1693.4399999999998, "text": + " Frankly, without machine learning, you''re never going to scale it. 
You''re not + going to label everything", "tokens": [50412, 41344, 11, 1553, 3479, 2539, 11, 291, + 434, 1128, 516, 281, 4373, 309, 13, 509, 434, 406, 516, 281, 7645, 1203, 50644], + "temperature": 0.0, "avg_logprob": -0.20036427661626025, "compression_ratio": 1.780701754385965, + "no_speech_prob": 0.035742081701755524}, {"id": 260, "seek": 168784, "start": 1693.4399999999998, + "end": 1701.36, "text": " in a rule-based approach to try to figure out what language + something is in, or what category something", "tokens": [50644, 294, 257, 4978, + 12, 6032, 3109, 281, 853, 281, 2573, 484, 437, 2856, 746, 307, 294, 11, 420, 437, + 7719, 746, 51040], "temperature": 0.0, "avg_logprob": -0.20036427661626025, "compression_ratio": + 1.780701754385965, "no_speech_prob": 0.035742081701755524}, {"id": 261, "seek": + 168784, "start": 1701.36, "end": 1709.6799999999998, "text": " is in simply isn''t + going to scale. In the case, for example, language, it''s not like you''re going", + "tokens": [51040, 307, 294, 2935, 1943, 380, 516, 281, 4373, 13, 682, 264, 1389, + 11, 337, 1365, 11, 2856, 11, 309, 311, 406, 411, 291, 434, 516, 51456], "temperature": + 0.0, "avg_logprob": -0.20036427661626025, "compression_ratio": 1.780701754385965, + "no_speech_prob": 0.035742081701755524}, {"id": 262, "seek": 168784, "start": 1709.6799999999998, + "end": 1715.1999999999998, "text": " to just build dictionary, it''s because you''ll + have cognates between the languages, or in the case of", "tokens": [51456, 281, + 445, 1322, 25890, 11, 309, 311, 570, 291, 603, 362, 11786, 1024, 1296, 264, 8650, + 11, 420, 294, 264, 1389, 295, 51732], "temperature": 0.0, "avg_logprob": -0.20036427661626025, + "compression_ratio": 1.780701754385965, "no_speech_prob": 0.035742081701755524}, + {"id": 263, "seek": 171520, "start": 1716.0, "end": 1723.2, "text": " job titles. 
+ By the time you get to Chief Vector Search Ninja, you''re going to be in a bit of + trouble", "tokens": [50404, 1691, 12992, 13, 3146, 264, 565, 291, 483, 281, 10068, + 691, 20814, 17180, 25566, 11, 291, 434, 516, 281, 312, 294, 257, 857, 295, 5253, + 50764], "temperature": 0.0, "avg_logprob": -0.1890827781275699, "compression_ratio": + 1.5925925925925926, "no_speech_prob": 0.003942777868360281}, {"id": 264, "seek": + 171520, "start": 1723.2, "end": 1730.64, "text": " as to recognize and bad as someone''s + title. So that''s the point at which collecting training data", "tokens": [50764, + 382, 281, 5521, 293, 1578, 382, 1580, 311, 4876, 13, 407, 300, 311, 264, 935, 412, + 597, 12510, 3097, 1412, 51136], "temperature": 0.0, "avg_logprob": -0.1890827781275699, + "compression_ratio": 1.5925925925925926, "no_speech_prob": 0.003942777868360281}, + {"id": 265, "seek": 171520, "start": 1730.64, "end": 1735.2, "text": " becomes critical. + One of the nice things is if you''ve done some of the work by hands that can actually", + "tokens": [51136, 3643, 4924, 13, 1485, 295, 264, 1481, 721, 307, 498, 291, 600, + 1096, 512, 295, 264, 589, 538, 2377, 300, 393, 767, 51364], "temperature": 0.0, + "avg_logprob": -0.1890827781275699, "compression_ratio": 1.5925925925925926, "no_speech_prob": + 0.003942777868360281}, {"id": 266, "seek": 171520, "start": 1735.2, "end": 1740.0, + "text": " be how you bootstrap training data for these approaches, especially if + you don''t have", "tokens": [51364, 312, 577, 291, 11450, 372, 4007, 3097, 1412, + 337, 613, 11587, 11, 2318, 498, 291, 500, 380, 362, 51604], "temperature": 0.0, + "avg_logprob": -0.1890827781275699, "compression_ratio": 1.5925925925925926, "no_speech_prob": + 0.003942777868360281}, {"id": 267, "seek": 174000, "start": 1740.08, "end": 1743.68, + "text": " our data position to do so using feedback from your own search application.", + "tokens": [50368, 527, 1412, 2535, 281, 360, 370, 1228, 5824, 490, 428, 1065, 3164, + 3861, 
13, 50548], "temperature": 0.0, "avg_logprob": -0.21619413275467722, "compression_ratio": + 1.5358649789029535, "no_speech_prob": 0.009674756787717342}, {"id": 268, "seek": + 174000, "start": 1744.64, "end": 1751.92, "text": " Yeah, absolutely. A quick shout + out to our audience. I think it''s, if I''m reading it right,", "tokens": [50596, + 865, 11, 3122, 13, 316, 1702, 8043, 484, 281, 527, 4034, 13, 286, 519, 309, 311, + 11, 498, 286, 478, 3760, 309, 558, 11, 50960], "temperature": 0.0, "avg_logprob": + -0.21619413275467722, "compression_ratio": 1.5358649789029535, "no_speech_prob": + 0.009674756787717342}, {"id": 269, "seek": 174000, "start": 1752.88, "end": 1760.4, + "text": " just a second, Andre has a poll of how many people in this call are using + hand-to-and-boost", "tokens": [51008, 445, 257, 1150, 11, 20667, 575, 257, 6418, + 295, 577, 867, 561, 294, 341, 818, 366, 1228, 1011, 12, 1353, 12, 474, 12, 1763, + 555, 51384], "temperature": 0.0, "avg_logprob": -0.21619413275467722, "compression_ratio": + 1.5358649789029535, "no_speech_prob": 0.009674756787717342}, {"id": 270, "seek": + 174000, "start": 1760.4, "end": 1767.84, "text": " versus machine learning. I''m + really interested to hear or read this opinion. 
Maybe you can say about it.", "tokens": + [51384, 5717, 3479, 2539, 13, 286, 478, 534, 3102, 281, 1568, 420, 1401, 341, 4800, + 13, 2704, 291, 393, 584, 466, 309, 13, 51756], "temperature": 0.0, "avg_logprob": + -0.21619413275467722, "compression_ratio": 1.5358649789029535, "no_speech_prob": + 0.009674756787717342}, {"id": 271, "seek": 176784, "start": 1768.8, "end": 1777.6799999999998, + "text": " Yeah, and on the other hand, you''ve been advocating a lot on drilling + into your content.", "tokens": [50412, 865, 11, 293, 322, 264, 661, 1011, 11, 291, + 600, 668, 32050, 257, 688, 322, 26290, 666, 428, 2701, 13, 50856], "temperature": + 0.0, "avg_logprob": -0.1798839997709467, "compression_ratio": 1.592920353982301, + "no_speech_prob": 0.011386252008378506}, {"id": 272, "seek": 176784, "start": 1777.6799999999998, + "end": 1783.28, "text": " And of course, some companies do this one way or another. + But can you illuminate us on what you", "tokens": [50856, 400, 295, 1164, 11, 512, + 3431, 360, 341, 472, 636, 420, 1071, 13, 583, 393, 291, 28593, 473, 505, 322, 437, + 291, 51136], "temperature": 0.0, "avg_logprob": -0.1798839997709467, "compression_ratio": + 1.592920353982301, "no_speech_prob": 0.011386252008378506}, {"id": 273, "seek": + 176784, "start": 1783.28, "end": 1789.28, "text": " can do also on the content understanding + side? Sure. 
So if you think about it, if all you", "tokens": [51136, 393, 360, 611, + 322, 264, 2701, 3701, 1252, 30, 4894, 13, 407, 498, 291, 519, 466, 309, 11, 498, + 439, 291, 51436], "temperature": 0.0, "avg_logprob": -0.1798839997709467, "compression_ratio": + 1.592920353982301, "no_speech_prob": 0.011386252008378506}, {"id": 274, "seek": + 176784, "start": 1789.28, "end": 1793.04, "text": " need was query understanding, + you might be able to figure out exactly what the search", "tokens": [51436, 643, + 390, 14581, 3701, 11, 291, 1062, 312, 1075, 281, 2573, 484, 2293, 437, 264, 3164, + 51624], "temperature": 0.0, "avg_logprob": -0.1798839997709467, "compression_ratio": + 1.592920353982301, "no_speech_prob": 0.011386252008378506}, {"id": 275, "seek": + 179304, "start": 1793.2, "end": 1799.84, "text": " or wants, but actually not be + able to find it. So content understanding is really what you''re doing", "tokens": + [50372, 420, 2738, 11, 457, 767, 406, 312, 1075, 281, 915, 309, 13, 407, 2701, 3701, + 307, 534, 437, 291, 434, 884, 50704], "temperature": 0.0, "avg_logprob": -0.17553130529260122, + "compression_ratio": 1.648068669527897, "no_speech_prob": 0.0043566529639065266}, + {"id": 276, "seek": 179304, "start": 1799.84, "end": 1805.28, "text": " in order + to represent content in your index and the best way to make it retrievable, it''s", + "tokens": [50704, 294, 1668, 281, 2906, 2701, 294, 428, 8186, 293, 264, 1151, 636, + 281, 652, 309, 19817, 17915, 11, 309, 311, 50976], "temperature": 0.0, "avg_logprob": + -0.17553130529260122, "compression_ratio": 1.648068669527897, "no_speech_prob": + 0.0043566529639065266}, {"id": 277, "seek": 179304, "start": 1805.28, "end": 1812.72, + "text": " horrible. So certainly, it''s a great place to do things like categorization. 
+ This is especially", "tokens": [50976, 9263, 13, 407, 3297, 11, 309, 311, 257, 869, + 1081, 281, 360, 721, 411, 19250, 2144, 13, 639, 307, 2318, 51348], "temperature": + 0.0, "avg_logprob": -0.17553130529260122, "compression_ratio": 1.648068669527897, + "no_speech_prob": 0.0043566529639065266}, {"id": 278, "seek": 179304, "start": 1812.72, + "end": 1818.48, "text": " true to say if you have a marketplace or if you have a + lot of unstructured content where you don''t", "tokens": [51348, 2074, 281, 584, + 498, 291, 362, 257, 19455, 420, 498, 291, 362, 257, 688, 295, 18799, 46847, 2701, + 689, 291, 500, 380, 51636], "temperature": 0.0, "avg_logprob": -0.17553130529260122, + "compression_ratio": 1.648068669527897, "no_speech_prob": 0.0043566529639065266}, + {"id": 279, "seek": 181848, "start": 1818.48, "end": 1826.08, "text": " necessarily + know what the content is about. It''s also a good place to extract entities,", "tokens": + [50364, 4725, 458, 437, 264, 2701, 307, 466, 13, 467, 311, 611, 257, 665, 1081, + 281, 8947, 16667, 11, 50744], "temperature": 0.0, "avg_logprob": -0.1291461622858622, + "compression_ratio": 1.613733905579399, "no_speech_prob": 0.0006621080101467669}, + {"id": 280, "seek": 181848, "start": 1826.08, "end": 1832.24, "text": " terminology, + even determined potentially the terminology that''s used for representing it. I + mean,", "tokens": [50744, 27575, 11, 754, 9540, 7263, 264, 27575, 300, 311, 1143, + 337, 13460, 309, 13, 286, 914, 11, 51052], "temperature": 0.0, "avg_logprob": -0.1291461622858622, + "compression_ratio": 1.613733905579399, "no_speech_prob": 0.0006621080101467669}, + {"id": 281, "seek": 181848, "start": 1832.24, "end": 1839.68, "text": " imagine + it. For example, you have a collection of research papers. 
You can discover the + useful", "tokens": [51052, 3811, 309, 13, 1171, 1365, 11, 291, 362, 257, 5765, 295, + 2132, 10577, 13, 509, 393, 4411, 264, 4420, 51424], "temperature": 0.0, "avg_logprob": + -0.1291461622858622, "compression_ratio": 1.613733905579399, "no_speech_prob": 0.0006621080101467669}, + {"id": 282, "seek": 181848, "start": 1839.68, "end": 1846.32, "text": " words or + phrases that tend to carry meaning. You can relate them to one another by putting + them", "tokens": [51424, 2283, 420, 20312, 300, 3928, 281, 3985, 3620, 13, 509, + 393, 10961, 552, 281, 472, 1071, 538, 3372, 552, 51756], "temperature": 0.0, "avg_logprob": + -0.1291461622858622, "compression_ratio": 1.613733905579399, "no_speech_prob": 0.0006621080101467669}, + {"id": 283, "seek": 184632, "start": 1846.3999999999999, "end": 1853.04, "text": + " in a vector space where the distance between the vectors that tells you how similar + they are,", "tokens": [50368, 294, 257, 8062, 1901, 689, 264, 4560, 1296, 264, 18875, + 300, 5112, 291, 577, 2531, 436, 366, 11, 50700], "temperature": 0.0, "avg_logprob": + -0.16696501413981119, "compression_ratio": 1.6605504587155964, "no_speech_prob": + 0.0012674053432419896}, {"id": 284, "seek": 184632, "start": 1853.04, "end": 1860.56, + "text": " you can cluster those. 
So in general, doing things that involve either + classification or", "tokens": [50700, 291, 393, 13630, 729, 13, 407, 294, 2674, + 11, 884, 721, 300, 9494, 2139, 21538, 420, 51076], "temperature": 0.0, "avg_logprob": + -0.16696501413981119, "compression_ratio": 1.6605504587155964, "no_speech_prob": + 0.0012674053432419896}, {"id": 285, "seek": 184632, "start": 1861.6799999999998, + "end": 1868.08, "text": " essentially annotation recognizing entities or turns in + those allows you to enrich the way", "tokens": [51132, 4476, 48654, 18538, 16667, + 420, 4523, 294, 729, 4045, 291, 281, 18849, 264, 636, 51452], "temperature": 0.0, + "avg_logprob": -0.16696501413981119, "compression_ratio": 1.6605504587155964, "no_speech_prob": + 0.0012674053432419896}, {"id": 286, "seek": 184632, "start": 1868.08, "end": 1874.8799999999999, + "text": " you index the content. You can also figure out when documents are similar + to one another", "tokens": [51452, 291, 8186, 264, 2701, 13, 509, 393, 611, 2573, + 484, 562, 8512, 366, 2531, 281, 472, 1071, 51792], "temperature": 0.0, "avg_logprob": + -0.16696501413981119, "compression_ratio": 1.6605504587155964, "no_speech_prob": + 0.0012674053432419896}, {"id": 287, "seek": 187488, "start": 1875.68, "end": 1881.1200000000001, + "text": " because when you have these vector representations, you can take the entire + document or part of", "tokens": [50404, 570, 562, 291, 362, 613, 8062, 33358, 11, + 291, 393, 747, 264, 2302, 4166, 420, 644, 295, 50676], "temperature": 0.0, "avg_logprob": + -0.09322117675434459, "compression_ratio": 1.8476190476190477, "no_speech_prob": + 0.003261485369876027}, {"id": 288, "seek": 187488, "start": 1881.1200000000001, + "end": 1886.88, "text": " the document and do that. 
And that can be useful for saying, + oh, if you''re interested in this document,", "tokens": [50676, 264, 4166, 293, + 360, 300, 13, 400, 300, 393, 312, 4420, 337, 1566, 11, 1954, 11, 498, 291, 434, + 3102, 294, 341, 4166, 11, 50964], "temperature": 0.0, "avg_logprob": -0.09322117675434459, + "compression_ratio": 1.8476190476190477, "no_speech_prob": 0.003261485369876027}, + {"id": 289, "seek": 187488, "start": 1886.88, "end": 1891.1200000000001, "text": + " you might be interested in these other ones or maybe you''re interested in these + other ones,", "tokens": [50964, 291, 1062, 312, 3102, 294, 613, 661, 2306, 420, + 1310, 291, 434, 3102, 294, 613, 661, 2306, 11, 51176], "temperature": 0.0, "avg_logprob": + -0.09322117675434459, "compression_ratio": 1.8476190476190477, "no_speech_prob": + 0.003261485369876027}, {"id": 290, "seek": 187488, "start": 1891.1200000000001, + "end": 1899.0400000000002, "text": " but they''re more recent. And that allows you + to combine what content is about with other factors", "tokens": [51176, 457, 436, + 434, 544, 5162, 13, 400, 300, 4045, 291, 281, 10432, 437, 2701, 307, 466, 365, 661, + 6771, 51572], "temperature": 0.0, "avg_logprob": -0.09322117675434459, "compression_ratio": + 1.8476190476190477, "no_speech_prob": 0.003261485369876027}, {"id": 291, "seek": + 189904, "start": 1899.04, "end": 1905.12, "text": " like its recent seeds, popularity, + other people that look at it. 
And you see this often not just", "tokens": [50364, + 411, 1080, 5162, 9203, 11, 19301, 11, 661, 561, 300, 574, 412, 309, 13, 400, 291, + 536, 341, 2049, 406, 445, 50668], "temperature": 0.0, "avg_logprob": -0.17833597519818475, + "compression_ratio": 1.5950413223140496, "no_speech_prob": 0.0008550725760869682}, + {"id": 292, "seek": 189904, "start": 1905.12, "end": 1910.24, "text": " for search, + but specifically for for being an engine for recommendations that are triggered", + "tokens": [50668, 337, 3164, 11, 457, 4682, 337, 337, 885, 364, 2848, 337, 10434, + 300, 366, 21710, 50924], "temperature": 0.0, "avg_logprob": -0.17833597519818475, + "compression_ratio": 1.5950413223140496, "no_speech_prob": 0.0008550725760869682}, + {"id": 293, "seek": 189904, "start": 1910.24, "end": 1916.8, "text": " from discovering + something through an initial search. So all of these things basically make the", + "tokens": [50924, 490, 24773, 746, 807, 364, 5883, 3164, 13, 407, 439, 295, 613, + 721, 1936, 652, 264, 51252], "temperature": 0.0, "avg_logprob": -0.17833597519818475, + "compression_ratio": 1.5950413223140496, "no_speech_prob": 0.0008550725760869682}, + {"id": 294, "seek": 189904, "start": 1916.8, "end": 1925.28, "text": " content more + retrievable, but also more exploreable. Yeah, absolutely. 
I can also add to that + in some", "tokens": [51252, 2701, 544, 19817, 17915, 11, 457, 611, 544, 6839, 712, + 13, 865, 11, 3122, 13, 286, 393, 611, 909, 281, 300, 294, 512, 51676], "temperature": + 0.0, "avg_logprob": -0.17833597519818475, "compression_ratio": 1.5950413223140496, + "no_speech_prob": 0.0008550725760869682}, {"id": 295, "seek": 192528, "start": 1925.28, + "end": 1930.6399999999999, "text": " settings, specifically in financial search, + I''m happy to work at the company called AlphaSense.", "tokens": [50364, 6257, 11, + 4682, 294, 4669, 3164, 11, 286, 478, 2055, 281, 589, 412, 264, 2237, 1219, 20588, + 50, 1288, 13, 50632], "temperature": 0.0, "avg_logprob": -0.17023271924993966, "compression_ratio": + 1.5829787234042554, "no_speech_prob": 0.0035638187546283007}, {"id": 296, "seek": + 192528, "start": 1931.52, "end": 1936.96, "text": " You may end up in an institution + when you cannot actually use the hints from the users,", "tokens": [50676, 509, + 815, 917, 493, 294, 364, 7818, 562, 291, 2644, 767, 764, 264, 27271, 490, 264, 5022, + 11, 50948], "temperature": 0.0, "avg_logprob": -0.17023271924993966, "compression_ratio": + 1.5829787234042554, "no_speech_prob": 0.0035638187546283007}, {"id": 297, "seek": + 192528, "start": 1936.96, "end": 1944.16, "text": " right? So for instance, like + if you do a not a suggest and you extract themes from queries,", "tokens": [50948, + 558, 30, 407, 337, 5197, 11, 411, 498, 291, 360, 257, 406, 257, 3402, 293, 291, + 8947, 13544, 490, 24109, 11, 51308], "temperature": 0.0, "avg_logprob": -0.17023271924993966, + "compression_ratio": 1.5829787234042554, "no_speech_prob": 0.0035638187546283007}, + {"id": 298, "seek": 192528, "start": 1944.8, "end": 1949.92, "text": " you could + actually do that. I believe Google does that to some extent. 
But in financial setting,", + "tokens": [51340, 291, 727, 767, 360, 300, 13, 286, 1697, 3329, 775, 300, 281, 512, + 8396, 13, 583, 294, 4669, 3287, 11, 51596], "temperature": 0.0, "avg_logprob": -0.17023271924993966, + "compression_ratio": 1.5829787234042554, "no_speech_prob": 0.0035638187546283007}, + {"id": 299, "seek": 194992, "start": 1949.92, "end": 1956.88, "text": " you cannot + do this because banks will prohibit using their searches with their arrivals, right?", + "tokens": [50364, 291, 2644, 360, 341, 570, 10237, 486, 16015, 270, 1228, 641, 26701, + 365, 641, 3399, 19778, 11, 558, 30, 50712], "temperature": 0.0, "avg_logprob": -0.10277369747991147, + "compression_ratio": 1.640495867768595, "no_speech_prob": 0.0002468499878887087}, + {"id": 300, "seek": 194992, "start": 1956.88, "end": 1963.44, "text": " You don''t + want to do that ever. And so at that point, you do go deeper into content understanding", + "tokens": [50712, 509, 500, 380, 528, 281, 360, 300, 1562, 13, 400, 370, 412, 300, + 935, 11, 291, 360, 352, 7731, 666, 2701, 3701, 51040], "temperature": 0.0, "avg_logprob": + -0.10277369747991147, "compression_ratio": 1.640495867768595, "no_speech_prob": + 0.0002468499878887087}, {"id": 301, "seek": 194992, "start": 1963.44, "end": 1969.92, + "text": " and you start extracting stable themes and maybe over time you can also + extract trends as they show up.", "tokens": [51040, 293, 291, 722, 49844, 8351, + 13544, 293, 1310, 670, 565, 291, 393, 611, 8947, 13892, 382, 436, 855, 493, 13, + 51364], "temperature": 0.0, "avg_logprob": -0.10277369747991147, "compression_ratio": + 1.640495867768595, "no_speech_prob": 0.0002468499878887087}, {"id": 302, "seek": + 194992, "start": 1969.92, "end": 1976.8000000000002, "text": " And that might be + one way to kind of combat the issue of not being able to use queries to influence", + "tokens": [51364, 400, 300, 1062, 312, 472, 636, 281, 733, 295, 8361, 264, 2734, + 295, 406, 885, 1075, 281, 764, 24109, 281, 6503, 
51708], "temperature": 0.0, "avg_logprob": + -0.10277369747991147, "compression_ratio": 1.640495867768595, "no_speech_prob": + 0.0002468499878887087}, {"id": 303, "seek": 197680, "start": 1976.8, "end": 1983.04, + "text": " your model. But yeah, you might have another setting. I''m curious to + hear in the audience as well,", "tokens": [50364, 428, 2316, 13, 583, 1338, 11, + 291, 1062, 362, 1071, 3287, 13, 286, 478, 6369, 281, 1568, 294, 264, 4034, 382, + 731, 11, 50676], "temperature": 0.0, "avg_logprob": -0.10619805933354975, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.0020618329290300608}, {"id": 304, "seek": + 197680, "start": 1983.04, "end": 1992.48, "text": " what kind of setting you guys + have. And my next question would be on what I available data sets.", "tokens": [50676, + 437, 733, 295, 3287, 291, 1074, 362, 13, 400, 452, 958, 1168, 576, 312, 322, 437, + 286, 2435, 1412, 6352, 13, 51148], "temperature": 0.0, "avg_logprob": -0.10619805933354975, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.0020618329290300608}, + {"id": 305, "seek": 197680, "start": 1992.48, "end": 1998.1599999999999, "text": + " Let''s say if I want to practice query understanding or content understanding + at home in my lab,", "tokens": [51148, 961, 311, 584, 498, 286, 528, 281, 3124, + 14581, 3701, 420, 2701, 3701, 412, 1280, 294, 452, 2715, 11, 51432], "temperature": + 0.0, "avg_logprob": -0.10619805933354975, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.0020618329290300608}, {"id": 306, "seek": 197680, "start": 1999.04, + "end": 2003.9199999999998, "text": " what are the available data sets, tools and + algorithms that you can recommend that will allow us", "tokens": [51476, 437, 366, + 264, 2435, 1412, 6352, 11, 3873, 293, 14642, 300, 291, 393, 2748, 300, 486, 2089, + 505, 51720], "temperature": 0.0, "avg_logprob": -0.10619805933354975, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.0020618329290300608}, {"id": 
307, "seek": + 200392, "start": 2003.92, "end": 2008.5600000000002, "text": " to train these models + for both of these directions, query and content understanding?", "tokens": [50364, + 281, 3847, 613, 5245, 337, 1293, 295, 613, 11095, 11, 14581, 293, 2701, 3701, 30, + 50596], "temperature": 0.0, "avg_logprob": -0.15965761309084686, "compression_ratio": + 1.609865470852018, "no_speech_prob": 0.003707548836246133}, {"id": 308, "seek": + 200392, "start": 2009.68, "end": 2016.48, "text": " So as those of you took the + class, no, we''ve been using any commerce data set from Best Buy", "tokens": [50652, + 407, 382, 729, 295, 291, 1890, 264, 1508, 11, 572, 11, 321, 600, 668, 1228, 604, + 26320, 1412, 992, 490, 9752, 19146, 50992], "temperature": 0.0, "avg_logprob": -0.15965761309084686, + "compression_ratio": 1.609865470852018, "no_speech_prob": 0.003707548836246133}, + {"id": 309, "seek": 200392, "start": 2017.2, "end": 2022.72, "text": " for a teaching. + It''s a nice data set. It''s a little bit old, but it has a virtue that it has", + "tokens": [51028, 337, 257, 4571, 13, 467, 311, 257, 1481, 1412, 992, 13, 467, 311, + 257, 707, 857, 1331, 11, 457, 309, 575, 257, 20816, 300, 309, 575, 51304], "temperature": + 0.0, "avg_logprob": -0.15965761309084686, "compression_ratio": 1.609865470852018, + "no_speech_prob": 0.003707548836246133}, {"id": 310, "seek": 200392, "start": 2022.72, + "end": 2029.44, "text": " a bunch of structured data queries and some click data + as well. And that''s proven useful.", "tokens": [51304, 257, 3840, 295, 18519, 1412, + 24109, 293, 512, 2052, 1412, 382, 731, 13, 400, 300, 311, 12785, 4420, 13, 51640], + "temperature": 0.0, "avg_logprob": -0.15965761309084686, "compression_ratio": 1.609865470852018, + "no_speech_prob": 0.003707548836246133}, {"id": 311, "seek": 202944, "start": 2030.0800000000002, + "end": 2038.16, "text": " You can get that from Kaggle as they''ve made available + freely. 
And indeed, Kaggle,", "tokens": [50396, 509, 393, 483, 300, 490, 48751, + 22631, 382, 436, 600, 1027, 2435, 16433, 13, 400, 6451, 11, 48751, 22631, 11, 50800], + "temperature": 0.0, "avg_logprob": -0.2827581139497979, "compression_ratio": 1.55793991416309, + "no_speech_prob": 0.0021636176388710737}, {"id": 312, "seek": 202944, "start": 2038.16, + "end": 2043.76, "text": " which is at this point, the subsidiary of Google, but + Minkins independent brand is a great place", "tokens": [50800, 597, 307, 412, 341, + 935, 11, 264, 48296, 822, 295, 3329, 11, 457, 376, 475, 1292, 6695, 3360, 307, 257, + 869, 1081, 51080], "temperature": 0.0, "avg_logprob": -0.2827581139497979, "compression_ratio": + 1.55793991416309, "no_speech_prob": 0.0021636176388710737}, {"id": 313, "seek": + 202944, "start": 2043.76, "end": 2052.8, "text": " to get data sets. This one from + Best Buy, I think is probably the best all around one for exploring", "tokens": + [51080, 281, 483, 1412, 6352, 13, 639, 472, 490, 9752, 19146, 11, 286, 519, 307, + 1391, 264, 1151, 439, 926, 472, 337, 12736, 51532], "temperature": 0.0, "avg_logprob": + -0.2827581139497979, "compression_ratio": 1.55793991416309, "no_speech_prob": 0.0021636176388710737}, + {"id": 314, "seek": 202944, "start": 2053.52, "end": 2058.2400000000002, "text": + " the particularly query understanding, until that starts that content understanding.", + "tokens": [51568, 264, 4098, 14581, 3701, 11, 1826, 300, 3719, 300, 2701, 3701, + 13, 51804], "temperature": 0.0, "avg_logprob": -0.2827581139497979, "compression_ratio": + 1.55793991416309, "no_speech_prob": 0.0021636176388710737}, {"id": 315, "seek": + 205824, "start": 2058.24, "end": 2065.52, "text": " There are certainly other data + sets available. 
You can, for example, grab dumps of data from Wikipedia", "tokens": + [50364, 821, 366, 3297, 661, 1412, 6352, 2435, 13, 509, 393, 11, 337, 1365, 11, + 4444, 11430, 82, 295, 1412, 490, 28999, 50728], "temperature": 0.0, "avg_logprob": + -0.15187160282918852, "compression_ratio": 1.5174129353233832, "no_speech_prob": + 0.0003845634055323899}, {"id": 316, "seek": 205824, "start": 2066.24, "end": 2075.68, + "text": " that are fascinating Wikipedia is perhaps the best overall data set in + the world. But they''re in mind", "tokens": [50764, 300, 366, 10343, 28999, 307, + 4317, 264, 1151, 4787, 1412, 992, 294, 264, 1002, 13, 583, 436, 434, 294, 1575, + 51236], "temperature": 0.0, "avg_logprob": -0.15187160282918852, "compression_ratio": + 1.5174129353233832, "no_speech_prob": 0.0003845634055323899}, {"id": 317, "seek": + 205824, "start": 2075.68, "end": 2082.8799999999997, "text": " that it''s a bit + sprawling and that they don''t supply much in the way queries or feedback. And you''ll", + "tokens": [51236, 300, 309, 311, 257, 857, 22734, 1688, 293, 300, 436, 500, 380, + 5847, 709, 294, 264, 636, 24109, 420, 5824, 13, 400, 291, 603, 51596], "temperature": + 0.0, "avg_logprob": -0.15187160282918852, "compression_ratio": 1.5174129353233832, + "no_speech_prob": 0.0003845634055323899}, {"id": 318, "seek": 208288, "start": 2082.88, + "end": 2089.12, "text": " have to do a little bit of a work with that. 
There''s + a data set called MSMarco that''s been very", "tokens": [50364, 362, 281, 360, 257, + 707, 857, 295, 257, 589, 365, 300, 13, 821, 311, 257, 1412, 992, 1219, 7395, 16639, + 1291, 300, 311, 668, 588, 50676], "temperature": 0.0, "avg_logprob": -0.12836932110530075, + "compression_ratio": 1.6, "no_speech_prob": 0.002760231727734208}, {"id": 319, "seek": + 208288, "start": 2089.12, "end": 2097.28, "text": " popular with essentially the + deep learning crowd because it''s an interesting place for doing question", "tokens": + [50676, 3743, 365, 4476, 264, 2452, 2539, 6919, 570, 309, 311, 364, 1880, 1081, + 337, 884, 1168, 51084], "temperature": 0.0, "avg_logprob": -0.12836932110530075, + "compression_ratio": 1.6, "no_speech_prob": 0.002760231727734208}, {"id": 320, "seek": + 208288, "start": 2097.28, "end": 2105.6, "text": " answering. So I think a lot of + the question becomes what is the problem that you want to work on?", "tokens": [51084, + 13430, 13, 407, 286, 519, 257, 688, 295, 264, 1168, 3643, 437, 307, 264, 1154, 300, + 291, 528, 281, 589, 322, 30, 51500], "temperature": 0.0, "avg_logprob": -0.12836932110530075, + "compression_ratio": 1.6, "no_speech_prob": 0.002760231727734208}, {"id": 321, "seek": + 208288, "start": 2105.6, "end": 2112.1600000000003, "text": " And I would say for + those of you who are already working in search and some capacity or at least", "tokens": + [51500, 400, 286, 576, 584, 337, 729, 295, 291, 567, 366, 1217, 1364, 294, 3164, + 293, 512, 6042, 420, 412, 1935, 51828], "temperature": 0.0, "avg_logprob": -0.12836932110530075, + "compression_ratio": 1.6, "no_speech_prob": 0.002760231727734208}, {"id": 322, "seek": + 211216, "start": 2112.16, "end": 2118.16, "text": " have access to data, you should + really consider trying to use your own data because usually the", "tokens": [50364, + 362, 2105, 281, 1412, 11, 291, 820, 534, 1949, 1382, 281, 764, 428, 1065, 1412, + 570, 2673, 264, 50664], "temperature": 0.0, "avg_logprob": 
-0.08956835643354669, + "compression_ratio": 1.6724137931034482, "no_speech_prob": 0.002382851904258132}, + {"id": 323, "seek": 211216, "start": 2118.16, "end": 2125.92, "text": " thing that + is hardest to get in public data sets is user behavior. For perfectly understandable + reasons,", "tokens": [50664, 551, 300, 307, 13158, 281, 483, 294, 1908, 1412, 6352, + 307, 4195, 5223, 13, 1171, 6239, 25648, 4112, 11, 51052], "temperature": 0.0, "avg_logprob": + -0.08956835643354669, "compression_ratio": 1.6724137931034482, "no_speech_prob": + 0.002382851904258132}, {"id": 324, "seek": 211216, "start": 2125.92, "end": 2132.56, + "text": " companies are not eager to share what their users have done either because + of the privacy", "tokens": [51052, 3431, 366, 406, 18259, 281, 2073, 437, 641, 5022, + 362, 1096, 2139, 570, 295, 264, 11427, 51384], "temperature": 0.0, "avg_logprob": + -0.08956835643354669, "compression_ratio": 1.6724137931034482, "no_speech_prob": + 0.002382851904258132}, {"id": 325, "seek": 211216, "start": 2132.56, "end": 2137.8399999999997, + "text": " constraints for their user or the competitive nature of that data. 
So + even if you''re able to find", "tokens": [51384, 18491, 337, 641, 4195, 420, 264, + 10043, 3687, 295, 300, 1412, 13, 407, 754, 498, 291, 434, 1075, 281, 915, 51648], + "temperature": 0.0, "avg_logprob": -0.08956835643354669, "compression_ratio": 1.6724137931034482, + "no_speech_prob": 0.002382851904258132}, {"id": 326, "seek": 213784, "start": 2137.84, + "end": 2144.32, "text": " catalog data, which you could, if it''s structured, use + to learn content understanding techniques.", "tokens": [50364, 19746, 1412, 11, + 597, 291, 727, 11, 498, 309, 311, 18519, 11, 764, 281, 1466, 2701, 3701, 7512, 13, + 50688], "temperature": 0.0, "avg_logprob": -0.12310325077601841, "compression_ratio": + 1.7026022304832713, "no_speech_prob": 0.010309484787285328}, {"id": 327, "seek": + 213784, "start": 2144.32, "end": 2150.32, "text": " For query understanding, the + most powerful thing you''re going to use is a collection of queries", "tokens": + [50688, 1171, 14581, 3701, 11, 264, 881, 4005, 551, 291, 434, 516, 281, 764, 307, + 257, 5765, 295, 24109, 50988], "temperature": 0.0, "avg_logprob": -0.12310325077601841, + "compression_ratio": 1.7026022304832713, "no_speech_prob": 0.010309484787285328}, + {"id": 328, "seek": 213784, "start": 2150.32, "end": 2155.1200000000003, "text": + " and labels for what those queries mean. 
But if you get a collection of data without + even having", "tokens": [50988, 293, 16949, 337, 437, 729, 24109, 914, 13, 583, + 498, 291, 483, 257, 5765, 295, 1412, 1553, 754, 1419, 51228], "temperature": 0.0, + "avg_logprob": -0.12310325077601841, "compression_ratio": 1.7026022304832713, "no_speech_prob": + 0.010309484787285328}, {"id": 329, "seek": 213784, "start": 2155.1200000000003, + "end": 2159.2000000000003, "text": " what the queries are and let alone the labels, + it''s a little bit more difficult.", "tokens": [51228, 437, 264, 24109, 366, 293, + 718, 3312, 264, 16949, 11, 309, 311, 257, 707, 857, 544, 2252, 13, 51432], "temperature": + 0.0, "avg_logprob": -0.12310325077601841, "compression_ratio": 1.7026022304832713, + "no_speech_prob": 0.010309484787285328}, {"id": 330, "seek": 213784, "start": 2160.4, + "end": 2166.4, "text": " Yeah, absolutely. And can you also share a bit on the tooling + side or maybe algorithms?", "tokens": [51492, 865, 11, 3122, 13, 400, 393, 291, + 611, 2073, 257, 857, 322, 264, 46593, 1252, 420, 1310, 14642, 30, 51792], "temperature": + 0.0, "avg_logprob": -0.12310325077601841, "compression_ratio": 1.7026022304832713, + "no_speech_prob": 0.010309484787285328}, {"id": 331, "seek": 216640, "start": 2167.28, + "end": 2175.2000000000003, "text": " Sure. 
So the, you know, for different problems, + obviously call for different tools.", "tokens": [50408, 4894, 13, 407, 264, 11, + 291, 458, 11, 337, 819, 2740, 11, 2745, 818, 337, 819, 3873, 13, 50804], "temperature": + 0.0, "avg_logprob": -0.2202834129333496, "compression_ratio": 1.450261780104712, + "no_speech_prob": 0.004265128634870052}, {"id": 332, "seek": 216640, "start": 2176.0, + "end": 2183.6800000000003, "text": " On the ranking side, one of the most popular + approaches that''s still in use today is XG boost,", "tokens": [50844, 1282, 264, + 17833, 1252, 11, 472, 295, 264, 881, 3743, 11587, 300, 311, 920, 294, 764, 965, + 307, 1783, 38, 9194, 11, 51228], "temperature": 0.0, "avg_logprob": -0.2202834129333496, + "compression_ratio": 1.450261780104712, "no_speech_prob": 0.004265128634870052}, + {"id": 333, "seek": 216640, "start": 2183.6800000000003, "end": 2190.7200000000003, + "text": " which you can get online easily enough. And it''s also been integrated + with, I think at this point,", "tokens": [51228, 597, 291, 393, 483, 2950, 3612, + 1547, 13, 400, 309, 311, 611, 668, 10919, 365, 11, 286, 519, 412, 341, 935, 11, + 51580], "temperature": 0.0, "avg_logprob": -0.2202834129333496, "compression_ratio": + 1.450261780104712, "no_speech_prob": 0.004265128634870052}, {"id": 334, "seek": + 219072, "start": 2190.72, "end": 2197.68, "text": " most of the major is certainly + Lucine based, sort of solar, elastant, and so forth.", "tokens": [50364, 881, 295, + 264, 2563, 307, 3297, 9593, 533, 2361, 11, 1333, 295, 7936, 11, 806, 525, 394, 11, + 293, 370, 5220, 13, 50712], "temperature": 0.0, "avg_logprob": -0.31679315464470975, + "compression_ratio": 1.5478260869565217, "no_speech_prob": 0.0013221359113231301}, + {"id": 335, "seek": 219072, "start": 2198.9599999999996, "end": 2205.68, "text": + " If you''re interested in classifying text or doing unsupervised learning and untext,", + "tokens": [50776, 759, 291, 434, 3102, 294, 1508, 5489, 2487, 420, 884, 2693, 12879, 
+ 24420, 2539, 293, 517, 25111, 11, 51112], "temperature": 0.0, "avg_logprob": -0.31679315464470975, + "compression_ratio": 1.5478260869565217, "no_speech_prob": 0.0013221359113231301}, + {"id": 336, "seek": 219072, "start": 2206.24, "end": 2213.04, "text": " there, you + know, these days, frankly, I would go directly to embedding based models. And you + can", "tokens": [51140, 456, 11, 291, 458, 11, 613, 1708, 11, 11939, 11, 286, 576, + 352, 3838, 281, 12240, 3584, 2361, 5245, 13, 400, 291, 393, 51480], "temperature": + 0.0, "avg_logprob": -0.31679315464470975, "compression_ratio": 1.5478260869565217, + "no_speech_prob": 0.0013221359113231301}, {"id": 337, "seek": 219072, "start": 2213.04, + "end": 2220.48, "text": " use tools like Burd or maybe the old school on the fan + or fast text that you can get online", "tokens": [51480, 764, 3873, 411, 7031, 67, + 420, 1310, 264, 1331, 1395, 322, 264, 3429, 420, 2370, 2487, 300, 291, 393, 483, + 2950, 51852], "temperature": 0.0, "avg_logprob": -0.31679315464470975, "compression_ratio": + 1.5478260869565217, "no_speech_prob": 0.0013221359113231301}, {"id": 338, "seek": + 222072, "start": 2221.12, "end": 2226.3999999999996, "text": " and you can download + those, you can install them on your laptop, you can even get pre-trained", "tokens": + [50384, 293, 291, 393, 5484, 729, 11, 291, 393, 3625, 552, 322, 428, 10732, 11, + 291, 393, 754, 483, 659, 12, 17227, 2001, 50648], "temperature": 0.0, "avg_logprob": + -0.16937419805633888, "compression_ratio": 1.6170212765957446, "no_speech_prob": + 0.0010273130610585213}, {"id": 339, "seek": 222072, "start": 2226.3999999999996, + "end": 2231.8399999999997, "text": " models for hundreds of languages that do so. + And from that, it should be easy enough. 
You can", "tokens": [50648, 5245, 337, + 6779, 295, 8650, 300, 360, 370, 13, 400, 490, 300, 11, 309, 820, 312, 1858, 1547, + 13, 509, 393, 50920], "temperature": 0.0, "avg_logprob": -0.16937419805633888, "compression_ratio": + 1.6170212765957446, "no_speech_prob": 0.0010273130610585213}, {"id": 340, "seek": + 222072, "start": 2231.8399999999997, "end": 2238.64, "text": " just walk through + the tutorials where you take just a bunch of labeled text examples in the case", + "tokens": [50920, 445, 1792, 807, 264, 17616, 689, 291, 747, 445, 257, 3840, 295, + 21335, 2487, 5110, 294, 264, 1389, 51260], "temperature": 0.0, "avg_logprob": -0.16937419805633888, + "compression_ratio": 1.6170212765957446, "no_speech_prob": 0.0010273130610585213}, + {"id": 341, "seek": 222072, "start": 2238.64, "end": 2246.3999999999996, "text": + " of past text, it''s an example of stack exchange cooking questions associated + with it with their", "tokens": [51260, 295, 1791, 2487, 11, 309, 311, 364, 1365, + 295, 8630, 7742, 6361, 1651, 6615, 365, 309, 365, 641, 51648], "temperature": 0.0, + "avg_logprob": -0.16937419805633888, "compression_ratio": 1.6170212765957446, "no_speech_prob": + 0.0010273130610585213}, {"id": 342, "seek": 224640, "start": 2246.4, "end": 2255.52, + "text": " labels. 
And in half an hour, you find that you''re actually doing content + classification from this.", "tokens": [50364, 16949, 13, 400, 294, 1922, 364, 1773, + 11, 291, 915, 300, 291, 434, 767, 884, 2701, 21538, 490, 341, 13, 50820], "temperature": + 0.0, "avg_logprob": -0.183808918656974, "compression_ratio": 1.7207207207207207, + "no_speech_prob": 0.000798052700702101}, {"id": 343, "seek": 224640, "start": 2255.52, + "end": 2263.36, "text": " And in the course that we do this sort of thing with the + best by data as well, it''s amazingly", "tokens": [50820, 400, 294, 264, 1164, 300, + 321, 360, 341, 1333, 295, 551, 365, 264, 1151, 538, 1412, 382, 731, 11, 309, 311, + 31762, 51212], "temperature": 0.0, "avg_logprob": -0.183808918656974, "compression_ratio": + 1.7207207207207207, "no_speech_prob": 0.000798052700702101}, {"id": 344, "seek": + 224640, "start": 2263.36, "end": 2270.4, "text": " easy to see that these sort of + tools will start to give you reasonable looking answers. Now,", "tokens": [51212, + 1858, 281, 536, 300, 613, 1333, 295, 3873, 486, 722, 281, 976, 291, 10585, 1237, + 6338, 13, 823, 11, 51564], "temperature": 0.0, "avg_logprob": -0.183808918656974, + "compression_ratio": 1.7207207207207207, "no_speech_prob": 0.000798052700702101}, + {"id": 345, "seek": 224640, "start": 2270.4, "end": 2274.48, "text": " getting for + reasonable answers to answers that you''re happy with and incorporating into a search", + "tokens": [51564, 1242, 337, 10585, 6338, 281, 6338, 300, 291, 434, 2055, 365, 293, + 33613, 666, 257, 3164, 51768], "temperature": 0.0, "avg_logprob": -0.183808918656974, + "compression_ratio": 1.7207207207207207, "no_speech_prob": 0.000798052700702101}, + {"id": 346, "seek": 227448, "start": 2274.48, "end": 2282.4, "text": " experience + can be the difference between an hour and a week or a month or but my hope is that", + "tokens": [50364, 1752, 393, 312, 264, 2649, 1296, 364, 1773, 293, 257, 1243, 420, + 257, 1618, 420, 457, 452, 1454, 307, 300, 
50760], "temperature": 0.0, "avg_logprob": + -0.18961044311523437, "compression_ratio": 1.6089965397923875, "no_speech_prob": + 0.03733779117465019}, {"id": 347, "seek": 227448, "start": 2283.28, "end": 2289.12, + "text": " by seeing how easy it is to get started with these, you get tempted enough + that you say,", "tokens": [50804, 538, 2577, 577, 1858, 309, 307, 281, 483, 1409, + 365, 613, 11, 291, 483, 29941, 1547, 300, 291, 584, 11, 51096], "temperature": 0.0, + "avg_logprob": -0.18961044311523437, "compression_ratio": 1.6089965397923875, "no_speech_prob": + 0.03733779117465019}, {"id": 348, "seek": 227448, "start": 2289.12, "end": 2294.4, + "text": " great, but 80% isn''t good enough. I need to get myself to something I''d + be willing to put", "tokens": [51096, 869, 11, 457, 4688, 4, 1943, 380, 665, 1547, + 13, 286, 643, 281, 483, 2059, 281, 746, 286, 1116, 312, 4950, 281, 829, 51360], + "temperature": 0.0, "avg_logprob": -0.18961044311523437, "compression_ratio": 1.6089965397923875, + "no_speech_prob": 0.03733779117465019}, {"id": 349, "seek": 227448, "start": 2294.4, + "end": 2299.04, "text": " in front of my customers. And to be fair, there''s a little + hard, more hard work to make that happen.", "tokens": [51360, 294, 1868, 295, 452, + 4581, 13, 400, 281, 312, 3143, 11, 456, 311, 257, 707, 1152, 11, 544, 1152, 589, + 281, 652, 300, 1051, 13, 51592], "temperature": 0.0, "avg_logprob": -0.18961044311523437, + "compression_ratio": 1.6089965397923875, "no_speech_prob": 0.03733779117465019}, + {"id": 350, "seek": 227448, "start": 2299.76, "end": 2304.0, "text": " Yeah, absolutely. + Vector''s search, by the way, is my favorite topic. 
I talk a lot about it.", "tokens": + [51628, 865, 11, 3122, 13, 691, 20814, 311, 3164, 11, 538, 264, 636, 11, 307, 452, + 2954, 4829, 13, 286, 751, 257, 688, 466, 309, 13, 51840], "temperature": 0.0, "avg_logprob": + -0.18961044311523437, "compression_ratio": 1.6089965397923875, "no_speech_prob": + 0.03733779117465019}, {"id": 351, "seek": 230400, "start": 2304.0, "end": 2311.92, + "text": " And the look as well. And I''m super, super happy that you mentioned this + now. And the gateway here", "tokens": [50364, 400, 264, 574, 382, 731, 13, 400, + 286, 478, 1687, 11, 1687, 2055, 300, 291, 2835, 341, 586, 13, 400, 264, 28532, 510, + 50760], "temperature": 0.0, "avg_logprob": -0.2149822363692723, "compression_ratio": + 1.6553191489361703, "no_speech_prob": 0.0009613718721084297}, {"id": 352, "seek": + 230400, "start": 2311.92, "end": 2317.52, "text": " to this topic is the connection + with content understanding is one of the techniques called", "tokens": [50760, 281, + 341, 4829, 307, 264, 4984, 365, 2701, 3701, 307, 472, 295, 264, 7512, 1219, 51040], + "temperature": 0.0, "avg_logprob": -0.2149822363692723, "compression_ratio": 1.6553191489361703, + "no_speech_prob": 0.0009613718721084297}, {"id": 353, "seek": 230400, "start": 2318.16, + "end": 2327.12, "text": " doc to query, essentially computes possible queries and + then augments your document. 
So you don''t", "tokens": [51072, 3211, 281, 14581, + 11, 4476, 715, 1819, 1944, 24109, 293, 550, 14501, 1117, 428, 4166, 13, 407, 291, + 500, 380, 51520], "temperature": 0.0, "avg_logprob": -0.2149822363692723, "compression_ratio": + 1.6553191489361703, "no_speech_prob": 0.0009613718721084297}, {"id": 354, "seek": + 230400, "start": 2327.12, "end": 2333.6, "text": " actually need to step in the + unknown field of vector search trying to re-engineer your search engine.", "tokens": + [51520, 767, 643, 281, 1823, 294, 264, 9841, 2519, 295, 8062, 3164, 1382, 281, 319, + 12, 25609, 260, 428, 3164, 2848, 13, 51844], "temperature": 0.0, "avg_logprob": + -0.2149822363692723, "compression_ratio": 1.6553191489361703, "no_speech_prob": + 0.0009613718721084297}, {"id": 355, "seek": 233360, "start": 2333.6, "end": 2338.7999999999997, + "text": " You can actually keep your search engine architecture as it is. And you + just basically augment your", "tokens": [50364, 509, 393, 767, 1066, 428, 3164, + 2848, 9482, 382, 309, 307, 13, 400, 291, 445, 1936, 29919, 428, 50624], "temperature": + 0.0, "avg_logprob": -0.18606754585548682, "compression_ratio": 1.7472118959107807, + "no_speech_prob": 0.0014540239935740829}, {"id": 356, "seek": 233360, "start": 2338.7999999999997, + "end": 2344.96, "text": " documents in the hope of increasing coverage and actually + precision at the same time of future queries.", "tokens": [50624, 8512, 294, 264, + 1454, 295, 5662, 9645, 293, 767, 18356, 412, 264, 912, 565, 295, 2027, 24109, 13, + 50932], "temperature": 0.0, "avg_logprob": -0.18606754585548682, "compression_ratio": + 1.7472118959107807, "no_speech_prob": 0.0014540239935740829}, {"id": 357, "seek": + 233360, "start": 2344.96, "end": 2350.7999999999997, "text": " So what''s your take + on this on these techniques, emerging techniques, but also what''s your take on", + "tokens": [50932, 407, 437, 311, 428, 747, 322, 341, 322, 613, 7512, 11, 14989, + 7512, 11, 457, 611, 437, 311, 428, 747, 322, 
51224], "temperature": 0.0, "avg_logprob": + -0.18606754585548682, "compression_ratio": 1.7472118959107807, "no_speech_prob": + 0.0014540239935740829}, {"id": 358, "seek": 233360, "start": 2350.7999999999997, + "end": 2354.88, "text": " on the role of vector search in general in the search + engine design today?", "tokens": [51224, 322, 264, 3090, 295, 8062, 3164, 294, 2674, + 294, 264, 3164, 2848, 1715, 965, 30, 51428], "temperature": 0.0, "avg_logprob": + -0.18606754585548682, "compression_ratio": 1.7472118959107807, "no_speech_prob": + 0.0014540239935740829}, {"id": 359, "seek": 233360, "start": 2355.6, "end": 2361.7599999999998, + "text": " Sure. So if you think about it, the Dr. Query approach is similar in spirit + to saying, well,", "tokens": [51464, 4894, 13, 407, 498, 291, 519, 466, 309, 11, + 264, 2491, 13, 2326, 2109, 3109, 307, 2531, 294, 3797, 281, 1566, 11, 731, 11, 51772], + "temperature": 0.0, "avg_logprob": -0.18606754585548682, "compression_ratio": 1.7472118959107807, + "no_speech_prob": 0.0014540239935740829}, {"id": 360, "seek": 236176, "start": 2362.7200000000003, + "end": 2368.6400000000003, "text": " I''m going to just, I''ll have a known set + of fields that I would assign to the document in traditional", "tokens": [50412, + 286, 478, 516, 281, 445, 11, 286, 603, 362, 257, 2570, 992, 295, 7909, 300, 286, + 576, 6269, 281, 264, 4166, 294, 5164, 50708], "temperature": 0.0, "avg_logprob": + -0.21113265644420276, "compression_ratio": 1.6043478260869566, "no_speech_prob": + 0.002477979985997081}, {"id": 361, "seek": 236176, "start": 2368.6400000000003, + "end": 2375.84, "text": " inverted index or posting list. 
And indeed, the limitation + of the older approaches is that", "tokens": [50708, 38969, 8186, 420, 15978, 1329, + 13, 400, 6451, 11, 264, 27432, 295, 264, 4906, 11587, 307, 300, 51068], "temperature": + 0.0, "avg_logprob": -0.21113265644420276, "compression_ratio": 1.6043478260869566, + "no_speech_prob": 0.002477979985997081}, {"id": 362, "seek": 236176, "start": 2375.84, + "end": 2382.0800000000004, "text": " they get a kind of force you to a limited vocabulary. + And now the query vocabulary", "tokens": [51068, 436, 483, 257, 733, 295, 3464, + 291, 281, 257, 5567, 19864, 13, 400, 586, 264, 14581, 19864, 51380], "temperature": + 0.0, "avg_logprob": -0.21113265644420276, "compression_ratio": 1.6043478260869566, + "no_speech_prob": 0.002477979985997081}, {"id": 363, "seek": 236176, "start": 2383.76, + "end": 2391.5200000000004, "text": " is literally the language of users. So I think + that''s a great way to do things. And to handle", "tokens": [51464, 307, 3736, 264, + 2856, 295, 5022, 13, 407, 286, 519, 300, 311, 257, 869, 636, 281, 360, 721, 13, + 400, 281, 4813, 51852], "temperature": 0.0, "avg_logprob": -0.21113265644420276, + "compression_ratio": 1.6043478260869566, "no_speech_prob": 0.002477979985997081}, + {"id": 364, "seek": 239176, "start": 2391.84, "end": 2399.92, "text": " also that + documents have often a lot more variability than queries. 
This is typically the + only", "tokens": [50368, 611, 300, 8512, 362, 2049, 257, 688, 544, 35709, 813, 24109, + 13, 639, 307, 5850, 264, 787, 50772], "temperature": 0.0, "avg_logprob": -0.21228601599252353, + "compression_ratio": 1.6652173913043478, "no_speech_prob": 0.0003259042277932167}, + {"id": 365, "seek": 239176, "start": 2399.92, "end": 2404.2400000000002, "text": + " some of my people are going to do in a search box, but documents can be in all + shapes and sizes.", "tokens": [50772, 512, 295, 452, 561, 366, 516, 281, 360, 294, + 257, 3164, 2424, 11, 457, 8512, 393, 312, 294, 439, 10854, 293, 11602, 13, 50988], + "temperature": 0.0, "avg_logprob": -0.21228601599252353, "compression_ratio": 1.6652173913043478, + "no_speech_prob": 0.0003259042277932167}, {"id": 366, "seek": 239176, "start": 2404.2400000000002, + "end": 2412.96, "text": " So I''m certainly a fan of doing document enrichment that''s + query friendly or conversely,", "tokens": [50988, 407, 286, 478, 3297, 257, 3429, + 295, 884, 4166, 49900, 300, 311, 14581, 9208, 420, 2615, 736, 11, 51424], "temperature": + 0.0, "avg_logprob": -0.21228601599252353, "compression_ratio": 1.6652173913043478, + "no_speech_prob": 0.0003259042277932167}, {"id": 367, "seek": 239176, "start": 2414.1600000000003, + "end": 2418.7200000000003, "text": " if you''re going to do things on the query + side, to think of a query is actually as a bag of documents.", "tokens": [51484, + 498, 291, 434, 516, 281, 360, 721, 322, 264, 14581, 1252, 11, 281, 519, 295, 257, + 14581, 307, 767, 382, 257, 3411, 295, 8512, 13, 51712], "temperature": 0.0, "avg_logprob": + -0.21228601599252353, "compression_ratio": 1.6652173913043478, "no_speech_prob": + 0.0003259042277932167}, {"id": 368, "seek": 241872, "start": 2418.72, "end": 2424.8799999999997, + "text": " I think even though there have been these explicit, what we call two tower + approaches that try to", "tokens": [50364, 286, 519, 754, 1673, 456, 362, 668, 613, + 13691, 11, 437, 321, 
818, 732, 10567, 11587, 300, 853, 281, 50672], "temperature": + 0.0, "avg_logprob": -0.32291127255088403, "compression_ratio": 1.6415929203539823, + "no_speech_prob": 0.0005262416671030223}, {"id": 369, "seek": 241872, "start": 2425.7599999999998, + "end": 2430.16, "text": " sort of meet halfway, I think it''s perfectly fine to + say, well, we''ll think of one of these things", "tokens": [50716, 1333, 295, 1677, + 15461, 11, 286, 519, 309, 311, 6239, 2489, 281, 584, 11, 731, 11, 321, 603, 519, + 295, 472, 295, 613, 721, 50936], "temperature": 0.0, "avg_logprob": -0.32291127255088403, + "compression_ratio": 1.6415929203539823, "no_speech_prob": 0.0005262416671030223}, + {"id": 370, "seek": 241872, "start": 2430.16, "end": 2438.64, "text": " as more + fundamental and not the second one, too. I think first off, I think it''s great.", + "tokens": [50936, 382, 544, 8088, 293, 406, 264, 1150, 472, 11, 886, 13, 286, 519, + 700, 766, 11, 286, 519, 309, 311, 869, 13, 51360], "temperature": 0.0, "avg_logprob": + -0.32291127255088403, "compression_ratio": 1.6415929203539823, "no_speech_prob": + 0.0005262416671030223}, {"id": 371, "seek": 241872, "start": 2439.6, "end": 2445.3599999999997, + "text": " The idea that you can think of meaning as it comes, point at an eye-dimensional + space,", "tokens": [51408, 440, 1558, 300, 291, 393, 519, 295, 3620, 382, 309, 1487, + 11, 935, 412, 364, 3313, 12, 18759, 1901, 11, 51696], "temperature": 0.0, "avg_logprob": + -0.32291127255088403, "compression_ratio": 1.6415929203539823, "no_speech_prob": + 0.0005262416671030223}, {"id": 372, "seek": 244536, "start": 2445.36, "end": 2452.08, + "text": " it explores things around it, even though in a way it''s not a new idea, + right?", "tokens": [50364, 309, 45473, 721, 926, 309, 11, 754, 1673, 294, 257, 636, + 309, 311, 406, 257, 777, 1558, 11, 558, 30, 50700], "temperature": 0.0, "avg_logprob": + -0.18379890217500575, "compression_ratio": 1.4936708860759493, "no_speech_prob": + 
0.002841924550011754}, {"id": 373, "seek": 244536, "start": 2452.08, "end": 2459.84, + "text": " People have been using vectors at least as far back as techniques like + TFIDF where the bag of", "tokens": [50700, 3432, 362, 668, 1228, 18875, 412, 1935, + 382, 1400, 646, 382, 7512, 411, 40964, 2777, 37, 689, 264, 3411, 295, 51088], "temperature": + 0.0, "avg_logprob": -0.18379890217500575, "compression_ratio": 1.4936708860759493, + "no_speech_prob": 0.002841924550011754}, {"id": 374, "seek": 244536, "start": 2459.84, + "end": 2465.1200000000003, "text": " words representation of content was simply + a vector in the space where every word was", "tokens": [51088, 2283, 10290, 295, + 2701, 390, 2935, 257, 8062, 294, 264, 1901, 689, 633, 1349, 390, 51352], "temperature": + 0.0, "avg_logprob": -0.18379890217500575, "compression_ratio": 1.4936708860759493, + "no_speech_prob": 0.002841924550011754}, {"id": 375, "seek": 244536, "start": 2465.1200000000003, + "end": 2470.1600000000003, "text": " intervention. I''m glad you''ve gotten a little + bit smarter about that over the past few decades.", "tokens": [51352, 13176, 13, + 286, 478, 5404, 291, 600, 5768, 257, 707, 857, 20294, 466, 300, 670, 264, 1791, + 1326, 7878, 13, 51604], "temperature": 0.0, "avg_logprob": -0.18379890217500575, + "compression_ratio": 1.4936708860759493, "no_speech_prob": 0.002841924550011754}, + {"id": 376, "seek": 247016, "start": 2470.16, "end": 2478.96, "text": " And certainly + what we can do now with embeddings is amazing. With that said, I think that", "tokens": + [50364, 400, 3297, 437, 321, 393, 360, 586, 365, 12240, 29432, 307, 2243, 13, 2022, + 300, 848, 11, 286, 519, 300, 50804], "temperature": 0.0, "avg_logprob": -0.1575075694492885, + "compression_ratio": 1.5138121546961325, "no_speech_prob": 0.0019579569343477488}, + {"id": 377, "seek": 247016, "start": 2480.8799999999997, "end": 2489.68, "text": + " sometimes people use vectors as too much of a sledgehammer. 
For example, if I + do a query on a site", "tokens": [50900, 2171, 561, 764, 18875, 382, 886, 709, 295, + 257, 1061, 12203, 39985, 13, 1171, 1365, 11, 498, 286, 360, 257, 14581, 322, 257, + 3621, 51340], "temperature": 0.0, "avg_logprob": -0.1575075694492885, "compression_ratio": + 1.5138121546961325, "no_speech_prob": 0.0019579569343477488}, {"id": 378, "seek": + 247016, "start": 2489.68, "end": 2497.6, "text": " for cat, turning cats into a + vector, and then turning all the documents into vectors,", "tokens": [51340, 337, + 3857, 11, 6246, 11111, 666, 257, 8062, 11, 293, 550, 6246, 439, 264, 8512, 666, + 18875, 11, 51736], "temperature": 0.0, "avg_logprob": -0.1575075694492885, "compression_ratio": + 1.5138121546961325, "no_speech_prob": 0.0019579569343477488}, {"id": 379, "seek": + 249760, "start": 2497.6, "end": 2503.6, "text": " and then sorting them across my + code sign, probably is overkill when how much information am I", "tokens": [50364, + 293, 550, 32411, 552, 2108, 452, 3089, 1465, 11, 1391, 307, 670, 34213, 562, 577, + 709, 1589, 669, 286, 50664], "temperature": 0.0, "avg_logprob": -0.2239785875592913, + "compression_ratio": 1.5588235294117647, "no_speech_prob": 0.0025785823818296194}, + {"id": 380, "seek": 249760, "start": 2503.6, "end": 2509.7599999999998, "text": + " going to get out of a single token cat? 
Figuring out whether something is a cat, + as Google showed,", "tokens": [50664, 516, 281, 483, 484, 295, 257, 2167, 14862, + 3857, 30, 22443, 1345, 484, 1968, 746, 307, 257, 3857, 11, 382, 3329, 4712, 11, + 50972], "temperature": 0.0, "avg_logprob": -0.2239785875592913, "compression_ratio": + 1.5588235294117647, "no_speech_prob": 0.0025785823818296194}, {"id": 381, "seek": + 249760, "start": 2509.7599999999998, "end": 2515.04, "text": " may require a huge + amount of machine learning, for example, with based on taking images.", "tokens": + [50972, 815, 3651, 257, 2603, 2372, 295, 3479, 2539, 11, 337, 1365, 11, 365, 2361, + 322, 1940, 5267, 13, 51236], "temperature": 0.0, "avg_logprob": -0.2239785875592913, + "compression_ratio": 1.5588235294117647, "no_speech_prob": 0.0025785823818296194}, + {"id": 382, "seek": 249760, "start": 2515.04, "end": 2521.6, "text": " But it''s + probably safe to say that at least at the query level, there''s only so much new", + "tokens": [51236, 583, 309, 311, 1391, 3273, 281, 584, 300, 412, 1935, 412, 264, + 14581, 1496, 11, 456, 311, 787, 370, 709, 777, 51564], "temperature": 0.0, "avg_logprob": + -0.2239785875592913, "compression_ratio": 1.5588235294117647, "no_speech_prob": + 0.0025785823818296194}, {"id": 383, "seek": 252160, "start": 2521.6, "end": 2527.7599999999998, + "text": " ones are going to get out of a one word query corresponding to an entity. 
+ And if you", "tokens": [50364, 2306, 366, 516, 281, 483, 484, 295, 257, 472, 1349, + 14581, 11760, 281, 364, 13977, 13, 400, 498, 291, 50672], "temperature": 0.0, "avg_logprob": + -0.24228742848271909, "compression_ratio": 1.7188940092165899, "no_speech_prob": + 0.005992407910525799}, {"id": 384, "seek": 252160, "start": 2529.12, "end": 2533.6, + "text": " a traditional approach where you might cure a vector based approach would + say, well,", "tokens": [50740, 257, 5164, 3109, 689, 291, 1062, 13698, 257, 8062, + 2361, 3109, 576, 584, 11, 731, 11, 50964], "temperature": 0.0, "avg_logprob": -0.24228742848271909, + "compression_ratio": 1.7188940092165899, "no_speech_prob": 0.005992407910525799}, + {"id": 385, "seek": 252160, "start": 2534.3199999999997, "end": 2539.36, "text": + " I''m going to take the vector for my query cat. I''m going to take all of the + vectors for my documents,", "tokens": [51000, 286, 478, 516, 281, 747, 264, 8062, + 337, 452, 14581, 3857, 13, 286, 478, 516, 281, 747, 439, 295, 264, 18875, 337, 452, + 8512, 11, 51252], "temperature": 0.0, "avg_logprob": -0.24228742848271909, "compression_ratio": + 1.7188940092165899, "no_speech_prob": 0.005992407910525799}, {"id": 386, "seek": + 252160, "start": 2539.36, "end": 2544.72, "text": " which I vary the reason of cat + and is implied. 
I suppose in those vectors and sort by their distance,", "tokens": + [51252, 597, 286, 10559, 264, 1778, 295, 3857, 293, 307, 32614, 13, 286, 7297, 294, + 729, 18875, 293, 1333, 538, 641, 4560, 11, 51520], "temperature": 0.0, "avg_logprob": + -0.24228742848271909, "compression_ratio": 1.7188940092165899, "no_speech_prob": + 0.005992407910525799}, {"id": 387, "seek": 254472, "start": 2544.72, "end": 2550.72, + "text": " it probably makes sense to start by saying, maybe I could have actually, + you know,", "tokens": [50364, 309, 1391, 1669, 2020, 281, 722, 538, 1566, 11, 1310, + 286, 727, 362, 767, 11, 291, 458, 11, 50664], "temperature": 0.0, "avg_logprob": + -0.22435046256856717, "compression_ratio": 1.6077586206896552, "no_speech_prob": + 0.0011152724036946893}, {"id": 388, "seek": 254472, "start": 2551.2799999999997, + "end": 2556.7999999999997, "text": " either using the doctor query or more traditional + methods annotated the documents in such a way", "tokens": [50692, 2139, 1228, 264, + 4631, 14581, 420, 544, 5164, 7150, 25339, 770, 264, 8512, 294, 1270, 257, 636, 50968], + "temperature": 0.0, "avg_logprob": -0.22435046256856717, "compression_ratio": 1.6077586206896552, + "no_speech_prob": 0.0011152724036946893}, {"id": 389, "seek": 254472, "start": 2556.7999999999997, + "end": 2563.7599999999998, "text": " that for the first pass, I could get the things + here. 
In the case of queries, and that simply,", "tokens": [50968, 300, 337, 264, + 700, 1320, 11, 286, 727, 483, 264, 721, 510, 13, 682, 264, 1389, 295, 24109, 11, + 293, 300, 2935, 11, 51316], "temperature": 0.0, "avg_logprob": -0.22435046256856717, + "compression_ratio": 1.6077586206896552, "no_speech_prob": 0.0011152724036946893}, + {"id": 390, "seek": 254472, "start": 2564.48, "end": 2569.68, "text": " as only + I''ve spoken in it, there may not be much I can do at that point in terms of use + of vectors.", "tokens": [51352, 382, 787, 286, 600, 10759, 294, 309, 11, 456, 815, + 406, 312, 709, 286, 393, 360, 412, 300, 935, 294, 2115, 295, 764, 295, 18875, 13, + 51612], "temperature": 0.0, "avg_logprob": -0.22435046256856717, "compression_ratio": + 1.6077586206896552, "no_speech_prob": 0.0011152724036946893}, {"id": 391, "seek": + 256968, "start": 2569.68, "end": 2577.04, "text": " Now, as the queries get longer, + have more signal in them. This game changes completely.", "tokens": [50364, 823, + 11, 382, 264, 24109, 483, 2854, 11, 362, 544, 6358, 294, 552, 13, 639, 1216, 2962, + 2584, 13, 50732], "temperature": 0.0, "avg_logprob": -0.21416847522442156, "compression_ratio": + 1.5630252100840336, "no_speech_prob": 0.0007685206946916878}, {"id": 392, "seek": + 256968, "start": 2577.04, "end": 2585.12, "text": " If I''m saying, I''m looking + for a cat wearing a red bow tie. 
Well, with a query like that,", "tokens": [50732, + 759, 286, 478, 1566, 11, 286, 478, 1237, 337, 257, 3857, 4769, 257, 2182, 4503, + 7582, 13, 1042, 11, 365, 257, 14581, 411, 300, 11, 51136], "temperature": 0.0, "avg_logprob": + -0.21416847522442156, "compression_ratio": 1.5630252100840336, "no_speech_prob": + 0.0007685206946916878}, {"id": 393, "seek": 256968, "start": 2585.9199999999996, + "end": 2591.7599999999998, "text": " it''s very unlikely that a traditional approach + is going to be able to say, well, what do I do?", "tokens": [51176, 309, 311, 588, + 17518, 300, 257, 5164, 3109, 307, 516, 281, 312, 1075, 281, 584, 11, 731, 11, 437, + 360, 286, 360, 30, 51468], "temperature": 0.0, "avg_logprob": -0.21416847522442156, + "compression_ratio": 1.5630252100840336, "no_speech_prob": 0.0007685206946916878}, + {"id": 394, "seek": 256968, "start": 2591.7599999999998, "end": 2596.96, "text": + " I''m going to look for those words. Some of them, some other ones, you know, is + a neck tie. 
The same", "tokens": [51468, 286, 478, 516, 281, 574, 337, 729, 2283, + 13, 2188, 295, 552, 11, 512, 661, 2306, 11, 291, 458, 11, 307, 257, 6189, 7582, + 13, 440, 912, 51728], "temperature": 0.0, "avg_logprob": -0.21416847522442156, "compression_ratio": + 1.5630252100840336, "no_speech_prob": 0.0007685206946916878}, {"id": 395, "seek": + 259696, "start": 2596.96, "end": 2605.36, "text": " as a bow tie, you know, would + a cat in a texito be better than just your typical cat pictures.", "tokens": [50364, + 382, 257, 4503, 7582, 11, 291, 458, 11, 576, 257, 3857, 294, 257, 535, 87, 3528, + 312, 1101, 813, 445, 428, 7476, 3857, 5242, 13, 50784], "temperature": 0.0, "avg_logprob": + -0.27582992391383393, "compression_ratio": 1.5603448275862069, "no_speech_prob": + 0.00234813429415226}, {"id": 396, "seek": 259696, "start": 2605.36, "end": 2612.64, + "text": " And so, at that point, the game has changed because it''s not a symbol + binary question anymore.", "tokens": [50784, 400, 370, 11, 412, 300, 935, 11, 264, + 1216, 575, 3105, 570, 309, 311, 406, 257, 5986, 17434, 1168, 3602, 13, 51148], "temperature": + 0.0, "avg_logprob": -0.27582992391383393, "compression_ratio": 1.5603448275862069, + "no_speech_prob": 0.00234813429415226}, {"id": 397, "seek": 259696, "start": 2612.64, + "end": 2616.7200000000003, "text": " And the identity that is reduced to similarity + make a huge difference. And there,", "tokens": [51148, 400, 264, 6575, 300, 307, + 9212, 281, 32194, 652, 257, 2603, 2649, 13, 400, 456, 11, 51352], "temperature": + 0.0, "avg_logprob": -0.27582992391383393, "compression_ratio": 1.5603448275862069, + "no_speech_prob": 0.00234813429415226}, {"id": 398, "seek": 259696, "start": 2617.6, + "end": 2623.76, "text": " I think you lean much more heavily into things. 
Now, I''d + say that it''s still the case that", "tokens": [51396, 286, 519, 291, 11659, 709, + 544, 10950, 666, 721, 13, 823, 11, 286, 1116, 584, 300, 309, 311, 920, 264, 1389, + 300, 51704], "temperature": 0.0, "avg_logprob": -0.27582992391383393, "compression_ratio": + 1.5603448275862069, "no_speech_prob": 0.00234813429415226}, {"id": 399, "seek": + 262376, "start": 2624.2400000000002, "end": 2632.1600000000003, "text": " doing + it pure, you know, grab everything as a sort of a nearest neighbor''s search in + a vector space", "tokens": [50388, 884, 309, 6075, 11, 291, 458, 11, 4444, 1203, + 382, 257, 1333, 295, 257, 23831, 5987, 311, 3164, 294, 257, 8062, 1901, 50784], + "temperature": 0.0, "avg_logprob": -0.2249653838401617, "compression_ratio": 1.5659574468085107, + "no_speech_prob": 0.007692709565162659}, {"id": 400, "seek": 262376, "start": 2632.1600000000003, + "end": 2637.6000000000004, "text": " can be computationally challenging. And it + can lead to you, you sort of unpredictable results.", "tokens": [50784, 393, 312, + 24903, 379, 7595, 13, 400, 309, 393, 1477, 281, 291, 11, 291, 1333, 295, 31160, + 3542, 13, 51056], "temperature": 0.0, "avg_logprob": -0.2249653838401617, "compression_ratio": + 1.5659574468085107, "no_speech_prob": 0.007692709565162659}, {"id": 401, "seek": + 262376, "start": 2637.6000000000004, "end": 2643.76, "text": " So, most people today + are still doing their first pass at least by using traditional methods.", "tokens": + [51056, 407, 11, 881, 561, 965, 366, 920, 884, 641, 700, 1320, 412, 1935, 538, 1228, + 5164, 7150, 13, 51364], "temperature": 0.0, "avg_logprob": -0.2249653838401617, + "compression_ratio": 1.5659574468085107, "no_speech_prob": 0.007692709565162659}, + {"id": 402, "seek": 262376, "start": 2643.76, "end": 2649.6800000000003, "text": + " But I do know folks who are increasingly trying to use vectors from the get go,", + "tokens": [51364, 583, 286, 360, 458, 4024, 567, 366, 12980, 1382, 281, 764, 18875, + 490, 
264, 483, 352, 11, 51660], "temperature": 0.0, "avg_logprob": -0.2249653838401617, + "compression_ratio": 1.5659574468085107, "no_speech_prob": 0.007692709565162659}, + {"id": 403, "seek": 264968, "start": 2649.7599999999998, "end": 2657.6, "text": + " but just by using sort of course, or grade vectors or various techniques to make + that first pass be", "tokens": [50368, 457, 445, 538, 1228, 1333, 295, 1164, 11, + 420, 7204, 18875, 420, 3683, 7512, 281, 652, 300, 700, 1320, 312, 50760], "temperature": + 0.0, "avg_logprob": -0.21222081295279568, "compression_ratio": 1.584033613445378, + "no_speech_prob": 0.0017880845116451383}, {"id": 404, "seek": 264968, "start": 2657.6, + "end": 2666.56, "text": " quick enough. So, I think we''re going in that direction. + I think that there''s still a lot of value", "tokens": [50760, 1702, 1547, 13, 407, + 11, 286, 519, 321, 434, 516, 294, 300, 3513, 13, 286, 519, 300, 456, 311, 920, 257, + 688, 295, 2158, 51208], "temperature": 0.0, "avg_logprob": -0.21222081295279568, + "compression_ratio": 1.584033613445378, "no_speech_prob": 0.0017880845116451383}, + {"id": 405, "seek": 264968, "start": 2666.56, "end": 2673.04, "text": " both computational + efficiency and for end explainability in using, you know, traditional", "tokens": + [51208, 1293, 28270, 10493, 293, 337, 917, 2903, 2310, 294, 1228, 11, 291, 458, + 11, 5164, 51532], "temperature": 0.0, "avg_logprob": -0.21222081295279568, "compression_ratio": + 1.584033613445378, "no_speech_prob": 0.0017880845116451383}, {"id": 406, "seek": + 264968, "start": 2673.04, "end": 2677.2799999999997, "text": " inverted indexing + techniques where you can, especially for the early stages of retrieval.", "tokens": + [51532, 38969, 8186, 278, 7512, 689, 291, 393, 11, 2318, 337, 264, 2440, 10232, + 295, 19817, 3337, 13, 51744], "temperature": 0.0, "avg_logprob": -0.21222081295279568, + "compression_ratio": 1.584033613445378, "no_speech_prob": 0.0017880845116451383}, + {"id": 407, "seek": 267728, 
"start": 2677.28, "end": 2684.0800000000004, "text": + " But for either for getting these nuances or for say increasing your recall, but, + you know,", "tokens": [50364, 583, 337, 2139, 337, 1242, 613, 38775, 420, 337, 584, + 5662, 428, 9901, 11, 457, 11, 291, 458, 11, 50704], "temperature": 0.0, "avg_logprob": + -0.181779432832525, "compression_ratio": 1.6309012875536482, "no_speech_prob": 0.0012567511294037104}, + {"id": 408, "seek": 267728, "start": 2684.0800000000004, "end": 2688.7200000000003, + "text": " retrieving things that might might have lost otherwise. We''re seeing + increasingly the use of", "tokens": [50704, 19817, 798, 721, 300, 1062, 1062, 362, + 2731, 5911, 13, 492, 434, 2577, 12980, 264, 764, 295, 50936], "temperature": 0.0, + "avg_logprob": -0.181779432832525, "compression_ratio": 1.6309012875536482, "no_speech_prob": + 0.0012567511294037104}, {"id": 409, "seek": 267728, "start": 2688.7200000000003, + "end": 2696.96, "text": " a vector search to get them. And, you know, we''re doing + this talk now in 2022. I suspect in a few years", "tokens": [50936, 257, 8062, 3164, + 281, 483, 552, 13, 400, 11, 291, 458, 11, 321, 434, 884, 341, 751, 586, 294, 20229, + 13, 286, 9091, 294, 257, 1326, 924, 51348], "temperature": 0.0, "avg_logprob": -0.181779432832525, + "compression_ratio": 1.6309012875536482, "no_speech_prob": 0.0012567511294037104}, + {"id": 410, "seek": 267728, "start": 2698.1600000000003, "end": 2704.7200000000003, + "text": " the inverted, inverted index methods will become more and more confined + to those cases where", "tokens": [51408, 264, 38969, 11, 38969, 8186, 7150, 486, + 1813, 544, 293, 544, 31745, 281, 729, 3331, 689, 51736], "temperature": 0.0, "avg_logprob": + -0.181779432832525, "compression_ratio": 1.6309012875536482, "no_speech_prob": 0.0012567511294037104}, + {"id": 411, "seek": 270472, "start": 2704.7999999999997, "end": 2710.72, "text": + " where the data is really just simple binary. I think I''ve always used this. 
I + think this kind of,", "tokens": [50368, 689, 264, 1412, 307, 534, 445, 2199, 17434, + 13, 286, 519, 286, 600, 1009, 1143, 341, 13, 286, 519, 341, 733, 295, 11, 50664], + "temperature": 0.0, "avg_logprob": -0.19350245618444728, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.004530168604105711}, {"id": 412, "seek": 270472, "start": 2710.72, + "end": 2714.9599999999996, "text": " there''ll always be the certain head cases + of it, but the use of vectors is only going to expand.", "tokens": [50664, 456, + 603, 1009, 312, 264, 1629, 1378, 3331, 295, 309, 11, 457, 264, 764, 295, 18875, + 307, 787, 516, 281, 5268, 13, 50876], "temperature": 0.0, "avg_logprob": -0.19350245618444728, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.004530168604105711}, + {"id": 413, "seek": 270472, "start": 2716.0, "end": 2722.16, "text": " Yeah, absolutely. + Oh, like, especially where I would say, inverted index will still be needed if you", + "tokens": [50928, 865, 11, 3122, 13, 876, 11, 411, 11, 2318, 689, 286, 576, 584, + 11, 38969, 8186, 486, 920, 312, 2978, 498, 291, 51236], "temperature": 0.0, "avg_logprob": + -0.19350245618444728, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.004530168604105711}, {"id": 414, "seek": 270472, "start": 2722.16, "end": 2726.72, + "text": " are looking for an exact phrase, like you don''t want to say, hey, vectorize + this and find the", "tokens": [51236, 366, 1237, 337, 364, 1900, 9535, 11, 411, + 291, 500, 380, 528, 281, 584, 11, 4177, 11, 8062, 1125, 341, 293, 915, 264, 51464], + "temperature": 0.0, "avg_logprob": -0.19350245618444728, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.004530168604105711}, {"id": 415, "seek": 270472, "start": 2726.72, + "end": 2732.16, "text": " similar. No, I don''t want similar. I want that exact + thing. 
And of course, there are other things that", "tokens": [51464, 2531, 13, + 883, 11, 286, 500, 380, 528, 2531, 13, 286, 528, 300, 1900, 551, 13, 400, 295, 1164, + 11, 456, 366, 661, 721, 300, 51736], "temperature": 0.0, "avg_logprob": -0.19350245618444728, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.004530168604105711}, + {"id": 416, "seek": 273216, "start": 2732.16, "end": 2737.7599999999998, "text": + " need to be improved in vector search, like, I don''t know, bird model, according + to one study.", "tokens": [50364, 643, 281, 312, 9689, 294, 8062, 3164, 11, 411, + 11, 286, 500, 380, 458, 11, 5255, 2316, 11, 4650, 281, 472, 2979, 13, 50644], "temperature": + 0.0, "avg_logprob": -0.21672942421653055, "compression_ratio": 1.6258741258741258, + "no_speech_prob": 0.0009910648223012686}, {"id": 417, "seek": 273216, "start": 2738.56, + "end": 2742.96, "text": " It doesn''t recognize negations and it might be actually + crucial for some search scenarios,", "tokens": [50684, 467, 1177, 380, 5521, 2485, + 763, 293, 309, 1062, 312, 767, 11462, 337, 512, 3164, 15077, 11, 50904], "temperature": + 0.0, "avg_logprob": -0.21672942421653055, "compression_ratio": 1.6258741258741258, + "no_speech_prob": 0.0009910648223012686}, {"id": 418, "seek": 273216, "start": 2742.96, + "end": 2748.08, "text": " or sentiment analysis. 
And also another thing is that + by now, at this point,", "tokens": [50904, 420, 16149, 5215, 13, 400, 611, 1071, + 551, 307, 300, 538, 586, 11, 412, 341, 935, 11, 51160], "temperature": 0.0, "avg_logprob": + -0.21672942421653055, "compression_ratio": 1.6258741258741258, "no_speech_prob": + 0.0009910648223012686}, {"id": 419, "seek": 273216, "start": 2748.08, "end": 2754.16, + "text": " the sparse search BM25 based methods is a very strong baseline when you + compare these methods across", "tokens": [51160, 264, 637, 11668, 3164, 15901, 6074, + 2361, 7150, 307, 257, 588, 2068, 20518, 562, 291, 6794, 613, 7150, 2108, 51464], + "temperature": 0.0, "avg_logprob": -0.21672942421653055, "compression_ratio": 1.6258741258741258, + "no_speech_prob": 0.0009910648223012686}, {"id": 420, "seek": 273216, "start": 2754.16, + "end": 2761.7599999999998, "text": " datasets, across tasks. And so across domains. + So I think the future is very bright on this direction,", "tokens": [51464, 42856, + 11, 2108, 9608, 13, 400, 370, 2108, 25514, 13, 407, 286, 519, 264, 2027, 307, 588, + 4730, 322, 341, 3513, 11, 51844], "temperature": 0.0, "avg_logprob": -0.21672942421653055, + "compression_ratio": 1.6258741258741258, "no_speech_prob": 0.0009910648223012686}, + {"id": 421, "seek": 276176, "start": 2761.76, "end": 2766.8, "text": " in this direction. + And I think a lot of folks are trying to combine this method. So I''m happy that", + "tokens": [50364, 294, 341, 3513, 13, 400, 286, 519, 257, 688, 295, 4024, 366, 1382, + 281, 10432, 341, 3170, 13, 407, 286, 478, 2055, 300, 50616], "temperature": 0.0, + "avg_logprob": -0.08813354756572458, "compression_ratio": 1.7217391304347827, "no_speech_prob": + 0.004793829284608364}, {"id": 422, "seek": 276176, "start": 2766.8, "end": 2773.6800000000003, + "text": " you are looking at this as well. And I believe you will be teaching about + this as well. 
We are quite", "tokens": [50616, 291, 366, 1237, 412, 341, 382, 731, + 13, 400, 286, 1697, 291, 486, 312, 4571, 466, 341, 382, 731, 13, 492, 366, 1596, + 50960], "temperature": 0.0, "avg_logprob": -0.08813354756572458, "compression_ratio": + 1.7217391304347827, "no_speech_prob": 0.004793829284608364}, {"id": 423, "seek": + 276176, "start": 2773.6800000000003, "end": 2780.0, "text": " close to the top of + the hour. And I''m happy to see so many queries coming in. But I''m going to ask", + "tokens": [50960, 1998, 281, 264, 1192, 295, 264, 1773, 13, 400, 286, 478, 2055, + 281, 536, 370, 867, 24109, 1348, 294, 13, 583, 286, 478, 516, 281, 1029, 51276], + "temperature": 0.0, "avg_logprob": -0.08813354756572458, "compression_ratio": 1.7217391304347827, + "no_speech_prob": 0.004793829284608364}, {"id": 424, "seek": 276176, "start": 2780.0, + "end": 2786.0800000000004, "text": " you my favorite question, the question of why + it''s this kind of mystical, philosophical question.", "tokens": [51276, 291, 452, + 2954, 1168, 11, 264, 1168, 295, 983, 309, 311, 341, 733, 295, 40565, 11, 25066, + 1168, 13, 51580], "temperature": 0.0, "avg_logprob": -0.08813354756572458, "compression_ratio": + 1.7217391304347827, "no_speech_prob": 0.004793829284608364}, {"id": 425, "seek": + 278608, "start": 2786.08, "end": 2791.7599999999998, "text": " You are the most + celebrated search engine professional today, one of the most, if not the most.", + "tokens": [50364, 509, 366, 264, 881, 19366, 3164, 2848, 4843, 965, 11, 472, 295, + 264, 881, 11, 498, 406, 264, 881, 13, 50648], "temperature": 0.0, "avg_logprob": + -0.14867283747746393, "compression_ratio": 1.6025641025641026, "no_speech_prob": + 0.012895653955638409}, {"id": 426, "seek": 278608, "start": 2793.68, "end": 2800.64, + "text": " You''ve done everything there is to do in search, in my opinion, like + when I look at your CV,", "tokens": [50744, 509, 600, 1096, 1203, 456, 307, 281, + 360, 294, 3164, 11, 294, 452, 4800, 11, 411, 562, 
286, 574, 412, 428, 22995, 11, + 51092], "temperature": 0.0, "avg_logprob": -0.14867283747746393, "compression_ratio": + 1.6025641025641026, "no_speech_prob": 0.012895653955638409}, {"id": 427, "seek": + 278608, "start": 2800.64, "end": 2807.2, "text": " you know, even you consulted + Zoom through which we''re doing this session. So that speaks volumes.", "tokens": + [51092, 291, 458, 11, 754, 291, 47941, 13453, 807, 597, 321, 434, 884, 341, 5481, + 13, 407, 300, 10789, 22219, 13, 51420], "temperature": 0.0, "avg_logprob": -0.14867283747746393, + "compression_ratio": 1.6025641025641026, "no_speech_prob": 0.012895653955638409}, + {"id": 428, "seek": 278608, "start": 2807.2, "end": 2814.72, "text": " And I just + wanted to ask you what drives you to continue focusing on search engines. And", + "tokens": [51420, 400, 286, 445, 1415, 281, 1029, 291, 437, 11754, 291, 281, 2354, + 8416, 322, 3164, 12982, 13, 400, 51796], "temperature": 0.0, "avg_logprob": -0.14867283747746393, + "compression_ratio": 1.6025641025641026, "no_speech_prob": 0.012895653955638409}, + {"id": 429, "seek": 281472, "start": 2815.2799999999997, "end": 2824.08, "text": + " especially teaching about it. So search, if all of the problems that, you know, + of all the things we", "tokens": [50392, 2318, 4571, 466, 309, 13, 407, 3164, 11, + 498, 439, 295, 264, 2740, 300, 11, 291, 458, 11, 295, 439, 264, 721, 321, 50832], + "temperature": 0.0, "avg_logprob": -0.2222597394670759, "compression_ratio": 1.5585106382978724, + "no_speech_prob": 0.00035684084286913276}, {"id": 430, "seek": 281472, "start": + 2824.08, "end": 2832.48, "text": " do with technology, I believe is the one that + puts us as human beings front and center. 
So much of", "tokens": [50832, 360, 365, + 2899, 11, 286, 1697, 307, 264, 472, 300, 8137, 505, 382, 1952, 8958, 1868, 293, + 3056, 13, 407, 709, 295, 51252], "temperature": 0.0, "avg_logprob": -0.2222597394670759, + "compression_ratio": 1.5585106382978724, "no_speech_prob": 0.00035684084286913276}, + {"id": 431, "seek": 281472, "start": 2832.48, "end": 2838.9599999999996, "text": + " what you see, and specifically machine learning AI is being done to us feeds recommendations,", + "tokens": [51252, 437, 291, 536, 11, 293, 4682, 3479, 2539, 7318, 307, 885, 1096, + 281, 505, 23712, 10434, 11, 51576], "temperature": 0.0, "avg_logprob": -0.2222597394670759, + "compression_ratio": 1.5585106382978724, "no_speech_prob": 0.00035684084286913276}, + {"id": 432, "seek": 283896, "start": 2838.96, "end": 2847.36, "text": " advertisements, + and search starts with people expressing what they want. And I know,", "tokens": + [50364, 42897, 11, 293, 3164, 3719, 365, 561, 22171, 437, 436, 528, 13, 400, 286, + 458, 11, 50784], "temperature": 0.0, "avg_logprob": -0.3085612720913357, "compression_ratio": + 1.5384615384615385, "no_speech_prob": 0.0008526438032276928}, {"id": 433, "seek": + 283896, "start": 2848.64, "end": 2855.92, "text": " in my version of the future, + I''m not a custodian, but I believe that the machines will help us,", "tokens": + [50848, 294, 452, 3037, 295, 264, 2027, 11, 286, 478, 406, 257, 14884, 378, 952, + 11, 457, 286, 1697, 300, 264, 8379, 486, 854, 505, 11, 51212], "temperature": 0.0, + "avg_logprob": -0.3085612720913357, "compression_ratio": 1.5384615384615385, "no_speech_prob": + 0.0008526438032276928}, {"id": 434, "seek": 283896, "start": 2855.92, "end": 2862.2400000000002, + "text": " but they have to start with us expressing our intent. 
So that''s an ace + of my search is so exciting.", "tokens": [51212, 457, 436, 362, 281, 722, 365, 505, + 22171, 527, 8446, 13, 407, 300, 311, 364, 17117, 295, 452, 3164, 307, 370, 4670, + 13, 51528], "temperature": 0.0, "avg_logprob": -0.3085612720913357, "compression_ratio": + 1.5384615384615385, "no_speech_prob": 0.0008526438032276928}, {"id": 435, "seek": + 286224, "start": 2863.04, "end": 2870.3999999999996, "text": " And as for why I + teach it, well, it comes back to what you asked in the beginning. Now,", "tokens": + [50404, 400, 382, 337, 983, 286, 2924, 309, 11, 731, 11, 309, 1487, 646, 281, 437, + 291, 2351, 294, 264, 2863, 13, 823, 11, 50772], "temperature": 0.0, "avg_logprob": + -0.2157260920550372, "compression_ratio": 1.5297297297297296, "no_speech_prob": + 0.005737084895372391}, {"id": 436, "seek": 286224, "start": 2870.3999999999996, + "end": 2875.7599999999998, "text": " I''m a despairing that there''s nothing exciting + about search. Not despairing, but I do think that", "tokens": [50772, 286, 478, + 257, 25763, 278, 300, 456, 311, 1825, 4670, 466, 3164, 13, 1726, 25763, 278, 11, + 457, 286, 360, 519, 300, 51040], "temperature": 0.0, "avg_logprob": -0.2157260920550372, + "compression_ratio": 1.5297297297297296, "no_speech_prob": 0.005737084895372391}, + {"id": 437, "seek": 286224, "start": 2876.56, "end": 2886.7999999999997, "text": + " the need for people to be building great, great search is not net by the supply + of people who have", "tokens": [51080, 264, 643, 337, 561, 281, 312, 2390, 869, + 11, 869, 3164, 307, 406, 2533, 538, 264, 5847, 295, 561, 567, 362, 51592], "temperature": + 0.0, "avg_logprob": -0.2157260920550372, "compression_ratio": 1.5297297297297296, + "no_speech_prob": 0.005737084895372391}, {"id": 438, "seek": 288680, "start": 2886.8, + "end": 2893.44, "text": " learned about it. 
And as much as I enjoy personally working + as a consultant for companies,", "tokens": [50364, 3264, 466, 309, 13, 400, 382, + 709, 382, 286, 2103, 5665, 1364, 382, 257, 24676, 337, 3431, 11, 50696], "temperature": + 0.0, "avg_logprob": -0.0972336825202493, "compression_ratio": 1.5811965811965811, + "no_speech_prob": 0.0003691905876621604}, {"id": 439, "seek": 288680, "start": 2893.44, + "end": 2900.96, "text": " that''s not exactly a scalable approach. So what I see + is there''s so many people out there", "tokens": [50696, 300, 311, 406, 2293, 257, + 38481, 3109, 13, 407, 437, 286, 536, 307, 456, 311, 370, 867, 561, 484, 456, 51072], + "temperature": 0.0, "avg_logprob": -0.0972336825202493, "compression_ratio": 1.5811965811965811, + "no_speech_prob": 0.0003691905876621604}, {"id": 440, "seek": 288680, "start": 2901.6800000000003, + "end": 2908.96, "text": " who know enough that with a little bit of a push, some + combination of the sort of the basics", "tokens": [51108, 567, 458, 1547, 300, 365, + 257, 707, 857, 295, 257, 2944, 11, 512, 6562, 295, 264, 1333, 295, 264, 14688, 51472], + "temperature": 0.0, "avg_logprob": -0.0972336825202493, "compression_ratio": 1.5811965811965811, + "no_speech_prob": 0.0003691905876621604}, {"id": 441, "seek": 288680, "start": 2908.96, + "end": 2914.8, "text": " domain knowledge that we''re teaching in our fundamentals + class, but also the kinds of techniques", "tokens": [51472, 9274, 3601, 300, 321, + 434, 4571, 294, 527, 29505, 1508, 11, 457, 611, 264, 3685, 295, 7512, 51764], "temperature": + 0.0, "avg_logprob": -0.0972336825202493, "compression_ratio": 1.5811965811965811, + "no_speech_prob": 0.0003691905876621604}, {"id": 442, "seek": 291480, "start": 2914.8, + "end": 2919.28, "text": " and feedback are somewhat opinionated way of showing those + techniques in the search with machine learning", "tokens": [50364, 293, 5824, 366, + 8344, 4800, 770, 636, 295, 4099, 729, 7512, 294, 264, 3164, 365, 3479, 2539, 50588], + 
"temperature": 0.0, "avg_logprob": -0.3116169796195081, "compression_ratio": 1.644, + "no_speech_prob": 0.0017234598053619266}, {"id": 443, "seek": 291480, "start": 2919.28, + "end": 2925.1200000000003, "text": " class that focuses on queer understanding, + non-conset understanding. That takes dense retrieval,", "tokens": [50588, 1508, + 300, 16109, 322, 20323, 3701, 11, 2107, 12, 21190, 302, 3701, 13, 663, 2516, 18011, + 19817, 3337, 11, 50880], "temperature": 0.0, "avg_logprob": -0.3116169796195081, + "compression_ratio": 1.644, "no_speech_prob": 0.0017234598053619266}, {"id": 444, + "seek": 291480, "start": 2925.1200000000003, "end": 2931.84, "text": " vector retrieval, + puts it in context, is just the nudge they need to get over this. You don''t need + to", "tokens": [50880, 8062, 19817, 3337, 11, 8137, 309, 294, 4319, 11, 307, 445, + 264, 297, 16032, 436, 643, 281, 483, 670, 341, 13, 509, 500, 380, 643, 281, 51216], + "temperature": 0.0, "avg_logprob": -0.3116169796195081, "compression_ratio": 1.644, + "no_speech_prob": 0.0017234598053619266}, {"id": 445, "seek": 291480, "start": 2931.84, + "end": 2939.76, "text": " spend years and not everybody''s going to get to do a + PhD in information retrieval and machine translation.", "tokens": [51216, 3496, + 924, 293, 406, 2201, 311, 516, 281, 483, 281, 360, 257, 14476, 294, 1589, 19817, + 3337, 293, 3479, 12853, 13, 51612], "temperature": 0.0, "avg_logprob": -0.3116169796195081, + "compression_ratio": 1.644, "no_speech_prob": 0.0017234598053619266}, {"id": 446, + "seek": 293976, "start": 2939.76, "end": 2946.32, "text": " But I think that today, + if you are a software engineer, if you have a basic knowledge of coding,", "tokens": + [50364, 583, 286, 519, 300, 965, 11, 498, 291, 366, 257, 4722, 11403, 11, 498, 291, + 362, 257, 3875, 3601, 295, 17720, 11, 50692], "temperature": 0.0, "avg_logprob": + -0.11498146827774819, "compression_ratio": 1.6016260162601625, "no_speech_prob": + 0.0016789698274806142}, {"id": 447, 
"seek": 293976, "start": 2947.2000000000003, + "end": 2954.88, "text": " and you learn a few of these things, you can do wonders + with the tool that''s out there. And then", "tokens": [50736, 293, 291, 1466, 257, + 1326, 295, 613, 721, 11, 291, 393, 360, 27348, 365, 264, 2290, 300, 311, 484, 456, + 13, 400, 550, 51120], "temperature": 0.0, "avg_logprob": -0.11498146827774819, "compression_ratio": + 1.6016260162601625, "no_speech_prob": 0.0016789698274806142}, {"id": 448, "seek": + 293976, "start": 2954.88, "end": 2960.7200000000003, "text": " from experience, + you''ll develop the rest of the sorts of skills that you need. So I''m excited that", + "tokens": [51120, 490, 1752, 11, 291, 603, 1499, 264, 1472, 295, 264, 7527, 295, + 3942, 300, 291, 643, 13, 407, 286, 478, 2919, 300, 51412], "temperature": 0.0, "avg_logprob": + -0.11498146827774819, "compression_ratio": 1.6016260162601625, "no_speech_prob": + 0.0016789698274806142}, {"id": 449, "seek": 293976, "start": 2961.6000000000004, + "end": 2968.8, "text": " I can be a part of enabling the next generation to just + run circles around anything I ever got to do.", "tokens": [51456, 286, 393, 312, + 257, 644, 295, 23148, 264, 958, 5125, 281, 445, 1190, 13040, 926, 1340, 286, 1562, + 658, 281, 360, 13, 51816], "temperature": 0.0, "avg_logprob": -0.11498146827774819, + "compression_ratio": 1.6016260162601625, "no_speech_prob": 0.0016789698274806142}, + {"id": 450, "seek": 296880, "start": 2968.88, "end": 2973.52, "text": " Now look + back at the work we did in the early 2000s and it looks so naive, although I think", + "tokens": [50368, 823, 574, 646, 412, 264, 589, 321, 630, 294, 264, 2440, 8132, + 82, 293, 309, 1542, 370, 29052, 11, 4878, 286, 519, 50600], "temperature": 0.0, + "avg_logprob": -0.2080179748535156, "compression_ratio": 1.6819787985865724, "no_speech_prob": + 0.0049469913356006145}, {"id": 451, "seek": 296880, "start": 2974.0800000000004, + "end": 2978.6400000000003, "text": " we were working at least 
on the right problems, + but without the machinery we have today.", "tokens": [50628, 321, 645, 1364, 412, + 1935, 322, 264, 558, 2740, 11, 457, 1553, 264, 27302, 321, 362, 965, 13, 50856], + "temperature": 0.0, "avg_logprob": -0.2080179748535156, "compression_ratio": 1.6819787985865724, + "no_speech_prob": 0.0049469913356006145}, {"id": 452, "seek": 296880, "start": 2979.36, + "end": 2985.6800000000003, "text": " And I just think, you know, in in in another + 20 years, I look forward to looking back on the", "tokens": [50892, 400, 286, 445, + 519, 11, 291, 458, 11, 294, 294, 294, 1071, 945, 924, 11, 286, 574, 2128, 281, 1237, + 646, 322, 264, 51208], "temperature": 0.0, "avg_logprob": -0.2080179748535156, "compression_ratio": + 1.6819787985865724, "no_speech_prob": 0.0049469913356006145}, {"id": 453, "seek": + 296880, "start": 2985.6800000000003, "end": 2992.2400000000002, "text": " naivety + of what we thought search was, you know, back at this point. Yeah, I''m ecstatic. + This is very", "tokens": [51208, 1667, 592, 2210, 295, 437, 321, 1194, 3164, 390, + 11, 291, 458, 11, 646, 412, 341, 935, 13, 865, 11, 286, 478, 11437, 34632, 13, 639, + 307, 588, 51536], "temperature": 0.0, "avg_logprob": -0.2080179748535156, "compression_ratio": + 1.6819787985865724, "no_speech_prob": 0.0049469913356006145}, {"id": 454, "seek": + 296880, "start": 2992.2400000000002, "end": 2997.28, "text": " deep Daniels. Thanks + so much. And please keep doing what you''re doing because I really, really enjoy", + "tokens": [51536, 2452, 8033, 82, 13, 2561, 370, 709, 13, 400, 1767, 1066, 884, + 437, 291, 434, 884, 570, 286, 534, 11, 534, 2103, 51788], "temperature": 0.0, "avg_logprob": + -0.2080179748535156, "compression_ratio": 1.6819787985865724, "no_speech_prob": + 0.0049469913356006145}, {"id": 455, "seek": 299728, "start": 2997.92, "end": 3002.1600000000003, + "text": " reading your blogs. 
I need to also read your book, by the way, on FASITED + search.", "tokens": [50396, 3760, 428, 31038, 13, 286, 643, 281, 611, 1401, 428, + 1446, 11, 538, 264, 636, 11, 322, 479, 3160, 3927, 4731, 3164, 13, 50608], "temperature": + 0.0, "avg_logprob": -0.17326187265330348, "compression_ratio": 1.6468531468531469, + "no_speech_prob": 0.006259008310735226}, {"id": 456, "seek": 299728, "start": 3004.0800000000004, + "end": 3009.44, "text": " And before we move to the questions from the audience, + is there any announcement you would like to", "tokens": [50704, 400, 949, 321, 1286, + 281, 264, 1651, 490, 264, 4034, 11, 307, 456, 604, 12847, 291, 576, 411, 281, 50972], + "temperature": 0.0, "avg_logprob": -0.17326187265330348, "compression_ratio": 1.6468531468531469, + "no_speech_prob": 0.006259008310735226}, {"id": 457, "seek": 299728, "start": 3009.44, + "end": 3016.0, "text": " make to our audience? Well, you know, for those who don''t + know, we''re going to be teaching these", "tokens": [50972, 652, 281, 527, 4034, + 30, 1042, 11, 291, 458, 11, 337, 729, 567, 500, 380, 458, 11, 321, 434, 516, 281, + 312, 4571, 613, 51300], "temperature": 0.0, "avg_logprob": -0.17326187265330348, + "compression_ratio": 1.6468531468531469, "no_speech_prob": 0.006259008310735226}, + {"id": 458, "seek": 299728, "start": 3016.0, "end": 3021.0400000000004, "text": + " two classes in June. There''s a search fundamentals class, which is a two-week + class intended for", "tokens": [51300, 732, 5359, 294, 6928, 13, 821, 311, 257, + 3164, 29505, 1508, 11, 597, 307, 257, 732, 12, 23188, 1508, 10226, 337, 51552], + "temperature": 0.0, "avg_logprob": -0.17326187265330348, "compression_ratio": 1.6468531468531469, + "no_speech_prob": 0.006259008310735226}, {"id": 459, "seek": 299728, "start": 3021.0400000000004, + "end": 3026.6400000000003, "text": " people with no background in search. 
And this + search with machine learning class that will start", "tokens": [51552, 561, 365, + 572, 3678, 294, 3164, 13, 400, 341, 3164, 365, 3479, 2539, 1508, 300, 486, 722, + 51832], "temperature": 0.0, "avg_logprob": -0.17326187265330348, "compression_ratio": + 1.6468531468531469, "no_speech_prob": 0.006259008310735226}, {"id": 460, "seek": + 302664, "start": 3026.64, "end": 3033.52, "text": " two weeks later. So people can + take both a fact, focuses on search with machine learning.", "tokens": [50364, 732, + 3259, 1780, 13, 407, 561, 393, 747, 1293, 257, 1186, 11, 16109, 322, 3164, 365, + 3479, 2539, 13, 50708], "temperature": 0.0, "avg_logprob": -0.27948870557419797, + "compression_ratio": 1.5614754098360655, "no_speech_prob": 0.0015271957963705063}, + {"id": 461, "seek": 302664, "start": 3033.52, "end": 3038.16, "text": " So we''re + going to show you a query understanding, constant understanding, and vector retrieval. + That", "tokens": [50708, 407, 321, 434, 516, 281, 855, 291, 257, 14581, 3701, 11, + 5754, 3701, 11, 293, 8062, 19817, 3337, 13, 663, 50940], "temperature": 0.0, "avg_logprob": + -0.27948870557419797, "compression_ratio": 1.5614754098360655, "no_speech_prob": + 0.0015271957963705063}, {"id": 462, "seek": 302664, "start": 3038.16, "end": 3044.24, + "text": " will start the first course. We''ll start at June 6th, the second on June + 20th. And these classes", "tokens": [50940, 486, 722, 264, 700, 1164, 13, 492, 603, + 722, 412, 6928, 1386, 392, 11, 264, 1150, 322, 6928, 945, 392, 13, 400, 613, 5359, + 51244], "temperature": 0.0, "avg_logprob": -0.27948870557419797, "compression_ratio": + 1.5614754098360655, "no_speech_prob": 0.0015271957963705063}, {"id": 463, "seek": + 302664, "start": 3044.24, "end": 3050.8799999999997, "text": " are available to + anyone in the world. 
We make sure, granted eye, that we cover the time zones", "tokens": + [51244, 366, 2435, 281, 2878, 294, 264, 1002, 13, 492, 652, 988, 11, 12344, 3313, + 11, 300, 321, 2060, 264, 565, 16025, 51576], "temperature": 0.0, "avg_logprob": + -0.27948870557419797, "compression_ratio": 1.5614754098360655, "no_speech_prob": + 0.0015271957963705063}, {"id": 464, "seek": 305088, "start": 3050.96, "end": 3057.6800000000003, + "text": " well and are available asynchronously. So I hope that some of you have + already signed up. Some of you", "tokens": [50368, 731, 293, 366, 2435, 42642, 5098, + 13, 407, 286, 1454, 300, 512, 295, 291, 362, 1217, 8175, 493, 13, 2188, 295, 291, + 50704], "temperature": 0.0, "avg_logprob": -0.22742205794139575, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.01738041266798973}, {"id": 465, "seek": + 305088, "start": 3057.6800000000003, "end": 3066.7200000000003, "text": " have taken + the class before. But what we experienced when we talked this class before was the + incredible", "tokens": [50704, 362, 2726, 264, 1508, 949, 13, 583, 437, 321, 6751, + 562, 321, 2825, 341, 1508, 949, 390, 264, 4651, 51156], "temperature": 0.0, "avg_logprob": + -0.22742205794139575, "compression_ratio": 1.6302521008403361, "no_speech_prob": + 0.01738041266798973}, {"id": 466, "seek": 305088, "start": 3066.7200000000003, "end": + 3073.28, "text": " community, which did make sure it''s indeed a part of. And we''re + excited to keep going.", "tokens": [51156, 1768, 11, 597, 630, 652, 988, 309, 311, + 6451, 257, 644, 295, 13, 400, 321, 434, 2919, 281, 1066, 516, 13, 51484], "temperature": + 0.0, "avg_logprob": -0.22742205794139575, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.01738041266798973}, {"id": 467, "seek": 305088, "start": 3074.2400000000002, + "end": 3078.88, "text": " Absolutely. 
And I highly recommend you to take this course, + or both of these are one of these.", "tokens": [51532, 7021, 13, 400, 286, 5405, + 2748, 291, 281, 747, 341, 1164, 11, 420, 1293, 295, 613, 366, 472, 295, 613, 13, + 51764], "temperature": 0.0, "avg_logprob": -0.22742205794139575, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.01738041266798973}, {"id": 468, "seek": + 307888, "start": 3079.76, "end": 3086.1600000000003, "text": " And first hand experience, + it was breathless run. And I was like, yeah, it''s every single week. I", "tokens": + [50408, 400, 700, 1011, 1752, 11, 309, 390, 6045, 1832, 1190, 13, 400, 286, 390, + 411, 11, 1338, 11, 309, 311, 633, 2167, 1243, 13, 286, 50728], "temperature": 0.0, + "avg_logprob": -0.15787684475934063, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.02119852602481842}, {"id": 469, "seek": 307888, "start": 3086.1600000000003, "end": + 3091.92, "text": " need to hand in a project. And it''s not just theory. And theory, + by the way, is very deep. If you", "tokens": [50728, 643, 281, 1011, 294, 257, 1716, + 13, 400, 309, 311, 406, 445, 5261, 13, 400, 5261, 11, 538, 264, 636, 11, 307, 588, + 2452, 13, 759, 291, 51016], "temperature": 0.0, "avg_logprob": -0.15787684475934063, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.02119852602481842}, + {"id": 470, "seek": 307888, "start": 3091.92, "end": 3098.7200000000003, "text": + " have time, go and read all the write-ups that Daniel and Grant have done on the + course. 
But also the", "tokens": [51016, 362, 565, 11, 352, 293, 1401, 439, 264, + 2464, 12, 7528, 300, 8033, 293, 17529, 362, 1096, 322, 264, 1164, 13, 583, 611, + 264, 51356], "temperature": 0.0, "avg_logprob": -0.15787684475934063, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.02119852602481842}, {"id": 471, "seek": + 307888, "start": 3098.7200000000003, "end": 3104.2400000000002, "text": " actual + act of coding, the actual thing that you see how it evolves in your hands. It''s + amazing.", "tokens": [51356, 3539, 605, 295, 17720, 11, 264, 3539, 551, 300, 291, + 536, 577, 309, 43737, 294, 428, 2377, 13, 467, 311, 2243, 13, 51632], "temperature": + 0.0, "avg_logprob": -0.15787684475934063, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.02119852602481842}, {"id": 472, "seek": 310424, "start": 3104.7999999999997, + "end": 3109.8399999999997, "text": " So awesome. Let''s proceed to the questions + from the audience. I will take the first question from", "tokens": [50392, 407, + 3476, 13, 961, 311, 8991, 281, 264, 1651, 490, 264, 4034, 13, 286, 486, 747, 264, + 700, 1168, 490, 50644], "temperature": 0.0, "avg_logprob": -0.26084463412945086, + "compression_ratio": 1.5551020408163265, "no_speech_prob": 0.007674016058444977}, + {"id": 473, "seek": 310424, "start": 3109.8399999999997, "end": 3114.4799999999996, + "text": " the Q&A panel. We also have in the chat. So the first question is from + Hemann Schu. I''m", "tokens": [50644, 264, 1249, 5, 32, 4831, 13, 492, 611, 362, + 294, 264, 5081, 13, 407, 264, 700, 1168, 307, 490, 18568, 969, 2065, 84, 13, 286, + 478, 50876], "temperature": 0.0, "avg_logprob": -0.26084463412945086, "compression_ratio": + 1.5551020408163265, "no_speech_prob": 0.007674016058444977}, {"id": 474, "seek": + 310424, "start": 3114.4799999999996, "end": 3119.7599999999998, "text": " a polygis + of five meets pronounce your name. Hi, Daniel. 
Any specific book to get started + with", "tokens": [50876, 257, 6754, 70, 271, 295, 1732, 13961, 19567, 428, 1315, + 13, 2421, 11, 8033, 13, 2639, 2685, 1446, 281, 483, 1409, 365, 51140], "temperature": + 0.0, "avg_logprob": -0.26084463412945086, "compression_ratio": 1.5551020408163265, + "no_speech_prob": 0.007674016058444977}, {"id": 475, "seek": 310424, "start": 3119.7599999999998, + "end": 3127.8399999999997, "text": " search with elements of machine learning? Thanks. + Yeah, so I mean, the, I''ll say this, there''s a lot", "tokens": [51140, 3164, 365, + 4959, 295, 3479, 2539, 30, 2561, 13, 865, 11, 370, 286, 914, 11, 264, 11, 286, 603, + 584, 341, 11, 456, 311, 257, 688, 51544], "temperature": 0.0, "avg_logprob": -0.26084463412945086, + "compression_ratio": 1.5551020408163265, "no_speech_prob": 0.007674016058444977}, + {"id": 476, "seek": 312784, "start": 3127.84, "end": 3136.8, "text": " that''s been + written on, on learning to rank. I think Chris Mannings book discusses it. I", "tokens": + [50364, 300, 311, 668, 3720, 322, 11, 322, 2539, 281, 6181, 13, 286, 519, 6688, + 16892, 1109, 1446, 2248, 279, 309, 13, 286, 50812], "temperature": 0.0, "avg_logprob": + -0.3950092356692078, "compression_ratio": 1.5805084745762712, "no_speech_prob": + 0.003178741317242384}, {"id": 477, "seek": 312784, "start": 3137.84, "end": 3144.6400000000003, + "text": " recorded by a Jesus book, might, I''ll have been a little bit older. What + I think you''re going to be", "tokens": [50864, 8287, 538, 257, 2705, 1446, 11, + 1062, 11, 286, 603, 362, 668, 257, 707, 857, 4906, 13, 708, 286, 519, 291, 434, + 516, 281, 312, 51204], "temperature": 0.0, "avg_logprob": -0.3950092356692078, "compression_ratio": + 1.5805084745762712, "no_speech_prob": 0.003178741317242384}, {"id": 478, "seek": + 312784, "start": 3144.6400000000003, "end": 3151.1200000000003, "text": " less likely + to find to though, is that''s on the query and content understanding. 
There is a + book.", "tokens": [51204, 1570, 3700, 281, 915, 281, 1673, 11, 307, 300, 311, 322, + 264, 14581, 293, 2701, 3701, 13, 821, 307, 257, 1446, 13, 51528], "temperature": + 0.0, "avg_logprob": -0.3950092356692078, "compression_ratio": 1.5805084745762712, + "no_speech_prob": 0.003178741317242384}, {"id": 479, "seek": 312784, "start": 3151.1200000000003, + "end": 3156.7200000000003, "text": " It''s really a moral collection of survey essays + on query understanding for search that", "tokens": [51528, 467, 311, 534, 257, 9723, + 5765, 295, 8984, 35123, 322, 14581, 3701, 337, 3164, 300, 51808], "temperature": + 0.0, "avg_logprob": -0.3950092356692078, "compression_ratio": 1.5805084745762712, + "no_speech_prob": 0.003178741317242384}, {"id": 480, "seek": 315784, "start": 3158.1600000000003, + "end": 3164.08, "text": " was published. I forget if it was last year or the year + before, a little bit expensive,", "tokens": [50380, 390, 6572, 13, 286, 2870, 498, + 309, 390, 1036, 1064, 420, 264, 1064, 949, 11, 257, 707, 857, 5124, 11, 50676], + "temperature": 0.0, "avg_logprob": -0.2335918971470424, "compression_ratio": 1.6016949152542372, + "no_speech_prob": 0.005660131573677063}, {"id": 481, "seek": 315784, "start": 3164.08, + "end": 3169.6000000000004, "text": " but if you look at that out there, if you''re + more interested in things for free, my blog at", "tokens": [50676, 457, 498, 291, + 574, 412, 300, 484, 456, 11, 498, 291, 434, 544, 3102, 294, 721, 337, 1737, 11, + 452, 6968, 412, 50952], "temperature": 0.0, "avg_logprob": -0.2335918971470424, + "compression_ratio": 1.6016949152542372, "no_speech_prob": 0.005660131573677063}, + {"id": 482, "seek": 315784, "start": 3169.6000000000004, "end": 3176.1600000000003, + "text": " queryunderstandy.com is available. 
At least we''ll give you a survey of + the of the techniques there.", "tokens": [50952, 14581, 6617, 1115, 88, 13, 1112, + 307, 2435, 13, 1711, 1935, 321, 603, 976, 291, 257, 8984, 295, 264, 295, 264, 7512, + 456, 13, 51280], "temperature": 0.0, "avg_logprob": -0.2335918971470424, "compression_ratio": + 1.6016949152542372, "no_speech_prob": 0.005660131573677063}, {"id": 483, "seek": + 315784, "start": 3176.1600000000003, "end": 3181.76, "text": " But to be clear, + it''s very focused on query understanding as such. I''m writing a series of content", + "tokens": [51280, 583, 281, 312, 1850, 11, 309, 311, 588, 5178, 322, 14581, 3701, + 382, 1270, 13, 286, 478, 3579, 257, 2638, 295, 2701, 51560], "temperature": 0.0, + "avg_logprob": -0.2335918971470424, "compression_ratio": 1.6016949152542372, "no_speech_prob": + 0.005660131573677063}, {"id": 484, "seek": 318176, "start": 3181.76, "end": 3188.0, + "text": " understanding that unimaginatively content understanding.com that starts + doing the same thing", "tokens": [50364, 3701, 300, 517, 44976, 19020, 2701, 3701, + 13, 1112, 300, 3719, 884, 264, 912, 551, 50676], "temperature": 0.0, "avg_logprob": + -0.2131087293902647, "compression_ratio": 1.7564575645756457, "no_speech_prob": + 0.011947030201554298}, {"id": 485, "seek": 318176, "start": 3188.0, "end": 3195.0400000000004, + "text": " there. But from the perspective of books, I would say that probably Chris + Mannings information", "tokens": [50676, 456, 13, 583, 490, 264, 4585, 295, 3642, + 11, 286, 576, 584, 300, 1391, 6688, 16892, 1109, 1589, 51028], "temperature": 0.0, + "avg_logprob": -0.2131087293902647, "compression_ratio": 1.7564575645756457, "no_speech_prob": + 0.011947030201554298}, {"id": 486, "seek": 318176, "start": 3195.0400000000004, + "end": 3199.36, "text": " retrieval book would be a good place to start an information + retrieval in general. 
And the query", "tokens": [51028, 19817, 3337, 1446, 576, + 312, 257, 665, 1081, 281, 722, 364, 1589, 19817, 3337, 294, 2674, 13, 400, 264, + 14581, 51244], "temperature": 0.0, "avg_logprob": -0.2131087293902647, "compression_ratio": + 1.7564575645756457, "no_speech_prob": 0.011947030201554298}, {"id": 487, "seek": + 318176, "start": 3199.36, "end": 3205.6000000000004, "text": " understanding collection + of essays is frankly the best publisher resource you''re going to get", "tokens": + [51244, 3701, 5765, 295, 35123, 307, 11939, 264, 1151, 25088, 7684, 291, 434, 516, + 281, 483, 51556], "temperature": 0.0, "avg_logprob": -0.2131087293902647, "compression_ratio": + 1.7564575645756457, "no_speech_prob": 0.011947030201554298}, {"id": 488, "seek": + 318176, "start": 3205.6000000000004, "end": 3210.4, "text": " for that. Awesome. + Hope that answers your question. I''m sure the next one I''m going to take from", + "tokens": [51556, 337, 300, 13, 10391, 13, 6483, 300, 6338, 428, 1168, 13, 286, + 478, 988, 264, 958, 472, 286, 478, 516, 281, 747, 490, 51796], "temperature": 0.0, + "avg_logprob": -0.2131087293902647, "compression_ratio": 1.7564575645756457, "no_speech_prob": + 0.011947030201554298}, {"id": 489, "seek": 321040, "start": 3210.4, "end": 3215.44, + "text": " the chat. It''s from Chris. 
What are your recommendations for integrating + information retrieval,", "tokens": [50364, 264, 5081, 13, 467, 311, 490, 6688, 13, + 708, 366, 428, 10434, 337, 26889, 1589, 19817, 3337, 11, 50616], "temperature": + 0.0, "avg_logprob": -0.24077258390538833, "compression_ratio": 1.643979057591623, + "no_speech_prob": 0.0011486022267490625}, {"id": 490, "seek": 321040, "start": 3215.44, + "end": 3219.76, "text": " retrieving documents with question answering, returning + answers within a context?", "tokens": [50616, 19817, 798, 8512, 365, 1168, 13430, + 11, 12678, 6338, 1951, 257, 4319, 30, 50832], "temperature": 0.0, "avg_logprob": + -0.24077258390538833, "compression_ratio": 1.643979057591623, "no_speech_prob": + 0.0011486022267490625}, {"id": 491, "seek": 321040, "start": 3221.12, "end": 3226.2400000000002, + "text": " Yeah. So question answering is really exciting, right? And that the", + "tokens": [50900, 865, 13, 407, 1168, 13430, 307, 534, 4670, 11, 558, 30, 400, 300, + 264, 51156], "temperature": 0.0, "avg_logprob": -0.24077258390538833, "compression_ratio": + 1.643979057591623, "no_speech_prob": 0.0011486022267490625}, {"id": 492, "seek": + 321040, "start": 3228.4, "end": 3231.76, "text": " this idea that you can get information + instead of just the document.", "tokens": [51264, 341, 1558, 300, 291, 393, 483, + 1589, 2602, 295, 445, 264, 4166, 13, 51432], "temperature": 0.0, "avg_logprob": + -0.24077258390538833, "compression_ratio": 1.643979057591623, "no_speech_prob": + 0.0011486022267490625}, {"id": 493, "seek": 323176, "start": 3232.2400000000002, + "end": 3240.96, "text": " The if you think about the the way we''ve gotten there, + a lot of it starts front of a", "tokens": [50388, 440, 498, 291, 519, 466, 264, + 264, 636, 321, 600, 5768, 456, 11, 257, 688, 295, 309, 3719, 1868, 295, 257, 50824], + "temperature": 0.0, "avg_logprob": -0.3075831399035098, "compression_ratio": 1.603448275862069, + "no_speech_prob": 0.01843389682471752}, {"id": 494, "seek": 
323176, "start": 3240.96, + "end": 3248.0, "text": " mental passage retrieval where or even before that search + snippets essentially, if you think about", "tokens": [50824, 4973, 11497, 19817, + 3337, 689, 420, 754, 949, 300, 3164, 35623, 1385, 4476, 11, 498, 291, 519, 466, + 51176], "temperature": 0.0, "avg_logprob": -0.3075831399035098, "compression_ratio": + 1.603448275862069, "no_speech_prob": 0.01843389682471752}, {"id": 495, "seek": 323176, + "start": 3248.48, "end": 3255.92, "text": " the way Google looked five to ten years + ago, you would see that sometimes as you looked at your", "tokens": [51200, 264, + 636, 3329, 2956, 1732, 281, 2064, 924, 2057, 11, 291, 576, 536, 300, 2171, 382, + 291, 2956, 412, 428, 51572], "temperature": 0.0, "avg_logprob": -0.3075831399035098, + "compression_ratio": 1.603448275862069, "no_speech_prob": 0.01843389682471752}, + {"id": 496, "seek": 325592, "start": 3255.92, "end": 3262.8, "text": " search results + page, your answer was in the few words that were highlighted for the for the result.", + "tokens": [50364, 3164, 3542, 3028, 11, 428, 1867, 390, 294, 264, 1326, 2283, 300, + 645, 17173, 337, 264, 337, 264, 1874, 13, 50708], "temperature": 0.0, "avg_logprob": + -0.10561740398406982, "compression_ratio": 1.7045454545454546, "no_speech_prob": + 0.00038119900273159146}, {"id": 497, "seek": 325592, "start": 3262.8, "end": 3269.6800000000003, + "text": " But now it''s more likely that you''ll see that sentence extracted and + put near the top. And", "tokens": [50708, 583, 586, 309, 311, 544, 3700, 300, 291, + 603, 536, 300, 8174, 34086, 293, 829, 2651, 264, 1192, 13, 400, 51052], "temperature": + 0.0, "avg_logprob": -0.10561740398406982, "compression_ratio": 1.7045454545454546, + "no_speech_prob": 0.00038119900273159146}, {"id": 498, "seek": 325592, "start": + 3270.64, "end": 3278.08, "text": " I would say that a lot of question answering + today feels a lot like passage retrieval. 
That is", "tokens": [51100, 286, 576, + 584, 300, 257, 688, 295, 1168, 13430, 965, 3417, 257, 688, 411, 11497, 19817, 3337, + 13, 663, 307, 51472], "temperature": 0.0, "avg_logprob": -0.10561740398406982, "compression_ratio": + 1.7045454545454546, "no_speech_prob": 0.00038119900273159146}, {"id": 499, "seek": + 325592, "start": 3278.08, "end": 3284.16, "text": " find that sentence. Although + I would say that while before that tended to be retrieving a", "tokens": [51472, + 915, 300, 8174, 13, 5780, 286, 576, 584, 300, 1339, 949, 300, 34732, 281, 312, 19817, + 798, 257, 51776], "temperature": 0.0, "avg_logprob": -0.10561740398406982, "compression_ratio": + 1.7045454545454546, "no_speech_prob": 0.00038119900273159146}, {"id": 500, "seek": + 328416, "start": 3284.16, "end": 3288.8799999999997, "text": " passage that contained + essentially the exact words you''d use, maybe a little bit of variation", "tokens": + [50364, 11497, 300, 16212, 4476, 264, 1900, 2283, 291, 1116, 764, 11, 1310, 257, + 707, 857, 295, 12990, 50600], "temperature": 0.0, "avg_logprob": -0.1809255375581629, + "compression_ratio": 1.6194690265486726, "no_speech_prob": 0.00036422029370442033}, + {"id": 501, "seek": 328416, "start": 3288.8799999999997, "end": 3296.3999999999996, + "text": " for stemming or synonyms, nowadays it''s more likely using a vector-based + approach to be a", "tokens": [50600, 337, 12312, 2810, 420, 5451, 2526, 2592, 11, + 13434, 309, 311, 544, 3700, 1228, 257, 8062, 12, 6032, 3109, 281, 312, 257, 50976], + "temperature": 0.0, "avg_logprob": -0.1809255375581629, "compression_ratio": 1.6194690265486726, + "no_speech_prob": 0.00036422029370442033}, {"id": 502, "seek": 328416, "start": + 3296.3999999999996, "end": 3303.92, "text": " sentence or passage that is similar + in the vector space. That''s it for exciting. 
However,", "tokens": [50976, 8174, + 420, 11497, 300, 307, 2531, 294, 264, 8062, 1901, 13, 663, 311, 309, 337, 4670, + 13, 2908, 11, 51352], "temperature": 0.0, "avg_logprob": -0.1809255375581629, "compression_ratio": + 1.6194690265486726, "no_speech_prob": 0.00036422029370442033}, {"id": 503, "seek": + 328416, "start": 3304.72, "end": 3310.48, "text": " what people really want is that + even though there is no sentence in the content that exactly", "tokens": [51392, + 437, 561, 534, 528, 307, 300, 754, 1673, 456, 307, 572, 8174, 294, 264, 2701, 300, + 2293, 51680], "temperature": 0.0, "avg_logprob": -0.1809255375581629, "compression_ratio": + 1.6194690265486726, "no_speech_prob": 0.00036422029370442033}, {"id": 504, "seek": + 331048, "start": 3310.48, "end": 3314.96, "text": " answers your question, somehow + the search engine will be able to not to be a search engine.", "tokens": [50364, + 6338, 428, 1168, 11, 6063, 264, 3164, 2848, 486, 312, 1075, 281, 406, 281, 312, + 257, 3164, 2848, 13, 50588], "temperature": 0.0, "avg_logprob": -0.2316365202596365, + "compression_ratio": 1.6879432624113475, "no_speech_prob": 0.005329354200512171}, + {"id": 505, "seek": 331048, "start": 3315.52, "end": 3322.96, "text": " Answer engine + and say, oh, I''m able to synthesize content from different places, understand your", + "tokens": [50616, 24545, 2848, 293, 584, 11, 1954, 11, 286, 478, 1075, 281, 26617, + 1125, 2701, 490, 819, 3190, 11, 1223, 428, 50988], "temperature": 0.0, "avg_logprob": + -0.2316365202596365, "compression_ratio": 1.6879432624113475, "no_speech_prob": + 0.005329354200512171}, {"id": 506, "seek": 331048, "start": 3322.96, "end": 3329.28, + "text": " question and learn that. We are not there. 
I mean, of course, you can + ask what''s each the i, i, i,", "tokens": [50988, 1168, 293, 1466, 300, 13, 492, + 366, 406, 456, 13, 286, 914, 11, 295, 1164, 11, 291, 393, 1029, 437, 311, 1184, + 264, 741, 11, 741, 11, 741, 11, 51304], "temperature": 0.0, "avg_logprob": -0.2316365202596365, + "compression_ratio": 1.6879432624113475, "no_speech_prob": 0.005329354200512171}, + {"id": 507, "seek": 331048, "start": 3329.28, "end": 3334.16, "text": " i plus one, + and it will say zero, but that''s cheating, right? That''s really just doing this.", + "tokens": [51304, 741, 1804, 472, 11, 293, 309, 486, 584, 4018, 11, 457, 300, 311, + 18309, 11, 558, 30, 663, 311, 534, 445, 884, 341, 13, 51548], "temperature": 0.0, + "avg_logprob": -0.2316365202596365, "compression_ratio": 1.6879432624113475, "no_speech_prob": + 0.005329354200512171}, {"id": 508, "seek": 331048, "start": 3334.16, "end": 3339.6, + "text": " You can play with wall from alpha that is a more sophisticated version + of trying to essentially", "tokens": [51548, 509, 393, 862, 365, 2929, 490, 8961, + 300, 307, 257, 544, 16950, 3037, 295, 1382, 281, 4476, 51820], "temperature": 0.0, + "avg_logprob": -0.2316365202596365, "compression_ratio": 1.6879432624113475, "no_speech_prob": + 0.005329354200512171}, {"id": 509, "seek": 333960, "start": 3339.68, "end": 3347.68, + "text": " parse what you ask into a question that it can then execute in a language. 
+ But I think doing that", "tokens": [50368, 48377, 437, 291, 1029, 666, 257, 1168, + 300, 309, 393, 550, 14483, 294, 257, 2856, 13, 583, 286, 519, 884, 300, 50768], + "temperature": 0.0, "avg_logprob": -0.3300367027822167, "compression_ratio": 1.5362903225806452, + "no_speech_prob": 0.013056247495114803}, {"id": 510, "seek": 333960, "start": 3347.68, + "end": 3354.24, "text": " on general information, we are far away from it''s exciting, + but that''s that will require a", "tokens": [50768, 322, 2674, 1589, 11, 321, 366, + 1400, 1314, 490, 309, 311, 4670, 11, 457, 300, 311, 300, 486, 3651, 257, 51096], + "temperature": 0.0, "avg_logprob": -0.3300367027822167, "compression_ratio": 1.5362903225806452, + "no_speech_prob": 0.013056247495114803}, {"id": 511, "seek": 333960, "start": 3354.24, + "end": 3361.12, "text": " generation. Absolutely. I agree on that. The next question + from Q&A panel from Donnie. What roles", "tokens": [51096, 5125, 13, 7021, 13, 286, + 3986, 322, 300, 13, 440, 958, 1168, 490, 1249, 5, 32, 4831, 490, 1468, 2766, 13, + 708, 9604, 51440], "temperature": 0.0, "avg_logprob": -0.3300367027822167, "compression_ratio": + 1.5362903225806452, "no_speech_prob": 0.013056247495114803}, {"id": 512, "seek": + 333960, "start": 3361.12, "end": 3367.36, "text": " do curated control for capillaries, + terminally name, third-east exonomy, and so on, playing in", "tokens": [51440, 360, + 47851, 1969, 337, 1410, 373, 4889, 11, 10761, 379, 1315, 11, 2636, 12, 22835, 454, + 266, 8488, 11, 293, 370, 322, 11, 2433, 294, 51752], "temperature": 0.0, "avg_logprob": + -0.3300367027822167, "compression_ratio": 1.5362903225806452, "no_speech_prob": + 0.013056247495114803}, {"id": 513, "seek": 336736, "start": 3367.36, "end": 3370.8, + "text": " practical approaches to query and content understanding in your experience?", + "tokens": [50364, 8496, 11587, 281, 14581, 293, 2701, 3701, 294, 428, 1752, 30, + 50536], "temperature": 0.0, "avg_logprob": -0.2338074486831139, 
"compression_ratio": + 1.4157303370786516, "no_speech_prob": 0.0008033758495002985}, {"id": 514, "seek": + 336736, "start": 3372.32, "end": 3381.44, "text": " So the huge. And I''m glad that + you asked this on me. Basically, collecting these sorts of", "tokens": [50612, 407, + 264, 2603, 13, 400, 286, 478, 5404, 300, 291, 2351, 341, 322, 385, 13, 8537, 11, + 12510, 613, 7527, 295, 51068], "temperature": 0.0, "avg_logprob": -0.2338074486831139, + "compression_ratio": 1.4157303370786516, "no_speech_prob": 0.0008033758495002985}, + {"id": 515, "seek": 336736, "start": 3382.32, "end": 3391.76, "text": " curated + capillaries can be a great way to label the content and have targets for doing", + "tokens": [51112, 47851, 1410, 373, 4889, 393, 312, 257, 869, 636, 281, 7645, 264, + 2701, 293, 362, 12911, 337, 884, 51584], "temperature": 0.0, "avg_logprob": -0.2338074486831139, + "compression_ratio": 1.4157303370786516, "no_speech_prob": 0.0008033758495002985}, + {"id": 516, "seek": 339176, "start": 3392.48, "end": 3400.96, "text": " a machine + learning labeling. So for example, you know what colors things come in, then", "tokens": + [50400, 257, 3479, 2539, 40244, 13, 407, 337, 1365, 11, 291, 458, 437, 4577, 721, + 808, 294, 11, 550, 50824], "temperature": 0.0, "avg_logprob": -0.3053551549496858, + "compression_ratio": 1.6521739130434783, "no_speech_prob": 0.04411494359374046}, + {"id": 517, "seek": 339176, "start": 3401.6800000000003, "end": 3406.1600000000003, + "text": " those people tend to look for colors. They''re great. You can actually + say great that I''m going to", "tokens": [50860, 729, 561, 3928, 281, 574, 337, + 4577, 13, 814, 434, 869, 13, 509, 393, 767, 584, 869, 300, 286, 478, 516, 281, 51084], + "temperature": 0.0, "avg_logprob": -0.3053551549496858, "compression_ratio": 1.6521739130434783, + "no_speech_prob": 0.04411494359374046}, {"id": 518, "seek": 339176, "start": 3406.1600000000003, + "end": 3414.1600000000003, "text": " do to look this way. 
Or if I know what subjects + people are interested in, those will be the subjects", "tokens": [51084, 360, 281, + 574, 341, 636, 13, 1610, 498, 286, 458, 437, 13066, 561, 366, 3102, 294, 11, 729, + 486, 312, 264, 13066, 51484], "temperature": 0.0, "avg_logprob": -0.3053551549496858, + "compression_ratio": 1.6521739130434783, "no_speech_prob": 0.04411494359374046}, + {"id": 519, "seek": 339176, "start": 3414.1600000000003, "end": 3420.88, "text": + " that they will content with and where it targets things for queries. So in a way, + having these", "tokens": [51484, 300, 436, 486, 2701, 365, 293, 689, 309, 12911, + 721, 337, 24109, 13, 407, 294, 257, 636, 11, 1419, 613, 51820], "temperature": 0.0, + "avg_logprob": -0.3053551549496858, "compression_ratio": 1.6521739130434783, "no_speech_prob": + 0.04411494359374046}, {"id": 520, "seek": 342088, "start": 3420.88, "end": 3426.8, + "text": " vocabularies changes what would otherwise be an unsturableized problem, + saying, well, I''m hoping", "tokens": [50364, 2329, 455, 1040, 530, 2962, 437, 576, + 5911, 312, 364, 18799, 374, 712, 1602, 1154, 11, 1566, 11, 731, 11, 286, 478, 7159, + 50660], "temperature": 0.0, "avg_logprob": -0.24563356047695123, "compression_ratio": + 1.6724890829694323, "no_speech_prob": 0.0014233413385227323}, {"id": 521, "seek": + 342088, "start": 3426.8, "end": 3431.28, "text": " that I can get some representation + of what this is about, what the query is about, and match them,", "tokens": [50660, + 300, 286, 393, 483, 512, 10290, 295, 437, 341, 307, 466, 11, 437, 264, 14581, 307, + 466, 11, 293, 2995, 552, 11, 50884], "temperature": 0.0, "avg_logprob": -0.24563356047695123, + "compression_ratio": 1.6724890829694323, "no_speech_prob": 0.0014233413385227323}, + {"id": 522, "seek": 342088, "start": 3431.28, "end": 3438.8, "text": " which if + I don''t have vocabularies, I''ll be somewhat unattainingated in how I do this.", + "tokens": [50884, 597, 498, 286, 500, 380, 362, 2329, 455, 1040, 530, 11, 
286, 603, + 312, 8344, 47316, 3686, 770, 294, 577, 286, 360, 341, 13, 51260], "temperature": + 0.0, "avg_logprob": -0.24563356047695123, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.0014233413385227323}, {"id": 523, "seek": 342088, "start": 3438.8, + "end": 3446.4, "text": " Having control vocabularies can say, oh, those are the + ways in which I will try to represent things.", "tokens": [51260, 10222, 1969, 2329, + 455, 1040, 530, 393, 584, 11, 1954, 11, 729, 366, 264, 2098, 294, 597, 286, 486, + 853, 281, 2906, 721, 13, 51640], "temperature": 0.0, "avg_logprob": -0.24563356047695123, + "compression_ratio": 1.6724890829694323, "no_speech_prob": 0.0014233413385227323}, + {"id": 524, "seek": 344640, "start": 3446.4, "end": 3452.2400000000002, "text": + " Even if I''m using vector-based methods to get there, they give me some alignment + on the space", "tokens": [50364, 2754, 498, 286, 478, 1228, 8062, 12, 6032, 7150, + 281, 483, 456, 11, 436, 976, 385, 512, 18515, 322, 264, 1901, 50656], "temperature": + 0.0, "avg_logprob": -0.20700997891633408, "compression_ratio": 1.7069767441860466, + "no_speech_prob": 0.0015998915769159794}, {"id": 525, "seek": 344640, "start": 3452.2400000000002, + "end": 3458.2400000000002, "text": " and having multiple such vocabularies might + say, well, sometimes I might be interested in one aspect", "tokens": [50656, 293, + 1419, 3866, 1270, 2329, 455, 1040, 530, 1062, 584, 11, 731, 11, 2171, 286, 1062, + 312, 3102, 294, 472, 4171, 50956], "temperature": 0.0, "avg_logprob": -0.20700997891633408, + "compression_ratio": 1.7069767441860466, "no_speech_prob": 0.0015998915769159794}, + {"id": 526, "seek": 344640, "start": 3458.2400000000002, "end": 3464.56, "text": + " of this content that''s sometimes in another, right? 
So I might be interested + in the color and", "tokens": [50956, 295, 341, 2701, 300, 311, 2171, 294, 1071, + 11, 558, 30, 407, 286, 1062, 312, 3102, 294, 264, 2017, 293, 51272], "temperature": + 0.0, "avg_logprob": -0.20700997891633408, "compression_ratio": 1.7069767441860466, + "no_speech_prob": 0.0015998915769159794}, {"id": 527, "seek": 344640, "start": 3464.56, + "end": 3470.8, "text": " I''d be in the prototype and I''d be in the material. So + I would say that having", "tokens": [51272, 286, 1116, 312, 294, 264, 19475, 293, + 286, 1116, 312, 294, 264, 2527, 13, 407, 286, 576, 584, 300, 1419, 51584], "temperature": + 0.0, "avg_logprob": -0.20700997891633408, "compression_ratio": 1.7069767441860466, + "no_speech_prob": 0.0015998915769159794}, {"id": 528, "seek": 347080, "start": 3471.44, + "end": 3479.04, "text": " these vocabularies can make a big difference. And since + you bring up the SORA,", "tokens": [50396, 613, 2329, 455, 1040, 530, 393, 652, + 257, 955, 2649, 13, 400, 1670, 291, 1565, 493, 264, 10621, 3750, 11, 50776], "temperature": + 0.0, "avg_logprob": -0.26550716512343464, "compression_ratio": 1.4972375690607735, + "no_speech_prob": 0.004273534752428532}, {"id": 529, "seek": 347080, "start": 3480.88, + "end": 3487.76, "text": " it''s going to be helpful when there''s a vocabulary gap + between the way people ask things and the", "tokens": [50868, 309, 311, 516, 281, + 312, 4961, 562, 456, 311, 257, 19864, 7417, 1296, 264, 636, 561, 1029, 721, 293, + 264, 51212], "temperature": 0.0, "avg_logprob": -0.26550716512343464, "compression_ratio": + 1.4972375690607735, "no_speech_prob": 0.004273534752428532}, {"id": 530, "seek": + 347080, "start": 3487.76, "end": 3494.32, "text": " way things are represented to + use a SORAist for query expansion. 
You have to be careful because", "tokens": [51212, + 636, 721, 366, 10379, 281, 764, 257, 10621, 3750, 468, 337, 14581, 11260, 13, 509, + 362, 281, 312, 5026, 570, 51540], "temperature": 0.0, "avg_logprob": -0.26550716512343464, + "compression_ratio": 1.4972375690607735, "no_speech_prob": 0.004273534752428532}, + {"id": 531, "seek": 349432, "start": 3494.7200000000003, "end": 3501.6000000000004, + "text": " the SORA tends to wreak havoc with the context in which those words occur, + but still they can be great", "tokens": [50384, 264, 10621, 3750, 12258, 281, 46674, + 514, 47367, 365, 264, 4319, 294, 597, 729, 2283, 5160, 11, 457, 920, 436, 393, 312, + 869, 50728], "temperature": 0.0, "avg_logprob": -0.17582876964281963, "compression_ratio": + 1.6024096385542168, "no_speech_prob": 0.004784346558153629}, {"id": 532, "seek": + 349432, "start": 3501.6000000000004, "end": 3508.96, "text": " for generating candidates + for retrieving more results. Awesome. I think we still have time even", "tokens": + [50728, 337, 17746, 11255, 337, 19817, 798, 544, 3542, 13, 10391, 13, 286, 519, + 321, 920, 362, 565, 754, 51096], "temperature": 0.0, "avg_logprob": -0.17582876964281963, + "compression_ratio": 1.6024096385542168, "no_speech_prob": 0.004784346558153629}, + {"id": 533, "seek": 349432, "start": 3508.96, "end": 3514.96, "text": " at the top + of the hour for the last question from an anonymous attendee. So Daniel, you talked + about", "tokens": [51096, 412, 264, 1192, 295, 264, 1773, 337, 264, 1036, 1168, + 490, 364, 24932, 6888, 1653, 13, 407, 8033, 11, 291, 2825, 466, 51396], "temperature": + 0.0, "avg_logprob": -0.17582876964281963, "compression_ratio": 1.6024096385542168, + "no_speech_prob": 0.004784346558153629}, {"id": 534, "seek": 349432, "start": 3514.96, + "end": 3521.04, "text": " query classification for the retrieval side of things, + but that can be a slippery slope. 
If content", "tokens": [51396, 14581, 21538, 337, + 264, 19817, 3337, 1252, 295, 721, 11, 457, 300, 393, 312, 257, 28100, 13525, 13, + 759, 2701, 51700], "temperature": 0.0, "avg_logprob": -0.17582876964281963, "compression_ratio": + 1.6024096385542168, "no_speech_prob": 0.004784346558153629}, {"id": 535, "seek": + 352104, "start": 3521.12, "end": 3526.8, "text": " isn''t 100% correctly categorized, + and often it''s not. Therefore, our recall would be negatively", "tokens": [50368, + 1943, 380, 2319, 4, 8944, 19250, 1602, 11, 293, 2049, 309, 311, 406, 13, 7504, 11, + 527, 9901, 576, 312, 29519, 50652], "temperature": 0.0, "avg_logprob": -0.14365674113179303, + "compression_ratio": 1.512, "no_speech_prob": 0.0054603200405836105}, {"id": 536, + "seek": 352104, "start": 3526.8, "end": 3533.2799999999997, "text": " impacted by + using query understanding as a hard filter. Any input on that? Absolutely. I mean,", + "tokens": [50652, 15653, 538, 1228, 14581, 3701, 382, 257, 1152, 6608, 13, 2639, + 4846, 322, 300, 30, 7021, 13, 286, 914, 11, 50976], "temperature": 0.0, "avg_logprob": + -0.14365674113179303, "compression_ratio": 1.512, "no_speech_prob": 0.0054603200405836105}, + {"id": 537, "seek": 352104, "start": 3533.2799999999997, "end": 3539.44, "text": + " I was burned by this myself when I was helping a client with trying to target + promoted search", "tokens": [50976, 286, 390, 13490, 538, 341, 2059, 562, 286, 390, + 4315, 257, 6423, 365, 1382, 281, 3779, 21162, 3164, 51284], "temperature": 0.0, + "avg_logprob": -0.14365674113179303, "compression_ratio": 1.512, "no_speech_prob": + 0.0054603200405836105}, {"id": 538, "seek": 352104, "start": 3539.44, "end": 3544.32, + "text": " results. 
And I said, oh, we should only put them in the right category, + indeed, because there", "tokens": [51284, 3542, 13, 400, 286, 848, 11, 1954, 11, + 321, 820, 787, 829, 552, 294, 264, 558, 7719, 11, 6451, 11, 570, 456, 51528], "temperature": + 0.0, "avg_logprob": -0.14365674113179303, "compression_ratio": 1.512, "no_speech_prob": + 0.0054603200405836105}, {"id": 539, "seek": 354432, "start": 3544.32, "end": 3550.6400000000003, + "text": " were some categorization errors. This had such a negative impact that + I stormed into the room saying,", "tokens": [50364, 645, 512, 19250, 2144, 13603, + 13, 639, 632, 1270, 257, 3671, 2712, 300, 286, 7679, 292, 666, 264, 1808, 1566, + 11, 50680], "temperature": 0.0, "avg_logprob": -0.2651321247059812, "compression_ratio": + 1.5806451612903225, "no_speech_prob": 0.0012466806219890714}, {"id": 540, "seek": + 354432, "start": 3551.2000000000003, "end": 3556.6400000000003, "text": " close + the test, we''re losing money. And it felt very embarrassed because it turned out + that, indeed,", "tokens": [50708, 1998, 264, 1500, 11, 321, 434, 7027, 1460, 13, + 400, 309, 2762, 588, 16843, 570, 309, 3574, 484, 300, 11, 6451, 11, 50980], "temperature": + 0.0, "avg_logprob": -0.2651321247059812, "compression_ratio": 1.5806451612903225, + "no_speech_prob": 0.0012466806219890714}, {"id": 541, "seek": 354432, "start": 3557.52, + "end": 3563.76, "text": " this is a problem, the content wasn''t classified well. 
+ Of course, the first thing I would say is", "tokens": [51024, 341, 307, 257, 1154, + 11, 264, 2701, 2067, 380, 20627, 731, 13, 2720, 1164, 11, 264, 700, 551, 286, 576, + 584, 307, 51336], "temperature": 0.0, "avg_logprob": -0.2651321247059812, "compression_ratio": + 1.5806451612903225, "no_speech_prob": 0.0012466806219890714}, {"id": 542, "seek": + 354432, "start": 3564.4, "end": 3569.36, "text": " invest on the content side because + if you''re able to classify queries, you probably can also", "tokens": [51368, 1963, + 322, 264, 2701, 1252, 570, 498, 291, 434, 1075, 281, 33872, 24109, 11, 291, 1391, + 393, 611, 51616], "temperature": 0.0, "avg_logprob": -0.2651321247059812, "compression_ratio": + 1.5806451612903225, "no_speech_prob": 0.0012466806219890714}, {"id": 543, "seek": + 356936, "start": 3569.36, "end": 3574.6400000000003, "text": " invest in classified + with the content. And by the way, even if the content has been say,", "tokens": + [50364, 1963, 294, 20627, 365, 264, 2701, 13, 400, 538, 264, 636, 11, 754, 498, + 264, 2701, 575, 668, 584, 11, 50628], "temperature": 0.0, "avg_logprob": -0.24341261258689306, + "compression_ratio": 1.6488888888888888, "no_speech_prob": 0.0030971227679401636}, + {"id": 544, "seek": 356936, "start": 3576.32, "end": 3582.32, "text": " categorized + in a way you''re not allowed to override. Right. Maybe you''re a marketplace or", + "tokens": [50712, 19250, 1602, 294, 257, 636, 291, 434, 406, 4350, 281, 42321, 13, + 1779, 13, 2704, 291, 434, 257, 19455, 420, 51012], "temperature": 0.0, "avg_logprob": + -0.24341261258689306, "compression_ratio": 1.6488888888888888, "no_speech_prob": + 0.0030971227679401636}, {"id": 545, "seek": 356936, "start": 3583.04, "end": 3589.04, + "text": " you''re the content that you know, you don''t own the categorization. 
+ For example, on LinkedIn,", "tokens": [51048, 291, 434, 264, 2701, 300, 291, 458, + 11, 291, 500, 380, 1065, 264, 19250, 2144, 13, 1171, 1365, 11, 322, 20657, 11, 51348], + "temperature": 0.0, "avg_logprob": -0.24341261258689306, "compression_ratio": 1.6488888888888888, + "no_speech_prob": 0.0030971227679401636}, {"id": 546, "seek": 356936, "start": 3590.1600000000003, + "end": 3596.6400000000003, "text": " if I decide to say I''m an attorney on LinkedIn, + LinkedIn''s not what you just automatically change,", "tokens": [51404, 498, 286, + 4536, 281, 584, 286, 478, 364, 13469, 322, 20657, 11, 20657, 311, 406, 437, 291, + 445, 6772, 1319, 11, 51728], "temperature": 0.0, "avg_logprob": -0.24341261258689306, + "compression_ratio": 1.6488888888888888, "no_speech_prob": 0.0030971227679401636}, + {"id": 547, "seek": 359664, "start": 3597.2799999999997, "end": 3605.52, "text": + " no, you''re actually not your cat. But it can still classify me and say, you know, + you kind of look", "tokens": [50396, 572, 11, 291, 434, 767, 406, 428, 3857, 13, + 583, 309, 393, 920, 33872, 385, 293, 584, 11, 291, 458, 11, 291, 733, 295, 574, + 50808], "temperature": 0.0, "avg_logprob": -0.14612061099002235, "compression_ratio": + 1.4870466321243523, "no_speech_prob": 0.001488552545197308}, {"id": 548, "seek": + 359664, "start": 3605.52, "end": 3612.72, "text": " like a software engineer. And + you can use inferred categories in your retrieval. There''s no", "tokens": [50808, + 411, 257, 4722, 11403, 13, 400, 291, 393, 764, 13596, 986, 10479, 294, 428, 19817, + 3337, 13, 821, 311, 572, 51168], "temperature": 0.0, "avg_logprob": -0.14612061099002235, + "compression_ratio": 1.4870466321243523, "no_speech_prob": 0.001488552545197308}, + {"id": 549, "seek": 359664, "start": 3612.72, "end": 3620.56, "text": " prohibition + there. 
I''d also say that, you know, maybe the content isn''t 100% categorized because", + "tokens": [51168, 16015, 849, 456, 13, 286, 1116, 611, 584, 300, 11, 291, 458, 11, + 1310, 264, 2701, 1943, 380, 2319, 4, 19250, 1602, 570, 51560], "temperature": 0.0, + "avg_logprob": -0.14612061099002235, "compression_ratio": 1.4870466321243523, "no_speech_prob": + 0.001488552545197308}, {"id": 550, "seek": 362056, "start": 3620.56, "end": 3626.56, + "text": " there are some similar categories. Well, you can always take query classification,", + "tokens": [50364, 456, 366, 512, 2531, 10479, 13, 1042, 11, 291, 393, 1009, 747, + 14581, 21538, 11, 50664], "temperature": 0.0, "avg_logprob": -0.1889562525300898, + "compression_ratio": 1.8425196850393701, "no_speech_prob": 0.007422778755426407}, + {"id": 551, "seek": 362056, "start": 3627.7599999999998, "end": 3632.16, "text": + " you know, it''s not not a hard and fast rule, but more like guidelines and say, + well, return things", "tokens": [50724, 291, 458, 11, 309, 311, 406, 406, 257, 1152, + 293, 2370, 4978, 11, 457, 544, 411, 12470, 293, 584, 11, 731, 11, 2736, 721, 50944], + "temperature": 0.0, "avg_logprob": -0.1889562525300898, "compression_ratio": 1.8425196850393701, + "no_speech_prob": 0.007422778755426407}, {"id": 552, "seek": 362056, "start": 3632.16, + "end": 3636.64, "text": " in that category. But I''ll also return things to say + we''re referring to that category. And maybe", "tokens": [50944, 294, 300, 7719, + 13, 583, 286, 603, 611, 2736, 721, 281, 584, 321, 434, 13761, 281, 300, 7719, 13, + 400, 1310, 51168], "temperature": 0.0, "avg_logprob": -0.1889562525300898, "compression_ratio": + 1.8425196850393701, "no_speech_prob": 0.007422778755426407}, {"id": 553, "seek": + 362056, "start": 3636.64, "end": 3642.32, "text": " you''ll even return things that + are in similar categories. 
Oh, in my way, if I''m not 100% sure", "tokens": [51168, + 291, 603, 754, 2736, 721, 300, 366, 294, 2531, 10479, 13, 876, 11, 294, 452, 636, + 11, 498, 286, 478, 406, 2319, 4, 988, 51452], "temperature": 0.0, "avg_logprob": + -0.1889562525300898, "compression_ratio": 1.8425196850393701, "no_speech_prob": + 0.007422778755426407}, {"id": 554, "seek": 362056, "start": 3642.96, "end": 3648.4, + "text": " on that category, even on the query side, maybe I''ll take the top few + categories that I thought", "tokens": [51484, 322, 300, 7719, 11, 754, 322, 264, + 14581, 1252, 11, 1310, 286, 603, 747, 264, 1192, 1326, 10479, 300, 286, 1194, 51756], + "temperature": 0.0, "avg_logprob": -0.1889562525300898, "compression_ratio": 1.8425196850393701, + "no_speech_prob": 0.007422778755426407}, {"id": 555, "seek": 364840, "start": 3648.4, + "end": 3654.56, "text": " were possible. There are a lot of ways in which you can + use what you''ve seen, what you''ve learned", "tokens": [50364, 645, 1944, 13, 821, + 366, 257, 688, 295, 2098, 294, 597, 291, 393, 764, 437, 291, 600, 1612, 11, 437, + 291, 600, 3264, 50672], "temperature": 0.0, "avg_logprob": -0.15704806645711264, + "compression_ratio": 1.875968992248062, "no_speech_prob": 0.020690791308879852}, + {"id": 556, "seek": 364840, "start": 3654.56, "end": 3658.1600000000003, "text": + " about the query and what you''ve learned about the content that''s hints. And + with all these things,", "tokens": [50672, 466, 264, 14581, 293, 437, 291, 600, + 3264, 466, 264, 2701, 300, 311, 27271, 13, 400, 365, 439, 613, 721, 11, 50852], + "temperature": 0.0, "avg_logprob": -0.15704806645711264, "compression_ratio": 1.875968992248062, + "no_speech_prob": 0.020690791308879852}, {"id": 557, "seek": 364840, "start": 3658.1600000000003, + "end": 3663.84, "text": " it''s a precision recall trade-off. 
And you have to generally + decide what''s the cost of losing", "tokens": [50852, 309, 311, 257, 18356, 9901, + 4923, 12, 4506, 13, 400, 291, 362, 281, 5101, 4536, 437, 311, 264, 2063, 295, 7027, + 51136], "temperature": 0.0, "avg_logprob": -0.15704806645711264, "compression_ratio": + 1.875968992248062, "no_speech_prob": 0.020690791308879852}, {"id": 558, "seek": + 364840, "start": 3663.84, "end": 3669.2000000000003, "text": " that recall versus + what''s the cost of having the annoyance of for precision. And it will depend.", + "tokens": [51136, 300, 9901, 5717, 437, 311, 264, 2063, 295, 1419, 264, 8759, 719, + 295, 337, 18356, 13, 400, 309, 486, 5672, 13, 51404], "temperature": 0.0, "avg_logprob": + -0.15704806645711264, "compression_ratio": 1.875968992248062, "no_speech_prob": + 0.020690791308879852}, {"id": 559, "seek": 364840, "start": 3670.88, "end": 3676.4, + "text": " Absolutely. It''s a journey. And I''ve enjoyed this conversation so much. + I''ve learned new things.", "tokens": [51488, 7021, 13, 467, 311, 257, 4671, 13, + 400, 286, 600, 4626, 341, 3761, 370, 709, 13, 286, 600, 3264, 777, 721, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.15704806645711264, "compression_ratio": 1.875968992248062, + "no_speech_prob": 0.020690791308879852}, {"id": 560, "seek": 367640, "start": 3676.4, + "end": 3681.44, "text": " I will rewatch this podcast myself. And thanks to everyone + for asking your questions and thanks,", "tokens": [50364, 286, 486, 319, 15219, + 341, 7367, 2059, 13, 400, 3231, 281, 1518, 337, 3365, 428, 1651, 293, 3231, 11, + 50616], "temperature": 0.0, "avg_logprob": -0.133118358411287, "compression_ratio": + 1.6581196581196582, "no_speech_prob": 0.008224531076848507}, {"id": 561, "seek": + 367640, "start": 3681.44, "end": 3687.84, "text": " Daniel, for for answering them. + Brilliant as you usually do. 
Hopefully we covered all the questions", "tokens": + [50616, 8033, 11, 337, 337, 13430, 552, 13, 34007, 382, 291, 2673, 360, 13, 10429, + 321, 5343, 439, 264, 1651, 50936], "temperature": 0.0, "avg_logprob": -0.133118358411287, + "compression_ratio": 1.6581196581196582, "no_speech_prob": 0.008224531076848507}, + {"id": 562, "seek": 367640, "start": 3687.84, "end": 3695.04, "text": " from the + chat and from the Q&A panel. And yeah, thank you so much. And I think where you + don''t", "tokens": [50936, 490, 264, 5081, 293, 490, 264, 1249, 5, 32, 4831, 13, + 400, 1338, 11, 1309, 291, 370, 709, 13, 400, 286, 519, 689, 291, 500, 380, 51296], + "temperature": 0.0, "avg_logprob": -0.133118358411287, "compression_ratio": 1.6581196581196582, + "no_speech_prob": 0.008224531076848507}, {"id": 563, "seek": 367640, "start": 3695.04, + "end": 3701.28, "text": " feel comfortable or you don''t know yet, I highly recommend + you to take the course or a course on", "tokens": [51296, 841, 4619, 420, 291, 500, + 380, 458, 1939, 11, 286, 5405, 2748, 291, 281, 747, 264, 1164, 420, 257, 1164, 322, + 51608], "temperature": 0.0, "avg_logprob": -0.133118358411287, "compression_ratio": + 1.6581196581196582, "no_speech_prob": 0.008224531076848507}, {"id": 564, "seek": + 370128, "start": 3701.28, "end": 3710.0, "text": " search. And go from there, experiment. + Be bold about what you do in your experiments. But just", "tokens": [50364, 3164, + 13, 400, 352, 490, 456, 11, 5120, 13, 879, 11928, 466, 437, 291, 360, 294, 428, + 12050, 13, 583, 445, 50800], "temperature": 0.0, "avg_logprob": -0.2544640435112847, + "compression_ratio": 1.5161290322580645, "no_speech_prob": 0.0015784607967361808}, + {"id": 565, "seek": 370128, "start": 3710.0, "end": 3716.0, "text": " apply science + and apply the knowledge from the moguls like Daniel and Grant. 
So thank you very", + "tokens": [50800, 3079, 3497, 293, 3079, 264, 3601, 490, 264, 13172, 9468, 411, + 8033, 293, 17529, 13, 407, 1309, 291, 588, 51100], "temperature": 0.0, "avg_logprob": + -0.2544640435112847, "compression_ratio": 1.5161290322580645, "no_speech_prob": + 0.0015784607967361808}, {"id": 566, "seek": 370128, "start": 3716.0, "end": 3720.7200000000003, + "text": " much, Daniel, for your time and for your wisdom today. Thank you, Jimic, + you''re so pleasure.", "tokens": [51100, 709, 11, 8033, 11, 337, 428, 565, 293, + 337, 428, 10712, 965, 13, 1044, 291, 11, 6637, 299, 11, 291, 434, 370, 6834, 13, + 51336], "temperature": 0.0, "avg_logprob": -0.2544640435112847, "compression_ratio": + 1.5161290322580645, "no_speech_prob": 0.0015784607967361808}, {"id": 567, "seek": + 372072, "start": 3720.72, "end": 3727.68, "text": " Are ready. Thank you, everyone. + Thanks.", "tokens": [50368, 2014, 1919, 13, 1044, 291, 11, 1518, 13, 2561, 13, 50712], + "temperature": 0.0, "avg_logprob": -0.6887408036452073, "compression_ratio": 0.9285714285714286, + "no_speech_prob": 0.07727666944265366}]' +--- + +We can get started. So I can kick us off even though you see are really the star of the show. So hi everyone. Welcome to our fireside chat with Dimeji Conan, Daniel Tungerling. This fireside chat is on search and all the interesting topics that Dimeji and Daniel will talk about. +And it's a series that's hosted by Kauarais. + Kauarais just do my plug here and we're a new education platform that transforms the way professionals build technical high demand skills through top industry leaders such as Daniel and collective peer learning such as Demetri, the format of our courses or pretty innovative because we mix live instructor sessions with real world projects and fireside chats like these with operators that are experts in their field. +And actually I see a couple of students from both the search class and from other classes are in the audience. 
So welcome back to you guys, and welcome to everyone else here. So with that, I'll pass it on to Dmitry. Awesome. Thanks, Judy. And hello, everyone. +As I usually say: hey, Vector Podcast is here. And today I have a luminary guest, a mogul of the search world, Daniel Tunkelang, and I'm beyond excited to be talking to him and discussing his favorite topics of query understanding and content understanding. +And, as is traditional, I will introduce myself, for the first time on the podcast. Well, what I want to say is that I have a PhD in natural language processing; I worked on machine translation back in the day. I'm currently in two roles: principal AI scientist with Silo AI, which is a consulting gig, +and recently I started a job as a senior product manager at a company called TomTom, which produces maps, map search, and navigation. I have 16 years of experience developing search engines for startups and multinational technology giants. +Most recently, I worked on multilingual web-scale search. I also claim to be an expert in vector search engines. And I'm the host of Vector Podcast, which focuses on that tech, but also, beyond that, on search at large. I also blog on Medium. +And as I said, I'm beyond excited to be talking to Daniel today. And, as is tradition, Daniel, could you please introduce yourself to me and our audience? Sure, Dmitry, thank you for that. I'm Daniel Tunkelang, and I've been working in search for, I guess, a little bit over two decades. +I started after completing my PhD, which didn't have anything to do with search or information retrieval, but was actually in network visualization. +I shortly ended up teaming up with a few folks to start a company called Endeca, back in 1999, that ended up focusing on e-commerce search and, to some degree, enterprise search and site search in general. I was there for 10 years as the chief scientist.
+And then I went to Google, where, in fact, I worked on search in local search, as part of the maps search team, as a tech lead. I moved, ironically, from the East Coast, where I had been living in New York, only to then leave Google and join LinkedIn, where I first ran the product data science team but ended up coming back to my first love, search. +And it was at LinkedIn that I started a query understanding team and shifted my focus, which had really been more around faceted search and interaction, to query understanding. +After leaving LinkedIn, I decided to go off on my own, and for the past six or seven years I've been what I like to call a high-class consultant, trying to bring search to everybody who needs it, which turns out to be a lot of people. +And then last year I discovered the wonderful folks at CoRise and started teaching these classes with my friend and colleague, Grant Ingersoll. Fantastic. And I can add to that, having been a student in your course: fantastic course, I've learned a lot. +And yeah, I'm a happy owner of the certificate as well, so I can prove to future employers that I have passed it. And actually, I watched one presentation you gave at a CIO Summit 10 years ago. +And one key phrase, or suggestion, that I took away from it: you said science is the difference between instinct and strategy. And I wanted to ask you to speak a little bit to the role of science in everyday search engine development and research. +Do you continue to view it that way, 10 years forward? I do. It's funny, because anybody who watches that will see it is probably the only recording of me wearing a suit on any sort of video, that science and strategy talk. +And what I was thinking at the time, you know, as a data scientist, was that a big part of my job was getting people to use the scientific method, whether that was A/B testing or having, you know, clear, falsifiable hypotheses, and so forth.
+Now, don't get me wrong, instincts matter a lot. For example, if you go to a search engine and you're not happy with what you're seeing, your instincts are probably right: there is probably something wrong. +But if you say, oh, I'm not seeing the results I like, I'm going to add a synonym, I'm going to turn up one of these knobs and see what I get, then, don't get me wrong, you'll sometimes get improvements; instincts are not useless. +But you won't have a way of being certain that you're getting improvements. And you may sometimes get improvements that happen to work in that particular moment, at that particular time, but which you can't explain or sustain. +And so science is about using techniques like what other people might call randomized controlled trials, but we like to call A/B tests. + Science imposes a certain amount of discipline and it keeps you honest, which I do think is the difference between running on instincts that may or may not work, and being able to pursue a strategy where you can not only see whether and how things work, but measure it as well and repeat what you do. +So I still hold to that. Yeah, this is fantastic. And I highly recommend watching that video: even though it was for top executives, there are a lot of logical elements to it that you can apply in day-to-day work. +And yeah, I also remember one quote from the book How Google Works: if we argue and we have data, let's look at that data, but if we go by opinions, let's go with mine. And it was said by a vice president of that area, so basically a HiPPO, the highest paid person's opinion, sort of the top of that ladder. +So why not actually follow the hierarchy there? But yeah, I agree: if you have data, look at it; if you don't, try to collect it. Yeah. Yeah, I mean, indeed, data is the equalizer; for those of us who are not CEOs, it's how we get things done. Yeah, absolutely.
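The A/B-testing discipline Daniel describes can be sketched as a two-proportion z-test on click-through rate (a minimal sketch: the CTR metric, the sample sizes, and the 1.96 threshold for roughly 95% confidence are illustrative assumptions, not anything from the conversation):

```python
import math

def ab_test_ctr(clicks_a, views_a, clicks_b, views_b, z_crit=1.96):
    """Two-proportion z-test: is variant B's click-through rate
    significantly different from variant A's at ~95% confidence?"""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled proportion under the null hypothesis (no real difference).
    p = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    return z, abs(z) > z_crit

# e.g. 1,000 views per bucket, observed CTR 10% vs 13%
z, significant = ab_test_ctr(100, 1000, 130, 1000)
```

This is the "keeps you honest" part: the same 3-point CTR lift that passes here would not pass with a tenth of the traffic, which is exactly the kind of thing instinct alone cannot tell you.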
+By the way, I wanted to also say a couple of words on logistics: please send your questions in the chat and we will handle them at the end of this session. +Yeah, and 10 years forward, I read your message on LinkedIn where you said, in a little bit of a sad tone, that not everyone shares your passion for search, but you suspect that many would be more excited about search if they understood it better. +Was it just a moment of despair, or was it a moment where you thought: okay, I need to approach it differently? +I can keep blogging about query understanding and content understanding, but how can I actually open the doors to the minds of new people, potentially students, in this field? What was going through your mind when you wrote that? So, I'm an optimist. +I'm, you know... if two years of a pandemic and now a global crisis can't get me down, I'm certainly not going to despair just because not enough people are excited about search. + But I have seen that, you know, our technology industry tends to have certain kinds of fads. In fact, back in the 90s, everybody was excited about search. For those of you old enough to remember, Google was not the first major search engine; we were all using AltaVista, Yahoo, and so forth. +And then, after Google came on the scene, many people said, oh, search is done. Now, I happened to not be one of those people, because I was at a startup, which actually also started in 1999, working on search. And we said, no, search isn't done at all. +I mean, we were trying to help e-commerce companies, and we saw that there was a lot to do in search. Now, you might think that 20 years later, search would finally be done. +But interestingly, there are still so many opportunities, in fact using some of the latest developments in machine learning. But what I've seen is that people don't necessarily gravitate to search as an exciting problem.
+They're excited about voice recognition, about what they perceive as AI in general, which they may see as question answering, which at the heart of it has lots to do with search as well. +But they don't realize that, you know, that humble little search box in which they are typing, and everything going on behind it, is still an extremely exciting area of development. +I think it's because it looks so simple that they don't imagine what can be done: change the size of the search box, change the font? They don't see what's actually going on behind it. +So my hope is that when people see the complexity involved, and the way in which search is amenable to the very techniques that they are excited about, they'll then say, oh wow, this is great. +And then, on top of everything else, it's a place where I can have a huge, notable impact on the way that people interact with machines. So no, no despair, just maybe sometimes a little bit of sadness that people don't share my excitement enough. Yeah, absolutely. +I mean, search is a fantastic field: if you're not in it, consider entering, or at least studying and evaluating it, but it's very deep. +I actually remember myself, like 20 years ago, still at university, asking a friend of mine: how does a search engine work? He was majoring in information retrieval back then; I knew nothing about the field. And he said, well, we use an inverted index. +That's how we represent the documents. But then I was still not satisfied. I asked him, hey, how can a search engine actually know what I want to find when I don't know it myself? Like, if I start typing something in the keyword box, it's a chicken-and-egg problem. +It means that I already know something of the subject, right? But what if I know nothing about it? And so in my mind I started hypothesizing that maybe we can build a search engine which will kind of refine the query.
+I didn't know how to do it, but I was just, you know, thinking in my mind that it was possible. And now, so many years fast forward, we apply machine learning to search. +And what I wanted to ask you: why do you think we need machine learning in search today? Like, there are other ways to satisfy the user intent, to sort of calculate it, understand it. There are other things, like established techniques with manual boosting and manual features that you can curate. +And many companies, I think, still do it. But, like, what's your take on machine learning's role in search today? So, certainly, when I was working on search back in 1999, I didn't know much machine learning. +I had taken a class that was highly theoretical, but I managed to help build search, so clearly it's possible to do it without machine learning. And, as you said, many people are still working with completely hand-tuned systems. +I think that machine learning plays a few roles, though, in what you could say is modernizing search, in making it do things you really couldn't do before. +So, for one thing, when you're doing all of these hand-tuned boosts, you're typically saying: oh, I'm going to have a bunch of factors that affect ranking, I'll change the weights on those, I'll see what improves. +Effectively, you're performing an optimization problem, but you're doing it by hand: you're saying, let me go a little bit in this direction, a little bit in that direction, let me see what it does. +Well, the main thing that machine learning does is optimization, only it does so by formalizing the objective that you're optimizing for, and then using mathematical techniques, like variations of gradient descent, to look for the place that is optimal. +Well, it would be silly for you to do things by hand when there is an existing architecture to do that. But the other thing is that when you do things by hand, you're very unlikely to be able to move too many knobs.
+Three or four factors, you can handle; a hundred factors, almost certainly not. +And that tends to be the big win of machine learning: because it automates what you would otherwise do by hand, it allows you to do things at a much larger scale and yet keep your wits about you. And that's usually what people do when they're concerned about ranking. +But the other breakthrough of machine learning today is that in areas like query and content understanding, machine learning often allows you to solve problems + you'd have been very unlikely to solve by hand: to recognize, when a query comes in, that it's in a particular category, or that particular entities, people, brands, and so forth are mentioned in that query, or to figure out what a piece of content is about and get a representation that you can then use to inform what should be returned. +And that's an area where it's not new that you can use machine learning, but the ability of systems to do so using the more modern AI techniques, word embeddings and the like, has dramatically changed the quality of that. +And it's a breakthrough that I can only think of comparing to, you know, speech recognition, which has been around for a while. But if you used speech recognition systems in the 1980s or 1990s, you thought they were very cute, but they'd never be useful. +And today we take for granted that they work well enough that people who have no other option can actually manage with them. And I would say that machine learning in search has now reached a point where it would be silly not to use it if you have the possibility and the data to do so. +Yeah, absolutely, especially if you sit on a pile of data, right? As they used to say in the age of big data. But of course, there are still niche areas where, let's say, you launch a startup, so you don't have clicks. Maybe you can measure clicks some way, but let's say you don't have clicks; you don't have any user feedback yet.
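The knob-turning-as-optimization point above can be sketched in a few lines: instead of hand-tuning three boost weights, formalize a squared-error objective against relevance labels and let gradient descent find them (a minimal sketch; the features, labels, learning rate, and iteration count are all made up for illustration):

```python
def predict(weights, features):
    # A linear scoring function: each weight is one "boost knob".
    return sum(w * f for w, f in zip(weights, features))

def gradient_step(weights, examples, lr=0.1):
    """One batch gradient-descent step on mean squared error.
    examples: list of (feature_vector, relevance_label) pairs."""
    grads = [0.0] * len(weights)
    for feats, label in examples:
        err = predict(weights, feats) - label
        for i, f in enumerate(feats):
            grads[i] += 2.0 * err * f
    n = len(examples)
    return [w - lr * g / n for w, g in zip(weights, grads)]

# Hypothetical judgments: features might be (text match, popularity,
# freshness); in practice the labels come from clicks or raters.
examples = [
    ([1.0, 0.2, 0.0], 1.0),
    ([0.1, 0.9, 0.5], 0.0),
    ([0.8, 0.1, 0.3], 1.0),
]
weights = [0.0, 0.0, 0.0]
for _ in range(2000):
    weights = gradient_step(weights, examples)
```

With three knobs this is overkill; with a hundred, as Daniel notes, it is the only way that scales.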
I think at that point, you could still apply machine learning, right? Like deep learning; hopefully we'll get there. +But before that: I think when people talk about ML in a search context, they quite often mean, you know, machine-learning-based relevance, learning to rank, where you learn a function which will rank your documents. +But in a way, what can ranking find by itself? Not much, if the data is not there, if it's not categorized. So what's your view on where machine learning can bring a lot of benefit upstream? And I think you touched on it: query understanding and content understanding. +Can you drill a little bit more into that, especially from the point of view of how you approach the task, where you start? Sure. Well, as you know, and as anybody listening to this already knows, I like ranking; some of my best friends do ranking, and even I do it occasionally. +But I think that ranking has been overemphasized in search in general, and in machine learning as applied to search in particular. +So, if we think of what ranking does: we have a search query, we have a potential result, and we score it with a function that will then determine the order in which we present it, assuming that that result is a candidate to be considered. +And if you go back to the original models of information retrieval, they act as if every document in your corpus could be scored. The only reason you don't do that is that you can't afford to; it's too expensive. +But that's the gist of it: a scoring function on the query and the document, and in some cases even on the user. Now, that's a lot of input into one function. A quite different way you might approach the problem is to say: I have a query. +I'm going to try to represent that query as usefully as possible without looking at any documents first. Also, before I even see any queries, I have documents.
+I'm going to try to represent them as well as possible before I see any queries, or at the very least before I see the particular query that someone's going to make. I might use the history of queries to, you know, inform my approach. +So now we've factored this original scoring problem, which said throw everything at a scoring function, and said: no, no, no, we're first going to understand the query, in a representation that distills it to its essence. +We have already understood the content, the documents, in a way that distills them to their essence. And now, when we even decide what to retrieve, we're going to use those representations that have already done some of the work for us. In the case of the documents, we did it offline. +In the case of the query, we have to wait till we see it, unless it's maybe a head query we've seen before, but we do it once for the query, not once for every result. +And we can use that to then say: well, roughly speaking, if we have represented the query and the content in a similar space, retrieval, that is, deciding which documents we should look at, is much more of a matching problem. + In fact, if the space uses a similar schema, for example if the query is mapped to a category or a set of categories, and the documents have been categorized using the same set, we can say: well, we should probably retrieve documents from those categories. Or we may have other structured data we can use that way. +And what is happening is that a lot of the work that ranking was doing, which was essentially trying to say, should I retrieve this document at all? Is this document relevant enough to the query that it should be in consideration? +This query-dependent aspect of ranking can be solved by saying: basically, once I have the query and the content represented in the same space, do they overlap there?
+ So we're basically changing the first-order bits, the higher-order bits, of ranking into more of a classification problem. And what we're saying is: look, once we have the query and the content in the same space, figuring out which content matches the general gist of the query should be an easier problem. +And then ranking is more: oh, well, a lot of things matched, but some are better than others. And that's, of course, where machine learning comes in: machine learning is how we get those representations. +It's how we turn the query into a more useful representation, and how we turn the content into a more useful representation. +And by treating those things somewhat separately, it allows us to be a lot more directed than we are with ranking, and in my experience it gives far better results. Yeah, I remember, from the course Search with Machine Learning by you and Grant Ingersoll, +you gave that brilliant example that stuck with me. + I believe it was Best Buy implementing, correct me if I'm wrong, the query understanding functionality where, if you typed iPhone or some product that they didn't have at the moment, they would actually use query understanding to tell you that they don't have that product; they would not even go to search. +Well, the example was actually B&H Photo with iPhones. But an example that's even more fun is Netflix, where I don't have much inside knowledge as to what goes on there, they haven't been one of my clients, but Netflix has, comparatively, a limited catalog. +They don't, for example, get movies from folks like Disney, which is quite protective of its catalog. But Netflix knows when you're looking for a Disney children's movie. And when you do that, rather than trying to simply match the text of your query, they show some of their own children's movies.
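The shared-representation idea above, classify the query into the same category space the documents were labeled with, then filter and rank, can be sketched like this (the toy catalog, keyword rules, and alphabetical ranking are stand-ins for real query and content understanding models):

```python
# Hypothetical toy catalog; the category labels are assumed to have been
# assigned offline by a content-understanding step.
DOCS = [
    {"title": "Frozen", "category": "childrens"},
    {"title": "The Lion King", "category": "childrens"},
    {"title": "Heat", "category": "thriller"},
]

def classify_query(query):
    """Query understanding: map the query into the same category space.
    A keyword-rule stand-in for a real classifier."""
    kids_words = {"kids", "children", "disney", "family"}
    if set(query.lower().split()) & kids_words:
        return "childrens"
    return None  # no category signal: fall back to matching everything

def retrieve(query):
    """Retrieval as matching in the shared space: filter by the query's
    category first; rank (here, trivially, alphabetically) second."""
    cat = classify_query(query)
    candidates = [d for d in DOCS if cat is None or d["category"] == cat]
    return sorted(candidates, key=lambda d: d["title"])
```

Note that a query like "disney movies for kids" never text-matches the titles at all; the overlap happens in category space, which is exactly the Netflix behavior described above.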
+So it's an example where you clearly split out the work of understanding the searcher's intent, the query understanding, from simply retrieving and scoring results, because that improved representation, +they know you're looking for a children's movie, and they have children's movies, is far more powerful than the traditional ways in which you might score and rank results. Yeah, I'm personally fascinated by the field of query understanding, having implemented it with my team at web scale. +We worked on a job search engine, a vertical search engine, powered by a web-scale search engine. First of all, it was multilingual. Second, you have to figure out the semantic subtleties when users type opening hours or working hours, whichever way they phrase it in their language. +And that's not a query you want to execute in job search. But if they say jobs in IT in London, that's okay, and you can use that and pass it through the filter. So query understanding kind of worked as a filter, in a way. +Or a classifier, you could say, right? But then it would also give us this rich semantics that we could apply to fields. Let's say, if it's London, as a city, you don't want to search for that word just in the description; you can apply it to the city field of the document. +And, I mean, back then we applied a rule-based approach. +And it worked fine, but it was maybe very conservative, in a way, right? Especially for languages like Turkish, where they have the word iş, which is a, you know, semantically overloaded word used in different contexts. It may refer to a bank, +it may mean a job, and a number of other meanings. +But would you advocate for using machine learning in query understanding? I know, by the way, that you wrote a brilliant series of blog posts on Medium drilling into so many subtopics of query understanding, and I especially liked that you can actually utilize it in autocomplete.
+I was actually fascinated that you connected those two, and I highly recommend everyone take a look at that. +So what's your take on, sort of, rule-based versus machine learning? Would you start with rule-based, and then, as you learn, go to machine learning, or would you start head-on with machine learning? +So I certainly see a lot of value in machine learning in query understanding, for some of the reasons I was saying before. +But with that said, I think there's often a sort of Pareto principle, an 80/20, in search problems. And when I go to people, especially folks in small organizations, I tell them, look, let's use your job search example, +since I spent a few years on that and it's very close to my heart. +They need to know: is somebody, for example, looking for a job title? Or are they looking for, in, say, LinkedIn's case, someone's name? Or, in the case of, say, language, maybe you're not sure what language they're searching in, and you're trying to do language identification. +You could start by looking at the most common queries that you see, and then just having people, your own employees, a hired crowd, what have you, say: look, can you just label these? You're not going to label more than hundreds or maybe thousands of queries that way. +At hundreds of thousands it starts getting a bit silly. But you can do that, and you can say: okay, maybe I have now handled 20-30% of my traffic that way. It's not uncommon that with 10,000 queries you easily get to that. And you can see: great. +Now that I've done that, now that I know that this person is looking for a job title, or that the language is Turkish, or what have you, what would I do with that? And I'm like, well, I'm going to have a particular search experience in mind. +If I know that it's about jobs, I won't look in people, and I won't look in content posts.
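The head-of-the-distribution labelling Daniel describes rests on a simple property of query logs, which can be sketched as a coverage calculation (the query log here is synthetic, with a deliberately head-heavy shape; real numbers vary):

```python
from collections import Counter

def head_coverage(query_log, top_n):
    """Fraction of total traffic covered by the top_n most frequent
    distinct queries, i.e. how far hand-labelling just the head of
    the distribution gets you."""
    counts = Counter(query_log)
    total = sum(counts.values())
    head = counts.most_common(top_n)
    return sum(c for _, c in head) / total

# A tiny synthetic log: a few head queries plus a long tail.
log = (["iphone"] * 40 + ["tv"] * 25 + ["usb c cable"] * 10
       + [f"rare query {i}" for i in range(25)])
coverage = head_coverage(log, top_n=2)  # label just 2 distinct queries
```

Here labelling two distinct queries already covers 65% of the (synthetic) traffic, which is the 80/20 effect that makes hand-labelling the head worthwhile before any model exists.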
If I know what language it is, I'm going to retrieve from that repository, and so forth. And you can learn what you would do there. Now, this won't scale into the tail of your distribution. +But you can learn what happens with that sort of experience. And that's actually really important, because sometimes you don't know how people react until you show it to them. There's a bit of a chicken-and-egg in these things, as to what is the quality of your data, but also what is the experience. +But once you've decided, okay, I am going to pursue this sort of experience: frankly, without machine learning, you're never going to scale it. +You're not going to label everything, and a rule-based approach to try to figure out what language something is in, or what category something is in, simply isn't going to scale. +In the case of, for example, language, it's not like you can just build dictionaries, because you'll have cognates between the languages. Or in the case of job titles: +by the time you get to Chief Vector Search Ninja, you're going to be in a bit of trouble recognizing that as someone's title. So that's the point at which collecting training data becomes critical. +One of the nice things is that if you've done some of the work by hand, that can actually be how you bootstrap training data for these approaches, especially if you aren't yet in a position to do so using feedback from your own search application. Yeah, absolutely. A quick shout-out to our audience: +I think, if I'm reading it right, just a second, Andre has a poll on how many people on this call are using hand-tuned boosts versus machine learning. I'm really interested to hear, or read, your opinions. Maybe you can say a word about it. +Yeah, and on the other hand, you've been advocating a lot for drilling into your content. And of course, some companies do this one way or another. But can you illuminate us on what you can do on the content understanding side? Sure.
+So, if you think about it, if all you had was query understanding, you might be able to figure out exactly what the searcher wants, but actually not be able to find it. +So content understanding is really what you're doing in order to represent content in your index in the best way to make it retrievable and explorable. So certainly it's a great place to do things like categorization. +This is especially true, say, if you have a marketplace, or if you have a lot of unstructured content where you don't necessarily know what the content is about. It's also a good place to extract entities and terminology, even to determine the terminology that's used for representing it. +I mean, imagine, for example, you have a collection of research papers. You can discover the useful words or phrases that tend to carry meaning. +You can relate them to one another by putting them in a vector space, where the distance between the vectors tells you how similar they are, and you can cluster those. +So in general, doing things that involve either classification or, essentially, annotation, recognizing entities or terms, allows you to enrich the way you index the content. +You can also figure out when documents are similar to one another, because when you have these vector representations, you can take the entire document, or part of the document, and do that. +And that can be useful for saying: oh, if you're interested in this document, you might be interested in these other ones; or maybe you're interested in these other ones, but they're more recent. +And that allows you to combine what content is about with other factors like its recency, popularity, other people that looked at it. +And you see this often not just for search, but specifically for being an engine for recommendations that are triggered from discovering something through an initial search. So all of these things basically make the content not only more retrievable, but also more explorable.
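The "similar, but more recent" recommendation idea above can be sketched as a blend of vector similarity and recency (a minimal sketch: the document vectors are assumed to be precomputed embeddings, and the ids, ages, and the 0.8 blend weight are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two content vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related(doc_id, vectors, ages_days, alpha=0.8, k=2):
    """Rank other documents by a blend of topical similarity (what the
    content is about) and recency (newer scores closer to 1)."""
    q = vectors[doc_id]
    scored = []
    for other, vec in vectors.items():
        if other == doc_id:
            continue
        recency = 1.0 / (1.0 + ages_days[other])
        scored.append((alpha * cosine(q, vec) + (1 - alpha) * recency, other))
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

# Toy precomputed embeddings; in practice these come from a model.
VECS = {"intro-to-ir": [1.0, 0.0], "ir-survey-2022": [0.9, 0.1], "cooking": [0.0, 1.0]}
AGES = {"intro-to-ir": 3000, "ir-survey-2022": 30, "cooking": 10}
```

With these toy vectors, the survey paper ranks first for the intro paper because it is both topically close and recent, while the cooking document stays last despite being newest.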
Yeah, absolutely. +I can also add to that: in some settings, specifically in financial search (I was happy to work at a company called AlphaSense), +you may end up in a situation where you cannot actually use the hints from the users, right? So, for instance, if you do autosuggest and you extract themes from queries, you could actually do that; I believe Google does that to some extent. +But in a financial setting you cannot do this, because banks will prohibit sharing their searches with their rivals, right? You don't want to do that, ever. +And so, at that point, you do go deeper into content understanding, and you start extracting stable themes, and maybe over time you can also extract trends as they show up. And that might be one way to combat the issue of not being able to use queries to influence your model. +But yeah, you might have another setting; I'm curious to hear from the audience as well what kind of setting you guys have. And my next question would be on available data sets. +Let's say I want to practice query understanding or content understanding at home, in my lab. What are the available data sets, tools, and algorithms that you can recommend, that will allow us to train these models for both of these directions, query and content understanding? +So, as those of you who took the class know, we've been using an e-commerce data set from Best Buy for teaching. +It's a nice data set. It's a little bit old, but it has the virtue that it has a bunch of structured data, queries, and some click data as well. And that's proven useful. You can get it from Kaggle, as they've made it available freely. +And indeed Kaggle, which is at this point a subsidiary of Google but maintains an independent brand, is a great place to get data sets. +This one from Best Buy, I think, is probably the best all-around one for exploring particularly query understanding, and to some extent content understanding. There are certainly other data sets available.
+You can, for example, grab dumps of data from Wikipedia, which are fascinating; Wikipedia is perhaps the best overall data set in the world. But bear in mind that it's a bit sprawling, and that they don't supply much in the way of queries or feedback. +And you'll have to do a little bit of work with that. There's a data set called MS MARCO that's been very popular with, essentially, the deep learning crowd, because it's an interesting place for doing question answering. +So I think a lot of the question becomes: what is the problem that you want to work on? +And I would say, for those of you who are already working in search in some capacity, or at least have access to data, you should really consider trying to use your own data, because usually the thing that is hardest to get in public data sets is user behavior. +For perfectly understandable reasons, companies are not eager to share what their users have done, either because of the privacy constraints for their users or the competitive nature of that data. +So even if you're able to find catalog data, which you could, if it's structured, use to learn content understanding techniques, for query understanding the most powerful thing you're going to use is a collection of queries and labels for what those queries mean. +And if you get a collection of data without even having what the queries are, let alone the labels, it's a little bit more difficult. Yeah, absolutely. And can you also share a bit on the tooling side, or maybe algorithms? Sure. +So, you know, different problems obviously call for different tools. On the ranking side, one of the most popular approaches that's still in use today is XGBoost, which you can get online easily enough. +And it's also been integrated with, I think at this point, most of the major search engines, certainly the Lucene-based ones, Solr, Elasticsearch, and so forth.
+If you're interested in classifying text, or doing unsupervised learning on text, there, you know, these days, frankly, I would go directly to embedding-based models. +And you can use tools like BERT, or maybe the old-school word2vec or fastText. You can get those online, you can download them, you can install them on your laptop, and you can even get pre-trained models for hundreds of languages. And from there, it should be easy enough. +You can just walk through the tutorials, where you take a bunch of labeled text examples; in the case of fastText, the example is Stack Exchange cooking questions with their associated labels. And in half an hour, you find that you're actually doing content classification with this. +And in the course we do this sort of thing with the Best Buy data as well. It's amazingly easy to see that these sorts of tools will start to give you reasonable-looking answers. + Now, getting from reasonable answers to answers that you're happy with, and incorporating them into a search experience, can be the difference between an hour and a week or a month. But my hope is that by seeing how easy it is to get started with these, you get tempted enough that you say: great, but 80% isn't good enough; +I need to get to something I'd be willing to put in front of my customers. And, to be fair, there's a little more hard work to make that happen. Yeah, absolutely. Vector search, by the way, is my favorite topic. I talk a lot about it, and blog about it as well. +And I'm super, super happy that you mentioned this now. And the gateway here to this topic, the connection with content understanding, is one of the techniques called doc2query, which essentially computes possible queries and then augments your document. +So you don't actually need to step into the unknown field of vector search, trying to re-engineer your search engine. You can actually keep your search engine architecture as it is.
+And you just basically augment your documents in the hope of increasing coverage, and actually precision at the same time, for future queries.
+So what's your take on these emerging techniques, but also what's your take on the role of vector search in search engine design today? Sure. So if you think about it, the doc2query
+approach is similar in spirit to saying, well, I'll have a known set of fields that I would assign to the document in a traditional inverted index or posting list. And indeed, the limitation of the older approaches is that they kind of force you into a limited vocabulary.
+And now the query vocabulary is literally the language of users. So I think that's a great way to do things, and a way to handle the fact that documents often have a lot more variability than queries.
+There's typically only so much people are going to type into a search box, but documents can come in all shapes and sizes.
+So I'm certainly a fan of doing document enrichment that's query-friendly, or conversely, if you're going to do things on the query side, of thinking of a query as a bag of documents.
+Even though there have been these explicit, what we call two-tower approaches that try to sort of meet halfway, I think it's perfectly fine to say, well, we'll think of one of these things as more fundamental and map the second one onto it. And first off, I think it's great:
+the idea that you can think of meaning as a point in an n-dimensional space and explore the things around it, even though in a way it's not a new idea, right?
+People have been using vectors at least as far back as techniques like TF-IDF, where the bag-of-words representation of content was simply a vector in a space where every word was a dimension.
+We've just gotten a little bit smarter about that over the past few decades. And certainly what we can do now with embeddings is amazing.
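The TF-IDF, bag-of-words picture just described, a vector with one dimension per word weighted by term frequency and inverse document frequency, can be sketched in a few lines of plain Python. This is a toy illustration with a made-up three-document corpus, not how a production engine computes it:

```python
import math
from collections import Counter

# Hypothetical toy corpus; in a real system these are your documents.
docs = [
    "cat wearing a red bow tie",
    "dog chasing a ball in the park",
    "a cat sleeping on the sofa",
]

def tf_idf_vectors(texts):
    """Classic TF-IDF bag-of-words vectors: one dimension per word."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    # Document frequency: in how many texts does each word appear?
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse word->weight vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Note how words that occur in every document (here, "a") get an IDF of zero and so contribute nothing to similarity, which is exactly the point of the IDF weighting.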
With that said, I think that sometimes people use vectors as too much of a sledgehammer.
+For example, if I do a query on a site for cat, turning cat into a vector, then turning all the documents into vectors, and then sorting them by cosine similarity, is probably overkill: how much information am I going to get out of the single token cat?
+Figuring out whether something is a cat, as Google showed, may require a huge amount of machine learning, for example based on images.
+But it's probably safe to say that, at least at the query level, there's only so much nuance you're going to get out of a one-word query corresponding to an entity.
+Where a pure vector-based approach would say, well, I'm going to take the vector for my query cat, I'm going to take all of the vectors for my documents, in which the very notion of cat is, I suppose, implied, and sort by their distance,
+it probably makes sense to start by saying, maybe I could have, either using doc2query or more traditional methods, annotated the documents in such a way that for the first pass I could retrieve the right things.
+In the case of a query that simple, a single token, there may not be much I can do at that point in terms of the use of vectors. Now, as queries get longer, they have more signal in them, and this game changes completely. Say I'm looking for a cat wearing a red bow tie.
+Well, with a query like that, it's very unlikely that a traditional approach is going to be able to say, well, what do I do? I'm going to look for those words, or some of them, or some other ones. You know, is a necktie
+the same as a bow tie? Would a cat in a tuxedo be better than just your typical cat pictures? And so, at that point, the game has changed, because it's not a simple binary question anymore, and the idea that identity is reduced to similarity makes a huge difference.
+And there, I think you lean much more heavily into these things. Now, I'd say that it's still the case that doing a pure, grab-everything, nearest-neighbors search in a vector space can be computationally challenging. And it can lead to sort of unpredictable results.
+So most people today are still doing their first pass, at least, using traditional methods. But I do know folks who are increasingly trying to use vectors from the get-go, just using sort of coarse-grained vectors or various techniques to make that first pass quick enough.
+So I think we're going in that direction. I think there's still a lot of value, both in computational efficiency and in explainability, in using traditional inverted indexing techniques where you can, especially for the early stages of retrieval.
+But whether for capturing these nuances or for, say, increasing your recall, retrieving things that might have been lost otherwise, we're seeing increasing use of vector search to get them. And, you know, we're doing this talk now in 2022.
+I suspect in a few years the inverted index methods will become more and more confined to those cases where the data is really just simple and binary.
+There'll always be certain head cases for them, but the use of vectors is only going to expand. Yeah, absolutely.
+Especially, I would say, an inverted index will still be needed if you are looking for an exact phrase. You don't want to say, hey, vectorize this and find the similar. No, I don't want similar, I want that exact thing.
+And of course, there are other things that need to be improved in vector search. For example, a BERT model, according to one study, doesn't recognize negations, and that might actually be crucial for some search scenarios, or for sentiment analysis.
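The two-pass pattern described above, a traditional inverted-index first pass followed by a vector-based rerank, can be sketched like this. It is a toy: the three-document corpus is made up, and raw term counts stand in for the learned embeddings a real system would use:

```python
import math
from collections import Counter, defaultdict

# Toy corpus; in practice the first pass runs on a real inverted index
# (Lucene/Solr/Elasticsearch) and the rerank uses learned embeddings.
docs = {
    1: "cat wearing a red bow tie",
    2: "cat in a tuxedo at a party",
    3: "dog wearing a red collar",
}

# First pass: classic inverted index mapping term -> doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def candidates(query):
    """Traditional retrieval: any document sharing a term with the query."""
    ids = set()
    for term in query.lower().split():
        ids |= index.get(term, set())
    return ids

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, k=2):
    """Second pass: rerank only the candidate set by vector similarity.
    Bag-of-words counts stand in for real embeddings here."""
    q_vec = Counter(query.lower().split())
    scored = [(cosine(q_vec, Counter(docs[d].lower().split())), d)
              for d in candidates(query)]
    return [d for _, d in sorted(scored, reverse=True)[:k]]
```

The design point is the one from the conversation: the cheap, explainable inverted index narrows the field, and the more expensive vector comparison only runs on that small candidate set.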
+And another thing is that, at this point, sparse, BM25-based methods are a very strong baseline when you compare methods across datasets, across tasks, and across domains. So I think the future is very bright in this direction.
+And I think a lot of folks are trying to combine these methods. So I'm happy that you are looking at this as well. And I believe you will be teaching about this as well. We are quite close to the top of the hour. And I'm happy to see so many questions coming in.
+But I'm going to ask you my favorite question, the question of why; it's this kind of mystical, philosophical question. You are the most celebrated search engine professional today, one of the most, if not the most.
+You've done everything there is to do in search, in my opinion. When I look at your CV, you know, you even consulted for Zoom, through which we're doing this session. So that speaks volumes. And I just wanted to ask what drives you to continue focusing on search engines,
+and especially teaching about it. So search, of all the things we do with technology, I believe is the one that puts us as human beings front and center.
+So much of what you see, and specifically machine learning and AI, is being done to us: feeds, recommendations, advertisements. Search starts with people expressing what they want.
+And in my version of the future, I believe that the machines will help us, but they have to start with us expressing our intent. So that's one reason why search is so exciting. And as for why I teach it, well, it comes back to what you asked in the beginning,
+about despairing that there's nothing exciting about search. I'm not despairing, but I do think that the need for people to be building great search is not met by the supply of people who have learned about it.
+And as much as I enjoy personally working as a consultant for companies, that's not exactly a scalable approach.
+What I see is that there are so many people out there who know enough that, with a little bit of a push, some combination of the basic domain knowledge that we're teaching in our fundamentals class, and the kinds of techniques, and our somewhat opinionated way of showing those techniques, in the search with machine learning class that focuses on query understanding and content understanding,
+and that takes dense retrieval, vector retrieval, and puts it in context, is just the nudge they need to get over this. You don't need to spend years, and not everybody is going to get to do a PhD in information retrieval and machine translation.
+But I think that today, if you are a software engineer, if you have a basic knowledge of coding, and you learn a few of these things, you can do wonders with the tools that are out there. And then from experience, you'll develop the rest of the skills that you need.
+So I'm excited that I can be a part of enabling the next generation to just run circles around anything I ever got to do.
+I look back at the work we did in the early 2000s and it looks so naive, although I think we were working at least on the right problems, but without the machinery we have today.
+And I just think, you know, in another 20 years, I look forward to looking back on the naivety of what we thought search was back at this point. Yeah, I'm ecstatic. This is very deep, Daniel. Thanks so much.
+And please keep doing what you're doing, because I really, really enjoy reading your blogs. I need to also read your book, by the way, on faceted search.
+And before we move to the questions from the audience, is there any announcement you would like to make to our audience? Well, you know, for those who don't know, we're going to be teaching these two classes in June.
+There's a search fundamentals class, which is a two-week class intended for people with no background in search, and the search with machine learning class that will start two weeks later, so people can take both. The latter, in fact, focuses on search with machine learning.
+So we're going to show you query understanding, content understanding, and vector retrieval. The first course will start on June 6th, the second on June 20th. And these classes are available to anyone in the world.
+Grant and I make sure that we cover the time zones well and that material is available asynchronously. So I hope that some of you have already signed up. Some of you have taken the class before.
+But what we experienced when we taught this class before was the incredible community, which we made sure to be a part of. And we're excited to keep going. Absolutely. And I highly recommend you take these courses, both of them or one of them.
+From first-hand experience, it was a breathless run: every single week I needed to hand in a project. And it's not just theory. And the theory, by the way, is very deep. If you have time, go and read all the write-ups that Daniel and Grant have done for the course.
+But also the actual act of coding, the actual thing that you see evolving in your hands, is amazing. So awesome. Let's proceed to the questions from the audience. I will take the first question from the Q&A panel. We also have some in the chat. So the first question is from Hemann Schu.
+I apologize if I mispronounce your name. Hi, Daniel. Any specific book to get started with search with elements of machine learning? Thanks. Yeah, so I'll say this: there's a lot that's been written on learning to rank. I think Chris Manning's book discusses it.
+Ricardo Baeza-Yates's book might be a little bit older. What I think you're going to be less likely to find, though, is material on query and content understanding.
There is a book.
+It's really more a collection of survey essays on query understanding for search that was published, I forget if it was last year or the year before. It's a little bit expensive, but check it out. If you're more interested in things for free, my blog at queryunderstanding.com is available.
+It will at least give you a survey of the techniques there. But to be clear, it's very focused on query understanding as such. I'm writing a series on content understanding at, unimaginatively, contentunderstanding.com, that starts doing the same thing there.
+But from the perspective of books, I would say that probably Chris Manning's information retrieval book would be a good place to start on information retrieval in general. And the query understanding collection of essays is frankly the best published resource you're going to get for that. Awesome.
+Hope that answers your question. The next one I'm going to take from the chat. It's from Chris. What are your recommendations for integrating information retrieval, retrieving documents, with question answering, returning answers within a context? Yeah.
+So question answering is really exciting, right? This idea that you can get the information instead of just the document.
+If you think about the way we've gotten there, a lot of it starts from passage retrieval, or even before that, search snippets. Essentially, if you think about the way Google looked five to ten years ago, you would see that sometimes, as you looked at your search results page, your answer was in the few words that were highlighted for the result.
+But now it's more likely that you'll see that sentence extracted and put near the top. And I would say that a lot of question answering today feels a lot like passage retrieval. That is: find that sentence.
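The "find that sentence" view of question answering can be sketched as a tiny passage retriever. This is a toy with a made-up document, and plain word overlap stands in for the vector similarity a modern system would use:

```python
import re

# A hypothetical document; a real QA system indexes many documents.
document = (
    "Lucene is a search library written in Java. "
    "It was created by Doug Cutting. "
    "Many search engines build on top of it."
)

def best_passage(question, text):
    """Passage retrieval as 'find that sentence': score each sentence by
    word overlap with the question. Real systems would embed the question
    and each passage and rank by vector similarity instead."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    q_words = set(question.lower().split())

    def overlap(sentence):
        return len(q_words & set(re.findall(r"\w+", sentence.lower())))

    return max(sentences, key=overlap)
```

For a question like "Lucene was created by whom", the sentence sharing the most question words wins, which is exactly the extract-and-surface behavior described above.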
+Although I would say that while before, that tended to be retrieving a passage that contained essentially the exact words you'd used, maybe with a little bit of variation for stemming or synonyms, nowadays it's more likely, using a vector-based approach, to be a sentence or passage that is similar in the vector space.
+That's quite exciting. However, what people really want is that even if there is no sentence in the content that exactly answers your question, somehow the search engine will be able, not to be a search engine but an
+answer engine, and say, oh, I'm able to synthesize content from different places and understand your question. We are not there. I mean, of course, you can ask what's e to the i pi plus one, and it will say zero, but that's cheating, right?
+You can play with Wolfram Alpha, which is a more sophisticated version of trying to essentially parse what you ask into a query that it can then execute in a formal language. But I think doing that on general information, we are far away from. It's exciting, but that will require a generation.
+Absolutely. I agree on that. The next question from the Q&A panel is from Donnie. What roles do curated controlled vocabularies, terminologies, thesauri, taxonomies, and so on play in practical approaches to query and content understanding, in your experience? Huge ones.
+And I'm glad that you asked me this. Basically, collecting these sorts of curated vocabularies can be a great way to label the content and have targets for doing machine-learned labeling. So for example, if you know what colors things come in, and people tend to look for colors,
+great, you can actually say, I'm going to label things this way. Or if I know what subjects people are interested in, those will be the subjects I label content with, and they're targets for queries.
+So in a way, having these vocabularies changes what would otherwise be an unconstrained problem, where I'm hoping that I can get some representation of what the content is about and what the query is about and match them, and, if I don't have vocabularies, I'll be somewhat unconstrained in how I do this.
+Having controlled vocabularies says, oh, those are the ways in which I will try to represent things.
+Even if I'm using vector-based methods to get there, they give me some alignment on the space, and having multiple such vocabularies might say, well, sometimes I might be interested in one aspect of this content and sometimes in another, right?
+So I might be interested in the color, or in the product type, or in the material.
+So I would say that having these vocabularies can make a big difference. And since you bring up thesauri, when there's a vocabulary gap between the way people ask things and the way things are represented, it's going to be helpful to use a thesaurus for query expansion.
+You have to be careful, because a thesaurus tends to wreak havoc with the context in which those words occur, but still, they can be great for generating candidates for retrieving more results. Awesome.
+I think we still have time, even at the top of the hour, for a last question from an anonymous attendee. So Daniel, you talked about query classification for the retrieval side of things, but that can be a slippery slope. If content isn't 100% correctly categorized, and often it's not,
+our recall would be negatively impacted by using query understanding as a hard filter. Any input on that? Absolutely. I mean, I was burned by this myself when I was helping a client with trying to target promoted search results.
+And I said, oh, we should only put them in the right category. Indeed, because there were some categorization errors, this had such a negative impact that I stormed into the room saying, close the test, we're losing money.
+And I felt very embarrassed, because it turned out that, indeed, this was the problem: the content wasn't classified well. Of course, the first thing I would say is invest on the content side, because if you're able to classify queries, you can probably also invest in classifying the content.
+And by the way, that holds even if the content has been, say, categorized in a way you're not allowed to override. Maybe you're a marketplace, and you don't own the categorization.
+For example, on LinkedIn, if I decide to say I'm an attorney, LinkedIn's not going to just automatically change that and say, no, you're actually not. But it can still classify me and say, you know, you kind of look like a software engineer.
+And you can use inferred categories in your retrieval. There's no prohibition there. I'd also say that maybe the content isn't 100% categorized because there are some similar categories.
+Well, you can always treat query classification not as a hard and fast rule but more like guidelines, and say, well, I'll return things in that category, but I'll also return things that, say, refer to that category.
+And maybe you'll even return things that are in similar categories. Or, in the same way, if I'm not 100% sure on the category, even on the query side, maybe I'll take the top few categories that I thought were possible.
+There are a lot of ways in which you can use what you've learned about the query and what you've learned about the content as hints. And with all these things, it's a precision-recall trade-off.
+And you have to generally decide what's the cost of losing that recall versus what's the cost of the annoyance of losing precision. And it will depend. Absolutely. It's a journey. And I've enjoyed this conversation so much. I've learned new things. I will rewatch this podcast myself.
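The "guidelines, not a hard filter" idea just described, using the classifier's top categories as a ranking boost instead of a filter so that recall survives classification errors, might look like this. The documents, categories, and scores are made up for illustration:

```python
# Toy example of using inferred query categories as a boost rather than
# a hard filter: mismatched documents rank lower but are never dropped.
docs = [
    {"title": "leather boots", "category": "shoes", "base_score": 1.0},
    {"title": "boot cut jeans", "category": "clothing", "base_score": 1.2},
    {"title": "boot polish", "category": "shoe care", "base_score": 0.8},
]

def rank(query_categories, docs, boost=0.5):
    """A hard filter would discard anything outside query_categories;
    here a category match only adds a boost, preserving recall."""
    def score(doc):
        bonus = boost if doc["category"] in query_categories else 0.0
        return doc["base_score"] + bonus
    return sorted(docs, key=score, reverse=True)

# Pass the classifier's top few categories, not just its single best guess.
results = rank({"shoes", "shoe care"}, docs)
```

All three documents survive; a misclassified item loses some position instead of disappearing, which is the precision-recall trade-off being discussed.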
+And thanks to everyone for asking your questions, and thanks, Daniel, for answering them, brilliantly as you usually do. Hopefully we covered all the questions from the chat and from the Q&A panel. And yeah, thank you so much.
+And where you don't feel comfortable, or where you don't know yet, I highly recommend you take the course, or a course, on search. And go from there; experiment. Be bold about what you do in your experiments. But apply science, and apply the knowledge from the moguls like Daniel and Grant.
+So thank you very much, Daniel, for your time and for your wisdom today. Thank you, it was such a pleasure. All right. Thank you, everyone. Thanks. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md b/transcripts_with_timestamps/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md new file mode 100644 index 0000000..41a7278 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/debunking-myths-of-vector-search-and-llms-with-leo-boytsov.md @@ -0,0 +1,2867 @@ +--- +description: '

00:00 + Intro

01:31 + Leo''s story

09:59 + SPLADE: single model to solve both dense and sparse?

21:06 + DeepImpact

29:58 + NMSLIB: what are non-metric spaces

34:21 + How HNSW and NMSLIB joined forces

41:11 + Why FAISS did not choose NMSLIB''s algorithm

43:36 + Serendipity of discovery and the creation of industries

47:06 + Vector Search: intellectually rewarding, professionally undervalued

52:37 + Why RDBMS Still Struggles with Scalable Vector and Free-Text Search

1:00:16 + Leo''s recent favorite papers

Lots + of papers and other material from Leo: https://www.youtube.com/watch?v=gzWErcOXIKk

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250117_030137_0d0e98d093e79861b9dc85d445adcf1e.png +pub_date: Fri, 17 Jan 2025 15:07:51 GMT +title: Debunking myths of vector search and LLMs with Leo Boytsov +url: https://rss.com/podcasts/vector-podcast/1852660 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 22.6, "text": " Hi everyone, + Vector Podcast is back with still with season three and I''m super excited", "tokens": + [50364, 2421, 1518, 11, 691, 20814, 29972, 307, 646, 365, 920, 365, 3196, 1045, + 293, 286, 478, 1687, 2919, 51494], "temperature": 0.0, "avg_logprob": -0.2913332939147949, + "compression_ratio": 1.3255813953488371, "no_speech_prob": 0.21869045495986938}, + {"id": 1, "seek": 0, "start": 22.6, "end": 27.28, "text": " to be talking to my + guests today and there is a connection with this episode between", "tokens": [51494, + 281, 312, 1417, 281, 452, 9804, 965, 293, 456, 307, 257, 4984, 365, 341, 3500, 1296, + 51728], "temperature": 0.0, "avg_logprob": -0.2913332939147949, "compression_ratio": + 1.3255813953488371, "no_speech_prob": 0.21869045495986938}, {"id": 2, "seek": 2728, + "start": 27.28, "end": 34.28, "text": " this episode and the episode that we recorded + with Yuri Malkov about one of the most famous", "tokens": [50364, 341, 3500, 293, + 264, 3500, 300, 321, 8287, 365, 33901, 376, 667, 5179, 466, 472, 295, 264, 881, + 4618, 50714], "temperature": 0.0, "avg_logprob": -0.3528098649876092, "compression_ratio": + 1.5443037974683544, "no_speech_prob": 0.2689688205718994}, {"id": 3, "seek": 2728, + "start": 34.28, "end": 40.400000000000006, "text": " and popular vector search algorithm + I can ask WL and I''m talking today with Leo Boyzov,", "tokens": [50714, 293, 3743, + 8062, 3164, 9284, 286, 393, 1029, 343, 43, 293, 286, 478, 1417, 965, 365, 19344, + 9486, 89, 5179, 11, 51020], "temperature": 0.0, "avg_logprob": -0.3528098649876092, + "compression_ratio": 1.5443037974683544, "no_speech_prob": 
0.2689688205718994}, + {"id": 4, "seek": 2728, "start": 40.400000000000006, "end": 49.36, "text": " who + is the senior research scientist at AWS and he is also a co-author of Animesleab + and Animesleab", "tokens": [51020, 567, 307, 264, 7965, 2132, 12662, 412, 17650, + 293, 415, 307, 611, 257, 598, 12, 34224, 295, 1107, 1532, 306, 455, 293, 1107, 1532, + 306, 455, 51468], "temperature": 0.0, "avg_logprob": -0.3528098649876092, "compression_ratio": + 1.5443037974683544, "no_speech_prob": 0.2689688205718994}, {"id": 5, "seek": 2728, + "start": 49.36, "end": 55.44, "text": " is today used at Open Search and probably + some other places that I actually don''t know", "tokens": [51468, 307, 965, 1143, + 412, 7238, 17180, 293, 1391, 512, 661, 3190, 300, 286, 767, 500, 380, 458, 51772], + "temperature": 0.0, "avg_logprob": -0.3528098649876092, "compression_ratio": 1.5443037974683544, + "no_speech_prob": 0.2689688205718994}, {"id": 6, "seek": 5544, "start": 55.44, "end": + 64.96, "text": " and I hope to learn it as well today. 
This is just exciting and + I think goes without saying", "tokens": [50364, 293, 286, 1454, 281, 1466, 309, + 382, 731, 965, 13, 639, 307, 445, 4670, 293, 286, 519, 1709, 1553, 1566, 50840], + "temperature": 0.0, "avg_logprob": -0.1196063504074559, "compression_ratio": 1.5265957446808511, + "no_speech_prob": 0.010003809817135334}, {"id": 7, "seek": 5544, "start": 64.96, + "end": 72.47999999999999, "text": " that the whole field stands on the work done + by people like Leo and Yuri and others who actually", "tokens": [50840, 300, 264, + 1379, 2519, 7382, 322, 264, 589, 1096, 538, 561, 411, 19344, 293, 33901, 293, 2357, + 567, 767, 51216], "temperature": 0.0, "avg_logprob": -0.1196063504074559, "compression_ratio": + 1.5265957446808511, "no_speech_prob": 0.010003809817135334}, {"id": 8, "seek": 5544, + "start": 72.47999999999999, "end": 79.75999999999999, "text": " develop the core + algorithms and popularize them, improve them over time and then the story unfolds", + "tokens": [51216, 1499, 264, 4965, 14642, 293, 3743, 1125, 552, 11, 3470, 552, 670, + 565, 293, 550, 264, 1657, 17980, 82, 51580], "temperature": 0.0, "avg_logprob": + -0.1196063504074559, "compression_ratio": 1.5265957446808511, "no_speech_prob": + 0.010003809817135334}, {"id": 9, "seek": 7976, "start": 80.56, "end": 88.0, "text": + " from there. Hi Leo, how you doing? Hi, thank you for introducing me, it''s a great + pleasure", "tokens": [50404, 490, 456, 13, 2421, 19344, 11, 577, 291, 884, 30, 2421, + 11, 1309, 291, 337, 15424, 385, 11, 309, 311, 257, 869, 6834, 50776], "temperature": + 0.0, "avg_logprob": -0.21352429707845053, "compression_ratio": 1.4973544973544974, + "no_speech_prob": 0.020247841253876686}, {"id": 10, "seek": 7976, "start": 88.0, + "end": 93.92, "text": " to be able to podcast. Yeah, it''s my pleasure as well to + have you. 
Traditionally we start with", "tokens": [50776, 281, 312, 1075, 281, 7367, + 13, 865, 11, 309, 311, 452, 6834, 382, 731, 281, 362, 291, 13, 22017, 15899, 321, + 722, 365, 51072], "temperature": 0.0, "avg_logprob": -0.21352429707845053, "compression_ratio": + 1.4973544973544974, "no_speech_prob": 0.020247841253876686}, {"id": 11, "seek": + 7976, "start": 93.92, "end": 101.84, "text": " the background. Can you say in a + few words your background maybe how you got here and what''s your", "tokens": [51072, + 264, 3678, 13, 1664, 291, 584, 294, 257, 1326, 2283, 428, 3678, 1310, 577, 291, + 658, 510, 293, 437, 311, 428, 51468], "temperature": 0.0, "avg_logprob": -0.21352429707845053, + "compression_ratio": 1.4973544973544974, "no_speech_prob": 0.020247841253876686}, + {"id": 12, "seek": 10184, "start": 101.84, "end": 112.80000000000001, "text": " + story in search vector search and maybe LLM? Yeah, sure, yeah so background is pretty + long.", "tokens": [50364, 1657, 294, 3164, 8062, 3164, 293, 1310, 441, 43, 44, 30, + 865, 11, 988, 11, 1338, 370, 3678, 307, 1238, 938, 13, 50912], "temperature": 0.0, + "avg_logprob": -0.3075940673415725, "compression_ratio": 1.385786802030457, "no_speech_prob": + 0.006675052922219038}, {"id": 13, "seek": 10184, "start": 116.0, "end": 122.32000000000001, + "text": " So I''ve had a rather long career, honestly. Well in my current capacity + as you mentioned,", "tokens": [51072, 407, 286, 600, 632, 257, 2831, 938, 3988, + 11, 6095, 13, 1042, 294, 452, 2190, 6042, 382, 291, 2835, 11, 51388], "temperature": + 0.0, "avg_logprob": -0.3075940673415725, "compression_ratio": 1.385786802030457, + "no_speech_prob": 0.006675052922219038}, {"id": 14, "seek": 10184, "start": 122.32000000000001, + "end": 129.36, "text": " I am a scientist at WSAA labs. 
For one year I was working + on co-generation about this year,", "tokens": [51388, 286, 669, 257, 12662, 412, + 343, 50, 5265, 20339, 13, 1171, 472, 1064, 286, 390, 1364, 322, 598, 12, 30372, + 466, 341, 1064, 11, 51740], "temperature": 0.0, "avg_logprob": -0.3075940673415725, + "compression_ratio": 1.385786802030457, "no_speech_prob": 0.006675052922219038}, + {"id": 15, "seek": 12936, "start": 129.36, "end": 135.28, "text": " earlier this + year I moved to a Q-console team, Q-console team works on question and", "tokens": + [50364, 3071, 341, 1064, 286, 4259, 281, 257, 1249, 12, 1671, 9481, 1469, 11, 1249, + 12, 1671, 9481, 1469, 1985, 322, 1168, 293, 50660], "temperature": 0.0, "avg_logprob": + -0.32064435389134793, "compression_ratio": 1.4736842105263157, "no_speech_prob": + 0.013303917832672596}, {"id": 16, "seek": 12936, "start": 135.28, "end": 143.12, + "text": " switch at bots that answers questions about various AWS services. So we + can ask like, I don''t know,", "tokens": [50660, 3679, 412, 35410, 300, 6338, 1651, + 466, 3683, 17650, 3328, 13, 407, 321, 393, 1029, 411, 11, 286, 500, 380, 458, 11, + 51052], "temperature": 0.0, "avg_logprob": -0.32064435389134793, "compression_ratio": + 1.4736842105263157, "no_speech_prob": 0.013303917832672596}, {"id": 17, "seek": + 12936, "start": 143.12, "end": 152.08, "text": " it''s like where''s my EC2 instance + things like that and how I set up things. 
But I have to make a", "tokens": [51052, + 309, 311, 411, 689, 311, 452, 19081, 17, 5197, 721, 411, 300, 293, 577, 286, 992, + 493, 721, 13, 583, 286, 362, 281, 652, 257, 51500], "temperature": 0.0, "avg_logprob": + -0.32064435389134793, "compression_ratio": 1.4736842105263157, "no_speech_prob": + 0.013303917832672596}, {"id": 18, "seek": 15208, "start": 152.08, "end": 160.64000000000001, + "text": " disclaimer that today I do not speak on behalf of AWS and I cannot talk + in details about my work", "tokens": [50364, 40896, 300, 965, 286, 360, 406, 1710, + 322, 9490, 295, 17650, 293, 286, 2644, 751, 294, 4365, 466, 452, 589, 50792], "temperature": + 0.0, "avg_logprob": -0.12685412710363214, "compression_ratio": 1.5833333333333333, + "no_speech_prob": 0.007998749613761902}, {"id": 19, "seek": 15208, "start": 160.64000000000001, + "end": 170.24, "text": " there. So as I said I had a really relatively long career. + Yeah, so most of nearly all of my life,", "tokens": [50792, 456, 13, 407, 382, 286, + 848, 286, 632, 257, 534, 7226, 938, 3988, 13, 865, 11, 370, 881, 295, 6217, 439, + 295, 452, 993, 11, 51272], "temperature": 0.0, "avg_logprob": -0.12685412710363214, + "compression_ratio": 1.5833333333333333, "no_speech_prob": 0.007998749613761902}, + {"id": 20, "seek": 15208, "start": 170.24, "end": 175.92000000000002, "text": " + I have been a computer science geek with a passion for building cool stuff and solving + hard", "tokens": [51272, 286, 362, 668, 257, 3820, 3497, 36162, 365, 257, 5418, + 337, 2390, 1627, 1507, 293, 12606, 1152, 51556], "temperature": 0.0, "avg_logprob": + -0.12685412710363214, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.007998749613761902}, {"id": 21, "seek": 15208, "start": 175.92000000000002, "end": + 182.0, "text": " problems. Yet my professional career started in rather mundane + fashion. 
So I started working", "tokens": [51556, 2740, 13, 10890, 452, 4843, 3988, + 1409, 294, 2831, 43497, 6700, 13, 407, 286, 1409, 1364, 51860], "temperature": 0.0, + "avg_logprob": -0.12685412710363214, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.007998749613761902}, {"id": 22, "seek": 18200, "start": 182.24, "end": 189.44, + "text": " client and service of where for financial systems. This was not my favorite + subject, but pretty much", "tokens": [50376, 6423, 293, 2643, 295, 689, 337, 4669, + 3652, 13, 639, 390, 406, 452, 2954, 3983, 11, 457, 1238, 709, 50736], "temperature": + 0.0, "avg_logprob": -0.17028651180037532, "compression_ratio": 1.5355648535564854, + "no_speech_prob": 0.002024303423240781}, {"id": 23, "seek": 18200, "start": 189.44, + "end": 195.52, "text": " the only one that was paid reasonably well at the time. + So I had to do a lot of front end and", "tokens": [50736, 264, 787, 472, 300, 390, + 4835, 23551, 731, 412, 264, 565, 13, 407, 286, 632, 281, 360, 257, 688, 295, 1868, + 917, 293, 51040], "temperature": 0.0, "avg_logprob": -0.17028651180037532, "compression_ratio": + 1.5355648535564854, "no_speech_prob": 0.002024303423240781}, {"id": 24, "seek": + 18200, "start": 195.52, "end": 203.76, "text": " back end engineering using various + SQL databases. 
I was not satisfied with my career, but", "tokens": [51040, 646, + 917, 7043, 1228, 3683, 19200, 22380, 13, 286, 390, 406, 11239, 365, 452, 3988, 11, + 457, 51452], "temperature": 0.0, "avg_logprob": -0.17028651180037532, "compression_ratio": + 1.5355648535564854, "no_speech_prob": 0.002024303423240781}, {"id": 25, "seek": + 18200, "start": 204.4, "end": 209.44, "text": " luckily I got really interested + in algorithms, in particular retrieval algorithms.", "tokens": [51484, 22880, 286, + 658, 534, 3102, 294, 14642, 11, 294, 1729, 19817, 3337, 14642, 13, 51736], "temperature": + 0.0, "avg_logprob": -0.17028651180037532, "compression_ratio": 1.5355648535564854, + "no_speech_prob": 0.002024303423240781}, {"id": 26, "seek": 20944, "start": 210.4, + "end": 216.0, "text": " So I started working on this topic with the algorithms first + part time, then full time.", "tokens": [50412, 407, 286, 1409, 1364, 322, 341, 4829, + 365, 264, 14642, 700, 644, 565, 11, 550, 1577, 565, 13, 50692], "temperature": 0.0, + "avg_logprob": -0.24250221252441406, "compression_ratio": 1.6355140186915889, "no_speech_prob": + 0.005886485800147057}, {"id": 27, "seek": 20944, "start": 216.96, "end": 223.6, + "text": " But largely as a software engineer, less as a researcher. 
And as a software + engineer, work", "tokens": [50740, 583, 11611, 382, 257, 4722, 11403, 11, 1570, + 382, 257, 21751, 13, 400, 382, 257, 4722, 11403, 11, 589, 51072], "temperature": + 0.0, "avg_logprob": -0.24250221252441406, "compression_ratio": 1.6355140186915889, + "no_speech_prob": 0.005886485800147057}, {"id": 28, "seek": 20944, "start": 223.6, + "end": 231.12, "text": " for various companies, including two tiny startups in the + Russian search engine and the Yandex.", "tokens": [51072, 337, 3683, 3431, 11, 3009, + 732, 5870, 28041, 294, 264, 7220, 3164, 2848, 293, 264, 398, 474, 3121, 13, 51448], + "temperature": 0.0, "avg_logprob": -0.24250221252441406, "compression_ratio": 1.6355140186915889, + "no_speech_prob": 0.005886485800147057}, {"id": 29, "seek": 20944, "start": 232.24, + "end": 237.36, "text": " So later I moved to the United States and work on the search + engine PubMed,", "tokens": [51504, 407, 1780, 286, 4259, 281, 264, 2824, 3040, 293, + 589, 322, 264, 3164, 2848, 21808, 42954, 11, 51760], "temperature": 0.0, "avg_logprob": + -0.24250221252441406, "compression_ratio": 1.6355140186915889, "no_speech_prob": + 0.005886485800147057}, {"id": 30, "seek": 23736, "start": 237.52, "end": 244.24, + "text": " International Center of Biotechnology, information. First again, that + was a common topic in my career,", "tokens": [50372, 9157, 5169, 295, 13007, 43594, + 11, 1589, 13, 2386, 797, 11, 300, 390, 257, 2689, 4829, 294, 452, 3988, 11, 50708], + "temperature": 0.0, "avg_logprob": -0.41284378715183423, "compression_ratio": 1.4919354838709677, + "no_speech_prob": 0.005691992584615946}, {"id": 31, "seek": 23736, "start": 244.24, + "end": 250.24, "text": " started working with I was doing a lot of front end development. 
+ But about the class,", "tokens": [50708, 1409, 1364, 365, 286, 390, 884, 257, 688, + 295, 1868, 917, 3250, 13, 583, 466, 264, 1508, 11, 51008], "temperature": 0.0, "avg_logprob": + -0.41284378715183423, "compression_ratio": 1.4919354838709677, "no_speech_prob": + 0.005691992584615946}, {"id": 32, "seek": 23736, "start": 250.24, "end": 258.40000000000003, + "text": " 40 years I worked primarily on the T-Roll, the core engine. In particular, + I invented a pretty", "tokens": [51008, 3356, 924, 286, 2732, 10029, 322, 264, 314, + 12, 49, 1833, 11, 264, 4965, 2848, 13, 682, 1729, 11, 286, 14479, 257, 1238, 51416], + "temperature": 0.0, "avg_logprob": -0.41284378715183423, "compression_ratio": 1.4919354838709677, + "no_speech_prob": 0.005691992584615946}, {"id": 33, "seek": 23736, "start": 258.40000000000003, + "end": 266.0, "text": " need to speed up weighted bull in the T-Roll. And around + the time I also realized that", "tokens": [51416, 643, 281, 3073, 493, 32807, 4693, + 294, 264, 314, 12, 49, 1833, 13, 400, 926, 264, 565, 286, 611, 5334, 300, 51796], + "temperature": 0.0, "avg_logprob": -0.41284378715183423, "compression_ratio": 1.4919354838709677, + "no_speech_prob": 0.005691992584615946}, {"id": 34, "seek": 26600, "start": 266.72, + "end": 273.6, "text": " it would be hard to get to the search position without a + good degree. 
So that motivated me to", "tokens": [50400, 309, 576, 312, 1152, 281, + 483, 281, 264, 3164, 2535, 1553, 257, 665, 4314, 13, 407, 300, 14515, 385, 281, + 50744], "temperature": 0.0, "avg_logprob": -0.18076391623053753, "compression_ratio": + 1.4170854271356783, "no_speech_prob": 0.007606916129589081}, {"id": 35, "seek": + 26600, "start": 274.4, "end": 281.44, "text": " apply a bunch of universities and + eventually I got accepted by Carnegie Mell, which was a huge", "tokens": [50784, + 3079, 257, 3840, 295, 11779, 293, 4728, 286, 658, 9035, 538, 47301, 376, 898, 11, + 597, 390, 257, 2603, 51136], "temperature": 0.0, "avg_logprob": -0.18076391623053753, + "compression_ratio": 1.4170854271356783, "no_speech_prob": 0.007606916129589081}, + {"id": 36, "seek": 26600, "start": 281.44, "end": 292.08, "text": " lock. But yeah, + so I did my PhD studies there. And during these studies, I worked on a mix of", + "tokens": [51136, 4017, 13, 583, 1338, 11, 370, 286, 630, 452, 14476, 5313, 456, + 13, 400, 1830, 613, 5313, 11, 286, 2732, 322, 257, 2890, 295, 51668], "temperature": + 0.0, "avg_logprob": -0.18076391623053753, "compression_ratio": 1.4170854271356783, + "no_speech_prob": 0.007606916129589081}, {"id": 37, "seek": 29208, "start": 292.08, + "end": 299.12, "text": " machine learning and algorithm algorithms without any deep + learning. So the vector search or", "tokens": [50364, 3479, 2539, 293, 9284, 14642, + 1553, 604, 2452, 2539, 13, 407, 264, 8062, 3164, 420, 50716], "temperature": 0.0, + "avg_logprob": -0.32173025608062744, "compression_ratio": 1.6127167630057804, "no_speech_prob": + 0.001641846145503223}, {"id": 38, "seek": 29208, "start": 299.12, "end": 306.56, + "text": " rather similarity search was a part of my graduate studies. 
So yeah, I + didn''t use any deep learning", "tokens": [50716, 2831, 32194, 3164, 390, 257, 644, + 295, 452, 8080, 5313, 13, 407, 1338, 11, 286, 994, 380, 764, 604, 2452, 2539, 51088], + "temperature": 0.0, "avg_logprob": -0.32173025608062744, "compression_ratio": 1.6127167630057804, + "no_speech_prob": 0.001641846145503223}, {"id": 39, "seek": 29208, "start": 306.56, + "end": 312.56, "text": " though. It was a mix of classical machine learning, VortoVex + style neural networks and", "tokens": [51088, 1673, 13, 467, 390, 257, 2890, 295, + 13735, 3479, 2539, 11, 691, 477, 78, 53, 3121, 3758, 18161, 9590, 293, 51388], "temperature": + 0.0, "avg_logprob": -0.32173025608062744, "compression_ratio": 1.6127167630057804, + "no_speech_prob": 0.001641846145503223}, {"id": 40, "seek": 31256, "start": 313.28000000000003, + "end": 320.0, "text": " digital. So what is an interesting part of that story is + that my advisor, Eric Nyberg,", "tokens": [50400, 4562, 13, 407, 437, 307, 364, + 1880, 644, 295, 300, 1657, 307, 300, 452, 19161, 11, 9336, 29214, 6873, 11, 50736], + "temperature": 0.0, "avg_logprob": -0.3530677448619496, "compression_ratio": 1.4977578475336324, + "no_speech_prob": 0.004985547158867121}, {"id": 41, "seek": 31256, "start": 320.64, + "end": 325.92, "text": " he worked on question answering. 
And together with his + theory and his", "tokens": [50768, 415, 2732, 322, 1168, 13430, 13, 400, 1214, 365, + 702, 5261, 293, 702, 51032], "temperature": 0.0, "avg_logprob": -0.3530677448619496, + "compression_ratio": 1.4977578475336324, "no_speech_prob": 0.004985547158867121}, + {"id": 42, "seek": 31256, "start": 325.92, "end": 332.64, "text": " participated + in development of IBM Watson, that''s an amazing trivia playing system that", "tokens": + [51032, 17978, 294, 3250, 295, 23487, 25640, 11, 300, 311, 364, 2243, 48770, 2433, + 1185, 300, 51368], "temperature": 0.0, "avg_logprob": -0.3530677448619496, "compression_ratio": + 1.4977578475336324, "no_speech_prob": 0.004985547158867121}, {"id": 43, "seek": + 31256, "start": 334.32, "end": 341.2, "text": " 2011 defeated human champions. So + that was like one reason why I chose my advisor. It was", "tokens": [51452, 10154, + 15563, 1952, 11230, 13, 407, 300, 390, 411, 472, 1778, 983, 286, 5111, 452, 19161, + 13, 467, 390, 51796], "temperature": 0.0, "avg_logprob": -0.3530677448619496, "compression_ratio": + 1.4977578475336324, "no_speech_prob": 0.004985547158867121}, {"id": 44, "seek": + 34120, "start": 341.2, "end": 347.36, "text": " like such a cool topic to choose. + But pretty quickly I learned about the system and realized,", "tokens": [50364, + 411, 1270, 257, 1627, 4829, 281, 2826, 13, 583, 1238, 2661, 286, 3264, 466, 264, + 1185, 293, 5334, 11, 50672], "temperature": 0.0, "avg_logprob": -0.19147655528078797, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.0015903604216873646}, + {"id": 45, "seek": 34120, "start": 347.36, "end": 353.59999999999997, "text": " + oh, like it''s actually really just not just but it''s largely such engine on steroids. 
+ So", "tokens": [50672, 1954, 11, 411, 309, 311, 767, 534, 445, 406, 445, 457, 309, + 311, 11611, 1270, 2848, 322, 45717, 13, 407, 50984], "temperature": 0.0, "avg_logprob": + -0.19147655528078797, "compression_ratio": 1.5932203389830508, "no_speech_prob": + 0.0015903604216873646}, {"id": 46, "seek": 34120, "start": 353.59999999999997, "end": + 362.15999999999997, "text": " Retrieval, IBM Watson, I have a blog post about that + if anybody is interested. But then Retrieval,", "tokens": [50984, 11495, 5469, 3337, + 11, 23487, 25640, 11, 286, 362, 257, 6968, 2183, 466, 300, 498, 4472, 307, 3102, + 13, 583, 550, 11495, 5469, 3337, 11, 51412], "temperature": 0.0, "avg_logprob": + -0.19147655528078797, "compression_ratio": 1.5932203389830508, "no_speech_prob": + 0.0015903604216873646}, {"id": 47, "seek": 34120, "start": 362.15999999999997, "end": + 368.8, "text": " it''s basically really Retrieval based extractive question answering + systems. So if you want to", "tokens": [51412, 309, 311, 1936, 534, 11495, 5469, + 3337, 2361, 8947, 488, 1168, 13430, 3652, 13, 407, 498, 291, 528, 281, 51744], "temperature": + 0.0, "avg_logprob": -0.19147655528078797, "compression_ratio": 1.5932203389830508, + "no_speech_prob": 0.0015903604216873646}, {"id": 48, "seek": 36880, "start": 368.8, + "end": 374.32, "text": " improve question answering, you need to improve Retrieval. + So that''s how I got back to working", "tokens": [50364, 3470, 1168, 13430, 11, + 291, 643, 281, 3470, 11495, 5469, 3337, 13, 407, 300, 311, 577, 286, 658, 646, 281, + 1364, 50640], "temperature": 0.0, "avg_logprob": -0.23950576782226562, "compression_ratio": + 1.5127118644067796, "no_speech_prob": 0.0013318248093128204}, {"id": 49, "seek": + 36880, "start": 374.32, "end": 380.24, "text": " on quality algorithms. 
And again, + I saw an opportunity and why big research question was,", "tokens": [50640, 322, + 3125, 14642, 13, 400, 797, 11, 286, 1866, 364, 2650, 293, 983, 955, 2132, 1168, + 390, 11, 50936], "temperature": 0.0, "avg_logprob": -0.23950576782226562, "compression_ratio": + 1.5127118644067796, "no_speech_prob": 0.0013318248093128204}, {"id": 50, "seek": + 36880, "start": 380.8, "end": 386.48, "text": " how can we do information Retrieval + using more advanced techniques rather than", "tokens": [50964, 577, 393, 321, 360, + 1589, 11495, 5469, 3337, 1228, 544, 7339, 7512, 2831, 813, 51248], "temperature": + 0.0, "avg_logprob": -0.23950576782226562, "compression_ratio": 1.5127118644067796, + "no_speech_prob": 0.0013318248093128204}, {"id": 51, "seek": 36880, "start": 386.48, + "end": 394.08000000000004, "text": " lexical search with BIRN 25? And because before + birth, nowadays like everybody just uses like", "tokens": [51248, 476, 87, 804, + 3164, 365, 363, 7740, 45, 3552, 30, 400, 570, 949, 3965, 11, 13434, 411, 2201, 445, + 4960, 411, 51628], "temperature": 0.0, "avg_logprob": -0.23950576782226562, "compression_ratio": + 1.5127118644067796, "no_speech_prob": 0.0013318248093128204}, {"id": 52, "seek": + 39408, "start": 394.08, "end": 402.47999999999996, "text": " word-based models or + any like other transform based models to create dense vector embeddings and", "tokens": + [50364, 1349, 12, 6032, 5245, 420, 604, 411, 661, 4088, 2361, 5245, 281, 1884, 18011, + 8062, 12240, 29432, 293, 50784], "temperature": 0.0, "avg_logprob": -0.16363255730990706, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.0017891578609123826}, + {"id": 53, "seek": 39408, "start": 402.47999999999996, "end": 410.96, "text": " + they are quite effective, that was not the case when like 10 years ago. 
So whatever + we had there", "tokens": [50784, 436, 366, 1596, 4942, 11, 300, 390, 406, 264, 1389, + 562, 411, 1266, 924, 2057, 13, 407, 2035, 321, 632, 456, 51208], "temperature": + 0.0, "avg_logprob": -0.16363255730990706, "compression_ratio": 1.6623931623931625, + "no_speech_prob": 0.0017891578609123826}, {"id": 54, "seek": 39408, "start": 410.96, + "end": 416.79999999999995, "text": " was pretty ineffective Retrieval. And so my + thought was that because the single representation was", "tokens": [51208, 390, + 1238, 48836, 11495, 5469, 3337, 13, 400, 370, 452, 1194, 390, 300, 570, 264, 2167, + 10290, 390, 51500], "temperature": 0.0, "avg_logprob": -0.16363255730990706, "compression_ratio": + 1.6623931623931625, "no_speech_prob": 0.0017891578609123826}, {"id": 55, "seek": + 39408, "start": 416.79999999999995, "end": 422.56, "text": " not effective Retrieval, + those need to be somehow combined and assembled. So you basically don''t", "tokens": + [51500, 406, 4942, 11495, 5469, 3337, 11, 729, 643, 281, 312, 6063, 9354, 293, 24204, + 13, 407, 291, 1936, 500, 380, 51788], "temperature": 0.0, "avg_logprob": -0.16363255730990706, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.0017891578609123826}, + {"id": 56, "seek": 42256, "start": 422.56, "end": 428.96, "text": " get a single + representation. You use a combination, you use combine similarity and then you treat", + "tokens": [50364, 483, 257, 2167, 10290, 13, 509, 764, 257, 6562, 11, 291, 764, + 10432, 32194, 293, 550, 291, 2387, 50684], "temperature": 0.0, "avg_logprob": -0.2163057105485783, + "compression_ratio": 1.786046511627907, "no_speech_prob": 0.0010187685256823897}, + {"id": 57, "seek": 42256, "start": 428.96, "end": 435.12, "text": " this similarity + as a black box and then you apply generic Retrieval algorithm. 
So this was a pretty", + "tokens": [50684, 341, 32194, 382, 257, 2211, 2424, 293, 550, 291, 3079, 19577, + 11495, 5469, 3337, 9284, 13, 407, 341, 390, 257, 1238, 50992], "temperature": 0.0, + "avg_logprob": -0.2163057105485783, "compression_ratio": 1.786046511627907, "no_speech_prob": + 0.0010187685256823897}, {"id": 58, "seek": 42256, "start": 436.32, "end": 442.08, + "text": " in hindsight that was a pretty ambitious project that required working + on both design and effective", "tokens": [51052, 294, 44357, 300, 390, 257, 1238, + 20239, 1716, 300, 4739, 1364, 322, 1293, 1715, 293, 4942, 51340], "temperature": + 0.0, "avg_logprob": -0.2163057105485783, "compression_ratio": 1.786046511627907, + "no_speech_prob": 0.0010187685256823897}, {"id": 59, "seek": 42256, "start": 442.08, + "end": 451.12, "text": " similarities. And Retrieval algorithms. And that''s why + we, well one, that''s where that", "tokens": [51340, 24197, 13, 400, 11495, 5469, + 3337, 14642, 13, 400, 300, 311, 983, 321, 11, 731, 472, 11, 300, 311, 689, 300, + 51792], "temperature": 0.0, "avg_logprob": -0.2163057105485783, "compression_ratio": + 1.786046511627907, "no_speech_prob": 0.0010187685256823897}, {"id": 60, "seek": + 45112, "start": 451.12, "end": 456.64, "text": " animously library turned out to + be very useful. It was instrumental to this work.", "tokens": [50364, 2383, 5098, + 6405, 3574, 484, 281, 312, 588, 4420, 13, 467, 390, 17388, 281, 341, 589, 13, 50640], + "temperature": 0.0, "avg_logprob": -0.19445706927587117, "compression_ratio": 1.4864864864864864, + "no_speech_prob": 0.0008963451255112886}, {"id": 61, "seek": 45112, "start": 457.36, + "end": 465.68, "text": " Although it was created for somewhat unrelated people. 
+ Okay, so that was an overall rather bumpy,", "tokens": [50676, 5780, 309, 390, 2942, + 337, 8344, 38967, 561, 13, 1033, 11, 370, 300, 390, 364, 4787, 2831, 49400, 11, + 51092], "temperature": 0.0, "avg_logprob": -0.19445706927587117, "compression_ratio": + 1.4864864864864864, "no_speech_prob": 0.0008963451255112886}, {"id": 62, "seek": + 45112, "start": 465.68, "end": 473.92, "text": " right things didn''t work well + initially and I got a lot of help from other people in particular", "tokens": [51092, + 558, 721, 994, 380, 589, 731, 9105, 293, 286, 658, 257, 688, 295, 854, 490, 661, + 561, 294, 1729, 51504], "temperature": 0.0, "avg_logprob": -0.19445706927587117, + "compression_ratio": 1.4864864864864864, "no_speech_prob": 0.0008963451255112886}, + {"id": 63, "seek": 47392, "start": 473.92, "end": 480.40000000000003, "text": " + from my author, David Norva, who proposed an amazing improvement for one of the + algorithms", "tokens": [50364, 490, 452, 3793, 11, 4389, 6966, 2757, 11, 567, 10348, + 364, 2243, 10444, 337, 472, 295, 264, 14642, 50688], "temperature": 0.0, "avg_logprob": + -0.3238578464673913, "compression_ratio": 1.4422110552763818, "no_speech_prob": + 0.004585818853229284}, {"id": 64, "seek": 47392, "start": 481.04, "end": 491.44, + "text": " in a thermosleep. Yeah, and so we published and opened after my graduation. 
+ And when I was writing", "tokens": [50720, 294, 257, 8810, 329, 7927, 13, 865, 11, + 293, 370, 321, 6572, 293, 5625, 934, 452, 15652, 13, 400, 562, 286, 390, 3579, 51240], + "temperature": 0.0, "avg_logprob": -0.3238578464673913, "compression_ratio": 1.4422110552763818, + "no_speech_prob": 0.004585818853229284}, {"id": 65, "seek": 47392, "start": 491.44, + "end": 498.8, "text": " my thesis, yeah, I was found like a bunch of issues with + my previous approaches and realized that", "tokens": [51240, 452, 22288, 11, 1338, + 11, 286, 390, 1352, 411, 257, 3840, 295, 2663, 365, 452, 3894, 11587, 293, 5334, + 300, 51608], "temperature": 0.0, "avg_logprob": -0.3238578464673913, "compression_ratio": + 1.4422110552763818, "no_speech_prob": 0.004585818853229284}, {"id": 66, "seek": + 49880, "start": 498.96000000000004, "end": 506.56, "text": " I could also use like + a H&SW like algorithms which were not like core part of my thesis work.", "tokens": + [50372, 286, 727, 611, 764, 411, 257, 389, 5, 50, 54, 411, 14642, 597, 645, 406, + 411, 4965, 644, 295, 452, 22288, 589, 13, 50752], "temperature": 0.0, "avg_logprob": + -0.20676604679652621, "compression_ratio": 1.4946808510638299, "no_speech_prob": + 0.0005867545260116458}, {"id": 67, "seek": 49880, "start": 507.44, "end": 513.12, + "text": " And I got even stronger results, but that was like a little bit too late + to publish and use otherwise.", "tokens": [50796, 400, 286, 658, 754, 7249, 3542, + 11, 457, 300, 390, 411, 257, 707, 857, 886, 3469, 281, 11374, 293, 764, 5911, 13, + 51080], "temperature": 0.0, "avg_logprob": -0.20676604679652621, "compression_ratio": + 1.4946808510638299, "no_speech_prob": 0.0005867545260116458}, {"id": 68, "seek": + 49880, "start": 514.8, "end": 521.76, "text": " Moreover, that the similarity that + I used was a sort of a face palm realization that", "tokens": [51164, 19838, 11, + 300, 264, 32194, 300, 286, 1143, 390, 257, 1333, 295, 257, 1851, 17018, 25138, 300, + 51512], "temperature": 0.0, 
"avg_logprob": -0.20676604679652621, "compression_ratio": + 1.4946808510638299, "no_speech_prob": 0.0005867545260116458}, {"id": 69, "seek": + 52176, "start": 522.08, "end": 529.6, "text": " that similarity that I used, like + Retrieval, completely as a black box. And it worked with", "tokens": [50380, 300, + 32194, 300, 286, 1143, 11, 411, 11495, 5469, 3337, 11, 2584, 382, 257, 2211, 2424, + 13, 400, 309, 2732, 365, 50756], "temperature": 0.0, "avg_logprob": -0.31423912925281744, + "compression_ratio": 1.5940170940170941, "no_speech_prob": 0.0018524241168051958}, + {"id": 70, "seek": 52176, "start": 529.6, "end": 534.8, "text": " most effective. + It was more effective than B125 on the collections that I used. But I didn''t", + "tokens": [50756, 881, 4942, 13, 467, 390, 544, 4942, 813, 363, 16, 6074, 322, 264, + 16641, 300, 286, 1143, 13, 583, 286, 994, 380, 51016], "temperature": 0.0, "avg_logprob": + -0.31423912925281744, "compression_ratio": 1.5940170940170941, "no_speech_prob": + 0.0018524241168051958}, {"id": 71, "seek": 52176, "start": 534.8, "end": 540.96, + "text": " realize that this black box similarity was actually representable by another + product between", "tokens": [51016, 4325, 300, 341, 2211, 2424, 32194, 390, 767, + 2906, 712, 538, 1071, 1674, 1296, 51324], "temperature": 0.0, "avg_logprob": -0.31423912925281744, + "compression_ratio": 1.5940170940170941, "no_speech_prob": 0.0018524241168051958}, + {"id": 72, "seek": 52176, "start": 540.96, "end": 548.88, "text": " two large sparse + vectors by another former author Chris Dyer pointed this out. 
And if I embraced", + "tokens": [51324, 732, 2416, 637, 11668, 18875, 538, 1071, 5819, 3793, 6688, 413, + 7224, 10932, 341, 484, 13, 400, 498, 286, 28673, 51720], "temperature": 0.0, "avg_logprob": + -0.31423912925281744, "compression_ratio": 1.5940170940170941, "no_speech_prob": + 0.0018524241168051958}, {"id": 73, "seek": 54888, "start": 548.88, "end": 553.6, + "text": " this sparse vector approach from the get-go, it would have been a much + easier problem to solve", "tokens": [50364, 341, 637, 11668, 8062, 3109, 490, 264, + 483, 12, 1571, 11, 309, 576, 362, 668, 257, 709, 3571, 1154, 281, 5039, 50600], + "temperature": 0.0, "avg_logprob": -0.19930816226535372, "compression_ratio": 1.5767634854771784, + "no_speech_prob": 0.005145624279975891}, {"id": 74, "seek": 54888, "start": 554.48, + "end": 560.64, "text": " from both engineering and scientific points of view even + without work. And okay, it could have", "tokens": [50644, 490, 1293, 7043, 293, + 8134, 2793, 295, 1910, 754, 1553, 589, 13, 400, 1392, 11, 309, 727, 362, 50952], + "temperature": 0.0, "avg_logprob": -0.19930816226535372, "compression_ratio": 1.5767634854771784, + "no_speech_prob": 0.005145624279975891}, {"id": 75, "seek": 54888, "start": 560.64, + "end": 570.96, "text": " produced some more impact. But yeah, a little bit too late + to dwell this now. Okay, and enough", "tokens": [50952, 7126, 512, 544, 2712, 13, + 583, 1338, 11, 257, 707, 857, 886, 3469, 281, 24355, 341, 586, 13, 1033, 11, 293, + 1547, 51468], "temperature": 0.0, "avg_logprob": -0.19930816226535372, "compression_ratio": + 1.5767634854771784, "no_speech_prob": 0.005145624279975891}, {"id": 76, "seek": + 54888, "start": 570.96, "end": 578.72, "text": " with that I graduated six years + ago. 
And since then I haven''t working as a researcher scientist", "tokens": [51468, + 365, 300, 286, 13693, 2309, 924, 2057, 13, 400, 1670, 550, 286, 2378, 380, 1364, + 382, 257, 21751, 12662, 51856], "temperature": 0.0, "avg_logprob": -0.19930816226535372, + "compression_ratio": 1.5767634854771784, "no_speech_prob": 0.005145624279975891}, + {"id": 77, "seek": 57872, "start": 578.72, "end": 585.0400000000001, "text": " and + engineer on deep learning in what specifically I had in training models for specific + initial", "tokens": [50364, 293, 11403, 322, 2452, 2539, 294, 437, 4682, 286, 632, + 294, 3097, 5245, 337, 2685, 5883, 50680], "temperature": 0.0, "avg_logprob": -0.24766811692571067, + "compression_ratio": 1.5826086956521739, "no_speech_prob": 0.0034362594597041607}, + {"id": 78, "seek": 57872, "start": 585.0400000000001, "end": 592.08, "text": " computer + vision and Retrieval. Despite this diversity, things have come a full circle and", + "tokens": [50680, 3820, 5201, 293, 11495, 5469, 3337, 13, 11334, 341, 8811, 11, + 721, 362, 808, 257, 1577, 6329, 293, 51032], "temperature": 0.0, "avg_logprob": + -0.24766811692571067, "compression_ratio": 1.5826086956521739, "no_speech_prob": + 0.0034362594597041607}, {"id": 79, "seek": 57872, "start": 592.08, "end": 598.96, + "text": " are working question as being systems once again. Yeah, that was pretty + much involved.", "tokens": [51032, 366, 1364, 1168, 382, 885, 3652, 1564, 797, 13, + 865, 11, 300, 390, 1238, 709, 3288, 13, 51376], "temperature": 0.0, "avg_logprob": + -0.24766811692571067, "compression_ratio": 1.5826086956521739, "no_speech_prob": + 0.0034362594597041607}, {"id": 80, "seek": 57872, "start": 598.96, "end": 606.08, + "text": " Yeah, amazing story. Yeah, thank you for that. 
It''s like the story tends + to repeat itself,", "tokens": [51376, 865, 11, 2243, 1657, 13, 865, 11, 1309, 291, + 337, 300, 13, 467, 311, 411, 264, 1657, 12258, 281, 7149, 2564, 11, 51732], "temperature": + 0.0, "avg_logprob": -0.24766811692571067, "compression_ratio": 1.5826086956521739, + "no_speech_prob": 0.0034362594597041607}, {"id": 81, "seek": 60608, "start": 606.08, + "end": 612.08, "text": " but at the same time, if we find the topic still exciting + and it seems like you are still very", "tokens": [50364, 457, 412, 264, 912, 565, + 11, 498, 321, 915, 264, 4829, 920, 4670, 293, 309, 2544, 411, 291, 366, 920, 588, + 50664], "temperature": 0.0, "avg_logprob": -0.11656635006268819, "compression_ratio": + 1.5899581589958158, "no_speech_prob": 0.008717834018170834}, {"id": 82, "seek": + 60608, "start": 612.08, "end": 618.64, "text": " interested in question answering + and improving building blocks of that, it''s kind of cool,", "tokens": [50664, 3102, + 294, 1168, 13430, 293, 11470, 2390, 8474, 295, 300, 11, 309, 311, 733, 295, 1627, + 11, 50992], "temperature": 0.0, "avg_logprob": -0.11656635006268819, "compression_ratio": + 1.5899581589958158, "no_speech_prob": 0.008717834018170834}, {"id": 83, "seek": + 60608, "start": 618.64, "end": 623.36, "text": " right? So that we are able to come + back to some of the topics, pick them up on a different level.", "tokens": [50992, + 558, 30, 407, 300, 321, 366, 1075, 281, 808, 646, 281, 512, 295, 264, 8378, 11, + 1888, 552, 493, 322, 257, 819, 1496, 13, 51228], "temperature": 0.0, "avg_logprob": + -0.11656635006268819, "compression_ratio": 1.5899581589958158, "no_speech_prob": + 0.008717834018170834}, {"id": 84, "seek": 60608, "start": 623.36, "end": 630.6400000000001, + "text": " That''s amazing. And yeah, there is a lot to unpack. 
I almost wanted to + ask you or the moment you", "tokens": [51228, 663, 311, 2243, 13, 400, 1338, 11, + 456, 307, 257, 688, 281, 26699, 13, 286, 1920, 1415, 281, 1029, 291, 420, 264, 1623, + 291, 51592], "temperature": 0.0, "avg_logprob": -0.11656635006268819, "compression_ratio": + 1.5899581589958158, "no_speech_prob": 0.008717834018170834}, {"id": 85, "seek": + 63064, "start": 630.64, "end": 637.6, "text": " spoke about spars and dance. I wanted + to pick your brain on what''s it take on the model called", "tokens": [50364, 7179, + 466, 637, 685, 293, 4489, 13, 286, 1415, 281, 1888, 428, 3567, 322, 437, 311, 309, + 747, 322, 264, 2316, 1219, 50712], "temperature": 0.0, "avg_logprob": -0.1600088022523007, + "compression_ratio": 1.6549295774647887, "no_speech_prob": 0.02030409313738346}, + {"id": 86, "seek": 63064, "start": 637.6, "end": 642.0, "text": " split and split + V2. I don''t know if you''re familiar with that model, but basically,", "tokens": + [50712, 7472, 293, 7472, 691, 17, 13, 286, 500, 380, 458, 498, 291, 434, 4963, 365, + 300, 2316, 11, 457, 1936, 11, 50932], "temperature": 0.0, "avg_logprob": -0.1600088022523007, + "compression_ratio": 1.6549295774647887, "no_speech_prob": 0.02030409313738346}, + {"id": 87, "seek": 63064, "start": 643.1999999999999, "end": 648.3199999999999, + "text": " you know, there is always this discussion should we take lexical search, + combine it with dance", "tokens": [50992, 291, 458, 11, 456, 307, 1009, 341, 5017, + 820, 321, 747, 476, 87, 804, 3164, 11, 10432, 309, 365, 4489, 51248], "temperature": + 0.0, "avg_logprob": -0.1600088022523007, "compression_ratio": 1.6549295774647887, + "no_speech_prob": 0.02030409313738346}, {"id": 88, "seek": 63064, "start": 648.3199999999999, + "end": 654.3199999999999, "text": " search and then do some kind of hybrid formal + on top and then how do we even learn the parameters", "tokens": [51248, 3164, 293, + 550, 360, 512, 733, 295, 13051, 9860, 322, 1192, 293, 550, 577, 360, 321, 
754, 1466, + 264, 9834, 51548], "temperature": 0.0, "avg_logprob": -0.1600088022523007, "compression_ratio": + 1.6549295774647887, "no_speech_prob": 0.02030409313738346}, {"id": 89, "seek": 63064, + "start": 654.3199999999999, "end": 660.16, "text": " of that model, right? Depending + on the domain. But then there is a drastic sort of approach. Let''s", "tokens": + [51548, 295, 300, 2316, 11, 558, 30, 22539, 322, 264, 9274, 13, 583, 550, 456, 307, + 257, 36821, 1333, 295, 3109, 13, 961, 311, 51840], "temperature": 0.0, "avg_logprob": + -0.1600088022523007, "compression_ratio": 1.6549295774647887, "no_speech_prob": + 0.02030409313738346}, {"id": 90, "seek": 66016, "start": 660.16, "end": 665.8399999999999, + "text": " not do that. Let''s just take a complete model which can handle both and + then you can also", "tokens": [50364, 406, 360, 300, 13, 961, 311, 445, 747, 257, + 3566, 2316, 597, 393, 4813, 1293, 293, 550, 291, 393, 611, 50648], "temperature": + 0.0, "avg_logprob": -0.14631761010013408, "compression_ratio": 1.5106382978723405, + "no_speech_prob": 0.0028053484857082367}, {"id": 91, "seek": 66016, "start": 666.48, + "end": 674.16, "text": " support what the dance search doesn''t support like exact + phrase searches. What''s your general", "tokens": [50680, 1406, 437, 264, 4489, + 3164, 1177, 380, 1406, 411, 1900, 9535, 26701, 13, 708, 311, 428, 2674, 51064], + "temperature": 0.0, "avg_logprob": -0.14631761010013408, "compression_ratio": 1.5106382978723405, + "no_speech_prob": 0.0028053484857082367}, {"id": 92, "seek": 66016, "start": 674.16, + "end": 680.24, "text": " intuition about that? How do you think about this? Well, + that''s a super interesting question. 
I have", "tokens": [51064, 24002, 466, 300, + 30, 1012, 360, 291, 519, 466, 341, 30, 1042, 11, 300, 311, 257, 1687, 1880, 1168, + 13, 286, 362, 51368], "temperature": 0.0, "avg_logprob": -0.14631761010013408, "compression_ratio": + 1.5106382978723405, "no_speech_prob": 0.0028053484857082367}, {"id": 93, "seek": + 68024, "start": 680.32, "end": 691.28, "text": " one clarifying question though. + So, before I answer, you said that some people who want to have", "tokens": [50368, + 472, 6093, 5489, 1168, 1673, 13, 407, 11, 949, 286, 1867, 11, 291, 848, 300, 512, + 561, 567, 528, 281, 362, 50916], "temperature": 0.0, "avg_logprob": -0.199555788542095, + "compression_ratio": 1.576271186440678, "no_speech_prob": 0.011479969136416912}, + {"id": 94, "seek": 68024, "start": 691.28, "end": 696.32, "text": " a single model + that''s doing both. Could you elaborate a little bit on this? Well, I guess", "tokens": + [50916, 257, 2167, 2316, 300, 311, 884, 1293, 13, 7497, 291, 20945, 257, 707, 857, + 322, 341, 30, 1042, 11, 286, 2041, 51168], "temperature": 0.0, "avg_logprob": -0.199555788542095, + "compression_ratio": 1.576271186440678, "no_speech_prob": 0.011479969136416912}, + {"id": 95, "seek": 68024, "start": 697.28, "end": 703.36, "text": " maybe it''s + not that they wanted, but it''s like the development when, instead of sort of,", + "tokens": [51216, 1310, 309, 311, 406, 300, 436, 1415, 11, 457, 309, 311, 411, 264, + 3250, 562, 11, 2602, 295, 1333, 295, 11, 51520], "temperature": 0.0, "avg_logprob": + -0.199555788542095, "compression_ratio": 1.576271186440678, "no_speech_prob": 0.011479969136416912}, + {"id": 96, "seek": 68024, "start": 703.36, "end": 709.28, "text": " you know, combining + these disparate sources of results, you know, one coming from lexical search,", + "tokens": [51520, 291, 458, 11, 21928, 613, 14548, 473, 7139, 295, 3542, 11, 291, + 458, 11, 472, 1348, 490, 476, 87, 804, 3164, 11, 51816], "temperature": 0.0, "avg_logprob": + -0.199555788542095, 
"compression_ratio": 1.576271186440678, "no_speech_prob": 0.011479969136416912}, + {"id": 97, "seek": 70928, "start": 709.28, "end": 713.76, "text": " which is kind + of like well-known BM25 driven, I guess. And then the other one is more", "tokens": + [50364, 597, 307, 733, 295, 411, 731, 12, 6861, 15901, 6074, 9555, 11, 286, 2041, + 13, 400, 550, 264, 661, 472, 307, 544, 50588], "temperature": 0.0, "avg_logprob": + -0.1798357866248306, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.0022938442416489124}, {"id": 98, "seek": 70928, "start": 715.76, "end": 722.4, + "text": " like more modern in a way that everyone wants to get exposed to dance + search. And then you need to", "tokens": [50688, 411, 544, 4363, 294, 257, 636, + 300, 1518, 2738, 281, 483, 9495, 281, 4489, 3164, 13, 400, 550, 291, 643, 281, 51020], + "temperature": 0.0, "avg_logprob": -0.1798357866248306, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.0022938442416489124}, {"id": 99, "seek": 70928, "start": 722.4, + "end": 727.92, "text": " somehow figure out how you combine the results, right? + So one is designed maybe for precision lexical.", "tokens": [51020, 6063, 2573, + 484, 577, 291, 10432, 264, 3542, 11, 558, 30, 407, 472, 307, 4761, 1310, 337, 18356, + 476, 87, 804, 13, 51296], "temperature": 0.0, "avg_logprob": -0.1798357866248306, + "compression_ratio": 1.6182572614107884, "no_speech_prob": 0.0022938442416489124}, + {"id": 100, "seek": 70928, "start": 727.92, "end": 736.16, "text": " The other one + is designed more for recall, right? 
Because the vectors are not, they don''t have + as many", "tokens": [51296, 440, 661, 472, 307, 4761, 544, 337, 9901, 11, 558, 30, + 1436, 264, 18875, 366, 406, 11, 436, 500, 380, 362, 382, 867, 51708], "temperature": + 0.0, "avg_logprob": -0.1798357866248306, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.0022938442416489124}, {"id": 101, "seek": 73616, "start": 736.16, + "end": 741.52, "text": " dimensions as these far specters. But then you still need + to figure out, okay, how do I combine this", "tokens": [50364, 12819, 382, 613, + 1400, 6177, 433, 13, 583, 550, 291, 920, 643, 281, 2573, 484, 11, 1392, 11, 577, + 360, 286, 10432, 341, 50632], "temperature": 0.0, "avg_logprob": -0.19729736873081752, + "compression_ratio": 1.606425702811245, "no_speech_prob": 0.00239355256780982}, + {"id": 102, "seek": 73616, "start": 741.52, "end": 749.28, "text": " to? And usually + people cite reciprocal rank fusion in what I hear, but there are other methods as", + "tokens": [50632, 281, 30, 400, 2673, 561, 37771, 46948, 6181, 23100, 294, 437, + 286, 1568, 11, 457, 456, 366, 661, 7150, 382, 51020], "temperature": 0.0, "avg_logprob": + -0.19729736873081752, "compression_ratio": 1.606425702811245, "no_speech_prob": + 0.00239355256780982}, {"id": 103, "seek": 73616, "start": 749.28, "end": 756.48, + "text": " well, like even clustering based. But then that''s one approach. Another + approach is just stop doing that,", "tokens": [51020, 731, 11, 411, 754, 596, 48673, + 2361, 13, 583, 550, 300, 311, 472, 3109, 13, 3996, 3109, 307, 445, 1590, 884, 300, + 11, 51380], "temperature": 0.0, "avg_logprob": -0.19729736873081752, "compression_ratio": + 1.606425702811245, "no_speech_prob": 0.00239355256780982}, {"id": 104, "seek": 73616, + "start": 756.48, "end": 762.9599999999999, "text": " I guess. 
If I really understand + what split does, and then you encode with split your data once,", "tokens": [51380, + 286, 2041, 13, 759, 286, 534, 1223, 437, 7472, 775, 11, 293, 550, 291, 2058, 1429, + 365, 7472, 428, 1412, 1564, 11, 51704], "temperature": 0.0, "avg_logprob": -0.19729736873081752, + "compression_ratio": 1.606425702811245, "no_speech_prob": 0.00239355256780982}, + {"id": 105, "seek": 76296, "start": 762.96, "end": 769.6, "text": " and you retrieve, + you know, you use its capabilities to also retrieve exact phrases, right? So,", + "tokens": [50364, 293, 291, 30254, 11, 291, 458, 11, 291, 764, 1080, 10862, 281, + 611, 30254, 1900, 20312, 11, 558, 30, 407, 11, 50696], "temperature": 0.0, "avg_logprob": + -0.176083307999831, "compression_ratio": 1.4411764705882353, "no_speech_prob": 0.005018287338316441}, + {"id": 106, "seek": 76296, "start": 769.6, "end": 777.44, "text": " effectively, + ideally, you don''t need the lexical matching engine anymore, but maybe I''m completely", + "tokens": [50696, 8659, 11, 22915, 11, 291, 500, 380, 643, 264, 476, 87, 804, 14324, + 2848, 3602, 11, 457, 1310, 286, 478, 2584, 51088], "temperature": 0.0, "avg_logprob": + -0.176083307999831, "compression_ratio": 1.4411764705882353, "no_speech_prob": 0.005018287338316441}, + {"id": 107, "seek": 76296, "start": 777.44, "end": 785.6800000000001, "text": " + wrong. I''m just, I wanted to hear your opinion on that. Okay, well, let''s get + it. Using your words,", "tokens": [51088, 2085, 13, 286, 478, 445, 11, 286, 1415, + 281, 1568, 428, 4800, 322, 300, 13, 1033, 11, 731, 11, 718, 311, 483, 309, 13, 11142, + 428, 2283, 11, 51500], "temperature": 0.0, "avg_logprob": -0.176083307999831, "compression_ratio": + 1.4411764705882353, "no_speech_prob": 0.005018287338316441}, {"id": 108, "seek": + 78568, "start": 785.68, "end": 793.76, "text": " it''s a lot on back here. 
I''m + still not quite sure what you mean by having like a single model.", "tokens": [50364, + 309, 311, 257, 688, 322, 646, 510, 13, 286, 478, 920, 406, 1596, 988, 437, 291, + 914, 538, 1419, 411, 257, 2167, 2316, 13, 50768], "temperature": 0.0, "avg_logprob": + -0.2128382682800293, "compression_ratio": 1.455497382198953, "no_speech_prob": 0.010937400162220001}, + {"id": 109, "seek": 78568, "start": 794.4, "end": 801.3599999999999, "text": " Although, + maybe I love me try to maybe start answering questions and you can drop me and", + "tokens": [50800, 5780, 11, 1310, 286, 959, 385, 853, 281, 1310, 722, 13430, 1651, + 293, 291, 393, 3270, 385, 293, 51148], "temperature": 0.0, "avg_logprob": -0.2128382682800293, + "compression_ratio": 1.455497382198953, "no_speech_prob": 0.010937400162220001}, + {"id": 110, "seek": 78568, "start": 802.7199999999999, "end": 811.52, "text": " + guide me into the other direction if needed. So first of all, we have what''s interesting + about", "tokens": [51216, 5934, 385, 666, 264, 661, 3513, 498, 2978, 13, 407, 700, + 295, 439, 11, 321, 362, 437, 311, 1880, 466, 51656], "temperature": 0.0, "avg_logprob": + -0.2128382682800293, "compression_ratio": 1.455497382198953, "no_speech_prob": 0.010937400162220001}, + {"id": 111, "seek": 81152, "start": 811.52, "end": 818.0799999999999, "text": " + Nashville language is that, and that''s very different from computer vision domain, + is that we", "tokens": [50364, 36370, 2856, 307, 300, 11, 293, 300, 311, 588, 819, + 490, 3820, 5201, 9274, 11, 307, 300, 321, 50692], "temperature": 0.0, "avg_logprob": + -0.2879610061645508, "compression_ratio": 1.619047619047619, "no_speech_prob": 0.01184320542961359}, + {"id": 112, "seek": 81152, "start": 818.0799999999999, "end": 826.24, "text": " + usually represent, we can we have multiple ways to represent text. 
So in computer + vision,", "tokens": [50692, 2673, 2906, 11, 321, 393, 321, 362, 3866, 2098, 281, + 2906, 2487, 13, 407, 294, 3820, 5201, 11, 51100], "temperature": 0.0, "avg_logprob": + -0.2879610061645508, "compression_ratio": 1.619047619047619, "no_speech_prob": 0.01184320542961359}, + {"id": 113, "seek": 81152, "start": 827.28, "end": 834.56, "text": " usually it''s + just like each image is a traditional representative of actors that was the", "tokens": + [51152, 2673, 309, 311, 445, 411, 1184, 3256, 307, 257, 5164, 12424, 295, 10037, + 300, 390, 264, 51516], "temperature": 0.0, "avg_logprob": -0.2879610061645508, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.01184320542961359}, {"id": 114, "seek": 83456, + "start": 834.56, "end": 842.88, "text": " commodity theme. But in the in a national + language processing, we started with the so-called", "tokens": [50364, 29125, 6314, + 13, 583, 294, 264, 294, 257, 4048, 2856, 9007, 11, 321, 1409, 365, 264, 370, 12, + 11880, 50780], "temperature": 0.0, "avg_logprob": -0.2968230708952873, "compression_ratio": + 1.540983606557377, "no_speech_prob": 0.0029008244164288044}, {"id": 115, "seek": + 83456, "start": 842.88, "end": 850.0799999999999, "text": " bag of words representations + where a document was represented by basically a sparse vector where", "tokens": + [50780, 3411, 295, 2283, 33358, 689, 257, 4166, 390, 10379, 538, 1936, 257, 637, + 11668, 8062, 689, 51140], "temperature": 0.0, "avg_logprob": -0.2968230708952873, + "compression_ratio": 1.540983606557377, "no_speech_prob": 0.0029008244164288044}, + {"id": 116, "seek": 83456, "start": 850.0799999999999, "end": 857.1999999999999, + "text": " you will have either zeros and ones, which means the specific terms present + or not, or maybe", "tokens": [51140, 291, 486, 362, 2139, 35193, 293, 2306, 11, + 597, 1355, 264, 2685, 2115, 1974, 420, 406, 11, 420, 1310, 51496], "temperature": + 0.0, "avg_logprob": -0.2968230708952873, "compression_ratio": 
1.540983606557377, + "no_speech_prob": 0.0029008244164288044}, {"id": 117, "seek": 85720, "start": 857.2, + "end": 864.4000000000001, "text": " weights, not just zeros and ones, but weights. + But then, with development of deep learning, and I", "tokens": [50364, 17443, 11, + 406, 445, 35193, 293, 2306, 11, 457, 17443, 13, 583, 550, 11, 365, 3250, 295, 2452, + 2539, 11, 293, 286, 50724], "temperature": 0.0, "avg_logprob": -0.21123206798846905, + "compression_ratio": 1.5157894736842106, "no_speech_prob": 0.001722217071801424}, + {"id": 118, "seek": 85720, "start": 864.4000000000001, "end": 875.12, "text": " + actually started a little bit earlier with people, people learned how to represent + text using fixed", "tokens": [50724, 767, 1409, 257, 707, 857, 3071, 365, 561, 11, + 561, 3264, 577, 281, 2906, 2487, 1228, 6806, 51260], "temperature": 0.0, "avg_logprob": + -0.21123206798846905, "compression_ratio": 1.5157894736842106, "no_speech_prob": + 0.001722217071801424}, {"id": 119, "seek": 85720, "start": 875.12, "end": 882.48, + "text": " size vectors. And that was like using principle component analysis. And + this is not a very", "tokens": [51260, 2744, 18875, 13, 400, 300, 390, 411, 1228, + 8665, 6542, 5215, 13, 400, 341, 307, 406, 257, 588, 51628], "temperature": 0.0, + "avg_logprob": -0.21123206798846905, "compression_ratio": 1.5157894736842106, "no_speech_prob": + 0.001722217071801424}, {"id": 120, "seek": 88248, "start": 882.48, "end": 890.24, + "text": " natural representation for text and it didn''t work really well initially. + But now we''re having good", "tokens": [50364, 3303, 10290, 337, 2487, 293, 309, + 994, 380, 589, 534, 731, 9105, 13, 583, 586, 321, 434, 1419, 665, 50752], "temperature": + 0.0, "avg_logprob": -0.31740800957930715, "compression_ratio": 1.6527196652719665, + "no_speech_prob": 0.002557026222348213}, {"id": 121, "seek": 88248, "start": 890.24, + "end": 896.32, "text": " results. 
So we have like two representations and there + are different approaches to combine those,", "tokens": [50752, 3542, 13, 407, 321, + 362, 411, 732, 33358, 293, 456, 366, 819, 11587, 281, 10432, 729, 11, 51056], "temperature": + 0.0, "avg_logprob": -0.31740800957930715, "compression_ratio": 1.6527196652719665, + "no_speech_prob": 0.002557026222348213}, {"id": 122, "seek": 88248, "start": 896.32, + "end": 903.6800000000001, "text": " of course. One is just if you want to do the + T-wall, you can indeed just do the lexical base search,", "tokens": [51056, 295, + 1164, 13, 1485, 307, 445, 498, 291, 528, 281, 360, 264, 314, 12, 16256, 11, 291, + 393, 6451, 445, 360, 264, 476, 87, 804, 3096, 3164, 11, 51424], "temperature": 0.0, + "avg_logprob": -0.31740800957930715, "compression_ratio": 1.6527196652719665, "no_speech_prob": + 0.002557026222348213}, {"id": 123, "seek": 88248, "start": 904.96, "end": 911.28, + "text": " you can do a kidney or a snabestation vector representations, and then + you can somehow merge the", "tokens": [51488, 291, 393, 360, 257, 19000, 420, 257, + 2406, 455, 377, 399, 8062, 33358, 11, 293, 550, 291, 393, 6063, 22183, 264, 51804], + "temperature": 0.0, "avg_logprob": -0.31740800957930715, "compression_ratio": 1.6527196652719665, + "no_speech_prob": 0.002557026222348213}, {"id": 124, "seek": 91128, "start": 911.28, + "end": 918.0, "text": " results. You can use ranker. But you don''t have to, and + that''s the so-called hybrid search, but the", "tokens": [50364, 3542, 13, 509, + 393, 764, 6181, 260, 13, 583, 291, 500, 380, 362, 281, 11, 293, 300, 311, 264, 370, + 12, 11880, 13051, 3164, 11, 457, 264, 50700], "temperature": 0.0, "avg_logprob": + -0.14561008920474927, "compression_ratio": 1.6778242677824269, "no_speech_prob": + 0.0011995760723948479}, {"id": 125, "seek": 91128, "start": 918.0, "end": 924.48, + "text": " hybrid search can exist in different versions. 
So if you want to combine + it sort of in a single model,", "tokens": [50700, 13051, 3164, 393, 2514, 294, 819, + 9606, 13, 407, 498, 291, 528, 281, 10432, 309, 1333, 295, 294, 257, 2167, 2316, + 11, 51024], "temperature": 0.0, "avg_logprob": -0.14561008920474927, "compression_ratio": + 1.6778242677824269, "no_speech_prob": 0.0011995760723948479}, {"id": 126, "seek": + 91128, "start": 924.48, "end": 933.52, "text": " why don''t you represent each document + using both sparse and dense vector? And when you''re computing", "tokens": [51024, + 983, 500, 380, 291, 2906, 1184, 4166, 1228, 1293, 637, 11668, 293, 18011, 8062, + 30, 400, 562, 291, 434, 15866, 51476], "temperature": 0.0, "avg_logprob": -0.14561008920474927, + "compression_ratio": 1.6778242677824269, "no_speech_prob": 0.0011995760723948479}, + {"id": 127, "seek": 91128, "start": 933.52, "end": 939.6, "text": " the similarity, + you can compute the similarity between sparse parts, between dense parts, and then", + "tokens": [51476, 264, 32194, 11, 291, 393, 14722, 264, 32194, 1296, 637, 11668, + 3166, 11, 1296, 18011, 3166, 11, 293, 550, 51780], "temperature": 0.0, "avg_logprob": + -0.14561008920474927, "compression_ratio": 1.6778242677824269, "no_speech_prob": + 0.0011995760723948479}, {"id": 128, "seek": 93960, "start": 939.9200000000001, "end": + 946.72, "text": " combine them somehow. For example, using a weight. 
And that''s + in fact what I was trying to do in my thesis", "tokens": [50380, 10432, 552, 6063, + 13, 1171, 1365, 11, 1228, 257, 3364, 13, 400, 300, 311, 294, 1186, 437, 286, 390, + 1382, 281, 360, 294, 452, 22288, 50720], "temperature": 0.0, "avg_logprob": -0.14781245900623835, + "compression_ratio": 1.5210526315789474, "no_speech_prob": 0.0011417664354667068}, + {"id": 129, "seek": 93960, "start": 946.72, "end": 955.9200000000001, "text": " + as well, because I was doing, my similarities was basically an ensemble of several + similarities", "tokens": [50720, 382, 731, 11, 570, 286, 390, 884, 11, 452, 24197, + 390, 1936, 364, 19492, 295, 2940, 24197, 51180], "temperature": 0.0, "avg_logprob": + -0.14781245900623835, "compression_ratio": 1.5210526315789474, "no_speech_prob": + 0.0011417664354667068}, {"id": 130, "seek": 93960, "start": 955.9200000000001, "end": + 962.48, "text": " course for at least two representations. And that could work. + There''s of course modern", "tokens": [51180, 1164, 337, 412, 1935, 732, 33358, + 13, 400, 300, 727, 589, 13, 821, 311, 295, 1164, 4363, 51508], "temperature": 0.0, + "avg_logprob": -0.14781245900623835, "compression_ratio": 1.5210526315789474, "no_speech_prob": + 0.0011417664354667068}, {"id": 131, "seek": 96248, "start": 962.48, "end": 967.2, + "text": " instantiations of this, and there''s a paper, I think both are by some", + "tokens": [50364, 9836, 72, 763, 295, 341, 11, 293, 456, 311, 257, 3035, 11, 286, + 519, 1293, 366, 538, 512, 50600], "temperature": 0.0, "avg_logprob": -0.26171720595586867, + "compression_ratio": 1.6261261261261262, "no_speech_prob": 0.003796835197135806}, + {"id": 132, "seek": 96248, "start": 968.4, "end": 976.72, "text": " glue people, + where they did exactly like this, they combined splaid and some dense vector", "tokens": + [50660, 8998, 561, 11, 689, 436, 630, 2293, 411, 341, 11, 436, 9354, 637, 875, 327, + 293, 512, 18011, 8062, 51076], "temperature": 0.0, "avg_logprob": 
-0.26171720595586867, + "compression_ratio": 1.6261261261261262, "no_speech_prob": 0.003796835197135806}, + {"id": 133, "seek": 96248, "start": 976.72, "end": 983.04, "text": " embeddings. + And that can work apparently a little bit better than, or sometimes maybe a lot + better", "tokens": [51076, 12240, 29432, 13, 400, 300, 393, 589, 7970, 257, 707, + 857, 1101, 813, 11, 420, 2171, 1310, 257, 688, 1101, 51392], "temperature": 0.0, + "avg_logprob": -0.26171720595586867, "compression_ratio": 1.6261261261261262, "no_speech_prob": + 0.003796835197135806}, {"id": 134, "seek": 96248, "start": 984.0, "end": 992.4, + "text": " than basic representations, like each representation specifically. So + with both approaches, of course,", "tokens": [51440, 813, 3875, 33358, 11, 411, + 1184, 10290, 4682, 13, 407, 365, 1293, 11587, 11, 295, 1164, 11, 51860], "temperature": + 0.0, "avg_logprob": -0.26171720595586867, "compression_ratio": 1.6261261261261262, + "no_speech_prob": 0.003796835197135806}, {"id": 135, "seek": 99240, "start": 993.36, + "end": 1000.8, "text": " there are issues that you mentioned. So I don''t know what + the best approach there, and I don''t", "tokens": [50412, 456, 366, 2663, 300, 291, + 2835, 13, 407, 286, 500, 380, 458, 437, 264, 1151, 3109, 456, 11, 293, 286, 500, + 380, 50784], "temperature": 0.0, "avg_logprob": -0.12821337154933385, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.0010765603510662913}, {"id": 136, "seek": + 99240, "start": 1000.8, "end": 1006.88, "text": " have a crystal ball regarding + what''s the best path forward. 
But with dense representation,", "tokens": [50784, + 362, 257, 13662, 2594, 8595, 437, 311, 264, 1151, 3100, 2128, 13, 583, 365, 18011, + 10290, 11, 51088], "temperature": 0.0, "avg_logprob": -0.12821337154933385, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.0010765603510662913}, {"id": 137, "seek": + 99240, "start": 1006.88, "end": 1013.28, "text": " the clearly the problem is that + you have to pack everything into the fixed size vector.", "tokens": [51088, 264, + 4448, 264, 1154, 307, 300, 291, 362, 281, 2844, 1203, 666, 264, 6806, 2744, 8062, + 13, 51408], "temperature": 0.0, "avg_logprob": -0.12821337154933385, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.0010765603510662913}, {"id": 138, "seek": + 99240, "start": 1014.0, "end": 1021.12, "text": " And as your document is getting + bigger, you basically the vector size, the amount of information", "tokens": [51444, + 400, 382, 428, 4166, 307, 1242, 3801, 11, 291, 1936, 264, 8062, 2744, 11, 264, 2372, + 295, 1589, 51800], "temperature": 0.0, "avg_logprob": -0.12821337154933385, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.0010765603510662913}, {"id": 139, "seek": + 102112, "start": 1021.2, "end": 1025.92, "text": " you can store is the same, but + your document increases in size. So you would", "tokens": [50368, 291, 393, 3531, + 307, 264, 912, 11, 457, 428, 4166, 8637, 294, 2744, 13, 407, 291, 576, 50604], "temperature": + 0.0, "avg_logprob": -0.17627362073478053, "compression_ratio": 1.4915254237288136, + "no_speech_prob": 0.0008666721405461431}, {"id": 140, "seek": 102112, "start": 1028.32, + "end": 1037.52, "text": " possibly expect some deterioration in quality. 
But another + reason why you can see deteriorating", "tokens": [50724, 6264, 2066, 512, 26431, + 399, 294, 3125, 13, 583, 1071, 1778, 983, 291, 393, 536, 26431, 990, 51184], "temperature": + 0.0, "avg_logprob": -0.17627362073478053, "compression_ratio": 1.4915254237288136, + "no_speech_prob": 0.0008666721405461431}, {"id": 141, "seek": 102112, "start": 1037.52, + "end": 1046.0, "text": " results just because some like you have fixed representations, + the number of words is huge.", "tokens": [51184, 3542, 445, 570, 512, 411, 291, + 362, 6806, 33358, 11, 264, 1230, 295, 2283, 307, 2603, 13, 51608], "temperature": + 0.0, "avg_logprob": -0.17627362073478053, "compression_ratio": 1.4915254237288136, + "no_speech_prob": 0.0008666721405461431}, {"id": 142, "seek": 104600, "start": 1046.96, + "end": 1054.4, "text": " And like in regular person knows like around like educated + person knows about 30,000 words,", "tokens": [50412, 400, 411, 294, 3890, 954, 3255, + 411, 926, 411, 15872, 954, 3255, 466, 2217, 11, 1360, 2283, 11, 50784], "temperature": + 0.0, "avg_logprob": -0.26194129519992404, "compression_ratio": 1.76036866359447, + "no_speech_prob": 0.022743871435523033}, {"id": 143, "seek": 104600, "start": 1054.88, + "end": 1062.8, "text": " but in reality, like internet has millions of words, right? + And the words are not just only words,", "tokens": [50808, 457, 294, 4103, 11, 411, + 4705, 575, 6803, 295, 2283, 11, 558, 30, 400, 264, 2283, 366, 406, 445, 787, 2283, + 11, 51204], "temperature": 0.0, "avg_logprob": -0.26194129519992404, "compression_ratio": + 1.76036866359447, "no_speech_prob": 0.022743871435523033}, {"id": 144, "seek": 104600, + "start": 1062.8, "end": 1069.12, "text": " there are things like product identifiers, + right? 
If you want to, and sometimes people will do", "tokens": [51204, 456, 366, + 721, 411, 1674, 2473, 23463, 11, 558, 30, 759, 291, 528, 281, 11, 293, 2171, 561, + 486, 360, 51520], "temperature": 0.0, "avg_logprob": -0.26194129519992404, "compression_ratio": + 1.76036866359447, "no_speech_prob": 0.022743871435523033}, {"id": 145, "seek": 104600, + "start": 1069.12, "end": 1074.8, "text": " products, they will search something + they want to buy, and they would you know copy paste those,", "tokens": [51520, + 3383, 11, 436, 486, 3164, 746, 436, 528, 281, 2256, 11, 293, 436, 576, 291, 458, + 5055, 9163, 729, 11, 51804], "temperature": 0.0, "avg_logprob": -0.26194129519992404, + "compression_ratio": 1.76036866359447, "no_speech_prob": 0.022743871435523033}, + {"id": 146, "seek": 107480, "start": 1074.8, "end": 1083.28, "text": " or type them + in, and then they got squished in the in that dense vector. So it cannot be precise.", + "tokens": [50364, 420, 2010, 552, 294, 11, 293, 550, 436, 658, 2339, 4729, 294, + 264, 294, 300, 18011, 8062, 13, 407, 309, 2644, 312, 13600, 13, 50788], "temperature": + 0.0, "avg_logprob": -0.40208232402801514, "compression_ratio": 1.4912280701754386, + "no_speech_prob": 0.0035374897997826338}, {"id": 147, "seek": 107480, "start": 1084.24, + "end": 1090.6399999999999, "text": " There is an interesting paper by author by + Neil Srymer''s,", "tokens": [50836, 821, 307, 364, 1880, 3035, 538, 3793, 538, 18615, + 318, 627, 936, 311, 11, 51156], "temperature": 0.0, "avg_logprob": -0.40208232402801514, + "compression_ratio": 1.4912280701754386, "no_speech_prob": 0.0035374897997826338}, + {"id": 148, "seek": 107480, "start": 1091.52, "end": 1100.1599999999999, "text": + " sentence board author, where he has a, in like some experimental and even theoretical + evidence that", "tokens": [51200, 8174, 3150, 3793, 11, 689, 415, 575, 257, 11, + 294, 411, 512, 17069, 293, 754, 20864, 4467, 300, 51632], "temperature": 0.0, "avg_logprob": + 
-0.40208232402801514, "compression_ratio": 1.4912280701754386, "no_speech_prob": + 0.0035374897997826338}, {"id": 149, "seek": 110016, "start": 1100.5600000000002, + "end": 1108.0, "text": " as the collection size increases so the dense vector search + can deteriorate just because there", "tokens": [50384, 382, 264, 5765, 2744, 8637, + 370, 264, 18011, 8062, 3164, 393, 26431, 473, 445, 570, 456, 50756], "temperature": + 0.0, "avg_logprob": -0.2601868414109753, "compression_ratio": 1.5495867768595042, + "no_speech_prob": 0.006780822295695543}, {"id": 150, "seek": 110016, "start": 1108.0, + "end": 1112.96, "text": " would be some false positives and measures due to you + know the excruciating a lot of information", "tokens": [50756, 576, 312, 512, 7908, + 35127, 293, 8000, 3462, 281, 291, 458, 264, 1624, 894, 537, 990, 257, 688, 295, + 1589, 51004], "temperature": 0.0, "avg_logprob": -0.2601868414109753, "compression_ratio": + 1.5495867768595042, "no_speech_prob": 0.006780822295695543}, {"id": 151, "seek": + 110016, "start": 1112.96, "end": 1121.1200000000001, "text": " together and they + fix size directly. 
So yeah, I mean, it''s quite possible, but I haven''t seen", + "tokens": [51004, 1214, 293, 436, 3191, 2744, 3838, 13, 407, 1338, 11, 286, 914, + 11, 309, 311, 1596, 1944, 11, 457, 286, 2378, 380, 1612, 51412], "temperature": + 0.0, "avg_logprob": -0.2601868414109753, "compression_ratio": 1.5495867768595042, + "no_speech_prob": 0.006780822295695543}, {"id": 152, "seek": 110016, "start": 1121.1200000000001, + "end": 1126.16, "text": " like a fall off of this work, so I don''t know how much + of a problem it isn''t in practice.", "tokens": [51412, 411, 257, 2100, 766, 295, + 341, 589, 11, 370, 286, 500, 380, 458, 577, 709, 295, 257, 1154, 309, 1943, 380, + 294, 3124, 13, 51664], "temperature": 0.0, "avg_logprob": -0.2601868414109753, "compression_ratio": + 1.5495867768595042, "no_speech_prob": 0.006780822295695543}, {"id": 153, "seek": + 112616, "start": 1126.96, "end": 1132.64, "text": " And coming back to the sparse + representations, so yeah, they could potentially", "tokens": [50404, 400, 1348, + 646, 281, 264, 637, 11668, 33358, 11, 370, 1338, 11, 436, 727, 7263, 50688], "temperature": + 0.0, "avg_logprob": -0.2595607951536017, "compression_ratio": 1.6503067484662577, + "no_speech_prob": 0.00905586127191782}, {"id": 154, "seek": 112616, "start": 1133.2, + "end": 1140.3200000000002, "text": " use all this issue, but not necessarily with + displayed like models. 
Well, the problem with", "tokens": [50716, 764, 439, 341, + 2734, 11, 457, 406, 4725, 365, 16372, 411, 5245, 13, 1042, 11, 264, 1154, 365, 51072], + "temperature": 0.0, "avg_logprob": -0.2595607951536017, "compression_ratio": 1.6503067484662577, + "no_speech_prob": 0.00905586127191782}, {"id": 155, "seek": 112616, "start": 1140.3200000000002, + "end": 1148.88, "text": " display is that displayed models, they create those sparse + representations using the, not the words", "tokens": [51072, 4674, 307, 300, 16372, + 5245, 11, 436, 1884, 729, 637, 11668, 33358, 1228, 264, 11, 406, 264, 2283, 51500], + "temperature": 0.0, "avg_logprob": -0.2595607951536017, "compression_ratio": 1.6503067484662577, + "no_speech_prob": 0.00905586127191782}, {"id": 156, "seek": 114888, "start": 1148.88, + "end": 1158.48, "text": " themselves, they''re using sub word talking. So as a reminder + with like models like with transform", "tokens": [50364, 2969, 11, 436, 434, 1228, + 1422, 1349, 1417, 13, 407, 382, 257, 13548, 365, 411, 5245, 411, 365, 4088, 50844], + "temperature": 0.0, "avg_logprob": -0.22559534419666638, "compression_ratio": 1.5988700564971752, + "no_speech_prob": 0.003961000591516495}, {"id": 157, "seek": 114888, "start": 1158.48, + "end": 1169.0400000000002, "text": " models, they create this sort of new sort of + vocabulary that has some complete words, but most", "tokens": [50844, 5245, 11, + 436, 1884, 341, 1333, 295, 777, 1333, 295, 19864, 300, 575, 512, 3566, 2283, 11, + 457, 881, 51372], "temperature": 0.0, "avg_logprob": -0.22559534419666638, "compression_ratio": + 1.5988700564971752, "no_speech_prob": 0.003961000591516495}, {"id": 158, "seek": + 114888, "start": 1169.0400000000002, "end": 1175.8400000000001, "text": " words + are incomplete. 
So like they have like extract prefix, suffixes, parts of the words,", + "tokens": [51372, 2283, 366, 31709, 13, 407, 411, 436, 362, 411, 8947, 46969, 11, + 3889, 36005, 11, 3166, 295, 264, 2283, 11, 51712], "temperature": 0.0, "avg_logprob": + -0.22559534419666638, "compression_ratio": 1.5988700564971752, "no_speech_prob": + 0.003961000591516495}, {"id": 159, "seek": 117584, "start": 1175.84, "end": 1180.72, + "text": " and this is your new vocabulary and the difference between these new vocabulary + and the", "tokens": [50364, 293, 341, 307, 428, 777, 19864, 293, 264, 2649, 1296, + 613, 777, 19864, 293, 264, 50608], "temperature": 0.0, "avg_logprob": -0.1801014833672102, + "compression_ratio": 1.7339449541284404, "no_speech_prob": 0.005211897660046816}, + {"id": 160, "seek": 117584, "start": 1180.72, "end": 1187.04, "text": " actual vocabulary + that people use or use on the internet is that it''s limited to, it can have like", + "tokens": [50608, 3539, 19864, 300, 561, 764, 420, 764, 322, 264, 4705, 307, 300, + 309, 311, 5567, 281, 11, 309, 393, 362, 411, 50924], "temperature": 0.0, "avg_logprob": + -0.1801014833672102, "compression_ratio": 1.7339449541284404, "no_speech_prob": + 0.005211897660046816}, {"id": 161, "seek": 117584, "start": 1187.04, "end": 1195.28, + "text": " 50,000 talking, maybe 200 talking and some of the advanced modeling models, + but we really have like", "tokens": [50924, 2625, 11, 1360, 1417, 11, 1310, 2331, + 1417, 293, 512, 295, 264, 7339, 15983, 5245, 11, 457, 321, 534, 362, 411, 51336], + "temperature": 0.0, "avg_logprob": -0.1801014833672102, "compression_ratio": 1.7339449541284404, + "no_speech_prob": 0.005211897660046816}, {"id": 162, "seek": 117584, "start": 1195.28, + "end": 1201.28, "text": " millions and millions of words. 
So of course, that would + also lead to some deterioration in", "tokens": [51336, 6803, 293, 6803, 295, 2283, + 13, 407, 295, 1164, 11, 300, 576, 611, 1477, 281, 512, 26431, 399, 294, 51636], + "temperature": 0.0, "avg_logprob": -0.1801014833672102, "compression_ratio": 1.7339449541284404, + "no_speech_prob": 0.005211897660046816}, {"id": 163, "seek": 120128, "start": 1201.36, + "end": 1212.32, "text": " quality false positives, and especially if you try to + represent, represent long documents", "tokens": [50368, 3125, 7908, 35127, 11, 293, + 2318, 498, 291, 853, 281, 2906, 11, 2906, 938, 8512, 50916], "temperature": 0.0, + "avg_logprob": -0.19237678115432327, "compression_ratio": 1.543956043956044, "no_speech_prob": + 0.0014350195415318012}, {"id": 164, "seek": 120128, "start": 1212.32, "end": 1222.6399999999999, + "text": " using this fixed size vector. So it''s sort of sparse in more, it''s more + sparse in some ways,", "tokens": [50916, 1228, 341, 6806, 2744, 8062, 13, 407, 309, + 311, 1333, 295, 637, 11668, 294, 544, 11, 309, 311, 544, 637, 11668, 294, 512, 2098, + 11, 51432], "temperature": 0.0, "avg_logprob": -0.19237678115432327, "compression_ratio": + 1.543956043956044, "no_speech_prob": 0.0014350195415318012}, {"id": 165, "seek": + 120128, "start": 1222.6399999999999, "end": 1230.32, "text": " but it''s still fixed + size vector. Doesn''t make sense. Yeah, it does. 
I mean, it''s very insightful,", + "tokens": [51432, 457, 309, 311, 920, 6806, 2744, 8062, 13, 12955, 380, 652, 2020, + 13, 865, 11, 309, 775, 13, 286, 914, 11, 309, 311, 588, 46401, 11, 51816], "temperature": + 0.0, "avg_logprob": -0.19237678115432327, "compression_ratio": 1.543956043956044, + "no_speech_prob": 0.0014350195415318012}, {"id": 166, "seek": 123032, "start": 1230.32, + "end": 1238.24, "text": " what you said that like basically to make my question + much more succinct, I could ask,", "tokens": [50364, 437, 291, 848, 300, 411, 1936, + 281, 652, 452, 1168, 709, 544, 21578, 5460, 11, 286, 727, 1029, 11, 50760], "temperature": + 0.0, "avg_logprob": -0.2425368813907399, "compression_ratio": 1.5480225988700564, + "no_speech_prob": 0.002814227482303977}, {"id": 167, "seek": 123032, "start": 1238.8799999999999, + "end": 1246.08, "text": " you could we just use splaid for everything? And like + instead of, you know, combining different", "tokens": [50792, 291, 727, 321, 445, + 764, 637, 875, 327, 337, 1203, 30, 400, 411, 2602, 295, 11, 291, 458, 11, 21928, + 819, 51152], "temperature": 0.0, "avg_logprob": -0.2425368813907399, "compression_ratio": + 1.5480225988700564, "no_speech_prob": 0.002814227482303977}, {"id": 168, "seek": + 123032, "start": 1246.08, "end": 1251.4399999999998, "text": " approaches, just + use splaid, but you basically answered it really eloquently. You said that", "tokens": + [51152, 11587, 11, 445, 764, 637, 875, 327, 11, 457, 291, 1936, 10103, 309, 534, + 38682, 47519, 13, 509, 848, 300, 51420], "temperature": 0.0, "avg_logprob": -0.2425368813907399, + "compression_ratio": 1.5480225988700564, "no_speech_prob": 0.002814227482303977}, + {"id": 169, "seek": 125144, "start": 1251.44, "end": 1258.56, "text": " splaid itself + has limitations, right? 
For example, that would not allow us to properly embed", + "tokens": [50364, 637, 875, 327, 2564, 575, 15705, 11, 558, 30, 1171, 1365, 11, + 300, 576, 406, 2089, 505, 281, 6108, 12240, 50720], "temperature": 0.0, "avg_logprob": + -0.16277454745384953, "compression_ratio": 1.4791666666666667, "no_speech_prob": + 0.010628963820636272}, {"id": 170, "seek": 125144, "start": 1260.16, "end": 1265.04, + "text": " all variety of the language and then obviously dealing with longer documents + is another issue.", "tokens": [50800, 439, 5673, 295, 264, 2856, 293, 550, 2745, + 6260, 365, 2854, 8512, 307, 1071, 2734, 13, 51044], "temperature": 0.0, "avg_logprob": + -0.16277454745384953, "compression_ratio": 1.4791666666666667, "no_speech_prob": + 0.010628963820636272}, {"id": 171, "seek": 125144, "start": 1267.1200000000001, + "end": 1278.88, "text": " There is an interesting extension to this, so I was just + recently listening to a presentation on", "tokens": [51148, 821, 307, 364, 1880, + 10320, 281, 341, 11, 370, 286, 390, 445, 3938, 4764, 281, 257, 5860, 322, 51736], + "temperature": 0.0, "avg_logprob": -0.16277454745384953, "compression_ratio": 1.4791666666666667, + "no_speech_prob": 0.010628963820636272}, {"id": 172, "seek": 127888, "start": 1278.88, + "end": 1286.8000000000002, "text": " the extendable splaid where they extend the + vocabulary of splaid by eddy entities. 
That''s one", "tokens": [50364, 264, 10101, + 712, 637, 875, 327, 689, 436, 10101, 264, 19864, 295, 637, 875, 327, 538, 1257, + 3173, 16667, 13, 663, 311, 472, 50760], "temperature": 0.0, "avg_logprob": -0.21571285005599733, + "compression_ratio": 1.6506024096385543, "no_speech_prob": 0.002434792695567012}, + {"id": 173, "seek": 127888, "start": 1286.8000000000002, "end": 1294.3200000000002, + "text": " interesting direction of work, but another interesting direction is like + the so-called like", "tokens": [50760, 1880, 3513, 295, 589, 11, 457, 1071, 1880, + 3513, 307, 411, 264, 370, 12, 11880, 411, 51136], "temperature": 0.0, "avg_logprob": + -0.21571285005599733, "compression_ratio": 1.6506024096385543, "no_speech_prob": + 0.002434792695567012}, {"id": 174, "seek": 127888, "start": 1294.3200000000002, + "end": 1303.92, "text": " deep impact models where they take a document and they + do document expansion using like,", "tokens": [51136, 2452, 2712, 5245, 689, 436, + 747, 257, 4166, 293, 436, 360, 4166, 11260, 1228, 411, 11, 51616], "temperature": + 0.0, "avg_logprob": -0.21571285005599733, "compression_ratio": 1.6506024096385543, + "no_speech_prob": 0.002434792695567012}, {"id": 175, "seek": 130392, "start": 1303.92, + "end": 1310.88, "text": " you know, the doctor query style models. And then they + for each talking, I think,", "tokens": [50364, 291, 458, 11, 264, 4631, 14581, 3758, + 5245, 13, 400, 550, 436, 337, 1184, 1417, 11, 286, 519, 11, 50712], "temperature": + 0.0, "avg_logprob": -0.28142738342285156, "compression_ratio": 1.562874251497006, + "no_speech_prob": 0.0037546674720942974}, {"id": 176, "seek": 130392, "start": 1311.76, + "end": 1318.48, "text": " in the document they are learning a weight. 
And so this + is like a little bit more", "tokens": [50756, 294, 264, 4166, 436, 366, 2539, 257, + 3364, 13, 400, 370, 341, 307, 411, 257, 707, 857, 544, 51092], "temperature": 0.0, + "avg_logprob": -0.28142738342285156, "compression_ratio": 1.562874251497006, "no_speech_prob": + 0.0037546674720942974}, {"id": 177, "seek": 130392, "start": 1319.92, "end": 1329.52, + "text": " less limited, I think. But in the end, I think it''s whenever we, yeah, + so basically if like to be", "tokens": [51164, 1570, 5567, 11, 286, 519, 13, 583, + 294, 264, 917, 11, 286, 519, 309, 311, 5699, 321, 11, 1338, 11, 370, 1936, 498, + 411, 281, 312, 51644], "temperature": 0.0, "avg_logprob": -0.28142738342285156, + "compression_ratio": 1.562874251497006, "no_speech_prob": 0.0037546674720942974}, + {"id": 178, "seek": 132952, "start": 1329.6, "end": 1336.96, "text": " able to handle + those like rare, we need lexical representation to handle, you know, bigger", "tokens": + [50368, 1075, 281, 4813, 729, 411, 5892, 11, 321, 643, 476, 87, 804, 10290, 281, + 4813, 11, 291, 458, 11, 3801, 50736], "temperature": 0.0, "avg_logprob": -0.15265502532323202, + "compression_ratio": 1.6, "no_speech_prob": 0.0022845545317977667}, {"id": 179, + "seek": 132952, "start": 1336.96, "end": 1341.92, "text": " vocabularies. And it''s + probably hard to model with just fixed size vectors.", "tokens": [50736, 2329, 455, + 1040, 530, 13, 400, 309, 311, 1391, 1152, 281, 2316, 365, 445, 6806, 2744, 18875, + 13, 50984], "temperature": 0.0, "avg_logprob": -0.15265502532323202, "compression_ratio": + 1.6, "no_speech_prob": 0.0022845545317977667}, {"id": 180, "seek": 132952, "start": + 1342.8799999999999, "end": 1349.92, "text": " Yeah, it makes a lot of sense. 
At + the same time, we also know that, well, it depends on how you", "tokens": [51032, + 865, 11, 309, 1669, 257, 688, 295, 2020, 13, 1711, 264, 912, 565, 11, 321, 611, + 458, 300, 11, 731, 11, 309, 5946, 322, 577, 291, 51384], "temperature": 0.0, "avg_logprob": + -0.15265502532323202, "compression_ratio": 1.6, "no_speech_prob": 0.0022845545317977667}, + {"id": 181, "seek": 132952, "start": 1349.92, "end": 1357.52, "text": " model this, + but lexical approach, like vanilla lexical approach would miss semantic links, right,", + "tokens": [51384, 2316, 341, 11, 457, 476, 87, 804, 3109, 11, 411, 17528, 476, 87, + 804, 3109, 576, 1713, 47982, 6123, 11, 558, 11, 51764], "temperature": 0.0, "avg_logprob": + -0.15265502532323202, "compression_ratio": 1.6, "no_speech_prob": 0.0022845545317977667}, + {"id": 182, "seek": 135752, "start": 1357.52, "end": 1362.8799999999999, "text": + " and sort of understanding of larger context, because all it does is that it kind + of looks through", "tokens": [50364, 293, 1333, 295, 3701, 295, 4833, 4319, 11, + 570, 439, 309, 775, 307, 300, 309, 733, 295, 1542, 807, 50632], "temperature": 0.0, + "avg_logprob": -0.16803340911865233, "compression_ratio": 1.696113074204947, "no_speech_prob": + 0.010459821671247482}, {"id": 183, "seek": 135752, "start": 1362.8799999999999, + "end": 1368.16, "text": " the VM 25 model at the words. And sometimes it just pays + attention to some words, but doesn''t", "tokens": [50632, 264, 18038, 3552, 2316, + 412, 264, 2283, 13, 400, 2171, 309, 445, 10604, 3202, 281, 512, 2283, 11, 457, 1177, + 380, 50896], "temperature": 0.0, "avg_logprob": -0.16803340911865233, "compression_ratio": + 1.696113074204947, "no_speech_prob": 0.010459821671247482}, {"id": 184, "seek": + 135752, "start": 1368.16, "end": 1374.24, "text": " pay attention to other words. + And it may miss the main point of the query, right? 
But of course,", "tokens": [50896, + 1689, 3202, 281, 661, 2283, 13, 400, 309, 815, 1713, 264, 2135, 935, 295, 264, 14581, + 11, 558, 30, 583, 295, 1164, 11, 51200], "temperature": 0.0, "avg_logprob": -0.16803340911865233, + "compression_ratio": 1.696113074204947, "no_speech_prob": 0.010459821671247482}, + {"id": 185, "seek": 135752, "start": 1374.24, "end": 1380.72, "text": " this model + still worked for a new work that Yandex, you know, it best, this model''s worked", + "tokens": [51200, 341, 2316, 920, 2732, 337, 257, 777, 589, 300, 398, 474, 3121, + 11, 291, 458, 11, 309, 1151, 11, 341, 2316, 311, 2732, 51524], "temperature": 0.0, + "avg_logprob": -0.16803340911865233, "compression_ratio": 1.696113074204947, "no_speech_prob": + 0.010459821671247482}, {"id": 186, "seek": 135752, "start": 1380.72, "end": 1387.04, + "text": " previously, probably by virtue of you training the users that, hey, don''t + give me the full sentence,", "tokens": [51524, 8046, 11, 1391, 538, 20816, 295, + 291, 3097, 264, 5022, 300, 11, 4177, 11, 500, 380, 976, 385, 264, 1577, 8174, 11, + 51840], "temperature": 0.0, "avg_logprob": -0.16803340911865233, "compression_ratio": + 1.696113074204947, "no_speech_prob": 0.010459821671247482}, {"id": 187, "seek": + 138704, "start": 1387.04, "end": 1392.6399999999999, "text": " just give me like, + you know, specific words, like chopped list of words that I need to look up.", "tokens": + [50364, 445, 976, 385, 411, 11, 291, 458, 11, 2685, 2283, 11, 411, 16497, 1329, + 295, 2283, 300, 286, 643, 281, 574, 493, 13, 50644], "temperature": 0.0, "avg_logprob": + -0.11994230045991786, "compression_ratio": 1.6695652173913043, "no_speech_prob": + 0.0012812362983822823}, {"id": 188, "seek": 138704, "start": 1392.6399999999999, + "end": 1399.36, "text": " And that''s how I guess inverted index worked out. 
And + of course, you need to have on top of that,", "tokens": [50644, 400, 300, 311, 577, + 286, 2041, 38969, 8186, 2732, 484, 13, 400, 295, 1164, 11, 291, 643, 281, 362, 322, + 1192, 295, 300, 11, 50980], "temperature": 0.0, "avg_logprob": -0.11994230045991786, + "compression_ratio": 1.6695652173913043, "no_speech_prob": 0.0012812362983822823}, + {"id": 189, "seek": 138704, "start": 1399.36, "end": 1405.36, "text": " you need + to have very smart reranking strategy to pull up the documents that are really relevant,", + "tokens": [50980, 291, 643, 281, 362, 588, 4069, 319, 20479, 278, 5206, 281, 2235, + 493, 264, 8512, 300, 366, 534, 7340, 11, 51280], "temperature": 0.0, "avg_logprob": + -0.11994230045991786, "compression_ratio": 1.6695652173913043, "no_speech_prob": + 0.0012812362983822823}, {"id": 190, "seek": 138704, "start": 1405.36, "end": 1412.56, + "text": " right? But I guess today we have we have this new, well, I keep calling + it new, but it''s not", "tokens": [51280, 558, 30, 583, 286, 2041, 965, 321, 362, + 321, 362, 341, 777, 11, 731, 11, 286, 1066, 5141, 309, 777, 11, 457, 309, 311, 406, + 51640], "temperature": 0.0, "avg_logprob": -0.11994230045991786, "compression_ratio": + 1.6695652173913043, "no_speech_prob": 0.0012812362983822823}, {"id": 191, "seek": + 141256, "start": 1412.56, "end": 1419.2, "text": " maybe necessarily that new, but + it''s still fairly fresh development of dense,", "tokens": [50364, 1310, 4725, 300, + 777, 11, 457, 309, 311, 920, 6457, 4451, 3250, 295, 18011, 11, 50696], "temperature": + 0.0, "avg_logprob": -0.15776309967041016, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.003212756710126996}, {"id": 192, "seek": 141256, "start": 1419.2, + "end": 1424.8, "text": " dense retrieval that not many companies, I think, have + been boarded in the products yet.", "tokens": [50696, 18011, 19817, 3337, 300, 406, + 867, 3431, 11, 286, 519, 11, 362, 668, 3150, 292, 294, 264, 3383, 1939, 13, 50976], + "temperature": 0.0, 
"avg_logprob": -0.15776309967041016, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.003212756710126996}, {"id": 193, "seek": 141256, "start": 1425.9199999999998, + "end": 1431.04, "text": " But it''s a very interesting direction, and still you + need to combine the two worlds, right?", "tokens": [51032, 583, 309, 311, 257, 588, + 1880, 3513, 11, 293, 920, 291, 643, 281, 10432, 264, 732, 13401, 11, 558, 30, 51288], + "temperature": 0.0, "avg_logprob": -0.15776309967041016, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.003212756710126996}, {"id": 194, "seek": 141256, "start": 1431.04, + "end": 1438.72, "text": " So it sounds like from what you said, the only way to + get better quality is to combine this", "tokens": [51288, 407, 309, 3263, 411, 490, + 437, 291, 848, 11, 264, 787, 636, 281, 483, 1101, 3125, 307, 281, 10432, 341, 51672], + "temperature": 0.0, "avg_logprob": -0.15776309967041016, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.003212756710126996}, {"id": 195, "seek": 143872, "start": 1438.72, + "end": 1444.48, "text": " approaches rather than try to develop one single holistic + model to handle everything.", "tokens": [50364, 11587, 2831, 813, 853, 281, 1499, + 472, 2167, 30334, 2316, 281, 4813, 1203, 13, 50652], "temperature": 0.0, "avg_logprob": + -0.19412146175608916, "compression_ratio": 1.5963302752293578, "no_speech_prob": + 0.013916502706706524}, {"id": 196, "seek": 143872, "start": 1447.2, "end": 1452.56, + "text": " Oh, I, yeah, it''s a great question. 
I actually don''t know what''s the + best part forward is.", "tokens": [50788, 876, 11, 286, 11, 1338, 11, 309, 311, + 257, 869, 1168, 13, 286, 767, 500, 380, 458, 437, 311, 264, 1151, 644, 2128, 307, + 13, 51056], "temperature": 0.0, "avg_logprob": -0.19412146175608916, "compression_ratio": + 1.5963302752293578, "no_speech_prob": 0.013916502706706524}, {"id": 197, "seek": + 143872, "start": 1453.28, "end": 1459.1200000000001, "text": " So I highlighted + the, the deficiencies and advantages of different approaches.", "tokens": [51092, + 407, 286, 17173, 264, 11, 264, 19248, 31294, 293, 14906, 295, 819, 11587, 13, 51384], + "temperature": 0.0, "avg_logprob": -0.19412146175608916, "compression_ratio": 1.5963302752293578, + "no_speech_prob": 0.013916502706706524}, {"id": 198, "seek": 143872, "start": 1461.2, + "end": 1466.08, "text": " But I also want to comment on the deep impact model, the + deep impact model, I think the way,", "tokens": [51488, 583, 286, 611, 528, 281, + 2871, 322, 264, 2452, 2712, 2316, 11, 264, 2452, 2712, 2316, 11, 286, 519, 264, + 636, 11, 51732], "temperature": 0.0, "avg_logprob": -0.19412146175608916, "compression_ratio": + 1.5963302752293578, "no_speech_prob": 0.013916502706706524}, {"id": 199, "seek": + 146608, "start": 1466.08, "end": 1474.96, "text": " maybe I described it, it was, + it sounded like it is like a BM25 model, but it''s actually not.", "tokens": [50364, + 1310, 286, 7619, 309, 11, 309, 390, 11, 309, 17714, 411, 309, 307, 411, 257, 15901, + 6074, 2316, 11, 457, 309, 311, 767, 406, 13, 50808], "temperature": 0.0, "avg_logprob": + -0.22637695141052933, "compression_ratio": 1.6728971962616823, "no_speech_prob": + 0.003180060302838683}, {"id": 200, "seek": 146608, "start": 1474.96, "end": 1480.48, + "text": " So maybe we should have, like we''re talking about sparse representations, + like learned sparse", "tokens": [50808, 407, 1310, 321, 820, 362, 11, 411, 321, + 434, 1417, 466, 637, 11668, 33358, 11, 411, 3264, 637, 11668, 
51084], "temperature": + 0.0, "avg_logprob": -0.22637695141052933, "compression_ratio": 1.6728971962616823, + "no_speech_prob": 0.003180060302838683}, {"id": 201, "seek": 146608, "start": 1480.48, + "end": 1485.76, "text": " representations, because it''s a bigger topic and it''s + much bigger topic than most people", "tokens": [51084, 33358, 11, 570, 309, 311, + 257, 3801, 4829, 293, 309, 311, 709, 3801, 4829, 813, 881, 561, 51348], "temperature": + 0.0, "avg_logprob": -0.22637695141052933, "compression_ratio": 1.6728971962616823, + "no_speech_prob": 0.003180060302838683}, {"id": 202, "seek": 146608, "start": 1486.6399999999999, + "end": 1492.32, "text": " realize sometimes. So people know BM25, people know dense + vectors, and these are,", "tokens": [51392, 4325, 2171, 13, 407, 561, 458, 15901, + 6074, 11, 561, 458, 18011, 18875, 11, 293, 613, 366, 11, 51676], "temperature": + 0.0, "avg_logprob": -0.22637695141052933, "compression_ratio": 1.6728971962616823, + "no_speech_prob": 0.003180060302838683}, {"id": 203, "seek": 149232, "start": 1493.28, + "end": 1498.96, "text": " these simple things, but there is a lot in between. 
So + first of all, what you can do, and that''s what", "tokens": [50412, 613, 2199, 721, + 11, 457, 456, 307, 257, 688, 294, 1296, 13, 407, 700, 295, 439, 11, 437, 291, 393, + 360, 11, 293, 300, 311, 437, 50696], "temperature": 0.0, "avg_logprob": -0.15300087881560373, + "compression_ratio": 1.7210300429184548, "no_speech_prob": 0.007591988891363144}, + {"id": 204, "seek": 149232, "start": 1498.96, "end": 1505.9199999999998, "text": + " people did, and even the doctor query is the most famous way to do so, but it + was actually not even", "tokens": [50696, 561, 630, 11, 293, 754, 264, 4631, 14581, + 307, 264, 881, 4618, 636, 281, 360, 370, 11, 457, 309, 390, 767, 406, 754, 51044], + "temperature": 0.0, "avg_logprob": -0.15300087881560373, "compression_ratio": 1.7210300429184548, + "no_speech_prob": 0.007591988891363144}, {"id": 205, "seek": 149232, "start": 1505.9199999999998, + "end": 1512.32, "text": " a single group of people who proposed this. So what can + you do? We can take a model, a deep learning", "tokens": [51044, 257, 2167, 1594, + 295, 561, 567, 10348, 341, 13, 407, 437, 393, 291, 360, 30, 492, 393, 747, 257, + 2316, 11, 257, 2452, 2539, 51364], "temperature": 0.0, "avg_logprob": -0.15300087881560373, + "compression_ratio": 1.7210300429184548, "no_speech_prob": 0.007591988891363144}, + {"id": 206, "seek": 149232, "start": 1512.32, "end": 1518.48, "text": " model, contextualized + model, maybe not necessarily contextualized, but contextualized models, they", "tokens": + [51364, 2316, 11, 35526, 1602, 2316, 11, 1310, 406, 4725, 35526, 1602, 11, 457, + 35526, 1602, 5245, 11, 436, 51672], "temperature": 0.0, "avg_logprob": -0.15300087881560373, + "compression_ratio": 1.7210300429184548, "no_speech_prob": 0.007591988891363144}, + {"id": 207, "seek": 151848, "start": 1519.1200000000001, "end": 1524.4, "text": + " do better job because they look at the model as a whole, the document as a whole, + they don''t like", "tokens": [50396, 360, 1101, 1691, 570, 436, 
574, 412, 264, 2316, + 382, 257, 1379, 11, 264, 4166, 382, 257, 1379, 11, 436, 500, 380, 411, 50660], "temperature": + 0.0, "avg_logprob": -0.1810528594668549, "compression_ratio": 1.7702702702702702, + "no_speech_prob": 0.004379728808999062}, {"id": 208, "seek": 151848, "start": 1524.4, + "end": 1531.3600000000001, "text": " look like a devidue chunks of document, right? + So they kind of can understand what the total meaning", "tokens": [50660, 574, 411, + 257, 1905, 327, 622, 24004, 295, 4166, 11, 558, 30, 407, 436, 733, 295, 393, 1223, + 437, 264, 3217, 3620, 51008], "temperature": 0.0, "avg_logprob": -0.1810528594668549, + "compression_ratio": 1.7702702702702702, "no_speech_prob": 0.004379728808999062}, + {"id": 209, "seek": 151848, "start": 1531.3600000000001, "end": 1540.4, "text": + " of the document. And then they, they propose new keywords on new terms. So some + like synonyms,", "tokens": [51008, 295, 264, 4166, 13, 400, 550, 436, 11, 436, 17421, + 777, 21009, 322, 777, 2115, 13, 407, 512, 411, 5451, 2526, 2592, 11, 51460], "temperature": + 0.0, "avg_logprob": -0.1810528594668549, "compression_ratio": 1.7702702702702702, + "no_speech_prob": 0.004379728808999062}, {"id": 210, "seek": 151848, "start": 1541.28, + "end": 1546.88, "text": " synonyms that could have been in this document, but they + are not. 
And if you add these documents to", "tokens": [51504, 5451, 2526, 2592, + 300, 727, 362, 668, 294, 341, 4166, 11, 457, 436, 366, 406, 13, 400, 498, 291, 909, + 613, 8512, 281, 51784], "temperature": 0.0, "avg_logprob": -0.1810528594668549, + "compression_ratio": 1.7702702702702702, "no_speech_prob": 0.004379728808999062}, + {"id": 211, "seek": 154688, "start": 1546.88, "end": 1556.4, "text": " the new, + if you add, sorry pardon me, if you add these terms to the document, then this missing", + "tokens": [50364, 264, 777, 11, 498, 291, 909, 11, 2597, 22440, 385, 11, 498, 291, + 909, 613, 2115, 281, 264, 4166, 11, 550, 341, 5361, 50840], "temperature": 0.0, + "avg_logprob": -0.18886938610592405, "compression_ratio": 1.748502994011976, "no_speech_prob": + 0.0018137841252610087}, {"id": 212, "seek": 154688, "start": 1556.4, "end": 1564.64, + "text": " synonyms are there. You can index this document. So basically this is + document expansion. And you", "tokens": [50840, 5451, 2526, 2592, 366, 456, 13, + 509, 393, 8186, 341, 4166, 13, 407, 1936, 341, 307, 4166, 11260, 13, 400, 291, 51252], + "temperature": 0.0, "avg_logprob": -0.18886938610592405, "compression_ratio": 1.748502994011976, + "no_speech_prob": 0.0018137841252610087}, {"id": 213, "seek": 154688, "start": 1564.64, + "end": 1573.3600000000001, "text": " can do document expansion. And that helps resolve + that lexical mismatch, mitigate lexical mismatch", "tokens": [51252, 393, 360, 4166, + 11260, 13, 400, 300, 3665, 14151, 300, 476, 87, 804, 23220, 852, 11, 27336, 476, + 87, 804, 23220, 852, 51688], "temperature": 0.0, "avg_logprob": -0.18886938610592405, + "compression_ratio": 1.748502994011976, "no_speech_prob": 0.0018137841252610087}, + {"id": 214, "seek": 157336, "start": 1573.36, "end": 1579.1999999999998, "text": + " between query and documents. 
And I claim it''s easier to do this expansion.", + "tokens": [50364, 1296, 14581, 293, 8512, 13, 400, 286, 3932, 309, 311, 3571, 281, + 360, 341, 11260, 13, 50656], "temperature": 0.0, "avg_logprob": -0.18294605928308824, + "compression_ratio": 1.6409090909090909, "no_speech_prob": 0.002375661861151457}, + {"id": 215, "seek": 157336, "start": 1579.84, "end": 1585.6799999999998, "text": + " That there are like of course approaches that do query expansion, basically adding + synonyms at", "tokens": [50688, 663, 456, 366, 411, 295, 1164, 11587, 300, 360, + 14581, 11260, 11, 1936, 5127, 5451, 2526, 2592, 412, 50980], "temperature": 0.0, + "avg_logprob": -0.18294605928308824, "compression_ratio": 1.6409090909090909, "no_speech_prob": + 0.002375661861151457}, {"id": 216, "seek": 157336, "start": 1585.6799999999998, + "end": 1591.6, "text": " the query stage. But why claim is that it''s much harder + to do this accurately because there is", "tokens": [50980, 264, 14581, 3233, 13, + 583, 983, 3932, 307, 300, 309, 311, 709, 6081, 281, 360, 341, 20095, 570, 456, 307, + 51276], "temperature": 0.0, "avg_logprob": -0.18294605928308824, "compression_ratio": + 1.6409090909090909, "no_speech_prob": 0.002375661861151457}, {"id": 217, "seek": + 157336, "start": 1591.6, "end": 1601.76, "text": " much less context. So this is + one, you know, this is one direction of fixing things and creating", "tokens": [51276, + 709, 1570, 4319, 13, 407, 341, 307, 472, 11, 291, 458, 11, 341, 307, 472, 3513, + 295, 19442, 721, 293, 4084, 51784], "temperature": 0.0, "avg_logprob": -0.18294605928308824, + "compression_ratio": 1.6409090909090909, "no_speech_prob": 0.002375661861151457}, + {"id": 218, "seek": 160176, "start": 1601.76, "end": 1607.12, "text": " sparse representations. + Like there is a split model. What does this play? 
What does this play", "tokens": + [50364, 637, 11668, 33358, 13, 1743, 456, 307, 257, 7472, 2316, 13, 708, 775, 341, + 862, 30, 708, 775, 341, 862, 50632], "temperature": 0.0, "avg_logprob": -0.27197414197419817, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.004775272682309151}, + {"id": 219, "seek": 160176, "start": 1607.12, "end": 1613.68, "text": " model? It''s + completely sort of it''s doing something completely different. It looks at the document.", + "tokens": [50632, 2316, 30, 467, 311, 2584, 1333, 295, 309, 311, 884, 746, 2584, + 819, 13, 467, 1542, 412, 264, 4166, 13, 50960], "temperature": 0.0, "avg_logprob": + -0.27197414197419817, "compression_ratio": 1.7636363636363637, "no_speech_prob": + 0.004775272682309151}, {"id": 220, "seek": 160176, "start": 1613.68, "end": 1621.6, + "text": " And there is a vocabulary, like bird tokens. And for each token, it gives + you a weight. It looks at", "tokens": [50960, 400, 456, 307, 257, 19864, 11, 411, + 5255, 22667, 13, 400, 337, 1184, 14862, 11, 309, 2709, 291, 257, 3364, 13, 467, + 1542, 412, 51356], "temperature": 0.0, "avg_logprob": -0.27197414197419817, "compression_ratio": + 1.7636363636363637, "no_speech_prob": 0.004775272682309151}, {"id": 221, "seek": + 160176, "start": 1621.6, "end": 1626.8, "text": " the document sort of understand + this meaning that says, all like this is like, this is a word,", "tokens": [51356, + 264, 4166, 1333, 295, 1223, 341, 3620, 300, 1619, 11, 439, 411, 341, 307, 411, 11, + 341, 307, 257, 1349, 11, 51616], "temperature": 0.0, "avg_logprob": -0.27197414197419817, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.004775272682309151}, + {"id": 222, "seek": 162680, "start": 1626.8, "end": 1631.68, "text": " a prefix + or word. It should have this weight. 
And that''s how you get a sparse representation.", + "tokens": [50364, 257, 46969, 420, 1349, 13, 467, 820, 362, 341, 3364, 13, 400, + 300, 311, 577, 291, 483, 257, 637, 11668, 10290, 13, 50608], "temperature": 0.0, + "avg_logprob": -0.18384494384129843, "compression_ratio": 1.6065573770491803, "no_speech_prob": + 0.001982129644602537}, {"id": 223, "seek": 162680, "start": 1632.56, "end": 1638.56, + "text": " But with deep impact, you''re doing something slightly different. So you + take a document and you do", "tokens": [50652, 583, 365, 2452, 2712, 11, 291, 434, + 884, 746, 4748, 819, 13, 407, 291, 747, 257, 4166, 293, 291, 360, 50952], "temperature": + 0.0, "avg_logprob": -0.18384494384129843, "compression_ratio": 1.6065573770491803, + "no_speech_prob": 0.001982129644602537}, {"id": 224, "seek": 162680, "start": 1638.56, + "end": 1645.6, "text": " this document expansion. So you add words like synonyms. + But then you don''t index this document using", "tokens": [50952, 341, 4166, 11260, + 13, 407, 291, 909, 2283, 411, 5451, 2526, 2592, 13, 583, 550, 291, 500, 380, 8186, + 341, 4166, 1228, 51304], "temperature": 0.0, "avg_logprob": -0.18384494384129843, + "compression_ratio": 1.6065573770491803, "no_speech_prob": 0.001982129644602537}, + {"id": 225, "seek": 162680, "start": 1645.6, "end": 1653.52, "text": " build 25. + Why? 
Because build 25 is clearly old style and it doesn''t take like context to + account.", "tokens": [51304, 1322, 3552, 13, 1545, 30, 1436, 1322, 3552, 307, 4448, + 1331, 3758, 293, 309, 1177, 380, 747, 411, 4319, 281, 2696, 13, 51700], "temperature": + 0.0, "avg_logprob": -0.18384494384129843, "compression_ratio": 1.6065573770491803, + "no_speech_prob": 0.001982129644602537}, {"id": 226, "seek": 165352, "start": 1653.52, + "end": 1660.08, "text": " So instead of that, you train a transform model that would + give you weight for each", "tokens": [50364, 407, 2602, 295, 300, 11, 291, 3847, + 257, 4088, 2316, 300, 576, 976, 291, 3364, 337, 1184, 50692], "temperature": 0.0, + "avg_logprob": -0.24819218028675427, "compression_ratio": 1.6782178217821782, "no_speech_prob": + 0.002493984531611204}, {"id": 227, "seek": 165352, "start": 1661.04, "end": 1667.44, + "text": " term in the document, in the expanded document. And then you use this + for it.", "tokens": [50740, 1433, 294, 264, 4166, 11, 294, 264, 14342, 4166, 13, + 400, 550, 291, 764, 341, 337, 309, 13, 51060], "temperature": 0.0, "avg_logprob": + -0.24819218028675427, "compression_ratio": 1.6782178217821782, "no_speech_prob": + 0.002493984531611204}, {"id": 228, "seek": 165352, "start": 1669.92, "end": 1675.28, + "text": " Oh, that''s very easy. And that''s called deep and that''s called deep + impact models.", "tokens": [51184, 876, 11, 300, 311, 588, 1858, 13, 400, 300, 311, + 1219, 2452, 293, 300, 311, 1219, 2452, 2712, 5245, 13, 51452], "temperature": 0.0, + "avg_logprob": -0.24819218028675427, "compression_ratio": 1.6782178217821782, "no_speech_prob": + 0.002493984531611204}, {"id": 229, "seek": 165352, "start": 1675.28, "end": 1681.52, + "text": " Yes. Yeah. 
We should link that I guess there is a paper for that as well + and should be able to", "tokens": [51452, 1079, 13, 865, 13, 492, 820, 2113, 300, + 286, 2041, 456, 307, 257, 3035, 337, 300, 382, 731, 293, 820, 312, 1075, 281, 51764], + "temperature": 0.0, "avg_logprob": -0.24819218028675427, "compression_ratio": 1.6782178217821782, + "no_speech_prob": 0.002493984531611204}, {"id": 230, "seek": 168152, "start": 1681.52, + "end": 1688.32, "text": " link that. Yeah. That''s very interesting. And it''s also + interesting that what you mentioned about", "tokens": [50364, 2113, 300, 13, 865, + 13, 663, 311, 588, 1880, 13, 400, 309, 311, 611, 1880, 300, 437, 291, 2835, 466, + 50704], "temperature": 0.0, "avg_logprob": -0.31791125768902656, "compression_ratio": + 1.5982532751091703, "no_speech_prob": 0.003316299756988883}, {"id": 231, "seek": + 168152, "start": 1688.32, "end": 1695.12, "text": " the dance model sort of not + able to capture everything that you want them to capture.", "tokens": [50704, 264, + 4489, 2316, 1333, 295, 406, 1075, 281, 7983, 1203, 300, 291, 528, 552, 281, 7983, + 13, 51044], "temperature": 0.0, "avg_logprob": -0.31791125768902656, "compression_ratio": + 1.5982532751091703, "no_speech_prob": 0.003316299756988883}, {"id": 232, "seek": + 168152, "start": 1695.68, "end": 1701.28, "text": " And yet, this becomes a building + block in the application phase, like for example, in", "tokens": [51072, 400, 1939, + 11, 341, 3643, 257, 2390, 3461, 294, 264, 3861, 5574, 11, 411, 337, 1365, 11, 294, + 51352], "temperature": 0.0, "avg_logprob": -0.31791125768902656, "compression_ratio": + 1.5982532751091703, "no_speech_prob": 0.003316299756988883}, {"id": 233, "seek": + 168152, "start": 1701.28, "end": 1707.52, "text": " Rage or a Givalogmented Generation + because effectively, the only method that I heard so far off,", "tokens": [51352, + 497, 609, 420, 257, 460, 3576, 664, 14684, 23898, 570, 8659, 11, 264, 787, 3170, + 300, 286, 2198, 370, 1400, 766, 11, 
51664], "temperature": 0.0, "avg_logprob": -0.31791125768902656, + "compression_ratio": 1.5982532751091703, "no_speech_prob": 0.003316299756988883}, + {"id": 234, "seek": 170752, "start": 1707.52, "end": 1714.4, "text": " which is + circulating a lot is just chunk it up. You chunk all documents up and then you hope + that", "tokens": [50364, 597, 307, 39749, 257, 688, 307, 445, 16635, 309, 493, 13, + 509, 16635, 439, 8512, 493, 293, 550, 291, 1454, 300, 50708], "temperature": 0.0, + "avg_logprob": -0.16514055493851781, "compression_ratio": 1.553763440860215, "no_speech_prob": + 0.0061403014697134495}, {"id": 235, "seek": 170752, "start": 1716.24, "end": 1722.32, + "text": " the chunk size is less or about the same as capacity of the model, right? + Because otherwise,", "tokens": [50800, 264, 16635, 2744, 307, 1570, 420, 466, 264, + 912, 382, 6042, 295, 264, 2316, 11, 558, 30, 1436, 5911, 11, 51104], "temperature": + 0.0, "avg_logprob": -0.16514055493851781, "compression_ratio": 1.553763440860215, + "no_speech_prob": 0.0061403014697134495}, {"id": 236, "seek": 170752, "start": 1722.32, + "end": 1729.76, "text": " it will chop off the end and you will lose the part of + the meaning. Or you also apply some methods", "tokens": [51104, 309, 486, 7931, + 766, 264, 917, 293, 291, 486, 3624, 264, 644, 295, 264, 3620, 13, 1610, 291, 611, + 3079, 512, 7150, 51476], "temperature": 0.0, "avg_logprob": -0.16514055493851781, + "compression_ratio": 1.553763440860215, "no_speech_prob": 0.0061403014697134495}, + {"id": 237, "seek": 172976, "start": 1729.76, "end": 1737.68, "text": " like some + level of overlap, right? 
So you can then index a few more chunks in the same entity", + "tokens": [50364, 411, 512, 1496, 295, 19959, 11, 558, 30, 407, 291, 393, 550, 8186, + 257, 1326, 544, 24004, 294, 264, 912, 13977, 50760], "temperature": 0.0, "avg_logprob": + -0.1615531041071965, "compression_ratio": 1.6491228070175439, "no_speech_prob": + 0.0025198108050972223}, {"id": 238, "seek": 172976, "start": 1737.68, "end": 1747.12, + "text": " and then try to query. And then interestingly, you can generate questions + out of chunks", "tokens": [50760, 293, 550, 853, 281, 14581, 13, 400, 550, 25873, + 11, 291, 393, 8460, 1651, 484, 295, 24004, 51232], "temperature": 0.0, "avg_logprob": + -0.1615531041071965, "compression_ratio": 1.6491228070175439, "no_speech_prob": + 0.0025198108050972223}, {"id": 239, "seek": 172976, "start": 1747.44, "end": 1752.72, + "text": " that these chunks might be able to answer. And then you search those questions + instead of the chunks", "tokens": [51248, 300, 613, 24004, 1062, 312, 1075, 281, + 1867, 13, 400, 550, 291, 3164, 729, 1651, 2602, 295, 264, 24004, 51512], "temperature": + 0.0, "avg_logprob": -0.1615531041071965, "compression_ratio": 1.6491228070175439, + "no_speech_prob": 0.0025198108050972223}, {"id": 240, "seek": 175272, "start": 1752.72, + "end": 1759.68, "text": " themselves, right? So which comes back to what you said + about Dr. Query, I guess. 
So it''s very", "tokens": [50364, 2969, 11, 558, 30, 407, + 597, 1487, 646, 281, 437, 291, 848, 466, 2491, 13, 2326, 2109, 11, 286, 2041, 13, + 407, 309, 311, 588, 50712], "temperature": 0.0, "avg_logprob": -0.2050280006982947, + "compression_ratio": 1.6282051282051282, "no_speech_prob": 0.023092610761523247}, + {"id": 241, "seek": 175272, "start": 1759.68, "end": 1768.88, "text": " interesting + that like we are sort of like standing on a set of building blocks that themselves + should", "tokens": [50712, 1880, 300, 411, 321, 366, 1333, 295, 411, 4877, 322, + 257, 992, 295, 2390, 8474, 300, 2969, 820, 51172], "temperature": 0.0, "avg_logprob": + -0.2050280006982947, "compression_ratio": 1.6282051282051282, "no_speech_prob": + 0.023092610761523247}, {"id": 242, "seek": 175272, "start": 1769.6000000000001, + "end": 1775.92, "text": " be optimized and optimized and optimized. But I guess + we already in the phase globally when", "tokens": [51208, 312, 26941, 293, 26941, + 293, 26941, 13, 583, 286, 2041, 321, 1217, 294, 264, 5574, 18958, 562, 51524], "temperature": + 0.0, "avg_logprob": -0.2050280006982947, "compression_ratio": 1.6282051282051282, + "no_speech_prob": 0.023092610761523247}, {"id": 243, "seek": 175272, "start": 1775.92, + "end": 1782.0, "text": " everyone is trying to derive value from LLMs and Rags and + everything, right? And yet, we can", "tokens": [51524, 1518, 307, 1382, 281, 28446, + 2158, 490, 441, 43, 26386, 293, 497, 12109, 293, 1203, 11, 558, 30, 400, 1939, 11, + 321, 393, 51828], "temperature": 0.0, "avg_logprob": -0.2050280006982947, "compression_ratio": + 1.6282051282051282, "no_speech_prob": 0.023092610761523247}, {"id": 244, "seek": + 178200, "start": 1782.0, "end": 1788.0, "text": " stumble upon some really tricky + situations. Like you explained. 
Oh, it looks like we have a lot.", "tokens": [50364, + 41302, 3564, 512, 534, 12414, 6851, 13, 1743, 291, 8825, 13, 876, 11, 309, 1542, + 411, 321, 362, 257, 688, 13, 50664], "temperature": 0.0, "avg_logprob": -0.2663711201060902, + "compression_ratio": 1.5508021390374331, "no_speech_prob": 0.0013222168199717999}, + {"id": 245, "seek": 178200, "start": 1788.56, "end": 1796.24, "text": " Yeah, it + looks like we have still a lot of research topics. Yeah. A lot of answering questions.", + "tokens": [50692, 865, 11, 309, 1542, 411, 321, 362, 920, 257, 688, 295, 2132, 8378, + 13, 865, 13, 316, 688, 295, 13430, 1651, 13, 51076], "temperature": 0.0, "avg_logprob": + -0.2663711201060902, "compression_ratio": 1.5508021390374331, "no_speech_prob": + 0.0013222168199717999}, {"id": 246, "seek": 178200, "start": 1797.2, "end": 1805.2, + "text": " Yeah. I wanted to a little bit digress from here to the work you''ve done + at NMS Lip and I want to", "tokens": [51124, 865, 13, 286, 1415, 281, 257, 707, + 857, 2528, 735, 490, 510, 281, 264, 589, 291, 600, 1096, 412, 426, 10288, 27475, + 293, 286, 528, 281, 51524], "temperature": 0.0, "avg_logprob": -0.2663711201060902, + "compression_ratio": 1.5508021390374331, "no_speech_prob": 0.0013222168199717999}, + {"id": 247, "seek": 180520, "start": 1805.2, "end": 1813.68, "text": " read it from + your from the GitHub repository. It''s a non-metrics space library. 
And I did spend + some", "tokens": [50364, 1401, 309, 490, 428, 490, 264, 23331, 25841, 13, 467, 311, + 257, 2107, 12, 5537, 10716, 1901, 6405, 13, 400, 286, 630, 3496, 512, 50788], "temperature": + 0.0, "avg_logprob": -0.19532004992167154, "compression_ratio": 1.5294117647058822, + "no_speech_prob": 0.004680975805968046}, {"id": 248, "seek": 180520, "start": 1813.68, + "end": 1824.0800000000002, "text": " time in my rework life, you know, when I was + studying mathematics and we did study a bunch of,", "tokens": [50788, 565, 294, + 452, 48376, 993, 11, 291, 458, 11, 562, 286, 390, 7601, 18666, 293, 321, 630, 2979, + 257, 3840, 295, 11, 51308], "temperature": 0.0, "avg_logprob": -0.19532004992167154, + "compression_ratio": 1.5294117647058822, "no_speech_prob": 0.004680975805968046}, + {"id": 249, "seek": 180520, "start": 1824.0800000000002, "end": 1830.88, "text": + " you know, metric spaces. I have never realized I would never really like imagine + that this", "tokens": [51308, 291, 458, 11, 20678, 7673, 13, 286, 362, 1128, 5334, + 286, 576, 1128, 534, 411, 3811, 300, 341, 51648], "temperature": 0.0, "avg_logprob": + -0.19532004992167154, "compression_ratio": 1.5294117647058822, "no_speech_prob": + 0.004680975805968046}, {"id": 250, "seek": 183088, "start": 1831.2800000000002, + "end": 1841.1200000000001, "text": " highly theoretical stuff would now connect + so deeply to practice and it''s amazing. But can you tell me", "tokens": [50384, + 5405, 20864, 1507, 576, 586, 1745, 370, 8760, 281, 3124, 293, 309, 311, 2243, 13, + 583, 393, 291, 980, 385, 50876], "temperature": 0.0, "avg_logprob": -0.25581018924713134, + "compression_ratio": 1.4759615384615385, "no_speech_prob": 0.0033354482147842646}, + {"id": 251, "seek": 183088, "start": 1841.1200000000001, "end": 1849.8400000000001, + "text": " why it''s non-metrics space library? 
Isn''t it so that the whole idea + of, you know, vector searches that", "tokens": [50876, 983, 309, 311, 2107, 12, + 5537, 10716, 1901, 6405, 30, 6998, 380, 309, 370, 300, 264, 1379, 1558, 295, 11, + 291, 458, 11, 8062, 26701, 300, 51312], "temperature": 0.0, "avg_logprob": -0.25581018924713134, + "compression_ratio": 1.4759615384615385, "no_speech_prob": 0.0033354482147842646}, + {"id": 252, "seek": 183088, "start": 1849.8400000000001, "end": 1859.7600000000002, + "text": " we choose some metric, Cousin or dot product or whatever it is. And we + are, and that''s how we express", "tokens": [51312, 321, 2826, 512, 20678, 11, 383, + 563, 259, 420, 5893, 1674, 420, 2035, 309, 307, 13, 400, 321, 366, 11, 293, 300, + 311, 577, 321, 5109, 51808], "temperature": 0.0, "avg_logprob": -0.25581018924713134, + "compression_ratio": 1.4759615384615385, "no_speech_prob": 0.0033354482147842646}, + {"id": 253, "seek": 185976, "start": 1859.84, "end": 1871.68, "text": " the semantics + similarity. Great question. So the reason why it is we decided to not limit ourselves", + "tokens": [50368, 264, 4361, 45298, 32194, 13, 3769, 1168, 13, 407, 264, 1778, 983, + 309, 307, 321, 3047, 281, 406, 4948, 4175, 50960], "temperature": 0.0, "avg_logprob": + -0.1907815103945525, "compression_ratio": 1.4492753623188406, "no_speech_prob": + 0.0024719147477298975}, {"id": 254, "seek": 185976, "start": 1871.68, "end": 1880.24, + "text": " for to metric search because we felt and that''s also a feeling of other + people is that metric search", "tokens": [50960, 337, 281, 20678, 3164, 570, 321, + 2762, 293, 300, 311, 611, 257, 2633, 295, 661, 561, 307, 300, 20678, 3164, 51388], + "temperature": 0.0, "avg_logprob": -0.1907815103945525, "compression_ratio": 1.4492753623188406, + "no_speech_prob": 0.0024719147477298975}, {"id": 255, "seek": 188024, "start": 1880.32, + "end": 1891.36, "text": " is is limiting. So it''s not expressive enough. 
It turned + out to be true to some degree but not as much", "tokens": [50368, 307, 307, 22083, + 13, 407, 309, 311, 406, 40189, 1547, 13, 467, 3574, 484, 281, 312, 2074, 281, 512, + 4314, 457, 406, 382, 709, 50920], "temperature": 0.0, "avg_logprob": -0.23412871655122733, + "compression_ratio": 1.515, "no_speech_prob": 0.008462520316243172}, {"id": 256, + "seek": 188024, "start": 1891.36, "end": 1899.84, "text": " as we hoped. And indeed, + in many cases, so and why we''re doing so, the representation learning was not", + "tokens": [50920, 382, 321, 19737, 13, 400, 6451, 11, 294, 867, 3331, 11, 370, 293, + 983, 321, 434, 884, 370, 11, 264, 10290, 2539, 390, 406, 51344], "temperature": + 0.0, "avg_logprob": -0.23412871655122733, "compression_ratio": 1.515, "no_speech_prob": + 0.008462520316243172}, {"id": 257, "seek": 188024, "start": 1900.72, "end": 1909.84, + "text": " as developed as it is now. So we felt like, you know, we need to be able + to, people will engineer", "tokens": [51388, 382, 4743, 382, 309, 307, 586, 13, + 407, 321, 2762, 411, 11, 291, 458, 11, 321, 643, 281, 312, 1075, 281, 11, 561, 486, + 11403, 51844], "temperature": 0.0, "avg_logprob": -0.23412871655122733, "compression_ratio": + 1.515, "no_speech_prob": 0.008462520316243172}, {"id": 258, "seek": 190984, "start": + 1909.84, "end": 1915.76, "text": " those complex similarities and we need to support + individual using this complex similarity.", "tokens": [50364, 729, 3997, 24197, + 293, 321, 643, 281, 1406, 2609, 1228, 341, 3997, 32194, 13, 50660], "temperature": + 0.0, "avg_logprob": -0.20895021192489133, "compression_ratio": 1.5706521739130435, + "no_speech_prob": 0.0023419680073857307}, {"id": 259, "seek": 190984, "start": 1915.76, + "end": 1924.6399999999999, "text": " This did not happen. 
But what I think happened + and that''s I want to connect this to my statement", "tokens": [50660, 639, 630, + 406, 1051, 13, 583, 437, 286, 519, 2011, 293, 300, 311, 286, 528, 281, 1745, 341, + 281, 452, 5629, 51104], "temperature": 0.0, "avg_logprob": -0.20895021192489133, + "compression_ratio": 1.5706521739130435, "no_speech_prob": 0.0023419680073857307}, + {"id": 260, "seek": 190984, "start": 1924.6399999999999, "end": 1930.8, "text": + " that in the end of my graduate studies or rather after defending a thesis, somebody + pointed out that", "tokens": [51104, 300, 294, 264, 917, 295, 452, 8080, 5313, 420, + 2831, 934, 21377, 257, 22288, 11, 2618, 10932, 484, 300, 51412], "temperature": + 0.0, "avg_logprob": -0.20895021192489133, "compression_ratio": 1.5706521739130435, + "no_speech_prob": 0.0023419680073857307}, {"id": 261, "seek": 193080, "start": 1931.76, + "end": 1939.68, "text": " the similarities that we were using were basically representable + as the sparse similar product between", "tokens": [50412, 264, 24197, 300, 321, + 645, 1228, 645, 1936, 2906, 712, 382, 264, 637, 11668, 2531, 1674, 1296, 50808], + "temperature": 0.0, "avg_logprob": -0.23663894832134247, "compression_ratio": 1.576271186440678, + "no_speech_prob": 0.003317478811368346}, {"id": 262, "seek": 193080, "start": 1940.48, + "end": 1947.44, "text": " two huge vectors. So it''s some sort of it becomes similar + to either deep impact or split.", "tokens": [50848, 732, 2603, 18875, 13, 407, 309, + 311, 512, 1333, 295, 309, 3643, 2531, 281, 2139, 2452, 2712, 420, 7472, 13, 51196], + "temperature": 0.0, "avg_logprob": -0.23663894832134247, "compression_ratio": 1.576271186440678, + "no_speech_prob": 0.003317478811368346}, {"id": 263, "seek": 193080, "start": 1948.6399999999999, + "end": 1955.52, "text": " And in fact, so the similarity is the maximum product. 
+ It''s not the cosine similarity.", "tokens": [51256, 400, 294, 1186, 11, 370, 264, + 32194, 307, 264, 6674, 1674, 13, 467, 311, 406, 264, 23565, 32194, 13, 51600], "temperature": + 0.0, "avg_logprob": -0.23663894832134247, "compression_ratio": 1.576271186440678, + "no_speech_prob": 0.003317478811368346}, {"id": 264, "seek": 195552, "start": 1956.4, + "end": 1964.48, "text": " And the the search like the search procedure is called + maximum inner product search. So basically,", "tokens": [50408, 400, 264, 264, 3164, + 411, 264, 3164, 10747, 307, 1219, 6674, 7284, 1674, 3164, 13, 407, 1936, 11, 50812], + "temperature": 0.0, "avg_logprob": -0.2344076156616211, "compression_ratio": 1.7378048780487805, + "no_speech_prob": 0.005871497560292482}, {"id": 265, "seek": 195552, "start": 1964.48, + "end": 1972.16, "text": " you want to retrieve documents that have the maximum inner + product between query and the document.", "tokens": [50812, 291, 528, 281, 30254, + 8512, 300, 362, 264, 6674, 7284, 1674, 1296, 14581, 293, 264, 4166, 13, 51196], + "temperature": 0.0, "avg_logprob": -0.2344076156616211, "compression_ratio": 1.7378048780487805, + "no_speech_prob": 0.005871497560292482}, {"id": 266, "seek": 195552, "start": 1972.16, + "end": 1980.48, "text": " And the and this is not this is a symmetric similarity + measure in some sense symmetric,", "tokens": [51196, 400, 264, 293, 341, 307, 406, + 341, 307, 257, 32330, 32194, 3481, 294, 512, 2020, 32330, 11, 51612], "temperature": + 0.0, "avg_logprob": -0.2344076156616211, "compression_ratio": 1.7378048780487805, + "no_speech_prob": 0.005871497560292482}, {"id": 267, "seek": 198048, "start": 1980.48, + "end": 1990.32, "text": " but it is not it is it is not a metric and it''s not easily + reducible to the to the cosine similarity", "tokens": [50364, 457, 309, 307, 406, + 309, 307, 309, 307, 406, 257, 20678, 293, 309, 311, 406, 3612, 2783, 32128, 281, + 264, 281, 264, 23565, 32194, 50856], "temperature": 0.0, "avg_logprob": 
-0.2870780944824219, + "compression_ratio": 1.7625, "no_speech_prob": 0.00257792416960001}, {"id": 268, + "seek": 198048, "start": 1990.32, "end": 1996.0, "text": " and to the creature searching + using a science similarity is actually fully equivalent to searching", "tokens": + [50856, 293, 281, 264, 12797, 10808, 1228, 257, 3497, 32194, 307, 767, 4498, 10344, + 281, 10808, 51140], "temperature": 0.0, "avg_logprob": -0.2870780944824219, "compression_ratio": + 1.7625, "no_speech_prob": 0.00257792416960001}, {"id": 269, "seek": 198048, "start": + 1996.0, "end": 2002.32, "text": " using the Euclidean distance for the inner product + you can reduce this search to a", "tokens": [51140, 1228, 264, 462, 1311, 31264, + 282, 4560, 337, 264, 7284, 1674, 291, 393, 5407, 341, 3164, 281, 257, 51456], "temperature": + 0.0, "avg_logprob": -0.2870780944824219, "compression_ratio": 1.7625, "no_speech_prob": + 0.00257792416960001}, {"id": 270, "seek": 200232, "start": 2003.2, "end": 2009.9199999999998, + "text": " cosine similarity and Euclidean distance, but turns out that this reduction + affects efficiency.", "tokens": [50408, 23565, 32194, 293, 462, 1311, 31264, 282, + 4560, 11, 457, 4523, 484, 300, 341, 11004, 11807, 10493, 13, 50744], "temperature": + 0.0, "avg_logprob": -0.2968607935412177, "compression_ratio": 1.6946902654867257, + "no_speech_prob": 0.0021964393090456724}, {"id": 271, "seek": 200232, "start": 2010.96, + "end": 2017.9199999999998, "text": " And then somewhat like bigger topic for a discussion, + but what happened is that people say at at", "tokens": [50796, 400, 550, 8344, 411, + 3801, 4829, 337, 257, 5017, 11, 457, 437, 2011, 307, 300, 561, 584, 412, 412, 51144], + "temperature": 0.0, "avg_logprob": -0.2968607935412177, "compression_ratio": 1.6946902654867257, + "no_speech_prob": 0.0021964393090456724}, {"id": 272, "seek": 200232, "start": 2017.9199999999998, + "end": 2023.04, "text": " Lucene, who are maintaining Lucene, they were adding support + for the 
maximum inner product.", "tokens": [51144, 9593, 1450, 11, 567, 366, 14916, + 9593, 1450, 11, 436, 645, 5127, 1406, 337, 264, 6674, 7284, 1674, 13, 51400], "temperature": + 0.0, "avg_logprob": -0.2968607935412177, "compression_ratio": 1.6946902654867257, + "no_speech_prob": 0.0021964393090456724}, {"id": 273, "seek": 200232, "start": 2023.76, + "end": 2029.6, "text": " And Vespa did this and they did this through this trick + to reducing of of maximum inner product to", "tokens": [51436, 400, 691, 279, 4306, + 630, 341, 293, 436, 630, 341, 807, 341, 4282, 281, 12245, 295, 295, 6674, 7284, + 1674, 281, 51728], "temperature": 0.0, "avg_logprob": -0.2968607935412177, "compression_ratio": + 1.6946902654867257, "no_speech_prob": 0.0021964393090456724}, {"id": 274, "seek": + 202960, "start": 2029.6, "end": 2036.24, "text": " cosine similarity and of two. + And I argued that there is research showing that this is suboptimal", "tokens": + [50364, 23565, 32194, 293, 295, 732, 13, 400, 286, 20219, 300, 456, 307, 2132, 4099, + 300, 341, 307, 1422, 5747, 10650, 50696], "temperature": 0.0, "avg_logprob": -0.21691173315048218, + "compression_ratio": 1.6890756302521008, "no_speech_prob": 0.0031810421496629715}, + {"id": 275, "seek": 202960, "start": 2036.24, "end": 2043.52, "text": " and there + was a discussion in this as a result they basically didn''t do this. 
So so a long + story short,", "tokens": [50696, 293, 456, 390, 257, 5017, 294, 341, 382, 257, 1874, + 436, 1936, 994, 380, 360, 341, 13, 407, 370, 257, 938, 1657, 2099, 11, 51060], "temperature": + 0.0, "avg_logprob": -0.21691173315048218, "compression_ratio": 1.6890756302521008, + "no_speech_prob": 0.0031810421496629715}, {"id": 276, "seek": 202960, "start": 2044.32, + "end": 2051.7599999999998, "text": " I think a lot of things are so non-metrics + similarity search as in general turn out to be not so useful,", "tokens": [51100, + 286, 519, 257, 688, 295, 721, 366, 370, 2107, 12, 5537, 10716, 32194, 3164, 382, + 294, 2674, 1261, 484, 281, 312, 406, 370, 4420, 11, 51472], "temperature": 0.0, + "avg_logprob": -0.21691173315048218, "compression_ratio": 1.6890756302521008, "no_speech_prob": + 0.0031810421496629715}, {"id": 277, "seek": 202960, "start": 2051.7599999999998, + "end": 2058.4, "text": " but there are some instances like maximum inner product + search where we do have things that are", "tokens": [51472, 457, 456, 366, 512, + 14519, 411, 6674, 7284, 1674, 3164, 689, 321, 360, 362, 721, 300, 366, 51804], "temperature": + 0.0, "avg_logprob": -0.21691173315048218, "compression_ratio": 1.6890756302521008, + "no_speech_prob": 0.0031810421496629715}, {"id": 278, "seek": 205840, "start": 2058.48, + "end": 2069.6, "text": " non-metric entities widely used. 
Yeah, that''s amazing, + but I think I hope that equally as I''m learning,", "tokens": [50368, 2107, 12, + 5537, 1341, 16667, 13371, 1143, 13, 865, 11, 300, 311, 2243, 11, 457, 286, 519, + 286, 1454, 300, 12309, 382, 286, 478, 2539, 11, 50924], "temperature": 0.0, "avg_logprob": + -0.2104255579694917, "compression_ratio": 1.5487179487179488, "no_speech_prob": + 0.005779239349067211}, {"id": 279, "seek": 205840, "start": 2069.6, "end": 2077.44, + "text": " I hope our listeners are also learning on this because often times when + you plunge into a new field,", "tokens": [50924, 286, 1454, 527, 23274, 366, 611, + 2539, 322, 341, 570, 2049, 1413, 562, 291, 499, 27588, 666, 257, 777, 2519, 11, + 51316], "temperature": 0.0, "avg_logprob": -0.2104255579694917, "compression_ratio": + 1.5487179487179488, "no_speech_prob": 0.005779239349067211}, {"id": 280, "seek": + 205840, "start": 2077.44, "end": 2084.56, "text": " let''s say, then search, all + you see is what is being popularized and you know you may go down the", "tokens": + [51316, 718, 311, 584, 11, 550, 3164, 11, 439, 291, 536, 307, 437, 307, 885, 3743, + 1602, 293, 291, 458, 291, 815, 352, 760, 264, 51672], "temperature": 0.0, "avg_logprob": + -0.2104255579694917, "compression_ratio": 1.5487179487179488, "no_speech_prob": + 0.005779239349067211}, {"id": 281, "seek": 208456, "start": 2084.56, "end": 2092.32, + "text": " rabbit hole. So I''m really excited and thankful that you are able to + share and much wider perspective", "tokens": [50364, 19509, 5458, 13, 407, 286, + 478, 534, 2919, 293, 13611, 300, 291, 366, 1075, 281, 2073, 293, 709, 11842, 4585, + 50752], "temperature": 0.0, "avg_logprob": -0.2165426390511649, "compression_ratio": + 1.4591836734693877, "no_speech_prob": 0.006146696396172047}, {"id": 282, "seek": + 208456, "start": 2092.32, "end": 2101.7599999999998, "text": " over things. 
And + then also most interestingly, you work and you say you''re a co-author of", "tokens": + [50752, 670, 721, 13, 400, 550, 611, 881, 25873, 11, 291, 589, 293, 291, 584, 291, + 434, 257, 598, 12, 34224, 295, 51224], "temperature": 0.0, "avg_logprob": -0.2165426390511649, + "compression_ratio": 1.4591836734693877, "no_speech_prob": 0.006146696396172047}, + {"id": 283, "seek": 208456, "start": 2101.7599999999998, "end": 2110.48, "text": + " an MSLIP besides other authors. Your collective work is also now used at like + for open search,", "tokens": [51224, 364, 7395, 43, 9139, 11868, 661, 16552, 13, + 2260, 12590, 589, 307, 611, 586, 1143, 412, 411, 337, 1269, 3164, 11, 51660], "temperature": + 0.0, "avg_logprob": -0.2165426390511649, "compression_ratio": 1.4591836734693877, + "no_speech_prob": 0.006146696396172047}, {"id": 284, "seek": 211048, "start": 2111.2, + "end": 2120.08, "text": " engine, which I believe I also had a chance to test at + some point and like it''s a C++ library that", "tokens": [50400, 2848, 11, 597, + 286, 1697, 286, 611, 632, 257, 2931, 281, 1500, 412, 512, 935, 293, 411, 309, 311, + 257, 383, 25472, 6405, 300, 50844], "temperature": 0.0, "avg_logprob": -0.19844889353556805, + "compression_ratio": 1.4656862745098038, "no_speech_prob": 0.006503316108137369}, + {"id": 285, "seek": 211048, "start": 2120.08, "end": 2131.2, "text": " is then somehow + loaded of GVM and basically then searches is performed using H&SW. 
Can you tell + me", "tokens": [50844, 307, 550, 6063, 13210, 295, 460, 53, 44, 293, 1936, 550, + 26701, 307, 10332, 1228, 389, 5, 50, 54, 13, 1664, 291, 980, 385, 51400], "temperature": + 0.0, "avg_logprob": -0.19844889353556805, "compression_ratio": 1.4656862745098038, + "no_speech_prob": 0.006503316108137369}, {"id": 286, "seek": 211048, "start": 2131.2, + "end": 2138.96, "text": " a bit about that like that story of how did you end up + you know connecting these to an MSLIP and H&SW", "tokens": [51400, 257, 857, 466, + 300, 411, 300, 1657, 295, 577, 630, 291, 917, 493, 291, 458, 11015, 613, 281, 364, + 7395, 43, 9139, 293, 389, 5, 50, 54, 51788], "temperature": 0.0, "avg_logprob": + -0.19844889353556805, "compression_ratio": 1.4656862745098038, "no_speech_prob": + 0.006503316108137369}, {"id": 287, "seek": 213896, "start": 2139.84, "end": 2146.96, + "text": " and here I will probably link to the episode theory that it''s quite popular + today.", "tokens": [50408, 293, 510, 286, 486, 1391, 2113, 281, 264, 3500, 5261, + 300, 309, 311, 1596, 3743, 965, 13, 50764], "temperature": 0.0, "avg_logprob": -0.22032879292964935, + "compression_ratio": 1.3823529411764706, "no_speech_prob": 0.0036538366694003344}, + {"id": 288, "seek": 213896, "start": 2149.12, "end": 2156.96, "text": " Yeah, well + first of all I have to say that I mean it was popularity like close to 100%", "tokens": + [50872, 865, 11, 731, 700, 295, 439, 286, 362, 281, 584, 300, 286, 914, 309, 390, + 19301, 411, 1998, 281, 2319, 4, 51264], "temperature": 0.0, "avg_logprob": -0.22032879292964935, + "compression_ratio": 1.3823529411764706, "no_speech_prob": 0.0036538366694003344}, + {"id": 289, "seek": 213896, "start": 2156.96, "end": 2164.08, "text": " of an MSLIP + is certainly due to the development of H&SW which was", "tokens": [51264, 295, 364, + 7395, 43, 9139, 307, 3297, 3462, 281, 264, 3250, 295, 389, 5, 50, 54, 597, 390, + 51620], "temperature": 0.0, "avg_logprob": -0.22032879292964935, 
"compression_ratio": + 1.3823529411764706, "no_speech_prob": 0.0036538366694003344}, {"id": 290, "seek": + 216408, "start": 2164.08, "end": 2177.04, "text": " Eurisk creation not mine and + we affected it in only very minor ways because I think the", "tokens": [50364, 462, + 374, 7797, 8016, 406, 3892, 293, 321, 8028, 309, 294, 787, 588, 6696, 2098, 570, + 286, 519, 264, 51012], "temperature": 0.0, "avg_logprob": -0.5131012549767128, "compression_ratio": + 1.4505494505494505, "no_speech_prob": 0.008249538950622082}, {"id": 291, "seek": + 216408, "start": 2177.04, "end": 2187.6, "text": " I mean we provided the platform + and yeah so I think one trick that you reboard which I borrowed", "tokens": [51012, + 286, 914, 321, 5649, 264, 3663, 293, 1338, 370, 286, 519, 472, 4282, 300, 291, 319, + 3787, 597, 286, 26805, 51540], "temperature": 0.0, "avg_logprob": -0.5131012549767128, + "compression_ratio": 1.4505494505494505, "no_speech_prob": 0.008249538950622082}, + {"id": 292, "seek": 216408, "start": 2187.6, "end": 2193.92, "text": " from KGRAF + was the efficient algorithm from him or she checking but that was it.", "tokens": + [51540, 490, 591, 38, 3750, 37, 390, 264, 7148, 9284, 490, 796, 420, 750, 8568, + 457, 300, 390, 309, 13, 51856], "temperature": 0.0, "avg_logprob": -0.5131012549767128, + "compression_ratio": 1.4505494505494505, "no_speech_prob": 0.008249538950622082}, + {"id": 293, "seek": 219408, "start": 2194.64, "end": 2200.64, "text": " So the end + of the sleep but end of the sleep it was like creation of several people and it + was like", "tokens": [50392, 407, 264, 917, 295, 264, 2817, 457, 917, 295, 264, + 2817, 309, 390, 411, 8016, 295, 2940, 561, 293, 309, 390, 411, 50692], "temperature": + 0.0, "avg_logprob": -0.3100432796754699, "compression_ratio": 1.5930232558139534, + "no_speech_prob": 0.0034390168730169535}, {"id": 294, "seek": 219408, "start": 2200.64, + "end": 2206.88, "text": " has like a rather wild story so it was never planned in + the 
sort of random how we developed.", "tokens": [50692, 575, 411, 257, 2831, 4868, + 1657, 370, 309, 390, 1128, 8589, 294, 264, 1333, 295, 4974, 577, 321, 4743, 13, + 51004], "temperature": 0.0, "avg_logprob": -0.3100432796754699, "compression_ratio": + 1.5930232558139534, "no_speech_prob": 0.0034390168730169535}, {"id": 295, "seek": + 219408, "start": 2208.88, "end": 2217.2799999999997, "text": " So in 2012 I attended + the conference where I met Billik Nidhan who was working on", "tokens": [51104, + 407, 294, 9125, 286, 15990, 264, 7586, 689, 286, 1131, 5477, 1035, 426, 327, 3451, + 567, 390, 1364, 322, 51524], "temperature": 0.0, "avg_logprob": -0.3100432796754699, + "compression_ratio": 1.5930232558139534, "no_speech_prob": 0.0034390168730169535}, + {"id": 296, "seek": 221728, "start": 2217.28, "end": 2225.6800000000003, "text": + " and he was doing his PhD on similarity research and we found that like a lot of", + "tokens": [50364, 293, 415, 390, 884, 702, 14476, 322, 32194, 2132, 293, 321, 1352, + 300, 411, 257, 688, 295, 50784], "temperature": 0.0, "avg_logprob": -0.36457420349121095, + "compression_ratio": 1.6203703703703705, "no_speech_prob": 0.006099394988268614}, + {"id": 297, "seek": 221728, "start": 2226.6400000000003, "end": 2232.0800000000004, + "text": " like we shared some interesting particularly in the written algorithms + and we decided to do some", "tokens": [50832, 411, 321, 5507, 512, 1880, 4098, 294, + 264, 3720, 14642, 293, 321, 3047, 281, 360, 512, 51104], "temperature": 0.0, "avg_logprob": + -0.36457420349121095, "compression_ratio": 1.6203703703703705, "no_speech_prob": + 0.006099394988268614}, {"id": 298, "seek": 221728, "start": 2232.0800000000004, + "end": 2239.0400000000004, "text": " joint project together and then my initial + interest was how do I support not it was somewhat", "tokens": [51104, 7225, 1716, + 1214, 293, 550, 452, 5883, 1179, 390, 577, 360, 286, 1406, 406, 309, 390, 8344, + 51452], "temperature": 0.0, "avg_logprob": 
-0.36457420349121095, "compression_ratio": + 1.6203703703703705, "no_speech_prob": 0.006099394988268614}, {"id": 299, "seek": + 221728, "start": 2239.0400000000004, "end": 2244.7200000000003, "text": " academic + topic no metric search is as I explained before it''s still like largely", "tokens": + [51452, 7778, 4829, 572, 20678, 3164, 307, 382, 286, 8825, 949, 309, 311, 920, 411, + 11611, 51736], "temperature": 0.0, "avg_logprob": -0.36457420349121095, "compression_ratio": + 1.6203703703703705, "no_speech_prob": 0.006099394988268614}, {"id": 300, "seek": + 224472, "start": 2245.4399999999996, "end": 2251.04, "text": " more like academic + interest because a lot of things are really metric or at most", "tokens": [50400, + 544, 411, 7778, 1179, 570, 257, 688, 295, 721, 366, 534, 20678, 420, 412, 881, 50680], + "temperature": 0.0, "avg_logprob": -0.38040171350751606, "compression_ratio": 1.606060606060606, + "no_speech_prob": 0.004586182534694672}, {"id": 301, "seek": 224472, "start": 2251.9199999999996, + "end": 2256.24, "text": " maximum winter product search which is sort of almost + a scientific narrative almost metric.", "tokens": [50724, 6674, 6355, 1674, 3164, + 597, 307, 1333, 295, 1920, 257, 8134, 9977, 1920, 20678, 13, 50940], "temperature": + 0.0, "avg_logprob": -0.38040171350751606, "compression_ratio": 1.606060606060606, + "no_speech_prob": 0.004586182534694672}, {"id": 302, "seek": 224472, "start": 2256.9599999999996, + "end": 2266.3999999999996, "text": " And yeah so that was basically purely for academic + interest and I connected it to the to the", "tokens": [50976, 400, 1338, 370, 300, + 390, 1936, 17491, 337, 7778, 1179, 293, 286, 4582, 309, 281, 264, 281, 264, 51448], + "temperature": 0.0, "avg_logprob": -0.38040171350751606, "compression_ratio": 1.606060606060606, + "no_speech_prob": 0.004586182534694672}, {"id": 303, "seek": 226640, "start": 2266.4, + "end": 2273.36, "text": " machine learning because I saw an opportunity to use machine + 
learning to support", "tokens": [50364, 3479, 2539, 570, 286, 1866, 364, 2650, 281, + 764, 3479, 2539, 281, 1406, 50712], "temperature": 0.0, "avg_logprob": -0.3465174992879232, + "compression_ratio": 1.5941176470588236, "no_speech_prob": 0.00044519922812469304}, + {"id": 304, "seek": 226640, "start": 2274.64, "end": 2280.56, "text": " generical + algorithms that would do a k-nearest new research with non-metrics simulator such + as", "tokens": [50776, 1337, 804, 14642, 300, 576, 360, 257, 350, 12, 716, 17363, + 777, 2132, 365, 2107, 12, 5537, 10716, 32974, 1270, 382, 51072], "temperature": + 0.0, "avg_logprob": -0.3465174992879232, "compression_ratio": 1.5941176470588236, + "no_speech_prob": 0.00044519922812469304}, {"id": 305, "seek": 226640, "start": + 2280.56, "end": 2288.96, "text": " scale divergence. Yeah so we did it as a machine + learning course project and we published paper", "tokens": [51072, 4373, 47387, + 13, 865, 370, 321, 630, 309, 382, 257, 3479, 2539, 1164, 1716, 293, 321, 6572, 3035, + 51492], "temperature": 0.0, "avg_logprob": -0.3465174992879232, "compression_ratio": + 1.5941176470588236, "no_speech_prob": 0.00044519922812469304}, {"id": 306, "seek": + 228896, "start": 2289.28, "end": 2299.52, "text": " at Noribs and it could have + stopped at this point but then I also like that conference I got like I", "tokens": + [50380, 412, 6966, 897, 82, 293, 309, 727, 362, 5936, 412, 341, 935, 457, 550, 286, + 611, 411, 300, 7586, 286, 658, 411, 286, 50892], "temperature": 0.0, "avg_logprob": + -0.4482315761942259, "compression_ratio": 1.518918918918919, "no_speech_prob": 0.006563352886587381}, + {"id": 307, "seek": 228896, "start": 2299.52, "end": 2308.56, "text": " met other + URIs that conference or just treated that I met with some of with Yuri Adir", "tokens": + [50892, 1131, 661, 624, 5577, 82, 300, 7586, 420, 445, 8668, 300, 286, 1131, 365, + 512, 295, 365, 33901, 1999, 347, 51344], "temperature": 0.0, "avg_logprob": -0.4482315761942259, + 
"compression_ratio": 1.518918918918919, "no_speech_prob": 0.006563352886587381}, + {"id": 308, "seek": 228896, "start": 2309.28, "end": 2316.4, "text": " Quothar, + Alexander and they worked both work at Merrill Habs company where they developed + small", "tokens": [51380, 2326, 900, 289, 11, 14845, 293, 436, 2732, 1293, 589, + 412, 6124, 37480, 14225, 82, 2237, 689, 436, 4743, 1359, 51736], "temperature": + 0.0, "avg_logprob": -0.4482315761942259, "compression_ratio": 1.518918918918919, + "no_speech_prob": 0.006563352886587381}, {"id": 309, "seek": 231640, "start": 2316.4, + "end": 2322.2400000000002, "text": " world graph approach and that was like a general + version and so Alexander was really interested to", "tokens": [50364, 1002, 4295, + 3109, 293, 300, 390, 411, 257, 2674, 3037, 293, 370, 14845, 390, 534, 3102, 281, + 50656], "temperature": 0.0, "avg_logprob": -0.2563207944234212, "compression_ratio": + 1.7254901960784315, "no_speech_prob": 0.0029968791641294956}, {"id": 310, "seek": + 231640, "start": 2322.2400000000002, "end": 2327.84, "text": " prove that whatever + algorithms that we have in Emma Sleep and we were tackling generic", "tokens": [50656, + 7081, 300, 2035, 14642, 300, 321, 362, 294, 17124, 19383, 293, 321, 645, 34415, + 19577, 50936], "temperature": 0.0, "avg_logprob": -0.2563207944234212, "compression_ratio": + 1.7254901960784315, "no_speech_prob": 0.0029968791641294956}, {"id": 311, "seek": + 231640, "start": 2329.04, "end": 2334.48, "text": " search in generic spaces for + generic similarities and he was eager to prove that", "tokens": [50996, 3164, 294, + 19577, 7673, 337, 19577, 24197, 293, 415, 390, 18259, 281, 7081, 300, 51268], "temperature": + 0.0, "avg_logprob": -0.2563207944234212, "compression_ratio": 1.7254901960784315, + "no_speech_prob": 0.0029968791641294956}, {"id": 312, "seek": 231640, "start": 2335.84, + "end": 2343.52, "text": " the graph based algorithm I actually truly generic and + this is why he and his student", 
"tokens": [51336, 264, 4295, 2361, 9284, 286, 767, + 4908, 19577, 293, 341, 307, 983, 415, 293, 702, 3107, 51720], "temperature": 0.0, + "avg_logprob": -0.2563207944234212, "compression_ratio": 1.7254901960784315, "no_speech_prob": + 0.0029968791641294956}, {"id": 313, "seek": 234352, "start": 2344.32, "end": 2351.28, + "text": " that you know created the first version of a small world graph in Emma + Sleep so basically contributed", "tokens": [50404, 300, 291, 458, 2942, 264, 700, + 3037, 295, 257, 1359, 1002, 4295, 294, 17124, 19383, 370, 1936, 18434, 50752], "temperature": + 0.0, "avg_logprob": -0.23774796062045628, "compression_ratio": 1.6236559139784945, + "no_speech_prob": 0.0020354397129267454}, {"id": 314, "seek": 234352, "start": 2351.28, + "end": 2359.2, "text": " this version and that was a really super slow I spread + it up by both 10x and that was the version", "tokens": [50752, 341, 3037, 293, 300, + 390, 257, 534, 1687, 2964, 286, 3974, 309, 493, 538, 1293, 1266, 87, 293, 300, 390, + 264, 3037, 51148], "temperature": 0.0, "avg_logprob": -0.23774796062045628, "compression_ratio": + 1.6236559139784945, "no_speech_prob": 0.0020354397129267454}, {"id": 315, "seek": + 234352, "start": 2359.2, "end": 2368.4, "text": " that we used to win the first + in Benchmarks so it was pretty good but it has issues and one issue that", "tokens": + [51148, 300, 321, 1143, 281, 1942, 264, 700, 294, 3964, 339, 37307, 370, 309, 390, + 1238, 665, 457, 309, 575, 2663, 293, 472, 2734, 300, 51608], "temperature": 0.0, + "avg_logprob": -0.23774796062045628, "compression_ratio": 1.6236559139784945, "no_speech_prob": + 0.0020354397129267454}, {"id": 316, "seek": 236840, "start": 2368.56, "end": 2378.2400000000002, + "text": " was fixed thanks to Yuri sharing with me some early version with H&SW + and I looked at the code it", "tokens": [50372, 390, 6806, 3231, 281, 33901, 5414, + 365, 385, 512, 2440, 3037, 365, 389, 5, 50, 54, 293, 286, 2956, 412, 264, 3089, + 309, 50856], 
"temperature": 0.0, "avg_logprob": -0.19855050898309964, "compression_ratio": + 1.5852272727272727, "no_speech_prob": 0.003960041329264641}, {"id": 317, "seek": + 236840, "start": 2378.2400000000002, "end": 2384.96, "text": " was not as like fast + version that he created later but already there was fixing something", "tokens": + [50856, 390, 406, 382, 411, 2370, 3037, 300, 415, 2942, 1780, 457, 1217, 456, 390, + 19442, 746, 51192], "temperature": 0.0, "avg_logprob": -0.19855050898309964, "compression_ratio": + 1.5852272727272727, "no_speech_prob": 0.003960041329264641}, {"id": 318, "seek": + 236840, "start": 2385.6800000000003, "end": 2390.48, "text": " and maybe he didn''t + realize he showed me that piece of code and I realized oh like there is", "tokens": + [51228, 293, 1310, 415, 994, 380, 4325, 415, 4712, 385, 300, 2522, 295, 3089, 293, + 286, 5334, 1954, 411, 456, 307, 51468], "temperature": 0.0, "avg_logprob": -0.19855050898309964, + "compression_ratio": 1.5852272727272727, "no_speech_prob": 0.003960041329264641}, + {"id": 319, "seek": 239048, "start": 2390.48, "end": 2399.12, "text": " actually + still an issue in SW graph so SW both SW graph was improved and Yuri like", "tokens": + [50364, 767, 920, 364, 2734, 294, 318, 54, 4295, 370, 318, 54, 1293, 318, 54, 4295, + 390, 9689, 293, 33901, 411, 50796], "temperature": 0.0, "avg_logprob": -0.3491799174875453, + "compression_ratio": 1.6, "no_speech_prob": 0.004256589338183403}, {"id": 320, "seek": + 239048, "start": 2399.12, "end": 2407.76, "text": " contribution is W2 and Emma + Sleep so it greatly it was like a huge contribution like big step forward", "tokens": + [50796, 13150, 307, 343, 17, 293, 17124, 19383, 370, 309, 14147, 309, 390, 411, + 257, 2603, 13150, 411, 955, 1823, 2128, 51228], "temperature": 0.0, "avg_logprob": + -0.3491799174875453, "compression_ratio": 1.6, "no_speech_prob": 0.004256589338183403}, + {"id": 321, "seek": 239048, "start": 2408.96, "end": 2419.52, "text": " it won the + second 
NNBG Mark competition they proved SW graph was I think the second the", "tokens": + [51288, 309, 1582, 264, 1150, 426, 45, 33, 38, 3934, 6211, 436, 14617, 318, 54, + 4295, 390, 286, 519, 264, 1150, 264, 51816], "temperature": 0.0, "avg_logprob": + -0.3491799174875453, "compression_ratio": 1.6, "no_speech_prob": 0.004256589338183403}, + {"id": 322, "seek": 241952, "start": 2419.52, "end": 2424.88, "text": " second algorithm + I have a screenshot of this somewhere which I sometimes included to my job talks", + "tokens": [50364, 1150, 9284, 286, 362, 257, 27712, 295, 341, 4079, 597, 286, 2171, + 5556, 281, 452, 1691, 6686, 50632], "temperature": 0.0, "avg_logprob": -0.1631788526262556, + "compression_ratio": 1.5132275132275133, "no_speech_prob": 0.00043284616549499333}, + {"id": 323, "seek": 241952, "start": 2427.12, "end": 2435.04, "text": " and the + H&SW also influenced face because they realized oh like they actually knew about", + "tokens": [50744, 293, 264, 389, 5, 50, 54, 611, 15269, 1851, 570, 436, 5334, 1954, + 411, 436, 767, 2586, 466, 51140], "temperature": 0.0, "avg_logprob": -0.1631788526262556, + "compression_ratio": 1.5132275132275133, "no_speech_prob": 0.00043284616549499333}, + {"id": 324, "seek": 241952, "start": 2435.04, "end": 2443.2, "text": " K graph and + knew about the graph based retrieval but there was one important reason why they + didn''t", "tokens": [51140, 591, 4295, 293, 2586, 466, 264, 4295, 2361, 19817, 3337, + 457, 456, 390, 472, 1021, 1778, 983, 436, 994, 380, 51548], "temperature": 0.0, + "avg_logprob": -0.1631788526262556, "compression_ratio": 1.5132275132275133, "no_speech_prob": + 0.00043284616549499333}, {"id": 325, "seek": 244320, "start": 2443.68, "end": 2450.24, + "text": " you can ask me why but anyway so it influenced face and a lot of other + people and of course", "tokens": [50388, 291, 393, 1029, 385, 983, 457, 4033, 370, + 309, 15269, 1851, 293, 257, 688, 295, 661, 561, 293, 295, 1164, 50716], "temperature": + 0.0, 
"avg_logprob": -0.24033731489039178, "compression_ratio": 1.622093023255814, + "no_speech_prob": 0.006005355156958103}, {"id": 326, "seek": 244320, "start": 2450.7999999999997, + "end": 2458.8799999999997, "text": " yeah that that Yuri created that was Yuri''s + work like a huge impact in the rest of his history", "tokens": [50744, 1338, 300, + 300, 33901, 2942, 300, 390, 33901, 311, 589, 411, 257, 2603, 2712, 294, 264, 1472, + 295, 702, 2503, 51148], "temperature": 0.0, "avg_logprob": -0.24033731489039178, + "compression_ratio": 1.622093023255814, "no_speech_prob": 0.006005355156958103}, + {"id": 327, "seek": 244320, "start": 2458.8799999999997, "end": 2465.12, "text": + " but I think Yuri shouldn''t complain no he has a great career first he has great + career first", "tokens": [51148, 457, 286, 519, 33901, 4659, 380, 11024, 572, 415, + 575, 257, 869, 3988, 700, 415, 575, 869, 3988, 700, 51460], "temperature": 0.0, + "avg_logprob": -0.24033731489039178, "compression_ratio": 1.622093023255814, "no_speech_prob": + 0.006005355156958103}, {"id": 328, "seek": 246512, "start": 2465.68, "end": 2473.8399999999997, + "text": " Twitter and now it''s at OpenAI so yeah it''s a magic story just a close + of the wolf white", "tokens": [50392, 5794, 293, 586, 309, 311, 412, 7238, 48698, + 370, 1338, 309, 311, 257, 5585, 1657, 445, 257, 1998, 295, 264, 19216, 2418, 50800], + "temperature": 0.0, "avg_logprob": -0.33124428901119507, "compression_ratio": 1.5392670157068062, + "no_speech_prob": 0.005152097903192043}, {"id": 329, "seek": 246512, "start": 2473.8399999999997, + "end": 2481.7599999999998, "text": " it face did not implement the approach you + had this is this is a really interesting thing because", "tokens": [50800, 309, + 1851, 630, 406, 4445, 264, 3109, 291, 632, 341, 307, 341, 307, 257, 534, 1880, 551, + 570, 51196], "temperature": 0.0, "avg_logprob": -0.33124428901119507, "compression_ratio": + 1.5392670157068062, "no_speech_prob": 0.005152097903192043}, {"id": 330, 
"seek": + 246512, "start": 2482.24, "end": 2490.48, "text": " that''s one of my favorite pieces + in this story well turns out that the the graph based retrieval algorithms", "tokens": + [51220, 300, 311, 472, 295, 452, 2954, 3755, 294, 341, 1657, 731, 4523, 484, 300, + 264, 264, 4295, 2361, 19817, 3337, 14642, 51632], "temperature": 0.0, "avg_logprob": + -0.33124428901119507, "compression_ratio": 1.5392670157068062, "no_speech_prob": + 0.005152097903192043}, {"id": 331, "seek": 249048, "start": 2490.48, "end": 2497.44, + "text": " they had long history so a lot of this was rediscovered on the the pruning + heuristics and like", "tokens": [50364, 436, 632, 938, 2503, 370, 257, 688, 295, + 341, 390, 2182, 40080, 292, 322, 264, 264, 582, 37726, 415, 374, 6006, 293, 411, + 50712], "temperature": 0.0, "avg_logprob": -0.17607196432645203, "compression_ratio": + 1.5126582278481013, "no_speech_prob": 0.0015562961343675852}, {"id": 332, "seek": + 249048, "start": 2497.44, "end": 2506.32, "text": " the basic algorithm they go + back to papers in 80s and 90s but people did not use it", "tokens": [50712, 264, + 3875, 9284, 436, 352, 646, 281, 10577, 294, 4688, 82, 293, 4289, 82, 457, 561, 630, + 406, 764, 309, 51156], "temperature": 0.0, "avg_logprob": -0.17607196432645203, + "compression_ratio": 1.5126582278481013, "no_speech_prob": 0.0015562961343675852}, + {"id": 333, "seek": 249048, "start": 2507.36, "end": 2514.08, "text": " and one + hurdle was the inability to efficiently create those", "tokens": [51208, 293, 472, + 47423, 390, 264, 33162, 281, 19621, 1884, 729, 51544], "temperature": 0.0, "avg_logprob": + -0.17607196432645203, "compression_ratio": 1.5126582278481013, "no_speech_prob": + 0.0015562961343675852}, {"id": 334, "seek": 251408, "start": 2514.4, "end": 2521.52, + "text": " K K nearest neighbor graphs and K nearest neighbor graph it''s a simple + concept you have data point", "tokens": [50380, 591, 591, 23831, 5987, 24877, 293, + 591, 23831, 5987, 4295, 309, 
311, 257, 2199, 3410, 291, 362, 1412, 935, 50736], + "temperature": 0.0, "avg_logprob": -0.25811036053825825, "compression_ratio": 1.9203980099502487, + "no_speech_prob": 0.0023210092913359404}, {"id": 335, "seek": 251408, "start": 2521.52, + "end": 2526.96, "text": " and you have you need to find some data points that are + nearest neighbor of these data points and", "tokens": [50736, 293, 291, 362, 291, + 643, 281, 915, 512, 1412, 2793, 300, 366, 23831, 5987, 295, 613, 1412, 2793, 293, + 51008], "temperature": 0.0, "avg_logprob": -0.25811036053825825, "compression_ratio": + 1.9203980099502487, "no_speech_prob": 0.0023210092913359404}, {"id": 336, "seek": + 251408, "start": 2526.96, "end": 2534.72, "text": " then you connect to them module + or some post modification of this graph but how like you know if", "tokens": [51008, + 550, 291, 1745, 281, 552, 10088, 420, 512, 2183, 26747, 295, 341, 4295, 457, 577, + 411, 291, 458, 498, 51396], "temperature": 0.0, "avg_logprob": -0.25811036053825825, + "compression_ratio": 1.9203980099502487, "no_speech_prob": 0.0023210092913359404}, + {"id": 337, "seek": 251408, "start": 2534.72, "end": 2542.16, "text": " you have + end points it is in the brute force approach and squared computation how can you + do", "tokens": [51396, 291, 362, 917, 2793, 309, 307, 294, 264, 47909, 3464, 3109, + 293, 8889, 24903, 577, 393, 291, 360, 51768], "temperature": 0.0, "avg_logprob": + -0.25811036053825825, "compression_ratio": 1.9203980099502487, "no_speech_prob": + 0.0023210092913359404}, {"id": 338, "seek": 254216, "start": 2542.16, "end": 2548.72, + "text": " this efficiently how can you approximate this so the way how it was done + before people were", "tokens": [50364, 341, 19621, 577, 393, 291, 30874, 341, 370, + 264, 636, 577, 309, 390, 1096, 949, 561, 645, 50692], "temperature": 0.0, "avg_logprob": + -0.22595163293786952, "compression_ratio": 1.7951219512195122, "no_speech_prob": + 0.0006250280421227217}, {"id": 339, "seek": 254216, 
"start": 2548.72, "end": 2556.08, + "text": " coming up with fancy algorithms how to how to approximate a disappointment + and those fancy", "tokens": [50692, 1348, 493, 365, 10247, 14642, 577, 281, 577, + 281, 30874, 257, 28175, 293, 729, 10247, 51060], "temperature": 0.0, "avg_logprob": + -0.22595163293786952, "compression_ratio": 1.7951219512195122, "no_speech_prob": + 0.0006250280421227217}, {"id": 340, "seek": 254216, "start": 2556.08, "end": 2561.2799999999997, + "text": " algorithms would not particularly scalable and K graph I think is not + particularly scalable we", "tokens": [51060, 14642, 576, 406, 4098, 38481, 293, + 591, 4295, 286, 519, 307, 406, 4098, 38481, 321, 51320], "temperature": 0.0, "avg_logprob": + -0.22595163293786952, "compression_ratio": 1.7951219512195122, "no_speech_prob": + 0.0006250280421227217}, {"id": 341, "seek": 254216, "start": 2562.0, "end": 2569.52, + "text": " played with it we actually incorporated K graph in primitation into enemy + sleep and it was", "tokens": [51356, 3737, 365, 309, 321, 767, 21654, 591, 4295, + 294, 2886, 4614, 666, 5945, 2817, 293, 309, 390, 51732], "temperature": 0.0, "avg_logprob": + -0.22595163293786952, "compression_ratio": 1.7951219512195122, "no_speech_prob": + 0.0006250280421227217}, {"id": 342, "seek": 256952, "start": 2569.52, "end": 2574.96, + "text": " indeed hard to create large graphs because it''s a nitty-fragurithium + and yeah it''s not it''s not", "tokens": [50364, 6451, 1152, 281, 1884, 2416, 24877, + 570, 309, 311, 257, 297, 10016, 12, 69, 3731, 374, 355, 2197, 293, 1338, 309, 311, + 406, 309, 311, 406, 50636], "temperature": 0.0, "avg_logprob": -0.47156973142881653, + "compression_ratio": 1.5494505494505495, "no_speech_prob": 0.0009183312067762017}, + {"id": 343, "seek": 256952, "start": 2574.96, "end": 2584.88, "text": " very scalable + but what what Yuri and Kothar did for as small world graph and it was before", "tokens": + [50636, 588, 38481, 457, 437, 437, 33901, 293, 591, 900, 
289, 630, 337, 382, 1359, + 1002, 4295, 293, 309, 390, 949, 51132], "temperature": 0.0, "avg_logprob": -0.47156973142881653, + "compression_ratio": 1.5494505494505495, "no_speech_prob": 0.0009183312067762017}, + {"id": 344, "seek": 256952, "start": 2584.88, "end": 2591.92, "text": " she was + done we were while they were at Merrillab''s company they realized that they can + combine", "tokens": [51132, 750, 390, 1096, 321, 645, 1339, 436, 645, 412, 6124, + 37480, 455, 311, 2237, 436, 5334, 300, 436, 393, 10432, 51484], "temperature": 0.0, + "avg_logprob": -0.47156973142881653, "compression_ratio": 1.5494505494505495, "no_speech_prob": + 0.0009183312067762017}, {"id": 345, "seek": 259192, "start": 2592.32, "end": 2599.12, + "text": " the interval and creation of the graph and they can do it efficiently + like in what like using", "tokens": [50384, 264, 15035, 293, 8016, 295, 264, 4295, + 293, 436, 393, 360, 309, 19621, 411, 294, 437, 411, 1228, 50724], "temperature": + 0.0, "avg_logprob": -0.27516734809206245, "compression_ratio": 1.5898876404494382, + "no_speech_prob": 0.0024735662154853344}, {"id": 346, "seek": 259192, "start": 2599.12, + "end": 2607.6800000000003, "text": " modern terminology embarrassingly parallel + fashion and that was I think one key missing block that", "tokens": [50724, 4363, + 27575, 9187, 12163, 8952, 6700, 293, 300, 390, 286, 519, 472, 2141, 5361, 3461, + 300, 51152], "temperature": 0.0, "avg_logprob": -0.27516734809206245, "compression_ratio": + 1.5898876404494382, "no_speech_prob": 0.0024735662154853344}, {"id": 347, "seek": + 259192, "start": 2607.6800000000003, "end": 2617.84, "text": " prevented graph based + algorithms to become practical yeah that makes sense yeah it doesn''t", "tokens": + [51152, 27314, 4295, 2361, 14642, 281, 1813, 8496, 1338, 300, 1669, 2020, 1338, + 309, 1177, 380, 51660], "temperature": 0.0, "avg_logprob": -0.27516734809206245, + "compression_ratio": 1.5898876404494382, "no_speech_prob": 
0.0024735662154853344}, + {"id": 348, "seek": 261784, "start": 2618.8, "end": 2624.48, "text": " like what + what excites me in this story that you shared is that how serendipitous the", "tokens": + [50412, 411, 437, 437, 1624, 3324, 385, 294, 341, 1657, 300, 291, 5507, 307, 300, + 577, 816, 521, 647, 39831, 264, 50696], "temperature": 0.0, "avg_logprob": -0.14853202595430262, + "compression_ratio": 1.7534883720930232, "no_speech_prob": 0.0038029756397008896}, + {"id": 349, "seek": 261784, "start": 2625.04, "end": 2633.2000000000003, "text": + " this like the discovery process is right and like something that feels like random + leads to I", "tokens": [50724, 341, 411, 264, 12114, 1399, 307, 558, 293, 411, 746, + 300, 3417, 411, 4974, 6689, 281, 286, 51132], "temperature": 0.0, "avg_logprob": + -0.14853202595430262, "compression_ratio": 1.7534883720930232, "no_speech_prob": + 0.0038029756397008896}, {"id": 350, "seek": 261784, "start": 2633.2000000000003, + "end": 2639.1200000000003, "text": " don''t know creation of the industry right + you could you could largely say that the new industry of", "tokens": [51132, 500, + 380, 458, 8016, 295, 264, 3518, 558, 291, 727, 291, 727, 11611, 584, 300, 264, 777, + 3518, 295, 51428], "temperature": 0.0, "avg_logprob": -0.14853202595430262, "compression_ratio": + 1.7534883720930232, "no_speech_prob": 0.0038029756397008896}, {"id": 351, "seek": + 261784, "start": 2639.1200000000003, "end": 2646.4, "text": " you know vector databases + and vector search and now rag on top of that is created because you guys", "tokens": + [51428, 291, 458, 8062, 22380, 293, 8062, 3164, 293, 586, 17539, 322, 1192, 295, + 300, 307, 2942, 570, 291, 1074, 51792], "temperature": 0.0, "avg_logprob": -0.14853202595430262, + "compression_ratio": 1.7534883720930232, "no_speech_prob": 0.0038029756397008896}, + {"id": 352, "seek": 264640, "start": 2647.36, "end": 2654.1600000000003, "text": + " worked on practical implementations of something that was also 
that that stood + on no", "tokens": [50412, 2732, 322, 8496, 4445, 763, 295, 746, 300, 390, 611, 300, + 300, 9371, 322, 572, 50752], "temperature": 0.0, "avg_logprob": -0.14655622243881225, + "compression_ratio": 1.6787330316742082, "no_speech_prob": 0.00035575864603742957}, + {"id": 353, "seek": 264640, "start": 2654.1600000000003, "end": 2659.36, "text": + " shoulders of you know some of the inventions and research done before that and + so it''s kind", "tokens": [50752, 10245, 295, 291, 458, 512, 295, 264, 43748, 293, + 2132, 1096, 949, 300, 293, 370, 309, 311, 733, 51012], "temperature": 0.0, "avg_logprob": + -0.14655622243881225, "compression_ratio": 1.6787330316742082, "no_speech_prob": + 0.00035575864603742957}, {"id": 354, "seek": 264640, "start": 2659.36, "end": 2665.28, + "text": " of like natural progression but I mean it''s amazing how you know it''s + just on the verge of you", "tokens": [51012, 295, 411, 3303, 18733, 457, 286, 914, + 309, 311, 2243, 577, 291, 458, 309, 311, 445, 322, 264, 37164, 295, 291, 51308], + "temperature": 0.0, "avg_logprob": -0.14655622243881225, "compression_ratio": 1.6787330316742082, + "no_speech_prob": 0.00035575864603742957}, {"id": 355, "seek": 264640, "start": + 2665.28, "end": 2672.0, "text": " not meeting someone at a conference could basically + need to possibly not creating an industry right", "tokens": [51308, 406, 3440, 1580, + 412, 257, 7586, 727, 1936, 643, 281, 6264, 406, 4084, 364, 3518, 558, 51644], "temperature": + 0.0, "avg_logprob": -0.14655622243881225, "compression_ratio": 1.6787330316742082, + "no_speech_prob": 0.00035575864603742957}, {"id": 356, "seek": 267200, "start": + 2672.96, "end": 2682.48, "text": " quite possibly I think well thank you for the + kind words and of course it''s not because of us the", "tokens": [50412, 1596, 6264, + 286, 519, 731, 1309, 291, 337, 264, 733, 2283, 293, 295, 1164, 309, 311, 406, 570, + 295, 505, 264, 50888], "temperature": 0.0, "avg_logprob": -0.34741485820097084, 
+ "compression_ratio": 1.6264367816091954, "no_speech_prob": 0.0017407051054760814}, + {"id": 357, "seek": 267200, "start": 2684.0, "end": 2692.88, "text": " if not for + us I think other people who have done this I yeah so I with them but anyways so", + "tokens": [50964, 498, 406, 337, 505, 286, 519, 661, 561, 567, 362, 1096, 341, 286, + 1338, 370, 286, 365, 552, 457, 13448, 370, 51408], "temperature": 0.0, "avg_logprob": + -0.34741485820097084, "compression_ratio": 1.6264367816091954, "no_speech_prob": + 0.0017407051054760814}, {"id": 358, "seek": 267200, "start": 2694.0, "end": 2700.8, + "text": " I think we did useful work and clearly people are using a tremendous double + you lot and people", "tokens": [51464, 286, 519, 321, 630, 4420, 589, 293, 4448, + 561, 366, 1228, 257, 10048, 3834, 291, 688, 293, 561, 51804], "temperature": 0.0, + "avg_logprob": -0.34741485820097084, "compression_ratio": 1.6264367816091954, "no_speech_prob": + 0.0017407051054760814}, {"id": 359, "seek": 270080, "start": 2700.88, "end": 2707.92, + "text": " using it mostly even even though it hasn''t like a lot of issues but it + was still ended up being", "tokens": [50368, 1228, 309, 5240, 754, 754, 1673, 309, + 6132, 380, 411, 257, 688, 295, 2663, 457, 309, 390, 920, 4590, 493, 885, 50720], + "temperature": 0.0, "avg_logprob": -0.25191213123833955, "compression_ratio": 1.6149425287356323, + "no_speech_prob": 0.0007475847960449755}, {"id": 360, "seek": 270080, "start": 2709.2000000000003, + "end": 2715.6000000000004, "text": " used rather widely and the one reason why it + was used so widely because people who needed", "tokens": [50784, 1143, 2831, 13371, + 293, 264, 472, 1778, 983, 309, 390, 1143, 370, 13371, 570, 561, 567, 2978, 51104], + "temperature": 0.0, "avg_logprob": -0.25191213123833955, "compression_ratio": 1.6149425287356323, + "no_speech_prob": 0.0007475847960449755}, {"id": 361, "seek": 270080, "start": 2716.4, + "end": 2724.8, "text": " a library the Python basically a 
library that would do + Kenyar''s new research and it would do it", "tokens": [51144, 257, 6405, 264, 15329, + 1936, 257, 6405, 300, 576, 360, 8273, 88, 289, 311, 777, 2132, 293, 309, 576, 360, + 309, 51564], "temperature": 0.0, "avg_logprob": -0.25191213123833955, "compression_ratio": + 1.6149425287356323, "no_speech_prob": 0.0007475847960449755}, {"id": 362, "seek": + 272480, "start": 2724.8, "end": 2730.96, "text": " from Python and people often + take these little things for granted but say initially", "tokens": [50364, 490, + 15329, 293, 561, 2049, 747, 613, 707, 721, 337, 12344, 457, 584, 9105, 50672], "temperature": + 0.0, "avg_logprob": -0.20270160918540142, "compression_ratio": 1.8434782608695652, + "no_speech_prob": 0.002656804397702217}, {"id": 363, "seek": 272480, "start": 2730.96, + "end": 2736.6400000000003, "text": " I honestly didn''t have Python bindings and + to participate and then benchmarks and have something", "tokens": [50672, 286, 6095, + 994, 380, 362, 15329, 14786, 1109, 293, 281, 8197, 293, 550, 43751, 293, 362, 746, + 50956], "temperature": 0.0, "avg_logprob": -0.20270160918540142, "compression_ratio": + 1.8434782608695652, "no_speech_prob": 0.002656804397702217}, {"id": 364, "seek": + 272480, "start": 2736.6400000000003, "end": 2741.2000000000003, "text": " useful + you would need to have Python bindings this Python bindings were written by Billek", + "tokens": [50956, 4420, 291, 576, 643, 281, 362, 15329, 14786, 1109, 341, 15329, + 14786, 1109, 645, 3720, 538, 5477, 916, 51184], "temperature": 0.0, "avg_logprob": + -0.20270160918540142, "compression_ratio": 1.8434782608695652, "no_speech_prob": + 0.002656804397702217}, {"id": 365, "seek": 272480, "start": 2742.48, "end": 2746.6400000000003, + "text": " I didn''t I didn''t create those bindings so he created those bindings + the first version", "tokens": [51248, 286, 994, 380, 286, 994, 380, 1884, 729, 14786, + 1109, 370, 415, 2942, 729, 14786, 1109, 264, 700, 3037, 51456], 
"temperature": 0.0, + "avg_logprob": -0.20270160918540142, "compression_ratio": 1.8434782608695652, "no_speech_prob": + 0.002656804397702217}, {"id": 366, "seek": 272480, "start": 2747.28, "end": 2751.28, + "text": " so there you go that library was possible to use and at the moment", "tokens": + [51488, 370, 456, 291, 352, 300, 6405, 390, 1944, 281, 764, 293, 412, 264, 1623, + 51688], "temperature": 0.0, "avg_logprob": -0.20270160918540142, "compression_ratio": + 1.8434782608695652, "no_speech_prob": 0.002656804397702217}, {"id": 367, "seek": + 275128, "start": 2752.2400000000002, "end": 2761.2000000000003, "text": " there + were at the moment the it was not such a big choice of libraries to do Kenyar''s + search", "tokens": [50412, 456, 645, 412, 264, 1623, 264, 309, 390, 406, 1270, 257, + 955, 3922, 295, 15148, 281, 360, 8273, 88, 289, 311, 3164, 50860], "temperature": + 0.0, "avg_logprob": -0.22021304766337077, "compression_ratio": 1.618421052631579, + "no_speech_prob": 0.0025101350620388985}, {"id": 368, "seek": 275128, "start": 2762.0, + "end": 2766.6400000000003, "text": " so in terms of the competition there was anoy + which was slower", "tokens": [50900, 370, 294, 2115, 295, 264, 6211, 456, 390, 364, + 939, 597, 390, 14009, 51132], "temperature": 0.0, "avg_logprob": -0.22021304766337077, + "compression_ratio": 1.618421052631579, "no_speech_prob": 0.0025101350620388985}, + {"id": 369, "seek": 275128, "start": 2768.0, "end": 2775.6000000000004, "text": + " noticeably slower there was another library flan which used similar algorithms + to anoy but", "tokens": [51200, 3449, 1188, 14009, 456, 390, 1071, 6405, 932, 282, + 597, 1143, 2531, 14642, 281, 364, 939, 457, 51580], "temperature": 0.0, "avg_logprob": + -0.22021304766337077, "compression_ratio": 1.618421052631579, "no_speech_prob": + 0.0025101350620388985}, {"id": 370, "seek": 277560, "start": 2776.24, "end": 2783.2, + "text": " it was less optimized and it was also slower through three wall there + was a 
keg graph but it was not", "tokens": [50396, 309, 390, 1570, 26941, 293, 309, + 390, 611, 14009, 807, 1045, 2929, 456, 390, 257, 803, 70, 4295, 457, 309, 390, 406, + 50744], "temperature": 0.0, "avg_logprob": -0.274062842130661, "compression_ratio": + 1.6646341463414633, "no_speech_prob": 0.0036516953259706497}, {"id": 371, "seek": + 277560, "start": 2783.2, "end": 2792.3199999999997, "text": " like so easily usable + and yeah basically that was it and later came face but it came it was only", "tokens": + [50744, 411, 370, 3612, 29975, 293, 1338, 1936, 300, 390, 309, 293, 1780, 1361, + 1851, 457, 309, 1361, 309, 390, 787, 51200], "temperature": 0.0, "avg_logprob": + -0.274062842130661, "compression_ratio": 1.6646341463414633, "no_speech_prob": 0.0036516953259706497}, + {"id": 372, "seek": 277560, "start": 2792.88, "end": 2798.72, "text": " released + I think a year a couple of years later after well definitely after", "tokens": [51228, + 4736, 286, 519, 257, 1064, 257, 1916, 295, 924, 1780, 934, 731, 2138, 934, 51520], + "temperature": 0.0, "avg_logprob": -0.274062842130661, "compression_ratio": 1.6646341463414633, + "no_speech_prob": 0.0036516953259706497}, {"id": 373, "seek": 279872, "start": 2799.2, + "end": 2805.9199999999996, "text": " yeah it took several years for face to appear + and people started using it so at some point", "tokens": [50388, 1338, 309, 1890, + 2940, 924, 337, 1851, 281, 4204, 293, 561, 1409, 1228, 309, 370, 412, 512, 935, + 50724], "temperature": 0.0, "avg_logprob": -0.25457817857915704, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.0062931254506111145}, {"id": 374, "seek": + 279872, "start": 2805.9199999999996, "end": 2811.9199999999996, "text": " there + was vacuum and I honestly filled it now like other approaches that are taking over + yeah", "tokens": [50724, 456, 390, 14224, 293, 286, 6095, 6412, 309, 586, 411, 661, + 11587, 300, 366, 1940, 670, 1338, 51024], "temperature": 0.0, "avg_logprob": -0.25457817857915704, 
+ "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.0062931254506111145}, + {"id": 375, "seek": 279872, "start": 2813.68, "end": 2820.24, "text": " so yeah + it in summary there wasn''t a lot of serendipity but I wouldn''t take credit for + it", "tokens": [51112, 370, 1338, 309, 294, 12691, 456, 2067, 380, 257, 688, 295, + 816, 521, 647, 507, 457, 286, 2759, 380, 747, 5397, 337, 309, 51440], "temperature": + 0.0, "avg_logprob": -0.25457817857915704, "compression_ratio": 1.6741071428571428, + "no_speech_prob": 0.0062931254506111145}, {"id": 376, "seek": 279872, "start": 2821.7599999999998, + "end": 2828.16, "text": " in the industry it would have created without us for sure + yeah yeah maybe or maybe not and it''s also", "tokens": [51516, 294, 264, 3518, + 309, 576, 362, 2942, 1553, 505, 337, 988, 1338, 1338, 1310, 420, 1310, 406, 293, + 309, 311, 611, 51836], "temperature": 0.0, "avg_logprob": -0.25457817857915704, + "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.0062931254506111145}, + {"id": 377, "seek": 282872, "start": 2828.72, "end": 2836.24, "text": " well I think + it''s quite a photo like a better work typical of the end turn not to", "tokens": + [50364, 731, 286, 519, 309, 311, 1596, 257, 5052, 411, 257, 1101, 589, 7476, 295, + 264, 917, 1261, 406, 281, 50740], "temperature": 0.0, "avg_logprob": -0.2418431001551011, + "compression_ratio": 1.7095238095238094, "no_speech_prob": 0.0012158129829913378}, + {"id": 378, "seek": 282872, "start": 2838.3199999999997, "end": 2843.2, "text": + " recognize the impact they''re making because the moment they do recognize that", + "tokens": [50844, 5521, 264, 2712, 436, 434, 1455, 570, 264, 1623, 436, 360, 5521, + 300, 51088], "temperature": 0.0, "avg_logprob": -0.2418431001551011, "compression_ratio": + 1.7095238095238094, "no_speech_prob": 0.0012158129829913378}, {"id": 379, "seek": + 282872, "start": 2843.2, "end": 2850.8799999999997, "text": " that''s probably end + of story so like you need 
to be constantly sort of low ego and pointing at the", + "tokens": [51088, 300, 311, 1391, 917, 295, 1657, 370, 411, 291, 643, 281, 312, + 6460, 1333, 295, 2295, 14495, 293, 12166, 412, 264, 51472], "temperature": 0.0, + "avg_logprob": -0.2418431001551011, "compression_ratio": 1.7095238095238094, "no_speech_prob": + 0.0012158129829913378}, {"id": 380, "seek": 282872, "start": 2850.8799999999997, + "end": 2856.24, "text": " goal and I think this is what you''re doing and that it + feels like this is your approach but you also", "tokens": [51472, 3387, 293, 286, + 519, 341, 307, 437, 291, 434, 884, 293, 300, 309, 3417, 411, 341, 307, 428, 3109, + 457, 291, 611, 51740], "temperature": 0.0, "avg_logprob": -0.2418431001551011, "compression_ratio": + 1.7095238095238094, "no_speech_prob": 0.0012158129829913378}, {"id": 381, "seek": + 285624, "start": 2856.24, "end": 2865.12, "text": " do do quite a bit of impact + I could ask a ton of questions obviously and I could relate also to", "tokens": + [50364, 360, 360, 1596, 257, 857, 295, 2712, 286, 727, 1029, 257, 2952, 295, 1651, + 2745, 293, 286, 727, 10961, 611, 281, 50808], "temperature": 0.0, "avg_logprob": + -0.13732427900487726, "compression_ratio": 1.6594827586206897, "no_speech_prob": + 0.0019651977345347404}, {"id": 382, "seek": 285624, "start": 2865.12, "end": 2872.16, + "text": " the fact that what you explained about some of the struggles like how + to optimize these algorithms", "tokens": [50808, 264, 1186, 300, 437, 291, 8825, + 466, 512, 295, 264, 17592, 411, 577, 281, 19719, 613, 14642, 51160], "temperature": + 0.0, "avg_logprob": -0.13732427900487726, "compression_ratio": 1.6594827586206897, + "no_speech_prob": 0.0019651977345347404}, {"id": 383, "seek": 285624, "start": 2872.16, + "end": 2879.3599999999997, "text": " because at some point I did embark on participating + in billion scale and thench marks and I", "tokens": [51160, 570, 412, 512, 935, + 286, 630, 29832, 322, 13950, 294, 5218, 4373, 293, 550, 
339, 10640, 293, 286, 51520], + "temperature": 0.0, "avg_logprob": -0.13732427900487726, "compression_ratio": 1.6594827586206897, + "no_speech_prob": 0.0019651977345347404}, {"id": 384, "seek": 285624, "start": 2879.3599999999997, + "end": 2885.7599999999998, "text": " I think I failed miserably but at the same + time I did have some code which worked on a small scale", "tokens": [51520, 286, + 519, 286, 7612, 17725, 1188, 457, 412, 264, 912, 565, 286, 630, 362, 512, 3089, + 597, 2732, 322, 257, 1359, 4373, 51840], "temperature": 0.0, "avg_logprob": -0.13732427900487726, + "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.0019651977345347404}, + {"id": 385, "seek": 288576, "start": 2885.76, "end": 2892.88, "text": " and one + of the building works there was hnsw with very simple I would say with very very + simple", "tokens": [50364, 293, 472, 295, 264, 2390, 1985, 456, 390, 276, 3695, + 86, 365, 588, 2199, 286, 576, 584, 365, 588, 588, 2199, 50720], "temperature": 0.0, + "avg_logprob": -0.16929261250929398, "compression_ratio": 1.784037558685446, "no_speech_prob": + 0.0008407257846556604}, {"id": 386, "seek": 288576, "start": 2892.88, "end": 2900.1600000000003, + "text": " intuition that you just make several I think several passes through the + data set and you try to", "tokens": [50720, 24002, 300, 291, 445, 652, 2940, 286, + 519, 2940, 11335, 807, 264, 1412, 992, 293, 291, 853, 281, 51084], "temperature": + 0.0, "avg_logprob": -0.16929261250929398, "compression_ratio": 1.784037558685446, + "no_speech_prob": 0.0008407257846556604}, {"id": 387, "seek": 288576, "start": 2901.5200000000004, + "end": 2908.1600000000003, "text": " bind points in space that are closer to each + other and then you would push them to some common", "tokens": [51152, 14786, 2793, + 294, 1901, 300, 366, 4966, 281, 1184, 661, 293, 550, 291, 576, 2944, 552, 281, 512, + 2689, 51484], "temperature": 0.0, "avg_logprob": -0.16929261250929398, "compression_ratio": + 1.784037558685446, 
"no_speech_prob": 0.0008407257846556604}, {"id": 388, "seek": + 288576, "start": 2908.1600000000003, "end": 2912.1600000000003, "text": " bucket + I called them shards and then you would build a nation s double index for those + shards", "tokens": [51484, 13058, 286, 1219, 552, 402, 2287, 293, 550, 291, 576, + 1322, 257, 4790, 262, 3834, 8186, 337, 729, 402, 2287, 51684], "temperature": 0.0, + "avg_logprob": -0.16929261250929398, "compression_ratio": 1.784037558685446, "no_speech_prob": + 0.0008407257846556604}, {"id": 389, "seek": 291216, "start": 2912.7999999999997, + "end": 2919.2799999999997, "text": " the only thing that I couldn''t figure out + is for those shards I still needed to have an entry point", "tokens": [50396, 264, + 787, 551, 300, 286, 2809, 380, 2573, 484, 307, 337, 729, 402, 2287, 286, 920, 2978, + 281, 362, 364, 8729, 935, 50720], "temperature": 0.0, "avg_logprob": -0.1255583555802055, + "compression_ratio": 1.6149425287356323, "no_speech_prob": 0.004277515225112438}, + {"id": 390, "seek": 291216, "start": 2919.2799999999997, "end": 2927.2, "text": + " to quickly sort of identify which shards I should go down you know through when + I when I", "tokens": [50720, 281, 2661, 1333, 295, 5876, 597, 402, 2287, 286, 820, + 352, 760, 291, 458, 807, 562, 286, 562, 286, 51116], "temperature": 0.0, "avg_logprob": + -0.1255583555802055, "compression_ratio": 1.6149425287356323, "no_speech_prob": + 0.004277515225112438}, {"id": 391, "seek": 291216, "start": 2930.16, "end": 2937.12, + "text": " when I find similar similar documents for the query and I did attempt + to modify hnsw code in", "tokens": [51264, 562, 286, 915, 2531, 2531, 8512, 337, + 264, 14581, 293, 286, 630, 5217, 281, 16927, 276, 3695, 86, 3089, 294, 51612], "temperature": + 0.0, "avg_logprob": -0.1255583555802055, "compression_ratio": 1.6149425287356323, + "no_speech_prob": 0.004277515225112438}, {"id": 392, "seek": 293712, "start": 2937.12, + "end": 2943.68, "text": " the enemy sleep you 
know to to like get me only first + layer of the graph so that I can", "tokens": [50364, 264, 5945, 2817, 291, 458, + 281, 281, 411, 483, 385, 787, 700, 4583, 295, 264, 4295, 370, 300, 286, 393, 50692], + "temperature": 0.0, "avg_logprob": -0.15658715496892514, "compression_ratio": 1.6340425531914893, + "no_speech_prob": 0.003468903945758939}, {"id": 393, "seek": 293712, "start": 2943.68, + "end": 2950.08, "text": " pretend that that''s my layer for entering the shards + I just ran out of time but I see it was very", "tokens": [50692, 11865, 300, 300, + 311, 452, 4583, 337, 11104, 264, 402, 2287, 286, 445, 5872, 484, 295, 565, 457, + 286, 536, 309, 390, 588, 51012], "temperature": 0.0, "avg_logprob": -0.15658715496892514, + "compression_ratio": 1.6340425531914893, "no_speech_prob": 0.003468903945758939}, + {"id": 394, "seek": 293712, "start": 2950.08, "end": 2958.64, "text": " exciting + and also thanks to the organizers we had access to really beefy machines which I + think I", "tokens": [51012, 4670, 293, 611, 3231, 281, 264, 35071, 321, 632, 2105, + 281, 534, 9256, 88, 8379, 597, 286, 519, 286, 51440], "temperature": 0.0, "avg_logprob": + -0.15658715496892514, "compression_ratio": 1.6340425531914893, "no_speech_prob": + 0.003468903945758939}, {"id": 395, "seek": 293712, "start": 2958.64, "end": 2966.24, + "text": " had I haven''t been giving like good use I was mostly burning the you + know CPU capacity and memory but", "tokens": [51440, 632, 286, 2378, 380, 668, 2902, + 411, 665, 764, 286, 390, 5240, 9488, 264, 291, 458, 13199, 6042, 293, 4675, 457, + 51820], "temperature": 0.0, "avg_logprob": -0.15658715496892514, "compression_ratio": + 1.6340425531914893, "no_speech_prob": 0.003468903945758939}, {"id": 396, "seek": + 296712, "start": 2967.3599999999997, "end": 2974.56, "text": " I think it''s an + exciting field and what what I hope is that like with the vacuum that you mentioned", + "tokens": [50376, 286, 519, 309, 311, 364, 4670, 2519, 293, 437, 437, 286, 
1454, + 307, 300, 411, 365, 264, 14224, 300, 291, 2835, 50736], "temperature": 0.0, "avg_logprob": + -0.10567137706710632, "compression_ratio": 1.7276785714285714, "no_speech_prob": + 0.0012144262436777353}, {"id": 397, "seek": 296712, "start": 2974.56, "end": 2981.3599999999997, + "text": " that it doesn''t happen that this torch will be carried forward and then + someone will get excited", "tokens": [50736, 300, 309, 1177, 380, 1051, 300, 341, + 27822, 486, 312, 9094, 2128, 293, 550, 1580, 486, 483, 2919, 51076], "temperature": + 0.0, "avg_logprob": -0.10567137706710632, "compression_ratio": 1.7276785714285714, + "no_speech_prob": 0.0012144262436777353}, {"id": 398, "seek": 296712, "start": 2981.3599999999997, + "end": 2990.24, "text": " about and not afraid of trying new things in this space + are you yourself still like looking at", "tokens": [51076, 466, 293, 406, 4638, + 295, 1382, 777, 721, 294, 341, 1901, 366, 291, 1803, 920, 411, 1237, 412, 51520], + "temperature": 0.0, "avg_logprob": -0.10567137706710632, "compression_ratio": 1.7276785714285714, + "no_speech_prob": 0.0012144262436777353}, {"id": 399, "seek": 296712, "start": 2991.3599999999997, + "end": 2995.68, "text": " obviously you''re looking after enemy sleep but is there + something that particularly excites you", "tokens": [51576, 2745, 291, 434, 1237, + 934, 5945, 2817, 457, 307, 456, 746, 300, 4098, 1624, 3324, 291, 51792], "temperature": + 0.0, "avg_logprob": -0.10567137706710632, "compression_ratio": 1.7276785714285714, + "no_speech_prob": 0.0012144262436777353}, {"id": 400, "seek": 299568, "start": 2995.68, + "end": 2999.2799999999997, "text": " in this field that you would be working on + or you are working on?", "tokens": [50364, 294, 341, 2519, 300, 291, 576, 312, 1364, + 322, 420, 291, 366, 1364, 322, 30, 50544], "temperature": 0.0, "avg_logprob": -0.29092517495155334, + "compression_ratio": 1.5029239766081872, "no_speech_prob": 0.0048709348775446415}, + {"id": 401, "seek": 299568, 
"start": 3002.48, "end": 3012.24, "text": " Yeah great + question so first of all yeah I am not sure if if I would do any work on vector + search", "tokens": [50704, 865, 869, 1168, 370, 700, 295, 439, 1338, 286, 669, 406, + 988, 498, 498, 286, 576, 360, 604, 589, 322, 8062, 3164, 51192], "temperature": + 0.0, "avg_logprob": -0.29092517495155334, "compression_ratio": 1.5029239766081872, + "no_speech_prob": 0.0048709348775446415}, {"id": 402, "seek": 299568, "start": 3012.24, + "end": 3021.6, "text": " in the I haven''t actually not maintaining an enemy sleep + pretty well recently I''m just didn''t", "tokens": [51192, 294, 264, 286, 2378, + 380, 767, 406, 14916, 364, 5945, 2817, 1238, 731, 3938, 286, 478, 445, 994, 380, + 51660], "temperature": 0.0, "avg_logprob": -0.29092517495155334, "compression_ratio": + 1.5029239766081872, "no_speech_prob": 0.0048709348775446415}, {"id": 403, "seek": + 302160, "start": 3021.6, "end": 3029.52, "text": " have a lot of time and there + was an issue with building the so I will still fix and support", "tokens": [50364, + 362, 257, 688, 295, 565, 293, 456, 390, 364, 2734, 365, 2390, 264, 370, 286, 486, + 920, 3191, 293, 1406, 50760], "temperature": 0.0, "avg_logprob": -0.2147812422584085, + "compression_ratio": 1.569060773480663, "no_speech_prob": 0.003949436359107494}, + {"id": 404, "seek": 302160, "start": 3029.52, "end": 3037.2799999999997, "text": + " later version of Python for sure I was like you know like piece my piecemeal work + I found like say", "tokens": [50760, 1780, 3037, 295, 15329, 337, 988, 286, 390, + 411, 291, 458, 411, 2522, 452, 2522, 32914, 589, 286, 1352, 411, 584, 51148], "temperature": + 0.0, "avg_logprob": -0.2147812422584085, "compression_ratio": 1.569060773480663, + "no_speech_prob": 0.003949436359107494}, {"id": 405, "seek": 302160, "start": 3037.2799999999997, + "end": 3044.56, "text": " half a day to fix like Windows build something else popped + up yeah so it is an exciting field", "tokens": [51148, 
1922, 257, 786, 281, 3191, + 411, 8591, 1322, 746, 1646, 21545, 493, 1338, 370, 309, 307, 364, 4670, 2519, 51512], + "temperature": 0.0, "avg_logprob": -0.2147812422584085, "compression_ratio": 1.569060773480663, + "no_speech_prob": 0.003949436359107494}, {"id": 406, "seek": 304456, "start": 3045.12, + "end": 3055.12, "text": " while it''s also become really busy and another thing + is that I still see the focus main focus", "tokens": [50392, 1339, 309, 311, 611, + 1813, 534, 5856, 293, 1071, 551, 307, 300, 286, 920, 536, 264, 1879, 2135, 1879, + 50892], "temperature": 0.0, "avg_logprob": -0.15768599855727045, "compression_ratio": + 1.6845238095238095, "no_speech_prob": 0.0061231753788888454}, {"id": 407, "seek": + 304456, "start": 3055.12, "end": 3062.16, "text": " it''s not it''s not very appreciated + so the like you said you said I mean there''s a really nice", "tokens": [50892, + 309, 311, 406, 309, 311, 406, 588, 17169, 370, 264, 411, 291, 848, 291, 848, 286, + 914, 456, 311, 257, 534, 1481, 51244], "temperature": 0.0, "avg_logprob": -0.15768599855727045, + "compression_ratio": 1.6845238095238095, "no_speech_prob": 0.0061231753788888454}, + {"id": 408, "seek": 304456, "start": 3062.16, "end": 3068.48, "text": " voice that + all like that helped industry to be created and maybe it''s true to some degree + it is", "tokens": [51244, 3177, 300, 439, 411, 300, 4254, 3518, 281, 312, 2942, + 293, 1310, 309, 311, 2074, 281, 512, 4314, 309, 307, 51560], "temperature": 0.0, + "avg_logprob": -0.15768599855727045, "compression_ratio": 1.6845238095238095, "no_speech_prob": + 0.0061231753788888454}, {"id": 409, "seek": 306848, "start": 3069.12, "end": 3076.4, + "text": " what yeah is it appreciated by you know your potential employer no it''s + like zero appreciation so it", "tokens": [50396, 437, 1338, 307, 309, 17169, 538, + 291, 458, 428, 3995, 16205, 572, 309, 311, 411, 4018, 18909, 370, 309, 50760], "temperature": + 0.0, "avg_logprob": -0.17732033048357282, 
"compression_ratio": 1.6781609195402298, + "no_speech_prob": 0.003773072035983205}, {"id": 410, "seek": 306848, "start": 3076.4, + "end": 3087.52, "text": " it isn''t it''s it''s it''s it wasn''t still is somewhat + niche topic and most people are of course", "tokens": [50760, 309, 1943, 380, 309, + 311, 309, 311, 309, 311, 309, 2067, 380, 920, 307, 8344, 19956, 4829, 293, 881, + 561, 366, 295, 1164, 51316], "temperature": 0.0, "avg_logprob": -0.17732033048357282, + "compression_ratio": 1.6781609195402298, "no_speech_prob": 0.003773072035983205}, + {"id": 411, "seek": 306848, "start": 3087.52, "end": 3095.52, "text": " I interested + in how do you solve intelligence in that you know broad sense of the word how do + you", "tokens": [51316, 286, 3102, 294, 577, 360, 291, 5039, 7599, 294, 300, 291, + 458, 4152, 2020, 295, 264, 1349, 577, 360, 291, 51716], "temperature": 0.0, "avg_logprob": + -0.17732033048357282, "compression_ratio": 1.6781609195402298, "no_speech_prob": + 0.003773072035983205}, {"id": 412, "seek": 309552, "start": 3095.52, "end": 3101.04, + "text": " create models that can be seen and if the model like how you can combine + them that this is like", "tokens": [50364, 1884, 5245, 300, 393, 312, 1612, 293, + 498, 264, 2316, 411, 577, 291, 393, 10432, 552, 300, 341, 307, 411, 50640], "temperature": + 0.0, "avg_logprob": -0.14303845625657302, "compression_ratio": 1.7065868263473054, + "no_speech_prob": 0.003035679692402482}, {"id": 413, "seek": 309552, "start": 3101.04, + "end": 3111.12, "text": " you know this new agentic ecosystem and yeah so all that + stuff that really excites people it", "tokens": [50640, 291, 458, 341, 777, 9461, + 299, 11311, 293, 1338, 370, 439, 300, 1507, 300, 534, 1624, 3324, 561, 309, 51144], + "temperature": 0.0, "avg_logprob": -0.14303845625657302, "compression_ratio": 1.7065868263473054, + "no_speech_prob": 0.003035679692402482}, {"id": 414, "seek": 309552, "start": 3111.12, + "end": 3122.08, "text": " it isn''t this you 
know plane of or space of large language + models machine learning deep learning", "tokens": [51144, 309, 1943, 380, 341, 291, + 458, 5720, 295, 420, 1901, 295, 2416, 2856, 5245, 3479, 2539, 2452, 2539, 51692], + "temperature": 0.0, "avg_logprob": -0.14303845625657302, "compression_ratio": 1.7065868263473054, + "no_speech_prob": 0.003035679692402482}, {"id": 415, "seek": 312208, "start": 3122.08, + "end": 3132.48, "text": " intelligence you name it yeah so that''s why I do have + ideas I did tested some of them but", "tokens": [50364, 7599, 291, 1315, 309, 1338, + 370, 300, 311, 983, 286, 360, 362, 3487, 286, 630, 8246, 512, 295, 552, 457, 50884], + "temperature": 0.0, "avg_logprob": -0.1587352890899216, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.0018897616537287831}, {"id": 416, "seek": 312208, "start": 3133.36, + "end": 3141.12, "text": " you know things usually don''t work but yeah I don''t + have time to think systematically about this", "tokens": [50928, 291, 458, 721, + 2673, 500, 380, 589, 457, 1338, 286, 500, 380, 362, 565, 281, 519, 39531, 466, 341, + 51316], "temperature": 0.0, "avg_logprob": -0.1587352890899216, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.0018897616537287831}, {"id": 417, "seek": + 312208, "start": 3142.08, "end": 3151.2799999999997, "text": " issues yeah but I + guess at the same time you did create the base for for you know for other people + to", "tokens": [51364, 2663, 1338, 457, 286, 2041, 412, 264, 912, 565, 291, 630, + 1884, 264, 3096, 337, 337, 291, 458, 337, 661, 561, 281, 51824], "temperature": + 0.0, "avg_logprob": -0.1587352890899216, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.0018897616537287831}, {"id": 418, "seek": 315128, "start": 3151.28, + "end": 3159.6800000000003, "text": " innovate and I think it''s I think it''s highly + appreciated really I also wanted I also wanted to", "tokens": [50364, 33444, 293, + 286, 519, 309, 311, 286, 519, 309, 311, 5405, 17169, 
534, 286, 611, 1415, 286, 611, + 1415, 281, 50784], "temperature": 0.0, "avg_logprob": -0.16144752502441406, "compression_ratio": + 1.6823529411764706, "no_speech_prob": 0.0018721995875239372}, {"id": 419, "seek": + 315128, "start": 3160.96, "end": 3167.36, "text": " pick up the topic that originally + sort of interested when you interested me when when you", "tokens": [50848, 1888, + 493, 264, 4829, 300, 7993, 1333, 295, 3102, 562, 291, 3102, 385, 562, 562, 291, + 51168], "temperature": 0.0, "avg_logprob": -0.16144752502441406, "compression_ratio": + 1.6823529411764706, "no_speech_prob": 0.0018721995875239372}, {"id": 420, "seek": + 315128, "start": 3167.36, "end": 3180.32, "text": " popped up on my LinkedIn you + know feed you you made a statement about relational databases trying to", "tokens": + [51168, 21545, 493, 322, 452, 20657, 291, 458, 3154, 291, 291, 1027, 257, 5629, + 466, 38444, 22380, 1382, 281, 51816], "temperature": 0.0, "avg_logprob": -0.16144752502441406, + "compression_ratio": 1.6823529411764706, "no_speech_prob": 0.0018721995875239372}, + {"id": 421, "seek": 318032, "start": 3180.32, "end": 3187.1200000000003, "text": + " implement search feature or sort of capability and sort of like miserably failing + and that maybe", "tokens": [50364, 4445, 3164, 4111, 420, 1333, 295, 13759, 293, + 1333, 295, 411, 17725, 1188, 18223, 293, 300, 1310, 50704], "temperature": 0.0, + "avg_logprob": -0.08850261298092929, "compression_ratio": 1.728110599078341, "no_speech_prob": + 0.0024983815383166075}, {"id": 422, "seek": 318032, "start": 3187.1200000000003, + "end": 3193.6800000000003, "text": " you didn''t use the word miserably it''s it''s + my word here but I wanted to do a little bit like", "tokens": [50704, 291, 994, + 380, 764, 264, 1349, 17725, 1188, 309, 311, 309, 311, 452, 1349, 510, 457, 286, + 1415, 281, 360, 257, 707, 857, 411, 51032], "temperature": 0.0, "avg_logprob": -0.08850261298092929, + "compression_ratio": 1.728110599078341, "no_speech_prob": 
0.0024983815383166075}, + {"id": 423, "seek": 318032, "start": 3193.6800000000003, "end": 3201.1200000000003, + "text": " expand on this like why do you think they tried to do that and also while + they were trying that", "tokens": [51032, 5268, 322, 341, 411, 983, 360, 291, 519, + 436, 3031, 281, 360, 300, 293, 611, 1339, 436, 645, 1382, 300, 51404], "temperature": + 0.0, "avg_logprob": -0.08850261298092929, "compression_ratio": 1.728110599078341, + "no_speech_prob": 0.0024983815383166075}, {"id": 424, "seek": 318032, "start": 3201.1200000000003, + "end": 3209.52, "text": " what went wrong yeah great question well first of all + I definitely wouldn''t say the word", "tokens": [51404, 437, 1437, 2085, 1338, 869, + 1168, 731, 700, 295, 439, 286, 2138, 2759, 380, 584, 264, 1349, 51824], "temperature": + 0.0, "avg_logprob": -0.08850261298092929, "compression_ratio": 1.728110599078341, + "no_speech_prob": 0.0024983815383166075}, {"id": 425, "seek": 320952, "start": 3209.52, + "end": 3216.08, "text": " miserably because the it it has been success to some degree + definitely and it''s not", "tokens": [50364, 17725, 1188, 570, 264, 309, 309, 575, + 668, 2245, 281, 512, 4314, 2138, 293, 309, 311, 406, 50692], "temperature": 0.0, + "avg_logprob": -0.10345675006057277, "compression_ratio": 1.623456790123457, "no_speech_prob": + 0.0017692841356620193}, {"id": 426, "seek": 320952, "start": 3217.2, "end": 3224.24, + "text": " it''s not over until it''s over so the people are working with this so + what I have been", "tokens": [50748, 309, 311, 406, 670, 1826, 309, 311, 670, 370, + 264, 561, 366, 1364, 365, 341, 370, 437, 286, 362, 668, 51100], "temperature": 0.0, + "avg_logprob": -0.10345675006057277, "compression_ratio": 1.623456790123457, "no_speech_prob": + 0.0017692841356620193}, {"id": 427, "seek": 320952, "start": 3224.24, "end": 3231.52, + "text": " observing for many many years and I as I said I did start my career as + as a person working on", "tokens": [51100, 22107, 
337, 867, 867, 924, 293, 286, + 382, 286, 848, 286, 630, 722, 452, 3988, 382, 382, 257, 954, 1364, 322, 51464], + "temperature": 0.0, "avg_logprob": -0.10345675006057277, "compression_ratio": 1.623456790123457, + "no_speech_prob": 0.0017692841356620193}, {"id": 428, "seek": 323152, "start": 3232.16, + "end": 3246.72, "text": " databases and doing a lot of writing a lot of SQL so but + the the typical database is a very", "tokens": [50396, 22380, 293, 884, 257, 688, + 295, 3579, 257, 688, 295, 19200, 370, 457, 264, 264, 7476, 8149, 307, 257, 588, + 51124], "temperature": 0.0, "avg_logprob": -0.23455670674641926, "compression_ratio": + 1.484375, "no_speech_prob": 0.002046620938926935}, {"id": 429, "seek": 323152, "start": + 3246.72, "end": 3255.84, "text": " different beast from what you typically need + to do information to you so the first of all they all", "tokens": [51124, 819, 13464, + 490, 437, 291, 5850, 643, 281, 360, 1589, 281, 291, 370, 264, 700, 295, 439, 436, + 439, 51580], "temperature": 0.0, "avg_logprob": -0.23455670674641926, "compression_ratio": + 1.484375, "no_speech_prob": 0.002046620938926935}, {"id": 430, "seek": 325584, "start": + 3255.92, "end": 3263.04, "text": " like the early databases or they are oriented + they achieve some tradeoff between the", "tokens": [50368, 411, 264, 2440, 22380, + 420, 436, 366, 21841, 436, 4584, 512, 4923, 4506, 1296, 264, 50724], "temperature": + 0.0, "avg_logprob": -0.20267138564795778, "compression_ratio": 1.6932515337423313, + "no_speech_prob": 0.002183576114475727}, {"id": 431, "seek": 325584, "start": 3266.1600000000003, + "end": 3273.2000000000003, "text": " so they need have got throughput in both inserts + and updates and they need to be able to update", "tokens": [50880, 370, 436, 643, + 362, 658, 44629, 294, 1293, 49163, 293, 9205, 293, 436, 643, 281, 312, 1075, 281, + 5623, 51232], "temperature": 0.0, "avg_logprob": -0.20267138564795778, "compression_ratio": + 1.6932515337423313, "no_speech_prob": 
0.002183576114475727}, {"id": 432, "seek": + 325584, "start": 3274.0, "end": 3281.1200000000003, "text": " information pretty + quickly and also it should be pretty reasonable and they also support really", "tokens": + [51272, 1589, 1238, 2661, 293, 611, 309, 820, 312, 1238, 10585, 293, 436, 611, 1406, + 534, 51628], "temperature": 0.0, "avg_logprob": -0.20267138564795778, "compression_ratio": + 1.6932515337423313, "no_speech_prob": 0.002183576114475727}, {"id": 433, "seek": + 328112, "start": 3281.2799999999997, "end": 3292.88, "text": " like the the data + the data can be pretty complicated that what they call that SQL schema there can", + "tokens": [50372, 411, 264, 264, 1412, 264, 1412, 393, 312, 1238, 6179, 300, 437, + 436, 818, 300, 19200, 34078, 456, 393, 50952], "temperature": 0.0, "avg_logprob": + -0.19168907403945923, "compression_ratio": 1.7797619047619047, "no_speech_prob": + 0.0016946801915764809}, {"id": 434, "seek": 328112, "start": 3292.88, "end": 3298.72, + "text": " be multiple tables and all of that needs to be supported and so of course + there are tradeoffs to be", "tokens": [50952, 312, 3866, 8020, 293, 439, 295, 300, + 2203, 281, 312, 8104, 293, 370, 295, 1164, 456, 366, 4923, 19231, 281, 312, 51244], + "temperature": 0.0, "avg_logprob": -0.19168907403945923, "compression_ratio": 1.7797619047619047, + "no_speech_prob": 0.0016946801915764809}, {"id": 435, "seek": 328112, "start": 3298.72, + "end": 3306.4, "text": " made to make it possible and again to support generality + support efficient updates support efficient", "tokens": [51244, 1027, 281, 652, + 309, 1944, 293, 797, 281, 1406, 1337, 1860, 1406, 7148, 9205, 1406, 7148, 51628], + "temperature": 0.0, "avg_logprob": -0.19168907403945923, "compression_ratio": 1.7797619047619047, + "no_speech_prob": 0.0016946801915764809}, {"id": 436, "seek": 330640, "start": 3306.48, + "end": 3313.28, "text": " inserts but at the same time if you''re doing the TV system + a lot of of this is not necessary", 
"tokens": [50368, 49163, 457, 412, 264, 912, + 565, 498, 291, 434, 884, 264, 3558, 1185, 257, 688, 295, 295, 341, 307, 406, 4818, + 50708], "temperature": 0.0, "avg_logprob": -0.2388037716049746, "compression_ratio": + 1.6511627906976745, "no_speech_prob": 0.003451433964073658}, {"id": 437, "seek": + 330640, "start": 3313.84, "end": 3321.2000000000003, "text": " so say for you want + to do like keyword-based retrieval you only need to all at high level", "tokens": + [50736, 370, 584, 337, 291, 528, 281, 360, 411, 20428, 12, 6032, 19817, 3337, 291, + 787, 643, 281, 439, 412, 1090, 1496, 51104], "temperature": 0.0, "avg_logprob": + -0.2388037716049746, "compression_ratio": 1.6511627906976745, "no_speech_prob": + 0.003451433964073658}, {"id": 438, "seek": 330640, "start": 3321.84, "end": 3325.6800000000003, + "text": " is somewhat a simplification but you need to memorize them in which documents", + "tokens": [51136, 307, 8344, 257, 6883, 3774, 457, 291, 643, 281, 27478, 552, 294, + 597, 8512, 51328], "temperature": 0.0, "avg_logprob": -0.2388037716049746, "compression_ratio": + 1.6511627906976745, "no_speech_prob": 0.003451433964073658}, {"id": 439, "seek": + 330640, "start": 3327.84, "end": 3333.04, "text": " you have which keywords and + then you have this so called inverted index where for each keyword", "tokens": [51436, + 291, 362, 597, 21009, 293, 550, 291, 362, 341, 370, 1219, 38969, 8186, 689, 337, + 1184, 20428, 51696], "temperature": 0.0, "avg_logprob": -0.2388037716049746, "compression_ratio": + 1.6511627906976745, "no_speech_prob": 0.003451433964073658}, {"id": 440, "seek": + 333304, "start": 3333.2, "end": 3340.32, "text": " you have a list of documents + where these keywords appear and it''s much simple structure it permits", "tokens": + [50372, 291, 362, 257, 1329, 295, 8512, 689, 613, 21009, 4204, 293, 309, 311, 709, + 2199, 3877, 309, 30990, 50728], "temperature": 0.0, "avg_logprob": -0.1891049100207044, + "compression_ratio": 1.7609756097560976, 
"no_speech_prob": 0.001181987812742591}, + {"id": 441, "seek": 333304, "start": 3340.32, "end": 3349.44, "text": " much more + efficient compression algorithms so it''s again it''s it''s a different beast and + and also", "tokens": [50728, 709, 544, 7148, 19355, 14642, 370, 309, 311, 797, 309, + 311, 309, 311, 257, 819, 13464, 293, 293, 611, 51184], "temperature": 0.0, "avg_logprob": + -0.1891049100207044, "compression_ratio": 1.7609756097560976, "no_speech_prob": + 0.001181987812742591}, {"id": 442, "seek": 333304, "start": 3350.0, "end": 3354.32, + "text": " in terms of efficiency of updates once you compress data and once you", + "tokens": [51212, 294, 2115, 295, 10493, 295, 9205, 1564, 291, 14778, 1412, 293, + 1564, 291, 51428], "temperature": 0.0, "avg_logprob": -0.1891049100207044, "compression_ratio": + 1.7609756097560976, "no_speech_prob": 0.001181987812742591}, {"id": 443, "seek": + 333304, "start": 3355.12, "end": 3362.72, "text": " represent it in a special way + it''s it becomes much harder to to make these incremental updates", "tokens": [51468, + 2906, 309, 294, 257, 2121, 636, 309, 311, 309, 3643, 709, 6081, 281, 281, 652, 613, + 35759, 9205, 51848], "temperature": 0.0, "avg_logprob": -0.1891049100207044, "compression_ratio": + 1.7609756097560976, "no_speech_prob": 0.001181987812742591}, {"id": 444, "seek": + 336304, "start": 3363.68, "end": 3371.36, "text": " for which those early databases + were applied so clearly there is a disconnect it was somewhat", "tokens": [50396, + 337, 597, 729, 2440, 22380, 645, 6456, 370, 4448, 456, 307, 257, 14299, 309, 390, + 8344, 50780], "temperature": 0.0, "avg_logprob": -0.16825653334795418, "compression_ratio": + 1.6171428571428572, "no_speech_prob": 0.0012183558428660035}, {"id": 445, "seek": + 336304, "start": 3371.36, "end": 3380.72, "text": " removed with the introduction + of so-called columnar databases but it''s still like with columnar", "tokens": [50780, + 7261, 365, 264, 9339, 295, 370, 12, 11880, 7738, 
289, 22380, 457, 309, 311, 920, + 411, 365, 7738, 289, 51248], "temperature": 0.0, "avg_logprob": -0.16825653334795418, + "compression_ratio": 1.6171428571428572, "no_speech_prob": 0.0012183558428660035}, + {"id": 446, "seek": 336304, "start": 3380.72, "end": 3389.2799999999997, "text": + " databases I believe they actually do not favor those you know point updates anymore + they they", "tokens": [51248, 22380, 286, 1697, 436, 767, 360, 406, 2294, 729, 291, + 458, 935, 9205, 3602, 436, 436, 51676], "temperature": 0.0, "avg_logprob": -0.16825653334795418, + "compression_ratio": 1.6171428571428572, "no_speech_prob": 0.0012183558428660035}, + {"id": 447, "seek": 338928, "start": 3389.36, "end": 3395.76, "text": " are best + to be used for bulk updates and so basically once you''re doing bulk updates", "tokens": + [50368, 366, 1151, 281, 312, 1143, 337, 16139, 9205, 293, 370, 1936, 1564, 291, + 434, 884, 16139, 9205, 50688], "temperature": 0.0, "avg_logprob": -0.20951991611056858, + "compression_ratio": 1.809278350515464, "no_speech_prob": 0.001595189911313355}, + {"id": 448, "seek": 338928, "start": 3395.76, "end": 3403.92, "text": " yeah you''re + sort of in this search engine area where you ask you you change things in in", "tokens": + [50688, 1338, 291, 434, 1333, 295, 294, 341, 3164, 2848, 1859, 689, 291, 1029, 291, + 291, 1319, 721, 294, 294, 51096], "temperature": 0.0, "avg_logprob": -0.20951991611056858, + "compression_ratio": 1.809278350515464, "no_speech_prob": 0.001595189911313355}, + {"id": 449, "seek": 338928, "start": 3404.8, "end": 3409.28, "text": " rather large + increments changing the access in rather large increments and you don''t", "tokens": + [51140, 2831, 2416, 1946, 1117, 4473, 264, 2105, 294, 2831, 2416, 1946, 1117, 293, + 291, 500, 380, 51364], "temperature": 0.0, "avg_logprob": -0.20951991611056858, + "compression_ratio": 1.809278350515464, "no_speech_prob": 0.001595189911313355}, + {"id": 450, "seek": 338928, "start": 3410.96, "end": 
3416.96, "text": " you don''t + worry too much about your information is being like really up to date you can wait", + "tokens": [51448, 291, 500, 380, 3292, 886, 709, 466, 428, 1589, 307, 885, 411, + 534, 493, 281, 4002, 291, 393, 1699, 51748], "temperature": 0.0, "avg_logprob": + -0.20951991611056858, "compression_ratio": 1.809278350515464, "no_speech_prob": + 0.001595189911313355}, {"id": 451, "seek": 341696, "start": 3416.96, "end": 3422.88, + "text": " maybe a day maybe a few hours but it doesn''t have like an instantaneous + update of the database", "tokens": [50364, 1310, 257, 786, 1310, 257, 1326, 2496, + 457, 309, 1177, 380, 362, 411, 364, 45596, 5623, 295, 264, 8149, 50660], "temperature": + 0.0, "avg_logprob": -0.12978360869667746, "compression_ratio": 1.7123287671232876, + "no_speech_prob": 0.0020212840754538774}, {"id": 452, "seek": 341696, "start": 3422.88, + "end": 3429.36, "text": " so this is a different trade-offs so yeah um well of course + there is a disconnect and this is why", "tokens": [50660, 370, 341, 307, 257, 819, + 4923, 12, 19231, 370, 1338, 1105, 731, 295, 1164, 456, 307, 257, 14299, 293, 341, + 307, 983, 50984], "temperature": 0.0, "avg_logprob": -0.12978360869667746, "compression_ratio": + 1.7123287671232876, "no_speech_prob": 0.0020212840754538774}, {"id": 453, "seek": + 341696, "start": 3429.36, "end": 3436.56, "text": " it''s it was always hard I believe + to add to add like food tax indexes to regular databases", "tokens": [50984, 309, + 311, 309, 390, 1009, 1152, 286, 1697, 281, 909, 281, 909, 411, 1755, 3366, 8186, + 279, 281, 3890, 22380, 51344], "temperature": 0.0, "avg_logprob": -0.12978360869667746, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.0020212840754538774}, + {"id": 454, "seek": 341696, "start": 3437.92, "end": 3445.2, "text": " but another + issue with the the disconnect is that like again the retrieval often needs like", + "tokens": [51412, 457, 1071, 2734, 365, 264, 264, 14299, 307, 300, 411, 
797, 264, + 19817, 3337, 2049, 2203, 411, 51776], "temperature": 0.0, "avg_logprob": -0.12978360869667746, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.0020212840754538774}, + {"id": 455, "seek": 344520, "start": 3445.2, "end": 3452.8799999999997, "text": + " really different set of specialized features so if you have a relational database + system", "tokens": [50364, 534, 819, 992, 295, 19813, 4122, 370, 498, 291, 362, + 257, 38444, 8149, 1185, 50748], "temperature": 0.0, "avg_logprob": -0.14985672372286438, + "compression_ratio": 1.6395348837209303, "no_speech_prob": 0.000740553077775985}, + {"id": 456, "seek": 344520, "start": 3454.24, "end": 3460.72, "text": " it''s pretty + hard to support this for example like deconization if you need to do deconization", + "tokens": [50816, 309, 311, 1238, 1152, 281, 1406, 341, 337, 1365, 411, 979, 266, + 2144, 498, 291, 643, 281, 360, 979, 266, 2144, 51140], "temperature": 0.0, "avg_logprob": + -0.14985672372286438, "compression_ratio": 1.6395348837209303, "no_speech_prob": + 0.000740553077775985}, {"id": 457, "seek": 344520, "start": 3461.2799999999997, + "end": 3467.7599999999998, "text": " in multiple languages yeah so of course that''s + part like you know the creation of those specialized", "tokens": [51168, 294, 3866, + 8650, 1338, 370, 295, 1164, 300, 311, 644, 411, 291, 458, 264, 8016, 295, 729, 19813, + 51492], "temperature": 0.0, "avg_logprob": -0.14985672372286438, "compression_ratio": + 1.6395348837209303, "no_speech_prob": 0.000740553077775985}, {"id": 458, "seek": + 346776, "start": 3467.76, "end": 3474.1600000000003, "text": " tools with a lot + of features like you''ve seen and VESPA and databases are catching up", "tokens": + [50364, 3873, 365, 257, 688, 295, 4122, 411, 291, 600, 1612, 293, 691, 2358, 10297, + 293, 22380, 366, 16124, 493, 50684], "temperature": 0.0, "avg_logprob": -0.20436343927493042, + "compression_ratio": 1.6333333333333333, "no_speech_prob": 0.0016593633918091655}, + 
{"id": 459, "seek": 346776, "start": 3475.1200000000003, "end": 3483.6000000000004, + "text": " but there is still a gap and you know it''s probably like going to be + really tedious to to support", "tokens": [50732, 457, 456, 307, 920, 257, 7417, + 293, 291, 458, 309, 311, 1391, 411, 516, 281, 312, 534, 38284, 281, 281, 1406, 51156], + "temperature": 0.0, "avg_logprob": -0.20436343927493042, "compression_ratio": 1.6333333333333333, + "no_speech_prob": 0.0016593633918091655}, {"id": 460, "seek": 346776, "start": 3484.5600000000004, + "end": 3487.84, "text": " yeah full set of features like you know you need to match + VESPA", "tokens": [51204, 1338, 1577, 992, 295, 4122, 411, 291, 458, 291, 643, 281, + 2995, 691, 2358, 10297, 51368], "temperature": 0.0, "avg_logprob": -0.20436343927493042, + "compression_ratio": 1.6333333333333333, "no_speech_prob": 0.0016593633918091655}, + {"id": 461, "seek": 346776, "start": 3489.44, "end": 3494.96, "text": " so yeah + these are like my five cents on this stuff yeah but I''m curious to sort of a little + bit", "tokens": [51448, 370, 1338, 613, 366, 411, 452, 1732, 14941, 322, 341, 1507, + 1338, 457, 286, 478, 6369, 281, 1333, 295, 257, 707, 857, 51724], "temperature": + 0.0, "avg_logprob": -0.20436343927493042, "compression_ratio": 1.6333333333333333, + "no_speech_prob": 0.0016593633918091655}, {"id": 462, "seek": 349496, "start": 3494.96, + "end": 3503.76, "text": " the understand why do you think databases are still trying + why are they trying to", "tokens": [50364, 264, 1223, 983, 360, 291, 519, 22380, + 366, 920, 1382, 983, 366, 436, 1382, 281, 50804], "temperature": 0.0, "avg_logprob": + -0.18421046517112039, "compression_ratio": 1.630952380952381, "no_speech_prob": + 0.0010139207588508725}, {"id": 463, "seek": 349496, "start": 3504.8, "end": 3514.0, + "text": " encompass this seemingly disparate ways of searching right when you actually + if basically like", "tokens": [50856, 28268, 341, 18709, 14548, 473, 2098, 295, + 
10808, 558, 562, 291, 767, 498, 1936, 411, 51316], "temperature": 0.0, "avg_logprob": + -0.18421046517112039, "compression_ratio": 1.630952380952381, "no_speech_prob": + 0.0010139207588508725}, {"id": 464, "seek": 349496, "start": 3514.0, "end": 3518.64, + "text": " you explained if you need to have a fully blown search engine that can + support multiple languages", "tokens": [51316, 291, 8825, 498, 291, 643, 281, 362, + 257, 4498, 16479, 3164, 2848, 300, 393, 1406, 3866, 8650, 51548], "temperature": + 0.0, "avg_logprob": -0.18421046517112039, "compression_ratio": 1.630952380952381, + "no_speech_prob": 0.0010139207588508725}, {"id": 465, "seek": 351864, "start": 3518.72, + "end": 3526.08, "text": " tokenization and so on you better be using the likes of + recene VESPA and you know maybe", "tokens": [50368, 14862, 2144, 293, 370, 322, + 291, 1101, 312, 1228, 264, 5902, 295, 850, 1450, 691, 2358, 10297, 293, 291, 458, + 1310, 50736], "temperature": 0.0, "avg_logprob": -0.22827778586858435, "compression_ratio": + 1.5922330097087378, "no_speech_prob": 0.001773255062289536}, {"id": 466, "seek": + 351864, "start": 3526.08, "end": 3530.72, "text": " elastic search on top of the + scene and so on why why are they still trying", "tokens": [50736, 17115, 3164, 322, + 1192, 295, 264, 4145, 293, 370, 322, 983, 983, 366, 436, 920, 1382, 50968], "temperature": + 0.0, "avg_logprob": -0.22827778586858435, "compression_ratio": 1.5922330097087378, + "no_speech_prob": 0.001773255062289536}, {"id": 467, "seek": 351864, "start": 3533.12, + "end": 3540.4, "text": " they want customers it''s of course advantageous to be + like you know one stop shop so they come", "tokens": [51088, 436, 528, 4581, 309, + 311, 295, 1164, 5002, 563, 281, 312, 411, 291, 458, 472, 1590, 3945, 370, 436, 808, + 51452], "temperature": 0.0, "avg_logprob": -0.22827778586858435, "compression_ratio": + 1.5922330097087378, "no_speech_prob": 0.001773255062289536}, {"id": 468, "seek": + 351864, "start": 3542.08, 
"end": 3546.72, "text": " to specific provider and they + have everything so I listen to a podcast", "tokens": [51536, 281, 2685, 12398, 293, + 436, 362, 1203, 370, 286, 2140, 281, 257, 7367, 51768], "temperature": 0.0, "avg_logprob": + -0.22827778586858435, "compression_ratio": 1.5922330097087378, "no_speech_prob": + 0.001773255062289536}, {"id": 469, "seek": 354864, "start": 3549.12, "end": 3558.08, + "text": " the roxette co-founder which was the roxette the one was acquired by OpenAI + but I think you", "tokens": [50388, 264, 744, 87, 3007, 598, 12, 33348, 597, 390, + 264, 744, 87, 3007, 264, 472, 390, 17554, 538, 7238, 48698, 457, 286, 519, 291, + 50836], "temperature": 0.0, "avg_logprob": -0.22136606889612534, "compression_ratio": + 1.532258064516129, "no_speech_prob": 0.007909516803920269}, {"id": 470, "seek": + 354864, "start": 3558.08, "end": 3564.56, "text": " recorded that podcast before + they were acquired so good timing and you can clearly hear that message", "tokens": + [50836, 8287, 300, 7367, 949, 436, 645, 17554, 370, 665, 10822, 293, 291, 393, 4448, + 1568, 300, 3636, 51160], "temperature": 0.0, "avg_logprob": -0.22136606889612534, + "compression_ratio": 1.532258064516129, "no_speech_prob": 0.007909516803920269}, + {"id": 471, "seek": 354864, "start": 3564.56, "end": 3571.3599999999997, "text": + " all like we really want people to come and use our solution so we have hybrid + search we have", "tokens": [51160, 439, 411, 321, 534, 528, 561, 281, 808, 293, + 764, 527, 3827, 370, 321, 362, 13051, 3164, 321, 362, 51500], "temperature": 0.0, + "avg_logprob": -0.22136606889612534, "compression_ratio": 1.532258064516129, "no_speech_prob": + 0.007909516803920269}, {"id": 472, "seek": 357136, "start": 3571.36, "end": 3578.48, + "text": " some support for ranking we have this and we have that yeah I can''t I + can''t argue against", "tokens": [50364, 512, 1406, 337, 17833, 321, 362, 341, 293, + 321, 362, 300, 1338, 286, 393, 380, 286, 393, 380, 9695, 
1970, 50720], "temperature": + 0.0, "avg_logprob": -0.1349770264192061, "compression_ratio": 1.7077625570776256, + "no_speech_prob": 0.0035973640624433756}, {"id": 473, "seek": 357136, "start": 3578.48, + "end": 3585.52, "text": " this being convenient so definitely something something + very useful customers yeah yeah just a small", "tokens": [50720, 341, 885, 10851, + 370, 2138, 746, 746, 588, 4420, 4581, 1338, 1338, 445, 257, 1359, 51072], "temperature": + 0.0, "avg_logprob": -0.1349770264192061, "compression_ratio": 1.7077625570776256, + "no_speech_prob": 0.0035973640624433756}, {"id": 474, "seek": 357136, "start": 3585.52, + "end": 3591.52, "text": " correction he''s not a co-founder I think he''s well VP + of engineering or used to be a VP of", "tokens": [51072, 19984, 415, 311, 406, 257, + 598, 12, 33348, 286, 519, 415, 311, 731, 35812, 295, 7043, 420, 1143, 281, 312, + 257, 35812, 295, 51372], "temperature": 0.0, "avg_logprob": -0.1349770264192061, + "compression_ratio": 1.7077625570776256, "no_speech_prob": 0.0035973640624433756}, + {"id": 475, "seek": 357136, "start": 3591.52, "end": 3599.2000000000003, "text": + " engineering in roxette but yeah I mean he''s he brings the story and I encourage + listeners to", "tokens": [51372, 7043, 294, 744, 87, 3007, 457, 1338, 286, 914, + 415, 311, 415, 5607, 264, 1657, 293, 286, 5373, 23274, 281, 51756], "temperature": + 0.0, "avg_logprob": -0.1349770264192061, "compression_ratio": 1.7077625570776256, + "no_speech_prob": 0.0035973640624433756}, {"id": 476, "seek": 359920, "start": 3599.2, + "end": 3607.4399999999996, "text": " listen to the episode he brings the story of + you know roxDB scalability issues from from Facebook and", "tokens": [50364, 2140, + 281, 264, 3500, 415, 5607, 264, 1657, 295, 291, 458, 744, 87, 27735, 15664, 2310, + 2663, 490, 490, 4384, 293, 50776], "temperature": 0.0, "avg_logprob": -0.16489293541706784, + "compression_ratio": 1.5, "no_speech_prob": 0.0019068121910095215}, {"id": 477, + 
"seek": 359920, "start": 3608.64, "end": 3618.16, "text": " how it underpins you + know the the further journey at roxette so I feel like we could discuss", "tokens": + [50836, 577, 309, 833, 79, 1292, 291, 458, 264, 264, 3052, 4671, 412, 744, 87, 3007, + 370, 286, 841, 411, 321, 727, 2248, 51312], "temperature": 0.0, "avg_logprob": -0.16489293541706784, + "compression_ratio": 1.5, "no_speech_prob": 0.0019068121910095215}, {"id": 478, + "seek": 359920, "start": 3618.16, "end": 3624.72, "text": " for five hours and I''m + actually a big fan of Lex Friedman podcasts where some of the episodes", "tokens": + [51312, 337, 1732, 2496, 293, 286, 478, 767, 257, 955, 3429, 295, 24086, 17605, + 1601, 24045, 689, 512, 295, 264, 9313, 51640], "temperature": 0.0, "avg_logprob": + -0.16489293541706784, "compression_ratio": 1.5, "no_speech_prob": 0.0019068121910095215}, + {"id": 479, "seek": 362472, "start": 3625.12, "end": 3631.2, "text": " really really + long and you can listen to them for weeks and and I think I really hope that we + can", "tokens": [50384, 534, 534, 938, 293, 291, 393, 2140, 281, 552, 337, 3259, + 293, 293, 286, 519, 286, 534, 1454, 300, 321, 393, 50688], "temperature": 0.0, "avg_logprob": + -0.08310505963753963, "compression_ratio": 1.6145251396648044, "no_speech_prob": + 0.007138872053474188}, {"id": 480, "seek": 362472, "start": 3631.2, "end": 3639.04, + "text": " record with you sometime later as well as you know as you have topics + to share but is there", "tokens": [50688, 2136, 365, 291, 15053, 1780, 382, 731, + 382, 291, 458, 382, 291, 362, 8378, 281, 2073, 457, 307, 456, 51080], "temperature": + 0.0, "avg_logprob": -0.08310505963753963, "compression_ratio": 1.6145251396648044, + "no_speech_prob": 0.007138872053474188}, {"id": 481, "seek": 362472, "start": 3639.04, + "end": 3645.52, "text": " something Leo that you want to share I don''t know it + could be a paper you''ve read that particularly", "tokens": [51080, 746, 19344, + 300, 291, 528, 281, 
2073, 286, 500, 380, 458, 309, 727, 312, 257, 3035, 291, 600, + 1401, 300, 4098, 51404], "temperature": 0.0, "avg_logprob": -0.08310505963753963, + "compression_ratio": 1.6145251396648044, "no_speech_prob": 0.007138872053474188}, + {"id": 482, "seek": 364552, "start": 3645.52, "end": 3654.24, "text": " excites + you maybe a book or anything else that you want to say yeah I think we", "tokens": + [50364, 1624, 3324, 291, 1310, 257, 1446, 420, 1340, 1646, 300, 291, 528, 281, 584, + 1338, 286, 519, 321, 50800], "temperature": 0.0, "avg_logprob": -0.23083412079584031, + "compression_ratio": 1.4453781512605042, "no_speech_prob": 0.011599970981478691}, + {"id": 483, "seek": 364552, "start": 3657.6, "end": 3665.84, "text": " yeah great + question so so I was interested a lot recently very recently I mean maybe the last", + "tokens": [50968, 1338, 869, 1168, 370, 370, 286, 390, 3102, 257, 688, 3938, 588, + 3938, 286, 914, 1310, 264, 1036, 51380], "temperature": 0.0, "avg_logprob": -0.23083412079584031, + "compression_ratio": 1.4453781512605042, "no_speech_prob": 0.011599970981478691}, + {"id": 484, "seek": 366584, "start": 3666.48, "end": 3676.1600000000003, "text": + " couple of years in how LLAMS can be useful for search in one particular interesting", + "tokens": [50396, 1916, 295, 924, 294, 577, 441, 43, 2865, 50, 393, 312, 4420, 337, + 3164, 294, 472, 1729, 1880, 50880], "temperature": 0.0, "avg_logprob": -0.24238252639770508, + "compression_ratio": 1.372093023255814, "no_speech_prob": 0.006762172561138868}, + {"id": 485, "seek": 366584, "start": 3676.1600000000003, "end": 3685.52, "text": + " direction is how do you use LLAMS to to train smaller models for retrieval and + ranking for me", "tokens": [50880, 3513, 307, 577, 360, 291, 764, 441, 43, 2865, + 50, 281, 281, 3847, 4356, 5245, 337, 19817, 3337, 293, 17833, 337, 385, 51348], + "temperature": 0.0, "avg_logprob": -0.24238252639770508, "compression_ratio": 1.372093023255814, + "no_speech_prob": 
0.006762172561138868}, {"id": 486, "seek": 368552, "start": 3685.52, + "end": 3696.64, "text": " personally it''s a very exciting area of research yeah + as far as distillation is concerned there", "tokens": [50364, 5665, 309, 311, 257, + 588, 4670, 1859, 295, 2132, 1338, 382, 1400, 382, 42923, 399, 307, 5922, 456, 50920], + "temperature": 0.0, "avg_logprob": -0.17282658702922318, "compression_ratio": 1.6047904191616766, + "no_speech_prob": 0.004426660481840372}, {"id": 487, "seek": 368552, "start": 3696.64, + "end": 3707.7599999999998, "text": " was several interesting papers on the topic + there was but basically the lot of of that work", "tokens": [50920, 390, 2940, 1880, + 10577, 322, 264, 4829, 456, 390, 457, 1936, 264, 688, 295, 295, 300, 589, 51476], + "temperature": 0.0, "avg_logprob": -0.17282658702922318, "compression_ratio": 1.6047904191616766, + "no_speech_prob": 0.004426660481840372}, {"id": 488, "seek": 368552, "start": 3707.7599999999998, + "end": 3714.16, "text": " revolves around creation synthetic data synthetic queries + based on the documents", "tokens": [51476, 47934, 926, 8016, 23420, 1412, 23420, + 24109, 2361, 322, 264, 8512, 51796], "temperature": 0.0, "avg_logprob": -0.17282658702922318, + "compression_ratio": 1.6047904191616766, "no_speech_prob": 0.004426660481840372}, + {"id": 489, "seek": 371552, "start": 3715.52, "end": 3722.72, "text": " like we + have a document that creates the queries and queries that you asked that that", + "tokens": [50364, 411, 321, 362, 257, 4166, 300, 7829, 264, 24109, 293, 24109, 300, + 291, 2351, 300, 300, 50724], "temperature": 0.0, "avg_logprob": -0.30367846366686696, + "compression_ratio": 1.87, "no_speech_prob": 0.0007750603253953159}, {"id": 490, + "seek": 371552, "start": 3722.72, "end": 3727.92, "text": " pretty slash question + and the answer is in documents we have a positive relevant document and", "tokens": + [50724, 1238, 17330, 1168, 293, 264, 1867, 307, 294, 8512, 321, 362, 257, 3353, + 7340, 
4166, 293, 50984], "temperature": 0.0, "avg_logprob": -0.30367846366686696, + "compression_ratio": 1.87, "no_speech_prob": 0.0007750603253953159}, {"id": 491, + "seek": 371552, "start": 3727.92, "end": 3736.24, "text": " you can sample negatives + from from your collection and train them while but there is also a line of", "tokens": + [50984, 291, 393, 6889, 40019, 490, 490, 428, 5765, 293, 3847, 552, 1339, 457, 456, + 307, 611, 257, 1622, 295, 51400], "temperature": 0.0, "avg_logprob": -0.30367846366686696, + "compression_ratio": 1.87, "no_speech_prob": 0.0007750603253953159}, {"id": 492, + "seek": 371552, "start": 3736.24, "end": 3744.16, "text": " research where they + they would try to create both queries and documents so yeah in summary the", "tokens": + [51400, 2132, 689, 436, 436, 576, 853, 281, 1884, 1293, 24109, 293, 8512, 370, 1338, + 294, 12691, 264, 51796], "temperature": 0.0, "avg_logprob": -0.30367846366686696, + "compression_ratio": 1.87, "no_speech_prob": 0.0007750603253953159}, {"id": 493, + "seek": 374416, "start": 3745.12, "end": 3753.04, "text": " that whole that whole + not in summary but that that that line of research was particularly interesting", + "tokens": [50412, 300, 1379, 300, 1379, 406, 294, 12691, 457, 300, 300, 300, 1622, + 295, 2132, 390, 4098, 1880, 50808], "temperature": 0.0, "avg_logprob": -0.16441614236404647, + "compression_ratio": 1.625, "no_speech_prob": 0.001531481626443565}, {"id": 494, + "seek": 374416, "start": 3753.04, "end": 3761.92, "text": " to me although there + was some work before LLAMS to create synthetic queries it was not particularly", + "tokens": [50808, 281, 385, 4878, 456, 390, 512, 589, 949, 441, 43, 2865, 50, 281, + 1884, 23420, 24109, 309, 390, 406, 4098, 51252], "temperature": 0.0, "avg_logprob": + -0.16441614236404647, "compression_ratio": 1.625, "no_speech_prob": 0.001531481626443565}, + {"id": 495, "seek": 374416, "start": 3762.7999999999997, "end": 3770.3999999999996, + "text": " well-used 
technique but one paper that stood out was the in-part paper + from a couple of years ago", "tokens": [51296, 731, 12, 4717, 6532, 457, 472, 3035, + 300, 9371, 484, 390, 264, 294, 12, 6971, 3035, 490, 257, 1916, 295, 924, 2057, 51676], + "temperature": 0.0, "avg_logprob": -0.16441614236404647, "compression_ratio": 1.625, + "no_speech_prob": 0.001531481626443565}, {"id": 496, "seek": 377040, "start": 3770.4, + "end": 3778.56, "text": " and we have reproduction of this paper in that paper had + a a pretty quick fall-up from the", "tokens": [50364, 293, 321, 362, 33934, 295, + 341, 3035, 294, 300, 3035, 632, 257, 257, 1238, 1702, 2100, 12, 1010, 490, 264, + 50772], "temperature": 0.0, "avg_logprob": -0.2468146970195155, "compression_ratio": + 1.668639053254438, "no_speech_prob": 0.00166233885101974}, {"id": 497, "seek": 377040, + "start": 3780.8, "end": 3788.2400000000002, "text": " there were several several + authors from Google they called it Protigator where they showed how", "tokens": + [50884, 456, 645, 2940, 2940, 16552, 490, 3329, 436, 1219, 309, 10019, 28895, 689, + 436, 4712, 577, 51256], "temperature": 0.0, "avg_logprob": -0.2468146970195155, + "compression_ratio": 1.668639053254438, "no_speech_prob": 0.00166233885101974}, + {"id": 498, "seek": 377040, "start": 3788.2400000000002, "end": 3798.2400000000002, + "text": " this technique can be improved and there was another fall-up from the + with the same first author", "tokens": [51256, 341, 6532, 393, 312, 9689, 293, 456, + 390, 1071, 2100, 12, 1010, 490, 264, 365, 264, 912, 700, 3793, 51756], "temperature": + 0.0, "avg_logprob": -0.2468146970195155, "compression_ratio": 1.668639053254438, + "no_speech_prob": 0.00166233885101974}, {"id": 499, "seek": 380040, "start": 3800.56, + "end": 3809.36, "text": " now she transitioned to the mind and now they they showed + like like oh like now we do it like", "tokens": [50372, 586, 750, 47346, 281, 264, + 1575, 293, 586, 436, 436, 4712, 411, 411, 1954, 411, 586, 321, 
360, 309, 411, 50812], + "temperature": 0.0, "avg_logprob": -0.16688385009765624, "compression_ratio": 1.7530864197530864, + "no_speech_prob": 0.003513850038871169}, {"id": 500, "seek": 380040, "start": 3809.36, + "end": 3817.2000000000003, "text": " somewhat better but the they they found one + issue with the synthetic query generation approach", "tokens": [50812, 8344, 1101, + 457, 264, 436, 436, 1352, 472, 2734, 365, 264, 23420, 14581, 5125, 3109, 51204], + "temperature": 0.0, "avg_logprob": -0.16688385009765624, "compression_ratio": 1.7530864197530864, + "no_speech_prob": 0.003513850038871169}, {"id": 501, "seek": 380040, "start": 3817.2000000000003, + "end": 3824.08, "text": " that not always the the document that was used to create + the queries the most relevant document", "tokens": [51204, 300, 406, 1009, 264, + 264, 4166, 300, 390, 1143, 281, 1884, 264, 24109, 264, 881, 7340, 4166, 51548], + "temperature": 0.0, "avg_logprob": -0.16688385009765624, "compression_ratio": 1.7530864197530864, + "no_speech_prob": 0.003513850038871169}, {"id": 502, "seek": 382408, "start": 3824.64, + "end": 3832.08, "text": " so you would think it sort of makes sense that if the + question is being answered by this document it", "tokens": [50392, 370, 291, 576, + 519, 309, 1333, 295, 1669, 2020, 300, 498, 264, 1168, 307, 885, 10103, 538, 341, + 4166, 309, 50764], "temperature": 0.0, "avg_logprob": -0.20478145374971277, "compression_ratio": + 1.8803827751196172, "no_speech_prob": 0.002274239668622613}, {"id": 503, "seek": + 382408, "start": 3832.08, "end": 3836.72, "text": " is the most relevant document + that turns out if you ask a question there can be other documents", "tokens": [50764, + 307, 264, 881, 7340, 4166, 300, 4523, 484, 498, 291, 1029, 257, 1168, 456, 393, + 312, 661, 8512, 50996], "temperature": 0.0, "avg_logprob": -0.20478145374971277, + "compression_ratio": 1.8803827751196172, "no_speech_prob": 0.002274239668622613}, + {"id": 504, "seek": 382408, "start": 
3836.72, "end": 3841.68, "text": " that that + answer this question and they can answer that question even better and so they solve + this", "tokens": [50996, 300, 300, 1867, 341, 1168, 293, 436, 393, 1867, 300, 1168, + 754, 1101, 293, 370, 436, 5039, 341, 51244], "temperature": 0.0, "avg_logprob": + -0.20478145374971277, "compression_ratio": 1.8803827751196172, "no_speech_prob": + 0.002274239668622613}, {"id": 505, "seek": 382408, "start": 3841.68, "end": 3849.68, + "text": " problem using you know a relabeling approach so that basically the due + to the will they generate", "tokens": [51244, 1154, 1228, 291, 458, 257, 1039, 455, + 11031, 3109, 370, 300, 1936, 264, 3462, 281, 264, 486, 436, 8460, 51644], "temperature": + 0.0, "avg_logprob": -0.20478145374971277, "compression_ratio": 1.8803827751196172, + "no_speech_prob": 0.002274239668622613}, {"id": 506, "seek": 384968, "start": 3849.68, + "end": 3856.16, "text": " synthetic query from some document they they do it you + and they do", "tokens": [50364, 23420, 14581, 490, 512, 4166, 436, 436, 360, 309, + 291, 293, 436, 360, 50688], "temperature": 0.0, "avg_logprob": -0.1876110277677837, + "compression_ratio": 1.6486486486486487, "no_speech_prob": 0.0014258669689297676}, + {"id": 507, "seek": 384968, "start": 3858.56, "end": 3865.44, "text": " then they + look at the top say 10 documents and they they use another LLM to decide whether", + "tokens": [50808, 550, 436, 574, 412, 264, 1192, 584, 1266, 8512, 293, 436, 436, + 764, 1071, 441, 43, 44, 281, 4536, 1968, 51152], "temperature": 0.0, "avg_logprob": + -0.1876110277677837, "compression_ratio": 1.6486486486486487, "no_speech_prob": + 0.0014258669689297676}, {"id": 508, "seek": 384968, "start": 3866.72, "end": 3872.16, + "text": " these documents are relevant to the query or not yeah it''s also very + interesting paper", "tokens": [51216, 613, 8512, 366, 7340, 281, 264, 14581, 420, + 406, 1338, 309, 311, 611, 588, 1880, 3035, 51488], "temperature": 0.0, 
"avg_logprob": + -0.1876110277677837, "compression_ratio": 1.6486486486486487, "no_speech_prob": + 0.0014258669689297676}, {"id": 509, "seek": 387216, "start": 3873.12, "end": 3881.44, + "text": " as well I yeah and finally the last couple of papers that I encountered + were regarding creation of", "tokens": [50412, 382, 731, 286, 1338, 293, 2721, 264, + 1036, 1916, 295, 10577, 300, 286, 20381, 645, 8595, 8016, 295, 50828], "temperature": + 0.0, "avg_logprob": -0.19037137031555176, "compression_ratio": 1.6136363636363635, + "no_speech_prob": 0.0033114938996732235}, {"id": 510, "seek": 387216, "start": 3882.56, + "end": 3890.64, "text": " either creation of documents either just joined here with + queries or based on the queries this is also", "tokens": [50884, 2139, 8016, 295, + 8512, 2139, 445, 6869, 510, 365, 24109, 420, 2361, 322, 264, 24109, 341, 307, 611, + 51288], "temperature": 0.0, "avg_logprob": -0.19037137031555176, "compression_ratio": + 1.6136363636363635, "no_speech_prob": 0.0033114938996732235}, {"id": 511, "seek": + 387216, "start": 3891.3599999999997, "end": 3897.2799999999997, "text": " very interesting + for long yeah that''s amazing thanks for sharing and I hope we can", "tokens": [51324, + 588, 1880, 337, 938, 1338, 300, 311, 2243, 3231, 337, 5414, 293, 286, 1454, 321, + 393, 51620], "temperature": 0.0, "avg_logprob": -0.19037137031555176, "compression_ratio": + 1.6136363636363635, "no_speech_prob": 0.0033114938996732235}, {"id": 512, "seek": + 389728, "start": 3897.28, "end": 3906.2400000000002, "text": " link all of these + papers in the in the episode you know yeah absolutely because I think one of the", + "tokens": [50364, 2113, 439, 295, 613, 10577, 294, 264, 294, 264, 3500, 291, 458, + 1338, 3122, 570, 286, 519, 472, 295, 264, 50812], "temperature": 0.0, "avg_logprob": + -0.2306030591328939, "compression_ratio": 1.8026905829596414, "no_speech_prob": + 0.006908038165420294}, {"id": 513, "seek": 389728, "start": 3907.28, "end": 3912.88, + 
"text": " goals of this podcast is to continue to be educational resource not just + entertainment maybe some people", "tokens": [50864, 5493, 295, 341, 7367, 307, 281, + 2354, 281, 312, 10189, 7684, 406, 445, 12393, 1310, 512, 561, 51144], "temperature": + 0.0, "avg_logprob": -0.2306030591328939, "compression_ratio": 1.8026905829596414, + "no_speech_prob": 0.006908038165420294}, {"id": 514, "seek": 389728, "start": 3914.0, + "end": 3918.88, "text": " potentially viewed as an entertainment entertainment and + then good sense of word you know when you", "tokens": [51200, 7263, 19174, 382, + 364, 12393, 12393, 293, 550, 665, 2020, 295, 1349, 291, 458, 562, 291, 51444], "temperature": + 0.0, "avg_logprob": -0.2306030591328939, "compression_ratio": 1.8026905829596414, + "no_speech_prob": 0.006908038165420294}, {"id": 515, "seek": 389728, "start": 3919.44, + "end": 3925.92, "text": " want to sort of a little bit like break away from your + daily routine and then listen to some of the", "tokens": [51472, 528, 281, 1333, + 295, 257, 707, 857, 411, 1821, 1314, 490, 428, 5212, 9927, 293, 550, 2140, 281, + 512, 295, 264, 51796], "temperature": 0.0, "avg_logprob": -0.2306030591328939, "compression_ratio": + 1.8026905829596414, "no_speech_prob": 0.006908038165420294}, {"id": 516, "seek": + 392592, "start": 3926.0, "end": 3931.76, "text": " insights and and we heard a lot + of insights today from you thanks a lot for sharing Leo and I wish", "tokens": [50368, + 14310, 293, 293, 321, 2198, 257, 688, 295, 14310, 965, 490, 291, 3231, 257, 688, + 337, 5414, 19344, 293, 286, 3172, 50656], "temperature": 0.0, "avg_logprob": -0.1200817087863354, + "compression_ratio": 1.7808219178082192, "no_speech_prob": 0.003683835733681917}, + {"id": 517, "seek": 392592, "start": 3931.76, "end": 3937.6800000000003, "text": + " you all the all the best in in your in your projects and your current projects + in your future projects", "tokens": [50656, 291, 439, 264, 439, 264, 1151, 294, + 294, 428, 
294, 428, 4455, 293, 428, 2190, 4455, 294, 428, 2027, 4455, 50952], "temperature": + 0.0, "avg_logprob": -0.1200817087863354, "compression_ratio": 1.7808219178082192, + "no_speech_prob": 0.003683835733681917}, {"id": 518, "seek": 392592, "start": 3939.6, + "end": 3947.44, "text": " and yeah I mean I would be all equally excited to talk + to you at some point as well because it does", "tokens": [51048, 293, 1338, 286, + 914, 286, 576, 312, 439, 12309, 2919, 281, 751, 281, 291, 412, 512, 935, 382, 731, + 570, 309, 775, 51440], "temperature": 0.0, "avg_logprob": -0.1200817087863354, "compression_ratio": + 1.7808219178082192, "no_speech_prob": 0.003683835733681917}, {"id": 519, "seek": + 392592, "start": 3947.44, "end": 3953.04, "text": " feel like you have a lot more + to say than I''m able to contain in the in a single episode", "tokens": [51440, + 841, 411, 291, 362, 257, 688, 544, 281, 584, 813, 286, 478, 1075, 281, 5304, 294, + 264, 294, 257, 2167, 3500, 51720], "temperature": 0.0, "avg_logprob": -0.1200817087863354, + "compression_ratio": 1.7808219178082192, "no_speech_prob": 0.003683835733681917}, + {"id": 520, "seek": 395304, "start": 3953.36, "end": 3961.84, "text": " yeah it''s + my pleasure thanks a lot for inviting me I enjoyed the podcast I enjoyed our conversation + very", "tokens": [50380, 1338, 309, 311, 452, 6834, 3231, 257, 688, 337, 18202, + 385, 286, 4626, 264, 7367, 286, 4626, 527, 3761, 588, 50804], "temperature": 0.0, + "avg_logprob": -0.19293814897537231, "compression_ratio": 1.1954022988505748, "no_speech_prob": + 0.005714971572160721}, {"id": 521, "seek": 396184, "start": 3961.84, "end": 3972.4, + "text": " thank you very much Leo and good luck bye bye", "tokens": [50404, 1309, + 291, 588, 709, 19344, 293, 665, 3668, 6543, 6543, 50892], "temperature": 0.0, "avg_logprob": + -0.4470643630394569, "compression_ratio": 0.9, "no_speech_prob": 0.02594118006527424}]' +--- + + Hi everyone, Vector Podcast is back with still with season three and I'm 
super excited to be talking to my guest today, and there is a connection between this episode and the episode that we recorded with Yuri Malkov about one of the most famous and popular vector search algorithms, HNSW. I'm talking today with Leo Boytsov, who is a senior research scientist at AWS, and he is also a co-author of NMSLIB, and NMSLIB is today used in OpenSearch and probably some other places that I actually don't know about, and I hope to learn about them today as well.
+This is just exciting, and I think it goes without saying that the whole field stands on the work done by people like Leo and Yuri and others who actually develop the core algorithms, popularize them, improve them over time, and then the story unfolds from there.
+Hi Leo, how are you doing? Hi, thank you for introducing me, it's a great pleasure to be on the podcast. Yeah, it's my pleasure as well to have you. Traditionally we start with the background.
+Can you say in a few words what your background is, maybe how you got here, and what's your story in search, vector search, and maybe LLMs? Yeah, sure. So the background is pretty long; I've had a rather long career, honestly. Well, in my current capacity, as you mentioned, I am a scientist at AWS AI Labs.
+For one year I was working on code generation; earlier this year I moved to the Q console team, which works on question-answering chatbots that answer questions about various AWS services.
+So you can ask, I don't know, things like where's my EC2 instance and how do I set things up. But I have to make a disclaimer that today I do not speak on behalf of AWS and I cannot talk in detail about my work there. So as I said, I've had a really relatively long career.
+Yeah, so most of, nearly all of my life, I have been a computer science geek with a passion for building cool stuff and solving hard problems. Yet my professional career started in rather mundane fashion.
So I started out working on client and server software for financial systems.
+This was not my favorite subject, but pretty much the only one that paid reasonably well at the time. So I had to do a lot of front-end and back-end engineering using various SQL databases.
+I was not satisfied with my career, but luckily I got really interested in algorithms, in particular retrieval algorithms. So I started working on this topic, first part time, then full time, but largely as a software engineer, less as a researcher.
+And as a software engineer I worked for various companies, including two tiny startups and the Russian search engine Yandex. Later I moved to the United States and worked on the PubMed search engine at the National Center for Biotechnology Information.
+At first, and that was a common theme in my career, I was doing a lot of front-end development, but later I worked primarily on retrieval, the core engine. In particular, I invented a pretty neat way to speed up weighted boolean retrieval.
+Around that time I also realized that it would be hard to get a research position without a good degree. So that motivated me to apply to a bunch of universities, and eventually I got accepted by Carnegie Mellon, which was a huge stroke of luck. So I did my PhD studies there.
+During these studies I worked on a mix of machine learning and algorithms, without any deep learning. So vector search, or rather similarity search, was a part of my graduate studies, but I didn't use any deep learning for it.
+It was a mix of classical machine learning, word2vec-style neural networks, and the like. An interesting part of that story is that my advisor, Eric Nyberg, worked on question answering.
+Together with his team he participated in the development of IBM Watson, that amazing trivia-playing system that in 2011 defeated human champions.
So that was one reason why I chose my advisor; it was such a cool topic to choose.
+But pretty quickly I learned about the system and realized, oh, it's actually largely a search engine on steroids. IBM Watson is retrieval-based; I have a blog post about that if anybody is interested.
+It's basically a retrieval-based extractive question answering system. So if you want to improve question answering, you need to improve retrieval. That's how I got back to working on retrieval algorithms.
+And again, I saw an opportunity there, and my big research question was: how can we do information retrieval using more advanced techniques than lexical search with BM25?
+Because nowadays everybody just uses BERT-based models or other transformer-based models to create dense vector embeddings, and they are quite effective, but that was not the case like 10 years ago, before BERT.
+Whatever we had then was pretty ineffective for retrieval. So my thought was that because no single representation was effective for retrieval, they needed to be somehow combined and ensembled. So you basically don't keep a single representation.
+You use a combination, a combined similarity, you treat this similarity as a black box, and then you apply a generic retrieval algorithm. In hindsight this was a pretty ambitious project that required working on both designing effective similarities
+and retrieval algorithms. And that's where the NMSLIB library turned out to be very useful; it was instrumental to this work, although it was created for a somewhat unrelated purpose.
+Okay, so it was overall a rather bumpy ride: things didn't work well initially, and I got a lot of help from other people, in particular from my co-author David Novak, who proposed an amazing improvement for one of the algorithms in NMSLIB.
+Yeah, and so we published and open-sourced it after my graduation. And when I was writing my thesis, I found a bunch of issues with my previous approaches and realized that I could also use HNSW-like algorithms, which were not a core part of my thesis work.
+And I got even stronger results, but that was a little bit too late to publish or use otherwise. Moreover, there was a sort of facepalm realization about the similarity that I used: I treated it, like retrieval itself, completely as a black box, and it was quite effective.
+It was more effective than BM25 on the collections that I used. But I didn't realize that this black-box similarity was actually representable by an inner product between two large sparse vectors; a former co-author, Chris Dyer, pointed this out.
+If I had embraced this sparse vector approach from the get-go, it would have been a much easier problem to solve from both engineering and scientific points of view, and it could have produced some more impact. But yeah, it's a little bit too late to dwell on this now.
+Okay, and with that, I graduated six years ago. Since then I have been working as a research scientist and engineer on deep learning, specifically on training models for search, computer vision, and retrieval.
+Despite this diversity, things have come full circle and I am working on question answering systems once again. Yeah, that's pretty much it. Yeah, amazing story. Thank you for that.
+The story tends to repeat itself, but at the same time we find the topic still exciting, and it seems like you are still very interested in question answering and improving the building blocks of it, which is kind of cool, right?
+We are able to come back to some of the topics and pick them up on a different level.
+That's amazing. And yeah, there is a lot to unpack. I almost wanted to ask you, the moment you spoke about sparse and dense.
I wanted to pick your brain on what's your take on the model called SPLADE, and SPLADE v2.
+I don't know if you're familiar with that model, but basically, you know, there is always this discussion: should we take lexical search, combine it with dense search and then do some kind of hybrid formula on top, and then how do we even learn the parameters of that formula, right?
+Depending on the domain.
+But then there is a drastically different approach: let's not do that, let's just take a single model which can handle both, and which can also support what dense search doesn't support, like exact phrase searches.
+What's your general intuition about that? How do you think about this? Well, that's a super interesting question. I have one clarifying question though before I answer. You said that some people want to have a single model that's doing both.
+Could you elaborate a little bit on this? Well, I guess maybe it's not that they want it, but it's like the development where, instead of sort of, you know, combining these disparate sources of results, one coming from lexical search, which is the well-known BM25-driven kind, I guess.
+And then the other one is more modern, in the sense that everyone wants to get exposed to dense search. And then you need to somehow figure out how you combine the results, right? So one is designed maybe for precision, the lexical one.
+The other one is designed more for recall, right? Because the dense vectors don't have as many dimensions as the sparse vectors.
+But then you still need to figure out, okay, how do I combine these two? And usually people cite reciprocal rank fusion, that's what I hear, but there are other methods as well, even clustering-based ones. So that's one approach. Another approach is to just stop doing that, I guess.
+If I understand correctly what SPLADE does, you encode your data with SPLADE once, and then you retrieve, you know, you use its capabilities to also retrieve exact phrases, right?
So, effectively, ideally, you don't need the lexical matching engine anymore, but maybe I'm completely wrong.
+I just wanted to hear your opinion on that. Okay, well, let's get to it. Using your words, there's a lot to unpack here. I'm still not quite sure what you mean by having a single model.
+But let me try to start answering, and you can stop me and guide me in another direction if needed.
+So first of all, what's interesting about natural language, and that's very different from the computer vision domain, is that we have multiple ways to represent text.
+So in computer vision, usually each image is just a multi-dimensional array of pixel values, and that's the common representation.
+But in natural language processing, we started with the so-called bag-of-words representations, where a document was represented by basically a sparse vector: either zeros and ones, which mark whether a specific term is present or not, or weights instead of just zeros and ones.
+But then, with the development of deep learning, and it actually started a little bit earlier, people learned how to represent text using fixed-size vectors, for example using principal component analysis.
+And this is not a very natural representation for text, and it didn't work really well initially, but now we're having good results. So we have two representations, and there are different approaches to combine them, of course.
+One: if you want to do retrieval, you can indeed just do the lexical search, you can do a k-nearest-neighbor search over dense vector representations, and then you can somehow merge the results. You can use a reranker.
+But you don't have to. That's the so-called hybrid search, and hybrid search can exist in different versions.
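To make the result-merging variant of hybrid search concrete, here is a minimal sketch of reciprocal rank fusion, the merging method mentioned above. Function and document names are illustrative; k=60 is the constant from the original RRF formulation:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    rankings: list of lists, each ordered best-first (e.g. one from
    BM25, one from dense k-NN search).  The constant k damps the
    influence of the very top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]   # hypothetical BM25 results
dense   = ["d3", "d1", "d4"]   # hypothetical dense-retrieval results
print(reciprocal_rank_fusion([lexical, dense]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents found by both retrievers ("d1", "d3") float to the top without any score normalization, which is why RRF is popular for combining heterogeneous rankers.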
+So if you want to combine it in a single model, why don't you represent each document using both a sparse and a dense vector? And when you're computing the similarity, you can compute the similarity between the sparse parts, between the dense parts, and then combine them somehow.
+For example, using a weight. And that's in fact what I was trying to do in my thesis as well, because my similarity was basically an ensemble of several similarity scores over at least two representations. And that could work.
+There are of course modern instantiations of this; there's a paper, I think by some Google people, where they did exactly this: they combined SPLADE and some dense vector embeddings.
+And that can work apparently a little bit better, or sometimes maybe a lot better, than each representation used separately. But with both approaches, of course, there are the issues that you mentioned.
+So I don't know what the best approach there is, and I don't have a crystal ball regarding the best path forward. But with dense representations, the problem is clearly that you have to pack everything into a fixed-size vector.
+And as your document gets bigger, the vector size, the amount of information you can store, stays the same, but your document increases in size. So you would possibly expect some deterioration in quality.
+But another reason why you can see deteriorating results is just that you have fixed representations while the number of words is huge.
+A regular, educated person knows about 30,000 words, but in reality the internet has millions of words, right? And they are not just ordinary words; there are things like product identifiers, right?
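The single-model combination described above, one weight blending a sparse-part score and a dense-part score, can be sketched as follows. All names and the alpha value are illustrative; in practice the weight would be tuned on validation data:

```python
def sparse_dot(a, b):
    """Dot product of sparse vectors stored as {term_id: weight} dicts."""
    if len(a) > len(b):      # iterate over the smaller dict
        a, b = b, a
    return sum(w * b[t] for t, w in a.items() if t in b)

def dense_dot(a, b):
    """Dot product of dense vectors stored as plain lists."""
    return sum(x * y for x, y in zip(a, b))

def hybrid_score(query, doc, alpha=0.5):
    """Weighted combination of the sparse and dense similarities."""
    return alpha * sparse_dot(query["sparse"], doc["sparse"]) + \
           (1 - alpha) * dense_dot(query["dense"], doc["dense"])

q = {"sparse": {7: 1.0, 42: 0.5}, "dense": [0.1, 0.9]}
d = {"sparse": {7: 2.0},          "dense": [0.2, 0.8]}
print(hybrid_score(q, d))  # 0.5*2.0 + 0.5*(0.02 + 0.72) ≈ 1.37
```

Each document carries both representations, and a single scalar score drives ranking, which is the "single model" flavor of hybrid search the guest describes.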
+If people want to buy a product, they will search for it; they will, you know, copy-paste those identifiers, or type them in, and then those get squished into that dense vector.
+So it cannot be precise.
+There is an interesting paper by Nils Reimers, the Sentence-BERT author, where he gives some experimental and even theoretical evidence that as the collection size increases, dense vector search can deteriorate, just because there would be some false-positive matches due to, you know, squeezing a lot of information together into the fixed-size vector.
+So yeah, I mean, it's quite possible, but I haven't seen a follow-up to this work, so I don't know how much of a problem it is in practice. And coming back to the sparse representations: yeah, they could potentially resolve this issue, but not necessarily with SPLADE-like models.
+Well, the problem with SPLADE is that SPLADE models create those sparse representations using not the words themselves but subword tokens.
+So as a reminder, models like transformer models create this sort of new vocabulary that has some complete words, but most entries are incomplete words.
+They have prefixes, suffixes, parts of words, and this is your new vocabulary. And the difference between this new vocabulary and the actual vocabulary that people use on the internet is that it's limited: it can have maybe 50,000 tokens, maybe 200,000 tokens in some of the advanced models, but we really have millions and millions of words.
+So of course, that would also lead to some deterioration in quality, to false positives, especially if you try to represent long documents using this fixed-size vector. So it's sort of more sparse in some ways, but it's still a fixed-size vector. Does that make sense?
+Yeah, it does.
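The subword fragmentation issue can be illustrated with a toy greedy longest-match tokenizer in the WordPiece style. The tiny vocabulary below is a hypothetical stand-in for a real model's ~30k-token vocabulary, so the exact splits are illustrative, but the effect is the one described: a rare string like a product identifier shatters into generic fragments:

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first subword split, WordPiece style.
    Continuation pieces carry the '##' prefix; if no piece matches,
    the whole word degrades to a single [UNK] token."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"search", "sku", "##12", "##88", "##b", "play", "##er"}
print(wordpiece_tokenize("player", vocab))    # ['play', '##er']
print(wordpiece_tokenize("sku1288b", vocab))  # ['sku', '##12', '##88', '##b']
```

The identifier "sku1288b" is no longer a single indexable unit; its fragments are shared with many unrelated strings, which is exactly why exact matching on such terms suffers.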
+I mean, it's very insightful, what you said. To make my question much more succinct, I could have asked: could we just use SPLADE for everything? Instead of, you know, combining different approaches, just use SPLADE. But you basically answered it really eloquently.
+You said that SPLADE itself has limitations, right? For example, it would not allow us to properly embed the full variety of the language, and then obviously dealing with longer documents is another issue.
+There is an interesting extension to this: I was just recently listening to a presentation on an extensible SPLADE, where they extend the vocabulary of SPLADE by adding entities.
+That's one interesting direction of work, but another interesting direction is the so-called DeepImpact models, where they take a document and do document expansion using, you know, doc2query-style models.
+And then for each token in the document, I think, they learn a weight. So this is a little bit less limited, I think.
+But in the end, basically, to be able to handle those rare words, we need a lexical representation to handle, you know, bigger vocabularies. And that's probably hard to model with just fixed-size vectors. Yeah, it makes a lot of sense.
+At the same time, we also know that, well, it depends on how you model this, but the vanilla lexical approach would miss semantic links, right, and sort of an understanding of the larger context, because all it does is look at the words through the BM25 model.
+And sometimes it just pays attention to some words and doesn't pay attention to other words.
+And it may miss the main point of the query, right?
+But of course, these models still worked, at Yandex and elsewhere, you know it best; they worked previously probably by virtue of training the users: hey, don't give me the full sentence, just give me, you know, specific words, a chopped list of words that I need to look up.
+And that's how, I guess, the inverted index worked out.
+And of course, on top of that, you need to have a very smart reranking strategy to pull up the documents that are really relevant, right?
+But I guess today we have this new, well, I keep calling it new, but it's maybe not necessarily that new, but still fairly fresh, development of dense retrieval, which not many companies, I think, have on-boarded into their products yet.
+But it's a very interesting direction, and still you need to combine the two worlds, right? So it sounds like, from what you said, the only way to get better quality is to combine these approaches rather than trying to develop one single holistic model to handle everything.
+Oh, yeah, it's a great question. I actually don't know what the best path forward is. So I highlighted the deficiencies and advantages of the different approaches.
+But I also want to comment on the DeepImpact model. The way I described it, maybe it sounded like it is a BM25 model, but it's actually not.
+So maybe we should talk about sparse representations, learned sparse representations, because it's a bigger topic, and it's a much bigger topic than most people realize sometimes.
+So people know BM25, people know dense vectors, and these are the simple things, but there is a lot in between.
+So first of all, what you can do, and that's what people did, and doc2query is the most famous way to do so, though it was actually not even a single group of people who proposed this.
+So what can you do?
+We can take a deep learning model, a contextualized model, maybe not necessarily contextualized, but contextualized models do a better job because they look at the document as a whole; they don't look at individual chunks of the document, right?
+So they can kind of understand the total meaning of the document.
+And then they propose new keywords, new terms: some synonyms, synonyms that could have been in this document but are not. And if you add these terms to the document, then those missing synonyms are there.
+You can index this document. So basically this is document expansion. And document expansion helps resolve, mitigate, the lexical mismatch between queries and documents. And I claim it's easier to do this expansion on the document side.
+There are of course approaches that do query expansion, basically adding synonyms at the query stage, but my claim is that it's much harder to do this accurately because there is much less context.
+So this is one, you know, one direction of fixing things and creating sparse representations. Then there is the SPLADE model. What does the SPLADE model do? It's doing something completely different. It looks at the document.
+And there is a vocabulary of, like, BERT tokens. And for each token, it gives you a weight. It looks at the document, sort of understands its meaning, and says: all right, this word, or this prefix of a word, should have this weight. And that's how you get a sparse representation.
+But with DeepImpact, you're doing something slightly different. So you take a document and you do this document expansion, so you add words like synonyms. But then you don't index this document using BM25. Why? Because BM25 is clearly old-style and doesn't take context into account.
+So instead of that, you train a transformer model that gives you a weight for each term in the expanded document. And then you use these weights for retrieval. Oh, that's very neat. And those are called DeepImpact models. Yes. Yeah.
+We should link that; I guess there is a paper for that as well, and we should be able to link it. Yeah. That's very interesting. And it's also interesting, what you mentioned about the dense models sort of not being able to capture everything that you want them to capture.
+And yet, this becomes a building block in the application phase, like for example in RAG, or Retrieval-Augmented Generation, because effectively the only method I've heard of so far, which is circulating a lot, is to just chunk it up.
+You chunk all documents up and then you hope that the chunk size is less than or about the same as the capacity of the model, right? Because otherwise it will chop off the end and you will lose part of the meaning.
+Or you also apply some methods like some level of overlap, right? So you then index a few more chunks for the same entity and then try to query. And then, interestingly, you can generate questions out of chunks, questions that these chunks might be able to answer.
+And then you search over those questions instead of the chunks themselves, right? Which comes back to what you said about doc2query, I guess. So it's very interesting that we are sort of standing on a set of building blocks that themselves should be optimized and optimized and optimized.
+But I guess we are already in the phase, globally, where everyone is trying to derive value from LLMs and RAG and everything, right? And yet we can stumble upon some really tricky situations, like the ones you explained. Oh, it looks like we still have a lot of research topics.
+Yeah. A lot of questions to answer. Yeah.
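The chunking-with-overlap approach described above can be sketched as follows. Sizes are illustrative; real systems typically count model tokens rather than whitespace-separated words:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size word chunks with overlap, so content
    cut at a chunk boundary still appears whole in the adjacent chunk."""
    words = text.split()
    step = chunk_size - overlap          # how far the window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                        # last window already covered the tail
    return chunks

# 500 synthetic "words" -> 3 chunks: words 0-199, 150-349, 300-499
doc = " ".join(f"w{i}" for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
print(len(chunks))  # 3
```

Each chunk would then be embedded (or expanded into generated questions, doc2query-style) and indexed as its own retrieval unit.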
I wanted to digress a little bit from here to the work you've done on NMSLIB, and I want to read it from your GitHub repository: it's the Non-Metric Space Library.
+And I did spend some time, in my earlier life, you know, when I was studying mathematics, and we did study a bunch of, you know, metric spaces. I never really imagined that this highly theoretical stuff would now connect so deeply to practice, and it's amazing.
+But can you tell me why it's a non-metric space library? Isn't the whole idea of, you know, vector search that we choose some metric, cosine or dot product or whatever it is, and that's how we express semantic similarity? Great question.
+So the reason is that we decided not to limit ourselves to metric search, because we felt, and that's also a feeling of other people, that metric search is limiting. It's not expressive enough. That turned out to be true to some degree, but not as much as we hoped.
+And indeed, in many cases... why were we doing this? Representation learning was not as developed as it is now. So we felt like, you know, people will engineer those complex similarities, and we need to support retrieval using these complex similarities.
+This did not happen.
+But what I think happened, and I want to connect this to my statement that at the end of my graduate studies, or rather after defending the thesis, somebody pointed out that the similarities we were using were basically representable as the inner product between two huge sparse vectors.
+So in some sort it becomes similar to either DeepImpact or SPLADE. And in fact, the similarity is the maximum inner product; it's not cosine similarity. And the search procedure is called maximum inner product search.
+So basically, you want to retrieve documents that have the maximum inner product between the query and the document.
+And this is, in some sense, a symmetric similarity measure, but it is not a metric, and it's not easily reducible to cosine similarity. Searching with cosine similarity is actually fully equivalent to searching with the Euclidean distance; for the inner product you can reduce the search to cosine similarity and Euclidean distance, but it turns out that this reduction affects efficiency.
+And that's a somewhat bigger topic for discussion, but what happened is that the people who maintain Lucene were adding support for the maximum inner product.
+And Vespa did this too, and they did it through this trick of reducing the maximum inner product to cosine similarity and L2. And I argued that there is research showing that this is suboptimal, there was a discussion, and as a result they basically didn't do it that way.
+So, long story short, I think non-metric similarity search in general turned out to be not so useful, but there are some instances, like maximum inner product search, where non-metric similarities are widely used.
+Yeah, that's amazing. And I hope that, just as I'm learning, our listeners are also learning from this, because oftentimes when you plunge into a new field, let's say search, all you see is what is being popularized, and you know, you may go down the rabbit hole.
+So I'm really excited and thankful that you are able to share a much wider perspective on things. And then, most interestingly, you said you're a co-author of NMSLIB, alongside other authors.
+Your collective work is also now used, for example, in the OpenSearch engine, which I believe I also had a chance to test at some point: it's a C++ library that is then somehow loaded into the JVM, and basically the search is performed using HNSW.
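The reduction mentioned above, from maximum inner product search to Euclidean nearest-neighbor search, works by appending one extra coordinate to every data vector. A minimal sketch, assuming plain Python lists as vectors (names are illustrative):

```python
import math

def augment_data(vectors):
    """MIPS -> Euclidean NN reduction: append sqrt(M^2 - ||x||^2) to each
    data vector, where M is the largest data-vector norm.  Then
    ||x_aug - q_aug||^2 = M^2 + ||q||^2 - 2<x, q>, so the Euclidean
    nearest neighbor of the zero-padded query maximizes the inner product."""
    max_norm = max(math.sqrt(sum(x * x for x in v)) for v in vectors)
    return [v + [math.sqrt(max_norm ** 2 - sum(x * x for x in v))]
            for v in vectors]

def augment_query(q):
    return q + [0.0]   # extra coordinate contributes nothing for the query

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

data = [[3.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
q = [1.0, 0.5]
aug, aq = augment_data(data), augment_query(q)
best = min(range(len(data)), key=lambda i: euclidean(aug[i], aq))
ips = [sum(x * y for x, y in zip(v, q)) for v in data]
print(best, ips.index(max(ips)))  # both pick the same index
```

The catch the guest alludes to: the augmented data norms all become equal to M, which tends to flatten the geometry and can hurt the efficiency of metric indexes, which is why engines eventually added native inner-product support instead.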
+Can you tell me a bit about that story, of how you ended up connecting these, NMSLIB and HNSW? And here I will probably link to the HNSW paper, which is quite popular today.
+Yeah, well, first of all I have to say that, I mean, the popularity, like close to 100% of NMSLIB's popularity, is certainly due to the development of HNSW, which was Yury's creation, not mine, and we affected it in only very minor ways, because, I mean, we provided the platform, and I think one trick that Yury borrowed, which I had in turn borrowed from KGraph, was the efficient algorithm for checking visited elements, but that was it.
+But NMSLIB itself, it was the creation of several people, and it has a rather wild story; it was never planned, it's sort of random how we developed it.
+So in 2012 I attended a conference where I met Bilegsaikhan Naidan, who was doing his PhD on similarity search, and we found that we shared some interests, particularly in retrieval algorithms, and we decided to do a joint project together. And my initial interest was a somewhat academic topic, non-metric search; as I explained before, it's still largely of academic interest, because a lot of things are really metric, or at most maximum inner product search, which is, in a sense, almost metric.
+And yeah, so that was basically purely academic interest, and I connected it to machine learning because I saw an opportunity to use machine learning to support generic algorithms that would do k-nearest neighbor search with non-metric similarities such as KL divergence.
+ Yeah, so we did it as a machine learning course project, and we published a paper at NIPS, and it could have stopped at this point. But at that conference, or rather at another one, I also met Yury and his co-author Alexander. They both worked at MERA Labs, a company where they developed the small world graph approach, and that was a generic version. And Alexander was really interested to prove something about the algorithms that we had in NMSLIB: we were tackling generic search, in generic spaces, for generic similarities, and he was eager to prove that the graph-based algorithms are actually truly generic. And this is why he and his student created the first version of the small world graph in NMSLIB; they basically contributed this version. It was really super slow; I sped it up by about 10x, and that was the version that we used to win the first ANN-Benchmarks competition. So it was pretty good, but it had issues, and one issue was fixed thanks to Yury sharing with me some early version of HNSW. I looked at the code; it was not the fast version that he created later, but it was already fixing something, and maybe he didn't realize it: he showed me that piece of code and I realized, oh, there is actually still an issue in the SW-graph. So the SW-graph was improved, and then Yury contributed HNSW to NMSLIB, and that was a huge contribution, a big step forward: it won the second ANN-Benchmarks competition, and the improved SW-graph was, I think, the second-best algorithm. I have a screenshot of this somewhere, which I sometimes included in my job talks. And HNSW also influenced Faiss, because they actually knew about KGraph and knew about graph-based retrieval, but there was one important reason why they hadn't used it, you can ask me why. But anyway, it influenced Faiss and a lot of other people, and of course, yeah, Yury created that; it was Yury's work,
a huge impact, and the rest is history. But Yury shouldn't complain: he has had a great career, first at Twitter and now at OpenAI. So yeah, it's a magic story. Just to close the loop: why did Faiss not implement the approach you had? This is a really interesting thing, because that's one of my favorite pieces of this story. Well, it turns out that graph-based retrieval algorithms have a long history, so a lot of this was rediscovered: the pruning heuristics and the basic algorithm go back to papers from the 80s and 90s. But people did not use them, and one hurdle was the inability to efficiently create those k-nearest neighbor graphs. A k-nearest neighbor graph is a simple concept: you have data points, and for each data point you need to find some data points that are its nearest neighbors, and then you connect it to them, modulo some post-modification of this graph. But, you know, if you have n points, the brute-force approach is an n-squared computation. How can you do this efficiently, how can you approximate it? The way it was done before, people were coming up with fancy algorithms for approximating this, and those fancy algorithms were not particularly scalable. And KGraph, I think, is not particularly scalable either: we played with it, we actually incorporated a KGraph-like implementation into NMSLIB, and it was indeed hard to create large graphs, because it's an NN-Descent-style algorithm, and yeah, it's not very scalable. But what Yury and his co-authors did for the small world graph, while they were at MERA Labs, is that they realized they can combine retrieval and creation of the graph, and they can do it efficiently, in, using modern terminology, embarrassingly parallel fashion. And that was, I think, one key missing block that prevented graph-based algorithms from becoming practical. Yeah, that makes sense, it does.
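The "combine retrieval and creation of the graph" idea can be sketched as a toy incremental proximity-graph build: each new point is inserted by searching the graph built so far for its neighbors. This is a simplified NSW-style sketch under stated assumptions (no layers, no neighborhood pruning), not the actual NMSLIB/HNSW implementation:

```python
import heapq, random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean

def greedy_search(graph, points, query, entry, ef=8):
    """Best-first beam search over the current graph; returns up to ef
    (distance, node) pairs, closest first."""
    visited = {entry}
    candidates = [(dist(points[entry], query), entry)]   # min-heap
    results = [(-candidates[0][0], entry)]               # max-heap of kept best
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break          # nearest unexplored candidate beats nothing we kept
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                dn = dist(points[nb], query)
                if len(results) < ef or dn < -results[0][0]:
                    heapq.heappush(candidates, (dn, nb))
                    heapq.heappush(results, (-dn, nb))
                    if len(results) > ef:
                        heapq.heappop(results)
    return sorted((-d, n) for d, n in results)

def build_graph(points, m=4):
    """Insert points one at a time: search the graph built so far for the
    new point's m nearest neighbors, then link to them bidirectionally.
    Construction IS search, which also makes inserts easy to parallelize."""
    graph = {0: []}
    for i in range(1, len(points)):
        neighbors = [n for _, n in greedy_search(graph, points, points[i], entry=0)][:m]
        graph[i] = neighbors
        for n in neighbors:
            graph[n].append(i)
    return graph

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
g = build_graph(pts)
print(greedy_search(g, pts, (0.5, 0.5), entry=0)[0][1])  # approx. nearest node id
```

Because each insertion only reads the existing graph plus appends a few edges, batches of inserts can run largely in parallel, which is the scalability unlock described above.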
What excites me in this story that you shared is how serendipitous the discovery process is, right? Something that feels random leads to, I don't know, the creation of an industry. You could largely say that the new industry of, you know, vector databases and vector search, and now RAG on top of that, was created because you guys worked on practical implementations of something that also stood on the shoulders of, you know, some of the inventions and research done before, so it's kind of a natural progression. But I mean, it's amazing how it was just on the verge of: you not meeting someone at a conference could basically lead to possibly not creating an industry, right? Quite possibly, I think. Well, thank you for the kind words, but of course it's not just because of us; if not for us, I think other people would have done this. But anyway, I think we did useful work, and clearly people are using it a tremendous lot, even though it has a lot of issues; it still ended up being used rather widely. And one reason why it was used so widely is that people needed a library, basically a Python library, that would do k-nearest neighbor search, and do it from Python. People often take these little things for granted, but initially NMSLIB honestly didn't have Python bindings, and to participate in ANN-Benchmarks and have something useful you would need to have Python bindings. These Python bindings were written by Bileg; I didn't create those bindings, he created the first version. So there you go: that made the library possible to use. And at the moment there was not such a big choice of libraries for k-nearest neighbor search. In terms of the competition, there was Annoy, which was noticeably slower; there was another library, FLANN, which used similar algorithms to Annoy but was less
optimized and also slower. Then there was KGraph, but it was not so easily usable, and yeah, basically that was it. Later came Faiss, but it was only released, I think, a couple of years later; it took several years for Faiss to appear. And people started using NMSLIB, so at some point there was a vacuum and we honestly filled it; now other approaches are taking over. So yeah, in summary there was a lot of serendipity, but I wouldn't take credit for the industry; it would have been created without us for sure. Yeah, maybe, or maybe not. And I think it's quite typical of an inventor not to recognize the impact they're making, because the moment they do recognize it, that's probably the end of the story. So you need to constantly stay low-ego and keep pointing at the goal, and it feels like this is your approach, but you also do make quite a bit of impact. I could ask a ton of questions, obviously, and I can also relate to the struggles you explained, like how to optimize these algorithms, because at some point I did embark on participating in the billion-scale ANN benchmarks, and I think I failed miserably. But at the same time I did have some code which worked on a small scale, and one of the building blocks there was HNSW, with a very, very simple intuition: you make, I think, several passes through the dataset and you try to find points in space that are closer to each other, and then you push them into some common bucket; I called them shards. And then you would build an HNSW index for each of those shards. The only thing that I couldn't figure out is that for those shards I still needed an entry point, to quickly identify which shards I should go down through when I search for documents similar to the
query. And I did attempt to modify the HNSW code in NMSLIB, you know, to give me only the first layer of the graph, so that I could pretend that's my layer for entering the shards. I just ran out of time, but it was very exciting. And also, thanks to the organizers, we had access to really beefy machines, which I don't think I put to good use; I was mostly burning, you know, CPU capacity and memory. But I think it's an exciting field, and what I hope is, with the vacuum that you mentioned, that this torch will be carried forward, and someone will get excited about it and not be afraid of trying new things in this space. Are you yourself still looking, well, obviously you're looking after NMSLIB, but is there something that particularly excites you in this field that you would be working on, or are working on?
+ Yeah, great question. So first of all, yeah, I am not sure if I will do any work on vector search; I actually have not been maintaining NMSLIB very well recently, I just didn't have a lot of time, and there was an issue with building it. I will still fix it and support later versions of Python for sure, but it's, you know, piecemeal work: I find, say, half a day to fix the Windows build, and then something else pops up. So yeah, it is an exciting field, but it has also become really busy. And another thing is that this focus, the main focus, is not very appreciated. Like you said, there's a really nice story that it all helped an industry to be created, and maybe it's true to some degree, but is it appreciated by, you know, your potential employer? No, it's like zero appreciation. So it wasn't, and still is, a somewhat niche topic, and most people are of course interested in how you solve intelligence in the, you know, broad sense of the word: how you create models that can see, and how
you can combine them, this new, you know, agentic ecosystem, and all that stuff that really excites people: it is in this plane, or space, of large language models, machine learning, deep learning, intelligence, you name it. So that's why, yeah, I do have ideas, and I did test some of them, but you know, things usually don't work, and I don't have time to think systematically about these issues. Yeah, but I guess at the same time you did create the base for, you know, other people to innovate, and I think it's highly appreciated, really. I also wanted to pick up the topic that originally interested me when you popped up in my LinkedIn feed: you made a statement about relational databases trying to implement a search feature, or search capability, and sort of miserably failing; maybe you didn't use the word miserably, it's my word here. But I wanted to expand on this a little bit: why do you think they tried to do that, and also, while they were trying, what went wrong? Yeah, great question. Well, first of all, I definitely wouldn't say the word miserably, because it has been a success to some degree, definitely, and it's not over until it's over; people are working on this. So, what I have been observing for many, many years, and as I said, I did start my career as a person working on databases, writing a lot of SQL: the typical database is a very different beast from what you typically need for information retrieval. So, first of all, the early databases are row-oriented; they achieve some tradeoff, because they need to have good throughput for both inserts and updates, they need to be able to update information pretty quickly, and reads should also be pretty reasonable. And they also support rather complicated data: what they call the SQL
schema: there can be multiple tables, and all of that needs to be supported. So of course there are tradeoffs to be made to make this possible, again, to support generality, efficient updates, efficient inserts. But at the same time, if you're building an IR system, a lot of this is not necessary. Say you want to do keyword-based retrieval: all you need, at a high level, and this is somewhat of a simplification, is to memorize which keywords appear in which documents. Then you have this so-called inverted index, where for each keyword you have a list of documents in which that keyword appears. It's a much simpler structure, and it permits much more efficient compression algorithms. So again, it's a different beast, also in terms of efficiency of updates: once you compress data and represent it in a special way, it becomes much harder to make the incremental updates for which those early databases were built. So clearly there is a disconnect. It was somewhat reduced with the introduction of so-called columnar databases, but even with columnar databases, I believe they actually do not favor those point updates anymore; they are best used for bulk updates. And once you're doing bulk updates, you're sort of in search engine territory, where you change the index in rather large increments and you don't worry too much about your information being really up to date: you can wait maybe a day, maybe a few hours, rather than having an instantaneous update of the database. So these are different trade-offs. Of course there is a disconnect, and this is why I believe it was always hard to add full-text indexes to regular databases. But another issue with the disconnect is that retrieval often needs a really different set of specialized features. So if
you have a relational database system, it's pretty hard to support, for example, tokenization, especially if you need to do tokenization in multiple languages. That's partly why those specialized tools were created with a lot of features, like Lucene and Vespa, and databases are catching up, but there is still a gap, and it's probably going to be really tedious to support the full set of features needed to, you know, match Vespa. So these are my five cents on this.

Yeah, but I'm curious to understand a little bit why databases are still trying. Why are they trying to encompass these seemingly disparate ways of searching, when, as you explained, if you need a fully blown search engine that can support multiple languages, tokenization and so on, you'd better be using the likes of Lucene, Vespa, or maybe Elasticsearch on top of Lucene? Why are they still trying?

They want customers. It's of course advantageous to be a one-stop shop, so customers come to a specific provider and have everything. I listened to a podcast with the Rockset co-founder, Rockset being the one that was acquired by OpenAI, but I think you recorded that podcast before they were acquired, so good timing, and you can clearly hear that message: we really want people to come and use our solution, so we have hybrid search, we have some support for ranking, we have this and we have that. I can't argue against this being convenient, so it's definitely something very useful for customers.

Yeah, just a small correction: he's not a co-founder, I think he is, or used to be, a VP of Engineering at Rockset. But he brings the story, and I encourage listeners to listen to the episode, he brings the story of RocksDB scalability issues at Facebook and how it underpins the further journey at
Rockset. So I feel like we could discuss for five hours, and I'm actually a big fan of the Lex Fridman podcast, where some of the episodes are really, really long and you can listen to them for weeks. I really hope that we can record with you sometime later as well, as you have more topics to share. But is there something, Leo, that you want to share? It could be a paper you've read that particularly excites you, maybe a book, or anything else you want to say.

Yeah, great question. So I was interested a lot recently, maybe over the last couple of years, in how LLMs can be useful for search. One particularly interesting direction is how you use LLMs to train smaller models for retrieval and ranking; for me personally it's a very exciting area of research. As far as distillation is concerned, there were several interesting papers on the topic. Basically, a lot of that work revolves around creating synthetic data, synthetic queries based on the documents: you have a document, the model creates queries for it, the query is essentially a question whose answer is in the document, so you have a positive, relevant document; then you can sample negatives from your collection and train on that. There is also a line of research where they try to create both the queries and the documents. That whole line of research was particularly interesting to me. Although there was some work before LLMs on creating synthetic queries, it was not a particularly well-used technique. One paper that stood out was the InPars paper from a couple of years ago, and we have a reproduction of this paper. That paper had a pretty quick follow-up from several authors at Google, who called their approach Promptagator and showed how this technique can be improved. And there was another follow-up with the same first
author, who has since transitioned to DeepMind, where they showed they could do it somewhat better. But they found one issue with the synthetic query generation approach: the document that was used to create the query is not always the most relevant document. You would think it makes sense that if the question is answered by this document, it is the most relevant document, but it turns out that if you ask a question, there can be other documents that answer this question, and they can answer it even better. So they solved this problem using a relabeling approach: after they generate a synthetic query from some document, they look at the top, say, ten retrieved documents and use another LLM to decide whether each of these documents is relevant to the query or not. That's a very interesting paper as well. And finally, the last couple of papers I encountered were about creating documents, either jointly with queries or based on the queries; this is also very interesting.

That's amazing, thanks for sharing, and I hope we can link all of these papers in the episode. Absolutely, because I think one of the goals of this podcast is to continue to be an educational resource, not just entertainment, though maybe some people view it as entertainment in the good sense of the word, when you want to break away from your daily routine and listen to some insights, and we heard a lot of insights today from you. Thanks a lot for sharing, Leo, and I wish you all the best in your current and future projects. And I would be equally excited to talk to you again at some point, because it does feel like you have a lot more to say than I'm able to contain in a
single episode. Yeah, it's my pleasure, thanks a lot for inviting me. I enjoyed the podcast, I enjoyed our conversation very much. Thank you very much, Leo, and good luck. Bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md b/transcripts_with_timestamps/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md new file mode 100644 index 0000000..b9151d4 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/doug-turnbull-staff-relevance-engineer-shopify-search-as-a-constant-experimentation-cycle.md @@ -0,0 +1,4317 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=Kpua1Euc-B8

Topics:

00:00 Intro

01:30 Doug’s story in Search

04:55 How Quepid came about

10:57 Relevance as product at Shopify: challenge, process, tools, evaluation

15:36 Search abandonment in Ecommerce

21:30 Rigor in A/B testing

23:53 Turn user intent and content meaning into tokens, not words into tokens

32:11 Use case for vector search in Maps. What about search in other domains?

38:05 Expanding on dense approaches

40:52 Sparse, dense, hybrid anyone?

48:18 Role of HNSW, scalability and new vector databases vs Elasticsearch / Solr dense search

52:12 Doug’s advice to vector database makers

58:19 Learning to Rank: how to start, how to collect data with active learning, what are the ML methods and a mindset

1:12:10 Blending search and recommendation

1:16:08 Search engineer role and key ingredients of managing search projects today

1:20:34 What does a Product Manager do on a Search team?

1:26:50 The magical question of WHY

1:29:08 Doug’s announcements

Show notes:

Doug’s course: https://www.getsphere.com/ml-engineering/ml-powered-search?source=Instructor-Other-070922-vector-pod

Upcoming book: https://www.manning.com/books/ai-powered-search?aaid=1&abid=e47ada24&chan=aips

Doug’s post in Shopify’s blog “Search at Shopify—Range in Data and Engineering is the Future”: https://shopify.engineering/search-at-shopify

Doug’s own blog: https://softwaredoug.com/

Using Bayesian optimization for Elasticsearch relevance: https://www.youtube.com/watch?v=yDcYi-ANJwE&t=1s

Hello LTR: https://github.com/o19s/hello-ltr

Vector Databases: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

Research: Search abandonment has a lasting impact on brand loyalty: https://cloud.google.com/blog/topics/retail/search-abandonment-impacts-retail-sales-brand-loyalty

Quepid: https://quepid.com/

Podcast design: Saurabh Rai [https://twitter.com/srvbhr]

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20221001_071023_bbd8f38e993da204036dc514900a891b.png +pub_date: Sat, 01 Oct 2022 07:32:38 GMT +title: Doug Turnbull - Staff Relevance Engineer, Shopify - Search as a constant experimentation + cycle +url: https://rss.com/podcasts/vector-podcast/638830 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 27.0, "text": " Hello + there, that vector podcast is here. We are rolling in the season two of this podcast. + And so today we have like a breach, so to say from US to Finland.", "tokens": [50364, + 2425, 456, 11, 300, 8062, 7367, 307, 510, 13, 492, 366, 9439, 294, 264, 3196, 732, + 295, 341, 7367, 13, 400, 370, 965, 321, 362, 411, 257, 31086, 11, 370, 281, 584, + 490, 2546, 281, 24869, 13, 51714], "temperature": 0.0, "avg_logprob": -0.3387999883512171, + "compression_ratio": 1.3083333333333333, "no_speech_prob": 0.1380881816148758}, + {"id": 1, "seek": 2700, "start": 28.0, "end": 48.0, "text": " And I''m super excited + to talk to Dr. 
and ball, staff relevance engineer choppy fine and that gives to + be a CTO at open source connections, the company behind so many tools for us relevance + engineers and relevance product managers as I am today.", "tokens": [50414, 400, + 286, 478, 1687, 2919, 281, 751, 281, 2491, 13, 293, 2594, 11, 3525, 32684, 11403, + 7931, 8200, 2489, 293, 300, 2709, 281, 312, 257, 383, 15427, 412, 1269, 4009, 9271, + 11, 264, 2237, 2261, 370, 867, 3873, 337, 505, 32684, 11955, 293, 32684, 1674, 14084, + 382, 286, 669, 965, 13, 51414], "temperature": 0.0, "avg_logprob": -0.3170015508478338, + "compression_ratio": 1.4878048780487805, "no_speech_prob": 0.2716258466243744}, + {"id": 2, "seek": 4800, "start": 49.0, "end": 54.0, "text": " He''s the original + creator of cupid and explainer and also learning to rank rank.", "tokens": [50414, + 634, 311, 264, 3380, 14181, 295, 4414, 327, 293, 2903, 260, 293, 611, 2539, 281, + 6181, 6181, 13, 50664], "temperature": 0.0, "avg_logprob": -0.34256141416488156, + "compression_ratio": 1.6208530805687205, "no_speech_prob": 0.44968533515930176}, + {"id": 3, "seek": 4800, "start": 54.0, "end": 62.0, "text": " My shirt. Yeah, let''s + search. Yeah, cute.com. Awesome. Great to have you here. Hi, how are you doing?", + "tokens": [50664, 1222, 8336, 13, 865, 11, 718, 311, 3164, 13, 865, 11, 4052, 13, + 1112, 13, 10391, 13, 3769, 281, 362, 291, 510, 13, 2421, 11, 577, 366, 291, 884, + 30, 51064], "temperature": 0.0, "avg_logprob": -0.34256141416488156, "compression_ratio": + 1.6208530805687205, "no_speech_prob": 0.44968533515930176}, {"id": 4, "seek": 4800, + "start": 62.0, "end": 75.0, "text": " I''m great. Yeah, I''m doing great. 
Excited + to chat about where search is going and the exciting places that are, you know, + search is going ahead and everything.", "tokens": [51064, 286, 478, 869, 13, 865, + 11, 286, 478, 884, 869, 13, 9368, 1226, 281, 5081, 466, 689, 3164, 307, 516, 293, + 264, 4670, 3190, 300, 366, 11, 291, 458, 11, 3164, 307, 516, 2286, 293, 1203, 13, + 51714], "temperature": 0.0, "avg_logprob": -0.34256141416488156, "compression_ratio": + 1.6208530805687205, "no_speech_prob": 0.44968533515930176}, {"id": 5, "seek": 7500, + "start": 75.0, "end": 87.0, "text": " So finally, I get to be on this podcast. I''m + really excited to be here. Yeah, absolutely. Long overdue and you are the legendary + guest. So I''m super excited to talk to you.", "tokens": [50364, 407, 2721, 11, + 286, 483, 281, 312, 322, 341, 7367, 13, 286, 478, 534, 2919, 281, 312, 510, 13, + 865, 11, 3122, 13, 8282, 19853, 622, 293, 291, 366, 264, 16698, 8341, 13, 407, 286, + 478, 1687, 2919, 281, 751, 281, 291, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.18997544182671441, "compression_ratio": 1.5446428571428572, "no_speech_prob": + 0.22378504276275635}, {"id": 6, "seek": 7500, "start": 87.0, "end": 99.0, "text": + " And a lot to cover, but before we begin, could you spare a few minutes talking + through your background, how you ended up in search, was it an accident or was it + not, was it?", "tokens": [50964, 400, 257, 688, 281, 2060, 11, 457, 949, 321, 1841, + 11, 727, 291, 13798, 257, 1326, 2077, 1417, 807, 428, 3678, 11, 577, 291, 4590, + 493, 294, 3164, 11, 390, 309, 364, 6398, 420, 390, 309, 406, 11, 390, 309, 30, 51564], + "temperature": 0.0, "avg_logprob": -0.18997544182671441, "compression_ratio": 1.5446428571428572, + "no_speech_prob": 0.22378504276275635}, {"id": 7, "seek": 9900, "start": 99.0, "end": + 110.0, "text": " It was mostly an accident. 
So what happened was so for a long time, + the first chapter of my career, the first half was being C and C plus plus developer.", + "tokens": [50364, 467, 390, 5240, 364, 6398, 13, 407, 437, 2011, 390, 370, 337, + 257, 938, 565, 11, 264, 700, 7187, 295, 452, 3988, 11, 264, 700, 1922, 390, 885, + 383, 293, 383, 1804, 1804, 10754, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.1418140411376953, "compression_ratio": 1.48, "no_speech_prob": 0.13387635350227356}, + {"id": 8, "seek": 9900, "start": 110.0, "end": 118.0, "text": " And I kind of got + really into performance, so optimizing speed and in native code. That was a lot + of fun.", "tokens": [50914, 400, 286, 733, 295, 658, 534, 666, 3389, 11, 370, 40425, + 3073, 293, 294, 8470, 3089, 13, 663, 390, 257, 688, 295, 1019, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.1418140411376953, "compression_ratio": 1.48, "no_speech_prob": + 0.13387635350227356}, {"id": 9, "seek": 11800, "start": 118.0, "end": 129.0, "text": + " And I moved down here to Charlottesville in 2012 from the Washington DC area couple, + so I was a couple hours away from my work.", "tokens": [50364, 400, 286, 4259, 760, + 510, 281, 14130, 1521, 279, 8386, 294, 9125, 490, 264, 6149, 9114, 1859, 1916, 11, + 370, 286, 390, 257, 1916, 2496, 1314, 490, 452, 589, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.1206354954663445, "compression_ratio": 1.40625, "no_speech_prob": + 0.19844067096710205}, {"id": 10, "seek": 11800, "start": 129.0, "end": 139.0, "text": + " And I found that like, you know, I was kind of at the time, especially being one + remote employee for an in office company is just a nightmare.", "tokens": [50914, + 400, 286, 1352, 300, 411, 11, 291, 458, 11, 286, 390, 733, 295, 412, 264, 565, 11, + 2318, 885, 472, 8607, 10738, 337, 364, 294, 3398, 2237, 307, 445, 257, 18724, 13, + 51414], "temperature": 0.0, "avg_logprob": -0.1206354954663445, "compression_ratio": + 1.40625, "no_speech_prob": 0.19844067096710205}, {"id": 11, 
"seek": 13900, "start": + 139.0, "end": 146.0, "text": " And we had this neighborhood block party and I decided + to wear a nerdy t shirt just to see like, oh, maybe I''ll meet other developers.", + "tokens": [50364, 400, 321, 632, 341, 7630, 3461, 3595, 293, 286, 3047, 281, 3728, + 257, 18219, 3173, 256, 8336, 445, 281, 536, 411, 11, 1954, 11, 1310, 286, 603, 1677, + 661, 8849, 13, 50714], "temperature": 0.0, "avg_logprob": -0.14923684559171163, + "compression_ratio": 1.5057471264367817, "no_speech_prob": 0.016160063445568085}, + {"id": 12, "seek": 13900, "start": 146.0, "end": 153.0, "text": " And I think this + shirt said something like my code doesn''t have any bugs. It just has features or + something or random features.", "tokens": [50714, 400, 286, 519, 341, 8336, 848, + 746, 411, 452, 3089, 1177, 380, 362, 604, 15120, 13, 467, 445, 575, 4122, 420, 746, + 420, 4974, 4122, 13, 51064], "temperature": 0.0, "avg_logprob": -0.14923684559171163, + "compression_ratio": 1.5057471264367817, "no_speech_prob": 0.016160063445568085}, + {"id": 13, "seek": 15300, "start": 153.0, "end": 166.0, "text": " And I so happened + to run into Eric Pugh, who''s the founder of open source connections and sort of + one thing led to another and I got, I was like, oh, this seems cool. It''s a small + company.", "tokens": [50364, 400, 286, 370, 2011, 281, 1190, 666, 9336, 430, 1984, + 11, 567, 311, 264, 14917, 295, 1269, 4009, 9271, 293, 1333, 295, 472, 551, 4684, + 281, 1071, 293, 286, 658, 11, 286, 390, 411, 11, 1954, 11, 341, 2544, 1627, 13, + 467, 311, 257, 1359, 2237, 13, 51014], "temperature": 0.0, "avg_logprob": -0.13637204286528798, + "compression_ratio": 1.570048309178744, "no_speech_prob": 0.03775877505540848}, + {"id": 14, "seek": 15300, "start": 166.0, "end": 177.0, "text": " Always wanted + to try out consulting and contracting. 
And so, yeah, we ended up getting at the + job and getting more and more into search.", "tokens": [51014, 11270, 1415, 281, + 853, 484, 23682, 293, 36095, 13, 400, 370, 11, 1338, 11, 321, 4590, 493, 1242, 412, + 264, 1691, 293, 1242, 544, 293, 544, 666, 3164, 13, 51564], "temperature": 0.0, + "avg_logprob": -0.13637204286528798, "compression_ratio": 1.570048309178744, "no_speech_prob": + 0.03775877505540848}, {"id": 15, "seek": 17700, "start": 177.0, "end": 185.0, "text": + " Yeah, awesome. And you spend there how long seven more years about eight years + eight years. Yeah, it''s a long.", "tokens": [50364, 865, 11, 3476, 13, 400, 291, + 3496, 456, 577, 938, 3407, 544, 924, 466, 3180, 924, 3180, 924, 13, 865, 11, 309, + 311, 257, 938, 13, 50764], "temperature": 0.0, "avg_logprob": -0.2224594191008923, + "compression_ratio": 1.7198067632850242, "no_speech_prob": 0.3868262767791748}, + {"id": 16, "seek": 17700, "start": 185.0, "end": 188.0, "text": " Yeah, a long time. + Yeah, it was a lot of fun.", "tokens": [50764, 865, 11, 257, 938, 565, 13, 865, + 11, 309, 390, 257, 688, 295, 1019, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.2224594191008923, "compression_ratio": 1.7198067632850242, "no_speech_prob": + 0.3868262767791748}, {"id": 17, "seek": 17700, "start": 188.0, "end": 199.0, "text": + " Yeah, and you''ve done so much. I mean, I was literally in my previous job at + AlphaSense. I was my last. So I spend the 10 and a half years and my last half a + year. 
I was focusing on learning to rank.", "tokens": [50914, 865, 11, 293, 291, + 600, 1096, 370, 709, 13, 286, 914, 11, 286, 390, 3736, 294, 452, 3894, 1691, 412, + 20588, 50, 1288, 13, 286, 390, 452, 1036, 13, 407, 286, 3496, 264, 1266, 293, 257, + 1922, 924, 293, 452, 1036, 1922, 257, 1064, 13, 286, 390, 8416, 322, 2539, 281, + 6181, 13, 51464], "temperature": 0.0, "avg_logprob": -0.2224594191008923, "compression_ratio": + 1.7198067632850242, "no_speech_prob": 0.3868262767791748}, {"id": 18, "seek": 19900, + "start": 199.0, "end": 204.0, "text": " And I could find I could not find a better + resource than hello LTR.", "tokens": [50364, 400, 286, 727, 915, 286, 727, 406, + 915, 257, 1101, 7684, 813, 7751, 441, 25936, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.23945982456207277, "compression_ratio": 1.515, "no_speech_prob": 0.3347674608230591}, + {"id": 19, "seek": 19900, "start": 204.0, "end": 207.0, "text": " Ripple on GitHub + that you have. Yeah.", "tokens": [50614, 497, 23476, 322, 23331, 300, 291, 362, + 13, 865, 13, 50764], "temperature": 0.0, "avg_logprob": -0.23945982456207277, "compression_ratio": + 1.515, "no_speech_prob": 0.3347674608230591}, {"id": 20, "seek": 19900, "start": + 207.0, "end": 208.0, "text": " Yeah.", "tokens": [50764, 865, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.23945982456207277, "compression_ratio": 1.515, "no_speech_prob": + 0.3347674608230591}, {"id": 21, "seek": 19900, "start": 208.0, "end": 219.0, "text": + " And it was it was an amazing journey because first of all, I had to learn it on + the other hand, I had to build like what we could call maybe an infrastructure pipeline + of flywheel of success.", "tokens": [50814, 400, 309, 390, 309, 390, 364, 2243, + 4671, 570, 700, 295, 439, 11, 286, 632, 281, 1466, 309, 322, 264, 661, 1011, 11, + 286, 632, 281, 1322, 411, 437, 321, 727, 818, 1310, 364, 6896, 15517, 295, 3603, + 22830, 295, 2245, 13, 51364], "temperature": 0.0, "avg_logprob": -0.23945982456207277, + 
"compression_ratio": 1.515, "no_speech_prob": 0.3347674608230591}, {"id": 22, "seek": + 21900, "start": 219.0, "end": 228.0, "text": " Yeah, so you''re going to be right. + So yeah, and then you train the model you test and then you validate and so on so + forth. Validate with the users maybe a test.", "tokens": [50364, 865, 11, 370, 291, + 434, 516, 281, 312, 558, 13, 407, 1338, 11, 293, 550, 291, 3847, 264, 2316, 291, + 1500, 293, 550, 291, 29562, 293, 370, 322, 370, 5220, 13, 7188, 327, 473, 365, 264, + 5022, 1310, 257, 1500, 13, 50814], "temperature": 0.0, "avg_logprob": -0.3793260385324289, + "compression_ratio": 1.7016129032258065, "no_speech_prob": 0.5702135562896729}, + {"id": 23, "seek": 21900, "start": 228.0, "end": 245.0, "text": " It was awesome. + I built it entirely on your on your Ripple. And I even thought that said some was + the PR or issue that I created. But anyway, I''m sure yeah. Yeah, it seems like + to me sure you contribute a lot to like I know you contribute a lot to keep it too.", + "tokens": [50814, 467, 390, 3476, 13, 286, 3094, 309, 7696, 322, 428, 322, 428, + 497, 23476, 13, 400, 286, 754, 1194, 300, 848, 512, 390, 264, 11568, 420, 2734, + 300, 286, 2942, 13, 583, 4033, 11, 286, 478, 988, 1338, 13, 865, 11, 309, 2544, + 411, 281, 385, 988, 291, 10586, 257, 688, 281, 411, 286, 458, 291, 10586, 257, 688, + 281, 1066, 309, 886, 13, 51664], "temperature": 0.0, "avg_logprob": -0.3793260385324289, + "compression_ratio": 1.7016129032258065, "no_speech_prob": 0.5702135562896729}, + {"id": 24, "seek": 24500, "start": 245.0, "end": 252.0, "text": " And I think it''s + a great work or constantly huddling on you know, keep it and trying to keep it keep + it going.", "tokens": [50364, 400, 286, 519, 309, 311, 257, 869, 589, 420, 6460, + 276, 26656, 1688, 322, 291, 458, 11, 1066, 309, 293, 1382, 281, 1066, 309, 1066, + 309, 516, 13, 50714], "temperature": 0.0, "avg_logprob": -0.31358371462140766, "compression_ratio": + 1.56, "no_speech_prob": 
0.24467740952968597}, {"id": 25, "seek": 24500, "start": + 252.0, "end": 260.0, "text": " Yeah, and I mean, that''s thanks to your curiosity + that you created it you you kind of saw the nation feed for it.", "tokens": [50714, + 865, 11, 293, 286, 914, 11, 300, 311, 3231, 281, 428, 18769, 300, 291, 2942, 309, + 291, 291, 733, 295, 1866, 264, 4790, 3154, 337, 309, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.31358371462140766, "compression_ratio": 1.56, "no_speech_prob": + 0.24467740952968597}, {"id": 26, "seek": 24500, "start": 260.0, "end": 267.0, "text": + " But also when I came across it, I mean, it was very straight forward to start + using it.", "tokens": [51114, 583, 611, 562, 286, 1361, 2108, 309, 11, 286, 914, + 11, 309, 390, 588, 2997, 2128, 281, 722, 1228, 309, 13, 51464], "temperature": 0.0, + "avg_logprob": -0.31358371462140766, "compression_ratio": 1.56, "no_speech_prob": + 0.24467740952968597}, {"id": 27, "seek": 26700, "start": 267.0, "end": 278.0, "text": + " And of course, it was also learning experience. 
But now every time I joined, let''s + say new new gig, you know, previously silo AI with a large client, you know, WebScale + search, I brought it in.", "tokens": [50364, 400, 295, 1164, 11, 309, 390, 611, + 2539, 1752, 13, 583, 586, 633, 565, 286, 6869, 11, 718, 311, 584, 777, 777, 8741, + 11, 291, 458, 11, 8046, 3425, 78, 7318, 365, 257, 2416, 6423, 11, 291, 458, 11, + 9573, 16806, 1220, 3164, 11, 286, 3038, 309, 294, 13, 50914], "temperature": 0.0, + "avg_logprob": -0.34895896911621094, "compression_ratio": 1.480952380952381, "no_speech_prob": + 0.2197273075580597}, {"id": 28, "seek": 26700, "start": 278.0, "end": 286.0, "text": + " I said, there is no other tool that I know we should just try this and then try + to write a Tom Tom right now as well.", "tokens": [50914, 286, 848, 11, 456, 307, + 572, 661, 2290, 300, 286, 458, 321, 820, 445, 853, 341, 293, 550, 853, 281, 2464, + 257, 5041, 5041, 558, 586, 382, 731, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.34895896911621094, "compression_ratio": 1.480952380952381, "no_speech_prob": + 0.2197273075580597}, {"id": 29, "seek": 28600, "start": 286.0, "end": 289.0, "text": + " So we have it. Oh, that''s awesome. Yeah. 
Yeah.", "tokens": [50364, 407, 321, + 362, 309, 13, 876, 11, 300, 311, 3476, 13, 865, 13, 865, 13, 50514], "temperature": + 0.0, "avg_logprob": -0.2374183506641573, "compression_ratio": 1.615702479338843, + "no_speech_prob": 0.2941092550754547}, {"id": 30, "seek": 28600, "start": 289.0, + "end": 299.0, "text": " Yeah, Cupid has a funny origin story where in sort of like + dovetails with my story, but for a long time, opens for connections.", "tokens": + [50514, 865, 11, 383, 6127, 575, 257, 4074, 4957, 1657, 689, 294, 1333, 295, 411, + 360, 9771, 6227, 365, 452, 1657, 11, 457, 337, 257, 938, 565, 11, 9870, 337, 9271, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.2374183506641573, "compression_ratio": + 1.615702479338843, "no_speech_prob": 0.2941092550754547}, {"id": 31, "seek": 28600, + "start": 299.0, "end": 312.0, "text": " And I think this is true of a lot of places + in the early 2010s. It was pretty easy to build you we would build these beautiful + search apps. And that would be part of our, our consulting as we build these search + apps.", "tokens": [51014, 400, 286, 519, 341, 307, 2074, 295, 257, 688, 295, 3190, + 294, 264, 2440, 9657, 82, 13, 467, 390, 1238, 1858, 281, 1322, 291, 321, 576, 1322, + 613, 2238, 3164, 7733, 13, 400, 300, 576, 312, 644, 295, 527, 11, 527, 23682, 382, + 321, 1322, 613, 3164, 7733, 13, 51664], "temperature": 0.0, "avg_logprob": -0.2374183506641573, + "compression_ratio": 1.615702479338843, "no_speech_prob": 0.2941092550754547}, {"id": + 32, "seek": 31200, "start": 312.0, "end": 321.0, "text": " And they would be beautiful + and they look pretty, but then only at the very end with someone type in a search + and you would see, like these results don''t make any sense.", "tokens": [50364, + 400, 436, 576, 312, 2238, 293, 436, 574, 1238, 11, 457, 550, 787, 412, 264, 588, + 917, 365, 1580, 2010, 294, 257, 3164, 293, 291, 576, 536, 11, 411, 613, 3542, 500, + 380, 652, 604, 2020, 13, 50814], "temperature": 0.0, "avg_logprob": 
-0.12127276694420541, + "compression_ratio": 1.6693227091633467, "no_speech_prob": 0.015008989721536636}, + {"id": 33, "seek": 31200, "start": 321.0, "end": 327.0, "text": " And then like + people panic and they want to fix it. They''re about to go to market. They can''t + release like this.", "tokens": [50814, 400, 550, 411, 561, 14783, 293, 436, 528, + 281, 3191, 309, 13, 814, 434, 466, 281, 352, 281, 2142, 13, 814, 393, 380, 4374, + 411, 341, 13, 51114], "temperature": 0.0, "avg_logprob": -0.12127276694420541, "compression_ratio": + 1.6693227091633467, "no_speech_prob": 0.015008989721536636}, {"id": 34, "seek": + 31200, "start": 327.0, "end": 334.0, "text": " And they realize of the search engine + isn''t some magic black box. It''s actually this thing that we have to configure + and tune and stuff.", "tokens": [51114, 400, 436, 4325, 295, 264, 3164, 2848, 1943, + 380, 512, 5585, 2211, 2424, 13, 467, 311, 767, 341, 551, 300, 321, 362, 281, 22162, + 293, 10864, 293, 1507, 13, 51464], "temperature": 0.0, "avg_logprob": -0.12127276694420541, + "compression_ratio": 1.6693227091633467, "no_speech_prob": 0.015008989721536636}, + {"id": 35, "seek": 33400, "start": 334.0, "end": 342.0, "text": " And so Cupid actually + started because, and there''s an old Lucene revolution video that talks about this.", + "tokens": [50364, 400, 370, 383, 6127, 767, 1409, 570, 11, 293, 456, 311, 364, 1331, + 9593, 1450, 8894, 960, 300, 6686, 466, 341, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.19567819075150925, "compression_ratio": 1.5165289256198347, "no_speech_prob": + 0.03226356580853462}, {"id": 36, "seek": 33400, "start": 342.0, "end": 351.0, "text": + " But John Barryman, my coworker at the time, and I would go to our client also + in Charlottesville, Silverchair.", "tokens": [50764, 583, 2619, 21639, 1601, 11, + 452, 31998, 260, 412, 264, 565, 11, 293, 286, 576, 352, 281, 527, 6423, 611, 294, + 14130, 1521, 279, 8386, 11, 15861, 17892, 13, 51214], "temperature": 0.0, 
"avg_logprob": + -0.19567819075150925, "compression_ratio": 1.5165289256198347, "no_speech_prob": + 0.03226356580853462}, {"id": 37, "seek": 33400, "start": 351.0, "end": 360.0, "text": + " And we were helping them develop these search applications. And as like they would + tune like constantly we go back every week and try to fix something.", "tokens": + [51214, 400, 321, 645, 4315, 552, 1499, 613, 3164, 5821, 13, 400, 382, 411, 436, + 576, 10864, 411, 6460, 321, 352, 646, 633, 1243, 293, 853, 281, 3191, 746, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.19567819075150925, "compression_ratio": 1.5165289256198347, + "no_speech_prob": 0.03226356580853462}, {"id": 38, "seek": 36000, "start": 360.0, + "end": 376.0, "text": " And then we would end up breaking something else. So I finally + got kind of tired of it. And I just sat there and built like a at the time of Python + flask app that was just let''s show these search results and like just label them + as good or bad.", "tokens": [50364, 400, 550, 321, 576, 917, 493, 7697, 746, 1646, + 13, 407, 286, 2721, 658, 733, 295, 5868, 295, 309, 13, 400, 286, 445, 3227, 456, + 293, 3094, 411, 257, 412, 264, 565, 295, 15329, 932, 3863, 724, 300, 390, 445, 718, + 311, 855, 613, 3164, 3542, 293, 411, 445, 7645, 552, 382, 665, 420, 1578, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.11197344049230798, "compression_ratio": 1.5323383084577114, + "no_speech_prob": 0.021895218640565872}, {"id": 39, "seek": 36000, "start": 376.0, + "end": 380.0, "text": " And so we don''t have to keep going backwards on on our + quality.", "tokens": [51164, 400, 370, 321, 500, 380, 362, 281, 1066, 516, 12204, + 322, 322, 527, 3125, 13, 51364], "temperature": 0.0, "avg_logprob": -0.11197344049230798, + "compression_ratio": 1.5323383084577114, "no_speech_prob": 0.021895218640565872}, + {"id": 40, "seek": 38000, "start": 380.0, "end": 388.0, "text": " And he was I was + literally creating the apple he was sitting there like trying to tune 
search with + with our client.", "tokens": [50364, 400, 415, 390, 286, 390, 3736, 4084, 264, 10606, + 415, 390, 3798, 456, 411, 1382, 281, 10864, 3164, 365, 365, 527, 6423, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.22270767056212135, "compression_ratio": 1.575, + "no_speech_prob": 0.08262453973293304}, {"id": 41, "seek": 38000, "start": 388.0, + "end": 396.0, "text": " Reena Morse at Silverchair. So it was kind of like hacked + together in an hour and then we started using it.", "tokens": [50764, 1300, 4118, + 5146, 405, 412, 15861, 17892, 13, 407, 309, 390, 733, 295, 411, 36218, 1214, 294, + 364, 1773, 293, 550, 321, 1409, 1228, 309, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.22270767056212135, "compression_ratio": 1.575, "no_speech_prob": 0.08262453973293304}, + {"id": 42, "seek": 38000, "start": 396.0, "end": 408.0, "text": " This is so cool. + And I mean, for me, Cupid, I mean this topic of quality assurance and searches big, + I think, right. And maybe under valued. I''m not sure.", "tokens": [51164, 639, + 307, 370, 1627, 13, 400, 286, 914, 11, 337, 385, 11, 383, 6127, 11, 286, 914, 341, + 4829, 295, 3125, 32189, 293, 26701, 955, 11, 286, 519, 11, 558, 13, 400, 1310, 833, + 22608, 13, 286, 478, 406, 988, 13, 51764], "temperature": 0.0, "avg_logprob": -0.22270767056212135, + "compression_ratio": 1.575, "no_speech_prob": 0.08262453973293304}, {"id": 43, "seek": + 40800, "start": 408.0, "end": 410.0, "text": " Yeah, it is. 
Yeah, totally.", "tokens": + [50364, 865, 11, 309, 307, 13, 865, 11, 3879, 13, 50464], "temperature": 0.0, "avg_logprob": + -0.1613038031609504, "compression_ratio": 1.6372093023255814, "no_speech_prob": + 0.06987298280000687}, {"id": 44, "seek": 40800, "start": 410.0, "end": 419.0, "text": + " But you know, like at Alphasense, for example, I had access to people who used + to be financial analysts or they deeply understand, you know, content.", "tokens": + [50464, 583, 291, 458, 11, 411, 412, 967, 7485, 1288, 11, 337, 1365, 11, 286, 632, + 2105, 281, 561, 567, 1143, 281, 312, 4669, 31388, 420, 436, 8760, 1223, 11, 291, + 458, 11, 2701, 13, 50914], "temperature": 0.0, "avg_logprob": -0.1613038031609504, + "compression_ratio": 1.6372093023255814, "no_speech_prob": 0.06987298280000687}, + {"id": 45, "seek": 40800, "start": 419.0, "end": 429.0, "text": " And it''s so important + to understand content like brokerage versus, you know, sell site versus buy site. + What is it? What is this? You know, what people are looking there for.", "tokens": + [50914, 400, 309, 311, 370, 1021, 281, 1223, 2701, 411, 26502, 609, 5717, 11, 291, + 458, 11, 3607, 3621, 5717, 2256, 3621, 13, 708, 307, 309, 30, 708, 307, 341, 30, + 509, 458, 11, 437, 561, 366, 1237, 456, 337, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.1613038031609504, "compression_ratio": 1.6372093023255814, "no_speech_prob": + 0.06987298280000687}, {"id": 46, "seek": 42900, "start": 429.0, "end": 440.0, "text": + " And I remember one of the guys on that product team, he said, well, this is fantastic. 
+ Now I can explain to you what I need in terms of relevancy without getting into + the weeds of your algorithm.", "tokens": [50364, 400, 286, 1604, 472, 295, 264, + 1074, 322, 300, 1674, 1469, 11, 415, 848, 11, 731, 11, 341, 307, 5456, 13, 823, + 286, 393, 2903, 281, 291, 437, 286, 643, 294, 2115, 295, 25916, 6717, 1553, 1242, + 666, 264, 26370, 295, 428, 9284, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.16919192561396845, "compression_ratio": 1.599236641221374, "no_speech_prob": + 0.35896414518356323}, {"id": 47, "seek": 42900, "start": 440.0, "end": 443.0, "text": + " And you then hold away and get there, right.", "tokens": [50914, 400, 291, 550, + 1797, 1314, 293, 483, 456, 11, 558, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.16919192561396845, "compression_ratio": 1.599236641221374, "no_speech_prob": + 0.35896414518356323}, {"id": 48, "seek": 42900, "start": 443.0, "end": 444.0, "text": + " So I mean, hold away.", "tokens": [51064, 407, 286, 914, 11, 1797, 1314, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.16919192561396845, "compression_ratio": 1.599236641221374, + "no_speech_prob": 0.35896414518356323}, {"id": 49, "seek": 42900, "start": 444.0, + "end": 454.0, "text": " That''s fantastic. 
And I remember like at WebScale search, + we got stuck a little bit like optimizing our KPI metrics, like one of them is click + through rate.", "tokens": [51114, 663, 311, 5456, 13, 400, 286, 1604, 411, 412, + 9573, 16806, 1220, 3164, 11, 321, 658, 5541, 257, 707, 857, 411, 40425, 527, 591, + 31701, 16367, 11, 411, 472, 295, 552, 307, 2052, 807, 3314, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.16919192561396845, "compression_ratio": 1.599236641221374, + "no_speech_prob": 0.35896414518356323}, {"id": 50, "seek": 45400, "start": 454.0, + "end": 467.0, "text": " And I remember when I onboarded Cupid, we generated literally + 70 juratickets as a result of analyzing first annotating, rating the queries and + then analyzing what went wrong there, right.", "tokens": [50364, 400, 286, 1604, + 562, 286, 24033, 292, 383, 6127, 11, 321, 10833, 3736, 5285, 12721, 267, 38748, + 382, 257, 1874, 295, 23663, 700, 25339, 990, 11, 10990, 264, 24109, 293, 550, 23663, + 437, 1437, 2085, 456, 11, 558, 13, 51014], "temperature": 0.0, "avg_logprob": -0.2520988560930083, + "compression_ratio": 1.540909090909091, "no_speech_prob": 0.092001773416996}, {"id": + 51, "seek": 45400, "start": 467.0, "end": 476.0, "text": " And probably like half + of this at least was data related issues, which you would think, hey, this is Cupid. + This is about relevance and not about data.", "tokens": [51014, 400, 1391, 411, + 1922, 295, 341, 412, 1935, 390, 1412, 4077, 2663, 11, 597, 291, 576, 519, 11, 4177, + 11, 341, 307, 383, 6127, 13, 639, 307, 466, 32684, 293, 406, 466, 1412, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.2520988560930083, "compression_ratio": 1.540909090909091, + "no_speech_prob": 0.092001773416996}, {"id": 52, "seek": 47600, "start": 476.0, + "end": 496.0, "text": " Oh, yeah, you find that stuff all the time. Yeah. 
We would, + we would, we kind of had this model at open source connections that worked well + where, you know, you, you come in as a consultant, you''re trying to, we would, + we started consulting in the search relevance basically exclusively and we would + come in and instead of", "tokens": [50364, 876, 11, 1338, 11, 291, 915, 300, 1507, + 439, 264, 565, 13, 865, 13, 492, 576, 11, 321, 576, 11, 321, 733, 295, 632, 341, + 2316, 412, 1269, 4009, 9271, 300, 2732, 731, 689, 11, 291, 458, 11, 291, 11, 291, + 808, 294, 382, 257, 24676, 11, 291, 434, 1382, 281, 11, 321, 576, 11, 321, 1409, + 23682, 294, 264, 3164, 32684, 1936, 20638, 293, 321, 576, 808, 294, 293, 2602, 295, + 51364], "temperature": 0.0, "avg_logprob": -0.18041627030623586, "compression_ratio": + 1.6512820512820512, "no_speech_prob": 0.11229746788740158}, {"id": 53, "seek": 49600, + "start": 496.0, "end": 512.0, "text": " sometimes, you know, it''s a very data driven + process and it needs to be, but on the other times it''s like just jumping in. 
Let''s + start with 12 queries and let''s label what''s what the good results are and improve + those.", "tokens": [50364, 2171, 11, 291, 458, 11, 309, 311, 257, 588, 1412, 9555, + 1399, 293, 309, 2203, 281, 312, 11, 457, 322, 264, 661, 1413, 309, 311, 411, 445, + 11233, 294, 13, 961, 311, 722, 365, 2272, 24109, 293, 718, 311, 7645, 437, 311, + 437, 264, 665, 3542, 366, 293, 3470, 729, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.1266712167046287, "compression_ratio": 1.6813725490196079, "no_speech_prob": + 0.008132540620863438}, {"id": 54, "seek": 49600, "start": 512.0, "end": 518.0, "text": + " And then we would kind of go through these sprints of, okay, let''s take the next + 12 queries, let''s take the next 12 queries.", "tokens": [51164, 400, 550, 321, + 576, 733, 295, 352, 807, 613, 6103, 8654, 295, 11, 1392, 11, 718, 311, 747, 264, + 958, 2272, 24109, 11, 718, 311, 747, 264, 958, 2272, 24109, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.1266712167046287, "compression_ratio": 1.6813725490196079, + "no_speech_prob": 0.008132540620863438}, {"id": 55, "seek": 51800, "start": 518.0, + "end": 543.0, "text": " And you just constantly like gradually expand the envelope + of what you''re tuning and actually worked really well as a practice for improving + relevancy without having to spend like sometimes, you know, the, you know, places + don''t necessarily have months to spend boots trapping like a click stream pipeline + and understanding clicks and all the biases and things and that.", "tokens": [50364, + 400, 291, 445, 6460, 411, 13145, 5268, 264, 19989, 295, 437, 291, 434, 15164, 293, + 767, 2732, 534, 731, 382, 257, 3124, 337, 11470, 25916, 6717, 1553, 1419, 281, 3496, + 411, 2171, 11, 291, 458, 11, 264, 11, 291, 458, 11, 3190, 500, 380, 4725, 362, 2493, + 281, 3496, 15194, 944, 3759, 411, 257, 2052, 4309, 15517, 293, 3701, 18521, 293, + 439, 264, 32152, 293, 721, 293, 300, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.10533368097592706, 
"compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.23714907467365265}, {"id": 56, "seek": 54300, "start": 544.0, "end": 548.0, "text": + " Or, you know, and so it''s just a really straightforward way to get started on + the problem.", "tokens": [50414, 1610, 11, 291, 458, 11, 293, 370, 309, 311, 445, + 257, 534, 15325, 636, 281, 483, 1409, 322, 264, 1154, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.18779660761356354, "compression_ratio": 1.6595744680851063, + "no_speech_prob": 0.1724207103252411}, {"id": 57, "seek": 54300, "start": 548.0, + "end": 557.0, "text": " Yeah, absolutely. And I don''t know if you could imagine + this, but when I was a consultant, I had like a breather two months between, you + know, that client and Tom Tom.", "tokens": [50614, 865, 11, 3122, 13, 400, 286, + 500, 380, 458, 498, 291, 727, 3811, 341, 11, 457, 562, 286, 390, 257, 24676, 11, + 286, 632, 411, 257, 3656, 511, 732, 2493, 1296, 11, 291, 458, 11, 300, 6423, 293, + 5041, 5041, 13, 51064], "temperature": 0.0, "avg_logprob": -0.18779660761356354, + "compression_ratio": 1.6595744680851063, "no_speech_prob": 0.1724207103252411}, + {"id": 58, "seek": 54300, "start": 557.0, "end": 570.0, "text": " So I consulted + you startups, one of them in the US. And so they look at us, we said, you know, + you come in and they think you can do magic and I said, okay, maybe I can''t, but + I will not tell you. So no doubt.", "tokens": [51064, 407, 286, 47941, 291, 28041, + 11, 472, 295, 552, 294, 264, 2546, 13, 400, 370, 436, 574, 412, 505, 11, 321, 848, + 11, 291, 458, 11, 291, 808, 294, 293, 436, 519, 291, 393, 360, 5585, 293, 286, 848, + 11, 1392, 11, 1310, 286, 393, 380, 11, 457, 286, 486, 406, 980, 291, 13, 407, 572, + 6385, 13, 51714], "temperature": 0.0, "avg_logprob": -0.18779660761356354, "compression_ratio": + 1.6595744680851063, "no_speech_prob": 0.1724207103252411}, {"id": 59, "seek": 57000, + "start": 571.0, "end": 588.0, "text": " In front of you. 
So I came there and I said, + hey, how are you doing QA and it should need this massive Excel with colored legend + and like what not. And I said, well, this is cool, but I think it''s not repeatable. + And they said, yeah, it''s a big pain point. I said, let''s do something better, + something else.", "tokens": [50414, 682, 1868, 295, 291, 13, 407, 286, 1361, 456, + 293, 286, 848, 11, 4177, 11, 577, 366, 291, 884, 1249, 32, 293, 309, 820, 643, 341, + 5994, 19060, 365, 14332, 9451, 293, 411, 437, 406, 13, 400, 286, 848, 11, 731, 11, + 341, 307, 1627, 11, 457, 286, 519, 309, 311, 406, 7149, 712, 13, 400, 436, 848, + 11, 1338, 11, 309, 311, 257, 955, 1822, 935, 13, 286, 848, 11, 718, 311, 360, 746, + 1101, 11, 746, 1646, 13, 51264], "temperature": 0.0, "avg_logprob": -0.2004392941792806, + "compression_ratio": 1.566326530612245, "no_speech_prob": 0.4212491810321808}, {"id": + 60, "seek": 58800, "start": 589.0, "end": 593.0, "text": " And I introduced QBIT, + it took probably a couple months.", "tokens": [50414, 400, 286, 7268, 1249, 33, + 3927, 11, 309, 1890, 1391, 257, 1916, 2493, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.17311901516384548, "compression_ratio": 1.4951456310679612, "no_speech_prob": + 0.4992668032646179}, {"id": 61, "seek": 58800, "start": 593.0, "end": 606.0, "text": + " Then I lost touch with that startup. So I switched to consulting others. They + didn''t reach out. Then I reached out and I said, hey, what''s the status. 
And they + said, you will be surprised, but we have moved the whole QA process to keep it as + like wow.", "tokens": [50614, 1396, 286, 2731, 2557, 365, 300, 18578, 13, 407, 286, + 16858, 281, 23682, 2357, 13, 814, 994, 380, 2524, 484, 13, 1396, 286, 6488, 484, + 293, 286, 848, 11, 4177, 11, 437, 311, 264, 6558, 13, 400, 436, 848, 11, 291, 486, + 312, 6100, 11, 457, 321, 362, 4259, 264, 1379, 1249, 32, 1399, 281, 1066, 309, 382, + 411, 6076, 13, 51264], "temperature": 0.0, "avg_logprob": -0.17311901516384548, + "compression_ratio": 1.4951456310679612, "no_speech_prob": 0.4992668032646179}, + {"id": 62, "seek": 60600, "start": 606.0, "end": 619.0, "text": " Oh, that''s awesome. + Yeah. This is the touch and feeling of what you have created for your use cases, + worked for someone else''s case. Isn''t it amazing feeling.", "tokens": [50364, + 876, 11, 300, 311, 3476, 13, 865, 13, 639, 307, 264, 2557, 293, 2633, 295, 437, + 291, 362, 2942, 337, 428, 764, 3331, 11, 2732, 337, 1580, 1646, 311, 1389, 13, 6998, + 380, 309, 2243, 2633, 13, 51014], "temperature": 0.0, "avg_logprob": -0.1974445203455483, + "compression_ratio": 1.3057851239669422, "no_speech_prob": 0.1997288465499878}, + {"id": 63, "seek": 61900, "start": 619.0, "end": 648.0, "text": " Yeah, it''s great. + Yeah, it''s funny how that works. You know, if you solve your own problem really + well, you know, there''s probably other people out there that are like you that + have the same problem and appreciate that perspective on the problem. 
So, so yeah, + I think that that''s kind of a truism is like, don''t worry about solving a if you + have a need, if you have a need, solve the problem for yourself as the most important + audience.", "tokens": [50364, 865, 11, 309, 311, 869, 13, 865, 11, 309, 311, 4074, + 577, 300, 1985, 13, 509, 458, 11, 498, 291, 5039, 428, 1065, 1154, 534, 731, 11, + 291, 458, 11, 456, 311, 1391, 661, 561, 484, 456, 300, 366, 411, 291, 300, 362, + 264, 912, 1154, 293, 4449, 300, 4585, 322, 264, 1154, 13, 407, 11, 370, 1338, 11, + 286, 519, 300, 300, 311, 733, 295, 257, 504, 84, 1434, 307, 411, 11, 500, 380, 3292, + 466, 12606, 257, 498, 291, 362, 257, 643, 11, 498, 291, 362, 257, 643, 11, 5039, + 264, 1154, 337, 1803, 382, 264, 881, 1021, 4034, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.16434587751116073, "compression_ratio": 1.794238683127572, "no_speech_prob": + 0.11655937880277634}, {"id": 64, "seek": 64900, "start": 649.0, "end": 656.0, "text": + " And it will sort of naturally find the people like you have the exact same problem.", + "tokens": [50364, 400, 309, 486, 1333, 295, 8195, 915, 264, 561, 411, 291, 362, + 264, 1900, 912, 1154, 13, 50714], "temperature": 0.0, "avg_logprob": -0.15115780665956693, + "compression_ratio": 1.4129032258064516, "no_speech_prob": 0.07939369976520538}, + {"id": 65, "seek": 64900, "start": 656.0, "end": 661.0, "text": " Yeah, fantastic. + And now you are at Shopify. Yeah.", "tokens": [50714, 865, 11, 5456, 13, 400, 586, + 291, 366, 412, 43991, 13, 865, 13, 50964], "temperature": 0.0, "avg_logprob": -0.15115780665956693, + "compression_ratio": 1.4129032258064516, "no_speech_prob": 0.07939369976520538}, + {"id": 66, "seek": 64900, "start": 661.0, "end": 667.0, "text": " So how do you + structure your work there? 
This is my how part in the podcast as well.", "tokens": + [50964, 407, 577, 360, 291, 3877, 428, 589, 456, 30, 639, 307, 452, 577, 644, 294, + 264, 7367, 382, 731, 13, 51264], "temperature": 0.0, "avg_logprob": -0.15115780665956693, + "compression_ratio": 1.4129032258064516, "no_speech_prob": 0.07939369976520538}, + {"id": 67, "seek": 66700, "start": 667.0, "end": 676.0, "text": " You''re building + your product is relevancy in many ways, right? And maybe performance of the search + engine because there are trade-offs. Yeah.", "tokens": [50364, 509, 434, 2390, 428, + 1674, 307, 25916, 6717, 294, 867, 2098, 11, 558, 30, 400, 1310, 3389, 295, 264, + 3164, 2848, 570, 456, 366, 4923, 12, 19231, 13, 865, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.21824804166468179, "compression_ratio": 1.5367965367965368, + "no_speech_prob": 0.555931568145752}, {"id": 68, "seek": 66700, "start": 676.0, + "end": 685.0, "text": " So how do you structure the whole process experimentation, + evaluation? Is there anything you could share and understand that there could be + some private things you don''t want to share.", "tokens": [50814, 407, 577, 360, + 291, 3877, 264, 1379, 1399, 37142, 11, 13344, 30, 1119, 456, 1340, 291, 727, 2073, + 293, 1223, 300, 456, 727, 312, 512, 4551, 721, 291, 500, 380, 528, 281, 2073, 13, + 51264], "temperature": 0.0, "avg_logprob": -0.21824804166468179, "compression_ratio": + 1.5367965367965368, "no_speech_prob": 0.555931568145752}, {"id": 69, "seek": 66700, + "start": 685.0, "end": 687.0, "text": " Oh, sure. Yeah. 
That''s okay.", "tokens": + [51264, 876, 11, 988, 13, 865, 13, 663, 311, 1392, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.21824804166468179, "compression_ratio": 1.5367965367965368, "no_speech_prob": + 0.555931568145752}, {"id": 70, "seek": 68700, "start": 687.0, "end": 696.0, "text": + " So for context, our team works on the relevancy of all of the Shopify storefronts, + so all the little shops out there.", "tokens": [50364, 407, 337, 4319, 11, 527, + 1469, 1985, 322, 264, 25916, 6717, 295, 439, 295, 264, 43991, 3531, 11496, 82, 11, + 370, 439, 264, 707, 14457, 484, 456, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.10076598560108858, "compression_ratio": 1.631578947368421, "no_speech_prob": + 0.29581418633461}, {"id": 71, "seek": 68700, "start": 696.0, "end": 706.0, "text": + " And that''s a really interesting process because you could imagine the impact + there is very variable per shop.", "tokens": [50814, 400, 300, 311, 257, 534, 1880, + 1399, 570, 291, 727, 3811, 264, 2712, 456, 307, 588, 7006, 680, 3945, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.10076598560108858, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.29581418633461}, {"id": 72, "seek": 68700, "start": 706.0, "end": + 716.0, "text": " And we don''t, of course, we don''t want to like tank someone''s + sales, but at the same time, if we see something doing well, generally, then we, + you know, we want to promote it.", "tokens": [51314, 400, 321, 500, 380, 11, 295, + 1164, 11, 321, 500, 380, 528, 281, 411, 5466, 1580, 311, 5763, 11, 457, 412, 264, + 912, 565, 11, 498, 321, 536, 746, 884, 731, 11, 5101, 11, 550, 321, 11, 291, 458, + 11, 321, 528, 281, 9773, 309, 13, 51814], "temperature": 0.0, "avg_logprob": -0.10076598560108858, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.29581418633461}, {"id": + 73, "seek": 71600, "start": 716.0, "end": 729.0, "text": " So in the last part of + the process there, it''s, it''s very different than in the past, 
I''ve worked on, + you know, you work on one search engine, you might work on one Shopify store, so + to speak.", "tokens": [50364, 407, 294, 264, 1036, 644, 295, 264, 1399, 456, 11, + 309, 311, 11, 309, 311, 588, 819, 813, 294, 264, 1791, 11, 286, 600, 2732, 322, + 11, 291, 458, 11, 291, 589, 322, 472, 3164, 2848, 11, 291, 1062, 589, 322, 472, + 43991, 3531, 11, 370, 281, 1710, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.28356385231018066, "compression_ratio": 1.6825396825396826, "no_speech_prob": + 0.006846302188932896}, {"id": 74, "seek": 71600, "start": 729.0, "end": 737.0, "text": + " And that Shopify, the challenges, there are hundreds of thousand millions of little, + you know, shops that you Shopify search.", "tokens": [51014, 400, 300, 43991, 11, + 264, 4759, 11, 456, 366, 6779, 295, 4714, 6803, 295, 707, 11, 291, 458, 11, 14457, + 300, 291, 43991, 3164, 13, 51414], "temperature": 0.0, "avg_logprob": -0.28356385231018066, + "compression_ratio": 1.6825396825396826, "no_speech_prob": 0.006846302188932896}, + {"id": 75, "seek": 73700, "start": 737.0, "end": 753.0, "text": " And you, how do + you find an algorithm or algorithms that support those, that, you know, what''s + going to work well for every possible ecommerce use case, like in some cases, of + course, there''s a lot of apparel on Shopify.", "tokens": [50364, 400, 291, 11, + 577, 360, 291, 915, 364, 9284, 420, 14642, 300, 1406, 729, 11, 300, 11, 291, 458, + 11, 437, 311, 516, 281, 589, 731, 337, 633, 1944, 308, 26926, 764, 1389, 11, 411, + 294, 512, 3331, 11, 295, 1164, 11, 456, 311, 257, 688, 295, 724, 37675, 322, 43991, + 13, 51164], "temperature": 0.0, "avg_logprob": -0.173581634249006, "compression_ratio": + 1.4473684210526316, "no_speech_prob": 0.13223791122436523}, {"id": 76, "seek": 75300, + "start": 753.0, "end": 762.0, "text": " There''s a lot of, there''s a, you know, + all kinds of things people make up businesses on Shopify Shopify is wants to very + much support creators.", 
"tokens": [50364, 821, 311, 257, 688, 295, 11, 456, 311, + 257, 11, 291, 458, 11, 439, 3685, 295, 721, 561, 652, 493, 6011, 322, 43991, 43991, + 307, 2738, 281, 588, 709, 1406, 16039, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.1345065400955525, "compression_ratio": 1.6708333333333334, "no_speech_prob": + 0.529079258441925}, {"id": 77, "seek": 75300, "start": 762.0, "end": 767.0, "text": + " How do people even search, what do they expect when they search on these stores.", + "tokens": [50814, 1012, 360, 561, 754, 3164, 11, 437, 360, 436, 2066, 562, 436, + 3164, 322, 613, 9512, 13, 51064], "temperature": 0.0, "avg_logprob": -0.1345065400955525, + "compression_ratio": 1.6708333333333334, "no_speech_prob": 0.529079258441925}, {"id": + 78, "seek": 75300, "start": 767.0, "end": 781.0, "text": " So the good thing is + when I started at Shopify, there was already some amount of, you know, data flowing + through in terms of like knowing what people are clicking on and stuff.", "tokens": + [51064, 407, 264, 665, 551, 307, 562, 286, 1409, 412, 43991, 11, 456, 390, 1217, + 512, 2372, 295, 11, 291, 458, 11, 1412, 13974, 807, 294, 2115, 295, 411, 5276, 437, + 561, 366, 9697, 322, 293, 1507, 13, 51764], "temperature": 0.0, "avg_logprob": -0.1345065400955525, + "compression_ratio": 1.6708333333333334, "no_speech_prob": 0.529079258441925}, {"id": + 79, "seek": 78100, "start": 781.0, "end": 800.0, "text": " So I was able to pretty + early on start developing a click model. 
So a click model is something that looks + at how users click on results and sort of like in aggregate gives a search result, + a probability of relevance for a given query.", "tokens": [50364, 407, 286, 390, + 1075, 281, 1238, 2440, 322, 722, 6416, 257, 2052, 2316, 13, 407, 257, 2052, 2316, + 307, 746, 300, 1542, 412, 577, 5022, 2052, 322, 3542, 293, 1333, 295, 411, 294, + 26118, 2709, 257, 3164, 1874, 11, 257, 8482, 295, 32684, 337, 257, 2212, 14581, + 13, 51314], "temperature": 0.0, "avg_logprob": -0.11018500878260686, "compression_ratio": + 1.4904458598726114, "no_speech_prob": 0.035330820828676224}, {"id": 80, "seek": + 80000, "start": 800.0, "end": 813.0, "text": " And we noticed that people skip over + certain product a lot when they search for shoes, maybe for some reason the shoes + search shows socks at the top, we know those socks are probably not relevant.", + "tokens": [50364, 400, 321, 5694, 300, 561, 10023, 670, 1629, 1674, 257, 688, 562, + 436, 3164, 337, 6654, 11, 1310, 337, 512, 1778, 264, 6654, 3164, 3110, 17564, 412, + 264, 1192, 11, 321, 458, 729, 17564, 366, 1391, 406, 7340, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.13466184647356877, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.529658854007721}, {"id": 81, "seek": 80000, "start": 813.0, + "end": 818.0, "text": " And we know that whatever they''re clicking on below that + are very likely relevant.", "tokens": [51014, 400, 321, 458, 300, 2035, 436, 434, + 9697, 322, 2507, 300, 366, 588, 3700, 7340, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.13466184647356877, "compression_ratio": 1.631578947368421, "no_speech_prob": + 0.529658854007721}, {"id": 82, "seek": 81800, "start": 818.0, "end": 825.0, "text": + " And so we can, we started, you know, the good thing that Shopify we kind of could + were able to start using that as like a test set.", "tokens": [50364, 400, 370, + 321, 393, 11, 321, 1409, 11, 291, 458, 11, 264, 665, 551, 300, 43991, 321, 733, + 295, 727, 
645, 1075, 281, 722, 1228, 300, 382, 411, 257, 1500, 992, 13, 50714], + "temperature": 0.0, "avg_logprob": -0.14644089551039144, "compression_ratio": 1.5892857142857142, + "no_speech_prob": 0.011702580377459526}, {"id": 83, "seek": 81800, "start": 825.0, + "end": 836.0, "text": " And then, of course, tooling is very key to my heart. So + like one thing that we''ve done at Shopify is we built a large tool tool chain.", + "tokens": [50714, 400, 550, 11, 295, 1164, 11, 46593, 307, 588, 2141, 281, 452, + 1917, 13, 407, 411, 472, 551, 300, 321, 600, 1096, 412, 43991, 307, 321, 3094, 257, + 2416, 2290, 2290, 5021, 13, 51264], "temperature": 0.0, "avg_logprob": -0.14644089551039144, + "compression_ratio": 1.5892857142857142, "no_speech_prob": 0.011702580377459526}, + {"id": 84, "seek": 83600, "start": 836.0, "end": 843.0, "text": " We sort of to + do offline experiments called called boogie using that data.", "tokens": [50364, + 492, 1333, 295, 281, 360, 21857, 12050, 1219, 1219, 748, 39031, 1228, 300, 1412, + 13, 50714], "temperature": 0.0, "avg_logprob": -0.1603498623288911, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 0.19148026406764984}, {"id": 85, "seek": 83600, + "start": 843.0, "end": 853.0, "text": " And it''s about what you would expect from + using something like cupid we can take this data we can sort of see like did we + improve things to things get worse with our ideas.", "tokens": [50714, 400, 309, + 311, 466, 437, 291, 576, 2066, 490, 1228, 746, 411, 4414, 327, 321, 393, 747, 341, + 1412, 321, 393, 1333, 295, 536, 411, 630, 321, 3470, 721, 281, 721, 483, 5324, 365, + 527, 3487, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1603498623288911, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 0.19148026406764984}, {"id": 86, "seek": 85300, + "start": 853.0, "end": 859.0, "text": " And then of course we release to a test + we look at our normal conversion metrics and that kind of thing.", "tokens": [50364, + 400, 550, 295, 1164, 
321, 4374, 281, 257, 1500, 321, 574, 412, 527, 2710, 14298, + 16367, 293, 300, 733, 295, 551, 13, 50664], "temperature": 0.0, "avg_logprob": -0.17794135182174212, + "compression_ratio": 1.7416666666666667, "no_speech_prob": 0.17695970833301544}, + {"id": 87, "seek": 85300, "start": 859.0, "end": 877.0, "text": " And then we do + a lot of analysis of our A B test and we we will graduate things for production. + So at a super high level that''s that''s nothing there. I don''t think is that different + than most places other than you know we have the challenge of so many different + shops and things that we have to sort of solve for.", "tokens": [50664, 400, 550, + 321, 360, 257, 688, 295, 5215, 295, 527, 316, 363, 1500, 293, 321, 321, 486, 8080, + 721, 337, 4265, 13, 407, 412, 257, 1687, 1090, 1496, 300, 311, 300, 311, 1825, 456, + 13, 286, 500, 380, 519, 307, 300, 819, 813, 881, 3190, 661, 813, 291, 458, 321, + 362, 264, 3430, 295, 370, 867, 819, 14457, 293, 721, 300, 321, 362, 281, 1333, 295, + 5039, 337, 13, 51564], "temperature": 0.0, "avg_logprob": -0.17794135182174212, + "compression_ratio": 1.7416666666666667, "no_speech_prob": 0.17695970833301544}, + {"id": 88, "seek": 87700, "start": 877.0, "end": 884.0, "text": " I mean, this sounds + so fantastic. 
It''s like almost fixing or improving search for the entire e-commerce + right and maybe even.", "tokens": [50364, 286, 914, 11, 341, 3263, 370, 5456, 13, + 467, 311, 411, 1920, 19442, 420, 11470, 3164, 337, 264, 2302, 308, 12, 26926, 558, + 293, 1310, 754, 13, 50714], "temperature": 0.0, "avg_logprob": -0.22502018542999916, + "compression_ratio": 1.6592920353982301, "no_speech_prob": 0.1703459620475769}, + {"id": 89, "seek": 87700, "start": 884.0, "end": 893.0, "text": " Yeah, that''s + that''s part of the challenge and the draw on one of the reason, you know, I met + Shopify is just because it''s it''s for you.", "tokens": [50714, 865, 11, 300, 311, + 300, 311, 644, 295, 264, 3430, 293, 264, 2642, 322, 472, 295, 264, 1778, 11, 291, + 458, 11, 286, 1131, 43991, 307, 445, 570, 309, 311, 309, 311, 337, 291, 13, 51164], + "temperature": 0.0, "avg_logprob": -0.22502018542999916, "compression_ratio": 1.6592920353982301, + "no_speech_prob": 0.1703459620475769}, {"id": 90, "seek": 87700, "start": 893.0, + "end": 900.0, "text": " There are people on Shopify who sell 100,000 products. 
There + are people who sell one product right and there are.", "tokens": [51164, 821, 366, + 561, 322, 43991, 567, 3607, 2319, 11, 1360, 3383, 13, 821, 366, 561, 567, 3607, + 472, 1674, 558, 293, 456, 366, 13, 51514], "temperature": 0.0, "avg_logprob": -0.22502018542999916, + "compression_ratio": 1.6592920353982301, "no_speech_prob": 0.1703459620475769}, + {"id": 91, "seek": 90000, "start": 900.0, "end": 910.0, "text": " There are really + there are stores that you use that you may not realize are Shopify stores and then + there are stores that are very clearly like Shopify at the end.", "tokens": [50364, + 821, 366, 534, 456, 366, 9512, 300, 291, 764, 300, 291, 815, 406, 4325, 366, 43991, + 9512, 293, 550, 456, 366, 9512, 300, 366, 588, 4448, 411, 43991, 412, 264, 917, + 13, 50864], "temperature": 0.0, "avg_logprob": -0.19763165209666791, "compression_ratio": + 1.8784530386740332, "no_speech_prob": 0.0852384865283966}, {"id": 92, "seek": 90000, + "start": 910.0, "end": 917.0, "text": " So that you know Shopify in the URL or in + the footer and that kind of thing, but some you know your local.", "tokens": [50864, + 407, 300, 291, 458, 43991, 294, 264, 12905, 420, 294, 264, 2671, 260, 293, 300, + 733, 295, 551, 11, 457, 512, 291, 458, 428, 2654, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.19763165209666791, "compression_ratio": 1.8784530386740332, "no_speech_prob": + 0.0852384865283966}, {"id": 93, "seek": 90000, "start": 917.0, "end": 919.0, "text": + " I think my local running shop.", "tokens": [51214, 286, 519, 452, 2654, 2614, + 3945, 13, 51314], "temperature": 0.0, "avg_logprob": -0.19763165209666791, "compression_ratio": + 1.8784530386740332, "no_speech_prob": 0.0852384865283966}, {"id": 94, "seek": 90000, + "start": 919.0, "end": 922.0, "text": " Shout out to rag and mountain running.", + "tokens": [51314, 32749, 484, 281, 17539, 293, 6937, 2614, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.19763165209666791, "compression_ratio": 
1.8784530386740332, + "no_speech_prob": 0.0852384865283966}, {"id": 95, "seek": 92200, "start": 922.0, + "end": 928.0, "text": " And then there''s a place that you know at the farmers market + and a lot of those places you Shopify.", "tokens": [50364, 400, 550, 456, 311, 257, + 1081, 300, 291, 458, 412, 264, 11339, 2142, 293, 257, 688, 295, 729, 3190, 291, + 43991, 13, 50664], "temperature": 0.0, "avg_logprob": -0.24331428431257418, "compression_ratio": + 1.5405405405405406, "no_speech_prob": 0.3349297046661377}, {"id": 96, "seek": 92200, + "start": 928.0, "end": 933.0, "text": " But then there''s also like larger larger + brands that use Shopify.", "tokens": [50664, 583, 550, 456, 311, 611, 411, 4833, + 4833, 11324, 300, 764, 43991, 13, 50914], "temperature": 0.0, "avg_logprob": -0.24331428431257418, + "compression_ratio": 1.5405405405405406, "no_speech_prob": 0.3349297046661377}, + {"id": 97, "seek": 92200, "start": 933.0, "end": 950.0, "text": " Yeah, that''s + amazing. I''ve seen you recently posted with a link to this paper was it from Google + saying that search abandonment costs us retail 30 300 billion dollars annually.", + "tokens": [50914, 865, 11, 300, 311, 2243, 13, 286, 600, 1612, 291, 3938, 9437, + 365, 257, 2113, 281, 341, 3035, 390, 309, 490, 3329, 1566, 300, 3164, 9072, 518, + 5497, 505, 10800, 2217, 6641, 5218, 3808, 29974, 13, 51764], "temperature": 0.0, + "avg_logprob": -0.24331428431257418, "compression_ratio": 1.5405405405405406, "no_speech_prob": + 0.3349297046661377}, {"id": 98, "seek": 95000, "start": 950.0, "end": 958.0, "text": + " So it''s like massive massive opportunity for search companies, you know, consultancies + and so on.", "tokens": [50364, 407, 309, 311, 411, 5994, 5994, 2650, 337, 3164, + 3431, 11, 291, 458, 11, 7189, 32286, 293, 370, 322, 13, 50764], "temperature": 0.0, + "avg_logprob": -0.18862951178299753, "compression_ratio": 1.7198275862068966, "no_speech_prob": + 0.11068277806043625}, {"id": 99, "seek": 95000, "start": 
958.0, "end": 959.0, "text": + " Totally.", "tokens": [50764, 22837, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.18862951178299753, "compression_ratio": 1.7198275862068966, "no_speech_prob": + 0.11068277806043625}, {"id": 100, "seek": 95000, "start": 959.0, "end": 964.0, "text": + " But why do you think this is still the case regardless of all the efforts of our + like what search community.", "tokens": [50814, 583, 983, 360, 291, 519, 341, 307, + 920, 264, 1389, 10060, 295, 439, 264, 6484, 295, 527, 411, 437, 3164, 1768, 13, + 51064], "temperature": 0.0, "avg_logprob": -0.18862951178299753, "compression_ratio": + 1.7198275862068966, "no_speech_prob": 0.11068277806043625}, {"id": 101, "seek": + 95000, "start": 964.0, "end": 971.0, "text": " Is it to say that community is too + small and there is like potential to grow it and add companies and so on.", "tokens": + [51064, 1119, 309, 281, 584, 300, 1768, 307, 886, 1359, 293, 456, 307, 411, 3995, + 281, 1852, 309, 293, 909, 3431, 293, 370, 322, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.18862951178299753, "compression_ratio": 1.7198275862068966, "no_speech_prob": + 0.11068277806043625}, {"id": 102, "seek": 95000, "start": 971.0, "end": 976.0, "text": + " Or is there something fundamental that you know still needs to be tackled.", "tokens": + [51414, 1610, 307, 456, 746, 8088, 300, 291, 458, 920, 2203, 281, 312, 9426, 1493, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.18862951178299753, "compression_ratio": + 1.7198275862068966, "no_speech_prob": 0.11068277806043625}, {"id": 103, "seek": + 97600, "start": 976.0, "end": 986.0, "text": " And I think it''s it reminds me a + lot of where the search space and not just the search space for adjacent things + like recommendations or any.", "tokens": [50364, 400, 286, 519, 309, 311, 309, 12025, + 385, 257, 688, 295, 689, 264, 3164, 1901, 293, 406, 445, 264, 3164, 1901, 337, 24441, + 721, 411, 10434, 420, 604, 13, 50864], "temperature": 0.0, 
"avg_logprob": -0.1519981539526651, + "compression_ratio": 1.6418604651162791, "no_speech_prob": 0.002253777114674449}, + {"id": 104, "seek": 97600, "start": 986.0, "end": 995.0, "text": " I would say like + surface on a let''s say a website or a product or an app that like is algorithmic + in some way.", "tokens": [50864, 286, 576, 584, 411, 3753, 322, 257, 718, 311, 584, + 257, 3144, 420, 257, 1674, 420, 364, 724, 300, 411, 307, 9284, 299, 294, 512, 636, + 13, 51314], "temperature": 0.0, "avg_logprob": -0.1519981539526651, "compression_ratio": + 1.6418604651162791, "no_speech_prob": 0.002253777114674449}, {"id": 105, "seek": + 97600, "start": 995.0, "end": 1003.0, "text": " That''s just a it feels like from + how people build products. It''s just fundamentally a nascent space.", "tokens": + [51314, 663, 311, 445, 257, 309, 3417, 411, 490, 577, 561, 1322, 3383, 13, 467, + 311, 445, 17879, 257, 5382, 2207, 1901, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.1519981539526651, "compression_ratio": 1.6418604651162791, "no_speech_prob": + 0.002253777114674449}, {"id": 106, "seek": 100300, "start": 1003.0, "end": 1018.0, + "text": " So it reminds me of early in my career, when I was a software developer. 
+ And I worked at a couple software companies maybe because I was a CEO developer + that fund that really were hardware companies.", "tokens": [50364, 407, 309, 12025, + 385, 295, 2440, 294, 452, 3988, 11, 562, 286, 390, 257, 4722, 10754, 13, 400, 286, + 2732, 412, 257, 1916, 4722, 3431, 1310, 570, 286, 390, 257, 9282, 10754, 300, 2374, + 300, 534, 645, 8837, 3431, 13, 51114], "temperature": 0.0, "avg_logprob": -0.14053703035627094, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.0006474139518104494}, + {"id": 107, "seek": 100300, "start": 1018.0, "end": 1027.0, "text": " And people + at management levels were used to running hardware companies and but more and more + of the value was delivered to software.", "tokens": [51114, 400, 561, 412, 4592, + 4358, 645, 1143, 281, 2614, 8837, 3431, 293, 457, 544, 293, 544, 295, 264, 2158, + 390, 10144, 281, 4722, 13, 51564], "temperature": 0.0, "avg_logprob": -0.14053703035627094, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.0006474139518104494}, + {"id": 108, "seek": 102700, "start": 1027.0, "end": 1038.0, "text": " And they didn''t + necessarily understand how to manage software. 
So you know hardware you might have + these very upfront you know classically waterfall kinds of development processes.", + "tokens": [50364, 400, 436, 994, 380, 4725, 1223, 577, 281, 3067, 4722, 13, 407, + 291, 458, 8837, 291, 1062, 362, 613, 588, 30264, 291, 458, 1508, 984, 27848, 3685, + 295, 3250, 7555, 13, 50914], "temperature": 0.0, "avg_logprob": -0.1313933531443278, + "compression_ratio": 1.6201923076923077, "no_speech_prob": 0.014607392251491547}, + {"id": 109, "seek": 102700, "start": 1038.0, "end": 1047.0, "text": " And then software + you know in the early 2000s we learned about agile and then it was good to be iterative + and these things and it''s okay you know fail fast.", "tokens": [50914, 400, 550, + 4722, 291, 458, 294, 264, 2440, 8132, 82, 321, 3264, 466, 30072, 293, 550, 309, + 390, 665, 281, 312, 17138, 1166, 293, 613, 721, 293, 309, 311, 1392, 291, 458, 3061, + 2370, 13, 51364], "temperature": 0.0, "avg_logprob": -0.1313933531443278, "compression_ratio": + 1.6201923076923077, "no_speech_prob": 0.014607392251491547}, {"id": 110, "seek": + 104700, "start": 1047.0, "end": 1052.0, "text": " And we can always unlike hardware + with software we can hit the undo button.", "tokens": [50364, 400, 321, 393, 1009, + 8343, 8837, 365, 4722, 321, 393, 2045, 264, 23779, 2960, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.08601852173500872, "compression_ratio": 1.7851239669421488, + "no_speech_prob": 0.006735930684953928}, {"id": 111, "seek": 104700, "start": 1052.0, + "end": 1057.0, "text": " And so it''s a very different practice and a very different + style of leadership I think.", "tokens": [50614, 400, 370, 309, 311, 257, 588, 819, + 3124, 293, 257, 588, 819, 3758, 295, 5848, 286, 519, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.08601852173500872, "compression_ratio": 1.7851239669421488, + "no_speech_prob": 0.006735930684953928}, {"id": 112, "seek": 104700, "start": 1057.0, + "end": 1065.0, "text": " And I think the same thing is becoming 
true of these data + products like algorithmic data products like search where.", "tokens": [50864, 400, + 286, 519, 264, 912, 551, 307, 5617, 2074, 295, 613, 1412, 3383, 411, 9284, 299, + 1412, 3383, 411, 3164, 689, 13, 51264], "temperature": 0.0, "avg_logprob": -0.08601852173500872, + "compression_ratio": 1.7851239669421488, "no_speech_prob": 0.006735930684953928}, + {"id": 113, "seek": 104700, "start": 1065.0, "end": 1074.0, "text": " Sure at the + implementation level in our at our level you see a lot of people who really we understand + the problem we understand it''s very experimental.", "tokens": [51264, 4894, 412, + 264, 11420, 1496, 294, 527, 412, 527, 1496, 291, 536, 257, 688, 295, 561, 567, 534, + 321, 1223, 264, 1154, 321, 1223, 309, 311, 588, 17069, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.08601852173500872, "compression_ratio": 1.7851239669421488, + "no_speech_prob": 0.006735930684953928}, {"id": 114, "seek": 107400, "start": 1074.0, + "end": 1081.0, "text": " It''s even more experimental than like how software you + can ship something and you can undo something if you need to.", "tokens": [50364, + 467, 311, 754, 544, 17069, 813, 411, 577, 4722, 291, 393, 5374, 746, 293, 291, 393, + 23779, 746, 498, 291, 643, 281, 13, 50714], "temperature": 0.0, "avg_logprob": -0.09158458709716796, + "compression_ratio": 1.766990291262136, "no_speech_prob": 0.005606007296591997}, + {"id": 115, "seek": 107400, "start": 1081.0, "end": 1089.0, "text": " It''s actually + like extremely experimental where every week you''re shipping something new and + you''re always looking at metrics and AB tests and everything.", "tokens": [50714, + 467, 311, 767, 411, 4664, 17069, 689, 633, 1243, 291, 434, 14122, 746, 777, 293, + 291, 434, 1009, 1237, 412, 16367, 293, 13838, 6921, 293, 1203, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.09158458709716796, "compression_ratio": 1.766990291262136, + "no_speech_prob": 0.005606007296591997}, {"id": 116, "seek": 107400, 
"start": 1089.0, + "end": 1095.0, "text": " And every week you make a completely different product + that you go in a different direction.", "tokens": [51114, 400, 633, 1243, 291, 652, + 257, 2584, 819, 1674, 300, 291, 352, 294, 257, 819, 3513, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.09158458709716796, "compression_ratio": 1.766990291262136, + "no_speech_prob": 0.005606007296591997}, {"id": 117, "seek": 109500, "start": 1095.0, + "end": 1111.0, "text": " Honestly I think one reason that one reason that that this + is a problem is that organizations structure to ship classic software aren''t necessarily + well suited to ship like these data products.", "tokens": [50364, 12348, 286, 519, + 472, 1778, 300, 472, 1778, 300, 300, 341, 307, 257, 1154, 307, 300, 6150, 3877, + 281, 5374, 7230, 4722, 3212, 380, 4725, 731, 24736, 281, 5374, 411, 613, 1412, 3383, + 13, 51164], "temperature": 0.0, "avg_logprob": -0.13804930134823448, "compression_ratio": + 1.496124031007752, "no_speech_prob": 0.013470303267240524}, {"id": 118, "seek": + 111100, "start": 1111.0, "end": 1122.0, "text": " And you I gave a talk at at my + C''s you know the e-commerce search conference in Charlottesville in April about + how to Shopify.", "tokens": [50364, 400, 291, 286, 2729, 257, 751, 412, 412, 452, + 383, 311, 291, 458, 264, 308, 12, 26926, 3164, 7586, 294, 14130, 1521, 279, 8386, + 294, 6929, 466, 577, 281, 43991, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.16153591107099485, "compression_ratio": 1.471698113207547, "no_speech_prob": + 0.010380217805504799}, {"id": 119, "seek": 111100, "start": 1122.0, "end": 1128.0, + "text": " One of the things that we do to try to help with this problem is really + make engineering and data like work hand in hand.", "tokens": [50914, 1485, 295, + 264, 721, 300, 321, 360, 281, 853, 281, 854, 365, 341, 1154, 307, 534, 652, 7043, + 293, 1412, 411, 589, 1011, 294, 1011, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.16153591107099485, 
"compression_ratio": 1.471698113207547, "no_speech_prob": + 0.010380217805504799}, {"id": 120, "seek": 111100, "start": 1128.0, "end": 1132.0, + "text": " Because many organizations they''re very siloed from each other.", "tokens": + [51214, 1436, 867, 6150, 436, 434, 588, 3425, 78, 292, 490, 1184, 661, 13, 51414], + "temperature": 0.0, "avg_logprob": -0.16153591107099485, "compression_ratio": 1.471698113207547, + "no_speech_prob": 0.010380217805504799}, {"id": 121, "seek": 113200, "start": 1132.0, + "end": 1142.0, "text": " And that can be a really big challenge because as you make + these decisions like day to day like I''m implementing something in my search.", + "tokens": [50364, 400, 300, 393, 312, 257, 534, 955, 3430, 570, 382, 291, 652, 613, + 5327, 411, 786, 281, 786, 411, 286, 478, 18114, 746, 294, 452, 3164, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.10883350241674136, "compression_ratio": 1.6228571428571428, + "no_speech_prob": 0.0465632900595665}, {"id": 122, "seek": 113200, "start": 1142.0, + "end": 1149.0, "text": " Do I go do I you know as I''m writing lines of code do + I go to the left or do I go to the right.", "tokens": [50864, 1144, 286, 352, 360, + 286, 291, 458, 382, 286, 478, 3579, 3876, 295, 3089, 360, 286, 352, 281, 264, 1411, + 420, 360, 286, 352, 281, 264, 558, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.10883350241674136, "compression_ratio": 1.6228571428571428, "no_speech_prob": + 0.0465632900595665}, {"id": 123, "seek": 113200, "start": 1149.0, "end": 1154.0, + "text": " Do I try boosting this or implement this algorithm.", "tokens": [51214, + 1144, 286, 853, 43117, 341, 420, 4445, 341, 9284, 13, 51464], "temperature": 0.0, + "avg_logprob": -0.10883350241674136, "compression_ratio": 1.6228571428571428, "no_speech_prob": + 0.0465632900595665}, {"id": 124, "seek": 115400, "start": 1154.0, "end": 1159.0, + "text": " And I think that''s really a little bit slower and a little bit faster.", + "tokens": [50364, 400, 286, 
519, 300, 311, 534, 257, 707, 857, 14009, 293, 257, + 707, 857, 4663, 13, 50614], "temperature": 0.0, "avg_logprob": -0.38498484460931076, + "compression_ratio": 1.6952380952380952, "no_speech_prob": 0.32128530740737915}, + {"id": 125, "seek": 115400, "start": 1159.0, "end": 1167.0, "text": " And like those + really intricate decisions you kind of need both sides of the data and engineering + brain to to do those.", "tokens": [50614, 400, 411, 729, 534, 38015, 5327, 291, + 733, 295, 643, 1293, 4881, 295, 264, 1412, 293, 7043, 3567, 281, 281, 360, 729, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.38498484460931076, "compression_ratio": + 1.6952380952380952, "no_speech_prob": 0.32128530740737915}, {"id": 126, "seek": + 115400, "start": 1167.0, "end": 1179.0, "text": " And I only think of very very + small handful of companies places like Google maybe meta Facebook like have really + mastered this like blending of data and engineering.", "tokens": [51014, 400, 286, + 787, 519, 295, 588, 588, 1359, 16458, 295, 3431, 3190, 411, 3329, 1310, 19616, 4384, + 411, 362, 534, 38686, 341, 411, 23124, 295, 1412, 293, 7043, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.38498484460931076, "compression_ratio": 1.6952380952380952, + "no_speech_prob": 0.32128530740737915}, {"id": 127, "seek": 117900, "start": 1179.0, + "end": 1185.0, "text": " Most people most other companies who have started to like + you know have finally mastered software engineering.", "tokens": [50364, 4534, 561, + 881, 661, 3431, 567, 362, 1409, 281, 411, 291, 458, 362, 2721, 38686, 4722, 7043, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.14183746871127878, "compression_ratio": + 1.7596899224806202, "no_speech_prob": 0.0012182358186692}, {"id": 128, "seek": 117900, + "start": 1185.0, "end": 1197.0, "text": " Haven''t quite come up like from a leadership + and beyond and product leadership overcome the the sort of like hump of of how do + we think about data products.", "tokens": [50664, 
23770, 380, 1596, 808, 493, 411, + 490, 257, 5848, 293, 4399, 293, 1674, 5848, 10473, 264, 264, 1333, 295, 411, 47093, + 295, 295, 577, 360, 321, 519, 466, 1412, 3383, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.14183746871127878, "compression_ratio": 1.7596899224806202, "no_speech_prob": + 0.0012182358186692}, {"id": 129, "seek": 117900, "start": 1197.0, "end": 1208.0, + "text": " How do we manage things that are experimental and aren''t like you know + actual projects that are going to take a couple months to complete that we have + a very clear beginning middle into.", "tokens": [51264, 1012, 360, 321, 3067, 721, + 300, 366, 17069, 293, 3212, 380, 411, 291, 458, 3539, 4455, 300, 366, 516, 281, + 747, 257, 1916, 2493, 281, 3566, 300, 321, 362, 257, 588, 1850, 2863, 2808, 666, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.14183746871127878, "compression_ratio": + 1.7596899224806202, "no_speech_prob": 0.0012182358186692}, {"id": 130, "seek": 120800, + "start": 1208.0, "end": 1222.0, "text": " Yeah exactly it''s like beautifully put + like it''s not like a Toyota you know pipeline where you could say yeah this is + where we start we put it all these materials in some people do something and we + fix some bugs and off we go with the car.", "tokens": [50364, 865, 2293, 309, 311, + 411, 16525, 829, 411, 309, 311, 406, 411, 257, 22926, 291, 458, 15517, 689, 291, + 727, 584, 1338, 341, 307, 689, 321, 722, 321, 829, 309, 439, 613, 5319, 294, 512, + 561, 360, 746, 293, 321, 3191, 512, 15120, 293, 766, 321, 352, 365, 264, 1032, 13, + 51064], "temperature": 0.0, "avg_logprob": -0.151155398442195, "compression_ratio": + 1.7186147186147187, "no_speech_prob": 0.09577623754739761}, {"id": 131, "seek": + 120800, "start": 1222.0, "end": 1226.0, "text": " But like there is no like definition + of done in some sense rate.", "tokens": [51064, 583, 411, 456, 307, 572, 411, 7123, + 295, 1096, 294, 512, 2020, 3314, 13, 51264], "temperature": 0.0, "avg_logprob": + 
-0.151155398442195, "compression_ratio": 1.7186147186147187, "no_speech_prob": 0.09577623754739761}, + {"id": 132, "seek": 120800, "start": 1226.0, "end": 1233.0, "text": " Yeah there''s + no definition of done it''s very much like you''re just constantly experimenting.", + "tokens": [51264, 865, 456, 311, 572, 7123, 295, 1096, 309, 311, 588, 709, 411, + 291, 434, 445, 6460, 29070, 13, 51614], "temperature": 0.0, "avg_logprob": -0.151155398442195, + "compression_ratio": 1.7186147186147187, "no_speech_prob": 0.09577623754739761}, + {"id": 133, "seek": 123300, "start": 1233.0, "end": 1240.0, "text": " It''s not + something that''s visual that it''s like oh we''re going to add this button to the + UI and it''s going to do these things.", "tokens": [50364, 467, 311, 406, 746, 300, + 311, 5056, 300, 309, 311, 411, 1954, 321, 434, 516, 281, 909, 341, 2960, 281, 264, + 15682, 293, 309, 311, 516, 281, 360, 613, 721, 13, 50714], "temperature": 0.0, "avg_logprob": + -0.08972168932057391, "compression_ratio": 1.8071748878923768, "no_speech_prob": + 0.0008172900998033583}, {"id": 134, "seek": 123300, "start": 1240.0, "end": 1248.0, + "text": " In some ways you''re not changing you''re all with you''re like rarely + changing the UI you''re like mixing up search results and how they come up.", "tokens": + [50714, 682, 512, 2098, 291, 434, 406, 4473, 291, 434, 439, 365, 291, 434, 411, + 13752, 4473, 264, 15682, 291, 434, 411, 11983, 493, 3164, 3542, 293, 577, 436, 808, + 493, 13, 51114], "temperature": 0.0, "avg_logprob": -0.08972168932057391, "compression_ratio": + 1.8071748878923768, "no_speech_prob": 0.0008172900998033583}, {"id": 135, "seek": + 123300, "start": 1248.0, "end": 1259.0, "text": " There may be UI elements to it + like oh we understand this query better so we serve this UI but it''s extremely + fuzzy and hard to like.", "tokens": [51114, 821, 815, 312, 15682, 4959, 281, 309, + 411, 1954, 321, 1223, 341, 14581, 1101, 370, 321, 4596, 341, 15682, 457, 309, 311, + 
4664, 34710, 293, 1152, 281, 411, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.08972168932057391, "compression_ratio": 1.8071748878923768, "no_speech_prob": + 0.0008172900998033583}, {"id": 136, "seek": 125900, "start": 1259.0, "end": 1272.0, + "text": " One of the biggest challenges I have actually you know and I''ve had this + in consulting and you know continue to have a shop device how do you coach stakeholders + to understand what you even do.", "tokens": [50364, 1485, 295, 264, 3880, 4759, + 286, 362, 767, 291, 458, 293, 286, 600, 632, 341, 294, 23682, 293, 291, 458, 2354, + 281, 362, 257, 3945, 4302, 577, 360, 291, 6560, 17779, 281, 1223, 437, 291, 754, + 360, 13, 51014], "temperature": 0.0, "avg_logprob": -0.11457152647130629, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.005873814690858126}, {"id": 137, "seek": + 125900, "start": 1272.0, "end": 1283.0, "text": " The plus side is it''s very much + a it''s very much like oh it''s a very tied to you know we''re going to make more + money on the other hand it''s like not clearly like.", "tokens": [51014, 440, 1804, + 1252, 307, 309, 311, 588, 709, 257, 309, 311, 588, 709, 411, 1954, 309, 311, 257, + 588, 9601, 281, 291, 458, 321, 434, 516, 281, 652, 544, 1460, 322, 264, 661, 1011, + 309, 311, 411, 406, 4448, 411, 13, 51564], "temperature": 0.0, "avg_logprob": -0.11457152647130629, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.005873814690858126}, + {"id": 138, "seek": 128300, "start": 1283.0, "end": 1292.0, "text": " So just be + in a traditional sense it''s a it''s a constant cycle of experimentation and optimization.", + "tokens": [50364, 407, 445, 312, 294, 257, 5164, 2020, 309, 311, 257, 309, 311, + 257, 5754, 6586, 295, 37142, 293, 19618, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.301086675925333, "compression_ratio": 1.55, "no_speech_prob": 0.3223252594470978}, + {"id": 139, "seek": 128300, "start": 1292.0, "end": 1306.0, "text": " Yeah absolutely + I 
mean and it requires like a different discipline and like rigor and really even + like a Tom Tom for example I work in a relevancy team search search for elements.", + "tokens": [50814, 865, 3122, 286, 914, 293, 309, 7029, 411, 257, 819, 13635, 293, + 411, 42191, 293, 534, 754, 411, 257, 5041, 5041, 337, 1365, 286, 589, 294, 257, + 25916, 6717, 1469, 3164, 3164, 337, 4959, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.301086675925333, "compression_ratio": 1.55, "no_speech_prob": 0.3223252594470978}, + {"id": 140, "seek": 130600, "start": 1306.0, "end": 1334.0, "text": " I''m not a + ML component or try to but but the thing is that I was amazed by by the team saying + hey I''m running this a b test and they compute a bunch of metrics some like confidence + intervals p values and they say yeah I feel like this is a good change in the algorithm + but it''s not proving to be like when we have split our traffic a b 50 50 just doesn''t + work after two weeks we have to kill it.", "tokens": [50364, 286, 478, 406, 257, + 21601, 6542, 420, 853, 281, 457, 457, 264, 551, 307, 300, 286, 390, 20507, 538, + 538, 264, 1469, 1566, 4177, 286, 478, 2614, 341, 257, 272, 1500, 293, 436, 14722, + 257, 3840, 295, 16367, 512, 411, 6687, 26651, 280, 4190, 293, 436, 584, 1338, 286, + 841, 411, 341, 307, 257, 665, 1319, 294, 264, 9284, 457, 309, 311, 406, 27221, 281, + 312, 411, 562, 321, 362, 7472, 527, 6419, 257, 272, 2625, 2625, 445, 1177, 380, + 589, 934, 732, 3259, 321, 362, 281, 1961, 309, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.24958227035847116, "compression_ratio": 1.6188524590163935, "no_speech_prob": + 0.6544547080993652}, {"id": 141, "seek": 133400, "start": 1334.0, "end": 1347.0, + "text": " You need to go through that rigor if you say just for the moment you you + doubt and you say no I love this change I''m going to push it forward it''s not + going to harm you lost like you cannot do this right.", "tokens": [50364, 509, 643, + 281, 352, 807, 300, 42191, 
498, 291, 584, 445, 337, 264, 1623, 291, 291, 6385, 293, + 291, 584, 572, 286, 959, 341, 1319, 286, 478, 516, 281, 2944, 309, 2128, 309, 311, + 406, 516, 281, 6491, 291, 2731, 411, 291, 2644, 360, 341, 558, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.08158468414138961, "compression_ratio": 1.702928870292887, + "no_speech_prob": 0.011742318980395794}, {"id": 142, "seek": 133400, "start": 1347.0, + "end": 1362.0, "text": " It requires everyone to be a scientist too and I think + traditional I think there''s like traditional product and other kinds of leadership + that is very can be very opinion driven or have a strong vision.", "tokens": [51014, + 467, 7029, 1518, 281, 312, 257, 12662, 886, 293, 286, 519, 5164, 286, 519, 456, + 311, 411, 5164, 1674, 293, 661, 3685, 295, 5848, 300, 307, 588, 393, 312, 588, 4800, + 9555, 420, 362, 257, 2068, 5201, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.08158468414138961, "compression_ratio": 1.702928870292887, "no_speech_prob": + 0.011742318980395794}, {"id": 143, "seek": 136200, "start": 1362.0, "end": 1368.0, + "text": " And I don''t think there''s a there I think there''s still tons of room + for that because at the end of the day you need.", "tokens": [50364, 400, 286, 500, + 380, 519, 456, 311, 257, 456, 286, 519, 456, 311, 920, 9131, 295, 1808, 337, 300, + 570, 412, 264, 917, 295, 264, 786, 291, 643, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.11251181284586588, "compression_ratio": 1.5806451612903225, "no_speech_prob": + 0.006631307769566774}, {"id": 144, "seek": 136200, "start": 1368.0, "end": 1376.0, + "text": " You need a strong hypothesis and often what you''re a b testing is within + the context of a larger strategy.", "tokens": [50664, 509, 643, 257, 2068, 17291, + 293, 2049, 437, 291, 434, 257, 272, 4997, 307, 1951, 264, 4319, 295, 257, 4833, + 5206, 13, 51064], "temperature": 0.0, "avg_logprob": -0.11251181284586588, "compression_ratio": + 1.5806451612903225, "no_speech_prob": 
0.006631307769566774}, {"id": 145, "seek": + 136200, "start": 1376.0, "end": 1380.0, "text": " Like you know we think we''ll + get traction if we go in this direction.", "tokens": [51064, 1743, 291, 458, 321, + 519, 321, 603, 483, 23558, 498, 321, 352, 294, 341, 3513, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.11251181284586588, "compression_ratio": 1.5806451612903225, + "no_speech_prob": 0.006631307769566774}, {"id": 146, "seek": 138000, "start": 1380.0, + "end": 1406.0, "text": " But you have to like you have to really bring science to + everyone has to be a scientist and it''s not as often an a b test to is not as simple + as like it was a clear winner loser it sometimes is a winner in some ways where + it''s like it''s a winner in this dimension a loser and so we mentioned and then + like can we go after and slice and dice the data to really understand what happened.", + "tokens": [50364, 583, 291, 362, 281, 411, 291, 362, 281, 534, 1565, 3497, 281, + 1518, 575, 281, 312, 257, 12662, 293, 309, 311, 406, 382, 2049, 364, 257, 272, 1500, + 281, 307, 406, 382, 2199, 382, 411, 309, 390, 257, 1850, 8507, 24606, 309, 2171, + 307, 257, 8507, 294, 512, 2098, 689, 309, 311, 411, 309, 311, 257, 8507, 294, 341, + 10139, 257, 24606, 293, 370, 321, 2835, 293, 550, 411, 393, 321, 352, 934, 293, + 13153, 293, 10313, 264, 1412, 281, 534, 1223, 437, 2011, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.15539991185906227, "compression_ratio": 1.8160377358490567, + "no_speech_prob": 0.23315531015396118}, {"id": 147, "seek": 140600, "start": 1406.0, + "end": 1431.0, "text": " It''s not it''s not a often not like a cut and dry story + and so like trying to understand the data to even know how to tell the story requires + a lot of humility I think if you''re a leader to say like okay it''s more important + like what I learned and like my big idea being being the winner of you know being + the clear like thing that won so I get all credit and that kind of thing.", 
"tokens": + [50364, 467, 311, 406, 309, 311, 406, 257, 2049, 406, 411, 257, 1723, 293, 4016, + 1657, 293, 370, 411, 1382, 281, 1223, 264, 1412, 281, 754, 458, 577, 281, 980, 264, + 1657, 7029, 257, 688, 295, 27106, 286, 519, 498, 291, 434, 257, 5263, 281, 584, + 411, 1392, 309, 311, 544, 1021, 411, 437, 286, 3264, 293, 411, 452, 955, 1558, 885, + 885, 264, 8507, 295, 291, 458, 885, 264, 1850, 411, 551, 300, 1582, 370, 286, 483, + 439, 5397, 293, 300, 733, 295, 551, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.13341257545385468, "compression_ratio": 1.758139534883721, "no_speech_prob": + 0.09669437259435654}, {"id": 148, "seek": 143100, "start": 1431.0, "end": 1454.0, + "text": " Yeah exactly this is amazing I wanted to I''m actually rereading your + book here I''m a big fan and and once we meet you give me the autograph all right + and John very man I think I saw you actually in person listen revolution 2013 you + were on stage it was in Dublin do you remember being in Dublin if you.", "tokens": + [50364, 865, 2293, 341, 307, 2243, 286, 1415, 281, 286, 478, 767, 46453, 8166, 428, + 1446, 510, 286, 478, 257, 955, 3429, 293, 293, 1564, 321, 1677, 291, 976, 385, 264, + 36660, 439, 558, 293, 2619, 588, 587, 286, 519, 286, 1866, 291, 767, 294, 954, 2140, + 8894, 9012, 291, 645, 322, 3233, 309, 390, 294, 42323, 360, 291, 1604, 885, 294, + 42323, 498, 291, 13, 51514], "temperature": 0.0, "avg_logprob": -0.3223176209822945, + "compression_ratio": 1.5175879396984924, "no_speech_prob": 0.22679072618484497}, + {"id": 149, "seek": 145400, "start": 1454.0, "end": 1483.0, "text": " Yeah yeah + that was fun yeah I remember they had this huge rugby stadium yeah yeah exactly + this was the coming back to I think I talked about Cupid there I think my colleague + from Silvercher arena came out we talked about Cupid yes and and I and I got blessing + for Luke from Andrew Balecki there I said would you be okay if I continue because + he didn''t have time and he said yes yes 
please please oh cool and and then later + it ended up being part of the game.", "tokens": [50364, 865, 1338, 300, 390, 1019, + 1338, 286, 1604, 436, 632, 341, 2603, 43895, 18585, 1338, 1338, 2293, 341, 390, + 264, 1348, 646, 281, 286, 519, 286, 2825, 466, 383, 6127, 456, 286, 519, 452, 13532, + 490, 15861, 6759, 18451, 1361, 484, 321, 2825, 466, 383, 6127, 2086, 293, 293, 286, + 293, 286, 658, 13869, 337, 13044, 490, 10110, 363, 1220, 547, 72, 456, 286, 848, + 576, 291, 312, 1392, 498, 286, 2354, 570, 415, 994, 380, 362, 565, 293, 415, 848, + 2086, 2086, 1767, 1767, 1954, 1627, 293, 293, 550, 1780, 309, 4590, 493, 885, 644, + 295, 264, 1216, 13, 51814], "temperature": 0.0, "avg_logprob": -0.30020647141539936, + "compression_ratio": 1.7153558052434457, "no_speech_prob": 0.39648333191871643}, + {"id": 150, "seek": 148400, "start": 1484.0, "end": 1513.4, "text": " You see and + earned a lesson committed to Tomoko Ocita who is now driving massive changes day + on Gira and oh yeah yeah that''s true yeah I''ve seen his name a lot yeah yes fantastic + and and in this book why I brought it up I was just reading one of the first chapters + where you so beautifully said analysis so let''s say in lesson lingo how you process + the input text yeah", "tokens": [50364, 509, 536, 293, 12283, 257, 6898, 7784, 281, + 5041, 13704, 422, 66, 2786, 567, 307, 586, 4840, 5994, 2962, 786, 322, 460, 4271, + 293, 1954, 1338, 1338, 300, 311, 2074, 1338, 286, 600, 1612, 702, 1315, 257, 688, + 1338, 2086, 5456, 293, 293, 294, 341, 1446, 983, 286, 3038, 309, 493, 286, 390, + 445, 3760, 472, 295, 264, 700, 20013, 689, 291, 370, 16525, 848, 5215, 370, 718, + 311, 584, 294, 6898, 287, 18459, 577, 291, 1399, 264, 4846, 2487, 1338, 51834], + "temperature": 0.0, "avg_logprob": -0.3472553434826079, "compression_ratio": 1.5862068965517242, + "no_speech_prob": 0.04788156971335411}, {"id": 151, "seek": 151400, "start": 1514.0, + "end": 1536.0, "text": " analysis shouldn''t map words to tokens it should 
map meaning + and user intent to tokens yeah I mean this is amazingly put and you go there later + explaining how you balance the precision versus recall as you do modifications to + the analysis chain you know whether you''re standing or not and stuff like that", + "tokens": [50364, 5215, 4659, 380, 4471, 2283, 281, 22667, 309, 820, 4471, 3620, + 293, 4195, 8446, 281, 22667, 1338, 286, 914, 341, 307, 31762, 829, 293, 291, 352, + 456, 1780, 13468, 577, 291, 4772, 264, 18356, 5717, 9901, 382, 291, 360, 26881, + 281, 264, 5215, 5021, 291, 458, 1968, 291, 434, 4877, 420, 406, 293, 1507, 411, + 300, 51464], "temperature": 0.0, "avg_logprob": -0.26971257527669273, "compression_ratio": + 1.6344086021505377, "no_speech_prob": 0.034534335136413574}, {"id": 152, "seek": + 153600, "start": 1536.64, "end": 1554.72, "text": " but it''s not like many people + even viewed that way I think not that I viewed it that way I was always like yeah + what should I tweak to make it possible but there is a related topic on this front + you know query and content understanding", "tokens": [50396, 457, 309, 311, 406, + 411, 867, 561, 754, 19174, 300, 636, 286, 519, 406, 300, 286, 19174, 309, 300, 636, + 286, 390, 1009, 411, 1338, 437, 820, 286, 29879, 281, 652, 309, 1944, 457, 456, + 307, 257, 4077, 4829, 322, 341, 1868, 291, 458, 14581, 293, 2701, 3701, 51300], + "temperature": 0.0, "avg_logprob": -0.2519044325901912, "compression_ratio": 1.5666666666666667, + "no_speech_prob": 0.083652064204216}, {"id": 153, "seek": 155472, "start": 1555.28, + "end": 1578.88, "text": " mm-hmm how how does this thing connect in your mind yeah + I think so first of all I think it''s funny how the work that we do shapes sort + of like how we do certain work shapes our perspective on things because I think + when I would you know writing that book is sort of like my early part of my relevance + career", "tokens": [50392, 11169, 12, 10250, 577, 577, 775, 341, 551, 1745, 294, + 428, 1575, 1338, 286, 519, 370, 
700, 295, 439, 286, 519, 309, 311, 4074, 577, 264, + 589, 300, 321, 360, 10854, 1333, 295, 411, 577, 321, 360, 1629, 589, 10854, 527, + 4585, 322, 721, 570, 286, 519, 562, 286, 576, 291, 458, 3579, 300, 1446, 307, 1333, + 295, 411, 452, 2440, 644, 295, 452, 32684, 3988, 51572], "temperature": 0.0, "avg_logprob": + -0.21190053394862585, "compression_ratio": 1.6830601092896176, "no_speech_prob": + 0.028515979647636414}, {"id": 154, "seek": 157888, "start": 1579.7600000000002, + "end": 1588.96, "text": " at open source connections and this still happens like + you kind of get brought to a client and it''s like okay we have we have this app + over here", "tokens": [50408, 412, 1269, 4009, 9271, 293, 341, 920, 2314, 411, 291, + 733, 295, 483, 3038, 281, 257, 6423, 293, 309, 311, 411, 1392, 321, 362, 321, 362, + 341, 724, 670, 510, 50868], "temperature": 0.0, "avg_logprob": -0.16980829343690976, + "compression_ratio": 1.7654867256637168, "no_speech_prob": 0.016344457864761353}, + {"id": 155, "seek": 157888, "start": 1588.96, "end": 1607.92, "text": " we have + this indexing pipeline you just work within this box that is the search engine and + so I became quite adept at like how can I hack the analyzers and the query and everything + to be really to do like all the crazy things I want to do like and really", "tokens": + [50868, 321, 362, 341, 8186, 278, 15517, 291, 445, 589, 1951, 341, 2424, 300, 307, + 264, 3164, 2848, 293, 370, 286, 3062, 1596, 614, 5250, 412, 411, 577, 393, 286, + 10339, 264, 6459, 41698, 293, 264, 14581, 293, 1203, 281, 312, 534, 281, 360, 411, + 439, 264, 3219, 721, 286, 528, 281, 360, 411, 293, 534, 51816], "temperature": 0.0, + "avg_logprob": -0.16980829343690976, "compression_ratio": 1.7654867256637168, "no_speech_prob": + 0.016344457864761353}, {"id": 156, "seek": 160792, "start": 1608.0, "end": 1620.4, + "text": " could I take in a taxonomy and sort of map to a conceptual understanding + of of the language not just a not just like the words 
themselves you know people + think about analyzers they think about", "tokens": [50368, 727, 286, 747, 294, 257, + 3366, 23423, 293, 1333, 295, 4471, 281, 257, 24106, 3701, 295, 295, 264, 2856, 406, + 445, 257, 406, 445, 411, 264, 2283, 2969, 291, 458, 561, 519, 466, 6459, 41698, + 436, 519, 466, 50988], "temperature": 0.0, "avg_logprob": -0.11717454679719694, + "compression_ratio": 1.7685589519650655, "no_speech_prob": 0.0008840188384056091}, + {"id": 157, "seek": 160792, "start": 1621.04, "end": 1629.52, "text": " stemming + and they think about lower casing but more and more it was like oh I only I can + only work within this box it as a search engine", "tokens": [51020, 12312, 2810, + 293, 436, 519, 466, 3126, 45109, 457, 544, 293, 544, 309, 390, 411, 1954, 286, 787, + 286, 393, 787, 589, 1951, 341, 2424, 309, 382, 257, 3164, 2848, 51444], "temperature": + 0.0, "avg_logprob": -0.11717454679719694, "compression_ratio": 1.7685589519650655, + "no_speech_prob": 0.0008840188384056091}, {"id": 158, "seek": 160792, "start": 1630.24, + "end": 1635.52, "text": " and whether it becomes like plugins or whatever how can + I how can I massage", "tokens": [51480, 293, 1968, 309, 3643, 411, 33759, 420, 2035, + 577, 393, 286, 577, 393, 286, 16145, 51744], "temperature": 0.0, "avg_logprob": + -0.11717454679719694, "compression_ratio": 1.7685589519650655, "no_speech_prob": + 0.0008840188384056091}, {"id": 159, "seek": 163552, "start": 1636.4, "end": 1640.0, + "text": " the text coming in and the queries coming in so that they they mapped + to each other", "tokens": [50408, 264, 2487, 1348, 294, 293, 264, 24109, 1348, 294, + 370, 300, 436, 436, 33318, 281, 1184, 661, 50588], "temperature": 0.0, "avg_logprob": + -0.1454342557238294, "compression_ratio": 1.7659574468085106, "no_speech_prob": + 0.0023761398624628782}, {"id": 160, "seek": 163552, "start": 1640.8799999999999, + "end": 1648.96, "text": " and so in that in that context it''s it''s like you know + people you may have 
heard Conway''s law which is like you end up", "tokens": [50632, + 293, 370, 294, 300, 294, 300, 4319, 309, 311, 309, 311, 411, 291, 458, 561, 291, + 815, 362, 2198, 2656, 676, 311, 2101, 597, 307, 411, 291, 917, 493, 51036], "temperature": + 0.0, "avg_logprob": -0.1454342557238294, "compression_ratio": 1.7659574468085106, + "no_speech_prob": 0.0023761398624628782}, {"id": 161, "seek": 163552, "start": 1648.96, + "end": 1657.2, "text": " shipping your org chart like how you structure your projects + is very much tied to the organizational structure of how you sort of", "tokens": + [51036, 14122, 428, 14045, 6927, 411, 577, 291, 3877, 428, 4455, 307, 588, 709, + 9601, 281, 264, 24730, 3877, 295, 577, 291, 1333, 295, 51448], "temperature": 0.0, + "avg_logprob": -0.1454342557238294, "compression_ratio": 1.7659574468085106, "no_speech_prob": + 0.0023761398624628782}, {"id": 162, "seek": 165720, "start": 1658.16, "end": 1670.32, + "text": " of how you do things and so the consultant slash relevance team it really + only works in the box that is the search engine and makes the magic thing magic + more magical", "tokens": [50412, 295, 577, 291, 360, 721, 293, 370, 264, 24676, + 17330, 32684, 1469, 309, 534, 787, 1985, 294, 264, 2424, 300, 307, 264, 3164, 2848, + 293, 1669, 264, 5585, 551, 5585, 544, 12066, 51020], "temperature": 0.0, "avg_logprob": + -0.16040148500536308, "compression_ratio": 1.6867469879518073, "no_speech_prob": + 0.00020199528080411255}, {"id": 163, "seek": 165720, "start": 1672.64, "end": 1680.24, + "text": " and so how it''s when I think about that often it''s sort of similar to + how people think about relational databases", "tokens": [51136, 293, 370, 577, 309, + 311, 562, 286, 519, 466, 300, 2049, 309, 311, 1333, 295, 2531, 281, 577, 561, 519, + 466, 38444, 22380, 51516], "temperature": 0.0, "avg_logprob": -0.16040148500536308, + "compression_ratio": 1.6867469879518073, "no_speech_prob": 0.00020199528080411255}, + {"id": 164, "seek": 168024, 
"start": 1680.88, "end": 1694.88, "text": " you''re + creating a structure of a database to answer certain questions in the same way using + analysis and how you create fields you''re sort of like structuring an index or + a view of some documents", "tokens": [50396, 291, 434, 4084, 257, 3877, 295, 257, + 8149, 281, 1867, 1629, 1651, 294, 264, 912, 636, 1228, 5215, 293, 577, 291, 1884, + 7909, 291, 434, 1333, 295, 411, 6594, 1345, 364, 8186, 420, 257, 1910, 295, 512, + 8512, 51096], "temperature": 0.0, "avg_logprob": -0.1283384479888498, "compression_ratio": + 1.7878787878787878, "no_speech_prob": 0.00048254651483148336}, {"id": 165, "seek": + 168024, "start": 1695.6, "end": 1706.8, "text": " to answer these natural language + queries that come in and so everything is sort of like thinking about massaging + this database to really to really get to that", "tokens": [51132, 281, 1867, 613, + 3303, 2856, 24109, 300, 808, 294, 293, 370, 1203, 307, 1333, 295, 411, 1953, 466, + 2758, 3568, 341, 8149, 281, 534, 281, 534, 483, 281, 300, 51692], "temperature": + 0.0, "avg_logprob": -0.1283384479888498, "compression_ratio": 1.7878787878787878, + "no_speech_prob": 0.00048254651483148336}, {"id": 166, "seek": 170680, "start": + 1707.36, "end": 1713.2, "text": " into rank results in a way that sort of like gets + closer to the questions that users are answering", "tokens": [50392, 666, 6181, + 3542, 294, 257, 636, 300, 1333, 295, 411, 2170, 4966, 281, 264, 1651, 300, 5022, + 366, 13430, 50684], "temperature": 0.0, "avg_logprob": -0.18800059342995667, "compression_ratio": + 1.6714285714285715, "no_speech_prob": 0.0005064276629127562}, {"id": 167, "seek": + 170680, "start": 1714.48, "end": 1725.04, "text": " and I really concrete example + of that is uh you know I think this comes up a lot in that actually this is my one + of my earliest first projects was", "tokens": [50748, 293, 286, 534, 9859, 1365, + 295, 300, 307, 2232, 291, 458, 286, 519, 341, 1487, 493, 257, 688, 294, 300, 
767, + 341, 307, 452, 472, 295, 452, 20573, 700, 4455, 390, 51276], "temperature": 0.0, + "avg_logprob": -0.18800059342995667, "compression_ratio": 1.6714285714285715, "no_speech_prob": + 0.0005064276629127562}, {"id": 168, "seek": 170680, "start": 1725.68, "end": 1733.9199999999998, + "text": " if you take some let''s say some medical knowledge into a your indexing + like questions or medical articles", "tokens": [51308, 498, 291, 747, 512, 718, + 311, 584, 512, 4625, 3601, 666, 257, 428, 8186, 278, 411, 1651, 420, 4625, 11290, + 51720], "temperature": 0.0, "avg_logprob": -0.18800059342995667, "compression_ratio": + 1.6714285714285715, "no_speech_prob": 0.0005064276629127562}, {"id": 169, "seek": + 173392, "start": 1734.4, "end": 1745.3600000000001, "text": " and you have a there + are taxonomies out there that are like mesh is one medical subject headings that + say like okay this article is an article about um", "tokens": [50388, 293, 291, + 362, 257, 456, 366, 3366, 12481, 530, 484, 456, 300, 366, 411, 17407, 307, 472, + 4625, 3983, 1378, 1109, 300, 584, 411, 1392, 341, 7222, 307, 364, 7222, 466, 1105, + 50936], "temperature": 0.0, "avg_logprob": -0.12052681571558903, "compression_ratio": + 1.7513812154696133, "no_speech_prob": 0.0010315979598090053}, {"id": 170, "seek": + 173392, "start": 1746.96, "end": 1757.2, "text": " it let''s say something in the + cardiovascular system it has to do with the heart and it has to do with like the + left ventricle so you that''s a taxonomy it''s hierarchy", "tokens": [51016, 309, + 718, 311, 584, 746, 294, 264, 31786, 1185, 309, 575, 281, 360, 365, 264, 1917, 293, + 309, 575, 281, 360, 365, 411, 264, 1411, 6931, 1341, 306, 370, 291, 300, 311, 257, + 3366, 23423, 309, 311, 22333, 51528], "temperature": 0.0, "avg_logprob": -0.12052681571558903, + "compression_ratio": 1.7513812154696133, "no_speech_prob": 0.0010315979598090053}, + {"id": 171, "seek": 175720, "start": 1757.92, "end": 1767.2, "text": " and um if + I can index 
that and I can index that taxonomy a certain way so that if someone + if I take a query", "tokens": [50400, 293, 1105, 498, 286, 393, 8186, 300, 293, + 286, 393, 8186, 300, 3366, 23423, 257, 1629, 636, 370, 300, 498, 1580, 498, 286, + 747, 257, 14581, 50864], "temperature": 0.0, "avg_logprob": -0.1055557131767273, + "compression_ratio": 1.7425149700598803, "no_speech_prob": 0.0008111178176477551}, + {"id": 172, "seek": 175720, "start": 1768.0, "end": 1775.92, "text": " I also map + the query to something in that taxonomy let''s say cardiovascular system heart rate + ventricle", "tokens": [50904, 286, 611, 4471, 264, 14581, 281, 746, 294, 300, 3366, + 23423, 718, 311, 584, 31786, 1185, 1917, 3314, 6931, 1341, 306, 51300], "temperature": + 0.0, "avg_logprob": -0.1055557131767273, "compression_ratio": 1.7425149700598803, + "no_speech_prob": 0.0008111178176477551}, {"id": 173, "seek": 175720, "start": 1777.04, + "end": 1782.48, "text": " if I can engineer the similarity in the search engines + so that it uh it kind of", "tokens": [51356, 498, 286, 393, 11403, 264, 32194, 294, + 264, 3164, 12982, 370, 300, 309, 2232, 309, 733, 295, 51628], "temperature": 0.0, + "avg_logprob": -0.1055557131767273, "compression_ratio": 1.7425149700598803, "no_speech_prob": + 0.0008111178176477551}, {"id": 174, "seek": 178248, "start": 1782.56, "end": 1794.48, + "text": " uses the analysis to be like oh it has so many taxonomy nodes similar + that makes it more relevant but maybe it has one or two dissimilar that makes a + little less relevant", "tokens": [50368, 4960, 264, 5215, 281, 312, 411, 1954, 309, + 575, 370, 867, 3366, 23423, 13891, 2531, 300, 1669, 309, 544, 7340, 457, 1310, 309, + 575, 472, 420, 732, 7802, 332, 2202, 300, 1669, 257, 707, 1570, 7340, 50964], "temperature": + 0.0, "avg_logprob": -0.22668403176700366, "compression_ratio": 1.6893203883495145, + "no_speech_prob": 0.00017477509391028434}, {"id": 175, "seek": 178248, "start": + 1795.44, "end": 1810.48, "text": " if I 
can sort of like zero in on a on on that + uh then I''m really getting closer to sort of meaning that I am to uh you know whether + it''s like a stemd version of this word or not", "tokens": [51012, 498, 286, 393, + 1333, 295, 411, 4018, 294, 322, 257, 322, 322, 300, 2232, 550, 286, 478, 534, 1242, + 4966, 281, 1333, 295, 3620, 300, 286, 669, 281, 2232, 291, 458, 1968, 309, 311, + 411, 257, 12312, 67, 3037, 295, 341, 1349, 420, 406, 51764], "temperature": 0.0, + "avg_logprob": -0.22668403176700366, "compression_ratio": 1.6893203883495145, "no_speech_prob": + 0.00017477509391028434}, {"id": 176, "seek": 181248, "start": 1812.96, "end": 1820.56, + "text": " and you can create tokenization pipelines that take terms like let''s + say uh myocardial infarction which is a heart attack", "tokens": [50388, 293, 291, + 393, 1884, 14862, 2144, 40168, 300, 747, 2115, 411, 718, 311, 584, 2232, 452, 905, + 515, 831, 1536, 289, 882, 597, 307, 257, 1917, 2690, 50768], "temperature": 0.0, + "avg_logprob": -0.11025592427194854, "compression_ratio": 1.6871794871794872, "no_speech_prob": + 0.0004045827954541892}, {"id": 177, "seek": 181248, "start": 1820.56, "end": 1828.32, + "text": " and sort of like uses synonyms and other things to say oh it''s actually + this part in this taxonomy uh and", "tokens": [50768, 293, 1333, 295, 411, 4960, + 5451, 2526, 2592, 293, 661, 721, 281, 584, 1954, 309, 311, 767, 341, 644, 294, 341, + 3366, 23423, 2232, 293, 51156], "temperature": 0.0, "avg_logprob": -0.11025592427194854, + "compression_ratio": 1.6871794871794872, "no_speech_prob": 0.0004045827954541892}, + {"id": 178, "seek": 181248, "start": 1828.32, "end": 1835.2, "text": " therefore + you know we we we sort of expanded to these taxonomy tokens and uh same thing at + index time", "tokens": [51156, 4412, 291, 458, 321, 321, 321, 1333, 295, 14342, + 281, 613, 3366, 23423, 22667, 293, 2232, 912, 551, 412, 8186, 565, 51500], "temperature": + 0.0, "avg_logprob": -0.11025592427194854, 
"compression_ratio": 1.6871794871794872, + "no_speech_prob": 0.0004045827954541892}, {"id": 179, "seek": 183520, "start": 1836.16, + "end": 1839.8400000000001, "text": " and so I got very adept at sort of massaging + data in that way", "tokens": [50412, 293, 370, 286, 658, 588, 614, 5250, 412, 1333, + 295, 2758, 3568, 1412, 294, 300, 636, 50596], "temperature": 0.0, "avg_logprob": + -0.07774044642938632, "compression_ratio": 1.7868852459016393, "no_speech_prob": + 0.0006058233557268977}, {"id": 180, "seek": 183520, "start": 1841.2, "end": 1849.1200000000001, + "text": " but I think like when you take a step back and you think about if you + have access to full indexing pipeline as most teams do", "tokens": [50664, 457, + 286, 519, 411, 562, 291, 747, 257, 1823, 646, 293, 291, 519, 466, 498, 291, 362, + 2105, 281, 1577, 8186, 278, 15517, 382, 881, 5491, 360, 51060], "temperature": 0.0, + "avg_logprob": -0.07774044642938632, "compression_ratio": 1.7868852459016393, "no_speech_prob": + 0.0006058233557268977}, {"id": 181, "seek": 183520, "start": 1849.1200000000001, + "end": 1856.72, "text": " and you have access to the full query API and everything + um really you''re doing the same exact thing you''re sort of like", "tokens": [51060, + 293, 291, 362, 2105, 281, 264, 1577, 14581, 9362, 293, 1203, 1105, 534, 291, 434, + 884, 264, 912, 1900, 551, 291, 434, 1333, 295, 411, 51440], "temperature": 0.0, + "avg_logprob": -0.07774044642938632, "compression_ratio": 1.7868852459016393, "no_speech_prob": + 0.0006058233557268977}, {"id": 182, "seek": 183520, "start": 1857.3600000000001, + "end": 1863.3600000000001, "text": " massaging content as it''s come in comes in + in some ways you have more tools if you can do it before it gets to the search engine", + "tokens": [51472, 2758, 3568, 2701, 382, 309, 311, 808, 294, 1487, 294, 294, 512, + 2098, 291, 362, 544, 3873, 498, 291, 393, 360, 309, 949, 309, 2170, 281, 264, 3164, + 2848, 51772], "temperature": 0.0, "avg_logprob": 
-0.07774044642938632, "compression_ratio": + 1.7868852459016393, "no_speech_prob": 0.0006058233557268977}, {"id": 183, "seek": + 186336, "start": 1864.0, "end": 1866.8, "text": " and the same thing with queries + you might have some ability to", "tokens": [50396, 293, 264, 912, 551, 365, 24109, + 291, 1062, 362, 512, 3485, 281, 50536], "temperature": 0.0, "avg_logprob": -0.09201577305793762, + "compression_ratio": 1.9150943396226414, "no_speech_prob": 0.0002215670101577416}, + {"id": 184, "seek": 186336, "start": 1867.84, "end": 1871.76, "text": " apply an + nLP model or do some kind of any recognition before it comes in", "tokens": [50588, + 3079, 364, 297, 45196, 2316, 420, 360, 512, 733, 295, 604, 11150, 949, 309, 1487, + 294, 50784], "temperature": 0.0, "avg_logprob": -0.09201577305793762, "compression_ratio": + 1.9150943396226414, "no_speech_prob": 0.0002215670101577416}, {"id": 185, "seek": + 186336, "start": 1872.4799999999998, "end": 1876.7199999999998, "text": " so philosophically + you''re really doing the same thing you''re trying to map um", "tokens": [50820, + 370, 14529, 984, 291, 434, 534, 884, 264, 912, 551, 291, 434, 1382, 281, 4471, 1105, + 51032], "temperature": 0.0, "avg_logprob": -0.09201577305793762, "compression_ratio": + 1.9150943396226414, "no_speech_prob": 0.0002215670101577416}, {"id": 186, "seek": + 186336, "start": 1878.56, "end": 1884.3999999999999, "text": " you''re sort of at + one side you''re mapping documents to queries and on the other side you''re", "tokens": + [51124, 291, 434, 1333, 295, 412, 472, 1252, 291, 434, 18350, 8512, 281, 24109, + 293, 322, 264, 661, 1252, 291, 434, 51416], "temperature": 0.0, "avg_logprob": -0.09201577305793762, + "compression_ratio": 1.9150943396226414, "no_speech_prob": 0.0002215670101577416}, + {"id": 187, "seek": 186336, "start": 1884.3999999999999, "end": 1890.1599999999999, + "text": " mapping query to sort of the document structure and you''re trying to + map those two together in a way", 
"tokens": [51416, 18350, 14581, 281, 1333, 295, + 264, 4166, 3877, 293, 291, 434, 1382, 281, 4471, 729, 732, 1214, 294, 257, 636, + 51704], "temperature": 0.0, "avg_logprob": -0.09201577305793762, "compression_ratio": + 1.9150943396226414, "no_speech_prob": 0.0002215670101577416}, {"id": 188, "seek": + 189016, "start": 1890.16, "end": 1893.76, "text": " that creates a ranking function + that that does what you want it to do", "tokens": [50364, 300, 7829, 257, 17833, + 2445, 300, 300, 775, 437, 291, 528, 309, 281, 360, 50544], "temperature": 0.0, "avg_logprob": + -0.19846743160916358, "compression_ratio": 1.7755905511811023, "no_speech_prob": + 0.0012437461409717798}, {"id": 189, "seek": 189016, "start": 1894.96, "end": 1901.44, + "text": " yeah absolutely um I think it was Daniel Tanklank who summarized his 20 + years experience as comparing", "tokens": [50604, 1338, 3122, 1105, 286, 519, 309, + 390, 8033, 28746, 75, 657, 567, 14611, 1602, 702, 945, 924, 1752, 382, 15763, 50928], + "temperature": 0.0, "avg_logprob": -0.19846743160916358, "compression_ratio": 1.7755905511811023, + "no_speech_prob": 0.0012437461409717798}, {"id": 190, "seek": 189016, "start": 1901.44, + "end": 1906.88, "text": " sets of documents right so like is this set of documents + better than the other and then", "tokens": [50928, 6352, 295, 8512, 558, 370, 411, + 307, 341, 992, 295, 8512, 1101, 813, 264, 661, 293, 550, 51200], "temperature": + 0.0, "avg_logprob": -0.19846743160916358, "compression_ratio": 1.7755905511811023, + "no_speech_prob": 0.0012437461409717798}, {"id": 191, "seek": 189016, "start": 1907.28, + "end": 1911.8400000000001, "text": " everything else comes as input you know was + it query understanding was it content understanding", "tokens": [51220, 1203, 1646, + 1487, 382, 4846, 291, 458, 390, 309, 14581, 3701, 390, 309, 2701, 3701, 51448], + "temperature": 0.0, "avg_logprob": -0.19846743160916358, "compression_ratio": 1.7755905511811023, + "no_speech_prob": 
0.0012437461409717798}, {"id": 192, "seek": 189016, "start": 1911.8400000000001, + "end": 1918.96, "text": " whatever yeah it''s it''s amazing absolutely absolutely + yeah and it''s uh all of these things come", "tokens": [51448, 2035, 1338, 309, + 311, 309, 311, 2243, 3122, 3122, 1338, 293, 309, 311, 2232, 439, 295, 613, 721, + 808, 51804], "temperature": 0.0, "avg_logprob": -0.19846743160916358, "compression_ratio": + 1.7755905511811023, "no_speech_prob": 0.0012437461409717798}, {"id": 193, "seek": + 191896, "start": 1918.96, "end": 1922.96, "text": " together and the search engine + is kind of like the core driver and you''re trying to massaging this", "tokens": + [50364, 1214, 293, 264, 3164, 2848, 307, 733, 295, 411, 264, 4965, 6787, 293, 291, + 434, 1382, 281, 2758, 3568, 341, 50564], "temperature": 0.0, "avg_logprob": -0.19044031098831532, + "compression_ratio": 1.7568807339449541, "no_speech_prob": 0.0007721655420027673}, + {"id": 194, "seek": 191896, "start": 1922.96, "end": 1928.96, "text": " similarity + engine to to make that quote-unquote cosine similarity what you want it to be", + "tokens": [50564, 32194, 2848, 281, 281, 652, 300, 6513, 12, 409, 25016, 23565, + 32194, 437, 291, 528, 309, 281, 312, 50864], "temperature": 0.0, "avg_logprob": + -0.19044031098831532, "compression_ratio": 1.7568807339449541, "no_speech_prob": + 0.0007721655420027673}, {"id": 195, "seek": 191896, "start": 1929.68, "end": 1935.8400000000001, + "text": " yeah I''ve recently ran across one case uh so in map search you could + think well what what people", "tokens": [50900, 1338, 286, 600, 3938, 5872, 2108, + 472, 1389, 2232, 370, 294, 4471, 3164, 291, 727, 519, 731, 437, 437, 561, 51208], + "temperature": 0.0, "avg_logprob": -0.19044031098831532, "compression_ratio": 1.7568807339449541, + "no_speech_prob": 0.0007721655420027673}, {"id": 196, "seek": 191896, "start": 1935.8400000000001, + "end": 1943.76, "text": " do type there well they do type addresses they type um + 
coordinates um they also types uh questions", "tokens": [51208, 360, 2010, 456, + 731, 436, 360, 2010, 16862, 436, 2010, 1105, 21056, 1105, 436, 611, 3467, 2232, + 1651, 51604], "temperature": 0.0, "avg_logprob": -0.19044031098831532, "compression_ratio": + 1.7568807339449541, "no_speech_prob": 0.0007721655420027673}, {"id": 197, "seek": + 194376, "start": 1944.24, "end": 1949.28, "text": " they can see how do where do + I go hiking here you know in this area stuff like that not something", "tokens": + [50388, 436, 393, 536, 577, 360, 689, 360, 286, 352, 23784, 510, 291, 458, 294, + 341, 1859, 1507, 411, 300, 406, 746, 50640], "temperature": 0.0, "avg_logprob": + -0.20976159883582074, "compression_ratio": 1.6973684210526316, "no_speech_prob": + 0.0012910300865769386}, {"id": 198, "seek": 194376, "start": 1949.28, "end": 1957.76, + "text": " we can handle right now but maybe in the future we will and the case was + um there was a company", "tokens": [50640, 321, 393, 4813, 558, 586, 457, 1310, + 294, 264, 2027, 321, 486, 293, 264, 1389, 390, 1105, 456, 390, 257, 2237, 51064], + "temperature": 0.0, "avg_logprob": -0.20976159883582074, "compression_ratio": 1.6973684210526316, + "no_speech_prob": 0.0012910300865769386}, {"id": 199, "seek": 194376, "start": 1958.16, + "end": 1964.72, "text": " a search with company name so right we support points + of interest search B.O.I and our search", "tokens": [51084, 257, 3164, 365, 2237, + 1315, 370, 558, 321, 1406, 2793, 295, 1179, 3164, 363, 13, 46, 13, 40, 293, 527, + 3164, 51412], "temperature": 0.0, "avg_logprob": -0.20976159883582074, "compression_ratio": + 1.6973684210526316, "no_speech_prob": 0.0012910300865769386}, {"id": 200, "seek": + 194376, "start": 1964.72, "end": 1971.6, "text": " engine focused on so you have + a meaningful part of the company name I don''t remember something like", "tokens": + [51412, 2848, 5178, 322, 370, 291, 362, 257, 10995, 644, 295, 264, 2237, 1315, 286, + 500, 380, 1604, 746, 411, 51756], 
"temperature": 0.0, "avg_logprob": -0.20976159883582074, + "compression_ratio": 1.6973684210526316, "no_speech_prob": 0.0012910300865769386}, + {"id": 201, "seek": 197160, "start": 1972.3999999999999, "end": 1980.08, "text": + " white mice something and then had like less meaningful parts like limited south + Africa you know", "tokens": [50404, 2418, 22257, 746, 293, 550, 632, 411, 1570, + 10995, 3166, 411, 5567, 7377, 7349, 291, 458, 50788], "temperature": 0.0, "avg_logprob": + -0.22082775453977946, "compression_ratio": 1.6343612334801763, "no_speech_prob": + 0.0021685075480490923}, {"id": 202, "seek": 197160, "start": 1980.08, "end": 1985.4399999999998, + "text": " things that would repeat across a number of company names and I was search + engine because you", "tokens": [50788, 721, 300, 576, 7149, 2108, 257, 1230, 295, + 2237, 5288, 293, 286, 390, 3164, 2848, 570, 291, 51056], "temperature": 0.0, "avg_logprob": + -0.22082775453977946, "compression_ratio": 1.6343612334801763, "no_speech_prob": + 0.0021685075480490923}, {"id": 203, "seek": 197160, "start": 1985.4399999999998, + "end": 1992.0, "text": " you have the feature of minstreet max right it it it it + focused actually on less meaningful", "tokens": [51056, 291, 362, 264, 4111, 295, + 923, 372, 4751, 11469, 558, 309, 309, 309, 309, 5178, 767, 322, 1570, 10995, 51384], + "temperature": 0.0, "avg_logprob": -0.22082775453977946, "compression_ratio": 1.6343612334801763, + "no_speech_prob": 0.0021685075480490923}, {"id": 204, "seek": 197160, "start": 1992.0, + "end": 1998.56, "text": " components and so we bumped some overlap higher just because + how TF idea by the way works", "tokens": [51384, 6677, 293, 370, 321, 42696, 512, + 19959, 2946, 445, 570, 577, 40964, 1558, 538, 264, 636, 1985, 51712], "temperature": + 0.0, "avg_logprob": -0.22082775453977946, "compression_ratio": 1.6343612334801763, + "no_speech_prob": 0.0021685075480490923}, {"id": 205, "seek": 199856, "start": 1998.56, + "end": 2004.72, "text": 
" it''s also a lot of work they''re going into understanding + why does this TF equal to this number I", "tokens": [50364, 309, 311, 611, 257, + 688, 295, 589, 436, 434, 516, 666, 3701, 983, 775, 341, 40964, 2681, 281, 341, 1230, + 286, 50672], "temperature": 0.0, "avg_logprob": -0.12998182796737523, "compression_ratio": + 1.701067615658363, "no_speech_prob": 0.0007650140323676169}, {"id": 206, "seek": + 199856, "start": 2004.72, "end": 2012.56, "text": " need to figure out idea right + um yeah I mean and I was like so I went on Twitter and I tweeted", "tokens": [50672, + 643, 281, 2573, 484, 1558, 558, 1105, 1338, 286, 914, 293, 286, 390, 411, 370, 286, + 1437, 322, 5794, 293, 286, 25646, 51064], "temperature": 0.0, "avg_logprob": -0.12998182796737523, + "compression_ratio": 1.701067615658363, "no_speech_prob": 0.0007650140323676169}, + {"id": 207, "seek": 199856, "start": 2012.56, "end": 2018.3999999999999, "text": + " like I came across another use case where maybe vector search could help because + it would actually", "tokens": [51064, 411, 286, 1361, 2108, 1071, 764, 1389, 689, + 1310, 8062, 3164, 727, 854, 570, 309, 576, 767, 51356], "temperature": 0.0, "avg_logprob": + -0.12998182796737523, "compression_ratio": 1.701067615658363, "no_speech_prob": + 0.0007650140323676169}, {"id": 208, "seek": 199856, "start": 2018.3999999999999, + "end": 2022.8799999999999, "text": " focus on the meaningful part hopefully because + you have the attention uh mechanism in the", "tokens": [51356, 1879, 322, 264, 10995, + 644, 4696, 570, 291, 362, 264, 3202, 2232, 7513, 294, 264, 51580], "temperature": + 0.0, "avg_logprob": -0.12998182796737523, "compression_ratio": 1.701067615658363, + "no_speech_prob": 0.0007650140323676169}, {"id": 209, "seek": 199856, "start": 2022.8799999999999, + "end": 2028.3999999999999, "text": " transformer models right like bird and others + so presumably it would focus only on the right part", "tokens": [51580, 31782, 5245, + 558, 411, 5255, 293, 
2357, 370, 26742, 309, 576, 1879, 787, 322, 264, 558, 644, + 51856], "temperature": 0.0, "avg_logprob": -0.12998182796737523, "compression_ratio": + 1.701067615658363, "no_speech_prob": 0.0007650140323676169}, {"id": 210, "seek": + 202856, "start": 2028.56, "end": 2034.0, "text": " and it would find it do you think + do you think this was a moment of despair do you believe in this?", "tokens": [50364, + 293, 309, 576, 915, 309, 360, 291, 519, 360, 291, 519, 341, 390, 257, 1623, 295, + 25763, 360, 291, 1697, 294, 341, 30, 50636], "temperature": 0.0, "avg_logprob": + -0.081960662206014, "compression_ratio": 1.7546296296296295, "no_speech_prob": 0.00014269817620515823}, + {"id": 211, "seek": 202856, "start": 2035.12, "end": 2041.52, "text": " uh I I mean + I do believe in it to some extent like I totally believe that''s a valid thing", + "tokens": [50692, 2232, 286, 286, 914, 286, 360, 1697, 294, 309, 281, 512, 8396, + 411, 286, 3879, 1697, 300, 311, 257, 7363, 551, 51012], "temperature": 0.0, "avg_logprob": + -0.081960662206014, "compression_ratio": 1.7546296296296295, "no_speech_prob": 0.00014269817620515823}, + {"id": 212, "seek": 202856, "start": 2042.3999999999999, "end": 2047.76, "text": + " I also think that like sometimes the document frequency itself is really interesting + because it''s", "tokens": [51056, 286, 611, 519, 300, 411, 2171, 264, 4166, 7893, + 2564, 307, 534, 1880, 570, 309, 311, 51324], "temperature": 0.0, "avg_logprob": + -0.081960662206014, "compression_ratio": 1.7546296296296295, "no_speech_prob": 0.00014269817620515823}, + {"id": 213, "seek": 202856, "start": 2047.76, "end": 2055.68, "text": " like it + gets at the idea of specificity in the query and so if you search for something + and", "tokens": [51324, 411, 309, 2170, 412, 264, 1558, 295, 2685, 507, 294, 264, + 14581, 293, 370, 498, 291, 3164, 337, 746, 293, 51720], "temperature": 0.0, "avg_logprob": + -0.081960662206014, "compression_ratio": 1.7546296296296295, "no_speech_prob": 
0.00014269817620515823}, + {"id": 214, "seek": 205568, "start": 2056.3999999999996, "end": 2061.8399999999997, + "text": " it''s just like the document frequency is sometimes a poor measure of + specificity because", "tokens": [50400, 309, 311, 445, 411, 264, 4166, 7893, 307, + 2171, 257, 4716, 3481, 295, 2685, 507, 570, 50672], "temperature": 0.0, "avg_logprob": + -0.11377954483032227, "compression_ratio": 1.7317073170731707, "no_speech_prob": + 0.0004883946967311203}, {"id": 215, "seek": 205568, "start": 2062.64, "end": 2068.8799999999997, + "text": " it''s uh it''s not like it''s actually you know just because something + is rare in the corpus doesn''t", "tokens": [50712, 309, 311, 2232, 309, 311, 406, + 411, 309, 311, 767, 291, 458, 445, 570, 746, 307, 5892, 294, 264, 1181, 31624, 1177, + 380, 51024], "temperature": 0.0, "avg_logprob": -0.11377954483032227, "compression_ratio": + 1.7317073170731707, "no_speech_prob": 0.0004883946967311203}, {"id": 216, "seek": + 205568, "start": 2068.8799999999997, "end": 2076.96, "text": " necessarily mean + it''s it''s uh it''s it''s actually more specific to the users intent and some cases", + "tokens": [51024, 4725, 914, 309, 311, 309, 311, 2232, 309, 311, 309, 311, 767, + 544, 2685, 281, 264, 5022, 8446, 293, 512, 3331, 51428], "temperature": 0.0, "avg_logprob": + -0.11377954483032227, "compression_ratio": 1.7317073170731707, "no_speech_prob": + 0.0004883946967311203}, {"id": 217, "seek": 207696, "start": 2076.96, "end": 2085.2, + "text": " like that is just like uh thinking of when when we would do uh we did + a project for a", "tokens": [50364, 411, 300, 307, 445, 411, 2232, 1953, 295, 562, + 562, 321, 576, 360, 2232, 321, 630, 257, 1716, 337, 257, 50776], "temperature": + 0.0, "avg_logprob": -0.16024311198744662, "compression_ratio": 1.7355769230769231, + "no_speech_prob": 0.0004672530631069094}, {"id": 218, "seek": 207696, "start": 2085.2, + "end": 2090.2400000000002, "text": " Riley media to kind of help with a 
project + similar to Safari books online that people might be", "tokens": [50776, 31373, 3021, + 281, 733, 295, 854, 365, 257, 1716, 2531, 281, 43820, 3642, 2950, 300, 561, 1062, + 312, 51028], "temperature": 0.0, "avg_logprob": -0.16024311198744662, "compression_ratio": + 1.7355769230769231, "no_speech_prob": 0.0004672530631069094}, {"id": 219, "seek": + 207696, "start": 2090.2400000000002, "end": 2098.64, "text": " familiar with and + people don''t like if people search um job ascript or book job ascript books", "tokens": + [51028, 4963, 365, 293, 561, 500, 380, 411, 498, 561, 3164, 1105, 1691, 382, 5944, + 420, 1446, 1691, 382, 5944, 3642, 51448], "temperature": 0.0, "avg_logprob": -0.16024311198744662, + "compression_ratio": 1.7355769230769231, "no_speech_prob": 0.0004672530631069094}, + {"id": 220, "seek": 207696, "start": 2099.68, "end": 2106.32, "text": " it just + so happens that just how titles are written if you write a book on on uh react", + "tokens": [51500, 309, 445, 370, 2314, 300, 445, 577, 12992, 366, 3720, 498, 291, + 2464, 257, 1446, 322, 322, 2232, 4515, 51832], "temperature": 0.0, "avg_logprob": + -0.16024311198744662, "compression_ratio": 1.7355769230769231, "no_speech_prob": + 0.0004672530631069094}, {"id": 221, "seek": 210696, "start": 2107.36, "end": 2112.88, + "text": " you''re not going to put JavaScript in the title but react is conceptually + you know about", "tokens": [50384, 291, 434, 406, 516, 281, 829, 15778, 294, 264, + 4876, 457, 4515, 307, 3410, 671, 291, 458, 466, 50660], "temperature": 0.0, "avg_logprob": + -0.14067944288253784, "compression_ratio": 1.8781725888324874, "no_speech_prob": + 0.00045579587458632886}, {"id": 222, "seek": 210696, "start": 2112.88, "end": 2119.6, + "text": " JavaScript and so uh what''s really interesting is like I you know you + type JavaScript books react", "tokens": [50660, 15778, 293, 370, 2232, 437, 311, + 534, 1880, 307, 411, 286, 291, 458, 291, 2010, 15778, 3642, 4515, 50996], "temperature": + 
0.0, "avg_logprob": -0.14067944288253784, "compression_ratio": 1.8781725888324874, + "no_speech_prob": 0.00045579587458632886}, {"id": 223, "seek": 210696, "start": + 2119.6, "end": 2125.12, "text": " might be a great great react book might be a great + JavaScript book but you have to understand", "tokens": [50996, 1062, 312, 257, 869, + 869, 4515, 1446, 1062, 312, 257, 869, 15778, 1446, 457, 291, 362, 281, 1223, 51272], + "temperature": 0.0, "avg_logprob": -0.14067944288253784, "compression_ratio": 1.8781725888324874, + "no_speech_prob": 0.00045579587458632886}, {"id": 224, "seek": 210696, "start": + 2125.12, "end": 2130.4, "text": " react in the context of this broader concept of + JavaScript even though that exact term is", "tokens": [51272, 4515, 294, 264, 4319, + 295, 341, 13227, 3410, 295, 15778, 754, 1673, 300, 1900, 1433, 307, 51536], "temperature": + 0.0, "avg_logprob": -0.14067944288253784, "compression_ratio": 1.8781725888324874, + "no_speech_prob": 0.00045579587458632886}, {"id": 225, "seek": 213040, "start": + 2130.4, "end": 2138.8, "text": " put in title so uh this concept of term specificity + is really useful but it''s often like uh the", "tokens": [50364, 829, 294, 4876, + 370, 2232, 341, 3410, 295, 1433, 2685, 507, 307, 534, 4420, 457, 309, 311, 2049, + 411, 2232, 264, 50784], "temperature": 0.0, "avg_logprob": -0.10187999937269422, + "compression_ratio": 1.7432432432432432, "no_speech_prob": 0.0008469465537928045}, + {"id": 226, "seek": 213040, "start": 2138.8, "end": 2144.08, "text": " the way we + get at it with document frequency it can be can be really invalid you know not great", + "tokens": [50784, 264, 636, 321, 483, 412, 309, 365, 4166, 7893, 309, 393, 312, + 393, 312, 534, 34702, 291, 458, 406, 869, 51048], "temperature": 0.0, "avg_logprob": + -0.10187999937269422, "compression_ratio": 1.7432432432432432, "no_speech_prob": + 0.0008469465537928045}, {"id": 227, "seek": 213040, "start": 2144.64, "end": 2150.0, + "text": " and to your point 
about like the attention mechanism that''s that''s really + that''s really interesting", "tokens": [51076, 293, 281, 428, 935, 466, 411, 264, + 3202, 7513, 300, 311, 300, 311, 534, 300, 311, 534, 1880, 51344], "temperature": + 0.0, "avg_logprob": -0.10187999937269422, "compression_ratio": 1.7432432432432432, + "no_speech_prob": 0.0008469465537928045}, {"id": 228, "seek": 213040, "start": 2150.0, + "end": 2158.56, "text": " because um yeah I sort of I could see like conceptually + how uh how that can really like tie you", "tokens": [51344, 570, 1105, 1338, 286, + 1333, 295, 286, 727, 536, 411, 3410, 671, 577, 2232, 577, 300, 393, 534, 411, 7582, + 291, 51772], "temperature": 0.0, "avg_logprob": -0.10187999937269422, "compression_ratio": + 1.7432432432432432, "no_speech_prob": 0.0008469465537928045}, {"id": 229, "seek": + 215856, "start": 2158.56, "end": 2164.4, "text": " sort of like zero in on the concepts + that are most important to a to a document and one challenge", "tokens": [50364, + 1333, 295, 411, 4018, 294, 322, 264, 10392, 300, 366, 881, 1021, 281, 257, 281, + 257, 4166, 293, 472, 3430, 50656], "temperature": 0.0, "avg_logprob": -0.1442065870905497, + "compression_ratio": 1.7397260273972603, "no_speech_prob": 7.430406549246982e-05}, + {"id": 230, "seek": 215856, "start": 2164.4, "end": 2171.84, "text": " with like + one of the reasons I think Bert is so found like transformative is traditionally + like", "tokens": [50656, 365, 411, 472, 295, 264, 4112, 286, 519, 29594, 307, 370, + 1352, 411, 36070, 307, 19067, 411, 51028], "temperature": 0.0, "avg_logprob": -0.1442065870905497, + "compression_ratio": 1.7397260273972603, "no_speech_prob": 7.430406549246982e-05}, + {"id": 231, "seek": 215856, "start": 2171.84, "end": 2176.88, "text": " for years + and years even going back to the early 2000s with like lead and semantic analysis + of", "tokens": [51028, 337, 924, 293, 924, 754, 516, 646, 281, 264, 2440, 8132, + 82, 365, 411, 1477, 293, 47982, 5215, 295, 
51280], "temperature": 0.0, "avg_logprob": + -0.1442065870905497, "compression_ratio": 1.7397260273972603, "no_speech_prob": + 7.430406549246982e-05}, {"id": 232, "seek": 215856, "start": 2176.88, "end": 2183.2799999999997, + "text": " these these things and and then we have word to vac eventually these sort + of like techniques", "tokens": [51280, 613, 613, 721, 293, 293, 550, 321, 362, 1349, + 281, 2842, 4728, 613, 1333, 295, 411, 7512, 51600], "temperature": 0.0, "avg_logprob": + -0.1442065870905497, "compression_ratio": 1.7397260273972603, "no_speech_prob": + 7.430406549246982e-05}, {"id": 233, "seek": 218328, "start": 2184.0800000000004, + "end": 2190.32, "text": " uh they''re really great for like and some ways like increasing + recall or getting at like a rough", "tokens": [50404, 2232, 436, 434, 534, 869, + 337, 411, 293, 512, 2098, 411, 5662, 9901, 420, 1242, 412, 411, 257, 5903, 50716], + "temperature": 0.0, "avg_logprob": -0.11657959915870844, "compression_ratio": 1.8076923076923077, + "no_speech_prob": 0.0007920057978481054}, {"id": 234, "seek": 218328, "start": 2190.32, + "end": 2198.7200000000003, "text": " semantic sense of of what''s what''s uh you + know what''s there but when you at the end of the day", "tokens": [50716, 47982, + 2020, 295, 295, 437, 311, 437, 311, 2232, 291, 458, 437, 311, 456, 457, 562, 291, + 412, 264, 917, 295, 264, 786, 51136], "temperature": 0.0, "avg_logprob": -0.11657959915870844, + "compression_ratio": 1.8076923076923077, "no_speech_prob": 0.0007920057978481054}, + {"id": 235, "seek": 218328, "start": 2199.44, "end": 2206.6400000000003, "text": + " it''s like not helping me really get at the higher precision kind of component + of search", "tokens": [51172, 309, 311, 411, 406, 4315, 385, 534, 483, 412, 264, + 2946, 18356, 733, 295, 6542, 295, 3164, 51532], "temperature": 0.0, "avg_logprob": + -0.11657959915870844, "compression_ratio": 1.8076923076923077, "no_speech_prob": + 0.0007920057978481054}, {"id": 236, "seek": 
218328, "start": 2207.6800000000003, + "end": 2212.32, "text": " that really like traditional search engines thrive at + and still are really good at like you have", "tokens": [51584, 300, 534, 411, 5164, + 3164, 12982, 21233, 412, 293, 920, 366, 534, 665, 412, 411, 291, 362, 51816], "temperature": + 0.0, "avg_logprob": -0.11657959915870844, "compression_ratio": 1.8076923076923077, + "no_speech_prob": 0.0007920057978481054}, {"id": 237, "seek": 221232, "start": 2213.28, + "end": 2218.32, "text": " you know that this is a shoe I don''t need to see socks + just show me the shoes this is a shoe", "tokens": [50412, 291, 458, 300, 341, 307, + 257, 12796, 286, 500, 380, 643, 281, 536, 17564, 445, 855, 385, 264, 6654, 341, + 307, 257, 12796, 50664], "temperature": 0.0, "avg_logprob": -0.09745325212893279, + "compression_ratio": 1.7589285714285714, "no_speech_prob": 7.994768384378403e-05}, + {"id": 238, "seek": 221232, "start": 2219.36, "end": 2224.0800000000004, "text": + " and you don''t have that like fuzziness that you get in a dense vector representation + where everything", "tokens": [50716, 293, 291, 500, 380, 362, 300, 411, 283, 16740, + 1324, 300, 291, 483, 294, 257, 18011, 8062, 10290, 689, 1203, 50952], "temperature": + 0.0, "avg_logprob": -0.09745325212893279, "compression_ratio": 1.7589285714285714, + "no_speech_prob": 7.994768384378403e-05}, {"id": 239, "seek": 221232, "start": 2224.0800000000004, + "end": 2230.4, "text": " is kind of compressed down and fuzzy uh but what hurt and + those kinds of things really do with the", "tokens": [50952, 307, 733, 295, 30353, + 760, 293, 34710, 2232, 457, 437, 4607, 293, 729, 3685, 295, 721, 534, 360, 365, + 264, 51268], "temperature": 0.0, "avg_logprob": -0.09745325212893279, "compression_ratio": + 1.7589285714285714, "no_speech_prob": 7.994768384378403e-05}, {"id": 240, "seek": + 221232, "start": 2230.4, "end": 2236.0, "text": " attention mechanism I think it''s + really like turning that on its head where it''s like 
actually there", "tokens": + [51268, 3202, 7513, 286, 519, 309, 311, 534, 411, 6246, 300, 322, 1080, 1378, 689, + 309, 311, 411, 767, 456, 51548], "temperature": 0.0, "avg_logprob": -0.09745325212893279, + "compression_ratio": 1.7589285714285714, "no_speech_prob": 7.994768384378403e-05}, + {"id": 241, "seek": 223600, "start": 2236.0, "end": 2242.88, "text": " are these + parts that we could get at where it''s like the precision of these related concepts + it''s", "tokens": [50364, 366, 613, 3166, 300, 321, 727, 483, 412, 689, 309, 311, + 411, 264, 18356, 295, 613, 4077, 10392, 309, 311, 50708], "temperature": 0.0, "avg_logprob": + -0.120904420011787, "compression_ratio": 1.798165137614679, "no_speech_prob": 0.00012125805369578302}, + {"id": 242, "seek": 223600, "start": 2242.88, "end": 2248.96, "text": " like we + know that um we know that the A the most important part of this document is this + part that", "tokens": [50708, 411, 321, 458, 300, 1105, 321, 458, 300, 264, 316, + 264, 881, 1021, 644, 295, 341, 4166, 307, 341, 644, 300, 51012], "temperature": + 0.0, "avg_logprob": -0.120904420011787, "compression_ratio": 1.798165137614679, + "no_speech_prob": 0.00012125805369578302}, {"id": 243, "seek": 223600, "start": + 2248.96, "end": 2255.52, "text": " talks about JavaScript or it''s you know the + JavaScript-iness about it and and when we search for that", "tokens": [51012, 6686, + 466, 15778, 420, 309, 311, 291, 458, 264, 15778, 12, 1324, 466, 309, 293, 293, 562, + 321, 3164, 337, 300, 51340], "temperature": 0.0, "avg_logprob": -0.120904420011787, + "compression_ratio": 1.798165137614679, "no_speech_prob": 0.00012125805369578302}, + {"id": 244, "seek": 223600, "start": 2255.52, "end": 2263.68, "text": " we can kind + of zero in on when that dimension of it as opposed to being a fuzzy concept of um", + "tokens": [51340, 321, 393, 733, 295, 4018, 294, 322, 562, 300, 10139, 295, 309, + 382, 8851, 281, 885, 257, 34710, 3410, 295, 1105, 51748], "temperature": 0.0, 
"avg_logprob": + -0.120904420011787, "compression_ratio": 1.798165137614679, "no_speech_prob": 0.00012125805369578302}, + {"id": 245, "seek": 226368, "start": 2264.3199999999997, "end": 2269.6, "text": + " of you know programming languages and JavaScript if that makes sense I feel like + it''s like", "tokens": [50396, 295, 291, 458, 9410, 8650, 293, 15778, 498, 300, + 1669, 2020, 286, 841, 411, 309, 311, 411, 50660], "temperature": 0.0, "avg_logprob": + -0.2334840836063508, "compression_ratio": 1.6222222222222222, "no_speech_prob": + 0.0010327906347811222}, {"id": 246, "seek": 226368, "start": 2269.6, "end": 2274.48, + "text": " zeroing in on like what makes this this thing precisely interesting as + opposed to", "tokens": [50660, 4018, 278, 294, 322, 411, 437, 1669, 341, 341, 551, + 13402, 1880, 382, 8851, 281, 50904], "temperature": 0.0, "avg_logprob": -0.2334840836063508, + "compression_ratio": 1.6222222222222222, "no_speech_prob": 0.0010327906347811222}, + {"id": 247, "seek": 226368, "start": 2275.12, "end": 2281.04, "text": " traditional + dense vector representations which have been more fuzzy and castifying cat", "tokens": + [50936, 5164, 18011, 8062, 33358, 597, 362, 668, 544, 34710, 293, 4193, 5489, 3857, + 51232], "temperature": 0.0, "avg_logprob": -0.2334840836063508, "compression_ratio": + 1.6222222222222222, "no_speech_prob": 0.0010327906347811222}, {"id": 248, "seek": + 226368, "start": 2281.04, "end": 2286.56, "text": " out a five wide net kind of + thing and more focused of recall yeah exactly I think and you", "tokens": [51232, + 484, 257, 1732, 4874, 2533, 733, 295, 551, 293, 544, 5178, 295, 9901, 1338, 2293, + 286, 519, 293, 291, 51508], "temperature": 0.0, "avg_logprob": -0.2334840836063508, + "compression_ratio": 1.6222222222222222, "no_speech_prob": 0.0010327906347811222}, + {"id": 249, "seek": 226368, "start": 2286.56, "end": 2293.6, "text": " you''re reading + a nice paper with what problem does Bert hope to solve the search in 2019", 
"tokens": + [51508, 291, 434, 3760, 257, 1481, 3035, 365, 437, 1154, 775, 29594, 1454, 281, + 5039, 264, 3164, 294, 6071, 51860], "temperature": 0.0, "avg_logprob": -0.2334840836063508, + "compression_ratio": 1.6222222222222222, "no_speech_prob": 0.0010327906347811222}, + {"id": 250, "seek": 229368, "start": 2294.0, "end": 2302.72, "text": " and of 2019 + and you really well-compared there uh inverted in the xpar search method with uh", + "tokens": [50380, 293, 295, 6071, 293, 291, 534, 731, 12, 21541, 1642, 456, 2232, + 38969, 294, 264, 2031, 2181, 3164, 3170, 365, 2232, 50816], "temperature": 0.0, + "avg_logprob": -0.32695143873041327, "compression_ratio": 1.6255707762557077, "no_speech_prob": + 0.0009401958668604493}, {"id": 251, "seek": 229368, "start": 2302.72, "end": 2307.7599999999998, + "text": " war two back and then you basically allude to the fact that Bert probably + gets the", "tokens": [50816, 1516, 732, 646, 293, 550, 291, 1936, 439, 2303, 281, + 264, 1186, 300, 29594, 1391, 2170, 264, 51068], "temperature": 0.0, "avg_logprob": + -0.32695143873041327, "compression_ratio": 1.6255707762557077, "no_speech_prob": + 0.0009401958668604493}, {"id": 252, "seek": 229368, "start": 2307.7599999999998, + "end": 2314.48, "text": " aboutness of the document uh better than uh war two veko + tfidf right because in war two veko", "tokens": [51068, 466, 1287, 295, 264, 4166, + 2232, 1101, 813, 2232, 1516, 732, 1241, 4093, 256, 69, 327, 69, 558, 570, 294, 1516, + 732, 1241, 4093, 51404], "temperature": 0.0, "avg_logprob": -0.32695143873041327, + "compression_ratio": 1.6255707762557077, "no_speech_prob": 0.0009401958668604493}, + {"id": 253, "seek": 229368, "start": 2314.48, "end": 2321.04, "text": " you essentially + have like a window that you slide through yeah to your early example if", "tokens": + [51404, 291, 4476, 362, 411, 257, 4910, 300, 291, 4137, 807, 1338, 281, 428, 2440, + 1365, 498, 51732], "temperature": 0.0, "avg_logprob": -0.32695143873041327, 
"compression_ratio": + 1.6255707762557077, "no_speech_prob": 0.0009401958668604493}, {"id": 254, "seek": + 232104, "start": 2322.0, "end": 2328.16, "text": " this book react never happens + to be near JavaScript because everyone knows it''s JavaScript right", "tokens": + [50412, 341, 1446, 4515, 1128, 2314, 281, 312, 2651, 15778, 570, 1518, 3255, 309, + 311, 15778, 558, 50720], "temperature": 0.0, "avg_logprob": -0.11446817024894383, + "compression_ratio": 1.735159817351598, "no_speech_prob": 0.0003328732564114034}, + {"id": 255, "seek": 232104, "start": 2329.44, "end": 2334.08, "text": " then you + will never find it using using war two veko maybe it will be too distant but with", + "tokens": [50784, 550, 291, 486, 1128, 915, 309, 1228, 1228, 1516, 732, 1241, 4093, + 1310, 309, 486, 312, 886, 17275, 457, 365, 51016], "temperature": 0.0, "avg_logprob": + -0.11446817024894383, "compression_ratio": 1.735159817351598, "no_speech_prob": + 0.0003328732564114034}, {"id": 256, "seek": 232104, "start": 2334.08, "end": 2339.7599999999998, + "text": " Bert it tries to embed the whole document right or like you know chunks + of it averaged and so on", "tokens": [51016, 29594, 309, 9898, 281, 12240, 264, + 1379, 4166, 558, 420, 411, 291, 458, 24004, 295, 309, 18247, 2980, 293, 370, 322, + 51300], "temperature": 0.0, "avg_logprob": -0.11446817024894383, "compression_ratio": + 1.735159817351598, "no_speech_prob": 0.0003328732564114034}, {"id": 257, "seek": + 232104, "start": 2340.88, "end": 2346.96, "text": " so it might yeah and and you + have to do like if you were to use war two veko you''d have to like", "tokens": + [51356, 370, 309, 1062, 1338, 293, 293, 291, 362, 281, 360, 411, 498, 291, 645, + 281, 764, 1516, 732, 1241, 4093, 291, 1116, 362, 281, 411, 51660], "temperature": + 0.0, "avg_logprob": -0.11446817024894383, "compression_ratio": 1.735159817351598, + "no_speech_prob": 0.0003328732564114034}, {"id": 258, "seek": 234696, "start": 2346.96, + "end": 2352.48, "text": " 
sort of implement your own attention mechanism in a way + you''d be like uh okay what parts of the", "tokens": [50364, 1333, 295, 4445, 428, + 1065, 3202, 7513, 294, 257, 636, 291, 1116, 312, 411, 2232, 1392, 437, 3166, 295, + 264, 50640], "temperature": 0.0, "avg_logprob": -0.14089982708295187, "compression_ratio": + 1.7589285714285714, "no_speech_prob": 0.0004163410922046751}, {"id": 259, "seek": + 234696, "start": 2352.48, "end": 2358.4, "text": " document are okay first i''ve + got to throw out a bunch of front matter and and matter and and junk", "tokens": + [50640, 4166, 366, 1392, 700, 741, 600, 658, 281, 3507, 484, 257, 3840, 295, 1868, + 1871, 293, 293, 1871, 293, 293, 19109, 50936], "temperature": 0.0, "avg_logprob": + -0.14089982708295187, "compression_ratio": 1.7589285714285714, "no_speech_prob": + 0.0004163410922046751}, {"id": 260, "seek": 234696, "start": 2359.12, "end": 2364.08, + "text": " and like with word to veko you''d have to somehow engineer to like okay + we''ll look at these paragraphs", "tokens": [50972, 293, 411, 365, 1349, 281, 1241, + 4093, 291, 1116, 362, 281, 6063, 11403, 281, 411, 1392, 321, 603, 574, 412, 613, + 48910, 51220], "temperature": 0.0, "avg_logprob": -0.14089982708295187, "compression_ratio": + 1.7589285714285714, "no_speech_prob": 0.0004163410922046751}, {"id": 261, "seek": + 234696, "start": 2364.88, "end": 2372.8, "text": " and uh maybe i need to focus + it on these ones more and throw away some other ones uh and you don''t", "tokens": + [51260, 293, 2232, 1310, 741, 643, 281, 1879, 309, 322, 613, 2306, 544, 293, 3507, + 1314, 512, 661, 2306, 2232, 293, 291, 500, 380, 51656], "temperature": 0.0, "avg_logprob": + -0.14089982708295187, "compression_ratio": 1.7589285714285714, "no_speech_prob": + 0.0004163410922046751}, {"id": 262, "seek": 237280, "start": 2373.6000000000004, + "end": 2379.44, "text": " the aboutness of that gets really blended whereas the + amazing thing yeah you''re at the amazing", "tokens": [50404, 
264, 466, 1287, 295, + 300, 2170, 534, 27048, 9735, 264, 2243, 551, 1338, 291, 434, 412, 264, 2243, 50696], + "temperature": 0.0, "avg_logprob": -0.09344227012546583, "compression_ratio": 1.8928571428571428, + "no_speech_prob": 0.0012464409228414297}, {"id": 263, "seek": 237280, "start": 2379.44, + "end": 2385.52, "text": " thing about like about Bert is how it''s a ability to + really zero in on the aboutness of like where", "tokens": [50696, 551, 466, 411, + 466, 29594, 307, 577, 309, 311, 257, 3485, 281, 534, 4018, 294, 322, 264, 466, 1287, + 295, 411, 689, 51000], "temperature": 0.0, "avg_logprob": -0.09344227012546583, + "compression_ratio": 1.8928571428571428, "no_speech_prob": 0.0012464409228414297}, + {"id": 264, "seek": 237280, "start": 2386.96, "end": 2393.44, "text": " each each + token position it''s not just like the paragraph has you know where the document + has an", "tokens": [51072, 1184, 1184, 14862, 2535, 309, 311, 406, 445, 411, 264, + 18865, 575, 291, 458, 689, 264, 4166, 575, 364, 51396], "temperature": 0.0, "avg_logprob": + -0.09344227012546583, "compression_ratio": 1.8928571428571428, "no_speech_prob": + 0.0012464409228414297}, {"id": 265, "seek": 237280, "start": 2393.44, "end": 2399.04, + "text": " embedding each token position has an embedding so it''s like if i take + a question", "tokens": [51396, 12240, 3584, 1184, 14862, 2535, 575, 364, 12240, + 3584, 370, 309, 311, 411, 498, 741, 747, 257, 1168, 51676], "temperature": 0.0, + "avg_logprob": -0.09344227012546583, "compression_ratio": 1.8928571428571428, "no_speech_prob": + 0.0012464409228414297}, {"id": 266, "seek": 239904, "start": 2399.68, "end": 2407.44, + "text": " i can really zero in on like oh this is the part of this article that + is most similar to this", "tokens": [50396, 741, 393, 534, 4018, 294, 322, 411, + 1954, 341, 307, 264, 644, 295, 341, 7222, 300, 307, 881, 2531, 281, 341, 50784], + "temperature": 0.0, "avg_logprob": -0.1000607972292556, "compression_ratio": 
1.6607929515418502, + "no_speech_prob": 0.00034966293605975807}, {"id": 267, "seek": 239904, "start": + 2408.56, "end": 2414.56, "text": " you still got challenges with like the fuzziness + of dense factor and it''s you know maybe not", "tokens": [50840, 291, 920, 658, + 4759, 365, 411, 264, 283, 16740, 1324, 295, 18011, 5952, 293, 309, 311, 291, 458, + 1310, 406, 51140], "temperature": 0.0, "avg_logprob": -0.1000607972292556, "compression_ratio": + 1.6607929515418502, "no_speech_prob": 0.00034966293605975807}, {"id": 268, "seek": + 239904, "start": 2414.56, "end": 2420.08, "text": " precisely the words you''re + looking for but just the fact like that''s just mind-boggling that each", "tokens": + [51140, 13402, 264, 2283, 291, 434, 1237, 337, 457, 445, 264, 1186, 411, 300, 311, + 445, 1575, 12, 65, 36754, 1688, 300, 1184, 51416], "temperature": 0.0, "avg_logprob": + -0.1000607972292556, "compression_ratio": 1.6607929515418502, "no_speech_prob": + 0.00034966293605975807}, {"id": 269, "seek": 239904, "start": 2420.08, "end": 2427.52, + "text": " token position of a book might be an embedding i mean it''s a it can be + a beast to to man and", "tokens": [51416, 14862, 2535, 295, 257, 1446, 1062, 312, + 364, 12240, 3584, 741, 914, 309, 311, 257, 309, 393, 312, 257, 13464, 281, 281, + 587, 293, 51788], "temperature": 0.0, "avg_logprob": -0.1000607972292556, "compression_ratio": + 1.6607929515418502, "no_speech_prob": 0.00034966293605975807}, {"id": 270, "seek": + 242752, "start": 2427.52, "end": 2432.08, "text": " should deal with but it''s it + could be a really powerful concept yeah absolutely and plus it''s a", "tokens": + [50364, 820, 2028, 365, 457, 309, 311, 309, 727, 312, 257, 534, 4005, 3410, 1338, + 3122, 293, 1804, 309, 311, 257, 50592], "temperature": 0.0, "avg_logprob": -0.1646254004501715, + "compression_ratio": 1.7902439024390244, "no_speech_prob": 0.0011218514991924167}, + {"id": 271, "seek": 242752, "start": 2432.08, "end": 2437.52, "text": " mass model 
+ right so it can predict what should be the the token and that masked out", "tokens": + [50592, 2758, 2316, 558, 370, 309, 393, 6069, 437, 820, 312, 264, 264, 14862, 293, + 300, 45249, 484, 50864], "temperature": 0.0, "avg_logprob": -0.1646254004501715, + "compression_ratio": 1.7902439024390244, "no_speech_prob": 0.0011218514991924167}, + {"id": 272, "seek": 242752, "start": 2438.32, "end": 2443.44, "text": " position + and then it can actually predict entire sentences i think there was one of the side", + "tokens": [50904, 2535, 293, 550, 309, 393, 767, 6069, 2302, 16579, 741, 519, 456, + 390, 472, 295, 264, 1252, 51160], "temperature": 0.0, "avg_logprob": -0.1646254004501715, + "compression_ratio": 1.7902439024390244, "no_speech_prob": 0.0011218514991924167}, + {"id": 273, "seek": 242752, "start": 2443.44, "end": 2450.4, "text": " yeah effects + of it right so it could become generated yeah totally yeah exactly so it''s pretty", + "tokens": [51160, 1338, 5065, 295, 309, 558, 370, 309, 727, 1813, 10833, 1338, 3879, + 1338, 2293, 370, 309, 311, 1238, 51508], "temperature": 0.0, "avg_logprob": -0.1646254004501715, + "compression_ratio": 1.7902439024390244, "no_speech_prob": 0.0011218514991924167}, + {"id": 274, "seek": 245040, "start": 2450.4, "end": 2461.36, "text": " amazing yeah + and so today as we roll into this you you follow up on this trend of sparse versus", + "tokens": [50364, 2243, 1338, 293, 370, 965, 382, 321, 3373, 666, 341, 291, 291, + 1524, 493, 322, 341, 6028, 295, 637, 11668, 5717, 50912], "temperature": 0.0, "avg_logprob": + -0.11110964106090034, "compression_ratio": 1.6588235294117648, "no_speech_prob": + 0.0025330025237053633}, {"id": 275, "seek": 245040, "start": 2461.36, "end": 2468.8, + "text": " dense you know i think a lot of discussion is still going around how dense + will enter this", "tokens": [50912, 18011, 291, 458, 741, 519, 257, 688, 295, 5017, + 307, 920, 516, 926, 577, 18011, 486, 3242, 341, 51284], "temperature": 0.0, 
"avg_logprob": + -0.11110964106090034, "compression_ratio": 1.6588235294117648, "no_speech_prob": + 0.0025330025237053633}, {"id": 276, "seek": 245040, "start": 2468.8, "end": 2475.12, + "text": " sparse search world at larger scale so how do you feel about this and + of course there is hybrid", "tokens": [51284, 637, 11668, 3164, 1002, 412, 4833, + 4373, 370, 577, 360, 291, 841, 466, 341, 293, 295, 1164, 456, 307, 13051, 51600], + "temperature": 0.0, "avg_logprob": -0.11110964106090034, "compression_ratio": 1.6588235294117648, + "no_speech_prob": 0.0025330025237053633}, {"id": 277, "seek": 247512, "start": 2475.12, + "end": 2482.24, "text": " search as well it''s a hot space yeah and i know there''s + a lot of open source projects there''s", "tokens": [50364, 3164, 382, 731, 309, + 311, 257, 2368, 1901, 1338, 293, 741, 458, 456, 311, 257, 688, 295, 1269, 4009, + 4455, 456, 311, 50720], "temperature": 0.0, "avg_logprob": -0.19184434413909912, + "compression_ratio": 1.7567567567567568, "no_speech_prob": 0.000979273347184062}, + {"id": 278, "seek": 247512, "start": 2482.24, "end": 2490.7999999999997, "text": + " like milbus there''s companies like pankham there''s qdren there are all all of + these systems that are", "tokens": [50720, 411, 1962, 21441, 456, 311, 3431, 411, + 280, 657, 4822, 456, 311, 9505, 67, 1095, 456, 366, 439, 439, 295, 613, 3652, 300, + 366, 51148], "temperature": 0.0, "avg_logprob": -0.19184434413909912, "compression_ratio": + 1.7567567567567568, "no_speech_prob": 0.000979273347184062}, {"id": 279, "seek": + 247512, "start": 2490.7999999999997, "end": 2494.88, "text": " doing dense vector + retrieval and it''s also just like a fun problem if you were in search for a", "tokens": + [51148, 884, 18011, 8062, 19817, 3337, 293, 309, 311, 611, 445, 411, 257, 1019, + 1154, 498, 291, 645, 294, 3164, 337, 257, 51352], "temperature": 0.0, "avg_logprob": + -0.19184434413909912, "compression_ratio": 1.7567567567567568, "no_speech_prob": + 
0.000979273347184062}, {"id": 280, "seek": 247512, "start": 2494.88, "end": 2502.0, + "text": " while to think about approximate nearest neighbors and like how you solve + that and i know for a long", "tokens": [51352, 1339, 281, 519, 466, 30874, 23831, + 12512, 293, 411, 577, 291, 5039, 300, 293, 741, 458, 337, 257, 938, 51708], "temperature": + 0.0, "avg_logprob": -0.19184434413909912, "compression_ratio": 1.7567567567567568, + "no_speech_prob": 0.000979273347184062}, {"id": 281, "seek": 250200, "start": 2502.0, + "end": 2508.0, "text": " time it''s been sort of you know a side project of a lot + of people i know for you Dimitri and Max you", "tokens": [50364, 565, 309, 311, + 668, 1333, 295, 291, 458, 257, 1252, 1716, 295, 257, 688, 295, 561, 741, 458, 337, + 291, 20975, 270, 470, 293, 7402, 291, 50664], "temperature": 0.0, "avg_logprob": + -0.11170983068721811, "compression_ratio": 1.7207207207207207, "no_speech_prob": + 0.0007110300939530134}, {"id": 282, "seek": 250200, "start": 2508.0, "end": 2516.32, + "text": " guys had had a lot of fun in the billion vector challenge um it''s it + you you the first thing to ask", "tokens": [50664, 1074, 632, 632, 257, 688, 295, + 1019, 294, 264, 5218, 8062, 3430, 1105, 309, 311, 309, 291, 291, 264, 700, 551, + 281, 1029, 51080], "temperature": 0.0, "avg_logprob": -0.11170983068721811, "compression_ratio": + 1.7207207207207207, "no_speech_prob": 0.0007110300939530134}, {"id": 283, "seek": + 250200, "start": 2516.32, "end": 2520.88, "text": " is like why do we need these + extra databases and it''s interesting to think about because", "tokens": [51080, + 307, 411, 983, 360, 321, 643, 613, 2857, 22380, 293, 309, 311, 1880, 281, 519, 466, + 570, 51308], "temperature": 0.0, "avg_logprob": -0.11170983068721811, "compression_ratio": + 1.7207207207207207, "no_speech_prob": 0.0007110300939530134}, {"id": 284, "seek": + 250200, "start": 2521.84, "end": 2528.48, "text": " you''re thinking like i we just + talked about why can''t we 
you know we can match map tokens to", "tokens": [51356, + 291, 434, 1953, 411, 741, 321, 445, 2825, 466, 983, 393, 380, 321, 291, 458, 321, + 393, 2995, 4471, 22667, 281, 51688], "temperature": 0.0, "avg_logprob": -0.11170983068721811, + "compression_ratio": 1.7207207207207207, "no_speech_prob": 0.0007110300939530134}, + {"id": 285, "seek": 252848, "start": 2528.48, "end": 2533.2, "text": " meaning and + that kind of thing you know a lot you know and why can''t we do that why can''t + we", "tokens": [50364, 3620, 293, 300, 733, 295, 551, 291, 458, 257, 688, 291, 458, + 293, 983, 393, 380, 321, 360, 300, 983, 393, 380, 321, 50600], "temperature": 0.0, + "avg_logprob": -0.08396761505692094, "compression_ratio": 1.8938775510204082, "no_speech_prob": + 0.00027698834310285747}, {"id": 286, "seek": 252848, "start": 2533.2, "end": 2538.08, + "text": " just apply the same techniques to the dense world why can''t we use a + traditional search engine", "tokens": [50600, 445, 3079, 264, 912, 7512, 281, 264, + 18011, 1002, 983, 393, 380, 321, 764, 257, 5164, 3164, 2848, 50844], "temperature": + 0.0, "avg_logprob": -0.08396761505692094, "compression_ratio": 1.8938775510204082, + "no_speech_prob": 0.00027698834310285747}, {"id": 287, "seek": 252848, "start": + 2539.12, "end": 2544.4, "text": " and if you think about it what what you''re they''re + very in some ways they''re very", "tokens": [50896, 293, 498, 291, 519, 466, 309, + 437, 437, 291, 434, 436, 434, 588, 294, 512, 2098, 436, 434, 588, 51160], "temperature": + 0.0, "avg_logprob": -0.08396761505692094, "compression_ratio": 1.8938775510204082, + "no_speech_prob": 0.00027698834310285747}, {"id": 288, "seek": 252848, "start": + 2546.16, "end": 2551.44, "text": " the data structures underneath of them are optimized + it''s like yes you''re both in both cases", "tokens": [51248, 264, 1412, 9227, 7223, + 295, 552, 366, 26941, 309, 311, 411, 2086, 291, 434, 1293, 294, 1293, 3331, 51512], + "temperature": 0.0, "avg_logprob": 
-0.08396761505692094, "compression_ratio": 1.8938775510204082, + "no_speech_prob": 0.00027698834310285747}, {"id": 289, "seek": 252848, "start": + 2551.44, "end": 2556.08, "text": " you''re sort of like mapping query meaning to + document meaning like fundamentally the task is the same", "tokens": [51512, 291, + 434, 1333, 295, 411, 18350, 14581, 3620, 281, 4166, 3620, 411, 17879, 264, 5633, + 307, 264, 912, 51744], "temperature": 0.0, "avg_logprob": -0.08396761505692094, + "compression_ratio": 1.8938775510204082, "no_speech_prob": 0.00027698834310285747}, + {"id": 290, "seek": 255608, "start": 2556.4, "end": 2561.2799999999997, "text": + " but the data structures that you would use for a dense system where everything + is", "tokens": [50380, 457, 264, 1412, 9227, 300, 291, 576, 764, 337, 257, 18011, + 1185, 689, 1203, 307, 50624], "temperature": 0.0, "avg_logprob": -0.26501864478701637, + "compression_ratio": 1.6121495327102804, "no_speech_prob": 0.0008507038583047688}, + {"id": 291, "seek": 255608, "start": 2561.2799999999997, "end": 2569.2799999999997, + "text": " clustered into like 200 and maybe 256 or 760 or whatever dimensions are + very different than a", "tokens": [50624, 596, 38624, 666, 411, 2331, 293, 1310, + 38882, 420, 1614, 4550, 420, 2035, 12819, 366, 588, 819, 813, 257, 51024], "temperature": + 0.0, "avg_logprob": -0.26501864478701637, "compression_ratio": 1.6121495327102804, + "no_speech_prob": 0.0008507038583047688}, {"id": 292, "seek": 255608, "start": 2569.2799999999997, + "end": 2575.04, "text": " sparse index where you know you have and it''s something + like a a last searcher to get", "tokens": [51024, 637, 11668, 8186, 689, 291, 458, + 291, 362, 293, 309, 311, 746, 411, 257, 257, 1036, 3164, 260, 281, 483, 51312], + "temperature": 0.0, "avg_logprob": -0.26501864478701637, "compression_ratio": 1.6121495327102804, + "no_speech_prob": 0.0008507038583047688}, {"id": 293, "seek": 255608, "start": 2575.04, + "end": 2581.2, "text": " a 
loose-seeing traditional index really it''s a it''s a + the dimensionality is way way", "tokens": [51312, 257, 9612, 12, 405, 14667, 5164, + 8186, 534, 309, 311, 257, 309, 311, 257, 264, 10139, 1860, 307, 636, 636, 51620], + "temperature": 0.0, "avg_logprob": -0.26501864478701637, "compression_ratio": 1.6121495327102804, + "no_speech_prob": 0.0008507038583047688}, {"id": 294, "seek": 258120, "start": 2581.3599999999997, + "end": 2588.24, "text": " higher so it''s like you know you could expect hundreds + of thousands of words each each word", "tokens": [50372, 2946, 370, 309, 311, 411, + 291, 458, 291, 727, 2066, 6779, 295, 5383, 295, 2283, 1184, 1184, 1349, 50716], + "temperature": 0.0, "avg_logprob": -0.1555237044458804, "compression_ratio": 1.7972350230414746, + "no_speech_prob": 0.0026325536891818047}, {"id": 295, "seek": 258120, "start": 2588.24, + "end": 2596.0, "text": " is its own dimension and if you think about that like you''re + going to have situations where you know", "tokens": [50716, 307, 1080, 1065, 10139, + 293, 498, 291, 519, 466, 300, 411, 291, 434, 516, 281, 362, 6851, 689, 291, 458, + 51104], "temperature": 0.0, "avg_logprob": -0.1555237044458804, "compression_ratio": + 1.7972350230414746, "no_speech_prob": 0.0026325536891818047}, {"id": 296, "seek": + 258120, "start": 2596.0, "end": 2602.64, "text": " words follows zip zip-slaw zip-slaw + which is you know the occurred in english the occurs in every word", "tokens": [51104, + 2283, 10002, 20730, 20730, 12, 82, 5901, 20730, 12, 82, 5901, 597, 307, 291, 458, + 264, 11068, 294, 32169, 264, 11843, 294, 633, 1349, 51436], "temperature": 0.0, + "avg_logprob": -0.1555237044458804, "compression_ratio": 1.7972350230414746, "no_speech_prob": + 0.0026325536891818047}, {"id": 297, "seek": 258120, "start": 2603.7599999999998, + "end": 2609.52, "text": " and you get gradually gradually falls right off a cliff + and then you get like cat occurs in 1%", "tokens": [51492, 293, 291, 483, 13145, + 13145, 8804, 
558, 766, 257, 22316, 293, 550, 291, 483, 411, 3857, 11843, 294, 502, + 4, 51780], "temperature": 0.0, "avg_logprob": -0.1555237044458804, "compression_ratio": + 1.7972350230414746, "no_speech_prob": 0.0026325536891818047}, {"id": 298, "seek": + 260952, "start": 2609.52, "end": 2615.04, "text": " of documents and then you go + keep going and you get like specific terminology like", "tokens": [50364, 295, 8512, + 293, 550, 291, 352, 1066, 516, 293, 291, 483, 411, 2685, 27575, 411, 50640], "temperature": + 0.0, "avg_logprob": -0.09475156630592785, "compression_ratio": 1.6837209302325582, + "no_speech_prob": 0.00024206411035265774}, {"id": 299, "seek": 260952, "start": + 2615.7599999999998, "end": 2623.04, "text": " feline occurs in 0.1% and it really + falls off a cliff and so the sparse vector indices are", "tokens": [50676, 283, + 5440, 11843, 294, 1958, 13, 16, 4, 293, 309, 534, 8804, 766, 257, 22316, 293, 370, + 264, 637, 11668, 8062, 43840, 366, 51040], "temperature": 0.0, "avg_logprob": -0.09475156630592785, + "compression_ratio": 1.6837209302325582, "no_speech_prob": 0.00024206411035265774}, + {"id": 300, "seek": 260952, "start": 2623.04, "end": 2631.12, "text": " really optimized + for that use case of like I have a I have a term and it basically points at a", + "tokens": [51040, 534, 26941, 337, 300, 764, 1389, 295, 411, 286, 362, 257, 286, + 362, 257, 1433, 293, 309, 1936, 2793, 412, 257, 51444], "temperature": 0.0, "avg_logprob": + -0.09475156630592785, "compression_ratio": 1.6837209302325582, "no_speech_prob": + 0.00024206411035265774}, {"id": 301, "seek": 260952, "start": 2631.12, "end": 2636.4, + "text": " very small handful of documents that contain that term and I can do that + look up very quickly", "tokens": [51444, 588, 1359, 16458, 295, 8512, 300, 5304, + 300, 1433, 293, 286, 393, 360, 300, 574, 493, 588, 2661, 51708], "temperature": + 0.0, "avg_logprob": -0.09475156630592785, "compression_ratio": 1.6837209302325582, + "no_speech_prob": 
0.00024206411035265774}, {"id": 302, "seek": 263640, "start": + 2636.88, "end": 2643.76, "text": " I can fetch those I can score those and then + I can sort of like get get a get a score", "tokens": [50388, 286, 393, 23673, 729, + 286, 393, 6175, 729, 293, 550, 286, 393, 1333, 295, 411, 483, 483, 257, 483, 257, + 6175, 50732], "temperature": 0.0, "avg_logprob": -0.09850875618531532, "compression_ratio": + 1.8663366336633664, "no_speech_prob": 0.003363743657246232}, {"id": 303, "seek": + 263640, "start": 2644.56, "end": 2650.8, "text": " whereas what''s interesting about + the dense vector case is it''s more like I go in with sparse", "tokens": [50772, + 9735, 437, 311, 1880, 466, 264, 18011, 8062, 1389, 307, 309, 311, 544, 411, 286, + 352, 294, 365, 637, 11668, 51084], "temperature": 0.0, "avg_logprob": -0.09850875618531532, + "compression_ratio": 1.8663366336633664, "no_speech_prob": 0.003363743657246232}, + {"id": 304, "seek": 263640, "start": 2650.8, "end": 2656.96, "text": " vector I + go in with a single term and I get like or maybe two or three terms I look up in + this giant", "tokens": [51084, 8062, 286, 352, 294, 365, 257, 2167, 1433, 293, 286, + 483, 411, 420, 1310, 732, 420, 1045, 2115, 286, 574, 493, 294, 341, 7410, 51392], + "temperature": 0.0, "avg_logprob": -0.09850875618531532, "compression_ratio": 1.8663366336633664, + "no_speech_prob": 0.003363743657246232}, {"id": 305, "seek": 263640, "start": 2656.96, + "end": 2663.92, "text": " data structure and I get this this by looking up and I + can get like the you can get the handle to", "tokens": [51392, 1412, 3877, 293, + 286, 483, 341, 341, 538, 1237, 493, 293, 286, 393, 483, 411, 264, 291, 393, 483, + 264, 4813, 281, 51740], "temperature": 0.0, "avg_logprob": -0.09850875618531532, + "compression_ratio": 1.8663366336633664, "no_speech_prob": 0.003363743657246232}, + {"id": 306, "seek": 266392, "start": 2663.92, "end": 2668.96, "text": " all the + things that match that and I can sort those and get 
them back with dense vector + the query", "tokens": [50364, 439, 264, 721, 300, 2995, 300, 293, 286, 393, 1333, + 729, 293, 483, 552, 646, 365, 18011, 8062, 264, 14581, 50616], "temperature": 0.0, + "avg_logprob": -0.12748778206961495, "compression_ratio": 1.5921787709497206, "no_speech_prob": + 0.00017694687994662672}, {"id": 307, "seek": 266392, "start": 2668.96, "end": 2681.52, + "text": " isn''t two or three terms it''s some value in a larger 256 or more dimension + vector so off the", "tokens": [50616, 1943, 380, 732, 420, 1045, 2115, 309, 311, + 512, 2158, 294, 257, 4833, 38882, 420, 544, 10139, 8062, 370, 766, 264, 51244], + "temperature": 0.0, "avg_logprob": -0.12748778206961495, "compression_ratio": 1.5921787709497206, + "no_speech_prob": 0.00017694687994662672}, {"id": 308, "seek": 266392, "start": + 2681.52, "end": 2688.08, "text": " bat right there is like that''s a large 256 terms + and a traditional surgingian would be a large", "tokens": [51244, 7362, 558, 456, + 307, 411, 300, 311, 257, 2416, 38882, 2115, 293, 257, 5164, 1022, 3249, 952, 576, + 312, 257, 2416, 51572], "temperature": 0.0, "avg_logprob": -0.12748778206961495, + "compression_ratio": 1.5921787709497206, "no_speech_prob": 0.00017694687994662672}, + {"id": 309, "seek": 268808, "start": 2688.08, "end": 2697.04, "text": " query and + really you''re looking up in a in a index that is itself that dimensionality much", + "tokens": [50364, 14581, 293, 534, 291, 434, 1237, 493, 294, 257, 294, 257, 8186, + 300, 307, 2564, 300, 10139, 1860, 709, 50812], "temperature": 0.0, "avg_logprob": + -0.0846630234316171, "compression_ratio": 1.82, "no_speech_prob": 0.0011695935390889645}, + {"id": 310, "seek": 268808, "start": 2697.04, "end": 2703.44, "text": " smaller + dimensionality where every document has some amount of value for each each dimension", + "tokens": [50812, 4356, 10139, 1860, 689, 633, 4166, 575, 512, 2372, 295, 2158, + 337, 1184, 1184, 10139, 51132], "temperature": 0.0, "avg_logprob": 
-0.0846630234316171, + "compression_ratio": 1.82, "no_speech_prob": 0.0011695935390889645}, {"id": 311, + "seek": 268808, "start": 2704.16, "end": 2709.04, "text": " so it''s not like cat + where it occurs in three things it''s like all billion documents", "tokens": [51168, + 370, 309, 311, 406, 411, 3857, 689, 309, 11843, 294, 1045, 721, 309, 311, 411, 439, + 5218, 8512, 51412], "temperature": 0.0, "avg_logprob": -0.0846630234316171, "compression_ratio": + 1.82, "no_speech_prob": 0.0011695935390889645}, {"id": 312, "seek": 268808, "start": + 2709.84, "end": 2716.0, "text": " have some level of if cat was one of the dimensions + some level of catness and if you just think", "tokens": [51452, 362, 512, 1496, + 295, 498, 3857, 390, 472, 295, 264, 12819, 512, 1496, 295, 3857, 1287, 293, 498, + 291, 445, 519, 51760], "temperature": 0.0, "avg_logprob": -0.0846630234316171, "compression_ratio": + 1.82, "no_speech_prob": 0.0011695935390889645}, {"id": 313, "seek": 271600, "start": + 2716.0, "end": 2719.36, "text": " about how you would build a data structure it''d + be a very different thing", "tokens": [50364, 466, 577, 291, 576, 1322, 257, 1412, + 3877, 309, 1116, 312, 257, 588, 819, 551, 50532], "temperature": 0.0, "avg_logprob": + -0.10746716833733894, "compression_ratio": 1.816831683168317, "no_speech_prob": + 0.0005003767437301576}, {"id": 314, "seek": 271600, "start": 2721.52, "end": 2726.56, + "text": " and that''s why that''s why they people build these you know completely + different data structures and", "tokens": [50640, 293, 300, 311, 983, 300, 311, + 983, 436, 561, 1322, 613, 291, 458, 2584, 819, 1412, 9227, 293, 50892], "temperature": + 0.0, "avg_logprob": -0.10746716833733894, "compression_ratio": 1.816831683168317, + "no_speech_prob": 0.0005003767437301576}, {"id": 315, "seek": 271600, "start": 2726.56, + "end": 2733.84, "text": " why doing nearest neighbors on this large scale data is + very important because you do want to get", "tokens": [50892, 983, 
884, 23831, 12512, + 322, 341, 2416, 4373, 1412, 307, 588, 1021, 570, 291, 360, 528, 281, 483, 51256], + "temperature": 0.0, "avg_logprob": -0.10746716833733894, "compression_ratio": 1.816831683168317, + "no_speech_prob": 0.0005003767437301576}, {"id": 316, "seek": 271600, "start": 2733.84, + "end": 2742.48, "text": " like some some sense of like similar conceptual meaning + in this in this compressed vector space", "tokens": [51256, 411, 512, 512, 2020, + 295, 411, 2531, 24106, 3620, 294, 341, 294, 341, 30353, 8062, 1901, 51688], "temperature": + 0.0, "avg_logprob": -0.10746716833733894, "compression_ratio": 1.816831683168317, + "no_speech_prob": 0.0005003767437301576}, {"id": 317, "seek": 274248, "start": 2742.72, + "end": 2750.88, "text": " but that in and of itself kind of gets at the pros and + cons of each because if you get this sort of", "tokens": [50376, 457, 300, 294, + 293, 295, 2564, 733, 295, 2170, 412, 264, 6267, 293, 1014, 295, 1184, 570, 498, + 291, 483, 341, 1333, 295, 50784], "temperature": 0.0, "avg_logprob": -0.09268951416015625, + "compression_ratio": 1.6514285714285715, "no_speech_prob": 0.0008471091277897358}, + {"id": 318, "seek": 274248, "start": 2750.88, "end": 2758.16, "text": " like compressed + representation you don''t specifically have the word cat or even direct necessarily", + "tokens": [50784, 411, 30353, 10290, 291, 500, 380, 4682, 362, 264, 1349, 3857, + 420, 754, 2047, 4725, 51148], "temperature": 0.0, "avg_logprob": -0.09268951416015625, + "compression_ratio": 1.6514285714285715, "no_speech_prob": 0.0008471091277897358}, + {"id": 319, "seek": 274248, "start": 2758.16, "end": 2765.36, "text": " direct synonyms + of cat that you''ve created you have like a rough statistical sense of like", "tokens": + [51148, 2047, 5451, 2526, 2592, 295, 3857, 300, 291, 600, 2942, 291, 362, 411, 257, + 5903, 22820, 2020, 295, 411, 51508], "temperature": 0.0, "avg_logprob": -0.09268951416015625, + "compression_ratio": 1.6514285714285715, 
"no_speech_prob": 0.0008471091277897358}, + {"id": 320, "seek": 276536, "start": 2765.44, "end": 2772.6400000000003, "text": + " catness or animal-ness that you''re kind of clustering together you''ve lost it + by compressing", "tokens": [50368, 3857, 1287, 420, 5496, 12, 1287, 300, 291, 434, + 733, 295, 596, 48673, 1214, 291, 600, 2731, 309, 538, 14778, 278, 50728], "temperature": + 0.0, "avg_logprob": -0.1157187648203181, "compression_ratio": 1.7906976744186047, + "no_speech_prob": 0.0002073153154924512}, {"id": 321, "seek": 276536, "start": 2772.6400000000003, + "end": 2779.6, "text": " to smaller dimensions you''ve lost some precision just + by definition but you''ve sort of expanded", "tokens": [50728, 281, 4356, 12819, + 291, 600, 2731, 512, 18356, 445, 538, 7123, 457, 291, 600, 1333, 295, 14342, 51076], + "temperature": 0.0, "avg_logprob": -0.1157187648203181, "compression_ratio": 1.7906976744186047, + "no_speech_prob": 0.0002073153154924512}, {"id": 322, "seek": 276536, "start": 2779.6, + "end": 2787.84, "text": " the net of what you might bring in so that''s like a pro + and a con of the new dense vector representation", "tokens": [51076, 264, 2533, + 295, 437, 291, 1062, 1565, 294, 370, 300, 311, 411, 257, 447, 293, 257, 416, 295, + 264, 777, 18011, 8062, 10290, 51488], "temperature": 0.0, "avg_logprob": -0.1157187648203181, + "compression_ratio": 1.7906976744186047, "no_speech_prob": 0.0002073153154924512}, + {"id": 323, "seek": 276536, "start": 2788.6400000000003, "end": 2793.76, "text": + " whereas continuing to use a sparse vector representation it''s a much more precisely + managed", "tokens": [51528, 9735, 9289, 281, 764, 257, 637, 11668, 8062, 10290, + 309, 311, 257, 709, 544, 13402, 6453, 51784], "temperature": 0.0, "avg_logprob": + -0.1157187648203181, "compression_ratio": 1.7906976744186047, "no_speech_prob": + 0.0002073153154924512}, {"id": 324, "seek": 279376, "start": 2794.5600000000004, + "end": 2802.0800000000004, "text": " look up and so 
they there''s not some future + where you throw away one or you throw away the other", "tokens": [50404, 574, 493, + 293, 370, 436, 456, 311, 406, 512, 2027, 689, 291, 3507, 1314, 472, 420, 291, 3507, + 1314, 264, 661, 50780], "temperature": 0.0, "avg_logprob": -0.12350567806972547, + "compression_ratio": 1.7644444444444445, "no_speech_prob": 0.00032079912489280105}, + {"id": 325, "seek": 279376, "start": 2802.88, "end": 2808.48, "text": " more and + more the reality is like hybrid retrieval where you''re using both data structures + to serve", "tokens": [50820, 544, 293, 544, 264, 4103, 307, 411, 13051, 19817, 3337, + 689, 291, 434, 1228, 1293, 1412, 9227, 281, 4596, 51100], "temperature": 0.0, "avg_logprob": + -0.12350567806972547, "compression_ratio": 1.7644444444444445, "no_speech_prob": + 0.00032079912489280105}, {"id": 326, "seek": 279376, "start": 2809.28, "end": 2816.0, + "text": " search results to give people like some kind of relevant results and in + a mix of both perspectives of", "tokens": [51140, 3164, 3542, 281, 976, 561, 411, + 512, 733, 295, 7340, 3542, 293, 294, 257, 2890, 295, 1293, 16766, 295, 51476], "temperature": + 0.0, "avg_logprob": -0.12350567806972547, "compression_ratio": 1.7644444444444445, + "no_speech_prob": 0.00032079912489280105}, {"id": 327, "seek": 279376, "start": + 2816.0, "end": 2822.88, "text": " like expanding the meaning to baby mean other + things or snow-staying in this more precise world of", "tokens": [51476, 411, 14702, + 264, 3620, 281, 3186, 914, 661, 721, 420, 5756, 12, 372, 32600, 294, 341, 544, 13600, + 1002, 295, 51820], "temperature": 0.0, "avg_logprob": -0.12350567806972547, "compression_ratio": + 1.7644444444444445, "no_speech_prob": 0.00032079912489280105}, {"id": 328, "seek": + 282288, "start": 2822.88, "end": 2828.32, "text": " sparse vector meaning or sparser + meaning yeah it''s amazing how you put it like it struck me", "tokens": [50364, + 637, 11668, 8062, 3620, 420, 637, 685, 260, 3620, 1338, 309, 311, 
2243, 577, 291, + 829, 309, 411, 309, 13159, 385, 50636], "temperature": 0.0, "avg_logprob": -0.1444318535622586, + "compression_ratio": 1.7244444444444444, "no_speech_prob": 0.0018101981841027737}, + {"id": 329, "seek": 282288, "start": 2829.28, "end": 2835.92, "text": " house in + a simple way you can explain complex things I mean the sparse vector yeah you said + hundreds", "tokens": [50684, 1782, 294, 257, 2199, 636, 291, 393, 2903, 3997, 721, + 286, 914, 264, 637, 11668, 8062, 1338, 291, 848, 6779, 51016], "temperature": 0.0, + "avg_logprob": -0.1444318535622586, "compression_ratio": 1.7244444444444444, "no_speech_prob": + 0.0018101981841027737}, {"id": 330, "seek": 282288, "start": 2835.92, "end": 2841.76, + "text": " of thousands of you know terms in your term dictionary when I work with + alpha sense I once counted", "tokens": [51016, 295, 5383, 295, 291, 458, 2115, 294, + 428, 1433, 25890, 562, 286, 589, 365, 8961, 2020, 286, 1564, 20150, 51308], "temperature": + 0.0, "avg_logprob": -0.1444318535622586, "compression_ratio": 1.7244444444444444, + "no_speech_prob": 0.0018101981841027737}, {"id": 331, "seek": 282288, "start": 2841.76, + "end": 2849.36, "text": " we had that billion in because you feed like I don''t + know millions and millions of documents and", "tokens": [51308, 321, 632, 300, 5218, + 294, 570, 291, 3154, 411, 286, 500, 380, 458, 6803, 293, 6803, 295, 8512, 293, 51688], + "temperature": 0.0, "avg_logprob": -0.1444318535622586, "compression_ratio": 1.7244444444444444, + "no_speech_prob": 0.0018101981841027737}, {"id": 332, "seek": 284936, "start": 2849.36, + "end": 2854.8, "text": " they do vary quite a lot of course there is some overlap + like financial legal like revenue right", "tokens": [50364, 436, 360, 10559, 1596, + 257, 688, 295, 1164, 456, 307, 512, 19959, 411, 4669, 5089, 411, 9324, 558, 50636], + "temperature": 0.0, "avg_logprob": -0.16556496847243535, "compression_ratio": 1.6581196581196582, + "no_speech_prob": 
0.0010448688408359885}, {"id": 333, "seek": 284936, "start": 2854.8, + "end": 2860.96, "text": " would occur everywhere but then as they describe different + verticals in the industry you know healthcare", "tokens": [50636, 576, 5160, 5315, + 457, 550, 382, 436, 6786, 819, 9429, 82, 294, 264, 3518, 291, 458, 8884, 50944], + "temperature": 0.0, "avg_logprob": -0.16556496847243535, "compression_ratio": 1.6581196581196582, + "no_speech_prob": 0.0010448688408359885}, {"id": 334, "seek": 284936, "start": 2860.96, + "end": 2867.36, "text": " versus I don''t know pure finance banking investment firms + and stuff they they have different", "tokens": [50944, 5717, 286, 500, 380, 458, + 6075, 10719, 18261, 6078, 18055, 293, 1507, 436, 436, 362, 819, 51264], "temperature": + 0.0, "avg_logprob": -0.16556496847243535, "compression_ratio": 1.6581196581196582, + "no_speech_prob": 0.0010448688408359885}, {"id": 335, "seek": 284936, "start": 2867.36, + "end": 2874.08, "text": " lingual there and and that''s amazing like how you put + it right so if I had a vector let''s say", "tokens": [51264, 22949, 901, 456, 293, + 293, 300, 311, 2243, 411, 577, 291, 829, 309, 558, 370, 498, 286, 632, 257, 8062, + 718, 311, 584, 51600], "temperature": 0.0, "avg_logprob": -0.16556496847243535, + "compression_ratio": 1.6581196581196582, "no_speech_prob": 0.0010448688408359885}, + {"id": 336, "seek": 287408, "start": 2874.08, "end": 2880.3199999999997, "text": + " billion size right now I have what I have much less right so I have like 768 dimensions + maybe", "tokens": [50364, 5218, 2744, 558, 586, 286, 362, 437, 286, 362, 709, 1570, + 558, 370, 286, 362, 411, 24733, 23, 12819, 1310, 50676], "temperature": 0.0, "avg_logprob": + -0.22761008613987974, "compression_ratio": 1.515625, "no_speech_prob": 0.016016317531466484}, + {"id": 337, "seek": 287408, "start": 2880.3199999999997, "end": 2889.7599999999998, + "text": " 1024 I heard the recently one commuter in in Elasticsearch is trying to + push the 
scene to upgrade to", "tokens": [50676, 1266, 7911, 286, 2198, 264, 3938, + 472, 800, 20314, 294, 294, 2699, 2750, 405, 1178, 307, 1382, 281, 2944, 264, 4145, + 281, 11484, 281, 51148], "temperature": 0.0, "avg_logprob": -0.22761008613987974, + "compression_ratio": 1.515625, "no_speech_prob": 0.016016317531466484}, {"id": 338, + "seek": 287408, "start": 2889.7599999999998, "end": 2897.04, "text": " 2048 or something + oh wow and I didn''t see that that''s that''s amazing yeah yeah I think it was my", + "tokens": [51148, 945, 13318, 420, 746, 1954, 6076, 293, 286, 994, 380, 536, 300, + 300, 311, 300, 311, 2243, 1338, 1338, 286, 519, 309, 390, 452, 51512], "temperature": + 0.0, "avg_logprob": -0.22761008613987974, "compression_ratio": 1.515625, "no_speech_prob": + 0.016016317531466484}, {"id": 339, "seek": 289704, "start": 2897.04, "end": 2903.04, + "text": " share you were so yeah and this is amazing but I guess you''re right and + also there is another thing", "tokens": [50364, 2073, 291, 645, 370, 1338, 293, + 341, 307, 2243, 457, 286, 2041, 291, 434, 558, 293, 611, 456, 307, 1071, 551, 50664], + "temperature": 0.0, "avg_logprob": -0.23679988986843234, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.004139723721891642}, {"id": 340, "seek": 289704, "start": 2903.04, + "end": 2910.72, "text": " that comes to mind we had a podcast with Yuri Malkov who + is the creator of one of the most popular", "tokens": [50664, 300, 1487, 281, 1575, + 321, 632, 257, 7367, 365, 33901, 376, 667, 5179, 567, 307, 264, 14181, 295, 472, + 295, 264, 881, 3743, 51048], "temperature": 0.0, "avg_logprob": -0.23679988986843234, + "compression_ratio": 1.6182572614107884, "no_speech_prob": 0.004139723721891642}, + {"id": 341, "seek": 289704, "start": 2911.52, "end": 2919.04, "text": " a and then + algorithm station SW here article navigable small world graph and when I ask him + this", "tokens": [51088, 257, 293, 550, 9284, 5214, 20346, 510, 7222, 7407, 712, + 1359, 1002, 4295, 
293, 562, 286, 1029, 796, 341, 51464], "temperature": 0.0, "avg_logprob": + -0.23679988986843234, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.004139723721891642}, {"id": 342, "seek": 289704, "start": 2919.04, "end": 2924.88, + "text": " question so let''s say you have this geometrical search right similarity + search and in the case", "tokens": [51464, 1168, 370, 718, 311, 584, 291, 362, 341, + 12956, 15888, 3164, 558, 32194, 3164, 293, 294, 264, 1389, 51756], "temperature": + 0.0, "avg_logprob": -0.23679988986843234, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.004139723721891642}, {"id": 343, "seek": 292488, "start": 2924.88, + "end": 2931.2000000000003, "text": " of ecommerce you also want to have filters + so you want to say I want to read choose you know this size", "tokens": [50364, + 295, 308, 26926, 291, 611, 528, 281, 362, 15995, 370, 291, 528, 281, 584, 286, 528, + 281, 1401, 2826, 291, 458, 341, 2744, 50680], "temperature": 0.0, "avg_logprob": + -0.1886138501374618, "compression_ratio": 1.6353591160220995, "no_speech_prob": + 0.0037642833776772022}, {"id": 344, "seek": 292488, "start": 2931.92, "end": 2940.32, + "text": " in stock and so on and and he he''s surprising that he said these contradicts + contradict the to", "tokens": [50716, 294, 4127, 293, 370, 322, 293, 293, 415, 415, + 311, 8830, 300, 415, 848, 613, 28900, 82, 28900, 264, 281, 51136], "temperature": + 0.0, "avg_logprob": -0.1886138501374618, "compression_ratio": 1.6353591160220995, + "no_speech_prob": 0.0037642833776772022}, {"id": 345, "seek": 292488, "start": 2940.32, + "end": 2948.0, "text": " each other so so much that I cannot even imagine creating + a generic algorithm that will cover this", "tokens": [51136, 1184, 661, 370, 370, + 709, 300, 286, 2644, 754, 3811, 4084, 257, 19577, 9284, 300, 486, 2060, 341, 51520], + "temperature": 0.0, "avg_logprob": -0.1886138501374618, "compression_ratio": 1.6353591160220995, + "no_speech_prob": 
0.0037642833776772022}, {"id": 346, "seek": 294800, "start": 2948.0, + "end": 2954.64, "text": " case because essentially he said it could quickly degrade + to traversing the whole space of points", "tokens": [50364, 1389, 570, 4476, 415, + 848, 309, 727, 2661, 368, 8692, 281, 23149, 278, 264, 1379, 1901, 295, 2793, 50696], + "temperature": 0.0, "avg_logprob": -0.14267768416293833, "compression_ratio": 1.8287037037037037, + "no_speech_prob": 0.003679753514006734}, {"id": 347, "seek": 294800, "start": 2954.64, + "end": 2961.04, "text": " because as you said you know about as a dimension you + also have these filters as a dimension right you", "tokens": [50696, 570, 382, 291, + 848, 291, 458, 466, 382, 257, 10139, 291, 611, 362, 613, 15995, 382, 257, 10139, + 558, 291, 51016], "temperature": 0.0, "avg_logprob": -0.14267768416293833, "compression_ratio": + 1.8287037037037037, "no_speech_prob": 0.003679753514006734}, {"id": 348, "seek": + 294800, "start": 2961.04, "end": 2966.56, "text": " could say yeah cluster these + points on color cluster these points on size yeah yeah imagine doing", "tokens": + [51016, 727, 584, 1338, 13630, 613, 2793, 322, 2017, 13630, 613, 2793, 322, 2744, + 1338, 1338, 3811, 884, 51292], "temperature": 0.0, "avg_logprob": -0.14267768416293833, + "compression_ratio": 1.8287037037037037, "no_speech_prob": 0.003679753514006734}, + {"id": 349, "seek": 294800, "start": 2966.56, "end": 2971.68, "text": " this up + front I mean this is not a generic solution and then he was he was just blunt and + saying", "tokens": [51292, 341, 493, 1868, 286, 914, 341, 307, 406, 257, 19577, + 3827, 293, 550, 415, 390, 415, 390, 445, 32246, 293, 1566, 51548], "temperature": + 0.0, "avg_logprob": -0.14267768416293833, "compression_ratio": 1.8287037037037037, + "no_speech_prob": 0.003679753514006734}, {"id": 350, "seek": 297168, "start": 2971.7599999999998, + "end": 2978.0, "text": " this is not possible I don''t see how you could do this + and yet the vector 
databases claim that they", "tokens": [50368, 341, 307, 406, + 1944, 286, 500, 380, 536, 577, 291, 727, 360, 341, 293, 1939, 264, 8062, 22380, + 3932, 300, 436, 50680], "temperature": 0.0, "avg_logprob": -0.1094987690448761, + "compression_ratio": 1.7342342342342343, "no_speech_prob": 0.011163199320435524}, + {"id": 351, "seek": 297168, "start": 2978.0, "end": 2986.72, "text": " have done + it and yes you can go and at scale but I I sense that there is some some truth he + didn''t", "tokens": [50680, 362, 1096, 309, 293, 2086, 291, 393, 352, 293, 412, + 4373, 457, 286, 286, 2020, 300, 456, 307, 512, 512, 3494, 415, 994, 380, 51116], + "temperature": 0.0, "avg_logprob": -0.1094987690448761, "compression_ratio": 1.7342342342342343, + "no_speech_prob": 0.011163199320435524}, {"id": 352, "seek": 297168, "start": 2986.72, + "end": 2993.12, "text": " maybe potentially there you know some edge cases where + it doesn''t work or maybe it goes over a", "tokens": [51116, 1310, 7263, 456, 291, + 458, 512, 4691, 3331, 689, 309, 1177, 380, 589, 420, 1310, 309, 1709, 670, 257, + 51436], "temperature": 0.0, "avg_logprob": -0.1094987690448761, "compression_ratio": + 1.7342342342342343, "no_speech_prob": 0.011163199320435524}, {"id": 353, "seek": + 297168, "start": 2993.12, "end": 2999.2799999999997, "text": " second and it''s + acceptable I don''t know but how do you feel yeah I mean it feels a bit like", "tokens": + [51436, 1150, 293, 309, 311, 15513, 286, 500, 380, 458, 457, 577, 360, 291, 841, + 1338, 286, 914, 309, 3417, 257, 857, 411, 51744], "temperature": 0.0, "avg_logprob": + -0.1094987690448761, "compression_ratio": 1.7342342342342343, "no_speech_prob": + 0.011163199320435524}, {"id": 354, "seek": 299928, "start": 2999.28, "end": 3006.7200000000003, + "text": " over complicating a solve problem it feels like we''ve I sort of I suspect + that will be in a world of", "tokens": [50364, 670, 16060, 990, 257, 5039, 1154, + 309, 3417, 411, 321, 600, 286, 1333, 295, 286, 9091, 
300, 486, 312, 294, 257, 1002, + 295, 50736], "temperature": 0.0, "avg_logprob": -0.13866357803344725, "compression_ratio": + 1.6174863387978142, "no_speech_prob": 0.0006016258266754448}, {"id": 355, "seek": + 299928, "start": 3007.52, "end": 3013.6000000000004, "text": " of sort of hybrid + more hybrid retrieval where you''re using a traditional filter for those kinds of", + "tokens": [50776, 295, 1333, 295, 13051, 544, 13051, 19817, 3337, 689, 291, 434, + 1228, 257, 5164, 6608, 337, 729, 3685, 295, 51080], "temperature": 0.0, "avg_logprob": + -0.13866357803344725, "compression_ratio": 1.6174863387978142, "no_speech_prob": + 0.0006016258266754448}, {"id": 356, "seek": 299928, "start": 3013.6000000000004, + "end": 3022.0800000000004, "text": " things because I think like I feel like we + dense retrieval is the missing piece of most people''s", "tokens": [51080, 721, + 570, 286, 519, 411, 286, 841, 411, 321, 18011, 19817, 3337, 307, 264, 5361, 2522, + 295, 881, 561, 311, 51504], "temperature": 0.0, "avg_logprob": -0.13866357803344725, + "compression_ratio": 1.6174863387978142, "no_speech_prob": 0.0006016258266754448}, + {"id": 357, "seek": 302208, "start": 3022.72, "end": 3029.92, "text": " search systems + if not for anything then just like I think like first pass like often it was like", + "tokens": [50396, 3164, 3652, 498, 406, 337, 1340, 550, 445, 411, 286, 519, 411, + 700, 1320, 411, 2049, 309, 390, 411, 50756], "temperature": 0.0, "avg_logprob": + -0.10907360712687174, "compression_ratio": 1.7853881278538812, "no_speech_prob": + 0.0013814865378662944}, {"id": 358, "seek": 302208, "start": 3029.92, "end": 3037.7599999999998, + "text": " case for a long time that people would would do like first pass like the + m25 scoring and then like", "tokens": [50756, 1389, 337, 257, 938, 565, 300, 561, + 576, 576, 360, 411, 700, 1320, 411, 264, 275, 6074, 22358, 293, 550, 411, 51148], + "temperature": 0.0, "avg_logprob": -0.10907360712687174, "compression_ratio": 
1.7853881278538812, + "no_speech_prob": 0.0013814865378662944}, {"id": 359, "seek": 302208, "start": 3037.7599999999998, + "end": 3042.48, "text": " maybe apply some learning to rank algorithm I sort of + feel like that''s going to flip to be it", "tokens": [51148, 1310, 3079, 512, 2539, + 281, 6181, 9284, 286, 1333, 295, 841, 411, 300, 311, 516, 281, 7929, 281, 312, 309, + 51384], "temperature": 0.0, "avg_logprob": -0.10907360712687174, "compression_ratio": + 1.7853881278538812, "no_speech_prob": 0.0013814865378662944}, {"id": 360, "seek": + 302208, "start": 3042.48, "end": 3049.52, "text": " could flip to be like first + I''m going to get this dense vector candidate list because it''s compressed", "tokens": + [51384, 727, 7929, 281, 312, 411, 700, 286, 478, 516, 281, 483, 341, 18011, 8062, + 11532, 1329, 570, 309, 311, 30353, 51736], "temperature": 0.0, "avg_logprob": -0.10907360712687174, + "compression_ratio": 1.7853881278538812, "no_speech_prob": 0.0013814865378662944}, + {"id": 361, "seek": 304952, "start": 3049.52, "end": 3054.16, "text": " it''s like + it''s actually more recall oriented and then I''m going to use sparse vector techniques", + "tokens": [50364, 309, 311, 411, 309, 311, 767, 544, 9901, 21841, 293, 550, 286, + 478, 516, 281, 764, 637, 11668, 8062, 7512, 50596], "temperature": 0.0, "avg_logprob": + -0.13167084095089934, "compression_ratio": 1.6515837104072397, "no_speech_prob": + 0.00035860372008755803}, {"id": 362, "seek": 304952, "start": 3054.16, "end": 3059.7599999999998, + "text": " to filter re-rank and these kinds of things but I also think like just + for speed and ops like", "tokens": [50596, 281, 6608, 319, 12, 20479, 293, 613, + 3685, 295, 721, 457, 286, 611, 519, 411, 445, 337, 3073, 293, 44663, 411, 50876], + "temperature": 0.0, "avg_logprob": -0.13167084095089934, "compression_ratio": 1.6515837104072397, + "no_speech_prob": 0.00035860372008755803}, {"id": 363, "seek": 304952, "start": + 3060.72, "end": 3067.68, "text": " one thing 
that''s you know it''s just a practical + concern is like they''re solar,", "tokens": [50924, 472, 551, 300, 311, 291, 458, + 309, 311, 445, 257, 8496, 3136, 307, 411, 436, 434, 7936, 11, 51272], "temperature": + 0.0, "avg_logprob": -0.13167084095089934, "compression_ratio": 1.6515837104072397, + "no_speech_prob": 0.00035860372008755803}, {"id": 364, "seek": 304952, "start": + 3067.68, "end": 3073.68, "text": " elastic search have such huge install bases and + a huge practitioner people who know how to scale", "tokens": [51272, 17115, 3164, + 362, 1270, 2603, 3625, 17949, 293, 257, 2603, 32125, 561, 567, 458, 577, 281, 4373, + 51572], "temperature": 0.0, "avg_logprob": -0.13167084095089934, "compression_ratio": + 1.6515837104072397, "no_speech_prob": 0.00035860372008755803}, {"id": 365, "seek": + 307368, "start": 3073.9199999999996, "end": 3082.08, "text": " them and I think + with new dense vector techniques I''m not sure people are going to like", "tokens": + [50376, 552, 293, 286, 519, 365, 777, 18011, 8062, 7512, 286, 478, 406, 988, 561, + 366, 516, 281, 411, 50784], "temperature": 0.0, "avg_logprob": -0.0834761563469382, + "compression_ratio": 1.7688679245283019, "no_speech_prob": 0.0003358752583153546}, + {"id": 366, "seek": 307368, "start": 3082.96, "end": 3088.3199999999997, "text": + " completely throw out their elastic search or solar install just to have this new + functionality", "tokens": [50828, 2584, 3507, 484, 641, 17115, 3164, 420, 7936, + 3625, 445, 281, 362, 341, 777, 14980, 51096], "temperature": 0.0, "avg_logprob": + -0.0834761563469382, "compression_ratio": 1.7688679245283019, "no_speech_prob": + 0.0003358752583153546}, {"id": 367, "seek": 307368, "start": 3089.2, "end": 3095.3599999999997, + "text": " and in fact you know as elastic search and solar sort of adopt more of + these things I think more", "tokens": [51140, 293, 294, 1186, 291, 458, 382, 17115, + 3164, 293, 7936, 1333, 295, 6878, 544, 295, 613, 721, 286, 519, 544, 51448], 
"temperature": + 0.0, "avg_logprob": -0.0834761563469382, "compression_ratio": 1.7688679245283019, + "no_speech_prob": 0.0003358752583153546}, {"id": 368, "seek": 307368, "start": 3095.3599999999997, + "end": 3099.2799999999997, "text": " and more people are going to say oh that''s + cool I''m going to use this in addition to my elastic", "tokens": [51448, 293, 544, + 561, 366, 516, 281, 584, 1954, 300, 311, 1627, 286, 478, 516, 281, 764, 341, 294, + 4500, 281, 452, 17115, 51644], "temperature": 0.0, "avg_logprob": -0.0834761563469382, + "compression_ratio": 1.7688679245283019, "no_speech_prob": 0.0003358752583153546}, + {"id": 369, "seek": 309928, "start": 3099.36, "end": 3107.92, "text": " search or + solar so I sort of feel like we''ll end up in this world where where yeah elastic + search", "tokens": [50368, 3164, 420, 7936, 370, 286, 1333, 295, 841, 411, 321, + 603, 917, 493, 294, 341, 1002, 689, 689, 1338, 17115, 3164, 50796], "temperature": + 0.0, "avg_logprob": -0.09962442517280579, "compression_ratio": 1.72, "no_speech_prob": + 0.0010028709657490253}, {"id": 370, "seek": 309928, "start": 3107.92, "end": 3115.52, + "text": " and solar don''t give us as nice of a set of feature features for that + but it already works pretty", "tokens": [50796, 293, 7936, 500, 380, 976, 505, 382, + 1481, 295, 257, 992, 295, 4111, 4122, 337, 300, 457, 309, 1217, 1985, 1238, 51176], + "temperature": 0.0, "avg_logprob": -0.09962442517280579, "compression_ratio": 1.72, + "no_speech_prob": 0.0010028709657490253}, {"id": 371, "seek": 309928, "start": 3115.52, + "end": 3121.28, "text": " well for 80% of what we do anyway and we just kind of + want to tack on this extra bit so that feels", "tokens": [51176, 731, 337, 4688, + 4, 295, 437, 321, 360, 4033, 293, 321, 445, 733, 295, 528, 281, 9426, 322, 341, + 2857, 857, 370, 300, 3417, 51464], "temperature": 0.0, "avg_logprob": -0.09962442517280579, + "compression_ratio": 1.72, "no_speech_prob": 0.0010028709657490253}, {"id": 372, + 
"seek": 309928, "start": 3121.28, "end": 3126.8, "text": " more of a like my expectation + of what the future would be then then we''ll like throw out the", "tokens": [51464, + 544, 295, 257, 411, 452, 14334, 295, 437, 264, 2027, 576, 312, 550, 550, 321, 603, + 411, 3507, 484, 264, 51740], "temperature": 0.0, "avg_logprob": -0.09962442517280579, + "compression_ratio": 1.72, "no_speech_prob": 0.0010028709657490253}, {"id": 373, + "seek": 312680, "start": 3126.8, "end": 3132.1600000000003, "text": " existing systems + and adopt something new yeah this is very interesting opinion of course", "tokens": + [50364, 6741, 3652, 293, 6878, 746, 777, 1338, 341, 307, 588, 1880, 4800, 295, 1164, + 50632], "temperature": 0.0, "avg_logprob": -0.2739511728286743, "compression_ratio": + 1.6130434782608696, "no_speech_prob": 0.0019557310733944178}, {"id": 374, "seek": + 312680, "start": 3132.88, "end": 3137.76, "text": " not downplaying the displayers + that you mentioned I haven''t talked about it as well seven", "tokens": [50668, + 406, 760, 32944, 264, 4674, 433, 300, 291, 2835, 286, 2378, 380, 2825, 466, 309, + 382, 731, 3407, 50912], "temperature": 0.0, "avg_logprob": -0.2739511728286743, + "compression_ratio": 1.6130434782608696, "no_speech_prob": 0.0019557310733944178}, + {"id": 375, "seek": 312680, "start": 3137.76, "end": 3146.6400000000003, "text": + " databases exist today and you neural network no neural frameworks like Gina and + Haystack which", "tokens": [50912, 22380, 2514, 965, 293, 291, 18161, 3209, 572, + 18161, 29834, 411, 34711, 293, 8721, 372, 501, 597, 51356], "temperature": 0.0, + "avg_logprob": -0.2739511728286743, "compression_ratio": 1.6130434782608696, "no_speech_prob": + 0.0019557310733944178}, {"id": 376, "seek": 312680, "start": 3146.6400000000003, + "end": 3153.84, "text": " is of these database but I agree like I think the future + might be in flexibility that okay if I''m", "tokens": [51356, 307, 295, 613, 8149, + 457, 286, 3986, 411, 286, 519, 
264, 2027, 1062, 312, 294, 12635, 300, 1392, 498, + 286, 478, 51716], "temperature": 0.0, "avg_logprob": -0.2739511728286743, "compression_ratio": + 1.6130434782608696, "no_speech_prob": 0.0019557310733944178}, {"id": 377, "seek": + 315384, "start": 3153.84, "end": 3162.7200000000003, "text": " already with solar + why don''t I just use the you know and then plug in and try my luck maybe just", + "tokens": [50364, 1217, 365, 7936, 983, 500, 380, 286, 445, 764, 264, 291, 458, + 293, 550, 5452, 294, 293, 853, 452, 3668, 1310, 445, 50808], "temperature": 0.0, + "avg_logprob": -0.12157070009331954, "compression_ratio": 1.6818181818181819, "no_speech_prob": + 0.0023495813366025686}, {"id": 378, "seek": 315384, "start": 3163.76, "end": 3169.92, + "text": " wet my toes so to say right I don''t want to jump to entirely new world + of new database that I don''t", "tokens": [50860, 6630, 452, 14681, 370, 281, 584, + 558, 286, 500, 380, 528, 281, 3012, 281, 7696, 777, 1002, 295, 777, 8149, 300, 286, + 500, 380, 51168], "temperature": 0.0, "avg_logprob": -0.12157070009331954, "compression_ratio": + 1.6818181818181819, "no_speech_prob": 0.0023495813366025686}, {"id": 379, "seek": + 315384, "start": 3169.92, "end": 3177.28, "text": " have experience with but if + you haven''t had that set up you start up let''s say I know some startups", "tokens": + [51168, 362, 1752, 365, 457, 498, 291, 2378, 380, 632, 300, 992, 493, 291, 722, + 493, 718, 311, 584, 286, 458, 512, 28041, 51536], "temperature": 0.0, "avg_logprob": + -0.12157070009331954, "compression_ratio": 1.6818181818181819, "no_speech_prob": + 0.0023495813366025686}, {"id": 380, "seek": 317728, "start": 3177.36, "end": 3184.0, + "text": " when they when they want to go that direction with neural search they + do consider VESPA or", "tokens": [50368, 562, 436, 562, 436, 528, 281, 352, 300, + 3513, 365, 18161, 3164, 436, 360, 1949, 691, 2358, 10297, 420, 50700], "temperature": + 0.0, "avg_logprob": -0.18428609587929465, 
"compression_ratio": 1.6150627615062763, + "no_speech_prob": 0.004703637212514877}, {"id": 381, "seek": 317728, "start": 3184.6400000000003, + "end": 3190.2400000000002, "text": " VVA or pine cone you know and that might be + a different use case as well by the way this is entirely", "tokens": [50732, 691, + 20914, 420, 15113, 19749, 291, 458, 293, 300, 1062, 312, 257, 819, 764, 1389, 382, + 731, 538, 264, 636, 341, 307, 7696, 51012], "temperature": 0.0, "avg_logprob": -0.18428609587929465, + "compression_ratio": 1.6150627615062763, "no_speech_prob": 0.004703637212514877}, + {"id": 382, "seek": 317728, "start": 3190.88, "end": 3198.48, "text": " new big + topic but it''s not only search right search is still being figured out and some + companies do it", "tokens": [51044, 777, 955, 4829, 457, 309, 311, 406, 787, 3164, + 558, 3164, 307, 920, 885, 8932, 484, 293, 512, 3431, 360, 309, 51424], "temperature": + 0.0, "avg_logprob": -0.18428609587929465, "compression_ratio": 1.6150627615062763, + "no_speech_prob": 0.004703637212514877}, {"id": 383, "seek": 317728, "start": 3198.48, + "end": 3202.8, "text": " but then you also have machine learning pipelines like + recommender systems oh totally yeah", "tokens": [51424, 457, 550, 291, 611, 362, + 3479, 2539, 40168, 411, 2748, 260, 3652, 1954, 3879, 1338, 51640], "temperature": + 0.0, "avg_logprob": -0.18428609587929465, "compression_ratio": 1.6150627615062763, + "no_speech_prob": 0.004703637212514877}, {"id": 384, "seek": 320280, "start": 3203.44, + "end": 3209.76, "text": " yeah that''s a great use case so it''s not like for search + up of course like you have this huge", "tokens": [50396, 1338, 300, 311, 257, 869, + 764, 1389, 370, 309, 311, 406, 411, 337, 3164, 493, 295, 1164, 411, 291, 362, 341, + 2603, 50712], "temperature": 0.0, "avg_logprob": -0.13585243906293595, "compression_ratio": + 1.6754385964912282, "no_speech_prob": 0.0005495928344316781}, {"id": 385, "seek": + 320280, "start": 3209.76, "end": 
3215.1200000000003, "text": " install base and + stuff so I mean especially you know for established companies like Shopify or", + "tokens": [50712, 3625, 3096, 293, 1507, 370, 286, 914, 2318, 291, 458, 337, 7545, + 3431, 411, 43991, 420, 50980], "temperature": 0.0, "avg_logprob": -0.13585243906293595, + "compression_ratio": 1.6754385964912282, "no_speech_prob": 0.0005495928344316781}, + {"id": 386, "seek": 320280, "start": 3215.1200000000003, "end": 3221.6800000000003, + "text": " whoever else but you''re absolutely right there is there is a lot of opportunity + for for this at", "tokens": [50980, 11387, 1646, 457, 291, 434, 3122, 558, 456, + 307, 456, 307, 257, 688, 295, 2650, 337, 337, 341, 412, 51308], "temperature": 0.0, + "avg_logprob": -0.13585243906293595, "compression_ratio": 1.6754385964912282, "no_speech_prob": + 0.0005495928344316781}, {"id": 387, "seek": 320280, "start": 3223.2000000000003, + "end": 3232.48, "text": " in so many places like pure question answering applications + or places where you know you''re using", "tokens": [51384, 294, 370, 867, 3190, + 411, 6075, 1168, 13430, 5821, 420, 3190, 689, 291, 458, 291, 434, 1228, 51848], + "temperature": 0.0, "avg_logprob": -0.13585243906293595, "compression_ratio": 1.6754385964912282, + "no_speech_prob": 0.0005495928344316781}, {"id": 388, "seek": 323248, "start": 3232.48, + "end": 3238.64, "text": " it as a backend component to do some kind of inference + for recommendation systems I think it''s a", "tokens": [50364, 309, 382, 257, 38087, + 6542, 281, 360, 512, 733, 295, 38253, 337, 11879, 3652, 286, 519, 309, 311, 257, + 50672], "temperature": 0.0, "avg_logprob": -0.11052543776375907, "compression_ratio": + 1.8055555555555556, "no_speech_prob": 0.0005219537415541708}, {"id": 389, "seek": + 323248, "start": 3238.64, "end": 3245.92, "text": " fantastic I think it''s a fantastic + thing I do I do think like more and more a bigger practice will", "tokens": [50672, + 5456, 286, 519, 309, 311, 257, 5456, 
551, 286, 360, 286, 360, 519, 411, 544, 293, + 544, 257, 3801, 3124, 486, 51036], "temperature": 0.0, "avg_logprob": -0.11052543776375907, + "compression_ratio": 1.8055555555555556, "no_speech_prob": 0.0005219537415541708}, + {"id": 390, "seek": 323248, "start": 3245.92, "end": 3252.08, "text": " evolve to + scaling these things out and understanding them from an operational perspective + so yeah I", "tokens": [51036, 16693, 281, 21589, 613, 721, 484, 293, 3701, 552, + 490, 364, 16607, 4585, 370, 1338, 286, 51344], "temperature": 0.0, "avg_logprob": + -0.11052543776375907, "compression_ratio": 1.8055555555555556, "no_speech_prob": + 0.0005219537415541708}, {"id": 391, "seek": 323248, "start": 3252.08, "end": 3258.64, + "text": " definitely think it''s an interesting landscape to watch and I think like + I think the the other", "tokens": [51344, 2138, 519, 309, 311, 364, 1880, 9661, + 281, 1159, 293, 286, 519, 411, 286, 519, 264, 264, 661, 51672], "temperature": 0.0, + "avg_logprob": -0.11052543776375907, "compression_ratio": 1.8055555555555556, "no_speech_prob": + 0.0005219537415541708}, {"id": 392, "seek": 325864, "start": 3258.64, "end": 3265.6, + "text": " counter even to what I said is like like their solar and elastic search + are established for their", "tokens": [50364, 5682, 754, 281, 437, 286, 848, 307, + 411, 411, 641, 7936, 293, 17115, 3164, 366, 7545, 337, 641, 50712], "temperature": + 0.0, "avg_logprob": -0.11517593159395105, "compression_ratio": 1.7333333333333334, + "no_speech_prob": 0.0009845979511737823}, {"id": 393, "seek": 325864, "start": 3265.6, + "end": 3271.7599999999998, "text": " use cases but this sort of like I was saying + before this sort of like world of building these", "tokens": [50712, 764, 3331, + 457, 341, 1333, 295, 411, 286, 390, 1566, 949, 341, 1333, 295, 411, 1002, 295, 2390, + 613, 51020], "temperature": 0.0, "avg_logprob": -0.11517593159395105, "compression_ratio": + 1.7333333333333334, "no_speech_prob": 
0.0009845979511737823}, {"id": 394, "seek": + 325864, "start": 3273.2799999999997, "end": 3281.7599999999998, "text": " search + like or data driven surfaces or personalization driven surfaces it''s just wide + open like I", "tokens": [51096, 3164, 411, 420, 1412, 9555, 16130, 420, 2973, 2144, + 9555, 16130, 309, 311, 445, 4874, 1269, 411, 286, 51520], "temperature": 0.0, "avg_logprob": + -0.11517593159395105, "compression_ratio": 1.7333333333333334, "no_speech_prob": + 0.0009845979511737823}, {"id": 395, "seek": 325864, "start": 3281.7599999999998, + "end": 3286.24, "text": " look at my phone I have the limited screen real estate + it needs to show me something relevant for me", "tokens": [51520, 574, 412, 452, + 2593, 286, 362, 264, 5567, 2568, 957, 9749, 309, 2203, 281, 855, 385, 746, 7340, + 337, 385, 51744], "temperature": 0.0, "avg_logprob": -0.11517593159395105, "compression_ratio": + 1.7333333333333334, "no_speech_prob": 0.0009845979511737823}, {"id": 396, "seek": + 328624, "start": 3287.12, "end": 3294.7999999999997, "text": " or relevant to you + know I think you know peloton for example peloton the fitness app I want to do a", + "tokens": [50408, 420, 7340, 281, 291, 458, 286, 519, 291, 458, 6178, 27794, 337, + 1365, 6178, 27794, 264, 15303, 724, 286, 528, 281, 360, 257, 50792], "temperature": + 0.0, "avg_logprob": -0.09983294463354694, "compression_ratio": 1.8320895522388059, + "no_speech_prob": 0.0009514725534245372}, {"id": 397, "seek": 328624, "start": 3294.7999999999997, + "end": 3299.6, "text": " workout it''s gonna I''m gonna go to the app it''s gonna + it''s not gonna like it does have an navigation", "tokens": [50792, 12169, 309, + 311, 799, 286, 478, 799, 352, 281, 264, 724, 309, 311, 799, 309, 311, 406, 799, + 411, 309, 775, 362, 364, 17346, 51032], "temperature": 0.0, "avg_logprob": -0.09983294463354694, + "compression_ratio": 1.8320895522388059, "no_speech_prob": 0.0009514725534245372}, + {"id": 398, "seek": 328624, "start": 3299.6, "end": 
3303.52, "text": " but it''s + also just trying to show me something on the screen that''s gonna be relevant to + me so engage", "tokens": [51032, 457, 309, 311, 611, 445, 1382, 281, 855, 385, 746, + 322, 264, 2568, 300, 311, 799, 312, 7340, 281, 385, 370, 4683, 51228], "temperature": + 0.0, "avg_logprob": -0.09983294463354694, "compression_ratio": 1.8320895522388059, + "no_speech_prob": 0.0009514725534245372}, {"id": 399, "seek": 328624, "start": 3303.52, + "end": 3310.24, "text": " with it or Netflix or all of the UIs we use these days + they''re not really like point and click", "tokens": [51228, 365, 309, 420, 12778, + 420, 439, 295, 264, 624, 6802, 321, 764, 613, 1708, 436, 434, 406, 534, 411, 935, + 293, 2052, 51564], "temperature": 0.0, "avg_logprob": -0.09983294463354694, "compression_ratio": + 1.8320895522388059, "no_speech_prob": 0.0009514725534245372}, {"id": 400, "seek": + 328624, "start": 3310.24, "end": 3315.8399999999997, "text": " they''re driven by + some kind of smart algorithm and it''s not necessarily like a classic search", "tokens": + [51564, 436, 434, 9555, 538, 512, 733, 295, 4069, 9284, 293, 309, 311, 406, 4725, + 411, 257, 7230, 3164, 51844], "temperature": 0.0, "avg_logprob": -0.09983294463354694, + "compression_ratio": 1.8320895522388059, "no_speech_prob": 0.0009514725534245372}, + {"id": 401, "seek": 331584, "start": 3315.84, "end": 3321.84, "text": " use case + where it''s like point click filter then search with relevance so I do think that", + "tokens": [50364, 764, 1389, 689, 309, 311, 411, 935, 2052, 6608, 550, 3164, 365, + 32684, 370, 286, 360, 519, 300, 50664], "temperature": 0.0, "avg_logprob": -0.08126515271712323, + "compression_ratio": 1.8046511627906976, "no_speech_prob": 0.0002980774734169245}, + {"id": 402, "seek": 331584, "start": 3321.84, "end": 3328.8, "text": " it''s a wide + open space and honestly I think it''s a it''s an under appreciated space and I think + it''s a", "tokens": [50664, 309, 311, 257, 4874, 1269, 
1901, 293, 6095, 286, 519, + 309, 311, 257, 309, 311, 364, 833, 17169, 1901, 293, 286, 519, 309, 311, 257, 51012], + "temperature": 0.0, "avg_logprob": -0.08126515271712323, "compression_ratio": 1.8046511627906976, + "no_speech_prob": 0.0002980774734169245}, {"id": 403, "seek": 331584, "start": 3328.8, + "end": 3335.6800000000003, "text": " space that in some ways if I was maybe if I + was you know I''m just thinking of this now and I speak", "tokens": [51012, 1901, + 300, 294, 512, 2098, 498, 286, 390, 1310, 498, 286, 390, 291, 458, 286, 478, 445, + 1953, 295, 341, 586, 293, 286, 1710, 51356], "temperature": 0.0, "avg_logprob": + -0.08126515271712323, "compression_ratio": 1.8046511627906976, "no_speech_prob": + 0.0002980774734169245}, {"id": 404, "seek": 331584, "start": 3335.6800000000003, + "end": 3343.1200000000003, "text": " speaking completely out of out of hand but + like if I was to advise like a pine cone or somebody I", "tokens": [51356, 4124, + 2584, 484, 295, 484, 295, 1011, 457, 411, 498, 286, 390, 281, 18312, 411, 257, 15113, + 19749, 420, 2618, 286, 51728], "temperature": 0.0, "avg_logprob": -0.08126515271712323, + "compression_ratio": 1.8046511627906976, "no_speech_prob": 0.0002980774734169245}, + {"id": 405, "seek": 334312, "start": 3343.12, "end": 3347.3599999999997, "text": + " might say like let''s you know stop talking about yourself as a vector database + let''s start talking", "tokens": [50364, 1062, 584, 411, 718, 311, 291, 458, 1590, + 1417, 466, 1803, 382, 257, 8062, 8149, 718, 311, 722, 1417, 50576], "temperature": + 0.0, "avg_logprob": -0.07453746293720447, "compression_ratio": 2.043103448275862, + "no_speech_prob": 8.013944170670584e-05}, {"id": 406, "seek": 334312, "start": 3347.3599999999997, + "end": 3352.24, "text": " about all of these ways that are really you know I think + I in my book I talk about relevance", "tokens": [50576, 466, 439, 295, 613, 2098, + 300, 366, 534, 291, 458, 286, 519, 286, 294, 452, 1446, 286, 751, 466, 
32684, 50820], + "temperature": 0.0, "avg_logprob": -0.07453746293720447, "compression_ratio": 2.043103448275862, + "no_speech_prob": 8.013944170670584e-05}, {"id": 407, "seek": 334312, "start": 3352.24, + "end": 3358.16, "text": " oriented applications or like I forget even like relevance + driven enterprises and I think like", "tokens": [50820, 21841, 5821, 420, 411, 286, + 2870, 754, 411, 32684, 9555, 29034, 293, 286, 519, 411, 51116], "temperature": 0.0, + "avg_logprob": -0.07453746293720447, "compression_ratio": 2.043103448275862, "no_speech_prob": + 8.013944170670584e-05}, {"id": 408, "seek": 334312, "start": 3358.7999999999997, + "end": 3364.08, "text": " a lot of these applications are really sort of like completely + sort of relevance oriented", "tokens": [51148, 257, 688, 295, 613, 5821, 366, 534, + 1333, 295, 411, 2584, 1333, 295, 32684, 21841, 51412], "temperature": 0.0, "avg_logprob": + -0.07453746293720447, "compression_ratio": 2.043103448275862, "no_speech_prob": + 8.013944170670584e-05}, {"id": 409, "seek": 334312, "start": 3364.08, "end": 3368.96, + "text": " applications which really whether it''s really personalization or recommendations + there are things", "tokens": [51412, 5821, 597, 534, 1968, 309, 311, 534, 2973, + 2144, 420, 10434, 456, 366, 721, 51656], "temperature": 0.0, "avg_logprob": -0.07453746293720447, + "compression_ratio": 2.043103448275862, "no_speech_prob": 8.013944170670584e-05}, + {"id": 410, "seek": 336896, "start": 3368.96, "end": 3375.36, "text": " that are + about ranking something to a user for given context or maybe for given query or + question", "tokens": [50364, 300, 366, 466, 17833, 746, 281, 257, 4195, 337, 2212, + 4319, 420, 1310, 337, 2212, 14581, 420, 1168, 50684], "temperature": 0.0, "avg_logprob": + -0.1180851234579986, "compression_ratio": 1.7112676056338028, "no_speech_prob": + 0.002889912808313966}, {"id": 411, "seek": 336896, "start": 3376.16, "end": 3381.12, + "text": " and I would focus on the universe 
that stuff because I don''t know if + anyone''s really speaking", "tokens": [50724, 293, 286, 576, 1879, 322, 264, 6445, + 300, 1507, 570, 286, 500, 380, 458, 498, 2878, 311, 534, 4124, 50972], "temperature": + 0.0, "avg_logprob": -0.1180851234579986, "compression_ratio": 1.7112676056338028, + "no_speech_prob": 0.002889912808313966}, {"id": 412, "seek": 336896, "start": 3381.68, + "end": 3387.28, "text": " from a product perspective about how what is the engine + that drives that and I think that could", "tokens": [51000, 490, 257, 1674, 4585, + 466, 577, 437, 307, 264, 2848, 300, 11754, 300, 293, 286, 519, 300, 727, 51280], + "temperature": 0.0, "avg_logprob": -0.1180851234579986, "compression_ratio": 1.7112676056338028, + "no_speech_prob": 0.002889912808313966}, {"id": 413, "seek": 336896, "start": 3387.28, + "end": 3393.52, "text": " be really exciting product or open source space or whatever + just just really begin. This is a great", "tokens": [51280, 312, 534, 4670, 1674, + 420, 1269, 4009, 1901, 420, 2035, 445, 445, 534, 1841, 13, 639, 307, 257, 869, 51592], + "temperature": 0.0, "avg_logprob": -0.1180851234579986, "compression_ratio": 1.7112676056338028, + "no_speech_prob": 0.002889912808313966}, {"id": 414, "seek": 336896, "start": 3393.52, + "end": 3398.64, "text": " advice I might quote you on the upcoming keynote that + I need to deliver at Haystack Berlin because", "tokens": [51592, 5192, 286, 1062, + 6513, 291, 322, 264, 11500, 33896, 300, 286, 643, 281, 4239, 412, 8721, 372, 501, + 13848, 570, 51848], "temperature": 0.0, "avg_logprob": -0.1180851234579986, "compression_ratio": + 1.7112676056338028, "no_speech_prob": 0.002889912808313966}, {"id": 415, "seek": + 339864, "start": 3399.3599999999997, "end": 3406.72, "text": " this is a great great + thought because in many ways you know one thing is that when people come back", + "tokens": [50400, 341, 307, 257, 869, 869, 1194, 570, 294, 867, 2098, 291, 458, + 472, 551, 307, 300, 562, 561, 808, 646, 
50768], "temperature": 0.0, "avg_logprob": + -0.17047829463564115, "compression_ratio": 1.7276785714285714, "no_speech_prob": + 0.001621381612494588}, {"id": 416, "seek": 339864, "start": 3406.72, "end": 3411.52, + "text": " to me as they say what''s the difference between vector and neural search + and I''m like there''s", "tokens": [50768, 281, 385, 382, 436, 584, 437, 311, 264, + 2649, 1296, 8062, 293, 18161, 3164, 293, 286, 478, 411, 456, 311, 51008], "temperature": + 0.0, "avg_logprob": -0.17047829463564115, "compression_ratio": 1.7276785714285714, + "no_speech_prob": 0.001621381612494588}, {"id": 417, "seek": 339864, "start": 3411.52, + "end": 3416.08, "text": " not much difference it''s like vector is probably mathematical + stance and then you know it''s more like", "tokens": [51008, 406, 709, 2649, 309, + 311, 411, 8062, 307, 1391, 18894, 21033, 293, 550, 291, 458, 309, 311, 544, 411, + 51236], "temperature": 0.0, "avg_logprob": -0.17047829463564115, "compression_ratio": + 1.7276785714285714, "no_speech_prob": 0.001621381612494588}, {"id": 418, "seek": + 339864, "start": 3416.08, "end": 3421.6, "text": " if you''re deep learning engineer + or researcher so you like to take it from that angle right", "tokens": [51236, 498, + 291, 434, 2452, 2539, 11403, 420, 21751, 370, 291, 411, 281, 747, 309, 490, 300, + 5802, 558, 51512], "temperature": 0.0, "avg_logprob": -0.17047829463564115, "compression_ratio": + 1.7276785714285714, "no_speech_prob": 0.001621381612494588}, {"id": 419, "seek": + 342160, "start": 3422.56, "end": 3429.2, "text": " but then you put it so beautifully + like maybe if we focus too much on this technical level saying", "tokens": [50412, + 457, 550, 291, 829, 309, 370, 16525, 411, 1310, 498, 321, 1879, 886, 709, 322, 341, + 6191, 1496, 1566, 50744], "temperature": 0.0, "avg_logprob": -0.20838844208490281, + "compression_ratio": 1.7378640776699028, "no_speech_prob": 0.0037659876979887486}, + {"id": 420, "seek": 342160, "start": 3429.2, "end": 
3434.16, "text": " you know + this is vector search that is and yeah totally it''s all sexy you need to bite", + "tokens": [50744, 291, 458, 341, 307, 8062, 3164, 300, 307, 293, 1338, 3879, 309, + 311, 439, 13701, 291, 643, 281, 7988, 50992], "temperature": 0.0, "avg_logprob": + -0.20838844208490281, "compression_ratio": 1.7378640776699028, "no_speech_prob": + 0.0037659876979887486}, {"id": 421, "seek": 342160, "start": 3435.2799999999997, + "end": 3442.08, "text": " and we don''t focus on like use cases and how things enough + I think enough and how this could", "tokens": [51048, 293, 321, 500, 380, 1879, + 322, 411, 764, 3331, 293, 577, 721, 1547, 286, 519, 1547, 293, 577, 341, 727, 51388], + "temperature": 0.0, "avg_logprob": -0.20838844208490281, "compression_ratio": 1.7378640776699028, + "no_speech_prob": 0.0037659876979887486}, {"id": 422, "seek": 342160, "start": 3442.08, + "end": 3447.68, "text": " complement each other you know it''s not like vector search + is trying to kick out", "tokens": [51388, 17103, 1184, 661, 291, 458, 309, 311, + 406, 411, 8062, 3164, 307, 1382, 281, 4437, 484, 51668], "temperature": 0.0, "avg_logprob": + -0.20838844208490281, "compression_ratio": 1.7378640776699028, "no_speech_prob": + 0.0037659876979887486}, {"id": 423, "seek": 344768, "start": 3448.3199999999997, + "end": 3453.52, "text": " spar search it''s not going to happen by the way you know + the prey search is not supported in vector", "tokens": [50396, 45954, 3164, 309, + 311, 406, 516, 281, 1051, 538, 264, 636, 291, 458, 264, 21107, 3164, 307, 406, 8104, + 294, 8062, 50656], "temperature": 0.0, "avg_logprob": -0.1232144282414363, "compression_ratio": + 1.7488372093023257, "no_speech_prob": 0.005667015910148621}, {"id": 424, "seek": + 344768, "start": 3453.52, "end": 3458.48, "text": " search maybe it will be but + it''s not right now you cannot just say there''s also there''s also", "tokens": + [50656, 3164, 1310, 309, 486, 312, 457, 309, 311, 406, 558, 586, 291, 
2644, 445, + 584, 456, 311, 611, 456, 311, 611, 50904], "temperature": 0.0, "avg_logprob": -0.1232144282414363, + "compression_ratio": 1.7488372093023257, "no_speech_prob": 0.005667015910148621}, + {"id": 425, "seek": 344768, "start": 3458.48, "end": 3465.2799999999997, "text": + " a set of problems that are I mean to this day people use tree based models for + I''m going to mix", "tokens": [50904, 257, 992, 295, 2740, 300, 366, 286, 914, 281, + 341, 786, 561, 764, 4230, 2361, 5245, 337, 286, 478, 516, 281, 2890, 51244], "temperature": + 0.0, "avg_logprob": -0.1232144282414363, "compression_ratio": 1.7488372093023257, + "no_speech_prob": 0.005667015910148621}, {"id": 426, "seek": 344768, "start": 3466.08, + "end": 3472.3199999999997, "text": " some kind of similarity with some kind of statistic + about my data you know I think like", "tokens": [51284, 512, 733, 295, 32194, 365, + 512, 733, 295, 29588, 466, 452, 1412, 291, 458, 286, 519, 411, 51596], "temperature": + 0.0, "avg_logprob": -0.1232144282414363, "compression_ratio": 1.7488372093023257, + "no_speech_prob": 0.005667015910148621}, {"id": 427, "seek": 347232, "start": 3472.48, + "end": 3477.6000000000004, "text": " tabular data so to speak has consistently been + dominated by tree based models which is a", "tokens": [50372, 4421, 1040, 1412, + 370, 281, 1710, 575, 14961, 668, 23755, 538, 4230, 2361, 5245, 597, 307, 257, 50628], + "temperature": 0.0, "avg_logprob": -0.16221744749281142, "compression_ratio": 1.7120622568093384, + "no_speech_prob": 0.00047509855357930064}, {"id": 428, "seek": 347232, "start": + 3477.6000000000004, "end": 3481.6000000000004, "text": " completely different thing + from deep learning and neural search so and those things", "tokens": [50628, 2584, + 819, 551, 490, 2452, 2539, 293, 18161, 3164, 370, 293, 729, 721, 50828], "temperature": + 0.0, "avg_logprob": -0.16221744749281142, "compression_ratio": 1.7120622568093384, + "no_speech_prob": 0.00047509855357930064}, {"id": 429, 
"seek": 347232, "start": + 3482.32, "end": 3486.0800000000004, "text": " integrate pretty well with like the + learning to rank plugins in solar and elastic search", "tokens": [50864, 13365, + 1238, 731, 365, 411, 264, 2539, 281, 6181, 33759, 294, 7936, 293, 17115, 3164, 51052], + "temperature": 0.0, "avg_logprob": -0.16221744749281142, "compression_ratio": 1.7120622568093384, + "no_speech_prob": 0.00047509855357930064}, {"id": 430, "seek": 347232, "start": + 3488.1600000000003, "end": 3492.96, "text": " where you can plug in a vector similarity + into that kind of tree based model", "tokens": [51156, 689, 291, 393, 5452, 294, + 257, 8062, 32194, 666, 300, 733, 295, 4230, 2361, 2316, 51396], "temperature": 0.0, + "avg_logprob": -0.16221744749281142, "compression_ratio": 1.7120622568093384, "no_speech_prob": + 0.00047509855357930064}, {"id": 431, "seek": 347232, "start": 3494.7200000000003, + "end": 3501.6800000000003, "text": " but the opposite isn''t necessarily true this + is very interesting so diagram like we spoke a lot about", "tokens": [51484, 457, + 264, 6182, 1943, 380, 4725, 2074, 341, 307, 588, 1880, 370, 10686, 411, 321, 7179, + 257, 688, 466, 51832], "temperature": 0.0, "avg_logprob": -0.16221744749281142, + "compression_ratio": 1.7120622568093384, "no_speech_prob": 0.00047509855357930064}, + {"id": 432, "seek": 350232, "start": 3502.32, "end": 3508.32, "text": " and I''m + sure we could speak more about how to engineer a search engine you know let''s say + if", "tokens": [50364, 293, 286, 478, 988, 321, 727, 1710, 544, 466, 577, 281, 11403, + 257, 3164, 2848, 291, 458, 718, 311, 584, 498, 50664], "temperature": 0.0, "avg_logprob": + -0.15933424927467524, "compression_ratio": 1.7465437788018434, "no_speech_prob": + 0.000993800931610167}, {"id": 433, "seek": 350232, "start": 3508.32, "end": 3513.44, + "text": " you start up you don''t have clicks right you you don''t have feedback + from your users maybe", "tokens": [50664, 291, 722, 493, 291, 500, 380, 
362, 18521, + 558, 291, 291, 500, 380, 362, 5824, 490, 428, 5022, 1310, 50920], "temperature": + 0.0, "avg_logprob": -0.15933424927467524, "compression_ratio": 1.7465437788018434, + "no_speech_prob": 0.000993800931610167}, {"id": 434, "seek": 350232, "start": 3513.44, + "end": 3521.6000000000004, "text": " necessarily in that form you can engineer now + you have a dense search you can still engineer by", "tokens": [50920, 4725, 294, + 300, 1254, 291, 393, 11403, 586, 291, 362, 257, 18011, 3164, 291, 393, 920, 11403, + 538, 51328], "temperature": 0.0, "avg_logprob": -0.15933424927467524, "compression_ratio": + 1.7465437788018434, "no_speech_prob": 0.000993800931610167}, {"id": 435, "seek": + 350232, "start": 3521.6000000000004, "end": 3529.52, "text": " tweaking analysis + chains and crafting scene dictionaries but once you are over that launch you know", + "tokens": [51328, 6986, 2456, 5215, 12626, 293, 29048, 4145, 22352, 4889, 457, 1564, + 291, 366, 670, 300, 4025, 291, 458, 51724], "temperature": 0.0, "avg_logprob": -0.15933424927467524, + "compression_ratio": 1.7465437788018434, "no_speech_prob": 0.000993800931610167}, + {"id": 436, "seek": 352952, "start": 3529.52, "end": 3536.08, "text": " and you + gather that data natureal move is to start looking into something like learning + to rank", "tokens": [50364, 293, 291, 5448, 300, 1412, 3687, 304, 1286, 307, 281, + 722, 1237, 666, 746, 411, 2539, 281, 6181, 50692], "temperature": 0.0, "avg_logprob": + -0.1953150524812586, "compression_ratio": 1.5923913043478262, "no_speech_prob": + 0.001700830296613276}, {"id": 437, "seek": 352952, "start": 3536.96, "end": 3545.68, + "text": " and you you spoke a lot about that I was just quickly googling you know + you you spoke at I believe", "tokens": [50736, 293, 291, 291, 7179, 257, 688, 466, + 300, 286, 390, 445, 2661, 50061, 1688, 291, 458, 291, 291, 7179, 412, 286, 1697, + 51172], "temperature": 0.0, "avg_logprob": -0.1953150524812586, "compression_ratio": + 
1.5923913043478262, "no_speech_prob": 0.001700830296613276}, {"id": 438, "seek": + 352952, "start": 3547.52, "end": 3554.32, "text": " reading buzzwords haystack you + spoke somewhere in San Francisco Bay area like how to turn you know", "tokens": + [51264, 3760, 13036, 13832, 4842, 372, 501, 291, 7179, 4079, 294, 5271, 12279, 7840, + 1859, 411, 577, 281, 1261, 291, 458, 51604], "temperature": 0.0, "avg_logprob": + -0.1953150524812586, "compression_ratio": 1.5923913043478262, "no_speech_prob": + 0.001700830296613276}, {"id": 439, "seek": 355432, "start": 3554.32, "end": 3559.6000000000004, + "text": " ranking into a ML problem machine learning problem we also spoke about + Bayezian", "tokens": [50364, 17833, 666, 257, 21601, 1154, 3479, 2539, 1154, 321, + 611, 7179, 466, 7840, 4371, 952, 50628], "temperature": 0.0, "avg_logprob": -0.2161091075224035, + "compression_ratio": 1.6607142857142858, "no_speech_prob": 0.002461459720507264}, + {"id": 440, "seek": 355432, "start": 3561.1200000000003, "end": 3567.2000000000003, + "text": " well yeah and then there is also I''ve recently learned well not that + recently I think it was last", "tokens": [50704, 731, 1338, 293, 550, 456, 307, + 611, 286, 600, 3938, 3264, 731, 406, 300, 3938, 286, 519, 309, 390, 1036, 51008], + "temperature": 0.0, "avg_logprob": -0.2161091075224035, "compression_ratio": 1.6607142857142858, + "no_speech_prob": 0.002461459720507264}, {"id": 441, "seek": 355432, "start": 3567.6800000000003, + "end": 3576.32, "text": " haystack or maybe the previous before that learning to + boost how do these methods come together", "tokens": [51032, 4842, 372, 501, 420, + 1310, 264, 3894, 949, 300, 2539, 281, 9194, 577, 360, 613, 7150, 808, 1214, 51464], + "temperature": 0.0, "avg_logprob": -0.2161091075224035, "compression_ratio": 1.6607142857142858, + "no_speech_prob": 0.002461459720507264}, {"id": 442, "seek": 355432, "start": 3576.32, + "end": 3582.88, "text": " where do you start for those who maybe haven''t 
have only + heard about it but they didn''t try it yet", "tokens": [51464, 689, 360, 291, 722, + 337, 729, 567, 1310, 2378, 380, 362, 787, 2198, 466, 309, 457, 436, 994, 380, 853, + 309, 1939, 51792], "temperature": 0.0, "avg_logprob": -0.2161091075224035, "compression_ratio": + 1.6607142857142858, "no_speech_prob": 0.002461459720507264}, {"id": 443, "seek": + 358288, "start": 3583.6, "end": 3590.0, "text": " and do you also think maybe a + connected question do you think that we will marry you know", "tokens": [50400, + 293, 360, 291, 611, 519, 1310, 257, 4582, 1168, 360, 291, 519, 300, 321, 486, 9747, + 291, 458, 50720], "temperature": 0.0, "avg_logprob": -0.10770666777197994, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.00396511796861887}, {"id": 444, "seek": + 358288, "start": 3590.0, "end": 3598.48, "text": " the dense retrieval signals with + learning to rank in some way does it make sense yeah yeah so", "tokens": [50720, + 264, 18011, 19817, 3337, 12354, 365, 2539, 281, 6181, 294, 512, 636, 775, 309, 652, + 2020, 1338, 1338, 370, 51144], "temperature": 0.0, "avg_logprob": -0.10770666777197994, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.00396511796861887}, + {"id": 445, "seek": 358288, "start": 3600.0, "end": 3608.08, "text": " yeah so I + think a lot of companies they think it''s easy to go into the learning to rank process + thinking", "tokens": [51220, 1338, 370, 286, 519, 257, 688, 295, 3431, 436, 519, + 309, 311, 1858, 281, 352, 666, 264, 2539, 281, 6181, 1399, 1953, 51624], "temperature": + 0.0, "avg_logprob": -0.10770666777197994, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.00396511796861887}, {"id": 446, "seek": 360808, "start": 3608.08, + "end": 3617.44, "text": " that I''m gonna this is the you know knock on wood it''s + easy I have a model I''m gonna train this model", "tokens": [50364, 300, 286, 478, + 799, 341, 307, 264, 291, 458, 6728, 322, 4576, 309, 311, 1858, 286, 362, 257, 2316, + 286, 
478, 799, 3847, 341, 2316, 50832], "temperature": 0.0, "avg_logprob": -0.14794210208359584, + "compression_ratio": 1.7253218884120172, "no_speech_prob": 0.0006000038702040911}, + {"id": 447, "seek": 360808, "start": 3617.44, "end": 3623.12, "text": " with my + clicks and everything and and search will magically get better what''s interesting + is if you go", "tokens": [50832, 365, 452, 18521, 293, 1203, 293, 293, 3164, 486, + 39763, 483, 1101, 437, 311, 1880, 307, 498, 291, 352, 51116], "temperature": 0.0, + "avg_logprob": -0.14794210208359584, "compression_ratio": 1.7253218884120172, "no_speech_prob": + 0.0006000038702040911}, {"id": 448, "seek": 360808, "start": 3623.12, "end": 3629.92, + "text": " back to haystack talks about learning to rank and other comforts talks + where the number one place", "tokens": [51116, 646, 281, 4842, 372, 501, 6686, 466, + 2539, 281, 6181, 293, 661, 3400, 82, 6686, 689, 264, 1230, 472, 1081, 51456], "temperature": + 0.0, "avg_logprob": -0.14794210208359584, "compression_ratio": 1.7253218884120172, + "no_speech_prob": 0.0006000038702040911}, {"id": 449, "seek": 360808, "start": 3629.92, + "end": 3636.24, "text": " people get stuck and they spend their energy is on the + training data less so the model the features", "tokens": [51456, 561, 483, 5541, + 293, 436, 3496, 641, 2281, 307, 322, 264, 3097, 1412, 1570, 370, 264, 2316, 264, + 4122, 51772], "temperature": 0.0, "avg_logprob": -0.14794210208359584, "compression_ratio": + 1.7253218884120172, "no_speech_prob": 0.0006000038702040911}, {"id": 450, "seek": + 363624, "start": 3636.24, "end": 3643.52, "text": " and all of that stuff and if + you think about it it makes sense because one of the biggest problems", "tokens": + [50364, 293, 439, 295, 300, 1507, 293, 498, 291, 519, 466, 309, 309, 1669, 2020, + 570, 472, 295, 264, 3880, 2740, 50728], "temperature": 0.0, "avg_logprob": -0.08589539034613247, + "compression_ratio": 1.8036529680365296, "no_speech_prob": 0.0004960038932040334}, + 
{"id": 451, "seek": 363624, "start": 3643.52, "end": 3650.08, "text": " with sort + of training data is it''s just horribly biased towards whatever the old algorithm + is you''re", "tokens": [50728, 365, 1333, 295, 3097, 1412, 307, 309, 311, 445, 45028, + 28035, 3030, 2035, 264, 1331, 9284, 307, 291, 434, 51056], "temperature": 0.0, "avg_logprob": + -0.08589539034613247, "compression_ratio": 1.8036529680365296, "no_speech_prob": + 0.0004960038932040334}, {"id": 452, "seek": 363624, "start": 3650.08, "end": 3654.7999999999997, + "text": " always clicking on you people are always clicking on what the old algorithm + showed them regardless of", "tokens": [51056, 1009, 9697, 322, 291, 561, 366, 1009, + 9697, 322, 437, 264, 1331, 9284, 4712, 552, 10060, 295, 51292], "temperature": 0.0, + "avg_logprob": -0.08589539034613247, "compression_ratio": 1.8036529680365296, "no_speech_prob": + 0.0004960038932040334}, {"id": 453, "seek": 363624, "start": 3655.68, "end": 3661.2799999999997, + "text": " you know if it was good or not there are it''s getting some clicks and + the stuff that might be", "tokens": [51336, 291, 458, 498, 309, 390, 665, 420, 406, + 456, 366, 309, 311, 1242, 512, 18521, 293, 264, 1507, 300, 1062, 312, 51616], "temperature": + 0.0, "avg_logprob": -0.08589539034613247, "compression_ratio": 1.8036529680365296, + "no_speech_prob": 0.0004960038932040334}, {"id": 454, "seek": 366128, "start": 3661.28, + "end": 3668.8, "text": " amazing but it''s on the 10th page is never getting clicks + so how do you optimize search in that", "tokens": [50364, 2243, 457, 309, 311, 322, + 264, 1266, 392, 3028, 307, 1128, 1242, 18521, 370, 577, 360, 291, 19719, 3164, 294, + 300, 50740], "temperature": 0.0, "avg_logprob": -0.08895704191024989, "compression_ratio": + 1.5978260869565217, "no_speech_prob": 0.0003550946421455592}, {"id": 455, "seek": + 366128, "start": 3668.8, "end": 3676.5600000000004, "text": " context and it sort + of doesn''t matter what model you use or what 
feature you use until you get like", + "tokens": [50740, 4319, 293, 309, 1333, 295, 1177, 380, 1871, 437, 2316, 291, 764, + 420, 437, 4111, 291, 764, 1826, 291, 483, 411, 51128], "temperature": 0.0, "avg_logprob": + -0.08895704191024989, "compression_ratio": 1.5978260869565217, "no_speech_prob": + 0.0003550946421455592}, {"id": 456, "seek": 366128, "start": 3677.2000000000003, + "end": 3685.1200000000003, "text": " really well-labeled training data you can''t + really make much progress um so what you can do to get", "tokens": [51160, 534, + 731, 12, 75, 18657, 292, 3097, 1412, 291, 393, 380, 534, 652, 709, 4205, 1105, 370, + 437, 291, 393, 360, 281, 483, 51556], "temperature": 0.0, "avg_logprob": -0.08895704191024989, + "compression_ratio": 1.5978260869565217, "no_speech_prob": 0.0003550946421455592}, + {"id": 457, "seek": 368512, "start": 3685.12, "end": 3691.52, "text": " started + on it is at a minimum okay we know that it''s the training data is not perfect but + if we", "tokens": [50364, 1409, 322, 309, 307, 412, 257, 7285, 1392, 321, 458, 300, + 309, 311, 264, 3097, 1412, 307, 406, 2176, 457, 498, 321, 50684], "temperature": + 0.0, "avg_logprob": -0.08400191917075767, "compression_ratio": 1.8774703557312253, + "no_speech_prob": 0.00032635184470564127}, {"id": 458, "seek": 368512, "start": + 3691.52, "end": 3697.52, "text": " just look at like the top end the top 10 or so + what''s actually getting clicks we might be able to", "tokens": [50684, 445, 574, + 412, 411, 264, 1192, 917, 264, 1192, 1266, 420, 370, 437, 311, 767, 1242, 18521, + 321, 1062, 312, 1075, 281, 50984], "temperature": 0.0, "avg_logprob": -0.08400191917075767, + "compression_ratio": 1.8774703557312253, "no_speech_prob": 0.00032635184470564127}, + {"id": 459, "seek": 368512, "start": 3697.52, "end": 3702.3199999999997, "text": + " start to learn some stuff there about like what differentiates them so you might + start to think you", "tokens": [50984, 722, 281, 1466, 512, 1507, 456, 466, 411, 
+ 437, 27372, 1024, 552, 370, 291, 1062, 722, 281, 519, 291, 51224], "temperature": + 0.0, "avg_logprob": -0.08400191917075767, "compression_ratio": 1.8774703557312253, + "no_speech_prob": 0.00032635184470564127}, {"id": 460, "seek": 368512, "start": + 3702.3199999999997, "end": 3707.68, "text": " could think about this is um the learning + to rank is there are many ways of learning to rank but", "tokens": [51224, 727, + 519, 466, 341, 307, 1105, 264, 2539, 281, 6181, 307, 456, 366, 867, 2098, 295, 2539, + 281, 6181, 457, 51492], "temperature": 0.0, "avg_logprob": -0.08400191917075767, + "compression_ratio": 1.8774703557312253, "no_speech_prob": 0.00032635184470564127}, + {"id": 461, "seek": 368512, "start": 3707.68, "end": 3711.68, "text": " if we just + start to think about it as a classification problem within the context of", "tokens": + [51492, 498, 321, 445, 722, 281, 519, 466, 309, 382, 257, 21538, 1154, 1951, 264, + 4319, 295, 51692], "temperature": 0.0, "avg_logprob": -0.08400191917075767, "compression_ratio": + 1.8774703557312253, "no_speech_prob": 0.00032635184470564127}, {"id": 462, "seek": + 371168, "start": 3712.0, "end": 3719.2799999999997, "text": " these uh these results + are that we do have click data on what''s differentiating them like getting", "tokens": + [50380, 613, 2232, 613, 3542, 366, 300, 321, 360, 362, 2052, 1412, 322, 437, 311, + 27372, 990, 552, 411, 1242, 50744], "temperature": 0.0, "avg_logprob": -0.11180296609568041, + "compression_ratio": 1.8592233009708738, "no_speech_prob": 0.00026017456548288465}, + {"id": 463, "seek": 371168, "start": 3719.2799999999997, "end": 3725.9199999999996, + "text": " being seen with a lot of impressions and no clicks and lots of things + and things with lots of clicks", "tokens": [50744, 885, 1612, 365, 257, 688, 295, + 24245, 293, 572, 18521, 293, 3195, 295, 721, 293, 721, 365, 3195, 295, 18521, 51076], + "temperature": 0.0, "avg_logprob": -0.11180296609568041, "compression_ratio": 
1.8592233009708738, + "no_speech_prob": 0.00026017456548288465}, {"id": 464, "seek": 371168, "start": + 3725.9199999999996, "end": 3732.48, "text": " and you start to see the features + that separate those um and then you sort of like know at least", "tokens": [51076, + 293, 291, 722, 281, 536, 264, 4122, 300, 4994, 729, 1105, 293, 550, 291, 1333, 295, + 411, 458, 412, 1935, 51404], "temperature": 0.0, "avg_logprob": -0.11180296609568041, + "compression_ratio": 1.8592233009708738, "no_speech_prob": 0.00026017456548288465}, + {"id": 465, "seek": 371168, "start": 3732.48, "end": 3738.48, "text": " you''re + knowing like within the context of your search filter bubble what''s sort of like", + "tokens": [51404, 291, 434, 5276, 411, 1951, 264, 4319, 295, 428, 3164, 6608, 12212, + 437, 311, 1333, 295, 411, 51704], "temperature": 0.0, "avg_logprob": -0.11180296609568041, + "compression_ratio": 1.8592233009708738, "no_speech_prob": 0.00026017456548288465}, + {"id": 466, "seek": 373848, "start": 3738.48, "end": 3744.96, "text": " differentiating + relevant irrelevant and you can kind of use that to rank um but at some point", + "tokens": [50364, 27372, 990, 7340, 28682, 293, 291, 393, 733, 295, 764, 300, 281, + 6181, 1105, 457, 412, 512, 935, 50688], "temperature": 0.0, "avg_logprob": -0.07251827015596278, + "compression_ratio": 1.8, "no_speech_prob": 0.00038060080260038376}, {"id": 467, + "seek": 373848, "start": 3744.96, "end": 3749.52, "text": " you do need to realize + that like I am working within this filter bubble with my original algorithm", "tokens": + [50688, 291, 360, 643, 281, 4325, 300, 411, 286, 669, 1364, 1951, 341, 6608, 12212, + 365, 452, 3380, 9284, 50916], "temperature": 0.0, "avg_logprob": -0.07251827015596278, + "compression_ratio": 1.8, "no_speech_prob": 0.00038060080260038376}, {"id": 468, + "seek": 373848, "start": 3749.52, "end": 3754.16, "text": " and all I''m doing is + sort of tweaking up a few things tweaking down a few things how do I", 
"tokens": + [50916, 293, 439, 286, 478, 884, 307, 1333, 295, 6986, 2456, 493, 257, 1326, 721, + 6986, 2456, 760, 257, 1326, 721, 577, 360, 286, 51148], "temperature": 0.0, "avg_logprob": + -0.07251827015596278, "compression_ratio": 1.8, "no_speech_prob": 0.00038060080260038376}, + {"id": 469, "seek": 373848, "start": 3754.16, "end": 3761.76, "text": " bust that + filter bubble and get different kinds of potential relevant results in front of + users", "tokens": [51148, 19432, 300, 6608, 12212, 293, 483, 819, 3685, 295, 3995, + 7340, 3542, 294, 1868, 295, 5022, 51528], "temperature": 0.0, "avg_logprob": -0.07251827015596278, + "compression_ratio": 1.8, "no_speech_prob": 0.00038060080260038376}, {"id": 470, + "seek": 376176, "start": 3761.84, "end": 3769.1200000000003, "text": " to sort of + like see whether or not they''ll click or not um and that''s that''s really like + sets you", "tokens": [50368, 281, 1333, 295, 411, 536, 1968, 420, 406, 436, 603, + 2052, 420, 406, 1105, 293, 300, 311, 300, 311, 534, 411, 6352, 291, 50732], "temperature": + 0.0, "avg_logprob": -0.15551205091578987, "compression_ratio": 1.6595744680851063, + "no_speech_prob": 0.004717727191746235}, {"id": 471, "seek": 376176, "start": 3769.1200000000003, + "end": 3775.92, "text": " know that''s really probably the big big challenge that + people have with I mean honestly not just", "tokens": [50732, 458, 300, 311, 534, + 1391, 264, 955, 955, 3430, 300, 561, 362, 365, 286, 914, 6095, 406, 445, 51072], + "temperature": 0.0, "avg_logprob": -0.15551205091578987, "compression_ratio": 1.6595744680851063, + "no_speech_prob": 0.004717727191746235}, {"id": 472, "seek": 376176, "start": 3775.92, + "end": 3782.7200000000003, "text": " learning to rank but any algorithmic search + works they''re doing yeah absolutely I mean I when I", "tokens": [51072, 2539, 281, + 6181, 457, 604, 9284, 299, 3164, 1985, 436, 434, 884, 1338, 3122, 286, 914, 286, + 562, 286, 51412], "temperature": 0.0, "avg_logprob": 
-0.15551205091578987, "compression_ratio": + 1.6595744680851063, "no_speech_prob": 0.004717727191746235}, {"id": 473, "seek": + 376176, "start": 3782.7200000000003, "end": 3789.44, "text": " was doing it using + your hello LTR Rappl I think I focused a lot on the mechanical aspects like okay", + "tokens": [51412, 390, 884, 309, 1228, 428, 7751, 441, 25936, 497, 1746, 75, 286, + 519, 286, 5178, 257, 688, 322, 264, 12070, 7270, 411, 1392, 51748], "temperature": + 0.0, "avg_logprob": -0.15551205091578987, "compression_ratio": 1.6595744680851063, + "no_speech_prob": 0.004717727191746235}, {"id": 474, "seek": 378944, "start": 3789.52, + "end": 3795.44, "text": " what is pairwise what is list wise I need to read london + mark papers to understand get into the", "tokens": [50368, 437, 307, 6119, 3711, + 437, 307, 1329, 10829, 286, 643, 281, 1401, 287, 684, 266, 1491, 10577, 281, 1223, + 483, 666, 264, 50664], "temperature": 0.0, "avg_logprob": -0.31406394471513466, + "compression_ratio": 1.6981981981981982, "no_speech_prob": 0.002289897296577692}, + {"id": 475, "seek": 378944, "start": 3795.44, "end": 3801.68, "text": " width of + the algos but then I think I spent maybe too little time figuring out the data part", + "tokens": [50664, 11402, 295, 264, 419, 18674, 457, 550, 286, 519, 286, 4418, 1310, + 886, 707, 565, 15213, 484, 264, 1412, 644, 50976], "temperature": 0.0, "avg_logprob": + -0.31406394471513466, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 0.002289897296577692}, {"id": 476, "seek": 378944, "start": 3801.68, "end": 3808.7200000000003, + "text": " and like head versus tail I think grunting your show wants to say torso + as well and I''m like", "tokens": [50976, 293, 411, 1378, 5717, 6838, 286, 519, + 677, 14559, 428, 855, 2738, 281, 584, 34917, 382, 731, 293, 286, 478, 411, 51328], + "temperature": 0.0, "avg_logprob": -0.31406394471513466, "compression_ratio": 1.6981981981981982, + "no_speech_prob": 0.002289897296577692}, {"id": 477, "seek": 
378944, "start": 3808.7200000000003, + "end": 3815.68, "text": " a torso yeah what''s that so like do you have any advice + for those who are starting like do they", "tokens": [51328, 257, 34917, 1338, 437, + 311, 300, 370, 411, 360, 291, 362, 604, 5192, 337, 729, 567, 366, 2891, 411, 360, + 436, 51676], "temperature": 0.0, "avg_logprob": -0.31406394471513466, "compression_ratio": + 1.6981981981981982, "no_speech_prob": 0.002289897296577692}, {"id": 478, "seek": + 381568, "start": 3815.68, "end": 3821.2799999999997, "text": " just need to be born + data scientists or do you think that they''re really like that to set yeah it", + "tokens": [50364, 445, 643, 281, 312, 4232, 1412, 7708, 420, 360, 291, 519, 300, + 436, 434, 534, 411, 300, 281, 992, 1338, 309, 50644], "temperature": 0.0, "avg_logprob": + -0.2921692616230733, "compression_ratio": 1.6428571428571428, "no_speech_prob": + 0.002574544632807374}, {"id": 479, "seek": 381568, "start": 3821.2799999999997, + "end": 3829.52, "text": " helps I guess for those like you once but like tool set + or methodology or a book or whatever like what", "tokens": [50644, 3665, 286, 2041, + 337, 729, 411, 291, 1564, 457, 411, 2290, 992, 420, 24850, 420, 257, 1446, 420, + 2035, 411, 437, 51056], "temperature": 0.0, "avg_logprob": -0.2921692616230733, + "compression_ratio": 1.6428571428571428, "no_speech_prob": 0.002574544632807374}, + {"id": 480, "seek": 381568, "start": 3830.16, "end": 3838.3999999999996, "text": + " yeah yeah so um I this is like a this is a big focus of AI powered search a book + I help write with", "tokens": [51088, 1338, 1338, 370, 1105, 286, 341, 307, 411, + 257, 341, 307, 257, 955, 1879, 295, 7318, 17786, 3164, 257, 1446, 286, 854, 2464, + 365, 51500], "temperature": 0.0, "avg_logprob": -0.2921692616230733, "compression_ratio": + 1.6428571428571428, "no_speech_prob": 0.002574544632807374}, {"id": 481, "seek": + 383840, "start": 3838.4, "end": 3844.4, "text": " trig ranger and max or whim and + then ML 
powered search was just the training I''m doing", "tokens": [50364, 35386, + 367, 3176, 293, 11469, 420, 47271, 293, 550, 21601, 17786, 3164, 390, 445, 264, + 3097, 286, 478, 884, 50664], "temperature": 0.0, "avg_logprob": -0.1789804458618164, + "compression_ratio": 1.6816143497757847, "no_speech_prob": 0.00022436985454987735}, + {"id": 482, "seek": 383840, "start": 3846.08, "end": 3853.6800000000003, "text": + " because I I think I think like a lot of the focus these days is on cool things + like dense vector", "tokens": [50748, 570, 286, 286, 519, 286, 519, 411, 257, 688, + 295, 264, 1879, 613, 1708, 307, 322, 1627, 721, 411, 18011, 8062, 51128], "temperature": + 0.0, "avg_logprob": -0.1789804458618164, "compression_ratio": 1.6816143497757847, + "no_speech_prob": 0.00022436985454987735}, {"id": 483, "seek": 383840, "start": + 3853.6800000000003, "end": 3859.6800000000003, "text": " retrieval bird and these + kinds of things and to me that''s like taking out an old model and", "tokens": [51128, + 19817, 3337, 5255, 293, 613, 3685, 295, 721, 293, 281, 385, 300, 311, 411, 1940, + 484, 364, 1331, 2316, 293, 51428], "temperature": 0.0, "avg_logprob": -0.1789804458618164, + "compression_ratio": 1.6816143497757847, "no_speech_prob": 0.00022436985454987735}, + {"id": 484, "seek": 383840, "start": 3859.6800000000003, "end": 3864.64, "text": + " putting in a new model but the problems outside of it to get the training data + are still really hard", "tokens": [51428, 3372, 294, 257, 777, 2316, 457, 264, 2740, + 2380, 295, 309, 281, 483, 264, 3097, 1412, 366, 920, 534, 1152, 51676], "temperature": + 0.0, "avg_logprob": -0.1789804458618164, "compression_ratio": 1.6816143497757847, + "no_speech_prob": 0.00022436985454987735}, {"id": 485, "seek": 386464, "start": + 3865.44, "end": 3869.8399999999997, "text": " and there are a lot of techniques + people can use and I think sort of the thing people aren''t", "tokens": [50404, + 293, 456, 366, 257, 688, 295, 7512, 561, 393, 764, 
293, 286, 519, 1333, 295, 264, + 551, 561, 3212, 380, 50624], "temperature": 0.0, "avg_logprob": -0.07288788541962829, + "compression_ratio": 1.6527777777777777, "no_speech_prob": 0.000485570722958073}, + {"id": 486, "seek": 386464, "start": 3869.8399999999997, "end": 3877.12, "text": + " talking about enough in in the space is active and reinforcement learning and + that''s what I talk", "tokens": [50624, 1417, 466, 1547, 294, 294, 264, 1901, 307, + 4967, 293, 29280, 2539, 293, 300, 311, 437, 286, 751, 50988], "temperature": 0.0, + "avg_logprob": -0.07288788541962829, "compression_ratio": 1.6527777777777777, "no_speech_prob": + 0.000485570722958073}, {"id": 487, "seek": 386464, "start": 3877.12, "end": 3886.3199999999997, + "text": " about a lot in my book and my training is this idea of you know how do + we strategically explore", "tokens": [50988, 466, 257, 688, 294, 452, 1446, 293, + 452, 3097, 307, 341, 1558, 295, 291, 458, 577, 360, 321, 38061, 6839, 51448], "temperature": + 0.0, "avg_logprob": -0.07288788541962829, "compression_ratio": 1.6527777777777777, + "no_speech_prob": 0.000485570722958073}, {"id": 488, "seek": 386464, "start": 3887.2, + "end": 3892.16, "text": " new potential relevant search results for a query but + still maintaining", "tokens": [51492, 777, 3995, 7340, 3164, 3542, 337, 257, 14581, + 457, 920, 14916, 51740], "temperature": 0.0, "avg_logprob": -0.07288788541962829, + "compression_ratio": 1.6527777777777777, "no_speech_prob": 0.000485570722958073}, + {"id": 489, "seek": 389216, "start": 3892.3199999999997, "end": 3897.8399999999997, + "text": " exploiting the knowledge we do have because we don''t want to completely + just show people random results", "tokens": [50372, 12382, 1748, 264, 3601, 321, + 360, 362, 570, 321, 500, 380, 528, 281, 2584, 445, 855, 561, 4974, 3542, 50648], + "temperature": 0.0, "avg_logprob": -0.11524378925288489, "compression_ratio": 1.7874015748031495, + "no_speech_prob": 0.0006173097062855959}, {"id": 490, 
"seek": 389216, "start": 3898.7999999999997, + "end": 3903.2799999999997, "text": " and how do we play with that boundary a little + bit in a strategic way and not just like", "tokens": [50696, 293, 577, 360, 321, + 862, 365, 300, 12866, 257, 707, 857, 294, 257, 10924, 636, 293, 406, 445, 411, 50920], + "temperature": 0.0, "avg_logprob": -0.11524378925288489, "compression_ratio": 1.7874015748031495, + "no_speech_prob": 0.0006173097062855959}, {"id": 491, "seek": 389216, "start": 3903.2799999999997, + "end": 3909.6, "text": " here''s a bunch of results oh it hears like a random one + um and there are processes out there", "tokens": [50920, 510, 311, 257, 3840, 295, + 3542, 1954, 309, 25688, 411, 257, 4974, 472, 1105, 293, 456, 366, 7555, 484, 456, + 51236], "temperature": 0.0, "avg_logprob": -0.11524378925288489, "compression_ratio": + 1.7874015748031495, "no_speech_prob": 0.0006173097062855959}, {"id": 492, "seek": + 389216, "start": 3909.6, "end": 3915.8399999999997, "text": " to do that and out + you know one uh one that''s very near and dear to my heart which is a very", "tokens": + [51236, 281, 360, 300, 293, 484, 291, 458, 472, 2232, 472, 300, 311, 588, 2651, + 293, 6875, 281, 452, 1917, 597, 307, 257, 588, 51548], "temperature": 0.0, "avg_logprob": + -0.11524378925288489, "compression_ratio": 1.7874015748031495, "no_speech_prob": + 0.0006173097062855959}, {"id": 493, "seek": 389216, "start": 3915.8399999999997, + "end": 3920.7999999999997, "text": " practical thing for people to learn about is + what''s called a galsian process", "tokens": [51548, 8496, 551, 337, 561, 281, 1466, + 466, 307, 437, 311, 1219, 257, 290, 1124, 952, 1399, 51796], "temperature": 0.0, + "avg_logprob": -0.11524378925288489, "compression_ratio": 1.7874015748031495, "no_speech_prob": + 0.0006173097062855959}, {"id": 494, "seek": 392080, "start": 3921.76, "end": 3927.52, + "text": " and a galsian process is just a different kind of regression so it''s + the same way", "tokens": [50412, 
293, 257, 290, 1124, 952, 1399, 307, 445, 257, + 819, 733, 295, 24590, 370, 309, 311, 264, 912, 636, 50700], "temperature": 0.0, + "avg_logprob": -0.09782828603472028, "compression_ratio": 1.7014218009478672, "no_speech_prob": + 0.00040527465171180665}, {"id": 495, "seek": 392080, "start": 3928.0800000000004, + "end": 3934.0800000000004, "text": " we''re learning uh we''re basically learning + to rank we''re learning like given a bunch of features", "tokens": [50728, 321, + 434, 2539, 2232, 321, 434, 1936, 2539, 281, 6181, 321, 434, 2539, 411, 2212, 257, + 3840, 295, 4122, 51028], "temperature": 0.0, "avg_logprob": -0.09782828603472028, + "compression_ratio": 1.7014218009478672, "no_speech_prob": 0.00040527465171180665}, + {"id": 496, "seek": 392080, "start": 3934.0800000000004, "end": 3940.1600000000003, + "text": " like the title bm25 or maybe some dense vector similarity or some other + you know", "tokens": [51028, 411, 264, 4876, 272, 76, 6074, 420, 1310, 512, 18011, + 8062, 32194, 420, 512, 661, 291, 458, 51332], "temperature": 0.0, "avg_logprob": + -0.09782828603472028, "compression_ratio": 1.7014218009478672, "no_speech_prob": + 0.00040527465171180665}, {"id": 497, "seek": 392080, "start": 3940.1600000000003, + "end": 3946.8, "text": " the popularity of the document we''re still learning from + our data what is you know what function of", "tokens": [51332, 264, 19301, 295, + 264, 4166, 321, 434, 920, 2539, 490, 527, 1412, 437, 307, 291, 458, 437, 2445, 295, + 51664], "temperature": 0.0, "avg_logprob": -0.09782828603472028, "compression_ratio": + 1.7014218009478672, "no_speech_prob": 0.00040527465171180665}, {"id": 498, "seek": + 394680, "start": 3946.8, "end": 3951.84, "text": " that and probably predicts relevance + and what doesn''t but what''s interesting about a galsian", "tokens": [50364, 300, + 293, 1391, 6069, 82, 32684, 293, 437, 1177, 380, 457, 437, 311, 1880, 466, 257, + 290, 1124, 952, 50616], "temperature": 0.0, "avg_logprob": 
-0.1018663929627005, + "compression_ratio": 1.912, "no_speech_prob": 0.0002328649425180629}, {"id": 499, + "seek": 394680, "start": 3951.84, "end": 3959.28, "text": " process is that any + given point it knows how certain it is in the prediction so like obviously", "tokens": + [50616, 1399, 307, 300, 604, 2212, 935, 309, 3255, 577, 1629, 309, 307, 294, 264, + 17630, 370, 411, 2745, 50988], "temperature": 0.0, "avg_logprob": -0.1018663929627005, + "compression_ratio": 1.912, "no_speech_prob": 0.0002328649425180629}, {"id": 500, + "seek": 394680, "start": 3959.92, "end": 3964.6400000000003, "text": " points that + are in your training set it''s going to have high certainty that the the variants + the", "tokens": [51020, 2793, 300, 366, 294, 428, 3097, 992, 309, 311, 516, 281, + 362, 1090, 27022, 300, 264, 264, 21669, 264, 51256], "temperature": 0.0, "avg_logprob": + -0.1018663929627005, "compression_ratio": 1.912, "no_speech_prob": 0.0002328649425180629}, + {"id": 501, "seek": 394680, "start": 3964.6400000000003, "end": 3970.2400000000002, + "text": " sort of like the gout that''s where the galsian comes in the galsian um + distribution at that point", "tokens": [51256, 1333, 295, 411, 264, 290, 346, 300, + 311, 689, 264, 290, 1124, 952, 1487, 294, 264, 290, 1124, 952, 1105, 7316, 412, + 300, 935, 51536], "temperature": 0.0, "avg_logprob": -0.1018663929627005, "compression_ratio": + 1.912, "no_speech_prob": 0.0002328649425180629}, {"id": 502, "seek": 394680, "start": + 3970.2400000000002, "end": 3975.6000000000004, "text": " is very small it''s very + certain when you go a little bit farther out from a point that you have", "tokens": + [51536, 307, 588, 1359, 309, 311, 588, 1629, 562, 291, 352, 257, 707, 857, 20344, + 484, 490, 257, 935, 300, 291, 362, 51804], "temperature": 0.0, "avg_logprob": -0.1018663929627005, + "compression_ratio": 1.912, "no_speech_prob": 0.0002328649425180629}, {"id": 503, + "seek": 397560, "start": 3975.6, "end": 3983.2799999999997, "text": 
" information + about and it might sort of like try to connect the dots between that that observation", + "tokens": [50364, 1589, 466, 293, 309, 1062, 1333, 295, 411, 853, 281, 1745, 264, + 15026, 1296, 300, 300, 14816, 50748], "temperature": 0.0, "avg_logprob": -0.10469961166381836, + "compression_ratio": 1.8027522935779816, "no_speech_prob": 0.00016040584887377918}, + {"id": 504, "seek": 397560, "start": 3983.2799999999997, "end": 3989.44, "text": + " and maybe one down here but at the end certain the kind of grows and grows as + it moves away from", "tokens": [50748, 293, 1310, 472, 760, 510, 457, 412, 264, + 917, 1629, 264, 733, 295, 13156, 293, 13156, 382, 309, 6067, 1314, 490, 51056], + "temperature": 0.0, "avg_logprob": -0.10469961166381836, "compression_ratio": 1.8027522935779816, + "no_speech_prob": 0.00016040584887377918}, {"id": 505, "seek": 397560, "start": + 3989.44, "end": 3996.08, "text": " an existing observation and that''s interesting + because what this model is doing is it''s sort of like", "tokens": [51056, 364, + 6741, 14816, 293, 300, 311, 1880, 570, 437, 341, 2316, 307, 884, 307, 309, 311, + 1333, 295, 411, 51388], "temperature": 0.0, "avg_logprob": -0.10469961166381836, + "compression_ratio": 1.8027522935779816, "no_speech_prob": 0.00016040584887377918}, + {"id": 506, "seek": 397560, "start": 3996.08, "end": 4002.56, "text": " both predicting + relevance for arbitrary points in the feature space and it can do that because it", + "tokens": [51388, 1293, 32884, 32684, 337, 23211, 2793, 294, 264, 4111, 1901, 293, + 309, 393, 360, 300, 570, 309, 51712], "temperature": 0.0, "avg_logprob": -0.10469961166381836, + "compression_ratio": 1.8027522935779816, "no_speech_prob": 0.00016040584887377918}, + {"id": 507, "seek": 400256, "start": 4002.56, "end": 4008.24, "text": " can see + patterns like oh it seems like there''s a cluster of training data over here where + it''s like", "tokens": [50364, 393, 536, 8294, 411, 1954, 309, 2544, 411, 456, 311, + 
257, 13630, 295, 3097, 1412, 670, 510, 689, 309, 311, 411, 50648], "temperature": + 0.0, "avg_logprob": -0.07543281577099328, "compression_ratio": 1.7772511848341233, + "no_speech_prob": 4.953290772391483e-05}, {"id": 508, "seek": 400256, "start": 4008.7999999999997, + "end": 4013.52, "text": " things in this realm are more relevant than things over + here on the bottom left", "tokens": [50676, 721, 294, 341, 15355, 366, 544, 7340, + 813, 721, 670, 510, 322, 264, 2767, 1411, 50912], "temperature": 0.0, "avg_logprob": + -0.07543281577099328, "compression_ratio": 1.7772511848341233, "no_speech_prob": + 4.953290772391483e-05}, {"id": 509, "seek": 400256, "start": 4015.2, "end": 4020.96, + "text": " but when it''s in between those data points is where the uncertainty really + lies and it can say", "tokens": [50996, 457, 562, 309, 311, 294, 1296, 729, 1412, + 2793, 307, 689, 264, 15697, 534, 9134, 293, 309, 393, 584, 51284], "temperature": + 0.0, "avg_logprob": -0.07543281577099328, "compression_ratio": 1.7772511848341233, + "no_speech_prob": 4.953290772391483e-05}, {"id": 510, "seek": 400256, "start": 4020.96, + "end": 4026.32, "text": " well I''m gonna I think we should probe here I think we + should try to select the document to show the", "tokens": [51284, 731, 286, 478, + 799, 286, 519, 321, 820, 22715, 510, 286, 519, 321, 820, 853, 281, 3048, 264, 4166, + 281, 855, 264, 51552], "temperature": 0.0, "avg_logprob": -0.07543281577099328, + "compression_ratio": 1.7772511848341233, "no_speech_prob": 4.953290772391483e-05}, + {"id": 511, "seek": 402632, "start": 4026.32, "end": 4034.8, "text": " user to get + more information on that is uh uh we feel with a reasonable set of confidence is", + "tokens": [50364, 4195, 281, 483, 544, 1589, 322, 300, 307, 2232, 2232, 321, 841, + 365, 257, 10585, 992, 295, 6687, 307, 50788], "temperature": 0.0, "avg_logprob": + -0.09459094884918957, "compression_ratio": 1.6741071428571428, "no_speech_prob": + 0.00012882522423751652}, {"id": 
512, "seek": 402632, "start": 4034.8, "end": 4038.7200000000003, + "text": " probably relevant but we''re not entirely sure because we haven''t exactly + observed that yet", "tokens": [50788, 1391, 7340, 457, 321, 434, 406, 7696, 988, + 570, 321, 2378, 380, 2293, 13095, 300, 1939, 50984], "temperature": 0.0, "avg_logprob": + -0.09459094884918957, "compression_ratio": 1.6741071428571428, "no_speech_prob": + 0.00012882522423751652}, {"id": 513, "seek": 402632, "start": 4041.04, "end": 4047.2000000000003, + "text": " and that''s really where you can both explore the training data and explore + the feature space so", "tokens": [51100, 293, 300, 311, 534, 689, 291, 393, 1293, + 6839, 264, 3097, 1412, 293, 6839, 264, 4111, 1901, 370, 51408], "temperature": 0.0, + "avg_logprob": -0.09459094884918957, "compression_ratio": 1.6741071428571428, "no_speech_prob": + 0.00012882522423751652}, {"id": 514, "seek": 402632, "start": 4047.2000000000003, + "end": 4052.0, "text": " if you introduce a new feature into learning to rank you + could say like oh let me try different", "tokens": [51408, 498, 291, 5366, 257, + 777, 4111, 666, 2539, 281, 6181, 291, 727, 584, 411, 1954, 718, 385, 853, 819, 51648], + "temperature": 0.0, "avg_logprob": -0.09459094884918957, "compression_ratio": 1.6741071428571428, + "no_speech_prob": 0.00012882522423751652}, {"id": 515, "seek": 405200, "start": + 4052.0, "end": 4057.92, "text": " combinations of this and then as you start to + get out the general pattern you can try things in", "tokens": [50364, 21267, 295, + 341, 293, 550, 382, 291, 722, 281, 483, 484, 264, 2674, 5102, 291, 393, 853, 721, + 294, 50660], "temperature": 0.0, "avg_logprob": -0.08053105030584773, "compression_ratio": + 1.9126984126984128, "no_speech_prob": 0.00027309905271977186}, {"id": 516, "seek": + 405200, "start": 4057.92, "end": 4064.4, "text": " between to really understand + how that feature interacts with how um how users are interacting with data", "tokens": + [50660, 1296, 
281, 534, 1223, 577, 300, 4111, 43582, 365, 577, 1105, 577, 5022, + 366, 18017, 365, 1412, 50984], "temperature": 0.0, "avg_logprob": -0.08053105030584773, + "compression_ratio": 1.9126984126984128, "no_speech_prob": 0.00027309905271977186}, + {"id": 517, "seek": 405200, "start": 4065.04, "end": 4071.2, "text": " so that''s + really why it''s active learning it''s very much about like the model itself yes + you''re", "tokens": [51016, 370, 300, 311, 534, 983, 309, 311, 4967, 2539, 309, + 311, 588, 709, 466, 411, 264, 2316, 2564, 2086, 291, 434, 51324], "temperature": + 0.0, "avg_logprob": -0.08053105030584773, "compression_ratio": 1.9126984126984128, + "no_speech_prob": 0.00027309905271977186}, {"id": 518, "seek": 405200, "start": + 4071.2, "end": 4076.88, "text": " you''re training a model like the model itself + can know its own gap so that you can sort of like", "tokens": [51324, 291, 434, + 3097, 257, 2316, 411, 264, 2316, 2564, 393, 458, 1080, 1065, 7417, 370, 300, 291, + 393, 1333, 295, 411, 51608], "temperature": 0.0, "avg_logprob": -0.08053105030584773, + "compression_ratio": 1.9126984126984128, "no_speech_prob": 0.00027309905271977186}, + {"id": 519, "seek": 405200, "start": 4076.88, "end": 4081.68, "text": " imagine + as you''re serving search results you can go to this model and say not only are + you", "tokens": [51608, 3811, 382, 291, 434, 8148, 3164, 3542, 291, 393, 352, 281, + 341, 2316, 293, 584, 406, 787, 366, 291, 51848], "temperature": 0.0, "avg_logprob": + -0.08053105030584773, "compression_ratio": 1.9126984126984128, "no_speech_prob": + 0.00027309905271977186}, {"id": 520, "seek": 408168, "start": 4082.48, "end": 4087.8399999999997, + "text": " I am not only wanting what you''re most certain about I want like strategically + where you want to", "tokens": [50404, 286, 669, 406, 787, 7935, 437, 291, 434, 881, + 1629, 466, 286, 528, 411, 38061, 689, 291, 528, 281, 50672], "temperature": 0.0, + "avg_logprob": -0.155233277214898, 
"compression_ratio": 1.6912442396313363, "no_speech_prob": + 0.000280339561868459}, {"id": 521, "seek": 408168, "start": 4087.8399999999997, + "end": 4094.0, "text": " explore and you can show those results to users too and + start to gather clicks and information on that", "tokens": [50672, 6839, 293, 291, + 393, 855, 729, 3542, 281, 5022, 886, 293, 722, 281, 5448, 18521, 293, 1589, 322, + 300, 50980], "temperature": 0.0, "avg_logprob": -0.155233277214898, "compression_ratio": + 1.6912442396313363, "no_speech_prob": 0.000280339561868459}, {"id": 522, "seek": + 408168, "start": 4094.64, "end": 4100.08, "text": " and to me that''s a really exciting + field of of where search and information retrieval", "tokens": [51012, 293, 281, + 385, 300, 311, 257, 534, 4670, 2519, 295, 295, 689, 3164, 293, 1589, 19817, 3337, + 51284], "temperature": 0.0, "avg_logprob": -0.155233277214898, "compression_ratio": + 1.6912442396313363, "no_speech_prob": 0.000280339561868459}, {"id": 523, "seek": + 408168, "start": 4101.12, "end": 4106.32, "text": " and all of these fuzzy relevance + interfaces could go and do a lot of amazing work", "tokens": [51336, 293, 439, 295, + 613, 34710, 32684, 28416, 727, 352, 293, 360, 257, 688, 295, 2243, 589, 51596], + "temperature": 0.0, "avg_logprob": -0.155233277214898, "compression_ratio": 1.6912442396313363, + "no_speech_prob": 0.000280339561868459}, {"id": 524, "seek": 410632, "start": 4106.96, + "end": 4113.599999999999, "text": " yeah it sounds fantastic it''s like a mathematically + driven wave expanding your click base right", "tokens": [50396, 1338, 309, 3263, + 5456, 309, 311, 411, 257, 44003, 9555, 5772, 14702, 428, 2052, 3096, 558, 50728], + "temperature": 0.0, "avg_logprob": -0.2102050594255036, "compression_ratio": 1.8366533864541832, + "no_speech_prob": 0.009044284000992775}, {"id": 525, "seek": 410632, "start": 4113.599999999999, + "end": 4119.759999999999, "text": " and it still and it still sounds very experimental + to me because 
nothing is given", "tokens": [50728, 293, 309, 920, 293, 309, 920, + 3263, 588, 17069, 281, 385, 570, 1825, 307, 2212, 51036], "temperature": 0.0, "avg_logprob": + -0.2102050594255036, "compression_ratio": 1.8366533864541832, "no_speech_prob": + 0.009044284000992775}, {"id": 526, "seek": 410632, "start": 4119.759999999999, "end": + 4124.0, "text": " you only have from what you explained to find a student correctly + you know it''s like it''s still an", "tokens": [51036, 291, 787, 362, 490, 437, + 291, 8825, 281, 915, 257, 3107, 8944, 291, 458, 309, 311, 411, 309, 311, 920, 364, + 51248], "temperature": 0.0, "avg_logprob": -0.2102050594255036, "compression_ratio": + 1.8366533864541832, "no_speech_prob": 0.009044284000992775}, {"id": 527, "seek": + 410632, "start": 4124.0, "end": 4129.12, "text": " experiment we could run a nab + test is that how you would do it also so like you you basically", "tokens": [51248, + 5120, 321, 727, 1190, 257, 297, 455, 1500, 307, 300, 577, 291, 576, 360, 309, 611, + 370, 411, 291, 291, 1936, 51504], "temperature": 0.0, "avg_logprob": -0.2102050594255036, + "compression_ratio": 1.8366533864541832, "no_speech_prob": 0.009044284000992775}, + {"id": 528, "seek": 410632, "start": 4129.679999999999, "end": 4134.96, "text": + " your model is essentially a reflection of the data choice you made and now you + explained a", "tokens": [51532, 428, 2316, 307, 4476, 257, 12914, 295, 264, 1412, + 3922, 291, 1027, 293, 586, 291, 8825, 257, 51796], "temperature": 0.0, "avg_logprob": + -0.2102050594255036, "compression_ratio": 1.8366533864541832, "no_speech_prob": + 0.009044284000992775}, {"id": 529, "seek": 413496, "start": 4135.44, "end": 4141.36, + "text": " Gaussian model to do that right and then you run an nab test is that right + yeah you could do you", "tokens": [50388, 39148, 2316, 281, 360, 300, 558, 293, + 550, 291, 1190, 364, 297, 455, 1500, 307, 300, 558, 1338, 291, 727, 360, 291, 50684], + "temperature": 0.0, "avg_logprob": 
-0.14713626546958058, "compression_ratio": 1.8653846153846154, + "no_speech_prob": 0.0008556334651075304}, {"id": 530, "seek": 413496, "start": 4141.36, + "end": 4145.68, "text": " could set up lots of different ways of doing it so you + could be you could have your ab tests that", "tokens": [50684, 727, 992, 493, 3195, + 295, 819, 2098, 295, 884, 309, 370, 291, 727, 312, 291, 727, 362, 428, 410, 6921, + 300, 50900], "temperature": 0.0, "avg_logprob": -0.14713626546958058, "compression_ratio": + 1.8653846153846154, "no_speech_prob": 0.0008556334651075304}, {"id": 531, "seek": + 413496, "start": 4145.68, "end": 4156.32, "text": " are that are going on within + those ab tests or let''s say a classic ab test is like if I search for", "tokens": + [50900, 366, 300, 366, 516, 322, 1951, 729, 410, 6921, 420, 718, 311, 584, 257, + 7230, 410, 1500, 307, 411, 498, 286, 3164, 337, 51432], "temperature": 0.0, "avg_logprob": + -0.14713626546958058, "compression_ratio": 1.8653846153846154, "no_speech_prob": + 0.0008556334651075304}, {"id": 532, "seek": 413496, "start": 4156.32, "end": 4164.16, + "text": " shoe I get ranking a or ranking b I can select actually there''s there''s + a lot of creative ways", "tokens": [51432, 12796, 286, 483, 17833, 257, 420, 17833, + 272, 286, 393, 3048, 767, 456, 311, 456, 311, 257, 688, 295, 5880, 2098, 51824], + "temperature": 0.0, "avg_logprob": -0.14713626546958058, "compression_ratio": 1.8653846153846154, + "no_speech_prob": 0.0008556334651075304}, {"id": 533, "seek": 416416, "start": 4164.16, + "end": 4169.12, "text": " you can do it but sort of like a classic way of doing + it might be to say in the third slot I''m", "tokens": [50364, 291, 393, 360, 309, + 457, 1333, 295, 411, 257, 7230, 636, 295, 884, 309, 1062, 312, 281, 584, 294, 264, + 2636, 14747, 286, 478, 50612], "temperature": 0.0, "avg_logprob": -0.12606600279449134, + "compression_ratio": 1.7092511013215859, "no_speech_prob": 0.0005000947858206928}, + {"id": 534, "seek": 416416, 
"start": 4169.12, "end": 4175.76, "text": " going to + put the explore item so in every other slot I might have my like my traditional + LTR model", "tokens": [50612, 516, 281, 829, 264, 6839, 3174, 370, 294, 633, 661, + 14747, 286, 1062, 362, 452, 411, 452, 5164, 441, 25936, 2316, 50944], "temperature": + 0.0, "avg_logprob": -0.12606600279449134, "compression_ratio": 1.7092511013215859, + "no_speech_prob": 0.0005000947858206928}, {"id": 535, "seek": 416416, "start": 4175.76, + "end": 4183.12, "text": " that''s ranking results using lambda mar or SVM rank or + any of these sort of traditional learning", "tokens": [50944, 300, 311, 17833, 3542, + 1228, 13607, 1849, 420, 31910, 44, 6181, 420, 604, 295, 613, 1333, 295, 5164, 2539, + 51312], "temperature": 0.0, "avg_logprob": -0.12606600279449134, "compression_ratio": + 1.7092511013215859, "no_speech_prob": 0.0005000947858206928}, {"id": 536, "seek": + 416416, "start": 4183.12, "end": 4190.0, "text": " rank models and then we''re not + even learning to rank some other solution that you know works well", "tokens": [51312, + 6181, 5245, 293, 550, 321, 434, 406, 754, 2539, 281, 6181, 512, 661, 3827, 300, + 291, 458, 1985, 731, 51656], "temperature": 0.0, "avg_logprob": -0.12606600279449134, + "compression_ratio": 1.7092511013215859, "no_speech_prob": 0.0005000947858206928}, + {"id": 537, "seek": 419000, "start": 4190.0, "end": 4195.68, "text": " with your + features and then you slot in that like third result that''s going to like explore + a bit", "tokens": [50364, 365, 428, 4122, 293, 550, 291, 14747, 294, 300, 411, 2636, + 1874, 300, 311, 516, 281, 411, 6839, 257, 857, 50648], "temperature": 0.0, "avg_logprob": + -0.10458095683607944, "compression_ratio": 1.7877358490566038, "no_speech_prob": + 0.0002426455175736919}, {"id": 538, "seek": 419000, "start": 4195.68, "end": 4203.36, + "text": " it''s going to be different or as many results as you want another completely + different option is to", "tokens": [50648, 309, 
311, 516, 281, 312, 819, 420, 382, + 867, 3542, 382, 291, 528, 1071, 2584, 819, 3614, 307, 281, 51032], "temperature": + 0.0, "avg_logprob": -0.10458095683607944, "compression_ratio": 1.7877358490566038, + "no_speech_prob": 0.0002426455175736919}, {"id": 539, "seek": 419000, "start": 4204.32, + "end": 4211.28, "text": " use it as a means to drive um in search results it''s + not like we show people just like 10 results", "tokens": [51080, 764, 309, 382, + 257, 1355, 281, 3332, 1105, 294, 3164, 3542, 309, 311, 406, 411, 321, 855, 561, + 445, 411, 1266, 3542, 51428], "temperature": 0.0, "avg_logprob": -0.10458095683607944, + "compression_ratio": 1.7877358490566038, "no_speech_prob": 0.0002426455175736919}, + {"id": 540, "seek": 419000, "start": 4212.24, "end": 4216.24, "text": " anymore + we give people lots of different there''s different UI widgets that are like", "tokens": + [51476, 3602, 321, 976, 561, 3195, 295, 819, 456, 311, 819, 15682, 43355, 300, 366, + 411, 51676], "temperature": 0.0, "avg_logprob": -0.10458095683607944, "compression_ratio": + 1.7877358490566038, "no_speech_prob": 0.0002426455175736919}, {"id": 541, "seek": + 421624, "start": 4216.88, "end": 4221.12, "text": " off to the right you might have + something that looks a bit more like an ad or suggestions", "tokens": [50396, 766, + 281, 264, 558, 291, 1062, 362, 746, 300, 1542, 257, 857, 544, 411, 364, 614, 420, + 13396, 50608], "temperature": 0.0, "avg_logprob": -0.07531918731390261, "compression_ratio": + 1.911764705882353, "no_speech_prob": 0.0012055416591465473}, {"id": 542, "seek": + 421624, "start": 4221.92, "end": 4227.12, "text": " or in in product recommendations + or a product search we might have like sort of similar products", "tokens": [50648, + 420, 294, 294, 1674, 10434, 420, 257, 1674, 3164, 321, 1062, 362, 411, 1333, 295, + 2531, 3383, 50908], "temperature": 0.0, "avg_logprob": -0.07531918731390261, "compression_ratio": + 1.911764705882353, "no_speech_prob": 
0.0012055416591465473}, {"id": 543, "seek": + 421624, "start": 4227.12, "end": 4231.5199999999995, "text": " to those that you + searched for like different kinds of prompts and you might get sort of", "tokens": + [50908, 281, 729, 300, 291, 22961, 337, 411, 819, 3685, 295, 41095, 293, 291, 1062, + 483, 1333, 295, 51128], "temperature": 0.0, "avg_logprob": -0.07531918731390261, + "compression_ratio": 1.911764705882353, "no_speech_prob": 0.0012055416591465473}, + {"id": 544, "seek": 421624, "start": 4231.5199999999995, "end": 4237.36, "text": + " explore in those spaces too to kind of get more click data and traffic to just + sort of like", "tokens": [51128, 6839, 294, 729, 7673, 886, 281, 733, 295, 483, + 544, 2052, 1412, 293, 6419, 281, 445, 1333, 295, 411, 51420], "temperature": 0.0, + "avg_logprob": -0.07531918731390261, "compression_ratio": 1.911764705882353, "no_speech_prob": + 0.0012055416591465473}, {"id": 545, "seek": 421624, "start": 4237.36, "end": 4243.04, + "text": " explore what might be relevant so it may it depends a lot on like how + you want to drive", "tokens": [51420, 6839, 437, 1062, 312, 7340, 370, 309, 815, + 309, 5946, 257, 688, 322, 411, 577, 291, 528, 281, 3332, 51704], "temperature": + 0.0, "avg_logprob": -0.07531918731390261, "compression_ratio": 1.911764705882353, + "no_speech_prob": 0.0012055416591465473}, {"id": 546, "seek": 424304, "start": 4243.04, + "end": 4248.32, "text": " your UI in your specific use case and what might be appropriate + and this is help me understand", "tokens": [50364, 428, 15682, 294, 428, 2685, 764, + 1389, 293, 437, 1062, 312, 6854, 293, 341, 307, 854, 385, 1223, 50628], "temperature": + 0.0, "avg_logprob": -0.1837553750900995, "compression_ratio": 1.7746478873239437, + "no_speech_prob": 0.004802764859050512}, {"id": 547, "seek": 424304, "start": 4248.32, + "end": 4253.76, "text": " this this is different from click models right because + we also have the click bias problem", "tokens": [50628, 341, 341, 307, 
819, 490, + 2052, 5245, 558, 570, 321, 611, 362, 264, 2052, 12577, 1154, 50900], "temperature": + 0.0, "avg_logprob": -0.1837553750900995, "compression_ratio": 1.7746478873239437, + "no_speech_prob": 0.004802764859050512}, {"id": 548, "seek": 424304, "start": 4253.76, + "end": 4260.88, "text": " and we could introduce or redistribute the click weight + in a way to those unseen items this is", "tokens": [50900, 293, 321, 727, 5366, + 420, 36198, 2024, 1169, 264, 2052, 3364, 294, 257, 636, 281, 729, 40608, 4754, 341, + 307, 51256], "temperature": 0.0, "avg_logprob": -0.1837553750900995, "compression_ratio": + 1.7746478873239437, "no_speech_prob": 0.004802764859050512}, {"id": 549, "seek": + 424304, "start": 4260.88, "end": 4268.24, "text": " absolutely right is this different + so it''s different so um so there yeah this is like I''m talking", "tokens": [51256, + 3122, 558, 307, 341, 819, 370, 309, 311, 819, 370, 1105, 370, 456, 1338, 341, 307, + 411, 286, 478, 1417, 51624], "temperature": 0.0, "avg_logprob": -0.1837553750900995, + "compression_ratio": 1.7746478873239437, "no_speech_prob": 0.004802764859050512}, + {"id": 550, "seek": 426824, "start": 4268.32, "end": 4274.5599999999995, "text": + " about step two of a process step one before you even get to here is you don''t + just want to take like", "tokens": [50368, 466, 1823, 732, 295, 257, 1399, 1823, + 472, 949, 291, 754, 483, 281, 510, 307, 291, 500, 380, 445, 528, 281, 747, 411, + 50680], "temperature": 0.0, "avg_logprob": -0.08521119031039151, "compression_ratio": + 1.7532467532467533, "no_speech_prob": 0.0018124210182577372}, {"id": 551, "seek": + 426824, "start": 4274.5599999999995, "end": 4281.92, "text": " if you search for + shoe and you notice something gets a certain click-through rate it''s not necessarily", + "tokens": [50680, 498, 291, 3164, 337, 12796, 293, 291, 3449, 746, 2170, 257, 1629, + 2052, 12, 11529, 3314, 309, 311, 406, 4725, 51048], "temperature": 0.0, "avg_logprob": + 
-0.08521119031039151, "compression_ratio": 1.7532467532467533, "no_speech_prob": + 0.0018124210182577372}, {"id": 552, "seek": 426824, "start": 4281.92, "end": 4287.28, + "text": " you don''t necessarily want to take that raw number of clicks because + even within those things that", "tokens": [51048, 291, 500, 380, 4725, 528, 281, + 747, 300, 8936, 1230, 295, 18521, 570, 754, 1951, 729, 721, 300, 51316], "temperature": + 0.0, "avg_logprob": -0.08521119031039151, "compression_ratio": 1.7532467532467533, + "no_speech_prob": 0.0018124210182577372}, {"id": 553, "seek": 426824, "start": 4287.28, + "end": 4294.5599999999995, "text": " you''re showing users different just there + is something called position bias which is people might scan", "tokens": [51316, + 291, 434, 4099, 5022, 819, 445, 456, 307, 746, 1219, 2535, 12577, 597, 307, 561, + 1062, 11049, 51680], "temperature": 0.0, "avg_logprob": -0.08521119031039151, "compression_ratio": + 1.7532467532467533, "no_speech_prob": 0.0018124210182577372}, {"id": 554, "seek": + 429456, "start": 4294.56, "end": 4299.76, "text": " top to bottom and they''re just + going to click on the first result more than they''re going to", "tokens": [50364, + 1192, 281, 2767, 293, 436, 434, 445, 516, 281, 2052, 322, 264, 700, 1874, 544, 813, + 436, 434, 516, 281, 50624], "temperature": 0.0, "avg_logprob": -0.11169000429527781, + "compression_ratio": 2.119815668202765, "no_speech_prob": 0.0012442364823073149}, + {"id": 555, "seek": 429456, "start": 4299.76, "end": 4303.84, "text": " click on + the second result and there''s lots of reasons for that even when they notice both", + "tokens": [50624, 2052, 322, 264, 1150, 1874, 293, 456, 311, 3195, 295, 4112, 337, + 300, 754, 562, 436, 3449, 1293, 50828], "temperature": 0.0, "avg_logprob": -0.11169000429527781, + "compression_ratio": 2.119815668202765, "no_speech_prob": 0.0012442364823073149}, + {"id": 556, "seek": 429456, "start": 4304.64, "end": 4308.64, "text": " they''re + just getting 
they might say oh this algorithm must know what it''s doing um", "tokens": + [50868, 436, 434, 445, 1242, 436, 1062, 584, 1954, 341, 9284, 1633, 458, 437, 309, + 311, 884, 1105, 51068], "temperature": 0.0, "avg_logprob": -0.11169000429527781, + "compression_ratio": 2.119815668202765, "no_speech_prob": 0.0012442364823073149}, + {"id": 557, "seek": 429456, "start": 4309.52, "end": 4316.160000000001, "text": + " there are people scan top to bottom um and there are different reasons people + are just like", "tokens": [51112, 456, 366, 561, 11049, 1192, 281, 2767, 1105, 293, + 456, 366, 819, 4112, 561, 366, 445, 411, 51444], "temperature": 0.0, "avg_logprob": + -0.11169000429527781, "compression_ratio": 2.119815668202765, "no_speech_prob": + 0.0012442364823073149}, {"id": 558, "seek": 429456, "start": 4316.88, "end": 4322.400000000001, + "text": " will click the first result more than the second result and so on and + it''s it''s a it''s an interesting", "tokens": [51480, 486, 2052, 264, 700, 1874, + 544, 813, 264, 1150, 1874, 293, 370, 322, 293, 309, 311, 309, 311, 257, 309, 311, + 364, 1880, 51756], "temperature": 0.0, "avg_logprob": -0.11169000429527781, "compression_ratio": + 2.119815668202765, "no_speech_prob": 0.0012442364823073149}, {"id": 559, "seek": + 432240, "start": 4322.4, "end": 4328.4, "text": " phenomenon of like psychology + about how people process uh search results that are even shown to them", "tokens": + [50364, 14029, 295, 411, 15105, 466, 577, 561, 1399, 2232, 3164, 3542, 300, 366, + 754, 4898, 281, 552, 50664], "temperature": 0.0, "avg_logprob": -0.18448654810587564, + "compression_ratio": 1.7092511013215859, "no_speech_prob": 0.0033399653621017933}, + {"id": 560, "seek": 432240, "start": 4329.2, "end": 4335.44, "text": " yeah exactly + and by the way just as you explained this it occurred to me that have you noticed + how um", "tokens": [50704, 1338, 2293, 293, 538, 264, 636, 445, 382, 291, 8825, + 341, 309, 11068, 281, 385, 300, 362, 291, 5694, 
577, 1105, 51016], "temperature": + 0.0, "avg_logprob": -0.18448654810587564, "compression_ratio": 1.7092511013215859, + "no_speech_prob": 0.0033399653621017933}, {"id": 561, "seek": 432240, "start": 4336.24, + "end": 4340.96, "text": " you know the interfaces changed like you go on youtube + watching these shirts there is no way to", "tokens": [51056, 291, 458, 264, 28416, + 3105, 411, 291, 352, 322, 12487, 1976, 613, 20832, 456, 307, 572, 636, 281, 51292], + "temperature": 0.0, "avg_logprob": -0.18448654810587564, "compression_ratio": 1.7092511013215859, + "no_speech_prob": 0.0033399653621017933}, {"id": 562, "seek": 432240, "start": 4340.96, + "end": 4347.12, "text": " search them right you just click and you watch and watch + it because I think at that point", "tokens": [51292, 3164, 552, 558, 291, 445, 2052, + 293, 291, 1159, 293, 1159, 309, 570, 286, 519, 412, 300, 935, 51600], "temperature": + 0.0, "avg_logprob": -0.18448654810587564, "compression_ratio": 1.7092511013215859, + "no_speech_prob": 0.0033399653621017933}, {"id": 563, "seek": 434712, "start": 4347.12, + "end": 4352.16, "text": " first of all there is no bias you don''t know what''s + next but I think the goal is also more like", "tokens": [50364, 700, 295, 439, 456, + 307, 572, 12577, 291, 500, 380, 458, 437, 311, 958, 457, 286, 519, 264, 3387, 307, + 611, 544, 411, 50616], "temperature": 0.0, "avg_logprob": -0.09170584722396431, + "compression_ratio": 1.8528301886792453, "no_speech_prob": 0.004400997888296843}, + {"id": 564, "seek": 434712, "start": 4352.16, "end": 4358.4, "text": " entertainment + it''s not like um I have an information need right I''m actually searching for something", + "tokens": [50616, 12393, 309, 311, 406, 411, 1105, 286, 362, 364, 1589, 643, 558, + 286, 478, 767, 10808, 337, 746, 50928], "temperature": 0.0, "avg_logprob": -0.09170584722396431, + "compression_ratio": 1.8528301886792453, "no_speech_prob": 0.004400997888296843}, + {"id": 565, "seek": 434712, "start": 4358.4, 
"end": 4364.72, "text": " but I guess + sometimes and I think you also spoke about it search uh blends with recommender + systems", "tokens": [50928, 457, 286, 2041, 2171, 293, 286, 519, 291, 611, 7179, + 466, 309, 3164, 2232, 37619, 365, 2748, 260, 3652, 51244], "temperature": 0.0, "avg_logprob": + -0.09170584722396431, "compression_ratio": 1.8528301886792453, "no_speech_prob": + 0.004400997888296843}, {"id": 566, "seek": 434712, "start": 4364.72, "end": 4368.8, + "text": " because we actually don''t know and user might not know what they''re + looking for sometimes maybe", "tokens": [51244, 570, 321, 767, 500, 380, 458, 293, + 4195, 1062, 406, 458, 437, 436, 434, 1237, 337, 2171, 1310, 51448], "temperature": + 0.0, "avg_logprob": -0.09170584722396431, "compression_ratio": 1.8528301886792453, + "no_speech_prob": 0.004400997888296843}, {"id": 567, "seek": 434712, "start": 4368.8, + "end": 4374.64, "text": " sometimes they do sometimes they don''t like it''s an + explorative search which means it could become", "tokens": [51448, 2171, 436, 360, + 2171, 436, 500, 380, 411, 309, 311, 364, 24765, 1166, 3164, 597, 1355, 309, 727, + 1813, 51740], "temperature": 0.0, "avg_logprob": -0.09170584722396431, "compression_ratio": + 1.8528301886792453, "no_speech_prob": 0.004400997888296843}, {"id": 568, "seek": + 437464, "start": 4374.64, "end": 4380.0, "text": " a recommender system which means + you could plug in those explorative results exactly and it", "tokens": [50364, 257, + 2748, 260, 1185, 597, 1355, 291, 727, 5452, 294, 729, 24765, 1166, 3542, 2293, 293, + 309, 50632], "temperature": 0.0, "avg_logprob": -0.12388414633078654, "compression_ratio": + 1.6506024096385543, "no_speech_prob": 0.00020160170970484614}, {"id": 569, "seek": + 437464, "start": 4380.0, "end": 4391.04, "text": " becomes a very um that blending + it can be very uh interesting it''s also can be challenging because", "tokens": + [50632, 3643, 257, 588, 1105, 300, 23124, 309, 393, 312, 588, 2232, 1880, 
309, 311, + 611, 393, 312, 7595, 570, 51184], "temperature": 0.0, "avg_logprob": -0.12388414633078654, + "compression_ratio": 1.6506024096385543, "no_speech_prob": 0.00020160170970484614}, + {"id": 570, "seek": 437464, "start": 4391.84, "end": 4400.88, "text": " searches + also a very intentional activity and if you do something uh let''s say in a", "tokens": + [51224, 26701, 611, 257, 588, 21935, 5191, 293, 498, 291, 360, 746, 2232, 718, 311, + 584, 294, 257, 51676], "temperature": 0.0, "avg_logprob": -0.12388414633078654, + "compression_ratio": 1.6506024096385543, "no_speech_prob": 0.00020160170970484614}, + {"id": 571, "seek": 440088, "start": 4401.4400000000005, "end": 4407.6, "text": + " an a dense vector representation there is some relationship that in a general + sense like when you", "tokens": [50392, 364, 257, 18011, 8062, 10290, 456, 307, + 512, 2480, 300, 294, 257, 2674, 2020, 411, 562, 291, 50700], "temperature": 0.0, + "avg_logprob": -0.1520837759360289, "compression_ratio": 1.7370892018779343, "no_speech_prob": + 0.0014404486864805222}, {"id": 572, "seek": 440088, "start": 4407.6, "end": 4413.68, + "text": " train on Wikipedia it makes sense that these things go together but maybe + in the specific domain", "tokens": [50700, 3847, 322, 28999, 309, 1669, 2020, 300, + 613, 721, 352, 1214, 457, 1310, 294, 264, 2685, 9274, 51004], "temperature": 0.0, + "avg_logprob": -0.1520837759360289, "compression_ratio": 1.7370892018779343, "no_speech_prob": + 0.0014404486864805222}, {"id": 573, "seek": 440088, "start": 4414.96, "end": 4422.08, + "text": " this specific uh profession there''s jargon and it turns out those don''t + go together", "tokens": [51068, 341, 2685, 2232, 7032, 456, 311, 15181, 10660, 293, + 309, 4523, 484, 729, 500, 380, 352, 1214, 51424], "temperature": 0.0, "avg_logprob": + -0.1520837759360289, "compression_ratio": 1.7370892018779343, "no_speech_prob": + 0.0014404486864805222}, {"id": 574, "seek": 440088, "start": 4423.4400000000005, + 
"end": 4429.84, "text": " people will notice and they''ll complain about about these + things um a sort of like actually", "tokens": [51492, 561, 486, 3449, 293, 436, + 603, 11024, 466, 466, 613, 721, 1105, 257, 1333, 295, 411, 767, 51812], "temperature": + 0.0, "avg_logprob": -0.1520837759360289, "compression_ratio": 1.7370892018779343, + "no_speech_prob": 0.0014404486864805222}, {"id": 575, "seek": 442984, "start": 4429.92, + "end": 4434.96, "text": " domain independent example of this that you sometimes + see is sometimes um things that are", "tokens": [50368, 9274, 6695, 1365, 295, 341, + 300, 291, 2171, 536, 307, 2171, 1105, 721, 300, 366, 50620], "temperature": 0.0, + "avg_logprob": -0.09354063868522644, "compression_ratio": 1.7767441860465116, "no_speech_prob": + 9.249307913705707e-05}, {"id": 576, "seek": 442984, "start": 4434.96, "end": 4442.0, + "text": " opposite sexually occur together so you get like um I want to cancel my + reservation or I want to", "tokens": [50620, 6182, 26791, 5160, 1214, 370, 291, + 483, 411, 1105, 286, 528, 281, 10373, 452, 28922, 420, 286, 528, 281, 50972], "temperature": + 0.0, "avg_logprob": -0.09354063868522644, "compression_ratio": 1.7767441860465116, + "no_speech_prob": 9.249307913705707e-05}, {"id": 577, "seek": 442984, "start": 4442.0, + "end": 4448.4800000000005, "text": " confirm my reservation those sometimes co-occur + with the same kinds of words and sometimes in these", "tokens": [50972, 9064, 452, + 28922, 729, 2171, 598, 12, 905, 14112, 365, 264, 912, 3685, 295, 2283, 293, 2171, + 294, 613, 51296], "temperature": 0.0, "avg_logprob": -0.09354063868522644, "compression_ratio": + 1.7767441860465116, "no_speech_prob": 9.249307913705707e-05}, {"id": 578, "seek": + 442984, "start": 4448.4800000000005, "end": 4454.400000000001, "text": " retrieval + situations you might be able to get away with that and like a recommendations context", + "tokens": [51296, 19817, 3337, 6851, 291, 1062, 312, 1075, 281, 483, 1314, 365, + 
300, 293, 411, 257, 10434, 4319, 51592], "temperature": 0.0, "avg_logprob": -0.09354063868522644, + "compression_ratio": 1.7767441860465116, "no_speech_prob": 9.249307913705707e-05}, + {"id": 579, "seek": 445440, "start": 4454.48, "end": 4460.16, "text": " for people + like yeah whatever but when I''m searching it''s like how how dare you not understand + me", "tokens": [50368, 337, 561, 411, 1338, 2035, 457, 562, 286, 478, 10808, 309, + 311, 411, 577, 577, 8955, 291, 406, 1223, 385, 50652], "temperature": 0.0, "avg_logprob": + -0.10759635405106978, "compression_ratio": 1.7186147186147187, "no_speech_prob": + 0.0013962251832708716}, {"id": 580, "seek": 445440, "start": 4460.16, "end": 4465.92, + "text": " and people almost get like offended by it because it''s almost like going + to a person at a store and", "tokens": [50652, 293, 561, 1920, 483, 411, 26776, + 538, 309, 570, 309, 311, 1920, 411, 516, 281, 257, 954, 412, 257, 3531, 293, 50940], + "temperature": 0.0, "avg_logprob": -0.10759635405106978, "compression_ratio": 1.7186147186147187, + "no_speech_prob": 0.0013962251832708716}, {"id": 581, "seek": 445440, "start": 4465.92, + "end": 4472.32, "text": " asking a question and given the exact opposite or something + yeah exactly I think my wife was recently", "tokens": [50940, 3365, 257, 1168, 293, + 2212, 264, 1900, 6182, 420, 746, 1338, 2293, 286, 519, 452, 3836, 390, 3938, 51260], + "temperature": 0.0, "avg_logprob": -0.10759635405106978, "compression_ratio": 1.7186147186147187, + "no_speech_prob": 0.0013962251832708716}, {"id": 582, "seek": 445440, "start": 4472.32, + "end": 4478.32, "text": " doing a search in one of the uh you know grocery apps + and uh everything gets delivered home today", "tokens": [51260, 884, 257, 3164, + 294, 472, 295, 264, 2232, 291, 458, 14410, 7733, 293, 2232, 1203, 2170, 10144, 1280, + 965, 51560], "temperature": 0.0, "avg_logprob": -0.10759635405106978, "compression_ratio": + 1.7186147186147187, "no_speech_prob": 
0.0013962251832708716}, {"id": 583, "seek": + 447832, "start": 4478.32, "end": 4485.2, "text": " even in Europe and she was searching + for oil and she was saying hey your uh vector search you know", "tokens": [50364, + 754, 294, 3315, 293, 750, 390, 10808, 337, 3184, 293, 750, 390, 1566, 4177, 428, + 2232, 8062, 3164, 291, 458, 50708], "temperature": 0.0, "avg_logprob": -0.15906166244339157, + "compression_ratio": 1.8169014084507042, "no_speech_prob": 0.006591228302568197}, + {"id": 584, "seek": 447832, "start": 4485.2, "end": 4490.32, "text": " research + could be applied here probably so the the top result was tuna fish and she was like + why", "tokens": [50708, 2132, 727, 312, 6456, 510, 1391, 370, 264, 264, 1192, 1874, + 390, 26670, 3506, 293, 750, 390, 411, 983, 50964], "temperature": 0.0, "avg_logprob": + -0.15906166244339157, "compression_ratio": 1.8169014084507042, "no_speech_prob": + 0.006591228302568197}, {"id": 585, "seek": 447832, "start": 4491.28, "end": 4497.5199999999995, + "text": " uh maybe maybe because oil is one of the components it''s inside the oil + right so what do you want", "tokens": [51012, 2232, 1310, 1310, 570, 3184, 307, + 472, 295, 264, 6677, 309, 311, 1854, 264, 3184, 558, 370, 437, 360, 291, 528, 51324], + "temperature": 0.0, "avg_logprob": -0.15906166244339157, "compression_ratio": 1.8169014084507042, + "no_speech_prob": 0.006591228302568197}, {"id": 586, "seek": 447832, "start": 4497.5199999999995, + "end": 4503.679999999999, "text": " but she was looking for a category of things + so like breads right and she was getting yogurts", "tokens": [51324, 457, 750, 390, + 1237, 337, 257, 7719, 295, 721, 370, 411, 1403, 5834, 558, 293, 750, 390, 1242, + 16570, 374, 1373, 51632], "temperature": 0.0, "avg_logprob": -0.15906166244339157, + "compression_ratio": 1.8169014084507042, "no_speech_prob": 0.006591228302568197}, + {"id": 587, "seek": 450368, "start": 4503.92, "end": 4511.76, "text": " yeah a sudden + yeah so I think that''s 
probably a negative example of an explorative search or + maybe", "tokens": [50376, 1338, 257, 3990, 1338, 370, 286, 519, 300, 311, 1391, + 257, 3671, 1365, 295, 364, 24765, 1166, 3164, 420, 1310, 50768], "temperature": + 0.0, "avg_logprob": -0.21195263109709087, "compression_ratio": 1.7725118483412323, + "no_speech_prob": 0.004748633597046137}, {"id": 588, "seek": 450368, "start": 4511.76, + "end": 4517.200000000001, "text": " not I''m not sure but I think it is like you''re + a puzzle to the user not to see breads on the on the", "tokens": [50768, 406, 286, + 478, 406, 988, 457, 286, 519, 309, 307, 411, 291, 434, 257, 12805, 281, 264, 4195, + 406, 281, 536, 1403, 5834, 322, 264, 322, 264, 51040], "temperature": 0.0, "avg_logprob": + -0.21195263109709087, "compression_ratio": 1.7725118483412323, "no_speech_prob": + 0.004748633597046137}, {"id": 589, "seek": 450368, "start": 4517.200000000001, "end": + 4524.8, "text": " page and seeing yogurts yeah exactly yeah yeah that''s actually + a good like example of a the", "tokens": [51040, 3028, 293, 2577, 16570, 374, 1373, + 1338, 2293, 1338, 1338, 300, 311, 767, 257, 665, 411, 1365, 295, 257, 264, 51420], + "temperature": 0.0, "avg_logprob": -0.21195263109709087, "compression_ratio": 1.7725118483412323, + "no_speech_prob": 0.004748633597046137}, {"id": 590, "seek": 450368, "start": 4524.8, + "end": 4530.0, "text": " traditional search engine kind of doing that or it''s like + oil but it''s tuna and oil", "tokens": [51420, 5164, 3164, 2848, 733, 295, 884, + 300, 420, 309, 311, 411, 3184, 457, 309, 311, 26670, 293, 3184, 51680], "temperature": + 0.0, "avg_logprob": -0.21195263109709087, "compression_ratio": 1.7725118483412323, + "no_speech_prob": 0.004748633597046137}, {"id": 591, "seek": 453000, "start": 4530.72, + "end": 4534.48, "text": " whereas uh maybe a dense vector search might actually + work better", "tokens": [50400, 9735, 2232, 1310, 257, 18011, 8062, 3164, 1062, + 767, 589, 1101, 50588], "temperature": 0.0, 
"avg_logprob": -0.11657665715073094, + "compression_ratio": 1.7790697674418605, "no_speech_prob": 0.0010015310253947973}, + {"id": 592, "seek": 453000, "start": 4535.28, "end": 4542.72, "text": " until you + get like motor oil it''s like so yeah both sides have to be tuned carefully because + yeah", "tokens": [50628, 1826, 291, 483, 411, 5932, 3184, 309, 311, 411, 370, 1338, + 1293, 4881, 362, 281, 312, 10870, 7500, 570, 1338, 51000], "temperature": 0.0, "avg_logprob": + -0.11657665715073094, "compression_ratio": 1.7790697674418605, "no_speech_prob": + 0.0010015310253947973}, {"id": 593, "seek": 453000, "start": 4542.72, "end": 4547.92, + "text": " search can really searches one of those things and I think the article + we talked about a while ago", "tokens": [51000, 3164, 393, 534, 26701, 472, 295, + 729, 721, 293, 286, 519, 264, 7222, 321, 2825, 466, 257, 1339, 2057, 51260], "temperature": + 0.0, "avg_logprob": -0.11657665715073094, "compression_ratio": 1.7790697674418605, + "no_speech_prob": 0.0010015310253947973}, {"id": 594, "seek": 453000, "start": 4548.56, + "end": 4553.68, "text": " about like the Google article actually talks about this + not just in terms of loss revenue but in", "tokens": [51292, 466, 411, 264, 3329, + 7222, 767, 6686, 466, 341, 406, 445, 294, 2115, 295, 4470, 9324, 457, 294, 51548], + "temperature": 0.0, "avg_logprob": -0.11657665715073094, "compression_ratio": 1.7790697674418605, + "no_speech_prob": 0.0010015310253947973}, {"id": 595, "seek": 453000, "start": 4553.68, + "end": 4558.64, "text": " terms of brand retention because people will not come + back to your store if they''re like the search", "tokens": [51548, 2115, 295, 3360, + 22871, 570, 561, 486, 406, 808, 646, 281, 428, 3531, 498, 436, 434, 411, 264, 3164, + 51796], "temperature": 0.0, "avg_logprob": -0.11657665715073094, "compression_ratio": + 1.7790697674418605, "no_speech_prob": 0.0010015310253947973}, {"id": 596, "seek": + 455864, "start": 4558.64, "end": 4566.56, 
"text": " doesn''t get me the seems dumb + so it''s it''s uh it definitely yeah people people notice when search", "tokens": + [50364, 1177, 380, 483, 385, 264, 2544, 10316, 370, 309, 311, 309, 311, 2232, 309, + 2138, 1338, 561, 561, 3449, 562, 3164, 50760], "temperature": 0.0, "avg_logprob": + -0.19765067100524902, "compression_ratio": 1.6398305084745763, "no_speech_prob": + 0.0017145259771496058}, {"id": 597, "seek": 455864, "start": 4566.56, "end": 4574.56, + "text": " is not understanding them yeah 300 billion dollar opportunity for everyone + yeah out there so in this", "tokens": [50760, 307, 406, 3701, 552, 1338, 6641, 5218, + 7241, 2650, 337, 1518, 1338, 484, 456, 370, 294, 341, 51160], "temperature": 0.0, + "avg_logprob": -0.19765067100524902, "compression_ratio": 1.6398305084745763, "no_speech_prob": + 0.0017145259771496058}, {"id": 598, "seek": 455864, "start": 4574.56, "end": 4581.4400000000005, + "text": " maze of things learning to rank density well we still need to also um + get concerned with how we", "tokens": [51160, 33032, 295, 721, 2539, 281, 6181, + 10305, 731, 321, 920, 643, 281, 611, 1105, 483, 5922, 365, 577, 321, 51504], "temperature": + 0.0, "avg_logprob": -0.19765067100524902, "compression_ratio": 1.6398305084745763, + "no_speech_prob": 0.0017145259771496058}, {"id": 599, "seek": 455864, "start": 4581.4400000000005, + "end": 4587.68, "text": " manage this project right and yeah a lot of a lot of ideas + here and thoughts uh but like I''m", "tokens": [51504, 3067, 341, 1716, 558, 293, + 1338, 257, 688, 295, 257, 688, 295, 3487, 510, 293, 4598, 2232, 457, 411, 286, 478, + 51816], "temperature": 0.0, "avg_logprob": -0.19765067100524902, "compression_ratio": + 1.6398305084745763, "no_speech_prob": 0.0017145259771496058}, {"id": 600, "seek": + 458768, "start": 4587.68, "end": 4595.76, "text": " particularly interested in this + um in the search engineer role transcending itself to something", "tokens": [50364, + 4098, 3102, 294, 341, 1105, 294, 
264, 3164, 11403, 3090, 43800, 2029, 2564, 281, + 746, 50768], "temperature": 0.0, "avg_logprob": -0.0729043565947434, "compression_ratio": + 1.7342342342342343, "no_speech_prob": 0.00031562105868943036}, {"id": 601, "seek": + 458768, "start": 4595.76, "end": 4601.76, "text": " else for example it used to + be I don''t know I was tuning I was a solar relevancy search engineer", "tokens": + [50768, 1646, 337, 1365, 309, 1143, 281, 312, 286, 500, 380, 458, 286, 390, 15164, + 286, 390, 257, 7936, 25916, 6717, 3164, 11403, 51068], "temperature": 0.0, "avg_logprob": + -0.0729043565947434, "compression_ratio": 1.7342342342342343, "no_speech_prob": + 0.00031562105868943036}, {"id": 602, "seek": 458768, "start": 4601.76, "end": 4608.56, + "text": " I guess a few years ago and I was just reading these XML files and tweaking + and tweaking and then", "tokens": [51068, 286, 2041, 257, 1326, 924, 2057, 293, + 286, 390, 445, 3760, 613, 43484, 7098, 293, 6986, 2456, 293, 6986, 2456, 293, 550, + 51408], "temperature": 0.0, "avg_logprob": -0.0729043565947434, "compression_ratio": + 1.7342342342342343, "no_speech_prob": 0.00031562105868943036}, {"id": 603, "seek": + 458768, "start": 4608.56, "end": 4614.56, "text": " you know you know indexing search + pipeline and so on but today you mentioned this data science", "tokens": [51408, + 291, 458, 291, 458, 8186, 278, 3164, 15517, 293, 370, 322, 457, 965, 291, 2835, + 341, 1412, 3497, 51708], "temperature": 0.0, "avg_logprob": -0.0729043565947434, + "compression_ratio": 1.7342342342342343, "no_speech_prob": 0.00031562105868943036}, + {"id": 604, "seek": 461456, "start": 4614.72, "end": 4622.080000000001, "text": + " came into play and it''s still being integrated what what other aspects do we + need to think about", "tokens": [50372, 1361, 666, 862, 293, 309, 311, 920, 885, + 10919, 437, 437, 661, 7270, 360, 321, 643, 281, 519, 466, 50740], "temperature": + 0.0, "avg_logprob": -0.19020680879291735, "compression_ratio": 1.625, 
"no_speech_prob": + 0.015112183056771755}, {"id": 605, "seek": 461456, "start": 4622.080000000001, "end": + 4628.72, "text": " how should we form search teams uh I believe you have a blog + post on that as well we will", "tokens": [50740, 577, 820, 321, 1254, 3164, 5491, + 2232, 286, 1697, 291, 362, 257, 6968, 2183, 322, 300, 382, 731, 321, 486, 51072], + "temperature": 0.0, "avg_logprob": -0.19020680879291735, "compression_ratio": 1.625, + "no_speech_prob": 0.015112183056771755}, {"id": 606, "seek": 461456, "start": 4628.72, + "end": 4635.84, "text": " cite this in the show notes oh yeah a Shopify and I know + the Shopify yearning blog I know Eric Pugh at", "tokens": [51072, 37771, 341, 294, + 264, 855, 5570, 1954, 1338, 257, 43991, 293, 286, 458, 264, 43991, 1064, 773, 6968, + 286, 458, 9336, 430, 1984, 412, 51428], "temperature": 0.0, "avg_logprob": -0.19020680879291735, + "compression_ratio": 1.625, "no_speech_prob": 0.015112183056771755}, {"id": 607, + "seek": 461456, "start": 4635.84, "end": 4642.400000000001, "text": " Opus source + connections talks about a lot too um yeah and I can''t say I have all the answers + because", "tokens": [51428, 12011, 301, 4009, 9271, 6686, 466, 257, 688, 886, 1105, + 1338, 293, 286, 393, 380, 584, 286, 362, 439, 264, 6338, 570, 51756], "temperature": + 0.0, "avg_logprob": -0.19020680879291735, "compression_ratio": 1.625, "no_speech_prob": + 0.015112183056771755}, {"id": 608, "seek": 464240, "start": 4642.4, "end": 4647.04, + "text": " if you like it''s a you''re right it''s a brand new space I think it''s + an interesting thing to talk about", "tokens": [50364, 498, 291, 411, 309, 311, + 257, 291, 434, 558, 309, 311, 257, 3360, 777, 1901, 286, 519, 309, 311, 364, 1880, + 551, 281, 751, 466, 50596], "temperature": 0.0, "avg_logprob": -0.1273274070338199, + "compression_ratio": 1.7725118483412323, "no_speech_prob": 0.00017529280739836395}, + {"id": 609, "seek": 464240, "start": 4648.0, "end": 4654.0, "text": " I think there''s 
+ two principles I think about when I think about a search team and you can''t", "tokens": + [50644, 286, 519, 456, 311, 732, 9156, 286, 519, 466, 562, 286, 519, 466, 257, 3164, + 1469, 293, 291, 393, 380, 50944], "temperature": 0.0, "avg_logprob": -0.1273274070338199, + "compression_ratio": 1.7725118483412323, "no_speech_prob": 0.00017529280739836395}, + {"id": 610, "seek": 464240, "start": 4655.44, "end": 4662.08, "text": " you have + to do both and it''s like it''s like uh building a plane while you''re flying it", + "tokens": [51016, 291, 362, 281, 360, 1293, 293, 309, 311, 411, 309, 311, 411, 2232, + 2390, 257, 5720, 1339, 291, 434, 7137, 309, 51348], "temperature": 0.0, "avg_logprob": + -0.1273274070338199, "compression_ratio": 1.7725118483412323, "no_speech_prob": + 0.00017529280739836395}, {"id": 611, "seek": 464240, "start": 4663.44, "end": 4670.5599999999995, + "text": " do you always you can''t be that I remember at Opus source connections + sometimes we would get", "tokens": [51416, 360, 291, 1009, 291, 393, 380, 312, 300, + 286, 1604, 412, 12011, 301, 4009, 9271, 2171, 321, 576, 483, 51772], "temperature": + 0.0, "avg_logprob": -0.1273274070338199, "compression_ratio": 1.7725118483412323, + "no_speech_prob": 0.00017529280739836395}, {"id": 612, "seek": 467056, "start": + 4670.56, "end": 4676.160000000001, "text": " in projects that would be very almost + like two infrastructure focus uh and then other projects", "tokens": [50364, 294, + 4455, 300, 576, 312, 588, 1920, 411, 732, 6896, 1879, 2232, 293, 550, 661, 4455, + 50644], "temperature": 0.0, "avg_logprob": -0.10289731317636919, "compression_ratio": + 1.89453125, "no_speech_prob": 0.00040984523366205394}, {"id": 613, "seek": 467056, + "start": 4676.160000000001, "end": 4683.04, "text": " that would be two only building + the experiments and solutions um and what I mean by that is sometimes", "tokens": + [50644, 300, 576, 312, 732, 787, 2390, 264, 12050, 293, 6547, 1105, 293, 437, 286, + 914, 538, 
300, 307, 2171, 50988], "temperature": 0.0, "avg_logprob": -0.10289731317636919, + "compression_ratio": 1.89453125, "no_speech_prob": 0.00040984523366205394}, {"id": + 614, "seek": 467056, "start": 4683.04, "end": 4687.92, "text": " the infrastructure + folks experiments is more like oh we''re gonna gather we''re gonna spend nine", + "tokens": [50988, 264, 6896, 4024, 12050, 307, 544, 411, 1954, 321, 434, 799, 5448, + 321, 434, 799, 3496, 4949, 51232], "temperature": 0.0, "avg_logprob": -0.10289731317636919, + "compression_ratio": 1.89453125, "no_speech_prob": 0.00040984523366205394}, {"id": + 615, "seek": 467056, "start": 4687.92, "end": 4692.320000000001, "text": " months + gathering clicks and processing it and trying to understand what''s relevant before + you''ve", "tokens": [51232, 2493, 13519, 18521, 293, 9007, 309, 293, 1382, 281, + 1223, 437, 311, 7340, 949, 291, 600, 51452], "temperature": 0.0, "avg_logprob": + -0.10289731317636919, "compression_ratio": 1.89453125, "no_speech_prob": 0.00040984523366205394}, + {"id": 616, "seek": 467056, "start": 4692.320000000001, "end": 4698.240000000001, + "text": " been touching a model or or tuning relevance wherever and then the other + end of the spectrum you", "tokens": [51452, 668, 11175, 257, 2316, 420, 420, 15164, + 32684, 8660, 293, 550, 264, 661, 917, 295, 264, 11143, 291, 51748], "temperature": + 0.0, "avg_logprob": -0.10289731317636919, "compression_ratio": 1.89453125, "no_speech_prob": + 0.00040984523366205394}, {"id": 617, "seek": 469824, "start": 4698.24, "end": 4704.24, + "text": " have systems that are just like we''re not even gonna try to understand + what''s relevant just", "tokens": [50364, 362, 3652, 300, 366, 445, 411, 321, 434, + 406, 754, 799, 853, 281, 1223, 437, 311, 7340, 445, 50664], "temperature": 0.0, + "avg_logprob": -0.11791805940515855, "compression_ratio": 1.7798165137614679, "no_speech_prob": + 0.00018473845557309687}, {"id": 618, "seek": 469824, "start": 4704.24, "end": 4711.04, + 
"text": " tune things and you know yolo ship it and hopefully hopefully things hopefully + things look good", "tokens": [50664, 10864, 721, 293, 291, 458, 288, 7902, 5374, + 309, 293, 4696, 4696, 721, 4696, 721, 574, 665, 51004], "temperature": 0.0, "avg_logprob": + -0.11791805940515855, "compression_ratio": 1.7798165137614679, "no_speech_prob": + 0.00018473845557309687}, {"id": 619, "seek": 469824, "start": 4711.04, "end": 4717.44, + "text": " and what you know and honestly both of those are anti-patterns because + obviously in the case where", "tokens": [51004, 293, 437, 291, 458, 293, 6095, 1293, + 295, 729, 366, 6061, 12, 79, 1161, 3695, 570, 2745, 294, 264, 1389, 689, 51324], + "temperature": 0.0, "avg_logprob": -0.11791805940515855, "compression_ratio": 1.7798165137614679, + "no_speech_prob": 0.00018473845557309687}, {"id": 620, "seek": 469824, "start": + 4717.44, "end": 4724.88, "text": " we just like study the problem we never actually + deliver anything and uh not just as a consultant but", "tokens": [51324, 321, 445, + 411, 2979, 264, 1154, 321, 1128, 767, 4239, 1340, 293, 2232, 406, 445, 382, 257, + 24676, 457, 51696], "temperature": 0.0, "avg_logprob": -0.11791805940515855, "compression_ratio": + 1.7798165137614679, "no_speech_prob": 0.00018473845557309687}, {"id": 621, "seek": + 472488, "start": 4724.88, "end": 4729.84, "text": " as a practitioner working on + a team your stakeholders are gonna lose patience they''re you''re not", "tokens": + [50364, 382, 257, 32125, 1364, 322, 257, 1469, 428, 17779, 366, 799, 3624, 14826, + 436, 434, 291, 434, 406, 50612], "temperature": 0.0, "avg_logprob": -0.13548540227553424, + "compression_ratio": 1.6926406926406927, "no_speech_prob": 0.00010952988668577746}, + {"id": 622, "seek": 472488, "start": 4729.84, "end": 4736.72, "text": " gonna have + much success well the other hand um I''ve seen like I''ve had I had one project + where", "tokens": [50612, 799, 362, 709, 2245, 731, 264, 661, 1011, 1105, 286, 600, + 
1612, 411, 286, 600, 632, 286, 632, 472, 1716, 689, 50956], "temperature": 0.0, + "avg_logprob": -0.13548540227553424, "compression_ratio": 1.6926406926406927, "no_speech_prob": + 0.00010952988668577746}, {"id": 623, "seek": 472488, "start": 4737.76, "end": 4744.16, + "text": " spent months and months and months developing experiments they did have + the ability to AB test but we", "tokens": [51008, 4418, 2493, 293, 2493, 293, 2493, + 6416, 12050, 436, 630, 362, 264, 3485, 281, 13838, 1500, 457, 321, 51328], "temperature": + 0.0, "avg_logprob": -0.13548540227553424, "compression_ratio": 1.6926406926406927, + "no_speech_prob": 0.00010952988668577746}, {"id": 624, "seek": 472488, "start": + 4744.16, "end": 4750.4800000000005, "text": " didn''t really have any ability to + understand or dig below underneath like what was happening at a", "tokens": [51328, + 994, 380, 534, 362, 604, 3485, 281, 1223, 420, 2528, 2507, 7223, 411, 437, 390, + 2737, 412, 257, 51644], "temperature": 0.0, "avg_logprob": -0.13548540227553424, + "compression_ratio": 1.6926406926406927, "no_speech_prob": 0.00010952988668577746}, + {"id": 625, "seek": 475048, "start": 4750.48, "end": 4757.679999999999, "text": + " query level or anything where we just spend months and months experimenting and + through a dozen", "tokens": [50364, 14581, 1496, 420, 1340, 689, 321, 445, 3496, + 2493, 293, 2493, 29070, 293, 807, 257, 16654, 50724], "temperature": 0.0, "avg_logprob": + -0.11136502311343238, "compression_ratio": 1.7555555555555555, "no_speech_prob": + 5.060793773736805e-05}, {"id": 626, "seek": 475048, "start": 4757.679999999999, + "end": 4764.16, "text": " experiments at the wall for AB tests none of them turned + out to matter and I suspect in that case it", "tokens": [50724, 12050, 412, 264, + 2929, 337, 13838, 6921, 6022, 295, 552, 3574, 484, 281, 1871, 293, 286, 9091, 294, + 300, 1389, 309, 51048], "temperature": 0.0, "avg_logprob": -0.11136502311343238, + "compression_ratio": 1.7555555555555555, 
"no_speech_prob": 5.060793773736805e-05}, + {"id": 627, "seek": 475048, "start": 4764.16, "end": 4773.5199999999995, "text": + " was turned out to be a performance issue or a UX issue that was actually more + a problem and really", "tokens": [51048, 390, 3574, 484, 281, 312, 257, 3389, 2734, + 420, 257, 40176, 2734, 300, 390, 767, 544, 257, 1154, 293, 534, 51516], "temperature": + 0.0, "avg_logprob": -0.11136502311343238, "compression_ratio": 1.7555555555555555, + "no_speech_prob": 5.060793773736805e-05}, {"id": 628, "seek": 475048, "start": 4773.5199999999995, + "end": 4779.839999999999, "text": " what you have to be doing in this relevant space + is shipping experiments all the time with whatever", "tokens": [51516, 437, 291, + 362, 281, 312, 884, 294, 341, 7340, 1901, 307, 14122, 12050, 439, 264, 565, 365, + 2035, 51832], "temperature": 0.0, "avg_logprob": -0.11136502311343238, "compression_ratio": + 1.7555555555555555, "no_speech_prob": 5.060793773736805e-05}, {"id": 629, "seek": + 478048, "start": 4780.959999999999, "end": 4787.5199999999995, "text": " for structure + you have to support them while simultaneously like changing the engine that you''re", + "tokens": [50388, 337, 3877, 291, 362, 281, 1406, 552, 1339, 16561, 411, 4473, 264, + 2848, 300, 291, 434, 50716], "temperature": 0.0, "avg_logprob": -0.12182502286979952, + "compression_ratio": 1.7300884955752212, "no_speech_prob": 0.0003891474916599691}, + {"id": 630, "seek": 478048, "start": 4787.5199999999995, "end": 4794.0, "text": + " using to like understand the quality of relevance so as an example you might do + something like", "tokens": [50716, 1228, 281, 411, 1223, 264, 3125, 295, 32684, + 370, 382, 364, 1365, 291, 1062, 360, 746, 411, 51040], "temperature": 0.0, "avg_logprob": + -0.12182502286979952, "compression_ratio": 1.7300884955752212, "no_speech_prob": + 0.0003891474916599691}, {"id": 631, "seek": 478048, "start": 4794.0, "end": 4801.28, + "text": " start out with cupid and start shipping 
things just incrementally with + cupid getting people''s feedback", "tokens": [51040, 722, 484, 365, 4414, 327, 293, + 722, 14122, 721, 445, 26200, 379, 365, 4414, 327, 1242, 561, 311, 5824, 51404], + "temperature": 0.0, "avg_logprob": -0.12182502286979952, "compression_ratio": 1.7300884955752212, + "no_speech_prob": 0.0003891474916599691}, {"id": 632, "seek": 478048, "start": 4801.28, + "end": 4810.08, "text": " as bad as it might be knowing it''s wrong um and start + shipping changes but at the same time with", "tokens": [51404, 382, 1578, 382, 309, + 1062, 312, 5276, 309, 311, 2085, 1105, 293, 722, 14122, 2962, 457, 412, 264, 912, + 565, 365, 51844], "temperature": 0.0, "avg_logprob": -0.12182502286979952, "compression_ratio": + 1.7300884955752212, "no_speech_prob": 0.0003891474916599691}, {"id": 633, "seek": + 481008, "start": 4810.08, "end": 4814.24, "text": " you''re doing that with your + right hand and with your left hand you''re kind of going in like oh we", "tokens": + [50364, 291, 434, 884, 300, 365, 428, 558, 1011, 293, 365, 428, 1411, 1011, 291, + 434, 733, 295, 516, 294, 411, 1954, 321, 50572], "temperature": 0.0, "avg_logprob": + -0.13109847351356788, "compression_ratio": 1.825278810408922, "no_speech_prob": + 0.00013177083746995777}, {"id": 634, "seek": 481008, "start": 4814.24, "end": 4819.44, + "text": " have to start gathering click data because eventually the cupid experimentation + might hit diminishing", "tokens": [50572, 362, 281, 722, 13519, 2052, 1412, 570, + 4728, 264, 4414, 327, 37142, 1062, 2045, 15739, 3807, 50832], "temperature": 0.0, + "avg_logprob": -0.13109847351356788, "compression_ratio": 1.825278810408922, "no_speech_prob": + 0.00013177083746995777}, {"id": 635, "seek": 481008, "start": 4819.44, "end": 4825.6, + "text": " returns or might get it really subtle cases that people aren''t gonna + be able to easily tell me the", "tokens": [50832, 11247, 420, 1062, 483, 309, 534, + 13743, 3331, 300, 561, 3212, 380, 799, 312, 1075, 281, 
3612, 980, 385, 264, 51140], + "temperature": 0.0, "avg_logprob": -0.13109847351356788, "compression_ratio": 1.825278810408922, + "no_speech_prob": 0.00013177083746995777}, {"id": 636, "seek": 481008, "start": + 4825.6, "end": 4831.68, "text": " difference and if you''re not doing both you''re + you''re really gonna get yourself into trouble", "tokens": [51140, 2649, 293, 498, + 291, 434, 406, 884, 1293, 291, 434, 291, 434, 534, 799, 483, 1803, 666, 5253, 51444], + "temperature": 0.0, "avg_logprob": -0.13109847351356788, "compression_ratio": 1.825278810408922, + "no_speech_prob": 0.00013177083746995777}, {"id": 637, "seek": 481008, "start": + 4833.5199999999995, "end": 4839.84, "text": " yeah and it''s it''s amazing you really + described it as a not an individual level experience but like", "tokens": [51536, + 1338, 293, 309, 311, 309, 311, 2243, 291, 534, 7619, 309, 382, 257, 406, 364, 2609, + 1496, 1752, 457, 411, 51852], "temperature": 0.0, "avg_logprob": -0.13109847351356788, + "compression_ratio": 1.825278810408922, "no_speech_prob": 0.00013177083746995777}, + {"id": 638, "seek": 483984, "start": 4839.84, "end": 4847.52, "text": " a team level + experience right like and now well everyone can figure out okay add the data scientist", + "tokens": [50364, 257, 1469, 1496, 1752, 558, 411, 293, 586, 731, 1518, 393, 2573, + 484, 1392, 909, 264, 1412, 12662, 50748], "temperature": 0.0, "avg_logprob": -0.15344037115573883, + "compression_ratio": 1.6021505376344085, "no_speech_prob": 0.0002890203322749585}, + {"id": 639, "seek": 483984, "start": 4847.52, "end": 4856.08, "text": " at the UX + person at the product manager at the research engineers and work together in one + single", "tokens": [50748, 412, 264, 40176, 954, 412, 264, 1674, 6598, 412, 264, + 2132, 11955, 293, 589, 1214, 294, 472, 2167, 51176], "temperature": 0.0, "avg_logprob": + -0.15344037115573883, "compression_ratio": 1.6021505376344085, "no_speech_prob": + 0.0002890203322749585}, {"id": 640, "seek": 
483984, "start": 4856.08, "end": 4864.08, + "text": " concert right yeah totally yeah and that''s a tricky thing to build because + um so the first that yeah", "tokens": [51176, 8543, 558, 1338, 3879, 1338, 293, + 300, 311, 257, 12414, 551, 281, 1322, 570, 1105, 370, 264, 700, 300, 1338, 51576], + "temperature": 0.0, "avg_logprob": -0.15344037115573883, "compression_ratio": 1.6021505376344085, + "no_speech_prob": 0.0002890203322749585}, {"id": 641, "seek": 486408, "start": 4864.24, + "end": 4872.72, "text": " you do need all those roles um and you need a tremendous + amount of data literacy so not only do you", "tokens": [50372, 291, 360, 643, 439, + 729, 9604, 1105, 293, 291, 643, 257, 10048, 2372, 295, 1412, 23166, 370, 406, 787, + 360, 291, 50796], "temperature": 0.0, "avg_logprob": -0.07997472469623272, "compression_ratio": + 1.8844621513944224, "no_speech_prob": 0.0001593780325492844}, {"id": 642, "seek": + 486408, "start": 4872.72, "end": 4879.68, "text": " need uh you need those roles + you need like probably a good strong core of engineering and data", "tokens": [50796, + 643, 2232, 291, 643, 729, 9604, 291, 643, 411, 1391, 257, 665, 2068, 4965, 295, + 7043, 293, 1412, 51144], "temperature": 0.0, "avg_logprob": -0.07997472469623272, + "compression_ratio": 1.8844621513944224, "no_speech_prob": 0.0001593780325492844}, + {"id": 643, "seek": 486408, "start": 4879.68, "end": 4884.8, "text": " working together + so that''s probably a good place to start but as you add as you eventually add", + "tokens": [51144, 1364, 1214, 370, 300, 311, 1391, 257, 665, 1081, 281, 722, 457, + 382, 291, 909, 382, 291, 4728, 909, 51400], "temperature": 0.0, "avg_logprob": -0.07997472469623272, + "compression_ratio": 1.8844621513944224, "no_speech_prob": 0.0001593780325492844}, + {"id": 644, "seek": 486408, "start": 4884.8, "end": 4888.72, "text": " like someone + like a product manager like what does a product manager on a search team do", "tokens": + [51400, 411, 1580, 411, 257, 
1674, 6598, 411, 437, 775, 257, 1674, 6598, 322, 257, + 3164, 1469, 360, 51596], "temperature": 0.0, "avg_logprob": -0.07997472469623272, + "compression_ratio": 1.8844621513944224, "no_speech_prob": 0.0001593780325492844}, + {"id": 645, "seek": 486408, "start": 4889.84, "end": 4893.6, "text": " um and that''s + a really interesting question because I think it''s quite different than building", + "tokens": [51652, 1105, 293, 300, 311, 257, 534, 1880, 1168, 570, 286, 519, 309, + 311, 1596, 819, 813, 2390, 51840], "temperature": 0.0, "avg_logprob": -0.07997472469623272, + "compression_ratio": 1.8844621513944224, "no_speech_prob": 0.0001593780325492844}, + {"id": 646, "seek": 489360, "start": 4893.6, "end": 4899.04, "text": " other features + a product manager on a search team is constantly looking at data", "tokens": [50364, + 661, 4122, 257, 1674, 6598, 322, 257, 3164, 1469, 307, 6460, 1237, 412, 1412, 50636], + "temperature": 0.0, "avg_logprob": -0.0957459294518759, "compression_ratio": 1.7523809523809524, + "no_speech_prob": 4.405945219332352e-05}, {"id": 647, "seek": 489360, "start": 4900.160000000001, + "end": 4905.200000000001, "text": " trying to let''s just say at the query level + because it doesn''t have to be the query level could be a", "tokens": [50692, 1382, + 281, 718, 311, 445, 584, 412, 264, 14581, 1496, 570, 309, 1177, 380, 362, 281, 312, + 264, 14581, 1496, 727, 312, 257, 50944], "temperature": 0.0, "avg_logprob": -0.0957459294518759, + "compression_ratio": 1.7523809523809524, "no_speech_prob": 4.405945219332352e-05}, + {"id": 648, "seek": 489360, "start": 4905.200000000001, "end": 4912.0, "text": " + user or whatever is trying to say like here''s a cluster of problems we have or + opportunities", "tokens": [50944, 4195, 420, 2035, 307, 1382, 281, 584, 411, 510, + 311, 257, 13630, 295, 2740, 321, 362, 420, 4786, 51284], "temperature": 0.0, "avg_logprob": + -0.0957459294518759, "compression_ratio": 1.7523809523809524, "no_speech_prob": + 
4.405945219332352e-05}, {"id": 649, "seek": 489360, "start": 4913.200000000001, + "end": 4919.52, "text": " maybe it''s this kind of search a search for um colors + in products or a search for this type of", "tokens": [51344, 1310, 309, 311, 341, + 733, 295, 3164, 257, 3164, 337, 1105, 4577, 294, 3383, 420, 257, 3164, 337, 341, + 2010, 295, 51660], "temperature": 0.0, "avg_logprob": -0.0957459294518759, "compression_ratio": + 1.7523809523809524, "no_speech_prob": 4.405945219332352e-05}, {"id": 650, "seek": + 491952, "start": 4919.52, "end": 4926.8, "text": " terminology and then have some + like has to have the ability to do the constantly do the analysis", "tokens": [50364, + 27575, 293, 550, 362, 512, 411, 575, 281, 362, 264, 3485, 281, 360, 264, 6460, 360, + 264, 5215, 50728], "temperature": 0.0, "avg_logprob": -0.11765452961862823, "compression_ratio": + 1.7649769585253456, "no_speech_prob": 8.644043555250391e-05}, {"id": 651, "seek": + 491952, "start": 4926.8, "end": 4935.040000000001, "text": " of that data advocate + for data that they need to get implemented and then um understand to some", "tokens": + [50728, 295, 300, 1412, 14608, 337, 1412, 300, 436, 643, 281, 483, 12270, 293, 550, + 1105, 1223, 281, 512, 51140], "temperature": 0.0, "avg_logprob": -0.11765452961862823, + "compression_ratio": 1.7649769585253456, "no_speech_prob": 8.644043555250391e-05}, + {"id": 652, "seek": 491952, "start": 4935.040000000001, "end": 4939.84, "text": + " level like when they work with their data and engineering team what are the experiments + like", "tokens": [51140, 1496, 411, 562, 436, 589, 365, 641, 1412, 293, 7043, 1469, + 437, 366, 264, 12050, 411, 51380], "temperature": 0.0, "avg_logprob": -0.11765452961862823, + "compression_ratio": 1.7649769585253456, "no_speech_prob": 8.644043555250391e-05}, + {"id": 653, "seek": 491952, "start": 4939.84, "end": 4945.76, "text": " let''s think + about half a dozen experiments that could treat this problem prior to ties and 
triage", + "tokens": [51380, 718, 311, 519, 466, 1922, 257, 16654, 12050, 300, 727, 2387, 341, + 1154, 4059, 281, 14039, 293, 1376, 609, 51676], "temperature": 0.0, "avg_logprob": + -0.11765452961862823, "compression_ratio": 1.7649769585253456, "no_speech_prob": + 8.644043555250391e-05}, {"id": 654, "seek": 494576, "start": 4946.16, "end": 4954.24, + "text": " in terms of reward effort trade-off and um and really plan out how we + do those experiments", "tokens": [50384, 294, 2115, 295, 7782, 4630, 4923, 12, 4506, + 293, 1105, 293, 534, 1393, 484, 577, 321, 360, 729, 12050, 50788], "temperature": + 0.0, "avg_logprob": -0.10264726616870398, "compression_ratio": 1.8792270531400965, + "no_speech_prob": 0.0013574406038969755}, {"id": 655, "seek": 494576, "start": 4954.88, + "end": 4960.320000000001, "text": " and when you do that planning it''s not just + about planning the like nuts and bolts of how we get", "tokens": [50820, 293, 562, + 291, 360, 300, 5038, 309, 311, 406, 445, 466, 5038, 264, 411, 10483, 293, 18127, + 295, 577, 321, 483, 51092], "temperature": 0.0, "avg_logprob": -0.10264726616870398, + "compression_ratio": 1.8792270531400965, "no_speech_prob": 0.0013574406038969755}, + {"id": 656, "seek": 494576, "start": 4960.320000000001, "end": 4965.2, "text": " + this experiment into production like we built this pipeline we do these things it''s + also building", "tokens": [51092, 341, 5120, 666, 4265, 411, 321, 3094, 341, 15517, + 321, 360, 613, 721, 309, 311, 611, 2390, 51336], "temperature": 0.0, "avg_logprob": + -0.10264726616870398, "compression_ratio": 1.8792270531400965, "no_speech_prob": + 0.0013574406038969755}, {"id": 657, "seek": 494576, "start": 4965.2, "end": 4972.8, + "text": " the like how will we measure how we answer the questions about those experiments + um and that''s a pretty", "tokens": [51336, 264, 411, 577, 486, 321, 3481, 577, + 321, 1867, 264, 1651, 466, 729, 12050, 1105, 293, 300, 311, 257, 1238, 51716], "temperature": + 0.0, 
"avg_logprob": -0.10264726616870398, "compression_ratio": 1.8792270531400965, + "no_speech_prob": 0.0013574406038969755}, {"id": 658, "seek": 497280, "start": 4973.4400000000005, + "end": 4980.64, "text": " I feel like that''s one of the toughest roles that''s + a unicorn that is it''s hard to hard to", "tokens": [50396, 286, 841, 411, 300, + 311, 472, 295, 264, 35037, 9604, 300, 311, 257, 28122, 300, 307, 309, 311, 1152, + 281, 1152, 281, 50756], "temperature": 0.0, "avg_logprob": -0.1961384982597537, + "compression_ratio": 1.778894472361809, "no_speech_prob": 0.0013565749395638704}, + {"id": 659, "seek": 497280, "start": 4982.16, "end": 4986.8, "text": " have someone + with all of those skills but it''s also really essential to really be able to", + "tokens": [50832, 362, 1580, 365, 439, 295, 729, 3942, 457, 309, 311, 611, 534, + 7115, 281, 534, 312, 1075, 281, 51064], "temperature": 0.0, "avg_logprob": -0.1961384982597537, + "compression_ratio": 1.778894472361809, "no_speech_prob": 0.0013565749395638704}, + {"id": 660, "seek": 497280, "start": 4987.68, "end": 4993.68, "text": " have a really + successful search team really accurately put I mean I''m still", "tokens": [51108, + 362, 257, 534, 4406, 3164, 1469, 534, 20095, 829, 286, 914, 286, 478, 920, 51408], + "temperature": 0.0, "avg_logprob": -0.1961384982597537, "compression_ratio": 1.778894472361809, + "no_speech_prob": 0.0013565749395638704}, {"id": 661, "seek": 497280, "start": 4993.68, + "end": 4999.360000000001, "text": " learning the product manager you know roll myself + but like that''s exactly right you know like you", "tokens": [51408, 2539, 264, + 1674, 6598, 291, 458, 3373, 2059, 457, 411, 300, 311, 2293, 558, 291, 458, 411, + 291, 51692], "temperature": 0.0, "avg_logprob": -0.1961384982597537, "compression_ratio": + 1.778894472361809, "no_speech_prob": 0.0013565749395638704}, {"id": 662, "seek": + 499936, "start": 4999.36, "end": 5005.599999999999, "text": " need to generate the + insight for 
yourself you you''re constantly like a detective work you know", "tokens": + [50364, 643, 281, 8460, 264, 11269, 337, 1803, 291, 291, 434, 6460, 411, 257, 25571, + 589, 291, 458, 50676], "temperature": 0.0, "avg_logprob": -0.12253142254693168, + "compression_ratio": 1.8697318007662835, "no_speech_prob": 0.0008889682940207422}, + {"id": 663, "seek": 499936, "start": 5005.599999999999, "end": 5011.28, "text": + " you keep looking yeah that''s a good way of putting your detective uh you have + to be a really good", "tokens": [50676, 291, 1066, 1237, 1338, 300, 311, 257, 665, + 636, 295, 3372, 428, 25571, 2232, 291, 362, 281, 312, 257, 534, 665, 50960], "temperature": + 0.0, "avg_logprob": -0.12253142254693168, "compression_ratio": 1.8697318007662835, + "no_speech_prob": 0.0008889682940207422}, {"id": 664, "seek": 499936, "start": 5011.28, + "end": 5017.2, "text": " detective and then you have to like you also have to like + figure out where you''re going to go digging", "tokens": [50960, 25571, 293, 550, + 291, 362, 281, 411, 291, 611, 362, 281, 411, 2573, 484, 689, 291, 434, 516, 281, + 352, 17343, 51256], "temperature": 0.0, "avg_logprob": -0.12253142254693168, "compression_ratio": + 1.8697318007662835, "no_speech_prob": 0.0008889682940207422}, {"id": 665, "seek": + 499936, "start": 5017.2, "end": 5022.48, "text": " as a detective what am I gonna + maybe I need to set up the like a team of manual laborers because", "tokens": [51256, + 382, 257, 25571, 437, 669, 286, 799, 1310, 286, 643, 281, 992, 493, 264, 411, 257, + 1469, 295, 9688, 5938, 433, 570, 51520], "temperature": 0.0, "avg_logprob": -0.12253142254693168, + "compression_ratio": 1.8697318007662835, "no_speech_prob": 0.0008889682940207422}, + {"id": 666, "seek": 499936, "start": 5022.48, "end": 5026.48, "text": " there''s + something on our click data that''s not quite right or or do something different + with our", "tokens": [51520, 456, 311, 746, 322, 527, 2052, 1412, 300, 311, 406, + 1596, 558, 420, 420, 
360, 746, 819, 365, 527, 51720], "temperature": 0.0, "avg_logprob": + -0.12253142254693168, "compression_ratio": 1.8697318007662835, "no_speech_prob": + 0.0008889682940207422}, {"id": 667, "seek": 502648, "start": 5026.48, "end": 5029.839999999999, + "text": " click data and it''s like you really have to be able to understand and + appreciate", "tokens": [50364, 2052, 1412, 293, 309, 311, 411, 291, 534, 362, 281, + 312, 1075, 281, 1223, 293, 4449, 50532], "temperature": 0.0, "avg_logprob": -0.11947676965168544, + "compression_ratio": 1.789272030651341, "no_speech_prob": 0.0017655539559200406}, + {"id": 668, "seek": 502648, "start": 5030.5599999999995, "end": 5037.5199999999995, + "text": " how the nature of your evidence yeah exactly and and maybe to add to that + like when I used to be an", "tokens": [50568, 577, 264, 3687, 295, 428, 4467, 1338, + 2293, 293, 293, 1310, 281, 909, 281, 300, 411, 562, 286, 1143, 281, 312, 364, 50916], + "temperature": 0.0, "avg_logprob": -0.11947676965168544, "compression_ratio": 1.789272030651341, + "no_speech_prob": 0.0017655539559200406}, {"id": 669, "seek": 502648, "start": 5037.5199999999995, + "end": 5042.719999999999, "text": " engineer what do you do daily you open Gira + and you say what''s the next ticket on my name so", "tokens": [50916, 11403, 437, + 360, 291, 360, 5212, 291, 1269, 460, 4271, 293, 291, 584, 437, 311, 264, 958, 10550, + 322, 452, 1315, 370, 51176], "temperature": 0.0, "avg_logprob": -0.11947676965168544, + "compression_ratio": 1.789272030651341, "no_speech_prob": 0.0017655539559200406}, + {"id": 670, "seek": 502648, "start": 5042.719999999999, "end": 5048.719999999999, + "text": " somebody thought about it somebody says what needs to be done they don''t + tell you that this might", "tokens": [51176, 2618, 1194, 466, 309, 2618, 1619, 437, + 2203, 281, 312, 1096, 436, 500, 380, 980, 291, 300, 341, 1062, 51476], "temperature": + 0.0, "avg_logprob": -0.11947676965168544, "compression_ratio": 1.789272030651341, 
+ "no_speech_prob": 0.0017655539559200406}, {"id": 671, "seek": 502648, "start": 5048.719999999999, + "end": 5053.839999999999, "text": " be an experiment but like it''s given right + with product management I don''t open Gira and I know", "tokens": [51476, 312, 364, + 5120, 457, 411, 309, 311, 2212, 558, 365, 1674, 4592, 286, 500, 380, 1269, 460, + 4271, 293, 286, 458, 51732], "temperature": 0.0, "avg_logprob": -0.11947676965168544, + "compression_ratio": 1.789272030651341, "no_speech_prob": 0.0017655539559200406}, + {"id": 672, "seek": 505384, "start": 5053.84, "end": 5060.72, "text": " what to + do every day I''m like let''s think uh you know okay look at metrics uh look at + query logs", "tokens": [50364, 437, 281, 360, 633, 786, 286, 478, 411, 718, 311, + 519, 2232, 291, 458, 1392, 574, 412, 16367, 2232, 574, 412, 14581, 20820, 50708], + "temperature": 0.0, "avg_logprob": -0.14453180446181185, "compression_ratio": 1.6891891891891893, + "no_speech_prob": 0.0014227429637685418}, {"id": 673, "seek": 505384, "start": 5061.2, + "end": 5067.4400000000005, "text": " see what the engineering has done what experiment + we just completed try to combine this", "tokens": [50732, 536, 437, 264, 7043, 575, + 1096, 437, 5120, 321, 445, 7365, 853, 281, 10432, 341, 51044], "temperature": 0.0, + "avg_logprob": -0.14453180446181185, "compression_ratio": 1.6891891891891893, "no_speech_prob": + 0.0014227429637685418}, {"id": 674, "seek": 505384, "start": 5067.4400000000005, + "end": 5073.28, "text": " pieces into yeah what did we learn from that experiment + and like what might be the next step yeah", "tokens": [51044, 3755, 666, 1338, 437, + 630, 321, 1466, 490, 300, 5120, 293, 411, 437, 1062, 312, 264, 958, 1823, 1338, + 51336], "temperature": 0.0, "avg_logprob": -0.14453180446181185, "compression_ratio": + 1.6891891891891893, "no_speech_prob": 0.0014227429637685418}, {"id": 675, "seek": + 505384, "start": 5074.400000000001, "end": 5082.8, "text": " yeah and and also subscribing 
+ to bold changes sometimes it''s easy to kind of go step by step", "tokens": [51392, + 1338, 293, 293, 611, 19981, 281, 11928, 2962, 2171, 309, 311, 1858, 281, 733, 295, + 352, 1823, 538, 1823, 51812], "temperature": 0.0, "avg_logprob": -0.14453180446181185, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.0014227429637685418}, + {"id": 676, "seek": 508280, "start": 5082.8, "end": 5090.08, "text": " evolutionary + you know but sometimes you need to jump over you steps yeah requires", "tokens": + [50364, 27567, 291, 458, 457, 2171, 291, 643, 281, 3012, 670, 291, 4439, 1338, 7029, + 50728], "temperature": 0.0, "avg_logprob": -0.19341845194498697, "compression_ratio": + 1.6827956989247312, "no_speech_prob": 0.00016242454876191914}, {"id": 677, "seek": + 508280, "start": 5090.96, "end": 5094.320000000001, "text": " boldness and then + messaging that and saying hey we need we need this", "tokens": [50772, 11928, 1287, + 293, 550, 21812, 300, 293, 1566, 4177, 321, 643, 321, 643, 341, 50940], "temperature": + 0.0, "avg_logprob": -0.19341845194498697, "compression_ratio": 1.6827956989247312, + "no_speech_prob": 0.00016242454876191914}, {"id": 678, "seek": 508280, "start": + 5095.84, "end": 5099.76, "text": " I know it''s almost like going after it makes + me think of going after like", "tokens": [51016, 286, 458, 309, 311, 1920, 411, + 516, 934, 309, 1669, 385, 519, 295, 516, 934, 411, 51212], "temperature": 0.0, "avg_logprob": + -0.19341845194498697, "compression_ratio": 1.6827956989247312, "no_speech_prob": + 0.00016242454876191914}, {"id": 679, "seek": 508280, "start": 5100.96, "end": 5108.4800000000005, + "text": " then you know to get a get um get grant research for like most you know + in the US if you", "tokens": [51272, 550, 291, 458, 281, 483, 257, 483, 1105, 483, + 6386, 2132, 337, 411, 881, 291, 458, 294, 264, 2546, 498, 291, 51648], "temperature": + 0.0, "avg_logprob": -0.19341845194498697, "compression_ratio": 1.6827956989247312, + 
"no_speech_prob": 0.00016242454876191914}, {"id": 680, "seek": 510848, "start": + 5109.2, "end": 5113.919999999999, "text": " if you want to do some big research + project at a university you go to a government agency and you", "tokens": [50400, + 498, 291, 528, 281, 360, 512, 955, 2132, 1716, 412, 257, 5454, 291, 352, 281, 257, + 2463, 7934, 293, 291, 50636], "temperature": 0.0, "avg_logprob": -0.06733219568119493, + "compression_ratio": 1.8480392156862746, "no_speech_prob": 0.00019729621999431401}, + {"id": 681, "seek": 510848, "start": 5113.919999999999, "end": 5119.04, "text": + " give this big proposal and for these big bets you almost like it''s almost like + that where it''s like", "tokens": [50636, 976, 341, 955, 11494, 293, 337, 613, 955, + 39922, 291, 1920, 411, 309, 311, 1920, 411, 300, 689, 309, 311, 411, 50892], "temperature": + 0.0, "avg_logprob": -0.06733219568119493, "compression_ratio": 1.8480392156862746, + "no_speech_prob": 0.00019729621999431401}, {"id": 682, "seek": 510848, "start": + 5119.759999999999, "end": 5125.36, "text": " you yes we have this like side over + here that''s just constantly evolving whatever", "tokens": [50928, 291, 2086, 321, + 362, 341, 411, 1252, 670, 510, 300, 311, 445, 6460, 21085, 2035, 51208], "temperature": + 0.0, "avg_logprob": -0.06733219568119493, "compression_ratio": 1.8480392156862746, + "no_speech_prob": 0.00019729621999431401}, {"id": 683, "seek": 510848, "start": + 5125.36, "end": 5130.32, "text": " it currently works but then like for these big + bets you almost have to think about it in terms of", "tokens": [51208, 309, 4362, + 1985, 457, 550, 411, 337, 613, 955, 39922, 291, 1920, 362, 281, 519, 466, 309, 294, + 2115, 295, 51456], "temperature": 0.0, "avg_logprob": -0.06733219568119493, "compression_ratio": + 1.8480392156862746, "no_speech_prob": 0.00019729621999431401}, {"id": 684, "seek": + 513032, "start": 5130.96, "end": 5138.32, "text": " we want to spend x amount of + time researching this area to 
see if this direction works out", "tokens": [50396, + 321, 528, 281, 3496, 2031, 2372, 295, 565, 24176, 341, 1859, 281, 536, 498, 341, + 3513, 1985, 484, 50764], "temperature": 0.0, "avg_logprob": -0.08422925305920978, + "compression_ratio": 1.7117117117117118, "no_speech_prob": 0.00010744218889158219}, + {"id": 685, "seek": 513032, "start": 5139.12, "end": 5145.679999999999, "text": + " and then as part of that you also have to be like these are the early tests the + prototypes before", "tokens": [50804, 293, 550, 382, 644, 295, 300, 291, 611, 362, + 281, 312, 411, 613, 366, 264, 2440, 6921, 264, 42197, 949, 51132], "temperature": + 0.0, "avg_logprob": -0.08422925305920978, "compression_ratio": 1.7117117117117118, + "no_speech_prob": 0.00010744218889158219}, {"id": 686, "seek": 513032, "start": + 5145.679999999999, "end": 5151.599999999999, "text": " we build the big thing to + know if we should invest even further and that''s that''s a tricky thing I", "tokens": + [51132, 321, 1322, 264, 955, 551, 281, 458, 498, 321, 820, 1963, 754, 3052, 293, + 300, 311, 300, 311, 257, 12414, 551, 286, 51428], "temperature": 0.0, "avg_logprob": + -0.08422925305920978, "compression_ratio": 1.7117117117117118, "no_speech_prob": + 0.00010744218889158219}, {"id": 687, "seek": 513032, "start": 5151.599999999999, + "end": 5156.48, "text": " think that''s something a really good product manager + can sort of like coach the stakeholders", "tokens": [51428, 519, 300, 311, 746, + 257, 534, 665, 1674, 6598, 393, 1333, 295, 411, 6560, 264, 17779, 51672], "temperature": + 0.0, "avg_logprob": -0.08422925305920978, "compression_ratio": 1.7117117117117118, + "no_speech_prob": 0.00010744218889158219}, {"id": 688, "seek": 515648, "start": + 5156.48, "end": 5162.719999999999, "text": " and thinking about these things of + like and and thinking about them as bets and not thinking", "tokens": [50364, 293, + 1953, 466, 613, 721, 295, 411, 293, 293, 1953, 466, 552, 382, 39922, 293, 406, 1953, + 
50676], "temperature": 0.0, "avg_logprob": -0.1854982716696603, "compression_ratio": + 1.8768472906403941, "no_speech_prob": 0.00442796666175127}, {"id": 689, "seek": + 515648, "start": 5162.719999999999, "end": 5166.639999999999, "text": " about them + as like sure things that we know they''re going to work out this also really important", + "tokens": [50676, 466, 552, 382, 411, 988, 721, 300, 321, 458, 436, 434, 516, 281, + 589, 484, 341, 611, 534, 1021, 50872], "temperature": 0.0, "avg_logprob": -0.1854982716696603, + "compression_ratio": 1.8768472906403941, "no_speech_prob": 0.00442796666175127}, + {"id": 690, "seek": 515648, "start": 5167.2, "end": 5176.0, "text": " yeah exactly + yeah just one example came to came to my mind that was it eBay when they didn''t + have", "tokens": [50900, 1338, 2293, 1338, 445, 472, 1365, 1361, 281, 1361, 281, + 452, 1575, 300, 390, 309, 33803, 562, 436, 994, 380, 362, 51340], "temperature": + 0.0, "avg_logprob": -0.1854982716696603, "compression_ratio": 1.8768472906403941, + "no_speech_prob": 0.00442796666175127}, {"id": 691, "seek": 515648, "start": 5176.0, + "end": 5181.04, "text": " type of head when they added it they they tapped into + something like a hundred million dollar", "tokens": [51340, 2010, 295, 1378, 562, + 436, 3869, 309, 436, 436, 38693, 666, 746, 411, 257, 3262, 2459, 7241, 51592], "temperature": + 0.0, "avg_logprob": -0.1854982716696603, "compression_ratio": 1.8768472906403941, + "no_speech_prob": 0.00442796666175127}, {"id": 692, "seek": 518104, "start": 5181.04, + "end": 5189.84, "text": " market you know because because you reduce the the time + spent in each search session right", "tokens": [50364, 2142, 291, 458, 570, 570, + 291, 5407, 264, 264, 565, 4418, 294, 1184, 3164, 5481, 558, 50804], "temperature": + 0.0, "avg_logprob": -0.23261669476826985, "compression_ratio": 1.7100591715976332, + "no_speech_prob": 0.0046044508926570415}, {"id": 693, "seek": 518104, "start": 5189.84, + "end": 5196.32, "text": 
" you might get the faster which means you will get the + faster transaction or like get faster so yeah", "tokens": [50804, 291, 1062, 483, + 264, 4663, 597, 1355, 291, 486, 483, 264, 4663, 14425, 420, 411, 483, 4663, 370, + 1338, 51128], "temperature": 0.0, "avg_logprob": -0.23261669476826985, "compression_ratio": + 1.7100591715976332, "no_speech_prob": 0.0046044508926570415}, {"id": 694, "seek": + 518104, "start": 5196.32, "end": 5203.84, "text": " totally probably probably was + in involving product management thinking what if we do this but it''s", "tokens": + [51128, 3879, 1391, 1391, 390, 294, 17030, 1674, 4592, 1953, 437, 498, 321, 360, + 341, 457, 309, 311, 51504], "temperature": 0.0, "avg_logprob": -0.23261669476826985, + "compression_ratio": 1.7100591715976332, "no_speech_prob": 0.0046044508926570415}, + {"id": 695, "seek": 520384, "start": 5204.32, "end": 5212.32, "text": " yeah it''s + like outside of box thinking yeah totally totally and I before we close off I mean + I", "tokens": [50388, 1338, 309, 311, 411, 2380, 295, 2424, 1953, 1338, 3879, 3879, + 293, 286, 949, 321, 1998, 766, 286, 914, 286, 50788], "temperature": 0.0, "avg_logprob": + -0.17055638269944626, "compression_ratio": 1.7545454545454546, "no_speech_prob": + 0.012504853308200836}, {"id": 696, "seek": 520384, "start": 5212.32, "end": 5218.08, + "text": " really enjoyed this conversation bag and I think we could speak entire + day you know my engineer", "tokens": [50788, 534, 4626, 341, 3761, 3411, 293, 286, + 519, 321, 727, 1710, 2302, 786, 291, 458, 452, 11403, 51076], "temperature": 0.0, + "avg_logprob": -0.17055638269944626, "compression_ratio": 1.7545454545454546, "no_speech_prob": + 0.012504853308200836}, {"id": 697, "seek": 520384, "start": 5218.08, "end": 5224.4800000000005, + "text": " having like a lot of fun now like really getting into this but I love + asking this question and", "tokens": [51076, 1419, 411, 257, 688, 295, 1019, 586, + 411, 534, 1242, 666, 341, 457, 286, 959, 
3365, 341, 1168, 293, 51396], "temperature": + 0.0, "avg_logprob": -0.17055638269944626, "compression_ratio": 1.7545454545454546, + "no_speech_prob": 0.012504853308200836}, {"id": 698, "seek": 520384, "start": 5224.4800000000005, + "end": 5231.4400000000005, "text": " you partially answered it during this podcast + the y-question what really like you''ve done a ton it''s", "tokens": [51396, 291, + 18886, 10103, 309, 1830, 341, 7367, 264, 288, 12, 20343, 313, 437, 534, 411, 291, + 600, 1096, 257, 2952, 309, 311, 51744], "temperature": 0.0, "avg_logprob": -0.17055638269944626, + "compression_ratio": 1.7545454545454546, "no_speech_prob": 0.012504853308200836}, + {"id": 699, "seek": 523144, "start": 5231.44, "end": 5237.2, "text": " not just + that you imagine doing things or told someone to do you actually did it yourself + like", "tokens": [50364, 406, 445, 300, 291, 3811, 884, 721, 420, 1907, 1580, 281, + 360, 291, 767, 630, 309, 1803, 411, 50652], "temperature": 0.0, "avg_logprob": -0.15965603099149817, + "compression_ratio": 1.6680851063829787, "no_speech_prob": 0.002001760993152857}, + {"id": 700, "seek": 523144, "start": 5237.2, "end": 5244.879999999999, "text": " + keep it learning to rank plugin explainer you know books all of these really physical + almost physical", "tokens": [50652, 1066, 309, 2539, 281, 6181, 23407, 2903, 260, + 291, 458, 3642, 439, 295, 613, 534, 4001, 1920, 4001, 51036], "temperature": 0.0, + "avg_logprob": -0.15965603099149817, "compression_ratio": 1.6680851063829787, "no_speech_prob": + 0.002001760993152857}, {"id": 701, "seek": 523144, "start": 5244.879999999999, "end": + 5252.32, "text": " objects right books are it wouldn''t matter yeah so and but you + still keep going and going and I mean", "tokens": [51036, 6565, 558, 3642, 366, + 309, 2759, 380, 1871, 1338, 370, 293, 457, 291, 920, 1066, 516, 293, 516, 293, 286, + 914, 51408], "temperature": 0.0, "avg_logprob": -0.15965603099149817, "compression_ratio": + 1.6680851063829787, 
"no_speech_prob": 0.002001760993152857}, {"id": 702, "seek": + 523144, "start": 5252.32, "end": 5258.08, "text": " you talk at conferences you + push so much material on LinkedIn and Twitter I barely can fall up", "tokens": [51408, + 291, 751, 412, 22032, 291, 2944, 370, 709, 2527, 322, 20657, 293, 5794, 286, 10268, + 393, 2100, 493, 51696], "temperature": 0.0, "avg_logprob": -0.15965603099149817, + "compression_ratio": 1.6680851063829787, "no_speech_prob": 0.002001760993152857}, + {"id": 703, "seek": 525808, "start": 5258.64, "end": 5268.32, "text": " like what + drives you in this space that''s a that''s a great question I think I think what + gets me", "tokens": [50392, 411, 437, 11754, 291, 294, 341, 1901, 300, 311, 257, + 300, 311, 257, 869, 1168, 286, 519, 286, 519, 437, 2170, 385, 50876], "temperature": + 0.0, "avg_logprob": -0.10705153521369486, "compression_ratio": 1.7568807339449541, + "no_speech_prob": 0.001004832680337131}, {"id": 704, "seek": 525808, "start": 5268.32, + "end": 5275.5199999999995, "text": " excited about this space is how hey it feels + like the future of how people interact with with", "tokens": [50876, 2919, 466, + 341, 1901, 307, 577, 4177, 309, 3417, 411, 264, 2027, 295, 577, 561, 4648, 365, + 365, 51236], "temperature": 0.0, "avg_logprob": -0.10705153521369486, "compression_ratio": + 1.7568807339449541, "no_speech_prob": 0.001004832680337131}, {"id": 705, "seek": + 525808, "start": 5275.5199999999995, "end": 5281.2, "text": " computers like searches + the Google for example for a long time people call it Google like it''s", "tokens": + [51236, 10807, 411, 26701, 264, 3329, 337, 1365, 337, 257, 938, 565, 561, 818, 309, + 3329, 411, 309, 311, 51520], "temperature": 0.0, "avg_logprob": -0.10705153521369486, + "compression_ratio": 1.7568807339449541, "no_speech_prob": 0.001004832680337131}, + {"id": 706, "seek": 525808, "start": 5281.2, "end": 5287.04, "text": " really a + command line interface but it understands natural language and 
I feel like more + and more", "tokens": [51520, 534, 257, 5622, 1622, 9226, 457, 309, 15146, 3303, + 2856, 293, 286, 841, 411, 544, 293, 544, 51812], "temperature": 0.0, "avg_logprob": + -0.10705153521369486, "compression_ratio": 1.7568807339449541, "no_speech_prob": + 0.001004832680337131}, {"id": 707, "seek": 528704, "start": 5287.04, "end": 5294.48, + "text": " interfaces are this like fuzzy interaction that''s search like and it''s + this thing creeping up on us", "tokens": [50364, 28416, 366, 341, 411, 34710, 9285, + 300, 311, 3164, 411, 293, 309, 311, 341, 551, 47753, 493, 322, 505, 50736], "temperature": + 0.0, "avg_logprob": -0.08148660984906284, "compression_ratio": 1.742489270386266, + "no_speech_prob": 0.00010909065167652443}, {"id": 708, "seek": 528704, "start": + 5294.48, "end": 5301.28, "text": " that people aren''t quite realizing um and then + the other it that just makes it a fascinating field of", "tokens": [50736, 300, + 561, 3212, 380, 1596, 16734, 1105, 293, 550, 264, 661, 309, 300, 445, 1669, 309, + 257, 10343, 2519, 295, 51076], "temperature": 0.0, "avg_logprob": -0.08148660984906284, + "compression_ratio": 1.742489270386266, "no_speech_prob": 0.00010909065167652443}, + {"id": 709, "seek": 528704, "start": 5301.28, "end": 5308.56, "text": " like the + intersection of data and engineering and product and UX and you have to have all + of these parts", "tokens": [51076, 411, 264, 15236, 295, 1412, 293, 7043, 293, 1674, + 293, 40176, 293, 291, 362, 281, 362, 439, 295, 613, 3166, 51440], "temperature": + 0.0, "avg_logprob": -0.08148660984906284, "compression_ratio": 1.742489270386266, + "no_speech_prob": 0.00010909065167652443}, {"id": 710, "seek": 528704, "start": + 5308.56, "end": 5314.96, "text": " of your brain working together to help sort of + like understand and solve the problem um it''s really", "tokens": [51440, 295, 428, + 3567, 1364, 1214, 281, 854, 1333, 295, 411, 1223, 293, 5039, 264, 1154, 1105, 309, + 311, 534, 51760], 
"temperature": 0.0, "avg_logprob": -0.08148660984906284, "compression_ratio": + 1.742489270386266, "no_speech_prob": 0.00010909065167652443}, {"id": 711, "seek": + 531496, "start": 5314.96, "end": 5323.52, "text": " just a it''s a huge intellectual + challenge but I you know more you know more foundationally just like", "tokens": + [50364, 445, 257, 309, 311, 257, 2603, 12576, 3430, 457, 286, 291, 458, 544, 291, + 458, 544, 7030, 379, 445, 411, 50792], "temperature": 0.0, "avg_logprob": -0.14865259594387478, + "compression_ratio": 1.811926605504587, "no_speech_prob": 0.00030252744909375906}, + {"id": 712, "seek": 531496, "start": 5323.52, "end": 5328.88, "text": " I find like + interacting with interesting and great people in the field also just drives me is + just", "tokens": [50792, 286, 915, 411, 18017, 365, 1880, 293, 869, 561, 294, 264, + 2519, 611, 445, 11754, 385, 307, 445, 51060], "temperature": 0.0, "avg_logprob": + -0.14865259594387478, "compression_ratio": 1.811926605504587, "no_speech_prob": + 0.00030252744909375906}, {"id": 713, "seek": 531496, "start": 5328.88, "end": 5334.16, + "text": " how fun it is to interact out there with people like you Demetri and other + people who are just like", "tokens": [51060, 577, 1019, 309, 307, 281, 4648, 484, + 456, 365, 561, 411, 291, 4686, 302, 470, 293, 661, 561, 567, 366, 445, 411, 51324], + "temperature": 0.0, "avg_logprob": -0.14865259594387478, "compression_ratio": 1.811926605504587, + "no_speech_prob": 0.00030252744909375906}, {"id": 714, "seek": 531496, "start": + 5335.04, "end": 5339.36, "text": " also get excited about the problem and like to + nerd out about it so that also kind of drives me", "tokens": [51368, 611, 483, 2919, + 466, 264, 1154, 293, 411, 281, 23229, 484, 466, 309, 370, 300, 611, 733, 295, 11754, + 385, 51584], "temperature": 0.0, "avg_logprob": -0.14865259594387478, "compression_ratio": + 1.811926605504587, "no_speech_prob": 0.00030252744909375906}, {"id": 715, "seek": + 533936, "start": 
5339.36, "end": 5345.92, "text": " is just the social aspect of + sharing my crazy ideas or products or books and getting feedback and", "tokens": + [50364, 307, 445, 264, 2093, 4171, 295, 5414, 452, 3219, 3487, 420, 3383, 420, 3642, + 293, 1242, 5824, 293, 50692], "temperature": 0.0, "avg_logprob": -0.1322775253882775, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.0022065704688429832}, + {"id": 716, "seek": 533936, "start": 5345.92, "end": 5351.92, "text": " like continuing + the conversation yeah and I think it''s endless you''re doing a great contribution", + "tokens": [50692, 411, 9289, 264, 3761, 1338, 293, 286, 519, 309, 311, 16144, 291, + 434, 884, 257, 869, 13150, 50992], "temperature": 0.0, "avg_logprob": -0.1322775253882775, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.0022065704688429832}, + {"id": 717, "seek": 533936, "start": 5351.92, "end": 5358.0, "text": " there but + it''s like endless journey in many ways right so many facets so much totally", "tokens": + [50992, 456, 457, 309, 311, 411, 16144, 4671, 294, 867, 2098, 558, 370, 867, 49752, + 370, 709, 3879, 51296], "temperature": 0.0, "avg_logprob": -0.1322775253882775, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.0022065704688429832}, + {"id": 718, "seek": 533936, "start": 5358.0, "end": 5365.12, "text": " dimensionality + yeah totally absolutely and of course I think people want to learn these things", + "tokens": [51296, 10139, 1860, 1338, 3879, 3122, 293, 295, 1164, 286, 519, 561, + 528, 281, 1466, 613, 721, 51652], "temperature": 0.0, "avg_logprob": -0.1322775253882775, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.0022065704688429832}, + {"id": 719, "seek": 536512, "start": 5365.84, "end": 5372.5599999999995, "text": + " I myself as well from time to time I''m subscribed to a course and I just have + the blast of I don''t know", "tokens": [50400, 286, 2059, 382, 731, 490, 565, 281, + 565, 286, 478, 16665, 281, 257, 1164, 
293, 286, 445, 362, 264, 12035, 295, 286, + 500, 380, 458, 50736], "temperature": 0.0, "avg_logprob": -0.11717091969081334, + "compression_ratio": 1.6740331491712708, "no_speech_prob": 0.01414013933390379}, + {"id": 720, "seek": 536512, "start": 5372.5599999999995, "end": 5379.44, "text": + " four weeks two months whatever not for the certificate but for the for the knowledge + and for that", "tokens": [50736, 1451, 3259, 732, 2493, 2035, 406, 337, 264, 15953, + 457, 337, 264, 337, 264, 3601, 293, 337, 300, 51080], "temperature": 0.0, "avg_logprob": + -0.11717091969081334, "compression_ratio": 1.6740331491712708, "no_speech_prob": + 0.01414013933390379}, {"id": 721, "seek": 536512, "start": 5379.44, "end": 5387.36, + "text": " feeling of connection to that knowledge and with that I want to ask you + if you have any announcements", "tokens": [51080, 2633, 295, 4984, 281, 300, 3601, + 293, 365, 300, 286, 528, 281, 1029, 291, 498, 291, 362, 604, 23785, 51476], "temperature": + 0.0, "avg_logprob": -0.11717091969081334, "compression_ratio": 1.6740331491712708, + "no_speech_prob": 0.01414013933390379}, {"id": 722, "seek": 538736, "start": 5388.32, + "end": 5393.12, "text": " yeah so I''m doing a course with sphere sphere is a fantastic", + "tokens": [50412, 1338, 370, 286, 478, 884, 257, 1164, 365, 16687, 16687, 307, 257, + 5456, 50652], "temperature": 0.0, "avg_logprob": -0.17331908878527189, "compression_ratio": + 1.691542288557214, "no_speech_prob": 0.0028186526615172625}, {"id": 723, "seek": + 538736, "start": 5395.04, "end": 5402.24, "text": " company that is sort of trying + to build these like next level courses you know it''s not your", "tokens": [50748, + 2237, 300, 307, 1333, 295, 1382, 281, 1322, 613, 411, 958, 1496, 7712, 291, 458, + 309, 311, 406, 428, 51108], "temperature": 0.0, "avg_logprob": -0.17331908878527189, + "compression_ratio": 1.691542288557214, "no_speech_prob": 0.0028186526615172625}, + {"id": 724, "seek": 538736, "start": 5402.24, "end": 
5406.5599999999995, "text": + " basic utemy course we''re learning some basic things it''s really like it''s almost + like", "tokens": [51108, 3875, 2839, 3633, 1164, 321, 434, 2539, 512, 3875, 721, + 309, 311, 534, 411, 309, 311, 1920, 411, 51324], "temperature": 0.0, "avg_logprob": + -0.17331908878527189, "compression_ratio": 1.691542288557214, "no_speech_prob": + 0.0028186526615172625}, {"id": 725, "seek": 538736, "start": 5407.2, "end": 5412.0, + "text": " a master class with a professional and they are really focused on machine + learning engineering right", "tokens": [51356, 257, 4505, 1508, 365, 257, 4843, + 293, 436, 366, 534, 5178, 322, 3479, 2539, 7043, 558, 51596], "temperature": 0.0, + "avg_logprob": -0.17331908878527189, "compression_ratio": 1.691542288557214, "no_speech_prob": + 0.0028186526615172625}, {"id": 726, "seek": 541200, "start": 5412.0, "end": 5417.28, + "text": " now so they''ve recommended systems and all these things that I''m doing + an ML powered search course", "tokens": [50364, 586, 370, 436, 600, 9628, 3652, + 293, 439, 613, 721, 300, 286, 478, 884, 364, 21601, 17786, 3164, 1164, 50628], "temperature": + 0.0, "avg_logprob": -0.14415745735168456, "compression_ratio": 1.7689530685920578, + "no_speech_prob": 0.00035931766615249217}, {"id": 727, "seek": 541200, "start": + 5419.04, "end": 5424.4, "text": " and it really covers a lot of these things that + we''ve talked about starting from you know just", "tokens": [50716, 293, 309, 534, + 10538, 257, 688, 295, 613, 721, 300, 321, 600, 2825, 466, 2891, 490, 291, 458, 445, + 50984], "temperature": 0.0, "avg_logprob": -0.14415745735168456, "compression_ratio": + 1.7689530685920578, "no_speech_prob": 0.00035931766615249217}, {"id": 728, "seek": + 541200, "start": 5424.4, "end": 5429.28, "text": " appreciating the relevance problem + to building up learning terrain models and really focus on the", "tokens": [50984, + 3616, 990, 264, 32684, 1154, 281, 2390, 493, 2539, 17674, 5245, 293, 534, 
1879, + 322, 264, 51228], "temperature": 0.0, "avg_logprob": -0.14415745735168456, "compression_ratio": + 1.7689530685920578, "no_speech_prob": 0.00035931766615249217}, {"id": 729, "seek": + 541200, "start": 5429.28, "end": 5436.08, "text": " problem of ranking and then + also discovery of doing feature exploration and training data exploration", "tokens": + [51228, 1154, 295, 17833, 293, 550, 611, 12114, 295, 884, 4111, 16197, 293, 3097, + 1412, 16197, 51568], "temperature": 0.0, "avg_logprob": -0.14415745735168456, "compression_ratio": + 1.7689530685920578, "no_speech_prob": 0.00035931766615249217}, {"id": 730, "seek": + 541200, "start": 5436.08, "end": 5440.88, "text": " to try to figure out what''s + even relevant beyond the sort of filter bubble of our current search", "tokens": + [51568, 281, 853, 281, 2573, 484, 437, 311, 754, 7340, 4399, 264, 1333, 295, 6608, + 12212, 295, 527, 2190, 3164, 51808], "temperature": 0.0, "avg_logprob": -0.14415745735168456, + "compression_ratio": 1.7689530685920578, "no_speech_prob": 0.00035931766615249217}, + {"id": 731, "seek": 544088, "start": 5440.88, "end": 5449.68, "text": " algorithms + so if you''re interested in that catch up with me at it''s get sphere.com and you + can", "tokens": [50364, 14642, 370, 498, 291, 434, 3102, 294, 300, 3745, 493, 365, + 385, 412, 309, 311, 483, 16687, 13, 1112, 293, 291, 393, 50804], "temperature": + 0.0, "avg_logprob": -0.17842747614933893, "compression_ratio": 1.5988700564971752, + "no_speech_prob": 0.0012174344155937433}, {"id": 732, "seek": 544088, "start": 5449.68, + "end": 5455.68, "text": " find the ML powered search course and then of course like + I all of my other things out there AI", "tokens": [50804, 915, 264, 21601, 17786, + 3164, 1164, 293, 550, 295, 1164, 411, 286, 439, 295, 452, 661, 721, 484, 456, 7318, + 51104], "temperature": 0.0, "avg_logprob": -0.17842747614933893, "compression_ratio": + 1.5988700564971752, "no_speech_prob": 0.0012174344155937433}, {"id": 733, 
"seek": + 544088, "start": 5455.68, "end": 5463.68, "text": " powered search written with + tray and max and relevant search of course hopefully still still", "tokens": [51104, + 17786, 3164, 3720, 365, 16027, 293, 11469, 293, 7340, 3164, 295, 1164, 4696, 920, + 920, 51504], "temperature": 0.0, "avg_logprob": -0.17842747614933893, "compression_ratio": + 1.5988700564971752, "no_speech_prob": 0.0012174344155937433}, {"id": 734, "seek": + 546368, "start": 5463.76, "end": 5471.84, "text": " relevant so to speak and all + the great stuff out there that I think people find interesting and useful", "tokens": + [50368, 7340, 370, 281, 1710, 293, 439, 264, 869, 1507, 484, 456, 300, 286, 519, + 561, 915, 1880, 293, 4420, 50772], "temperature": 0.0, "avg_logprob": -0.11416360556361187, + "compression_ratio": 1.7477064220183487, "no_speech_prob": 0.004133355338126421}, + {"id": 735, "seek": 546368, "start": 5472.400000000001, "end": 5476.16, "text": + " and of course I also want to continue to plug open source connections they have + great", "tokens": [50800, 293, 295, 1164, 286, 611, 528, 281, 2354, 281, 5452, 1269, + 4009, 9271, 436, 362, 869, 50988], "temperature": 0.0, "avg_logprob": -0.11416360556361187, + "compression_ratio": 1.7477064220183487, "no_speech_prob": 0.004133355338126421}, + {"id": 736, "seek": 546368, "start": 5476.72, "end": 5481.92, "text": " training + consulting courses I was you know a key part of training of that team as a great", + "tokens": [51016, 3097, 23682, 7712, 286, 390, 291, 458, 257, 2141, 644, 295, 3097, + 295, 300, 1469, 382, 257, 869, 51276], "temperature": 0.0, "avg_logprob": -0.11416360556361187, + "compression_ratio": 1.7477064220183487, "no_speech_prob": 0.004133355338126421}, + {"id": 737, "seek": 546368, "start": 5481.92, "end": 5488.320000000001, "text": + " place as a resource that you can go to so yeah this is fantastic announcement + and also thanks for that", "tokens": [51276, 1081, 382, 257, 7684, 300, 291, 393, + 352, 281, 
370, 1338, 341, 307, 5456, 12847, 293, 611, 3231, 337, 300, 51596], "temperature": + 0.0, "avg_logprob": -0.11416360556361187, "compression_ratio": 1.7477064220183487, + "no_speech_prob": 0.004133355338126421}, {"id": 738, "seek": 548832, "start": 5488.32, + "end": 5495.36, "text": " and I also want to say that I enjoy the reason I enjoy + reading your book relevant searches not", "tokens": [50364, 293, 286, 611, 528, + 281, 584, 300, 286, 2103, 264, 1778, 286, 2103, 3760, 428, 1446, 7340, 26701, 406, + 50716], "temperature": 0.0, "avg_logprob": -0.14110491492531516, "compression_ratio": + 1.7, "no_speech_prob": 0.0016095573082566261}, {"id": 739, "seek": 548832, "start": + 5495.36, "end": 5500.08, "text": " only because you share a bunch there like for + example indexing songs I was like what", "tokens": [50716, 787, 570, 291, 2073, + 257, 3840, 456, 411, 337, 1365, 8186, 278, 5781, 286, 390, 411, 437, 50952], "temperature": + 0.0, "avg_logprob": -0.14110491492531516, "compression_ratio": 1.7, "no_speech_prob": + 0.0016095573082566261}, {"id": 740, "seek": 548832, "start": 5501.36, "end": 5510.0, + "text": " inverted index yeah you can if you want it your way of writing is very + thorough it''s like you create", "tokens": [51016, 38969, 8186, 1338, 291, 393, + 498, 291, 528, 309, 428, 636, 295, 3579, 307, 588, 12934, 309, 311, 411, 291, 1884, + 51448], "temperature": 0.0, "avg_logprob": -0.14110491492531516, "compression_ratio": + 1.7, "no_speech_prob": 0.0016095573082566261}, {"id": 741, "seek": 548832, "start": + 5510.0, "end": 5515.92, "text": " a network of thoughts as I go through the text + and say we will talk about it later but let me", "tokens": [51448, 257, 3209, 295, + 4598, 382, 286, 352, 807, 264, 2487, 293, 584, 321, 486, 751, 466, 309, 1780, 457, + 718, 385, 51744], "temperature": 0.0, "avg_logprob": -0.14110491492531516, "compression_ratio": + 1.7, "no_speech_prob": 0.0016095573082566261}, {"id": 742, "seek": 551592, "start": + 5515.92, "end": 
5521.52, "text": " spend a few sentences still explaining what I + mean and I''m like it''s like a conversation and yeah", "tokens": [50364, 3496, + 257, 1326, 16579, 920, 13468, 437, 286, 914, 293, 286, 478, 411, 309, 311, 411, + 257, 3761, 293, 1338, 50644], "temperature": 0.0, "avg_logprob": -0.14265569051106772, + "compression_ratio": 1.746606334841629, "no_speech_prob": 0.018721191212534904}, + {"id": 743, "seek": 551592, "start": 5521.52, "end": 5526.4800000000005, "text": + " I try to be conversation on including like the typical like bad jokes and sorts + of humor", "tokens": [50644, 286, 853, 281, 312, 3761, 322, 3009, 411, 264, 7476, + 411, 1578, 14439, 293, 7527, 295, 14318, 50892], "temperature": 0.0, "avg_logprob": + -0.14265569051106772, "compression_ratio": 1.746606334841629, "no_speech_prob": + 0.018721191212534904}, {"id": 744, "seek": 551592, "start": 5527.28, "end": 5534.4800000000005, + "text": " exactly I''m also learning on that side so that''s that''s fantastic and + that gives that feel and keep", "tokens": [50932, 2293, 286, 478, 611, 2539, 322, + 300, 1252, 370, 300, 311, 300, 311, 5456, 293, 300, 2709, 300, 841, 293, 1066, 51292], + "temperature": 0.0, "avg_logprob": -0.14265569051106772, "compression_ratio": 1.746606334841629, + "no_speech_prob": 0.018721191212534904}, {"id": 745, "seek": 551592, "start": 5534.4800000000005, + "end": 5540.4800000000005, "text": " keep going keep doing this I enjoy pulling + what you do and connecting once in a while you sometimes", "tokens": [51292, 1066, + 516, 1066, 884, 341, 286, 2103, 8407, 437, 291, 360, 293, 11015, 1564, 294, 257, + 1339, 291, 2171, 51592], "temperature": 0.0, "avg_logprob": -0.14265569051106772, + "compression_ratio": 1.746606334841629, "no_speech_prob": 0.018721191212534904}, + {"id": 746, "seek": 554048, "start": 5540.48, "end": 5546.16, "text": " give me + a really good advice on you know how to oh sure is the title in the blog post or + should", "tokens": [50364, 976, 385, 
257, 534, 665, 5192, 322, 291, 458, 577, 281, + 1954, 988, 307, 264, 4876, 294, 264, 6968, 2183, 420, 820, 50648], "temperature": + 0.0, "avg_logprob": -0.19522088438599974, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.022195210680365562}, {"id": 747, "seek": 554048, "start": 5546.16, + "end": 5552.5599999999995, "text": " they venture into this or not and things like + that that''s amazing this cross pollination so I''m", "tokens": [50648, 436, 18474, + 666, 341, 420, 406, 293, 721, 411, 300, 300, 311, 2243, 341, 3278, 6418, 2486, 370, + 286, 478, 50968], "temperature": 0.0, "avg_logprob": -0.19522088438599974, "compression_ratio": + 1.6538461538461537, "no_speech_prob": 0.022195210680365562}, {"id": 748, "seek": + 554048, "start": 5552.5599999999995, "end": 5558.639999999999, "text": " enjoying + it a lot and I recommend everyone to subscribe to your course google of course link + it", "tokens": [50968, 9929, 309, 257, 688, 293, 286, 2748, 1518, 281, 3022, 281, + 428, 1164, 20742, 295, 1164, 2113, 309, 51272], "temperature": 0.0, "avg_logprob": + -0.19522088438599974, "compression_ratio": 1.6538461538461537, "no_speech_prob": + 0.022195210680365562}, {"id": 749, "seek": 554048, "start": 5559.839999999999, "end": + 5567.919999999999, "text": " thank you and have fun have fun oh definitely we''ll + do awesome thanks so much Doug I enjoyed it and", "tokens": [51332, 1309, 291, 293, + 362, 1019, 362, 1019, 1954, 2138, 321, 603, 360, 3476, 3231, 370, 709, 12742, 286, + 4626, 309, 293, 51736], "temperature": 0.0, "avg_logprob": -0.19522088438599974, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.022195210680365562}, + {"id": 750, "seek": 556792, "start": 5567.92, "end": 5578.4800000000005, "text": + " see you soon hopefully in person yeah yeah same all right bye bye all right take + care", "tokens": [50392, 536, 291, 2321, 4696, 294, 954, 1338, 1338, 912, 439, 558, + 6543, 6543, 439, 558, 747, 1127, 50892], "temperature": 0.0, 
"avg_logprob": -0.41948494911193845, + "compression_ratio": 1.2142857142857142, "no_speech_prob": 0.009327513165771961}, + {"id": 751, "seek": 559792, "start": 5597.92, "end": 5598.4, "text": " you", "tokens": + [50372, 291, 50388], "temperature": 1.0, "avg_logprob": -2.3349666595458984, "compression_ratio": + 0.2727272727272727, "no_speech_prob": 0.34373190999031067}]' +--- + +Hello there, that vector podcast is here. We are rolling in the season two of this podcast. And so today we have like a breach, so to say from US to Finland. And I'm super excited to talk to Dr. +and ball, staff relevance engineer choppy fine and that gives to be a CTO at open source connections, the company behind so many tools for us relevance engineers and relevance product managers as I am today. He's the original creator of cupid and explainer and also learning to rank rank. My shirt. +Yeah, let's search. Yeah, cute.com. Awesome. Great to have you here. Hi, how are you doing? I'm great. Yeah, I'm doing great. Excited to chat about where search is going and the exciting places that are, you know, search is going ahead and everything. So finally, I get to be on this podcast. +I'm really excited to be here. Yeah, absolutely. Long overdue and you are the legendary guest. So I'm super excited to talk to you. +And a lot to cover, but before we begin, could you spare a few minutes talking through your background, how you ended up in search, was it an accident or was it not, was it? It was mostly an accident. +So what happened was so for a long time, the first chapter of my career, the first half was being C and C plus plus developer. And I kind of got really into performance, so optimizing speed and in native code. That was a lot of fun. +And I moved down here to Charlottesville in 2012 from the Washington DC area couple, so I was a couple hours away from my work. 
And I found that, you know, at the time especially, being the one remote employee for an in-office company is just a nightmare. And we had this neighborhood block party, and I decided to wear a nerdy t-shirt just to see, like, oh, maybe I'll meet other developers. And I think the shirt said something like "my code doesn't have any bugs, it just has random features" or something. And I so happened to run into Eric Pugh, who's the founder of OpenSource Connections, and sort of one thing led to another, and I was like, oh, this seems cool. It's a small company, and I always wanted to try out consulting and contracting. And so, yeah, I ended up taking the job and getting more and more into search.

Yeah, awesome. And you spent there how long? Seven, eight years? About eight years, yeah. It's a long time. Yeah, it was a lot of fun.

Yeah, and you've done so much. I mean, in my previous job at AlphaSense — I spent there ten and a half years, and my last half a year I was focusing on learning to rank. And I could not find a better resource than the Hello LTR repo on GitHub that you have. Yeah. And it was an amazing journey, because on the one hand I had to learn it, and on the other hand I had to build what we could call maybe an infrastructure pipeline, a flywheel of success. Yeah, right. So you train the model, you test, and then you validate and so on and so forth — validate with the users, maybe an A/B test. It was awesome. I built it entirely on your repo. And I even think there is some PR or issue that I created. But anyway — yeah, I'm sure, yeah. It seems to me you contribute a lot — I know you contribute a lot to Quepid too. And I think it's great work; you're constantly, you know, working on Quepid and trying to keep it going.
Yeah, and I mean, that's thanks to your curiosity that you created it — you kind of saw the nascent need for it. But also, when I came across it, I mean, it was very straightforward to start using. And of course, it was also a learning experience. But now, every time I join, let's say, a new gig — you know, previously Silo AI with a large client, you know, web-scale search — I brought it in. I said, there is no other tool that I know of; we should just try this. And at TomTom right now as well, so we have it. Oh, that's awesome.

Yeah, Quepid has a funny origin story that sort of dovetails with my story. For a long time at OpenSource Connections — and I think this is true of a lot of places in the early 2010s — it was pretty easy to build UIs, and we would build these beautiful search apps. That would be part of our consulting: we'd build these search apps, and they would be beautiful and look pretty, but then only at the very end would someone type in a search, and you would see, like, these results don't make any sense. And then people panic and they want to fix it. They're about to go to market, they can't release like this, and they realize the search engine isn't some magic black box — it's actually this thing that we have to configure and tune.

And so Quepid actually started because — and there's an old Lucene Revolution video that talks about this — John Berryman, my coworker at the time, and I would go to our client, also in Charlottesville, Silverchair. And we were helping them develop these search applications. And as they would tune, constantly, we'd go back every week and try to fix something, and then we would end up breaking something else. So I finally got kind of tired of it, and I just sat there and built — at the time, a Python Flask app — that was just, let's show these search results and label them as good or bad.
And so we don't have to keep going backwards on our quality. I was literally creating the app while he was sitting there trying to tune search with our client, Reena Morse, at Silverchair. So it was kind of hacked together in an hour, and then we started using it. This is so cool.

And I mean, for me, Quepid — this topic of quality assurance in search is big, I think, right? And maybe undervalued, I'm not sure. Yeah, it is. Yeah, totally. But you know, at AlphaSense, for example, I had access to people who used to be financial analysts, so they deeply understand, you know, the content. And it's so important to understand content — like brokerage versus, you know, sell side versus buy side. What is this? What are people looking for there? And I remember one of the guys on that product team, he said, well, this is fantastic — now I can explain to you what I need in terms of relevancy without getting into the weeds of your algorithm. And then you go away and get there, right? I mean, that's fantastic. And I remember at the web-scale search client we got stuck a little bit optimizing our KPI metrics, like click-through rate. And I remember when I onboarded Quepid, we generated literally 70 Jira tickets as a result of first annotating — rating the queries — and then analyzing what went wrong there, right? And probably at least half of those were data-related issues, which — you would think, hey, this is Quepid, this is about relevance and not about data. Oh yeah, you find that stuff all the time. Yeah.
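The workflow just described — rate the results for a set of queries, then score each engine configuration offline before touching production — can be sketched in a few lines. This is an illustrative toy, not Quepid's actual code; the queries, document ids, and ratings are invented.

```python
# Toy offline evaluation over human-rated judgments.
# judgments: per query, doc_id -> rating (0 = bad, 1 = good).

def precision_at_k(ranked_doc_ids, ratings, k):
    """Fraction of the top-k results that were rated good."""
    top = ranked_doc_ids[:k]
    return sum(1 for d in top if ratings.get(d, 0) >= 1) / len(top)

judgments = {
    "shoes": {"d1": 1, "d2": 0, "d3": 1, "d4": 1},
    "socks": {"d7": 1, "d8": 1, "d9": 0},
}

# Rankings returned by two candidate engine configurations.
baseline = {"shoes": ["d2", "d1", "d3"], "socks": ["d9", "d7", "d8"]}
tuned = {"shoes": ["d1", "d3", "d4"], "socks": ["d7", "d8", "d9"]}

def mean_p_at_k(results, judgments, k=3):
    scores = [precision_at_k(results[q], judgments[q], k) for q in judgments]
    return sum(scores) / len(scores)

print(mean_p_at_k(baseline, judgments))  # two rated-bad results in the top 3
print(mean_p_at_k(tuned, judgments))     # the tuned config wins offline
```

Rerunning the same judgment set after every tuning change is exactly what guards against the "fix one thing, break another" loop from the story.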
We kind of had this model at OpenSource Connections that worked well where, you know, you come in as a consultant — we started consulting in search relevance basically exclusively — and sometimes it's a very data-driven process, and it needs to be, but other times it's just jumping in. Let's start with 12 queries, and let's label what the good results are and improve those. And then we would go through these sprints of, okay, let's take the next 12 queries, let's take the next 12 queries. And you just constantly, gradually expand the envelope of what you're tuning. And it actually worked really well as a practice for improving relevancy without having to spend — you know, places don't necessarily have months to spend bootstrapping a clickstream pipeline and understanding clicks and all the biases and things like that. So it's just a really straightforward way to get started on the problem.

Yeah, absolutely. And I don't know if you could imagine this, but when I was a consultant, I had a breather of two months between, you know, that client and TomTom. So I consulted startups, one of them in the US. And, you know, you come in and they think you can do magic, and I said, okay, maybe I can't — but I will not tell you that up front. So I came there and I said, hey, how are you doing QA? And they showed me this massive Excel with a colored legend and whatnot. And I said, well, this is cool, but I think it's not repeatable. And they said, yeah, it's a big pain point. I said, let's do something better, something else. And I introduced Quepid — it took probably a couple months. Then I lost touch with that startup, as I switched to consulting others. They didn't reach out. Then I reached out and I said, hey, what's the status?
And they said, you will be surprised, but we have moved the whole QA process to Quepid. I was like, wow. Oh, that's awesome. Yeah. This is the feeling when something you created for your own use cases works for someone else's case. Isn't it an amazing feeling? Yeah, it's great. Yeah, it's funny how that works. You know, if you solve your own problem really well, there are probably other people out there that are like you, that have the same problem and appreciate that perspective on it. So yeah, I think that's kind of a truism: if you have a need, solve the problem for yourself as the most important audience, and it will sort of naturally find the people who have the exact same problem.

Yeah, fantastic. And now you are at Shopify. Yeah. So how do you structure your work there? This is my "how" part of the podcast as well. The product you're building is relevancy in many ways, right? And maybe performance of the search engine, because there are trade-offs. Yeah. So how do you structure the whole process — experimentation, evaluation? Is there anything you could share? I understand there could be some private things you don't want to share. Oh, sure. Yeah, that's okay.

So for context, our team works on the relevancy of all of the Shopify storefronts — all the little shops out there. And that's a really interesting process, because you could imagine the impact there is very variable per shop. And of course we don't want to, like, tank someone's sales, but at the same time, if we see something doing well generally, then we want to promote it. So that last part of the process is very different than in the past — I've worked on, you know, one search engine; you might work on one Shopify store, so to speak.
And at Shopify, the challenge is there are hundreds of thousands, millions of little, you know, shops that use Shopify search. And how do you find an algorithm, or algorithms, that support those — what's going to work well for every possible e-commerce use case? Of course there's a lot of apparel on Shopify, but there's, you know, all kinds of things; people make all kinds of businesses on Shopify, and Shopify very much wants to support creators. How do people even search, and what do they expect when they search on these stores?

So the good thing is, when I started at Shopify, there was already some amount of data flowing through, in terms of knowing what people are clicking on and stuff. So I was able to pretty early on start developing a click model. A click model is something that looks at how users click on results and, sort of in aggregate, gives a search result a probability of relevance for a given query. And we notice that people skip over certain products a lot — when they search for shoes, maybe for some reason the shoe search shows socks at the top. We know those socks are probably not relevant, and we know that whatever they're clicking on below that is very likely relevant. And so the good thing at Shopify was that we were able to start using that as, like, a test set.

And then, of course, tooling is very dear to my heart. So one thing that we've done at Shopify is we built a large toolchain to do offline experiments, called Boogie, using that data. And it's about what you would expect from using something like Quepid: we can take this data and sort of see, did we improve things, did things get worse, with our ideas. And then of course we release an A/B test, we look at our normal conversion metrics and that kind of thing. And then we do a lot of analysis of our A/B tests, and we will graduate things to production.
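The skip-over intuition described here can be written down as a simplified cascade-style click model: results shown above the lowest click count as examined, and examined-but-not-clicked counts as negative evidence. A hypothetical sketch — the session data, and the model itself, are far simpler than anything production-grade:

```python
from collections import defaultdict

def estimate_relevance(sessions):
    """sessions: (query, ranked_doc_ids, clicked_doc_ids) tuples.
    Returns (query, doc) -> clicks / examinations."""
    clicks, examined = defaultdict(int), defaultdict(int)
    for query, ranking, clicked in sessions:
        if not clicked:
            continue
        # Cascade assumption: the user examined every result down to
        # the lowest-ranked one they clicked.
        lowest_click = max(ranking.index(d) for d in clicked)
        for doc in ranking[: lowest_click + 1]:
            examined[(query, doc)] += 1
            clicks[(query, doc)] += doc in clicked
    return {key: clicks[key] / n for key, n in examined.items()}

# Socks shown at the top of a "shoes" search get skipped every time.
sessions = [
    ("shoes", ["socks1", "shoe1", "shoe2"], {"shoe1"}),
    ("shoes", ["socks1", "shoe1", "shoe2"], {"shoe2"}),
    ("shoes", ["socks1", "shoe1", "shoe2"], {"shoe1", "shoe2"}),
]
relevance = estimate_relevance(sessions)
print(relevance[("shoes", "socks1")])  # 0.0 -- skipped over, likely not relevant
print(relevance[("shoes", "shoe2")])   # 1.0 -- always clicked when examined
```

Estimates like these can then feed an offline test set, in the spirit of the Boogie workflow described above.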
So at a super high level, there's nothing there, I don't think, that is that different from most places, other than, you know, we have the challenge of so many different shops and things that we have to sort of solve for.

I mean, this sounds so fantastic — it's almost like fixing or improving search for the entire e-commerce space, right? Yeah, that's part of the challenge and the draw, one of the reasons, you know, I'm at Shopify. There are people on Shopify who sell 100,000 products; there are people who sell one product. There are stores that you use that you may not realize are Shopify stores, and then there are stores that are very clearly Shopify — you know, Shopify in the URL or in the footer and that kind of thing. You know, your local — I think my local running shop, shout out to Ragged Mountain Running — and then there's a place at the farmers market, and a lot of those places use Shopify. But then there are also larger brands that use Shopify. Yeah, that's amazing.

I've seen you recently posted a link to this paper — was it from Google? — saying that search abandonment costs US retail 300 billion dollars annually. So it's a massive, massive opportunity for search companies, you know, consultancies and so on. Totally. But why do you think this is still the case, regardless of all the efforts of the search community? Is it to say that the community is too small and there is potential to grow it and add companies and so on? Or is there something fundamental that, you know, still needs to be tackled?

I think it reminds me a lot of where the search space is — and not just the search space, but adjacent things like recommendations, or any, I would say, surface on, let's say, a website or a product or an app that is algorithmic in some way. It just feels like, from how people build products...
It's just fundamentally a nascent space. So it reminds me of early in my career, when I was a software developer. I worked at a couple of software companies — maybe because I was a C/C++ developer — that really were hardware companies. People at the management levels were used to running hardware companies, but more and more of the value was delivered through software, and they didn't necessarily understand how to manage software. You know, in hardware you might have these very upfront, classically waterfall kinds of development processes. And then in software, in the early 2000s, we learned about agile, and that it was good to be iterative, and that it's okay to, you know, fail fast — unlike hardware, with software we can always hit the undo button. And so it's a very different practice and a very different style of leadership, I think.

And I think the same thing is becoming true of these algorithmic data products, like search. Sure, at the implementation level — at our level — you see a lot of people who really understand the problem, who understand it's very experimental. It's even more experimental than software, where you can ship something and undo it if you need to. It's actually extremely experimental: every week you're shipping something new, you're always looking at metrics and A/B tests and everything, and every week you make a completely different product and go in a different direction.

Honestly, I think one reason this is a problem is that organizations structured to ship classic software aren't necessarily well suited to ship these data products. I gave a talk at MICES — you know, the e-commerce search conference — in Charlottesville in April about how at Shopify one of the things that we do to try to help with this problem is really make engineering and data work hand in hand.
Because in many organizations they're very siloed from each other. And that can be a really big challenge, because you make these decisions day to day — like, I'm implementing something in my search; as I'm writing lines of code, do I go to the left or do I go to the right? Do I try boosting this, or implement this algorithm — maybe it's a little bit slower, a little bit faster? And for those really intricate decisions, you kind of need both sides of the data and engineering brain to make them. And I can only think of a very small handful of companies — places like Google, maybe Meta/Facebook — that have really mastered this blending of data and engineering. Most other companies, which have, you know, finally mastered software engineering, haven't quite, from a leadership and product-leadership perspective, gotten over the sort of hump of: how do we think about data products? How do we manage things that are experimental and aren't, you know, actual projects that are going to take a couple months to complete, with a very clear beginning, middle, and end?

Yeah, exactly — beautifully put. It's not like a Toyota, you know, pipeline, where you could say, yeah, this is where we start, we put in all these materials, some people do something, we fix some bugs, and off we go with the car. There is no definition of done, in some sense, right? Yeah, there's no definition of done — you're just constantly experimenting. It's not something visual, like, oh, we're going to add this button to the UI and it's going to do these things. In some ways you're rarely changing the UI; you're mixing up search results and how they come up. There may be UI elements to it — like, oh, we understand this query better, so we serve this UI — but it's extremely fuzzy and hard to, like...
One of the biggest challenges I have, actually — and I've had this in consulting and, you know, continue to have at Shopify — is how do you coach stakeholders to understand what you even do? The plus side is it's very much tied to, you know, we're going to make more money; on the other hand, it's not clearly scoped in a traditional sense — it's a constant cycle of experimentation and optimization.

Yeah, absolutely. I mean, it requires a different discipline and rigor. Even at TomTom, for example, I work in a relevancy team — search relevance. I'm not on the ML component, though I try to be. But the thing is, I was amazed by the team saying, hey, I'm running this A/B test, and they compute a bunch of metrics — confidence intervals, p-values — and they say, yeah, I feel like this is a good change in the algorithm, but it's not proving out: when we split our traffic A/B 50/50, it just doesn't work, and after two weeks we have to kill it. You need to go through that rigor. If for a moment you doubt it and say, no, I love this change, I'm going to push it forward, it's not going to harm anything — you cannot do this, right?

It requires everyone to be a scientist, too. And I think there's traditional product and other kinds of leadership that can be very opinion-driven, or have a strong vision. And I think there's still tons of room for that, because at the end of the day you need a strong hypothesis, and often what you're A/B testing is within the context of a larger strategy — like, you know, we think we'll get traction if we go in this direction.
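The kill-the-change discipline described here can be made concrete with a standard two-proportion z-test on conversion counts from a 50/50 split; the traffic numbers below are made up for illustration.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic and two-sided p-value for H0: equal conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p

# Control converts 5.0%, variant 5.2%, after two weeks of a 50/50 split.
z, p = two_proportion_z(conv_a=1000, n_a=20_000, conv_b=1040, n_b=20_000)
print(f"z={z:.2f} p={p:.3f}")
if p >= 0.05:
    # However much we love the change, it has not proven out.
    print("no significant lift: kill the variant")
```

With this sample size the observed 0.2-point lift is well inside noise, which is exactly the "I love this change but the test says no" situation described above.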
But you have to really bring science to it — everyone has to be a scientist — and an A/B test is often not as simple as a clear winner and loser. It sometimes is a winner in some ways: a winner in this dimension, a loser in that dimension. And then, can we go and slice and dice the data to really understand what happened? It's often not a cut-and-dried story, and so trying to understand the data to even know how to tell the story requires a lot of humility, I think, if you're a leader — to say, okay, what we learned is more important than my big idea being the winner, being the clear thing that won so I get all the credit, that kind of thing.

Yeah, exactly. This is amazing. I'm actually rereading your book here — I'm a big fan, and once we meet, you'll give me the autograph, all right? You and John Berryman. I think I saw you actually in person at Lucene Revolution 2013 — you were on stage. It was in Dublin. Do you remember being in Dublin? Yeah, yeah, that was fun. Yeah, I remember they had this huge rugby stadium. Yeah, exactly. And coming back to — I think I talked about Quepid there. I think my colleague from Silverchair, Reena, came out, and we talked about Quepid. Yes. And I got the blessing for Luke from Andrzej Bialecki there — I said, would you be okay if I continue it, because he didn't have time, and he said, yes, yes, please, please. Oh, cool. And then later it ended up being part of Lucene.
You see — and a Lucene committer, Tomoko Uchida, is now driving massive changes these days on Jira. Oh yeah, yeah, that's true. Yeah, I've seen that name a lot. Yes, fantastic.

And in this book — why I brought it up — I was just reading one of the first chapters, where you so beautifully said about analysis — so, let's say, in Lucene lingo, how you process the input text — that analysis shouldn't map words to tokens; it should map meaning and user intent to tokens. Yeah. I mean, this is amazingly put. And you go on later explaining how you balance precision versus recall as you make modifications to the analysis chain — you know, whether you're stemming or not, and stuff like that. But it's not like many people even viewed it that way. Not that I viewed it that way — I was always like, yeah, what should I tweak to make it work? But there is a related topic on this front, you know: query and content understanding. How do these things connect in your mind?

Yeah. So first of all, I think it's funny how the work that we do — how we do certain work — shapes our perspective on things. Because writing that book was sort of the early part of my relevance career at OpenSource Connections, and this still happens: you kind of get brought to a client and it's like, okay, we have this app over here, we have this indexing pipeline, you just work within this box that is the search engine. And so I became quite adept at, how can I hack the analyzers and the query and everything to do all the crazy things I want to do? Like, could I take in a taxonomy and sort of map to a conceptual understanding of the language — not just the words themselves? You know, people think about analyzers and they think about stemming and lowercasing, but more and more it was like, oh, I can only work within this box that is the search engine, and whether it becomes, like, plugins or whatever,
how can I massage the text coming in and the queries coming in so that they map to each other? And in that context — you may have heard of Conway's law, which says you end up shipping your org chart; how you structure your projects is very much tied to the organizational structure of how you do things — the consultant-slash-relevance team really only works in the box that is the search engine, and makes the magic more magical.

So when I think about that, it's often similar to how people think about relational databases: you're creating the structure of a database to answer certain questions. In the same way, using analysis and how you create fields, you're sort of structuring an index — a view of some documents — to answer these natural-language queries that come in. And so everything is about massaging this database to rank results in a way that gets closer to the questions that users are asking.

A really concrete example of that — and this was actually one of my earliest projects — is, let's say you take some medical knowledge, and you're indexing, like, questions or medical articles. There are taxonomies out there — MeSH, Medical Subject Headings, is one — that say, okay, this article is about, let's say, something in the cardiovascular system: it has to do with the heart, and it has to do with, like, the left ventricle. So that's a taxonomy; it's a hierarchy. And if I can index that taxonomy a certain way, so that when I take a query I also map it to something in that taxonomy — let's say cardiovascular system, heart, ventricle — and if I can engineer the similarity in the search engine so that it kind of uses the analysis to be like, oh, it has
so many taxonomy nodes similar, that makes it more relevant — but maybe it has one or two dissimilar, and that makes it a little less relevant. If I can sort of zero in on that, then I'm really getting closer to meaning than, you know, whether it's a stemmed version of this word or not. And you can create tokenization pipelines that take terms like, let's say, myocardial infarction — which is a heart attack — and use synonyms and other things to say, oh, it's actually this part of the taxonomy, and therefore we sort of expand it to these taxonomy tokens. And the same thing at index time. And so I got very adept at massaging data in that way.

But I think when you take a step back, if you have access to the full indexing pipeline, as most teams do, and you have access to the full query API and everything, really you're doing the same exact thing. You're massaging content as it comes in — in some ways you have more tools if you can do it before it gets to the search engine — and the same thing with queries: you might have some ability to apply an NLP model or do some kind of entity recognition before the query comes in. So philosophically you're doing the same thing: on one side you're mapping documents to queries, and on the other side you're mapping queries to, sort of, the document structure, and you're trying to map those two together in a way that creates a ranking function that does what you want it to do.

Yeah, absolutely. I think it was Daniel Tunkelang who summarized his 20 years of experience as comparing sets of documents, right? Like, is this set of documents better than the other — and then everything else comes as input, you know, whether it's query understanding, content understanding, whatever. Yeah, it's amazing. Absolutely. Yeah, all of these things come together, and the search engine is kind of like the core driver,
and you're trying to massage this similarity engine to make that, quote-unquote, cosine similarity what you want it to be.

Yeah. I recently ran across one case — so in map search, you could think, well, what do people type there? Well, they type addresses, they type coordinates, they also type questions, you know, like where do I go hiking here, in this area, stuff like that. Not something we can handle right now, but maybe in the future we will. And the case was, there was a company — a search with a company name. Right, we support points-of-interest search, POI. And so you have a meaningful part of the company name — I don't remember, something like "white mice" something — and then it had less meaningful parts, like "limited", "south africa", you know, things that would repeat across a number of company names. And our search engine — because you have the feature of min_should_match, right — actually focused on the less meaningful components, and so we bumped some overlap higher, just because of how TF-IDF works, by the way. There's also a lot of work going into understanding why does this TF equal this number — I need to figure out the IDF, right? And so I went on Twitter and I tweeted, like, I came across another use case where maybe vector search could help, because it would actually focus on the meaningful part — hopefully, because you have the attention mechanism in the transformer models, right, like BERT and others. So presumably it would focus only on the right part and it would find it. Do you think — this was a moment of despair — do you believe in this?
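The pitfall in question is easy to reproduce: classic IDF rewards corpus rarity, and rarity is not the same thing as carrying the query's meaning. A toy corpus of invented company names:

```python
import math

def idf(term, docs):
    """Smoothed inverse document frequency: log(N / (1 + df))."""
    df = sum(1 for doc in docs if term in doc.split())
    return math.log(len(docs) / (1 + df))

corpus = [
    "white mice trading limited south africa",
    "acme mining limited south africa",
    "white mice bakery",
    "cape tours limited south africa",
    "white mice logistics limited",
    "zzyx holdings limited south africa",  # "zzyx" is rare junk
]

# A rare-but-meaningless token outweighs the part of the name a user
# would actually search by, and shared boilerplate scores zero.
print(idf("zzyx", corpus))     # highest weight, yet carries no intent
print(idf("mice", corpus))     # the meaningful part of the name
print(idf("limited", corpus))  # 0.0 -- boilerplate shared by most names
```

Weighting by rarity alone is what lets a stray token dominate the match, which is the behavior the question is getting at.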
Uh, I mean, I do believe in it to some extent; I totally believe that's a valid thing. I also think that sometimes document frequency itself is really interesting because it gets at the idea of specificity in the query, but document frequency is sometimes a poor measure of specificity: just because something is rare in the corpus doesn't necessarily mean it's more specific to the user's intent. A case like that: we did a project for O'Reilly Media, to help with a product similar to Safari Books Online, which people might be familiar with. If people search "JavaScript" or "JavaScript books", it just so happens, given how titles are written, that if you write a book on React you're not going to put "JavaScript" in the title, but React is conceptually about JavaScript. So what's really interesting is that you type "JavaScript books", and a great React book might be a great JavaScript book, but you have to understand React in the context of this broader concept of JavaScript even though that exact term isn't in the title. So this concept of term specificity is really useful, but the way we get at it with document frequency can be really invalid, you know, not great. And to your point about the attention mechanism, that's really interesting, because I can see conceptually how that can let you zero in on the concepts that are most important to a document. One of the reasons I think BERT is so transformative: traditionally, for years and years, even going back to the early 2000s with latent semantic analysis and those kinds of things,
and then eventually word2vec. These techniques are really great for increasing recall, or getting at a rough semantic sense of what's there, but at the end of the day they're not helping me get at the higher-precision component of search that traditional search engines thrive at and are still really good at. Like: this is a shoe, I don't need to see socks, just show me the shoes. You don't have the fuzziness you get in a dense vector representation, where everything is compressed down and fuzzy. But what BERT and those kinds of models really do with the attention mechanism, I think, is turn that on its head. There are these parts we can get at with precision: we know the most important part of this document is the part that talks about JavaScript, or its JavaScript-iness, and when we search for that we can zero in on that dimension of it, as opposed to a fuzzy concept of, you know, programming languages and JavaScript, if that makes sense. I feel like it's zeroing in on what makes this thing precisely interesting, as opposed to traditional dense vector representations, which have been fuzzier, casting a wide net, more focused on recall. Yeah, exactly. And you wrote a nice piece, on what problem BERT hopes to solve for search, at the end of 2019, and you compared the inverted-index sparse search method with word2vec really well there. You basically allude to the fact that BERT probably gets the aboutness of the document better than word2vec or TF-IDF, because in word2vec you essentially have a window that you slide
through. Yeah, to your earlier example: if this React book never happens to have "React" near "JavaScript", because everyone knows it's JavaScript, then you will never find it using word2vec, or maybe it will be too distant. But BERT tries to embed the whole document, or chunks of it averaged, and so on, so it might. Yeah, and if you were to use word2vec you'd have to implement your own attention mechanism in a way. You'd be like: okay, which parts of the document matter? First I've got to throw out a bunch of front matter and end matter and junk, and with word2vec you'd have to somehow engineer it to look at these paragraphs, focus on these ones more, throw away some other ones, and the aboutness gets really blended. Whereas the amazing thing about BERT is its ability to really zero in on the aboutness: it's not just that the document has an embedding, each token position has an embedding. So if I take a question, I can really zero in on: oh, this is the part of this article that is most similar to it. You've still got challenges with the fuzziness of dense vectors, and it's maybe not precisely the words you're looking for, but just the fact that each token position of a book might be an embedding is mind-boggling. It can be a beast to manage and deal with, but it's a really powerful concept. Yeah, absolutely. Plus it's a masked model, right, so it can predict what the token at the masked-out position should be, and then it can actually predict entire sentences; I think that was one of the side effects, so it could become generative. Yeah, totally, exactly, it's pretty amazing.
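The contrast being drawn, one pooled vector per document versus an embedding per token position, can be sketched by hand. The 3-dimensional "embeddings" below are made-up stand-ins for real model outputs (which have 768 or more dimensions), and the token and query names are invented.

```python
import numpy as np

# Hand-made toy contextual embeddings standing in for per-token BERT outputs.
doc_tokens = {
    "react":       np.array([0.9, 0.1, 0.0]),
    "tutorial":    np.array([0.1, 0.9, 0.0]),
    "frontmatter": np.array([0.0, 0.1, 0.9]),  # junk token that dilutes pooling
}
query_vec = np.array([1.0, 0.0, 0.0])  # pretend this embeds "JavaScript"

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Mean-pooled, word2vec-style score: the junk token blends in and blurs it.
pooled = cos(query_vec, np.mean(list(doc_tokens.values()), axis=0))
# Per-token max, late-interaction style: zeroes in on the relevant position.
per_token = max(cos(query_vec, v) for v in doc_tokens.values())
```

The per-token maximum recovers the "this part of the article is most similar" behavior described above, while the pooled score shows the aboutness getting blended.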
Yeah, and so today, following up on this trend of sparse versus dense: I think a lot of discussion is still going on around how dense retrieval will enter the sparse search world at larger scale. How do you feel about this? And of course there is hybrid search as well. It's a hot space, yeah, and I know there are a lot of open source projects, there's Milvus, there are companies like Pinecone, there's Qdrant, all of these systems doing dense vector retrieval. It's also just a fun problem, if you've been in search for a while, to think about approximate nearest neighbors and how you solve that; I know for a long time it's been a side project of a lot of people, and I know for you, Dmitry, and Max, you guys had a lot of fun in the billion-vector challenge. The first thing to ask is: why do we need these extra databases? It's interesting, because we just talked about how we can map tokens to meaning, so why can't we apply the same techniques to the dense world, why can't we use a traditional search engine? If you think about it, the data structures underneath them are optimized very differently. Yes, in both cases you're mapping query meaning to document meaning, fundamentally the task is the same, but the data structures you would use for a dense system, where everything is compressed into maybe 256 or 768 or whatever dimensions, are very different from a sparse index, where, in something like an Elasticsearch or Lucene traditional index, the dimensionality is way, way higher. You could expect hundreds of thousands of words, each word its own dimension, and you're going to have situations where word frequency
follows Zipf's law, which is: "the" occurs in basically every document, frequency gradually falls right off a cliff, "cat" occurs in 1% of documents, and as you keep going you get specific terminology like "feline" occurring in 0.1%, and it really falls off a cliff. Sparse vector indices are really optimized for that use case: I have a term, it points at a very small handful of documents that contain that term, I can do that lookup very quickly, fetch those, score those, and get a score back. What's interesting about the dense vector case is this. With a sparse vector I go in with a single term, or maybe two or three terms, look up in this giant data structure, get a handle to all the things that match, and sort those and get them back. With a dense vector, the query isn't two or three terms; it's some value in every one of 256 or more dimensions. Right off the bat, 256 terms in a traditional search engine would be a large query, and really you're looking up in an index that is itself of that much smaller dimensionality, where every document has some amount of value for each dimension. It's not like "cat", which occurs in three things; all billion documents have some level of catness, if cat were one of the dimensions. If you just think about how you would build a data structure for that, it would be a very different thing, and that's why people build these completely different data structures, and why doing approximate nearest neighbors on large-scale data matters so much: you do want some sense of similar conceptual meaning in this compressed vector space.
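The posting-list idea, a term pointing at the small handful of documents that contain it, takes only a few lines to sketch. Documents and terms below are invented.

```python
from collections import defaultdict

# Three tiny invented documents.
docs = {0: "the cat sat", 1: "the dog ran", 2: "feline leukemia study"}

# Minimal inverted index: term -> set of doc IDs (a posting list).
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

rare_postings = index["feline"]  # tail of the Zipf curve: touches one doc
head_postings = index["the"]     # head of the curve: touches most docs
```

A rare term's lookup touches almost nothing, which is exactly the access pattern sparse indices optimize for; a dense query has a value in every dimension, so no such shortcut exists, and that is why ANN structures look nothing like this.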
But that in and of itself gets at the pros and cons of each. With this compressed representation you don't specifically have the word "cat", or even direct synonyms of cat that you've created; you have a rough statistical sense of catness or animal-ness that you're clustering together. By compressing to smaller dimensions you've lost some precision, by definition, but you've expanded the net of what you might bring in. So that's a pro and a con of the dense vector representation, whereas continuing to use a sparse vector representation is a much more precisely managed lookup. So there's not some future where you throw away one or the other; more and more, the reality is hybrid retrieval, where you're using both data structures to serve search results, mixing both perspectives: expanding the meaning to maybe mean other things, or staying in the more precise world of sparse meaning. Yeah, it's amazing how you put it; it struck me how simply you can explain complex things. On the sparse vector side, you said hundreds of thousands of terms in your term dictionary; when I worked at AlphaSense I once counted we had a billion, because you feed in millions and millions of documents and they vary quite a lot. Of course there is some overlap (financial, legal, "revenue" would occur everywhere), but as they describe different verticals in the industry, healthcare versus pure finance, banking, investment firms, they have different lingo. And that's amazing, how you put it: if I had a vector of, say, a billion dimensions before, now I have much less, 768 dimensions, maybe 1024.
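One common recipe for the hybrid retrieval described here is reciprocal rank fusion (RRF), which blends a sparse ranking and a dense ranking without trying to calibrate their incompatible scores. The doc IDs below are invented; k=60 is the conventional constant from the RRF literature.

```python
# Reciprocal Rank Fusion: score each doc by the sum of 1/(k + rank) over
# every ranking it appears in, then sort by that fused score.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

sparse_top = ["d1", "d2", "d3"]  # e.g. a BM25 top-k
dense_top = ["d3", "d4", "d1"]   # e.g. an ANN top-k
fused = rrf([sparse_top, dense_top])
# docs ranked by both lists float to the top of the fused list
</antml_placeholder_none>```

Because only ranks are used, neither side's score scale dominates, which is why RRF is a popular default when first wiring the two retrieval paths together.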
I heard recently one committer in Elasticsearch is trying to push the dimension limit to 2048 or something. Oh wow, I didn't see that, that's amazing. Yeah, I think it was Mayya Sharipova. So this is amazing, but I guess you're right. And another thing comes to mind. We had a podcast with Yury Malkov, who created one of the most popular ANN algorithms, HNSW, the Hierarchical Navigable Small World graph. I asked him this question: say you have this geometric similarity search, and in the case of e-commerce you also want filters, you want to say "I want shoes of this size, in stock" and so on. Surprisingly, he said these contradict each other so much that he cannot even imagine creating a generic algorithm that covers this case, because essentially it could quickly degrade to traversing the whole space of points. As you said, you have aboutness as a dimension, but you also have these filters as dimensions: you could say, cluster these points on color, cluster these points on size; imagine doing this up front. It's not a generic solution, and he was blunt and said: this is not possible, I don't see how you could do it. And yet the vector databases claim they have done it, and that you can go at scale. But I sense there is some truth to what he said; maybe there are edge cases where it doesn't work, or it takes over a second and that's acceptable, I don't know. How do you feel? Yeah, I mean, it feels a bit like overcomplicating a solved problem. I suspect we will be in a world of more hybrid retrieval, where you're using a traditional filter for those kinds of things, because I feel like dense retrieval is the missing piece of most people's search systems, if nothing else as a first pass.
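A tiny sketch of the failure mode behind that skepticism: if you post-filter, i.e. retrieve nearest neighbors first and apply the attribute filter afterward, a selective filter can leave you with almost nothing. Exact brute-force top-k stands in for ANN here, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 8))   # synthetic document embeddings
in_stock = rng.random(100) < 0.1      # only ~10% satisfy the filter

query = rng.normal(size=8)
dists = np.linalg.norm(vectors - query, axis=1)
top_k = np.argsort(dists)[:10]        # "ANN" stand-in: exact top-10

# Post-filter: keep only neighbors that also pass the attribute filter.
survivors = [int(i) for i in top_k if in_stock[i]]
# with a 10% filter, a top-10 typically leaves only a result or two
```

The alternatives, pre-filtering the candidate set or filter-aware graph traversal, are exactly what the vector databases claim to have engineered around, and where the edge cases he hinted at would live.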
For a long time, people would do a first pass of BM25 scoring and then maybe apply some learning-to-rank algorithm, and I feel like that could flip: first I get this dense vector candidate list, because it's compressed and actually more recall-oriented, and then I use sparse vector techniques to filter, re-rank, and so on. But also, just for speed and ops, one practical concern is that Solr and Elasticsearch have such huge install bases, and a huge base of practitioners who know how to scale them, and with new dense vector techniques I'm not sure people are going to completely throw out their Elasticsearch or Solr install just to get this new functionality. In fact, as Elasticsearch and Solr adopt more of these things, I think more and more people will say: oh, that's cool, I'll use this in addition to my Elasticsearch or Solr. So I feel we'll end up in a world where, yeah, Elasticsearch and Solr don't give us as nice a set of features for that, but they already work pretty well for 80% of what we do anyway, and we just want to tack on this extra bit. That feels more like my expectation of the future than us throwing out the existing systems and adopting something new. Yeah, this is a very interesting opinion, and of course I'm not downplaying the players you mentioned. I haven't talked much about how many vector databases exist today, plus neural search frameworks like Jina and Haystack that sit on top of these databases. But I agree, I think the future might be in flexibility: if I'm already on Solr, why don't I just use the ANN plugin and try my luck, maybe just wet my toes, so to say? I don't want to jump to an entirely new world, a new database I have no experience with. But if you
haven't had that setup, say you're a startup: I know some startups, when they want to go in that direction with neural search, do consider Vespa or Weaviate or Pinecone, and that might be a different use case as well. By the way, this is an entirely new big topic: it's not only search, right? Search is still being figured out and some companies do it, but then you also have machine learning pipelines like recommender systems. Oh, totally, yeah, that's a great use case. So it's not just for search; of course you have this huge install base, especially for established companies like Shopify or whoever else, but you're absolutely right, there is a lot of opportunity for this in so many places, like pure question-answering applications, or places where you're using it as a backend component to do some kind of inference for recommendation systems. I think it's a fantastic thing, and I do think more and more a bigger practice will evolve around scaling these things out and understanding them from an operational perspective, so I definitely think it's an interesting landscape to watch. And the counter even to what I said is that Solr and Elasticsearch are established for their use cases, but this world I mentioned before, of building these data-driven or personalization-driven surfaces, is just wide open. I look at my phone: I have limited screen real estate, and it needs to show me something relevant to me. Take Peloton, for example, the fitness app: I want to do a workout, I go to the app, and it does have navigation, but it's also just trying to show me something on the screen that's going to be relevant to me so I engage with it. Or Netflix, or all of the UIs we use these days, they're not really
like point-and-click: they're driven by some kind of smart algorithm, and it's not necessarily a classic search use case of point, click, filter, then search with relevance. So I do think it's a wide-open space, and honestly an under-appreciated one. In some ways, and I'm just thinking of this now, speaking completely off the cuff: if I were to advise a Pinecone or somebody, I might say, stop talking about yourself as a vector database and start talking about all of these other uses. In my book I talk about relevance-oriented applications, or I forget, maybe relevance-driven enterprises, and I think a lot of these applications are completely relevance-oriented applications. Whether it's personalization or recommendations, they're about ranking something for a user, for a given context, or maybe for a given query or question, and I would focus on that universe of stuff, because I don't know if anyone is really speaking from a product perspective about what the engine is that drives all of that. I think that could be a really exciting product, or open source space, or whatever it ends up being.
That's great advice; I might quote you in the upcoming keynote I need to deliver at Haystack Berlin, because it's a great thought. In many ways, when people come to me and ask what the difference is between vector and neural search, I say there's not much difference: "vector" is more the mathematical stance, and "neural" is more how a deep learning engineer or researcher would take it, from that angle. But you put it so beautifully: maybe we focus too much on this technical level, saying this is vector search, and yeah, it's all sexy, you need to buy it, and we don't focus enough on use cases and on how these things complement each other. It's not like vector search is trying to kick out sparse search; that's not going to happen. By the way, prefix search is not supported in vector search; maybe it will be, but not right now. There's also a set of problems where, to this day, people use tree-based models, mixing some kind of similarity with some kind of statistic about the data. Tabular data, so to speak, has consistently been dominated by tree-based models, which is a completely different thing from deep learning and neural search, and those things integrate pretty well with the learning-to-rank plugins in Solr and Elasticsearch, where you can plug a vector similarity into that kind of tree-based model; the opposite isn't necessarily true. This is very interesting. So, Doug, we spoke a lot, and I'm sure we could speak more, about how to engineer a search engine. Say you're a startup: you don't have clicks, you don't have feedback from your users, maybe not necessarily in that form. You can still engineer, now with dense search too, by tweaking analysis chains and crafting
synonym dictionaries. But once you're past that launch and you gather data, the natural move is to start looking into something like learning to rank, and you've spoken a lot about that. I was quickly googling: you spoke at, I believe, Berlin Buzzwords, at Haystack, somewhere in the San Francisco Bay Area, about how to turn ranking into a machine learning problem. We also spoke about Bayesian approaches, and then there is also, I recently learned, well, not that recently, I think it was last Haystack or the one before, learning to boost. How do these methods come together? Where do you start, for those who have only heard about it but haven't tried it yet? And a connected question: do you think we will marry the dense retrieval signals with learning to rank in some way? Does that make sense? Yeah, so I think a lot of companies go into the learning-to-rank process thinking it's easy, knock on wood: I have a model, I'm going to train this model with my clicks, and search will magically get better. What's interesting, if you go back to Haystack talks about learning to rank and talks at other conferences, is that the number one place people get stuck and spend their energy is the training data, less so the model, the features, and all of that. And if you think about it, it makes sense, because one of the biggest problems with training data is that it's horribly biased toward whatever the old algorithm was. People are always clicking on what the old algorithm showed them; regardless of whether it was good or not, it's getting some clicks, and the stuff that might be amazing but is on the tenth page is never getting clicks. So how do you optimize search in that context? It sort of doesn't matter what model or features you use until
you get really well-labeled training data; you can't make much progress. So what you can do to get started, at a minimum: we know the training data isn't perfect, but if we just look at the top end, the top 10 or so, at what's actually getting clicks, we might start to learn what differentiates results. There are many ways of doing learning to rank, but if we start by thinking of it as a classification problem within the context of the results we do have click data on: what differentiates things seen with lots of impressions and no clicks from things with lots of clicks? You start to see the features that separate those, and then at least you know, within the context of your search filter bubble, what differentiates relevant from irrelevant, and you can use that to rank. But at some point you do need to realize: I am working within the filter bubble of my original algorithm, and all I'm doing is tweaking a few things up and a few things down. How do I bust that filter bubble and get different kinds of potentially relevant results in front of users, to see whether or not they'll click? That's really probably the big challenge people have with, honestly, not just learning to rank but any algorithmic search work they're doing. Yeah, absolutely. When I was doing it, using your hello-ltr repo, I think I focused a lot on the mechanical aspects: what is pairwise, what is listwise, I need to read the LambdaMART papers to get into the weeds of the algorithms. But I think I spent maybe too little time figuring out the data part, like head versus tail.
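The "classify within the results you already show" starting point might look like this toy sketch. The feature names, values, and click labels are all invented; the point is only that comparing feature distributions across clicked and skipped impressions reveals what separates them inside the filter bubble.

```python
# Top-N impressions for one query: (title_bm25, recency, clicked).
# All numbers are invented for illustration.
rows = [
    (12.0, 0.9, 1),
    (11.5, 0.8, 1),
    (4.0, 0.7, 0),
    (3.5, 0.2, 0),
]

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

clicked_bm25 = mean(r[0] for r in rows if r[2] == 1)
skipped_bm25 = mean(r[0] for r in rows if r[2] == 0)
# a large gap suggests title_bm25 separates relevant from irrelevant,
# at least among results the old algorithm already surfaced
```

In practice you would feed these rows to a classifier or a pairwise ranker instead of comparing means, but the data you learn from is still confined to what the old algorithm showed, which is the filter-bubble problem discussed next.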
I think Grant Ingersoll likes to say "torso" as well, and I'm like: torso? What's that? So do you have any advice for those who are starting out? Do they need to be born data scientists, or, for those like them, is there a toolset, a methodology, a book, whatever? Yeah, so this is a big focus of AI-Powered Search, a book I'm helping write with Trey Grainger and Max Irwin, and then ML-Powered Search is a training I'm doing, because I think a lot of the focus these days is on cool things like dense vector retrieval, BERT, and these kinds of things, and to me that's taking out an old model and putting in a new model, but the problems outside of it, getting the training data, are still really hard. There are a lot of techniques people can use, and the thing people aren't talking about enough in this space is active and reinforcement learning. That's what I talk about a lot in my book and my training: this idea of how we strategically explore new potentially relevant search results for a query while still exploiting the knowledge we do have, because we don't want to just show people random results. How do we play with that boundary in a strategic way, not just "here's a bunch of results, oh, here's a random one"? There are processes out there to do that, and one that's very near and dear to my heart, and a very practical thing for people to learn about, is what's called a Gaussian process. A Gaussian process is just a different kind of regression, so in the same way, we're basically learning to rank: given a bunch of features, like the title BM25, or maybe some dense vector similarity, or the popularity of the document, we're learning from our data what function of those features predicts
relevance and what doesn't. But what's interesting about a Gaussian process is that at any given point it knows how certain it is about the prediction. Obviously, at points in your training set it will have high certainty: the variance (that's where the "Gaussian" comes in, the Gaussian distribution at that point) is very small, it's very certain. When you go a bit farther out from a point you have information about, it might try to connect the dots between that observation and maybe one down here, but the uncertainty grows and grows as it moves away from existing observations. That's interesting because this model is both predicting relevance for arbitrary points in the feature space (it can do that because it sees patterns: there's a cluster of training data over here where things are more relevant than things on the bottom left) and, in between those data points, where the uncertainty really lies, it can say: I think we should probe here. I think we should select this document to show the user, to get more information on something we feel with reasonable confidence is probably relevant, though we're not entirely sure, because we haven't exactly observed it yet. That's where you can both explore the training data and explore the feature space. So if you introduce a new feature into learning to rank, you can try different combinations of it, and as you tease out the general pattern, try things in between, to really understand how that feature interacts with how users are interacting with the data. That's really why it's active learning: yes, you're training a model, but the model itself can know its own gaps.
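A minimal hand-rolled Gaussian process illustrating exactly that property: tiny variance next to observed points, large variance in the gaps between them. The data, kernel length scale, and noise level are fixed by hand for illustration, not learned as a real GP library would do.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF kernel matrix between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

X = np.array([0.0, 1.0, 5.0])  # observed feature values (e.g. a ranking feature)
y = np.array([0.2, 0.4, 0.9])  # observed relevance labels
K_inv = np.linalg.inv(rbf(X, X) + 1e-6 * np.eye(len(X)))  # jitter for stability

def predict(x_star):
    """GP posterior mean and variance at a single point."""
    k = rbf(np.array([x_star]), X)[0]
    mu = k @ K_inv @ y
    var = 1.0 - k @ K_inv @ k  # prior variance rbf(x, x) = 1
    return float(mu), float(var)

_, var_near = predict(1.01)  # right next to an observation: near-certain
_, var_gap = predict(3.0)    # in the gap between observations: uncertain
```

The gap point is where an active learner would choose to probe: show a document from that region of feature space, collect the click, and shrink the model's uncertainty there.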
So imagine, as you're serving search results, you can go to this model and say: give me not only what you're most certain about, but also, strategically, where you want to explore, and you can show those results to users too and start gathering clicks and information on them. To me that's a really exciting field for where search, information retrieval, and all of these fuzzy relevance interfaces could go and do a lot of amazing work. Yeah, it sounds fantastic, like a mathematically driven way of expanding your click base. And it still sounds very experimental to me, because nothing is given; from what you explained, if I understood correctly, it's still an experiment. We could run an A/B test; is that how you would do it? Your model is essentially a reflection of the data choices you made, you've now explained a Gaussian model to do that, and then you run an A/B test, is that right? Yeah, you could set it up lots of different ways. You could have your A/B tests, and within those A/B tests (say a classic A/B test is: if I search for "shoe", I get ranking A or ranking B) there are a lot of creative ways to do it, but a classic way might be to say: in the third slot, I'm going to put the explore item. So in every other slot I might have my traditional LTR model ranking results, using LambdaMART or SVMrank or any of these traditional learning-to-rank models, or not even learning to rank, some other solution that works well with your features, and then you slot in that third result that's going to explore a bit, that's going to be different, or as many results as you want.
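The third-slot idea can be sketched directly; the function name, slot choice, and doc IDs below are invented for illustration.

```python
# Slot one "explore" candidate into an otherwise exploit-only ranking.
def interleave_explore(exploit_ranking, explore_item, slot=2):
    """Insert explore_item at the given 0-based slot, avoiding duplicates."""
    result = [d for d in exploit_ranking if d != explore_item]
    result.insert(slot, explore_item)
    return result

page = interleave_explore(["a", "b", "c", "d"], "x")
# the LTR-ranked results shift down one position from the explore slot on
```

Because only one slot is sacrificed per page, the cost of exploration is bounded while still generating clicks on results the old algorithm would never have shown.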
Another completely different option is to use it as a means to drive other surfaces in search results. We don't just show people ten results anymore; there are different UI widgets. Off to the right you might have something that looks a bit more like an ad, or suggestions, or in-product recommendations, or in a product search you might have similar products to those you searched for, different kinds of prompts, and you might explore in those spaces too, to get more click data and traffic, and explore what might be relevant. So it depends a lot on how you want to drive your UI in your specific use case and what's appropriate. And, help me understand: is this different from click models? Because we also have the click-bias problem, and we could redistribute the click weight toward those unseen items. Is this different? So yeah, it's different; this is step two of a process. Step one, before you even get here: if you search for "shoe" and notice something gets a certain click-through rate, you don't necessarily want to take that raw number of clicks, because even among the things you're showing users there is something called position bias. People might scan top to bottom, and they're just going to click on the first result more than the second result. There are lots of reasons for that; even when they notice both, they might say, oh, this algorithm must know what it's doing. People scan top to bottom, and for different reasons they will just click the first result more than the second, and so on. It's an interesting phenomenon of psychology, how people process the search results that are shown to them.
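A common correction for the position bias just described is inverse propensity weighting: divide each click by the estimated probability that its position was even examined. The propensity values and click log below are invented; in practice propensities are estimated, for example via small randomization experiments.

```python
# Hypothetical examination propensities per position (position 0 = top).
propensity = [1.0, 0.6, 0.4]

# Invented click log: (doc, position shown at).
clicks = [("d1", 0), ("d2", 1), ("d2", 1), ("d3", 2)]

# Weight each click by 1 / P(examined | position).
weighted = {}
for doc, pos in clicks:
    weighted[doc] = weighted.get(doc, 0.0) + 1.0 / propensity[pos]

# a single click at a low position now counts for more than one at the top,
# so raw click counts no longer just echo the old ranking
```

These debiased counts, not the raw ones, are what you would then feed into the learning-to-rank labels discussed earlier.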
Yeah, exactly. And by the way, just as you explained this, it occurred to me: have you noticed how the interfaces have changed? You go on YouTube to watch Shorts, and there is no way to search them, right? You just click, and you watch and watch. I think at that point, first of all, there is no bias, because you don't know what's next; but the goal is also more entertainment. It's not that I have an information need and am actually searching for something. But I guess sometimes, and I think you also spoke about this, search blends with recommender systems, because the user might not know what they're looking for. Sometimes they do, sometimes they don't. It's explorative search, which means it could become a recommender system, which means you could plug in those explorative results.

Exactly, and that blending can be very interesting. It can also be challenging, because search is a very intentional activity. If you work with, say, a dense vector representation, there are relationships that make sense in a general way; when you train on Wikipedia, it makes sense that these things go together. But maybe in a specific domain, a specific profession, there's jargon, and it turns out those things don't go together. People will notice, and they'll complain about it. A sort of domain-independent example you sometimes see: things that are opposites actually co-occur together. "I want to cancel my reservation" and "I want to confirm my reservation" co-occur with the same kinds of words. Sometimes, in these retrieval situations, you can get away with that; in a recommendations context, people are like, yeah, whatever. But when I'm searching, it's like, how dare you not understand me. People almost get offended, because it's almost like going up to a person at a store, asking a question, and being given the exact opposite.
Yeah, exactly. I think my wife was recently doing a search in one of the grocery apps, you know, everything gets delivered home these days, even in Europe, and she was searching for "oil". She said, hey, your vector search research could probably be applied here, because the top result was tuna fish. She was like, why? Well, maybe because oil is one of the ingredients, the tuna comes in oil, so what do you expect. But she was looking for a category of things; like searching for breads, right, and getting yogurts. I think that's probably a negative example of explorative search, or maybe not, I'm not sure, but it is puzzling to the user not to see breads on the page and to see yogurts instead.

Yeah, exactly. That's actually a good example of a traditional search engine doing that: it's "oil", but it's "tuna in oil". A dense vector search might actually work better there, until you get motor oil. So both sides have to be tuned carefully, because search is one of those things where, and I think the Google article we talked about a while ago actually discusses this, it's not just lost revenue but brand retention: people will not come back to your store if the search doesn't get them, if it seems dumb. People notice when search doesn't understand them.

Yeah, a 300-billion-dollar opportunity out there for everyone. So, in this maze of things, learning to rank, dense retrieval, we still also need to be concerned with how we manage the project, right? There are a lot of ideas and thoughts here, but I'm particularly interested in the search engineer role transcending itself into something else. For example, a few years ago I was, I guess, a Solr relevance search engineer, and I was just reading these XML files and tweaking and tweaking, and then, you
know, the indexing and search pipeline, and so on. But today, as you mentioned, data science came into play and is still being integrated. What other aspects do we need to think about? How should we form search teams? I believe you have a blog post on that as well; we'll cite it in the show notes.

Oh yeah, from Shopify; and I know the Shopify engineering blog, and Eric Pugh at OpenSource Connections talks about this a lot too. I can't say I have all the answers, because you're right, it's a brand-new space, but I think it's an interesting thing to talk about. There are two principles I keep in mind when I think about a search team, and you have to do both; it's like building a plane while you're flying it. I remember at OpenSource Connections we would sometimes get projects that were almost too infrastructure-focused, and other projects that were focused only on building the experiments and solutions. What I mean by that is: for the infrastructure folks, "experiments" means something like, we're going to spend nine months gathering clicks and processing them and trying to understand what's relevant, before you've ever touched a model or tuned relevance anywhere. At the other end of the spectrum you have teams that say: we're not even going to try to understand what's relevant, just tune things and, you know, YOLO, ship it, and hopefully things look good. Honestly, both of those are anti-patterns. In the case where we just study the problem, we never actually deliver anything, and, not just as a consultant but as a practitioner working on a team, your stakeholders are going to lose patience and you won't have much success. On the other hand, I had one project where we spent months and months developing experiments; they did have the
ability to A/B test, but we didn't really have any ability to understand, to dig underneath what was happening at the query level. We just spent months experimenting, threw a dozen experiments at the wall as A/B tests, and none of them turned out to matter. I suspect that in that case it was actually a performance issue or a UX issue that was the bigger problem. Really, what you have to be doing in this relevance space is shipping experiments all the time, with whatever infrastructure you have to support them, while simultaneously improving the engine you use to understand the quality of relevance. As an example, you might start out with Quepid and start shipping things incrementally, getting people's feedback, as imperfect as it might be, knowing it's wrong, and keep shipping changes. But at the same time, while you're doing that with your right hand, your left hand is going: oh, we have to start gathering click data, because eventually the Quepid experimentation might hit diminishing returns, or run into really subtle cases where people can't easily tell the difference. If you're not doing both, you're really going to get yourself into trouble.

Yeah, and it's amazing that you described it not as an individual-level experience but as a team-level experience, right? Now everyone can figure out: add the data scientist, add the UX person, the product manager, the relevance engineers, and work together in one single concert.

Yeah, totally. And that's a tricky thing to build. First, yes, you need all those roles, and you need a tremendous amount of data literacy. So not only do you need those roles, you probably need a good, strong core of engineering and data working together; that's a good place to start. But as you eventually add
someone like a product manager, what does a product manager on a search team do? That's a really interesting question, because I think it's quite different from building other features. A product manager on a search team is constantly looking at data, let's say at the query level (it doesn't have to be the query level; it could be at the user level or whatever), trying to say: here's a cluster of problems we have, or opportunities. Maybe it's this kind of search, a search for colors in products, or for a certain type of terminology. They have to be able to do that data analysis constantly, advocate for the data they need to get implemented, and then understand, to some degree, when working with their data and engineering team, what the experiments are: let's think about half a dozen experiments that could treat this problem, prioritize and triage them by the reward-versus-effort trade-off, and really plan out how we run those experiments. And when you do that planning, it's not just about the nuts and bolts of how we get an experiment into production, we built this pipeline, we do these things; it's also planning how we'll measure and answer the questions those experiments pose. I feel like that's one of the toughest roles; it's a unicorn. It's hard to find someone with all of those skills, but it's also essential to a really successful search team.

Really accurately put. I'm still learning the product manager role myself, but that's exactly right: you need to generate the insight for yourself; it's constant detective work, you keep looking.

Yeah, that's a good way of putting it. You have to be a really good detective, and then you also have to figure out where you're going to go digging as a detective. What am I
going to do? Maybe I need to set up a team of manual labelers, because there's something in our click data that's not quite right, or do something different with our click data. You really have to understand and appreciate the nature of your evidence.

Yeah, exactly. And maybe to add to that: when I used to be an engineer, what did I do daily? You open Jira and ask, what's the next ticket with my name on it? Somebody thought about it, somebody decided what needs to be done; they don't tell you this might be an experiment, but it's given to you, right? With product management, I don't open Jira and know what to do every day. I think: okay, look at the metrics, look at the query logs, see what engineering has done, what experiment we just completed, and try to combine these pieces: what did we learn from that experiment, and what might be the next step? And there's also subscribing to bold changes. Sometimes it's easy to go step by step, evolutionarily, but sometimes you need to jump over steps. That requires boldness, and then messaging it, saying, hey, we need this.

It makes me think of going after grant funding for research. In the US, if you want to do a big research project at a university, you go to a government agency with a big proposal. These big bets are almost like that: yes, we have this side over here that's constantly evolving whatever currently works, but for the big bets you almost have to think in terms of, we want to spend X amount of time researching this area to see whether this direction works out. And as part of that, you also define the early tests, the prototypes, before building the big thing, to know whether we should invest even further. That's a tricky thing, and I think
that's something a really good product manager can coach the stakeholders on: thinking of these things as bets, not as sure things that we know are going to work out. That's really important too.

Yeah, exactly. One example comes to mind: wasn't it eBay that, when they added type-ahead, which they didn't have before, tapped into something like a hundred-million-dollar market? Because you reduce the time spent in each search session, you get to the result faster, which means you get the transaction faster. That probably involved product management thinking, what if we do this; outside-the-box thinking.

Yeah, totally. Before we close off: I really enjoyed this conversation, Doug, and I think we could talk an entire day; my inner engineer is having a lot of fun really getting into this. But I love asking this question, and you partially answered it during the podcast: the why-question. You've done a ton. It's not just that you imagined things or told someone else to do them; you actually did them yourself: Quepid, the Learning to Rank plugin, Splainer, the books, all of these real, almost physical objects, right? And you still keep going and going: you speak at conferences, you push so much material on LinkedIn and Twitter that I can barely follow. What drives you in this space?

That's a great question. I think what gets me excited is that it feels like the future of how people interact with computers. Take Google: for a long time people have just called it "Google", but it's really a command-line interface that understands natural language. I feel like more and more interfaces are this fuzzy, search-like interaction, and it's this
thing creeping up on us that people aren't quite realizing. The other part is that it's just a fascinating field: the intersection of data and engineering and product and UX, where you have to have all of these parts of your brain working together to understand and solve the problem. It's a huge intellectual challenge. More foundationally, though, interacting with interesting and great people in the field also drives me: how much fun it is to be out there with people like you, Dmitry, and others who get excited about the problem and like to nerd out about it. The social aspect of sharing my crazy ideas, or products, or books, getting feedback, and continuing the conversation.

Yeah, and I think it's endless. You're making a great contribution, but it's an endless journey in many ways, right? So many facets, so much dimensionality.

Yeah, totally, absolutely.

And of course people want to learn these things. I myself subscribe to a course from time to time and just have a blast for, I don't know, four weeks, two months, whatever; not for the certificate but for the knowledge, and for that feeling of connection to the knowledge. And with that, I want to ask whether you have any announcements.

Yeah, I'm doing a course with Sphere. Sphere is a fantastic company that's trying to build these next-level courses; it's not your basic Udemy course where you learn some basics, it's almost like a masterclass with a professional. They're really focused on machine learning engineering right now, so they have recommender systems and all these things, and I'm doing an ML-powered search course. It covers a lot of what we've talked about, starting from just appreciating the relevance problem, to building up
learning-to-rank models, with a real focus on the problem of ranking, and then also discovery: doing feature exploration and training-data exploration to figure out what's even relevant beyond the filter bubble of our current search algorithms. So if you're interested, catch up with me at getsphere.com, where you can find the ML-powered search course. And then of course all my other things out there: AI-Powered Search, written with Trey and Max, and Relevant Search, hopefully still relevant, so to speak, and all the other stuff people find interesting and useful. I also want to keep plugging OpenSource Connections: they have great training and consulting, I was a key part of the training side of that team, and they're a great resource to go to.

That's a fantastic announcement, thanks for that. I also want to say why I enjoy reading your book Relevant Search: not only because you share so much there, like the example of indexing songs, where I went, what, an inverted index for that? Your way of writing is very thorough; you create a network of thoughts. As I go through the text, you say, we will talk about this later, but let me spend a few sentences explaining what I mean, and it feels like a conversation.

Yeah, I try to be conversational, including the typical bad jokes and that sort of humor.

Exactly. I'm also learning on that side, so that's fantastic, and it gives that feel. Keep going, keep doing this. I enjoy following what you do and connecting once in a while; you sometimes give me really good advice, you know, on how to choose the title for a blog post, or whether I should venture into something or not, things like that. It's amazing, this cross-pollination, so I'm enjoying it a lot, and I recommend everyone subscribe to your course; we will of course link it. Thank you, and have fun!

Oh,
definitely, we will. Awesome, thanks so much, Doug. I enjoyed it, and see you soon, hopefully in person. Yeah, same. All right, bye-bye, take care. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md b/transcripts_with_timestamps/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md new file mode 100644 index 0000000..b533eaf --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/economical-way-of-serving-vector-search-workloads-with-simon-eskildsen-ceo-turbopuffer.md @@ -0,0 +1,4199 @@ +--- +description: '

Turbopuffer search engine supports such products as Cursor, Notion, + Linear, Superhuman and Readwise.

This episode on YouTube: https://youtu.be/I8Ztqajighg

Medium: + https://dmitry-kan.medium.com/vector-podcast-simon-eskildsen-turbopuffer-69e456da8df3

Dev: + https://dev.to/vectorpodcast/vector-podcast-simon-eskildsen-turbopuffer-cfa

If + you are on Lucene / OpenSearch stack, you can go managed by signing up here: https://console.aiven.io/signup?utm_source=youtube&utm_medium=&&utm_content=vectorpodcast

Time + codes:

00:00 Intro

00:15 Napkin Problem 4: Throughput of Redis

01:35 + Episode intro

02:45 Simon''s background, including implementation of Turbopuffer

09:23 + How Cursor became an early client

11:25 How to test pre-launch

14:38 + Why a new vector DB deserves to exist?

20:39 Latency aspect

26:27 Implementation + language for Turbopuffer

28:11 Impact of LLM coding tools on programmer craft

30:02 + Engineer 2 CEO transition

35:10 Architecture of Turbopuffer

43:25 Disk + vs S3 latency, NVMe disks, DRAM

48:27 Multitenancy

50:29 Recall@N benchmarking

59:38 + filtered ANN and Big-ANN Benchmarks

1:00:54 What users care about more (than + Recall@N benchmarking)

1:01:28 Spicy question about benchmarking in competition

1:06:01 + Interesting challenges ahead to tackle

1:10:13 Simon''s announcement

Show + notes:

- Turbopuffer in Cursor: https://www.youtube.com/watch?v=oFfVt3S51T4&t=5223s

transcript: + https://lexfridman.com/cursor-team-transcript

- + https://turbopuffer.com/

- + Napkin Math: https://sirupsen.com/napkin

- + Follow Simon on X: https://x.com/Sirupsen

- + Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696/

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250919_060954_7832a7c20742f9493a19a27a0c5d8947.png +pub_date: Fri, 19 Sep 2025 06:09:39 GMT +title: Economical way of serving vector search workloads with Simon Eskildsen, CEO + Turbopuffer +url: https://rss.com/podcasts/vector-podcast/2222846 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 16.28, "text": " Now, + let''s get started.", "tokens": [50364, 823, 11, 718, 311, 483, 1409, 13, 51178], + "temperature": 0.0, "avg_logprob": -0.8359620445653012, "compression_ratio": 1.1228070175438596, + "no_speech_prob": 0.049298979341983795}, {"id": 1, "seek": 0, "start": 16.28, "end": + 18.28, "text": " MAPKIN Problem 4", "tokens": [51178, 376, 4715, 42, 1464, 11676, + 1017, 51278], "temperature": 0.0, "avg_logprob": -0.8359620445653012, "compression_ratio": + 1.1228070175438596, "no_speech_prob": 0.049298979341983795}, {"id": 2, "seek": 0, + "start": 18.28, "end": 24.44, "text": " Today, as you are preparing your organic, + high mountain type needs along in the kitchen", "tokens": [51278, 2692, 11, 382, + 291, 366, 10075, 428, 10220, 11, 1090, 6937, 2010, 2203, 2051, 294, 264, 6525, 51586], + "temperature": 0.0, "avg_logprob": -0.8359620445653012, "compression_ratio": 1.1228070175438596, + "no_speech_prob": 0.049298979341983795}, {"id": 3, "seek": 2444, "start": 24.44, + "end": 25.560000000000002, "text": " net.", "tokens": [50364, 2533, 13, 50420], + "temperature": 0.0, "avg_logprob": -0.21295687916514638, "compression_ratio": 1.5560344827586208, + "no_speech_prob": 0.08836518973112106}, {"id": 4, "seek": 2444, "start": 25.560000000000002, + "end": 30.64, "text": " One of your lovely co-workers mentioned that they were looking + at adding more radices because", "tokens": [50420, 1485, 295, 428, 7496, 598, 12, + 37101, 2835, 300, 436, 645, 1237, 412, 5127, 544, 2843, 1473, 570, 50674], "temperature": + 0.0, "avg_logprob": -0.21295687916514638, "compression_ratio": 1.5560344827586208, + 
"no_speech_prob": 0.08836518973112106}, {"id": 5, "seek": 2444, "start": 30.64, + "end": 37.08, "text": " it was maxing out at 10,000 commands per second, which they + were trending aggressively towards.", "tokens": [50674, 309, 390, 11469, 278, 484, + 412, 1266, 11, 1360, 16901, 680, 1150, 11, 597, 436, 645, 28692, 32024, 3030, 13, + 50996], "temperature": 0.0, "avg_logprob": -0.21295687916514638, "compression_ratio": + 1.5560344827586208, "no_speech_prob": 0.08836518973112106}, {"id": 6, "seek": 2444, + "start": 37.08, "end": 39.08, "text": " You asked them how they were using it.", + "tokens": [50996, 509, 2351, 552, 577, 436, 645, 1228, 309, 13, 51096], "temperature": + 0.0, "avg_logprob": -0.21295687916514638, "compression_ratio": 1.5560344827586208, + "no_speech_prob": 0.08836518973112106}, {"id": 7, "seek": 2444, "start": 39.08, + "end": 42.96, "text": " Were they writing some obscure order and command?", "tokens": + [51096, 12448, 436, 3579, 512, 34443, 1668, 293, 5622, 30, 51290], "temperature": + 0.0, "avg_logprob": -0.21295687916514638, "compression_ratio": 1.5560344827586208, + "no_speech_prob": 0.08836518973112106}, {"id": 8, "seek": 2444, "start": 42.96, + "end": 50.6, "text": " They would BPF probes to determine that it was all get key + and set key value.", "tokens": [51290, 814, 576, 40533, 37, 1239, 279, 281, 6997, + 300, 309, 390, 439, 483, 2141, 293, 992, 2141, 2158, 13, 51672], "temperature": + 0.0, "avg_logprob": -0.21295687916514638, "compression_ratio": 1.5560344827586208, + "no_speech_prob": 0.08836518973112106}, {"id": 9, "seek": 5060, "start": 50.6, "end": + 55.760000000000005, "text": " They also confirmed all the values were about less + than 64 bytes.", "tokens": [50364, 814, 611, 11341, 439, 264, 4190, 645, 466, 1570, + 813, 12145, 36088, 13, 50622], "temperature": 0.0, "avg_logprob": -0.26877395552818223, + "compression_ratio": 1.5158730158730158, "no_speech_prob": 0.002365718362852931}, + {"id": 10, "seek": 5060, "start": 
55.760000000000005, "end": 60.96, "text": " For + those unfamiliar with radices, it''s single threaded in memory key value store written", + "tokens": [50622, 1171, 729, 29415, 365, 2843, 1473, 11, 309, 311, 2167, 47493, + 294, 4675, 2141, 2158, 3531, 3720, 50882], "temperature": 0.0, "avg_logprob": -0.26877395552818223, + "compression_ratio": 1.5158730158730158, "no_speech_prob": 0.002365718362852931}, + {"id": 11, "seek": 5060, "start": 60.96, "end": 61.96, "text": " in C.", "tokens": + [50882, 294, 383, 13, 50932], "temperature": 0.0, "avg_logprob": -0.26877395552818223, + "compression_ratio": 1.5158730158730158, "no_speech_prob": 0.002365718362852931}, + {"id": 12, "seek": 5060, "start": 61.96, "end": 66.8, "text": " And faced, after + this encounter, you walk to the window.", "tokens": [50932, 400, 11446, 11, 934, + 341, 8593, 11, 291, 1792, 281, 264, 4910, 13, 51174], "temperature": 0.0, "avg_logprob": + -0.26877395552818223, "compression_ratio": 1.5158730158730158, "no_speech_prob": + 0.002365718362852931}, {"id": 13, "seek": 5060, "start": 66.8, "end": 70.72, "text": + " You look out and see if your high mountain type needs along.", "tokens": [51174, + 509, 574, 484, 293, 536, 498, 428, 1090, 6937, 2010, 2203, 2051, 13, 51370], "temperature": + 0.0, "avg_logprob": -0.26877395552818223, "compression_ratio": 1.5158730158730158, + "no_speech_prob": 0.002365718362852931}, {"id": 14, "seek": 5060, "start": 70.72, + "end": 76.48, "text": " As you stare at yet another condominium building being built, + it hits you.", "tokens": [51370, 1018, 291, 22432, 412, 1939, 1071, 2224, 6981, + 2197, 2390, 885, 3094, 11, 309, 8664, 291, 13, 51658], "temperature": 0.0, "avg_logprob": + -0.26877395552818223, "compression_ratio": 1.5158730158730158, "no_speech_prob": + 0.002365718362852931}, {"id": 15, "seek": 5060, "start": 76.48, "end": 79.48, "text": + " 10,000 commands per second.", "tokens": [51658, 1266, 11, 1360, 16901, 680, 1150, + 13, 51808], "temperature": 0.0, 
"avg_logprob": -0.26877395552818223, "compression_ratio": + 1.5158730158730158, "no_speech_prob": 0.002365718362852931}, {"id": 16, "seek": + 7948, "start": 79.48, "end": 80.48, "text": " 10,000.", "tokens": [50364, 1266, + 11, 1360, 13, 50414], "temperature": 0.0, "avg_logprob": -0.4315867216690727, "compression_ratio": + 1.3822222222222222, "no_speech_prob": 0.1362903118133545}, {"id": 17, "seek": 7948, + "start": 80.48, "end": 83.72, "text": " Isn''t that a bit low?", "tokens": [50414, + 6998, 380, 300, 257, 857, 2295, 30, 50576], "temperature": 0.0, "avg_logprob": -0.4315867216690727, + "compression_ratio": 1.3822222222222222, "no_speech_prob": 0.1362903118133545}, + {"id": 18, "seek": 7948, "start": 83.72, "end": 90.44, "text": " Shouldn''t something + that''s fundamentally just doing random memory reads and writes over", "tokens": + [50576, 34170, 380, 746, 300, 311, 17879, 445, 884, 4974, 4675, 15700, 293, 13657, + 670, 50912], "temperature": 0.0, "avg_logprob": -0.4315867216690727, "compression_ratio": + 1.3822222222222222, "no_speech_prob": 0.1362903118133545}, {"id": 19, "seek": 7948, + "start": 90.44, "end": 95.04, "text": " an established TCP session be able to do + more?", "tokens": [50912, 364, 7545, 48965, 5481, 312, 1075, 281, 360, 544, 30, + 51142], "temperature": 0.0, "avg_logprob": -0.4315867216690727, "compression_ratio": + 1.3822222222222222, "no_speech_prob": 0.1362903118133545}, {"id": 20, "seek": 7948, + "start": 95.04, "end": 96.04, "text": " Hello there.", "tokens": [51142, 2425, 456, + 13, 51192], "temperature": 0.0, "avg_logprob": -0.4315867216690727, "compression_ratio": + 1.3822222222222222, "no_speech_prob": 0.1362903118133545}, {"id": 21, "seek": 7948, + "start": 96.04, "end": 98.08000000000001, "text": " Vector podcast is back.", "tokens": + [51192, 691, 20814, 7367, 307, 646, 13, 51294], "temperature": 0.0, "avg_logprob": + -0.4315867216690727, "compression_ratio": 1.3822222222222222, "no_speech_prob": + 0.1362903118133545}, {"id": 
22, "seek": 7948, "start": 98.08000000000001, "end": + 100.04, "text": " Season 4.", "tokens": [51294, 16465, 1017, 13, 51392], "temperature": + 0.0, "avg_logprob": -0.4315867216690727, "compression_ratio": 1.3822222222222222, + "no_speech_prob": 0.1362903118133545}, {"id": 23, "seek": 7948, "start": 100.04, + "end": 105.96000000000001, "text": " And we are kicking off with an exciting topic + and guest assignment, Eskulls and CEO of", "tokens": [51392, 400, 321, 366, 19137, + 766, 365, 364, 4670, 4829, 293, 8341, 15187, 11, 2313, 74, 858, 82, 293, 9282, 295, + 51688], "temperature": 0.0, "avg_logprob": -0.4315867216690727, "compression_ratio": + 1.3822222222222222, "no_speech_prob": 0.1362903118133545}, {"id": 24, "seek": 7948, + "start": 105.96000000000001, "end": 106.96000000000001, "text": " TurboPuffer.", + "tokens": [51688, 35848, 47, 1245, 260, 13, 51738], "temperature": 0.0, "avg_logprob": + -0.4315867216690727, "compression_ratio": 1.3822222222222222, "no_speech_prob": + 0.1362903118133545}, {"id": 25, "seek": 10696, "start": 107.36, "end": 113.19999999999999, + "text": " I''ve been watching you guys from almost from the start, just following + each other on", "tokens": [50384, 286, 600, 668, 1976, 291, 1074, 490, 1920, 490, + 264, 722, 11, 445, 3480, 1184, 661, 322, 50676], "temperature": 0.0, "avg_logprob": + -0.3795521384791324, "compression_ratio": 1.612, "no_speech_prob": 0.6404481530189514}, + {"id": 26, "seek": 10696, "start": 113.19999999999999, "end": 115.75999999999999, + "text": " Twitter, like virtual friends.", "tokens": [50676, 5794, 11, 411, 6374, + 1855, 13, 50804], "temperature": 0.0, "avg_logprob": -0.3795521384791324, "compression_ratio": + 1.612, "no_speech_prob": 0.6404481530189514}, {"id": 27, "seek": 10696, "start": + 115.75999999999999, "end": 119.63999999999999, "text": " And it''s funny that before + this episode, you''re the CEO of the company and this for this", "tokens": [50804, + 400, 309, 311, 4074, 300, 949, 341, 3500, 11, 
291, 434, 264, 9282, 295, 264, 2237, + 293, 341, 337, 341, 50998], "temperature": 0.0, "avg_logprob": -0.3795521384791324, + "compression_ratio": 1.612, "no_speech_prob": 0.6404481530189514}, {"id": 28, "seek": + 10696, "start": 119.63999999999999, "end": 125.56, "text": " episode, you try to + sell TurboPuffer to me and say, hey, why don''t you use it?", "tokens": [50998, + 3500, 11, 291, 853, 281, 3607, 35848, 47, 1245, 260, 281, 385, 293, 584, 11, 4177, + 11, 983, 500, 380, 291, 764, 309, 30, 51294], "temperature": 0.0, "avg_logprob": + -0.3795521384791324, "compression_ratio": 1.612, "no_speech_prob": 0.6404481530189514}, + {"id": 29, "seek": 10696, "start": 125.56, "end": 128.35999999999999, "text": " + Did you make a compound tom should pass?", "tokens": [51294, 2589, 291, 652, 257, + 14154, 2916, 820, 1320, 30, 51434], "temperature": 0.0, "avg_logprob": -0.3795521384791324, + "compression_ratio": 1.612, "no_speech_prob": 0.6404481530189514}, {"id": 30, "seek": + 10696, "start": 128.35999999999999, "end": 132.2, "text": " Yeah, should pass, for + sure.", "tokens": [51434, 865, 11, 820, 1320, 11, 337, 988, 13, 51626], "temperature": + 0.0, "avg_logprob": -0.3795521384791324, "compression_ratio": 1.612, "no_speech_prob": + 0.6404481530189514}, {"id": 31, "seek": 10696, "start": 132.2, "end": 134.68, "text": + " But tell me, hey, welcome, first of all, welcome.", "tokens": [51626, 583, 980, + 385, 11, 4177, 11, 2928, 11, 700, 295, 439, 11, 2928, 13, 51750], "temperature": + 0.0, "avg_logprob": -0.3795521384791324, "compression_ratio": 1.612, "no_speech_prob": + 0.6404481530189514}, {"id": 32, "seek": 13468, "start": 134.68, "end": 139.48000000000002, + "text": " And thank you very much for having with me.", "tokens": [50364, 400, 1309, + 291, 588, 709, 337, 1419, 365, 385, 13, 50604], "temperature": 0.0, "avg_logprob": + -0.29737889188007244, "compression_ratio": 1.5413533834586466, "no_speech_prob": + 0.3278094530105591}, {"id": 33, "seek": 13468, "start": 
139.48000000000002, "end": + 142.36, "text": " It''s a tradition to usually start with the background.", "tokens": + [50604, 467, 311, 257, 6994, 281, 2673, 722, 365, 264, 3678, 13, 50748], "temperature": + 0.0, "avg_logprob": -0.29737889188007244, "compression_ratio": 1.5413533834586466, + "no_speech_prob": 0.3278094530105591}, {"id": 34, "seek": 13468, "start": 142.36, + "end": 146.08, "text": " If you could speak in your own words about yourself, your + journey.", "tokens": [50748, 759, 291, 727, 1710, 294, 428, 1065, 2283, 466, 1803, + 11, 428, 4671, 13, 50934], "temperature": 0.0, "avg_logprob": -0.29737889188007244, + "compression_ratio": 1.5413533834586466, "no_speech_prob": 0.3278094530105591}, + {"id": 35, "seek": 13468, "start": 146.08, "end": 152.96, "text": " I know that + you''ve worked at Shopify at some point, also scaling databases, I guess, right?", + "tokens": [50934, 286, 458, 300, 291, 600, 2732, 412, 43991, 412, 512, 935, 11, + 611, 21589, 22380, 11, 286, 2041, 11, 558, 30, 51278], "temperature": 0.0, "avg_logprob": + -0.29737889188007244, "compression_ratio": 1.5413533834586466, "no_speech_prob": + 0.3278094530105591}, {"id": 36, "seek": 13468, "start": 152.96, "end": 156.4, "text": + " But I''ve also been following your napkin math newsletter.", "tokens": [51278, + 583, 286, 600, 611, 668, 3480, 428, 9296, 5843, 5221, 26469, 13, 51450], "temperature": + 0.0, "avg_logprob": -0.29737889188007244, "compression_ratio": 1.5413533834586466, + "no_speech_prob": 0.3278094530105591}, {"id": 37, "seek": 13468, "start": 156.4, + "end": 164.28, "text": " I was reading maybe I''ll quote some text today from there, + just to amuse an exciting audience.", "tokens": [51450, 286, 390, 3760, 1310, 286, + 603, 6513, 512, 2487, 965, 490, 456, 11, 445, 281, 669, 438, 364, 4670, 4034, 13, + 51844], "temperature": 0.0, "avg_logprob": -0.29737889188007244, "compression_ratio": + 1.5413533834586466, "no_speech_prob": 0.3278094530105591}, {"id": 38, "seek": 16428, + 
"start": 164.28, "end": 166.28, "text": " But tell me about yourself.", "tokens": + [50364, 583, 980, 385, 466, 1803, 13, 50464], "temperature": 0.0, "avg_logprob": + -0.2525026213448003, "compression_ratio": 1.5968992248062015, "no_speech_prob": + 0.15724892914295197}, {"id": 39, "seek": 16428, "start": 166.28, "end": 172.76, + "text": " Yeah, I can give a very brief overview and if we can dig into anything, + if there''s anything", "tokens": [50464, 865, 11, 286, 393, 976, 257, 588, 5353, + 12492, 293, 498, 321, 393, 2528, 666, 1340, 11, 498, 456, 311, 1340, 50788], "temperature": + 0.0, "avg_logprob": -0.2525026213448003, "compression_ratio": 1.5968992248062015, + "no_speech_prob": 0.15724892914295197}, {"id": 40, "seek": 16428, "start": 172.76, + "end": 174.36, "text": " that stands out.", "tokens": [50788, 300, 7382, 484, 13, + 50868], "temperature": 0.0, "avg_logprob": -0.2525026213448003, "compression_ratio": + 1.5968992248062015, "no_speech_prob": 0.15724892914295197}, {"id": 41, "seek": 16428, + "start": 174.36, "end": 178.52, "text": " I started programming when I was a teenager.", + "tokens": [50868, 286, 1409, 9410, 562, 286, 390, 257, 21440, 13, 51076], "temperature": + 0.0, "avg_logprob": -0.2525026213448003, "compression_ratio": 1.5968992248062015, + "no_speech_prob": 0.15724892914295197}, {"id": 42, "seek": 16428, "start": 178.52, + "end": 181.04, "text": " Similar to you, English is not my first language.", "tokens": + [51076, 10905, 281, 291, 11, 3669, 307, 406, 452, 700, 2856, 13, 51202], "temperature": + 0.0, "avg_logprob": -0.2525026213448003, "compression_ratio": 1.5968992248062015, + "no_speech_prob": 0.15724892914295197}, {"id": 43, "seek": 16428, "start": 181.04, + "end": 187.64, "text": " So at some point, I exhausted the Danish web and then like + divulged into video game addiction", "tokens": [51202, 407, 412, 512, 935, 11, 286, + 17992, 264, 36944, 3670, 293, 550, 411, 47291, 3004, 666, 960, 1216, 16835, 51532], + "temperature": 0.0, 
"avg_logprob": -0.2525026213448003, "compression_ratio": 1.5968992248062015, + "no_speech_prob": 0.15724892914295197}, {"id": 44, "seek": 16428, "start": 187.64, + "end": 192.08, "text": " for three years as a teenager to learn enough English to + sort of, you know, get my own", "tokens": [51532, 337, 1045, 924, 382, 257, 21440, + 281, 1466, 1547, 3669, 281, 1333, 295, 11, 291, 458, 11, 483, 452, 1065, 51754], + "temperature": 0.0, "avg_logprob": -0.2525026213448003, "compression_ratio": 1.5968992248062015, + "no_speech_prob": 0.15724892914295197}, {"id": 45, "seek": 19208, "start": 192.08, + "end": 195.64000000000001, "text": " chat CBT moment and take off point.", "tokens": + [50364, 5081, 18745, 51, 1623, 293, 747, 766, 935, 13, 50542], "temperature": 0.0, + "avg_logprob": -0.1670213400148878, "compression_ratio": 1.7490196078431373, "no_speech_prob": + 0.03435908257961273}, {"id": 46, "seek": 19208, "start": 195.64000000000001, "end": + 200.24, "text": " And then I spent a lot of time in high school being not very good + at competitive programming,", "tokens": [50542, 400, 550, 286, 4418, 257, 688, 295, + 565, 294, 1090, 1395, 885, 406, 588, 665, 412, 10043, 9410, 11, 50772], "temperature": + 0.0, "avg_logprob": -0.1670213400148878, "compression_ratio": 1.7490196078431373, + "no_speech_prob": 0.03435908257961273}, {"id": 47, "seek": 19208, "start": 200.24, + "end": 204.08, "text": " but good enough to qualify for the small country of Denmark.", + "tokens": [50772, 457, 665, 1547, 281, 20276, 337, 264, 1359, 1941, 295, 28065, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.1670213400148878, "compression_ratio": + 1.7490196078431373, "no_speech_prob": 0.03435908257961273}, {"id": 48, "seek": 19208, + "start": 204.08, "end": 210.96, "text": " And then I spent almost a decade working + at Shopify doing mainly infrastructure work.", "tokens": [50964, 400, 550, 286, + 4418, 1920, 257, 10378, 1364, 412, 43991, 884, 8704, 6896, 589, 13, 51308], "temperature": + 
0.0, "avg_logprob": -0.1670213400148878, "compression_ratio": 1.7490196078431373, + "no_speech_prob": 0.03435908257961273}, {"id": 49, "seek": 19208, "start": 210.96, + "end": 216.36, "text": " So when I joined infrastructure Shopify and the infrastructure + team, we were doing, I mean,", "tokens": [51308, 407, 562, 286, 6869, 6896, 43991, + 293, 264, 6896, 1469, 11, 321, 645, 884, 11, 286, 914, 11, 51578], "temperature": + 0.0, "avg_logprob": -0.1670213400148878, "compression_ratio": 1.7490196078431373, + "no_speech_prob": 0.03435908257961273}, {"id": 50, "seek": 19208, "start": 216.36, + "end": 221.4, "text": " it was not even an infrastructure team like DevOps was just + becoming a thing.", "tokens": [51578, 309, 390, 406, 754, 364, 6896, 1469, 411, + 43051, 390, 445, 5617, 257, 551, 13, 51830], "temperature": 0.0, "avg_logprob": + -0.1670213400148878, "compression_ratio": 1.7490196078431373, "no_speech_prob": + 0.03435908257961273}, {"id": 51, "seek": 22140, "start": 221.4, "end": 224.76000000000002, + "text": " And we were driving just a couple hundred requests per second.", "tokens": + [50364, 400, 321, 645, 4840, 445, 257, 1916, 3262, 12475, 680, 1150, 13, 50532], + "temperature": 0.0, "avg_logprob": -0.16867480644812952, "compression_ratio": 1.7157190635451505, + "no_speech_prob": 0.021443955600261688}, {"id": 52, "seek": 22140, "start": 224.76000000000002, + "end": 228.24, "text": " And by the time I left, we saw peaks of more than a million.", + "tokens": [50532, 400, 538, 264, 565, 286, 1411, 11, 321, 1866, 26897, 295, 544, + 813, 257, 2459, 13, 50706], "temperature": 0.0, "avg_logprob": -0.16867480644812952, + "compression_ratio": 1.7157190635451505, "no_speech_prob": 0.021443955600261688}, + {"id": 53, "seek": 22140, "start": 228.24, "end": 233.44, "text": " And I more or + less worked on all of the stateful systems that power that because they generally", + "tokens": [50706, 400, 286, 544, 420, 1570, 2732, 322, 439, 295, 264, 1785, 906, + 3652, 300, 
1347, 300, 570, 436, 5101, 50966], "temperature": 0.0, "avg_logprob": + -0.16867480644812952, "compression_ratio": 1.7157190635451505, "no_speech_prob": + 0.021443955600261688}, {"id": 54, "seek": 22140, "start": 233.44, "end": 238.48000000000002, + "text": " tend to be the bottleneck just playing whack-a-mole every single year + for every Black Friday", "tokens": [50966, 3928, 281, 312, 264, 44641, 547, 445, + 2433, 42877, 12, 64, 12, 3280, 306, 633, 2167, 1064, 337, 633, 4076, 6984, 51218], + "temperature": 0.0, "avg_logprob": -0.16867480644812952, "compression_ratio": 1.7157190635451505, + "no_speech_prob": 0.021443955600261688}, {"id": 55, "seek": 22140, "start": 238.48000000000002, + "end": 239.88, "text": " for many years.", "tokens": [51218, 337, 867, 924, 13, + 51288], "temperature": 0.0, "avg_logprob": -0.16867480644812952, "compression_ratio": + 1.7157190635451505, "no_speech_prob": 0.021443955600261688}, {"id": 56, "seek": + 22140, "start": 239.88, "end": 243.96, "text": " And I spent the majority of those + years on one of the last resort pages for Shopify as", "tokens": [51288, 400, 286, + 4418, 264, 6286, 295, 729, 924, 322, 472, 295, 264, 1036, 19606, 7183, 337, 43991, + 382, 51492], "temperature": 0.0, "avg_logprob": -0.16867480644812952, "compression_ratio": + 1.7157190635451505, "no_speech_prob": 0.021443955600261688}, {"id": 57, "seek": + 22140, "start": 243.96, "end": 244.96, "text": " well.", "tokens": [51492, 731, + 13, 51542], "temperature": 0.0, "avg_logprob": -0.16867480644812952, "compression_ratio": + 1.7157190635451505, "no_speech_prob": 0.021443955600261688}, {"id": 58, "seek": + 22140, "start": 244.96, "end": 250.6, "text": " One of those pages were the pages + are very scary in the middle of the night and where a lot", "tokens": [51542, 1485, + 295, 729, 7183, 645, 264, 7183, 366, 588, 6958, 294, 264, 2808, 295, 264, 1818, + 293, 689, 257, 688, 51824], "temperature": 0.0, "avg_logprob": -0.16867480644812952, + "compression_ratio": 
1.7157190635451505, "no_speech_prob": 0.021443955600261688}, + {"id": 59, "seek": 25060, "start": 250.6, "end": 253.16, "text": " of GME of course + runs through Shopify.", "tokens": [50364, 295, 460, 15454, 295, 1164, 6676, 807, + 43991, 13, 50492], "temperature": 0.0, "avg_logprob": -0.1950820474063649, "compression_ratio": + 1.4890829694323144, "no_speech_prob": 0.00859132967889309}, {"id": 60, "seek": 25060, + "start": 253.16, "end": 257.2, "text": " So very high responsibility on that.", + "tokens": [50492, 407, 588, 1090, 6357, 322, 300, 13, 50694], "temperature": 0.0, + "avg_logprob": -0.1950820474063649, "compression_ratio": 1.4890829694323144, "no_speech_prob": + 0.00859132967889309}, {"id": 61, "seek": 25060, "start": 257.2, "end": 263.44, "text": + " I left in 2021 and kind of jumped around at my friends'' companies helping them + with various", "tokens": [50694, 286, 1411, 294, 7201, 293, 733, 295, 13864, 926, + 412, 452, 1855, 6, 3431, 4315, 552, 365, 3683, 51006], "temperature": 0.0, "avg_logprob": + -0.1950820474063649, "compression_ratio": 1.4890829694323144, "no_speech_prob": + 0.00859132967889309}, {"id": 62, "seek": 25060, "start": 263.44, "end": 264.44, + "text": " things.", "tokens": [51006, 721, 13, 51056], "temperature": 0.0, "avg_logprob": + -0.1950820474063649, "compression_ratio": 1.4890829694323144, "no_speech_prob": + 0.00859132967889309}, {"id": 63, "seek": 25060, "start": 264.44, "end": 267.08, + "text": " And I spent almost my entire career at one company.", "tokens": [51056, + 400, 286, 4418, 1920, 452, 2302, 3988, 412, 472, 2237, 13, 51188], "temperature": + 0.0, "avg_logprob": -0.1950820474063649, "compression_ratio": 1.4890829694323144, + "no_speech_prob": 0.00859132967889309}, {"id": 64, "seek": 25060, "start": 267.08, + "end": 273.04, "text": " So I wanted to dabble and just go and basically help my + friends with any infrastructure challenges", "tokens": [51188, 407, 286, 1415, 281, + 28964, 638, 293, 445, 352, 293, 1936, 854, 
452, 1855, 365, 604, 6896, 4759, 51486], + "temperature": 0.0, "avg_logprob": -0.1950820474063649, "compression_ratio": 1.4890829694323144, + "no_speech_prob": 0.00859132967889309}, {"id": 65, "seek": 25060, "start": 273.04, + "end": 274.44, "text": " that they had.", "tokens": [51486, 300, 436, 632, 13, 51556], + "temperature": 0.0, "avg_logprob": -0.1950820474063649, "compression_ratio": 1.4890829694323144, + "no_speech_prob": 0.00859132967889309}, {"id": 66, "seek": 27444, "start": 274.44, + "end": 282.24, "text": " And in 2023 when Chatschy BT launched and the APIs launched, + I was working with my friends", "tokens": [50364, 400, 294, 44377, 562, 761, 1720, + 28629, 31144, 8730, 293, 264, 21445, 8730, 11, 286, 390, 1364, 365, 452, 1855, 50754], + "temperature": 0.0, "avg_logprob": -0.3703589038314106, "compression_ratio": 1.5278810408921932, + "no_speech_prob": 0.03991719335317612}, {"id": 67, "seek": 27444, "start": 282.24, + "end": 284.2, "text": " at this company called Readwise.", "tokens": [50754, 412, + 341, 2237, 1219, 17604, 3711, 13, 50852], "temperature": 0.0, "avg_logprob": -0.3703589038314106, + "compression_ratio": 1.5278810408921932, "no_speech_prob": 0.03991719335317612}, + {"id": 68, "seek": 27444, "start": 284.2, "end": 291.0, "text": " They have a product + similar to a pocket and others for reading articles later from", "tokens": [50852, + 814, 362, 257, 1674, 2531, 281, 257, 8963, 293, 2357, 337, 3760, 11290, 1780, 490, + 51192], "temperature": 0.0, "avg_logprob": -0.3703589038314106, "compression_ratio": + 1.5278810408921932, "no_speech_prob": 0.03991719335317612}, {"id": 69, "seek": 27444, + "start": 291.0, "end": 292.92, "text": " an Amal product.", "tokens": [51192, 364, + 2012, 304, 1674, 13, 51288], "temperature": 0.0, "avg_logprob": -0.3703589038314106, + "compression_ratio": 1.5278810408921932, "no_speech_prob": 0.03991719335317612}, + {"id": 70, "seek": 27444, "start": 292.92, "end": 297.48, "text": " And they asked + me to build a 
recommendation feature for articles.", "tokens": [51288, 400, 436, + 2351, 385, 281, 1322, 257, 11879, 4111, 337, 11290, 13, 51516], "temperature": 0.0, + "avg_logprob": -0.3703589038314106, "compression_ratio": 1.5278810408921932, "no_speech_prob": + 0.03991719335317612}, {"id": 71, "seek": 27444, "start": 297.48, "end": 299.48, + "text": " And I was like, well, it''s perfect, right?", "tokens": [51516, 400, 286, + 390, 411, 11, 731, 11, 309, 311, 2176, 11, 558, 30, 51616], "temperature": 0.0, + "avg_logprob": -0.3703589038314106, "compression_ratio": 1.5278810408921932, "no_speech_prob": + 0.03991719335317612}, {"id": 72, "seek": 27444, "start": 299.48, "end": 304.0, "text": + " LLMs or embedding models are basically just LLMs with their heads chopped off.", + "tokens": [51616, 441, 43, 26386, 420, 12240, 3584, 5245, 366, 1936, 445, 441, 43, + 26386, 365, 641, 8050, 16497, 766, 13, 51842], "temperature": 0.0, "avg_logprob": + -0.3703589038314106, "compression_ratio": 1.5278810408921932, "no_speech_prob": + 0.03991719335317612}, {"id": 73, "seek": 30400, "start": 304.0, "end": 306.32, "text": + " And they''re trained on exactly this data.", "tokens": [50364, 400, 436, 434, + 8895, 322, 2293, 341, 1412, 13, 50480], "temperature": 0.0, "avg_logprob": -0.19370662355885923, + "compression_ratio": 1.607843137254902, "no_speech_prob": 0.014127611182630062}, + {"id": 74, "seek": 30400, "start": 306.32, "end": 311.96, "text": " So we built + something and it actually worked pretty well for just recommending articles.", "tokens": + [50480, 407, 321, 3094, 746, 293, 309, 767, 2732, 1238, 731, 337, 445, 30559, 11290, + 13, 50762], "temperature": 0.0, "avg_logprob": -0.19370662355885923, "compression_ratio": + 1.607843137254902, "no_speech_prob": 0.014127611182630062}, {"id": 75, "seek": 30400, + "start": 311.96, "end": 317.24, "text": " But then I ran the back of the envelope + math on what it would cost to do this for the", "tokens": [50762, 583, 550, 286, + 5872, 264, 646, 
295, 264, 19989, 5221, 322, 437, 309, 576, 2063, 281, 360, 341, + 337, 264, 51026], "temperature": 0.0, "avg_logprob": -0.19370662355885923, "compression_ratio": + 1.607843137254902, "no_speech_prob": 0.014127611182630062}, {"id": 76, "seek": 30400, + "start": 317.24, "end": 320.04, "text": " entire article catalog, right?", "tokens": + [51026, 2302, 7222, 19746, 11, 558, 30, 51166], "temperature": 0.0, "avg_logprob": + -0.19370662355885923, "compression_ratio": 1.607843137254902, "no_speech_prob": + 0.014127611182630062}, {"id": 77, "seek": 30400, "start": 320.04, "end": 322.64, + "text": " It had hundreds of millions of articles.", "tokens": [51166, 467, 632, + 6779, 295, 6803, 295, 11290, 13, 51296], "temperature": 0.0, "avg_logprob": -0.19370662355885923, + "compression_ratio": 1.607843137254902, "no_speech_prob": 0.014127611182630062}, + {"id": 78, "seek": 30400, "start": 322.64, "end": 327.4, "text": " And it would + have cost more than 30 grand a month to do.", "tokens": [51296, 400, 309, 576, 362, + 2063, 544, 813, 2217, 2697, 257, 1618, 281, 360, 13, 51534], "temperature": 0.0, + "avg_logprob": -0.19370662355885923, "compression_ratio": 1.607843137254902, "no_speech_prob": + 0.014127611182630062}, {"id": 79, "seek": 30400, "start": 327.4, "end": 330.56, + "text": " And for a large company that''s not a big deal for an experiment.", "tokens": + [51534, 400, 337, 257, 2416, 2237, 300, 311, 406, 257, 955, 2028, 337, 364, 5120, + 13, 51692], "temperature": 0.0, "avg_logprob": -0.19370662355885923, "compression_ratio": + 1.607843137254902, "no_speech_prob": 0.014127611182630062}, {"id": 80, "seek": 33056, + "start": 330.56, "end": 334.92, "text": " But this was a company that was spending + three grand a month on a Postgres instance that", "tokens": [50364, 583, 341, 390, + 257, 2237, 300, 390, 6434, 1045, 2697, 257, 1618, 322, 257, 10223, 45189, 5197, + 300, 50582], "temperature": 0.0, "avg_logprob": -0.198042883389238, "compression_ratio": + 
1.7835051546391754, "no_speech_prob": 0.14610272645950317}, {"id": 81, "seek": 33056, + "start": 334.92, "end": 337.96, "text": " prior to working on this, I tuned.", "tokens": + [50582, 4059, 281, 1364, 322, 341, 11, 286, 10870, 13, 50734], "temperature": 0.0, + "avg_logprob": -0.198042883389238, "compression_ratio": 1.7835051546391754, "no_speech_prob": + 0.14610272645950317}, {"id": 82, "seek": 33056, "start": 337.96, "end": 344.88, + "text": " And spending 10 times that on just recommendations and possibly search + was just untenable.", "tokens": [50734, 400, 6434, 1266, 1413, 300, 322, 445, 10434, + 293, 6264, 3164, 390, 445, 25693, 712, 13, 51080], "temperature": 0.0, "avg_logprob": + -0.198042883389238, "compression_ratio": 1.7835051546391754, "no_speech_prob": 0.14610272645950317}, + {"id": 83, "seek": 33056, "start": 344.88, "end": 345.88, "text": " So it sort of + lost it.", "tokens": [51080, 407, 309, 1333, 295, 2731, 309, 13, 51130], "temperature": + 0.0, "avg_logprob": -0.198042883389238, "compression_ratio": 1.7835051546391754, + "no_speech_prob": 0.14610272645950317}, {"id": 84, "seek": 33056, "start": 345.88, + "end": 348.96, "text": " It was lost in its track.", "tokens": [51130, 467, 390, + 2731, 294, 1080, 2837, 13, 51284], "temperature": 0.0, "avg_logprob": -0.198042883389238, + "compression_ratio": 1.7835051546391754, "no_speech_prob": 0.14610272645950317}, + {"id": 85, "seek": 33056, "start": 348.96, "end": 350.64, "text": " And it was a + bit sad.", "tokens": [51284, 400, 309, 390, 257, 857, 4227, 13, 51368], "temperature": + 0.0, "avg_logprob": -0.198042883389238, "compression_ratio": 1.7835051546391754, + "no_speech_prob": 0.14610272645950317}, {"id": 86, "seek": 33056, "start": 350.64, + "end": 353.8, "text": " And it''s sort of ended up in that bucket that a lot of + companies have of like, okay,", "tokens": [51368, 400, 309, 311, 1333, 295, 4590, + 493, 294, 300, 13058, 300, 257, 688, 295, 3431, 362, 295, 411, 11, 1392, 11, 51526], + 
"temperature": 0.0, "avg_logprob": -0.198042883389238, "compression_ratio": 1.7835051546391754, + "no_speech_prob": 0.14610272645950317}, {"id": 87, "seek": 33056, "start": 353.8, + "end": 357.28, "text": " we''re going to work on this when it becomes cheaper and + then we''ll ship this feature.", "tokens": [51526, 321, 434, 516, 281, 589, 322, + 341, 562, 309, 3643, 12284, 293, 550, 321, 603, 5374, 341, 4111, 13, 51700], "temperature": + 0.0, "avg_logprob": -0.198042883389238, "compression_ratio": 1.7835051546391754, + "no_speech_prob": 0.14610272645950317}, {"id": 88, "seek": 33056, "start": 357.28, + "end": 359.56, "text": " But it was a bit sad because I was excited about this feature.", + "tokens": [51700, 583, 309, 390, 257, 857, 4227, 570, 286, 390, 2919, 466, 341, + 4111, 13, 51814], "temperature": 0.0, "avg_logprob": -0.198042883389238, "compression_ratio": + 1.7835051546391754, "no_speech_prob": 0.14610272645950317}, {"id": 89, "seek": 35956, + "start": 359.56, "end": 362.92, "text": " It''s a user of the product as well.", + "tokens": [50364, 467, 311, 257, 4195, 295, 264, 1674, 382, 731, 13, 50532], "temperature": + 0.0, "avg_logprob": -0.2509641408920288, "compression_ratio": 1.4444444444444444, + "no_speech_prob": 0.0010196856455877423}, {"id": 90, "seek": 35956, "start": 362.92, + "end": 365.88, "text": " And I could not stop thinking about that.", "tokens": [50532, + 400, 286, 727, 406, 1590, 1953, 466, 300, 13, 50680], "temperature": 0.0, "avg_logprob": + -0.2509641408920288, "compression_ratio": 1.4444444444444444, "no_speech_prob": + 0.0010196856455877423}, {"id": 91, "seek": 35956, "start": 365.88, "end": 367.92, + "text": " Why was it so expensive?", "tokens": [50680, 1545, 390, 309, 370, 5124, + 30, 50782], "temperature": 0.0, "avg_logprob": -0.2509641408920288, "compression_ratio": + 1.4444444444444444, "no_speech_prob": 0.0010196856455877423}, {"id": 92, "seek": + 35956, "start": 367.92, "end": 374.56, "text": " And the vector databases at 
the + time were storing everything in memory.", "tokens": [50782, 400, 264, 8062, 22380, + 412, 264, 565, 645, 26085, 1203, 294, 4675, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.2509641408920288, "compression_ratio": 1.4444444444444444, "no_speech_prob": + 0.0010196856455877423}, {"id": 93, "seek": 35956, "start": 374.56, "end": 381.8, + "text": " And DRAM on a cloud cost somewhere between two to five dollars per gigabyte.", + "tokens": [51114, 400, 12118, 2865, 322, 257, 4588, 2063, 4079, 1296, 732, 281, + 1732, 3808, 680, 8741, 34529, 13, 51476], "temperature": 0.0, "avg_logprob": -0.2509641408920288, + "compression_ratio": 1.4444444444444444, "no_speech_prob": 0.0010196856455877423}, + {"id": 94, "seek": 35956, "start": 381.8, "end": 384.08, "text": " And this just + economics of this didn''t line up.", "tokens": [51476, 400, 341, 445, 14564, 295, + 341, 994, 380, 1622, 493, 13, 51590], "temperature": 0.0, "avg_logprob": -0.2509641408920288, + "compression_ratio": 1.4444444444444444, "no_speech_prob": 0.0010196856455877423}, + {"id": 95, "seek": 38408, "start": 384.08, "end": 389.64, "text": " It wasn''t that + this vector database was doing anything, you know, malicious in their pricing.", + "tokens": [50364, 467, 2067, 380, 300, 341, 8062, 8149, 390, 884, 1340, 11, 291, + 458, 11, 33496, 294, 641, 17621, 13, 50642], "temperature": 0.0, "avg_logprob": + -0.2799035898844401, "compression_ratio": 1.7491961414790997, "no_speech_prob": + 0.008905366063117981}, {"id": 96, "seek": 38408, "start": 389.64, "end": 392.84, + "text": " They''re just trying to earn them on its margin on memory pricing.", "tokens": + [50642, 814, 434, 445, 1382, 281, 6012, 552, 322, 1080, 10270, 322, 4675, 17621, + 13, 50802], "temperature": 0.0, "avg_logprob": -0.2799035898844401, "compression_ratio": + 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, {"id": 97, "seek": + 38408, "start": 392.84, "end": 397.08, "text": " But memory pricing was just too + high and it 
stopped its feature and it''s tracks.", "tokens": [50802, 583, 4675, + 17621, 390, 445, 886, 1090, 293, 309, 5936, 1080, 4111, 293, 309, 311, 10218, 13, + 51014], "temperature": 0.0, "avg_logprob": -0.2799035898844401, "compression_ratio": + 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, {"id": 98, "seek": + 38408, "start": 397.08, "end": 400.84, "text": " And what I couldn''t stop thinking + about is, why can''t we do all of this on top of", "tokens": [51014, 400, 437, 286, + 2809, 380, 1590, 1953, 466, 307, 11, 983, 393, 380, 321, 360, 439, 295, 341, 322, + 1192, 295, 51202], "temperature": 0.0, "avg_logprob": -0.2799035898844401, "compression_ratio": + 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, {"id": 99, "seek": + 38408, "start": 400.84, "end": 402.08, "text": " Obick storage, right?", "tokens": + [51202, 4075, 618, 6725, 11, 558, 30, 51264], "temperature": 0.0, "avg_logprob": + -0.2799035898844401, "compression_ratio": 1.7491961414790997, "no_speech_prob": + 0.008905366063117981}, {"id": 100, "seek": 38408, "start": 402.08, "end": 403.64, + "text": " Like we just put it on an Obick storage.", "tokens": [51264, 1743, 321, + 445, 829, 309, 322, 364, 4075, 618, 6725, 13, 51342], "temperature": 0.0, "avg_logprob": + -0.2799035898844401, "compression_ratio": 1.7491961414790997, "no_speech_prob": + 0.008905366063117981}, {"id": 101, "seek": 38408, "start": 403.64, "end": 404.64, + "text": " That''s the source of truth.", "tokens": [51342, 663, 311, 264, 4009, + 295, 3494, 13, 51392], "temperature": 0.0, "avg_logprob": -0.2799035898844401, "compression_ratio": + 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, {"id": 102, "seek": + 38408, "start": 404.64, "end": 408.52, "text": " And then we actually need to some + piece of data and we put it in memory or even on this", "tokens": [51392, 400, 550, + 321, 767, 643, 281, 512, 2522, 295, 1412, 293, 321, 829, 309, 294, 4675, 420, 754, + 322, 341, 51586], "temperature": 0.0, 
"avg_logprob": -0.2799035898844401, "compression_ratio": + 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, {"id": 103, "seek": + 38408, "start": 408.52, "end": 410.12, "text": " if we can.", "tokens": [51586, + 498, 321, 393, 13, 51666], "temperature": 0.0, "avg_logprob": -0.2799035898844401, + "compression_ratio": 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, + {"id": 104, "seek": 38408, "start": 410.12, "end": 411.12, "text": " And I did it + to Mac and Nathan.", "tokens": [51666, 400, 286, 630, 309, 281, 5707, 293, 20634, + 13, 51716], "temperature": 0.0, "avg_logprob": -0.2799035898844401, "compression_ratio": + 1.7491961414790997, "no_speech_prob": 0.008905366063117981}, {"id": 105, "seek": + 41112, "start": 411.12, "end": 416.04, "text": " And I was like, I think that''s + about a hundred times cheaper.", "tokens": [50364, 400, 286, 390, 411, 11, 286, + 519, 300, 311, 466, 257, 3262, 1413, 12284, 13, 50610], "temperature": 0.0, "avg_logprob": + -0.21510773691637763, "compression_ratio": 1.6277372262773722, "no_speech_prob": + 0.3561461269855499}, {"id": 106, "seek": 41112, "start": 416.04, "end": 418.44, + "text": " And of course, that would have been no brainer for read wise.", "tokens": + [50610, 400, 295, 1164, 11, 300, 576, 362, 668, 572, 3567, 260, 337, 1401, 10829, + 13, 50730], "temperature": 0.0, "avg_logprob": -0.21510773691637763, "compression_ratio": + 1.6277372262773722, "no_speech_prob": 0.3561461269855499}, {"id": 107, "seek": 41112, + "start": 418.44, "end": 421.68, "text": " We would have just bought it and started + using it and tried it out, right?", "tokens": [50730, 492, 576, 362, 445, 4243, + 309, 293, 1409, 1228, 309, 293, 3031, 309, 484, 11, 558, 30, 50892], "temperature": + 0.0, "avg_logprob": -0.21510773691637763, "compression_ratio": 1.6277372262773722, + "no_speech_prob": 0.3561461269855499}, {"id": 108, "seek": 41112, "start": 421.68, + "end": 425.76, "text": " And maybe put way more data in and maybe 
worked our way + up to that 30 grand a month bill.", "tokens": [50892, 400, 1310, 829, 636, 544, + 1412, 294, 293, 1310, 2732, 527, 636, 493, 281, 300, 2217, 2697, 257, 1618, 2961, + 13, 51096], "temperature": 0.0, "avg_logprob": -0.21510773691637763, "compression_ratio": + 1.6277372262773722, "no_speech_prob": 0.3561461269855499}, {"id": 109, "seek": 41112, + "start": 425.76, "end": 428.8, "text": " But with a different workload.", "tokens": + [51096, 583, 365, 257, 819, 20139, 13, 51248], "temperature": 0.0, "avg_logprob": + -0.21510773691637763, "compression_ratio": 1.6277372262773722, "no_speech_prob": + 0.3561461269855499}, {"id": 110, "seek": 41112, "start": 428.8, "end": 431.44, "text": + " And so yeah, I couldn''t stop thinking about it.", "tokens": [51248, 400, 370, + 1338, 11, 286, 2809, 380, 1590, 1953, 466, 309, 13, 51380], "temperature": 0.0, + "avg_logprob": -0.21510773691637763, "compression_ratio": 1.6277372262773722, "no_speech_prob": + 0.3561461269855499}, {"id": 111, "seek": 41112, "start": 431.44, "end": 436.48, + "text": " And then eventually started writing the first version over the summer + of 2023.", "tokens": [51380, 400, 550, 4728, 1409, 3579, 264, 700, 3037, 670, 264, + 4266, 295, 44377, 13, 51632], "temperature": 0.0, "avg_logprob": -0.21510773691637763, + "compression_ratio": 1.6277372262773722, "no_speech_prob": 0.3561461269855499}, + {"id": 112, "seek": 43648, "start": 436.48, "end": 441.92, "text": " Just me alone + in the wills of Canada and then launched it in October of 2023, which is", "tokens": + [50364, 1449, 385, 3312, 294, 264, 486, 82, 295, 6309, 293, 550, 8730, 309, 294, + 7617, 295, 44377, 11, 597, 307, 50636], "temperature": 0.0, "avg_logprob": -0.2465718778452479, + "compression_ratio": 1.707070707070707, "no_speech_prob": 0.40480300784111023}, + {"id": 113, "seek": 43648, "start": 441.92, "end": 442.92, "text": " probably where + you saw it.", "tokens": [50636, 1391, 689, 291, 1866, 309, 13, 50686], "temperature": + 
0.0, "avg_logprob": -0.2465718778452479, "compression_ratio": 1.707070707070707, + "no_speech_prob": 0.40480300784111023}, {"id": 114, "seek": 43648, "start": 442.92, + "end": 444.20000000000005, "text": " I didn''t really tell anyone about it.", "tokens": + [50686, 286, 994, 380, 534, 980, 2878, 466, 309, 13, 50750], "temperature": 0.0, + "avg_logprob": -0.2465718778452479, "compression_ratio": 1.707070707070707, "no_speech_prob": + 0.40480300784111023}, {"id": 115, "seek": 43648, "start": 444.20000000000005, "end": + 447.6, "text": " I was just I was just hacking away.", "tokens": [50750, 286, 390, + 445, 286, 390, 445, 31422, 1314, 13, 50920], "temperature": 0.0, "avg_logprob": + -0.2465718778452479, "compression_ratio": 1.707070707070707, "no_speech_prob": 0.40480300784111023}, + {"id": 116, "seek": 43648, "start": 447.6, "end": 452.04, "text": " Launched it + did a lot of our deal over that summer insights that some of them still are", "tokens": + [50920, 28119, 292, 309, 630, 257, 688, 295, 527, 2028, 670, 300, 4266, 14310, 300, + 512, 295, 552, 920, 366, 51142], "temperature": 0.0, "avg_logprob": -0.2465718778452479, + "compression_ratio": 1.707070707070707, "no_speech_prob": 0.40480300784111023}, + {"id": 117, "seek": 43648, "start": 452.04, "end": 455.96000000000004, "text": " + in the product and a lot of them we''ve since faced out.", "tokens": [51142, 294, + 264, 1674, 293, 257, 688, 295, 552, 321, 600, 1670, 11446, 484, 13, 51338], "temperature": + 0.0, "avg_logprob": -0.2465718778452479, "compression_ratio": 1.707070707070707, + "no_speech_prob": 0.40480300784111023}, {"id": 118, "seek": 43648, "start": 455.96000000000004, + "end": 459.16, "text": " But the most important thing was that it launched.", "tokens": + [51338, 583, 264, 881, 1021, 551, 390, 300, 309, 8730, 13, 51498], "temperature": + 0.0, "avg_logprob": -0.2465718778452479, "compression_ratio": 1.707070707070707, + "no_speech_prob": 0.40480300784111023}, {"id": 119, "seek": 43648, "start": 
459.16, + "end": 462.32, "text": " And the first version of trouble puffer didn''t have I + was just looking at the website the", "tokens": [51498, 400, 264, 700, 3037, 295, + 5253, 19613, 260, 994, 380, 362, 286, 390, 445, 1237, 412, 264, 3144, 264, 51656], + "temperature": 0.0, "avg_logprob": -0.2465718778452479, "compression_ratio": 1.707070707070707, + "no_speech_prob": 0.40480300784111023}, {"id": 120, "seek": 43648, "start": 462.32, + "end": 464.32, "text": " other day for an unrelated reason.", "tokens": [51656, + 661, 786, 337, 364, 38967, 1778, 13, 51756], "temperature": 0.0, "avg_logprob": + -0.2465718778452479, "compression_ratio": 1.707070707070707, "no_speech_prob": 0.40480300784111023}, + {"id": 121, "seek": 46432, "start": 464.32, "end": 466.76, "text": " It didn''t + have mutable indexes.", "tokens": [50364, 467, 994, 380, 362, 5839, 712, 8186, 279, + 13, 50486], "temperature": 0.0, "avg_logprob": -0.2450800963810512, "compression_ratio": + 1.6883116883116882, "no_speech_prob": 0.12119867652654648}, {"id": 122, "seek": + 46432, "start": 466.76, "end": 472.36, "text": " So you just wrote to it and then + you called an index endpoint and then you''re logged in like that''s it.", "tokens": + [50486, 407, 291, 445, 4114, 281, 309, 293, 550, 291, 1219, 364, 8186, 35795, 293, + 550, 291, 434, 27231, 294, 411, 300, 311, 309, 13, 50766], "temperature": 0.0, "avg_logprob": + -0.2450800963810512, "compression_ratio": 1.6883116883116882, "no_speech_prob": + 0.12119867652654648}, {"id": 123, "seek": 46432, "start": 472.36, "end": 475.96, + "text": " And it didn''t have any SDKs.", "tokens": [50766, 400, 309, 994, 380, + 362, 604, 37135, 82, 13, 50946], "temperature": 0.0, "avg_logprob": -0.2450800963810512, + "compression_ratio": 1.6883116883116882, "no_speech_prob": 0.12119867652654648}, + {"id": 124, "seek": 46432, "start": 475.96, "end": 479.96, "text": " It was just + a big pure HTML website.", "tokens": [50946, 467, 390, 445, 257, 955, 6075, 17995, + 3144, 
13, 51146], "temperature": 0.0, "avg_logprob": -0.2450800963810512, "compression_ratio": + 1.6883116883116882, "no_speech_prob": 0.12119867652654648}, {"id": 125, "seek": + 46432, "start": 479.96, "end": 489.2, "text": " But it was enough to ship it and + it caught the attention at the time of the cursor team back in 2023.", "tokens": + [51146, 583, 309, 390, 1547, 281, 5374, 309, 293, 309, 5415, 264, 3202, 412, 264, + 565, 295, 264, 28169, 1469, 646, 294, 44377, 13, 51608], "temperature": 0.0, "avg_logprob": + -0.2450800963810512, "compression_ratio": 1.6883116883116882, "no_speech_prob": + 0.12119867652654648}, {"id": 126, "seek": 46432, "start": 489.2, "end": 492.12, + "text": " And of course, this was this was this was early on for cursor.", "tokens": + [51608, 400, 295, 1164, 11, 341, 390, 341, 390, 341, 390, 2440, 322, 337, 28169, + 13, 51754], "temperature": 0.0, "avg_logprob": -0.2450800963810512, "compression_ratio": + 1.6883116883116882, "no_speech_prob": 0.12119867652654648}, {"id": 127, "seek": + 46432, "start": 492.12, "end": 493.6, "text": " It was early on for us.", "tokens": + [51754, 467, 390, 2440, 322, 337, 505, 13, 51828], "temperature": 0.0, "avg_logprob": + -0.2450800963810512, "compression_ratio": 1.6883116883116882, "no_speech_prob": + 0.12119867652654648}, {"id": 128, "seek": 49360, "start": 493.6, "end": 499.96000000000004, + "text": " And they are they''re a vector database built did not line up with their + per user economics and", "tokens": [50364, 400, 436, 366, 436, 434, 257, 8062, 8149, + 3094, 630, 406, 1622, 493, 365, 641, 680, 4195, 14564, 293, 50682], "temperature": + 0.0, "avg_logprob": -0.22223109692598866, "compression_ratio": 1.7666666666666666, + "no_speech_prob": 0.0013188893208280206}, {"id": 129, "seek": 49360, "start": 499.96000000000004, + "end": 503.52000000000004, "text": " how they wanted to use rag in their in cursor.", + "tokens": [50682, 577, 436, 1415, 281, 764, 17539, 294, 641, 294, 28169, 13, 50860], + 
"temperature": 0.0, "avg_logprob": -0.22223109692598866, "compression_ratio": 1.7666666666666666, + "no_speech_prob": 0.0013188893208280206}, {"id": 130, "seek": 49360, "start": 503.52000000000004, + "end": 506.20000000000005, "text": " And so they they wanted to try to work together.", + "tokens": [50860, 400, 370, 436, 436, 1415, 281, 853, 281, 589, 1214, 13, 50994], + "temperature": 0.0, "avg_logprob": -0.22223109692598866, "compression_ratio": 1.7666666666666666, + "no_speech_prob": 0.0013188893208280206}, {"id": 131, "seek": 49360, "start": 506.20000000000005, + "end": 511.28000000000003, "text": " And we exchanged a bunch of emails of bullet + points and it was very clear that they thought that this", "tokens": [50994, 400, + 321, 38378, 257, 3840, 295, 12524, 295, 11632, 2793, 293, 309, 390, 588, 1850, 300, + 436, 1194, 300, 341, 51248], "temperature": 0.0, "avg_logprob": -0.22223109692598866, + "compression_ratio": 1.7666666666666666, "no_speech_prob": 0.0013188893208280206}, + {"id": 132, "seek": 49360, "start": 511.28000000000003, "end": 515.96, "text": " + architecture was exactly now knowing the team are now they were just sat down at + the dining table,", "tokens": [51248, 9482, 390, 2293, 586, 5276, 264, 1469, 366, + 586, 436, 645, 445, 3227, 760, 412, 264, 17874, 3199, 11, 51482], "temperature": + 0.0, "avg_logprob": -0.22223109692598866, "compression_ratio": 1.7666666666666666, + "no_speech_prob": 0.0013188893208280206}, {"id": 133, "seek": 49360, "start": 515.96, + "end": 521.24, "text": " done to napkin math over there and then thought why hasn''t + anyone built it like this.", "tokens": [51482, 1096, 281, 9296, 5843, 5221, 670, + 456, 293, 550, 1194, 983, 6132, 380, 2878, 3094, 309, 411, 341, 13, 51746], "temperature": + 0.0, "avg_logprob": -0.22223109692598866, "compression_ratio": 1.7666666666666666, + "no_speech_prob": 0.0013188893208280206}, {"id": 134, "seek": 52124, "start": 521.24, + "end": 525.96, "text": " And so we worked we worked I 
went to San Francisco and + spent some time with them and", "tokens": [50364, 400, 370, 321, 2732, 321, 2732, + 286, 1437, 281, 5271, 12279, 293, 4418, 512, 565, 365, 552, 293, 50600], "temperature": + 0.0, "avg_logprob": -0.2926841042258523, "compression_ratio": 1.6735395189003437, + "no_speech_prob": 0.021179985255002975}, {"id": 135, "seek": 52124, "start": 525.96, + "end": 531.84, "text": " came up with a bunch of features that they would need and + called the best engineer that I knew would", "tokens": [50600, 1361, 493, 365, 257, + 3840, 295, 4122, 300, 436, 576, 643, 293, 1219, 264, 1151, 11403, 300, 286, 2586, + 576, 50894], "temperature": 0.0, "avg_logprob": -0.2926841042258523, "compression_ratio": + 1.6735395189003437, "no_speech_prob": 0.021179985255002975}, {"id": 136, "seek": + 52124, "start": 531.84, "end": 538.64, "text": " Shopify my co founder Justin and + asked if they''d come on board because I think I think maybe there''s something + here.", "tokens": [50894, 43991, 452, 598, 14917, 11320, 293, 2351, 498, 436, 1116, + 808, 322, 3150, 570, 286, 519, 286, 519, 1310, 456, 311, 746, 510, 13, 51234], "temperature": + 0.0, "avg_logprob": -0.2926841042258523, "compression_ratio": 1.6735395189003437, + "no_speech_prob": 0.021179985255002975}, {"id": 137, "seek": 52124, "start": 538.64, + "end": 544.76, "text": " And yeah, we launched it cursor cursor moved over and their + bill was reduced by 95% and", "tokens": [51234, 400, 1338, 11, 321, 8730, 309, 28169, + 28169, 4259, 670, 293, 641, 2961, 390, 9212, 538, 13420, 4, 293, 51540], "temperature": + 0.0, "avg_logprob": -0.2926841042258523, "compression_ratio": 1.6735395189003437, + "no_speech_prob": 0.021179985255002975}, {"id": 138, "seek": 52124, "start": 544.76, + "end": 549.52, "text": " of course the additional storage architect today were on + before didn''t make sense for the cursor", "tokens": [51540, 295, 1164, 264, 4497, + 6725, 6331, 965, 645, 322, 949, 994, 380, 652, 2020, 337, 264, 28169, 
51778], "temperature": + 0.0, "avg_logprob": -0.2926841042258523, "compression_ratio": 1.6735395189003437, + "no_speech_prob": 0.021179985255002975}, {"id": 139, "seek": 54952, "start": 549.52, + "end": 553.6, "text": " economics but our storage architecture really did because + you put all the all the code based", "tokens": [50364, 14564, 457, 527, 6725, 9482, + 534, 630, 570, 291, 829, 439, 264, 439, 264, 3089, 2361, 50568], "temperature": + 0.0, "avg_logprob": -0.23744130568070845, "compression_ratio": 1.6451612903225807, + "no_speech_prob": 0.04909742623567581}, {"id": 140, "seek": 54952, "start": 553.6, + "end": 558.72, "text": " embeddings on s3 and then the ones that are actively being + used we can use in grammar or have in", "tokens": [50568, 12240, 29432, 322, 262, + 18, 293, 550, 264, 2306, 300, 366, 13022, 885, 1143, 321, 393, 764, 294, 22317, + 420, 362, 294, 50824], "temperature": 0.0, "avg_logprob": -0.23744130568070845, + "compression_ratio": 1.6451612903225807, "no_speech_prob": 0.04909742623567581}, + {"id": 141, "seek": 54952, "start": 558.72, "end": 563.0799999999999, "text": " + disk. I''ll stop there but that would be what led up to to this moment.", "tokens": + [50824, 12355, 13, 286, 603, 1590, 456, 457, 300, 576, 312, 437, 4684, 493, 281, + 281, 341, 1623, 13, 51042], "temperature": 0.0, "avg_logprob": -0.23744130568070845, + "compression_ratio": 1.6451612903225807, "no_speech_prob": 0.04909742623567581}, + {"id": 142, "seek": 54952, "start": 563.0799999999999, "end": 569.4399999999999, + "text": " Oh, that''s amazing journey. 
A lot to ask of course a lot of questions + but just on that cursor thing", "tokens": [51042, 876, 11, 300, 311, 2243, 4671, + 13, 316, 688, 281, 1029, 295, 1164, 257, 688, 295, 1651, 457, 445, 322, 300, 28169, + 551, 51360], "temperature": 0.0, "avg_logprob": -0.23744130568070845, "compression_ratio": + 1.6451612903225807, "no_speech_prob": 0.04909742623567581}, {"id": 143, "seek": + 54952, "start": 569.4399999999999, "end": 574.76, "text": " as I told you before + we started recording you know I knew about you launching this working on this", + "tokens": [51360, 382, 286, 1907, 291, 949, 321, 1409, 6613, 291, 458, 286, 2586, + 466, 291, 18354, 341, 1364, 322, 341, 51626], "temperature": 0.0, "avg_logprob": + -0.23744130568070845, "compression_ratio": 1.6451612903225807, "no_speech_prob": + 0.04909742623567581}, {"id": 144, "seek": 57476, "start": 574.76, "end": 581.56, + "text": " and then I''ve released it to the Lex Friedman podcast episode with the + cursor team and they didn''t", "tokens": [50364, 293, 550, 286, 600, 4736, 309, + 281, 264, 24086, 17605, 1601, 7367, 3500, 365, 264, 28169, 1469, 293, 436, 994, + 380, 50704], "temperature": 0.0, "avg_logprob": -0.20765840337517555, "compression_ratio": + 1.7292576419213974, "no_speech_prob": 0.010459833778440952}, {"id": 145, "seek": + 57476, "start": 581.56, "end": 587.4, "text": " mention turbo pop for sort of like + in passing but you know I think that also probably created a lot", "tokens": [50704, + 2152, 20902, 1665, 337, 1333, 295, 411, 294, 8437, 457, 291, 458, 286, 519, 300, + 611, 1391, 2942, 257, 688, 50996], "temperature": 0.0, "avg_logprob": -0.20765840337517555, + "compression_ratio": 1.7292576419213974, "no_speech_prob": 0.010459833778440952}, + {"id": 146, "seek": 57476, "start": 587.4, "end": 594.28, "text": " of attention + to you guys but I''m just curious like how did you get together how did you know + cursor", "tokens": [50996, 295, 3202, 281, 291, 1074, 457, 286, 478, 445, 6369, + 411, 
577, 630, 291, 483, 1214, 577, 630, 291, 458, 28169, 51340], "temperature": + 0.0, "avg_logprob": -0.20765840337517555, "compression_ratio": 1.7292576419213974, + "no_speech_prob": 0.010459833778440952}, {"id": 147, "seek": 57476, "start": 594.28, + "end": 601.4, "text": " team somehow someone on the cursor team that you could like + partner early on and essentially help", "tokens": [51340, 1469, 6063, 1580, 322, + 264, 28169, 1469, 300, 291, 727, 411, 4975, 2440, 322, 293, 4476, 854, 51696], "temperature": + 0.0, "avg_logprob": -0.20765840337517555, "compression_ratio": 1.7292576419213974, + "no_speech_prob": 0.010459833778440952}, {"id": 148, "seek": 60140, "start": 602.04, + "end": 606.76, "text": " they kind of like helped you to pioneer it right in some + sense becoming the first client", "tokens": [50396, 436, 733, 295, 411, 4254, 291, + 281, 37668, 309, 558, 294, 512, 2020, 5617, 264, 700, 6423, 50632], "temperature": + 0.0, "avg_logprob": -0.12334865707534927, "compression_ratio": 1.7786259541984732, + "no_speech_prob": 0.007828392088413239}, {"id": 149, "seek": 60140, "start": 607.48, + "end": 614.92, "text": " or maybe future client right how did you approach them. 
+ They did I mean they were a design", "tokens": [50668, 420, 1310, 2027, 6423, 558, + 577, 630, 291, 3109, 552, 13, 814, 630, 286, 914, 436, 645, 257, 1715, 51040], "temperature": + 0.0, "avg_logprob": -0.12334865707534927, "compression_ratio": 1.7786259541984732, + "no_speech_prob": 0.007828392088413239}, {"id": 150, "seek": 60140, "start": 614.92, + "end": 619.72, "text": " partner in every sense of the word right we had a slack + channel and I feel like they treated us", "tokens": [51040, 4975, 294, 633, 2020, + 295, 264, 1349, 558, 321, 632, 257, 29767, 2269, 293, 286, 841, 411, 436, 8668, + 505, 51280], "temperature": 0.0, "avg_logprob": -0.12334865707534927, "compression_ratio": + 1.7786259541984732, "no_speech_prob": 0.007828392088413239}, {"id": 151, "seek": + 60140, "start": 619.72, "end": 625.48, "text": " as part of their team and we treated + them as part of our team. They came inbound they sent an", "tokens": [51280, 382, + 644, 295, 641, 1469, 293, 321, 8668, 552, 382, 644, 295, 527, 1469, 13, 814, 1361, + 294, 18767, 436, 2279, 364, 51568], "temperature": 0.0, "avg_logprob": -0.12334865707534927, + "compression_ratio": 1.7786259541984732, "no_speech_prob": 0.007828392088413239}, + {"id": 152, "seek": 60140, "start": 625.48, "end": 630.36, "text": " email based + on the website and they said hey we would need mutable indexes and glob and a couple", + "tokens": [51568, 3796, 2361, 322, 264, 3144, 293, 436, 848, 4177, 321, 576, 643, + 5839, 712, 8186, 279, 293, 16125, 293, 257, 1916, 51812], "temperature": 0.0, "avg_logprob": + -0.12334865707534927, "compression_ratio": 1.7786259541984732, "no_speech_prob": + 0.007828392088413239}, {"id": 153, "seek": 63036, "start": 630.36, "end": 635.96, + "text": " of other things so it''s like well that''s a very reasonable request right + and I think they had", "tokens": [50364, 295, 661, 721, 370, 309, 311, 411, 731, + 300, 311, 257, 588, 10585, 5308, 558, 293, 286, 519, 436, 632, 50644], "temperature": + 0.0, 
"avg_logprob": -0.17964347349394352, "compression_ratio": 1.8571428571428572, + "no_speech_prob": 0.0007849333342164755}, {"id": 154, "seek": 63036, "start": 635.96, + "end": 641.88, "text": " the conviction that this was the right architecture and + like if we could prove in their trust and", "tokens": [50644, 264, 24837, 300, 341, + 390, 264, 558, 9482, 293, 411, 498, 321, 727, 7081, 294, 641, 3361, 293, 50940], + "temperature": 0.0, "avg_logprob": -0.17964347349394352, "compression_ratio": 1.8571428571428572, + "no_speech_prob": 0.0007849333342164755}, {"id": 155, "seek": 63036, "start": 641.88, + "end": 647.64, "text": " then be able to be in a good place so it was really just + a it was just an oneness conversation", "tokens": [50940, 550, 312, 1075, 281, 312, + 294, 257, 665, 1081, 370, 309, 390, 534, 445, 257, 309, 390, 445, 364, 322, 15264, + 3761, 51228], "temperature": 0.0, "avg_logprob": -0.17964347349394352, "compression_ratio": + 1.8571428571428572, "no_speech_prob": 0.0007849333342164755}, {"id": 156, "seek": + 63036, "start": 648.44, "end": 653.8000000000001, "text": " just the way that the + website is today a very honest description of what are the trade-offs", "tokens": + [51268, 445, 264, 636, 300, 264, 3144, 307, 965, 257, 588, 3245, 3855, 295, 437, + 366, 264, 4923, 12, 19231, 51536], "temperature": 0.0, "avg_logprob": -0.17964347349394352, + "compression_ratio": 1.8571428571428572, "no_speech_prob": 0.0007849333342164755}, + {"id": 157, "seek": 63036, "start": 653.8000000000001, "end": 659.4, "text": " what + can it do what can it not do what is the latency profile what are the guarantees + and", "tokens": [51536, 437, 393, 309, 360, 437, 393, 309, 406, 360, 437, 307, 264, + 27043, 7964, 437, 366, 264, 32567, 293, 51816], "temperature": 0.0, "avg_logprob": + -0.17964347349394352, "compression_ratio": 1.8571428571428572, "no_speech_prob": + 0.0007849333342164755}, {"id": 158, "seek": 66036, "start": 660.36, "end": 665.5600000000001, + "text": " 
that''s exactly the kind of bullet point discussion that we engaged in + over email before I met the", "tokens": [50364, 300, 311, 2293, 264, 733, 295, 11632, + 935, 5017, 300, 321, 8237, 294, 670, 3796, 949, 286, 1131, 264, 50624], "temperature": + 0.0, "avg_logprob": -0.16736915876280586, "compression_ratio": 1.7611940298507462, + "no_speech_prob": 0.001114301267080009}, {"id": 159, "seek": 66036, "start": 665.5600000000001, + "end": 673.96, "text": " team in person yeah and they of course they were a small + team at the time right it was and they", "tokens": [50624, 1469, 294, 954, 1338, + 293, 436, 295, 1164, 436, 645, 257, 1359, 1469, 412, 264, 565, 558, 309, 390, 293, + 436, 51044], "temperature": 0.0, "avg_logprob": -0.16736915876280586, "compression_ratio": + 1.7611940298507462, "no_speech_prob": 0.001114301267080009}, {"id": 160, "seek": + 66036, "start": 673.96, "end": 678.52, "text": " needed help with the with with + parts of their infrastructure and working very very closely with", "tokens": [51044, + 2978, 854, 365, 264, 365, 365, 3166, 295, 641, 6896, 293, 1364, 588, 588, 8185, + 365, 51272], "temperature": 0.0, "avg_logprob": -0.16736915876280586, "compression_ratio": + 1.7611940298507462, "no_speech_prob": 0.001114301267080009}, {"id": 161, "seek": + 66036, "start": 678.52, "end": 683.8000000000001, "text": " teams that they could + trust with the right economics and the right the right reliability.", "tokens": + [51272, 5491, 300, 436, 727, 3361, 365, 264, 558, 14564, 293, 264, 558, 264, 558, + 24550, 13, 51536], "temperature": 0.0, "avg_logprob": -0.16736915876280586, "compression_ratio": + 1.7611940298507462, "no_speech_prob": 0.001114301267080009}, {"id": 162, "seek": + 66036, "start": 684.52, "end": 689.72, "text": " Yeah for sure but I guess that + honesty which I also value a lot you know and in my work as I", "tokens": [51572, + 865, 337, 988, 457, 286, 2041, 300, 26839, 597, 286, 611, 2158, 257, 688, 291, 458, + 293, 294, 452, 589, 382, 
286, 51832], "temperature": 0.0, "avg_logprob": -0.16736915876280586, + "compression_ratio": 1.7611940298507462, "no_speech_prob": 0.001114301267080009}, + {"id": 163, "seek": 68972, "start": 689.72, "end": 696.52, "text": " became a product + manager you know three years ago and I think it applies to any discipline be honest", + "tokens": [50364, 3062, 257, 1674, 6598, 291, 458, 1045, 924, 2057, 293, 286, 519, + 309, 13165, 281, 604, 13635, 312, 3245, 50704], "temperature": 0.0, "avg_logprob": + -0.08318492571512857, "compression_ratio": 1.7612612612612613, "no_speech_prob": + 0.0016630529426038265}, {"id": 164, "seek": 68972, "start": 696.52, "end": 701.88, + "text": " but but you know like that honesty probably lies on the fact that you + you''ve done your napkin", "tokens": [50704, 457, 457, 291, 458, 411, 300, 26839, + 1391, 9134, 322, 264, 1186, 300, 291, 291, 600, 1096, 428, 9296, 5843, 50972], "temperature": + 0.0, "avg_logprob": -0.08318492571512857, "compression_ratio": 1.7612612612612613, + "no_speech_prob": 0.0016630529426038265}, {"id": 165, "seek": 68972, "start": 701.88, + "end": 709.24, "text": " math and you knew where this will scale where how this + can go right how did you go about doing that", "tokens": [50972, 5221, 293, 291, + 2586, 689, 341, 486, 4373, 689, 577, 341, 393, 352, 558, 577, 630, 291, 352, 466, + 884, 300, 51340], "temperature": 0.0, "avg_logprob": -0.08318492571512857, "compression_ratio": + 1.7612612612612613, "no_speech_prob": 0.0016630529426038265}, {"id": 166, "seek": + 68972, "start": 709.24, "end": 714.44, "text": " pre-launch right before having + any client is that the company of your friends that helped you to", "tokens": [51340, + 659, 12, 875, 1680, 558, 949, 1419, 604, 6423, 307, 300, 264, 2237, 295, 428, 1855, + 300, 4254, 291, 281, 51600], "temperature": 0.0, "avg_logprob": -0.08318492571512857, + "compression_ratio": 1.7612612612612613, "no_speech_prob": 0.0016630529426038265}, + {"id": 167, "seek": 71444, 
"start": 714.44, "end": 720.12, "text": " kind of like + figure out the economics and sort of the the throughput and all of these rigorous", + "tokens": [50364, 733, 295, 411, 2573, 484, 264, 14564, 293, 1333, 295, 264, 264, + 44629, 293, 439, 295, 613, 29882, 50648], "temperature": 0.0, "avg_logprob": -0.11937861496143126, + "compression_ratio": 1.6521739130434783, "no_speech_prob": 0.0033078561536967754}, + {"id": 168, "seek": 71444, "start": 720.12, "end": 726.7600000000001, "text": " + questions that you ask you know as problem statements on napkin math. I think that + should almost", "tokens": [50648, 1651, 300, 291, 1029, 291, 458, 382, 1154, 12363, + 322, 9296, 5843, 5221, 13, 286, 519, 300, 820, 1920, 50980], "temperature": 0.0, + "avg_logprob": -0.11937861496143126, "compression_ratio": 1.6521739130434783, "no_speech_prob": + 0.0033078561536967754}, {"id": 169, "seek": 71444, "start": 726.7600000000001, "end": + 734.2, "text": " bring up the internet archive version of it the the first version + of TurboPuffer I had not", "tokens": [50980, 1565, 493, 264, 4705, 23507, 3037, + 295, 309, 264, 264, 700, 3037, 295, 35848, 47, 1245, 260, 286, 632, 406, 51352], + "temperature": 0.0, "avg_logprob": -0.11937861496143126, "compression_ratio": 1.6521739130434783, + "no_speech_prob": 0.0033078561536967754}, {"id": 170, "seek": 71444, "start": 734.2, + "end": 742.6, "text": " thought about the business at all I didn''t have any launch + playbook I had I had one of course all", "tokens": [51352, 1194, 466, 264, 1606, + 412, 439, 286, 994, 380, 362, 604, 4025, 862, 2939, 286, 632, 286, 632, 472, 295, + 1164, 439, 51772], "temperature": 0.0, "avg_logprob": -0.11937861496143126, "compression_ratio": + 1.6521739130434783, "no_speech_prob": 0.0033078561536967754}, {"id": 171, "seek": + 74260, "start": 742.6, "end": 746.84, "text": " the economics of what it would cost + me to operate and spend a decent amount of time on the pricing", "tokens": [50364, + 264, 14564, 295, 
437, 309, 576, 2063, 385, 281, 9651, 293, 3496, 257, 8681, 2372, + 295, 565, 322, 264, 17621, 50576], "temperature": 0.0, "avg_logprob": -0.11853224951941688, + "compression_ratio": 1.7610294117647058, "no_speech_prob": 0.00045000307727605104}, + {"id": 172, "seek": 74260, "start": 746.84, "end": 751.88, "text": " because that + felt like an important thing to spend time on at the time but there was really not", + "tokens": [50576, 570, 300, 2762, 411, 364, 1021, 551, 281, 3496, 565, 322, 412, + 264, 565, 457, 456, 390, 534, 406, 50828], "temperature": 0.0, "avg_logprob": -0.11853224951941688, + "compression_ratio": 1.7610294117647058, "no_speech_prob": 0.00045000307727605104}, + {"id": 173, "seek": 74260, "start": 751.88, "end": 757.88, "text": " much more than + that of course the the Readwise team was very interested but at the time I could", + "tokens": [50828, 709, 544, 813, 300, 295, 1164, 264, 264, 17604, 3711, 1469, 390, + 588, 3102, 457, 412, 264, 565, 286, 727, 51128], "temperature": 0.0, "avg_logprob": + -0.11853224951941688, "compression_ratio": 1.7610294117647058, "no_speech_prob": + 0.00045000307727605104}, {"id": 174, "seek": 74260, "start": 757.88, "end": 762.6800000000001, + "text": " barely do a you know I could just do around 10 million factors which is + not enough for their use", "tokens": [51128, 10268, 360, 257, 291, 458, 286, 727, + 445, 360, 926, 1266, 2459, 6771, 597, 307, 406, 1547, 337, 641, 764, 51368], "temperature": + 0.0, "avg_logprob": -0.11853224951941688, "compression_ratio": 1.7610294117647058, + "no_speech_prob": 0.00045000307727605104}, {"id": 175, "seek": 74260, "start": 762.6800000000001, + "end": 771.08, "text": " case. 
I can screen share the website with you right here + of what it looked like at the time", "tokens": [51368, 1389, 13, 286, 393, 2568, + 2073, 264, 3144, 365, 291, 558, 510, 295, 437, 309, 2956, 411, 412, 264, 565, 51788], + "temperature": 0.0, "avg_logprob": -0.11853224951941688, "compression_ratio": 1.7610294117647058, + "no_speech_prob": 0.00045000307727605104}, {"id": 176, "seek": 77108, "start": 771.72, + "end": 777.64, "text": " and then we can get for the for the for the listening audience + we can get your reaction", "tokens": [50396, 293, 550, 321, 393, 483, 337, 264, + 337, 264, 337, 264, 4764, 4034, 321, 393, 483, 428, 5480, 50692], "temperature": + 0.0, "avg_logprob": -0.12443925716258862, "compression_ratio": 1.8565891472868217, + "no_speech_prob": 0.017964662984013557}, {"id": 177, "seek": 77108, "start": 777.64, + "end": 782.44, "text": " but it was it was very simple I wouldn''t I wouldn''t put + in any sophistication and it was honestly", "tokens": [50692, 457, 309, 390, 309, + 390, 588, 2199, 286, 2759, 380, 286, 2759, 380, 829, 294, 604, 15572, 399, 293, + 309, 390, 6095, 50932], "temperature": 0.0, "avg_logprob": -0.12443925716258862, + "compression_ratio": 1.8565891472868217, "no_speech_prob": 0.017964662984013557}, + {"id": 178, "seek": 77108, "start": 782.44, "end": 787.48, "text": " I was exhausted + I''ve been working on this like completely alone not telling anyone about it no", + "tokens": [50932, 286, 390, 17992, 286, 600, 668, 1364, 322, 341, 411, 2584, 3312, + 406, 3585, 2878, 466, 309, 572, 51184], "temperature": 0.0, "avg_logprob": -0.12443925716258862, + "compression_ratio": 1.8565891472868217, "no_speech_prob": 0.017964662984013557}, + {"id": 179, "seek": 77108, "start": 787.48, "end": 794.12, "text": " interested + customers for like four months extremely focused like every single day and I couldn''t + like", "tokens": [51184, 3102, 4581, 337, 411, 1451, 2493, 4664, 5178, 411, 633, + 2167, 786, 293, 286, 2809, 380, 411, 51516], 
"temperature": 0.0, "avg_logprob": + -0.12443925716258862, "compression_ratio": 1.8565891472868217, "no_speech_prob": + 0.017964662984013557}, {"id": 180, "seek": 77108, "start": 794.12, "end": 798.2800000000001, + "text": " you ask my wife she would say I was very distracted and she''s just like + well how are you working", "tokens": [51516, 291, 1029, 452, 3836, 750, 576, 584, + 286, 390, 588, 21658, 293, 750, 311, 445, 411, 731, 577, 366, 291, 1364, 51724], + "temperature": 0.0, "avg_logprob": -0.12443925716258862, "compression_ratio": 1.8565891472868217, + "no_speech_prob": 0.017964662984013557}, {"id": 181, "seek": 79828, "start": 798.36, + "end": 802.1999999999999, "text": " so hard on this like there''s no one on the + team you don''t have any customer line up and I''m just like", "tokens": [50368, + 370, 1152, 322, 341, 411, 456, 311, 572, 472, 322, 264, 1469, 291, 500, 380, 362, + 604, 5474, 1622, 493, 293, 286, 478, 445, 411, 50560], "temperature": 0.0, "avg_logprob": + -0.17285533029525008, "compression_ratio": 1.8901960784313725, "no_speech_prob": + 0.011518293991684914}, {"id": 182, "seek": 79828, "start": 802.76, "end": 810.04, + "text": " someone has to do this and I I just launched it and I launched it I mean + now I feel some", "tokens": [50588, 1580, 575, 281, 360, 341, 293, 286, 286, 445, + 8730, 309, 293, 286, 8730, 309, 286, 914, 586, 286, 841, 512, 50952], "temperature": + 0.0, "avg_logprob": -0.17285533029525008, "compression_ratio": 1.8901960784313725, + "no_speech_prob": 0.011518293991684914}, {"id": 183, "seek": 79828, "start": 810.04, + "end": 814.6, "text": " verising would be did launch it just couldn''t do that launch + it was pretty slow I spent a bunch", "tokens": [50952, 1306, 3436, 576, 312, 630, + 4025, 309, 445, 2809, 380, 360, 300, 4025, 309, 390, 1238, 2964, 286, 4418, 257, + 3840, 51180], "temperature": 0.0, "avg_logprob": -0.17285533029525008, "compression_ratio": + 1.8901960784313725, "no_speech_prob": 0.011518293991684914}, 
{"id": 184, "seek": + 79828, "start": 814.6, "end": 819.9599999999999, "text": " of time actually trying + to make it work in wasm and on the edge but it was too hard to make it fast", "tokens": + [51180, 295, 565, 767, 1382, 281, 652, 309, 589, 294, 390, 76, 293, 322, 264, 4691, + 457, 309, 390, 886, 1152, 281, 652, 309, 2370, 51448], "temperature": 0.0, "avg_logprob": + -0.17285533029525008, "compression_ratio": 1.8901960784313725, "no_speech_prob": + 0.011518293991684914}, {"id": 185, "seek": 79828, "start": 820.8399999999999, "end": + 825.56, "text": " and a bunch of other false starch like that on different types + of a and end indexing structures", "tokens": [51492, 293, 257, 3840, 295, 661, 7908, + 24748, 411, 300, 322, 819, 3467, 295, 257, 293, 917, 8186, 278, 9227, 51728], "temperature": + 0.0, "avg_logprob": -0.17285533029525008, "compression_ratio": 1.8901960784313725, + "no_speech_prob": 0.011518293991684914}, {"id": 186, "seek": 82556, "start": 825.56, + "end": 830.1199999999999, "text": " we could talk about that as well and would be + settled on but there was no real sophistication", "tokens": [50364, 321, 727, 751, + 466, 300, 382, 731, 293, 576, 312, 14819, 322, 457, 456, 390, 572, 957, 15572, 399, + 50592], "temperature": 0.0, "avg_logprob": -0.1474095086256663, "compression_ratio": + 1.737327188940092, "no_speech_prob": 0.019443301483988762}, {"id": 187, "seek": + 82556, "start": 830.1199999999999, "end": 834.92, "text": " in the go to market + it was really just here it is here''s the outcome math here''s what it does", "tokens": + [50592, 294, 264, 352, 281, 2142, 309, 390, 534, 445, 510, 309, 307, 510, 311, 264, + 9700, 5221, 510, 311, 437, 309, 775, 50832], "temperature": 0.0, "avg_logprob": + -0.1474095086256663, "compression_ratio": 1.737327188940092, "no_speech_prob": 0.019443301483988762}, + {"id": 188, "seek": 82556, "start": 836.04, "end": 841.16, "text": " let''s see + how the world takes it but I think when when when you sit on a well 
you didn''t + sit on", "tokens": [50888, 718, 311, 536, 577, 264, 1002, 2516, 309, 457, 286, 519, + 562, 562, 562, 291, 1394, 322, 257, 731, 291, 994, 380, 1394, 322, 51144], "temperature": + 0.0, "avg_logprob": -0.1474095086256663, "compression_ratio": 1.737327188940092, + "no_speech_prob": 0.019443301483988762}, {"id": 189, "seek": 82556, "start": 841.16, + "end": 849.16, "text": " it yet but you had a cool like technology ID and mind right + you knew you know it may play out", "tokens": [51144, 309, 1939, 457, 291, 632, + 257, 1627, 411, 2899, 7348, 293, 1575, 558, 291, 2586, 291, 458, 309, 815, 862, + 484, 51544], "temperature": 0.0, "avg_logprob": -0.1474095086256663, "compression_ratio": + 1.737327188940092, "no_speech_prob": 0.019443301483988762}, {"id": 190, "seek": + 84916, "start": 849.64, "end": 856.92, "text": " but also of course it required + a lot of hard work like you said but after that after you see it fly", "tokens": + [50388, 457, 611, 295, 1164, 309, 4739, 257, 688, 295, 1152, 589, 411, 291, 848, + 457, 934, 300, 934, 291, 536, 309, 3603, 50752], "temperature": 0.0, "avg_logprob": + -0.13029775757720505, "compression_ratio": 1.6420454545454546, "no_speech_prob": + 0.007909717969596386}, {"id": 191, "seek": 84916, "start": 856.92, "end": 864.36, + "text": " like on some small scale or whatever scale I think that brings you like + that excitement to bring", "tokens": [50752, 411, 322, 512, 1359, 4373, 420, 2035, + 4373, 286, 519, 300, 5607, 291, 411, 300, 14755, 281, 1565, 51124], "temperature": + 0.0, "avg_logprob": -0.13029775757720505, "compression_ratio": 1.6420454545454546, + "no_speech_prob": 0.007909717969596386}, {"id": 192, "seek": 84916, "start": 864.36, + "end": 871.3199999999999, "text": " it to the world right so yeah I see you''re + sharing the screen of the of the web archive page", "tokens": [51124, 309, 281, + 264, 1002, 558, 370, 1338, 286, 536, 291, 434, 5414, 264, 2568, 295, 264, 295, 264, + 3670, 23507, 3028, 51472], 
"temperature": 0.0, "avg_logprob": -0.13029775757720505, + "compression_ratio": 1.6420454545454546, "no_speech_prob": 0.007909717969596386}, + {"id": 193, "seek": 87132, "start": 871.88, "end": 880.84, "text": " yeah that''s + it very simple yeah yeah that''s awesome but yeah that''s actually a good segue + to", "tokens": [50392, 1338, 300, 311, 309, 588, 2199, 1338, 1338, 300, 311, 3476, + 457, 1338, 300, 311, 767, 257, 665, 33850, 281, 50840], "temperature": 0.0, "avg_logprob": + -0.16622103506059788, "compression_ratio": 1.7177914110429449, "no_speech_prob": + 0.014284927397966385}, {"id": 194, "seek": 87132, "start": 881.88, "end": 887.72, + "text": " you know you probably know I''ve been at the emergence of the field of + vector database field", "tokens": [50892, 291, 458, 291, 1391, 458, 286, 600, 668, + 412, 264, 36211, 295, 264, 2519, 295, 8062, 8149, 2519, 51184], "temperature": 0.0, + "avg_logprob": -0.16622103506059788, "compression_ratio": 1.7177914110429449, "no_speech_prob": + 0.014284927397966385}, {"id": 195, "seek": 87132, "start": 888.5200000000001, "end": + 895.72, "text": " I''ve been I think I was the first probably to write just a simple + block post with like you know", "tokens": [51224, 286, 600, 668, 286, 519, 286, + 390, 264, 700, 1391, 281, 2464, 445, 257, 2199, 3461, 2183, 365, 411, 291, 458, + 51584], "temperature": 0.0, "avg_logprob": -0.16622103506059788, "compression_ratio": + 1.7177914110429449, "no_speech_prob": 0.014284927397966385}, {"id": 196, "seek": + 89572, "start": 895.8000000000001, "end": 901.64, "text": " these crump snippets + of what each vector database did and how they stand out and so on", "tokens": [50368, + 613, 941, 1420, 35623, 1385, 295, 437, 1184, 8062, 8149, 630, 293, 577, 436, 1463, + 484, 293, 370, 322, 50660], "temperature": 0.0, "avg_logprob": -0.12697281376008066, + "compression_ratio": 1.8173076923076923, "no_speech_prob": 0.005139813758432865}, + {"id": 197, "seek": 89572, "start": 901.64, "end": 
908.84, "text": " turbo buffer + wasn''t there because turbo buffer was still in your mind I think but but the segue", + "tokens": [50660, 20902, 21762, 2067, 380, 456, 570, 20902, 21762, 390, 920, 294, + 428, 1575, 286, 519, 457, 457, 264, 33850, 51020], "temperature": 0.0, "avg_logprob": + -0.12697281376008066, "compression_ratio": 1.8173076923076923, "no_speech_prob": + 0.005139813758432865}, {"id": 198, "seek": 89572, "start": 908.84, "end": 918.12, + "text": " here is I don''t have it covered in that block post but in your mind why + were you not happy with", "tokens": [51020, 510, 307, 286, 500, 380, 362, 309, 5343, + 294, 300, 3461, 2183, 457, 294, 428, 1575, 983, 645, 291, 406, 2055, 365, 51484], + "temperature": 0.0, "avg_logprob": -0.12697281376008066, "compression_ratio": 1.8173076923076923, + "no_speech_prob": 0.005139813758432865}, {"id": 199, "seek": 89572, "start": 918.12, + "end": 923.5600000000001, "text": " the vector date is like at large did you try + all of them did you try some of them why did you think", "tokens": [51484, 264, + 8062, 4002, 307, 411, 412, 2416, 630, 291, 853, 439, 295, 552, 630, 291, 853, 512, + 295, 552, 983, 630, 291, 519, 51756], "temperature": 0.0, "avg_logprob": -0.12697281376008066, + "compression_ratio": 1.8173076923076923, "no_speech_prob": 0.005139813758432865}, + {"id": 200, "seek": 92356, "start": 923.56, "end": 931.4, "text": " that a new vector + database deserves to exist yeah I think I think it really just came back to the", + "tokens": [50364, 300, 257, 777, 8062, 8149, 17037, 281, 2514, 1338, 286, 519, 286, + 519, 309, 534, 445, 1361, 646, 281, 264, 50756], "temperature": 0.0, "avg_logprob": + -0.11996966801332624, "compression_ratio": 1.7354260089686098, "no_speech_prob": + 0.0017117963870987296}, {"id": 201, "seek": 92356, "start": 931.4, "end": 937.16, + "text": " read wise example right there''s I there''s they look like great products + I really like the API of", "tokens": [50756, 1401, 10829, 1365, 558, 
456, 311, 286, + 456, 311, 436, 574, 411, 869, 3383, 286, 534, 411, 264, 9362, 295, 51044], "temperature": + 0.0, "avg_logprob": -0.11996966801332624, "compression_ratio": 1.7354260089686098, + "no_speech_prob": 0.0017117963870987296}, {"id": 202, "seek": 92356, "start": 937.16, + "end": 941.64, "text": " many of them they had lots of features that have taken + me a long time to build that even features", "tokens": [51044, 867, 295, 552, 436, + 632, 3195, 295, 4122, 300, 362, 2726, 385, 257, 938, 565, 281, 1322, 300, 754, 4122, + 51268], "temperature": 0.0, "avg_logprob": -0.11996966801332624, "compression_ratio": + 1.7354260089686098, "no_speech_prob": 0.0017117963870987296}, {"id": 203, "seek": + 92356, "start": 941.64, "end": 945.8, "text": " that we don''t have today although + we have a lot of features today compared to when we launched", "tokens": [51268, + 300, 321, 500, 380, 362, 965, 4878, 321, 362, 257, 688, 295, 4122, 965, 5347, 281, + 562, 321, 8730, 51476], "temperature": 0.0, "avg_logprob": -0.11996966801332624, + "compression_ratio": 1.7354260089686098, "no_speech_prob": 0.0017117963870987296}, + {"id": 204, "seek": 94580, "start": 946.76, "end": 955.3199999999999, "text": " + it came out of the cost piece that it felt that there was a lot of latent demand + built up in", "tokens": [50412, 309, 1361, 484, 295, 264, 2063, 2522, 300, 309, + 2762, 300, 456, 390, 257, 688, 295, 48994, 4733, 3094, 493, 294, 50840], "temperature": + 0.0, "avg_logprob": -0.11499584804881703, "compression_ratio": 1.6711111111111112, + "no_speech_prob": 0.019973021000623703}, {"id": 205, "seek": 94580, "start": 955.3199999999999, + "end": 958.52, "text": " the market of people who wanted to use these things but + it just didn''t make sense with the", "tokens": [50840, 264, 2142, 295, 561, 567, + 1415, 281, 764, 613, 721, 457, 309, 445, 994, 380, 652, 2020, 365, 264, 51000], + "temperature": 0.0, "avg_logprob": -0.11499584804881703, "compression_ratio": 1.6711111111111112, + 
"no_speech_prob": 0.019973021000623703}, {"id": 206, "seek": 94580, "start": 958.52, + "end": 963.7199999999999, "text": " economics it''s very difficult to earn a return + on search I mean I remember the search clusters", "tokens": [51000, 14564, 309, + 311, 588, 2252, 281, 6012, 257, 2736, 322, 3164, 286, 914, 286, 1604, 264, 3164, + 23313, 51260], "temperature": 0.0, "avg_logprob": -0.11499584804881703, "compression_ratio": + 1.6711111111111112, "no_speech_prob": 0.019973021000623703}, {"id": 207, "seek": + 94580, "start": 963.7199999999999, "end": 971.56, "text": " that Shopify were very + expensive but ecommerce is a lot about search and so it was okay right but", "tokens": + [51260, 300, 43991, 645, 588, 5124, 457, 308, 26926, 307, 257, 688, 466, 3164, 293, + 370, 309, 390, 1392, 558, 457, 51652], "temperature": 0.0, "avg_logprob": -0.11499584804881703, + "compression_ratio": 1.6711111111111112, "no_speech_prob": 0.019973021000623703}, + {"id": 208, "seek": 97156, "start": 971.56, "end": 977.3199999999999, "text": " + for a lot of companies search is a an important feature but is not the feature right + and so the", "tokens": [50364, 337, 257, 688, 295, 3431, 3164, 307, 257, 364, 1021, + 4111, 457, 307, 406, 264, 4111, 558, 293, 370, 264, 50652], "temperature": 0.0, + "avg_logprob": -0.10414466182742499, "compression_ratio": 1.8700787401574803, "no_speech_prob": + 0.00027143603074364364}, {"id": 209, "seek": 97156, "start": 977.3199999999999, + "end": 982.04, "text": " per user economics just have to make sense it''s not that + everyone just wants it in the cheapest", "tokens": [50652, 680, 4195, 14564, 445, + 362, 281, 652, 2020, 309, 311, 406, 300, 1518, 445, 2738, 309, 294, 264, 29167, + 50888], "temperature": 0.0, "avg_logprob": -0.10414466182742499, "compression_ratio": + 1.8700787401574803, "no_speech_prob": 0.00027143603074364364}, {"id": 210, "seek": + 97156, "start": 982.04, "end": 987.56, "text": " possible way is that if you invest + in 
infrastructure you have to get a return on that investment", "tokens": [50888, + 1944, 636, 307, 300, 498, 291, 1963, 294, 6896, 291, 362, 281, 483, 257, 2736, 322, + 300, 6078, 51164], "temperature": 0.0, "avg_logprob": -0.10414466182742499, "compression_ratio": + 1.8700787401574803, "no_speech_prob": 0.00027143603074364364}, {"id": 211, "seek": + 97156, "start": 988.04, "end": 992.1199999999999, "text": " and it felt that I knew + that I''d read wise they could get a return on that investment but it", "tokens": + [51188, 293, 309, 2762, 300, 286, 2586, 300, 286, 1116, 1401, 10829, 436, 727, 483, + 257, 2736, 322, 300, 6078, 457, 309, 51392], "temperature": 0.0, "avg_logprob": + -0.10414466182742499, "compression_ratio": 1.8700787401574803, "no_speech_prob": + 0.00027143603074364364}, {"id": 212, "seek": 97156, "start": 992.1199999999999, + "end": 996.1999999999999, "text": " wasn''t on 30 grand a month it was maybe close + at a 3 grand or 5 grand a month that they would", "tokens": [51392, 2067, 380, 322, + 2217, 2697, 257, 1618, 309, 390, 1310, 1998, 412, 257, 805, 2697, 420, 1025, 2697, + 257, 1618, 300, 436, 576, 51596], "temperature": 0.0, "avg_logprob": -0.10414466182742499, + "compression_ratio": 1.8700787401574803, "no_speech_prob": 0.00027143603074364364}, + {"id": 213, "seek": 99620, "start": 996.76, "end": 1000.6800000000001, "text": " + feel that they could earn a return on that feature and gender conversion engagement + and whatever", "tokens": [50392, 841, 300, 436, 727, 6012, 257, 2736, 322, 300, + 4111, 293, 7898, 14298, 8742, 293, 2035, 50588], "temperature": 0.0, "avg_logprob": + -0.17379162528298117, "compression_ratio": 1.9015748031496063, "no_speech_prob": + 0.01238193828612566}, {"id": 214, "seek": 99620, "start": 1001.88, "end": 1008.0400000000001, + "text": " so it was really about the storage architecture and I think that when + I think about databases now", "tokens": [50648, 370, 309, 390, 534, 466, 264, 6725, + 9482, 293, 286, 519, 300, 
562, 286, 519, 466, 22380, 586, 50956], "temperature": + 0.0, "avg_logprob": -0.17379162528298117, "compression_ratio": 1.9015748031496063, + "no_speech_prob": 0.01238193828612566}, {"id": 215, "seek": 99620, "start": 1008.0400000000001, + "end": 1012.9200000000001, "text": " this was not as coherent to me at the time + at the time I was driven by the Nipkin Math Naptkin Math", "tokens": [50956, 341, + 390, 406, 382, 36239, 281, 385, 412, 264, 565, 412, 264, 565, 286, 390, 9555, 538, + 264, 426, 647, 5843, 15776, 426, 2796, 5843, 15776, 51200], "temperature": 0.0, + "avg_logprob": -0.17379162528298117, "compression_ratio": 1.9015748031496063, "no_speech_prob": + 0.01238193828612566}, {"id": 216, "seek": 99620, "start": 1012.9200000000001, "end": + 1018.0400000000001, "text": " not the not the market nothing else it was based on + one qualitative experience and an Naptkin", "tokens": [51200, 406, 264, 406, 264, + 2142, 1825, 1646, 309, 390, 2361, 322, 472, 31312, 1752, 293, 364, 426, 2796, 5843, + 51456], "temperature": 0.0, "avg_logprob": -0.17379162528298117, "compression_ratio": + 1.9015748031496063, "no_speech_prob": 0.01238193828612566}, {"id": 217, "seek": + 99620, "start": 1018.0400000000001, "end": 1023.0, "text": " Math there was nothing + else in it and speak about it in a more sophisticated way now being you", "tokens": + [51456, 15776, 456, 390, 1825, 1646, 294, 309, 293, 1710, 466, 309, 294, 257, 544, + 16950, 636, 586, 885, 291, 51704], "temperature": 0.0, "avg_logprob": -0.17379162528298117, + "compression_ratio": 1.9015748031496063, "no_speech_prob": 0.01238193828612566}, + {"id": 218, "seek": 102300, "start": 1023.16, "end": 1028.68, "text": " know having + learned a lot about go-to-market sense but the that those were that''s really all + that", "tokens": [50372, 458, 1419, 3264, 257, 688, 466, 352, 12, 1353, 12, 16414, + 2020, 457, 264, 300, 729, 645, 300, 311, 534, 439, 300, 50648], "temperature": 0.0, + "avg_logprob": -0.1142035456537043, 
"compression_ratio": 1.8458498023715415, "no_speech_prob": + 0.006433083210140467}, {"id": 219, "seek": 102300, "start": 1028.68, "end": 1032.68, + "text": " was at the time it was an insight on those two things the best ideas right + are", "tokens": [50648, 390, 412, 264, 565, 309, 390, 364, 11269, 322, 729, 732, + 721, 264, 1151, 3487, 558, 366, 50848], "temperature": 0.0, "avg_logprob": -0.1142035456537043, + "compression_ratio": 1.8458498023715415, "no_speech_prob": 0.006433083210140467}, + {"id": 220, "seek": 102300, "start": 1034.92, "end": 1039.48, "text": " simultaneous + inventions right someone else would have done it six months later probably other + people", "tokens": [50960, 46218, 43748, 558, 1580, 1646, 576, 362, 1096, 309, 2309, + 2493, 1780, 1391, 661, 561, 51188], "temperature": 0.0, "avg_logprob": -0.1142035456537043, + "compression_ratio": 1.8458498023715415, "no_speech_prob": 0.006433083210140467}, + {"id": 221, "seek": 102300, "start": 1039.48, "end": 1044.12, "text": " were doing + it at the time that launched a later right we were the first to launch with this", + "tokens": [51188, 645, 884, 309, 412, 264, 565, 300, 8730, 257, 1780, 558, 321, + 645, 264, 700, 281, 4025, 365, 341, 51420], "temperature": 0.0, "avg_logprob": -0.1142035456537043, + "compression_ratio": 1.8458498023715415, "no_speech_prob": 0.006433083210140467}, + {"id": 222, "seek": 102300, "start": 1044.12, "end": 1049.72, "text": " particular + architecture but it was out there for the grappling right the idea was in the air + like", "tokens": [51420, 1729, 9482, 457, 309, 390, 484, 456, 337, 264, 50086, 558, + 264, 1558, 390, 294, 264, 1988, 411, 51700], "temperature": 0.0, "avg_logprob": + -0.1142035456537043, "compression_ratio": 1.8458498023715415, "no_speech_prob": + 0.006433083210140467}, {"id": 223, "seek": 104972, "start": 1049.72, "end": 1056.52, + "text": " s3 had the the dpites now finally so the way that I think about this to + really boil this down is that", 
"tokens": [50364, 262, 18, 632, 264, 264, 274, 79, + 3324, 586, 2721, 370, 264, 636, 300, 286, 519, 466, 341, 281, 534, 13329, 341, 760, + 307, 300, 50704], "temperature": 0.0, "avg_logprob": -0.1946304722836143, "compression_ratio": + 1.7767857142857142, "no_speech_prob": 0.0029727707151323557}, {"id": 224, "seek": + 104972, "start": 1057.08, "end": 1065.88, "text": " if you want to create a generational + database company I think you need two things you need a new", "tokens": [50732, + 498, 291, 528, 281, 1884, 257, 48320, 8149, 2237, 286, 519, 291, 643, 732, 721, + 291, 643, 257, 777, 51172], "temperature": 0.0, "avg_logprob": -0.1946304722836143, + "compression_ratio": 1.7767857142857142, "no_speech_prob": 0.0029727707151323557}, + {"id": 225, "seek": 104972, "start": 1065.88, "end": 1073.08, "text": " workload + the new workload here is that we have almost every company on earth sits on their + treasure", "tokens": [51172, 20139, 264, 777, 20139, 510, 307, 300, 321, 362, 1920, + 633, 2237, 322, 4120, 12696, 322, 641, 12985, 51532], "temperature": 0.0, "avg_logprob": + -0.1946304722836143, "compression_ratio": 1.7767857142857142, "no_speech_prob": + 0.0029727707151323557}, {"id": 226, "seek": 104972, "start": 1073.08, "end": 1078.76, + "text": " troll of data and they want to connect that to LLAMs especially all the + unstructured data that it''s", "tokens": [51532, 20680, 295, 1412, 293, 436, 528, + 281, 1745, 300, 281, 441, 43, 2865, 82, 2318, 439, 264, 18799, 46847, 1412, 300, + 309, 311, 51816], "temperature": 0.0, "avg_logprob": -0.1946304722836143, "compression_ratio": + 1.7767857142857142, "no_speech_prob": 0.0029727707151323557}, {"id": 227, "seek": + 107876, "start": 1078.76, "end": 1085.32, "text": " always been very difficult to + do we did this for structured data into 2010s the new workload was", "tokens": [50364, + 1009, 668, 588, 2252, 281, 360, 321, 630, 341, 337, 18519, 1412, 666, 9657, 82, + 264, 777, 20139, 390, 50692], "temperature": 0.0, 
"avg_logprob": -0.05884032828785549, + "compression_ratio": 1.904382470119522, "no_speech_prob": 0.0002787653647828847}, + {"id": 228, "seek": 107876, "start": 1085.32, "end": 1091.4, "text": " that we wanted + to do analytics on billions tens of billions trillions of rows of structured data", + "tokens": [50692, 300, 321, 1415, 281, 360, 15370, 322, 17375, 10688, 295, 17375, + 504, 46279, 295, 13241, 295, 18519, 1412, 50996], "temperature": 0.0, "avg_logprob": + -0.05884032828785549, "compression_ratio": 1.904382470119522, "no_speech_prob": + 0.0002787653647828847}, {"id": 229, "seek": 107876, "start": 1091.4, "end": 1096.76, + "text": " but now with LLAMs we''re entering into that with the unstructured data + that''s the first thing we", "tokens": [50996, 457, 586, 365, 441, 43, 2865, 82, + 321, 434, 11104, 666, 300, 365, 264, 18799, 46847, 1412, 300, 311, 264, 700, 551, + 321, 51264], "temperature": 0.0, "avg_logprob": -0.05884032828785549, "compression_ratio": + 1.904382470119522, "no_speech_prob": 0.0002787653647828847}, {"id": 230, "seek": + 107876, "start": 1096.76, "end": 1101.56, "text": " needed new workload because + that''s when people go out shopping for a new database the second thing", "tokens": + [51264, 2978, 777, 20139, 570, 300, 311, 562, 561, 352, 484, 8688, 337, 257, 777, + 8149, 264, 1150, 551, 51504], "temperature": 0.0, "avg_logprob": -0.05884032828785549, + "compression_ratio": 1.904382470119522, "no_speech_prob": 0.0002787653647828847}, + {"id": 231, "seek": 107876, "start": 1101.56, "end": 1106.92, "text": " that you + need is a new storage architecture if you don''t have a new storage architecture", + "tokens": [51504, 300, 291, 643, 307, 257, 777, 6725, 9482, 498, 291, 500, 380, + 362, 257, 777, 6725, 9482, 51772], "temperature": 0.0, "avg_logprob": -0.05884032828785549, + "compression_ratio": 1.904382470119522, "no_speech_prob": 0.0002787653647828847}, + {"id": 232, "seek": 110692, "start": 1107.72, "end": 1114.76, "text": " that is + 
fundamentally a better tradeoff for the particular workload then there''s no reason + why", "tokens": [50404, 300, 307, 17879, 257, 1101, 4923, 4506, 337, 264, 1729, + 20139, 550, 456, 311, 572, 1778, 983, 50756], "temperature": 0.0, "avg_logprob": + -0.1266285281315028, "compression_ratio": 1.6782006920415224, "no_speech_prob": + 0.0055726319551467896}, {"id": 233, "seek": 110692, "start": 1114.76, "end": 1120.6000000000001, + "text": " tacking on a secondary index to your relational database to your OLAB + to your existing search engine", "tokens": [50756, 9426, 278, 322, 257, 11396, 8186, + 281, 428, 38444, 8149, 281, 428, 39191, 13868, 281, 428, 6741, 3164, 2848, 51048], + "temperature": 0.0, "avg_logprob": -0.1266285281315028, "compression_ratio": 1.6782006920415224, + "no_speech_prob": 0.0055726319551467896}, {"id": 234, "seek": 110692, "start": 1121.5600000000002, + "end": 1125.48, "text": " they would eat it I would have made that decision in the + shoes that Shopify right it''s like well", "tokens": [51096, 436, 576, 1862, 309, + 286, 576, 362, 1027, 300, 3537, 294, 264, 6654, 300, 43991, 558, 309, 311, 411, + 731, 51292], "temperature": 0.0, "avg_logprob": -0.1266285281315028, "compression_ratio": + 1.6782006920415224, "no_speech_prob": 0.0055726319551467896}, {"id": 235, "seek": + 110692, "start": 1126.1200000000001, "end": 1131.24, "text": " this database like + has a really good vector index but it doesn''t bring anything new in terms of", + "tokens": [51324, 341, 8149, 411, 575, 257, 534, 665, 8062, 8186, 457, 309, 1177, + 380, 1565, 1340, 777, 294, 2115, 295, 51580], "temperature": 0.0, "avg_logprob": + -0.1266285281315028, "compression_ratio": 1.6782006920415224, "no_speech_prob": + 0.0055726319551467896}, {"id": 236, "seek": 110692, "start": 1131.24, "end": 1136.3600000000001, + "text": " the storage architecture so we''re just going to invest in the mySQL extension + right it''s what we", "tokens": [51580, 264, 6725, 9482, 370, 321, 434, 445, 516, 
+ 281, 1963, 294, 264, 452, 39934, 10320, 558, 309, 311, 437, 321, 51836], "temperature": + 0.0, "avg_logprob": -0.1266285281315028, "compression_ratio": 1.6782006920415224, + "no_speech_prob": 0.0055726319551467896}, {"id": 237, "seek": 113636, "start": 1136.36, + "end": 1144.04, "text": " really want to Shopify or the uh Lucine Lucine workload + right these are great databases they''ve stood", "tokens": [50364, 534, 528, 281, + 43991, 420, 264, 2232, 9593, 533, 9593, 533, 20139, 558, 613, 366, 869, 22380, 436, + 600, 9371, 50748], "temperature": 0.0, "avg_logprob": -0.17347704922711407, "compression_ratio": + 1.7580071174377223, "no_speech_prob": 0.0009361585252918303}, {"id": 238, "seek": + 113636, "start": 1144.04, "end": 1148.6799999999998, "text": " the test of time + and when you''re on call you become very conservative in what you adopt for new", + "tokens": [50748, 264, 1500, 295, 565, 293, 562, 291, 434, 322, 818, 291, 1813, + 588, 13780, 294, 437, 291, 6878, 337, 777, 50980], "temperature": 0.0, "avg_logprob": + -0.17347704922711407, "compression_ratio": 1.7580071174377223, "no_speech_prob": + 0.0009361585252918303}, {"id": 239, "seek": 113636, "start": 1148.6799999999998, + "end": 1154.28, "text": " workloads but you cannot ignore a new storage architecture + that is an order of magnitude cheaper", "tokens": [50980, 32452, 457, 291, 2644, + 11200, 257, 777, 6725, 9482, 300, 307, 364, 1668, 295, 15668, 12284, 51260], "temperature": + 0.0, "avg_logprob": -0.17347704922711407, "compression_ratio": 1.7580071174377223, + "no_speech_prob": 0.0009361585252918303}, {"id": 240, "seek": 113636, "start": 1154.28, + "end": 1160.9199999999998, "text": " than the previous one when you store a gigabyte + of data in a traditional storage engine you have to", "tokens": [51260, 813, 264, + 3894, 472, 562, 291, 3531, 257, 8741, 34529, 295, 1412, 294, 257, 5164, 6725, 2848, + 291, 362, 281, 51592], "temperature": 0.0, "avg_logprob": -0.17347704922711407, + 
"compression_ratio": 1.7580071174377223, "no_speech_prob": 0.0009361585252918303}, + {"id": 241, "seek": 113636, "start": 1160.9199999999998, "end": 1165.8799999999999, + "text": " replicate that to three disks maybe two if you have a little bit or if + you have more risk tolerance", "tokens": [51592, 25356, 300, 281, 1045, 41617, 1310, + 732, 498, 291, 362, 257, 707, 857, 420, 498, 291, 362, 544, 3148, 23368, 51840], + "temperature": 0.0, "avg_logprob": -0.17347704922711407, "compression_ratio": 1.7580071174377223, + "no_speech_prob": 0.0009361585252918303}, {"id": 242, "seek": 116588, "start": 1165.88, + "end": 1172.3600000000001, "text": " but likely three a gigabyte of disk with from + the cloud vendors cost about 10 cents you run it at 50%", "tokens": [50364, 457, + 3700, 1045, 257, 8741, 34529, 295, 12355, 365, 490, 264, 4588, 22056, 2063, 466, + 1266, 14941, 291, 1190, 309, 412, 2625, 4, 50688], "temperature": 0.0, "avg_logprob": + -0.13840070743005253, "compression_ratio": 1.7654867256637168, "no_speech_prob": + 0.0005042491247877479}, {"id": 243, "seek": 116588, "start": 1172.3600000000001, + "end": 1178.5200000000002, "text": " utilization otherwise it''s too scary to be + on call 20 cents per gigabyte times three for all the", "tokens": [50688, 37074, + 5911, 309, 311, 886, 6958, 281, 312, 322, 818, 945, 14941, 680, 8741, 34529, 1413, + 1045, 337, 439, 264, 50996], "temperature": 0.0, "avg_logprob": -0.13840070743005253, + "compression_ratio": 1.7654867256637168, "no_speech_prob": 0.0005042491247877479}, + {"id": 244, "seek": 116588, "start": 1178.5200000000002, "end": 1186.0400000000002, + "text": " replicas 60 cents per gigabyte obi storage is two cents per gigabyte right + so it''s it''s it''s 30 times", "tokens": [50996, 3248, 9150, 4060, 14941, 680, + 8741, 34529, 1111, 72, 6725, 307, 732, 14941, 680, 8741, 34529, 558, 370, 309, 311, + 309, 311, 309, 311, 2217, 1413, 51372], "temperature": 0.0, "avg_logprob": -0.13840070743005253, + 
"compression_ratio": 1.7654867256637168, "no_speech_prob": 0.0005042491247877479}, + {"id": 245, "seek": 116588, "start": 1186.0400000000002, "end": 1191.96, "text": + " cheaper if it''s all cold now by the time you have some of it in SSD and you have + it in memory then", "tokens": [51372, 12284, 498, 309, 311, 439, 3554, 586, 538, + 264, 565, 291, 362, 512, 295, 309, 294, 30262, 293, 291, 362, 309, 294, 4675, 550, + 51668], "temperature": 0.0, "avg_logprob": -0.13840070743005253, "compression_ratio": + 1.7654867256637168, "no_speech_prob": 0.0005042491247877479}, {"id": 246, "seek": + 119196, "start": 1191.96, "end": 1197.96, "text": " the blended cost ends up being + different but it tracks the actual value to the customer even if you", "tokens": + [50364, 264, 27048, 2063, 5314, 493, 885, 819, 457, 309, 10218, 264, 3539, 2158, + 281, 264, 5474, 754, 498, 291, 50664], "temperature": 0.0, "avg_logprob": -0.1048102719443185, + "compression_ratio": 1.855513307984791, "no_speech_prob": 0.0005112163489684463}, + {"id": 247, "seek": 119196, "start": 1197.96, "end": 1202.92, "text": " have all + of that in disk well you only need one copy right and that disk you can run it at + 100%", "tokens": [50664, 362, 439, 295, 300, 294, 12355, 731, 291, 787, 643, 472, + 5055, 558, 293, 300, 12355, 291, 393, 1190, 309, 412, 2319, 4, 50912], "temperature": + 0.0, "avg_logprob": -0.1048102719443185, "compression_ratio": 1.855513307984791, + "no_speech_prob": 0.0005112163489684463}, {"id": 248, "seek": 119196, "start": 1202.92, + "end": 1207.72, "text": " utilization meaning the blended cost is now 12 cents per + gigabyte right so the 10 cents 100%", "tokens": [50912, 37074, 3620, 264, 27048, + 2063, 307, 586, 2272, 14941, 680, 8741, 34529, 558, 370, 264, 1266, 14941, 2319, + 4, 51152], "temperature": 0.0, "avg_logprob": -0.1048102719443185, "compression_ratio": + 1.855513307984791, "no_speech_prob": 0.0005112163489684463}, {"id": 249, "seek": + 119196, "start": 1207.72, "end": 
1213.32, "text": " utilization plus the two cents + per gigabyte for obi storage so now you have the ingredients of a", "tokens": [51152, + 37074, 1804, 264, 732, 14941, 680, 8741, 34529, 337, 1111, 72, 6725, 370, 586, 291, + 362, 264, 6952, 295, 257, 51432], "temperature": 0.0, "avg_logprob": -0.1048102719443185, + "compression_ratio": 1.855513307984791, "no_speech_prob": 0.0005112163489684463}, + {"id": 250, "seek": 119196, "start": 1213.32, "end": 1219.0, "text": " new actual + database you have a new workload right which is which makes means that people are + out there", "tokens": [51432, 777, 3539, 8149, 291, 362, 257, 777, 20139, 558, 597, + 307, 597, 1669, 1355, 300, 561, 366, 484, 456, 51716], "temperature": 0.0, "avg_logprob": + -0.1048102719443185, "compression_ratio": 1.855513307984791, "no_speech_prob": 0.0005112163489684463}, + {"id": 251, "seek": 121900, "start": 1219.0, "end": 1223.64, "text": " trying to + look for ways to connect their data to LLMs and then you have the second ingredient + which", "tokens": [50364, 1382, 281, 574, 337, 2098, 281, 1745, 641, 1412, 281, + 441, 43, 26386, 293, 550, 291, 362, 264, 1150, 14751, 597, 50596], "temperature": + 0.0, "avg_logprob": -0.13403476987566268, "compression_ratio": 1.7491166077738516, + "no_speech_prob": 0.00047771018580533564}, {"id": 252, "seek": 121900, "start": + 1223.64, "end": 1229.08, "text": " is a new storage architecture that allows them + to do it in order of magnitude easier and cheaper", "tokens": [50596, 307, 257, + 777, 6725, 9482, 300, 4045, 552, 281, 360, 309, 294, 1668, 295, 15668, 3571, 293, + 12284, 50868], "temperature": 0.0, "avg_logprob": -0.13403476987566268, "compression_ratio": + 1.7491166077738516, "no_speech_prob": 0.00047771018580533564}, {"id": 253, "seek": + 121900, "start": 1229.08, "end": 1234.36, "text": " than what they can do when they''re + existing architectures and this matters because vectors are so big", "tokens": [50868, + 813, 437, 436, 393, 360, 562, 436, 
434, 6741, 6331, 1303, 293, 341, 7001, 570, 18875, + 366, 370, 955, 51132], "temperature": 0.0, "avg_logprob": -0.13403476987566268, + "compression_ratio": 1.7491166077738516, "no_speech_prob": 0.00047771018580533564}, + {"id": 254, "seek": 121900, "start": 1234.36, "end": 1240.28, "text": " right a + kilobyte of text easily turns into tens of kilobytes of vector data yeah yeah it''s + absolutely", "tokens": [51132, 558, 257, 5128, 13944, 975, 295, 2487, 3612, 4523, + 666, 10688, 295, 5128, 996, 43673, 295, 8062, 1412, 1338, 1338, 309, 311, 3122, + 51428], "temperature": 0.0, "avg_logprob": -0.13403476987566268, "compression_ratio": + 1.7491166077738516, "no_speech_prob": 0.00047771018580533564}, {"id": 255, "seek": + 121900, "start": 1240.28, "end": 1248.12, "text": " true one other thing that I + kept keep hearing or kept hearing about you know whether or not to", "tokens": [51428, + 2074, 472, 661, 551, 300, 286, 4305, 1066, 4763, 420, 4305, 4763, 466, 291, 458, + 1968, 420, 406, 281, 51820], "temperature": 0.0, "avg_logprob": -0.13403476987566268, + "compression_ratio": 1.7491166077738516, "no_speech_prob": 0.00047771018580533564}, + {"id": 256, "seek": 124812, "start": 1248.12, "end": 1254.9199999999998, "text": + " introduce a vector search in the mix for some really heavy workloads is that it + will bring", "tokens": [50364, 5366, 257, 8062, 3164, 294, 264, 2890, 337, 512, + 534, 4676, 32452, 307, 300, 309, 486, 1565, 50704], "temperature": 0.0, "avg_logprob": + -0.12134440926944508, "compression_ratio": 1.6888888888888889, "no_speech_prob": + 0.0005005528219044209}, {"id": 257, "seek": 124812, "start": 1255.9599999999998, + "end": 1261.08, "text": " certain latency on top that we cannot tolerate right for + example if you run a hybrid search", "tokens": [50756, 1629, 27043, 322, 1192, 300, + 321, 2644, 25773, 558, 337, 1365, 498, 291, 1190, 257, 13051, 3164, 51012], "temperature": + 0.0, "avg_logprob": -0.12134440926944508, "compression_ratio": 
1.6888888888888889, + "no_speech_prob": 0.0005005528219044209}, {"id": 258, "seek": 124812, "start": 1261.08, + "end": 1267.08, "text": " like you guys have implemented as well you know one of + these will be slowest and therefore you", "tokens": [51012, 411, 291, 1074, 362, + 12270, 382, 731, 291, 458, 472, 295, 613, 486, 312, 2964, 377, 293, 4412, 291, 51312], + "temperature": 0.0, "avg_logprob": -0.12134440926944508, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.0005005528219044209}, {"id": 259, "seek": 124812, "start": 1267.08, + "end": 1272.84, "text": " will have to wait for that slowest component and so if + it adds I don''t know a few hundred milliseconds", "tokens": [51312, 486, 362, 281, + 1699, 337, 300, 2964, 377, 6542, 293, 370, 498, 309, 10860, 286, 500, 380, 458, + 257, 1326, 3262, 34184, 51600], "temperature": 0.0, "avg_logprob": -0.12134440926944508, + "compression_ratio": 1.6888888888888889, "no_speech_prob": 0.0005005528219044209}, + {"id": 260, "seek": 127284, "start": 1272.84, "end": 1277.3999999999999, "text": + " on top of your original you know retrieval mechanism then it''s going to be an + off-line", "tokens": [50364, 322, 1192, 295, 428, 3380, 291, 458, 19817, 3337, 7513, + 550, 309, 311, 516, 281, 312, 364, 766, 12, 1889, 50592], "temperature": 0.0, "avg_logprob": + -0.12013372533461626, "compression_ratio": 1.6818181818181819, "no_speech_prob": + 0.001628653146326542}, {"id": 261, "seek": 127284, "start": 1278.1999999999998, + "end": 1283.08, "text": " what''s your take on that have you thought obviously you + have thought about that what''s the", "tokens": [50632, 437, 311, 428, 747, 322, + 300, 362, 291, 1194, 2745, 291, 362, 1194, 466, 300, 437, 311, 264, 50876], "temperature": + 0.0, "avg_logprob": -0.12013372533461626, "compression_ratio": 1.6818181818181819, + "no_speech_prob": 0.001628653146326542}, {"id": 262, "seek": 127284, "start": 1283.08, + "end": 1291.8799999999999, "text": " edge that turbo buffer brings 
in this space + over maybe pure databases yeah I think I think there''s", "tokens": [50876, 4691, + 300, 20902, 21762, 5607, 294, 341, 1901, 670, 1310, 6075, 22380, 1338, 286, 519, + 286, 519, 456, 311, 51316], "temperature": 0.0, "avg_logprob": -0.12013372533461626, + "compression_ratio": 1.6818181818181819, "no_speech_prob": 0.001628653146326542}, + {"id": 263, "seek": 127284, "start": 1291.8799999999999, "end": 1298.52, "text": + " we see two types of ways that people adopt vector databases or turbo buffer we + don''t consider", "tokens": [51316, 321, 536, 732, 3467, 295, 2098, 300, 561, 6878, + 8062, 22380, 420, 20902, 21762, 321, 500, 380, 1949, 51648], "temperature": 0.0, + "avg_logprob": -0.12013372533461626, "compression_ratio": 1.6818181818181819, "no_speech_prob": + 0.001628653146326542}, {"id": 264, "seek": 129852, "start": 1299.48, "end": 1304.84, + "text": " turbo buffer a pure play vector database we consider it a search engine + we actually consider it", "tokens": [50412, 20902, 21762, 257, 6075, 862, 8062, + 8149, 321, 1949, 309, 257, 3164, 2848, 321, 767, 1949, 309, 50680], "temperature": + 0.0, "avg_logprob": -0.14007179825394242, "compression_ratio": 1.8803088803088803, + "no_speech_prob": 0.002767975674942136}, {"id": 265, "seek": 129852, "start": 1304.84, + "end": 1311.56, "text": " a full database because there''s a full generic LSM underneath + all of that and we consider that the", "tokens": [50680, 257, 1577, 8149, 570, 456, + 311, 257, 1577, 19577, 441, 26693, 7223, 439, 295, 300, 293, 321, 1949, 300, 264, + 51016], "temperature": 0.0, "avg_logprob": -0.14007179825394242, "compression_ratio": + 1.8803088803088803, "no_speech_prob": 0.002767975674942136}, {"id": 266, "seek": + 129852, "start": 1311.56, "end": 1316.12, "text": " actual acid of turbo puffer + is an LSM that''s obnox storage native and doesn''t rely on any state", "tokens": + [51016, 3539, 8258, 295, 20902, 19613, 260, 307, 364, 441, 26693, 300, 311, 1111, + 29129, 6725, 8470, 
293, 1177, 380, 10687, 322, 604, 1785, 51244], "temperature": + 0.0, "avg_logprob": -0.14007179825394242, "compression_ratio": 1.8803088803088803, + "no_speech_prob": 0.002767975674942136}, {"id": 267, "seek": 129852, "start": 1317.08, + "end": 1321.32, "text": " we just think that the vector index and the search engine + index is what the market needed the most", "tokens": [51292, 321, 445, 519, 300, + 264, 8062, 8186, 293, 264, 3164, 2848, 8186, 307, 437, 264, 2142, 2978, 264, 881, + 51504], "temperature": 0.0, "avg_logprob": -0.14007179825394242, "compression_ratio": + 1.8803088803088803, "no_speech_prob": 0.002767975674942136}, {"id": 268, "seek": + 129852, "start": 1321.32, "end": 1327.72, "text": " so let''s speak about latency + there''s no real fundamental latency trade off with this architecture", "tokens": + [51504, 370, 718, 311, 1710, 466, 27043, 456, 311, 572, 957, 8088, 27043, 4923, + 766, 365, 341, 9482, 51824], "temperature": 0.0, "avg_logprob": -0.14007179825394242, + "compression_ratio": 1.8803088803088803, "no_speech_prob": 0.002767975674942136}, + {"id": 269, "seek": 132772, "start": 1327.72, "end": 1332.84, "text": " the only + thing is that once in a while you will hit that cold query but the entire databases", + "tokens": [50364, 264, 787, 551, 307, 300, 1564, 294, 257, 1339, 291, 486, 2045, + 300, 3554, 14581, 457, 264, 2302, 22380, 50620], "temperature": 0.0, "avg_logprob": + -0.1923782876197328, "compression_ratio": 1.737556561085973, "no_speech_prob": 0.0024514971300959587}, + {"id": 270, "seek": 132772, "start": 1332.84, "end": 1339.56, "text": " optimize + the round minimizing the amount of round trips that you do to SS3 S3 you can max + out a", "tokens": [50620, 19719, 264, 3098, 46608, 264, 2372, 295, 3098, 16051, + 300, 291, 360, 281, 12238, 18, 318, 18, 291, 393, 11469, 484, 257, 50956], "temperature": + 0.0, "avg_logprob": -0.1923782876197328, "compression_ratio": 1.737556561085973, + "no_speech_prob": 0.0024514971300959587}, 
{"id": 271, "seek": 132772, "start": 1340.84, + "end": 1346.3600000000001, "text": " a network card right so you can get on a gcp + or AWS box you can get 50 to 100 gigabytes per second", "tokens": [51020, 257, 3209, + 2920, 558, 370, 291, 393, 483, 322, 257, 290, 66, 79, 420, 17650, 2424, 291, 393, + 483, 2625, 281, 2319, 42741, 680, 1150, 51296], "temperature": 0.0, "avg_logprob": + -0.1923782876197328, "compression_ratio": 1.737556561085973, "no_speech_prob": 0.0024514971300959587}, + {"id": 272, "seek": 132772, "start": 1346.3600000000001, "end": 1352.68, "text": + " of network bandwidth you give it per second of network bandwidth so this is similar + to this band", "tokens": [51296, 295, 3209, 23647, 291, 976, 309, 680, 1150, 295, + 3209, 23647, 370, 341, 307, 2531, 281, 341, 4116, 51612], "temperature": 0.0, "avg_logprob": + -0.1923782876197328, "compression_ratio": 1.737556561085973, "no_speech_prob": 0.0024514971300959587}, + {"id": 273, "seek": 135268, "start": 1353.16, "end": 1357.72, "text": " with the + latency actually even better in the clouds often than disks even with SSDs even + with N and", "tokens": [50388, 365, 264, 27043, 767, 754, 1101, 294, 264, 12193, + 2049, 813, 41617, 754, 365, 30262, 82, 754, 365, 426, 293, 50616], "temperature": + 0.0, "avg_logprob": -0.1461756910596575, "compression_ratio": 1.7706093189964158, + "no_speech_prob": 0.0058416565880179405}, {"id": 274, "seek": 135268, "start": 1357.72, + "end": 1365.3200000000002, "text": " NVME SSD so the network is phenomenal you can + drive you know say you can drive all of that data you", "tokens": [50616, 46512, + 15454, 30262, 370, 264, 3209, 307, 17778, 291, 393, 3332, 291, 458, 584, 291, 393, + 3332, 439, 295, 300, 1412, 291, 50996], "temperature": 0.0, "avg_logprob": -0.1461756910596575, + "compression_ratio": 1.7706093189964158, "no_speech_prob": 0.0058416565880179405}, + {"id": 275, "seek": 135268, "start": 1365.3200000000002, "end": 1371.16, "text": + " can drive gigabytes of 
data per second in a single round trip so you can get great + throughput but", "tokens": [50996, 393, 3332, 42741, 295, 1412, 680, 1150, 294, + 257, 2167, 3098, 4931, 370, 291, 393, 483, 869, 44629, 457, 51288], "temperature": + 0.0, "avg_logprob": -0.1461756910596575, "compression_ratio": 1.7706093189964158, + "no_speech_prob": 0.0058416565880179405}, {"id": 276, "seek": 135268, "start": 1371.16, + "end": 1376.76, "text": " the latency is high the p90 might be around 200 milliseconds + to s3 for every round trip someone", "tokens": [51288, 264, 27043, 307, 1090, 264, + 280, 7771, 1062, 312, 926, 2331, 34184, 281, 262, 18, 337, 633, 3098, 4931, 1580, + 51568], "temperature": 0.0, "avg_logprob": -0.1461756910596575, "compression_ratio": + 1.7706093189964158, "no_speech_prob": 0.0058416565880179405}, {"id": 277, "seek": + 135268, "start": 1376.76, "end": 1381.48, "text": " regardless of how much data + that you transfer assuming you''re saturating the box we''ve decide almost", "tokens": + [51568, 10060, 295, 577, 709, 1412, 300, 291, 5003, 11926, 291, 434, 21160, 990, + 264, 2424, 321, 600, 4536, 1920, 51804], "temperature": 0.0, "avg_logprob": -0.1461756910596575, + "compression_ratio": 1.7706093189964158, "no_speech_prob": 0.0058416565880179405}, + {"id": 278, "seek": 138148, "start": 1381.48, "end": 1385.8, "text": " everything + interval buffer around minimizing the number of round trips to 3 to 4 that doesn''t", + "tokens": [50364, 1203, 15035, 21762, 926, 46608, 264, 1230, 295, 3098, 16051, 281, + 805, 281, 1017, 300, 1177, 380, 50580], "temperature": 0.0, "avg_logprob": -0.12679274755579825, + "compression_ratio": 1.8224299065420562, "no_speech_prob": 0.002645058324560523}, + {"id": 279, "seek": 138148, "start": 1385.8, "end": 1390.2, "text": " just help + for s3 it also helps for modern disk which the same thing you can drive enormous", + "tokens": [50580, 445, 854, 337, 262, 18, 309, 611, 3665, 337, 4363, 12355, 597, + 264, 912, 551, 291, 393, 3332, 11322, 
50800], "temperature": 0.0, "avg_logprob": + -0.12679274755579825, "compression_ratio": 1.8224299065420562, "no_speech_prob": + 0.002645058324560523}, {"id": 280, "seek": 138148, "start": 1390.2, "end": 1394.6, + "text": " amounts of bandwidth but the round trip time is is long right it''s like + a hundreds of microseconds", "tokens": [50800, 11663, 295, 23647, 457, 264, 3098, + 4931, 565, 307, 307, 938, 558, 309, 311, 411, 257, 6779, 295, 3123, 37841, 28750, + 51020], "temperature": 0.0, "avg_logprob": -0.12679274755579825, "compression_ratio": + 1.8224299065420562, "no_speech_prob": 0.002645058324560523}, {"id": 281, "seek": + 138148, "start": 1394.6, "end": 1401.16, "text": " versus hundreds of milliseconds + but still still substantial compared to dm so the latency tradeoff", "tokens": [51020, + 5717, 6779, 295, 34184, 457, 920, 920, 16726, 5347, 281, 274, 76, 370, 264, 27043, + 4923, 4506, 51348], "temperature": 0.0, "avg_logprob": -0.12679274755579825, "compression_ratio": + 1.8224299065420562, "no_speech_prob": 0.002645058324560523}, {"id": 282, "seek": + 138148, "start": 1401.16, "end": 1405.48, "text": " is not a fundamental tradeoff + with this architecture by the time that it makes it into the memory", "tokens": + [51348, 307, 406, 257, 8088, 4923, 4506, 365, 341, 9482, 538, 264, 565, 300, 309, + 1669, 309, 666, 264, 4675, 51564], "temperature": 0.0, "avg_logprob": -0.12679274755579825, + "compression_ratio": 1.8224299065420562, "no_speech_prob": 0.002645058324560523}, + {"id": 283, "seek": 138148, "start": 1405.48, "end": 1410.68, "text": " cache it''s + just as fast as everyone else we have found that people don''t care if it''s like + a millisecond", "tokens": [51564, 19459, 309, 311, 445, 382, 2370, 382, 1518, 1646, + 321, 362, 1352, 300, 561, 500, 380, 1127, 498, 309, 311, 411, 257, 27940, 18882, + 51824], "temperature": 0.0, "avg_logprob": -0.12679274755579825, "compression_ratio": + 1.8224299065420562, "no_speech_prob": 0.002645058324560523}, 
{"id": 284, "seek": + 141068, "start": 1410.68, "end": 1416.3600000000001, "text": " or five milliseconds + as long as it''s reliably less than around 50 milliseconds they''re good right", + "tokens": [50364, 420, 1732, 34184, 382, 938, 382, 309, 311, 49927, 1570, 813, 926, + 2625, 34184, 436, 434, 665, 558, 50648], "temperature": 0.0, "avg_logprob": -0.09769366264343261, + "compression_ratio": 1.7954545454545454, "no_speech_prob": 0.0017712030094116926}, + {"id": 285, "seek": 141068, "start": 1417.0, "end": 1421.96, "text": " and I think + that a lot of the traditional storage architectures especially because of the", + "tokens": [50680, 293, 286, 519, 300, 257, 688, 295, 264, 5164, 6725, 6331, 1303, + 2318, 570, 295, 264, 50928], "temperature": 0.0, "avg_logprob": -0.09769366264343261, + "compression_ratio": 1.7954545454545454, "no_speech_prob": 0.0017712030094116926}, + {"id": 286, "seek": 141068, "start": 1421.96, "end": 1426.44, "text": " sharding + structure with multiple nodes you''re already in a worse position than going to + two systems", "tokens": [50928, 402, 515, 278, 3877, 365, 3866, 13891, 291, 434, + 1217, 294, 257, 5324, 2535, 813, 516, 281, 732, 3652, 51152], "temperature": 0.0, + "avg_logprob": -0.09769366264343261, "compression_ratio": 1.7954545454545454, "no_speech_prob": + 0.0017712030094116926}, {"id": 287, "seek": 141068, "start": 1426.44, "end": 1431.5600000000002, + "text": " where if you write a query on some of the traditional search engine generally + you touch", "tokens": [51152, 689, 498, 291, 2464, 257, 14581, 322, 512, 295, 264, + 5164, 3164, 2848, 5101, 291, 2557, 51408], "temperature": 0.0, "avg_logprob": -0.09769366264343261, + "compression_ratio": 1.7954545454545454, "no_speech_prob": 0.0017712030094116926}, + {"id": 288, "seek": 141068, "start": 1431.5600000000002, "end": 1437.64, "text": + " five ten maybe more nodes depending on depending on this because the shard size + is very very small", "tokens": [51408, 1732, 2064, 
1310, 544, 13891, 5413, 322, + 5413, 322, 341, 570, 264, 402, 515, 2744, 307, 588, 588, 1359, 51712], "temperature": + 0.0, "avg_logprob": -0.09769366264343261, "compression_ratio": 1.7954545454545454, + "no_speech_prob": 0.0017712030094116926}, {"id": 289, "seek": 143764, "start": 1437.64, + "end": 1443.16, "text": " you go into more depth on that so you already have this + problem what we see is that there''s two", "tokens": [50364, 291, 352, 666, 544, + 7161, 322, 300, 370, 291, 1217, 362, 341, 1154, 437, 321, 536, 307, 300, 456, 311, + 732, 50640], "temperature": 0.0, "avg_logprob": -0.10004511419332253, "compression_ratio": + 1.753787878787879, "no_speech_prob": 0.003484809072688222}, {"id": 290, "seek": + 143764, "start": 1443.16, "end": 1449.0800000000002, "text": " types of ways that + people adopt it so the first one is you have an existing lexical search engine", + "tokens": [50640, 3467, 295, 2098, 300, 561, 6878, 309, 370, 264, 700, 472, 307, + 291, 362, 364, 6741, 476, 87, 804, 3164, 2848, 50936], "temperature": 0.0, "avg_logprob": + -0.10004511419332253, "compression_ratio": 1.753787878787879, "no_speech_prob": + 0.003484809072688222}, {"id": 291, "seek": 143764, "start": 1450.68, "end": 1455.3200000000002, + "text": " you are having a hard time running it because of the traditional like + very stateful", "tokens": [51016, 291, 366, 1419, 257, 1152, 565, 2614, 309, 570, + 295, 264, 5164, 411, 588, 1785, 906, 51248], "temperature": 0.0, "avg_logprob": + -0.10004511419332253, "compression_ratio": 1.753787878787879, "no_speech_prob": + 0.003484809072688222}, {"id": 292, "seek": 143764, "start": 1455.3200000000002, + "end": 1462.76, "text": " architecture and they''re reputed for just being difficult + to run and you''re like already a", "tokens": [51248, 9482, 293, 436, 434, 1085, + 4866, 337, 445, 885, 2252, 281, 1190, 293, 291, 434, 411, 1217, 257, 51620], "temperature": + 0.0, "avg_logprob": -0.10004511419332253, "compression_ratio": 
1.753787878787879, + "no_speech_prob": 0.003484809072688222}, {"id": 293, "seek": 143764, "start": 1462.76, + "end": 1467.24, "text": " little bit add your threshold for the amount of money + that you''re spending on this cluster and", "tokens": [51620, 707, 857, 909, 428, + 14678, 337, 264, 2372, 295, 1460, 300, 291, 434, 6434, 322, 341, 13630, 293, 51844], + "temperature": 0.0, "avg_logprob": -0.10004511419332253, "compression_ratio": 1.753787878787879, + "no_speech_prob": 0.003484809072688222}, {"id": 294, "seek": 146724, "start": 1467.32, + "end": 1472.1200000000001, "text": " if you put the vector data in it''s often 10 + to 20 times larger than the text data it is just", "tokens": [50368, 498, 291, 829, + 264, 8062, 1412, 294, 309, 311, 2049, 1266, 281, 945, 1413, 4833, 813, 264, 2487, + 1412, 309, 307, 445, 50608], "temperature": 0.0, "avg_logprob": -0.10057676365945191, + "compression_ratio": 1.7806691449814127, "no_speech_prob": 0.0046919300220906734}, + {"id": 295, "seek": 146724, "start": 1473.16, "end": 1477.96, "text": " it''s a + project that stops in its tracks similar to the read wise case that I mentioned + before", "tokens": [50660, 309, 311, 257, 1716, 300, 10094, 294, 1080, 10218, 2531, + 281, 264, 1401, 10829, 1389, 300, 286, 2835, 949, 50900], "temperature": 0.0, "avg_logprob": + -0.10057676365945191, "compression_ratio": 1.7806691449814127, "no_speech_prob": + 0.0046919300220906734}, {"id": 296, "seek": 146724, "start": 1477.96, "end": 1482.2, + "text": " so for those players we often see that they have something that''s really + well tuned for the lexical", "tokens": [50900, 370, 337, 729, 4150, 321, 2049, 536, + 300, 436, 362, 746, 300, 311, 534, 731, 10870, 337, 264, 476, 87, 804, 51112], "temperature": + 0.0, "avg_logprob": -0.10057676365945191, "compression_ratio": 1.7806691449814127, + "no_speech_prob": 0.0046919300220906734}, {"id": 297, "seek": 146724, "start": 1482.2, + "end": 1488.6, "text": " and they adopt a vector store and then 
they do two queries + in parallel the vector store should not", "tokens": [51112, 293, 436, 6878, 257, + 8062, 3531, 293, 550, 436, 360, 732, 24109, 294, 8952, 264, 8062, 3531, 820, 406, + 51432], "temperature": 0.0, "avg_logprob": -0.10057676365945191, "compression_ratio": + 1.7806691449814127, "no_speech_prob": 0.0046919300220906734}, {"id": 298, "seek": + 146724, "start": 1488.6, "end": 1493.08, "text": " be slower than the lexical right + so these are just two futures that you merge together in use", "tokens": [51432, + 312, 14009, 813, 264, 476, 87, 804, 558, 370, 613, 366, 445, 732, 26071, 300, 291, + 22183, 1214, 294, 764, 51656], "temperature": 0.0, "avg_logprob": -0.10057676365945191, + "compression_ratio": 1.7806691449814127, "no_speech_prob": 0.0046919300220906734}, + {"id": 299, "seek": 149308, "start": 1493.96, "end": 1498.4399999999998, "text": + " and in general we see that our customers are actually quite happy to move some + of the ranking", "tokens": [50408, 293, 294, 2674, 321, 536, 300, 527, 4581, 366, + 767, 1596, 2055, 281, 1286, 512, 295, 264, 17833, 50632], "temperature": 0.0, "avg_logprob": + -0.12755109582628524, "compression_ratio": 1.7859778597785978, "no_speech_prob": + 0.019476044923067093}, {"id": 300, "seek": 149308, "start": 1498.4399999999998, + "end": 1505.08, "text": " and the final like second stage ranking out of the search + engine and into a search.py instead of", "tokens": [50632, 293, 264, 2572, 411, + 1150, 3233, 17833, 484, 295, 264, 3164, 2848, 293, 666, 257, 3164, 13, 8200, 2602, + 295, 50964], "temperature": 0.0, "avg_logprob": -0.12755109582628524, "compression_ratio": + 1.7859778597785978, "no_speech_prob": 0.019476044923067093}, {"id": 301, "seek": + 149308, "start": 1505.08, "end": 1510.9199999999998, "text": " a big search.json + which can be very difficult to maintain many of these companies express a lot", + "tokens": [50964, 257, 955, 3164, 13, 73, 3015, 597, 393, 312, 588, 2252, 281, 6909, + 867, 295, 613, 
3431, 5109, 257, 688, 51256], "temperature": 0.0, "avg_logprob": + -0.12755109582628524, "compression_ratio": 1.7859778597785978, "no_speech_prob": + 0.019476044923067093}, {"id": 302, "seek": 149308, "start": 1510.9199999999998, + "end": 1516.04, "text": " of desire to move more and more of their lexical work + also onto turbo buffer and we have a full", "tokens": [51256, 295, 7516, 281, 1286, + 544, 293, 544, 295, 641, 476, 87, 804, 589, 611, 3911, 20902, 21762, 293, 321, 362, + 257, 1577, 51512], "temperature": 0.0, "avg_logprob": -0.12755109582628524, "compression_ratio": + 1.7859778597785978, "no_speech_prob": 0.019476044923067093}, {"id": 303, "seek": + 149308, "start": 1516.04, "end": 1520.9199999999998, "text": " tech search engine + we don''t have every feature of blue scene yet but we''re working very very actively", + "tokens": [51512, 7553, 3164, 2848, 321, 500, 380, 362, 633, 4111, 295, 3344, 4145, + 1939, 457, 321, 434, 1364, 588, 588, 13022, 51756], "temperature": 0.0, "avg_logprob": + -0.12755109582628524, "compression_ratio": 1.7859778597785978, "no_speech_prob": + 0.019476044923067093}, {"id": 304, "seek": 152092, "start": 1521.0, "end": 1527.72, + "text": " on bringing this up what we also see is that a lot of our customers don''t + need all of the features", "tokens": [50368, 322, 5062, 341, 493, 437, 321, 611, + 536, 307, 300, 257, 688, 295, 527, 4581, 500, 380, 643, 439, 295, 264, 4122, 50704], + "temperature": 0.0, "avg_logprob": -0.14197952872828434, "compression_ratio": 1.7035398230088497, + "no_speech_prob": 0.0048061395063996315}, {"id": 305, "seek": 152092, "start": 1527.72, + "end": 1535.8000000000002, "text": " of blue scene anymore because the vectors are + so good that a lot of the you know Ph.D. 
level", "tokens": [50704, 295, 3344, 4145, + 3602, 570, 264, 18875, 366, 370, 665, 300, 257, 688, 295, 264, 291, 458, 2623, 13, + 35, 13, 1496, 51108], "temperature": 0.0, "avg_logprob": -0.14197952872828434, "compression_ratio": + 1.7035398230088497, "no_speech_prob": 0.0048061395063996315}, {"id": 306, "seek": + 152092, "start": 1535.8000000000002, "end": 1541.0800000000002, "text": " efforts + we did before to turn strings into things is not as much of an issue anymore and + really", "tokens": [51108, 6484, 321, 630, 949, 281, 1261, 13985, 666, 721, 307, + 406, 382, 709, 295, 364, 2734, 3602, 293, 534, 51372], "temperature": 0.0, "avg_logprob": + -0.14197952872828434, "compression_ratio": 1.7035398230088497, "no_speech_prob": + 0.0048061395063996315}, {"id": 307, "seek": 152092, "start": 1541.0800000000002, + "end": 1546.92, "text": " what we use strings for more is that when you search for + DM you get the metri right like like for", "tokens": [51372, 437, 321, 764, 13985, + 337, 544, 307, 300, 562, 291, 3164, 337, 15322, 291, 483, 264, 1131, 470, 558, 411, + 411, 337, 51664], "temperature": 0.0, "avg_logprob": -0.14197952872828434, "compression_ratio": + 1.7035398230088497, "no_speech_prob": 0.0048061395063996315}, {"id": 308, "seek": + 154692, "start": 1546.92, "end": 1551.88, "text": " a prefix match whereas an embedding + model might think that that''s a direct message those kinds", "tokens": [50364, + 257, 46969, 2995, 9735, 364, 12240, 3584, 2316, 1062, 519, 300, 300, 311, 257, 2047, + 3636, 729, 3685, 50612], "temperature": 0.0, "avg_logprob": -0.13636361568345937, + "compression_ratio": 1.8964143426294822, "no_speech_prob": 0.013296348042786121}, + {"id": 309, "seek": 154692, "start": 1551.88, "end": 1556.52, "text": " of things + are important and we still need string matching for that lots of applications needed", + "tokens": [50612, 295, 721, 366, 1021, 293, 321, 920, 643, 6798, 14324, 337, 300, + 3195, 295, 5821, 2978, 50844], "temperature": 0.0, 
"avg_logprob": -0.13636361568345937, + "compression_ratio": 1.8964143426294822, "no_speech_prob": 0.013296348042786121}, + {"id": 310, "seek": 154692, "start": 1557.3200000000002, "end": 1562.92, "text": + " but there''s a lot of things that we do in leucine with synonyms with stemming + with all these kinds", "tokens": [50884, 457, 456, 311, 257, 688, 295, 721, 300, + 321, 360, 294, 476, 1311, 533, 365, 5451, 2526, 2592, 365, 12312, 2810, 365, 439, + 613, 3685, 51164], "temperature": 0.0, "avg_logprob": -0.13636361568345937, "compression_ratio": + 1.8964143426294822, "no_speech_prob": 0.013296348042786121}, {"id": 311, "seek": + 154692, "start": 1562.92, "end": 1568.44, "text": " of things the team models are + frankly just a lot better at so we find that this is an adoption", "tokens": [51164, + 295, 721, 264, 1469, 5245, 366, 11939, 445, 257, 688, 1101, 412, 370, 321, 915, + 300, 341, 307, 364, 19215, 51440], "temperature": 0.0, "avg_logprob": -0.13636361568345937, + "compression_ratio": 1.8964143426294822, "no_speech_prob": 0.013296348042786121}, + {"id": 312, "seek": 154692, "start": 1568.44, "end": 1572.76, "text": " curve that + is there a lot of the newer companies just start with embedding models and simple", + "tokens": [51440, 7605, 300, 307, 456, 257, 688, 295, 264, 17628, 3431, 445, 722, + 365, 12240, 3584, 5245, 293, 2199, 51656], "temperature": 0.0, "avg_logprob": -0.13636361568345937, + "compression_ratio": 1.8964143426294822, "no_speech_prob": 0.013296348042786121}, + {"id": 313, "seek": 157276, "start": 1572.76, "end": 1577.08, "text": " full-text + search and and they get it up and running on turbo puffer and they like that they + just", "tokens": [50364, 1577, 12, 25111, 3164, 293, 293, 436, 483, 309, 493, 293, + 2614, 322, 20902, 19613, 260, 293, 436, 411, 300, 436, 445, 50580], "temperature": + 0.0, "avg_logprob": -0.15564343107848608, "compression_ratio": 1.8953488372093024, + "no_speech_prob": 0.02435794100165367}, {"id": 314, "seek": 
157276, "start": 1577.08, + "end": 1581.48, "text": " pay for what they need they don''t think about it and + they could pump a petabyte of data and if", "tokens": [50580, 1689, 337, 437, 436, + 643, 436, 500, 380, 519, 466, 309, 293, 436, 727, 5889, 257, 3817, 34529, 295, 1412, + 293, 498, 50800], "temperature": 0.0, "avg_logprob": -0.15564343107848608, "compression_ratio": + 1.8953488372093024, "no_speech_prob": 0.02435794100165367}, {"id": 315, "seek": + 157276, "start": 1581.48, "end": 1586.2, "text": " they want it and it would be + extremely competitive on pricing and they don''t have to think about it", "tokens": + [50800, 436, 528, 309, 293, 309, 576, 312, 4664, 10043, 322, 17621, 293, 436, 500, + 380, 362, 281, 519, 466, 309, 51036], "temperature": 0.0, "avg_logprob": -0.15564343107848608, + "compression_ratio": 1.8953488372093024, "no_speech_prob": 0.02435794100165367}, + {"id": 316, "seek": 157276, "start": 1586.2, "end": 1591.32, "text": " oh that''s + awesome that''s awesome actually I forgot to mention I forgot to ask you which language", + "tokens": [51036, 1954, 300, 311, 3476, 300, 311, 3476, 767, 286, 5298, 281, 2152, + 286, 5298, 281, 1029, 291, 597, 2856, 51292], "temperature": 0.0, "avg_logprob": + -0.15564343107848608, "compression_ratio": 1.8953488372093024, "no_speech_prob": + 0.02435794100165367}, {"id": 317, "seek": 157276, "start": 1591.32, "end": 1599.72, + "text": " did you choose to implement to revolver on yeah we we um well it was just + me at the time but I chose", "tokens": [51292, 630, 291, 2826, 281, 4445, 281, 16908, + 331, 322, 1338, 321, 321, 1105, 731, 309, 390, 445, 385, 412, 264, 565, 457, 286, + 5111, 51712], "temperature": 0.0, "avg_logprob": -0.15564343107848608, "compression_ratio": + 1.8953488372093024, "no_speech_prob": 0.02435794100165367}, {"id": 318, "seek": + 159972, "start": 1599.8, "end": 1607.08, "text": " I chose Rust and I think I spent + the majority of my career writing Ruby at Shopify and", "tokens": 
[50368, 286, 5111, + 34952, 293, 286, 519, 286, 4418, 264, 6286, 295, 452, 3988, 3579, 19907, 412, 43991, + 293, 50732], "temperature": 0.0, "avg_logprob": -0.1917729320296322, "compression_ratio": + 1.655813953488372, "no_speech_prob": 0.0014194503892213106}, {"id": 319, "seek": + 159972, "start": 1608.1200000000001, "end": 1613.8, "text": " and then a lot of + go as well for some of the infrastructure components and then mainly debugging", + "tokens": [50784, 293, 550, 257, 688, 295, 352, 382, 731, 337, 512, 295, 264, 6896, + 6677, 293, 550, 8704, 45592, 51068], "temperature": 0.0, "avg_logprob": -0.1917729320296322, + "compression_ratio": 1.655813953488372, "no_speech_prob": 0.0014194503892213106}, + {"id": 320, "seek": 159972, "start": 1613.8, "end": 1618.6000000000001, "text": + " C which all the databases that we were using were we''re doing and reading C", + "tokens": [51068, 383, 597, 439, 264, 22380, 300, 321, 645, 1228, 645, 321, 434, + 884, 293, 3760, 383, 51308], "temperature": 0.0, "avg_logprob": -0.1917729320296322, + "compression_ratio": 1.655813953488372, "no_speech_prob": 0.0014194503892213106}, + {"id": 321, "seek": 159972, "start": 1620.2, "end": 1628.04, "text": " I I really + like go and I like go alongside Ruby at Shopify because go was one of those things + as", "tokens": [51388, 286, 286, 534, 411, 352, 293, 286, 411, 352, 12385, 19907, + 412, 43991, 570, 352, 390, 472, 295, 729, 721, 382, 51780], "temperature": 0.0, + "avg_logprob": -0.1917729320296322, "compression_ratio": 1.655813953488372, "no_speech_prob": + 0.0014194503892213106}, {"id": 322, "seek": 162804, "start": 1628.6, "end": 1632.92, + "text": " when leading teams I didn''t have to worry about whether someone knew + go or not because the adoption", "tokens": [50392, 562, 5775, 5491, 286, 994, 380, + 362, 281, 3292, 466, 1968, 1580, 2586, 352, 420, 406, 570, 264, 19215, 50608], "temperature": + 0.0, "avg_logprob": -0.10201243240467824, "compression_ratio": 2.0034364261168385, + 
"no_speech_prob": 0.0026609969791024923}, {"id": 323, "seek": 162804, "start": 1632.92, + "end": 1639.8, "text": " to learn it is two weeks um the adoption to learn Rust + and being proficient in it is months right", "tokens": [50608, 281, 1466, 309, 307, + 732, 3259, 1105, 264, 19215, 281, 1466, 34952, 293, 885, 1740, 24549, 294, 309, + 307, 2493, 558, 50952], "temperature": 0.0, "avg_logprob": -0.10201243240467824, + "compression_ratio": 2.0034364261168385, "no_speech_prob": 0.0026609969791024923}, + {"id": 324, "seek": 162804, "start": 1639.8, "end": 1643.56, "text": " and somehow + that''s written Rust for two years it''s a lot more productive than someone who''s", + "tokens": [50952, 293, 6063, 300, 311, 3720, 34952, 337, 732, 924, 309, 311, 257, + 688, 544, 13304, 813, 1580, 567, 311, 51140], "temperature": 0.0, "avg_logprob": + -0.10201243240467824, "compression_ratio": 2.0034364261168385, "no_speech_prob": + 0.0026609969791024923}, {"id": 325, "seek": 162804, "start": 1643.56, "end": 1648.28, + "text": " written it for two months in the language um and that''s just not the + case for go like someone who''s", "tokens": [51140, 3720, 309, 337, 732, 2493, 294, + 264, 2856, 1105, 293, 300, 311, 445, 406, 264, 1389, 337, 352, 411, 1580, 567, 311, + 51376], "temperature": 0.0, "avg_logprob": -0.10201243240467824, "compression_ratio": + 2.0034364261168385, "no_speech_prob": 0.0026609969791024923}, {"id": 326, "seek": + 162804, "start": 1648.28, "end": 1651.72, "text": " spent two years in it is just + not that much more productive and so and I think that''s an amazing", "tokens": + [51376, 4418, 732, 924, 294, 309, 307, 445, 406, 300, 709, 544, 13304, 293, 370, + 293, 286, 519, 300, 311, 364, 2243, 51548], "temperature": 0.0, "avg_logprob": -0.10201243240467824, + "compression_ratio": 2.0034364261168385, "no_speech_prob": 0.0026609969791024923}, + {"id": 327, "seek": 162804, "start": 1651.72, "end": 1656.92, "text": " feature + of the language but from from my own 
point of view and from the napkin web math + point of", "tokens": [51548, 4111, 295, 264, 2856, 457, 490, 490, 452, 1065, 935, + 295, 1910, 293, 490, 264, 9296, 5843, 3670, 5221, 935, 295, 51808], "temperature": + 0.0, "avg_logprob": -0.10201243240467824, "compression_ratio": 2.0034364261168385, + "no_speech_prob": 0.0026609969791024923}, {"id": 328, "seek": 165692, "start": 1657.16, + "end": 1663.24, "text": " you I just I was always so hungry having been in time + inside of runtimes in the Ruby MRI runtime", "tokens": [50376, 291, 286, 445, 286, + 390, 1009, 370, 8067, 1419, 668, 294, 565, 1854, 295, 49435, 1532, 294, 264, 19907, + 32812, 34474, 50680], "temperature": 0.0, "avg_logprob": -0.12749700457136207, "compression_ratio": + 1.8795180722891567, "no_speech_prob": 0.0036210031248629093}, {"id": 329, "seek": + 165692, "start": 1663.24, "end": 1668.04, "text": " and then inside of the go runtime + I was just hungry to just get directly connected to", "tokens": [50680, 293, 550, + 1854, 295, 264, 352, 34474, 286, 390, 445, 8067, 281, 445, 483, 3838, 4582, 281, + 50920], "temperature": 0.0, "avg_logprob": -0.12749700457136207, "compression_ratio": + 1.8795180722891567, "no_speech_prob": 0.0036210031248629093}, {"id": 330, "seek": + 165692, "start": 1669.0, "end": 1673.8000000000002, "text": " the metal of the machine + and so and for a database in particular that was very important right", "tokens": + [50968, 264, 5760, 295, 264, 3479, 293, 370, 293, 337, 257, 8149, 294, 1729, 300, + 390, 588, 1021, 558, 51208], "temperature": 0.0, "avg_logprob": -0.12749700457136207, + "compression_ratio": 1.8795180722891567, "no_speech_prob": 0.0036210031248629093}, + {"id": 331, "seek": 165692, "start": 1673.8000000000002, "end": 1679.16, "text": + " we need to vectorize everything we need full control over that and I think that + I think that", "tokens": [51208, 321, 643, 281, 8062, 1125, 1203, 321, 643, 1577, + 1969, 670, 300, 293, 286, 519, 300, 286, 519, 300, 51476], 
"temperature": 0.0, "avg_logprob": + -0.12749700457136207, "compression_ratio": 1.8795180722891567, "no_speech_prob": + 0.0036210031248629093}, {"id": 332, "seek": 165692, "start": 1679.16, "end": 1684.44, + "text": " that full control as remarkable now as Go is which would I think it would + be would have been okay", "tokens": [51476, 300, 1577, 1969, 382, 12802, 586, 382, + 1037, 307, 597, 576, 286, 519, 309, 576, 312, 576, 362, 668, 1392, 51740], "temperature": + 0.0, "avg_logprob": -0.12749700457136207, "compression_ratio": 1.8795180722891567, + "no_speech_prob": 0.0036210031248629093}, {"id": 333, "seek": 168444, "start": 1685.4, + "end": 1691.24, "text": " that raw access to the machine has been has is needed + for for writing something like TurboPover.", "tokens": [50412, 300, 8936, 2105, + 281, 264, 3479, 575, 668, 575, 307, 2978, 337, 337, 3579, 746, 411, 35848, 47, 3570, + 13, 50704], "temperature": 0.0, "avg_logprob": -0.22287628449589372, "compression_ratio": + 1.6018099547511313, "no_speech_prob": 0.006159712094813585}, {"id": 334, "seek": + 168444, "start": 1691.24, "end": 1697.24, "text": " Yeah for sure I still remember + coding the times when I was learning and coded", "tokens": [50704, 865, 337, 988, + 286, 920, 1604, 17720, 264, 1413, 562, 286, 390, 2539, 293, 34874, 51004], "temperature": + 0.0, "avg_logprob": -0.22287628449589372, "compression_ratio": 1.6018099547511313, + "no_speech_prob": 0.006159712094813585}, {"id": 335, "seek": 168444, "start": 1697.96, + "end": 1704.52, "text": " industrially in CNC++ like you like you really need it + to be very very careful but in return", "tokens": [51040, 49005, 379, 294, 48714, + 25472, 411, 291, 411, 291, 534, 643, 309, 281, 312, 588, 588, 5026, 457, 294, 2736, + 51368], "temperature": 0.0, "avg_logprob": -0.22287628449589372, "compression_ratio": + 1.6018099547511313, "no_speech_prob": 0.006159712094813585}, {"id": 336, "seek": + 168444, "start": 1704.52, "end": 1709.8, "text": " you can get a lot 
of like performance + gains you know and some of your ideas really fly", "tokens": [51368, 291, 393, 483, + 257, 688, 295, 411, 3389, 16823, 291, 458, 293, 512, 295, 428, 3487, 534, 3603, + 51632], "temperature": 0.0, "avg_logprob": -0.22287628449589372, "compression_ratio": + 1.6018099547511313, "no_speech_prob": 0.006159712094813585}, {"id": 337, "seek": + 170980, "start": 1710.68, "end": 1715.8799999999999, "text": " but yeah today I + guess I''m coding more in Python or should I even say that I called in Python when + I", "tokens": [50408, 457, 1338, 965, 286, 2041, 286, 478, 17720, 544, 294, 15329, + 420, 820, 286, 754, 584, 300, 286, 1219, 294, 15329, 562, 286, 50668], "temperature": + 0.0, "avg_logprob": -0.11883871606055726, "compression_ratio": 1.7017543859649122, + "no_speech_prob": 0.013183152303099632}, {"id": 338, "seek": 170980, "start": 1715.8799999999999, + "end": 1724.36, "text": " use cursor more and more which is by the way scary you + know the the that feeling when some some", "tokens": [50668, 764, 28169, 544, 293, + 544, 597, 307, 538, 264, 636, 6958, 291, 458, 264, 264, 300, 2633, 562, 512, 512, + 51092], "temperature": 0.0, "avg_logprob": -0.11883871606055726, "compression_ratio": + 1.7017543859649122, "no_speech_prob": 0.013183152303099632}, {"id": 339, "seek": + 170980, "start": 1724.36, "end": 1729.3999999999999, "text": " other entity writes + code and you are just reading it right it''s it''s a little bit scary and I''m", + "tokens": [51092, 661, 13977, 13657, 3089, 293, 291, 366, 445, 3760, 309, 558, 309, + 311, 309, 311, 257, 707, 857, 6958, 293, 286, 478, 51344], "temperature": 0.0, "avg_logprob": + -0.11883871606055726, "compression_ratio": 1.7017543859649122, "no_speech_prob": + 0.013183152303099632}, {"id": 340, "seek": 170980, "start": 1729.3999999999999, + "end": 1736.44, "text": " still grappling with it but the amount of productivity + that I get is enormous and it''s like you", "tokens": [51344, 920, 50086, 365, 309, + 457, 
264, 2372, 295, 15604, 300, 286, 483, 307, 11322, 293, 309, 311, 411, 291, + 51696], "temperature": 0.0, "avg_logprob": -0.11883871606055726, "compression_ratio": + 1.7017543859649122, "no_speech_prob": 0.013183152303099632}, {"id": 341, "seek": + 173644, "start": 1736.52, "end": 1741.24, "text": " know I can shoot daily like + features and just see them being used that''s amazing.", "tokens": [50368, 458, + 286, 393, 3076, 5212, 411, 4122, 293, 445, 536, 552, 885, 1143, 300, 311, 2243, + 13, 50604], "temperature": 0.0, "avg_logprob": -0.16198551982914636, "compression_ratio": + 1.6525096525096525, "no_speech_prob": 0.0007687908946536481}, {"id": 342, "seek": + 173644, "start": 1742.44, "end": 1748.92, "text": " I think what I love about it + is that I still love to sit there and write the", "tokens": [50664, 286, 519, 437, + 286, 959, 466, 309, 307, 300, 286, 920, 959, 281, 1394, 456, 293, 2464, 264, 50988], + "temperature": 0.0, "avg_logprob": -0.16198551982914636, "compression_ratio": 1.6525096525096525, + "no_speech_prob": 0.0007687908946536481}, {"id": 343, "seek": 173644, "start": 1748.92, + "end": 1755.4, "text": " additional code by hand you know maybe at some point we + will we will we will mark TurboPover as", "tokens": [50988, 4497, 3089, 538, 1011, + 291, 458, 1310, 412, 512, 935, 321, 486, 321, 486, 321, 486, 1491, 35848, 47, 3570, + 382, 51312], "temperature": 0.0, "avg_logprob": -0.16198551982914636, "compression_ratio": + 1.6525096525096525, "no_speech_prob": 0.0007687908946536481}, {"id": 344, "seek": + 173644, "start": 1755.4, "end": 1760.6000000000001, "text": " an a seasonally written + database because we don''t use a ton of AI for the very key parts", "tokens": [51312, + 364, 257, 3196, 379, 3720, 8149, 570, 321, 500, 380, 764, 257, 2952, 295, 7318, + 337, 264, 588, 2141, 3166, 51572], "temperature": 0.0, "avg_logprob": -0.16198551982914636, + "compression_ratio": 1.6525096525096525, "no_speech_prob": 0.0007687908946536481}, + {"id": 345, 
"seek": 173644, "start": 1761.24, "end": 1766.28, "text": " because + I mean we''re at the edge of what the LLMs could know but I think that for me", + "tokens": [51604, 570, 286, 914, 321, 434, 412, 264, 4691, 295, 437, 264, 441, 43, + 26386, 727, 458, 457, 286, 519, 300, 337, 385, 51856], "temperature": 0.0, "avg_logprob": + -0.16198551982914636, "compression_ratio": 1.6525096525096525, "no_speech_prob": + 0.0007687908946536481}, {"id": 346, "seek": 176644, "start": 1766.44, "end": 1771.24, + "text": " in a position where I''m in and out of meetings all day these days but + I can actually get a lot", "tokens": [50364, 294, 257, 2535, 689, 286, 478, 294, + 293, 484, 295, 8410, 439, 786, 613, 1708, 457, 286, 393, 767, 483, 257, 688, 50604], + "temperature": 0.0, "avg_logprob": -0.12191998617989676, "compression_ratio": 1.865814696485623, + "no_speech_prob": 0.006656951736658812}, {"id": 347, "seek": 176644, "start": 1771.24, + "end": 1775.64, "text": " done in a 30 minute window when I have something that''s + prompting and writing the tests right and", "tokens": [50604, 1096, 294, 257, 2217, + 3456, 4910, 562, 286, 362, 746, 300, 311, 12391, 278, 293, 3579, 264, 6921, 558, + 293, 50824], "temperature": 0.0, "avg_logprob": -0.12191998617989676, "compression_ratio": + 1.865814696485623, "no_speech_prob": 0.006656951736658812}, {"id": 348, "seek": + 176644, "start": 1775.64, "end": 1779.64, "text": " you follow it off at the beginning + of the meeting you check in and they''re like you know 15 30 minutes", "tokens": + [50824, 291, 1524, 309, 766, 412, 264, 2863, 295, 264, 3440, 291, 1520, 294, 293, + 436, 434, 411, 291, 458, 2119, 2217, 2077, 51024], "temperature": 0.0, "avg_logprob": + -0.12191998617989676, "compression_ratio": 1.865814696485623, "no_speech_prob": + 0.006656951736658812}, {"id": 349, "seek": 176644, "start": 1779.64, "end": 1785.24, + "text": " you have in between blocks and this allows me to actually contribute a + lot more code than I was", 
"tokens": [51024, 291, 362, 294, 1296, 8474, 293, 341, + 4045, 385, 281, 767, 10586, 257, 688, 544, 3089, 813, 286, 390, 51304], "temperature": + 0.0, "avg_logprob": -0.12191998617989676, "compression_ratio": 1.865814696485623, + "no_speech_prob": 0.006656951736658812}, {"id": 350, "seek": 176644, "start": 1785.24, + "end": 1790.92, "text": " otherwise going to be able to not into the core engine + you know I don''t get led led into led into", "tokens": [51304, 5911, 516, 281, + 312, 1075, 281, 406, 666, 264, 4965, 2848, 291, 458, 286, 500, 380, 483, 4684, 4684, + 666, 4684, 666, 51588], "temperature": 0.0, "avg_logprob": -0.12191998617989676, + "compression_ratio": 1.865814696485623, "no_speech_prob": 0.006656951736658812}, + {"id": 351, "seek": 176644, "start": 1790.92, "end": 1795.3200000000002, "text": + " a lot of that anymore because I don''t have the the time and focus that it takes + to fully think", "tokens": [51588, 257, 688, 295, 300, 3602, 570, 286, 500, 380, + 362, 264, 264, 565, 293, 1879, 300, 309, 2516, 281, 4498, 519, 51808], "temperature": + 0.0, "avg_logprob": -0.12191998617989676, "compression_ratio": 1.865814696485623, + "no_speech_prob": 0.006656951736658812}, {"id": 352, "seek": 179532, "start": 1795.32, + "end": 1800.12, "text": " something through there but for the website the API to + initial features all of that it''s just", "tokens": [50364, 746, 807, 456, 457, + 337, 264, 3144, 264, 9362, 281, 5883, 4122, 439, 295, 300, 309, 311, 445, 50604], + "temperature": 0.0, "avg_logprob": -0.11372512994810592, "compression_ratio": 1.602510460251046, + "no_speech_prob": 0.0034170181024819613}, {"id": 353, "seek": 179532, "start": 1800.12, + "end": 1808.04, "text": " been wonderful yeah that''s amazing I also wanted to go + a bit on the tangent like you essentially", "tokens": [50604, 668, 3715, 1338, 300, + 311, 2243, 286, 611, 1415, 281, 352, 257, 857, 322, 264, 27747, 411, 291, 4476, + 51000], "temperature": 0.0, "avg_logprob": 
-0.11372512994810592, "compression_ratio": + 1.602510460251046, "no_speech_prob": 0.0034170181024819613}, {"id": 354, "seek": + 179532, "start": 1808.04, "end": 1815.72, "text": " you''ve been you could say mathematician + engineer but you took a leap towards becoming a CEO right", "tokens": [51000, 291, + 600, 668, 291, 727, 584, 48281, 11403, 457, 291, 1890, 257, 19438, 3030, 5617, 257, + 9282, 558, 51384], "temperature": 0.0, "avg_logprob": -0.11372512994810592, "compression_ratio": + 1.602510460251046, "no_speech_prob": 0.0034170181024819613}, {"id": 355, "seek": + 179532, "start": 1815.72, "end": 1823.0, "text": " and I think you know as you said + you go to meetings you do lots of you know probably sales and", "tokens": [51384, + 293, 286, 519, 291, 458, 382, 291, 848, 291, 352, 281, 8410, 291, 360, 3195, 295, + 291, 458, 1391, 5763, 293, 51748], "temperature": 0.0, "avg_logprob": -0.11372512994810592, + "compression_ratio": 1.602510460251046, "no_speech_prob": 0.0034170181024819613}, + {"id": 356, "seek": 182300, "start": 1823.0, "end": 1829.72, "text": " and and and + product and all of that stuff was it a natural transition for you like what what + have", "tokens": [50364, 293, 293, 293, 1674, 293, 439, 295, 300, 1507, 390, 309, + 257, 3303, 6034, 337, 291, 411, 437, 437, 362, 50700], "temperature": 0.0, "avg_logprob": + -0.13952407836914063, "compression_ratio": 1.8309859154929577, "no_speech_prob": + 0.0015195596497505903}, {"id": 357, "seek": 182300, "start": 1829.72, "end": 1837.08, + "text": " you learned in this in this journey and what what maybe do you miss from + from your previous career", "tokens": [50700, 291, 3264, 294, 341, 294, 341, 4671, + 293, 437, 437, 1310, 360, 291, 1713, 490, 490, 428, 3894, 3988, 51068], "temperature": + 0.0, "avg_logprob": -0.13952407836914063, "compression_ratio": 1.8309859154929577, + "no_speech_prob": 0.0015195596497505903}, {"id": 358, "seek": 182300, "start": 1837.08, + "end": 1843.96, "text": " when you when 
you were like you know hands on and sit + down and write a bunch of code I think I", "tokens": [51068, 562, 291, 562, 291, + 645, 411, 291, 458, 2377, 322, 293, 1394, 760, 293, 2464, 257, 3840, 295, 3089, + 286, 519, 286, 51412], "temperature": 0.0, "avg_logprob": -0.13952407836914063, + "compression_ratio": 1.8309859154929577, "no_speech_prob": 0.0015195596497505903}, + {"id": 359, "seek": 182300, "start": 1843.96, "end": 1848.68, "text": " have a I + have a couple of angles to answer the question not necessarily a directing answer + I think", "tokens": [51412, 362, 257, 286, 362, 257, 1916, 295, 14708, 281, 1867, + 264, 1168, 406, 4725, 257, 26979, 1867, 286, 519, 51648], "temperature": 0.0, "avg_logprob": + -0.13952407836914063, "compression_ratio": 1.8309859154929577, "no_speech_prob": + 0.0015195596497505903}, {"id": 360, "seek": 184868, "start": 1848.76, "end": 1857.3200000000002, + "text": " one one angle is that fundamentally I''m like a growth junkie for better + or worse and I think that", "tokens": [50368, 472, 472, 5802, 307, 300, 17879, 286, + 478, 411, 257, 4599, 19109, 414, 337, 1101, 420, 5324, 293, 286, 519, 300, 50796], + "temperature": 0.0, "avg_logprob": -0.08798768104763206, "compression_ratio": 1.8953488372093024, + "no_speech_prob": 0.00229034130461514}, {"id": 361, "seek": 184868, "start": 1857.3200000000002, + "end": 1862.8400000000001, "text": " entrepreneurship is the ultimate path for a + growth junkie it was never really something that I", "tokens": [50796, 26582, 307, + 264, 9705, 3100, 337, 257, 4599, 19109, 414, 309, 390, 1128, 534, 746, 300, 286, + 51072], "temperature": 0.0, "avg_logprob": -0.08798768104763206, "compression_ratio": + 1.8953488372093024, "no_speech_prob": 0.00229034130461514}, {"id": 362, "seek": + 184868, "start": 1862.8400000000001, "end": 1867.96, "text": " assumed that I was + going to do I''ve never before even when I was working on the project it''s never", + "tokens": [51072, 15895, 300, 286, 390, 516, 281, 
360, 286, 600, 1128, 949, 754, + 562, 286, 390, 1364, 322, 264, 1716, 309, 311, 1128, 51328], "temperature": 0.0, + "avg_logprob": -0.08798768104763206, "compression_ratio": 1.8953488372093024, "no_speech_prob": + 0.00229034130461514}, {"id": 363, "seek": 184868, "start": 1867.96, "end": 1872.3600000000001, + "text": " about becoming a founder it''s just about creating the database right + and at some point becoming the", "tokens": [51328, 466, 5617, 257, 14917, 309, 311, + 445, 466, 4084, 264, 8149, 558, 293, 412, 512, 935, 5617, 264, 51548], "temperature": + 0.0, "avg_logprob": -0.08798768104763206, "compression_ratio": 1.8953488372093024, + "no_speech_prob": 0.00229034130461514}, {"id": 364, "seek": 184868, "start": 1872.3600000000001, + "end": 1876.68, "text": " founder of the company becomes a means to an end of creating + the database and getting it into the", "tokens": [51548, 14917, 295, 264, 2237, + 3643, 257, 1355, 281, 364, 917, 295, 4084, 264, 8149, 293, 1242, 309, 666, 264, + 51764], "temperature": 0.0, "avg_logprob": -0.08798768104763206, "compression_ratio": + 1.8953488372093024, "no_speech_prob": 0.00229034130461514}, {"id": 365, "seek": + 187668, "start": 1876.76, "end": 1882.8400000000001, "text": " hands of our users + and making sure they have a great time that''s always what like that''s what", "tokens": + [50368, 2377, 295, 527, 5022, 293, 1455, 988, 436, 362, 257, 869, 565, 300, 311, + 1009, 437, 411, 300, 311, 437, 50672], "temperature": 0.0, "avg_logprob": -0.09643647545262386, + "compression_ratio": 1.90234375, "no_speech_prob": 0.0005882293917238712}, {"id": + 366, "seek": 187668, "start": 1882.8400000000001, "end": 1887.8, "text": " drove + me right was the read why I should have this right our customers should have this + this you", "tokens": [50672, 13226, 385, 558, 390, 264, 1401, 983, 286, 820, 362, + 341, 558, 527, 4581, 820, 362, 341, 341, 291, 50920], "temperature": 0.0, "avg_logprob": + -0.09643647545262386, "compression_ratio": 
1.90234375, "no_speech_prob": 0.0005882293917238712}, + {"id": 367, "seek": 187668, "start": 1887.8, "end": 1893.5600000000002, "text": + " have a great experience and that''s always what''s driven me and to me the the + founder and all of the", "tokens": [50920, 362, 257, 869, 1752, 293, 300, 311, 1009, + 437, 311, 9555, 385, 293, 281, 385, 264, 264, 14917, 293, 439, 295, 264, 51208], + "temperature": 0.0, "avg_logprob": -0.09643647545262386, "compression_ratio": 1.90234375, + "no_speech_prob": 0.0005882293917238712}, {"id": 368, "seek": 187668, "start": 1893.5600000000002, + "end": 1899.0800000000002, "text": " other things have been a mean towards an end + there I think that one of the things that is maybe both", "tokens": [51208, 661, + 721, 362, 668, 257, 914, 3030, 364, 917, 456, 286, 519, 300, 472, 295, 264, 721, + 300, 307, 1310, 1293, 51484], "temperature": 0.0, "avg_logprob": -0.09643647545262386, + "compression_ratio": 1.90234375, "no_speech_prob": 0.0005882293917238712}, {"id": + 369, "seek": 187668, "start": 1899.0800000000002, "end": 1903.64, "text": " a controversial + but also feels like a true statement is that at some point I feel a bit numb to", + "tokens": [51484, 257, 17323, 457, 611, 3417, 411, 257, 2074, 5629, 307, 300, 412, + 512, 935, 286, 841, 257, 857, 32200, 281, 51712], "temperature": 0.0, "avg_logprob": + -0.09643647545262386, "compression_ratio": 1.90234375, "no_speech_prob": 0.0005882293917238712}, + {"id": 370, "seek": 190364, "start": 1903.64, "end": 1909.16, "text": " what work + that I enjoy and what I don''t enjoy anymore because what I enjoy the most is making", + "tokens": [50364, 437, 589, 300, 286, 2103, 293, 437, 286, 500, 380, 2103, 3602, + 570, 437, 286, 2103, 264, 881, 307, 1455, 50640], "temperature": 0.0, "avg_logprob": + -0.07741976213884784, "compression_ratio": 1.8779527559055118, "no_speech_prob": + 0.005077640060335398}, {"id": 371, "seek": 190364, "start": 1909.16, "end": 1914.3600000000001, + "text": " this 
company successful and making the database successful for our customers + that''s what I care", "tokens": [50640, 341, 2237, 4406, 293, 1455, 264, 8149, 4406, + 337, 527, 4581, 300, 311, 437, 286, 1127, 50900], "temperature": 0.0, "avg_logprob": + -0.07741976213884784, "compression_ratio": 1.8779527559055118, "no_speech_prob": + 0.005077640060335398}, {"id": 372, "seek": 190364, "start": 1914.3600000000001, + "end": 1922.1200000000001, "text": " the most about and I''m yeah I honestly I love + sales I love marketing I love the engineering I love", "tokens": [50900, 264, 881, + 466, 293, 286, 478, 1338, 286, 6095, 286, 959, 5763, 286, 959, 6370, 286, 959, 264, + 7043, 286, 959, 51288], "temperature": 0.0, "avg_logprob": -0.07741976213884784, + "compression_ratio": 1.8779527559055118, "no_speech_prob": 0.005077640060335398}, + {"id": 373, "seek": 190364, "start": 1922.1200000000001, "end": 1928.2800000000002, + "text": " hiring people for the team I love all of these things but it''s not a + simplistic answer to oh I''ve", "tokens": [51288, 15335, 561, 337, 264, 1469, 286, + 959, 439, 295, 613, 721, 457, 309, 311, 406, 257, 44199, 1867, 281, 1954, 286, 600, + 51596], "temperature": 0.0, "avg_logprob": -0.07741976213884784, "compression_ratio": + 1.8779527559055118, "no_speech_prob": 0.005077640060335398}, {"id": 374, "seek": + 190364, "start": 1928.2800000000002, "end": 1933.0800000000002, "text": " been coding + my whole life I think it''s more that that is my idle activity if there is that", + "tokens": [51596, 668, 17720, 452, 1379, 993, 286, 519, 309, 311, 544, 300, 300, + 307, 452, 30650, 5191, 498, 456, 307, 300, 51836], "temperature": 0.0, "avg_logprob": + -0.07741976213884784, "compression_ratio": 1.8779527559055118, "no_speech_prob": + 0.005077640060335398}, {"id": 375, "seek": 193308, "start": 1933.08, "end": 1937.48, + "text": " one to two hour and there''s nothing urgent on then I''m going to go spend + some time in the code", "tokens": [50364, 472, 281, 
732, 1773, 293, 456, 311, 1825, + 19022, 322, 550, 286, 478, 516, 281, 352, 3496, 512, 565, 294, 264, 3089, 50584], + "temperature": 0.0, "avg_logprob": -0.15626467952021847, "compression_ratio": 1.9078947368421053, + "no_speech_prob": 0.0075340173207223415}, {"id": 376, "seek": 193308, "start": 1937.48, + "end": 1942.76, "text": " it''s like oh how did how did Nathan implement this new + this new query planning of query plan", "tokens": [50584, 309, 311, 411, 1954, 577, + 630, 577, 630, 20634, 4445, 341, 777, 341, 777, 14581, 5038, 295, 14581, 1393, 50848], + "temperature": 0.0, "avg_logprob": -0.15626467952021847, "compression_ratio": 1.9078947368421053, + "no_speech_prob": 0.0075340173207223415}, {"id": 377, "seek": 193308, "start": 1942.76, + "end": 1948.28, "text": " or heuristic that is a natural that''s where my idle activity + and I always like to also an", "tokens": [50848, 420, 415, 374, 3142, 300, 307, + 257, 3303, 300, 311, 689, 452, 30650, 5191, 293, 286, 1009, 411, 281, 611, 364, + 51124], "temperature": 0.0, "avg_logprob": -0.15626467952021847, "compression_ratio": + 1.9078947368421053, "no_speech_prob": 0.0075340173207223415}, {"id": 378, "seek": + 193308, "start": 1948.28, "end": 1952.28, "text": " interviewing people try to understand + especially if they''re in a more hybrid world what''s your idle", "tokens": [51124, + 26524, 561, 853, 281, 1223, 2318, 498, 436, 434, 294, 257, 544, 13051, 1002, 437, + 311, 428, 30650, 51324], "temperature": 0.0, "avg_logprob": -0.15626467952021847, + "compression_ratio": 1.9078947368421053, "no_speech_prob": 0.0075340173207223415}, + {"id": 379, "seek": 193308, "start": 1952.28, "end": 1956.52, "text": " activity + what''s the thing did you do when you have one to two hours and nothing else comes + up do you", "tokens": [51324, 5191, 437, 311, 264, 551, 630, 291, 360, 562, 291, + 362, 472, 281, 732, 2496, 293, 1825, 1646, 1487, 493, 360, 291, 51536], "temperature": + 0.0, "avg_logprob": -0.15626467952021847, 
"compression_ratio": 1.9078947368421053, + "no_speech_prob": 0.0075340173207223415}, {"id": 380, "seek": 193308, "start": 1956.52, + "end": 1961.8799999999999, "text": " gravitate towards the code do you start looking + at just start writing an article do you start playing", "tokens": [51536, 7427, + 8086, 3030, 264, 3089, 360, 291, 722, 1237, 412, 445, 722, 3579, 364, 7222, 360, + 291, 722, 2433, 51804], "temperature": 0.0, "avg_logprob": -0.15626467952021847, + "compression_ratio": 1.9078947368421053, "no_speech_prob": 0.0075340173207223415}, + {"id": 381, "seek": 196188, "start": 1961.88, "end": 1966.2800000000002, "text": + " with the product what is that idle activity and it is code for me that''s what + everything is grounded", "tokens": [50364, 365, 264, 1674, 437, 307, 300, 30650, + 5191, 293, 309, 307, 3089, 337, 385, 300, 311, 437, 1203, 307, 23535, 50584], "temperature": + 0.0, "avg_logprob": -0.11034653981526693, "compression_ratio": 1.9416342412451362, + "no_speech_prob": 0.006499853450804949}, {"id": 382, "seek": 196188, "start": 1966.2800000000002, + "end": 1973.96, "text": " in and I think it I think it has a deep influence on how + I can how I can lead the company but I don''t", "tokens": [50584, 294, 293, 286, + 519, 309, 286, 519, 309, 575, 257, 2452, 6503, 322, 577, 286, 393, 577, 286, 393, + 1477, 264, 2237, 457, 286, 500, 380, 50968], "temperature": 0.0, "avg_logprob": + -0.11034653981526693, "compression_ratio": 1.9416342412451362, "no_speech_prob": + 0.006499853450804949}, {"id": 383, "seek": 196188, "start": 1973.96, "end": 1981.0800000000002, + "text": " think it''s been I think I often think about something that tell them + said you know the author of", "tokens": [50968, 519, 309, 311, 668, 286, 519, 286, + 2049, 519, 466, 746, 300, 980, 552, 848, 291, 458, 264, 3793, 295, 51324], "temperature": + 0.0, "avg_logprob": -0.11034653981526693, "compression_ratio": 1.9416342412451362, + "no_speech_prob": 0.006499853450804949}, {"id": 384, 
"seek": 196188, "start": 1981.0800000000002, + "end": 1986.6000000000001, "text": " anti fragile and a bunch of other books is + that you the best authors of books are not the ones that", "tokens": [51324, 6061, + 23847, 293, 257, 3840, 295, 661, 3642, 307, 300, 291, 264, 1151, 16552, 295, 3642, + 366, 406, 264, 2306, 300, 51600], "temperature": 0.0, "avg_logprob": -0.11034653981526693, + "compression_ratio": 1.9416342412451362, "no_speech_prob": 0.006499853450804949}, + {"id": 385, "seek": 196188, "start": 1986.6000000000001, "end": 1990.7600000000002, + "text": " sit down and like you know read a bunch of papers then write a page then + read another paper write a", "tokens": [51600, 1394, 760, 293, 411, 291, 458, 1401, + 257, 3840, 295, 10577, 550, 2464, 257, 3028, 550, 1401, 1071, 3035, 2464, 257, 51808], + "temperature": 0.0, "avg_logprob": -0.11034653981526693, "compression_ratio": 1.9416342412451362, + "no_speech_prob": 0.006499853450804949}, {"id": 386, "seek": 199076, "start": 1990.76, + "end": 1995.64, "text": " page the best books are written by people who just you + know go to the cabin sit down write 500", "tokens": [50364, 3028, 264, 1151, 3642, + 366, 3720, 538, 561, 567, 445, 291, 458, 352, 281, 264, 9401, 1394, 760, 2464, 5923, + 50608], "temperature": 0.0, "avg_logprob": -0.08123855931418282, "compression_ratio": + 1.7794117647058822, "no_speech_prob": 0.0060724420472979546}, {"id": 387, "seek": + 199076, "start": 1995.64, "end": 2000.44, "text": " pages and and hit publish of + course that''s not what actually happens but if you read it to let books", "tokens": + [50608, 7183, 293, 293, 2045, 11374, 295, 1164, 300, 311, 406, 437, 767, 2314, 457, + 498, 291, 1401, 309, 281, 718, 3642, 50848], "temperature": 0.0, "avg_logprob": + -0.08123855931418282, "compression_ratio": 1.7794117647058822, "no_speech_prob": + 0.0060724420472979546}, {"id": 388, "seek": 199076, "start": 2000.44, "end": 2004.92, + "text": " it''s probably pretty close to what 
actually happened and he just has + the citations in his head", "tokens": [50848, 309, 311, 1391, 1238, 1998, 281, 437, + 767, 2011, 293, 415, 445, 575, 264, 4814, 763, 294, 702, 1378, 51072], "temperature": + 0.0, "avg_logprob": -0.08123855931418282, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.0060724420472979546}, {"id": 389, "seek": 199076, "start": 2005.56, + "end": 2011.8799999999999, "text": " and I think about that often when building + this company that it has felt like I''ve worked or this", "tokens": [51104, 293, + 286, 519, 466, 300, 2049, 562, 2390, 341, 2237, 300, 309, 575, 2762, 411, 286, 600, + 2732, 420, 341, 51420], "temperature": 0.0, "avg_logprob": -0.08123855931418282, + "compression_ratio": 1.7794117647058822, "no_speech_prob": 0.0060724420472979546}, + {"id": 390, "seek": 199076, "start": 2011.8799999999999, "end": 2017.08, "text": + " my whole life without knowing for it and I feel every every morning that I wake + up that this is", "tokens": [51420, 452, 1379, 993, 1553, 5276, 337, 309, 293, 286, + 841, 633, 633, 2446, 300, 286, 6634, 493, 300, 341, 307, 51680], "temperature": + 0.0, "avg_logprob": -0.08123855931418282, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.0060724420472979546}, {"id": 391, "seek": 201708, "start": 2017.08, + "end": 2023.24, "text": " exactly what it has led up to so it''s very naturally + even if it wasn''t go onto itself that it", "tokens": [50364, 2293, 437, 309, 575, + 4684, 493, 281, 370, 309, 311, 588, 8195, 754, 498, 309, 2067, 380, 352, 3911, 2564, + 300, 309, 50672], "temperature": 0.0, "avg_logprob": -0.11179924011230469, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.010730314999818802}, {"id": 392, "seek": + 201708, "start": 2023.24, "end": 2029.8, "text": " makes sense with the experience + I''ve had to do exactly this and I tremendously enjoy it but it''s", "tokens": [50672, + 1669, 2020, 365, 264, 1752, 286, 600, 632, 281, 360, 2293, 341, 293, 286, 
27985, + 2103, 309, 457, 309, 311, 51000], "temperature": 0.0, "avg_logprob": -0.11179924011230469, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.010730314999818802}, + {"id": 393, "seek": 201708, "start": 2029.8, "end": 2035.96, "text": " not a simplistic + answer to do I miss coding no I want to make this company incredibly successful", + "tokens": [51000, 406, 257, 44199, 1867, 281, 360, 286, 1713, 17720, 572, 286, 528, + 281, 652, 341, 2237, 6252, 4406, 51308], "temperature": 0.0, "avg_logprob": -0.11179924011230469, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.010730314999818802}, + {"id": 394, "seek": 201708, "start": 2036.76, "end": 2042.36, "text": " but sometimes + I will do it as a recreational activity yeah I mean definitely like when I look + at you", "tokens": [51348, 457, 2171, 286, 486, 360, 309, 382, 257, 37554, 5191, + 1338, 286, 914, 2138, 411, 562, 286, 574, 412, 291, 51628], "temperature": 0.0, + "avg_logprob": -0.11179924011230469, "compression_ratio": 1.6302521008403361, "no_speech_prob": + 0.010730314999818802}, {"id": 395, "seek": 204236, "start": 2043.32, "end": 2048.6, + "text": " like on twitter for example you come across as a very technical person + and you are for sure right", "tokens": [50412, 411, 322, 21439, 337, 1365, 291, + 808, 2108, 382, 257, 588, 6191, 954, 293, 291, 366, 337, 988, 558, 50676], "temperature": + 0.0, "avg_logprob": -0.09982621042351973, "compression_ratio": 1.7268722466960353, + "no_speech_prob": 0.013405069708824158}, {"id": 396, "seek": 204236, "start": 2048.6, + "end": 2053.88, "text": " even though you know that to grow your business you need + to do a lot of other activities but at the same", "tokens": [50676, 754, 1673, 291, + 458, 300, 281, 1852, 428, 1606, 291, 643, 281, 360, 257, 688, 295, 661, 5354, 457, + 412, 264, 912, 50940], "temperature": 0.0, "avg_logprob": -0.09982621042351973, + "compression_ratio": 1.7268722466960353, "no_speech_prob": 0.013405069708824158}, 
+ {"id": 397, "seek": 204236, "start": 2053.88, "end": 2060.68, "text": " time I mean + yeah I don''t mean to ask it in a way that hey you regret now that you do sales + you", "tokens": [50940, 565, 286, 914, 1338, 286, 500, 380, 914, 281, 1029, 309, + 294, 257, 636, 300, 4177, 291, 10879, 586, 300, 291, 360, 5763, 291, 51280], "temperature": + 0.0, "avg_logprob": -0.09982621042351973, "compression_ratio": 1.7268722466960353, + "no_speech_prob": 0.013405069708824158}, {"id": 398, "seek": 204236, "start": 2060.68, + "end": 2067.7999999999997, "text": " regret not doing more coding which which is + not true you still do that and I think that all of", "tokens": [51280, 10879, 406, + 884, 544, 17720, 597, 597, 307, 406, 2074, 291, 920, 360, 300, 293, 286, 519, 300, + 439, 295, 51636], "temperature": 0.0, "avg_logprob": -0.09982621042351973, "compression_ratio": + 1.7268722466960353, "no_speech_prob": 0.013405069708824158}, {"id": 399, "seek": + 206780, "start": 2067.88, "end": 2073.32, "text": " all of the engineers will become + better engineers if they learn the mastery of actually presenting", "tokens": [50368, + 439, 295, 264, 11955, 486, 1813, 1101, 11955, 498, 436, 1466, 264, 37951, 295, 767, + 15578, 50640], "temperature": 0.0, "avg_logprob": -0.06293760437563241, "compression_ratio": + 1.8046511627906976, "no_speech_prob": 0.008376692421734333}, {"id": 400, "seek": + 206780, "start": 2073.32, "end": 2079.96, "text": " what they do right and then + they will not need a middle layer or someone else who will go and talk", "tokens": + [50640, 437, 436, 360, 558, 293, 550, 436, 486, 406, 643, 257, 2808, 4583, 420, + 1580, 1646, 567, 486, 352, 293, 751, 50972], "temperature": 0.0, "avg_logprob": + -0.06293760437563241, "compression_ratio": 1.8046511627906976, "no_speech_prob": + 0.008376692421734333}, {"id": 401, "seek": 206780, "start": 2079.96, "end": 2086.2000000000003, + "text": " to that product manager or whoever else needs to talk to right so they + can 
actually represent", "tokens": [50972, 281, 300, 1674, 6598, 420, 11387, 1646, + 2203, 281, 751, 281, 558, 370, 436, 393, 767, 2906, 51284], "temperature": 0.0, + "avg_logprob": -0.06293760437563241, "compression_ratio": 1.8046511627906976, "no_speech_prob": + 0.008376692421734333}, {"id": 402, "seek": 206780, "start": 2086.2000000000003, + "end": 2094.28, "text": " themselves but also I also love how you put it really + eloquently that what is your idle activity", "tokens": [51284, 2969, 457, 611, 286, + 611, 959, 577, 291, 829, 309, 534, 38682, 47519, 300, 437, 307, 428, 30650, 5191, + 51688], "temperature": 0.0, "avg_logprob": -0.06293760437563241, "compression_ratio": + 1.8046511627906976, "no_speech_prob": 0.008376692421734333}, {"id": 403, "seek": + 209428, "start": 2094.28, "end": 2099.48, "text": " right what do you what''s your + affinity what you gravitate to and I actually can it resonates a lot", "tokens": + [50364, 558, 437, 360, 291, 437, 311, 428, 39703, 437, 291, 7427, 8086, 281, 293, + 286, 767, 393, 309, 41051, 257, 688, 50624], "temperature": 0.0, "avg_logprob": + -0.11267517708443306, "compression_ratio": 1.749090909090909, "no_speech_prob": + 0.005634230561554432}, {"id": 404, "seek": 209428, "start": 2099.48, "end": 2104.84, + "text": " with me because my idle activity that I''m really nervous that I do nothing + especially on vacations", "tokens": [50624, 365, 385, 570, 452, 30650, 5191, 300, + 286, 478, 534, 6296, 300, 286, 360, 1825, 2318, 322, 2842, 763, 50892], "temperature": + 0.0, "avg_logprob": -0.11267517708443306, "compression_ratio": 1.749090909090909, + "no_speech_prob": 0.005634230561554432}, {"id": 405, "seek": 209428, "start": 2104.84, + "end": 2111.0800000000004, "text": " I start coding you know I just go and just + okay let''s just let''s just hypothesize about something", "tokens": [50892, 286, + 722, 17720, 291, 458, 286, 445, 352, 293, 445, 1392, 718, 311, 445, 718, 311, 445, + 14276, 1125, 466, 746, 51204], "temperature": 
0.0, "avg_logprob": -0.11267517708443306, + "compression_ratio": 1.749090909090909, "no_speech_prob": 0.005634230561554432}, + {"id": 406, "seek": 209428, "start": 2111.0800000000004, "end": 2116.52, "text": + " but let''s let''s dial back for for the into the architecture like when I look + at the architecture", "tokens": [51204, 457, 718, 311, 718, 311, 5502, 646, 337, + 337, 264, 666, 264, 9482, 411, 562, 286, 574, 412, 264, 9482, 51476], "temperature": + 0.0, "avg_logprob": -0.11267517708443306, "compression_ratio": 1.749090909090909, + "no_speech_prob": 0.005634230561554432}, {"id": 407, "seek": 209428, "start": 2116.52, + "end": 2123.96, "text": " page of turbo buffer it''s very simple it''s like client + connecting over you know TCP to a", "tokens": [51476, 3028, 295, 20902, 21762, 309, + 311, 588, 2199, 309, 311, 411, 6423, 11015, 670, 291, 458, 48965, 281, 257, 51848], + "temperature": 0.0, "avg_logprob": -0.11267517708443306, "compression_ratio": 1.749090909090909, + "no_speech_prob": 0.005634230561554432}, {"id": 408, "seek": 212428, "start": 2124.52, + "end": 2131.0800000000004, "text": " database instance and it has just two components + they are memory or SSD cache and the object storage", "tokens": [50376, 8149, 5197, + 293, 309, 575, 445, 732, 6677, 436, 366, 4675, 420, 30262, 19459, 293, 264, 2657, + 6725, 50704], "temperature": 0.0, "avg_logprob": -0.11734734025112419, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.0049712578766047955}, {"id": 409, "seek": + 212428, "start": 2131.8, "end": 2137.8, "text": " tell me a bit more so I think + our listeners and I mostly know what object storage is but tell me a", "tokens": + [50740, 980, 385, 257, 857, 544, 370, 286, 519, 527, 23274, 293, 286, 5240, 458, + 437, 2657, 6725, 307, 457, 980, 385, 257, 51040], "temperature": 0.0, "avg_logprob": + -0.11734734025112419, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0049712578766047955}, {"id": 410, "seek": 212428, "start": 
2137.8, "end": 2143.8, + "text": " bit more about that memory component like what algorithm design went into + that maybe trade-offs and", "tokens": [51040, 857, 544, 466, 300, 4675, 6542, 411, + 437, 9284, 1715, 1437, 666, 300, 1310, 4923, 12, 19231, 293, 51340], "temperature": + 0.0, "avg_logprob": -0.11734734025112419, "compression_ratio": 1.7692307692307692, + "no_speech_prob": 0.0049712578766047955}, {"id": 411, "seek": 212428, "start": 2144.36, + "end": 2149.7200000000003, "text": " you know how frequently you need to do the + round trips to the object storage versus when we", "tokens": [51368, 291, 458, 577, + 10374, 291, 643, 281, 360, 264, 3098, 16051, 281, 264, 2657, 6725, 5717, 562, 321, + 51636], "temperature": 0.0, "avg_logprob": -0.11734734025112419, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.0049712578766047955}, {"id": 412, "seek": + 214972, "start": 2149.72, "end": 2155.3199999999997, "text": " actually don''t do + that yeah I think it would be easiest to do this by speaking about the lifetime", + "tokens": [50364, 767, 500, 380, 360, 300, 1338, 286, 519, 309, 576, 312, 12889, + 281, 360, 341, 538, 4124, 466, 264, 11364, 50644], "temperature": 0.0, "avg_logprob": + -0.13326539491352282, "compression_ratio": 1.72, "no_speech_prob": 0.0014006359269842505}, + {"id": 413, "seek": 214972, "start": 2155.3199999999997, "end": 2163.56, "text": + " of a request as the cache worms so we we''ll actually start with the right path + and when when you do", "tokens": [50644, 295, 257, 5308, 382, 264, 19459, 28271, + 370, 321, 321, 603, 767, 722, 365, 264, 558, 3100, 293, 562, 562, 291, 360, 51056], + "temperature": 0.0, "avg_logprob": -0.13326539491352282, "compression_ratio": 1.72, + "no_speech_prob": 0.0014006359269842505}, {"id": 414, "seek": 214972, "start": 2163.56, + "end": 2169.3199999999997, "text": " a right into turbo buffer it''s as simple as + you can imagine it I mean at this point we''ve", "tokens": [51056, 257, 558, 666, + 
20902, 21762, 309, 311, 382, 2199, 382, 291, 393, 3811, 309, 286, 914, 412, 341, + 935, 321, 600, 51344], "temperature": 0.0, "avg_logprob": -0.13326539491352282, + "compression_ratio": 1.72, "no_speech_prob": 0.0014006359269842505}, {"id": 415, + "seek": 214972, "start": 2169.3199999999997, "end": 2175.08, "text": " optimized + parts of it that it''s not this simple but this is the the best way to explain it + when you", "tokens": [51344, 26941, 3166, 295, 309, 300, 309, 311, 406, 341, 2199, + 457, 341, 307, 264, 264, 1151, 636, 281, 2903, 309, 562, 291, 51632], "temperature": + 0.0, "avg_logprob": -0.13326539491352282, "compression_ratio": 1.72, "no_speech_prob": + 0.0014006359269842505}, {"id": 416, "seek": 217508, "start": 2175.56, "end": 2182.36, + "text": " when you do a right to turbo buffer that right basically goes into a file + in a directory called the", "tokens": [50388, 562, 291, 360, 257, 558, 281, 20902, + 21762, 300, 558, 1936, 1709, 666, 257, 3991, 294, 257, 21120, 1219, 264, 50728], + "temperature": 0.0, "avg_logprob": -0.10825415580503402, "compression_ratio": 1.8660287081339713, + "no_speech_prob": 0.0017396729672327638}, {"id": 417, "seek": 217508, "start": 2182.36, + "end": 2189.08, "text": " right ahead log so when you write to a namespace you can + imagine that on s3 it''s like slash namespace", "tokens": [50728, 558, 2286, 3565, + 370, 562, 291, 2464, 281, 257, 5288, 17940, 291, 393, 3811, 300, 322, 262, 18, 309, + 311, 411, 17330, 5288, 17940, 51064], "temperature": 0.0, "avg_logprob": -0.10825415580503402, + "compression_ratio": 1.8660287081339713, "no_speech_prob": 0.0017396729672327638}, + {"id": 418, "seek": 217508, "start": 2189.08, "end": 2194.12, "text": " slash you + know right ahead log the right ahead log is basically just a sequence of all the + rights", "tokens": [51064, 17330, 291, 458, 558, 2286, 3565, 264, 558, 2286, 3565, + 307, 1936, 445, 257, 8310, 295, 439, 264, 4601, 51316], "temperature": 0.0, "avg_logprob": + 
-0.10825415580503402, "compression_ratio": 1.8660287081339713, "no_speech_prob": + 0.0017396729672327638}, {"id": 419, "seek": 217508, "start": 2194.12, "end": 2199.72, + "text": " in order the raw rights so you do your right and it might be okay I''m + inserting a document", "tokens": [51316, 294, 1668, 264, 8936, 4601, 370, 291, 360, + 428, 558, 293, 309, 1062, 312, 1392, 286, 478, 46567, 257, 4166, 51596], "temperature": + 0.0, "avg_logprob": -0.10825415580503402, "compression_ratio": 1.8660287081339713, + "no_speech_prob": 0.0017396729672327638}, {"id": 420, "seek": 219972, "start": 2199.8799999999997, + "end": 2205.3999999999996, "text": " with text the metri and one with text Simon + and those are the two documents you can in the simplest", "tokens": [50372, 365, + 2487, 264, 1131, 470, 293, 472, 365, 2487, 13193, 293, 729, 366, 264, 732, 8512, + 291, 393, 294, 264, 22811, 50648], "temperature": 0.0, "avg_logprob": -0.12305951551957564, + "compression_ratio": 1.853846153846154, "no_speech_prob": 0.0010597704676911235}, + {"id": 421, "seek": 219972, "start": 2205.3999999999996, "end": 2211.3199999999997, + "text": " way you can imagine that this file is called zero job JSON and the next + one is called one dot JSON", "tokens": [50648, 636, 291, 393, 3811, 300, 341, 3991, + 307, 1219, 4018, 1691, 31828, 293, 264, 958, 472, 307, 1219, 472, 5893, 31828, 50944], + "temperature": 0.0, "avg_logprob": -0.12305951551957564, "compression_ratio": 1.853846153846154, + "no_speech_prob": 0.0010597704676911235}, {"id": 422, "seek": 219972, "start": 2211.3199999999997, + "end": 2216.3599999999997, "text": " three dot JSON that''s a database right that''s + just the right ahead log and if you want to satisfy a", "tokens": [50944, 1045, + 5893, 31828, 300, 311, 257, 8149, 558, 300, 311, 445, 264, 558, 2286, 3565, 293, + 498, 291, 528, 281, 19319, 257, 51196], "temperature": 0.0, "avg_logprob": -0.12305951551957564, + "compression_ratio": 1.853846153846154, "no_speech_prob": 
0.0010597704676911235}, + {"id": 423, "seek": 219972, "start": 2216.3599999999997, "end": 2222.6, "text": + " query you just scan through all the JSON documents and you satisfy the query that''s + actually", "tokens": [51196, 14581, 291, 445, 11049, 807, 439, 264, 31828, 8512, + 293, 291, 19319, 264, 14581, 300, 311, 767, 51508], "temperature": 0.0, "avg_logprob": + -0.12305951551957564, "compression_ratio": 1.853846153846154, "no_speech_prob": + 0.0010597704676911235}, {"id": 424, "seek": 219972, "start": 2222.6, "end": 2228.12, + "text": " respectable database and it''s not even that far from the first version + of turbo buffer but", "tokens": [51508, 44279, 8149, 293, 309, 311, 406, 754, 300, + 1400, 490, 264, 700, 3037, 295, 20902, 21762, 457, 51784], "temperature": 0.0, "avg_logprob": + -0.12305951551957564, "compression_ratio": 1.853846153846154, "no_speech_prob": + 0.0010597704676911235}, {"id": 425, "seek": 222812, "start": 2229.08, "end": 2235.08, + "text": " of course you have to index that data as well so basically as you can + imagine once many", "tokens": [50412, 295, 1164, 291, 362, 281, 8186, 300, 1412, + 382, 731, 370, 1936, 382, 291, 393, 3811, 1564, 867, 50712], "temperature": 0.0, + "avg_logprob": -0.13026620066443154, "compression_ratio": 1.86, "no_speech_prob": + 0.0007774747209623456}, {"id": 426, "seek": 222812, "start": 2235.08, "end": 2241.16, + "text": " megabytes of data come in asynchronously an indexing node will pick it + up and put it into the", "tokens": [50712, 10816, 24538, 295, 1412, 808, 294, 42642, + 5098, 364, 8186, 278, 9984, 486, 1888, 309, 493, 293, 829, 309, 666, 264, 51016], + "temperature": 0.0, "avg_logprob": -0.13026620066443154, "compression_ratio": 1.86, + "no_speech_prob": 0.0007774747209623456}, {"id": 427, "seek": 222812, "start": 2241.16, + "end": 2247.24, "text": " inverted index for full text search put it into an a and + an index for vector search and put it", "tokens": [51016, 38969, 8186, 337, 1577, + 2487, 
3164, 829, 309, 666, 364, 257, 293, 364, 8186, 337, 8062, 3164, 293, 829, + 309, 51320], "temperature": 0.0, "avg_logprob": -0.13026620066443154, "compression_ratio": + 1.86, "no_speech_prob": 0.0007774747209623456}, {"id": 428, "seek": 222812, "start": + 2247.24, "end": 2254.3599999999997, "text": " into a attribute or filtering index + for other attributes and there will be other indexing types", "tokens": [51320, + 666, 257, 19667, 420, 30822, 8186, 337, 661, 17212, 293, 456, 486, 312, 661, 8186, + 278, 3467, 51676], "temperature": 0.0, "avg_logprob": -0.13026620066443154, "compression_ratio": + 1.86, "no_speech_prob": 0.0007774747209623456}, {"id": 429, "seek": 225436, "start": + 2254.36, "end": 2259.7200000000003, "text": " in the future when when that happens + it will put it into slash namespace slash index and just", "tokens": [50364, 294, + 264, 2027, 562, 562, 300, 2314, 309, 486, 829, 309, 666, 17330, 5288, 17940, 17330, + 8186, 293, 445, 50632], "temperature": 0.0, "avg_logprob": -0.12587163255021377, + "compression_ratio": 1.9291338582677164, "no_speech_prob": 0.0021439732518047094}, + {"id": 430, "seek": 225436, "start": 2259.7200000000003, "end": 2265.2400000000002, + "text": " start putting files in there right and then the query layer can then consult + those files right instead", "tokens": [50632, 722, 3372, 7098, 294, 456, 558, 293, + 550, 264, 14581, 4583, 393, 550, 7189, 729, 7098, 558, 2602, 50908], "temperature": + 0.0, "avg_logprob": -0.12587163255021377, "compression_ratio": 1.9291338582677164, + "no_speech_prob": 0.0021439732518047094}, {"id": 431, "seek": 225436, "start": 2265.2400000000002, + "end": 2269.6400000000003, "text": " of scanning through every single document to + find a metri you can just plop in and look at the", "tokens": [50908, 295, 27019, + 807, 633, 2167, 4166, 281, 915, 257, 1131, 470, 291, 393, 445, 499, 404, 294, 293, + 574, 412, 264, 51128], "temperature": 0.0, "avg_logprob": -0.12587163255021377, + 
"compression_ratio": 1.9291338582677164, "no_speech_prob": 0.0021439732518047094}, + {"id": 432, "seek": 225436, "start": 2269.6400000000003, "end": 2275.7200000000003, + "text": " metri in the inverted index find the document and return it that''s how + right works when a right", "tokens": [51128, 1131, 470, 294, 264, 38969, 8186, 915, + 264, 4166, 293, 2736, 309, 300, 311, 577, 558, 1985, 562, 257, 558, 51432], "temperature": + 0.0, "avg_logprob": -0.12587163255021377, "compression_ratio": 1.9291338582677164, + "no_speech_prob": 0.0021439732518047094}, {"id": 433, "seek": 225436, "start": 2275.7200000000003, + "end": 2282.1200000000003, "text": " happens it will go through one of the query + nodes and the right will be also written into the into the", "tokens": [51432, 2314, + 309, 486, 352, 807, 472, 295, 264, 14581, 13891, 293, 264, 558, 486, 312, 611, 3720, + 666, 264, 666, 264, 51752], "temperature": 0.0, "avg_logprob": -0.12587163255021377, + "compression_ratio": 1.9291338582677164, "no_speech_prob": 0.0021439732518047094}, + {"id": 434, "seek": 228212, "start": 2282.2, "end": 2291.88, "text": " cache right + so both the memory cache and the disk cache and when so when you do a query you + will go", "tokens": [50368, 19459, 558, 370, 1293, 264, 4675, 19459, 293, 264, 12355, + 19459, 293, 562, 370, 562, 291, 360, 257, 14581, 291, 486, 352, 50852], "temperature": + 0.0, "avg_logprob": -0.15899950458157447, "compression_ratio": 1.976284584980237, + "no_speech_prob": 0.0017899831291288137}, {"id": 435, "seek": 228212, "start": 2291.88, + "end": 2295.56, "text": " to that same query node right there''s a consistent hashing + so if there''s three it''s sort of like the", "tokens": [50852, 281, 300, 912, 14581, + 9984, 558, 456, 311, 257, 8398, 575, 571, 370, 498, 456, 311, 1045, 309, 311, 1333, + 295, 411, 264, 51036], "temperature": 0.0, "avg_logprob": -0.15899950458157447, + "compression_ratio": 1.976284584980237, "no_speech_prob": 0.0017899831291288137}, + 
{"id": 436, "seek": 228212, "start": 2295.56, "end": 2300.3599999999997, "text": + " same namespace will end up on node one all the time if it hashes that I know when + you''ve satisfied", "tokens": [51036, 912, 5288, 17940, 486, 917, 493, 322, 9984, + 472, 439, 264, 565, 498, 309, 575, 8076, 300, 286, 458, 562, 291, 600, 11239, 51276], + "temperature": 0.0, "avg_logprob": -0.15899950458157447, "compression_ratio": 1.976284584980237, + "no_speech_prob": 0.0017899831291288137}, {"id": 437, "seek": 228212, "start": 2300.3599999999997, + "end": 2305.48, "text": " when you when you do a query it will first take the cat + check the caches if you just did the right well", "tokens": [51276, 562, 291, 562, + 291, 360, 257, 14581, 309, 486, 700, 747, 264, 3857, 1520, 264, 269, 13272, 498, + 291, 445, 630, 264, 558, 731, 51532], "temperature": 0.0, "avg_logprob": -0.15899950458157447, + "compression_ratio": 1.976284584980237, "no_speech_prob": 0.0017899831291288137}, + {"id": 438, "seek": 228212, "start": 2306.12, "end": 2309.72, "text": " it''s already + there because we just wrote all the rights into the cache to have this you know + the", "tokens": [51564, 309, 311, 1217, 456, 570, 321, 445, 4114, 439, 264, 4601, + 666, 264, 19459, 281, 362, 341, 291, 458, 264, 51744], "temperature": 0.0, "avg_logprob": + -0.15899950458157447, "compression_ratio": 1.976284584980237, "no_speech_prob": + 0.0017899831291288137}, {"id": 439, "seek": 230972, "start": 2309.72, "end": 2316.4399999999996, + "text": " right through cache and and we will satisfy the query mainly from the + cache if for whatever reason", "tokens": [50364, 558, 807, 19459, 293, 293, 321, + 486, 19319, 264, 14581, 8704, 490, 264, 19459, 498, 337, 2035, 1778, 50700], "temperature": + 0.0, "avg_logprob": -0.15967163153454267, "compression_ratio": 1.8671875, "no_speech_prob": + 0.0002784858806990087}, {"id": 440, "seek": 230972, "start": 2316.4399999999996, + "end": 2320.68, "text": " this namespace is not maybe you did 
the right a month + ago and so it''s following that", "tokens": [50700, 341, 5288, 17940, 307, 406, + 1310, 291, 630, 264, 558, 257, 1618, 2057, 293, 370, 309, 311, 3480, 300, 50912], + "temperature": 0.0, "avg_logprob": -0.15967163153454267, "compression_ratio": 1.8671875, + "no_speech_prob": 0.0002784858806990087}, {"id": 441, "seek": 230972, "start": 2320.68, + "end": 2324.8399999999997, "text": " a cache and you do the read well then we''ll + read through cache by going directly to opix storage", "tokens": [50912, 257, 19459, + 293, 291, 360, 264, 1401, 731, 550, 321, 603, 1401, 807, 19459, 538, 516, 3838, + 281, 999, 970, 6725, 51120], "temperature": 0.0, "avg_logprob": -0.15967163153454267, + "compression_ratio": 1.8671875, "no_speech_prob": 0.0002784858806990087}, {"id": + 442, "seek": 230972, "start": 2324.8399999999997, "end": 2329.08, "text": " with + its few round trips as possible to get the data to satisfy the query both from the + index and", "tokens": [51120, 365, 1080, 1326, 3098, 16051, 382, 1944, 281, 483, + 264, 1412, 281, 19319, 264, 14581, 1293, 490, 264, 8186, 293, 51332], "temperature": + 0.0, "avg_logprob": -0.15967163153454267, "compression_ratio": 1.8671875, "no_speech_prob": + 0.0002784858806990087}, {"id": 443, "seek": 230972, "start": 2329.08, "end": 2335.16, + "text": " from the wall will do range reads directly on s3 right the old like hcp + range header to get exactly", "tokens": [51332, 490, 264, 2929, 486, 360, 3613, + 15700, 3838, 322, 262, 18, 558, 264, 1331, 411, 276, 66, 79, 3613, 23117, 281, 483, + 2293, 51636], "temperature": 0.0, "avg_logprob": -0.15967163153454267, "compression_ratio": + 1.8671875, "no_speech_prob": 0.0002784858806990087}, {"id": 444, "seek": 233516, + "start": 2335.16, "end": 2341.3999999999996, "text": " the bytes we need to satisfy + the query and then start hydrating the cache on the on the query node", "tokens": + [50364, 264, 36088, 321, 643, 281, 19319, 264, 14581, 293, 550, 722, 5796, 8754, + 264, 
19459, 322, 264, 322, 264, 14581, 9984, 50676], "temperature": 0.0, "avg_logprob": + -0.13676319803510392, "compression_ratio": 1.9019607843137254, "no_speech_prob": + 0.0013332271482795477}, {"id": 445, "seek": 233516, "start": 2341.3999999999996, + "end": 2345.96, "text": " so the subsequent queries get faster and faster and we + can do that a gigabyte per second we can", "tokens": [50676, 370, 264, 19962, 24109, + 483, 4663, 293, 4663, 293, 321, 393, 360, 300, 257, 8741, 34529, 680, 1150, 321, + 393, 50904], "temperature": 0.0, "avg_logprob": -0.13676319803510392, "compression_ratio": + 1.9019607843137254, "no_speech_prob": 0.0013332271482795477}, {"id": 446, "seek": + 233516, "start": 2345.96, "end": 2352.3599999999997, "text": " hydrate the cache + even for very very very large namespaces so that''s the general architecture of", + "tokens": [50904, 5796, 4404, 264, 19459, 754, 337, 588, 588, 588, 2416, 5288, 79, + 2116, 370, 300, 311, 264, 2674, 9482, 295, 51224], "temperature": 0.0, "avg_logprob": + -0.13676319803510392, "compression_ratio": 1.9019607843137254, "no_speech_prob": + 0.0013332271482795477}, {"id": 447, "seek": 233516, "start": 2352.3599999999997, + "end": 2356.8399999999997, "text": " turbo puffer on a completely cold query it + takes hundreds of milliseconds and on a warm query it", "tokens": [51224, 20902, + 19613, 260, 322, 257, 2584, 3554, 14581, 309, 2516, 6779, 295, 34184, 293, 322, + 257, 4561, 14581, 309, 51448], "temperature": 0.0, "avg_logprob": -0.13676319803510392, + "compression_ratio": 1.9019607843137254, "no_speech_prob": 0.0013332271482795477}, + {"id": 448, "seek": 233516, "start": 2356.8399999999997, "end": 2362.92, "text": + " can take as little as 10 milliseconds to to satisfy the query the the last detail + I''ll point out", "tokens": [51448, 393, 747, 382, 707, 382, 1266, 34184, 281, 281, + 19319, 264, 14581, 264, 264, 1036, 2607, 286, 603, 935, 484, 51752], "temperature": + 0.0, "avg_logprob": -0.13676319803510392, 
"compression_ratio": 1.9019607843137254, + "no_speech_prob": 0.0013332271482795477}, {"id": 449, "seek": 236292, "start": 2363.56, + "end": 2368.6800000000003, "text": " and then we can go into a particular aspect + of this is that turbo puffer has chosen to do consistent", "tokens": [50396, 293, + 550, 321, 393, 352, 666, 257, 1729, 4171, 295, 341, 307, 300, 20902, 19613, 260, + 575, 8614, 281, 360, 8398, 50652], "temperature": 0.0, "avg_logprob": -0.13058957513773217, + "compression_ratio": 1.7343173431734318, "no_speech_prob": 0.005654902197420597}, + {"id": 450, "seek": 236292, "start": 2368.6800000000003, "end": 2373.7200000000003, + "text": " reads by default this is an unusual choice for search engines we''ve seen + doesn''t do this", "tokens": [50652, 15700, 538, 7576, 341, 307, 364, 10901, 3922, + 337, 3164, 12982, 321, 600, 1612, 1177, 380, 360, 341, 50904], "temperature": 0.0, + "avg_logprob": -0.13058957513773217, "compression_ratio": 1.7343173431734318, "no_speech_prob": + 0.005654902197420597}, {"id": 451, "seek": 236292, "start": 2373.7200000000003, + "end": 2377.16, "text": " unless you turn it on explicitly I think they''ve done + more work now for real time indexing", "tokens": [50904, 5969, 291, 1261, 309, 322, + 20803, 286, 519, 436, 600, 1096, 544, 589, 586, 337, 957, 565, 8186, 278, 51076], + "temperature": 0.0, "avg_logprob": -0.13058957513773217, "compression_ratio": 1.7343173431734318, + "no_speech_prob": 0.005654902197420597}, {"id": 452, "seek": 236292, "start": 2377.96, + "end": 2382.12, "text": " which to me is the gold standard which is why I keep referring + back to it''s phenomenal piece of", "tokens": [51116, 597, 281, 385, 307, 264, 3821, + 3832, 597, 307, 983, 286, 1066, 13761, 646, 281, 309, 311, 17778, 2522, 295, 51324], + "temperature": 0.0, "avg_logprob": -0.13058957513773217, "compression_ratio": 1.7343173431734318, + "no_speech_prob": 0.005654902197420597}, {"id": 453, "seek": 236292, "start": 2382.12, + "end": 2387.8, 
"text": " software and turbo power requests consistent reads by default + meaning that if you do it right", "tokens": [51324, 4722, 293, 20902, 1347, 12475, + 8398, 15700, 538, 7576, 3620, 300, 498, 291, 360, 309, 558, 51608], "temperature": + 0.0, "avg_logprob": -0.13058957513773217, "compression_ratio": 1.7343173431734318, + "no_speech_prob": 0.005654902197420597}, {"id": 454, "seek": 238780, "start": 2387.8, + "end": 2393.96, "text": " and then you read immediately afterwards that right will + be visible and in order to satisfy that", "tokens": [50364, 293, 550, 291, 1401, + 4258, 10543, 300, 558, 486, 312, 8974, 293, 294, 1668, 281, 19319, 300, 50672], + "temperature": 0.0, "avg_logprob": -0.12860016744644914, "compression_ratio": 1.830827067669173, + "no_speech_prob": 0.001849163556471467}, {"id": 455, "seek": 238780, "start": 2393.96, + "end": 2398.92, "text": " we can''t just rely on the cache on that node being out + of date that node could have died it could have", "tokens": [50672, 321, 393, 380, + 445, 10687, 322, 264, 19459, 322, 300, 9984, 885, 484, 295, 4002, 300, 9984, 727, + 362, 4539, 309, 727, 362, 50920], "temperature": 0.0, "avg_logprob": -0.12860016744644914, + "compression_ratio": 1.830827067669173, "no_speech_prob": 0.001849163556471467}, + {"id": 456, "seek": 238780, "start": 2398.92, "end": 2405.88, "text": " you know + the hashed ring could have moved because we scaled up so every single um query we + go to", "tokens": [50920, 291, 458, 264, 22019, 292, 4875, 727, 362, 4259, 570, + 321, 36039, 493, 370, 633, 2167, 1105, 14581, 321, 352, 281, 51268], "temperature": + 0.0, "avg_logprob": -0.12860016744644914, "compression_ratio": 1.830827067669173, + "no_speech_prob": 0.001849163556471467}, {"id": 457, "seek": 238780, "start": 2405.88, + "end": 2411.7200000000003, "text": " op storage and see what is the latest entry + in the wall and do we have that entry right is it at", "tokens": [51268, 999, 6725, + 293, 536, 437, 307, 264, 6792, 8729, 
294, 264, 2929, 293, 360, 321, 362, 300, 8729, + 558, 307, 309, 412, 51560], "temperature": 0.0, "avg_logprob": -0.12860016744644914, + "compression_ratio": 1.830827067669173, "no_speech_prob": 0.001849163556471467}, + {"id": 458, "seek": 238780, "start": 2411.7200000000003, "end": 2416.52, "text": + " 3.json or is that 5.json and do I have that so we have a little pointer file that + we can look", "tokens": [51560, 805, 13, 73, 3015, 420, 307, 300, 1025, 13, 73, + 3015, 293, 360, 286, 362, 300, 370, 321, 362, 257, 707, 23918, 3991, 300, 321, 393, + 574, 51800], "temperature": 0.0, "avg_logprob": -0.12860016744644914, "compression_ratio": + 1.830827067669173, "no_speech_prob": 0.001849163556471467}, {"id": 459, "seek": + 241652, "start": 2416.52, "end": 2422.6, "text": " and we can download and look + at right and that round trip is basically our p50 like our spans", "tokens": [50364, + 293, 321, 393, 5484, 293, 574, 412, 558, 293, 300, 3098, 4931, 307, 1936, 527, 280, + 2803, 411, 527, 44086, 50668], "temperature": 0.0, "avg_logprob": -0.18321394264151197, + "compression_ratio": 1.7985074626865671, "no_speech_prob": 0.004216221161186695}, + {"id": 460, "seek": 241652, "start": 2422.6, "end": 2427.96, "text": " are basically + you know often like one to two milliseconds of actual search and then on gcs about", + "tokens": [50668, 366, 1936, 291, 458, 2049, 411, 472, 281, 732, 34184, 295, 3539, + 3164, 293, 550, 322, 290, 14368, 466, 50936], "temperature": 0.0, "avg_logprob": + -0.18321394264151197, "compression_ratio": 1.7985074626865671, "no_speech_prob": + 0.004216221161186695}, {"id": 461, "seek": 241652, "start": 2427.96, "end": 2435.72, + "text": " depending on the region 12 to 16 milliseconds waiting for that consistency + check on s3 the small", "tokens": [50936, 5413, 322, 264, 4458, 2272, 281, 3165, + 34184, 3806, 337, 300, 14416, 1520, 322, 262, 18, 264, 1359, 51324], "temperature": + 0.0, "avg_logprob": -0.18321394264151197, "compression_ratio": 
1.7985074626865671, + "no_speech_prob": 0.004216221161186695}, {"id": 462, "seek": 241652, "start": 2435.72, + "end": 2440.28, "text": " obnoxiously it''s a little bit better so it''s eight milliseconds + but you can turn this off and", "tokens": [51324, 1111, 29129, 8994, 309, 311, 257, + 707, 857, 1101, 370, 309, 311, 3180, 34184, 457, 291, 393, 1261, 341, 766, 293, + 51552], "temperature": 0.0, "avg_logprob": -0.18321394264151197, "compression_ratio": + 1.7985074626865671, "no_speech_prob": 0.004216221161186695}, {"id": 463, "seek": + 241652, "start": 2440.28, "end": 2445.4, "text": " you will still get up to you + you can get eventual consistency that''s very normal for these databases", "tokens": + [51552, 291, 486, 920, 483, 493, 281, 291, 291, 393, 483, 33160, 14416, 300, 311, + 588, 2710, 337, 613, 22380, 51808], "temperature": 0.0, "avg_logprob": -0.18321394264151197, + "compression_ratio": 1.7985074626865671, "no_speech_prob": 0.004216221161186695}, + {"id": 464, "seek": 244540, "start": 2445.56, "end": 2449.7200000000003, "text": + " like could be up to one minute out of date and then you can see often less than + a millisecond", "tokens": [50372, 411, 727, 312, 493, 281, 472, 3456, 484, 295, + 4002, 293, 550, 291, 393, 536, 2049, 1570, 813, 257, 27940, 18882, 50580], "temperature": + 0.0, "avg_logprob": -0.16819244750002596, "compression_ratio": 1.7788018433179724, + "no_speech_prob": 0.001846386818215251}, {"id": 465, "seek": 244540, "start": 2449.7200000000003, + "end": 2454.6800000000003, "text": " or a millisecond latency to a turbo buffer + by turning off that check but we find that this is a very", "tokens": [50580, 420, + 257, 27940, 18882, 27043, 281, 257, 20902, 21762, 538, 6246, 766, 300, 1520, 457, + 321, 915, 300, 341, 307, 257, 588, 50828], "temperature": 0.0, "avg_logprob": -0.16819244750002596, + "compression_ratio": 1.7788018433179724, "no_speech_prob": 0.001846386818215251}, + {"id": 466, "seek": 244540, "start": 2454.6800000000003, 
"end": 2460.6, "text": + " safe default and I think that database should ship with very safe and unsurprising + defaults", "tokens": [50828, 3273, 7576, 293, 286, 519, 300, 8149, 820, 5374, 365, + 588, 3273, 293, 2693, 374, 26203, 7576, 82, 51124], "temperature": 0.0, "avg_logprob": + -0.16819244750002596, "compression_ratio": 1.7788018433179724, "no_speech_prob": + 0.001846386818215251}, {"id": 467, "seek": 244540, "start": 2461.08, "end": 2469.56, + "text": " yeah for sure for sure um so in that cache but you also have the you also + have the let''s focus only", "tokens": [51148, 1338, 337, 988, 337, 988, 1105, 370, + 294, 300, 19459, 457, 291, 611, 362, 264, 291, 611, 362, 264, 718, 311, 1879, 787, + 51572], "temperature": 0.0, "avg_logprob": -0.16819244750002596, "compression_ratio": + 1.7788018433179724, "no_speech_prob": 0.001846386818215251}, {"id": 468, "seek": + 246956, "start": 2469.72, "end": 2476.04, "text": " back to search part for now + you also have the a and n index is that also stored on s3 and then", "tokens": [50372, + 646, 281, 3164, 644, 337, 586, 291, 611, 362, 264, 257, 293, 297, 8186, 307, 300, + 611, 12187, 322, 262, 18, 293, 550, 50688], "temperature": 0.0, "avg_logprob": -0.16967498554902918, + "compression_ratio": 1.746606334841629, "no_speech_prob": 0.00039154410478658974}, + {"id": 469, "seek": 246956, "start": 2476.6, "end": 2481.88, "text": " is it do + you also keep kind of like a replica of it in memory to to quick access and how + do you", "tokens": [50716, 307, 309, 360, 291, 611, 1066, 733, 295, 411, 257, 35456, + 295, 309, 294, 4675, 281, 281, 1702, 2105, 293, 577, 360, 291, 50980], "temperature": + 0.0, "avg_logprob": -0.16967498554902918, "compression_ratio": 1.746606334841629, + "no_speech_prob": 0.00039154410478658974}, {"id": 470, "seek": 246956, "start": + 2481.88, "end": 2489.16, "text": " sort of it''s true how do you sort of synchronize + the two both the right-ahead log and the index are", "tokens": [50980, 1333, 
295, + 309, 311, 2074, 577, 360, 291, 1333, 295, 19331, 1125, 264, 732, 1293, 264, 558, + 12, 545, 2056, 3565, 293, 264, 8186, 366, 51344], "temperature": 0.0, "avg_logprob": + -0.16967498554902918, "compression_ratio": 1.746606334841629, "no_speech_prob": + 0.00039154410478658974}, {"id": 471, "seek": 246956, "start": 2490.36, "end": 2497.32, + "text": " everything is stored on s3 if you killed all of the compute nodes of turbo + buffer in all of our", "tokens": [51404, 1203, 307, 12187, 322, 262, 18, 498, 291, + 4652, 439, 295, 264, 14722, 13891, 295, 20902, 21762, 294, 439, 295, 527, 51752], + "temperature": 0.0, "avg_logprob": -0.16967498554902918, "compression_ratio": 1.746606334841629, + "no_speech_prob": 0.00039154410478658974}, {"id": 472, "seek": 249732, "start": + 2497.32, "end": 2502.92, "text": " clusters we would not lose any data there is + no data on the compute notes that matter it''s", "tokens": [50364, 23313, 321, 576, + 406, 3624, 604, 1412, 456, 307, 572, 1412, 322, 264, 14722, 5570, 300, 1871, 309, + 311, 50644], "temperature": 0.0, "avg_logprob": -0.08279361222919665, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.0046958220191299915}, {"id": 473, "seek": + 249732, "start": 2502.92, "end": 2507.4, "text": " only transient caching but we + cache everything yeah if you''re accessing the index will cache the", "tokens": + [50644, 787, 41998, 269, 2834, 457, 321, 19459, 1203, 1338, 498, 291, 434, 26440, + 264, 8186, 486, 19459, 264, 50868], "temperature": 0.0, "avg_logprob": -0.08279361222919665, + "compression_ratio": 1.8202247191011236, "no_speech_prob": 0.0046958220191299915}, + {"id": 474, "seek": 249732, "start": 2507.4, "end": 2512.36, "text": " index if + you''re just accessing the right-ahead log files because it''s so small or there''s + parts of", "tokens": [50868, 8186, 498, 291, 434, 445, 26440, 264, 558, 12, 545, + 2056, 3565, 7098, 570, 309, 311, 370, 1359, 420, 456, 311, 3166, 295, 51116], "temperature": + 0.0, 
"avg_logprob": -0.08279361222919665, "compression_ratio": 1.8202247191011236, + "no_speech_prob": 0.0046958220191299915}, {"id": 475, "seek": 249732, "start": 2512.36, + "end": 2516.76, "text": " the data that hasn''t been indexed then and that''s also + on s3 and goes into the same cache with", "tokens": [51116, 264, 1412, 300, 6132, + 380, 668, 8186, 292, 550, 293, 300, 311, 611, 322, 262, 18, 293, 1709, 666, 264, + 912, 19459, 365, 51336], "temperature": 0.0, "avg_logprob": -0.08279361222919665, + "compression_ratio": 1.8202247191011236, "no_speech_prob": 0.0046958220191299915}, + {"id": 476, "seek": 249732, "start": 2516.76, "end": 2521.7200000000003, "text": + " everything else right prioritized by the workloads to try to get the best performance + possible yeah it''s", "tokens": [51336, 1203, 1646, 558, 14846, 1602, 538, 264, + 32452, 281, 853, 281, 483, 264, 1151, 3389, 1944, 1338, 309, 311, 51584], "temperature": + 0.0, "avg_logprob": -0.08279361222919665, "compression_ratio": 1.8202247191011236, + "no_speech_prob": 0.0046958220191299915}, {"id": 477, "seek": 252172, "start": 2521.72, + "end": 2529.08, "text": " quite smart so effectively you like I remember like at + some previous companies when I was running", "tokens": [50364, 1596, 4069, 370, + 8659, 291, 411, 286, 1604, 411, 412, 512, 3894, 3431, 562, 286, 390, 2614, 50732], + "temperature": 0.0, "avg_logprob": -0.11721773897663931, "compression_ratio": 1.636734693877551, + "no_speech_prob": 0.002499124500900507}, {"id": 478, "seek": 252172, "start": 2529.08, + "end": 2534.52, "text": " Apache Solar one of the problems was always that all of + these charts are super cold because they''re", "tokens": [50732, 46597, 22385, 472, + 295, 264, 2740, 390, 1009, 300, 439, 295, 613, 17767, 366, 1687, 3554, 570, 436, + 434, 51004], "temperature": 0.0, "avg_logprob": -0.11721773897663931, "compression_ratio": + 1.636734693877551, "no_speech_prob": 0.002499124500900507}, {"id": 479, "seek": + 252172, "start": 
2534.52, "end": 2541.72, "text": " never used right we still pay + for them but then when the query hits you incur so much latency that''s", "tokens": + [51004, 1128, 1143, 558, 321, 920, 1689, 337, 552, 457, 550, 562, 264, 14581, 8664, + 291, 35774, 370, 709, 27043, 300, 311, 51364], "temperature": 0.0, "avg_logprob": + -0.11721773897663931, "compression_ratio": 1.636734693877551, "no_speech_prob": + 0.002499124500900507}, {"id": 480, "seek": 252172, "start": 2541.72, "end": 2549.48, + "text": " super painful and so I was always coming up with these ideas what if I + run some you know post indexing", "tokens": [51364, 1687, 11697, 293, 370, 286, + 390, 1009, 1348, 493, 365, 613, 3487, 437, 498, 286, 1190, 512, 291, 458, 2183, + 8186, 278, 51752], "temperature": 0.0, "avg_logprob": -0.11721773897663931, "compression_ratio": + 1.636734693877551, "no_speech_prob": 0.002499124500900507}, {"id": 481, "seek": + 254948, "start": 2549.48, "end": 2553.96, "text": " warm-up script that will go + and shoot a bunch of queries to all of the charts just to keep them", "tokens": + [50364, 4561, 12, 1010, 5755, 300, 486, 352, 293, 3076, 257, 3840, 295, 24109, 281, + 439, 295, 264, 17767, 445, 281, 1066, 552, 50588], "temperature": 0.0, "avg_logprob": + -0.11731377468314222, "compression_ratio": 1.6, "no_speech_prob": 0.0033323941752314568}, + {"id": 482, "seek": 254948, "start": 2554.6, "end": 2562.68, "text": " you know + up and running and and warm or just cat all the indices on Linux into memory we''ve", + "tokens": [50620, 291, 458, 493, 293, 2614, 293, 293, 4561, 420, 445, 3857, 439, + 264, 43840, 322, 18734, 666, 4675, 321, 600, 51024], "temperature": 0.0, "avg_logprob": + -0.11731377468314222, "compression_ratio": 1.6, "no_speech_prob": 0.0033323941752314568}, + {"id": 483, "seek": 254948, "start": 2562.68, "end": 2569.0, "text": " done that + too that was like 10 years ago or so that was very strange feeling like why do I + need to", "tokens": [51024, 1096, 300, 886, 300, 
390, 411, 1266, 924, 2057, 420, + 370, 300, 390, 588, 5861, 2633, 411, 983, 360, 286, 643, 281, 51340], "temperature": + 0.0, "avg_logprob": -0.11731377468314222, "compression_ratio": 1.6, "no_speech_prob": + 0.0033323941752314568}, {"id": 484, "seek": 254948, "start": 2569.0, "end": 2576.04, + "text": " mess with that level of detail it never actually paid off I think what + pays off is a most", "tokens": [51340, 2082, 365, 300, 1496, 295, 2607, 309, 1128, + 767, 4835, 766, 286, 519, 437, 10604, 766, 307, 257, 881, 51692], "temperature": + 0.0, "avg_logprob": -0.11731377468314222, "compression_ratio": 1.6, "no_speech_prob": + 0.0033323941752314568}, {"id": 485, "seek": 257604, "start": 2576.44, "end": 2581.96, + "text": " smart way to organize your index and how you read data backwards like + essentially when you", "tokens": [50384, 4069, 636, 281, 13859, 428, 8186, 293, + 577, 291, 1401, 1412, 12204, 411, 4476, 562, 291, 50660], "temperature": 0.0, "avg_logprob": + -0.12161791324615479, "compression_ratio": 1.6798245614035088, "no_speech_prob": + 0.0021350958850234747}, {"id": 486, "seek": 257604, "start": 2581.96, "end": 2589.24, + "text": " users really only need fresh data first like on Twitter for example everyone + is really after the", "tokens": [50660, 5022, 534, 787, 643, 4451, 1412, 700, 411, + 322, 5794, 337, 1365, 1518, 307, 534, 934, 264, 51024], "temperature": 0.0, "avg_logprob": + -0.12161791324615479, "compression_ratio": 1.6798245614035088, "no_speech_prob": + 0.0021350958850234747}, {"id": 487, "seek": 257604, "start": 2589.24, "end": 2596.04, + "text": " recent tweets and not some archive and that was very similar case for + us but it''s very interesting", "tokens": [51024, 5162, 25671, 293, 406, 512, 23507, + 293, 300, 390, 588, 2531, 1389, 337, 505, 457, 309, 311, 588, 1880, 51364], "temperature": + 0.0, "avg_logprob": -0.12161791324615479, "compression_ratio": 1.6798245614035088, + "no_speech_prob": 0.0021350958850234747}, {"id": 488, "seek": 
257604, "start": 2596.04, + "end": 2603.8, "text": " like you go into so much detail there to to make the database + effectively like a living organism", "tokens": [51364, 411, 291, 352, 666, 370, + 709, 2607, 456, 281, 281, 652, 264, 8149, 8659, 411, 257, 2647, 24128, 51752], "temperature": + 0.0, "avg_logprob": -0.12161791324615479, "compression_ratio": 1.6798245614035088, + "no_speech_prob": 0.0021350958850234747}, {"id": 489, "seek": 260380, "start": 2604.28, + "end": 2610.44, "text": " adjusting to the usage but you also you also have multi-tenancy + right so meaning that the same", "tokens": [50388, 23559, 281, 264, 14924, 457, + 291, 611, 291, 611, 362, 4825, 12, 1147, 6717, 558, 370, 3620, 300, 264, 912, 50696], + "temperature": 0.0, "avg_logprob": -0.1595682236085455, "compression_ratio": 1.7464788732394365, + "no_speech_prob": 0.004518670961260796}, {"id": 490, "seek": 260380, "start": 2611.2400000000002, + "end": 2619.48, "text": " the same turbo buffer deployed across the data centers + is going to be used by multiple companies", "tokens": [50736, 264, 912, 20902, 21762, + 17826, 2108, 264, 1412, 10898, 307, 516, 281, 312, 1143, 538, 3866, 3431, 51148], + "temperature": 0.0, "avg_logprob": -0.1595682236085455, "compression_ratio": 1.7464788732394365, + "no_speech_prob": 0.004518670961260796}, {"id": 491, "seek": 260380, "start": 2619.48, + "end": 2625.0, "text": " at the same time unless they demand an isolation how do + you think about that when they", "tokens": [51148, 412, 264, 912, 565, 5969, 436, + 4733, 364, 16001, 577, 360, 291, 519, 466, 300, 562, 436, 51424], "temperature": + 0.0, "avg_logprob": -0.1595682236085455, "compression_ratio": 1.7464788732394365, + "no_speech_prob": 0.004518670961260796}, {"id": 492, "seek": 260380, "start": 2625.6400000000003, + "end": 2631.96, "text": " use the same effectively in the same instance compute + and index I''d love to go into the solar", "tokens": [51456, 764, 264, 912, 8659, + 294, 264, 912, 5197, 
14722, 293, 8186, 286, 1116, 959, 281, 352, 666, 264, 7936, + 51772], "temperature": 0.0, "avg_logprob": -0.1595682236085455, "compression_ratio": + 1.7464788732394365, "no_speech_prob": 0.004518670961260796}, {"id": 493, "seek": + 263196, "start": 2631.96, "end": 2638.12, "text": " example for just one second + before we go into multi-tenancy how slow were those queries because", "tokens": + [50364, 1365, 337, 445, 472, 1150, 949, 321, 352, 666, 4825, 12, 1147, 6717, 577, + 2964, 645, 729, 24109, 570, 50672], "temperature": 0.0, "avg_logprob": -0.13707547268625034, + "compression_ratio": 1.749090909090909, "no_speech_prob": 0.003875292604789138}, + {"id": 494, "seek": 263196, "start": 2638.12, "end": 2643.48, "text": " when you + say it cold you mean that it''s not in memory when I say cold I mean that it''s + on S3", "tokens": [50672, 562, 291, 584, 309, 3554, 291, 914, 300, 309, 311, 406, + 294, 4675, 562, 286, 584, 3554, 286, 914, 300, 309, 311, 322, 318, 18, 50940], "temperature": + 0.0, "avg_logprob": -0.13707547268625034, "compression_ratio": 1.749090909090909, + "no_speech_prob": 0.003875292604789138}, {"id": 495, "seek": 263196, "start": 2643.48, + "end": 2647.56, "text": " what kind of latency were you seeing that you had to do + this work on it was very slow first of all", "tokens": [50940, 437, 733, 295, 27043, + 645, 291, 2577, 300, 291, 632, 281, 360, 341, 589, 322, 309, 390, 588, 2964, 700, + 295, 439, 51144], "temperature": 0.0, "avg_logprob": -0.13707547268625034, "compression_ratio": + 1.749090909090909, "no_speech_prob": 0.003875292604789138}, {"id": 496, "seek": + 263196, "start": 2648.68, "end": 2655.48, "text": " the it has to do also with the + domain specificity you know the the queries were Boolean and very long", "tokens": + [51200, 264, 309, 575, 281, 360, 611, 365, 264, 9274, 2685, 507, 291, 458, 264, + 264, 24109, 645, 23351, 28499, 293, 588, 938, 51540], "temperature": 0.0, "avg_logprob": + -0.13707547268625034, "compression_ratio": 
1.749090909090909, "no_speech_prob": + 0.003875292604789138}, {"id": 497, "seek": 263196, "start": 2655.48, "end": 2661.2400000000002, + "text": " and so they they would take sometimes just a query itself would take a + minute to execute on", "tokens": [51540, 293, 370, 436, 436, 576, 747, 2171, 445, + 257, 14581, 2564, 576, 747, 257, 3456, 281, 14483, 322, 51828], "temperature": 0.0, + "avg_logprob": -0.13707547268625034, "compression_ratio": 1.749090909090909, "no_speech_prob": + 0.003875292604789138}, {"id": 498, "seek": 266124, "start": 2661.24, "end": 2668.4399999999996, + "text": " now like a regional index design and that was like just super crazy right + but it was also very", "tokens": [50364, 586, 411, 257, 10964, 8186, 1715, 293, + 300, 390, 411, 445, 1687, 3219, 558, 457, 309, 390, 611, 588, 50724], "temperature": + 0.0, "avg_logprob": -0.09166621600880343, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.0006432479713112116}, {"id": 499, "seek": 266124, "start": 2668.4399999999996, + "end": 2674.4399999999996, "text": " accurate because it was like sentence level + search and then I had to design a new system new", "tokens": [50724, 8559, 570, + 309, 390, 411, 8174, 1496, 3164, 293, 550, 286, 632, 281, 1715, 257, 777, 1185, + 777, 51024], "temperature": 0.0, "avg_logprob": -0.09166621600880343, "compression_ratio": + 1.7625570776255708, "no_speech_prob": 0.0006432479713112116}, {"id": 500, "seek": + 266124, "start": 2674.4399999999996, "end": 2682.04, "text": " architecture where + we could retain the accuracy of that engine but not have to spend so much money", + "tokens": [51024, 9482, 689, 321, 727, 18340, 264, 14170, 295, 300, 2848, 457, 406, + 362, 281, 3496, 370, 709, 1460, 51404], "temperature": 0.0, "avg_logprob": -0.09166621600880343, + "compression_ratio": 1.7625570776255708, "no_speech_prob": 0.0006432479713112116}, + {"id": 501, "seek": 266124, "start": 2682.04, "end": 2688.9199999999996, "text": + " on on on indexing 
individual sentences so we indexed one complete document right + so I had to change", "tokens": [51404, 322, 322, 322, 8186, 278, 2609, 16579, 370, + 321, 8186, 292, 472, 3566, 4166, 558, 370, 286, 632, 281, 1319, 51748], "temperature": + 0.0, "avg_logprob": -0.09166621600880343, "compression_ratio": 1.7625570776255708, + "no_speech_prob": 0.0006432479713112116}, {"id": 502, "seek": 268892, "start": 2688.92, + "end": 2695.8, "text": " the algorithms slightly and so it went to sub-second it + was still I think it''s still slow right but", "tokens": [50364, 264, 14642, 4748, + 293, 370, 309, 1437, 281, 1422, 12, 27375, 309, 390, 920, 286, 519, 309, 311, 920, + 2964, 558, 457, 50708], "temperature": 0.0, "avg_logprob": -0.17029610500540784, + "compression_ratio": 1.7094017094017093, "no_speech_prob": 0.0012950095115229487}, + {"id": 503, "seek": 268892, "start": 2695.8, "end": 2702.12, "text": " it was much + faster and and users started like like we could scale the company effectively after + that", "tokens": [50708, 309, 390, 709, 4663, 293, 293, 5022, 1409, 411, 411, 321, + 727, 4373, 264, 2237, 8659, 934, 300, 51024], "temperature": 0.0, "avg_logprob": + -0.17029610500540784, "compression_ratio": 1.7094017094017093, "no_speech_prob": + 0.0012950095115229487}, {"id": 504, "seek": 268892, "start": 2702.12, "end": 2710.12, + "text": " right with one minute and 75% of infrastructure costs were like you know + shaving off so but that''s", "tokens": [51024, 558, 365, 472, 3456, 293, 9562, 4, + 295, 6896, 5497, 645, 411, 291, 458, 36481, 766, 370, 457, 300, 311, 51424], "temperature": + 0.0, "avg_logprob": -0.17029610500540784, "compression_ratio": 1.7094017094017093, + "no_speech_prob": 0.0012950095115229487}, {"id": 505, "seek": 268892, "start": 2710.92, + "end": 2717.16, "text": " that''s that was part of the Lucine you know munging with + the algorithm and changing how it scans the", "tokens": [51464, 300, 311, 300, 390, + 644, 295, 264, 9593, 533, 291, 458, 275, 
1063, 278, 365, 264, 9284, 293, 4473, 577, + 309, 35116, 264, 51776], "temperature": 0.0, "avg_logprob": -0.17029610500540784, + "compression_ratio": 1.7094017094017093, "no_speech_prob": 0.0012950095115229487}, + {"id": 506, "seek": 271716, "start": 2717.16, "end": 2723.96, "text": " document + it had nothing to do with the level that you go into you know with turbo buffer + you know like", "tokens": [50364, 4166, 309, 632, 1825, 281, 360, 365, 264, 1496, + 300, 291, 352, 666, 291, 458, 365, 20902, 21762, 291, 458, 411, 50704], "temperature": + 0.0, "avg_logprob": -0.2030700376664085, "compression_ratio": 1.819905213270142, + "no_speech_prob": 0.001501852530054748}, {"id": 507, "seek": 271716, "start": 2725.16, + "end": 2731.8799999999997, "text": " effectively controlling the whole the whole + process there got it yeah I think the the point", "tokens": [50764, 8659, 14905, + 264, 1379, 264, 1379, 1399, 456, 658, 309, 1338, 286, 519, 264, 264, 935, 51100], + "temperature": 0.0, "avg_logprob": -0.2030700376664085, "compression_ratio": 1.819905213270142, + "no_speech_prob": 0.001501852530054748}, {"id": 508, "seek": 271716, "start": 2731.8799999999997, + "end": 2738.52, "text": " I the the point there is that I think we do see that some + customers are concerned with with", "tokens": [51100, 286, 264, 264, 935, 456, 307, + 300, 286, 519, 321, 360, 536, 300, 512, 4581, 366, 5922, 365, 365, 51432], "temperature": + 0.0, "avg_logprob": -0.2030700376664085, "compression_ratio": 1.819905213270142, + "no_speech_prob": 0.001501852530054748}, {"id": 509, "seek": 271716, "start": 2738.52, + "end": 2743.56, "text": " this cash because they''ve gone bit and by basically the + the way that I would think about it is in", "tokens": [51432, 341, 6388, 570, 436, + 600, 2780, 857, 293, 538, 1936, 264, 264, 636, 300, 286, 576, 519, 466, 309, 307, + 294, 51684], "temperature": 0.0, "avg_logprob": -0.2030700376664085, "compression_ratio": + 1.819905213270142, "no_speech_prob": 
0.001501852530054748}, {"id": 510, "seek": + 274356, "start": 2743.64, "end": 2747.96, "text": " in some of the traditional engines + the way that they do IO if something is on disk it feels like", "tokens": [50368, + 294, 512, 295, 264, 5164, 12982, 264, 636, 300, 436, 360, 39839, 498, 746, 307, + 322, 12355, 309, 3417, 411, 50584], "temperature": 0.0, "avg_logprob": -0.09234489500522614, + "compression_ratio": 1.9108527131782946, "no_speech_prob": 0.003323836950585246}, + {"id": 511, "seek": 274356, "start": 2747.96, "end": 2752.92, "text": " it''s bad + like if it''s on disk it''s slow and it really has to be in memory and so you sort + of have", "tokens": [50584, 309, 311, 1578, 411, 498, 309, 311, 322, 12355, 309, + 311, 2964, 293, 309, 534, 575, 281, 312, 294, 4675, 293, 370, 291, 1333, 295, 362, + 50832], "temperature": 0.0, "avg_logprob": -0.09234489500522614, "compression_ratio": + 1.9108527131782946, "no_speech_prob": 0.003323836950585246}, {"id": 512, "seek": + 274356, "start": 2752.92, "end": 2757.24, "text": " you know the pufferfish is either + you know the pufferfish is sort of because when it''s fully inflated", "tokens": + [50832, 291, 458, 264, 19613, 260, 11608, 307, 2139, 291, 458, 264, 19613, 260, + 11608, 307, 1333, 295, 570, 562, 309, 311, 4498, 9922, 770, 51048], "temperature": + 0.0, "avg_logprob": -0.09234489500522614, "compression_ratio": 1.9108527131782946, + "no_speech_prob": 0.003323836950585246}, {"id": 513, "seek": 274356, "start": 2757.24, + "end": 2762.6, "text": " it''s a DRAM right it''s a deflated it''s in s3 well it + only had two settings right either it''s in", "tokens": [51048, 309, 311, 257, 12118, + 2865, 558, 309, 311, 257, 1060, 38539, 309, 311, 294, 262, 18, 731, 309, 787, 632, + 732, 6257, 558, 2139, 309, 311, 294, 51316], "temperature": 0.0, "avg_logprob": + -0.09234489500522614, "compression_ratio": 1.9108527131782946, "no_speech_prob": + 0.003323836950585246}, {"id": 514, "seek": 274356, "start": 2762.6, "end": 
2767.56, + "text": " disk which is quite slow and frankly in some of the traditional storage + engine I''ve seen the latency", "tokens": [51316, 12355, 597, 307, 1596, 2964, 293, + 11939, 294, 512, 295, 264, 5164, 6725, 2848, 286, 600, 1612, 264, 27043, 51564], + "temperature": 0.0, "avg_logprob": -0.09234489500522614, "compression_ratio": 1.9108527131782946, + "no_speech_prob": 0.003323836950585246}, {"id": 515, "seek": 276756, "start": 2767.56, + "end": 2774.2, "text": " on disk being similar to our latency on s3 yeah and so + then you have to load it into DRAM and what", "tokens": [50364, 322, 12355, 885, + 2531, 281, 527, 27043, 322, 262, 18, 1338, 293, 370, 550, 291, 362, 281, 3677, 309, + 666, 12118, 2865, 293, 437, 50696], "temperature": 0.0, "avg_logprob": -0.12839167045824457, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0045700110495090485}, + {"id": 516, "seek": 276756, "start": 2774.2, "end": 2778.44, "text": " a lot of + these traditional databases they have to do a full copy into DRAM they can''t just + like zero", "tokens": [50696, 257, 688, 295, 613, 5164, 22380, 436, 362, 281, 360, + 257, 1577, 5055, 666, 12118, 2865, 436, 393, 380, 445, 411, 4018, 50908], "temperature": + 0.0, "avg_logprob": -0.12839167045824457, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.0045700110495090485}, {"id": 517, "seek": 276756, "start": 2778.44, + "end": 2784.44, "text": " copy off of disk and in the disk are also quite slow these + old network disks right the NVME disks", "tokens": [50908, 5055, 766, 295, 12355, + 293, 294, 264, 12355, 366, 611, 1596, 2964, 613, 1331, 3209, 41617, 558, 264, 46512, + 15454, 41617, 51208], "temperature": 0.0, "avg_logprob": -0.12839167045824457, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.0045700110495090485}, {"id": 518, "seek": + 276756, "start": 2784.44, "end": 2793.32, "text": " are so fast right they are they + can drive bandwidth that''s within you know a very low multiple of 
DRAM", "tokens": + [51208, 366, 370, 2370, 558, 436, 366, 436, 393, 3332, 23647, 300, 311, 1951, 291, + 458, 257, 588, 2295, 3866, 295, 12118, 2865, 51652], "temperature": 0.0, "avg_logprob": + -0.12839167045824457, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.0045700110495090485}, {"id": 519, "seek": 279332, "start": 2793.32, "end": 2799.8, + "text": " right tens of gigabytes per second but their cost is almost and two orders + of magnitude lower", "tokens": [50364, 558, 10688, 295, 42741, 680, 1150, 457, 641, + 2063, 307, 1920, 293, 732, 9470, 295, 15668, 3126, 50688], "temperature": 0.0, "avg_logprob": + -0.12304900727182064, "compression_ratio": 1.7178571428571427, "no_speech_prob": + 0.011262807995080948}, {"id": 520, "seek": 279332, "start": 2799.8, "end": 2804.6000000000004, + "text": " so this completely changes the economics but you you can''t take advantage + of these very easily you", "tokens": [50688, 370, 341, 2584, 2962, 264, 14564, 457, + 291, 291, 393, 380, 747, 5002, 295, 613, 588, 3612, 291, 50928], "temperature": + 0.0, "avg_logprob": -0.12304900727182064, "compression_ratio": 1.7178571428571427, + "no_speech_prob": 0.011262807995080948}, {"id": 521, "seek": 279332, "start": 2804.6000000000004, + "end": 2809.88, "text": " can''t just put as some software on it and just it''s + going to be like 10 times faster than an original", "tokens": [50928, 393, 380, + 445, 829, 382, 512, 4722, 322, 309, 293, 445, 309, 311, 516, 281, 312, 411, 1266, + 1413, 4663, 813, 364, 3380, 51192], "temperature": 0.0, "avg_logprob": -0.12304900727182064, + "compression_ratio": 1.7178571428571427, "no_speech_prob": 0.011262807995080948}, + {"id": 522, "seek": 279332, "start": 2809.88, "end": 2814.84, "text": " disk even + if it''s fundamentally capable of it because what we found for example is that we", + "tokens": [51192, 12355, 754, 498, 309, 311, 17879, 8189, 295, 309, 570, 437, 321, + 1352, 337, 1365, 307, 300, 321, 51440], "temperature": 0.0, 
"avg_logprob": -0.12304900727182064, + "compression_ratio": 1.7178571428571427, "no_speech_prob": 0.011262807995080948}, + {"id": 523, "seek": 279332, "start": 2814.84, "end": 2819.2400000000002, "text": + " had to remove the Linux page cache because the Linux page cache cannot keep up + with these disks", "tokens": [51440, 632, 281, 4159, 264, 18734, 3028, 19459, 570, + 264, 18734, 3028, 19459, 2644, 1066, 493, 365, 613, 41617, 51660], "temperature": + 0.0, "avg_logprob": -0.12304900727182064, "compression_ratio": 1.7178571428571427, + "no_speech_prob": 0.011262807995080948}, {"id": 524, "seek": 281924, "start": 2819.24, + "end": 2823.0, "text": " so you have to do direct dial but when you do direct dial + you don''t get coalescing you don''t get", "tokens": [50364, 370, 291, 362, 281, + 360, 2047, 5502, 457, 562, 291, 360, 2047, 5502, 291, 500, 380, 483, 598, 4229, + 2175, 291, 500, 380, 483, 50552], "temperature": 0.0, "avg_logprob": -0.1318685926240066, + "compression_ratio": 1.8682170542635659, "no_speech_prob": 0.009973360225558281}, + {"id": 525, "seek": 281924, "start": 2823.0, "end": 2826.7599999999998, "text": + " all these other things now you have to write your own IO driver right and so you + just", "tokens": [50552, 439, 613, 661, 721, 586, 291, 362, 281, 2464, 428, 1065, + 39839, 6787, 558, 293, 370, 291, 445, 50740], "temperature": 0.0, "avg_logprob": + -0.1318685926240066, "compression_ratio": 1.8682170542635659, "no_speech_prob": + 0.009973360225558281}, {"id": 526, "seek": 281924, "start": 2828.2799999999997, + "end": 2833.0, "text": " databases have not been built to take advantage of it because + they''re also like they''re not built", "tokens": [50816, 22380, 362, 406, 668, + 3094, 281, 747, 5002, 295, 309, 570, 436, 434, 611, 411, 436, 434, 406, 3094, 51052], + "temperature": 0.0, "avg_logprob": -0.1318685926240066, "compression_ratio": 1.8682170542635659, + "no_speech_prob": 0.009973360225558281}, {"id": 527, "seek": 281924, "start": 2833.0, 
+ "end": 2838.9199999999996, "text": " to try to do an IO depth like basically so + many outside standing IO requests they can they can drive", "tokens": [51052, 281, + 853, 281, 360, 364, 39839, 7161, 411, 1936, 370, 867, 2380, 4877, 39839, 12475, + 436, 393, 436, 393, 3332, 51348], "temperature": 0.0, "avg_logprob": -0.1318685926240066, + "compression_ratio": 1.8682170542635659, "no_speech_prob": 0.009973360225558281}, + {"id": 528, "seek": 281924, "start": 2838.9199999999996, "end": 2844.04, "text": + " there''s a lot of throughput so there''s just a lot of barriers of entry there + so what we find is that", "tokens": [51348, 456, 311, 257, 688, 295, 44629, 370, + 456, 311, 445, 257, 688, 295, 13565, 295, 8729, 456, 370, 437, 321, 915, 307, 300, + 51604], "temperature": 0.0, "avg_logprob": -0.1318685926240066, "compression_ratio": + 1.8682170542635659, "no_speech_prob": 0.009973360225558281}, {"id": 529, "seek": + 284404, "start": 2844.12, "end": 2849.0, "text": " when again speaking in generic + terms here of like you know millions of vectors query that once", "tokens": [50368, + 562, 797, 4124, 294, 19577, 2115, 510, 295, 411, 291, 458, 6803, 295, 18875, 14581, + 300, 1564, 50612], "temperature": 0.0, "avg_logprob": -0.14663542362681606, "compression_ratio": + 1.8308823529411764, "no_speech_prob": 0.005453239660710096}, {"id": 530, "seek": + 284404, "start": 2849.64, "end": 2855.48, "text": " when something is in disk it''s + maybe high tens of milliseconds mid you know 50 70 milliseconds when", "tokens": + [50644, 562, 746, 307, 294, 12355, 309, 311, 1310, 1090, 10688, 295, 34184, 2062, + 291, 458, 2625, 5285, 34184, 562, 50936], "temperature": 0.0, "avg_logprob": -0.14663542362681606, + "compression_ratio": 1.8308823529411764, "no_speech_prob": 0.005453239660710096}, + {"id": 531, "seek": 284404, "start": 2855.48, "end": 2860.84, "text": " it''s fully + on disk maybe lower depending on the query the machinality whatever and when it''s + in memory", "tokens": 
[50936, 309, 311, 4498, 322, 12355, 1310, 3126, 5413, 322, + 264, 14581, 264, 2246, 259, 1860, 2035, 293, 562, 309, 311, 294, 4675, 51204], "temperature": + 0.0, "avg_logprob": -0.14663542362681606, "compression_ratio": 1.8308823529411764, + "no_speech_prob": 0.005453239660710096}, {"id": 532, "seek": 284404, "start": 2860.84, + "end": 2865.24, "text": " it''s closer to 10 to 20 milliseconds right so it''s like + these are not this is not bad like the user", "tokens": [51204, 309, 311, 4966, + 281, 1266, 281, 945, 34184, 558, 370, 309, 311, 411, 613, 366, 406, 341, 307, 406, + 1578, 411, 264, 4195, 51424], "temperature": 0.0, "avg_logprob": -0.14663542362681606, + "compression_ratio": 1.8308823529411764, "no_speech_prob": 0.005453239660710096}, + {"id": 533, "seek": 284404, "start": 2865.24, "end": 2870.52, "text": " is barely + going to notice it and but of course you''re going to get more throughput that way + and then", "tokens": [51424, 307, 10268, 516, 281, 3449, 309, 293, 457, 295, 1164, + 291, 434, 516, 281, 483, 544, 44629, 300, 636, 293, 550, 51688], "temperature": + 0.0, "avg_logprob": -0.14663542362681606, "compression_ratio": 1.8308823529411764, + "no_speech_prob": 0.005453239660710096}, {"id": 534, "seek": 287052, "start": 2870.52, + "end": 2874.52, "text": " means it''s on s3 it''s maybe more like five to six hundred + milliseconds it''s sort of user would", "tokens": [50364, 1355, 309, 311, 322, 262, + 18, 309, 311, 1310, 544, 411, 1732, 281, 2309, 3262, 34184, 309, 311, 1333, 295, + 4195, 576, 50564], "temperature": 0.0, "avg_logprob": -0.1387880037156798, "compression_ratio": + 1.8317757009345794, "no_speech_prob": 0.0010188273154199123}, {"id": 535, "seek": + 287052, "start": 2874.52, "end": 2880.36, "text": " notice but a lot of our customers + like notion for example when you open the q and a dialog and", "tokens": [50564, + 3449, 457, 257, 688, 295, 527, 4581, 411, 10710, 337, 1365, 562, 291, 1269, 264, + 9505, 293, 257, 19308, 293, 50856], 
"temperature": 0.0, "avg_logprob": -0.1387880037156798, + "compression_ratio": 1.8317757009345794, "no_speech_prob": 0.0010188273154199123}, + {"id": 536, "seek": 287052, "start": 2880.36, "end": 2885.24, "text": " these different + dialogues that will query turbo puffer they will send a request to tell turbo puffer", + "tokens": [50856, 613, 819, 45551, 300, 486, 14581, 20902, 19613, 260, 436, 486, + 2845, 257, 5308, 281, 980, 20902, 19613, 260, 51100], "temperature": 0.0, "avg_logprob": + -0.1387880037156798, "compression_ratio": 1.8317757009345794, "no_speech_prob": + 0.0010188273154199123}, {"id": 537, "seek": 287052, "start": 2885.24, "end": 2889.72, + "text": " hey can you start warming up the cash here in a way that makes sense and + by cash we just mean putting", "tokens": [51100, 4177, 393, 291, 722, 17983, 493, + 264, 6388, 510, 294, 257, 636, 300, 1669, 2020, 293, 538, 6388, 321, 445, 914, 3372, + 51324], "temperature": 0.0, "avg_logprob": -0.1387880037156798, "compression_ratio": + 1.8317757009345794, "no_speech_prob": 0.0010188273154199123}, {"id": 538, "seek": + 287052, "start": 2889.72, "end": 2894.28, "text": " it into disk and starting with + sort of the upper layers of the an index and other things to reduce", "tokens": + [51324, 309, 666, 12355, 293, 2891, 365, 1333, 295, 264, 6597, 7914, 295, 264, 364, + 8186, 293, 661, 721, 281, 5407, 51552], "temperature": 0.0, "avg_logprob": -0.1387880037156798, + "compression_ratio": 1.8317757009345794, "no_speech_prob": 0.0010188273154199123}, + {"id": 539, "seek": 287052, "start": 2894.28, "end": 2899.48, "text": " the time + as much as possible so there''s a lot of things that can be done here that are very + very", "tokens": [51552, 264, 565, 382, 709, 382, 1944, 370, 456, 311, 257, 688, + 295, 721, 300, 393, 312, 1096, 510, 300, 366, 588, 588, 51812], "temperature": 0.0, + "avg_logprob": -0.1387880037156798, "compression_ratio": 1.8317757009345794, "no_speech_prob": + 0.0010188273154199123}, {"id": 540, 
"seek": 289948, "start": 2899.48, "end": 2907.8, + "text": " simple that means that the there''s there''s there''s barely a trade-off + yeah but we let''s go", "tokens": [50364, 2199, 300, 1355, 300, 264, 456, 311, 456, + 311, 456, 311, 10268, 257, 4923, 12, 4506, 1338, 457, 321, 718, 311, 352, 50780], + "temperature": 0.0, "avg_logprob": -0.17048031268733563, "compression_ratio": 1.933673469387755, + "no_speech_prob": 0.0011517814127728343}, {"id": 541, "seek": 289948, "start": 2907.8, + "end": 2913.2400000000002, "text": " back into multi-tenancy unless you had a follow + up let''s do that yeah let''s do that like how do you", "tokens": [50780, 646, 666, + 4825, 12, 1147, 6717, 5969, 291, 632, 257, 1524, 493, 718, 311, 360, 300, 1338, + 718, 311, 360, 300, 411, 577, 360, 291, 51052], "temperature": 0.0, "avg_logprob": + -0.17048031268733563, "compression_ratio": 1.933673469387755, "no_speech_prob": + 0.0011517814127728343}, {"id": 542, "seek": 289948, "start": 2913.2400000000002, + "end": 2918.92, "text": " use a multi-tenancy part so so turbo puffer can run in + three different ways it can run yeah", "tokens": [51052, 764, 257, 4825, 12, 1147, + 6717, 644, 370, 370, 20902, 19613, 260, 393, 1190, 294, 1045, 819, 2098, 309, 393, + 1190, 1338, 51336], "temperature": 0.0, "avg_logprob": -0.17048031268733563, "compression_ratio": + 1.933673469387755, "no_speech_prob": 0.0011517814127728343}, {"id": 543, "seek": + 289948, "start": 2918.92, "end": 2924.76, "text": " multi-tenancy clusters that''s + what I mean that''s what cursor does that''s what that''s what linear", "tokens": + [51336, 4825, 12, 1147, 6717, 23313, 300, 311, 437, 286, 914, 300, 311, 437, 28169, + 775, 300, 311, 437, 300, 311, 437, 8213, 51628], "temperature": 0.0, "avg_logprob": + -0.17048031268733563, "compression_ratio": 1.933673469387755, "no_speech_prob": + 0.0011517814127728343}, {"id": 544, "seek": 292476, "start": 2924.76, "end": 2932.5200000000004, + "text": " does and many of our customers 
so in multi-tenancy you share you share + the compute we can do this", "tokens": [50364, 775, 293, 867, 295, 527, 4581, 370, + 294, 4825, 12, 1147, 6717, 291, 2073, 291, 2073, 264, 14722, 321, 393, 360, 341, + 50752], "temperature": 0.0, "avg_logprob": -0.1391412570912351, "compression_ratio": + 1.8173076923076923, "no_speech_prob": 0.0017693896079435945}, {"id": 545, "seek": + 292476, "start": 2932.5200000000004, "end": 2936.76, "text": " so cheaply right + because we can share the caching can share the we can share all of this", "tokens": + [50752, 370, 7084, 356, 558, 570, 321, 393, 2073, 264, 269, 2834, 393, 2073, 264, + 321, 393, 2073, 439, 295, 341, 50964], "temperature": 0.0, "avg_logprob": -0.1391412570912351, + "compression_ratio": 1.8173076923076923, "no_speech_prob": 0.0017693896079435945}, + {"id": 546, "seek": 292476, "start": 2936.76, "end": 2942.76, "text": " infrastructure + it''s very easy for us to run this way so that''s the default mode the cash is of", + "tokens": [50964, 6896, 309, 311, 588, 1858, 337, 505, 281, 1190, 341, 636, 370, + 300, 311, 264, 7576, 4391, 264, 6388, 307, 295, 51264], "temperature": 0.0, "avg_logprob": + -0.1391412570912351, "compression_ratio": 1.8173076923076923, "no_speech_prob": + 0.0017693896079435945}, {"id": 547, "seek": 292476, "start": 2942.76, "end": 2950.2000000000003, + "text": " course segregated off in in in in in in different ways but is also like + shared in ways where you", "tokens": [51264, 1164, 47370, 766, 294, 294, 294, 294, + 294, 294, 819, 2098, 457, 307, 611, 411, 5507, 294, 2098, 689, 291, 51636], "temperature": + 0.0, "avg_logprob": -0.1391412570912351, "compression_ratio": 1.8173076923076923, + "no_speech_prob": 0.0017693896079435945}, {"id": 548, "seek": 295020, "start": 2950.2, + "end": 2957.16, "text": " have a big burst the traffic rate you get more of the + cash than others so that''s what we so it''s", "tokens": [50364, 362, 257, 955, + 12712, 264, 6419, 3314, 291, 483, 544, 295, 264, 
6388, 813, 2357, 370, 300, 311, + 437, 321, 370, 309, 311, 50712], "temperature": 0.0, "avg_logprob": -0.07950835704803466, + "compression_ratio": 1.7333333333333334, "no_speech_prob": 0.0007069749990478158}, + {"id": 549, "seek": 295020, "start": 2957.16, "end": 2961.3999999999996, "text": + " a very great way of running multi-tenancy the other thing we do for multi-tenancy + to keep it very", "tokens": [50712, 257, 588, 869, 636, 295, 2614, 4825, 12, 1147, + 6717, 264, 661, 551, 321, 360, 337, 4825, 12, 1147, 6717, 281, 1066, 309, 588, 50924], + "temperature": 0.0, "avg_logprob": -0.07950835704803466, "compression_ratio": 1.7333333333333334, + "no_speech_prob": 0.0007069749990478158}, {"id": 550, "seek": 295020, "start": 2961.3999999999996, + "end": 2967.24, "text": " secure is that because all the data at rest is in the + bucket you can pass an encryption key to", "tokens": [50924, 7144, 307, 300, 570, + 439, 264, 1412, 412, 1472, 307, 294, 264, 13058, 291, 393, 1320, 364, 29575, 2141, + 281, 51216], "temperature": 0.0, "avg_logprob": -0.07950835704803466, "compression_ratio": + 1.7333333333333334, "no_speech_prob": 0.0007069749990478158}, {"id": 551, "seek": + 295020, "start": 2967.24, "end": 2973.7999999999997, "text": " turbo puffer that + we don''t have access to unless it''s audit logged on your side where we can encrypt", + "tokens": [51216, 20902, 19613, 260, 300, 321, 500, 380, 362, 2105, 281, 5969, 309, + 311, 17748, 27231, 322, 428, 1252, 689, 321, 393, 17972, 662, 51544], "temperature": + 0.0, "avg_logprob": -0.07950835704803466, "compression_ratio": 1.7333333333333334, + "no_speech_prob": 0.0007069749990478158}, {"id": 552, "seek": 297380, "start": 2973.8, + "end": 2980.76, "text": " and decrypt the object which is logically and from a security + point standpoint equivalent to you", "tokens": [50364, 293, 979, 627, 662, 264, + 2657, 597, 307, 38887, 293, 490, 257, 3825, 935, 15827, 10344, 281, 291, 50712], + "temperature": 0.0, "avg_logprob": 
-0.08792360988231974, "compression_ratio": 1.825925925925926, + "no_speech_prob": 0.0037865922786295414}, {"id": 553, "seek": 297380, "start": 2980.76, + "end": 2987.0800000000004, "text": " having all the data in your bucket so this + is a very nice primitive that for example linear it takes", "tokens": [50712, 1419, + 439, 264, 1412, 294, 428, 13058, 370, 341, 307, 257, 588, 1481, 28540, 300, 337, + 1365, 8213, 309, 2516, 51028], "temperature": 0.0, "avg_logprob": -0.08792360988231974, + "compression_ratio": 1.825925925925926, "no_speech_prob": 0.0037865922786295414}, + {"id": 554, "seek": 297380, "start": 2987.0800000000004, "end": 2991.32, "text": + " advantage of because they have full control over their data they can see when + turbo puffer is", "tokens": [51028, 5002, 295, 570, 436, 362, 1577, 1969, 670, 641, + 1412, 436, 393, 536, 562, 20902, 19613, 260, 307, 51240], "temperature": 0.0, "avg_logprob": + -0.08792360988231974, "compression_ratio": 1.825925925925926, "no_speech_prob": + 0.0037865922786295414}, {"id": 555, "seek": 297380, "start": 2991.32, "end": 2996.76, + "text": " accessing it they can shut it down at any point in time and they can even + pass that on to any other", "tokens": [51240, 26440, 309, 436, 393, 5309, 309, 760, + 412, 604, 935, 294, 565, 293, 436, 393, 754, 1320, 300, 322, 281, 604, 661, 51512], + "temperature": 0.0, "avg_logprob": -0.08792360988231974, "compression_ratio": 1.825925925925926, + "no_speech_prob": 0.0037865922786295414}, {"id": 556, "seek": 297380, "start": 2996.76, + "end": 3002.6000000000004, "text": " customers where turbo puffer can encrypt data + for linear customers on behalf of the customer with the", "tokens": [51512, 4581, + 689, 20902, 19613, 260, 393, 17972, 662, 1412, 337, 8213, 4581, 322, 9490, 295, + 264, 5474, 365, 264, 51804], "temperature": 0.0, "avg_logprob": -0.08792360988231974, + "compression_ratio": 1.825925925925926, "no_speech_prob": 0.0037865922786295414}, + {"id": 557, "seek": 300260, 
"start": 3002.6, "end": 3008.6, "text": " customer''s + key it this is like really really I think groundbreaking and underrated in this", + "tokens": [50364, 5474, 311, 2141, 309, 341, 307, 411, 534, 534, 286, 519, 42491, + 293, 833, 5468, 294, 341, 50664], "temperature": 0.0, "avg_logprob": -0.1583266558947864, + "compression_ratio": 1.7183098591549295, "no_speech_prob": 0.0015153922140598297}, + {"id": 558, "seek": 300260, "start": 3008.6, "end": 3012.68, "text": " architecture + you can of course do single tenancy with turbo puffer as well with the computers + only", "tokens": [50664, 9482, 291, 393, 295, 1164, 360, 2167, 2064, 6717, 365, + 20902, 19613, 260, 382, 731, 365, 264, 10807, 787, 50868], "temperature": 0.0, "avg_logprob": + -0.1583266558947864, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.0015153922140598297}, {"id": 559, "seek": 300260, "start": 3012.68, "end": 3017.56, + "text": " for you you can do b y o c where we run to recover inside of your cloud + in a way that''s like very", "tokens": [50868, 337, 291, 291, 393, 360, 272, 288, + 277, 269, 689, 321, 1190, 281, 8114, 1854, 295, 428, 4588, 294, 257, 636, 300, 311, + 411, 588, 51112], "temperature": 0.0, "avg_logprob": -0.1583266558947864, "compression_ratio": + 1.7183098591549295, "no_speech_prob": 0.0015153922140598297}, {"id": 560, "seek": + 300260, "start": 3017.56, "end": 3023.48, "text": " compliant we can never see customer + data but we find it in multi-tenancy with the encryption which", "tokens": [51112, + 36248, 321, 393, 1128, 536, 5474, 1412, 457, 321, 915, 309, 294, 4825, 12, 1147, + 6717, 365, 264, 29575, 597, 51408], "temperature": 0.0, "avg_logprob": -0.1583266558947864, + "compression_ratio": 1.7183098591549295, "no_speech_prob": 0.0015153922140598297}, + {"id": 561, "seek": 300260, "start": 3023.48, "end": 3028.2799999999997, "text": + " can be done per namespities satisfies the security requirements of even some of + the biggest companies", "tokens": [51408, 
393, 312, 1096, 680, 5288, 79, 1088, 44271, + 264, 3825, 7728, 295, 754, 512, 295, 264, 3880, 3431, 51648], "temperature": 0.0, + "avg_logprob": -0.1583266558947864, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.0015153922140598297}, {"id": 562, "seek": 302828, "start": 3028.28, "end": 3034.76, + "text": " in the world yeah that sounds awesome I also wanted to pick one topic + which usually used to I", "tokens": [50364, 294, 264, 1002, 1338, 300, 3263, 3476, + 286, 611, 1415, 281, 1888, 472, 4829, 597, 2673, 1143, 281, 286, 50688], "temperature": + 0.0, "avg_logprob": -0.16269441654807643, "compression_ratio": 1.5815217391304348, + "no_speech_prob": 0.01626240462064743}, {"id": 563, "seek": 302828, "start": 3034.76, + "end": 3042.2000000000003, "text": " don''t know if any more I don''t see that as + much pick up a lot of flame discussions what is your", "tokens": [50688, 500, 380, + 458, 498, 604, 544, 286, 500, 380, 536, 300, 382, 709, 1888, 493, 257, 688, 295, + 13287, 11088, 437, 307, 428, 51060], "temperature": 0.0, "avg_logprob": -0.16269441654807643, + "compression_ratio": 1.5815217391304348, "no_speech_prob": 0.01626240462064743}, + {"id": 564, "seek": 302828, "start": 3042.2000000000003, "end": 3051.4, "text": + " recall at n and when I go to the docs of turbo puffer it says recall at n is 100% + recall at 10 excuse", "tokens": [51060, 9901, 412, 297, 293, 562, 286, 352, 281, + 264, 45623, 295, 20902, 19613, 260, 309, 1619, 9901, 412, 297, 307, 2319, 4, 9901, + 412, 1266, 8960, 51520], "temperature": 0.0, "avg_logprob": -0.16269441654807643, + "compression_ratio": 1.5815217391304348, "no_speech_prob": 0.01626240462064743}, + {"id": 565, "seek": 305140, "start": 3051.56, "end": 3059.7200000000003, "text": + " me but vector search bar so does that not 100% we said 9200 right no I think it + says what wait wait wait", "tokens": [50372, 385, 457, 8062, 3164, 2159, 370, 775, + 300, 406, 2319, 4, 321, 848, 1722, 7629, 558, 572, 286, 519, 309, 
1619, 437, 1699, + 1699, 1699, 50780], "temperature": 0.0, "avg_logprob": -0.17847001211983818, "compression_ratio": + 1.747787610619469, "no_speech_prob": 0.03225192800164223}, {"id": 566, "seek": 305140, + "start": 3060.36, "end": 3067.88, "text": " I''ll need to what was the page where + you do that oh here the limits oh I see observed in production", "tokens": [50812, + 286, 603, 643, 281, 437, 390, 264, 3028, 689, 291, 360, 300, 1954, 510, 264, 10406, + 1954, 286, 536, 13095, 294, 4265, 51188], "temperature": 0.0, "avg_logprob": -0.17847001211983818, + "compression_ratio": 1.747787610619469, "no_speech_prob": 0.03225192800164223}, + {"id": 567, "seek": 305140, "start": 3067.88, "end": 3072.36, "text": " yeah it + should say up to 100% that''s a bug in the docs that I shipped last night I''m gonna + I''m", "tokens": [51188, 1338, 309, 820, 584, 493, 281, 2319, 4, 300, 311, 257, + 7426, 294, 264, 45623, 300, 286, 25312, 1036, 1818, 286, 478, 799, 286, 478, 51412], + "temperature": 0.0, "avg_logprob": -0.17847001211983818, "compression_ratio": 1.747787610619469, + "no_speech_prob": 0.03225192800164223}, {"id": 568, "seek": 305140, "start": 3072.36, + "end": 3078.6800000000003, "text": " gonna fix that after this awesome but what + it says in the in the limits is 90 to 100% but let''s", "tokens": [51412, 799, 3191, + 300, 934, 341, 3476, 457, 437, 309, 1619, 294, 264, 294, 264, 10406, 307, 4289, + 281, 2319, 4, 457, 718, 311, 51728], "temperature": 0.0, "avg_logprob": -0.17847001211983818, + "compression_ratio": 1.747787610619469, "no_speech_prob": 0.03225192800164223}, + {"id": 569, "seek": 307868, "start": 3078.68, "end": 3085.48, "text": " talk about + recall I''d love to get into recall so I think recall is incredibly important it''s + the", "tokens": [50364, 751, 466, 9901, 286, 1116, 959, 281, 483, 666, 9901, 370, + 286, 519, 9901, 307, 6252, 1021, 309, 311, 264, 50704], "temperature": 0.0, "avg_logprob": + -0.11043286727646649, "compression_ratio": 2.0625, 
"no_speech_prob": 0.0017049049492925406}, + {"id": 570, "seek": 307868, "start": 3085.48, "end": 3090.2799999999997, "text": + " equivalent of your database you have to trust your database to do it in the same + way that you have", "tokens": [50704, 10344, 295, 428, 8149, 291, 362, 281, 3361, + 428, 8149, 281, 360, 309, 294, 264, 912, 636, 300, 291, 362, 50944], "temperature": + 0.0, "avg_logprob": -0.11043286727646649, "compression_ratio": 2.0625, "no_speech_prob": + 0.0017049049492925406}, {"id": 571, "seek": 307868, "start": 3090.2799999999997, + "end": 3095.3199999999997, "text": " to trust your database do f sync and you have + to trust your database that when we say that hey we", "tokens": [50944, 281, 3361, + 428, 8149, 360, 283, 20271, 293, 291, 362, 281, 3361, 428, 8149, 300, 562, 321, + 584, 300, 4177, 321, 51196], "temperature": 0.0, "avg_logprob": -0.11043286727646649, + "compression_ratio": 2.0625, "no_speech_prob": 0.0017049049492925406}, {"id": 572, + "seek": 307868, "start": 3095.3199999999997, "end": 3101.8799999999997, "text": + " don''t return a success to you unless it''s committed to s3 you have to trust + that recall is similar right", "tokens": [51196, 500, 380, 2736, 257, 2245, 281, + 291, 5969, 309, 311, 7784, 281, 262, 18, 291, 362, 281, 3361, 300, 9901, 307, 2531, + 558, 51524], "temperature": 0.0, "avg_logprob": -0.11043286727646649, "compression_ratio": + 2.0625, "no_speech_prob": 0.0017049049492925406}, {"id": 573, "seek": 307868, "start": + 3101.8799999999997, "end": 3108.04, "text": " if you are working on search and you''re + working on connecting data to llem''s then you don''t want", "tokens": [51524, 498, + 291, 366, 1364, 322, 3164, 293, 291, 434, 1364, 322, 11015, 1412, 281, 287, 10386, + 311, 550, 291, 500, 380, 528, 51832], "temperature": 0.0, "avg_logprob": -0.11043286727646649, + "compression_ratio": 2.0625, "no_speech_prob": 0.0017049049492925406}, {"id": 574, + "seek": 310804, "start": 3108.04, "end": 3113.8, "text": " to 
worry in your e-vails + on whether your vector database is giving you low recall it''s actually a very", + "tokens": [50364, 281, 3292, 294, 428, 308, 12, 85, 6227, 322, 1968, 428, 8062, + 8149, 307, 2902, 291, 2295, 9901, 309, 311, 767, 257, 588, 50652], "temperature": + 0.0, "avg_logprob": -0.10244310819185697, "compression_ratio": 1.8009259259259258, + "no_speech_prob": 0.0007448425167240202}, {"id": 575, "seek": 310804, "start": 3113.8, + "end": 3117.96, "text": " sophisticated problem to evaluate whether this is the + cause so you have to trust your vendor", "tokens": [50652, 16950, 1154, 281, 13059, + 1968, 341, 307, 264, 3082, 370, 291, 362, 281, 3361, 428, 24321, 50860], "temperature": + 0.0, "avg_logprob": -0.10244310819185697, "compression_ratio": 1.8009259259259258, + "no_speech_prob": 0.0007448425167240202}, {"id": 576, "seek": 310804, "start": 3119.48, + "end": 3125.88, "text": " this is an underrated problem and I love that you''re + asking about it and very few people ask about", "tokens": [50936, 341, 307, 364, + 833, 5468, 1154, 293, 286, 959, 300, 291, 434, 3365, 466, 309, 293, 588, 1326, 561, + 1029, 466, 51256], "temperature": 0.0, "avg_logprob": -0.10244310819185697, "compression_ratio": + 1.8009259259259258, "no_speech_prob": 0.0007448425167240202}, {"id": 577, "seek": + 310804, "start": 3125.88, "end": 3130.68, "text": " it unless they''re quite sophisticated + so let''s go into it let''s go into a long answer here for", "tokens": [51256, 309, + 5969, 436, 434, 1596, 16950, 370, 718, 311, 352, 666, 309, 718, 311, 352, 666, 257, + 938, 1867, 510, 337, 51496], "temperature": 0.0, "avg_logprob": -0.10244310819185697, + "compression_ratio": 1.8009259259259258, "no_speech_prob": 0.0007448425167240202}, + {"id": 578, "seek": 313068, "start": 3130.8399999999997, "end": 3137.72, "text": + " your audience because I think this is paramount most databases that have a vector + index are", "tokens": [50372, 428, 4034, 570, 286, 519, 341, 307, 6220, 792, 
881, + 22380, 300, 362, 257, 8062, 8186, 366, 50716], "temperature": 0.0, "avg_logprob": + -0.14194954958829012, "compression_ratio": 1.686832740213523, "no_speech_prob": + 0.00752654392272234}, {"id": 579, "seek": 313068, "start": 3139.08, "end": 3145.3999999999996, + "text": " trained on or not trained on but they''re benchmarked against for these + different A&N open", "tokens": [50784, 8895, 322, 420, 406, 8895, 322, 457, 436, + 434, 18927, 292, 1970, 337, 613, 819, 316, 5, 45, 1269, 51100], "temperature": 0.0, + "avg_logprob": -0.14194954958829012, "compression_ratio": 1.686832740213523, "no_speech_prob": + 0.00752654392272234}, {"id": 580, "seek": 313068, "start": 3145.3999999999996, "end": + 3150.68, "text": " source projects so there''s sift and others problem with these + data is that they do not represent what", "tokens": [51100, 4009, 4455, 370, 456, + 311, 262, 2008, 293, 2357, 1154, 365, 613, 1412, 307, 300, 436, 360, 406, 2906, + 437, 51364], "temperature": 0.0, "avg_logprob": -0.14194954958829012, "compression_ratio": + 1.686832740213523, "no_speech_prob": 0.00752654392272234}, {"id": 581, "seek": 313068, + "start": 3150.68, "end": 3155.24, "text": " we''ve seen in the real world a lot + of them are very low dimensionality like when we do benchmarking", "tokens": [51364, + 321, 600, 1612, 294, 264, 957, 1002, 257, 688, 295, 552, 366, 588, 2295, 10139, + 1860, 411, 562, 321, 360, 18927, 278, 51592], "temperature": 0.0, "avg_logprob": + -0.14194954958829012, "compression_ratio": 1.686832740213523, "no_speech_prob": + 0.00752654392272234}, {"id": 582, "seek": 313068, "start": 3155.24, "end": 3159.3199999999997, + "text": " on a billion that we''re working on right now the biggest data sets we + can find are like 64", "tokens": [51592, 322, 257, 5218, 300, 321, 434, 1364, 322, + 558, 586, 264, 3880, 1412, 6352, 321, 393, 915, 366, 411, 12145, 51796], "temperature": + 0.0, "avg_logprob": -0.14194954958829012, "compression_ratio": 1.686832740213523, + 
"no_speech_prob": 0.00752654392272234}, {"id": 583, "seek": 315932, "start": 3159.32, + "end": 3164.28, "text": " dimensions this is not what people are doing in production + they''re doing at least 512 often", "tokens": [50364, 12819, 341, 307, 406, 437, + 561, 366, 884, 294, 4265, 436, 434, 884, 412, 1935, 1025, 4762, 2049, 50612], "temperature": + 0.0, "avg_logprob": -0.1559716510772705, "compression_ratio": 1.83206106870229, + "no_speech_prob": 0.0037499640602618456}, {"id": 584, "seek": 315932, "start": 3165.32, + "end": 3170.92, "text": " generally I''d say the average is around 768 dimensions + these are not representative data sets", "tokens": [50664, 5101, 286, 1116, 584, + 264, 4274, 307, 926, 24733, 23, 12819, 613, 366, 406, 12424, 1412, 6352, 50944], + "temperature": 0.0, "avg_logprob": -0.1559716510772705, "compression_ratio": 1.83206106870229, + "no_speech_prob": 0.0037499640602618456}, {"id": 585, "seek": 315932, "start": 3170.92, + "end": 3175.1600000000003, "text": " and the distributions in the academic benchmarks + are also completely different for what we see in", "tokens": [50944, 293, 264, 37870, + 294, 264, 7778, 43751, 366, 611, 2584, 819, 337, 437, 321, 536, 294, 51156], "temperature": + 0.0, "avg_logprob": -0.1559716510772705, "compression_ratio": 1.83206106870229, + "no_speech_prob": 0.0037499640602618456}, {"id": 586, "seek": 315932, "start": 3175.1600000000003, + "end": 3181.56, "text": " real data sets right in real data sets we see millions + of copy of duplicates right we see filtering", "tokens": [51156, 957, 1412, 6352, + 558, 294, 957, 1412, 6352, 321, 536, 6803, 295, 5055, 295, 17154, 1024, 558, 321, + 536, 30822, 51476], "temperature": 0.0, "avg_logprob": -0.1559716510772705, "compression_ratio": + 1.83206106870229, "no_speech_prob": 0.0037499640602618456}, {"id": 587, "seek": + 315932, "start": 3181.56, "end": 3186.52, "text": " all these chaotic environments + that do not present themselves in the academic thench works so if", 
"tokens": [51476, + 439, 613, 27013, 12388, 300, 360, 406, 1974, 2969, 294, 264, 7778, 550, 339, 1985, + 370, 498, 51724], "temperature": 0.0, "avg_logprob": -0.1559716510772705, "compression_ratio": + 1.83206106870229, "no_speech_prob": 0.0037499640602618456}, {"id": 588, "seek": + 318652, "start": 3187.16, "end": 3192.28, "text": " if you''re using a vector index + that''s only been tested on academic benchmarks it''s it''s I mean", "tokens": [50396, + 498, 291, 434, 1228, 257, 8062, 8186, 300, 311, 787, 668, 8246, 322, 7778, 43751, + 309, 311, 309, 311, 286, 914, 50652], "temperature": 0.0, "avg_logprob": -0.22574036862669872, + "compression_ratio": 1.9146341463414633, "no_speech_prob": 0.006672090385109186}, + {"id": 589, "seek": 318652, "start": 3192.28, "end": 3196.04, "text": " it''s like + the LLMs right it''s like you don''t you don''t really trust it just based on", + "tokens": [50652, 309, 311, 411, 264, 441, 43, 26386, 558, 309, 311, 411, 291, 500, + 380, 291, 500, 380, 534, 3361, 309, 445, 2361, 322, 50840], "temperature": 0.0, + "avg_logprob": -0.22574036862669872, "compression_ratio": 1.9146341463414633, "no_speech_prob": + 0.006672090385109186}, {"id": 590, "seek": 318652, "start": 3196.04, "end": 3201.32, + "text": " discording it''s sort of you it''s all the vibes right it''s all the qualitative + thing right outside", "tokens": [50840, 32989, 278, 309, 311, 1333, 295, 291, 309, + 311, 439, 264, 27636, 558, 309, 311, 439, 264, 31312, 551, 558, 2380, 51104], "temperature": + 0.0, "avg_logprob": -0.22574036862669872, "compression_ratio": 1.9146341463414633, + "no_speech_prob": 0.006672090385109186}, {"id": 591, "seek": 318652, "start": 3201.32, + "end": 3205.0, "text": " the benchmark was that everyone was dreaming on them that + it will work for your domain right like", "tokens": [51104, 264, 18927, 390, 300, + 1518, 390, 21475, 322, 552, 300, 309, 486, 589, 337, 428, 9274, 558, 411, 51288], + "temperature": 0.0, "avg_logprob": 
-0.22574036862669872, "compression_ratio": 1.9146341463414633, + "no_speech_prob": 0.006672090385109186}, {"id": 592, "seek": 318652, "start": 3205.0, + "end": 3211.56, "text": " the LLM that''s right like early on very very early on + in in in interval puffer''s history in the", "tokens": [51288, 264, 441, 43, 44, + 300, 311, 558, 411, 2440, 322, 588, 588, 2440, 322, 294, 294, 294, 15035, 19613, + 260, 311, 2503, 294, 264, 51616], "temperature": 0.0, "avg_logprob": -0.22574036862669872, + "compression_ratio": 1.9146341463414633, "no_speech_prob": 0.006672090385109186}, + {"id": 593, "seek": 321156, "start": 3211.56, "end": 3216.84, "text": " first month + I was mainly iterating against the SIFT data set right just like 128 dimensional", + "tokens": [50364, 700, 1618, 286, 390, 8704, 17138, 990, 1970, 264, 318, 12775, + 51, 1412, 992, 558, 445, 411, 29810, 18795, 50628], "temperature": 0.0, "avg_logprob": + -0.17388935226330654, "compression_ratio": 1.8158730158730159, "no_speech_prob": + 0.0015733742620795965}, {"id": 594, "seek": 321156, "start": 3216.84, "end": 3219.88, + "text": " data set I didn''t know anything about an end at the time so it''s like + okay this is pretty good", "tokens": [50628, 1412, 992, 286, 994, 380, 458, 1340, + 466, 364, 917, 412, 264, 565, 370, 309, 311, 411, 1392, 341, 307, 1238, 665, 50780], + "temperature": 0.0, "avg_logprob": -0.17388935226330654, "compression_ratio": 1.8158730158730159, + "no_speech_prob": 0.0015733742620795965}, {"id": 595, "seek": 321156, "start": 3219.88, + "end": 3224.6, "text": " we can tune some risks on this and then I can do go wider + but I have the feedback loop and the", "tokens": [50780, 321, 393, 10864, 512, 10888, + 322, 341, 293, 550, 286, 393, 360, 352, 11842, 457, 286, 362, 264, 5824, 6367, 293, + 264, 51016], "temperature": 0.0, "avg_logprob": -0.17388935226330654, "compression_ratio": + 1.8158730158730159, "no_speech_prob": 0.0015733742620795965}, {"id": 596, "seek": + 321156, "start": 3224.6, 
"end": 3229.7999999999997, "text": " observation I had + at the time was that I found that one I so I got something that worked really", + "tokens": [51016, 14816, 286, 632, 412, 264, 565, 390, 300, 286, 1352, 300, 472, + 286, 370, 286, 658, 746, 300, 2732, 534, 51276], "temperature": 0.0, "avg_logprob": + -0.17388935226330654, "compression_ratio": 1.8158730158730159, "no_speech_prob": + 0.0015733742620795965}, {"id": 597, "seek": 321156, "start": 3229.7999999999997, + "end": 3234.52, "text": " well great heristics on SIFT and then when I went it on + the other data sets it just completely", "tokens": [51276, 731, 869, 720, 6006, + 322, 318, 12775, 51, 293, 550, 562, 286, 1437, 309, 322, 264, 661, 1412, 6352, 309, + 445, 2584, 51512], "temperature": 0.0, "avg_logprob": -0.17388935226330654, "compression_ratio": + 1.8158730158730159, "no_speech_prob": 0.0015733742620795965}, {"id": 598, "seek": + 321156, "start": 3234.52, "end": 3239.64, "text": " did not work well or generalized + to the other data sets and I think that taught me an early lesson", "tokens": [51512, + 630, 406, 589, 731, 420, 44498, 281, 264, 661, 1412, 6352, 293, 286, 519, 300, 5928, + 385, 364, 2440, 6898, 51768], "temperature": 0.0, "avg_logprob": -0.17388935226330654, + "compression_ratio": 1.8158730158730159, "no_speech_prob": 0.0015733742620795965}, + {"id": 599, "seek": 323964, "start": 3239.72, "end": 3246.04, "text": " that the + these academic data sets are just not enough and the only way to know what your + recall is", "tokens": [50368, 300, 264, 613, 7778, 1412, 6352, 366, 445, 406, 1547, + 293, 264, 787, 636, 281, 458, 437, 428, 9901, 307, 50684], "temperature": 0.0, "avg_logprob": + -0.12590561699621455, "compression_ratio": 1.610655737704918, "no_speech_prob": + 0.002041584113612771}, {"id": 600, "seek": 323964, "start": 3246.04, "end": 3253.4, + "text": " going to be is to measure it in production this is what TurboPuffer does + for a percentage of queries", "tokens": [50684, 516, 
281, 312, 307, 281, 3481, 309, + 294, 4265, 341, 307, 437, 35848, 47, 1245, 260, 775, 337, 257, 9668, 295, 24109, + 51052], "temperature": 0.0, "avg_logprob": -0.12590561699621455, "compression_ratio": + 1.610655737704918, "no_speech_prob": 0.002041584113612771}, {"id": 601, "seek": + 323964, "start": 3253.4, "end": 3259.16, "text": " it depends on the number of queries + that you do but let''s say around 1% of queries TurboPuffer will", "tokens": [51052, + 309, 5946, 322, 264, 1230, 295, 24109, 300, 291, 360, 457, 718, 311, 584, 926, 502, + 4, 295, 24109, 35848, 47, 1245, 260, 486, 51340], "temperature": 0.0, "avg_logprob": + -0.12590561699621455, "compression_ratio": 1.610655737704918, "no_speech_prob": + 0.002041584113612771}, {"id": 602, "seek": 323964, "start": 3259.16, "end": 3266.04, + "text": " run an exhaustive search against the A&N index on a separate worker fleet + we will then emit a", "tokens": [51340, 1190, 364, 14687, 488, 3164, 1970, 264, + 316, 5, 45, 8186, 322, 257, 4994, 11346, 19396, 321, 486, 550, 32084, 257, 51684], + "temperature": 0.0, "avg_logprob": -0.12590561699621455, "compression_ratio": 1.610655737704918, + "no_speech_prob": 0.002041584113612771}, {"id": 603, "seek": 326604, "start": 3266.04, + "end": 3272.2799999999997, "text": " metric to data dog that is the recall number + right for this query right like which is basically", "tokens": [50364, 20678, 281, + 1412, 3000, 300, 307, 264, 9901, 1230, 558, 337, 341, 14581, 558, 411, 597, 307, + 1936, 50676], "temperature": 0.0, "avg_logprob": -0.16261936289019288, "compression_ratio": + 1.8814229249011858, "no_speech_prob": 0.004885130096226931}, {"id": 604, "seek": + 326604, "start": 3272.2799999999997, "end": 3276.44, "text": " okay this is the + top 10 we know is accurate and this is the you know heristic A&N index", "tokens": + [50676, 1392, 341, 307, 264, 1192, 1266, 321, 458, 307, 8559, 293, 341, 307, 264, + 291, 458, 720, 3142, 316, 5, 45, 8186, 50884], "temperature": 0.0, 
"avg_logprob": + -0.16261936289019288, "compression_ratio": 1.8814229249011858, "no_speech_prob": + 0.004885130096226931}, {"id": 605, "seek": 326604, "start": 3276.44, "end": 3281.48, + "text": " is what''s the overlap and we will average that over time I have a graph + in data dog that shows all", "tokens": [50884, 307, 437, 311, 264, 19959, 293, 321, + 486, 4274, 300, 670, 565, 286, 362, 257, 4295, 294, 1412, 3000, 300, 3110, 439, + 51136], "temperature": 0.0, "avg_logprob": -0.16261936289019288, "compression_ratio": + 1.8814229249011858, "no_speech_prob": 0.004885130096226931}, {"id": 606, "seek": + 326604, "start": 3281.48, "end": 3287.24, "text": " the different organizations + that have more than 100 queries in the in the past in the past hour", "tokens": + [51136, 264, 819, 6150, 300, 362, 544, 813, 2319, 24109, 294, 264, 294, 264, 1791, + 294, 264, 1791, 1773, 51424], "temperature": 0.0, "avg_logprob": -0.16261936289019288, + "compression_ratio": 1.8814229249011858, "no_speech_prob": 0.004885130096226931}, + {"id": 607, "seek": 326604, "start": 3287.24, "end": 3291.56, "text": " or whatever + and then we have the recall for all of them we have the recall at what they asked + for", "tokens": [51424, 420, 2035, 293, 550, 321, 362, 264, 9901, 337, 439, 295, + 552, 321, 362, 264, 9901, 412, 437, 436, 2351, 337, 51640], "temperature": 0.0, + "avg_logprob": -0.16261936289019288, "compression_ratio": 1.8814229249011858, "no_speech_prob": + 0.004885130096226931}, {"id": 608, "seek": 329156, "start": 3291.56, "end": 3297.16, + "text": " to recall a 10 the p10 recall the p90 recall and we try to our best to + make sure that this is", "tokens": [50364, 281, 9901, 257, 1266, 264, 280, 3279, + 9901, 264, 280, 7771, 9901, 293, 321, 853, 281, 527, 1151, 281, 652, 988, 300, 341, + 307, 50644], "temperature": 0.0, "avg_logprob": -0.11797353956434461, "compression_ratio": + 1.7419354838709677, "no_speech_prob": 0.0036414596252143383}, {"id": 609, "seek": + 329156, "start": 
3297.16, "end": 3305.4, "text": " green at all times and we consider + green anything above 90% is generally quite good it well 90%", "tokens": [50644, + 3092, 412, 439, 1413, 293, 321, 1949, 3092, 1340, 3673, 4289, 4, 307, 5101, 1596, + 665, 309, 731, 4289, 4, 51056], "temperature": 0.0, "avg_logprob": -0.11797353956434461, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0036414596252143383}, + {"id": 610, "seek": 329156, "start": 3305.4, "end": 3311.32, "text": " is is quite + good for some queries but for simple queries often it''s closer to 100% many of + our", "tokens": [51056, 307, 307, 1596, 665, 337, 512, 24109, 457, 337, 2199, 24109, + 2049, 309, 311, 4966, 281, 2319, 4, 867, 295, 527, 51352], "temperature": 0.0, "avg_logprob": + -0.11797353956434461, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0036414596252143383}, {"id": 611, "seek": 329156, "start": 3311.32, "end": 3318.68, + "text": " customers have have 99.5% recall so this is the only way that we know + to do this and it''s fun", "tokens": [51352, 4581, 362, 362, 11803, 13, 20, 4, 9901, + 370, 341, 307, 264, 787, 636, 300, 321, 458, 281, 360, 341, 293, 309, 311, 1019, + 51720], "temperature": 0.0, "avg_logprob": -0.11797353956434461, "compression_ratio": + 1.7419354838709677, "no_speech_prob": 0.0036414596252143383}, {"id": 612, "seek": + 331868, "start": 3318.68, "end": 3324.12, "text": " you ask this question today + because last night I was I was hacking on putting this into the dashboard", "tokens": + [50364, 291, 1029, 341, 1168, 965, 570, 1036, 1818, 286, 390, 286, 390, 31422, 322, + 3372, 341, 666, 264, 18342, 50636], "temperature": 0.0, "avg_logprob": -0.08967226522940176, + "compression_ratio": 1.807511737089202, "no_speech_prob": 0.006659318692982197}, + {"id": 613, "seek": 331868, "start": 3324.12, "end": 3329.64, "text": " so literally + putting the recall that we observe from this from this monitoring system into the", + "tokens": [50636, 370, 3736, 
3372, 264, 9901, 300, 321, 11441, 490, 341, 490, 341, + 11028, 1185, 666, 264, 50912], "temperature": 0.0, "avg_logprob": -0.08967226522940176, + "compression_ratio": 1.807511737089202, "no_speech_prob": 0.006659318692982197}, + {"id": 614, "seek": 331868, "start": 3329.64, "end": 3335.56, "text": " dashboard + of the user because we think it''s that important and it''s very difficult to get + right", "tokens": [50912, 18342, 295, 264, 4195, 570, 321, 519, 309, 311, 300, 1021, + 293, 309, 311, 588, 2252, 281, 483, 558, 51208], "temperature": 0.0, "avg_logprob": + -0.08967226522940176, "compression_ratio": 1.807511737089202, "no_speech_prob": + 0.006659318692982197}, {"id": 615, "seek": 331868, "start": 3335.56, "end": 3340.9199999999996, + "text": " we have spent thousands of engineering hours to make sure that the recall + is high now recall", "tokens": [51208, 321, 362, 4418, 5383, 295, 7043, 2496, 281, + 652, 988, 300, 264, 9901, 307, 1090, 586, 9901, 51476], "temperature": 0.0, "avg_logprob": + -0.08967226522940176, "compression_ratio": 1.807511737089202, "no_speech_prob": + 0.006659318692982197}, {"id": 616, "seek": 334092, "start": 3340.92, "end": 3349.88, + "text": " on academic benchmarks easy recall on raw ann search is especially on + academic benchmarks very easy", "tokens": [50364, 322, 7778, 43751, 1858, 9901, + 322, 8936, 364, 77, 3164, 307, 2318, 322, 7778, 43751, 588, 1858, 50812], "temperature": + 0.0, "avg_logprob": -0.2057559225294325, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.004156796727329493}, {"id": 617, "seek": 334092, "start": 3351.96, + "end": 3357.56, "text": " raw recall on production data sets I''d say medium to + medium hard", "tokens": [50916, 8936, 9901, 322, 4265, 1412, 6352, 286, 1116, 584, + 6399, 281, 6399, 1152, 51196], "temperature": 0.0, "avg_logprob": -0.2057559225294325, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.004156796727329493}, + {"id": 618, "seek": 334092, "start": 
3359.2400000000002, "end": 3366.6, "text": + " high recall on ann queries with filters with mixed selectivity and incremental + indexing", "tokens": [51280, 1090, 9901, 322, 364, 77, 24109, 365, 15995, 365, 7467, + 3048, 4253, 293, 35759, 8186, 278, 51648], "temperature": 0.0, "avg_logprob": -0.2057559225294325, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.004156796727329493}, + {"id": 619, "seek": 336660, "start": 3366.6, "end": 3372.2, "text": " absolute hard + mode this is what the like you just slap a secondary vector index onto an existing", + "tokens": [50364, 8236, 1152, 4391, 341, 307, 437, 264, 411, 291, 445, 21075, 257, + 11396, 8062, 8186, 3911, 364, 6741, 50644], "temperature": 0.0, "avg_logprob": -0.09728489098725496, + "compression_ratio": 1.8371212121212122, "no_speech_prob": 0.003582586534321308}, + {"id": 620, "seek": 336660, "start": 3372.2, "end": 3377.3199999999997, "text": + " database this is what they can''t do they can''t sustain them like a thousand + writes per second", "tokens": [50644, 8149, 341, 307, 437, 436, 393, 380, 360, 436, + 393, 380, 6769, 552, 411, 257, 4714, 13657, 680, 1150, 50900], "temperature": 0.0, + "avg_logprob": -0.09728489098725496, "compression_ratio": 1.8371212121212122, "no_speech_prob": + 0.003582586534321308}, {"id": 621, "seek": 336660, "start": 3377.3199999999997, + "end": 3382.8399999999997, "text": " with high recall in the face of very difficult + filter queries so let''s talk about filters recall", "tokens": [50900, 365, 1090, + 9901, 294, 264, 1851, 295, 588, 2252, 6608, 24109, 370, 718, 311, 751, 466, 15995, + 9901, 51176], "temperature": 0.0, "avg_logprob": -0.09728489098725496, "compression_ratio": + 1.8371212121212122, "no_speech_prob": 0.003582586534321308}, {"id": 622, "seek": + 336660, "start": 3382.8399999999997, "end": 3388.7599999999998, "text": " for a + second there is barely any academic data sets on this yet it''s all the production + workloads", "tokens": [51176, 337, 257, 
1150, 456, 307, 10268, 604, 7778, 1412, + 6352, 322, 341, 1939, 309, 311, 439, 264, 4265, 32452, 51472], "temperature": 0.0, + "avg_logprob": -0.09728489098725496, "compression_ratio": 1.8371212121212122, "no_speech_prob": + 0.003582586534321308}, {"id": 623, "seek": 336660, "start": 3389.48, "end": 3395.0, + "text": " what a filtered in an index means is that let''s say that for example + you have you have an ecommerce", "tokens": [51508, 437, 257, 37111, 294, 364, 8186, + 1355, 307, 300, 718, 311, 584, 300, 337, 1365, 291, 362, 291, 362, 364, 308, 26926, + 51784], "temperature": 0.0, "avg_logprob": -0.09728489098725496, "compression_ratio": + 1.8371212121212122, "no_speech_prob": 0.003582586534321308}, {"id": 624, "seek": + 339500, "start": 3395.72, "end": 3404.44, "text": " and you''re searching for you''re + searching for I don''t know yellow right and you want to only get", "tokens": [50400, + 293, 291, 434, 10808, 337, 291, 434, 10808, 337, 286, 500, 380, 458, 5566, 558, + 293, 291, 528, 281, 787, 483, 50836], "temperature": 0.0, "avg_logprob": -0.14860212802886963, + "compression_ratio": 1.6844444444444444, "no_speech_prob": 0.005369552411139011}, + {"id": 625, "seek": 339500, "start": 3404.44, "end": 3410.04, "text": " things that + ship to Canada that cuts the clusters in different weird ways that might end up + with", "tokens": [50836, 721, 300, 5374, 281, 6309, 300, 9992, 264, 23313, 294, + 819, 3657, 2098, 300, 1062, 917, 493, 365, 51116], "temperature": 0.0, "avg_logprob": + -0.14860212802886963, "compression_ratio": 1.6844444444444444, "no_speech_prob": + 0.005369552411139011}, {"id": 626, "seek": 339500, "start": 3410.04, "end": 3416.28, + "text": " a selectivity of 50% and so if you just visit the the closest whatever + vectors with some", "tokens": [51116, 257, 3048, 4253, 295, 2625, 4, 293, 370, 498, + 291, 445, 3441, 264, 264, 13699, 2035, 18875, 365, 512, 51428], "temperature": 0.0, + "avg_logprob": -0.14860212802886963, "compression_ratio": 
1.6844444444444444, "no_speech_prob": + 0.005369552411139011}, {"id": 627, "seek": 339500, "start": 3416.28, "end": 3420.76, + "text": " horrific you have you''re not going to get the true ann because you actually + have to search maybe", "tokens": [51428, 29248, 291, 362, 291, 434, 406, 516, 281, + 483, 264, 2074, 364, 77, 570, 291, 767, 362, 281, 3164, 1310, 51652], "temperature": + 0.0, "avg_logprob": -0.14860212802886963, "compression_ratio": 1.6844444444444444, + "no_speech_prob": 0.005369552411139011}, {"id": 628, "seek": 342076, "start": 3420.76, + "end": 3425.7200000000003, "text": " twice as many maybe three times as many vectors + to get the right recall the query planner the", "tokens": [50364, 6091, 382, 867, + 1310, 1045, 1413, 382, 867, 18875, 281, 483, 264, 558, 9901, 264, 14581, 31268, + 264, 50612], "temperature": 0.0, "avg_logprob": -0.08776487244500054, "compression_ratio": + 1.9362549800796813, "no_speech_prob": 0.0009666657424531877}, {"id": 629, "seek": + 342076, "start": 3425.7200000000003, "end": 3430.36, "text": " thing in the database + that decides where to go on disk and figure out the data and aggregate it", "tokens": + [50612, 551, 294, 264, 8149, 300, 14898, 689, 281, 352, 322, 12355, 293, 2573, 484, + 264, 1412, 293, 26118, 309, 50844], "temperature": 0.0, "avg_logprob": -0.08776487244500054, + "compression_ratio": 1.9362549800796813, "no_speech_prob": 0.0009666657424531877}, + {"id": 630, "seek": 342076, "start": 3430.36, "end": 3435.7200000000003, "text": + " altogether and return into the user needs to be aware of the selectivity of the + filter and plan", "tokens": [50844, 19051, 293, 2736, 666, 264, 4195, 2203, 281, + 312, 3650, 295, 264, 3048, 4253, 295, 264, 6608, 293, 1393, 51112], "temperature": + 0.0, "avg_logprob": -0.08776487244500054, "compression_ratio": 1.9362549800796813, + "no_speech_prob": 0.0009666657424531877}, {"id": 631, "seek": 342076, "start": 3435.7200000000003, + "end": 3441.7200000000003, "text": " that 
into the ann index again if a database + is not really serious about their vector offering they''re", "tokens": [51112, 300, + 666, 264, 364, 77, 8186, 797, 498, 257, 8149, 307, 406, 534, 3156, 466, 641, 8062, + 8745, 436, 434, 51412], "temperature": 0.0, "avg_logprob": -0.08776487244500054, + "compression_ratio": 1.9362549800796813, "no_speech_prob": 0.0009666657424531877}, + {"id": 632, "seek": 342076, "start": 3441.7200000000003, "end": 3445.5600000000004, + "text": " not doing this they''re not measuring in a production they''re not they''re + not willing to show their", "tokens": [51412, 406, 884, 341, 436, 434, 406, 13389, + 294, 257, 4265, 436, 434, 406, 436, 434, 406, 4950, 281, 855, 641, 51604], "temperature": + 0.0, "avg_logprob": -0.08776487244500054, "compression_ratio": 1.9362549800796813, + "no_speech_prob": 0.0009666657424531877}, {"id": 633, "seek": 344556, "start": 3445.56, + "end": 3452.2, "text": " users and they''re not they don''t have a full infrastructure + in place to measure the recall so", "tokens": [50364, 5022, 293, 436, 434, 406, + 436, 500, 380, 362, 257, 1577, 6896, 294, 1081, 281, 3481, 264, 9901, 370, 50696], + "temperature": 0.0, "avg_logprob": -0.16591591902182135, "compression_ratio": 1.7117647058823529, + "no_speech_prob": 0.0002247083029942587}, {"id": 634, "seek": 344556, "start": 3452.2, + "end": 3460.04, "text": " I''d say we take this extremely seriously and we don''t + want our users to have to get to get to", "tokens": [50696, 286, 1116, 584, 321, + 747, 341, 4664, 6638, 293, 321, 500, 380, 528, 527, 5022, 281, 362, 281, 483, 281, + 483, 281, 51088], "temperature": 0.0, "avg_logprob": -0.16591591902182135, "compression_ratio": + 1.7117647058823529, "no_speech_prob": 0.0002247083029942587}, {"id": 635, "seek": + 344556, "start": 3460.04, "end": 3468.92, "text": " guest this and it''s a it''s + sometimes a thankless job because because many many many many emails that we", "tokens": + [51088, 8341, 341, 293, 309, 311, 257, 
309, 311, 2171, 257, 1309, 1832, 1691, 570, + 570, 867, 867, 867, 867, 12524, 300, 321, 51532], "temperature": 0.0, "avg_logprob": + -0.16591591902182135, "compression_ratio": 1.7117647058823529, "no_speech_prob": + 0.0002247083029942587}, {"id": 636, "seek": 346892, "start": 3468.92, "end": 3476.12, + "text": " see against some of the other vector indexes have very low recall and + and how are users supposed", "tokens": [50364, 536, 1970, 512, 295, 264, 661, 8062, + 8186, 279, 362, 588, 2295, 9901, 293, 293, 577, 366, 5022, 3442, 50724], "temperature": + 0.0, "avg_logprob": -0.11930939886305067, "compression_ratio": 1.8910505836575875, + "no_speech_prob": 0.0047335028648376465}, {"id": 637, "seek": 346892, "start": 3476.12, + "end": 3481.2400000000002, "text": " to know because running these running these + tests is extremely difficult it is and it''s like it''s", "tokens": [50724, 281, + 458, 570, 2614, 613, 2614, 613, 6921, 307, 4664, 2252, 309, 307, 293, 309, 311, + 411, 309, 311, 50980], "temperature": 0.0, "avg_logprob": -0.11930939886305067, + "compression_ratio": 1.8910505836575875, "no_speech_prob": 0.0047335028648376465}, + {"id": 638, "seek": 346892, "start": 3481.2400000000002, "end": 3485.96, "text": + " as you said like you need to trust there right trust your vendor and it''s basically + like the like", "tokens": [50980, 382, 291, 848, 411, 291, 643, 281, 3361, 456, + 558, 3361, 428, 24321, 293, 309, 311, 1936, 411, 264, 411, 51216], "temperature": + 0.0, "avg_logprob": -0.11930939886305067, "compression_ratio": 1.8910505836575875, + "no_speech_prob": 0.0047335028648376465}, {"id": 639, "seek": 346892, "start": 3485.96, + "end": 3491.7200000000003, "text": " in some documentation pages you say the floor + or like the bottom line right like the needs of each", "tokens": [51216, 294, 512, + 14333, 7183, 291, 584, 264, 4123, 420, 411, 264, 2767, 1622, 558, 411, 264, 2203, + 295, 1184, 51504], "temperature": 0.0, "avg_logprob": -0.11930939886305067, 
"compression_ratio": + 1.8910505836575875, "no_speech_prob": 0.0047335028648376465}, {"id": 640, "seek": + 346892, "start": 3492.76, "end": 3497.7200000000003, "text": " it just doesn''t + make sense right if the quality isn''t there then why are you even running this", + "tokens": [51556, 309, 445, 1177, 380, 652, 2020, 558, 498, 264, 3125, 1943, 380, + 456, 550, 983, 366, 291, 754, 2614, 341, 51804], "temperature": 0.0, "avg_logprob": + -0.11930939886305067, "compression_ratio": 1.8910505836575875, "no_speech_prob": + 0.0047335028648376465}, {"id": 641, "seek": 349892, "start": 3499.4, "end": 3503.32, + "text": " it''s a difference between you know finding that product with those constraints", + "tokens": [50388, 309, 311, 257, 2649, 1296, 291, 458, 5006, 300, 1674, 365, 729, + 18491, 50584], "temperature": 0.0, "avg_logprob": -0.1803402232232495, "compression_ratio": + 1.7572463768115942, "no_speech_prob": 0.003517464967444539}, {"id": 642, "seek": + 349892, "start": 3504.12, "end": 3509.8, "text": " when it exists and actually not + finding it right therefore not buying it and so on and so on and so forth", "tokens": + [50624, 562, 309, 8198, 293, 767, 406, 5006, 309, 558, 4412, 406, 6382, 309, 293, + 370, 322, 293, 370, 322, 293, 370, 5220, 50908], "temperature": 0.0, "avg_logprob": + -0.1803402232232495, "compression_ratio": 1.7572463768115942, "no_speech_prob": + 0.003517464967444539}, {"id": 643, "seek": 349892, "start": 3510.6, "end": 3517.4, + "text": " it''s right and I think yeah I you can never guarantee a recall you can + observe what you are trying to", "tokens": [50948, 309, 311, 558, 293, 286, 519, + 1338, 286, 291, 393, 1128, 10815, 257, 9901, 291, 393, 11441, 437, 291, 366, 1382, + 281, 51288], "temperature": 0.0, "avg_logprob": -0.1803402232232495, "compression_ratio": + 1.7572463768115942, "no_speech_prob": 0.003517464967444539}, {"id": 644, "seek": + 349892, "start": 3517.4, "end": 3523.7200000000003, "text": " make it be here on + every data 
set but if you send a billion completely random vectors with 3000", "tokens": + [51288, 652, 309, 312, 510, 322, 633, 1412, 992, 457, 498, 291, 2845, 257, 5218, + 2584, 4974, 18875, 365, 20984, 51604], "temperature": 0.0, "avg_logprob": -0.1803402232232495, + "compression_ratio": 1.7572463768115942, "no_speech_prob": 0.003517464967444539}, + {"id": 645, "seek": 349892, "start": 3523.7200000000003, "end": 3528.76, "text": + " dimensions and try to query them in turbo buffer with query vectors and there + is no natural clustering", "tokens": [51604, 12819, 293, 853, 281, 14581, 552, 294, + 20902, 21762, 365, 14581, 18875, 293, 456, 307, 572, 3303, 596, 48673, 51856], "temperature": + 0.0, "avg_logprob": -0.1803402232232495, "compression_ratio": 1.7572463768115942, + "no_speech_prob": 0.003517464967444539}, {"id": 646, "seek": 352876, "start": 3528.76, + "end": 3533.88, "text": " because they''re random vectors you''re not going to get + 100% recall when you send that with a 10%", "tokens": [50364, 570, 436, 434, 4974, + 18875, 291, 434, 406, 516, 281, 483, 2319, 4, 9901, 562, 291, 2845, 300, 365, 257, + 1266, 4, 50620], "temperature": 0.0, "avg_logprob": -0.10300642147398832, "compression_ratio": + 1.7048611111111112, "no_speech_prob": 0.0007871618145145476}, {"id": 647, "seek": + 352876, "start": 3533.88, "end": 3539.48, "text": " selectivity filters that just + like completely breaks every heuristic that''s made right but all", "tokens": [50620, + 3048, 4253, 15995, 300, 445, 411, 2584, 9857, 633, 415, 374, 3142, 300, 311, 1027, + 558, 457, 439, 50900], "temperature": 0.0, "avg_logprob": -0.10300642147398832, + "compression_ratio": 1.7048611111111112, "no_speech_prob": 0.0007871618145145476}, + {"id": 648, "seek": 352876, "start": 3539.48, "end": 3544.36, "text": " data in + production real data that people want to search has some natural clustering to it + so that''s", "tokens": [50900, 1412, 294, 4265, 957, 1412, 300, 561, 528, 281, 3164, + 575, 512, 3303, 596, 
48673, 281, 309, 370, 300, 311, 51144], "temperature": 0.0, + "avg_logprob": -0.10300642147398832, "compression_ratio": 1.7048611111111112, "no_speech_prob": + 0.0007871618145145476}, {"id": 649, "seek": 352876, "start": 3544.36, "end": 3549.48, + "text": " not a real benchmark that you can evaluate recall on right and so we always + take this seriously and", "tokens": [51144, 406, 257, 957, 18927, 300, 291, 393, + 13059, 9901, 322, 558, 293, 370, 321, 1009, 747, 341, 6638, 293, 51400], "temperature": + 0.0, "avg_logprob": -0.10300642147398832, "compression_ratio": 1.7048611111111112, + "no_speech_prob": 0.0007871618145145476}, {"id": 650, "seek": 352876, "start": 3549.48, + "end": 3554.2000000000003, "text": " the in the in the in POCs and with the monitoring + we do we''re looking at these numbers all the time", "tokens": [51400, 264, 294, + 264, 294, 264, 294, 22299, 33290, 293, 365, 264, 11028, 321, 360, 321, 434, 1237, + 412, 613, 3547, 439, 264, 565, 51636], "temperature": 0.0, "avg_logprob": -0.10300642147398832, + "compression_ratio": 1.7048611111111112, "no_speech_prob": 0.0007871618145145476}, + {"id": 651, "seek": 355420, "start": 3554.2799999999997, "end": 3559.3199999999997, + "text": " but there are like absolute edge cases that can be very very difficult + and what you have to do too", "tokens": [50368, 457, 456, 366, 411, 8236, 4691, + 3331, 300, 393, 312, 588, 588, 2252, 293, 437, 291, 362, 281, 360, 886, 50620], + "temperature": 0.0, "avg_logprob": -0.0928897714256344, "compression_ratio": 2.0701107011070112, + "no_speech_prob": 0.0029162555001676083}, {"id": 652, "seek": 355420, "start": 3559.3199999999997, + "end": 3564.3599999999997, "text": " as a database vendor is like it''s a tug is + a tug of war between we''re going to look at more data to", "tokens": [50620, 382, + 257, 8149, 24321, 307, 411, 309, 311, 257, 33543, 307, 257, 33543, 295, 1516, 1296, + 321, 434, 516, 281, 574, 412, 544, 1412, 281, 50872], "temperature": 0.0, "avg_logprob": 
+ -0.0928897714256344, "compression_ratio": 2.0701107011070112, "no_speech_prob": + 0.0029162555001676083}, {"id": 653, "seek": 355420, "start": 3564.3599999999997, + "end": 3569.48, "text": " try to get high recall and we''re going to try to improve + the clustering of the data so that we", "tokens": [50872, 853, 281, 483, 1090, 9901, + 293, 321, 434, 516, 281, 853, 281, 3470, 264, 596, 48673, 295, 264, 1412, 370, 300, + 321, 51128], "temperature": 0.0, "avg_logprob": -0.0928897714256344, "compression_ratio": + 2.0701107011070112, "no_speech_prob": 0.0029162555001676083}, {"id": 654, "seek": + 355420, "start": 3569.48, "end": 3573.72, "text": " have to search less data and + so you''re always trying to improve the clustering and you''re always", "tokens": + [51128, 362, 281, 3164, 1570, 1412, 293, 370, 291, 434, 1009, 1382, 281, 3470, 264, + 596, 48673, 293, 291, 434, 1009, 51340], "temperature": 0.0, "avg_logprob": -0.0928897714256344, + "compression_ratio": 2.0701107011070112, "no_speech_prob": 0.0029162555001676083}, + {"id": 655, "seek": 355420, "start": 3573.72, "end": 3577.3999999999996, "text": + " trying to improve the performance of the database so we can look at more data + to get high recall", "tokens": [51340, 1382, 281, 3470, 264, 3389, 295, 264, 8149, + 370, 321, 393, 574, 412, 544, 1412, 281, 483, 1090, 9901, 51524], "temperature": + 0.0, "avg_logprob": -0.0928897714256344, "compression_ratio": 2.0701107011070112, + "no_speech_prob": 0.0029162555001676083}, {"id": 656, "seek": 355420, "start": 3578.04, + "end": 3583.0, "text": " yeah for sure I know that you mentioned about filters you + know challenges", "tokens": [51556, 1338, 337, 988, 286, 458, 300, 291, 2835, 466, + 15995, 291, 458, 4759, 51804], "temperature": 0.0, "avg_logprob": -0.0928897714256344, + "compression_ratio": 2.0701107011070112, "no_speech_prob": 0.0029162555001676083}, + {"id": 657, "seek": 358300, "start": 3583.08, "end": 3587.8, "text": " vegan and + I don''t know if you aware 
of those the reason an end-end benchmarks right but there + is", "tokens": [50368, 12824, 293, 286, 500, 380, 458, 498, 291, 3650, 295, 729, + 264, 1778, 364, 917, 12, 521, 43751, 558, 457, 456, 307, 50604], "temperature": + 0.0, "avg_logprob": -0.20629585872996936, "compression_ratio": 1.8195121951219513, + "no_speech_prob": 0.003529760055243969}, {"id": 658, "seek": 358300, "start": 3587.8, + "end": 3595.4, "text": " also a big end-end benchmarks that they happen to pleasure + to participate in they have one of the", "tokens": [50604, 611, 257, 955, 917, 12, + 521, 43751, 300, 436, 1051, 281, 6834, 281, 8197, 294, 436, 362, 472, 295, 264, + 50984], "temperature": 0.0, "avg_logprob": -0.20629585872996936, "compression_ratio": + 1.8195121951219513, "no_speech_prob": 0.003529760055243969}, {"id": 659, "seek": + 358300, "start": 3595.4, "end": 3602.6, "text": " datasets one of the tasks they + have is the filtered search I have not participated in that one", "tokens": [50984, + 42856, 472, 295, 264, 9608, 436, 362, 307, 264, 37111, 3164, 286, 362, 406, 17978, + 294, 300, 472, 51344], "temperature": 0.0, "avg_logprob": -0.20629585872996936, + "compression_ratio": 1.8195121951219513, "no_speech_prob": 0.003529760055243969}, + {"id": 660, "seek": 358300, "start": 3602.6, "end": 3607.0, "text": " but again + as you said it''s kind of like academic but some of the datasets are quite", "tokens": + [51344, 457, 797, 382, 291, 848, 309, 311, 733, 295, 411, 7778, 457, 512, 295, 264, + 42856, 366, 1596, 51564], "temperature": 0.0, "avg_logprob": -0.20629585872996936, + "compression_ratio": 1.8195121951219513, "no_speech_prob": 0.003529760055243969}, + {"id": 661, "seek": 360700, "start": 3607.0, "end": 3612.2, "text": " logical like + beaten points dimensions and not that huge it''s like that''s the thing there are", + "tokens": [50364, 14978, 411, 17909, 2793, 12819, 293, 406, 300, 2603, 309, 311, + 411, 300, 311, 264, 551, 456, 366, 50624], "temperature": 0.0, "avg_logprob": 
-0.22292493932387408, + "compression_ratio": 1.8522167487684729, "no_speech_prob": 0.013297075405716896}, + {"id": 662, "seek": 360700, "start": 3612.2, "end": 3618.04, "text": " 156 yeah + it''s not like yeah there are hundreds of 200 dimensions these are not real data + sets", "tokens": [50624, 2119, 21, 1338, 309, 311, 406, 411, 1338, 456, 366, 6779, + 295, 2331, 12819, 613, 366, 406, 957, 1412, 6352, 50916], "temperature": 0.0, "avg_logprob": + -0.22292493932387408, "compression_ratio": 1.8522167487684729, "no_speech_prob": + 0.013297075405716896}, {"id": 663, "seek": 360700, "start": 3618.68, "end": 3625.48, + "text": " like no there are real data sets but they are real data sets from the + past generation of vectors", "tokens": [50948, 411, 572, 456, 366, 957, 1412, 6352, + 457, 436, 366, 957, 1412, 6352, 490, 264, 1791, 5125, 295, 18875, 51288], "temperature": + 0.0, "avg_logprob": -0.22292493932387408, "compression_ratio": 1.8522167487684729, + "no_speech_prob": 0.013297075405716896}, {"id": 664, "seek": 360700, "start": 3625.48, + "end": 3633.08, "text": " right the the pre the pre current modern embedding error + right which are just scores so much", "tokens": [51288, 558, 264, 264, 659, 264, + 659, 2190, 4363, 12240, 3584, 6713, 558, 597, 366, 445, 13444, 370, 709, 51668], + "temperature": 0.0, "avg_logprob": -0.22292493932387408, "compression_ratio": 1.8522167487684729, + "no_speech_prob": 0.013297075405716896}, {"id": 665, "seek": 363308, "start": 3633.08, + "end": 3637.48, "text": " higher than these so we just don''t see people use these + yeah yeah exactly it''s still fun to", "tokens": [50364, 2946, 813, 613, 370, 321, + 445, 500, 380, 536, 561, 764, 613, 1338, 1338, 2293, 309, 311, 920, 1019, 281, 50584], + "temperature": 0.0, "avg_logprob": -0.13589160225608132, "compression_ratio": 1.8587786259541985, + "no_speech_prob": 0.007158421445637941}, {"id": 666, "seek": 363308, "start": 3637.48, + "end": 3642.04, "text": " participate in this benchmark 
by the way because the data + is there and you know the some of the", "tokens": [50584, 8197, 294, 341, 18927, + 538, 264, 636, 570, 264, 1412, 307, 456, 293, 291, 458, 264, 512, 295, 264, 50812], + "temperature": 0.0, "avg_logprob": -0.13589160225608132, "compression_ratio": 1.8587786259541985, + "no_speech_prob": 0.007158421445637941}, {"id": 667, "seek": 363308, "start": 3642.04, + "end": 3648.2, "text": " guarantees that you need to hit really high you know like + thousands of tens of thousands of queries", "tokens": [50812, 32567, 300, 291, 643, + 281, 2045, 534, 1090, 291, 458, 411, 5383, 295, 10688, 295, 5383, 295, 24109, 51120], + "temperature": 0.0, "avg_logprob": -0.13589160225608132, "compression_ratio": 1.8587786259541985, + "no_speech_prob": 0.007158421445637941}, {"id": 668, "seek": 363308, "start": 3648.2, + "end": 3655.88, "text": " per second so if you can create a toy index that works + just a proud moment I guess that''s right", "tokens": [51120, 680, 1150, 370, 498, + 291, 393, 1884, 257, 12058, 8186, 300, 1985, 445, 257, 4570, 1623, 286, 2041, 300, + 311, 558, 51504], "temperature": 0.0, "avg_logprob": -0.13589160225608132, "compression_ratio": + 1.8587786259541985, "no_speech_prob": 0.007158421445637941}, {"id": 669, "seek": + 363308, "start": 3656.12, "end": 3662.04, "text": " but I would say that people + don''t care about these bench like now people care about the benchmarks like", "tokens": + [51516, 457, 286, 576, 584, 300, 561, 500, 380, 1127, 466, 613, 10638, 411, 586, + 561, 1127, 466, 264, 43751, 411, 51812], "temperature": 0.0, "avg_logprob": -0.13589160225608132, + "compression_ratio": 1.8587786259541985, "no_speech_prob": 0.007158421445637941}, + {"id": 670, "seek": 366204, "start": 3662.04, "end": 3666.36, "text": " their fun + competitions but I think it can ruin your company if all you''re trying to do is + maximize", "tokens": [50364, 641, 1019, 26185, 457, 286, 519, 309, 393, 15514, 428, + 2237, 498, 439, 291, 434, 1382, 281, 
360, 307, 19874, 50580], "temperature": 0.0, + "avg_logprob": -0.1265374664077185, "compression_ratio": 1.751497005988024, "no_speech_prob": + 0.0005586735205724835}, {"id": 671, "seek": 366204, "start": 3666.36, "end": 3671.64, + "text": " these benchmarks because how many companies on in the world are trying + to do 10,000 QPS on a billion", "tokens": [50580, 613, 43751, 570, 577, 867, 3431, + 322, 294, 264, 1002, 366, 1382, 281, 360, 1266, 11, 1360, 1249, 6273, 322, 257, + 5218, 50844], "temperature": 0.0, "avg_logprob": -0.1265374664077185, "compression_ratio": + 1.751497005988024, "no_speech_prob": 0.0005586735205724835}, {"id": 672, "seek": + 366204, "start": 3671.64, "end": 3676.6, "text": " vectors right not that many but + there''s a lot of companies that have a billion vectors lying around", "tokens": + [50844, 18875, 558, 406, 300, 867, 457, 456, 311, 257, 688, 295, 3431, 300, 362, + 257, 5218, 18875, 8493, 926, 51092], "temperature": 0.0, "avg_logprob": -0.1265374664077185, + "compression_ratio": 1.751497005988024, "no_speech_prob": 0.0005586735205724835}, + {"id": 673, "seek": 366204, "start": 3676.6, "end": 3680.44, "text": " that they + want to search and they just don''t want the pricing to be offensive right we''re", + "tokens": [51092, 300, 436, 528, 281, 3164, 293, 436, 445, 500, 380, 528, 264, 17621, + 281, 312, 15710, 558, 321, 434, 51284], "temperature": 0.0, "avg_logprob": -0.1265374664077185, + "compression_ratio": 1.751497005988024, "no_speech_prob": 0.0005586735205724835}, + {"id": 674, "seek": 366204, "start": 3680.44, "end": 3683.56, "text": " a turbo + buffer can you do this depending on the dimensionality for like a thousand dollars + in", "tokens": [51284, 257, 20902, 21762, 393, 291, 360, 341, 5413, 322, 264, 10139, + 1860, 337, 411, 257, 4714, 3808, 294, 51440], "temperature": 0.0, "avg_logprob": + -0.1265374664077185, "compression_ratio": 1.751497005988024, "no_speech_prob": 0.0005586735205724835}, + {"id": 675, "seek": 366204, 
"start": 3683.56, "end": 3690.92, "text": " month that''s + what people really seem to care about yeah sure maybe if I ask you like a spicy + question", "tokens": [51440, 1618, 300, 311, 437, 561, 534, 1643, 281, 1127, 466, + 1338, 988, 1310, 498, 286, 1029, 291, 411, 257, 9127, 1168, 51808], "temperature": + 0.0, "avg_logprob": -0.1265374664077185, "compression_ratio": 1.751497005988024, + "no_speech_prob": 0.0005586735205724835}, {"id": 676, "seek": 369092, "start": 3690.92, + "end": 3698.12, "text": " if I may why do you think some of the vector database + players like in Dalgium cells into that game", "tokens": [50364, 498, 286, 815, + 983, 360, 291, 519, 512, 295, 264, 8062, 8149, 4150, 411, 294, 17357, 70, 2197, + 5438, 666, 300, 1216, 50724], "temperature": 0.0, "avg_logprob": -0.14926262335343796, + "compression_ratio": 1.7660550458715596, "no_speech_prob": 0.004764711018651724}, + {"id": 677, "seek": 369092, "start": 3698.12, "end": 3703.2400000000002, "text": + " of showing the benchmark and telling we are the best and then you know someone + else cuts them over", "tokens": [50724, 295, 4099, 264, 18927, 293, 3585, 321, 366, + 264, 1151, 293, 550, 291, 458, 1580, 1646, 9992, 552, 670, 50980], "temperature": + 0.0, "avg_logprob": -0.14926262335343796, "compression_ratio": 1.7660550458715596, + "no_speech_prob": 0.004764711018651724}, {"id": 678, "seek": 369092, "start": 3703.2400000000002, + "end": 3708.44, "text": " and says no you made a mistake in the benchmark why do + you think this is happening like like", "tokens": [50980, 293, 1619, 572, 291, 1027, + 257, 6146, 294, 264, 18927, 983, 360, 291, 519, 341, 307, 2737, 411, 411, 51240], + "temperature": 0.0, "avg_logprob": -0.14926262335343796, "compression_ratio": 1.7660550458715596, + "no_speech_prob": 0.004764711018651724}, {"id": 679, "seek": 369092, "start": 3708.44, + "end": 3717.0, "text": " publicly if you''re comfortable talking about this yeah + we we don''t we don''t publish benchmarks", 
"tokens": [51240, 14843, 498, 291, 434, + 4619, 1417, 466, 341, 1338, 321, 321, 500, 380, 321, 500, 380, 11374, 43751, 51668], + "temperature": 0.0, "avg_logprob": -0.14926262335343796, "compression_ratio": 1.7660550458715596, + "no_speech_prob": 0.004764711018651724}, {"id": 680, "seek": 371700, "start": 3717.0, + "end": 3723.16, "text": " against anyone else in in fact it''s it''s it''s usually + against the terms of service to do that", "tokens": [50364, 1970, 2878, 1646, 294, + 294, 1186, 309, 311, 309, 311, 309, 311, 2673, 1970, 264, 2115, 295, 2643, 281, + 360, 300, 50672], "temperature": 0.0, "avg_logprob": -0.11893197342201516, "compression_ratio": + 1.8537549407114624, "no_speech_prob": 0.015068494714796543}, {"id": 681, "seek": + 371700, "start": 3723.16, "end": 3727.96, "text": " for almost every vendor including + the big vendors like the hyper scalers probably shouldn''t be", "tokens": [50672, + 337, 1920, 633, 24321, 3009, 264, 955, 22056, 411, 264, 9848, 15664, 433, 1391, + 4659, 380, 312, 50912], "temperature": 0.0, "avg_logprob": -0.11893197342201516, + "compression_ratio": 1.8537549407114624, "no_speech_prob": 0.015068494714796543}, + {"id": 682, "seek": 371700, "start": 3728.76, "end": 3732.84, "text": " it probably + should be prohibited for the hyper scalers for like any competitive reasons or", + "tokens": [50952, 309, 1391, 820, 312, 32069, 337, 264, 9848, 15664, 433, 337, 411, + 604, 10043, 4112, 420, 51156], "temperature": 0.0, "avg_logprob": -0.11893197342201516, + "compression_ratio": 1.8537549407114624, "no_speech_prob": 0.015068494714796543}, + {"id": 683, "seek": 371700, "start": 3732.84, "end": 3737.8, "text": " but anyway + for the peers I think it''s it''s like a low blow because everyone can sort of p-hack", + "tokens": [51156, 457, 4033, 337, 264, 16739, 286, 519, 309, 311, 309, 311, 411, + 257, 2295, 6327, 570, 1518, 393, 1333, 295, 280, 12, 71, 501, 51404], "temperature": + 0.0, "avg_logprob": -0.11893197342201516, 
"compression_ratio": 1.8537549407114624, + "no_speech_prob": 0.015068494714796543}, {"id": 684, "seek": 371700, "start": 3737.8, + "end": 3742.12, "text": " their way to something where they they go better and becomes + like month throwing and it''s very", "tokens": [51404, 641, 636, 281, 746, 689, + 436, 436, 352, 1101, 293, 3643, 411, 1618, 10238, 293, 309, 311, 588, 51620], "temperature": + 0.0, "avg_logprob": -0.11893197342201516, "compression_ratio": 1.8537549407114624, + "no_speech_prob": 0.015068494714796543}, {"id": 685, "seek": 374212, "start": 3742.12, + "end": 3747.96, "text": " distracting activity um we benchmark ourselves in ways + that we find that our customers are actually", "tokens": [50364, 36689, 5191, 1105, + 321, 18927, 4175, 294, 2098, 300, 321, 915, 300, 527, 4581, 366, 767, 50656], "temperature": + 0.0, "avg_logprob": -0.10454317375465676, "compression_ratio": 1.8104089219330854, + "no_speech_prob": 0.0021337273065000772}, {"id": 686, "seek": 374212, "start": 3747.96, + "end": 3752.04, "text": " using the database so we''re not doing it at 10,000 qps + because it''s just not what we see to a single", "tokens": [50656, 1228, 264, 8149, + 370, 321, 434, 406, 884, 309, 412, 1266, 11, 1360, 9505, 1878, 570, 309, 311, 445, + 406, 437, 321, 536, 281, 257, 2167, 50860], "temperature": 0.0, "avg_logprob": -0.10454317375465676, + "compression_ratio": 1.8104089219330854, "no_speech_prob": 0.0021337273065000772}, + {"id": 687, "seek": 374212, "start": 3752.04, "end": 3758.52, "text": " namespace + um so we benchmark against ourselves we benchmark against first principles and we''re + always", "tokens": [50860, 5288, 17940, 1105, 370, 321, 18927, 1970, 4175, 321, + 18927, 1970, 700, 9156, 293, 321, 434, 1009, 51184], "temperature": 0.0, "avg_logprob": + -0.10454317375465676, "compression_ratio": 1.8104089219330854, "no_speech_prob": + 0.0021337273065000772}, {"id": 688, "seek": 374212, "start": 3758.52, "end": 3762.3599999999997, + "text": " 
considering what is the gap between what turbo puffer does and first principles + there''s", "tokens": [51184, 8079, 437, 307, 264, 7417, 1296, 437, 20902, 19613, + 260, 775, 293, 700, 9156, 456, 311, 51376], "temperature": 0.0, "avg_logprob": -0.10454317375465676, + "compression_ratio": 1.8104089219330854, "no_speech_prob": 0.0021337273065000772}, + {"id": 689, "seek": 374212, "start": 3763.96, "end": 3767.7999999999997, "text": + " that''s what I''ve learned that''s why I do napkin math is because the fundamental + thing you should", "tokens": [51456, 300, 311, 437, 286, 600, 3264, 300, 311, 983, + 286, 360, 9296, 5843, 5221, 307, 570, 264, 8088, 551, 291, 820, 51648], "temperature": + 0.0, "avg_logprob": -0.10454317375465676, "compression_ratio": 1.8104089219330854, + "no_speech_prob": 0.0021337273065000772}, {"id": 690, "seek": 376780, "start": 3767.8, + "end": 3773.5600000000004, "text": " be benchmarking against our first principles + there''s a gap between what the DRAM or disband with it", "tokens": [50364, 312, + 18927, 278, 1970, 527, 700, 9156, 456, 311, 257, 7417, 1296, 437, 264, 12118, 2865, + 420, 717, 4235, 365, 309, 50652], "temperature": 0.0, "avg_logprob": -0.10652859729269276, + "compression_ratio": 1.8014440433212997, "no_speech_prob": 0.0008137811091728508}, + {"id": 691, "seek": 376780, "start": 3773.5600000000004, "end": 3779.48, "text": + " is multiplied by how much if it you need and your database is not doing that well + then you either", "tokens": [50652, 307, 17207, 538, 577, 709, 498, 309, 291, 643, + 293, 428, 8149, 307, 406, 884, 300, 731, 550, 291, 2139, 50948], "temperature": + 0.0, "avg_logprob": -0.10652859729269276, "compression_ratio": 1.8014440433212997, + "no_speech_prob": 0.0008137811091728508}, {"id": 692, "seek": 376780, "start": 3779.48, + "end": 3784.92, "text": " have a gap and you''re understanding or you have a you''ve + found room for improvement that''s what matters", "tokens": [50948, 362, 257, 7417, + 293, 291, 
434, 3701, 420, 291, 362, 257, 291, 600, 1352, 1808, 337, 10444, 300, + 311, 437, 7001, 51220], "temperature": 0.0, "avg_logprob": -0.10652859729269276, + "compression_ratio": 1.8014440433212997, "no_speech_prob": 0.0008137811091728508}, + {"id": 693, "seek": 376780, "start": 3784.92, "end": 3788.84, "text": " and of course + it also matters what other people are doing but what matters the most is what your", + "tokens": [51220, 293, 295, 1164, 309, 611, 7001, 437, 661, 561, 366, 884, 457, + 437, 7001, 264, 881, 307, 437, 428, 51416], "temperature": 0.0, "avg_logprob": -0.10652859729269276, + "compression_ratio": 1.8014440433212997, "no_speech_prob": 0.0008137811091728508}, + {"id": 694, "seek": 376780, "start": 3788.84, "end": 3794.04, "text": " customers + are trying to do and they''ll they''ll pull you in that direction so we think that + this is a", "tokens": [51416, 4581, 366, 1382, 281, 360, 293, 436, 603, 436, 603, + 2235, 291, 294, 300, 3513, 370, 321, 519, 300, 341, 307, 257, 51676], "temperature": + 0.0, "avg_logprob": -0.10652859729269276, "compression_ratio": 1.8014440433212997, + "no_speech_prob": 0.0008137811091728508}, {"id": 695, "seek": 379404, "start": 3794.04, + "end": 3798.52, "text": " this easily becomes one of these metrics where you know + if you give people a metric they''ll optimize for", "tokens": [50364, 341, 3612, + 3643, 472, 295, 613, 16367, 689, 291, 458, 498, 291, 976, 561, 257, 20678, 436, + 603, 19719, 337, 50588], "temperature": 0.0, "avg_logprob": -0.09159499406814575, + "compression_ratio": 1.9528619528619529, "no_speech_prob": 0.0017855166224762797}, + {"id": 696, "seek": 379404, "start": 3798.52, "end": 3803.88, "text": " it um and + benchmarks of how many qps you can do at some number recall it''s just not what + people", "tokens": [50588, 309, 1105, 293, 43751, 295, 577, 867, 9505, 1878, 291, + 393, 360, 412, 512, 1230, 9901, 309, 311, 445, 406, 437, 561, 50856], "temperature": + 0.0, "avg_logprob": -0.09159499406814575, 
"compression_ratio": 1.9528619528619529, + "no_speech_prob": 0.0017855166224762797}, {"id": 697, "seek": 379404, "start": 3803.88, + "end": 3808.44, "text": " care about um they care about it working they care about + enormous ride throughput they care about", "tokens": [50856, 1127, 466, 1105, 436, + 1127, 466, 309, 1364, 436, 1127, 466, 11322, 5077, 44629, 436, 1127, 466, 51084], + "temperature": 0.0, "avg_logprob": -0.09159499406814575, "compression_ratio": 1.9528619528619529, + "no_speech_prob": 0.0017855166224762797}, {"id": 698, "seek": 379404, "start": 3808.44, + "end": 3813.4, "text": " costs they care about other things necessarily that are + much harder to put in such a benchmark um", "tokens": [51084, 5497, 436, 1127, 466, + 661, 721, 4725, 300, 366, 709, 6081, 281, 829, 294, 1270, 257, 18927, 1105, 51332], + "temperature": 0.0, "avg_logprob": -0.09159499406814575, "compression_ratio": 1.9528619528619529, + "no_speech_prob": 0.0017855166224762797}, {"id": 699, "seek": 379404, "start": 3813.4, + "end": 3817.56, "text": " I think benchmarks are important like we need to give + people a sense of what they should expect", "tokens": [51332, 286, 519, 43751, 366, + 1021, 411, 321, 643, 281, 976, 561, 257, 2020, 295, 437, 436, 820, 2066, 51540], + "temperature": 0.0, "avg_logprob": -0.09159499406814575, "compression_ratio": 1.9528619528619529, + "no_speech_prob": 0.0017855166224762797}, {"id": 700, "seek": 379404, "start": 3817.56, + "end": 3822.04, "text": " and they should hold us truth at that and what I would + love to have is like more absurd", "tokens": [51540, 293, 436, 820, 1797, 505, 3494, + 412, 300, 293, 437, 286, 576, 959, 281, 362, 307, 411, 544, 19774, 51764], "temperature": + 0.0, "avg_logprob": -0.09159499406814575, "compression_ratio": 1.9528619528619529, + "no_speech_prob": 0.0017855166224762797}, {"id": 701, "seek": 382204, "start": 3822.04, + "end": 3825.96, "text": " ability in the turbo puffer product of like what kind + of like 
performance are you seeing um", "tokens": [50364, 3485, 294, 264, 20902, + 19613, 260, 1674, 295, 411, 437, 733, 295, 411, 3389, 366, 291, 2577, 1105, 50560], + "temperature": 0.0, "avg_logprob": -0.18909495452354694, "compression_ratio": 1.8884615384615384, + "no_speech_prob": 0.03186819702386856}, {"id": 702, "seek": 382204, "start": 3825.96, + "end": 3831.16, "text": " we''re working on you know explaining um our exposing + query plans from turbo puffer right so you can see", "tokens": [50560, 321, 434, + 1364, 322, 291, 458, 13468, 1105, 527, 33178, 14581, 5482, 490, 20902, 19613, 260, + 558, 370, 291, 393, 536, 50820], "temperature": 0.0, "avg_logprob": -0.18909495452354694, + "compression_ratio": 1.8884615384615384, "no_speech_prob": 0.03186819702386856}, + {"id": 703, "seek": 382204, "start": 3831.16, "end": 3837.64, "text": " um well + what''s what''s causing the indexing uh sorry what''s causing the the query latency + to be what it is", "tokens": [50820, 1105, 731, 437, 311, 437, 311, 9853, 264, 8186, + 278, 2232, 2597, 437, 311, 9853, 264, 264, 14581, 27043, 281, 312, 437, 309, 307, + 51144], "temperature": 0.0, "avg_logprob": -0.18909495452354694, "compression_ratio": + 1.8884615384615384, "no_speech_prob": 0.03186819702386856}, {"id": 704, "seek": + 382204, "start": 3837.64, "end": 3842.2, "text": " so yeah I don''t think the mud + throwing is great um I think that at some point someone''s going to", "tokens": + [51144, 370, 1338, 286, 500, 380, 519, 264, 8933, 10238, 307, 869, 1105, 286, 519, + 300, 412, 512, 935, 1580, 311, 516, 281, 51372], "temperature": 0.0, "avg_logprob": + -0.18909495452354694, "compression_ratio": 1.8884615384615384, "no_speech_prob": + 0.03186819702386856}, {"id": 705, "seek": 382204, "start": 3842.2, "end": 3847.72, + "text": " publish a benchmark with turbo puffer and um and and again something else + and and then we''ll", "tokens": [51372, 11374, 257, 18927, 365, 20902, 19613, 260, + 293, 1105, 293, 293, 797, 746, 1646, 
293, 293, 550, 321, 603, 51648], "temperature": + 0.0, "avg_logprob": -0.18909495452354694, "compression_ratio": 1.8884615384615384, + "no_speech_prob": 0.03186819702386856}, {"id": 706, "seek": 384772, "start": 3847.7999999999997, + "end": 3852.7599999999998, "text": " have to deal with that as it comes right um + by um but it''s certainly not an activity that we plan", "tokens": [50368, 362, + 281, 2028, 365, 300, 382, 309, 1487, 558, 1105, 538, 1105, 457, 309, 311, 3297, + 406, 364, 5191, 300, 321, 1393, 50616], "temperature": 0.0, "avg_logprob": -0.1483280616894103, + "compression_ratio": 1.7234042553191489, "no_speech_prob": 0.007801888510584831}, + {"id": 707, "seek": 384772, "start": 3852.7599999999998, "end": 3858.2, "text": + " to engage in yeah I love your answer because it also resonates with me like in + a different dimension", "tokens": [50616, 281, 4683, 294, 1338, 286, 959, 428, 1867, + 570, 309, 611, 41051, 365, 385, 411, 294, 257, 819, 10139, 50888], "temperature": + 0.0, "avg_logprob": -0.1483280616894103, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.007801888510584831}, {"id": 708, "seek": 384772, "start": 3858.2, + "end": 3863.72, "text": " you know I found myself in a situation when at some point + in the past when um we''ve been copy", "tokens": [50888, 291, 458, 286, 1352, 2059, + 294, 257, 2590, 562, 412, 512, 935, 294, 264, 1791, 562, 1105, 321, 600, 668, 5055, + 51164], "temperature": 0.0, "avg_logprob": -0.1483280616894103, "compression_ratio": + 1.7234042553191489, "no_speech_prob": 0.007801888510584831}, {"id": 709, "seek": + 384772, "start": 3864.7599999999998, "end": 3869.08, "text": " copycatted I can''t + be say in that way so there was a company that really literally copied the whole", + "tokens": [51216, 5055, 66, 32509, 286, 393, 380, 312, 584, 294, 300, 636, 370, + 456, 390, 257, 2237, 300, 534, 3736, 25365, 264, 1379, 51432], "temperature": 0.0, + "avg_logprob": -0.1483280616894103, "compression_ratio": 
1.7234042553191489, "no_speech_prob": + 0.007801888510584831}, {"id": 710, "seek": 384772, "start": 3869.08, "end": 3874.52, + "text": " interface and it like how how the product looks and we felt threatened + but what they couldn''t", "tokens": [51432, 9226, 293, 309, 411, 577, 577, 264, + 1674, 1542, 293, 321, 2762, 18268, 457, 437, 436, 2809, 380, 51704], "temperature": + 0.0, "avg_logprob": -0.1483280616894103, "compression_ratio": 1.7234042553191489, + "no_speech_prob": 0.007801888510584831}, {"id": 711, "seek": 387452, "start": 3874.52, + "end": 3880.36, "text": " copy is essentially the internal IP right all the algorithms + everything where we''ve spent hard", "tokens": [50364, 5055, 307, 4476, 264, 6920, + 8671, 558, 439, 264, 14642, 1203, 689, 321, 600, 4418, 1152, 50656], "temperature": + 0.0, "avg_logprob": -0.1465372870950138, "compression_ratio": 1.7105263157894737, + "no_speech_prob": 0.0010933134471997619}, {"id": 712, "seek": 387452, "start": 3881.0, + "end": 3887.96, "text": " hard uh working time on you know they couldn''t copy that + and and effectively that doesn''t fly", "tokens": [50688, 1152, 2232, 1364, 565, + 322, 291, 458, 436, 2809, 380, 5055, 300, 293, 293, 8659, 300, 1177, 380, 3603, + 51036], "temperature": 0.0, "avg_logprob": -0.1465372870950138, "compression_ratio": + 1.7105263157894737, "no_speech_prob": 0.0010933134471997619}, {"id": 713, "seek": + 387452, "start": 3887.96, "end": 3893.96, "text": " by itself right so so basically + what I''m trying to say is that you know even though it felt threatening", "tokens": + [51036, 538, 2564, 558, 370, 370, 1936, 437, 286, 478, 1382, 281, 584, 307, 300, + 291, 458, 754, 1673, 309, 2762, 20768, 51336], "temperature": 0.0, "avg_logprob": + -0.1465372870950138, "compression_ratio": 1.7105263157894737, "no_speech_prob": + 0.0010933134471997619}, {"id": 714, "seek": 387452, "start": 3894.6, "end": 3901.56, + "text": " still thinking about what you need to solve right by the laws of physics + 
you really need to focus", "tokens": [51368, 920, 1953, 466, 437, 291, 643, 281, + 5039, 558, 538, 264, 6064, 295, 10649, 291, 534, 643, 281, 1879, 51716], "temperature": + 0.0, "avg_logprob": -0.1465372870950138, "compression_ratio": 1.7105263157894737, + "no_speech_prob": 0.0010933134471997619}, {"id": 715, "seek": 390156, "start": 3901.56, + "end": 3906.44, "text": " just on that and if you solve that you become the leader + of the market and that''s what happened", "tokens": [50364, 445, 322, 300, 293, + 498, 291, 5039, 300, 291, 1813, 264, 5263, 295, 264, 2142, 293, 300, 311, 437, 2011, + 50608], "temperature": 0.0, "avg_logprob": -0.12110464572906494, "compression_ratio": + 1.9606299212598426, "no_speech_prob": 0.010753949172794819}, {"id": 716, "seek": + 390156, "start": 3906.44, "end": 3912.6, "text": " to the company actually it the + story was that it actually acquired this copycat right and that''s it", "tokens": + [50608, 281, 264, 2237, 767, 309, 264, 1657, 390, 300, 309, 767, 17554, 341, 5055, + 18035, 558, 293, 300, 311, 309, 50916], "temperature": 0.0, "avg_logprob": -0.12110464572906494, + "compression_ratio": 1.9606299212598426, "no_speech_prob": 0.010753949172794819}, + {"id": 717, "seek": 390156, "start": 3913.16, "end": 3917.4, "text": " uh I mean + it doesn''t mean that''s the bad thing bad outcome for either of them but what I''m + trying to", "tokens": [50944, 2232, 286, 914, 309, 1177, 380, 914, 300, 311, 264, + 1578, 551, 1578, 9700, 337, 2139, 295, 552, 457, 437, 286, 478, 1382, 281, 51156], + "temperature": 0.0, "avg_logprob": -0.12110464572906494, "compression_ratio": 1.9606299212598426, + "no_speech_prob": 0.010753949172794819}, {"id": 718, "seek": 390156, "start": 3917.4, + "end": 3922.52, "text": " say is that just focus on that on that thing that you''re + trying to solve and don''t try to indulge", "tokens": [51156, 584, 307, 300, 445, + 1879, 322, 300, 322, 300, 551, 300, 291, 434, 1382, 281, 5039, 293, 500, 380, 853, + 281, 
28626, 432, 51412], "temperature": 0.0, "avg_logprob": -0.12110464572906494, + "compression_ratio": 1.9606299212598426, "no_speech_prob": 0.010753949172794819}, + {"id": 719, "seek": 390156, "start": 3922.52, "end": 3929.24, "text": " into these + you know games of like you said you know not throwing and stuff I like that really + well said", "tokens": [51412, 666, 613, 291, 458, 2813, 295, 411, 291, 848, 291, + 458, 406, 10238, 293, 1507, 286, 411, 300, 534, 731, 848, 51748], "temperature": + 0.0, "avg_logprob": -0.12110464572906494, "compression_ratio": 1.9606299212598426, + "no_speech_prob": 0.010753949172794819}, {"id": 720, "seek": 392924, "start": 3929.24, + "end": 3936.2, "text": " yeah that''s I think I think so we focus on on customer + studies we focus on we focus on on", "tokens": [50364, 1338, 300, 311, 286, 519, + 286, 519, 370, 321, 1879, 322, 322, 5474, 5313, 321, 1879, 322, 321, 1879, 322, + 322, 50712], "temperature": 0.0, "avg_logprob": -0.1164558905142325, "compression_ratio": + 2.0837004405286343, "no_speech_prob": 0.0023051046300679445}, {"id": 721, "seek": + 392924, "start": 3936.2, "end": 3941.7999999999997, "text": " first principles we + focus on benchmarking and we focus on on what are customers telling you", "tokens": + [50712, 700, 9156, 321, 1879, 322, 18927, 278, 293, 321, 1879, 322, 322, 437, 366, + 4581, 3585, 291, 50992], "temperature": 0.0, "avg_logprob": -0.1164558905142325, + "compression_ratio": 2.0837004405286343, "no_speech_prob": 0.0023051046300679445}, + {"id": 722, "seek": 392924, "start": 3941.7999999999997, "end": 3947.16, "text": + " telling us that they need and I think that um I think those are the right things + to focus on for our", "tokens": [50992, 3585, 505, 300, 436, 643, 293, 286, 519, + 300, 1105, 286, 519, 729, 366, 264, 558, 721, 281, 1879, 322, 337, 527, 51260], + "temperature": 0.0, "avg_logprob": -0.1164558905142325, "compression_ratio": 2.0837004405286343, + "no_speech_prob": 0.0023051046300679445}, 
{"id": 723, "seek": 392924, "start": 3947.16, + "end": 3953.08, "text": " company for sure and and and just looking at the clientele + right the the ones that you shared", "tokens": [51260, 2237, 337, 988, 293, 293, + 293, 445, 1237, 412, 264, 6423, 16884, 558, 264, 264, 2306, 300, 291, 5507, 51556], + "temperature": 0.0, "avg_logprob": -0.1164558905142325, "compression_ratio": 2.0837004405286343, + "no_speech_prob": 0.0023051046300679445}, {"id": 724, "seek": 392924, "start": 3953.08, + "end": 3958.04, "text": " just just knowing those names cursor and and notion that + everyone is pretty much using every day", "tokens": [51556, 445, 445, 5276, 729, + 5288, 28169, 293, 293, 10710, 300, 1518, 307, 1238, 709, 1228, 633, 786, 51804], + "temperature": 0.0, "avg_logprob": -0.1164558905142325, "compression_ratio": 2.0837004405286343, + "no_speech_prob": 0.0023051046300679445}, {"id": 725, "seek": 395804, "start": 3958.04, + "end": 3963.8, "text": " that''s like a testament to what you''ve done um I also + wanted to ask you before we close I wanted to", "tokens": [50364, 300, 311, 411, + 257, 35499, 281, 437, 291, 600, 1096, 1105, 286, 611, 1415, 281, 1029, 291, 949, + 321, 1998, 286, 1415, 281, 50652], "temperature": 0.0, "avg_logprob": -0.14637079546528478, + "compression_ratio": 1.695067264573991, "no_speech_prob": 0.0015682813245803118}, + {"id": 726, "seek": 395804, "start": 3963.8, "end": 3971.4, "text": " ask you about + what are the maybe technical or business or some other challenges that you see", + "tokens": [50652, 1029, 291, 466, 437, 366, 264, 1310, 6191, 420, 1606, 420, 512, + 661, 4759, 300, 291, 536, 51032], "temperature": 0.0, "avg_logprob": -0.14637079546528478, + "compression_ratio": 1.695067264573991, "no_speech_prob": 0.0015682813245803118}, + {"id": 727, "seek": 395804, "start": 3971.4, "end": 3975.48, "text": " ahead of + yourself or maybe that''s what already is happening and you see that it''s important", + "tokens": [51032, 2286, 295, 1803, 
420, 1310, 300, 311, 437, 1217, 307, 2737, 293, + 291, 536, 300, 309, 311, 1021, 51236], "temperature": 0.0, "avg_logprob": -0.14637079546528478, + "compression_ratio": 1.695067264573991, "no_speech_prob": 0.0015682813245803118}, + {"id": 728, "seek": 395804, "start": 3976.36, "end": 3984.7599999999998, "text": + " especially in this space of LLAMS where LLAMS can bring value um what is it it + feels like you", "tokens": [51280, 2318, 294, 341, 1901, 295, 441, 43, 2865, 50, + 689, 441, 43, 2865, 50, 393, 1565, 2158, 1105, 437, 307, 309, 309, 3417, 411, 291, + 51700], "temperature": 0.0, "avg_logprob": -0.14637079546528478, "compression_ratio": + 1.695067264573991, "no_speech_prob": 0.0015682813245803118}, {"id": 729, "seek": + 398476, "start": 3985.0, "end": 3990.6000000000004, "text": " you have wildly successful + as a business and as a technology but is there something that you see", "tokens": + [50376, 291, 362, 4868, 356, 4406, 382, 257, 1606, 293, 382, 257, 2899, 457, 307, + 456, 746, 300, 291, 536, 50656], "temperature": 0.0, "avg_logprob": -0.18548973216566927, + "compression_ratio": 1.5545454545454545, "no_speech_prob": 0.002473148750141263}, + {"id": 730, "seek": 398476, "start": 3991.1600000000003, "end": 3994.84, "text": + " is still unsolved and and is ahead of you and worth solving", "tokens": [50684, + 307, 920, 2693, 29110, 293, 293, 307, 2286, 295, 291, 293, 3163, 12606, 50868], + "temperature": 0.0, "avg_logprob": -0.18548973216566927, "compression_ratio": 1.5545454545454545, + "no_speech_prob": 0.002473148750141263}, {"id": 731, "seek": 398476, "start": 3999.88, + "end": 4006.2000000000003, "text": " I''ll go back to a you know I spent a long + time at Shopify and so part of growing up at that", "tokens": [51120, 286, 603, + 352, 646, 281, 257, 291, 458, 286, 4418, 257, 938, 565, 412, 43991, 293, 370, 644, + 295, 4194, 493, 412, 300, 51436], "temperature": 0.0, "avg_logprob": -0.18548973216566927, + "compression_ratio": 1.5545454545454545, 
"no_speech_prob": 0.002473148750141263}, + {"id": 732, "seek": 398476, "start": 4006.2000000000003, "end": 4011.1600000000003, + "text": " company from when I was very young was being taught a bit in the school + of Toby Lukay the CEO", "tokens": [51436, 2237, 490, 562, 286, 390, 588, 2037, 390, + 885, 5928, 257, 857, 294, 264, 1395, 295, 40223, 34992, 320, 264, 9282, 51684], + "temperature": 0.0, "avg_logprob": -0.18548973216566927, "compression_ratio": 1.5545454545454545, + "no_speech_prob": 0.002473148750141263}, {"id": 733, "seek": 401116, "start": 4011.24, + "end": 4018.2, "text": " and something that he often said is that you know you about + himself is like you have to grow to keep", "tokens": [50368, 293, 746, 300, 415, + 2049, 848, 307, 300, 291, 458, 291, 466, 3647, 307, 411, 291, 362, 281, 1852, 281, + 1066, 50716], "temperature": 0.0, "avg_logprob": -0.11925489344495407, "compression_ratio": + 1.6933333333333334, "no_speech_prob": 0.0027155608404427767}, {"id": 734, "seek": + 401116, "start": 4018.2, "end": 4024.68, "text": " up with the business and that''s + that''s what it is for me as well right I um first had to grow as an", "tokens": + [50716, 493, 365, 264, 1606, 293, 300, 311, 300, 311, 437, 309, 307, 337, 385, 382, + 731, 558, 286, 1105, 700, 632, 281, 1852, 382, 364, 51040], "temperature": 0.0, + "avg_logprob": -0.11925489344495407, "compression_ratio": 1.6933333333333334, "no_speech_prob": + 0.0027155608404427767}, {"id": 735, "seek": 401116, "start": 4024.68, "end": 4030.04, + "text": " engineer to put out the first version um then I had to build an engineering + team to take it much", "tokens": [51040, 11403, 281, 829, 484, 264, 700, 3037, 1105, + 550, 286, 632, 281, 1322, 364, 7043, 1469, 281, 747, 309, 709, 51308], "temperature": + 0.0, "avg_logprob": -0.11925489344495407, "compression_ratio": 1.6933333333333334, + "no_speech_prob": 0.0027155608404427767}, {"id": 736, "seek": 401116, "start": 4030.04, + "end": 4035.56, "text": " further 
than I ever could alone and I think we have and + just an absolutely like 99%", "tokens": [51308, 3052, 813, 286, 1562, 727, 3312, + 293, 286, 519, 321, 362, 293, 445, 364, 3122, 411, 11803, 4, 51584], "temperature": + 0.0, "avg_logprob": -0.11925489344495407, "compression_ratio": 1.6933333333333334, + "no_speech_prob": 0.0027155608404427767}, {"id": 737, "seek": 403556, "start": 4035.56, + "end": 4042.12, "text": " college engineering team now then I turned my focus to + sales and learning that and now I turn", "tokens": [50364, 3859, 7043, 1469, 586, + 550, 286, 3574, 452, 1879, 281, 5763, 293, 2539, 300, 293, 586, 286, 1261, 50692], + "temperature": 0.0, "avg_logprob": -0.12436038835913735, "compression_ratio": 2.0, + "no_speech_prob": 0.0060279155150055885}, {"id": 738, "seek": 403556, "start": 4042.68, + "end": 4047.32, "text": " now I have to turn it towards marketing towards legal + towards all these different things to build", "tokens": [50720, 586, 286, 362, 281, + 1261, 309, 3030, 6370, 3030, 5089, 3030, 439, 613, 819, 721, 281, 1322, 50952], + "temperature": 0.0, "avg_logprob": -0.12436038835913735, "compression_ratio": 2.0, + "no_speech_prob": 0.0060279155150055885}, {"id": 739, "seek": 403556, "start": 4047.32, + "end": 4053.72, "text": " the company I we spoke a little bit about this about this + before um about I think that one of the", "tokens": [50952, 264, 2237, 286, 321, + 7179, 257, 707, 857, 466, 341, 466, 341, 949, 1105, 466, 286, 519, 300, 472, 295, + 264, 51272], "temperature": 0.0, "avg_logprob": -0.12436038835913735, "compression_ratio": + 2.0, "no_speech_prob": 0.0060279155150055885}, {"id": 740, "seek": 403556, "start": + 4053.72, "end": 4058.84, "text": " beliefs that we have is just the town density + of the team and I don''t think there''s a lot of I", "tokens": [51272, 13585, 300, + 321, 362, 307, 445, 264, 3954, 10305, 295, 264, 1469, 293, 286, 500, 380, 519, 456, + 311, 257, 688, 295, 286, 51528], "temperature": 0.0, "avg_logprob": 
-0.12436038835913735, + "compression_ratio": 2.0, "no_speech_prob": 0.0060279155150055885}, {"id": 741, + "seek": 403556, "start": 4058.84, "end": 4063.72, "text": " think that a lot of + people talk a lot about the town density and I think that there is um now a", "tokens": + [51528, 519, 300, 257, 688, 295, 561, 751, 257, 688, 466, 264, 3954, 10305, 293, + 286, 519, 300, 456, 307, 1105, 586, 257, 51772], "temperature": 0.0, "avg_logprob": + -0.12436038835913735, "compression_ratio": 2.0, "no_speech_prob": 0.0060279155150055885}, + {"id": 742, "seek": 406372, "start": 4063.72, "end": 4068.9199999999996, "text": + " generation of companies that''s really trying to do it um I think that um the + tools that we now", "tokens": [50364, 5125, 295, 3431, 300, 311, 534, 1382, 281, + 360, 309, 1105, 286, 519, 300, 1105, 264, 3873, 300, 321, 586, 50624], "temperature": + 0.0, "avg_logprob": -0.06795200434598056, "compression_ratio": 1.755656108597285, + "no_speech_prob": 0.0014239277224987745}, {"id": 743, "seek": 406372, "start": 4068.9199999999996, + "end": 4076.2, "text": " have available to us and especially the kind of tool that + we work on every day um the floor for", "tokens": [50624, 362, 2435, 281, 505, 293, + 2318, 264, 733, 295, 2290, 300, 321, 589, 322, 633, 786, 1105, 264, 4123, 337, 50988], + "temperature": 0.0, "avg_logprob": -0.06795200434598056, "compression_ratio": 1.755656108597285, + "no_speech_prob": 0.0014239277224987745}, {"id": 744, "seek": 406372, "start": 4076.2, + "end": 4082.4399999999996, "text": " productivity has been raised but the ceiling + has been raised far more and so what really matters to", "tokens": [50988, 15604, + 575, 668, 6005, 457, 264, 13655, 575, 668, 6005, 1400, 544, 293, 370, 437, 534, + 7001, 281, 51300], "temperature": 0.0, "avg_logprob": -0.06795200434598056, "compression_ratio": + 1.755656108597285, "no_speech_prob": 0.0014239277224987745}, {"id": 745, "seek": + 406372, "start": 4082.4399999999996, "end": 
4088.4399999999996, "text": " us is + having a team of individuals that where everyone is is a player right we see these + teams as", "tokens": [51300, 505, 307, 1419, 257, 1469, 295, 5346, 300, 689, 1518, + 307, 307, 257, 4256, 558, 321, 536, 613, 5491, 382, 51600], "temperature": 0.0, + "avg_logprob": -0.06795200434598056, "compression_ratio": 1.755656108597285, "no_speech_prob": + 0.0014239277224987745}, {"id": 746, "seek": 408844, "start": 4088.44, "end": 4094.04, + "text": " a symbol of almost more like sports teams today um then how companies + were originally uh originally", "tokens": [50364, 257, 5986, 295, 1920, 544, 411, + 6573, 5491, 965, 1105, 550, 577, 3431, 645, 7993, 2232, 7993, 50644], "temperature": + 0.0, "avg_logprob": -0.09398909538022933, "compression_ratio": 1.8211009174311927, + "no_speech_prob": 0.0013162157265469432}, {"id": 747, "seek": 408844, "start": 4094.04, + "end": 4101.4, "text": " built um and I think that we we hold that as a as a strong + belief in how we we um we are building", "tokens": [50644, 3094, 1105, 293, 286, + 519, 300, 321, 321, 1797, 300, 382, 257, 382, 257, 2068, 7107, 294, 577, 321, 321, + 1105, 321, 366, 2390, 51012], "temperature": 0.0, "avg_logprob": -0.09398909538022933, + "compression_ratio": 1.8211009174311927, "no_speech_prob": 0.0013162157265469432}, + {"id": 748, "seek": 408844, "start": 4101.4, "end": 4107.08, "text": " the company + but it demands a lot from everyone to work in this way but it''s very fun um and + I think", "tokens": [51012, 264, 2237, 457, 309, 15107, 257, 688, 490, 1518, 281, + 589, 294, 341, 636, 457, 309, 311, 588, 1019, 1105, 293, 286, 519, 51296], "temperature": + 0.0, "avg_logprob": -0.09398909538022933, "compression_ratio": 1.8211009174311927, + "no_speech_prob": 0.0013162157265469432}, {"id": 749, "seek": 408844, "start": 4107.08, + "end": 4112.84, "text": " that the the growth that that embodies on everyone including + myself is important and I have to keep", "tokens": [51296, 
300, 264, 264, 4599, + 300, 300, 4605, 6087, 322, 1518, 3009, 2059, 307, 1021, 293, 286, 362, 281, 1066, + 51584], "temperature": 0.0, "avg_logprob": -0.09398909538022933, "compression_ratio": + 1.8211009174311927, "no_speech_prob": 0.0013162157265469432}, {"id": 750, "seek": + 411284, "start": 4112.84, "end": 4118.4400000000005, "text": " up with that I have + to keep up with the demand of of of how our customers and our team internally", + "tokens": [50364, 493, 365, 300, 286, 362, 281, 1066, 493, 365, 264, 4733, 295, + 295, 295, 577, 527, 4581, 293, 527, 1469, 19501, 50644], "temperature": 0.0, "avg_logprob": + -0.0971376895904541, "compression_ratio": 1.7644444444444445, "no_speech_prob": + 0.004712470807135105}, {"id": 751, "seek": 411284, "start": 4119.24, "end": 4124.360000000001, + "text": " and everything grows and that''s that''s the biggest challenge is just + the amount of new that has to", "tokens": [50684, 293, 1203, 13156, 293, 300, 311, + 300, 311, 264, 3880, 3430, 307, 445, 264, 2372, 295, 777, 300, 575, 281, 50940], + "temperature": 0.0, "avg_logprob": -0.0971376895904541, "compression_ratio": 1.7644444444444445, + "no_speech_prob": 0.004712470807135105}, {"id": 752, "seek": 411284, "start": 4124.360000000001, + "end": 4130.52, "text": " be learned um so that we can become a successful company + which is is important for me for our customers", "tokens": [50940, 312, 3264, 1105, + 370, 300, 321, 393, 1813, 257, 4406, 2237, 597, 307, 307, 1021, 337, 385, 337, 527, + 4581, 51248], "temperature": 0.0, "avg_logprob": -0.0971376895904541, "compression_ratio": + 1.7644444444444445, "no_speech_prob": 0.004712470807135105}, {"id": 753, "seek": + 411284, "start": 4130.52, "end": 4135.4800000000005, "text": " and for everyone + who''s chosen to come along for the ride and join the company. 
Oh that''s awesome", + "tokens": [51248, 293, 337, 1518, 567, 311, 8614, 281, 808, 2051, 337, 264, 5077, + 293, 3917, 264, 2237, 13, 876, 300, 311, 3476, 51496], "temperature": 0.0, "avg_logprob": + -0.0971376895904541, "compression_ratio": 1.7644444444444445, "no_speech_prob": + 0.004712470807135105}, {"id": 754, "seek": 413548, "start": 4135.48, "end": 4143.16, + "text": " yeah this field changes so quickly um it it felt much slower when I was + coding myself you know", "tokens": [50364, 1338, 341, 2519, 2962, 370, 2661, 1105, + 309, 309, 2762, 709, 14009, 562, 286, 390, 17720, 2059, 291, 458, 50748], "temperature": + 0.0, "avg_logprob": -0.12982624901665582, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.010721574537456036}, {"id": 755, "seek": 413548, "start": 4143.16, + "end": 4149.32, "text": " Java you seen all that stuff you you had like solar elastic + search that''s it for like long time", "tokens": [50748, 10745, 291, 1612, 439, + 300, 1507, 291, 291, 632, 411, 7936, 17115, 3164, 300, 311, 309, 337, 411, 938, + 565, 51056], "temperature": 0.0, "avg_logprob": -0.12982624901665582, "compression_ratio": + 1.6122448979591837, "no_speech_prob": 0.010721574537456036}, {"id": 756, "seek": + 413548, "start": 4149.32, "end": 4156.44, "text": " and then a lot of new engines + popped up especially when vector search appeared on the scene but now with", "tokens": + [51056, 293, 550, 257, 688, 295, 777, 12982, 21545, 493, 2318, 562, 8062, 3164, + 8516, 322, 264, 4145, 457, 586, 365, 51412], "temperature": 0.0, "avg_logprob": + -0.12982624901665582, "compression_ratio": 1.6122448979591837, "no_speech_prob": + 0.010721574537456036}, {"id": 757, "seek": 413548, "start": 4156.44, "end": 4162.36, + "text": " the LEM advancements and all of that it just feels so crazy so yeah it''s + very interesting challenge", "tokens": [51412, 264, 441, 6683, 7295, 1117, 293, + 439, 295, 300, 309, 445, 3417, 370, 3219, 370, 1338, 309, 311, 588, 1880, 3430, + 51708], 
"temperature": 0.0, "avg_logprob": -0.12982624901665582, "compression_ratio": + 1.6122448979591837, "no_speech_prob": 0.010721574537456036}, {"id": 758, "seek": + 416236, "start": 4162.36, "end": 4170.04, "text": " for sure you know personally + and business wise and and team wise and yeah and and keeping balance", "tokens": + [50364, 337, 988, 291, 458, 5665, 293, 1606, 10829, 293, 293, 1469, 10829, 293, + 1338, 293, 293, 5145, 4772, 50748], "temperature": 0.0, "avg_logprob": -0.14645883930263234, + "compression_ratio": 1.6333333333333333, "no_speech_prob": 0.0028533500153571367}, + {"id": 759, "seek": 416236, "start": 4170.04, "end": 4177.08, "text": " is another + one. I think the pace of the pace that we see everyone running at now in the successful", + "tokens": [50748, 307, 1071, 472, 13, 286, 519, 264, 11638, 295, 264, 11638, 300, + 321, 536, 1518, 2614, 412, 586, 294, 264, 4406, 51100], "temperature": 0.0, "avg_logprob": + -0.14645883930263234, "compression_ratio": 1.6333333333333333, "no_speech_prob": + 0.0028533500153571367}, {"id": 760, "seek": 416236, "start": 4177.08, "end": 4185.639999999999, + "text": " companies is beyond anything that I''ve seen before um it reminds me of + of just the the the months", "tokens": [51100, 3431, 307, 4399, 1340, 300, 286, + 600, 1612, 949, 1105, 309, 12025, 385, 295, 295, 445, 264, 264, 264, 2493, 51528], + "temperature": 0.0, "avg_logprob": -0.14645883930263234, "compression_ratio": 1.6333333333333333, + "no_speech_prob": 0.0028533500153571367}, {"id": 761, "seek": 418564, "start": 4185.64, + "end": 4193.88, "text": " leading up to black friday at Shopify but it''s all the + time and I love it I''m addicted to that pace", "tokens": [50364, 5775, 493, 281, + 2211, 431, 4708, 412, 43991, 457, 309, 311, 439, 264, 565, 293, 286, 959, 309, 286, + 478, 24629, 281, 300, 11638, 50776], "temperature": 0.0, "avg_logprob": -0.09048097783868964, + "compression_ratio": 1.6995708154506437, "no_speech_prob": 0.012471274472773075}, 
+ {"id": 762, "seek": 418564, "start": 4193.88, "end": 4199.8, "text": " and I think + that we have created a team of people who seek intensity and that''s exactly what + we", "tokens": [50776, 293, 286, 519, 300, 321, 362, 2942, 257, 1469, 295, 561, + 567, 8075, 13749, 293, 300, 311, 2293, 437, 321, 51072], "temperature": 0.0, "avg_logprob": + -0.09048097783868964, "compression_ratio": 1.6995708154506437, "no_speech_prob": + 0.012471274472773075}, {"id": 763, "seek": 418564, "start": 4199.8, "end": 4204.68, + "text": " think that we need to create the right product at a pace that makes sense + for also our customers", "tokens": [51072, 519, 300, 321, 643, 281, 1884, 264, 558, + 1674, 412, 257, 11638, 300, 1669, 2020, 337, 611, 527, 4581, 51316], "temperature": + 0.0, "avg_logprob": -0.09048097783868964, "compression_ratio": 1.6995708154506437, + "no_speech_prob": 0.012471274472773075}, {"id": 764, "seek": 418564, "start": 4204.68, + "end": 4209.320000000001, "text": " so the day are never bottled next on us and + that is what keeps me up at night. Oh that''s awesome that''s", "tokens": [51316, + 370, 264, 786, 366, 1128, 2274, 1493, 958, 322, 505, 293, 300, 307, 437, 5965, 385, + 493, 412, 1818, 13, 876, 300, 311, 3476, 300, 311, 51548], "temperature": 0.0, "avg_logprob": + -0.09048097783868964, "compression_ratio": 1.6995708154506437, "no_speech_prob": + 0.012471274472773075}, {"id": 765, "seek": 420932, "start": 4210.2, "end": 4217.719999999999, + "text": " great cause. 
We usually end with some sort of announcement anything you + want to say to the audience", "tokens": [50408, 869, 3082, 13, 492, 2673, 917, 365, + 512, 1333, 295, 12847, 1340, 291, 528, 281, 584, 281, 264, 4034, 50784], "temperature": + 0.0, "avg_logprob": -0.17218775159857247, "compression_ratio": 1.7022222222222223, + "no_speech_prob": 0.005553343333303928}, {"id": 766, "seek": 420932, "start": 4217.719999999999, + "end": 4222.36, "text": " especially now that you said that you want to go deeper + into marketing it beats your chance", "tokens": [50784, 2318, 586, 300, 291, 848, + 300, 291, 528, 281, 352, 7731, 666, 6370, 309, 16447, 428, 2931, 51016], "temperature": + 0.0, "avg_logprob": -0.17218775159857247, "compression_ratio": 1.7022222222222223, + "no_speech_prob": 0.005553343333303928}, {"id": 767, "seek": 420932, "start": 4224.12, + "end": 4231.16, "text": " anything that you want to share all call for. I think + that we we we we''ve refrained from doing any", "tokens": [51104, 1340, 300, 291, + 528, 281, 2073, 439, 818, 337, 13, 286, 519, 300, 321, 321, 321, 321, 600, 1895, + 31774, 490, 884, 604, 51456], "temperature": 0.0, "avg_logprob": -0.17218775159857247, + "compression_ratio": 1.7022222222222223, "no_speech_prob": 0.005553343333303928}, + {"id": 768, "seek": 420932, "start": 4231.16, "end": 4237.16, "text": " large releases + and we try to just ship as rapidly as we possibly can if I look at the change", + "tokens": [51456, 2416, 16952, 293, 321, 853, 281, 445, 5374, 382, 12910, 382, 321, + 6264, 393, 498, 286, 574, 412, 264, 1319, 51756], "temperature": 0.0, "avg_logprob": + -0.17218775159857247, "compression_ratio": 1.7022222222222223, "no_speech_prob": + 0.005553343333303928}, {"id": 769, "seek": 423716, "start": 4237.16, "end": 4242.36, + "text": " log this month it''s um I mean we we launched to we launched the region + the Singapore Canada", "tokens": [50364, 3565, 341, 1618, 309, 311, 1105, 286, 914, + 321, 321, 8730, 281, 321, 8730, 264, 
4458, 264, 14491, 6309, 50624], "temperature": + 0.0, "avg_logprob": -0.1832933177118716, "compression_ratio": 1.7491039426523298, + "no_speech_prob": 0.06415931135416031}, {"id": 770, "seek": 423716, "start": 4242.36, + "end": 4249.24, "text": " we''ve added the float tight uh we''ve we''ve recently + added clients for Java Go Ruby um one of the things", "tokens": [50624, 321, 600, + 3869, 264, 15706, 4524, 2232, 321, 600, 321, 600, 3938, 3869, 6982, 337, 10745, + 1037, 19907, 1105, 472, 295, 264, 721, 50968], "temperature": 0.0, "avg_logprob": + -0.1832933177118716, "compression_ratio": 1.7491039426523298, "no_speech_prob": + 0.06415931135416031}, {"id": 771, "seek": 423716, "start": 4249.24, "end": 4255.08, + "text": " that I think is is really exciting is our conditional rights and this + is where turbo puffer is", "tokens": [50968, 300, 286, 519, 307, 307, 534, 4670, + 307, 527, 27708, 4601, 293, 341, 307, 689, 20902, 19613, 260, 307, 51260], "temperature": + 0.0, "avg_logprob": -0.1832933177118716, "compression_ratio": 1.7491039426523298, + "no_speech_prob": 0.06415931135416031}, {"id": 772, "seek": 423716, "start": 4255.08, + "end": 4261.4, "text": " not just a bunch of files on s3 it''s not even just a search + engine it could do conditional rights", "tokens": [51260, 406, 445, 257, 3840, 295, + 7098, 322, 262, 18, 309, 311, 406, 754, 445, 257, 3164, 2848, 309, 727, 360, 27708, + 4601, 51576], "temperature": 0.0, "avg_logprob": -0.1832933177118716, "compression_ratio": + 1.7491039426523298, "no_speech_prob": 0.06415931135416031}, {"id": 773, "seek": + 423716, "start": 4261.4, "end": 4266.92, "text": " where you can say hey I only + want to replace this document if it''s newer than the old version right", "tokens": + [51576, 689, 291, 393, 584, 4177, 286, 787, 528, 281, 7406, 341, 4166, 498, 309, + 311, 17628, 813, 264, 1331, 3037, 558, 51852], "temperature": 0.0, "avg_logprob": + -0.1832933177118716, "compression_ratio": 1.7491039426523298, 
"no_speech_prob": + 0.06415931135416031}, {"id": 774, "seek": 426716, "start": 4267.8, "end": 4272.599999999999, + "text": " these are real database features and things like patch right where you + do a partial update but we", "tokens": [50396, 613, 366, 957, 8149, 4122, 293, 721, + 411, 9972, 558, 689, 291, 360, 257, 14641, 5623, 457, 321, 50636], "temperature": + 0.0, "avg_logprob": -0.13236265346921725, "compression_ratio": 1.8164794007490637, + "no_speech_prob": 0.005810700356960297}, {"id": 775, "seek": 426716, "start": 4272.599999999999, + "end": 4278.5199999999995, "text": " don''t we we just we just launched and then + we put it on put it on x and we move on um so I don''t", "tokens": [50636, 500, + 380, 321, 321, 445, 321, 445, 8730, 293, 550, 321, 829, 309, 322, 829, 309, 322, + 2031, 293, 321, 1286, 322, 1105, 370, 286, 500, 380, 50932], "temperature": 0.0, + "avg_logprob": -0.13236265346921725, "compression_ratio": 1.8164794007490637, "no_speech_prob": + 0.005810700356960297}, {"id": 776, "seek": 426716, "start": 4278.5199999999995, + "end": 4282.5199999999995, "text": " have any big announcement we went GA a couple + a couple months ago it would have luck to have done that", "tokens": [50932, 362, + 604, 955, 12847, 321, 1437, 22841, 257, 1916, 257, 1916, 2493, 2057, 309, 576, 362, + 3668, 281, 362, 1096, 300, 51132], "temperature": 0.0, "avg_logprob": -0.13236265346921725, + "compression_ratio": 1.8164794007490637, "no_speech_prob": 0.005810700356960297}, + {"id": 777, "seek": 426716, "start": 4282.5199999999995, "end": 4288.04, "text": + " but we just tried to ship an announcement just get it out there as soon as possible + move on", "tokens": [51132, 457, 321, 445, 3031, 281, 5374, 364, 12847, 445, 483, + 309, 484, 456, 382, 2321, 382, 1944, 1286, 322, 51408], "temperature": 0.0, "avg_logprob": + -0.13236265346921725, "compression_ratio": 1.8164794007490637, "no_speech_prob": + 0.005810700356960297}, {"id": 778, "seek": 426716, "start": 4288.04, 
"end": 4292.68, + "text": " and ship the next thing. Yeah congratulations on Jay on your original + launch but on Jay I think", "tokens": [51408, 293, 5374, 264, 958, 551, 13, 865, + 13568, 322, 11146, 322, 428, 3380, 4025, 457, 322, 11146, 286, 519, 51640], "temperature": + 0.0, "avg_logprob": -0.13236265346921725, "compression_ratio": 1.8164794007490637, + "no_speech_prob": 0.005810700356960297}, {"id": 779, "seek": 429268, "start": 4292.68, + "end": 4299.240000000001, "text": " it''s a big milestone as well and as you said + you''re probably not as sort of transactional anymore", "tokens": [50364, 309, 311, + 257, 955, 28048, 382, 731, 293, 382, 291, 848, 291, 434, 1391, 406, 382, 1333, 295, + 46688, 1966, 3602, 50692], "temperature": 0.0, "avg_logprob": -0.0870864137690118, + "compression_ratio": 1.6352459016393444, "no_speech_prob": 0.010361993685364723}, + {"id": 780, "seek": 429268, "start": 4299.240000000001, "end": 4304.4400000000005, + "text": " you just keep shipping and you follow what the customers need and but + sometimes some of these things", "tokens": [50692, 291, 445, 1066, 14122, 293, 291, + 1524, 437, 264, 4581, 643, 293, 457, 2171, 512, 295, 613, 721, 50952], "temperature": + 0.0, "avg_logprob": -0.0870864137690118, "compression_ratio": 1.6352459016393444, + "no_speech_prob": 0.010361993685364723}, {"id": 781, "seek": 429268, "start": 4304.4400000000005, + "end": 4310.6, "text": " may go and notice unless you know people follow exactly + what you do and so in that sense I feel like", "tokens": [50952, 815, 352, 293, + 3449, 5969, 291, 458, 561, 1524, 2293, 437, 291, 360, 293, 370, 294, 300, 2020, + 286, 841, 411, 51260], "temperature": 0.0, "avg_logprob": -0.0870864137690118, "compression_ratio": + 1.6352459016393444, "no_speech_prob": 0.010361993685364723}, {"id": 782, "seek": + 429268, "start": 4310.6, "end": 4317.8, "text": " there is a room or stage for saying + hey guys go go use it it''s GA right run your benchmark. 
I think", "tokens": [51260, + 456, 307, 257, 1808, 420, 3233, 337, 1566, 4177, 1074, 352, 352, 764, 309, 309, + 311, 22841, 558, 1190, 428, 18927, 13, 286, 519, 51620], "temperature": 0.0, "avg_logprob": + -0.0870864137690118, "compression_ratio": 1.6352459016393444, "no_speech_prob": + 0.010361993685364723}, {"id": 783, "seek": 431780, "start": 4318.76, "end": 4323.320000000001, + "text": " now do they think about it been more I think one announcement might be + that early and", "tokens": [50412, 586, 360, 436, 519, 466, 309, 668, 544, 286, + 519, 472, 12847, 1062, 312, 300, 2440, 293, 50640], "temperature": 0.0, "avg_logprob": + -0.14729079746064686, "compression_ratio": 1.7804878048780488, "no_speech_prob": + 0.0030768937431275845}, {"id": 784, "seek": 431780, "start": 4323.320000000001, + "end": 4328.6, "text": " unintervaled puffer''s history we were very focused on + doing many namespaces that were small", "tokens": [50640, 517, 5106, 3337, 292, + 19613, 260, 311, 2503, 321, 645, 588, 5178, 322, 884, 867, 5288, 79, 2116, 300, + 645, 1359, 50904], "temperature": 0.0, "avg_logprob": -0.14729079746064686, "compression_ratio": + 1.7804878048780488, "no_speech_prob": 0.0030768937431275845}, {"id": 785, "seek": + 431780, "start": 4329.16, "end": 4335.96, "text": " but we are getting very good + at large namespaces now and we have customers that are searching", "tokens": [50932, + 457, 321, 366, 1242, 588, 665, 412, 2416, 5288, 79, 2116, 586, 293, 321, 362, 4581, + 300, 366, 10808, 51272], "temperature": 0.0, "avg_logprob": -0.14729079746064686, + "compression_ratio": 1.7804878048780488, "no_speech_prob": 0.0030768937431275845}, + {"id": 786, "seek": 431780, "start": 4335.96, "end": 4343.72, "text": " billions + of vectors at once and we have customers that want to search hundreds of billions + of", "tokens": [51272, 17375, 295, 18875, 412, 1564, 293, 321, 362, 4581, 300, 528, + 281, 3164, 6779, 295, 17375, 295, 51660], "temperature": 0.0, "avg_logprob": 
-0.14729079746064686, + "compression_ratio": 1.7804878048780488, "no_speech_prob": 0.0030768937431275845}, + {"id": 787, "seek": 434372, "start": 4343.72, "end": 4350.4400000000005, "text": + " vectors all at once and we are working with them on that and this is not particularly + scary anymore", "tokens": [50364, 18875, 439, 412, 1564, 293, 321, 366, 1364, 365, + 552, 322, 300, 293, 341, 307, 406, 4098, 6958, 3602, 50700], "temperature": 0.0, + "avg_logprob": -0.1229084079915827, "compression_ratio": 1.706140350877193, "no_speech_prob": + 0.0020374376326799393}, {"id": 788, "seek": 434372, "start": 4350.4400000000005, + "end": 4355.8, "text": " you know exactly what we need to get there so if you have + use cases of that caliber you may have", "tokens": [50700, 291, 458, 2293, 437, + 321, 643, 281, 483, 456, 370, 498, 291, 362, 764, 3331, 295, 300, 41946, 291, 815, + 362, 50968], "temperature": 0.0, "avg_logprob": -0.1229084079915827, "compression_ratio": + 1.706140350877193, "no_speech_prob": 0.0020374376326799393}, {"id": 789, "seek": + 434372, "start": 4355.8, "end": 4361.400000000001, "text": " passed by turbo puffer + before but we''re getting ready and we are ready for hundreds of million", "tokens": + [50968, 4678, 538, 20902, 19613, 260, 949, 457, 321, 434, 1242, 1919, 293, 321, + 366, 1919, 337, 6779, 295, 2459, 51248], "temperature": 0.0, "avg_logprob": -0.1229084079915827, + "compression_ratio": 1.706140350877193, "no_speech_prob": 0.0020374376326799393}, + {"id": 790, "seek": 434372, "start": 4362.2, "end": 4368.4400000000005, "text": + " and billions at once the only limitation there is the is really just the size + of a single machine", "tokens": [51288, 293, 17375, 412, 1564, 264, 787, 27432, + 456, 307, 264, 307, 534, 445, 264, 2744, 295, 257, 2167, 3479, 51600], "temperature": + 0.0, "avg_logprob": -0.1229084079915827, "compression_ratio": 1.706140350877193, + "no_speech_prob": 0.0020374376326799393}, {"id": 791, "seek": 436844, "start": 
4368.5199999999995, + "end": 4373.799999999999, "text": " and then we shard over them but I think back + to the sharding we had before you need to make every", "tokens": [50368, 293, 550, + 321, 402, 515, 670, 552, 457, 286, 519, 646, 281, 264, 402, 515, 278, 321, 632, + 949, 291, 643, 281, 652, 633, 50632], "temperature": 0.0, "avg_logprob": -0.16148722672662816, + "compression_ratio": 1.7608695652173914, "no_speech_prob": 0.007752406410872936}, + {"id": 792, "seek": 436844, "start": 4373.799999999999, "end": 4377.96, "text": + " shard as large as possible to get the best economics and the best performance + and that''s been one", "tokens": [50632, 402, 515, 382, 2416, 382, 1944, 281, 483, + 264, 1151, 14564, 293, 264, 1151, 3389, 293, 300, 311, 668, 472, 50840], "temperature": + 0.0, "avg_logprob": -0.16148722672662816, "compression_ratio": 1.7608695652173914, + "no_speech_prob": 0.007752406410872936}, {"id": 793, "seek": 436844, "start": 4377.96, + "end": 4382.44, "text": " of the issues with some of the traditional story tangents. + Yeah for sure. 
Yeah I really enjoyed", "tokens": [50840, 295, 264, 2663, 365, 512, + 295, 264, 5164, 1657, 10266, 791, 13, 865, 337, 988, 13, 865, 286, 534, 4626, 51064], + "temperature": 0.0, "avg_logprob": -0.16148722672662816, "compression_ratio": 1.7608695652173914, + "no_speech_prob": 0.007752406410872936}, {"id": 794, "seek": 436844, "start": 4382.44, + "end": 4387.16, "text": " the convo I know we could have gone and so many topics + like I really wanted to ask you also about", "tokens": [51064, 264, 416, 3080, 286, + 458, 321, 727, 362, 2780, 293, 370, 867, 8378, 411, 286, 534, 1415, 281, 1029, 291, + 611, 466, 51300], "temperature": 0.0, "avg_logprob": -0.16148722672662816, "compression_ratio": + 1.7608695652173914, "no_speech_prob": 0.007752406410872936}, {"id": 795, "seek": + 436844, "start": 4387.16, "end": 4395.24, "text": " an an algorithms and stuff but + I also I feel like we could talk more later as well you know down", "tokens": [51300, + 364, 364, 14642, 293, 1507, 457, 286, 611, 286, 841, 411, 321, 727, 751, 544, 1780, + 382, 731, 291, 458, 760, 51704], "temperature": 0.0, "avg_logprob": -0.16148722672662816, + "compression_ratio": 1.7608695652173914, "no_speech_prob": 0.007752406410872936}, + {"id": 796, "seek": 439524, "start": 4395.24, "end": 4400.679999999999, "text": + " the road as you guys are progressing hopefully you''ll be open to that but I''ve + learned a ton and", "tokens": [50364, 264, 3060, 382, 291, 1074, 366, 36305, 4696, + 291, 603, 312, 1269, 281, 300, 457, 286, 600, 3264, 257, 2952, 293, 50636], "temperature": + 0.0, "avg_logprob": -0.13507414999462308, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.004269792232662439}, {"id": 797, "seek": 439524, "start": 4400.679999999999, + "end": 4407.32, "text": " it''s very interesting designed that you have and and + the whole journey of you pushing for four", "tokens": [50636, 309, 311, 588, 1880, + 4761, 300, 291, 362, 293, 293, 264, 1379, 4671, 295, 291, 7380, 337, 1451, 50968], + 
"temperature": 0.0, "avg_logprob": -0.13507414999462308, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.004269792232662439}, {"id": 798, "seek": 439524, "start": 4407.32, + "end": 4413.88, "text": " months you know and interrupted I hope you now regain + some of the balance back to your life", "tokens": [50968, 2493, 291, 458, 293, 30329, + 286, 1454, 291, 586, 35336, 512, 295, 264, 4772, 646, 281, 428, 993, 51296], "temperature": + 0.0, "avg_logprob": -0.13507414999462308, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.004269792232662439}, {"id": 799, "seek": 439524, "start": 4414.599999999999, + "end": 4419.8, "text": " now that you have the team supporting you but I really + enjoyed this conversation Simon thank you", "tokens": [51332, 586, 300, 291, 362, + 264, 1469, 7231, 291, 457, 286, 534, 4626, 341, 3761, 13193, 1309, 291, 51592], + "temperature": 0.0, "avg_logprob": -0.13507414999462308, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.004269792232662439}, {"id": 800, "seek": 441980, "start": 4419.8, + "end": 4429.8, "text": " so much for your time thank you Dimitri", "tokens": [50364, + 370, 709, 337, 428, 565, 1309, 291, 20975, 270, 470, 50864], "temperature": 0.0, + "avg_logprob": -0.48959115835336536, "compression_ratio": 0.8863636363636364, "no_speech_prob": + 0.09183885902166367}]' +--- + +Now, let's get started. MAPKIN Problem 4 Today, as you are preparing your organic, high mountain type needs along in the kitchen net. +One of your lovely co-workers mentioned that they were looking at adding more radices because it was maxing out at 10,000 commands per second, which they were trending aggressively towards. You asked them how they were using it. +Were they writing some obscure order and command? They would BPF probes to determine that it was all get key and set key value. They also confirmed all the values were about less than 64 bytes. 
For those unfamiliar with Redis, it's a single-threaded, in-memory key-value store written in C.
+Unfazed after this encounter, you walk to the window. You look out and sip your high-mountain Tieguanyin oolong. As you stare at yet another condominium building being built, it hits you. 10,000 commands per second. 10,000.
+Isn't that a bit low? Shouldn't something that's fundamentally just doing random memory reads and writes over an established TCP session be able to do more? Hello there. Vector Podcast is back. Season 4.
+And we are kicking off with an exciting topic and guest: Simon Eskildsen, CEO of TurboPuffer. I've been watching you guys almost from the start, just following each other on Twitter, like virtual friends.
+And it's funny that before this episode, you're the CEO of the company, and for this episode, you tried to sell TurboPuffer to me and said, hey, why don't you use it? Did you make a compelling pitch that should pass? Yeah, should pass, for sure. But tell me — hey, welcome, first of all, welcome.
+And thank you very much for being here with me. It's a tradition to usually start with the background. If you could speak in your own words about yourself, your journey.
+I know that you've worked at Shopify at some point, also scaling databases, I guess, right? But I've also been following your Napkin Math newsletter. I was reading it; maybe I'll quote some text from there today, just to amuse and excite the audience. But tell me about yourself.
+Yeah, I can give a very brief overview, and then we can dig into anything, if there's anything that stands out. I started programming when I was a teenager. Similar to you, English is not my first language.
+So at some point, I exhausted the Danish web and then, like, dove into video game addiction for three years as a teenager, to learn enough English to sort of, you know, get my own ChatGPT moment and takeoff point. 
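The napkin problem above invites a quick sanity check. The per-command costs below are illustrative order-of-magnitude assumptions (not measurements of Redis itself), but they show why 10,000 commands per second looks low for small GET/SET operations on a single core:

```python
# Napkin check: is 10,000 commands/sec low for a single-threaded,
# in-memory key-value store? Rough, illustrative numbers only.

CMDS_PER_SEC = 10_000
budget_us = 1_000_000 / CMDS_PER_SEC     # time budget per command, microseconds

# Assumed per-command costs (order-of-magnitude guesses, not benchmarks):
syscall_us = 2 * 2.0   # one read() + one write() on the established TCP socket
parse_us = 1.0         # parse "GET key" / "SET key value" (<64 B values)
memory_us = 0.5        # hash-table lookup plus copying a small value
spent_us = syscall_us + parse_us + memory_us

headroom = budget_us / spent_us
print(f"budget per command: {budget_us:.0f} us")
print(f"estimated work:     {spent_us:.1f} us")
print(f"headroom:           ~{headroom:.0f}x")
```

Even with generous syscall overhead, the estimated work per command is a handful of microseconds against a 100 µs budget, suggesting an order of magnitude of headroom — which is the intuition the problem is nudging you toward.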
+And then I spent a lot of time in high school being not very good at competitive programming, but good enough to qualify for the small country of Denmark. And then I spent almost a decade working at Shopify doing mainly infrastructure work.
+So when I joined Shopify and the infrastructure team — I mean, it was not even an infrastructure team; DevOps was just becoming a thing — we were driving just a couple hundred requests per second. And by the time I left, we saw peaks of more than a million.
+And I more or less worked on all of the stateful systems that power that, because they generally tend to be the bottleneck, just playing whack-a-mole every single year for every Black Friday, for many years. And I spent the majority of those years on one of the last-resort pagers for Shopify as well.
+Those pages are very scary in the middle of the night, and a lot of GMV of course runs through Shopify. So very high responsibility on that. I left in 2021 and kind of jumped around at my friends' companies, helping them with various things.
+I had spent almost my entire career at one company, so I wanted to dabble and just go and basically help my friends with any infrastructure challenges that they had. And in 2023, when ChatGPT launched and the APIs launched, I was working with my friends at this company called Readwise.
+They have a product similar to Pocket and others, for reading articles later. A phenomenal product. And they asked me to build a recommendation feature for articles. And I was like, well, it's perfect, right? Embedding models are basically just LLMs with their heads chopped off.
+And they're trained on exactly this data. So we built something, and it actually worked pretty well for just recommending articles. But then I ran the back-of-the-envelope math on what it would cost to do this for the entire article catalog, right? It had hundreds of millions of articles. 
+And it would have cost more than 30 grand a month to do. And for a large company that's not a big deal for an experiment. But this was a company that was spending three grand a month on a Postgres instance that, prior to working on this, I had tuned.
+And spending 10 times that on just recommendations and possibly search was just untenable. So it was stopped. It was stopped in its tracks. And it was a bit sad.
+And it sort of ended up in that bucket that a lot of companies have of, like, okay, we're going to work on this when it becomes cheaper, and then we'll ship this feature. But it was a bit sad, because I was excited about this feature, as a user of the product as well.
+And I could not stop thinking about that. Why was it so expensive? And the vector databases at the time were storing everything in memory. And DRAM on a cloud costs somewhere between two and five dollars per gigabyte. And the economics of this just didn't line up.
+It wasn't that these vector databases were doing anything, you know, malicious in their pricing. They're just trying to earn a margin on memory pricing. But memory pricing was just too high, and it stopped this feature in its tracks.
+And what I couldn't stop thinking about is, why can't we do all of this on top of object storage, right? Like, we just put it on object storage. That's the source of truth. And when we actually need some piece of data, we put it in memory, or even on disk if we can.
+And I did the napkin math. And I was like, I think that's about a hundred times cheaper. And of course, that would have been a no-brainer for Readwise.
+We would have just bought it and started using it and tried it out, right? And maybe put way more data in, and maybe worked our way up to that 30-grand-a-month bill, but with a different workload. And so, yeah, I couldn't stop thinking about it.
+And then eventually started writing the first version over the summer of 2023. 
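That back-of-the-envelope math can be reconstructed. The article count, embedding dimensions, and prices below are assumptions for illustration (DRAM at the quoted $2–5 per gigabyte-month, object storage at roughly $0.02), not Readwise's actual numbers, but they reproduce the "about a hundred times cheaper" order of magnitude:

```python
# Rough cost of serving vector search over a large article catalog,
# comparing all-in-DRAM hosting vs. object storage as the source of truth.
# All inputs are illustrative assumptions.

articles = 300_000_000           # "hundreds of millions of articles"
dims, bytes_per_float = 1536, 4  # a common embedding shape: 1536 x float32
gb = articles * dims * bytes_per_float / 1e9

dram_cost = gb * 3.0             # ~$3/GB-month, middle of the quoted $2-5 range
s3_cost = gb * 0.02              # ~$0.02/GB-month for object storage

print(f"embedding data: {gb:,.0f} GB")
print(f"in DRAM:        ${dram_cost:,.0f}/month")
print(f"object storage: ${s3_cost:,.0f}/month (~{dram_cost / s3_cost:.0f}x cheaper)")
```

With these assumed inputs the raw DRAM bill alone lands in the thousands of dollars per month before replication or vendor margin, while the object-storage copy costs tens of dollars — a roughly 150x gap, consistent with the claim in the conversation.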
Just me alone in the wilds of Canada, and then launched it in October of 2023, which is probably where you saw it. I didn't really tell anyone about it. I was just hacking away.
+I launched it. I did a lot of R&D over that summer — insights, some of which are still in the product and a lot of which we've since phased out. But the most important thing was that it launched.
+And the first version of TurboPuffer — I was just looking at the website the other day for an unrelated reason — didn't have mutable indexes. So you just wrote to it, and then you called an index endpoint, and then it was locked in, like, that's it. And it didn't have any SDKs.
+It was just a big pure-HTML website. But it was enough to ship it, and it caught the attention at the time of the Cursor team, back in 2023. And of course, this was early on for Cursor. It was early on for us.
+And their vector database bill did not line up with their per-user economics and how they wanted to use RAG in Cursor. And so they wanted to try to work together.
+And we exchanged a bunch of emails of bullet points, and it was very clear that they thought that this architecture was exactly right — knowing the team now, they had just sat down at the dining table, done the napkin math over there, and then thought, why hasn't anyone built it like this?
+ And so we worked — I went to San Francisco and spent some time with them, and came up with a bunch of features that they would need, and called the best engineer that I knew from Shopify, my co-founder Justin, and asked if he'd come on board, because I think, I think maybe there's something here. 
+ And yeah, we launched it, Cursor moved over, and their bill was reduced by 95%. And of course, the traditional storage architecture they were on before didn't make sense for Cursor's economics, but our storage architecture really did, because you put all the code-base embeddings on S3, and then the ones that are actively being used we can have in RAM or have on disk.
+I'll stop there, but that would be what led up to this moment. Oh, that's an amazing journey.
+ A lot to ask, of course, a lot of questions, but just on that Cursor thing: as I told you before we started recording, you know, I knew about you launching this, working on this, and then I listened to the Lex Fridman podcast episode with the Cursor team, and they mentioned TurboPuffer sort of like in passing, but, you know, I think that also probably created a lot of attention for you guys. But I'm just curious, like, how did you get together? Did you know someone on the Cursor team that you could, like, partner with early on? And essentially they kind of, like, helped you to pioneer it, right, in some sense becoming the first client, or maybe future client, right? How did you approach them?
+They did. I mean, they were a design partner in every sense of the word, right? We had a Slack channel, and I feel like they treated us as part of their team and we treated them as part of our team. 
+ They came inbound. They sent an email based on the website, and they said, hey, we would need mutable indexes and glob and a couple of other things. So it's like, well, that's a very reasonable request, right? And I think they had the conviction that this was the right architecture, and that if we could prove it and earn their trust, then we'd be in a good place. So it was really just an honest conversation, just the way that the website is today: a very honest description of what the trade-offs are, what can it do, what can it not do, what is the latency profile, what are the guarantees. And that's exactly the kind of bullet-point discussion that we engaged in over email before I met the team in person. Yeah, and of course they were a small team at the time, right? And they needed help with parts of their infrastructure, and wanted to work very closely with teams that they could trust, with the right economics and the right reliability.
+ Yeah, for sure. But I guess that honesty — which I also value a lot, you know, in my work; I became a product manager, you know, three years ago, and I think it applies to any discipline: be honest — but, you know, that honesty probably rests on the fact that you had done your napkin math, and you knew where this would scale, how this could go, right? How did you go about doing that pre-launch, before having any client? Was it your friends' company that helped you to, kind of, figure out the economics and sort of the throughput and all of these rigorous questions that you ask, you know, as problem statements in Napkin Math? 
+ I think I should almost bring up the Internet Archive version of it, the first version of TurboPuffer. I had not thought about the business at all. I didn't have any launch playbook. I had done, of course, all the economics of what it would cost me to operate, and spent a decent amount of time on the pricing, because that felt like an important thing to spend time on at the time. But there was really not much more than that. Of course, the Readwise team was very interested, but at the time I could barely do — you know, I could just do around 10 million vectors, which is not enough for their use case.
+ I can screen-share the website with you right here, of what it looked like at the time, and then for the listening audience we can get your reaction. But it was very simple. I wouldn't put in any sophistication, and honestly, I was exhausted. I'd been working on this, like, completely alone, not telling anyone about it, no interested customers, for like four months, extremely focused, like, every single day. You could ask my wife — she would say I was very distracted, and she's just like, well, how are you working so hard on this? Like, there's no one on the team, you don't have any customers lined up. And I'm just like, someone has to do this. And I just launched it. I mean, the path to launch was pretty slow: I spent a bunch of time actually trying to make it work in Wasm and on the edge, but it was too hard to make it fast, and a bunch of other false starts like that on different types of ANN indexing structures — we could talk about that as well, and what we settled on. But there was no real sophistication in the go-to-market. It was really just: here it is, here's the math, here's what it does, let's see how the world takes it. But I think when, when you sit on — well, you didn't sit on it yet, but you had a cool, like, technology idea in mind, right? You knew, you know, it might play out. But also, of course, it required a lot of hard work, like you said. But after that, after you see it fly, like, on some small scale or whatever scale, I think that brings you, like, that excitement to bring it to the world, right? So yeah, I see you're sharing the screen of the Web Archive page. Yeah, that's it, very simple. Yeah, that's awesome. But that's actually a good segue. You know, you probably know I've been around since the emergence of the vector database field. I think I was probably the first to write just a simple blog post with, like, you know, these short snippets of what each vector database did and how they stand out, and so on. TurboPuffer wasn't there, because TurboPuffer was still in your mind, I think. But the segue here is — I don't have it covered in that blog post — but in your mind, why were you not happy with the vector databases at large? Did you try all of them? Did you try some of them? Why did you think that a new vector database deserved to exist? Yeah, I think it really just came back to the Readwise example, right? They look like great products. I really like the APIs of many of them. They had lots of features that have taken me a long time to build — even features that we don't have today, although we have a lot of features now compared to when we launched. It came down to the cost piece: it felt that there was a lot of latent demand built up in the market, of people who wanted to use these things, but it just didn't make sense with the economics. It's very difficult to earn a return on search. I mean, I remember the search clusters at Shopify were very expensive, but e-commerce is a lot about search, and so it was okay, right? But for a lot of companies, search is an important feature but is not the feature, right? And so the per-user economics just have to make sense. It's not that everyone just wants it in the cheapest possible way; it's that if you invest in
infrastructure, you have to get a return on that investment. And it felt — I knew that at Readwise they could get a return on that investment, but it wasn't at 30 grand a month; it was maybe closer to 3 grand or 5 grand a month that they would feel they could earn a return on that feature, in conversion, engagement, and whatever. So it was really about the storage architecture. And I think that when I think about databases now — this was not as coherent to me at the time; at the time I was driven by the napkin math, not the market, nothing else. It was based on one qualitative experience and the napkin math; there was nothing else in it. I can speak about it in a more sophisticated way now, you know, having learned a lot about go-to-market since, but that's really all there was at the time: it was an insight on those two things. The best ideas, right, are simultaneous inventions, right? Someone else would have done it six months later. Probably other people were doing it at the time that launched later, right? We were the first to launch with this particular architecture, but it was out there for the grabbing, right? The idea was in the air — like, S3 had the APIs now, finally. So the way that I think about this, to really boil this down, is that if you want to create a generational database company, I think you need two things. You need a new workload. The new workload here is that almost every company on earth sits on their treasure trove of data, and they want to connect that to LLMs, especially all the unstructured data, which has always been very difficult to do. We did this for structured data in the 2010s: the new workload was that we wanted to do analytics on billions, tens of billions, trillions of rows of structured data. But now, with LLMs, we're entering into that with the unstructured data. That's the first thing: you need a new workload, because that's when people go out shopping for a new database. The second thing that you need is a new storage architecture. If you don't have a new storage architecture that is fundamentally a better tradeoff for the particular workload, then there's no reason why people wouldn't just tack on a secondary index to their relational database, to their OLAP, to their existing search engine — they would eat it. I would have made that decision in the shoes I was in at Shopify, right? It's like, well, this database has a really good vector index, but it doesn't bring anything new in terms of the storage architecture, so we're just going to invest in the MySQL extension, right, which is what we really had at Shopify, or the, uh, Lucene workload, right? These are great databases; they've stood the test of time, and when you're on call you become very conservative in what you adopt for new workloads. But you cannot ignore a new storage architecture that is an order of magnitude cheaper than the previous one. When you store a gigabyte of data in a traditional storage engine, you have to replicate that to three disks — maybe two if you have more risk tolerance, but likely three. A gigabyte of disk from the cloud vendors costs about 10 cents. You run it at 50% utilization, otherwise it's too scary to be on call: 20 cents per gigabyte, times three for all the replicas, 60 cents per gigabyte. Object storage is two cents per gigabyte, right? So it's 30 times cheaper if it's all cold. Now, by the time you have some of it on SSD and you have some of it in memory, the blended cost ends up being different, but it tracks the actual value to the customer. Even if you have all of that on disk, well, you only need one copy, right? And that disk you can run at 100% utilization, meaning the blended cost is now 12 cents per gigabyte, right? So the 10 cents at 100% utilization, plus the two cents per gigabyte for object storage. So now you have the ingredients of a new actual database. You have a new workload, right, which means that people are out there trying to look for ways to connect their data to LLMs, and then you have the second ingredient, which is a new storage architecture that allows them to do it an order of magnitude easier and cheaper than what they can do with their existing architectures. And this matters because vectors are so big, right? A kilobyte of text easily turns into tens of kilobytes of vector data. Yeah, it's absolutely true. One other thing that I keep hearing, or kept hearing, about whether or not to introduce vector search into the mix for some really heavy workloads, is that it will bring a certain latency on top that we cannot tolerate, right? For example, if you run a hybrid search, like you guys have implemented as well, you know, one of these will be slowest, and therefore you will have to wait for that slowest component. And so if it adds, I don't know, a few hundred milliseconds on top of your original, you know, retrieval mechanism, then it's going to be off-limits. What's your take on that — obviously you have thought about that — what's the edge that TurboPuffer brings in this space over, maybe, pure databases? Yeah, I think we see two types of ways that people adopt vector databases, or TurboPuffer. We don't consider TurboPuffer a pure-play vector database; we consider it a search engine. We actually consider it a full database, because there's a full generic LSM underneath all of that, and we consider that the actual asset of TurboPuffer is an LSM that's object-storage native and doesn't rely on any state. We just think that the vector index and the search engine index are what the market needed the most. So let's speak about latency. There's no real fundamental latency tradeoff with this architecture. The only thing is that once in a while you will hit that cold query. But the entire database is optimized around minimizing the amount of round trips that you do to S3. You can max out a network card, right? So on a GCP or AWS box you can get 50 to 100 gigabits per second of network bandwidth, you
give it per second of network bandwidth so this is similar to this band with the latency actually even better in the clouds often than disks even with SSDs even with N and NVME SSD so the network is phenomenal you can drive you know say you can drive all of that data you can drive gigabytes of data per second in a single round trip so you can get great throughput but the latency is high the p90 might be around 200 milliseconds to s3 for every round trip someone regardless of how much data that you transfer assuming you're saturating the box we've decide almost everything interval buffer around minimizing the number of round trips to 3 to 4 that doesn't just help for s3 it also helps for modern disk which the same thing you can drive enormous amounts of bandwidth but the round trip time is is long right it's like a hundreds of microseconds versus hundreds of milliseconds but still still substantial compared to dm so the latency tradeoff is not a fundamental tradeoff with this architecture by the time that it makes it into the memory cache it's just as fast as everyone else we have found that people don't care if it's like a millisecond or five milliseconds as long as it's reliably less than around 50 milliseconds they're good right and I think that a lot of the traditional storage architectures especially because of the sharding structure with multiple nodes you're already in a worse position than going to two systems where if you write a query on some of the traditional search engine generally you touch five ten maybe more nodes depending on depending on this because the shard size is very very small you go into more depth on that so you already have this problem what we see is that there's two types of ways that people adopt it so the first one is you have an existing lexical search engine you are having a hard time running it because of the traditional like very stateful architecture and they're reputed for just being difficult to run and you're like already a 
little bit add your threshold for the amount of money that you're spending on this cluster and if you put the vector data in it's often 10 to 20 times larger than the text data it is just it's a project that stops in its tracks similar to the read wise case that I mentioned before so for those players we often see that they have something that's really well tuned for the lexical and they adopt a vector store and then they do two queries in parallel the vector store should not be slower than the lexical right so these are just two futures that you merge together in use and in general we see that our customers are actually quite happy to move some of the ranking and the final like second stage ranking out of the search engine and into a search. +py instead of a big search. +json which can be very difficult to maintain many of these companies express a lot of desire to move more and more of their lexical work also onto turbo buffer and we have a full tech search engine we don't have every feature of blue scene yet but we're working very very actively on bringing this up what we also see is that a lot of our customers don't need all of the features of blue scene anymore because the vectors are so good that a lot of the you know Ph. +D. 
+ level efforts we did before to turn strings into things is not as much of an issue anymore and really what we use strings for more is that when you search for DM you get the metri right like like for a prefix match whereas an embedding model might think that that's a direct message those kinds of things are important and we still need string matching for that lots of applications needed but there's a lot of things that we do in leucine with synonyms with stemming with all these kinds of things the team models are frankly just a lot better at so we find that this is an adoption curve that is there a lot of the newer companies just start with embedding models and simple full-text search and and they get it up and running on turbo puffer and they like that they just pay for what they need they don't think about it and they could pump a petabyte of data and if they want it and it would be extremely competitive on pricing and they don't have to think about it oh that's awesome that's awesome actually I forgot to mention I forgot to ask you which language did you choose to implement to revolver on yeah we we um well it was just me at the time but I chose I chose Rust and I think I spent the majority of my career writing Ruby at Shopify and and then a lot of go as well for some of the infrastructure components and then mainly debugging C which all the databases that we were using were we're doing and reading C I I really like go and I like go alongside Ruby at Shopify because go was one of those things as when leading teams I didn't have to worry about whether someone knew go or not because the adoption to learn it is two weeks um the adoption to learn Rust and being proficient in it is months right and somehow that's written Rust for two years it's a lot more productive than someone who's written it for two months in the language um and that's just not the case for go like someone who's spent two years in it is just not that much more productive and so and I think 
that's an amazing feature of the language. But from my own point of view — from the napkin-math point of view — I was always so hungry. Having spent time inside of runtimes, in the Ruby MRI runtime and then inside of the Go runtime, I was just hungry to get directly connected to the metal of the machine. And for a database in particular that was very important: we need to vectorize everything, we need full control over that. As remarkable as Go is — and I think it would have been okay — that raw access to the machine is needed for writing something like turbopuffer. — Yeah, for sure. I still remember the times when I was learning and coding industrially in C and C++. You really need to be very, very careful, but in return you can get a lot of performance gains, and some of your ideas really fly. But today I guess I'm coding more in Python — or should I even say that I code in Python, when I use Cursor more and more? Which is, by the way, scary: that feeling when some other entity writes the code and you are just reading it. It's a little bit scary and I'm still grappling with it, but the amount of productivity I get is enormous. I can ship features daily and just see them being used. That's amazing.
I think what I love about it is that I still love to sit there and write the occasional code by hand. Maybe at some point we will mark turbopuffer as an artisanally written database, because we don't use a ton of AI for the very key parts — I mean, we're at the edge of what the LLMs could know. But for me, in a position where I'm in and out of meetings all day these days, I can actually get a lot done in a 30-minute window when I have something that's prompting and writing the tests. You fire it off at the beginning of the meeting, you check in during the 15, 30 minutes you have between blocks, and this allows me to contribute a lot more code than I otherwise would have been able to. Not to the core engine — I don't get let loose on a lot of that anymore, because I don't have the time and focus it takes to fully think something through there — but for the website, the API, the initial features, all of that, it's just been wonderful. — Yeah, that's amazing. I also wanted to go a bit on a tangent. You've been, you could say, a mathematician-engineer, but you took a leap towards becoming a CEO, right? And as you said, you go to meetings, you do a lot of sales and product and all of that stuff. Was it a natural transition for you? What have you learned in this journey, and what, maybe, do you miss from your previous career, when you were hands-on and sat down and wrote a bunch of code? — I have a couple of angles to answer the question, not necessarily a direct answer. One angle is that fundamentally I'm a growth junkie, for better or worse, and I think entrepreneurship is the ultimate path for a growth junkie. It was never really something that I assumed I was going to do. Even when I was working on the project, it was
never about becoming a founder, it was just about creating the database. At some point, becoming the founder of the company becomes a means to an end of creating the database, getting it into the hands of our users, and making sure they have a great time. That's always what drove me: Readwise should have this, our customers should have this, they should have a great experience. To me, the founder role and all of the other things have been a means toward that end. I think one of the things that is maybe controversial but also feels like a true statement is that at some point I feel a bit numb to which work I enjoy and which I don't anymore, because what I enjoy the most is making this company successful and making the database successful for our customers. That's what I care the most about. I honestly love sales, I love marketing, I love the engineering, I love hiring people for the team, I love all of these things. So it's not a simplistic answer of "oh, I've been coding my whole life." It's more that coding is my idle activity: if there is that one-to-two-hour window and there's nothing urgent on, then I'm going to go spend some time in the code — oh, how did Nathan implement this new query plan or heuristic? That's my idle activity. And when interviewing people, I always try to understand — especially if they're in a more hybrid role — what's your idle activity? What's the thing you do when you have one to two hours and nothing else comes up? Do you gravitate towards the code, do you start writing an article, do you start playing with the product? What is that idle activity? For me it's code; that's what everything is grounded in, and I think it has a deep influence on how I can lead the company. I often think about
something that Taleb said — you know, the author of Antifragile and a bunch of other books — that the best authors of books are not the ones who sit down and read a bunch of papers, then write a page, then read another paper, write a page. The best books are written by people who just go to the cabin, sit down, write 500 pages, and hit publish. Of course that's not what actually happens, but if you read Taleb's books it's probably pretty close to what actually happened — he just has the citations in his head. I think about that often when building this company: it has felt like I've trained for this my whole life without knowing it, and I feel every morning I wake up that this is exactly what it has all led up to. So it's very natural, even if it wasn't a goal unto itself, that with the experience I've had it makes sense to do exactly this, and I tremendously enjoy it. But it's not a simplistic answer to "do I miss coding?" No — I want to make this company incredibly successful, but sometimes I will code as a recreational activity. — Yeah, when I look at you on Twitter, for example, you come across as a very technical person, and you are, for sure, even though you know that to grow your business you need to do a lot of other activities. I don't mean to ask it as "hey, do you regret doing sales now, do you regret not doing more coding" — which is not true, you still do that. And I think all engineers become better engineers if they learn the mastery of actually presenting what they do, right? Then they don't need a middle layer, someone else who goes and talks to that product manager or whoever else needs to be talked to — they can represent themselves. But I also love how eloquently you put it: what is your idle activity, what's your affinity, what do you gravitate to? And it
resonates a lot with me, because my idle activity — when I'm nervous doing nothing, especially on vacations — is that I start coding. I just go: okay, let's hypothesize about something. But let's dial back into the architecture. When I look at the architecture page of turbopuffer, it's very simple: a client connecting over TCP to a database instance, and it has just two components — the memory or SSD cache, and the object storage. Tell me a bit more — I think our listeners and I mostly know what object storage is, but tell me more about that memory component. What algorithm design went into it, maybe the trade-offs, and how frequently you need to do round trips to the object storage versus when you don't. — Yeah, I think it would be easiest to do this by speaking about the lifetime of a request as the cache warms. We'll actually start with the write path. When you do a write into turbopuffer, it's as simple as you can imagine — I mean, at this point we've optimized parts of it so that it's not this simple anymore, but this is the best way to explain it. When you do a write to turbopuffer, that write basically goes into a file in a directory called the write-ahead log. So when you write to a namespace, you can imagine that on S3 it's like /namespace/write-ahead-log. The write-ahead log is basically just a sequence of all the writes in order — the raw writes. So you do your write, and it might be: okay, I'm inserting a document with text "Dmitri" and one with text "Simon", and those are the two documents. In the simplest form, you can imagine that this file is called 0.json, and the next one is 1.json, then 2.json. That's a database, right? That's just the write-ahead log. And if you want to satisfy a query, you just scan through all the JSON documents and you satisfy the query. That's actually a respectable database, and it's not
even that far from the first version of turbopuffer. But of course you have to index that data as well. So basically, as you can imagine, once many megabytes of data come in, an indexing node will asynchronously pick it up and put it into the inverted index for full-text search, into an ANN index for vector search, and into an attribute index for filtering on other attributes — and there will be other index types in the future. When that happens, it will put files into /namespace/index, and the query layer can then consult those files: instead of scanning through every single document to find "Dmitri", you can just look up "Dmitri" in the inverted index, find the document, and return it. That's how a write works. When a write happens, it goes through one of the query nodes, and the write is also written into the cache — both the memory cache and the disk cache. When you do a query, you will go to that same query node: there's consistent hashing, so if there are three nodes, the same namespace will end up on, say, node one all the time, if it hashes to it. When you do a query, it will first check the caches. If you just did the write, well, it's already there, because we wrote all the writes into the cache — it's a write-through cache — and we will satisfy the query mainly from the cache. If for whatever reason this namespace is not there — maybe you did the write a month ago, so it's fallen out of the cache — and you do the read, then we'll read through the cache by going directly to object storage, with as few round trips as possible, to get the data to satisfy the query, both from the index and from the WAL. We do range reads directly on S3 — the good old HTTP Range header — to get exactly the bytes we need, and then start hydrating the cache on the query node.
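The write path he describes — raw writes appended as numbered JSON files under a namespace prefix, queries satisfied by scanning them — can be sketched in a few lines. This is only an illustration of the idea: a Python dict stands in for the object store, and turbopuffer's real layout and file formats are internal, so every name here is assumed.

```python
import json

# Toy stand-in for an object store: key -> serialized bytes.
object_store = {}

def wal_write(namespace, docs):
    """Append one batch of raw writes as the next numbered WAL entry."""
    prefix = f"{namespace}/wal/"
    seq = sum(1 for k in object_store if k.startswith(prefix))
    object_store[f"{prefix}{seq}.json"] = json.dumps(docs)

def naive_query(namespace, predicate):
    """The 'respectable database' baseline: before any index exists,
    satisfy a query by scanning every WAL entry in order."""
    prefix = f"{namespace}/wal/"
    hits = []
    for key in sorted(object_store):  # lexicographic; fine for this toy
        if key.startswith(prefix):
            hits.extend(d for d in json.loads(object_store[key]) if predicate(d))
    return hits

wal_write("ns1", [{"id": 1, "text": "Dmitri"}, {"id": 2, "text": "Simon"}])
wal_write("ns1", [{"id": 3, "text": "Nathan"}])
print(naive_query("ns1", lambda d: d["text"] == "Dmitri"))  # [{'id': 1, 'text': 'Dmitri'}]
```

The indexing step described next is then just an asynchronous job that reads these WAL entries and writes inverted-index and ANN files under a separate /namespace/index prefix, so queries no longer need the full scan.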
So subsequent queries get faster and faster, and we can hydrate the cache at a gigabyte per second, even for very, very large namespaces. That's the general architecture of turbopuffer: a completely cold query takes hundreds of milliseconds, and a warm query can take as little as 10 milliseconds to satisfy. The last detail I'll point out — and then we can go into a particular aspect of this — is that turbopuffer has chosen to do consistent reads by default. This is an unusual choice for search engines: the ones we've seen don't do this unless you turn it on explicitly, though I think they've done more work now on real-time indexing — which to me is the gold standard, which is why I keep referring back to it; it's a phenomenal piece of software. turbopuffer does consistent reads by default, meaning that if you do a write and then read immediately afterwards, that write will be visible. To satisfy that, we can't just rely on the cache on that node being up to date — that node could have died, or the hash ring could have moved because we scaled up. So on every single query, we go to object storage and check what the latest entry in the WAL is and whether we have it: is it at 3.json or at 5.json, and do I have that? We have a little pointer file that we can download and look at, and that round trip is basically our p50: our spans are often one to two milliseconds of actual search, and then, on GCS, depending on the region, 12 to 16 milliseconds waiting for that consistency check; on S3 it's a little bit better, about eight milliseconds. But you can turn this off and get eventual consistency, which is very normal for these databases — it could be up to a minute out of date — and then you often see a millisecond or less of latency to turbopuffer. We find that consistency is a very safe default, and I think databases should ship with safe and unsurprising defaults. — Yeah, for sure. So in that cache — let's focus back on the search part for now — you also have the ANN index. Is that also stored on S3? And do you keep a kind of replica of it in memory for quick access? How do you synchronize the two? — Both the write-ahead log and the index — everything — is stored on S3. If you killed all of the compute nodes of turbopuffer, in all of our clusters, we would not lose any data. There is no data on the compute nodes that matters; it's only transient caching. But we cache everything: if you're accessing the index, we'll cache the index; if you're accessing the write-ahead-log files — because they're so small, or because parts of the data haven't been indexed yet — that's also on S3 and goes into the same cache as everything else, prioritized by the workloads to get the best performance possible. — Yeah, it's quite smart. I remember at some previous companies, when I was running Apache Solr, one of the problems was always that some of these shards are super
cold, because they're never used. We still pay for them, but when a query hits, you incur so much latency that it's super painful. So I was always coming up with these ideas: what if I run some post-indexing warm-up script that shoots a bunch of queries at all of the shards just to keep them up, running, and warm — or just cat all the indices into memory on Linux? — We've done that too. — That was 10 years ago or so, and it was a very strange feeling: why do I need to mess with that level of detail? It never actually paid off. What pays off is a smarter way to organize your index and how you read the data. Essentially, users often only need fresh data first — on Twitter, for example, everyone is really after the recent tweets, not some archive — and that was a very similar case for us. But it's very interesting that you go into so much detail there, to make the database effectively a living organism adjusting to the usage. But you also have multi-tenancy, right? Meaning the same turbopuffer deployed across the data centers is going to be used by multiple companies at the same time, unless they demand isolation. How do you think about that, when they use effectively the same instance, compute, and index? — I'd love to go into the Solr example for just one second before we go into multi-tenancy. How slow were those queries? Because when you say cold — do you mean it's not in memory? When I say cold, I mean that it's on S3. What kind of latency were you seeing that made you do this work? — It was very slow. First of all, it also had to do with the domain specificity: the queries were Boolean and very long, so sometimes a single query would take a minute to execute on the original index design. That was just super crazy — but it was also very accurate, because it was sentence-level search. And then I
had to design a new system, a new architecture, where we could retain the accuracy of that engine but not spend so much money on indexing individual sentences — so we indexed one complete document instead. I had to change the algorithms slightly, and it went to sub-second. It was still slow, I think, but much faster, and users could actually use it — we could scale the company effectively after that, with the one-minute queries gone and something like 75% of the infrastructure costs shaved off. But that was about munging with the Lucene algorithm and changing how it scans the document; it had nothing to do with the level that you go into with turbopuffer, effectively controlling the whole process. — Got it. I think the point there is that we do see some customers concerned about this cache, because they've been bitten before. The way I would think about it is: in some of the traditional engines, because of how they do IO, if something is on disk it feels bad — on disk is slow, and it really has to be in memory. The pufferfish only had two settings, so to speak: fully inflated, it's in DRAM; deflated, it's in S3. Either it's on disk, which is quite slow — and frankly, in some of the traditional storage engines I've seen the latency on disk be similar to our latency on S3 — or you have to load it into DRAM, and a lot of these traditional databases have to do a full copy into DRAM; they can't just zero-copy off of disk. And the disks were also quite slow — those old network disks. The NVMe disks are so fast: they can drive bandwidth within a very low multiple of DRAM, tens of gigabytes per second, but their cost is almost two orders of magnitude
lower. So this completely changes the economics, but you can't take advantage of these disks very easily. You can't just put some software on one and have it be 10 times faster than on an original disk, even if the hardware is fundamentally capable of it. What we found, for example, is that we had to remove the Linux page cache, because the Linux page cache cannot keep up with these disks. So you have to do direct IO — but with direct IO you don't get coalescing, you don't get all these other things, and now you have to write your own IO driver. Databases just have not been built to take advantage of this; they're not built to drive the IO depth — basically, the number of outstanding IO requests — that these disks can sustain. So there are a lot of barriers to entry there. What we find — again speaking in generic terms, for something like millions of vectors queried at once — is that when something is on disk, it's maybe high tens of milliseconds, say 50 to 70 milliseconds fully on disk, maybe lower depending on the query, the machine, whatever. When it's in memory, it's closer to 10 to 20 milliseconds. So these are not bad — the user is barely going to notice — though of course you get more throughput the warmer it is. And when it's on S3, it's maybe more like five to six hundred milliseconds, which a user would notice. But a lot of our customers — Notion, for example — when you open the Q&A dialog, or the different dialogs that will query turbopuffer, they send a request to tell turbopuffer: hey, can you start warming up the cache here, in a way that makes sense? And by cache we just mean pulling it onto disk, starting with the upper layers of the ANN index and so on, to reduce the time as much as possible. So there are a lot of very simple things that can be done here, which means
there's barely a trade-off. — Yeah. But let's go back to multi-tenancy, unless you had a follow-up. — Let's do that. How do you do the multi-tenancy part? — So turbopuffer can run in three different ways. It can run in multi-tenant clusters — that's what Cursor does, that's what Linear does, and many of our customers. In multi-tenancy you share the compute. We can do this so cheaply because we share the caching, we share all of this infrastructure; it's very easy for us to run this way, so that's the default mode. The cache is of course segregated in various ways, but it's also shared in ways where, if you have a big burst of traffic, you get more of the cache than others. So it's a very good way of running multi-tenancy. The other thing we do for multi-tenancy, to keep it very secure, is that because all the data at rest is in the bucket, you can pass an encryption key to turbopuffer that we don't have access to unless it's audit-logged on your side, with which we encrypt and decrypt the objects. That is logically, and from a security standpoint, equivalent to you having all the data in your own bucket. This is a very nice primitive that Linear, for example, takes advantage of, because they have full control over their data: they can see when turbopuffer is accessing it, they can shut it down at any point in time, and they can even pass that on to their own customers, where turbopuffer encrypts data for Linear's customers on behalf of each customer, with that customer's key. This is really groundbreaking and underrated in this architecture. You can of course do single-tenancy with turbopuffer as well, with the compute reserved only for you, and you can do BYOC, where we run turbopuffer inside your cloud in a way that's very compliant — we can never see customer data. But we find that multi-tenancy with the encryption, which can
be done per namespace, satisfies the security requirements of even some of the biggest companies in the world. — Yeah, that sounds awesome. I also wanted to pick one topic which used to — I don't know if it still does — spark a lot of flame wars. What is your recall at N? When I go to the docs of turbopuffer, it says recall at N is 100% — recall at 10, excuse me — for vector search. So is it not 100%? Didn't it say 90 to 100? — No, I think it says... wait, wait — what was the page where you saw that? — Oh, here, the limits. — Oh I see, "observed in production". It should say "up to 100%" — that's a bug in the docs that I shipped last night; I'm going to fix it after this. — Awesome. What it says in the limits is 90 to 100%. But let's talk about recall. — I'd love to get into recall. I think recall is incredibly important. It's the equivalent of trusting your database: in the same way that you have to trust your database to do fsync, and to trust us when we say we don't return success unless the write is committed to S3, you have to trust recall. If you are working on search and on connecting data to LLMs, you don't want to worry in your evals about whether your vector database is giving you low recall — it's actually a very sophisticated problem to evaluate whether that's the cause — so you have to trust your vendor. This is an underrated problem, and I love that you're asking about it; very few people do unless they're quite sophisticated. So let's go into a long answer here for your audience, because I think this is paramount. Most databases that have a vector index are benchmarked against these different open-source ANN datasets — there's SIFT and others. The problem with these datasets is that they do not represent what we've seen in the real world. A lot of them
are very low dimensionality. When we do benchmarking on a billion vectors, which we're working on right now, the biggest datasets we can find are like 64 dimensions. That's not what people do in production: they use at least 512, and I'd say the average is around 768 dimensions. These are not representative datasets, and the distributions in the academic benchmarks are also completely different from what we see in real data. In real datasets we see millions of duplicates, we see filtering — all these chaotic conditions that don't present themselves in the academic benchmarks. So if you're using a vector index that's only been tested on academic benchmarks, it's like the LLMs: you don't really trust it just based on the scores — it's all vibes, all qualitative, whether it will actually work for your domain outside the benchmarks everyone trains against. Very, very early in turbopuffer's history, in the first month, I was mainly iterating against the SIFT dataset — a 128-dimensional dataset. I didn't know anything about ANN at the time, so it was: okay, this is pretty good, I can tune some heuristics on this and then go wider, and at least I have the feedback loop. And the observation I had was that I got something that worked really well — great heuristics on SIFT — and then when I went to the other datasets, it just completely failed to generalize. That taught me an early lesson: these academic datasets are just not enough, and the only way to know what your recall is going to be is to measure it in production. This is what turbopuffer does: for a percentage of queries — it depends on the number of queries you do, but let's say around 1% — turbopuffer will run an exhaustive search
against the ANN index on a separate worker fleet. We then emit a metric to Datadog that is the recall number for that query: basically, here is the top 10 we know is accurate, here is what the heuristic ANN index returned — what's the overlap — and we average that over time. I have a graph in Datadog that shows all the organizations with more than 100 queries in the past hour or whatever, and we have the recall for all of them: the recall at whatever they asked for, recall at 10, the p10 recall, the p90 recall. We try our best to make sure this is green at all times, and we consider green anything above 90%, which is generally quite good — well, 90% is quite good for some queries, but for simple queries it's often closer to 100%; many of our customers have 99.5% recall. This is the only way we know to do it. And it's funny you ask this question today, because last night I was hacking on putting this into the dashboard — literally putting the recall we observe from this monitoring system into the user's dashboard, because we think it's that important. It's very difficult to get right; we have spent thousands of engineering hours making sure the recall is high. Now: recall on academic benchmarks — easy. Recall on raw ANN search, especially on academic benchmarks — very easy. Raw recall on production datasets — I'd say medium to medium-hard. High recall on ANN queries with filters, with mixed selectivity, and with incremental indexing — absolute hard mode. This is what the databases that just slap a secondary vector index onto an existing engine can't do: they can't sustain, say, a thousand writes per second with high recall in the face of very difficult filter queries. So let's talk about filtered recall for a second. There are barely any academic datasets on this yet; it's all production workloads. What does a filtered ANN index mean?
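The sampled recall check he describes — recompute the answer exhaustively on the side, then measure the overlap with what the ANN index returned — can be sketched like this. It's a toy illustration, not turbopuffer's actual monitoring code: the vectors, the distance metric, and the "missed neighbor" are all made up for the example.

```python
import random

def exhaustive_top_k(query, vectors, k=10):
    """Brute-force ground truth: scan everything, rank by squared L2 distance."""
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(query, v))
    return [i for i, _ in sorted(enumerate(vectors), key=lambda p: dist(p[1]))[:k]]

def recall_at_k(truth_ids, ann_ids, k=10):
    """Fraction of the true top-k that the ANN result also contains."""
    return len(set(truth_ids[:k]) & set(ann_ids[:k])) / k

# Sampled monitoring: for ~1% of live queries, run the exhaustive search
# on a separate worker and emit the overlap as a recall metric.
random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]

truth = exhaustive_top_k(query, vectors)
missed = next(i for i in range(len(vectors)) if i not in truth)
ann_result = truth[:9] + [missed]       # simulate the ANN index missing one neighbor
print(recall_at_k(truth, ann_result))   # 0.9
```

In production this per-query number would be averaged and percentiled (the p10/p90 recall he mentions) rather than inspected one query at a time.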
Say, for example, you have an e-commerce site and you're searching for, I don't know, "yellow", and you only want things that ship to Canada. That cuts the clusters in different, weird ways — it might end up with a selectivity of 50% — and so if you just visit the closest however-many vectors with some heuristic, you're not going to get the true nearest neighbors, because you actually have to search maybe twice as many, maybe three times as many vectors to get the right recall. The query planner — the thing in the database that decides where to go on disk, figures out the data, aggregates it all together, and returns it to the user — needs to be aware of the selectivity of the filter and plan that into the ANN search. Again, if a database is not really serious about its vector offering, it's not doing this: not measuring it in production, not willing to show it to users, and without the full infrastructure in place to measure recall. I'd say we take this extremely seriously, and we don't want our users to have to guess at it. It's sometimes a thankless job, because many, many of the evals we see run against some of the other vector indexes have very low recall — and how are users supposed to know? Running these tests is extremely difficult. — It is, and as you said, you need to trust your vendor there. It's basically the floor, the bottom line, as some documentation pages put it: if the quality isn't there, why are you even running this? It's the difference between finding the product with those constraints when it exists and not finding it — and therefore not buying it — and so on and so forth. — Right. And I think you can never guarantee a recall; you can only observe what you are trying to make it be.
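The selectivity-aware planning he describes — if the filter keeps only a fraction of documents, the ANN search must examine proportionally more candidates — can be illustrated with a toy over-fetch rule. The k/selectivity heuristic, the safety factor, and all the names here are assumptions for illustration; a real query planner is far more sophisticated than this.

```python
def planned_candidate_count(k, selectivity, safety=2.0, cap=100_000):
    """Rule of thumb: to end up with k results after a filter that keeps a
    fraction `selectivity` of documents, examine roughly k / selectivity
    ANN candidates, padded by a safety factor and capped."""
    return min(cap, int(safety * k / max(selectivity, 1e-6)))

def filtered_search(candidates_by_rank, passes_filter, k=10, selectivity=0.5):
    """Walk ANN candidates in rank order, keep the ones passing the filter."""
    budget = planned_candidate_count(k, selectivity)
    hits = [d for d in candidates_by_rank[:budget] if passes_filter(d)]
    return hits[:k]

# ~50% selectivity: pretend only even ids ship to Canada (hypothetical filter).
ranked = list(range(100))  # candidate ids as an ANN index might rank them, best first
result = filtered_search(ranked, lambda d: d % 2 == 0, k=10, selectivity=0.5)
print(result)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The "tug of war" he mentions next shows up here directly: a bigger candidate budget raises recall but costs more IO, while better clustering lets the same recall be reached with a smaller budget.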
On every dataset, yes — but if you send a billion completely random vectors with 3000 dimensions and try to query them in turbopuffer, and there is no natural clustering because they're random, you're not going to get 100% recall; and if you send that with a 10% selectivity filter, it just completely breaks every heuristic. But all real data in production that people want to search has some natural clustering to it, so that's not a real benchmark you can evaluate recall on. We always take this seriously — in POCs and with the monitoring, we're looking at these numbers all the time — but there are absolute edge cases that can be very, very difficult. And what you have to do as a database vendor is a tug of war between "we're going to look at more data to get high recall" and "we're going to improve the clustering of the data so that we have to search less data." So you're always trying to improve the clustering, and always trying to improve the performance of the database so you can look at more data to get high recall. — Yeah, for sure. You mentioned the challenges with filters — I don't know if you're aware of the ANN benchmarks, but there's also big-ann-benchmarks, which I've had the pleasure to participate in, and one of the tasks they have is filtered search. I haven't participated in that one, but again, as you said, it's kind of academic, even though some of the datasets are quite sensible — like 128 dimensions, and not that huge. — That's the thing: they're 128, maybe 200 dimensions. These are not real datasets. — No, they are real datasets, but they are real datasets from the past generation of vectors — the pre-modern embedding era — and the current models just score so much higher
than these so we just don't see people use these yeah yeah exactly it's still fun to participate in this benchmark by the way because the data is there and you know the some of the guarantees that you need to hit really high you know like thousands of tens of thousands of queries per second so if you can create a toy index that works just a proud moment I guess that's right but I would say that people don't care about these bench like now people care about the benchmarks like their fun competitions but I think it can ruin your company if all you're trying to do is maximize these benchmarks because how many companies on in the world are trying to do 10,000 QPS on a billion vectors right not that many but there's a lot of companies that have a billion vectors lying around that they want to search and they just don't want the pricing to be offensive right we're a turbo buffer can you do this depending on the dimensionality for like a thousand dollars in month that's what people really seem to care about yeah sure maybe if I ask you like a spicy question if I may why do you think some of the vector database players like in Dalgium cells into that game of showing the benchmark and telling we are the best and then you know someone else cuts them over and says no you made a mistake in the benchmark why do you think this is happening like like publicly if you're comfortable talking about this yeah we we don't we don't publish benchmarks against anyone else in in fact it's it's it's usually against the terms of service to do that for almost every vendor including the big vendors like the hyper scalers probably shouldn't be it probably should be prohibited for the hyper scalers for like any competitive reasons or but anyway for the peers I think it's it's like a low blow because everyone can sort of p-hack their way to something where they they go better and becomes like month throwing and it's very distracting activity um we benchmark ourselves in ways that we find that our 
customers are actually using the database. So we're not doing it at 10,000 QPS, because that's just not what we see against a single namespace. We benchmark against ourselves, and we benchmark against first principles: we're always considering what the gap is between what turbopuffer does and first principles. That's what I've learned, and that's why I do napkin math, because the fundamental thing you should be benchmarking against is first principles. If there's a gap between what the DRAM or disk bandwidth is, multiplied by how much of it you need, and what your database is actually doing, then you either have a gap in your understanding or you've found room for improvement. That's what matters. And of course it also matters what other people are doing, but what matters the most is what your customers are trying to do, and they'll pull you in that direction. So we think this easily becomes one of those metrics where, you know, if you give people a metric, they'll optimize for it. And benchmarks of how many QPS you can do at some recall number are just not what people care about. They care about it working, they care about enormous write throughput, they care about cost, and they care about other things that are necessarily much harder to put in such a benchmark. I think benchmarks are important: we need to give people a sense of what they should expect, and they should hold us to that. And what I would love to have is more observability in the turbopuffer product into what kind of performance you are seeing. We're working on exposing query plans from turbopuffer, so you can see what's causing the query latency to be what it is. So, yeah, I don't think the mud-throwing is great. I think at some point someone's going to publish a benchmark with turbopuffer against something else, and then we'll have to deal with that as it comes. But it's certainly not an activity that we plan to engage in.

I love your answer, because it also resonates with me in a different dimension. I found myself in a situation at some point in the past when we were copycatted, I can say it that way. There was a company that literally copied the whole interface, how the product looks, and we felt threatened. But what they couldn't copy is essentially the internal IP: all the algorithms, everything we had spent hard working time on. They couldn't copy that, and that doesn't fly by itself. So basically what I'm trying to say is that even though it felt threatening, you really need to focus on what you need to solve by the laws of physics, and if you solve that, you become the leader of the market. And that's what happened: the story is that this company actually acquired the copycat. It doesn't mean that's a bad outcome for either of them, but what I'm trying to say is: just focus on the thing you're trying to solve, and don't indulge in these games of, like you said, mud-throwing.

I like that, really well said. So we focus on customer studies, we focus on first principles, we focus on benchmarking, and we focus on what customers are telling us they need. I think those are the right things to focus on for our company.

For sure. And just looking at the clientele, the names you shared, Cursor and Notion, which pretty much everyone is using every day, that's a testament to what you've done. I also wanted to ask you, before we close, what are maybe the technical or business or
some other challenges that you see ahead of yourself, or maybe that's already happening and you see that it's important, especially in this space of LLMs, where LLMs can bring value? It feels like you have been wildly successful as a business and as a technology, but is there something that you see as still unsolved, ahead of you, and worth solving?

I'll go back to, you know, I spent a long time at Shopify, and part of growing up at that company from when I was very young was being taught a bit in the school of Tobi Lütke, the CEO. Something that he often said about himself is that you have to grow to keep up with the business, and that's what it is for me as well. I first had to grow as an engineer to put out the first version. Then I had to build an engineering team to take it much further than I ever could alone, and I think we have, just an absolutely 99th-percentile engineering team now. Then I turned my focus to sales and learning that, and now I have to turn it towards marketing, towards legal, towards all these different things to build the company. We spoke a little bit about this before: I think one of the beliefs that we have is about the talent density of the team. A lot of people talk a lot about talent density, and I think there is now a generation of companies that is really trying to do it. With the tools that we now have available to us, and especially the kind of tool that we work on every day, the floor for productivity has been raised, but the ceiling has been raised far more. So what really matters to us is having a team of individuals where everyone is an A-player. We see these teams today almost more like sports teams than like how companies were originally built, and I think we hold that as a strong belief in how we are building the company. It demands a lot from everyone to work in this way, but it's very fun, and I think the growth that it forces on everyone, including myself, is important. I have to keep up with that; I have to keep up with the demands of how our customers, our team internally, and everything grows. That's the biggest challenge: just the amount of new that has to be learned so that we can become a successful company, which is important for me, for our customers, and for everyone who has chosen to come along for the ride and join the company.

+ Oh, that's awesome. Yeah, this field changes so quickly. It felt much slower when I was coding myself, you know, Java and all that stuff. You had Solr and Elasticsearch, and that was it for a long time. Then a lot of new engines popped up, especially when vector search appeared on the scene, but now, with the LLM advancements and all of that, it just feels so crazy. So yeah, it's a very interesting challenge for sure, you know, personally, business-wise, and team-wise. And keeping balance is another one.

+ I think the pace that we see everyone running at now in the successful companies is beyond anything that I have seen before. It reminds me of the months leading up to Black Friday at Shopify, but all the time, and I love it; I'm addicted to that pace. I think we have created a team of people who seek intensity, and that's exactly what we think we need to create the right product, at a pace that makes sense for our customers too, so that they are never bottlenecked on us. That is what keeps me up at night.

+ Oh, that's awesome, that's great. We usually end with some sort of announcement. Is there anything you want to say to the audience? Especially now that you said you want to go deeper into marketing, here is your chance. Anything you want to share, a call for action?
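[Editor's aside on the "completely random vectors" failure mode discussed earlier in the episode: with uniformly random high-dimensional data, distances from any probe vector concentrate around a single value, so a clustering-based index has no near/far structure to exploit. A minimal NumPy sketch, my own illustration rather than anything from the episode, with sizes scaled down from "a billion vectors, 3,000 dimensions":]

```python
import numpy as np

rng = np.random.default_rng(0)
n, dims = 5_000, 1_000  # scaled down from "a billion vectors, 3,000 dims"

# Pathological case from the conversation: completely random vectors,
# i.e. no natural clustering at all.
random_vecs = rng.standard_normal((n, dims))

# Something closer to real embeddings: points scattered around a few centers.
centers = 5 * rng.standard_normal((10, dims))
clustered = centers[rng.integers(0, 10, n)] + rng.standard_normal((n, dims))

def spread(x):
    # Relative spread of distances from one probe vector to all others.
    d = np.linalg.norm(x[1:] - x[0], axis=1)
    return d.std() / d.mean()

print(f"random:    {spread(random_vecs):.3f}")   # distances all look alike
print(f"clustered: {spread(clustered):.3f}")     # near/far structure exists
```

[When every candidate is nearly equidistant, any partitioning heuristic prunes the true neighbors as often as not, which is the sense in which random vectors "break every heuristic".]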
+ I think that we've refrained from doing any large releases, and we try to just ship as rapidly as we possibly can. If I look at the changelog this month: I mean, we launched regions in Singapore and Canada, we've added the float type, and we've recently added clients for Java, Go, and Ruby. One of the things that I think is really exciting is our conditional writes. This is where turbopuffer is not just a bunch of files on S3; it's not even just a search engine. It can do conditional writes, where you can say, hey, I only want to replace this document if it's newer than the old version. These are real database features, and so are things like patch writes, where you do a partial update. But we just launch, put it on X, and move on. So I don't have any big announcement. We went GA a couple of months ago; we would have liked to make more of that, but we just try to ship and announce, get it out there as soon as possible, move on, and ship the next thing.

+ Yeah, congratulations on your original launch, but also on GA; I think it's a big milestone as well. As you said, you're probably not as, sort of, transactional anymore: you just keep shipping and you follow what the customers need. But sometimes some of these things may go unnoticed unless people follow exactly what you do, so in that sense I feel like there is room, or a stage, for saying: hey guys, go use it, it's GA, run your benchmark.
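[Editor's aside: the conditional writes described above boil down to compare-and-set on a document version. The episode doesn't show turbopuffer's actual request format, so the following is a generic in-memory sketch of the idea; the class and method names are hypothetical, not the real API:]

```python
import threading

class TinyDocStore:
    """Generic compare-and-set sketch -- NOT turbopuffer's real API."""

    def __init__(self):
        self._docs = {}                 # doc_id -> (version, doc)
        self._lock = threading.Lock()

    def conditional_upsert(self, doc_id, version, doc):
        """Replace the document only if the incoming version is newer."""
        with self._lock:
            current = self._docs.get(doc_id)
            if current is not None and current[0] >= version:
                return False            # stale write rejected
            self._docs[doc_id] = (version, doc)
            return True

    def patch(self, doc_id, partial):
        """Partial update: merge new fields, keep everything else."""
        with self._lock:
            version, doc = self._docs[doc_id]
            self._docs[doc_id] = (version, {**doc, **partial})

store = TinyDocStore()
assert store.conditional_upsert("doc1", version=2, doc={"title": "v2"})
assert not store.conditional_upsert("doc1", version=1, doc={"title": "v1"})
store.patch("doc1", {"status": "ga"})
print(store._docs["doc1"])  # (2, {'title': 'v2', 'status': 'ga'})
```

[The "only replace if newer" check is what makes it a database feature rather than blind file overwrites: a delayed or replayed write can never clobber fresher data.]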
+ Now that I think about it a bit more, one announcement might be this: early in turbopuffer's history we were very focused on doing many namespaces that were small, but we are getting very good at large namespaces now. We have customers that are searching billions of vectors at once, and we have customers that want to search hundreds of billions of vectors all at once, and we are working with them on that. This is not particularly scary anymore; we know exactly what we need to get there. So if you have use cases of that caliber, you may have passed turbopuffer by before, but we're getting ready, and we are ready, for hundreds of millions and billions at once. The only limitation there is really just the size of a single machine, and then we shard over machines. But, going back to the sharding we talked about before, you need to make every shard as large as possible to get the best economics and the best performance, and that's been one of the issues with some of the traditional search engines.

+ Yeah, for sure.
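[Editor's aside: rough napkin math for the "size of a single machine, then shard" point above. Every constant here (512 GB of RAM per machine, 10 GB/s of effective scan bandwidth, float32 vectors at 1,024 dimensions) is a made-up placeholder for illustration, not turbopuffer's real figures:]

```python
def napkin_math(n_vectors, dims, machine_ram_gb=512, bandwidth_gb_s=10.0):
    """Back-of-the-envelope sizing; every constant is a hypothetical example."""
    bytes_total = n_vectors * dims * 4                  # float32 vectors
    gb_total = bytes_total / 1e9
    ram_bytes = machine_ram_gb * 1e9
    shards = max(1, int(-(-bytes_total // ram_bytes)))  # ceiling division
    scan_s = gb_total / (bandwidth_gb_s * shards)       # shards scan in parallel
    return gb_total, shards, scan_s

for n_vectors in (1_000_000_000, 100_000_000_000):
    gb, shards, scan_s = napkin_math(n_vectors, dims=1024)
    print(f"{n_vectors:>15,} vectors: {gb:>10,.0f} GB, "
          f"{shards:>4} shards, {scan_s:.1f} s parallel full scan")
```

[Fewer, larger shards mean less per-shard overhead and better economics, which is the point made above; the first-principles bound (data volume divided by bandwidth) is the number a database should be compared against.]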
+ Yeah, I really enjoyed the convo. I know we could have gone into so many more topics; I really wanted to ask you about ANN algorithms and other things as well. But I also feel like we could talk more later, down the road, as you guys are progressing; hopefully you'll be open to that. I've learned a ton, and it's a very interesting design that you have, and the whole journey of you pushing for four months, you know, uninterrupted. I hope you have now regained some of the balance in your life, now that you have the team supporting you. I really enjoyed this conversation. Simon, thank you so much for your time. Thank you, Dmitry. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md b/transcripts_with_timestamps/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md new file mode 100644 index 0000000..9636348 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/eric-pugh-measuring-search-quality-with-quepid.md @@ -0,0 +1,2895 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=1L7UjjPz5wM

00:00 Intro

00:21 Guest Introduction: Eric Pugh

03:00 Eric''s story in search and the evolution of search technology

07:27 Quepid: Improving Search Relevancy

10:08 When to use Quepid

14:53 Flashback to Apache Solr 1.4 and the book (of which Eric is a co-author)

17:49 Quepid Demo and Future Enhancements

23:57 Real-Time Query Doc Pairs with WebSockets

24:16 Integrating Quepid with Search Engines

25:57 Introducing LLM-Based Judgments

28:05 Scaling Up Judgments with AI

28:48 Data Science Notebooks in Quepid

33:23 Custom Scoring in Quepid

39:23 API and Developer Tools

42:17 The Future of Search and Personal Reflections

Show notes:

- Hosted Quepid: https://app.quepid.com/

- Ragas: Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines https://github.com/explodinggradients...

- Why Quepid: https://quepid.com/why-quepid/

- Quepid on Github: https://github.com/o19s/quepid

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240626_010626_075b8a8d662d3fbf1946ef06b8218efa.png +pub_date: Wed, 26 Jun 2024 13:42:56 GMT +title: Eric Pugh - Measuring Search Quality with Quepid +url: https://rss.com/podcasts/vector-podcast/1539938 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 23.88, "text": " Hello + there, vector podcast season 3.", "tokens": [50364, 2425, 456, 11, 8062, 7367, 3196, + 805, 13, 51558], "temperature": 0.0, "avg_logprob": -0.39413962156876275, "compression_ratio": + 1.0533333333333332, "no_speech_prob": 0.15871569514274597}, {"id": 1, "seek": 0, + "start": 23.88, "end": 26.28, "text": " In this season I made one simple promise.", + "tokens": [51558, 682, 341, 3196, 286, 1027, 472, 2199, 6228, 13, 51678], "temperature": + 0.0, "avg_logprob": -0.39413962156876275, "compression_ratio": 1.0533333333333332, + "no_speech_prob": 0.15871569514274597}, {"id": 2, "seek": 2628, "start": 26.28, + "end": 29.16, "text": " I will try to stick to 30 minute episodes.", "tokens": [50364, + 286, 486, 853, 281, 2897, 281, 2217, 3456, 9313, 13, 50508], "temperature": 0.0, + "avg_logprob": -0.25778030255518924, "compression_ratio": 1.491869918699187, "no_speech_prob": + 0.6543307900428772}, {"id": 3, "seek": 2628, "start": 29.16, "end": 31.0, "text": + " Let''s see how well I will do it.", "tokens": [50508, 961, 311, 536, 577, 731, + 286, 486, 360, 309, 13, 50600], "temperature": 0.0, "avg_logprob": -0.25778030255518924, + "compression_ratio": 1.491869918699187, "no_speech_prob": 0.6543307900428772}, {"id": + 4, "seek": 2628, "start": 31.0, "end": 37.6, "text": " It''s not always easy, especially + when you have guests like Eric Pugh that I''m really having a pleasure to talk to + today.", "tokens": [50600, 467, 311, 406, 1009, 1858, 11, 2318, 562, 291, 362, 9804, + 411, 9336, 430, 1984, 300, 286, 478, 534, 1419, 257, 6834, 281, 751, 281, 965, 13, + 50930], "temperature": 0.0, "avg_logprob": -0.25778030255518924, 
"compression_ratio": + 1.491869918699187, "no_speech_prob": 0.6543307900428772}, {"id": 5, "seek": 2628, + "start": 38.6, "end": 44.400000000000006, "text": " I can say that we''ve been working + together on Cupid, on ideation, on things.", "tokens": [50980, 286, 393, 584, 300, + 321, 600, 668, 1364, 1214, 322, 383, 6127, 11, 322, 1153, 399, 11, 322, 721, 13, + 51270], "temperature": 0.0, "avg_logprob": -0.25778030255518924, "compression_ratio": + 1.491869918699187, "no_speech_prob": 0.6543307900428772}, {"id": 6, "seek": 2628, + "start": 45.400000000000006, "end": 47.64, "text": " And I''ve learned a ton from + you.", "tokens": [51320, 400, 286, 600, 3264, 257, 2952, 490, 291, 13, 51432], "temperature": + 0.0, "avg_logprob": -0.25778030255518924, "compression_ratio": 1.491869918699187, + "no_speech_prob": 0.6543307900428772}, {"id": 7, "seek": 2628, "start": 48.36, "end": + 50.400000000000006, "text": " Yeah, yeah, I''m super excited.", "tokens": [51468, + 865, 11, 1338, 11, 286, 478, 1687, 2919, 13, 51570], "temperature": 0.0, "avg_logprob": + -0.25778030255518924, "compression_ratio": 1.491869918699187, "no_speech_prob": + 0.6543307900428772}, {"id": 8, "seek": 2628, "start": 50.400000000000006, "end": + 52.400000000000006, "text": " So when did I come visit you?", "tokens": [51570, + 407, 562, 630, 286, 808, 3441, 291, 30, 51670], "temperature": 0.0, "avg_logprob": + -0.25778030255518924, "compression_ratio": 1.491869918699187, "no_speech_prob": + 0.6543307900428772}, {"id": 9, "seek": 5240, "start": 53.36, "end": 58.08, "text": + " I think it was two years ago or some two years ago.", "tokens": [50412, 286, 519, + 309, 390, 732, 924, 2057, 420, 512, 732, 924, 2057, 13, 50648], "temperature": 0.0, + "avg_logprob": -0.3425343283291521, "compression_ratio": 1.914438502673797, "no_speech_prob": + 0.1269148290157318}, {"id": 10, "seek": 5240, "start": 58.08, "end": 58.879999999999995, + "text": " I think so.", "tokens": [50648, 286, 519, 370, 13, 50688], 
"temperature": + 0.0, "avg_logprob": -0.3425343283291521, "compression_ratio": 1.914438502673797, + "no_speech_prob": 0.1269148290157318}, {"id": 11, "seek": 5240, "start": 58.879999999999995, + "end": 60.56, "text": " It was pandemic, I guess.", "tokens": [50688, 467, 390, + 5388, 11, 286, 2041, 13, 50772], "temperature": 0.0, "avg_logprob": -0.3425343283291521, + "compression_ratio": 1.914438502673797, "no_speech_prob": 0.1269148290157318}, {"id": + 12, "seek": 5240, "start": 60.56, "end": 62.72, "text": " Yeah, it was the very + end of the pandemic.", "tokens": [50772, 865, 11, 309, 390, 264, 588, 917, 295, + 264, 5388, 13, 50880], "temperature": 0.0, "avg_logprob": -0.3425343283291521, "compression_ratio": + 1.914438502673797, "no_speech_prob": 0.1269148290157318}, {"id": 13, "seek": 5240, + "start": 62.72, "end": 63.2, "text": " Right.", "tokens": [50880, 1779, 13, 50904], + "temperature": 0.0, "avg_logprob": -0.3425343283291521, "compression_ratio": 1.914438502673797, + "no_speech_prob": 0.1269148290157318}, {"id": 14, "seek": 5240, "start": 63.2, "end": + 66.32, "text": " I remember getting my, yeah, so it was still pandemic.", "tokens": + [50904, 286, 1604, 1242, 452, 11, 1338, 11, 370, 309, 390, 920, 5388, 13, 51060], + "temperature": 0.0, "avg_logprob": -0.3425343283291521, "compression_ratio": 1.914438502673797, + "no_speech_prob": 0.1269148290157318}, {"id": 15, "seek": 5240, "start": 66.32, + "end": 67.2, "text": " It was still pandemic.", "tokens": [51060, 467, 390, 920, + 5388, 13, 51104], "temperature": 0.0, "avg_logprob": -0.3425343283291521, "compression_ratio": + 1.914438502673797, "no_speech_prob": 0.1269148290157318}, {"id": 16, "seek": 5240, + "start": 67.2, "end": 68.64, "text": " Yeah, it''s still pandemic, right?", "tokens": + [51104, 865, 11, 309, 311, 920, 5388, 11, 558, 30, 51176], "temperature": 0.0, "avg_logprob": + -0.3425343283291521, "compression_ratio": 1.914438502673797, "no_speech_prob": 0.1269148290157318}, + {"id": 17, "seek": 
5240, "start": 68.64, "end": 75.75999999999999, "text": " Because + I had to get Cupid to say, yeah, so I was, yeah, I was like, I want to meet to meet + you in person.", "tokens": [51176, 1436, 286, 632, 281, 483, 383, 6127, 281, 584, + 11, 1338, 11, 370, 286, 390, 11, 1338, 11, 286, 390, 411, 11, 286, 528, 281, 1677, + 281, 1677, 291, 294, 954, 13, 51532], "temperature": 0.0, "avg_logprob": -0.3425343283291521, + "compression_ratio": 1.914438502673797, "no_speech_prob": 0.1269148290157318}, {"id": + 18, "seek": 7576, "start": 75.84, "end": 80.64, "text": " And I called you and said, + I''m going to come to Helsinki and visit you.", "tokens": [50368, 400, 286, 1219, + 291, 293, 848, 11, 286, 478, 516, 281, 808, 281, 45429, 41917, 293, 3441, 291, 13, + 50608], "temperature": 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": + 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, {"id": 19, "seek": 7576, + "start": 80.64, "end": 83.28, "text": " And I think you were like, why?", "tokens": + [50608, 400, 286, 519, 291, 645, 411, 11, 983, 30, 50740], "temperature": 0.0, "avg_logprob": + -0.28815899156544306, "compression_ratio": 1.5970149253731343, "no_speech_prob": + 0.07496391236782074}, {"id": 20, "seek": 7576, "start": 83.28, "end": 86.0, "text": + " I mean, we don''t work together or say we worked on Cupid though.", "tokens": + [50740, 286, 914, 11, 321, 500, 380, 589, 1214, 420, 584, 321, 2732, 322, 383, 6127, + 1673, 13, 50876], "temperature": 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": + 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, {"id": 21, "seek": 7576, + "start": 86.0, "end": 88.4, "text": " Quite a few evenings together, right?", "tokens": + [50876, 20464, 257, 1326, 42835, 1214, 11, 558, 30, 50996], "temperature": 0.0, + "avg_logprob": -0.28815899156544306, "compression_ratio": 1.5970149253731343, "no_speech_prob": + 0.07496391236782074}, {"id": 22, "seek": 7576, "start": 88.4, "end": 
91.60000000000001, + "text": " I think it was like nine o''clock your time, Helsinki time.", "tokens": + [50996, 286, 519, 309, 390, 411, 4949, 277, 6, 9023, 428, 565, 11, 45429, 41917, + 565, 13, 51156], "temperature": 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": + 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, {"id": 23, "seek": 7576, + "start": 91.60000000000001, "end": 92.32000000000001, "text": " Yes.", "tokens": + [51156, 1079, 13, 51192], "temperature": 0.0, "avg_logprob": -0.28815899156544306, + "compression_ratio": 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, + {"id": 24, "seek": 7576, "start": 92.32000000000001, "end": 92.80000000000001, "text": + " Yes.", "tokens": [51192, 1079, 13, 51216], "temperature": 0.0, "avg_logprob": + -0.28815899156544306, "compression_ratio": 1.5970149253731343, "no_speech_prob": + 0.07496391236782074}, {"id": 25, "seek": 7576, "start": 92.80000000000001, "end": + 93.12, "text": " Who was it?", "tokens": [51216, 2102, 390, 309, 30, 51232], "temperature": + 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": 1.5970149253731343, + "no_speech_prob": 0.07496391236782074}, {"id": 26, "seek": 7576, "start": 93.12, + "end": 94.24000000000001, "text": " And it was Friday.", "tokens": [51232, 400, + 309, 390, 6984, 13, 51288], "temperature": 0.0, "avg_logprob": -0.28815899156544306, + "compression_ratio": 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, + {"id": 27, "seek": 7576, "start": 94.24000000000001, "end": 96.4, "text": " I remember + vividly Friday.", "tokens": [51288, 286, 1604, 23603, 356, 6984, 13, 51396], "temperature": + 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": 1.5970149253731343, + "no_speech_prob": 0.07496391236782074}, {"id": 28, "seek": 7576, "start": 97.52000000000001, + "end": 98.32000000000001, "text": " What else to do?", "tokens": [51452, 708, 1646, + 281, 360, 30, 51492], "temperature": 0.0, "avg_logprob": 
-0.28815899156544306, "compression_ratio": + 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, {"id": 29, "seek": 7576, + "start": 98.32000000000001, "end": 101.44, "text": " So I went camping with my family.", + "tokens": [51492, 407, 286, 1437, 19470, 365, 452, 1605, 13, 51648], "temperature": + 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": 1.5970149253731343, + "no_speech_prob": 0.07496391236782074}, {"id": 30, "seek": 7576, "start": 101.44, + "end": 102.4, "text": " Can I screen share?", "tokens": [51648, 1664, 286, 2568, + 2073, 30, 51696], "temperature": 0.0, "avg_logprob": -0.28815899156544306, "compression_ratio": + 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, {"id": 31, "seek": 7576, + "start": 102.4, "end": 102.72, "text": " Did you meet?", "tokens": [51696, 2589, + 291, 1677, 30, 51712], "temperature": 0.0, "avg_logprob": -0.28815899156544306, + "compression_ratio": 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, + {"id": 32, "seek": 7576, "start": 102.72, "end": 103.44, "text": " Yes, yes.", "tokens": + [51712, 1079, 11, 2086, 13, 51748], "temperature": 0.0, "avg_logprob": -0.28815899156544306, + "compression_ratio": 1.5970149253731343, "no_speech_prob": 0.07496391236782074}, + {"id": 33, "seek": 10344, "start": 103.92, "end": 104.39999999999999, "text": " + Of course.", "tokens": [50388, 2720, 1164, 13, 50412], "temperature": 0.0, "avg_logprob": + -0.33767520256762235, "compression_ratio": 1.5812807881773399, "no_speech_prob": + 0.07604223489761353}, {"id": 34, "seek": 10344, "start": 104.39999999999999, "end": + 106.0, "text": " You can give me permissions.", "tokens": [50412, 509, 393, 976, + 385, 32723, 13, 50492], "temperature": 0.0, "avg_logprob": -0.33767520256762235, + "compression_ratio": 1.5812807881773399, "no_speech_prob": 0.07604223489761353}, + {"id": 35, "seek": 10344, "start": 106.0, "end": 108.96, "text": " I went camping + with my family.", "tokens": [50492, 286, 1437, 19470, 365, 
452, 1605, 13, 50640], + "temperature": 0.0, "avg_logprob": -0.33767520256762235, "compression_ratio": 1.5812807881773399, + "no_speech_prob": 0.07604223489761353}, {"id": 36, "seek": 10344, "start": 108.96, + "end": 117.03999999999999, "text": " And if you think back to my visit to you, you + and your wife gave me a little gift.", "tokens": [50640, 400, 498, 291, 519, 646, + 281, 452, 3441, 281, 291, 11, 291, 293, 428, 3836, 2729, 385, 257, 707, 5306, 13, + 51044], "temperature": 0.0, "avg_logprob": -0.33767520256762235, "compression_ratio": + 1.5812807881773399, "no_speech_prob": 0.07604223489761353}, {"id": 37, "seek": 10344, + "start": 118.16, "end": 119.75999999999999, "text": " Yeah, give me a free share.", + "tokens": [51100, 865, 11, 976, 385, 257, 1737, 2073, 13, 51180], "temperature": + 0.0, "avg_logprob": -0.33767520256762235, "compression_ratio": 1.5812807881773399, + "no_speech_prob": 0.07604223489761353}, {"id": 38, "seek": 10344, "start": 119.75999999999999, + "end": 121.2, "text": " I can only do you host.", "tokens": [51180, 286, 393, 787, + 360, 291, 3975, 13, 51252], "temperature": 0.0, "avg_logprob": -0.33767520256762235, + "compression_ratio": 1.5812807881773399, "no_speech_prob": 0.07604223489761353}, + {"id": 39, "seek": 10344, "start": 121.2, "end": 123.84, "text": " Let''s do a host + and you can screen share.", "tokens": [51252, 961, 311, 360, 257, 3975, 293, 291, + 393, 2568, 2073, 13, 51384], "temperature": 0.0, "avg_logprob": -0.33767520256762235, + "compression_ratio": 1.5812807881773399, "no_speech_prob": 0.07604223489761353}, + {"id": 40, "seek": 10344, "start": 123.84, "end": 124.56, "text": " Yeah, I can.", + "tokens": [51384, 865, 11, 286, 393, 13, 51420], "temperature": 0.0, "avg_logprob": + -0.33767520256762235, "compression_ratio": 1.5812807881773399, "no_speech_prob": + 0.07604223489761353}, {"id": 41, "seek": 10344, "start": 124.56, "end": 125.44, + "text": " All right.", "tokens": [51420, 1057, 558, 13, 51464], 
"temperature": 0.0, + "avg_logprob": -0.33767520256762235, "compression_ratio": 1.5812807881773399, "no_speech_prob": + 0.07604223489761353}, {"id": 42, "seek": 10344, "start": 125.44, "end": 130.56, + "text": " And so I just wanted to share off this, this cup.", "tokens": [51464, + 400, 370, 286, 445, 1415, 281, 2073, 766, 341, 11, 341, 4414, 13, 51720], "temperature": + 0.0, "avg_logprob": -0.33767520256762235, "compression_ratio": 1.5812807881773399, + "no_speech_prob": 0.07604223489761353}, {"id": 43, "seek": 13056, "start": 130.56, + "end": 133.28, "text": " Oh, there it is.", "tokens": [50364, 876, 11, 456, 309, + 307, 13, 50500], "temperature": 0.0, "avg_logprob": -0.2157975126195837, "compression_ratio": + 1.5390625, "no_speech_prob": 0.02091003768146038}, {"id": 44, "seek": 13056, "start": + 133.28, "end": 135.52, "text": " So we''ve had that little wooden cup.", "tokens": + [50500, 407, 321, 600, 632, 300, 707, 14744, 4414, 13, 50612], "temperature": 0.0, + "avg_logprob": -0.2157975126195837, "compression_ratio": 1.5390625, "no_speech_prob": + 0.02091003768146038}, {"id": 45, "seek": 13056, "start": 135.52, "end": 139.52, + "text": " I think it''s a traditional finish drinking vessel when you''re out in", + "tokens": [50612, 286, 519, 309, 311, 257, 5164, 2413, 7583, 18098, 562, 291, 434, + 484, 294, 50812], "temperature": 0.0, "avg_logprob": -0.2157975126195837, "compression_ratio": + 1.5390625, "no_speech_prob": 0.02091003768146038}, {"id": 46, "seek": 13056, "start": + 139.52, "end": 139.92000000000002, "text": " nature.", "tokens": [50812, 3687, 13, + 50832], "temperature": 0.0, "avg_logprob": -0.2157975126195837, "compression_ratio": + 1.5390625, "no_speech_prob": 0.02091003768146038}, {"id": 47, "seek": 13056, "start": + 139.92000000000002, "end": 141.68, "text": " And there it is with coffee.", "tokens": + [50832, 400, 456, 309, 307, 365, 4982, 13, 50920], "temperature": 0.0, "avg_logprob": + -0.2157975126195837, "compression_ratio": 1.5390625, 
"no_speech_prob": 0.02091003768146038}, + {"id": 48, "seek": 13056, "start": 141.68, "end": 147.92000000000002, "text": " + And then I''m also showing off my metal ceramic metal enameled cup", "tokens": [50920, + 400, 550, 286, 478, 611, 4099, 766, 452, 5760, 29996, 5760, 465, 16103, 292, 4414, + 51232], "temperature": 0.0, "avg_logprob": -0.2157975126195837, "compression_ratio": + 1.5390625, "no_speech_prob": 0.02091003768146038}, {"id": 49, "seek": 13056, "start": + 147.92000000000002, "end": 153.2, "text": " that I picked up at OpenSearchCon EU + a couple weeks ago in Berlin", "tokens": [51232, 300, 286, 6183, 493, 412, 7238, + 10637, 1178, 9838, 10887, 257, 1916, 3259, 2057, 294, 13848, 51496], "temperature": + 0.0, "avg_logprob": -0.2157975126195837, "compression_ratio": 1.5390625, "no_speech_prob": + 0.02091003768146038}, {"id": 50, "seek": 13056, "start": 153.2, "end": 157.2, "text": + " that Zeta Alpha shared had some great conversations about search,", "tokens": + [51496, 300, 1176, 7664, 20588, 5507, 632, 512, 869, 7315, 466, 3164, 11, 51696], + "temperature": 0.0, "avg_logprob": -0.2157975126195837, "compression_ratio": 1.5390625, + "no_speech_prob": 0.02091003768146038}, {"id": 51, "seek": 13056, "start": 157.2, + "end": 159.04, "text": " relevancy and measurement with them.", "tokens": [51696, + 25916, 6717, 293, 13160, 365, 552, 13, 51788], "temperature": 0.0, "avg_logprob": + -0.2157975126195837, "compression_ratio": 1.5390625, "no_speech_prob": 0.02091003768146038}, + {"id": 52, "seek": 15904, "start": 159.04, "end": 164.23999999999998, "text": " + So we took these two cups on our family camping trip the other week.", "tokens": + [50364, 407, 321, 1890, 613, 732, 13381, 322, 527, 1605, 19470, 4931, 264, 661, + 1243, 13, 50624], "temperature": 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": + 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, {"id": 53, "seek": 15904, + "start": 164.23999999999998, "end": 166.48, "text": " I 
wanted to show those off + to you.", "tokens": [50624, 286, 1415, 281, 855, 729, 766, 281, 291, 13, 50736], + "temperature": 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": 1.5977443609022557, + "no_speech_prob": 0.02588249370455742}, {"id": 54, "seek": 15904, "start": 166.48, + "end": 167.51999999999998, "text": " This is lovely.", "tokens": [50736, 639, 307, + 7496, 13, 50788], "temperature": 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": + 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, {"id": 55, "seek": 15904, + "start": 167.51999999999998, "end": 168.23999999999998, "text": " This is lovely.", + "tokens": [50788, 639, 307, 7496, 13, 50824], "temperature": 0.0, "avg_logprob": + -0.21564528677198622, "compression_ratio": 1.5977443609022557, "no_speech_prob": + 0.02588249370455742}, {"id": 56, "seek": 15904, "start": 168.23999999999998, "end": + 171.44, "text": " And I''m glad you''re putting this in good news.", "tokens": [50824, + 400, 286, 478, 5404, 291, 434, 3372, 341, 294, 665, 2583, 13, 50984], "temperature": + 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": 1.5977443609022557, + "no_speech_prob": 0.02588249370455742}, {"id": 57, "seek": 15904, "start": 172.0, + "end": 172.48, "text": " Yep.", "tokens": [51012, 7010, 13, 51036], "temperature": + 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": 1.5977443609022557, + "no_speech_prob": 0.02588249370455742}, {"id": 58, "seek": 15904, "start": 172.48, + "end": 173.51999999999998, "text": " It goes with us.", "tokens": [51036, 467, 1709, + 365, 505, 13, 51088], "temperature": 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": + 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, {"id": 59, "seek": 15904, + "start": 173.51999999999998, "end": 174.95999999999998, "text": " So fantastic.", + "tokens": [51088, 407, 5456, 13, 51160], "temperature": 0.0, "avg_logprob": -0.21564528677198622, + "compression_ratio": 
1.5977443609022557, "no_speech_prob": 0.02588249370455742}, + {"id": 60, "seek": 15904, "start": 174.95999999999998, "end": 175.76, "text": " + Yes.", "tokens": [51160, 1079, 13, 51200], "temperature": 0.0, "avg_logprob": -0.21564528677198622, + "compression_ratio": 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, + {"id": 61, "seek": 15904, "start": 175.76, "end": 176.95999999999998, "text": " + Yes, where we start.", "tokens": [51200, 1079, 11, 689, 321, 722, 13, 51260], "temperature": + 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": 1.5977443609022557, + "no_speech_prob": 0.02588249370455742}, {"id": 62, "seek": 15904, "start": 176.95999999999998, + "end": 178.23999999999998, "text": " First of all, hello.", "tokens": [51260, 2386, + 295, 439, 11, 7751, 13, 51324], "temperature": 0.0, "avg_logprob": -0.21564528677198622, + "compression_ratio": 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, + {"id": 63, "seek": 15904, "start": 178.23999999999998, "end": 179.12, "text": " + Welcome.", "tokens": [51324, 4027, 13, 51368], "temperature": 0.0, "avg_logprob": + -0.21564528677198622, "compression_ratio": 1.5977443609022557, "no_speech_prob": + 0.02588249370455742}, {"id": 64, "seek": 15904, "start": 179.12, "end": 180.07999999999998, + "text": " Welcome.", "tokens": [51368, 4027, 13, 51416], "temperature": 0.0, "avg_logprob": + -0.21564528677198622, "compression_ratio": 1.5977443609022557, "no_speech_prob": + 0.02588249370455742}, {"id": 65, "seek": 15904, "start": 180.07999999999998, "end": + 180.56, "text": " Yeah.", "tokens": [51416, 865, 13, 51440], "temperature": 0.0, + "avg_logprob": -0.21564528677198622, "compression_ratio": 1.5977443609022557, "no_speech_prob": + 0.02588249370455742}, {"id": 66, "seek": 15904, "start": 180.56, "end": 181.92, + "text": " Thank you very much for having me.", "tokens": [51440, 1044, 291, 588, + 709, 337, 1419, 385, 13, 51508], "temperature": 0.0, "avg_logprob": -0.21564528677198622, + 
"compression_ratio": 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, + {"id": 67, "seek": 15904, "start": 181.92, "end": 183.04, "text": " It''s long overdue.", + "tokens": [51508, 467, 311, 938, 19853, 622, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.21564528677198622, "compression_ratio": 1.5977443609022557, "no_speech_prob": + 0.02588249370455742}, {"id": 68, "seek": 15904, "start": 183.04, "end": 186.23999999999998, + "text": " And usually we start with a little bit of a background.", "tokens": [51564, + 400, 2673, 321, 722, 365, 257, 707, 857, 295, 257, 3678, 13, 51724], "temperature": + 0.0, "avg_logprob": -0.21564528677198622, "compression_ratio": 1.5977443609022557, + "no_speech_prob": 0.02588249370455742}, {"id": 69, "seek": 15904, "start": 186.23999999999998, + "end": 187.6, "text": " Obviously, people can go.", "tokens": [51724, 7580, 11, + 561, 393, 352, 13, 51792], "temperature": 0.0, "avg_logprob": -0.21564528677198622, + "compression_ratio": 1.5977443609022557, "no_speech_prob": 0.02588249370455742}, + {"id": 70, "seek": 18760, "start": 187.6, "end": 190.16, "text": " I think you even + have a Wikipedia page about you.", "tokens": [50364, 286, 519, 291, 754, 362, 257, + 28999, 3028, 466, 291, 13, 50492], "temperature": 0.0, "avg_logprob": -0.21349844179655375, + "compression_ratio": 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, + {"id": 71, "seek": 18760, "start": 190.16, "end": 190.88, "text": " I think so.", + "tokens": [50492, 286, 519, 370, 13, 50528], "temperature": 0.0, "avg_logprob": + -0.21349844179655375, "compression_ratio": 1.7509433962264151, "no_speech_prob": + 0.030675683170557022}, {"id": 72, "seek": 18760, "start": 190.88, "end": 191.84, + "text": " I don''t know.", "tokens": [50528, 286, 500, 380, 458, 13, 50576], "temperature": + 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": 1.7509433962264151, + "no_speech_prob": 0.030675683170557022}, {"id": 73, "seek": 18760, "start": 191.84, + 
"end": 193.28, "text": " That is a lot of people.", "tokens": [50576, 663, 307, + 257, 688, 295, 561, 13, 50648], "temperature": 0.0, "avg_logprob": -0.21349844179655375, + "compression_ratio": 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, + {"id": 74, "seek": 18760, "start": 193.28, "end": 193.51999999999998, "text": " + Right.", "tokens": [50648, 1779, 13, 50660], "temperature": 0.0, "avg_logprob": + -0.21349844179655375, "compression_ratio": 1.7509433962264151, "no_speech_prob": + 0.030675683170557022}, {"id": 75, "seek": 18760, "start": 193.51999999999998, "end": + 196.88, "text": " That is a lot of people to get to a Wikipedia page.", "tokens": + [50660, 663, 307, 257, 688, 295, 561, 281, 483, 281, 257, 28999, 3028, 13, 50828], + "temperature": 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": 1.7509433962264151, + "no_speech_prob": 0.030675683170557022}, {"id": 76, "seek": 18760, "start": 196.88, + "end": 198.79999999999998, "text": " I don''t know that I''m quite there yet.", + "tokens": [50828, 286, 500, 380, 458, 300, 286, 478, 1596, 456, 1939, 13, 50924], + "temperature": 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": 1.7509433962264151, + "no_speech_prob": 0.030675683170557022}, {"id": 77, "seek": 18760, "start": 198.79999999999998, + "end": 199.12, "text": " Yes.", "tokens": [50924, 1079, 13, 50940], "temperature": + 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": 1.7509433962264151, + "no_speech_prob": 0.030675683170557022}, {"id": 78, "seek": 18760, "start": 199.12, + "end": 201.12, "text": " So my name''s Eric Pugh.", "tokens": [50940, 407, 452, + 1315, 311, 9336, 430, 1984, 13, 51040], "temperature": 0.0, "avg_logprob": -0.21349844179655375, + "compression_ratio": 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, + {"id": 79, "seek": 18760, "start": 201.12, "end": 203.6, "text": " Been doing search + for about, I don''t know.", "tokens": [51040, 32839, 884, 3164, 337, 466, 11, 
286, + 500, 380, 458, 13, 51164], "temperature": 0.0, "avg_logprob": -0.21349844179655375, + "compression_ratio": 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, + {"id": 80, "seek": 18760, "start": 203.6, "end": 206.4, "text": " We''re like getting + 15 years.", "tokens": [51164, 492, 434, 411, 1242, 2119, 924, 13, 51304], "temperature": + 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": 1.7509433962264151, + "no_speech_prob": 0.030675683170557022}, {"id": 81, "seek": 18760, "start": 206.4, + "end": 210.79999999999998, "text": " And I was there for when a search was like + first,", "tokens": [51304, 400, 286, 390, 456, 337, 562, 257, 3164, 390, 411, 700, + 11, 51524], "temperature": 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": + 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, {"id": 82, "seek": + 18760, "start": 210.79999999999998, "end": 212.4, "text": " oh, you have your own + search engine.", "tokens": [51524, 1954, 11, 291, 362, 428, 1065, 3164, 2848, 13, + 51604], "temperature": 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": + 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, {"id": 83, "seek": + 18760, "start": 212.4, "end": 213.76, "text": " It was very exotic.", "tokens": + [51604, 467, 390, 588, 27063, 13, 51672], "temperature": 0.0, "avg_logprob": -0.21349844179655375, + "compression_ratio": 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, + {"id": 84, "seek": 18760, "start": 213.76, "end": 215.84, "text": " And there was + nothing open source.", "tokens": [51672, 400, 456, 390, 1825, 1269, 4009, 13, 51776], + "temperature": 0.0, "avg_logprob": -0.21349844179655375, "compression_ratio": 1.7509433962264151, + "no_speech_prob": 0.030675683170557022}, {"id": 85, "seek": 18760, "start": 215.84, + "end": 217.04, "text": " It was all commercial.", "tokens": [51776, 467, 390, 439, + 6841, 13, 51836], "temperature": 0.0, "avg_logprob": -0.21349844179655375, 
"compression_ratio": + 1.7509433962264151, "no_speech_prob": 0.030675683170557022}, {"id": 86, "seek": + 21760, "start": 217.6, "end": 220.88, "text": " And then cut my teeth in search + going", "tokens": [50364, 400, 550, 1723, 452, 7798, 294, 3164, 516, 50528], "temperature": + 0.0, "avg_logprob": -0.20726046671394174, "compression_ratio": 1.6459143968871595, + "no_speech_prob": 0.0006410639034584165}, {"id": 87, "seek": 21760, "start": 220.88, + "end": 223.35999999999999, "text": " through the big data time period.", "tokens": + [50528, 807, 264, 955, 1412, 565, 2896, 13, 50652], "temperature": 0.0, "avg_logprob": + -0.20726046671394174, "compression_ratio": 1.6459143968871595, "no_speech_prob": + 0.0006410639034584165}, {"id": 88, "seek": 21760, "start": 223.35999999999999, "end": + 223.84, "text": " Right.", "tokens": [50652, 1779, 13, 50676], "temperature": 0.0, + "avg_logprob": -0.20726046671394174, "compression_ratio": 1.6459143968871595, "no_speech_prob": + 0.0006410639034584165}, {"id": 89, "seek": 21760, "start": 223.84, "end": 226.4, + "text": " When, as Grant Ingersoll said once,", "tokens": [50676, 1133, 11, 382, + 17529, 682, 9458, 1833, 848, 1564, 11, 50804], "temperature": 0.0, "avg_logprob": + -0.20726046671394174, "compression_ratio": 1.6459143968871595, "no_speech_prob": + 0.0006410639034584165}, {"id": 90, "seek": 21760, "start": 226.4, "end": 228.79999999999998, + "text": " search is the UI to big data.", "tokens": [50804, 3164, 307, 264, 15682, + 281, 955, 1412, 13, 50924], "temperature": 0.0, "avg_logprob": -0.20726046671394174, + "compression_ratio": 1.6459143968871595, "no_speech_prob": 0.0006410639034584165}, + {"id": 91, "seek": 21760, "start": 228.79999999999998, "end": 230.48, "text": " + And it was all about data.", "tokens": [50924, 400, 309, 390, 439, 466, 1412, 13, + 51008], "temperature": 0.0, "avg_logprob": -0.20726046671394174, "compression_ratio": + 1.6459143968871595, "no_speech_prob": 0.0006410639034584165}, {"id": 92, 
"seek": + 21760, "start": 230.48, "end": 233.68, "text": " Can we handle and how do we store + it and scale up our search", "tokens": [51008, 1664, 321, 4813, 293, 577, 360, 321, + 3531, 309, 293, 4373, 493, 527, 3164, 51168], "temperature": 0.0, "avg_logprob": + -0.20726046671394174, "compression_ratio": 1.6459143968871595, "no_speech_prob": + 0.0006410639034584165}, {"id": 93, "seek": 21760, "start": 233.68, "end": 234.72, + "text": " engines?", "tokens": [51168, 12982, 30, 51220], "temperature": 0.0, "avg_logprob": + -0.20726046671394174, "compression_ratio": 1.6459143968871595, "no_speech_prob": + 0.0006410639034584165}, {"id": 94, "seek": 21760, "start": 234.72, "end": 237.04, + "text": " And that was great and kind of led", "tokens": [51220, 400, 300, 390, + 869, 293, 733, 295, 4684, 51336], "temperature": 0.0, "avg_logprob": -0.20726046671394174, + "compression_ratio": 1.6459143968871595, "no_speech_prob": 0.0006410639034584165}, + {"id": 95, "seek": 21760, "start": 237.04, "end": 239.35999999999999, "text": " + into the machine learning time period.", "tokens": [51336, 666, 264, 3479, 2539, + 565, 2896, 13, 51452], "temperature": 0.0, "avg_logprob": -0.20726046671394174, + "compression_ratio": 1.6459143968871595, "no_speech_prob": 0.0006410639034584165}, + {"id": 96, "seek": 21760, "start": 239.35999999999999, "end": 244.0, "text": " We''re + really at that point, it was like, OK, we have lots of data.", "tokens": [51452, + 492, 434, 534, 412, 300, 935, 11, 309, 390, 411, 11, 2264, 11, 321, 362, 3195, 295, + 1412, 13, 51684], "temperature": 0.0, "avg_logprob": -0.20726046671394174, "compression_ratio": + 1.6459143968871595, "no_speech_prob": 0.0006410639034584165}, {"id": 97, "seek": + 21760, "start": 244.0, "end": 245.32, "text": " We can now search it.", "tokens": + [51684, 492, 393, 586, 3164, 309, 13, 51750], "temperature": 0.0, "avg_logprob": + -0.20726046671394174, "compression_ratio": 1.6459143968871595, "no_speech_prob": + 0.0006410639034584165}, 
{"id": 98, "seek": 21760, "start": 245.32, "end": 246.48, + "text": " What does it mean?", "tokens": [51750, 708, 775, 309, 914, 30, 51808], + "temperature": 0.0, "avg_logprob": -0.20726046671394174, "compression_ratio": 1.6459143968871595, + "no_speech_prob": 0.0006410639034584165}, {"id": 99, "seek": 24648, "start": 246.76, + "end": 248.76, "text": " What are people looking for?", "tokens": [50378, 708, 366, + 561, 1237, 337, 30, 50478], "temperature": 0.0, "avg_logprob": -0.19754366147316108, + "compression_ratio": 1.6629213483146068, "no_speech_prob": 0.0022490399423986673}, + {"id": 100, "seek": 24648, "start": 248.76, "end": 252.76, "text": " It wasn''t + enough to have fast search with 10 blue links.", "tokens": [50478, 467, 2067, 380, + 1547, 281, 362, 2370, 3164, 365, 1266, 3344, 6123, 13, 50678], "temperature": 0.0, + "avg_logprob": -0.19754366147316108, "compression_ratio": 1.6629213483146068, "no_speech_prob": + 0.0022490399423986673}, {"id": 101, "seek": 24648, "start": 252.76, "end": 255.35999999999999, + "text": " It was all of a sudden became really important to be like,", "tokens": + [50678, 467, 390, 439, 295, 257, 3990, 3062, 534, 1021, 281, 312, 411, 11, 50808], + "temperature": 0.0, "avg_logprob": -0.19754366147316108, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0022490399423986673}, {"id": 102, "seek": 24648, "start": 255.35999999999999, + "end": 258.76, "text": " am I giving my users what they want or not?", "tokens": + [50808, 669, 286, 2902, 452, 5022, 437, 436, 528, 420, 406, 30, 50978], "temperature": + 0.0, "avg_logprob": -0.19754366147316108, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0022490399423986673}, {"id": 103, "seek": 24648, "start": 258.76, + "end": 263.36, "text": " And machine learning and data science really kind of came + along", "tokens": [50978, 400, 3479, 2539, 293, 1412, 3497, 534, 733, 295, 1361, + 2051, 51208], "temperature": 0.0, "avg_logprob": -0.19754366147316108, 
"compression_ratio": + 1.6629213483146068, "no_speech_prob": 0.0022490399423986673}, {"id": 104, "seek": + 24648, "start": 263.36, "end": 266.96, "text": " and helped us make those determinations.", + "tokens": [51208, 293, 4254, 505, 652, 729, 3618, 10325, 13, 51388], "temperature": + 0.0, "avg_logprob": -0.19754366147316108, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0022490399423986673}, {"id": 105, "seek": 24648, "start": 266.96, + "end": 270.4, "text": " So really, and that''s when open source connections,", "tokens": + [51388, 407, 534, 11, 293, 300, 311, 562, 1269, 4009, 9271, 11, 51560], "temperature": + 0.0, "avg_logprob": -0.19754366147316108, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0022490399423986673}, {"id": 106, "seek": 24648, "start": 270.4, + "end": 272.76, "text": " the company I was one of the co-founders of,", "tokens": + [51560, 264, 2237, 286, 390, 472, 295, 264, 598, 12, 17493, 433, 295, 11, 51678], + "temperature": 0.0, "avg_logprob": -0.19754366147316108, "compression_ratio": 1.6629213483146068, + "no_speech_prob": 0.0022490399423986673}, {"id": 107, "seek": 24648, "start": 272.76, + "end": 275.24, "text": " and I''m one of the leaders of really kind of focusing", + "tokens": [51678, 293, 286, 478, 472, 295, 264, 3523, 295, 534, 733, 295, 8416, + 51802], "temperature": 0.0, "avg_logprob": -0.19754366147316108, "compression_ratio": + 1.6629213483146068, "no_speech_prob": 0.0022490399423986673}, {"id": 108, "seek": + 27524, "start": 275.92, "end": 278.76, "text": " on the value side of search, relevancy.", + "tokens": [50398, 322, 264, 2158, 1252, 295, 3164, 11, 25916, 6717, 13, 50540], + "temperature": 0.0, "avg_logprob": -0.20609786918571404, "compression_ratio": 1.624, + "no_speech_prob": 0.012816332280635834}, {"id": 109, "seek": 27524, "start": 278.76, + "end": 281.6, "text": " Am I giving people what they''re looking for?", "tokens": + [50540, 2012, 286, 2902, 561, 437, 436, 434, 1237, 337, 30, 
50682], "temperature": + 0.0, "avg_logprob": -0.20609786918571404, "compression_ratio": 1.624, "no_speech_prob": + 0.012816332280635834}, {"id": 110, "seek": 27524, "start": 281.6, "end": 284.48, + "text": " How do I drive more revenue in e-commerce?", "tokens": [50682, 1012, 360, + 286, 3332, 544, 9324, 294, 308, 12, 26926, 30, 50826], "temperature": 0.0, "avg_logprob": + -0.20609786918571404, "compression_ratio": 1.624, "no_speech_prob": 0.012816332280635834}, + {"id": 111, "seek": 27524, "start": 284.48, "end": 287.48, "text": " How do I help + people use my SaaS products?", "tokens": [50826, 1012, 360, 286, 854, 561, 764, + 452, 49733, 3383, 30, 50976], "temperature": 0.0, "avg_logprob": -0.20609786918571404, + "compression_ratio": 1.624, "no_speech_prob": 0.012816332280635834}, {"id": 112, + "seek": 27524, "start": 287.48, "end": 290.12, "text": " Are they subscribed and + renew their subscriptions?", "tokens": [50976, 2014, 436, 16665, 293, 10162, 641, + 44951, 30, 51108], "temperature": 0.0, "avg_logprob": -0.20609786918571404, "compression_ratio": + 1.624, "no_speech_prob": 0.012816332280635834}, {"id": 113, "seek": 27524, "start": + 290.12, "end": 291.8, "text": " All of this, right?", "tokens": [51108, 1057, 295, + 341, 11, 558, 30, 51192], "temperature": 0.0, "avg_logprob": -0.20609786918571404, + "compression_ratio": 1.624, "no_speech_prob": 0.012816332280635834}, {"id": 114, + "seek": 27524, "start": 291.8, "end": 294.6, "text": " And yeah, machine learning + was awesome.", "tokens": [51192, 400, 1338, 11, 3479, 2539, 390, 3476, 13, 51332], + "temperature": 0.0, "avg_logprob": -0.20609786918571404, "compression_ratio": 1.624, + "no_speech_prob": 0.012816332280635834}, {"id": 115, "seek": 27524, "start": 294.6, + "end": 296.04, "text": " Data science was awesome.", "tokens": [51332, 11888, 3497, + 390, 3476, 13, 51404], "temperature": 0.0, "avg_logprob": -0.20609786918571404, + "compression_ratio": 1.624, "no_speech_prob": 0.012816332280635834}, {"id": 
116, + "seek": 27524, "start": 296.04, "end": 299.0, "text": " Really got into a whole + measurement thing.", "tokens": [51404, 4083, 658, 666, 257, 1379, 13160, 551, 13, + 51552], "temperature": 0.0, "avg_logprob": -0.20609786918571404, "compression_ratio": + 1.624, "no_speech_prob": 0.012816332280635834}, {"id": 117, "seek": 27524, "start": + 299.0, "end": 303.8, "text": " And that was kind of one of the products that I stored,", + "tokens": [51552, 400, 300, 390, 733, 295, 472, 295, 264, 3383, 300, 286, 12187, + 11, 51792], "temperature": 0.0, "avg_logprob": -0.20609786918571404, "compression_ratio": + 1.624, "no_speech_prob": 0.012816332280635834}, {"id": 118, "seek": 30380, "start": + 303.8, "end": 308.8, "text": " Cupid, we know each other, came out of that time + period", "tokens": [50364, 383, 6127, 11, 321, 458, 1184, 661, 11, 1361, 484, 295, + 300, 565, 2896, 50614], "temperature": 0.0, "avg_logprob": -0.17252057269938942, + "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.0014866720885038376}, + {"id": 119, "seek": 30380, "start": 309.08, "end": 312.2, "text": " because we said, + why are we building custom tooling", "tokens": [50628, 570, 321, 848, 11, 983, 366, + 321, 2390, 2375, 46593, 50784], "temperature": 0.0, "avg_logprob": -0.17252057269938942, + "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.0014866720885038376}, + {"id": 120, "seek": 30380, "start": 312.2, "end": 315.0, "text": " for every project, + maybe we could share some things.", "tokens": [50784, 337, 633, 1716, 11, 1310, + 321, 727, 2073, 512, 721, 13, 50924], "temperature": 0.0, "avg_logprob": -0.17252057269938942, + "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.0014866720885038376}, + {"id": 121, "seek": 30380, "start": 315.0, "end": 319.16, "text": " So, and then + yeah, today it''s really been exciting", "tokens": [50924, 407, 11, 293, 550, 1338, + 11, 965, 309, 311, 534, 668, 4670, 51132], "temperature": 0.0, "avg_logprob": 
-0.17252057269938942, + "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.0014866720885038376}, + {"id": 122, "seek": 30380, "start": 319.16, "end": 323.68, "text": " to see sort + of generative AI come along and vectors.", "tokens": [51132, 281, 536, 1333, 295, + 1337, 1166, 7318, 808, 2051, 293, 18875, 13, 51358], "temperature": 0.0, "avg_logprob": + -0.17252057269938942, "compression_ratio": 1.5761316872427984, "no_speech_prob": + 0.0014866720885038376}, {"id": 123, "seek": 30380, "start": 323.68, "end": 327.04, + "text": " And it''s interesting because I still feel, you know,", "tokens": [51358, + 400, 309, 311, 1880, 570, 286, 920, 841, 11, 291, 458, 11, 51526], "temperature": + 0.0, "avg_logprob": -0.17252057269938942, "compression_ratio": 1.5761316872427984, + "no_speech_prob": 0.0014866720885038376}, {"id": 124, "seek": 30380, "start": 327.04, + "end": 331.48, "text": " for a little while I was like, is search still gonna be + a domain?", "tokens": [51526, 337, 257, 707, 1339, 286, 390, 411, 11, 307, 3164, + 920, 799, 312, 257, 9274, 30, 51748], "temperature": 0.0, "avg_logprob": -0.17252057269938942, + "compression_ratio": 1.5761316872427984, "no_speech_prob": 0.0014866720885038376}, + {"id": 125, "seek": 33148, "start": 331.56, "end": 333.84000000000003, "text": " + And you know, search is totally changed,", "tokens": [50368, 400, 291, 458, 11, + 3164, 307, 3879, 3105, 11, 50482], "temperature": 0.0, "avg_logprob": -0.18408670230787627, + "compression_ratio": 1.5420168067226891, "no_speech_prob": 0.00013959500938653946}, + {"id": 126, "seek": 33148, "start": 333.84000000000003, "end": 338.68, "text": " + but it''s still how people interact with systems, right?", "tokens": [50482, 457, + 309, 311, 920, 577, 561, 4648, 365, 3652, 11, 558, 30, 50724], "temperature": 0.0, + "avg_logprob": -0.18408670230787627, "compression_ratio": 1.5420168067226891, "no_speech_prob": + 0.00013959500938653946}, {"id": 127, "seek": 33148, "start": 338.68, "end": 
343.6, + "text": " Whether it''s a spot and a retrieval augmented generation", "tokens": + [50724, 8503, 309, 311, 257, 4008, 293, 257, 19817, 3337, 36155, 5125, 50970], "temperature": + 0.0, "avg_logprob": -0.18408670230787627, "compression_ratio": 1.5420168067226891, + "no_speech_prob": 0.00013959500938653946}, {"id": 128, "seek": 33148, "start": 343.6, + "end": 346.12, "text": " or a more traditional keyword search,", "tokens": [50970, + 420, 257, 544, 5164, 20428, 3164, 11, 51096], "temperature": 0.0, "avg_logprob": + -0.18408670230787627, "compression_ratio": 1.5420168067226891, "no_speech_prob": + 0.00013959500938653946}, {"id": 129, "seek": 33148, "start": 347.08000000000004, + "end": 350.20000000000005, "text": " using LLMs, using models, using vectors,", + "tokens": [51144, 1228, 441, 43, 26386, 11, 1228, 5245, 11, 1228, 18875, 11, 51300], + "temperature": 0.0, "avg_logprob": -0.18408670230787627, "compression_ratio": 1.5420168067226891, + "no_speech_prob": 0.00013959500938653946}, {"id": 130, "seek": 33148, "start": 350.20000000000005, + "end": 352.48, "text": " still a search engine in the middle of it,", "tokens": + [51300, 920, 257, 3164, 2848, 294, 264, 2808, 295, 309, 11, 51414], "temperature": + 0.0, "avg_logprob": -0.18408670230787627, "compression_ratio": 1.5420168067226891, + "no_speech_prob": 0.00013959500938653946}, {"id": 131, "seek": 33148, "start": 352.48, + "end": 355.28000000000003, "text": " mediating, moderating that conversation.", + "tokens": [51414, 17269, 990, 11, 10494, 990, 300, 3761, 13, 51554], "temperature": + 0.0, "avg_logprob": -0.18408670230787627, "compression_ratio": 1.5420168067226891, + "no_speech_prob": 0.00013959500938653946}, {"id": 132, "seek": 33148, "start": 355.28000000000003, + "end": 360.0, "text": " So really excited about what Gen AI has let us do.", "tokens": + [51554, 407, 534, 2919, 466, 437, 3632, 7318, 575, 718, 505, 360, 13, 51790], "temperature": + 0.0, "avg_logprob": -0.18408670230787627, 
"compression_ratio": 1.5420168067226891, + "no_speech_prob": 0.00013959500938653946}, {"id": 133, "seek": 36000, "start": 360.0, + "end": 363.72, "text": " And I think my big takeaway right now is", "tokens": [50364, + 400, 286, 519, 452, 955, 30681, 558, 586, 307, 50550], "temperature": 0.0, "avg_logprob": + -0.1490144633283519, "compression_ratio": 1.6680497925311204, "no_speech_prob": + 8.178748976206407e-05}, {"id": 134, "seek": 36000, "start": 363.72, "end": 368.2, + "text": " that historically search was fairly mediocre.", "tokens": [50550, 300, + 16180, 3164, 390, 6457, 45415, 13, 50774], "temperature": 0.0, "avg_logprob": -0.1490144633283519, + "compression_ratio": 1.6680497925311204, "no_speech_prob": 8.178748976206407e-05}, + {"id": 135, "seek": 36000, "start": 368.2, "end": 370.92, "text": " You could make + it a little better, you could make it a little worse,", "tokens": [50774, 509, 727, + 652, 309, 257, 707, 1101, 11, 291, 727, 652, 309, 257, 707, 5324, 11, 50910], "temperature": + 0.0, "avg_logprob": -0.1490144633283519, "compression_ratio": 1.6680497925311204, + "no_speech_prob": 8.178748976206407e-05}, {"id": 136, "seek": 36000, "start": 370.92, + "end": 374.36, "text": " but it was always like people understood it was fairly + explainable.", "tokens": [50910, 457, 309, 390, 1009, 411, 561, 7320, 309, 390, + 6457, 2903, 712, 13, 51082], "temperature": 0.0, "avg_logprob": -0.1490144633283519, + "compression_ratio": 1.6680497925311204, "no_speech_prob": 8.178748976206407e-05}, + {"id": 137, "seek": 36000, "start": 375.44, "end": 379.0, "text": " Why I''m really + excited about measurement", "tokens": [51136, 1545, 286, 478, 534, 2919, 466, 13160, + 51314], "temperature": 0.0, "avg_logprob": -0.1490144633283519, "compression_ratio": + 1.6680497925311204, "no_speech_prob": 8.178748976206407e-05}, {"id": 138, "seek": + 36000, "start": 379.0, "end": 383.12, "text": " and understanding these days is + because now with Gen AI,", "tokens": [51314, 293, 
3701, 613, 1708, 307, 570, 586, + 365, 3632, 7318, 11, 51520], "temperature": 0.0, "avg_logprob": -0.1490144633283519, + "compression_ratio": 1.6680497925311204, "no_speech_prob": 8.178748976206407e-05}, + {"id": 139, "seek": 36000, "start": 383.12, "end": 385.08, "text": " we have much + better tools.", "tokens": [51520, 321, 362, 709, 1101, 3873, 13, 51618], "temperature": + 0.0, "avg_logprob": -0.1490144633283519, "compression_ratio": 1.6680497925311204, + "no_speech_prob": 8.178748976206407e-05}, {"id": 140, "seek": 36000, "start": 385.08, + "end": 388.4, "text": " We don''t have to have mediocre search kind of better,", + "tokens": [51618, 492, 500, 380, 362, 281, 362, 45415, 3164, 733, 295, 1101, 11, + 51784], "temperature": 0.0, "avg_logprob": -0.1490144633283519, "compression_ratio": + 1.6680497925311204, "no_speech_prob": 8.178748976206407e-05}, {"id": 141, "seek": + 38840, "start": 388.4, "end": 389.47999999999996, "text": " kind of worse.", "tokens": + [50364, 733, 295, 5324, 13, 50418], "temperature": 0.0, "avg_logprob": -0.23618685078417134, + "compression_ratio": 1.7261904761904763, "no_speech_prob": 0.0020165250170975924}, + {"id": 142, "seek": 38840, "start": 389.47999999999996, "end": 393.52, "text": " + Instead we can have amazing, accurate search results", "tokens": [50418, 7156, 321, + 393, 362, 2243, 11, 8559, 3164, 3542, 50620], "temperature": 0.0, "avg_logprob": + -0.23618685078417134, "compression_ratio": 1.7261904761904763, "no_speech_prob": + 0.0020165250170975924}, {"id": 143, "seek": 38840, "start": 393.52, "end": 395.79999999999995, + "text": " that really understand what you''re looking for.", "tokens": [50620, 300, + 534, 1223, 437, 291, 434, 1237, 337, 13, 50734], "temperature": 0.0, "avg_logprob": + -0.23618685078417134, "compression_ratio": 1.7261904761904763, "no_speech_prob": + 0.0020165250170975924}, {"id": 144, "seek": 38840, "start": 395.79999999999995, + "end": 399.12, "text": " And you''re like, yes, this is exactly what I 
wanted.", + "tokens": [50734, 400, 291, 434, 411, 11, 2086, 11, 341, 307, 2293, 437, 286, 1415, + 13, 50900], "temperature": 0.0, "avg_logprob": -0.23618685078417134, "compression_ratio": + 1.7261904761904763, "no_speech_prob": 0.0020165250170975924}, {"id": 145, "seek": + 38840, "start": 399.12, "end": 402.79999999999995, "text": " But, whoops side of + it is sometimes those search results", "tokens": [50900, 583, 11, 567, 3370, 1252, + 295, 309, 307, 2171, 729, 3164, 3542, 51084], "temperature": 0.0, "avg_logprob": + -0.23618685078417134, "compression_ratio": 1.7261904761904763, "no_speech_prob": + 0.0020165250170975924}, {"id": 146, "seek": 38840, "start": 402.79999999999995, + "end": 405.23999999999995, "text": " are back shit crazy and you know,", "tokens": + [51084, 366, 646, 4611, 3219, 293, 291, 458, 11, 51206], "temperature": 0.0, "avg_logprob": + -0.23618685078417134, "compression_ratio": 1.7261904761904763, "no_speech_prob": + 0.0020165250170975924}, {"id": 147, "seek": 38840, "start": 405.23999999999995, + "end": 410.0, "text": " no idea why it came back with it and you made me lose trust.", + "tokens": [51206, 572, 1558, 983, 309, 1361, 646, 365, 309, 293, 291, 1027, 385, + 3624, 3361, 13, 51444], "temperature": 0.0, "avg_logprob": -0.23618685078417134, + "compression_ratio": 1.7261904761904763, "no_speech_prob": 0.0020165250170975924}, + {"id": 148, "seek": 38840, "start": 410.0, "end": 414.76, "text": " And so now instead + of all search results sort of being", "tokens": [51444, 400, 370, 586, 2602, 295, + 439, 3164, 3542, 1333, 295, 885, 51682], "temperature": 0.0, "avg_logprob": -0.23618685078417134, + "compression_ratio": 1.7261904761904763, "no_speech_prob": 0.0020165250170975924}, + {"id": 149, "seek": 38840, "start": 414.76, "end": 418.2, "text": " in the middle + sort of, yeah, a little better, little worse,", "tokens": [51682, 294, 264, 2808, + 1333, 295, 11, 1338, 11, 257, 707, 1101, 11, 707, 5324, 11, 51854], "temperature": + 0.0, 
"avg_logprob": -0.23618685078417134, "compression_ratio": 1.7261904761904763, + "no_speech_prob": 0.0020165250170975924}, {"id": 150, "seek": 41820, "start": 418.2, + "end": 419.59999999999997, "text": " we''re now really polarized.", "tokens": [50364, + 321, 434, 586, 534, 48623, 13, 50434], "temperature": 0.0, "avg_logprob": -0.13140755940258989, + "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.000396077724872157}, + {"id": 151, "seek": 41820, "start": 419.59999999999997, "end": 423.44, "text": " + Sometimes they''re amazing, sometimes they''re terrible.", "tokens": [50434, 4803, + 436, 434, 2243, 11, 2171, 436, 434, 6237, 13, 50626], "temperature": 0.0, "avg_logprob": + -0.13140755940258989, "compression_ratio": 1.8278688524590163, "no_speech_prob": + 0.000396077724872157}, {"id": 152, "seek": 41820, "start": 423.44, "end": 426.8, + "text": " And we need to understand what that curve looks like", "tokens": [50626, + 400, 321, 643, 281, 1223, 437, 300, 7605, 1542, 411, 50794], "temperature": 0.0, + "avg_logprob": -0.13140755940258989, "compression_ratio": 1.8278688524590163, "no_speech_prob": + 0.000396077724872157}, {"id": 153, "seek": 41820, "start": 426.8, "end": 430.52, + "text": " and make sure that the amount of terrible", "tokens": [50794, 293, 652, + 988, 300, 264, 2372, 295, 6237, 50980], "temperature": 0.0, "avg_logprob": -0.13140755940258989, + "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.000396077724872157}, + {"id": 154, "seek": 41820, "start": 430.52, "end": 432.8, "text": " is something + that we''re willing to deal with, right?", "tokens": [50980, 307, 746, 300, 321, + 434, 4950, 281, 2028, 365, 11, 558, 30, 51094], "temperature": 0.0, "avg_logprob": + -0.13140755940258989, "compression_ratio": 1.8278688524590163, "no_speech_prob": + 0.000396077724872157}, {"id": 155, "seek": 41820, "start": 432.8, "end": 436.36, + "text": " Terrible results, one in 10,000, one in 5,000,", "tokens": [51094, 6564, + 4457, 3542, 11, 
472, 294, 1266, 11, 1360, 11, 472, 294, 1025, 11, 1360, 11, 51272], + "temperature": 0.0, "avg_logprob": -0.13140755940258989, "compression_ratio": 1.8278688524590163, + "no_speech_prob": 0.000396077724872157}, {"id": 156, "seek": 41820, "start": 436.36, + "end": 439.12, "text": " one in a million, depending on your domain,", "tokens": + [51272, 472, 294, 257, 2459, 11, 5413, 322, 428, 9274, 11, 51410], "temperature": + 0.0, "avg_logprob": -0.13140755940258989, "compression_ratio": 1.8278688524590163, + "no_speech_prob": 0.000396077724872157}, {"id": 157, "seek": 41820, "start": 439.12, + "end": 442.48, "text": " it may need to be one in a billion is a terrible,", "tokens": + [51410, 309, 815, 643, 281, 312, 472, 294, 257, 5218, 307, 257, 6237, 11, 51578], + "temperature": 0.0, "avg_logprob": -0.13140755940258989, "compression_ratio": 1.8278688524590163, + "no_speech_prob": 0.000396077724872157}, {"id": 158, "seek": 41820, "start": 442.48, + "end": 443.91999999999996, "text": " right, depending on what you''re doing.", "tokens": + [51578, 558, 11, 5413, 322, 437, 291, 434, 884, 13, 51650], "temperature": 0.0, + "avg_logprob": -0.13140755940258989, "compression_ratio": 1.8278688524590163, "no_speech_prob": + 0.000396077724872157}, {"id": 159, "seek": 41820, "start": 443.91999999999996, "end": + 447.59999999999997, "text": " So exciting times, really exciting.", "tokens": [51650, + 407, 4670, 1413, 11, 534, 4670, 13, 51834], "temperature": 0.0, "avg_logprob": -0.13140755940258989, + "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.000396077724872157}, + {"id": 160, "seek": 44760, "start": 447.6, "end": 449.72, "text": " Yeah, it''s + amazing, it''s amazing story.", "tokens": [50364, 865, 11, 309, 311, 2243, 11, 309, + 311, 2243, 1657, 13, 50470], "temperature": 0.0, "avg_logprob": -0.21488162875175476, + "compression_ratio": 1.5694444444444444, "no_speech_prob": 0.0029164880979806185}, + {"id": 161, "seek": 44760, "start": 449.72, "end": 452.8, "text": " 
And of course, + I''m very pleased to also being able", "tokens": [50470, 400, 295, 1164, 11, 286, + 478, 588, 10587, 281, 611, 885, 1075, 50624], "temperature": 0.0, "avg_logprob": + -0.21488162875175476, "compression_ratio": 1.5694444444444444, "no_speech_prob": + 0.0029164880979806185}, {"id": 162, "seek": 44760, "start": 452.8, "end": 455.72, + "text": " to pick up, keep it with you early on,", "tokens": [50624, 281, 1888, + 493, 11, 1066, 309, 365, 291, 2440, 322, 11, 50770], "temperature": 0.0, "avg_logprob": + -0.21488162875175476, "compression_ratio": 1.5694444444444444, "no_speech_prob": + 0.0029164880979806185}, {"id": 163, "seek": 44760, "start": 455.72, "end": 459.72, + "text": " where I tried to pioneer it two companies ago", "tokens": [50770, 689, + 286, 3031, 281, 37668, 309, 732, 3431, 2057, 50970], "temperature": 0.0, "avg_logprob": + -0.21488162875175476, "compression_ratio": 1.5694444444444444, "no_speech_prob": + 0.0029164880979806185}, {"id": 164, "seek": 44760, "start": 459.72, "end": 462.12, + "text": " and I was leaving actually.", "tokens": [50970, 293, 286, 390, 5012, 767, + 13, 51090], "temperature": 0.0, "avg_logprob": -0.21488162875175476, "compression_ratio": + 1.5694444444444444, "no_speech_prob": 0.0029164880979806185}, {"id": 165, "seek": + 44760, "start": 462.12, "end": 463.40000000000003, "text": " But it was almost ready.", + "tokens": [51090, 583, 309, 390, 1920, 1919, 13, 51154], "temperature": 0.0, "avg_logprob": + -0.21488162875175476, "compression_ratio": 1.5694444444444444, "no_speech_prob": + 0.0029164880979806185}, {"id": 166, "seek": 44760, "start": 463.40000000000003, + "end": 465.72, "text": " And then the next company I actually deployed it.", "tokens": + [51154, 400, 550, 264, 958, 2237, 286, 767, 17826, 309, 13, 51270], "temperature": + 0.0, "avg_logprob": -0.21488162875175476, "compression_ratio": 1.5694444444444444, + "no_speech_prob": 0.0029164880979806185}, {"id": 167, "seek": 44760, "start": 465.72, + "end": 
470.36, "text": " And we, I remember we generated 70 G-RAT tickets", "tokens": + [51270, 400, 321, 11, 286, 1604, 321, 10833, 5285, 460, 12, 49, 2218, 12628, 51502], + "temperature": 0.0, "avg_logprob": -0.21488162875175476, "compression_ratio": 1.5694444444444444, + "no_speech_prob": 0.0029164880979806185}, {"id": 168, "seek": 44760, "start": 470.36, + "end": 472.48, "text": " just by looking at queries in Cupid", "tokens": [51502, + 445, 538, 1237, 412, 24109, 294, 383, 6127, 51608], "temperature": 0.0, "avg_logprob": + -0.21488162875175476, "compression_ratio": 1.5694444444444444, "no_speech_prob": + 0.0029164880979806185}, {"id": 169, "seek": 44760, "start": 472.48, "end": 474.40000000000003, + "text": " because you know how it usually goes.", "tokens": [51608, 570, 291, 458, + 577, 309, 2673, 1709, 13, 51704], "temperature": 0.0, "avg_logprob": -0.21488162875175476, + "compression_ratio": 1.5694444444444444, "no_speech_prob": 0.0029164880979806185}, + {"id": 170, "seek": 44760, "start": 474.40000000000003, "end": 477.44, "text": " + People develop software, other people check on it,", "tokens": [51704, 3432, 1499, + 4722, 11, 661, 561, 1520, 322, 309, 11, 51856], "temperature": 0.0, "avg_logprob": + -0.21488162875175476, "compression_ratio": 1.5694444444444444, "no_speech_prob": + 0.0029164880979806185}, {"id": 171, "seek": 47744, "start": 477.44, "end": 480.0, + "text": " other people are just project managing and things like this", "tokens": + [50364, 661, 561, 366, 445, 1716, 11642, 293, 721, 411, 341, 50492], "temperature": + 0.0, "avg_logprob": -0.1461128294467926, "compression_ratio": 1.7413793103448276, + "no_speech_prob": 0.0010672395583242178}, {"id": 172, "seek": 47744, "start": 480.0, + "end": 483.6, "text": " and no one really takes the lead on looking at the queries.", + "tokens": [50492, 293, 572, 472, 534, 2516, 264, 1477, 322, 1237, 412, 264, 24109, + 13, 50672], "temperature": 0.0, "avg_logprob": -0.1461128294467926, "compression_ratio": + 
1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, {"id": 173, "seek": + 47744, "start": 483.6, "end": 486.76, "text": " And this is actually the most fun + sometimes to look at queries", "tokens": [50672, 400, 341, 307, 767, 264, 881, 1019, + 2171, 281, 574, 412, 24109, 50830], "temperature": 0.0, "avg_logprob": -0.1461128294467926, + "compression_ratio": 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, + {"id": 174, "seek": 47744, "start": 486.76, "end": 489.76, "text": " and sort of + you know investigate what''s going on.", "tokens": [50830, 293, 1333, 295, 291, + 458, 15013, 437, 311, 516, 322, 13, 50980], "temperature": 0.0, "avg_logprob": -0.1461128294467926, + "compression_ratio": 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, + {"id": 175, "seek": 47744, "start": 489.76, "end": 491.28, "text": " Do you even + like these results?", "tokens": [50980, 1144, 291, 754, 411, 613, 3542, 30, 51056], + "temperature": 0.0, "avg_logprob": -0.1461128294467926, "compression_ratio": 1.7413793103448276, + "no_speech_prob": 0.0010672395583242178}, {"id": 176, "seek": 47744, "start": 491.28, + "end": 493.12, "text": " How do you feel about them?", "tokens": [51056, 1012, 360, + 291, 841, 466, 552, 30, 51148], "temperature": 0.0, "avg_logprob": -0.1461128294467926, + "compression_ratio": 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, + {"id": 177, "seek": 47744, "start": 493.12, "end": 495.84, "text": " You know let + alone setting up a team around it", "tokens": [51148, 509, 458, 718, 3312, 3287, + 493, 257, 1469, 926, 309, 51284], "temperature": 0.0, "avg_logprob": -0.1461128294467926, + "compression_ratio": 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, + {"id": 178, "seek": 47744, "start": 495.84, "end": 498.96, "text": " where some + annotators can actually go and label", "tokens": [51284, 689, 512, 25339, 3391, + 393, 767, 352, 293, 7645, 51440], "temperature": 0.0, "avg_logprob": 
-0.1461128294467926, + "compression_ratio": 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, + {"id": 179, "seek": 47744, "start": 498.96, "end": 500.88, "text": " with some domain + expertise, you know,", "tokens": [51440, 365, 512, 9274, 11769, 11, 291, 458, 11, + 51536], "temperature": 0.0, "avg_logprob": -0.1461128294467926, "compression_ratio": + 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, {"id": 180, "seek": + 47744, "start": 500.88, "end": 503.28, "text": " or maybe pretending to be users + and things like this.", "tokens": [51536, 420, 1310, 22106, 281, 312, 5022, 293, + 721, 411, 341, 13, 51656], "temperature": 0.0, "avg_logprob": -0.1461128294467926, + "compression_ratio": 1.7413793103448276, "no_speech_prob": 0.0010672395583242178}, + {"id": 181, "seek": 47744, "start": 503.28, "end": 505.4, "text": " So it''s an + amazing system", "tokens": [51656, 407, 309, 311, 364, 2243, 1185, 51762], "temperature": + 0.0, "avg_logprob": -0.1461128294467926, "compression_ratio": 1.7413793103448276, + "no_speech_prob": 0.0010672395583242178}, {"id": 182, "seek": 50540, "start": 505.4, + "end": 507.52, "text": " and we continue to use it today.", "tokens": [50364, 293, + 321, 2354, 281, 764, 309, 965, 13, 50470], "temperature": 0.0, "avg_logprob": -0.2066033573473914, + "compression_ratio": 1.684, "no_speech_prob": 0.001431247335858643}, {"id": 183, + "seek": 50540, "start": 507.52, "end": 510.28, "text": " Of course this was the + first thing I pioneered at TomTom", "tokens": [50470, 2720, 1164, 341, 390, 264, + 700, 551, 286, 19761, 4073, 412, 5041, 23442, 50608], "temperature": 0.0, "avg_logprob": + -0.2066033573473914, "compression_ratio": 1.684, "no_speech_prob": 0.001431247335858643}, + {"id": 184, "seek": 50540, "start": 510.28, "end": 511.4, "text": " and it''s still + there.", "tokens": [50608, 293, 309, 311, 920, 456, 13, 50664], "temperature": 0.0, + "avg_logprob": -0.2066033573473914, "compression_ratio": 1.684, 
"no_speech_prob": + 0.001431247335858643}, {"id": 185, "seek": 50540, "start": 512.4, "end": 514.04, + "text": " It''s fantastic, that is wonderful.", "tokens": [50714, 467, 311, 5456, + 11, 300, 307, 3715, 13, 50796], "temperature": 0.0, "avg_logprob": -0.2066033573473914, + "compression_ratio": 1.684, "no_speech_prob": 0.001431247335858643}, {"id": 186, + "seek": 50540, "start": 514.04, "end": 518.16, "text": " I mean, it''s been great + to see sort of the adoption of the product", "tokens": [50796, 286, 914, 11, 309, + 311, 668, 869, 281, 536, 1333, 295, 264, 19215, 295, 264, 1674, 51002], "temperature": + 0.0, "avg_logprob": -0.2066033573473914, "compression_ratio": 1.684, "no_speech_prob": + 0.001431247335858643}, {"id": 187, "seek": 50540, "start": 518.16, "end": 521.3199999999999, + "text": " and then people have been using it for a long time.", "tokens": [51002, + 293, 550, 561, 362, 668, 1228, 309, 337, 257, 938, 565, 13, 51160], "temperature": + 0.0, "avg_logprob": -0.2066033573473914, "compression_ratio": 1.684, "no_speech_prob": + 0.001431247335858643}, {"id": 188, "seek": 50540, "start": 521.3199999999999, "end": + 523.72, "text": " So I''m going to show a query set today", "tokens": [51160, 407, + 286, 478, 516, 281, 855, 257, 14581, 992, 965, 51280], "temperature": 0.0, "avg_logprob": + -0.2066033573473914, "compression_ratio": 1.684, "no_speech_prob": 0.001431247335858643}, + {"id": 189, "seek": 50540, "start": 523.72, "end": 526.36, "text": " that is a thousand + queries", "tokens": [51280, 300, 307, 257, 4714, 24109, 51412], "temperature": 0.0, + "avg_logprob": -0.2066033573473914, "compression_ratio": 1.684, "no_speech_prob": + 0.001431247335858643}, {"id": 190, "seek": 50540, "start": 526.36, "end": 530.36, + "text": " and maybe a thousand queries that have been judged", "tokens": [51412, + 293, 1310, 257, 4714, 24109, 300, 362, 668, 27485, 51612], "temperature": 0.0, "avg_logprob": + -0.2066033573473914, "compression_ratio": 1.684, 
"no_speech_prob": 0.001431247335858643}, + {"id": 191, "seek": 50540, "start": 530.36, "end": 535.12, "text": " 10 deep right + by hand for three years.", "tokens": [51612, 1266, 2452, 558, 538, 1011, 337, 1045, + 924, 13, 51850], "temperature": 0.0, "avg_logprob": -0.2066033573473914, "compression_ratio": + 1.684, "no_speech_prob": 0.001431247335858643}, {"id": 192, "seek": 53512, "start": + 535.12, "end": 536.36, "text": " Almost four years.", "tokens": [50364, 12627, 1451, + 924, 13, 50426], "temperature": 0.0, "avg_logprob": -0.16422524452209472, "compression_ratio": + 1.7098039215686274, "no_speech_prob": 0.0014380936045199633}, {"id": 193, "seek": + 53512, "start": 536.36, "end": 540.28, "text": " This one organization that Nerry + Information Network", "tokens": [50426, 639, 472, 4475, 300, 426, 5318, 15357, 12640, + 50622], "temperature": 0.0, "avg_logprob": -0.16422524452209472, "compression_ratio": + 1.7098039215686274, "no_speech_prob": 0.0014380936045199633}, {"id": 194, "seek": + 53512, "start": 540.28, "end": 542.24, "text": " has been using Cupid for years", + "tokens": [50622, 575, 668, 1228, 383, 6127, 337, 924, 50720], "temperature": 0.0, + "avg_logprob": -0.16422524452209472, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.0014380936045199633}, {"id": 195, "seek": 53512, "start": 542.24, "end": 545.96, + "text": " and now they built up this massive body of ratings", "tokens": [50720, + 293, 586, 436, 3094, 493, 341, 5994, 1772, 295, 24603, 50906], "temperature": 0.0, + "avg_logprob": -0.16422524452209472, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.0014380936045199633}, {"id": 196, "seek": 53512, "start": 545.96, "end": 549.16, + "text": " and they have tons of data and trend lines", "tokens": [50906, 293, 436, + 362, 9131, 295, 1412, 293, 6028, 3876, 51066], "temperature": 0.0, "avg_logprob": + -0.16422524452209472, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.0014380936045199633}, {"id": 197, 
"seek": 53512, "start": 549.16, "end": 552.24, + "text": " for what did search look like four years ago?", "tokens": [51066, 337, + 437, 630, 3164, 574, 411, 1451, 924, 2057, 30, 51220], "temperature": 0.0, "avg_logprob": + -0.16422524452209472, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.0014380936045199633}, {"id": 198, "seek": 53512, "start": 552.24, "end": 554.16, + "text": " What did it look like last year?", "tokens": [51220, 708, 630, 309, 574, + 411, 1036, 1064, 30, 51316], "temperature": 0.0, "avg_logprob": -0.16422524452209472, + "compression_ratio": 1.7098039215686274, "no_speech_prob": 0.0014380936045199633}, + {"id": 199, "seek": 53512, "start": 554.16, "end": 556.08, "text": " What does it + look like today?", "tokens": [51316, 708, 775, 309, 574, 411, 965, 30, 51412], "temperature": + 0.0, "avg_logprob": -0.16422524452209472, "compression_ratio": 1.7098039215686274, + "no_speech_prob": 0.0014380936045199633}, {"id": 200, "seek": 53512, "start": 556.08, + "end": 558.52, "text": " It''s really been exciting to see them.", "tokens": [51412, + 467, 311, 534, 668, 4670, 281, 536, 552, 13, 51534], "temperature": 0.0, "avg_logprob": + -0.16422524452209472, "compression_ratio": 1.7098039215686274, "no_speech_prob": + 0.0014380936045199633}, {"id": 201, "seek": 53512, "start": 558.52, "end": 562.8, + "text": " They''ve just been using the little hosted Cupid app.cupid.com", "tokens": + [51534, 814, 600, 445, 668, 1228, 264, 707, 19204, 383, 6127, 724, 13, 66, 6127, + 13, 1112, 51748], "temperature": 0.0, "avg_logprob": -0.16422524452209472, "compression_ratio": + 1.7098039215686274, "no_speech_prob": 0.0014380936045199633}, {"id": 202, "seek": + 53512, "start": 562.8, "end": 564.28, "text": " and but it''s worked for them.", + "tokens": [51748, 293, 457, 309, 311, 2732, 337, 552, 13, 51822], "temperature": + 0.0, "avg_logprob": -0.16422524452209472, "compression_ratio": 1.7098039215686274, + "no_speech_prob": 0.0014380936045199633}, {"id": 
203, "seek": 56428, "start": 564.3199999999999, + "end": 567.72, "text": " So a thousand queries definitely takes a long time", "tokens": + [50366, 407, 257, 4714, 24109, 2138, 2516, 257, 938, 565, 50536], "temperature": + 0.0, "avg_logprob": -0.179394289978549, "compression_ratio": 1.585820895522388, + "no_speech_prob": 0.00019836827414110303}, {"id": 204, "seek": 56428, "start": 567.72, + "end": 568.76, "text": " to work your way through.", "tokens": [50536, 281, 589, + 428, 636, 807, 13, 50588], "temperature": 0.0, "avg_logprob": -0.179394289978549, + "compression_ratio": 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, + {"id": 205, "seek": 56428, "start": 568.76, "end": 571.8399999999999, "text": " + But these days they''re just kind of keeping an eye", "tokens": [50588, 583, 613, + 1708, 436, 434, 445, 733, 295, 5145, 364, 3313, 50742], "temperature": 0.0, "avg_logprob": + -0.179394289978549, "compression_ratio": 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, + {"id": 206, "seek": 56428, "start": 571.8399999999999, "end": 573.8, "text": " on + what''s changing, right?", "tokens": [50742, 322, 437, 311, 4473, 11, 558, 30, 50840], + "temperature": 0.0, "avg_logprob": -0.179394289978549, "compression_ratio": 1.585820895522388, + "no_speech_prob": 0.00019836827414110303}, {"id": 207, "seek": 56428, "start": 573.8, + "end": 577.0, "text": " Barring a major algorithm change.", "tokens": [50840, 363, + 18285, 257, 2563, 9284, 1319, 13, 51000], "temperature": 0.0, "avg_logprob": -0.179394289978549, + "compression_ratio": 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, + {"id": 208, "seek": 56428, "start": 577.0, "end": 578.8, "text": " It''s just sort + of staying on top of it", "tokens": [51000, 467, 311, 445, 1333, 295, 7939, 322, + 1192, 295, 309, 51090], "temperature": 0.0, "avg_logprob": -0.179394289978549, "compression_ratio": + 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, {"id": 209, "seek": + 56428, 
"start": 578.8, "end": 580.72, "text": " and keeping everything right.", + "tokens": [51090, 293, 5145, 1203, 558, 13, 51186], "temperature": 0.0, "avg_logprob": + -0.179394289978549, "compression_ratio": 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, + {"id": 210, "seek": 56428, "start": 580.72, "end": 584.12, "text": " But yeah, so + it''s really exciting to see people using it.", "tokens": [51186, 583, 1338, 11, + 370, 309, 311, 534, 4670, 281, 536, 561, 1228, 309, 13, 51356], "temperature": 0.0, + "avg_logprob": -0.179394289978549, "compression_ratio": 1.585820895522388, "no_speech_prob": + 0.00019836827414110303}, {"id": 211, "seek": 56428, "start": 584.12, "end": 584.9599999999999, + "text": " Yeah.", "tokens": [51356, 865, 13, 51398], "temperature": 0.0, "avg_logprob": + -0.179394289978549, "compression_ratio": 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, + {"id": 212, "seek": 56428, "start": 584.9599999999999, "end": 589.16, "text": " + Definitely I''m having a little bit of thoughts", "tokens": [51398, 12151, 286, + 478, 1419, 257, 707, 857, 295, 4598, 51608], "temperature": 0.0, "avg_logprob": + -0.179394289978549, "compression_ratio": 1.585820895522388, "no_speech_prob": 0.00019836827414110303}, + {"id": 213, "seek": 56428, "start": 589.16, "end": 593.68, "text": " about where + does Cupid live in our generative AI future?", "tokens": [51608, 466, 689, 775, + 383, 6127, 1621, 294, 527, 1337, 1166, 7318, 2027, 30, 51834], "temperature": 0.0, + "avg_logprob": -0.179394289978549, "compression_ratio": 1.585820895522388, "no_speech_prob": + 0.00019836827414110303}, {"id": 214, "seek": 59368, "start": 593.68, "end": 596.1999999999999, + "text": " Been playing a lot with tools like Ragus", "tokens": [50364, 32839, 2433, + 257, 688, 365, 3873, 411, 497, 32813, 50490], "temperature": 0.0, "avg_logprob": + -0.17974929227173783, "compression_ratio": 1.5970695970695972, "no_speech_prob": + 0.03499700874090195}, {"id": 215, "seek": 
59368, "start": 596.1999999999999, "end": + 598.5999999999999, "text": " and some of the other ones, right?", "tokens": [50490, + 293, 512, 295, 264, 661, 2306, 11, 558, 30, 50610], "temperature": 0.0, "avg_logprob": + -0.17974929227173783, "compression_ratio": 1.5970695970695972, "no_speech_prob": + 0.03499700874090195}, {"id": 216, "seek": 59368, "start": 598.5999999999999, "end": + 602.52, "text": " And it''s interesting to see what tooling", "tokens": [50610, + 400, 309, 311, 1880, 281, 536, 437, 46593, 50806], "temperature": 0.0, "avg_logprob": + -0.17974929227173783, "compression_ratio": 1.5970695970695972, "no_speech_prob": + 0.03499700874090195}, {"id": 217, "seek": 59368, "start": 602.52, "end": 604.3199999999999, + "text": " and where does Cupid do things?", "tokens": [50806, 293, 689, 775, 383, + 6127, 360, 721, 30, 50896], "temperature": 0.0, "avg_logprob": -0.17974929227173783, + "compression_ratio": 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, + {"id": 218, "seek": 59368, "start": 604.3199999999999, "end": 606.3199999999999, + "text": " Well, where does it have challenges?", "tokens": [50896, 1042, 11, 689, + 775, 309, 362, 4759, 30, 50996], "temperature": 0.0, "avg_logprob": -0.17974929227173783, + "compression_ratio": 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, + {"id": 219, "seek": 59368, "start": 606.3199999999999, "end": 607.64, "text": " + Where do we want to go with?", "tokens": [50996, 2305, 360, 321, 528, 281, 352, + 365, 30, 51062], "temperature": 0.0, "avg_logprob": -0.17974929227173783, "compression_ratio": + 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, {"id": 220, "seek": + 59368, "start": 607.64, "end": 609.12, "text": " So yeah, for sure.", "tokens": + [51062, 407, 1338, 11, 337, 988, 13, 51136], "temperature": 0.0, "avg_logprob": + -0.17974929227173783, "compression_ratio": 1.5970695970695972, "no_speech_prob": + 0.03499700874090195}, {"id": 221, "seek": 59368, "start": 609.12, "end": 
610.7199999999999, + "text": " And for those who don''t know Cupid,", "tokens": [51136, 400, 337, 729, + 567, 500, 380, 458, 383, 6127, 11, 51216], "temperature": 0.0, "avg_logprob": -0.17974929227173783, + "compression_ratio": 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, + {"id": 222, "seek": 59368, "start": 610.7199999999999, "end": 613.4799999999999, + "text": " I mean, I can give my short intro,", "tokens": [51216, 286, 914, 11, 286, + 393, 976, 452, 2099, 12897, 11, 51354], "temperature": 0.0, "avg_logprob": -0.17974929227173783, + "compression_ratio": 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, + {"id": 223, "seek": 59368, "start": 613.4799999999999, "end": 615.4, "text": " but + obviously feel free to augment.", "tokens": [51354, 457, 2745, 841, 1737, 281, 29919, + 13, 51450], "temperature": 0.0, "avg_logprob": -0.17974929227173783, "compression_ratio": + 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, {"id": 224, "seek": + 59368, "start": 615.4, "end": 618.76, "text": " But like the way I see it is that + it''s basically", "tokens": [51450, 583, 411, 264, 636, 286, 536, 309, 307, 300, + 309, 311, 1936, 51618], "temperature": 0.0, "avg_logprob": -0.17974929227173783, + "compression_ratio": 1.5970695970695972, "no_speech_prob": 0.03499700874090195}, + {"id": 225, "seek": 59368, "start": 618.76, "end": 621.9599999999999, "text": " + instead of hearsay and sort of someone saying,", "tokens": [51618, 2602, 295, 25688, + 320, 293, 1333, 295, 1580, 1566, 11, 51778], "temperature": 0.0, "avg_logprob": + -0.17974929227173783, "compression_ratio": 1.5970695970695972, "no_speech_prob": + 0.03499700874090195}, {"id": 226, "seek": 62196, "start": 621.96, "end": 623.52, + "text": " your search doesn''t work.", "tokens": [50364, 428, 3164, 1177, 380, 589, + 13, 50442], "temperature": 0.0, "avg_logprob": -0.20718093810042715, "compression_ratio": + 1.7295081967213115, "no_speech_prob": 0.007505514193326235}, {"id": 227, "seek": + 
62196, "start": 623.52, "end": 626.12, "text": " And here is one anecdotal example.", + "tokens": [50442, 400, 510, 307, 472, 26652, 38180, 1365, 13, 50572], "temperature": + 0.0, "avg_logprob": -0.20718093810042715, "compression_ratio": 1.7295081967213115, + "no_speech_prob": 0.007505514193326235}, {"id": 228, "seek": 62196, "start": 626.12, + "end": 628.36, "text": " What you can do is that, oh, vice versa,", "tokens": [50572, + 708, 291, 393, 360, 307, 300, 11, 1954, 11, 11964, 25650, 11, 50684], "temperature": + 0.0, "avg_logprob": -0.20718093810042715, "compression_ratio": 1.7295081967213115, + "no_speech_prob": 0.007505514193326235}, {"id": 229, "seek": 62196, "start": 628.36, + "end": 629.76, "text": " you could say I improved search.", "tokens": [50684, 291, + 727, 584, 286, 9689, 3164, 13, 50754], "temperature": 0.0, "avg_logprob": -0.20718093810042715, + "compression_ratio": 1.7295081967213115, "no_speech_prob": 0.007505514193326235}, + {"id": 230, "seek": 62196, "start": 629.76, "end": 631.5600000000001, "text": " + And here is one anecdotal example", "tokens": [50754, 400, 510, 307, 472, 26652, + 38180, 1365, 50844], "temperature": 0.0, "avg_logprob": -0.20718093810042715, "compression_ratio": + 1.7295081967213115, "no_speech_prob": 0.007505514193326235}, {"id": 231, "seek": + 62196, "start": 631.5600000000001, "end": 633.6, "text": " where it really shines, + right?", "tokens": [50844, 689, 309, 534, 28056, 11, 558, 30, 50946], "temperature": + 0.0, "avg_logprob": -0.20718093810042715, "compression_ratio": 1.7295081967213115, + "no_speech_prob": 0.007505514193326235}, {"id": 232, "seek": 62196, "start": 633.6, + "end": 635.9200000000001, "text": " Now what should we ship it?", "tokens": [50946, + 823, 437, 820, 321, 5374, 309, 30, 51062], "temperature": 0.0, "avg_logprob": -0.20718093810042715, + "compression_ratio": 1.7295081967213115, "no_speech_prob": 0.007505514193326235}, + {"id": 233, "seek": 62196, "start": 635.9200000000001, "end": 640.76, 
"text": " + So basically I think Cupid really gives you the tooling", "tokens": [51062, 407, + 1936, 286, 519, 383, 6127, 534, 2709, 291, 264, 46593, 51304], "temperature": 0.0, + "avg_logprob": -0.20718093810042715, "compression_ratio": 1.7295081967213115, "no_speech_prob": + 0.007505514193326235}, {"id": 234, "seek": 62196, "start": 640.84, "end": 644.4000000000001, + "text": " and you can actually, even if you want,", "tokens": [51308, 293, 291, + 393, 767, 11, 754, 498, 291, 528, 11, 51486], "temperature": 0.0, "avg_logprob": + -0.20718093810042715, "compression_ratio": 1.7295081967213115, "no_speech_prob": + 0.007505514193326235}, {"id": 235, "seek": 62196, "start": 644.4000000000001, "end": + 647.24, "text": " you can even do it in an unbiased way,", "tokens": [51486, 291, + 393, 754, 360, 309, 294, 364, 517, 5614, 1937, 636, 11, 51628], "temperature": 0.0, + "avg_logprob": -0.20718093810042715, "compression_ratio": 1.7295081967213115, "no_speech_prob": + 0.007505514193326235}, {"id": 236, "seek": 62196, "start": 647.24, "end": 650.6, + "text": " as possible where you will do blind labeling in some sense,", "tokens": + [51628, 382, 1944, 689, 291, 486, 360, 6865, 40244, 294, 512, 2020, 11, 51796], + "temperature": 0.0, "avg_logprob": -0.20718093810042715, "compression_ratio": 1.7295081967213115, + "no_speech_prob": 0.007505514193326235}, {"id": 237, "seek": 65060, "start": 650.6, + "end": 651.6, "text": " right?", "tokens": [50364, 558, 30, 50414], "temperature": + 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 238, "seek": 65060, "start": 651.6, + "end": 654.24, "text": " So I''ve done it actually just recently.", "tokens": [50414, + 407, 286, 600, 1096, 309, 767, 445, 3938, 13, 50546], "temperature": 0.0, "avg_logprob": + -0.222551025390625, "compression_ratio": 1.6977611940298507, "no_speech_prob": 0.02434610202908516}, + {"id": 239, "seek": 65060, "start": 654.24, "end": 
658.08, "text": " And basically + you allow your users,", "tokens": [50546, 400, 1936, 291, 2089, 428, 5022, 11, 50738], + "temperature": 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 240, "seek": 65060, "start": 658.08, + "end": 660.12, "text": " well, your domain experts actually,", "tokens": [50738, + 731, 11, 428, 9274, 8572, 767, 11, 50840], "temperature": 0.0, "avg_logprob": -0.222551025390625, + "compression_ratio": 1.6977611940298507, "no_speech_prob": 0.02434610202908516}, + {"id": 241, "seek": 65060, "start": 660.12, "end": 663.64, "text": " but maybe even + developers to go label queries.", "tokens": [50840, 457, 1310, 754, 8849, 281, 352, + 7645, 24109, 13, 51016], "temperature": 0.0, "avg_logprob": -0.222551025390625, + "compression_ratio": 1.6977611940298507, "no_speech_prob": 0.02434610202908516}, + {"id": 242, "seek": 65060, "start": 663.64, "end": 665.52, "text": " And it also + has this sandbox", "tokens": [51016, 400, 309, 611, 575, 341, 42115, 51110], "temperature": + 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 243, "seek": 65060, "start": 665.52, + "end": 668.2, "text": " where you can actually, well, you can plug in your own engine,", + "tokens": [51110, 689, 291, 393, 767, 11, 731, 11, 291, 393, 5452, 294, 428, 1065, + 2848, 11, 51244], "temperature": 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": + 1.6977611940298507, "no_speech_prob": 0.02434610202908516}, {"id": 244, "seek": + 65060, "start": 668.2, "end": 670.44, "text": " but you can also plug in those standard + engines", "tokens": [51244, 457, 291, 393, 611, 5452, 294, 729, 3832, 12982, 51356], + "temperature": 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 245, "seek": 65060, "start": 670.44, + "end": 673.64, "text": " 
like Elastic Search Solar, Open Search and others.", "tokens": + [51356, 411, 2699, 2750, 17180, 22385, 11, 7238, 17180, 293, 2357, 13, 51516], "temperature": + 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 246, "seek": 65060, "start": 673.64, + "end": 676.44, "text": " And I think you even added some vector search engines", + "tokens": [51516, 400, 286, 519, 291, 754, 3869, 512, 8062, 3164, 12982, 51656], + "temperature": 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 247, "seek": 65060, "start": 676.44, + "end": 677.36, "text": " recently, right?", "tokens": [51656, 3938, 11, 558, 30, + 51702], "temperature": 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": + 1.6977611940298507, "no_speech_prob": 0.02434610202908516}, {"id": 248, "seek": + 65060, "start": 677.36, "end": 679.76, "text": " Yeah, so we have a Vectara,", "tokens": + [51702, 865, 11, 370, 321, 362, 257, 691, 557, 2419, 11, 51822], "temperature": + 0.0, "avg_logprob": -0.222551025390625, "compression_ratio": 1.6977611940298507, + "no_speech_prob": 0.02434610202908516}, {"id": 249, "seek": 67976, "start": 679.76, + "end": 682.0, "text": " which is a pure vector search engine.", "tokens": [50364, + 597, 307, 257, 6075, 8062, 3164, 2848, 13, 50476], "temperature": 0.0, "avg_logprob": + -0.26311240196228025, "compression_ratio": 1.5541666666666667, "no_speech_prob": + 0.0007325750775635242}, {"id": 250, "seek": 67976, "start": 682.0, "end": 684.0, + "text": " We''ve got out the only app.", "tokens": [50476, 492, 600, 658, 484, 264, + 787, 724, 13, 50576], "temperature": 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": + 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, {"id": 251, "seek": + 67976, "start": 684.0, "end": 686.96, "text": " And then Open Search, Elastic Search + Solar,", "tokens": [50576, 
400, 550, 7238, 17180, 11, 2699, 2750, 17180, 22385, + 11, 50724], "temperature": 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": + 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, {"id": 252, "seek": + 67976, "start": 686.96, "end": 688.8, "text": " the Lucine Bay search engines,", + "tokens": [50724, 264, 9593, 533, 7840, 3164, 12982, 11, 50816], "temperature": + 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": 1.5541666666666667, + "no_speech_prob": 0.0007325750775635242}, {"id": 253, "seek": 67976, "start": 688.8, + "end": 690.96, "text": " and then kind of exciting,", "tokens": [50816, 293, 550, + 733, 295, 4670, 11, 50924], "temperature": 0.0, "avg_logprob": -0.26311240196228025, + "compression_ratio": 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, + {"id": 254, "seek": 67976, "start": 690.96, "end": 694.0, "text": " you can also + now plug in your own search API.", "tokens": [50924, 291, 393, 611, 586, 5452, 294, + 428, 1065, 3164, 9362, 13, 51076], "temperature": 0.0, "avg_logprob": -0.26311240196228025, + "compression_ratio": 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, + {"id": 255, "seek": 67976, "start": 694.0, "end": 697.52, "text": " And so you can + just talk to any API,", "tokens": [51076, 400, 370, 291, 393, 445, 751, 281, 604, + 9362, 11, 51252], "temperature": 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": + 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, {"id": 256, "seek": + 67976, "start": 697.52, "end": 701.84, "text": " a restful Git post JSON sort of + API,", "tokens": [51252, 257, 1472, 906, 16939, 2183, 31828, 1333, 295, 9362, 11, + 51468], "temperature": 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": + 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, {"id": 257, "seek": + 67976, "start": 701.84, "end": 703.68, "text": " you can use Cupid as well.", "tokens": + [51468, 291, 393, 764, 383, 6127, 382, 731, 13, 
51560], "temperature": 0.0, "avg_logprob": + -0.26311240196228025, "compression_ratio": 1.5541666666666667, "no_speech_prob": + 0.0007325750775635242}, {"id": 258, "seek": 67976, "start": 703.68, "end": 705.12, + "text": " So that''s been really good.", "tokens": [51560, 407, 300, 311, 668, 534, + 665, 13, 51632], "temperature": 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": + 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, {"id": 259, "seek": + 67976, "start": 705.12, "end": 706.12, "text": " Fantastic.", "tokens": [51632, + 21320, 13, 51682], "temperature": 0.0, "avg_logprob": -0.26311240196228025, "compression_ratio": + 1.5541666666666667, "no_speech_prob": 0.0007325750775635242}, {"id": 260, "seek": + 67976, "start": 706.12, "end": 708.24, "text": " I love this YCupid.", "tokens": + [51682, 286, 959, 341, 398, 34, 6127, 13, 51788], "temperature": 0.0, "avg_logprob": + -0.26311240196228025, "compression_ratio": 1.5541666666666667, "no_speech_prob": + 0.0007325750775635242}, {"id": 261, "seek": 70824, "start": 708.24, "end": 710.72, + "text": " This is sort of the origin story Doug Turmbolt,", "tokens": [50364, 639, + 307, 1333, 295, 264, 4957, 1657, 12742, 5712, 2504, 4837, 11, 50488], "temperature": + 0.0, "avg_logprob": -0.18321625038429543, "compression_ratio": 1.6354166666666667, + "no_speech_prob": 0.001267026411369443}, {"id": 262, "seek": 70824, "start": 710.72, + "end": 712.72, "text": " who many of you may know, right,", "tokens": [50488, 567, + 867, 295, 291, 815, 458, 11, 558, 11, 50588], "temperature": 0.0, "avg_logprob": + -0.18321625038429543, "compression_ratio": 1.6354166666666667, "no_speech_prob": + 0.001267026411369443}, {"id": 263, "seek": 70824, "start": 712.72, "end": 714.5600000000001, + "text": " from his book Relevant Search.", "tokens": [50588, 490, 702, 1446, 1300, + 25638, 17180, 13, 50680], "temperature": 0.0, "avg_logprob": -0.18321625038429543, + "compression_ratio": 1.6354166666666667, 
"no_speech_prob": 0.001267026411369443}, + {"id": 264, "seek": 70824, "start": 714.5600000000001, "end": 716.76, "text": " + He created Cupid.", "tokens": [50680, 634, 2942, 383, 6127, 13, 50790], "temperature": + 0.0, "avg_logprob": -0.18321625038429543, "compression_ratio": 1.6354166666666667, + "no_speech_prob": 0.001267026411369443}, {"id": 265, "seek": 70824, "start": 716.76, + "end": 719.72, "text": " And we''re looking at like a decade ago at this point.", + "tokens": [50790, 400, 321, 434, 1237, 412, 411, 257, 10378, 2057, 412, 341, 935, + 13, 50938], "temperature": 0.0, "avg_logprob": -0.18321625038429543, "compression_ratio": + 1.6354166666666667, "no_speech_prob": 0.001267026411369443}, {"id": 266, "seek": + 70824, "start": 719.72, "end": 721.16, "text": " And it was because, you know,", + "tokens": [50938, 400, 309, 390, 570, 11, 291, 458, 11, 51010], "temperature": 0.0, + "avg_logprob": -0.18321625038429543, "compression_ratio": 1.6354166666666667, "no_speech_prob": + 0.001267026411369443}, {"id": 267, "seek": 70824, "start": 721.16, "end": 724.12, + "text": " it was difficult to measure and improve search, right?", "tokens": [51010, + 309, 390, 2252, 281, 3481, 293, 3470, 3164, 11, 558, 30, 51158], "temperature": + 0.0, "avg_logprob": -0.18321625038429543, "compression_ratio": 1.6354166666666667, + "no_speech_prob": 0.001267026411369443}, {"id": 268, "seek": 70824, "start": 724.12, + "end": 726.96, "text": " Lots of spreadsheets going back, lots of conversations.", + "tokens": [51158, 15908, 295, 23651, 1385, 516, 646, 11, 3195, 295, 7315, 13, 51300], + "temperature": 0.0, "avg_logprob": -0.18321625038429543, "compression_ratio": 1.6354166666666667, + "no_speech_prob": 0.001267026411369443}, {"id": 269, "seek": 70824, "start": 726.96, + "end": 728.84, "text": " You fix one thing, break another.", "tokens": [51300, 509, + 3191, 472, 551, 11, 1821, 1071, 13, 51394], "temperature": 0.0, "avg_logprob": -0.18321625038429543, + "compression_ratio": 
1.6354166666666667, "no_speech_prob": 0.001267026411369443}, + {"id": 270, "seek": 70824, "start": 728.84, "end": 732.44, "text": " And Doug and + Arena were working together on a project.", "tokens": [51394, 400, 12742, 293, 34290, + 645, 1364, 1214, 322, 257, 1716, 13, 51574], "temperature": 0.0, "avg_logprob": + -0.18321625038429543, "compression_ratio": 1.6354166666666667, "no_speech_prob": + 0.001267026411369443}, {"id": 271, "seek": 70824, "start": 732.44, "end": 734.24, + "text": " And it was literally,", "tokens": [51574, 400, 309, 390, 3736, 11, 51664], + "temperature": 0.0, "avg_logprob": -0.18321625038429543, "compression_ratio": 1.6354166666666667, + "no_speech_prob": 0.001267026411369443}, {"id": 272, "seek": 70824, "start": 734.24, + "end": 737.12, "text": " this is the origin story for Cupid.", "tokens": [51664, + 341, 307, 264, 4957, 1657, 337, 383, 6127, 13, 51808], "temperature": 0.0, "avg_logprob": + -0.18321625038429543, "compression_ratio": 1.6354166666666667, "no_speech_prob": + 0.001267026411369443}, {"id": 273, "seek": 73712, "start": 737.12, "end": 742.04, + "text": " So Cupid''s all about making collaboration better,", "tokens": [50364, + 407, 383, 6127, 311, 439, 466, 1455, 9363, 1101, 11, 50610], "temperature": 0.0, + "avg_logprob": -0.12884377096300928, "compression_ratio": 1.7702127659574467, "no_speech_prob": + 0.00014859293878544122}, {"id": 274, "seek": 73712, "start": 742.04, "end": 744.76, + "text": " making your testing more accurate,", "tokens": [50610, 1455, 428, 4997, + 544, 8559, 11, 50746], "temperature": 0.0, "avg_logprob": -0.12884377096300928, + "compression_ratio": 1.7702127659574467, "no_speech_prob": 0.00014859293878544122}, + {"id": 275, "seek": 73712, "start": 744.76, "end": 747.16, "text": " and making + things go faster, right?", "tokens": [50746, 293, 1455, 721, 352, 4663, 11, 558, + 30, 50866], "temperature": 0.0, "avg_logprob": -0.12884377096300928, "compression_ratio": + 1.7702127659574467, "no_speech_prob": 
0.00014859293878544122}, {"id": 276, "seek": + 73712, "start": 747.16, "end": 751.44, "text": " Because we need to iterate and + experiment quickly, right?", "tokens": [50866, 1436, 321, 643, 281, 44497, 293, + 5120, 2661, 11, 558, 30, 51080], "temperature": 0.0, "avg_logprob": -0.12884377096300928, + "compression_ratio": 1.7702127659574467, "no_speech_prob": 0.00014859293878544122}, + {"id": 277, "seek": 73712, "start": 751.44, "end": 753.24, "text": " The one thing + I know is that the team", "tokens": [51080, 440, 472, 551, 286, 458, 307, 300, 264, + 1469, 51170], "temperature": 0.0, "avg_logprob": -0.12884377096300928, "compression_ratio": + 1.7702127659574467, "no_speech_prob": 0.00014859293878544122}, {"id": 278, "seek": + 73712, "start": 753.24, "end": 755.44, "text": " that can experiment quickly and + effectively", "tokens": [51170, 300, 393, 5120, 2661, 293, 8659, 51280], "temperature": + 0.0, "avg_logprob": -0.12884377096300928, "compression_ratio": 1.7702127659574467, + "no_speech_prob": 0.00014859293878544122}, {"id": 279, "seek": 73712, "start": 755.44, + "end": 757.88, "text": " is the team that''s going to win out, right?", "tokens": + [51280, 307, 264, 1469, 300, 311, 516, 281, 1942, 484, 11, 558, 30, 51402], "temperature": + 0.0, "avg_logprob": -0.12884377096300928, "compression_ratio": 1.7702127659574467, + "no_speech_prob": 0.00014859293878544122}, {"id": 280, "seek": 73712, "start": 757.88, + "end": 760.8, "text": " It''s not about specific technology choices", "tokens": + [51402, 467, 311, 406, 466, 2685, 2899, 7994, 51548], "temperature": 0.0, "avg_logprob": + -0.12884377096300928, "compression_ratio": 1.7702127659574467, "no_speech_prob": + 0.00014859293878544122}, {"id": 281, "seek": 73712, "start": 760.8, "end": 762.76, + "text": " or technical expertise.", "tokens": [51548, 420, 6191, 11769, 13, 51646], + "temperature": 0.0, "avg_logprob": -0.12884377096300928, "compression_ratio": 1.7702127659574467, + "no_speech_prob": 
0.00014859293878544122}, {"id": 282, "seek": 73712, "start": 762.76, + "end": 764.48, "text": " It''s experimentation.", "tokens": [51646, 467, 311, 37142, + 13, 51732], "temperature": 0.0, "avg_logprob": -0.12884377096300928, "compression_ratio": + 1.7702127659574467, "no_speech_prob": 0.00014859293878544122}, {"id": 283, "seek": + 73712, "start": 764.48, "end": 765.92, "text": " Can you do it quickly?", "tokens": + [51732, 1664, 291, 360, 309, 2661, 30, 51804], "temperature": 0.0, "avg_logprob": + -0.12884377096300928, "compression_ratio": 1.7702127659574467, "no_speech_prob": + 0.00014859293878544122}, {"id": 284, "seek": 76592, "start": 765.92, "end": 770.92, + "text": " So yeah, so Cupid.com has the advertising free hosted version,", "tokens": + [50364, 407, 1338, 11, 370, 383, 6127, 13, 1112, 575, 264, 13097, 1737, 19204, 3037, + 11, 50614], "temperature": 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": + 1.595330739299611, "no_speech_prob": 0.001894827582873404}, {"id": 285, "seek": + 76592, "start": 772.3199999999999, "end": 773.3199999999999, "text": " really excited.", + "tokens": [50684, 534, 2919, 13, 50734], "temperature": 0.0, "avg_logprob": -0.21256390652915305, + "compression_ratio": 1.595330739299611, "no_speech_prob": 0.001894827582873404}, + {"id": 286, "seek": 76592, "start": 773.3199999999999, "end": 778.0799999999999, + "text": " It sort of continues to be useful in today''s world.", "tokens": [50734, + 467, 1333, 295, 6515, 281, 312, 4420, 294, 965, 311, 1002, 13, 50972], "temperature": + 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": 1.595330739299611, + "no_speech_prob": 0.001894827582873404}, {"id": 287, "seek": 76592, "start": 778.0799999999999, + "end": 778.92, "text": " Absolutely.", "tokens": [50972, 7021, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": 1.595330739299611, + "no_speech_prob": 0.001894827582873404}, {"id": 288, "seek": 76592, "start": 778.92, + 
"end": 780.36, "text": " And it''s also open source, right?", "tokens": [51014, + 400, 309, 311, 611, 1269, 4009, 11, 558, 30, 51086], "temperature": 0.0, "avg_logprob": + -0.21256390652915305, "compression_ratio": 1.595330739299611, "no_speech_prob": + 0.001894827582873404}, {"id": 289, "seek": 76592, "start": 780.36, "end": 784.1999999999999, + "text": " So you don''t have to be buying anything, whatever.", "tokens": [51086, + 407, 291, 500, 380, 362, 281, 312, 6382, 1340, 11, 2035, 13, 51278], "temperature": + 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": 1.595330739299611, + "no_speech_prob": 0.001894827582873404}, {"id": 290, "seek": 76592, "start": 784.1999999999999, + "end": 785.4, "text": " It used to be a product though.", "tokens": [51278, 467, + 1143, 281, 312, 257, 1674, 1673, 13, 51338], "temperature": 0.0, "avg_logprob": + -0.21256390652915305, "compression_ratio": 1.595330739299611, "no_speech_prob": + 0.001894827582873404}, {"id": 291, "seek": 76592, "start": 785.4, "end": 786.9599999999999, + "text": " It used to be generating revenue.", "tokens": [51338, 467, 1143, 281, + 312, 17746, 9324, 13, 51416], "temperature": 0.0, "avg_logprob": -0.21256390652915305, + "compression_ratio": 1.595330739299611, "no_speech_prob": 0.001894827582873404}, + {"id": 292, "seek": 76592, "start": 786.9599999999999, "end": 788.4399999999999, + "text": " Yeah, I mean, you told me.", "tokens": [51416, 865, 11, 286, 914, 11, + 291, 1907, 385, 13, 51490], "temperature": 0.0, "avg_logprob": -0.21256390652915305, + "compression_ratio": 1.595330739299611, "no_speech_prob": 0.001894827582873404}, + {"id": 293, "seek": 76592, "start": 788.4399999999999, "end": 789.4399999999999, + "text": " We''re consulting soon.", "tokens": [51490, 492, 434, 23682, 2321, 13, + 51540], "temperature": 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": + 1.595330739299611, "no_speech_prob": 0.001894827582873404}, {"id": 294, "seek": + 76592, "start": 
789.4399999999999, "end": 790.92, "text": " So yeah, we used to + sell it.", "tokens": [51540, 407, 1338, 11, 321, 1143, 281, 3607, 309, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": 1.595330739299611, + "no_speech_prob": 0.001894827582873404}, {"id": 295, "seek": 76592, "start": 790.92, + "end": 794.24, "text": " We used to sell it for $10,000 a year", "tokens": [51614, + 492, 1143, 281, 3607, 309, 337, 1848, 3279, 11, 1360, 257, 1064, 51780], "temperature": + 0.0, "avg_logprob": -0.21256390652915305, "compression_ratio": 1.595330739299611, + "no_speech_prob": 0.001894827582873404}, {"id": 296, "seek": 79424, "start": 794.24, + "end": 796.0, "text": " for an enterprise license.", "tokens": [50364, 337, 364, + 14132, 10476, 13, 50452], "temperature": 0.0, "avg_logprob": -0.16556763421921503, + "compression_ratio": 1.5208333333333333, "no_speech_prob": 0.00370888807810843}, + {"id": 297, "seek": 79424, "start": 796.0, "end": 798.04, "text": " And we had customers + and it was great.", "tokens": [50452, 400, 321, 632, 4581, 293, 309, 390, 869, 13, + 50554], "temperature": 0.0, "avg_logprob": -0.16556763421921503, "compression_ratio": + 1.5208333333333333, "no_speech_prob": 0.00370888807810843}, {"id": 298, "seek": + 79424, "start": 798.04, "end": 802.4, "text": " But I think then we figured out + we were making, I don''t know,", "tokens": [50554, 583, 286, 519, 550, 321, 8932, + 484, 321, 645, 1455, 11, 286, 500, 380, 458, 11, 50772], "temperature": 0.0, "avg_logprob": + -0.16556763421921503, "compression_ratio": 1.5208333333333333, "no_speech_prob": + 0.00370888807810843}, {"id": 299, "seek": 79424, "start": 803.4, "end": 806.4, "text": + " $80,000 a year, which sounds like a lot,", "tokens": [50822, 1848, 4702, 11, 1360, + 257, 1064, 11, 597, 3263, 411, 257, 688, 11, 50972], "temperature": 0.0, "avg_logprob": + -0.16556763421921503, "compression_ratio": 1.5208333333333333, "no_speech_prob": + 0.00370888807810843}, 
{"id": 300, "seek": 79424, "start": 806.4, "end": 810.6, "text": + " but then investing $150,000 in salary, supporting it.", "tokens": [50972, 457, + 550, 10978, 1848, 20120, 11, 1360, 294, 15360, 11, 7231, 309, 13, 51182], "temperature": + 0.0, "avg_logprob": -0.16556763421921503, "compression_ratio": 1.5208333333333333, + "no_speech_prob": 0.00370888807810843}, {"id": 301, "seek": 79424, "start": 810.6, + "end": 813.96, "text": " And it was like, yeah, we''re not a product company.", + "tokens": [51182, 400, 309, 390, 411, 11, 1338, 11, 321, 434, 406, 257, 1674, 2237, + 13, 51350], "temperature": 0.0, "avg_logprob": -0.16556763421921503, "compression_ratio": + 1.5208333333333333, "no_speech_prob": 0.00370888807810843}, {"id": 302, "seek": + 79424, "start": 813.96, "end": 817.32, "text": " And we are open source connections,", + "tokens": [51350, 400, 321, 366, 1269, 4009, 9271, 11, 51518], "temperature": 0.0, + "avg_logprob": -0.16556763421921503, "compression_ratio": 1.5208333333333333, "no_speech_prob": + 0.00370888807810843}, {"id": 303, "seek": 79424, "start": 817.32, "end": 820.24, + "text": " having a commercial product just didn''t fit naturally.", "tokens": [51518, + 1419, 257, 6841, 1674, 445, 994, 380, 3318, 8195, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.16556763421921503, "compression_ratio": 1.5208333333333333, "no_speech_prob": + 0.00370888807810843}, {"id": 304, "seek": 82024, "start": 820.24, "end": 825.24, + "text": " And since we''re all about training our clients", "tokens": [50364, 400, + 1670, 321, 434, 439, 466, 3097, 527, 6982, 50614], "temperature": 0.0, "avg_logprob": + -0.17847543604233684, "compression_ratio": 1.6259541984732824, "no_speech_prob": + 0.0020903716795146465}, {"id": 305, "seek": 82024, "start": 825.5600000000001, "end": + 828.88, "text": " and empowering search teams, right?", "tokens": [50630, 293, 28261, + 3164, 5491, 11, 558, 30, 50796], "temperature": 0.0, "avg_logprob": -0.17847543604233684, + 
"compression_ratio": 1.6259541984732824, "no_speech_prob": 0.0020903716795146465}, + {"id": 306, "seek": 82024, "start": 828.88, "end": 831.52, "text": " Doesn''t necessarily + feel empowering to be like,", "tokens": [50796, 12955, 380, 4725, 841, 28261, 281, + 312, 411, 11, 50928], "temperature": 0.0, "avg_logprob": -0.17847543604233684, "compression_ratio": + 1.6259541984732824, "no_speech_prob": 0.0020903716795146465}, {"id": 307, "seek": + 82024, "start": 831.52, "end": 833.08, "text": " yes, we''ve empowered you,", "tokens": + [50928, 2086, 11, 321, 600, 27898, 291, 11, 51006], "temperature": 0.0, "avg_logprob": + -0.17847543604233684, "compression_ratio": 1.6259541984732824, "no_speech_prob": + 0.0020903716795146465}, {"id": 308, "seek": 82024, "start": 833.08, "end": 834.8, + "text": " but you have to pay us money every month", "tokens": [51006, 457, 291, + 362, 281, 1689, 505, 1460, 633, 1618, 51092], "temperature": 0.0, "avg_logprob": + -0.17847543604233684, "compression_ratio": 1.6259541984732824, "no_speech_prob": + 0.0020903716795146465}, {"id": 309, "seek": 82024, "start": 834.8, "end": 837.12, + "text": " for this one product, right?", "tokens": [51092, 337, 341, 472, 1674, + 11, 558, 30, 51208], "temperature": 0.0, "avg_logprob": -0.17847543604233684, "compression_ratio": + 1.6259541984732824, "no_speech_prob": 0.0020903716795146465}, {"id": 310, "seek": + 82024, "start": 837.12, "end": 840.84, "text": " Just felt more natural to have + it as an open source project.", "tokens": [51208, 1449, 2762, 544, 3303, 281, 362, + 309, 382, 364, 1269, 4009, 1716, 13, 51394], "temperature": 0.0, "avg_logprob": + -0.17847543604233684, "compression_ratio": 1.6259541984732824, "no_speech_prob": + 0.0020903716795146465}, {"id": 311, "seek": 82024, "start": 840.84, "end": 841.6800000000001, + "text": " Yeah, absolutely.", "tokens": [51394, 865, 11, 3122, 13, 51436], "temperature": + 0.0, "avg_logprob": -0.17847543604233684, "compression_ratio": 1.6259541984732824, + 
"no_speech_prob": 0.0020903716795146465}, {"id": 312, "seek": 82024, "start": 841.6800000000001, + "end": 844.4, "text": " And it also fits your, well, how should I say,", "tokens": + [51436, 400, 309, 611, 9001, 428, 11, 731, 11, 577, 820, 286, 584, 11, 51572], "temperature": + 0.0, "avg_logprob": -0.17847543604233684, "compression_ratio": 1.6259541984732824, + "no_speech_prob": 0.0020903716795146465}, {"id": 313, "seek": 82024, "start": 844.4, + "end": 847.44, "text": " your professional line at UI commuter.", "tokens": [51572, + 428, 4843, 1622, 412, 15682, 800, 20314, 13, 51724], "temperature": 0.0, "avg_logprob": + -0.17847543604233684, "compression_ratio": 1.6259541984732824, "no_speech_prob": + 0.0020903716795146465}, {"id": 314, "seek": 82024, "start": 847.44, "end": 849.4, + "text": " You''ve seen solar commuter, right?", "tokens": [51724, 509, 600, 1612, + 7936, 800, 20314, 11, 558, 30, 51822], "temperature": 0.0, "avg_logprob": -0.17847543604233684, + "compression_ratio": 1.6259541984732824, "no_speech_prob": 0.0020903716795146465}, + {"id": 315, "seek": 84940, "start": 849.4, "end": 852.9599999999999, "text": " Yeah, + so I am a commuter, not active on Lucene,", "tokens": [50364, 865, 11, 370, 286, + 669, 257, 800, 20314, 11, 406, 4967, 322, 9593, 1450, 11, 50542], "temperature": + 0.0, "avg_logprob": -0.16465173326097093, "compression_ratio": 1.6926229508196722, + "no_speech_prob": 0.00010032494901679456}, {"id": 316, "seek": 84940, "start": 852.9599999999999, + "end": 855.84, "text": " that''s just a level of technical expertise.", "tokens": + [50542, 300, 311, 445, 257, 1496, 295, 6191, 11769, 13, 50686], "temperature": 0.0, + "avg_logprob": -0.16465173326097093, "compression_ratio": 1.6926229508196722, "no_speech_prob": + 0.00010032494901679456}, {"id": 317, "seek": 84940, "start": 855.84, "end": 857.6, + "text": " But I am a commuter on solar.", "tokens": [50686, 583, 286, 669, 257, + 800, 20314, 322, 7936, 13, 50774], "temperature": 0.0, 
"avg_logprob": -0.16465173326097093, + "compression_ratio": 1.6926229508196722, "no_speech_prob": 0.00010032494901679456}, + {"id": 318, "seek": 84940, "start": 857.6, "end": 860.56, "text": " And then interesting + as like an interesting", "tokens": [50774, 400, 550, 1880, 382, 411, 364, 1880, + 50922], "temperature": 0.0, "avg_logprob": -0.16465173326097093, "compression_ratio": + 1.6926229508196722, "no_speech_prob": 0.00010032494901679456}, {"id": 319, "seek": + 84940, "start": 860.56, "end": 863.4, "text": " personal professional development,", + "tokens": [50922, 2973, 4843, 3250, 11, 51064], "temperature": 0.0, "avg_logprob": + -0.16465173326097093, "compression_ratio": 1.6926229508196722, "no_speech_prob": + 0.00010032494901679456}, {"id": 320, "seek": 84940, "start": 863.4, "end": 865.52, + "text": " I''ve been really gotten much more involved", "tokens": [51064, 286, 600, + 668, 534, 5768, 709, 544, 3288, 51170], "temperature": 0.0, "avg_logprob": -0.16465173326097093, + "compression_ratio": 1.6926229508196722, "no_speech_prob": 0.00010032494901679456}, + {"id": 321, "seek": 84940, "start": 865.52, "end": 868.9599999999999, "text": " + with the open search community over the whole year.", "tokens": [51170, 365, 264, + 1269, 3164, 1768, 670, 264, 1379, 1064, 13, 51342], "temperature": 0.0, "avg_logprob": + -0.16465173326097093, "compression_ratio": 1.6926229508196722, "no_speech_prob": + 0.00010032494901679456}, {"id": 322, "seek": 84940, "start": 868.9599999999999, + "end": 873.72, "text": " And so I''m now a, they call it a maintainer instead of + commuter,", "tokens": [51342, 400, 370, 286, 478, 586, 257, 11, 436, 818, 309, 257, + 6909, 260, 2602, 295, 800, 20314, 11, 51580], "temperature": 0.0, "avg_logprob": + -0.16465173326097093, "compression_ratio": 1.6926229508196722, "no_speech_prob": + 0.00010032494901679456}, {"id": 323, "seek": 84940, "start": 873.72, "end": 877.88, + "text": " but I am a maintainer for open search documentation,", "tokens": 
[51580, + 457, 286, 669, 257, 6909, 260, 337, 1269, 3164, 14333, 11, 51788], "temperature": + 0.0, "avg_logprob": -0.16465173326097093, "compression_ratio": 1.6926229508196722, + "no_speech_prob": 0.00010032494901679456}, {"id": 324, "seek": 87788, "start": 877.88, + "end": 881.32, "text": " which has really been really a lot fun to work on.", "tokens": + [50364, 597, 575, 534, 668, 534, 257, 688, 1019, 281, 589, 322, 13, 50536], "temperature": + 0.0, "avg_logprob": -0.19662362110765674, "compression_ratio": 1.532258064516129, + "no_speech_prob": 0.00040204881224781275}, {"id": 325, "seek": 87788, "start": 881.32, + "end": 885.16, "text": " And we''re talking about it maybe in another podcast,", + "tokens": [50536, 400, 321, 434, 1417, 466, 309, 1310, 294, 1071, 7367, 11, 50728], + "temperature": 0.0, "avg_logprob": -0.19662362110765674, "compression_ratio": 1.532258064516129, + "no_speech_prob": 0.00040204881224781275}, {"id": 326, "seek": 87788, "start": 885.16, + "end": 889.16, "text": " but contributing some new features to open search,", "tokens": + [50728, 457, 19270, 512, 777, 4122, 281, 1269, 3164, 11, 50928], "temperature": + 0.0, "avg_logprob": -0.19662362110765674, "compression_ratio": 1.532258064516129, + "no_speech_prob": 0.00040204881224781275}, {"id": 327, "seek": 87788, "start": 889.16, + "end": 890.32, "text": " the open source product.", "tokens": [50928, 264, 1269, + 4009, 1674, 13, 50986], "temperature": 0.0, "avg_logprob": -0.19662362110765674, + "compression_ratio": 1.532258064516129, "no_speech_prob": 0.00040204881224781275}, + {"id": 328, "seek": 87788, "start": 890.32, "end": 892.52, "text": " So really excited + about that.", "tokens": [50986, 407, 534, 2919, 466, 300, 13, 51096], "temperature": + 0.0, "avg_logprob": -0.19662362110765674, "compression_ratio": 1.532258064516129, + "no_speech_prob": 0.00040204881224781275}, {"id": 329, "seek": 87788, "start": 892.52, + "end": 893.36, "text": " Sam.", "tokens": [51096, 4832, 13, 51138], 
"temperature": + 0.0, "avg_logprob": -0.19662362110765674, "compression_ratio": 1.532258064516129, + "no_speech_prob": 0.00040204881224781275}, {"id": 330, "seek": 87788, "start": 893.36, + "end": 895.16, "text": " Actually, give me one second.", "tokens": [51138, 5135, + 11, 976, 385, 472, 1150, 13, 51228], "temperature": 0.0, "avg_logprob": -0.19662362110765674, + "compression_ratio": 1.532258064516129, "no_speech_prob": 0.00040204881224781275}, + {"id": 331, "seek": 87788, "start": 895.16, "end": 897.36, "text": " I have one + thing to confess, one second.", "tokens": [51228, 286, 362, 472, 551, 281, 19367, + 11, 472, 1150, 13, 51338], "temperature": 0.0, "avg_logprob": -0.19662362110765674, + "compression_ratio": 1.532258064516129, "no_speech_prob": 0.00040204881224781275}, + {"id": 332, "seek": 90788, "start": 908.88, "end": 912.88, "text": " I have to confess + or share one personal bit", "tokens": [50414, 286, 362, 281, 19367, 420, 2073, 472, + 2973, 857, 50614], "temperature": 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": + 1.6794871794871795, "no_speech_prob": 0.018705174326896667}, {"id": 333, "seek": + 90788, "start": 912.88, "end": 916.88, "text": " that when I started in search, + it was,", "tokens": [50614, 300, 562, 286, 1409, 294, 3164, 11, 309, 390, 11, 50814], + "temperature": 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, + "no_speech_prob": 0.018705174326896667}, {"id": 334, "seek": 90788, "start": 916.88, + "end": 918.88, "text": " of course, it was early.", "tokens": [50814, 295, 1164, + 11, 309, 390, 2440, 13, 50914], "temperature": 0.0, "avg_logprob": -0.27774031687591033, + "compression_ratio": 1.6794871794871795, "no_speech_prob": 0.018705174326896667}, + {"id": 335, "seek": 90788, "start": 918.88, "end": 921.88, "text": " It was like + 2003 about when I wrote my own search engine.", "tokens": [50914, 467, 390, 411, + 16416, 466, 562, 286, 4114, 452, 1065, 3164, 2848, 13, 51064], 
"temperature": 0.0, + "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, "no_speech_prob": + 0.018705174326896667}, {"id": 336, "seek": 90788, "start": 921.88, "end": 924.88, + "text": " But when I started doing search in the industry, right?", "tokens": [51064, + 583, 562, 286, 1409, 884, 3164, 294, 264, 3518, 11, 558, 30, 51214], "temperature": + 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, + "no_speech_prob": 0.018705174326896667}, {"id": 337, "seek": 90788, "start": 924.88, + "end": 926.88, "text": " It was 2010.", "tokens": [51214, 467, 390, 9657, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, + "no_speech_prob": 0.018705174326896667}, {"id": 338, "seek": 90788, "start": 926.88, + "end": 928.88, "text": " And it was a patch of solar.", "tokens": [51314, 400, 309, + 390, 257, 9972, 295, 7936, 13, 51414], "temperature": 0.0, "avg_logprob": -0.27774031687591033, + "compression_ratio": 1.6794871794871795, "no_speech_prob": 0.018705174326896667}, + {"id": 339, "seek": 90788, "start": 928.88, "end": 930.88, "text": " And when you + Google a patch of solar,", "tokens": [51414, 400, 562, 291, 3329, 257, 9972, 295, + 7936, 11, 51514], "temperature": 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": + 1.6794871794871795, "no_speech_prob": 0.018705174326896667}, {"id": 340, "seek": + 90788, "start": 930.88, "end": 933.88, "text": " you would mostly find Java, Java + dog.", "tokens": [51514, 291, 576, 5240, 915, 10745, 11, 10745, 3000, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, + "no_speech_prob": 0.018705174326896667}, {"id": 341, "seek": 90788, "start": 933.88, + "end": 934.88, "text": " Yeah.", "tokens": [51664, 865, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, + "no_speech_prob": 
0.018705174326896667}, {"id": 342, "seek": 90788, "start": 934.88, + "end": 936.88, "text": " And maybe, and then I figured out there is also", "tokens": + [51714, 400, 1310, 11, 293, 550, 286, 8932, 484, 456, 307, 611, 51814], "temperature": + 0.0, "avg_logprob": -0.27774031687591033, "compression_ratio": 1.6794871794871795, + "no_speech_prob": 0.018705174326896667}, {"id": 343, "seek": 93688, "start": 936.88, + "end": 940.88, "text": " a mailing list was like, but is there a place where I can + read", "tokens": [50364, 257, 41612, 1329, 390, 411, 11, 457, 307, 456, 257, 1081, + 689, 286, 393, 1401, 50564], "temperature": 0.0, "avg_logprob": -0.3190021667480469, + "compression_ratio": 1.6457399103139014, "no_speech_prob": 0.0600990392267704}, + {"id": 344, "seek": 93688, "start": 940.88, "end": 943.88, "text": " about solar + besides Vicky pages because Vicky pages were not", "tokens": [50564, 466, 7936, + 11868, 691, 20539, 7183, 570, 691, 20539, 7183, 645, 406, 50714], "temperature": + 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 345, "seek": 93688, "start": 943.88, + "end": 945.88, "text": " kind of complete in a way?", "tokens": [50714, 733, 295, + 3566, 294, 257, 636, 30, 50814], "temperature": 0.0, "avg_logprob": -0.3190021667480469, + "compression_ratio": 1.6457399103139014, "no_speech_prob": 0.0600990392267704}, + {"id": 346, "seek": 93688, "start": 945.88, "end": 946.88, "text": " Yep.", "tokens": + [50814, 7010, 13, 50864], "temperature": 0.0, "avg_logprob": -0.3190021667480469, + "compression_ratio": 1.6457399103139014, "no_speech_prob": 0.0600990392267704}, + {"id": 347, "seek": 93688, "start": 946.88, "end": 947.88, "text": " Yep.", "tokens": + [50864, 7010, 13, 50914], "temperature": 0.0, "avg_logprob": -0.3190021667480469, + "compression_ratio": 1.6457399103139014, "no_speech_prob": 0.0600990392267704}, + {"id": 348, "seek": 93688, "start": 947.88, "end": 949.88, 
"text": " I was like, + and I found this, this book.", "tokens": [50914, 286, 390, 411, 11, 293, 286, 1352, + 341, 11, 341, 1446, 13, 51014], "temperature": 0.0, "avg_logprob": -0.3190021667480469, + "compression_ratio": 1.6457399103139014, "no_speech_prob": 0.0600990392267704}, + {"id": 349, "seek": 93688, "start": 949.88, "end": 950.88, "text": " Oh my gosh.", + "tokens": [51014, 876, 452, 6502, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.3190021667480469, "compression_ratio": 1.6457399103139014, "no_speech_prob": + 0.0600990392267704}, {"id": 350, "seek": 93688, "start": 950.88, "end": 952.88, + "text": " One point four.", "tokens": [51064, 1485, 935, 1451, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 351, "seek": 93688, "start": 952.88, + "end": 953.88, "text": " Yeah.", "tokens": [51164, 865, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 352, "seek": 93688, "start": 953.88, + "end": 955.88, "text": " A prior server data.", "tokens": [51214, 316, 4059, 7154, + 1412, 13, 51314], "temperature": 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": + 1.6457399103139014, "no_speech_prob": 0.0600990392267704}, {"id": 353, "seek": 93688, + "start": 955.88, "end": 956.88, "text": " Yes.", "tokens": [51314, 1079, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 354, "seek": 93688, "start": 956.88, + "end": 957.88, "text": " Yes.", "tokens": [51364, 1079, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 355, "seek": 93688, "start": 957.88, + "end": 958.88, "text": " Yes.", "tokens": [51414, 1079, 13, 51464], 
"temperature": + 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 356, "seek": 93688, "start": 958.88, + "end": 960.88, "text": " And I''ve read it covered to cover.", "tokens": [51464, + 400, 286, 600, 1401, 309, 5343, 281, 2060, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.3190021667480469, "compression_ratio": 1.6457399103139014, "no_speech_prob": + 0.0600990392267704}, {"id": 357, "seek": 93688, "start": 960.88, "end": 964.88, + "text": " I have to say it because because I had one challenging task.", "tokens": + [51564, 286, 362, 281, 584, 309, 570, 570, 286, 632, 472, 7595, 5633, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.3190021667480469, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.0600990392267704}, {"id": 358, "seek": 96488, "start": 964.88, + "end": 969.88, "text": " I had to build what I suggest and that I would suggest + had to abide to certain rules.", "tokens": [50364, 286, 632, 281, 1322, 437, 286, + 3402, 293, 300, 286, 576, 3402, 632, 281, 39663, 281, 1629, 4474, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 359, "seek": 96488, "start": 969.88, + "end": 971.88, "text": " And I was like, oh my god, how will I do it?", "tokens": + [50614, 400, 286, 390, 411, 11, 1954, 452, 3044, 11, 577, 486, 286, 360, 309, 30, + 50714], "temperature": 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": + 1.7302158273381294, "no_speech_prob": 0.09609706699848175}, {"id": 360, "seek": + 96488, "start": 971.88, "end": 973.88, "text": " In the moment I did it, it was + also slow.", "tokens": [50714, 682, 264, 1623, 286, 630, 309, 11, 309, 390, 611, + 2964, 13, 50814], "temperature": 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": + 1.7302158273381294, "no_speech_prob": 0.09609706699848175}, {"id": 361, "seek": + 96488, 
"start": 973.88, "end": 977.88, "text": " So I had to figure out on our data, + on our version of model of data.", "tokens": [50814, 407, 286, 632, 281, 2573, 484, + 322, 527, 1412, 11, 322, 527, 3037, 295, 2316, 295, 1412, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 362, "seek": 96488, "start": 977.88, + "end": 978.88, "text": " Right.", "tokens": [51014, 1779, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 363, "seek": 96488, "start": 978.88, + "end": 979.88, "text": " Oh my god, this was so exciting.", "tokens": [51064, 876, + 452, 3044, 11, 341, 390, 370, 4670, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.20491992510282075, "compression_ratio": 1.7302158273381294, "no_speech_prob": + 0.09609706699848175}, {"id": 364, "seek": 96488, "start": 979.88, "end": 982.88, + "text": " I was like going back and forth between the book.", "tokens": [51114, + 286, 390, 411, 516, 646, 293, 5220, 1296, 264, 1446, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 365, "seek": 96488, "start": 982.88, + "end": 985.88, "text": " And then a bit of googling and then trying things.", "tokens": + [51264, 400, 550, 257, 857, 295, 50061, 1688, 293, 550, 1382, 721, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 366, "seek": 96488, "start": 985.88, + "end": 986.88, "text": " Ah, yes.", "tokens": [51414, 2438, 11, 2086, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 367, "seek": 96488, "start": 986.88, + "end": 987.88, 
"text": " Fantastic.", "tokens": [51464, 21320, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 368, "seek": 96488, "start": 987.88, + "end": 988.88, "text": " Wow.", "tokens": [51514, 3153, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 369, "seek": 96488, "start": 988.88, + "end": 989.88, "text": " Thanks for doing this.", "tokens": [51564, 2561, 337, 884, + 341, 13, 51614], "temperature": 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": + 1.7302158273381294, "no_speech_prob": 0.09609706699848175}, {"id": 370, "seek": + 96488, "start": 989.88, "end": 990.88, "text": " So you also the author.", "tokens": + [51614, 407, 291, 611, 264, 3793, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.20491992510282075, "compression_ratio": 1.7302158273381294, "no_speech_prob": + 0.09609706699848175}, {"id": 371, "seek": 96488, "start": 990.88, "end": 991.88, + "text": " You also the author.", "tokens": [51664, 509, 611, 264, 3793, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 372, "seek": 96488, "start": 991.88, + "end": 992.88, "text": " Yeah.", "tokens": [51714, 865, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.20491992510282075, "compression_ratio": 1.7302158273381294, + "no_speech_prob": 0.09609706699848175}, {"id": 373, "seek": 99288, "start": 992.88, + "end": 993.88, "text": " Yeah.", "tokens": [50364, 865, 13, 50414], "temperature": + 0.0, "avg_logprob": -0.1467973153422198, "compression_ratio": 1.6640625, "no_speech_prob": + 0.0089711369946599}, {"id": 374, "seek": 99288, "start": 993.88, "end": 994.88, + "text": " Yeah.", "tokens": [50414, 865, 13, 50464], "temperature": 0.0, "avg_logprob": + 
-0.1467973153422198, "compression_ratio": 1.6640625, "no_speech_prob": 0.0089711369946599}, + {"id": 375, "seek": 99288, "start": 994.88, "end": 995.88, "text": " Yeah.", "tokens": + [50464, 865, 13, 50514], "temperature": 0.0, "avg_logprob": -0.1467973153422198, + "compression_ratio": 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 376, + "seek": 99288, "start": 995.88, "end": 996.88, "text": " Yeah.", "tokens": [50514, + 865, 13, 50564], "temperature": 0.0, "avg_logprob": -0.1467973153422198, "compression_ratio": + 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 377, "seek": 99288, "start": + 996.88, "end": 997.88, "text": " So we did that book.", "tokens": [50564, 407, 321, + 630, 300, 1446, 13, 50614], "temperature": 0.0, "avg_logprob": -0.1467973153422198, + "compression_ratio": 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 378, + "seek": 99288, "start": 997.88, "end": 998.88, "text": " We did it.", "tokens": + [50614, 492, 630, 309, 13, 50664], "temperature": 0.0, "avg_logprob": -0.1467973153422198, + "compression_ratio": 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 379, + "seek": 99288, "start": 998.88, "end": 1000.88, "text": " We did a second version + of it for updated solar.", "tokens": [50664, 492, 630, 257, 1150, 3037, 295, 309, + 337, 10588, 7936, 13, 50764], "temperature": 0.0, "avg_logprob": -0.1467973153422198, + "compression_ratio": 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 380, + "seek": 99288, "start": 1000.88, "end": 1002.88, "text": " But that was quite a + few years ago.", "tokens": [50764, 583, 300, 390, 1596, 257, 1326, 924, 2057, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.1467973153422198, "compression_ratio": + 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 381, "seek": 99288, "start": + 1002.88, "end": 1006.88, "text": " I am kind of curious what''s going to happen + with technical books.", "tokens": [50864, 286, 669, 733, 295, 6369, 437, 311, 516, + 281, 1051, 
365, 6191, 3642, 13, 51064], "temperature": 0.0, "avg_logprob": -0.1467973153422198, + "compression_ratio": 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 382, + "seek": 99288, "start": 1006.88, "end": 1007.88, "text": " Right.", "tokens": [51064, + 1779, 13, 51114], "temperature": 0.0, "avg_logprob": -0.1467973153422198, "compression_ratio": + 1.6640625, "no_speech_prob": 0.0089711369946599}, {"id": 383, "seek": 99288, "start": + 1007.88, "end": 1015.88, "text": " I mean, in the solar community, we got the ref + guide, which is, I think, pretty darn good considering how it''s written.", "tokens": + [51114, 286, 914, 11, 294, 264, 7936, 1768, 11, 321, 658, 264, 1895, 5934, 11, 597, + 307, 11, 286, 519, 11, 1238, 29063, 665, 8079, 577, 309, 311, 3720, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.1467973153422198, "compression_ratio": 1.6640625, + "no_speech_prob": 0.0089711369946599}, {"id": 384, "seek": 99288, "start": 1015.88, + "end": 1021.88, "text": " I do sort of wonder what the future of technical books + will be with open source communities.", "tokens": [51514, 286, 360, 1333, 295, 2441, + 437, 264, 2027, 295, 6191, 3642, 486, 312, 365, 1269, 4009, 4456, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.1467973153422198, "compression_ratio": 1.6640625, "no_speech_prob": + 0.0089711369946599}, {"id": 385, "seek": 102188, "start": 1021.88, "end": 1024.88, + "text": " And what do we do?", "tokens": [50364, 400, 437, 360, 321, 360, 30, 50514], + "temperature": 0.0, "avg_logprob": -0.21024766489237295, "compression_ratio": 1.691358024691358, + "no_speech_prob": 0.015546152368187904}, {"id": 386, "seek": 102188, "start": 1024.88, + "end": 1035.88, "text": " So maybe like cookbooks, you know, like that you have + specific cases and like, how would you come about building these things and maybe + real data so people can actually try things, right?", "tokens": [50514, 407, 1310, + 411, 2543, 15170, 11, 291, 458, 11, 411, 300, 291, 362, 2685, 3331, 
293, 411, 11, + 577, 576, 291, 808, 466, 2390, 613, 721, 293, 1310, 957, 1412, 370, 561, 393, 767, + 853, 721, 11, 558, 30, 51064], "temperature": 0.0, "avg_logprob": -0.21024766489237295, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.015546152368187904}, + {"id": 387, "seek": 102188, "start": 1035.88, "end": 1036.88, "text": " Yeah.", + "tokens": [51064, 865, 13, 51114], "temperature": 0.0, "avg_logprob": -0.21024766489237295, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.015546152368187904}, + {"id": 388, "seek": 102188, "start": 1036.88, "end": 1037.88, "text": " Yeah.", + "tokens": [51114, 865, 13, 51164], "temperature": 0.0, "avg_logprob": -0.21024766489237295, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.015546152368187904}, + {"id": 389, "seek": 102188, "start": 1037.88, "end": 1038.88, "text": " Yeah.", + "tokens": [51164, 865, 13, 51214], "temperature": 0.0, "avg_logprob": -0.21024766489237295, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.015546152368187904}, + {"id": 390, "seek": 102188, "start": 1038.88, "end": 1041.88, "text": " I mean, + it has gotten a lot easier to publish on the web, right?", "tokens": [51214, 286, + 914, 11, 309, 575, 5768, 257, 688, 3571, 281, 11374, 322, 264, 3670, 11, 558, 30, + 51364], "temperature": 0.0, "avg_logprob": -0.21024766489237295, "compression_ratio": + 1.691358024691358, "no_speech_prob": 0.015546152368187904}, {"id": 391, "seek": + 102188, "start": 1041.88, "end": 1042.88, "text": " Yeah.", "tokens": [51364, 865, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.21024766489237295, "compression_ratio": + 1.691358024691358, "no_speech_prob": 0.015546152368187904}, {"id": 392, "seek": + 102188, "start": 1042.88, "end": 1043.88, "text": " Have something.", "tokens": + [51414, 3560, 746, 13, 51464], "temperature": 0.0, "avg_logprob": -0.21024766489237295, + "compression_ratio": 1.691358024691358, "no_speech_prob": 0.015546152368187904}, + {"id": 393, 
"seek": 102188, "start": 1043.88, "end": 1049.88, "text": " But yeah, + what, you know, I think a lot of people write a book sort of as a writer passage + as well.", "tokens": [51464, 583, 1338, 11, 437, 11, 291, 458, 11, 286, 519, 257, + 688, 295, 561, 2464, 257, 1446, 1333, 295, 382, 257, 9936, 11497, 382, 731, 13, + 51764], "temperature": 0.0, "avg_logprob": -0.21024766489237295, "compression_ratio": + 1.691358024691358, "no_speech_prob": 0.015546152368187904}, {"id": 394, "seek": + 104988, "start": 1049.88, "end": 1050.88, "text": " Right.", "tokens": [50364, 1779, + 13, 50414], "temperature": 0.0, "avg_logprob": -0.17917386894552118, "compression_ratio": + 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, {"id": 395, "seek": + 104988, "start": 1050.88, "end": 1057.88, "text": " So it''s a book, a little different + thing writing a book for an open source project reference.", "tokens": [50414, 407, + 309, 311, 257, 1446, 11, 257, 707, 819, 551, 3579, 257, 1446, 337, 364, 1269, 4009, + 1716, 6408, 13, 50764], "temperature": 0.0, "avg_logprob": -0.17917386894552118, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, + {"id": 396, "seek": 104988, "start": 1057.88, "end": 1058.88, "text": " Right.", + "tokens": [50764, 1779, 13, 50814], "temperature": 0.0, "avg_logprob": -0.17917386894552118, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, + {"id": 397, "seek": 104988, "start": 1058.88, "end": 1059.88, "text": " For sure.", + "tokens": [50814, 1171, 988, 13, 50864], "temperature": 0.0, "avg_logprob": -0.17917386894552118, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, + {"id": 398, "seek": 104988, "start": 1059.88, "end": 1060.88, "text": " How to make + them printable.", "tokens": [50864, 1012, 281, 652, 552, 4482, 712, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.17917386894552118, "compression_ratio": 1.5663716814159292, + "no_speech_prob": 
0.022632339969277382}, {"id": 399, "seek": 104988, "start": 1060.88, + "end": 1063.88, "text": " So you can say I wrote the book for this open source project.", + "tokens": [50914, 407, 291, 393, 584, 286, 4114, 264, 1446, 337, 341, 1269, 4009, + 1716, 13, 51064], "temperature": 0.0, "avg_logprob": -0.17917386894552118, "compression_ratio": + 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, {"id": 400, "seek": + 104988, "start": 1063.88, "end": 1064.88, "text": " But we''ll see.", "tokens": + [51064, 583, 321, 603, 536, 13, 51114], "temperature": 0.0, "avg_logprob": -0.17917386894552118, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, + {"id": 401, "seek": 104988, "start": 1064.88, "end": 1065.88, "text": " Yeah.", + "tokens": [51114, 865, 13, 51164], "temperature": 0.0, "avg_logprob": -0.17917386894552118, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, + {"id": 402, "seek": 104988, "start": 1065.88, "end": 1066.88, "text": " That''s + exciting.", "tokens": [51164, 663, 311, 4670, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.17917386894552118, "compression_ratio": 1.5663716814159292, "no_speech_prob": + 0.022632339969277382}, {"id": 403, "seek": 104988, "start": 1066.88, "end": 1068.88, + "text": " But you also wanted to show something.", "tokens": [51214, 583, 291, 611, + 1415, 281, 855, 746, 13, 51314], "temperature": 0.0, "avg_logprob": -0.17917386894552118, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.022632339969277382}, + {"id": 404, "seek": 104988, "start": 1068.88, "end": 1069.88, "text": " Let''s demo.", + "tokens": [51314, 961, 311, 10723, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.17917386894552118, "compression_ratio": 1.5663716814159292, "no_speech_prob": + 0.022632339969277382}, {"id": 405, "seek": 104988, "start": 1069.88, "end": 1070.88, + "text": " I''d love to.", "tokens": [51364, 286, 1116, 959, 281, 13, 51414], "temperature": + 
0.0, "avg_logprob": -0.17917386894552118, "compression_ratio": 1.5663716814159292, + "no_speech_prob": 0.022632339969277382}, {"id": 406, "seek": 104988, "start": 1070.88, + "end": 1071.88, "text": " Yeah.", "tokens": [51414, 865, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.17917386894552118, "compression_ratio": 1.5663716814159292, + "no_speech_prob": 0.022632339969277382}, {"id": 407, "seek": 104988, "start": 1071.88, + "end": 1074.88, "text": " So we touched briefly on Cuban, right?", "tokens": [51464, + 407, 321, 9828, 10515, 322, 31547, 11, 558, 30, 51614], "temperature": 0.0, "avg_logprob": + -0.17917386894552118, "compression_ratio": 1.5663716814159292, "no_speech_prob": + 0.022632339969277382}, {"id": 408, "seek": 107488, "start": 1074.88, "end": 1081.88, + "text": " The first one of the stewards of the project and historically for those + of here, I''ll just go ahead.", "tokens": [50364, 440, 700, 472, 295, 264, 36270, + 295, 264, 1716, 293, 16180, 337, 729, 295, 510, 11, 286, 603, 445, 352, 2286, 13, + 50714], "temperature": 0.4, "avg_logprob": -0.265173817728902, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.22749660909175873}, {"id": 409, "seek": + 107488, "start": 1081.88, "end": 1091.88, "text": " For those of you who''ve used + Cuban in the past, the way it has worked is I''ll just I''ll just bring up my local + host.", "tokens": [50714, 1171, 729, 295, 291, 567, 600, 1143, 31547, 294, 264, + 1791, 11, 264, 636, 309, 575, 2732, 307, 286, 603, 445, 286, 603, 445, 1565, 493, + 452, 2654, 3975, 13, 51214], "temperature": 0.4, "avg_logprob": -0.265173817728902, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.22749660909175873}, + {"id": 410, "seek": 107488, "start": 1091.88, "end": 1092.88, "text": " Here we + go.", "tokens": [51214, 1692, 321, 352, 13, 51264], "temperature": 0.4, "avg_logprob": + -0.265173817728902, "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.22749660909175873}, + {"id": 411, 
"seek": 107488, "start": 1092.88, "end": 1093.88, "text": " Right.", + "tokens": [51264, 1779, 13, 51314], "temperature": 0.4, "avg_logprob": -0.265173817728902, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.22749660909175873}, + {"id": 412, "seek": 107488, "start": 1093.88, "end": 1098.88, "text": " So one of + the things that we batted in the not in recent, this is the development version.", + "tokens": [51314, 407, 472, 295, 264, 721, 300, 321, 9591, 292, 294, 264, 406, 294, + 5162, 11, 341, 307, 264, 3250, 3037, 13, 51564], "temperature": 0.4, "avg_logprob": + -0.265173817728902, "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.22749660909175873}, + {"id": 413, "seek": 109888, "start": 1098.88, "end": 1104.88, "text": " I''m going + to use it with realistic activity and Cuban is who I pulled up and I got a couple + of cases here.", "tokens": [50364, 286, 478, 516, 281, 764, 309, 365, 12465, 5191, + 293, 31547, 307, 567, 286, 7373, 493, 293, 286, 658, 257, 1916, 295, 3331, 510, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.18941512063284902, "compression_ratio": + 1.580188679245283, "no_speech_prob": 0.178276926279068}, {"id": 414, "seek": 109888, + "start": 1104.88, "end": 1109.88, "text": " But you know, in Cuban, it works well.", + "tokens": [50664, 583, 291, 458, 11, 294, 31547, 11, 309, 1985, 731, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.18941512063284902, "compression_ratio": 1.580188679245283, + "no_speech_prob": 0.178276926279068}, {"id": 415, "seek": 109888, "start": 1109.88, + "end": 1110.88, "text": " I''m going to bring up a case.", "tokens": [50914, 286, + 478, 516, 281, 1565, 493, 257, 1389, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.18941512063284902, "compression_ratio": 1.580188679245283, "no_speech_prob": + 0.178276926279068}, {"id": 416, "seek": 109888, "start": 1110.88, "end": 1111.88, + "text": " Right.", "tokens": [50964, 1779, 13, 51014], "temperature": 0.0, "avg_logprob": + 
-0.18941512063284902, "compression_ratio": 1.580188679245283, "no_speech_prob": + 0.178276926279068}, {"id": 417, "seek": 109888, "start": 1111.88, "end": 1112.88, + "text": " Here''s a case.", "tokens": [51014, 1692, 311, 257, 1389, 13, 51064], + "temperature": 0.0, "avg_logprob": -0.18941512063284902, "compression_ratio": 1.580188679245283, + "no_speech_prob": 0.178276926279068}, {"id": 418, "seek": 109888, "start": 1112.88, + "end": 1116.88, "text": " I''m going to search for milk.", "tokens": [51064, 286, + 478, 516, 281, 3164, 337, 5392, 13, 51264], "temperature": 0.0, "avg_logprob": -0.18941512063284902, + "compression_ratio": 1.580188679245283, "no_speech_prob": 0.178276926279068}, {"id": + 419, "seek": 109888, "start": 1116.88, "end": 1118.88, "text": " I did a query for + milk.", "tokens": [51264, 286, 630, 257, 14581, 337, 5392, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.18941512063284902, "compression_ratio": 1.580188679245283, + "no_speech_prob": 0.178276926279068}, {"id": 420, "seek": 109888, "start": 1118.88, + "end": 1122.88, "text": " This is using sort of a random data set here.", "tokens": + [51364, 639, 307, 1228, 1333, 295, 257, 4974, 1412, 992, 510, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.18941512063284902, "compression_ratio": 1.580188679245283, + "no_speech_prob": 0.178276926279068}, {"id": 421, "seek": 109888, "start": 1122.88, + "end": 1126.88, "text": " It''s backed by a solar search engine.", "tokens": [51564, + 467, 311, 20391, 538, 257, 7936, 3164, 2848, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.18941512063284902, "compression_ratio": 1.580188679245283, "no_speech_prob": + 0.178276926279068}, {"id": 422, "seek": 112688, "start": 1126.88, "end": 1127.88, + "text": " And there''s a survey right there.", "tokens": [50364, 400, 456, 311, + 257, 8984, 558, 456, 13, 50414], "temperature": 0.0, "avg_logprob": -0.3001977169152462, + "compression_ratio": 1.4918032786885247, "no_speech_prob": 0.02273700200021267}, + 
{"id": 423, "seek": 112688, "start": 1127.88, "end": 1129.88, "text": " There''s + my search engine.", "tokens": [50414, 821, 311, 452, 3164, 2848, 13, 50514], "temperature": + 0.0, "avg_logprob": -0.3001977169152462, "compression_ratio": 1.4918032786885247, + "no_speech_prob": 0.02273700200021267}, {"id": 424, "seek": 112688, "start": 1129.88, + "end": 1140.88, "text": " And Cuban works great for a relatively small number of + queries up to hundreds, right?", "tokens": [50514, 400, 31547, 1985, 869, 337, 257, + 7226, 1359, 1230, 295, 24109, 493, 281, 6779, 11, 558, 30, 51064], "temperature": + 0.0, "avg_logprob": -0.3001977169152462, "compression_ratio": 1.4918032786885247, + "no_speech_prob": 0.02273700200021267}, {"id": 425, "seek": 112688, "start": 1140.88, + "end": 1151.88, "text": " And one of the things that we found is that this interface + works well, especially if a search engine super fast and responsive.", "tokens": + [51064, 400, 472, 295, 264, 721, 300, 321, 1352, 307, 300, 341, 9226, 1985, 731, + 11, 2318, 498, 257, 3164, 2848, 1687, 2370, 293, 21826, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.3001977169152462, "compression_ratio": 1.4918032786885247, + "no_speech_prob": 0.02273700200021267}, {"id": 426, "seek": 115188, "start": 1151.88, + "end": 1159.88, "text": " It''s a rich single page JavaScript application that''s + making queries in real time to a search engine.", "tokens": [50364, 467, 311, 257, + 4593, 2167, 3028, 15778, 3861, 300, 311, 1455, 24109, 294, 957, 565, 281, 257, 3164, + 2848, 13, 50764], "temperature": 0.0, "avg_logprob": -0.16237482210484946, "compression_ratio": + 1.5093457943925233, "no_speech_prob": 0.3519994914531708}, {"id": 427, "seek": 115188, + "start": 1159.88, "end": 1169.88, "text": " If you have a thousand queries, like + the people I mentioned before, takes like 15, 20 minutes, but as you wide load up + and all the queries to be run.", "tokens": [50764, 759, 291, 362, 257, 4714, 24109, + 11, 411, 264, 561, 
286, 2835, 949, 11, 2516, 411, 2119, 11, 945, 2077, 11, 457, + 382, 291, 4874, 3677, 493, 293, 439, 264, 24109, 281, 312, 1190, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.16237482210484946, "compression_ratio": 1.5093457943925233, + "no_speech_prob": 0.3519994914531708}, {"id": 428, "seek": 115188, "start": 1169.88, + "end": 1175.88, "text": " And we know that lots of people want to run more queries, + 5,000, right?", "tokens": [51264, 400, 321, 458, 300, 3195, 295, 561, 528, 281, + 1190, 544, 24109, 11, 1025, 11, 1360, 11, 558, 30, 51564], "temperature": 0.0, "avg_logprob": + -0.16237482210484946, "compression_ratio": 1.5093457943925233, "no_speech_prob": + 0.3519994914531708}, {"id": 429, "seek": 117588, "start": 1175.88, "end": 1177.88, + "text": " People ask how many queries should I be measuring?", "tokens": [50364, + 3432, 1029, 577, 867, 24109, 820, 286, 312, 13389, 30, 50464], "temperature": 0.0, + "avg_logprob": -0.12499410098360986, "compression_ratio": 1.5136363636363637, "no_speech_prob": + 0.027469759806990623}, {"id": 430, "seek": 117588, "start": 1177.88, "end": 1180.88, + "text": " I''m like, well, start out with what you can.", "tokens": [50464, 286, + 478, 411, 11, 731, 11, 722, 484, 365, 437, 291, 393, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.12499410098360986, "compression_ratio": 1.5136363636363637, + "no_speech_prob": 0.027469759806990623}, {"id": 431, "seek": 117588, "start": 1180.88, + "end": 1183.88, "text": " If that''s 25 and 50, that''s better than zero.", "tokens": + [50614, 759, 300, 311, 3552, 293, 2625, 11, 300, 311, 1101, 813, 4018, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.12499410098360986, "compression_ratio": 1.5136363636363637, + "no_speech_prob": 0.027469759806990623}, {"id": 432, "seek": 117588, "start": 1183.88, + "end": 1189.88, "text": " Think about 200, maybe 300, maybe a thousand, 5,000, right?", + "tokens": [50764, 6557, 466, 2331, 11, 1310, 6641, 11, 1310, 257, 4714, 11, 1025, + 11, 1360, 
11, 558, 30, 51064], "temperature": 0.0, "avg_logprob": -0.12499410098360986, + "compression_ratio": 1.5136363636363637, "no_speech_prob": 0.027469759806990623}, + {"id": 433, "seek": 117588, "start": 1189.88, "end": 1195.88, "text": " And then + above 5,000, that''s sort of only for the most sophisticated teams.", "tokens": + [51064, 400, 550, 3673, 1025, 11, 1360, 11, 300, 311, 1333, 295, 787, 337, 264, + 881, 16950, 5491, 13, 51364], "temperature": 0.0, "avg_logprob": -0.12499410098360986, + "compression_ratio": 1.5136363636363637, "no_speech_prob": 0.027469759806990623}, + {"id": 434, "seek": 117588, "start": 1195.88, "end": 1201.88, "text": " But Cuban + kind of tops out at maybe a thousand queries.", "tokens": [51364, 583, 31547, 733, + 295, 22836, 484, 412, 1310, 257, 4714, 24109, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.12499410098360986, "compression_ratio": 1.5136363636363637, "no_speech_prob": + 0.027469759806990623}, {"id": 435, "seek": 120188, "start": 1201.88, "end": 1210.88, + "text": " And so we''ve been doing a lot of work to think about how do we support + larger data sets, right?", "tokens": [50364, 400, 370, 321, 600, 668, 884, 257, + 688, 295, 589, 281, 519, 466, 577, 360, 321, 1406, 4833, 1412, 6352, 11, 558, 30, + 50814], "temperature": 0.0, "avg_logprob": -0.08426718168620821, "compression_ratio": + 1.5305164319248827, "no_speech_prob": 0.09922218322753906}, {"id": 436, "seek": + 120188, "start": 1210.88, "end": 1213.88, "text": " Larger query sets.", "tokens": + [50814, 11569, 1321, 14581, 6352, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.08426718168620821, "compression_ratio": 1.5305164319248827, "no_speech_prob": + 0.09922218322753906}, {"id": 437, "seek": 120188, "start": 1213.88, "end": 1218.88, + "text": " And what''s been really fun is to work on introducing background processing, + right?", "tokens": [50964, 400, 437, 311, 668, 534, 1019, 307, 281, 589, 322, 15424, + 3678, 9007, 11, 558, 30, 51214], "temperature": 
0.0, "avg_logprob": -0.08426718168620821, + "compression_ratio": 1.5305164319248827, "no_speech_prob": 0.09922218322753906}, + {"id": 438, "seek": 120188, "start": 1218.88, "end": 1226.88, "text": " Instead + of everything being limited by the request, response cycle of your web browser, + what if we can run some background jobs?", "tokens": [51214, 7156, 295, 1203, 885, + 5567, 538, 264, 5308, 11, 4134, 6586, 295, 428, 3670, 11185, 11, 437, 498, 321, + 393, 1190, 512, 3678, 4782, 30, 51614], "temperature": 0.0, "avg_logprob": -0.08426718168620821, + "compression_ratio": 1.5305164319248827, "no_speech_prob": 0.09922218322753906}, + {"id": 439, "seek": 122688, "start": 1226.88, "end": 1230.88, "text": " And so I''m + just going to show really quick.", "tokens": [50364, 400, 370, 286, 478, 445, 516, + 281, 855, 534, 1702, 13, 50564], "temperature": 0.0, "avg_logprob": -0.12992204107889316, + "compression_ratio": 1.505952380952381, "no_speech_prob": 0.26714858412742615}, + {"id": 440, "seek": 122688, "start": 1230.88, "end": 1234.88, "text": " I''m going + to go and bring up all the books.", "tokens": [50564, 286, 478, 516, 281, 352, 293, + 1565, 493, 439, 264, 3642, 13, 50764], "temperature": 0.0, "avg_logprob": -0.12992204107889316, + "compression_ratio": 1.505952380952381, "no_speech_prob": 0.26714858412742615}, + {"id": 441, "seek": 122688, "start": 1234.88, "end": 1237.88, "text": " And I''ve + got an import feature.", "tokens": [50764, 400, 286, 600, 658, 364, 974, 4111, 13, + 50914], "temperature": 0.0, "avg_logprob": -0.12992204107889316, "compression_ratio": + 1.505952380952381, "no_speech_prob": 0.26714858412742615}, {"id": 442, "seek": 122688, + "start": 1237.88, "end": 1242.88, "text": " So we have exported a book book export + 39.", "tokens": [50914, 407, 321, 362, 42055, 257, 1446, 1446, 10725, 15238, 13, + 51164], "temperature": 0.0, "avg_logprob": -0.12992204107889316, "compression_ratio": + 1.505952380952381, "no_speech_prob": 0.26714858412742615}, 
{"id": 443, "seek": 122688, + "start": 1242.88, "end": 1246.88, "text": " It''s a 62 megabyte JSON file.", "tokens": + [51164, 467, 311, 257, 24536, 10816, 34529, 31828, 3991, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.12992204107889316, "compression_ratio": 1.505952380952381, + "no_speech_prob": 0.26714858412742615}, {"id": 444, "seek": 122688, "start": 1246.88, + "end": 1248.88, "text": " So 62 megabytes.", "tokens": [51364, 407, 24536, 10816, + 24538, 13, 51464], "temperature": 0.0, "avg_logprob": -0.12992204107889316, "compression_ratio": + 1.505952380952381, "no_speech_prob": 0.26714858412742615}, {"id": 445, "seek": 122688, + "start": 1248.88, "end": 1251.88, "text": " And I''m going to go ahead and click + upload.", "tokens": [51464, 400, 286, 478, 516, 281, 352, 2286, 293, 2052, 6580, + 13, 51614], "temperature": 0.0, "avg_logprob": -0.12992204107889316, "compression_ratio": + 1.505952380952381, "no_speech_prob": 0.26714858412742615}, {"id": 446, "seek": 125188, + "start": 1251.88, "end": 1260.88, "text": " And now in cube bid, what we''re starting + to do is we can take large files based on files predominantly.", "tokens": [50364, + 400, 586, 294, 10057, 68, 12957, 11, 437, 321, 434, 2891, 281, 360, 307, 321, 393, + 747, 2416, 7098, 2361, 322, 7098, 29893, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.1623751212810648, "compression_ratio": 1.6384976525821595, "no_speech_prob": + 0.02960802987217903}, {"id": 447, "seek": 125188, "start": 1260.88, "end": 1265.88, + "text": " And we storm in the background and we kick off a process, a background + job.", "tokens": [50814, 400, 321, 7679, 294, 264, 3678, 293, 321, 4437, 766, 257, + 1399, 11, 257, 3678, 1691, 13, 51064], "temperature": 0.0, "avg_logprob": -0.1623751212810648, + "compression_ratio": 1.6384976525821595, "no_speech_prob": 0.02960802987217903}, + {"id": 448, "seek": 125188, "start": 1265.88, "end": 1272.88, "text": " And there + you can see, right there we are loading a whole bunch 
of queries, right?", "tokens": + [51064, 400, 456, 291, 393, 536, 11, 558, 456, 321, 366, 15114, 257, 1379, 3840, + 295, 24109, 11, 558, 30, 51414], "temperature": 0.0, "avg_logprob": -0.1623751212810648, + "compression_ratio": 1.6384976525821595, "no_speech_prob": 0.02960802987217903}, + {"id": 449, "seek": 125188, "start": 1272.88, "end": 1278.88, "text": " And these + are all sort of scientific queries, some very complex ones and simpler ones.", "tokens": + [51414, 400, 613, 366, 439, 1333, 295, 8134, 24109, 11, 512, 588, 3997, 2306, 293, + 18587, 2306, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1623751212810648, + "compression_ratio": 1.6384976525821595, "no_speech_prob": 0.02960802987217903}, + {"id": 450, "seek": 127888, "start": 1278.88, "end": 1288.88, "text": " And you + can see it''s going to take a while because this had what 28,000 query dock pairs, + right?", "tokens": [50364, 400, 291, 393, 536, 309, 311, 516, 281, 747, 257, 1339, + 570, 341, 632, 437, 7562, 11, 1360, 14581, 20929, 15494, 11, 558, 30, 50864], "temperature": + 0.0, "avg_logprob": -0.11332834502797068, "compression_ratio": 1.4952380952380953, + "no_speech_prob": 0.003010433167219162}, {"id": 451, "seek": 127888, "start": 1288.88, + "end": 1292.88, "text": " So that are being loaded along with their judgments.", + "tokens": [50864, 407, 300, 366, 885, 13210, 2051, 365, 641, 40337, 13, 51064], + "temperature": 0.0, "avg_logprob": -0.11332834502797068, "compression_ratio": 1.4952380952380953, + "no_speech_prob": 0.003010433167219162}, {"id": 452, "seek": 127888, "start": 1292.88, + "end": 1299.88, "text": " So, but what sort of fun with the new background jobs + and using web sockets.", "tokens": [51064, 407, 11, 457, 437, 1333, 295, 1019, 365, + 264, 777, 3678, 4782, 293, 1228, 3670, 370, 11984, 13, 51414], "temperature": 0.0, + "avg_logprob": -0.11332834502797068, "compression_ratio": 1.4952380952380953, "no_speech_prob": + 0.003010433167219162}, {"id": 453, "seek": 127888, 
"start": 1299.88, "end": 1305.88, + "text": " We''re also able to push up updates to you as background jobs are happening + inside cube.", "tokens": [51414, 492, 434, 611, 1075, 281, 2944, 493, 9205, 281, + 291, 382, 3678, 4782, 366, 2737, 1854, 10057, 68, 13, 51714], "temperature": 0.0, + "avg_logprob": -0.11332834502797068, "compression_ratio": 1.4952380952380953, "no_speech_prob": + 0.003010433167219162}, {"id": 454, "seek": 130588, "start": 1305.88, "end": 1311.88, + "text": " So right here, there we are, and we are loading a whole bunch of data.", + "tokens": [50364, 407, 558, 510, 11, 456, 321, 366, 11, 293, 321, 366, 15114, 257, + 1379, 3840, 295, 1412, 13, 50664], "temperature": 0.0, "avg_logprob": -0.11844118179813508, + "compression_ratio": 1.513157894736842, "no_speech_prob": 0.001832729671150446}, + {"id": 455, "seek": 130588, "start": 1311.88, "end": 1319.88, "text": " Now, yes, + it would be nice if it was a parquet file, not a MySQL database that we were using.", + "tokens": [50664, 823, 11, 2086, 11, 309, 576, 312, 1481, 498, 309, 390, 257, 971, + 19343, 3991, 11, 406, 257, 1222, 39934, 8149, 300, 321, 645, 1228, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.11844118179813508, "compression_ratio": 1.513157894736842, + "no_speech_prob": 0.001832729671150446}, {"id": 456, "seek": 130588, "start": 1319.88, + "end": 1322.88, "text": " So we''ll have to think about some of those things.", + "tokens": [51064, 407, 321, 603, 362, 281, 519, 466, 512, 295, 729, 721, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.11844118179813508, "compression_ratio": 1.513157894736842, + "no_speech_prob": 0.001832729671150446}, {"id": 457, "seek": 130588, "start": 1322.88, + "end": 1334.88, "text": " But this is starting to open up the door to moving larger + data sets and being really comfortable with that sort of 5,000 queries,", "tokens": + [51214, 583, 341, 307, 2891, 281, 1269, 493, 264, 2853, 281, 2684, 4833, 1412, 6352, + 293, 885, 534, 4619, 365, 300, 
1333, 295, 1025, 11, 1360, 24109, 11, 51814], "temperature": + 0.0, "avg_logprob": -0.11844118179813508, "compression_ratio": 1.513157894736842, + "no_speech_prob": 0.001832729671150446}, {"id": 458, "seek": 133488, "start": 1334.88, + "end": 1338.88, "text": " 50,000 query dock pairs kind of data.", "tokens": [50364, + 2625, 11, 1360, 14581, 20929, 15494, 733, 295, 1412, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.14422721641008243, "compression_ratio": 1.515695067264574, + "no_speech_prob": 0.013159526512026787}, {"id": 459, "seek": 133488, "start": 1338.88, + "end": 1345.88, "text": " Not going to manage the 100,000 queries or quarter million + documents those data sets.", "tokens": [50564, 1726, 516, 281, 3067, 264, 2319, + 11, 1360, 24109, 420, 6555, 2459, 8512, 729, 1412, 6352, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.14422721641008243, "compression_ratio": 1.515695067264574, + "no_speech_prob": 0.013159526512026787}, {"id": 460, "seek": 133488, "start": 1345.88, + "end": 1351.88, "text": " Jason is not the right format, but we''re at least scaling + it up to get a broader set.", "tokens": [50914, 11181, 307, 406, 264, 558, 7877, + 11, 457, 321, 434, 412, 1935, 21589, 309, 493, 281, 483, 257, 13227, 992, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.14422721641008243, "compression_ratio": 1.515695067264574, + "no_speech_prob": 0.013159526512026787}, {"id": 461, "seek": 133488, "start": 1351.88, + "end": 1360.88, "text": " The other thing that I''m also excited about is we''re + getting closer to be able to run these analytics on a regular basis, right?", "tokens": + [51214, 440, 661, 551, 300, 286, 478, 611, 2919, 466, 307, 321, 434, 1242, 4966, + 281, 312, 1075, 281, 1190, 613, 15370, 322, 257, 3890, 5143, 11, 558, 30, 51664], + "temperature": 0.0, "avg_logprob": -0.14422721641008243, "compression_ratio": 1.515695067264574, + "no_speech_prob": 0.013159526512026787}, {"id": 462, "seek": 136088, "start": 1360.88, + "end": 1370.88, 
"text": " Now that we have some background processing, we could + think about every night we rerun all 1000 documents.", "tokens": [50364, 823, 300, + 321, 362, 512, 3678, 9007, 11, 321, 727, 519, 466, 633, 1818, 321, 43819, 409, 439, + 9714, 8512, 13, 50864], "temperature": 0.0, "avg_logprob": -0.12743329398239714, + "compression_ratio": 1.6073059360730593, "no_speech_prob": 0.0031989929266273975}, + {"id": 463, "seek": 136088, "start": 1370.88, "end": 1373.88, "text": " And every + night we could be storing them.", "tokens": [50864, 400, 633, 1818, 321, 727, 312, + 26085, 552, 13, 51014], "temperature": 0.0, "avg_logprob": -0.12743329398239714, + "compression_ratio": 1.6073059360730593, "no_speech_prob": 0.0031989929266273975}, + {"id": 464, "seek": 136088, "start": 1373.88, "end": 1379.88, "text": " So these + little charts here that you see that are sort of showing some basic scoring information.", + "tokens": [51014, 407, 613, 707, 17767, 510, 300, 291, 536, 300, 366, 1333, 295, + 4099, 512, 3875, 22358, 1589, 13, 51314], "temperature": 0.0, "avg_logprob": -0.12743329398239714, + "compression_ratio": 1.6073059360730593, "no_speech_prob": 0.0031989929266273975}, + {"id": 465, "seek": 136088, "start": 1379.88, "end": 1386.88, "text": " You could + start using this to monitor it over time instead of having to roll your own dashboarding + tools.", "tokens": [51314, 509, 727, 722, 1228, 341, 281, 6002, 309, 670, 565, 2602, + 295, 1419, 281, 3373, 428, 1065, 18342, 278, 3873, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.12743329398239714, "compression_ratio": 1.6073059360730593, "no_speech_prob": + 0.0031989929266273975}, {"id": 466, "seek": 138688, "start": 1386.88, "end": 1390.88, + "text": " So that''s something I''m really excited about.", "tokens": [50364, 407, + 300, 311, 746, 286, 478, 534, 2919, 466, 13, 50564], "temperature": 0.0, "avg_logprob": + -0.20580493426713786, "compression_ratio": 1.3395061728395061, "no_speech_prob": + 
0.031168514862656593}, {"id": 467, "seek": 138688, "start": 1390.88, "end": 1399.88, + "text": " I''m also going to point out to two PR so GitHub.com 19 sqbid is the open + source project.", "tokens": [50564, 286, 478, 611, 516, 281, 935, 484, 281, 732, + 11568, 370, 23331, 13, 1112, 1294, 262, 80, 65, 327, 307, 264, 1269, 4009, 1716, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.20580493426713786, "compression_ratio": + 1.3395061728395061, "no_speech_prob": 0.031168514862656593}, {"id": 468, "seek": + 138688, "start": 1399.88, "end": 1409.88, "text": " And a couple of pull requests + that are in progress, but looking to land them soon.", "tokens": [51014, 400, 257, + 1916, 295, 2235, 12475, 300, 366, 294, 4205, 11, 457, 1237, 281, 2117, 552, 2321, + 13, 51514], "temperature": 0.0, "avg_logprob": -0.20580493426713786, "compression_ratio": + 1.3395061728395061, "no_speech_prob": 0.031168514862656593}, {"id": 469, "seek": + 140988, "start": 1409.88, "end": 1413.88, "text": " Right here is this pull request + 976.", "tokens": [50364, 1779, 510, 307, 341, 2235, 5308, 23399, 21, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.27725936488101355, "compression_ratio": 1.4730290456431536, + "no_speech_prob": 0.1089719831943512}, {"id": 470, "seek": 140988, "start": 1413.88, + "end": 1417.88, "text": " Imagine if we could run thousands of queries nightly and + cupid.", "tokens": [50564, 11739, 498, 321, 727, 1190, 5383, 295, 24109, 1818, 356, + 293, 4414, 327, 13, 50764], "temperature": 0.0, "avg_logprob": -0.27725936488101355, + "compression_ratio": 1.4730290456431536, "no_speech_prob": 0.1089719831943512}, + {"id": 471, "seek": 140988, "start": 1417.88, "end": 1423.88, "text": " Now that + we''ve got background jobs working and communicating with the user right of state.", + "tokens": [50764, 823, 300, 321, 600, 658, 3678, 4782, 1364, 293, 17559, 365, 264, + 4195, 558, 295, 1785, 13, 51064], "temperature": 0.0, "avg_logprob": -0.27725936488101355, + 
"compression_ratio": 1.4730290456431536, "no_speech_prob": 0.1089719831943512}, + {"id": 472, "seek": 140988, "start": 1423.88, "end": 1425.88, "text": " This will + be coming pretty soon.", "tokens": [51064, 639, 486, 312, 1348, 1238, 2321, 13, + 51164], "temperature": 0.0, "avg_logprob": -0.27725936488101355, "compression_ratio": + 1.4730290456431536, "no_speech_prob": 0.1089719831943512}, {"id": 473, "seek": 140988, + "start": 1425.88, "end": 1429.88, "text": " Pretty soon in open source time, which + means.", "tokens": [51164, 10693, 2321, 294, 1269, 4009, 565, 11, 597, 1355, 13, + 51364], "temperature": 0.0, "avg_logprob": -0.27725936488101355, "compression_ratio": + 1.4730290456431536, "no_speech_prob": 0.1089719831943512}, {"id": 474, "seek": 140988, + "start": 1429.88, "end": 1431.88, "text": " I don''t know. We''ll see next few months.", + "tokens": [51364, 286, 500, 380, 458, 13, 492, 603, 536, 958, 1326, 2493, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.27725936488101355, "compression_ratio": 1.4730290456431536, + "no_speech_prob": 0.1089719831943512}, {"id": 475, "seek": 140988, "start": 1431.88, + "end": 1434.88, "text": " And so I''m glad people helping and testing.", "tokens": + [51464, 400, 370, 286, 478, 5404, 561, 4315, 293, 4997, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.27725936488101355, "compression_ratio": 1.4730290456431536, + "no_speech_prob": 0.1089719831943512}, {"id": 476, "seek": 143488, "start": 1434.88, + "end": 1437.88, "text": " So this one''s super exciting.", "tokens": [50364, 407, + 341, 472, 311, 1687, 4670, 13, 50514], "temperature": 0.0, "avg_logprob": -0.20270405983438297, + "compression_ratio": 1.5246636771300448, "no_speech_prob": 0.10917375236749649}, + {"id": 477, "seek": 143488, "start": 1437.88, "end": 1439.88, "text": " Let''s go + back and see how we''re doing.", "tokens": [50514, 961, 311, 352, 646, 293, 536, + 577, 321, 434, 884, 13, 50614], "temperature": 0.0, "avg_logprob": -0.20270405983438297, 
+ "compression_ratio": 1.5246636771300448, "no_speech_prob": 0.10917375236749649}, + {"id": 478, "seek": 143488, "start": 1439.88, "end": 1442.88, "text": " And they''re + doing.", "tokens": [50614, 400, 436, 434, 884, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.20270405983438297, "compression_ratio": 1.5246636771300448, "no_speech_prob": + 0.10917375236749649}, {"id": 479, "seek": 143488, "start": 1442.88, "end": 1450.88, + "text": " So there we go. We''re up to 4968 query dock pairs as we kind of count + down.", "tokens": [50764, 407, 456, 321, 352, 13, 492, 434, 493, 281, 16513, 27102, + 14581, 20929, 15494, 382, 321, 733, 295, 1207, 760, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.20270405983438297, "compression_ratio": 1.5246636771300448, "no_speech_prob": + 0.10917375236749649}, {"id": 480, "seek": 143488, "start": 1450.88, "end": 1456.88, + "text": " So yeah, this is all through the magic of Web sockets, which has been + really cool to see.", "tokens": [51164, 407, 1338, 11, 341, 307, 439, 807, 264, + 5585, 295, 9573, 370, 11984, 11, 597, 575, 668, 534, 1627, 281, 536, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.20270405983438297, "compression_ratio": 1.5246636771300448, + "no_speech_prob": 0.10917375236749649}, {"id": 481, "seek": 143488, "start": 1456.88, + "end": 1461.88, "text": " And as you are loading this here, are you also executing + it against the search engine?", "tokens": [51464, 400, 382, 291, 366, 15114, 341, + 510, 11, 366, 291, 611, 32368, 309, 1970, 264, 3164, 2848, 30, 51714], "temperature": + 0.0, "avg_logprob": -0.20270405983438297, "compression_ratio": 1.5246636771300448, + "no_speech_prob": 0.10917375236749649}, {"id": 482, "seek": 146188, "start": 1461.88, + "end": 1465.88, "text": " Or are you here because we had all static data.", "tokens": + [50364, 1610, 366, 291, 510, 570, 321, 632, 439, 13437, 1412, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.18150377834544462, "compression_ratio": 
1.574468085106383, + "no_speech_prob": 0.03339112177491188}, {"id": 483, "seek": 146188, "start": 1465.88, + "end": 1467.88, "text": " Yes, static data.", "tokens": [50564, 1079, 11, 13437, + 1412, 13, 50664], "temperature": 0.0, "avg_logprob": -0.18150377834544462, "compression_ratio": + 1.574468085106383, "no_speech_prob": 0.03339112177491188}, {"id": 484, "seek": 146188, + "start": 1467.88, "end": 1472.88, "text": " A book represents the query dock pairs + with all of the data.", "tokens": [50664, 316, 1446, 8855, 264, 14581, 20929, 15494, + 365, 439, 295, 264, 1412, 13, 50914], "temperature": 0.0, "avg_logprob": -0.18150377834544462, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.03339112177491188}, + {"id": 485, "seek": 146188, "start": 1472.88, "end": 1475.88, "text": " Whereas + the case is what we do the real time querying.", "tokens": [50914, 13813, 264, 1389, + 307, 437, 321, 360, 264, 957, 565, 7083, 1840, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.18150377834544462, "compression_ratio": 1.574468085106383, "no_speech_prob": + 0.03339112177491188}, {"id": 486, "seek": 146188, "start": 1475.88, "end": 1479.88, + "text": " And now that we have this one working.", "tokens": [51064, 400, 586, 300, + 321, 362, 341, 472, 1364, 13, 51264], "temperature": 0.0, "avg_logprob": -0.18150377834544462, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.03339112177491188}, + {"id": 487, "seek": 146188, "start": 1479.88, "end": 1487.88, "text": " Once we + have this PR, then you''ll be able to run a background job in cupid.", "tokens": + [51264, 3443, 321, 362, 341, 11568, 11, 550, 291, 603, 312, 1075, 281, 1190, 257, + 3678, 1691, 294, 4414, 327, 13, 51664], "temperature": 0.0, "avg_logprob": -0.18150377834544462, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.03339112177491188}, + {"id": 488, "seek": 148788, "start": 1487.88, "end": 1495.88, "text": " With a similar + counter, maybe it up here next to one of your cases 
that says we''re running queries, + 5,000 queries.", "tokens": [50364, 2022, 257, 2531, 5682, 11, 1310, 309, 493, 510, + 958, 281, 472, 295, 428, 3331, 300, 1619, 321, 434, 2614, 24109, 11, 1025, 11, 1360, + 24109, 13, 50764], "temperature": 0.0, "avg_logprob": -0.3380036570809104, "compression_ratio": + 1.5398230088495575, "no_speech_prob": 0.030313337221741676}, {"id": 489, "seek": + 148788, "start": 1495.88, "end": 1500.88, "text": " And this is our progress for + the number of the error out.", "tokens": [50764, 400, 341, 307, 527, 4205, 337, + 264, 1230, 295, 264, 6713, 484, 13, 51014], "temperature": 0.0, "avg_logprob": -0.3380036570809104, + "compression_ratio": 1.5398230088495575, "no_speech_prob": 0.030313337221741676}, + {"id": 490, "seek": 148788, "start": 1500.88, "end": 1511.88, "text": " But of course, + for listeners to understand, like what takes time is basically, of course, also + in searching this data into cupids database is like my sequel.", "tokens": [51014, + 583, 295, 1164, 11, 337, 23274, 281, 1223, 11, 411, 437, 2516, 565, 307, 1936, 11, + 295, 1164, 11, 611, 294, 10808, 341, 1412, 666, 4414, 3742, 8149, 307, 411, 452, + 20622, 13, 51564], "temperature": 0.0, "avg_logprob": -0.3380036570809104, "compression_ratio": + 1.5398230088495575, "no_speech_prob": 0.030313337221741676}, {"id": 491, "seek": + 148788, "start": 1511.88, "end": 1513.88, "text": " Right. 
And ready.", "tokens": + [51564, 1779, 13, 400, 1919, 13, 51664], "temperature": 0.0, "avg_logprob": -0.3380036570809104, + "compression_ratio": 1.5398230088495575, "no_speech_prob": 0.030313337221741676}, + {"id": 492, "seek": 151388, "start": 1513.88, "end": 1515.88, "text": " So have + you stopped using it already?", "tokens": [50364, 407, 362, 291, 5936, 1228, 309, + 1217, 30, 50464], "temperature": 0.0, "avg_logprob": -0.25254138465066556, "compression_ratio": + 1.6059322033898304, "no_speech_prob": 0.03787301108241081}, {"id": 493, "seek": + 151388, "start": 1515.88, "end": 1519.88, "text": " So we''re actually, so we''re + using my sequels or database.", "tokens": [50464, 407, 321, 434, 767, 11, 370, 321, + 434, 1228, 452, 5123, 1625, 420, 8149, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.25254138465066556, "compression_ratio": 1.6059322033898304, "no_speech_prob": + 0.03787301108241081}, {"id": 494, "seek": 151388, "start": 1519.88, "end": 1524.88, + "text": " However, what manages this communication web sockets is all in red.", + "tokens": [50664, 2908, 11, 437, 22489, 341, 6101, 3670, 370, 11984, 307, 439, 294, + 2182, 13, 50914], "temperature": 0.0, "avg_logprob": -0.25254138465066556, "compression_ratio": + 1.6059322033898304, "no_speech_prob": 0.03787301108241081}, {"id": 495, "seek": + 151388, "start": 1524.88, "end": 1535.88, "text": " So as the bat and it''s our + background jobs and our front end jobs and our web browsers keep track of each other + is through red.", "tokens": [50914, 407, 382, 264, 7362, 293, 309, 311, 527, 3678, + 4782, 293, 527, 1868, 917, 4782, 293, 527, 3670, 36069, 1066, 2837, 295, 1184, 661, + 307, 807, 2182, 13, 51464], "temperature": 0.0, "avg_logprob": -0.25254138465066556, + "compression_ratio": 1.6059322033898304, "no_speech_prob": 0.03787301108241081}, + {"id": 496, "seek": 151388, "start": 1535.88, "end": 1542.88, "text": " Yeah, so + if I actually, so if you, you know, I''m running local hosts, you won''t see 
it.", + "tokens": [51464, 865, 11, 370, 498, 286, 767, 11, 370, 498, 291, 11, 291, 458, + 11, 286, 478, 2614, 2654, 21573, 11, 291, 1582, 380, 536, 309, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.25254138465066556, "compression_ratio": 1.6059322033898304, + "no_speech_prob": 0.03787301108241081}, {"id": 497, "seek": 154288, "start": 1542.88, + "end": 1549.88, "text": " So everybody who is connected, who has permissions for + this book, everybody would be seeing these messages.", "tokens": [50364, 407, 2201, + 567, 307, 4582, 11, 567, 575, 32723, 337, 341, 1446, 11, 2201, 576, 312, 2577, 613, + 7897, 13, 50714], "temperature": 0.0, "avg_logprob": -0.23674129304431735, "compression_ratio": + 1.5343137254901962, "no_speech_prob": 0.03055999055504799}, {"id": 498, "seek": + 154288, "start": 1549.88, "end": 1552.88, "text": " Yeah, yeah. So it''s kind of + broadcasting to everyone.", "tokens": [50714, 865, 11, 1338, 13, 407, 309, 311, + 733, 295, 30024, 281, 1518, 13, 50864], "temperature": 0.0, "avg_logprob": -0.23674129304431735, + "compression_ratio": 1.5343137254901962, "no_speech_prob": 0.03055999055504799}, + {"id": 499, "seek": 154288, "start": 1552.88, "end": 1553.88, "text": " Yeah, cool.", + "tokens": [50864, 865, 11, 1627, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.23674129304431735, "compression_ratio": 1.5343137254901962, "no_speech_prob": + 0.03055999055504799}, {"id": 500, "seek": 154288, "start": 1553.88, "end": 1554.88, + "text": " Exactly.", "tokens": [50914, 7587, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.23674129304431735, "compression_ratio": 1.5343137254901962, "no_speech_prob": + 0.03055999055504799}, {"id": 501, "seek": 154288, "start": 1554.88, "end": 1557.88, + "text": " So that''s something I''m really, really excited about.", "tokens": [50964, + 407, 300, 311, 746, 286, 478, 534, 11, 534, 2919, 466, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.23674129304431735, "compression_ratio": 1.5343137254901962, + 
"no_speech_prob": 0.03055999055504799}, {"id": 502, "seek": 154288, "start": 1557.88, + "end": 1564.88, "text": " The other thing that I''m really excited to be is LLM + based judgments, right?", "tokens": [51114, 440, 661, 551, 300, 286, 478, 534, 2919, + 281, 312, 307, 441, 43, 44, 2361, 40337, 11, 558, 30, 51464], "temperature": 0.0, + "avg_logprob": -0.23674129304431735, "compression_ratio": 1.5343137254901962, "no_speech_prob": + 0.03055999055504799}, {"id": 503, "seek": 156488, "start": 1564.88, "end": 1570.88, + "text": " So I''m going to start out this conversation about using cupid with human + judges annotators, right?", "tokens": [50364, 407, 286, 478, 516, 281, 722, 484, + 341, 3761, 466, 1228, 4414, 327, 365, 1952, 14449, 25339, 3391, 11, 558, 30, 50664], + "temperature": 0.0, "avg_logprob": -0.22117931192571466, "compression_ratio": 1.497991967871486, + "no_speech_prob": 0.008472356013953686}, {"id": 504, "seek": 156488, "start": 1570.88, + "end": 1572.88, "text": " And gathering quite quality data.", "tokens": [50664, + 400, 13519, 1596, 3125, 1412, 13, 50764], "temperature": 0.0, "avg_logprob": -0.22117931192571466, + "compression_ratio": 1.497991967871486, "no_speech_prob": 0.008472356013953686}, + {"id": 505, "seek": 156488, "start": 1572.88, "end": 1581.88, "text": " But as we + all know, human judges is expensive, not every organization can do it.", "tokens": + [50764, 583, 382, 321, 439, 458, 11, 1952, 14449, 307, 5124, 11, 406, 633, 4475, + 393, 360, 309, 13, 51214], "temperature": 0.0, "avg_logprob": -0.22117931192571466, + "compression_ratio": 1.497991967871486, "no_speech_prob": 0.008472356013953686}, + {"id": 506, "seek": 156488, "start": 1581.88, "end": 1593.88, "text": " My colleague + Scott Stoltz last year did some interesting work playing around with chat GPT when + it first came out to evaluate, is this query and this document.", "tokens": [51214, + 1222, 13532, 6659, 745, 4837, 89, 1036, 1064, 630, 512, 1880, 589, 2433, 926, 365, + 
5081, 26039, 51, 562, 309, 700, 1361, 484, 281, 13059, 11, 307, 341, 14581, 293, + 341, 4166, 13, 51814], "temperature": 0.0, "avg_logprob": -0.22117931192571466, + "compression_ratio": 1.497991967871486, "no_speech_prob": 0.008472356013953686}, + {"id": 507, "seek": 159388, "start": 1593.88, "end": 1613.88, "text": " And then + we''ve been working with Moody''s on their BG solution and using what we''ve been + calling judge Judy, an LLM to evaluate what that lets us do is.", "tokens": [50364, + 400, 550, 321, 600, 668, 1364, 365, 3335, 843, 311, 322, 641, 363, 38, 3827, 293, + 1228, 437, 321, 600, 668, 5141, 6995, 24577, 11, 364, 441, 43, 44, 281, 13059, 437, + 300, 6653, 505, 360, 307, 13, 51364], "temperature": 0.0, "avg_logprob": -0.4019261768886021, + "compression_ratio": 1.26890756302521, "no_speech_prob": 0.006714065559208393}, + {"id": 508, "seek": 161388, "start": 1613.88, "end": 1620.88, "text": " We''re basically + using a small set of human judges to validate our LLM judge judge Judy.", "tokens": + [50364, 492, 434, 1936, 1228, 257, 1359, 992, 295, 1952, 14449, 281, 29562, 527, + 441, 43, 44, 6995, 6995, 24577, 13, 50714], "temperature": 0.0, "avg_logprob": -0.18312058797696742, + "compression_ratio": 1.5260663507109005, "no_speech_prob": 0.09734303504228592}, + {"id": 509, "seek": 161388, "start": 1620.88, "end": 1640.88, "text": " And if we + have good correlation, right, our inner rate of reliability looks good, you know, + flights, Kappa, Coens, all those metrics look good, then this gives us confidence + to go ahead and scale up the judgments, right, using an LLM.", "tokens": [50714, + 400, 498, 321, 362, 665, 20009, 11, 558, 11, 527, 7284, 3314, 295, 24550, 1542, + 665, 11, 291, 458, 11, 21089, 11, 591, 25637, 11, 3066, 694, 11, 439, 729, 16367, + 574, 665, 11, 550, 341, 2709, 505, 6687, 281, 352, 2286, 293, 4373, 493, 264, 40337, + 11, 558, 11, 1228, 364, 441, 43, 44, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.18312058797696742, 
"compression_ratio": 1.5260663507109005, "no_speech_prob": + 0.09734303504228592}, {"id": 510, "seek": 164088, "start": 1640.88, "end": 1646.88, + "text": " So today, that is a bunch of pandas notebooks and kind of custom code.", + "tokens": [50364, 407, 965, 11, 300, 307, 257, 3840, 295, 4565, 296, 43782, 293, + 733, 295, 2375, 3089, 13, 50664], "temperature": 0.0, "avg_logprob": -0.14148697419600054, + "compression_ratio": 1.3518518518518519, "no_speech_prob": 0.02278565615415573}, + {"id": 511, "seek": 164088, "start": 1646.88, "end": 1659.88, "text": " However, + the other pull requests that I''m really excited about, right, is this meat judge + Judy, she is your AI powered subject matter expert, right?", "tokens": [50664, 2908, + 11, 264, 661, 2235, 12475, 300, 286, 478, 534, 2919, 466, 11, 558, 11, 307, 341, + 4615, 6995, 24577, 11, 750, 307, 428, 7318, 17786, 3983, 1871, 5844, 11, 558, 30, + 51314], "temperature": 0.0, "avg_logprob": -0.14148697419600054, "compression_ratio": + 1.3518518518518519, "no_speech_prob": 0.02278565615415573}, {"id": 512, "seek": + 165988, "start": 1659.88, "end": 1673.88, "text": " And so in the not too distant + future, you will be able to, let me go ahead and bring up this case, right, here + we have one person who''s been the judge.", "tokens": [50364, 400, 370, 294, 264, + 406, 886, 17275, 2027, 11, 291, 486, 312, 1075, 281, 11, 718, 385, 352, 2286, 293, + 1565, 493, 341, 1389, 11, 558, 11, 510, 321, 362, 472, 954, 567, 311, 668, 264, + 6995, 13, 51064], "temperature": 0.0, "avg_logprob": -0.13137922658548726, "compression_ratio": + 1.5425531914893618, "no_speech_prob": 0.8400723338127136}, {"id": 513, "seek": 165988, + "start": 1673.88, "end": 1685.88, "text": " But soon, you''ll have a second column + next to it, judge Judy, right, using whatever prompt you''ve typed in, right, or + provided is judging.", "tokens": [51064, 583, 2321, 11, 291, 603, 362, 257, 1150, + 7738, 958, 281, 309, 11, 6995, 24577, 11, 558, 11, 1228, 
2035, 12391, 291, 600, + 33941, 294, 11, 558, 11, 420, 5649, 307, 23587, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.13137922658548726, "compression_ratio": 1.5425531914893618, "no_speech_prob": + 0.8400723338127136}, {"id": 514, "seek": 168588, "start": 1685.88, "end": 1695.88, + "text": " So that''s the other big, how do we scale up, cupid and make it relevant + in our gen AI world, right, those are sort of the two big things.", "tokens": [50364, + 407, 300, 311, 264, 661, 955, 11, 577, 360, 321, 4373, 493, 11, 4414, 327, 293, + 652, 309, 7340, 294, 527, 1049, 7318, 1002, 11, 558, 11, 729, 366, 1333, 295, 264, + 732, 955, 721, 13, 50864], "temperature": 0.0, "avg_logprob": -0.18723144112052498, + "compression_ratio": 1.5530973451327434, "no_speech_prob": 0.20294906198978424}, + {"id": 515, "seek": 168588, "start": 1695.88, "end": 1700.88, "text": " This is + fantastic. This is really fantastic. Yeah. Wow.", "tokens": [50864, 639, 307, 5456, + 13, 639, 307, 534, 5456, 13, 865, 13, 3153, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.18723144112052498, "compression_ratio": 1.5530973451327434, "no_speech_prob": + 0.20294906198978424}, {"id": 516, "seek": 168588, "start": 1700.88, "end": 1710.88, + "text": " I hope this PRs will land really soon, especially the LLM one, right, + because this allows people to really quickly hit the ground running and start labeling.", + "tokens": [51114, 286, 1454, 341, 11568, 82, 486, 2117, 534, 2321, 11, 2318, 264, + 441, 43, 44, 472, 11, 558, 11, 570, 341, 4045, 561, 281, 534, 2661, 2045, 264, 2727, + 2614, 293, 722, 40244, 13, 51614], "temperature": 0.0, "avg_logprob": -0.18723144112052498, + "compression_ratio": 1.5530973451327434, "no_speech_prob": 0.20294906198978424}, + {"id": 517, "seek": 171088, "start": 1710.88, "end": 1723.88, "text": " Actually, + someone will label in a way, but exactly exactly the trick is having the right props, + right, and having the right set of positive examples and negative examples, 
right.", + "tokens": [50364, 5135, 11, 1580, 486, 7645, 294, 257, 636, 11, 457, 2293, 2293, + 264, 4282, 307, 1419, 264, 558, 26173, 11, 558, 11, 293, 1419, 264, 558, 992, 295, + 3353, 5110, 293, 3671, 5110, 11, 558, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.16214269085934288, "compression_ratio": 1.6701570680628273, "no_speech_prob": + 0.12435764819383621}, {"id": 518, "seek": 171088, "start": 1723.88, "end": 1734.88, + "text": " But one of the things that I were working on, so cupid, right, ships with + a set of data science notebooks, they need a little bit more work.", "tokens": [51014, + 583, 472, 295, 264, 721, 300, 286, 645, 1364, 322, 11, 370, 4414, 327, 11, 558, + 11, 11434, 365, 257, 992, 295, 1412, 3497, 43782, 11, 436, 643, 257, 707, 857, 544, + 589, 13, 51564], "temperature": 0.0, "avg_logprob": -0.16214269085934288, "compression_ratio": + 1.6701570680628273, "no_speech_prob": 0.12435764819383621}, {"id": 519, "seek": + 173488, "start": 1734.88, "end": 1740.88, "text": " See if this comes up in my diversion, + I don''t ship that. 
So I''m going to switch to the production cupid.", "tokens": + [50364, 3008, 498, 341, 1487, 493, 294, 452, 49422, 11, 286, 500, 380, 5374, 300, + 13, 407, 286, 478, 516, 281, 3679, 281, 264, 4265, 4414, 327, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.31818225771881814, "compression_ratio": 1.191304347826087, + "no_speech_prob": 0.2217974215745926}, {"id": 520, "seek": 173488, "start": 1740.88, + "end": 1742.88, "text": " Yeah, no worries.", "tokens": [50664, 865, 11, 572, 16340, + 13, 50764], "temperature": 0.0, "avg_logprob": -0.31818225771881814, "compression_ratio": + 1.191304347826087, "no_speech_prob": 0.2217974215745926}, {"id": 521, "seek": 173488, + "start": 1742.88, "end": 1746.88, "text": " And notebooks.", "tokens": [50764, 400, + 43782, 13, 50964], "temperature": 0.0, "avg_logprob": -0.31818225771881814, "compression_ratio": + 1.191304347826087, "no_speech_prob": 0.2217974215745926}, {"id": 522, "seek": 174688, + "start": 1746.88, "end": 1764.88, "text": " So in this example''s folder, we''re + actually shipping a couple of notebooks for you to use, flash capa, jacquard and + RBO comparison, multi-radar analysis, right.", "tokens": [50364, 407, 294, 341, + 1365, 311, 10820, 11, 321, 434, 767, 14122, 257, 1916, 295, 43782, 337, 291, 281, + 764, 11, 7319, 1410, 64, 11, 361, 326, 358, 515, 293, 497, 15893, 9660, 11, 4825, + 12, 6206, 289, 5215, 11, 558, 13, 51264], "temperature": 0.0, "avg_logprob": -0.42345521714952256, + "compression_ratio": 1.2403100775193798, "no_speech_prob": 0.5203595161437988}, + {"id": 523, "seek": 176488, "start": 1764.88, "end": 1775.88, "text": " And these + notebooks here, you can directly use with your cupid book of judgments to evaluate + how we''re doing overall.", "tokens": [50364, 400, 613, 43782, 510, 11, 291, 393, + 3838, 764, 365, 428, 4414, 327, 1446, 295, 40337, 281, 13059, 577, 321, 434, 884, + 4787, 13, 50914], "temperature": 0.0, "avg_logprob": -0.1220998646300516, "compression_ratio": + 
1.6439024390243901, "no_speech_prob": 0.6272432804107666}, {"id": 524, "seek": 176488, + "start": 1775.88, "end": 1782.88, "text": " And so this can let you take your human + judgments, understand how good or bad they are.", "tokens": [50914, 400, 370, 341, + 393, 718, 291, 747, 428, 1952, 40337, 11, 1223, 577, 665, 420, 1578, 436, 366, 13, + 51264], "temperature": 0.0, "avg_logprob": -0.1220998646300516, "compression_ratio": + 1.6439024390243901, "no_speech_prob": 0.6272432804107666}, {"id": 525, "seek": 176488, + "start": 1782.88, "end": 1791.88, "text": " And then when you bring the LLM power + judge in compare the LLM judge to what your human judges were doing and feel some + confidence.", "tokens": [51264, 400, 550, 562, 291, 1565, 264, 441, 43, 44, 1347, + 6995, 294, 6794, 264, 441, 43, 44, 6995, 281, 437, 428, 1952, 14449, 645, 884, 293, + 841, 512, 6687, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1220998646300516, + "compression_ratio": 1.6439024390243901, "no_speech_prob": 0.6272432804107666}, + {"id": 526, "seek": 179188, "start": 1791.88, "end": 1797.88, "text": " So I''m + really excited to be shipping these because I think it''s going to lower the barrier + to getting judgments.", "tokens": [50364, 407, 286, 478, 534, 2919, 281, 312, 14122, + 613, 570, 286, 519, 309, 311, 516, 281, 3126, 264, 13357, 281, 1242, 40337, 13, + 50664], "temperature": 0.0, "avg_logprob": -0.10515315481956969, "compression_ratio": + 1.6590909090909092, "no_speech_prob": 0.06702517718076706}, {"id": 527, "seek": + 179188, "start": 1797.88, "end": 1804.88, "text": " And that''s something that a + lot of search teams are like, I would love to use cute, but I would love to do this.", + "tokens": [50664, 400, 300, 311, 746, 300, 257, 688, 295, 3164, 5491, 366, 411, + 11, 286, 576, 959, 281, 764, 4052, 11, 457, 286, 576, 959, 281, 360, 341, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.10515315481956969, "compression_ratio": 1.6590909090909092, + "no_speech_prob": 
0.06702517718076706}, {"id": 528, "seek": 179188, "start": 1804.88, + "end": 1812.88, "text": " But I can''t do any of this until I have judgments and + I don''t know where to get them or I don''t have the domain experts that I need, + right.", "tokens": [51014, 583, 286, 393, 380, 360, 604, 295, 341, 1826, 286, 362, + 40337, 293, 286, 500, 380, 458, 689, 281, 483, 552, 420, 286, 500, 380, 362, 264, + 9274, 8572, 300, 286, 643, 11, 558, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.10515315481956969, "compression_ratio": 1.6590909090909092, "no_speech_prob": + 0.06702517718076706}, {"id": 529, "seek": 181288, "start": 1812.88, "end": 1824.88, + "text": " And you know, search oriented organizations often have that figured out, + but a lot of other teams are like, we just see a search engine that works what you + know reasonably well and we don''t have that.", "tokens": [50364, 400, 291, 458, + 11, 3164, 21841, 6150, 2049, 362, 300, 8932, 484, 11, 457, 257, 688, 295, 661, 5491, + 366, 411, 11, 321, 445, 536, 257, 3164, 2848, 300, 1985, 437, 291, 458, 23551, 731, + 293, 321, 500, 380, 362, 300, 13, 50964], "temperature": 0.0, "avg_logprob": -0.18047244593782244, + "compression_ratio": 1.690909090909091, "no_speech_prob": 0.6045592427253723}, {"id": + 530, "seek": 181288, "start": 1824.88, "end": 1830.88, "text": " So we got a lower + the barrier to getting judgments in judgments and I''m excited about this.", "tokens": + [50964, 407, 321, 658, 257, 3126, 264, 13357, 281, 1242, 40337, 294, 40337, 293, + 286, 478, 2919, 466, 341, 13, 51264], "temperature": 0.0, "avg_logprob": -0.18047244593782244, + "compression_ratio": 1.690909090909091, "no_speech_prob": 0.6045592427253723}, {"id": + 531, "seek": 181288, "start": 1830.88, "end": 1841.88, "text": " This is fantastic, + but I can also add from my personal experience, you know, that yes, you''re absolutely + right that there is this sometimes there is even a friction, right.", "tokens": + [51264, 639, 307, 5456, 
11, 457, 286, 393, 611, 909, 490, 452, 2973, 1752, 11, 291, + 458, 11, 300, 2086, 11, 291, 434, 3122, 558, 300, 456, 307, 341, 2171, 456, 307, + 754, 257, 17710, 11, 558, 13, 51814], "temperature": 0.0, "avg_logprob": -0.18047244593782244, + "compression_ratio": 1.690909090909091, "no_speech_prob": 0.6045592427253723}, {"id": + 532, "seek": 184188, "start": 1841.88, "end": 1852.88, "text": " The search engineer + says, no, I don''t want to label I''m a search engineer, I''m developing the algorithm, + but they will get so many more insights, so much more insights if they actually + label.", "tokens": [50364, 440, 3164, 11403, 1619, 11, 572, 11, 286, 500, 380, 528, + 281, 7645, 286, 478, 257, 3164, 11403, 11, 286, 478, 6416, 264, 9284, 11, 457, 436, + 486, 483, 370, 867, 544, 14310, 11, 370, 709, 544, 14310, 498, 436, 767, 7645, 13, + 50914], "temperature": 0.0, "avg_logprob": -0.1576925415590585, "compression_ratio": + 1.6804123711340206, "no_speech_prob": 0.30825942754745483}, {"id": 533, "seek": + 184188, "start": 1852.88, "end": 1861.88, "text": " And in our teams, you know, + if you have, I don''t know, 10 people and if each will label 10 queries, you will + have 100 queries labeled.", "tokens": [50914, 400, 294, 527, 5491, 11, 291, 458, + 11, 498, 291, 362, 11, 286, 500, 380, 458, 11, 1266, 561, 293, 498, 1184, 486, 7645, + 1266, 24109, 11, 291, 486, 362, 2319, 24109, 21335, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.1576925415590585, "compression_ratio": 1.6804123711340206, "no_speech_prob": + 0.30825942754745483}, {"id": 534, "seek": 186188, "start": 1861.88, "end": 1871.88, + "text": " So of course, if you don''t go for overlapping and stuff like that, but + if you go, then yeah, it''s another story, but you know, and then all of us, all + of a sudden get all this insights, right.", "tokens": [50364, 407, 295, 1164, 11, + 498, 291, 500, 380, 352, 337, 33535, 293, 1507, 411, 300, 11, 457, 498, 291, 352, + 11, 550, 1338, 11, 309, 311, 1071, 1657, 11, 
457, 291, 458, 11, 293, 550, 439, 295, + 505, 11, 439, 295, 257, 3990, 483, 439, 341, 14310, 11, 558, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.15341481295498935, "compression_ratio": 1.7237354085603114, + "no_speech_prob": 0.5622115731239319}, {"id": 535, "seek": 186188, "start": 1871.88, + "end": 1887.88, "text": " Now, now the LLM thing can actually help you scale this + right and then of course all this prompting and in label studio, by the way, they + have released a maybe something to think about a capability where an agent will + learn from user feedback, right.", "tokens": [50864, 823, 11, 586, 264, 441, 43, + 44, 551, 393, 767, 854, 291, 4373, 341, 558, 293, 550, 295, 1164, 439, 341, 12391, + 278, 293, 294, 7645, 6811, 11, 538, 264, 636, 11, 436, 362, 4736, 257, 1310, 746, + 281, 519, 466, 257, 13759, 689, 364, 9461, 486, 1466, 490, 4195, 5824, 11, 558, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.15341481295498935, "compression_ratio": + 1.7237354085603114, "no_speech_prob": 0.5622115731239319}, {"id": 536, "seek": 188788, + "start": 1887.88, "end": 1900.88, "text": " So let''s say they label and then so + LLM will label make make some mistakes and then the main expert will correct them + and so it takes it in as a feedback and then it becomes better over time.", "tokens": + [50364, 407, 718, 311, 584, 436, 7645, 293, 550, 370, 441, 43, 44, 486, 7645, 652, + 652, 512, 8038, 293, 550, 264, 2135, 5844, 486, 3006, 552, 293, 370, 309, 2516, + 309, 294, 382, 257, 5824, 293, 550, 309, 3643, 1101, 670, 565, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.20396795272827148, "compression_ratio": 1.625615763546798, + "no_speech_prob": 0.5232328772544861}, {"id": 537, "seek": 188788, "start": 1900.88, + "end": 1909.88, "text": " So it''s basically you can kind of support it''s like + you''re not copilot, someone said, seed prop stain in previous episode said, confident.", + "tokens": [51014, 407, 309, 311, 1936, 291, 393, 733, 295, 1406, 309, 
311, 411, + 291, 434, 406, 2971, 31516, 11, 1580, 848, 11, 8871, 2365, 16441, 294, 3894, 3500, + 848, 11, 6679, 13, 51464], "temperature": 0.0, "avg_logprob": -0.20396795272827148, + "compression_ratio": 1.625615763546798, "no_speech_prob": 0.5232328772544861}, {"id": + 538, "seek": 190988, "start": 1909.88, "end": 1919.88, "text": " So you kind of + like give these things you collaborate in a way, right. So that would be this is + fantastic direction.", "tokens": [50364, 407, 291, 733, 295, 411, 976, 613, 721, + 291, 18338, 294, 257, 636, 11, 558, 13, 407, 300, 576, 312, 341, 307, 5456, 3513, + 13, 50864], "temperature": 0.0, "avg_logprob": -0.22305779708059212, "compression_ratio": + 1.4825581395348837, "no_speech_prob": 0.5416343808174133}, {"id": 539, "seek": 190988, + "start": 1919.88, "end": 1929.88, "text": " Yeah. So I mean, this is definitely + very much around that more narrow relevance judging versus generic labeling way + label studio is right.", "tokens": [50864, 865, 13, 407, 286, 914, 11, 341, 307, + 2138, 588, 709, 926, 300, 544, 9432, 32684, 23587, 5717, 19577, 40244, 636, 7645, + 6811, 307, 558, 13, 51364], "temperature": 0.0, "avg_logprob": -0.22305779708059212, + "compression_ratio": 1.4825581395348837, "no_speech_prob": 0.5416343808174133}, + {"id": 540, "seek": 192988, "start": 1929.88, "end": 1939.88, "text": " But there''s + definitely room for inspiration from both label studio. I''ve been looking at more + as well as ragas and how it''s doing some of the new metrics.", "tokens": [50364, + 583, 456, 311, 2138, 1808, 337, 10249, 490, 1293, 7645, 6811, 13, 286, 600, 668, + 1237, 412, 544, 382, 731, 382, 17539, 296, 293, 577, 309, 311, 884, 512, 295, 264, + 777, 16367, 13, 50864], "temperature": 0.0, "avg_logprob": -0.15489954106947956, + "compression_ratio": 1.5148936170212766, "no_speech_prob": 0.08198297768831253}, + {"id": 541, "seek": 192988, "start": 1939.88, "end": 1953.88, "text": " Yeah, it''s + interesting. So yeah, exactly. 
What I love about Cupid is that I can really connect + it to the live search engine. I mean, not necessarily in production can be some + development version of it.", "tokens": [50864, 865, 11, 309, 311, 1880, 13, 407, + 1338, 11, 2293, 13, 708, 286, 959, 466, 383, 6127, 307, 300, 286, 393, 534, 1745, + 309, 281, 264, 1621, 3164, 2848, 13, 286, 914, 11, 406, 4725, 294, 4265, 393, 312, + 512, 3250, 3037, 295, 309, 13, 51564], "temperature": 0.0, "avg_logprob": -0.15489954106947956, + "compression_ratio": 1.5148936170212766, "no_speech_prob": 0.08198297768831253}, + {"id": 542, "seek": 195388, "start": 1953.88, "end": 1967.88, "text": " And I can + start labeling and queering and as you said, search is the interface to big data, + right. So Cupid becomes interface to your search, which is the interface to your + big data and all the unknowns there.", "tokens": [50364, 400, 286, 393, 722, 40244, + 293, 631, 1794, 293, 382, 291, 848, 11, 3164, 307, 264, 9226, 281, 955, 1412, 11, + 558, 13, 407, 383, 6127, 3643, 9226, 281, 428, 3164, 11, 597, 307, 264, 9226, 281, + 428, 955, 1412, 293, 439, 264, 46048, 456, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.12989444732666017, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.2224840372800827}, {"id": 543, "seek": 196788, "start": 1967.88, "end": 1979.88, + "text": " And when this terrible that one looks to OK, that one looks OK, that one + looks terrible, right. 
We can immediately start building.", "tokens": [50364, 400, + 562, 341, 6237, 300, 472, 1542, 281, 2264, 11, 300, 472, 1542, 2264, 11, 300, 472, + 1542, 6237, 11, 558, 13, 492, 393, 4258, 722, 2390, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.29720140676029394, "compression_ratio": 1.8309859154929577, "no_speech_prob": + 0.24868164956569672}, {"id": 544, "seek": 196788, "start": 1979.88, "end": 1990.88, + "text": " Yeah, maybe that one''s OK, that one doesn''t look it right we can immediately + start building some sort of understanding right now.", "tokens": [50964, 865, 11, + 1310, 300, 472, 311, 2264, 11, 300, 472, 1177, 380, 574, 309, 558, 321, 393, 4258, + 722, 2390, 512, 1333, 295, 3701, 558, 586, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.29720140676029394, "compression_ratio": 1.8309859154929577, "no_speech_prob": + 0.24868164956569672}, {"id": 545, "seek": 199088, "start": 1990.88, "end": 2000.88, + "text": " So quick little binary one right we''re going to start building that and + get it get a sense of what our score is going to be exactly exactly.", "tokens": + [50364, 407, 1702, 707, 17434, 472, 558, 321, 434, 516, 281, 722, 2390, 300, 293, + 483, 309, 483, 257, 2020, 295, 437, 527, 6175, 307, 516, 281, 312, 2293, 2293, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.24190364250769983, "compression_ratio": + 1.5508021390374331, "no_speech_prob": 0.5269076824188232}, {"id": 546, "seek": 199088, + "start": 2000.88, "end": 2009.88, "text": " And that score is also customizable. + We''ve done some little implementations in like JavaScript looking like language, + right. 
I think it''s JavaScript.", "tokens": [50864, 400, 300, 6175, 307, 611, 47922, + 13, 492, 600, 1096, 512, 707, 4445, 763, 294, 411, 15778, 1237, 411, 2856, 11, 558, + 13, 286, 519, 309, 311, 15778, 13, 51314], "temperature": 0.0, "avg_logprob": -0.24190364250769983, + "compression_ratio": 1.5508021390374331, "no_speech_prob": 0.5269076824188232}, + {"id": 547, "seek": 200988, "start": 2009.88, "end": 2032.88, "text": " Yeah, but + you just come in here and you take your score right there is so here''s classic + NBC G 10, but you can change it. So like one recently, we wanted to know in this + score right here, we''re being you know penalized because soy is returning zero + results.", "tokens": [50364, 865, 11, 457, 291, 445, 808, 294, 510, 293, 291, 747, + 428, 6175, 558, 456, 307, 370, 510, 311, 7230, 31504, 460, 1266, 11, 457, 291, 393, + 1319, 309, 13, 407, 411, 472, 3938, 11, 321, 1415, 281, 458, 294, 341, 6175, 558, + 510, 11, 321, 434, 885, 291, 458, 13661, 1602, 570, 8812, 307, 12678, 4018, 3542, + 13, 51514], "temperature": 0.0, "avg_logprob": -0.2553687322707403, "compression_ratio": + 1.4855491329479769, "no_speech_prob": 0.3765116333961487}, {"id": 548, "seek": 203288, + "start": 2032.88, "end": 2037.88, "text": " And so it''s giving us a zero. 
So it''s + bringing down our average precision.", "tokens": [50364, 400, 370, 309, 311, 2902, + 505, 257, 4018, 13, 407, 309, 311, 5062, 760, 527, 4274, 18356, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.17420241198962247, "compression_ratio": 1.5508021390374331, + "no_speech_prob": 0.34471097588539124}, {"id": 549, "seek": 203288, "start": 2037.88, + "end": 2044.88, "text": " But what if you wanted to know that it was supposed to + be zero results zero results is actually the right.", "tokens": [50614, 583, 437, + 498, 291, 1415, 281, 458, 300, 309, 390, 3442, 281, 312, 4018, 3542, 4018, 3542, + 307, 767, 264, 558, 13, 50964], "temperature": 0.0, "avg_logprob": -0.17420241198962247, + "compression_ratio": 1.5508021390374331, "no_speech_prob": 0.34471097588539124}, + {"id": 550, "seek": 203288, "start": 2044.88, "end": 2046.88, "text": " Yes, yes, + yes.", "tokens": [50964, 1079, 11, 2086, 11, 2086, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.17420241198962247, "compression_ratio": 1.5508021390374331, "no_speech_prob": + 0.34471097588539124}, {"id": 551, "seek": 203288, "start": 2046.88, "end": 2058.88, + "text": " So that one we actually went in and said we added an option a per query + option should be ZSR.", "tokens": [51064, 407, 300, 472, 321, 767, 1437, 294, 293, + 848, 321, 3869, 364, 3614, 257, 680, 14581, 3614, 820, 312, 1176, 50, 49, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.17420241198962247, "compression_ratio": 1.5508021390374331, + "no_speech_prob": 0.34471097588539124}, {"id": 552, "seek": 205888, "start": 2058.88, + "end": 2063.88, "text": " Right, and we set that option and then in our custom score.", + "tokens": [50364, 1779, 11, 293, 321, 992, 300, 3614, 293, 550, 294, 527, 2375, + 6175, 13, 50614], "temperature": 0.0, "avg_logprob": -0.20145328393143214, "compression_ratio": + 1.6454545454545455, "no_speech_prob": 0.11811766028404236}, {"id": 553, "seek": + 205888, "start": 2063.88, "end": 2074.88, "text": " If it said 
should be ZSR is + true and there were zero results then we gave it a one right because it''s working + the way and vice versa.", "tokens": [50614, 759, 309, 848, 820, 312, 1176, 50, 49, + 307, 2074, 293, 456, 645, 4018, 3542, 550, 321, 2729, 309, 257, 472, 558, 570, 309, + 311, 1364, 264, 636, 293, 11964, 25650, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.20145328393143214, "compression_ratio": 1.6454545454545455, "no_speech_prob": + 0.11811766028404236}, {"id": 554, "seek": 205888, "start": 2074.88, "end": 2086.88, + "text": " We had other situations where yeah, if we started returning results for + soy, that would have been worse search right and so yeah that was a great use of + a custom score.", "tokens": [51164, 492, 632, 661, 6851, 689, 1338, 11, 498, 321, + 1409, 12678, 3542, 337, 8812, 11, 300, 576, 362, 668, 5324, 3164, 558, 293, 370, + 1338, 300, 390, 257, 869, 764, 295, 257, 2375, 6175, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.20145328393143214, "compression_ratio": 1.6454545454545455, + "no_speech_prob": 0.11811766028404236}, {"id": 555, "seek": 208688, "start": 2086.88, + "end": 2087.88, "text": " Yeah, fantastic.", "tokens": [50364, 865, 11, 5456, 13, + 50414], "temperature": 0.0, "avg_logprob": -0.23187308408776108, "compression_ratio": + 1.6594827586206897, "no_speech_prob": 0.08860106766223907}, {"id": 556, "seek": + 208688, "start": 2087.88, "end": 2090.88, "text": " We''re all about that because + it was a great use sound.", "tokens": [50414, 492, 434, 439, 466, 300, 570, 309, + 390, 257, 869, 764, 1626, 13, 50564], "temperature": 0.0, "avg_logprob": -0.23187308408776108, + "compression_ratio": 1.6594827586206897, "no_speech_prob": 0.08860106766223907}, + {"id": 557, "seek": 208688, "start": 2090.88, "end": 2095.88, "text": " Yeah, yeah, + yeah, exactly. 
Also where Cupid helped us is sometimes you don''t speak that language.", + "tokens": [50564, 865, 11, 1338, 11, 1338, 11, 2293, 13, 2743, 689, 383, 6127, 4254, + 505, 307, 2171, 291, 500, 380, 1710, 300, 2856, 13, 50814], "temperature": 0.0, + "avg_logprob": -0.23187308408776108, "compression_ratio": 1.6594827586206897, "no_speech_prob": + 0.08860106766223907}, {"id": 558, "seek": 208688, "start": 2095.88, "end": 2109.88, + "text": " So it could be Korean language that you don''t speak but you need to move + and on one occasion we''ve sent a Cupid just to our Korean native speakers in the + company and they''ve labeled and they told us how it looks so.", "tokens": [50814, + 407, 309, 727, 312, 6933, 2856, 300, 291, 500, 380, 1710, 457, 291, 643, 281, 1286, + 293, 322, 472, 9674, 321, 600, 2279, 257, 383, 6127, 445, 281, 527, 6933, 8470, + 9518, 294, 264, 2237, 293, 436, 600, 21335, 293, 436, 1907, 505, 577, 309, 1542, + 370, 13, 51514], "temperature": 0.0, "avg_logprob": -0.23187308408776108, "compression_ratio": + 1.6594827586206897, "no_speech_prob": 0.08860106766223907}, {"id": 559, "seek": + 210988, "start": 2109.88, "end": 2110.88, "text": " That work.", "tokens": [50364, + 663, 589, 13, 50414], "temperature": 0.0, "avg_logprob": -0.2014806377353953, "compression_ratio": + 1.4733727810650887, "no_speech_prob": 0.422952800989151}, {"id": 560, "seek": 210988, + "start": 2110.88, "end": 2119.88, "text": " So the other thing right so here we + are we are happily loading up all these but I''ll go ahead and click judge right.", + "tokens": [50414, 407, 264, 661, 551, 558, 370, 510, 321, 366, 321, 366, 19909, + 15114, 493, 439, 613, 457, 286, 603, 352, 2286, 293, 2052, 6995, 558, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.2014806377353953, "compression_ratio": 1.4733727810650887, + "no_speech_prob": 0.422952800989151}, {"id": 561, "seek": 210988, "start": 2119.88, + "end": 2127.88, "text": " So this is sort of an older approach to rating a zero + a one to 
10 wouldn''t do that now but that''s what we''ve been saying.", "tokens": + [50864, 407, 341, 307, 1333, 295, 364, 4906, 3109, 281, 10990, 257, 4018, 257, 472, + 281, 1266, 2759, 380, 360, 300, 586, 457, 300, 311, 437, 321, 600, 668, 1566, 13, + 51264], "temperature": 0.0, "avg_logprob": -0.2014806377353953, "compression_ratio": + 1.4733727810650887, "no_speech_prob": 0.422952800989151}, {"id": 562, "seek": 212788, + "start": 2127.88, "end": 2148.88, "text": " So there you can see a here is the human + radar interface now I don''t know what is a good rating or not but here you can + know the rates and documents kind of taking this from this is a recent add on which + is if you''re if you can''t read it why right.", "tokens": [50364, 407, 456, 291, + 393, 536, 257, 510, 307, 264, 1952, 16544, 9226, 586, 286, 500, 380, 458, 437, 307, + 257, 665, 10990, 420, 406, 457, 510, 291, 393, 458, 264, 6846, 293, 8512, 733, 295, + 1940, 341, 490, 341, 307, 257, 5162, 909, 322, 597, 307, 498, 291, 434, 498, 291, + 393, 380, 1401, 309, 983, 558, 13, 51414], "temperature": 0.0, "avg_logprob": -0.23937542207779422, + "compression_ratio": 1.5796178343949046, "no_speech_prob": 0.6172176003456116}, + {"id": 563, "seek": 214888, "start": 2148.88, "end": 2158.88, "text": " I am a vet + in there I am not a vet and don''t understand the science.", "tokens": [50364, 286, + 669, 257, 12423, 294, 456, 286, 669, 406, 257, 12423, 293, 500, 380, 1223, 264, + 3497, 13, 50864], "temperature": 0.0, "avg_logprob": -0.24724134851674565, "compression_ratio": + 1.5633802816901408, "no_speech_prob": 0.512824535369873}, {"id": 564, "seek": 214888, + "start": 2158.88, "end": 2161.88, "text": " Yeah in this square.", "tokens": [50864, + 865, 294, 341, 3732, 13, 51014], "temperature": 0.0, "avg_logprob": -0.24724134851674565, + "compression_ratio": 1.5633802816901408, "no_speech_prob": 0.512824535369873}, {"id": + 565, "seek": 214888, "start": 2161.88, "end": 2172.88, "text": " Right and so I''ll + skip 
judge in that and that and that that''s been that''s been helpful in just cranking + out your human judgments so.", "tokens": [51014, 1779, 293, 370, 286, 603, 10023, + 6995, 294, 300, 293, 300, 293, 300, 300, 311, 668, 300, 311, 668, 4961, 294, 445, + 21263, 278, 484, 428, 1952, 40337, 370, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.24724134851674565, "compression_ratio": 1.5633802816901408, "no_speech_prob": + 0.512824535369873}, {"id": 566, "seek": 217288, "start": 2172.88, "end": 2194.88, + "text": " Yeah go got just a couple of judgments mostly Jeff I yeah Jeff Scott 2500 + I''ve got four in here and I mark one is unreadable I can reset that which should + throw it back in the pool right maybe we have a conversation about why it was unreadable + and then throw it back in the.", "tokens": [50364, 865, 352, 658, 445, 257, 1916, + 295, 40337, 5240, 7506, 286, 1338, 7506, 6659, 41171, 286, 600, 658, 1451, 294, + 510, 293, 286, 1491, 472, 307, 517, 2538, 712, 286, 393, 14322, 300, 597, 820, 3507, + 309, 646, 294, 264, 7005, 558, 1310, 321, 362, 257, 3761, 466, 983, 309, 390, 517, + 2538, 712, 293, 550, 3507, 309, 646, 294, 264, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.23891633929628314, "compression_ratio": 1.5222222222222221, "no_speech_prob": + 0.27343299984931946}, {"id": 567, "seek": 219488, "start": 2194.88, "end": 2209.88, + "text": " Almost there we are almost there right the background job is wonderful + but it doesn''t necessarily mean it''s any faster right now I know I know at least + watch it and watch the countdown so.", "tokens": [50364, 12627, 456, 321, 366, 1920, + 456, 558, 264, 3678, 1691, 307, 3715, 457, 309, 1177, 380, 4725, 914, 309, 311, + 604, 4663, 558, 586, 286, 458, 286, 458, 412, 1935, 1159, 309, 293, 1159, 264, 35985, + 370, 13, 51114], "temperature": 0.0, "avg_logprob": -0.24212326722986557, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 0.41974183917045593}, {"id": 568, "seek": + 219488, "start": 
2209.88, "end": 2216.88, "text": " Fantastic demo can you tell + a bit more about the tech side of things we didn''t mention sequel my sequel already + is.", "tokens": [51114, 21320, 10723, 393, 291, 980, 257, 857, 544, 466, 264, 7553, + 1252, 295, 721, 321, 994, 380, 2152, 20622, 452, 20622, 1217, 307, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.24212326722986557, "compression_ratio": 1.5833333333333333, + "no_speech_prob": 0.41974183917045593}, {"id": 569, "seek": 221688, "start": 2216.88, + "end": 2223.88, "text": " So if someone wants to jump in and start you know going + and and sort of what is the level of effort they need to go through.", "tokens": + [50364, 407, 498, 1580, 2738, 281, 3012, 294, 293, 722, 291, 458, 516, 293, 293, + 1333, 295, 437, 307, 264, 1496, 295, 4630, 436, 643, 281, 352, 807, 13, 50714], + "temperature": 0.0, "avg_logprob": -0.16827677457760543, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.5191554427146912}, {"id": 570, "seek": 221688, "start": 2223.88, + "end": 2239.88, "text": " It''s a little bit of a challenge right so most of us + so this is a Ruby on rails web app right like it is a full stack web app and this + is all just standard Ruby on rails the app is.", "tokens": [50714, 467, 311, 257, + 707, 857, 295, 257, 3430, 558, 370, 881, 295, 505, 370, 341, 307, 257, 19907, 322, + 27649, 3670, 724, 558, 411, 309, 307, 257, 1577, 8630, 3670, 724, 293, 341, 307, + 439, 445, 3832, 19907, 322, 27649, 264, 724, 307, 13, 51514], "temperature": 0.0, + "avg_logprob": -0.16827677457760543, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.5191554427146912}, {"id": 571, "seek": 223988, "start": 2239.88, "end": 2268.88, + "text": " It''s been upgraded over the years to be with the latest standard and + so if you do rails development everything''s going to feel very comfortable obviously + a lot of us in the search or information retrieval world don''t have that expertise + and that that''s just a 
challenge so one thing I will say is that if you join or + ask questions on relevance slack pound cupid happy to answer those questions the + core application that you play with in here.", "tokens": [50364, 467, 311, 668, + 24133, 670, 264, 924, 281, 312, 365, 264, 6792, 3832, 293, 370, 498, 291, 360, 27649, + 3250, 1203, 311, 516, 281, 841, 588, 4619, 2745, 257, 688, 295, 505, 294, 264, 3164, + 420, 1589, 19817, 3337, 1002, 500, 380, 362, 300, 11769, 293, 300, 300, 311, 445, + 257, 3430, 370, 472, 551, 286, 486, 584, 307, 300, 498, 291, 3917, 420, 1029, 1651, + 322, 32684, 29767, 12013, 4414, 327, 2055, 281, 1867, 729, 1651, 264, 4965, 3861, + 300, 291, 862, 365, 294, 510, 13, 51814], "temperature": 0.0, "avg_logprob": -0.10352518270303915, + "compression_ratio": 1.6616541353383458, "no_speech_prob": 0.5748762488365173}, + {"id": 572, "seek": 226888, "start": 2268.88, "end": 2293.88, "text": " So it''s + an old angular one works great no problems but it''s an angular one app and because + it''s an open source project not a commercial product we sort of stayed away from + attempting the big rewrite to update it to react or name your thing lots of examples + it seems to work fine animals.", "tokens": [50364, 407, 309, 311, 364, 1331, 24413, + 472, 1985, 869, 572, 2740, 457, 309, 311, 364, 24413, 472, 724, 293, 570, 309, 311, + 364, 1269, 4009, 1716, 406, 257, 6841, 1674, 321, 1333, 295, 9181, 1314, 490, 22001, + 264, 955, 28132, 281, 5623, 309, 281, 4515, 420, 1315, 428, 551, 3195, 295, 5110, + 309, 2544, 281, 589, 2489, 4882, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.2175302505493164, "compression_ratio": 1.5944444444444446, "no_speech_prob": + 0.02821049839258194}, {"id": 573, "seek": 229388, "start": 2293.88, "end": 2318.88, + "text": " So cupid angular one app for all of this and then outside of this this + is all just a standard rails application lots of model view controller type screens + that you can see right here all standard rails my SQL 
database redis for sort of + the communication layer.", "tokens": [50364, 407, 4414, 327, 24413, 472, 724, 337, + 439, 295, 341, 293, 550, 2380, 295, 341, 341, 307, 439, 445, 257, 3832, 27649, 3861, + 3195, 295, 2316, 1910, 10561, 2010, 11171, 300, 291, 393, 536, 558, 510, 439, 3832, + 27649, 452, 19200, 8149, 2182, 271, 337, 1333, 295, 264, 6101, 4583, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.1355051734230735, "compression_ratio": 1.5568862275449102, + "no_speech_prob": 0.02134743519127369}, {"id": 574, "seek": 231888, "start": 2318.88, + "end": 2340.88, "text": " And it''s all built using Docker so if you want to so + the read me has way too much developer centric set up right but if you have Docker + then you run bin setup Docker yeah that will set you up with the development environment + literally what I was just showing is the inside of Docker.", "tokens": [50364, 400, + 309, 311, 439, 3094, 1228, 33772, 370, 498, 291, 528, 281, 370, 264, 1401, 385, + 575, 636, 886, 709, 10754, 1489, 1341, 992, 493, 558, 457, 498, 291, 362, 33772, + 550, 291, 1190, 5171, 8657, 33772, 1338, 300, 486, 992, 291, 493, 365, 264, 3250, + 2823, 3736, 437, 286, 390, 445, 4099, 307, 264, 1854, 295, 33772, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.2526758587549603, "compression_ratio": 1.6079545454545454, + "no_speech_prob": 0.2333473414182663}, {"id": 575, "seek": 234088, "start": 2340.88, + "end": 2355.88, "text": " And then you fire it up locally with bin Docker server + and that runs it locally so there''s a lot of docs in here for all the different + parts that can be a little overwhelming I think we have to rework some of this documentation + but it''s all there.", "tokens": [50364, 400, 550, 291, 2610, 309, 493, 16143, 365, + 5171, 33772, 7154, 293, 300, 6676, 309, 16143, 370, 456, 311, 257, 688, 295, 45623, + 294, 510, 337, 439, 264, 819, 3166, 300, 393, 312, 257, 707, 13373, 286, 519, 321, + 362, 281, 48376, 512, 295, 341, 14333, 457, 309, 311, 439, 456, 
13, 51114], "temperature": + 0.0, "avg_logprob": -0.1522289855139596, "compression_ratio": 1.496969696969697, + "no_speech_prob": 0.034187767654657364}, {"id": 576, "seek": 235588, "start": 2355.88, + "end": 2381.88, "text": " Now a couple of things I''ll show off we actually have + an API now so right here you have a you come here and you generate your personal + access token like that and just for fun by and and this curl command will show you + your user so we have authentication API.", "tokens": [50364, 823, 257, 1916, 295, + 721, 286, 603, 855, 766, 321, 767, 362, 364, 9362, 586, 370, 558, 510, 291, 362, + 257, 291, 808, 510, 293, 291, 8460, 428, 2973, 2105, 14862, 411, 300, 293, 445, + 337, 1019, 538, 293, 293, 341, 22591, 5622, 486, 855, 291, 428, 4195, 370, 321, + 362, 26643, 9362, 13, 51664], "temperature": 0.0, "avg_logprob": -0.18031955587452855, + "compression_ratio": 1.5266272189349113, "no_speech_prob": 0.207333043217659}, {"id": + 577, "seek": 238188, "start": 2381.88, "end": 2393.88, "text": " And we''re slowly + working on documenting all of those API API API API.", "tokens": [50364, 400, 321, + 434, 5692, 1364, 322, 42360, 439, 295, 729, 9362, 9362, 9362, 9362, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.3627154450667532, "compression_ratio": 1.078125, + "no_speech_prob": 0.06112031638622284}, {"id": 578, "seek": 239388, "start": 2394.88, + "end": 2396.88, "text": " I''m doing well.", "tokens": [50414, 286, 478, 884, 731, + 13, 50514], "temperature": 0.0, "avg_logprob": -0.20875022218034073, "compression_ratio": + 1.5544554455445545, "no_speech_prob": 0.02981564961373806}, {"id": 579, "seek": + 239388, "start": 2400.88, "end": 2422.88, "text": " So there we go API API slash + we''re slowly documenting all the APIs and so one of the things that I encourage + people right is maybe Cupid doesn''t do everything you need to do and so you''re + building some scripts outside of it or in some notebooks but you can use Cupid as + your shared source 
of truth.", "tokens": [50714, 407, 456, 321, 352, 9362, 9362, + 17330, 321, 434, 5692, 42360, 439, 264, 21445, 293, 370, 472, 295, 264, 721, 300, + 286, 5373, 561, 558, 307, 1310, 383, 6127, 1177, 380, 360, 1203, 291, 643, 281, + 360, 293, 370, 291, 434, 2390, 512, 23294, 2380, 295, 309, 420, 294, 512, 43782, + 457, 291, 393, 764, 383, 6127, 382, 428, 5507, 4009, 295, 3494, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.20875022218034073, "compression_ratio": 1.5544554455445545, + "no_speech_prob": 0.02981564961373806}, {"id": 580, "seek": 242388, "start": 2423.88, + "end": 2443.88, "text": " So maybe you have a case that represents your golden set + of queries right you in your notebook can go and grab all the query so and so we''re + adding sort of more and more documentation on all of these different API so that''s + fantastic yeah.", "tokens": [50364, 407, 1310, 291, 362, 257, 1389, 300, 8855, 428, + 9729, 992, 295, 24109, 558, 291, 294, 428, 21060, 393, 352, 293, 4444, 439, 264, + 14581, 370, 293, 370, 321, 434, 5127, 1333, 295, 544, 293, 544, 14333, 322, 439, + 295, 613, 819, 9362, 370, 300, 311, 5456, 1338, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.08215355423261535, "compression_ratio": 1.4573170731707317, "no_speech_prob": + 0.0008513920474797487}, {"id": 581, "seek": 244388, "start": 2443.88, "end": 2454.88, + "text": " So make it a little bit easier for people to understand so I can look + at this here''s case for but I can also look at it like this.", "tokens": [50364, + 407, 652, 309, 257, 707, 857, 3571, 337, 561, 281, 1223, 370, 286, 393, 574, 412, + 341, 510, 311, 1389, 337, 457, 286, 393, 611, 574, 412, 309, 411, 341, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.15308488070309817, "compression_ratio": 1.662037037037037, + "no_speech_prob": 0.010321793146431446}, {"id": 582, "seek": 244388, "start": 2454.88, + "end": 2456.88, "text": " That''s a Jason right.", "tokens": [50914, 663, 311, 257, + 11181, 558, 13, 51014], 
"temperature": 0.0, "avg_logprob": -0.15308488070309817, + "compression_ratio": 1.662037037037037, "no_speech_prob": 0.010321793146431446}, + {"id": 583, "seek": 244388, "start": 2456.88, "end": 2471.88, "text": " This should + give me back my Jason right it''s all my Jason data right there''s all my different + scores etc so if Cupid provides a value but doesn''t do everything you need to do + you can read him right from it.", "tokens": [51014, 639, 820, 976, 385, 646, 452, + 11181, 558, 309, 311, 439, 452, 11181, 1412, 558, 456, 311, 439, 452, 819, 13444, + 5183, 370, 498, 383, 6127, 6417, 257, 2158, 457, 1177, 380, 360, 1203, 291, 643, + 281, 360, 291, 393, 1401, 796, 558, 490, 309, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.15308488070309817, "compression_ratio": 1.662037037037037, "no_speech_prob": + 0.010321793146431446}, {"id": 584, "seek": 247188, "start": 2471.88, "end": 2481.88, + "text": " So I''m going to do a lot of export name pork functions as well so yeah + so it''s fantastic and yeah so it''s loaded it''s loaded.", "tokens": [50364, 407, + 286, 478, 516, 281, 360, 257, 688, 295, 10725, 1315, 10208, 6828, 382, 731, 370, + 1338, 370, 309, 311, 5456, 293, 1338, 370, 309, 311, 13210, 309, 311, 13210, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.46937929119980126, "compression_ratio": + 1.36, "no_speech_prob": 0.23623323440551758}, {"id": 585, "seek": 247188, "start": + 2481.88, "end": 2492.88, "text": " Like there we go 29,000 or 87 query document + pairs and 29,291 judgments right.", "tokens": [50864, 1743, 456, 321, 352, 9413, + 11, 1360, 420, 27990, 14581, 4166, 15494, 293, 9413, 11, 11871, 16, 40337, 558, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.46937929119980126, "compression_ratio": + 1.36, "no_speech_prob": 0.23623323440551758}, {"id": 586, "seek": 249288, "start": + 2492.88, "end": 2497.88, "text": " So there is all preserved so.", "tokens": [50364, + 407, 456, 307, 439, 22242, 370, 13, 50614], "temperature": 0.0, 
"avg_logprob": -0.34124400697905444, + "compression_ratio": 1.4529411764705882, "no_speech_prob": 0.33454883098602295}, + {"id": 587, "seek": 249288, "start": 2497.88, "end": 2511.88, "text": " There you + go this is fantastic thanks for the demo Eric I learned because I wasn''t keeping + up as closely I think we also are writing an outdated version of Cupid so I will + ask the team to to upgrade obviously because.", "tokens": [50614, 821, 291, 352, + 341, 307, 5456, 3231, 337, 264, 10723, 9336, 286, 3264, 570, 286, 2067, 380, 5145, + 493, 382, 8185, 286, 519, 321, 611, 366, 3579, 364, 36313, 3037, 295, 383, 6127, + 370, 286, 486, 1029, 264, 1469, 281, 281, 11484, 2745, 570, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.34124400697905444, "compression_ratio": 1.4529411764705882, + "no_speech_prob": 0.33454883098602295}, {"id": 588, "seek": 251188, "start": 2511.88, + "end": 2525.88, "text": " Yeah we should yeah yeah the release cadence is fairly + fast so make sure your deployment model is pretty simplistic and automated so it''s + keep up yeah exactly.", "tokens": [50364, 865, 321, 820, 1338, 1338, 264, 4374, + 46109, 307, 6457, 2370, 370, 652, 988, 428, 19317, 2316, 307, 1238, 44199, 293, + 18473, 370, 309, 311, 1066, 493, 1338, 2293, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.15242340625860754, "compression_ratio": 1.59375, "no_speech_prob": 0.17828050255775452}, + {"id": 589, "seek": 251188, "start": 2525.88, "end": 2536.88, "text": " This is + fantastic i''m sure we can talk more about your other projects and we will save + it for another episode yeah but I was also thinking I like to ask this question + and now I get the chance yeah.", "tokens": [51064, 639, 307, 5456, 741, 478, 988, + 321, 393, 751, 544, 466, 428, 661, 4455, 293, 321, 486, 3155, 309, 337, 1071, 3500, + 1338, 457, 286, 390, 611, 1953, 286, 411, 281, 1029, 341, 1168, 293, 586, 286, 483, + 264, 2931, 1338, 13, 51614], "temperature": 0.0, "avg_logprob": -0.15242340625860754, + 
"compression_ratio": 1.59375, "no_speech_prob": 0.17828050255775452}, {"id": 590, + "seek": 253688, "start": 2536.88, "end": 2558.88, "text": " Why the the question + of why I call it or the the motivational what keeps you up and at night so to say + why you are still in search Eric you''ve spent so many years do you think it''s + still unsolved or what what keeps you going in in this topic yeah so what I love + about search is it it kind of reinvents itself every.", "tokens": [50364, 1545, + 264, 264, 1168, 295, 983, 286, 818, 309, 420, 264, 264, 48186, 437, 5965, 291, 493, + 293, 412, 1818, 370, 281, 584, 983, 291, 366, 920, 294, 3164, 9336, 291, 600, 4418, + 370, 867, 924, 360, 291, 519, 309, 311, 920, 2693, 29110, 420, 437, 437, 5965, 291, + 516, 294, 294, 341, 4829, 1338, 370, 437, 286, 959, 466, 3164, 307, 309, 309, 733, + 295, 6561, 85, 791, 2564, 633, 13, 51464], "temperature": 0.0, "avg_logprob": -0.15960806294491417, + "compression_ratio": 1.6153846153846154, "no_speech_prob": 0.5152212381362915}, + {"id": 591, "seek": 255888, "start": 2558.88, "end": 2582.88, "text": " I mean seven + years five to seven years it sort of reinvents itself every seven years right I + sort of started out with saying at one time it was exciting just to have an open + source search engine right in a world of big expensive commercial search engines + and then it was really exciting to get into big data from the search perspective.", + "tokens": [50364, 286, 914, 3407, 924, 1732, 281, 3407, 924, 309, 1333, 295, 6561, + 85, 791, 2564, 633, 3407, 924, 558, 286, 1333, 295, 1409, 484, 365, 1566, 412, 472, + 565, 309, 390, 4670, 445, 281, 362, 364, 1269, 4009, 3164, 2848, 558, 294, 257, + 1002, 295, 955, 5124, 6841, 3164, 12982, 293, 550, 309, 390, 534, 4670, 281, 483, + 666, 955, 1412, 490, 264, 3164, 4585, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.12751359939575196, "compression_ratio": 1.7591623036649215, "no_speech_prob": + 0.34598419070243835}, {"id": 592, "seek": 
258288, "start": 2582.88, "end": 2603.88, + "text": " becoming a data scientist right I mean I I I pretend to be a data scientist + I pretend to be a machine learning guy right through search right so it reinvented + itself and now i''m a prompt engineer and generative AI person through search and + so I love that the field reinvents itself.", "tokens": [50364, 5617, 257, 1412, + 12662, 558, 286, 914, 286, 286, 286, 11865, 281, 312, 257, 1412, 12662, 286, 11865, + 281, 312, 257, 3479, 2539, 2146, 558, 807, 3164, 558, 370, 309, 33477, 292, 2564, + 293, 586, 741, 478, 257, 12391, 11403, 293, 1337, 1166, 7318, 954, 807, 3164, 293, + 370, 286, 959, 300, 264, 2519, 6561, 85, 791, 2564, 13, 51414], "temperature": 0.0, + "avg_logprob": -0.19884583306691003, "compression_ratio": 1.7407407407407407, "no_speech_prob": + 0.03838794678449631}, {"id": 593, "seek": 260388, "start": 2603.88, "end": 2630.88, + "text": " But also certain long standing principles around measurement experimentation + appear to remain relevant even though it reinvents itself every seven years right + and it''s been really, really exciting like I like that what i''m doing now is not + what I was doing seven years ago and I suspect I won''t be doing it in another seven + years.", "tokens": [50364, 583, 611, 1629, 938, 4877, 9156, 926, 13160, 37142, 4204, + 281, 6222, 7340, 754, 1673, 309, 6561, 85, 791, 2564, 633, 3407, 924, 558, 293, + 309, 311, 668, 534, 11, 534, 4670, 411, 286, 411, 300, 437, 741, 478, 884, 586, + 307, 406, 437, 286, 390, 884, 3407, 924, 2057, 293, 286, 9091, 286, 1582, 380, 312, + 884, 309, 294, 1071, 3407, 924, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1421948320725385, + "compression_ratio": 1.6751269035532994, "no_speech_prob": 0.002866325667127967}, + {"id": 594, "seek": 263088, "start": 2630.88, "end": 2650.88, "text": " And that + I like making things happen I like solving problems and search remains sort of the + way people interact with technology systems right I am 
really intrigued or looking + forward to when.", "tokens": [50364, 400, 300, 286, 411, 1455, 721, 1051, 286, 411, + 12606, 2740, 293, 3164, 7023, 1333, 295, 264, 636, 561, 4648, 365, 2899, 3652, 558, + 286, 669, 534, 35140, 420, 1237, 2128, 281, 562, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.1697426720669395, "compression_ratio": 1.3741007194244603, "no_speech_prob": + 0.0033195093274116516}, {"id": 595, "seek": 265088, "start": 2650.88, "end": 2663.88, + "text": " Search isn''t just I ask a question get a response but I ask a question + get a response then I have another conversation and the search engine understands + that we actually have we we talk about search as a conversation but.", "tokens": + [50364, 17180, 1943, 380, 445, 286, 1029, 257, 1168, 483, 257, 4134, 457, 286, 1029, + 257, 1168, 483, 257, 4134, 550, 286, 362, 1071, 3761, 293, 264, 3164, 2848, 15146, + 300, 321, 767, 362, 321, 321, 751, 466, 3164, 382, 257, 3761, 457, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.14438152313232422, "compression_ratio": 1.805668016194332, + "no_speech_prob": 0.27710720896720886}, {"id": 596, "seek": 265088, "start": 2663.88, + "end": 2678.88, "text": " We don''t normally do that we just pretend it''s a one + shot kind of thing I look forward to that side of things and then what are the new + use cases we''re going to enable right i''m going with my family to Spain for three + weeks.", "tokens": [51014, 492, 500, 380, 5646, 360, 300, 321, 445, 11865, 309, + 311, 257, 472, 3347, 733, 295, 551, 286, 574, 2128, 281, 300, 1252, 295, 721, 293, + 550, 437, 366, 264, 777, 764, 3331, 321, 434, 516, 281, 9528, 558, 741, 478, 516, + 365, 452, 1605, 281, 12838, 337, 1045, 3259, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.14438152313232422, "compression_ratio": 1.805668016194332, "no_speech_prob": + 0.27710720896720886}, {"id": 597, "seek": 267888, "start": 2678.88, "end": 2707.88, + "text": " In July got my plane tickets booked I got not great flights 
but cheap + flights imagine that there was a search engine out there that knew what my plane + flights were knew what my wife''s personal tolerances are and if it was constantly + shopping for a cheaper flight and actually cancel current flight and gave bought + the new one and you know just let me know by the way I saved you.", "tokens": [50364, + 682, 7370, 658, 452, 5720, 12628, 26735, 286, 658, 406, 869, 21089, 457, 7084, 21089, + 3811, 300, 456, 390, 257, 3164, 2848, 484, 456, 300, 2586, 437, 452, 5720, 21089, + 645, 2586, 437, 452, 3836, 311, 2973, 11125, 2676, 366, 293, 498, 309, 390, 6460, + 8688, 337, 257, 12284, 7018, 293, 767, 10373, 2190, 7018, 293, 2729, 4243, 264, + 777, 472, 293, 291, 458, 445, 718, 385, 458, 538, 264, 636, 286, 6624, 291, 13, + 51814], "temperature": 0.0, "avg_logprob": -0.1499687870846519, "compression_ratio": + 1.7546296296296295, "no_speech_prob": 0.3149184584617615}, {"id": 598, "seek": 270788, + "start": 2707.88, "end": 2729.88, "text": " Another 400 bucks for your family of + four or I found a better flight or there was an upgrade right like wouldn''t be + amazing if once you kind of gave it the parameters is doing that and I suspect that''s + going to kind of look like a search experience right it''s going to be a query with + a bunch of parameters.", "tokens": [50364, 3996, 8423, 11829, 337, 428, 1605, 295, + 1451, 420, 286, 1352, 257, 1101, 7018, 420, 456, 390, 364, 11484, 558, 411, 2759, + 380, 312, 2243, 498, 1564, 291, 733, 295, 2729, 309, 264, 9834, 307, 884, 300, 293, + 286, 9091, 300, 311, 516, 281, 733, 295, 574, 411, 257, 3164, 1752, 558, 309, 311, + 516, 281, 312, 257, 14581, 365, 257, 3840, 295, 9834, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.07651544653851053, "compression_ratio": 1.5743589743589743, + "no_speech_prob": 0.001953707542270422}, {"id": 599, "seek": 272988, "start": 2729.88, + "end": 2752.88, "text": " It understands what my preferences and tolerances and + risks are right and that''s going 
to be a really interesting thing to measure and + I think it''ll be really powerful so so excited about that future I suspect that''s + the next thing that we get to once we get through kind of the current generative + AI stuff.", "tokens": [50364, 467, 15146, 437, 452, 21910, 293, 11125, 2676, 293, + 10888, 366, 558, 293, 300, 311, 516, 281, 312, 257, 534, 1880, 551, 281, 3481, 293, + 286, 519, 309, 603, 312, 534, 4005, 370, 370, 2919, 466, 300, 2027, 286, 9091, 300, + 311, 264, 958, 551, 300, 321, 483, 281, 1564, 321, 483, 807, 733, 295, 264, 2190, + 1337, 1166, 7318, 1507, 13, 51514], "temperature": 0.0, "avg_logprob": -0.11063784541505756, + "compression_ratio": 1.641711229946524, "no_speech_prob": 0.22453458607196808}, + {"id": 600, "seek": 275288, "start": 2752.88, "end": 2769.88, "text": " That''s + a beautiful answer thanks so much really I''ve learned a lot today I''m sure we''ll + repeat this let''s do it I know you have another topic to talk about from your conference + talk and another project you''re working on and I''m sure keep it keep it continues + to be.", "tokens": [50364, 663, 311, 257, 2238, 1867, 3231, 370, 709, 534, 286, + 600, 3264, 257, 688, 965, 286, 478, 988, 321, 603, 7149, 341, 718, 311, 360, 309, + 286, 458, 291, 362, 1071, 4829, 281, 751, 466, 490, 428, 7586, 751, 293, 1071, 1716, + 291, 434, 1364, 322, 293, 286, 478, 988, 1066, 309, 1066, 309, 6515, 281, 312, 13, + 51214], "temperature": 0.0, "avg_logprob": -0.15784005195863784, "compression_ratio": + 1.5497076023391814, "no_speech_prob": 0.14440293610095978}, {"id": 601, "seek": + 276988, "start": 2769.88, "end": 2796.88, "text": " Really relevant to what we do + it''s it''s a it''s a toolbox right it''s it''s it''s all in your toolbox or maybe + it''s a toolbox of full of tools but I think it''s it''s fantastic one to have to + really complete your search journey because if you are only writing code and you''re + never looking at queries you''re never labeling you never here would 
you know how + how does it feel like you know using this change and you will not get far so please + use it.", "tokens": [50364, 4083, 7340, 281, 437, 321, 360, 309, 311, 309, 311, + 257, 309, 311, 257, 44593, 558, 309, 311, 309, 311, 309, 311, 439, 294, 428, 44593, + 420, 1310, 309, 311, 257, 44593, 295, 1577, 295, 3873, 457, 286, 519, 309, 311, + 309, 311, 5456, 472, 281, 362, 281, 534, 3566, 428, 3164, 4671, 570, 498, 291, 366, + 787, 3579, 3089, 293, 291, 434, 1128, 1237, 412, 24109, 291, 434, 1128, 40244, 291, + 1128, 510, 576, 291, 458, 577, 577, 775, 309, 841, 411, 291, 458, 1228, 341, 1319, + 293, 291, 486, 406, 483, 1400, 370, 1767, 764, 309, 13, 51714], "temperature": 0.0, + "avg_logprob": -0.1700312605181944, "compression_ratio": 1.8571428571428572, "no_speech_prob": + 0.5359926819801331}, {"id": 602, "seek": 279688, "start": 2796.88, "end": 2813.88, + "text": " I mean of course you can set up something you know in the kitchen with + Excel but the Microsoft Excel whatever Google spreadsheets but maybe that''s not + scalable enough and not repeatable and yeah why waste time if they''re already cool + tools like keep it open source really excited about this.", "tokens": [50364, 286, + 914, 295, 1164, 291, 393, 992, 493, 746, 291, 458, 294, 264, 6525, 365, 19060, 457, + 264, 8116, 19060, 2035, 3329, 23651, 1385, 457, 1310, 300, 311, 406, 38481, 1547, + 293, 406, 7149, 712, 293, 1338, 983, 5964, 565, 498, 436, 434, 1217, 1627, 3873, + 411, 1066, 309, 1269, 4009, 534, 2919, 466, 341, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.18147932688395182, "compression_ratio": 1.5077720207253886, "no_speech_prob": + 0.5349492430686951}, {"id": 603, "seek": 281388, "start": 2813.88, "end": 2828.88, + "text": " That''s great that''s great yeah the scaling up is super great super exciting + so yeah it will be interesting to make sure that keep it remains true to what it + does and doesn''t try to become all things to all people we''ll see what happens.", + 
"tokens": [50364, 663, 311, 869, 300, 311, 869, 1338, 264, 21589, 493, 307, 1687, + 869, 1687, 4670, 370, 1338, 309, 486, 312, 1880, 281, 652, 988, 300, 1066, 309, + 7023, 2074, 281, 437, 309, 775, 293, 1177, 380, 853, 281, 1813, 439, 721, 281, 439, + 561, 321, 603, 536, 437, 2314, 13, 51114], "temperature": 0.0, "avg_logprob": -0.1233453523545038, + "compression_ratio": 1.6406926406926408, "no_speech_prob": 0.43677860498428345}, + {"id": 604, "seek": 281388, "start": 2828.88, "end": 2838.88, "text": " Absolutely + yes and you you as a listener have a chance to contribute it''s open source exactly + exactly thanks so much Eric I really enjoyed it.", "tokens": [51114, 7021, 2086, + 293, 291, 291, 382, 257, 31569, 362, 257, 2931, 281, 10586, 309, 311, 1269, 4009, + 2293, 2293, 3231, 370, 709, 9336, 286, 534, 4626, 309, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.1233453523545038, "compression_ratio": 1.6406926406926408, + "no_speech_prob": 0.43677860498428345}, {"id": 605, "seek": 283888, "start": 2838.88, + "end": 2842.88, "text": " Thank you bye bye cheers.", "tokens": [50364, 1044, 291, + 6543, 6543, 15301, 13, 50564], "temperature": 0.0, "avg_logprob": -0.5859371821085612, + "compression_ratio": 0.8620689655172413, "no_speech_prob": 0.32112786173820496}]' +--- + +Hello there, vector podcast season 3. In this season I made one simple promise. I will try to stick to 30 minute episodes. Let's see how well I will do it. It's not always easy, especially when you have guests like Eric Pugh that I'm really having a pleasure to talk to today. +I can say that we've been working together on Cupid, on ideation, on things. And I've learned a ton from you. Yeah, yeah, I'm super excited. So when did I come visit you? I think it was two years ago or some two years ago. I think so. It was pandemic, I guess. +Yeah, it was the very end of the pandemic. Right. I remember getting my, yeah, so it was still pandemic. It was still pandemic. Yeah, it's still pandemic, right? 
Because I had to get a COVID test, yeah, so I was, yeah, I was like, I want to meet you in person.
And I called you and said, I'm going to come to Helsinki and visit you. And I think you were like, why? I mean, we don't work together. Or, well, we worked on Quepid, though. Quite a few evenings together, right? I think it was like nine o'clock your time, Helsinki time. Yes. Yes.
Was it? And it was Friday. I remember vividly, Friday. What else to do? So I went camping with my family. Can I screen share? Did you meet? Yes, yes. Of course. You can give me permissions. I went camping with my family.
And if you think back to my visit to you, you and your wife gave me a little gift. Yeah, let me give you screen share. I can only do it if you host. Let's do a host and you can screen share. Yeah, I can. All right. And so I just wanted to show off this, this cup. Oh, there it is.
So we've had that little wooden cup. I think it's a traditional Finnish drinking vessel for when you're out in nature. And there it is, with coffee.
And then I'm also showing off my enameled metal cup that I picked up at OpenSearchCon EU a couple weeks ago in Berlin, that Zeta Alpha shared. Had some great conversations about search, relevancy and measurement with them.
So we took these two cups on our family camping trip the other week. I wanted to show those off to you. This is lovely. This is lovely. And I'm glad you're putting this to good use. Yep. It goes with us. So fantastic. Yes. Yes, where do we start? First of all, hello. Welcome. Welcome. Yeah.
Thank you very much for having me. It's long overdue. And usually we start with a little bit of a background. Obviously, people can go look. I think you even have a Wikipedia page about you. I think so. I don't know. That is a lot, right? It takes a lot for a person to get a Wikipedia page.
I don't know that I'm quite there yet. Yes. So my name's Eric Pugh. Been doing search for about, I don't know, we're getting to like 15 years.
And I was there for when search was first like, oh, you have your own search engine. It was very exotic. And there was nothing open source.
It was all commercial. And then I cut my teeth in search going through the big data time period. Right. When, as Grant Ingersoll said once, search is the UI to big data. And it was all about data.
Can we handle it, and how do we store it and scale up our search engines? And that was great and kind of led into the machine learning time period. Where really at that point, it was like, OK, we have lots of data. We can now search it.
What does it mean? What are people looking for? It wasn't enough to have fast search with 10 blue links.
It all of a sudden became really important to be like, am I giving my users what they want or not? And machine learning and data science really kind of came along and helped us make those determinations.
So really, and that's when OpenSource Connections, the company I was one of the co-founders of, and I'm one of the leaders of, really kind of focused on the value side of search, relevancy.
Am I giving people what they're looking for? How do I drive more revenue in e-commerce? How do I help people use my SaaS products? Are they subscribed, and do they renew their subscriptions? All of this, right? And yeah, machine learning was awesome. Data science was awesome.
Really got into a whole measurement thing. And that was kind of when one of the products that I started, Quepid, which is how we know each other, came out of that time period, because we said, why are we building custom tooling for every project, maybe we could share some things.
So, and then yeah, today it's really been exciting to see sort of generative AI come along, and vectors.
And it's interesting because I still feel, you know, for a little while I was like, is search still gonna be a domain? And you know, search is totally changed, but it's still how people interact with systems, right?
Whether it's a bot and retrieval augmented generation or a more traditional keyword search, using LLMs, using models, using vectors, there's still a search engine in the middle of it, mediating, moderating that conversation.
So really excited about what Gen AI has let us do. And I think my big takeaway right now is that historically search was fairly mediocre. You could make it a little better, you could make it a little worse, but it was always like, people understood it, it was fairly explainable.
Why I'm really excited about measurement and understanding these days is because now with Gen AI, we have much better tools. We don't have to have mediocre search, kind of better, kind of worse. Instead we can have amazing, accurate search results that really understand what you're looking for.
And you're like, yes, this is exactly what I wanted. But the whoops side of it is, sometimes those search results are batshit crazy and, you know, no idea why it came back with that, and it made me lose trust.
And so now instead of all search results sort of being in the middle, sort of, yeah, a little better, little worse, we're now really polarized. Sometimes they're amazing, sometimes they're terrible.
And we need to understand what that curve looks like and make sure that the amount of terrible is something that we're willing to deal with, right?
Terrible results, one in 10,000, one in 5,000, one in a million, depending on your domain, it may need to be one in a billion that is terrible, right, depending on what you're doing.
So exciting times, really exciting. Yeah, it's amazing, it's an amazing story. And of course, I'm very pleased to have also been able to pick up Quepid with you early on, where I tried to pioneer it two companies ago, and I was leaving actually. But it was almost ready.
And then at the next company I actually deployed it. And we, I remember we generated 70 JIRA tickets just by looking at queries in Quepid, because you know how it usually goes.
People develop software, other people check on it, other people are just project managing and things like this, and no one really takes the lead on looking at the queries. And this is actually the most fun sometimes, to look at queries and sort of, you know, investigate what's going on.
Do you even like these results? How do you feel about them? You know, let alone setting up a team around it where some annotators can actually go and label with some domain expertise, you know, or maybe pretending to be users and things like this.
So it's an amazing system and we continue to use it today. Of course this was the first thing I pioneered at TomTom and it's still there. It's fantastic, that is wonderful. I mean, it's been great to see sort of the adoption of the product, and that people have been using it for a long time.
So I'm going to show a query set today that is a thousand queries, and maybe a thousand queries that have been judged 10 deep, right, by hand, for three years. Almost four years.
This one organization, the Nerry Information Network, has been using Quepid for years, and now they've built up this massive body of ratings and they have tons of data and trend lines for, what did search look like four years ago? What did it look like last year? What does it look like today?
It's really been exciting to see them.
They've just been using the little hosted Quepid, app.quepid.com, and it's worked for them. So a thousand queries definitely takes a long time to work your way through. But these days they're just kind of keeping an eye on what's changing, right? Barring a major algorithm change.
It's just sort of staying on top of it and keeping everything right. But yeah, so it's really exciting to see people using it. Yeah.
Definitely I'm having a little bit of thoughts about, where does Quepid live in our generative AI future? Been playing a lot with tools like Ragas and some of the other ones, right? And it's interesting to see what tooling, and where does Quepid do things well?
Where does it have challenges?
Where do we want to go with it? So yeah, for sure.
And for those who don't know Quepid, I mean, I can give my short intro, but obviously feel free to augment. But like, the way I see it is that it's basically, instead of hearsay and sort of someone saying, your search doesn't work, and here is one anecdotal example.
What you can do is that, or, vice versa, you could say, I improved search.
And here is one anecdotal example where it really shines, right? Now, should we ship it? So basically I think Quepid really gives you the tooling, and you can actually, if you want, you can even do it in as unbiased a way as possible, where you will do blind labeling in some sense, right?
So I've done it actually just recently.
And basically you allow your users, well, your domain experts actually, but maybe even developers, to go label queries.
And it also has this sandbox where you can actually, well, you can plug in your own engine, but you can also plug in those standard engines like Elasticsearch, Solr, OpenSearch and others.
And I think you even added some vector search engines recently, right? Yeah, so we have Vectara, which is a pure vector search engine. We've got out the only app.
And then OpenSearch, Elasticsearch, Solr, the Lucene-based search engines, and then, kind of exciting, you can also now plug in your own search API. And so you can just talk to any API, a RESTful GET/POST JSON sort of API, and you can use Quepid as well. So that's been really good. Fantastic.
I love this. Why Quepid? This is sort of the origin story. Doug Turnbull, who many of you may know, right, from his book Relevant Search, he created Quepid. And we're looking at like a decade ago at this point.
And it was because, you know, it was difficult to measure and improve search, right? Lots of spreadsheets going back and forth, lots of conversations. You fix one thing, break another. And Doug and Arena were working together on a project.
And it was literally, this is the origin story for Quepid.
So Quepid's all about making collaboration better, making your testing more accurate, and making things go faster, right? Because we need to iterate and experiment quickly, right?
The one thing I know is that the team that can experiment quickly and effectively is the team that's going to win out, right? It's not about specific technology choices or technical expertise.
It's experimentation. Can you do it quickly? So yeah, so Quepid.com has the advertising-free hosted version, really excited. It sort of continues to be useful in today's world. Absolutely. And it's also open source, right? So you don't have to be buying anything, whatever.
It used to be a product though. It used to be generating revenue. Yeah, I mean, you told me, when we were consulting. So yeah, we used to sell it. We used to sell it for $10,000 a year for an enterprise license. And we had customers and it was great.
But I think then we figured out we were making, I don't know, $80,000 a year, which sounds like a lot, but then investing $150,000 in salary, supporting it. And it was like, yeah, we're not a product company. And we are OpenSource Connections, having a commercial product just didn't fit naturally.
And since we're all about training our clients and empowering search teams, right? It doesn't necessarily feel empowering to be like, yes, we've empowered you, but you have to pay us money every month for this one product, right? It just felt more natural to have it as an open source project.
Yeah, absolutely. And it also fits your, well, how should I say, your professional line. Aren't you a committer? You've been a Solr committer, right? Yeah, so I am a committer, not active on Lucene, that's just a level of technical expertise. But I am a committer on Solr.
And then, interesting, as like an interesting personal professional development, I've really gotten much more involved with the OpenSearch community over the whole year.
And so I'm now a, they call it a maintainer instead of committer, but I am a maintainer for OpenSearch documentation, which has really been a lot of fun to work on. And we're talking about it maybe in another podcast, but contributing some new features to OpenSearch, the open source product.
So really excited about that. Actually, give me one second. I have one thing to confess, one second. I have to confess, or share, one personal bit, that when I started in search, it was, of course, it was early. It was like 2003 or about, when I wrote my own search engine.
But when I started doing search in the industry, right? It was 2010. And it was Apache Solr. And when you Googled Apache Solr, you would mostly find Java, Javadoc. Yeah.
And maybe, and then I figured out there is also a mailing list. I was like, but is there a place where I can read about Solr besides wiki pages, because the wiki pages were not kind of complete in a way? Yep. Yep. I was like, and I found this, this book. Oh my gosh. 1.4. Yeah.
Solr 1.4 Enterprise Search Server. Yes. Yes. Yes. And I've read it cover to cover. I have to say it, because I had one challenging task. I had to build an autosuggest, and that autosuggest had to abide by certain rules.
And I was like, oh my god, how will I do it? In the moment I did it, it was also slow. So I had to figure it out on our data, on our version of the model of data. Right. Oh my god, this was so exciting. I was like going back and forth between the book, and then a bit of googling, and then trying things.
Ah, yes. Fantastic. Wow. Thanks for doing this. So you're also the author. You're also the author. Yeah. Yeah. Yeah. Yeah. Yeah. So we did that book. We did it. We did a second version of it for updated Solr. But that was quite a few years ago.
I am kind of curious what's going to happen with technical books. Right. I mean, in the Solr community, we've got the ref guide, which is, I think, pretty darn good considering how it's written.
I do sort of wonder what the future of technical books will be with open source communities.
And what do we do? So maybe like cookbooks, you know, like where you have specific cases and, like, how would you go about building these things, and maybe real data so people can actually try things, right? Yeah. Yeah. Yeah. I mean, it has gotten a lot easier to publish on the web, right? Yeah.
Have something. But yeah, you know, I think a lot of people write a book sort of as a rite of passage as well. Right. So it's a little different thing writing a book versus an open source project reference. Right. For sure. How to make them printable.
So you can say, I wrote the book for this open source project. But we'll see. Yeah. That's exciting. But you also wanted to show something. Let's demo. I'd love to. Yeah.
So we touched briefly on Quepid, right? We're one of the stewards of the project, and historically, for those of you here, I'll just go ahead. For those of you who've used Quepid in the past, the way it has worked is, I'll just, I'll just bring up my localhost. Here we go. Right.
So one of the things that we've added recently, this is the development version I'm going to use, with realistic activity, and Quepid is what I've pulled up, and I've got a couple of cases here. But you know, in Quepid, it works well. I'm going to bring up a case. Right. Here's a case.
I'm going to search for milk. I did a query for milk. This is using sort of a random data set here. It's backed by a Solr search engine. And there it is right there. There's my search engine.
And Quepid works great for a relatively small number of queries, up to hundreds, right? And one of the things that we found is that this interface works well, especially if the search engine is super fast and responsive.
It's a rich single-page JavaScript application that's making queries in real time to a search engine.
If you have a thousand queries, like the people I mentioned before, it takes like 15, 20 minutes while you wait to load up and run all the queries.
And we know that lots of people want to run more queries, 5,000, right? People ask, how many queries should I be measuring? I'm like, well, start out with what you can. If that's 25 or 50, that's better than zero.
Think about 200, maybe 300, maybe a thousand, 5,000, right? And then above 5,000, that's sort of only for the most sophisticated teams. But Quepid kind of tops out at maybe a thousand queries.
And so we've been doing a lot of work to think about how do we support larger data sets, right? Larger query sets.
And what's been really fun is to work on introducing background processing, right? Instead of everything being limited by the request-response cycle of your web browser, what if we can run some background jobs? And so I'm just going to show really quick. I'm going to go and bring up all the books.
And I've got an import feature. So we have exported a book, book export 39. It's a 62 megabyte JSON file. So 62 megabytes. And I'm going to go ahead and click upload. And now in Quepid, what we're starting to do is, we can take large files, based on files predominantly.
And we store them in the background and we kick off a process, a background job. And there you can see, right there we are loading a whole bunch of queries, right? And these are all sort of scientific queries, some very complex ones and simpler ones.
And you can see it's going to take a while, because this had, what, 28,000 query-doc pairs, right? That are being loaded along with their judgments. So, but what's sort of fun with the new background jobs is using WebSockets.
We're also able to push updates to you as background jobs are happening inside Quepid. So right here, there we are, and we are loading a whole bunch of data. Now, yes, it would be nice if it was a Parquet file, not a MySQL database that we were using.
+So we'll have to think about some of those things. But this is starting to open up the door to moving larger data sets, and being really comfortable with that sort of 5,000 queries, 50,000 query-doc pairs kind of data. +We're not going to manage the 100,000 queries or quarter-million documents data sets — JSON is not the right format — but we're at least scaling it up to get a broader set. +The other thing that I'm also excited about is we're getting closer to being able to run these analytics on a regular basis, right? Now that we have some background processing, we could think about, every night we rerun all 1,000 queries. And every night we could be storing the results. +So these little charts here that you see, that are sort of showing some basic scoring information — you could start using this to monitor it over time, instead of having to roll your own dashboarding tools. So that's something I'm really excited about. I'm also going to point out two PRs on GitHub — github.com/o19s/quepid is the open source project. +And a couple of pull requests that are in progress, but looking to land soon. Right here is pull request 976: imagine if we could run thousands of queries nightly in Quepid. +Now that we've got background jobs working and communicating state to the user, this will be coming pretty soon. Pretty soon in open source time, which means — I don't know, we'll see, next few months. And so I'm glad people are helping and testing. So this one's super exciting. +Let's go back and see how we're doing. And there we go. We're up to 4,968 query-doc pairs as the count keeps climbing. So yeah, this is all through the magic of WebSockets, which has been really cool to see. +And as you are loading this here, are you also executing it against the search engine? Or is this all static data? Yes, static data. A book represents the query-doc pairs with all of the data. Whereas the case is where we do the real-time querying.
+And now that we have this one working — once we have this PR, then you'll be able to run a background job in Quepid, with a similar counter, maybe up here next to one of your cases, that says: we're running queries, 5,000 queries, this is our progress, this is the number that errored out. +But of course, for listeners to understand, what takes time is basically also inserting this data into Quepid's database, which is MySQL. Right. And Redis — so have you stopped using it already? So we're actually — we're using MySQL as our database. +However, what manages this communication, the WebSockets, is all in Redis. The way our background jobs and our front-end jobs and our web browsers keep track of each other is through Redis. Yeah, so — I'm running on localhost, so you won't see it. +But everybody who is connected, who has permissions for this book — everybody would be seeing these messages. Yeah, yeah. So it's kind of broadcasting to everyone. Yeah, cool. Exactly. So that's something I'm really, really excited about. +The other thing that I'm really excited about is LLM-based judgments, right? So I started out this conversation about using Quepid with human judges, annotators, right? And gathering high-quality data. But as we all know, human judging is expensive — not every organization can do it. +My colleague Scott Stults last year did some interesting work playing around with ChatGPT, when it first came out, to evaluate: is this document relevant to this query? +And then we've been working with Moody's on their RAG solution, and using what we've been calling Judge Judy — an LLM — to evaluate. What that lets us do is: we're basically using a small set of human judges to validate our LLM judge, Judge Judy.
+And if we have good correlation — right, our inter-rater reliability looks good; Fleiss' kappa, Cohen's kappa, all those metrics look good — then this gives us confidence to go ahead and scale up the judgments, right, using an LLM. +So today, that is a bunch of pandas notebooks and kind of custom code. +However, the other pull request that I'm really excited about, right, is this: meet Judge Judy, she is your AI-powered subject matter expert, right? +And so in the not-too-distant future — let me go ahead and bring up this case, right, here we have one person who's been the judge. +But soon, you'll have a second column next to it, Judge Judy, right, using whatever prompt you've typed in or provided, judging. So that's the other big "how do we scale up Quepid and make it relevant in our gen AI world" — those are sort of the two big things. This is fantastic. +This is really fantastic. Yeah. Wow. I hope these PRs will land really soon, especially the LLM one, right, because this allows people to really quickly hit the ground running and start labeling. +Actually, someone will label, in a way — but exactly, exactly, the trick is having the right prompts, right, and having the right set of positive examples and negative examples, right. +But one of the things that we're working on — so Quepid, right, ships with a set of data science notebooks; they need a little bit more work. Let's see if this comes up in my dev version — I don't ship those there, so I'm going to switch to the production Quepid. Yeah, no worries. And notebooks. +So in this examples folder, we're actually shipping a couple of notebooks for you to use: Fleiss' kappa, Jaccard and RBO comparison, multi-rater analysis, right. And these notebooks here you can directly use with your Quepid book of judgments to evaluate how we're doing overall. +And so this can let you take your human judgments and understand how good or bad they are.
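The agreement check described here uses standard inter-rater statistics. Below is a small, self-contained sketch of Cohen's kappa for two raters — the notebooks Eric mentions use Fleiss' kappa and related measures, and this is my illustration rather than Quepid's code. A kappa near 1 means the LLM judge tracks the human judge well beyond chance agreement.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Probability that both raters pick the same label by chance.
    expected = sum((counts_a[label] / n) * (counts_b[label] / n)
                   for label in set(counts_a) | set(counts_b))
    return (observed - expected) / (1 - expected)

# Toy binary relevance judgments on eight shared query-doc pairs.
human = [1, 0, 1, 1, 0, 1, 0, 0]
llm_judge = [1, 0, 1, 0, 0, 1, 0, 1]
print(cohens_kappa(human, llm_judge))  # 0.5 — moderate agreement
```

In practice you would run this on a held-out sample of human-judged pairs, and only scale the LLM judge out to the full query set if the agreement is acceptable.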
And then, when you bring the LLM-powered judge in, you compare the LLM judge to what your human judges were doing and feel some confidence. +So I'm really excited to be shipping these, because I think it's going to lower the barrier to getting judgments. And that's something a lot of search teams run into: I would love to use Quepid, I would love to do this, +but I can't do any of this until I have judgments, and I don't know where to get them, or I don't have the domain experts that I need, right. +And you know, search-oriented organizations often have that figured out, but a lot of other teams are like, we just have a search engine that works, you know, reasonably well, and we don't have that. So we've got to lower the barrier to getting judgments, and I'm excited about this. +This is fantastic, and I can also add from my personal experience, you know — yes, you're absolutely right that sometimes there is even friction there, right. +The search engineer says, no, I don't want to label, I'm a search engineer, I'm developing the algorithm. But they would get so many more insights if they actually labeled. +And in our teams, you know, if you have, I don't know, 10 people, and each labels 10 queries, you will have 100 queries labeled. +That's if you don't go for overlapping labels and stuff like that — if you do, then yeah, it's another story. But all of a sudden, all of us get all these insights, right. +Now the LLM thing can actually help you scale this, right. And then of course all this prompting — and in Label Studio, by the way, they have released — maybe something to think about — a capability where an agent will learn from user feedback, right. +So let's say they label: the LLM will label, make some mistakes, and then the domain expert will correct them, and it takes that in as feedback and becomes better over time.
+So basically it can support you — it's like, not your copilot; as someone said in a previous episode, a confidant. So you kind of collaborate with these things, in a way, right. This is a fantastic direction. Yeah. +So I mean, this is definitely very much around that narrower relevance judging, versus the generic labeling that Label Studio does, right. But there's definitely room for inspiration from both Label Studio, which I've been looking at more, as well as Ragas and how it's doing some of the new metrics. +Yeah, it's interesting. So yeah, exactly. What I love about Quepid is that I can really connect it to the live search engine. I mean, not necessarily production — it can be some development version of it. +And I can start labeling and querying, and, as you said, search is the interface to big data, right. So Quepid becomes the interface to your search, which is the interface to your big data and all the unknowns there. +And we can say: that one looks terrible, that one looks OK, that one looks OK, that one looks terrible, right. Yeah, maybe that one's OK, that one doesn't look it, right — we can immediately start building some sort of understanding right now. +A quick little binary one, right — we're going to start building that and get a sense of what our score is going to be. Exactly, exactly. And that score is also customizable. We've done the scoring implementations in a JavaScript-looking language, right — I think it's JavaScript. +Yeah, you just come in here and you pick your score. So here's classic NDCG@10, but you can change it. Like one recently: we wanted to know, in this score right here, we're being penalized because "soy" is returning zero results, and so it's giving us a zero. +So it's bringing down our average precision. But what if you wanted to know that it was supposed to be zero results — zero results is actually right? Yes, yes, yes.
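For reference, the "classic NDCG@10" scorer mentioned here computes discounted cumulative gain over the top 10 results, normalized by the gain of the ideal ordering. Quepid's scorers are written in JavaScript; this is just a Python sketch of the formula, using the common exponential-gain form, and it shows why a zero-result query naturally scores zero.

```python
import math

def ndcg_at_k(ratings, k=10):
    """ratings: graded judgments (0 = irrelevant) for results in ranked order."""
    def dcg(rs):
        return sum((2 ** r - 1) / math.log2(pos + 2)
                   for pos, r in enumerate(rs[:k]))
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1]))  # good but imperfect ordering, < 1.0
print(ndcg_at_k([3, 2, 1]))        # 1.0 — already in ideal order
print(ndcg_at_k([]))               # 0.0 — a zero-result query scores zero
```

That last case is exactly the problem the per-query zero-results option is meant to fix: sometimes an empty result list is the right answer, and a plain ranking metric can't express that.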
So for that one, we actually went in and added an option — a per-query option, "should be ZSR" (zero search results). +Right, and we set that option, and then in our custom score, if should-be-ZSR is true and there were zero results, then we gave it a one, right, because it's working the way it should. And vice versa: +we had other situations where, yeah, if we started returning results for "soy," that would have been worse search, right. So yeah, that was a great use of a custom score. Yeah, fantastic. We're all about that. Yeah, yeah, exactly. +Also, where Quepid helped us: sometimes you don't speak the language. It could be Korean, which you don't speak, but you need to evaluate it. On one occasion we sent a Quepid case to our Korean native speakers in the company, and they labeled it and told us how it looks. That worked. +So the other thing — here we are, happily loading all of this up, but I'll go ahead and click judge, right. So this is sort of an older approach to rating: a zero, a one, up to 10 — we wouldn't do that now, but that's where we've been. +So there you can see — here is the human rater interface. Now, I don't know what is a good rating or not, but here you can rate the documents. And this is a recent add-on, which is: if you can't read it — why, right? +Am I a vet here? I am not a vet, and I don't understand the science in this square, right. And so I'll skip judging that, and that, and that. That's been helpful in just cranking out your human judgments. +Yeah, we've got just a couple of judgments — mostly Jeff. Yeah, Jeff's got 2,500; I've got four in here, and I marked one as unreadable. I can reset that, which should throw it back in the pool, right — maybe we have a conversation about why it was unreadable and then throw it back in the pool.
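The "should be ZSR" trick can be sketched as a tiny custom scorer. In Quepid, custom scorers are written in JavaScript and this per-query option was something Eric's team added; the Python below is only an illustration of the logic, with a simple precision-style score standing in for whatever base metric you actually use.

```python
def score_query(ratings, should_be_zsr=False):
    """Score one query; reward queries that are *supposed* to return nothing.

    ratings: binary/graded judgments for the returned docs, in ranked order.
    should_be_zsr: per-query flag meaning zero search results is correct.
    """
    if should_be_zsr:
        # Returning nothing is exactly right; returning anything is a miss.
        return 1.0 if not ratings else 0.0
    if not ratings:
        return 0.0  # unexpected zero results drags the average down
    return sum(1 for r in ratings if r > 0) / len(ratings)  # simple precision

print(score_query([], should_be_zsr=True))         # 1.0 — "soy" correctly empty
print(score_query([1, 0, 1], should_be_zsr=True))  # 0.0 — should have been empty
print(score_query([1, 0, 1]))                      # ~0.667 for a normal query
```

With this flavor of scorer, a query like "soy" stops dragging the case average down, and starts failing loudly only when the engine returns results it shouldn't.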
+Almost there — we are almost there, right. The background job is wonderful, but it doesn't necessarily mean it's any faster. Right, but now at least I can watch it and watch the count. +Fantastic demo. Can you tell a bit more about the tech side of things? We mentioned MySQL and Redis already. So if someone wants to jump in and start contributing, what is the level of effort they need to go through? +It's a little bit of a challenge, right. So this is a Ruby on Rails web app — it is a full-stack web app, and it's all just standard Ruby on Rails. +It's been upgraded over the years to the latest standard, and so if you do Rails development, everything's going to feel very comfortable. Obviously, a lot of us in the search or information retrieval world don't have that expertise, and that's just a challenge. So one thing I will say is that if you join or ask questions on Relevance Slack, #quepid, I'm happy to answer them. The core application that you play with in here — +it's an old Angular 1 app. It works great, no problems, but it's an Angular 1 app, and because it's an open source project, not a commercial product, we've sort of stayed away from attempting the big rewrite to update it to React or name your thing. It seems to work fine. +So Quepid is an Angular 1 app for all of this, and then outside of this, it's all just a standard Rails application — lots of model-view-controller type screens that you can see right here, all standard Rails, a MySQL database, Redis for sort of the communication layer. +And it's all built using Docker. So the README has way too much developer-centric setup, right, but if you have Docker, then you run bin/setup_docker, and that will set you up with the development environment. Literally, what I was just showing is running inside Docker.
+And then you fire it up locally with bin/docker server, and that runs it locally. So there are a lot of docs in here for all the different parts — it can be a little overwhelming; I think we have to rework some of this documentation, but it's all there. +Now, a couple of things I'll show off. We actually have an API now. So right here, you come here and you generate your personal access token, like that, and this curl command will show you your user. So we have an authenticated API, +and we're slowly working on documenting all of those APIs. +So there we go — we're slowly documenting all the APIs. And one of the things that I encourage people to do, right, is: maybe Quepid doesn't do everything you need, and so you're building some scripts outside of it, or in some notebooks — but you can use Quepid as your shared source of truth. +So maybe you have a case that represents your golden set of queries, right. You, in your notebook, can go and grab all the queries. And so we're adding sort of more and more documentation on all of these different APIs. So that's fantastic, yeah. +To make it a little bit easier for people to understand: I can look at this — here's case four — but I can also look at it like this. That's JSON, right. +This should give me back my JSON — right, it's all my JSON data, right there, all my different scores, etc. So if Quepid provides value but doesn't do everything you need to do, you can read right from it. +There are a lot of export and import functions as well. So yeah, it's fantastic — and it's loaded, it's loaded! There we go: 29,087 query-document pairs and 29,291 judgments, right. So it's all preserved. +There you go, this is fantastic. Thanks for the demo, Eric. I learned a lot, because I wasn't keeping up as closely. I think we're also running an outdated version of Quepid, so I will ask the team to upgrade, obviously.
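Using Quepid as a shared source of truth from a notebook boils down to an authenticated GET plus a bit of JSON wrangling. The sketch below is hypothetical: the endpoint path, the `Bearer` auth scheme, and the `queries`/`query_text` field names are assumptions for illustration only — check Quepid's own API documentation for the real shapes.

```python
import json
import urllib.request

QUEPID_URL = "http://localhost:3000"   # assumed local dev instance
TOKEN = "your-personal-access-token"   # generated in the Quepid UI

def fetch_case(case_id: int) -> dict:
    """Fetch a case as JSON. Path and auth scheme are illustrative only."""
    req = urllib.request.Request(
        f"{QUEPID_URL}/api/cases/{case_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def golden_queries(case_json: dict) -> list:
    """Pull the golden-set query strings out of a case payload."""
    return [q["query_text"] for q in case_json.get("queries", [])]

# Offline example with a stubbed payload (field names are hypothetical).
sample = {"case_name": "Golden set",
          "queries": [{"query_text": "milk"}, {"query_text": "soy"}]}
print(golden_queries(sample))  # ['milk', 'soy']
```

The nice property of this pattern is that notebooks and outside scripts all read the same judged golden set, instead of each team member keeping their own spreadsheet copy.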
+Yeah, we should — yeah. The release cadence is fairly fast, so make sure your deployment model is pretty simple and automated, so it's easy to keep up. Yeah, exactly. +This is fantastic. I'm sure we can talk more about your other projects, and we will save that for another episode, yeah. But I was also thinking — I like to ask this question, and now I get the chance, yeah. +The question of why — the motivational one, what keeps you up at night, so to say. Why are you still in search, Eric? You've spent so many years — do you think it's still unsolved, or what keeps you going in this topic? Yeah, so what I love about search is that it kind of reinvents itself — +I mean, every five to seven years, it sort of reinvents itself, right. I sort of started out — at one time it was exciting just to have an open source search engine, right, in a world of big, expensive commercial search engines, and then it was really exciting to get into big data from the search perspective, +becoming a data scientist, right — I mean, I pretend to be a data scientist, I pretend to be a machine learning guy, right, through search. So it reinvented itself, and now I'm a prompt engineer and generative AI person, through search. And so I love that the field reinvents itself. +But also, certain long-standing principles around measurement and experimentation appear to remain relevant, even though it reinvents itself every seven years, right. And it's been really, really exciting — I like that what I'm doing now is not what I was doing seven years ago, and I suspect I won't be doing it in another seven years. +And I like making things happen, I like solving problems, and search remains sort of the way people interact with technology systems, right. I am really intrigued by, and looking forward to, when
+search isn't just "I ask a question, get a response," but I ask a question, get a response, then I have another conversation, and the search engine understands that. We talk about search as a conversation, but +we don't normally do that — we just pretend it's a one-shot kind of thing. I look forward to that side of things. And then, what are the new use cases we're going to enable, right? I'm going with my family to Spain for three weeks +in July. Got my plane tickets booked — not great flights, but cheap flights. Imagine there was a search engine out there that knew what my plane flights were, knew what my wife's personal tolerances are, and it was constantly shopping for a cheaper flight, and actually cancelled the current flight and bought the new one, and just let me know: by the way, I saved you +another 400 bucks for your family of four. Or, I found a better flight, or there was an upgrade, right? Wouldn't it be amazing if, once you kind of gave it the parameters, it was doing that? And I suspect that's going to kind of look like a search experience, right — it's going to be a query with a bunch of parameters. +It understands what my preferences and tolerances and risks are, right, and that's going to be a really interesting thing to measure, and I think it'll be really powerful. So, excited about that future — I suspect that's the next thing we get to once we get through kind of the current generative AI stuff. +That's a beautiful answer. Thanks so much, really — I've learned a lot today. I'm sure we'll repeat this. Let's do it. I know you have another topic to talk about from your conference talk, and another project you're working on, and I'm sure Quepid continues to be
+really relevant to what we do. It's a toolbox, right — or maybe it's a toolbox full of tools — but I think it's a fantastic one to have to really complete your search journey. Because if you are only writing code, and you're never looking at queries, you're never labeling, you never hear, you know, how it feels to use this search engine, you will not get far. So please use it. +I mean, of course you can set up something, you know, in the kitchen with Excel — Microsoft Excel, whatever, Google Spreadsheets — but maybe that's not scalable enough and not repeatable, and yeah, why waste time if there are already cool tools like Quepid? Open source — really excited about this. +That's great, that's great. Yeah, the scaling up is super exciting. So it will be interesting to make sure that Quepid remains true to what it does and doesn't try to become all things to all people. We'll see what happens. +Absolutely, yes. And you, as a listener, have a chance to contribute — it's open source. Exactly, exactly. Thanks so much, Eric, I really enjoyed it. Thank you, bye-bye, cheers. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md b/transcripts_with_timestamps/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md new file mode 100644 index 0000000..fabbe0b --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/evgeniya-sukhodolskaya-data-advocate-toloka-data-at-the-core-of-all-the-cool-ml.md @@ -0,0 +1,2887 @@ +--- +description: '

Toloka’s support for Academia: grants and educator partnerships

https://toloka.ai/collaboration-with-educators-form

https://toloka.ai/research-grants-form

These are pages leading to them:

https://toloka.ai/academy/education-partnerships

https://toloka.ai/grants

Topics:

00:00 Intro

01:25 Jenny’s path from graduating in ML to a Data Advocate role

07:50 What goes into the labeling process with Toloka

11:27 How to prepare data for labeling and design tasks

16:01 Jenny’s take on why Relevancy needs more data in addition to clicks in Search

18:23 Dmitry plays the Devil’s Advocate for a moment

22:41 Implicit signals vs user behavior and offline A/B testing

26:54 Dmitry goes back to advocating for good search practices

27:42 Flower search as a concrete example of labeling for relevancy

39:12 NDCG, ERR as ranking quality metrics

44:27 Cross-annotator agreement, perfect list for NDCG and Aggregations

47:17 On measuring and ensuring the quality of annotators with honeypots

54:48 Deep-dive into aggregations

59:55 Bias in data, SERP, labeling and A/B tests

1:16:10 Is unbiased data attainable?

1:23:20 Announcements

This episode on YouTube: https://youtu.be/Xsw9vPFqGf4

Podcast design: Saurabh Rai: https://twitter.com/srvbhr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230128_100136_c691208feb13437a07aae0f929db756b.jpg +pub_date: Sat, 28 Jan 2023 10:19:56 GMT +title: Evgeniya Sukhodolskaya - Data Advocate, Toloka - Data at the core of all the + cool ML +url: https://rss.com/podcasts/vector-podcast/799802 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 27.3, "text": " Hello + there, vector podcast season 2.", "tokens": [50364, 2425, 456, 11, 8062, 7367, 3196, + 568, 13, 51729], "temperature": 0.0, "avg_logprob": -0.5907761653264364, "compression_ratio": + 0.8222222222222222, "no_speech_prob": 0.18068993091583252}, {"id": 1, "seek": 2730, + "start": 27.3, "end": 31.3, "text": " I hope that you were waiting for a new episode.", + "tokens": [50364, 286, 1454, 300, 291, 645, 3806, 337, 257, 777, 3500, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.3105118243725269, "compression_ratio": 1.5621621621621622, + "no_speech_prob": 0.3313506245613098}, {"id": 2, "seek": 2730, "start": 31.3, "end": + 38.8, "text": " And today I''m really happy about my guest and the topic, because + in many ways we didn''t", "tokens": [50564, 400, 965, 286, 478, 534, 2055, 466, + 452, 8341, 293, 264, 4829, 11, 570, 294, 867, 2098, 321, 994, 380, 50939], "temperature": + 0.0, "avg_logprob": -0.3105118243725269, "compression_ratio": 1.5621621621621622, + "no_speech_prob": 0.3313506245613098}, {"id": 3, "seek": 2730, "start": 38.8, "end": + 43.900000000000006, "text": " cover it as much as I think and hope we can cover + it today.", "tokens": [50939, 2060, 309, 382, 709, 382, 286, 519, 293, 1454, 321, + 393, 2060, 309, 965, 13, 51194], "temperature": 0.0, "avg_logprob": -0.3105118243725269, + "compression_ratio": 1.5621621621621622, "no_speech_prob": 0.3313506245613098}, + {"id": 4, "seek": 2730, "start": 43.900000000000006, "end": 51.1, "text": " It''s + the topic of data, the role of data, while everyone is talking about sexy deep learning,", + "tokens": [51194, 467, 311, 264, 
4829, 295, 1412, 11, 264, 3090, 295, 1412, 11, + 1339, 1518, 307, 1417, 466, 13701, 2452, 2539, 11, 51554], "temperature": 0.0, "avg_logprob": + -0.3105118243725269, "compression_ratio": 1.5621621621621622, "no_speech_prob": + 0.3313506245613098}, {"id": 5, "seek": 5110, "start": 51.1, "end": 56.6, "text": + " chat, GPT, learning to rank new algorithms and so on.", "tokens": [50364, 5081, + 11, 26039, 51, 11, 2539, 281, 6181, 777, 14642, 293, 370, 322, 13, 50639], "temperature": + 0.0, "avg_logprob": -0.3153196370826577, "compression_ratio": 1.4805194805194806, + "no_speech_prob": 0.05171404778957367}, {"id": 6, "seek": 5110, "start": 56.6, "end": + 65.6, "text": " I still believe that we should not forget about where all these + things begin and this is data.", "tokens": [50639, 286, 920, 1697, 300, 321, 820, + 406, 2870, 466, 689, 439, 613, 721, 1841, 293, 341, 307, 1412, 13, 51089], "temperature": + 0.0, "avg_logprob": -0.3153196370826577, "compression_ratio": 1.4805194805194806, + "no_speech_prob": 0.05171404778957367}, {"id": 7, "seek": 5110, "start": 65.6, "end": + 72.6, "text": " And I''m happy to have and welcome Yvgenia Sukadolska, data advocate + at the locker today with me.", "tokens": [51089, 400, 286, 478, 2055, 281, 362, + 293, 2928, 398, 85, 1766, 654, 37898, 345, 401, 20771, 11, 1412, 14608, 412, 264, + 25707, 965, 365, 385, 13, 51439], "temperature": 0.0, "avg_logprob": -0.3153196370826577, + "compression_ratio": 1.4805194805194806, "no_speech_prob": 0.05171404778957367}, + {"id": 8, "seek": 5110, "start": 72.6, "end": 74.6, "text": " How are you doing?", + "tokens": [51439, 1012, 366, 291, 884, 30, 51539], "temperature": 0.0, "avg_logprob": + -0.3153196370826577, "compression_ratio": 1.4805194805194806, "no_speech_prob": + 0.05171404778957367}, {"id": 9, "seek": 5110, "start": 74.6, "end": 75.6, "text": + " Thank you, Dima.", "tokens": [51539, 1044, 291, 11, 413, 4775, 13, 51589], "temperature": + 0.0, "avg_logprob": -0.3153196370826577, 
"compression_ratio": 1.4805194805194806, + "no_speech_prob": 0.05171404778957367}, {"id": 10, "seek": 5110, "start": 75.6, + "end": 76.6, "text": " Thank you.", "tokens": [51589, 1044, 291, 13, 51639], "temperature": + 0.0, "avg_logprob": -0.3153196370826577, "compression_ratio": 1.4805194805194806, + "no_speech_prob": 0.05171404778957367}, {"id": 11, "seek": 5110, "start": 76.6, + "end": 78.1, "text": " I am super happy to talk it.", "tokens": [51639, 286, 669, + 1687, 2055, 281, 751, 309, 13, 51714], "temperature": 0.0, "avg_logprob": -0.3153196370826577, + "compression_ratio": 1.4805194805194806, "no_speech_prob": 0.05171404778957367}, + {"id": 12, "seek": 5110, "start": 78.1, "end": 79.1, "text": " I''m very precarious.", + "tokens": [51714, 286, 478, 588, 4346, 289, 851, 13, 51764], "temperature": 0.0, + "avg_logprob": -0.3153196370826577, "compression_ratio": 1.4805194805194806, "no_speech_prob": + 0.05171404778957367}, {"id": 13, "seek": 7910, "start": 79.1, "end": 80.1, "text": + " I''m very smooth.", "tokens": [50364, 286, 478, 588, 5508, 13, 50414], "temperature": + 0.0, "avg_logprob": -0.21151986452612545, "compression_ratio": 1.5022421524663676, + "no_speech_prob": 0.0869220569729805}, {"id": 14, "seek": 7910, "start": 80.1, "end": + 85.1, "text": " So I''m feeling like I''m just having a little chitchat before vacation.", + "tokens": [50414, 407, 286, 478, 2633, 411, 286, 478, 445, 1419, 257, 707, 417, + 1549, 267, 949, 12830, 13, 50664], "temperature": 0.0, "avg_logprob": -0.21151986452612545, + "compression_ratio": 1.5022421524663676, "no_speech_prob": 0.0869220569729805}, + {"id": 15, "seek": 7910, "start": 85.1, "end": 87.1, "text": " Yes, exactly how + it should be.", "tokens": [50664, 1079, 11, 2293, 577, 309, 820, 312, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.21151986452612545, "compression_ratio": 1.5022421524663676, + "no_speech_prob": 0.0869220569729805}, {"id": 16, "seek": 7910, "start": 87.1, "end": + 91.1, "text": " And I''m 
really happy to have you here.", "tokens": [50764, 400, + 286, 478, 534, 2055, 281, 362, 291, 510, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.21151986452612545, "compression_ratio": 1.5022421524663676, "no_speech_prob": + 0.0869220569729805}, {"id": 17, "seek": 7910, "start": 91.1, "end": 95.1, "text": + " We met at Haystack Conference and this was great.", "tokens": [50964, 492, 1131, + 412, 8721, 372, 501, 22131, 293, 341, 390, 869, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.21151986452612545, "compression_ratio": 1.5022421524663676, "no_speech_prob": + 0.0869220569729805}, {"id": 18, "seek": 7910, "start": 95.1, "end": 101.1, "text": + " I saw so much excitement in you when you talked about, hey, but what about data?", + "tokens": [51164, 286, 1866, 370, 709, 14755, 294, 291, 562, 291, 2825, 466, 11, + 4177, 11, 457, 437, 466, 1412, 30, 51464], "temperature": 0.0, "avg_logprob": -0.21151986452612545, + "compression_ratio": 1.5022421524663676, "no_speech_prob": 0.0869220569729805}, + {"id": 19, "seek": 7910, "start": 101.1, "end": 103.1, "text": " We should also + talk about it.", "tokens": [51464, 492, 820, 611, 751, 466, 309, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.21151986452612545, "compression_ratio": 1.5022421524663676, + "no_speech_prob": 0.0869220569729805}, {"id": 20, "seek": 7910, "start": 103.1, + "end": 105.1, "text": " Don''t forget it.", "tokens": [51564, 1468, 380, 2870, 309, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.21151986452612545, "compression_ratio": + 1.5022421524663676, "no_speech_prob": 0.0869220569729805}, {"id": 21, "seek": 10510, + "start": 106.1, "end": 111.1, "text": " And I''m really excited to drill into this + today with you.", "tokens": [50414, 400, 286, 478, 534, 2919, 281, 11392, 666, 341, + 965, 365, 291, 13, 50664], "temperature": 0.0, "avg_logprob": -0.16548102834950323, + "compression_ratio": 1.5374449339207048, "no_speech_prob": 0.11866965144872665}, + {"id": 22, "seek": 10510, 
"start": 111.1, "end": 117.1, "text": " I think let''s + start as we usually start with your background and then we''re all from there.", + "tokens": [50664, 286, 519, 718, 311, 722, 382, 321, 2673, 722, 365, 428, 3678, + 293, 550, 321, 434, 439, 490, 456, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.16548102834950323, "compression_ratio": 1.5374449339207048, "no_speech_prob": + 0.11866965144872665}, {"id": 23, "seek": 10510, "start": 117.1, "end": 118.1, "text": + " Okay, perfect.", "tokens": [50964, 1033, 11, 2176, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.16548102834950323, "compression_ratio": 1.5374449339207048, + "no_speech_prob": 0.11866965144872665}, {"id": 24, "seek": 10510, "start": 118.1, + "end": 120.1, "text": " Yeah, Haystack.", "tokens": [51014, 865, 11, 8721, 372, + 501, 13, 51114], "temperature": 0.0, "avg_logprob": -0.16548102834950323, "compression_ratio": + 1.5374449339207048, "no_speech_prob": 0.11866965144872665}, {"id": 25, "seek": 10510, + "start": 120.1, "end": 126.1, "text": " I think I literally like formulated my passion + that I want to talk about searching the means of the data.", "tokens": [51114, 286, + 519, 286, 3736, 411, 48936, 452, 5418, 300, 286, 528, 281, 751, 466, 10808, 264, + 1355, 295, 264, 1412, 13, 51414], "temperature": 0.0, "avg_logprob": -0.16548102834950323, + "compression_ratio": 1.5374449339207048, "no_speech_prob": 0.11866965144872665}, + {"id": 26, "seek": 10510, "start": 126.1, "end": 131.1, "text": " So I''m feeling + like today I''m getting a present for Christmas.", "tokens": [51414, 407, 286, 478, + 2633, 411, 965, 286, 478, 1242, 257, 1974, 337, 5272, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.16548102834950323, "compression_ratio": 1.5374449339207048, + "no_speech_prob": 0.11866965144872665}, {"id": 27, "seek": 13110, "start": 131.1, + "end": 144.1, "text": " So yeah, about me, I am this person, this type of a person + which got his perfect position by chance because I never knew 
it existed.", "tokens": + [50364, 407, 1338, 11, 466, 385, 11, 286, 669, 341, 954, 11, 341, 2010, 295, 257, + 954, 597, 658, 702, 2176, 2535, 538, 2931, 570, 286, 1128, 2586, 309, 13135, 13, + 51014], "temperature": 0.0, "avg_logprob": -0.17232633104511336, "compression_ratio": + 1.718487394957983, "no_speech_prob": 0.14766912162303925}, {"id": 28, "seek": 13110, + "start": 144.1, "end": 160.1, "text": " Because I finished bachelor''s in machine + learning and I was like, okay, if I did so, I need to work like around it, you know, + but at this point, everybody was, everybody is always hyping on something in machine + learning, you know, so at that point, I mean, I finished in 2009.", "tokens": [51014, + 1436, 286, 4335, 25947, 311, 294, 3479, 2539, 293, 286, 390, 411, 11, 1392, 11, + 498, 286, 630, 370, 11, 286, 643, 281, 589, 411, 926, 309, 11, 291, 458, 11, 457, + 412, 341, 935, 11, 2201, 390, 11, 2201, 307, 1009, 2477, 3381, 322, 746, 294, 3479, + 2539, 11, 291, 458, 11, 370, 412, 300, 935, 11, 286, 914, 11, 286, 4335, 294, 11453, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.17232633104511336, "compression_ratio": + 1.718487394957983, "no_speech_prob": 0.14766912162303925}, {"id": 29, "seek": 16010, + "start": 161.1, "end": 166.1, "text": " So it was like a very big era of guns and + like others.", "tokens": [50414, 407, 309, 390, 411, 257, 588, 955, 4249, 295, 10153, + 293, 411, 2357, 13, 50664], "temperature": 0.0, "avg_logprob": -0.16950880050659178, + "compression_ratio": 1.6359832635983265, "no_speech_prob": 0.004976422525942326}, + {"id": 30, "seek": 16010, "start": 166.1, "end": 178.1, "text": " And everybody + wanted to work with a computer vision and a dinner and so I thought, okay, maybe + I need to start like working somewhere in the field and see what I will like.", + "tokens": [50664, 400, 2201, 1415, 281, 589, 365, 257, 3820, 5201, 293, 257, 6148, + 293, 370, 286, 1194, 11, 1392, 11, 1310, 286, 643, 281, 722, 411, 1364, 4079, 294, + 264, 
2519, 293, 536, 437, 286, 486, 411, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.16950880050659178, "compression_ratio": 1.6359832635983265, "no_speech_prob": + 0.004976422525942326}, {"id": 31, "seek": 16010, "start": 178.1, "end": 181.1, "text": + " And by some chance, I started working as a software developer.", "tokens": [51264, + 400, 538, 512, 2931, 11, 286, 1409, 1364, 382, 257, 4722, 10754, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.16950880050659178, "compression_ratio": 1.6359832635983265, + "no_speech_prob": 0.004976422525942326}, {"id": 32, "seek": 16010, "start": 181.1, + "end": 186.1, "text": " I don''t know why just, you know, like out of the blue, + your students, you''re getting your first job.", "tokens": [51414, 286, 500, 380, + 458, 983, 445, 11, 291, 458, 11, 411, 484, 295, 264, 3344, 11, 428, 1731, 11, 291, + 434, 1242, 428, 700, 1691, 13, 51664], "temperature": 0.0, "avg_logprob": -0.16950880050659178, + "compression_ratio": 1.6359832635983265, "no_speech_prob": 0.004976422525942326}, + {"id": 33, "seek": 18610, "start": 186.1, "end": 195.1, "text": " Then I kind of + realized after some time that I''m doing more like business analytics, talking tasks + and consulting through doing the development also and I like both.", "tokens": [50364, + 1396, 286, 733, 295, 5334, 934, 512, 565, 300, 286, 478, 884, 544, 411, 1606, 15370, + 11, 1417, 9608, 293, 23682, 807, 884, 264, 3250, 611, 293, 286, 411, 1293, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.10949942872330949, "compression_ratio": 1.5821596244131455, + "no_speech_prob": 0.001442431821487844}, {"id": 34, "seek": 18610, "start": 195.1, + "end": 204.1, "text": " And I was like, okay, so I don''t want to be just a developer + and I switched to a position which was called technical manager, which was like + something in between analysts.", "tokens": [50814, 400, 286, 390, 411, 11, 1392, + 11, 370, 286, 500, 380, 528, 281, 312, 445, 257, 10754, 293, 286, 16858, 281, 257, + 
2535, 597, 390, 1219, 6191, 6598, 11, 597, 390, 411, 746, 294, 1296, 31388, 13, + 51264], "temperature": 0.0, "avg_logprob": -0.10949942872330949, "compression_ratio": + 1.5821596244131455, "no_speech_prob": 0.001442431821487844}, {"id": 35, "seek": + 20410, "start": 204.1, "end": 218.1, "text": " And also a person who like manages + like some projects with people and also at that time, that was the first time I + tried to do crowdsourcing like related to our projects because we were doing tasks + about moderation.", "tokens": [50364, 400, 611, 257, 954, 567, 411, 22489, 411, + 512, 4455, 365, 561, 293, 611, 412, 300, 565, 11, 300, 390, 264, 700, 565, 286, + 3031, 281, 360, 26070, 41849, 411, 4077, 281, 527, 4455, 570, 321, 645, 884, 9608, + 466, 49471, 13, 51064], "temperature": 0.0, "avg_logprob": -0.15163712888150602, + "compression_ratio": 1.6715686274509804, "no_speech_prob": 0.2549780011177063}, + {"id": 36, "seek": 20410, "start": 218.1, "end": 226.1, "text": " And with moderation + and actually you need a lot of data like labels and checked on the quality because + it''s a very hard task.", "tokens": [51064, 400, 365, 49471, 293, 767, 291, 643, + 257, 688, 295, 1412, 411, 16949, 293, 10033, 322, 264, 3125, 570, 309, 311, 257, + 588, 1152, 5633, 13, 51464], "temperature": 0.0, "avg_logprob": -0.15163712888150602, + "compression_ratio": 1.6715686274509804, "no_speech_prob": 0.2549780011177063}, + {"id": 37, "seek": 22610, "start": 226.1, "end": 232.1, "text": " And still was + off because here I wasn''t using enough machine learning and I was like, but I started + it.", "tokens": [50364, 400, 920, 390, 766, 570, 510, 286, 2067, 380, 1228, 1547, + 3479, 2539, 293, 286, 390, 411, 11, 457, 286, 1409, 309, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.15039322501734684, "compression_ratio": 1.7342342342342343, + "no_speech_prob": 0.7129138708114624}, {"id": 38, "seek": 22610, "start": 232.1, + "end": 240.1, "text": " Okay, then I switched to machine 
learning engineer. I mean, + it sounds like I''m hoping on and hoping off, but it was like some periods of time.", + "tokens": [50664, 1033, 11, 550, 286, 16858, 281, 3479, 2539, 11403, 13, 286, 914, + 11, 309, 3263, 411, 286, 478, 7159, 322, 293, 7159, 766, 11, 457, 309, 390, 411, + 512, 13804, 295, 565, 13, 51064], "temperature": 0.0, "avg_logprob": -0.15039322501734684, + "compression_ratio": 1.7342342342342343, "no_speech_prob": 0.7129138708114624}, + {"id": 39, "seek": 22610, "start": 240.1, "end": 250.1, "text": " It was amazing. + I had an amazing had the I am still very grateful to him. He like, teach me much + about machine learning in the production.", "tokens": [51064, 467, 390, 2243, 13, + 286, 632, 364, 2243, 632, 264, 286, 669, 920, 588, 7941, 281, 796, 13, 634, 411, + 11, 2924, 385, 709, 466, 3479, 2539, 294, 264, 4265, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.15039322501734684, "compression_ratio": 1.7342342342342343, + "no_speech_prob": 0.7129138708114624}, {"id": 40, "seek": 25010, "start": 250.1, + "end": 261.1, "text": " But at this point, I realized that I lost some of the knowledge + that I like received back. 
So I applied for the master''s program in the actually + Munich.", "tokens": [50364, 583, 412, 341, 935, 11, 286, 5334, 300, 286, 2731, 512, + 295, 264, 3601, 300, 286, 411, 4613, 646, 13, 407, 286, 6456, 337, 264, 4505, 311, + 1461, 294, 264, 767, 40601, 13, 50914], "temperature": 0.0, "avg_logprob": -0.21581243646555934, + "compression_ratio": 1.4367816091954022, "no_speech_prob": 0.04407715052366257}, + {"id": 41, "seek": 25010, "start": 261.1, "end": 268.1, "text": " Now I''m studying + the technical university of Minken on the also kind of machine learning program.", + "tokens": [50914, 823, 286, 478, 7601, 264, 6191, 5454, 295, 376, 35061, 322, 264, + 611, 733, 295, 3479, 2539, 1461, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.21581243646555934, "compression_ratio": 1.4367816091954022, "no_speech_prob": + 0.04407715052366257}, {"id": 42, "seek": 26810, "start": 269.1, "end": 284.1, "text": + " And also I feel that I''m not speaking enough. You see, so I''m like always like + speaking enough, not the melon, not this enough. And at some points, my like, please, + who I knew back then and they were working at that point in to local.", "tokens": + [50414, 400, 611, 286, 841, 300, 286, 478, 406, 4124, 1547, 13, 509, 536, 11, 370, + 286, 478, 411, 1009, 411, 4124, 1547, 11, 406, 264, 41722, 11, 406, 341, 1547, 13, + 400, 412, 512, 2793, 11, 452, 411, 11, 1767, 11, 567, 286, 2586, 646, 550, 293, + 436, 645, 1364, 412, 300, 935, 294, 281, 2654, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.2983179354886396, "compression_ratio": 1.6864406779661016, "no_speech_prob": + 0.510401725769043}, {"id": 43, "seek": 26810, "start": 284.1, "end": 296.1, "text": + " They said, hey, Jenny, how about you will work with us as a data advocate. And + I was like, what is that? I''m going to Germany. 
I am not, I don''t know what is + that.", "tokens": [51164, 814, 848, 11, 4177, 11, 20580, 11, 577, 466, 291, 486, + 589, 365, 505, 382, 257, 1412, 14608, 13, 400, 286, 390, 411, 11, 437, 307, 300, + 30, 286, 478, 516, 281, 7244, 13, 286, 669, 406, 11, 286, 500, 380, 458, 437, 307, + 300, 13, 51764], "temperature": 0.0, "avg_logprob": -0.2983179354886396, "compression_ratio": + 1.6864406779661016, "no_speech_prob": 0.510401725769043}, {"id": 44, "seek": 29610, + "start": 296.1, "end": 304.1, "text": " And they''re like, oh, that would be perfect + for you because you can do machine learning and researching, but you also can speak.", + "tokens": [50364, 400, 436, 434, 411, 11, 1954, 11, 300, 576, 312, 2176, 337, 291, + 570, 291, 393, 360, 3479, 2539, 293, 24176, 11, 457, 291, 611, 393, 1710, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.17793797944721423, "compression_ratio": 1.8401639344262295, + "no_speech_prob": 0.025038916617631912}, {"id": 45, "seek": 29610, "start": 304.1, + "end": 324.1, "text": " And I am so happy that it literally happened because last + year and some time that I was like there working with the data, working for outsourcing, + working with search because I also have like some past experiences when I was a + machine learning engineer, I actually was working with a search recommendations + a little bit.", "tokens": [50764, 400, 286, 669, 370, 2055, 300, 309, 3736, 2011, + 570, 1036, 1064, 293, 512, 565, 300, 286, 390, 411, 456, 1364, 365, 264, 1412, 11, + 1364, 337, 14758, 41849, 11, 1364, 365, 3164, 570, 286, 611, 362, 411, 512, 1791, + 5235, 562, 286, 390, 257, 3479, 2539, 11403, 11, 286, 767, 390, 1364, 365, 257, + 3164, 10434, 257, 707, 857, 13, 51764], "temperature": 0.0, "avg_logprob": -0.17793797944721423, + "compression_ratio": 1.8401639344262295, "no_speech_prob": 0.025038916617631912}, + {"id": 46, "seek": 32410, "start": 324.1, "end": 331.1, "text": " So all of these + combined in one perfect profession. 
So I would say I''m a very happy person.", "tokens": + [50364, 407, 439, 295, 613, 9354, 294, 472, 2176, 7032, 13, 407, 286, 576, 584, + 286, 478, 257, 588, 2055, 954, 13, 50714], "temperature": 0.0, "avg_logprob": -0.16637213333793308, + "compression_ratio": 1.4702702702702704, "no_speech_prob": 0.01271937508136034}, + {"id": 47, "seek": 32410, "start": 331.1, "end": 344.1, "text": " This is super + great. And it sounds like you are in the warm waters of like what you really want + to do at the same time. I think it''s still a very demanding role and and then field.", + "tokens": [50714, 639, 307, 1687, 869, 13, 400, 309, 3263, 411, 291, 366, 294, 264, + 4561, 12975, 295, 411, 437, 291, 534, 528, 281, 360, 412, 264, 912, 565, 13, 286, + 519, 309, 311, 920, 257, 588, 19960, 3090, 293, 293, 550, 2519, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.16637213333793308, "compression_ratio": 1.4702702702702704, + "no_speech_prob": 0.01271937508136034}, {"id": 48, "seek": 34410, "start": 344.1, + "end": 353.1, "text": " And so you are basically still doing some ML, right, but + you also advocate for data. 
Can you expand a bit on what you do?", "tokens": [50364, + 400, 370, 291, 366, 1936, 920, 884, 512, 21601, 11, 558, 11, 457, 291, 611, 14608, + 337, 1412, 13, 1664, 291, 5268, 257, 857, 322, 437, 291, 360, 30, 50814], "temperature": + 0.0, "avg_logprob": -0.18994806029579855, "compression_ratio": 1.59375, "no_speech_prob": + 0.05394422635436058}, {"id": 49, "seek": 34410, "start": 353.1, "end": 370.1, "text": + " Oh, yeah, it''s actually like everything in a like all in one like, you know, + this shampoos with conditioner and the shampoo in body wash and everything because + I have some freedom at my position to choose what I want to like study now.", "tokens": + [50814, 876, 11, 1338, 11, 309, 311, 767, 411, 1203, 294, 257, 411, 439, 294, 472, + 411, 11, 291, 458, 11, 341, 402, 1215, 78, 329, 365, 33558, 293, 264, 27484, 294, + 1772, 5675, 293, 1203, 570, 286, 362, 512, 5645, 412, 452, 2535, 281, 2826, 437, + 286, 528, 281, 411, 2979, 586, 13, 51664], "temperature": 0.0, "avg_logprob": -0.18994806029579855, + "compression_ratio": 1.59375, "no_speech_prob": 0.05394422635436058}, {"id": 50, + "seek": 37010, "start": 370.1, "end": 376.1, "text": " So, for example, I chose + like going to the search conferences and talk about it because I had some experience.", + "tokens": [50364, 407, 11, 337, 1365, 11, 286, 5111, 411, 516, 281, 264, 3164, 22032, + 293, 751, 466, 309, 570, 286, 632, 512, 1752, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.10147034919868081, "compression_ratio": 1.4835164835164836, "no_speech_prob": + 0.00773544842377305}, {"id": 51, "seek": 37010, "start": 376.1, "end": 388.1, "text": + " I really love the idea of comparing crowdsourcing and machine learning models + in some particular tasks. 
For example, let''s think about the adversarial attacks.", + "tokens": [50664, 286, 534, 959, 264, 1558, 295, 15763, 26070, 41849, 293, 3479, + 2539, 5245, 294, 512, 1729, 9608, 13, 1171, 1365, 11, 718, 311, 519, 466, 264, 17641, + 44745, 8122, 13, 51264], "temperature": 0.0, "avg_logprob": -0.10147034919868081, + "compression_ratio": 1.4835164835164836, "no_speech_prob": 0.00773544842377305}, + {"id": 52, "seek": 38810, "start": 388.1, "end": 395.1, "text": " It''s interesting + how far we can expand with them, like detecting them by machine learning and detecting + them for humans.", "tokens": [50364, 467, 311, 1880, 577, 1400, 321, 393, 5268, + 365, 552, 11, 411, 40237, 552, 538, 3479, 2539, 293, 40237, 552, 337, 6255, 13, + 50714], "temperature": 0.0, "avg_logprob": -0.23505904084892684, "compression_ratio": + 1.7370517928286853, "no_speech_prob": 0.4383450150489807}, {"id": 53, "seek": 38810, + "start": 395.1, "end": 403.1, "text": " And like these different comparisons where + they crowd wins where they like manually where the machine learning wins.", "tokens": + [50714, 400, 411, 613, 819, 33157, 689, 436, 6919, 10641, 689, 436, 411, 16945, + 689, 264, 3479, 2539, 10641, 13, 51114], "temperature": 0.0, "avg_logprob": -0.23505904084892684, + "compression_ratio": 1.7370517928286853, "no_speech_prob": 0.4383450150489807}, + {"id": 54, "seek": 38810, "start": 403.1, "end": 413.1, "text": " It''s a question + which is in general interesting, especially now with chat GPT when everybody is + like, oh my god, the ice conscious, OK, close everything, fire all the software + engineers we are done.", "tokens": [51114, 467, 311, 257, 1168, 597, 307, 294, 2674, + 1880, 11, 2318, 586, 365, 5081, 26039, 51, 562, 2201, 307, 411, 11, 1954, 452, 3044, + 11, 264, 4435, 6648, 11, 2264, 11, 1998, 1203, 11, 2610, 439, 264, 4722, 11955, + 321, 366, 1096, 13, 51614], "temperature": 0.0, "avg_logprob": -0.23505904084892684, + "compression_ratio": 1.7370517928286853, 
"no_speech_prob": 0.4383450150489807}, + {"id": 55, "seek": 41310, "start": 413.1, "end": 439.1, "text": " So it''s super + interesting to explore that and I am always like reading articles about this attending + a talks about this also doing myself some talks plus of course I''m also participating + in development of our company because to talk has started as a data labeling company, + but now it''s expanding much more in the means that we also started designing like + a mouth tools on the top of it.", "tokens": [50364, 407, 309, 311, 1687, 1880, 281, + 6839, 300, 293, 286, 669, 1009, 411, 3760, 11290, 466, 341, 15862, 257, 6686, 466, + 341, 611, 884, 2059, 512, 6686, 1804, 295, 1164, 286, 478, 611, 13950, 294, 3250, + 295, 527, 2237, 570, 281, 751, 575, 1409, 382, 257, 1412, 40244, 2237, 11, 457, + 586, 309, 311, 14702, 709, 544, 294, 264, 1355, 300, 321, 611, 1409, 14685, 411, + 257, 4525, 3873, 322, 264, 1192, 295, 309, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.22265765033190763, "compression_ratio": 1.7004405286343611, "no_speech_prob": + 0.3251190781593323}, {"id": 56, "seek": 43910, "start": 439.1, "end": 449.1, "text": + " Because when you''re having such a resource, you know, like human manual labeling + in your basically in your, I don''t know how to say in your basement, but it sounds + creepy.", "tokens": [50364, 1436, 562, 291, 434, 1419, 1270, 257, 7684, 11, 291, + 458, 11, 411, 1952, 9688, 40244, 294, 428, 1936, 294, 428, 11, 286, 500, 380, 458, + 577, 281, 584, 294, 428, 16893, 11, 457, 309, 3263, 14717, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.1336875196363105, "compression_ratio": 1.514792899408284, + "no_speech_prob": 0.07936246693134308}, {"id": 57, "seek": 43910, "start": 449.1, + "end": 454.1, "text": " Yeah, you can use it for transfer learning or for some other + like interesting tasks.", "tokens": [50864, 865, 11, 291, 393, 764, 309, 337, 5003, + 2539, 420, 337, 512, 661, 411, 1880, 9608, 13, 51114], "temperature": 0.0, 
"avg_logprob": + -0.1336875196363105, "compression_ratio": 1.514792899408284, "no_speech_prob": 0.07936246693134308}, + {"id": 58, "seek": 45410, "start": 454.1, "end": 470.1, "text": " And yeah, we expanded + a lot and it''s very interesting to a system this process and to come and to talk + about this and also in the means of still manual lately in assistance to I am now + developing currently.", "tokens": [50364, 400, 1338, 11, 321, 14342, 257, 688, 293, + 309, 311, 588, 1880, 281, 257, 1185, 341, 1399, 293, 281, 808, 293, 281, 751, 466, + 341, 293, 611, 294, 264, 1355, 295, 920, 9688, 12881, 294, 9683, 281, 286, 669, + 586, 6416, 4362, 13, 51164], "temperature": 0.0, "avg_logprob": -0.2850502014160156, + "compression_ratio": 1.4259259259259258, "no_speech_prob": 0.5685343742370605}, + {"id": 59, "seek": 45410, "start": 470.1, "end": 472.1, "text": " Yeah, this is + fantastic.", "tokens": [51164, 865, 11, 341, 307, 5456, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.2850502014160156, "compression_ratio": 1.4259259259259258, + "no_speech_prob": 0.5685343742370605}, {"id": 60, "seek": 47210, "start": 472.1, + "end": 481.1, "text": " And when somebody approaches to work on or like maybe you + just create an account, I guess, and you start, you know, creating projects and + so on.", "tokens": [50364, 400, 562, 2618, 11587, 281, 589, 322, 420, 411, 1310, + 291, 445, 1884, 364, 2696, 11, 286, 2041, 11, 293, 291, 722, 11, 291, 458, 11, 4084, + 4455, 293, 370, 322, 13, 50814], "temperature": 0.0, "avg_logprob": -0.14932745695114136, + "compression_ratio": 1.6182572614107884, "no_speech_prob": 0.2051985114812851}, + {"id": 61, "seek": 47210, "start": 481.1, "end": 498.1, "text": " I think many, + many things go into the process like starting from the price, right, how economical + it can be, right, do I have any control over this, but also in terms of, for example, + the outcome, you know, the quality of labels that I will get.", "tokens": [50814, + 286, 519, 867, 
11, 867, 721, 352, 666, 264, 1399, 411, 2891, 490, 264, 3218, 11, + 558, 11, 577, 42473, 309, 393, 312, 11, 558, 11, 360, 286, 362, 604, 1969, 670, + 341, 11, 457, 611, 294, 2115, 295, 11, 337, 1365, 11, 264, 9700, 11, 291, 458, 11, + 264, 3125, 295, 16949, 300, 286, 486, 483, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.14932745695114136, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.2051985114812851}, {"id": 62, "seek": 49810, "start": 498.1, "end": 518.1, "text": + " How do you usually sort of structure the process, is there some general recipe + that the lock up would offer to any user and maybe on top of that, you would offer + some additional service, so to say right or advice to a company, is there something + around that you could share with us.", "tokens": [50364, 1012, 360, 291, 2673, 1333, + 295, 3877, 264, 1399, 11, 307, 456, 512, 2674, 6782, 300, 264, 4017, 493, 576, 2626, + 281, 604, 4195, 293, 1310, 322, 1192, 295, 300, 11, 291, 576, 2626, 512, 4497, 2643, + 11, 370, 281, 584, 558, 420, 5192, 281, 257, 2237, 11, 307, 456, 746, 926, 300, + 291, 727, 2073, 365, 505, 13, 51364], "temperature": 0.0, "avg_logprob": -0.1688781011672247, + "compression_ratio": 1.630057803468208, "no_speech_prob": 0.027266278862953186}, + {"id": 63, "seek": 51810, "start": 518.1, "end": 543.1, "text": " I would say I + can talk no stop, but in general it''s like this firstly, of course, when you''re + deciding that you need manual labeling for some reason, like some data sets, you + need to understand that you actually need that because it doesn''t need that for + every task that they''re existing just use data centric approach throwing data because + nothing is tops up this that''s not correct, of course.", "tokens": [50364, 286, + 576, 584, 286, 393, 751, 572, 1590, 11, 457, 294, 2674, 309, 311, 411, 341, 27376, + 11, 295, 1164, 11, 562, 291, 434, 17990, 300, 291, 643, 9688, 40244, 337, 512, 1778, + 11, 411, 512, 1412, 6352, 11, 291, 643, 281, 1223, 
300, 291, 767, 643, 300, 570, + 309, 1177, 380, 643, 300, 337, 633, 5633, 300, 436, 434, 6741, 445, 764, 1412, 1489, + 1341, 3109, 10238, 1412, 570, 1825, 307, 22836, 493, 341, 300, 311, 406, 3006, 11, + 295, 1164, 13, 51614], "temperature": 0.0, "avg_logprob": -0.1793798249343346, "compression_ratio": + 1.70995670995671, "no_speech_prob": 0.1263723522424698}, {"id": 64, "seek": 54310, + "start": 543.1, "end": 553.1, "text": " You can be free using like open source data, + you can be free using synthetic data because it''s cheap, you''re just generating + it yourself.", "tokens": [50364, 509, 393, 312, 1737, 1228, 411, 1269, 4009, 1412, + 11, 291, 393, 312, 1737, 1228, 23420, 1412, 570, 309, 311, 7084, 11, 291, 434, 445, + 17746, 309, 1803, 13, 50864], "temperature": 0.0, "avg_logprob": -0.13005741292780096, + "compression_ratio": 1.5324675324675325, "no_speech_prob": 0.096756212413311}, {"id": + 65, "seek": 54310, "start": 553.1, "end": 562.1, "text": " But sometimes in a lot + of domains, you don''t have enough available data for some specific domains.", "tokens": + [50864, 583, 2171, 294, 257, 688, 295, 25514, 11, 291, 500, 380, 362, 1547, 2435, + 1412, 337, 512, 2685, 25514, 13, 51314], "temperature": 0.0, "avg_logprob": -0.13005741292780096, + "compression_ratio": 1.5324675324675325, "no_speech_prob": 0.096756212413311}, {"id": + 66, "seek": 56210, "start": 562.1, "end": 574.1, "text": " And it''s hard to gather + it or generate it or sometimes you need a human creation over curation over the + machine learning processes, for example, for monitoring or like with.", "tokens": + [50364, 400, 309, 311, 1152, 281, 5448, 309, 420, 8460, 309, 420, 2171, 291, 643, + 257, 1952, 8016, 670, 1262, 399, 670, 264, 3479, 2539, 7555, 11, 337, 1365, 11, + 337, 11028, 420, 411, 365, 13, 50964], "temperature": 0.0, "avg_logprob": -0.178009705665784, + "compression_ratio": 1.3951612903225807, "no_speech_prob": 0.684352457523346}, {"id": + 67, "seek": 57410, "start": 574.1, 
"end": 593.1, "text": " For example, it''s like + a hot topic i''ve seen such a thread in Twitter how people try to ask it for some + like really, really dangerous stuff and check if it will have provided and it did + so like you know we still need a human creation over like the data gathered by MLEI + mechanisms.", "tokens": [50364, 1171, 1365, 11, 309, 311, 411, 257, 2368, 4829, + 741, 600, 1612, 1270, 257, 7207, 294, 5794, 577, 561, 853, 281, 1029, 309, 337, + 512, 411, 534, 11, 534, 5795, 1507, 293, 1520, 498, 309, 486, 362, 5649, 293, 309, + 630, 370, 411, 291, 458, 321, 920, 643, 257, 1952, 8016, 670, 411, 264, 1412, 13032, + 538, 376, 2634, 40, 15902, 13, 51314], "temperature": 0.0, "avg_logprob": -0.23122239835334546, + "compression_ratio": 1.4484536082474226, "no_speech_prob": 0.19288843870162964}, + {"id": 68, "seek": 59310, "start": 594.1, "end": 604.1, "text": " So in this case, + if you feel like you haven''t need to gather a data set for your specific problem + and you don''t know where to start here is crowdsourcing platforms and for example + into loka.", "tokens": [50414, 407, 294, 341, 1389, 11, 498, 291, 841, 411, 291, + 2378, 380, 643, 281, 5448, 257, 1412, 992, 337, 428, 2685, 1154, 293, 291, 500, + 380, 458, 689, 281, 722, 510, 307, 26070, 41849, 9473, 293, 337, 1365, 666, 450, + 2330, 13, 50914], "temperature": 0.0, "avg_logprob": -0.19349405625287225, "compression_ratio": + 1.6491228070175439, "no_speech_prob": 0.04143073409795761}, {"id": 69, "seek": 59310, + "start": 604.1, "end": 616.1, "text": " It is the platform which was created from + engineers to engineers so it''s not about like the only model with business where + you''re coming to us were consulting you and you''re going away.", "tokens": [50914, + 467, 307, 264, 3663, 597, 390, 2942, 490, 11955, 281, 11955, 370, 309, 311, 406, + 466, 411, 264, 787, 2316, 365, 1606, 689, 291, 434, 1348, 281, 505, 645, 23682, + 291, 293, 291, 434, 516, 1314, 13, 51514], "temperature": 0.0, 
"avg_logprob": -0.19349405625287225, + "compression_ratio": 1.6491228070175439, "no_speech_prob": 0.04143073409795761}, + {"id": 70, "seek": 61610, "start": 616.1, "end": 626.1, "text": " So we''re actually + super happy if you''re like trying to deal with it yourself because we have an open + API and it''s more about mechanisms than speaking with manual labor.", "tokens": + [50364, 407, 321, 434, 767, 1687, 2055, 498, 291, 434, 411, 1382, 281, 2028, 365, + 309, 1803, 570, 321, 362, 364, 1269, 9362, 293, 309, 311, 544, 466, 15902, 813, + 4124, 365, 9688, 5938, 13, 50864], "temperature": 0.0, "avg_logprob": -0.1893657815867457, + "compression_ratio": 1.502857142857143, "no_speech_prob": 0.03512446582317352}, + {"id": 71, "seek": 61610, "start": 626.1, "end": 633.1, "text": " It''s like literally + about like handling the crowds with a mechanism sort of filtering and etc.", "tokens": + [50864, 467, 311, 411, 3736, 466, 411, 13175, 264, 26070, 365, 257, 7513, 1333, + 295, 30822, 293, 5183, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1893657815867457, + "compression_ratio": 1.502857142857143, "no_speech_prob": 0.03512446582317352}, + {"id": 72, "seek": 63310, "start": 633.1, "end": 651.1, "text": " So usually to + start you need to register and then we have huge tons of tutorials and education + programs and also we have a community which like might in handles actually and we''re + bothered to discuss any problems or questions.", "tokens": [50364, 407, 2673, 281, + 722, 291, 643, 281, 7280, 293, 550, 321, 362, 2603, 9131, 295, 17616, 293, 3309, + 4268, 293, 611, 321, 362, 257, 1768, 597, 411, 1062, 294, 18722, 767, 293, 321, + 434, 22996, 281, 2248, 604, 2740, 420, 1651, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.22018416031547214, "compression_ratio": 1.477124183006536, "no_speech_prob": + 0.013215191662311554}, {"id": 73, "seek": 65110, "start": 651.1, "end": 666.1, "text": + " But I would say like we try to implement in the platform already simple 
steps + that help you to do it pretty intuitively without studying much your first labeling + project and set it up and let it run.", "tokens": [50364, 583, 286, 576, 584, 411, + 321, 853, 281, 4445, 294, 264, 3663, 1217, 2199, 4439, 300, 854, 291, 281, 360, + 309, 1238, 46506, 1553, 7601, 709, 428, 700, 40244, 1716, 293, 992, 309, 493, 293, + 718, 309, 1190, 13, 51114], "temperature": 0.0, "avg_logprob": -0.08933178136046503, + "compression_ratio": 1.671497584541063, "no_speech_prob": 0.10890305787324905}, + {"id": 74, "seek": 65110, "start": 666.1, "end": 676.1, "text": " So there are like + inbound instructions on every step there are like some little video or some little + text instructions telling the good practices.", "tokens": [51114, 407, 456, 366, + 411, 294, 18767, 9415, 322, 633, 1823, 456, 366, 411, 512, 707, 960, 420, 512, 707, + 2487, 9415, 3585, 264, 665, 7525, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.08933178136046503, "compression_ratio": 1.671497584541063, "no_speech_prob": + 0.10890305787324905}, {"id": 75, "seek": 67610, "start": 676.1, "end": 688.1, "text": + " So we try to make it as simple like I actually saw it developing because when + I started using to look at it wasn''t any of this and now it''s like impressive + how they changed everything.", "tokens": [50364, 407, 321, 853, 281, 652, 309, 382, + 2199, 411, 286, 767, 1866, 309, 6416, 570, 562, 286, 1409, 1228, 281, 574, 412, + 309, 2067, 380, 604, 295, 341, 293, 586, 309, 311, 411, 8992, 577, 436, 3105, 1203, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.22237022250306374, "compression_ratio": + 1.3758389261744965, "no_speech_prob": 0.18906830251216888}, {"id": 76, "seek": 67610, + "start": 688.1, "end": 691.1, "text": " Yeah, I can imagine.", "tokens": [50964, + 865, 11, 286, 393, 3811, 13, 51114], "temperature": 0.0, "avg_logprob": -0.22237022250306374, + "compression_ratio": 1.3758389261744965, "no_speech_prob": 0.18906830251216888}, + {"id": 77, "seek": 69110, 
"start": 691.1, "end": 720.1, "text": " And so inside + the log I mean if I consider to look as deep package product that I get access to + you know inside it I presume you have the labeling editor or component whatever + is it that you''re calling it that I can flexibly load any data format right and + also any vertical like from computer science to audio to text time serious maybe + and so on.", "tokens": [50364, 400, 370, 1854, 264, 3565, 286, 914, 498, 286, 1949, + 281, 574, 382, 2452, 7372, 1674, 300, 286, 483, 2105, 281, 291, 458, 1854, 309, + 286, 43283, 291, 362, 264, 40244, 9839, 420, 6542, 2035, 307, 309, 300, 291, 434, + 5141, 309, 300, 286, 393, 5896, 3545, 3677, 604, 1412, 7877, 558, 293, 611, 604, + 9429, 411, 490, 3820, 3497, 281, 6278, 281, 2487, 565, 3156, 1310, 293, 370, 322, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.20276323954264322, "compression_ratio": + 1.613953488372093, "no_speech_prob": 0.14042778313159943}, {"id": 78, "seek": 72010, + "start": 720.1, "end": 736.1, "text": " What does it take for you know the way I + imagine it they send my head is that let''s say this is this is a team that hasn''t + had experience with labeling before but they realize the importance of it in their + project.", "tokens": [50364, 708, 775, 309, 747, 337, 291, 458, 264, 636, 286, 3811, + 309, 436, 2845, 452, 1378, 307, 300, 718, 311, 584, 341, 307, 341, 307, 257, 1469, + 300, 6132, 380, 632, 1752, 365, 40244, 949, 457, 436, 4325, 264, 7379, 295, 309, + 294, 641, 1716, 13, 51164], "temperature": 0.0, "avg_logprob": -0.16406957626342775, + "compression_ratio": 1.4557823129251701, "no_speech_prob": 0.07539702951908112}, + {"id": 79, "seek": 73610, "start": 736.1, "end": 742.1, "text": " So they will not + be professionals in this space.", "tokens": [50364, 407, 436, 486, 406, 312, 11954, + 294, 341, 1901, 13, 50664], "temperature": 0.0, "avg_logprob": -0.11136488547691932, + "compression_ratio": 1.6580310880829014, "no_speech_prob": 
0.33142566680908203}, + {"id": 80, "seek": 73610, "start": 742.1, "end": 761.1, "text": " What what do they + need to think about when they prepare the data maybe quantity maybe you recommend + them to start with smaller quantity how should they reason about format and should + they first go and watch all the tutorials so can they somehow intuitively follow + the UI.", "tokens": [50664, 708, 437, 360, 436, 643, 281, 519, 466, 562, 436, 5940, + 264, 1412, 1310, 11275, 1310, 291, 2748, 552, 281, 722, 365, 4356, 11275, 577, 820, + 436, 1778, 466, 7877, 293, 820, 436, 700, 352, 293, 1159, 439, 264, 17616, 370, + 393, 436, 6063, 46506, 1524, 264, 15682, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.11136488547691932, "compression_ratio": 1.6580310880829014, "no_speech_prob": + 0.33142566680908203}, {"id": 81, "seek": 76110, "start": 761.1, "end": 790.1, "text": + " I would say like this like of course crowdsourcing a little bit like aligning + it reminds me a little bit of training the machine learning model you need to spend + some time tuning of course but yeah it''s for the different data types that''s firstly + addressing your like comments it''s for different data types it''s for like all + the text video etc and like can be used for multiple use cases like gathering data.", + "tokens": [50364, 286, 576, 584, 411, 341, 411, 295, 1164, 26070, 41849, 257, 707, + 857, 411, 419, 9676, 309, 12025, 385, 257, 707, 857, 295, 3097, 264, 3479, 2539, + 2316, 291, 643, 281, 3496, 512, 565, 15164, 295, 1164, 457, 1338, 309, 311, 337, + 264, 819, 1412, 3467, 300, 311, 27376, 14329, 428, 411, 3053, 309, 311, 337, 819, + 1412, 3467, 309, 311, 337, 411, 439, 264, 2487, 960, 5183, 293, 411, 393, 312, 1143, + 337, 3866, 764, 3331, 411, 13519, 1412, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.13550514894373278, "compression_ratio": 1.8295964125560538, "no_speech_prob": + 0.03970224782824516}, {"id": 82, "seek": 79010, "start": 790.1, "end": 813.1, "text": + " Gathering 
data says for voice assistance and like for self driving cars or like + as I hope we will still stick to the main name of the podcast and we will talk about + search relevance with a human labeling but like yeah let''s imagine your theme is + on the record creating a project and they''re realized we need human labeling but + they never saw the platform.", "tokens": [50364, 39841, 278, 1412, 1619, 337, 3177, + 9683, 293, 411, 337, 2698, 4840, 5163, 420, 411, 382, 286, 1454, 321, 486, 920, + 2897, 281, 264, 2135, 1315, 295, 264, 7367, 293, 321, 486, 751, 466, 3164, 32684, + 365, 257, 1952, 40244, 457, 411, 1338, 718, 311, 3811, 428, 6314, 307, 322, 264, + 2136, 4084, 257, 1716, 293, 436, 434, 5334, 321, 643, 1952, 40244, 457, 436, 1128, + 1866, 264, 3663, 13, 51514], "temperature": 0.0, "avg_logprob": -0.1862437274004962, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.019701136276125908}, + {"id": 83, "seek": 81310, "start": 813.1, "end": 822.1, "text": " I would say that + the most important thing to consider is to start thinking in the midst of the composition + of the task.", "tokens": [50364, 286, 576, 584, 300, 264, 881, 1021, 551, 281, 1949, + 307, 281, 722, 1953, 294, 264, 18629, 295, 264, 12686, 295, 264, 5633, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.11881085727991683, "compression_ratio": 1.7621145374449338, + "no_speech_prob": 0.22369225323200226}, {"id": 84, "seek": 81310, "start": 822.1, + "end": 841.1, "text": " It''s a key thing in the any crowdsourcing tasks that''s + not like it''s pretty scalable so the amount is not a problem it''s not that expensive + you can set up reasonable prices and it will be pretty much cheap but the one thing + that is very important if you make task too complicated.", "tokens": [50814, 467, + 311, 257, 2141, 551, 294, 264, 604, 26070, 41849, 9608, 300, 311, 406, 411, 309, + 311, 1238, 38481, 370, 264, 2372, 307, 406, 257, 1154, 309, 311, 406, 300, 5124, + 291, 393, 992, 493, 10585, 7901, 293, 
309, 486, 312, 1238, 709, 7084, 457, 264, + 472, 551, 300, 307, 588, 1021, 498, 291, 652, 5633, 886, 6179, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.11881085727991683, "compression_ratio": 1.7621145374449338, + "no_speech_prob": 0.22369225323200226}, {"id": 85, "seek": 84110, "start": 841.1, + "end": 856.1, "text": " Like as you would to for example having in house experts + and you can ask them whatever and they will think of the rest here you''re working + with people which are not committed specifically to provide you something more than + you asked for them.", "tokens": [50364, 1743, 382, 291, 576, 281, 337, 1365, 1419, + 294, 1782, 8572, 293, 291, 393, 1029, 552, 2035, 293, 436, 486, 519, 295, 264, 1472, + 510, 291, 434, 1364, 365, 561, 597, 366, 406, 7784, 4682, 281, 2893, 291, 746, 544, + 813, 291, 2351, 337, 552, 13, 51114], "temperature": 0.0, "avg_logprob": -0.09817008018493652, + "compression_ratio": 1.4938271604938271, "no_speech_prob": 0.13222262263298035}, + {"id": 86, "seek": 85610, "start": 857.1, "end": 883.1, "text": " So tasks should + be simple and they should be well defined so that is the thing that you need to + a little bit train on thinking of how to decomposition task and for example like + if you offer it to a person who is not in your area of the studies that you will + be sure that he still can do it without special training maybe just reading the + instruction or like completing some exam.", "tokens": [50414, 407, 9608, 820, 312, + 2199, 293, 436, 820, 312, 731, 7642, 370, 300, 307, 264, 551, 300, 291, 643, 281, + 257, 707, 857, 3847, 322, 1953, 295, 577, 281, 48356, 5633, 293, 337, 1365, 411, + 498, 291, 2626, 309, 281, 257, 954, 567, 307, 406, 294, 428, 1859, 295, 264, 5313, + 300, 291, 486, 312, 988, 300, 415, 920, 393, 360, 309, 1553, 2121, 3097, 1310, 445, + 3760, 264, 10951, 420, 411, 19472, 512, 1139, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.09103388190269471, "compression_ratio": 1.7465437788018434, 
"no_speech_prob": + 0.7489876747131348}, {"id": 87, "seek": 88310, "start": 883.1, "end": 904.1, "text": + " And the rest is pretty much covered by the platform because now there are like + specific mechanics which predefined your settings for you not making mistakes on + like you know we money to the crowds or like doing the interface incorrectly because + we try to implement as much testing in house as possible.", "tokens": [50364, 400, + 264, 1472, 307, 1238, 709, 5343, 538, 264, 3663, 570, 586, 456, 366, 411, 2685, + 12939, 597, 659, 37716, 428, 6257, 337, 291, 406, 1455, 8038, 322, 411, 291, 458, + 321, 1460, 281, 264, 26070, 420, 411, 884, 264, 9226, 42892, 570, 321, 853, 281, + 4445, 382, 709, 4997, 294, 1782, 382, 1944, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.1528531575607041, "compression_ratio": 1.5894736842105264, "no_speech_prob": + 0.013659652322530746}, {"id": 88, "seek": 90410, "start": 904.1, "end": 926.1, "text": + " And the interface they configure like the program where you configure the interface + it is done pretty much into intuitive sense so you don''t have to like learn JavaScript + or HTML if anything it''s done just in basic building blocks which can be like changed + in places and group together in some nice looking interface.", "tokens": [50364, + 400, 264, 9226, 436, 22162, 411, 264, 1461, 689, 291, 22162, 264, 9226, 309, 307, + 1096, 1238, 709, 666, 21769, 2020, 370, 291, 500, 380, 362, 281, 411, 1466, 15778, + 420, 17995, 498, 1340, 309, 311, 1096, 445, 294, 3875, 2390, 8474, 597, 393, 312, + 411, 3105, 294, 3190, 293, 1594, 1214, 294, 512, 1481, 1237, 9226, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.17447851550194524, "compression_ratio": 1.5771144278606966, + "no_speech_prob": 0.06218840926885605}, {"id": 89, "seek": 92610, "start": 926.1, + "end": 949.1, "text": " So I hope the most of the burden is just to start thinking + differently it''s like you know it''s like with a programming sometimes there was + a 
moment when I learned how scaling my life and I have to completely like reprogram + my mind because it''s such a different language in the means of programming you + need to think differently the same is with crowdsource unique to think of the composing + that is the most important.", "tokens": [50364, 407, 286, 1454, 264, 881, 295, 264, + 12578, 307, 445, 281, 722, 1953, 7614, 309, 311, 411, 291, 458, 309, 311, 411, 365, + 257, 9410, 2171, 456, 390, 257, 1623, 562, 286, 3264, 577, 21589, 452, 993, 293, + 286, 362, 281, 2584, 411, 35257, 1342, 452, 1575, 570, 309, 311, 1270, 257, 819, + 2856, 294, 264, 1355, 295, 9410, 291, 643, 281, 519, 7614, 264, 912, 307, 365, 26070, + 2948, 3845, 281, 519, 295, 264, 715, 6110, 300, 307, 264, 881, 1021, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.15726157440536323, "compression_ratio": 1.7669491525423728, + "no_speech_prob": 0.07206302136182785}, {"id": 90, "seek": 94910, "start": 949.1, + "end": 974.1, "text": " Yeah that''s exciting yeah a high school functional mathematical + yeah it''s so my god yes but probably beautiful too yeah this sounds great so it + does sound like a self service in many ways and now that you called out search which + is also very dear to my heart and I''m glad it is the same for you.", "tokens": + [50364, 865, 300, 311, 4670, 1338, 257, 1090, 1395, 11745, 18894, 1338, 309, 311, + 370, 452, 3044, 2086, 457, 1391, 2238, 886, 1338, 341, 3263, 869, 370, 309, 775, + 1626, 411, 257, 2698, 2643, 294, 867, 2098, 293, 586, 300, 291, 1219, 484, 3164, + 597, 307, 611, 588, 6875, 281, 452, 1917, 293, 286, 478, 5404, 309, 307, 264, 912, + 337, 291, 13, 51614], "temperature": 0.0, "avg_logprob": -0.1980926918260979, "compression_ratio": + 1.544502617801047, "no_speech_prob": 0.09487591683864594}, {"id": 91, "seek": 97410, + "start": 974.1, "end": 1003.1, "text": " So let''s start with the basics you know + like I have a search engine I have users I''ve got logs logs right so what I can + do is that I can 
actually record for every search the position where the click happened + right so what we returned what was the click and so I have plenty of data assuming + I have plenty of users why do I need another data set can you convince me.", "tokens": + [50414, 407, 718, 311, 722, 365, 264, 14688, 291, 458, 411, 286, 362, 257, 3164, + 2848, 286, 362, 5022, 286, 600, 658, 20820, 20820, 558, 370, 437, 286, 393, 360, + 307, 300, 286, 393, 767, 2136, 337, 633, 3164, 264, 2535, 689, 264, 2052, 2011, + 558, 370, 437, 321, 8752, 437, 390, 264, 2052, 293, 370, 286, 362, 7140, 295, 1412, + 11926, 286, 362, 7140, 295, 5022, 983, 360, 286, 643, 1071, 1412, 992, 393, 291, + 13447, 385, 13, 51814], "temperature": 0.0, "avg_logprob": -0.08048876420951184, + "compression_ratio": 1.7548076923076923, "no_speech_prob": 0.06554551422595978}, + {"id": 92, "seek": 100410, "start": 1004.1, "end": 1005.1, "text": " What am I missing.", + "tokens": [50364, 708, 669, 286, 5361, 13, 50414], "temperature": 0.0, "avg_logprob": + -0.21699097752571106, "compression_ratio": 1.5858585858585859, "no_speech_prob": + 0.0035543444100767374}, {"id": 93, "seek": 100410, "start": 1006.1, "end": 1033.1, + "text": " Here I would say manual is much more about using not creating a new data + set from the scratch but evaluating your abilities of ranking your queries in your + search correctly because as far as I understand a lot of like there are like a lot + of the ranking pretty sophisticated algorithms existing.", "tokens": [50464, 1692, + 286, 576, 584, 9688, 307, 709, 544, 466, 1228, 406, 4084, 257, 777, 1412, 992, 490, + 264, 8459, 457, 27479, 428, 11582, 295, 17833, 428, 24109, 294, 428, 3164, 8944, + 570, 382, 1400, 382, 286, 1223, 257, 688, 295, 411, 456, 366, 411, 257, 688, 295, + 264, 17833, 1238, 16950, 14642, 6741, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.21699097752571106, "compression_ratio": 1.5858585858585859, "no_speech_prob": + 0.0035543444100767374}, {"id": 94, "seek": 103410, 
"start": 1034.1, "end": 1056.1, + "text": " The change in stay like starting from the simple like search like I don''t + know because I am a lot of people like document and a query and everything which + was like some past time now people are creating vector databases and it''s super + sophisticated but still we have search we have recommendations we have like some + order of queries which user expects to receive.", "tokens": [50364, 440, 1319, 294, + 1754, 411, 2891, 490, 264, 2199, 411, 3164, 411, 286, 500, 380, 458, 570, 286, 669, + 257, 688, 295, 561, 411, 4166, 293, 257, 14581, 293, 1203, 597, 390, 411, 512, 1791, + 565, 586, 561, 366, 4084, 8062, 22380, 293, 309, 311, 1687, 16950, 457, 920, 321, + 362, 3164, 321, 362, 10434, 321, 362, 411, 512, 1668, 295, 24109, 597, 4195, 33280, + 281, 4774, 13, 51464], "temperature": 0.0, "avg_logprob": -0.2912605073716905, "compression_ratio": + 1.704225352112676, "no_speech_prob": 0.02891315147280693}, {"id": 95, "seek": 105610, + "start": 1056.1, "end": 1079.1, "text": " I mean user doesn''t expect to receive + some order but it user expects to see the right answer as like closer to him as + possible not like searching for the five pages so for that specifically human evaluation + on top of the implicit signals evaluation like clicking it''s very crucial.", "tokens": + [50364, 286, 914, 4195, 1177, 380, 2066, 281, 4774, 512, 1668, 457, 309, 4195, 33280, + 281, 536, 264, 558, 1867, 382, 411, 4966, 281, 796, 382, 1944, 406, 411, 10808, + 337, 264, 1732, 7183, 370, 337, 300, 4682, 1952, 13344, 322, 1192, 295, 264, 26947, + 12354, 13344, 411, 9697, 309, 311, 588, 11462, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.12389686651397169, "compression_ratio": 1.5786516853932584, "no_speech_prob": + 0.024495892226696014}, {"id": 96, "seek": 107910, "start": 1079.1, "end": 1102.1, + "text": " And I can try to elaborate on that how do you think like from your perspective + like do we need if you like your creative search 
engine do you think we need also + to make you and see the like ranking results or you think that clicks and buys if + we''re talking about equal is enough.", "tokens": [50364, 400, 286, 393, 853, 281, + 20945, 322, 300, 577, 360, 291, 519, 411, 490, 428, 4585, 411, 360, 321, 643, 498, + 291, 411, 428, 5880, 3164, 2848, 360, 291, 519, 321, 643, 611, 281, 652, 291, 293, + 536, 264, 411, 17833, 3542, 420, 291, 519, 300, 18521, 293, 28153, 498, 321, 434, + 1417, 466, 2681, 307, 1547, 13, 51514], "temperature": 0.0, "avg_logprob": -0.2214131970559397, + "compression_ratio": 1.650887573964497, "no_speech_prob": 0.25347405672073364}, + {"id": 97, "seek": 110210, "start": 1103.1, "end": 1129.1, "text": " Well if I play + the interior to play the devil''s advocate you know the data advocate I am the devil''s + advocate here in principle I already have users so they will tell me with the clicks + they won''t with clicks right so I might as well just measure you know the sort + of plot this click through rate or something else and then see what''s going on + right so that will be my probably online metric.", "tokens": [50414, 1042, 498, + 286, 862, 264, 10636, 281, 862, 264, 13297, 311, 14608, 291, 458, 264, 1412, 14608, + 286, 669, 264, 13297, 311, 14608, 510, 294, 8665, 286, 1217, 362, 5022, 370, 436, + 486, 980, 385, 365, 264, 18521, 436, 1582, 380, 365, 18521, 558, 370, 286, 1062, + 382, 731, 445, 3481, 291, 458, 264, 1333, 295, 7542, 341, 2052, 807, 3314, 420, + 746, 1646, 293, 550, 536, 437, 311, 516, 322, 558, 370, 300, 486, 312, 452, 1391, + 2950, 20678, 13, 51714], "temperature": 0.0, "avg_logprob": -0.16351778366986444, + "compression_ratio": 1.8064516129032258, "no_speech_prob": 0.41275790333747864}, + {"id": 98, "seek": 112910, "start": 1130.1, "end": 1140.1, "text": " But I guess + when you when you talk about human labeling you infer that there is an importance + in offline evaluation as well right.", "tokens": [50414, 583, 286, 2041, 562, 291, + 562, 291, 751, 
466, 1952, 40244, 291, 13596, 300, 456, 307, 364, 7379, 294, 21857, + 13344, 382, 731, 558, 13, 50914], "temperature": 0.0, "avg_logprob": -0.1287886903092668, + "compression_ratio": 1.6650717703349283, "no_speech_prob": 0.027376869693398476}, + {"id": 99, "seek": 112910, "start": 1141.1, "end": 1157.1, "text": " Oh yeah I am + like you know is asking this rhetorical question like we need it right right but + actually I can give some motivation behind it actually it''s a very interesting + thing about what human clicks actually mean.", "tokens": [50964, 876, 1338, 286, + 669, 411, 291, 458, 307, 3365, 341, 24182, 284, 804, 1168, 411, 321, 643, 309, 558, + 558, 457, 767, 286, 393, 976, 512, 12335, 2261, 309, 767, 309, 311, 257, 588, 1880, + 551, 466, 437, 1952, 18521, 767, 914, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.1287886903092668, "compression_ratio": 1.6650717703349283, "no_speech_prob": + 0.027376869693398476}, {"id": 100, "seek": 115710, "start": 1157.1, "end": 1183.1, + "text": " We can return to it because recently I had this meet up about biases and + it was also about like human clicking and one of the very interesting talks that + we had at our meet up was about like position guys so the humans are just tend to + click on something that they are offers because they''re taught that the thing that + offered like at the top positions are exactly what they need.", "tokens": [50364, + 492, 393, 2736, 281, 309, 570, 3938, 286, 632, 341, 1677, 493, 466, 32152, 293, + 309, 390, 611, 466, 411, 1952, 9697, 293, 472, 295, 264, 588, 1880, 6686, 300, 321, + 632, 412, 527, 1677, 493, 390, 466, 411, 2535, 1074, 370, 264, 6255, 366, 445, 3928, + 281, 2052, 322, 746, 300, 436, 366, 7736, 570, 436, 434, 5928, 300, 264, 551, 300, + 8059, 411, 412, 264, 1192, 8432, 366, 2293, 437, 436, 643, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.17979175229615804, "compression_ratio": 1.8095238095238095, + "no_speech_prob": 0.005308900494128466}, {"id": 101, "seek": 
118310, "start": 1183.1, + "end": 1196.1, "text": " And they may be dissatisfied with that but they''re just + like you know follow the general way of how search engines and the com search engines + work.", "tokens": [50364, 400, 436, 815, 312, 7802, 38502, 365, 300, 457, 436, 434, + 445, 411, 291, 458, 1524, 264, 2674, 636, 295, 577, 3164, 12982, 293, 264, 395, + 3164, 12982, 589, 13, 51014], "temperature": 0.0, "avg_logprob": -0.21286629977291577, + "compression_ratio": 1.63681592039801, "no_speech_prob": 0.026772646233439445}, + {"id": 102, "seek": 118310, "start": 1197.1, "end": 1209.1, "text": " So technically + online metrics they make a lot of sense of course because like by clicks and by + bias you can predict the most of the behavior and it''s pretty fast and it''s automated.", + "tokens": [51064, 407, 12120, 2950, 16367, 436, 652, 257, 688, 295, 2020, 295, 1164, + 570, 411, 538, 18521, 293, 538, 12577, 291, 393, 6069, 264, 881, 295, 264, 5223, + 293, 309, 311, 1238, 2370, 293, 309, 311, 18473, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.21286629977291577, "compression_ratio": 1.63681592039801, "no_speech_prob": + 0.026772646233439445}, {"id": 103, "seek": 120910, "start": 1209.1, "end": 1238.1, + "text": " I mean like a testing by everybody knows that the one thing that it doesn''t + cover it''s firstly explicit signals like you can you can''t talk by the clicks + and vice as the whole overall about the human satisfaction the satisfaction score + because it''s not like they are explicitly asked in general like do you like this + do you like this search result maybe you wanted something else maybe you want it + more maybe you wanted to be recommended something else.", "tokens": [50364, 286, + 914, 411, 257, 4997, 538, 2201, 3255, 300, 264, 472, 551, 300, 309, 1177, 380, 2060, + 309, 311, 27376, 13691, 12354, 411, 291, 393, 291, 393, 380, 751, 538, 264, 18521, + 293, 11964, 382, 264, 1379, 4787, 466, 264, 1952, 18715, 264, 18715, 6175, 570, + 309, 
311, 406, 411, 436, 366, 20803, 2351, 294, 2674, 411, 360, 291, 411, 341, 360, + 291, 411, 341, 3164, 1874, 1310, 291, 1415, 746, 1646, 1310, 291, 528, 309, 544, + 1310, 291, 1415, 281, 312, 9628, 746, 1646, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.19353615835811314, "compression_ratio": 1.9276595744680851, "no_speech_prob": + 0.11530795693397522}, {"id": 104, "seek": 123910, "start": 1239.1, "end": 1268.1, + "text": " And yeah the other things that sometimes like if we''re like doing some + assumption about the like an A B testing for example that we change some interface + and we''re doing by some clicks and by some assumption sometimes we can by introducing + new features pretty much hurt our product because it''s happening in real life and + user see the changes there like the ranking how it differs now with a new feature + and they''re getting super like.", "tokens": [50364, 400, 1338, 264, 661, 721, 300, + 2171, 411, 498, 321, 434, 411, 884, 512, 15302, 466, 264, 411, 364, 316, 363, 4997, + 337, 1365, 300, 321, 1319, 512, 9226, 293, 321, 434, 884, 538, 512, 18521, 293, + 538, 512, 15302, 2171, 321, 393, 538, 15424, 777, 4122, 1238, 709, 4607, 527, 1674, + 570, 309, 311, 2737, 294, 957, 993, 293, 4195, 536, 264, 2962, 456, 411, 264, 17833, + 577, 309, 37761, 586, 365, 257, 777, 4111, 293, 436, 434, 1242, 1687, 411, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.1786786012871321, "compression_ratio": 1.7975206611570247, + "no_speech_prob": 0.005968737415969372}, {"id": 105, "seek": 126910, "start": 1269.1, + "end": 1298.1, "text": " This satisfy the end humans they''re not like you know + for giving easily they see problems in your search engine they might say yeah i''m + not using it again no thank you so like some reasons of why would apply lately be + better you can experiment much more on it because like all you notice the feature + in zoom that if i''m doing that it''s actually noticing by the neural network and + of course me.", "tokens": 
[50364, 639, 19319, 264, 917, 6255, 436, 434, 406, 411, + 291, 458, 337, 2902, 3612, 436, 536, 2740, 294, 428, 3164, 2848, 436, 1062, 584, + 1338, 741, 478, 406, 1228, 309, 797, 572, 1309, 291, 370, 411, 512, 4112, 295, 983, + 576, 3079, 12881, 312, 1101, 291, 393, 5120, 709, 544, 322, 309, 570, 411, 439, + 291, 3449, 264, 4111, 294, 8863, 300, 498, 741, 478, 884, 300, 309, 311, 767, 21814, + 538, 264, 18161, 3209, 293, 295, 1164, 385, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.3074284059455596, "compression_ratio": 1.6443514644351465, "no_speech_prob": + 0.012394051998853683}, {"id": 106, "seek": 129910, "start": 1299.1, "end": 1314.1, + "text": " That is amazing oh my god okay that yeah we''re really in the like this + is the time of the artificial everything so I got super amused i''m sorry.", "tokens": + [50364, 663, 307, 2243, 1954, 452, 3044, 1392, 300, 1338, 321, 434, 534, 294, 264, + 411, 341, 307, 264, 565, 295, 264, 11677, 1203, 370, 286, 658, 1687, 669, 4717, + 741, 478, 2597, 13, 51114], "temperature": 0.0, "avg_logprob": -0.3574896632014094, + "compression_ratio": 1.2521739130434784, "no_speech_prob": 0.0024125634226948023}, + {"id": 107, "seek": 131410, "start": 1314.1, "end": 1343.1, "text": " Yeah yeah + so firstly with a plainly doing you can definitely experiment more because you can + try different features without higher harm in the end to end users of your system + of your engine and secondly you can check how they''re satisfied what they do like + actually explicitly ask them what do you think about this because when you''re just + the guessing their behavior by their like some implicit things.", "tokens": [50364, + 865, 1338, 370, 27376, 365, 257, 11121, 356, 884, 291, 393, 2138, 5120, 544, 570, + 291, 393, 853, 819, 4122, 1553, 2946, 6491, 294, 264, 917, 281, 917, 5022, 295, + 428, 1185, 295, 428, 2848, 293, 26246, 291, 393, 1520, 577, 436, 434, 11239, 437, + 436, 360, 411, 767, 20803, 1029, 552, 437, 360, 291, 519, 466, 341, 570, 
562, 291, + 434, 445, 264, 17939, 641, 5223, 538, 641, 411, 512, 26947, 721, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.23253170855633623, "compression_ratio": 1.7296137339055795, + "no_speech_prob": 0.09239459782838821}, {"id": 108, "seek": 134410, "start": 1344.1, + "end": 1360.1, "text": " Like where they are is nook and how much they click you + can do much for mistakes because you know as it says we we can''t get into another''s + human head but we can at least try to ask and then that probably will be closer + to the reality.", "tokens": [50364, 1743, 689, 436, 366, 307, 572, 453, 293, 577, + 709, 436, 2052, 291, 393, 360, 709, 337, 8038, 570, 291, 458, 382, 309, 1619, 321, + 321, 393, 380, 483, 666, 1071, 311, 1952, 1378, 457, 321, 393, 412, 1935, 853, 281, + 1029, 293, 550, 300, 1391, 486, 312, 4966, 281, 264, 4103, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.22075437244616056, "compression_ratio": 1.5064102564102564, + "no_speech_prob": 0.00848614051938057}, {"id": 109, "seek": 136010, "start": 1360.1, + "end": 1381.1, "text": " Oh yeah absolutely yeah I mean if we don''t if we so what + you''re saying is that we if because usually in search engines in a way we skip + that step of asking yeah we could integrate some give thumbs up or thumbs down approach + but it might also be still rather implicit and not explaining everything we want + to get.", "tokens": [50364, 876, 1338, 3122, 1338, 286, 914, 498, 321, 500, 380, + 498, 321, 370, 437, 291, 434, 1566, 307, 300, 321, 498, 570, 2673, 294, 3164, 12982, + 294, 257, 636, 321, 10023, 300, 1823, 295, 3365, 1338, 321, 727, 13365, 512, 976, + 8838, 493, 420, 8838, 760, 3109, 457, 309, 1062, 611, 312, 920, 2831, 26947, 293, + 406, 13468, 1203, 321, 528, 281, 483, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.13777703397414265, "compression_ratio": 1.56, "no_speech_prob": 0.1935093253850937}, + {"id": 110, "seek": 138110, "start": 1381.1, "end": 1410.1, "text": " But basically + like you 
know yeah I just remember that I was reading pretty much interesting articles + recently about the recommendation systems and implementing them using them more + about the behaviorism of the people like not just giving them the most popular result + or the most desired by the similar by some features result like in the collaborative + filtering.", "tokens": [50364, 583, 1936, 411, 291, 458, 1338, 286, 445, 1604, 300, + 286, 390, 3760, 1238, 709, 1880, 11290, 3938, 466, 264, 11879, 3652, 293, 18114, + 552, 1228, 552, 544, 466, 264, 5223, 1434, 295, 264, 561, 411, 406, 445, 2902, 552, + 264, 881, 3743, 1874, 420, 264, 881, 14721, 538, 264, 2531, 538, 512, 4122, 1874, + 411, 294, 264, 16555, 30822, 13, 51814], "temperature": 0.0, "avg_logprob": -0.16528424483079177, + "compression_ratio": 1.6759259259259258, "no_speech_prob": 0.0909501314163208}, + {"id": 111, "seek": 141010, "start": 1410.1, "end": 1440.06, "text": " But sometimes + we need to give humans the result that they are not thinking of what they it will + make their like for example health or life better because you know this problem + of recommender systems when you''re like used to clicking on something at some point + recommender system starts offering you the same pool of things you''re kind of stuck + in this and that''s why like using more like sometimes and the offers of this paper + they were suggesting that we need to sometimes.", "tokens": [50364, 583, 2171, 321, + 643, 281, 976, 6255, 264, 1874, 300, 436, 366, 406, 1953, 295, 437, 436, 309, 486, + 652, 641, 411, 337, 1365, 1585, 420, 993, 1101, 570, 291, 458, 341, 1154, 295, 2748, + 260, 3652, 562, 291, 434, 411, 1143, 281, 9697, 322, 746, 412, 512, 935, 2748, 260, + 1185, 3719, 8745, 291, 264, 912, 7005, 295, 721, 291, 434, 733, 295, 5541, 294, + 341, 293, 300, 311, 983, 411, 1228, 544, 411, 2171, 293, 264, 7736, 295, 341, 3035, + 436, 645, 18094, 300, 321, 643, 281, 2171, 13, 51862], "temperature": 0.0, "avg_logprob": + -0.14252355250906437, 
"compression_ratio": 1.8735177865612649, "no_speech_prob": + 0.011035244911909103}, {"id": 112, "seek": 144010, "start": 1440.1, "end": 1468.1, + "text": " As humans explicitly did they like what they were recommended and do they + understand why it was recommended to them and maybe they want to change track of + the recommendations so that''s why we shouldn''t seem just online metrics but yeah + also the second grand reason why apply metrics are good you can experiment without + harm very fast because like online metrics they''re usually taking like two weeks + for something.", "tokens": [50364, 1018, 6255, 20803, 630, 436, 411, 437, 436, 645, + 9628, 293, 360, 436, 1223, 983, 309, 390, 9628, 281, 552, 293, 1310, 436, 528, 281, + 1319, 2837, 295, 264, 10434, 370, 300, 311, 983, 321, 4659, 380, 1643, 445, 2950, + 16367, 457, 1338, 611, 264, 1150, 2697, 1778, 983, 3079, 16367, 366, 665, 291, 393, + 5120, 1553, 6491, 588, 2370, 570, 411, 2950, 16367, 436, 434, 2673, 1940, 411, 732, + 3259, 337, 746, 13, 51764], "temperature": 0.0, "avg_logprob": -0.20727993891789362, + "compression_ratio": 1.7394957983193278, "no_speech_prob": 0.014893779531121254}, + {"id": 113, "seek": 146810, "start": 1468.1, "end": 1490.1, "text": " You need to + wait for the statistical test to make some like results which are significant and + here you can test it much more faster and you can perform even offline a testing + by the applying manual label data which will cost less harm because the real users + won''t see your mistakes.", "tokens": [50364, 509, 643, 281, 1699, 337, 264, 22820, + 1500, 281, 652, 512, 411, 3542, 597, 366, 4776, 293, 510, 291, 393, 1500, 309, 709, + 544, 4663, 293, 291, 393, 2042, 754, 21857, 257, 4997, 538, 264, 9275, 9688, 7645, + 1412, 597, 486, 2063, 1570, 6491, 570, 264, 957, 5022, 1582, 380, 536, 428, 8038, + 13, 51464], "temperature": 0.0, "avg_logprob": -0.21497522551437903, "compression_ratio": + 1.5842696629213484, "no_speech_prob": 0.04778847470879555}, 
{"id": 114, "seek": + 149010, "start": 1490.6999999999998, "end": 1514.5, "text": " Yeah I think this + strikes a chord with me for sure so it''s like you don''t really because there is + always a cost to pay when you go online that you actually deliberately potentially + harming someone someone''s experience to learn whether your contract is it called + contract factual like your change in the algorithm is good a bad.", "tokens": [50394, + 865, 286, 519, 341, 16750, 257, 14137, 365, 385, 337, 988, 370, 309, 311, 411, 291, + 500, 380, 534, 570, 456, 307, 1009, 257, 2063, 281, 1689, 562, 291, 352, 2950, 300, + 291, 767, 23506, 7263, 2233, 2810, 1580, 1580, 311, 1752, 281, 1466, 1968, 428, + 4364, 307, 309, 1219, 4364, 48029, 411, 428, 1319, 294, 264, 9284, 307, 665, 257, + 1578, 13, 51584], "temperature": 0.0, "avg_logprob": -0.2505217879565794, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.08272203058004379}, {"id": 115, "seek": + 151450, "start": 1515.5, "end": 1542.5, "text": " Yeah it reminds me I am sorry + I''m so totally up to the but it reminds me of the when I was working in moderation + of the vertusments here it''s very crucial to make mistake and line because they''re + like two different options otherwise you''re like showing to the end to end users + the advertisements with things which are like you aren''t supposed to show like + drugs.", "tokens": [50414, 865, 309, 12025, 385, 286, 669, 2597, 286, 478, 370, + 3879, 493, 281, 264, 457, 309, 12025, 385, 295, 264, 562, 286, 390, 1364, 294, 49471, + 295, 264, 6509, 301, 1117, 510, 309, 311, 588, 11462, 281, 652, 6146, 293, 1622, + 570, 436, 434, 411, 732, 819, 3956, 5911, 291, 434, 411, 4099, 281, 264, 917, 281, + 917, 5022, 264, 42897, 365, 721, 597, 366, 411, 291, 3212, 380, 3442, 281, 855, + 411, 7766, 13, 51764], "temperature": 0.0, "avg_logprob": -0.2561352162421504, "compression_ratio": + 1.691588785046729, "no_speech_prob": 0.03356930613517761}, {"id": 116, "seek": 154250, + 
"start": 1542.5, "end": 1561.5, "text": " I don''t know some the funeries some yellow + news something that is like dangerous or just like stupid on the other hand if you''re + not letting some like through the moderation some healthy content through your losing + money of the companies which are like having a deal with you and your own.", "tokens": + [50364, 286, 500, 380, 458, 512, 264, 1019, 260, 530, 512, 5566, 2583, 746, 300, + 307, 411, 5795, 420, 445, 411, 6631, 322, 264, 661, 1011, 498, 291, 434, 406, 8295, + 512, 411, 807, 264, 49471, 512, 4627, 2701, 807, 428, 7027, 1460, 295, 264, 3431, + 597, 366, 411, 1419, 257, 2028, 365, 291, 293, 428, 1065, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.19391835322145556, "compression_ratio": 1.6235955056179776, + "no_speech_prob": 0.031152261421084404}, {"id": 117, "seek": 156150, "start": 1561.5, + "end": 1590.5, "text": " So here you need to be very cautious with any online experiments + you pretty much doing everything online or offline and you need to very much monitor + how you''re watching learning algorithms doing the evaluation because environment + changes a lot in like you know in the advertisement world new laws are incoming + very fast people like people are input like they are impressively good at the end + of the day.", "tokens": [50364, 407, 510, 291, 643, 281, 312, 588, 25278, 365, 604, + 2950, 12050, 291, 1238, 709, 884, 1203, 2950, 420, 21857, 293, 291, 643, 281, 588, + 709, 6002, 577, 291, 434, 1976, 2539, 14642, 884, 264, 13344, 570, 2823, 2962, 257, + 688, 294, 411, 291, 458, 294, 264, 31370, 1002, 777, 6064, 366, 22341, 588, 2370, + 561, 411, 561, 366, 4846, 411, 436, 366, 6729, 3413, 665, 412, 264, 917, 295, 264, + 786, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2766393611305638, "compression_ratio": + 1.7413793103448276, "no_speech_prob": 0.3705829977989197}, {"id": 118, "seek": 159150, + "start": 1591.5, "end": 1614.5, "text": " And the way the all of the boundaries + when they 
need to fraud something so imagine every day there are like new existing + algorithms of creating some advertisements which are passing the machine learning + algorithm block blocking the fraud and everything so you need to adapt very fast + and for that you and for a fine experimenting your course need applying data and + applying labeling very much.", "tokens": [50364, 400, 264, 636, 264, 439, 295, 264, + 13180, 562, 436, 643, 281, 14560, 746, 370, 3811, 633, 786, 456, 366, 411, 777, + 6741, 14642, 295, 4084, 512, 42897, 597, 366, 8437, 264, 3479, 2539, 9284, 3461, + 17776, 264, 14560, 293, 1203, 370, 291, 643, 281, 6231, 588, 2370, 293, 337, 300, + 291, 293, 337, 257, 2489, 29070, 428, 1164, 643, 9275, 1412, 293, 9275, 40244, 588, + 709, 13, 51514], "temperature": 0.0, "avg_logprob": -0.3361450566185845, "compression_ratio": + 1.8364485981308412, "no_speech_prob": 0.023027202114462852}, {"id": 119, "seek": + 161450, "start": 1614.5, "end": 1643.5, "text": " Yeah I slowly start to wake up + from my devil''s advocate role so so I should stop being careless and not only rely + on the data that I see in production because in a way it''s like prime time for + my product and I should be careful about it''s not just deploying something once + and it stays there forever and chat GPT takes care of everything.", "tokens": [50364, + 865, 286, 5692, 722, 281, 6634, 493, 490, 452, 13297, 311, 14608, 3090, 370, 370, + 286, 820, 1590, 885, 46187, 293, 406, 787, 10687, 322, 264, 1412, 300, 286, 536, + 294, 4265, 570, 294, 257, 636, 309, 311, 411, 5835, 565, 337, 452, 1674, 293, 286, + 820, 312, 5026, 466, 309, 311, 406, 445, 34198, 746, 1564, 293, 309, 10834, 456, + 5680, 293, 5081, 26039, 51, 2516, 1127, 295, 1203, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.09423558553059896, "compression_ratio": 1.5943396226415094, "no_speech_prob": + 0.19422943890094757}, {"id": 120, "seek": 164350, "start": 1643.5, "end": 1658.5, + "text": " But it actually something that I will 
need to evolve and this is where + the crowdsourcing approach may help me to do more economical more less intrusive + as well this is really good.", "tokens": [50364, 583, 309, 767, 746, 300, 286, 486, + 643, 281, 16693, 293, 341, 307, 689, 264, 26070, 41849, 3109, 815, 854, 385, 281, + 360, 544, 42473, 544, 1570, 560, 13783, 488, 382, 731, 341, 307, 534, 665, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.1403372504494407, "compression_ratio": 1.5469613259668509, + "no_speech_prob": 0.021637098863720894}, {"id": 121, "seek": 164350, "start": 1658.5, + "end": 1668.5, "text": " Let''s maybe try to make it a little bit more concrete + right and let''s emulate let''s play this game.", "tokens": [51114, 961, 311, 1310, + 853, 281, 652, 309, 257, 707, 857, 544, 9859, 558, 293, 718, 311, 45497, 718, 311, + 862, 341, 1216, 13, 51614], "temperature": 0.0, "avg_logprob": -0.1403372504494407, + "compression_ratio": 1.5469613259668509, "no_speech_prob": 0.021637098863720894}, + {"id": 122, "seek": 166850, "start": 1669.5, "end": 1686.5, "text": " Can you verbally + visualize describe let''s let''s say I''m developing a I don''t know flower search + engine I don''t want to say ecommerce I don''t want to say something specific let''s + say I''m searching I''m offering flowers and I would like to search them.", "tokens": + [50414, 1664, 291, 48162, 23273, 6786, 718, 311, 718, 311, 584, 286, 478, 6416, + 257, 286, 500, 380, 458, 8617, 3164, 2848, 286, 500, 380, 528, 281, 584, 308, 26926, + 286, 500, 380, 528, 281, 584, 746, 2685, 718, 311, 584, 286, 478, 10808, 286, 478, + 8745, 8085, 293, 286, 576, 411, 281, 3164, 552, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.22194639841715494, "compression_ratio": 1.7073170731707317, "no_speech_prob": + 0.40654832124710083}, {"id": 123, "seek": 166850, "start": 1686.5, "end": 1691.5, + "text": " I will use her to search them.", "tokens": [51264, 286, 486, 764, 720, + 281, 3164, 552, 13, 51514], "temperature": 0.0, "avg_logprob": 
-0.22194639841715494, + "compression_ratio": 1.7073170731707317, "no_speech_prob": 0.40654832124710083}, + {"id": 124, "seek": 169150, "start": 1691.5, "end": 1720.5, "text": " I guess can + you propose sort of a framework of thought how I should approach the crowdsourcing + so let''s say what should I focus on can I choose a metric of line metric that you + will recommend would you like to you know do you think that there is some specific + thing that I could try to connect with my business goals like a metric that will + be reflective of my business goals or would you start with something just something + like I don''t know in DC geo.", "tokens": [50364, 286, 2041, 393, 291, 17421, 1333, + 295, 257, 8388, 295, 1194, 577, 286, 820, 3109, 264, 26070, 41849, 370, 718, 311, + 584, 437, 820, 286, 1879, 322, 393, 286, 2826, 257, 20678, 295, 1622, 20678, 300, + 291, 486, 2748, 576, 291, 411, 281, 291, 458, 360, 291, 519, 300, 456, 307, 512, + 2685, 551, 300, 286, 727, 853, 281, 1745, 365, 452, 1606, 5493, 411, 257, 20678, + 300, 486, 312, 28931, 295, 452, 1606, 5493, 420, 576, 291, 722, 365, 746, 445, 746, + 411, 286, 500, 380, 458, 294, 9114, 43198, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.15838029980659485, "compression_ratio": 1.819277108433735, "no_speech_prob": + 0.3393620252609253}, {"id": 125, "seek": 172050, "start": 1720.5, "end": 1726.5, + "text": " Whatever and go from there.", "tokens": [50364, 8541, 293, 352, 490, 456, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.24748198769309304, "compression_ratio": + 1.5741935483870968, "no_speech_prob": 0.06876444071531296}, {"id": 126, "seek": + 172050, "start": 1726.5, "end": 1746.5, "text": " It''s a very very very long topic + to discuss but less starts from somewhere so in my perspective like of course like + you''re doing your engine it''s in some point of course you''re implementing some + like online labeling.", "tokens": [50664, 467, 311, 257, 588, 588, 588, 938, 4829, + 281, 2248, 457, 1570, 
3719, 490, 4079, 370, 294, 452, 4585, 411, 295, 1164, 411, + 291, 434, 884, 428, 2848, 309, 311, 294, 512, 935, 295, 1164, 291, 434, 18114, 512, + 411, 2950, 40244, 13, 51664], "temperature": 0.0, "avg_logprob": -0.24748198769309304, + "compression_ratio": 1.5741935483870968, "no_speech_prob": 0.06876444071531296}, + {"id": 127, "seek": 174650, "start": 1746.5, "end": 1772.5, "text": " So you can + find a lot of things like online evaluation and like you have some somewhere to + start and here you come into the position where you need to do some of line labeling + so there is like these I would say like a circle which like in the parts of which + you can like you can depict your your pipeline pipeline as a circle which goes in + infinity and it has the several parts.", "tokens": [50364, 407, 291, 393, 915, 257, + 688, 295, 721, 411, 2950, 13344, 293, 411, 291, 362, 512, 4079, 281, 722, 293, 510, + 291, 808, 666, 264, 2535, 689, 291, 643, 281, 360, 512, 295, 1622, 40244, 370, 456, + 307, 411, 613, 286, 576, 584, 411, 257, 6329, 597, 411, 294, 264, 3166, 295, 597, + 291, 393, 411, 291, 393, 31553, 428, 428, 15517, 15517, 382, 257, 6329, 597, 1709, + 294, 13202, 293, 309, 575, 264, 2940, 3166, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.40516417703510804, "compression_ratio": 1.8805970149253732, "no_speech_prob": + 0.3636115789413452}, {"id": 128, "seek": 177250, "start": 1772.5, "end": 1796.5, + "text": " When you''re deciding like about like how you want to perform your ranking + what is your ranking means what do you want to show the most what is your like how + many positions people do see what do you want them to see the first what is relevant + what is around and you''re like selecting some end to end metric that you''re going + to use.", "tokens": [50364, 1133, 291, 434, 17990, 411, 466, 411, 577, 291, 528, + 281, 2042, 428, 17833, 437, 307, 428, 17833, 1355, 437, 360, 291, 528, 281, 855, + 264, 881, 437, 307, 428, 411, 577, 867, 8432, 561, 360, 536, 
437, 360, 291, 528, + 552, 281, 536, 264, 700, 437, 307, 7340, 437, 307, 926, 293, 291, 434, 411, 18182, + 512, 917, 281, 917, 20678, 300, 291, 434, 516, 281, 764, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.11335725653661441, "compression_ratio": 1.9132947976878614, + "no_speech_prob": 0.17684605717658997}, {"id": 129, "seek": 179650, "start": 1796.5, + "end": 1825.5, "text": " There are like usually some popular metrics you notice + the like and this is all like this this cumulative gain metrics is very popular + and nice way to start there and there are like even like more simple ones just evaluating + about the precision and recall of your position of the elements arising in your + like ranking list resulting and there can be even more sophisticated approaches + like.", "tokens": [50364, 821, 366, 411, 2673, 512, 3743, 16367, 291, 3449, 264, + 411, 293, 341, 307, 439, 411, 341, 341, 38379, 6052, 16367, 307, 588, 3743, 293, + 1481, 636, 281, 722, 456, 293, 456, 366, 411, 754, 411, 544, 2199, 2306, 445, 27479, + 466, 264, 18356, 293, 9901, 295, 428, 2535, 295, 264, 4959, 44900, 294, 428, 411, + 17833, 1329, 16505, 293, 456, 393, 312, 754, 544, 16950, 11587, 411, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.2525428353923641, "compression_ratio": 1.8055555555555556, + "no_speech_prob": 0.020200498402118683}, {"id": 130, "seek": 182550, "start": 1825.5, + "end": 1854.5, "text": " Like expected reciprocals around for example metrics if + you heard of it it''s like more cascade approach because you know that people are + not clicking through after some certain position but think we''re talking about + flowers I would say it''s like it''s more about like image search simple one which + has like some certain type of definitive answers and it''s not like people are going + to it''s like with search in some items when you''re like finding what you desire + and then you are not.", "tokens": [50364, 1743, 5176, 28961, 66, 1124, 926, 337, + 1365, 16367, 498, 
291, 2198, 295, 309, 309, 311, 411, 544, 50080, 3109, 570, 291, + 458, 300, 561, 366, 406, 9697, 807, 934, 512, 1629, 2535, 457, 519, 321, 434, 1417, + 466, 8085, 286, 576, 584, 309, 311, 411, 309, 311, 544, 466, 411, 3256, 3164, 2199, + 472, 597, 575, 411, 512, 1629, 2010, 295, 28152, 6338, 293, 309, 311, 406, 411, + 561, 366, 516, 281, 309, 311, 411, 365, 3164, 294, 512, 4754, 562, 291, 434, 411, + 5006, 437, 291, 7516, 293, 550, 291, 366, 406, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.14767529039966817, "compression_ratio": 1.8120300751879699, "no_speech_prob": + 0.04801874980330467}, {"id": 131, "seek": 185550, "start": 1855.5, "end": 1873.5, + "text": " It''s scrolling down maybe with flowers you just want to see so I would + say I mean see them like download them or something so I would say this is pretty + good at the beginning as a basis and then you can adopt this metrics based on what + are you really interested in maybe you have some advertisements on some of the flowers + for something.", "tokens": [50364, 467, 311, 29053, 760, 1310, 365, 8085, 291, 445, + 528, 281, 536, 370, 286, 576, 584, 286, 914, 536, 552, 411, 5484, 552, 420, 746, + 370, 286, 576, 584, 341, 307, 1238, 665, 412, 264, 2863, 382, 257, 5143, 293, 550, + 291, 393, 6878, 341, 16367, 2361, 322, 437, 366, 291, 534, 3102, 294, 1310, 291, + 362, 512, 42897, 322, 512, 295, 264, 8085, 337, 746, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.2161729570845483, "compression_ratio": 1.7106598984771573, + "no_speech_prob": 0.013198391534388065}, {"id": 132, "seek": 187350, "start": 1874.5, + "end": 1902.5, "text": " Then the next part after you define what are you want to + do for example view like which metric you want to evaluate what do you want to see + like what you want to compare you think about like what do you need for human labeling + how do you need to sample data what will be the result how you need to aggregate + it and how do you need to like use this information in 
your product.", "tokens": + [50414, 1396, 264, 958, 644, 934, 291, 6964, 437, 366, 291, 528, 281, 360, 337, + 1365, 1910, 411, 597, 20678, 291, 528, 281, 13059, 437, 360, 291, 528, 281, 536, + 411, 437, 291, 528, 281, 6794, 291, 519, 466, 411, 437, 360, 291, 643, 337, 1952, + 40244, 577, 360, 291, 643, 281, 6889, 1412, 437, 486, 312, 264, 1874, 577, 291, + 643, 281, 26118, 309, 293, 577, 360, 291, 643, 281, 411, 764, 341, 1589, 294, 428, + 1674, 13, 51814], "temperature": 0.0, "avg_logprob": -0.10744467014219702, "compression_ratio": + 1.9381443298969072, "no_speech_prob": 0.08275733143091202}, {"id": 133, "seek": + 190250, "start": 1902.5, "end": 1931.5, "text": " And then it comes like for example + for in this G you usually need some ideal ranking to compare your ranking to so + here comes exactly the crowdsourcing the manually because you can gather this ideal + ranking from them and then do a comparison on the your real search engine answers + so okay we define the goal we want an ideal ranking of flowers by the query and + not one query because like I''m not going to do it.", "tokens": [50364, 400, 550, + 309, 1487, 411, 337, 1365, 337, 294, 341, 460, 291, 2673, 643, 512, 7157, 17833, + 281, 6794, 428, 17833, 281, 370, 510, 1487, 2293, 264, 26070, 41849, 264, 16945, + 570, 291, 393, 5448, 341, 7157, 17833, 490, 552, 293, 550, 360, 257, 9660, 322, + 264, 428, 957, 3164, 2848, 6338, 370, 1392, 321, 6964, 264, 3387, 321, 528, 364, + 7157, 17833, 295, 8085, 538, 264, 14581, 293, 406, 472, 14581, 570, 411, 286, 478, + 406, 516, 281, 360, 309, 13, 51814], "temperature": 0.0, "avg_logprob": -0.3342220530790441, + "compression_ratio": 1.84304932735426, "no_speech_prob": 0.006404465530067682}, + {"id": 134, "seek": 193250, "start": 1932.5, "end": 1961.5, "text": " For example + just one queries kind of I don''t know super simple and you want to evaluate it + in general so here comes the sampling and how you can approach sampling of your + queries and the results of 
your search engine can be very different you can just + try to sample the most popular flowers and queries but it''s usually not the best + approach just because like the most popular queries are usually very well handled + and they are very simple.", "tokens": [50364, 1171, 1365, 445, 472, 24109, 733, + 295, 286, 500, 380, 458, 1687, 2199, 293, 291, 528, 281, 13059, 309, 294, 2674, + 370, 510, 1487, 264, 21179, 293, 577, 291, 393, 3109, 21179, 295, 428, 24109, 293, + 264, 3542, 295, 428, 3164, 2848, 393, 312, 588, 819, 291, 393, 445, 853, 281, 6889, + 264, 881, 3743, 8085, 293, 24109, 457, 309, 311, 2673, 406, 264, 1151, 3109, 445, + 570, 411, 264, 881, 3743, 24109, 366, 2673, 588, 731, 18033, 293, 436, 366, 588, + 2199, 13, 51814], "temperature": 0.0, "avg_logprob": -0.08180467013655038, "compression_ratio": + 1.8487394957983194, "no_speech_prob": 0.00217940891161561}, {"id": 135, "seek": + 196250, "start": 1962.5, "end": 1991.5, "text": " Because like when the people are + marching in the in their disayers it means that it''s not a very complicated thing + so like what very like a huge tail of very rare queries which you also want to consider + I guess in evaluation in ideal ranking so here comes like two techniques for example + reserve works sampling or even like stratified sampling I would say I must recommend + using stratified sampling adopted by like your own.", "tokens": [50364, 1436, 411, + 562, 264, 561, 366, 30523, 294, 264, 294, 641, 717, 320, 433, 309, 1355, 300, 309, + 311, 406, 257, 588, 6179, 551, 370, 411, 437, 588, 411, 257, 2603, 6838, 295, 588, + 5892, 24109, 597, 291, 611, 528, 281, 1949, 286, 2041, 294, 13344, 294, 7157, 17833, + 370, 510, 1487, 411, 732, 7512, 337, 1365, 17824, 1985, 21179, 420, 754, 411, 23674, + 2587, 21179, 286, 576, 584, 286, 1633, 2748, 1228, 23674, 2587, 21179, 12175, 538, + 411, 428, 1065, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2953629886402803, + "compression_ratio": 1.7377049180327868, "no_speech_prob": 
0.0011865663109347224}, + {"id": 136, "seek": 199250, "start": 1992.5, "end": 2021.5, "text": " So you can + use the same method as the one you are using in the list of the situation in your + own needs this one allows to like to very like shortly explain it without like the + deep is just you have your own data the whole amount of the queries with their like + how often they''re asked how popular they are and you are doing like some bins of + them based on the popularity and you try to sample equally from the each beam but + these beans are different sized based on the general popularity.", "tokens": [50364, + 407, 291, 393, 764, 264, 912, 3170, 382, 264, 472, 291, 366, 1228, 294, 264, 1329, + 295, 264, 2590, 294, 428, 1065, 2203, 341, 472, 4045, 281, 411, 281, 588, 411, 13392, + 2903, 309, 1553, 411, 264, 2452, 307, 445, 291, 362, 428, 1065, 1412, 264, 1379, + 2372, 295, 264, 24109, 365, 641, 411, 577, 2049, 436, 434, 2351, 577, 3743, 436, + 366, 293, 291, 366, 884, 411, 512, 41275, 295, 552, 2361, 322, 264, 19301, 293, + 291, 853, 281, 6889, 12309, 490, 264, 1184, 14269, 457, 613, 12010, 366, 819, 20004, + 2361, 322, 264, 2674, 19301, 13, 51814], "temperature": 0.0, "avg_logprob": -0.5945977692556853, + "compression_ratio": 1.8517110266159695, "no_speech_prob": 0.021445050835609436}, + {"id": 137, "seek": 202250, "start": 2022.5, "end": 2050.5, "text": " So we have + the kind of like you''re kind of modeling the distribution of the data in your engine + by sampling like this so after you have this data samples you need to think how + to present it to like manual labor what do you want to ask you like you want some + ideal ranking yes and there is an option to give them like for example query and", + "tokens": [50364, 407, 321, 362, 264, 733, 295, 411, 291, 434, 733, 295, 15983, + 264, 7316, 295, 264, 1412, 294, 428, 2848, 538, 21179, 411, 341, 370, 934, 291, + 362, 341, 1412, 10938, 291, 643, 281, 519, 577, 281, 1974, 309, 281, 411, 9688, + 5938, 437, 360, 291, 
528, 281, 1029, 291, 411, 291, 528, 512, 7157, 17833, 2086, + 293, 456, 307, 364, 3614, 281, 976, 552, 411, 337, 1365, 14581, 293, 51764], "temperature": + 0.0, "avg_logprob": -0.22047872801084775, "compression_ratio": 1.7696335078534031, + "no_speech_prob": 0.003858267329633236}, {"id": 138, "seek": 205050, "start": 2050.5, + "end": 2065.5, "text": " the like the first 10 or 20 results that your engine returns + and it depends like how many results depends on the click through rate which you + can like for example estimate my", "tokens": [50364, 264, 411, 264, 700, 1266, 420, + 945, 3542, 300, 428, 2848, 11247, 293, 309, 5946, 411, 577, 867, 3542, 5946, 322, + 264, 2052, 807, 3314, 597, 291, 393, 411, 337, 1365, 12539, 452, 51114], "temperature": + 0.0, "avg_logprob": -0.14160170426239838, "compression_ratio": 1.4262295081967213, + "no_speech_prob": 0.04205191880464554}, {"id": 139, "seek": 206550, "start": 2065.5, + "end": 2081.5, "text": " you have a data about how users click how far they click + in your like length of your search results and you can estimate that after like + I don''t know 15th position it''s not interesting usually to anyone so you cannot + worry about it very much.", "tokens": [50364, 291, 362, 257, 1412, 466, 577, 5022, + 2052, 577, 1400, 436, 2052, 294, 428, 411, 4641, 295, 428, 3164, 3542, 293, 291, + 393, 12539, 300, 934, 411, 286, 500, 380, 458, 2119, 392, 2535, 309, 311, 406, 1880, + 2673, 281, 2878, 370, 291, 2644, 3292, 466, 309, 588, 709, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.11413329618948477, "compression_ratio": 1.5031055900621118, + "no_speech_prob": 0.14669951796531677}, {"id": 140, "seek": 208150, "start": 2081.5, + "end": 2110.5, "text": " But here like you see if you''re giving the whole list + to end user in crowdsourcing and saying okay rank needs from like the most relevant + list relevant as I was saying before the composition is very important and this + task of ranking is very hard because even like I am 
having like some degree like + I have much worse and masters I think I am generally like a dukega person to some + extent I am not sure you know but", "tokens": [50364, 583, 510, 411, 291, 536, 498, + 291, 434, 2902, 264, 1379, 1329, 281, 917, 4195, 294, 26070, 41849, 293, 1566, 1392, + 6181, 2203, 490, 411, 264, 881, 7340, 1329, 7340, 382, 286, 390, 1566, 949, 264, + 12686, 307, 588, 1021, 293, 341, 5633, 295, 17833, 307, 588, 1152, 570, 754, 411, + 286, 669, 1419, 411, 512, 4314, 411, 286, 362, 709, 5324, 293, 19294, 286, 519, + 286, 669, 5101, 411, 257, 1581, 330, 3680, 954, 281, 512, 8396, 286, 669, 406, 988, + 291, 458, 457, 51814], "temperature": 0.0, "avg_logprob": -0.22696685791015625, + "compression_ratio": 1.7394957983193278, "no_speech_prob": 0.25857672095298767}, + {"id": 141, "seek": 211050, "start": 2110.5, "end": 2126.5, "text": " if somebody + says to me okay this is the flower like this there is a 15 pictures can you please + like rank them from the most suitable to the list suitable I would be like oh my + god I can''t do that because it''s too much so", "tokens": [50364, 498, 2618, 1619, + 281, 385, 1392, 341, 307, 264, 8617, 411, 341, 456, 307, 257, 2119, 5242, 393, 291, + 1767, 411, 6181, 552, 490, 264, 881, 12873, 281, 264, 1329, 12873, 286, 576, 312, + 411, 1954, 452, 3044, 286, 393, 380, 360, 300, 570, 309, 311, 886, 709, 370, 51164], + "temperature": 0.0, "avg_logprob": -0.14955833722960274, "compression_ratio": 1.476510067114094, + "no_speech_prob": 0.04592747986316681}, {"id": 142, "seek": 212650, "start": 2126.5, + "end": 2155.5, "text": " there''s like other approaches either like taking a specific + item which returned by your system taking a query and answering are they like relevant + or irrelevant together is it a matching or a matching pair it''s much simpler it''s + very good understandable by the crowd but the problem is that here you can''t kind + of compare items with the same relevance because like it says like okay", 
"tokens": + [50414, 456, 311, 411, 661, 11587, 2139, 411, 1940, 257, 2685, 3174, 597, 8752, + 538, 428, 1185, 1940, 257, 14581, 293, 13430, 366, 436, 411, 7340, 420, 28682, 1214, + 307, 309, 257, 14324, 420, 257, 14324, 6119, 309, 311, 709, 18587, 309, 311, 588, + 665, 25648, 538, 264, 6919, 457, 264, 1154, 307, 300, 510, 291, 393, 380, 733, 295, + 6794, 4754, 365, 264, 912, 32684, 570, 411, 309, 1619, 411, 1392, 51814], "temperature": + 0.0, "avg_logprob": -0.16114597062806826, "compression_ratio": 1.7522935779816513, + "no_speech_prob": 0.2652696669101715}, {"id": 143, "seek": 215650, "start": 2156.5, + "end": 2173.5, "text": " this relevant and this relevant and you''re like okay what + should I put on top this one or this one you can ask people to give like some percentage + of relevancy from their head but still different people think differently so it''s + kind of hard very much to aggregate the results.", "tokens": [50364, 341, 7340, + 293, 341, 7340, 293, 291, 434, 411, 1392, 437, 820, 286, 829, 322, 1192, 341, 472, + 420, 341, 472, 291, 393, 1029, 561, 281, 976, 411, 512, 9668, 295, 25916, 6717, + 490, 641, 1378, 457, 920, 819, 561, 519, 7614, 370, 309, 311, 733, 295, 1152, 588, + 709, 281, 26118, 264, 3542, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1408216832047802, + "compression_ratio": 1.6294117647058823, "no_speech_prob": 0.008736278861761093}, + {"id": 144, "seek": 217350, "start": 2173.5, "end": 2201.5, "text": " The most nice + approach I would say would be a pairwise comparisons so you''re like giving a query + you''re giving two answers and you says okay what''s what''s what''s you what''s + you better and then by this pairwise comparisons you can do a whole ranking then + by aggregating this pairwise comparisons in the manner of the list with like from + the most relevant part to the list relevant and of course", "tokens": [50364, 440, + 881, 1481, 3109, 286, 576, 584, 576, 312, 257, 6119, 3711, 33157, 370, 291, 434, + 411, 2902, 257, 
14581, 291, 434, 2902, 732, 6338, 293, 291, 1619, 1392, 437, 311, + 437, 311, 437, 311, 291, 437, 311, 291, 1101, 293, 550, 538, 341, 6119, 3711, 33157, + 291, 393, 360, 257, 1379, 17833, 550, 538, 16743, 990, 341, 6119, 3711, 33157, 294, + 264, 9060, 295, 264, 1329, 365, 411, 490, 264, 881, 7340, 644, 281, 264, 1329, 7340, + 293, 295, 1164, 51764], "temperature": 0.0, "avg_logprob": -0.20456910974839154, + "compression_ratio": 1.935960591133005, "no_speech_prob": 0.012534006498754025}, + {"id": 145, "seek": 220150, "start": 2201.5, "end": 2215.5, "text": " If you''re + doing this pairwise comparisons honestly like how it''s supposed to be it''s n squared + the amount of entities which is like a tons of entities so usually our suggestion + is to do like more in the", "tokens": [50364, 759, 291, 434, 884, 341, 6119, 3711, + 33157, 6095, 411, 577, 309, 311, 3442, 281, 312, 309, 311, 297, 8889, 264, 2372, + 295, 16667, 597, 307, 411, 257, 9131, 295, 16667, 370, 2673, 527, 16541, 307, 281, + 360, 411, 544, 294, 264, 51064], "temperature": 0.0, "avg_logprob": -0.19892464513364044, + "compression_ratio": 1.4744525547445255, "no_speech_prob": 0.004522436764091253}, + {"id": 146, "seek": 221550, "start": 2215.5, "end": 2233.5, "text": " weak sort + or like other sort manner with n log n so like doing a hard estimation of this pairwise + comparisons sampling a little bit less but still you can like have in the end the + pretty pretty good estimate it like ranking list.", "tokens": [50364, 5336, 1333, + 420, 411, 661, 1333, 9060, 365, 297, 3565, 297, 370, 411, 884, 257, 1152, 35701, + 295, 341, 6119, 3711, 33157, 21179, 257, 707, 857, 1570, 457, 920, 291, 393, 411, + 362, 294, 264, 917, 264, 1238, 1238, 665, 12539, 309, 411, 17833, 1329, 13, 51264], + "temperature": 0.0, "avg_logprob": -0.20536237716674804, "compression_ratio": 1.6013986013986015, + "no_speech_prob": 0.11024225503206253}, {"id": 147, "seek": 223350, "start": 2233.9, + "end": 2257.02, "text": " So you create 
this assignment you have this pairwise comparisons + you have the results you can estimate the quality of results based by how like this + particular user is good with this particular assignments and then you''re aggregating + it there are like some models then you can use for aggregating for example like + mathematical models statistical models like for example,", "tokens": [50384, 407, + 291, 1884, 341, 15187, 291, 362, 341, 6119, 3711, 33157, 291, 362, 264, 3542, 291, + 393, 12539, 264, 3125, 295, 3542, 2361, 538, 577, 411, 341, 1729, 4195, 307, 665, + 365, 341, 1729, 22546, 293, 550, 291, 434, 16743, 990, 309, 456, 366, 411, 512, + 5245, 550, 291, 393, 764, 337, 16743, 990, 337, 1365, 411, 18894, 5245, 22820, 5245, + 411, 337, 1365, 11, 51540], "temperature": 0.0, "avg_logprob": -0.17941074094910553, + "compression_ratio": 1.9270833333333333, "no_speech_prob": 0.04827868565917015}, + {"id": 148, "seek": 225702, "start": 2257.02, "end": 2269.02, "text": " red literary + or something we did it actually in our crowd kids it''s a thing for pretty much + we tried to do an open library in Python for every type of crowdsourcing", "tokens": + [50364, 2182, 24194, 420, 746, 321, 630, 309, 767, 294, 527, 6919, 2301, 309, 311, + 257, 551, 337, 1238, 709, 321, 3031, 281, 360, 364, 1269, 6405, 294, 15329, 337, + 633, 2010, 295, 26070, 41849, 50964], "temperature": 0.0, "avg_logprob": -0.31057712009974886, + "compression_ratio": 1.7577092511013215, "no_speech_prob": 0.018470551818609238}, + {"id": 149, "seek": 225702, "start": 2269.02, "end": 2286.94, "text": " annotations + not only to local ones so you can implement it yourself for example take some library + even hours and then you got your ideal ranking as you desired and you can compute + the metrics like compared to your ideal ranking so how", "tokens": [50964, 25339, + 763, 406, 787, 281, 2654, 2306, 370, 291, 393, 4445, 309, 1803, 337, 1365, 747, + 512, 6405, 754, 2496, 293, 550, 291, 658, 428, 7157, 17833, 382, 291, 
14721, 293, + 291, 393, 14722, 264, 16367, 411, 5347, 281, 428, 7157, 17833, 370, 577, 51860], + "temperature": 0.0, "avg_logprob": -0.31057712009974886, "compression_ratio": 1.7577092511013215, + "no_speech_prob": 0.018470551818609238}, {"id": 150, "seek": 228694, "start": 2286.94, + "end": 2307.42, "text": " good your search engine returns like on big samples and + these samples how good are the results of these flowers how relevant they are and + then you see like what is the overall result it might be good or not very good if + it not very good you for example can select some domains when you see the most mistakes + and try to", "tokens": [50364, 665, 428, 3164, 2848, 11247, 411, 322, 955, 10938, + 293, 613, 10938, 577, 665, 366, 264, 3542, 295, 613, 8085, 577, 7340, 436, 366, + 293, 550, 291, 536, 411, 437, 307, 264, 4787, 1874, 309, 1062, 312, 665, 420, 406, + 588, 665, 498, 309, 406, 588, 665, 291, 337, 1365, 393, 3048, 512, 25514, 562, 291, + 536, 264, 881, 8038, 293, 853, 281, 51388], "temperature": 0.0, "avg_logprob": -0.17742101470036292, + "compression_ratio": 1.8171428571428572, "no_speech_prob": 0.0107498150318861}, + {"id": 151, "seek": 230742, "start": 2307.42, "end": 2336.42, "text": " like ask + the crowd in some separate projects the main wise like where are the mistakes exactly + maybe you have problems with like defining the color of this flower or maybe you + have problems with like good lightning on the photos and you can figure out what + is exactly the problem and yeah you can use this manual labeling firstly for evaluating + the metrics from time to time and to see how your", "tokens": [50364, 411, 1029, + 264, 6919, 294, 512, 4994, 4455, 264, 2135, 10829, 411, 689, 366, 264, 8038, 2293, + 1310, 291, 362, 2740, 365, 411, 17827, 264, 2017, 295, 341, 8617, 420, 1310, 291, + 362, 2740, 365, 411, 665, 16589, 322, 264, 5787, 293, 291, 393, 2573, 484, 437, + 307, 2293, 264, 1154, 293, 1338, 291, 393, 764, 341, 9688, 40244, 27376, 337, 27479, + 
264, 16367, 490, 565, 281, 565, 293, 281, 536, 577, 428, 51814], "temperature": + 0.0, "avg_logprob": -0.16330799499115387, "compression_ratio": 1.8761904761904762, + "no_speech_prob": 0.00840856321156025}, {"id": 152, "seek": 233642, "start": 2336.42, + "end": 2352.42, "text": " search engine improves with like including new features + and changing the search in algorithms and you can also train on this manually labeled + data your ML models which perform ranking so I would say it like it works kind of + like this.", "tokens": [50364, 3164, 2848, 24771, 365, 411, 3009, 777, 4122, 293, + 4473, 264, 3164, 294, 14642, 293, 291, 393, 611, 3847, 322, 341, 16945, 21335, 1412, + 428, 21601, 5245, 597, 2042, 17833, 370, 286, 576, 584, 309, 411, 309, 1985, 733, + 295, 411, 341, 13, 51164], "temperature": 0.0, "avg_logprob": -0.20787811279296875, + "compression_ratio": 1.481012658227848, "no_speech_prob": 0.022227052599191666}, + {"id": 153, "seek": 235242, "start": 2352.42, "end": 2370.42, "text": " Yeah this + is this is great it does sound like a very structured process what you explained + but but I do want to drill into maybe couple of specifics so one is I believe in + DCG is is definitely I think it''s a browser that", "tokens": [50364, 865, 341, + 307, 341, 307, 869, 309, 775, 1626, 411, 257, 588, 18519, 1399, 437, 291, 8825, + 457, 457, 286, 360, 528, 281, 11392, 666, 1310, 1916, 295, 28454, 370, 472, 307, + 286, 1697, 294, 9114, 38, 307, 307, 2138, 286, 519, 309, 311, 257, 11185, 300, 51264], + "temperature": 0.0, "avg_logprob": -0.20850996877632888, "compression_ratio": 1.4012738853503184, + "no_speech_prob": 0.4956706464290619}, {"id": 154, "seek": 237042, "start": 2371.42, + "end": 2397.42, "text": " In principle if I was communicating this to some management + in my team I could say that yesterday we were at 75% and today we are 76% so we + are improving right and this is on a percent scale if I remove the letter and from + this formula then this becomes like 
an absolute scale and there is no way to tell + are we progressing or are we regressing.", "tokens": [50414, 682, 8665, 498, 286, + 390, 17559, 341, 281, 512, 4592, 294, 452, 1469, 286, 727, 584, 300, 5186, 321, + 645, 412, 9562, 4, 293, 965, 321, 366, 24733, 4, 370, 321, 366, 11470, 558, 293, + 341, 307, 322, 257, 3043, 4373, 498, 286, 4159, 264, 5063, 293, 490, 341, 8513, + 550, 341, 3643, 411, 364, 8236, 4373, 293, 456, 307, 572, 636, 281, 980, 366, 321, + 36305, 420, 366, 321, 1121, 18605, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.1579043029190658, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.11258716881275177}, {"id": 155, "seek": 239742, "start": 2397.42, "end": 2422.42, + "text": " But at the same time again wearing my devil''s advocate suit here for + a moment and DCG has a problem that if let''s say I have a scale of labels from + zero to three right so zero one two three so zero meaning completely relevant result + and three meaning completely relevant perfect result if I", "tokens": [50364, 583, + 412, 264, 912, 565, 797, 4769, 452, 13297, 311, 14608, 5722, 510, 337, 257, 1623, + 293, 9114, 38, 575, 257, 1154, 300, 498, 718, 311, 584, 286, 362, 257, 4373, 295, + 16949, 490, 4018, 281, 1045, 558, 370, 4018, 472, 732, 1045, 370, 4018, 3620, 2584, + 7340, 1874, 293, 1045, 3620, 2584, 7340, 2176, 1874, 498, 286, 51614], "temperature": + 0.0, "avg_logprob": -0.17236169692008727, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.013033497147262096}, {"id": 156, "seek": 242242, "start": 2423.42, + "end": 2449.42, "text": " receive two ratings one with all all ones right so all + ones and the other one is with all three so all ones it''s kind of like a suboptimal + result nothing better in the list but at the same time not perfect and the other + one is absolutely perfect and DCG yields exactly same number because if only if + we", "tokens": [50414, 4774, 732, 24603, 472, 365, 439, 439, 2306, 558, 370, 439, + 2306, 293, 
264, 661, 472, 307, 365, 439, 1045, 370, 439, 2306, 309, 311, 733, 295, + 411, 257, 1422, 5747, 10650, 1874, 1825, 1101, 294, 264, 1329, 457, 412, 264, 912, + 565, 406, 2176, 293, 264, 661, 472, 307, 3122, 2176, 293, 9114, 38, 32168, 2293, + 912, 1230, 570, 498, 787, 498, 321, 51714], "temperature": 0.0, "avg_logprob": -0.14462030635160558, + "compression_ratio": 1.745664739884393, "no_speech_prob": 0.08547800034284592}, + {"id": 157, "seek": 244942, "start": 2450.42, "end": 2452.42, "text": " You rightly + mentioned about", "tokens": [50414, 509, 32879, 2835, 466, 50514], "temperature": + 0.0, "avg_logprob": -0.17479648897724767, "compression_ratio": 1.5217391304347827, + "no_speech_prob": 0.032472848892211914}, {"id": 158, "seek": 244942, "start": 2454.42, + "end": 2477.42, "text": " optimal perfect ranking so if my perfect ranking equals + in lens exactly the visible labeled area than the formal and DCG will yield 100% + in both cases and this is kind of like a problem and you touched on this in that + part where you say that we need to", "tokens": [50614, 16252, 2176, 17833, 370, + 498, 452, 2176, 17833, 6915, 294, 6765, 2293, 264, 8974, 21335, 1859, 813, 264, + 9860, 293, 9114, 38, 486, 11257, 2319, 4, 294, 1293, 3331, 293, 341, 307, 733, 295, + 411, 257, 1154, 293, 291, 9828, 322, 341, 294, 300, 644, 689, 291, 584, 300, 321, + 643, 281, 51764], "temperature": 0.0, "avg_logprob": -0.17479648897724767, "compression_ratio": + 1.5217391304347827, "no_speech_prob": 0.032472848892211914}, {"id": 159, "seek": + 247742, "start": 2477.42, "end": 2498.42, "text": " make sure to construct this + perfect order of results right so how long it should be let''s say if I show 10 + flowers on the screen 10 bouquets whatever how long that perfectly should be 30 + hundred is there any recommendation.", "tokens": [50364, 652, 988, 281, 7690, 341, + 2176, 1668, 295, 3542, 558, 370, 577, 938, 309, 820, 312, 718, 311, 584, 498, 286, + 855, 1266, 8085, 322, 264, 2568, 1266, 15345, 
358, 1385, 2035, 577, 938, 300, 6239, + 820, 312, 2217, 3262, 307, 456, 604, 11879, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.166210836293746, "compression_ratio": 1.448051948051948, "no_speech_prob": 0.035630516707897186}, + {"id": 160, "seek": 249842, "start": 2498.42, "end": 2527.42, "text": " I would + say like as I mentioned before firstly you can use the metrics which is like expected + reciprocal rank which is exactly talking about the moment when the user lose attention + and after that you can make mistake but they are just not reaching it and for evaluation + this like the moment of the termination of the interest you can exactly I think + pre evaluated with like if you have some any data", "tokens": [50414, 286, 576, + 584, 411, 382, 286, 2835, 949, 27376, 291, 393, 764, 264, 16367, 597, 307, 411, + 5176, 46948, 6181, 597, 307, 2293, 1417, 466, 264, 1623, 562, 264, 4195, 3624, 3202, + 293, 934, 300, 291, 393, 652, 6146, 457, 436, 366, 445, 406, 9906, 309, 293, 337, + 13344, 341, 411, 264, 1623, 295, 264, 1433, 2486, 295, 264, 1179, 291, 393, 2293, + 286, 519, 659, 25509, 365, 411, 498, 291, 362, 512, 604, 1412, 51814], "temperature": + 0.0, "avg_logprob": -0.1741151687426445, "compression_ratio": 1.821917808219178, + "no_speech_prob": 0.13293956220149994}, {"id": 161, "seek": 252842, "start": 2528.42, + "end": 2553.42, "text": " and pre-evaluated by the clicks so you can give like any + item some weight by reach through general and then just predict in general how much + like general user how many items your general users like look through before they''re + satisfied with the result and maybe over time this actually amount will be decreased + because your ranking will be more perfect.", "tokens": [50364, 293, 659, 12, 68, + 3337, 27275, 538, 264, 18521, 370, 291, 393, 976, 411, 604, 3174, 512, 3364, 538, + 2524, 807, 2674, 293, 550, 445, 6069, 294, 2674, 577, 709, 411, 2674, 4195, 577, + 867, 4754, 428, 2674, 5022, 411, 574, 807, 949, 436, 434, 
11239, 365, 264, 1874, + 293, 1310, 670, 565, 341, 767, 2372, 486, 312, 24436, 570, 428, 17833, 486, 312, + 544, 2176, 13, 51614], "temperature": 0.0, "avg_logprob": -0.19836205495914944, + "compression_ratio": 1.6971153846153846, "no_speech_prob": 0.0038088401779532433}, + {"id": 162, "seek": 255342, "start": 2554.42, "end": 2580.42, "text": " But you + also can try to emulate the same experiments with actually the cross-sourcing and + just to see how like to give them some certain amount of objects why I''m talking + about this actually because recently when we had this talk about biases the presenter + for testing his hypothesis on the click through he created a project in soloka where + he had like the query.", "tokens": [50414, 583, 291, 611, 393, 853, 281, 45497, + 264, 912, 12050, 365, 767, 264, 3278, 12, 82, 41849, 293, 445, 281, 536, 577, 411, + 281, 976, 552, 512, 1629, 2372, 295, 6565, 983, 286, 478, 1417, 466, 341, 767, 570, + 3938, 562, 321, 632, 341, 751, 466, 32152, 264, 35594, 337, 4997, 702, 17291, 322, + 264, 2052, 807, 415, 2942, 257, 1716, 294, 1404, 15289, 689, 415, 632, 411, 264, + 14581, 13, 51714], "temperature": 0.0, "avg_logprob": -0.15174814860026042, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.28370440006256104}, {"id": 163, "seek": + 258042, "start": 2580.42, "end": 2608.42, "text": " And the recommendations which + were like around the 20 or like 30 and he looked are there clicking through until + some I mean they''re also were like ordered like the search engine and he looked + like how how far they''re clicking through to check the hypothesis of the position + bias so in general you can also try to test your hypothesis online with a click + metrics and see how to like.", "tokens": [50364, 400, 264, 10434, 597, 645, 411, + 926, 264, 945, 420, 411, 2217, 293, 415, 2956, 366, 456, 9697, 807, 1826, 512, 286, + 914, 436, 434, 611, 645, 411, 8866, 411, 264, 3164, 2848, 293, 415, 2956, 411, 577, + 577, 1400, 436, 434, 
9697, 807, 281, 1520, 264, 17291, 295, 264, 2535, 12577, 370, + 294, 2674, 291, 393, 611, 853, 281, 1500, 428, 17291, 2950, 365, 257, 2052, 16367, + 293, 536, 577, 281, 411, 13, 51764], "temperature": 0.0, "avg_logprob": -0.16605663299560547, + "compression_ratio": 1.7767441860465116, "no_speech_prob": 0.02745301090180874}, + {"id": 164, "seek": 261042, "start": 2610.42, "end": 2633.42, "text": " Choose this + position and then test it offline but one additional thing when we''re talking about + business we''re in general also talking about budgets so of course the more you + need to evaluate steel is the cost will rise just because you''re like your offer + it more data to crowd and crowd needs to like to do more assignments or it becoming + more costly.", "tokens": [50364, 21661, 341, 2535, 293, 550, 1500, 309, 21857, 457, + 472, 4497, 551, 562, 321, 434, 1417, 466, 1606, 321, 434, 294, 2674, 611, 1417, + 466, 26708, 370, 295, 1164, 264, 544, 291, 643, 281, 13059, 8269, 307, 264, 2063, + 486, 6272, 445, 570, 291, 434, 411, 428, 2626, 309, 544, 1412, 281, 6919, 293, 6919, + 2203, 281, 411, 281, 360, 544, 22546, 420, 309, 5617, 544, 28328, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.1975498596827189, "compression_ratio": 1.7339901477832513, + "no_speech_prob": 0.008060887455940247}, {"id": 165, "seek": 263342, "start": 2633.42, + "end": 2651.42, "text": " So I would say I would like estimate the amount that you + need that you know that you need the amount of the click through and then maybe + cut it based on your like general estimate the costs of mental labeling and try + to align them little bit because still.", "tokens": [50364, 407, 286, 576, 584, + 286, 576, 411, 12539, 264, 2372, 300, 291, 643, 300, 291, 458, 300, 291, 643, 264, + 2372, 295, 264, 2052, 807, 293, 550, 1310, 1723, 309, 2361, 322, 428, 411, 2674, + 12539, 264, 5497, 295, 4973, 40244, 293, 853, 281, 7975, 552, 707, 857, 570, 920, + 13, 51264], "temperature": 0.0, "avg_logprob": 
-0.1766433369029652, "compression_ratio": + 1.7066666666666668, "no_speech_prob": 0.07323069870471954}, {"id": 166, "seek": + 265142, "start": 2651.42, "end": 2668.42, "text": " I would say the result might + be not like 100% perfect in the means that people are reaching like farther and + seeing the mistakes but it still will be a big improvement if you''re catch a mistake + in the top ranking like positions.", "tokens": [50364, 286, 576, 584, 264, 1874, + 1062, 312, 406, 411, 2319, 4, 2176, 294, 264, 1355, 300, 561, 366, 9906, 411, 20344, + 293, 2577, 264, 8038, 457, 309, 920, 486, 312, 257, 955, 10444, 498, 291, 434, 3745, + 257, 6146, 294, 264, 1192, 17833, 411, 8432, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.1470617198944092, "compression_ratio": 1.4615384615384615, "no_speech_prob": + 0.2523283362388611}, {"id": 167, "seek": 266842, "start": 2668.42, "end": 2686.42, + "text": " I think connected to this there is a notion of disagreement between annotators + right so what is relevant for you might be completely relevant to me and I I want + to see the for the same query I want to see the results in different order.", "tokens": + [50364, 286, 519, 4582, 281, 341, 456, 307, 257, 10710, 295, 38947, 1296, 25339, + 3391, 558, 370, 437, 307, 7340, 337, 291, 1062, 312, 2584, 7340, 281, 385, 293, + 286, 286, 528, 281, 536, 264, 337, 264, 912, 14581, 286, 528, 281, 536, 264, 3542, + 294, 819, 1668, 13, 51264], "temperature": 0.0, "avg_logprob": -0.09348512612856351, + "compression_ratio": 1.5733333333333333, "no_speech_prob": 0.4896894693374634}, + {"id": 168, "seek": 268642, "start": 2686.42, "end": 2715.42, "text": " I think + one of the suggestions I''ve heard of how you could construct this perfect list + is actually you can take and concatenate all of the rankings given by independent + annotators for the same query and then resort them in the order that makes it perfect + from the top to the bottom of course you will still have issues with ties right + 
so if you have three three three''s then how should you order them but.", "tokens": + [50364, 286, 519, 472, 295, 264, 13396, 286, 600, 2198, 295, 577, 291, 727, 7690, + 341, 2176, 1329, 307, 767, 291, 393, 747, 293, 1588, 7186, 473, 439, 295, 264, 36550, + 2212, 538, 6695, 25339, 3391, 337, 264, 912, 14581, 293, 550, 19606, 552, 294, 264, + 1668, 300, 1669, 309, 2176, 490, 264, 1192, 281, 264, 2767, 295, 1164, 291, 486, + 920, 362, 2663, 365, 14039, 558, 370, 498, 291, 362, 1045, 1045, 1045, 311, 550, + 577, 820, 291, 1668, 552, 457, 13, 51814], "temperature": 0.0, "avg_logprob": -0.10457659876623819, + "compression_ratio": 1.7307692307692308, "no_speech_prob": 0.06597565114498138}, + {"id": 169, "seek": 271542, "start": 2715.42, "end": 2719.42, "text": " But at least + they will be visible on the screen so maybe that''s fine.", "tokens": [50364, 583, + 412, 1935, 436, 486, 312, 8974, 322, 264, 2568, 370, 1310, 300, 311, 2489, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.10175781126146193, "compression_ratio": 1.6398104265402844, + "no_speech_prob": 0.0018909581704065204}, {"id": 170, "seek": 271542, "start": 2719.42, + "end": 2742.42, "text": " Or maybe not who knows but at the same time you kind of + like achieve this perfect list which incorporates the wisdom or the wishes of other + people that have been in the same sort of group have you experiment something around + these lines or do you think it''s sensible to do is.", "tokens": [50564, 1610, 1310, + 406, 567, 3255, 457, 412, 264, 912, 565, 291, 733, 295, 411, 4584, 341, 2176, 1329, + 597, 50193, 264, 10712, 420, 264, 15065, 295, 661, 561, 300, 362, 668, 294, 264, + 912, 1333, 295, 1594, 362, 291, 5120, 746, 926, 613, 3876, 420, 360, 291, 519, 309, + 311, 25380, 281, 360, 307, 13, 51714], "temperature": 0.0, "avg_logprob": -0.10175781126146193, + "compression_ratio": 1.6398104265402844, "no_speech_prob": 0.0018909581704065204}, + {"id": 171, "seek": 274242, "start": 2743.42, "end": 2760.42, "text": " 
To experiment + with the which part with the check in the form working with or with reordering with + with with with constructing your perfect list right because for ndcg you need that + perfect list to divide by right in the form.", "tokens": [50414, 1407, 5120, 365, + 264, 597, 644, 365, 264, 1520, 294, 264, 1254, 1364, 365, 420, 365, 319, 765, 1794, + 365, 365, 365, 365, 39969, 428, 2176, 1329, 558, 570, 337, 297, 67, 66, 70, 291, + 643, 300, 2176, 1329, 281, 9845, 538, 558, 294, 264, 1254, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.2846854153801413, "compression_ratio": 1.6917293233082706, + "no_speech_prob": 0.01971437968313694}, {"id": 172, "seek": 276042, "start": 2761.42, + "end": 2782.42, "text": " So how we experiment with the length of this list you''re + asking me no in this case I think i''m actually describing that specific way of + building it that you take a sub lists from different people that annotated the same + query then you stack them together and then you sort them right.", "tokens": [50414, + 407, 577, 321, 5120, 365, 264, 4641, 295, 341, 1329, 291, 434, 3365, 385, 572, 294, + 341, 1389, 286, 519, 741, 478, 767, 16141, 300, 2685, 636, 295, 2390, 309, 300, + 291, 747, 257, 1422, 14511, 490, 819, 561, 300, 25339, 770, 264, 912, 14581, 550, + 291, 8630, 552, 1214, 293, 550, 291, 1333, 552, 558, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.21532035264812532, "compression_ratio": 1.5777777777777777, + "no_speech_prob": 0.07856527715921402}, {"id": 173, "seek": 278242, "start": 2782.42, + "end": 2810.42, "text": " Or always it''s yeah it''s very interesting approach I + would say that I myself never experienced such of the technique which sounds very + interesting but we''re usually just doing like I usually eat more like aggregation + by the models which are not like concatenating but that taking into the account + in the general the quality of the user in this ranking problem.", "tokens": [50364, + 1610, 1009, 309, 311, 1338, 
309, 311, 588, 1880, 3109, 286, 576, 584, 300, 286, + 2059, 1128, 6751, 1270, 295, 264, 6532, 597, 3263, 588, 1880, 457, 321, 434, 2673, + 445, 884, 411, 286, 2673, 1862, 544, 411, 16743, 399, 538, 264, 5245, 597, 366, + 406, 411, 1588, 7186, 990, 457, 300, 1940, 666, 264, 2696, 294, 264, 2674, 264, + 3125, 295, 264, 4195, 294, 341, 17833, 1154, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.19378720897517793, "compression_ratio": 1.7061611374407584, "no_speech_prob": + 0.05529528856277466}, {"id": 174, "seek": 281042, "start": 2810.42, "end": 2836.42, + "text": " And so when you''re doing an aggregation you''re just like more lean towards + a user who are proficient in ranking in general so you trust him as an end to end + good user of the search so for example when one person said all three and one said + all once but I know that this three guy is in general good at this I will just take + his one as an ideal labeling.", "tokens": [50364, 400, 370, 562, 291, 434, 884, + 364, 16743, 399, 291, 434, 445, 411, 544, 11659, 3030, 257, 4195, 567, 366, 1740, + 24549, 294, 17833, 294, 2674, 370, 291, 3361, 796, 382, 364, 917, 281, 917, 665, + 4195, 295, 264, 3164, 370, 337, 1365, 562, 472, 954, 848, 439, 1045, 293, 472, 848, + 439, 1564, 457, 286, 458, 300, 341, 1045, 2146, 307, 294, 2674, 665, 412, 341, 286, + 486, 445, 747, 702, 472, 382, 364, 7157, 40244, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.16851471691596798, "compression_ratio": 1.680952380952381, "no_speech_prob": + 0.09041153639554977}, {"id": 175, "seek": 283642, "start": 2836.42, "end": 2865.42, + "text": " Yeah this is this is the exciting part you''re tapping into the topic + of quality of annotators which is super super important at the same time you could + teach the annotators if you have them in house but if you have them external you + kind of do not have control over who gets what task so how exactly maybe the local + or what kind of methodology should I apply to measure the quality of 
the data.", + "tokens": [50364, 865, 341, 307, 341, 307, 264, 4670, 644, 291, 434, 21444, 666, + 264, 4829, 295, 3125, 295, 25339, 3391, 597, 307, 1687, 1687, 1021, 412, 264, 912, + 565, 291, 727, 2924, 264, 25339, 3391, 498, 291, 362, 552, 294, 1782, 457, 498, + 291, 362, 552, 8320, 291, 733, 295, 360, 406, 362, 1969, 670, 567, 2170, 437, 5633, + 370, 577, 2293, 1310, 264, 2654, 420, 437, 733, 295, 24850, 820, 286, 3079, 281, + 3481, 264, 3125, 295, 264, 1412, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.16525962553828596, "compression_ratio": 1.7589285714285714, "no_speech_prob": + 0.26505404710769653}, {"id": 176, "seek": 286642, "start": 2866.42, "end": 2869.42, + "text": " What are the components there.", "tokens": [50364, 708, 366, 264, 6677, + 456, 13, 50514], "temperature": 0.0, "avg_logprob": -0.15064286624684053, "compression_ratio": + 1.7048458149779735, "no_speech_prob": 0.018435053527355194}, {"id": 177, "seek": + 286642, "start": 2869.42, "end": 2895.42, "text": " It''s actually like I would + say it''s a very very like it''s a very big system and the means that you need to + not only measure quality but also like keep your projects and protect it from the + fraud and from the people who specifically want to break quality not just they''re + like making a human mistakes but they''re really really trying to scan with your + data.", "tokens": [50514, 467, 311, 767, 411, 286, 576, 584, 309, 311, 257, 588, + 588, 411, 309, 311, 257, 588, 955, 1185, 293, 264, 1355, 300, 291, 643, 281, 406, + 787, 3481, 3125, 457, 611, 411, 1066, 428, 4455, 293, 2371, 309, 490, 264, 14560, + 293, 490, 264, 561, 567, 4682, 528, 281, 1821, 3125, 406, 445, 436, 434, 411, 1455, + 257, 1952, 8038, 457, 436, 434, 534, 534, 1382, 281, 11049, 365, 428, 1412, 13, + 51814], "temperature": 0.0, "avg_logprob": -0.15064286624684053, "compression_ratio": + 1.7048458149779735, "no_speech_prob": 0.018435053527355194}, {"id": 178, "seek": + 289542, "start": 2895.42, "end": 2905.42, 
"text": " So there are like different + techniques starting from the super simple ones like anti fraud ones which are like + looking at how fast are you labeling.", "tokens": [50364, 407, 456, 366, 411, 819, + 7512, 2891, 490, 264, 1687, 2199, 2306, 411, 6061, 14560, 2306, 597, 366, 411, 1237, + 412, 577, 2370, 366, 291, 40244, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.23826783124138326, "compression_ratio": 1.7872340425531914, "no_speech_prob": + 0.05060029402375221}, {"id": 179, "seek": 289542, "start": 2905.42, "end": 2924.42, + "text": " Are you labeling with the same non human distribution of the labels like + clicking only like one option until it just goes forever or like even sometimes + it''s shaking of how you''re like how is your behaviors like in general with like + different projects with a lot of data.", "tokens": [50864, 2014, 291, 40244, 365, + 264, 912, 2107, 1952, 7316, 295, 264, 16949, 411, 9697, 787, 411, 472, 3614, 1826, + 309, 445, 1709, 5680, 420, 411, 754, 2171, 309, 311, 15415, 295, 577, 291, 434, + 411, 577, 307, 428, 15501, 411, 294, 2674, 365, 411, 819, 4455, 365, 257, 688, 295, + 1412, 13, 51814], "temperature": 0.0, "avg_logprob": -0.23826783124138326, "compression_ratio": + 1.7872340425531914, "no_speech_prob": 0.05060029402375221}, {"id": 180, "seek": + 292442, "start": 2924.42, "end": 2933.42, "text": " Different projects without not + taking how your mouse works or something like this so this and also there are like + of course general exams.", "tokens": [50364, 20825, 4455, 1553, 406, 1940, 577, + 428, 9719, 1985, 420, 746, 411, 341, 370, 341, 293, 611, 456, 366, 411, 295, 1164, + 2674, 20514, 13, 50814], "temperature": 0.0, "avg_logprob": -0.23619004658290318, + "compression_ratio": 1.778723404255319, "no_speech_prob": 0.057521265000104904}, + {"id": 181, "seek": 292442, "start": 2933.42, "end": 2952.42, "text": " Checking + your language proficiency checking your proficiency writing checking your proficiency + in some 
other skills which are also building up some certain I will say port portrait + of a good label or because if you''re like able to and provide the good results + in the some skills.", "tokens": [50814, 6881, 278, 428, 2856, 1740, 42081, 8568, + 428, 1740, 42081, 3579, 8568, 428, 1740, 42081, 294, 512, 661, 3942, 597, 366, 611, + 2390, 493, 512, 1629, 286, 486, 584, 2436, 17126, 295, 257, 665, 7645, 420, 570, + 498, 291, 434, 411, 1075, 281, 293, 2893, 264, 665, 3542, 294, 264, 512, 3942, 13, + 51764], "temperature": 0.0, "avg_logprob": -0.23619004658290318, "compression_ratio": + 1.778723404255319, "no_speech_prob": 0.057521265000104904}, {"id": 182, "seek": + 295242, "start": 2952.42, "end": 2979.42, "text": " Which are like around this problem + like your good with this or this that means that you''re in general won''t be at + least a broader and you have a chance to succeed in this tasks and then the main + mechanism which is used in the most of the tasks where you have categories like + classification or something and when we''re working with a categorical tasks we + know the ground truth some sort.", "tokens": [50364, 3013, 366, 411, 926, 341, 1154, + 411, 428, 665, 365, 341, 420, 341, 300, 1355, 300, 291, 434, 294, 2674, 1582, 380, + 312, 412, 1935, 257, 13227, 293, 291, 362, 257, 2931, 281, 7754, 294, 341, 9608, + 293, 550, 264, 2135, 7513, 597, 307, 1143, 294, 264, 881, 295, 264, 9608, 689, 291, + 362, 10479, 411, 21538, 420, 746, 293, 562, 321, 434, 1364, 365, 257, 19250, 804, + 9608, 321, 458, 264, 2727, 3494, 512, 1333, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.14564392890459227, "compression_ratio": 1.7625570776255708, "no_speech_prob": + 0.05177878215909004}, {"id": 183, "seek": 297942, "start": 2979.42, "end": 3002.42, + "text": " Of course it doesn''t happen with ranking because we''re ranking we don''t + know like it''s a very subjective manner what do you prefer this or this but with + but you can of course actually create an obvious 
examples like very obvious so when + you know like some ground truth you can hide you can shuffle in this.", "tokens": + [50364, 2720, 1164, 309, 1177, 380, 1051, 365, 17833, 570, 321, 434, 17833, 321, + 500, 380, 458, 411, 309, 311, 257, 588, 25972, 9060, 437, 360, 291, 4382, 341, 420, + 341, 457, 365, 457, 291, 393, 295, 1164, 767, 1884, 364, 6322, 5110, 411, 588, 6322, + 370, 562, 291, 458, 411, 512, 2727, 3494, 291, 393, 6479, 291, 393, 39426, 294, + 341, 13, 51514], "temperature": 0.0, "avg_logprob": -0.13944773240522904, "compression_ratio": + 1.6630434782608696, "no_speech_prob": 0.008864752016961575}, {"id": 184, "seek": + 300242, "start": 3002.42, "end": 3008.42, "text": " And you can see the examples + of the tasks with the answers which you know and you can like.", "tokens": [50364, + 400, 291, 393, 536, 264, 5110, 295, 264, 9608, 365, 264, 6338, 597, 291, 458, 293, + 291, 393, 411, 13, 50664], "temperature": 0.0, "avg_logprob": -0.2361252958124334, + "compression_ratio": 1.7663934426229508, "no_speech_prob": 0.5042442679405212}, + {"id": 185, "seek": 300242, "start": 3008.42, "end": 3030.42, "text": " Hiddenly + shuffle them in between the assignments so people will complete them without noticing + it because like it''s hidden by the API and everything and by the percentage of + the examples that they evaluated correctly you can kind of see me their skills because + you know that like in general for this class they''re giving the right answers.", + "tokens": [50664, 41156, 356, 39426, 552, 294, 1296, 264, 22546, 370, 561, 486, + 3566, 552, 1553, 21814, 309, 570, 411, 309, 311, 7633, 538, 264, 9362, 293, 1203, + 293, 538, 264, 9668, 295, 264, 5110, 300, 436, 25509, 8944, 291, 393, 733, 295, + 536, 385, 641, 3942, 570, 291, 458, 300, 411, 294, 2674, 337, 341, 1508, 436, 434, + 2902, 264, 558, 6338, 13, 51764], "temperature": 0.0, "avg_logprob": -0.2361252958124334, + "compression_ratio": 1.7663934426229508, "no_speech_prob": 0.5042442679405212}, + {"id": 
186, "seek": 303042, "start": 3030.42, "end": 3059.42, "text": " And the + second technique which also works good for the more creative I would say or gathering + assignments for example when you need to take a picture or when you need to do an + assignment outdoors for example go and check there is like building on the some + certain plate for like a map sub there you can do even more tricky thing and tell + the crowd evaluate the other crowds.", "tokens": [50364, 400, 264, 1150, 6532, 597, + 611, 1985, 665, 337, 264, 544, 5880, 286, 576, 584, 420, 13519, 22546, 337, 1365, + 562, 291, 643, 281, 747, 257, 3036, 420, 562, 291, 643, 281, 360, 364, 15187, 20980, + 337, 1365, 352, 293, 1520, 456, 307, 411, 2390, 322, 264, 512, 1629, 5924, 337, + 411, 257, 4471, 1422, 456, 291, 393, 360, 754, 544, 12414, 551, 293, 980, 264, 6919, + 13059, 264, 661, 26070, 13, 51814], "temperature": 0.0, "avg_logprob": -0.11518531096609015, + "compression_ratio": 1.7395348837209301, "no_speech_prob": 0.002413414651528001}, + {"id": 187, "seek": 305942, "start": 3059.42, "end": 3078.42, "text": " So you''re + creating a specific validation project with the other crowds or source and you''re + giving them the answers of the first crowds or source and you say okay guys now + you need to evaluate doesn''t look correct to you doesn''t look like not a fraud + and everything and there by this double evaluation you''re actually sorting out + all the problems.", "tokens": [50364, 407, 291, 434, 4084, 257, 2685, 24071, 1716, + 365, 264, 661, 26070, 420, 4009, 293, 291, 434, 2902, 552, 264, 6338, 295, 264, + 700, 26070, 420, 4009, 293, 291, 584, 1392, 1074, 586, 291, 643, 281, 13059, 1177, + 380, 574, 3006, 281, 291, 1177, 380, 574, 411, 406, 257, 14560, 293, 1203, 293, + 456, 538, 341, 3834, 13344, 291, 434, 767, 32411, 484, 439, 264, 2740, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.22290712007334534, "compression_ratio": 1.7263681592039801, + "no_speech_prob": 0.02078501135110855}, 
{"id": 188, "seek": 307842, "start": 3079.42, + "end": 3092.42, "text": " Wow I have never heard of such method it''s it''s amazing + I think more traditionally like maybe like 10 years ago in the project related to + sentiment analysis we were talking about.", "tokens": [50414, 3153, 286, 362, 1128, + 2198, 295, 1270, 3170, 309, 311, 309, 311, 2243, 286, 519, 544, 19067, 411, 1310, + 411, 1266, 924, 2057, 294, 264, 1716, 4077, 281, 16149, 5215, 321, 645, 1417, 466, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.14883620650679977, "compression_ratio": + 1.4285714285714286, "no_speech_prob": 0.44732481241226196}, {"id": 189, "seek": + 307842, "start": 3093.42, "end": 3098.42, "text": " Double annotation but at the + same time so you give the same.", "tokens": [51114, 16633, 48654, 457, 412, 264, + 912, 565, 370, 291, 976, 264, 912, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.14883620650679977, "compression_ratio": 1.4285714285714286, "no_speech_prob": + 0.44732481241226196}, {"id": 190, "seek": 309842, "start": 3099.42, "end": 3116.42, + "text": " You know the same label and then you ask the human to whether they agree + or not but twice and then you basically calculate the inter annotator agreement + but what you just described is so brilliantly put and sort of invented in a way.", + "tokens": [50414, 509, 458, 264, 912, 7645, 293, 550, 291, 1029, 264, 1952, 281, + 1968, 436, 3986, 420, 406, 457, 6091, 293, 550, 291, 1936, 8873, 264, 728, 25339, + 1639, 8106, 457, 437, 291, 445, 7619, 307, 370, 8695, 42580, 829, 293, 1333, 295, + 14479, 294, 257, 636, 13, 51264], "temperature": 0.0, "avg_logprob": -0.12869857339298024, + "compression_ratio": 1.5533333333333332, "no_speech_prob": 0.11700117588043213}, + {"id": 191, "seek": 311642, "start": 3116.42, "end": 3142.42, "text": " Was this + invented at the locker or have you seen this somewhere to be on I don''t know if + we did that I am super happy yeah it doesn''t seem like rocket science yeah but + it 
works yeah yeah yeah and also about yeah in inter annotator agreement also works + especially in like some classification that''s how we actually started creating + this hidden assignments recently.", "tokens": [50364, 3027, 341, 14479, 412, 264, + 25707, 420, 362, 291, 1612, 341, 4079, 281, 312, 322, 286, 500, 380, 458, 498, 321, + 630, 300, 286, 669, 1687, 2055, 1338, 309, 1177, 380, 1643, 411, 13012, 3497, 1338, + 457, 309, 1985, 1338, 1338, 1338, 293, 611, 466, 1338, 294, 728, 25339, 1639, 8106, + 611, 1985, 2318, 294, 411, 512, 21538, 300, 311, 577, 321, 767, 1409, 4084, 341, + 7633, 22546, 3938, 13, 51664], "temperature": 0.0, "avg_logprob": -0.26790345681680217, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.12353106588125229}, + {"id": 192, "seek": 314242, "start": 3142.42, "end": 3167.42, "text": " Like as + I told about them we are called and honey quotes or the golden assignment the one + with a hidden tasks which you''re like shuffle in the data and then evaluate the + result the skills of people who are doing the some certain kinds of assignments + and actually also we''re saving these skills and sometimes you can access them because + they''re already on platform they call global skills you can just.", "tokens": [50364, + 1743, 382, 286, 1907, 466, 552, 321, 366, 1219, 293, 8330, 19963, 420, 264, 9729, + 15187, 264, 472, 365, 257, 7633, 9608, 597, 291, 434, 411, 39426, 294, 264, 1412, + 293, 550, 13059, 264, 1874, 264, 3942, 295, 561, 567, 366, 884, 264, 512, 1629, + 3685, 295, 22546, 293, 767, 611, 321, 434, 6816, 613, 3942, 293, 2171, 291, 393, + 2105, 552, 570, 436, 434, 1217, 322, 3663, 436, 818, 4338, 3942, 291, 393, 445, + 13, 51614], "temperature": 0.0, "avg_logprob": -0.19589686393737793, "compression_ratio": + 1.7063829787234042, "no_speech_prob": 0.04199475422501564}, {"id": 193, "seek": + 316742, "start": 3167.42, "end": 3188.42, "text": " Preselect on your project people + who already succeed in moderation for example that 
actually helped me recently a + lot because I didn''t have to train the crowd for my very complex stuff so yeah + but I stepped aside so before like when I was even working with to lock us on time + ago you have to create this specific tasks yourself.", "tokens": [50364, 2718, 14664, + 322, 428, 1716, 561, 567, 1217, 7754, 294, 49471, 337, 1365, 300, 767, 4254, 385, + 3938, 257, 688, 570, 286, 994, 380, 362, 281, 3847, 264, 6919, 337, 452, 588, 3997, + 1507, 370, 1338, 457, 286, 15251, 7359, 370, 949, 411, 562, 286, 390, 754, 1364, + 365, 281, 4017, 505, 322, 565, 2057, 291, 362, 281, 1884, 341, 2685, 9608, 1803, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.22812868567074046, "compression_ratio": + 1.5327102803738317, "no_speech_prob": 0.01878097467124462}, {"id": 194, "seek": + 318842, "start": 3188.42, "end": 3217.42, "text": " Like this hidden you have to + manually label them and that took some time and it was like kind of tiring because + you''re sitting here and you''re creating like 100 like usually you need some certain + amount of some sample of this task like at least 75% of the general like amount + of the tasks on your platform on your project to evaluate how good the people are + because you''re just if you''re like giving them 100 items to label and once you''re + asking like.", "tokens": [50364, 1743, 341, 7633, 291, 362, 281, 16945, 7645, 552, + 293, 300, 1890, 512, 565, 293, 309, 390, 411, 733, 295, 35182, 570, 291, 434, 3798, + 510, 293, 291, 434, 4084, 411, 2319, 411, 2673, 291, 643, 512, 1629, 2372, 295, + 512, 6889, 295, 341, 5633, 411, 412, 1935, 9562, 4, 295, 264, 2674, 411, 2372, 295, + 264, 9608, 322, 428, 3663, 322, 428, 1716, 281, 13059, 577, 665, 264, 561, 366, + 570, 291, 434, 445, 498, 291, 434, 411, 2902, 552, 2319, 4754, 281, 7645, 293, 1564, + 291, 434, 3365, 411, 13, 51814], "temperature": 0.0, "avg_logprob": -0.11716637862356086, + "compression_ratio": 1.8152610441767068, "no_speech_prob": 0.2186514288187027}, + {"id": 195, 
"seek": 321842, "start": 3218.42, "end": 3233.42, "text": " If it''s + correct on earth you can''t evaluate if this person is good or bad it can be just + the pure luck so labeling why yourself was kind of you know time consuming and said + and recently we decided okay but we have.", "tokens": [50364, 759, 309, 311, 3006, + 322, 4120, 291, 393, 380, 13059, 498, 341, 954, 307, 665, 420, 1578, 309, 393, 312, + 445, 264, 6075, 3668, 370, 40244, 983, 1803, 390, 733, 295, 291, 458, 565, 19867, + 293, 848, 293, 3938, 321, 3047, 1392, 457, 321, 362, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.22099206924438478, "compression_ratio": 1.4078947368421053, + "no_speech_prob": 0.012164880521595478}, {"id": 196, "seek": 323342, "start": 3233.42, + "end": 3262.42, "text": " Crowd why do we are doing the drop of the crowd let''s + just create this hidden tasks by the other crowd and we can do this easily just + using the interhuman agreement you''re just giving them a task and you''re pre selecting + the crowd with the good skills in the past just in general so you trust them more + and you throw for example 10 people on one tiny bit of a task and 10 people like + legally without knowing what the others say and then like to have better than one + you.", "tokens": [50364, 40110, 983, 360, 321, 366, 884, 264, 3270, 295, 264, 6919, + 718, 311, 445, 1884, 341, 7633, 9608, 538, 264, 661, 6919, 293, 321, 393, 360, 341, + 3612, 445, 1228, 264, 728, 18796, 8106, 291, 434, 445, 2902, 552, 257, 5633, 293, + 291, 434, 659, 18182, 264, 6919, 365, 264, 665, 3942, 294, 264, 1791, 445, 294, + 2674, 370, 291, 3361, 552, 544, 293, 291, 3507, 337, 1365, 1266, 561, 322, 472, + 5870, 857, 295, 257, 5633, 293, 1266, 561, 411, 21106, 1553, 5276, 437, 264, 2357, + 584, 293, 550, 411, 281, 362, 1101, 813, 472, 291, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.17229157627219022, "compression_ratio": 1.8359375, "no_speech_prob": + 0.30360865592956543}, {"id": 197, "seek": 326342, "start": 
3263.42, "end": 3292.42, + "text": " Usually some certain amount of the strong agreement comes and you know + that is the right answer and you can directly pick it and already shuffle it in + the other project so you''ll see yeah we''re making the self working mechanisms + like you know you just throw some data in your system yeah it''s like self reinforcement + or yeah I think this is amazing and and it also is surfacing I believe like it''s.", + "tokens": [50364, 11419, 512, 1629, 2372, 295, 264, 2068, 8106, 1487, 293, 291, + 458, 300, 307, 264, 558, 1867, 293, 291, 393, 3838, 1888, 309, 293, 1217, 39426, + 309, 294, 264, 661, 1716, 370, 291, 603, 536, 1338, 321, 434, 1455, 264, 2698, 1364, + 15902, 411, 291, 458, 291, 445, 3507, 512, 1412, 294, 428, 1185, 1338, 309, 311, + 411, 2698, 29280, 420, 1338, 286, 519, 341, 307, 2243, 293, 293, 309, 611, 307, + 9684, 5615, 286, 1697, 411, 309, 311, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.23467484439711972, "compression_ratio": 1.6793248945147679, "no_speech_prob": + 0.0847812369465828}, {"id": 198, "seek": 329342, "start": 3293.42, "end": 3305.82, + "text": " Feature of the locker that you cannot get with let''s say you set up an + open source labeling tool that you can having a specific task like moderation or.", + "tokens": [50364, 3697, 1503, 295, 264, 25707, 300, 291, 2644, 483, 365, 718, 311, + 584, 291, 992, 493, 364, 1269, 4009, 40244, 2290, 300, 291, 393, 1419, 257, 2685, + 5633, 411, 49471, 420, 13, 50984], "temperature": 0.0, "avg_logprob": -0.16636069615681967, + "compression_ratio": 1.3333333333333333, "no_speech_prob": 0.012348958291113377}, + {"id": 199, "seek": 330582, "start": 3305.82, "end": 3334.82, "text": " I don''t + know sentiment whatever machine translation that you can actually ask and gather + a group that will be proficient in that specific space so because otherwise you''re + going to be wasting cycles in potentially teaching people right yeah yeah I think + this is something that 
now that we started to say in the beginning of the podcast + the data is important but also humans that annotate.", "tokens": [50364, 286, 500, + 380, 458, 16149, 2035, 3479, 12853, 300, 291, 393, 767, 1029, 293, 5448, 257, 1594, + 300, 486, 312, 1740, 24549, 294, 300, 2685, 1901, 370, 570, 5911, 291, 434, 516, + 281, 312, 20457, 17796, 294, 7263, 4571, 561, 558, 1338, 1338, 286, 519, 341, 307, + 746, 300, 586, 300, 321, 1409, 281, 584, 294, 264, 2863, 295, 264, 7367, 264, 1412, + 307, 1021, 457, 611, 6255, 300, 25339, 473, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.134696044921875, "compression_ratio": 1.6724137931034482, "no_speech_prob": 0.11485940963029861}, + {"id": 200, "seek": 333582, "start": 3335.82, "end": 3364.82, "text": " It is important + I important yeah absolutely this is great I still wanted to understand one building + block you were talking about aggregation can you can you again sort of restate this + what do you mean and what should I pay attention to as a as a user of such a platform.", + "tokens": [50364, 467, 307, 1021, 286, 1021, 1338, 3122, 341, 307, 869, 286, 920, + 1415, 281, 1223, 472, 2390, 3461, 291, 645, 1417, 466, 16743, 399, 393, 291, 393, + 291, 797, 1333, 295, 1472, 473, 341, 437, 360, 291, 914, 293, 437, 820, 286, 1689, + 3202, 281, 382, 257, 382, 257, 4195, 295, 1270, 257, 3663, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.2216234042726714, "compression_ratio": 1.5789473684210527, + "no_speech_prob": 0.01720566861331463}, {"id": 201, "seek": 336582, "start": 3365.82, + "end": 3393.82, "text": " So for example like there are different ways of annotating + data and sometimes you need like there can be different cases when you need aggregation + so aggregation is like just imagine that you receive some raw results from the human + annotators and then you need to aggregate them in some final answer that you will + use for your model or for something it can be different cases when you need it for + example as we were 
talking about the.", "tokens": [50364, 407, 337, 1365, 411, 456, + 366, 819, 2098, 295, 25339, 990, 1412, 293, 2171, 291, 643, 411, 456, 393, 312, + 819, 3331, 562, 291, 643, 16743, 399, 370, 16743, 399, 307, 411, 445, 3811, 300, + 291, 4774, 512, 8936, 3542, 490, 264, 1952, 25339, 3391, 293, 550, 291, 643, 281, + 26118, 552, 294, 512, 2572, 1867, 300, 291, 486, 764, 337, 428, 2316, 420, 337, + 746, 309, 393, 312, 819, 3331, 562, 291, 643, 309, 337, 1365, 382, 321, 645, 1417, + 466, 264, 13, 51764], "temperature": 0.0, "avg_logprob": -0.11957089467482133, "compression_ratio": + 2.009259259259259, "no_speech_prob": 0.009952418506145477}, {"id": 202, "seek": + 339382, "start": 3393.82, "end": 3422.82, "text": " The aggregation between humans + on the some some task for example you have a task of labeling feature is that a + cat or dog and like you decided that you want to like foreign a tater so it''s and + like three of them said it''s a cat and the one said that it''s a dog and you have + less for answers and to understand that it''s a cat you need to perform an aggregation + so if it comes to classification tasks it''s pretty easy to do.", "tokens": [50364, + 440, 16743, 399, 1296, 6255, 322, 264, 512, 512, 5633, 337, 1365, 291, 362, 257, + 5633, 295, 40244, 4111, 307, 300, 257, 3857, 420, 3000, 293, 411, 291, 3047, 300, + 291, 528, 281, 411, 5329, 257, 256, 771, 370, 309, 311, 293, 411, 1045, 295, 552, + 848, 309, 311, 257, 3857, 293, 264, 472, 848, 300, 309, 311, 257, 3000, 293, 291, + 362, 1570, 337, 6338, 293, 281, 1223, 300, 309, 311, 257, 3857, 291, 643, 281, 2042, + 364, 16743, 399, 370, 498, 309, 1487, 281, 21538, 9608, 309, 311, 1238, 1858, 281, + 360, 13, 51814], "temperature": 0.0, "avg_logprob": -0.28651288090919963, "compression_ratio": + 1.8755555555555556, "no_speech_prob": 0.013296754099428654}, {"id": 203, "seek": + 342382, "start": 3423.82, "end": 3446.82, "text": " I mean you can do just major + world or like major world to wake it by the skills 
of this people but then it comes + for example for aggregating like images like for example you''re doing a segmentation + and you need to aggregate different answers about the segmentations here it''s already + harder because like doing like a major world piece of wise it''s a little bit of + a hard work you know.", "tokens": [50364, 286, 914, 291, 393, 360, 445, 2563, 1002, + 420, 411, 2563, 1002, 281, 6634, 309, 538, 264, 3942, 295, 341, 561, 457, 550, 309, + 1487, 337, 1365, 337, 16743, 990, 411, 5267, 411, 337, 1365, 291, 434, 884, 257, + 9469, 399, 293, 291, 643, 281, 26118, 819, 6338, 466, 264, 9469, 763, 510, 309, + 311, 1217, 6081, 570, 411, 884, 411, 257, 2563, 1002, 2522, 295, 10829, 309, 311, + 257, 707, 857, 295, 257, 1152, 589, 291, 458, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.19638351072748023, "compression_ratio": 1.807511737089202, "no_speech_prob": + 0.00843070074915886}, {"id": 204, "seek": 344682, "start": 3446.82, "end": 3475.82, + "text": " So for that usually there are like some models which are pre designed + and used and studied in crowd science so aggregation for the aggregation of image + and also aggregation of this pairwise comparisons that I was talking about because + this is a specifically a hard task because you have this pair wise assignments and + sometimes it''s like a better than be be better than see but see better than a and + you''re having a cycle you don''t know what to do so for that there are existing.", + "tokens": [50364, 407, 337, 300, 2673, 456, 366, 411, 512, 5245, 597, 366, 659, + 4761, 293, 1143, 293, 9454, 294, 6919, 3497, 370, 16743, 399, 337, 264, 16743, 399, + 295, 3256, 293, 611, 16743, 399, 295, 341, 6119, 3711, 33157, 300, 286, 390, 1417, + 466, 570, 341, 307, 257, 4682, 257, 1152, 5633, 570, 291, 362, 341, 6119, 10829, + 22546, 293, 2171, 309, 311, 411, 257, 1101, 813, 312, 312, 1101, 813, 536, 457, + 536, 1101, 813, 257, 293, 291, 434, 1419, 257, 6586, 291, 500, 380, 458, 437, 281, + 360, 370, 
337, 300, 456, 366, 6741, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.1981051473906546, "compression_ratio": 1.8784313725490196, "no_speech_prob": + 0.051039353013038635}, {"id": 205, "seek": 347682, "start": 3476.82, "end": 3505.82, + "text": " So then a couple of models which are based for example noisy bread dietary + which are based on the expectation maximization algorithm which assumes that flavors + are actually by the skill know the ground truth of the answers and we''re trying + to estimate that to get as possible as close to that for a digmar with like a couple + of iterations of this model and then the end it just gives you away the list of + responses like for example if we''re counting and.", "tokens": [50364, 407, 550, + 257, 1916, 295, 5245, 597, 366, 2361, 337, 1365, 24518, 5961, 37421, 597, 366, 2361, + 322, 264, 14334, 5138, 2144, 9284, 597, 37808, 300, 16303, 366, 767, 538, 264, 5389, + 458, 264, 2727, 3494, 295, 264, 6338, 293, 321, 434, 1382, 281, 12539, 300, 281, + 483, 382, 1944, 382, 1998, 281, 300, 337, 257, 2528, 6209, 365, 411, 257, 1916, + 295, 36540, 295, 341, 2316, 293, 550, 264, 917, 309, 445, 2709, 291, 1314, 264, + 1329, 295, 13019, 411, 337, 1365, 498, 321, 434, 13251, 293, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.21172958871592645, "compression_ratio": 1.8015873015873016, + "no_speech_prob": 0.020433103665709496}, {"id": 206, "seek": 350682, "start": 3506.82, + "end": 3518.82, "text": " DCD some other metric we just need a list where it says + like item one is the best item 10 is the worst so the aggregations out of this all + of the pairwise comparisons it will give you that list.", "tokens": [50364, 9114, + 35, 512, 661, 20678, 321, 445, 643, 257, 1329, 689, 309, 1619, 411, 3174, 472, 307, + 264, 1151, 3174, 1266, 307, 264, 5855, 370, 264, 16743, 763, 484, 295, 341, 439, + 295, 264, 6119, 3711, 33157, 309, 486, 976, 291, 300, 1329, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.1876120837229603, 
"compression_ratio": 1.8444444444444446, + "no_speech_prob": 0.017133766785264015}, {"id": 207, "seek": 350682, "start": 3519.82, + "end": 3535.82, "text": " You can implement these aggregations yourself and study + them because like in crowd like we''re not the one the first ones doing crowdsourcing + action so they''re like in crowd science they''re like a lot of models presented + and our research team actually also studying them and implementing them and I hope.", + "tokens": [51014, 509, 393, 4445, 613, 16743, 763, 1803, 293, 2979, 552, 570, 411, + 294, 6919, 411, 321, 434, 406, 264, 472, 264, 700, 2306, 884, 26070, 41849, 3069, + 370, 436, 434, 411, 294, 6919, 3497, 436, 434, 411, 257, 688, 295, 5245, 8212, 293, + 527, 2132, 1469, 767, 611, 7601, 552, 293, 18114, 552, 293, 286, 1454, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.1876120837229603, "compression_ratio": 1.8444444444444446, + "no_speech_prob": 0.017133766785264015}, {"id": 208, "seek": 353682, "start": 3536.82, + "end": 3561.82, "text": " I''m not praising them too much but you can it''s your + it''s your moment of okay our research team is created yeah but but yeah we for + example for aggregation we just create a tool which can be used paired with a platform + so you don''t have to think much how does it work but if you want.", "tokens": [50364, + 286, 478, 406, 42941, 552, 886, 709, 457, 291, 393, 309, 311, 428, 309, 311, 428, + 1623, 295, 1392, 527, 2132, 1469, 307, 2942, 1338, 457, 457, 1338, 321, 337, 1365, + 337, 16743, 399, 321, 445, 1884, 257, 2290, 597, 393, 312, 1143, 25699, 365, 257, + 3663, 370, 291, 500, 380, 362, 281, 519, 709, 577, 775, 309, 589, 457, 498, 291, + 528, 13, 51614], "temperature": 0.0, "avg_logprob": -0.21010368010577032, "compression_ratio": + 1.5777777777777777, "no_speech_prob": 0.008112764917314053}, {"id": 209, "seek": + 356182, "start": 3561.82, "end": 3581.82, "text": " Right me and I can provide you + with paper yeah absolutely and all the papers that 
you mentioned during this podcast + I really would love to include a show notes as well because because I I see how + how the listeners find the episodes educational and they.", "tokens": [50364, 1779, + 385, 293, 286, 393, 2893, 291, 365, 3035, 1338, 3122, 293, 439, 264, 10577, 300, + 291, 2835, 1830, 341, 7367, 286, 534, 576, 959, 281, 4090, 257, 855, 5570, 382, + 731, 570, 570, 286, 286, 536, 577, 577, 264, 23274, 915, 264, 9313, 10189, 293, + 436, 13, 51364], "temperature": 0.0, "avg_logprob": -0.20095267662635216, "compression_ratio": + 1.505952380952381, "no_speech_prob": 0.5001320838928223}, {"id": 210, "seek": 358182, + "start": 3581.82, "end": 3593.82, "text": " Some of them spent a lot of time you + know listening through and and and then you know reading the papers as well at least + browsing through them.", "tokens": [50364, 2188, 295, 552, 4418, 257, 688, 295, + 565, 291, 458, 4764, 807, 293, 293, 293, 550, 291, 458, 3760, 264, 10577, 382, 731, + 412, 1935, 38602, 807, 552, 13, 50964], "temperature": 0.0, "avg_logprob": -0.18506460047479886, + "compression_ratio": 1.6033519553072626, "no_speech_prob": 0.11725147068500519}, + {"id": 211, "seek": 358182, "start": 3594.82, "end": 3605.82, "text": " So Janet + so much stuff you have shared so far and I think even those who are using open source + tools like I don''t know keep it a label studio.", "tokens": [51014, 407, 26948, + 370, 709, 1507, 291, 362, 5507, 370, 1400, 293, 286, 519, 754, 729, 567, 366, 1228, + 1269, 4009, 3873, 411, 286, 500, 380, 458, 1066, 309, 257, 7645, 6811, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.18506460047479886, "compression_ratio": 1.6033519553072626, + "no_speech_prob": 0.11725147068500519}, {"id": 212, "seek": 360582, "start": 3605.82, + "end": 3634.82, "text": " And others I''m sure they can learn from what you said + but I also hope that they will improve their practices by typing into the talent + behind the locker but there is one topic that I think keeps 
surfacing everywhere + but also to some degree it becomes an overheated discussion around bias and data + and how this can actually drive the inequalities in life and in the world and I.", + "tokens": [50364, 400, 2357, 286, 478, 988, 436, 393, 1466, 490, 437, 291, 848, + 457, 286, 611, 1454, 300, 436, 486, 3470, 641, 7525, 538, 18444, 666, 264, 8301, + 2261, 264, 25707, 457, 456, 307, 472, 4829, 300, 286, 519, 5965, 9684, 5615, 5315, + 457, 611, 281, 512, 4314, 309, 3643, 364, 29807, 770, 5017, 926, 12577, 293, 1412, + 293, 577, 341, 393, 767, 3332, 264, 41874, 294, 993, 293, 294, 264, 1002, 293, 286, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.1393157414027623, "compression_ratio": + 1.6391304347826088, "no_speech_prob": 0.3026226758956909}, {"id": 213, "seek": 363582, + "start": 3635.82, "end": 3658.82, "text": " I think by the virtue of us being in + this space we should resist this as much as possible but I wanted to pick your your + brain on what is bias for you and how you have seen or maybe discussed this in the + email projects.", "tokens": [50364, 286, 519, 538, 264, 20816, 295, 505, 885, 294, + 341, 1901, 321, 820, 4597, 341, 382, 709, 382, 1944, 457, 286, 1415, 281, 1888, + 428, 428, 3567, 322, 437, 307, 12577, 337, 291, 293, 577, 291, 362, 1612, 420, 1310, + 7152, 341, 294, 264, 3796, 4455, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.1466831693462297, "compression_ratio": 1.4697986577181208, "no_speech_prob": + 0.02044491283595562}, {"id": 214, "seek": 365882, "start": 3658.82, "end": 3683.82, + "text": " I really like this topic because yeah I recently hosted a panel discussion + about biases and a lot of hosting a panel discussions because you can come you can + know zero about the subject you can ask the stupidest questions to the most awesome + engineers in the field and your returning super educated so maybe I''ll try to just + to recreate what people from this panel discussion and early in set to me.", "tokens": + [50364, 286, 
534, 411, 341, 4829, 570, 1338, 286, 3938, 19204, 257, 4831, 5017, + 466, 32152, 293, 257, 688, 295, 16058, 257, 4831, 11088, 570, 291, 393, 808, 291, + 393, 458, 4018, 466, 264, 3983, 291, 393, 1029, 264, 6631, 377, 1651, 281, 264, + 881, 3476, 11955, 294, 264, 2519, 293, 428, 12678, 1687, 15872, 370, 1310, 286, + 603, 853, 281, 445, 281, 25833, 437, 561, 490, 341, 4831, 5017, 293, 2440, 294, + 992, 281, 385, 13, 51614], "temperature": 0.0, "avg_logprob": -0.25541367530822756, + "compression_ratio": 1.7577092511013215, "no_speech_prob": 0.5616403222084045}, + {"id": 215, "seek": 368382, "start": 3684.82, "end": 3695.82, "text": " But as far + from my understanding they are like not two types of biases but I would I consider + them as a two types of biases.", "tokens": [50414, 583, 382, 1400, 490, 452, 3701, + 436, 366, 411, 406, 732, 3467, 295, 32152, 457, 286, 576, 286, 1949, 552, 382, 257, + 732, 3467, 295, 32152, 13, 50964], "temperature": 0.0, "avg_logprob": -0.19801256733555947, + "compression_ratio": 1.3191489361702127, "no_speech_prob": 0.5397446751594543}, + {"id": 216, "seek": 369582, "start": 3695.82, "end": 3713.82, "text": " The one + is I think all bias more related to the stuff which is like the things that we shouldn''t + be biased on but we are biased because of the historical data or like the mother + unfair like", "tokens": [50364, 440, 472, 307, 286, 519, 439, 12577, 544, 4077, + 281, 264, 1507, 597, 307, 411, 264, 721, 300, 321, 4659, 380, 312, 28035, 322, 457, + 321, 366, 28035, 570, 295, 264, 8584, 1412, 420, 411, 264, 2895, 17019, 411, 51264], + "temperature": 0.0, "avg_logprob": -0.19767610416855924, "compression_ratio": 1.4765625, + "no_speech_prob": 0.37444883584976196}, {"id": 217, "seek": 371382, "start": 3713.82, + "end": 3727.82, "text": " like results of the history so it''s like the bias of + the skin color the bias of the gender the bias of some other and they are here and + there in the set in the big models.", "tokens": [50364, 
411, 3542, 295, 264, 2503, + 370, 309, 311, 411, 264, 12577, 295, 264, 3178, 2017, 264, 12577, 295, 264, 7898, + 264, 12577, 295, 512, 661, 293, 436, 366, 510, 293, 456, 294, 264, 992, 294, 264, + 955, 5245, 13, 51064], "temperature": 0.0, "avg_logprob": -0.22371848793916924, + "compression_ratio": 1.6132075471698113, "no_speech_prob": 0.24564941227436066}, + {"id": 218, "seek": 372782, "start": 3727.82, "end": 3732.82, "text": " For example + like even the dolly and GPT free and everything.", "tokens": [50364, 1171, 1365, + 411, 754, 264, 2722, 88, 293, 26039, 51, 1737, 293, 1203, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.267921274358576, "compression_ratio": 1.668141592920354, + "no_speech_prob": 0.509493887424469}, {"id": 219, "seek": 372782, "start": 3732.82, + "end": 3752.82, "text": " Sadly scenes they are learning on the internet available + data and the internet is a very toxic space sometimes especially like I still like + the stories of the chat bots which were like learned on Twitter and then they are + like so offensive that nobody can leave them out in the open like business communication + word.", "tokens": [50614, 29628, 8026, 436, 366, 2539, 322, 264, 4705, 2435, 1412, + 293, 264, 4705, 307, 257, 588, 12786, 1901, 2171, 2318, 411, 286, 920, 411, 264, + 3676, 295, 264, 5081, 35410, 597, 645, 411, 3264, 322, 5794, 293, 550, 436, 366, + 411, 370, 15710, 300, 5079, 393, 1856, 552, 484, 294, 264, 1269, 411, 1606, 6101, + 1349, 13, 51614], "temperature": 0.0, "avg_logprob": -0.267921274358576, "compression_ratio": + 1.668141592920354, "no_speech_prob": 0.509493887424469}, {"id": 220, "seek": 375282, + "start": 3753.82, "end": 3779.82, "text": " So these models of course have this + at the biases and that should be controlled very much and that''s why we have like + the fairness fairness topic in the eye and that''s exactly like I actually I studied + recently I love my masters because I''m revisiting all the topics in the ML so I''m + feeling 
like when I''m talking about it I''m feeling like I''m literally coming + from the academic background.", "tokens": [50414, 407, 613, 5245, 295, 1164, 362, + 341, 412, 264, 32152, 293, 300, 820, 312, 10164, 588, 709, 293, 300, 311, 983, 321, + 362, 411, 264, 29765, 29765, 4829, 294, 264, 3313, 293, 300, 311, 2293, 411, 286, + 767, 286, 9454, 3938, 286, 959, 452, 19294, 570, 286, 478, 20767, 1748, 439, 264, + 8378, 294, 264, 21601, 370, 286, 478, 2633, 411, 562, 286, 478, 1417, 466, 309, + 286, 478, 2633, 411, 286, 478, 3736, 1348, 490, 264, 7778, 3678, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.2213426090422131, "compression_ratio": 1.748878923766816, + "no_speech_prob": 0.571054995059967}, {"id": 221, "seek": 377982, "start": 3779.82, + "end": 3793.82, "text": " It''s just the monsters and the fairness algorithms there + like pretty much set up how you can evaluate how you can try to make your data list + bias or just test it on the fairness but yeah still here.", "tokens": [50364, 467, + 311, 445, 264, 15785, 293, 264, 29765, 14642, 456, 411, 1238, 709, 992, 493, 577, + 291, 393, 13059, 577, 291, 393, 853, 281, 652, 428, 1412, 1329, 12577, 420, 445, + 1500, 309, 322, 264, 29765, 457, 1338, 920, 510, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.15697451843612495, "compression_ratio": 1.7733333333333334, "no_speech_prob": + 0.05865365266799927}, {"id": 222, "seek": 377982, "start": 3793.82, "end": 3807.82, + "text": " Sadly and the second thing which and of course their approaches how to + avoid it fully but sadly we''re constructing new biases here and there so we''re + getting rid of the one and we''re constructing you.", "tokens": [51064, 29628, 293, + 264, 1150, 551, 597, 293, 295, 1164, 641, 11587, 577, 281, 5042, 309, 4498, 457, + 22023, 321, 434, 39969, 777, 32152, 510, 293, 456, 370, 321, 434, 1242, 3973, 295, + 264, 472, 293, 321, 434, 39969, 291, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.15697451843612495, "compression_ratio": 
1.7733333333333334, "no_speech_prob": + 0.05865365266799927}, {"id": 223, "seek": 380782, "start": 3807.82, "end": 3832.82, + "text": " And the second one they are more like behavioral biases maybe they''re + like last harmful in the general because I consider ethical biases being very harmful + like when we''re creating a systems related to jurisdiction or like to some other + things these biases can be crucial and also by the way the same biases.", "tokens": + [50364, 400, 264, 1150, 472, 436, 366, 544, 411, 19124, 32152, 1310, 436, 434, 411, + 1036, 19727, 294, 264, 2674, 570, 286, 1949, 18890, 32152, 885, 588, 19727, 411, + 562, 321, 434, 4084, 257, 3652, 4077, 281, 27285, 420, 411, 281, 512, 661, 721, + 613, 32152, 393, 312, 11462, 293, 611, 538, 264, 636, 264, 912, 32152, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.16512287640180745, "compression_ratio": 1.6382978723404256, + "no_speech_prob": 0.011306934989988804}, {"id": 224, "seek": 383282, "start": 3833.82, + "end": 3861.82, "text": " Oh I can I remember the story about covid like with the + covid when people tried it''s not like that ethical bias but it''s a bias and it + was very crucial so when with covid people tried to at the beginning when it started + and everybody was panicking so people started to think it maybe they can do something + in the eye like some amount of which will help to recognize if the person has pneumonia + like is it like caused by covid or not in the lungs.", "tokens": [50414, 876, 286, + 393, 286, 1604, 264, 1657, 466, 25616, 411, 365, 264, 25616, 562, 561, 3031, 309, + 311, 406, 411, 300, 18890, 12577, 457, 309, 311, 257, 12577, 293, 309, 390, 588, + 11462, 370, 562, 365, 25616, 561, 3031, 281, 412, 264, 2863, 562, 309, 1409, 293, + 2201, 390, 2462, 10401, 370, 561, 1409, 281, 519, 309, 1310, 436, 393, 360, 746, + 294, 264, 3313, 411, 512, 2372, 295, 597, 486, 854, 281, 5521, 498, 264, 954, 575, + 43097, 411, 307, 309, 411, 7008, 538, 25616, 420, 406, 294, 264, 19467, 
13, 51814], + "temperature": 0.0, "avg_logprob": -0.16972456375757852, "compression_ratio": 1.7911646586345382, + "no_speech_prob": 0.532133162021637}, {"id": 225, "seek": 386182, "start": 3861.82, + "end": 3881.82, "text": " And there already was data from China because it started + earlier and there were like a lot of AI and the engineers working on that but the + problem was that the data was biased and it wasn''t cleaned and sorted out because + people didn''t have so much time I mean it was very like a crucial moment.", "tokens": + [50364, 400, 456, 1217, 390, 1412, 490, 3533, 570, 309, 1409, 3071, 293, 456, 645, + 411, 257, 688, 295, 7318, 293, 264, 11955, 1364, 322, 300, 457, 264, 1154, 390, + 300, 264, 1412, 390, 28035, 293, 309, 2067, 380, 16146, 293, 25462, 484, 570, 561, + 994, 380, 362, 370, 709, 565, 286, 914, 309, 390, 588, 411, 257, 11462, 1623, 13, + 51364], "temperature": 0.0, "avg_logprob": -0.12673419713974, "compression_ratio": + 1.6153846153846154, "no_speech_prob": 0.037929683923721313}, {"id": 226, "seek": + 388182, "start": 3881.82, "end": 3910.82, "text": " So because of that models were + working very biased and bad because they for example were predicting that if the + person on the like the scan if the person is kind of in the position of lying she + or he then they have covid but if it''s in the standing position they don''t have + it just because the people for lying and taking photos they were just in the worst + medical condition.", "tokens": [50364, 407, 570, 295, 300, 5245, 645, 1364, 588, + 28035, 293, 1578, 570, 436, 337, 1365, 645, 32884, 300, 498, 264, 954, 322, 264, + 411, 264, 11049, 498, 264, 954, 307, 733, 295, 294, 264, 2535, 295, 8493, 750, 420, + 415, 550, 436, 362, 25616, 457, 498, 309, 311, 294, 264, 4877, 2535, 436, 500, 380, + 362, 309, 445, 570, 264, 561, 337, 8493, 293, 1940, 5787, 436, 645, 445, 294, 264, + 5855, 4625, 4188, 13, 51814], "temperature": 0.0, "avg_logprob": -0.16406820981930464, + 
"compression_ratio": 1.829268292682927, "no_speech_prob": 0.588449239730835}, {"id": + 227, "seek": 391182, "start": 3911.82, "end": 3928.82, "text": " In general because + like when you can''t stand out that means that you''re pretty you know so it was + just a bias in the diet because it was a balance and that is the result of bias + which you need to monitor in control what that''s why you can''t leave it in the + open world.", "tokens": [50364, 682, 2674, 570, 411, 562, 291, 393, 380, 1463, 484, + 300, 1355, 300, 291, 434, 1238, 291, 458, 370, 309, 390, 445, 257, 12577, 294, 264, + 6339, 570, 309, 390, 257, 4772, 293, 300, 307, 264, 1874, 295, 12577, 597, 291, + 643, 281, 6002, 294, 1969, 437, 300, 311, 983, 291, 393, 380, 1856, 309, 294, 264, + 1269, 1002, 13, 51214], "temperature": 0.0, "avg_logprob": -0.19044527411460876, + "compression_ratio": 1.6341463414634145, "no_speech_prob": 0.14326226711273193}, + {"id": 228, "seek": 392882, "start": 3928.82, "end": 3953.82, "text": " And yeah + so the behavior biases it''s more like about when like for example with the search + engine I think I touched it the position bias it''s when you''re just trained to + click on the like first three elements that you see because you''re you''re so well + with information that you don''t have like a power to go through the tens of pages + and select what exactly do or there like some other biases.", "tokens": [50364, + 400, 1338, 370, 264, 5223, 32152, 309, 311, 544, 411, 466, 562, 411, 337, 1365, + 365, 264, 3164, 2848, 286, 519, 286, 9828, 309, 264, 2535, 12577, 309, 311, 562, + 291, 434, 445, 8895, 281, 2052, 322, 264, 411, 700, 1045, 4959, 300, 291, 536, 570, + 291, 434, 291, 434, 370, 731, 365, 1589, 300, 291, 500, 380, 362, 411, 257, 1347, + 281, 352, 807, 264, 10688, 295, 7183, 293, 3048, 437, 2293, 360, 420, 456, 411, + 512, 661, 32152, 13, 51614], "temperature": 0.0, "avg_logprob": -0.14760107152602253, + "compression_ratio": 1.708695652173913, "no_speech_prob": 
0.13352127373218536}, + {"id": 229, "seek": 395382, "start": 3954.82, "end": 3979.82, "text": " For example + we know that one behavior thing that people have it''s it''s interesting thing it''s + called it''s called choice over it''s like when a recommendation systems people + actually like in restaurants people prefer to see something with a bigger menu with + a bigger recommendation because they think oh it''s enriched it''s nice I would + love it but the more choice you have.", "tokens": [50414, 1171, 1365, 321, 458, + 300, 472, 5223, 551, 300, 561, 362, 309, 311, 309, 311, 1880, 551, 309, 311, 1219, + 309, 311, 1219, 3922, 670, 309, 311, 411, 562, 257, 11879, 3652, 561, 767, 411, + 294, 11486, 561, 4382, 281, 536, 746, 365, 257, 3801, 6510, 365, 257, 3801, 11879, + 570, 436, 519, 1954, 309, 311, 48624, 309, 311, 1481, 286, 576, 959, 309, 457, 264, + 544, 3922, 291, 362, 13, 51664], "temperature": 0.0, "avg_logprob": -0.18433058420817058, + "compression_ratio": 1.8097560975609757, "no_speech_prob": 0.03349823132157326}, + {"id": 230, "seek": 397982, "start": 3979.82, "end": 4007.82, "text": " The more + cost you''re spending of on decision your inner cost your evaluation cost at some + point it becomes just not like not feasible not useful like you need less items + to select better choice at some point you just lose attention and everything and + that''s another like thing which comes from our behavior and which biases a lot + of instruments and which biases a lot of like models and which we need to take into + account.", "tokens": [50364, 440, 544, 2063, 291, 434, 6434, 295, 322, 3537, 428, + 7284, 2063, 428, 13344, 2063, 412, 512, 935, 309, 3643, 445, 406, 411, 406, 26648, + 406, 4420, 411, 291, 643, 1570, 4754, 281, 3048, 1101, 3922, 412, 512, 935, 291, + 445, 3624, 3202, 293, 1203, 293, 300, 311, 1071, 411, 551, 597, 1487, 490, 527, + 5223, 293, 597, 32152, 257, 688, 295, 12190, 293, 597, 32152, 257, 688, 295, 411, + 5245, 293, 597, 321, 643, 281, 747, 
666, 2696, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.11496774355570476, "compression_ratio": 1.8755555555555556, "no_speech_prob": + 0.033838462084531784}, {"id": 231, "seek": 400782, "start": 4007.82, "end": 4032.82, + "text": " Or otherwise we won''t be successful with implementing it yeah absolutely + on this paralysis paralysis of choice would you think that reducing the number of + options with bias our system in some way like strictly speaking do we actually introduce + bias by reducing the number of options.", "tokens": [50364, 1610, 5911, 321, 1582, + 380, 312, 4406, 365, 18114, 309, 1338, 3122, 322, 341, 49507, 49507, 295, 3922, + 576, 291, 519, 300, 12245, 264, 1230, 295, 3956, 365, 12577, 527, 1185, 294, 512, + 636, 411, 20792, 4124, 360, 321, 767, 5366, 12577, 538, 12245, 264, 1230, 295, 3956, + 13, 51614], "temperature": 0.0, "avg_logprob": -0.22324346146493587, "compression_ratio": + 1.6358381502890174, "no_speech_prob": 0.07902079820632935}, {"id": 232, "seek": + 403282, "start": 4033.82, "end": 4060.82, "text": " Oh I''m pretty sure we do but + like as I said sometimes you can use biases like not all of the biases I would say + there that harmful sometimes you can just like try to use them for like having more + good outcome of course I''m not talking about the ethical biases but with like for + example with reducing the choice amount.", "tokens": [50414, 876, 286, 478, 1238, + 988, 321, 360, 457, 411, 382, 286, 848, 2171, 291, 393, 764, 32152, 411, 406, 439, + 295, 264, 32152, 286, 576, 584, 456, 300, 19727, 2171, 291, 393, 445, 411, 853, + 281, 764, 552, 337, 411, 1419, 544, 665, 9700, 295, 1164, 286, 478, 406, 1417, 466, + 264, 18890, 32152, 457, 365, 411, 337, 1365, 365, 12245, 264, 3922, 2372, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.16870166944420856, "compression_ratio": 1.6736842105263159, + "no_speech_prob": 0.04775127023458481}, {"id": 233, "seek": 406082, "start": 4060.82, + "end": 4085.82, "text": " Of course you''re 
biased people towards like the what + you offer for example it depends on what you offer in this limited choice and if + you''re like offering them the most popular of course they can be stuck in the pool + of selecting the same items without changing their preferences which they would + like to but in general for them it would be easier to select something that they + really prefer by the whole characteristics.", "tokens": [50364, 2720, 1164, 291, + 434, 28035, 561, 3030, 411, 264, 437, 291, 2626, 337, 1365, 309, 5946, 322, 437, + 291, 2626, 294, 341, 5567, 3922, 293, 498, 291, 434, 411, 8745, 552, 264, 881, 3743, + 295, 1164, 436, 393, 312, 5541, 294, 264, 7005, 295, 18182, 264, 912, 4754, 1553, + 4473, 641, 21910, 597, 436, 576, 411, 281, 457, 294, 2674, 337, 552, 309, 576, 312, + 3571, 281, 3048, 746, 300, 436, 534, 4382, 538, 264, 1379, 10891, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.09764165413088916, "compression_ratio": 1.7838983050847457, + "no_speech_prob": 0.03760448843240738}, {"id": 234, "seek": 408582, "start": 4085.82, + "end": 4113.82, "text": " So even by using people here you''re actually kind of + helping them with the choice process so I would say it was a general recommendation + like after 10 or 15 items as far as I recall your choice overload becomes too much + so you just can''t like you know I hate these restaurants when you have an immune + soup she feeds Indian food Mexican food you''re got oh my god I''m so hungry but + I can''t chew.", "tokens": [50364, 407, 754, 538, 1228, 561, 510, 291, 434, 767, + 733, 295, 4315, 552, 365, 264, 3922, 1399, 370, 286, 576, 584, 309, 390, 257, 2674, + 11879, 411, 934, 1266, 420, 2119, 4754, 382, 1400, 382, 286, 9901, 428, 3922, 28777, + 3643, 886, 709, 370, 291, 445, 393, 380, 411, 291, 458, 286, 4700, 613, 11486, 562, + 291, 362, 364, 11992, 7884, 750, 23712, 6427, 1755, 16164, 1755, 291, 434, 658, + 1954, 452, 3044, 286, 478, 370, 8067, 457, 286, 393, 380, 21200, 13, 51764], "temperature": 
+ 0.0, "avg_logprob": -0.19349334979879446, "compression_ratio": 1.5595238095238095, + "no_speech_prob": 0.13898329436779022}, {"id": 235, "seek": 411382, "start": 4113.82, + "end": 4143.78, "text": " Yeah it''s like no focus and maybe no face of the restaurant + in a way but at the same time I''m pretty sure there are customers who are like + in haste and they don''t have time to drill and understand what is this local cuisine + here just give me that pizza or burger or whatever and I will flick through the + menu right but I really wanted to relate to what you said and I think bias is not + always negative and I think it''s important.", "tokens": [50364, 865, 309, 311, + 411, 572, 1879, 293, 1310, 572, 1851, 295, 264, 6383, 294, 257, 636, 457, 412, 264, + 912, 565, 286, 478, 1238, 988, 456, 366, 4581, 567, 366, 411, 294, 6581, 68, 293, + 436, 500, 380, 362, 565, 281, 11392, 293, 1223, 437, 307, 341, 2654, 25257, 510, + 445, 976, 385, 300, 8298, 420, 16393, 420, 2035, 293, 286, 486, 22774, 807, 264, + 6510, 558, 457, 286, 534, 1415, 281, 10961, 281, 437, 291, 848, 293, 286, 519, 12577, + 307, 406, 1009, 3671, 293, 286, 519, 309, 311, 1021, 13, 51862], "temperature": + 0.0, "avg_logprob": -0.1232108567890368, "compression_ratio": 1.67578125, "no_speech_prob": + 0.02889730967581272}, {"id": 236, "seek": 414382, "start": 4143.82, "end": 4173.78, + "text": " To understand that sometimes in circumstances circumstances it could be + actually bringing positive impact and the example you gave is one of that right + so but at the same time whatever I show on the screen in the search engine you already + talked about it''s a click bias right what I show in that order you know in majority + of countries in the world will go top to bottom left to right and we will click + and I will show you what I''m doing.", "tokens": [50364, 1407, 1223, 300, 2171, + 294, 9121, 9121, 309, 727, 312, 767, 5062, 3353, 2712, 293, 264, 1365, 291, 2729, + 307, 472, 295, 300, 558, 370, 457, 412, 
264, 912, 565, 2035, 286, 855, 322, 264, + 2568, 294, 264, 3164, 2848, 291, 1217, 2825, 466, 309, 311, 257, 2052, 12577, 558, + 437, 286, 855, 294, 300, 1668, 291, 458, 294, 6286, 295, 3517, 294, 264, 1002, 486, + 352, 1192, 281, 2767, 1411, 281, 558, 293, 321, 486, 2052, 293, 286, 486, 855, 291, + 437, 286, 478, 884, 13, 51862], "temperature": 0.0, "avg_logprob": -0.31113804711235893, + "compression_ratio": 1.7732793522267207, "no_speech_prob": 0.09727617353200912}, + {"id": 237, "seek": 417382, "start": 4173.82, "end": 4197.82, "text": " In that + order but at the same time anything that I say for example now I already bias you + to think in that direction and if I choose another strategy and I start talking + about snow or something else your mind will tune to completely different topic right + and you will be biased again without realizing that I do this.", "tokens": [50364, + 682, 300, 1668, 457, 412, 264, 912, 565, 1340, 300, 286, 584, 337, 1365, 586, 286, + 1217, 12577, 291, 281, 519, 294, 300, 3513, 293, 498, 286, 2826, 1071, 5206, 293, + 286, 722, 1417, 466, 5756, 420, 746, 1646, 428, 1575, 486, 10864, 281, 2584, 819, + 4829, 558, 293, 291, 486, 312, 28035, 797, 1553, 16734, 300, 286, 360, 341, 13, + 51564], "temperature": 0.0, "avg_logprob": -0.1375464659470778, "compression_ratio": + 1.6173469387755102, "no_speech_prob": 0.03766642138361931}, {"id": 238, "seek": + 419782, "start": 4197.82, "end": 4212.82, "text": " So the same actually will apply + I think to the annotation and labeling project right so whatever I show in which + order I show which questions I ask will bias the annotators to.", "tokens": [50364, + 407, 264, 912, 767, 486, 3079, 286, 519, 281, 264, 48654, 293, 40244, 1716, 558, + 370, 2035, 286, 855, 294, 597, 1668, 286, 855, 597, 1651, 286, 1029, 486, 12577, + 264, 25339, 3391, 281, 13, 51114], "temperature": 0.0, "avg_logprob": -0.20866049253023589, + "compression_ratio": 1.4545454545454546, "no_speech_prob": 0.053199127316474915}, + 
{"id": 239, "seek": 421282, "start": 4213.82, "end": 4234.62, "text": " Besides + all other factors like if I overload with them with questions they will be tired + and really not give me value but if I didn''t do that the order of the tasks might + bias them and many other items can you talk a bit more about that and also is there + some silver bullet that can solve this with least improve.", "tokens": [50414, 13212, + 439, 661, 6771, 411, 498, 286, 28777, 365, 552, 365, 1651, 436, 486, 312, 5868, + 293, 534, 406, 976, 385, 2158, 457, 498, 286, 994, 380, 360, 300, 264, 1668, 295, + 264, 9608, 1062, 12577, 552, 293, 867, 661, 4754, 393, 291, 751, 257, 857, 544, + 466, 300, 293, 611, 307, 456, 512, 8753, 11632, 300, 393, 5039, 341, 365, 1935, + 3470, 13, 51454], "temperature": 0.0, "avg_logprob": -0.12013718661139994, "compression_ratio": + 1.6051282051282052, "no_speech_prob": 0.4875321090221405}, {"id": 240, "seek": 423462, + "start": 4235.62, "end": 4246.62, "text": " Okay from from my position which is + a very subjective opinion I would say bias is a very human thing so even creating + like a big.", "tokens": [50414, 1033, 490, 490, 452, 2535, 597, 307, 257, 588, 25972, + 4800, 286, 576, 584, 12577, 307, 257, 588, 1952, 551, 370, 754, 4084, 411, 257, + 955, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2325829998139412, "compression_ratio": + 1.2285714285714286, "no_speech_prob": 0.021475812420248985}, {"id": 241, "seek": + 424662, "start": 4246.62, "end": 4275.62, "text": " The same shunt the perfect model + trains on a purely human data in the amount that we have it now even if we increase + the amount we can get rid of the biases which are we are so afraid of so we need + to go to the some system with robots creating robots creating robots and then maybe + we''re get rid of our own biases because as you know human factor it''s a really + like a thing where just making mistakes sometimes just because we''re.", "tokens": + [50364, 440, 912, 402, 2760, 264, 
2176, 2316, 16329, 322, 257, 17491, 1952, 1412, + 294, 264, 2372, 300, 321, 362, 309, 586, 754, 498, 321, 3488, 264, 2372, 321, 393, + 483, 3973, 295, 264, 32152, 597, 366, 321, 366, 370, 4638, 295, 370, 321, 643, 281, + 352, 281, 264, 512, 1185, 365, 14733, 4084, 14733, 4084, 14733, 293, 550, 1310, + 321, 434, 483, 3973, 295, 527, 1065, 32152, 570, 382, 291, 458, 1952, 5952, 309, + 311, 257, 534, 411, 257, 551, 689, 445, 1455, 8038, 2171, 445, 570, 321, 434, 13, + 51814], "temperature": 0.0, "avg_logprob": -0.28658641281948294, "compression_ratio": + 1.799163179916318, "no_speech_prob": 0.17445215582847595}, {"id": 242, "seek": 427662, + "start": 4276.62, "end": 4287.62, "text": " Like not as perfect in the tentative + so we''re just biased as if but I would say with the human labeling you see you''re + doing product from bias humans to bias.", "tokens": [50364, 1743, 406, 382, 2176, + 294, 264, 7054, 1166, 370, 321, 434, 445, 28035, 382, 498, 457, 286, 576, 584, 365, + 264, 1952, 40244, 291, 536, 291, 434, 884, 1674, 490, 12577, 6255, 281, 12577, 13, + 50914], "temperature": 0.0, "avg_logprob": -0.22607860345950073, "compression_ratio": + 1.7136752136752136, "no_speech_prob": 0.0036492866929620504}, {"id": 243, "seek": + 427662, "start": 4289.62, "end": 4305.62, "text": " At least the thing that why + I was talking about the Ecclicity like when you''re predicting something by the + their clicks in the online experiments you''re introducing even more like you''re + introducing a third person in this chain developer who.", "tokens": [51014, 1711, + 1935, 264, 551, 300, 983, 286, 390, 1417, 466, 264, 462, 1914, 75, 44198, 411, 562, + 291, 434, 32884, 746, 538, 264, 641, 18521, 294, 264, 2950, 12050, 291, 434, 15424, + 754, 544, 411, 291, 434, 15424, 257, 2636, 954, 294, 341, 5021, 10754, 567, 13, + 51814], "temperature": 0.0, "avg_logprob": -0.22607860345950073, "compression_ratio": + 1.7136752136752136, "no_speech_prob": 0.0036492866929620504}, {"id": 244, "seek": 
+ 430662, "start": 4306.62, "end": 4324.0199999999995, "text": " Does assumptions + about others people bias is sometimes without knowing their culture or like as you + said in search engines we read like on top to bottom from left to right but sometimes + in some cultures they have different way of writing reading right to let or maybe + they have like designing.", "tokens": [50364, 4402, 17695, 466, 2357, 561, 12577, + 307, 2171, 1553, 5276, 641, 3713, 420, 411, 382, 291, 848, 294, 3164, 12982, 321, + 1401, 411, 322, 1192, 281, 2767, 490, 1411, 281, 558, 457, 2171, 294, 512, 12951, + 436, 362, 819, 636, 295, 3579, 3760, 558, 281, 718, 420, 1310, 436, 362, 411, 14685, + 13, 51234], "temperature": 0.0, "avg_logprob": -0.23923395391096147, "compression_ratio": + 1.6187845303867403, "no_speech_prob": 0.0028459487948566675}, {"id": 245, "seek": + 432402, "start": 4324.02, "end": 4353.02, "text": " There are different people which + like different types of the search like based on age maybe sometimes some people + are like seeing less or there are people with color blindness and they need some + other results because like it''s also depends on how do you see everything so when + only one person like assumes especially developer like I was asking on the panel + a question like should be or we all be psychologists and philosophers to create + a systems because like when the development of the system is going to be a lot more + difficult to do.", "tokens": [50414, 821, 366, 819, 561, 597, 411, 819, 3467, 295, + 264, 3164, 411, 2361, 322, 3205, 1310, 2171, 512, 561, 366, 411, 2577, 1570, 420, + 456, 366, 561, 365, 2017, 46101, 293, 436, 643, 512, 661, 3542, 570, 411, 309, 311, + 611, 5946, 322, 577, 360, 291, 536, 1203, 370, 562, 787, 472, 954, 411, 37808, 2318, + 10754, 411, 286, 390, 3365, 322, 264, 4831, 257, 1168, 411, 820, 312, 420, 321, + 439, 312, 41562, 293, 36839, 281, 1884, 257, 3652, 570, 411, 562, 264, 3250, 295, + 264, 1185, 307, 516, 281, 312, 257, 688, 544, 
2252, 281, 360, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.3789348228304994, "compression_ratio": 1.8551724137931034, + "no_speech_prob": 0.15549011528491974}, {"id": 246, "seek": 435402, "start": 4354.02, + "end": 4375.02, "text": " The developer decides what to do it sometimes this person + is not educated about like psychology of the human behavior and it might give some + mistakes so that''s why I think human labeling wins it''s not like there are people + are who are like psychologists and philosophers but you are giving the same task + to the same crowd you can like do a perfilter in of the crowd.", "tokens": [50364, + 440, 10754, 14898, 437, 281, 360, 309, 2171, 341, 954, 307, 406, 15872, 466, 411, + 15105, 295, 264, 1952, 5223, 293, 309, 1062, 976, 512, 8038, 370, 300, 311, 983, + 286, 519, 1952, 40244, 10641, 309, 311, 406, 411, 456, 366, 561, 366, 567, 366, + 411, 41562, 293, 36839, 457, 291, 366, 2902, 264, 912, 5633, 281, 264, 912, 6919, + 291, 393, 411, 360, 257, 13826, 388, 391, 294, 295, 264, 6919, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.1955866627878957, "compression_ratio": 1.6561085972850678, + "no_speech_prob": 0.05178459361195564}, {"id": 247, "seek": 437502, "start": 4375.02, + "end": 4394.02, "text": " For example by the same targets audience that you''re + interested in by the language by the culture by the interests by like for example + you have a comfort flowers like people who like flowers like work with flowers or + ask them do you like flowers and then send them to this task and by this your at + least like.", "tokens": [50364, 1171, 1365, 538, 264, 912, 12911, 4034, 300, 291, + 434, 3102, 294, 538, 264, 2856, 538, 264, 3713, 538, 264, 8847, 538, 411, 337, 1365, + 291, 362, 257, 3400, 8085, 411, 561, 567, 411, 8085, 411, 589, 365, 8085, 420, 1029, + 552, 360, 291, 411, 8085, 293, 550, 2845, 552, 281, 341, 5633, 293, 538, 341, 428, + 412, 1935, 411, 13, 51314], "temperature": 0.0, "avg_logprob": -0.22781615624060997, 
+ "compression_ratio": 1.8176470588235294, "no_speech_prob": 0.07845555990934372}, + {"id": 248, "seek": 439402, "start": 4394.02, "end": 4423.02, "text": " Trying to + model the same behavior with actual like people by this having the same distribution + like maybe small like sub sample of the same distribution of people for your target + users and you''re not deciding for them yourself so I would say the best recommendation + is to think about filtering your crowd for your assignments thinking of who you + want to be satisfied by your product and asking the people related to that to do + the evolution.", "tokens": [50364, 20180, 281, 2316, 264, 912, 5223, 365, 3539, + 411, 561, 538, 341, 1419, 264, 912, 7316, 411, 1310, 1359, 411, 1422, 6889, 295, + 264, 912, 7316, 295, 561, 337, 428, 3779, 5022, 293, 291, 434, 406, 17990, 337, + 552, 1803, 370, 286, 576, 584, 264, 1151, 11879, 307, 281, 519, 466, 30822, 428, + 6919, 337, 428, 22546, 1953, 295, 567, 291, 528, 281, 312, 11239, 538, 428, 1674, + 293, 3365, 264, 561, 4077, 281, 300, 281, 360, 264, 9303, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.13952910181987715, "compression_ratio": 1.8148148148148149, + "no_speech_prob": 0.0923244059085846}, {"id": 249, "seek": 442402, "start": 4424.02, + "end": 4427.02, "text": " So I think that''s the best recommendation to do the testing.", + "tokens": [50364, 407, 286, 519, 300, 311, 264, 1151, 11879, 281, 360, 264, 4997, + 13, 50514], "temperature": 0.0, "avg_logprob": -0.32419997169857934, "compression_ratio": + 1.7831858407079646, "no_speech_prob": 0.18618302047252655}, {"id": 250, "seek": + 442402, "start": 4427.02, "end": 4453.02, "text": " Yeah, it''s just one thought + came to mind that in principle if we consider it you know annotation process is + building some kind of mathematical function that we''re trying to with which we''re + trying to fit into the reality then in principle we could have built a perfect annotation, + you know composition project that 
would fit into the reality.", "tokens": [50514, + 865, 11, 309, 311, 445, 472, 1194, 1361, 281, 1575, 300, 294, 8665, 498, 321, 1949, + 309, 291, 458, 48654, 1399, 307, 2390, 512, 733, 295, 18894, 2445, 300, 321, 434, + 1382, 281, 365, 597, 321, 434, 1382, 281, 3318, 666, 264, 4103, 550, 294, 8665, + 321, 727, 362, 3094, 257, 2176, 48654, 11, 291, 458, 12686, 1716, 300, 576, 3318, + 666, 264, 4103, 13, 51814], "temperature": 0.0, "avg_logprob": -0.32419997169857934, + "compression_ratio": 1.7831858407079646, "no_speech_prob": 0.18618302047252655}, + {"id": 251, "seek": 445302, "start": 4453.02, "end": 4463.02, "text": " So I think + that''s the best recommendation to replicate the same biases that exist today and + earn money right that would be kind of the wrong way to go.", "tokens": [50364, + 407, 286, 519, 300, 311, 264, 1151, 11879, 281, 25356, 264, 912, 32152, 300, 2514, + 965, 293, 6012, 1460, 558, 300, 576, 312, 733, 295, 264, 2085, 636, 281, 352, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.3963607424781436, "compression_ratio": + 1.7186311787072244, "no_speech_prob": 0.01664387807250023}, {"id": 252, "seek": + 445302, "start": 4463.02, "end": 4466.02, "text": " I hope companies do do doing + that right.", "tokens": [50864, 286, 1454, 3431, 360, 360, 884, 300, 558, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.3963607424781436, "compression_ratio": 1.7186311787072244, + "no_speech_prob": 0.01664387807250023}, {"id": 253, "seek": 445302, "start": 4466.02, + "end": 4482.02, "text": " I need I mean I need to say that even like I think even + to log if I am not mistaken it actually uses as a support a little bit of machine + learning labeling so we''re learning on our crowd in each project and we''re like + providing some sublingling with the most.", "tokens": [51014, 286, 643, 286, 914, + 286, 643, 281, 584, 300, 754, 411, 286, 519, 754, 281, 3565, 498, 286, 669, 406, + 21333, 309, 767, 4960, 382, 257, 1406, 257, 707, 857, 295, 3479, 2539, 40244, 
370, + 321, 434, 2539, 322, 527, 6919, 294, 1184, 1716, 293, 321, 434, 411, 6530, 512, + 1422, 1688, 1688, 365, 264, 881, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.3963607424781436, "compression_ratio": 1.7186311787072244, "no_speech_prob": + 0.01664387807250023}, {"id": 254, "seek": 448202, "start": 4482.02, "end": 4497.02, + "text": " Learned by the prices of the target auditory and tries to emulate the + same behavior, but it''s still not like the evil sentient AI robotics because it''s + mostly manually labeled but I need to know.", "tokens": [50364, 17216, 292, 538, + 264, 7901, 295, 264, 3779, 17748, 827, 293, 9898, 281, 45497, 264, 912, 5223, 11, + 457, 309, 311, 920, 406, 411, 264, 6724, 2279, 1196, 7318, 34145, 570, 309, 311, + 5240, 16945, 21335, 457, 286, 643, 281, 458, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.3141353855962339, "compression_ratio": 1.3636363636363635, "no_speech_prob": + 0.06155944988131523}, {"id": 255, "seek": 449702, "start": 4497.02, "end": 4510.02, + "text": " I need to mention that also humans who are like labeling assignments sometimes + they are very educated and very smart and they are very willing to correct the system + and actually when you want to correct the system you''re becoming super talented.", + "tokens": [50364, 286, 643, 281, 2152, 300, 611, 6255, 567, 366, 411, 40244, 22546, + 2171, 436, 366, 588, 15872, 293, 588, 4069, 293, 436, 366, 588, 4950, 281, 3006, + 264, 1185, 293, 767, 562, 291, 528, 281, 3006, 264, 1185, 291, 434, 5617, 1687, + 13467, 13, 51014], "temperature": 0.0, "avg_logprob": -0.1457903357113109, "compression_ratio": + 1.8828451882845187, "no_speech_prob": 0.8136023283004761}, {"id": 256, "seek": 449702, + "start": 4510.02, "end": 4525.02, "text": " So I saw some people creating some algorithms + which are labeling the assignments for them and relating the human time of labeling + human way of like moving the mouse human way of understanding instruction.", "tokens": + [51014, 
407, 286, 1866, 512, 561, 4084, 512, 14642, 597, 366, 40244, 264, 22546, + 337, 552, 293, 23968, 264, 1952, 565, 295, 40244, 1952, 636, 295, 411, 2684, 264, + 9719, 1952, 636, 295, 3701, 10951, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.1457903357113109, "compression_ratio": 1.8828451882845187, "no_speech_prob": + 0.8136023283004761}, {"id": 257, "seek": 452502, "start": 4525.02, "end": 4535.02, + "text": " Recently I was asked how we''re blocking this type of people but I''m + saying by to bond this type of people there getting so close to actually label it + that.", "tokens": [50364, 20072, 286, 390, 2351, 577, 321, 434, 17776, 341, 2010, + 295, 561, 457, 286, 478, 1566, 538, 281, 6086, 341, 2010, 295, 561, 456, 1242, 370, + 1998, 281, 767, 7645, 309, 300, 13, 50864], "temperature": 0.0, "avg_logprob": -0.32949385923497815, + "compression_ratio": 1.608695652173913, "no_speech_prob": 0.12967734038829803}, + {"id": 258, "seek": 452502, "start": 4535.02, "end": 4550.02, "text": " Yeah, I''m + sure we can learn learned from them because and you already touched on this topic + that another big area of researchers how we can.", "tokens": [50864, 865, 11, 286, + 478, 988, 321, 393, 1466, 3264, 490, 552, 570, 293, 291, 1217, 9828, 322, 341, 4829, + 300, 1071, 955, 1859, 295, 10309, 577, 321, 393, 13, 51614], "temperature": 0.0, + "avg_logprob": -0.32949385923497815, "compression_ratio": 1.608695652173913, "no_speech_prob": + 0.12967734038829803}, {"id": 259, "seek": 455002, "start": 4550.02, "end": 4570.02, + "text": " I believe it''s called gamification or like you break the machine learning + model by supplying certain sequence of actions and input such that it will unlock + some doors or whatever right maybe you receive a loan that you are not supposed + to and things like that.", "tokens": [50364, 286, 1697, 309, 311, 1219, 8019, 3774, + 420, 411, 291, 1821, 264, 3479, 2539, 2316, 538, 46815, 1629, 8310, 295, 5909, 293, + 4846, 1270, 300, 309, 486, 11634, 
512, 8077, 420, 2035, 558, 1310, 291, 4774, 257, + 10529, 300, 291, 366, 406, 3442, 281, 293, 721, 411, 300, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.09129486443861476, "compression_ratio": 1.5028901734104045, + "no_speech_prob": 0.3741902709007263}, {"id": 260, "seek": 457002, "start": 4570.02, + "end": 4578.02, "text": " Yeah, this is interesting and do you think I''m asking + the same question as they see on your event.", "tokens": [50364, 865, 11, 341, 307, + 1880, 293, 360, 291, 519, 286, 478, 3365, 264, 912, 1168, 382, 436, 536, 322, 428, + 2280, 13, 50764], "temperature": 0.0, "avg_logprob": -0.12629514155180557, "compression_ratio": + 1.3852459016393444, "no_speech_prob": 0.20716407895088196}, {"id": 261, "seek": + 457002, "start": 4578.02, "end": 4585.02, "text": " Do you think that unbiased data + is attainable so there is a zero bias.", "tokens": [50764, 1144, 291, 519, 300, + 517, 5614, 1937, 1412, 307, 23766, 712, 370, 456, 307, 257, 4018, 12577, 13, 51114], + "temperature": 0.0, "avg_logprob": -0.12629514155180557, "compression_ratio": 1.3852459016393444, + "no_speech_prob": 0.20716407895088196}, {"id": 262, "seek": 458502, "start": 4585.02, + "end": 4613.02, "text": " The honest I don''t think so I don''t believe so like + I might be incorrect and the experts said like different experts like who was sitting + with me on the panel discussion and I of course I asked the same question because + it''s very interesting like is it only our thing the why we''re in this loop of + creating and fixing biases like you know like a cheap one but.", "tokens": [50364, + 440, 3245, 286, 500, 380, 519, 370, 286, 500, 380, 1697, 370, 411, 286, 1062, 312, + 18424, 293, 264, 8572, 848, 411, 819, 8572, 411, 567, 390, 3798, 365, 385, 322, + 264, 4831, 5017, 293, 286, 295, 1164, 286, 2351, 264, 912, 1168, 570, 309, 311, + 588, 1880, 411, 307, 309, 787, 527, 551, 264, 983, 321, 434, 294, 341, 6367, 295, + 4084, 293, 19442, 32152, 411, 291, 458, 411, 257, 7084, 
472, 457, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.19424399846716772, "compression_ratio": 1.6822429906542056, + "no_speech_prob": 0.16298894584178925}, {"id": 263, "seek": 461302, "start": 4613.02, + "end": 4642.02, "text": " I personally don''t think so because like a bias is a + very human thing we can try to get rid of one you''re creating another but it''s + not bad it''s not bad we just need to get rid of the actually like dangerous biases + like ethic and other ones and with the rest we just need to understand how to deal + with them and as humans we can recognize some biases which are harmful and that''s + good that''s why for example we need.", "tokens": [50364, 286, 5665, 500, 380, 519, + 370, 570, 411, 257, 12577, 307, 257, 588, 1952, 551, 321, 393, 853, 281, 483, 3973, + 295, 472, 291, 434, 4084, 1071, 457, 309, 311, 406, 1578, 309, 311, 406, 1578, 321, + 445, 643, 281, 483, 3973, 295, 264, 767, 411, 5795, 32152, 411, 37820, 293, 661, + 2306, 293, 365, 264, 1472, 321, 445, 643, 281, 1223, 577, 281, 2028, 365, 552, 293, + 382, 6255, 321, 393, 5521, 512, 32152, 597, 366, 19727, 293, 300, 311, 665, 300, + 311, 983, 337, 1365, 321, 643, 13, 51814], "temperature": 0.0, "avg_logprob": -0.11885016015235414, + "compression_ratio": 1.8165938864628821, "no_speech_prob": 0.011818533763289452}, + {"id": 264, "seek": 464202, "start": 4642.02, "end": 4670.02, "text": " The manual + evaluation of the AI systems which are trained now because they are having their + very like nicely trained but they are producing biases and they can detect what + they are doing so sometimes they can be harmful so that''s why like from my perspective + like big models alone still can be like used now even if there exists and they''re + like compressing us very much I need some on top verification.", "tokens": [50364, + 440, 9688, 13344, 295, 264, 7318, 3652, 597, 366, 8895, 586, 570, 436, 366, 1419, + 641, 588, 411, 9594, 8895, 457, 436, 366, 10501, 32152, 293, 436, 393, 5531, 
437, + 436, 366, 884, 370, 2171, 436, 393, 312, 19727, 370, 300, 311, 983, 411, 490, 452, + 4585, 411, 955, 5245, 3312, 920, 393, 312, 411, 1143, 586, 754, 498, 456, 8198, + 293, 436, 434, 411, 14778, 278, 505, 588, 709, 286, 643, 512, 322, 1192, 30206, + 13, 51764], "temperature": 0.0, "avg_logprob": -0.18522314377772955, "compression_ratio": + 1.7222222222222223, "no_speech_prob": 0.007187804207205772}, {"id": 265, "seek": + 467002, "start": 4670.02, "end": 4674.02, "text": " Yeah exactly and this is where + the human labeling comes in.", "tokens": [50364, 865, 2293, 293, 341, 307, 689, + 264, 1952, 40244, 1487, 294, 13, 50564], "temperature": 0.0, "avg_logprob": -0.23325912769024187, + "compression_ratio": 1.3388429752066116, "no_speech_prob": 0.028724296018481255}, + {"id": 266, "seek": 467002, "start": 4674.02, "end": 4682.02, "text": " I think + the flip side or if I would flip around my question about getting the or your question + rather.", "tokens": [50564, 286, 519, 264, 7929, 1252, 420, 498, 286, 576, 7929, + 926, 452, 1168, 466, 1242, 264, 420, 428, 1168, 2831, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.23325912769024187, "compression_ratio": 1.3388429752066116, + "no_speech_prob": 0.028724296018481255}, {"id": 267, "seek": 468202, "start": 4682.02, + "end": 4703.02, "text": " Of getting the completely unbiased said you could also + say could we actually source the data said that that contains all the little biases + or little diversities that exist in the world right or maybe not okay not in the + world but in that domain of operation that you are you are in your business and + maybe.", "tokens": [50364, 2720, 1242, 264, 2584, 517, 5614, 1937, 848, 291, 727, + 611, 584, 727, 321, 767, 4009, 264, 1412, 848, 300, 300, 8306, 439, 264, 707, 32152, + 420, 707, 6111, 1088, 300, 2514, 294, 264, 1002, 558, 420, 1310, 406, 1392, 406, + 294, 264, 1002, 457, 294, 300, 9274, 295, 6916, 300, 291, 366, 291, 366, 294, 428, + 1606, 293, 1310, 13, 51414], 
"temperature": 0.0, "avg_logprob": -0.13214733417217547, + "compression_ratio": 1.7191011235955056, "no_speech_prob": 0.2925281822681427}, + {"id": 268, "seek": 470302, "start": 4703.02, "end": 4726.02, "text": " Formulating + it that way gives me a lever to start thinking okay what is it that I''m missing + in in the data and of course this is the most challenging question to know what + I don''t know what I''m missing right so it''s equally hard but it''s probably more + in the trajectory of a massive the data set.", "tokens": [50364, 10126, 12162, 309, + 300, 636, 2709, 385, 257, 12451, 281, 722, 1953, 1392, 437, 307, 309, 300, 286, + 478, 5361, 294, 294, 264, 1412, 293, 295, 1164, 341, 307, 264, 881, 7595, 1168, + 281, 458, 437, 286, 500, 380, 458, 437, 286, 478, 5361, 558, 370, 309, 311, 12309, + 1152, 457, 309, 311, 1391, 544, 294, 264, 21512, 295, 257, 5994, 264, 1412, 992, + 13, 51514], "temperature": 0.0, "avg_logprob": -0.11957019308338994, "compression_ratio": + 1.6444444444444444, "no_speech_prob": 0.2529779076576233}, {"id": 269, "seek": 472602, + "start": 4727.02, "end": 4745.02, "text": " I would say that I actually heard some + approaches which are working on that like specifically taking into account bias + very biased data on the like in the domain and seeing how you''re a reason will + perceive it and actually catching the mistakes by it because yeah we we have.", + "tokens": [50414, 286, 576, 584, 300, 286, 767, 2198, 512, 11587, 597, 366, 1364, + 322, 300, 411, 4682, 1940, 666, 2696, 12577, 588, 28035, 1412, 322, 264, 411, 294, + 264, 9274, 293, 2577, 577, 291, 434, 257, 1778, 486, 20281, 309, 293, 767, 16124, + 264, 8038, 538, 309, 570, 1338, 321, 321, 362, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.1373183556965419, "compression_ratio": 1.5625, "no_speech_prob": 0.06511884182691574}, + {"id": 270, "seek": 474502, "start": 4745.02, "end": 4759.02, "text": " We can account + the bias data and there are like some guidelines how to 
notice biases in your data + or models so here we can try to at least approach this ask from your perspective + yeah.", "tokens": [50364, 492, 393, 2696, 264, 12577, 1412, 293, 456, 366, 411, + 512, 12470, 577, 281, 3449, 32152, 294, 428, 1412, 420, 5245, 370, 510, 321, 393, + 853, 281, 412, 1935, 3109, 341, 1029, 490, 428, 4585, 1338, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.14197054545084636, "compression_ratio": 1.596244131455399, + "no_speech_prob": 0.032883938401937485}, {"id": 271, "seek": 474502, "start": 4759.02, + "end": 4771.02, "text": " Sounds great hey Jenny I really enjoy talking to you and + I think we could talk forever on this topic but I really love asking the question + of why which is.", "tokens": [51064, 14576, 869, 4177, 20580, 286, 534, 2103, 1417, + 281, 291, 293, 286, 519, 321, 727, 751, 5680, 322, 341, 4829, 457, 286, 534, 959, + 3365, 264, 1168, 295, 983, 597, 307, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.14197054545084636, "compression_ratio": 1.596244131455399, "no_speech_prob": + 0.032883938401937485}, {"id": 272, "seek": 477102, "start": 4771.02, "end": 4795.02, + "text": " I''m tapping into your motivation you did say in the beginning that you + know all the stars positioned correctly in a way and you got the the dream job but + at the same time you still wake up every morning and you say okay what will I do + today what drives me why drives me so what drives you to do what you do data advocate + and a well.", "tokens": [50364, 286, 478, 21444, 666, 428, 12335, 291, 630, 584, + 294, 264, 2863, 300, 291, 458, 439, 264, 6105, 24889, 8944, 294, 257, 636, 293, + 291, 658, 264, 264, 3055, 1691, 457, 412, 264, 912, 565, 291, 920, 6634, 493, 633, + 2446, 293, 291, 584, 1392, 437, 486, 286, 360, 965, 437, 11754, 385, 983, 11754, + 385, 370, 437, 11754, 291, 281, 360, 437, 291, 360, 1412, 14608, 293, 257, 731, + 13, 51564], "temperature": 0.0, "avg_logprob": -0.174960568745931, "compression_ratio": + 1.711340206185567, 
"no_speech_prob": 0.20700465142726898}, {"id": 273, "seek": 479502, + "start": 4796.02, "end": 4824.02, "text": " I would say I don''t want to like start + a story with reflecting on how I like I woke up one day and realized that my my + heart belongs to a I and everything but I would say like a little little moments + in my life when I had to write and say about can computers think when I was applying + to university and I had to explain to my parents what is AI and why I''m doing it.", + "tokens": [50414, 286, 576, 584, 286, 500, 380, 528, 281, 411, 722, 257, 1657, 365, + 23543, 322, 577, 286, 411, 286, 12852, 493, 472, 786, 293, 5334, 300, 452, 452, + 1917, 12953, 281, 257, 286, 293, 1203, 457, 286, 576, 584, 411, 257, 707, 707, 6065, + 294, 452, 993, 562, 286, 632, 281, 2464, 293, 584, 466, 393, 10807, 519, 562, 286, + 390, 9275, 281, 5454, 293, 286, 632, 281, 2903, 281, 452, 3152, 437, 307, 7318, + 293, 983, 286, 478, 884, 309, 13, 51814], "temperature": 0.0, "avg_logprob": -0.12738022693367893, + "compression_ratio": 1.6486486486486487, "no_speech_prob": 0.05356425419449806}, + {"id": 274, "seek": 482402, "start": 4825.02, "end": 4853.02, "text": " When I compare + the other like positions when I saw some questions which people in general like + asking see in the Lee and GPT free when I visited some industrial conferences and + compare them with the research and conferences and notice that people are fascinated + by the models and their key textures when what when it comes to like taking it down + to development and to actually helping people people.", "tokens": [50414, 1133, + 286, 6794, 264, 661, 411, 8432, 562, 286, 1866, 512, 1651, 597, 561, 294, 2674, + 411, 3365, 536, 294, 264, 6957, 293, 26039, 51, 1737, 562, 286, 11220, 512, 9987, + 22032, 293, 6794, 552, 365, 264, 2132, 293, 22032, 293, 3449, 300, 561, 366, 24597, + 538, 264, 5245, 293, 641, 2141, 24501, 562, 437, 562, 309, 1487, 281, 411, 1940, + 309, 760, 281, 3250, 293, 281, 767, 4315, 561, 561, 
13, 51814], "temperature": 0.0, + "avg_logprob": -0.22323224419041685, "compression_ratio": 1.7733333333333334, "no_speech_prob": + 0.019482966512441635}, {"id": 275, "seek": 485302, "start": 4853.02, "end": 4868.02, + "text": " Struggle with doing some simple things like not simple but basic things + like providing the date not interesting there like they sound less interesting but + they''re actually very crucial like providing the right to cure in the Bay.", "tokens": + [50364, 8251, 31726, 365, 884, 512, 2199, 721, 411, 406, 2199, 457, 3875, 721, 411, + 6530, 264, 4002, 406, 1880, 456, 411, 436, 1626, 1570, 1880, 457, 436, 434, 767, + 588, 11462, 411, 6530, 264, 558, 281, 13698, 294, 264, 7840, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.28856658935546875, "compression_ratio": 1.6474820143884892, + "no_speech_prob": 0.00961078517138958}, {"id": 276, "seek": 486802, "start": 4868.02, + "end": 4893.02, "text": " By us monitoring it not just creating a proof of concept + it bugged me so much because I see so many cool models ideas architectures around + creating like insane applications but not always they''re coming to production and + not always there start like helping people and everything and I would say I really + would love to I love to.", "tokens": [50364, 3146, 505, 11028, 309, 406, 445, 4084, + 257, 8177, 295, 3410, 309, 7426, 3004, 385, 370, 709, 570, 286, 536, 370, 867, 1627, + 5245, 3487, 6331, 1303, 926, 4084, 411, 10838, 5821, 457, 406, 1009, 436, 434, 1348, + 281, 4265, 293, 406, 1009, 456, 722, 411, 4315, 561, 293, 1203, 293, 286, 576, 584, + 286, 534, 576, 959, 281, 286, 959, 281, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.2032143891747318, "compression_ratio": 1.6482412060301508, "no_speech_prob": + 0.7849456071853638}, {"id": 277, "seek": 489302, "start": 4894.02, "end": 4916.02, + "text": " In general like seeing something start working like it''s a very satisfaction + or anything so I choose my like my path on say to approach 
people with talking about + data and how can it help actually to train the models and make them closer to the + production.", "tokens": [50414, 682, 2674, 411, 2577, 746, 722, 1364, 411, 309, + 311, 257, 588, 18715, 420, 1340, 370, 286, 2826, 452, 411, 452, 3100, 322, 584, + 281, 3109, 561, 365, 1417, 466, 1412, 293, 577, 393, 309, 854, 767, 281, 3847, 264, + 5245, 293, 652, 552, 4966, 281, 264, 4265, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.20656499322855248, "compression_ratio": 1.5548780487804879, "no_speech_prob": + 0.16829721629619598}, {"id": 278, "seek": 491602, "start": 4916.02, "end": 4927.02, + "text": " Make them closer to being actually here and working good and helping people + out with this magnificent ideas created by researchers.", "tokens": [50364, 4387, + 552, 4966, 281, 885, 767, 510, 293, 1364, 665, 293, 4315, 561, 484, 365, 341, 23690, + 3487, 2942, 538, 10309, 13, 50914], "temperature": 0.0, "avg_logprob": -0.11249187257554796, + "compression_ratio": 1.61244019138756, "no_speech_prob": 0.02634022757411003}, {"id": + 279, "seek": 491602, "start": 4928.02, "end": 4944.02, "text": " Oh yeah that sounds + so cool very deep thank you for sharing this it''s like I think in many ways it''s + like the dream maybe of creating that companion that can think the same way we think + and it''s not human.", "tokens": [50964, 876, 1338, 300, 3263, 370, 1627, 588, 2452, + 1309, 291, 337, 5414, 341, 309, 311, 411, 286, 519, 294, 867, 2098, 309, 311, 411, + 264, 3055, 1310, 295, 4084, 300, 22363, 300, 393, 519, 264, 912, 636, 321, 519, + 293, 309, 311, 406, 1952, 13, 51764], "temperature": 0.0, "avg_logprob": -0.11249187257554796, + "compression_ratio": 1.61244019138756, "no_speech_prob": 0.02634022757411003}, {"id": + 280, "seek": 494602, "start": 4946.02, "end": 4965.52, "text": " Because we are + could also as a humanity we do reproduce and so on but we also challenge ourselves + and others in what is possible what is what are the limits of of our 
intelligence + or you know are they need.", "tokens": [50364, 1436, 321, 366, 727, 611, 382, 257, + 10243, 321, 360, 29501, 293, 370, 322, 457, 321, 611, 3430, 4175, 293, 2357, 294, + 437, 307, 1944, 437, 307, 437, 366, 264, 10406, 295, 295, 527, 7599, 420, 291, 458, + 366, 436, 643, 13, 51339], "temperature": 0.0, "avg_logprob": -0.20765406152476434, + "compression_ratio": 1.4927536231884058, "no_speech_prob": 0.014964158646762371}, + {"id": 281, "seek": 496552, "start": 4965.52, "end": 4973.52, "text": " And it''s + a very important task that are still waiting there to be solved and can be solved + with the I think it''s it''s magnificent.", "tokens": [50364, 400, 309, 311, 257, + 588, 1021, 5633, 300, 366, 920, 3806, 456, 281, 312, 13041, 293, 393, 312, 13041, + 365, 264, 286, 519, 309, 311, 309, 311, 23690, 13, 50764], "temperature": 0.2, "avg_logprob": + -0.4774539584205264, "compression_ratio": 1.5890410958904109, "no_speech_prob": + 0.28264832496643066}, {"id": 282, "seek": 496552, "start": 4974.52, "end": 4986.52, + "text": " I am waiting to see if somebody some model will finally win the Lovener + price and positive cheering test so I''m working for chat GPT.", "tokens": [50814, + 286, 669, 3806, 281, 536, 498, 2618, 512, 2316, 486, 2721, 1942, 264, 6130, 553, + 260, 3218, 293, 3353, 11060, 1500, 370, 286, 478, 1364, 337, 5081, 26039, 51, 13, + 51414], "temperature": 0.2, "avg_logprob": -0.4774539584205264, "compression_ratio": + 1.5890410958904109, "no_speech_prob": 0.28264832496643066}, {"id": 283, "seek": + 496552, "start": 4987.52, "end": 4993.52, "text": " Yes, maybe it will be fine tuned + on some something like flower series or something.", "tokens": [51464, 1079, 11, + 1310, 309, 486, 312, 2489, 10870, 322, 512, 746, 411, 8617, 2638, 420, 746, 13, + 51764], "temperature": 0.2, "avg_logprob": -0.4774539584205264, "compression_ratio": + 1.5890410958904109, "no_speech_prob": 0.28264832496643066}, {"id": 284, "seek": + 499352, "start": 4993.52, "end": 
4998.52, "text": " Yeah, yeah, yeah, on human labeling + with for lock I am pretty exactly exactly.", "tokens": [50364, 865, 11, 1338, 11, + 1338, 11, 322, 1952, 40244, 365, 337, 4017, 286, 669, 1238, 2293, 2293, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.2356715482823989, "compression_ratio": 1.6073298429319371, + "no_speech_prob": 0.018972201272845268}, {"id": 285, "seek": 499352, "start": 4999.52, + "end": 5015.52, "text": " And yeah and traditionally of course we will we will link + everything we can link about to lock up but if you were to announce something to + the audience maybe how they can get closer to the platform you know start playing + around.", "tokens": [50664, 400, 1338, 293, 19067, 295, 1164, 321, 486, 321, 486, + 2113, 1203, 321, 393, 2113, 466, 281, 4017, 493, 457, 498, 291, 645, 281, 7478, + 746, 281, 264, 4034, 1310, 577, 436, 393, 483, 4966, 281, 264, 3663, 291, 458, 722, + 2433, 926, 13, 51464], "temperature": 0.0, "avg_logprob": -0.2356715482823989, "compression_ratio": + 1.6073298429319371, "no_speech_prob": 0.018972201272845268}, {"id": 286, "seek": + 501552, "start": 5016.52, "end": 5038.52, "text": " Okay, yeah, okay, I have three + things that I really want to to announce firstly it''s you if you want to talk about + just like do we need manual labeling do you need manual labeling do you need that + to leave me do you need transfer learning with crowdsourcing do you want to just + use crowdsourcing and think about it.", "tokens": [50414, 1033, 11, 1338, 11, 1392, + 11, 286, 362, 1045, 721, 300, 286, 534, 528, 281, 281, 7478, 27376, 309, 311, 291, + 498, 291, 528, 281, 751, 466, 445, 411, 360, 321, 643, 9688, 40244, 360, 291, 643, + 9688, 40244, 360, 291, 643, 300, 281, 1856, 385, 360, 291, 643, 5003, 2539, 365, + 26070, 41849, 360, 291, 528, 281, 445, 764, 26070, 41849, 293, 519, 466, 309, 13, + 51514], "temperature": 0.0, "avg_logprob": -0.2770505287277866, "compression_ratio": + 1.8421052631578947, "no_speech_prob": 
0.10256291180849075}, {"id": 287, "seek": + 503852, "start": 5038.52, "end": 5049.52, "text": " To join our community because + we talk just in general they''re about the mental stuff about the eye about crowdsourcing + about that the century can model century approach and there we can.", "tokens": + [50364, 1407, 3917, 527, 1768, 570, 321, 751, 445, 294, 2674, 436, 434, 466, 264, + 4973, 1507, 466, 264, 3313, 466, 26070, 41849, 466, 300, 264, 4901, 393, 2316, 4901, + 3109, 293, 456, 321, 393, 13, 50914], "temperature": 0.0, "avg_logprob": -0.3932748452211038, + "compression_ratio": 1.488, "no_speech_prob": 0.08250134438276291}, {"id": 288, + "seek": 504952, "start": 5049.52, "end": 5069.52, "text": " concretely talk about + like some topics which concern to you to your company or like to your business and + just talk and also we have two initiatives for like education and for researchers + if somebody is interested to check some hypothesis on crowdsourcing for example + some.", "tokens": [50364, 39481, 736, 751, 466, 411, 512, 8378, 597, 3136, 281, + 291, 281, 428, 2237, 420, 411, 281, 428, 1606, 293, 445, 751, 293, 611, 321, 362, + 732, 16194, 337, 411, 3309, 293, 337, 10309, 498, 2618, 307, 3102, 281, 1520, 512, + 17291, 322, 26070, 41849, 337, 1365, 512, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.14737313648439804, "compression_ratio": 1.619047619047619, "no_speech_prob": + 0.03970586508512497}, {"id": 289, "seek": 506952, "start": 5070.52, "end": 5092.52, + "text": " Christian there''s some ethical stuff some gathering of the data sets + for your own to or like you want to create some education or like to teach a course + over the crowdsourcing to your students we have two foreign applications where we + can provide you some promos so using crowdsourcing for free and play around and + maybe to start liking it as I do.", "tokens": [50414, 5778, 456, 311, 512, 18890, + 1507, 512, 13519, 295, 264, 1412, 6352, 337, 428, 1065, 281, 420, 411, 291, 528, + 
281, 1884, 512, 3309, 420, 411, 281, 2924, 257, 1164, 670, 264, 26070, 41849, 281, + 428, 1731, 321, 362, 732, 5329, 5821, 689, 321, 393, 2893, 291, 512, 2234, 329, + 370, 1228, 26070, 41849, 337, 1737, 293, 862, 926, 293, 1310, 281, 722, 16933, 309, + 382, 286, 360, 13, 51514], "temperature": 0.0, "avg_logprob": -0.21206486061827778, + "compression_ratio": 1.6893203883495145, "no_speech_prob": 0.04584885388612747}, + {"id": 290, "seek": 509252, "start": 5092.52, "end": 5100.52, "text": " And I truly + like it because like when you can gather at 12k data sets in one day start liking + it.", "tokens": [50364, 400, 286, 4908, 411, 309, 570, 411, 562, 291, 393, 5448, + 412, 2272, 74, 1412, 6352, 294, 472, 786, 722, 16933, 309, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.25710127088758683, "compression_ratio": 1.56, "no_speech_prob": + 0.13385094702243805}, {"id": 291, "seek": 509252, "start": 5100.52, "end": 5102.52, + "text": " This is mind blowing.", "tokens": [50764, 639, 307, 1575, 15068, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.25710127088758683, "compression_ratio": 1.56, + "no_speech_prob": 0.13385094702243805}, {"id": 292, "seek": 509252, "start": 5105.52, + "end": 5109.52, "text": " So yeah that''s that''s me that''s it thank you very much.", + "tokens": [51014, 407, 1338, 300, 311, 300, 311, 385, 300, 311, 309, 1309, 291, + 588, 709, 13, 51214], "temperature": 0.0, "avg_logprob": -0.25710127088758683, "compression_ratio": + 1.56, "no_speech_prob": 0.13385094702243805}, {"id": 293, "seek": 509252, "start": + 5109.52, "end": 5110.52, "text": " That''s fun.", "tokens": [51214, 663, 311, 1019, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.25710127088758683, "compression_ratio": + 1.56, "no_speech_prob": 0.13385094702243805}, {"id": 294, "seek": 509252, "start": + 5110.52, "end": 5112.52, "text": " That was magical thank you.", "tokens": [51264, + 663, 390, 12066, 1309, 291, 13, 51364], "temperature": 0.0, "avg_logprob": 
-0.25710127088758683, + "compression_ratio": 1.56, "no_speech_prob": 0.13385094702243805}, {"id": 295, "seek": + 509252, "start": 5112.52, "end": 5117.52, "text": " I''m sorry for being like a + very talkative person but I''m always like this would be afraid of me.", "tokens": + [51364, 286, 478, 2597, 337, 885, 411, 257, 588, 751, 1166, 954, 457, 286, 478, + 1009, 411, 341, 576, 312, 4638, 295, 385, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.25710127088758683, "compression_ratio": 1.56, "no_speech_prob": 0.13385094702243805}, + {"id": 296, "seek": 511752, "start": 5117.52, "end": 5140.52, "text": " No it''s + it''s your character it''s your energy it''s it''s your experience and it''s something + that speaks up you know beside you controlling it I think it''s it''s it''s important + it''s amazing and that''s how it should be I think I really really enjoyed talking + to you Jenny today I hope this is not the last time we can record another one and + another one.", "tokens": [50364, 883, 309, 311, 309, 311, 428, 2517, 309, 311, 428, + 2281, 309, 311, 309, 311, 428, 1752, 293, 309, 311, 746, 300, 10789, 493, 291, 458, + 15726, 291, 14905, 309, 286, 519, 309, 311, 309, 311, 309, 311, 1021, 309, 311, + 2243, 293, 300, 311, 577, 309, 820, 312, 286, 519, 286, 534, 534, 4626, 1417, 281, + 291, 20580, 965, 286, 1454, 341, 307, 406, 264, 1036, 565, 321, 393, 2136, 1071, + 472, 293, 1071, 472, 13, 51514], "temperature": 0.0, "avg_logprob": -0.11576996320559654, + "compression_ratio": 1.766497461928934, "no_speech_prob": 0.14313067495822906}, + {"id": 297, "seek": 514052, "start": 5140.52, "end": 5145.52, "text": " And all + the best with your Christmas vacation.", "tokens": [50364, 400, 439, 264, 1151, + 365, 428, 5272, 12830, 13, 50614], "temperature": 0.0, "avg_logprob": -0.3916715213230678, + "compression_ratio": 1.45, "no_speech_prob": 0.4041517376899719}, {"id": 298, "seek": + 514052, "start": 5145.52, "end": 5150.52, "text": " Oh thank you and recharging + and 
all the best with the locker.", "tokens": [50614, 876, 1309, 291, 293, 319, + 7374, 3249, 293, 439, 264, 1151, 365, 264, 25707, 13, 50864], "temperature": 0.0, + "avg_logprob": -0.3916715213230678, "compression_ratio": 1.45, "no_speech_prob": + 0.4041517376899719}, {"id": 299, "seek": 514052, "start": 5150.52, "end": 5152.52, + "text": " Thank you very much.", "tokens": [50864, 1044, 291, 588, 709, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.3916715213230678, "compression_ratio": 1.45, + "no_speech_prob": 0.4041517376899719}, {"id": 300, "seek": 514052, "start": 5152.52, + "end": 5160.52, "text": " I keep in the city you can do it however you like actually + we we approve.", "tokens": [50964, 286, 1066, 294, 264, 2307, 291, 393, 360, 309, + 4461, 291, 411, 767, 321, 321, 18827, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.3916715213230678, "compression_ratio": 1.45, "no_speech_prob": 0.4041517376899719}, + {"id": 301, "seek": 516052, "start": 5160.52, "end": 5177.52, "text": " Yeah thank + you and I hope the audience got that magic tune as well and everyone will also have + time to recharge during the Christmas and your break and we will continue from here + thank you Jenny.", "tokens": [50364, 865, 1309, 291, 293, 286, 1454, 264, 4034, + 658, 300, 5585, 10864, 382, 731, 293, 1518, 486, 611, 362, 565, 281, 31366, 1830, + 264, 5272, 293, 428, 1821, 293, 321, 486, 2354, 490, 510, 1309, 291, 20580, 13, + 51214], "temperature": 0.0, "avg_logprob": -0.14610101662430108, "compression_ratio": + 1.4333333333333333, "no_speech_prob": 0.029202817007899284}, {"id": 302, "seek": + 516052, "start": 5177.52, "end": 5178.52, "text": " Thank you.", "tokens": [51214, + 1044, 291, 13, 51264], "temperature": 0.0, "avg_logprob": -0.14610101662430108, + "compression_ratio": 1.4333333333333333, "no_speech_prob": 0.029202817007899284}, + {"id": 303, "seek": 516052, "start": 5178.52, "end": 5179.52, "text": " Bye bye.", + "tokens": [51264, 4621, 6543, 13, 51314], 
"temperature": 0.0, "avg_logprob": -0.14610101662430108, + "compression_ratio": 1.4333333333333333, "no_speech_prob": 0.029202817007899284}]'
+---
+
+Hello there, Vector Podcast season 2. I hope you were waiting for a new episode. Today I'm really happy about my guest and the topic, because in many ways we haven't covered it as much as I think, and hope, we can cover it today.
+It's the topic of data, the role of data. While everyone is talking about sexy deep learning, ChatGPT, learning to rank, new algorithms and so on, I still believe we should not forget about where all these things begin, and that is data.
+And I'm happy to have and welcome Evgeniya Sukhodolskaya, data advocate at Toloka, with me today. How are you doing? Thank you, Dima. Thank you. I am super happy to talk. I'm in a very pre-vacation mood, very smooth, so I'm feeling like I'm just having a little chitchat before vacation.
+Yes, exactly how it should be. And I'm really happy to have you here. We met at the Haystack conference and it was great. I saw so much excitement in you when you talked about: hey, but what about data? We should also talk about it, don't forget it.
+And I'm really excited to drill into this today with you. I think let's start as we usually do, with your background, and we'll roll from there. Okay, perfect. Yeah, Haystack. I think that's where I literally formulated my passion: I want to talk about search through the lens of data.
+So I'm feeling like today I'm getting a present for Christmas. So yeah, about me: I am the type of person who got their perfect position by chance, because I never knew it existed.
+I finished a bachelor's in machine learning and I was like, okay, if I did so, I need to work around it, you know. But everybody was, everybody is always hyping on something in machine learning. At that point, I mean, I finished in 2009.
+So it was a very big era of GANs and the like.
And everybody wanted to work with computer vision and so on, and I thought, okay, maybe I need to start working somewhere in the field and see what I will like. And by some chance, I started working as a software developer.
+I don't know why, just, you know, out of the blue: you're a student, you're getting your first job. Then after some time I realized that I was doing more business-analytics and consulting type tasks alongside the development, and I liked both.
+And I was like, okay, I don't want to be just a developer, so I switched to a position called technical manager, which was something in between an analyst
+and a person who manages projects with people. And at that time, that was the first time I tried to use crowdsourcing for our projects, because we were doing tasks around moderation.
+And with moderation you actually need a lot of data, labels checked for quality, because it's a very hard task. And still something was off, because here I wasn't using enough machine learning, but I had started. Okay, then I switched to machine learning engineer.
+I mean, it sounds like I'm hopping on and hopping off, but these were longer periods of time. It was amazing: I had an amazing lead, I am still very grateful to him, he taught me a lot about machine learning in production.
+But at that point I realized that I had lost some of the knowledge I'd received back then. So I applied for a master's program, in Munich actually. Now I'm studying at the Technical University of Munich, also on a kind of machine learning program. And also I felt that I was not speaking enough.
+You see, I'm always like that: not speaking enough, not doing ML enough, not this enough. And at some point my peers, whom I knew back then and who were working at Toloka at that point, said: hey, Jenny, how about you work with us as a data advocate?
+And I was like, what is that? I'm moving to Germany, I don't know what that is. And they're like, oh, that would be perfect for you, because you can do machine learning and research, but you also can speak.
+And I am so happy that it happened, because for the last year and some I've been there working with data, working with crowdsourcing, working with search, because I also have some past experience: when I was a machine learning engineer I actually worked a little bit with search and recommendations.
+So all of these combined into one perfect profession. So I would say I'm a very happy person. This is super great. And it sounds like you are in the warm waters of what you really want to do. At the same time, I think it's still a very demanding role, and field.
+And so you are basically still doing some ML, right, but you also advocate for data.
+Can you expand a bit on what you do? Oh yeah, it's actually everything all in one, you know, like those shampoos with conditioner and body wash, because I have some freedom in my position to choose what I want to study now.
+So, for example, I chose going to the search conferences and talking about it, because I had some experience there. I really love the idea of comparing crowdsourcing and machine learning models on particular tasks. For example, let's think about adversarial attacks.
+It's interesting how far we can go with them: detecting them with machine learning and detecting them with humans, and these different comparisons of where the crowd wins and where machine learning wins.
+It's a question which is interesting in general, especially now with ChatGPT, when everybody is like: oh my god, the AI is conscious, okay, close everything, fire all the software engineers, we are done.
+ So it's super interesting to explore that, and I am always reading articles about this, attending talks about this, also giving some talks myself. Plus, of course, I'm participating in the development of our company, because Toloka started as a data labeling company, but now it's expanding much further, in the sense that we also started designing ML tools on top of it.
+Because when you're having such a resource, you know, human manual labeling basically in your, I don't know how to say it, in your basement, but that sounds creepy. Yeah, you can use it for transfer learning or for other interesting tasks.
+And yeah, we expanded a lot, and it's very interesting to systematize this process and to come and talk about it, and also about the manual labeling assistance tools I am currently developing. Yeah, this is fantastic.
+And when somebody approaches Toloka to work with it, or maybe you just create an account, I guess, and you start, you know, creating projects and so on.
+I think many, many things go into the process, starting from the price, right, how economical it can be, do I have any control over this, but also, for example, the outcome, you know, the quality of labels that I will get.
+How do you usually structure the process? Is there some general recipe that Toloka would offer to any user, and maybe on top of that you would offer some additional service, so to say, or advice to a company? Is there something around that you could share with us?
+ I would say I can talk non-stop, but in general it's like this. Firstly, of course, when you're deciding that you need manual labeling for some reason, for some data sets, you need to understand whether you actually need it, because you don't need it for every existing task: sometimes the data-centric approach of just throwing more data at the problem works, and that data doesn't have to be manually labeled, of course.
+You can freely use open source data, you can freely use synthetic data, because it's cheap, you're just generating it yourself. But in a lot of domains you don't have enough available data for your specific case.
+And it's hard to gather it or generate it, or sometimes you need human curation over the machine learning processes, for example for moderation.
+For example, it's a hot topic: I've seen a thread on Twitter about how people tried to ask ChatGPT for some really, really dangerous stuff and checked whether it would provide it, and it did. So, you know, we still need human curation over the data gathered by ML and AI mechanisms.
+So in this case, if you feel like you need to gather a data set for your specific problem and you don't know where to start, here is where crowdsourcing platforms come in, for example Toloka.
+It is a platform which was created by engineers for engineers, so it's not only the business model where you're coming to us, we're consulting you, and you're going away.
+We're actually super happy if you try to deal with it yourself, because we have an open API, and it's more about mechanisms than about speaking with manual labor. It's literally about handling the crowd with mechanisms of filtering, and so on.
+So usually, to start, you need to register, and then we have tons of tutorials and education programs, and also we have a community which is quite hands-on, actually, where we are happy to discuss any problems or questions.
+But I would say we try to implement simple steps in the platform already, steps that help you set up your first labeling project pretty intuitively, without studying much, and let it run.
+So there are built-in instructions on every step, little videos or little text instructions telling you the good practices.
+So we try to make it as simple as possible. I actually saw it developing, because when I started using Toloka there wasn't any of this, and now it's impressive how they changed everything. Yeah, I can imagine.
+ And so inside Toloka, I mean, if I consider Toloka as a packaged product that I get access to, you know, inside it I presume you have the labeling editor, or component, whatever you're calling it, into which I can flexibly load any data format, right, and also any vertical, from computer vision to audio to text, to time series maybe, and so on.
+What does it take? You know, the way I imagine it in my head: let's say this is a team that hasn't had experience with labeling before, but they realize the importance of it in their project. So they will not be professionals in this space.
+What do they need to think about when they prepare the data? Maybe quantity, maybe you recommend them to start with a smaller quantity? How should they reason about the format? And should they first go and watch all the tutorials, or can they somehow intuitively follow the UI?
+ I would say like this: of course, crowdsourcing reminds me a little bit of training a machine learning model, you need to spend some time tuning. But firstly, addressing your comments: it's for different data types, audio, text, video, etc., and it can be used for multiple use cases, like gathering data.
+ Gathering data sets for voice assistants and for self-driving cars, or, as I hope, we will still stick to the main theme of the podcast and talk about search relevance with human labeling. But yeah, let's imagine your team is on the way to creating a project, and they realized they need human labeling, but they never saw the platform.
+I would say that the most important thing to consider is to start thinking in terms of the decomposition of the task.
+It's a key thing in any crowdsourcing task. Crowdsourcing is pretty scalable, so the amount is not a problem, and it's not that expensive: you can set up reasonable prices and it will be pretty cheap. But one thing is very important: don't make the task too complicated.
+It's not like having in-house experts, for example, whom you can ask whatever you want, and they will think of the rest. Here you're working with people who are not committed to providing you something more than what you asked of them.
+ So tasks should be simple and they should be well defined. That is the thing you need to train a little: thinking of how to decompose a task so that, for example, if you offer it to a person who is not in your area of study, you can be sure they can still do it without special training, maybe just by reading the instructions or completing a small exam.
+ And the rest is pretty much covered by the platform, because now there are specific mechanics which predefine the settings for you, so that you don't make mistakes in, you know, paying the crowd, or in doing the interface incorrectly, because we try to implement as much testing in-house as possible.
+ And the interface, the part where you configure the task interface, is done pretty intuitively, so you don't have to learn JavaScript or HTML. It's done with basic building blocks which can be rearranged and grouped together into a nice-looking interface.
+ So I hope most of the burden is just starting to think differently. It's like with programming: there was a moment in my life when I learned Haskell, and I had to completely reprogram my mind, because it's such a different language in terms of the programming paradigm, you need to think differently. The same goes for crowdsourcing: you need to think in terms of decomposition, that is the most important part.
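The decomposition idea described here can be sketched in code. This is a hypothetical illustration, not Toloka's actual API: the task shape and field names are invented. It only shows how one vague request ("judge our search quality") turns into many simple, self-contained yes/no microtasks, with known-answer "golden" tasks mixed in for the exam-style quality control mentioned above.

```python
import random

def build_microtasks(query, product_titles, golden):
    """Decompose a relevance-judgment job into simple yes/no microtasks.

    golden: dict of title -> known correct answer, mixed in unannounced
    so that worker quality can be estimated later.
    """
    tasks = [{"question": f'Is "{title}" a good match for the search "{query}"?',
              "golden_answer": None}
             for title in product_titles]
    tasks += [{"question": f'Is "{t}" a good match for the search "{query}"?',
               "golden_answer": ans}
              for t, ans in golden.items()]
    random.shuffle(tasks)  # workers should not be able to spot the control tasks
    return tasks

def worker_accuracy(answers, tasks):
    """Estimate a worker's quality from the golden (control) tasks only."""
    scored = [(a, t["golden_answer"]) for a, t in zip(answers, tasks)
              if t["golden_answer"] is not None]
    return sum(a == g for a, g in scored) / len(scored)
```

The point of the sketch is the shape of the question: a worker outside your domain can answer each item after reading one instruction, which is exactly the "simple and well defined" property the answer asks for.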
+Yeah, that's exciting. Ah, Haskell: functional, mathematical, oh my god. Yes, but probably beautiful too. Yeah, this sounds great. So it does sound like self-service in many ways. And now you called out search, which is also very dear to my heart, and I'm glad it is the same for you.
+ So let's start with the basics. You know, I have a search engine, I have users, I've got logs, right? So what I can do is actually record, for every search, the position where the click happened, right? So: what we returned and what was clicked. And so I have plenty of data, assuming I have plenty of users. Why do I need another data set? Can you convince me?
+What am I missing?
+Here I would say manual labeling is not so much about creating a new data set from scratch, but about evaluating how well you rank the results for your queries, because, as far as I understand, there are a lot of pretty sophisticated ranking algorithms in existence.
+ They change and stay, starting from simple search, a document and a query and everything, which was some time ago; now people are creating vector databases and it's super sophisticated. But still, we have search, we have recommendations, we have some order of results which the user expects to receive.
+I mean, the user doesn't expect some specific order, but the user expects to see the right answer as close to them as possible, not to search through five pages. So for that specifically, human evaluation on top of the implicit-signals evaluation, like clicks, is very crucial.
+And I can try to elaborate on that. How do you think, from your perspective: if you are creating a search engine, do you think we also need humans to assess the ranking results, or do you think that clicks and buys, if we're talking about e-commerce, are enough?
+ Well, if I play the devil's advocate: you are the data advocate, I am the devil's advocate here. In principle, I already have users, so they will tell me with their clicks, right? So I might as well just measure, you know, plot this click-through rate or something else, and then see what's going on, right? So that would probably be my online metric.
+But I guess when you talk about human labeling, you imply that there is an importance in offline evaluation as well, right?
+Oh yeah, I am, you know, asking this rhetorical question, like: we need it, right? Right. But actually, I can give some motivation behind it. It's a very interesting thing, what human clicks actually mean.
+ We can return to it, because recently I had this meetup about biases, and it was also about human clicking, and one of the very interesting talks we had at our meetup was about position bias. Humans just tend to click on whatever they are offered, because they're taught that the things offered at the top positions are exactly what they need.
+And they may be dissatisfied with that, but they just, you know, follow the general way search engines and e-commerce search engines work.
+So technically, online metrics make a lot of sense, of course, because by clicks and by buys you can predict most of the behavior, and it's fast and automated.
+ I mean, A/B testing, everybody knows it. But the one thing it doesn't cover is, firstly, explicit signals: you can't tell from the clicks and buys the overall human satisfaction, the satisfaction score, because users are not explicitly asked in general: do you like this search result, maybe you wanted something else, maybe you wanted more, maybe you wanted to be recommended something else.
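The position bias described here is easy to surface from a click log. A minimal sketch, assuming a made-up log format with one record per impression (the field names are invented for illustration):

```python
from collections import defaultdict

def ctr_by_rank(log):
    """Raw click-through rate per displayed rank."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for rec in log:
        shown[rec["rank"]] += 1
        clicked[rec["rank"]] += rec["clicked"]
    return {rank: clicked[rank] / shown[rank] for rank in shown}

# toy impressions: rank 1 gets clicked more often regardless of relevance
log = [
    {"rank": 1, "clicked": 1}, {"rank": 1, "clicked": 1}, {"rank": 1, "clicked": 0},
    {"rank": 2, "clicked": 1}, {"rank": 2, "clicked": 0}, {"rank": 2, "clicked": 0},
    {"rank": 3, "clicked": 0}, {"rank": 3, "clicked": 0}, {"rank": 3, "clicked": 1},
]
print(ctr_by_rank(log))
```

When this per-rank CTR curve falls steeply even on shuffled or mediocre results, the clicks are measuring position as much as relevance, which is the argument for explicit human judgments on top of them.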
+ And yeah, the other thing: sometimes, if we're making some assumption in A/B testing, for example that we change some interface, and we judge by some clicks and by some assumption, then by introducing new features we can pretty much hurt our product, because it's happening in real life and users see the changes, they see how the ranking differs now with the new feature, and they're getting super dissatisfied.
+ And end users are not, you know, forgiving easily: they see problems in your search engine and they might say, yeah, I'm not using it again, no thank you. So among the reasons why offline labeling can be better: you can experiment much more with it. Because, oh, you noticed the feature in Zoom, that if I'm doing that, it's actually recognized by the neural network, and of course me...
+That is amazing, oh my god. Okay, yeah, we're really in the time of the artificial everything, so I got super amused, I'm sorry.
+ Yeah, so firstly, with offline labeling you can definitely experiment more, because you can try different features without harming, in the end, the end users of your system, of your engine. And secondly, you can check how satisfied they are, what they actually like, by explicitly asking them: what do you think about this? Because when you're just guessing their behavior from implicit things,
+like where their eyes look and how much they click, you can make many mistakes. Because, you know, as they say, we can't get into another human's head, but we can at least try to ask, and then it will probably be closer to reality.
+ Oh yeah, absolutely. So what you're saying is that usually in search engines we, in a way, skip that step of asking. Yeah, we could integrate some thumbs-up or thumbs-down approach, but it might also still be rather implicit and not explain everything we want to get.
+ But basically, you know, I just remember that I was reading pretty interesting articles recently about recommendation systems, about building them around the behavior of people rather than just giving them the most popular result, or the result most desired by users who are similar by some features, like in collaborative filtering.
+ Sometimes we need to give humans a result they are not thinking of, one that will make, for example, their health or life better. Because you know this problem of recommender systems: when you're used to clicking on something, at some point the recommender system starts offering you the same pool of things, and you're kind of stuck in it. And that's why the authors of this paper were suggesting that we sometimes need to ask
+ humans explicitly: did they like what they were recommended, do they understand why it was recommended to them, and maybe they want to change the track of the recommendations. So that's why we shouldn't use only online metrics. But yeah, the second grand reason why offline metrics are good: you can experiment without harm very fast, because online metrics usually take, like, two weeks or something.
+You need to wait for the statistical test to produce results which are significant, and here you can test much faster, and you can even perform offline A/B testing using manually labeled data, which causes less harm, because the real users won't see your mistakes.
+ Yeah, I think this strikes a chord with me for sure. There is always a cost to pay when you go online: you are deliberately, potentially harming someone's experience to learn whether your, is it called counterfactual, your change in the algorithm, is good or bad.
+ Yeah, it reminds me, I am sorry, I'm totally off topic, but it reminds me of when I was working in moderation of advertisements. There it really matters on which side you make a mistake, because there are two different failure modes. On one side, you're showing the end users advertisements with things you aren't supposed to show, like drugs,
+I don't know, funeral services, yellow news, something that is dangerous or just stupid. On the other hand, if you're not letting some healthy content through the moderation, you're losing money for the companies which have a deal with you, and your own money.
+ So here you need to be very cautious with any online experiments; you do pretty much everything offline, and you need to closely monitor how your machine learning algorithms are doing in evaluation, because the environment changes a lot. You know, in the advertisement world new laws are incoming very fast, and people are impressively inventive at the end of the day.
+ They work around all of the boundaries when they want to fraud something. So imagine: every day there are new algorithms for creating advertisements which pass the machine learning blocks against fraud and everything, so you need to adapt very fast, and for that, and for fine-grained experimenting, you of course need offline data and offline labeling very much.
+ Yeah, I slowly start to wake up from my devil's advocate role. So I should stop being careless and not only rely on the data that I see in production, because in a way that's prime time for my product, and I should be careful about it. It's not just deploying something once, and it stays there forever, and ChatGPT takes care of everything.
Let's maybe try to make it a little bit more concrete, and let's play this game.
+ Let's say I'm developing, I don't know, a flower search engine. I don't want to say e-commerce, I don't want to say something specific. Let's say I'm offering flowers and users would search through them.
+ Can you propose a framework of thought for how I should approach the crowdsourcing? What should I focus on? Can I choose an offline metric that you would recommend? Do you think there is some specific metric I could try to connect with my business goals, one that would be reflective of them, or would you start with just something standard like, I don't know, NDCG,
+ whatever, and go from there? It's a very, very long topic to discuss, but let's start from somewhere. From my perspective, when you're building your engine, at some point you're of course implementing some online evaluation.
+ You can find a lot of material on online evaluation, so you have somewhere to start, and then you come to the point where you need to do some offline labeling. I would depict the pipeline as a circle that goes on to infinity, and it has several parts.
+ First you decide how you want to perform your ranking: what does your ranking mean, what do you want to show the most, how many positions do people actually see, what do you want them to see first, what is relevant, and you select some end-to-end metric that you're going to use.
+ There are some usual popular metrics. NDCG, the normalized discounted cumulative gain metric, is a very popular and nice place to start, and there are even simpler ones that just evaluate the precision and recall of the elements appearing in your resulting ranked list. And there can be more sophisticated approaches,
+ like the expected reciprocal rank metric, for example, if you've heard of it. It's more of a cascade approach, because you know that people stop clicking through after some certain position. But since we're talking about flowers, I would say it's more like image search, a simple case with fairly definitive answers. It's not like searching through items where you keep scrolling until you find what you desire and then stop.
+ With flowers you just want to see them, or download them or something. So I would say NDCG is a pretty good basis at the beginning, and then you can adapt the metric based on what you're really interested in; maybe you have advertisements on some of the flowers, for example.
+ Then the next part: after you define what you want to do, which metric you want to evaluate, what you want to compare, you think about what you need for human labeling: how you need to sample the data, what the result will be, how you need to aggregate it, and how you will use this information in your product.
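As a concrete illustration of the NDCG metric mentioned above, here is a minimal sketch in Python. The graded relevance lists are invented for illustration; real evaluation code (e.g. in scikit-learn) handles many more edge cases:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(system_relevances, ideal_relevances):
    """Normalized DCG: the system's DCG divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(ideal_relevances, reverse=True))
    return dcg(system_relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels (0 = irrelevant .. 3 = perfect), first in the
# order the engine returned the items, then the annotator-built ideal set:
score = ndcg([3, 1, 2, 0], [3, 2, 1, 0])  # close to 1: a near-ideal ordering
```

Because of the normalization, a perfect ranking always scores exactly 1.0, which is what makes the metric comparable across queries and over time.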
+ And then, for NDCG for example, you usually need some ideal ranking to compare your ranking against, and this is exactly where crowdsourcing comes in, because you can gather this ideal ranking manually from the crowd and then compare against your real search engine's answers. So okay, we define the goal: we want an ideal ranking of flowers per query, and not just one query, because evaluating
+ just one query would be, I don't know, super simple; you want to evaluate in general. So here comes sampling, and you can approach the sampling of your queries and search results in very different ways. You can just try to sample the most popular flowers and queries, but that's usually not the best approach, simply because the most popular queries are usually handled very well and they are very simple.
+ When many people converge on the same desires, it means it's not a very complicated thing. But there is also a huge tail of very rare queries which you also want to consider in the evaluation of the ideal ranking. So here come two techniques, for example reservoir sampling, or stratified sampling; I would recommend using stratified sampling adapted to your own needs.
+ To explain it very briefly, without going deep: you take your own data, the whole set of queries together with how often they are asked, how popular they are, and you build bins based on popularity, and you try to sample from each bin, but the bins are sized differently based on the overall popularity.
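The popularity-binned sampling described above might be sketched roughly like this. The query log, bin edges, and per-bin quotas are all invented for illustration; the point is simply that tail queries get guaranteed representation:

```python
import random

# Hypothetical query log: (query, popularity count). All names are made up.
query_log = [("red roses", 5000), ("tulip bouquet", 1200),
             ("blue orchid", 90), ("wedding lilies", 40),
             ("black dahlia seeds", 7), ("rare ghost orchid", 2)]

def stratified_sample(log, bin_edges, per_bin):
    """Bin queries by popularity, then sample from every bin so that
    rare (tail) queries are represented, not only the popular head."""
    random.seed(0)  # fixed seed just to make the sketch reproducible
    bins = {i: [] for i in range(len(bin_edges) + 1)}
    for query, count in log:
        idx = sum(count >= edge for edge in bin_edges)  # which popularity bin
        bins[idx].append(query)
    sample = []
    for idx, queries in bins.items():
        k = min(per_bin.get(idx, 1), len(queries))
        sample.extend(random.sample(queries, k))
    return sample

# Three bins: tail (<100), torso (100..1999), head (>=2000),
# with quotas chosen so the tail is deliberately over-represented:
picked = stratified_sample(query_log, bin_edges=[100, 2000],
                           per_bin={0: 2, 1: 1, 2: 1})
```

The quotas here intentionally over-sample the tail relative to its traffic share, which is exactly the correction the speaker argues for.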
+ So you're essentially modeling the distribution of data in your engine by sampling like this. After you have these data samples, you need to think about how to present them to the manual labelers, what you want to ask. You want some ideal ranking, yes, and one option is to give them, for example, a query and the first 10 or 20 results that your engine returns. How many results depends on the click-through rate, which you can estimate: you have data about how users click, how far down your search results they go, and you can estimate that after, say, the 15th position it's usually not interesting to anyone, so you don't need to worry about it very much.
+ But here's the thing: if you give the whole list to the crowd and say, okay, rank these from the most relevant to the least relevant, well, as I was saying before, task composition is very important, and this ranking task is very hard. I have a degree, I have a master's, I would say I'm generally a capable person to some extent, but if somebody says to me, here are 15 pictures of flowers, please rank them from the most suitable to the least suitable, I would be like, oh my god, I can't do that, it's too much. So there are other approaches. One is taking a specific item returned by your system together with the query and asking: are they relevant or irrelevant to each other, is it a matching pair or not? It's much simpler and very understandable for the crowd, but the problem is that here you can't really compare items with the same relevance: it says, okay, this is relevant and this is relevant, and you're like, which one should I put on top, this one or this one? You can ask people to give some percentage of relevancy off the top of their head, but still, different people think
differently, so it's very hard to aggregate the results.
+ The nicest approach, I would say, is pairwise comparisons: you give a query, you give two answers, and you ask, which one is better? Then from these pairwise comparisons you can build a whole ranking by aggregating them into a list from the most relevant to the least relevant. Of course, if you do these pairwise comparisons honestly, the way they're supposed to be done, it's n squared in the number of entities, which is a ton of comparisons. So usually our suggestion is to do it more in a quicksort-like or other sorting manner, with n log n comparisons, doing a rougher estimate of the pairwise comparisons, sampling a bit fewer pairs, but in the end you still get a pretty good estimated ranking list.
+ So you create this assignment, you have these pairwise comparisons, you have the results, you can estimate the quality of the results based on how good each particular user is at this particular kind of assignment, and then you aggregate. There are models you can use for the aggregation, mathematical and statistical models, for example Bradley-Terry. We actually implemented this in our Crowd-Kit; we tried to make an open Python library for every type of crowdsourcing annotation, not only the Toloka-specific ones. So you can implement it yourself or, for example, take some library, even ours, and then you get your ideal ranking as desired, and you can compute the metrics against it: how well your search engine performs on large samples, how relevant the results for these flowers are, and then you see the overall result. It might be good or not very good; if it's not very good, you can, for example, select the domains where you see the most mistakes and, in a separate project, ask
the crowd, domain by domain: where exactly are the mistakes? Maybe you have problems with defining the color of a flower, or maybe with good lighting on the photos, and you can figure out what exactly the problem is. And yeah, you can use this manual labeling, firstly, to evaluate the metrics from time to time and see how your search engine improves as you add new features and change the search algorithms, and you can also train the ML models that perform the ranking on this manually labeled data. So I would say it works roughly like this.
+ Yeah, this is great, it does sound like a very structured process. But I do want to drill into a couple of specifics. One is NDCG. I think its advantage is that, in principle, if I were communicating this to the management in my team, I could say that yesterday we were at 75% and today we are at 76%, so we are improving, right? It's on a percent scale. If I remove the letter N from the formula, it becomes an absolute scale and there is no way to tell whether we are progressing or regressing.
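Going back to the pairwise-comparison aggregation the guest described a moment earlier: a toy version of the Bradley-Terry idea (iteratively estimating item "strengths" from win/loss pairs, then sorting by strength) might look like this. The flower names and votes are invented, and a real library such as Crowd-Kit ships far more robust, worker-aware variants:

```python
# A minimal Bradley-Terry sketch using iterative minorization-maximization
# updates; it tolerates cycles in the votes (A>B, B>C, C>A) by producing
# a consistent strength estimate rather than failing.
def bradley_terry(items, comparisons, iterations=100):
    """comparisons: list of (winner, loser) pairs from annotators.
    Returns the items sorted from most to least preferred."""
    strength = {item: 1.0 for item in items}
    wins = {item: 0 for item in items}
    for winner, _ in comparisons:
        wins[winner] += 1
    for _ in range(iterations):
        new = {}
        for item in items:
            denom = 0.0
            for a, b in comparisons:
                if item in (a, b):
                    other = b if item == a else a
                    denom += 1.0 / (strength[item] + strength[other])
            new[item] = wins[item] / denom if denom else strength[item]
        total = sum(new.values())
        strength = {k: v / total for k, v in new.items()}  # normalize
    return sorted(items, key=strength.get, reverse=True)

# "rose beats tulip" twice, "tulip beats daisy" twice, plus one noisy
# vote that creates a cycle:
votes = [("rose", "tulip"), ("rose", "tulip"),
         ("tulip", "daisy"), ("tulip", "daisy"), ("daisy", "rose")]
ranking = bradley_terry(["rose", "tulip", "daisy"], votes)
```

Note that only the pairs actually collected are used, which is what makes the n log n "sort with noisy comparisons" strategy from the conversation workable.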
+ But at the same time, wearing my devil's advocate suit again for a moment: NDCG has a problem. Let's say I have a scale of labels from zero to three, so zero, one, two, three, where zero means a completely irrelevant result and three means a perfect result. If I receive two rating lists, one with all ones, which is kind of a suboptimal result, nothing better in the list but not perfect either, and the other with all threes, absolutely perfect, NDCG yields exactly the same number for both. Because, and you rightly mentioned the optimal, perfect ranking: if my ideal ranking is exactly the same length as the visible labeled area, then the formula, NDCG, will yield 100% in both cases, and that is kind of a problem. You touched on this when you said we need to make sure to construct this perfect order of results. So how long should it be? If I show 10 flowers on the screen, 10 bouquets, whatever, how long should that ideal list be? Thirty? A hundred? Is there any recommendation?
+ I would say, as I mentioned before, firstly you can use the expected reciprocal rank metric, which is exactly about the moment when the user loses attention: after that point you can make mistakes and users just won't reach them. And to evaluate this moment where interest terminates, I think you can pre-estimate it from clicks if you have any data: you can give every item some weight by click-through, and then predict, in general, how many items your typical users look through before they're satisfied with the result. And maybe over time this number will actually decrease, because your ranking will become better.
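The expected reciprocal rank metric discussed here can be sketched as follows, assuming graded labels on a 0..3 scale. This follows the standard cascade formulation (Chapelle et al.), not necessarily the exact variant the speakers use:

```python
# Expected Reciprocal Rank: a cascade model where the user scans top-down
# and stops at position r with probability R_r * prod(1 - R_i) for i < r.
def err(grades, max_grade=3):
    """grades: graded relevance labels in ranked order."""
    p_continue = 1.0  # probability the user is still scanning
    score = 0.0
    for rank, grade in enumerate(grades, start=1):
        r = (2 ** grade - 1) / (2 ** max_grade)  # satisfaction probability
        score += p_continue * r / rank
        p_continue *= (1 - r)
    return score

# A perfect top result dominates the score; mistakes far down the list
# barely matter, which models the "user loses attention" effect:
top_heavy = err([3, 0, 0, 0])
buried = err([0, 0, 0, 3])
```

Unlike NDCG, burying the only perfect result at position four sharply reduces the score, because the model assumes most users never scroll that far.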
+ But you can also try to emulate the same experiments with crowdsourcing, just giving workers some certain number of objects. Why am I bringing this up? Because recently, when we had this talk about biases, the presenter, to test his hypothesis about click-through, created a project in Toloka where he had a query
+ and around 20 or 30 recommendations, ordered the way the search engine ordered them, and he looked at how far people click through, to check the hypothesis of position bias. So in general you can also test your hypothesis online with click metrics, see how to
+ choose this cutoff position, and then test offline. But one additional thing: when we're talking about business, we're generally also talking about budgets. Of course, the more you need to evaluate, the more the cost will rise, simply because you're offering more data to the crowd, and the crowd needs to do more assignments, so it becomes more costly.
+ So I would estimate the depth that you know you need from the click-through data, and then maybe cut it based on your estimate of the cost of manual labeling, and try to align the two a little bit. Still,
+ the result might not be 100% perfect, in the sense that some people go farther and see the mistakes, but it will still be a big improvement if you catch the mistakes in the top ranking positions.
+ I think connected to this there is the notion of disagreement between annotators: what is relevant for you might be completely irrelevant to me, and for the same query I may want to see the results in a different order.
+ I think one of the suggestions I've heard for how you could construct this ideal list is that you take and concatenate all of the rankings given by independent annotators for the same query, and then re-sort them into the order that makes it perfect from top to bottom. Of course you will still have issues with ties: if you have three threes, how should you order them?
+ But at least they will all be visible on the screen, so maybe that's fine.
+ Or maybe not, who knows. But at the same time you kind of achieve this ideal list that incorporates the wisdom, or the wishes, of the other people that have been in the same group. Have you experimented with something along these lines, or do you think it's sensible to do?
+ To experiment with which part? With the length of the list, or with reordering, with constructing your ideal list? Because for NDCG you need that ideal list to divide by in the formula.
+ You're asking how we experiment with the length of this list? No, in this case I'm actually describing that specific way of building it: you take sub-lists from different people that annotated the same query, then you stack them together, and then you sort them.
+ Ah, I see. Yeah, it's a very interesting approach. I would say that I myself have never tried such a technique, which sounds very interesting, but we usually do aggregation by models which don't concatenate, but rather take into account the overall quality of the user in this ranking problem.
+ So when you're doing the aggregation, you lean more towards users who are proficient in ranking in general, so you trust them as good end users of the search. For example, when one person said all threes and one said all ones, but I know that the all-threes person is in general good at this, I will just take their ranking as the ideal labeling.
+ Yeah, this is the exciting part: you're tapping into the topic of annotator quality, which is super, super important. At the same time, you could teach the annotators if you have them in-house, but if they are external, you kind of don't have control over who gets which task. So how exactly does, say, Toloka do this, or what kind of methodology should I apply to measure the quality of the data?
+ What are the components there?
+ I would say it's a very big system, in the sense that you need not only to measure quality but also to keep your projects protected from fraud and from the people who specifically want to break quality: not just making human mistakes, but really trying to scam you with your data.
+ So there are different techniques, starting from the super simple anti-fraud ones, which look at how fast you are labeling.
+ Are you labeling with a non-human distribution of labels, like clicking only one option forever? Sometimes it even checks your behavior in general across different projects with a lot of data,
+ different projects, looking at how your mouse moves or something like this. And then there are also, of course, general exams:
+ checking your language proficiency, checking your proficiency in writing, checking your proficiency in some other skills, which build up a certain portrait of a good labeler, because if you're able to provide good results in the skills
+ which surround this problem, if you're good at this and this, it means that in general you at least won't be a fraudster and you have a chance to succeed in these tasks. And then there's the main mechanism, used in most tasks where you have categories, like classification, where we know some sort of ground truth.
+ Of course this doesn't work directly for ranking, because in ranking we don't know; it's a very subjective matter, do you prefer this or that. But you can actually create obvious examples, very obvious ones, where you do know the ground truth, and you can hide them, shuffle them in.
+ You take task examples whose answers you know, and you can
+ covertly shuffle them in between the assignments, so people complete them without noticing, because it's hidden by the API and everything. And by the percentage of these examples that they evaluated correctly, you can estimate their skills, because you know that, for this class of tasks, they generally give the right answers.
+ And the second technique, which also works well for the more creative or gathering-style assignments, for example when you need to take a picture, or do an assignment outdoors, for example go and check that there is a building at some certain address for a maps app, there you can do an even trickier thing and ask the crowd to evaluate the other crowd.
+ So you create a specific validation project with other crowdsourcing workers, you give them the answers of the first crowd, and you say: okay guys, now you need to evaluate, does this look correct to you, does it look like it's not fraud, and so on. And by this double evaluation you're actually sorting out all the problems.
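The honeypot mechanism described above, estimating a worker's skill from the share of correctly answered hidden "golden" tasks, might be sketched like this. The workers, task IDs, and labels are all invented:

```python
from collections import defaultdict

# Hidden golden tasks with known ground-truth answers:
golden_answers = {"task_7": "cat", "task_19": "dog", "task_23": "cat"}

# (worker, task, label) triples; on the worker's side, golden tasks are
# indistinguishable from regular ones.
responses = [
    ("alice", "task_7", "cat"), ("alice", "task_19", "dog"),
    ("alice", "task_23", "cat"),
    ("bob", "task_7", "dog"), ("bob", "task_19", "dog"),
    ("bob", "task_23", "dog"),
]

def worker_skills(responses, golden):
    """Share of correctly answered golden tasks per worker."""
    hits, seen = defaultdict(int), defaultdict(int)
    for worker, task, label in responses:
        if task in golden:
            seen[worker] += 1
            hits[worker] += (label == golden[task])
    return {w: hits[w] / seen[w] for w in seen}

skills = worker_skills(responses, golden_answers)
# alice answered all golden tasks correctly; bob always answers "dog"
```

A real platform would additionally require a minimum number of golden tasks per worker before trusting the estimate, exactly for the "it could be pure luck" reason raised later in the conversation.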
+ Wow, I have never heard of such a method, it's amazing. I think more traditionally, maybe ten years ago, in a project related to sentiment analysis, we were doing double annotation. You give the same
+ item, the same label, to two humans, ask each whether they agree or not, and then you basically calculate the inter-annotator agreement. But what you just described is so brilliantly put, and sort of invented in a way.
+ Was this invented at Toloka, or have you seen it somewhere before? I don't know if we invented it; if we did, I am super happy. It doesn't seem like rocket science, but it works. And yeah, inter-annotator agreement also works, especially in classification; that's actually how we started creating these hidden assignments recently.
+ As I mentioned, we call them honeypots, or golden assignments, the ones with hidden tasks that you shuffle into the data and then use to evaluate the skills of the people doing certain kinds of assignments. And we actually also save these skills, and sometimes you can access them, because they're already on the platform; they're called global skills. You can just
+ preselect for your project people who have already succeeded in moderation, for example. That actually helped me a lot recently, because I didn't have to train the crowd for my very complex stuff. But I stepped aside. Before, when I was working with Toloka some time ago, you had to create these specific tasks yourself.
+ These hidden tasks: you had to manually label them yourself, and that took some time and was kind of tiring, because you're sitting there creating like a hundred of them. You usually need a certain sample of these tasks, some share of the overall number of tasks in your project, to evaluate how good the people are, because if you give someone 100 items to label and you only check once
+ whether the answer is correct, you can't evaluate whether this person is good or bad; it could be pure luck. So labeling them yourself was, you know, time-consuming and sad, and recently we decided: but we have
+ the crowd, why are we doing the job of the crowd? Let's just create these hidden tasks with another crowd, and we can do this easily using inter-human agreement. You give them a task, you preselect a crowd with good skills from the past, in general, so you trust them more, and you throw, for example, ten people at one tiny piece of a task, ten people working independently, without knowing what the others say, and then, much better than with one person,
+ usually some strong agreement emerges, and you know that is the right answer, and you can pick it directly and shuffle it into the other project. So you see, we're making self-working mechanisms; you just throw some data into the system and it's like self-reinforcement. Yeah, I think this is amazing, and it also surfaces, I believe, a
+ feature of Toloka that you cannot get if you, say, set up an open-source labeling tool: that for a specific task like moderation, or
+ I don't know, sentiment, machine translation, you can actually gather a group that will be proficient in that specific space, because otherwise you're going to waste cycles potentially teaching people. Yeah, I think this ties back to what we said at the beginning of the podcast: the data is important, but so are the humans who annotate
+ it. Yeah, absolutely, this is great. I still wanted to understand one building block: you were talking about aggregation. Can you restate what you mean, and what should I pay attention to as a user of such a platform?
+ So, there are different ways of annotating data, and there can be different cases where you need aggregation. Aggregation is, just imagine that you receive some raw results from the human annotators and then you need to aggregate them into some final answer that you will use for your model or for something else. It can be needed in different cases, for example, as we were talking about,
+ agreement between humans on some task. For example, you have a task of labeling a picture: is it a cat or a dog? And you decided that you want four annotators, and three of them said it's a cat and one said it's a dog, so you have four answers, and to conclude that it's a cat you need to perform an aggregation. For classification tasks this is pretty easy to do:
+ you can do just majority vote, or majority vote weighted by the skills of these people. But then it comes, for example, to aggregating images: say you're doing segmentation and you need to aggregate different answers about the segmentations. Here it's already harder, because doing a pixel-wise majority vote is a bit of hard work, you know.
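The plain and skill-weighted majority votes just described can be sketched as follows. The workers, labels, and skill weights are invented; Crowd-Kit offers production-grade versions of both:

```python
from collections import defaultdict

def weighted_majority(votes, skills=None):
    """votes: list of (worker, label) pairs.
    skills: optional {worker: weight}; without it, every vote counts as 1."""
    scores = defaultdict(float)
    for worker, label in votes:
        scores[label] += skills.get(worker, 1.0) if skills else 1.0
    return max(scores, key=scores.get)

votes = [("alice", "cat"), ("bob", "cat"), ("carol", "cat"), ("dave", "dog")]
plain = weighted_majority(votes)  # plain count: three "cat" votes to one

# With skill weights, a single highly trusted worker can outweigh several
# low-skill ones, matching the "trust the proficient ranker" idea above:
skills = {"alice": 0.2, "bob": 0.2, "carol": 0.2, "dave": 0.9}
weighted = weighted_majority(votes, skills)
```

This is the simplest member of the family; models like Dawid-Skene or the Bradley-Terry variants mentioned elsewhere in the conversation estimate those skill weights jointly with the answers instead of taking them as given.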
+ So for that there are usually models that are pre-designed, used, and studied in crowd science: models for the aggregation of image annotations, and also for the aggregation of the pairwise comparisons I was talking about. That one is a specifically hard task, because you have these pairwise assignments and sometimes A is better than B, B is better than C, but C is better than A, and you have a cycle and don't know what to do. For that there exist
+ a couple of models, for example noisy Bradley-Terry, which are based on the expectation-maximization algorithm. The assumption is that labelers, according to their skill, know the ground truth of the answers, and we try to estimate it, getting as close to it as possible over a couple of iterations of the model. In the end it just gives you the list of responses: for example, if we're computing
+ NDCG or some other metric, we just need a list that says item one is the best and item ten is the worst, and the aggregation of all these pairwise comparisons will give you that list.
+ You can implement these aggregations yourself and study them, because we're not the first ones doing crowdsourcing: in crowd science there are a lot of models presented, and our research team is also studying and implementing them, and I hope
+ I'm not praising them too much, but it's my moment of saying: our research team is great. For aggregation we created a tool that can be used paired with the platform, so you don't have to think much about how it works, but if you want,
+ write to me and I can point you to the papers. Yeah, absolutely, and all the papers that you mentioned during this podcast I would really love to include in the show notes as well, because I see how the listeners find the episodes educational, and they
+ Some of them spend a lot of time, you know, listening through, and then reading the papers as well, or at least browsing through them. So, Janet, you have shared so much stuff so far, and I think even those who are using open-source tools like, I don't know, Label Studio
+ and others, I'm sure they can learn from what you said, but I also hope that they will improve their practices by tapping into the talent behind Toloka. But there is one topic that I think keeps surfacing everywhere, and to some degree it becomes an overheated discussion: bias in data, and how it can actually drive inequalities in life and in the world. And I
+ think that by virtue of us being in this space we should resist this as much as possible, but I wanted to pick your brain on what bias is for you and how you have seen or discussed it in ML projects.
+ I really like this topic, because I recently hosted a panel discussion about biases, and I love hosting panel discussions, because you can come knowing zero about the subject, you can ask the stupidest questions of the most awesome engineers in the field, and you come back super educated. So maybe I'll try to recreate what the people from this panel discussion told me.
+ From my understanding, there are, well, not exactly two types of biases, but I would consider them as two types.
+ The first type, I think of as ethical bias, relates to things we shouldn't be biased on but are, because of the historical data, the unfair results of history: bias on skin color, bias on gender, bias on other attributes. And they are here and there in the big models,
+ for example even DALL-E and GPT-3 and everything.
+ Sadly, since they learn on internet-available data, and the internet is sometimes a very toxic space. I still remember the stories of the chatbots that were trained on Twitter and then became so offensive that nobody could let them out into the open, into the business communication world.
+ So these models of course carry these biases, and that should be controlled very carefully, and that's why we have the fairness topic in AI. I actually studied this recently. I love my master's, because I'm revisiting all the topics in ML, so when I'm talking about it I feel like I'm literally coming from an academic background,
+ when it's just the master's. The fairness algorithms are pretty well established: how you can evaluate, how you can try to make your data less biased, or just test it for fairness. But still,
+ sadly, although there are approaches, there is no way to avoid it fully, and sadly we keep constructing new biases here and there, so we get rid of one and we construct a new one.
+ And the second type is more like behavioral biases. Maybe they're less harmful in general, because I consider ethical biases to be very harmful: when we're creating systems related to jurisdiction or some other sensitive areas, those biases can be crucial. And by the way, speaking of crucial biases,
+ I remember the story about COVID. It's not an ethical bias, but it is a bias, and it was very crucial. When COVID started and everybody was panicking, people started to think that maybe they could do something with AI, some model that would help recognize from the lungs whether a person's pneumonia was caused by COVID or not.
+ And there was already data from China, because it started earlier there, and there were a lot of AI engineers working on that. But the problem was that the data was biased, and it wasn't cleaned and sorted out, because people didn't have much time; it was a very critical moment.
+ Because of that, the models were very biased and performed badly. For example, they were predicting that if the person in the scan was in a lying position, he or she had COVID, but if the person was in a standing position, they didn't, simply because the people who were lying down when the images were taken were in a worse medical condition
+ in general, because when you can't stand up, that means you're pretty sick, you know. So it was just a bias in the data, because the data was imbalanced, and that is the kind of biased result you need to monitor and control, and why you can't release such a model into the open world.
+ And yeah, the behavioral biases: for example, with the search engine I think I touched on it, position bias is when you're just trained to click on the first three elements you see, because you're so overwhelmed with information that you don't have the energy to go through tens of pages and select exactly what you need. And there are some other biases.
+ For example, we know one behavioral thing people have, it's an interesting one, called choice overload. In recommendation systems, and in restaurants, people prefer to see a bigger menu, a bigger set of recommendations; they think, oh, it's rich, it's nice, I would love it. But the more choice you have,
+ the more cost you're spending on the decision, your inner cost, your evaluation cost. At some point it just becomes not feasible, not useful; you need fewer items to select the better choice. At some point you just lose attention and everything. And that's another thing which comes from our behavior, which biases a lot of instruments and a lot of models, and which we need to take into account. +Otherwise we won't be successful in implementing it. Yeah, absolutely. On this paralysis of choice: would you think that reducing the number of options would bias our system in some way? Like, strictly speaking, do we actually introduce bias by reducing the number of options? + Oh, I'm pretty sure we do. But as I said, sometimes you can use biases; not all of the biases, I would say, are that harmful. Sometimes you can try to use them to get a better outcome. Of course I'm not talking about the ethical biases, but, for example, with reducing the amount of choice. + Of course you're biasing people towards what you offer; it depends on what you offer in this limited choice. If you're offering them the most popular items, of course they can get stuck in the loop of selecting the same items, without changing their preferences as they might like to. But in general, it would be easier for them to select something that they really prefer by its overall characteristics. + So even by biasing people here, you're actually kind of helping them with the choice process. There is a general recommendation: after 10 or 15 items, as far as I recall, your choice overload becomes too much, so you just can't choose. You know, I hate these restaurants where the menu has sushi, Indian food, Mexican food, and you go, oh my god, I'm so hungry but I can't choose.
+ Yeah, it's like no focus, and maybe no identity for the restaurant in a way. But at the same time, I'm pretty sure there are customers who are in haste and don't have time to drill down and understand what the local cuisine here is; just give me that pizza or burger or whatever, and I will flick through the menu, right? But I really wanted to relate to what you said, and I think bias is not always negative, and I think it's important + to understand that in some circumstances it could actually bring positive impact, and the example you gave is one of those, right? But at the same time, whatever I show on the screen in the search engine, you already talked about it, it's a click bias, right? Whatever I show, you know, in the majority of countries in the world we will go top to bottom, left to right, and we will click on what I'm showing, + in that order. But at the same time, anything that I say, for example now, already biases you to think in that direction, and if I choose another strategy and start talking about snow or something else, your mind will tune to a completely different topic, right? And you will be biased again without realizing that I did this. +So the same, I think, will actually apply to an annotation and labeling project, right? Whatever I show, in which order I show it, which questions I ask, will bias the annotators too. + Besides all other factors: if I overload them with questions, they will be tired and really not give me value, but even if I don't do that, the order of the tasks might bias them, and many other things. Can you talk a bit more about that, and also, is there some silver bullet that can solve this, or at least improve it? +Okay, from my position, which is a very subjective opinion, I would say bias is a very human thing. So even creating
+ even the perfect model, trained on purely human data in the amount that we have now, even if we increase the amount, we can't get rid of the biases which we are so afraid of. We would need to go to some system of robots creating robots creating robots, and then maybe we'd get rid of our own biases. Because, as you know, the human factor is really a thing; we're just making mistakes sometimes, just because we're +not as perfect and as attentive; we're just biased as we are. But I would say with human labeling, you see, you're doing a product from biased humans for biased humans. +At least, that's why I was talking about the clicks: when you're predicting something by the clicks in the online experiments, you're introducing even more; you're introducing a third person in this chain, the developer, who +makes assumptions about other people's biases, sometimes without knowing their culture. Or, as you said, in search engines we read from top to bottom, from left to right, but in some cultures they have a different way of writing and reading, right to left, or maybe they need a different design. + There are different people who like different types of search, based on age maybe; some people see less well, or there are people with color blindness and they need some other results, because it also depends on how you see everything. So when only one person makes the assumptions, especially a developer... I asked a question on the panel: should we all be psychologists and philosophers to create these systems? Because when
+ the developer decides what to do, sometimes this person is not educated about the psychology of human behavior, and it might produce some mistakes. That's why I think human labeling wins. It's not that these are people who are psychologists and philosophers, but you are giving the same task to a crowd, and you can do a pre-filtering of the crowd. + For example, by the same target audience that you're interested in: by the language, by the culture, by the interests. For example, you have a company about flowers; find people who like flowers, who work with flowers, or ask them, do you like flowers, and then send them to this task. By this you're at least + trying to model the same behavior with actual people, having the same distribution, maybe a small subsample of the same distribution of people as your target users, and you're not deciding for them yourself. So I would say the best recommendation is to think about filtering your crowd for your assignments: think of who you want to be satisfied by your product, and ask the people matching that to do the evaluation. +So I think that's the best recommendation for doing the testing. + Yeah, just one thought came to mind: if we consider that the annotation process is building some kind of mathematical function with which we're trying to fit reality, then in principle we could have built a perfect annotation and composition project that would fit reality. +But that would just replicate the same biases that exist today and earn money, right? That would be kind of the wrong way to go. I hope companies aren't doing that. +I need to say that even Toloka, if I am not mistaken, actually uses a little bit of machine learning labeling as a support. So we're learning on our crowd in each project, and we're providing some sub-labeling with a model
+trained on the choices of the target audience, which tries to emulate the same behavior. But it's still not the evil sentient AI robotics, because it's mostly manually labeled. But I need to note, +I need to mention that the humans who are labeling assignments are sometimes very educated and very smart, and they are very willing to trick the system, and actually when you want to trick the system you become super talented. +I saw some people creating algorithms which label the assignments for them and imitate the human time of labeling, the human way of moving the mouse, the human way of understanding the instruction. +Recently I was asked how we're blocking this type of people, but I'm saying, why ban this type of people? They're getting so close to actually labeling it that... Yeah, I'm sure we can learn from them, because, and you already touched on this topic, another big area of research is how we can... +I believe it's called gamification, or like, you break the machine learning model by supplying a certain sequence of actions and inputs such that it will unlock some doors or whatever, right? Maybe you receive a loan that you are not supposed to, and things like that. +Yeah, this is interesting. And do you think, I'm asking the same question as you heard at your event: do you think that unbiased data is attainable, so there is zero bias? + Honestly, I don't think so, I don't believe so. I might be incorrect, and the different experts who were sitting with me on the panel discussion, of course I asked them the same question, because it's very interesting: is it only our thing, why are we in this loop of creating and fixing biases, you know?
+ I personally don't think so, because bias is a very human thing: we can try to get rid of one, and we're creating another. But it's not bad, it's not bad. We just need to get rid of the actually dangerous biases, like the ethical and other such ones, and with the rest we just need to understand how to deal with them. And as humans we can recognize some biases which are harmful, and that's good. That's why, for example, we need + the manual evaluation of the AI systems which are being trained now, because they are very nicely trained, but they are producing biases and they can't detect what they are doing, so sometimes they can be harmful. That's why, from my perspective, big models alone still can't just be used now; even if they exist and they're impressing us very much, they need some verification on top. +Yeah, exactly, and this is where the human labeling comes in. I think the flip side, or if I flip around my question, or your question rather, + about getting the completely unbiased set: you could also ask, could we actually source a dataset that contains all the little biases, or little diversities, that exist in the world, right? Or, okay, maybe not in the world, but in the domain of operation that you are in with your business. And maybe +formulating it that way gives me a lever to start thinking: okay, what is it that I'm missing in the data? And of course this is the most challenging question, to know what I don't know, what I'm missing, right? So it's equally hard, but it's probably more on the trajectory of amassing the dataset. +I would say that I actually heard of some approaches which are working on that: specifically taking very biased data in the domain into account and seeing how your system will perceive it, and actually catching the mistakes by it. Because, yeah, we have
+We can account for the biased data, and there are some guidelines on how to notice biases in your data or models, so here we can try to at least approach this task from that perspective, yeah. +Sounds great. Hey Jenny, I really enjoyed talking to you, and I think we could talk forever on this topic, but I really love asking the question of why, which is + tapping into your motivation. You did say in the beginning that, you know, all the stars positioned correctly in a way and you got the dream job, but at the same time you still wake up every morning and you say, okay, what will I do today, what drives me, and why? So what drives you to do what you do as a data advocate? + I would say, I don't want to start the story by reflecting on how I woke up one day and realized that my heart belongs to AI and everything. But there were little moments in my life: when I had to write an essay about whether computers can think when I was applying to university, when I had to explain to my parents what AI is and why I'm doing it. + When I saw the questions which people in general are asking, seeing DALL-E and GPT-3, when I visited some industrial conferences and compared them with the research conferences, I noticed that people are fascinated by the models and their architectures, but when it comes to taking it down to development and to actually helping people, people +struggle with doing some simple things, not simple but basic things, like providing the data. They sound less interesting, but they're actually very crucial, like providing the right data curation.
+ And bias monitoring, not just creating a proof of concept. It bugged me so much, because I see so many cool models, ideas, architectures around, creating insane applications, but they don't always come to production, and they don't always start helping people. And I would say I really love, +in general, seeing something start working; it's very satisfying. So I chose my path, let's say, to approach people by talking about data and how it can actually help to train the models and make them closer to production. +To make them closer to being actually here, working well, and helping people out with these magnificent ideas created by researchers. +Oh yeah, that sounds so cool, very deep, thank you for sharing this. I think in many ways it's like the dream, maybe, of creating that companion that can think the same way we think, and it's not human. +Because as humanity we do reproduce and so on, but we also challenge ourselves and others in what is possible, what the limits of our intelligence are, you know. +And there are very important tasks that are still waiting to be solved, and can be solved with AI; I think it's magnificent. I am waiting to see if somebody, some model, will finally win the Loebner Prize and pass the Turing test, so I'm rooting for ChatGPT. +Yes, maybe it will be fine-tuned on something like flower series or something. Yeah, yeah, yeah, on human labeling with Toloka, I'm pretty sure. Exactly, exactly. +And yeah, traditionally of course we will link everything we can about Toloka. But if you were to announce something to the audience, maybe how they can get closer to the platform, you know, start playing around.
+ Okay, yeah, okay, I have three things that I really want to announce. Firstly: if you want to talk about things like, do we need manual labeling, do you need manual labeling, do you need data labeling, do you need transfer learning with crowdsourcing, or do you want to just use crowdsourcing and think about it, +join our community, because there we talk just in general about the ML stuff, about AI, about crowdsourcing, about the data-centric and model-centric approaches, and there we can +concretely talk about some topics which concern you, your company, or your business, and just talk. And also we have two initiatives, for education and for researchers: if somebody is interested in checking some hypothesis on crowdsourcing, for example some + questions, some ethical stuff, some gathering of datasets for your own work, or you want to create some educational material, or to teach a course on crowdsourcing to your students, we have two application forms where we can provide you some promos, so you can use crowdsourcing for free, play around, and maybe start liking it as I do. +And I truly like it, because when you can gather a 12k dataset in one day, you start liking it. This is mind-blowing. So yeah, that's me, that's it, thank you very much. That's fun. That was magical, thank you. +I'm sorry for being a very talkative person, but I'm always like this, don't be afraid of me. + No, it's your character, it's your energy, it's your experience, and it's something that speaks out, you know, beyond your controlling it. I think it's important, it's amazing, and that's how it should be. I really, really enjoyed talking to you today, Jenny. I hope this is not the last time; we can record another one, and another one. +And all the best with your Christmas vacation. Oh, thank you, and recharging, and all the best with Toloka. Thank you very much.
I'll keep it in; you can edit it however you like, actually, we approve. +Yeah, thank you, and I hope the audience got that magic tune as well, and everyone will also have time to recharge during the Christmas and New Year break, and we will continue from here. Thank you, Jenny. Thank you. Bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md b/transcripts_with_timestamps/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md new file mode 100644 index 0000000..4fd1a8c --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/filip-haltmayer-data-engineer-ziliz-on-milvus-vector-database-and-working-with-clients.md @@ -0,0 +1,5860 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=fHu8b-EzOzU

Order + your Milvus t-shirt / hoodie! https://milvus.typeform.com/to/IrnLAgui + Thanks Filip for arranging.

Show notes:

- Milvus DB: https://milvus.io/

- + Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus...

- + Milvus talk at Haystack: https://www.youtube.com/watch?v=MLSMs...

- + BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval + Models https://arxiv.org/abs/2104.08663

- + End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network: + https://arxiv.org/abs/1904.08990

- + What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language + models https://arxiv.org/abs/1907.13528

- + NVIDIA Triton Inference Server: https://developer.nvidia.com/nvidia-t...

- + Towhee -- ML / Embedding pipeline making steps before Milvus easier: https://github.com/towhee-io/towhee

- + Being at the leading edge: http://paulgraham.com/startupideas.html

' +image_url: https://media.rss.com/vector-podcast/20211223_011243_83a6aa11fa9886f4212eedc43a2501c3.jpg +pub_date: Thu, 23 Dec 2021 13:28:43 GMT +title: Filip Haltmayer (Data Engineer, Ziliz) on Milvus vector database and working + with clients +url: https://rss.com/podcasts/vector-podcast/347470 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 16.16, "text": " All + right, vector podcast, episode three.", "tokens": [50364, 1057, 558, 11, 8062, 7367, + 11, 3500, 1045, 13, 51172], "temperature": 0.0, "avg_logprob": -0.4674873853984632, + "compression_ratio": 1.1926605504587156, "no_speech_prob": 0.2585803270339966}, + {"id": 1, "seek": 0, "start": 16.16, "end": 25.52, "text": " And today we have Philip + Altmeier, data engineer as Zillitz, who works a lot with users.", "tokens": [51172, + 400, 965, 321, 362, 21144, 15992, 1398, 811, 11, 1412, 11403, 382, 1176, 373, 6862, + 11, 567, 1985, 257, 688, 365, 5022, 13, 51640], "temperature": 0.0, "avg_logprob": + -0.4674873853984632, "compression_ratio": 1.1926605504587156, "no_speech_prob": + 0.2585803270339966}, {"id": 2, "seek": 2552, "start": 25.52, "end": 30.24, "text": + " And the company is building the vector search database called Milbus.", "tokens": + [50364, 400, 264, 2237, 307, 2390, 264, 8062, 3164, 8149, 1219, 7036, 21441, 13, + 50600], "temperature": 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": + 1.5943775100401607, "no_speech_prob": 0.06694626063108444}, {"id": 3, "seek": 2552, + "start": 30.24, "end": 31.439999999999998, "text": " Hey Philip.", "tokens": [50600, + 1911, 21144, 13, 50660], "temperature": 0.0, "avg_logprob": -0.42098724632932427, + "compression_ratio": 1.5943775100401607, "no_speech_prob": 0.06694626063108444}, + {"id": 4, "seek": 2552, "start": 31.439999999999998, "end": 33.44, "text": " Nice + to meet you.", "tokens": [50660, 5490, 281, 1677, 291, 13, 50760], "temperature": + 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": 1.5943775100401607, 
+ "no_speech_prob": 0.06694626063108444}, {"id": 5, "seek": 2552, "start": 33.44, + "end": 36.4, "text": " Yeah, I got it.", "tokens": [50760, 865, 11, 286, 658, 309, + 13, 50908], "temperature": 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": + 1.5943775100401607, "no_speech_prob": 0.06694626063108444}, {"id": 6, "seek": 2552, + "start": 36.4, "end": 39.12, "text": " Data engineer as Zillitz, pretty much me.", + "tokens": [50908, 11888, 11403, 382, 1176, 373, 6862, 11, 1238, 709, 385, 13, 51044], + "temperature": 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": 1.5943775100401607, + "no_speech_prob": 0.06694626063108444}, {"id": 7, "seek": 2552, "start": 39.12, + "end": 40.6, "text": " Yeah, awesome, awesome.", "tokens": [51044, 865, 11, 3476, + 11, 3476, 13, 51118], "temperature": 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": + 1.5943775100401607, "no_speech_prob": 0.06694626063108444}, {"id": 8, "seek": 2552, + "start": 40.6, "end": 41.6, "text": " Nice to meet you as well.", "tokens": [51118, + 5490, 281, 1677, 291, 382, 731, 13, 51168], "temperature": 0.0, "avg_logprob": -0.42098724632932427, + "compression_ratio": 1.5943775100401607, "no_speech_prob": 0.06694626063108444}, + {"id": 9, "seek": 2552, "start": 41.6, "end": 44.44, "text": " And thanks for joining + the show.", "tokens": [51168, 400, 3231, 337, 5549, 264, 855, 13, 51310], "temperature": + 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": 1.5943775100401607, + "no_speech_prob": 0.06694626063108444}, {"id": 10, "seek": 2552, "start": 44.44, + "end": 50.6, "text": " And yeah, it''s usually I would like to start with you introducing + yourself to our audience,", "tokens": [51310, 400, 1338, 11, 309, 311, 2673, 286, + 576, 411, 281, 722, 365, 291, 15424, 1803, 281, 527, 4034, 11, 51618], "temperature": + 0.0, "avg_logprob": -0.42098724632932427, "compression_ratio": 1.5943775100401607, + "no_speech_prob": 0.06694626063108444}, {"id": 11, 
"seek": 2552, "start": 50.6, + "end": 54.92, "text": " what''s your background and how you ended up working for + Zillitz.", "tokens": [51618, 437, 311, 428, 3678, 293, 577, 291, 4590, 493, 1364, + 337, 1176, 373, 6862, 13, 51834], "temperature": 0.0, "avg_logprob": -0.42098724632932427, + "compression_ratio": 1.5943775100401607, "no_speech_prob": 0.06694626063108444}, + {"id": 12, "seek": 5492, "start": 55.760000000000005, "end": 56.800000000000004, + "text": " Sounds good.", "tokens": [50406, 14576, 665, 13, 50458], "temperature": + 0.0, "avg_logprob": -0.2030608714127741, "compression_ratio": 1.5857142857142856, + "no_speech_prob": 0.011916986666619778}, {"id": 13, "seek": 5492, "start": 56.800000000000004, + "end": 60.400000000000006, "text": " Yes, so you got my name already Philip Altmeier.", + "tokens": [50458, 1079, 11, 370, 291, 658, 452, 1315, 1217, 21144, 15992, 1398, + 811, 13, 50638], "temperature": 0.0, "avg_logprob": -0.2030608714127741, "compression_ratio": + 1.5857142857142856, "no_speech_prob": 0.011916986666619778}, {"id": 14, "seek": + 5492, "start": 60.400000000000006, "end": 65.08, "text": " A graduated from UC Santa + Cruz with a BS in computer science in 2020.", "tokens": [50638, 316, 13693, 490, + 14079, 9933, 23008, 365, 257, 27253, 294, 3820, 3497, 294, 4808, 13, 50872], "temperature": + 0.0, "avg_logprob": -0.2030608714127741, "compression_ratio": 1.5857142857142856, + "no_speech_prob": 0.011916986666619778}, {"id": 15, "seek": 5492, "start": 65.08, + "end": 68.04, "text": " So right start enduring COVID.", "tokens": [50872, 407, + 558, 722, 36562, 4566, 13, 51020], "temperature": 0.0, "avg_logprob": -0.2030608714127741, + "compression_ratio": 1.5857142857142856, "no_speech_prob": 0.011916986666619778}, + {"id": 16, "seek": 5492, "start": 68.04, "end": 71.96000000000001, "text": " And + then out of college, I really wanted to kind of get into the startup scene.", "tokens": + [51020, 400, 550, 484, 295, 3859, 11, 286, 534, 1415, 281, 733, 
295, 483, 666, 264, + 18578, 4145, 13, 51216], "temperature": 0.0, "avg_logprob": -0.2030608714127741, + "compression_ratio": 1.5857142857142856, "no_speech_prob": 0.011916986666619778}, + {"id": 17, "seek": 5492, "start": 71.96000000000001, "end": 76.4, "text": " And + I was doing a lot of things, machine learning, taking a lot of classes, doing projects.", + "tokens": [51216, 400, 286, 390, 884, 257, 688, 295, 721, 11, 3479, 2539, 11, 1940, + 257, 688, 295, 5359, 11, 884, 4455, 13, 51438], "temperature": 0.0, "avg_logprob": + -0.2030608714127741, "compression_ratio": 1.5857142857142856, "no_speech_prob": + 0.011916986666619778}, {"id": 18, "seek": 5492, "start": 76.4, "end": 80.6, "text": + " And when I was going to look for a job, I realized anything machine learning related,", + "tokens": [51438, 400, 562, 286, 390, 516, 281, 574, 337, 257, 1691, 11, 286, 5334, + 1340, 3479, 2539, 4077, 11, 51648], "temperature": 0.0, "avg_logprob": -0.2030608714127741, + "compression_ratio": 1.5857142857142856, "no_speech_prob": 0.011916986666619778}, + {"id": 19, "seek": 5492, "start": 80.6, "end": 82.24000000000001, "text": " you + have to have a PhD.", "tokens": [51648, 291, 362, 281, 362, 257, 14476, 13, 51730], + "temperature": 0.0, "avg_logprob": -0.2030608714127741, "compression_ratio": 1.5857142857142856, + "no_speech_prob": 0.011916986666619778}, {"id": 20, "seek": 8224, "start": 82.24, + "end": 85.75999999999999, "text": " You have to be doing masters PhD, extra work.", + "tokens": [50364, 509, 362, 281, 312, 884, 19294, 14476, 11, 2857, 589, 13, 50540], + "temperature": 0.0, "avg_logprob": -0.20737360525822293, "compression_ratio": 1.7419354838709677, + "no_speech_prob": 0.0012850885977968574}, {"id": 21, "seek": 8224, "start": 85.75999999999999, + "end": 89.8, "text": " You''re not getting out of, you''re not getting into the + field out of college.", "tokens": [50540, 509, 434, 406, 1242, 484, 295, 11, 291, + 434, 406, 1242, 666, 264, 2519, 484, 295, 3859, 13, 
50742], "temperature": 0.0, + "avg_logprob": -0.20737360525822293, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0012850885977968574}, {"id": 22, "seek": 8224, "start": 89.8, "end": 94.36, "text": + " So the next step was like, okay, what''s kind of new and growing in that field?", + "tokens": [50742, 407, 264, 958, 1823, 390, 411, 11, 1392, 11, 437, 311, 733, 295, + 777, 293, 4194, 294, 300, 2519, 30, 50970], "temperature": 0.0, "avg_logprob": -0.20737360525822293, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0012850885977968574}, + {"id": 23, "seek": 8224, "start": 94.36, "end": 97.16, "text": " Somewhere where + there isn''t already so much this set knowledge.", "tokens": [50970, 34500, 689, + 456, 1943, 380, 1217, 370, 709, 341, 992, 3601, 13, 51110], "temperature": 0.0, + "avg_logprob": -0.20737360525822293, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0012850885977968574}, {"id": 24, "seek": 8224, "start": 97.16, "end": 99.56, "text": + " And that''s where vector search came in.", "tokens": [51110, 400, 300, 311, 689, + 8062, 3164, 1361, 294, 13, 51230], "temperature": 0.0, "avg_logprob": -0.20737360525822293, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0012850885977968574}, + {"id": 25, "seek": 8224, "start": 99.56, "end": 103.24, "text": " And then Zillitz, + I found them and did the whole process.", "tokens": [51230, 400, 550, 1176, 373, + 6862, 11, 286, 1352, 552, 293, 630, 264, 1379, 1399, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.20737360525822293, "compression_ratio": 1.7419354838709677, + "no_speech_prob": 0.0012850885977968574}, {"id": 26, "seek": 8224, "start": 103.24, + "end": 106.16, "text": " And I really thought I fed it fit in.", "tokens": [51414, + 400, 286, 534, 1194, 286, 4636, 309, 3318, 294, 13, 51560], "temperature": 0.0, + "avg_logprob": -0.20737360525822293, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0012850885977968574}, {"id": 27, 
"seek": 8224, "start": 106.16, "end": 107.56, + "text": " And that''s where that took off.", "tokens": [51560, 400, 300, 311, 689, + 300, 1890, 766, 13, 51630], "temperature": 0.0, "avg_logprob": -0.20737360525822293, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0012850885977968574}, + {"id": 28, "seek": 8224, "start": 107.56, "end": 109.72, "text": " So that''s kind + of how I got to where I am right now.", "tokens": [51630, 407, 300, 311, 733, 295, + 577, 286, 658, 281, 689, 286, 669, 558, 586, 13, 51738], "temperature": 0.0, "avg_logprob": + -0.20737360525822293, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0012850885977968574}, {"id": 29, "seek": 10972, "start": 109.72, "end": 114.24, + "text": " But pretty much yeah, I throw a shout out of college, getting in the whole + field and", "tokens": [50364, 583, 1238, 709, 1338, 11, 286, 3507, 257, 8043, 484, + 295, 3859, 11, 1242, 294, 264, 1379, 2519, 293, 50590], "temperature": 0.0, "avg_logprob": + -0.3217706506902521, "compression_ratio": 1.593625498007968, "no_speech_prob": 0.007135322783142328}, + {"id": 30, "seek": 10972, "start": 114.24, "end": 116.52, "text": " figuring everything + out on how it all works.", "tokens": [50590, 15213, 1203, 484, 322, 577, 309, 439, + 1985, 13, 50704], "temperature": 0.0, "avg_logprob": -0.3217706506902521, "compression_ratio": + 1.593625498007968, "no_speech_prob": 0.007135322783142328}, {"id": 31, "seek": 10972, + "start": 116.52, "end": 117.76, "text": " Oh yeah, that''s cool.", "tokens": [50704, + 876, 1338, 11, 300, 311, 1627, 13, 50766], "temperature": 0.0, "avg_logprob": -0.3217706506902521, + "compression_ratio": 1.593625498007968, "no_speech_prob": 0.007135322783142328}, + {"id": 32, "seek": 10972, "start": 117.76, "end": 119.44, "text": " And can you + tell me a bit more?", "tokens": [50766, 400, 393, 291, 980, 385, 257, 857, 544, + 30, 50850], "temperature": 0.0, "avg_logprob": -0.3217706506902521, "compression_ratio": + 
1.593625498007968, "no_speech_prob": 0.007135322783142328}, {"id": 33, "seek": 10972, + "start": 119.44, "end": 124.08, "text": " I''ve been also doing some tech stuff + for a few years here and there.", "tokens": [50850, 286, 600, 668, 611, 884, 512, + 7553, 1507, 337, 257, 1326, 924, 510, 293, 456, 13, 51082], "temperature": 0.0, + "avg_logprob": -0.3217706506902521, "compression_ratio": 1.593625498007968, "no_speech_prob": + 0.007135322783142328}, {"id": 34, "seek": 10972, "start": 124.08, "end": 129.44, + "text": " But like data engineer and you work with users.", "tokens": [51082, 583, + 411, 1412, 11403, 293, 291, 589, 365, 5022, 13, 51350], "temperature": 0.0, "avg_logprob": + -0.3217706506902521, "compression_ratio": 1.593625498007968, "no_speech_prob": 0.007135322783142328}, + {"id": 35, "seek": 10972, "start": 129.44, "end": 131.56, "text": " How exactly + that looks like?", "tokens": [51350, 1012, 2293, 300, 1542, 411, 30, 51456], "temperature": + 0.0, "avg_logprob": -0.3217706506902521, "compression_ratio": 1.593625498007968, + "no_speech_prob": 0.007135322783142328}, {"id": 36, "seek": 10972, "start": 131.56, + "end": 137.48, "text": " Yeah, so data engineer gets thrown around a lot at a lot + of companies.", "tokens": [51456, 865, 11, 370, 1412, 11403, 2170, 11732, 926, 257, + 688, 412, 257, 688, 295, 3431, 13, 51752], "temperature": 0.0, "avg_logprob": -0.3217706506902521, + "compression_ratio": 1.593625498007968, "no_speech_prob": 0.007135322783142328}, + {"id": 37, "seek": 13748, "start": 137.48, "end": 144.2, "text": " Right now, so + for me, how it works for me, data engineer falls into kind of just user success", + "tokens": [50364, 1779, 586, 11, 370, 337, 385, 11, 577, 309, 1985, 337, 385, 11, + 1412, 11403, 8804, 666, 733, 295, 445, 4195, 2245, 50700], "temperature": 0.0, "avg_logprob": + -0.19794039087971366, "compression_ratio": 1.7163636363636363, "no_speech_prob": + 0.005221668630838394}, {"id": 38, "seek": 13748, "start": 144.2, "end": 
148.39999999999998, + "text": " and more also pre sale style of things.", "tokens": [50700, 293, 544, + 611, 659, 8680, 3758, 295, 721, 13, 50910], "temperature": 0.0, "avg_logprob": -0.19794039087971366, + "compression_ratio": 1.7163636363636363, "no_speech_prob": 0.005221668630838394}, + {"id": 39, "seek": 13748, "start": 148.39999999999998, "end": 153.6, "text": " So + how to use our tech, creating new use cases like we have a bootcamp where we show + examples", "tokens": [50910, 407, 577, 281, 764, 527, 7553, 11, 4084, 777, 764, + 3331, 411, 321, 362, 257, 11450, 24640, 689, 321, 855, 5110, 51170], "temperature": + 0.0, "avg_logprob": -0.19794039087971366, "compression_ratio": 1.7163636363636363, + "no_speech_prob": 0.005221668630838394}, {"id": 40, "seek": 13748, "start": 153.6, + "end": 155.16, "text": " of how to use Milvus.", "tokens": [51170, 295, 577, 281, + 764, 7036, 85, 301, 13, 51248], "temperature": 0.0, "avg_logprob": -0.19794039087971366, + "compression_ratio": 1.7163636363636363, "no_speech_prob": 0.005221668630838394}, + {"id": 41, "seek": 13748, "start": 155.16, "end": 156.16, "text": " That''s kind + of what we''re doing.", "tokens": [51248, 663, 311, 733, 295, 437, 321, 434, 884, + 13, 51298], "temperature": 0.0, "avg_logprob": -0.19794039087971366, "compression_ratio": + 1.7163636363636363, "no_speech_prob": 0.005221668630838394}, {"id": 42, "seek": + 13748, "start": 156.16, "end": 159.44, "text": " We''re talking to the customers + that are trying to learn how they can implement in their", "tokens": [51298, 492, + 434, 1417, 281, 264, 4581, 300, 366, 1382, 281, 1466, 577, 436, 393, 4445, 294, + 641, 51462], "temperature": 0.0, "avg_logprob": -0.19794039087971366, "compression_ratio": + 1.7163636363636363, "no_speech_prob": 0.005221668630838394}, {"id": 43, "seek": + 13748, "start": 159.44, "end": 161.64, "text": " system, what problems they''re + having.", "tokens": [51462, 1185, 11, 437, 2740, 436, 434, 1419, 13, 51572], "temperature": + 
0.0, "avg_logprob": -0.19794039087971366, "compression_ratio": 1.7163636363636363, + "no_speech_prob": 0.005221668630838394}, {"id": 44, "seek": 13748, "start": 161.64, + "end": 164.32, "text": " So we''re the ones that are kind of front facing in the + company.", "tokens": [51572, 407, 321, 434, 264, 2306, 300, 366, 733, 295, 1868, + 7170, 294, 264, 2237, 13, 51706], "temperature": 0.0, "avg_logprob": -0.19794039087971366, + "compression_ratio": 1.7163636363636363, "no_speech_prob": 0.005221668630838394}, + {"id": 45, "seek": 16432, "start": 164.32, "end": 169.2, "text": " And then as a + data engineer, I''ve also worked with a lot on the cloud deployment, figuring", + "tokens": [50364, 400, 550, 382, 257, 1412, 11403, 11, 286, 600, 611, 2732, 365, + 257, 688, 322, 264, 4588, 19317, 11, 15213, 50608], "temperature": 0.0, "avg_logprob": + -0.2499818115234375, "compression_ratio": 1.7715355805243447, "no_speech_prob": + 0.010826865211129189}, {"id": 46, "seek": 16432, "start": 169.2, "end": 173.16, + "text": " that, optimizing that worked on some development aspects as well.", "tokens": + [50608, 300, 11, 40425, 300, 2732, 322, 512, 3250, 7270, 382, 731, 13, 50806], "temperature": + 0.0, "avg_logprob": -0.2499818115234375, "compression_ratio": 1.7715355805243447, + "no_speech_prob": 0.010826865211129189}, {"id": 47, "seek": 16432, "start": 173.16, + "end": 175.16, "text": " But it''s kind of a lot of hats.", "tokens": [50806, 583, + 309, 311, 733, 295, 257, 688, 295, 20549, 13, 50906], "temperature": 0.0, "avg_logprob": + -0.2499818115234375, "compression_ratio": 1.7715355805243447, "no_speech_prob": + 0.010826865211129189}, {"id": 48, "seek": 16432, "start": 175.16, "end": 179.79999999999998, + "text": " So it''s like startup data engineer, at least here, it''s pretty much + just whatever needs", "tokens": [50906, 407, 309, 311, 411, 18578, 1412, 11403, + 11, 412, 1935, 510, 11, 309, 311, 1238, 709, 445, 2035, 2203, 51138], "temperature": + 0.0, "avg_logprob": 
-0.2499818115234375, "compression_ratio": 1.7715355805243447, + "no_speech_prob": 0.010826865211129189}, {"id": 49, "seek": 16432, "start": 179.79999999999998, + "end": 182.0, "text": " help with or whatever needs work.", "tokens": [51138, 854, + 365, 420, 2035, 2203, 589, 13, 51248], "temperature": 0.0, "avg_logprob": -0.2499818115234375, + "compression_ratio": 1.7715355805243447, "no_speech_prob": 0.010826865211129189}, + {"id": 50, "seek": 16432, "start": 182.0, "end": 183.56, "text": " You kind of get + put on that.", "tokens": [51248, 509, 733, 295, 483, 829, 322, 300, 13, 51326], + "temperature": 0.0, "avg_logprob": -0.2499818115234375, "compression_ratio": 1.7715355805243447, + "no_speech_prob": 0.010826865211129189}, {"id": 51, "seek": 16432, "start": 183.56, + "end": 188.4, "text": " So it''s a cool opportunity to try a lot of it and get to + meet a lot of customers and", "tokens": [51326, 407, 309, 311, 257, 1627, 2650, + 281, 853, 257, 688, 295, 309, 293, 483, 281, 1677, 257, 688, 295, 4581, 293, 51568], + "temperature": 0.0, "avg_logprob": -0.2499818115234375, "compression_ratio": 1.7715355805243447, + "no_speech_prob": 0.010826865211129189}, {"id": 52, "seek": 16432, "start": 188.4, + "end": 191.79999999999998, "text": " cool people in all different parts of the field.", + "tokens": [51568, 1627, 561, 294, 439, 819, 3166, 295, 264, 2519, 13, 51738], "temperature": + 0.0, "avg_logprob": -0.2499818115234375, "compression_ratio": 1.7715355805243447, + "no_speech_prob": 0.010826865211129189}, {"id": 53, "seek": 19180, "start": 191.8, + "end": 198.16000000000003, "text": " Yeah, and you also learn to interact with users + because they bring a different perspective", "tokens": [50364, 865, 11, 293, 291, + 611, 1466, 281, 4648, 365, 5022, 570, 436, 1565, 257, 819, 4585, 50682], "temperature": + 0.0, "avg_logprob": -0.29569473266601565, "compression_ratio": 1.6094890510948905, + "no_speech_prob": 0.025342250242829323}, {"id": 54, "seek": 19180, "start": 
198.16000000000003, + "end": 199.16000000000003, "text": " over things.", "tokens": [50682, 670, 721, + 13, 50732], "temperature": 0.0, "avg_logprob": -0.29569473266601565, "compression_ratio": + 1.6094890510948905, "no_speech_prob": 0.025342250242829323}, {"id": 55, "seek": + 19180, "start": 199.16000000000003, "end": 205.0, "text": " They probably don''t + focus as much on the integrated, but they need to solve something.", "tokens": [50732, + 814, 1391, 500, 380, 1879, 382, 709, 322, 264, 10919, 11, 457, 436, 643, 281, 5039, + 746, 13, 51024], "temperature": 0.0, "avg_logprob": -0.29569473266601565, "compression_ratio": + 1.6094890510948905, "no_speech_prob": 0.025342250242829323}, {"id": 56, "seek": + 19180, "start": 205.0, "end": 206.0, "text": " Right?", "tokens": [51024, 1779, + 30, 51074], "temperature": 0.0, "avg_logprob": -0.29569473266601565, "compression_ratio": + 1.6094890510948905, "no_speech_prob": 0.025342250242829323}, {"id": 57, "seek": + 19180, "start": 206.0, "end": 207.36, "text": " Oh yeah, it''s definitely that.", + "tokens": [51074, 876, 1338, 11, 309, 311, 2138, 300, 13, 51142], "temperature": + 0.0, "avg_logprob": -0.29569473266601565, "compression_ratio": 1.6094890510948905, + "no_speech_prob": 0.025342250242829323}, {"id": 58, "seek": 19180, "start": 207.36, + "end": 211.48000000000002, "text": " So it''s figuring out, seeing their use cases + is always crazy.", "tokens": [51142, 407, 309, 311, 15213, 484, 11, 2577, 641, 764, + 3331, 307, 1009, 3219, 13, 51348], "temperature": 0.0, "avg_logprob": -0.29569473266601565, + "compression_ratio": 1.6094890510948905, "no_speech_prob": 0.025342250242829323}, + {"id": 59, "seek": 19180, "start": 211.48000000000002, "end": 214.84, "text": " + See how much data they''re dealing with these cool ideas of what they''re trying + to do and", "tokens": [51348, 3008, 577, 709, 1412, 436, 434, 6260, 365, 613, 1627, + 3487, 295, 437, 436, 434, 1382, 281, 360, 293, 51516], "temperature": 0.0, "avg_logprob": 
+ -0.29569473266601565, "compression_ratio": 1.6094890510948905, "no_speech_prob": + 0.025342250242829323}, {"id": 60, "seek": 19180, "start": 214.84, "end": 217.12, + "text": " seeing how we can make it work.", "tokens": [51516, 2577, 577, 321, 393, + 652, 309, 589, 13, 51630], "temperature": 0.0, "avg_logprob": -0.29569473266601565, + "compression_ratio": 1.6094890510948905, "no_speech_prob": 0.025342250242829323}, + {"id": 61, "seek": 19180, "start": 217.12, "end": 219.12, "text": " And usually + it all goes well.", "tokens": [51630, 400, 2673, 309, 439, 1709, 731, 13, 51730], + "temperature": 0.0, "avg_logprob": -0.29569473266601565, "compression_ratio": 1.6094890510948905, + "no_speech_prob": 0.025342250242829323}, {"id": 62, "seek": 21912, "start": 219.12, + "end": 220.28, "text": " We can figure out solutions.", "tokens": [50364, 492, 393, + 2573, 484, 6547, 13, 50422], "temperature": 0.0, "avg_logprob": -0.2259512930205374, + "compression_ratio": 1.834862385321101, "no_speech_prob": 0.08031974732875824}, + {"id": 63, "seek": 21912, "start": 220.28, "end": 221.28, "text": " We work together.", + "tokens": [50422, 492, 589, 1214, 13, 50472], "temperature": 0.0, "avg_logprob": + -0.2259512930205374, "compression_ratio": 1.834862385321101, "no_speech_prob": 0.08031974732875824}, + {"id": 64, "seek": 21912, "start": 221.28, "end": 222.28, "text": " We kind of keep + relationships.", "tokens": [50472, 492, 733, 295, 1066, 6159, 13, 50522], "temperature": + 0.0, "avg_logprob": -0.2259512930205374, "compression_ratio": 1.834862385321101, + "no_speech_prob": 0.08031974732875824}, {"id": 65, "seek": 21912, "start": 222.28, + "end": 223.28, "text": " I think it''s really cool.", "tokens": [50522, 286, 519, + 309, 311, 534, 1627, 13, 50572], "temperature": 0.0, "avg_logprob": -0.2259512930205374, + "compression_ratio": 1.834862385321101, "no_speech_prob": 0.08031974732875824}, + {"id": 66, "seek": 21912, "start": 223.28, "end": 224.76, "text": " And sometimes + it 
doesn''t work.", "tokens": [50572, 400, 2171, 309, 1177, 380, 589, 13, 50646], + "temperature": 0.0, "avg_logprob": -0.2259512930205374, "compression_ratio": 1.834862385321101, + "no_speech_prob": 0.08031974732875824}, {"id": 67, "seek": 21912, "start": 224.76, + "end": 226.48000000000002, "text": " Sometimes we need to put more things into Milvus.", + "tokens": [50646, 4803, 321, 643, 281, 829, 544, 721, 666, 7036, 85, 301, 13, 50732], + "temperature": 0.0, "avg_logprob": -0.2259512930205374, "compression_ratio": 1.834862385321101, + "no_speech_prob": 0.08031974732875824}, {"id": 68, "seek": 21912, "start": 226.48000000000002, + "end": 230.92000000000002, "text": " So we kind of keep the communication line open + and we kind of figure out more things", "tokens": [50732, 407, 321, 733, 295, 1066, + 264, 6101, 1622, 1269, 293, 321, 733, 295, 2573, 484, 544, 721, 50954], "temperature": + 0.0, "avg_logprob": -0.2259512930205374, "compression_ratio": 1.834862385321101, + "no_speech_prob": 0.08031974732875824}, {"id": 69, "seek": 21912, "start": 230.92000000000002, + "end": 234.8, "text": " to put in, kind of get their input on what we''re working + on and go from there.", "tokens": [50954, 281, 829, 294, 11, 733, 295, 483, 641, + 4846, 322, 437, 321, 434, 1364, 322, 293, 352, 490, 456, 13, 51148], "temperature": + 0.0, "avg_logprob": -0.2259512930205374, "compression_ratio": 1.834862385321101, + "no_speech_prob": 0.08031974732875824}, {"id": 70, "seek": 21912, "start": 234.8, + "end": 238.56, "text": " But it''s a lot of a back and forth conversation with users + and customers.", "tokens": [51148, 583, 309, 311, 257, 688, 295, 257, 646, 293, + 5220, 3761, 365, 5022, 293, 4581, 13, 51336], "temperature": 0.0, "avg_logprob": + -0.2259512930205374, "compression_ratio": 1.834862385321101, "no_speech_prob": 0.08031974732875824}, + {"id": 71, "seek": 21912, "start": 238.56, "end": 239.56, "text": " Yeah, for sure.", + "tokens": [51336, 865, 11, 337, 988, 13, 51386], 
"temperature": 0.0, "avg_logprob": + -0.2259512930205374, "compression_ratio": 1.834862385321101, "no_speech_prob": 0.08031974732875824}, + {"id": 72, "seek": 21912, "start": 239.56, "end": 243.6, "text": " It''s kind of + like, first you need to learn what is it that they''re trying to do, right?", "tokens": + [51386, 467, 311, 733, 295, 411, 11, 700, 291, 643, 281, 1466, 437, 307, 309, 300, + 436, 434, 1382, 281, 360, 11, 558, 30, 51588], "temperature": 0.0, "avg_logprob": + -0.2259512930205374, "compression_ratio": 1.834862385321101, "no_speech_prob": 0.08031974732875824}, + {"id": 73, "seek": 21912, "start": 243.6, "end": 248.4, "text": " And before, like + you suggest any solution because it takes a lot of time.", "tokens": [51588, 400, + 949, 11, 411, 291, 3402, 604, 3827, 570, 309, 2516, 257, 688, 295, 565, 13, 51828], + "temperature": 0.0, "avg_logprob": -0.2259512930205374, "compression_ratio": 1.834862385321101, + "no_speech_prob": 0.08031974732875824}, {"id": 74, "seek": 24840, "start": 248.4, + "end": 250.96, "text": " Do you feel the same way when you talk to them?", "tokens": + [50364, 1144, 291, 841, 264, 912, 636, 562, 291, 751, 281, 552, 30, 50492], "temperature": + 0.0, "avg_logprob": -0.20458404927314083, "compression_ratio": 1.7708978328173375, + "no_speech_prob": 0.004957969766110182}, {"id": 75, "seek": 24840, "start": 250.96, + "end": 255.36, "text": " Like instead of kind of jumping in and like solving, you + kind of try to figure it out.", "tokens": [50492, 1743, 2602, 295, 733, 295, 11233, + 294, 293, 411, 12606, 11, 291, 733, 295, 853, 281, 2573, 309, 484, 13, 50712], "temperature": + 0.0, "avg_logprob": -0.20458404927314083, "compression_ratio": 1.7708978328173375, + "no_speech_prob": 0.004957969766110182}, {"id": 76, "seek": 24840, "start": 255.36, + "end": 256.56, "text": " Or you do something else.", "tokens": [50712, 1610, 291, + 360, 746, 1646, 13, 50772], "temperature": 0.0, "avg_logprob": -0.20458404927314083, + 
"compression_ratio": 1.7708978328173375, "no_speech_prob": 0.004957969766110182}, + {"id": 77, "seek": 24840, "start": 256.56, "end": 258.8, "text": " You do you have + a different approach.", "tokens": [50772, 509, 360, 291, 362, 257, 819, 3109, 13, + 50884], "temperature": 0.0, "avg_logprob": -0.20458404927314083, "compression_ratio": + 1.7708978328173375, "no_speech_prob": 0.004957969766110182}, {"id": 78, "seek": + 24840, "start": 258.8, "end": 262.96, "text": " So I think I go the way of like, + yeah, kind of get all the info first because everyone''s", "tokens": [50884, 407, + 286, 519, 286, 352, 264, 636, 295, 411, 11, 1338, 11, 733, 295, 483, 439, 264, 13614, + 700, 570, 1518, 311, 51092], "temperature": 0.0, "avg_logprob": -0.20458404927314083, + "compression_ratio": 1.7708978328173375, "no_speech_prob": 0.004957969766110182}, + {"id": 79, "seek": 24840, "start": 262.96, "end": 263.96, "text": " use cases different.", + "tokens": [51092, 764, 3331, 819, 13, 51142], "temperature": 0.0, "avg_logprob": + -0.20458404927314083, "compression_ratio": 1.7708978328173375, "no_speech_prob": + 0.004957969766110182}, {"id": 80, "seek": 24840, "start": 263.96, "end": 266.28000000000003, + "text": " None of them are ever the same.", "tokens": [51142, 14492, 295, 552, 366, + 1562, 264, 912, 13, 51258], "temperature": 0.0, "avg_logprob": -0.20458404927314083, + "compression_ratio": 1.7708978328173375, "no_speech_prob": 0.004957969766110182}, + {"id": 81, "seek": 24840, "start": 266.28000000000003, "end": 269.24, "text": " + And then kind of come back to the team, kind of discuss it.", "tokens": [51258, + 400, 550, 733, 295, 808, 646, 281, 264, 1469, 11, 733, 295, 2248, 309, 13, 51406], + "temperature": 0.0, "avg_logprob": -0.20458404927314083, "compression_ratio": 1.7708978328173375, + "no_speech_prob": 0.004957969766110182}, {"id": 82, "seek": 24840, "start": 269.24, + "end": 272.84000000000003, "text": " See, do we have any else that''s been doing + it sort of 
similar?", "tokens": [51406, 3008, 11, 360, 321, 362, 604, 1646, 300, + 311, 668, 884, 309, 1333, 295, 2531, 30, 51586], "temperature": 0.0, "avg_logprob": + -0.20458404927314083, "compression_ratio": 1.7708978328173375, "no_speech_prob": + 0.004957969766110182}, {"id": 83, "seek": 24840, "start": 272.84000000000003, "end": + 273.84000000000003, "text": " Do we have a solution?", "tokens": [51586, 1144, 321, + 362, 257, 3827, 30, 51636], "temperature": 0.0, "avg_logprob": -0.20458404927314083, + "compression_ratio": 1.7708978328173375, "no_speech_prob": 0.004957969766110182}, + {"id": 84, "seek": 24840, "start": 273.84000000000003, "end": 277.52, "text": " + Because sometimes we''ve had to do hacky solutions with our like our previous version.", + "tokens": [51636, 1436, 2171, 321, 600, 632, 281, 360, 10339, 88, 6547, 365, 527, + 411, 527, 3894, 3037, 13, 51820], "temperature": 0.0, "avg_logprob": -0.20458404927314083, + "compression_ratio": 1.7708978328173375, "no_speech_prob": 0.004957969766110182}, + {"id": 85, "seek": 27752, "start": 277.52, "end": 281.96, "text": " There were things + that just weren''t up to par and you couldn''t really change it that much", "tokens": + [50364, 821, 645, 721, 300, 445, 4999, 380, 493, 281, 971, 293, 291, 2809, 380, + 534, 1319, 309, 300, 709, 50586], "temperature": 0.0, "avg_logprob": -0.22230359670278188, + "compression_ratio": 1.7702265372168284, "no_speech_prob": 0.0025041503831744194}, + {"id": 86, "seek": 27752, "start": 281.96, "end": 285.2, "text": " because it was + like kind of on an old style of doing it and it didn''t work.", "tokens": [50586, + 570, 309, 390, 411, 733, 295, 322, 364, 1331, 3758, 295, 884, 309, 293, 309, 994, + 380, 589, 13, 50748], "temperature": 0.0, "avg_logprob": -0.22230359670278188, "compression_ratio": + 1.7702265372168284, "no_speech_prob": 0.0025041503831744194}, {"id": 87, "seek": + 27752, "start": 285.2, "end": 287.96, "text": " So you kind of do some hacky solution + for them.", 
"tokens": [50748, 407, 291, 733, 295, 360, 512, 10339, 88, 3827, 337, + 552, 13, 50886], "temperature": 0.0, "avg_logprob": -0.22230359670278188, "compression_ratio": + 1.7702265372168284, "no_speech_prob": 0.0025041503831744194}, {"id": 88, "seek": + 27752, "start": 287.96, "end": 290.64, "text": " Some way to trick it into working + it out work in production.", "tokens": [50886, 2188, 636, 281, 4282, 309, 666, 1364, + 309, 484, 589, 294, 4265, 13, 51020], "temperature": 0.0, "avg_logprob": -0.22230359670278188, + "compression_ratio": 1.7702265372168284, "no_speech_prob": 0.0025041503831744194}, + {"id": 89, "seek": 27752, "start": 290.64, "end": 295.96, "text": " But then we + take notes of that and kind of later on put it into like, so like when we''re", + "tokens": [51020, 583, 550, 321, 747, 5570, 295, 300, 293, 733, 295, 1780, 322, + 829, 309, 666, 411, 11, 370, 411, 562, 321, 434, 51286], "temperature": 0.0, "avg_logprob": + -0.22230359670278188, "compression_ratio": 1.7702265372168284, "no_speech_prob": + 0.0025041503831744194}, {"id": 90, "seek": 27752, "start": 295.96, "end": 297.79999999999995, + "text": " going to new version, okay, we got to think about this.", "tokens": [51286, + 516, 281, 777, 3037, 11, 1392, 11, 321, 658, 281, 519, 466, 341, 13, 51378], "temperature": + 0.0, "avg_logprob": -0.22230359670278188, "compression_ratio": 1.7702265372168284, + "no_speech_prob": 0.0025041503831744194}, {"id": 91, "seek": 27752, "start": 297.79999999999995, + "end": 299.28, "text": " We got to improve on this.", "tokens": [51378, 492, 658, + 281, 3470, 322, 341, 13, 51452], "temperature": 0.0, "avg_logprob": -0.22230359670278188, + "compression_ratio": 1.7702265372168284, "no_speech_prob": 0.0025041503831744194}, + {"id": 92, "seek": 27752, "start": 299.28, "end": 302.68, "text": " And yeah, but + it all starts with kind of figuring out what they''re doing, getting the whole", + "tokens": [51452, 400, 1338, 11, 457, 309, 439, 3719, 365, 733, 295, 15213, 484, 
+ 437, 436, 434, 884, 11, 1242, 264, 1379, 51622], "temperature": 0.0, "avg_logprob": + -0.22230359670278188, "compression_ratio": 1.7702265372168284, "no_speech_prob": + 0.0025041503831744194}, {"id": 93, "seek": 27752, "start": 302.68, "end": 303.68, + "text": " picture.", "tokens": [51622, 3036, 13, 51672], "temperature": 0.0, "avg_logprob": + -0.22230359670278188, "compression_ratio": 1.7702265372168284, "no_speech_prob": + 0.0025041503831744194}, {"id": 94, "seek": 30368, "start": 303.68, "end": 307.36, + "text": " And sometimes like there''s always conversations where they can''t really + say everything", "tokens": [50364, 400, 2171, 411, 456, 311, 1009, 7315, 689, 436, + 393, 380, 534, 584, 1203, 50548], "temperature": 0.0, "avg_logprob": -0.22832530628551137, + "compression_ratio": 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, + {"id": 95, "seek": 30368, "start": 307.36, "end": 311.36, "text": " because a lot + of these places that are these companies are, there''s a reason that they''re", + "tokens": [50548, 570, 257, 688, 295, 613, 3190, 300, 366, 613, 3431, 366, 11, 456, + 311, 257, 1778, 300, 436, 434, 50748], "temperature": 0.0, "avg_logprob": -0.22832530628551137, + "compression_ratio": 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, + {"id": 96, "seek": 30368, "start": 311.36, "end": 314.32, "text": " like, they got + to be secretive because it''s a new field and they may have a really good", "tokens": + [50748, 411, 11, 436, 658, 281, 312, 4054, 488, 570, 309, 311, 257, 777, 2519, 293, + 436, 815, 362, 257, 534, 665, 50896], "temperature": 0.0, "avg_logprob": -0.22832530628551137, + "compression_ratio": 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, + {"id": 97, "seek": 30368, "start": 314.32, "end": 315.32, "text": " new idea.", + "tokens": [50896, 777, 1558, 13, 50946], "temperature": 0.0, "avg_logprob": -0.22832530628551137, + "compression_ratio": 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, + 
{"id": 98, "seek": 30368, "start": 315.32, "end": 319.28000000000003, "text": " + But it''s kind of like extracting as much as we can without like crossing those + bounds,", "tokens": [50946, 583, 309, 311, 733, 295, 411, 49844, 382, 709, 382, + 321, 393, 1553, 411, 14712, 729, 29905, 11, 51144], "temperature": 0.0, "avg_logprob": + -0.22832530628551137, "compression_ratio": 1.7976539589442815, "no_speech_prob": + 0.2567178010940552}, {"id": 99, "seek": 30368, "start": 319.28000000000003, "end": + 322.4, "text": " getting that in for the not a lot of tell and seeing if it can + help them with what we", "tokens": [51144, 1242, 300, 294, 337, 264, 406, 257, 688, + 295, 980, 293, 2577, 498, 309, 393, 854, 552, 365, 437, 321, 51300], "temperature": + 0.0, "avg_logprob": -0.22832530628551137, "compression_ratio": 1.7976539589442815, + "no_speech_prob": 0.2567178010940552}, {"id": 100, "seek": 30368, "start": 322.4, + "end": 323.4, "text": " got.", "tokens": [51300, 658, 13, 51350], "temperature": + 0.0, "avg_logprob": -0.22832530628551137, "compression_ratio": 1.7976539589442815, + "no_speech_prob": 0.2567178010940552}, {"id": 101, "seek": 30368, "start": 323.4, + "end": 325.28000000000003, "text": " But it''s like a big team effort.", "tokens": + [51350, 583, 309, 311, 411, 257, 955, 1469, 4630, 13, 51444], "temperature": 0.0, + "avg_logprob": -0.22832530628551137, "compression_ratio": 1.7976539589442815, "no_speech_prob": + 0.2567178010940552}, {"id": 102, "seek": 30368, "start": 325.28000000000003, "end": + 326.28000000000003, "text": " It''s not just me.", "tokens": [51444, 467, 311, 406, + 445, 385, 13, 51494], "temperature": 0.0, "avg_logprob": -0.22832530628551137, "compression_ratio": + 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, {"id": 103, "seek": 30368, + "start": 326.28000000000003, "end": 329.84000000000003, "text": " So I talked to + them, bring it back and then we all work together to solve it.", "tokens": [51494, + 407, 286, 2825, 281, 
552, 11, 1565, 309, 646, 293, 550, 321, 439, 589, 1214, 281, + 5039, 309, 13, 51672], "temperature": 0.0, "avg_logprob": -0.22832530628551137, + "compression_ratio": 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, + {"id": 104, "seek": 30368, "start": 329.84000000000003, "end": 332.28000000000003, + "text": " Yeah, yeah, for sure, for sure.", "tokens": [51672, 865, 11, 1338, 11, + 337, 988, 11, 337, 988, 13, 51794], "temperature": 0.0, "avg_logprob": -0.22832530628551137, + "compression_ratio": 1.7976539589442815, "no_speech_prob": 0.2567178010940552}, + {"id": 105, "seek": 33228, "start": 332.28, "end": 337.55999999999995, "text": " + And I, your users kind of aware of that you are helping them with vector search.", + "tokens": [50364, 400, 286, 11, 428, 5022, 733, 295, 3650, 295, 300, 291, 366, 4315, + 552, 365, 8062, 3164, 13, 50628], "temperature": 0.0, "avg_logprob": -0.19224947493597372, + "compression_ratio": 1.901360544217687, "no_speech_prob": 0.009422517381608486}, + {"id": 106, "seek": 33228, "start": 337.55999999999995, "end": 340.03999999999996, + "text": " Do they even care?", "tokens": [50628, 1144, 436, 754, 1127, 30, 50752], + "temperature": 0.0, "avg_logprob": -0.19224947493597372, "compression_ratio": 1.901360544217687, + "no_speech_prob": 0.009422517381608486}, {"id": 107, "seek": 33228, "start": 340.03999999999996, + "end": 345.44, "text": " So it''s fifth, there may be not fifth, like I would say + 70% of the people we talk to are", "tokens": [50752, 407, 309, 311, 9266, 11, 456, + 815, 312, 406, 9266, 11, 411, 286, 576, 584, 5285, 4, 295, 264, 561, 321, 751, 281, + 366, 51022], "temperature": 0.0, "avg_logprob": -0.19224947493597372, "compression_ratio": + 1.901360544217687, "no_speech_prob": 0.009422517381608486}, {"id": 108, "seek": + 33228, "start": 345.44, "end": 346.44, "text": " aware from it.", "tokens": [51022, + 3650, 490, 309, 13, 51072], "temperature": 0.0, "avg_logprob": -0.19224947493597372, + "compression_ratio": 
1.901360544217687, "no_speech_prob": 0.009422517381608486}, + {"id": 109, "seek": 33228, "start": 346.44, "end": 347.84, "text": " They come to + us with help.", "tokens": [51072, 814, 808, 281, 505, 365, 854, 13, 51142], "temperature": + 0.0, "avg_logprob": -0.19224947493597372, "compression_ratio": 1.901360544217687, + "no_speech_prob": 0.009422517381608486}, {"id": 110, "seek": 33228, "start": 347.84, + "end": 351.03999999999996, "text": " So they know what they''re kind of getting + into and they know what they need vector search", "tokens": [51142, 407, 436, 458, + 437, 436, 434, 733, 295, 1242, 666, 293, 436, 458, 437, 436, 643, 8062, 3164, 51302], + "temperature": 0.0, "avg_logprob": -0.19224947493597372, "compression_ratio": 1.901360544217687, + "no_speech_prob": 0.009422517381608486}, {"id": 111, "seek": 33228, "start": 351.03999999999996, + "end": 354.11999999999995, "text": " for and they know like what they''re doing + and why they''re doing it.", "tokens": [51302, 337, 293, 436, 458, 411, 437, 436, + 434, 884, 293, 983, 436, 434, 884, 309, 13, 51456], "temperature": 0.0, "avg_logprob": + -0.19224947493597372, "compression_ratio": 1.901360544217687, "no_speech_prob": + 0.009422517381608486}, {"id": 112, "seek": 33228, "start": 354.11999999999995, "end": + 358.15999999999997, "text": " But sometimes we also get people that are, hey, like + I want to find similar images.", "tokens": [51456, 583, 2171, 321, 611, 483, 561, + 300, 366, 11, 4177, 11, 411, 286, 528, 281, 915, 2531, 5267, 13, 51658], "temperature": + 0.0, "avg_logprob": -0.19224947493597372, "compression_ratio": 1.901360544217687, + "no_speech_prob": 0.009422517381608486}, {"id": 113, "seek": 33228, "start": 358.15999999999997, + "end": 362.11999999999995, "text": " And there it''s like, we have like the simple + like tutorials that kind of deal with it,", "tokens": [51658, 400, 456, 309, 311, + 411, 11, 321, 362, 411, 264, 2199, 411, 17616, 300, 733, 295, 2028, 365, 309, 11, + 51856], 
"temperature": 0.0, "avg_logprob": -0.19224947493597372, "compression_ratio": + 1.901360544217687, "no_speech_prob": 0.009422517381608486}, {"id": 114, "seek": + 36212, "start": 362.12, "end": 363.32, "text": " but they want to know more about + it.", "tokens": [50364, 457, 436, 528, 281, 458, 544, 466, 309, 13, 50424], "temperature": + 0.0, "avg_logprob": -0.21715760532813735, "compression_ratio": 1.835820895522388, + "no_speech_prob": 0.002792703453451395}, {"id": 115, "seek": 36212, "start": 363.32, + "end": 366.36, "text": " So there is some explaining of what vector search is.", + "tokens": [50424, 407, 456, 307, 512, 13468, 295, 437, 8062, 3164, 307, 13, 50576], + "temperature": 0.0, "avg_logprob": -0.21715760532813735, "compression_ratio": 1.835820895522388, + "no_speech_prob": 0.002792703453451395}, {"id": 116, "seek": 36212, "start": 366.36, + "end": 370.68, "text": " What vectors are sometimes that some like it''s understandable, + like not everyone goes", "tokens": [50576, 708, 18875, 366, 2171, 300, 512, 411, + 309, 311, 25648, 11, 411, 406, 1518, 1709, 50792], "temperature": 0.0, "avg_logprob": + -0.21715760532813735, "compression_ratio": 1.835820895522388, "no_speech_prob": + 0.002792703453451395}, {"id": 117, "seek": 36212, "start": 370.68, "end": 374.12, + "text": " studies machine learning and knows what like vectors are math as well.", + "tokens": [50792, 5313, 3479, 2539, 293, 3255, 437, 411, 18875, 366, 5221, 382, + 731, 13, 50964], "temperature": 0.0, "avg_logprob": -0.21715760532813735, "compression_ratio": + 1.835820895522388, "no_speech_prob": 0.002792703453451395}, {"id": 118, "seek": + 36212, "start": 374.12, "end": 378.72, "text": " But I would say 70% of the time + they get it and they know what they''re getting into, but", "tokens": [50964, 583, + 286, 576, 584, 5285, 4, 295, 264, 565, 436, 483, 309, 293, 436, 458, 437, 436, 434, + 1242, 666, 11, 457, 51194], "temperature": 0.0, "avg_logprob": -0.21715760532813735, + 
"compression_ratio": 1.835820895522388, "no_speech_prob": 0.002792703453451395}, + {"id": 119, "seek": 36212, "start": 378.72, "end": 383.56, "text": " 30% of the + time it''s also just like a whole new world and they don''t really like we explain", + "tokens": [51194, 2217, 4, 295, 264, 565, 309, 311, 611, 445, 411, 257, 1379, 777, + 1002, 293, 436, 500, 380, 534, 411, 321, 2903, 51436], "temperature": 0.0, "avg_logprob": + -0.21715760532813735, "compression_ratio": 1.835820895522388, "no_speech_prob": + 0.002792703453451395}, {"id": 120, "seek": 36212, "start": 383.56, "end": 386.56, + "text": " it, but it''s not like a can''t explain it all in one day.", "tokens": + [51436, 309, 11, 457, 309, 311, 406, 411, 257, 393, 380, 2903, 309, 439, 294, 472, + 786, 13, 51586], "temperature": 0.0, "avg_logprob": -0.21715760532813735, "compression_ratio": + 1.835820895522388, "no_speech_prob": 0.002792703453451395}, {"id": 121, "seek": + 36212, "start": 386.56, "end": 388.44, "text": " There''s a lot of stuff that goes + behind it.", "tokens": [51586, 821, 311, 257, 688, 295, 1507, 300, 1709, 2261, 309, + 13, 51680], "temperature": 0.0, "avg_logprob": -0.21715760532813735, "compression_ratio": + 1.835820895522388, "no_speech_prob": 0.002792703453451395}, {"id": 122, "seek": + 36212, "start": 388.44, "end": 391.08, "text": " Sure you might touch on vectors + one day, but then you have to get into the algorithms", "tokens": [51680, 4894, + 291, 1062, 2557, 322, 18875, 472, 786, 11, 457, 550, 291, 362, 281, 483, 666, 264, + 14642, 51812], "temperature": 0.0, "avg_logprob": -0.21715760532813735, "compression_ratio": + 1.835820895522388, "no_speech_prob": 0.002792703453451395}, {"id": 123, "seek": + 39108, "start": 391.08, "end": 394.76, "text": " next day and then it''s sort of + like keeping that relationship and answering questions", "tokens": [50364, 958, + 786, 293, 550, 309, 311, 1333, 295, 411, 5145, 300, 2480, 293, 13430, 1651, 50548], + "temperature": 0.0, "avg_logprob": 
-0.24489309491902372, "compression_ratio": 1.8206896551724139, + "no_speech_prob": 0.012401336804032326}, {"id": 124, "seek": 39108, "start": 394.76, + "end": 396.28, "text": " whenever they come up.", "tokens": [50548, 5699, 436, 808, + 493, 13, 50624], "temperature": 0.0, "avg_logprob": -0.24489309491902372, "compression_ratio": + 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, {"id": 125, "seek": + 39108, "start": 396.28, "end": 398.15999999999997, "text": " Oh yeah, oh yeah, for + sure.", "tokens": [50624, 876, 1338, 11, 1954, 1338, 11, 337, 988, 13, 50718], "temperature": + 0.0, "avg_logprob": -0.24489309491902372, "compression_ratio": 1.8206896551724139, + "no_speech_prob": 0.012401336804032326}, {"id": 126, "seek": 39108, "start": 398.15999999999997, + "end": 402.08, "text": " And are you using like most of the time you''re using Milders, + right?", "tokens": [50718, 400, 366, 291, 1228, 411, 881, 295, 264, 565, 291, 434, + 1228, 376, 793, 433, 11, 558, 30, 50914], "temperature": 0.0, "avg_logprob": -0.24489309491902372, + "compression_ratio": 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, + {"id": 127, "seek": 39108, "start": 402.08, "end": 407.36, "text": " Like as part + of your user engagement or do you like how does it look like?", "tokens": [50914, + 1743, 382, 644, 295, 428, 4195, 8742, 420, 360, 291, 411, 577, 775, 309, 574, 411, + 30, 51178], "temperature": 0.0, "avg_logprob": -0.24489309491902372, "compression_ratio": + 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, {"id": 128, "seek": + 39108, "start": 407.36, "end": 411.76, "text": " So you bring the database and you + say you know it can solve a bunch of different use cases,", "tokens": [51178, 407, + 291, 1565, 264, 8149, 293, 291, 584, 291, 458, 309, 393, 5039, 257, 3840, 295, 819, + 764, 3331, 11, 51398], "temperature": 0.0, "avg_logprob": -0.24489309491902372, + "compression_ratio": 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, + {"id": 
129, "seek": 39108, "start": 411.76, "end": 416.47999999999996, "text": " + but you know we also need to vectorize your data or maybe they bring the vectors.", + "tokens": [51398, 457, 291, 458, 321, 611, 643, 281, 8062, 1125, 428, 1412, 420, + 1310, 436, 1565, 264, 18875, 13, 51634], "temperature": 0.0, "avg_logprob": -0.24489309491902372, + "compression_ratio": 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, + {"id": 130, "seek": 39108, "start": 416.47999999999996, "end": 417.47999999999996, + "text": " How does it look like?", "tokens": [51634, 1012, 775, 309, 574, 411, 30, + 51684], "temperature": 0.0, "avg_logprob": -0.24489309491902372, "compression_ratio": + 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, {"id": 131, "seek": + 39108, "start": 417.47999999999996, "end": 420.96, "text": " So they they usually + bring the vectors themselves.", "tokens": [51684, 407, 436, 436, 2673, 1565, 264, + 18875, 2969, 13, 51858], "temperature": 0.0, "avg_logprob": -0.24489309491902372, + "compression_ratio": 1.8206896551724139, "no_speech_prob": 0.012401336804032326}, + {"id": 132, "seek": 42096, "start": 420.96, "end": 424.12, "text": " We''re currently + going working on as well as we''re building something a little bit above", "tokens": + [50364, 492, 434, 4362, 516, 1364, 322, 382, 731, 382, 321, 434, 2390, 746, 257, + 707, 857, 3673, 50522], "temperature": 0.0, "avg_logprob": -0.25056501254913915, + "compression_ratio": 1.8419452887537995, "no_speech_prob": 0.0016592275351285934}, + {"id": 133, "seek": 42096, "start": 424.12, "end": 427.56, "text": " Milders for + actually getting the vectors, but that''s the little in the working progress", "tokens": + [50522, 376, 793, 433, 337, 767, 1242, 264, 18875, 11, 457, 300, 311, 264, 707, + 294, 264, 1364, 4205, 50694], "temperature": 0.0, "avg_logprob": -0.25056501254913915, + "compression_ratio": 1.8419452887537995, "no_speech_prob": 0.0016592275351285934}, + {"id": 134, "seek": 42096, "start": 
427.56, "end": 428.88, "text": " and it should + be releasing soon.", "tokens": [50694, 293, 309, 820, 312, 16327, 2321, 13, 50760], + "temperature": 0.0, "avg_logprob": -0.25056501254913915, "compression_ratio": 1.8419452887537995, + "no_speech_prob": 0.0016592275351285934}, {"id": 135, "seek": 42096, "start": 428.88, + "end": 434.76, "text": " But for now it''s always we have like our examples, we + have a bootcamp where we pull like", "tokens": [50760, 583, 337, 586, 309, 311, + 1009, 321, 362, 411, 527, 5110, 11, 321, 362, 257, 11450, 24640, 689, 321, 2235, + 411, 51054], "temperature": 0.0, "avg_logprob": -0.25056501254913915, "compression_ratio": + 1.8419452887537995, "no_speech_prob": 0.0016592275351285934}, {"id": 136, "seek": + 42096, "start": 434.76, "end": 436.28, "text": " the basic like the basic pipeline.", + "tokens": [51054, 264, 3875, 411, 264, 3875, 15517, 13, 51130], "temperature": 0.0, + "avg_logprob": -0.25056501254913915, "compression_ratio": 1.8419452887537995, "no_speech_prob": + 0.0016592275351285934}, {"id": 137, "seek": 42096, "start": 436.28, "end": 440.4, + "text": " It''s always around Milders, but for like images we have a resident 50 + and we kind of show", "tokens": [51130, 467, 311, 1009, 926, 376, 793, 433, 11, + 457, 337, 411, 5267, 321, 362, 257, 10832, 2625, 293, 321, 733, 295, 855, 51336], + "temperature": 0.0, "avg_logprob": -0.25056501254913915, "compression_ratio": 1.8419452887537995, + "no_speech_prob": 0.0016592275351285934}, {"id": 138, "seek": 42096, "start": 440.4, + "end": 442.35999999999996, "text": " them how it goes in that process.", "tokens": + [51336, 552, 577, 309, 1709, 294, 300, 1399, 13, 51434], "temperature": 0.0, "avg_logprob": + -0.25056501254913915, "compression_ratio": 1.8419452887537995, "no_speech_prob": + 0.0016592275351285934}, {"id": 139, "seek": 42096, "start": 442.35999999999996, + "end": 445.71999999999997, "text": " So that''s for the 30% we kind of go over like + one pipeline.", "tokens": 
[51434, 407, 300, 311, 337, 264, 2217, 4, 321, 733, 295, + 352, 670, 411, 472, 15517, 13, 51602], "temperature": 0.0, "avg_logprob": -0.25056501254913915, + "compression_ratio": 1.8419452887537995, "no_speech_prob": 0.0016592275351285934}, + {"id": 140, "seek": 42096, "start": 445.71999999999997, "end": 450.56, "text": " + It''s like a small file because it''s just three steps encode it and bet it and + then search", "tokens": [51602, 467, 311, 411, 257, 1359, 3991, 570, 309, 311, 445, + 1045, 4439, 2058, 1429, 309, 293, 778, 309, 293, 550, 3164, 51844], "temperature": + 0.0, "avg_logprob": -0.25056501254913915, "compression_ratio": 1.8419452887537995, + "no_speech_prob": 0.0016592275351285934}, {"id": 141, "seek": 45056, "start": 450.56, + "end": 451.56, "text": " it.", "tokens": [50364, 309, 13, 50414], "temperature": + 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": 1.8166089965397925, + "no_speech_prob": 0.0007536117336712778}, {"id": 142, "seek": 45056, "start": 451.56, + "end": 457.68, "text": " But yeah, so most of the time they already have their embeddings + though with those bigger", "tokens": [50414, 583, 1338, 11, 370, 881, 295, 264, + 565, 436, 1217, 362, 641, 12240, 29432, 1673, 365, 729, 3801, 50720], "temperature": + 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": 1.8166089965397925, + "no_speech_prob": 0.0007536117336712778}, {"id": 143, "seek": 45056, "start": 457.68, + "end": 462.52, "text": " companies who knows who know what vector search is, who + knows what they''re getting into.", "tokens": [50720, 3431, 567, 3255, 567, 458, + 437, 8062, 3164, 307, 11, 567, 3255, 437, 436, 434, 1242, 666, 13, 50962], "temperature": + 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": 1.8166089965397925, + "no_speech_prob": 0.0007536117336712778}, {"id": 144, "seek": 45056, "start": 462.52, + "end": 466.48, "text": " They already have their 512 dimensions 10 million vectors.", + "tokens": [50962, 814, 1217, 362, 641, 
1025, 4762, 12819, 1266, 2459, 18875, 13, + 51160], "temperature": 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": + 1.8166089965397925, "no_speech_prob": 0.0007536117336712778}, {"id": 145, "seek": + 45056, "start": 466.48, "end": 467.56, "text": " They know what they''re getting + into.", "tokens": [51160, 814, 458, 437, 436, 434, 1242, 666, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": 1.8166089965397925, + "no_speech_prob": 0.0007536117336712778}, {"id": 146, "seek": 45056, "start": 467.56, + "end": 470.8, "text": " So they just want to see okay how many how fast can you + do it?", "tokens": [51214, 407, 436, 445, 528, 281, 536, 1392, 577, 867, 577, 2370, + 393, 291, 360, 309, 30, 51376], "temperature": 0.0, "avg_logprob": -0.19409389772276947, + "compression_ratio": 1.8166089965397925, "no_speech_prob": 0.0007536117336712778}, + {"id": 147, "seek": 45056, "start": 470.8, "end": 471.8, "text": " What are the + bottlenecks?", "tokens": [51376, 708, 366, 264, 44641, 2761, 30, 51426], "temperature": + 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": 1.8166089965397925, + "no_speech_prob": 0.0007536117336712778}, {"id": 148, "seek": 45056, "start": 471.8, + "end": 474.84000000000003, "text": " Where can we like what do we need to scale + out if we''re going to scale?", "tokens": [51426, 2305, 393, 321, 411, 437, 360, + 321, 643, 281, 4373, 484, 498, 321, 434, 516, 281, 4373, 30, 51578], "temperature": + 0.0, "avg_logprob": -0.19409389772276947, "compression_ratio": 1.8166089965397925, + "no_speech_prob": 0.0007536117336712778}, {"id": 149, "seek": 45056, "start": 474.84000000000003, + "end": 480.44, "text": " So it''s like again 70 30 the 30% usually you need to go + over the actual embeddings as", "tokens": [51578, 407, 309, 311, 411, 797, 5285, + 2217, 264, 2217, 4, 2673, 291, 643, 281, 352, 670, 264, 3539, 12240, 29432, 382, + 51858], "temperature": 0.0, "avg_logprob": 
-0.19409389772276947, "compression_ratio": + 1.8166089965397925, "no_speech_prob": 0.0007536117336712778}, {"id": 150, "seek": + 48044, "start": 480.44, "end": 481.44, "text": " well.", "tokens": [50364, 731, + 13, 50414], "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, {"id": 151, "seek": + 48044, "start": 481.44, "end": 484.44, "text": " So just like a quick neural that + lesson.", "tokens": [50414, 407, 445, 411, 257, 1702, 18161, 300, 6898, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.00425342470407486}, {"id": 152, "seek": 48044, "start": 484.44, + "end": 485.44, "text": " Yeah, yeah.", "tokens": [50564, 865, 11, 1338, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.00425342470407486}, {"id": 153, "seek": 48044, "start": 485.44, + "end": 492.0, "text": " Or maybe like it''s in their culture in the company to kind + of like dive deeper into what", "tokens": [50614, 1610, 1310, 411, 309, 311, 294, + 641, 3713, 294, 264, 2237, 281, 733, 295, 411, 9192, 7731, 666, 437, 50942], "temperature": + 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.00425342470407486}, {"id": 154, "seek": 48044, "start": 492.0, + "end": 494.2, "text": " they are doing, right?", "tokens": [50942, 436, 366, 884, + 11, 558, 30, 51052], "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, {"id": 155, "seek": + 48044, "start": 494.2, "end": 495.2, "text": " Yeah.", "tokens": [51052, 865, 13, + 51102], "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, {"id": 156, "seek": + 48044, "start": 495.2, "end": 499.76, 
"text": " And maybe they think that they can + kind of take it over and then kind of run with it as", "tokens": [51102, 400, 1310, + 436, 519, 300, 436, 393, 733, 295, 747, 309, 670, 293, 550, 733, 295, 1190, 365, + 309, 382, 51330], "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, {"id": 157, "seek": + 48044, "start": 499.76, "end": 501.0, "text": " they learned, right?", "tokens": + [51330, 436, 3264, 11, 558, 30, 51392], "temperature": 0.0, "avg_logprob": -0.2696099009940295, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, + {"id": 158, "seek": 48044, "start": 501.0, "end": 505.88, "text": " But then you + said 70% are kind of like, you know, here is my problem.", "tokens": [51392, 583, + 550, 291, 848, 5285, 4, 366, 733, 295, 411, 11, 291, 458, 11, 510, 307, 452, 1154, + 13, 51636], "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, {"id": 159, "seek": + 48044, "start": 505.88, "end": 506.88, "text": " Can you solve it?", "tokens": [51636, + 1664, 291, 5039, 309, 30, 51686], "temperature": 0.0, "avg_logprob": -0.2696099009940295, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, + {"id": 160, "seek": 48044, "start": 506.88, "end": 508.96, "text": " Is that exactly + like does this work?", "tokens": [51686, 1119, 300, 2293, 411, 775, 341, 589, 30, + 51790], "temperature": 0.0, "avg_logprob": -0.2696099009940295, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.00425342470407486}, {"id": 161, "seek": + 50896, "start": 509.52, "end": 511.15999999999997, "text": " Can we have a handle + this?", "tokens": [50392, 1664, 321, 362, 257, 4813, 341, 30, 50474], "temperature": + 0.0, "avg_logprob": -0.28609051023210796, "compression_ratio": 1.6926229508196722, + "no_speech_prob": 0.01748480461537838}, {"id": 162, 
"seek": 50896, "start": 511.15999999999997, + "end": 516.1999999999999, "text": " And that''s how usually like 70% again, but + they have their own neural and that''s sometimes", "tokens": [50474, 400, 300, 311, + 577, 2673, 411, 5285, 4, 797, 11, 457, 436, 362, 641, 1065, 18161, 293, 300, 311, + 2171, 50726], "temperature": 0.0, "avg_logprob": -0.28609051023210796, "compression_ratio": + 1.6926229508196722, "no_speech_prob": 0.01748480461537838}, {"id": 163, "seek": + 50896, "start": 516.1999999999999, "end": 520.88, "text": " they don''t want to + tell us like what neural nets are using or how exactly their data is.", "tokens": + [50726, 436, 500, 380, 528, 281, 980, 505, 411, 437, 18161, 36170, 366, 1228, 420, + 577, 2293, 641, 1412, 307, 13, 50960], "temperature": 0.0, "avg_logprob": -0.28609051023210796, + "compression_ratio": 1.6926229508196722, "no_speech_prob": 0.01748480461537838}, + {"id": 164, "seek": 50896, "start": 520.88, "end": 526.52, "text": " But they give + us okay, this many dimensions, this many vectors and this many read request,", "tokens": + [50960, 583, 436, 976, 505, 1392, 11, 341, 867, 12819, 11, 341, 867, 18875, 293, + 341, 867, 1401, 5308, 11, 51242], "temperature": 0.0, "avg_logprob": -0.28609051023210796, + "compression_ratio": 1.6926229508196722, "no_speech_prob": 0.01748480461537838}, + {"id": 165, "seek": 50896, "start": 526.52, "end": 528.76, "text": " write request, + will it work?", "tokens": [51242, 2464, 5308, 11, 486, 309, 589, 30, 51354], "temperature": + 0.0, "avg_logprob": -0.28609051023210796, "compression_ratio": 1.6926229508196722, + "no_speech_prob": 0.01748480461537838}, {"id": 166, "seek": 50896, "start": 528.76, + "end": 531.96, "text": " And then we kind of go from there and see if we can solve + it.", "tokens": [51354, 400, 550, 321, 733, 295, 352, 490, 456, 293, 536, 498, 321, + 393, 5039, 309, 13, 51514], "temperature": 0.0, "avg_logprob": -0.28609051023210796, + "compression_ratio": 1.6926229508196722, 
"no_speech_prob": 0.01748480461537838}, + {"id": 167, "seek": 50896, "start": 531.96, "end": 534.56, "text": " Yeah, yeah, + sounds cool.", "tokens": [51514, 865, 11, 1338, 11, 3263, 1627, 13, 51644], "temperature": + 0.0, "avg_logprob": -0.28609051023210796, "compression_ratio": 1.6926229508196722, + "no_speech_prob": 0.01748480461537838}, {"id": 168, "seek": 53456, "start": 534.56, + "end": 539.3599999999999, "text": " But can you actually tell me what is vector + search and what is Mildos?", "tokens": [50364, 583, 393, 291, 767, 980, 385, 437, + 307, 8062, 3164, 293, 437, 307, 376, 793, 329, 30, 50604], "temperature": 0.0, "avg_logprob": + -0.35632816542927015, "compression_ratio": 1.7244094488188977, "no_speech_prob": + 0.03264587000012398}, {"id": 169, "seek": 53456, "start": 539.3599999999999, "end": + 541.04, "text": " Okay, sure.", "tokens": [50604, 1033, 11, 988, 13, 50688], "temperature": + 0.0, "avg_logprob": -0.35632816542927015, "compression_ratio": 1.7244094488188977, + "no_speech_prob": 0.03264587000012398}, {"id": 170, "seek": 53456, "start": 541.04, + "end": 543.8, "text": " So yeah, vector search.", "tokens": [50688, 407, 1338, 11, + 8062, 3164, 13, 50826], "temperature": 0.0, "avg_logprob": -0.35632816542927015, + "compression_ratio": 1.7244094488188977, "no_speech_prob": 0.03264587000012398}, + {"id": 171, "seek": 53456, "start": 543.8, "end": 548.16, "text": " Pretty much + a way to search vectors and check over vectors as well.", "tokens": [50826, 10693, + 709, 257, 636, 281, 3164, 18875, 293, 1520, 670, 18875, 382, 731, 13, 51044], "temperature": + 0.0, "avg_logprob": -0.35632816542927015, "compression_ratio": 1.7244094488188977, + "no_speech_prob": 0.03264587000012398}, {"id": 172, "seek": 53456, "start": 548.16, + "end": 549.8399999999999, "text": " Yeah, I''ll go over.", "tokens": [51044, 865, + 11, 286, 603, 352, 670, 13, 51128], "temperature": 0.0, "avg_logprob": -0.35632816542927015, + "compression_ratio": 1.7244094488188977, 
"no_speech_prob": 0.03264587000012398}, + {"id": 173, "seek": 53456, "start": 549.8399999999999, "end": 553.3599999999999, + "text": " So numbers and vectors, you have numbers easily comparable.", "tokens": + [51128, 407, 3547, 293, 18875, 11, 291, 362, 3547, 3612, 25323, 13, 51304], "temperature": + 0.0, "avg_logprob": -0.35632816542927015, "compression_ratio": 1.7244094488188977, + "no_speech_prob": 0.03264587000012398}, {"id": 174, "seek": 53456, "start": 553.3599999999999, + "end": 555.04, "text": " You can store them in relational databases.", "tokens": + [51304, 509, 393, 3531, 552, 294, 38444, 22380, 13, 51388], "temperature": 0.0, + "avg_logprob": -0.35632816542927015, "compression_ratio": 1.7244094488188977, "no_speech_prob": + 0.03264587000012398}, {"id": 175, "seek": 53456, "start": 555.04, "end": 558.56, + "text": " Yeah, like the greater than equal to less than.", "tokens": [51388, 865, + 11, 411, 264, 5044, 813, 2681, 281, 1570, 813, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.35632816542927015, "compression_ratio": 1.7244094488188977, "no_speech_prob": + 0.03264587000012398}, {"id": 176, "seek": 53456, "start": 558.56, "end": 563.0, + "text": " So to actually index these and search quickly through them, you can do + things like Beatriz.", "tokens": [51564, 407, 281, 767, 8186, 613, 293, 3164, 2661, + 807, 552, 11, 291, 393, 360, 721, 411, 16031, 24959, 13, 51786], "temperature": + 0.0, "avg_logprob": -0.35632816542927015, "compression_ratio": 1.7244094488188977, + "no_speech_prob": 0.03264587000012398}, {"id": 177, "seek": 56300, "start": 563.48, + "end": 567.68, "text": " This is a very efficient and very fast way of searching + for a value.", "tokens": [50388, 639, 307, 257, 588, 7148, 293, 588, 2370, 636, + 295, 10808, 337, 257, 2158, 13, 50598], "temperature": 0.0, "avg_logprob": -0.17748535040653113, + "compression_ratio": 1.7242524916943522, "no_speech_prob": 0.002639733487740159}, + {"id": 178, "seek": 56300, "start": 567.68, "end": 
573.56, "text": " Vectors on + the other hand, you don''t really have this kind of like comparison,", "tokens": + [50598, 691, 557, 830, 322, 264, 661, 1011, 11, 291, 500, 380, 534, 362, 341, 733, + 295, 411, 9660, 11, 50892], "temperature": 0.0, "avg_logprob": -0.17748535040653113, + "compression_ratio": 1.7242524916943522, "no_speech_prob": 0.002639733487740159}, + {"id": 179, "seek": 56300, "start": 573.56, "end": 578.28, "text": " direct comparison, + you have similarity metrics, which is a math equation where you kind", "tokens": + [50892, 2047, 9660, 11, 291, 362, 32194, 16367, 11, 597, 307, 257, 5221, 5367, 689, + 291, 733, 51128], "temperature": 0.0, "avg_logprob": -0.17748535040653113, "compression_ratio": + 1.7242524916943522, "no_speech_prob": 0.002639733487740159}, {"id": 180, "seek": + 56300, "start": 578.28, "end": 580.64, "text": " of find out how far they diverge.", + "tokens": [51128, 295, 915, 484, 577, 1400, 436, 18558, 432, 13, 51246], "temperature": + 0.0, "avg_logprob": -0.17748535040653113, "compression_ratio": 1.7242524916943522, + "no_speech_prob": 0.002639733487740159}, {"id": 181, "seek": 56300, "start": 580.64, + "end": 583.24, "text": " But it doesn''t tell you, okay, this element is diverging + this much.", "tokens": [51246, 583, 309, 1177, 380, 980, 291, 11, 1392, 11, 341, + 4478, 307, 18558, 3249, 341, 709, 13, 51376], "temperature": 0.0, "avg_logprob": + -0.17748535040653113, "compression_ratio": 1.7242524916943522, "no_speech_prob": + 0.002639733487740159}, {"id": 182, "seek": 56300, "start": 583.24, "end": 586.84, + "text": " It''s like a lump sum for every value in the vector combined.", "tokens": + [51376, 467, 311, 411, 257, 25551, 2408, 337, 633, 2158, 294, 264, 8062, 9354, 13, + 51556], "temperature": 0.0, "avg_logprob": -0.17748535040653113, "compression_ratio": + 1.7242524916943522, "no_speech_prob": 0.002639733487740159}, {"id": 183, "seek": + 56300, "start": 586.84, "end": 589.04, "text": " This is how different the 
entire + thing is.", "tokens": [51556, 639, 307, 577, 819, 264, 2302, 551, 307, 13, 51666], + "temperature": 0.0, "avg_logprob": -0.17748535040653113, "compression_ratio": 1.7242524916943522, + "no_speech_prob": 0.002639733487740159}, {"id": 184, "seek": 56300, "start": 589.04, + "end": 592.88, "text": " And that makes indexing a little bit more difficult because + you kind of start", "tokens": [51666, 400, 300, 1669, 8186, 278, 257, 707, 857, + 544, 2252, 570, 291, 733, 295, 722, 51858], "temperature": 0.0, "avg_logprob": -0.17748535040653113, + "compression_ratio": 1.7242524916943522, "no_speech_prob": 0.002639733487740159}, + {"id": 185, "seek": 59288, "start": 592.88, "end": 596.68, "text": " relying on + more approximate algorithms.", "tokens": [50364, 24140, 322, 544, 30874, 14642, + 13, 50554], "temperature": 0.0, "avg_logprob": -0.21422190232710406, "compression_ratio": + 1.825910931174089, "no_speech_prob": 0.00040544933290220797}, {"id": 186, "seek": + 59288, "start": 596.68, "end": 600.64, "text": " So this is what approximate nearest + neighbor search, which is pretty much all of vector search.", "tokens": [50554, + 407, 341, 307, 437, 30874, 23831, 5987, 3164, 11, 597, 307, 1238, 709, 439, 295, + 8062, 3164, 13, 50752], "temperature": 0.0, "avg_logprob": -0.21422190232710406, + "compression_ratio": 1.825910931174089, "no_speech_prob": 0.00040544933290220797}, + {"id": 187, "seek": 59288, "start": 600.64, "end": 604.4, "text": " It''s this library + of algorithms.", "tokens": [50752, 467, 311, 341, 6405, 295, 14642, 13, 50940], + "temperature": 0.0, "avg_logprob": -0.21422190232710406, "compression_ratio": 1.825910931174089, + "no_speech_prob": 0.00040544933290220797}, {"id": 188, "seek": 59288, "start": 604.4, + "end": 608.56, "text": " And there you can do clustering and then you can do graph + based, tree based.", "tokens": [50940, 400, 456, 291, 393, 360, 596, 48673, 293, + 550, 291, 393, 360, 4295, 2361, 11, 4230, 2361, 13, 51148], "temperature": 
0.0, + "avg_logprob": -0.21422190232710406, "compression_ratio": 1.825910931174089, "no_speech_prob": + 0.00040544933290220797}, {"id": 189, "seek": 59288, "start": 608.56, "end": 613.8, + "text": " So the big, the big names are right now for inverted file we have face.", + "tokens": [51148, 407, 264, 955, 11, 264, 955, 5288, 366, 558, 586, 337, 38969, + 3991, 321, 362, 1851, 13, 51410], "temperature": 0.0, "avg_logprob": -0.21422190232710406, + "compression_ratio": 1.825910931174089, "no_speech_prob": 0.00040544933290220797}, + {"id": 190, "seek": 59288, "start": 613.8, "end": 617.76, "text": " That''s the + library for its clustering based on centroids.", "tokens": [51410, 663, 311, 264, + 6405, 337, 1080, 596, 48673, 2361, 322, 24607, 3742, 13, 51608], "temperature": + 0.0, "avg_logprob": -0.21422190232710406, "compression_ratio": 1.825910931174089, + "no_speech_prob": 0.00040544933290220797}, {"id": 191, "seek": 59288, "start": 617.76, + "end": 622.0, "text": " And then you store values in the inverted file and you search + through that.", "tokens": [51608, 400, 550, 291, 3531, 4190, 294, 264, 38969, 3991, + 293, 291, 3164, 807, 300, 13, 51820], "temperature": 0.0, "avg_logprob": -0.21422190232710406, + "compression_ratio": 1.825910931174089, "no_speech_prob": 0.00040544933290220797}, + {"id": 192, "seek": 62200, "start": 622.0, "end": 627.6, "text": " There''s tree + based, which is spotifies annoy what they''re using for their music recommendations.", + "tokens": [50364, 821, 311, 4230, 2361, 11, 597, 307, 4008, 11221, 8759, 437, 436, + 434, 1228, 337, 641, 1318, 10434, 13, 50644], "temperature": 0.0, "avg_logprob": + -0.22516836438860213, "compression_ratio": 1.7220216606498195, "no_speech_prob": + 0.009592418558895588}, {"id": 193, "seek": 62200, "start": 627.6, "end": 633.44, + "text": " And that''s just building trees and splitting all your data by hyperplanes + and then going", "tokens": [50644, 400, 300, 311, 445, 2390, 5852, 293, 30348, 439, + 428, 
1412, 538, 9848, 564, 12779, 293, 550, 516, 50936], "temperature": 0.0, "avg_logprob": + -0.22516836438860213, "compression_ratio": 1.7220216606498195, "no_speech_prob": + 0.009592418558895588}, {"id": 194, "seek": 62200, "start": 633.44, "end": 635.12, + "text": " left or right.", "tokens": [50936, 1411, 420, 558, 13, 51020], "temperature": + 0.0, "avg_logprob": -0.22516836438860213, "compression_ratio": 1.7220216606498195, + "no_speech_prob": 0.009592418558895588}, {"id": 195, "seek": 62200, "start": 635.12, + "end": 637.72, "text": " And then we have graph based, which is H and SW.", "tokens": + [51020, 400, 550, 321, 362, 4295, 2361, 11, 597, 307, 389, 293, 20346, 13, 51150], + "temperature": 0.0, "avg_logprob": -0.22516836438860213, "compression_ratio": 1.7220216606498195, + "no_speech_prob": 0.009592418558895588}, {"id": 196, "seek": 62200, "start": 637.72, + "end": 640.44, "text": " I think is the biggest one right now.", "tokens": [51150, + 286, 519, 307, 264, 3880, 472, 558, 586, 13, 51286], "temperature": 0.0, "avg_logprob": + -0.22516836438860213, "compression_ratio": 1.7220216606498195, "no_speech_prob": + 0.009592418558895588}, {"id": 197, "seek": 62200, "start": 640.44, "end": 646.56, + "text": " And they''re doing pretty much graph start with a very sparse, a very + like empty graph", "tokens": [51286, 400, 436, 434, 884, 1238, 709, 4295, 722, 365, + 257, 588, 637, 11668, 11, 257, 588, 411, 6707, 4295, 51592], "temperature": 0.0, + "avg_logprob": -0.22516836438860213, "compression_ratio": 1.7220216606498195, "no_speech_prob": + 0.009592418558895588}, {"id": 198, "seek": 62200, "start": 646.56, "end": 647.88, + "text": " on the top layer.", "tokens": [51592, 322, 264, 1192, 4583, 13, 51658], + "temperature": 0.0, "avg_logprob": -0.22516836438860213, "compression_ratio": 1.7220216606498195, + "no_speech_prob": 0.009592418558895588}, {"id": 199, "seek": 62200, "start": 647.88, + "end": 651.18, "text": " And then you find the closest to one point, let''s 
say, + and then you drop down a lower", "tokens": [51658, 400, 550, 291, 915, 264, 13699, + 281, 472, 935, 11, 718, 311, 584, 11, 293, 550, 291, 3270, 760, 257, 3126, 51823], + "temperature": 0.0, "avg_logprob": -0.22516836438860213, "compression_ratio": 1.7220216606498195, + "no_speech_prob": 0.009592418558895588}, {"id": 200, "seek": 65118, "start": 651.18, + "end": 654.02, "text": " where it gets more dense and you keep dropping and dropping.", + "tokens": [50364, 689, 309, 2170, 544, 18011, 293, 291, 1066, 13601, 293, 13601, + 13, 50506], "temperature": 0.0, "avg_logprob": -0.2337789386510849, "compression_ratio": + 1.924187725631769, "no_speech_prob": 0.002125912345945835}, {"id": 201, "seek": + 65118, "start": 654.02, "end": 660.4599999999999, "text": " And then there''s a + locality based sensitive hashing, which instead of with normal hash algorithms,", + "tokens": [50506, 400, 550, 456, 311, 257, 1628, 1860, 2361, 9477, 575, 571, 11, + 597, 2602, 295, 365, 2710, 22019, 14642, 11, 50828], "temperature": 0.0, "avg_logprob": + -0.2337789386510849, "compression_ratio": 1.924187725631769, "no_speech_prob": 0.002125912345945835}, + {"id": 202, "seek": 65118, "start": 660.4599999999999, "end": 665.66, "text": " + you avoid collisions with locality sensitive hashing, you try to get collisions.", + "tokens": [50828, 291, 5042, 46537, 365, 1628, 1860, 9477, 575, 571, 11, 291, 853, + 281, 483, 46537, 13, 51088], "temperature": 0.0, "avg_logprob": -0.2337789386510849, + "compression_ratio": 1.924187725631769, "no_speech_prob": 0.002125912345945835}, + {"id": 203, "seek": 65118, "start": 665.66, "end": 668.8599999999999, "text": " + If you get collisions, that means that they''re close together.", "tokens": [51088, + 759, 291, 483, 46537, 11, 300, 1355, 300, 436, 434, 1998, 1214, 13, 51248], "temperature": + 0.0, "avg_logprob": -0.2337789386510849, "compression_ratio": 1.924187725631769, + "no_speech_prob": 0.002125912345945835}, {"id": 204, "seek": 65118, "start": 
668.8599999999999, + "end": 672.9799999999999, "text": " And then one thing I kind of forgot to go over + is like the data types that this brings.", "tokens": [51248, 400, 550, 472, 551, + 286, 733, 295, 5298, 281, 352, 670, 307, 411, 264, 1412, 3467, 300, 341, 5607, 13, + 51454], "temperature": 0.0, "avg_logprob": -0.2337789386510849, "compression_ratio": + 1.924187725631769, "no_speech_prob": 0.002125912345945835}, {"id": 205, "seek": + 65118, "start": 672.9799999999999, "end": 678.18, "text": " So there is structured + data, which is those numbers strings, those things that can be easily", "tokens": + [51454, 407, 456, 307, 18519, 1412, 11, 597, 307, 729, 3547, 13985, 11, 729, 721, + 300, 393, 312, 3612, 51714], "temperature": 0.0, "avg_logprob": -0.2337789386510849, + "compression_ratio": 1.924187725631769, "no_speech_prob": 0.002125912345945835}, + {"id": 206, "seek": 65118, "start": 678.18, "end": 679.5, "text": " compared to.", + "tokens": [51714, 5347, 281, 13, 51780], "temperature": 0.0, "avg_logprob": -0.2337789386510849, + "compression_ratio": 1.924187725631769, "no_speech_prob": 0.002125912345945835}, + {"id": 207, "seek": 65118, "start": 679.5, "end": 681.14, "text": " And then there''s + unstructured data.", "tokens": [51780, 400, 550, 456, 311, 18799, 46847, 1412, 13, + 51862], "temperature": 0.0, "avg_logprob": -0.2337789386510849, "compression_ratio": + 1.924187725631769, "no_speech_prob": 0.002125912345945835}, {"id": 208, "seek": + 68114, "start": 681.18, "end": 687.62, "text": " And this is pretty much these images, + videos, medical data, some things that computers can''t", "tokens": [50366, 400, + 341, 307, 1238, 709, 613, 5267, 11, 2145, 11, 4625, 1412, 11, 512, 721, 300, 10807, + 393, 380, 50688], "temperature": 0.0, "avg_logprob": -0.20614183144491227, "compression_ratio": + 1.8514492753623188, "no_speech_prob": 6.004552778904326e-05}, {"id": 209, "seek": + 68114, "start": 687.62, "end": 689.1, "text": " easily understand.", "tokens": 
[50688, + 3612, 1223, 13, 50762], "temperature": 0.0, "avg_logprob": -0.20614183144491227, + "compression_ratio": 1.8514492753623188, "no_speech_prob": 6.004552778904326e-05}, + {"id": 210, "seek": 68114, "start": 689.1, "end": 692.42, "text": " And then with + unstructured data, you throw them through a neural net and you get those", "tokens": + [50762, 400, 550, 365, 18799, 46847, 1412, 11, 291, 3507, 552, 807, 257, 18161, + 2533, 293, 291, 483, 729, 50928], "temperature": 0.0, "avg_logprob": -0.20614183144491227, + "compression_ratio": 1.8514492753623188, "no_speech_prob": 6.004552778904326e-05}, + {"id": 211, "seek": 68114, "start": 692.42, "end": 694.74, "text": " vectors that + we previously talked about.", "tokens": [50928, 18875, 300, 321, 8046, 2825, 466, + 13, 51044], "temperature": 0.0, "avg_logprob": -0.20614183144491227, "compression_ratio": + 1.8514492753623188, "no_speech_prob": 6.004552778904326e-05}, {"id": 212, "seek": + 68114, "start": 694.74, "end": 699.78, "text": " And then relational data or with + a structured data, you can just take the data itself because", "tokens": [51044, + 400, 550, 38444, 1412, 420, 365, 257, 18519, 1412, 11, 291, 393, 445, 747, 264, + 1412, 2564, 570, 51296], "temperature": 0.0, "avg_logprob": -0.20614183144491227, + "compression_ratio": 1.8514492753623188, "no_speech_prob": 6.004552778904326e-05}, + {"id": 213, "seek": 68114, "start": 699.78, "end": 701.18, "text": " it can be easily + compared.", "tokens": [51296, 309, 393, 312, 3612, 5347, 13, 51366], "temperature": + 0.0, "avg_logprob": -0.20614183144491227, "compression_ratio": 1.8514492753623188, + "no_speech_prob": 6.004552778904326e-05}, {"id": 214, "seek": 68114, "start": 701.18, + "end": 702.38, "text": " It''s already known to a computer.", "tokens": [51366, + 467, 311, 1217, 2570, 281, 257, 3820, 13, 51426], "temperature": 0.0, "avg_logprob": + -0.20614183144491227, "compression_ratio": 1.8514492753623188, "no_speech_prob": + 6.004552778904326e-05}, 
{"id": 215, "seek": 68114, "start": 702.38, "end": 703.98, + "text": " It can understand them.", "tokens": [51426, 467, 393, 1223, 552, 13, 51506], + "temperature": 0.0, "avg_logprob": -0.20614183144491227, "compression_ratio": 1.8514492753623188, + "no_speech_prob": 6.004552778904326e-05}, {"id": 216, "seek": 68114, "start": 703.98, + "end": 709.38, "text": " And then there''s in between, which is semi structured, + semi structured is things like emails", "tokens": [51506, 400, 550, 456, 311, 294, + 1296, 11, 597, 307, 12909, 18519, 11, 12909, 18519, 307, 721, 411, 12524, 51776], + "temperature": 0.0, "avg_logprob": -0.20614183144491227, "compression_ratio": 1.8514492753623188, + "no_speech_prob": 6.004552778904326e-05}, {"id": 217, "seek": 70938, "start": 709.38, + "end": 710.9, "text": " where you have structured to it.", "tokens": [50364, 689, + 291, 362, 18519, 281, 309, 13, 50440], "temperature": 0.0, "avg_logprob": -0.22103356976881094, + "compression_ratio": 1.744408945686901, "no_speech_prob": 0.023343143984675407}, + {"id": 218, "seek": 70938, "start": 710.9, "end": 716.58, "text": " Like you have + the body, the header, the sending address, those are all like every email has", + "tokens": [50440, 1743, 291, 362, 264, 1772, 11, 264, 23117, 11, 264, 7750, 2985, + 11, 729, 366, 439, 411, 633, 3796, 575, 50724], "temperature": 0.0, "avg_logprob": + -0.22103356976881094, "compression_ratio": 1.744408945686901, "no_speech_prob": + 0.023343143984675407}, {"id": 219, "seek": 70938, "start": 716.58, "end": 718.82, + "text": " that, but the data inside is unstructured.", "tokens": [50724, 300, 11, + 457, 264, 1412, 1854, 307, 18799, 46847, 13, 50836], "temperature": 0.0, "avg_logprob": + -0.22103356976881094, "compression_ratio": 1.744408945686901, "no_speech_prob": + 0.023343143984675407}, {"id": 220, "seek": 70938, "start": 718.82, "end": 720.66, + "text": " This is kind of where you use a mix of both.", "tokens": [50836, 639, + 307, 733, 295, 689, 291, 764, 
257, 2890, 295, 1293, 13, 50928], "temperature": 0.0, + "avg_logprob": -0.22103356976881094, "compression_ratio": 1.744408945686901, "no_speech_prob": + 0.023343143984675407}, {"id": 221, "seek": 70938, "start": 720.66, "end": 725.78, + "text": " But yeah, vector search gets a little complicated, but the main way to + think about it, you have", "tokens": [50928, 583, 1338, 11, 8062, 3164, 2170, 257, + 707, 6179, 11, 457, 264, 2135, 636, 281, 519, 466, 309, 11, 291, 362, 51184], "temperature": + 0.0, "avg_logprob": -0.22103356976881094, "compression_ratio": 1.744408945686901, + "no_speech_prob": 0.023343143984675407}, {"id": 222, "seek": 70938, "start": 725.78, + "end": 728.74, "text": " unstructured data, your computer does not understand whatsoever.", + "tokens": [51184, 18799, 46847, 1412, 11, 428, 3820, 775, 406, 1223, 17076, 13, + 51332], "temperature": 0.0, "avg_logprob": -0.22103356976881094, "compression_ratio": + 1.744408945686901, "no_speech_prob": 0.023343143984675407}, {"id": 223, "seek": + 70938, "start": 728.74, "end": 731.66, "text": " You can have two images, a pixel + apart.", "tokens": [51332, 509, 393, 362, 732, 5267, 11, 257, 19261, 4936, 13, 51478], + "temperature": 0.0, "avg_logprob": -0.22103356976881094, "compression_ratio": 1.744408945686901, + "no_speech_prob": 0.023343143984675407}, {"id": 224, "seek": 70938, "start": 731.66, + "end": 735.42, "text": " And half the time if your algorithm is not good, your computer + will think there are two exactly", "tokens": [51478, 400, 1922, 264, 565, 498, 428, + 9284, 307, 406, 665, 11, 428, 3820, 486, 519, 456, 366, 732, 2293, 51666], "temperature": + 0.0, "avg_logprob": -0.22103356976881094, "compression_ratio": 1.744408945686901, + "no_speech_prob": 0.023343143984675407}, {"id": 225, "seek": 70938, "start": 735.42, + "end": 736.42, "text": " different pictures.", "tokens": [51666, 819, 5242, 13, + 51716], "temperature": 0.0, "avg_logprob": -0.22103356976881094, "compression_ratio": + 
1.744408945686901, "no_speech_prob": 0.023343143984675407}, {"id": 226, "seek": + 70938, "start": 736.42, "end": 737.78, "text": " It won''t get it.", "tokens": [51716, + 467, 1582, 380, 483, 309, 13, 51784], "temperature": 0.0, "avg_logprob": -0.22103356976881094, + "compression_ratio": 1.744408945686901, "no_speech_prob": 0.023343143984675407}, + {"id": 227, "seek": 73778, "start": 737.78, "end": 742.1, "text": " So you take + that unstructured data, you throw a through a neural net, you get vectors,", "tokens": + [50364, 407, 291, 747, 300, 18799, 46847, 1412, 11, 291, 3507, 257, 807, 257, 18161, + 2533, 11, 291, 483, 18875, 11, 50580], "temperature": 0.0, "avg_logprob": -0.28908658549733407, + "compression_ratio": 1.6996805111821087, "no_speech_prob": 0.0037593275774270296}, + {"id": 228, "seek": 73778, "start": 742.1, "end": 746.5, "text": " and what vectors + use those previous algorithms to find things that are similar.", "tokens": [50580, + 293, 437, 18875, 764, 729, 3894, 14642, 281, 915, 721, 300, 366, 2531, 13, 50800], + "temperature": 0.0, "avg_logprob": -0.28908658549733407, "compression_ratio": 1.6996805111821087, + "no_speech_prob": 0.0037593275774270296}, {"id": 229, "seek": 73778, "start": 746.5, + "end": 748.5, "text": " And that''s how you can quickly search through it.", "tokens": + [50800, 400, 300, 311, 577, 291, 393, 2661, 3164, 807, 309, 13, 50900], "temperature": + 0.0, "avg_logprob": -0.28908658549733407, "compression_ratio": 1.6996805111821087, + "no_speech_prob": 0.0037593275774270296}, {"id": 230, "seek": 73778, "start": 748.5, + "end": 749.5, "text": " Right, right.", "tokens": [50900, 1779, 11, 558, 13, 50950], + "temperature": 0.0, "avg_logprob": -0.28908658549733407, "compression_ratio": 1.6996805111821087, + "no_speech_prob": 0.0037593275774270296}, {"id": 231, "seek": 73778, "start": 749.5, + "end": 753.6999999999999, "text": " And I mean, so you mentioned these several algorithms + there, which is I agree, I agree,", "tokens": 
[50950, 400, 286, 914, 11, 370, 291, + 2835, 613, 2940, 14642, 456, 11, 597, 307, 286, 3986, 11, 286, 3986, 11, 51160], + "temperature": 0.0, "avg_logprob": -0.28908658549733407, "compression_ratio": 1.6996805111821087, + "no_speech_prob": 0.0037593275774270296}, {"id": 232, "seek": 73778, "start": 753.6999999999999, + "end": 755.5, "text": " I read this paper as well.", "tokens": [51160, 286, 1401, + 341, 3035, 382, 731, 13, 51250], "temperature": 0.0, "avg_logprob": -0.28908658549733407, + "compression_ratio": 1.6996805111821087, "no_speech_prob": 0.0037593275774270296}, + {"id": 233, "seek": 73778, "start": 755.5, "end": 756.5, "text": " This is cool.", + "tokens": [51250, 639, 307, 1627, 13, 51300], "temperature": 0.0, "avg_logprob": + -0.28908658549733407, "compression_ratio": 1.6996805111821087, "no_speech_prob": + 0.0037593275774270296}, {"id": 234, "seek": 73778, "start": 756.5, "end": 762.02, + "text": " But like just to satisfy my curiosity, where would you put the product + quantization methods,", "tokens": [51300, 583, 411, 445, 281, 19319, 452, 18769, + 11, 689, 576, 291, 829, 264, 1674, 4426, 2144, 7150, 11, 51576], "temperature": + 0.0, "avg_logprob": -0.28908658549733407, "compression_ratio": 1.6996805111821087, + "no_speech_prob": 0.0037593275774270296}, {"id": 235, "seek": 73778, "start": 762.02, + "end": 765.9, "text": " you know, which is also implemented in FICE and maybe some, + some where else to.", "tokens": [51576, 291, 458, 11, 597, 307, 611, 12270, 294, + 479, 13663, 293, 1310, 512, 11, 512, 689, 1646, 281, 13, 51770], "temperature": + 0.0, "avg_logprob": -0.28908658549733407, "compression_ratio": 1.6996805111821087, + "no_speech_prob": 0.0037593275774270296}, {"id": 236, "seek": 76590, "start": 765.9, + "end": 772.22, "text": " Like, is this like a fundamentally different approach compared + to LSH graph, trees,", "tokens": [50364, 1743, 11, 307, 341, 411, 257, 17879, 819, + 3109, 5347, 281, 441, 17308, 4295, 11, 5852, 11, 50680], 
"temperature": 0.0, "avg_logprob": + -0.20632700060234696, "compression_ratio": 1.579136690647482, "no_speech_prob": + 0.03268862143158913}, {"id": 237, "seek": 76590, "start": 772.22, "end": 776.14, + "text": " or is this something else in your book?", "tokens": [50680, 420, 307, + 341, 746, 1646, 294, 428, 1446, 30, 50876], "temperature": 0.0, "avg_logprob": -0.20632700060234696, + "compression_ratio": 1.579136690647482, "no_speech_prob": 0.03268862143158913}, + {"id": 238, "seek": 76590, "start": 776.14, "end": 782.62, "text": " So with FICE + with this quantization, I think I just find that to be part of the graph", "tokens": + [50876, 407, 365, 479, 13663, 365, 341, 4426, 2144, 11, 286, 519, 286, 445, 915, + 300, 281, 312, 644, 295, 264, 4295, 51200], "temperature": 0.0, "avg_logprob": -0.20632700060234696, + "compression_ratio": 1.579136690647482, "no_speech_prob": 0.03268862143158913}, + {"id": 239, "seek": 76590, "start": 782.62, "end": 783.62, "text": " based.", "tokens": + [51200, 2361, 13, 51250], "temperature": 0.0, "avg_logprob": -0.20632700060234696, + "compression_ratio": 1.579136690647482, "no_speech_prob": 0.03268862143158913}, + {"id": 240, "seek": 76590, "start": 783.62, "end": 784.62, "text": " I looked a + bit into it.", "tokens": [51250, 286, 2956, 257, 857, 666, 309, 13, 51300], "temperature": + 0.0, "avg_logprob": -0.20632700060234696, "compression_ratio": 1.579136690647482, + "no_speech_prob": 0.03268862143158913}, {"id": 241, "seek": 76590, "start": 784.62, + "end": 787.62, "text": " This kind of went a little deeper because I didn''t really + work on that much, but I did", "tokens": [51300, 639, 733, 295, 1437, 257, 707, + 7731, 570, 286, 994, 380, 534, 589, 322, 300, 709, 11, 457, 286, 630, 51450], "temperature": + 0.0, "avg_logprob": -0.20632700060234696, "compression_ratio": 1.579136690647482, + "no_speech_prob": 0.03268862143158913}, {"id": 242, "seek": 76590, "start": 787.62, + "end": 789.18, "text": " up and look into it.", 
"tokens": [51450, 493, 293, 574, + 666, 309, 13, 51528], "temperature": 0.0, "avg_logprob": -0.20632700060234696, "compression_ratio": + 1.579136690647482, "no_speech_prob": 0.03268862143158913}, {"id": 243, "seek": 76590, + "start": 789.18, "end": 794.46, "text": " But it''s pretty much just simplifying + it for clustering is the way I saw it, where I would", "tokens": [51528, 583, 309, + 311, 1238, 709, 445, 6883, 5489, 309, 337, 596, 48673, 307, 264, 636, 286, 1866, + 309, 11, 689, 286, 576, 51792], "temperature": 0.0, "avg_logprob": -0.20632700060234696, + "compression_ratio": 1.579136690647482, "no_speech_prob": 0.03268862143158913}, + {"id": 244, "seek": 79446, "start": 794.46, "end": 795.94, "text": " classify that + for clustering.", "tokens": [50364, 33872, 300, 337, 596, 48673, 13, 50438], "temperature": + 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": 1.745819397993311, + "no_speech_prob": 0.0039596944116055965}, {"id": 245, "seek": 79446, "start": 795.94, + "end": 798.7800000000001, "text": " You kind of want to, you need something to kind + of simplify and speed it up.", "tokens": [50438, 509, 733, 295, 528, 281, 11, 291, + 643, 746, 281, 733, 295, 20460, 293, 3073, 309, 493, 13, 50580], "temperature": + 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": 1.745819397993311, + "no_speech_prob": 0.0039596944116055965}, {"id": 246, "seek": 79446, "start": 798.7800000000001, + "end": 801.22, "text": " So that''s where in FICE you have the quantized base.", + "tokens": [50580, 407, 300, 311, 689, 294, 479, 13663, 291, 362, 264, 4426, 1602, + 3096, 13, 50702], "temperature": 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": + 1.745819397993311, "no_speech_prob": 0.0039596944116055965}, {"id": 247, "seek": + 79446, "start": 801.22, "end": 806.14, "text": " You have the flats, which aren''t, + you have the SQH, which is quantized based.", "tokens": [50702, 509, 362, 264, 43075, + 11, 597, 3212, 380, 11, 291, 362, 264, 
318, 48, 39, 11, 597, 307, 4426, 1602, 2361, + 13, 50948], "temperature": 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": + 1.745819397993311, "no_speech_prob": 0.0039596944116055965}, {"id": 248, "seek": + 79446, "start": 806.14, "end": 809.5, "text": " And there''s a few more throughout + the names, but it''s just a way of speeding up that", "tokens": [50948, 400, 456, + 311, 257, 1326, 544, 3710, 264, 5288, 11, 457, 309, 311, 445, 257, 636, 295, 35593, + 493, 300, 51116], "temperature": 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": + 1.745819397993311, "no_speech_prob": 0.0039596944116055965}, {"id": 249, "seek": + 79446, "start": 809.5, "end": 810.74, "text": " already used one.", "tokens": [51116, + 1217, 1143, 472, 13, 51178], "temperature": 0.0, "avg_logprob": -0.18246467963798896, + "compression_ratio": 1.745819397993311, "no_speech_prob": 0.0039596944116055965}, + {"id": 250, "seek": 79446, "start": 810.74, "end": 816.4200000000001, "text": " + I''m not sure how well it will work with other algorithms.", "tokens": [51178, 286, + 478, 406, 988, 577, 731, 309, 486, 589, 365, 661, 14642, 13, 51462], "temperature": + 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": 1.745819397993311, + "no_speech_prob": 0.0039596944116055965}, {"id": 251, "seek": 79446, "start": 816.4200000000001, + "end": 821.14, "text": " So like using that quantization and then trying it on a + noise, you quantize everything", "tokens": [51462, 407, 411, 1228, 300, 4426, 2144, + 293, 550, 1382, 309, 322, 257, 5658, 11, 291, 4426, 1125, 1203, 51698], "temperature": + 0.0, "avg_logprob": -0.18246467963798896, "compression_ratio": 1.745819397993311, + "no_speech_prob": 0.0039596944116055965}, {"id": 252, "seek": 79446, "start": 821.14, + "end": 823.46, "text": " and then you start doing the splits.", "tokens": [51698, + 293, 550, 291, 722, 884, 264, 37741, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.18246467963798896, "compression_ratio": 
1.745819397993311, "no_speech_prob": + 0.0039596944116055965}, {"id": 253, "seek": 82346, "start": 823.46, "end": 829.34, + "text": " It might speed things up, but yeah, it''s a little bit outside of where + I see what I know.", "tokens": [50364, 467, 1062, 3073, 721, 493, 11, 457, 1338, + 11, 309, 311, 257, 707, 857, 2380, 295, 689, 286, 536, 437, 286, 458, 13, 50658], + "temperature": 0.0, "avg_logprob": -0.3005703059771589, "compression_ratio": 1.6062717770034842, + "no_speech_prob": 0.010494323447346687}, {"id": 254, "seek": 82346, "start": 829.34, + "end": 833.1800000000001, "text": " But if it works with FICE, I believe it could + work with the other locations.", "tokens": [50658, 583, 498, 309, 1985, 365, 479, + 13663, 11, 286, 1697, 309, 727, 589, 365, 264, 661, 9253, 13, 50850], "temperature": + 0.0, "avg_logprob": -0.3005703059771589, "compression_ratio": 1.6062717770034842, + "no_speech_prob": 0.010494323447346687}, {"id": 255, "seek": 82346, "start": 833.1800000000001, + "end": 835.82, "text": " It''s just, I don''t think it''s fantastic yet.", "tokens": + [50850, 467, 311, 445, 11, 286, 500, 380, 519, 309, 311, 5456, 1939, 13, 50982], + "temperature": 0.0, "avg_logprob": -0.3005703059771589, "compression_ratio": 1.6062717770034842, + "no_speech_prob": 0.010494323447346687}, {"id": 256, "seek": 82346, "start": 835.82, + "end": 836.82, "text": " Yeah, I mean, I agree.", "tokens": [50982, 865, 11, 286, + 914, 11, 286, 3986, 13, 51032], "temperature": 0.0, "avg_logprob": -0.3005703059771589, + "compression_ratio": 1.6062717770034842, "no_speech_prob": 0.010494323447346687}, + {"id": 257, "seek": 82346, "start": 836.82, "end": 840.58, "text": " And I mean, + there are a number of approaches where they combine things.", "tokens": [51032, + 400, 286, 914, 11, 456, 366, 257, 1230, 295, 11587, 689, 436, 10432, 721, 13, 51220], + "temperature": 0.0, "avg_logprob": -0.3005703059771589, "compression_ratio": 1.6062717770034842, + "no_speech_prob": 
0.010494323447346687}, {"id": 258, "seek": 82346, "start": 840.58, + "end": 847.0600000000001, "text": " If you take the paper on disk and then from + Microsoft, from I think Bing team, they combine", "tokens": [51220, 759, 291, 747, + 264, 3035, 322, 12355, 293, 550, 490, 8116, 11, 490, 286, 519, 30755, 1469, 11, + 436, 10432, 51544], "temperature": 0.0, "avg_logprob": -0.3005703059771589, "compression_ratio": + 1.6062717770034842, "no_speech_prob": 0.010494323447346687}, {"id": 259, "seek": + 82346, "start": 847.0600000000001, "end": 851.5400000000001, "text": " H&SW with + product quantization.", "tokens": [51544, 389, 5, 50, 54, 365, 1674, 4426, 2144, + 13, 51768], "temperature": 0.0, "avg_logprob": -0.3005703059771589, "compression_ratio": + 1.6062717770034842, "no_speech_prob": 0.010494323447346687}, {"id": 260, "seek": + 82346, "start": 851.5400000000001, "end": 853.38, "text": " And they also have clustering.", + "tokens": [51768, 400, 436, 611, 362, 596, 48673, 13, 51860], "temperature": 0.0, + "avg_logprob": -0.3005703059771589, "compression_ratio": 1.6062717770034842, "no_speech_prob": + 0.010494323447346687}, {"id": 261, "seek": 85338, "start": 853.38, "end": 855.9399999999999, + "text": " So it''s kind of three phase algorithm.", "tokens": [50364, 407, 309, + 311, 733, 295, 1045, 5574, 9284, 13, 50492], "temperature": 0.0, "avg_logprob": + -0.25848005514229294, "compression_ratio": 1.7008547008547008, "no_speech_prob": + 0.0025568148121237755}, {"id": 262, "seek": 85338, "start": 855.9399999999999, "end": + 860.9399999999999, "text": " The first cluster, the points, they get the centroids.", + "tokens": [50492, 440, 700, 13630, 11, 264, 2793, 11, 436, 483, 264, 24607, 3742, + 13, 50742], "temperature": 0.0, "avg_logprob": -0.25848005514229294, "compression_ratio": + 1.7008547008547008, "no_speech_prob": 0.0025568148121237755}, {"id": 263, "seek": + 85338, "start": 860.9399999999999, "end": 866.62, "text": " Then they kind of quantize + them, I guess, 
kind of lose some precision on the vectors so", "tokens": [50742, + 1396, 436, 733, 295, 4426, 1125, 552, 11, 286, 2041, 11, 733, 295, 3624, 512, 18356, + 322, 264, 18875, 370, 51026], "temperature": 0.0, "avg_logprob": -0.25848005514229294, + "compression_ratio": 1.7008547008547008, "no_speech_prob": 0.0025568148121237755}, + {"id": 264, "seek": 85338, "start": 866.62, "end": 869.78, "text": " that you can + actually load them in memory.", "tokens": [51026, 300, 291, 393, 767, 3677, 552, + 294, 4675, 13, 51184], "temperature": 0.0, "avg_logprob": -0.25848005514229294, + "compression_ratio": 1.7008547008547008, "no_speech_prob": 0.0025568148121237755}, + {"id": 265, "seek": 85338, "start": 869.78, "end": 877.26, "text": " And then like + from there, they build the, so for the clusters, they build the H&SW, the", "tokens": + [51184, 400, 550, 411, 490, 456, 11, 436, 1322, 264, 11, 370, 337, 264, 23313, 11, + 436, 1322, 264, 389, 5, 50, 54, 11, 264, 51558], "temperature": 0.0, "avg_logprob": + -0.25848005514229294, "compression_ratio": 1.7008547008547008, "no_speech_prob": + 0.0025568148121237755}, {"id": 266, "seek": 85338, "start": 877.26, "end": 880.02, + "text": " graph kind of layout, right?", "tokens": [51558, 4295, 733, 295, 13333, + 11, 558, 30, 51696], "temperature": 0.0, "avg_logprob": -0.25848005514229294, "compression_ratio": + 1.7008547008547008, "no_speech_prob": 0.0025568148121237755}, {"id": 267, "seek": + 85338, "start": 880.02, "end": 883.36, "text": " For each kind of like shard, you + could say, for cluster.", "tokens": [51696, 1171, 1184, 733, 295, 411, 402, 515, + 11, 291, 727, 584, 11, 337, 13630, 13, 51863], "temperature": 0.0, "avg_logprob": + -0.25848005514229294, "compression_ratio": 1.7008547008547008, "no_speech_prob": + 0.0025568148121237755}, {"id": 268, "seek": 88336, "start": 883.36, "end": 887.96, + "text": " And then they basically kind of, it''s a, it''s a few steps kind of approach.", + "tokens": [50364, 400, 550, 436, 1936, 733, 
295, 11, 309, 311, 257, 11, 309, 311, + 257, 1326, 4439, 733, 295, 3109, 13, 50594], "temperature": 0.0, "avg_logprob": + -0.1984797193292986, "compression_ratio": 1.7238493723849373, "no_speech_prob": + 0.0012302107643336058}, {"id": 269, "seek": 88336, "start": 887.96, "end": 892.5600000000001, + "text": " So your query comes in, it basically goes through kind of this quantization.", + "tokens": [50594, 407, 428, 14581, 1487, 294, 11, 309, 1936, 1709, 807, 733, 295, + 341, 4426, 2144, 13, 50824], "temperature": 0.0, "avg_logprob": -0.1984797193292986, + "compression_ratio": 1.7238493723849373, "no_speech_prob": 0.0012302107643336058}, + {"id": 270, "seek": 88336, "start": 892.5600000000001, "end": 896.36, "text": " + You find the closest, you know, centroids.", "tokens": [50824, 509, 915, 264, 13699, + 11, 291, 458, 11, 24607, 3742, 13, 51014], "temperature": 0.0, "avg_logprob": -0.1984797193292986, + "compression_ratio": 1.7238493723849373, "no_speech_prob": 0.0012302107643336058}, + {"id": 271, "seek": 88336, "start": 896.36, "end": 899.08, "text": " And then you + go and kind of searching them.", "tokens": [51014, 400, 550, 291, 352, 293, 733, + 295, 10808, 552, 13, 51150], "temperature": 0.0, "avg_logprob": -0.1984797193292986, + "compression_ratio": 1.7238493723849373, "no_speech_prob": 0.0012302107643336058}, + {"id": 272, "seek": 88336, "start": 899.08, "end": 902.44, "text": " And then you + read rank the results based on the disk.", "tokens": [51150, 400, 550, 291, 1401, + 6181, 264, 3542, 2361, 322, 264, 12355, 13, 51318], "temperature": 0.0, "avg_logprob": + -0.1984797193292986, "compression_ratio": 1.7238493723849373, "no_speech_prob": + 0.0012302107643336058}, {"id": 273, "seek": 88336, "start": 902.44, "end": 906.08, + "text": " So from disk, you read the non-quantized versions of vectors, right?", + "tokens": [51318, 407, 490, 12355, 11, 291, 1401, 264, 2107, 12, 358, 394, 1602, + 9606, 295, 18875, 11, 558, 30, 51500], "temperature": 0.0, 
"avg_logprob": -0.1984797193292986, + "compression_ratio": 1.7238493723849373, "no_speech_prob": 0.0012302107643336058}, + {"id": 274, "seek": 88336, "start": 906.08, "end": 909.32, "text": " So that you + can actually give, get the precision.", "tokens": [51500, 407, 300, 291, 393, 767, + 976, 11, 483, 264, 18356, 13, 51662], "temperature": 0.0, "avg_logprob": -0.1984797193292986, + "compression_ratio": 1.7238493723849373, "no_speech_prob": 0.0012302107643336058}, + {"id": 275, "seek": 90932, "start": 909.32, "end": 913.8000000000001, "text": " + So I mean, what I''m trying to say basically is that you can combine these algorithms", + "tokens": [50364, 407, 286, 914, 11, 437, 286, 478, 1382, 281, 584, 1936, 307, 300, + 291, 393, 10432, 613, 14642, 50588], "temperature": 0.0, "avg_logprob": -0.2619866202859318, + "compression_ratio": 1.697080291970803, "no_speech_prob": 0.00554925249889493}, + {"id": 276, "seek": 90932, "start": 913.8000000000001, "end": 914.8000000000001, + "text": " in ways.", "tokens": [50588, 294, 2098, 13, 50638], "temperature": 0.0, + "avg_logprob": -0.2619866202859318, "compression_ratio": 1.697080291970803, "no_speech_prob": + 0.00554925249889493}, {"id": 277, "seek": 90932, "start": 914.8000000000001, "end": + 915.8000000000001, "text": " Yeah.", "tokens": [50638, 865, 13, 50688], "temperature": + 0.0, "avg_logprob": -0.2619866202859318, "compression_ratio": 1.697080291970803, + "no_speech_prob": 0.00554925249889493}, {"id": 278, "seek": 90932, "start": 915.8000000000001, + "end": 916.8000000000001, "text": " Yeah.", "tokens": [50688, 865, 13, 50738], "temperature": + 0.0, "avg_logprob": -0.2619866202859318, "compression_ratio": 1.697080291970803, + "no_speech_prob": 0.00554925249889493}, {"id": 279, "seek": 90932, "start": 916.8000000000001, + "end": 920.36, "text": " Depending on your use case, basically, like if you try + to optimize for memory or speed", "tokens": [50738, 22539, 322, 428, 764, 1389, + 11, 1936, 11, 411, 498, 291, 853, 
281, 19719, 337, 4675, 420, 3073, 50916], "temperature": + 0.0, "avg_logprob": -0.2619866202859318, "compression_ratio": 1.697080291970803, + "no_speech_prob": 0.00554925249889493}, {"id": 280, "seek": 90932, "start": 920.36, + "end": 922.0400000000001, "text": " or something like that.", "tokens": [50916, + 420, 746, 411, 300, 13, 51000], "temperature": 0.0, "avg_logprob": -0.2619866202859318, + "compression_ratio": 1.697080291970803, "no_speech_prob": 0.00554925249889493}, + {"id": 281, "seek": 90932, "start": 922.0400000000001, "end": 923.0400000000001, + "text": " Yeah.", "tokens": [51000, 865, 13, 51050], "temperature": 0.0, "avg_logprob": + -0.2619866202859318, "compression_ratio": 1.697080291970803, "no_speech_prob": 0.00554925249889493}, + {"id": 282, "seek": 90932, "start": 923.0400000000001, "end": 924.0400000000001, + "text": " Yeah.", "tokens": [51050, 865, 13, 51100], "temperature": 0.0, "avg_logprob": + -0.2619866202859318, "compression_ratio": 1.697080291970803, "no_speech_prob": 0.00554925249889493}, + {"id": 283, "seek": 90932, "start": 924.0400000000001, "end": 930.6800000000001, + "text": " And so if we go back to, before we go into Milbus, like if we go back + to use cases, you", "tokens": [51100, 400, 370, 498, 321, 352, 646, 281, 11, 949, + 321, 352, 666, 7036, 21441, 11, 411, 498, 321, 352, 646, 281, 764, 3331, 11, 291, + 51432], "temperature": 0.0, "avg_logprob": -0.2619866202859318, "compression_ratio": + 1.697080291970803, "no_speech_prob": 0.00554925249889493}, {"id": 284, "seek": 90932, + "start": 930.6800000000001, "end": 932.9200000000001, "text": " mentioned there + are like a number of things.", "tokens": [51432, 2835, 456, 366, 411, 257, 1230, + 295, 721, 13, 51544], "temperature": 0.0, "avg_logprob": -0.2619866202859318, "compression_ratio": + 1.697080291970803, "no_speech_prob": 0.00554925249889493}, {"id": 285, "seek": 90932, + "start": 932.9200000000001, "end": 938.1600000000001, "text": " Let''s say you can + encode almost any 
object and you gave an example, really good one about", "tokens": + [51544, 961, 311, 584, 291, 393, 2058, 1429, 1920, 604, 2657, 293, 291, 2729, 364, + 1365, 11, 534, 665, 472, 466, 51806], "temperature": 0.0, "avg_logprob": -0.2619866202859318, + "compression_ratio": 1.697080291970803, "no_speech_prob": 0.00554925249889493}, + {"id": 286, "seek": 90932, "start": 938.1600000000001, "end": 939.1600000000001, + "text": " email, right?", "tokens": [51806, 3796, 11, 558, 30, 51856], "temperature": + 0.0, "avg_logprob": -0.2619866202859318, "compression_ratio": 1.697080291970803, + "no_speech_prob": 0.00554925249889493}, {"id": 287, "seek": 93916, "start": 939.16, + "end": 943.04, "text": " So email on one hand, everyone knows what it is.", "tokens": + [50364, 407, 3796, 322, 472, 1011, 11, 1518, 3255, 437, 309, 307, 13, 50558], "temperature": + 0.0, "avg_logprob": -0.14584015932950106, "compression_ratio": 1.6224066390041494, + "no_speech_prob": 0.0016803696053102612}, {"id": 288, "seek": 93916, "start": 943.04, + "end": 947.3199999999999, "text": " On the other hand, it has unstructured kind + of parts to it.", "tokens": [50558, 1282, 264, 661, 1011, 11, 309, 575, 18799, 46847, + 733, 295, 3166, 281, 309, 13, 50772], "temperature": 0.0, "avg_logprob": -0.14584015932950106, + "compression_ratio": 1.6224066390041494, "no_speech_prob": 0.0016803696053102612}, + {"id": 289, "seek": 93916, "start": 947.3199999999999, "end": 955.1999999999999, + "text": " And if you compare text, let''s say versus audio or video, do you think + that you can equally", "tokens": [50772, 400, 498, 291, 6794, 2487, 11, 718, 311, + 584, 5717, 6278, 420, 960, 11, 360, 291, 519, 300, 291, 393, 12309, 51166], "temperature": + 0.0, "avg_logprob": -0.14584015932950106, "compression_ratio": 1.6224066390041494, + "no_speech_prob": 0.0016803696053102612}, {"id": 290, "seek": 93916, "start": 955.1999999999999, + "end": 957.24, "text": " apply vector search?", "tokens": [51166, 3079, 8062, 3164, + 30, 
51268], "temperature": 0.0, "avg_logprob": -0.14584015932950106, "compression_ratio": + 1.6224066390041494, "no_speech_prob": 0.0016803696053102612}, {"id": 291, "seek": + 93916, "start": 957.24, "end": 961.3199999999999, "text": " Of course you can, but + I''m asking in terms of quality that you will get.", "tokens": [51268, 2720, 1164, + 291, 393, 11, 457, 286, 478, 3365, 294, 2115, 295, 3125, 300, 291, 486, 483, 13, + 51472], "temperature": 0.0, "avg_logprob": -0.14584015932950106, "compression_ratio": + 1.6224066390041494, "no_speech_prob": 0.0016803696053102612}, {"id": 292, "seek": + 93916, "start": 961.3199999999999, "end": 966.04, "text": " Or do you need to go + like extra mile, you know, in audio, extra mile and video compared", "tokens": [51472, + 1610, 360, 291, 643, 281, 352, 411, 2857, 12620, 11, 291, 458, 11, 294, 6278, 11, + 2857, 12620, 293, 960, 5347, 51708], "temperature": 0.0, "avg_logprob": -0.14584015932950106, + "compression_ratio": 1.6224066390041494, "no_speech_prob": 0.0016803696053102612}, + {"id": 293, "seek": 93916, "start": 966.04, "end": 967.04, "text": " to text?", + "tokens": [51708, 281, 2487, 30, 51758], "temperature": 0.0, "avg_logprob": -0.14584015932950106, + "compression_ratio": 1.6224066390041494, "no_speech_prob": 0.0016803696053102612}, + {"id": 294, "seek": 96704, "start": 967.04, "end": 970.16, "text": " There are so + many models.", "tokens": [50364, 821, 366, 370, 867, 5245, 13, 50520], "temperature": + 0.0, "avg_logprob": -0.19153122220720564, "compression_ratio": 1.88339222614841, + "no_speech_prob": 0.14406709372997284}, {"id": 295, "seek": 96704, "start": 970.16, + "end": 971.52, "text": " Honestly, that''s a good question.", "tokens": [50520, + 12348, 11, 300, 311, 257, 665, 1168, 13, 50588], "temperature": 0.0, "avg_logprob": + -0.19153122220720564, "compression_ratio": 1.88339222614841, "no_speech_prob": 0.14406709372997284}, + {"id": 296, "seek": 96704, "start": 971.52, "end": 976.52, "text": " I think that''s + 
where the neural nets come in and that''s where they''re important.", "tokens": + [50588, 286, 519, 300, 311, 689, 264, 18161, 36170, 808, 294, 293, 300, 311, 689, + 436, 434, 1021, 13, 50838], "temperature": 0.0, "avg_logprob": -0.19153122220720564, + "compression_ratio": 1.88339222614841, "no_speech_prob": 0.14406709372997284}, {"id": + 297, "seek": 96704, "start": 976.52, "end": 982.0799999999999, "text": " How they + kind of the black box doesn''t how it kind of sorts everything out, but I believe", + "tokens": [50838, 1012, 436, 733, 295, 264, 2211, 2424, 1177, 380, 577, 309, 733, + 295, 7527, 1203, 484, 11, 457, 286, 1697, 51116], "temperature": 0.0, "avg_logprob": + -0.19153122220720564, "compression_ratio": 1.88339222614841, "no_speech_prob": 0.14406709372997284}, + {"id": 298, "seek": 96704, "start": 982.0799999999999, "end": 987.16, "text": " + in text, I feel like first, I feel like there''s been a lot more work and a lot + more kind of", "tokens": [51116, 294, 2487, 11, 286, 841, 411, 700, 11, 286, 841, + 411, 456, 311, 668, 257, 688, 544, 589, 293, 257, 688, 544, 733, 295, 51370], "temperature": + 0.0, "avg_logprob": -0.19153122220720564, "compression_ratio": 1.88339222614841, + "no_speech_prob": 0.14406709372997284}, {"id": 299, "seek": 96704, "start": 987.16, + "end": 990.64, "text": " people have been looking into them for now, everyone''s + kind of switching to that for product", "tokens": [51370, 561, 362, 668, 1237, 666, + 552, 337, 586, 11, 1518, 311, 733, 295, 16493, 281, 300, 337, 1674, 51544], "temperature": + 0.0, "avg_logprob": -0.19153122220720564, "compression_ratio": 1.88339222614841, + "no_speech_prob": 0.14406709372997284}, {"id": 300, "seek": 96704, "start": 990.64, + "end": 991.64, "text": " recommendation.", "tokens": [51544, 11879, 13, 51594], + "temperature": 0.0, "avg_logprob": -0.19153122220720564, "compression_ratio": 1.88339222614841, + "no_speech_prob": 0.14406709372997284}, {"id": 301, "seek": 96704, "start": 991.64, + 
"end": 993.0, "text": " There''s a lot more money in that area.", "tokens": [51594, + 821, 311, 257, 688, 544, 1460, 294, 300, 1859, 13, 51662], "temperature": 0.0, "avg_logprob": + -0.19153122220720564, "compression_ratio": 1.88339222614841, "no_speech_prob": 0.14406709372997284}, + {"id": 302, "seek": 96704, "start": 993.0, "end": 996.7199999999999, "text": " So + I think there are a lot more advances in those neural nets.", "tokens": [51662, + 407, 286, 519, 456, 366, 257, 688, 544, 25297, 294, 729, 18161, 36170, 13, 51848], + "temperature": 0.0, "avg_logprob": -0.19153122220720564, "compression_ratio": 1.88339222614841, + "no_speech_prob": 0.14406709372997284}, {"id": 303, "seek": 99672, "start": 996.72, + "end": 1001.96, "text": " But I think the underlying to text, the way I personally + see it, this isn''t like scientific", "tokens": [50364, 583, 286, 519, 264, 14217, + 281, 2487, 11, 264, 636, 286, 5665, 536, 309, 11, 341, 1943, 380, 411, 8134, 50626], + "temperature": 0.0, "avg_logprob": -0.1755487404617609, "compression_ratio": 1.8995633187772927, + "no_speech_prob": 0.0006869940552860498}, {"id": 304, "seek": 99672, "start": 1001.96, + "end": 1006.36, "text": " factor is I feel like there''s a lot more underlying in + language.", "tokens": [50626, 5952, 307, 286, 841, 411, 456, 311, 257, 688, 544, + 14217, 294, 2856, 13, 50846], "temperature": 0.0, "avg_logprob": -0.1755487404617609, + "compression_ratio": 1.8995633187772927, "no_speech_prob": 0.0006869940552860498}, + {"id": 305, "seek": 99672, "start": 1006.36, "end": 1010.5600000000001, "text": + " I feel like there''s a lot more rules underlying connections that I would think + a neural net", "tokens": [50846, 286, 841, 411, 456, 311, 257, 688, 544, 4474, 14217, + 9271, 300, 286, 576, 519, 257, 18161, 2533, 51056], "temperature": 0.0, "avg_logprob": + -0.1755487404617609, "compression_ratio": 1.8995633187772927, "no_speech_prob": + 0.0006869940552860498}, {"id": 306, "seek": 99672, "start": 
1010.5600000000001, + "end": 1014.24, "text": " would find compared to an image.", "tokens": [51056, 576, + 915, 5347, 281, 364, 3256, 13, 51240], "temperature": 0.0, "avg_logprob": -0.1755487404617609, + "compression_ratio": 1.8995633187772927, "no_speech_prob": 0.0006869940552860498}, + {"id": 307, "seek": 99672, "start": 1014.24, "end": 1020.08, "text": " And then + with those underlying values and kind of the underlying language, you''ll have an", + "tokens": [51240, 400, 550, 365, 729, 14217, 4190, 293, 733, 295, 264, 14217, 2856, + 11, 291, 603, 362, 364, 51532], "temperature": 0.0, "avg_logprob": -0.1755487404617609, + "compression_ratio": 1.8995633187772927, "no_speech_prob": 0.0006869940552860498}, + {"id": 308, "seek": 99672, "start": 1020.08, "end": 1024.0, "text": " easier time + kind of grouping things together with a neural net.", "tokens": [51532, 3571, 565, + 733, 295, 40149, 721, 1214, 365, 257, 18161, 2533, 13, 51728], "temperature": 0.0, + "avg_logprob": -0.1755487404617609, "compression_ratio": 1.8995633187772927, "no_speech_prob": + 0.0006869940552860498}, {"id": 309, "seek": 102400, "start": 1024.0, "end": 1028.68, + "text": " And then if you can easily more easily group things together, the more + easily you can search", "tokens": [50364, 400, 550, 498, 291, 393, 3612, 544, 3612, + 1594, 721, 1214, 11, 264, 544, 3612, 291, 393, 3164, 50598], "temperature": 0.0, + "avg_logprob": -0.19116018724071887, "compression_ratio": 1.8120567375886525, "no_speech_prob": + 0.0011080747935920954}, {"id": 310, "seek": 102400, "start": 1028.68, "end": 1029.68, + "text": " it pretty much.", "tokens": [50598, 309, 1238, 709, 13, 50648], "temperature": + 0.0, "avg_logprob": -0.19116018724071887, "compression_ratio": 1.8120567375886525, + "no_speech_prob": 0.0011080747935920954}, {"id": 311, "seek": 102400, "start": 1029.68, + "end": 1033.28, "text": " You can make these clusters a lot more accurate if things + are going to already be near each", "tokens": 
[50648, 509, 393, 652, 613, 23313, + 257, 688, 544, 8559, 498, 721, 366, 516, 281, 1217, 312, 2651, 1184, 50828], "temperature": + 0.0, "avg_logprob": -0.19116018724071887, "compression_ratio": 1.8120567375886525, + "no_speech_prob": 0.0011080747935920954}, {"id": 312, "seek": 102400, "start": 1033.28, + "end": 1035.56, "text": " other, and it''s easier to find.", "tokens": [50828, 661, + 11, 293, 309, 311, 3571, 281, 915, 13, 50942], "temperature": 0.0, "avg_logprob": + -0.19116018724071887, "compression_ratio": 1.8120567375886525, "no_speech_prob": + 0.0011080747935920954}, {"id": 313, "seek": 102400, "start": 1035.56, "end": 1040.2, + "text": " With images on the other hand, I feel like there''s not as much of a background + connection", "tokens": [50942, 2022, 5267, 322, 264, 661, 1011, 11, 286, 841, 411, + 456, 311, 406, 382, 709, 295, 257, 3678, 4984, 51174], "temperature": 0.0, "avg_logprob": + -0.19116018724071887, "compression_ratio": 1.8120567375886525, "no_speech_prob": + 0.0011080747935920954}, {"id": 314, "seek": 102400, "start": 1040.2, "end": 1041.48, + "text": " in everything.", "tokens": [51174, 294, 1203, 13, 51238], "temperature": + 0.0, "avg_logprob": -0.19116018724071887, "compression_ratio": 1.8120567375886525, + "no_speech_prob": 0.0011080747935920954}, {"id": 315, "seek": 102400, "start": 1041.48, + "end": 1046.36, "text": " Again, all personal take, but I feel like sure an image + might have the same object, but", "tokens": [51238, 3764, 11, 439, 2973, 747, 11, + 457, 286, 841, 411, 988, 364, 3256, 1062, 362, 264, 912, 2657, 11, 457, 51482], + "temperature": 0.0, "avg_logprob": -0.19116018724071887, "compression_ratio": 1.8120567375886525, + "no_speech_prob": 0.0011080747935920954}, {"id": 316, "seek": 102400, "start": 1046.36, + "end": 1049.8, "text": " there''s no like real underlying thing linking those objects.", + "tokens": [51482, 456, 311, 572, 411, 957, 14217, 551, 25775, 729, 6565, 13, 51654], + "temperature": 0.0, "avg_logprob": 
-0.19116018724071887, "compression_ratio": 1.8120567375886525, + "no_speech_prob": 0.0011080747935920954}, {"id": 317, "seek": 102400, "start": 1049.8, + "end": 1050.8, "text": " Yeah, there''s the shape.", "tokens": [51654, 865, 11, + 456, 311, 264, 3909, 13, 51704], "temperature": 0.0, "avg_logprob": -0.19116018724071887, + "compression_ratio": 1.8120567375886525, "no_speech_prob": 0.0011080747935920954}, + {"id": 318, "seek": 105080, "start": 1051.28, "end": 1055.12, "text": " But in languages, + you have a lot more than just a shape of an object.", "tokens": [50388, 583, 294, + 8650, 11, 291, 362, 257, 688, 544, 813, 445, 257, 3909, 295, 364, 2657, 13, 50580], + "temperature": 0.0, "avg_logprob": -0.2297589681982025, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.03750661015510559}, {"id": 319, "seek": 105080, "start": 1055.12, + "end": 1060.44, "text": " So that''s kind of where I think text does have a better + time.", "tokens": [50580, 407, 300, 311, 733, 295, 689, 286, 519, 2487, 775, 362, + 257, 1101, 565, 13, 50846], "temperature": 0.0, "avg_logprob": -0.2297589681982025, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.03750661015510559}, + {"id": 320, "seek": 105080, "start": 1060.44, "end": 1064.08, "text": " But in reality, + the way I want to, when we look at our systems and everything like when", "tokens": + [50846, 583, 294, 4103, 11, 264, 636, 286, 528, 281, 11, 562, 321, 574, 412, 527, + 3652, 293, 1203, 411, 562, 51028], "temperature": 0.0, "avg_logprob": -0.2297589681982025, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.03750661015510559}, + {"id": 321, "seek": 105080, "start": 1064.08, "end": 1067.76, "text": " we try, + it always ends up being very close to each other.", "tokens": [51028, 321, 853, + 11, 309, 1009, 5314, 493, 885, 588, 1998, 281, 1184, 661, 13, 51212], "temperature": + 0.0, "avg_logprob": -0.2297589681982025, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 
0.03750661015510559}, {"id": 322, "seek": 105080, "start": 1067.76, + "end": 1071.84, "text": " Maybe like up until now, it''s already approximate.", + "tokens": [51212, 2704, 411, 493, 1826, 586, 11, 309, 311, 1217, 30874, 13, 51416], + "temperature": 0.0, "avg_logprob": -0.2297589681982025, "compression_ratio": 1.6571428571428573, + "no_speech_prob": 0.03750661015510559}, {"id": 323, "seek": 105080, "start": 1071.84, + "end": 1076.6399999999999, "text": " So no one''s really been hurt that much by + half a percent of accuracy.", "tokens": [51416, 407, 572, 472, 311, 534, 668, 4607, + 300, 709, 538, 1922, 257, 3043, 295, 14170, 13, 51656], "temperature": 0.0, "avg_logprob": + -0.2297589681982025, "compression_ratio": 1.6571428571428573, "no_speech_prob": + 0.03750661015510559}, {"id": 324, "seek": 105080, "start": 1076.6399999999999, "end": + 1079.28, "text": " Up until like everyone understands it''s still kind of a new + feel.", "tokens": [51656, 5858, 1826, 411, 1518, 15146, 309, 311, 920, 733, 295, + 257, 777, 841, 13, 51788], "temperature": 0.0, "avg_logprob": -0.2297589681982025, + "compression_ratio": 1.6571428571428573, "no_speech_prob": 0.03750661015510559}, + {"id": 325, "seek": 107928, "start": 1079.28, "end": 1082.48, "text": " It''s kind + of growing in that these methods are all approximate.", "tokens": [50364, 467, 311, + 733, 295, 4194, 294, 300, 613, 7150, 366, 439, 30874, 13, 50524], "temperature": + 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.005367324221879244}, {"id": 326, "seek": 107928, "start": 1082.48, + "end": 1084.8, "text": " Like you''re never going to get a perfect end.", "tokens": + [50524, 1743, 291, 434, 1128, 516, 281, 483, 257, 2176, 917, 13, 50640], "temperature": + 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.005367324221879244}, {"id": 327, "seek": 107928, "start": 1084.8, + "end": 1089.36, "text": " It''s 
really up to the testing with your neural net to + see which embeddings and optimizing", "tokens": [50640, 467, 311, 534, 493, 281, + 264, 4997, 365, 428, 18161, 2533, 281, 536, 597, 12240, 29432, 293, 40425, 50868], + "temperature": 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.005367324221879244}, {"id": 328, "seek": 107928, "start": 1089.36, + "end": 1094.52, "text": " your neural net and then throwing because these algorithms + aren''t, it''s like for the approximate", "tokens": [50868, 428, 18161, 2533, 293, + 550, 10238, 570, 613, 14642, 3212, 380, 11, 309, 311, 411, 337, 264, 30874, 51126], + "temperature": 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.005367324221879244}, {"id": 329, "seek": 107928, "start": 1094.52, + "end": 1097.24, "text": " year''s neighbor, these algorithms aren''t really learning.", + "tokens": [51126, 1064, 311, 5987, 11, 613, 14642, 3212, 380, 534, 2539, 13, 51262], + "temperature": 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.005367324221879244}, {"id": 330, "seek": 107928, "start": 1097.24, + "end": 1102.28, "text": " Sure, there is some learning with the quantization based + ones, where it is kind of making its own", "tokens": [51262, 4894, 11, 456, 307, + 512, 2539, 365, 264, 4426, 2144, 2361, 2306, 11, 689, 309, 307, 733, 295, 1455, + 1080, 1065, 51514], "temperature": 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": + 1.8065693430656935, "no_speech_prob": 0.005367324221879244}, {"id": 331, "seek": + 107928, "start": 1102.28, "end": 1103.28, "text": " quantization.", "tokens": [51514, + 4426, 2144, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2322766194578077, + "compression_ratio": 1.8065693430656935, "no_speech_prob": 0.005367324221879244}, + {"id": 332, "seek": 107928, "start": 1103.28, "end": 1105.44, "text": " I know for + face, it 
does it.", "tokens": [51564, 286, 458, 337, 1851, 11, 309, 775, 309, 13, + 51672], "temperature": 0.0, "avg_logprob": -0.2322766194578077, "compression_ratio": + 1.8065693430656935, "no_speech_prob": 0.005367324221879244}, {"id": 333, "seek": + 110544, "start": 1105.44, "end": 1110.28, "text": " But again, it''s like it''s + like a, it''s an algorithm that kind of goes step by step and", "tokens": [50364, + 583, 797, 11, 309, 311, 411, 309, 311, 411, 257, 11, 309, 311, 364, 9284, 300, 733, + 295, 1709, 1823, 538, 1823, 293, 50606], "temperature": 0.0, "avg_logprob": -0.23956664403279623, + "compression_ratio": 1.6764705882352942, "no_speech_prob": 0.011535623110830784}, + {"id": 334, "seek": 110544, "start": 1110.28, "end": 1111.76, "text": " there''s + not too much randomness.", "tokens": [50606, 456, 311, 406, 886, 709, 4974, 1287, + 13, 50680], "temperature": 0.0, "avg_logprob": -0.23956664403279623, "compression_ratio": + 1.6764705882352942, "no_speech_prob": 0.011535623110830784}, {"id": 335, "seek": + 110544, "start": 1111.76, "end": 1115.16, "text": " Sure Spotify does random splits + in their annoy.", "tokens": [50680, 4894, 29036, 775, 4974, 37741, 294, 641, 8759, + 13, 50850], "temperature": 0.0, "avg_logprob": -0.23956664403279623, "compression_ratio": + 1.6764705882352942, "no_speech_prob": 0.011535623110830784}, {"id": 336, "seek": + 110544, "start": 1115.16, "end": 1121.3200000000002, "text": " But yeah, so I kind + of have to optimize your neural net to really get the best performance.", "tokens": + [50850, 583, 1338, 11, 370, 286, 733, 295, 362, 281, 19719, 428, 18161, 2533, 281, + 534, 483, 264, 1151, 3389, 13, 51158], "temperature": 0.0, "avg_logprob": -0.23956664403279623, + "compression_ratio": 1.6764705882352942, "no_speech_prob": 0.011535623110830784}, + {"id": 337, "seek": 110544, "start": 1121.3200000000002, "end": 1127.44, "text": + " But there are some values that you can mess around with the actual approximate + nearest", "tokens": 
[51158, 583, 456, 366, 512, 4190, 300, 291, 393, 2082, 926, + 365, 264, 3539, 30874, 23831, 51464], "temperature": 0.0, "avg_logprob": -0.23956664403279623, + "compression_ratio": 1.6764705882352942, "no_speech_prob": 0.011535623110830784}, + {"id": 338, "seek": 110544, "start": 1127.44, "end": 1128.68, "text": " neighbor + search.", "tokens": [51464, 5987, 3164, 13, 51526], "temperature": 0.0, "avg_logprob": + -0.23956664403279623, "compression_ratio": 1.6764705882352942, "no_speech_prob": + 0.011535623110830784}, {"id": 339, "seek": 110544, "start": 1128.68, "end": 1132.48, + "text": " But those don''t play as big of a deal, I believe, as what you''re doing + with your neural", "tokens": [51526, 583, 729, 500, 380, 862, 382, 955, 295, 257, + 2028, 11, 286, 1697, 11, 382, 437, 291, 434, 884, 365, 428, 18161, 51716], "temperature": + 0.0, "avg_logprob": -0.23956664403279623, "compression_ratio": 1.6764705882352942, + "no_speech_prob": 0.011535623110830784}, {"id": 340, "seek": 110544, "start": 1132.48, + "end": 1133.48, "text": " net.", "tokens": [51716, 2533, 13, 51766], "temperature": + 0.0, "avg_logprob": -0.23956664403279623, "compression_ratio": 1.6764705882352942, + "no_speech_prob": 0.011535623110830784}, {"id": 341, "seek": 113348, "start": 1134.0, + "end": 1135.92, "text": " Yeah, that''s interesting.", "tokens": [50390, 865, 11, + 300, 311, 1880, 13, 50486], "temperature": 0.0, "avg_logprob": -0.28061049917469855, + "compression_ratio": 1.5070422535211268, "no_speech_prob": 0.01208804827183485}, + {"id": 342, "seek": 113348, "start": 1135.92, "end": 1143.8, "text": " So, if I + actually take a step back a little bit, can you tell me and our listeners why we", + "tokens": [50486, 407, 11, 498, 286, 767, 747, 257, 1823, 646, 257, 707, 857, 11, + 393, 291, 980, 385, 293, 527, 23274, 983, 321, 50880], "temperature": 0.0, "avg_logprob": + -0.28061049917469855, "compression_ratio": 1.5070422535211268, "no_speech_prob": + 0.01208804827183485}, {"id": 343, 
"seek": 113348, "start": 1143.8, "end": 1148.48, + "text": " cannot do exact KNN, exact KNN nearest neighbors?", "tokens": [50880, + 2644, 360, 1900, 26967, 45, 11, 1900, 26967, 45, 23831, 12512, 30, 51114], "temperature": + 0.0, "avg_logprob": -0.28061049917469855, "compression_ratio": 1.5070422535211268, + "no_speech_prob": 0.01208804827183485}, {"id": 344, "seek": 113348, "start": 1148.48, + "end": 1150.24, "text": " Why do we need to do approximate?", "tokens": [51114, + 1545, 360, 321, 643, 281, 360, 30874, 30, 51202], "temperature": 0.0, "avg_logprob": + -0.28061049917469855, "compression_ratio": 1.5070422535211268, "no_speech_prob": + 0.01208804827183485}, {"id": 345, "seek": 113348, "start": 1150.24, "end": 1153.2, + "text": " What stops us from doing exact?", "tokens": [51202, 708, 10094, 505, 490, + 884, 1900, 30, 51350], "temperature": 0.0, "avg_logprob": -0.28061049917469855, + "compression_ratio": 1.5070422535211268, "no_speech_prob": 0.01208804827183485}, + {"id": 346, "seek": 113348, "start": 1153.2, "end": 1159.4, "text": " So with exact, + okay, so first you can get exact, but that''s just going to be brute force.", "tokens": + [51350, 407, 365, 1900, 11, 1392, 11, 370, 700, 291, 393, 483, 1900, 11, 457, 300, + 311, 445, 516, 281, 312, 47909, 3464, 13, 51660], "temperature": 0.0, "avg_logprob": + -0.28061049917469855, "compression_ratio": 1.5070422535211268, "no_speech_prob": + 0.01208804827183485}, {"id": 347, "seek": 115940, "start": 1159.4, "end": 1164.88, + "text": " Maybe all these libraries, maybe not all of them, but most of them do + have a brute force", "tokens": [50364, 2704, 439, 613, 15148, 11, 1310, 406, 439, + 295, 552, 11, 457, 881, 295, 552, 360, 362, 257, 47909, 3464, 50638], "temperature": + 0.0, "avg_logprob": -0.2188546553902004, "compression_ratio": 1.7730263157894737, + "no_speech_prob": 0.05606098845601082}, {"id": 348, "seek": 115940, "start": 1164.88, + "end": 1166.2800000000002, "text": " search.", "tokens": [50638, 3164, 
13, 50708], + "temperature": 0.0, "avg_logprob": -0.2188546553902004, "compression_ratio": 1.7730263157894737, + "no_speech_prob": 0.05606098845601082}, {"id": 349, "seek": 115940, "start": 1166.2800000000002, + "end": 1168.0, "text": " And then you haven''t solved anything.", "tokens": [50708, + 400, 550, 291, 2378, 380, 13041, 1340, 13, 50794], "temperature": 0.0, "avg_logprob": + -0.2188546553902004, "compression_ratio": 1.7730263157894737, "no_speech_prob": + 0.05606098845601082}, {"id": 350, "seek": 115940, "start": 1168.0, "end": 1172.0800000000002, + "text": " Then you can just use relational database, throw in your numbers and go + by each column,", "tokens": [50794, 1396, 291, 393, 445, 764, 38444, 8149, 11, 3507, + 294, 428, 3547, 293, 352, 538, 1184, 7738, 11, 50998], "temperature": 0.0, "avg_logprob": + -0.2188546553902004, "compression_ratio": 1.7730263157894737, "no_speech_prob": + 0.05606098845601082}, {"id": 351, "seek": 115940, "start": 1172.0800000000002, "end": + 1174.72, "text": " look through each one, see which one''s closest.", "tokens": + [50998, 574, 807, 1184, 472, 11, 536, 597, 472, 311, 13699, 13, 51130], "temperature": + 0.0, "avg_logprob": -0.2188546553902004, "compression_ratio": 1.7730263157894737, + "no_speech_prob": 0.05606098845601082}, {"id": 352, "seek": 115940, "start": 1174.72, + "end": 1177.3600000000001, "text": " Yeah, so you get approximate, that''s where + you get your speed up.", "tokens": [51130, 865, 11, 370, 291, 483, 30874, 11, 300, + 311, 689, 291, 483, 428, 3073, 493, 13, 51262], "temperature": 0.0, "avg_logprob": + -0.2188546553902004, "compression_ratio": 1.7730263157894737, "no_speech_prob": + 0.05606098845601082}, {"id": 353, "seek": 115940, "start": 1177.3600000000001, "end": + 1182.4, "text": " And then with approximate, because you''re doing clustering, you + assume most things are going", "tokens": [51262, 400, 550, 365, 30874, 11, 570, + 291, 434, 884, 596, 48673, 11, 291, 6552, 881, 721, 366, 516, 51514], 
"temperature": + 0.0, "avg_logprob": -0.2188546553902004, "compression_ratio": 1.7730263157894737, + "no_speech_prob": 0.05606098845601082}, {"id": 354, "seek": 115940, "start": 1182.4, + "end": 1183.4, "text": " to be embedded.", "tokens": [51514, 281, 312, 16741, 13, + 51564], "temperature": 0.0, "avg_logprob": -0.2188546553902004, "compression_ratio": + 1.7730263157894737, "no_speech_prob": 0.05606098845601082}, {"id": 355, "seek": + 115940, "start": 1183.4, "end": 1187.8400000000001, "text": " Your neural net, like + if you have a neural net, your embedding layer, your vector is probably", "tokens": + [51564, 2260, 18161, 2533, 11, 411, 498, 291, 362, 257, 18161, 2533, 11, 428, 12240, + 3584, 4583, 11, 428, 8062, 307, 1391, 51786], "temperature": 0.0, "avg_logprob": + -0.2188546553902004, "compression_ratio": 1.7730263157894737, "no_speech_prob": + 0.05606098845601082}, {"id": 356, "seek": 118784, "start": 1187.84, "end": 1190.32, + "text": " like you hope that it''s going to find similarities.", "tokens": [50364, + 411, 291, 1454, 300, 309, 311, 516, 281, 915, 24197, 13, 50488], "temperature": + 0.0, "avg_logprob": -0.21342143764743557, "compression_ratio": 1.876543209876543, + "no_speech_prob": 0.018584705889225006}, {"id": 357, "seek": 118784, "start": 1190.32, + "end": 1194.12, "text": " Like if you have two items that are very similar, your + hope that their distance is not going", "tokens": [50488, 1743, 498, 291, 362, 732, + 4754, 300, 366, 588, 2531, 11, 428, 1454, 300, 641, 4560, 307, 406, 516, 50678], + "temperature": 0.0, "avg_logprob": -0.21342143764743557, "compression_ratio": 1.876543209876543, + "no_speech_prob": 0.018584705889225006}, {"id": 358, "seek": 118784, "start": 1194.12, + "end": 1195.12, "text": " to be far.", "tokens": [50678, 281, 312, 1400, 13, 50728], + "temperature": 0.0, "avg_logprob": -0.21342143764743557, "compression_ratio": 1.876543209876543, + "no_speech_prob": 0.018584705889225006}, {"id": 359, "seek": 118784, "start": 
1195.12, + "end": 1196.8799999999999, "text": " This isn''t always the case.", "tokens": [50728, + 639, 1943, 380, 1009, 264, 1389, 13, 50816], "temperature": 0.0, "avg_logprob": + -0.21342143764743557, "compression_ratio": 1.876543209876543, "no_speech_prob": + 0.018584705889225006}, {"id": 360, "seek": 118784, "start": 1196.8799999999999, + "end": 1200.6399999999999, "text": " A neural net, if you have a photo of a car + and a photo of a car with a bike in the background", "tokens": [50816, 316, 18161, + 2533, 11, 498, 291, 362, 257, 5052, 295, 257, 1032, 293, 257, 5052, 295, 257, 1032, + 365, 257, 5656, 294, 264, 3678, 51004], "temperature": 0.0, "avg_logprob": -0.21342143764743557, + "compression_ratio": 1.876543209876543, "no_speech_prob": 0.018584705889225006}, + {"id": 361, "seek": 118784, "start": 1200.6399999999999, "end": 1204.52, "text": + " in might for some reason, folks on the bike, we don''t really know what''s going + on.", "tokens": [51004, 294, 1062, 337, 512, 1778, 11, 4024, 322, 264, 5656, 11, + 321, 500, 380, 534, 458, 437, 311, 516, 322, 13, 51198], "temperature": 0.0, "avg_logprob": + -0.21342143764743557, "compression_ratio": 1.876543209876543, "no_speech_prob": + 0.018584705889225006}, {"id": 362, "seek": 118784, "start": 1204.52, "end": 1207.76, + "text": " There''s research into seeing what''s actually going on behind the scenes + in the box.", "tokens": [51198, 821, 311, 2132, 666, 2577, 437, 311, 767, 516, 322, + 2261, 264, 8026, 294, 264, 2424, 13, 51360], "temperature": 0.0, "avg_logprob": + -0.21342143764743557, "compression_ratio": 1.876543209876543, "no_speech_prob": + 0.018584705889225006}, {"id": 363, "seek": 118784, "start": 1207.76, "end": 1211.48, + "text": " But yeah, these two might pop out with two completely different values.", + "tokens": [51360, 583, 1338, 11, 613, 732, 1062, 1665, 484, 365, 732, 2584, 819, + 4190, 13, 51546], "temperature": 0.0, "avg_logprob": -0.21342143764743557, "compression_ratio": + 
1.876543209876543, "no_speech_prob": 0.018584705889225006}, {"id": 364, "seek": + 118784, "start": 1211.48, "end": 1216.3999999999999, "text": " They might be in + completely different clusters, even though they kind of should be similar.", "tokens": + [51546, 814, 1062, 312, 294, 2584, 819, 23313, 11, 754, 1673, 436, 733, 295, 820, + 312, 2531, 13, 51792], "temperature": 0.0, "avg_logprob": -0.21342143764743557, + "compression_ratio": 1.876543209876543, "no_speech_prob": 0.018584705889225006}, + {"id": 365, "seek": 121640, "start": 1216.4, "end": 1218.96, "text": " So that''s + where this, it can kind of go wrong.", "tokens": [50364, 407, 300, 311, 689, 341, + 11, 309, 393, 733, 295, 352, 2085, 13, 50492], "temperature": 0.0, "avg_logprob": + -0.2041626433412472, "compression_ratio": 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, + {"id": 366, "seek": 121640, "start": 1218.96, "end": 1221.96, "text": " And then, + see, yeah, you search the wrong cluster, and then you''ll miss that, even though", + "tokens": [50492, 400, 550, 11, 536, 11, 1338, 11, 291, 3164, 264, 2085, 13630, + 11, 293, 550, 291, 603, 1713, 300, 11, 754, 1673, 50642], "temperature": 0.0, "avg_logprob": + -0.2041626433412472, "compression_ratio": 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, + {"id": 367, "seek": 121640, "start": 1221.96, "end": 1224.0, "text": " I was supposed + to be a good choice.", "tokens": [50642, 286, 390, 3442, 281, 312, 257, 665, 3922, + 13, 50744], "temperature": 0.0, "avg_logprob": -0.2041626433412472, "compression_ratio": + 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, {"id": 368, "seek": + 121640, "start": 1224.0, "end": 1227.4, "text": " But then there''s also the aspect + of not searching through everything.", "tokens": [50744, 583, 550, 456, 311, 611, + 264, 4171, 295, 406, 10808, 807, 1203, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.2041626433412472, "compression_ratio": 1.891025641025641, "no_speech_prob": 
0.0004561484674923122}, + {"id": 369, "seek": 121640, "start": 1227.4, "end": 1228.92, "text": " You want + to speed things up.", "tokens": [50914, 509, 528, 281, 3073, 721, 493, 13, 50990], + "temperature": 0.0, "avg_logprob": -0.2041626433412472, "compression_ratio": 1.891025641025641, + "no_speech_prob": 0.0004561484674923122}, {"id": 370, "seek": 121640, "start": 1228.92, + "end": 1234.52, "text": " You search through the top 10 matches, let''s say for + inverted file list, which is centroid", "tokens": [50990, 509, 3164, 807, 264, 1192, + 1266, 10676, 11, 718, 311, 584, 337, 38969, 3991, 1329, 11, 597, 307, 1489, 6490, + 51270], "temperature": 0.0, "avg_logprob": -0.2041626433412472, "compression_ratio": + 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, {"id": 371, "seek": + 121640, "start": 1234.52, "end": 1235.52, "text": " base, the clustering.", "tokens": + [51270, 3096, 11, 264, 596, 48673, 13, 51320], "temperature": 0.0, "avg_logprob": + -0.2041626433412472, "compression_ratio": 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, + {"id": 372, "seek": 121640, "start": 1235.52, "end": 1237.92, "text": " You look + at the top 10 centroid.", "tokens": [51320, 509, 574, 412, 264, 1192, 1266, 1489, + 6490, 13, 51440], "temperature": 0.0, "avg_logprob": -0.2041626433412472, "compression_ratio": + 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, {"id": 373, "seek": + 121640, "start": 1237.92, "end": 1240.5600000000002, "text": " If you look at the + top 10 centroid, and you find those files in there, yeah, they''re", "tokens": [51440, + 759, 291, 574, 412, 264, 1192, 1266, 1489, 6490, 11, 293, 291, 915, 729, 7098, 294, + 456, 11, 1338, 11, 436, 434, 51572], "temperature": 0.0, "avg_logprob": -0.2041626433412472, + "compression_ratio": 1.891025641025641, "no_speech_prob": 0.0004561484674923122}, + {"id": 374, "seek": 121640, "start": 1240.5600000000002, "end": 1241.72, "text": + " going to be similar.", "tokens": [51572, 516, 
281, 312, 2531, 13, 51630], "temperature": + 0.0, "avg_logprob": -0.2041626433412472, "compression_ratio": 1.891025641025641, + "no_speech_prob": 0.0004561484674923122}, {"id": 375, "seek": 121640, "start": 1241.72, + "end": 1245.5600000000002, "text": " But there might be the 11th centroid, might + be a very similar one.", "tokens": [51630, 583, 456, 1062, 312, 264, 2975, 392, + 1489, 6490, 11, 1062, 312, 257, 588, 2531, 472, 13, 51822], "temperature": 0.0, + "avg_logprob": -0.2041626433412472, "compression_ratio": 1.891025641025641, "no_speech_prob": + 0.0004561484674923122}, {"id": 376, "seek": 124556, "start": 1245.56, "end": 1247.6799999999998, + "text": " They''re all by just a tiny bit less.", "tokens": [50364, 814, 434, 439, + 538, 445, 257, 5870, 857, 1570, 13, 50470], "temperature": 0.0, "avg_logprob": -0.2747142450596259, + "compression_ratio": 1.7362637362637363, "no_speech_prob": 0.017481479793787003}, + {"id": 377, "seek": 124556, "start": 1247.6799999999998, "end": 1250.6399999999999, + "text": " And then inside it, it might have the perfect answer.", "tokens": [50470, + 400, 550, 1854, 309, 11, 309, 1062, 362, 264, 2176, 1867, 13, 50618], "temperature": + 0.0, "avg_logprob": -0.2747142450596259, "compression_ratio": 1.7362637362637363, + "no_speech_prob": 0.017481479793787003}, {"id": 378, "seek": 124556, "start": 1250.6399999999999, + "end": 1255.08, "text": " So there is like all of this approximation where you only + look at top X numbers, and then", "tokens": [50618, 407, 456, 307, 411, 439, 295, + 341, 28023, 689, 291, 787, 574, 412, 1192, 1783, 3547, 11, 293, 550, 50840], "temperature": + 0.0, "avg_logprob": -0.2747142450596259, "compression_ratio": 1.7362637362637363, + "no_speech_prob": 0.017481479793787003}, {"id": 379, "seek": 124556, "start": 1255.08, + "end": 1260.72, "text": " also combine with you only look at, you only make so many + clusters, you make X clusters.", "tokens": [50840, 611, 10432, 365, 291, 787, 574, + 412, 11, 291, 
787, 652, 370, 867, 23313, 11, 291, 652, 1783, 23313, 13, 51122], + "temperature": 0.0, "avg_logprob": -0.2747142450596259, "compression_ratio": 1.7362637362637363, + "no_speech_prob": 0.017481479793787003}, {"id": 380, "seek": 124556, "start": 1260.72, + "end": 1263.28, "text": " There''s always going to be outliers out of bounds.", + "tokens": [51122, 821, 311, 1009, 516, 281, 312, 484, 23646, 484, 295, 29905, 13, + 51250], "temperature": 0.0, "avg_logprob": -0.2747142450596259, "compression_ratio": + 1.7362637362637363, "no_speech_prob": 0.017481479793787003}, {"id": 381, "seek": + 124556, "start": 1263.28, "end": 1265.04, "text": " So that''s where you kind of + get that loss.", "tokens": [51250, 407, 300, 311, 689, 291, 733, 295, 483, 300, + 4470, 13, 51338], "temperature": 0.0, "avg_logprob": -0.2747142450596259, "compression_ratio": + 1.7362637362637363, "no_speech_prob": 0.017481479793787003}, {"id": 382, "seek": + 124556, "start": 1265.04, "end": 1269.3999999999999, "text": " Because for the similarity + search in this, leading on this, like, will similarity search", "tokens": [51338, + 1436, 337, 264, 32194, 3164, 294, 341, 11, 5775, 322, 341, 11, 411, 11, 486, 32194, + 3164, 51556], "temperature": 0.0, "avg_logprob": -0.2747142450596259, "compression_ratio": + 1.7362637362637363, "no_speech_prob": 0.017481479793787003}, {"id": 383, "seek": + 124556, "start": 1269.3999999999999, "end": 1271.12, "text": " take over everything?", + "tokens": [51556, 747, 670, 1203, 30, 51642], "temperature": 0.0, "avg_logprob": + -0.2747142450596259, "compression_ratio": 1.7362637362637363, "no_speech_prob": + 0.017481479793787003}, {"id": 384, "seek": 127112, "start": 1271.12, "end": 1274.6799999999998, + "text": " It won''t really, because sometimes you need perfect results.", "tokens": + [50364, 467, 1582, 380, 534, 11, 570, 2171, 291, 643, 2176, 3542, 13, 50542], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + 
"no_speech_prob": 0.08495737612247467}, {"id": 385, "seek": 127112, "start": 1274.6799999999998, + "end": 1277.0, "text": " And similarity search is kind of useless there.", "tokens": + [50542, 400, 32194, 3164, 307, 733, 295, 14115, 456, 13, 50658], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 386, "seek": 127112, "start": 1277.0, + "end": 1281.6799999999998, "text": " It''s going to end up being brute force, and + then with brute force, any algorithm really", "tokens": [50658, 467, 311, 516, 281, + 917, 493, 885, 47909, 3464, 11, 293, 550, 365, 47909, 3464, 11, 604, 9284, 534, + 50892], "temperature": 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.08495737612247467}, {"id": 387, "seek": + 127112, "start": 1281.6799999999998, "end": 1282.6799999999998, "text": " works.", + "tokens": [50892, 1985, 13, 50942], "temperature": 0.0, "avg_logprob": -0.26847011424877026, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.08495737612247467}, + {"id": 388, "seek": 127112, "start": 1282.6799999999998, "end": 1284.6799999999998, + "text": " You''re going to be looking through every single value.", "tokens": [50942, + 509, 434, 516, 281, 312, 1237, 807, 633, 2167, 2158, 13, 51042], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 389, "seek": 127112, "start": 1284.6799999999998, + "end": 1285.6799999999998, "text": " Yeah.", "tokens": [51042, 865, 13, 51092], + "temperature": 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 390, "seek": 127112, "start": 1285.6799999999998, + "end": 1290.3999999999999, "text": " So it''s like complexity wise, it becomes like + big all of n, where n could be like 1 billion,", "tokens": [51092, 
407, 309, 311, + 411, 14024, 10829, 11, 309, 3643, 411, 955, 439, 295, 297, 11, 689, 297, 727, 312, + 411, 502, 5218, 11, 51328], "temperature": 0.0, "avg_logprob": -0.26847011424877026, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.08495737612247467}, + {"id": 391, "seek": 127112, "start": 1290.3999999999999, "end": 1291.3999999999999, + "text": " right?", "tokens": [51328, 558, 30, 51378], "temperature": 0.0, "avg_logprob": + -0.26847011424877026, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.08495737612247467}, {"id": 392, "seek": 127112, "start": 1291.3999999999999, "end": + 1295.0, "text": " Yeah, when you''re in 1 billion, there''s no problem solved anymore + if you look through", "tokens": [51378, 865, 11, 562, 291, 434, 294, 502, 5218, + 11, 456, 311, 572, 1154, 13041, 3602, 498, 291, 574, 807, 51558], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 393, "seek": 127112, "start": 1295.0, + "end": 1296.0, "text": " everything.", "tokens": [51558, 1203, 13, 51608], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 394, "seek": 127112, "start": 1296.0, + "end": 1297.0, "text": " Yeah.", "tokens": [51608, 865, 13, 51658], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 395, "seek": 127112, "start": 1297.0, + "end": 1298.0, "text": " Yeah.", "tokens": [51658, 865, 13, 51708], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.08495737612247467}, {"id": 396, "seek": 127112, "start": 1298.0, + "end": 1299.0, "text": " Yeah.", "tokens": [51708, 865, 13, 51758], "temperature": + 0.0, "avg_logprob": -0.26847011424877026, "compression_ratio": 1.7142857142857142, + 
"no_speech_prob": 0.08495737612247467}, {"id": 397, "seek": 129900, "start": 1299.0, + "end": 1303.08, "text": " That''s why you need to go approximate, but it''s not + like approximate to the level of losing", "tokens": [50364, 663, 311, 983, 291, + 643, 281, 352, 30874, 11, 457, 309, 311, 406, 411, 30874, 281, 264, 1496, 295, 7027, + 50568], "temperature": 0.0, "avg_logprob": -0.26516576451579416, "compression_ratio": + 1.7362637362637363, "no_speech_prob": 0.16967517137527466}, {"id": 398, "seek": + 129900, "start": 1303.08, "end": 1305.84, "text": " like tens of percent in, in, + uh, percent.", "tokens": [50568, 411, 10688, 295, 3043, 294, 11, 294, 11, 2232, + 11, 3043, 13, 50706], "temperature": 0.0, "avg_logprob": -0.26516576451579416, "compression_ratio": + 1.7362637362637363, "no_speech_prob": 0.16967517137527466}, {"id": 399, "seek": + 129900, "start": 1305.84, "end": 1306.84, "text": " Yeah.", "tokens": [50706, 865, + 13, 50756], "temperature": 0.0, "avg_logprob": -0.26516576451579416, "compression_ratio": + 1.7362637362637363, "no_speech_prob": 0.16967517137527466}, {"id": 400, "seek": + 129900, "start": 1306.84, "end": 1311.12, "text": " It''s usually, I would say it''s + usually like around three to the three percent.", "tokens": [50756, 467, 311, 2673, + 11, 286, 576, 584, 309, 311, 2673, 411, 926, 1045, 281, 264, 1045, 3043, 13, 50970], + "temperature": 0.0, "avg_logprob": -0.26516576451579416, "compression_ratio": 1.7362637362637363, + "no_speech_prob": 0.16967517137527466}, {"id": 401, "seek": 129900, "start": 1311.12, + "end": 1316.88, "text": " If you''re doing like a very reasonable, like speed versus + a recall, um, you balance it", "tokens": [50970, 759, 291, 434, 884, 411, 257, 588, + 10585, 11, 411, 3073, 5717, 257, 9901, 11, 1105, 11, 291, 4772, 309, 51258], "temperature": + 0.0, "avg_logprob": -0.26516576451579416, "compression_ratio": 1.7362637362637363, + "no_speech_prob": 0.16967517137527466}, {"id": 402, "seek": 129900, "start": 
1316.88, + "end": 1320.0, "text": " out of it, that''s where you can change the values in the + actual algorithm.", "tokens": [51258, 484, 295, 309, 11, 300, 311, 689, 291, 393, + 1319, 264, 4190, 294, 264, 3539, 9284, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.26516576451579416, "compression_ratio": 1.7362637362637363, "no_speech_prob": + 0.16967517137527466}, {"id": 403, "seek": 129900, "start": 1320.0, "end": 1327.56, + "text": " But if you keep it kind of balanced, usually 97% the average of mostly + what I''ve been seeing.", "tokens": [51414, 583, 498, 291, 1066, 309, 733, 295, + 13902, 11, 2673, 23399, 4, 264, 4274, 295, 5240, 437, 286, 600, 668, 2577, 13, 51792], + "temperature": 0.0, "avg_logprob": -0.26516576451579416, "compression_ratio": 1.7362637362637363, + "no_speech_prob": 0.16967517137527466}, {"id": 404, "seek": 132756, "start": 1327.56, + "end": 1329.0, "text": " So it''s pretty strong.", "tokens": [50364, 407, 309, 311, + 1238, 2068, 13, 50436], "temperature": 0.0, "avg_logprob": -0.23500954688012182, + "compression_ratio": 1.652027027027027, "no_speech_prob": 0.009698078967630863}, + {"id": 405, "seek": 132756, "start": 1329.0, "end": 1333.1599999999999, "text": + " And this is like where, yeah, it''s approximate when you''re dealing with billions + of data,", "tokens": [50436, 400, 341, 307, 411, 689, 11, 1338, 11, 309, 311, 30874, + 562, 291, 434, 6260, 365, 17375, 295, 1412, 11, 50644], "temperature": 0.0, "avg_logprob": + -0.23500954688012182, "compression_ratio": 1.652027027027027, "no_speech_prob": + 0.009698078967630863}, {"id": 406, "seek": 132756, "start": 1333.1599999999999, + "end": 1338.1599999999999, "text": " you don''t really, like, yeah, finding the + exact for some use it use cases is very useful,", "tokens": [50644, 291, 500, 380, + 534, 11, 411, 11, 1338, 11, 5006, 264, 1900, 337, 512, 764, 309, 764, 3331, 307, + 588, 4420, 11, 50894], "temperature": 0.0, "avg_logprob": -0.23500954688012182, + "compression_ratio": 
1.652027027027027, "no_speech_prob": 0.009698078967630863}, + {"id": 407, "seek": 132756, "start": 1338.1599999999999, "end": 1342.1599999999999, + "text": " but usually in the billion scale data range, you''re okay with just getting + a few that are", "tokens": [50894, 457, 2673, 294, 264, 5218, 4373, 1412, 3613, + 11, 291, 434, 1392, 365, 445, 1242, 257, 1326, 300, 366, 51094], "temperature": + 0.0, "avg_logprob": -0.23500954688012182, "compression_ratio": 1.652027027027027, + "no_speech_prob": 0.009698078967630863}, {"id": 408, "seek": 132756, "start": 1342.1599999999999, + "end": 1343.1599999999999, "text": " very close.", "tokens": [51094, 588, 1998, + 13, 51144], "temperature": 0.0, "avg_logprob": -0.23500954688012182, "compression_ratio": + 1.652027027027027, "no_speech_prob": 0.009698078967630863}, {"id": 409, "seek": + 132756, "start": 1343.1599999999999, "end": 1344.1599999999999, "text": " Yeah.", + "tokens": [51144, 865, 13, 51194], "temperature": 0.0, "avg_logprob": -0.23500954688012182, + "compression_ratio": 1.652027027027027, "no_speech_prob": 0.009698078967630863}, + {"id": 410, "seek": 132756, "start": 1344.1599999999999, "end": 1345.1599999999999, + "text": " Yeah.", "tokens": [51194, 865, 13, 51244], "temperature": 0.0, "avg_logprob": + -0.23500954688012182, "compression_ratio": 1.652027027027027, "no_speech_prob": + 0.009698078967630863}, {"id": 411, "seek": 132756, "start": 1345.1599999999999, + "end": 1351.12, "text": " And I mean, I''ve, um, so when I published a blog post + about like all the vector databases,", "tokens": [51244, 400, 286, 914, 11, 286, + 600, 11, 1105, 11, 370, 562, 286, 6572, 257, 6968, 2183, 466, 411, 439, 264, 8062, + 22380, 11, 51542], "temperature": 0.0, "avg_logprob": -0.23500954688012182, "compression_ratio": + 1.652027027027027, "no_speech_prob": 0.009698078967630863}, {"id": 412, "seek": + 132756, "start": 1351.12, "end": 1353.52, "text": " I will make sure to link it + in the notes.", "tokens": [51542, 286, 486, 
652, 988, 281, 2113, 309, 294, 264, + 5570, 13, 51662], "temperature": 0.0, "avg_logprob": -0.23500954688012182, "compression_ratio": + 1.652027027027027, "no_speech_prob": 0.009698078967630863}, {"id": 413, "seek": + 132756, "start": 1353.52, "end": 1356.12, "text": " Um, and, and Mildbus was there + as well.", "tokens": [51662, 3301, 11, 293, 11, 293, 376, 793, 21441, 390, 456, + 382, 731, 13, 51792], "temperature": 0.0, "avg_logprob": -0.23500954688012182, "compression_ratio": + 1.652027027027027, "no_speech_prob": 0.009698078967630863}, {"id": 414, "seek": + 135612, "start": 1356.12, "end": 1362.0, "text": " You know, and I can use somebody + said that they have been actually using no SQL database", "tokens": [50364, 509, + 458, 11, 293, 286, 393, 764, 2618, 848, 300, 436, 362, 668, 767, 1228, 572, 19200, + 8149, 50658], "temperature": 0.0, "avg_logprob": -0.2279913295399059, "compression_ratio": + 1.7469879518072289, "no_speech_prob": 0.013662697747349739}, {"id": 415, "seek": + 135612, "start": 1362.0, "end": 1364.6399999999999, "text": " for genome related + project.", "tokens": [50658, 337, 21953, 4077, 1716, 13, 50790], "temperature": + 0.0, "avg_logprob": -0.2279913295399059, "compression_ratio": 1.7469879518072289, + "no_speech_prob": 0.013662697747349739}, {"id": 416, "seek": 135612, "start": 1364.6399999999999, + "end": 1369.8799999999999, "text": " And so what, what the guy said that he did + is that he actually can pre computed the nearest", "tokens": [50790, 400, 370, 437, + 11, 437, 264, 2146, 848, 300, 415, 630, 307, 300, 415, 767, 393, 659, 40610, 264, + 23831, 51052], "temperature": 0.0, "avg_logprob": -0.2279913295399059, "compression_ratio": + 1.7469879518072289, "no_speech_prob": 0.013662697747349739}, {"id": 417, "seek": + 135612, "start": 1369.8799999999999, "end": 1373.1999999999998, "text": " neighbors + for each individual entry.", "tokens": [51052, 12512, 337, 1184, 2609, 8729, 13, + 51218], "temperature": 0.0, "avg_logprob": 
-0.2279913295399059, "compression_ratio": + 1.7469879518072289, "no_speech_prob": 0.013662697747349739}, {"id": 418, "seek": + 135612, "start": 1373.1999999999998, "end": 1377.7199999999998, "text": " And then + he stored it as, as individual items in the no SQL database.", "tokens": [51218, + 400, 550, 415, 12187, 309, 382, 11, 382, 2609, 4754, 294, 264, 572, 19200, 8149, + 13, 51444], "temperature": 0.0, "avg_logprob": -0.2279913295399059, "compression_ratio": + 1.7469879518072289, "no_speech_prob": 0.013662697747349739}, {"id": 419, "seek": + 135612, "start": 1377.7199999999998, "end": 1382.08, "text": " And so as query came + in, he basically kind of went and kind of asked for each item,", "tokens": [51444, + 400, 370, 382, 14581, 1361, 294, 11, 415, 1936, 733, 295, 1437, 293, 733, 295, 2351, + 337, 1184, 3174, 11, 51662], "temperature": 0.0, "avg_logprob": -0.2279913295399059, + "compression_ratio": 1.7469879518072289, "no_speech_prob": 0.013662697747349739}, + {"id": 420, "seek": 135612, "start": 1382.08, "end": 1384.6799999999998, "text": + " okay, what''s your neighbors, right?", "tokens": [51662, 1392, 11, 437, 311, 428, + 12512, 11, 558, 30, 51792], "temperature": 0.0, "avg_logprob": -0.2279913295399059, + "compression_ratio": 1.7469879518072289, "no_speech_prob": 0.013662697747349739}, + {"id": 421, "seek": 138468, "start": 1384.68, "end": 1390.24, "text": " And, and + then he said on, on, on small scale, this worked fine, but he wouldn''t necessarily", + "tokens": [50364, 400, 11, 293, 550, 415, 848, 322, 11, 322, 11, 322, 1359, 4373, + 11, 341, 2732, 2489, 11, 457, 415, 2759, 380, 4725, 50642], "temperature": 0.0, + "avg_logprob": -0.18661177627683626, "compression_ratio": 1.5863453815261044, "no_speech_prob": + 0.00018995991558767855}, {"id": 422, "seek": 138468, "start": 1390.24, "end": 1393.4, + "text": " use this on the kind of next level, right?", "tokens": [50642, 764, 341, + 322, 264, 733, 295, 958, 1496, 11, 558, 30, 50800], "temperature": 0.0, 
"avg_logprob": + -0.18661177627683626, "compression_ratio": 1.5863453815261044, "no_speech_prob": + 0.00018995991558767855}, {"id": 423, "seek": 138468, "start": 1393.4, "end": 1394.4, + "text": " Yeah.", "tokens": [50800, 865, 13, 50850], "temperature": 0.0, "avg_logprob": + -0.18661177627683626, "compression_ratio": 1.5863453815261044, "no_speech_prob": + 0.00018995991558767855}, {"id": 424, "seek": 138468, "start": 1394.4, "end": 1398.5600000000002, + "text": " And so can you tell me more like how Mildbus is done?", "tokens": [50850, + 400, 370, 393, 291, 980, 385, 544, 411, 577, 376, 793, 21441, 307, 1096, 30, 51058], + "temperature": 0.0, "avg_logprob": -0.18661177627683626, "compression_ratio": 1.5863453815261044, + "no_speech_prob": 0.00018995991558767855}, {"id": 425, "seek": 138468, "start": + 1398.5600000000002, "end": 1403.68, "text": " What is, what is it as a product, + let''s say, uh, and, and what''s included inside?", "tokens": [51058, 708, 307, + 11, 437, 307, 309, 382, 257, 1674, 11, 718, 311, 584, 11, 2232, 11, 293, 11, 293, + 437, 311, 5556, 1854, 30, 51314], "temperature": 0.0, "avg_logprob": -0.18661177627683626, + "compression_ratio": 1.5863453815261044, "no_speech_prob": 0.00018995991558767855}, + {"id": 426, "seek": 138468, "start": 1403.68, "end": 1405.16, "text": " What can + I get as a user?", "tokens": [51314, 708, 393, 286, 483, 382, 257, 4195, 30, 51388], + "temperature": 0.0, "avg_logprob": -0.18661177627683626, "compression_ratio": 1.5863453815261044, + "no_speech_prob": 0.00018995991558767855}, {"id": 427, "seek": 138468, "start": + 1405.16, "end": 1406.16, "text": " Yeah, sure.", "tokens": [51388, 865, 11, 988, + 13, 51438], "temperature": 0.0, "avg_logprob": -0.18661177627683626, "compression_ratio": + 1.5863453815261044, "no_speech_prob": 0.00018995991558767855}, {"id": 428, "seek": + 138468, "start": 1406.16, "end": 1412.16, "text": " So, um, yeah, Mildbus, we kind + of built it as a database first, similar research", "tokens": 
[51438, 407, 11, 1105, + 11, 1338, 11, 376, 793, 21441, 11, 321, 733, 295, 3094, 309, 382, 257, 8149, 700, + 11, 2531, 2132, 51738], "temperature": 0.0, "avg_logprob": -0.18661177627683626, + "compression_ratio": 1.5863453815261044, "no_speech_prob": 0.00018995991558767855}, + {"id": 429, "seek": 141216, "start": 1412.16, "end": 1419.3600000000001, "text": + " second, where everyone''s collecting a bunch of data, a bunch of vectors, everyone''s + hoarding", "tokens": [50364, 1150, 11, 689, 1518, 311, 12510, 257, 3840, 295, 1412, + 11, 257, 3840, 295, 18875, 11, 1518, 311, 45940, 278, 50724], "temperature": 0.0, + "avg_logprob": -0.21822354243351863, "compression_ratio": 1.782918149466192, "no_speech_prob": + 0.05765970051288605}, {"id": 430, "seek": 141216, "start": 1419.3600000000001, "end": + 1423.2, "text": " all their data and they have, they''re making their neural nets, + they''re all getting embeddings,", "tokens": [50724, 439, 641, 1412, 293, 436, 362, + 11, 436, 434, 1455, 641, 18161, 36170, 11, 436, 434, 439, 1242, 12240, 29432, 11, + 50916], "temperature": 0.0, "avg_logprob": -0.21822354243351863, "compression_ratio": + 1.782918149466192, "no_speech_prob": 0.05765970051288605}, {"id": 431, "seek": 141216, + "start": 1423.2, "end": 1424.88, "text": " but then like, what''s next?", "tokens": + [50916, 457, 550, 411, 11, 437, 311, 958, 30, 51000], "temperature": 0.0, "avg_logprob": + -0.21822354243351863, "compression_ratio": 1.782918149466192, "no_speech_prob": + 0.05765970051288605}, {"id": 432, "seek": 141216, "start": 1424.88, "end": 1426.28, + "text": " You need to do something with that data.", "tokens": [51000, 509, 643, + 281, 360, 746, 365, 300, 1412, 13, 51070], "temperature": 0.0, "avg_logprob": -0.21822354243351863, + "compression_ratio": 1.782918149466192, "no_speech_prob": 0.05765970051288605}, + {"id": 433, "seek": 141216, "start": 1426.28, "end": 1427.68, "text": " So that''s + where against similar research.", "tokens": [51070, 407, 300, 
311, 689, 1970, 2531, + 2132, 13, 51140], "temperature": 0.0, "avg_logprob": -0.21822354243351863, "compression_ratio": + 1.782918149466192, "no_speech_prob": 0.05765970051288605}, {"id": 434, "seek": 141216, + "start": 1427.68, "end": 1431.0, "text": " Yeah, what we''re doing is building up + a database system.", "tokens": [51140, 865, 11, 437, 321, 434, 884, 307, 2390, 493, + 257, 8149, 1185, 13, 51306], "temperature": 0.0, "avg_logprob": -0.21822354243351863, + "compression_ratio": 1.782918149466192, "no_speech_prob": 0.05765970051288605}, + {"id": 435, "seek": 141216, "start": 1431.0, "end": 1436.8400000000001, "text": + " So right now with version 2.0, we''re really working on making a cloud, uh, native, + something", "tokens": [51306, 407, 558, 586, 365, 3037, 568, 13, 15, 11, 321, 434, + 534, 1364, 322, 1455, 257, 4588, 11, 2232, 11, 8470, 11, 746, 51598], "temperature": + 0.0, "avg_logprob": -0.21822354243351863, "compression_ratio": 1.782918149466192, + "no_speech_prob": 0.05765970051288605}, {"id": 436, "seek": 141216, "start": 1436.8400000000001, + "end": 1440.72, "text": " scalable, something fast and something easy to use.", + "tokens": [51598, 38481, 11, 746, 2370, 293, 746, 1858, 281, 764, 13, 51792], "temperature": + 0.0, "avg_logprob": -0.21822354243351863, "compression_ratio": 1.782918149466192, + "no_speech_prob": 0.05765970051288605}, {"id": 437, "seek": 144072, "start": 1440.72, + "end": 1445.84, "text": " So we, you can think of it pretty much as a MySQL database + and just for vectors.", "tokens": [50364, 407, 321, 11, 291, 393, 519, 295, 309, + 1238, 709, 382, 257, 1222, 39934, 8149, 293, 445, 337, 18875, 13, 50620], "temperature": + 0.0, "avg_logprob": -0.16057920055229122, "compression_ratio": 1.7896825396825398, + "no_speech_prob": 0.0004333755641710013}, {"id": 438, "seek": 144072, "start": 1445.84, + "end": 1451.68, "text": " And then in that regard, you have the cred operations, + you have sharding, you have all of", "tokens": [50620, 
400, 550, 294, 300, 3843, + 11, 291, 362, 264, 3864, 7705, 11, 291, 362, 402, 515, 278, 11, 291, 362, 439, 295, + 50912], "temperature": 0.0, "avg_logprob": -0.16057920055229122, "compression_ratio": + 1.7896825396825398, "no_speech_prob": 0.0004333755641710013}, {"id": 439, "seek": + 144072, "start": 1451.68, "end": 1457.2, "text": " this, all these operations and + we''re kind of building up that for vector itself.", "tokens": [50912, 341, 11, + 439, 613, 7705, 293, 321, 434, 733, 295, 2390, 493, 300, 337, 8062, 2564, 13, 51188], + "temperature": 0.0, "avg_logprob": -0.16057920055229122, "compression_ratio": 1.7896825396825398, + "no_speech_prob": 0.0004333755641710013}, {"id": 440, "seek": 144072, "start": 1457.2, + "end": 1461.16, "text": " And then later on, we''re going to be building up other + parts that branch off to kind of make", "tokens": [51188, 400, 550, 1780, 322, 11, + 321, 434, 516, 281, 312, 2390, 493, 661, 3166, 300, 9819, 766, 281, 733, 295, 652, + 51386], "temperature": 0.0, "avg_logprob": -0.16057920055229122, "compression_ratio": + 1.7896825396825398, "no_speech_prob": 0.0004333755641710013}, {"id": 441, "seek": + 144072, "start": 1461.16, "end": 1462.16, "text": " those vectors.", "tokens": [51386, + 729, 18875, 13, 51436], "temperature": 0.0, "avg_logprob": -0.16057920055229122, + "compression_ratio": 1.7896825396825398, "no_speech_prob": 0.0004333755641710013}, + {"id": 442, "seek": 144072, "start": 1462.16, "end": 1466.52, "text": " So we''re + kind of, it''s this core to our entire pipeline for dealing with similarity", "tokens": + [51436, 407, 321, 434, 733, 295, 11, 309, 311, 341, 4965, 281, 527, 2302, 15517, + 337, 6260, 365, 32194, 51654], "temperature": 0.0, "avg_logprob": -0.16057920055229122, + "compression_ratio": 1.7896825396825398, "no_speech_prob": 0.0004333755641710013}, + {"id": 443, "seek": 144072, "start": 1466.52, "end": 1468.32, "text": " search.", + "tokens": [51654, 3164, 13, 51744], "temperature": 0.0, "avg_logprob": 
-0.16057920055229122, + "compression_ratio": 1.7896825396825398, "no_speech_prob": 0.0004333755641710013}, + {"id": 444, "seek": 146832, "start": 1468.32, "end": 1472.1599999999999, "text": + " And yeah, that''s kind of what it is.", "tokens": [50364, 400, 1338, 11, 300, + 311, 733, 295, 437, 309, 307, 13, 50556], "temperature": 0.0, "avg_logprob": -0.19916470845540366, + "compression_ratio": 1.7106382978723405, "no_speech_prob": 0.020923268049955368}, + {"id": 445, "seek": 146832, "start": 1472.1599999999999, "end": 1477.04, "text": + " In terms of like the actions you can do with it, you can do storing, you can updating,", + "tokens": [50556, 682, 2115, 295, 411, 264, 5909, 291, 393, 360, 365, 309, 11, 291, + 393, 360, 26085, 11, 291, 393, 25113, 11, 50800], "temperature": 0.0, "avg_logprob": + -0.19916470845540366, "compression_ratio": 1.7106382978723405, "no_speech_prob": + 0.020923268049955368}, {"id": 446, "seek": 146832, "start": 1477.04, "end": 1481.56, + "text": " as I said, partitioning, sharding, we''re adding scalar filtering in right + now.", "tokens": [50800, 382, 286, 848, 11, 24808, 278, 11, 402, 515, 278, 11, 321, + 434, 5127, 39684, 30822, 294, 558, 586, 13, 51026], "temperature": 0.0, "avg_logprob": + -0.19916470845540366, "compression_ratio": 1.7106382978723405, "no_speech_prob": + 0.020923268049955368}, {"id": 447, "seek": 146832, "start": 1481.56, "end": 1483.8799999999999, + "text": " It''s with INS, but I think this month.", "tokens": [51026, 467, 311, + 365, 6892, 50, 11, 457, 286, 519, 341, 1618, 13, 51142], "temperature": 0.0, "avg_logprob": + -0.19916470845540366, "compression_ratio": 1.7106382978723405, "no_speech_prob": + 0.020923268049955368}, {"id": 448, "seek": 146832, "start": 1483.8799999999999, + "end": 1489.0, "text": " So in the next week, I believe we''re going to be having + strings for a scalar filtering.", "tokens": [51142, 407, 294, 264, 958, 1243, 11, + 286, 1697, 321, 434, 516, 281, 312, 1419, 13985, 337, 257, 39684, 
30822, 13, 51398], + "temperature": 0.0, "avg_logprob": -0.19916470845540366, "compression_ratio": 1.7106382978723405, + "no_speech_prob": 0.020923268049955368}, {"id": 449, "seek": 146832, "start": 1489.0, + "end": 1493.9199999999998, "text": " What scalar filtering is, is being able to + filter results in a fast way.", "tokens": [51398, 708, 39684, 30822, 307, 11, 307, + 885, 1075, 281, 6608, 3542, 294, 257, 2370, 636, 13, 51644], "temperature": 0.0, + "avg_logprob": -0.19916470845540366, "compression_ratio": 1.7106382978723405, "no_speech_prob": + 0.020923268049955368}, {"id": 450, "seek": 149392, "start": 1493.92, "end": 1498.68, + "text": " So instead of searching through everything and then filtering out these + certain things,", "tokens": [50364, 407, 2602, 295, 10808, 807, 1203, 293, 550, + 30822, 484, 613, 1629, 721, 11, 50602], "temperature": 0.0, "avg_logprob": -0.1762977502284906, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.05225497484207153}, + {"id": 451, "seek": 149392, "start": 1498.68, "end": 1503.48, "text": " you kind + of apply the filter first or apply the filter during the search to speed everything", + "tokens": [50602, 291, 733, 295, 3079, 264, 6608, 700, 420, 3079, 264, 6608, 1830, + 264, 3164, 281, 3073, 1203, 50842], "temperature": 0.0, "avg_logprob": -0.1762977502284906, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.05225497484207153}, + {"id": 452, "seek": 149392, "start": 1503.48, "end": 1506.3200000000002, "text": + " up and also get more accurate results.", "tokens": [50842, 493, 293, 611, 483, + 544, 8559, 3542, 13, 50984], "temperature": 0.0, "avg_logprob": -0.1762977502284906, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.05225497484207153}, + {"id": 453, "seek": 149392, "start": 1506.3200000000002, "end": 1512.0800000000002, + "text": " So with, let''s say you have a vector and then there''s a filter that + says, glass is equal", "tokens": [50984, 407, 365, 11, 718, 311, 584, 
291, 362, + 257, 8062, 293, 550, 456, 311, 257, 6608, 300, 1619, 11, 4276, 307, 2681, 51272], + "temperature": 0.0, "avg_logprob": -0.1762977502284906, "compression_ratio": 1.8333333333333333, + "no_speech_prob": 0.05225497484207153}, {"id": 454, "seek": 149392, "start": 1512.0800000000002, + "end": 1513.0800000000002, "text": " true.", "tokens": [51272, 2074, 13, 51322], + "temperature": 0.0, "avg_logprob": -0.1762977502284906, "compression_ratio": 1.8333333333333333, + "no_speech_prob": 0.05225497484207153}, {"id": 455, "seek": 149392, "start": 1513.0800000000002, + "end": 1517.6000000000001, "text": " You can look for every single vector that has + glasses equal true.", "tokens": [51322, 509, 393, 574, 337, 633, 2167, 8062, 300, + 575, 10812, 2681, 2074, 13, 51548], "temperature": 0.0, "avg_logprob": -0.1762977502284906, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.05225497484207153}, + {"id": 456, "seek": 149392, "start": 1517.6000000000001, "end": 1520.72, "text": + " And that''s very useful and something that everyone''s been looking for.", "tokens": + [51548, 400, 300, 311, 588, 4420, 293, 746, 300, 1518, 311, 668, 1237, 337, 13, + 51704], "temperature": 0.0, "avg_logprob": -0.1762977502284906, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.05225497484207153}, {"id": 457, "seek": + 149392, "start": 1520.72, "end": 1522.44, "text": " But yeah, it''s a database first.", + "tokens": [51704, 583, 1338, 11, 309, 311, 257, 8149, 700, 13, 51790], "temperature": + 0.0, "avg_logprob": -0.1762977502284906, "compression_ratio": 1.8333333333333333, + "no_speech_prob": 0.05225497484207153}, {"id": 458, "seek": 152244, "start": 1522.44, + "end": 1527.76, "text": " And then for the actual searching, we employ all these + libraries that we produced and", "tokens": [50364, 400, 550, 337, 264, 3539, 10808, + 11, 321, 3188, 439, 613, 15148, 300, 321, 7126, 293, 50630], "temperature": 0.0, + "avg_logprob": -0.1932563643524612, 
"compression_ratio": 1.8027681660899655, "no_speech_prob": + 0.09691616892814636}, {"id": 459, "seek": 152244, "start": 1527.76, "end": 1534.68, + "text": " mentioned, us, annoy, phase, hnsw, all these guys to build these indexes.", + "tokens": [50630, 2835, 11, 505, 11, 8759, 11, 5574, 11, 276, 3695, 86, 11, 439, + 613, 1074, 281, 1322, 613, 8186, 279, 13, 50976], "temperature": 0.0, "avg_logprob": + -0.1932563643524612, "compression_ratio": 1.8027681660899655, "no_speech_prob": + 0.09691616892814636}, {"id": 460, "seek": 152244, "start": 1534.68, "end": 1536.72, + "text": " And then you can select whichever one you want.", "tokens": [50976, 400, + 550, 291, 393, 3048, 24123, 472, 291, 528, 13, 51078], "temperature": 0.0, "avg_logprob": + -0.1932563643524612, "compression_ratio": 1.8027681660899655, "no_speech_prob": + 0.09691616892814636}, {"id": 461, "seek": 152244, "start": 1536.72, "end": 1538.28, + "text": " You can use multiple.", "tokens": [51078, 509, 393, 764, 3866, 13, 51156], + "temperature": 0.0, "avg_logprob": -0.1932563643524612, "compression_ratio": 1.8027681660899655, + "no_speech_prob": 0.09691616892814636}, {"id": 462, "seek": 152244, "start": 1538.28, + "end": 1542.72, "text": " Sometimes some will work better for images or if you''re + neural nets working in some way,", "tokens": [51156, 4803, 512, 486, 589, 1101, + 337, 5267, 420, 498, 291, 434, 18161, 36170, 1364, 294, 512, 636, 11, 51378], "temperature": + 0.0, "avg_logprob": -0.1932563643524612, "compression_ratio": 1.8027681660899655, + "no_speech_prob": 0.09691616892814636}, {"id": 463, "seek": 152244, "start": 1542.72, + "end": 1543.92, "text": " it might work better with this one.", "tokens": [51378, + 309, 1062, 589, 1101, 365, 341, 472, 13, 51438], "temperature": 0.0, "avg_logprob": + -0.1932563643524612, "compression_ratio": 1.8027681660899655, "no_speech_prob": + 0.09691616892814636}, {"id": 464, "seek": 152244, "start": 1543.92, "end": 1548.04, + "text": " So you can store multiple 
of these indexes, decide, test pretty easily, + and mess around", "tokens": [51438, 407, 291, 393, 3531, 3866, 295, 613, 8186, 279, + 11, 4536, 11, 1500, 1238, 3612, 11, 293, 2082, 926, 51644], "temperature": 0.0, + "avg_logprob": -0.1932563643524612, "compression_ratio": 1.8027681660899655, "no_speech_prob": + 0.09691616892814636}, {"id": 465, "seek": 152244, "start": 1548.04, "end": 1549.04, + "text": " with it.", "tokens": [51644, 365, 309, 13, 51694], "temperature": 0.0, + "avg_logprob": -0.1932563643524612, "compression_ratio": 1.8027681660899655, "no_speech_prob": + 0.09691616892814636}, {"id": 466, "seek": 152244, "start": 1549.04, "end": 1552.16, + "text": " And once you''re done, you select what you want and you call it a day.", + "tokens": [51694, 400, 1564, 291, 434, 1096, 11, 291, 3048, 437, 291, 528, 293, + 291, 818, 309, 257, 786, 13, 51850], "temperature": 0.0, "avg_logprob": -0.1932563643524612, + "compression_ratio": 1.8027681660899655, "no_speech_prob": 0.09691616892814636}, + {"id": 467, "seek": 155216, "start": 1552.16, "end": 1554.44, "text": " And you + search and you get results.", "tokens": [50364, 400, 291, 3164, 293, 291, 483, 3542, + 13, 50478], "temperature": 0.0, "avg_logprob": -0.27476676734718114, "compression_ratio": + 1.6270491803278688, "no_speech_prob": 0.017103565856814384}, {"id": 468, "seek": + 155216, "start": 1554.44, "end": 1555.44, "text": " Yeah.", "tokens": [50478, 865, + 13, 50528], "temperature": 0.0, "avg_logprob": -0.27476676734718114, "compression_ratio": + 1.6270491803278688, "no_speech_prob": 0.017103565856814384}, {"id": 469, "seek": + 155216, "start": 1555.44, "end": 1561.0, "text": " Actually, when I was watching + one presentation by your colleagues at Haystack, we will make", "tokens": [50528, + 5135, 11, 562, 286, 390, 1976, 472, 5860, 538, 428, 7734, 412, 8721, 372, 501, 11, + 321, 486, 652, 50806], "temperature": 0.0, "avg_logprob": -0.27476676734718114, + "compression_ratio": 1.6270491803278688, 
"no_speech_prob": 0.017103565856814384}, + {"id": 470, "seek": 155216, "start": 1561.0, "end": 1562.68, "text": " sure to link + this as well.", "tokens": [50806, 988, 281, 2113, 341, 382, 731, 13, 50890], "temperature": + 0.0, "avg_logprob": -0.27476676734718114, "compression_ratio": 1.6270491803278688, + "no_speech_prob": 0.017103565856814384}, {"id": 471, "seek": 155216, "start": 1562.68, + "end": 1567.0, "text": " Like this got my eye besides, you know, the horizontal + scaling that other databases as", "tokens": [50890, 1743, 341, 658, 452, 3313, 11868, + 11, 291, 458, 11, 264, 12750, 21589, 300, 661, 22380, 382, 51106], "temperature": + 0.0, "avg_logprob": -0.27476676734718114, "compression_ratio": 1.6270491803278688, + "no_speech_prob": 0.017103565856814384}, {"id": 472, "seek": 155216, "start": 1567.0, + "end": 1568.0, "text": " well have.", "tokens": [51106, 731, 362, 13, 51156], "temperature": + 0.0, "avg_logprob": -0.27476676734718114, "compression_ratio": 1.6270491803278688, + "no_speech_prob": 0.017103565856814384}, {"id": 473, "seek": 155216, "start": 1568.0, + "end": 1571.8400000000001, "text": " Well, maybe not all of them, but some most + of them.", "tokens": [51156, 1042, 11, 1310, 406, 439, 295, 552, 11, 457, 512, 881, + 295, 552, 13, 51348], "temperature": 0.0, "avg_logprob": -0.27476676734718114, "compression_ratio": + 1.6270491803278688, "no_speech_prob": 0.017103565856814384}, {"id": 474, "seek": + 155216, "start": 1571.8400000000001, "end": 1576.92, "text": " But, you know, one + thing that caught my eye was that I can indexes, you said, the data", "tokens": + [51348, 583, 11, 291, 458, 11, 472, 551, 300, 5415, 452, 3313, 390, 300, 286, 393, + 8186, 279, 11, 291, 848, 11, 264, 1412, 51602], "temperature": 0.0, "avg_logprob": + -0.27476676734718114, "compression_ratio": 1.6270491803278688, "no_speech_prob": + 0.017103565856814384}, {"id": 475, "seek": 157692, "start": 1576.92, "end": 1583.6000000000001, + "text": " using different index 
layouts, essentially different algorithms that you + alluded to earlier.", "tokens": [50364, 1228, 819, 8186, 46100, 11, 4476, 819, 14642, + 300, 291, 33919, 281, 3071, 13, 50698], "temperature": 0.0, "avg_logprob": -0.20521903038024902, + "compression_ratio": 1.7719298245614035, "no_speech_prob": 0.0494510680437088}, + {"id": 476, "seek": 157692, "start": 1583.6000000000001, "end": 1588.4, "text": + " And then I can somehow test and kind of figure out which one works better.", "tokens": + [50698, 400, 550, 286, 393, 6063, 1500, 293, 733, 295, 2573, 484, 597, 472, 1985, + 1101, 13, 50938], "temperature": 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": + 1.7719298245614035, "no_speech_prob": 0.0494510680437088}, {"id": 477, "seek": 157692, + "start": 1588.4, "end": 1589.4, "text": " Is that right?", "tokens": [50938, 1119, + 300, 558, 30, 50988], "temperature": 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": + 1.7719298245614035, "no_speech_prob": 0.0494510680437088}, {"id": 478, "seek": 157692, + "start": 1589.4, "end": 1593.76, "text": " Yeah, pretty much so we do have benchmarking + algorithms, but you can also benchmark yourself", "tokens": [50988, 865, 11, 1238, + 709, 370, 321, 360, 362, 18927, 278, 14642, 11, 457, 291, 393, 611, 18927, 1803, + 51206], "temperature": 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": + 1.7719298245614035, "no_speech_prob": 0.0494510680437088}, {"id": 479, "seek": 157692, + "start": 1593.76, "end": 1594.92, "text": " as well.", "tokens": [51206, 382, 731, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": + 1.7719298245614035, "no_speech_prob": 0.0494510680437088}, {"id": 480, "seek": 157692, + "start": 1594.92, "end": 1599.16, "text": " But the way is you can build up also + every single index has its own parameters and you", "tokens": [51264, 583, 264, + 636, 307, 291, 393, 1322, 493, 611, 633, 2167, 8186, 575, 1080, 1065, 9834, 293, + 291, 
51476], "temperature": 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": + 1.7719298245614035, "no_speech_prob": 0.0494510680437088}, {"id": 481, "seek": 157692, + "start": 1599.16, "end": 1600.6000000000001, "text": " can just constantly build + up more.", "tokens": [51476, 393, 445, 6460, 1322, 493, 544, 13, 51548], "temperature": + 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": 1.7719298245614035, + "no_speech_prob": 0.0494510680437088}, {"id": 482, "seek": 157692, "start": 1600.6000000000001, + "end": 1605.04, "text": " You can like build up 10 of parameters change this way + or 10 of just completely different", "tokens": [51548, 509, 393, 411, 1322, 493, + 1266, 295, 9834, 1319, 341, 636, 420, 1266, 295, 445, 2584, 819, 51770], "temperature": + 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": 1.7719298245614035, + "no_speech_prob": 0.0494510680437088}, {"id": 483, "seek": 157692, "start": 1605.04, + "end": 1606.04, "text": " indexes.", "tokens": [51770, 8186, 279, 13, 51820], "temperature": + 0.0, "avg_logprob": -0.20521903038024902, "compression_ratio": 1.7719298245614035, + "no_speech_prob": 0.0494510680437088}, {"id": 484, "seek": 160604, "start": 1606.04, + "end": 1610.12, "text": " And then you perform a search with the same vector for + each index because when you search,", "tokens": [50364, 400, 550, 291, 2042, 257, + 3164, 365, 264, 912, 8062, 337, 1184, 8186, 570, 562, 291, 3164, 11, 50568], "temperature": + 0.0, "avg_logprob": -0.19483541351517822, "compression_ratio": 1.8650306748466257, + "no_speech_prob": 0.016312668099999428}, {"id": 485, "seek": 160604, "start": 1610.12, + "end": 1612.32, "text": " you can select which index you want to use.", "tokens": + [50568, 291, 393, 3048, 597, 8186, 291, 528, 281, 764, 13, 50678], "temperature": + 0.0, "avg_logprob": -0.19483541351517822, "compression_ratio": 1.8650306748466257, + "no_speech_prob": 0.016312668099999428}, {"id": 486, "seek": 160604, "start": 
1612.32, + "end": 1616.72, "text": " So you can just take that search, throw it in through + every single index, see the results.", "tokens": [50678, 407, 291, 393, 445, 747, + 300, 3164, 11, 3507, 309, 294, 807, 633, 2167, 8186, 11, 536, 264, 3542, 13, 50898], + "temperature": 0.0, "avg_logprob": -0.19483541351517822, "compression_ratio": 1.8650306748466257, + "no_speech_prob": 0.016312668099999428}, {"id": 487, "seek": 160604, "start": 1616.72, + "end": 1620.1599999999999, "text": " And then if you have a baseline data where + you already have it labeled and you know what", "tokens": [50898, 400, 550, 498, + 291, 362, 257, 20518, 1412, 689, 291, 1217, 362, 309, 21335, 293, 291, 458, 437, + 51070], "temperature": 0.0, "avg_logprob": -0.19483541351517822, "compression_ratio": + 1.8650306748466257, "no_speech_prob": 0.016312668099999428}, {"id": 488, "seek": + 160604, "start": 1620.1599999999999, "end": 1622.52, "text": " results you should + be getting from a brute force.", "tokens": [51070, 3542, 291, 820, 312, 1242, 490, + 257, 47909, 3464, 13, 51188], "temperature": 0.0, "avg_logprob": -0.19483541351517822, + "compression_ratio": 1.8650306748466257, "no_speech_prob": 0.016312668099999428}, + {"id": 489, "seek": 160604, "start": 1622.52, "end": 1626.24, "text": " So when + we do these benchmarks, it''s always compared to brute force because brute force", + "tokens": [51188, 407, 562, 321, 360, 613, 43751, 11, 309, 311, 1009, 5347, 281, + 47909, 3464, 570, 47909, 3464, 51374], "temperature": 0.0, "avg_logprob": -0.19483541351517822, + "compression_ratio": 1.8650306748466257, "no_speech_prob": 0.016312668099999428}, + {"id": 490, "seek": 160604, "start": 1626.24, "end": 1628.08, "text": " will give + you the exact answers.", "tokens": [51374, 486, 976, 291, 264, 1900, 6338, 13, 51466], + "temperature": 0.0, "avg_logprob": -0.19483541351517822, "compression_ratio": 1.8650306748466257, + "no_speech_prob": 0.016312668099999428}, {"id": 491, "seek": 160604, "start": 
1628.08, + "end": 1631.6399999999999, "text": " And from there, you can kind of see, okay, + how many hits did I get, how many did I miss,", "tokens": [51466, 400, 490, 456, + 11, 291, 393, 733, 295, 536, 11, 1392, 11, 577, 867, 8664, 630, 286, 483, 11, 577, + 867, 630, 286, 1713, 11, 51644], "temperature": 0.0, "avg_logprob": -0.19483541351517822, + "compression_ratio": 1.8650306748466257, "no_speech_prob": 0.016312668099999428}, + {"id": 492, "seek": 160604, "start": 1631.6399999999999, "end": 1632.8799999999999, + "text": " and see what your recall rate is.", "tokens": [51644, 293, 536, 437, 428, + 9901, 3314, 307, 13, 51706], "temperature": 0.0, "avg_logprob": -0.19483541351517822, + "compression_ratio": 1.8650306748466257, "no_speech_prob": 0.016312668099999428}, + {"id": 493, "seek": 163288, "start": 1632.88, "end": 1641.2, "text": " And then + you can also time these things as well because some parameters, if you make 10,000", + "tokens": [50364, 400, 550, 291, 393, 611, 565, 613, 721, 382, 731, 570, 512, 9834, + 11, 498, 291, 652, 1266, 11, 1360, 50780], "temperature": 0.0, "avg_logprob": -0.182239259992327, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.008399716578423977}, + {"id": 494, "seek": 163288, "start": 1641.2, "end": 1645.0800000000002, "text": + " clusters within your data, that''s going to take a bit to search if you want to + search", "tokens": [50780, 23313, 1951, 428, 1412, 11, 300, 311, 516, 281, 747, + 257, 857, 281, 3164, 498, 291, 528, 281, 3164, 50974], "temperature": 0.0, "avg_logprob": + -0.182239259992327, "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.008399716578423977}, + {"id": 495, "seek": 163288, "start": 1645.0800000000002, "end": 1646.0800000000002, + "text": " through every single one.", "tokens": [50974, 807, 633, 2167, 472, 13, + 51024], "temperature": 0.0, "avg_logprob": -0.182239259992327, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.008399716578423977}, {"id": 496, "seek": 
+ 163288, "start": 1646.0800000000002, "end": 1650.8400000000001, "text": " So you + time it and then you can kind of get this ratio of speech performance or we usually", + "tokens": [51024, 407, 291, 565, 309, 293, 550, 291, 393, 733, 295, 483, 341, 8509, + 295, 6218, 3389, 420, 321, 2673, 51262], "temperature": 0.0, "avg_logprob": -0.182239259992327, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.008399716578423977}, + {"id": 497, "seek": 163288, "start": 1650.8400000000001, "end": 1652.68, "text": + " say like speech recall.", "tokens": [51262, 584, 411, 6218, 9901, 13, 51354], + "temperature": 0.0, "avg_logprob": -0.182239259992327, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.008399716578423977}, {"id": 498, "seek": 163288, "start": 1652.68, + "end": 1657.0, "text": " But yeah, so you can build up all of them then go from + there kind of if one doesn''t work,", "tokens": [51354, 583, 1338, 11, 370, 291, + 393, 1322, 493, 439, 295, 552, 550, 352, 490, 456, 733, 295, 498, 472, 1177, 380, + 589, 11, 51570], "temperature": 0.0, "avg_logprob": -0.182239259992327, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.008399716578423977}, {"id": 499, "seek": + 163288, "start": 1657.0, "end": 1659.72, "text": " you can just delete it, it''ll + do it in the background, build a new one.", "tokens": [51570, 291, 393, 445, 12097, + 309, 11, 309, 603, 360, 309, 294, 264, 3678, 11, 1322, 257, 777, 472, 13, 51706], + "temperature": 0.0, "avg_logprob": -0.182239259992327, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.008399716578423977}, {"id": 500, "seek": 165972, "start": 1659.96, + "end": 1666.24, "text": " You can be doing these searches and everything concurrently + because back to go, the indexes", "tokens": [50376, 509, 393, 312, 884, 613, 26701, + 293, 1203, 37702, 356, 570, 646, 281, 352, 11, 264, 8186, 279, 50690], "temperature": + 0.0, "avg_logprob": -0.22983684259302475, "compression_ratio": 
1.6935483870967742, + "no_speech_prob": 0.06231360509991646}, {"id": 501, "seek": 165972, "start": 1666.24, + "end": 1669.84, "text": " can be built at the same time as you''re searching and + doing all these other things because", "tokens": [50690, 393, 312, 3094, 412, 264, + 912, 565, 382, 291, 434, 10808, 293, 884, 439, 613, 661, 721, 570, 50870], "temperature": + 0.0, "avg_logprob": -0.22983684259302475, "compression_ratio": 1.6935483870967742, + "no_speech_prob": 0.06231360509991646}, {"id": 502, "seek": 165972, "start": 1669.84, + "end": 1674.56, "text": " we have workers for queries, for building indexes and + for inserting data.", "tokens": [50870, 321, 362, 5600, 337, 24109, 11, 337, 2390, + 8186, 279, 293, 337, 46567, 1412, 13, 51106], "temperature": 0.0, "avg_logprob": + -0.22983684259302475, "compression_ratio": 1.6935483870967742, "no_speech_prob": + 0.06231360509991646}, {"id": 503, "seek": 165972, "start": 1674.56, "end": 1677.56, + "text": " So it''s all kind of in the background and kind of gets dealt for you.", + "tokens": [51106, 407, 309, 311, 439, 733, 295, 294, 264, 3678, 293, 733, 295, 2170, + 15991, 337, 291, 13, 51256], "temperature": 0.0, "avg_logprob": -0.22983684259302475, + "compression_ratio": 1.6935483870967742, "no_speech_prob": 0.06231360509991646}, + {"id": 504, "seek": 165972, "start": 1677.56, "end": 1678.56, "text": " Yeah, that''s + cool.", "tokens": [51256, 865, 11, 300, 311, 1627, 13, 51306], "temperature": 0.0, + "avg_logprob": -0.22983684259302475, "compression_ratio": 1.6935483870967742, "no_speech_prob": + 0.06231360509991646}, {"id": 505, "seek": 165972, "start": 1678.56, "end": 1682.44, + "text": " And I mean, so you mentioned the technical part, you know, like different + products they might", "tokens": [51306, 400, 286, 914, 11, 370, 291, 2835, 264, + 6191, 644, 11, 291, 458, 11, 411, 819, 3383, 436, 1062, 51500], "temperature": 0.0, + "avg_logprob": -0.22983684259302475, "compression_ratio": 1.6935483870967742, 
"no_speech_prob": + 0.06231360509991646}, {"id": 506, "seek": 165972, "start": 1682.44, "end": 1688.88, + "text": " have some SLA, let''s say, you know, how quick it is, queit per second, + P99, whatever.", "tokens": [51500, 362, 512, 318, 11435, 11, 718, 311, 584, 11, + 291, 458, 11, 577, 1702, 309, 307, 11, 631, 270, 680, 1150, 11, 430, 8494, 11, 2035, + 13, 51822], "temperature": 0.0, "avg_logprob": -0.22983684259302475, "compression_ratio": + 1.6935483870967742, "no_speech_prob": 0.06231360509991646}, {"id": 507, "seek": + 168888, "start": 1688.88, "end": 1691.2, "text": " But like what about the semantic + part?", "tokens": [50364, 583, 411, 437, 466, 264, 47982, 644, 30, 50480], "temperature": + 0.0, "avg_logprob": -0.1769416332244873, "compression_ratio": 1.7943548387096775, + "no_speech_prob": 0.004401324782520533}, {"id": 508, "seek": 168888, "start": 1691.2, + "end": 1696.4, "text": " Like you mentioned that there is like a ground truth that + you can compare to always, right?", "tokens": [50480, 1743, 291, 2835, 300, 456, + 307, 411, 257, 2727, 3494, 300, 291, 393, 6794, 281, 1009, 11, 558, 30, 50740], + "temperature": 0.0, "avg_logprob": -0.1769416332244873, "compression_ratio": 1.7943548387096775, + "no_speech_prob": 0.004401324782520533}, {"id": 509, "seek": 168888, "start": 1696.4, + "end": 1700.44, "text": " But like what about the other kind of side of things, + let''s say for people who are like,", "tokens": [50740, 583, 411, 437, 466, 264, + 661, 733, 295, 1252, 295, 721, 11, 718, 311, 584, 337, 561, 567, 366, 411, 11, 50942], + "temperature": 0.0, "avg_logprob": -0.1769416332244873, "compression_ratio": 1.7943548387096775, + "no_speech_prob": 0.004401324782520533}, {"id": 510, "seek": 168888, "start": 1700.44, + "end": 1705.24, "text": " let''s say product managers, they''re not very technical, + they will not look into this metrics,", "tokens": [50942, 718, 311, 584, 1674, 14084, + 11, 436, 434, 406, 588, 6191, 11, 436, 486, 406, 574, 666, 341, 
16367, 11, 51182], + "temperature": 0.0, "avg_logprob": -0.1769416332244873, "compression_ratio": 1.7943548387096775, + "no_speech_prob": 0.004401324782520533}, {"id": 511, "seek": 168888, "start": 1705.24, + "end": 1711.64, "text": " but they still would like to get a way of understanding, + you know, what''s the kind of", "tokens": [51182, 457, 436, 920, 576, 411, 281, + 483, 257, 636, 295, 3701, 11, 291, 458, 11, 437, 311, 264, 733, 295, 51502], "temperature": + 0.0, "avg_logprob": -0.1769416332244873, "compression_ratio": 1.7943548387096775, + "no_speech_prob": 0.004401324782520533}, {"id": 512, "seek": 168888, "start": 1711.64, + "end": 1715.0400000000002, "text": " impact on the semantic part of things, right?", + "tokens": [51502, 2712, 322, 264, 47982, 644, 295, 721, 11, 558, 30, 51672], "temperature": + 0.0, "avg_logprob": -0.1769416332244873, "compression_ratio": 1.7943548387096775, + "no_speech_prob": 0.004401324782520533}, {"id": 513, "seek": 171504, "start": 1715.04, + "end": 1721.3999999999999, "text": " For instance, you''re comparing, you know, + inverted index versus vector search, right?", "tokens": [50364, 1171, 5197, 11, + 291, 434, 15763, 11, 291, 458, 11, 38969, 8186, 5717, 8062, 3164, 11, 558, 30, 50682], + "temperature": 0.0, "avg_logprob": -0.15081087748209634, "compression_ratio": 1.8233082706766917, + "no_speech_prob": 0.025108875706791878}, {"id": 514, "seek": 171504, "start": 1721.3999999999999, + "end": 1726.84, "text": " So with the semantic part, we don''t really deal with + that as much because we''re assuming", "tokens": [50682, 407, 365, 264, 47982, 644, + 11, 321, 500, 380, 534, 2028, 365, 300, 382, 709, 570, 321, 434, 11926, 50954], + "temperature": 0.0, "avg_logprob": -0.15081087748209634, "compression_ratio": 1.8233082706766917, + "no_speech_prob": 0.025108875706791878}, {"id": 515, "seek": 171504, "start": 1726.84, + "end": 1733.3999999999999, "text": " that your semantics are done well by the, by + the neural net because 
this is where it", "tokens": [50954, 300, 428, 4361, 45298, + 366, 1096, 731, 538, 264, 11, 538, 264, 18161, 2533, 570, 341, 307, 689, 309, 51282], + "temperature": 0.0, "avg_logprob": -0.15081087748209634, "compression_ratio": 1.8233082706766917, + "no_speech_prob": 0.025108875706791878}, {"id": 516, "seek": 171504, "start": 1733.3999999999999, + "end": 1735.28, "text": " kind of goes, you compare everything to brute force.", + "tokens": [51282, 733, 295, 1709, 11, 291, 6794, 1203, 281, 47909, 3464, 13, 51376], + "temperature": 0.0, "avg_logprob": -0.15081087748209634, "compression_ratio": 1.8233082706766917, + "no_speech_prob": 0.025108875706791878}, {"id": 517, "seek": 171504, "start": 1735.28, + "end": 1740.6399999999999, "text": " If your brute force shows that this is the + correct response or this like, or this wording", "tokens": [51376, 759, 428, 47909, + 3464, 3110, 300, 341, 307, 264, 3006, 4134, 420, 341, 411, 11, 420, 341, 47602, + 51644], "temperature": 0.0, "avg_logprob": -0.15081087748209634, "compression_ratio": + 1.8233082706766917, "no_speech_prob": 0.025108875706791878}, {"id": 518, "seek": + 171504, "start": 1740.6399999999999, "end": 1744.28, "text": " or this wording or + this wording or the top three results, those are mathematically", "tokens": [51644, + 420, 341, 47602, 420, 341, 47602, 420, 264, 1192, 1045, 3542, 11, 729, 366, 44003, + 51826], "temperature": 0.0, "avg_logprob": -0.15081087748209634, "compression_ratio": + 1.8233082706766917, "no_speech_prob": 0.025108875706791878}, {"id": 519, "seek": + 174428, "start": 1744.28, "end": 1747.8, "text": " the closest, most similar to + your input.", "tokens": [50364, 264, 13699, 11, 881, 2531, 281, 428, 4846, 13, 50540], + "temperature": 0.0, "avg_logprob": -0.22015034767889208, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.005466743838042021}, {"id": 520, "seek": 174428, "start": 1747.8, + "end": 1750.72, "text": " So that''s where you kind of compared to that.", "tokens": 
+ [50540, 407, 300, 311, 689, 291, 733, 295, 5347, 281, 300, 13, 50686], "temperature": + 0.0, "avg_logprob": -0.22015034767889208, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.005466743838042021}, {"id": 521, "seek": 174428, "start": 1750.72, + "end": 1755.72, "text": " If those aren''t close, that means that there''s an error + a step above because your neural", "tokens": [50686, 759, 729, 3212, 380, 1998, + 11, 300, 1355, 300, 456, 311, 364, 6713, 257, 1823, 3673, 570, 428, 18161, 50936], + "temperature": 0.0, "avg_logprob": -0.22015034767889208, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.005466743838042021}, {"id": 522, "seek": 174428, "start": 1755.72, + "end": 1758.96, "text": " net is not finding the connections correctly.", "tokens": + [50936, 2533, 307, 406, 5006, 264, 9271, 8944, 13, 51098], "temperature": 0.0, "avg_logprob": + -0.22015034767889208, "compression_ratio": 1.7794117647058822, "no_speech_prob": + 0.005466743838042021}, {"id": 523, "seek": 174428, "start": 1758.96, "end": 1764.56, + "text": " So that''s kind of how we compare to the base, which is just the flat + index of brute force", "tokens": [51098, 407, 300, 311, 733, 295, 577, 321, 6794, + 281, 264, 3096, 11, 597, 307, 445, 264, 4962, 8186, 295, 47909, 3464, 51378], "temperature": + 0.0, "avg_logprob": -0.22015034767889208, "compression_ratio": 1.7794117647058822, + "no_speech_prob": 0.005466743838042021}, {"id": 524, "seek": 174428, "start": 1764.56, + "end": 1769.92, "text": " and we kind of pull out and we see if you''re hitting + the right responses, like if that''s", "tokens": [51378, 293, 321, 733, 295, 2235, + 484, 293, 321, 536, 498, 291, 434, 8850, 264, 558, 13019, 11, 411, 498, 300, 311, + 51646], "temperature": 0.0, "avg_logprob": -0.22015034767889208, "compression_ratio": + 1.7794117647058822, "no_speech_prob": 0.005466743838042021}, {"id": 525, "seek": + 174428, "start": 1769.92, "end": 1774.24, "text": " sort of what we deal with, not + 
the actual, because the semantics come from the same", "tokens": [51646, 1333, 295, + 437, 321, 2028, 365, 11, 406, 264, 3539, 11, 570, 264, 4361, 45298, 808, 490, 264, + 912, 51862], "temperature": 0.0, "avg_logprob": -0.22015034767889208, "compression_ratio": + 1.7794117647058822, "no_speech_prob": 0.005466743838042021}, {"id": 526, "seek": + 177424, "start": 1774.28, "end": 1780.84, "text": " from the neural net and to find + that issue is more above us like in the whole stack, that", "tokens": [50366, 490, + 264, 18161, 2533, 293, 281, 915, 300, 2734, 307, 544, 3673, 505, 411, 294, 264, + 1379, 8630, 11, 300, 50694], "temperature": 0.0, "avg_logprob": -0.24857307499290532, + "compression_ratio": 1.715481171548117, "no_speech_prob": 0.0013967540580779314}, + {"id": 527, "seek": 177424, "start": 1780.84, "end": 1781.84, "text": " makes sense.", + "tokens": [50694, 1669, 2020, 13, 50744], "temperature": 0.0, "avg_logprob": -0.24857307499290532, + "compression_ratio": 1.715481171548117, "no_speech_prob": 0.0013967540580779314}, + {"id": 528, "seek": 177424, "start": 1781.84, "end": 1783.1200000000001, "text": + " Yeah, yeah, for sure.", "tokens": [50744, 865, 11, 1338, 11, 337, 988, 13, 50808], + "temperature": 0.0, "avg_logprob": -0.24857307499290532, "compression_ratio": 1.715481171548117, + "no_speech_prob": 0.0013967540580779314}, {"id": 529, "seek": 177424, "start": 1783.1200000000001, + "end": 1790.04, "text": " So basically what you''re saying is that, you know, if + I take, if I fix the model, right,", "tokens": [50808, 407, 1936, 437, 291, 434, + 1566, 307, 300, 11, 291, 458, 11, 498, 286, 747, 11, 498, 286, 3191, 264, 2316, + 11, 558, 11, 51154], "temperature": 0.0, "avg_logprob": -0.24857307499290532, "compression_ratio": + 1.715481171548117, "no_speech_prob": 0.0013967540580779314}, {"id": 530, "seek": + 177424, "start": 1790.04, "end": 1792.48, "text": " so the model is fixed.", "tokens": + [51154, 370, 264, 2316, 307, 6806, 13, 51276], "temperature": 
0.0, "avg_logprob": + -0.24857307499290532, "compression_ratio": 1.715481171548117, "no_speech_prob": + 0.0013967540580779314}, {"id": 531, "seek": 177424, "start": 1792.48, "end": 1798.64, + "text": " And I pick, let''s say, different algorithms for indexing as well as let''s + say, different", "tokens": [51276, 400, 286, 1888, 11, 718, 311, 584, 11, 819, 14642, + 337, 8186, 278, 382, 731, 382, 718, 311, 584, 11, 819, 51584], "temperature": 0.0, + "avg_logprob": -0.24857307499290532, "compression_ratio": 1.715481171548117, "no_speech_prob": + 0.0013967540580779314}, {"id": 532, "seek": 177424, "start": 1798.64, "end": 1799.96, + "text": " even distances, right?", "tokens": [51584, 754, 22182, 11, 558, 30, 51650], + "temperature": 0.0, "avg_logprob": -0.24857307499290532, "compression_ratio": 1.715481171548117, + "no_speech_prob": 0.0013967540580779314}, {"id": 533, "seek": 177424, "start": 1799.96, + "end": 1802.96, "text": " In some cases, I can maybe choose different distances, + right?", "tokens": [51650, 682, 512, 3331, 11, 286, 393, 1310, 2826, 819, 22182, + 11, 558, 30, 51800], "temperature": 0.0, "avg_logprob": -0.24857307499290532, "compression_ratio": + 1.715481171548117, "no_speech_prob": 0.0013967540580779314}, {"id": 534, "seek": + 180296, "start": 1802.96, "end": 1807.24, "text": " Although maybe you can tell + me if I''m wrong here, because if I trained the model for a specific", "tokens": + [50364, 5780, 1310, 291, 393, 980, 385, 498, 286, 478, 2085, 510, 11, 570, 498, + 286, 8895, 264, 2316, 337, 257, 2685, 50578], "temperature": 0.0, "avg_logprob": + -0.24580939366267276, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.012628264725208282}, {"id": 535, "seek": 180296, "start": 1807.24, "end": 1812.88, + "text": " distance, maybe I cannot easily pick another distance during test, is + that right?", "tokens": [50578, 4560, 11, 1310, 286, 2644, 3612, 1888, 1071, 4560, + 1830, 1500, 11, 307, 300, 558, 30, 50860], "temperature": 0.0, 
"avg_logprob": -0.24580939366267276, + "compression_ratio": 1.7183098591549295, "no_speech_prob": 0.012628264725208282}, + {"id": 536, "seek": 180296, "start": 1812.88, "end": 1816.4, "text": " So a distance, + what do you mean by selecting a distance?", "tokens": [50860, 407, 257, 4560, 11, + 437, 360, 291, 914, 538, 18182, 257, 4560, 30, 51036], "temperature": 0.0, "avg_logprob": + -0.24580939366267276, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.012628264725208282}, {"id": 537, "seek": 180296, "start": 1816.4, "end": 1821.92, + "text": " Because it''s all based on closest, like we will rank it closest to furthest + and then", "tokens": [51036, 1436, 309, 311, 439, 2361, 322, 13699, 11, 411, 321, + 486, 6181, 309, 13699, 281, 2687, 36356, 293, 550, 51312], "temperature": 0.0, "avg_logprob": + -0.24580939366267276, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.012628264725208282}, {"id": 538, "seek": 180296, "start": 1821.92, "end": 1823.8, + "text": " it''s only like top end results.", "tokens": [51312, 309, 311, 787, 411, + 1192, 917, 3542, 13, 51406], "temperature": 0.0, "avg_logprob": -0.24580939366267276, + "compression_ratio": 1.7183098591549295, "no_speech_prob": 0.012628264725208282}, + {"id": 539, "seek": 180296, "start": 1823.8, "end": 1826.28, "text": " Yeah, I guess + what I meant is the distance metric itself.", "tokens": [51406, 865, 11, 286, 2041, + 437, 286, 4140, 307, 264, 4560, 20678, 2564, 13, 51530], "temperature": 0.0, "avg_logprob": + -0.24580939366267276, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.012628264725208282}, {"id": 540, "seek": 180296, "start": 1826.28, "end": 1831.08, + "text": " So it could be to harming, you know, and yeah, do you correct all those?", + "tokens": [51530, 407, 309, 727, 312, 281, 2233, 2810, 11, 291, 458, 11, 293, 1338, + 11, 360, 291, 3006, 439, 729, 30, 51770], "temperature": 0.0, "avg_logprob": -0.24580939366267276, + "compression_ratio": 
1.7183098591549295, "no_speech_prob": 0.012628264725208282}, + {"id": 541, "seek": 180296, "start": 1831.08, "end": 1832.08, "text": " Yeah.", + "tokens": [51770, 865, 13, 51820], "temperature": 0.0, "avg_logprob": -0.24580939366267276, + "compression_ratio": 1.7183098591549295, "no_speech_prob": 0.012628264725208282}, + {"id": 542, "seek": 183208, "start": 1832.08, "end": 1837.36, "text": " So comparing + across this different distance metrics, that one is kind of you have to look", "tokens": + [50364, 407, 15763, 2108, 341, 819, 4560, 16367, 11, 300, 472, 307, 733, 295, 291, + 362, 281, 574, 50628], "temperature": 0.0, "avg_logprob": -0.1976453297173799, "compression_ratio": + 1.8488372093023255, "no_speech_prob": 0.014526451006531715}, {"id": 543, "seek": + 183208, "start": 1837.36, "end": 1838.36, "text": " at your data.", "tokens": [50628, + 412, 428, 1412, 13, 50678], "temperature": 0.0, "avg_logprob": -0.1976453297173799, + "compression_ratio": 1.8488372093023255, "no_speech_prob": 0.014526451006531715}, + {"id": 544, "seek": 183208, "start": 1838.36, "end": 1843.0, "text": " We don''t + have, because yeah, if you use different distance metrics, then your flat line is", + "tokens": [50678, 492, 500, 380, 362, 11, 570, 1338, 11, 498, 291, 764, 819, 4560, + 16367, 11, 550, 428, 4962, 1622, 307, 50910], "temperature": 0.0, "avg_logprob": + -0.1976453297173799, "compression_ratio": 1.8488372093023255, "no_speech_prob": + 0.014526451006531715}, {"id": 545, "seek": 183208, "start": 1843.0, "end": 1844.4399999999998, + "text": " going to be different.", "tokens": [50910, 516, 281, 312, 819, 13, 50982], + "temperature": 0.0, "avg_logprob": -0.1976453297173799, "compression_ratio": 1.8488372093023255, + "no_speech_prob": 0.014526451006531715}, {"id": 546, "seek": 183208, "start": 1844.4399999999998, + "end": 1849.36, "text": " But yeah, that''s one where if you''re going to compare + across indexes, you have to keep", "tokens": [50982, 583, 1338, 11, 300, 311, 472, + 
689, 498, 291, 434, 516, 281, 6794, 2108, 8186, 279, 11, 291, 362, 281, 1066, 51228], + "temperature": 0.0, "avg_logprob": -0.1976453297173799, "compression_ratio": 1.8488372093023255, + "no_speech_prob": 0.014526451006531715}, {"id": 547, "seek": 183208, "start": 1849.36, + "end": 1852.56, "text": " them the same distance metric.", "tokens": [51228, 552, + 264, 912, 4560, 20678, 13, 51388], "temperature": 0.0, "avg_logprob": -0.1976453297173799, + "compression_ratio": 1.8488372093023255, "no_speech_prob": 0.014526451006531715}, + {"id": 548, "seek": 183208, "start": 1852.56, "end": 1857.6, "text": " Swapping + them out will make some big changes, I think, because if you go from L1 to L2,", + "tokens": [51388, 3926, 10534, 552, 484, 486, 652, 512, 955, 2962, 11, 286, 519, + 11, 570, 498, 291, 352, 490, 441, 16, 281, 441, 17, 11, 51640], "temperature": 0.0, + "avg_logprob": -0.1976453297173799, "compression_ratio": 1.8488372093023255, "no_speech_prob": + 0.014526451006531715}, {"id": 549, "seek": 183208, "start": 1857.6, "end": 1861.6, + "text": " or maybe not L1 to L2, there''s a cosine to Euclidean.", "tokens": [51640, + 420, 1310, 406, 441, 16, 281, 441, 17, 11, 456, 311, 257, 23565, 281, 462, 1311, + 31264, 282, 13, 51840], "temperature": 0.0, "avg_logprob": -0.1976453297173799, + "compression_ratio": 1.8488372093023255, "no_speech_prob": 0.014526451006531715}, + {"id": 550, "seek": 186160, "start": 1861.6, "end": 1867.06, "text": " It switches + up things up a bit where in some cases, one of the distances might, a higher", "tokens": + [50364, 467, 19458, 493, 721, 493, 257, 857, 689, 294, 512, 3331, 11, 472, 295, + 264, 22182, 1062, 11, 257, 2946, 50637], "temperature": 0.0, "avg_logprob": -0.19601173088198803, + "compression_ratio": 1.7622641509433963, "no_speech_prob": 0.003416001098230481}, + {"id": 551, "seek": 186160, "start": 1867.06, "end": 1869.84, "text": " values better, + in some cases, the lower values better.", "tokens": [50637, 4190, 1101, 11, 
294, + 512, 3331, 11, 264, 3126, 4190, 1101, 13, 50776], "temperature": 0.0, "avg_logprob": + -0.19601173088198803, "compression_ratio": 1.7622641509433963, "no_speech_prob": + 0.003416001098230481}, {"id": 552, "seek": 186160, "start": 1869.84, "end": 1871.6799999999998, + "text": " So there''s no real direct comparison.", "tokens": [50776, 407, 456, 311, + 572, 957, 2047, 9660, 13, 50868], "temperature": 0.0, "avg_logprob": -0.19601173088198803, + "compression_ratio": 1.7622641509433963, "no_speech_prob": 0.003416001098230481}, + {"id": 553, "seek": 186160, "start": 1871.6799999999998, "end": 1875.08, "text": + " They''re still going to usually rank in the same order.", "tokens": [50868, 814, + 434, 920, 516, 281, 2673, 6181, 294, 264, 912, 1668, 13, 51038], "temperature": + 0.0, "avg_logprob": -0.19601173088198803, "compression_ratio": 1.7622641509433963, + "no_speech_prob": 0.003416001098230481}, {"id": 554, "seek": 186160, "start": 1875.08, + "end": 1880.48, "text": " But yeah, for figuring out which one you want to use there, + it''s kind of give or taking", "tokens": [51038, 583, 1338, 11, 337, 15213, 484, + 597, 472, 291, 528, 281, 764, 456, 11, 309, 311, 733, 295, 976, 420, 1940, 51308], + "temperature": 0.0, "avg_logprob": -0.19601173088198803, "compression_ratio": 1.7622641509433963, + "no_speech_prob": 0.003416001098230481}, {"id": 555, "seek": 186160, "start": 1880.48, + "end": 1881.48, "text": " off.", "tokens": [51308, 766, 13, 51358], "temperature": + 0.0, "avg_logprob": -0.19601173088198803, "compression_ratio": 1.7622641509433963, + "no_speech_prob": 0.003416001098230481}, {"id": 556, "seek": 186160, "start": 1881.48, + "end": 1883.6799999999998, "text": " I actually have to look at the results for + that one.", "tokens": [51358, 286, 767, 362, 281, 574, 412, 264, 3542, 337, 300, + 472, 13, 51468], "temperature": 0.0, "avg_logprob": -0.19601173088198803, "compression_ratio": + 1.7622641509433963, "no_speech_prob": 0.003416001098230481}, {"id": 
557, "seek": + 186160, "start": 1883.6799999999998, "end": 1889.36, "text": " There''s no real + like mathematical way to kind of compare some antics to distance and", "tokens": + [51468, 821, 311, 572, 957, 411, 18894, 636, 281, 733, 295, 6794, 512, 2511, 1167, + 281, 4560, 293, 51752], "temperature": 0.0, "avg_logprob": -0.19601173088198803, + "compression_ratio": 1.7622641509433963, "no_speech_prob": 0.003416001098230481}, + {"id": 558, "seek": 188936, "start": 1889.36, "end": 1892.1999999999998, "text": + " kind of get the relationship, if that makes sense.", "tokens": [50364, 733, 295, + 483, 264, 2480, 11, 498, 300, 1669, 2020, 13, 50506], "temperature": 0.0, "avg_logprob": + -0.2557817569448928, "compression_ratio": 1.60431654676259, "no_speech_prob": 0.013426491059362888}, + {"id": 559, "seek": 188936, "start": 1892.1999999999998, "end": 1893.1999999999998, + "text": " Yeah, yeah, for sure.", "tokens": [50506, 865, 11, 1338, 11, 337, 988, + 13, 50556], "temperature": 0.0, "avg_logprob": -0.2557817569448928, "compression_ratio": + 1.60431654676259, "no_speech_prob": 0.013426491059362888}, {"id": 560, "seek": 188936, + "start": 1893.1999999999998, "end": 1895.6, "text": " For sure, it''s more like + an experimentation needed there, right?", "tokens": [50556, 1171, 988, 11, 309, + 311, 544, 411, 364, 37142, 2978, 456, 11, 558, 30, 50676], "temperature": 0.0, "avg_logprob": + -0.2557817569448928, "compression_ratio": 1.60431654676259, "no_speech_prob": 0.013426491059362888}, + {"id": 561, "seek": 188936, "start": 1895.6, "end": 1896.6, "text": " Yeah.", "tokens": + [50676, 865, 13, 50726], "temperature": 0.0, "avg_logprob": -0.2557817569448928, + "compression_ratio": 1.60431654676259, "no_speech_prob": 0.013426491059362888}, + {"id": 562, "seek": 188936, "start": 1896.6, "end": 1897.6, "text": " Exactly.", + "tokens": [50726, 7587, 13, 50776], "temperature": 0.0, "avg_logprob": -0.2557817569448928, + "compression_ratio": 1.60431654676259, "no_speech_prob": 
0.013426491059362888}, + {"id": 563, "seek": 188936, "start": 1897.6, "end": 1900.76, "text": " And also, + like, actually, I just remembered when you''ve been kind of describing these", "tokens": + [50776, 400, 611, 11, 411, 11, 767, 11, 286, 445, 13745, 562, 291, 600, 668, 733, + 295, 16141, 613, 50934], "temperature": 0.0, "avg_logprob": -0.2557817569448928, + "compression_ratio": 1.60431654676259, "no_speech_prob": 0.013426491059362888}, + {"id": 564, "seek": 188936, "start": 1900.76, "end": 1902.56, "text": " different + distance metrics.", "tokens": [50934, 819, 4560, 16367, 13, 51024], "temperature": + 0.0, "avg_logprob": -0.2557817569448928, "compression_ratio": 1.60431654676259, + "no_speech_prob": 0.013426491059362888}, {"id": 565, "seek": 188936, "start": 1902.56, + "end": 1906.08, "text": " I remember a paper, I think it''s called Beer.", "tokens": + [51024, 286, 1604, 257, 3035, 11, 286, 519, 309, 311, 1219, 41453, 13, 51200], "temperature": + 0.0, "avg_logprob": -0.2557817569448928, "compression_ratio": 1.60431654676259, + "no_speech_prob": 0.013426491059362888}, {"id": 566, "seek": 188936, "start": 1906.08, + "end": 1913.6399999999999, "text": " So it was comparing different methods to do + the re-ranking step, right?", "tokens": [51200, 407, 309, 390, 15763, 819, 7150, + 281, 360, 264, 319, 12, 20479, 278, 1823, 11, 558, 30, 51578], "temperature": 0.0, + "avg_logprob": -0.2557817569448928, "compression_ratio": 1.60431654676259, "no_speech_prob": + 0.013426491059362888}, {"id": 567, "seek": 188936, "start": 1913.6399999999999, + "end": 1917.1999999999998, "text": " Like dense retrieval and some other methods + I forgot already.", "tokens": [51578, 1743, 18011, 19817, 3337, 293, 512, 661, 7150, + 286, 5298, 1217, 13, 51756], "temperature": 0.0, "avg_logprob": -0.2557817569448928, + "compression_ratio": 1.60431654676259, "no_speech_prob": 0.013426491059362888}, + {"id": 568, "seek": 191720, "start": 1917.2, "end": 1924.96, "text": " But they + actually 
found out that if you have documents, let''s say text documents, the", + "tokens": [50364, 583, 436, 767, 1352, 484, 300, 498, 291, 362, 8512, 11, 718, 311, + 584, 2487, 8512, 11, 264, 50752], "temperature": 0.0, "avg_logprob": -0.18210739247939167, + "compression_ratio": 1.6741573033707866, "no_speech_prob": 0.05176974833011627}, + {"id": 569, "seek": 191720, "start": 1924.96, "end": 1934.2, "text": " cosine similarity + will favor shorter documents given the tie versus dot product will favor", "tokens": + [50752, 23565, 32194, 486, 2294, 11639, 8512, 2212, 264, 7582, 5717, 5893, 1674, + 486, 2294, 51214], "temperature": 0.0, "avg_logprob": -0.18210739247939167, "compression_ratio": + 1.6741573033707866, "no_speech_prob": 0.05176974833011627}, {"id": 570, "seek": + 191720, "start": 1934.2, "end": 1935.76, "text": " longer documents.", "tokens": + [51214, 2854, 8512, 13, 51292], "temperature": 0.0, "avg_logprob": -0.18210739247939167, + "compression_ratio": 1.6741573033707866, "no_speech_prob": 0.05176974833011627}, + {"id": 571, "seek": 191720, "start": 1935.76, "end": 1937.88, "text": " And this + is by design of the formula.", "tokens": [51292, 400, 341, 307, 538, 1715, 295, + 264, 8513, 13, 51398], "temperature": 0.0, "avg_logprob": -0.18210739247939167, + "compression_ratio": 1.6741573033707866, "no_speech_prob": 0.05176974833011627}, + {"id": 572, "seek": 191720, "start": 1937.88, "end": 1942.96, "text": " The cosine + similarity is basically mapping it to the unit sphere.", "tokens": [51398, 440, + 23565, 32194, 307, 1936, 18350, 309, 281, 264, 4985, 16687, 13, 51652], "temperature": + 0.0, "avg_logprob": -0.18210739247939167, "compression_ratio": 1.6741573033707866, + "no_speech_prob": 0.05176974833011627}, {"id": 573, "seek": 194296, "start": 1942.96, + "end": 1946.8, "text": " The dot product is there is nothing to kind of normalize + on.", "tokens": [50364, 440, 5893, 1674, 307, 456, 307, 1825, 281, 733, 295, 2710, + 1125, 322, 13, 50556], "temperature": 
0.0, "avg_logprob": -0.19431572947008857, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.011542931199073792}, + {"id": 574, "seek": 194296, "start": 1946.8, "end": 1952.32, "text": " So it basically + just takes all the components of your vector and just basically says, okay,", "tokens": + [50556, 407, 309, 1936, 445, 2516, 439, 264, 6677, 295, 428, 8062, 293, 445, 1936, + 1619, 11, 1392, 11, 50832], "temperature": 0.0, "avg_logprob": -0.19431572947008857, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.011542931199073792}, + {"id": 575, "seek": 194296, "start": 1952.32, "end": 1956.6000000000001, "text": + " here is the volume and just the lowest one wins, right?", "tokens": [50832, 510, + 307, 264, 5523, 293, 445, 264, 12437, 472, 10641, 11, 558, 30, 51046], "temperature": + 0.0, "avg_logprob": -0.19431572947008857, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.011542931199073792}, {"id": 576, "seek": 194296, "start": 1956.6000000000001, + "end": 1960.28, "text": " And that can actually impact the user experience, right?", + "tokens": [51046, 400, 300, 393, 767, 2712, 264, 4195, 1752, 11, 558, 30, 51230], + "temperature": 0.0, "avg_logprob": -0.19431572947008857, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.011542931199073792}, {"id": 577, "seek": 194296, "start": 1960.28, + "end": 1966.72, "text": " Like if I have a database, let''s say of news versus some + deep research, right?", "tokens": [51230, 1743, 498, 286, 362, 257, 8149, 11, 718, + 311, 584, 295, 2583, 5717, 512, 2452, 2132, 11, 558, 30, 51552], "temperature": + 0.0, "avg_logprob": -0.19431572947008857, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.011542931199073792}, {"id": 578, "seek": 194296, "start": 1966.72, + "end": 1970.56, "text": " So deep research is thousands of pages and news is couple + of pages, maybe even just", "tokens": [51552, 407, 2452, 2132, 307, 5383, 295, 7183, + 293, 2583, 307, 1916, 295, 7183, 
11, 1310, 754, 445, 51744], "temperature": 0.0, + "avg_logprob": -0.19431572947008857, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.011542931199073792}, {"id": 579, "seek": 194296, "start": 1970.56, "end": 1971.88, + "text": " couple of paragraphs.", "tokens": [51744, 1916, 295, 48910, 13, 51810], + "temperature": 0.0, "avg_logprob": -0.19431572947008857, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.011542931199073792}, {"id": 580, "seek": 194296, "start": 1971.88, + "end": 1972.88, "text": " Yeah.", "tokens": [51810, 865, 13, 51860], "temperature": + 0.0, "avg_logprob": -0.19431572947008857, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.011542931199073792}, {"id": 581, "seek": 197288, "start": 1972.88, + "end": 1978.0800000000002, "text": " So if my hits are just in a paragraph in the + news and also in a paragraph in the longer", "tokens": [50364, 407, 498, 452, 8664, + 366, 445, 294, 257, 18865, 294, 264, 2583, 293, 611, 294, 257, 18865, 294, 264, + 2854, 50624], "temperature": 0.0, "avg_logprob": -0.2703243915299724, "compression_ratio": + 1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, {"id": 582, "seek": + 197288, "start": 1978.0800000000002, "end": 1980.5600000000002, "text": " document + with cosine, I''ll get the news.", "tokens": [50624, 4166, 365, 23565, 11, 286, + 603, 483, 264, 2583, 13, 50748], "temperature": 0.0, "avg_logprob": -0.2703243915299724, + "compression_ratio": 1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, + {"id": 583, "seek": 197288, "start": 1980.5600000000002, "end": 1982.3600000000001, + "text": " I will not get the deep research, right?", "tokens": [50748, 286, 486, + 406, 483, 264, 2452, 2132, 11, 558, 30, 50838], "temperature": 0.0, "avg_logprob": + -0.2703243915299724, "compression_ratio": 1.8082706766917294, "no_speech_prob": + 0.0050003123469650745}, {"id": 584, "seek": 197288, "start": 1982.3600000000001, + "end": 1983.3600000000001, "text": " See 
what I''m saying?", "tokens": [50838, 3008, + 437, 286, 478, 1566, 30, 50888], "temperature": 0.0, "avg_logprob": -0.2703243915299724, + "compression_ratio": 1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, + {"id": 585, "seek": 197288, "start": 1983.3600000000001, "end": 1984.5200000000002, + "text": " No, that makes sense.", "tokens": [50888, 883, 11, 300, 1669, 2020, 13, + 50946], "temperature": 0.0, "avg_logprob": -0.2703243915299724, "compression_ratio": + 1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, {"id": 586, "seek": + 197288, "start": 1984.5200000000002, "end": 1985.5200000000002, "text": " Yeah.", + "tokens": [50946, 865, 13, 50996], "temperature": 0.0, "avg_logprob": -0.2703243915299724, + "compression_ratio": 1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, + {"id": 587, "seek": 197288, "start": 1985.5200000000002, "end": 1990.64, "text": + " So yeah, that''s I think one where you you have to kind of test it out and see + what you", "tokens": [50996, 407, 1338, 11, 300, 311, 286, 519, 472, 689, 291, 291, + 362, 281, 733, 295, 1500, 309, 484, 293, 536, 437, 291, 51252], "temperature": 0.0, + "avg_logprob": -0.2703243915299724, "compression_ratio": 1.8082706766917294, "no_speech_prob": + 0.0050003123469650745}, {"id": 588, "seek": 197288, "start": 1990.64, "end": 1994.0800000000002, + "text": " want because I have some people searching they might want to use some + people searching", "tokens": [51252, 528, 570, 286, 362, 512, 561, 10808, 436, 1062, + 528, 281, 764, 512, 561, 10808, 51424], "temperature": 0.0, "avg_logprob": -0.2703243915299724, + "compression_ratio": 1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, + {"id": 589, "seek": 197288, "start": 1994.0800000000002, "end": 1997.3200000000002, + "text": " they might want the scientific paper.", "tokens": [51424, 436, 1062, 528, + 264, 8134, 3035, 13, 51586], "temperature": 0.0, "avg_logprob": -0.2703243915299724, + "compression_ratio": 
1.8082706766917294, "no_speech_prob": 0.0050003123469650745}, + {"id": 590, "seek": 197288, "start": 1997.3200000000002, "end": 1999.8000000000002, + "text": " And that''s one where you look at history, I guess.", "tokens": [51586, + 400, 300, 311, 472, 689, 291, 574, 412, 2503, 11, 286, 2041, 13, 51710], "temperature": + 0.0, "avg_logprob": -0.2703243915299724, "compression_ratio": 1.8082706766917294, + "no_speech_prob": 0.0050003123469650745}, {"id": 591, "seek": 199980, "start": 1999.8, + "end": 2002.3999999999999, "text": " For thinking about this, let''s say Google + is doing it.", "tokens": [50364, 1171, 1953, 466, 341, 11, 718, 311, 584, 3329, + 307, 884, 309, 13, 50494], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 592, "seek": 199980, "start": 2002.3999999999999, "end": 2005.48, "text": + " You look at the user''s history of how they search if they''re searching for scientific", + "tokens": [50494, 509, 574, 412, 264, 4195, 311, 2503, 295, 577, 436, 3164, 498, + 436, 434, 10808, 337, 8134, 50648], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 593, "seek": 199980, "start": 2005.48, "end": 2009.68, "text": " stuff or + if they''re always looking at you maybe swap the index and to a different distance", + "tokens": [50648, 1507, 420, 498, 436, 434, 1009, 1237, 412, 291, 1310, 18135, 264, + 8186, 293, 281, 257, 819, 4560, 50858], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 594, "seek": 199980, "start": 2009.68, "end": 2010.68, "text": " metrics.", + "tokens": [50858, 16367, 13, 50908], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 595, "seek": 199980, 
"start": 2010.68, "end": 2012.52, "text": " But yeah, + I haven''t thought of that too much.", "tokens": [50908, 583, 1338, 11, 286, 2378, + 380, 1194, 295, 300, 886, 709, 13, 51000], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 596, "seek": 199980, "start": 2012.52, "end": 2013.52, "text": " Not way.", + "tokens": [51000, 1726, 636, 13, 51050], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 597, "seek": 199980, "start": 2013.52, "end": 2014.52, "text": " That''s + really interesting.", "tokens": [51050, 663, 311, 534, 1880, 13, 51100], "temperature": + 0.0, "avg_logprob": -0.2652392355811517, "compression_ratio": 1.6959247648902822, + "no_speech_prob": 0.06107824668288231}, {"id": 598, "seek": 199980, "start": 2014.52, + "end": 2016.52, "text": " Oh, yeah, I need to check out that paper.", "tokens": + [51100, 876, 11, 1338, 11, 286, 643, 281, 1520, 484, 300, 3035, 13, 51200], "temperature": + 0.0, "avg_logprob": -0.2652392355811517, "compression_ratio": 1.6959247648902822, + "no_speech_prob": 0.06107824668288231}, {"id": 599, "seek": 199980, "start": 2016.52, + "end": 2017.52, "text": " Yeah, for sure.", "tokens": [51200, 865, 11, 337, 988, + 13, 51250], "temperature": 0.0, "avg_logprob": -0.2652392355811517, "compression_ratio": + 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, {"id": 600, "seek": + 199980, "start": 2017.52, "end": 2021.8799999999999, "text": " I''ll send you the + link and I''ll make sure to also include in the notes, maybe for those", "tokens": + [51250, 286, 603, 2845, 291, 264, 2113, 293, 286, 603, 652, 988, 281, 611, 4090, + 294, 264, 5570, 11, 1310, 337, 729, 51468], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 601, "seek": 
199980, "start": 2021.8799999999999, "end": 2024.96, "text": + " of us who are interested in reading papers.", "tokens": [51468, 295, 505, 567, + 366, 3102, 294, 3760, 10577, 13, 51622], "temperature": 0.0, "avg_logprob": -0.2652392355811517, + "compression_ratio": 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, + {"id": 602, "seek": 199980, "start": 2024.96, "end": 2027.48, "text": " And yeah, + so that''s awesome.", "tokens": [51622, 400, 1338, 11, 370, 300, 311, 3476, 13, + 51748], "temperature": 0.0, "avg_logprob": -0.2652392355811517, "compression_ratio": + 1.6959247648902822, "no_speech_prob": 0.06107824668288231}, {"id": 603, "seek": + 202748, "start": 2027.48, "end": 2035.64, "text": " So you guys basically will for + a database where users can mix and match the way they want,", "tokens": [50364, + 407, 291, 1074, 1936, 486, 337, 257, 8149, 689, 5022, 393, 2890, 293, 2995, 264, + 636, 436, 528, 11, 50772], "temperature": 0.0, "avg_logprob": -0.18846865567294033, + "compression_ratio": 1.72265625, "no_speech_prob": 0.012348617427051067}, {"id": + 604, "seek": 202748, "start": 2035.64, "end": 2036.64, "text": " right?", "tokens": + [50772, 558, 30, 50822], "temperature": 0.0, "avg_logprob": -0.18846865567294033, + "compression_ratio": 1.72265625, "no_speech_prob": 0.012348617427051067}, {"id": + 605, "seek": 202748, "start": 2036.64, "end": 2042.24, "text": " And then you help + them to kind of do you guide the users in the process of doing this?", "tokens": + [50822, 400, 550, 291, 854, 552, 281, 733, 295, 360, 291, 5934, 264, 5022, 294, + 264, 1399, 295, 884, 341, 30, 51102], "temperature": 0.0, "avg_logprob": -0.18846865567294033, + "compression_ratio": 1.72265625, "no_speech_prob": 0.012348617427051067}, {"id": + 606, "seek": 202748, "start": 2042.24, "end": 2046.96, "text": " So if they come + to us for help, we usually kind of we have some articles where we mess", "tokens": + [51102, 407, 498, 436, 808, 281, 505, 337, 854, 11, 321, 2673, 
733, 295, 321, 362, + 512, 11290, 689, 321, 2082, 51338], "temperature": 0.0, "avg_logprob": -0.18846865567294033, + "compression_ratio": 1.72265625, "no_speech_prob": 0.012348617427051067}, {"id": + 607, "seek": 202748, "start": 2046.96, "end": 2050.92, "text": " around with the + indexes and different parameters and we kind of have like a graph, let''s say speed", + "tokens": [51338, 926, 365, 264, 8186, 279, 293, 819, 9834, 293, 321, 733, 295, + 362, 411, 257, 4295, 11, 718, 311, 584, 3073, 51536], "temperature": 0.0, "avg_logprob": + -0.18846865567294033, "compression_ratio": 1.72265625, "no_speech_prob": 0.012348617427051067}, + {"id": 608, "seek": 202748, "start": 2050.92, "end": 2054.52, "text": " performance, + recall that kind of stuff where it''s kind of preliminary.", "tokens": [51536, 3389, + 11, 9901, 300, 733, 295, 1507, 689, 309, 311, 733, 295, 28817, 13, 51716], "temperature": + 0.0, "avg_logprob": -0.18846865567294033, "compression_ratio": 1.72265625, "no_speech_prob": + 0.012348617427051067}, {"id": 609, "seek": 205452, "start": 2054.52, "end": 2058.28, + "text": " We kind of hope that they kind of learn it on their own because like we + can only help", "tokens": [50364, 492, 733, 295, 1454, 300, 436, 733, 295, 1466, + 309, 322, 641, 1065, 570, 411, 321, 393, 787, 854, 50552], "temperature": 0.0, "avg_logprob": + -0.1794552457505378, "compression_ratio": 1.8962962962962964, "no_speech_prob": + 0.020940329879522324}, {"id": 610, "seek": 205452, "start": 2058.28, "end": 2060.92, + "text": " so many people.", "tokens": [50552, 370, 867, 561, 13, 50684], "temperature": + 0.0, "avg_logprob": -0.1794552457505378, "compression_ratio": 1.8962962962962964, + "no_speech_prob": 0.020940329879522324}, {"id": 611, "seek": 205452, "start": 2060.92, + "end": 2061.92, "text": " So yeah, we do help out.", "tokens": [50684, 407, 1338, + 11, 321, 360, 854, 484, 13, 50734], "temperature": 0.0, "avg_logprob": -0.1794552457505378, + "compression_ratio": 
1.8962962962962964, "no_speech_prob": 0.020940329879522324}, + {"id": 612, "seek": 205452, "start": 2061.92, "end": 2063.0, "text": " We point + to the right directions.", "tokens": [50734, 492, 935, 281, 264, 558, 11095, 13, + 50788], "temperature": 0.0, "avg_logprob": -0.1794552457505378, "compression_ratio": + 1.8962962962962964, "no_speech_prob": 0.020940329879522324}, {"id": 613, "seek": + 205452, "start": 2063.0, "end": 2065.84, "text": " And if it''s like a really interesting + use case or a really big use case, we''ll kind", "tokens": [50788, 400, 498, 309, + 311, 411, 257, 534, 1880, 764, 1389, 420, 257, 534, 955, 764, 1389, 11, 321, 603, + 733, 50930], "temperature": 0.0, "avg_logprob": -0.1794552457505378, "compression_ratio": + 1.8962962962962964, "no_speech_prob": 0.020940329879522324}, {"id": 614, "seek": + 205452, "start": 2065.84, "end": 2068.96, "text": " of mess around with it ourselves + and try to help out.", "tokens": [50930, 295, 2082, 926, 365, 309, 4175, 293, 853, + 281, 854, 484, 13, 51086], "temperature": 0.0, "avg_logprob": -0.1794552457505378, + "compression_ratio": 1.8962962962962964, "no_speech_prob": 0.020940329879522324}, + {"id": 615, "seek": 205452, "start": 2068.96, "end": 2073.72, "text": " But we also + just hope that people mess around and then post the results.", "tokens": [51086, + 583, 321, 611, 445, 1454, 300, 561, 2082, 926, 293, 550, 2183, 264, 3542, 13, 51324], + "temperature": 0.0, "avg_logprob": -0.1794552457505378, "compression_ratio": 1.8962962962962964, + "no_speech_prob": 0.020940329879522324}, {"id": 616, "seek": 205452, "start": 2073.72, + "end": 2077.16, "text": " They see that and then like we kind of the more data we + get, the more like it helps", "tokens": [51324, 814, 536, 300, 293, 550, 411, 321, + 733, 295, 264, 544, 1412, 321, 483, 11, 264, 544, 411, 309, 3665, 51496], "temperature": + 0.0, "avg_logprob": -0.1794552457505378, "compression_ratio": 1.8962962962962964, + "no_speech_prob": 
0.020940329879522324}, {"id": 617, "seek": 205452, "start": 2077.16, + "end": 2080.32, "text": " a lot with when people kind of share what they''re doing.", + "tokens": [51496, 257, 688, 365, 562, 561, 733, 295, 2073, 437, 436, 434, 884, 13, + 51654], "temperature": 0.0, "avg_logprob": -0.1794552457505378, "compression_ratio": + 1.8962962962962964, "no_speech_prob": 0.020940329879522324}, {"id": 618, "seek": + 208032, "start": 2080.32, "end": 2084.6000000000004, "text": " We''re trying to + share as much as everything kind of get people into this, get those words", "tokens": + [50364, 492, 434, 1382, 281, 2073, 382, 709, 382, 1203, 733, 295, 483, 561, 666, + 341, 11, 483, 729, 2283, 50578], "temperature": 0.0, "avg_logprob": -0.30623067220052086, + "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.09608723223209381}, + {"id": 619, "seek": 208032, "start": 2084.6000000000004, "end": 2087.96, "text": + " spread and that''s pretty much open source.", "tokens": [50578, 3974, 293, 300, + 311, 1238, 709, 1269, 4009, 13, 50746], "temperature": 0.0, "avg_logprob": -0.30623067220052086, + "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.09608723223209381}, + {"id": 620, "seek": 208032, "start": 2087.96, "end": 2091.32, "text": " Like a big + deal of open source is kind of getting this out there.", "tokens": [50746, 1743, + 257, 955, 2028, 295, 1269, 4009, 307, 733, 295, 1242, 341, 484, 456, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.30623067220052086, "compression_ratio": 1.6991869918699187, + "no_speech_prob": 0.09608723223209381}, {"id": 621, "seek": 208032, "start": 2091.32, + "end": 2095.7200000000003, "text": " Like competition''s good, new innovations good, + just get this like vector similarity", "tokens": [50914, 1743, 6211, 311, 665, 11, + 777, 24283, 665, 11, 445, 483, 341, 411, 8062, 32194, 51134], "temperature": 0.0, + "avg_logprob": -0.30623067220052086, "compression_ratio": 1.6991869918699187, "no_speech_prob": + 
0.09608723223209381}, {"id": 622, "seek": 208032, "start": 2095.7200000000003, "end": + 2099.28, "text": " search, getting people interested in it.", "tokens": [51134, + 3164, 11, 1242, 561, 3102, 294, 309, 13, 51312], "temperature": 0.0, "avg_logprob": + -0.30623067220052086, "compression_ratio": 1.6991869918699187, "no_speech_prob": + 0.09608723223209381}, {"id": 623, "seek": 208032, "start": 2099.28, "end": 2100.28, + "text": " Yeah.", "tokens": [51312, 865, 13, 51362], "temperature": 0.0, "avg_logprob": + -0.30623067220052086, "compression_ratio": 1.6991869918699187, "no_speech_prob": + 0.09608723223209381}, {"id": 624, "seek": 208032, "start": 2100.28, "end": 2101.28, + "text": " Yeah.", "tokens": [51362, 865, 13, 51412], "temperature": 0.0, "avg_logprob": + -0.30623067220052086, "compression_ratio": 1.6991869918699187, "no_speech_prob": + 0.09608723223209381}, {"id": 625, "seek": 208032, "start": 2101.28, "end": 2107.96, + "text": " And your website has so many use cases covered like I was looking at audio + search.", "tokens": [51412, 400, 428, 3144, 575, 370, 867, 764, 3331, 5343, 411, + 286, 390, 1237, 412, 6278, 3164, 13, 51746], "temperature": 0.0, "avg_logprob": + -0.30623067220052086, "compression_ratio": 1.6991869918699187, "no_speech_prob": + 0.09608723223209381}, {"id": 626, "seek": 210796, "start": 2107.96, "end": 2108.96, + "text": " That was interesting.", "tokens": [50364, 663, 390, 1880, 13, 50414], + "temperature": 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 627, "seek": 210796, "start": 2108.96, + "end": 2114.76, "text": " Like you basically walk through, you know, selecting a + library, how I will encode the", "tokens": [50414, 1743, 291, 1936, 1792, 807, 11, + 291, 458, 11, 18182, 257, 6405, 11, 577, 286, 486, 2058, 1429, 264, 50704], "temperature": + 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 
0.029624227434396744}, {"id": 628, "seek": 210796, "start": 2114.76, + "end": 2121.28, "text": " song and I have an idea to try it out on a few songs that + I have like MP3s.", "tokens": [50704, 2153, 293, 286, 362, 364, 1558, 281, 853, + 309, 484, 322, 257, 1326, 5781, 300, 286, 362, 411, 14146, 18, 82, 13, 51030], "temperature": + 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 629, "seek": 210796, "start": 2121.28, + "end": 2122.28, "text": " Yeah.", "tokens": [51030, 865, 13, 51080], "temperature": + 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 630, "seek": 210796, "start": 2122.28, + "end": 2127.08, "text": " And what I was particularly interested is like, okay, + is there a way to separate like the", "tokens": [51080, 400, 437, 286, 390, 4098, + 3102, 307, 411, 11, 1392, 11, 307, 456, 257, 636, 281, 4994, 411, 264, 51320], "temperature": + 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 631, "seek": 210796, "start": 2127.08, + "end": 2133.84, "text": " singer voice from the musical instrument from like, I + don''t know, the style of this song", "tokens": [51320, 11564, 3177, 490, 264, 9165, + 7198, 490, 411, 11, 286, 500, 380, 458, 11, 264, 3758, 295, 341, 2153, 51658], "temperature": + 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 632, "seek": 210796, "start": 2133.84, + "end": 2134.84, "text": " and so on.", "tokens": [51658, 293, 370, 322, 13, 51708], + "temperature": 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 633, "seek": 210796, "start": 2134.84, + "end": 2135.84, "text": " Yeah.", "tokens": [51708, 865, 13, 51758], 
"temperature": + 0.0, "avg_logprob": -0.20353193020601884, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.029624227434396744}, {"id": 634, "seek": 213584, "start": 2135.84, + "end": 2138.04, "text": " And it''s a really cool one because this is one of the + things that kind of looked into", "tokens": [50364, 400, 309, 311, 257, 534, 1627, + 472, 570, 341, 307, 472, 295, 264, 721, 300, 733, 295, 2956, 666, 50474], "temperature": + 0.0, "avg_logprob": -0.2624570429325104, "compression_ratio": 1.7777777777777777, + "no_speech_prob": 0.02268509939312935}, {"id": 635, "seek": 213584, "start": 2138.04, + "end": 2142.96, "text": " a lot and did some talks on not like really detailed, + but it always popped up.", "tokens": [50474, 257, 688, 293, 630, 512, 6686, 322, + 406, 411, 534, 9942, 11, 457, 309, 1009, 21545, 493, 13, 50720], "temperature": + 0.0, "avg_logprob": -0.2624570429325104, "compression_ratio": 1.7777777777777777, + "no_speech_prob": 0.02268509939312935}, {"id": 636, "seek": 213584, "start": 2142.96, + "end": 2147.32, "text": " So for all these like recommendation systems, like what''s + I think I have never looked", "tokens": [50720, 407, 337, 439, 613, 411, 11879, + 3652, 11, 411, 437, 311, 286, 519, 286, 362, 1128, 2956, 50938], "temperature": + 0.0, "avg_logprob": -0.2624570429325104, "compression_ratio": 1.7777777777777777, + "no_speech_prob": 0.02268509939312935}, {"id": 637, "seek": 213584, "start": 2147.32, + "end": 2149.6000000000004, "text": " at like their deep everything''s behind closed + doors.", "tokens": [50938, 412, 411, 641, 2452, 1203, 311, 2261, 5395, 8077, 13, + 51052], "temperature": 0.0, "avg_logprob": -0.2624570429325104, "compression_ratio": + 1.7777777777777777, "no_speech_prob": 0.02268509939312935}, {"id": 638, "seek": + 213584, "start": 2149.6000000000004, "end": 2154.4, "text": " But it''s like spotifies + recommendation and then when you''re doing like the shazam, I think", "tokens": + [51052, 583, 309, 311, 
411, 4008, 11221, 11879, 293, 550, 562, 291, 434, 884, 411, + 264, 402, 921, 335, 11, 286, 519, 51292], "temperature": 0.0, "avg_logprob": -0.2624570429325104, + "compression_ratio": 1.7777777777777777, "no_speech_prob": 0.02268509939312935}, + {"id": 639, "seek": 213584, "start": 2154.4, "end": 2160.2000000000003, "text": + " was shazam for the audio recognizing is, yeah, they separate the background music + and the", "tokens": [51292, 390, 402, 921, 335, 337, 264, 6278, 18538, 307, 11, + 1338, 11, 436, 4994, 264, 3678, 1318, 293, 264, 51582], "temperature": 0.0, "avg_logprob": + -0.2624570429325104, "compression_ratio": 1.7777777777777777, "no_speech_prob": + 0.02268509939312935}, {"id": 640, "seek": 213584, "start": 2160.2000000000003, "end": + 2163.1200000000003, "text": " vocals and they pretty much discard vocals.", "tokens": + [51582, 28441, 293, 436, 1238, 709, 31597, 28441, 13, 51728], "temperature": 0.0, + "avg_logprob": -0.2624570429325104, "compression_ratio": 1.7777777777777777, "no_speech_prob": + 0.02268509939312935}, {"id": 641, "seek": 216312, "start": 2163.12, "end": 2168.48, + "text": " This music searching is based just in the background and there were some + techniques.", "tokens": [50364, 639, 1318, 10808, 307, 2361, 445, 294, 264, 3678, + 293, 456, 645, 512, 7512, 13, 50632], "temperature": 0.0, "avg_logprob": -0.209587828318278, + "compression_ratio": 1.866412213740458, "no_speech_prob": 0.006004296708852053}, + {"id": 642, "seek": 216312, "start": 2168.48, "end": 2176.2, "text": " So I think + there are for separating the audio, there is like one D neural nets that kind", + "tokens": [50632, 407, 286, 519, 456, 366, 337, 29279, 264, 6278, 11, 456, 307, + 411, 472, 413, 18161, 36170, 300, 733, 51018], "temperature": 0.0, "avg_logprob": + -0.209587828318278, "compression_ratio": 1.866412213740458, "no_speech_prob": 0.006004296708852053}, + {"id": 643, "seek": 216312, "start": 2176.2, "end": 2179.7599999999998, "text": + " of go in the line 
and there''s like times used based neural nets.", "tokens": + [51018, 295, 352, 294, 264, 1622, 293, 456, 311, 411, 1413, 1143, 2361, 18161, 36170, + 13, 51196], "temperature": 0.0, "avg_logprob": -0.209587828318278, "compression_ratio": + 1.866412213740458, "no_speech_prob": 0.006004296708852053}, {"id": 644, "seek": + 216312, "start": 2179.7599999999998, "end": 2182.12, "text": " But another one that + was audio inversion.", "tokens": [51196, 583, 1071, 472, 300, 390, 6278, 43576, + 13, 51314], "temperature": 0.0, "avg_logprob": -0.209587828318278, "compression_ratio": + 1.866412213740458, "no_speech_prob": 0.006004296708852053}, {"id": 645, "seek": + 216312, "start": 2182.12, "end": 2185.88, "text": " So it would help when you had + the background where you didn''t vert it or where you had", "tokens": [51314, 407, + 309, 576, 854, 562, 291, 632, 264, 3678, 689, 291, 994, 380, 6509, 309, 420, 689, + 291, 632, 51502], "temperature": 0.0, "avg_logprob": -0.209587828318278, "compression_ratio": + 1.866412213740458, "no_speech_prob": 0.006004296708852053}, {"id": 646, "seek": + 216312, "start": 2185.88, "end": 2188.0, "text": " the vocals to get the audio out.", + "tokens": [51502, 264, 28441, 281, 483, 264, 6278, 484, 13, 51608], "temperature": + 0.0, "avg_logprob": -0.209587828318278, "compression_ratio": 1.866412213740458, + "no_speech_prob": 0.006004296708852053}, {"id": 647, "seek": 216312, "start": 2188.0, + "end": 2192.48, "text": " But a lot of it was working on that of pulling out the + background music is the big step.", "tokens": [51608, 583, 257, 688, 295, 309, 390, + 1364, 322, 300, 295, 8407, 484, 264, 3678, 1318, 307, 264, 955, 1823, 13, 51832], + "temperature": 0.0, "avg_logprob": -0.209587828318278, "compression_ratio": 1.866412213740458, + "no_speech_prob": 0.006004296708852053}, {"id": 648, "seek": 219248, "start": 2192.48, + "end": 2196.2400000000002, "text": " And then performing the neural net on that + to get the embedding.", "tokens": [50364, 
400, 550, 10205, 264, 18161, 2533, 322, + 300, 281, 483, 264, 12240, 3584, 13, 50552], "temperature": 0.0, "avg_logprob": + -0.20243375054721174, "compression_ratio": 1.873846153846154, "no_speech_prob": + 0.003102038288488984}, {"id": 649, "seek": 219248, "start": 2196.2400000000002, + "end": 2200.84, "text": " So that''s how you avoid if you''re recommending songs, + you would cover songs and you kind", "tokens": [50552, 407, 300, 311, 577, 291, + 5042, 498, 291, 434, 30559, 5781, 11, 291, 576, 2060, 5781, 293, 291, 733, 50782], + "temperature": 0.0, "avg_logprob": -0.20243375054721174, "compression_ratio": 1.873846153846154, + "no_speech_prob": 0.003102038288488984}, {"id": 650, "seek": 219248, "start": 2200.84, + "end": 2204.2400000000002, "text": " of avoid it for you can easily filter out cover + songs because they''re going to have the", "tokens": [50782, 295, 5042, 309, 337, + 291, 393, 3612, 6608, 484, 2060, 5781, 570, 436, 434, 516, 281, 362, 264, 50952], + "temperature": 0.0, "avg_logprob": -0.20243375054721174, "compression_ratio": 1.873846153846154, + "no_speech_prob": 0.003102038288488984}, {"id": 651, "seek": 219248, "start": 2204.2400000000002, + "end": 2205.2400000000002, "text": " exact same background.", "tokens": [50952, + 1900, 912, 3678, 13, 51002], "temperature": 0.0, "avg_logprob": -0.20243375054721174, + "compression_ratio": 1.873846153846154, "no_speech_prob": 0.003102038288488984}, + {"id": 652, "seek": 219248, "start": 2205.2400000000002, "end": 2207.08, "text": + " The vocals will be different.", "tokens": [51002, 440, 28441, 486, 312, 819, 13, + 51094], "temperature": 0.0, "avg_logprob": -0.20243375054721174, "compression_ratio": + 1.873846153846154, "no_speech_prob": 0.003102038288488984}, {"id": 653, "seek": + 219248, "start": 2207.08, "end": 2211.72, "text": " And then with these recommendation + system, another cool thing is you don''t want the perfect", "tokens": [51094, 400, + 550, 365, 613, 11879, 1185, 11, 1071, 1627, 551, 307, 
291, 500, 380, 528, 264, 2176, + 51326], "temperature": 0.0, "avg_logprob": -0.20243375054721174, "compression_ratio": + 1.873846153846154, "no_speech_prob": 0.003102038288488984}, {"id": 654, "seek": + 219248, "start": 2211.72, "end": 2215.56, "text": " similar like with your result + search result, you don''t want the exact same research.", "tokens": [51326, 2531, + 411, 365, 428, 1874, 3164, 1874, 11, 291, 500, 380, 528, 264, 1900, 912, 2132, 13, + 51518], "temperature": 0.0, "avg_logprob": -0.20243375054721174, "compression_ratio": + 1.873846153846154, "no_speech_prob": 0.003102038288488984}, {"id": 655, "seek": + 219248, "start": 2215.56, "end": 2217.8, "text": " It''s like you don''t want the + top 10 closest.", "tokens": [51518, 467, 311, 411, 291, 500, 380, 528, 264, 1192, + 1266, 13699, 13, 51630], "temperature": 0.0, "avg_logprob": -0.20243375054721174, + "compression_ratio": 1.873846153846154, "no_speech_prob": 0.003102038288488984}, + {"id": 656, "seek": 219248, "start": 2217.8, "end": 2222.08, "text": " You might + want like the last 10 out of 100 that are close because you want something similar", + "tokens": [51630, 509, 1062, 528, 411, 264, 1036, 1266, 484, 295, 2319, 300, 366, + 1998, 570, 291, 528, 746, 2531, 51844], "temperature": 0.0, "avg_logprob": -0.20243375054721174, + "compression_ratio": 1.873846153846154, "no_speech_prob": 0.003102038288488984}, + {"id": 657, "seek": 222208, "start": 2222.88, "end": 2225.56, "text": " but not + really exact of the same.", "tokens": [50404, 457, 406, 534, 1900, 295, 264, 912, + 13, 50538], "temperature": 0.0, "avg_logprob": -0.1877386810582712, "compression_ratio": + 1.6732283464566928, "no_speech_prob": 0.017068244516849518}, {"id": 658, "seek": + 222208, "start": 2225.56, "end": 2229.2, "text": " But yeah, audio inversion and + one D neural nets and a few others.", "tokens": [50538, 583, 1338, 11, 6278, 43576, + 293, 472, 413, 18161, 36170, 293, 257, 1326, 2357, 13, 50720], "temperature": 0.0, + 
"avg_logprob": -0.1877386810582712, "compression_ratio": 1.6732283464566928, "no_speech_prob": + 0.017068244516849518}, {"id": 659, "seek": 222208, "start": 2229.2, "end": 2233.92, + "text": " I don''t remember that on the top of my head, but it''s a hard problem + to solve of getting", "tokens": [50720, 286, 500, 380, 1604, 300, 322, 264, 1192, + 295, 452, 1378, 11, 457, 309, 311, 257, 1152, 1154, 281, 5039, 295, 1242, 50956], + "temperature": 0.0, "avg_logprob": -0.1877386810582712, "compression_ratio": 1.6732283464566928, + "no_speech_prob": 0.017068244516849518}, {"id": 660, "seek": 222208, "start": 2233.92, + "end": 2238.72, "text": " the vocals out without having like separated files already.", + "tokens": [50956, 264, 28441, 484, 1553, 1419, 411, 12005, 7098, 1217, 13, 51196], + "temperature": 0.0, "avg_logprob": -0.1877386810582712, "compression_ratio": 1.6732283464566928, + "no_speech_prob": 0.017068244516849518}, {"id": 661, "seek": 222208, "start": 2238.72, + "end": 2243.96, "text": " And it''s like an exciting topic because like, you know, + like there are so many examples", "tokens": [51196, 400, 309, 311, 411, 364, 4670, + 4829, 570, 411, 11, 291, 458, 11, 411, 456, 366, 370, 867, 5110, 51458], "temperature": + 0.0, "avg_logprob": -0.1877386810582712, "compression_ratio": 1.6732283464566928, + "no_speech_prob": 0.017068244516849518}, {"id": 662, "seek": 222208, "start": 2243.96, + "end": 2249.88, "text": " on the web how you can index text, you know, how you can + not index text and do something", "tokens": [51458, 322, 264, 3670, 577, 291, 393, + 8186, 2487, 11, 291, 458, 11, 577, 291, 393, 406, 8186, 2487, 293, 360, 746, 51754], + "temperature": 0.0, "avg_logprob": -0.1877386810582712, "compression_ratio": 1.6732283464566928, + "no_speech_prob": 0.017068244516849518}, {"id": 663, "seek": 224988, "start": 2249.88, + "end": 2252.52, "text": " else with text and more text, right?", "tokens": [50364, + 1646, 365, 2487, 293, 544, 2487, 11, 558, 30, 
50496], "temperature": 0.0, "avg_logprob": + -0.18228666191427118, "compression_ratio": 1.6742424242424243, "no_speech_prob": + 0.0386434905230999}, {"id": 664, "seek": 224988, "start": 2252.52, "end": 2260.52, + "text": " But it''s like in frequent that I come across some image search or audio + or even for that", "tokens": [50496, 583, 309, 311, 411, 294, 18004, 300, 286, 808, + 2108, 512, 3256, 3164, 420, 6278, 420, 754, 337, 300, 50896], "temperature": 0.0, + "avg_logprob": -0.18228666191427118, "compression_ratio": 1.6742424242424243, "no_speech_prob": + 0.0386434905230999}, {"id": 665, "seek": 224988, "start": 2260.52, "end": 2263.6400000000003, + "text": " matter video, you know, I haven''t seen any blog posts on the video.", + "tokens": [50896, 1871, 960, 11, 291, 458, 11, 286, 2378, 380, 1612, 604, 6968, + 12300, 322, 264, 960, 13, 51052], "temperature": 0.0, "avg_logprob": -0.18228666191427118, + "compression_ratio": 1.6742424242424243, "no_speech_prob": 0.0386434905230999}, + {"id": 666, "seek": 224988, "start": 2263.6400000000003, "end": 2265.44, "text": + " I don''t know if you guys have it.", "tokens": [51052, 286, 500, 380, 458, 498, + 291, 1074, 362, 309, 13, 51142], "temperature": 0.0, "avg_logprob": -0.18228666191427118, + "compression_ratio": 1.6742424242424243, "no_speech_prob": 0.0386434905230999}, + {"id": 667, "seek": 224988, "start": 2265.44, "end": 2270.84, "text": " So video, + yeah, that''s that one gets a little difficult in terms of like video you can,", + "tokens": [51142, 407, 960, 11, 1338, 11, 300, 311, 300, 472, 2170, 257, 707, 2252, + 294, 2115, 295, 411, 960, 291, 393, 11, 51412], "temperature": 0.0, "avg_logprob": + -0.18228666191427118, "compression_ratio": 1.6742424242424243, "no_speech_prob": + 0.0386434905230999}, {"id": 668, "seek": 224988, "start": 2270.84, "end": 2274.48, + "text": " but also how you''re going to sort everything out because when you''re + doing dealing with", "tokens": [51412, 457, 611, 577, 291, 434, 
516, 281, 1333, + 1203, 484, 570, 562, 291, 434, 884, 6260, 365, 51594], "temperature": 0.0, "avg_logprob": + -0.18228666191427118, "compression_ratio": 1.6742424242424243, "no_speech_prob": + 0.0386434905230999}, {"id": 669, "seek": 224988, "start": 2274.48, "end": 2276.8, + "text": " videos, everything is framed by frame.", "tokens": [51594, 2145, 11, 1203, + 307, 30420, 538, 3920, 13, 51710], "temperature": 0.0, "avg_logprob": -0.18228666191427118, + "compression_ratio": 1.6742424242424243, "no_speech_prob": 0.0386434905230999}, + {"id": 670, "seek": 227680, "start": 2276.8, "end": 2282.6800000000003, "text": + " So then it''s how to do you take every frame and sort of group it together into + one sort", "tokens": [50364, 407, 550, 309, 311, 577, 281, 360, 291, 747, 633, 3920, + 293, 1333, 295, 1594, 309, 1214, 666, 472, 1333, 50658], "temperature": 0.0, "avg_logprob": + -0.18744122896263068, "compression_ratio": 1.8333333333333333, "no_speech_prob": + 0.016027219593524933}, {"id": 671, "seek": 227680, "start": 2282.6800000000003, + "end": 2286.96, "text": " of like ID and then if any frame matches, you kind of + point to that, it gets a little", "tokens": [50658, 295, 411, 7348, 293, 550, 498, + 604, 3920, 10676, 11, 291, 733, 295, 935, 281, 300, 11, 309, 2170, 257, 707, 50872], + "temperature": 0.0, "avg_logprob": -0.18744122896263068, "compression_ratio": 1.8333333333333333, + "no_speech_prob": 0.016027219593524933}, {"id": 672, "seek": 227680, "start": 2286.96, + "end": 2289.32, "text": " difficult with video.", "tokens": [50872, 2252, 365, 960, + 13, 50990], "temperature": 0.0, "avg_logprob": -0.18744122896263068, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.016027219593524933}, {"id": 673, "seek": + 227680, "start": 2289.32, "end": 2293.32, "text": " It''s not too bad if you''re + doing let''s say live tracking in a video like, like to say", "tokens": [50990, + 467, 311, 406, 886, 1578, 498, 291, 434, 884, 718, 311, 584, 1621, 11603, 294, 
257, + 960, 411, 11, 411, 281, 584, 51190], "temperature": 0.0, "avg_logprob": -0.18744122896263068, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.016027219593524933}, + {"id": 674, "seek": 227680, "start": 2293.32, "end": 2297.1200000000003, "text": + " there''s a soccer player and you pull out the most similar player that looks like + him,", "tokens": [51190, 456, 311, 257, 15469, 4256, 293, 291, 2235, 484, 264, 881, + 2531, 4256, 300, 1542, 411, 796, 11, 51380], "temperature": 0.0, "avg_logprob": + -0.18744122896263068, "compression_ratio": 1.8333333333333333, "no_speech_prob": + 0.016027219593524933}, {"id": 675, "seek": 227680, "start": 2297.1200000000003, + "end": 2298.52, "text": " you can get a name for him to track him.", "tokens": [51380, + 291, 393, 483, 257, 1315, 337, 796, 281, 2837, 796, 13, 51450], "temperature": 0.0, + "avg_logprob": -0.18744122896263068, "compression_ratio": 1.8333333333333333, "no_speech_prob": + 0.016027219593524933}, {"id": 676, "seek": 227680, "start": 2298.52, "end": 2299.88, + "text": " So it knows his name.", "tokens": [51450, 407, 309, 3255, 702, 1315, 13, + 51518], "temperature": 0.0, "avg_logprob": -0.18744122896263068, "compression_ratio": + 1.8333333333333333, "no_speech_prob": 0.016027219593524933}, {"id": 677, "seek": + 227680, "start": 2299.88, "end": 2302.6000000000004, "text": " That''s kind of similar + if you''re doing live tracking, but if you''re looking for things", "tokens": [51518, + 663, 311, 733, 295, 2531, 498, 291, 434, 884, 1621, 11603, 11, 457, 498, 291, 434, + 1237, 337, 721, 51654], "temperature": 0.0, "avg_logprob": -0.18744122896263068, + "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.016027219593524933}, + {"id": 678, "seek": 230260, "start": 2302.6, "end": 2307.52, "text": " within a + video like you look for a single person an entire video, then it kind of gets", + "tokens": [50364, 1951, 257, 960, 411, 291, 574, 337, 257, 2167, 954, 364, 2302, + 960, 11, 550, 309, 
733, 295, 2170, 50610], "temperature": 0.0, "avg_logprob": -0.20567962579559862, + "compression_ratio": 1.7115384615384615, "no_speech_prob": 0.07865426689386368}, + {"id": 679, "seek": 230260, "start": 2307.52, "end": 2309.68, "text": " difficult + of it takes time.", "tokens": [50610, 2252, 295, 309, 2516, 565, 13, 50718], "temperature": + 0.0, "avg_logprob": -0.20567962579559862, "compression_ratio": 1.7115384615384615, + "no_speech_prob": 0.07865426689386368}, {"id": 680, "seek": 230260, "start": 2309.68, + "end": 2314.8399999999997, "text": " You either go through all of you index all + of them or you pull out a few key frames, but", "tokens": [50718, 509, 2139, 352, + 807, 439, 295, 291, 8186, 439, 295, 552, 420, 291, 2235, 484, 257, 1326, 2141, 12083, + 11, 457, 50976], "temperature": 0.0, "avg_logprob": -0.20567962579559862, "compression_ratio": + 1.7115384615384615, "no_speech_prob": 0.07865426689386368}, {"id": 681, "seek": + 230260, "start": 2314.8399999999997, "end": 2318.2, "text": " not too many people + I''m going to be honest are doing video yet.", "tokens": [50976, 406, 886, 867, + 561, 286, 478, 516, 281, 312, 3245, 366, 884, 960, 1939, 13, 51144], "temperature": + 0.0, "avg_logprob": -0.20567962579559862, "compression_ratio": 1.7115384615384615, + "no_speech_prob": 0.07865426689386368}, {"id": 682, "seek": 230260, "start": 2318.2, + "end": 2323.96, "text": " I think there is a little bit of lack right now to be + honest, I think images are the most", "tokens": [51144, 286, 519, 456, 307, 257, + 707, 857, 295, 5011, 558, 586, 281, 312, 3245, 11, 286, 519, 5267, 366, 264, 881, + 51432], "temperature": 0.0, "avg_logprob": -0.20567962579559862, "compression_ratio": + 1.7115384615384615, "no_speech_prob": 0.07865426689386368}, {"id": 683, "seek": + 230260, "start": 2323.96, "end": 2330.4, "text": " used for us like everyone because + images, I think, is the easiest even compared to text", "tokens": [51432, 1143, + 337, 505, 411, 1518, 570, 5267, 11, 
286, 519, 11, 307, 264, 12889, 754, 5347, 281, + 2487, 51754], "temperature": 0.0, "avg_logprob": -0.20567962579559862, "compression_ratio": + 1.7115384615384615, "no_speech_prob": 0.07865426689386368}, {"id": 684, "seek": + 233040, "start": 2330.48, "end": 2334.88, "text": " because the text you have, some + of these neural networks where the transformer networks", "tokens": [50368, 570, + 264, 2487, 291, 362, 11, 512, 295, 613, 18161, 9590, 689, 264, 31782, 9590, 50588], + "temperature": 0.0, "avg_logprob": -0.27217652247502255, "compression_ratio": 1.7209302325581395, + "no_speech_prob": 0.01971358247101307}, {"id": 685, "seek": 233040, "start": 2334.88, + "end": 2337.0, "text": " are a little bit hard to use.", "tokens": [50588, 366, + 257, 707, 857, 1152, 281, 764, 13, 50694], "temperature": 0.0, "avg_logprob": -0.27217652247502255, + "compression_ratio": 1.7209302325581395, "no_speech_prob": 0.01971358247101307}, + {"id": 686, "seek": 233040, "start": 2337.0, "end": 2340.84, "text": " You always + have like, yeah, you have like in Python you have sentence transformed the", "tokens": + [50694, 509, 1009, 362, 411, 11, 1338, 11, 291, 362, 411, 294, 15329, 291, 362, + 8174, 16894, 264, 50886], "temperature": 0.0, "avg_logprob": -0.27217652247502255, + "compression_ratio": 1.7209302325581395, "no_speech_prob": 0.01971358247101307}, + {"id": 687, "seek": 233040, "start": 2340.84, "end": 2344.08, "text": " easiest + one where you just input the string, but the other ones kind of required to add", + "tokens": [50886, 12889, 472, 689, 291, 445, 4846, 264, 6798, 11, 457, 264, 661, + 2306, 733, 295, 4739, 281, 909, 51048], "temperature": 0.0, "avg_logprob": -0.27217652247502255, + "compression_ratio": 1.7209302325581395, "no_speech_prob": 0.01971358247101307}, + {"id": 688, "seek": 233040, "start": 2344.08, "end": 2348.1600000000003, "text": + " tags in the string and do these things which not everyone understands.", "tokens": + [51048, 18632, 294, 264, 6798, 293, 360, 
613, 721, 597, 406, 1518, 15146, 13, 51252], + "temperature": 0.0, "avg_logprob": -0.27217652247502255, "compression_ratio": 1.7209302325581395, + "no_speech_prob": 0.01971358247101307}, {"id": 689, "seek": 233040, "start": 2348.1600000000003, + "end": 2355.52, "text": " With images, it''s just import some ResNet 50, which torch + makes it really simple.", "tokens": [51252, 2022, 5267, 11, 309, 311, 445, 974, + 512, 5015, 31890, 2625, 11, 597, 27822, 1669, 309, 534, 2199, 13, 51620], "temperature": + 0.0, "avg_logprob": -0.27217652247502255, "compression_ratio": 1.7209302325581395, + "no_speech_prob": 0.01971358247101307}, {"id": 690, "seek": 235552, "start": 2355.52, + "end": 2360.64, "text": " So the image put the image in the ResNet and then literally + you get your embedding vector", "tokens": [50364, 407, 264, 3256, 829, 264, 3256, + 294, 264, 5015, 31890, 293, 550, 3736, 291, 483, 428, 12240, 3584, 8062, 50620], + "temperature": 0.0, "avg_logprob": -0.240711378014606, "compression_ratio": 1.7350993377483444, + "no_speech_prob": 0.16822285950183868}, {"id": 691, "seek": 235552, "start": 2360.64, + "end": 2361.64, "text": " you can directly pipe it.", "tokens": [50620, 291, 393, + 3838, 11240, 309, 13, 50670], "temperature": 0.0, "avg_logprob": -0.240711378014606, + "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 692, "seek": 235552, "start": 2361.64, "end": 2365.56, "text": " So it''s + like, it''s a very simple one and it gets good results that are pretty interesting", + "tokens": [50670, 407, 309, 311, 411, 11, 309, 311, 257, 588, 2199, 472, 293, 309, + 2170, 665, 3542, 300, 366, 1238, 1880, 50866], "temperature": 0.0, "avg_logprob": + -0.240711378014606, "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 693, "seek": 235552, "start": 2365.56, "end": 2369.32, "text": " and you + can do a lot with images and I don''t think enough people are doing it yet for", + "tokens": [50866, 
293, 291, 393, 360, 257, 688, 365, 5267, 293, 286, 500, 380, 519, + 1547, 561, 366, 884, 309, 1939, 337, 51054], "temperature": 0.0, "avg_logprob": + -0.240711378014606, "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 694, "seek": 235552, "start": 2369.32, "end": 2371.48, "text": " like shopping + things.", "tokens": [51054, 411, 8688, 721, 13, 51162], "temperature": 0.0, "avg_logprob": + -0.240711378014606, "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 695, "seek": 235552, "start": 2371.48, "end": 2374.88, "text": " Everyone''s + still relying on text but let''s say you upload an image or a shoe, you find", "tokens": + [51162, 5198, 311, 920, 24140, 322, 2487, 457, 718, 311, 584, 291, 6580, 364, 3256, + 420, 257, 12796, 11, 291, 915, 51332], "temperature": 0.0, "avg_logprob": -0.240711378014606, + "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 696, "seek": 235552, "start": 2374.88, "end": 2375.88, "text": " that shoe.", + "tokens": [51332, 300, 12796, 13, 51382], "temperature": 0.0, "avg_logprob": -0.240711378014606, + "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 697, "seek": 235552, "start": 2375.88, "end": 2377.72, "text": " I think + everyone will enjoy that a lot more.", "tokens": [51382, 286, 519, 1518, 486, 2103, + 300, 257, 688, 544, 13, 51474], "temperature": 0.0, "avg_logprob": -0.240711378014606, + "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 698, "seek": 235552, "start": 2377.72, "end": 2378.72, "text": " Yeah.", + "tokens": [51474, 865, 13, 51524], "temperature": 0.0, "avg_logprob": -0.240711378014606, + "compression_ratio": 1.7350993377483444, "no_speech_prob": 0.16822285950183868}, + {"id": 699, "seek": 235552, "start": 2378.72, "end": 2382.84, "text": " So it says + like very concrete application in business, right?", "tokens": 
[51524, 407, 309, + 1619, 411, 588, 9859, 3861, 294, 1606, 11, 558, 30, 51730], "temperature": 0.0, + "avg_logprob": -0.240711378014606, "compression_ratio": 1.7350993377483444, "no_speech_prob": + 0.16822285950183868}, {"id": 700, "seek": 238284, "start": 2382.84, "end": 2386.2400000000002, + "text": " And e-commerce is a very big area.", "tokens": [50364, 400, 308, 12, 26926, + 307, 257, 588, 955, 1859, 13, 50534], "temperature": 0.0, "avg_logprob": -0.25533668858230496, + "compression_ratio": 1.6930091185410334, "no_speech_prob": 0.26431429386138916}, + {"id": 701, "seek": 238284, "start": 2386.2400000000002, "end": 2388.1600000000003, + "text": " So yeah, that makes total sense.", "tokens": [50534, 407, 1338, 11, 300, + 1669, 3217, 2020, 13, 50630], "temperature": 0.0, "avg_logprob": -0.25533668858230496, + "compression_ratio": 1.6930091185410334, "no_speech_prob": 0.26431429386138916}, + {"id": 702, "seek": 238284, "start": 2388.1600000000003, "end": 2392.96, "text": + " It''s not like many users are like, oh, I remember that scene in the movie, can + I find", "tokens": [50630, 467, 311, 406, 411, 867, 5022, 366, 411, 11, 1954, 11, + 286, 1604, 300, 4145, 294, 264, 3169, 11, 393, 286, 915, 50870], "temperature": + 0.0, "avg_logprob": -0.25533668858230496, "compression_ratio": 1.6930091185410334, + "no_speech_prob": 0.26431429386138916}, {"id": 703, "seek": 238284, "start": 2392.96, + "end": 2395.04, "text": " it expressing it in words?", "tokens": [50870, 309, 22171, + 309, 294, 2283, 30, 50974], "temperature": 0.0, "avg_logprob": -0.25533668858230496, + "compression_ratio": 1.6930091185410334, "no_speech_prob": 0.26431429386138916}, + {"id": 704, "seek": 238284, "start": 2395.04, "end": 2397.52, "text": " Yeah, it + won''t work.", "tokens": [50974, 865, 11, 309, 1582, 380, 589, 13, 51098], "temperature": + 0.0, "avg_logprob": -0.25533668858230496, "compression_ratio": 1.6930091185410334, + "no_speech_prob": 0.26431429386138916}, {"id": 705, "seek": 238284, 
"start": 2397.52, + "end": 2400.6400000000003, "text": " Maybe you can like say the actor and then like + some description of the scene, but then", "tokens": [51098, 2704, 291, 393, 411, + 584, 264, 8747, 293, 550, 411, 512, 3855, 295, 264, 4145, 11, 457, 550, 51254], + "temperature": 0.0, "avg_logprob": -0.25533668858230496, "compression_ratio": 1.6930091185410334, + "no_speech_prob": 0.26431429386138916}, {"id": 706, "seek": 238284, "start": 2400.6400000000003, + "end": 2404.2400000000002, "text": " you already have to know the actor, which personally, + I don''t know any actor names.", "tokens": [51254, 291, 1217, 362, 281, 458, 264, + 8747, 11, 597, 5665, 11, 286, 500, 380, 458, 604, 8747, 5288, 13, 51434], "temperature": + 0.0, "avg_logprob": -0.25533668858230496, "compression_ratio": 1.6930091185410334, + "no_speech_prob": 0.26431429386138916}, {"id": 707, "seek": 238284, "start": 2404.2400000000002, + "end": 2405.2400000000002, "text": " So yeah, exactly.", "tokens": [51434, 407, + 1338, 11, 2293, 13, 51484], "temperature": 0.0, "avg_logprob": -0.25533668858230496, + "compression_ratio": 1.6930091185410334, "no_speech_prob": 0.26431429386138916}, + {"id": 708, "seek": 238284, "start": 2405.2400000000002, "end": 2406.2400000000002, + "text": " It doesn''t work for me.", "tokens": [51484, 467, 1177, 380, 589, 337, + 385, 13, 51534], "temperature": 0.0, "avg_logprob": -0.25533668858230496, "compression_ratio": + 1.6930091185410334, "no_speech_prob": 0.26431429386138916}, {"id": 709, "seek": + 238284, "start": 2406.2400000000002, "end": 2408.2400000000002, "text": " And it + and it defeats the purpose of search, right?", "tokens": [51534, 400, 309, 293, + 309, 7486, 1720, 264, 4334, 295, 3164, 11, 558, 30, 51634], "temperature": 0.0, + "avg_logprob": -0.25533668858230496, "compression_ratio": 1.6930091185410334, "no_speech_prob": + 0.26431429386138916}, {"id": 710, "seek": 238284, "start": 2408.2400000000002, "end": + 2412.6400000000003, "text": " Because 
actually like early on when I was kind of + just entering this field many years ago,", "tokens": [51634, 1436, 767, 411, 2440, + 322, 562, 286, 390, 733, 295, 445, 11104, 341, 2519, 867, 924, 2057, 11, 51854], + "temperature": 0.0, "avg_logprob": -0.25533668858230496, "compression_ratio": 1.6930091185410334, + "no_speech_prob": 0.26431429386138916}, {"id": 711, "seek": 241264, "start": 2412.64, + "end": 2417.6, "text": " I was like, so search, it''s like I need to know what to + look for, right?", "tokens": [50364, 286, 390, 411, 11, 370, 3164, 11, 309, 311, + 411, 286, 643, 281, 458, 437, 281, 574, 337, 11, 558, 30, 50612], "temperature": + 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": 1.956989247311828, + "no_speech_prob": 0.018702195957303047}, {"id": 712, "seek": 241264, "start": 2417.6, + "end": 2421.8399999999997, "text": " So I''m typing the keywords, telling the search + engine what I''m looking for, but I don''t", "tokens": [50612, 407, 286, 478, 18444, + 264, 21009, 11, 3585, 264, 3164, 2848, 437, 286, 478, 1237, 337, 11, 457, 286, 500, + 380, 50824], "temperature": 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": + 1.956989247311828, "no_speech_prob": 0.018702195957303047}, {"id": 713, "seek": + 241264, "start": 2421.8399999999997, "end": 2422.8399999999997, "text": " know what + I''m looking for.", "tokens": [50824, 458, 437, 286, 478, 1237, 337, 13, 50874], + "temperature": 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": 1.956989247311828, + "no_speech_prob": 0.018702195957303047}, {"id": 714, "seek": 241264, "start": 2422.8399999999997, + "end": 2425.44, "text": " Yeah, you''re already doing the job for it.", "tokens": + [50874, 865, 11, 291, 434, 1217, 884, 264, 1691, 337, 309, 13, 51004], "temperature": + 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": 1.956989247311828, + "no_speech_prob": 0.018702195957303047}, {"id": 715, "seek": 241264, "start": 2425.44, + "end": 2426.44, "text": " 
Yeah.", "tokens": [51004, 865, 13, 51054], "temperature": + 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": 1.956989247311828, + "no_speech_prob": 0.018702195957303047}, {"id": 716, "seek": 241264, "start": 2426.44, + "end": 2429.12, "text": " Yeah, like if you''re searching for the keywords, that + means you really need to search", "tokens": [51054, 865, 11, 411, 498, 291, 434, + 10808, 337, 264, 21009, 11, 300, 1355, 291, 534, 643, 281, 3164, 51188], "temperature": + 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": 1.956989247311828, + "no_speech_prob": 0.018702195957303047}, {"id": 717, "seek": 241264, "start": 2429.12, + "end": 2432.24, "text": " for the engine anymore if you''re doing all of this job.", + "tokens": [51188, 337, 264, 2848, 3602, 498, 291, 434, 884, 439, 295, 341, 1691, + 13, 51344], "temperature": 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": + 1.956989247311828, "no_speech_prob": 0.018702195957303047}, {"id": 718, "seek": + 241264, "start": 2432.24, "end": 2433.24, "text": " Yeah.", "tokens": [51344, 865, + 13, 51394], "temperature": 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": + 1.956989247311828, "no_speech_prob": 0.018702195957303047}, {"id": 719, "seek": + 241264, "start": 2433.24, "end": 2436.2799999999997, "text": " And I really loved + one competition that Yandex did.", "tokens": [51394, 400, 286, 534, 4333, 472, 6211, + 300, 398, 474, 3121, 630, 13, 51546], "temperature": 0.0, "avg_logprob": -0.28024149248676916, + "compression_ratio": 1.956989247311828, "no_speech_prob": 0.018702195957303047}, + {"id": 720, "seek": 241264, "start": 2436.2799999999997, "end": 2439.2799999999997, + "text": " Like they actually stopped doing it in its epiti.", "tokens": [51546, + 1743, 436, 767, 5936, 884, 309, 294, 1080, 2388, 8707, 13, 51696], "temperature": + 0.0, "avg_logprob": -0.28024149248676916, "compression_ratio": 1.956989247311828, + "no_speech_prob": 0.018702195957303047}, {"id": 
721, "seek": 241264, "start": 2439.2799999999997, + "end": 2442.3199999999997, "text": " So the competition was like this, they give + you a question.", "tokens": [51696, 407, 264, 6211, 390, 411, 341, 11, 436, 976, + 291, 257, 1168, 13, 51848], "temperature": 0.0, "avg_logprob": -0.28024149248676916, + "compression_ratio": 1.956989247311828, "no_speech_prob": 0.018702195957303047}, + {"id": 722, "seek": 244232, "start": 2442.32, "end": 2445.0800000000004, "text": + " And basically you compete with other people, all right?", "tokens": [50364, 400, + 1936, 291, 11831, 365, 661, 561, 11, 439, 558, 30, 50502], "temperature": 0.0, "avg_logprob": + -0.17692713774451913, "compression_ratio": 1.7813620071684588, "no_speech_prob": + 0.003576674498617649}, {"id": 723, "seek": 244232, "start": 2445.0800000000004, + "end": 2450.2400000000002, "text": " So they give you a question, but it''s not + like what is the color of submarine in the", "tokens": [50502, 407, 436, 976, 291, + 257, 1168, 11, 457, 309, 311, 406, 411, 437, 307, 264, 2017, 295, 33995, 294, 264, + 50760], "temperature": 0.0, "avg_logprob": -0.17692713774451913, "compression_ratio": + 1.7813620071684588, "no_speech_prob": 0.003576674498617649}, {"id": 724, "seek": + 244232, "start": 2450.2400000000002, "end": 2452.28, "text": " in the song of Beatles, + right?", "tokens": [50760, 294, 264, 2153, 295, 38376, 11, 558, 30, 50862], "temperature": + 0.0, "avg_logprob": -0.17692713774451913, "compression_ratio": 1.7813620071684588, + "no_speech_prob": 0.003576674498617649}, {"id": 725, "seek": 244232, "start": 2452.28, + "end": 2456.6800000000003, "text": " It''s like you first need to answer the first + part of the question.", "tokens": [50862, 467, 311, 411, 291, 700, 643, 281, 1867, + 264, 700, 644, 295, 264, 1168, 13, 51082], "temperature": 0.0, "avg_logprob": -0.17692713774451913, + "compression_ratio": 1.7813620071684588, "no_speech_prob": 0.003576674498617649}, + {"id": 726, "seek": 244232, "start": 
2456.6800000000003, "end": 2461.88, "text": + " Then you get like kind of another puzzle, another question kind of, you know, + the puzzle", "tokens": [51082, 1396, 291, 483, 411, 733, 295, 1071, 12805, 11, 1071, + 1168, 733, 295, 11, 291, 458, 11, 264, 12805, 51342], "temperature": 0.0, "avg_logprob": + -0.17692713774451913, "compression_ratio": 1.7813620071684588, "no_speech_prob": + 0.003576674498617649}, {"id": 727, "seek": 244232, "start": 2461.88, "end": 2464.1200000000003, + "text": " gets solved and you get the full question and so on.", "tokens": [51342, + 2170, 13041, 293, 291, 483, 264, 1577, 1168, 293, 370, 322, 13, 51454], "temperature": + 0.0, "avg_logprob": -0.17692713774451913, "compression_ratio": 1.7813620071684588, + "no_speech_prob": 0.003576674498617649}, {"id": 728, "seek": 244232, "start": 2464.1200000000003, + "end": 2466.36, "text": " And like it''s multi-layered process.", "tokens": [51454, + 400, 411, 309, 311, 4825, 12, 8376, 4073, 1399, 13, 51566], "temperature": 0.0, + "avg_logprob": -0.17692713774451913, "compression_ratio": 1.7813620071684588, "no_speech_prob": + 0.003576674498617649}, {"id": 729, "seek": 244232, "start": 2466.36, "end": 2470.7200000000003, + "text": " So basically they''re telling you that a cool search engine would be doing + that.", "tokens": [51566, 407, 1936, 436, 434, 3585, 291, 300, 257, 1627, 3164, + 2848, 576, 312, 884, 300, 13, 51784], "temperature": 0.0, "avg_logprob": -0.17692713774451913, + "compression_ratio": 1.7813620071684588, "no_speech_prob": 0.003576674498617649}, + {"id": 730, "seek": 247072, "start": 2470.72, "end": 2476.3999999999996, "text": + " So you could ask like a very convoluted question and it would kind of figure everything + out.", "tokens": [50364, 407, 291, 727, 1029, 411, 257, 588, 3754, 2308, 292, 1168, + 293, 309, 576, 733, 295, 2573, 1203, 484, 13, 50648], "temperature": 0.0, "avg_logprob": + -0.18848318802682976, "compression_ratio": 1.8829787234042554, "no_speech_prob": + 
0.004213270265609026}, {"id": 731, "seek": 247072, "start": 2476.3999999999996, + "end": 2481.48, "text": " I feel like you might know more, isn''t there the the + aspect of I remember taking an MLP", "tokens": [50648, 286, 841, 411, 291, 1062, + 458, 544, 11, 1943, 380, 456, 264, 264, 4171, 295, 286, 1604, 1940, 364, 21601, + 47, 50902], "temperature": 0.0, "avg_logprob": -0.18848318802682976, "compression_ratio": + 1.8829787234042554, "no_speech_prob": 0.004213270265609026}, {"id": 732, "seek": + 247072, "start": 2481.48, "end": 2482.48, "text": " class.", "tokens": [50902, 1508, + 13, 50952], "temperature": 0.0, "avg_logprob": -0.18848318802682976, "compression_ratio": + 1.8829787234042554, "no_speech_prob": 0.004213270265609026}, {"id": 733, "seek": + 247072, "start": 2482.48, "end": 2487.48, "text": " It was always you could only + get to two degrees or was it like the third degree would always", "tokens": [50952, + 467, 390, 1009, 291, 727, 787, 483, 281, 732, 5310, 420, 390, 309, 411, 264, 2636, + 4314, 576, 1009, 51202], "temperature": 0.0, "avg_logprob": -0.18848318802682976, + "compression_ratio": 1.8829787234042554, "no_speech_prob": 0.004213270265609026}, + {"id": 734, "seek": 247072, "start": 2487.48, "end": 2491.56, "text": " like if + you have a question and then like based on that answer, you have another like", + "tokens": [51202, 411, 498, 291, 362, 257, 1168, 293, 550, 411, 2361, 322, 300, + 1867, 11, 291, 362, 1071, 411, 51406], "temperature": 0.0, "avg_logprob": -0.18848318802682976, + "compression_ratio": 1.8829787234042554, "no_speech_prob": 0.004213270265609026}, + {"id": 735, "seek": 247072, "start": 2491.56, "end": 2494.9599999999996, "text": + " part of like the next part of the question is placed on the first one.", "tokens": + [51406, 644, 295, 411, 264, 958, 644, 295, 264, 1168, 307, 7074, 322, 264, 700, + 472, 13, 51576], "temperature": 0.0, "avg_logprob": -0.18848318802682976, "compression_ratio": + 1.8829787234042554, 
"no_speech_prob": 0.004213270265609026}, {"id": 736, "seek": + 247072, "start": 2494.9599999999996, "end": 2498.52, "text": " I think they were + only being able to do the second degree like a question after the", "tokens": [51576, + 286, 519, 436, 645, 787, 885, 1075, 281, 360, 264, 1150, 4314, 411, 257, 1168, 934, + 264, 51754], "temperature": 0.0, "avg_logprob": -0.18848318802682976, "compression_ratio": + 1.8829787234042554, "no_speech_prob": 0.004213270265609026}, {"id": 737, "seek": + 247072, "start": 2498.52, "end": 2499.52, "text": " question.", "tokens": [51754, + 1168, 13, 51804], "temperature": 0.0, "avg_logprob": -0.18848318802682976, "compression_ratio": + 1.8829787234042554, "no_speech_prob": 0.004213270265609026}, {"id": 738, "seek": + 249952, "start": 2499.52, "end": 2502.84, "text": " And like getting that third + part, it would always fail.", "tokens": [50364, 400, 411, 1242, 300, 2636, 644, + 11, 309, 576, 1009, 3061, 13, 50530], "temperature": 0.0, "avg_logprob": -0.2986171556555707, + "compression_ratio": 1.6806083650190113, "no_speech_prob": 0.10566551238298416}, + {"id": 739, "seek": 249952, "start": 2502.84, "end": 2506.12, "text": " Like it + wouldn''t be able to do the connection all the way back.", "tokens": [50530, 1743, + 309, 2759, 380, 312, 1075, 281, 360, 264, 4984, 439, 264, 636, 646, 13, 50694], + "temperature": 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": 1.6806083650190113, + "no_speech_prob": 0.10566551238298416}, {"id": 740, "seek": 249952, "start": 2506.12, + "end": 2507.12, "text": " Yeah, yeah.", "tokens": [50694, 865, 11, 1338, 13, 50744], + "temperature": 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": 1.6806083650190113, + "no_speech_prob": 0.10566551238298416}, {"id": 741, "seek": 249952, "start": 2507.12, + "end": 2508.16, "text": " They stopped doing that.", "tokens": [50744, 814, 5936, + 884, 300, 13, 50796], "temperature": 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": + 
1.6806083650190113, "no_speech_prob": 0.10566551238298416}, {"id": 742, "seek": + 249952, "start": 2508.16, "end": 2510.36, "text": " Seems like a really good conversation.", + "tokens": [50796, 22524, 411, 257, 534, 665, 3761, 13, 50906], "temperature": 0.0, + "avg_logprob": -0.2986171556555707, "compression_ratio": 1.6806083650190113, "no_speech_prob": + 0.10566551238298416}, {"id": 743, "seek": 249952, "start": 2510.36, "end": 2516.92, + "text": " And it was also based on a lot of associations, something that computers + may or may not be doing", "tokens": [50906, 400, 309, 390, 611, 2361, 322, 257, + 688, 295, 26597, 11, 746, 300, 10807, 815, 420, 815, 406, 312, 884, 51234], "temperature": + 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": 1.6806083650190113, + "no_speech_prob": 0.10566551238298416}, {"id": 744, "seek": 249952, "start": 2516.92, + "end": 2517.92, "text": " good.", "tokens": [51234, 665, 13, 51284], "temperature": + 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": 1.6806083650190113, + "no_speech_prob": 0.10566551238298416}, {"id": 745, "seek": 249952, "start": 2517.92, + "end": 2523.08, "text": " You know, like if you know prologue, the programming language, + like it basically has the", "tokens": [51284, 509, 458, 11, 411, 498, 291, 458, + 447, 4987, 622, 11, 264, 9410, 2856, 11, 411, 309, 1936, 575, 264, 51542], "temperature": + 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": 1.6806083650190113, + "no_speech_prob": 0.10566551238298416}, {"id": 746, "seek": 249952, "start": 2523.08, + "end": 2527.7599999999998, "text": " associations kind of built in as first class + function.", "tokens": [51542, 26597, 733, 295, 3094, 294, 382, 700, 1508, 2445, + 13, 51776], "temperature": 0.0, "avg_logprob": -0.2986171556555707, "compression_ratio": + 1.6806083650190113, "no_speech_prob": 0.10566551238298416}, {"id": 747, "seek": + 252776, "start": 2527.76, "end": 2533.92, "text": " You type something like, you + know, 
orange fruit and then you type somewhere fruit and it", "tokens": [50364, + 509, 2010, 746, 411, 11, 291, 458, 11, 7671, 6773, 293, 550, 291, 2010, 4079, 6773, + 293, 309, 50672], "temperature": 0.0, "avg_logprob": -0.24107136445886948, "compression_ratio": + 1.7862903225806452, "no_speech_prob": 0.10481883585453033}, {"id": 748, "seek": + 252776, "start": 2533.92, "end": 2534.92, "text": " says orange.", "tokens": [50672, + 1619, 7671, 13, 50722], "temperature": 0.0, "avg_logprob": -0.24107136445886948, + "compression_ratio": 1.7862903225806452, "no_speech_prob": 0.10481883585453033}, + {"id": 749, "seek": 252776, "start": 2534.92, "end": 2536.84, "text": " If you type + orange, it says fruit, right?", "tokens": [50722, 759, 291, 2010, 7671, 11, 309, + 1619, 6773, 11, 558, 30, 50818], "temperature": 0.0, "avg_logprob": -0.24107136445886948, + "compression_ratio": 1.7862903225806452, "no_speech_prob": 0.10481883585453033}, + {"id": 750, "seek": 252776, "start": 2536.84, "end": 2539.2000000000003, "text": + " It remembered that mapping, right?", "tokens": [50818, 467, 13745, 300, 18350, + 11, 558, 30, 50936], "temperature": 0.0, "avg_logprob": -0.24107136445886948, "compression_ratio": + 1.7862903225806452, "no_speech_prob": 0.10481883585453033}, {"id": 751, "seek": + 252776, "start": 2539.2000000000003, "end": 2543.36, "text": " And then you can + use this associative kind of programming in a bunch of places kind of", "tokens": + [50936, 400, 550, 291, 393, 764, 341, 4180, 1166, 733, 295, 9410, 294, 257, 3840, + 295, 3190, 733, 295, 51144], "temperature": 0.0, "avg_logprob": -0.24107136445886948, + "compression_ratio": 1.7862903225806452, "no_speech_prob": 0.10481883585453033}, + {"id": 752, "seek": 252776, "start": 2543.36, "end": 2545.28, "text": " building + AI.", "tokens": [51144, 2390, 7318, 13, 51240], "temperature": 0.0, "avg_logprob": + -0.24107136445886948, "compression_ratio": 1.7862903225806452, "no_speech_prob": + 0.10481883585453033}, {"id": 753, 
"seek": 252776, "start": 2545.28, "end": 2546.96, + "text": " But I haven''t been programming prologue.", "tokens": [51240, 583, 286, + 2378, 380, 668, 9410, 447, 4987, 622, 13, 51324], "temperature": 0.0, "avg_logprob": + -0.24107136445886948, "compression_ratio": 1.7862903225806452, "no_speech_prob": + 0.10481883585453033}, {"id": 754, "seek": 252776, "start": 2546.96, "end": 2549.28, + "text": " I was just it was part of one course.", "tokens": [51324, 286, 390, 445, + 309, 390, 644, 295, 472, 1164, 13, 51440], "temperature": 0.0, "avg_logprob": -0.24107136445886948, + "compression_ratio": 1.7862903225806452, "no_speech_prob": 0.10481883585453033}, + {"id": 755, "seek": 252776, "start": 2549.28, "end": 2552.5200000000004, "text": + " But you know, like the questions that they asked at the Yandex competition, they + were", "tokens": [51440, 583, 291, 458, 11, 411, 264, 1651, 300, 436, 2351, 412, + 264, 398, 474, 3121, 6211, 11, 436, 645, 51602], "temperature": 0.0, "avg_logprob": + -0.24107136445886948, "compression_ratio": 1.7862903225806452, "no_speech_prob": + 0.10481883585453033}, {"id": 756, "seek": 255252, "start": 2552.52, "end": 2559.72, + "text": " also like, you know, something like, you know, who met this lady when + he was a student", "tokens": [50364, 611, 411, 11, 291, 458, 11, 746, 411, 11, 291, + 458, 11, 567, 1131, 341, 7262, 562, 415, 390, 257, 3107, 50724], "temperature": + 0.0, "avg_logprob": -0.24195151781513743, "compression_ratio": 1.8172043010752688, + "no_speech_prob": 0.03537409007549286}, {"id": 757, "seek": 255252, "start": 2559.72, + "end": 2560.8, "text": " blah, blah, blah, blah.", "tokens": [50724, 12288, 11, + 12288, 11, 12288, 11, 12288, 13, 50778], "temperature": 0.0, "avg_logprob": -0.24195151781513743, + "compression_ratio": 1.8172043010752688, "no_speech_prob": 0.03537409007549286}, + {"id": 758, "seek": 255252, "start": 2560.8, "end": 2565.16, "text": " And you''re + like, I already lost the train, the train of thought in 
this question.", "tokens": + [50778, 400, 291, 434, 411, 11, 286, 1217, 2731, 264, 3847, 11, 264, 3847, 295, + 1194, 294, 341, 1168, 13, 50996], "temperature": 0.0, "avg_logprob": -0.24195151781513743, + "compression_ratio": 1.8172043010752688, "no_speech_prob": 0.03537409007549286}, + {"id": 759, "seek": 255252, "start": 2565.16, "end": 2569.52, "text": " So I spent + like, I don''t know, maybe one minute just figuring out what is being asked.", "tokens": + [50996, 407, 286, 4418, 411, 11, 286, 500, 380, 458, 11, 1310, 472, 3456, 445, 15213, + 484, 437, 307, 885, 2351, 13, 51214], "temperature": 0.0, "avg_logprob": -0.24195151781513743, + "compression_ratio": 1.8172043010752688, "no_speech_prob": 0.03537409007549286}, + {"id": 760, "seek": 255252, "start": 2569.52, "end": 2573.24, "text": " And then + you''re like decomposing this problem into multiple problems and you start from + the", "tokens": [51214, 400, 550, 291, 434, 411, 22867, 6110, 341, 1154, 666, 3866, + 2740, 293, 291, 722, 490, 264, 51400], "temperature": 0.0, "avg_logprob": -0.24195151781513743, + "compression_ratio": 1.8172043010752688, "no_speech_prob": 0.03537409007549286}, + {"id": 761, "seek": 255252, "start": 2573.24, "end": 2574.32, "text": " first one.", + "tokens": [51400, 700, 472, 13, 51454], "temperature": 0.0, "avg_logprob": -0.24195151781513743, + "compression_ratio": 1.8172043010752688, "no_speech_prob": 0.03537409007549286}, + {"id": 762, "seek": 255252, "start": 2574.32, "end": 2575.72, "text": " And then + they are solving it.", "tokens": [51454, 400, 550, 436, 366, 12606, 309, 13, 51524], + "temperature": 0.0, "avg_logprob": -0.24195151781513743, "compression_ratio": 1.8172043010752688, + "no_speech_prob": 0.03537409007549286}, {"id": 763, "seek": 255252, "start": 2575.72, + "end": 2577.8, "text": " The next one and time is running.", "tokens": [51524, 440, + 958, 472, 293, 565, 307, 2614, 13, 51628], "temperature": 0.0, "avg_logprob": -0.24195151781513743, + "compression_ratio": 
1.8172043010752688, "no_speech_prob": 0.03537409007549286}, + {"id": 764, "seek": 255252, "start": 2577.8, "end": 2581.16, "text": " It''s like + five minutes of question, if I remember correctly.", "tokens": [51628, 467, 311, + 411, 1732, 2077, 295, 1168, 11, 498, 286, 1604, 8944, 13, 51796], "temperature": + 0.0, "avg_logprob": -0.24195151781513743, "compression_ratio": 1.8172043010752688, + "no_speech_prob": 0.03537409007549286}, {"id": 765, "seek": 258116, "start": 2581.16, + "end": 2582.96, "text": " And it was fantastic competition.", "tokens": [50364, + 400, 309, 390, 5456, 6211, 13, 50454], "temperature": 0.0, "avg_logprob": -0.3502138169085393, + "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, {"id": 766, + "seek": 258116, "start": 2582.96, "end": 2588.04, "text": " You know, like it''s + like, I don''t think search engines are still kind of on that level.", "tokens": + [50454, 509, 458, 11, 411, 309, 311, 411, 11, 286, 500, 380, 519, 3164, 12982, 366, + 920, 733, 295, 322, 300, 1496, 13, 50708], "temperature": 0.0, "avg_logprob": -0.3502138169085393, + "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, {"id": 767, + "seek": 258116, "start": 2588.04, "end": 2590.8799999999997, "text": " I don''t + think they are yet on that level.", "tokens": [50708, 286, 500, 380, 519, 436, 366, + 1939, 322, 300, 1496, 13, 50850], "temperature": 0.0, "avg_logprob": -0.3502138169085393, + "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, {"id": 768, + "seek": 258116, "start": 2590.8799999999997, "end": 2591.8799999999997, "text": + " So yeah.", "tokens": [50850, 407, 1338, 13, 50900], "temperature": 0.0, "avg_logprob": + -0.3502138169085393, "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, + {"id": 769, "seek": 258116, "start": 2591.8799999999997, "end": 2594.3599999999997, + "text": " They''ll probably get there.", "tokens": [50900, 814, 603, 1391, 483, + 456, 13, 51024], "temperature": 0.0, 
"avg_logprob": -0.3502138169085393, "compression_ratio": + 1.676, "no_speech_prob": 0.024898797273635864}, {"id": 770, "seek": 258116, "start": + 2594.3599999999997, "end": 2595.92, "text": " It''s those neural nets.", "tokens": + [51024, 467, 311, 729, 18161, 36170, 13, 51102], "temperature": 0.0, "avg_logprob": + -0.3502138169085393, "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, + {"id": 771, "seek": 258116, "start": 2595.92, "end": 2599.04, "text": " What they''re + doing with them is it''s going to get there.", "tokens": [51102, 708, 436, 434, + 884, 365, 552, 307, 309, 311, 516, 281, 483, 456, 13, 51258], "temperature": 0.0, + "avg_logprob": -0.3502138169085393, "compression_ratio": 1.676, "no_speech_prob": + 0.024898797273635864}, {"id": 772, "seek": 258116, "start": 2599.04, "end": 2600.04, + "text": " Yeah.", "tokens": [51258, 865, 13, 51308], "temperature": 0.0, "avg_logprob": + -0.3502138169085393, "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, + {"id": 773, "seek": 258116, "start": 2600.04, "end": 2606.3599999999997, "text": + " And we have those giant networks that they''re now creating like Nvidia release + that nega", "tokens": [51308, 400, 321, 362, 729, 7410, 9590, 300, 436, 434, 586, + 4084, 411, 46284, 4374, 300, 2485, 64, 51624], "temperature": 0.0, "avg_logprob": + -0.3502138169085393, "compression_ratio": 1.676, "no_speech_prob": 0.024898797273635864}, + {"id": 774, "seek": 258116, "start": 2606.3599999999997, "end": 2607.8399999999997, + "text": " something GPT.", "tokens": [51624, 746, 26039, 51, 13, 51698], "temperature": + 0.0, "avg_logprob": -0.3502138169085393, "compression_ratio": 1.676, "no_speech_prob": + 0.024898797273635864}, {"id": 775, "seek": 258116, "start": 2607.8399999999997, + "end": 2609.8399999999997, "text": " What is it three right now?", "tokens": [51698, + 708, 307, 309, 1045, 558, 586, 30, 51798], "temperature": 0.0, "avg_logprob": -0.3502138169085393, + "compression_ratio": 
1.676, "no_speech_prob": 0.024898797273635864}, {"id": 776, + "seek": 260984, "start": 2610.2400000000002, "end": 2613.44, "text": " Was it yeah, + GPT three is the one that they''re not releasing.", "tokens": [50384, 3027, 309, + 1338, 11, 26039, 51, 1045, 307, 264, 472, 300, 436, 434, 406, 16327, 13, 50544], + "temperature": 0.0, "avg_logprob": -0.2847082040605754, "compression_ratio": 1.789855072463768, + "no_speech_prob": 0.004159548319876194}, {"id": 777, "seek": 260984, "start": 2613.44, + "end": 2616.76, "text": " No, it''s going to get there one layer another.", "tokens": + [50544, 883, 11, 309, 311, 516, 281, 483, 456, 472, 4583, 1071, 13, 50710], "temperature": + 0.0, "avg_logprob": -0.2847082040605754, "compression_ratio": 1.789855072463768, + "no_speech_prob": 0.004159548319876194}, {"id": 778, "seek": 260984, "start": 2616.76, + "end": 2622.6400000000003, "text": " So does it make you excited like to try these + models in real life?", "tokens": [50710, 407, 775, 309, 652, 291, 2919, 411, 281, + 853, 613, 5245, 294, 957, 993, 30, 51004], "temperature": 0.0, "avg_logprob": -0.2847082040605754, + "compression_ratio": 1.789855072463768, "no_speech_prob": 0.004159548319876194}, + {"id": 779, "seek": 260984, "start": 2622.6400000000003, "end": 2626.1600000000003, + "text": " What do you think they kind of make too far still, too far from real life?", + "tokens": [51004, 708, 360, 291, 519, 436, 733, 295, 652, 886, 1400, 920, 11, 886, + 1400, 490, 957, 993, 30, 51180], "temperature": 0.0, "avg_logprob": -0.2847082040605754, + "compression_ratio": 1.789855072463768, "no_speech_prob": 0.004159548319876194}, + {"id": 780, "seek": 260984, "start": 2626.1600000000003, "end": 2627.56, "text": + " What do you think it makes me excited?", "tokens": [51180, 708, 360, 291, 519, + 309, 1669, 385, 2919, 30, 51250], "temperature": 0.0, "avg_logprob": -0.2847082040605754, + "compression_ratio": 1.789855072463768, "no_speech_prob": 0.004159548319876194}, + {"id": 781, 
"seek": 260984, "start": 2627.56, "end": 2628.8, "text": " And I think + they''re doing really well.", "tokens": [51250, 400, 286, 519, 436, 434, 884, 534, + 731, 13, 51312], "temperature": 0.0, "avg_logprob": -0.2847082040605754, "compression_ratio": + 1.789855072463768, "no_speech_prob": 0.004159548319876194}, {"id": 782, "seek": + 260984, "start": 2628.8, "end": 2633.36, "text": " It''s just that this whole trend + has kind of been going to like not user friendly.", "tokens": [51312, 467, 311, + 445, 300, 341, 1379, 6028, 575, 733, 295, 668, 516, 281, 411, 406, 4195, 9208, 13, + 51540], "temperature": 0.0, "avg_logprob": -0.2847082040605754, "compression_ratio": + 1.789855072463768, "no_speech_prob": 0.004159548319876194}, {"id": 783, "seek": + 260984, "start": 2633.36, "end": 2639.1600000000003, "text": " No one can run any + of these models and need like nine a 100s like $100,000 worth of", "tokens": [51540, + 883, 472, 393, 1190, 604, 295, 613, 5245, 293, 643, 411, 4949, 257, 2319, 82, 411, + 1848, 6879, 11, 1360, 3163, 295, 51830], "temperature": 0.0, "avg_logprob": -0.2847082040605754, + "compression_ratio": 1.789855072463768, "no_speech_prob": 0.004159548319876194}, + {"id": 784, "seek": 263916, "start": 2639.16, "end": 2644.3199999999997, "text": + " computing who has that like other than those places that are doing that what GPT + three", "tokens": [50364, 15866, 567, 575, 300, 411, 661, 813, 729, 3190, 300, 366, + 884, 300, 437, 26039, 51, 1045, 50622], "temperature": 0.0, "avg_logprob": -0.22533713292031393, + "compression_ratio": 1.820069204152249, "no_speech_prob": 0.004692982882261276}, + {"id": 785, "seek": 263916, "start": 2644.3199999999997, "end": 2645.3199999999997, + "text": " took.", "tokens": [50622, 1890, 13, 50672], "temperature": 0.0, "avg_logprob": + -0.22533713292031393, "compression_ratio": 1.820069204152249, "no_speech_prob": + 0.004692982882261276}, {"id": 786, "seek": 263916, "start": 2645.3199999999997, + "end": 2648.2, "text": " I 
don''t know how many billions and billions parameters.", + "tokens": [50672, 286, 500, 380, 458, 577, 867, 17375, 293, 17375, 9834, 13, 50816], + "temperature": 0.0, "avg_logprob": -0.22533713292031393, "compression_ratio": 1.820069204152249, + "no_speech_prob": 0.004692982882261276}, {"id": 787, "seek": 263916, "start": 2648.2, + "end": 2651.48, "text": " No one can run that unless you''re like at some super + big company.", "tokens": [50816, 883, 472, 393, 1190, 300, 5969, 291, 434, 411, + 412, 512, 1687, 955, 2237, 13, 50980], "temperature": 0.0, "avg_logprob": -0.22533713292031393, + "compression_ratio": 1.820069204152249, "no_speech_prob": 0.004692982882261276}, + {"id": 788, "seek": 263916, "start": 2651.48, "end": 2656.48, "text": " It''s like + my opinion is what''s the point like you can you can always throw more and more", + "tokens": [50980, 467, 311, 411, 452, 4800, 307, 437, 311, 264, 935, 411, 291, 393, + 291, 393, 1009, 3507, 544, 293, 544, 51230], "temperature": 0.0, "avg_logprob": + -0.22533713292031393, "compression_ratio": 1.820069204152249, "no_speech_prob": + 0.004692982882261276}, {"id": 789, "seek": 263916, "start": 2656.48, "end": 2662.44, + "text": " hard door at it and you can always get like 0.001 percent closer and closer.", + "tokens": [51230, 1152, 2853, 412, 309, 293, 291, 393, 1009, 483, 411, 1958, 13, + 628, 16, 3043, 4966, 293, 4966, 13, 51528], "temperature": 0.0, "avg_logprob": -0.22533713292031393, + "compression_ratio": 1.820069204152249, "no_speech_prob": 0.004692982882261276}, + {"id": 790, "seek": 263916, "start": 2662.44, "end": 2667.24, "text": " And that''s + kind of like this whole thing of why like this area of research is on.", "tokens": + [51528, 400, 300, 311, 733, 295, 411, 341, 1379, 551, 295, 983, 411, 341, 1859, + 295, 2132, 307, 322, 13, 51768], "temperature": 0.0, "avg_logprob": -0.22533713292031393, + "compression_ratio": 1.820069204152249, "no_speech_prob": 0.004692982882261276}, + {"id": 791, "seek": 263916, 
"start": 2667.24, "end": 2669.08, "text": " It''s kind + of the new thing like we''ve already kind of maxed our", "tokens": [51768, 467, + 311, 733, 295, 264, 777, 551, 411, 321, 600, 1217, 733, 295, 11469, 292, 527, 51860], + "temperature": 0.0, "avg_logprob": -0.22533713292031393, "compression_ratio": 1.820069204152249, + "no_speech_prob": 0.004692982882261276}, {"id": 792, "seek": 266908, "start": 2669.08, + "end": 2671.24, "text": " selves out on neural nets.", "tokens": [50364, 41900, + 484, 322, 18161, 36170, 13, 50472], "temperature": 0.0, "avg_logprob": -0.20606228400921, + "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 793, "seek": 266908, "start": 2671.24, "end": 2676.36, "text": " I personally + believe unless there''s some huge architecture changes that inspire some", "tokens": + [50472, 286, 5665, 1697, 5969, 456, 311, 512, 2603, 9482, 2962, 300, 15638, 512, + 50728], "temperature": 0.0, "avg_logprob": -0.20606228400921, "compression_ratio": + 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, {"id": 794, "seek": + 266908, "start": 2676.36, "end": 2678.88, "text": " really interesting stuff.", + "tokens": [50728, 534, 1880, 1507, 13, 50854], "temperature": 0.0, "avg_logprob": + -0.20606228400921, "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 795, "seek": 266908, "start": 2678.88, "end": 2682.7599999999998, "text": + " I don''t see neural nets changing as much and then I feel like we can do more + with the", "tokens": [50854, 286, 500, 380, 536, 18161, 36170, 4473, 382, 709, 293, + 550, 286, 841, 411, 321, 393, 360, 544, 365, 264, 51048], "temperature": 0.0, "avg_logprob": + -0.20606228400921, "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 796, "seek": 266908, "start": 2682.7599999999998, "end": 2687.88, "text": + " next step of the nearest neighbor search in this approximate nearest neighbor + vector 
similarity", "tokens": [51048, 958, 1823, 295, 264, 23831, 5987, 3164, 294, + 341, 30874, 23831, 5987, 8062, 32194, 51304], "temperature": 0.0, "avg_logprob": + -0.20606228400921, "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 797, "seek": 266908, "start": 2687.88, "end": 2688.88, "text": " search.", + "tokens": [51304, 3164, 13, 51354], "temperature": 0.0, "avg_logprob": -0.20606228400921, + "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 798, "seek": 266908, "start": 2688.88, "end": 2693.96, "text": " And I think + that''s where we can make some more new head games until we max this out.", "tokens": + [51354, 400, 286, 519, 300, 311, 689, 321, 393, 652, 512, 544, 777, 1378, 2813, + 1826, 321, 11469, 341, 484, 13, 51608], "temperature": 0.0, "avg_logprob": -0.20606228400921, + "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 799, "seek": 266908, "start": 2693.96, "end": 2695.52, "text": " But yeah, + I don''t know.", "tokens": [51608, 583, 1338, 11, 286, 500, 380, 458, 13, 51686], + "temperature": 0.0, "avg_logprob": -0.20606228400921, "compression_ratio": 1.7153284671532847, + "no_speech_prob": 0.015953492373228073}, {"id": 800, "seek": 266908, "start": 2695.52, + "end": 2696.88, "text": " I''m excited to see where it goes.", "tokens": [51686, + 286, 478, 2919, 281, 536, 689, 309, 1709, 13, 51754], "temperature": 0.0, "avg_logprob": + -0.20606228400921, "compression_ratio": 1.7153284671532847, "no_speech_prob": 0.015953492373228073}, + {"id": 801, "seek": 269688, "start": 2696.88, "end": 2702.4, "text": " It''s just + I hope it''s going to be something I can try myself and not need to spend $400", + "tokens": [50364, 467, 311, 445, 286, 1454, 309, 311, 516, 281, 312, 746, 286, 393, + 853, 2059, 293, 406, 643, 281, 3496, 1848, 13741, 50640], "temperature": 0.0, "avg_logprob": + -0.2724788609672995, "compression_ratio": 
1.7033333333333334, "no_speech_prob": + 0.18417097628116608}, {"id": 802, "seek": 269688, "start": 2702.4, "end": 2705.7200000000003, + "text": " on Amazon for hours worth of calculations.", "tokens": [50640, 322, 6795, + 337, 2496, 3163, 295, 20448, 13, 50806], "temperature": 0.0, "avg_logprob": -0.2724788609672995, + "compression_ratio": 1.7033333333333334, "no_speech_prob": 0.18417097628116608}, + {"id": 803, "seek": 269688, "start": 2705.7200000000003, "end": 2706.7200000000003, + "text": " Yeah.", "tokens": [50806, 865, 13, 50856], "temperature": 0.0, "avg_logprob": + -0.2724788609672995, "compression_ratio": 1.7033333333333334, "no_speech_prob": + 0.18417097628116608}, {"id": 804, "seek": 269688, "start": 2706.7200000000003, "end": + 2709.36, "text": " So maybe somebody needs to work on compressing this model.", + "tokens": [50856, 407, 1310, 2618, 2203, 281, 589, 322, 14778, 278, 341, 2316, 13, + 50988], "temperature": 0.0, "avg_logprob": -0.2724788609672995, "compression_ratio": + 1.7033333333333334, "no_speech_prob": 0.18417097628116608}, {"id": 805, "seek": + 269688, "start": 2709.36, "end": 2714.92, "text": " So sort of compressing the compute + power they actually, you know, require.", "tokens": [50988, 407, 1333, 295, 14778, + 278, 264, 14722, 1347, 436, 767, 11, 291, 458, 11, 3651, 13, 51266], "temperature": + 0.0, "avg_logprob": -0.2724788609672995, "compression_ratio": 1.7033333333333334, + "no_speech_prob": 0.18417097628116608}, {"id": 806, "seek": 269688, "start": 2714.92, + "end": 2719.44, "text": " But now the thing is now it''s like everyone''s interested + when they say, oh yeah, we use", "tokens": [51266, 583, 586, 264, 551, 307, 586, + 309, 311, 411, 1518, 311, 3102, 562, 436, 584, 11, 1954, 1338, 11, 321, 764, 51492], + "temperature": 0.0, "avg_logprob": -0.2724788609672995, "compression_ratio": 1.7033333333333334, + "no_speech_prob": 0.18417097628116608}, {"id": 807, "seek": 269688, "start": 2719.44, + "end": 2722.1600000000003, "text": " 
like 40 million dollars with the compute power.", + "tokens": [51492, 411, 3356, 2459, 3808, 365, 264, 14722, 1347, 13, 51628], "temperature": + 0.0, "avg_logprob": -0.2724788609672995, "compression_ratio": 1.7033333333333334, + "no_speech_prob": 0.18417097628116608}, {"id": 808, "seek": 269688, "start": 2722.1600000000003, + "end": 2723.1600000000003, "text": " Everyone thinks that''s cool.", "tokens": [51628, + 5198, 7309, 300, 311, 1627, 13, 51678], "temperature": 0.0, "avg_logprob": -0.2724788609672995, + "compression_ratio": 1.7033333333333334, "no_speech_prob": 0.18417097628116608}, + {"id": 809, "seek": 269688, "start": 2723.1600000000003, "end": 2724.84, "text": + " That''s going to get some news.", "tokens": [51678, 663, 311, 516, 281, 483, 512, + 2583, 13, 51762], "temperature": 0.0, "avg_logprob": -0.2724788609672995, "compression_ratio": + 1.7033333333333334, "no_speech_prob": 0.18417097628116608}, {"id": 810, "seek": + 269688, "start": 2724.84, "end": 2726.36, "text": " And people are going to be interested + in it.", "tokens": [51762, 400, 561, 366, 516, 281, 312, 3102, 294, 309, 13, 51838], + "temperature": 0.0, "avg_logprob": -0.2724788609672995, "compression_ratio": 1.7033333333333334, + "no_speech_prob": 0.18417097628116608}, {"id": 811, "seek": 272636, "start": 2726.36, + "end": 2728.52, "text": " It''s when it''s bigger, but there is I don''t know.", + "tokens": [50364, 467, 311, 562, 309, 311, 3801, 11, 457, 456, 307, 286, 500, 380, + 458, 13, 50472], "temperature": 0.0, "avg_logprob": -0.22726792874543564, "compression_ratio": + 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, {"id": 812, "seek": + 272636, "start": 2728.52, "end": 2732.44, "text": " I remember when I was studying + all this, there was a lot of those things in regards to", "tokens": [50472, 286, + 1604, 562, 286, 390, 7601, 439, 341, 11, 456, 390, 257, 688, 295, 729, 721, 294, + 14258, 281, 50668], "temperature": 0.0, "avg_logprob": -0.22726792874543564, 
"compression_ratio": + 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, {"id": 813, "seek": + 272636, "start": 2732.44, "end": 2736.44, "text": " sparse neural nets and that + was kind of going to be the future of compressing all this", "tokens": [50668, 637, + 11668, 18161, 36170, 293, 300, 390, 733, 295, 516, 281, 312, 264, 2027, 295, 14778, + 278, 439, 341, 50868], "temperature": 0.0, "avg_logprob": -0.22726792874543564, + "compression_ratio": 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, + {"id": 814, "seek": 272636, "start": 2736.44, "end": 2737.44, "text": " down.", + "tokens": [50868, 760, 13, 50918], "temperature": 0.0, "avg_logprob": -0.22726792874543564, + "compression_ratio": 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, + {"id": 815, "seek": 272636, "start": 2737.44, "end": 2741.88, "text": " Fortunately, + I haven''t really kept up on it too much, but hopefully they do make moves", "tokens": + [50918, 20652, 11, 286, 2378, 380, 534, 4305, 493, 322, 309, 886, 709, 11, 457, + 4696, 436, 360, 652, 6067, 51140], "temperature": 0.0, "avg_logprob": -0.22726792874543564, + "compression_ratio": 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, + {"id": 816, "seek": 272636, "start": 2741.88, "end": 2747.08, "text": " because + again, I don''t see throwing more and more hardware as innovation compared to", + "tokens": [51140, 570, 797, 11, 286, 500, 380, 536, 10238, 544, 293, 544, 8837, + 382, 8504, 5347, 281, 51400], "temperature": 0.0, "avg_logprob": -0.22726792874543564, + "compression_ratio": 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, + {"id": 817, "seek": 272636, "start": 2747.08, "end": 2750.6400000000003, "text": + " actually making it efficient.", "tokens": [51400, 767, 1455, 309, 7148, 13, 51578], + "temperature": 0.0, "avg_logprob": -0.22726792874543564, "compression_ratio": 1.6798679867986799, + "no_speech_prob": 0.021956684067845345}, {"id": 818, "seek": 272636, "start": 
2750.6400000000003, + "end": 2751.6400000000003, "text": " Yeah.", "tokens": [51578, 865, 13, 51628], + "temperature": 0.0, "avg_logprob": -0.22726792874543564, "compression_ratio": 1.6798679867986799, + "no_speech_prob": 0.021956684067845345}, {"id": 819, "seek": 272636, "start": 2751.6400000000003, + "end": 2753.32, "text": " That''s my take.", "tokens": [51628, 663, 311, 452, 747, + 13, 51712], "temperature": 0.0, "avg_logprob": -0.22726792874543564, "compression_ratio": + 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, {"id": 820, "seek": + 272636, "start": 2753.32, "end": 2755.92, "text": " People are doing it so there''s + there''s reason to do it.", "tokens": [51712, 3432, 366, 884, 309, 370, 456, 311, + 456, 311, 1778, 281, 360, 309, 13, 51842], "temperature": 0.0, "avg_logprob": -0.22726792874543564, + "compression_ratio": 1.6798679867986799, "no_speech_prob": 0.021956684067845345}, + {"id": 821, "seek": 275592, "start": 2756.2400000000002, "end": 2757.2400000000002, + "text": " Yeah, for sure.", "tokens": [50380, 865, 11, 337, 988, 13, 50430], "temperature": + 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": 1.7185185185185186, + "no_speech_prob": 0.0033704210072755814}, {"id": 822, "seek": 275592, "start": 2757.2400000000002, + "end": 2764.28, "text": " It''s like I guess the the hope of researchers is to essentially + kind of emulate human brain,", "tokens": [50430, 467, 311, 411, 286, 2041, 264, + 264, 1454, 295, 10309, 307, 281, 4476, 733, 295, 45497, 1952, 3567, 11, 50782], + "temperature": 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": 1.7185185185185186, + "no_speech_prob": 0.0033704210072755814}, {"id": 823, "seek": 275592, "start": 2764.28, + "end": 2765.28, "text": " right?", "tokens": [50782, 558, 30, 50832], "temperature": + 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": 1.7185185185185186, + "no_speech_prob": 0.0033704210072755814}, {"id": 824, "seek": 275592, "start": 2765.28, + "end": 
2770.48, "text": " But like, I think human brain has like a hundred million + neurons, so even more, right?", "tokens": [50832, 583, 411, 11, 286, 519, 1952, + 3567, 575, 411, 257, 3262, 2459, 22027, 11, 370, 754, 544, 11, 558, 30, 51092], + "temperature": 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": 1.7185185185185186, + "no_speech_prob": 0.0033704210072755814}, {"id": 825, "seek": 275592, "start": 2770.48, + "end": 2771.7200000000003, "text": " I don''t know.", "tokens": [51092, 286, 500, + 380, 458, 13, 51154], "temperature": 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": + 1.7185185185185186, "no_speech_prob": 0.0033704210072755814}, {"id": 826, "seek": + 275592, "start": 2771.7200000000003, "end": 2775.28, "text": " Like, yeah, it has + a bunch of neurons and then connections that we have is.", "tokens": [51154, 1743, + 11, 1338, 11, 309, 575, 257, 3840, 295, 22027, 293, 550, 9271, 300, 321, 362, 307, + 13, 51332], "temperature": 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": + 1.7185185185185186, "no_speech_prob": 0.0033704210072755814}, {"id": 827, "seek": + 275592, "start": 2775.28, "end": 2776.28, "text": " Yeah.", "tokens": [51332, 865, + 13, 51382], "temperature": 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": + 1.7185185185185186, "no_speech_prob": 0.0033704210072755814}, {"id": 828, "seek": + 275592, "start": 2776.28, "end": 2779.52, "text": " Yeah, we''re not, I don''t think + hardware wise we''re going to be close yet.", "tokens": [51382, 865, 11, 321, 434, + 406, 11, 286, 500, 380, 519, 8837, 10829, 321, 434, 516, 281, 312, 1998, 1939, 13, + 51544], "temperature": 0.0, "avg_logprob": -0.3039734683819671, "compression_ratio": + 1.7185185185185186, "no_speech_prob": 0.0033704210072755814}, {"id": 829, "seek": + 275592, "start": 2779.52, "end": 2784.84, "text": " Unless, I don''t know, I know + they''re doing a lot of research in this area, but I definitely", "tokens": [51544, + 16581, 11, 
286, 500, 380, 458, 11, 286, 458, 436, 434, 884, 257, 688, 295, 2132, + 294, 341, 1859, 11, 457, 286, 2138, 51810], "temperature": 0.0, "avg_logprob": -0.3039734683819671, + "compression_ratio": 1.7185185185185186, "no_speech_prob": 0.0033704210072755814}, + {"id": 830, "seek": 278484, "start": 2784.84, "end": 2788.6800000000003, "text": + " don''t think the neuron or, yeah, I don''t know.", "tokens": [50364, 500, 380, + 519, 264, 34090, 420, 11, 1338, 11, 286, 500, 380, 458, 13, 50556], "temperature": + 0.0, "avg_logprob": -0.3628031094868978, "compression_ratio": 1.6422764227642277, + "no_speech_prob": 0.007612353190779686}, {"id": 831, "seek": 278484, "start": 2788.6800000000003, + "end": 2790.36, "text": " This is the way I estimate that.", "tokens": [50556, 639, + 307, 264, 636, 286, 12539, 300, 13, 50640], "temperature": 0.0, "avg_logprob": -0.3628031094868978, + "compression_ratio": 1.6422764227642277, "no_speech_prob": 0.007612353190779686}, + {"id": 832, "seek": 278484, "start": 2790.36, "end": 2791.36, "text": " Yeah, yeah, + yeah.", "tokens": [50640, 865, 11, 1338, 11, 1338, 13, 50690], "temperature": 0.0, + "avg_logprob": -0.3628031094868978, "compression_ratio": 1.6422764227642277, "no_speech_prob": + 0.007612353190779686}, {"id": 833, "seek": 278484, "start": 2791.36, "end": 2797.42, + "text": " But like, just to close on that thought also that human brain does not + kind of use the", "tokens": [50690, 583, 411, 11, 445, 281, 1998, 322, 300, 1194, + 611, 300, 1952, 3567, 775, 406, 733, 295, 764, 264, 50993], "temperature": 0.0, + "avg_logprob": -0.3628031094868978, "compression_ratio": 1.6422764227642277, "no_speech_prob": + 0.007612353190779686}, {"id": 834, "seek": 278484, "start": 2797.42, "end": 2803.48, + "text": " energy of a server farm is just like one electric lamp or whatever.", + "tokens": [50993, 2281, 295, 257, 7154, 5421, 307, 445, 411, 472, 5210, 12684, 420, + 2035, 13, 51296], "temperature": 0.0, "avg_logprob": -0.3628031094868978, 
"compression_ratio": + 1.6422764227642277, "no_speech_prob": 0.007612353190779686}, {"id": 835, "seek": + 278484, "start": 2803.48, "end": 2806.56, "text": " Yeah, even less efficiency right + there.", "tokens": [51296, 865, 11, 754, 1570, 10493, 558, 456, 13, 51450], "temperature": + 0.0, "avg_logprob": -0.3628031094868978, "compression_ratio": 1.6422764227642277, + "no_speech_prob": 0.007612353190779686}, {"id": 836, "seek": 278484, "start": 2806.56, + "end": 2807.56, "text": " And that efficiency.", "tokens": [51450, 400, 300, 10493, + 13, 51500], "temperature": 0.0, "avg_logprob": -0.3628031094868978, "compression_ratio": + 1.6422764227642277, "no_speech_prob": 0.007612353190779686}, {"id": 837, "seek": + 278484, "start": 2807.56, "end": 2810.2400000000002, "text": " A couple of kilos + and that''s it, right?", "tokens": [51500, 316, 1916, 295, 30000, 293, 300, 311, + 309, 11, 558, 30, 51634], "temperature": 0.0, "avg_logprob": -0.3628031094868978, + "compression_ratio": 1.6422764227642277, "no_speech_prob": 0.007612353190779686}, + {"id": 838, "seek": 278484, "start": 2810.2400000000002, "end": 2811.2400000000002, + "text": " That''s the device.", "tokens": [51634, 663, 311, 264, 4302, 13, 51684], + "temperature": 0.0, "avg_logprob": -0.3628031094868978, "compression_ratio": 1.6422764227642277, + "no_speech_prob": 0.007612353190779686}, {"id": 839, "seek": 278484, "start": 2811.2400000000002, + "end": 2813.4, "text": " And then we can do all of this.", "tokens": [51684, 400, + 550, 321, 393, 360, 439, 295, 341, 13, 51792], "temperature": 0.0, "avg_logprob": + -0.3628031094868978, "compression_ratio": 1.6422764227642277, "no_speech_prob": + 0.007612353190779686}, {"id": 840, "seek": 281340, "start": 2813.7200000000003, + "end": 2815.32, "text": " Yeah, there is a long way to go.", "tokens": [50380, 865, + 11, 456, 307, 257, 938, 636, 281, 352, 13, 50460], "temperature": 0.0, "avg_logprob": + -0.313291410966353, "compression_ratio": 1.5368852459016393, 
"no_speech_prob": 0.0378362238407135}, + {"id": 841, "seek": 281340, "start": 2815.32, "end": 2816.32, "text": " Yeah, yeah.", + "tokens": [50460, 865, 11, 1338, 13, 50510], "temperature": 0.0, "avg_logprob": + -0.313291410966353, "compression_ratio": 1.5368852459016393, "no_speech_prob": 0.0378362238407135}, + {"id": 842, "seek": 281340, "start": 2816.32, "end": 2817.76, "text": " And exciting + though.", "tokens": [50510, 400, 4670, 1673, 13, 50582], "temperature": 0.0, "avg_logprob": + -0.313291410966353, "compression_ratio": 1.5368852459016393, "no_speech_prob": 0.0378362238407135}, + {"id": 843, "seek": 281340, "start": 2817.76, "end": 2819.12, "text": " Yeah, yeah, + absolutely.", "tokens": [50582, 865, 11, 1338, 11, 3122, 13, 50650], "temperature": + 0.0, "avg_logprob": -0.313291410966353, "compression_ratio": 1.5368852459016393, + "no_speech_prob": 0.0378362238407135}, {"id": 844, "seek": 281340, "start": 2819.12, + "end": 2822.64, "text": " But actually, today I''ve learned from my colleague Arnitalman.", + "tokens": [50650, 583, 767, 11, 965, 286, 600, 3264, 490, 452, 13532, 1587, 77, + 1686, 1601, 13, 50826], "temperature": 0.0, "avg_logprob": -0.313291410966353, "compression_ratio": + 1.5368852459016393, "no_speech_prob": 0.0378362238407135}, {"id": 845, "seek": 281340, + "start": 2822.64, "end": 2827.2400000000002, "text": " I will make sure to also + ask him to give me the link to this paper.", "tokens": [50826, 286, 486, 652, 988, + 281, 611, 1029, 796, 281, 976, 385, 264, 2113, 281, 341, 3035, 13, 51056], "temperature": + 0.0, "avg_logprob": -0.313291410966353, "compression_ratio": 1.5368852459016393, + "no_speech_prob": 0.0378362238407135}, {"id": 846, "seek": 281340, "start": 2827.2400000000002, + "end": 2835.7200000000003, "text": " The paper said that if you take a model like + bird, bird will not be able to distinguish", "tokens": [51056, 440, 3035, 848, 300, + 498, 291, 747, 257, 2316, 411, 5255, 11, 5255, 486, 406, 312, 1075, 281, 20206, + 
51480], "temperature": 0.0, "avg_logprob": -0.313291410966353, "compression_ratio": + 1.5368852459016393, "no_speech_prob": 0.0378362238407135}, {"id": 847, "seek": 281340, + "start": 2835.7200000000003, "end": 2837.2400000000002, "text": " the negations.", + "tokens": [51480, 264, 2485, 763, 13, 51556], "temperature": 0.0, "avg_logprob": + -0.313291410966353, "compression_ratio": 1.5368852459016393, "no_speech_prob": 0.0378362238407135}, + {"id": 848, "seek": 281340, "start": 2837.2400000000002, "end": 2840.12, "text": + " Have you seen such research, which is very amazing.", "tokens": [51556, 3560, + 291, 1612, 1270, 2132, 11, 597, 307, 588, 2243, 13, 51700], "temperature": 0.0, + "avg_logprob": -0.313291410966353, "compression_ratio": 1.5368852459016393, "no_speech_prob": + 0.0378362238407135}, {"id": 849, "seek": 284012, "start": 2840.12, "end": 2846.64, + "text": " Like for the bird power, not to be able to distinguish negations, that + could be like", "tokens": [50364, 1743, 337, 264, 5255, 1347, 11, 406, 281, 312, + 1075, 281, 20206, 2485, 763, 11, 300, 727, 312, 411, 50690], "temperature": 0.0, + "avg_logprob": -0.26912161509195964, "compression_ratio": 1.6267281105990783, "no_speech_prob": + 0.030437584966421127}, {"id": 850, "seek": 284012, "start": 2846.64, "end": 2848.72, + "text": " deal breaker in some cases.", "tokens": [50690, 2028, 35375, 294, 512, + 3331, 13, 50794], "temperature": 0.0, "avg_logprob": -0.26912161509195964, "compression_ratio": + 1.6267281105990783, "no_speech_prob": 0.030437584966421127}, {"id": 851, "seek": + 284012, "start": 2848.72, "end": 2854.2, "text": " Even though it is very, bird + is very kind of powerful model and probably Google is using", "tokens": [50794, + 2754, 1673, 309, 307, 588, 11, 5255, 307, 588, 733, 295, 4005, 2316, 293, 1391, + 3329, 307, 1228, 51068], "temperature": 0.0, "avg_logprob": -0.26912161509195964, + "compression_ratio": 1.6267281105990783, "no_speech_prob": 0.030437584966421127}, + {"id": 852, 
"seek": 284012, "start": 2854.2, "end": 2857.3199999999997, "text": + " it like for a few percent of their searches.", "tokens": [51068, 309, 411, 337, + 257, 1326, 3043, 295, 641, 26701, 13, 51224], "temperature": 0.0, "avg_logprob": + -0.26912161509195964, "compression_ratio": 1.6267281105990783, "no_speech_prob": + 0.030437584966421127}, {"id": 853, "seek": 284012, "start": 2857.3199999999997, + "end": 2864.96, "text": " But to know that it doesn''t know to distinguish between + negated or non negated phrase, that''s", "tokens": [51224, 583, 281, 458, 300, 309, + 1177, 380, 458, 281, 20206, 1296, 2485, 770, 420, 2107, 2485, 770, 9535, 11, 300, + 311, 51606], "temperature": 0.0, "avg_logprob": -0.26912161509195964, "compression_ratio": + 1.6267281105990783, "no_speech_prob": 0.030437584966421127}, {"id": 854, "seek": + 284012, "start": 2864.96, "end": 2866.12, "text": " interesting.", "tokens": [51606, + 1880, 13, 51664], "temperature": 0.0, "avg_logprob": -0.26912161509195964, "compression_ratio": + 1.6267281105990783, "no_speech_prob": 0.030437584966421127}, {"id": 855, "seek": + 286612, "start": 2866.12, "end": 2872.2, "text": " And that brings more questions + in where some languages have double negations, where matters,", "tokens": [50364, + 400, 300, 5607, 544, 1651, 294, 689, 512, 8650, 362, 3834, 2485, 763, 11, 689, 7001, + 11, 50668], "temperature": 0.0, "avg_logprob": -0.2884235382080078, "compression_ratio": + 1.9292035398230087, "no_speech_prob": 0.041901737451553345}, {"id": 856, "seek": + 286612, "start": 2872.2, "end": 2873.7599999999998, "text": " where it doesn''t + matter.", "tokens": [50668, 689, 309, 1177, 380, 1871, 13, 50746], "temperature": + 0.0, "avg_logprob": -0.2884235382080078, "compression_ratio": 1.9292035398230087, + "no_speech_prob": 0.041901737451553345}, {"id": 857, "seek": 286612, "start": 2873.7599999999998, + "end": 2877.0, "text": " And then that''s like where it''s like a double negations + in some languages are used quite", 
"tokens": [50746, 400, 550, 300, 311, 411, 689, + 309, 311, 411, 257, 3834, 2485, 763, 294, 512, 8650, 366, 1143, 1596, 50908], "temperature": + 0.0, "avg_logprob": -0.2884235382080078, "compression_ratio": 1.9292035398230087, + "no_speech_prob": 0.041901737451553345}, {"id": 858, "seek": 286612, "start": 2877.0, + "end": 2879.12, "text": " a bit and they really change the meaning.", "tokens": + [50908, 257, 857, 293, 436, 534, 1319, 264, 3620, 13, 51014], "temperature": 0.0, + "avg_logprob": -0.2884235382080078, "compression_ratio": 1.9292035398230087, "no_speech_prob": + 0.041901737451553345}, {"id": 859, "seek": 286612, "start": 2879.12, "end": 2884.24, + "text": " So that''s where it''s like, what do we do there?", "tokens": [51014, + 407, 300, 311, 689, 309, 311, 411, 11, 437, 360, 321, 360, 456, 30, 51270], "temperature": + 0.0, "avg_logprob": -0.2884235382080078, "compression_ratio": 1.9292035398230087, + "no_speech_prob": 0.041901737451553345}, {"id": 860, "seek": 286612, "start": 2884.24, + "end": 2885.7599999999998, "text": " But I wonder why.", "tokens": [51270, 583, + 286, 2441, 983, 13, 51346], "temperature": 0.0, "avg_logprob": -0.2884235382080078, + "compression_ratio": 1.9292035398230087, "no_speech_prob": 0.041901737451553345}, + {"id": 861, "seek": 286612, "start": 2885.7599999999998, "end": 2889.2799999999997, + "text": " I guess I wonder if that''s something we know why or if that''s something + just a black", "tokens": [51346, 286, 2041, 286, 2441, 498, 300, 311, 746, 321, + 458, 983, 420, 498, 300, 311, 746, 445, 257, 2211, 51522], "temperature": 0.0, "avg_logprob": + -0.2884235382080078, "compression_ratio": 1.9292035398230087, "no_speech_prob": + 0.041901737451553345}, {"id": 862, "seek": 286612, "start": 2889.2799999999997, + "end": 2892.7999999999997, "text": " box of mystery does something there.", "tokens": + [51522, 2424, 295, 11422, 775, 746, 456, 13, 51698], "temperature": 0.0, "avg_logprob": + -0.2884235382080078, 
"compression_ratio": 1.9292035398230087, "no_speech_prob": + 0.041901737451553345}, {"id": 863, "seek": 289280, "start": 2892.92, "end": 2897.48, + "text": " Yeah, because if it was a rule based system, you could claim that, hey, + I have the rules", "tokens": [50370, 865, 11, 570, 498, 309, 390, 257, 4978, 2361, + 1185, 11, 291, 727, 3932, 300, 11, 4177, 11, 286, 362, 264, 4474, 50598], "temperature": + 0.0, "avg_logprob": -0.25575417280197144, "compression_ratio": 1.7178423236514522, + "no_speech_prob": 0.22626496851444244}, {"id": 864, "seek": 289280, "start": 2897.48, + "end": 2898.48, "text": " here, right?", "tokens": [50598, 510, 11, 558, 30, 50648], + "temperature": 0.0, "avg_logprob": -0.25575417280197144, "compression_ratio": 1.7178423236514522, + "no_speech_prob": 0.22626496851444244}, {"id": 865, "seek": 289280, "start": 2898.48, + "end": 2901.04, "text": " And I''ve managed to encode negations.", "tokens": [50648, + 400, 286, 600, 6453, 281, 2058, 1429, 2485, 763, 13, 50776], "temperature": 0.0, + "avg_logprob": -0.25575417280197144, "compression_ratio": 1.7178423236514522, "no_speech_prob": + 0.22626496851444244}, {"id": 866, "seek": 289280, "start": 2901.04, "end": 2903.0, + "text": " I know what they are.", "tokens": [50776, 286, 458, 437, 436, 366, 13, + 50874], "temperature": 0.0, "avg_logprob": -0.25575417280197144, "compression_ratio": + 1.7178423236514522, "no_speech_prob": 0.22626496851444244}, {"id": 867, "seek": + 289280, "start": 2903.0, "end": 2907.84, "text": " And maybe you run out of all + possible combinations, then you add another one and another one.", "tokens": [50874, + 400, 1310, 291, 1190, 484, 295, 439, 1944, 21267, 11, 550, 291, 909, 1071, 472, + 293, 1071, 472, 13, 51116], "temperature": 0.0, "avg_logprob": -0.25575417280197144, + "compression_ratio": 1.7178423236514522, "no_speech_prob": 0.22626496851444244}, + {"id": 868, "seek": 289280, "start": 2907.84, "end": 2909.76, "text": " But in bird, + you don''t do that, right?", 
"tokens": [51116, 583, 294, 5255, 11, 291, 500, 380, + 360, 300, 11, 558, 30, 51212], "temperature": 0.0, "avg_logprob": -0.25575417280197144, + "compression_ratio": 1.7178423236514522, "no_speech_prob": 0.22626496851444244}, + {"id": 869, "seek": 289280, "start": 2909.76, "end": 2912.76, "text": " You mask + the text and you train it.", "tokens": [51212, 509, 6094, 264, 2487, 293, 291, 3847, + 309, 13, 51362], "temperature": 0.0, "avg_logprob": -0.25575417280197144, "compression_ratio": + 1.7178423236514522, "no_speech_prob": 0.22626496851444244}, {"id": 870, "seek": + 289280, "start": 2912.76, "end": 2914.28, "text": " And that''s it, right?", "tokens": + [51362, 400, 300, 311, 309, 11, 558, 30, 51438], "temperature": 0.0, "avg_logprob": + -0.25575417280197144, "compression_ratio": 1.7178423236514522, "no_speech_prob": + 0.22626496851444244}, {"id": 871, "seek": 289280, "start": 2914.28, "end": 2917.5600000000004, + "text": " Yeah, you didn''t tell what is negation.", "tokens": [51438, 865, 11, + 291, 994, 380, 980, 437, 307, 2485, 399, 13, 51602], "temperature": 0.0, "avg_logprob": + -0.25575417280197144, "compression_ratio": 1.7178423236514522, "no_speech_prob": + 0.22626496851444244}, {"id": 872, "seek": 289280, "start": 2917.5600000000004, "end": + 2919.6400000000003, "text": " You didn''t tell what.", "tokens": [51602, 509, 994, + 380, 980, 437, 13, 51706], "temperature": 0.0, "avg_logprob": -0.25575417280197144, + "compression_ratio": 1.7178423236514522, "no_speech_prob": 0.22626496851444244}, + {"id": 873, "seek": 291964, "start": 2919.64, "end": 2924.6, "text": " And you can + also argue, yes, we also in our human brain, we don''t have syntax, right?", "tokens": + [50364, 400, 291, 393, 611, 9695, 11, 2086, 11, 321, 611, 294, 527, 1952, 3567, + 11, 321, 500, 380, 362, 28431, 11, 558, 30, 50612], "temperature": 0.0, "avg_logprob": + -0.2281210140919122, "compression_ratio": 1.7578125, "no_speech_prob": 0.03039485402405262}, + {"id": 874, "seek": 291964, 
"start": 2924.6, "end": 2926.52, "text": " But there + are some studies like that.", "tokens": [50612, 583, 456, 366, 512, 5313, 411, 300, + 13, 50708], "temperature": 0.0, "avg_logprob": -0.2281210140919122, "compression_ratio": + 1.7578125, "no_speech_prob": 0.03039485402405262}, {"id": 875, "seek": 291964, "start": + 2926.52, "end": 2931.2, "text": " Like, for instance, when kids learn to speak the + language, they don''t know what is negation,", "tokens": [50708, 1743, 11, 337, + 5197, 11, 562, 2301, 1466, 281, 1710, 264, 2856, 11, 436, 500, 380, 458, 437, 307, + 2485, 399, 11, 50942], "temperature": 0.0, "avg_logprob": -0.2281210140919122, "compression_ratio": + 1.7578125, "no_speech_prob": 0.03039485402405262}, {"id": 876, "seek": 291964, "start": + 2931.2, "end": 2932.2, "text": " right?", "tokens": [50942, 558, 30, 50992], "temperature": + 0.0, "avg_logprob": -0.2281210140919122, "compression_ratio": 1.7578125, "no_speech_prob": + 0.03039485402405262}, {"id": 877, "seek": 291964, "start": 2932.2, "end": 2935.24, + "text": " They don''t know what is the syntactic structure or pronoun or whatever.", + "tokens": [50992, 814, 500, 380, 458, 437, 307, 264, 23980, 19892, 3877, 420, 14144, + 420, 2035, 13, 51144], "temperature": 0.0, "avg_logprob": -0.2281210140919122, "compression_ratio": + 1.7578125, "no_speech_prob": 0.03039485402405262}, {"id": 878, "seek": 291964, "start": + 2935.24, "end": 2937.68, "text": " They just speak, right?", "tokens": [51144, 814, + 445, 1710, 11, 558, 30, 51266], "temperature": 0.0, "avg_logprob": -0.2281210140919122, + "compression_ratio": 1.7578125, "no_speech_prob": 0.03039485402405262}, {"id": 879, + "seek": 291964, "start": 2937.68, "end": 2941.72, "text": " And so we probably don''t + use syntax in our brain either.", "tokens": [51266, 400, 370, 321, 1391, 500, 380, + 764, 28431, 294, 527, 3567, 2139, 13, 51468], "temperature": 0.0, "avg_logprob": + -0.2281210140919122, "compression_ratio": 1.7578125, "no_speech_prob": 
0.03039485402405262}, + {"id": 880, "seek": 291964, "start": 2941.72, "end": 2948.3199999999997, "text": + " Like we use some semantic grams, I don''t know, something like that.", "tokens": + [51468, 1743, 321, 764, 512, 47982, 11899, 11, 286, 500, 380, 458, 11, 746, 411, + 300, 13, 51798], "temperature": 0.0, "avg_logprob": -0.2281210140919122, "compression_ratio": + 1.7578125, "no_speech_prob": 0.03039485402405262}, {"id": 881, "seek": 291964, "start": + 2948.3199999999997, "end": 2949.3199999999997, "text": " Yeah.", "tokens": [51798, + 865, 13, 51848], "temperature": 0.0, "avg_logprob": -0.2281210140919122, "compression_ratio": + 1.7578125, "no_speech_prob": 0.03039485402405262}, {"id": 882, "seek": 294932, "start": + 2949.4, "end": 2950.32, "text": " It''s an exciting topic.", "tokens": [50368, 467, + 311, 364, 4670, 4829, 13, 50414], "temperature": 0.0, "avg_logprob": -0.2309028080531529, + "compression_ratio": 1.5674740484429066, "no_speech_prob": 0.002601569751277566}, + {"id": 883, "seek": 294932, "start": 2950.32, "end": 2959.04, "text": " So, and + if we go back to Milbus, so you guys basically essentially have built support for", + "tokens": [50414, 407, 11, 293, 498, 321, 352, 646, 281, 7036, 21441, 11, 370, 291, + 1074, 1936, 4476, 362, 3094, 1406, 337, 50850], "temperature": 0.0, "avg_logprob": + -0.2309028080531529, "compression_ratio": 1.5674740484429066, "no_speech_prob": + 0.002601569751277566}, {"id": 884, "seek": 294932, "start": 2959.04, "end": 2962.92, + "text": " a number of indexing algorithms, as you said, right?", "tokens": [50850, + 257, 1230, 295, 8186, 278, 14642, 11, 382, 291, 848, 11, 558, 30, 51044], "temperature": + 0.0, "avg_logprob": -0.2309028080531529, "compression_ratio": 1.5674740484429066, + "no_speech_prob": 0.002601569751277566}, {"id": 885, "seek": 294932, "start": 2962.92, + "end": 2963.44, "text": " Yeah.", "tokens": [51044, 865, 13, 51070], "temperature": + 0.0, "avg_logprob": -0.2309028080531529, "compression_ratio": 
1.5674740484429066, + "no_speech_prob": 0.002601569751277566}, {"id": 886, "seek": 294932, "start": 2963.44, + "end": 2967.6400000000003, "text": " And can I, as a user, also plug in my own method?", + "tokens": [51070, 400, 393, 286, 11, 382, 257, 4195, 11, 611, 5452, 294, 452, 1065, + 3170, 30, 51280], "temperature": 0.0, "avg_logprob": -0.2309028080531529, "compression_ratio": + 1.5674740484429066, "no_speech_prob": 0.002601569751277566}, {"id": 887, "seek": + 294932, "start": 2967.6400000000003, "end": 2969.84, "text": " So we''re currently + working for those plans.", "tokens": [51280, 407, 321, 434, 4362, 1364, 337, 729, + 5482, 13, 51390], "temperature": 0.0, "avg_logprob": -0.2309028080531529, "compression_ratio": + 1.5674740484429066, "no_speech_prob": 0.002601569751277566}, {"id": 888, "seek": + 294932, "start": 2969.84, "end": 2972.04, "text": " Right now, it''s kind of blocked + off.", "tokens": [51390, 1779, 586, 11, 309, 311, 733, 295, 15470, 766, 13, 51500], + "temperature": 0.0, "avg_logprob": -0.2309028080531529, "compression_ratio": 1.5674740484429066, + "no_speech_prob": 0.002601569751277566}, {"id": 889, "seek": 294932, "start": 2972.04, + "end": 2974.4, "text": " Or like there''s a bunch of changes you have to make deeper + in.", "tokens": [51500, 1610, 411, 456, 311, 257, 3840, 295, 2962, 291, 362, 281, + 652, 7731, 294, 13, 51618], "temperature": 0.0, "avg_logprob": -0.2309028080531529, + "compression_ratio": 1.5674740484429066, "no_speech_prob": 0.002601569751277566}, + {"id": 890, "seek": 294932, "start": 2974.4, "end": 2978.52, "text": " But we are + working towards doing something along those lines where we''ll have a system", "tokens": + [51618, 583, 321, 366, 1364, 3030, 884, 746, 2051, 729, 3876, 689, 321, 603, 362, + 257, 1185, 51824], "temperature": 0.0, "avg_logprob": -0.2309028080531529, "compression_ratio": + 1.5674740484429066, "no_speech_prob": 0.002601569751277566}, {"id": 891, "seek": + 297852, "start": 2978.52, "end": 
2980.52, "text": " where you can kind of bring + it in.", "tokens": [50364, 689, 291, 393, 733, 295, 1565, 309, 294, 13, 50464], + "temperature": 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": 1.9171974522292994, + "no_speech_prob": 0.002841190667822957}, {"id": 892, "seek": 297852, "start": 2980.52, + "end": 2985.2, "text": " But we''re also trying to add like Google scan and we''re + working on that disk and it''s like", "tokens": [50464, 583, 321, 434, 611, 1382, + 281, 909, 411, 3329, 11049, 293, 321, 434, 1364, 322, 300, 12355, 293, 309, 311, + 411, 50698], "temperature": 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": + 1.9171974522292994, "no_speech_prob": 0.002841190667822957}, {"id": 893, "seek": + 297852, "start": 2985.2, "end": 2987.36, "text": " we''re already like working on + putting all the main ones in.", "tokens": [50698, 321, 434, 1217, 411, 1364, 322, + 3372, 439, 264, 2135, 2306, 294, 13, 50806], "temperature": 0.0, "avg_logprob": + -0.23315645193124745, "compression_ratio": 1.9171974522292994, "no_speech_prob": + 0.002841190667822957}, {"id": 894, "seek": 297852, "start": 2987.36, "end": 2990.0, + "text": " But right now, you can''t really do your own.", "tokens": [50806, 583, + 558, 586, 11, 291, 393, 380, 534, 360, 428, 1065, 13, 50938], "temperature": 0.0, + "avg_logprob": -0.23315645193124745, "compression_ratio": 1.9171974522292994, "no_speech_prob": + 0.002841190667822957}, {"id": 895, "seek": 297852, "start": 2990.0, "end": 2992.52, + "text": " And there''s also another question that comes up a bunch of the time.", + "tokens": [50938, 400, 456, 311, 611, 1071, 1168, 300, 1487, 493, 257, 3840, 295, + 264, 565, 13, 51064], "temperature": 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": + 1.9171974522292994, "no_speech_prob": 0.002841190667822957}, {"id": 896, "seek": + 297852, "start": 2992.52, "end": 2994.48, "text": " Is it using your own distance + metric?", "tokens": [51064, 1119, 309, 1228, 
428, 1065, 4560, 20678, 30, 51162], + "temperature": 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": 1.9171974522292994, + "no_speech_prob": 0.002841190667822957}, {"id": 897, "seek": 297852, "start": 2994.48, + "end": 2999.96, "text": " And that''s one, unfortunately, you kind of lose, like + you can do your own disks metrics,", "tokens": [51162, 400, 300, 311, 472, 11, 7015, + 11, 291, 733, 295, 3624, 11, 411, 291, 393, 360, 428, 1065, 41617, 16367, 11, 51436], + "temperature": 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": 1.9171974522292994, + "no_speech_prob": 0.002841190667822957}, {"id": 898, "seek": 297852, "start": 2999.96, + "end": 3004.64, "text": " metric, but it would require you to only use flat because + all these algorithms are kind", "tokens": [51436, 20678, 11, 457, 309, 576, 3651, + 291, 281, 787, 764, 4962, 570, 439, 613, 14642, 366, 733, 51670], "temperature": + 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": 1.9171974522292994, + "no_speech_prob": 0.002841190667822957}, {"id": 899, "seek": 297852, "start": 3004.64, + "end": 3008.48, "text": " of, the reason they''re efficient is because they can + do some of these distance metrics.", "tokens": [51670, 295, 11, 264, 1778, 436, + 434, 7148, 307, 570, 436, 393, 360, 512, 295, 613, 4560, 16367, 13, 51862], "temperature": + 0.0, "avg_logprob": -0.23315645193124745, "compression_ratio": 1.9171974522292994, + "no_speech_prob": 0.002841190667822957}, {"id": 900, "seek": 300848, "start": 3008.52, + "end": 3012.12, "text": " And like, let''s say with quantization, like it kind of + plugs and plays together nicely.", "tokens": [50366, 400, 411, 11, 718, 311, 584, + 365, 4426, 2144, 11, 411, 309, 733, 295, 33899, 293, 5749, 1214, 9594, 13, 50546], + "temperature": 0.0, "avg_logprob": -0.24736228778207903, "compression_ratio": 1.7474747474747474, + "no_speech_prob": 0.00034690083703026175}, {"id": 901, "seek": 300848, "start": + 3012.12, "end": 3015.92, "text": 
" When you try changing those things, everything + breaks and you kind of have to revert back", "tokens": [50546, 1133, 291, 853, 4473, + 729, 721, 11, 1203, 9857, 293, 291, 733, 295, 362, 281, 319, 3281, 646, 50736], + "temperature": 0.0, "avg_logprob": -0.24736228778207903, "compression_ratio": 1.7474747474747474, + "no_speech_prob": 0.00034690083703026175}, {"id": 902, "seek": 300848, "start": + 3015.92, "end": 3017.48, "text": " to doing a flat-based system.", "tokens": [50736, + 281, 884, 257, 4962, 12, 6032, 1185, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.24736228778207903, "compression_ratio": 1.7474747474747474, "no_speech_prob": + 0.00034690083703026175}, {"id": 903, "seek": 300848, "start": 3017.48, "end": 3022.32, + "text": " But yeah, those are the, yeah, for now, we''re trying to add in all of + the most famous,", "tokens": [50814, 583, 1338, 11, 729, 366, 264, 11, 1338, 11, + 337, 586, 11, 321, 434, 1382, 281, 909, 294, 439, 295, 264, 881, 4618, 11, 51056], + "temperature": 0.0, "avg_logprob": -0.24736228778207903, "compression_ratio": 1.7474747474747474, + "no_speech_prob": 0.00034690083703026175}, {"id": 904, "seek": 300848, "start": + 3022.32, "end": 3027.2, "text": " or not most famous, but just state of the art + near-snaper searching algorithms.", "tokens": [51056, 420, 406, 881, 4618, 11, 457, + 445, 1785, 295, 264, 1523, 2651, 12, 82, 629, 610, 10808, 14642, 13, 51300], "temperature": + 0.0, "avg_logprob": -0.24736228778207903, "compression_ratio": 1.7474747474747474, + "no_speech_prob": 0.00034690083703026175}, {"id": 905, "seek": 300848, "start": + 3027.2, "end": 3031.12, "text": " And then later on, hopefully, we can kind of make + this thing where you can kind of code", "tokens": [51300, 400, 550, 1780, 322, 11, + 4696, 11, 321, 393, 733, 295, 652, 341, 551, 689, 291, 393, 733, 295, 3089, 51496], + "temperature": 0.0, "avg_logprob": -0.24736228778207903, "compression_ratio": 1.7474747474747474, + "no_speech_prob": 
0.00034690083703026175}, {"id": 906, "seek": 300848, "start": + 3031.12, "end": 3034.8, "text": " it out yourself and kind of plug it in and make + it work.", "tokens": [51496, 309, 484, 1803, 293, 733, 295, 5452, 309, 294, 293, + 652, 309, 589, 13, 51680], "temperature": 0.0, "avg_logprob": -0.24736228778207903, + "compression_ratio": 1.7474747474747474, "no_speech_prob": 0.00034690083703026175}, + {"id": 907, "seek": 303480, "start": 3034.8, "end": 3037.84, "text": " And also, + Milbus, like, pays a lot of attention to scalability.", "tokens": [50364, 400, 611, + 11, 7036, 21441, 11, 411, 11, 10604, 257, 688, 295, 3202, 281, 15664, 2310, 13, + 50516], "temperature": 0.0, "avg_logprob": -0.24086390196822072, "compression_ratio": + 1.6620689655172414, "no_speech_prob": 0.013788984157145023}, {"id": 908, "seek": + 303480, "start": 3037.84, "end": 3043.52, "text": " So for instance, like horizontal + scaling, you know, is probably vertical, right?", "tokens": [50516, 407, 337, 5197, + 11, 411, 12750, 21589, 11, 291, 458, 11, 307, 1391, 9429, 11, 558, 30, 50800], "temperature": + 0.0, "avg_logprob": -0.24086390196822072, "compression_ratio": 1.6620689655172414, + "no_speech_prob": 0.013788984157145023}, {"id": 909, "seek": 303480, "start": 3043.52, + "end": 3050.76, "text": " So, but for instance, one thing that I''ve been thinking + about is that in the whole infrastructure", "tokens": [50800, 407, 11, 457, 337, + 5197, 11, 472, 551, 300, 286, 600, 668, 1953, 466, 307, 300, 294, 264, 1379, 6896, + 51162], "temperature": 0.0, "avg_logprob": -0.24086390196822072, "compression_ratio": + 1.6620689655172414, "no_speech_prob": 0.013788984157145023}, {"id": 910, "seek": + 303480, "start": 3050.76, "end": 3056.6000000000004, "text": " and the pipeline + of a search engine, one of the bottlenecks is actually getting the data,", "tokens": + [51162, 293, 264, 15517, 295, 257, 3164, 2848, 11, 472, 295, 264, 44641, 2761, 307, + 767, 1242, 264, 1412, 11, 51454], "temperature": 
0.0, "avg_logprob": -0.24086390196822072, + "compression_ratio": 1.6620689655172414, "no_speech_prob": 0.013788984157145023}, + {"id": 911, "seek": 303480, "start": 3056.6000000000004, "end": 3057.6000000000004, + "text": " right?", "tokens": [51454, 558, 30, 51504], "temperature": 0.0, "avg_logprob": + -0.24086390196822072, "compression_ratio": 1.6620689655172414, "no_speech_prob": + 0.013788984157145023}, {"id": 912, "seek": 303480, "start": 3057.6000000000004, + "end": 3060.96, "text": " So the data comes in, let''s say in raw format, I don''t + know, news items, images, what", "tokens": [51504, 407, 264, 1412, 1487, 294, 11, + 718, 311, 584, 294, 8936, 7877, 11, 286, 500, 380, 458, 11, 2583, 4754, 11, 5267, + 11, 437, 51672], "temperature": 0.0, "avg_logprob": -0.24086390196822072, "compression_ratio": + 1.6620689655172414, "no_speech_prob": 0.013788984157145023}, {"id": 913, "seek": + 303480, "start": 3060.96, "end": 3061.96, "text": " have you.", "tokens": [51672, + 362, 291, 13, 51722], "temperature": 0.0, "avg_logprob": -0.24086390196822072, "compression_ratio": + 1.6620689655172414, "no_speech_prob": 0.013788984157145023}, {"id": 914, "seek": + 303480, "start": 3061.96, "end": 3064.6000000000004, "text": " Now you need to compute + the embeddings, right?", "tokens": [51722, 823, 291, 643, 281, 14722, 264, 12240, + 29432, 11, 558, 30, 51854], "temperature": 0.0, "avg_logprob": -0.24086390196822072, + "compression_ratio": 1.6620689655172414, "no_speech_prob": 0.013788984157145023}, + {"id": 915, "seek": 306460, "start": 3064.6, "end": 3067.68, "text": " At some good, + good rate, right?", "tokens": [50364, 1711, 512, 665, 11, 665, 3314, 11, 558, 30, + 50518], "temperature": 0.0, "avg_logprob": -0.269567242984114, "compression_ratio": + 1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, {"id": 916, "seek": + 306460, "start": 3067.68, "end": 3068.92, "text": " So kind of throughput.", "tokens": + [50518, 407, 733, 295, 44629, 13, 50580], 
"temperature": 0.0, "avg_logprob": -0.269567242984114, + "compression_ratio": 1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, + {"id": 917, "seek": 306460, "start": 3068.92, "end": 3069.92, "text": " Yeah.", + "tokens": [50580, 865, 13, 50630], "temperature": 0.0, "avg_logprob": -0.269567242984114, + "compression_ratio": 1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, + {"id": 918, "seek": 306460, "start": 3069.92, "end": 3076.92, "text": " So do you + guys have any work done in this area or kind of recommendations for the users?", + "tokens": [50630, 407, 360, 291, 1074, 362, 604, 589, 1096, 294, 341, 1859, 420, + 733, 295, 10434, 337, 264, 5022, 30, 50980], "temperature": 0.0, "avg_logprob": + -0.269567242984114, "compression_ratio": 1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, + {"id": 919, "seek": 306460, "start": 3076.92, "end": 3081.92, "text": " So recommendations + right now are what we''ve been using is when we recommend is like having", "tokens": + [50980, 407, 10434, 558, 586, 366, 437, 321, 600, 668, 1228, 307, 562, 321, 2748, + 307, 411, 1419, 51230], "temperature": 0.0, "avg_logprob": -0.269567242984114, "compression_ratio": + 1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, {"id": 920, "seek": + 306460, "start": 3081.92, "end": 3086.16, "text": " a server, like there''s in videos, + try to in there''s a few other ones of inference-based", "tokens": [51230, 257, + 7154, 11, 411, 456, 311, 294, 2145, 11, 853, 281, 294, 456, 311, 257, 1326, 661, + 2306, 295, 38253, 12, 6032, 51442], "temperature": 0.0, "avg_logprob": -0.269567242984114, + "compression_ratio": 1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, + {"id": 921, "seek": 306460, "start": 3086.16, "end": 3088.68, "text": " servers, + and you scale those up themselves.", "tokens": [51442, 15909, 11, 293, 291, 4373, + 729, 493, 2969, 13, 51568], "temperature": 0.0, "avg_logprob": -0.269567242984114, + "compression_ratio": 
1.6809338521400778, "no_speech_prob": 0.0025497868191450834}, + {"id": 922, "seek": 306460, "start": 3088.68, "end": 3091.7599999999998, "text": + " We are currently working on that, like we''re calling it OE.", "tokens": [51568, + 492, 366, 4362, 1364, 322, 300, 11, 411, 321, 434, 5141, 309, 422, 36, 13, 51722], + "temperature": 0.0, "avg_logprob": -0.269567242984114, "compression_ratio": 1.6809338521400778, + "no_speech_prob": 0.0025497868191450834}, {"id": 923, "seek": 309176, "start": 3091.76, + "end": 3097.0800000000004, "text": " And it''s kind of the ML pipeline scalable + that goes, that''s focuses on embeddings.", "tokens": [50364, 400, 309, 311, 733, + 295, 264, 21601, 15517, 38481, 300, 1709, 11, 300, 311, 16109, 322, 12240, 29432, + 13, 50630], "temperature": 0.0, "avg_logprob": -0.20688496293692754, "compression_ratio": + 1.8771331058020477, "no_speech_prob": 0.019276460632681847}, {"id": 924, "seek": + 309176, "start": 3097.0800000000004, "end": 3100.76, "text": " So like kind of doing + all these things about embeddings, everything embeddings, and making", "tokens": + [50630, 407, 411, 733, 295, 884, 439, 613, 721, 466, 12240, 29432, 11, 1203, 12240, + 29432, 11, 293, 1455, 50814], "temperature": 0.0, "avg_logprob": -0.20688496293692754, + "compression_ratio": 1.8771331058020477, "no_speech_prob": 0.019276460632681847}, + {"id": 925, "seek": 309176, "start": 3100.76, "end": 3104.28, "text": " pipelines + that can scale in multiple machines and multiple GPUs.", "tokens": [50814, 40168, + 300, 393, 4373, 294, 3866, 8379, 293, 3866, 18407, 82, 13, 50990], "temperature": + 0.0, "avg_logprob": -0.20688496293692754, "compression_ratio": 1.8771331058020477, + "no_speech_prob": 0.019276460632681847}, {"id": 926, "seek": 309176, "start": 3104.28, + "end": 3106.48, "text": " Still in the work of progress, it''s pretty early stage.", + "tokens": [50990, 8291, 294, 264, 589, 295, 4205, 11, 309, 311, 1238, 2440, 3233, + 13, 51100], "temperature": 0.0, 
"avg_logprob": -0.20688496293692754, "compression_ratio": + 1.8771331058020477, "no_speech_prob": 0.019276460632681847}, {"id": 927, "seek": + 309176, "start": 3106.48, "end": 3110.28, "text": " That''s what I''m currently + also working on.", "tokens": [51100, 663, 311, 437, 286, 478, 4362, 611, 1364, 322, + 13, 51290], "temperature": 0.0, "avg_logprob": -0.20688496293692754, "compression_ratio": + 1.8771331058020477, "no_speech_prob": 0.019276460632681847}, {"id": 928, "seek": + 309176, "start": 3110.28, "end": 3113.28, "text": " That''s going to be that, like + there are people that don''t know that step and that don''t", "tokens": [51290, + 663, 311, 516, 281, 312, 300, 11, 411, 456, 366, 561, 300, 500, 380, 458, 300, 1823, + 293, 300, 500, 380, 51440], "temperature": 0.0, "avg_logprob": -0.20688496293692754, + "compression_ratio": 1.8771331058020477, "no_speech_prob": 0.019276460632681847}, + {"id": 929, "seek": 309176, "start": 3113.28, "end": 3117.0400000000004, "text": + " know what the best process is, and we''re going to be, it''s also open source.", + "tokens": [51440, 458, 437, 264, 1151, 1399, 307, 11, 293, 321, 434, 516, 281, 312, + 11, 309, 311, 611, 1269, 4009, 13, 51628], "temperature": 0.0, "avg_logprob": -0.20688496293692754, + "compression_ratio": 1.8771331058020477, "no_speech_prob": 0.019276460632681847}, + {"id": 930, "seek": 309176, "start": 3117.0400000000004, "end": 3119.5200000000004, + "text": " So it''s going to be the step ahead of millivis.", "tokens": [51628, 407, + 309, 311, 516, 281, 312, 264, 1823, 2286, 295, 1728, 592, 271, 13, 51752], "temperature": + 0.0, "avg_logprob": -0.20688496293692754, "compression_ratio": 1.8771331058020477, + "no_speech_prob": 0.019276460632681847}, {"id": 931, "seek": 311952, "start": 3119.52, + "end": 3123.8, "text": " And then we''re going to, as it progresses, kind of interlink + it with millivis, kind of", "tokens": [50364, 400, 550, 321, 434, 516, 281, 11, + 382, 309, 41929, 11, 733, 295, 728, 
22473, 309, 365, 1728, 592, 271, 11, 733, 295, + 50578], "temperature": 0.0, "avg_logprob": -0.21983256340026855, "compression_ratio": + 1.646153846153846, "no_speech_prob": 0.007579208351671696}, {"id": 932, "seek": + 311952, "start": 3123.8, "end": 3126.64, "text": " make an easy plug-in play together.", + "tokens": [50578, 652, 364, 1858, 5452, 12, 259, 862, 1214, 13, 50720], "temperature": + 0.0, "avg_logprob": -0.21983256340026855, "compression_ratio": 1.646153846153846, + "no_speech_prob": 0.007579208351671696}, {"id": 933, "seek": 311952, "start": 3126.64, + "end": 3130.8, "text": " But for now, it''s all about kind of scaling up inference + servers.", "tokens": [50720, 583, 337, 586, 11, 309, 311, 439, 466, 733, 295, 21589, + 493, 38253, 15909, 13, 50928], "temperature": 0.0, "avg_logprob": -0.21983256340026855, + "compression_ratio": 1.646153846153846, "no_speech_prob": 0.007579208351671696}, + {"id": 934, "seek": 311952, "start": 3130.8, "end": 3132.6, "text": " Luckily, you + can scale it pretty easily.", "tokens": [50928, 19726, 11, 291, 393, 4373, 309, + 1238, 3612, 13, 51018], "temperature": 0.0, "avg_logprob": -0.21983256340026855, + "compression_ratio": 1.646153846153846, "no_speech_prob": 0.007579208351671696}, + {"id": 935, "seek": 311952, "start": 3132.6, "end": 3137.52, "text": " When it comes + to videos, where frame order matters, it''s a little different.", "tokens": [51018, + 1133, 309, 1487, 281, 2145, 11, 689, 3920, 1668, 7001, 11, 309, 311, 257, 707, 819, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.21983256340026855, "compression_ratio": + 1.646153846153846, "no_speech_prob": 0.007579208351671696}, {"id": 936, "seek": + 311952, "start": 3137.52, "end": 3143.72, "text": " But yes, we kind of, for now, + with millivis are only that storage and search part.", "tokens": [51264, 583, 2086, + 11, 321, 733, 295, 11, 337, 586, 11, 365, 1728, 592, 271, 366, 787, 300, 6725, 293, + 3164, 644, 13, 51574], "temperature": 0.0, "avg_logprob": 
-0.21983256340026855, + "compression_ratio": 1.646153846153846, "no_speech_prob": 0.007579208351671696}, + {"id": 937, "seek": 311952, "start": 3143.72, "end": 3146.6, "text": " Everything + above it is up to the user.", "tokens": [51574, 5471, 3673, 309, 307, 493, 281, + 264, 4195, 13, 51718], "temperature": 0.0, "avg_logprob": -0.21983256340026855, + "compression_ratio": 1.646153846153846, "no_speech_prob": 0.007579208351671696}, + {"id": 938, "seek": 314660, "start": 3146.88, "end": 3147.44, "text": " Yeah.", + "tokens": [50378, 865, 13, 50406], "temperature": 0.0, "avg_logprob": -0.29001028197152273, + "compression_ratio": 1.6754385964912282, "no_speech_prob": 0.11483266949653625}, + {"id": 939, "seek": 314660, "start": 3147.44, "end": 3155.16, "text": " And millivis, + do I only store the vectors, or can I also store the input object?", "tokens": [50406, + 400, 1728, 592, 271, 11, 360, 286, 787, 3531, 264, 18875, 11, 420, 393, 286, 611, + 3531, 264, 4846, 2657, 30, 50792], "temperature": 0.0, "avg_logprob": -0.29001028197152273, + "compression_ratio": 1.6754385964912282, "no_speech_prob": 0.11483266949653625}, + {"id": 940, "seek": 314660, "start": 3155.16, "end": 3156.52, "text": " Right now, + no input object.", "tokens": [50792, 1779, 586, 11, 572, 4846, 2657, 13, 50860], + "temperature": 0.0, "avg_logprob": -0.29001028197152273, "compression_ratio": 1.6754385964912282, + "no_speech_prob": 0.11483266949653625}, {"id": 941, "seek": 314660, "start": 3156.52, + "end": 3160.96, "text": " And that was kind of like, it slows things down a lot.", + "tokens": [50860, 400, 300, 390, 733, 295, 411, 11, 309, 35789, 721, 760, 257, 688, + 13, 51082], "temperature": 0.0, "avg_logprob": -0.29001028197152273, "compression_ratio": + 1.6754385964912282, "no_speech_prob": 0.11483266949653625}, {"id": 942, "seek": + 314660, "start": 3160.96, "end": 3167.0, "text": " And that''s where sort of a no-skill + database or something like that would work a lot better", "tokens": 
[51082, 400, + 300, 311, 689, 1333, 295, 257, 572, 12, 5161, 373, 8149, 420, 746, 411, 300, 576, + 589, 257, 688, 1101, 51384], "temperature": 0.0, "avg_logprob": -0.29001028197152273, + "compression_ratio": 1.6754385964912282, "no_speech_prob": 0.11483266949653625}, + {"id": 943, "seek": 314660, "start": 3167.0, "end": 3170.4, "text": " for those + quick retrievals, where you need exact.", "tokens": [51384, 337, 729, 1702, 19817, + 19778, 11, 689, 291, 643, 1900, 13, 51554], "temperature": 0.0, "avg_logprob": -0.29001028197152273, + "compression_ratio": 1.6754385964912282, "no_speech_prob": 0.11483266949653625}, + {"id": 944, "seek": 314660, "start": 3170.4, "end": 3174.7999999999997, "text": + " So we, for now, store the ints, and then we''re going to store strings.", "tokens": + [51554, 407, 321, 11, 337, 586, 11, 3531, 264, 560, 82, 11, 293, 550, 321, 434, + 516, 281, 3531, 13985, 13, 51774], "temperature": 0.0, "avg_logprob": -0.29001028197152273, + "compression_ratio": 1.6754385964912282, "no_speech_prob": 0.11483266949653625}, + {"id": 945, "seek": 317480, "start": 3174.8, "end": 3178.36, "text": " And then + later on, we''re going to add more and more types that we can store with it.", "tokens": + [50364, 400, 550, 1780, 322, 11, 321, 434, 516, 281, 909, 544, 293, 544, 3467, 300, + 321, 393, 3531, 365, 309, 13, 50542], "temperature": 0.0, "avg_logprob": -0.2460803985595703, + "compression_ratio": 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, + {"id": 946, "seek": 317480, "start": 3178.36, "end": 3182.6400000000003, "text": + " And another thing, we''re hoping to be able to also index on strings on indexes.", + "tokens": [50542, 400, 1071, 551, 11, 321, 434, 7159, 281, 312, 1075, 281, 611, + 8186, 322, 13985, 322, 8186, 279, 13, 50756], "temperature": 0.0, "avg_logprob": + -0.2460803985595703, "compression_ratio": 1.7927272727272727, "no_speech_prob": + 0.0005468070157803595}, {"id": 947, "seek": 317480, "start": 3182.6400000000003, + "end": 
3184.44, "text": " So you are on ints.", "tokens": [50756, 407, 291, 366, + 322, 560, 82, 13, 50846], "temperature": 0.0, "avg_logprob": -0.2460803985595703, + "compression_ratio": 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, + {"id": 948, "seek": 317480, "start": 3184.44, "end": 3187.1200000000003, "text": + " So for now, we don''t store the objects.", "tokens": [50846, 407, 337, 586, 11, + 321, 500, 380, 3531, 264, 6565, 13, 50980], "temperature": 0.0, "avg_logprob": -0.2460803985595703, + "compression_ratio": 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, + {"id": 949, "seek": 317480, "start": 3187.1200000000003, "end": 3190.32, "text": + " So in the future, when we have string, you can link the file path.", "tokens": + [50980, 407, 294, 264, 2027, 11, 562, 321, 362, 6798, 11, 291, 393, 2113, 264, 3991, + 3100, 13, 51140], "temperature": 0.0, "avg_logprob": -0.2460803985595703, "compression_ratio": + 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, {"id": 950, "seek": + 317480, "start": 3190.32, "end": 3193.84, "text": " Because that''s usually what + most, when you store an object, you''re just storing the file", "tokens": [51140, + 1436, 300, 311, 2673, 437, 881, 11, 562, 291, 3531, 364, 2657, 11, 291, 434, 445, + 26085, 264, 3991, 51316], "temperature": 0.0, "avg_logprob": -0.2460803985595703, + "compression_ratio": 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, + {"id": 951, "seek": 317480, "start": 3193.84, "end": 3194.84, "text": " path.", + "tokens": [51316, 3100, 13, 51366], "temperature": 0.0, "avg_logprob": -0.2460803985595703, + "compression_ratio": 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, + {"id": 952, "seek": 317480, "start": 3194.84, "end": 3201.2400000000002, "text": + " But yeah, object storage is not part currently on in the millivis server.", "tokens": + [51366, 583, 1338, 11, 2657, 6725, 307, 406, 644, 4362, 322, 294, 264, 1728, 592, + 271, 7154, 13, 51686], 
"temperature": 0.0, "avg_logprob": -0.2460803985595703, "compression_ratio": + 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, {"id": 953, "seek": + 317480, "start": 3201.2400000000002, "end": 3204.0800000000004, "text": " You just + get an ID and a vector.", "tokens": [51686, 509, 445, 483, 364, 7348, 293, 257, + 8062, 13, 51828], "temperature": 0.0, "avg_logprob": -0.2460803985595703, "compression_ratio": + 1.7927272727272727, "no_speech_prob": 0.0005468070157803595}, {"id": 954, "seek": + 320408, "start": 3204.08, "end": 3205.08, "text": " Yeah, yeah.", "tokens": [50364, + 865, 11, 1338, 13, 50414], "temperature": 0.0, "avg_logprob": -0.2524923969515794, + "compression_ratio": 1.6679104477611941, "no_speech_prob": 0.006133212707936764}, + {"id": 955, "seek": 320408, "start": 3205.08, "end": 3210.2, "text": " And then + you go back and kind of link it to the other days, where those objects are stored", + "tokens": [50414, 400, 550, 291, 352, 646, 293, 733, 295, 2113, 309, 281, 264, 661, + 1708, 11, 689, 729, 6565, 366, 12187, 50670], "temperature": 0.0, "avg_logprob": + -0.2524923969515794, "compression_ratio": 1.6679104477611941, "no_speech_prob": + 0.006133212707936764}, {"id": 956, "seek": 320408, "start": 3210.2, "end": 3213.0, + "text": " in case you need to display them or something like that.", "tokens": [50670, + 294, 1389, 291, 643, 281, 4674, 552, 420, 746, 411, 300, 13, 50810], "temperature": + 0.0, "avg_logprob": -0.2524923969515794, "compression_ratio": 1.6679104477611941, + "no_speech_prob": 0.006133212707936764}, {"id": 957, "seek": 320408, "start": 3213.0, + "end": 3217.24, "text": " So in that sense, you guys are like a pure vector database, + like you store.", "tokens": [50810, 407, 294, 300, 2020, 11, 291, 1074, 366, 411, + 257, 6075, 8062, 8149, 11, 411, 291, 3531, 13, 51022], "temperature": 0.0, "avg_logprob": + -0.2524923969515794, "compression_ratio": 1.6679104477611941, "no_speech_prob": + 0.006133212707936764}, {"id": 958, 
"seek": 320408, "start": 3217.24, "end": 3218.24, + "text": " Exactly.", "tokens": [51022, 7587, 13, 51072], "temperature": 0.0, "avg_logprob": + -0.2524923969515794, "compression_ratio": 1.6679104477611941, "no_speech_prob": + 0.006133212707936764}, {"id": 959, "seek": 320408, "start": 3218.24, "end": 3222.44, + "text": " Literally the vectors plus, I guess, the scholar values that they can + filter on, right?", "tokens": [51072, 23768, 264, 18875, 1804, 11, 286, 2041, 11, + 264, 17912, 4190, 300, 436, 393, 6608, 322, 11, 558, 30, 51282], "temperature": + 0.0, "avg_logprob": -0.2524923969515794, "compression_ratio": 1.6679104477611941, + "no_speech_prob": 0.006133212707936764}, {"id": 960, "seek": 320408, "start": 3222.44, + "end": 3223.44, "text": " Yeah.", "tokens": [51282, 865, 13, 51332], "temperature": + 0.0, "avg_logprob": -0.2524923969515794, "compression_ratio": 1.6679104477611941, + "no_speech_prob": 0.006133212707936764}, {"id": 961, "seek": 320408, "start": 3223.44, + "end": 3224.44, "text": " Yeah.", "tokens": [51332, 865, 13, 51382], "temperature": + 0.0, "avg_logprob": -0.2524923969515794, "compression_ratio": 1.6679104477611941, + "no_speech_prob": 0.006133212707936764}, {"id": 962, "seek": 320408, "start": 3224.44, + "end": 3225.44, "text": " Exactly.", "tokens": [51382, 7587, 13, 51432], "temperature": + 0.0, "avg_logprob": -0.2524923969515794, "compression_ratio": 1.6679104477611941, + "no_speech_prob": 0.006133212707936764}, {"id": 963, "seek": 320408, "start": 3225.44, + "end": 3226.44, "text": " Oh, that''s awesome.", "tokens": [51432, 876, 11, 300, + 311, 3476, 13, 51482], "temperature": 0.0, "avg_logprob": -0.2524923969515794, "compression_ratio": + 1.6679104477611941, "no_speech_prob": 0.006133212707936764}, {"id": 964, "seek": + 320408, "start": 3226.44, "end": 3228.88, "text": " I think that covers a lot of + use cases, isn''t it?", "tokens": [51482, 286, 519, 300, 10538, 257, 688, 295, 764, + 3331, 11, 1943, 380, 309, 30, 51604], 
"temperature": 0.0, "avg_logprob": -0.2524923969515794, + "compression_ratio": 1.6679104477611941, "no_speech_prob": 0.006133212707936764}, + {"id": 965, "seek": 320408, "start": 3228.88, "end": 3229.88, "text": " Doesn''t + it?", "tokens": [51604, 12955, 380, 309, 30, 51654], "temperature": 0.0, "avg_logprob": + -0.2524923969515794, "compression_ratio": 1.6679104477611941, "no_speech_prob": + 0.006133212707936764}, {"id": 966, "seek": 320408, "start": 3229.88, "end": 3230.88, + "text": " Yeah.", "tokens": [51654, 865, 13, 51704], "temperature": 0.0, "avg_logprob": + -0.2524923969515794, "compression_ratio": 1.6679104477611941, "no_speech_prob": + 0.006133212707936764}, {"id": 967, "seek": 320408, "start": 3230.88, "end": 3231.88, + "text": " Yeah.", "tokens": [51704, 865, 13, 51754], "temperature": 0.0, "avg_logprob": + -0.2524923969515794, "compression_ratio": 1.6679104477611941, "no_speech_prob": + 0.006133212707936764}, {"id": 968, "seek": 323188, "start": 3232.04, "end": 3236.96, + "text": " And I was also thinking like, so millivis is open source.", "tokens": + [50372, 400, 286, 390, 611, 1953, 411, 11, 370, 1728, 592, 271, 307, 1269, 4009, + 13, 50618], "temperature": 0.0, "avg_logprob": -0.25915334992489575, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.008430704474449158}, {"id": 969, "seek": + 323188, "start": 3236.96, "end": 3240.08, "text": " And it''s one of my favorite + also questions.", "tokens": [50618, 400, 309, 311, 472, 295, 452, 2954, 611, 1651, + 13, 50774], "temperature": 0.0, "avg_logprob": -0.25915334992489575, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.008430704474449158}, {"id": 970, "seek": + 323188, "start": 3240.08, "end": 3245.56, "text": " You know, like, can you speak + more why, why millivis is open source?", "tokens": [50774, 509, 458, 11, 411, 11, + 393, 291, 1710, 544, 983, 11, 983, 1728, 592, 271, 307, 1269, 4009, 30, 51048], + "temperature": 0.0, "avg_logprob": -0.25915334992489575, 
"compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.008430704474449158}, {"id": 971, "seek": 323188, "start": 3245.56, + "end": 3247.76, "text": " What do you get from it being open source?", "tokens": + [51048, 708, 360, 291, 483, 490, 309, 885, 1269, 4009, 30, 51158], "temperature": + 0.0, "avg_logprob": -0.25915334992489575, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.008430704474449158}, {"id": 972, "seek": 323188, "start": 3247.76, + "end": 3253.92, "text": " So I think the biggest thing right now is with open source, + like, you need open source", "tokens": [51158, 407, 286, 519, 264, 3880, 551, 558, + 586, 307, 365, 1269, 4009, 11, 411, 11, 291, 643, 1269, 4009, 51466], "temperature": + 0.0, "avg_logprob": -0.25915334992489575, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.008430704474449158}, {"id": 973, "seek": 323188, "start": 3253.92, + "end": 3255.76, "text": " to kind of get this idea out.", "tokens": [51466, 281, + 733, 295, 483, 341, 1558, 484, 13, 51558], "temperature": 0.0, "avg_logprob": -0.25915334992489575, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.008430704474449158}, + {"id": 974, "seek": 323188, "start": 3255.76, "end": 3260.44, "text": " So Dr. 
search, + if you close source it, you don''t really know what''s going on.", "tokens": [51558, + 407, 2491, 13, 3164, 11, 498, 291, 1998, 4009, 309, 11, 291, 500, 380, 534, 458, + 437, 311, 516, 322, 13, 51792], "temperature": 0.0, "avg_logprob": -0.25915334992489575, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.008430704474449158}, + {"id": 975, "seek": 326044, "start": 3261.44, "end": 3264.52, "text": " Nothing + like something like you don''t really get the info that you want.", "tokens": [50414, + 6693, 411, 746, 411, 291, 500, 380, 534, 483, 264, 13614, 300, 291, 528, 13, 50568], + "temperature": 0.0, "avg_logprob": -0.1966567426114469, "compression_ratio": 1.8357771260997067, + "no_speech_prob": 0.025944175198674202}, {"id": 976, "seek": 326044, "start": 3264.52, + "end": 3268.64, "text": " And that doesn''t spark competition because if you''re + not in there already, you need to find out", "tokens": [50568, 400, 300, 1177, 380, + 9908, 6211, 570, 498, 291, 434, 406, 294, 456, 1217, 11, 291, 643, 281, 915, 484, + 50774], "temperature": 0.0, "avg_logprob": -0.1966567426114469, "compression_ratio": + 1.8357771260997067, "no_speech_prob": 0.025944175198674202}, {"id": 977, "seek": + 326044, "start": 3268.64, "end": 3274.76, "text": " everything yourself, kind of + do it all to just catch up in that that''s sort of going to take a long time.", + "tokens": [50774, 1203, 1803, 11, 733, 295, 360, 309, 439, 281, 445, 3745, 493, + 294, 300, 300, 311, 1333, 295, 516, 281, 747, 257, 938, 565, 13, 51080], "temperature": + 0.0, "avg_logprob": -0.1966567426114469, "compression_ratio": 1.8357771260997067, + "no_speech_prob": 0.025944175198674202}, {"id": 978, "seek": 326044, "start": 3274.76, + "end": 3279.36, "text": " So with open source, it kind of promotes this competition + innovation because everyone can see what we''re doing.", "tokens": [51080, 407, + 365, 1269, 4009, 11, 309, 733, 295, 36015, 341, 6211, 8504, 570, 1518, 393, 536, + 437, 321, 434, 
884, 13, 51310], "temperature": 0.0, "avg_logprob": -0.1966567426114469, + "compression_ratio": 1.8357771260997067, "no_speech_prob": 0.025944175198674202}, + {"id": 979, "seek": 326044, "start": 3279.36, "end": 3282.48, "text": " You can + see all these algorithms, you can learn how a vector search works.", "tokens": [51310, + 509, 393, 536, 439, 613, 14642, 11, 291, 393, 1466, 577, 257, 8062, 3164, 1985, + 13, 51466], "temperature": 0.0, "avg_logprob": -0.1966567426114469, "compression_ratio": + 1.8357771260997067, "no_speech_prob": 0.025944175198674202}, {"id": 980, "seek": + 326044, "start": 3282.48, "end": 3284.8, "text": " And then people can kind of branch + off and do their own.", "tokens": [51466, 400, 550, 561, 393, 733, 295, 9819, 766, + 293, 360, 641, 1065, 13, 51582], "temperature": 0.0, "avg_logprob": -0.1966567426114469, + "compression_ratio": 1.8357771260997067, "no_speech_prob": 0.025944175198674202}, + {"id": 981, "seek": 326044, "start": 3284.8, "end": 3289.92, "text": " Sure it''s + competition, but the only way you''re going to get more users familiar with this + and knowing about", "tokens": [51582, 4894, 309, 311, 6211, 11, 457, 264, 787, 636, + 291, 434, 516, 281, 483, 544, 5022, 4963, 365, 341, 293, 5276, 466, 51838], "temperature": + 0.0, "avg_logprob": -0.1966567426114469, "compression_ratio": 1.8357771260997067, + "no_speech_prob": 0.025944175198674202}, {"id": 982, "seek": 328992, "start": 3289.92, + "end": 3294.44, "text": " vector search is just get as many people doing it and + get as many people trying their own routes,", "tokens": [50364, 8062, 3164, 307, + 445, 483, 382, 867, 561, 884, 309, 293, 483, 382, 867, 561, 1382, 641, 1065, 18242, + 11, 50590], "temperature": 0.0, "avg_logprob": -0.14154698407208477, "compression_ratio": + 1.914590747330961, "no_speech_prob": 0.0010415184078738093}, {"id": 983, "seek": + 328992, "start": 3294.44, "end": 3298.2400000000002, "text": " get as many people + building up their own systems and 
just kind of get it out there.", "tokens": [50590, + 483, 382, 867, 561, 2390, 493, 641, 1065, 3652, 293, 445, 733, 295, 483, 309, 484, + 456, 13, 50780], "temperature": 0.0, "avg_logprob": -0.14154698407208477, "compression_ratio": + 1.914590747330961, "no_speech_prob": 0.0010415184078738093}, {"id": 984, "seek": + 328992, "start": 3298.2400000000002, "end": 3300.2000000000003, "text": " That''s + like what we want.", "tokens": [50780, 663, 311, 411, 437, 321, 528, 13, 50878], + "temperature": 0.0, "avg_logprob": -0.14154698407208477, "compression_ratio": 1.914590747330961, + "no_speech_prob": 0.0010415184078738093}, {"id": 985, "seek": 328992, "start": 3300.2000000000003, + "end": 3302.32, "text": " And I think like that''s the biggest way you can do it.", + "tokens": [50878, 400, 286, 519, 411, 300, 311, 264, 3880, 636, 291, 393, 360, 309, + 13, 50984], "temperature": 0.0, "avg_logprob": -0.14154698407208477, "compression_ratio": + 1.914590747330961, "no_speech_prob": 0.0010415184078738093}, {"id": 986, "seek": + 328992, "start": 3302.32, "end": 3307.16, "text": " If you open source, everyone + can see what''s going on, learn from it and just go ahead.", "tokens": [50984, 759, + 291, 1269, 4009, 11, 1518, 393, 536, 437, 311, 516, 322, 11, 1466, 490, 309, 293, + 445, 352, 2286, 13, 51226], "temperature": 0.0, "avg_logprob": -0.14154698407208477, + "compression_ratio": 1.914590747330961, "no_speech_prob": 0.0010415184078738093}, + {"id": 987, "seek": 328992, "start": 3307.16, "end": 3313.2400000000002, "text": + " And then also with open source, you kind of get feedback from everyone from all + different areas,", "tokens": [51226, 400, 550, 611, 365, 1269, 4009, 11, 291, 733, + 295, 483, 5824, 490, 1518, 490, 439, 819, 3179, 11, 51530], "temperature": 0.0, + "avg_logprob": -0.14154698407208477, "compression_ratio": 1.914590747330961, "no_speech_prob": + 0.0010415184078738093}, {"id": 988, "seek": 328992, "start": 3313.2400000000002, + "end": 3317.44, "text": " 
from all different like, you can be a student working + on a project who has some great idea.", "tokens": [51530, 490, 439, 819, 411, 11, + 291, 393, 312, 257, 3107, 1364, 322, 257, 1716, 567, 575, 512, 869, 1558, 13, 51740], + "temperature": 0.0, "avg_logprob": -0.14154698407208477, "compression_ratio": 1.914590747330961, + "no_speech_prob": 0.0010415184078738093}, {"id": 989, "seek": 331744, "start": 3317.44, + "end": 3318.92, "text": " Like he''s not some company.", "tokens": [50364, 1743, + 415, 311, 406, 512, 2237, 13, 50438], "temperature": 0.0, "avg_logprob": -0.17102552051386558, + "compression_ratio": 1.734982332155477, "no_speech_prob": 0.07708778977394104}, + {"id": 990, "seek": 331744, "start": 3318.92, "end": 3323.28, "text": " So if you''re + like sometimes close source, if it''s not bringing money in or something like that,", + "tokens": [50438, 407, 498, 291, 434, 411, 2171, 1998, 4009, 11, 498, 309, 311, + 406, 5062, 1460, 294, 420, 746, 411, 300, 11, 50656], "temperature": 0.0, "avg_logprob": + -0.17102552051386558, "compression_ratio": 1.734982332155477, "no_speech_prob": + 0.07708778977394104}, {"id": 991, "seek": 331744, "start": 3323.28, "end": 3327.08, + "text": " no one really really listened to that small student and his idea.", "tokens": + [50656, 572, 472, 534, 534, 13207, 281, 300, 1359, 3107, 293, 702, 1558, 13, 50846], + "temperature": 0.0, "avg_logprob": -0.17102552051386558, "compression_ratio": 1.734982332155477, + "no_speech_prob": 0.07708778977394104}, {"id": 992, "seek": 331744, "start": 3327.08, + "end": 3329.68, "text": " So where he might not be able to use it.", "tokens": [50846, + 407, 689, 415, 1062, 406, 312, 1075, 281, 764, 309, 13, 50976], "temperature": 0.0, + "avg_logprob": -0.17102552051386558, "compression_ratio": 1.734982332155477, "no_speech_prob": + 0.07708778977394104}, {"id": 993, "seek": 331744, "start": 3329.68, "end": 3336.0, + "text": " So it''s just about getting more perspectives on it and getting more 
input + and kind of making it accessible to everyone", "tokens": [50976, 407, 309, 311, + 445, 466, 1242, 544, 16766, 322, 309, 293, 1242, 544, 4846, 293, 733, 295, 1455, + 309, 9515, 281, 1518, 51292], "temperature": 0.0, "avg_logprob": -0.17102552051386558, + "compression_ratio": 1.734982332155477, "no_speech_prob": 0.07708778977394104}, + {"id": 994, "seek": 331744, "start": 3336.0, "end": 3338.8, "text": " and sparking + that competition innovation.", "tokens": [51292, 293, 9908, 278, 300, 6211, 8504, + 13, 51432], "temperature": 0.0, "avg_logprob": -0.17102552051386558, "compression_ratio": + 1.734982332155477, "no_speech_prob": 0.07708778977394104}, {"id": 995, "seek": 331744, + "start": 3338.8, "end": 3341.2000000000003, "text": " Yeah, actually you brought + a very interesting topic.", "tokens": [51432, 865, 11, 767, 291, 3038, 257, 588, + 1880, 4829, 13, 51552], "temperature": 0.0, "avg_logprob": -0.17102552051386558, + "compression_ratio": 1.734982332155477, "no_speech_prob": 0.07708778977394104}, + {"id": 996, "seek": 331744, "start": 3341.2000000000003, "end": 3343.88, "text": + " I didn''t think about it that way to be honest.", "tokens": [51552, 286, 994, + 380, 519, 466, 309, 300, 636, 281, 312, 3245, 13, 51686], "temperature": 0.0, "avg_logprob": + -0.17102552051386558, "compression_ratio": 1.734982332155477, "no_speech_prob": + 0.07708778977394104}, {"id": 997, "seek": 334388, "start": 3343.88, "end": 3349.96, + "text": " And now that you said it, it''s very logical that it may as well be a + competition between some users", "tokens": [50364, 400, 586, 300, 291, 848, 309, + 11, 309, 311, 588, 14978, 300, 309, 815, 382, 731, 312, 257, 6211, 1296, 512, 5022, + 50668], "temperature": 0.0, "avg_logprob": -0.2048908265169002, "compression_ratio": + 1.7553191489361701, "no_speech_prob": 0.011953696608543396}, {"id": 998, "seek": + 334388, "start": 3349.96, "end": 3355.1600000000003, "text": " because they are + using the same tech and they have different 
use cases or maybe the same use case,", + "tokens": [50668, 570, 436, 366, 1228, 264, 912, 7553, 293, 436, 362, 819, 764, + 3331, 420, 1310, 264, 912, 764, 1389, 11, 50928], "temperature": 0.0, "avg_logprob": + -0.2048908265169002, "compression_ratio": 1.7553191489361701, "no_speech_prob": + 0.011953696608543396}, {"id": 999, "seek": 334388, "start": 3355.1600000000003, + "end": 3363.48, "text": " but they are competing like to get that last sort of percent + of precision out or whatever you''re doing.", "tokens": [50928, 457, 436, 366, 15439, + 411, 281, 483, 300, 1036, 1333, 295, 3043, 295, 18356, 484, 420, 2035, 291, 434, + 884, 13, 51344], "temperature": 0.0, "avg_logprob": -0.2048908265169002, "compression_ratio": + 1.7553191489361701, "no_speech_prob": 0.011953696608543396}, {"id": 1000, "seek": + 334388, "start": 3363.48, "end": 3366.6800000000003, "text": " But also at the same + time, you know, like when you look at open source projects,", "tokens": [51344, + 583, 611, 412, 264, 912, 565, 11, 291, 458, 11, 411, 562, 291, 574, 412, 1269, 4009, + 4455, 11, 51504], "temperature": 0.0, "avg_logprob": -0.2048908265169002, "compression_ratio": + 1.7553191489361701, "no_speech_prob": 0.011953696608543396}, {"id": 1001, "seek": + 334388, "start": 3366.6800000000003, "end": 3371.08, "text": " like I don''t know + Apache Software Foundation for that sake, you know,", "tokens": [51504, 411, 286, + 500, 380, 458, 46597, 27428, 10335, 337, 300, 9717, 11, 291, 458, 11, 51724], "temperature": + 0.0, "avg_logprob": -0.2048908265169002, "compression_ratio": 1.7553191489361701, + "no_speech_prob": 0.011953696608543396}, {"id": 1002, "seek": 334388, "start": 3371.08, + "end": 3373.28, "text": " when you go there and you ask a question,", "tokens": + [51724, 562, 291, 352, 456, 293, 291, 1029, 257, 1168, 11, 51834], "temperature": + 0.0, "avg_logprob": -0.2048908265169002, "compression_ratio": 1.7553191489361701, + "no_speech_prob": 0.011953696608543396}, {"id": 1003, "seek": 
337328, "start": 3373.28, + "end": 3376.7200000000003, "text": " you, first of all, you don''t have to say the + company that you work for, right?", "tokens": [50364, 291, 11, 700, 295, 439, 11, + 291, 500, 380, 362, 281, 584, 264, 2237, 300, 291, 589, 337, 11, 558, 30, 50536], + "temperature": 0.0, "avg_logprob": -0.1900373026102531, "compression_ratio": 1.7165991902834008, + "no_speech_prob": 0.004992845933884382}, {"id": 1004, "seek": 337328, "start": 3376.7200000000003, + "end": 3379.44, "text": " Or maybe you are that student that you said.", "tokens": + [50536, 1610, 1310, 291, 366, 300, 3107, 300, 291, 848, 13, 50672], "temperature": + 0.0, "avg_logprob": -0.1900373026102531, "compression_ratio": 1.7165991902834008, + "no_speech_prob": 0.004992845933884382}, {"id": 1005, "seek": 337328, "start": 3379.44, + "end": 3383.1200000000003, "text": " And, and you know, you just focus on the matter, + right?", "tokens": [50672, 400, 11, 293, 291, 458, 11, 291, 445, 1879, 322, 264, + 1871, 11, 558, 30, 50856], "temperature": 0.0, "avg_logprob": -0.1900373026102531, + "compression_ratio": 1.7165991902834008, "no_speech_prob": 0.004992845933884382}, + {"id": 1006, "seek": 337328, "start": 3383.1200000000003, "end": 3386.1600000000003, + "text": " You focus on what is it you''re asking about?", "tokens": [50856, 509, + 1879, 322, 437, 307, 309, 291, 434, 3365, 466, 30, 51008], "temperature": 0.0, "avg_logprob": + -0.1900373026102531, "compression_ratio": 1.7165991902834008, "no_speech_prob": + 0.004992845933884382}, {"id": 1007, "seek": 337328, "start": 3386.1600000000003, + "end": 3393.44, "text": " And then if somebody is so curious, even if they''re competing + over the same thing,", "tokens": [51008, 400, 550, 498, 2618, 307, 370, 6369, 11, + 754, 498, 436, 434, 15439, 670, 264, 912, 551, 11, 51372], "temperature": 0.0, "avg_logprob": + -0.1900373026102531, "compression_ratio": 1.7165991902834008, "no_speech_prob": + 0.004992845933884382}, {"id": 1008, "seek": 
337328, "start": 3393.44, "end": 3396.7200000000003, + "text": " they might kind of casually share something, right?", "tokens": [51372, + 436, 1062, 733, 295, 34872, 2073, 746, 11, 558, 30, 51536], "temperature": 0.0, + "avg_logprob": -0.1900373026102531, "compression_ratio": 1.7165991902834008, "no_speech_prob": + 0.004992845933884382}, {"id": 1009, "seek": 337328, "start": 3396.7200000000003, + "end": 3400.96, "text": " I mean, that''s what I''ve seen in the in the mailing + lists a lot.", "tokens": [51536, 286, 914, 11, 300, 311, 437, 286, 600, 1612, 294, + 264, 294, 264, 41612, 14511, 257, 688, 13, 51748], "temperature": 0.0, "avg_logprob": + -0.1900373026102531, "compression_ratio": 1.7165991902834008, "no_speech_prob": + 0.004992845933884382}, {"id": 1010, "seek": 340096, "start": 3400.96, "end": 3404.8, + "text": " Like users just some of the other users, they just come in and say,", + "tokens": [50364, 1743, 5022, 445, 512, 295, 264, 661, 5022, 11, 436, 445, 808, + 294, 293, 584, 11, 50556], "temperature": 0.0, "avg_logprob": -0.16179648312655362, + "compression_ratio": 1.809090909090909, "no_speech_prob": 0.004277524072676897}, + {"id": 1011, "seek": 340096, "start": 3404.8, "end": 3405.92, "text": " Hey, why + are you doing this?", "tokens": [50556, 1911, 11, 983, 366, 291, 884, 341, 30, 50612], + "temperature": 0.0, "avg_logprob": -0.16179648312655362, "compression_ratio": 1.809090909090909, + "no_speech_prob": 0.004277524072676897}, {"id": 1012, "seek": 340096, "start": 3405.92, + "end": 3407.84, "text": " You know, did you consider something else?", "tokens": + [50612, 509, 458, 11, 630, 291, 1949, 746, 1646, 30, 50708], "temperature": 0.0, + "avg_logprob": -0.16179648312655362, "compression_ratio": 1.809090909090909, "no_speech_prob": + 0.004277524072676897}, {"id": 1013, "seek": 340096, "start": 3407.84, "end": 3410.96, + "text": " And you''re focusing so much on solving a specific problem?", "tokens": + [50708, 400, 291, 434, 8416, 370, 709, 322, 
12606, 257, 2685, 1154, 30, 50864], + "temperature": 0.0, "avg_logprob": -0.16179648312655362, "compression_ratio": 1.809090909090909, + "no_speech_prob": 0.004277524072676897}, {"id": 1014, "seek": 340096, "start": 3410.96, + "end": 3415.36, "text": " Yeah, I think it''s just, yeah, it''s kind of in like + with competition, there''s innovation.", "tokens": [50864, 865, 11, 286, 519, 309, + 311, 445, 11, 1338, 11, 309, 311, 733, 295, 294, 411, 365, 6211, 11, 456, 311, 8504, + 13, 51084], "temperature": 0.0, "avg_logprob": -0.16179648312655362, "compression_ratio": + 1.809090909090909, "no_speech_prob": 0.004277524072676897}, {"id": 1015, "seek": + 340096, "start": 3415.36, "end": 3417.36, "text": " And then with innovation, you + get more people interested.", "tokens": [51084, 400, 550, 365, 8504, 11, 291, 483, + 544, 561, 3102, 13, 51184], "temperature": 0.0, "avg_logprob": -0.16179648312655362, + "compression_ratio": 1.809090909090909, "no_speech_prob": 0.004277524072676897}, + {"id": 1016, "seek": 340096, "start": 3417.36, "end": 3421.28, "text": " And I think + that''s kind of like what neural nets did is started out.", "tokens": [51184, 400, + 286, 519, 300, 311, 733, 295, 411, 437, 18161, 36170, 630, 307, 1409, 484, 13, 51380], + "temperature": 0.0, "avg_logprob": -0.16179648312655362, "compression_ratio": 1.809090909090909, + "no_speech_prob": 0.004277524072676897}, {"id": 1017, "seek": 340096, "start": 3421.28, + "end": 3422.4, "text": " I don''t think everyone was using it.", "tokens": [51380, + 286, 500, 380, 519, 1518, 390, 1228, 309, 13, 51436], "temperature": 0.0, "avg_logprob": + -0.16179648312655362, "compression_ratio": 1.809090909090909, "no_speech_prob": + 0.004277524072676897}, {"id": 1018, "seek": 340096, "start": 3422.4, "end": 3426.0, + "text": " Everyone was just using some brute force tech search of keyword matching.", + "tokens": [51436, 5198, 390, 445, 1228, 512, 47909, 3464, 7553, 3164, 295, 20428, + 14324, 13, 51616], "temperature": 0.0, 
"avg_logprob": -0.16179648312655362, "compression_ratio": + 1.809090909090909, "no_speech_prob": 0.004277524072676897}, {"id": 1019, "seek": + 340096, "start": 3426.7200000000003, "end": 3430.08, "text": " And then as people + learned about it more, there''s open source systems.", "tokens": [51652, 400, 550, + 382, 561, 3264, 466, 309, 544, 11, 456, 311, 1269, 4009, 3652, 13, 51820], "temperature": + 0.0, "avg_logprob": -0.16179648312655362, "compression_ratio": 1.809090909090909, + "no_speech_prob": 0.004277524072676897}, {"id": 1020, "seek": 343008, "start": 3430.08, + "end": 3432.96, "text": " And I think a lot of these neural nets, if you''re going + to be making neural nets,", "tokens": [50364, 400, 286, 519, 257, 688, 295, 613, + 18161, 36170, 11, 498, 291, 434, 516, 281, 312, 1455, 18161, 36170, 11, 50508], + "temperature": 0.0, "avg_logprob": -0.16851291408786526, "compression_ratio": 1.832214765100671, + "no_speech_prob": 0.007393502164632082}, {"id": 1021, "seek": 343008, "start": 3432.96, + "end": 3434.64, "text": " you''re going to be doing research on them.", "tokens": + [50508, 291, 434, 516, 281, 312, 884, 2132, 322, 552, 13, 50592], "temperature": + 0.0, "avg_logprob": -0.16851291408786526, "compression_ratio": 1.832214765100671, + "no_speech_prob": 0.007393502164632082}, {"id": 1022, "seek": 343008, "start": 3434.64, + "end": 3436.0, "text": " You''re going to be posting those papers.", "tokens": [50592, + 509, 434, 516, 281, 312, 15978, 729, 10577, 13, 50660], "temperature": 0.0, "avg_logprob": + -0.16851291408786526, "compression_ratio": 1.832214765100671, "no_speech_prob": + 0.007393502164632082}, {"id": 1023, "seek": 343008, "start": 3436.0, "end": 3438.7999999999997, + "text": " Everyone''s going to see it filled on top of it.", "tokens": [50660, 5198, + 311, 516, 281, 536, 309, 6412, 322, 1192, 295, 309, 13, 50800], "temperature": 0.0, + "avg_logprob": -0.16851291408786526, "compression_ratio": 1.832214765100671, "no_speech_prob": + 
0.007393502164632082}, {"id": 1024, "seek": 343008, "start": 3438.7999999999997, + "end": 3439.68, "text": " And we''ll just explode.", "tokens": [50800, 400, 321, + 603, 445, 21411, 13, 50844], "temperature": 0.0, "avg_logprob": -0.16851291408786526, + "compression_ratio": 1.832214765100671, "no_speech_prob": 0.007393502164632082}, + {"id": 1025, "seek": 343008, "start": 3439.68, "end": 3444.24, "text": " I think + now that''s happening with these Bert models with Huggins face all of this.", "tokens": + [50844, 286, 519, 586, 300, 311, 2737, 365, 613, 29594, 5245, 365, 389, 3562, 1292, + 1851, 439, 295, 341, 13, 51072], "temperature": 0.0, "avg_logprob": -0.16851291408786526, + "compression_ratio": 1.832214765100671, "no_speech_prob": 0.007393502164632082}, + {"id": 1026, "seek": 343008, "start": 3444.24, "end": 3446.48, "text": " It''s just + exploding and more people are looking into it.", "tokens": [51072, 467, 311, 445, + 35175, 293, 544, 561, 366, 1237, 666, 309, 13, 51184], "temperature": 0.0, "avg_logprob": + -0.16851291408786526, "compression_ratio": 1.832214765100671, "no_speech_prob": + 0.007393502164632082}, {"id": 1027, "seek": 343008, "start": 3446.48, "end": 3449.7599999999998, + "text": " And it''s just it''s better for everything at this point.", "tokens": + [51184, 400, 309, 311, 445, 309, 311, 1101, 337, 1203, 412, 341, 935, 13, 51348], + "temperature": 0.0, "avg_logprob": -0.16851291408786526, "compression_ratio": 1.832214765100671, + "no_speech_prob": 0.007393502164632082}, {"id": 1028, "seek": 343008, "start": 3450.48, + "end": 3452.24, "text": " So that''s kind of why we do it.", "tokens": [51384, 407, + 300, 311, 733, 295, 983, 321, 360, 309, 13, 51472], "temperature": 0.0, "avg_logprob": + -0.16851291408786526, "compression_ratio": 1.832214765100671, "no_speech_prob": + 0.007393502164632082}, {"id": 1029, "seek": 343008, "start": 3452.88, "end": 3457.68, + "text": " There''s a bit of my opinion company motto, but that''s kind of, yeah, + 
the reason.", "tokens": [51504, 821, 311, 257, 857, 295, 452, 4800, 2237, 32680, + 11, 457, 300, 311, 733, 295, 11, 1338, 11, 264, 1778, 13, 51744], "temperature": + 0.0, "avg_logprob": -0.16851291408786526, "compression_ratio": 1.832214765100671, + "no_speech_prob": 0.007393502164632082}, {"id": 1030, "seek": 345768, "start": 3458.48, + "end": 3462.56, "text": " Yeah, it''s kind of like a compound effect of multiple + inputs.", "tokens": [50404, 865, 11, 309, 311, 733, 295, 411, 257, 14154, 1802, + 295, 3866, 15743, 13, 50608], "temperature": 0.0, "avg_logprob": -0.2846320759166371, + "compression_ratio": 1.5265486725663717, "no_speech_prob": 0.008988680317997932}, + {"id": 1031, "seek": 345768, "start": 3462.56, "end": 3470.16, "text": " And then + everyone essentially has the same goal is to serve the users the best.", "tokens": + [50608, 400, 550, 1518, 4476, 575, 264, 912, 3387, 307, 281, 4596, 264, 5022, 264, + 1151, 13, 50988], "temperature": 0.0, "avg_logprob": -0.2846320759166371, "compression_ratio": + 1.5265486725663717, "no_speech_prob": 0.008988680317997932}, {"id": 1032, "seek": + 345768, "start": 3471.2, "end": 3473.52, "text": " Or maybe solve that specific + problem they''re solving.", "tokens": [51040, 1610, 1310, 5039, 300, 2685, 1154, + 436, 434, 12606, 13, 51156], "temperature": 0.0, "avg_logprob": -0.2846320759166371, + "compression_ratio": 1.5265486725663717, "no_speech_prob": 0.008988680317997932}, + {"id": 1033, "seek": 345768, "start": 3474.08, "end": 3475.2799999999997, "text": + " Maybe even for themselves.", "tokens": [51184, 2704, 754, 337, 2969, 13, 51244], + "temperature": 0.0, "avg_logprob": -0.2846320759166371, "compression_ratio": 1.5265486725663717, + "no_speech_prob": 0.008988680317997932}, {"id": 1034, "seek": 345768, "start": 3476.08, + "end": 3478.7999999999997, "text": " But yeah, I mean, that''s very interesting.", + "tokens": [51284, 583, 1338, 11, 286, 914, 11, 300, 311, 588, 1880, 13, 51420], + "temperature": 0.0, 
"avg_logprob": -0.2846320759166371, "compression_ratio": 1.5265486725663717, + "no_speech_prob": 0.008988680317997932}, {"id": 1035, "seek": 345768, "start": 3478.7999999999997, + "end": 3487.44, "text": " And how do you, so basically you have Slack where I can + go and ask my question.", "tokens": [51420, 400, 577, 360, 291, 11, 370, 1936, 291, + 362, 37211, 689, 286, 393, 352, 293, 1029, 452, 1168, 13, 51852], "temperature": + 0.0, "avg_logprob": -0.2846320759166371, "compression_ratio": 1.5265486725663717, + "no_speech_prob": 0.008988680317997932}, {"id": 1036, "seek": 348768, "start": 3488.0, + "end": 3494.3999999999996, "text": " Like how do you kind of balance your time between + kind of doing the actual work and helping the community?", "tokens": [50380, 1743, + 577, 360, 291, 733, 295, 4772, 428, 565, 1296, 733, 295, 884, 264, 3539, 589, 293, + 4315, 264, 1768, 30, 50700], "temperature": 0.0, "avg_logprob": -0.16872796704692225, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.0011028097942471504}, + {"id": 1037, "seek": 348768, "start": 3495.7599999999998, "end": 3496.96, "text": + " So that''s a hard one.", "tokens": [50768, 407, 300, 311, 257, 1152, 472, 13, + 50828], "temperature": 0.0, "avg_logprob": -0.16872796704692225, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.0011028097942471504}, {"id": 1038, "seek": + 348768, "start": 3498.0, "end": 3502.96, "text": " Right now with Slack is it''s + people that come to us.", "tokens": [50880, 1779, 586, 365, 37211, 307, 309, 311, + 561, 300, 808, 281, 505, 13, 51128], "temperature": 0.0, "avg_logprob": -0.16872796704692225, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.0011028097942471504}, + {"id": 1039, "seek": 348768, "start": 3502.96, "end": 3506.64, "text": " So because + this area hasn''t blown up so much, it''s still manageable.", "tokens": [51128, + 407, 570, 341, 1859, 6132, 380, 16479, 493, 370, 709, 11, 309, 311, 920, 38798, + 13, 51312], "temperature": 
0.0, "avg_logprob": -0.16872796704692225, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.0011028097942471504}, {"id": 1040, "seek": + 348768, "start": 3506.64, "end": 3511.68, "text": " I''m thinking the future when + you get to like levels of these other open source projects where they have like + 20,000 people", "tokens": [51312, 286, 478, 1953, 264, 2027, 562, 291, 483, 281, + 411, 4358, 295, 613, 661, 1269, 4009, 4455, 689, 436, 362, 411, 945, 11, 1360, 561, + 51564], "temperature": 0.0, "avg_logprob": -0.16872796704692225, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.0011028097942471504}, {"id": 1041, "seek": + 348768, "start": 3511.68, "end": 3514.08, "text": " and they''re slack all like + posting questions.", "tokens": [51564, 293, 436, 434, 29767, 439, 411, 15978, 1651, + 13, 51684], "temperature": 0.0, "avg_logprob": -0.16872796704692225, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.0011028097942471504}, {"id": 1042, "seek": + 348768, "start": 3514.08, "end": 3516.72, "text": " Right now it''s pretty manageable + and you can kind of keep on top of it.", "tokens": [51684, 1779, 586, 309, 311, + 1238, 38798, 293, 291, 393, 733, 295, 1066, 322, 1192, 295, 309, 13, 51816], "temperature": + 0.0, "avg_logprob": -0.16872796704692225, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.0011028097942471504}, {"id": 1043, "seek": 351672, "start": + 3517.4399999999996, "end": 3520.16, "text": " And yeah, Slack, we made like a discourse.", + "tokens": [50400, 400, 1338, 11, 37211, 11, 321, 1027, 411, 257, 23938, 13, 50536], + "temperature": 0.0, "avg_logprob": -0.1602008381827933, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.002312406897544861}, {"id": 1044, "seek": 351672, "start": 3520.16, + "end": 3524.3999999999996, "text": " We kind of made a lot of like preliminary like + areas that you could talk to us.", "tokens": [50536, 492, 733, 295, 1027, 257, 688, + 295, 411, 28817, 411, 3179, 
300, 291, 727, 751, 281, 505, 13, 50748], "temperature": + 0.0, "avg_logprob": -0.1602008381827933, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.002312406897544861}, {"id": 1045, "seek": 351672, "start": 3524.9599999999996, + "end": 3528.08, "text": " And then yeah, there''s GitHub issues all that.", "tokens": + [50776, 400, 550, 1338, 11, 456, 311, 23331, 2663, 439, 300, 13, 50932], "temperature": + 0.0, "avg_logprob": -0.1602008381827933, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.002312406897544861}, {"id": 1046, "seek": 351672, "start": 3528.08, + "end": 3531.7599999999998, "text": " But it''s also there''s another aspect of like + kind of splitting up the problems.", "tokens": [50932, 583, 309, 311, 611, 456, + 311, 1071, 4171, 295, 411, 733, 295, 30348, 493, 264, 2740, 13, 51116], "temperature": + 0.0, "avg_logprob": -0.1602008381827933, "compression_ratio": 1.7615658362989324, + "no_speech_prob": 0.002312406897544861}, {"id": 1047, "seek": 351672, "start": 3531.7599999999998, + "end": 3535.68, "text": " Because if you open up a Slack, people might post their + technical problems there.", "tokens": [51116, 1436, 498, 291, 1269, 493, 257, 37211, + 11, 561, 1062, 2183, 641, 6191, 2740, 456, 13, 51312], "temperature": 0.0, "avg_logprob": + -0.1602008381827933, "compression_ratio": 1.7615658362989324, "no_speech_prob": + 0.002312406897544861}, {"id": 1048, "seek": 351672, "start": 3535.68, "end": 3538.72, + "text": " Or like there''s something that might be worth being a GitHub issue.", + "tokens": [51312, 1610, 411, 456, 311, 746, 300, 1062, 312, 3163, 885, 257, 23331, + 2734, 13, 51464], "temperature": 0.0, "avg_logprob": -0.1602008381827933, "compression_ratio": + 1.7615658362989324, "no_speech_prob": 0.002312406897544861}, {"id": 1049, "seek": + 351672, "start": 3539.52, "end": 3543.04, "text": " The people that are looking + at the Slack majority of the time are more like user success style,", "tokens": + [51504, 440, 
561, 300, 366, 1237, 412, 264, 37211, 6286, 295, 264, 565, 366, 544, + 411, 4195, 2245, 3758, 11, 51680], "temperature": 0.0, "avg_logprob": -0.1602008381827933, + "compression_ratio": 1.7615658362989324, "no_speech_prob": 0.002312406897544861}, + {"id": 1050, "seek": 354304, "start": 3543.04, "end": 3546.48, "text": " not like + full-blown R&D deep engineers.", "tokens": [50364, 406, 411, 1577, 12, 5199, 648, + 497, 5, 35, 2452, 11955, 13, 50536], "temperature": 0.0, "avg_logprob": -0.2504910178806471, + "compression_ratio": 1.6588628762541806, "no_speech_prob": 0.0015015702228993177}, + {"id": 1051, "seek": 354304, "start": 3547.04, "end": 3550.4, "text": " So that''s + where the, that''s where I think the balance comes out where the problem is of like", + "tokens": [50564, 407, 300, 311, 689, 264, 11, 300, 311, 689, 286, 519, 264, 4772, + 1487, 484, 689, 264, 1154, 307, 295, 411, 50732], "temperature": 0.0, "avg_logprob": + -0.2504910178806471, "compression_ratio": 1.6588628762541806, "no_speech_prob": + 0.0015015702228993177}, {"id": 1052, "seek": 354304, "start": 3550.96, "end": 3553.68, + "text": " what belongs on Slack, what belongs on GitHub as an issue.", "tokens": + [50760, 437, 12953, 322, 37211, 11, 437, 12953, 322, 23331, 382, 364, 2734, 13, + 50896], "temperature": 0.0, "avg_logprob": -0.2504910178806471, "compression_ratio": + 1.6588628762541806, "no_speech_prob": 0.0015015702228993177}, {"id": 1053, "seek": + 354304, "start": 3554.24, "end": 3559.2799999999997, "text": " But for now, all + of it''s easily solvable because it''s a steady inflow that we can manage.", "tokens": + [50924, 583, 337, 586, 11, 439, 295, 309, 311, 3612, 1404, 17915, 570, 309, 311, + 257, 13211, 9922, 305, 300, 321, 393, 3067, 13, 51176], "temperature": 0.0, "avg_logprob": + -0.2504910178806471, "compression_ratio": 1.6588628762541806, "no_speech_prob": + 0.0015015702228993177}, {"id": 1054, "seek": 354304, "start": 3559.2799999999997, + "end": 3560.96, "text": " And we have 
enough people looking at it.", "tokens": [51176, + 400, 321, 362, 1547, 561, 1237, 412, 309, 13, 51260], "temperature": 0.0, "avg_logprob": + -0.2504910178806471, "compression_ratio": 1.6588628762541806, "no_speech_prob": + 0.0015015702228993177}, {"id": 1055, "seek": 354304, "start": 3560.96, "end": 3564.16, + "text": " But we''ll see in the future that''s going to be another problem to deal + with in the", "tokens": [51260, 583, 321, 603, 536, 294, 264, 2027, 300, 311, 516, + 281, 312, 1071, 1154, 281, 2028, 365, 294, 264, 51420], "temperature": 0.0, "avg_logprob": + -0.2504910178806471, "compression_ratio": 1.6588628762541806, "no_speech_prob": + 0.0015015702228993177}, {"id": 1056, "seek": 354304, "start": 3564.88, "end": 3566.16, + "text": " interested to see how we do it.", "tokens": [51456, 3102, 281, 536, 577, + 321, 360, 309, 13, 51520], "temperature": 0.0, "avg_logprob": -0.2504910178806471, + "compression_ratio": 1.6588628762541806, "no_speech_prob": 0.0015015702228993177}, + {"id": 1057, "seek": 354304, "start": 3566.16, "end": 3569.36, "text": " Yeah, it''s + like both catastrophic success, right?", "tokens": [51520, 865, 11, 309, 311, 411, + 1293, 34915, 2245, 11, 558, 30, 51680], "temperature": 0.0, "avg_logprob": -0.2504910178806471, + "compression_ratio": 1.6588628762541806, "no_speech_prob": 0.0015015702228993177}, + {"id": 1058, "seek": 354304, "start": 3569.36, "end": 3570.56, "text": " Exactly.", + "tokens": [51680, 7587, 13, 51740], "temperature": 0.0, "avg_logprob": -0.2504910178806471, + "compression_ratio": 1.6588628762541806, "no_speech_prob": 0.0015015702228993177}, + {"id": 1059, "seek": 357056, "start": 3570.56, "end": 3571.7599999999998, "text": + " It may happen.", "tokens": [50364, 467, 815, 1051, 13, 50424], "temperature": + 0.0, "avg_logprob": -0.2756032373151209, "compression_ratio": 1.64453125, "no_speech_prob": + 0.0035639656707644463}, {"id": 1060, "seek": 357056, "start": 3571.7599999999998, + "end": 3573.92, "text": " But 
hopefully it will be manageable in your case.", "tokens": + [50424, 583, 4696, 309, 486, 312, 38798, 294, 428, 1389, 13, 50532], "temperature": + 0.0, "avg_logprob": -0.2756032373151209, "compression_ratio": 1.64453125, "no_speech_prob": + 0.0035639656707644463}, {"id": 1061, "seek": 357056, "start": 3573.92, "end": 3579.84, + "text": " And so you can, you can as said kind of cater to that community as well + as actually keep solving the", "tokens": [50532, 400, 370, 291, 393, 11, 291, 393, + 382, 848, 733, 295, 21557, 281, 300, 1768, 382, 731, 382, 767, 1066, 12606, 264, + 50828], "temperature": 0.0, "avg_logprob": -0.2756032373151209, "compression_ratio": + 1.64453125, "no_speech_prob": 0.0035639656707644463}, {"id": 1062, "seek": 357056, + "start": 3580.48, "end": 3584.32, "text": " and keeping your roadmap under control + because you also need to keep, you know,", "tokens": [50860, 293, 5145, 428, 35738, + 833, 1969, 570, 291, 611, 643, 281, 1066, 11, 291, 458, 11, 51052], "temperature": + 0.0, "avg_logprob": -0.2756032373151209, "compression_ratio": 1.64453125, "no_speech_prob": + 0.0035639656707644463}, {"id": 1063, "seek": 357056, "start": 3584.32, "end": 3585.68, + "text": " innovating in this space, right?", "tokens": [51052, 5083, 990, 294, 341, + 1901, 11, 558, 30, 51120], "temperature": 0.0, "avg_logprob": -0.2756032373151209, + "compression_ratio": 1.64453125, "no_speech_prob": 0.0035639656707644463}, {"id": + 1064, "seek": 357056, "start": 3587.2, "end": 3589.2, "text": " Yeah, I''m glad + it''s working for you.", "tokens": [51196, 865, 11, 286, 478, 5404, 309, 311, 1364, + 337, 291, 13, 51296], "temperature": 0.0, "avg_logprob": -0.2756032373151209, "compression_ratio": + 1.64453125, "no_speech_prob": 0.0035639656707644463}, {"id": 1065, "seek": 357056, + "start": 3589.2, "end": 3594.64, "text": " And I''ve been also slacking a bit kind + of, the slacking is not the right word,", "tokens": [51296, 400, 286, 600, 668, + 611, 1061, 14134, 257, 857, 
733, 295, 11, 264, 1061, 14134, 307, 406, 264, 558, + 1349, 11, 51568], "temperature": 0.0, "avg_logprob": -0.2756032373151209, "compression_ratio": + 1.64453125, "no_speech_prob": 0.0035639656707644463}, {"id": 1066, "seek": 357056, + "start": 3594.64, "end": 3596.4, "text": " but slacking with big ads.", "tokens": + [51568, 457, 1061, 14134, 365, 955, 10342, 13, 51656], "temperature": 0.0, "avg_logprob": + -0.2756032373151209, "compression_ratio": 1.64453125, "no_speech_prob": 0.0035639656707644463}, + {"id": 1067, "seek": 359640, "start": 3597.04, "end": 3604.56, "text": " So just + kind of, I so immediately answers to my questions and they''ve been like a long + thread.", "tokens": [50396, 407, 445, 733, 295, 11, 286, 370, 4258, 6338, 281, 452, + 1651, 293, 436, 600, 668, 411, 257, 938, 7207, 13, 50772], "temperature": 0.0, "avg_logprob": + -0.33722089516996134, "compression_ratio": 1.5638766519823788, "no_speech_prob": + 0.01028203871101141}, {"id": 1068, "seek": 359640, "start": 3605.2000000000003, + "end": 3607.52, "text": " Why Docker doesn''t work, can you try these?", "tokens": + [50804, 1545, 33772, 1177, 380, 589, 11, 393, 291, 853, 613, 30, 50920], "temperature": + 0.0, "avg_logprob": -0.33722089516996134, "compression_ratio": 1.5638766519823788, + "no_speech_prob": 0.01028203871101141}, {"id": 1069, "seek": 359640, "start": 3607.52, + "end": 3608.4, "text": " Can you try that?", "tokens": [50920, 1664, 291, 853, 300, + 30, 50964], "temperature": 0.0, "avg_logprob": -0.33722089516996134, "compression_ratio": + 1.5638766519823788, "no_speech_prob": 0.01028203871101141}, {"id": 1070, "seek": + 359640, "start": 3608.4, "end": 3616.08, "text": " And it''s also like, you know, + it''s like a first impression you get about the database", "tokens": [50964, 400, + 309, 311, 611, 411, 11, 291, 458, 11, 309, 311, 411, 257, 700, 9995, 291, 483, 466, + 264, 8149, 51348], "temperature": 0.0, "avg_logprob": -0.33722089516996134, "compression_ratio": + 
1.5638766519823788, "no_speech_prob": 0.01028203871101141}, {"id": 1071, "seek": + 359640, "start": 3616.08, "end": 3618.08, "text": " or about the Nopus source product.", + "tokens": [51348, 420, 466, 264, 426, 404, 301, 4009, 1674, 13, 51448], "temperature": + 0.0, "avg_logprob": -0.33722089516996134, "compression_ratio": 1.5638766519823788, + "no_speech_prob": 0.01028203871101141}, {"id": 1072, "seek": 359640, "start": 3618.08, + "end": 3619.6800000000003, "text": " Like how soon you get an answer?", "tokens": + [51448, 1743, 577, 2321, 291, 483, 364, 1867, 30, 51528], "temperature": 0.0, "avg_logprob": + -0.33722089516996134, "compression_ratio": 1.5638766519823788, "no_speech_prob": + 0.01028203871101141}, {"id": 1073, "seek": 359640, "start": 3620.88, "end": 3622.7200000000003, + "text": " Yeah, and you definitely try this sometimes.", "tokens": [51588, 865, + 11, 293, 291, 2138, 853, 341, 2171, 13, 51680], "temperature": 0.0, "avg_logprob": + -0.33722089516996134, "compression_ratio": 1.5638766519823788, "no_speech_prob": + 0.01028203871101141}, {"id": 1074, "seek": 362272, "start": 3622.72, "end": 3626.3999999999996, + "text": " Also like you''re working on one problem and you have another problem + that''s completely separate.", "tokens": [50364, 2743, 411, 291, 434, 1364, 322, + 472, 1154, 293, 291, 362, 1071, 1154, 300, 311, 2584, 4994, 13, 50548], "temperature": + 0.0, "avg_logprob": -0.20178638006511487, "compression_ratio": 1.705521472392638, + "no_speech_prob": 0.05228903517127037}, {"id": 1075, "seek": 362272, "start": 3626.3999999999996, + "end": 3627.7599999999998, "text": " It''s like it''s a big system.", "tokens": + [50548, 467, 311, 411, 309, 311, 257, 955, 1185, 13, 50616], "temperature": 0.0, + "avg_logprob": -0.20178638006511487, "compression_ratio": 1.705521472392638, "no_speech_prob": + 0.05228903517127037}, {"id": 1076, "seek": 362272, "start": 3627.7599999999998, + "end": 3629.8399999999997, "text": " So jumping around between, but then 
it''s also + okay.", "tokens": [50616, 407, 11233, 926, 1296, 11, 457, 550, 309, 311, 611, 1392, + 13, 50720], "temperature": 0.0, "avg_logprob": -0.20178638006511487, "compression_ratio": + 1.705521472392638, "no_speech_prob": 0.05228903517127037}, {"id": 1077, "seek": + 362272, "start": 3629.8399999999997, "end": 3631.4399999999996, "text": " Let me + find someone to answer that for you.", "tokens": [50720, 961, 385, 915, 1580, 281, + 1867, 300, 337, 291, 13, 50800], "temperature": 0.0, "avg_logprob": -0.20178638006511487, + "compression_ratio": 1.705521472392638, "no_speech_prob": 0.05228903517127037}, + {"id": 1078, "seek": 362272, "start": 3631.4399999999996, "end": 3632.9599999999996, + "text": " So you go internally look for someone.", "tokens": [50800, 407, 291, 352, + 19501, 574, 337, 1580, 13, 50876], "temperature": 0.0, "avg_logprob": -0.20178638006511487, + "compression_ratio": 1.705521472392638, "no_speech_prob": 0.05228903517127037}, + {"id": 1079, "seek": 362272, "start": 3632.9599999999996, "end": 3633.8399999999997, + "text": " Hey, can you answer this?", "tokens": [50876, 1911, 11, 393, 291, 1867, + 341, 30, 50920], "temperature": 0.0, "avg_logprob": -0.20178638006511487, "compression_ratio": + 1.705521472392638, "no_speech_prob": 0.05228903517127037}, {"id": 1080, "seek": + 362272, "start": 3634.48, "end": 3635.6, "text": " But hopefully it''s working.", + "tokens": [50952, 583, 4696, 309, 311, 1364, 13, 51008], "temperature": 0.0, "avg_logprob": + -0.20178638006511487, "compression_ratio": 1.705521472392638, "no_speech_prob": + 0.05228903517127037}, {"id": 1081, "seek": 362272, "start": 3635.6, "end": 3638.08, + "text": " I think, I think we''re pretty quick on our responses.", "tokens": [51008, + 286, 519, 11, 286, 519, 321, 434, 1238, 1702, 322, 527, 13019, 13, 51132], "temperature": + 0.0, "avg_logprob": -0.20178638006511487, "compression_ratio": 1.705521472392638, + "no_speech_prob": 0.05228903517127037}, {"id": 1082, "seek": 362272, 
"start": 3638.08, + "end": 3642.56, "text": " Maybe like overnight, it''s sometimes difficult to sleeping + in everything.", "tokens": [51132, 2704, 411, 13935, 11, 309, 311, 2171, 2252, 281, + 8296, 294, 1203, 13, 51356], "temperature": 0.0, "avg_logprob": -0.20178638006511487, + "compression_ratio": 1.705521472392638, "no_speech_prob": 0.05228903517127037}, + {"id": 1083, "seek": 362272, "start": 3642.56, "end": 3646.24, "text": " But we + try to get responses whenever we can.", "tokens": [51356, 583, 321, 853, 281, 483, + 13019, 5699, 321, 393, 13, 51540], "temperature": 0.0, "avg_logprob": -0.20178638006511487, + "compression_ratio": 1.705521472392638, "no_speech_prob": 0.05228903517127037}, + {"id": 1084, "seek": 362272, "start": 3646.8799999999997, "end": 3650.24, "text": + " Yeah, some people are like in China, I guess.", "tokens": [51572, 865, 11, 512, + 561, 366, 411, 294, 3533, 11, 286, 2041, 13, 51740], "temperature": 0.0, "avg_logprob": + -0.20178638006511487, "compression_ratio": 1.705521472392638, "no_speech_prob": + 0.05228903517127037}, {"id": 1085, "seek": 362272, "start": 3650.24, "end": 3651.2799999999997, + "text": " So like, I don''t know.", "tokens": [51740, 407, 411, 11, 286, 500, 380, + 458, 13, 51792], "temperature": 0.0, "avg_logprob": -0.20178638006511487, "compression_ratio": + 1.705521472392638, "no_speech_prob": 0.05228903517127037}, {"id": 1086, "seek": + 365128, "start": 3651.44, "end": 3655.36, "text": " It was like five hours difference + with my time zone and sometimes.", "tokens": [50372, 467, 390, 411, 1732, 2496, + 2649, 365, 452, 565, 6668, 293, 2171, 13, 50568], "temperature": 0.0, "avg_logprob": + -0.20795181819370814, "compression_ratio": 1.6175548589341693, "no_speech_prob": + 0.021716637536883354}, {"id": 1087, "seek": 365128, "start": 3655.36, "end": 3656.6400000000003, + "text": " With yours, yes, probably five.", "tokens": [50568, 2022, 6342, 11, 2086, + 11, 1391, 1732, 13, 50632], "temperature": 0.0, "avg_logprob": 
-0.20795181819370814, + "compression_ratio": 1.6175548589341693, "no_speech_prob": 0.021716637536883354}, + {"id": 1088, "seek": 365128, "start": 3656.6400000000003, "end": 3657.44, "text": + " Mine is 14.", "tokens": [50632, 11620, 307, 3499, 13, 50672], "temperature": 0.0, + "avg_logprob": -0.20795181819370814, "compression_ratio": 1.6175548589341693, "no_speech_prob": + 0.021716637536883354}, {"id": 1089, "seek": 365128, "start": 3658.88, "end": 3661.6800000000003, + "text": " That''s like you ask one question for a couple of days, right?", "tokens": + [50744, 663, 311, 411, 291, 1029, 472, 1168, 337, 257, 1916, 295, 1708, 11, 558, + 30, 50884], "temperature": 0.0, "avg_logprob": -0.20795181819370814, "compression_ratio": + 1.6175548589341693, "no_speech_prob": 0.021716637536883354}, {"id": 1090, "seek": + 365128, "start": 3662.5600000000004, "end": 3666.0, "text": " Yeah, it gets interesting + with the technical like very deep questions,", "tokens": [50928, 865, 11, 309, 2170, + 1880, 365, 264, 6191, 411, 588, 2452, 1651, 11, 51100], "temperature": 0.0, "avg_logprob": + -0.20795181819370814, "compression_ratio": 1.6175548589341693, "no_speech_prob": + 0.021716637536883354}, {"id": 1091, "seek": 365128, "start": 3666.0, "end": 3668.4, + "text": " because then I have to kind of bridge the gap of time.", "tokens": [51100, + 570, 550, 286, 362, 281, 733, 295, 7283, 264, 7417, 295, 565, 13, 51220], "temperature": + 0.0, "avg_logprob": -0.20795181819370814, "compression_ratio": 1.6175548589341693, + "no_speech_prob": 0.021716637536883354}, {"id": 1092, "seek": 365128, "start": 3668.4, + "end": 3671.1200000000003, "text": " Try to find solutions on my own to why it''s + going wrong.", "tokens": [51220, 6526, 281, 915, 6547, 322, 452, 1065, 281, 983, + 309, 311, 516, 2085, 13, 51356], "temperature": 0.0, "avg_logprob": -0.20795181819370814, + "compression_ratio": 1.6175548589341693, "no_speech_prob": 0.021716637536883354}, + {"id": 1093, "seek": 365128, "start": 
3671.1200000000003, "end": 3676.8, "text": + " But then also once five o''clock hits for me, I can pull in the external knowledge + from the other team.", "tokens": [51356, 583, 550, 611, 1564, 1732, 277, 6, 9023, + 8664, 337, 385, 11, 286, 393, 2235, 294, 264, 8320, 3601, 490, 264, 661, 1469, 13, + 51640], "temperature": 0.0, "avg_logprob": -0.20795181819370814, "compression_ratio": + 1.6175548589341693, "no_speech_prob": 0.021716637536883354}, {"id": 1094, "seek": + 365128, "start": 3677.76, "end": 3678.5600000000004, "text": " But it''s fun.", + "tokens": [51688, 583, 309, 311, 1019, 13, 51728], "temperature": 0.0, "avg_logprob": + -0.20795181819370814, "compression_ratio": 1.6175548589341693, "no_speech_prob": + 0.021716637536883354}, {"id": 1095, "seek": 365128, "start": 3678.5600000000004, + "end": 3680.4, "text": " This needs to be solved with vector search.", "tokens": + [51728, 639, 2203, 281, 312, 13041, 365, 8062, 3164, 13, 51820], "temperature": + 0.0, "avg_logprob": -0.20795181819370814, "compression_ratio": 1.6175548589341693, + "no_speech_prob": 0.021716637536883354}, {"id": 1096, "seek": 368040, "start": 3681.04, + "end": 3683.92, "text": " I don''t need an exact answer just in the approximate, + but faster.", "tokens": [50396, 286, 500, 380, 643, 364, 1900, 1867, 445, 294, 264, + 30874, 11, 457, 4663, 13, 50540], "temperature": 0.0, "avg_logprob": -0.20055357854169115, + "compression_ratio": 1.6798561151079137, "no_speech_prob": 0.011522125452756882}, + {"id": 1097, "seek": 368040, "start": 3684.56, "end": 3686.7200000000003, "text": + " Oh, yeah, we were working on that.", "tokens": [50572, 876, 11, 1338, 11, 321, + 645, 1364, 322, 300, 13, 50680], "temperature": 0.0, "avg_logprob": -0.20055357854169115, + "compression_ratio": 1.6798561151079137, "no_speech_prob": 0.011522125452756882}, + {"id": 1098, "seek": 368040, "start": 3686.7200000000003, "end": 3691.6, "text": + " We''re trying to apply it to like a chatbot for all the problems that you 
have.", + "tokens": [50680, 492, 434, 1382, 281, 3079, 309, 281, 411, 257, 5081, 18870, 337, + 439, 264, 2740, 300, 291, 362, 13, 50924], "temperature": 0.0, "avg_logprob": -0.20055357854169115, + "compression_ratio": 1.6798561151079137, "no_speech_prob": 0.011522125452756882}, + {"id": 1099, "seek": 368040, "start": 3692.2400000000002, "end": 3694.8, "text": + " It''s been working okay, but we''re working on it.", "tokens": [50956, 467, 311, + 668, 1364, 1392, 11, 457, 321, 434, 1364, 322, 309, 13, 51084], "temperature": 0.0, + "avg_logprob": -0.20055357854169115, "compression_ratio": 1.6798561151079137, "no_speech_prob": + 0.011522125452756882}, {"id": 1100, "seek": 368040, "start": 3694.8, "end": 3697.2000000000003, + "text": " Trying to get more questions, kind of build up a data.", "tokens": [51084, + 20180, 281, 483, 544, 1651, 11, 733, 295, 1322, 493, 257, 1412, 13, 51204], "temperature": + 0.0, "avg_logprob": -0.20055357854169115, "compression_ratio": 1.6798561151079137, + "no_speech_prob": 0.011522125452756882}, {"id": 1101, "seek": 368040, "start": 3697.2000000000003, + "end": 3698.48, "text": " So that''s the issue with everything.", "tokens": [51204, + 407, 300, 311, 264, 2734, 365, 1203, 13, 51268], "temperature": 0.0, "avg_logprob": + -0.20055357854169115, "compression_ratio": 1.6798561151079137, "no_speech_prob": + 0.011522125452756882}, {"id": 1102, "seek": 368040, "start": 3699.12, "end": 3700.48, + "text": " Just building up that data said.", "tokens": [51300, 1449, 2390, 493, + 300, 1412, 848, 13, 51368], "temperature": 0.0, "avg_logprob": -0.20055357854169115, + "compression_ratio": 1.6798561151079137, "no_speech_prob": 0.011522125452756882}, + {"id": 1103, "seek": 368040, "start": 3701.2000000000003, "end": 3702.2400000000002, + "text": " Yeah, absolutely.", "tokens": [51404, 865, 11, 3122, 13, 51456], "temperature": + 0.0, "avg_logprob": -0.20055357854169115, "compression_ratio": 1.6798561151079137, + "no_speech_prob": 0.011522125452756882}, 
{"id": 1104, "seek": 368040, "start": 3702.2400000000002, + "end": 3705.28, "text": " So that it will make sense for the chatbot to kind of,", + "tokens": [51456, 407, 300, 309, 486, 652, 2020, 337, 264, 5081, 18870, 281, 733, + 295, 11, 51608], "temperature": 0.0, "avg_logprob": -0.20055357854169115, "compression_ratio": + 1.6798561151079137, "no_speech_prob": 0.011522125452756882}, {"id": 1105, "seek": + 368040, "start": 3705.28, "end": 3707.44, "text": " because chatbot wouldn''t create + answers.", "tokens": [51608, 570, 5081, 18870, 2759, 380, 1884, 6338, 13, 51716], + "temperature": 0.0, "avg_logprob": -0.20055357854169115, "compression_ratio": 1.6798561151079137, + "no_speech_prob": 0.011522125452756882}, {"id": 1106, "seek": 370744, "start": 3707.44, + "end": 3709.84, "text": " Well, unless it''s some generative model.", "tokens": + [50364, 1042, 11, 5969, 309, 311, 512, 1337, 1166, 2316, 13, 50484], "temperature": + 0.0, "avg_logprob": -0.2223906482723977, "compression_ratio": 1.619047619047619, + "no_speech_prob": 0.04151097685098648}, {"id": 1107, "seek": 370744, "start": 3710.4, + "end": 3712.48, "text": " Yeah, the GBT3 for our questions.", "tokens": [50512, + 865, 11, 264, 460, 33853, 18, 337, 527, 1651, 13, 50616], "temperature": 0.0, "avg_logprob": + -0.2223906482723977, "compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1108, "seek": 370744, "start": 3712.96, "end": 3714.2400000000002, "text": + " Like a story out of it.", "tokens": [50640, 1743, 257, 1657, 484, 295, 309, 13, + 50704], "temperature": 0.0, "avg_logprob": -0.2223906482723977, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.04151097685098648}, {"id": 1109, "seek": + 370744, "start": 3714.2400000000002, "end": 3718.0, "text": " Yeah, but then it + might be hallucinating as well.", "tokens": [50704, 865, 11, 457, 550, 309, 1062, + 312, 35212, 8205, 382, 731, 13, 50892], "temperature": 0.0, "avg_logprob": -0.2223906482723977, + 
"compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1110, "seek": 370744, "start": 3718.2400000000002, "end": 3719.84, "text": + " In some cases, it''s okay though, right?", "tokens": [50904, 682, 512, 3331, 11, + 309, 311, 1392, 1673, 11, 558, 30, 50984], "temperature": 0.0, "avg_logprob": -0.2223906482723977, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1111, "seek": 370744, "start": 3720.56, "end": 3722.56, "text": " Yeah, one + of the ten is the correct answer.", "tokens": [51020, 865, 11, 472, 295, 264, 2064, + 307, 264, 3006, 1867, 13, 51120], "temperature": 0.0, "avg_logprob": -0.2223906482723977, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1112, "seek": 370744, "start": 3722.56, "end": 3724.48, "text": " The other + ones are all just like burn your computer.", "tokens": [51120, 440, 661, 2306, 366, + 439, 445, 411, 5064, 428, 3820, 13, 51216], "temperature": 0.0, "avg_logprob": -0.2223906482723977, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1113, "seek": 370744, "start": 3725.2000000000003, "end": 3727.84, "text": + " Yeah, if you want to have fun with, you know,", "tokens": [51252, 865, 11, 498, + 291, 528, 281, 362, 1019, 365, 11, 291, 458, 11, 51384], "temperature": 0.0, "avg_logprob": + -0.2223906482723977, "compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1114, "seek": 370744, "start": 3727.84, "end": 3729.28, "text": " you don''t + need an exact answer.", "tokens": [51384, 291, 500, 380, 643, 364, 1900, 1867, 13, + 51456], "temperature": 0.0, "avg_logprob": -0.2223906482723977, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.04151097685098648}, {"id": 1115, "seek": + 370744, "start": 3729.28, "end": 3731.36, "text": " You just, okay, hey buddy, how + are you doing?", "tokens": [51456, 509, 445, 11, 1392, 11, 4177, 10340, 11, 
577, + 366, 291, 884, 30, 51560], "temperature": 0.0, "avg_logprob": -0.2223906482723977, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.04151097685098648}, + {"id": 1116, "seek": 370744, "start": 3733.36, "end": 3735.36, "text": " Yeah, so + yeah, that''s fantastic.", "tokens": [51660, 865, 11, 370, 1338, 11, 300, 311, 5456, + 13, 51760], "temperature": 0.0, "avg_logprob": -0.2223906482723977, "compression_ratio": + 1.619047619047619, "no_speech_prob": 0.04151097685098648}, {"id": 1117, "seek": + 373536, "start": 3735.36, "end": 3739.1200000000003, "text": " So I was lovely moving + to to why section,", "tokens": [50364, 407, 286, 390, 7496, 2684, 281, 281, 983, + 3541, 11, 50552], "temperature": 0.0, "avg_logprob": -0.22384011543403237, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, {"id": 1118, "seek": + 373536, "start": 3739.1200000000003, "end": 3740.8, "text": " even though I didn''t + say all the sections,", "tokens": [50552, 754, 1673, 286, 994, 380, 584, 439, 264, + 10863, 11, 50636], "temperature": 0.0, "avg_logprob": -0.22384011543403237, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, {"id": 1119, "seek": + 373536, "start": 3740.8, "end": 3744.08, "text": " but we kind of mixed what and + how together in many ways.", "tokens": [50636, 457, 321, 733, 295, 7467, 437, 293, + 577, 1214, 294, 867, 2098, 13, 50800], "temperature": 0.0, "avg_logprob": -0.22384011543403237, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, + {"id": 1120, "seek": 373536, "start": 3744.08, "end": 3746.32, "text": " And you + handle it really, really well.", "tokens": [50800, 400, 291, 4813, 309, 534, 11, + 534, 731, 13, 50912], "temperature": 0.0, "avg_logprob": -0.22384011543403237, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, {"id": 1121, "seek": + 373536, "start": 3747.6800000000003, "end": 3750.7200000000003, 
"text": " You know, + the why the why question that I really like to ask,", "tokens": [50980, 509, 458, + 11, 264, 983, 264, 983, 1168, 300, 286, 534, 411, 281, 1029, 11, 51132], "temperature": + 0.0, "avg_logprob": -0.22384011543403237, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.0042383125983178616}, {"id": 1122, "seek": 373536, "start": + 3750.7200000000003, "end": 3754.96, "text": " everyone on this show is kind of what + motivates you", "tokens": [51132, 1518, 322, 341, 855, 307, 733, 295, 437, 42569, + 291, 51344], "temperature": 0.0, "avg_logprob": -0.22384011543403237, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, {"id": 1123, "seek": + 373536, "start": 3754.96, "end": 3757.2000000000003, "text": " to be part of vector + search development today?", "tokens": [51344, 281, 312, 644, 295, 8062, 3164, 3250, + 965, 30, 51456], "temperature": 0.0, "avg_logprob": -0.22384011543403237, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, {"id": 1124, "seek": + 373536, "start": 3758.96, "end": 3761.76, "text": " I think for me the biggest thing, + I want to over a few times is", "tokens": [51544, 286, 519, 337, 385, 264, 3880, + 551, 11, 286, 528, 281, 670, 257, 1326, 1413, 307, 51684], "temperature": 0.0, "avg_logprob": + -0.22384011543403237, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.0042383125983178616}, {"id": 1125, "seek": 373536, "start": 3762.56, "end": 3764.1600000000003, + "text": " everyone storing all this data.", "tokens": [51724, 1518, 26085, 439, + 341, 1412, 13, 51804], "temperature": 0.0, "avg_logprob": -0.22384011543403237, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.0042383125983178616}, + {"id": 1126, "seek": 376416, "start": 3764.7999999999997, "end": 3766.7999999999997, + "text": " And like it''s so like it''s a huge amount.", "tokens": [50396, 400, 411, + 309, 311, 370, 411, 309, 311, 257, 2603, 2372, 13, 50496], 
"temperature": 0.0, "avg_logprob": + -0.12587396988016092, "compression_ratio": 1.8477508650519032, "no_speech_prob": + 0.005057618021965027}, {"id": 1127, "seek": 376416, "start": 3766.7999999999997, + "end": 3768.3199999999997, "text": " I like all these companies.", "tokens": [50496, + 286, 411, 439, 613, 3431, 13, 50572], "temperature": 0.0, "avg_logprob": -0.12587396988016092, + "compression_ratio": 1.8477508650519032, "no_speech_prob": 0.005057618021965027}, + {"id": 1128, "seek": 376416, "start": 3768.3199999999997, "end": 3772.08, "text": + " And then just the next step, I want to see what we can do with it.", "tokens": + [50572, 400, 550, 445, 264, 958, 1823, 11, 286, 528, 281, 536, 437, 321, 393, 360, + 365, 309, 13, 50760], "temperature": 0.0, "avg_logprob": -0.12587396988016092, "compression_ratio": + 1.8477508650519032, "no_speech_prob": 0.005057618021965027}, {"id": 1129, "seek": + 376416, "start": 3772.08, "end": 3773.68, "text": " Vector search is one who knows.", + "tokens": [50760, 691, 20814, 3164, 307, 472, 567, 3255, 13, 50840], "temperature": + 0.0, "avg_logprob": -0.12587396988016092, "compression_ratio": 1.8477508650519032, + "no_speech_prob": 0.005057618021965027}, {"id": 1130, "seek": 376416, "start": 3773.68, + "end": 3775.12, "text": " Maybe vector search might not be it.", "tokens": [50840, + 2704, 8062, 3164, 1062, 406, 312, 309, 13, 50912], "temperature": 0.0, "avg_logprob": + -0.12587396988016092, "compression_ratio": 1.8477508650519032, "no_speech_prob": + 0.005057618021965027}, {"id": 1131, "seek": 376416, "start": 3775.7599999999998, + "end": 3778.56, "text": " But in that chase for figuring out vector search", "tokens": + [50944, 583, 294, 300, 15359, 337, 15213, 484, 8062, 3164, 51084], "temperature": + 0.0, "avg_logprob": -0.12587396988016092, "compression_ratio": 1.8477508650519032, + "no_speech_prob": 0.005057618021965027}, {"id": 1132, "seek": 376416, "start": 3778.56, + "end": 3781.2799999999997, "text": " for like 
perfecting it, something you might + pop out", "tokens": [51084, 337, 411, 2176, 278, 309, 11, 746, 291, 1062, 1665, + 484, 51220], "temperature": 0.0, "avg_logprob": -0.12587396988016092, "compression_ratio": + 1.8477508650519032, "no_speech_prob": 0.005057618021965027}, {"id": 1133, "seek": + 376416, "start": 3781.2799999999997, "end": 3783.6, "text": " and kind of ride this + wave of what''s next.", "tokens": [51220, 293, 733, 295, 5077, 341, 5772, 295, 437, + 311, 958, 13, 51336], "temperature": 0.0, "avg_logprob": -0.12587396988016092, "compression_ratio": + 1.8477508650519032, "no_speech_prob": 0.005057618021965027}, {"id": 1134, "seek": + 376416, "start": 3783.6, "end": 3787.04, "text": " And that''s kind of like why + I really like vector search right now.", "tokens": [51336, 400, 300, 311, 733, 295, + 411, 983, 286, 534, 411, 8062, 3164, 558, 586, 13, 51508], "temperature": 0.0, "avg_logprob": + -0.12587396988016092, "compression_ratio": 1.8477508650519032, "no_speech_prob": + 0.005057618021965027}, {"id": 1135, "seek": 376416, "start": 3787.04, "end": 3788.64, + "text": " I get to learn about all these things.", "tokens": [51508, 286, 483, 281, + 1466, 466, 439, 613, 721, 13, 51588], "temperature": 0.0, "avg_logprob": -0.12587396988016092, + "compression_ratio": 1.8477508650519032, "no_speech_prob": 0.005057618021965027}, + {"id": 1136, "seek": 376416, "start": 3789.2, "end": 3791.2, "text": " I still get + to like throw my ideas into it", "tokens": [51616, 286, 920, 483, 281, 411, 3507, + 452, 3487, 666, 309, 51716], "temperature": 0.0, "avg_logprob": -0.12587396988016092, + "compression_ratio": 1.8477508650519032, "no_speech_prob": 0.005057618021965027}, + {"id": 1137, "seek": 376416, "start": 3791.2, "end": 3792.72, "text": " and still + kind of have them matter.", "tokens": [51716, 293, 920, 733, 295, 362, 552, 1871, + 13, 51792], "temperature": 0.0, "avg_logprob": -0.12587396988016092, "compression_ratio": + 1.8477508650519032, "no_speech_prob": 
0.005057618021965027}, {"id": 1138, "seek": + 379272, "start": 3793.4399999999996, "end": 3795.52, "text": " Like the previous + like the way that''s past already,", "tokens": [50400, 1743, 264, 3894, 411, 264, + 636, 300, 311, 1791, 1217, 11, 50504], "temperature": 0.0, "avg_logprob": -0.18328845247309258, + "compression_ratio": 1.7293729372937294, "no_speech_prob": 0.005425240378826857}, + {"id": 1139, "seek": 379272, "start": 3795.52, "end": 3798.72, "text": " it''s kind + of gone to the point where you really need to have this deep,", "tokens": [50504, + 309, 311, 733, 295, 2780, 281, 264, 935, 689, 291, 534, 643, 281, 362, 341, 2452, + 11, 50664], "temperature": 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": + 1.7293729372937294, "no_speech_prob": 0.005425240378826857}, {"id": 1140, "seek": + 379272, "start": 3798.72, "end": 3800.56, "text": " deep knowledge to actually be + able to innovate.", "tokens": [50664, 2452, 3601, 281, 767, 312, 1075, 281, 33444, + 13, 50756], "temperature": 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": + 1.7293729372937294, "no_speech_prob": 0.005425240378826857}, {"id": 1141, "seek": + 379272, "start": 3801.12, "end": 3803.2, "text": " You do also with vector search + and all of these things,", "tokens": [50784, 509, 360, 611, 365, 8062, 3164, 293, + 439, 295, 613, 721, 11, 50888], "temperature": 0.0, "avg_logprob": -0.18328845247309258, + "compression_ratio": 1.7293729372937294, "no_speech_prob": 0.005425240378826857}, + {"id": 1142, "seek": 379272, "start": 3803.2, "end": 3805.2, "text": " but it''s + a little bit more fresh.", "tokens": [50888, 457, 309, 311, 257, 707, 857, 544, + 4451, 13, 50988], "temperature": 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": + 1.7293729372937294, "no_speech_prob": 0.005425240378826857}, {"id": 1143, "seek": + 379272, "start": 3805.2, "end": 3807.52, "text": " So more that sort of makes sense.", + "tokens": [50988, 407, 544, 300, 1333, 295, 1669, 
2020, 13, 51104], "temperature": + 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": 1.7293729372937294, + "no_speech_prob": 0.005425240378826857}, {"id": 1144, "seek": 379272, "start": 3807.52, + "end": 3810.16, "text": " Like I like to I want to ride that wave of freshness", + "tokens": [51104, 1743, 286, 411, 281, 286, 528, 281, 5077, 300, 5772, 295, 4451, + 1287, 51236], "temperature": 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": + 1.7293729372937294, "no_speech_prob": 0.005425240378826857}, {"id": 1145, "seek": + 379272, "start": 3810.16, "end": 3813.2799999999997, "text": " and kind of the next + step of dealing with these huge data amounts.", "tokens": [51236, 293, 733, 295, + 264, 958, 1823, 295, 6260, 365, 613, 2603, 1412, 11663, 13, 51392], "temperature": + 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": 1.7293729372937294, + "no_speech_prob": 0.005425240378826857}, {"id": 1146, "seek": 379272, "start": 3814.0, + "end": 3815.7599999999998, "text": " Yeah, that''s that''s that''s amazing.", "tokens": + [51428, 865, 11, 300, 311, 300, 311, 300, 311, 2243, 13, 51516], "temperature": + 0.0, "avg_logprob": -0.18328845247309258, "compression_ratio": 1.7293729372937294, + "no_speech_prob": 0.005425240378826857}, {"id": 1147, "seek": 379272, "start": 3815.7599999999998, + "end": 3821.4399999999996, "text": " And also I think I''ve read somewhere on on + to either one of the founders", "tokens": [51516, 400, 611, 286, 519, 286, 600, + 1401, 4079, 322, 322, 281, 2139, 472, 295, 264, 25608, 51800], "temperature": 0.0, + "avg_logprob": -0.18328845247309258, "compression_ratio": 1.7293729372937294, "no_speech_prob": + 0.005425240378826857}, {"id": 1148, "seek": 382144, "start": 3821.44, "end": 3822.64, + "text": " of Y-combinator.", "tokens": [50364, 295, 398, 12, 38763, 31927, 13, 50424], + "temperature": 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": 1.873469387755102, + "no_speech_prob": 
0.009870289824903011}, {"id": 1149, "seek": 382144, "start": 3824.32, + "end": 3826.64, "text": " He said like it was an essay.", "tokens": [50508, 634, + 848, 411, 309, 390, 364, 16238, 13, 50624], "temperature": 0.0, "avg_logprob": -0.1952812070769023, + "compression_ratio": 1.873469387755102, "no_speech_prob": 0.009870289824903011}, + {"id": 1150, "seek": 382144, "start": 3826.64, "end": 3831.12, "text": " He said + like when you are on the bleeding edge of doing something,", "tokens": [50624, 634, + 848, 411, 562, 291, 366, 322, 264, 19312, 4691, 295, 884, 746, 11, 50848], "temperature": + 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": 1.873469387755102, + "no_speech_prob": 0.009870289824903011}, {"id": 1151, "seek": 382144, "start": 3831.68, + "end": 3835.2000000000003, "text": " then you you automatically become the expert + in that field.", "tokens": [50876, 550, 291, 291, 6772, 1813, 264, 5844, 294, 300, + 2519, 13, 51052], "temperature": 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": + 1.873469387755102, "no_speech_prob": 0.009870289824903011}, {"id": 1152, "seek": + 382144, "start": 3835.2000000000003, "end": 3837.6, "text": " And if something works + for you, you know,", "tokens": [51052, 400, 498, 746, 1985, 337, 291, 11, 291, 458, + 11, 51172], "temperature": 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": + 1.873469387755102, "no_speech_prob": 0.009870289824903011}, {"id": 1153, "seek": + 382144, "start": 3837.6, "end": 3840.32, "text": " the rest of the market will probably + try to copy.", "tokens": [51172, 264, 1472, 295, 264, 2142, 486, 1391, 853, 281, + 5055, 13, 51308], "temperature": 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": + 1.873469387755102, "no_speech_prob": 0.009870289824903011}, {"id": 1154, "seek": + 382144, "start": 3840.32, "end": 3844.48, "text": " If it didn''t work, then probably + everyone else didn''t figure it out", "tokens": [51308, 759, 309, 994, 380, 589, + 11, 
550, 1391, 1518, 1646, 994, 380, 2573, 309, 484, 51516], "temperature": 0.0, + "avg_logprob": -0.1952812070769023, "compression_ratio": 1.873469387755102, "no_speech_prob": + 0.009870289824903011}, {"id": 1155, "seek": 382144, "start": 3844.48, "end": 3846.88, + "text": " because you are the bleeding edge expert, right?", "tokens": [51516, 570, + 291, 366, 264, 19312, 4691, 5844, 11, 558, 30, 51636], "temperature": 0.0, "avg_logprob": + -0.1952812070769023, "compression_ratio": 1.873469387755102, "no_speech_prob": 0.009870289824903011}, + {"id": 1156, "seek": 382144, "start": 3846.88, "end": 3847.6, "text": " You are + right there.", "tokens": [51636, 509, 366, 558, 456, 13, 51672], "temperature": + 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": 1.873469387755102, + "no_speech_prob": 0.009870289824903011}, {"id": 1157, "seek": 382144, "start": 3848.2400000000002, + "end": 3851.2000000000003, "text": " And if you will figure if you will figure out + something", "tokens": [51704, 400, 498, 291, 486, 2573, 498, 291, 486, 2573, 484, + 746, 51852], "temperature": 0.0, "avg_logprob": -0.1952812070769023, "compression_ratio": + 1.873469387755102, "no_speech_prob": 0.009870289824903011}, {"id": 1158, "seek": + 385120, "start": 3851.3599999999997, "end": 3855.3599999999997, "text": " very interesting + that will be kind of revolutionary in some way,", "tokens": [50372, 588, 1880, 300, + 486, 312, 733, 295, 22687, 294, 512, 636, 11, 50572], "temperature": 0.0, "avg_logprob": + -0.14178136299396382, "compression_ratio": 1.7283464566929134, "no_speech_prob": + 0.0016081002540886402}, {"id": 1159, "seek": 385120, "start": 3855.3599999999997, + "end": 3860.08, "text": " then you will be first to possibly capture the value, + right?", "tokens": [50572, 550, 291, 486, 312, 700, 281, 6264, 7983, 264, 2158, + 11, 558, 30, 50808], "temperature": 0.0, "avg_logprob": -0.14178136299396382, "compression_ratio": + 1.7283464566929134, "no_speech_prob": 
0.0016081002540886402}, {"id": 1160, "seek": + 385120, "start": 3860.08, "end": 3862.56, "text": " And so you work for that goal.", + "tokens": [50808, 400, 370, 291, 589, 337, 300, 3387, 13, 50932], "temperature": + 0.0, "avg_logprob": -0.14178136299396382, "compression_ratio": 1.7283464566929134, + "no_speech_prob": 0.0016081002540886402}, {"id": 1161, "seek": 385120, "start": + 3862.56, "end": 3865.52, "text": " On one hand, as you said, it motivates you to + unlock the,", "tokens": [50932, 1282, 472, 1011, 11, 382, 291, 848, 11, 309, 42569, + 291, 281, 11634, 264, 11, 51080], "temperature": 0.0, "avg_logprob": -0.14178136299396382, + "compression_ratio": 1.7283464566929134, "no_speech_prob": 0.0016081002540886402}, + {"id": 1162, "seek": 385120, "start": 3866.56, "end": 3869.7599999999998, "text": + " you know, the silo database is kind of of data,", "tokens": [51132, 291, 458, + 11, 264, 3425, 78, 8149, 307, 733, 295, 295, 1412, 11, 51292], "temperature": 0.0, + "avg_logprob": -0.14178136299396382, "compression_ratio": 1.7283464566929134, "no_speech_prob": + 0.0016081002540886402}, {"id": 1163, "seek": 385120, "start": 3869.7599999999998, + "end": 3871.2799999999997, "text": " unstructured and structured data.", "tokens": + [51292, 18799, 46847, 293, 18519, 1412, 13, 51368], "temperature": 0.0, "avg_logprob": + -0.14178136299396382, "compression_ratio": 1.7283464566929134, "no_speech_prob": + 0.0016081002540886402}, {"id": 1164, "seek": 385120, "start": 3872.24, "end": 3875.3599999999997, + "text": " On the other hand, you said maybe it won''t be vector search,", "tokens": + [51416, 1282, 264, 661, 1011, 11, 291, 848, 1310, 309, 1582, 380, 312, 8062, 3164, + 11, 51572], "temperature": 0.0, "avg_logprob": -0.14178136299396382, "compression_ratio": + 1.7283464566929134, "no_speech_prob": 0.0016081002540886402}, {"id": 1165, "seek": + 385120, "start": 3875.3599999999997, "end": 3879.4399999999996, "text": " maybe + it will be something else because you are in 
that experimental mode, right?", "tokens": + [51572, 1310, 309, 486, 312, 746, 1646, 570, 291, 366, 294, 300, 17069, 4391, 11, + 558, 30, 51776], "temperature": 0.0, "avg_logprob": -0.14178136299396382, "compression_ratio": + 1.7283464566929134, "no_speech_prob": 0.0016081002540886402}, {"id": 1166, "seek": + 387944, "start": 3879.44, "end": 3880.0, "text": " Yeah.", "tokens": [50364, 865, + 13, 50392], "temperature": 0.0, "avg_logprob": -0.2801873506005131, "compression_ratio": + 1.6910569105691058, "no_speech_prob": 0.011041813530027866}, {"id": 1167, "seek": + 387944, "start": 3880.0, "end": 3883.44, "text": " Whereas you can easily quickly + transition and kind of keep that knowledge,", "tokens": [50392, 13813, 291, 393, + 3612, 2661, 6034, 293, 733, 295, 1066, 300, 3601, 11, 50564], "temperature": 0.0, + "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, "no_speech_prob": + 0.011041813530027866}, {"id": 1168, "seek": 387944, "start": 3884.0, "end": 3885.36, + "text": " keep it going and keep it running.", "tokens": [50592, 1066, 309, 516, + 293, 1066, 309, 2614, 13, 50660], "temperature": 0.0, "avg_logprob": -0.2801873506005131, + "compression_ratio": 1.6910569105691058, "no_speech_prob": 0.011041813530027866}, + {"id": 1169, "seek": 387944, "start": 3886.2400000000002, "end": 3889.44, "text": + " Yeah, that''s pretty much for me why I''m doing this.", "tokens": [50704, 865, + 11, 300, 311, 1238, 709, 337, 385, 983, 286, 478, 884, 341, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, + "no_speech_prob": 0.011041813530027866}, {"id": 1170, "seek": 387944, "start": 3889.92, + "end": 3892.0, "text": " It''s been really fun so far.", "tokens": [50888, 467, + 311, 668, 534, 1019, 370, 1400, 13, 50992], "temperature": 0.0, "avg_logprob": -0.2801873506005131, + "compression_ratio": 1.6910569105691058, "no_speech_prob": 0.011041813530027866}, + {"id": 1171, "seek": 387944, "start": 
3892.0, "end": 3893.2000000000003, "text": + " Also startups.", "tokens": [50992, 2743, 28041, 13, 51052], "temperature": 0.0, + "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, "no_speech_prob": + 0.011041813530027866}, {"id": 1172, "seek": 387944, "start": 3893.84, "end": 3894.4, + "text": " I like it.", "tokens": [51084, 286, 411, 309, 13, 51112], "temperature": + 0.0, "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, + "no_speech_prob": 0.011041813530027866}, {"id": 1173, "seek": 387944, "start": 3894.4, + "end": 3895.68, "text": " I like the multi-hat.", "tokens": [51112, 286, 411, 264, + 4825, 12, 15178, 13, 51176], "temperature": 0.0, "avg_logprob": -0.2801873506005131, + "compression_ratio": 1.6910569105691058, "no_speech_prob": 0.011041813530027866}, + {"id": 1174, "seek": 387944, "start": 3896.48, "end": 3898.0, "text": " Kind of + just do it.", "tokens": [51216, 9242, 295, 445, 360, 309, 13, 51292], "temperature": + 0.0, "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, + "no_speech_prob": 0.011041813530027866}, {"id": 1175, "seek": 387944, "start": 3898.48, + "end": 3900.08, "text": " Try it by fire and just get it done.", "tokens": [51316, + 6526, 309, 538, 2610, 293, 445, 483, 309, 1096, 13, 51396], "temperature": 0.0, + "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, "no_speech_prob": + 0.011041813530027866}, {"id": 1176, "seek": 387944, "start": 3900.88, "end": 3902.0, + "text": " Yeah, yeah.", "tokens": [51436, 865, 11, 1338, 13, 51492], "temperature": + 0.0, "avg_logprob": -0.2801873506005131, "compression_ratio": 1.6910569105691058, + "no_speech_prob": 0.011041813530027866}, {"id": 1177, "seek": 387944, "start": 3903.6, + "end": 3905.2000000000003, "text": " With the money where you mouth,", "tokens": + [51572, 2022, 264, 1460, 689, 291, 4525, 11, 51652], "temperature": 0.0, "avg_logprob": + -0.2801873506005131, "compression_ratio": 
1.6910569105691058, "no_speech_prob": + 0.011041813530027866}, {"id": 1178, "seek": 387944, "start": 3905.2000000000003, + "end": 3906.88, "text": " well, how do you say it in English?", "tokens": [51652, + 731, 11, 577, 360, 291, 584, 309, 294, 3669, 30, 51736], "temperature": 0.0, "avg_logprob": + -0.2801873506005131, "compression_ratio": 1.6910569105691058, "no_speech_prob": + 0.011041813530027866}, {"id": 1179, "seek": 387944, "start": 3906.88, "end": 3908.4, + "text": " With the money where you mouth is?", "tokens": [51736, 2022, 264, 1460, + 689, 291, 4525, 307, 30, 51812], "temperature": 0.0, "avg_logprob": -0.2801873506005131, + "compression_ratio": 1.6910569105691058, "no_speech_prob": 0.011041813530027866}, + {"id": 1180, "seek": 390840, "start": 3908.4, "end": 3909.52, "text": " With the + money where you mouth is?", "tokens": [50364, 2022, 264, 1460, 689, 291, 4525, 307, + 30, 50420], "temperature": 0.0, "avg_logprob": -0.2110766287772886, "compression_ratio": + 1.681159420289855, "no_speech_prob": 0.0035956676583737135}, {"id": 1181, "seek": + 390840, "start": 3909.52, "end": 3911.28, "text": " Yeah, like something along those + lines.", "tokens": [50420, 865, 11, 411, 746, 2051, 729, 3876, 13, 50508], "temperature": + 0.0, "avg_logprob": -0.2110766287772886, "compression_ratio": 1.681159420289855, + "no_speech_prob": 0.0035956676583737135}, {"id": 1182, "seek": 390840, "start": + 3911.28, "end": 3915.6, "text": " Yeah, but I mean, you basically, instead of just + kind of blogging or saying how cool it is,", "tokens": [50508, 865, 11, 457, 286, + 914, 11, 291, 1936, 11, 2602, 295, 445, 733, 295, 6968, 3249, 420, 1566, 577, 1627, + 309, 307, 11, 50724], "temperature": 0.0, "avg_logprob": -0.2110766287772886, "compression_ratio": + 1.681159420289855, "no_speech_prob": 0.0035956676583737135}, {"id": 1183, "seek": + 390840, "start": 3915.6, "end": 3920.1600000000003, "text": " you actually go and + try to apply it to some real use case, right?", "tokens": 
[50724, 291, 767, 352, + 293, 853, 281, 3079, 309, 281, 512, 957, 764, 1389, 11, 558, 30, 50952], "temperature": + 0.0, "avg_logprob": -0.2110766287772886, "compression_ratio": 1.681159420289855, + "no_speech_prob": 0.0035956676583737135}, {"id": 1184, "seek": 390840, "start": + 3920.8, "end": 3921.6, "text": " Exactly.", "tokens": [50984, 7587, 13, 51024], + "temperature": 0.0, "avg_logprob": -0.2110766287772886, "compression_ratio": 1.681159420289855, + "no_speech_prob": 0.0035956676583737135}, {"id": 1185, "seek": 390840, "start": + 3921.6, "end": 3926.1600000000003, "text": " And if I may ask you, like, do you + think that something kind of", "tokens": [51024, 400, 498, 286, 815, 1029, 291, + 11, 411, 11, 360, 291, 519, 300, 746, 733, 295, 51252], "temperature": 0.0, "avg_logprob": + -0.2110766287772886, "compression_ratio": 1.681159420289855, "no_speech_prob": 0.0035956676583737135}, + {"id": 1186, "seek": 390840, "start": 3926.1600000000003, "end": 3931.12, "text": + " tactically or strategically is missing right now in the vector search space?", + "tokens": [51252, 9959, 984, 420, 38061, 307, 5361, 558, 586, 294, 264, 8062, 3164, + 1901, 30, 51500], "temperature": 0.0, "avg_logprob": -0.2110766287772886, "compression_ratio": + 1.681159420289855, "no_speech_prob": 0.0035956676583737135}, {"id": 1187, "seek": + 390840, "start": 3931.12, "end": 3938.1600000000003, "text": " Like maybe on the + lines of how we explain it or maybe there are some untapped use", "tokens": [51500, + 1743, 1310, 322, 264, 3876, 295, 577, 321, 2903, 309, 420, 1310, 456, 366, 512, + 517, 1328, 3320, 764, 51852], "temperature": 0.0, "avg_logprob": -0.2110766287772886, + "compression_ratio": 1.681159420289855, "no_speech_prob": 0.0035956676583737135}, + {"id": 1188, "seek": 393816, "start": 3938.24, "end": 3940.7999999999997, "text": + " cases or something else that comes to you.", "tokens": [50368, 3331, 420, 746, + 1646, 300, 1487, 281, 291, 13, 50496], "temperature": 0.0, 
"avg_logprob": -0.200099975832047, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.0028136486653238535}, + {"id": 1189, "seek": 393816, "start": 3940.7999999999997, "end": 3947.04, "text": + " I think I think the big thing right now is I may be wrong on this one.", "tokens": + [50496, 286, 519, 286, 519, 264, 955, 551, 558, 586, 307, 286, 815, 312, 2085, 322, + 341, 472, 13, 50808], "temperature": 0.0, "avg_logprob": -0.200099975832047, "compression_ratio": + 1.7234848484848484, "no_speech_prob": 0.0028136486653238535}, {"id": 1190, "seek": + 393816, "start": 3947.04, "end": 3950.3199999999997, "text": " I kind of might be + explaining it weirdly, but like having a standard,", "tokens": [50808, 286, 733, + 295, 1062, 312, 13468, 309, 48931, 11, 457, 411, 1419, 257, 3832, 11, 50972], "temperature": + 0.0, "avg_logprob": -0.200099975832047, "compression_ratio": 1.7234848484848484, + "no_speech_prob": 0.0028136486653238535}, {"id": 1191, "seek": 393816, "start": + 3951.12, "end": 3954.0, "text": " like we don''t really have a standard for any + of this yet.", "tokens": [51012, 411, 321, 500, 380, 534, 362, 257, 3832, 337, 604, + 295, 341, 1939, 13, 51156], "temperature": 0.0, "avg_logprob": -0.200099975832047, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.0028136486653238535}, + {"id": 1192, "seek": 393816, "start": 3954.0, "end": 3958.3999999999996, "text": + " And there''s a bunch of things kind of popping up and everyone''s going to be + scared to move away.", "tokens": [51156, 400, 456, 311, 257, 3840, 295, 721, 733, + 295, 18374, 493, 293, 1518, 311, 516, 281, 312, 5338, 281, 1286, 1314, 13, 51376], + "temperature": 0.0, "avg_logprob": -0.200099975832047, "compression_ratio": 1.7234848484848484, + "no_speech_prob": 0.0028136486653238535}, {"id": 1193, "seek": 393816, "start": + 3959.04, "end": 3961.3599999999997, "text": " So like, I feel like the last to search.", + "tokens": [51408, 407, 411, 11, 286, 841, 411, 264, 1036, 281, 
3164, 13, 51524], + "temperature": 0.0, "avg_logprob": -0.200099975832047, "compression_ratio": 1.7234848484848484, + "no_speech_prob": 0.0028136486653238535}, {"id": 1194, "seek": 393816, "start": + 3961.3599999999997, "end": 3964.7999999999997, "text": " I don''t know too much + about the history and like what''s going on, but like,", "tokens": [51524, 286, + 500, 380, 458, 886, 709, 466, 264, 2503, 293, 411, 437, 311, 516, 322, 11, 457, + 411, 11, 51696], "temperature": 0.0, "avg_logprob": -0.200099975832047, "compression_ratio": + 1.7234848484848484, "no_speech_prob": 0.0028136486653238535}, {"id": 1195, "seek": + 396480, "start": 3964.8, "end": 3967.36, "text": " everyone''s a bunch of people + have built up their system on that and it''s kind of", "tokens": [50364, 1518, 311, + 257, 3840, 295, 561, 362, 3094, 493, 641, 1185, 322, 300, 293, 309, 311, 733, 295, + 50492], "temperature": 0.0, "avg_logprob": -0.16331350067515432, "compression_ratio": + 1.9514563106796117, "no_speech_prob": 0.0016130456933751702}, {"id": 1196, "seek": + 396480, "start": 3967.36, "end": 3970.88, "text": " been a standard for doing that + text-based keyword searching and that kind of stuff.", "tokens": [50492, 668, 257, + 3832, 337, 884, 300, 2487, 12, 6032, 20428, 10808, 293, 300, 733, 295, 1507, 13, + 50668], "temperature": 0.0, "avg_logprob": -0.16331350067515432, "compression_ratio": + 1.9514563106796117, "no_speech_prob": 0.0016130456933751702}, {"id": 1197, "seek": + 396480, "start": 3971.6000000000004, "end": 3973.76, "text": " And then when we + say, oh yeah, do word embedding.", "tokens": [50704, 400, 550, 562, 321, 584, 11, + 1954, 1338, 11, 360, 1349, 12240, 3584, 13, 50812], "temperature": 0.0, "avg_logprob": + -0.16331350067515432, "compression_ratio": 1.9514563106796117, "no_speech_prob": + 0.0016130456933751702}, {"id": 1198, "seek": 396480, "start": 3973.76, "end": 3975.04, + "text": " So we''ll make everything improve.", "tokens": [50812, 407, 321, 603, + 652, 1203, 
3470, 13, 50876], "temperature": 0.0, "avg_logprob": -0.16331350067515432, + "compression_ratio": 1.9514563106796117, "no_speech_prob": 0.0016130456933751702}, + {"id": 1199, "seek": 396480, "start": 3975.04, "end": 3975.84, "text": " Do this, + do this.", "tokens": [50876, 1144, 341, 11, 360, 341, 13, 50916], "temperature": + 0.0, "avg_logprob": -0.16331350067515432, "compression_ratio": 1.9514563106796117, + "no_speech_prob": 0.0016130456933751702}, {"id": 1200, "seek": 396480, "start": + 3976.4, "end": 3980.48, "text": " But it''s like, there''s no standard in any of + like, we''re doing vector database.", "tokens": [50944, 583, 309, 311, 411, 11, + 456, 311, 572, 3832, 294, 604, 295, 411, 11, 321, 434, 884, 8062, 8149, 13, 51148], + "temperature": 0.0, "avg_logprob": -0.16331350067515432, "compression_ratio": 1.9514563106796117, + "no_speech_prob": 0.0016130456933751702}, {"id": 1201, "seek": 396480, "start": + 3980.48, "end": 3982.7200000000003, "text": " Some of the people doing vector search + with database attached.", "tokens": [51148, 2188, 295, 264, 561, 884, 8062, 3164, + 365, 8149, 8570, 13, 51260], "temperature": 0.0, "avg_logprob": -0.16331350067515432, + "compression_ratio": 1.9514563106796117, "no_speech_prob": 0.0016130456933751702}, + {"id": 1202, "seek": 396480, "start": 3982.7200000000003, "end": 3987.2000000000003, + "text": " We''re like, everyone''s kind of just doing some like, there''s no big + thing there that keep people", "tokens": [51260, 492, 434, 411, 11, 1518, 311, 733, + 295, 445, 884, 512, 411, 11, 456, 311, 572, 955, 551, 456, 300, 1066, 561, 51484], + "temperature": 0.0, "avg_logprob": -0.16331350067515432, "compression_ratio": 1.9514563106796117, + "no_speech_prob": 0.0016130456933751702}, {"id": 1203, "seek": 396480, "start": + 3987.2000000000003, "end": 3989.28, "text": " kind of try to make it similar.", + "tokens": [51484, 733, 295, 853, 281, 652, 309, 2531, 13, 51588], "temperature": + 0.0, "avg_logprob": 
-0.16331350067515432, "compression_ratio": 1.9514563106796117, + "no_speech_prob": 0.0016130456933751702}, {"id": 1204, "seek": 396480, "start": + 3989.28, "end": 3992.48, "text": " So yeah, there''s no standard, which I think + is kind of an issue.", "tokens": [51588, 407, 1338, 11, 456, 311, 572, 3832, 11, + 597, 286, 519, 307, 733, 295, 364, 2734, 13, 51748], "temperature": 0.0, "avg_logprob": + -0.16331350067515432, "compression_ratio": 1.9514563106796117, "no_speech_prob": + 0.0016130456933751702}, {"id": 1205, "seek": 399248, "start": 3993.04, "end": 3997.6, + "text": " And it''s going to hurt everyone in the long run because there''s no standard.", + "tokens": [50392, 400, 309, 311, 516, 281, 4607, 1518, 294, 264, 938, 1190, 570, + 456, 311, 572, 3832, 13, 50620], "temperature": 0.0, "avg_logprob": -0.19648820628290592, + "compression_ratio": 1.6444444444444444, "no_speech_prob": 0.015218515880405903}, + {"id": 1206, "seek": 399248, "start": 3997.6, "end": 4000.8, "text": " People won''t + be as excited to try it out because there''s too many options.", "tokens": [50620, + 3432, 1582, 380, 312, 382, 2919, 281, 853, 309, 484, 570, 456, 311, 886, 867, 3956, + 13, 50780], "temperature": 0.0, "avg_logprob": -0.19648820628290592, "compression_ratio": + 1.6444444444444444, "no_speech_prob": 0.015218515880405903}, {"id": 1207, "seek": + 399248, "start": 4000.8, "end": 4002.48, "text": " Why switch is too much of a pain.", + "tokens": [50780, 1545, 3679, 307, 886, 709, 295, 257, 1822, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.19648820628290592, "compression_ratio": 1.6444444444444444, + "no_speech_prob": 0.015218515880405903}, {"id": 1208, "seek": 399248, "start": 4003.36, + "end": 4007.12, "text": " I don''t know if that kind of made sense, but that''s + sort of what I''m seeing as an issue right now.", "tokens": [50908, 286, 500, 380, + 458, 498, 300, 733, 295, 1027, 2020, 11, 457, 300, 311, 1333, 295, 437, 286, 478, + 2577, 382, 364, 2734, 558, 586, 13, 
51096], "temperature": 0.0, "avg_logprob": -0.19648820628290592, + "compression_ratio": 1.6444444444444444, "no_speech_prob": 0.015218515880405903}, + {"id": 1209, "seek": 399248, "start": 4007.92, "end": 4011.28, "text": " But it''ll + probably solve at some point.", "tokens": [51136, 583, 309, 603, 1391, 5039, 412, + 512, 935, 13, 51304], "temperature": 0.0, "avg_logprob": -0.19648820628290592, "compression_ratio": + 1.6444444444444444, "no_speech_prob": 0.015218515880405903}, {"id": 1210, "seek": + 399248, "start": 4011.28, "end": 4013.2, "text": " I think naturally that happens.", + "tokens": [51304, 286, 519, 8195, 300, 2314, 13, 51400], "temperature": 0.0, "avg_logprob": + -0.19648820628290592, "compression_ratio": 1.6444444444444444, "no_speech_prob": + 0.015218515880405903}, {"id": 1211, "seek": 399248, "start": 4014.88, "end": 4019.68, + "text": " I guess explaining it could have seen from my previous explanation of + trying to explain", "tokens": [51484, 286, 2041, 13468, 309, 727, 362, 1612, 490, + 452, 3894, 10835, 295, 1382, 281, 2903, 51724], "temperature": 0.0, "avg_logprob": + -0.19648820628290592, "compression_ratio": 1.6444444444444444, "no_speech_prob": + 0.015218515880405903}, {"id": 1212, "seek": 401968, "start": 4019.8399999999997, + "end": 4021.6, "text": " a vector search. 
It was all over the place.", "tokens": + [50372, 257, 8062, 3164, 13, 467, 390, 439, 670, 264, 1081, 13, 50460], "temperature": + 0.0, "avg_logprob": -0.20565749051278098, "compression_ratio": 1.6420664206642066, + "no_speech_prob": 0.013607233762741089}, {"id": 1213, "seek": 401968, "start": 4022.24, + "end": 4025.3599999999997, "text": " But it kind of gets hard to it''s a step.", + "tokens": [50492, 583, 309, 733, 295, 2170, 1152, 281, 309, 311, 257, 1823, 13, + 50648], "temperature": 0.0, "avg_logprob": -0.20565749051278098, "compression_ratio": + 1.6420664206642066, "no_speech_prob": 0.013607233762741089}, {"id": 1214, "seek": + 401968, "start": 4025.3599999999997, "end": 4031.6, "text": " It''s a jump and um, + yeah, not everyone will know like similarity like cosine distances.", "tokens": + [50648, 467, 311, 257, 3012, 293, 1105, 11, 1338, 11, 406, 1518, 486, 458, 411, + 32194, 411, 23565, 22182, 13, 50960], "temperature": 0.0, "avg_logprob": -0.20565749051278098, + "compression_ratio": 1.6420664206642066, "no_speech_prob": 0.013607233762741089}, + {"id": 1215, "seek": 401968, "start": 4031.6, "end": 4034.56, "text": " Like you + need to be sort of involved with machine learning.", "tokens": [50960, 1743, 291, + 643, 281, 312, 1333, 295, 3288, 365, 3479, 2539, 13, 51108], "temperature": 0.0, + "avg_logprob": -0.20565749051278098, "compression_ratio": 1.6420664206642066, "no_speech_prob": + 0.013607233762741089}, {"id": 1216, "seek": 401968, "start": 4035.2799999999997, + "end": 4039.6, "text": " I think the best way of around that is just making full + pipelines for people where you just", "tokens": [51144, 286, 519, 264, 1151, 636, + 295, 926, 300, 307, 445, 1455, 1577, 40168, 337, 561, 689, 291, 445, 51360], "temperature": + 0.0, "avg_logprob": -0.20565749051278098, "compression_ratio": 1.6420664206642066, + "no_speech_prob": 0.013607233762741089}, {"id": 1217, "seek": 401968, "start": 4039.6, + "end": 4044.48, "text": " put an image in you get your result 
and then go from there + and then from there on they can start", "tokens": [51360, 829, 364, 3256, 294, 291, + 483, 428, 1874, 293, 550, 352, 490, 456, 293, 550, 490, 456, 322, 436, 393, 722, + 51604], "temperature": 0.0, "avg_logprob": -0.20565749051278098, "compression_ratio": + 1.6420664206642066, "no_speech_prob": 0.013607233762741089}, {"id": 1218, "seek": + 401968, "start": 4044.48, "end": 4045.2799999999997, "text": " messing around with + it.", "tokens": [51604, 23258, 926, 365, 309, 13, 51644], "temperature": 0.0, "avg_logprob": + -0.20565749051278098, "compression_ratio": 1.6420664206642066, "no_speech_prob": + 0.013607233762741089}, {"id": 1219, "seek": 404528, "start": 4046.0, "end": 4049.92, + "text": " But uh, in time, everyone I think will have that and everyone''s working + for that.", "tokens": [50400, 583, 2232, 11, 294, 565, 11, 1518, 286, 519, 486, + 362, 300, 293, 1518, 311, 1364, 337, 300, 13, 50596], "temperature": 0.0, "avg_logprob": + -0.20110034942626953, "compression_ratio": 1.652014652014652, "no_speech_prob": + 0.01567445695400238}, {"id": 1220, "seek": 404528, "start": 4050.6400000000003, + "end": 4055.76, "text": " Yeah, I think what you said makes a lot of sense and thanks + for bringing up this topic, you know,", "tokens": [50632, 865, 11, 286, 519, 437, + 291, 848, 1669, 257, 688, 295, 2020, 293, 3231, 337, 5062, 493, 341, 4829, 11, 291, + 458, 11, 50888], "temperature": 0.0, "avg_logprob": -0.20110034942626953, "compression_ratio": + 1.652014652014652, "no_speech_prob": 0.01567445695400238}, {"id": 1221, "seek": + 404528, "start": 4055.76, "end": 4063.28, "text": " standardization because um, + on one hand it basically points us to think that this field is still", "tokens": + [50888, 3832, 2144, 570, 1105, 11, 322, 472, 1011, 309, 1936, 2793, 505, 281, 519, + 300, 341, 2519, 307, 920, 51264], "temperature": 0.0, "avg_logprob": -0.20110034942626953, + "compression_ratio": 1.652014652014652, "no_speech_prob": 0.01567445695400238}, + 
{"id": 1222, "seek": 404528, "start": 4063.28, "end": 4070.0800000000004, "text": + " fragmented, right? I''ve, I''ve blogged about it. I had six databases and then + one evening I get", "tokens": [51264, 9241, 14684, 11, 558, 30, 286, 600, 11, 286, + 600, 6968, 3004, 466, 309, 13, 286, 632, 2309, 22380, 293, 550, 472, 5634, 286, + 483, 51604], "temperature": 0.0, "avg_logprob": -0.20110034942626953, "compression_ratio": + 1.652014652014652, "no_speech_prob": 0.01567445695400238}, {"id": 1223, "seek": + 404528, "start": 4070.0800000000004, "end": 4074.2400000000002, "text": " a comment + on the blog that hey, we are the new key on the blog. Can you add us?", "tokens": + [51604, 257, 2871, 322, 264, 6968, 300, 4177, 11, 321, 366, 264, 777, 2141, 322, + 264, 6968, 13, 1664, 291, 909, 505, 30, 51812], "temperature": 0.0, "avg_logprob": + -0.20110034942626953, "compression_ratio": 1.652014652014652, "no_speech_prob": + 0.01567445695400238}, {"id": 1224, "seek": 407424, "start": 4074.3999999999996, + "end": 4078.08, "text": " That''s the seventh database, right? So how many more + there are?", "tokens": [50372, 663, 311, 264, 17875, 8149, 11, 558, 30, 407, 577, + 867, 544, 456, 366, 30, 50556], "temperature": 0.0, "avg_logprob": -0.20606477120343378, + "compression_ratio": 1.719869706840391, "no_speech_prob": 0.003913502674549818}, + {"id": 1225, "seek": 407424, "start": 4078.08, "end": 4082.24, "text": " Probably, + yeah, dense. I don''t know. Oh yeah, always popping up.", "tokens": [50556, 9210, + 11, 1338, 11, 18011, 13, 286, 500, 380, 458, 13, 876, 1338, 11, 1009, 18374, 493, + 13, 50764], "temperature": 0.0, "avg_logprob": -0.20606477120343378, "compression_ratio": + 1.719869706840391, "no_speech_prob": 0.003913502674549818}, {"id": 1226, "seek": + 407424, "start": 4082.24, "end": 4086.4799999999996, "text": " But it''s like it''s + good for innovation. 
It''s just like we''re competing against ourselves.", "tokens": + [50764, 583, 309, 311, 411, 309, 311, 665, 337, 8504, 13, 467, 311, 445, 411, 321, + 434, 15439, 1970, 4175, 13, 50976], "temperature": 0.0, "avg_logprob": -0.20606477120343378, + "compression_ratio": 1.719869706840391, "no_speech_prob": 0.003913502674549818}, + {"id": 1227, "seek": 407424, "start": 4086.4799999999996, "end": 4090.56, "text": + " Like we''re competing in everything but no one else really cares.", "tokens": + [50976, 1743, 321, 434, 15439, 294, 1203, 457, 572, 472, 1646, 534, 12310, 13, 51180], + "temperature": 0.0, "avg_logprob": -0.20606477120343378, "compression_ratio": 1.719869706840391, + "no_speech_prob": 0.003913502674549818}, {"id": 1228, "seek": 407424, "start": 4090.56, + "end": 4093.68, "text": " Like we can all compete against each other but the people + that are actually going to use", "tokens": [51180, 1743, 321, 393, 439, 11831, 1970, + 1184, 661, 457, 264, 561, 300, 366, 767, 516, 281, 764, 51336], "temperature": 0.0, + "avg_logprob": -0.20606477120343378, "compression_ratio": 1.719869706840391, "no_speech_prob": + 0.003913502674549818}, {"id": 1229, "seek": 407424, "start": 4093.68, "end": 4098.639999999999, + "text": " are going to be like look at this mess. Why would we go into that area?", + "tokens": [51336, 366, 516, 281, 312, 411, 574, 412, 341, 2082, 13, 1545, 576, 321, + 352, 666, 300, 1859, 30, 51584], "temperature": 0.0, "avg_logprob": -0.20606477120343378, + "compression_ratio": 1.719869706840391, "no_speech_prob": 0.003913502674549818}, + {"id": 1230, "seek": 407424, "start": 4099.36, "end": 4103.92, "text": " Yeah. 
And + that there was actually, you know, if you know the relevancy and matching", "tokens": + [51620, 865, 13, 400, 300, 456, 390, 767, 11, 291, 458, 11, 498, 291, 458, 264, + 25916, 6717, 293, 14324, 51848], "temperature": 0.0, "avg_logprob": -0.20606477120343378, + "compression_ratio": 1.719869706840391, "no_speech_prob": 0.003913502674549818}, + {"id": 1231, "seek": 410392, "start": 4103.92, "end": 4108.32, "text": " slack, + I don''t know if you''re on it, it''s like a community of all search", "tokens": + [50364, 29767, 11, 286, 500, 380, 458, 498, 291, 434, 322, 309, 11, 309, 311, 411, + 257, 1768, 295, 439, 3164, 50584], "temperature": 0.0, "avg_logprob": -0.2064945502359359, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.006917283404618502}, + {"id": 1232, "seek": 410392, "start": 4108.32, "end": 4113.52, "text": " in 3DS + consultants. I think I am. Yeah. Yeah. Yeah. It''s it''s it''s fantastic place. + I''ll also", "tokens": [50584, 294, 805, 11844, 38935, 13, 286, 519, 286, 669, 13, + 865, 13, 865, 13, 865, 13, 467, 311, 309, 311, 309, 311, 5456, 1081, 13, 286, 603, + 611, 50844], "temperature": 0.0, "avg_logprob": -0.2064945502359359, "compression_ratio": + 1.7234848484848484, "no_speech_prob": 0.006917283404618502}, {"id": 1233, "seek": + 410392, "start": 4113.52, "end": 4118.4, "text": " make sure to link it in the notes. 
+ And there was like one very interesting piece on like actually", "tokens": [50844, + 652, 988, 281, 2113, 309, 294, 264, 5570, 13, 400, 456, 390, 411, 472, 588, 1880, + 2522, 322, 411, 767, 51088], "temperature": 0.0, "avg_logprob": -0.2064945502359359, + "compression_ratio": 1.7234848484848484, "no_speech_prob": 0.006917283404618502}, + {"id": 1234, "seek": 410392, "start": 4118.4, "end": 4124.8, "text": " touching + on what you just said, you know, there was like a heated discussion on like how + should", "tokens": [51088, 11175, 322, 437, 291, 445, 848, 11, 291, 458, 11, 456, + 390, 411, 257, 18806, 5017, 322, 411, 577, 820, 51408], "temperature": 0.0, "avg_logprob": + -0.2064945502359359, "compression_ratio": 1.7234848484848484, "no_speech_prob": + 0.006917283404618502}, {"id": 1235, "seek": 410392, "start": 4124.8, "end": 4130.08, + "text": " we call it pre-filtering, single-stage filtering, something filtering. + And you know, like when", "tokens": [51408, 321, 818, 309, 659, 12, 19776, 34200, + 11, 2167, 12, 17882, 30822, 11, 746, 30822, 13, 400, 291, 458, 11, 411, 562, 51672], + "temperature": 0.0, "avg_logprob": -0.2064945502359359, "compression_ratio": 1.7234848484848484, + "no_speech_prob": 0.006917283404618502}, {"id": 1236, "seek": 413008, "start": 4130.08, + "end": 4135.2, "text": " you invented that, you go to your users and you say, yeah, + today we released a single-stage", "tokens": [50364, 291, 14479, 300, 11, 291, 352, + 281, 428, 5022, 293, 291, 584, 11, 1338, 11, 965, 321, 4736, 257, 2167, 12, 17882, + 50620], "temperature": 0.0, "avg_logprob": -0.15147773490464392, "compression_ratio": + 1.728301886792453, "no_speech_prob": 0.003275364637374878}, {"id": 1237, "seek": + 413008, "start": 4135.2, "end": 4140.5599999999995, "text": " pre-filter after filter + filtering. So please use it, right? 
And then some other company comes", "tokens": + [50620, 659, 12, 19776, 391, 934, 6608, 30822, 13, 407, 1767, 764, 309, 11, 558, + 30, 400, 550, 512, 661, 2237, 1487, 50888], "temperature": 0.0, "avg_logprob": -0.15147773490464392, + "compression_ratio": 1.728301886792453, "no_speech_prob": 0.003275364637374878}, + {"id": 1238, "seek": 413008, "start": 4140.5599999999995, "end": 4146.32, "text": + " and says, no, we invented another one. It''s called after pre-filtering single-stage + with", "tokens": [50888, 293, 1619, 11, 572, 11, 321, 14479, 1071, 472, 13, 467, + 311, 1219, 934, 659, 12, 19776, 34200, 2167, 12, 17882, 365, 51176], "temperature": + 0.0, "avg_logprob": -0.15147773490464392, "compression_ratio": 1.728301886792453, + "no_speech_prob": 0.003275364637374878}, {"id": 1239, "seek": 413008, "start": 4146.32, + "end": 4151.04, "text": " double sub-stages, you know, I''m just making it up obviously. + Exactly what you mean.", "tokens": [51176, 3834, 1422, 12, 372, 1660, 11, 291, 458, + 11, 286, 478, 445, 1455, 309, 493, 2745, 13, 7587, 437, 291, 914, 13, 51412], "temperature": + 0.0, "avg_logprob": -0.15147773490464392, "compression_ratio": 1.728301886792453, + "no_speech_prob": 0.003275364637374878}, {"id": 1240, "seek": 413008, "start": 4151.04, + "end": 4155.68, "text": " I''ve exaggerating, right? And then that''s what you said. + Eventually it will hurt the users because", "tokens": [51412, 286, 600, 19123, 990, + 11, 558, 30, 400, 550, 300, 311, 437, 291, 848, 13, 17586, 309, 486, 4607, 264, + 5022, 570, 51644], "temperature": 0.0, "avg_logprob": -0.15147773490464392, "compression_ratio": + 1.728301886792453, "no_speech_prob": 0.003275364637374878}, {"id": 1241, "seek": + 415568, "start": 4155.68, "end": 4161.280000000001, "text": " they will say, oh + no, no, no, I have that single-stage filter after filtering. 
I will not go and", + "tokens": [50364, 436, 486, 584, 11, 1954, 572, 11, 572, 11, 572, 11, 286, 362, + 300, 2167, 12, 17882, 6608, 934, 30822, 13, 286, 486, 406, 352, 293, 50644], "temperature": + 0.0, "avg_logprob": -0.15117944735232916, "compression_ratio": 1.6266094420600858, + "no_speech_prob": 0.010430784896016121}, {"id": 1242, "seek": 415568, "start": 4161.280000000001, + "end": 4168.320000000001, "text": " switch it to another one, right? Yeah. No, exactly + then. It''s just but I think it''s it''s all young.", "tokens": [50644, 3679, 309, + 281, 1071, 472, 11, 558, 30, 865, 13, 883, 11, 2293, 550, 13, 467, 311, 445, 457, + 286, 519, 309, 311, 309, 311, 439, 2037, 13, 50996], "temperature": 0.0, "avg_logprob": + -0.15117944735232916, "compression_ratio": 1.6266094420600858, "no_speech_prob": + 0.010430784896016121}, {"id": 1243, "seek": 415568, "start": 4168.320000000001, + "end": 4173.200000000001, "text": " I''m like kind of new to this whole field of + how things work. But I think it''s natural for", "tokens": [50996, 286, 478, 411, + 733, 295, 777, 281, 341, 1379, 2519, 295, 577, 721, 589, 13, 583, 286, 519, 309, + 311, 3303, 337, 51240], "temperature": 0.0, "avg_logprob": -0.15117944735232916, + "compression_ratio": 1.6266094420600858, "no_speech_prob": 0.010430784896016121}, + {"id": 1244, "seek": 415568, "start": 4174.16, "end": 4179.92, "text": " these young + someone like everyone''s going to race at the top and see. And you just got to do", + "tokens": [51288, 613, 2037, 1580, 411, 1518, 311, 516, 281, 4569, 412, 264, 1192, + 293, 536, 13, 400, 291, 445, 658, 281, 360, 51576], "temperature": 0.0, "avg_logprob": + -0.15117944735232916, "compression_ratio": 1.6266094420600858, "no_speech_prob": + 0.010430784896016121}, {"id": 1245, "seek": 417992, "start": 4179.92, "end": 4186.16, + "text": " what we got to do. 
But it''s interesting how it''s all going to play out + if there is going to be more", "tokens": [50364, 437, 321, 658, 281, 360, 13, 583, + 309, 311, 1880, 577, 309, 311, 439, 516, 281, 862, 484, 498, 456, 307, 516, 281, + 312, 544, 50676], "temperature": 0.0, "avg_logprob": -0.1569471831368928, "compression_ratio": + 1.6244541484716157, "no_speech_prob": 0.010222727432847023}, {"id": 1246, "seek": + 417992, "start": 4186.16, "end": 4192.32, "text": " communication between everyone + if there''s not. I don''t know. It''s might be a little bit above my", "tokens": + [50676, 6101, 1296, 1518, 498, 456, 311, 406, 13, 286, 500, 380, 458, 13, 467, 311, + 1062, 312, 257, 707, 857, 3673, 452, 50984], "temperature": 0.0, "avg_logprob": + -0.1569471831368928, "compression_ratio": 1.6244541484716157, "no_speech_prob": + 0.010222727432847023}, {"id": 1247, "seek": 417992, "start": 4192.32, "end": 4198.96, + "text": " what I''m doing. But we''ll see future awaits with all of this. But I + still feel like", "tokens": [50984, 437, 286, 478, 884, 13, 583, 321, 603, 536, + 2027, 45955, 365, 439, 295, 341, 13, 583, 286, 920, 841, 411, 51316], "temperature": + 0.0, "avg_logprob": -0.1569471831368928, "compression_ratio": 1.6244541484716157, + "no_speech_prob": 0.010222727432847023}, {"id": 1248, "seek": 417992, "start": 4200.88, + "end": 4206.64, "text": " in the end of the day, you really need to focus on the + users, right? You''re not focusing on", "tokens": [51412, 294, 264, 917, 295, 264, + 786, 11, 291, 534, 643, 281, 1879, 322, 264, 5022, 11, 558, 30, 509, 434, 406, 8416, + 322, 51700], "temperature": 0.0, "avg_logprob": -0.1569471831368928, "compression_ratio": + 1.6244541484716157, "no_speech_prob": 0.010222727432847023}, {"id": 1249, "seek": + 420664, "start": 4206.64, "end": 4211.92, "text": " inventing a new turnbook or + a new dictionary for vector search. 
Eventually it will be published,", "tokens": + [50364, 7962, 278, 257, 777, 1261, 2939, 420, 257, 777, 25890, 337, 8062, 3164, + 13, 17586, 309, 486, 312, 6572, 11, 50628], "temperature": 0.0, "avg_logprob": -0.19060383028197056, + "compression_ratio": 1.5511811023622046, "no_speech_prob": 0.013427691534161568}, + {"id": 1250, "seek": 420664, "start": 4211.92, "end": 4217.200000000001, "text": + " by the way. I''m sure there will be so many terms. It will be published. But like + we need to work", "tokens": [50628, 538, 264, 636, 13, 286, 478, 988, 456, 486, + 312, 370, 867, 2115, 13, 467, 486, 312, 6572, 13, 583, 411, 321, 643, 281, 589, + 50892], "temperature": 0.0, "avg_logprob": -0.19060383028197056, "compression_ratio": + 1.5511811023622046, "no_speech_prob": 0.013427691534161568}, {"id": 1251, "seek": + 420664, "start": 4217.200000000001, "end": 4224.88, "text": " to that. Yeah. No, + I agree. 100%. So hey Philip, like it''s been really great discussion. I was thinking", + "tokens": [50892, 281, 300, 13, 865, 13, 883, 11, 286, 3986, 13, 2319, 6856, 407, + 4177, 21144, 11, 411, 309, 311, 668, 534, 869, 5017, 13, 286, 390, 1953, 51276], + "temperature": 0.0, "avg_logprob": -0.19060383028197056, "compression_ratio": 1.5511811023622046, + "no_speech_prob": 0.013427691534161568}, {"id": 1252, "seek": 420664, "start": 4224.88, + "end": 4230.8, "text": " like, would you like to announce something to the users + of Milbus or maybe those who are not yet", "tokens": [51276, 411, 11, 576, 291, + 411, 281, 7478, 746, 281, 264, 5022, 295, 7036, 21441, 420, 1310, 729, 567, 366, + 406, 1939, 51572], "temperature": 0.0, "avg_logprob": -0.19060383028197056, "compression_ratio": + 1.5511811023622046, "no_speech_prob": 0.013427691534161568}, {"id": 1253, "seek": + 423080, "start": 4230.8, "end": 4238.24, "text": " using Milbus, but they would + like to try it out. Yeah. 
So we have Hector profess get involved.", "tokens": [50364, + 1228, 7036, 21441, 11, 457, 436, 576, 411, 281, 853, 309, 484, 13, 865, 13, 407, + 321, 362, 389, 20814, 2668, 483, 3288, 13, 50736], "temperature": 0.0, "avg_logprob": + -0.2809847629431522, "compression_ratio": 1.5648535564853556, "no_speech_prob": + 0.018864192068576813}, {"id": 1254, "seek": 423080, "start": 4238.24, "end": 4244.88, + "text": " And it''s a pretty easy one to check out and see how our system works. + And we are releasing a general", "tokens": [50736, 400, 309, 311, 257, 1238, 1858, + 472, 281, 1520, 484, 293, 536, 577, 527, 1185, 1985, 13, 400, 321, 366, 16327, 257, + 2674, 51068], "temperature": 0.0, "avg_logprob": -0.2809847629431522, "compression_ratio": + 1.5648535564853556, "no_speech_prob": 0.018864192068576813}, {"id": 1255, "seek": + 423080, "start": 4244.88, "end": 4249.84, "text": " release candidate. So pretty + much are kind of tried and true. Milbus 2.0 coming month.", "tokens": [51068, 4374, + 11532, 13, 407, 1238, 709, 366, 733, 295, 3031, 293, 2074, 13, 7036, 21441, 568, + 13, 15, 1348, 1618, 13, 51316], "temperature": 0.0, "avg_logprob": -0.2809847629431522, + "compression_ratio": 1.5648535564853556, "no_speech_prob": 0.018864192068576813}, + {"id": 1256, "seek": 423080, "start": 4251.04, "end": 4256.16, "text": " And you + also mentioned that other system toy, you said, what is it about? And when will + the", "tokens": [51376, 400, 291, 611, 2835, 300, 661, 1185, 12058, 11, 291, 848, + 11, 437, 307, 309, 466, 30, 400, 562, 486, 264, 51632], "temperature": 0.0, "avg_logprob": + -0.2809847629431522, "compression_ratio": 1.5648535564853556, "no_speech_prob": + 0.018864192068576813}, {"id": 1257, "seek": 425616, "start": 4256.8, "end": 4263.12, + "text": " yeah, Toby is a ML pipeline software kind of simplifying mainly for embeddings. 
+ And it''s all", "tokens": [50396, 1338, 11, 40223, 307, 257, 21601, 15517, 4722, + 733, 295, 6883, 5489, 8704, 337, 12240, 29432, 13, 400, 309, 311, 439, 50712], "temperature": + 0.0, "avg_logprob": -0.16463831436535545, "compression_ratio": 1.7162629757785468, + "no_speech_prob": 0.01979568414390087}, {"id": 1258, "seek": 425616, "start": 4263.12, + "end": 4267.92, "text": " about embeddings and kind of making these for you. So + it''s pipeline system. Everyone can operate and", "tokens": [50712, 466, 12240, + 29432, 293, 733, 295, 1455, 613, 337, 291, 13, 407, 309, 311, 15517, 1185, 13, 5198, + 393, 9651, 293, 50952], "temperature": 0.0, "avg_logprob": -0.16463831436535545, + "compression_ratio": 1.7162629757785468, "no_speech_prob": 0.01979568414390087}, + {"id": 1259, "seek": 425616, "start": 4268.5599999999995, "end": 4273.599999999999, + "text": " everyone can upload their solutions if they want to download them. And + yeah, still in the working", "tokens": [50984, 1518, 393, 6580, 641, 6547, 498, + 436, 528, 281, 5484, 552, 13, 400, 1338, 11, 920, 294, 264, 1364, 51236], "temperature": + 0.0, "avg_logprob": -0.16463831436535545, "compression_ratio": 1.7162629757785468, + "no_speech_prob": 0.01979568414390087}, {"id": 1260, "seek": 425616, "start": 4273.599999999999, + "end": 4278.5599999999995, "text": " progress, but look out for it because I think + it''s going to help in a lot of these areas. Yeah, that''s", "tokens": [51236, 4205, + 11, 457, 574, 484, 337, 309, 570, 286, 519, 309, 311, 516, 281, 854, 294, 257, 688, + 295, 613, 3179, 13, 865, 11, 300, 311, 51484], "temperature": 0.0, "avg_logprob": + -0.16463831436535545, "compression_ratio": 1.7162629757785468, "no_speech_prob": + 0.01979568414390087}, {"id": 1261, "seek": 425616, "start": 4278.5599999999995, + "end": 4284.32, "text": " super cool. 
That sounds very exciting, you know, to kind + of take this package and kind of plug things", "tokens": [51484, 1687, 1627, 13, + 663, 3263, 588, 4670, 11, 291, 458, 11, 281, 733, 295, 747, 341, 7372, 293, 733, + 295, 5452, 721, 51772], "temperature": 0.0, "avg_logprob": -0.16463831436535545, + "compression_ratio": 1.7162629757785468, "no_speech_prob": 0.01979568414390087}, + {"id": 1262, "seek": 428432, "start": 4284.32, "end": 4289.92, "text": " in and + try it out for real on the real day. Exactly. That''s fantastic. Thanks for doing + this. We''ll", "tokens": [50364, 294, 293, 853, 309, 484, 337, 957, 322, 264, 957, + 786, 13, 7587, 13, 663, 311, 5456, 13, 2561, 337, 884, 341, 13, 492, 603, 50644], + "temperature": 0.0, "avg_logprob": -0.19180492688250797, "compression_ratio": 1.5956521739130434, + "no_speech_prob": 0.009798945859074593}, {"id": 1263, "seek": 428432, "start": 4289.92, + "end": 4296.719999999999, "text": " make sure to also kind of mention this in the + show notes or link if by the time it''s there.", "tokens": [50644, 652, 988, 281, + 611, 733, 295, 2152, 341, 294, 264, 855, 5570, 420, 2113, 498, 538, 264, 565, 309, + 311, 456, 13, 50984], "temperature": 0.0, "avg_logprob": -0.19180492688250797, "compression_ratio": + 1.5956521739130434, "no_speech_prob": 0.009798945859074593}, {"id": 1264, "seek": + 428432, "start": 4297.759999999999, "end": 4304.08, "text": " Yeah, awesome. Thanks, + Philip, so much for your time for going so deep with me on", "tokens": [51036, 865, + 11, 3476, 13, 2561, 11, 21144, 11, 370, 709, 337, 428, 565, 337, 516, 370, 2452, + 365, 385, 322, 51352], "temperature": 0.0, "avg_logprob": -0.19180492688250797, + "compression_ratio": 1.5956521739130434, "no_speech_prob": 0.009798945859074593}, + {"id": 1265, "seek": 428432, "start": 4304.08, "end": 4310.639999999999, "text": + " on even philosophy behind neural networks and then sharing your ideas and thoughts. 
+ Thanks so", "tokens": [51352, 322, 754, 10675, 2261, 18161, 9590, 293, 550, 5414, + 428, 3487, 293, 4598, 13, 2561, 370, 51680], "temperature": 0.0, "avg_logprob": + -0.19180492688250797, "compression_ratio": 1.5956521739130434, "no_speech_prob": + 0.009798945859074593}, {"id": 1266, "seek": 431064, "start": 4310.64, "end": 4316.88, + "text": " much. And I hope we can make another episode at some point down the road + if you''re open to it,", "tokens": [50364, 709, 13, 400, 286, 1454, 321, 393, 652, + 1071, 3500, 412, 512, 935, 760, 264, 3060, 498, 291, 434, 1269, 281, 309, 11, 50676], + "temperature": 0.0, "avg_logprob": -0.15834888458251953, "compression_ratio": 1.5257731958762886, + "no_speech_prob": 0.007455460727214813}, {"id": 1267, "seek": 431064, "start": 4316.88, + "end": 4321.68, "text": " especially as the company materials and the product materials. + And you get more use cases. So I''m", "tokens": [50676, 2318, 382, 264, 2237, 5319, + 293, 264, 1674, 5319, 13, 400, 291, 483, 544, 764, 3331, 13, 407, 286, 478, 50916], + "temperature": 0.0, "avg_logprob": -0.15834888458251953, "compression_ratio": 1.5257731958762886, + "no_speech_prob": 0.007455460727214813}, {"id": 1268, "seek": 431064, "start": 4321.68, + "end": 4327.200000000001, "text": " looking forward for more blog posts as well. + Awesome. Yeah. Yeah, thanks for having me. It was a really", "tokens": [50916, 1237, + 2128, 337, 544, 6968, 12300, 382, 731, 13, 10391, 13, 865, 13, 865, 11, 3231, 337, + 1419, 385, 13, 467, 390, 257, 534, 51192], "temperature": 0.0, "avg_logprob": -0.15834888458251953, + "compression_ratio": 1.5257731958762886, "no_speech_prob": 0.007455460727214813}, + {"id": 1269, "seek": 432720, "start": 4327.2, "end": 4331.12, "text": " fun discussion. + Thanks so much, Philip. Bye bye. 
Bye.", "tokens": [50364, 1019, 5017, 13, 2561, + 370, 709, 11, 21144, 13, 4621, 6543, 13, 4621, 13, 50560], "temperature": 0.0, "avg_logprob": + -0.39165766098920035, "compression_ratio": 0.9298245614035088, "no_speech_prob": + 0.15921558439731598}]' +--- + +All right, Vector Podcast, episode three. And today we have Philip Altmeier, data engineer at Zilliz, who works a lot with users. And the company is building the vector search database called Milvus. Hey Philip. Nice to meet you. Yeah, you got it. Data engineer at Zilliz, pretty much me. +Yeah, awesome, awesome. Nice to meet you as well. And thanks for joining the show. And yeah, usually I would like to start with you introducing yourself to our audience, what's your background and how you ended up working for Zilliz. Sounds good. +Yes, so you got my name already, Philip Altmeier. I graduated from UC Santa Cruz with a BS in computer science in 2020. So right at the start, during COVID. And then out of college, I really wanted to kind of get into the startup scene. +And I was doing a lot of things, machine learning, taking a lot of classes, doing projects. And when I was going to look for a job, I realized anything machine learning related, you have to have a PhD. You have to be doing a master's, a PhD, extra work. +You're not getting out of, you're not getting into the field out of college. So the next step was like, okay, what's kind of new and growing in that field? Somewhere where there isn't already so much of this set knowledge. And that's where vector search came in. +And then Zilliz, I found them and did the whole process. And I really thought I fit in. And that's where that took off. So that's kind of how I got to where I am right now. +But pretty much yeah, straight out of college, getting into the whole field and figuring everything out on how it all works. Oh yeah, that's cool. And can you tell me a bit more? I've been also doing some tech stuff for a few years here and there. 
+But like data engineer and you work with users. How exactly does that look? Yeah, so data engineer gets thrown around a lot at a lot of companies. Right now, so for me, how it works for me, data engineer falls into kind of just user success and more also pre-sale style of things. +So how to use our tech, creating new use cases, like we have a bootcamp where we show examples of how to use Milvus. That's kind of what we're doing. We're talking to the customers that are trying to learn how they can implement it in their system, what problems they're having. +So we're the ones that are kind of front facing in the company. And then as a data engineer, I've also worked a lot on the cloud deployment, figuring that out, optimizing that, and worked on some development aspects as well. But it's kind of a lot of hats. +So it's like startup data engineer, at least here, it's pretty much just whatever needs help or whatever needs work. You kind of get put on that. So it's a cool opportunity to try a lot of it and get to meet a lot of customers and cool people in all different parts of the field. +Yeah, and you also learn to interact with users because they bring a different perspective on things. They probably don't focus as much on the internals, but they need to solve something. Right? Oh yeah, it's definitely that. So it's figuring out, seeing their use cases is always crazy. +Seeing how much data they're dealing with, these cool ideas of what they're trying to do, and seeing how we can make it work. And usually it all goes well. We can figure out solutions. We work together. We kind of keep relationships. I think it's really cool. And sometimes it doesn't work. +Sometimes we need to put more things into Milvus. So we kind of keep the communication line open and we kind of figure out more things to put in, kind of get their input on what we're working on and go from there. But it's a lot of back and forth conversation with users and customers. +Yeah, for sure. 
It's kind of like, first you need to learn what it is that they're trying to do, right? Before, like, you suggest any solution, because it takes a lot of time. +Do you feel the same way when you talk to them? Like instead of kind of jumping in and like solving, you kind of try to figure it out. Or you do something else. Do you have a different approach? +So I think I go the way of like, yeah, kind of get all the info first because everyone's use case is different. None of them are ever the same. And then kind of come back to the team, kind of discuss it. +See, do we have anyone else that's been doing something sort of similar? Do we have a solution? Because sometimes we've had to do hacky solutions with, like, our previous version. +There were things that just weren't up to par and you couldn't really change it that much because it was like kind of on an old style of doing it and it didn't work. So you kind of do some hacky solution for them. Some way to trick it into working out in production. +But then we take notes of that and kind of later on put it into, like, so like when we're going to a new version, okay, we got to think about this. We got to improve on this. And yeah, but it all starts with kind of figuring out what they're doing, getting the whole picture. +And sometimes like there's always conversations where they can't really say everything because a lot of these places, these companies, there's a reason that they, like, they got to be secretive, because it's a new field and they may have a really good new idea. +But it's kind of like extracting as much as we can without like crossing those bounds, getting that info from the little they can tell and seeing if it can help them with what we got. But it's like a big team effort. It's not just me. +So I talk to them, bring it back and then we all work together to solve it. Yeah, yeah, for sure, for sure. And are your users kind of aware that you are helping them with vector search? 
+Do they even care? So it's fifty, maybe not fifty, like I would say 70% of the people we talk to are aware of it. They come to us for help. +So they know what they're kind of getting into and they know what they need vector search for and they know like what they're doing and why they're doing it. But sometimes we also get people that are, hey, like I want to find similar images. +And there it's like, we have like the simple tutorials that kind of deal with it, but they want to know more about it. So there is some explaining of what vector search is. +What vectors are, sometimes. And it's understandable, like not everyone goes and studies machine learning and knows what vectors are, the math as well. +But I would say 70% of the time they get it and they know what they're getting into, but 30% of the time it's also just like a whole new world and, like, we explain it, but you can't explain it all in one day. There's a lot of stuff that goes behind it. +Sure you might touch on vectors one day, but then you have to get into the algorithms the next day, and then it's sort of like keeping that relationship and answering questions whenever they come up. Oh yeah, oh yeah, for sure. +And are you, like, most of the time you're using Milvus, right? Like as part of your user engagement, or, like, how does it look? +So you bring the database and you say, you know, it can solve a bunch of different use cases, but you know, we also need to vectorize your data, or maybe they bring the vectors. +How does it look? So they usually bring the vectors themselves. We're currently working on, we're building something a little bit above Milvus for actually getting the vectors, but that's still in the working progress and it should be releasing soon. +But for now it's always, we have like our examples, we have a bootcamp where we pull up, like, the basic pipeline. 
It's always around Milvus, but for images we have a ResNet-50 and we show them how it goes in that process. So that's for the 30%: we go over one pipeline. It's a small file, because it's just three steps: encode it, embed it, and then search it.

But yeah, most of the time they already have their embeddings, those bigger companies who know what vector search is and what they're getting into. They already have their 512 dimensions, 10 million vectors. So they just want to see: okay, how fast can you do it? What are the bottlenecks? What do we need to scale out if we're going to scale? So again it's 70/30, and for the 30% you usually need to go over the actual embeddings as well, just a quick neural net lesson.

Yeah, yeah. Or maybe it's in the company's culture to dive deeper into what they are doing, right? And maybe they think they can take it over and run with it once they've learned it. But then you said 70% are kind of, here is my problem, can you solve it, can we handle this?

Exactly. That's usually the 70%: they have their own neural net, and sometimes they don't want to tell us what neural nets they're using or exactly what their data looks like. But they give us: okay, this many dimensions, this many vectors, this many read requests and write requests, will it work? And then we go from there and see if we can solve it.

Yeah, sounds cool. But can you actually tell me, what is vector search, and what is Milvus?

Okay, sure. So, vector search: pretty much a way to search vectors. Let me go over it. With numbers, you have values that are easily comparable, greater than, equal to, less than, and you can store them in relational databases.
So to actually index those and search quickly through them, you can do things like B-trees. That's a very efficient and very fast way of searching for a value. Vectors, on the other hand, don't really have that kind of direct comparison. You have similarity metrics, which are math equations where you find out how far two vectors diverge. But it doesn't tell you, okay, this element is diverging this much; it's a lump sum over every value in the vector combined: this is how different the entire thing is. And that makes indexing a little more difficult, because you start relying on more approximate algorithms.

So that's approximate nearest neighbor search, which is pretty much all of vector search: a library of algorithms. There's clustering-based, graph-based, tree-based. The big names right now: for inverted file we have Faiss; that clustering is based on centroids, and you store values in the inverted file and search through that. There's tree-based, which is Spotify's Annoy, what they're using for their music recommendations; that's building trees, splitting all your data by hyperplanes, and then going left or right. Then we have graph-based, which is HNSW, I think the biggest one right now. They pretty much start with a very sparse, very empty graph on the top layer, you find the closest point, then you drop down to a lower layer where it gets more dense, and you keep dropping and dropping. And then there's locality-sensitive hashing: with normal hash algorithms you avoid collisions, but with locality-sensitive hashing you try to get collisions. If you get collisions, that means the items are close together.

And one thing I forgot to go over is the data types that this brings up.
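The locality-sensitive hashing idea just described, where collisions signal closeness, can be sketched with random hyperplanes. This is an illustrative sketch, not Milvus or Faiss code; the dimensionality and number of planes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(vectors, hyperplanes):
    """Random-hyperplane LSH: one bit per hyperplane, set by which side
    of the plane each vector falls on. Vectors at a small angle tend to
    collide on most bits; unrelated vectors disagree on roughly half."""
    return (vectors @ hyperplanes.T > 0).astype(np.uint8)

dim, n_planes = 64, 16
planes = rng.standard_normal((n_planes, dim))

base = rng.standard_normal(dim)
near = base + 0.05 * rng.standard_normal(dim)   # small perturbation of base
far = rng.standard_normal(dim)                  # unrelated vector

sig_base = lsh_signature(base[None, :], planes)[0]
sig_near = lsh_signature(near[None, :], planes)[0]
sig_far = lsh_signature(far[None, :], planes)[0]

# Hamming distance between signatures approximates angular distance.
flips_near = int((sig_base != sig_near).sum())
flips_far = int((sig_base != sig_far).sum())
```

Bucketing by signature then turns a nearest-neighbor query into a hash lookup: only vectors that collide with the query need exact comparison.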
So there is structured data, those numbers and strings, things that can be easily compared, and then there's unstructured data, which is pretty much images, videos, medical data, things that computers can't easily understand. With unstructured data, you throw it through a neural net and you get those vectors we talked about. With structured data, you can just take the data itself, because it can be easily compared; it's already something a computer understands. And then there's semi-structured data in between, things like emails: there's structure to them, you have the body, the header, the sending address, every email has those, but the data inside is unstructured. That's where you use a mix of both.

But yeah, vector search gets a little complicated. The main way to think about it: you have unstructured data that your computer does not understand whatsoever. You can have two images a pixel apart, and half the time, if your algorithm is not good, your computer will think they're two completely different pictures. It won't get it. So you take that unstructured data, throw it through a neural net, get vectors, and then use those previous algorithms to find things that are similar. And that's how you can quickly search through it.

Right, right. So you mentioned these several algorithms, and I agree, I read this paper as well, this is cool. But just to satisfy my curiosity, where would you put the product quantization methods, which are also implemented in Faiss and maybe somewhere else too? Is that a fundamentally different approach compared to LSH, graphs, trees, or is it something else in your book?

So with Faiss, with that quantization, I'd put that as part of the clustering-based ones. I looked a bit into it.
This one goes a little deeper, and I didn't really work on it that much, but I did end up looking into it. Pretty much, the way I saw it, it's simplifying things for clustering, which is where I would classify it. You need something to simplify and speed it up. So in Faiss you have the quantization-based indexes: you have the flat ones, which aren't quantized, you have the scalar-quantized ones, and a few more among the names. But it's just a way of speeding up an algorithm that's already in use. I'm not sure how well it would work with the other algorithms. Like using that quantization and then trying it on Annoy: you quantize everything and then start doing the splits. It might speed things up, but that's a little outside of what I know. If it works with Faiss, I believe it could work with the others; I just don't think anyone's done it yet.

Yeah, I mean, I agree. And there are a number of approaches where they combine things. If you take the DiskANN paper from Microsoft, from the Bing team I think, they combine HNSW with product quantization, and they also have clustering. So it's kind of a three-phase algorithm. First they cluster the points and get the centroids. Then they quantize the vectors, losing some precision, so that you can actually load them in memory. And then, for each cluster, for each shard you could say, they build the graph layout. So it's a few-step approach: your query comes in, it goes through the quantization, you find the closest centroids, then you go and search in them, and then you re-rank the results based on the disk: from disk you read the non-quantized versions of the vectors, right?
So that you can actually get the precision back. What I'm trying to say, basically, is that you can combine these algorithms in different ways, depending on your use case, if you're trying to optimize for memory or speed or something like that.

Yeah, yeah. And before we go further into Milvus, if we go back to use cases: you mentioned there are a number of things. You can encode almost any object, and you gave a really good example about email. On one hand, everyone knows what an email is; on the other hand, it has unstructured parts to it. And if you compare text versus audio or video, do you think you can equally apply vector search? Of course you can, but I'm asking in terms of the quality you will get. Or do you need to go the extra mile for audio and video compared to text?

There are so many models. Honestly, that's a good question. I think that's where the neural nets come in and where they're important, how the black box sorts everything out. But with text, I feel like there's been a lot more work, a lot more people have been looking into it, and now everyone's switching to it for product recommendation. There's a lot more money in that area, so I think there have been a lot more advances in those neural nets. But underlying text, the way I personally see it, and this isn't scientific fact, I feel like there's a lot more underlying structure in language, a lot more rules and underlying connections that a neural net could find, compared to an image. And with those underlying connections in language, you'll have an easier time grouping things together with a neural net. And if you can more easily group things together, the more easily you can search it, pretty much.
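The product quantization being discussed can be sketched in a few lines: split each vector into sub-vectors, train a small codebook per subspace, then store one small code per sub-vector instead of the raw floats. This is a toy illustration with a tiny hand-rolled k-means and arbitrary sizes, not the Faiss implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def kmeans(data, k, iters=10):
    """Tiny k-means for codebook training (illustration only)."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid, then recompute means
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def pq_train(vectors, n_sub, k):
    """One codebook per subspace."""
    return [kmeans(s, k) for s in np.split(vectors, n_sub, axis=1)]

def pq_encode(vectors, codebooks):
    """Each vector becomes n_sub small codes instead of dim floats."""
    subs = np.split(vectors, len(codebooks), axis=1)
    codes = [np.linalg.norm(s[:, None, :] - cb[None, :, :], axis=2).argmin(axis=1)
             for s, cb in zip(subs, codebooks)]
    return np.stack(codes, axis=1)

def pq_decode(codes, codebooks):
    """Approximate reconstruction from codes (used for fast distance tables)."""
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])

vecs = rng.standard_normal((500, 32)).astype(np.float32)
books = pq_train(vecs, n_sub=4, k=16)
codes = pq_encode(vecs, books)      # 4 small codes per vector, not 32 floats
approx = pq_decode(codes, books)
```

The memory saving is the point: with 16 centroids per subspace, each code fits in 4 bits, which is the kind of compression DiskANN-style systems use to keep billions of vectors in RAM before re-ranking against the full vectors on disk.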
You can make these clusters a lot more accurate if things are already going to be near each other, and they're easier to find. With images, on the other hand, I feel like there's not as much of a background connection in everything. Again, all personal take, but sure, two images might have the same object, and there's no real underlying thing linking those objects. There's the shape, but in language you have a lot more than just the shape of an object. So that's where I think text has a better time. But in reality, when we look at our systems, when we test, it always ends up being very close.

And up until now, it's all approximate anyway, so no one's really been hurt that much by half a percent of accuracy. Everyone understands it's still kind of a new field, it's still growing, and these methods are all approximate; you're never going to get a perfect answer. It's really up to the testing with your neural net, seeing which embeddings work, optimizing your neural net, because for approximate nearest neighbor, these algorithms aren't really learning. Sure, there is some learning with the quantization-based ones, where the index builds its own quantization, and I know Faiss does that. But it's an algorithm that goes step by step, and there's not too much randomness. Sure, Spotify does random splits in Annoy. So you kind of have to optimize your neural net to really get the best performance. There are some values you can tune in the actual approximate nearest neighbor search, but those don't play as big a role, I believe, as what you're doing with your neural net.

Yeah, that's interesting. So, if I take a step back a little bit: can you tell me, and our listeners, why we cannot do exact kNN, exact k-nearest neighbors?
Why do we need to do approximate? What stops us from doing exact?

So first, you can get exact, but that's just going to be brute force. Most of these libraries, maybe not all of them, do have a brute-force search. But then you haven't solved anything; you could just use a relational database, throw in your numbers, go through each row, and see which one's closest. So you go approximate, and that's where you get your speedup. And with approximate, because you're doing clustering, you assume most things are going to be embedded sensibly. With your neural net, your embedding layer, you hope it's going to find similarities: if you have two items that are very similar, you hope their distance is not going to be far. That isn't always the case. If you have a photo of a car, and a photo of a car with a bike in the background, the neural net might for some reason focus on the bike. We don't really know what's going on; there's research into seeing what's actually happening inside the box. But those two might pop out with two completely different values. They might land in completely different clusters, even though they should be similar. So that's where it can go wrong: you search the wrong cluster and you miss a result, even though it was supposed to be a good match.

But there's also the aspect of not searching through everything, because you want to speed things up. Say you search the top 10 matches for an inverted file list, the centroid-based clustering. You look at the top 10 centroids, and the vectors you find in there, yeah, they're going to be similar. But the 11th centroid might be a very similar one, off by just a tiny bit, and inside it, it might have the perfect answer.
So there's all of this approximation where you only look at the top X, combined with the fact that you only make X clusters, so there are always going to be outliers out of bounds. That's where you get that loss. And leading on from this: will similarity search take over everything? It won't really, because sometimes you need perfect results, and similarity search is kind of useless there. It ends up being brute force, and with brute force any algorithm works; you're going to be looking through every single value.

Yeah. So complexity-wise it becomes big O of n, where n could be like 1 billion, right?

Yeah, when you're at 1 billion, there's no problem solved anymore if you look through everything.

Yeah. That's why you need to go approximate. But it's not like you're approximate to the level of losing tens of percent, right?

Yeah, I would say it's usually around three percent. If you're doing a very reasonable speed-versus-recall balance, and that's where you can change the values in the actual algorithm, but if you keep it balanced, 97% is roughly the average of what I've been seeing. So it's pretty strong. And this is where, yeah, it's approximate: when you're dealing with billions of vectors, finding the exact answer is very useful for some use cases, but usually at billion scale you're okay with just getting a few that are very close.

Yeah, yeah. And, um, when I published a blog post about all the vector databases, I'll make sure to link it in the notes, Milvus was there as well. And in the comments somebody said they'd actually been using a NoSQL database for a genome-related project.
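The brute-force baseline described above, touching every vector for every query at O(n·d) cost, is short to write down. An illustrative NumPy sketch, not library code:

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_exact(queries, corpus, k):
    """Brute-force (exact) k-nearest neighbors under L2 distance.
    Every corpus vector is touched per query, which is exactly what
    makes this infeasible at billion scale."""
    # (q, n) matrix of squared L2 distances via the ||a-b||^2 expansion
    d2 = ((queries ** 2).sum(1, keepdims=True)
          - 2 * queries @ corpus.T
          + (corpus ** 2).sum(1))
    # argpartition finds the k smallest without a full sort
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    # order those k survivors by their actual distance
    order = np.take_along_axis(d2, idx, axis=1).argsort(axis=1)
    return np.take_along_axis(idx, order, axis=1)

corpus = rng.standard_normal((10_000, 64)).astype(np.float32)
# queries: slightly perturbed copies of the first five corpus vectors
queries = corpus[:5] + 0.01 * rng.standard_normal((5, 64)).astype(np.float32)
neighbors = knn_exact(queries, corpus, k=3)
```

This is also the ground truth that every approximate index gets measured against: whatever brute force returns is, by definition, 100% recall.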
And what the guy said he did is that he pre-computed the nearest neighbors for each individual entry and stored them as individual items in the NoSQL database. So as a query came in, he basically asked each item: okay, what are your neighbors? And he said that at small scale this worked fine, but he wouldn't necessarily use it at the next level, right? So can you tell me more about how Milvus is built? What is it as a product, and what's included inside? What can I get as a user?

Yeah, sure. So with Milvus, we built it as a database first, similarity search second. Everyone's collecting a bunch of data, a bunch of vectors, everyone's hoarding their data, making their neural nets, getting embeddings, but then, what's next? You need to do something with that data. So that's where, again, similarity search comes in. What we're doing is building up a database system. Right now, with version 2.0, we're really working on making it cloud-native, scalable, fast, and easy to use. You can think of it pretty much as a MySQL database, just for vectors. In that regard you have the CRUD operations, you have sharding, all of these operations, and we're building that up for vectors themselves. And later on we're going to build other parts that branch off to actually make those vectors. So it's the core of our entire pipeline for dealing with similarity search.

In terms of the actions you can do with it: storing, updating, as I said, partitioning, sharding, and we're adding scalar filtering right now. It works with ints currently.
And in the next week or so, I believe, we're going to have strings for scalar filtering. What scalar filtering is, is being able to filter results in a fast way. Instead of searching through everything and then filtering certain things out afterwards, you apply the filter first, or during the search, to speed everything up and also get more accurate results. So say you have a vector, and then there's a filter that says glasses equals true: you can look for every similar vector that has glasses equal to true. That's very useful, and something everyone's been looking for. But yeah, it's a database first.

And then for the actual searching, we employ all the libraries we just mentioned, Annoy, Faiss, HNSW, all these guys, to build the indexes. And you can select whichever one you want; you can use multiple. Sometimes one will work better for images, or depending on how your neural net works, your data might work better with another. So you can store multiple of these indexes, test pretty easily, and mess around with it. Once you're done, you select the one you want and call it a day. You search and you get results.

Yeah. Actually, when I was watching a presentation by your colleagues at Haystack, and we'll make sure to link this as well, one thing that caught my eye, besides the horizontal scaling that other databases also have, well, maybe not all of them, but most of them, was that I can index the data using different index layouts, essentially the different algorithms you alluded to earlier. And then I can somehow test and figure out which one works better. Is that right?

Yeah, pretty much. We do have benchmarking tools, but you can also benchmark yourself. Every single index has its own parameters, and you can just keep building more.
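The pre-filter versus post-filter distinction can be shown concretely. A sketch with a made-up boolean attribute (`glasses`) and plain brute-force search, not Milvus's actual filter syntax:

```python
import numpy as np

rng = np.random.default_rng(7)

n, dim = 5_000, 32
vectors = rng.standard_normal((n, dim)).astype(np.float32)
glasses = rng.random(n) < 0.1           # hypothetical scalar attribute

def search(query, vecs, ids, k):
    """Brute-force top-k by L2 distance, returning original row ids."""
    d2 = ((vecs - query) ** 2).sum(axis=1)
    return ids[np.argsort(d2)[:k]]

query = rng.standard_normal(dim).astype(np.float32)
all_ids = np.arange(n)

# Post-filtering: search first, filter after. If few of the top-k
# satisfy the predicate, you come back with fewer than k results.
top = search(query, vectors, all_ids, k=10)
post = top[glasses[top]]

# Pre-filtering: restrict to matching rows, then search. You get a
# full k results as long as at least k rows match the predicate.
pre = search(query, vectors[glasses], all_ids[glasses], k=10)
```

Applying the filter during the search, as described above, is the same idea as pre-filtering but without materializing the restricted set up front.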
You can build up ten variants with parameters changed this way, or ten completely different indexes. And then you perform a search with the same vector against each index, because when you search, you can select which index to use. So you take that search, run it through every single index, and see the results. If you have baseline data, where it's already labeled and you know what results you should be getting from brute force, and when we do these benchmarks it's always compared to brute force, because brute force gives you the exact answers, then from there you can see: okay, how many hits did I get, how many did I miss, and what's my recall rate.

And then you can also time these things, because with some parameters, if you make 10,000 clusters in your data, that's going to take a while to search if you go through every single one. So you time it, and you get this ratio of speed versus performance, or speed versus recall as we usually say. So you can build all of them and go from there; if one doesn't work, you can just delete it, which happens in the background, and build a new one. And you can do these searches and everything concurrently: thanks to Go, the indexes can be built at the same time as you're searching and doing all these other things, because we have workers for queries, for building indexes, and for inserting data. It's all in the background and handled for you.

Yeah, that's cool. So you mentioned the technical part: different products might have some SLA, say, on how quick it is, queries per second, P99, whatever. But what about the semantic part? You mentioned there's a ground truth you can always compare to, right?
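The benchmarking loop just described, running the same queries against each index configuration, measuring recall against the brute-force ground truth, and timing it, can be sketched with a toy IVF-style index. All sizes, the single tuning knob (`nprobe`), and the crude clustering are illustrative, not Milvus internals:

```python
import time
import numpy as np

rng = np.random.default_rng(3)

n, dim, k, n_clusters = 5_000, 32, 10, 64
data = rng.standard_normal((n, dim)).astype(np.float32)

# Toy IVF "index": assign every vector to its nearest sampled centroid.
centroids = data[rng.choice(n, n_clusters, replace=False)]
d2c = ((data ** 2).sum(1, keepdims=True) - 2 * data @ centroids.T
       + (centroids ** 2).sum(1))
assign = d2c.argmin(axis=1)

def exact_topk(q):
    """Brute-force ground truth (100% recall by definition)."""
    return set(np.argsort(((data - q) ** 2).sum(1))[:k])

def ivf_topk(q, nprobe):
    """Approximate search: only probe the nprobe nearest clusters."""
    probe = np.argsort(((centroids - q) ** 2).sum(1))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probe))
    order = np.argsort(((data[cand] - q) ** 2).sum(1))[:k]
    return set(cand[order])

queries = rng.standard_normal((20, dim)).astype(np.float32)
truth = [exact_topk(q) for q in queries]

recalls = []
for nprobe in (1, 4, 64):          # 64 = probe everything = exact
    t0 = time.perf_counter()
    hits = sum(len(truth[i] & ivf_topk(q, nprobe))
               for i, q in enumerate(queries))
    recalls.append(hits / (len(queries) * k))
    print(f"nprobe={nprobe:2d}  recall@{k}={recalls[-1]:.2f}  "
          f"{(time.perf_counter() - t0) * 1e3:.1f} ms")
```

Raising `nprobe` trades time for recall, which is exactly the speed-versus-recall curve being described; probing every cluster recovers the brute-force answer.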
But what about the other side of things, say for people like product managers? They're not very technical, they won't look into these metrics, but they still want a way of understanding the impact on the semantic side of things, right? For instance, comparing an inverted index versus vector search.

So with the semantic part, we don't really deal with that as much, because we're assuming the semantics are done well by the neural net. This is where it comes back to comparing everything to brute force. If your brute force shows that this is the correct response, this wording, or this wording, or these top three results, those are mathematically the closest, the most similar to your input. So that's what you compare to. If those aren't close, that means there's an error a step above: your neural net isn't finding the connections correctly. So that's how we compare to the baseline, which is just the flat index, brute force, and we see if you're hitting the right responses. The semantics come from the neural net, and finding that kind of issue is above us in the stack, if that makes sense.

Yeah, yeah, for sure. So basically what you're saying is: if I fix the model, so the model is fixed, and I pick different algorithms for indexing, and maybe even different distances, right? In some cases I can choose different distance metrics. Although maybe you can tell me if I'm wrong here, because if I trained the model for a specific distance, maybe I cannot easily pick another distance at test time. Is that right?

So, a distance? What do you mean by selecting a distance?
Because it's all based on closest: we rank from closest to furthest, and then it's only the top N results.

Yeah, I guess what I meant is the distance metric itself. It could be Hamming, you know, and so on. Do you cover all of those?

Yeah. So, comparing across different distance metrics, that's one where you have to look at your data. Because if you use different distance metrics, then your baseline is going to be different. If you're going to compare across indexes, you have to keep the same distance metric. Swapping them out will make some big changes, I think, because if you go from L1 to L2, or maybe not L1 to L2, but say cosine to Euclidean, it switches things up a bit. In some cases a higher value is better, in some cases a lower value is better, so there's no direct comparison. They're usually still going to rank in the same order, but for figuring out which one you want to use, it's give or take; you actually have to look at the results. There's no real mathematical way to compare the semantics to a distance and get the relationship, if that makes sense.

Yeah, yeah, for sure. It's more that experimentation is needed there, right?

Exactly.

And actually, I just remembered, when you were describing these different distance metrics, a paper, I think it's called BEIR. It was comparing different retrieval methods, dense retrieval and some other methods I've already forgotten. And they actually found that if you have text documents, cosine similarity will favor shorter documents, given a tie, whereas dot product will favor longer documents. And this is by design of the formula: cosine similarity is basically mapping everything to the unit sphere.
With dot product, there's nothing to normalize on. It just takes all the components of your vector, and basically the biggest magnitude wins, right? And that can actually impact the user experience. Say I have a database of news articles versus some deep research: the research is thousands of pages and the news is a couple of pages, maybe even just a couple of paragraphs. So if my hits are in a paragraph in the news and also in a paragraph in the longer document, with cosine I'll get the news; I will not get the deep research. See what I'm saying?

No, that makes sense. Yeah, that's one where you have to test it out and see what you want, because some people searching might want the news, and some people searching might want the scientific paper. And that's one where you look at history, I guess; that's how, say, Google does it. You look at the user's history, how they search, whether they're always looking at scientific stuff, and maybe you swap to a different distance metric. But yeah, I hadn't thought of that too much. No, wait, that's really interesting. I need to check out that paper.

Yeah, for sure. I'll send you the link and I'll make sure to include it in the notes, for those of us who are interested in reading papers. So that's awesome. You guys basically build a database where users can mix and match the way they want, right? And then do you guide the users in the process of doing this?

If they come to us for help, we usually have some articles where we mess around with the indexes and different parameters, and we have graphs, say speed, performance, recall, that kind of stuff, which are kind of preliminary.
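The length bias being described is easy to demonstrate with made-up vectors: dot product rewards magnitude, while cosine normalizes to the unit sphere so only the angle matters. The numbers here are arbitrary, chosen so the two metrics disagree on the ranking.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1.0, 1.0, 0.0])

# Hypothetical document embeddings: roughly the same direction,
# very different norms (longer documents often accumulate larger norms).
short_doc = np.array([1.0, 1.0, 0.1])    # small norm, near-perfect direction
long_doc = np.array([5.0, 5.0, 3.0])     # big norm, slightly worse direction

# Dot product rewards magnitude: the long document scores higher.
dot_short, dot_long = query @ short_doc, query @ long_doc

# Cosine keeps only the angle: the short document scores higher.
cos_short, cos_long = cosine(query, short_doc), cosine(query, long_doc)
```

So with the same fixed embeddings, just swapping the metric flips which document ranks first, which is exactly the news-versus-research effect in the conversation.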
We kind of hope that they learn it on their own, because we can only help so many people. But yeah, we do help out, we point people in the right direction, and if it's a really interesting use case or a really big use case, we'll mess around with it ourselves and try to help out. But we also just hope that people experiment and then post the results. The more data we get, the more it helps when people share what they're doing. We're trying to share as much of everything as we can, get people into this, get the word spread, and that's pretty much open source. A big deal of open source is getting this out there. Competition's good, new innovation is good, just get vector similarity search out there and get people interested in it.

Yeah. And your website has so many use cases covered. I was looking at audio search, and that was interesting. You basically walk through selecting a library, how to encode a song, and I have an idea to try it out on a few songs that I have as MP3s. What I was particularly interested in is: is there a way to separate the singer's voice from the musical instruments, from, I don't know, the style of the song, and so on?

Yeah, and it's a really cool one, because this is one of the things I looked into a lot and did some talks on. Not really detailed, but it always popped up. For all these recommendation systems, and I've never looked at their internals, everything's behind closed doors, but with Spotify's recommendations, and with Shazam, I think it was Shazam for audio recognition: they separate the background music and the vocals, and they pretty much discard the vocals. The music search is based just on the background. And there were some techniques for that.
So for separating the audio, I think there are 1-D neural nets that go along the waveform, and there are time-based neural nets. Another one was audio inversion, which helps when you already have the background or the vocals separately, to get the other part out. But a lot of the work was on that step: pulling out the background music is the big step, and then you run the neural net on that to get the embedding. That's how, if you're recommending songs, you can easily filter out cover songs: they're going to have almost exactly the same backing track, and only the vocals will be different.

And with these recommendation systems, another cool thing is that you don't want the perfect match. Unlike with a search result, you don't want the exact same thing, not the top 10 closest. You might want, say, the last 10 out of the closest 100, because you want something similar but not exactly the same. But yeah, audio inversion, 1-D neural nets, and a few others I don't remember off the top of my head. It's a hard problem to solve, getting the vocals out without having separated files already.

And it's an exciting topic, because there are so many examples on the web of how you can index text, and do something else with text, and more text, right? But it's infrequent that I come across image search, or audio, or for that matter video. I haven't seen any blog posts on video. I don't know if you guys have one.

So video, yeah, that one gets a little difficult. With video you can, but there's also the question of how you're going to sort everything out, because when you're dealing with videos, everything is frame by frame.
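The "similar but not identical" trick mentioned for recommendations, skipping the nearest ranks and returning a later band, is a one-line change to top-k retrieval. An illustrative sketch with random vectors:

```python
import numpy as np

rng = np.random.default_rng(11)

def recommend(query, corpus, skip=10, take=10):
    """Recommendation twist on nearest-neighbor search: skip the nearest
    `skip` ranks (near-duplicates, covers, the item itself) and return
    the next `take` ranks: similar, but not practically identical."""
    d2 = ((corpus - query) ** 2).sum(axis=1)
    ranked = np.argsort(d2)
    return ranked[skip:skip + take]

corpus = rng.standard_normal((1_000, 16))
# query: a near-duplicate of item 0 (think: the song currently playing)
query = corpus[0] + 0.01 * rng.standard_normal(16)
recs = recommend(query, corpus, skip=10, take=10)
```

Because item 0 sits at rank zero, it never appears in the recommendations, which is the behavior you want when the query is the item the user is already looking at.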
So then it's: how do you take every frame and group it together into one ID, so that if any frame matches, you point to that video? It gets a little difficult with video. It's not too bad if you're doing live tracking in a video: say there's a soccer player, and you pull out the most similar player that looks like him, so you can get a name and track him. That's the live tracking case. But if you're looking for things within a video, like looking for a single person across an entire video, then it gets difficult, and it takes time. You either index all the frames, or you pull out a few key frames. But not too many people, I'm going to be honest, are doing video yet.

I think there's a bit of a gap there right now, to be honest. I think images are the most used for us, because images, I think, are the easiest, even compared to text. With text, some of these neural nets, the transformer networks, are a little bit hard to use. In Python you have Sentence Transformers, the easiest one, where you just input the string, but the other ones require you to add tags in the string and do these things that not everyone understands. With images, you just import a ResNet-50, which torch makes really simple: you put the image into the ResNet and you literally get your embedding vector out, and you can directly pipe it onward. So it's a very simple one, it gets good results that are pretty interesting, and you can do a lot with images. I don't think enough people are doing it yet for things like shopping; everyone's still relying on text. But say you upload an image of a shoe and you find that shoe: I think everyone would enjoy that a lot more.

Yeah. So it has a very concrete application in business, right? And e-commerce is a very big area.
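The image pipeline just described, image in, embedding out, then a nearest-neighbor lookup, can be sketched end to end. To keep the sketch self-contained, the `embed` function here is a random-projection stand-in for a real encoder: a real pipeline would instead take ResNet-50's penultimate-layer activations (2048-d) via torchvision; the surrounding plumbing is the same shape either way.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in encoder: a fixed random projection from flattened pixels to a
# 128-d vector. Swap in a real ResNet-50 feature extractor here.
H, W, C, EMB = 32, 32, 3, 128
projection = rng.standard_normal((H * W * C, EMB)).astype(np.float32)

def embed(image):
    v = image.reshape(-1).astype(np.float32) @ projection
    return v / np.linalg.norm(v)        # unit-normalize for cosine search

# "Catalog" of product images (random pixels as placeholders).
catalog = rng.random((200, H, W, C)).astype(np.float32)
index = np.stack([embed(img) for img in catalog])

# Query: a slightly altered copy of catalog item 42 (e.g. a re-encoded
# or re-photographed version of the same shoe).
query = embed(catalog[42] + 0.01 * rng.random((H, W, C)).astype(np.float32))

# Cosine similarity is a plain dot product on unit vectors.
best = int((index @ query).argmax())
```

In production, the `index @ query` line is the part a vector database replaces with an ANN index, so the lookup stays fast as the catalog grows from 200 images to millions.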
So yeah, that makes total sense. +It's not like many users are like, oh, I remember that scene in the movie, can I find it by expressing it in words? Yeah, it won't work. +Maybe you can say the actor and then some description of the scene, but then you already have to know the actor, and personally, I don't know any actor names. So yeah, exactly. It doesn't work for me. +And it defeats the purpose of search, right? Because actually, early on when I was just entering this field many years ago, I was like, so with search, I need to know what to look for, right? +So I'm typing the keywords, telling the search engine what I'm looking for, but I don't know what I'm looking for. +Yeah, you're already doing the job for it. Yeah. Yeah, like if you're searching with the right keywords, that means you don't really need the search engine anymore, if you're doing all of this job. Yeah. And I really loved one competition that Yandex did. They actually stopped doing it, and it's a pity. +So the competition was like this: they give you a question, and basically you compete with other people, all right? So they give you a question, but it's not like, what is the color of the submarine in the Beatles song, right? It's like, you first need to answer the first part of the question. +Then you get kind of another puzzle, another question. You know, the puzzle gets solved and you get the full question, and so on. It's a multi-layered process. So basically they're telling you that a cool search engine would be doing that. +So you could ask a very convoluted question and it would kind of figure everything out. I feel like you might know more. Isn't there the aspect of, I remember taking an NLP class.
+It was always, you could only get to two degrees. Or was it, like, the third degree would always fail: if you have a question, and then based on that answer you have another part, like the next part of the question is based on the first one. +I think they were only able to do the second degree, like a question after the question. And getting that third part, it would always fail. Like it wouldn't be able to make the connection all the way back. Yeah, yeah. They stopped doing that. It seemed like a really good competition. +And it was also based on a lot of associations, something that computers may or may not be doing well. You know, if you know Prolog, the programming language, it basically has associations built in as a first-class feature. +You type something like, you know, orange fruit, and then you type somewhere fruit and it says orange. If you type orange, it says fruit, right? It remembered that mapping, right? And then you can use this associative kind of programming in a bunch of places when building AI. +But I haven't been programming in Prolog; it was just part of one course. But you know, the questions that they asked at the Yandex competition, they were also like, you know, something like, who met this lady when he was a student, blah, blah, blah. +And you're like, I already lost the train of thought in this question. So I spent, I don't know, maybe one minute just figuring out what is being asked. And then you're decomposing this problem into multiple problems and you start from the first one. +And then you're solving it, then the next one, and time is running. It's like five minutes per question, if I remember correctly. And it was a fantastic competition. You know, I don't think search engines are on that level yet. So yeah. +They'll probably get there. It's those neural nets.
What they're doing with them, it's going to get there. Yeah. And we have those giant networks that they're now creating, like Nvidia released that Megatron-something GPT. +What is it, three right now? Was it, yeah, GPT-3 is the one that they're not releasing. No, it's going to get there, one way or another. +So does it make you excited to try these models in real life? Or do you think they are still too far from real life? What do you think? It makes me excited, and I think they're doing really well. +It's just that this whole trend has kind of been going toward not user friendly. No one can run any of these models. You need like nine A100s, like $100,000 worth of compute. Who has that, other than those places that are doing it? Look at what GPT-3 took. +I don't know how many billions and billions of parameters. No one can run that unless you're at some super big company. My opinion is, what's the point? You can always throw more and more hardware at it and you can always get like 0.001 percent closer and closer. +And that's kind of why this whole area of research is where it is. It's kind of the new thing. Like, we've already kind of maxed ourselves out on neural nets. I personally believe, unless there are some huge architecture changes that inspire some really interesting stuff, +I don't see neural nets changing that much. And then I feel like we can do more with the next step, the nearest-neighbor search, this approximate nearest-neighbor vector similarity search. And I think that's where we can make some more headway until we max this out. But yeah, I don't know. +I'm excited to see where it goes. It's just, I hope it's going to be something I can try myself, and not need to spend $400 on Amazon for hours' worth of calculations. Yeah. So maybe somebody needs to work on compressing these models. +So sort of compressing the compute power they actually, you know, require.
But now the thing is, everyone's interested when they say, oh yeah, we used like 40 million dollars' worth of compute power. Everyone thinks that's cool. That's going to get some news. +And people are going to be interested in it when it's bigger. But there is, I don't know, I remember when I was studying all this, there was a lot of work on sparse neural nets, and that was kind of going to be the future of compressing all this down. +Unfortunately, I haven't really kept up on it too much, but hopefully they do make moves, because again, I don't see throwing more and more hardware at it as innovation, compared to actually making it efficient. Yeah. That's my take. People are doing it, so there's a reason to do it. Yeah, for sure. +It's like, I guess the hope of researchers is to essentially emulate the human brain, right? But I think the human brain has like a hundred billion neurons, or even more, right? I don't know. Like, yeah, it has a bunch of neurons, and then there are the connections that we have. Yeah. +Yeah, we're not, I don't think hardware-wise we're going to be close yet. Unless, I don't know, I know they're doing a lot of research in this area, but I definitely don't think, yeah, I don't know. That's the way I estimate it. Yeah, yeah, yeah. +But like, just to close on that thought: the human brain does not use the energy of a server farm, it's just like one electric lamp or whatever. Yeah, even less. The efficiency right there. And that efficiency. A couple of kilos and that's it, right? That's the device. +And then we can do all of this. Yeah, there is a long way to go. Yeah, yeah. And exciting though. Yeah, yeah, absolutely. But actually, today I learned from my colleague Aarne Talman, I will make sure to also ask him to give me the link to this paper. +The paper said that if you take a model like BERT, BERT will not be able to distinguish negations.
Have you seen such research? Which is very amazing, like, for a model as powerful as BERT not to be able to distinguish negations. That could be a deal breaker in some cases. +Even though BERT is a very powerful model, and Google is probably using it for a few percent of their searches. But to know that it doesn't distinguish between a negated and a non-negated phrase, that's interesting. +And that brings more questions, where some languages have double negations, where it matters, where it doesn't matter. Double negations in some languages are used quite a bit and they really change the meaning. +So that's where it's like, what do we do there? But I wonder why. I wonder if it's something we know the reason for, or if it's just a black box of mystery doing something there. +Yeah, because if it was a rule-based system, you could claim that, hey, I have the rules here, right? And I've managed to encode negations. I know what they are. And maybe you haven't covered all possible combinations, then you add another one and another one. +But with BERT, you don't do that, right? You mask the text and you train it. And that's it, right? Yeah, you didn't tell it what a negation is. And you can also argue, yes, in our human brain we don't have syntax either, right? There are some studies like that. +Like, for instance, when kids learn to speak a language, they don't know what a negation is, right? They don't know what a syntactic structure is, or a pronoun or whatever. They just speak, right? And so we probably don't use syntax in our brain either. +Like we use some semantic grams, I don't know, something like that. Yeah. It's an exciting topic. So, if we go back to Milvus, you guys basically have built support for a number of indexing algorithms, as you said, right? Yeah. +And can I, as a user, also plug in my own method? So we're currently working on those plans.
Right now, it's kind of blocked off. Or like, there are a bunch of changes you have to make deeper in. +But we are working towards doing something along those lines, where we'll have a system where you can kind of bring it in. But we're also trying to add, like, Google's ScaNN, and we're working on the disk-based ones, and we're already working on putting all the main ones in. +But right now, you can't really do your own. And there's also another question that comes up a bunch of the time. +That is, using your own distance metric. +And that's one where, unfortunately, you kind of lose out. Like, you can do your own distance metric, but it would require you to only use flat, because all these algorithms, the reason they're efficient is tied to the specific distance metrics they support. +And, let's say with quantization, it kind of plugs and plays together nicely. When you try changing those things, everything breaks and you kind of have to revert back to doing a flat-based system. +But yeah, for now, we're trying to add in all of the most famous, or not most famous, but just state-of-the-art nearest-neighbor search algorithms. +And then later on, hopefully, we can make this thing where you can code it up yourself and kind of plug it in and make it work. And also, Milvus pays a lot of attention to scalability. +So for instance, horizontal scaling, you know, not just vertical, right? But, for instance, one thing that I've been thinking about is that in the whole infrastructure and pipeline of a search engine, one of the bottlenecks is actually getting the data in, right? +So the data comes in, let's say in raw format, I don't know, news items, images, what have you. +Now you need to compute the embeddings, right? At some good rate, right? So kind of throughput. Yeah. +So do you guys have any work done in this area, or kind of recommendations for the users?
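The "custom metric forces you to use flat" point is easy to see in code: a flat index is just a brute-force scan that scores every vector with whatever function you hand it, which is why it accepts any metric but gains nothing from an index structure. A minimal sketch with a made-up weighted-L1 metric (names are illustrative, not a Milvus API):

```python
import numpy as np

def flat_search(query, vectors, metric, k=5):
    """Brute-force ('flat') search: score every stored vector with an
    arbitrary user-supplied metric, then keep the k best."""
    scores = np.array([metric(query, v) for v in vectors])
    return np.argsort(scores)[:k]  # indices of the k smallest distances

# Hypothetical custom metric: a weighted L1 distance.
weights = np.array([1.0, 0.5, 2.0, 1.0])
weighted_l1 = lambda a, b: float(np.sum(weights * np.abs(a - b)))

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 4))
hits = flat_search(data[7], data, weighted_l1, k=3)
```

Querying with a stored vector returns that vector first (distance zero), which is a handy sanity check for any metric you plug in.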
+So recommendations right now, what we've been recommending is having a server. Like, there's Nvidia's one, and there are a few other inference-based servers, and you scale those up themselves. +We are currently working on that; we're calling it Towhee. And it's kind of the scalable ML pipeline that focuses on embeddings. So, doing all these things about embeddings, everything embeddings, and making pipelines that can scale across multiple machines and multiple GPUs. +Still a work in progress, it's pretty early stage. That's what I'm currently also working on. The thing is, there are people that don't know that step and don't know what the best process is, and we're going to cover that. It's also open source. +So it's going to be the step ahead of Milvus. And then we're going to, as it progresses, kind of interlink it with Milvus, make it easy to plug and play together. But for now, it's all about scaling up inference servers. Luckily, you can scale them pretty easily. +When it comes to videos, where frame order matters, it's a little different. But yes, for now, we with Milvus are only that storage and search part. Everything above it is up to the user. Yeah. +And in Milvus, do I only store the vectors, or can I also store the input object? Right now, no input object. That was kind of deliberate; it slows things down a lot. +And that's where sort of a NoSQL database or something like that would work a lot better, for those quick retrievals where you need exact lookups. So we, for now, store ints, and then we're going to store strings. And then later on, we're going to add more and more types that we can store with it. +And another thing, we're hoping to be able to also build indexes on strings, like you can on ints. So for now, we don't store the objects. In the future, when we have strings, you can link the file path.
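The split described above, IDs and vectors in the vector store, the actual objects (or just their file paths) in a separate key-value or NoSQL store, looks roughly like this. Everything here is illustrative: the two dicts stand in for the two systems, and the function names are made up, not a Milvus API.

```python
import numpy as np

vector_store = {}  # id -> embedding (what the vector DB holds)
object_store = {}  # id -> object, here just a file path (a NoSQL side store)

def ingest(item_id, embedding, path):
    vector_store[item_id] = embedding
    object_store[item_id] = path

def search(query, k=2):
    ids = list(vector_store)
    mat = np.stack([vector_store[i] for i in ids])
    order = np.argsort(np.linalg.norm(mat - query, axis=1))[:k]
    # the vector DB hands back ids; the caller resolves them to objects
    return [(ids[i], object_store[ids[i]]) for i in order]

rng = np.random.default_rng(2)
for n in range(5):
    ingest(n, rng.normal(size=4), f"/data/images/{n}.jpg")

results = search(vector_store[3])
```

Searching with item 3's own embedding returns ID 3 and its path first, showing the ID-to-object hop that happens outside the vector store.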
+Because that's usually what most people do: when you store an object, you're just storing the file path. But yeah, object storage is currently not part of the Milvus server. You just get an ID and a vector. Yeah, yeah. +And then you go back and link it to the other database where those objects are stored, in case you need to display them or something like that. So in that sense, you guys are like a pure vector database. Exactly. +Literally the vectors plus, I guess, the scalar values that you can filter on, right? Yeah. Yeah. Exactly. Oh, that's awesome. I think that covers a lot of use cases, doesn't it? Yeah. Yeah. And I was also thinking, so Milvus is open source. +And it's one of my favorite questions. You know, can you speak more about why, why Milvus is open source? What do you get from it being open source? So I think the biggest thing right now is, with open source, you need open source to kind of get this idea out. So with vector +search, if you close-source it, you don't really know what's going on. You don't really get the info that you want. +And that doesn't spark competition, because if you're not in there already, you need to find out everything yourself, kind of do it all just to catch up, and that's sort of going to take a long time. +So with open source, it kind of promotes this competition and innovation, because everyone can see what we're doing. You can see all these algorithms, you can learn how vector search works. And then people can kind of branch off and do their own. +Sure, it's competition, but the only way you're going to get more users familiar with this and knowing about vector search is to just get as many people doing it, get as many people trying their own routes, get as many people building up their own systems, and just kind of get it out there. +That's what we want. And I think that's the biggest way you can do it.
If you open source, everyone can see what's going on, learn from it, and just go ahead. +And then also with open source, you get feedback from everyone, from all different areas. Like, it can be a student working on a project who has some great idea. He's not some company. +Sometimes with closed source, if it's not bringing money in or something like that, no one really listens to that small student and his idea, so he might not be able to use it. +So it's just about getting more perspectives on it, getting more input, making it accessible to everyone, and sparking that competition and innovation. Yeah, actually you brought up a very interesting topic. I didn't think about it that way, to be honest. + And now that you said it, it's very logical that it may as well be a competition between some users, because they are using the same tech and they have different use cases, or maybe the same use case, but they are competing to get that last percent of precision out, or whatever they're doing. +But also at the same time, you know, when you look at open source projects, like, I don't know, the Apache Software Foundation for that sake, when you go there and you ask a question, first of all, you don't have to say the company that you work for, right? +Or maybe you are that student that you mentioned. +And, you know, you just focus on the matter, right? You focus on what is it you're asking about. And then if somebody is curious, even if they're competing over the same thing, they might kind of casually share something, right? +I mean, that's what I've seen in the mailing lists a lot. +Like, some of the other users, they just come in and say, hey, why are you doing this? You know, did you consider something else? And you're focusing so much on solving a specific problem. Yeah, I think it's just, yeah, it's kind of like, with competition, there's innovation.
+And then with innovation, you get more people interested. And I think that's kind of what happened with neural nets when they started out. I don't think everyone was using them. Everyone was just using some brute-force text search, keyword matching. +And then people learned about it more, there are open source systems. And I think with a lot of these neural nets, if you're going to be making neural nets, you're going to be doing research on them, you're going to be posting those papers. Everyone's going to see it and build on top of it. +And it'll just explode. I think that's now happening with these BERT models, with Hugging Face, all of this. It's just exploding and more people are looking into it. And it's just better for everything at this point. So that's kind of why we do it. +That's a bit my opinion, a bit company motto, but that's kind of, yeah, the reason. Yeah, it's kind of like a compound effect of multiple inputs. And then everyone essentially has the same goal, which is to serve the users the best. Or maybe solve that specific problem they're solving. +Maybe even for themselves. But yeah, I mean, that's very interesting. And how do you, so basically you have a Slack where I can go and ask my question. How do you balance your time between doing the actual work and helping the community? So that's a hard one. +Right now with Slack, it's people that come to us. Because this area hasn't blown up so much, it's still manageable. I'm thinking of the future, when you get to the levels of these other open source projects where they have like 20,000 people in their Slack, all posting questions. +Right now it's pretty manageable and you can kind of keep on top of it. And yeah, besides Slack, we made like a discourse. We made a lot of preliminary areas where you can talk to us. And then, yeah, there are GitHub issues, all that. +But there's also another aspect of kind of splitting up the problems.
Because if you open up a Slack, people might post their technical problems there. Or there's something that might be worth being a GitHub issue. +The people that are looking at the Slack the majority of the time are more like user-success style, not full-blown R&D deep engineers. So that's where I think the balance comes out, where the problem is of what belongs on Slack and what belongs on GitHub as an issue. +But for now, all of it's easily solvable, because it's a steady inflow that we can manage. And we have enough people looking at it. But we'll see, in the future that's going to be another problem to deal with, and I'm interested to see how we do it. +Yeah, it's like a catastrophic success, right? Exactly. It may happen. But hopefully it will be manageable in your case. +And so you can, as you said, kind of cater to that community, as well as actually keep solving things and keep your roadmap under control, because you also need to keep, you know, innovating in this space, right? Yeah, I'm glad it's working for you. +And I've been also Slacking a bit, kind of, Slacking is not the right word, but Slacking with you guys. I got immediate answers to my questions and there's been, like, a long thread. +Why Docker doesn't work, can you try this, can you try that? And it's also like, you know, it's the first impression you get about the database or about the open-source product: how soon do you get an answer? Yeah, and you definitely try these things sometimes. +Also, you're working on one problem and you have another problem that's completely separate. It's a big system. So you're jumping around between them, but then it's also, okay, let me find someone to answer that for you. So you go internally and look for someone. +Hey, can you answer this? But hopefully it's working. I think, I think we're pretty quick with our responses. Maybe overnight it's sometimes difficult, sleeping and everything.
But we try to get responses whenever we can. Yeah, some people are in China, I guess. So, I don't know. +It was like a five-hour difference with my time zone sometimes. With yours, yes, probably five. Mine is 14. So you ask one question and it takes a couple of days, right? Yeah, it gets interesting with the technical, very deep questions, because then I have to kind of bridge the gap of time. +I try to find solutions on my own as to why it's going wrong. But then also, once five o'clock hits for me, I can pull in the external knowledge from the other team. But it's fun. This needs to be solved with vector search. I don't need an exact answer, just an approximate one, but faster. +Oh yeah, we were working on that. We're trying to apply it to, like, a chatbot for all the problems that you have. It's been working okay, but we're working on it. Trying to get more questions, kind of build up a dataset. That's the issue with everything, just building up that dataset. +Yeah, absolutely. So that it will make sense for the chatbot, because a chatbot wouldn't create answers. Well, unless it's some generative model. Yeah, GPT-3 for our questions. Make a story out of it. Yeah, but then it might be hallucinating as well. +In some cases, it's okay though, right? Yeah, one out of ten is the correct answer. The other ones are all just like, burn your computer. Yeah, if you want to have fun with it, you know, you don't need an exact answer. You just, okay, hey buddy, how are you doing? Yeah, so yeah, that's fantastic. +So it was lovely moving to the why section, even though I didn't signpost all the sections, but we kind of mixed what and how together in many ways. And you handled it really, really well. +You know, the why question that I really like to ask everyone on this show is, what motivates you to be part of vector search development today? I think for me the biggest thing, which I went over a few times, is everyone storing all this data.
And it's, like, a huge amount. +Like all these companies. And then just the next step, I want to see what we can do with it. Vector search is one; who knows, maybe vector search might not be it. +But in that chase of figuring out vector search, of perfecting it, something might pop out, and you kind of ride this wave of what's next. And that's kind of why I really like vector search right now. I get to learn about all these things. +I still get to throw my ideas into it and still have them matter. The previous wave that's passed already, it's kind of gone to the point where you really need to have this deep, deep knowledge to actually be able to innovate. +You do also with vector search and all of these things, but it's a little bit more fresh. So, if that sort of makes sense, I want to ride that wave of freshness and kind of the next step of dealing with these huge amounts of data. Yeah, that's amazing. +And also, I think I've read somewhere, from one of the founders of Y Combinator, it was an essay. He said, when you are on the bleeding edge of doing something, then you automatically become the expert in that field. +And if something works for you, you know, the rest of the market will probably try to copy it. If it didn't work, then probably everyone else wouldn't figure it out either, because you are the bleeding-edge expert, right? You are right there. +And if you figure out something very interesting, something that will be revolutionary in some way, then you will be the first to possibly capture the value, right? And so you work toward that goal. +On one hand, as you said, it motivates you to unlock the, you know, the siloed databases of data, unstructured and structured data. On the other hand, you said maybe it won't be vector search, maybe it will be something else, because you are in that experimental mode, right? Yeah.
+Whereas you can easily, quickly transition and kind of keep that knowledge, keep it going and keep it running. Yeah, that's pretty much why I'm doing this. It's been really fun so far. Also startups, I like it. I like wearing multiple hats. Kind of just do it, trial by fire, and just get it done. +Yeah, yeah. With the money where your mouth, well, how do you say it in English? Put your money where your mouth is? Yeah, something along those lines. +Yeah, but I mean, basically, instead of just blogging or saying how cool it is, you actually go and try to apply it to some real use case, right? Exactly. +And if I may ask you, do you think that something, kind of tactically or strategically, is missing right now in the vector search space? Maybe along the lines of how we explain it, or maybe there are some untapped use cases, or something else that comes to you. +I think the big thing right now is, I may be wrong on this one, and I might be explaining it weirdly, but having a standard. Like, we don't really have a standard for any of this yet. And there's a bunch of things popping up, and everyone's going to be scared to move away. +So like, I feel like Elasticsearch, I don't know too much about the history and what's going on, but a bunch of people have built up their systems on that, and it's kind of been a standard for doing that text-based keyword searching and that kind of stuff. +And then when we say, oh yeah, do word embeddings and we'll make everything improve, do this, do this. But there's no standard in any of it. We're doing a vector database; some of the people are doing vector search with a database attached. +Everyone's kind of just doing their own thing. There's no big thing there that keeps people trying to make it similar. So yeah, there's no standard, which I think is kind of an issue.
And it's going to hurt everyone in the long run, because with no standard, +people won't be as excited to try it out, because there are too many options, and switching is too much of a pain. I don't know if that made sense, but that's sort of what I'm seeing as an issue right now. But it'll probably solve itself at some point. I think that happens naturally. +I guess explaining it, as you could have seen from my previous attempt at explaining vector search, it was all over the place. But it kind of gets hard. It's a step, it's a jump, and, yeah, not everyone will know similarity, like cosine distances. +You need to be sort of involved with machine learning. I think the best way around that is just making full pipelines for people, where you just put an image in, you get your result, and then go from there, and from there on they can start messing around with it. +But, in time, everyone I think will have that, and everyone's working toward that. +Yeah, I think what you said makes a lot of sense, and thanks for bringing up this topic, you know, standardization, because on one hand it basically points us to think that this field is still fragmented, right? I've blogged about it. +I had six databases, and then one evening I get a comment on the blog: hey, we are the new kid on the block, can you add us? That's the seventh database, right? So how many more are there? Probably, yeah, tens, I don't know. Oh yeah, they're always popping up. But it's good for innovation. +It's just, we're competing against ourselves. Like, we're competing in everything, but no one else really cares. We can all compete against each other, but the people that are actually going to use it are going to be like, look at this mess, why would we go into that area? Yeah. +And there was actually, you know, if you know the relevance and matching Slack, I don't know if you're on it, it's like a community of search practitioners and consultants. I think I am. Yeah. Yeah.
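For readers who haven't met the "cosine distance" mentioned above, it is just one minus the cosine of the angle between two vectors. A minimal definition in plain NumPy:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for vectors pointing the same way,
    1 for orthogonal vectors, 2 for exactly opposite ones."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_distance([1, 0], [2, 0]))   # same direction -> 0.0
print(cosine_distance([1, 0], [0, 1]))   # orthogonal     -> 1.0
print(cosine_distance([1, 0], [-1, 0]))  # opposite       -> 2.0
```

Because it ignores vector length, it compares the direction of embeddings rather than their magnitude, which is why it is such a common metric for similarity search.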
Yeah. It's a fantastic place. I'll also make sure to link it in the notes. +And there was one very interesting piece, actually touching on what you just said. There was a heated discussion on what we should call it: pre-filtering, single-stage filtering, something-filtering. +And you know, when you've invented that, you go to your users and you say, yeah, today we released single-stage pre-filter after-filter filtering, so please use it, right? And then some other company comes and says, no, we invented another one. +It's called after-pre-filtering, single-stage with double sub-stages. You know, I'm just making it up, obviously. Exactly, I know what you mean. I'm exaggerating, right? And then that's what you said. +Eventually it will hurt the users, because they will say, oh no, no, no, I have that single-stage filter-after-filtering, I will not go and switch it to another one, right? Yeah. No, exactly. It's just, but I think it's all young. I'm kind of new to this whole field and how things work. +But I think it's natural for these young fields. Everyone's going to race to the top and see. And you just got to do what you got to do. But it's interesting how it's all going to play out, whether there is going to be more communication between everyone or not. I don't know. +It might be a little bit above what I'm doing. But we'll see, the future awaits with all of this. But I still feel like, at the end of the day, you really need to focus on the users, right? You're not focusing on inventing a new glossary or a new dictionary for vector search. +Eventually it will be published, by the way. I'm sure there will be so many terms, it will be published. But we need to work toward that. Yeah. No, I agree, 100%. So hey Philip, it's been a really great discussion.
+I was thinking, would you like to announce something to the users of Milvus, or maybe those who are not yet using Milvus but would like to try it out? Yeah. So we have the vector search project, get involved. And it's a pretty easy one to check out and see how our system works. +And we are releasing a general release candidate, so pretty much our tried and true Milvus 2.0, in the coming month. And you also mentioned that other system, Towhee, you said? What is it about, and when will the, yeah. Towhee is ML pipeline software, kind of simplifying things, mainly for embeddings. +And it's all about embeddings and kind of making these for you. So it's a pipeline system. Everyone can operate it, and everyone can upload their solutions if they want, or download them. +And yeah, it's still a work in progress, but look out for it, because I think it's going to help in a lot of these areas. Yeah, that's super cool. That sounds very exciting, you know, to kind of take this package, plug things in, and try it out for real on the real data. Exactly. +That's fantastic. Thanks for doing this. We'll make sure to also mention this in the show notes, or link it if it's there by the time we publish. Yeah, awesome. +Thanks, Philip, so much for your time, for going so deep with me on even the philosophy behind neural networks, and for sharing your ideas and thoughts. Thanks so much. +And I hope we can make another episode at some point down the road, if you're open to it, especially as the company matures and the product matures. And you get more use cases. So I'm looking forward to more blog posts as well. Awesome. Yeah. Yeah, thanks for having me. +It was a really fun discussion. Thanks so much, Philip. Bye bye. Bye.
\ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md b/transcripts_with_timestamps/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md new file mode 100644 index 0000000..37b2bd9 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/grant-ingersoll-fractional-cto-leading-search-consultant-engineering-better-search.md @@ -0,0 +1,3903 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=r4HEpyur-OE

Vector + Podcast Live

Topics:

00:00 Kick-off introducing co:rise study platform

03:03 + Grant’s background

04:58 Principle of 3 C’s in the life of a CTO: Code, Conferences + and Customers

07:16 Principle of 3 C’s in the Search Engine development: Content, + Collaboration and Context

11:51 Balance between manual tuning in pursuit to + learn and Machine Learning

15:42 How to nurture intuition in building search + engine algorithms

18:51 How to change the approach of organizations to true + experimentation

23:17 Where should one start in approaching the data (like + click logs) for developing a search engine

29:36 How to measure the success + of your search engine

33:50 The role of manual query rating to improve search + result relevancy

36:56 What are the available datasets, tools and algorithms, + that allow us to build a search engine?

41:56 Vector search and its role in + broad search engine development and how the profession is shaping up

49:01 + The magical question of WHY: what motivates Grant to stay in the space

52:09 + Announcement from Grant: course discount code DGSEARCH10

54:55 Questions from + the audience

Show notes:

- Grant’s interview at Berlin Buzzwords 2016: + https://www.youtube.com/watch?v=Y13gZM5EGdc

- + “BM25 is so Yesterday: Modern Techniques for Better Search”: https://www.youtube.com/watch?v=CRZfc9lj7Po

- + “Taming text” - book co-authored by Grant: https://www.manning.com/books/taming-text

- + Search Fundamentals course - https://corise.com/course/search-fundamentals

- + Search with ML course - https://corise.com/course/search-with-machine-learning

- + Click Models for Web Search: https://github.com/markovi/PyClick

- + Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing, book + by Ron Kohavi et al: https://www.amazon.com/Trustworthy-Online-Controlled-Experiments-Practical-ebook/dp/B0845Y3DJV

- + Quepid, open source tool and free service for query rating and relevancy tuning: + https://quepid.com/

- + Grant’s talk in 2013 where he discussed the need for a vector field in Lucene and + Solr: https://www.youtube.com/watch?v=dCCqauwMWFE

- + Demo of multimodal search with CLIP: https://blog.muves.io/multilingual-and-multimodal-vector-search-with-hardware-acceleration-2091a825de78

- + Learning to Boost: https://www.youtube.com/watch?v=af1dyamySCs

' +image_url: https://media.rss.com/vector-podcast/20220609_020607_0461c4544521e6be53134d28774b7c4a.jpg +pub_date: Thu, 09 Jun 2022 14:51:07 GMT +title: Grant Ingersoll - Fractional CTO, Leading Search Consultant - Engineering Better + Search +url: https://rss.com/podcasts/vector-podcast/514832 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 28.400000000000002, + "text": " Hello there, vector podcast is here. I''m Dimitri Khan and I''ll be hosting + this session. And just a few words on the logistics.", "tokens": [50364, 2425, 456, + 11, 8062, 7367, 307, 510, 13, 286, 478, 20975, 270, 470, 18136, 293, 286, 603, 312, + 16058, 341, 5481, 13, 400, 445, 257, 1326, 2283, 322, 264, 27420, 13, 51784], "temperature": + 0.0, "avg_logprob": -0.4064546857561384, "compression_ratio": 1.1666666666666667, + "no_speech_prob": 0.08359098434448242}, {"id": 1, "seek": 2840, "start": 29.36, + "end": 38.32, "text": " Everyone in the audience feel free to submit your questions + either through a Q&A panel or directly in the chat and we will try to handle as + many questions as we can.", "tokens": [50412, 5198, 294, 264, 4034, 841, 1737, 281, + 10315, 428, 1651, 2139, 807, 257, 1249, 5, 32, 4831, 420, 3838, 294, 264, 5081, + 293, 321, 486, 853, 281, 4813, 382, 867, 1651, 382, 321, 393, 13, 50860], "temperature": + 0.0, "avg_logprob": -0.1969124231583033, "compression_ratio": 1.569672131147541, + "no_speech_prob": 0.06889087706804276}, {"id": 2, "seek": 2840, "start": 39.68, + "end": 55.519999999999996, "text": " I''ll save you words about core eyes. 
What''s + core eyes is a new education platform that transforms the way professionals build + technical high demand skills through top industry instructors and collective peer + learning.", "tokens": [50928, 286, 603, 3155, 291, 2283, 466, 4965, 2575, 13, 708, + 311, 4965, 2575, 307, 257, 777, 3309, 3663, 300, 35592, 264, 636, 11954, 1322, 6191, + 1090, 4733, 3942, 807, 1192, 3518, 28367, 293, 12590, 15108, 2539, 13, 51720], "temperature": + 0.0, "avg_logprob": -0.1969124231583033, "compression_ratio": 1.569672131147541, + "no_speech_prob": 0.06889087706804276}, {"id": 3, "seek": 5552, "start": 56.080000000000005, + "end": 69.12, "text": " And the format of their courses is innovative mixing live + instructor sessions with real world projects and fireside chats like this one technique + with operators who experts in their fields.", "tokens": [50392, 400, 264, 7877, + 295, 641, 7712, 307, 12999, 11983, 1621, 18499, 11081, 365, 957, 1002, 4455, 293, + 15044, 482, 38057, 411, 341, 472, 6532, 365, 19077, 567, 8572, 294, 641, 7909, 13, + 51044], "temperature": 0.0, "avg_logprob": -0.19098338796131648, "compression_ratio": + 1.5142857142857142, "no_speech_prob": 0.014435657300055027}, {"id": 4, "seek": 5552, + "start": 71.12, "end": 79.68, "text": " I will say a few words about myself as well, + untraditionally on the podcast, but I think it becomes a tradition now second time.", + "tokens": [51144, 286, 486, 584, 257, 1326, 2283, 466, 2059, 382, 731, 11, 1701, + 6206, 15899, 322, 264, 7367, 11, 457, 286, 519, 309, 3643, 257, 6994, 586, 1150, + 565, 13, 51572], "temperature": 0.0, "avg_logprob": -0.19098338796131648, "compression_ratio": + 1.5142857142857142, "no_speech_prob": 0.014435657300055027}, {"id": 5, "seek": 7968, + "start": 80.64, "end": 94.48, "text": " I said and Dimitri Khan I have a PhD in + natural language processing. I''ve worked at company Alphasense helped to build + the search stack. 
I spent like a decade, you know, there.", "tokens": [50412, 286, + 848, 293, 20975, 270, 470, 18136, 286, 362, 257, 14476, 294, 3303, 2856, 9007, 13, + 286, 600, 2732, 412, 2237, 967, 7485, 1288, 4254, 281, 1322, 264, 3164, 8630, 13, + 286, 4418, 411, 257, 10378, 11, 291, 458, 11, 456, 13, 51104], "temperature": 0.0, + "avg_logprob": -0.28397237141927084, "compression_ratio": 1.4158415841584158, "no_speech_prob": + 0.018019085749983788}, {"id": 6, "seek": 7968, "start": 95.68, "end": 103.68, "text": + " I''ve been a principal AI scientist at silo AI. It''s a AI consulting gig focusing + on a number of ML verticals.", "tokens": [51164, 286, 600, 668, 257, 9716, 7318, + 12662, 412, 3425, 78, 7318, 13, 467, 311, 257, 7318, 23682, 8741, 8416, 322, 257, + 1230, 295, 21601, 9429, 82, 13, 51564], "temperature": 0.0, "avg_logprob": -0.28397237141927084, + "compression_ratio": 1.4158415841584158, "no_speech_prob": 0.018019085749983788}, + {"id": 7, "seek": 10368, "start": 104.64, "end": 109.84, "text": " And recently + I joined company Tom Tom as a senior product manager working on search.", "tokens": + [50412, 400, 3938, 286, 6869, 2237, 5041, 5041, 382, 257, 7965, 1674, 6598, 1364, + 322, 3164, 13, 50672], "temperature": 0.0, "avg_logprob": -0.21967054393193494, + "compression_ratio": 1.471698113207547, "no_speech_prob": 0.04752865061163902}, + {"id": 8, "seek": 10368, "start": 111.60000000000001, "end": 117.76, "text": " I''ve + also been a contributor and user of Cupid. It''s a query rating tool go check it + out. 
It''s an open source tool.", "tokens": [50760, 286, 600, 611, 668, 257, 42859, + 293, 4195, 295, 383, 6127, 13, 467, 311, 257, 14581, 10990, 2290, 352, 1520, 309, + 484, 13, 467, 311, 364, 1269, 4009, 2290, 13, 51068], "temperature": 0.0, "avg_logprob": + -0.21967054393193494, "compression_ratio": 1.471698113207547, "no_speech_prob": + 0.04752865061163902}, {"id": 9, "seek": 10368, "start": 118.64000000000001, "end": + 125.60000000000001, "text": " So overall I spent like 16 years in developing search + engines for startups and multinational technology giants.", "tokens": [51112, 407, + 4787, 286, 4418, 411, 3165, 924, 294, 6416, 3164, 12982, 337, 28041, 293, 45872, + 1478, 2899, 31894, 13, 51460], "temperature": 0.0, "avg_logprob": -0.21967054393193494, + "compression_ratio": 1.471698113207547, "no_speech_prob": 0.04752865061163902}, + {"id": 10, "seek": 12560, "start": 126.24, "end": 134.16, "text": " I also happen + to be hosting this podcast vector podcast go check it out. I''ll share the link + in a second.", "tokens": [50396, 286, 611, 1051, 281, 312, 16058, 341, 7367, 8062, + 7367, 352, 1520, 309, 484, 13, 286, 603, 2073, 264, 2113, 294, 257, 1150, 13, 50792], + "temperature": 0.0, "avg_logprob": -0.1895643570843865, "compression_ratio": 1.587962962962963, + "no_speech_prob": 0.010211152024567127}, {"id": 11, "seek": 12560, "start": 135.35999999999999, + "end": 142.88, "text": " And I''m also blogging on medium on on my findings in vector + search. 
So you might hear me talking about vector search here and there.", "tokens": + [50852, 400, 286, 478, 611, 6968, 3249, 322, 6399, 322, 322, 452, 16483, 294, 8062, + 3164, 13, 407, 291, 1062, 1568, 385, 1417, 466, 8062, 3164, 510, 293, 456, 13, 51228], + "temperature": 0.0, "avg_logprob": -0.1895643570843865, "compression_ratio": 1.587962962962963, + "no_speech_prob": 0.010211152024567127}, {"id": 12, "seek": 12560, "start": 144.07999999999998, + "end": 152.56, "text": " And today I''m super super excited to have Grant in your + soul with me. I''ve known Grant since about 2011.", "tokens": [51288, 400, 965, + 286, 478, 1687, 1687, 2919, 281, 362, 17529, 294, 428, 5133, 365, 385, 13, 286, + 600, 2570, 17529, 1670, 466, 10154, 13, 51712], "temperature": 0.0, "avg_logprob": + -0.1895643570843865, "compression_ratio": 1.587962962962963, "no_speech_prob": 0.010211152024567127}, + {"id": 13, "seek": 15256, "start": 152.72, "end": 162.32, "text": " Not personally, + but I''ve seen I''ve seen him on stage on you know Berlin buzzwords conference and + Lucinda revolution.", "tokens": [50372, 1726, 5665, 11, 457, 286, 600, 1612, 286, + 600, 1612, 796, 322, 3233, 322, 291, 458, 13848, 13036, 13832, 7586, 293, 9593, + 6837, 8894, 13, 50852], "temperature": 0.0, "avg_logprob": -0.2910769271850586, + "compression_ratio": 1.5645756457564575, "no_speech_prob": 0.011651513166725636}, + {"id": 14, "seek": 15256, "start": 162.32, "end": 168.24, "text": " And he has been + a long contributor and open source as well. Solary, Lucinda Mahoot and others.", + "tokens": [50852, 400, 415, 575, 668, 257, 938, 42859, 293, 1269, 4009, 382, 731, + 13, 7026, 822, 11, 9593, 6837, 10104, 6259, 293, 2357, 13, 51148], "temperature": + 0.0, "avg_logprob": -0.2910769271850586, "compression_ratio": 1.5645756457564575, + "no_speech_prob": 0.011651513166725636}, {"id": 15, "seek": 15256, "start": 169.44, + "end": 174.8, "text": " And very very effective presenter. 
I just watched a few + presentations as a homework for this session.", "tokens": [51208, 400, 588, 588, + 4942, 35594, 13, 286, 445, 6337, 257, 1326, 18964, 382, 257, 14578, 337, 341, 5481, + 13, 51476], "temperature": 0.0, "avg_logprob": -0.2910769271850586, "compression_ratio": + 1.5645756457564575, "no_speech_prob": 0.011651513166725636}, {"id": 16, "seek": + 15256, "start": 175.52, "end": 181.2, "text": " There will be some questions from + there. But hey Grant, let''s start with an introduction from Indio own words.", + "tokens": [51512, 821, 486, 312, 512, 1651, 490, 456, 13, 583, 4177, 17529, 11, + 718, 311, 722, 365, 364, 9339, 490, 2333, 1004, 1065, 2283, 13, 51796], "temperature": + 0.0, "avg_logprob": -0.2910769271850586, "compression_ratio": 1.5645756457564575, + "no_speech_prob": 0.011651513166725636}, {"id": 17, "seek": 18256, "start": 182.56, + "end": 191.84, "text": " Hey Dmitri and thank you so much for having me on the vector + podcast and obviously props to co rise here as well for helping sponsor this.", + "tokens": [50364, 1911, 413, 3508, 470, 293, 1309, 291, 370, 709, 337, 1419, 385, + 322, 264, 8062, 7367, 293, 2745, 26173, 281, 598, 6272, 510, 382, 731, 337, 4315, + 16198, 341, 13, 50828], "temperature": 0.0, "avg_logprob": -0.26167962816026474, + "compression_ratio": 1.5546218487394958, "no_speech_prob": 0.001603670185431838}, + {"id": 18, "seek": 18256, "start": 192.72, "end": 201.2, "text": " Both Daniel Tungaling + and I are on the co rise platform and really enjoying our time there. So real quick + about myself.", "tokens": [50872, 6767, 8033, 314, 1063, 4270, 293, 286, 366, 322, + 264, 598, 6272, 3663, 293, 534, 9929, 527, 565, 456, 13, 407, 957, 1702, 466, 2059, + 13, 51296], "temperature": 0.0, "avg_logprob": -0.26167962816026474, "compression_ratio": + 1.5546218487394958, "no_speech_prob": 0.001603670185431838}, {"id": 19, "seek": + 18256, "start": 201.2, "end": 208.88, "text": " As you said, my name is Grant Ingersoll. 
+ I guess these days a long standing user and contributor and committer.", "tokens": + [51296, 1018, 291, 848, 11, 452, 1315, 307, 17529, 682, 9458, 1833, 13, 286, 2041, + 613, 1708, 257, 938, 4877, 4195, 293, 42859, 293, 5599, 391, 13, 51680], "temperature": + 0.0, "avg_logprob": -0.26167962816026474, "compression_ratio": 1.5546218487394958, + "no_speech_prob": 0.001603670185431838}, {"id": 20, "seek": 20888, "start": 209.35999999999999, + "end": 221.6, "text": " And generally somebody who participates in the search space, + if you will, I think I wrote my first Lucine code back in 2004 or so. I guess that + maybe makes me old.", "tokens": [50388, 400, 5101, 2618, 567, 3421, 1024, 294, 264, + 3164, 1901, 11, 498, 291, 486, 11, 286, 519, 286, 4114, 452, 700, 9593, 533, 3089, + 646, 294, 15817, 420, 370, 13, 286, 2041, 300, 1310, 1669, 385, 1331, 13, 51000], + "temperature": 0.0, "avg_logprob": -0.14172953528326912, "compression_ratio": 1.5502645502645502, + "no_speech_prob": 0.005819553509354591}, {"id": 21, "seek": 20888, "start": 222.72, + "end": 231.68, "text": " As far as my background is I was one of the co founders + of Lucidworks, which is one of the leading companies in the search space.", "tokens": + [51056, 1018, 1400, 382, 452, 3678, 307, 286, 390, 472, 295, 264, 598, 25608, 295, + 9593, 327, 18357, 11, 597, 307, 472, 295, 264, 5775, 3431, 294, 264, 3164, 1901, + 13, 51504], "temperature": 0.0, "avg_logprob": -0.14172953528326912, "compression_ratio": + 1.5502645502645502, "no_speech_prob": 0.005819553509354591}, {"id": 22, "seek": + 23168, "start": 232.64000000000001, "end": 240.08, "text": " I then left them in + 2019 to become the chief technology officer at the Wikimedia Foundation.", "tokens": + [50412, 286, 550, 1411, 552, 294, 6071, 281, 1813, 264, 9588, 2899, 8456, 412, 264, + 23377, 332, 14212, 10335, 13, 50784], "temperature": 0.0, "avg_logprob": -0.18921605278463924, + "compression_ratio": 1.4193548387096775, "no_speech_prob": 
0.01034748274832964}, + {"id": 23, "seek": 23168, "start": 240.88, "end": 246.96, "text": " You probably + know them better as the nonprofit behind Wikipedia and Wikidata.", "tokens": [50824, + 509, 1391, 458, 552, 1101, 382, 264, 23348, 2261, 28999, 293, 23377, 327, 3274, + 13, 51128], "temperature": 0.0, "avg_logprob": -0.18921605278463924, "compression_ratio": + 1.4193548387096775, "no_speech_prob": 0.01034748274832964}, {"id": 24, "seek": 23168, + "start": 247.6, "end": 256.96000000000004, "text": " So I was the CTO there for + two years. And then in August or so of 2021, I took some time off.", "tokens": [51160, + 407, 286, 390, 264, 383, 15427, 456, 337, 732, 924, 13, 400, 550, 294, 6897, 420, + 370, 295, 7201, 11, 286, 1890, 512, 565, 766, 13, 51628], "temperature": 0.0, "avg_logprob": + -0.18921605278463924, "compression_ratio": 1.4193548387096775, "no_speech_prob": + 0.01034748274832964}, {"id": 25, "seek": 25696, "start": 257.91999999999996, "end": + 265.52, "text": " And then in January of 2022, I went on my own as a consultant + and an instructor for co rise.", "tokens": [50412, 400, 550, 294, 7061, 295, 20229, + 11, 286, 1437, 322, 452, 1065, 382, 257, 24676, 293, 364, 18499, 337, 598, 6272, + 13, 50792], "temperature": 0.0, "avg_logprob": -0.17949795455075382, "compression_ratio": + 1.5166666666666666, "no_speech_prob": 0.005357579793781042}, {"id": 26, "seek": + 25696, "start": 265.52, "end": 274.79999999999995, "text": " So here we are now. 
+ I am commonly doing work in what I would call fractional CTO land, which means I + primarily help companies", "tokens": [50792, 407, 510, 321, 366, 586, 13, 286, 669, + 12719, 884, 589, 294, 437, 286, 576, 818, 17948, 1966, 383, 15427, 2117, 11, 597, + 1355, 286, 10029, 854, 3431, 51256], "temperature": 0.0, "avg_logprob": -0.17949795455075382, + "compression_ratio": 1.5166666666666666, "no_speech_prob": 0.005357579793781042}, + {"id": 27, "seek": 25696, "start": 275.59999999999997, "end": 284.71999999999997, + "text": " kind of get their technology stack in order, make decisions about technology, + higher teams, upgrade teams, do all the things that a CTO would do.", "tokens": + [51296, 733, 295, 483, 641, 2899, 8630, 294, 1668, 11, 652, 5327, 466, 2899, 11, + 2946, 5491, 11, 11484, 5491, 11, 360, 439, 264, 721, 300, 257, 383, 15427, 576, + 360, 13, 51752], "temperature": 0.0, "avg_logprob": -0.17949795455075382, "compression_ratio": + 1.5166666666666666, "no_speech_prob": 0.005357579793781042}, {"id": 28, "seek": + 28472, "start": 285.28000000000003, "end": 288.88000000000005, "text": " Often for + small businesses and or startups.", "tokens": [50392, 20043, 337, 1359, 6011, 293, + 420, 28041, 13, 50572], "temperature": 0.0, "avg_logprob": -0.2162195506848787, + "compression_ratio": 1.6049382716049383, "no_speech_prob": 0.005259320139884949}, + {"id": 29, "seek": 28472, "start": 290.64000000000004, "end": 295.68, "text": " + And so that''s really my background. Really happy to be here and looking forward + to the podcast.", "tokens": [50660, 400, 370, 300, 311, 534, 452, 3678, 13, 4083, + 2055, 281, 312, 510, 293, 1237, 2128, 281, 264, 7367, 13, 50912], "temperature": + 0.0, "avg_logprob": -0.2162195506848787, "compression_ratio": 1.6049382716049383, + "no_speech_prob": 0.005259320139884949}, {"id": 30, "seek": 28472, "start": 296.88000000000005, + "end": 305.92, "text": " Awesome. Great to have you, really grand. 
And also, you + know, finally, I have a chance to ask some questions and chat to you in this cozy + atmosphere as well.", "tokens": [50972, 10391, 13, 3769, 281, 362, 291, 11, 534, + 2697, 13, 400, 611, 11, 291, 458, 11, 2721, 11, 286, 362, 257, 2931, 281, 1029, + 512, 1651, 293, 5081, 281, 291, 294, 341, 29414, 8018, 382, 731, 13, 51424], "temperature": + 0.0, "avg_logprob": -0.2162195506848787, "compression_ratio": 1.6049382716049383, + "no_speech_prob": 0.005259320139884949}, {"id": 31, "seek": 28472, "start": 307.36, + "end": 314.0, "text": " And I wanted to start with a question. So I was watching + a kind of short interview you gave.", "tokens": [51496, 400, 286, 1415, 281, 722, + 365, 257, 1168, 13, 407, 286, 390, 1976, 257, 733, 295, 2099, 4049, 291, 2729, 13, + 51828], "temperature": 0.0, "avg_logprob": -0.2162195506848787, "compression_ratio": + 1.6049382716049383, "no_speech_prob": 0.005259320139884949}, {"id": 32, "seek": + 31472, "start": 314.88000000000005, "end": 322.40000000000003, "text": " During + Berlin buzzwords 2016, where you said how you split your time as then CTO", "tokens": + [50372, 6842, 13848, 13036, 13832, 6549, 11, 689, 291, 848, 577, 291, 7472, 428, + 565, 382, 550, 383, 15427, 50748], "temperature": 0.0, "avg_logprob": -0.2232109815224834, + "compression_ratio": 1.5818181818181818, "no_speech_prob": 0.0024626140948385}, + {"id": 33, "seek": 31472, "start": 323.6, "end": 329.20000000000005, "text": " of + I believe, Lucid works. You said that you split your time between three C''s, which + is writing code,", "tokens": [50808, 295, 286, 1697, 11, 9593, 327, 1985, 13, 509, + 848, 300, 291, 7472, 428, 565, 1296, 1045, 383, 311, 11, 597, 307, 3579, 3089, 11, + 51088], "temperature": 0.0, "avg_logprob": -0.2232109815224834, "compression_ratio": + 1.5818181818181818, "no_speech_prob": 0.0024626140948385}, {"id": 34, "seek": 31472, + "start": 330.0, "end": 336.64000000000004, "text": " going to conferences and talking + to customers. 
Now that you''re independent, is this how you spend your time or did + you", "tokens": [51128, 516, 281, 22032, 293, 1417, 281, 4581, 13, 823, 300, 291, + 434, 6695, 11, 307, 341, 577, 291, 3496, 428, 565, 420, 630, 291, 51460], "temperature": + 0.0, "avg_logprob": -0.2232109815224834, "compression_ratio": 1.5818181818181818, + "no_speech_prob": 0.0024626140948385}, {"id": 35, "seek": 31472, "start": 336.64000000000004, + "end": 338.72, "text": " did you get some new letters of the alphabet?", "tokens": + [51460, 630, 291, 483, 512, 777, 7825, 295, 264, 23339, 30, 51564], "temperature": + 0.0, "avg_logprob": -0.2232109815224834, "compression_ratio": 1.5818181818181818, + "no_speech_prob": 0.0024626140948385}, {"id": 36, "seek": 33872, "start": 339.12, + "end": 339.92, "text": " Yeah.", "tokens": [50384, 865, 13, 50424], "temperature": + 0.0, "avg_logprob": -0.23830691443549262, "compression_ratio": 1.5161290322580645, + "no_speech_prob": 0.007980392314493656}, {"id": 37, "seek": 33872, "start": 341.12, + "end": 349.76000000000005, "text": " Yeah, and there''s often in there as well, + colleagues and co-workers, you know, especially in, you know, the CTO role is kind + of a funny one, right?", "tokens": [50484, 865, 11, 293, 456, 311, 2049, 294, 456, + 382, 731, 11, 7734, 293, 598, 12, 37101, 11, 291, 458, 11, 2318, 294, 11, 291, 458, + 11, 264, 383, 15427, 3090, 307, 733, 295, 257, 4074, 472, 11, 558, 30, 50916], "temperature": + 0.0, "avg_logprob": -0.23830691443549262, "compression_ratio": 1.5161290322580645, + "no_speech_prob": 0.007980392314493656}, {"id": 38, "seek": 33872, "start": 350.40000000000003, + "end": 358.40000000000003, "text": " Depending on the company, it can mean a lot + of different things. 
At some companies, CTOs are entirely outward facing.", "tokens": + [50948, 22539, 322, 264, 2237, 11, 309, 393, 914, 257, 688, 295, 819, 721, 13, 1711, + 512, 3431, 11, 383, 15427, 82, 366, 7696, 26914, 7170, 13, 51348], "temperature": + 0.0, "avg_logprob": -0.23830691443549262, "compression_ratio": 1.5161290322580645, + "no_speech_prob": 0.007980392314493656}, {"id": 39, "seek": 33872, "start": 358.40000000000003, + "end": 363.44000000000005, "text": " It''s effectively a sales role or a marketing + role, right?", "tokens": [51348, 467, 311, 8659, 257, 5763, 3090, 420, 257, 6370, + 3090, 11, 558, 30, 51600], "temperature": 0.0, "avg_logprob": -0.23830691443549262, + "compression_ratio": 1.5161290322580645, "no_speech_prob": 0.007980392314493656}, + {"id": 40, "seek": 36344, "start": 364.16, "end": 368.16, "text": " You''re out + evangelizing the product, you''re talking to customers, etc.", "tokens": [50400, + 509, 434, 484, 24546, 3319, 264, 1674, 11, 291, 434, 1417, 281, 4581, 11, 5183, + 13, 50600], "temperature": 0.0, "avg_logprob": -0.13048405017492906, "compression_ratio": + 1.808695652173913, "no_speech_prob": 0.011804792098701}, {"id": 41, "seek": 36344, + "start": 368.71999999999997, "end": 376.96, "text": " In a startup, the CTO is often + the primary engineer. 
If you''re a two person startup and you''re just", "tokens": + [50628, 682, 257, 18578, 11, 264, 383, 15427, 307, 2049, 264, 6194, 11403, 13, 759, + 291, 434, 257, 732, 954, 18578, 293, 291, 434, 445, 51040], "temperature": 0.0, + "avg_logprob": -0.13048405017492906, "compression_ratio": 1.808695652173913, "no_speech_prob": + 0.011804792098701}, {"id": 42, "seek": 36344, "start": 376.96, "end": 381.52, "text": + " getting off the ground, you probably have the CTO title if you''re the technical + one in that startup,", "tokens": [51040, 1242, 766, 264, 2727, 11, 291, 1391, 362, + 264, 383, 15427, 4876, 498, 291, 434, 264, 6191, 472, 294, 300, 18578, 11, 51268], + "temperature": 0.0, "avg_logprob": -0.13048405017492906, "compression_ratio": 1.808695652173913, + "no_speech_prob": 0.011804792098701}, {"id": 43, "seek": 36344, "start": 381.52, + "end": 383.92, "text": " and you''re probably writing all the code, right?", "tokens": + [51268, 293, 291, 434, 1391, 3579, 439, 264, 3089, 11, 558, 30, 51388], "temperature": + 0.0, "avg_logprob": -0.13048405017492906, "compression_ratio": 1.808695652173913, + "no_speech_prob": 0.011804792098701}, {"id": 44, "seek": 36344, "start": 385.52, + "end": 390.4, "text": " In other places, you''re running your engineering team, + and you may not be writing as much code,", "tokens": [51468, 682, 661, 3190, 11, + 291, 434, 2614, 428, 7043, 1469, 11, 293, 291, 815, 406, 312, 3579, 382, 709, 3089, + 11, 51712], "temperature": 0.0, "avg_logprob": -0.13048405017492906, "compression_ratio": + 1.808695652173913, "no_speech_prob": 0.011804792098701}, {"id": 45, "seek": 39040, + "start": 390.4, "end": 396.15999999999997, "text": " but you''re responsible for + the team. 
I guess over my years, I''ve worn all of those hats.", "tokens": [50364, + 457, 291, 434, 6250, 337, 264, 1469, 13, 286, 2041, 670, 452, 924, 11, 286, 600, + 15254, 439, 295, 729, 20549, 13, 50652], "temperature": 0.0, "avg_logprob": -0.14257277382744682, + "compression_ratio": 1.682170542635659, "no_speech_prob": 0.003562831087037921}, + {"id": 46, "seek": 39040, "start": 397.03999999999996, "end": 403.76, "text": " + I''ve been out doing conferences and evangelizing. I''ve done a lot of sales work, + especially later on at", "tokens": [50696, 286, 600, 668, 484, 884, 22032, 293, + 24546, 3319, 13, 286, 600, 1096, 257, 688, 295, 5763, 589, 11, 2318, 1780, 322, + 412, 51032], "temperature": 0.0, "avg_logprob": -0.14257277382744682, "compression_ratio": + 1.682170542635659, "no_speech_prob": 0.003562831087037921}, {"id": 47, "seek": 39040, + "start": 403.76, "end": 411.03999999999996, "text": " Lucidworks, I did a lot of + sales work as the company evolved and grew. When I was at Wikimedia, it was all", + "tokens": [51032, 9593, 327, 18357, 11, 286, 630, 257, 688, 295, 5763, 589, 382, + 264, 2237, 14178, 293, 6109, 13, 1133, 286, 390, 412, 23377, 332, 14212, 11, 309, + 390, 439, 51396], "temperature": 0.0, "avg_logprob": -0.14257277382744682, "compression_ratio": + 1.682170542635659, "no_speech_prob": 0.003562831087037921}, {"id": 48, "seek": 39040, + "start": 411.03999999999996, "end": 419.2, "text": " pretty much internal running + the technology team, making, you know, helping making technology decisions, all + of those kinds of things.", "tokens": [51396, 1238, 709, 6920, 2614, 264, 2899, + 1469, 11, 1455, 11, 291, 458, 11, 4315, 1455, 2899, 5327, 11, 439, 295, 729, 3685, + 295, 721, 13, 51804], "temperature": 0.0, "avg_logprob": -0.14257277382744682, "compression_ratio": + 1.682170542635659, "no_speech_prob": 0.003562831087037921}, {"id": 49, "seek": 42040, + "start": 420.88, "end": 426.4, "text": " So I wouldn''t necessarily say it''s changed + much. 
I still do write some code, but not as much as", "tokens": [50388, 407, 286, + 2759, 380, 4725, 584, 309, 311, 3105, 709, 13, 286, 920, 360, 2464, 512, 3089, 11, + 457, 406, 382, 709, 382, 50664], "temperature": 0.0, "avg_logprob": -0.15673101562814615, + "compression_ratio": 1.532, "no_speech_prob": 0.016913631930947304}, {"id": 50, + "seek": 42040, "start": 427.2, "end": 434.15999999999997, "text": " as I used to, + I guess, when I was a full-time engineer. But yeah, it still roughly falls into + those", "tokens": [50704, 382, 286, 1143, 281, 11, 286, 2041, 11, 562, 286, 390, + 257, 1577, 12, 3766, 11403, 13, 583, 1338, 11, 309, 920, 9810, 8804, 666, 729, 51052], + "temperature": 0.0, "avg_logprob": -0.15673101562814615, "compression_ratio": 1.532, + "no_speech_prob": 0.016913631930947304}, {"id": 51, "seek": 42040, "start": 434.15999999999997, + "end": 443.03999999999996, "text": " categories. Yeah, and I mean, like having been + a student on your course, I''ve really enjoyed", "tokens": [51052, 10479, 13, 865, + 11, 293, 286, 914, 11, 411, 1419, 668, 257, 3107, 322, 428, 1164, 11, 286, 600, + 534, 4626, 51496], "temperature": 0.0, "avg_logprob": -0.15673101562814615, "compression_ratio": + 1.532, "no_speech_prob": 0.016913631930947304}, {"id": 52, "seek": 42040, "start": + 443.03999999999996, "end": 448.88, "text": " so much code that you''ve written to + support this infrastructure of building the search engine.", "tokens": [51496, 370, + 709, 3089, 300, 291, 600, 3720, 281, 1406, 341, 6896, 295, 2390, 264, 3164, 2848, + 13, 51788], "temperature": 0.0, "avg_logprob": -0.15673101562814615, "compression_ratio": + 1.532, "no_speech_prob": 0.016913631930947304}, {"id": 53, "seek": 44888, "start": + 448.88, "end": 454.4, "text": " And I mean, you are still highly technical person, + so I wouldn''t discount that. 
And I mean, this is", "tokens": [50364, 400, 286, + 914, 11, 291, 366, 920, 5405, 6191, 954, 11, 370, 286, 2759, 380, 11635, 300, 13, + 400, 286, 914, 11, 341, 307, 50640], "temperature": 0.0, "avg_logprob": -0.1724810269799563, + "compression_ratio": 1.5853658536585367, "no_speech_prob": 0.004058916121721268}, + {"id": 54, "seek": 44888, "start": 454.4, "end": 460.15999999999997, "text": " something + that is dear to my heart as well for me being an engineer, to talk to like-minded + person.", "tokens": [50640, 746, 300, 307, 6875, 281, 452, 1917, 382, 731, 337, + 385, 885, 364, 11403, 11, 281, 751, 281, 411, 12, 23310, 954, 13, 50928], "temperature": + 0.0, "avg_logprob": -0.1724810269799563, "compression_ratio": 1.5853658536585367, + "no_speech_prob": 0.004058916121721268}, {"id": 55, "seek": 44888, "start": 461.76, + "end": 468.71999999999997, "text": " And in this segment year, in the same conference, + 2017, you gave an excellent talk title,", "tokens": [51008, 400, 294, 341, 9469, + 1064, 11, 294, 264, 912, 7586, 11, 6591, 11, 291, 2729, 364, 7103, 751, 4876, 11, + 51356], "temperature": 0.0, "avg_logprob": -0.1724810269799563, "compression_ratio": + 1.5853658536585367, "no_speech_prob": 0.004058916121721268}, {"id": 56, "seek": + 44888, "start": 468.71999999999997, "end": 475.52, "text": " BM25, is so yesterday, + modern techniques for better search. And what''s funny, and I''m going to share", + "tokens": [51356, 15901, 6074, 11, 307, 370, 5186, 11, 4363, 7512, 337, 1101, 3164, + 13, 400, 437, 311, 4074, 11, 293, 286, 478, 516, 281, 2073, 51696], "temperature": + 0.0, "avg_logprob": -0.1724810269799563, "compression_ratio": 1.5853658536585367, + "no_speech_prob": 0.004058916121721268}, {"id": 57, "seek": 47552, "start": 475.59999999999997, + "end": 480.79999999999995, "text": " the link as well. 
But what''s funny is that + I don''t know if you noticed it yourself, but you again have", "tokens": [50368, + 264, 2113, 382, 731, 13, 583, 437, 311, 4074, 307, 300, 286, 500, 380, 458, 498, + 291, 5694, 309, 1803, 11, 457, 291, 797, 362, 50628], "temperature": 0.0, "avg_logprob": + -0.13290157318115234, "compression_ratio": 1.6219512195121952, "no_speech_prob": + 0.01171757560223341}, {"id": 58, "seek": 47552, "start": 480.79999999999995, "end": + 486.4, "text": " three C''s in there. I wonder if you did it on purpose. What you + have there as building blocks of", "tokens": [50628, 1045, 383, 311, 294, 456, 13, + 286, 2441, 498, 291, 630, 309, 322, 4334, 13, 708, 291, 362, 456, 382, 2390, 8474, + 295, 50908], "temperature": 0.0, "avg_logprob": -0.13290157318115234, "compression_ratio": + 1.6219512195121952, "no_speech_prob": 0.01171757560223341}, {"id": 59, "seek": 47552, + "start": 486.4, "end": 492.79999999999995, "text": " this kind of journey of building + a search engine. So the first one is content. And you piggyback on", "tokens": [50908, + 341, 733, 295, 4671, 295, 2390, 257, 3164, 2848, 13, 407, 264, 700, 472, 307, 2701, + 13, 400, 291, 39349, 3207, 322, 51228], "temperature": 0.0, "avg_logprob": -0.13290157318115234, + "compression_ratio": 1.6219512195121952, "no_speech_prob": 0.01171757560223341}, + {"id": 60, "seek": 47552, "start": 492.79999999999995, "end": 499.2, "text": " solar + capabilities, but in general, it could be any search engine out there with rules + for content,", "tokens": [51228, 7936, 10862, 11, 457, 294, 2674, 11, 309, 727, + 312, 604, 3164, 2848, 484, 456, 365, 4474, 337, 2701, 11, 51548], "temperature": + 0.0, "avg_logprob": -0.13290157318115234, "compression_ratio": 1.6219512195121952, + "no_speech_prob": 0.01171757560223341}, {"id": 61, "seek": 49920, "start": 499.2, + "end": 504.08, "text": " like with boosting, manual boosting, you know, lending + pages, and so on. 
The second C", "tokens": [50364, 411, 365, 43117, 11, 9688, 43117, + 11, 291, 458, 11, 29823, 7183, 11, 293, 370, 322, 13, 440, 1150, 383, 50608], "temperature": + 0.0, "avg_logprob": -0.14259627713995465, "compression_ratio": 1.703971119133574, + "no_speech_prob": 0.005128503777086735}, {"id": 62, "seek": 49920, "start": 505.03999999999996, + "end": 510.64, "text": " is collaboration. So that''s like the way you put it, it''s + collective intelligence to predict", "tokens": [50656, 307, 9363, 13, 407, 300, + 311, 411, 264, 636, 291, 829, 309, 11, 309, 311, 12590, 7599, 281, 6069, 50936], + "temperature": 0.0, "avg_logprob": -0.14259627713995465, "compression_ratio": 1.703971119133574, + "no_speech_prob": 0.005128503777086735}, {"id": 63, "seek": 49920, "start": 510.64, + "end": 515.36, "text": " user behavior based on like historical aggregated data. + And this is where I think recommenders", "tokens": [50936, 4195, 5223, 2361, 322, + 411, 8584, 16743, 770, 1412, 13, 400, 341, 307, 689, 286, 519, 2748, 433, 51172], + "temperature": 0.0, "avg_logprob": -0.14259627713995465, "compression_ratio": 1.703971119133574, + "no_speech_prob": 0.005128503777086735}, {"id": 64, "seek": 49920, "start": 515.36, + "end": 522.64, "text": " come in, popularity, signals, and so on. And last but not + least, you have context, which is when you", "tokens": [51172, 808, 294, 11, 19301, + 11, 12354, 11, 293, 370, 322, 13, 400, 1036, 457, 406, 1935, 11, 291, 362, 4319, + 11, 597, 307, 562, 291, 51536], "temperature": 0.0, "avg_logprob": -0.14259627713995465, + "compression_ratio": 1.703971119133574, "no_speech_prob": 0.005128503777086735}, + {"id": 65, "seek": 49920, "start": 522.64, "end": 527.52, "text": " ask questions, + who are you, where are you, you know, what I have you done previously. 
And this + is", "tokens": [51536, 1029, 1651, 11, 567, 366, 291, 11, 689, 366, 291, 11, 291, + 458, 11, 437, 286, 362, 291, 1096, 8046, 13, 400, 341, 307, 51780], "temperature": + 0.0, "avg_logprob": -0.14259627713995465, "compression_ratio": 1.703971119133574, + "no_speech_prob": 0.005128503777086735}, {"id": 66, "seek": 52752, "start": 527.76, + "end": 533.12, "text": " when you start doing market and user segmentation and venture + into personalization and so on,", "tokens": [50376, 562, 291, 722, 884, 2142, 293, + 4195, 9469, 399, 293, 18474, 666, 2973, 2144, 293, 370, 322, 11, 50644], "temperature": + 0.0, "avg_logprob": -0.13416558343010979, "compression_ratio": 1.6992753623188406, + "no_speech_prob": 0.0010685856686905026}, {"id": 67, "seek": 52752, "start": 533.84, + "end": 539.04, "text": " would you say that you view the search engine journey and + development the same way today, or", "tokens": [50680, 576, 291, 584, 300, 291, + 1910, 264, 3164, 2848, 4671, 293, 3250, 264, 912, 636, 965, 11, 420, 50940], "temperature": + 0.0, "avg_logprob": -0.13416558343010979, "compression_ratio": 1.6992753623188406, + "no_speech_prob": 0.0010685856686905026}, {"id": 68, "seek": 52752, "start": 539.04, + "end": 544.4, "text": " have you have you changed your perspective? 
I really need + to check and get a little more creative.", "tokens": [50940, 362, 291, 362, 291, + 3105, 428, 4585, 30, 286, 534, 643, 281, 1520, 293, 483, 257, 707, 544, 5880, 13, + 51208], "temperature": 0.0, "avg_logprob": -0.13416558343010979, "compression_ratio": + 1.6992753623188406, "no_speech_prob": 0.0010685856686905026}, {"id": 69, "seek": + 52752, "start": 544.4, "end": 551.4399999999999, "text": " I think I''m using the + letter C there too many times in a row, but I mean, I think a lot of", "tokens": + [51208, 286, 519, 286, 478, 1228, 264, 5063, 383, 456, 886, 867, 1413, 294, 257, + 5386, 11, 457, 286, 914, 11, 286, 519, 257, 688, 295, 51560], "temperature": 0.0, + "avg_logprob": -0.13416558343010979, "compression_ratio": 1.6992753623188406, "no_speech_prob": + 0.0010685856686905026}, {"id": 70, "seek": 52752, "start": 551.4399999999999, "end": + 556.48, "text": " that still stands pretty true. Regardless of the engine you''re + using or whether you''re using", "tokens": [51560, 300, 920, 7382, 1238, 2074, 13, + 25148, 295, 264, 2848, 291, 434, 1228, 420, 1968, 291, 434, 1228, 51812], "temperature": + 0.0, "avg_logprob": -0.13416558343010979, "compression_ratio": 1.6992753623188406, + "no_speech_prob": 0.0010685856686905026}, {"id": 71, "seek": 55648, "start": 556.48, + "end": 561.84, "text": " deep learning techniques or not, like, you know, at the + end of the day, you''re trying to", "tokens": [50364, 2452, 2539, 7512, 420, 406, + 11, 411, 11, 291, 458, 11, 412, 264, 917, 295, 264, 786, 11, 291, 434, 1382, 281, + 50632], "temperature": 0.0, "avg_logprob": -0.1766209602355957, "compression_ratio": + 1.6196581196581197, "no_speech_prob": 0.0007912431028671563}, {"id": 72, "seek": + 55648, "start": 563.12, "end": 569.44, "text": " match users to information that + will help them make better decisions or be more informed, right.", "tokens": [50696, + 2995, 5022, 281, 1589, 300, 486, 854, 552, 652, 1101, 5327, 420, 312, 544, 11740, + 11, 558, 13, 
51012], "temperature": 0.0, "avg_logprob": -0.1766209602355957, "compression_ratio": + 1.6196581196581197, "no_speech_prob": 0.0007912431028671563}, {"id": 73, "seek": + 55648, "start": 570.16, "end": 576.08, "text": " And you know, these days, I would + probably add in one more. I''m trying to think of how I could be", "tokens": [51048, + 400, 291, 458, 11, 613, 1708, 11, 286, 576, 1391, 909, 294, 472, 544, 13, 286, 478, + 1382, 281, 519, 295, 577, 286, 727, 312, 51344], "temperature": 0.0, "avg_logprob": + -0.1766209602355957, "compression_ratio": 1.6196581196581197, "no_speech_prob": + 0.0007912431028671563}, {"id": 74, "seek": 55648, "start": 576.08, "end": 581.9200000000001, + "text": " witty and make it into another C, but you know, in working with Daniel + Tongueleg on this class,", "tokens": [51344, 261, 10016, 293, 652, 309, 666, 1071, + 383, 11, 457, 291, 458, 11, 294, 1364, 365, 8033, 26946, 622, 306, 70, 322, 341, + 1508, 11, 51636], "temperature": 0.0, "avg_logprob": -0.1766209602355957, "compression_ratio": + 1.6196581196581197, "no_speech_prob": 0.0007912431028671563}, {"id": 75, "seek": + 58192, "start": 582.0, "end": 587.36, "text": " one of the things that is just absolutely + wowed me is the query understanding aspects of it.", "tokens": [50368, 472, 295, + 264, 721, 300, 307, 445, 3122, 6076, 292, 385, 307, 264, 14581, 3701, 7270, 295, + 309, 13, 50636], "temperature": 0.0, "avg_logprob": -0.11495535233441521, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.0020068620797246695}, {"id": 76, "seek": + 58192, "start": 587.36, "end": 594.16, "text": " And so maybe you could put that + into the context category if you wanted. 
But, you know,", "tokens": [50636, 400, + 370, 1310, 291, 727, 829, 300, 666, 264, 4319, 7719, 498, 291, 1415, 13, 583, 11, + 291, 458, 11, 50976], "temperature": 0.0, "avg_logprob": -0.11495535233441521, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.0020068620797246695}, {"id": 77, "seek": + 58192, "start": 594.16, "end": 599.28, "text": " realistically speaking that that + work you can do, especially in large scale environments where you", "tokens": [50976, + 40734, 4124, 300, 300, 589, 291, 393, 360, 11, 2318, 294, 2416, 4373, 12388, 689, + 291, 51232], "temperature": 0.0, "avg_logprob": -0.11495535233441521, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.0020068620797246695}, {"id": 78, "seek": + 58192, "start": 599.28, "end": 606.48, "text": " have a lot of queries, to really + understand what users are asking or intending to ask when they", "tokens": [51232, + 362, 257, 688, 295, 24109, 11, 281, 534, 1223, 437, 5022, 366, 3365, 420, 560, 2029, + 281, 1029, 562, 436, 51592], "temperature": 0.0, "avg_logprob": -0.11495535233441521, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0020068620797246695}, + {"id": 79, "seek": 60648, "start": 606.5600000000001, "end": 612.8000000000001, + "text": " put in a query. So I would probably throw that in there if I didn''t include + that back then. 
And", "tokens": [50368, 829, 294, 257, 14581, 13, 407, 286, 576, + 1391, 3507, 300, 294, 456, 498, 286, 994, 380, 4090, 300, 646, 550, 13, 400, 50680], + "temperature": 0.0, "avg_logprob": -0.11644958314441499, "compression_ratio": 1.7383512544802868, + "no_speech_prob": 0.0006392719224095345}, {"id": 80, "seek": 60648, "start": 612.8000000000001, + "end": 618.4, "text": " so like I said, maybe that''s part of your content or your + or your context is the actual query,", "tokens": [50680, 370, 411, 286, 848, 11, + 1310, 300, 311, 644, 295, 428, 2701, 420, 428, 420, 428, 4319, 307, 264, 3539, 14581, + 11, 50960], "temperature": 0.0, "avg_logprob": -0.11644958314441499, "compression_ratio": + 1.7383512544802868, "no_speech_prob": 0.0006392719224095345}, {"id": 81, "seek": + 60648, "start": 618.4, "end": 625.6, "text": " a user or this, the set of queries + that a user is asking. You know, but I still think a lot of that", "tokens": [50960, + 257, 4195, 420, 341, 11, 264, 992, 295, 24109, 300, 257, 4195, 307, 3365, 13, 509, + 458, 11, 457, 286, 920, 519, 257, 688, 295, 300, 51320], "temperature": 0.0, "avg_logprob": + -0.11644958314441499, "compression_ratio": 1.7383512544802868, "no_speech_prob": + 0.0006392719224095345}, {"id": 82, "seek": 60648, "start": 625.6, "end": 631.52, + "text": " stands at the conceptual level, right, is you have to have some, you know, + if you think about it,", "tokens": [51320, 7382, 412, 264, 24106, 1496, 11, 558, + 11, 307, 291, 362, 281, 362, 512, 11, 291, 458, 11, 498, 291, 519, 466, 309, 11, + 51616], "temperature": 0.0, "avg_logprob": -0.11644958314441499, "compression_ratio": + 1.7383512544802868, "no_speech_prob": 0.0006392719224095345}, {"id": 83, "seek": + 60648, "start": 631.52, "end": 636.0, "text": " this is the vector podcast, right. 
+ All of this stuff we''re building vectors and then essentially", "tokens": [51616, + 341, 307, 264, 8062, 7367, 11, 558, 13, 1057, 295, 341, 1507, 321, 434, 2390, 18875, + 293, 550, 4476, 51840], "temperature": 0.0, "avg_logprob": -0.11644958314441499, + "compression_ratio": 1.7383512544802868, "no_speech_prob": 0.0006392719224095345}, + {"id": 84, "seek": 63600, "start": 636.0, "end": 642.08, "text": " calculating this + fancy version of a cosine similarity between them. And at the end of the day,", + "tokens": [50364, 28258, 341, 10247, 3037, 295, 257, 23565, 32194, 1296, 552, 13, + 400, 412, 264, 917, 295, 264, 786, 11, 50668], "temperature": 0.0, "avg_logprob": + -0.11144391606363017, "compression_ratio": 1.7534246575342465, "no_speech_prob": + 0.0005059881368651986}, {"id": 85, "seek": 63600, "start": 642.08, "end": 649.6, + "text": " all of these techniques we''re doing are effectively how can we shape + those vectors so that things", "tokens": [50668, 439, 295, 613, 7512, 321, 434, + 884, 366, 8659, 577, 393, 321, 3909, 729, 18875, 370, 300, 721, 51044], "temperature": + 0.0, "avg_logprob": -0.11144391606363017, "compression_ratio": 1.7534246575342465, + "no_speech_prob": 0.0005059881368651986}, {"id": 86, "seek": 63600, "start": 649.6, + "end": 656.56, "text": " that are meant to be closer together, show up closer together + and things that are not as related", "tokens": [51044, 300, 366, 4140, 281, 312, + 4966, 1214, 11, 855, 493, 4966, 1214, 293, 721, 300, 366, 406, 382, 4077, 51392], + "temperature": 0.0, "avg_logprob": -0.11144391606363017, "compression_ratio": 1.7534246575342465, + "no_speech_prob": 0.0005059881368651986}, {"id": 87, "seek": 63600, "start": 657.36, + "end": 662.48, "text": " the cosine is further apart, right. 
Like at the end of + the day, like that math doesn''t change,", "tokens": [51432, 264, 23565, 307, 3052, + 4936, 11, 558, 13, 1743, 412, 264, 917, 295, 264, 786, 11, 411, 300, 5221, 1177, + 380, 1319, 11, 51688], "temperature": 0.0, "avg_logprob": -0.11144391606363017, + "compression_ratio": 1.7534246575342465, "no_speech_prob": 0.0005059881368651986}, + {"id": 88, "seek": 66248, "start": 662.48, "end": 667.36, "text": " yet all these + techniques, whether it''s deep learning, et cetera, are all about creating those", + "tokens": [50364, 1939, 439, 613, 7512, 11, 1968, 309, 311, 2452, 2539, 11, 1030, + 11458, 11, 366, 439, 466, 4084, 729, 50608], "temperature": 0.0, "avg_logprob": + -0.08283464532149465, "compression_ratio": 1.9631147540983607, "no_speech_prob": + 0.001296419301070273}, {"id": 89, "seek": 66248, "start": 667.36, "end": 673.36, + "text": " vectors and doing that calculation, right. And so by understanding your + content, you''re shifting", "tokens": [50608, 18875, 293, 884, 300, 17108, 11, 558, + 13, 400, 370, 538, 3701, 428, 2701, 11, 291, 434, 17573, 50908], "temperature": + 0.0, "avg_logprob": -0.08283464532149465, "compression_ratio": 1.9631147540983607, + "no_speech_prob": 0.001296419301070273}, {"id": 90, "seek": 66248, "start": 673.36, + "end": 678.24, "text": " those vectors, you''re transforming them in the space, + you''re adding synonyms, you''re adding embeddings,", "tokens": [50908, 729, 18875, + 11, 291, 434, 27210, 552, 294, 264, 1901, 11, 291, 434, 5127, 5451, 2526, 2592, + 11, 291, 434, 5127, 12240, 29432, 11, 51152], "temperature": 0.0, "avg_logprob": + -0.08283464532149465, "compression_ratio": 1.9631147540983607, "no_speech_prob": + 0.001296419301070273}, {"id": 91, "seek": 66248, "start": 678.24, "end": 683.44, + "text": " all of those kinds of things, you''re adding proper nouns, you''re you''re + doing noun phrases,", "tokens": [51152, 439, 295, 729, 3685, 295, 721, 11, 291, + 434, 5127, 2296, 48184, 11, 291, 434, 291, 434, 
884, 23307, 20312, 11, 51412], "temperature": + 0.0, "avg_logprob": -0.08283464532149465, "compression_ratio": 1.9631147540983607, + "no_speech_prob": 0.001296419301070273}, {"id": 92, "seek": 66248, "start": 683.44, + "end": 688.4, "text": " et cetera. By understanding your context, you''re able to + ask better queries, right, which is", "tokens": [51412, 1030, 11458, 13, 3146, 3701, + 428, 4319, 11, 291, 434, 1075, 281, 1029, 1101, 24109, 11, 558, 11, 597, 307, 51660], + "temperature": 0.0, "avg_logprob": -0.08283464532149465, "compression_ratio": 1.9631147540983607, + "no_speech_prob": 0.001296419301070273}, {"id": 93, "seek": 68840, "start": 688.4, + "end": 695.92, "text": " shifting the query vector, right. And by using popularity, + et cetera, you''re also then shifting", "tokens": [50364, 17573, 264, 14581, 8062, + 11, 558, 13, 400, 538, 1228, 19301, 11, 1030, 11458, 11, 291, 434, 611, 550, 17573, + 50740], "temperature": 0.0, "avg_logprob": -0.10276345339688388, "compression_ratio": + 1.7756653992395437, "no_speech_prob": 0.002730551641434431}, {"id": 94, "seek": + 68840, "start": 695.92, "end": 700.9599999999999, "text": " those vectors by essentially + adding more weight to things that are more popular, right.", "tokens": [50740, 729, + 18875, 538, 4476, 5127, 544, 3364, 281, 721, 300, 366, 544, 3743, 11, 558, 13, 50992], + "temperature": 0.0, "avg_logprob": -0.10276345339688388, "compression_ratio": 1.7756653992395437, + "no_speech_prob": 0.002730551641434431}, {"id": 95, "seek": 68840, "start": 702.0, + "end": 705.76, "text": " You know, so at the end of the day, like, yeah, I would + say I''d still stand by that with the", "tokens": [51044, 509, 458, 11, 370, 412, + 264, 917, 295, 264, 786, 11, 411, 11, 1338, 11, 286, 576, 584, 286, 1116, 920, 1463, + 538, 300, 365, 264, 51232], "temperature": 0.0, "avg_logprob": -0.10276345339688388, + "compression_ratio": 1.7756653992395437, "no_speech_prob": 0.002730551641434431}, + {"id": 96, "seek": 68840, 
"start": 705.76, "end": 712.56, "text": " caveat is really + bringing forward the query understanding aspect of it. Yeah, I think query", "tokens": + [51232, 43012, 307, 534, 5062, 2128, 264, 14581, 3701, 4171, 295, 309, 13, 865, + 11, 286, 519, 14581, 51572], "temperature": 0.0, "avg_logprob": -0.10276345339688388, + "compression_ratio": 1.7756653992395437, "no_speech_prob": 0.002730551641434431}, + {"id": 97, "seek": 68840, "start": 712.56, "end": 717.52, "text": " understanding, + you put it brilliantly, it''s like really an exciting space and we actually recorded", + "tokens": [51572, 3701, 11, 291, 829, 309, 8695, 42580, 11, 309, 311, 411, 534, + 364, 4670, 1901, 293, 321, 767, 8287, 51820], "temperature": 0.0, "avg_logprob": + -0.10276345339688388, "compression_ratio": 1.7756653992395437, "no_speech_prob": + 0.002730551641434431}, {"id": 98, "seek": 71752, "start": 717.68, "end": 723.92, + "text": " a podcast as well with Daniel Tankilank, where he explained a lot of it, + he also blogged a lot about it.", "tokens": [50372, 257, 7367, 382, 731, 365, 8033, + 28746, 388, 657, 11, 689, 415, 8825, 257, 688, 295, 309, 11, 415, 611, 6968, 3004, + 257, 688, 466, 309, 13, 50684], "temperature": 0.0, "avg_logprob": -0.22521983107475385, + "compression_ratio": 1.5128205128205128, "no_speech_prob": 0.002179196337237954}, + {"id": 99, "seek": 71752, "start": 723.92, "end": 733.36, "text": " So go check + it out. 
And like in that same presentation, like when you demoed the capabilities + of", "tokens": [50684, 407, 352, 1520, 309, 484, 13, 400, 411, 294, 300, 912, 5860, + 11, 411, 562, 291, 10723, 292, 264, 10862, 295, 51156], "temperature": 0.0, "avg_logprob": + -0.22521983107475385, "compression_ratio": 1.5128205128205128, "no_speech_prob": + 0.002179196337237954}, {"id": 100, "seek": 71752, "start": 733.36, "end": 739.68, + "text": " Lucidworks platform, where you played a lot with different like ranking + strategies, basically", "tokens": [51156, 9593, 327, 18357, 3663, 11, 689, 291, + 3737, 257, 688, 365, 819, 411, 17833, 9029, 11, 1936, 51472], "temperature": 0.0, + "avg_logprob": -0.22521983107475385, "compression_ratio": 1.5128205128205128, "no_speech_prob": + 0.002179196337237954}, {"id": 101, "seek": 73968, "start": 740.16, "end": 747.04, + "text": " like you pre-trained some of them and you you were able to switch live, + I felt like you you are", "tokens": [50388, 411, 291, 659, 12, 17227, 2001, 512, + 295, 552, 293, 291, 291, 645, 1075, 281, 3679, 1621, 11, 286, 2762, 411, 291, 291, + 366, 50732], "temperature": 0.0, "avg_logprob": -0.17073892809681057, "compression_ratio": + 1.6317991631799162, "no_speech_prob": 0.0033825526479631662}, {"id": 102, "seek": + 73968, "start": 747.04, "end": 753.76, "text": " a tinkerer as well. You enjoy really + going deep down into the what search engine can do and what", "tokens": [50732, + 257, 256, 40467, 260, 382, 731, 13, 509, 2103, 534, 516, 2452, 760, 666, 264, 437, + 3164, 2848, 393, 360, 293, 437, 51068], "temperature": 0.0, "avg_logprob": -0.17073892809681057, + "compression_ratio": 1.6317991631799162, "no_speech_prob": 0.0033825526479631662}, + {"id": 103, "seek": 73968, "start": 753.76, "end": 760.8, "text": " you you can + extract from the data. 
And my question is, where do you see the balance between + kind of like", "tokens": [51068, 291, 291, 393, 8947, 490, 264, 1412, 13, 400, 452, + 1168, 307, 11, 689, 360, 291, 536, 264, 4772, 1296, 733, 295, 411, 51420], "temperature": + 0.0, "avg_logprob": -0.17073892809681057, "compression_ratio": 1.6317991631799162, + "no_speech_prob": 0.0033825526479631662}, {"id": 104, "seek": 73968, "start": 760.8, + "end": 766.4799999999999, "text": " doing this in a more manual fashion, where you + actually educate yourself, right, versus like", "tokens": [51420, 884, 341, 294, + 257, 544, 9688, 6700, 11, 689, 291, 767, 16092, 1803, 11, 558, 11, 5717, 411, 51704], + "temperature": 0.0, "avg_logprob": -0.17073892809681057, "compression_ratio": 1.6317991631799162, + "no_speech_prob": 0.0033825526479631662}, {"id": 105, "seek": 76648, "start": 766.48, + "end": 771.84, "text": " throwing it to a machine learning model? Yeah, it''s a + great question. I mean, I think", "tokens": [50364, 10238, 309, 281, 257, 3479, + 2539, 2316, 30, 865, 11, 309, 311, 257, 869, 1168, 13, 286, 914, 11, 286, 519, 50632], + "temperature": 0.0, "avg_logprob": -0.19382127901402915, "compression_ratio": 1.721189591078067, + "no_speech_prob": 0.005030888132750988}, {"id": 106, "seek": 76648, "start": 772.5600000000001, + "end": 779.12, "text": " you know, obviously, and I see my former colleague and + co-founder Eric Hatcher is on, I mean,", "tokens": [50668, 291, 458, 11, 2745, 11, + 293, 286, 536, 452, 5819, 13532, 293, 598, 12, 33348, 9336, 389, 852, 260, 307, + 322, 11, 286, 914, 11, 50996], "temperature": 0.0, "avg_logprob": -0.19382127901402915, + "compression_ratio": 1.721189591078067, "no_speech_prob": 0.005030888132750988}, + {"id": 107, "seek": 76648, "start": 779.12, "end": 784.72, "text": " he used to + always say it depends and I''d say it depends here of course as well, which is, + you know,", "tokens": [50996, 415, 1143, 281, 1009, 584, 309, 5946, 293, 286, 1116, + 584, 309, 5946, 510, 
295, 1164, 382, 731, 11, 597, 307, 11, 291, 458, 11, 51276], + "temperature": 0.0, "avg_logprob": -0.19382127901402915, "compression_ratio": 1.721189591078067, + "no_speech_prob": 0.005030888132750988}, {"id": 108, "seek": 76648, "start": 784.72, + "end": 789.76, "text": " I mean, there''s there''s some situations where you just + you don''t have enough data for machine learning,", "tokens": [51276, 286, 914, + 11, 456, 311, 456, 311, 512, 6851, 689, 291, 445, 291, 500, 380, 362, 1547, 1412, + 337, 3479, 2539, 11, 51528], "temperature": 0.0, "avg_logprob": -0.19382127901402915, + "compression_ratio": 1.721189591078067, "no_speech_prob": 0.005030888132750988}, + {"id": 109, "seek": 76648, "start": 789.76, "end": 796.16, "text": " right? So by + default, you are going to be manually tuning the situation, right?", "tokens": [51528, + 558, 30, 407, 538, 7576, 11, 291, 366, 516, 281, 312, 16945, 15164, 264, 2590, 11, + 558, 30, 51848], "temperature": 0.0, "avg_logprob": -0.19382127901402915, "compression_ratio": + 1.721189591078067, "no_speech_prob": 0.005030888132750988}, {"id": 110, "seek": + 79616, "start": 796.16, "end": 801.28, "text": " You you see that a lot in enterprise + systems, especially smaller enterprise systems or in", "tokens": [50364, 509, 291, + 536, 300, 257, 688, 294, 14132, 3652, 11, 2318, 4356, 14132, 3652, 420, 294, 50620], + "temperature": 0.0, "avg_logprob": -0.1034354541612708, "compression_ratio": 1.6228813559322033, + "no_speech_prob": 0.0005666203214786947}, {"id": 111, "seek": 79616, "start": 801.92, + "end": 808.64, "text": " niche applications where, you know, effectively search + just needs to be good enough. 
Maybe you''re not", "tokens": [50652, 19956, 5821, + 689, 11, 291, 458, 11, 8659, 3164, 445, 2203, 281, 312, 665, 1547, 13, 2704, 291, + 434, 406, 50988], "temperature": 0.0, "avg_logprob": -0.1034354541612708, "compression_ratio": + 1.6228813559322033, "no_speech_prob": 0.0005666203214786947}, {"id": 112, "seek": + 79616, "start": 808.64, "end": 815.6, "text": " monetizing search. And so you don''t, + you know, you just kind of need it to be reasonably good,", "tokens": [50988, 15556, + 3319, 3164, 13, 400, 370, 291, 500, 380, 11, 291, 458, 11, 291, 445, 733, 295, 643, + 309, 281, 312, 23551, 665, 11, 51336], "temperature": 0.0, "avg_logprob": -0.1034354541612708, + "compression_ratio": 1.6228813559322033, "no_speech_prob": 0.0005666203214786947}, + {"id": 113, "seek": 79616, "start": 815.6, "end": 822.3199999999999, "text": " right? + It''s a feature in a much broader set of features that users are going to engage + with. And", "tokens": [51336, 558, 30, 467, 311, 257, 4111, 294, 257, 709, 13227, + 992, 295, 4122, 300, 5022, 366, 516, 281, 4683, 365, 13, 400, 51672], "temperature": + 0.0, "avg_logprob": -0.1034354541612708, "compression_ratio": 1.6228813559322033, + "no_speech_prob": 0.0005666203214786947}, {"id": 114, "seek": 82232, "start": 822.32, + "end": 827.36, "text": " so, you know, where and how you would use machine learning + in those situations, you know, you may", "tokens": [50364, 370, 11, 291, 458, 11, + 689, 293, 577, 291, 576, 764, 3479, 2539, 294, 729, 6851, 11, 291, 458, 11, 291, + 815, 50616], "temperature": 0.0, "avg_logprob": -0.14121672571921834, "compression_ratio": + 1.8142857142857143, "no_speech_prob": 0.00045651980326510966}, {"id": 115, "seek": + 82232, "start": 827.36, "end": 834.5600000000001, "text": " or may not. 
In the situations + where you have lots and lots of data, lots of users, you''re probably", "tokens": + [50616, 420, 815, 406, 13, 682, 264, 6851, 689, 291, 362, 3195, 293, 3195, 295, + 1412, 11, 3195, 295, 5022, 11, 291, 434, 1391, 50976], "temperature": 0.0, "avg_logprob": + -0.14121672571921834, "compression_ratio": 1.8142857142857143, "no_speech_prob": + 0.00045651980326510966}, {"id": 116, "seek": 82232, "start": 834.5600000000001, + "end": 840.8000000000001, "text": " monetizing search, whether that''s via e-commerce + or or web search or ads or whatever, like, you know,", "tokens": [50976, 15556, + 3319, 3164, 11, 1968, 300, 311, 5766, 308, 12, 26926, 420, 420, 3670, 3164, 420, + 10342, 420, 2035, 11, 411, 11, 291, 458, 11, 51288], "temperature": 0.0, "avg_logprob": + -0.14121672571921834, "compression_ratio": 1.8142857142857143, "no_speech_prob": + 0.00045651980326510966}, {"id": 117, "seek": 82232, "start": 840.8000000000001, + "end": 846.08, "text": " I think machine learning makes a lot more sense there and + and it''s a lot easier to", "tokens": [51288, 286, 519, 3479, 2539, 1669, 257, 688, + 544, 2020, 456, 293, 293, 309, 311, 257, 688, 3571, 281, 51552], "temperature": + 0.0, "avg_logprob": -0.14121672571921834, "compression_ratio": 1.8142857142857143, + "no_speech_prob": 0.00045651980326510966}, {"id": 118, "seek": 84608, "start": 846.96, + "end": 853.5200000000001, "text": " run these types of experiments that allow you + to tinker not just with the hand-ranked models,", "tokens": [50408, 1190, 613, 3467, + 295, 12050, 300, 2089, 291, 281, 256, 40467, 406, 445, 365, 264, 1011, 12, 20479, + 292, 5245, 11, 50736], "temperature": 0.0, "avg_logprob": -0.10000874598821004, + "compression_ratio": 1.6853448275862069, "no_speech_prob": 0.002620015060529113}, + {"id": 119, "seek": 84608, "start": 853.5200000000001, "end": 859.6, "text": " which + I think hand-ranking still has its place, right? 
Because they help you form intuition + about", "tokens": [50736, 597, 286, 519, 1011, 12, 20479, 278, 920, 575, 1080, 1081, + 11, 558, 30, 1436, 436, 854, 291, 1254, 24002, 466, 51040], "temperature": 0.0, + "avg_logprob": -0.10000874598821004, "compression_ratio": 1.6853448275862069, "no_speech_prob": + 0.002620015060529113}, {"id": 120, "seek": 84608, "start": 859.6, "end": 865.6800000000001, + "text": " what is in your data, right? And that intuition is really important even + in a machine learning world", "tokens": [51040, 437, 307, 294, 428, 1412, 11, 558, + 30, 400, 300, 24002, 307, 534, 1021, 754, 294, 257, 3479, 2539, 1002, 51344], "temperature": + 0.0, "avg_logprob": -0.10000874598821004, "compression_ratio": 1.6853448275862069, + "no_speech_prob": 0.002620015060529113}, {"id": 121, "seek": 84608, "start": 865.6800000000001, + "end": 870.48, "text": " because, you know, at the end of the day, even with machine + learning, while you can still try out,", "tokens": [51344, 570, 11, 291, 458, 11, + 412, 264, 917, 295, 264, 786, 11, 754, 365, 3479, 2539, 11, 1339, 291, 393, 920, + 853, 484, 11, 51584], "temperature": 0.0, "avg_logprob": -0.10000874598821004, "compression_ratio": + 1.6853448275862069, "no_speech_prob": 0.002620015060529113}, {"id": 122, "seek": + 87048, "start": 870.48, "end": 877.2, "text": " you can try out a lot more features + and approaches, you still have limited time, right? 
And so,", "tokens": [50364, + 291, 393, 853, 484, 257, 688, 544, 4122, 293, 11587, 11, 291, 920, 362, 5567, 565, + 11, 558, 30, 400, 370, 11, 50700], "temperature": 0.0, "avg_logprob": -0.12090349733159783, + "compression_ratio": 1.6916299559471366, "no_speech_prob": 0.0012513004476204515}, + {"id": 123, "seek": 87048, "start": 877.2, "end": 883.36, "text": " you still have + to have some intuition about what''s going to work and I think there''s no substitute", + "tokens": [50700, 291, 920, 362, 281, 362, 512, 24002, 466, 437, 311, 516, 281, + 589, 293, 286, 519, 456, 311, 572, 15802, 51008], "temperature": 0.0, "avg_logprob": + -0.12090349733159783, "compression_ratio": 1.6916299559471366, "no_speech_prob": + 0.0012513004476204515}, {"id": 124, "seek": 87048, "start": 883.36, "end": 889.52, + "text": " for that intuition helping guide you into what matters, like, so for instance, + in a learning", "tokens": [51008, 337, 300, 24002, 4315, 5934, 291, 666, 437, 7001, + 11, 411, 11, 370, 337, 5197, 11, 294, 257, 2539, 51316], "temperature": 0.0, "avg_logprob": + -0.12090349733159783, "compression_ratio": 1.6916299559471366, "no_speech_prob": + 0.0012513004476204515}, {"id": 125, "seek": 87048, "start": 889.52, "end": 896.16, + "text": " to rank scenario where you''re actually learning a ranking model, you + still are often building up", "tokens": [51316, 281, 6181, 9005, 689, 291, 434, + 767, 2539, 257, 17833, 2316, 11, 291, 920, 366, 2049, 2390, 493, 51648], "temperature": + 0.0, "avg_logprob": -0.12090349733159783, "compression_ratio": 1.6916299559471366, + "no_speech_prob": 0.0012513004476204515}, {"id": 126, "seek": 89616, "start": 896.16, + "end": 902.56, "text": " those systems using the features of your data. 
So you have + to know what those features are and", "tokens": [50364, 729, 3652, 1228, 264, 4122, + 295, 428, 1412, 13, 407, 291, 362, 281, 458, 437, 729, 4122, 366, 293, 50684], "temperature": + 0.0, "avg_logprob": -0.14002721206001614, "compression_ratio": 1.5822784810126582, + "no_speech_prob": 0.0009849191410467029}, {"id": 127, "seek": 89616, "start": 902.56, + "end": 909.04, "text": " one of the nice things is like with tools like Lucine-based + engines like OpenSearch or Solar or", "tokens": [50684, 472, 295, 264, 1481, 721, + 307, 411, 365, 3873, 411, 9593, 533, 12, 6032, 12982, 411, 7238, 10637, 1178, 420, + 22385, 420, 51008], "temperature": 0.0, "avg_logprob": -0.14002721206001614, "compression_ratio": + 1.5822784810126582, "no_speech_prob": 0.0009849191410467029}, {"id": 128, "seek": + 89616, "start": 909.04, "end": 914.3199999999999, "text": " Elastic, I''m sure Vespa + has the same kind of thing, you can go and play around with those,", "tokens": [51008, + 2699, 2750, 11, 286, 478, 988, 691, 279, 4306, 575, 264, 912, 733, 295, 551, 11, + 291, 393, 352, 293, 862, 926, 365, 729, 11, 51272], "temperature": 0.0, "avg_logprob": + -0.14002721206001614, "compression_ratio": 1.5822784810126582, "no_speech_prob": + 0.0009849191410467029}, {"id": 129, "seek": 89616, "start": 914.3199999999999, "end": + 921.1999999999999, "text": " you can create your own function queries that allow + you to roughly try out different formulas", "tokens": [51272, 291, 393, 1884, 428, + 1065, 2445, 24109, 300, 2089, 291, 281, 9810, 853, 484, 819, 30546, 51616], "temperature": + 0.0, "avg_logprob": -0.14002721206001614, "compression_ratio": 1.5822784810126582, + "no_speech_prob": 0.0009849191410467029}, {"id": 130, "seek": 92120, "start": 921.2, + "end": 926.32, "text": " for ranking and then you can go and turn those things into + machine learning models, right?", "tokens": [50364, 337, 17833, 293, 550, 291, 393, + 352, 293, 1261, 729, 721, 666, 3479, 2539, 5245, 11, 558, 30, 
50620], "temperature": + 0.0, "avg_logprob": -0.1272864124991677, "compression_ratio": 1.6475770925110131, + "no_speech_prob": 0.003484126413241029}, {"id": 131, "seek": 92120, "start": 926.32, + "end": 931.12, "text": " That learn a much more effective function than what you + could come up with, right? So,", "tokens": [50620, 663, 1466, 257, 709, 544, 4942, + 2445, 813, 437, 291, 727, 808, 493, 365, 11, 558, 30, 407, 11, 50860], "temperature": + 0.0, "avg_logprob": -0.1272864124991677, "compression_ratio": 1.6475770925110131, + "no_speech_prob": 0.003484126413241029}, {"id": 132, "seek": 92120, "start": 932.08, + "end": 937.2, "text": " I think even in this world of large data sets and machine + learning, you''re still going to have", "tokens": [50908, 286, 519, 754, 294, 341, + 1002, 295, 2416, 1412, 6352, 293, 3479, 2539, 11, 291, 434, 920, 516, 281, 362, + 51164], "temperature": 0.0, "avg_logprob": -0.1272864124991677, "compression_ratio": + 1.6475770925110131, "no_speech_prob": 0.003484126413241029}, {"id": 133, "seek": + 92120, "start": 938.1600000000001, "end": 946.5600000000001, "text": " to build + intuition, right? Yeah, absolutely. And like in your own experience and in the experience + of", "tokens": [51212, 281, 1322, 24002, 11, 558, 30, 865, 11, 3122, 13, 400, 411, + 294, 428, 1065, 1752, 293, 294, 264, 1752, 295, 51632], "temperature": 0.0, "avg_logprob": + -0.1272864124991677, "compression_ratio": 1.6475770925110131, "no_speech_prob": + 0.003484126413241029}, {"id": 134, "seek": 94656, "start": 946.7199999999999, "end": + 954.4799999999999, "text": " the teams that you supported, how do you nurture this + intuition? Like, do you read books? 
Do you constantly", "tokens": [50372, 264, 5491, + 300, 291, 8104, 11, 577, 360, 291, 41451, 341, 24002, 30, 1743, 11, 360, 291, 1401, + 3642, 30, 1144, 291, 6460, 50760], "temperature": 0.0, "avg_logprob": -0.21449580945466695, + "compression_ratio": 1.6055776892430278, "no_speech_prob": 0.004052360542118549}, + {"id": 135, "seek": 94656, "start": 954.4799999999999, "end": 960.3199999999999, + "text": " experiment? And also like when it comes in, you know, to understanding + fundamentals of search,", "tokens": [50760, 5120, 30, 400, 611, 411, 562, 309, 1487, + 294, 11, 291, 458, 11, 281, 3701, 29505, 295, 3164, 11, 51052], "temperature": 0.0, + "avg_logprob": -0.21449580945466695, "compression_ratio": 1.6055776892430278, "no_speech_prob": + 0.004052360542118549}, {"id": 136, "seek": 94656, "start": 960.3199999999999, "end": + 967.8399999999999, "text": " let''s say knowing how TFIDF formula composed or you + have 25, what are the trade-offs versus sort of", "tokens": [51052, 718, 311, 584, + 5276, 577, 40964, 2777, 37, 8513, 18204, 420, 291, 362, 3552, 11, 437, 366, 264, + 4923, 12, 19231, 5717, 1333, 295, 51428], "temperature": 0.0, "avg_logprob": -0.21449580945466695, + "compression_ratio": 1.6055776892430278, "no_speech_prob": 0.004052360542118549}, + {"id": 137, "seek": 94656, "start": 967.8399999999999, "end": 973.3599999999999, + "text": " like going and actually experimenting and trying out things, you know, + where do you see that balance", "tokens": [51428, 411, 516, 293, 767, 29070, 293, + 1382, 484, 721, 11, 291, 458, 11, 689, 360, 291, 536, 300, 4772, 51704], "temperature": + 0.0, "avg_logprob": -0.21449580945466695, "compression_ratio": 1.6055776892430278, + "no_speech_prob": 0.004052360542118549}, {"id": 138, "seek": 97336, "start": 973.36, + "end": 980.48, "text": " as well for yourself maybe and also for the teams around + you? 
Yeah, I mean, I think everybody", "tokens": [50364, 382, 731, 337, 1803, 1310, + 293, 611, 337, 264, 5491, 926, 291, 30, 865, 11, 286, 914, 11, 286, 519, 2201, 50720], + "temperature": 0.0, "avg_logprob": -0.13456870118776956, "compression_ratio": 1.6475770925110131, + "no_speech_prob": 0.0009662438533268869}, {"id": 139, "seek": 97336, "start": 980.48, + "end": 985.2, "text": " will have their own, you know, kind of depending on where + you come from, right? Like if,", "tokens": [50720, 486, 362, 641, 1065, 11, 291, + 458, 11, 733, 295, 5413, 322, 689, 291, 808, 490, 11, 558, 30, 1743, 498, 11, 50956], + "temperature": 0.0, "avg_logprob": -0.13456870118776956, "compression_ratio": 1.6475770925110131, + "no_speech_prob": 0.0009662438533268869}, {"id": 140, "seek": 97336, "start": 985.84, + "end": 990.88, "text": " you know, like if you have, if you''ve done deep academic + work, you''re probably going to have a", "tokens": [50988, 291, 458, 11, 411, 498, + 291, 362, 11, 498, 291, 600, 1096, 2452, 7778, 589, 11, 291, 434, 1391, 516, 281, + 362, 257, 51240], "temperature": 0.0, "avg_logprob": -0.13456870118776956, "compression_ratio": + 1.6475770925110131, "no_speech_prob": 0.0009662438533268869}, {"id": 141, "seek": + 97336, "start": 990.88, "end": 997.12, "text": " lot more understanding of the math + and the theoretical side of it. And then you''re going to have", "tokens": [51240, + 688, 544, 3701, 295, 264, 5221, 293, 264, 20864, 1252, 295, 309, 13, 400, 550, 291, + 434, 516, 281, 362, 51552], "temperature": 0.0, "avg_logprob": -0.13456870118776956, + "compression_ratio": 1.6475770925110131, "no_speech_prob": 0.0009662438533268869}, + {"id": 142, "seek": 99712, "start": 997.12, "end": 1003.36, "text": " to develop + the intuition of real world data, right? 
How messy it is, how clunky it is, how", + "tokens": [50364, 281, 1499, 264, 24002, 295, 957, 1002, 1412, 11, 558, 30, 1012, + 16191, 309, 307, 11, 577, 596, 25837, 309, 307, 11, 577, 50676], "temperature": + 0.0, "avg_logprob": -0.11430580447418522, "compression_ratio": 1.628691983122363, + "no_speech_prob": 0.0020228272769600153}, {"id": 143, "seek": 99712, "start": 1004.16, + "end": 1009.6, "text": " full of junk and spam, et cetera, right? Because a lot + of times when you''re dealing with academic", "tokens": [50716, 1577, 295, 19109, + 293, 24028, 11, 1030, 11458, 11, 558, 30, 1436, 257, 688, 295, 1413, 562, 291, 434, + 6260, 365, 7778, 50988], "temperature": 0.0, "avg_logprob": -0.11430580447418522, + "compression_ratio": 1.628691983122363, "no_speech_prob": 0.0020228272769600153}, + {"id": 144, "seek": 99712, "start": 1009.6, "end": 1015.12, "text": " data sets, + they''re pretty clean, right? Relatively speaking, they still of course have their + own", "tokens": [50988, 1412, 6352, 11, 436, 434, 1238, 2541, 11, 558, 30, 8738, + 19020, 4124, 11, 436, 920, 295, 1164, 362, 641, 1065, 51264], "temperature": 0.0, + "avg_logprob": -0.11430580447418522, "compression_ratio": 1.628691983122363, "no_speech_prob": + 0.0020228272769600153}, {"id": 145, "seek": 99712, "start": 1015.12, "end": 1020.64, + "text": " set of garbage and nuances in them. 
Whereas if you''re an engineer and + you''re coming at it from like,", "tokens": [51264, 992, 295, 14150, 293, 38775, + 294, 552, 13, 13813, 498, 291, 434, 364, 11403, 293, 291, 434, 1348, 412, 309, 490, + 411, 11, 51540], "temperature": 0.0, "avg_logprob": -0.11430580447418522, "compression_ratio": + 1.628691983122363, "no_speech_prob": 0.0020228272769600153}, {"id": 146, "seek": + 102064, "start": 1020.64, "end": 1027.2, "text": " hey, you know, often what I see + with engineers is they come at it from a quantitative standpoint", "tokens": [50364, + 4177, 11, 291, 458, 11, 2049, 437, 286, 536, 365, 11955, 307, 436, 808, 412, 309, + 490, 257, 27778, 15827, 50692], "temperature": 0.0, "avg_logprob": -0.13971944288773971, + "compression_ratio": 1.5732217573221758, "no_speech_prob": 0.0005663756164722145}, + {"id": 147, "seek": 102064, "start": 1027.2, "end": 1031.68, "text": " of like, + I want to make sure this is scalable and reliable. So they''re solving for the", + "tokens": [50692, 295, 411, 11, 286, 528, 281, 652, 988, 341, 307, 38481, 293, 12924, + 13, 407, 436, 434, 12606, 337, 264, 50916], "temperature": 0.0, "avg_logprob": -0.13971944288773971, + "compression_ratio": 1.5732217573221758, "no_speech_prob": 0.0005663756164722145}, + {"id": 148, "seek": 102064, "start": 1032.48, "end": 1041.44, "text": " hardening + of the system problem first. And then they often will develop the the relevant side + of it", "tokens": [50956, 1152, 4559, 295, 264, 1185, 1154, 700, 13, 400, 550, 436, + 2049, 486, 1499, 264, 264, 7340, 1252, 295, 309, 51404], "temperature": 0.0, "avg_logprob": + -0.13971944288773971, "compression_ratio": 1.5732217573221758, "no_speech_prob": + 0.0005663756164722145}, {"id": 149, "seek": 102064, "start": 1041.44, "end": 1047.36, + "text": " or the the understanding of the data second. 
Now again, broad generalizations + there because,", "tokens": [51404, 420, 264, 264, 3701, 295, 264, 1412, 1150, 13, + 823, 797, 11, 4152, 2674, 14455, 456, 570, 11, 51700], "temperature": 0.0, "avg_logprob": + -0.13971944288773971, "compression_ratio": 1.5732217573221758, "no_speech_prob": + 0.0005663756164722145}, {"id": 150, "seek": 104736, "start": 1047.9199999999998, + "end": 1053.12, "text": " you know, folks have all kinds of different backgrounds. + But you know, so like as a leader in", "tokens": [50392, 291, 458, 11, 4024, 362, + 439, 3685, 295, 819, 17336, 13, 583, 291, 458, 11, 370, 411, 382, 257, 5263, 294, + 50652], "temperature": 0.0, "avg_logprob": -0.1250316738027387, "compression_ratio": + 1.7985074626865671, "no_speech_prob": 0.0008158011478371918}, {"id": 151, "seek": + 104736, "start": 1053.6, "end": 1059.84, "text": " somebody who does, you know, + manages people in this space, like I would often just work with you", "tokens": + [50676, 2618, 567, 775, 11, 291, 458, 11, 22489, 561, 294, 341, 1901, 11, 411, 286, + 576, 2049, 445, 589, 365, 291, 50988], "temperature": 0.0, "avg_logprob": -0.1250316738027387, + "compression_ratio": 1.7985074626865671, "no_speech_prob": 0.0008158011478371918}, + {"id": 152, "seek": 104736, "start": 1059.84, "end": 1064.8799999999999, "text": + " depending on what your background and understanding and intuition is. And then, + you know, try to help", "tokens": [50988, 5413, 322, 437, 428, 3678, 293, 3701, + 293, 24002, 307, 13, 400, 550, 11, 291, 458, 11, 853, 281, 854, 51240], "temperature": + 0.0, "avg_logprob": -0.1250316738027387, "compression_ratio": 1.7985074626865671, + "no_speech_prob": 0.0008158011478371918}, {"id": 153, "seek": 104736, "start": 1064.8799999999999, + "end": 1071.04, "text": " you complement whatever it is you''re missing there, right? 
+ Like I think you have to have an", "tokens": [51240, 291, 17103, 2035, 309, 307, + 291, 434, 5361, 456, 11, 558, 30, 1743, 286, 519, 291, 362, 281, 362, 364, 51548], + "temperature": 0.0, "avg_logprob": -0.1250316738027387, "compression_ratio": 1.7985074626865671, + "no_speech_prob": 0.0008158011478371918}, {"id": 154, "seek": 104736, "start": 1071.04, + "end": 1077.04, "text": " understanding of how these engines work. I''ve often seen + folks who don''t have an understanding of", "tokens": [51548, 3701, 295, 577, 613, + 12982, 589, 13, 286, 600, 2049, 1612, 4024, 567, 500, 380, 362, 364, 3701, 295, + 51848], "temperature": 0.0, "avg_logprob": -0.1250316738027387, "compression_ratio": + 1.7985074626865671, "no_speech_prob": 0.0008158011478371918}, {"id": 155, "seek": + 107704, "start": 1077.04, "end": 1083.2, "text": " all the capabilities of these + modern search engines recreate the wheel, right? Like they''re reinventing", "tokens": + [50364, 439, 264, 10862, 295, 613, 4363, 3164, 12982, 25833, 264, 5589, 11, 558, + 30, 1743, 436, 434, 33477, 278, 50672], "temperature": 0.0, "avg_logprob": -0.10110753232782538, + "compression_ratio": 1.742081447963801, "no_speech_prob": 0.00016552692977711558}, + {"id": 156, "seek": 107704, "start": 1083.2, "end": 1089.52, "text": " the wheel + because they they''re coming from this first principles of the math that they learn", + "tokens": [50672, 264, 5589, 570, 436, 436, 434, 1348, 490, 341, 700, 9156, 295, + 264, 5221, 300, 436, 1466, 50988], "temperature": 0.0, "avg_logprob": -0.10110753232782538, + "compression_ratio": 1.742081447963801, "no_speech_prob": 0.00016552692977711558}, + {"id": 157, "seek": 107704, "start": 1089.52, "end": 1095.12, "text": " at the academic + level. 
And then, but they don''t necessarily know how that applies to real data", + "tokens": [50988, 412, 264, 7778, 1496, 13, 400, 550, 11, 457, 436, 500, 380, 4725, + 458, 577, 300, 13165, 281, 957, 1412, 51268], "temperature": 0.0, "avg_logprob": + -0.10110753232782538, "compression_ratio": 1.742081447963801, "no_speech_prob": + 0.00016552692977711558}, {"id": 158, "seek": 107704, "start": 1095.12, "end": 1100.08, + "text": " in the real world. Whereas a lot of these, you know, modern search engines, + because they are,", "tokens": [51268, 294, 264, 957, 1002, 13, 13813, 257, 688, + 295, 613, 11, 291, 458, 11, 4363, 3164, 12982, 11, 570, 436, 366, 11, 51516], "temperature": + 0.0, "avg_logprob": -0.10110753232782538, "compression_ratio": 1.742081447963801, + "no_speech_prob": 0.00016552692977711558}, {"id": 159, "seek": 110008, "start": + 1100.08, "end": 1108.6399999999999, "text": " they grew up in large scale, you know, + publicly traded high volume spaces. They''ve really been", "tokens": [50364, 436, + 6109, 493, 294, 2416, 4373, 11, 291, 458, 11, 14843, 27157, 1090, 5523, 7673, 13, + 814, 600, 534, 668, 50792], "temperature": 0.0, "avg_logprob": -0.14634986357255417, + "compression_ratio": 1.5867768595041323, "no_speech_prob": 0.0004984450642950833}, + {"id": 160, "seek": 110008, "start": 1108.6399999999999, "end": 1114.24, "text": + " hardened on the engineering side and they really know how to deal with all the + nuances of real", "tokens": [50792, 42605, 322, 264, 7043, 1252, 293, 436, 534, + 458, 577, 281, 2028, 365, 439, 264, 38775, 295, 957, 51072], "temperature": 0.0, + "avg_logprob": -0.14634986357255417, "compression_ratio": 1.5867768595041323, "no_speech_prob": + 0.0004984450642950833}, {"id": 161, "seek": 110008, "start": 1114.24, "end": 1121.76, + "text": " world data, right? 
And so, you by learning those kinds of things, you + will be much more effective at", "tokens": [51072, 1002, 1412, 11, 558, 30, 400, + 370, 11, 291, 538, 2539, 729, 3685, 295, 721, 11, 291, 486, 312, 709, 544, 4942, + 412, 51448], "temperature": 0.0, "avg_logprob": -0.14634986357255417, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.0004984450642950833}, {"id": 162, "seek": + 110008, "start": 1121.76, "end": 1129.04, "text": " the at bringing to bear your + intuitions and understandings from whichever background that is.", "tokens": [51448, + 264, 412, 5062, 281, 6155, 428, 16224, 626, 293, 1223, 1109, 490, 24123, 3678, 300, + 307, 13, 51812], "temperature": 0.0, "avg_logprob": -0.14634986357255417, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.0004984450642950833}, {"id": 163, "seek": + 112904, "start": 1129.04, "end": 1133.92, "text": " I don''t know if that makes + sense or not. Yeah, no, absolutely. Yeah, actually in the same", "tokens": [50364, + 286, 500, 380, 458, 498, 300, 1669, 2020, 420, 406, 13, 865, 11, 572, 11, 3122, + 13, 865, 11, 767, 294, 264, 912, 50608], "temperature": 0.0, "avg_logprob": -0.16189155907466493, + "compression_ratio": 1.6468531468531469, "no_speech_prob": 0.005156954284757376}, + {"id": 164, "seek": 112904, "start": 1133.92, "end": 1140.32, "text": " presentation, + you also said like, you''ve seen cases where you you come into helper company and", + "tokens": [50608, 5860, 11, 291, 611, 848, 411, 11, 291, 600, 1612, 3331, 689, 291, + 291, 808, 666, 36133, 2237, 293, 50928], "temperature": 0.0, "avg_logprob": -0.16189155907466493, + "compression_ratio": 1.6468531468531469, "no_speech_prob": 0.005156954284757376}, + {"id": 165, "seek": 112904, "start": 1141.04, "end": 1147.92, "text": " they they + point you to sort of like a data, the ace almost of 10,000 rules. 
And so you you + said", "tokens": [50964, 436, 436, 935, 291, 281, 1333, 295, 411, 257, 1412, 11, + 264, 17117, 1920, 295, 1266, 11, 1360, 4474, 13, 400, 370, 291, 291, 848, 51308], + "temperature": 0.0, "avg_logprob": -0.16189155907466493, "compression_ratio": 1.6468531468531469, + "no_speech_prob": 0.005156954284757376}, {"id": 166, "seek": 112904, "start": 1147.92, + "end": 1151.44, "text": " they have that in principle, you could just remove solar + or whatever search engine you have and", "tokens": [51308, 436, 362, 300, 294, 8665, + 11, 291, 727, 445, 4159, 7936, 420, 2035, 3164, 2848, 291, 362, 293, 51484], "temperature": + 0.0, "avg_logprob": -0.16189155907466493, "compression_ratio": 1.6468531468531469, + "no_speech_prob": 0.005156954284757376}, {"id": 167, "seek": 112904, "start": 1151.44, + "end": 1156.56, "text": " just use those rules to retrieve documents, right? But + when you go and ask specific questions,", "tokens": [51484, 445, 764, 729, 4474, + 281, 30254, 8512, 11, 558, 30, 583, 562, 291, 352, 293, 1029, 2685, 1651, 11, 51740], + "temperature": 0.0, "avg_logprob": -0.16189155907466493, "compression_ratio": 1.6468531468531469, + "no_speech_prob": 0.005156954284757376}, {"id": 168, "seek": 115656, "start": 1156.56, + "end": 1162.56, "text": " what what what this rule does? The answer that you you + illustrated was well, it was created by", "tokens": [50364, 437, 437, 437, 341, + 4978, 775, 30, 440, 1867, 300, 291, 291, 33875, 390, 731, 11, 309, 390, 2942, 538, + 50664], "temperature": 0.0, "avg_logprob": -0.1486204497668208, "compression_ratio": + 1.56, "no_speech_prob": 0.003055430017411709}, {"id": 169, "seek": 115656, "start": + 1162.56, "end": 1169.28, "text": " Joy, you know, and he quit five years ago. So + he then said it makes sense. So we keep it. 
So how do", "tokens": [50664, 15571, + 11, 291, 458, 11, 293, 415, 10366, 1732, 924, 2057, 13, 407, 415, 550, 848, 309, + 1669, 2020, 13, 407, 321, 1066, 309, 13, 407, 577, 360, 51000], "temperature": 0.0, + "avg_logprob": -0.1486204497668208, "compression_ratio": 1.56, "no_speech_prob": + 0.003055430017411709}, {"id": 170, "seek": 115656, "start": 1169.28, "end": 1175.2, + "text": " you go about convincing the organization or teams to change their perception + and sort of like become", "tokens": [51000, 291, 352, 466, 24823, 264, 4475, 420, + 5491, 281, 1319, 641, 12860, 293, 1333, 295, 411, 1813, 51296], "temperature": 0.0, + "avg_logprob": -0.1486204497668208, "compression_ratio": 1.56, "no_speech_prob": + 0.003055430017411709}, {"id": 171, "seek": 115656, "start": 1175.2, "end": 1182.24, + "text": " more flexible and move into this flywheel of experiments? Yeah, it''s + hard. And again, I think,", "tokens": [51296, 544, 11358, 293, 1286, 666, 341, 3603, + 22830, 295, 12050, 30, 865, 11, 309, 311, 1152, 13, 400, 797, 11, 286, 519, 11, + 51648], "temperature": 0.0, "avg_logprob": -0.1486204497668208, "compression_ratio": + 1.56, "no_speech_prob": 0.003055430017411709}, {"id": 172, "seek": 118224, "start": + 1182.24, "end": 1187.6, "text": " you know, I mean, you have to look at incentives + and first principles there, right? 
Like,", "tokens": [50364, 291, 458, 11, 286, + 914, 11, 291, 362, 281, 574, 412, 23374, 293, 700, 9156, 456, 11, 558, 30, 1743, + 11, 50632], "temperature": 0.0, "avg_logprob": -0.10238174998432124, "compression_ratio": + 1.6824034334763949, "no_speech_prob": 0.0018538926960900426}, {"id": 173, "seek": + 118224, "start": 1188.72, "end": 1195.44, "text": " again, if you''re in this boat + of like searches, just a feature, there may or may not be any incentive.", "tokens": + [50688, 797, 11, 498, 291, 434, 294, 341, 6582, 295, 411, 26701, 11, 445, 257, 4111, + 11, 456, 815, 420, 815, 406, 312, 604, 22346, 13, 51024], "temperature": 0.0, "avg_logprob": + -0.10238174998432124, "compression_ratio": 1.6824034334763949, "no_speech_prob": + 0.0018538926960900426}, {"id": 174, "seek": 118224, "start": 1195.44, "end": 1200.64, + "text": " But if you''re in this boat of like, hey, search is a really critical + aspect of what we do. Our users", "tokens": [51024, 583, 498, 291, 434, 294, 341, + 6582, 295, 411, 11, 4177, 11, 3164, 307, 257, 534, 4924, 4171, 295, 437, 321, 360, + 13, 2621, 5022, 51284], "temperature": 0.0, "avg_logprob": -0.10238174998432124, + "compression_ratio": 1.6824034334763949, "no_speech_prob": 0.0018538926960900426}, + {"id": 175, "seek": 118224, "start": 1200.64, "end": 1209.2, "text": " use it all + the time. It''s key to revenue. It''s key to timeliness or it''s, you know, people''s + lives", "tokens": [51284, 764, 309, 439, 264, 565, 13, 467, 311, 2141, 281, 9324, + 13, 467, 311, 2141, 281, 524, 25307, 420, 309, 311, 11, 291, 458, 11, 561, 311, + 2909, 51712], "temperature": 0.0, "avg_logprob": -0.10238174998432124, "compression_ratio": + 1.6824034334763949, "no_speech_prob": 0.0018538926960900426}, {"id": 176, "seek": + 120920, "start": 1209.2, "end": 1216.96, "text": " are on the line, et cetera. 
You''re + going to invest in making sure searches as capable as possible.", "tokens": [50364, + 366, 322, 264, 1622, 11, 1030, 11458, 13, 509, 434, 516, 281, 1963, 294, 1455, 988, + 26701, 382, 8189, 382, 1944, 13, 50752], "temperature": 0.0, "avg_logprob": -0.07385156105975715, + "compression_ratio": 1.5983935742971886, "no_speech_prob": 0.0013425301294773817}, + {"id": 177, "seek": 120920, "start": 1216.96, "end": 1224.16, "text": " Those folks + usually don''t take much convincing once you can show them a better way, right? + They''re", "tokens": [50752, 3950, 4024, 2673, 500, 380, 747, 709, 24823, 1564, + 291, 393, 855, 552, 257, 1101, 636, 11, 558, 30, 814, 434, 51112], "temperature": + 0.0, "avg_logprob": -0.07385156105975715, "compression_ratio": 1.5983935742971886, + "no_speech_prob": 0.0013425301294773817}, {"id": 178, "seek": 120920, "start": 1224.16, + "end": 1230.88, "text": " often already frustrated by the sheer number of rules + that they have. And so one of the things that", "tokens": [51112, 2049, 1217, 15751, + 538, 264, 23061, 1230, 295, 4474, 300, 436, 362, 13, 400, 370, 472, 295, 264, 721, + 300, 51448], "temperature": 0.0, "avg_logprob": -0.07385156105975715, "compression_ratio": + 1.5983935742971886, "no_speech_prob": 0.0013425301294773817}, {"id": 179, "seek": + 120920, "start": 1230.88, "end": 1234.88, "text": " can often work in those situations, + I think is, you know, you can start to just learn the, you know,", "tokens": [51448, + 393, 2049, 589, 294, 729, 6851, 11, 286, 519, 307, 11, 291, 458, 11, 291, 393, 722, + 281, 445, 1466, 264, 11, 291, 458, 11, 51648], "temperature": 0.0, "avg_logprob": + -0.07385156105975715, "compression_ratio": 1.5983935742971886, "no_speech_prob": + 0.0013425301294773817}, {"id": 180, "seek": 123488, "start": 1234.88, "end": 1240.0800000000002, + "text": " a lot of these machine learning systems will actually learn the set of + rules, right? 
And so if you", "tokens": [50364, 257, 688, 295, 613, 3479, 2539, + 3652, 486, 767, 1466, 264, 992, 295, 4474, 11, 558, 30, 400, 370, 498, 291, 50624], + "temperature": 0.0, "avg_logprob": -0.10293541635785784, "compression_ratio": 1.8029739776951672, + "no_speech_prob": 0.0004807735385838896}, {"id": 181, "seek": 123488, "start": 1240.0800000000002, + "end": 1244.48, "text": " want, you can just start to learn the rules by the fact + that you''re gathering your queries and", "tokens": [50624, 528, 11, 291, 393, 445, + 722, 281, 1466, 264, 4474, 538, 264, 1186, 300, 291, 434, 13519, 428, 24109, 293, + 50844], "temperature": 0.0, "avg_logprob": -0.10293541635785784, "compression_ratio": + 1.8029739776951672, "no_speech_prob": 0.0004807735385838896}, {"id": 182, "seek": + 123488, "start": 1244.48, "end": 1250.0, "text": " your click logs and you''re looking + at the engagements users are having with the system, with the rules", "tokens": + [50844, 428, 2052, 20820, 293, 291, 434, 1237, 412, 264, 44978, 5022, 366, 1419, + 365, 264, 1185, 11, 365, 264, 4474, 51120], "temperature": 0.0, "avg_logprob": -0.10293541635785784, + "compression_ratio": 1.8029739776951672, "no_speech_prob": 0.0004807735385838896}, + {"id": 183, "seek": 123488, "start": 1250.0, "end": 1255.6000000000001, "text": + " in place. And then over time, you know, that will learn it. 
That the harder part + often is", "tokens": [51120, 294, 1081, 13, 400, 550, 670, 565, 11, 291, 458, 11, + 300, 486, 1466, 309, 13, 663, 264, 6081, 644, 2049, 307, 51400], "temperature": + 0.0, "avg_logprob": -0.10293541635785784, "compression_ratio": 1.8029739776951672, + "no_speech_prob": 0.0004807735385838896}, {"id": 184, "seek": 123488, "start": 1257.2, + "end": 1264.4, "text": " getting that last part, which is true experimentation whereby + they actually have a system in place", "tokens": [51480, 1242, 300, 1036, 644, 11, + 597, 307, 2074, 37142, 36998, 436, 767, 362, 257, 1185, 294, 1081, 51840], "temperature": + 0.0, "avg_logprob": -0.10293541635785784, "compression_ratio": 1.8029739776951672, + "no_speech_prob": 0.0004807735385838896}, {"id": 185, "seek": 126440, "start": 1264.4, + "end": 1272.0800000000002, "text": " for running multi-variant experiments or AB + tests, right? And they can actually try out different", "tokens": [50364, 337, 2614, + 4825, 12, 34033, 394, 12050, 420, 13838, 6921, 11, 558, 30, 400, 436, 393, 767, + 853, 484, 819, 50748], "temperature": 0.0, "avg_logprob": -0.08641707036913056, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.0005219281883910298}, + {"id": 186, "seek": 126440, "start": 1272.0800000000002, "end": 1279.0400000000002, + "text": " approaches and see which one wins and see which one''s most effective + and then go with that from,", "tokens": [50748, 11587, 293, 536, 597, 472, 10641, + 293, 536, 597, 472, 311, 881, 4942, 293, 550, 352, 365, 300, 490, 11, 51096], "temperature": + 0.0, "avg_logprob": -0.08641707036913056, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.0005219281883910298}, {"id": 187, "seek": 126440, "start": 1279.0400000000002, + "end": 1284.4, "text": " you know, until the next one beats it, right? 
That''s a + fair amount of engineering work to get in place.", "tokens": [51096, 291, 458, 11, + 1826, 264, 958, 472, 16447, 309, 11, 558, 30, 663, 311, 257, 3143, 2372, 295, 7043, + 589, 281, 483, 294, 1081, 13, 51364], "temperature": 0.0, "avg_logprob": -0.08641707036913056, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.0005219281883910298}, + {"id": 188, "seek": 126440, "start": 1284.4, "end": 1290.4, "text": " It''s also + a fair amount of math to do in order to make sure it''s appropriate. These days,", + "tokens": [51364, 467, 311, 611, 257, 3143, 2372, 295, 5221, 281, 360, 294, 1668, + 281, 652, 988, 309, 311, 6854, 13, 1981, 1708, 11, 51664], "temperature": 0.0, "avg_logprob": + -0.08641707036913056, "compression_ratio": 1.6302521008403361, "no_speech_prob": + 0.0005219281883910298}, {"id": 189, "seek": 129040, "start": 1290.4, "end": 1294.8000000000002, + "text": " there are systems and tools that allow you to do it, but if you want to + homegrown it, you know,", "tokens": [50364, 456, 366, 3652, 293, 3873, 300, 2089, + 291, 281, 360, 309, 11, 457, 498, 291, 528, 281, 1280, 38413, 309, 11, 291, 458, + 11, 50584], "temperature": 0.0, "avg_logprob": -0.10876838197099402, "compression_ratio": + 1.6875, "no_speech_prob": 0.003533940063789487}, {"id": 190, "seek": 129040, "start": + 1294.8000000000002, "end": 1300.96, "text": " that can take a lot of work. 
So getting + people to be in that mindset, especially in", "tokens": [50584, 300, 393, 747, 257, + 688, 295, 589, 13, 407, 1242, 561, 281, 312, 294, 300, 12543, 11, 2318, 294, 50892], + "temperature": 0.0, "avg_logprob": -0.10876838197099402, "compression_ratio": 1.6875, + "no_speech_prob": 0.003533940063789487}, {"id": 191, "seek": 129040, "start": 1302.4, + "end": 1308.16, "text": " environments or company cultures where like there''s pride + in being right, you know, you sometimes", "tokens": [50964, 12388, 420, 2237, 12951, + 689, 411, 456, 311, 10936, 294, 885, 558, 11, 291, 458, 11, 291, 2171, 51252], "temperature": + 0.0, "avg_logprob": -0.10876838197099402, "compression_ratio": 1.6875, "no_speech_prob": + 0.003533940063789487}, {"id": 192, "seek": 129040, "start": 1308.16, "end": 1313.76, + "text": " see that in a lot of companies where it''s like whoever''s the boss has + to be right kind of situation.", "tokens": [51252, 536, 300, 294, 257, 688, 295, + 3431, 689, 309, 311, 411, 11387, 311, 264, 5741, 575, 281, 312, 558, 733, 295, 2590, + 13, 51532], "temperature": 0.0, "avg_logprob": -0.10876838197099402, "compression_ratio": + 1.6875, "no_speech_prob": 0.003533940063789487}, {"id": 193, "seek": 131376, "start": + 1314.32, "end": 1320.8, "text": " Those types of companies are always going to struggle + with experiment mindsets because, you know,", "tokens": [50392, 3950, 3467, 295, + 3431, 366, 1009, 516, 281, 7799, 365, 5120, 9634, 1385, 570, 11, 291, 458, 11, 50716], + "temperature": 0.0, "avg_logprob": -0.2014251756079403, "compression_ratio": 1.6536796536796536, + "no_speech_prob": 0.010579102672636509}, {"id": 194, "seek": 131376, "start": 1320.8, + "end": 1327.44, "text": " they reward, quote unquote, being right as opposed to, + quote unquote, you know, rewarding", "tokens": [50716, 436, 7782, 11, 6513, 37557, + 11, 885, 558, 382, 8851, 281, 11, 6513, 37557, 11, 291, 458, 11, 20063, 51048], + "temperature": 0.0, "avg_logprob": -0.2014251756079403, 
"compression_ratio": 1.6536796536796536, + "no_speech_prob": 0.010579102672636509}, {"id": 195, "seek": 131376, "start": 1328.16, + "end": 1333.92, "text": " longer term growth and incremental improvements with the + occasional failures, right? So you really", "tokens": [51084, 2854, 1433, 4599, + 293, 35759, 13797, 365, 264, 31644, 20774, 11, 558, 30, 407, 291, 534, 51372], "temperature": + 0.0, "avg_logprob": -0.2014251756079403, "compression_ratio": 1.6536796536796536, + "no_speech_prob": 0.010579102672636509}, {"id": 196, "seek": 131376, "start": 1333.92, + "end": 1341.28, "text": " have to look at company culture first and potentially + reset that and then build and bake in the", "tokens": [51372, 362, 281, 574, 412, + 2237, 3713, 700, 293, 7263, 14322, 300, 293, 550, 1322, 293, 16562, 294, 264, 51740], + "temperature": 0.0, "avg_logprob": -0.2014251756079403, "compression_ratio": 1.6536796536796536, + "no_speech_prob": 0.010579102672636509}, {"id": 197, "seek": 134128, "start": 1342.0, + "end": 1349.2, "text": " the necessary engineering work to make experiments work. + Yeah, absolutely. 
I agree to that same thought", "tokens": [50400, 264, 4818, 7043, + 589, 281, 652, 12050, 589, 13, 865, 11, 3122, 13, 286, 3986, 281, 300, 912, 1194, + 50760], "temperature": 0.0, "avg_logprob": -0.18434673885129532, "compression_ratio": + 1.6642857142857144, "no_speech_prob": 0.005494780372828245}, {"id": 198, "seek": + 134128, "start": 1349.2, "end": 1354.8799999999999, "text": " that, you know, without + failures, you cannot really breed the culture of creating cool new stuff", "tokens": + [50760, 300, 11, 291, 458, 11, 1553, 20774, 11, 291, 2644, 534, 18971, 264, 3713, + 295, 4084, 1627, 777, 1507, 51044], "temperature": 0.0, "avg_logprob": -0.18434673885129532, + "compression_ratio": 1.6642857142857144, "no_speech_prob": 0.005494780372828245}, + {"id": 199, "seek": 134128, "start": 1354.8799999999999, "end": 1360.24, "text": + " because you basically cannot unleash yourself to go and mess with your code base, + right?", "tokens": [51044, 570, 291, 1936, 2644, 49814, 1803, 281, 352, 293, 2082, + 365, 428, 3089, 3096, 11, 558, 30, 51312], "temperature": 0.0, "avg_logprob": -0.18434673885129532, + "compression_ratio": 1.6642857142857144, "no_speech_prob": 0.005494780372828245}, + {"id": 200, "seek": 134128, "start": 1361.04, "end": 1364.16, "text": " And do things + and create new stuff. 
So like, you need to be brave for sure.", "tokens": [51352, + 400, 360, 721, 293, 1884, 777, 1507, 13, 407, 411, 11, 291, 643, 281, 312, 12653, + 337, 988, 13, 51508], "temperature": 0.0, "avg_logprob": -0.18434673885129532, "compression_ratio": + 1.6642857142857144, "no_speech_prob": 0.005494780372828245}, {"id": 201, "seek": + 134128, "start": 1364.6399999999999, "end": 1370.3999999999999, "text": " Well, + as I think the front of mind Ted Dunning said, the cool thing about experimentation + frameworks", "tokens": [51532, 1042, 11, 382, 286, 519, 264, 1868, 295, 1575, 14985, + 11959, 773, 848, 11, 264, 1627, 551, 466, 37142, 29834, 51820], "temperature": 0.0, + "avg_logprob": -0.18434673885129532, "compression_ratio": 1.6642857142857144, "no_speech_prob": + 0.005494780372828245}, {"id": 202, "seek": 137040, "start": 1370.4, "end": 1377.1200000000001, + "text": " is you get to be wrong and that''s okay, right? Like you''re actually + right by the fact that you''re wrong.", "tokens": [50364, 307, 291, 483, 281, 312, + 2085, 293, 300, 311, 1392, 11, 558, 30, 1743, 291, 434, 767, 558, 538, 264, 1186, + 300, 291, 434, 2085, 13, 50700], "temperature": 0.0, "avg_logprob": -0.114143683531574, + "compression_ratio": 1.7256637168141593, "no_speech_prob": 0.003049850929528475}, + {"id": 203, "seek": 137040, "start": 1378.0, "end": 1386.3200000000002, "text": + " You''re because you''re right in the long run, right? Yes. Even if any given experiment + is flat or bad,", "tokens": [50744, 509, 434, 570, 291, 434, 558, 294, 264, 938, + 1190, 11, 558, 30, 1079, 13, 2754, 498, 604, 2212, 5120, 307, 4962, 420, 1578, 11, + 51160], "temperature": 0.0, "avg_logprob": -0.114143683531574, "compression_ratio": + 1.7256637168141593, "no_speech_prob": 0.003049850929528475}, {"id": 204, "seek": + 137040, "start": 1386.3200000000002, "end": 1391.68, "text": " right? 
But overall, + you know, in the long run, you''re going to win out because you''re going to just,", + "tokens": [51160, 558, 30, 583, 4787, 11, 291, 458, 11, 294, 264, 938, 1190, 11, + 291, 434, 516, 281, 1942, 484, 570, 291, 434, 516, 281, 445, 11, 51428], "temperature": + 0.0, "avg_logprob": -0.114143683531574, "compression_ratio": 1.7256637168141593, + "no_speech_prob": 0.003049850929528475}, {"id": 205, "seek": 137040, "start": 1391.68, + "end": 1397.6000000000001, "text": " it''s easier and easier for you to add in a + new approach. Yeah, absolutely. I think", "tokens": [51428, 309, 311, 3571, 293, + 3571, 337, 291, 281, 909, 294, 257, 777, 3109, 13, 865, 11, 3122, 13, 286, 519, + 51724], "temperature": 0.0, "avg_logprob": -0.114143683531574, "compression_ratio": + 1.7256637168141593, "no_speech_prob": 0.003049850929528475}, {"id": 206, "seek": + 139760, "start": 1398.3999999999999, "end": 1405.04, "text": " that Turnbull also + said, like, you know, how you basically accumulate this bruises, right? So you''re + like,", "tokens": [50404, 300, 7956, 37290, 611, 848, 11, 411, 11, 291, 458, 11, + 577, 291, 1936, 33384, 341, 25267, 3598, 11, 558, 30, 407, 291, 434, 411, 11, 50736], + "temperature": 0.0, "avg_logprob": -0.2441332222211479, "compression_ratio": 1.6554621848739495, + "no_speech_prob": 0.01411152258515358}, {"id": 207, "seek": 139760, "start": 1405.04, + "end": 1412.0, "text": " Oscar tissue as some other people say. So I think without + doing things, you can''t without failing as well,", "tokens": [50736, 20718, 12404, + 382, 512, 661, 561, 584, 13, 407, 286, 519, 1553, 884, 721, 11, 291, 393, 380, 1553, + 18223, 382, 731, 11, 51084], "temperature": 0.0, "avg_logprob": -0.2441332222211479, + "compression_ratio": 1.6554621848739495, "no_speech_prob": 0.01411152258515358}, + {"id": 208, "seek": 139760, "start": 1412.0, "end": 1417.12, "text": " you can''t + learn. So I totally agree to that. 
But still for those who are still learning,", + "tokens": [51084, 291, 393, 380, 1466, 13, 407, 286, 3879, 3986, 281, 300, 13, 583, + 920, 337, 729, 567, 366, 920, 2539, 11, 51340], "temperature": 0.0, "avg_logprob": + -0.2441332222211479, "compression_ratio": 1.6554621848739495, "no_speech_prob": + 0.01411152258515358}, {"id": 209, "seek": 139760, "start": 1417.12, "end": 1422.1599999999999, + "text": " you know, and we are discussing, to some extent, the courses that you + couldn''t be teaching,", "tokens": [51340, 291, 458, 11, 293, 321, 366, 10850, 11, + 281, 512, 8396, 11, 264, 7712, 300, 291, 2809, 380, 312, 4571, 11, 51592], "temperature": + 0.0, "avg_logprob": -0.2441332222211479, "compression_ratio": 1.6554621848739495, + "no_speech_prob": 0.01411152258515358}, {"id": 210, "seek": 142216, "start": 1422.72, + "end": 1428.64, "text": " you know, where do you start? Like, let''s say you have + some data, right? You have some click logs", "tokens": [50392, 291, 458, 11, 689, + 360, 291, 722, 30, 1743, 11, 718, 311, 584, 291, 362, 512, 1412, 11, 558, 30, 509, + 362, 512, 2052, 20820, 50688], "temperature": 0.0, "avg_logprob": -0.155118648822491, + "compression_ratio": 1.7162162162162162, "no_speech_prob": 0.025838572531938553}, + {"id": 211, "seek": 142216, "start": 1430.4, "end": 1435.44, "text": " within your + organization or maybe you found some data set. Where do you start? How do you go + about", "tokens": [50776, 1951, 428, 4475, 420, 1310, 291, 1352, 512, 1412, 992, + 13, 2305, 360, 291, 722, 30, 1012, 360, 291, 352, 466, 51028], "temperature": 0.0, + "avg_logprob": -0.155118648822491, "compression_ratio": 1.7162162162162162, "no_speech_prob": + 0.025838572531938553}, {"id": 212, "seek": 142216, "start": 1435.44, "end": 1441.44, + "text": " dissecting that data set? 
What do you do with it as next steps and what + to avoid maybe and", "tokens": [51028, 48332, 278, 300, 1412, 992, 30, 708, 360, + 291, 360, 365, 309, 382, 958, 4439, 293, 437, 281, 5042, 1310, 293, 51328], "temperature": + 0.0, "avg_logprob": -0.155118648822491, "compression_ratio": 1.7162162162162162, + "no_speech_prob": 0.025838572531938553}, {"id": 213, "seek": 142216, "start": 1442.3200000000002, + "end": 1448.5600000000002, "text": " what good things to know to keep in mind? Yeah, + I mean, I think, you know, first off, I mean,", "tokens": [51372, 437, 665, 721, + 281, 458, 281, 1066, 294, 1575, 30, 865, 11, 286, 914, 11, 286, 519, 11, 291, 458, + 11, 700, 766, 11, 286, 914, 11, 51684], "temperature": 0.0, "avg_logprob": -0.155118648822491, + "compression_ratio": 1.7162162162162162, "no_speech_prob": 0.025838572531938553}, + {"id": 214, "seek": 144856, "start": 1449.52, "end": 1454.24, "text": " a lot of + companies aren''t even at all that great at actually collecting and managing their + query", "tokens": [50412, 257, 688, 295, 3431, 3212, 380, 754, 412, 439, 300, 869, + 412, 767, 12510, 293, 11642, 641, 14581, 50648], "temperature": 0.0, "avg_logprob": + -0.08285161150180227, "compression_ratio": 1.7375886524822695, "no_speech_prob": + 0.014973311685025692}, {"id": 215, "seek": 144856, "start": 1454.24, "end": 1460.24, + "text": " logs, right? 
So if you''re, if you''ve got a search engine up and running + and you want to improve it,", "tokens": [50648, 20820, 11, 558, 30, 407, 498, 291, + 434, 11, 498, 291, 600, 658, 257, 3164, 2848, 493, 293, 2614, 293, 291, 528, 281, + 3470, 309, 11, 50948], "temperature": 0.0, "avg_logprob": -0.08285161150180227, + "compression_ratio": 1.7375886524822695, "no_speech_prob": 0.014973311685025692}, + {"id": 216, "seek": 144856, "start": 1460.24, "end": 1463.52, "text": " I mean, + I think the first thing you have to start to do is again, it kind of goes back to + this", "tokens": [50948, 286, 914, 11, 286, 519, 264, 700, 551, 291, 362, 281, 722, + 281, 360, 307, 797, 11, 309, 733, 295, 1709, 646, 281, 341, 51112], "temperature": + 0.0, "avg_logprob": -0.08285161150180227, "compression_ratio": 1.7375886524822695, + "no_speech_prob": 0.014973311685025692}, {"id": 217, "seek": 144856, "start": 1463.52, + "end": 1470.0, "text": " first principles. Like, if I''m not measuring things that + help me understand what users are doing,", "tokens": [51112, 700, 9156, 13, 1743, + 11, 498, 286, 478, 406, 13389, 721, 300, 854, 385, 1223, 437, 5022, 366, 884, 11, + 51436], "temperature": 0.0, "avg_logprob": -0.08285161150180227, "compression_ratio": + 1.7375886524822695, "no_speech_prob": 0.014973311685025692}, {"id": 218, "seek": + 144856, "start": 1470.0, "end": 1474.96, "text": " and that''s the first step, right? + Like, make sure you''re able to process your query logs and capture", "tokens": + [51436, 293, 300, 311, 264, 700, 1823, 11, 558, 30, 1743, 11, 652, 988, 291, 434, + 1075, 281, 1399, 428, 14581, 20820, 293, 7983, 51684], "temperature": 0.0, "avg_logprob": + -0.08285161150180227, "compression_ratio": 1.7375886524822695, "no_speech_prob": + 0.014973311685025692}, {"id": 219, "seek": 147496, "start": 1474.96, "end": 1480.96, + "text": " things like session history and what users clicked on, what they saw. 
+ A lot of companies will only", "tokens": [50364, 721, 411, 5481, 2503, 293, 437, + 5022, 23370, 322, 11, 437, 436, 1866, 13, 316, 688, 295, 3431, 486, 787, 50664], + "temperature": 0.0, "avg_logprob": -0.0838387648264567, "compression_ratio": 1.8458646616541354, + "no_speech_prob": 0.015847446396946907}, {"id": 220, "seek": 147496, "start": 1480.96, + "end": 1487.04, "text": " measure what was clicked on, but they actually don''t + measure what was seen by the user or at least", "tokens": [50664, 3481, 437, 390, + 23370, 322, 11, 457, 436, 767, 500, 380, 3481, 437, 390, 1612, 538, 264, 4195, 420, + 412, 1935, 50968], "temperature": 0.0, "avg_logprob": -0.0838387648264567, "compression_ratio": + 1.8458646616541354, "no_speech_prob": 0.015847446396946907}, {"id": 221, "seek": + 147496, "start": 1487.04, "end": 1492.64, "text": " inferred to be seen by the user. + And that can be a big loss because a lot of these machine learning", "tokens": [50968, + 13596, 986, 281, 312, 1612, 538, 264, 4195, 13, 400, 300, 393, 312, 257, 955, 4470, + 570, 257, 688, 295, 613, 3479, 2539, 51248], "temperature": 0.0, "avg_logprob": + -0.0838387648264567, "compression_ratio": 1.8458646616541354, "no_speech_prob": + 0.015847446396946907}, {"id": 222, "seek": 147496, "start": 1492.64, "end": 1499.28, + "text": " systems, you need to know what wasn''t chosen just as much as you need + to know what was chosen,", "tokens": [51248, 3652, 11, 291, 643, 281, 458, 437, + 2067, 380, 8614, 445, 382, 709, 382, 291, 643, 281, 458, 437, 390, 8614, 11, 51580], + "temperature": 0.0, "avg_logprob": -0.0838387648264567, "compression_ratio": 1.8458646616541354, + "no_speech_prob": 0.015847446396946907}, {"id": 223, "seek": 147496, "start": 1499.28, + "end": 1504.0, "text": " right? So really make sure you''ve got the instrumentation + of your system in place. 
And guess what?", "tokens": [51580, 558, 30, 407, 534, + 652, 988, 291, 600, 658, 264, 7198, 399, 295, 428, 1185, 294, 1081, 13, 400, 2041, + 437, 30, 51816], "temperature": 0.0, "avg_logprob": -0.0838387648264567, "compression_ratio": + 1.8458646616541354, "no_speech_prob": 0.015847446396946907}, {"id": 224, "seek": + 150400, "start": 1504.0, "end": 1510.0, "text": " A search engine is a great place + to store all of that data as well, right? As elastic as proven out", "tokens": [50364, + 316, 3164, 2848, 307, 257, 869, 1081, 281, 3531, 439, 295, 300, 1412, 382, 731, + 11, 558, 30, 1018, 17115, 382, 12785, 484, 50664], "temperature": 0.0, "avg_logprob": + -0.10025016784667969, "compression_ratio": 1.7259786476868328, "no_speech_prob": + 0.002184606157243252}, {"id": 225, "seek": 150400, "start": 1510.0, "end": 1515.76, + "text": " with their using search for logs and spawn as well, right? And so make + sure you''re captioning all", "tokens": [50664, 365, 641, 1228, 3164, 337, 20820, + 293, 17088, 382, 731, 11, 558, 30, 400, 370, 652, 988, 291, 434, 31974, 278, 439, + 50952], "temperature": 0.0, "avg_logprob": -0.10025016784667969, "compression_ratio": + 1.7259786476868328, "no_speech_prob": 0.002184606157243252}, {"id": 226, "seek": + 150400, "start": 1515.76, "end": 1520.32, "text": " that stuff. And then again, + I think this is where your intuition starts to come in. 
So whenever I get", "tokens": + [50952, 300, 1507, 13, 400, 550, 797, 11, 286, 519, 341, 307, 689, 428, 24002, 3719, + 281, 808, 294, 13, 407, 5699, 286, 483, 51180], "temperature": 0.0, "avg_logprob": + -0.10025016784667969, "compression_ratio": 1.7259786476868328, "no_speech_prob": + 0.002184606157243252}, {"id": 227, "seek": 150400, "start": 1520.32, "end": 1526.64, + "text": " a new data set, a new set of click logs, I start to look at, well, what + are my most popular queries?", "tokens": [51180, 257, 777, 1412, 992, 11, 257, 777, + 992, 295, 2052, 20820, 11, 286, 722, 281, 574, 412, 11, 731, 11, 437, 366, 452, + 881, 3743, 24109, 30, 51496], "temperature": 0.0, "avg_logprob": -0.10025016784667969, + "compression_ratio": 1.7259786476868328, "no_speech_prob": 0.002184606157243252}, + {"id": 228, "seek": 150400, "start": 1526.64, "end": 1532.56, "text": " What are + users asking today? What are they asking overall? What led to zero results?", "tokens": + [51496, 708, 366, 5022, 3365, 965, 30, 708, 366, 436, 3365, 4787, 30, 708, 4684, + 281, 4018, 3542, 30, 51792], "temperature": 0.0, "avg_logprob": -0.10025016784667969, + "compression_ratio": 1.7259786476868328, "no_speech_prob": 0.002184606157243252}, + {"id": 229, "seek": 153256, "start": 1533.36, "end": 1538.72, "text": " How often + are they rewriting their queries like they typed in a query and then they", "tokens": + [50404, 1012, 2049, 366, 436, 319, 19868, 641, 24109, 411, 436, 33941, 294, 257, + 14581, 293, 550, 436, 50672], "temperature": 0.0, "avg_logprob": -0.13262417080166103, + "compression_ratio": 1.7196969696969697, "no_speech_prob": 0.003400468034669757}, + {"id": 230, "seek": 153256, "start": 1538.72, "end": 1543.2, "text": " didn''t like + the results. So they rewrote it. 
You know, all of these things are pretty easily", + "tokens": [50672, 994, 380, 411, 264, 3542, 13, 407, 436, 319, 7449, 1370, 309, + 13, 509, 458, 11, 439, 295, 613, 721, 366, 1238, 3612, 50896], "temperature": 0.0, + "avg_logprob": -0.13262417080166103, "compression_ratio": 1.7196969696969697, "no_speech_prob": + 0.003400468034669757}, {"id": 231, "seek": 153256, "start": 1543.2, "end": 1548.3999999999999, + "text": " discoverable in query logs, right? So just start digging in and building + some intuition", "tokens": [50896, 4411, 712, 294, 14581, 20820, 11, 558, 30, 407, + 445, 722, 17343, 294, 293, 2390, 512, 24002, 51156], "temperature": 0.0, "avg_logprob": + -0.13262417080166103, "compression_ratio": 1.7196969696969697, "no_speech_prob": + 0.003400468034669757}, {"id": 232, "seek": 153256, "start": 1549.2, "end": 1553.52, + "text": " for those things. So for instance, one of the things when I was back at + Lucidworks that we would", "tokens": [51196, 337, 729, 721, 13, 407, 337, 5197, + 11, 472, 295, 264, 721, 562, 286, 390, 646, 412, 9593, 327, 18357, 300, 321, 576, + 51412], "temperature": 0.0, "avg_logprob": -0.13262417080166103, "compression_ratio": + 1.7196969696969697, "no_speech_prob": 0.003400468034669757}, {"id": 233, "seek": + 153256, "start": 1554.1599999999999, "end": 1560.1599999999999, "text": " do is + what we call like head tail analysis or long tail analysis is another thing you + see in", "tokens": [51444, 360, 307, 437, 321, 818, 411, 1378, 6838, 5215, 420, + 938, 6838, 5215, 307, 1071, 551, 291, 536, 294, 51744], "temperature": 0.0, "avg_logprob": + -0.13262417080166103, "compression_ratio": 1.7196969696969697, "no_speech_prob": + 0.003400468034669757}, {"id": 234, "seek": 156016, "start": 1560.16, "end": 1564.72, + "text": " the literature, you know, especially in the e-commerce world where you + have this power law", "tokens": [50364, 264, 10394, 11, 291, 458, 11, 2318, 294, + 264, 308, 12, 26926, 1002, 689, 291, 362, 341, 1347, 2101, 
50592], "temperature": + 0.0, "avg_logprob": -0.11133351486720396, "compression_ratio": 1.695067264573991, + "no_speech_prob": 0.00028587476117536426}, {"id": 235, "seek": 156016, "start": + 1564.72, "end": 1570.72, "text": " distribution where most people ask the same things + over and over, but you often have a really", "tokens": [50592, 7316, 689, 881, 561, + 1029, 264, 912, 721, 670, 293, 670, 11, 457, 291, 2049, 362, 257, 534, 50892], "temperature": + 0.0, "avg_logprob": -0.11133351486720396, "compression_ratio": 1.695067264573991, + "no_speech_prob": 0.00028587476117536426}, {"id": 236, "seek": 156016, "start": + 1570.72, "end": 1576.5600000000002, "text": " long tail. When you analyze the long + tail in a lot of e-commerce situations, what you often find,", "tokens": [50892, + 938, 6838, 13, 1133, 291, 12477, 264, 938, 6838, 294, 257, 688, 295, 308, 12, 26926, + 6851, 11, 437, 291, 2049, 915, 11, 51184], "temperature": 0.0, "avg_logprob": -0.11133351486720396, + "compression_ratio": 1.695067264573991, "no_speech_prob": 0.00028587476117536426}, + {"id": 237, "seek": 156016, "start": 1576.5600000000002, "end": 1582.96, "text": + " for instance, is the long tail is actually pretty highly correlated to the head + queries, right?", "tokens": [51184, 337, 5197, 11, 307, 264, 938, 6838, 307, 767, + 1238, 5405, 38574, 281, 264, 1378, 24109, 11, 558, 30, 51504], "temperature": 0.0, + "avg_logprob": -0.11133351486720396, "compression_ratio": 1.695067264573991, "no_speech_prob": + 0.00028587476117536426}, {"id": 238, "seek": 158296, "start": 1583.04, "end": 1588.48, + "text": " And so developing that intuition of like, you know, why are these long + tail queries", "tokens": [50368, 400, 370, 6416, 300, 24002, 295, 411, 11, 291, + 458, 11, 983, 366, 613, 938, 6838, 24109, 50640], "temperature": 0.0, "avg_logprob": + -0.09535997564142401, "compression_ratio": 1.6755555555555555, "no_speech_prob": + 0.0027663810178637505}, {"id": 239, "seek": 158296, "start": 1590.8, 
"end": 1597.52, + "text": " working or not working? That can then help you do much better at all of + your queries, right?", "tokens": [50756, 1364, 420, 406, 1364, 30, 663, 393, 550, + 854, 291, 360, 709, 1101, 412, 439, 295, 428, 24109, 11, 558, 30, 51092], "temperature": + 0.0, "avg_logprob": -0.09535997564142401, "compression_ratio": 1.6755555555555555, + "no_speech_prob": 0.0027663810178637505}, {"id": 240, "seek": 158296, "start": 1597.52, + "end": 1602.16, "text": " And so, you know, from those click logs, then you start + to focus on, well, how do I improve my head", "tokens": [51092, 400, 370, 11, 291, + 458, 11, 490, 729, 2052, 20820, 11, 550, 291, 722, 281, 1879, 322, 11, 731, 11, + 577, 360, 286, 3470, 452, 1378, 51324], "temperature": 0.0, "avg_logprob": -0.09535997564142401, + "compression_ratio": 1.6755555555555555, "no_speech_prob": 0.0027663810178637505}, + {"id": 241, "seek": 158296, "start": 1602.16, "end": 1608.24, "text": " or my torso + queries, like the ones that are most common? And then as you go on, then you can + look at", "tokens": [51324, 420, 452, 34917, 24109, 11, 411, 264, 2306, 300, 366, + 881, 2689, 30, 400, 550, 382, 291, 352, 322, 11, 550, 291, 393, 574, 412, 51628], + "temperature": 0.0, "avg_logprob": -0.09535997564142401, "compression_ratio": 1.6755555555555555, + "no_speech_prob": 0.0027663810178637505}, {"id": 242, "seek": 160824, "start": 1608.32, + "end": 1615.68, "text": " how do I handle long tail queries depending on how important + they are to you? 
You know, and from", "tokens": [50368, 577, 360, 286, 4813, 938, + 6838, 24109, 5413, 322, 577, 1021, 436, 366, 281, 291, 30, 509, 458, 11, 293, 490, + 50736], "temperature": 0.0, "avg_logprob": -0.1362801153682968, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0006714849732816219}, {"id": 243, "seek": + 160824, "start": 1615.68, "end": 1620.72, "text": " from that click log, then you + can start to build either, you know, in some cases, you still might", "tokens": + [50736, 490, 300, 2052, 3565, 11, 550, 291, 393, 722, 281, 1322, 2139, 11, 291, + 458, 11, 294, 512, 3331, 11, 291, 920, 1062, 50988], "temperature": 0.0, "avg_logprob": + -0.1362801153682968, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.0006714849732816219}, {"id": 244, "seek": 160824, "start": 1620.72, "end": 1627.04, + "text": " make sense for you to have rules. And then, and then you can also look + at, you know, like again,", "tokens": [50988, 652, 2020, 337, 291, 281, 362, 4474, + 13, 400, 550, 11, 293, 550, 291, 393, 611, 574, 412, 11, 291, 458, 11, 411, 797, + 11, 51304], "temperature": 0.0, "avg_logprob": -0.1362801153682968, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0006714849732816219}, {"id": 245, "seek": + 160824, "start": 1627.04, "end": 1632.32, "text": " like I would try to look at + it the problem holistically, what''s going to get me the most bang for my", "tokens": + [51304, 411, 286, 576, 853, 281, 574, 412, 309, 264, 1154, 4091, 20458, 11, 437, + 311, 516, 281, 483, 385, 264, 881, 8550, 337, 452, 51568], "temperature": 0.0, "avg_logprob": + -0.1362801153682968, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.0006714849732816219}, {"id": 246, "seek": 163232, "start": 1632.32, "end": 1639.4399999999998, + "text": " buck in terms of where I should spend my time, right? 
So in the short + run, rules are probably", "tokens": [50364, 14894, 294, 2115, 295, 689, 286, 820, + 3496, 452, 565, 11, 558, 30, 407, 294, 264, 2099, 1190, 11, 4474, 366, 1391, 50720], + "temperature": 0.0, "avg_logprob": -0.10916797560874862, "compression_ratio": 1.6952789699570816, + "no_speech_prob": 0.002710397122427821}, {"id": 247, "seek": 163232, "start": 1639.4399999999998, + "end": 1646.1599999999999, "text": " easier, but they''re harder to maintain in + the long run. And of course, you can only manage so many", "tokens": [50720, 3571, + 11, 457, 436, 434, 6081, 281, 6909, 294, 264, 938, 1190, 13, 400, 295, 1164, 11, + 291, 393, 787, 3067, 370, 867, 51056], "temperature": 0.0, "avg_logprob": -0.10916797560874862, + "compression_ratio": 1.6952789699570816, "no_speech_prob": 0.002710397122427821}, + {"id": 248, "seek": 163232, "start": 1646.1599999999999, "end": 1652.3999999999999, + "text": " rules on your own and, you know, even with several people, whereas machine + learning may take more work", "tokens": [51056, 4474, 322, 428, 1065, 293, 11, 291, + 458, 11, 754, 365, 2940, 561, 11, 9735, 3479, 2539, 815, 747, 544, 589, 51368], + "temperature": 0.0, "avg_logprob": -0.10916797560874862, "compression_ratio": 1.6952789699570816, + "no_speech_prob": 0.002710397122427821}, {"id": 249, "seek": 163232, "start": 1652.3999999999999, + "end": 1658.8, "text": " up front, but in the long run is probably easier to maintain. 
+ Although I do still wonder, you know,", "tokens": [51368, 493, 1868, 11, 457, 294, + 264, 938, 1190, 307, 1391, 3571, 281, 6909, 13, 5780, 286, 360, 920, 2441, 11, 291, + 458, 11, 51688], "temperature": 0.0, "avg_logprob": -0.10916797560874862, "compression_ratio": + 1.6952789699570816, "no_speech_prob": 0.002710397122427821}, {"id": 250, "seek": + 165880, "start": 1658.8, "end": 1662.8, "text": " if we''re going to run into the + same kind of problems we have with rules with machine learning", "tokens": [50364, + 498, 321, 434, 516, 281, 1190, 666, 264, 912, 733, 295, 2740, 321, 362, 365, 4474, + 365, 3479, 2539, 50564], "temperature": 0.0, "avg_logprob": -0.08125195679841218, + "compression_ratio": 1.8646616541353382, "no_speech_prob": 0.0008244350901804864}, + {"id": 251, "seek": 165880, "start": 1662.8, "end": 1668.24, "text": " models where + we have so many different models that are being applied and they''re built by different", + "tokens": [50564, 5245, 689, 321, 362, 370, 867, 819, 5245, 300, 366, 885, 6456, + 293, 436, 434, 3094, 538, 819, 50836], "temperature": 0.0, "avg_logprob": -0.08125195679841218, + "compression_ratio": 1.8646616541353382, "no_speech_prob": 0.0008244350901804864}, + {"id": 252, "seek": 165880, "start": 1668.24, "end": 1674.8, "text": " teams and + they''re applied in different scenarios. And, and next thing you know, you have + a complexity", "tokens": [50836, 5491, 293, 436, 434, 6456, 294, 819, 15077, 13, + 400, 11, 293, 958, 551, 291, 458, 11, 291, 362, 257, 14024, 51164], "temperature": + 0.0, "avg_logprob": -0.08125195679841218, "compression_ratio": 1.8646616541353382, + "no_speech_prob": 0.0008244350901804864}, {"id": 253, "seek": 165880, "start": 1674.8, + "end": 1680.56, "text": " problem on that front as well. 
But, you know, luckily, + like with things like machine learning operations", "tokens": [51164, 1154, 322, + 300, 1868, 382, 731, 13, 583, 11, 291, 458, 11, 22880, 11, 411, 365, 721, 411, 3479, + 2539, 7705, 51452], "temperature": 0.0, "avg_logprob": -0.08125195679841218, "compression_ratio": + 1.8646616541353382, "no_speech_prob": 0.0008244350901804864}, {"id": 254, "seek": + 165880, "start": 1680.56, "end": 1687.84, "text": " becoming more of a focus and + people getting much more rigorous about how they deploy and manage", "tokens": [51452, + 5617, 544, 295, 257, 1879, 293, 561, 1242, 709, 544, 29882, 466, 577, 436, 7274, + 293, 3067, 51816], "temperature": 0.0, "avg_logprob": -0.08125195679841218, "compression_ratio": + 1.8646616541353382, "no_speech_prob": 0.0008244350901804864}, {"id": 255, "seek": + 168784, "start": 1687.84, "end": 1693.1999999999998, "text": " models, I think most + of those problems will be mitigated in one run, but it still goes back to", "tokens": + [50364, 5245, 11, 286, 519, 881, 295, 729, 2740, 486, 312, 15699, 770, 294, 472, + 1190, 11, 457, 309, 920, 1709, 646, 281, 50632], "temperature": 0.0, "avg_logprob": + -0.14867196083068848, "compression_ratio": 1.7607142857142857, "no_speech_prob": + 0.0012617834145203233}, {"id": 256, "seek": 168784, "start": 1693.1999999999998, + "end": 1700.6399999999999, "text": " the same core principles, which you need to + have good housekeeping in order to be successful both with", "tokens": [50632, 264, + 912, 4965, 9156, 11, 597, 291, 643, 281, 362, 665, 48033, 294, 1668, 281, 312, 4406, + 1293, 365, 51004], "temperature": 0.0, "avg_logprob": -0.14867196083068848, "compression_ratio": + 1.7607142857142857, "no_speech_prob": 0.0012617834145203233}, {"id": 257, "seek": + 168784, "start": 1700.6399999999999, "end": 1705.6799999999998, "text": " rules + and with machine learning models. I don''t know if that that was kind of long wind. 
+ I don''t", "tokens": [51004, 4474, 293, 365, 3479, 2539, 5245, 13, 286, 500, 380, + 458, 498, 300, 300, 390, 733, 295, 938, 2468, 13, 286, 500, 380, 51256], "temperature": + 0.0, "avg_logprob": -0.14867196083068848, "compression_ratio": 1.7607142857142857, + "no_speech_prob": 0.0012617834145203233}, {"id": 258, "seek": 168784, "start": 1705.6799999999998, + "end": 1710.72, "text": " know if that answered the question or not. It does, it + does. I mean, it gives the intuition, especially", "tokens": [51256, 458, 498, 300, + 10103, 264, 1168, 420, 406, 13, 467, 775, 11, 309, 775, 13, 286, 914, 11, 309, 2709, + 264, 24002, 11, 2318, 51508], "temperature": 0.0, "avg_logprob": -0.14867196083068848, + "compression_ratio": 1.7607142857142857, "no_speech_prob": 0.0012617834145203233}, + {"id": 259, "seek": 168784, "start": 1710.72, "end": 1716.08, "text": " where you + said the connection between, you know, that that was an insight actually to me, + like", "tokens": [51508, 689, 291, 848, 264, 4984, 1296, 11, 291, 458, 11, 300, + 300, 390, 364, 11269, 767, 281, 385, 11, 411, 51776], "temperature": 0.0, "avg_logprob": + -0.14867196083068848, "compression_ratio": 1.7607142857142857, "no_speech_prob": + 0.0012617834145203233}, {"id": 260, "seek": 171608, "start": 1716.72, "end": 1722.8799999999999, + "text": " the connection between head and tail that 50% of tail may correlate with + your head. And that''s", "tokens": [50396, 264, 4984, 1296, 1378, 293, 6838, 300, + 2625, 4, 295, 6838, 815, 48742, 365, 428, 1378, 13, 400, 300, 311, 50704], "temperature": + 0.0, "avg_logprob": -0.19300092435350605, "compression_ratio": 1.5901639344262295, + "no_speech_prob": 0.0053496211767196655}, {"id": 261, "seek": 171608, "start": 1722.8799999999999, + "end": 1729.1999999999998, "text": " amazing. 
Like 50% of this super hard queries + could be kind of, you know, removed from that complexity", "tokens": [50704, 2243, + 13, 1743, 2625, 4, 295, 341, 1687, 1152, 24109, 727, 312, 733, 295, 11, 291, 458, + 11, 7261, 490, 300, 14024, 51020], "temperature": 0.0, "avg_logprob": -0.19300092435350605, + "compression_ratio": 1.5901639344262295, "no_speech_prob": 0.0053496211767196655}, + {"id": 262, "seek": 171608, "start": 1729.1999999999998, "end": 1734.6399999999999, + "text": " space, right? Which is, again, you know, your mileage may vary, right? + Like it depends on your", "tokens": [51020, 1901, 11, 558, 30, 3013, 307, 11, 797, + 11, 291, 458, 11, 428, 43121, 815, 10559, 11, 558, 30, 1743, 309, 5946, 322, 428, + 51292], "temperature": 0.0, "avg_logprob": -0.19300092435350605, "compression_ratio": + 1.5901639344262295, "no_speech_prob": 0.0053496211767196655}, {"id": 263, "seek": + 171608, "start": 1734.6399999999999, "end": 1741.28, "text": " data set in Europe, + but you know, like in e-commerce, right? If if I phone 13 or whatever is the", "tokens": + [51292, 1412, 992, 294, 3315, 11, 457, 291, 458, 11, 411, 294, 308, 12, 26926, 11, + 558, 30, 759, 498, 286, 2593, 3705, 420, 2035, 307, 264, 51624], "temperature": + 0.0, "avg_logprob": -0.19300092435350605, "compression_ratio": 1.5901639344262295, + "no_speech_prob": 0.0053496211767196655}, {"id": 264, "seek": 174128, "start": 1741.28, + "end": 1750.16, "text": " head query, there''s probably a tail query that''s, you + know, silver 64 gigabyte iPhone 13 with case,", "tokens": [50364, 1378, 14581, 11, + 456, 311, 1391, 257, 6838, 14581, 300, 311, 11, 291, 458, 11, 8753, 12145, 8741, + 34529, 7252, 3705, 365, 1389, 11, 50808], "temperature": 0.0, "avg_logprob": -0.10392130264128098, + "compression_ratio": 1.7212389380530972, "no_speech_prob": 0.0016816583229228854}, + {"id": 265, "seek": 174128, "start": 1750.16, "end": 1755.28, "text": " right? Like + that''s probably a tail query or at least a torso query. 
And once you have those + types", "tokens": [50808, 558, 30, 1743, 300, 311, 1391, 257, 6838, 14581, 420, + 412, 1935, 257, 34917, 14581, 13, 400, 1564, 291, 362, 729, 3467, 51064], "temperature": + 0.0, "avg_logprob": -0.10392130264128098, "compression_ratio": 1.7212389380530972, + "no_speech_prob": 0.0016816583229228854}, {"id": 266, "seek": 174128, "start": 1755.28, + "end": 1760.72, "text": " of realizations, you can start to link these up. And then + the cool thing really is that then", "tokens": [51064, 295, 957, 14455, 11, 291, + 393, 722, 281, 2113, 613, 493, 13, 400, 550, 264, 1627, 551, 534, 307, 300, 550, + 51336], "temperature": 0.0, "avg_logprob": -0.10392130264128098, "compression_ratio": + 1.7212389380530972, "no_speech_prob": 0.0016816583229228854}, {"id": 267, "seek": + 174128, "start": 1762.16, "end": 1768.3999999999999, "text": " the things you know + about the head can apply to those types of tail queries as well. And so you''re", + "tokens": [51408, 264, 721, 291, 458, 466, 264, 1378, 393, 3079, 281, 729, 3467, + 295, 6838, 24109, 382, 731, 13, 400, 370, 291, 434, 51720], "temperature": 0.0, + "avg_logprob": -0.10392130264128098, "compression_ratio": 1.7212389380530972, "no_speech_prob": + 0.0016816583229228854}, {"id": 268, "seek": 176840, "start": 1768.4, "end": 1774.64, + "text": " actually, you might be able to more effectively manage those tail queries, + even without machine learning", "tokens": [50364, 767, 11, 291, 1062, 312, 1075, + 281, 544, 8659, 3067, 729, 6838, 24109, 11, 754, 1553, 3479, 2539, 50676], "temperature": + 0.0, "avg_logprob": -0.12994774266293174, "compression_ratio": 1.5793650793650793, + "no_speech_prob": 0.0014774148585274816}, {"id": 269, "seek": 176840, "start": 1774.64, + "end": 1780.88, "text": " models. Yeah, absolutely. 
And just a quick reminder to + our respected audience, feel free to send", "tokens": [50676, 5245, 13, 865, 11, + 3122, 13, 400, 445, 257, 1702, 13548, 281, 527, 20020, 4034, 11, 841, 1737, 281, + 2845, 50988], "temperature": 0.0, "avg_logprob": -0.12994774266293174, "compression_ratio": + 1.5793650793650793, "no_speech_prob": 0.0014774148585274816}, {"id": 270, "seek": + 176840, "start": 1780.88, "end": 1786.0800000000002, "text": " your questions. Otherwise, + I will ask all the questions myself, which, which of course I have, but,", "tokens": + [50988, 428, 1651, 13, 10328, 11, 286, 486, 1029, 439, 264, 1651, 2059, 11, 597, + 11, 597, 295, 1164, 286, 362, 11, 457, 11, 51248], "temperature": 0.0, "avg_logprob": + -0.12994774266293174, "compression_ratio": 1.5793650793650793, "no_speech_prob": + 0.0014774148585274816}, {"id": 271, "seek": 176840, "start": 1786.0800000000002, + "end": 1792.0, "text": " you know, I''m sure you guys have guys and girls. I''m + sure you have some interesting cases. 
We do", "tokens": [51248, 291, 458, 11, 286, + 478, 988, 291, 1074, 362, 1074, 293, 4519, 13, 286, 478, 988, 291, 362, 512, 1880, + 3331, 13, 492, 360, 51544], "temperature": 0.0, "avg_logprob": -0.12994774266293174, + "compression_ratio": 1.5793650793650793, "no_speech_prob": 0.0014774148585274816}, + {"id": 272, "seek": 179200, "start": 1792.0, "end": 1796.24, "text": " get a few + questions already, but we will we''ll answer them in the end of this session.", + "tokens": [50364, 483, 257, 1326, 1651, 1217, 11, 457, 321, 486, 321, 603, 1867, + 552, 294, 264, 917, 295, 341, 5481, 13, 50576], "temperature": 0.0, "avg_logprob": + -0.16335716454879098, "compression_ratio": 1.6849315068493151, "no_speech_prob": + 0.004379446152597666}, {"id": 273, "seek": 179200, "start": 1797.44, "end": 1804.24, + "text": " And couple coupling, you know, that process of sort of, you know, crafting + the signals and", "tokens": [50636, 400, 1916, 37447, 11, 291, 458, 11, 300, 1399, + 295, 1333, 295, 11, 291, 458, 11, 29048, 264, 12354, 293, 50976], "temperature": + 0.0, "avg_logprob": -0.16335716454879098, "compression_ratio": 1.6849315068493151, + "no_speech_prob": 0.004379446152597666}, {"id": 274, "seek": 179200, "start": 1804.24, + "end": 1811.28, "text": " training your model and deploying it and ML ops that you + mentioned. How do you when it comes to", "tokens": [50976, 3097, 428, 2316, 293, + 34198, 309, 293, 21601, 44663, 300, 291, 2835, 13, 1012, 360, 291, 562, 309, 1487, + 281, 51328], "temperature": 0.0, "avg_logprob": -0.16335716454879098, "compression_ratio": + 1.6849315068493151, "no_speech_prob": 0.004379446152597666}, {"id": 275, "seek": + 179200, "start": 1811.28, "end": 1816.32, "text": " measurement, how do you measure? 
+ How do you make sure that, you know, what happens right now in", "tokens": [51328, + 13160, 11, 577, 360, 291, 3481, 30, 1012, 360, 291, 652, 988, 300, 11, 291, 458, + 11, 437, 2314, 558, 586, 294, 51580], "temperature": 0.0, "avg_logprob": -0.16335716454879098, + "compression_ratio": 1.6849315068493151, "no_speech_prob": 0.004379446152597666}, + {"id": 276, "seek": 181632, "start": 1816.32, "end": 1822.24, "text": " production + still makes sense that they don''t need to do any hectic action about, you know, + okay,", "tokens": [50364, 4265, 920, 1669, 2020, 300, 436, 500, 380, 643, 281, 360, + 604, 415, 15518, 3069, 466, 11, 291, 458, 11, 1392, 11, 50660], "temperature": 0.0, + "avg_logprob": -0.13573506537904131, "compression_ratio": 1.646808510638298, "no_speech_prob": + 0.002851012861356139}, {"id": 277, "seek": 181632, "start": 1822.24, "end": 1827.52, + "text": " pulling the model back or something like that. What''s your sense on on + on that front? And like,", "tokens": [50660, 8407, 264, 2316, 646, 420, 746, 411, + 300, 13, 708, 311, 428, 2020, 322, 322, 322, 300, 1868, 30, 400, 411, 11, 50924], + "temperature": 0.0, "avg_logprob": -0.13573506537904131, "compression_ratio": 1.646808510638298, + "no_speech_prob": 0.002851012861356139}, {"id": 278, "seek": 181632, "start": 1827.52, + "end": 1832.48, "text": " maybe some measurements that you have deployed yourself + and have been observing every single day", "tokens": [50924, 1310, 512, 15383, 300, + 291, 362, 17826, 1803, 293, 362, 668, 22107, 633, 2167, 786, 51172], "temperature": + 0.0, "avg_logprob": -0.13573506537904131, "compression_ratio": 1.646808510638298, + "no_speech_prob": 0.002851012861356139}, {"id": 279, "seek": 181632, "start": 1833.2, + "end": 1840.0, "text": " and relying on it. 
And again, it depends on your what, + you know, kind of what domain you work in.", "tokens": [51208, 293, 24140, 322, + 309, 13, 400, 797, 11, 309, 5946, 322, 428, 437, 11, 291, 458, 11, 733, 295, 437, + 9274, 291, 589, 294, 13, 51548], "temperature": 0.0, "avg_logprob": -0.13573506537904131, + "compression_ratio": 1.646808510638298, "no_speech_prob": 0.002851012861356139}, + {"id": 280, "seek": 184000, "start": 1840.0, "end": 1846.4, "text": " But, you know, + I mean, there''s there''s lots of literature on how to score and and, you know,", + "tokens": [50364, 583, 11, 291, 458, 11, 286, 914, 11, 456, 311, 456, 311, 3195, + 295, 10394, 322, 577, 281, 6175, 293, 293, 11, 291, 458, 11, 50684], "temperature": + 0.0, "avg_logprob": -0.1576796220929435, "compression_ratio": 1.7638888888888888, + "no_speech_prob": 0.0022852637339383364}, {"id": 281, "seek": 184000, "start": 1846.4, + "end": 1851.76, "text": " test your model. So things like precision and recall where + you''re looking at what users are", "tokens": [50684, 1500, 428, 2316, 13, 407, + 721, 411, 18356, 293, 9901, 689, 291, 434, 1237, 412, 437, 5022, 366, 50952], "temperature": + 0.0, "avg_logprob": -0.1576796220929435, "compression_ratio": 1.7638888888888888, + "no_speech_prob": 0.0022852637339383364}, {"id": 282, "seek": 184000, "start": 1851.76, + "end": 1858.88, "text": " clicking on and whether they''re finding the results, + things like zero results or often one of the", "tokens": [50952, 9697, 322, 293, + 1968, 436, 434, 5006, 264, 3542, 11, 721, 411, 4018, 3542, 420, 2049, 472, 295, + 264, 51308], "temperature": 0.0, "avg_logprob": -0.1576796220929435, "compression_ratio": + 1.7638888888888888, "no_speech_prob": 0.0022852637339383364}, {"id": 283, "seek": + 184000, "start": 1858.88, "end": 1867.28, "text": " things that I find helpful is + like what what you would call surprising results where documents are", "tokens": + [51308, 721, 300, 286, 915, 4961, 307, 411, 437, 437, 291, 576, 818, 8830, 
3542, + 689, 8512, 366, 51728], "temperature": 0.0, "avg_logprob": -0.1576796220929435, + "compression_ratio": 1.7638888888888888, "no_speech_prob": 0.0022852637339383364}, + {"id": 284, "seek": 186728, "start": 1867.28, "end": 1873.84, "text": " occurring + fairly high up in the results, but they''re not actually garnering the clicks that + you", "tokens": [50364, 18386, 6457, 1090, 493, 294, 264, 3542, 11, 457, 436, 434, + 406, 767, 25067, 1794, 264, 18521, 300, 291, 50692], "temperature": 0.0, "avg_logprob": + -0.09422235990825452, "compression_ratio": 1.6122448979591837, "no_speech_prob": + 0.002201144816353917}, {"id": 285, "seek": 186728, "start": 1873.84, "end": 1879.52, + "text": " would expect given that position. So for instance, you know, I mean, many + people in search understand", "tokens": [50692, 576, 2066, 2212, 300, 2535, 13, + 407, 337, 5197, 11, 291, 458, 11, 286, 914, 11, 867, 561, 294, 3164, 1223, 50976], + "temperature": 0.0, "avg_logprob": -0.09422235990825452, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.002201144816353917}, {"id": 286, "seek": 186728, "start": 1879.52, + "end": 1886.8, "text": " that there''s a position bias that''s just built into all + of us as humans. We we trust the machine.", "tokens": [50976, 300, 456, 311, 257, + 2535, 12577, 300, 311, 445, 3094, 666, 439, 295, 505, 382, 6255, 13, 492, 321, 3361, + 264, 3479, 13, 51340], "temperature": 0.0, "avg_logprob": -0.09422235990825452, + "compression_ratio": 1.6122448979591837, "no_speech_prob": 0.002201144816353917}, + {"id": 287, "seek": 186728, "start": 1886.8, "end": 1893.6, "text": " And so we + click on the first one. 
Well, if you if you consistently see that a document is + appearing", "tokens": [51340, 400, 370, 321, 2052, 322, 264, 700, 472, 13, 1042, + 11, 498, 291, 498, 291, 14961, 536, 300, 257, 4166, 307, 19870, 51680], "temperature": + 0.0, "avg_logprob": -0.09422235990825452, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.002201144816353917}, {"id": 288, "seek": 189360, "start": 1893.6, + "end": 1900.6399999999999, "text": " at say number one or number two in the results, + but it''s getting way less clicks than say the six or", "tokens": [50364, 412, 584, + 1230, 472, 420, 1230, 732, 294, 264, 3542, 11, 457, 309, 311, 1242, 636, 1570, 18521, + 813, 584, 264, 2309, 420, 50716], "temperature": 0.0, "avg_logprob": -0.14063564936319986, + "compression_ratio": 1.5731707317073171, "no_speech_prob": 0.004283294081687927}, + {"id": 289, "seek": 189360, "start": 1900.6399999999999, "end": 1907.76, "text": + " seventh document, that might be an indication to you that that document isn''t + particularly relevant or", "tokens": [50716, 17875, 4166, 11, 300, 1062, 312, 364, + 18877, 281, 291, 300, 300, 4166, 1943, 380, 4098, 7340, 420, 51072], "temperature": + 0.0, "avg_logprob": -0.14063564936319986, "compression_ratio": 1.5731707317073171, + "no_speech_prob": 0.004283294081687927}, {"id": 290, "seek": 189360, "start": 1907.76, + "end": 1913.84, "text": " for whatever reasons users aren''t liking it. So those + kinds of more subtle metrics can also be", "tokens": [51072, 337, 2035, 4112, 5022, + 3212, 380, 16933, 309, 13, 407, 729, 3685, 295, 544, 13743, 16367, 393, 611, 312, + 51376], "temperature": 0.0, "avg_logprob": -0.14063564936319986, "compression_ratio": + 1.5731707317073171, "no_speech_prob": 0.004283294081687927}, {"id": 291, "seek": + 189360, "start": 1913.84, "end": 1921.6799999999998, "text": " informative. 
I think, + you know, if you have a AB experiment, testing framework in place,", "tokens": [51376, + 27759, 13, 286, 519, 11, 291, 458, 11, 498, 291, 362, 257, 13838, 5120, 11, 4997, + 8388, 294, 1081, 11, 51768], "temperature": 0.0, "avg_logprob": -0.14063564936319986, + "compression_ratio": 1.5731707317073171, "no_speech_prob": 0.004283294081687927}, + {"id": 292, "seek": 192168, "start": 1921.68, "end": 1927.52, "text": " obviously + you can do all of your metrics around AB testing, you know, start with just giving", + "tokens": [50364, 2745, 291, 393, 360, 439, 295, 428, 16367, 926, 13838, 4997, 11, + 291, 458, 11, 722, 365, 445, 2902, 50656], "temperature": 0.0, "avg_logprob": -0.13764904936154684, + "compression_ratio": 1.7074235807860263, "no_speech_prob": 0.0021812322083860636}, + {"id": 293, "seek": 192168, "start": 1927.52, "end": 1933.92, "text": " a certain + amount of traffic to your new approach and then ramping up as it meets your metrics,", + "tokens": [50656, 257, 1629, 2372, 295, 6419, 281, 428, 777, 3109, 293, 550, 12428, + 278, 493, 382, 309, 13961, 428, 16367, 11, 50976], "temperature": 0.0, "avg_logprob": + -0.13764904936154684, "compression_ratio": 1.7074235807860263, "no_speech_prob": + 0.0021812322083860636}, {"id": 294, "seek": 192168, "start": 1933.92, "end": 1939.76, + "text": " whatever that is or what, you know, your targets are if that''s things + like add the cards, etc. You can", "tokens": [50976, 2035, 300, 307, 420, 437, 11, + 291, 458, 11, 428, 12911, 366, 498, 300, 311, 721, 411, 909, 264, 5632, 11, 5183, + 13, 509, 393, 51268], "temperature": 0.0, "avg_logprob": -0.13764904936154684, "compression_ratio": + 1.7074235807860263, "no_speech_prob": 0.0021812322083860636}, {"id": 295, "seek": + 192168, "start": 1939.76, "end": 1948.24, "text": " ramp up those those types of + tests as you as it proves out. 
There''s obviously there''s things you can", "tokens": + [51268, 12428, 493, 729, 729, 3467, 295, 6921, 382, 291, 382, 309, 25019, 484, 13, + 821, 311, 2745, 456, 311, 721, 291, 393, 51692], "temperature": 0.0, "avg_logprob": + -0.13764904936154684, "compression_ratio": 1.7074235807860263, "no_speech_prob": + 0.0021812322083860636}, {"id": 296, "seek": 194824, "start": 1948.24, "end": 1955.76, + "text": " do offline as well, like especially if you have enough query logs. And + if your index hasn''t changed", "tokens": [50364, 360, 21857, 382, 731, 11, 411, + 2318, 498, 291, 362, 1547, 14581, 20820, 13, 400, 498, 428, 8186, 6132, 380, 3105, + 50740], "temperature": 0.0, "avg_logprob": -0.1337230470445421, "compression_ratio": + 1.5303030303030303, "no_speech_prob": 0.0010978406062349677}, {"id": 297, "seek": + 194824, "start": 1955.76, "end": 1962.88, "text": " that much, but maybe just the + approach you''re taking has, then you can you can replay your logs, you can", "tokens": + [50740, 300, 709, 11, 457, 1310, 445, 264, 3109, 291, 434, 1940, 575, 11, 550, 291, + 393, 291, 393, 23836, 428, 20820, 11, 291, 393, 51096], "temperature": 0.0, "avg_logprob": + -0.1337230470445421, "compression_ratio": 1.5303030303030303, "no_speech_prob": + 0.0010978406062349677}, {"id": 298, "seek": 194824, "start": 1962.88, "end": 1970.72, + "text": " test out and you know, effectively simulate what users might click on + in those scenarios. 
And then", "tokens": [51096, 1500, 484, 293, 291, 458, 11, 8659, + 27817, 437, 5022, 1062, 2052, 322, 294, 729, 15077, 13, 400, 550, 51488], "temperature": + 0.0, "avg_logprob": -0.1337230470445421, "compression_ratio": 1.5303030303030303, + "no_speech_prob": 0.0010978406062349677}, {"id": 299, "seek": 197072, "start": 1970.72, + "end": 1977.44, "text": " of course there''s the old fashioned just, you know, things + like smell tests like do these results", "tokens": [50364, 295, 1164, 456, 311, + 264, 1331, 40646, 445, 11, 291, 458, 11, 721, 411, 4316, 6921, 411, 360, 613, 3542, + 50700], "temperature": 0.0, "avg_logprob": -0.1246923848202354, "compression_ratio": + 1.6864406779661016, "no_speech_prob": 0.00042755273170769215}, {"id": 300, "seek": + 197072, "start": 1977.44, "end": 1985.28, "text": " look better to me as an expert, + you obviously have to be careful there or to a small cohort of experts,", "tokens": + [50700, 574, 1101, 281, 385, 382, 364, 5844, 11, 291, 2745, 362, 281, 312, 5026, + 456, 420, 281, 257, 1359, 28902, 295, 8572, 11, 51092], "temperature": 0.0, "avg_logprob": + -0.1246923848202354, "compression_ratio": 1.6864406779661016, "no_speech_prob": + 0.00042755273170769215}, {"id": 301, "seek": 197072, "start": 1985.28, "end": 1990.56, + "text": " you know, like maybe your colleagues, etc. might spend some time scoring. 
+ So all of these things,", "tokens": [51092, 291, 458, 11, 411, 1310, 428, 7734, + 11, 5183, 13, 1062, 3496, 512, 565, 22358, 13, 407, 439, 295, 613, 721, 11, 51356], + "temperature": 0.0, "avg_logprob": -0.1246923848202354, "compression_ratio": 1.6864406779661016, + "no_speech_prob": 0.00042755273170769215}, {"id": 302, "seek": 197072, "start": + 1990.56, "end": 1997.44, "text": " I think are techniques and measurements you can + use to check to see whether results are, you know,", "tokens": [51356, 286, 519, + 366, 7512, 293, 15383, 291, 393, 764, 281, 1520, 281, 536, 1968, 3542, 366, 11, + 291, 458, 11, 51700], "temperature": 0.0, "avg_logprob": -0.1246923848202354, "compression_ratio": + 1.6864406779661016, "no_speech_prob": 0.00042755273170769215}, {"id": 303, "seek": + 199744, "start": 1997.52, "end": 2002.56, "text": " good enough for you them to + go into production. I think there''s I think Ronnie, Ronnie,", "tokens": [50368, + 665, 1547, 337, 291, 552, 281, 352, 666, 4265, 13, 286, 519, 456, 311, 286, 519, + 46131, 11, 46131, 11, 50620], "temperature": 0.0, "avg_logprob": -0.1803128378731864, + "compression_ratio": 1.6842105263157894, "no_speech_prob": 0.0014796133618801832}, + {"id": 304, "seek": 199744, "start": 2002.56, "end": 2007.6000000000001, "text": + " co-hoved me, I forget the name of the book, but he has a really good book along + with a co-author on", "tokens": [50620, 598, 12, 1289, 937, 385, 11, 286, 2870, + 264, 1315, 295, 264, 1446, 11, 457, 415, 575, 257, 534, 665, 1446, 2051, 365, 257, + 598, 12, 34224, 322, 50872], "temperature": 0.0, "avg_logprob": -0.1803128378731864, + "compression_ratio": 1.6842105263157894, "no_speech_prob": 0.0014796133618801832}, + {"id": 305, "seek": 199744, "start": 2009.3600000000001, "end": 2014.64, "text": + " online experimentation. It''s probably these days the Bible of online experimentation. 
+ So I would", "tokens": [50960, 2950, 37142, 13, 467, 311, 1391, 613, 1708, 264, + 6544, 295, 2950, 37142, 13, 407, 286, 576, 51224], "temperature": 0.0, "avg_logprob": + -0.1803128378731864, "compression_ratio": 1.6842105263157894, "no_speech_prob": + 0.0014796133618801832}, {"id": 306, "seek": 199744, "start": 2014.64, "end": 2021.52, + "text": " encourage users to check that out. And then, you know, there''s there''s + lots of metrics that you can", "tokens": [51224, 5373, 5022, 281, 1520, 300, 484, + 13, 400, 550, 11, 291, 458, 11, 456, 311, 456, 311, 3195, 295, 16367, 300, 291, + 393, 51568], "temperature": 0.0, "avg_logprob": -0.1803128378731864, "compression_ratio": + 1.6842105263157894, "no_speech_prob": 0.0014796133618801832}, {"id": 307, "seek": + 202152, "start": 2021.52, "end": 2026.8, "text": " deploy, you know, that are pretty + well standard and publicized. There''s some quick googling should", "tokens": [50364, + 7274, 11, 291, 458, 11, 300, 366, 1238, 731, 3832, 293, 1908, 1602, 13, 821, 311, + 512, 1702, 50061, 1688, 820, 50628], "temperature": 0.0, "avg_logprob": -0.19304724300608916, + "compression_ratio": 1.5294117647058822, "no_speech_prob": 0.0011438294313848019}, + {"id": 308, "seek": 202152, "start": 2026.8, "end": 2033.68, "text": " find those + for people. Yeah, for sure. Of course, I think you could measure some things like", + "tokens": [50628, 915, 729, 337, 561, 13, 865, 11, 337, 988, 13, 2720, 1164, 11, + 286, 519, 291, 727, 3481, 512, 721, 411, 50972], "temperature": 0.0, "avg_logprob": + -0.19304724300608916, "compression_ratio": 1.5294117647058822, "no_speech_prob": + 0.0011438294313848019}, {"id": 309, "seek": 202152, "start": 2033.68, "end": 2040.4, + "text": " a DCG, which is offline, right? So like, but you do need like rated queries. 
+ And as a contributor to", "tokens": [50972, 257, 9114, 38, 11, 597, 307, 21857, + 11, 558, 30, 407, 411, 11, 457, 291, 360, 643, 411, 22103, 24109, 13, 400, 382, + 257, 42859, 281, 51308], "temperature": 0.0, "avg_logprob": -0.19304724300608916, + "compression_ratio": 1.5294117647058822, "no_speech_prob": 0.0011438294313848019}, + {"id": 310, "seek": 202152, "start": 2040.4, "end": 2049.36, "text": " Qbit, which + is a query rating system, open source system, I''m curious to to hear your opinion + on,", "tokens": [51308, 1249, 5260, 11, 597, 307, 257, 14581, 10990, 1185, 11, 1269, + 4009, 1185, 11, 286, 478, 6369, 281, 281, 1568, 428, 4800, 322, 11, 51756], "temperature": + 0.0, "avg_logprob": -0.19304724300608916, "compression_ratio": 1.5294117647058822, + "no_speech_prob": 0.0011438294313848019}, {"id": 311, "seek": 204936, "start": 2049.36, + "end": 2055.04, "text": " you know, sort of on one hand, of course, you can always + go and just check, sanity check,", "tokens": [50364, 291, 458, 11, 1333, 295, 322, + 472, 1011, 11, 295, 1164, 11, 291, 393, 1009, 352, 293, 445, 1520, 11, 47892, 1520, + 11, 50648], "temperature": 0.0, "avg_logprob": -0.21216041163394325, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.004200038034468889}, {"id": 312, "seek": + 204936, "start": 2056.0, "end": 2062.48, "text": " you know, smoke test, your, your, + your runker. 
But that''s just maybe for engineers or product managers,", "tokens": + [50696, 291, 458, 11, 8439, 1500, 11, 428, 11, 428, 11, 428, 367, 3197, 260, 13, + 583, 300, 311, 445, 1310, 337, 11955, 420, 1674, 14084, 11, 51020], "temperature": + 0.0, "avg_logprob": -0.21216041163394325, "compression_ratio": 1.6150627615062763, + "no_speech_prob": 0.004200038034468889}, {"id": 313, "seek": 204936, "start": 2062.48, + "end": 2068.8, "text": " like a smaller group versus when you go and try to understand + the intent of queries at larger scale", "tokens": [51020, 411, 257, 4356, 1594, + 5717, 562, 291, 352, 293, 853, 281, 1223, 264, 8446, 295, 24109, 412, 4833, 4373, + 51336], "temperature": 0.0, "avg_logprob": -0.21216041163394325, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.004200038034468889}, {"id": 314, "seek": + 204936, "start": 2068.8, "end": 2073.52, "text": " with this manual effort. Have + you seen, have you deployed such methods within organizations?", "tokens": [51336, + 365, 341, 9688, 4630, 13, 3560, 291, 1612, 11, 362, 291, 17826, 1270, 7150, 1951, + 6150, 30, 51572], "temperature": 0.0, "avg_logprob": -0.21216041163394325, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.004200038034468889}, {"id": 315, "seek": + 207352, "start": 2074.08, "end": 2080.8, "text": " What do you feel like doing this + in the companies on more regular basis? 
And I also know, as a shout out", "tokens": + [50392, 708, 360, 291, 841, 411, 884, 341, 294, 264, 3431, 322, 544, 3890, 5143, + 30, 400, 286, 611, 458, 11, 382, 257, 8043, 484, 50728], "temperature": 0.0, "avg_logprob": + -0.16098369436061127, "compression_ratio": 1.632034632034632, "no_speech_prob": + 0.007157593499869108}, {"id": 316, "seek": 207352, "start": 2080.8, "end": 2085.6, + "text": " to what you did in the course, search with the mail course, like you did + ask us to", "tokens": [50728, 281, 437, 291, 630, 294, 264, 1164, 11, 3164, 365, + 264, 10071, 1164, 11, 411, 291, 630, 1029, 505, 281, 50968], "temperature": 0.0, + "avg_logprob": -0.16098369436061127, "compression_ratio": 1.632034632034632, "no_speech_prob": + 0.007157593499869108}, {"id": 317, "seek": 207352, "start": 2086.56, "end": 2091.68, + "text": " rate some queries and create a judgment, please, to get a feel of the + process. And I think that", "tokens": [51016, 3314, 512, 24109, 293, 1884, 257, + 12216, 11, 1767, 11, 281, 483, 257, 841, 295, 264, 1399, 13, 400, 286, 519, 300, + 51272], "temperature": 0.0, "avg_logprob": -0.16098369436061127, "compression_ratio": + 1.632034632034632, "no_speech_prob": 0.007157593499869108}, {"id": 318, "seek": + 207352, "start": 2091.68, "end": 2099.36, "text": " by itself is a great idea because + it pushes you towards, you know, further understanding what", "tokens": [51272, + 538, 2564, 307, 257, 869, 1558, 570, 309, 21020, 291, 3030, 11, 291, 458, 11, 3052, + 3701, 437, 51656], "temperature": 0.0, "avg_logprob": -0.16098369436061127, "compression_ratio": + 1.632034632034632, "no_speech_prob": 0.007157593499869108}, {"id": 319, "seek": + 209936, "start": 2099.44, "end": 2104.2400000000002, "text": " is it that you''re + building for? So yeah. 
Yeah, I mean, I think, yeah, I mean, it makes,", "tokens": + [50368, 307, 309, 300, 291, 434, 2390, 337, 30, 407, 1338, 13, 865, 11, 286, 914, + 11, 286, 519, 11, 1338, 11, 286, 914, 11, 309, 1669, 11, 50608], "temperature": + 0.0, "avg_logprob": -0.1421388021790155, "compression_ratio": 1.6832579185520362, + "no_speech_prob": 0.000876779027748853}, {"id": 320, "seek": 209936, "start": 2104.2400000000002, + "end": 2111.92, "text": " it makes a ton of sense to have, if you can afford to + do offline evaluation using, you know,", "tokens": [50608, 309, 1669, 257, 2952, + 295, 2020, 281, 362, 11, 498, 291, 393, 6157, 281, 360, 21857, 13344, 1228, 11, + 291, 458, 11, 50992], "temperature": 0.0, "avg_logprob": -0.1421388021790155, "compression_ratio": + 1.6832579185520362, "no_speech_prob": 0.000876779027748853}, {"id": 321, "seek": + 209936, "start": 2111.92, "end": 2119.28, "text": " professional annotators, you + know, like, I don''t know how good mechanical Turk these days is,", "tokens": [50992, + 4843, 25339, 3391, 11, 291, 458, 11, 411, 11, 286, 500, 380, 458, 577, 665, 12070, + 15714, 613, 1708, 307, 11, 51360], "temperature": 0.0, "avg_logprob": -0.1421388021790155, + "compression_ratio": 1.6832579185520362, "no_speech_prob": 0.000876779027748853}, + {"id": 322, "seek": 209936, "start": 2119.28, "end": 2125.52, "text": " but like, + you know, something like a mechanical Turk or like, I forget what crowd flour is + called", "tokens": [51360, 457, 411, 11, 291, 458, 11, 746, 411, 257, 12070, 15714, + 420, 411, 11, 286, 2870, 437, 6919, 7693, 307, 1219, 51672], "temperature": 0.0, + "avg_logprob": -0.1421388021790155, "compression_ratio": 1.6832579185520362, "no_speech_prob": + 0.000876779027748853}, {"id": 323, "seek": 212552, "start": 2125.52, "end": 2131.04, + "text": " now or I know we''ve worked with a company called Appen in the past, like, + there are these companies", "tokens": [50364, 586, 420, 286, 458, 321, 600, 2732, + 365, 257, 2237, 1219, 3132, 268, 
294, 264, 1791, 11, 411, 11, 456, 366, 613, 3431, + 50640], "temperature": 0.0, "avg_logprob": -0.1304206166948591, "compression_ratio": + 1.6554621848739495, "no_speech_prob": 0.002578917657956481}, {"id": 324, "seek": + 212552, "start": 2131.04, "end": 2137.52, "text": " out there that will provide + you with a large number of annotators who will run your queries and", "tokens": + [50640, 484, 456, 300, 486, 2893, 291, 365, 257, 2416, 1230, 295, 25339, 3391, 567, + 486, 1190, 428, 24109, 293, 50964], "temperature": 0.0, "avg_logprob": -0.1304206166948591, + "compression_ratio": 1.6554621848739495, "no_speech_prob": 0.002578917657956481}, + {"id": 325, "seek": 212552, "start": 2137.52, "end": 2145.04, "text": " then rank + them for you. And of course, you can use that as well. So again, like, you know, + it often", "tokens": [50964, 550, 6181, 552, 337, 291, 13, 400, 295, 1164, 11, 291, + 393, 764, 300, 382, 731, 13, 407, 797, 11, 411, 11, 291, 458, 11, 309, 2049, 51340], + "temperature": 0.0, "avg_logprob": -0.1304206166948591, "compression_ratio": 1.6554621848739495, + "no_speech_prob": 0.002578917657956481}, {"id": 326, "seek": 212552, "start": 2145.04, + "end": 2150.24, "text": " comes down to whether you''re monetizing your search results + and folks who do monetize their search", "tokens": [51340, 1487, 760, 281, 1968, + 291, 434, 15556, 3319, 428, 3164, 3542, 293, 4024, 567, 360, 15556, 1125, 641, 3164, + 51600], "temperature": 0.0, "avg_logprob": -0.1304206166948591, "compression_ratio": + 1.6554621848739495, "no_speech_prob": 0.002578917657956481}, {"id": 327, "seek": + 215024, "start": 2150.3999999999996, "end": 2156.0, "text": " results will typically + pay for those kinds of things, especially once they reach really large scales,", + "tokens": [50372, 3542, 486, 5850, 1689, 337, 729, 3685, 295, 721, 11, 2318, 1564, + 436, 2524, 534, 2416, 17408, 11, 50652], "temperature": 0.0, "avg_logprob": -0.10489437796852806, + "compression_ratio": 
1.6866952789699572, "no_speech_prob": 0.004106334410607815}, + {"id": 328, "seek": 215024, "start": 2156.0, "end": 2164.8799999999997, "text": + " you know, like your, your Amazon''s and the like. Where and how much you can do + that often comes down", "tokens": [50652, 291, 458, 11, 411, 428, 11, 428, 6795, + 311, 293, 264, 411, 13, 2305, 293, 577, 709, 291, 393, 360, 300, 2049, 1487, 760, + 51096], "temperature": 0.0, "avg_logprob": -0.10489437796852806, "compression_ratio": + 1.6866952789699572, "no_speech_prob": 0.004106334410607815}, {"id": 329, "seek": + 215024, "start": 2164.8799999999997, "end": 2171.2, "text": " to budget and time, + right? So, you know, if you have the budget, I''ve seen companies do that,", "tokens": + [51096, 281, 4706, 293, 565, 11, 558, 30, 407, 11, 291, 458, 11, 498, 291, 362, + 264, 4706, 11, 286, 600, 1612, 3431, 360, 300, 11, 51412], "temperature": 0.0, "avg_logprob": + -0.10489437796852806, "compression_ratio": 1.6866952789699572, "no_speech_prob": + 0.004106334410607815}, {"id": 330, "seek": 215024, "start": 2172.08, "end": 2176.24, + "text": " you know, maybe I don''t know about weekly, there might be some that do + that weekly at the really", "tokens": [51456, 291, 458, 11, 1310, 286, 500, 380, + 458, 466, 12460, 11, 456, 1062, 312, 512, 300, 360, 300, 12460, 412, 264, 534, 51664], + "temperature": 0.0, "avg_logprob": -0.10489437796852806, "compression_ratio": 1.6866952789699572, + "no_speech_prob": 0.004106334410607815}, {"id": 331, "seek": 217624, "start": 2176.24, + "end": 2182.3999999999996, "text": " large scale, that gets really expensive quarterly + or whenever there''s a major update to the system,", "tokens": [50364, 2416, 4373, + 11, 300, 2170, 534, 5124, 38633, 420, 5699, 456, 311, 257, 2563, 5623, 281, 264, + 1185, 11, 50672], "temperature": 0.0, "avg_logprob": -0.12486493145978009, "compression_ratio": + 1.6209677419354838, "no_speech_prob": 0.002458769828081131}, {"id": 332, "seek": + 217624, "start": 
2182.3999999999996, "end": 2187.68, "text": " those kinds of things. + So by all means, I mean, I think anything you can do to get, you know, I think", + "tokens": [50672, 729, 3685, 295, 721, 13, 407, 538, 439, 1355, 11, 286, 914, 11, + 286, 519, 1340, 291, 393, 360, 281, 483, 11, 291, 458, 11, 286, 519, 50936], "temperature": + 0.0, "avg_logprob": -0.12486493145978009, "compression_ratio": 1.6209677419354838, + "no_speech_prob": 0.002458769828081131}, {"id": 333, "seek": 217624, "start": 2187.68, + "end": 2193.2, "text": " often in this space, we love to say, oh, well, this is + the way you do it. And the reality is, is like,", "tokens": [50936, 2049, 294, 341, + 1901, 11, 321, 959, 281, 584, 11, 1954, 11, 731, 11, 341, 307, 264, 636, 291, 360, + 309, 13, 400, 264, 4103, 307, 11, 307, 411, 11, 51212], "temperature": 0.0, "avg_logprob": + -0.12486493145978009, "compression_ratio": 1.6209677419354838, "no_speech_prob": + 0.002458769828081131}, {"id": 334, "seek": 217624, "start": 2194.72, "end": 2200.24, + "text": " you want a hybrid approach to most of these things, right? Because there''s + no one perfect way of,", "tokens": [51288, 291, 528, 257, 13051, 3109, 281, 881, + 295, 613, 721, 11, 558, 30, 1436, 456, 311, 572, 472, 2176, 636, 295, 11, 51564], + "temperature": 0.0, "avg_logprob": -0.12486493145978009, "compression_ratio": 1.6209677419354838, + "no_speech_prob": 0.002458769828081131}, {"id": 335, "seek": 220024, "start": 2200.8799999999997, + "end": 2207.68, "text": " there''s no one perfect model and there''s no one perfect + way of evaluating a model, right? 
And so", "tokens": [50396, 456, 311, 572, 472, + 2176, 2316, 293, 456, 311, 572, 472, 2176, 636, 295, 27479, 257, 2316, 11, 558, + 30, 400, 370, 50736], "temperature": 0.0, "avg_logprob": -0.14317961956592315, "compression_ratio": + 1.6830357142857142, "no_speech_prob": 0.004250739701092243}, {"id": 336, "seek": + 220024, "start": 2209.2799999999997, "end": 2214.8799999999997, "text": " you need + to blend these and build up a broader sense of what actually works, right?", "tokens": + [50816, 291, 643, 281, 10628, 613, 293, 1322, 493, 257, 13227, 2020, 295, 437, 767, + 1985, 11, 558, 30, 51096], "temperature": 0.0, "avg_logprob": -0.14317961956592315, + "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.004250739701092243}, + {"id": 337, "seek": 220024, "start": 2215.9199999999996, "end": 2222.72, "text": + " Yeah, absolutely. It''s just like, I guess, I guess, general awareness, like that + these systems and", "tokens": [51148, 865, 11, 3122, 13, 467, 311, 445, 411, 11, + 286, 2041, 11, 286, 2041, 11, 2674, 8888, 11, 411, 300, 613, 3652, 293, 51488], + "temperature": 0.0, "avg_logprob": -0.14317961956592315, "compression_ratio": 1.6830357142857142, + "no_speech_prob": 0.004250739701092243}, {"id": 338, "seek": 220024, "start": 2222.72, + "end": 2227.12, "text": " approaches exist and like when you feel stuck that you + don''t know, okay, you don''t generate ideas", "tokens": [51488, 11587, 2514, 293, + 411, 562, 291, 841, 5541, 300, 291, 500, 380, 458, 11, 1392, 11, 291, 500, 380, + 8460, 3487, 51708], "temperature": 0.0, "avg_logprob": -0.14317961956592315, "compression_ratio": + 1.6830357142857142, "no_speech_prob": 0.004250739701092243}, {"id": 339, "seek": + 222712, "start": 2227.12, "end": 2231.6, "text": " where you can improve your search + engine, you can go deeper and try to involve, you know,", "tokens": [50364, 689, + 291, 393, 3470, 428, 3164, 2848, 11, 291, 393, 352, 7731, 293, 853, 281, 9494, 11, + 291, 458, 11, 50588], "temperature": 0.0, 
"avg_logprob": -0.153562642215343, "compression_ratio": + 1.6772727272727272, "no_speech_prob": 0.0065754191018640995}, {"id": 340, "seek": + 222712, "start": 2231.6, "end": 2241.2, "text": " and the teachers, I believe, to + help you understand. And before we move further to some of", "tokens": [50588, 293, + 264, 6023, 11, 286, 1697, 11, 281, 854, 291, 1223, 13, 400, 949, 321, 1286, 3052, + 281, 512, 295, 51068], "temperature": 0.0, "avg_logprob": -0.153562642215343, "compression_ratio": + 1.6772727272727272, "no_speech_prob": 0.0065754191018640995}, {"id": 341, "seek": + 222712, "start": 2241.2, "end": 2246.3199999999997, "text": " higher level questions, + I still wanted to ask you a little bit more detailed question on if", "tokens": + [51068, 2946, 1496, 1651, 11, 286, 920, 1415, 281, 1029, 291, 257, 707, 857, 544, + 9942, 1168, 322, 498, 51324], "temperature": 0.0, "avg_logprob": -0.153562642215343, + "compression_ratio": 1.6772727272727272, "no_speech_prob": 0.0065754191018640995}, + {"id": 342, "seek": 222712, "start": 2246.3199999999997, "end": 2253.7599999999998, + "text": " somebody in the audience or listeners wants to try to build the kind of + end-to-end search engine", "tokens": [51324, 2618, 294, 264, 4034, 420, 23274, 2738, + 281, 853, 281, 1322, 264, 733, 295, 917, 12, 1353, 12, 521, 3164, 2848, 51696], + "temperature": 0.0, "avg_logprob": -0.153562642215343, "compression_ratio": 1.6772727272727272, + "no_speech_prob": 0.0065754191018640995}, {"id": 343, "seek": 225376, "start": 2253.84, + "end": 2259.6800000000003, "text": " at home. 
So what are the available datasets, + tools and algorithms exist today that will allow you", "tokens": [50368, 412, 1280, + 13, 407, 437, 366, 264, 2435, 42856, 11, 3873, 293, 14642, 2514, 965, 300, 486, + 2089, 291, 50660], "temperature": 0.0, "avg_logprob": -0.18385151158208432, "compression_ratio": + 1.5767634854771784, "no_speech_prob": 0.0061095645651221275}, {"id": 344, "seek": + 225376, "start": 2259.6800000000003, "end": 2266.32, "text": " to build this and + train relevancy models and all these building blocks in the search engine?", "tokens": + [50660, 281, 1322, 341, 293, 3847, 25916, 6717, 5245, 293, 439, 613, 2390, 8474, + 294, 264, 3164, 2848, 30, 50992], "temperature": 0.0, "avg_logprob": -0.18385151158208432, + "compression_ratio": 1.5767634854771784, "no_speech_prob": 0.0061095645651221275}, + {"id": 345, "seek": 225376, "start": 2267.5200000000004, "end": 2274.1600000000003, + "text": " Yeah, I mean, it''s, you know, it''s interesting. I think in many ways + we live in a golden age of", "tokens": [51052, 865, 11, 286, 914, 11, 309, 311, + 11, 291, 458, 11, 309, 311, 1880, 13, 286, 519, 294, 867, 2098, 321, 1621, 294, + 257, 9729, 3205, 295, 51384], "temperature": 0.0, "avg_logprob": -0.18385151158208432, + "compression_ratio": 1.5767634854771784, "no_speech_prob": 0.0061095645651221275}, + {"id": 346, "seek": 225376, "start": 2274.1600000000003, "end": 2282.7200000000003, + "text": " of search engines, right? Like, there are several just top notch open + source freely available", "tokens": [51384, 295, 3164, 12982, 11, 558, 30, 1743, + 11, 456, 366, 2940, 445, 1192, 26109, 1269, 4009, 16433, 2435, 51812], "temperature": + 0.0, "avg_logprob": -0.18385151158208432, "compression_ratio": 1.5767634854771784, + "no_speech_prob": 0.0061095645651221275}, {"id": 347, "seek": 228272, "start": 2282.72, + "end": 2288.8799999999997, "text": " search engines on the market. 
There are a number + of companies competing in this space,", "tokens": [50364, 3164, 12982, 322, 264, + 2142, 13, 821, 366, 257, 1230, 295, 3431, 15439, 294, 341, 1901, 11, 50672], "temperature": + 0.0, "avg_logprob": -0.12170405526763027, "compression_ratio": 1.6593886462882097, + "no_speech_prob": 0.0015482836170122027}, {"id": 348, "seek": 228272, "start": 2289.52, + "end": 2295.8399999999997, "text": " right? So, you know, picking an engine is almost + like, hey, you know, it''s a plethora of riches.", "tokens": [50704, 558, 30, 407, + 11, 291, 458, 11, 8867, 364, 2848, 307, 1920, 411, 11, 4177, 11, 291, 458, 11, 309, + 311, 257, 499, 302, 7013, 295, 35777, 13, 51020], "temperature": 0.0, "avg_logprob": + -0.12170405526763027, "compression_ratio": 1.6593886462882097, "no_speech_prob": + 0.0015482836170122027}, {"id": 349, "seek": 228272, "start": 2295.8399999999997, + "end": 2302.48, "text": " It''s almost, it''s like, you''re, it''s a challenge to + pick one because there''s so many good choices,", "tokens": [51020, 467, 311, 1920, + 11, 309, 311, 411, 11, 291, 434, 11, 309, 311, 257, 3430, 281, 1888, 472, 570, 456, + 311, 370, 867, 665, 7994, 11, 51352], "temperature": 0.0, "avg_logprob": -0.12170405526763027, + "compression_ratio": 1.6593886462882097, "no_speech_prob": 0.0015482836170122027}, + {"id": 350, "seek": 228272, "start": 2302.48, "end": 2307.68, "text": " right? And + you''re often like, what specific features or domains am I going to participate + in? So,", "tokens": [51352, 558, 30, 400, 291, 434, 2049, 411, 11, 437, 2685, 4122, + 420, 25514, 669, 286, 516, 281, 8197, 294, 30, 407, 11, 51612], "temperature": 0.0, + "avg_logprob": -0.12170405526763027, "compression_ratio": 1.6593886462882097, "no_speech_prob": + 0.0015482836170122027}, {"id": 351, "seek": 230768, "start": 2307.68, "end": 2312.3199999999997, + "text": " you know, it''s obviously one like, choose a good engine. 
And I think + you really can''t go wrong", "tokens": [50364, 291, 458, 11, 309, 311, 2745, 472, + 411, 11, 2826, 257, 665, 2848, 13, 400, 286, 519, 291, 534, 393, 380, 352, 2085, + 50596], "temperature": 0.0, "avg_logprob": -0.1670391760139822, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.004894915968179703}, {"id": 352, "seek": + 230768, "start": 2312.3199999999997, "end": 2318.0, "text": " with any of the main + ones. What, you know, it''s the Lucene-based ones, Solr, Elasticsearch, OpenSearch.", + "tokens": [50596, 365, 604, 295, 264, 2135, 2306, 13, 708, 11, 291, 458, 11, 309, + 311, 264, 9593, 533, 12, 6032, 2306, 11, 22385, 2699, 2750, 17180, 7238, 17180, + 13, 50880], "temperature": 0.0, "avg_logprob": -0.1670391760139822, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.004894915968179703}, {"id": 353, "seek": + 230768, "start": 2318.7999999999997, "end": 2324.72, "text": " I haven''t played + with Vespa myself, but, you know, I think that one''s coming on strong as well.", + "tokens": [50920, 286, 2378, 380, 3737, 365, 691, 279, 4306, 2059, 11, 457, 11, + 291, 458, 11, 286, 519, 300, 472, 311, 1348, 322, 2068, 382, 731, 13, 51216], "temperature": + 0.0, "avg_logprob": -0.1670391760139822, "compression_ratio": 1.6150627615062763, + "no_speech_prob": 0.004894915968179703}, {"id": 354, "seek": 230768, "start": 2325.8399999999997, + "end": 2330.8799999999997, "text": " You see a lot of interesting capabilities that + are coming out of that. And then, you know,", "tokens": [51272, 509, 536, 257, 688, + 295, 1880, 10862, 300, 366, 1348, 484, 295, 300, 13, 400, 550, 11, 291, 458, 11, + 51524], "temperature": 0.0, "avg_logprob": -0.1670391760139822, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.004894915968179703}, {"id": 355, "seek": + 233088, "start": 2331.44, "end": 2337.6800000000003, "text": " you have obviously + the, the companies behind it. 
Of course, I''m co-founder of Lucidworks,", "tokens": + [50392, 291, 362, 2745, 264, 11, 264, 3431, 2261, 309, 13, 2720, 1164, 11, 286, + 478, 598, 12, 33348, 295, 9593, 327, 18357, 11, 50704], "temperature": 0.0, "avg_logprob": + -0.15004480679829915, "compression_ratio": 1.5912162162162162, "no_speech_prob": + 0.004984300117939711}, {"id": 356, "seek": 233088, "start": 2337.6800000000003, + "end": 2341.76, "text": " and so still a big shout out and big fan there, because + I think they''re doing a lot of interesting", "tokens": [50704, 293, 370, 920, 257, + 955, 8043, 484, 293, 955, 3429, 456, 11, 570, 286, 519, 436, 434, 884, 257, 688, + 295, 1880, 50908], "temperature": 0.0, "avg_logprob": -0.15004480679829915, "compression_ratio": + 1.5912162162162162, "no_speech_prob": 0.004984300117939711}, {"id": 357, "seek": + 233088, "start": 2341.76, "end": 2347.28, "text": " things. But you also see a number + of other players in that space, both with deep learning or", "tokens": [50908, 721, + 13, 583, 291, 611, 536, 257, 1230, 295, 661, 4150, 294, 300, 1901, 11, 1293, 365, + 2452, 2539, 420, 51184], "temperature": 0.0, "avg_logprob": -0.15004480679829915, + "compression_ratio": 1.5912162162162162, "no_speech_prob": 0.004984300117939711}, + {"id": 358, "seek": 233088, "start": 2347.28, "end": 2353.04, "text": " neural-based + approaches, as well as blended or hybrid or traditional approaches. So, one,", "tokens": + [51184, 18161, 12, 6032, 11587, 11, 382, 731, 382, 27048, 420, 13051, 420, 5164, + 11587, 13, 407, 11, 472, 11, 51472], "temperature": 0.0, "avg_logprob": -0.15004480679829915, + "compression_ratio": 1.5912162162162162, "no_speech_prob": 0.004984300117939711}, + {"id": 359, "seek": 233088, "start": 2353.04, "end": 2358.56, "text": " start with + your engine. See what it''s capable of. 
And then on the data set front, it really kind of", "tokens": [51472, 722, 365, 428, 2848, 13, 3008, 437, 309, 311, 8189, + 295, 13, 400, 550, 322, 264, 1412, 992, 1868, 11, 309, 534, 733, 295, 51748], "temperature": + 0.0, "avg_logprob": -0.15004480679829915, "compression_ratio": 1.5912162162162162, + "no_speech_prob": 0.004984300117939711}, {"id": 360, "seek": 235856, "start": 2358.56, + "end": 2364.56, "text": " depends on what your, what domain you''re in. But, you + know, I''m a big fan. You know, I often start", "tokens": [50364, 5946, 322, 437, + 428, 11, 437, 9274, 291, 434, 294, 13, 583, 11, 291, 458, 11, 286, 478, 257, 955, + 3429, 13, 509, 458, 11, 286, 2049, 722, 50664], "temperature": 0.0, "avg_logprob": + -0.16028053943927473, "compression_ratio": 1.6144067796610169, "no_speech_prob": + 0.0006086836219765246}, {"id": 361, "seek": 235856, "start": 2364.56, "end": 2371.6, + "text": " with public data sets, TREC is a great place to get data sets across + a large number of", "tokens": [50664, 365, 1908, 1412, 6352, 11, 25845, 15176, 8140, + 307, 257, 869, 1081, 281, 483, 1412, 6352, 2108, 257, 2416, 1230, 295, 51016], "temperature": + 0.0, "avg_logprob": -0.16028053943927473, "compression_ratio": 1.6144067796610169, + "no_speech_prob": 0.0006086836219765246}, {"id": 362, "seek": 235856, "start": 2371.6, + "end": 2377.2, "text": " domains. You can also get queries. 
So, whether you want + to do web search or e-commerce or legal or", "tokens": [51016, 25514, 13, 509, 393, + 611, 483, 24109, 13, 407, 11, 1968, 291, 528, 281, 360, 3670, 3164, 420, 308, 12, + 26926, 420, 5089, 420, 51296], "temperature": 0.0, "avg_logprob": -0.16028053943927473, + "compression_ratio": 1.6144067796610169, "no_speech_prob": 0.0006086836219765246}, + {"id": 363, "seek": 235856, "start": 2377.2, "end": 2385.6, "text": " enterprise + or medical, like you can go to TREC and get a data set and start indexing that,", + "tokens": [51296, 14132, 420, 4625, 11, 411, 291, 393, 352, 281, 2837, 293, 483, + 257, 1412, 992, 293, 722, 8186, 278, 300, 11, 51716], "temperature": 0.0, "avg_logprob": + -0.16028053943927473, "compression_ratio": 1.6144067796610169, "no_speech_prob": + 0.0006086836219765246}, {"id": 364, "seek": 238560, "start": 2385.6, "end": 2392.16, + "text": " playing around with it. These days also, it''s just super easy to go crawl. + So, you know, get like", "tokens": [50364, 2433, 926, 365, 309, 13, 1981, 1708, + 611, 11, 309, 311, 445, 1687, 1858, 281, 352, 24767, 13, 407, 11, 291, 458, 11, + 483, 411, 50692], "temperature": 0.0, "avg_logprob": -0.15940564795385434, "compression_ratio": + 1.4630541871921183, "no_speech_prob": 0.005309475585818291}, {"id": 365, "seek": + 238560, "start": 2394.96, "end": 2403.04, "text": " Scrapy or curl or wget or whatever, + or it''s one of these crawlers and go crawl websites. And then", "tokens": [50832, + 13943, 7966, 420, 22591, 420, 343, 38, 4850, 420, 2035, 11, 420, 309, 311, 472, + 295, 613, 13999, 11977, 293, 352, 24767, 12891, 13, 400, 550, 51236], "temperature": + 0.0, "avg_logprob": -0.15940564795385434, "compression_ratio": 1.4630541871921183, + "no_speech_prob": 0.005309475585818291}, {"id": 366, "seek": 238560, "start": 2403.04, + "end": 2410.08, "text": " you can start going from there. 
The query log side tends + to be a little bit harder because companies", "tokens": [51236, 291, 393, 722, 516, + 490, 456, 13, 440, 14581, 3565, 1252, 12258, 281, 312, 257, 707, 857, 6081, 570, + 3431, 51588], "temperature": 0.0, "avg_logprob": -0.15940564795385434, "compression_ratio": + 1.4630541871921183, "no_speech_prob": 0.005309475585818291}, {"id": 367, "seek": + 241008, "start": 2410.08, "end": 2417.2799999999997, "text": " don''t like to release + their queries. But there are several data sets that do have some form of", "tokens": + [50364, 500, 380, 411, 281, 4374, 641, 24109, 13, 583, 456, 366, 2940, 1412, 6352, + 300, 360, 362, 512, 1254, 295, 50724], "temperature": 0.0, "avg_logprob": -0.10343896126260563, + "compression_ratio": 1.6008583690987124, "no_speech_prob": 0.002642524428665638}, + {"id": 368, "seek": 241008, "start": 2417.2799999999997, "end": 2422.88, "text": + " queries with them. They may not be enough for you to fully test all the features + of an engine.", "tokens": [50724, 24109, 365, 552, 13, 814, 815, 406, 312, 1547, + 337, 291, 281, 4498, 1500, 439, 264, 4122, 295, 364, 2848, 13, 51004], "temperature": + 0.0, "avg_logprob": -0.10343896126260563, "compression_ratio": 1.6008583690987124, + "no_speech_prob": 0.002642524428665638}, {"id": 369, "seek": 241008, "start": 2423.52, + "end": 2430.24, "text": " So, in our class, we use a really old data set from Best + Buy that has query logs. In it,", "tokens": [51036, 407, 11, 294, 527, 1508, 11, + 321, 764, 257, 534, 1331, 1412, 992, 490, 9752, 19146, 300, 575, 14581, 20820, 13, + 682, 309, 11, 51372], "temperature": 0.0, "avg_logprob": -0.10343896126260563, "compression_ratio": + 1.6008583690987124, "no_speech_prob": 0.002642524428665638}, {"id": 370, "seek": + 241008, "start": 2430.24, "end": 2435.7599999999998, "text": " well, query click + logs. But for instance, it doesn''t tell you what was shown the user. 
It just", + "tokens": [51372, 731, 11, 14581, 2052, 20820, 13, 583, 337, 5197, 11, 309, 1177, + 380, 980, 291, 437, 390, 4898, 264, 4195, 13, 467, 445, 51648], "temperature": 0.0, + "avg_logprob": -0.10343896126260563, "compression_ratio": 1.6008583690987124, "no_speech_prob": + 0.002642524428665638}, {"id": 371, "seek": 243576, "start": 2435.76, "end": 2441.28, + "text": " tells you what they clicked on. And so, you can''t actually build full + models or effective models with", "tokens": [50364, 5112, 291, 437, 436, 23370, + 322, 13, 400, 370, 11, 291, 393, 380, 767, 1322, 1577, 5245, 420, 4942, 5245, 365, + 50640], "temperature": 0.0, "avg_logprob": -0.10002184647780199, "compression_ratio": + 1.8043478260869565, "no_speech_prob": 0.00430570263415575}, {"id": 372, "seek": + 243576, "start": 2441.28, "end": 2446.96, "text": " that. But it''s actually a really + good e-commerce data set because it has all of the problems of a", "tokens": [50640, + 300, 13, 583, 309, 311, 767, 257, 534, 665, 308, 12, 26926, 1412, 992, 570, 309, + 575, 439, 295, 264, 2740, 295, 257, 50924], "temperature": 0.0, "avg_logprob": -0.10002184647780199, + "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.00430570263415575}, + {"id": 373, "seek": 243576, "start": 2446.96, "end": 2452.96, "text": " data set + that comes from a company. Namely, there''s a lot of missing data in there. There''s + a lot", "tokens": [50924, 1412, 992, 300, 1487, 490, 257, 2237, 13, 10684, 736, + 11, 456, 311, 257, 688, 295, 5361, 1412, 294, 456, 13, 821, 311, 257, 688, 51224], + "temperature": 0.0, "avg_logprob": -0.10002184647780199, "compression_ratio": 1.8043478260869565, + "no_speech_prob": 0.00430570263415575}, {"id": 374, "seek": 243576, "start": 2452.96, + "end": 2457.2000000000003, "text": " of bad data. But there''s also a lot of really + good data. 
And so, starting with those, and then I think,", "tokens": [51224, 295, + 1578, 1412, 13, 583, 456, 311, 611, 257, 688, 295, 534, 665, 1412, 13, 400, 370, + 11, 2891, 365, 729, 11, 293, 550, 286, 519, 11, 51436], "temperature": 0.0, "avg_logprob": + -0.10002184647780199, "compression_ratio": 1.8043478260869565, "no_speech_prob": + 0.00430570263415575}, {"id": 375, "seek": 243576, "start": 2457.2000000000003, "end": + 2463.2000000000003, "text": " you know, you kind of just start to push the engine + through its paces. Start with the tutorials,", "tokens": [51436, 291, 458, 11, 291, + 733, 295, 445, 722, 281, 2944, 264, 2848, 807, 1080, 280, 2116, 13, 6481, 365, 264, + 17616, 11, 51736], "temperature": 0.0, "avg_logprob": -0.10002184647780199, "compression_ratio": + 1.8043478260869565, "no_speech_prob": 0.00430570263415575}, {"id": 376, "seek": + 246320, "start": 2463.2, "end": 2469.4399999999996, "text": " the basic features, + and then see where you can go deeper. Can you actually get Best In Class", "tokens": + [50364, 264, 3875, 4122, 11, 293, 550, 536, 689, 291, 393, 352, 7731, 13, 1664, + 291, 767, 483, 9752, 682, 9471, 50676], "temperature": 0.0, "avg_logprob": -0.13748544057210285, + "compression_ratio": 1.662162162162162, "no_speech_prob": 0.007721584755927324}, + {"id": 377, "seek": 246320, "start": 2471.2799999999997, "end": 2478.24, "text": + " relevance measurement out of it? Can you get Best In Class speed performance out + of it?", "tokens": [50768, 32684, 13160, 484, 295, 309, 30, 1664, 291, 483, 9752, + 682, 9471, 3073, 3389, 484, 295, 309, 30, 51116], "temperature": 0.0, "avg_logprob": + -0.13748544057210285, "compression_ratio": 1.662162162162162, "no_speech_prob": + 0.007721584755927324}, {"id": 378, "seek": 246320, "start": 2478.24, "end": 2483.2, + "text": " And then just work your way through the engine. 
And these days, you can + typically do that in,", "tokens": [51116, 400, 550, 445, 589, 428, 636, 807, 264, + 2848, 13, 400, 613, 1708, 11, 291, 393, 5850, 360, 300, 294, 11, 51364], "temperature": + 0.0, "avg_logprob": -0.13748544057210285, "compression_ratio": 1.662162162162162, + "no_speech_prob": 0.007721584755927324}, {"id": 379, "seek": 246320, "start": 2484.0, + "end": 2490.7999999999997, "text": " say, less than a week. And that''s really amazing, + right? Especially when you combine that with", "tokens": [51404, 584, 11, 1570, + 813, 257, 1243, 13, 400, 300, 311, 534, 2243, 11, 558, 30, 8545, 562, 291, 10432, + 300, 365, 51744], "temperature": 0.0, "avg_logprob": -0.13748544057210285, "compression_ratio": + 1.662162162162162, "no_speech_prob": 0.007721584755927324}, {"id": 380, "seek": + 249080, "start": 2490.8, "end": 2494.7200000000003, "text": " all the great information + out on the web, right? Like, you know, I think when I was getting started,", "tokens": + [50364, 439, 264, 869, 1589, 484, 322, 264, 3670, 11, 558, 30, 1743, 11, 291, 458, + 11, 286, 519, 562, 286, 390, 1242, 1409, 11, 50560], "temperature": 0.0, "avg_logprob": + -0.10030921936035156, "compression_ratio": 1.6920415224913494, "no_speech_prob": + 0.009588594548404217}, {"id": 381, "seek": 249080, "start": 2494.7200000000003, + "end": 2501.6800000000003, "text": " it was, you know, you had to go and really + dig in underneath the hood and kind of figure out a lot", "tokens": [50560, 309, + 390, 11, 291, 458, 11, 291, 632, 281, 352, 293, 534, 2528, 294, 7223, 264, 13376, + 293, 733, 295, 2573, 484, 257, 688, 50908], "temperature": 0.0, "avg_logprob": -0.10030921936035156, + "compression_ratio": 1.6920415224913494, "no_speech_prob": 0.009588594548404217}, + {"id": 382, "seek": 249080, "start": 2501.6800000000003, "end": 2507.84, "text": + " of those pieces these days. 
It would take several weeks, if not months, you know, + month or more to", "tokens": [50908, 295, 729, 3755, 613, 1708, 13, 467, 576, 747, + 2940, 3259, 11, 498, 406, 2493, 11, 291, 458, 11, 1618, 420, 544, 281, 51216], "temperature": + 0.0, "avg_logprob": -0.10030921936035156, "compression_ratio": 1.6920415224913494, + "no_speech_prob": 0.009588594548404217}, {"id": 383, "seek": 249080, "start": 2507.84, + "end": 2513.2000000000003, "text": " really feel like you understood an engine and + where it went. And I think these days, it''s just so", "tokens": [51216, 534, 841, + 411, 291, 7320, 364, 2848, 293, 689, 309, 1437, 13, 400, 286, 519, 613, 1708, 11, + 309, 311, 445, 370, 51484], "temperature": 0.0, "avg_logprob": -0.10030921936035156, + "compression_ratio": 1.6920415224913494, "no_speech_prob": 0.009588594548404217}, + {"id": 384, "seek": 249080, "start": 2513.2000000000003, "end": 2518.2400000000002, + "text": " much easier to do that, which is awesome. Yeah, absolutely. And I remember + during the course", "tokens": [51484, 709, 3571, 281, 360, 300, 11, 597, 307, 3476, + 13, 865, 11, 3122, 13, 400, 286, 1604, 1830, 264, 1164, 51736], "temperature": 0.0, + "avg_logprob": -0.10030921936035156, "compression_ratio": 1.6920415224913494, "no_speech_prob": + 0.009588594548404217}, {"id": 385, "seek": 251824, "start": 2518.24, "end": 2527.2799999999997, + "text": " we had to do it within a week. So per project. So that was super exciting. + And I think this would", "tokens": [50364, 321, 632, 281, 360, 309, 1951, 257, 1243, + 13, 407, 680, 1716, 13, 407, 300, 390, 1687, 4670, 13, 400, 286, 519, 341, 576, + 50816], "temperature": 0.0, "avg_logprob": -0.15960913735467033, "compression_ratio": + 1.5368421052631578, "no_speech_prob": 0.0042487503960728645}, {"id": 386, "seek": + 251824, "start": 2527.2799999999997, "end": 2533.12, "text": " not be a vector podcast + if I wouldn''t ask you also on your opinion in vector search. 
Like what''s your", + "tokens": [50816, 406, 312, 257, 8062, 7367, 498, 286, 2759, 380, 1029, 291, 611, + 322, 428, 4800, 294, 8062, 3164, 13, 1743, 437, 311, 428, 51108], "temperature": + 0.0, "avg_logprob": -0.15960913735467033, "compression_ratio": 1.5368421052631578, + "no_speech_prob": 0.0042487503960728645}, {"id": 387, "seek": 251824, "start": 2533.9199999999996, + "end": 2542.24, "text": " feel for how it will augment the search engine experience + on the user side as well as on the", "tokens": [51148, 841, 337, 577, 309, 486, + 29919, 264, 3164, 2848, 1752, 322, 264, 4195, 1252, 382, 731, 382, 322, 264, 51564], + "temperature": 0.0, "avg_logprob": -0.15960913735467033, "compression_ratio": 1.5368421052631578, + "no_speech_prob": 0.0042487503960728645}, {"id": 388, "seek": 254224, "start": 2542.3199999999997, + "end": 2549.52, "text": " development side and connected to that. What do you think + the search engine engineer profession is", "tokens": [50368, 3250, 1252, 293, 4582, + 281, 300, 13, 708, 360, 291, 519, 264, 3164, 2848, 11403, 7032, 307, 50728], "temperature": + 0.0, "avg_logprob": -0.1299250176612367, "compression_ratio": 1.6965811965811965, + "no_speech_prob": 0.003778139129281044}, {"id": 389, "seek": 254224, "start": 2549.52, + "end": 2554.8799999999997, "text": " going to be like soon? And I think it''s already + shaping up in many ways. Like the boundary between", "tokens": [50728, 516, 281, + 312, 411, 2321, 30, 400, 286, 519, 309, 311, 1217, 25945, 493, 294, 867, 2098, 13, + 1743, 264, 12866, 1296, 50996], "temperature": 0.0, "avg_logprob": -0.1299250176612367, + "compression_ratio": 1.6965811965811965, "no_speech_prob": 0.003778139129281044}, + {"id": 390, "seek": 254224, "start": 2554.8799999999997, "end": 2562.08, "text": + " data scientists and the search engineer blend. Do you feel yourself like that? 
+ Do you think this is", "tokens": [50996, 1412, 7708, 293, 264, 3164, 11403, 10628, + 13, 1144, 291, 841, 1803, 411, 300, 30, 1144, 291, 519, 341, 307, 51356], "temperature": + 0.0, "avg_logprob": -0.1299250176612367, "compression_ratio": 1.6965811965811965, + "no_speech_prob": 0.003778139129281044}, {"id": 391, "seek": 254224, "start": 2562.08, + "end": 2567.7599999999998, "text": " the direction we are going? Or do you think + it''s going to be like a form that will wear off? That''s", "tokens": [51356, 264, + 3513, 321, 366, 516, 30, 1610, 360, 291, 519, 309, 311, 516, 281, 312, 411, 257, + 1254, 300, 486, 3728, 766, 30, 663, 311, 51640], "temperature": 0.0, "avg_logprob": + -0.1299250176612367, "compression_ratio": 1.6965811965811965, "no_speech_prob": + 0.003778139129281044}, {"id": 392, "seek": 256776, "start": 2567.84, "end": 2573.5200000000004, + "text": " at some point. Yeah, I mean, it''s, well, it''s not going to wear off. + I mean, there''s too much money", "tokens": [50368, 412, 512, 935, 13, 865, 11, + 286, 914, 11, 309, 311, 11, 731, 11, 309, 311, 406, 516, 281, 3728, 766, 13, 286, + 914, 11, 456, 311, 886, 709, 1460, 50652], "temperature": 0.0, "avg_logprob": -0.18364983070187452, + "compression_ratio": 1.5591397849462365, "no_speech_prob": 0.0016648039454594254}, + {"id": 393, "seek": 256776, "start": 2573.5200000000004, "end": 2581.1200000000003, + "text": " and too much investment and too much better results. I will state upfront, + I''m not an expert on", "tokens": [50652, 293, 886, 709, 6078, 293, 886, 709, 1101, + 3542, 13, 286, 486, 1785, 30264, 11, 286, 478, 406, 364, 5844, 322, 51032], "temperature": + 0.0, "avg_logprob": -0.18364983070187452, "compression_ratio": 1.5591397849462365, + "no_speech_prob": 0.0016648039454594254}, {"id": 394, "seek": 256776, "start": 2581.1200000000003, + "end": 2589.84, "text": " these vector engines, right? Like I, it''s kind of interesting. 
Like they, I went back and look", "tokens": [51032, 613, 8062, 12982, 11, 558, 30, + 1743, 286, 11, 309, 311, 733, 295, 1880, 13, 1743, 436, 11, 286, 1437, 646, 293, + 574, 51468], "temperature": 0.0, "avg_logprob": -0.18364983070187452, "compression_ratio": + 1.5591397849462365, "no_speech_prob": 0.0016648039454594254}, {"id": 395, "seek": + 258984, "start": 2589.84, "end": 2598.48, "text": " through some of my talks and + I think I gave a talk in 2013 on what the Lucene and Solr community", "tokens": + [50364, 807, 512, 295, 452, 6686, 293, 286, 519, 286, 2729, 257, 751, 294, 9012, + 322, 437, 264, 9593, 533, 293, 22385, 1768, 50796], "temperature": 0.0, "avg_logprob": + -0.16044195810953776, "compression_ratio": 1.5051020408163265, "no_speech_prob": + 0.006680662278085947}, {"id": 396, "seek": 258984, "start": 2598.48, "end": 2606.0, + "text": " needed to do next. And one of the things was we need to add support for + dense vectors. That was 2013.", "tokens": [50796, 2978, 281, 360, 958, 13, 400, + 472, 295, 264, 721, 390, 321, 643, 281, 909, 1406, 337, 18011, 18875, 13, 663, 390, + 9012, 13, 51172], "temperature": 0.0, "avg_logprob": -0.16044195810953776, "compression_ratio": + 1.5051020408163265, "no_speech_prob": 0.006680662278085947}, {"id": 397, "seek": + 258984, "start": 2606.7200000000003, "end": 2615.04, "text": " I think we just got + dense vector support in Solr. Elastic maybe was there a little bit sooner,", "tokens": + [51208, 286, 519, 321, 445, 658, 18011, 8062, 1406, 294, 7936, 13, 2699, 2750, 1310, + 390, 456, 257, 707, 857, 15324, 11, 51624], "temperature": 0.0, "avg_logprob": -0.16044195810953776, + "compression_ratio": 1.5051020408163265, "no_speech_prob": 0.006680662278085947}, + {"id": 398, "seek": 261504, "start": 2615.04, "end": 2622.32, "text": " but roughly + same time frame. 
There are plugins, of course, that have been around like the k-NN", "tokens": [50364, 457, 9810, 912, 565, 3920, 13, 821, 366, 33759, 11, 295, 1164, + 11, 300, 362, 668, 926, 411, 264, 591, 5, 45, 50728], "temperature": 0.0, "avg_logprob": + -0.18812683837054528, "compression_ratio": 1.5, "no_speech_prob": 0.004257539752870798}, + {"id": 399, "seek": 261504, "start": 2622.32, "end": 2628.08, "text": " plugins, + things like that. Hey folks, like this stuff is here to stay. I mean, the really + interesting", "tokens": [50728, 33759, 11, 721, 411, 300, 13, 1911, 4024, 11, 411, + 341, 1507, 307, 510, 281, 1754, 13, 286, 914, 11, 264, 534, 1880, 51016], "temperature": + 0.0, "avg_logprob": -0.18812683837054528, "compression_ratio": 1.5, "no_speech_prob": + 0.004257539752870798}, {"id": 400, "seek": 261504, "start": 2628.08, "end": 2637.36, + "text": " questions, you''re starting to see these hybrid models where, like BM25 + is still really good and", "tokens": [51016, 1651, 11, 291, 434, 2891, 281, 536, + 613, 13051, 5245, 689, 11, 411, 15901, 6074, 307, 920, 534, 665, 293, 51480], "temperature": + 0.0, "avg_logprob": -0.18812683837054528, "compression_ratio": 1.5, "no_speech_prob": + 0.004257539752870798}, {"id": 401, "seek": 263736, "start": 2637.36, "end": 2644.56, + "text": " really fast at that first pass retrieval. It''s kind of hard to beat in + terms of the scale at which", "tokens": [50364, 534, 2370, 412, 300, 700, 1320, + 19817, 3337, 13, 467, 311, 733, 295, 1152, 281, 4224, 294, 2115, 295, 264, 4373, + 412, 597, 50724], "temperature": 0.0, "avg_logprob": -0.11015045249855125, "compression_ratio": + 1.6322314049586777, "no_speech_prob": 0.0017154898960143328}, {"id": 402, "seek": + 263736, "start": 2644.56, "end": 2654.6400000000003, "text": " you can get a first + pass rank, right? 
And then feeding it, those results into much deeper or more", "tokens": [50724, 291, 393, 483, 257, 700, 1320, 6181, 11, 558, 30, 400, 550, 12919, + 309, 11, 729, 3542, 666, 709, 7731, 420, 544, 51228], "temperature": 0.0, "avg_logprob": + -0.11015045249855125, "compression_ratio": 1.6322314049586777, "no_speech_prob": + 0.0017154898960143328}, {"id": 403, "seek": 263736, "start": 2654.6400000000003, + "end": 2661.28, "text": " capable engines. I think that''s been around for a while + and academia has proven that out. Clearly,", "tokens": [51228, 8189, 12982, 13, + 286, 519, 300, 311, 668, 926, 337, 257, 1339, 293, 28937, 575, 12785, 300, 484, + 13, 24120, 11, 51560], "temperature": 0.0, "avg_logprob": -0.11015045249855125, + "compression_ratio": 1.6322314049586777, "no_speech_prob": 0.0017154898960143328}, + {"id": 404, "seek": 263736, "start": 2661.28, "end": 2666.88, "text": " like using + embeddings and vectors for things like query understanding and content understanding + and", "tokens": [51560, 411, 1228, 12240, 29432, 293, 18875, 337, 721, 411, 14581, + 3701, 293, 2701, 3701, 293, 51840], "temperature": 0.0, "avg_logprob": -0.11015045249855125, + "compression_ratio": 1.6322314049586777, "no_speech_prob": 0.0017154898960143328}, + {"id": 405, "seek": 266688, "start": 2666.88, "end": 2673.52, "text": " using tools + like BERT, etc. for enriching your understanding, your content, and then", "tokens": + [50364, 1228, 3873, 411, 7031, 580, 11, 5183, 13, 337, 18849, 278, 428, 3701, 11, + 428, 2701, 11, 293, 550, 50696], "temperature": 0.0, "avg_logprob": -0.17834359949285333, + "compression_ratio": 1.5113636363636365, "no_speech_prob": 0.0001374840212520212}, + {"id": 406, "seek": 266688, "start": 2675.6800000000003, "end": 2683.12, "text": + " making those searchable. That''s all, I think, well and good. 
I think the really + interesting", "tokens": [50804, 1455, 729, 3164, 712, 13, 663, 311, 439, 11, 286, + 519, 11, 731, 293, 665, 13, 286, 519, 264, 534, 1880, 51176], "temperature": 0.0, + "avg_logprob": -0.17834359949285333, "compression_ratio": 1.5113636363636365, "no_speech_prob": + 0.0001374840212520212}, {"id": 407, "seek": 266688, "start": 2683.12, "end": 2692.2400000000002, + "text": " question will be is whether the vector engines can add all of the layers + that the sparse", "tokens": [51176, 1168, 486, 312, 307, 1968, 264, 8062, 12982, + 393, 909, 439, 295, 264, 7914, 300, 264, 637, 11668, 51632], "temperature": 0.0, + "avg_logprob": -0.17834359949285333, "compression_ratio": 1.5113636363636365, "no_speech_prob": + 0.0001374840212520212}, {"id": 408, "seek": 269224, "start": 2692.24, "end": 2697.3599999999997, + "text": " approaches have, I don''t know about perfected, but added over the years, + you know, the", "tokens": [50364, 11587, 362, 11, 286, 500, 380, 458, 466, 2176, + 292, 11, 457, 3869, 670, 264, 924, 11, 291, 458, 11, 264, 50620], "temperature": + 0.0, "avg_logprob": -0.13036582175265538, "compression_ratio": 1.6592920353982301, + "no_speech_prob": 0.0015548147493973374}, {"id": 409, "seek": 269224, "start": 2697.3599999999997, + "end": 2702.56, "text": " fascinating, the aggregations, the spell checkings, the + highlighting, all of those things that", "tokens": [50620, 10343, 11, 264, 16743, + 763, 11, 264, 9827, 1520, 1109, 11, 264, 26551, 11, 439, 295, 729, 721, 300, 50880], + "temperature": 0.0, "avg_logprob": -0.13036582175265538, "compression_ratio": 1.6592920353982301, + "no_speech_prob": 0.0015548147493973374}, {"id": 410, "seek": 269224, "start": 2702.56, + "end": 2709.4399999999996, "text": " actually go into building a search application. 
+ If the vector engines deliver all of those things", "tokens": [50880, 767, 352, + 666, 2390, 257, 3164, 3861, 13, 759, 264, 8062, 12982, 4239, 439, 295, 729, 721, + 51224], "temperature": 0.0, "avg_logprob": -0.13036582175265538, "compression_ratio": + 1.6592920353982301, "no_speech_prob": 0.0015548147493973374}, {"id": 411, "seek": + 269224, "start": 2709.4399999999996, "end": 2717.3599999999997, "text": " and deliver + better results, that''s probably a no brainer, right? In the meantime, we have these", + "tokens": [51224, 293, 4239, 1101, 3542, 11, 300, 311, 1391, 257, 572, 3567, 260, + 11, 558, 30, 682, 264, 14991, 11, 321, 362, 613, 51620], "temperature": 0.0, "avg_logprob": + -0.13036582175265538, "compression_ratio": 1.6592920353982301, "no_speech_prob": + 0.0015548147493973374}, {"id": 412, "seek": 271736, "start": 2717.36, "end": 2722.0, + "text": " hybrids because I think there nobody is delivering all of the capabilities. + The other things that''s", "tokens": [50364, 2477, 1443, 3742, 570, 286, 519, 456, + 5079, 307, 14666, 439, 295, 264, 10862, 13, 440, 661, 721, 300, 311, 50596], "temperature": + 0.0, "avg_logprob": -0.17125726782757303, "compression_ratio": 1.6544117647058822, + "no_speech_prob": 0.0016281824791803956}, {"id": 413, "seek": 271736, "start": 2722.0, + "end": 2728.2400000000002, "text": " interesting with the dense vectors, right, + is that you can start to map multimodal data types", "tokens": [50596, 1880, 365, + 264, 18011, 18875, 11, 558, 11, 307, 300, 291, 393, 722, 281, 4471, 32972, 378, + 304, 1412, 3467, 50908], "temperature": 0.0, "avg_logprob": -0.17125726782757303, + "compression_ratio": 1.6544117647058822, "no_speech_prob": 0.0016281824791803956}, + {"id": 414, "seek": 271736, "start": 2728.2400000000002, "end": 2734.48, "text": + " all into the same engine. So images and text and audio, etc. Right? 
And again, + like I''m not an", "tokens": [50908, 439, 666, 264, 912, 2848, 13, 407, 5267, 293, + 2487, 293, 6278, 11, 5183, 13, 1779, 30, 400, 797, 11, 411, 286, 478, 406, 364, + 51220], "temperature": 0.0, "avg_logprob": -0.17125726782757303, "compression_ratio": + 1.6544117647058822, "no_speech_prob": 0.0016281824791803956}, {"id": 415, "seek": + 271736, "start": 2734.48, "end": 2738.32, "text": " expert on this, but that''s + my understanding. So then, so then you can query across", "tokens": [51220, 5844, + 322, 341, 11, 457, 300, 311, 452, 3701, 13, 407, 550, 11, 370, 550, 291, 393, 14581, + 2108, 51412], "temperature": 0.0, "avg_logprob": -0.17125726782757303, "compression_ratio": + 1.6544117647058822, "no_speech_prob": 0.0016281824791803956}, {"id": 416, "seek": + 271736, "start": 2740.0, "end": 2743.44, "text": " spaces, if you will. Again, like + I''m not using the right terminology here, but", "tokens": [51496, 7673, 11, 498, + 291, 486, 13, 3764, 11, 411, 286, 478, 406, 1228, 264, 558, 27575, 510, 11, 457, + 51668], "temperature": 0.0, "avg_logprob": -0.17125726782757303, "compression_ratio": + 1.6544117647058822, "no_speech_prob": 0.0016281824791803956}, {"id": 417, "seek": + 274344, "start": 2743.84, "end": 2751.04, "text": " and that to me is often the, + at least people talk about that like it''s a holy grail. I''m not fully", "tokens": + [50384, 293, 300, 281, 385, 307, 2049, 264, 11, 412, 1935, 561, 751, 466, 300, 411, + 309, 311, 257, 10622, 1295, 388, 13, 286, 478, 406, 4498, 50744], "temperature": + 0.0, "avg_logprob": -0.12974972085854442, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.008284059353172779}, {"id": 418, "seek": 274344, "start": 2751.04, + "end": 2757.76, "text": " convinced people will actually search that way. 
I still + think that remains to be seen because there''s", "tokens": [50744, 12561, 561, 486, + 767, 3164, 300, 636, 13, 286, 920, 519, 300, 7023, 281, 1643, 570, 456, 311, 51080], + "temperature": 0.0, "avg_logprob": -0.12974972085854442, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.008284059353172779}, {"id": 419, "seek": 274344, "start": 2757.76, + "end": 2764.16, "text": " a lot of implications for the the user interface and the + user experience is how you interact with", "tokens": [51080, 257, 688, 295, 16602, + 337, 264, 264, 4195, 9226, 293, 264, 4195, 1752, 307, 577, 291, 4648, 365, 51400], + "temperature": 0.0, "avg_logprob": -0.12974972085854442, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.008284059353172779}, {"id": 420, "seek": 274344, "start": 2764.16, + "end": 2768.96, "text": " that. You know, like people have long talked about, oh, + hey, I''m going to take a picture and then", "tokens": [51400, 300, 13, 509, 458, + 11, 411, 561, 362, 938, 2825, 466, 11, 1954, 11, 4177, 11, 286, 478, 516, 281, 747, + 257, 3036, 293, 550, 51640], "temperature": 0.0, "avg_logprob": -0.12974972085854442, + "compression_ratio": 1.672340425531915, "no_speech_prob": 0.008284059353172779}, + {"id": 421, "seek": 276896, "start": 2768.96, "end": 2773.92, "text": " get my back, + my search results, but like I don''t every time I use those tools, I''m like, okay,", + "tokens": [50364, 483, 452, 646, 11, 452, 3164, 3542, 11, 457, 411, 286, 500, 380, + 633, 565, 286, 764, 729, 3873, 11, 286, 478, 411, 11, 1392, 11, 50612], "temperature": + 0.0, "avg_logprob": -0.10362108321416946, "compression_ratio": 1.606694560669456, + "no_speech_prob": 0.0011397736379876733}, {"id": 422, "seek": 276896, "start": 2773.92, + "end": 2780.32, "text": " that''s nice, but it''s still clunky from a user experience + standpoint, right? 
So, so like there''s", "tokens": [50612, 300, 311, 1481, 11, + 457, 309, 311, 920, 596, 25837, 490, 257, 4195, 1752, 15827, 11, 558, 30, 407, 11, + 370, 411, 456, 311, 50932], "temperature": 0.0, "avg_logprob": -0.10362108321416946, + "compression_ratio": 1.606694560669456, "no_speech_prob": 0.0011397736379876733}, + {"id": 423, "seek": 276896, "start": 2780.32, "end": 2788.64, "text": " a lot of + that work above and beyond just the core engine that has to be solved. But clearly,", + "tokens": [50932, 257, 688, 295, 300, 589, 3673, 293, 4399, 445, 264, 4965, 2848, + 300, 575, 281, 312, 13041, 13, 583, 4448, 11, 51348], "temperature": 0.0, "avg_logprob": + -0.10362108321416946, "compression_ratio": 1.606694560669456, "no_speech_prob": + 0.0011397736379876733}, {"id": 424, "seek": 276896, "start": 2788.64, "end": 2793.6, + "text": " there''s a lot of money and effort going into it. And so like as a search + engineer, you can''t ignore", "tokens": [51348, 456, 311, 257, 688, 295, 1460, 293, + 4630, 516, 666, 309, 13, 400, 370, 411, 382, 257, 3164, 11403, 11, 291, 393, 380, + 11200, 51596], "temperature": 0.0, "avg_logprob": -0.10362108321416946, "compression_ratio": + 1.606694560669456, "no_speech_prob": 0.0011397736379876733}, {"id": 425, "seek": + 279360, "start": 2794.0, "end": 2799.2799999999997, "text": " as a data scientist, + you can''t ignore it. 
And so you''ve got to get up on how these are built.", "tokens": + [50384, 382, 257, 1412, 12662, 11, 291, 393, 380, 11200, 309, 13, 400, 370, 291, + 600, 658, 281, 483, 493, 322, 577, 613, 366, 3094, 13, 50648], "temperature": 0.0, + "avg_logprob": -0.0827808478443893, "compression_ratio": 1.6163793103448276, "no_speech_prob": + 0.008597486652433872}, {"id": 426, "seek": 279360, "start": 2800.0, "end": 2806.3199999999997, + "text": " I think all the major engines open source and private have some form of + it at this point of", "tokens": [50684, 286, 519, 439, 264, 2563, 12982, 1269, 4009, + 293, 4551, 362, 512, 1254, 295, 309, 412, 341, 935, 295, 51000], "temperature": + 0.0, "avg_logprob": -0.0827808478443893, "compression_ratio": 1.6163793103448276, + "no_speech_prob": 0.008597486652433872}, {"id": 427, "seek": 279360, "start": 2806.3199999999997, + "end": 2812.4, "text": " blended models. Again, like, you know, if you''re in a + domain that you don''t have enough data for", "tokens": [51000, 27048, 5245, 13, + 3764, 11, 411, 11, 291, 458, 11, 498, 291, 434, 294, 257, 9274, 300, 291, 500, 380, + 362, 1547, 1412, 337, 51304], "temperature": 0.0, "avg_logprob": -0.0827808478443893, + "compression_ratio": 1.6163793103448276, "no_speech_prob": 0.008597486652433872}, + {"id": 428, "seek": 279360, "start": 2812.4, "end": 2818.72, "text": " these and + may or may not work, although again, like one of the interesting things with these", + "tokens": [51304, 613, 293, 815, 420, 815, 406, 589, 11, 4878, 797, 11, 411, 472, + 295, 264, 1880, 721, 365, 613, 51620], "temperature": 0.0, "avg_logprob": -0.0827808478443893, + "compression_ratio": 1.6163793103448276, "no_speech_prob": 0.008597486652433872}, + {"id": 429, "seek": 281872, "start": 2819.52, "end": 2825.3599999999997, "text": + " neural models, right, is you can often train on a general model and then just + use a few examples", "tokens": [50404, 18161, 5245, 11, 558, 11, 307, 291, 393, + 2049, 3847, 322, 257, 2674, 
2316, 293, 550, 445, 764, 257, 1326, 5110, 50696], "temperature": + 0.0, "avg_logprob": -0.08858027403382049, "compression_ratio": 1.6375545851528384, + "no_speech_prob": 0.0026207230985164642}, {"id": 430, "seek": 281872, "start": 2825.3599999999997, + "end": 2832.0, "text": " from your domain to essentially tailor that general model + to your environment, right? Like I''m", "tokens": [50696, 490, 428, 9274, 281, 4476, + 33068, 300, 2674, 2316, 281, 428, 2823, 11, 558, 30, 1743, 286, 478, 51028], "temperature": + 0.0, "avg_logprob": -0.08858027403382049, "compression_ratio": 1.6375545851528384, + "no_speech_prob": 0.0026207230985164642}, {"id": 431, "seek": 281872, "start": 2832.0, + "end": 2838.3199999999997, "text": " working on one of my clients is doing this + in the NLP space right now. We''re using a general", "tokens": [51028, 1364, 322, + 472, 295, 452, 6982, 307, 884, 341, 294, 264, 426, 45196, 1901, 558, 586, 13, 492, + 434, 1228, 257, 2674, 51344], "temperature": 0.0, "avg_logprob": -0.08858027403382049, + "compression_ratio": 1.6375545851528384, "no_speech_prob": 0.0026207230985164642}, + {"id": 432, "seek": 281872, "start": 2838.3199999999997, "end": 2845.4399999999996, + "text": " model around analyzing contracts and then we''re applying domain specific + things to it. And", "tokens": [51344, 2316, 926, 23663, 13952, 293, 550, 321, 434, + 9275, 9274, 2685, 721, 281, 309, 13, 400, 51700], "temperature": 0.0, "avg_logprob": + -0.08858027403382049, "compression_ratio": 1.6375545851528384, "no_speech_prob": + 0.0026207230985164642}, {"id": 433, "seek": 284544, "start": 2845.44, "end": 2851.92, + "text": " it''s really interesting how effective it is with very few examples, right? 
+ That''s an NLP problem,", "tokens": [50364, 309, 311, 534, 1880, 577, 4942, 309, + 307, 365, 588, 1326, 5110, 11, 558, 30, 663, 311, 364, 426, 45196, 1154, 11, 50688], + "temperature": 0.0, "avg_logprob": -0.0966919198328135, "compression_ratio": 1.5219123505976095, + "no_speech_prob": 0.0020108248572796583}, {"id": 434, "seek": 284544, "start": 2851.92, + "end": 2857.04, "text": " not a search problem, but you know, so I think you''re + going to just continue to see that trend", "tokens": [50688, 406, 257, 3164, 1154, + 11, 457, 291, 458, 11, 370, 286, 519, 291, 434, 516, 281, 445, 2354, 281, 536, 300, + 6028, 50944], "temperature": 0.0, "avg_logprob": -0.0966919198328135, "compression_ratio": + 1.5219123505976095, "no_speech_prob": 0.0020108248572796583}, {"id": 435, "seek": + 284544, "start": 2857.04, "end": 2863.76, "text": " and grow and expand, right? + So you''ve got to be on board with it. Yeah, absolutely. And you can", "tokens": + [50944, 293, 1852, 293, 5268, 11, 558, 30, 407, 291, 600, 658, 281, 312, 322, 3150, + 365, 309, 13, 865, 11, 3122, 13, 400, 291, 393, 51280], "temperature": 0.0, "avg_logprob": + -0.0966919198328135, "compression_ratio": 1.5219123505976095, "no_speech_prob": + 0.0020108248572796583}, {"id": 436, "seek": 284544, "start": 2863.76, "end": 2869.84, + "text": " find of course more conversation on the podcast about this. But I think + I agree with you that", "tokens": [51280, 915, 295, 1164, 544, 3761, 322, 264, 7367, + 466, 341, 13, 583, 286, 519, 286, 3986, 365, 291, 300, 51584], "temperature": 0.0, + "avg_logprob": -0.0966919198328135, "compression_ratio": 1.5219123505976095, "no_speech_prob": + 0.0020108248572796583}, {"id": 437, "seek": 286984, "start": 2870.48, "end": 2876.48, + "text": " the multimodality aspect of vector search is quite exciting. 
And where + the data sits in images,", "tokens": [50396, 264, 32972, 378, 1860, 4171, 295, 8062, + 3164, 307, 1596, 4670, 13, 400, 689, 264, 1412, 12696, 294, 5267, 11, 50696], "temperature": + 0.0, "avg_logprob": -0.14240115880966187, "compression_ratio": 1.5766129032258065, + "no_speech_prob": 0.003340014023706317}, {"id": 438, "seek": 286984, "start": 2876.48, + "end": 2881.44, "text": " for instance, that haven''t been annotated yet, right? + And so many images uploaded every single day", "tokens": [50696, 337, 5197, 11, + 300, 2378, 380, 668, 25339, 770, 1939, 11, 558, 30, 400, 370, 867, 5267, 17135, + 633, 2167, 786, 50944], "temperature": 0.0, "avg_logprob": -0.14240115880966187, + "compression_ratio": 1.5766129032258065, "no_speech_prob": 0.003340014023706317}, + {"id": 439, "seek": 286984, "start": 2881.44, "end": 2889.1200000000003, "text": + " in videos, you know, if the model is able to transcend the the domains so easily + like clip model,", "tokens": [50944, 294, 2145, 11, 291, 458, 11, 498, 264, 2316, + 307, 1075, 281, 28535, 264, 264, 25514, 370, 3612, 411, 7353, 2316, 11, 51328], + "temperature": 0.0, "avg_logprob": -0.14240115880966187, "compression_ratio": 1.5766129032258065, + "no_speech_prob": 0.003340014023706317}, {"id": 440, "seek": 286984, "start": 2889.1200000000003, + "end": 2895.6800000000003, "text": " for instance, built by OpenAI, it''s not a + perfect model. 
Sometimes it fails, but sometimes it also", "tokens": [51328, 337, + 5197, 11, 3094, 538, 7238, 48698, 11, 309, 311, 406, 257, 2176, 2316, 13, 4803, + 309, 18199, 11, 457, 2171, 309, 611, 51656], "temperature": 0.0, "avg_logprob": + -0.14240115880966187, "compression_ratio": 1.5766129032258065, "no_speech_prob": + 0.003340014023706317}, {"id": 441, "seek": 289568, "start": 2895.68, "end": 2903.2799999999997, + "text": " uses you like, how could it figure out, you know, to work so reliably + on my data that it hasn''t", "tokens": [50364, 4960, 291, 411, 11, 577, 727, 309, + 2573, 484, 11, 291, 458, 11, 281, 589, 370, 49927, 322, 452, 1412, 300, 309, 6132, + 380, 50744], "temperature": 0.0, "avg_logprob": -0.13609039783477783, "compression_ratio": + 1.6805555555555556, "no_speech_prob": 0.004394518677145243}, {"id": 442, "seek": + 289568, "start": 2903.2799999999997, "end": 2909.2, "text": " seen before? That''s + amazing. Well, and it goes back to your earlier question, which is like,", "tokens": + [50744, 1612, 949, 30, 663, 311, 2243, 13, 1042, 11, 293, 309, 1709, 646, 281, 428, + 3071, 1168, 11, 597, 307, 411, 11, 51040], "temperature": 0.0, "avg_logprob": -0.13609039783477783, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.004394518677145243}, + {"id": 443, "seek": 289568, "start": 2909.2, "end": 2914.24, "text": " you know, + at the end of the day, folks like go evaluate it and see whether it works better + for you.", "tokens": [51040, 291, 458, 11, 412, 264, 917, 295, 264, 786, 11, 4024, + 411, 352, 13059, 309, 293, 536, 1968, 309, 1985, 1101, 337, 291, 13, 51292], "temperature": + 0.0, "avg_logprob": -0.13609039783477783, "compression_ratio": 1.6805555555555556, + "no_speech_prob": 0.004394518677145243}, {"id": 444, "seek": 289568, "start": 2914.96, + "end": 2918.7999999999997, "text": " And then like I said, even earlier, I mean, + they''re all just vectors and we''re all just trying to", "tokens": [51328, 400, + 550, 411, 286, 848, 11, 754, 
3071, 11, 286, 914, 11, 436, 434, 439, 445, 18875, + 293, 321, 434, 439, 445, 1382, 281, 51520], "temperature": 0.0, "avg_logprob": -0.13609039783477783, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.004394518677145243}, + {"id": 445, "seek": 289568, "start": 2918.7999999999997, "end": 2925.04, "text": + " calculate cosines between the user''s query and and the vector. And so in some + regards, like we''re", "tokens": [51520, 8873, 3792, 1652, 1296, 264, 4195, 311, + 14581, 293, 293, 264, 8062, 13, 400, 370, 294, 512, 14258, 11, 411, 321, 434, 51832], + "temperature": 0.0, "avg_logprob": -0.13609039783477783, "compression_ratio": 1.6805555555555556, + "no_speech_prob": 0.004394518677145243}, {"id": 446, "seek": 292504, "start": 2925.12, + "end": 2932.4, "text": " just building a better vector, right? It''s just a better + vector. It has more information encoded in it.", "tokens": [50368, 445, 2390, 257, + 1101, 8062, 11, 558, 30, 467, 311, 445, 257, 1101, 8062, 13, 467, 575, 544, 1589, + 2058, 12340, 294, 309, 13, 50732], "temperature": 0.0, "avg_logprob": -0.16321213731487977, + "compression_ratio": 1.528, "no_speech_prob": 0.000682392914313823}, {"id": 447, + "seek": 292504, "start": 2932.4, "end": 2937.44, "text": " And so if I can query + that more effectively, then why wouldn''t you use it?", "tokens": [50732, 400, 370, + 498, 286, 393, 14581, 300, 544, 8659, 11, 550, 983, 2759, 380, 291, 764, 309, 30, + 50984], "temperature": 0.0, "avg_logprob": -0.16321213731487977, "compression_ratio": + 1.528, "no_speech_prob": 0.000682392914313823}, {"id": 448, "seek": 292504, "start": + 2938.8, "end": 2945.36, "text": " Yeah, yeah, exactly. 
And of course, there are + other subtopics there how to make it faster and so on,", "tokens": [51052, 865, + 11, 1338, 11, 2293, 13, 400, 295, 1164, 11, 456, 366, 661, 7257, 404, 1167, 456, + 577, 281, 652, 309, 4663, 293, 370, 322, 11, 51380], "temperature": 0.0, "avg_logprob": + -0.16321213731487977, "compression_ratio": 1.528, "no_speech_prob": 0.000682392914313823}, + {"id": 449, "seek": 292504, "start": 2945.36, "end": 2951.2, "text": " but I think + eventually we will, hey, Google figured it out for 10% of the queries. So I guess + the rest", "tokens": [51380, 457, 286, 519, 4728, 321, 486, 11, 4177, 11, 3329, + 8932, 309, 484, 337, 1266, 4, 295, 264, 24109, 13, 407, 286, 2041, 264, 1472, 51672], + "temperature": 0.0, "avg_logprob": -0.16321213731487977, "compression_ratio": 1.528, + "no_speech_prob": 0.000682392914313823}, {"id": 450, "seek": 295120, "start": 2951.2, + "end": 2957.8399999999997, "text": " of the world will catch up. Before we continue + to the questions from the audience of which we have", "tokens": [50364, 295, 264, + 1002, 486, 3745, 493, 13, 4546, 321, 2354, 281, 264, 1651, 490, 264, 4034, 295, + 597, 321, 362, 50696], "temperature": 0.0, "avg_logprob": -0.11356743176778157, + "compression_ratio": 1.565040650406504, "no_speech_prob": 0.006591591518372297}, + {"id": 451, "seek": 295120, "start": 2957.8399999999997, "end": 2963.2799999999997, + "text": " at you, I do love asking, and if you can keep it a little bit short, because + we are short on time,", "tokens": [50696, 412, 291, 11, 286, 360, 959, 3365, 11, + 293, 498, 291, 393, 1066, 309, 257, 707, 857, 2099, 11, 570, 321, 366, 2099, 322, + 565, 11, 50968], "temperature": 0.0, "avg_logprob": -0.11356743176778157, "compression_ratio": + 1.565040650406504, "no_speech_prob": 0.006591591518372297}, {"id": 452, "seek": + 295120, "start": 2963.2799999999997, "end": 2970.08, "text": " but I''m still super, + super interested to hear your motivation to stay in this space. 
You have", "tokens": + [50968, 457, 286, 478, 920, 1687, 11, 1687, 3102, 281, 1568, 428, 12335, 281, 1754, + 294, 341, 1901, 13, 509, 362, 51308], "temperature": 0.0, "avg_logprob": -0.11356743176778157, + "compression_ratio": 1.565040650406504, "no_speech_prob": 0.006591591518372297}, + {"id": 453, "seek": 295120, "start": 2970.08, "end": 2975.3599999999997, "text": + " tried so many things in your career, right? Looking at your LinkedIn profiles, + just on and on", "tokens": [51308, 3031, 370, 867, 721, 294, 428, 3988, 11, 558, + 30, 11053, 412, 428, 20657, 23693, 11, 445, 322, 293, 322, 51572], "temperature": + 0.0, "avg_logprob": -0.11356743176778157, "compression_ratio": 1.565040650406504, + "no_speech_prob": 0.006591591518372297}, {"id": 454, "seek": 297536, "start": 2975.36, + "end": 2981.52, "text": " experiences and fractional CTO and full-time CTO and an + engineer and so on and book author.", "tokens": [50364, 5235, 293, 17948, 1966, + 383, 15427, 293, 1577, 12, 3766, 383, 15427, 293, 364, 11403, 293, 370, 322, 293, + 1446, 3793, 13, 50672], "temperature": 0.0, "avg_logprob": -0.14109558860460916, + "compression_ratio": 1.5574468085106383, "no_speech_prob": 0.0034909816458821297}, + {"id": 455, "seek": 297536, "start": 2983.6, "end": 2988.2400000000002, "text": + " What motivates you to stay in this space today and also go into education teaching?", + "tokens": [50776, 708, 42569, 291, 281, 1754, 294, 341, 1901, 965, 293, 611, 352, + 666, 3309, 4571, 30, 51008], "temperature": 0.0, "avg_logprob": -0.14109558860460916, + "compression_ratio": 1.5574468085106383, "no_speech_prob": 0.0034909816458821297}, + {"id": 456, "seek": 297536, "start": 2988.8, "end": 2994.0, "text": " Yeah, I mean, + it''s funny. 
I think, well, even when I was at Wikimedia and I quote, unquote,", + "tokens": [51036, 865, 11, 286, 914, 11, 309, 311, 4074, 13, 286, 519, 11, 731, + 11, 754, 562, 286, 390, 412, 23377, 332, 14212, 293, 286, 6513, 11, 37557, 11, 51296], + "temperature": 0.0, "avg_logprob": -0.14109558860460916, "compression_ratio": 1.5574468085106383, + "no_speech_prob": 0.0034909816458821297}, {"id": 457, "seek": 297536, "start": 2994.0, + "end": 3001.04, "text": " left search, I mean, we still ran a very large search + engine and I always enjoyed my conversations", "tokens": [51296, 1411, 3164, 11, + 286, 914, 11, 321, 920, 5872, 257, 588, 2416, 3164, 2848, 293, 286, 1009, 4626, + 452, 7315, 51648], "temperature": 0.0, "avg_logprob": -0.14109558860460916, "compression_ratio": + 1.5574468085106383, "no_speech_prob": 0.0034909816458821297}, {"id": 458, "seek": + 300104, "start": 3001.04, "end": 3007.04, "text": " with a search team at Wikimedia + just because they were, you know, it''s such a high traffic website", "tokens": + [50364, 365, 257, 3164, 1469, 412, 23377, 332, 14212, 445, 570, 436, 645, 11, 291, + 458, 11, 309, 311, 1270, 257, 1090, 6419, 3144, 50664], "temperature": 0.0, "avg_logprob": + -0.14159415318415716, "compression_ratio": 1.544041450777202, "no_speech_prob": + 0.0011936401715502143}, {"id": 459, "seek": 300104, "start": 3007.04, "end": 3012.48, + "text": " and search there, I think does something like 6,000 queries per second + or something like that. 
So", "tokens": [50664, 293, 3164, 456, 11, 286, 519, 775, + 746, 411, 1386, 11, 1360, 24109, 680, 1150, 420, 746, 411, 300, 13, 407, 50936], + "temperature": 0.0, "avg_logprob": -0.14159415318415716, "compression_ratio": 1.544041450777202, + "no_speech_prob": 0.0011936401715502143}, {"id": 460, "seek": 300104, "start": 3014.24, + "end": 3021.04, "text": " you know, in some ways, and this is reflecting back on + my career, I mean, I think I fell in love with", "tokens": [51024, 291, 458, 11, + 294, 512, 2098, 11, 293, 341, 307, 23543, 646, 322, 452, 3988, 11, 286, 914, 11, + 286, 519, 286, 5696, 294, 959, 365, 51364], "temperature": 0.0, "avg_logprob": -0.14159415318415716, + "compression_ratio": 1.544041450777202, "no_speech_prob": 0.0011936401715502143}, + {"id": 461, "seek": 302104, "start": 3021.84, "end": 3035.04, "text": " language + and the way humans use language and find information back circa 1999 or so when + I started", "tokens": [50404, 2856, 293, 264, 636, 6255, 764, 2856, 293, 915, 1589, + 646, 45972, 19952, 420, 370, 562, 286, 1409, 51064], "temperature": 0.0, "avg_logprob": + -0.19641939331503475, "compression_ratio": 1.505050505050505, "no_speech_prob": + 0.004666642285883427}, {"id": 462, "seek": 302104, "start": 3035.04, "end": 3042.24, + "text": " at a small company called TextWise run by Liz Litty who is one of the + pioneers in the natural language", "tokens": [51064, 412, 257, 1359, 2237, 1219, + 18643, 54, 908, 1190, 538, 16480, 441, 10016, 567, 307, 472, 295, 264, 47381, 294, + 264, 3303, 2856, 51424], "temperature": 0.0, "avg_logprob": -0.19641939331503475, + "compression_ratio": 1.505050505050505, "no_speech_prob": 0.004666642285883427}, + {"id": 463, "seek": 302104, "start": 3042.24, "end": 3048.4, "text": " processing + field and it just happened to have a search project that I started working on, right?", + "tokens": [51424, 9007, 2519, 293, 309, 445, 2011, 281, 362, 257, 3164, 1716, 300, + 286, 1409, 1364, 322, 11, 558, 30, 51732], 
"temperature": 0.0, "avg_logprob": -0.19641939331503475, + "compression_ratio": 1.505050505050505, "no_speech_prob": 0.004666642285883427}, + {"id": 464, "seek": 304840, "start": 3048.48, "end": 3053.6800000000003, "text": + " But to me, you know, at the end of the day, like, this space and this is why I + went to Wikimedia.", "tokens": [50368, 583, 281, 385, 11, 291, 458, 11, 412, 264, + 917, 295, 264, 786, 11, 411, 11, 341, 1901, 293, 341, 307, 983, 286, 1437, 281, + 23377, 332, 14212, 13, 50628], "temperature": 0.0, "avg_logprob": -0.13151549319831693, + "compression_ratio": 1.663677130044843, "no_speech_prob": 0.0019197214860469103}, + {"id": 465, "seek": 304840, "start": 3053.6800000000003, "end": 3058.0, "text": + " So I say, searches that necessarily the through line, even though it''s often + the main,", "tokens": [50628, 407, 286, 584, 11, 26701, 300, 4725, 264, 807, 1622, + 11, 754, 1673, 309, 311, 2049, 264, 2135, 11, 50844], "temperature": 0.0, "avg_logprob": + -0.13151549319831693, "compression_ratio": 1.663677130044843, "no_speech_prob": + 0.0019197214860469103}, {"id": 466, "seek": 304840, "start": 3059.12, "end": 3064.56, + "text": " it appears to be the through line in my career, the deeper through line, + I think, is that", "tokens": [50900, 309, 7038, 281, 312, 264, 807, 1622, 294, 452, + 3988, 11, 264, 7731, 807, 1622, 11, 286, 519, 11, 307, 300, 51172], "temperature": + 0.0, "avg_logprob": -0.13151549319831693, "compression_ratio": 1.663677130044843, + "no_speech_prob": 0.0019197214860469103}, {"id": 467, "seek": 304840, "start": 3064.56, + "end": 3071.36, "text": " I am fascinated by how we can leverage computers to help + users make more informed, more capable,", "tokens": [51172, 286, 669, 24597, 538, + 577, 321, 393, 13982, 10807, 281, 854, 5022, 652, 544, 11740, 11, 544, 8189, 11, + 51512], "temperature": 0.0, "avg_logprob": -0.13151549319831693, "compression_ratio": + 1.663677130044843, "no_speech_prob": 0.0019197214860469103}, {"id": 
468, "seek": + 307136, "start": 3071.36, "end": 3081.1200000000003, "text": " more aware decisions + in their lives, whether that''s purchasing online or political or governmental", + "tokens": [50364, 544, 3650, 5327, 294, 641, 2909, 11, 1968, 300, 311, 20906, 2950, + 420, 3905, 420, 43391, 50852], "temperature": 0.0, "avg_logprob": -0.1194511340214656, + "compression_ratio": 1.5159574468085106, "no_speech_prob": 0.006363488733768463}, + {"id": 469, "seek": 307136, "start": 3081.1200000000003, "end": 3086.96, "text": + " or whatever it is, like, I am fascinated by how we can help people make more informed + decisions", "tokens": [50852, 420, 2035, 309, 307, 11, 411, 11, 286, 669, 24597, + 538, 577, 321, 393, 854, 561, 652, 544, 11740, 5327, 51144], "temperature": 0.0, + "avg_logprob": -0.1194511340214656, "compression_ratio": 1.5159574468085106, "no_speech_prob": + 0.006363488733768463}, {"id": 470, "seek": 307136, "start": 3086.96, "end": 3098.96, + "text": " because I think that''s the thing that lifts us out, right? And so education + then is a easy", "tokens": [51144, 570, 286, 519, 300, 311, 264, 551, 300, 30501, + 505, 484, 11, 558, 30, 400, 370, 3309, 550, 307, 257, 1858, 51744], "temperature": + 0.0, "avg_logprob": -0.1194511340214656, "compression_ratio": 1.5159574468085106, + "no_speech_prob": 0.006363488733768463}, {"id": 471, "seek": 309896, "start": 3098.96, + "end": 3105.44, "text": " follow-on from that through line, right? Like, the more + people I can help use these tools and", "tokens": [50364, 1524, 12, 266, 490, 300, + 807, 1622, 11, 558, 30, 1743, 11, 264, 544, 561, 286, 393, 854, 764, 613, 3873, + 293, 50688], "temperature": 0.0, "avg_logprob": -0.13307300123196203, "compression_ratio": + 1.7123287671232876, "no_speech_prob": 0.002534678904339671}, {"id": 472, "seek": + 309896, "start": 3105.44, "end": 3112.0, "text": " also learn myself, the better + off will I''ll be, right? 
Like, we have to use these tools to,", "tokens": [50688, + 611, 1466, 2059, 11, 264, 1101, 766, 486, 286, 603, 312, 11, 558, 30, 1743, 11, + 321, 362, 281, 764, 613, 3873, 281, 11, 51016], "temperature": 0.0, "avg_logprob": + -0.13307300123196203, "compression_ratio": 1.7123287671232876, "no_speech_prob": + 0.002534678904339671}, {"id": 473, "seek": 309896, "start": 3113.12, "end": 3119.12, + "text": " you know, to help us as humans get along better, etc. be more informed, + so on, so forth, right?", "tokens": [51072, 291, 458, 11, 281, 854, 505, 382, 6255, + 483, 2051, 1101, 11, 5183, 13, 312, 544, 11740, 11, 370, 322, 11, 370, 5220, 11, + 558, 30, 51372], "temperature": 0.0, "avg_logprob": -0.13307300123196203, "compression_ratio": + 1.7123287671232876, "no_speech_prob": 0.002534678904339671}, {"id": 474, "seek": + 309896, "start": 3119.12, "end": 3124.64, "text": " So that''s probably the through + line of the career, right? Is this how do you help people find", "tokens": [51372, + 407, 300, 311, 1391, 264, 807, 1622, 295, 264, 3988, 11, 558, 30, 1119, 341, 577, + 360, 291, 854, 561, 915, 51648], "temperature": 0.0, "avg_logprob": -0.13307300123196203, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.002534678904339671}, + {"id": 475, "seek": 312464, "start": 3124.72, "end": 3131.04, "text": " information + and take action that makes us all better? Absolutely, this is very deep. 
Thanks + so much.", "tokens": [50368, 1589, 293, 747, 3069, 300, 1669, 505, 439, 1101, 30, + 7021, 11, 341, 307, 588, 2452, 13, 2561, 370, 709, 13, 50684], "temperature": 0.0, + "avg_logprob": -0.12683395866875177, "compression_ratio": 1.6795774647887325, "no_speech_prob": + 0.03353035822510719}, {"id": 476, "seek": 312464, "start": 3131.68, "end": 3137.2799999999997, + "text": " I love asking this question because I''m super motivated to stay in the + space, but I also love to", "tokens": [50716, 286, 959, 3365, 341, 1168, 570, 286, + 478, 1687, 14515, 281, 1754, 294, 264, 1901, 11, 457, 286, 611, 959, 281, 50996], + "temperature": 0.0, "avg_logprob": -0.12683395866875177, "compression_ratio": 1.6795774647887325, + "no_speech_prob": 0.03353035822510719}, {"id": 477, "seek": 312464, "start": 3137.2799999999997, + "end": 3142.7999999999997, "text": " see the facets and the motivation of other + professionals like yourself that I''m looking up to.", "tokens": [50996, 536, 264, + 49752, 293, 264, 12335, 295, 661, 11954, 411, 1803, 300, 286, 478, 1237, 493, 281, + 13, 51272], "temperature": 0.0, "avg_logprob": -0.12683395866875177, "compression_ratio": + 1.6795774647887325, "no_speech_prob": 0.03353035822510719}, {"id": 478, "seek": + 312464, "start": 3143.52, "end": 3147.6, "text": " I really enjoyed this conversation. + Is there an announcement that you want to make in terms of", "tokens": [51308, 286, + 534, 4626, 341, 3761, 13, 1119, 456, 364, 12847, 300, 291, 528, 281, 652, 294, 2115, + 295, 51512], "temperature": 0.0, "avg_logprob": -0.12683395866875177, "compression_ratio": + 1.6795774647887325, "no_speech_prob": 0.03353035822510719}, {"id": 479, "seek": + 312464, "start": 3147.6, "end": 3151.92, "text": " the courses that you''re going + to be teaching soon? Yeah, that''s great. 
I appreciate that,", "tokens": [51512, + 264, 7712, 300, 291, 434, 516, 281, 312, 4571, 2321, 30, 865, 11, 300, 311, 869, + 13, 286, 4449, 300, 11, 51728], "temperature": 0.0, "avg_logprob": -0.12683395866875177, + "compression_ratio": 1.6795774647887325, "no_speech_prob": 0.03353035822510719}, + {"id": 480, "seek": 315192, "start": 3151.92, "end": 3156.0, "text": " the Metri, + and I know we''d have some user questions, and I''m happy to stay on a little bit + longer as", "tokens": [50364, 264, 6377, 470, 11, 293, 286, 458, 321, 1116, 362, + 512, 4195, 1651, 11, 293, 286, 478, 2055, 281, 1754, 322, 257, 707, 857, 2854, 382, + 50568], "temperature": 0.0, "avg_logprob": -0.15142239026786866, "compression_ratio": + 1.7304964539007093, "no_speech_prob": 0.005808789748698473}, {"id": 481, "seek": + 315192, "start": 3156.0, "end": 3162.7200000000003, "text": " well, get those. Yes, + we actually, we have two classes coming up. So one of the things we learned", "tokens": + [50568, 731, 11, 483, 729, 13, 1079, 11, 321, 767, 11, 321, 362, 732, 5359, 1348, + 493, 13, 407, 472, 295, 264, 721, 321, 3264, 50904], "temperature": 0.0, "avg_logprob": + -0.15142239026786866, "compression_ratio": 1.7304964539007093, "no_speech_prob": + 0.005808789748698473}, {"id": 482, "seek": 315192, "start": 3162.7200000000003, + "end": 3169.04, "text": " in the first run of search with machine learning is, you + know, effectively we had one week of", "tokens": [50904, 294, 264, 700, 1190, 295, + 3164, 365, 3479, 2539, 307, 11, 291, 458, 11, 8659, 321, 632, 472, 1243, 295, 51220], + "temperature": 0.0, "avg_logprob": -0.15142239026786866, "compression_ratio": 1.7304964539007093, + "no_speech_prob": 0.005808789748698473}, {"id": 483, "seek": 315192, "start": 3169.04, + "end": 3174.8, "text": " trying to get everybody on to the same page of how does + open search work and what are the basics", "tokens": [51220, 1382, 281, 483, 2201, + 322, 281, 264, 912, 3028, 295, 577, 775, 1269, 3164, 589, 293, 
437, 366, 264, 14688, + 51508], "temperature": 0.0, "avg_logprob": -0.15142239026786866, "compression_ratio": + 1.7304964539007093, "no_speech_prob": 0.005808789748698473}, {"id": 484, "seek": + 315192, "start": 3174.8, "end": 3181.52, "text": " of search? And then we had three + weeks of fairly intense machine learning in a search environment,", "tokens": [51508, + 295, 3164, 30, 400, 550, 321, 632, 1045, 3259, 295, 6457, 9447, 3479, 2539, 294, + 257, 3164, 2823, 11, 51844], "temperature": 0.0, "avg_logprob": -0.15142239026786866, + "compression_ratio": 1.7304964539007093, "no_speech_prob": 0.005808789748698473}, + {"id": 485, "seek": 318152, "start": 3181.52, "end": 3187.2, "text": " and one of + the things that happened in the class because we didn''t have a lot of prerequisites", + "tokens": [50364, 293, 472, 295, 264, 721, 300, 2011, 294, 264, 1508, 570, 321, + 994, 380, 362, 257, 688, 295, 38333, 15398, 3324, 50648], "temperature": 0.0, "avg_logprob": + -0.10143456999788579, "compression_ratio": 1.6860986547085202, "no_speech_prob": + 0.00024349303566850722}, {"id": 486, "seek": 318152, "start": 3187.2, "end": 3193.28, + "text": " is we had a really wide array of students of folks who were deep experts + like yourself,", "tokens": [50648, 307, 321, 632, 257, 534, 4874, 10225, 295, 1731, + 295, 4024, 567, 645, 2452, 8572, 411, 1803, 11, 50952], "temperature": 0.0, "avg_logprob": + -0.10143456999788579, "compression_ratio": 1.6860986547085202, "no_speech_prob": + 0.00024349303566850722}, {"id": 487, "seek": 318152, "start": 3194.88, "end": 3200.8, + "text": " as well as like totally new to this arena. 
And what happened, I think, + is that first week for", "tokens": [51032, 382, 731, 382, 411, 3879, 777, 281, 341, + 18451, 13, 400, 437, 2011, 11, 286, 519, 11, 307, 300, 700, 1243, 337, 51328], "temperature": + 0.0, "avg_logprob": -0.10143456999788579, "compression_ratio": 1.6860986547085202, + "no_speech_prob": 0.00024349303566850722}, {"id": 488, "seek": 318152, "start": + 3200.8, "end": 3206.96, "text": " the new people was like, hey, this is too much + for me to get up to speed. And for the folks who had", "tokens": [51328, 264, 777, + 561, 390, 411, 11, 4177, 11, 341, 307, 886, 709, 337, 385, 281, 483, 493, 281, 3073, + 13, 400, 337, 264, 4024, 567, 632, 51636], "temperature": 0.0, "avg_logprob": -0.10143456999788579, + "compression_ratio": 1.6860986547085202, "no_speech_prob": 0.00024349303566850722}, + {"id": 489, "seek": 320696, "start": 3206.96, "end": 3212.0, "text": " already done + search, it was like, hey, I already know how to do all of this. And so trying to,", + "tokens": [50364, 1217, 1096, 3164, 11, 309, 390, 411, 11, 4177, 11, 286, 1217, + 458, 577, 281, 360, 439, 295, 341, 13, 400, 370, 1382, 281, 11, 50616], "temperature": + 0.0, "avg_logprob": -0.12190031236217867, "compression_ratio": 1.5973451327433628, + "no_speech_prob": 0.0003050028462894261}, {"id": 490, "seek": 320696, "start": 3213.04, + "end": 3216.48, "text": " trying to go across that gap, I think we kind of ended + up in this", "tokens": [50668, 1382, 281, 352, 2108, 300, 7417, 11, 286, 519, 321, + 733, 295, 4590, 493, 294, 341, 50840], "temperature": 0.0, "avg_logprob": -0.12190031236217867, + "compression_ratio": 1.5973451327433628, "no_speech_prob": 0.0003050028462894261}, + {"id": 491, "seek": 320696, "start": 3217.6, "end": 3223.52, "text": " lukewarm + area where nobody was quite satisfied. 
So one of the things we did was we split + out the new", "tokens": [50896, 10438, 330, 49240, 1859, 689, 5079, 390, 1596, 11239, + 13, 407, 472, 295, 264, 721, 321, 630, 390, 321, 7472, 484, 264, 777, 51192], "temperature": + 0.0, "avg_logprob": -0.12190031236217867, "compression_ratio": 1.5973451327433628, + "no_speech_prob": 0.0003050028462894261}, {"id": 492, "seek": 320696, "start": 3223.52, + "end": 3230.48, "text": " stuff into a two week class called search fundamentals, + which covers all of the basic intuitions of", "tokens": [51192, 1507, 666, 257, + 732, 1243, 1508, 1219, 3164, 29505, 11, 597, 10538, 439, 295, 264, 3875, 16224, + 626, 295, 51540], "temperature": 0.0, "avg_logprob": -0.12190031236217867, "compression_ratio": + 1.5973451327433628, "no_speech_prob": 0.0003050028462894261}, {"id": 493, "seek": + 323048, "start": 3230.48, "end": 3238.16, "text": " search, whether it''s deep learning + based or a sparse learning based or sparse vector based,", "tokens": [50364, 3164, + 11, 1968, 309, 311, 2452, 2539, 2361, 420, 257, 637, 11668, 2539, 2361, 420, 637, + 11668, 8062, 2361, 11, 50748], "temperature": 0.0, "avg_logprob": -0.1676989514777001, + "compression_ratio": 1.654867256637168, "no_speech_prob": 0.0009253321914002299}, + {"id": 494, "seek": 323048, "start": 3238.16, "end": 3244.72, "text": " sorry. And + so we cover, you know, indexing querying, facetying, spell checking, auto-complete,", + "tokens": [50748, 2597, 13, 400, 370, 321, 2060, 11, 291, 458, 11, 8186, 278, 7083, + 1840, 11, 1915, 302, 1840, 11, 9827, 8568, 11, 8399, 12, 1112, 17220, 11, 51076], + "temperature": 0.0, "avg_logprob": -0.1676989514777001, "compression_ratio": 1.654867256637168, + "no_speech_prob": 0.0009253321914002299}, {"id": 495, "seek": 323048, "start": 3244.72, + "end": 3249.12, "text": " kind of all the building blocks of a search application. 
+ And then with the machine learning", "tokens": [51076, 733, 295, 439, 264, 2390, + 8474, 295, 257, 3164, 3861, 13, 400, 550, 365, 264, 3479, 2539, 51296], "temperature": + 0.0, "avg_logprob": -0.1676989514777001, "compression_ratio": 1.654867256637168, + "no_speech_prob": 0.0009253321914002299}, {"id": 496, "seek": 323048, "start": 3249.12, + "end": 3255.84, "text": " class, because we''re dropping that beginner class week, + we now have added in a neural retrieval", "tokens": [51296, 1508, 11, 570, 321, + 434, 13601, 300, 22080, 1508, 1243, 11, 321, 586, 362, 3869, 294, 257, 18161, 19817, + 3337, 51632], "temperature": 0.0, "avg_logprob": -0.1676989514777001, "compression_ratio": + 1.654867256637168, "no_speech_prob": 0.0009253321914002299}, {"id": 497, "seek": + 325584, "start": 3256.0, "end": 3262.32, "text": " dance retrieval into that as + well. And so the search with fundamentals class starts next Monday,", "tokens": + [50372, 4489, 19817, 3337, 666, 300, 382, 731, 13, 400, 370, 264, 3164, 365, 29505, + 1508, 3719, 958, 8138, 11, 50688], "temperature": 0.0, "avg_logprob": -0.14036091108967488, + "compression_ratio": 1.7074829931972788, "no_speech_prob": 0.004025400150567293}, + {"id": 498, "seek": 325584, "start": 3262.32, "end": 3269.76, "text": " June 6th. + You can still sign up. It''s $200. There''s a code, DGSearch 10. And then search + with machine", "tokens": [50688, 6928, 1386, 392, 13, 509, 393, 920, 1465, 493, + 13, 467, 311, 1848, 7629, 13, 821, 311, 257, 3089, 11, 413, 38, 10637, 1178, 1266, + 13, 400, 550, 3164, 365, 3479, 51060], "temperature": 0.0, "avg_logprob": -0.14036091108967488, + "compression_ratio": 1.7074829931972788, "no_speech_prob": 0.004025400150567293}, + {"id": 499, "seek": 325584, "start": 3269.76, "end": 3275.36, "text": " learning + is two weeks after that. And that''s a four week class. Both are project intensive. 
+ Every", "tokens": [51060, 2539, 307, 732, 3259, 934, 300, 13, 400, 300, 311, 257, + 1451, 1243, 1508, 13, 6767, 366, 1716, 18957, 13, 2048, 51340], "temperature": 0.0, + "avg_logprob": -0.14036091108967488, "compression_ratio": 1.7074829931972788, "no_speech_prob": + 0.004025400150567293}, {"id": 500, "seek": 325584, "start": 3275.36, "end": 3279.92, + "text": " week, you''re going to do a project, you''re going to write code, you''re + going to interact with students,", "tokens": [51340, 1243, 11, 291, 434, 516, 281, + 360, 257, 1716, 11, 291, 434, 516, 281, 2464, 3089, 11, 291, 434, 516, 281, 4648, + 365, 1731, 11, 51568], "temperature": 0.0, "avg_logprob": -0.14036091108967488, + "compression_ratio": 1.7074829931972788, "no_speech_prob": 0.004025400150567293}, + {"id": 501, "seek": 325584, "start": 3279.92, "end": 3285.76, "text": " you''re + going to hear lectures, so on, so forth. In many ways, I think it''s modeled after + a university", "tokens": [51568, 291, 434, 516, 281, 1568, 16564, 11, 370, 322, + 11, 370, 5220, 13, 682, 867, 2098, 11, 286, 519, 309, 311, 37140, 934, 257, 5454, + 51860], "temperature": 0.0, "avg_logprob": -0.14036091108967488, "compression_ratio": + 1.7074829931972788, "no_speech_prob": 0.004025400150567293}, {"id": 502, "seek": + 328576, "start": 3285.76, "end": 3289.92, "text": " style class where you, you know, + every week you have homework, every week you have lectures,", "tokens": [50364, + 3758, 1508, 689, 291, 11, 291, 458, 11, 633, 1243, 291, 362, 14578, 11, 633, 1243, + 291, 362, 16564, 11, 50572], "temperature": 0.0, "avg_logprob": -0.19481226031699878, + "compression_ratio": 1.6309012875536482, "no_speech_prob": 0.000557929917704314}, + {"id": 503, "seek": 328576, "start": 3290.96, "end": 3298.4, "text": " and so on, + so forth. So yeah, please sign up. Yeah, that''s awesome. 
What I''ve personally + enjoyed", "tokens": [50624, 293, 370, 322, 11, 370, 5220, 13, 407, 1338, 11, 1767, + 1465, 493, 13, 865, 11, 300, 311, 3476, 13, 708, 286, 600, 5665, 4626, 50996], "temperature": + 0.0, "avg_logprob": -0.19481226031699878, "compression_ratio": 1.6309012875536482, + "no_speech_prob": 0.000557929917704314}, {"id": 504, "seek": 328576, "start": 3298.4, + "end": 3305.44, "text": " during the course, the search with the Mel four weeks + course was the atmosphere. The atmosphere", "tokens": [50996, 1830, 264, 1164, 11, + 264, 3164, 365, 264, 7375, 1451, 3259, 1164, 390, 264, 8018, 13, 440, 8018, 51348], + "temperature": 0.0, "avg_logprob": -0.19481226031699878, "compression_ratio": 1.6309012875536482, + "no_speech_prob": 0.000557929917704314}, {"id": 505, "seek": 328576, "start": 3305.44, + "end": 3311.6800000000003, "text": " that was basically creating itself amongst + the students and was over 100 people there on Slack", "tokens": [51348, 300, 390, + 1936, 4084, 2564, 12918, 264, 1731, 293, 390, 670, 2319, 561, 456, 322, 37211, 51660], + "temperature": 0.0, "avg_logprob": -0.19481226031699878, "compression_ratio": 1.6309012875536482, + "no_speech_prob": 0.000557929917704314}, {"id": 506, "seek": 331168, "start": 3311.68, + "end": 3317.04, "text": " helping each other. That was just amazing. Somebody saved + me like a ton of time by just sharing,", "tokens": [50364, 4315, 1184, 661, 13, + 663, 390, 445, 2243, 13, 13463, 6624, 385, 411, 257, 2952, 295, 565, 538, 445, 5414, + 11, 50632], "temperature": 0.0, "avg_logprob": -0.1370131326100183, "compression_ratio": + 1.6655405405405406, "no_speech_prob": 0.020679641515016556}, {"id": 507, "seek": + 331168, "start": 3317.04, "end": 3322.3199999999997, "text": " you know, a recipe + that I followed and quickly went through to some hurdle. 
And I learned, and I,", + "tokens": [50632, 291, 458, 11, 257, 6782, 300, 286, 6263, 293, 2661, 1437, 807, + 281, 512, 47423, 13, 400, 286, 3264, 11, 293, 286, 11, 50896], "temperature": 0.0, + "avg_logprob": -0.1370131326100183, "compression_ratio": 1.6655405405405406, "no_speech_prob": + 0.020679641515016556}, {"id": 508, "seek": 331168, "start": 3322.3199999999997, + "end": 3327.52, "text": " of course, I knew some stuff. Yes, I''m an expert in this + field, but also you can put your expertise,", "tokens": [50896, 295, 1164, 11, 286, + 2586, 512, 1507, 13, 1079, 11, 286, 478, 364, 5844, 294, 341, 2519, 11, 457, 611, + 291, 393, 829, 428, 11769, 11, 51156], "temperature": 0.0, "avg_logprob": -0.1370131326100183, + "compression_ratio": 1.6655405405405406, "no_speech_prob": 0.020679641515016556}, + {"id": 509, "seek": 331168, "start": 3327.52, "end": 3332.56, "text": " you know, + to a test when you, when you run so fast during the course and the support that + you guys", "tokens": [51156, 291, 458, 11, 281, 257, 1500, 562, 291, 11, 562, 291, + 1190, 370, 2370, 1830, 264, 1164, 293, 264, 1406, 300, 291, 1074, 51408], "temperature": + 0.0, "avg_logprob": -0.1370131326100183, "compression_ratio": 1.6655405405405406, + "no_speech_prob": 0.020679641515016556}, {"id": 510, "seek": 331168, "start": 3332.56, + "end": 3338.7999999999997, "text": " provided was amazing. So that''s amazing. I''ve + enjoyed this conversation so much. Now we are moving", "tokens": [51408, 5649, 390, + 2243, 13, 407, 300, 311, 2243, 13, 286, 600, 4626, 341, 3761, 370, 709, 13, 823, + 321, 366, 2684, 51720], "temperature": 0.0, "avg_logprob": -0.1370131326100183, + "compression_ratio": 1.6655405405405406, "no_speech_prob": 0.020679641515016556}, + {"id": 511, "seek": 333880, "start": 3338.8, "end": 3345.6800000000003, "text": + " to the questions from the audience. 
And I''ll pick the, and feel free to ask questions, + please.", "tokens": [50364, 281, 264, 1651, 490, 264, 4034, 13, 400, 286, 603, 1888, + 264, 11, 293, 841, 1737, 281, 1029, 1651, 11, 1767, 13, 50708], "temperature": 0.0, + "avg_logprob": -0.19465838307919708, "compression_ratio": 1.7793427230046948, "no_speech_prob": + 0.01837104558944702}, {"id": 512, "seek": 333880, "start": 3346.88, "end": 3353.04, + "text": " We still have a few minutes. The first question comes from Avynash, who + is currently testing the", "tokens": [50768, 492, 920, 362, 257, 1326, 2077, 13, + 440, 700, 1168, 1487, 490, 11667, 2534, 1299, 11, 567, 307, 4362, 4997, 264, 51076], + "temperature": 0.0, "avg_logprob": -0.19465838307919708, "compression_ratio": 1.7793427230046948, + "no_speech_prob": 0.01837104558944702}, {"id": 513, "seek": 333880, "start": 3353.04, + "end": 3360.0, "text": " approach of buying coder to find the similar sentence, + top 10. And later passing the top 10", "tokens": [51076, 3109, 295, 6382, 17656, + 260, 281, 915, 264, 2531, 8174, 11, 1192, 1266, 13, 400, 1780, 8437, 264, 1192, + 1266, 51424], "temperature": 0.0, "avg_logprob": -0.19465838307919708, "compression_ratio": + 1.7793427230046948, "no_speech_prob": 0.01837104558944702}, {"id": 514, "seek": + 333880, "start": 3360.0, "end": 3365.52, "text": " sentence to a crossing coder + model to find the most similar sentence in the top 10 using cosine", "tokens": [51424, + 8174, 281, 257, 14712, 17656, 260, 2316, 281, 915, 264, 881, 2531, 8174, 294, 264, + 1192, 1266, 1228, 23565, 51700], "temperature": 0.0, "avg_logprob": -0.19465838307919708, + "compression_ratio": 1.7793427230046948, "no_speech_prob": 0.01837104558944702}, + {"id": 515, "seek": 336552, "start": 3365.52, "end": 3372.0, "text": " similarity. 
+ Yeah, I guess he''s asking for advice is this an appropriate method.", "tokens": + [50364, 32194, 13, 865, 11, 286, 2041, 415, 311, 3365, 337, 5192, 307, 341, 364, + 6854, 3170, 13, 50688], "temperature": 0.0, "avg_logprob": -0.14145879487733584, + "compression_ratio": 1.4242424242424243, "no_speech_prob": 0.00481015257537365}, + {"id": 516, "seek": 336552, "start": 3374.64, "end": 3380.72, "text": " This is + where my expertise just is not. So Avynash, I will apologize. I do not know enough + here to", "tokens": [50820, 639, 307, 689, 452, 11769, 445, 307, 406, 13, 407, 11667, + 2534, 1299, 11, 286, 486, 12328, 13, 286, 360, 406, 458, 1547, 510, 281, 51124], + "temperature": 0.0, "avg_logprob": -0.14145879487733584, "compression_ratio": 1.4242424242424243, + "no_speech_prob": 0.00481015257537365}, {"id": 517, "seek": 336552, "start": 3380.72, + "end": 3386.88, "text": " give you advice. I would probably ask first, like, what + is the actual problem? Are you trying to solve?", "tokens": [51124, 976, 291, 5192, + 13, 286, 576, 1391, 1029, 700, 11, 411, 11, 437, 307, 264, 3539, 1154, 30, 2014, + 291, 1382, 281, 5039, 30, 51432], "temperature": 0.0, "avg_logprob": -0.14145879487733584, + "compression_ratio": 1.4242424242424243, "no_speech_prob": 0.00481015257537365}, + {"id": 518, "seek": 338688, "start": 3387.12, "end": 3396.2400000000002, "text": + " You know, so if you''re trying to find similar sentences, then from my understanding + of it, that my", "tokens": [50376, 509, 458, 11, 370, 498, 291, 434, 1382, 281, + 915, 2531, 16579, 11, 550, 490, 452, 3701, 295, 309, 11, 300, 452, 50832], "temperature": + 0.0, "avg_logprob": -0.22691947855847946, "compression_ratio": 1.636734693877551, + "no_speech_prob": 0.03233594074845314}, {"id": 519, "seek": 338688, "start": 3396.2400000000002, + "end": 3401.6, "text": " basic level understanding of what you''re describing, it + sounds like a reasonable, a reasonable approach.", "tokens": [50832, 3875, 1496, + 3701, 295, 
437, 291, 434, 16141, 11, 309, 3263, 411, 257, 10585, 11, 257, 10585, + 3109, 13, 51100], "temperature": 0.0, "avg_logprob": -0.22691947855847946, "compression_ratio": + 1.636734693877551, "no_speech_prob": 0.03233594074845314}, {"id": 520, "seek": 338688, + "start": 3401.6, "end": 3406.48, "text": " But there are people who are much in + probably Dmitry, you probably could answer this one better than I,", "tokens": [51100, + 583, 456, 366, 561, 567, 366, 709, 294, 1391, 413, 3508, 627, 11, 291, 1391, 727, + 1867, 341, 472, 1101, 813, 286, 11, 51344], "temperature": 0.0, "avg_logprob": -0.22691947855847946, + "compression_ratio": 1.636734693877551, "no_speech_prob": 0.03233594074845314}, + {"id": 521, "seek": 338688, "start": 3406.48, "end": 3412.0, "text": " but I have + not played with or tried out those specific types of capabilities. So I don''t have", + "tokens": [51344, 457, 286, 362, 406, 3737, 365, 420, 3031, 484, 729, 2685, 3467, + 295, 10862, 13, 407, 286, 500, 380, 362, 51620], "temperature": 0.0, "avg_logprob": + -0.22691947855847946, "compression_ratio": 1.636734693877551, "no_speech_prob": + 0.03233594074845314}, {"id": 522, "seek": 341200, "start": 3412.72, "end": 3419.2, + "text": " good advice there. I have worked in general on sentence similarity type + problems. It is always", "tokens": [50400, 665, 5192, 456, 13, 286, 362, 2732, 294, + 2674, 322, 8174, 32194, 2010, 2740, 13, 467, 307, 1009, 50724], "temperature": 0.0, + "avg_logprob": -0.11199063195122613, "compression_ratio": 1.6929824561403508, "no_speech_prob": + 0.005875768139958382}, {"id": 523, "seek": 341200, "start": 3419.2, "end": 3426.16, + "text": " challenging. In fact, I have a my current company that I''m one of my + fractional clients. 
We are", "tokens": [50724, 7595, 13, 682, 1186, 11, 286, 362, + 257, 452, 2190, 2237, 300, 286, 478, 472, 295, 452, 17948, 1966, 6982, 13, 492, + 366, 51072], "temperature": 0.0, "avg_logprob": -0.11199063195122613, "compression_ratio": + 1.6929824561403508, "no_speech_prob": 0.005875768139958382}, {"id": 524, "seek": + 341200, "start": 3426.16, "end": 3432.88, "text": " doing sentence similarity or + clause similarity types problems. And I think they are we are using", "tokens": + [51072, 884, 8174, 32194, 420, 25925, 32194, 3467, 2740, 13, 400, 286, 519, 436, + 366, 321, 366, 1228, 51408], "temperature": 0.0, "avg_logprob": -0.11199063195122613, + "compression_ratio": 1.6929824561403508, "no_speech_prob": 0.005875768139958382}, + {"id": 525, "seek": 341200, "start": 3432.88, "end": 3438.16, "text": " similar + modeling techniques, but I''m not doing the day to day modeling on that. So I''m + really just", "tokens": [51408, 2531, 15983, 7512, 11, 457, 286, 478, 406, 884, + 264, 786, 281, 786, 15983, 322, 300, 13, 407, 286, 478, 534, 445, 51672], "temperature": + 0.0, "avg_logprob": -0.11199063195122613, "compression_ratio": 1.6929824561403508, + "no_speech_prob": 0.005875768139958382}, {"id": 526, "seek": 343816, "start": 3438.16, + "end": 3447.2, "text": " trusting the data scientists on that. Yeah, I can add to + this that I happen to have given a", "tokens": [50364, 28235, 264, 1412, 7708, 322, + 300, 13, 865, 11, 286, 393, 909, 281, 341, 300, 286, 1051, 281, 362, 2212, 257, + 50816], "temperature": 0.0, "avg_logprob": -0.22103147004780016, "compression_ratio": + 1.644736842105263, "no_speech_prob": 0.005836143624037504}, {"id": 527, "seek": + 343816, "start": 3447.8399999999997, "end": 3453.7599999999998, "text": " community + talk during the search with the mail course. 
And there I actually go explicitly + into", "tokens": [50848, 1768, 751, 1830, 264, 3164, 365, 264, 10071, 1164, 13, + 400, 456, 286, 767, 352, 20803, 666, 51144], "temperature": 0.0, "avg_logprob": + -0.22103147004780016, "compression_ratio": 1.644736842105263, "no_speech_prob": + 0.005836143624037504}, {"id": 528, "seek": 343816, "start": 3453.7599999999998, + "end": 3458.8799999999997, "text": " this by encoder and cross-ent coder. So only + one thing is that cross-ent coder is much more", "tokens": [51144, 341, 538, 2058, + 19866, 293, 3278, 12, 317, 17656, 260, 13, 407, 787, 472, 551, 307, 300, 3278, 12, + 317, 17656, 260, 307, 709, 544, 51400], "temperature": 0.0, "avg_logprob": -0.22103147004780016, + "compression_ratio": 1.644736842105263, "no_speech_prob": 0.005836143624037504}, + {"id": 529, "seek": 343816, "start": 3460.48, "end": 3465.2799999999997, "text": + " computationally intensive. And so you don''t want to run it on a huge amount of + sentences. And it", "tokens": [51480, 24903, 379, 18957, 13, 400, 370, 291, 500, + 380, 528, 281, 1190, 309, 322, 257, 2603, 2372, 295, 16579, 13, 400, 309, 51720], + "temperature": 0.0, "avg_logprob": -0.22103147004780016, "compression_ratio": 1.644736842105263, + "no_speech_prob": 0.005836143624037504}, {"id": 530, "seek": 346528, "start": 3465.28, + "end": 3471.0400000000004, "text": " looks like that''s what you''re doing. So that + sounds sensible to me. I think I would pay more", "tokens": [50364, 1542, 411, 300, + 311, 437, 291, 434, 884, 13, 407, 300, 3263, 25380, 281, 385, 13, 286, 519, 286, + 576, 1689, 544, 50652], "temperature": 0.0, "avg_logprob": -0.15293543266527582, + "compression_ratio": 1.5591836734693878, "no_speech_prob": 0.005863623693585396}, + {"id": 531, "seek": 346528, "start": 3471.0400000000004, "end": 3477.44, "text": + " attention to testing your approach. 
So make sure to reserve some part of your + data set to test it.", "tokens": [50652, 3202, 281, 4997, 428, 3109, 13, 407, 652, + 988, 281, 17824, 512, 644, 295, 428, 1412, 992, 281, 1500, 309, 13, 50972], "temperature": + 0.0, "avg_logprob": -0.15293543266527582, "compression_ratio": 1.5591836734693878, + "no_speech_prob": 0.005863623693585396}, {"id": 532, "seek": 346528, "start": 3478.32, + "end": 3483.44, "text": " Careful. Yeah, this is the cool thing for me coming back + in from Wikiland is I''m learning so", "tokens": [51016, 32932, 13, 865, 11, 341, + 307, 264, 1627, 551, 337, 385, 1348, 646, 294, 490, 35892, 1661, 307, 286, 478, + 2539, 370, 51272], "temperature": 0.0, "avg_logprob": -0.15293543266527582, "compression_ratio": + 1.5591836734693878, "no_speech_prob": 0.005863623693585396}, {"id": 533, "seek": + 346528, "start": 3483.44, "end": 3488.6400000000003, "text": " much now too. Like + this is I''ve been digging my way through a lot of these things, but as you can", + "tokens": [51272, 709, 586, 886, 13, 1743, 341, 307, 286, 600, 668, 17343, 452, + 636, 807, 257, 688, 295, 613, 721, 11, 457, 382, 291, 393, 51532], "temperature": + 0.0, "avg_logprob": -0.15293543266527582, "compression_ratio": 1.5591836734693878, + "no_speech_prob": 0.005863623693585396}, {"id": 534, "seek": 348864, "start": 3488.64, + "end": 3496.08, "text": " see, this is why it''s the gold age because there''s so + many approaches and they''re often", "tokens": [50364, 536, 11, 341, 307, 983, 309, + 311, 264, 3821, 3205, 570, 456, 311, 370, 867, 11587, 293, 436, 434, 2049, 50736], + "temperature": 0.0, "avg_logprob": -0.26228651087334814, "compression_ratio": 1.5296610169491525, + "no_speech_prob": 0.00551283173263073}, {"id": 535, "seek": 348864, "start": 3496.08, + "end": 3502.4, "text": " improving state of the art every week, right? Yeah, exactly. 
+ A lot of things is happening.", "tokens": [50736, 11470, 1785, 295, 264, 1523, 633, + 1243, 11, 558, 30, 865, 11, 2293, 13, 316, 688, 295, 721, 307, 2737, 13, 51052], + "temperature": 0.0, "avg_logprob": -0.26228651087334814, "compression_ratio": 1.5296610169491525, + "no_speech_prob": 0.00551283173263073}, {"id": 536, "seek": 348864, "start": 3503.04, + "end": 3507.2, "text": " Another question I''m taking now from the chat, Carlos + is asking, I''d like to know", "tokens": [51084, 3996, 1168, 286, 478, 1940, 586, + 490, 264, 5081, 11, 19646, 307, 3365, 11, 286, 1116, 411, 281, 458, 51292], "temperature": + 0.0, "avg_logprob": -0.26228651087334814, "compression_ratio": 1.5296610169491525, + "no_speech_prob": 0.00551283173263073}, {"id": 537, "seek": 348864, "start": 3508.48, + "end": 3515.68, "text": " Grandsepinion inside about learning to boost. He gives + also a link to a presentation at a high-stack", "tokens": [51356, 6757, 405, 17836, + 313, 1854, 466, 2539, 281, 9194, 13, 634, 2709, 611, 257, 2113, 281, 257, 5860, + 412, 257, 1090, 12, 372, 501, 51716], "temperature": 0.0, "avg_logprob": -0.26228651087334814, + "compression_ratio": 1.5296610169491525, "no_speech_prob": 0.00551283173263073}, + {"id": 538, "seek": 351568, "start": 3516.16, "end": 3521.3599999999997, "text": + " high-stack conference. I don''t know if you''re familiar with this approach, Grant. + Can you say anything?", "tokens": [50388, 1090, 12, 372, 501, 7586, 13, 286, 500, + 380, 458, 498, 291, 434, 4963, 365, 341, 3109, 11, 17529, 13, 1664, 291, 584, 1340, + 30, 50648], "temperature": 0.0, "avg_logprob": -0.1990061556355337, "compression_ratio": + 1.5817307692307692, "no_speech_prob": 0.014524207450449467}, {"id": 539, "seek": + 351568, "start": 3523.12, "end": 3529.6, "text": " I am not. I''d like to know learning + to boost interesting. 
Another thing to go learn.", "tokens": [50736, 286, 669, 406, + 13, 286, 1116, 411, 281, 458, 2539, 281, 9194, 1880, 13, 3996, 551, 281, 352, 1466, + 13, 51060], "temperature": 0.0, "avg_logprob": -0.1990061556355337, "compression_ratio": + 1.5817307692307692, "no_speech_prob": 0.014524207450449467}, {"id": 540, "seek": + 351568, "start": 3530.64, "end": 3533.2799999999997, "text": " Yeah, I think it + was all kind of learning to rank.", "tokens": [51112, 865, 11, 286, 519, 309, 390, + 439, 733, 295, 2539, 281, 6181, 13, 51244], "temperature": 0.0, "avg_logprob": -0.1990061556355337, + "compression_ratio": 1.5817307692307692, "no_speech_prob": 0.014524207450449467}, + {"id": 541, "seek": 351568, "start": 3535.2, "end": 3540.8799999999997, "text": + " I think it''s related, but I actually don''t know myself like that much in detail, + but that", "tokens": [51340, 286, 519, 309, 311, 4077, 11, 457, 286, 767, 500, 380, + 458, 2059, 411, 300, 709, 294, 2607, 11, 457, 300, 51624], "temperature": 0.0, "avg_logprob": + -0.1990061556355337, "compression_ratio": 1.5817307692307692, "no_speech_prob": + 0.014524207450449467}, {"id": 542, "seek": 354088, "start": 3541.36, "end": 3547.28, + "text": " that presentation was great. 
It looked like new thing, but at the same + time kind of familiar.", "tokens": [50388, 300, 5860, 390, 869, 13, 467, 2956, 411, + 777, 551, 11, 457, 412, 264, 912, 565, 733, 295, 4963, 13, 50684], "temperature": + 0.0, "avg_logprob": -0.20237319365791653, "compression_ratio": 1.553648068669528, + "no_speech_prob": 0.009666667319834232}, {"id": 543, "seek": 354088, "start": 3550.08, + "end": 3554.7200000000003, "text": " Basically, instead of learning to rank, you + learn the boost values as far as I remember.", "tokens": [50824, 8537, 11, 2602, + 295, 2539, 281, 6181, 11, 291, 1466, 264, 9194, 4190, 382, 1400, 382, 286, 1604, + 13, 51056], "temperature": 0.0, "avg_logprob": -0.20237319365791653, "compression_ratio": + 1.553648068669528, "no_speech_prob": 0.009666667319834232}, {"id": 544, "seek": + 354088, "start": 3558.0, "end": 3563.84, "text": " It sounds interesting and reasonable. + Again, at the end of the day, how do we shape these vectors?", "tokens": [51220, + 467, 3263, 1880, 293, 10585, 13, 3764, 11, 412, 264, 917, 295, 264, 786, 11, 577, + 360, 321, 3909, 613, 18875, 30, 51512], "temperature": 0.0, "avg_logprob": -0.20237319365791653, + "compression_ratio": 1.553648068669528, "no_speech_prob": 0.009666667319834232}, + {"id": 545, "seek": 354088, "start": 3563.84, "end": 3568.6400000000003, "text": + " I know that''s a generic wave in your hands, but I would take this and go try + it.", "tokens": [51512, 286, 458, 300, 311, 257, 19577, 5772, 294, 428, 2377, 11, + 457, 286, 576, 747, 341, 293, 352, 853, 309, 13, 51752], "temperature": 0.0, "avg_logprob": + -0.20237319365791653, "compression_ratio": 1.553648068669528, "no_speech_prob": + 0.009666667319834232}, {"id": 546, "seek": 356864, "start": 3568.96, "end": 3577.04, + "text": " I think most of these machine learning systems you''re trying to learn + weights that then shape the way", "tokens": [50380, 286, 519, 881, 295, 613, 3479, + 2539, 3652, 291, 434, 1382, 281, 1466, 17443, 300, 550, 3909, 264, 
636, 50784], + "temperature": 0.0, "avg_logprob": -0.20973227680593298, "compression_ratio": 1.5449735449735449, + "no_speech_prob": 0.00460564810782671}, {"id": 547, "seek": 356864, "start": 3577.04, + "end": 3586.16, "text": " that vector gets called. If it works on your domain and + it''s fast enough and you can maintain it,", "tokens": [50784, 300, 8062, 2170, + 1219, 13, 759, 309, 1985, 322, 428, 9274, 293, 309, 311, 2370, 1547, 293, 291, 393, + 6909, 309, 11, 51240], "temperature": 0.0, "avg_logprob": -0.20973227680593298, + "compression_ratio": 1.5449735449735449, "no_speech_prob": 0.00460564810782671}, + {"id": 548, "seek": 356864, "start": 3587.3599999999997, "end": 3596.08, "text": + " then go for it. You don''t need some experts blessing on it. It certainly sounds + interesting.", "tokens": [51300, 550, 352, 337, 309, 13, 509, 500, 380, 643, 512, + 8572, 13869, 322, 309, 13, 467, 3297, 3263, 1880, 13, 51736], "temperature": 0.0, + "avg_logprob": -0.20973227680593298, "compression_ratio": 1.5449735449735449, "no_speech_prob": + 0.00460564810782671}, {"id": 549, "seek": 359608, "start": 3596.08, "end": 3602.24, + "text": " LTR certainly has its own challenges in terms of tweaking and tuning. + I know I''ve struggled with that", "tokens": [50364, 441, 25936, 3297, 575, 1080, + 1065, 4759, 294, 2115, 295, 6986, 2456, 293, 15164, 13, 286, 458, 286, 600, 19023, + 365, 300, 50672], "temperature": 0.0, "avg_logprob": -0.17012994939630682, "compression_ratio": + 1.6125, "no_speech_prob": 0.017327241599559784}, {"id": 550, "seek": 359608, "start": + 3602.24, "end": 3611.6, "text": " with LTR a lot. 
I know I''ve struggled with hand-tuned + boost a lot as well, so anything that helps", "tokens": [50672, 365, 441, 25936, + 257, 688, 13, 286, 458, 286, 600, 19023, 365, 1011, 12, 83, 43703, 9194, 257, 688, + 382, 731, 11, 370, 1340, 300, 3665, 51140], "temperature": 0.0, "avg_logprob": -0.17012994939630682, + "compression_ratio": 1.6125, "no_speech_prob": 0.017327241599559784}, {"id": 551, + "seek": 359608, "start": 3611.6, "end": 3619.36, "text": " do that I think would + be good. Yeah, awesome. The next question comes from Nico, Hey, Nico,", "tokens": + [51140, 360, 300, 286, 519, 576, 312, 665, 13, 865, 11, 3476, 13, 440, 958, 1168, + 1487, 490, 15115, 11, 1911, 11, 15115, 11, 51528], "temperature": 0.0, "avg_logprob": + -0.17012994939630682, "compression_ratio": 1.6125, "no_speech_prob": 0.017327241599559784}, + {"id": 552, "seek": 359608, "start": 3619.36, "end": 3625.04, "text": " a former + colleague from AlphaSense. If you''re hosting an information search engine which + should", "tokens": [51528, 257, 5819, 13532, 490, 20588, 50, 1288, 13, 759, 291, + 434, 16058, 364, 1589, 3164, 2848, 597, 820, 51812], "temperature": 0.0, "avg_logprob": + -0.17012994939630682, "compression_ratio": 1.6125, "no_speech_prob": 0.017327241599559784}, + {"id": 553, "seek": 362504, "start": 3625.04, "end": 3632.72, "text": " catch new + topics like COVID when it hit, how do you notice that your boosting model of vector", + "tokens": [50364, 3745, 777, 8378, 411, 4566, 562, 309, 2045, 11, 577, 360, 291, + 3449, 300, 428, 43117, 2316, 295, 8062, 50748], "temperature": 0.0, "avg_logprob": + -0.137120177946895, "compression_ratio": 1.6273584905660377, "no_speech_prob": 0.006344288121908903}, + {"id": 554, "seek": 362504, "start": 3632.72, "end": 3638.08, "text": " embedding + model does not recognize queries related to these new topics proactively?", "tokens": + [50748, 12240, 3584, 2316, 775, 406, 5521, 24109, 4077, 281, 613, 777, 8378, 447, + 45679, 30, 51016], "temperature": 
0.0, "avg_logprob": -0.137120177946895, "compression_ratio": + 1.6273584905660377, "no_speech_prob": 0.006344288121908903}, {"id": 555, "seek": + 362504, "start": 3639.44, "end": 3643.04, "text": " Yeah, that''s where I think + the instrumentation of your system comes in, right?", "tokens": [51084, 865, 11, + 300, 311, 689, 286, 519, 264, 7198, 399, 295, 428, 1185, 1487, 294, 11, 558, 30, + 51264], "temperature": 0.0, "avg_logprob": -0.137120177946895, "compression_ratio": + 1.6273584905660377, "no_speech_prob": 0.006344288121908903}, {"id": 556, "seek": + 362504, "start": 3644.24, "end": 3649.44, "text": " And the human and the loop on + that instrumentation in the system, right? I mean, I think", "tokens": [51324, 400, + 264, 1952, 293, 264, 6367, 322, 300, 7198, 399, 294, 264, 1185, 11, 558, 30, 286, + 914, 11, 286, 519, 51584], "temperature": 0.0, "avg_logprob": -0.137120177946895, + "compression_ratio": 1.6273584905660377, "no_speech_prob": 0.006344288121908903}, + {"id": 557, "seek": 364944, "start": 3649.84, "end": 3656.64, "text": " nobody talks + about it, but even at the really large successful search engines, there''s still + people", "tokens": [50384, 5079, 6686, 466, 309, 11, 457, 754, 412, 264, 534, 2416, + 4406, 3164, 12982, 11, 456, 311, 920, 561, 50724], "temperature": 0.0, "avg_logprob": + -0.14560079038812873, "compression_ratio": 1.7236842105263157, "no_speech_prob": + 0.005095826927572489}, {"id": 558, "seek": 364944, "start": 3656.64, "end": 3663.84, + "text": " who are reviewing where things are working and not working, right? 
And + generally they''re doing it", "tokens": [50724, 567, 366, 19576, 689, 721, 366, + 1364, 293, 406, 1364, 11, 558, 30, 400, 5101, 436, 434, 884, 309, 51084], "temperature": + 0.0, "avg_logprob": -0.14560079038812873, "compression_ratio": 1.7236842105263157, + "no_speech_prob": 0.005095826927572489}, {"id": 559, "seek": 364944, "start": 3663.84, + "end": 3671.04, "text": " at the experimentation level, but people still dig into + queries. What queries are underperforming?", "tokens": [51084, 412, 264, 37142, + 1496, 11, 457, 561, 920, 2528, 666, 24109, 13, 708, 24109, 366, 833, 26765, 278, + 30, 51444], "temperature": 0.0, "avg_logprob": -0.14560079038812873, "compression_ratio": + 1.7236842105263157, "no_speech_prob": 0.005095826927572489}, {"id": 560, "seek": + 364944, "start": 3671.52, "end": 3676.2400000000002, "text": " What documents are + underperforming? I think there''s tools, there''s a lot of good tools out there", + "tokens": [51468, 708, 8512, 366, 833, 26765, 278, 30, 286, 519, 456, 311, 3873, + 11, 456, 311, 257, 688, 295, 665, 3873, 484, 456, 51704], "temperature": 0.0, "avg_logprob": + -0.14560079038812873, "compression_ratio": 1.7236842105263157, "no_speech_prob": + 0.005095826927572489}, {"id": 561, "seek": 367624, "start": 3676.24, "end": 3682.16, + "text": " for anomaly detection as well. So recognizing when new queries are coming + in is something like", "tokens": [50364, 337, 42737, 17784, 382, 731, 13, 407, 18538, + 562, 777, 24109, 366, 1348, 294, 307, 746, 411, 50660], "temperature": 0.0, "avg_logprob": + -0.10794573944884461, "compression_ratio": 1.6805555555555556, "no_speech_prob": + 0.0014734393917024136}, {"id": 562, "seek": 367624, "start": 3682.16, "end": 3692.24, + "text": " anomaly detection algorithms will work with, right? 
You know, looking + at your top queries,", "tokens": [50660, 42737, 17784, 14642, 486, 589, 365, 11, + 558, 30, 509, 458, 11, 1237, 412, 428, 1192, 24109, 11, 51164], "temperature": 0.0, + "avg_logprob": -0.10794573944884461, "compression_ratio": 1.6805555555555556, "no_speech_prob": + 0.0014734393917024136}, {"id": 563, "seek": 367624, "start": 3692.24, "end": 3697.68, + "text": " your trending queries, and then again, looking at those results, there + are machine learning", "tokens": [51164, 428, 28692, 24109, 11, 293, 550, 797, 11, + 1237, 412, 729, 3542, 11, 456, 366, 3479, 2539, 51436], "temperature": 0.0, "avg_logprob": + -0.10794573944884461, "compression_ratio": 1.6805555555555556, "no_speech_prob": + 0.0014734393917024136}, {"id": 564, "seek": 367624, "start": 3697.68, "end": 3702.64, + "text": " approaches to automatically identifying and alerting on those kinds of + things, again,", "tokens": [51436, 11587, 281, 6772, 16696, 293, 419, 27187, 322, + 729, 3685, 295, 721, 11, 797, 11, 51684], "temperature": 0.0, "avg_logprob": -0.10794573944884461, + "compression_ratio": 1.6805555555555556, "no_speech_prob": 0.0014734393917024136}, + {"id": 565, "seek": 370264, "start": 3702.64, "end": 3708.3199999999997, "text": + " along the anomaly detection line. But at the end of the day, you can always do + that with people", "tokens": [50364, 2051, 264, 42737, 17784, 1622, 13, 583, 412, + 264, 917, 295, 264, 786, 11, 291, 393, 1009, 360, 300, 365, 561, 50648], "temperature": + 0.0, "avg_logprob": -0.15962262587113815, "compression_ratio": 1.5748987854251013, + "no_speech_prob": 0.013607312925159931}, {"id": 566, "seek": 370264, "start": 3708.3199999999997, + "end": 3714.72, "text": " as well, right? 
And that''s where humans maybe are better + at still at recognizing some of those things.", "tokens": [50648, 382, 731, 11, + 558, 30, 400, 300, 311, 689, 6255, 1310, 366, 1101, 412, 920, 412, 18538, 512, 295, + 729, 721, 13, 50968], "temperature": 0.0, "avg_logprob": -0.15962262587113815, "compression_ratio": + 1.5748987854251013, "no_speech_prob": 0.013607312925159931}, {"id": 567, "seek": + 370264, "start": 3716.16, "end": 3721.2799999999997, "text": " Yeah, and I think + you also alluded to this somewhat. I mean, this question is to me, it''s like", + "tokens": [51040, 865, 11, 293, 286, 519, 291, 611, 33919, 281, 341, 8344, 13, 286, + 914, 11, 341, 1168, 307, 281, 385, 11, 309, 311, 411, 51296], "temperature": 0.0, + "avg_logprob": -0.15962262587113815, "compression_ratio": 1.5748987854251013, "no_speech_prob": + 0.013607312925159931}, {"id": 568, "seek": 370264, "start": 3721.2799999999997, + "end": 3727.44, "text": " chicken-eyed problem, right? So if a new topic arises + in the queries and also in the documents,", "tokens": [51296, 4662, 12, 37860, 1154, + 11, 558, 30, 407, 498, 257, 777, 4829, 27388, 294, 264, 24109, 293, 611, 294, 264, + 8512, 11, 51604], "temperature": 0.0, "avg_logprob": -0.15962262587113815, "compression_ratio": + 1.5748987854251013, "no_speech_prob": 0.013607312925159931}, {"id": 569, "seek": + 372744, "start": 3727.44, "end": 3736.08, "text": " but I haven''t handled it yet + before prior to this, then what can I do live? 
So I think you said", "tokens": [50364, + 457, 286, 2378, 380, 18033, 309, 1939, 949, 4059, 281, 341, 11, 550, 437, 393, 286, + 360, 1621, 30, 407, 286, 519, 291, 848, 50796], "temperature": 0.0, "avg_logprob": + -0.17442673444747925, "compression_ratio": 1.6270491803278688, "no_speech_prob": + 0.006583199370652437}, {"id": 570, "seek": 372744, "start": 3736.08, "end": 3741.76, + "text": " that try to measure things like if some top ranking documents are not + clicked, then that''s probably", "tokens": [50796, 300, 853, 281, 3481, 721, 411, + 498, 512, 1192, 17833, 8512, 366, 406, 23370, 11, 550, 300, 311, 1391, 51080], "temperature": + 0.0, "avg_logprob": -0.17442673444747925, "compression_ratio": 1.6270491803278688, + "no_speech_prob": 0.006583199370652437}, {"id": 571, "seek": 372744, "start": 3741.76, + "end": 3747.2000000000003, "text": " a signal of something is smoky there. Go check + it out. Another thing that I think I could recommend,", "tokens": [51080, 257, 6358, + 295, 746, 307, 32073, 88, 456, 13, 1037, 1520, 309, 484, 13, 3996, 551, 300, 286, + 519, 286, 727, 2748, 11, 51352], "temperature": 0.0, "avg_logprob": -0.17442673444747925, + "compression_ratio": 1.6270491803278688, "no_speech_prob": 0.006583199370652437}, + {"id": 572, "seek": 372744, "start": 3747.2000000000003, "end": 3753.36, "text": + " maybe from my side, is you could try to cluster your queries actually. And sometimes + the funny thing", "tokens": [51352, 1310, 490, 452, 1252, 11, 307, 291, 727, 853, + 281, 13630, 428, 24109, 767, 13, 400, 2171, 264, 4074, 551, 51660], "temperature": + 0.0, "avg_logprob": -0.17442673444747925, "compression_ratio": 1.6270491803278688, + "no_speech_prob": 0.006583199370652437}, {"id": 573, "seek": 375336, "start": 3753.36, + "end": 3759.44, "text": " is that queries are related in some way, right? 
So like + if it''s a completely new cluster and usually", "tokens": [50364, 307, 300, 24109, + 366, 4077, 294, 512, 636, 11, 558, 30, 407, 411, 498, 309, 311, 257, 2584, 777, + 13630, 293, 2673, 50668], "temperature": 0.0, "avg_logprob": -0.1298132946616725, + "compression_ratio": 1.5942622950819672, "no_speech_prob": 0.007006326224654913}, + {"id": 574, "seek": 375336, "start": 3759.44, "end": 3767.84, "text": " dense retrieval + helps a lot there, pre-trained models on your domain or maybe on some generic domain", + "tokens": [50668, 18011, 19817, 3337, 3665, 257, 688, 456, 11, 659, 12, 17227, 2001, + 5245, 322, 428, 9274, 420, 1310, 322, 512, 19577, 9274, 51088], "temperature": 0.0, + "avg_logprob": -0.1298132946616725, "compression_ratio": 1.5942622950819672, "no_speech_prob": + 0.007006326224654913}, {"id": 575, "seek": 375336, "start": 3767.84, "end": 3773.28, + "text": " like news, they might still pick these things up and put them in the same + basket, then ask some", "tokens": [51088, 411, 2583, 11, 436, 1062, 920, 1888, 613, + 721, 493, 293, 829, 552, 294, 264, 912, 8390, 11, 550, 1029, 512, 51360], "temperature": + 0.0, "avg_logprob": -0.1298132946616725, "compression_ratio": 1.5942622950819672, + "no_speech_prob": 0.007006326224654913}, {"id": 576, "seek": 375336, "start": 3773.28, + "end": 3778.6400000000003, "text": " human annotators to go and check. Instead of + checking the whole multimillion log, you know,", "tokens": [51360, 1952, 25339, + 3391, 281, 352, 293, 1520, 13, 7156, 295, 8568, 264, 1379, 32972, 11836, 3565, 11, + 291, 458, 11, 51628], "temperature": 0.0, "avg_logprob": -0.1298132946616725, "compression_ratio": + 1.5942622950819672, "no_speech_prob": 0.007006326224654913}, {"id": 577, "seek": + 377864, "start": 3778.72, "end": 3782.08, "text": " which would be super, super + complicated. 
And you know, I agree.", "tokens": [50368, 597, 576, 312, 1687, 11, + 1687, 6179, 13, 400, 291, 458, 11, 286, 3986, 13, 50536], "temperature": 0.0, "avg_logprob": + -0.16583304935031468, "compression_ratio": 1.640625, "no_speech_prob": 0.021722517907619476}, + {"id": 578, "seek": 377864, "start": 3782.64, "end": 3786.16, "text": " And the + nice thing about like, you know, especially, you know, these engines,", "tokens": + [50564, 400, 264, 1481, 551, 466, 411, 11, 291, 458, 11, 2318, 11, 291, 458, 11, + 613, 12982, 11, 50740], "temperature": 0.0, "avg_logprob": -0.16583304935031468, + "compression_ratio": 1.640625, "no_speech_prob": 0.021722517907619476}, {"id": 579, + "seek": 377864, "start": 3788.08, "end": 3795.3599999999997, "text": " you know, + there is still the good old BM25 case where like at least the basic level keywords", + "tokens": [50836, 291, 458, 11, 456, 307, 920, 264, 665, 1331, 15901, 6074, 1389, + 689, 411, 412, 1935, 264, 3875, 1496, 21009, 51200], "temperature": 0.0, "avg_logprob": + -0.16583304935031468, "compression_ratio": 1.640625, "no_speech_prob": 0.021722517907619476}, + {"id": 580, "seek": 377864, "start": 3795.3599999999997, "end": 3801.3599999999997, + "text": " are going to match. And so if a new term comes in for COVID and like it''s + in the documents,", "tokens": [51200, 366, 516, 281, 2995, 13, 400, 370, 498, 257, + 777, 1433, 1487, 294, 337, 4566, 293, 411, 309, 311, 294, 264, 8512, 11, 51500], + "temperature": 0.0, "avg_logprob": -0.16583304935031468, "compression_ratio": 1.640625, + "no_speech_prob": 0.021722517907619476}, {"id": 581, "seek": 377864, "start": 3801.3599999999997, + "end": 3806.72, "text": " you''ll at least probably get an exact match. 
You may + not deal with the fuzzy matches all that", "tokens": [51500, 291, 603, 412, 1935, + 1391, 483, 364, 1900, 2995, 13, 509, 815, 406, 2028, 365, 264, 34710, 10676, 439, + 300, 51768], "temperature": 0.0, "avg_logprob": -0.16583304935031468, "compression_ratio": + 1.640625, "no_speech_prob": 0.021722517907619476}, {"id": 582, "seek": 380672, "start": + 3806.72, "end": 3811.7599999999998, "text": " well, but you know, like something''s + better than nothing. And then that allows you to start to", "tokens": [50364, 731, + 11, 457, 291, 458, 11, 411, 746, 311, 1101, 813, 1825, 13, 400, 550, 300, 4045, + 291, 281, 722, 281, 50616], "temperature": 0.0, "avg_logprob": -0.16903831647789996, + "compression_ratio": 1.5238095238095237, "no_speech_prob": 0.00046816596295684576}, + {"id": 583, "seek": 380672, "start": 3811.7599999999998, "end": 3819.52, "text": + " iterate on it. Yeah, exactly. So the next question from Q&A panel is from Chris + for the search with", "tokens": [50616, 44497, 322, 309, 13, 865, 11, 2293, 13, + 407, 264, 958, 1168, 490, 1249, 5, 32, 4831, 307, 490, 6688, 337, 264, 3164, 365, + 51004], "temperature": 0.0, "avg_logprob": -0.16903831647789996, "compression_ratio": + 1.5238095238095237, "no_speech_prob": 0.00046816596295684576}, {"id": 584, "seek": + 380672, "start": 3819.52, "end": 3823.9199999999996, "text": " ML course, which + front-end framework are most students using for their projects?", "tokens": [51004, + 21601, 1164, 11, 597, 1868, 12, 521, 8388, 366, 881, 1731, 1228, 337, 641, 4455, + 30, 51224], "temperature": 0.0, "avg_logprob": -0.16903831647789996, "compression_ratio": + 1.5238095238095237, "no_speech_prob": 0.00046816596295684576}, {"id": 585, "seek": + 380672, "start": 3826.24, "end": 3830.72, "text": " Front-end framework feels a + little open-ended to me, but I mean, I can tell.", "tokens": [51340, 17348, 12, + 521, 8388, 3417, 257, 707, 1269, 12, 3502, 281, 385, 11, 457, 286, 914, 11, 286, + 393, 980, 13, 51564], 
"temperature": 0.0, "avg_logprob": -0.16903831647789996, "compression_ratio": + 1.5238095238095237, "no_speech_prob": 0.00046816596295684576}, {"id": 586, "seek": + 383072, "start": 3831.2799999999997, "end": 3836.9599999999996, "text": " So one + of the things we''re doing in both classes is we try to work with a real data set-end + with", "tokens": [50392, 407, 472, 295, 264, 721, 321, 434, 884, 294, 1293, 5359, + 307, 321, 853, 281, 589, 365, 257, 957, 1412, 992, 12, 521, 365, 50676], "temperature": + 0.0, "avg_logprob": -0.11868405085737987, "compression_ratio": 1.662162162162162, + "no_speech_prob": 0.0034896009601652622}, {"id": 587, "seek": 383072, "start": 3836.9599999999996, + "end": 3842.8799999999997, "text": " a real search application. For better or for + worse, we chose not to use notebooks.", "tokens": [50676, 257, 957, 3164, 3861, + 13, 1171, 1101, 420, 337, 5324, 11, 321, 5111, 406, 281, 764, 43782, 13, 50972], + "temperature": 0.0, "avg_logprob": -0.11868405085737987, "compression_ratio": 1.662162162162162, + "no_speech_prob": 0.0034896009601652622}, {"id": 588, "seek": 383072, "start": 3844.3199999999997, + "end": 3848.56, "text": " Notebooks are great for a lot of things, but I don''t + know that they always show you how actual", "tokens": [51044, 11633, 15170, 366, + 869, 337, 257, 688, 295, 721, 11, 457, 286, 500, 380, 458, 300, 436, 1009, 855, + 291, 577, 3539, 51256], "temperature": 0.0, "avg_logprob": -0.11868405085737987, + "compression_ratio": 1.662162162162162, "no_speech_prob": 0.0034896009601652622}, + {"id": 589, "seek": 383072, "start": 3848.56, "end": 3854.08, "text": " applications + work. So we actually build out a really simple application. 
The front-end is like", + "tokens": [51256, 5821, 589, 13, 407, 321, 767, 1322, 484, 257, 534, 2199, 3861, + 13, 440, 1868, 12, 521, 307, 411, 51532], "temperature": 0.0, "avg_logprob": -0.11868405085737987, + "compression_ratio": 1.662162162162162, "no_speech_prob": 0.0034896009601652622}, + {"id": 590, "seek": 385408, "start": 3854.16, "end": 3862.4, "text": " tailwinds, + CSS, and really simple flask serving layer for the APIs. And then we use", "tokens": + [50368, 6838, 12199, 82, 11, 24387, 11, 293, 534, 2199, 932, 3863, 8148, 4583, 337, + 264, 21445, 13, 400, 550, 321, 764, 50780], "temperature": 0.0, "avg_logprob": -0.17761532689484072, + "compression_ratio": 1.5138121546961325, "no_speech_prob": 0.0010503839002922177}, + {"id": 591, "seek": 385408, "start": 3863.52, "end": 3871.52, "text": " open search + for the search engine and things like fast text and a few other things for ML side + of it.", "tokens": [50836, 1269, 3164, 337, 264, 3164, 2848, 293, 721, 411, 2370, + 2487, 293, 257, 1326, 661, 721, 337, 21601, 1252, 295, 309, 13, 51236], "temperature": + 0.0, "avg_logprob": -0.17761532689484072, "compression_ratio": 1.5138121546961325, + "no_speech_prob": 0.0010503839002922177}, {"id": 592, "seek": 385408, "start": 3871.52, + "end": 3877.84, "text": " You know, we use the learning to rank plugin for open + search, trying to think if there''s", "tokens": [51236, 509, 458, 11, 321, 764, + 264, 2539, 281, 6181, 23407, 337, 1269, 3164, 11, 1382, 281, 519, 498, 456, 311, + 51552], "temperature": 0.0, "avg_logprob": -0.17761532689484072, "compression_ratio": + 1.5138121546961325, "no_speech_prob": 0.0010503839002922177}, {"id": 593, "seek": + 387784, "start": 3877.84, "end": 3884.96, "text": " anything else in our stack. 
+ It''s primarily Python, but I think if you were a Java user or any of the", "tokens": + [50364, 1340, 1646, 294, 527, 8630, 13, 467, 311, 10029, 15329, 11, 457, 286, 519, + 498, 291, 645, 257, 10745, 4195, 420, 604, 295, 264, 50720], "temperature": 0.0, + "avg_logprob": -0.09764885646040722, "compression_ratio": 1.590717299578059, "no_speech_prob": + 0.0013443896314129233}, {"id": 594, "seek": 387784, "start": 3884.96, "end": 3891.1200000000003, + "text": " other languages where there''s clients for open search, you would do just + fine in the class.", "tokens": [50720, 661, 8650, 689, 456, 311, 6982, 337, 1269, + 3164, 11, 291, 576, 360, 445, 2489, 294, 264, 1508, 13, 51028], "temperature": 0.0, + "avg_logprob": -0.09764885646040722, "compression_ratio": 1.590717299578059, "no_speech_prob": + 0.0013443896314129233}, {"id": 595, "seek": 387784, "start": 3891.1200000000003, + "end": 3897.76, "text": " You maybe just won''t be able to use all of the Python + capabilities that we have in the class.", "tokens": [51028, 509, 1310, 445, 1582, + 380, 312, 1075, 281, 764, 439, 295, 264, 15329, 10862, 300, 321, 362, 294, 264, + 1508, 13, 51360], "temperature": 0.0, "avg_logprob": -0.09764885646040722, "compression_ratio": + 1.590717299578059, "no_speech_prob": 0.0013443896314129233}, {"id": 596, "seek": + 387784, "start": 3897.76, "end": 3904.2400000000002, "text": " I hope that answers + your question, Chris. The repositories are all at least the base level", "tokens": + [51360, 286, 1454, 300, 6338, 428, 1168, 11, 6688, 13, 440, 22283, 2083, 366, 439, + 412, 1935, 264, 3096, 1496, 51684], "temperature": 0.0, "avg_logprob": -0.09764885646040722, + "compression_ratio": 1.590717299578059, "no_speech_prob": 0.0013443896314129233}, + {"id": 597, "seek": 390424, "start": 3904.24, "end": 3910.8799999999997, "text": + " repositories are all available under my GitHub. 
So you can just go to my GitHub, + which excuse me is", "tokens": [50364, 22283, 2083, 366, 439, 2435, 833, 452, 23331, + 13, 407, 291, 393, 445, 352, 281, 452, 23331, 11, 597, 8960, 385, 307, 50696], "temperature": + 0.0, "avg_logprob": -0.19815291298760307, "compression_ratio": 1.6351931330472103, + "no_speech_prob": 0.013288472779095173}, {"id": 598, "seek": 390424, "start": 3910.8799999999997, + "end": 3919.12, "text": " GSING, ERS, and put that in the chat. And then you can + see the frameworks we use.", "tokens": [50696, 460, 20262, 30237, 11, 462, 43580, + 11, 293, 829, 300, 294, 264, 5081, 13, 400, 550, 291, 393, 536, 264, 29834, 321, + 764, 13, 51108], "temperature": 0.0, "avg_logprob": -0.19815291298760307, "compression_ratio": + 1.6351931330472103, "no_speech_prob": 0.013288472779095173}, {"id": 599, "seek": + 390424, "start": 3920.08, "end": 3926.08, "text": " Yeah, awesome. And I can just, + you know, you can pick these things up or you can, if you know Python,", "tokens": + [51156, 865, 11, 3476, 13, 400, 286, 393, 445, 11, 291, 458, 11, 291, 393, 1888, + 613, 721, 493, 420, 291, 393, 11, 498, 291, 458, 15329, 11, 51456], "temperature": + 0.0, "avg_logprob": -0.19815291298760307, "compression_ratio": 1.6351931330472103, + "no_speech_prob": 0.013288472779095173}, {"id": 600, "seek": 390424, "start": 3926.08, + "end": 3932.9599999999996, "text": " it''s probably easy for you, but if you don''t, + you can pick this up. And the next question is from", "tokens": [51456, 309, 311, + 1391, 1858, 337, 291, 11, 457, 498, 291, 500, 380, 11, 291, 393, 1888, 341, 493, + 13, 400, 264, 958, 1168, 307, 490, 51800], "temperature": 0.0, "avg_logprob": -0.19815291298760307, + "compression_ratio": 1.6351931330472103, "no_speech_prob": 0.013288472779095173}, + {"id": 601, "seek": 393296, "start": 3932.96, "end": 3939.76, "text": " the chat + from quasi, I hope you pronounce your name correctly. 
As these days, most of these", + "tokens": [50364, 264, 5081, 490, 20954, 11, 286, 1454, 291, 19567, 428, 1315, 8944, + 13, 1018, 613, 1708, 11, 881, 295, 613, 50704], "temperature": 0.0, "avg_logprob": + -0.20160731402310458, "compression_ratio": 1.6092436974789917, "no_speech_prob": + 0.00234916596673429}, {"id": 602, "seek": 393296, "start": 3939.76, "end": 3945.52, + "text": " sort of approaches are based on transformers for anyone who wants to try + out IR approach using", "tokens": [50704, 1333, 295, 11587, 366, 2361, 322, 4088, + 433, 337, 2878, 567, 2738, 281, 853, 484, 16486, 3109, 1228, 50992], "temperature": + 0.0, "avg_logprob": -0.20160731402310458, "compression_ratio": 1.6092436974789917, + "no_speech_prob": 0.00234916596673429}, {"id": 603, "seek": 393296, "start": 3945.52, + "end": 3951.84, "text": " transformers as a pet project. Does grant have any recommendations + in terms of cloud services tools?", "tokens": [50992, 4088, 433, 382, 257, 3817, + 1716, 13, 4402, 6386, 362, 604, 10434, 294, 2115, 295, 4588, 3328, 3873, 30, 51308], + "temperature": 0.0, "avg_logprob": -0.20160731402310458, "compression_ratio": 1.6092436974789917, + "no_speech_prob": 0.00234916596673429}, {"id": 604, "seek": 393296, "start": 3954.08, + "end": 3960.0, "text": " I don''t have any specific recommendations. I know I''ve + looked at there''s several players. 
I was", "tokens": [51420, 286, 500, 380, 362, + 604, 2685, 10434, 13, 286, 458, 286, 600, 2956, 412, 456, 311, 2940, 4150, 13, 286, + 390, 51716], "temperature": 0.0, "avg_logprob": -0.20160731402310458, "compression_ratio": + 1.6092436974789917, "no_speech_prob": 0.00234916596673429}, {"id": 605, "seek": + 396000, "start": 3960.0, "end": 3966.72, "text": " so for instance, I saw somebody + in one of the IR communities that I was in with posted around,", "tokens": [50364, + 370, 337, 5197, 11, 286, 1866, 2618, 294, 472, 295, 264, 16486, 4456, 300, 286, + 390, 294, 365, 9437, 926, 11, 50700], "temperature": 0.0, "avg_logprob": -0.24450718626684073, + "compression_ratio": 1.5578512396694215, "no_speech_prob": 0.001972643891349435}, + {"id": 606, "seek": 396000, "start": 3966.72, "end": 3971.28, "text": " I think + I don''t know how he''s pronounced about quadrant, I think QDR and T. I know there''s", + "tokens": [50700, 286, 519, 286, 500, 380, 458, 577, 415, 311, 23155, 466, 46856, + 11, 286, 519, 1249, 35, 49, 293, 314, 13, 286, 458, 456, 311, 50928], "temperature": + 0.0, "avg_logprob": -0.24450718626684073, "compression_ratio": 1.5578512396694215, + "no_speech_prob": 0.001972643891349435}, {"id": 607, "seek": 396000, "start": 3971.28, + "end": 3979.04, "text": " UVA, I know there''s pine cone, elastic, solar, and open + search all have dense vector retrieval", "tokens": [50928, 17887, 32, 11, 286, 458, + 456, 311, 15113, 19749, 11, 17115, 11, 7936, 11, 293, 1269, 3164, 439, 362, 18011, + 8062, 19817, 3337, 51316], "temperature": 0.0, "avg_logprob": -0.24450718626684073, + "compression_ratio": 1.5578512396694215, "no_speech_prob": 0.001972643891349435}, + {"id": 608, "seek": 396000, "start": 3979.04, "end": 3986.0, "text": " capabilities. + I''ve been playing around just getting started with hugging face. 
I''m a little + late", "tokens": [51316, 10862, 13, 286, 600, 668, 2433, 926, 445, 1242, 1409, 365, + 41706, 1851, 13, 286, 478, 257, 707, 3469, 51664], "temperature": 0.0, "avg_logprob": + -0.24450718626684073, "compression_ratio": 1.5578512396694215, "no_speech_prob": + 0.001972643891349435}, {"id": 609, "seek": 398600, "start": 3986.0, "end": 3991.04, + "text": " to the hugging face game when it comes to these things. I know a lot of + people I talk to use", "tokens": [50364, 281, 264, 41706, 1851, 1216, 562, 309, + 1487, 281, 613, 721, 13, 286, 458, 257, 688, 295, 561, 286, 751, 281, 764, 50616], + "temperature": 0.0, "avg_logprob": -0.13947213556348664, "compression_ratio": 1.5938864628820961, + "no_speech_prob": 0.005920142401009798}, {"id": 610, "seek": 398600, "start": 3991.04, + "end": 3998.96, "text": " colab to build and run these systems. And so I think you + can probably get started. Again,", "tokens": [50616, 1173, 455, 281, 1322, 293, + 1190, 613, 3652, 13, 400, 370, 286, 519, 291, 393, 1391, 483, 1409, 13, 3764, 11, + 51012], "temperature": 0.0, "avg_logprob": -0.13947213556348664, "compression_ratio": + 1.5938864628820961, "no_speech_prob": 0.005920142401009798}, {"id": 611, "seek": + 398600, "start": 3998.96, "end": 4004.08, "text": " like Demetri, you may have better + tutorials. I know you''ve posted a bunch of stuff on medium", "tokens": [51012, + 411, 4686, 302, 470, 11, 291, 815, 362, 1101, 17616, 13, 286, 458, 291, 600, 9437, + 257, 3840, 295, 1507, 322, 6399, 51268], "temperature": 0.0, "avg_logprob": -0.13947213556348664, + "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.005920142401009798}, + {"id": 612, "seek": 398600, "start": 4004.08, "end": 4009.52, "text": " amount, + how to get started in this as have other people. 
So I would start there, I guess,", + "tokens": [51268, 2372, 11, 577, 281, 483, 1409, 294, 341, 382, 362, 661, 561, 13, + 407, 286, 576, 722, 456, 11, 286, 2041, 11, 51540], "temperature": 0.0, "avg_logprob": + -0.13947213556348664, "compression_ratio": 1.5938864628820961, "no_speech_prob": + 0.005920142401009798}, {"id": 613, "seek": 400952, "start": 4010.16, "end": 4016.32, + "text": " any one of those you probably won''t do wrong with. And then for me, I + always go back to,", "tokens": [50396, 604, 472, 295, 729, 291, 1391, 1582, 380, + 360, 2085, 365, 13, 400, 550, 337, 385, 11, 286, 1009, 352, 646, 281, 11, 50704], + "temperature": 0.0, "avg_logprob": -0.13880052773848825, "compression_ratio": 1.819905213270142, + "no_speech_prob": 0.004882677458226681}, {"id": 614, "seek": 400952, "start": 4016.32, + "end": 4022.64, "text": " like I like to take a data set that I''m familiar with + first rather than a technology that I''m", "tokens": [50704, 411, 286, 411, 281, + 747, 257, 1412, 992, 300, 286, 478, 4963, 365, 700, 2831, 813, 257, 2899, 300, 286, + 478, 51020], "temperature": 0.0, "avg_logprob": -0.13880052773848825, "compression_ratio": + 1.819905213270142, "no_speech_prob": 0.004882677458226681}, {"id": 615, "seek": + 400952, "start": 4022.64, "end": 4029.12, "text": " unfamiliar with. 
Whenever I''m + learning something new, I start with something I''m familiar with and", "tokens": + [51020, 29415, 365, 13, 14159, 286, 478, 2539, 746, 777, 11, 286, 722, 365, 746, + 286, 478, 4963, 365, 293, 51344], "temperature": 0.0, "avg_logprob": -0.13880052773848825, + "compression_ratio": 1.819905213270142, "no_speech_prob": 0.004882677458226681}, + {"id": 616, "seek": 400952, "start": 4029.12, "end": 4037.92, "text": " then try + to apply that thing to the new technology as opposed to picking the technology first + and then", "tokens": [51344, 550, 853, 281, 3079, 300, 551, 281, 264, 777, 2899, + 382, 8851, 281, 8867, 264, 2899, 700, 293, 550, 51784], "temperature": 0.0, "avg_logprob": + -0.13880052773848825, "compression_ratio": 1.819905213270142, "no_speech_prob": + 0.004882677458226681}, {"id": 617, "seek": 403792, "start": 4038.56, "end": 4043.84, + "text": " trying to, you know, kind of go back and forth between the tutorials that + they provide. But", "tokens": [50396, 1382, 281, 11, 291, 458, 11, 733, 295, 352, + 646, 293, 5220, 1296, 264, 17616, 300, 436, 2893, 13, 583, 50660], "temperature": + 0.0, "avg_logprob": -0.09673656116832387, "compression_ratio": 1.7318840579710144, + "no_speech_prob": 0.0014678132720291615}, {"id": 618, "seek": 403792, "start": 4043.84, + "end": 4049.2000000000003, "text": " I always like to go back to a domain I''m familiar + with because then I don''t have to rebuild my", "tokens": [50660, 286, 1009, 411, + 281, 352, 646, 281, 257, 9274, 286, 478, 4963, 365, 570, 550, 286, 500, 380, 362, + 281, 16877, 452, 50928], "temperature": 0.0, "avg_logprob": -0.09673656116832387, + "compression_ratio": 1.7318840579710144, "no_speech_prob": 0.0014678132720291615}, + {"id": 619, "seek": 403792, "start": 4049.2000000000003, "end": 4053.92, "text": + " intuition. Right. 
So for instance, I''ve never really done image search, but I''ve + done e-commerce", "tokens": [50928, 24002, 13, 1779, 13, 407, 337, 5197, 11, 286, + 600, 1128, 534, 1096, 3256, 3164, 11, 457, 286, 600, 1096, 308, 12, 26926, 51164], + "temperature": 0.0, "avg_logprob": -0.09673656116832387, "compression_ratio": 1.7318840579710144, + "no_speech_prob": 0.0014678132720291615}, {"id": 620, "seek": 403792, "start": 4053.92, + "end": 4060.8, "text": " search all the time. So it makes way more sense for me + to try out transformers with e-commerce", "tokens": [51164, 3164, 439, 264, 565, + 13, 407, 309, 1669, 636, 544, 2020, 337, 385, 281, 853, 484, 4088, 433, 365, 308, + 12, 26926, 51508], "temperature": 0.0, "avg_logprob": -0.09673656116832387, "compression_ratio": + 1.7318840579710144, "no_speech_prob": 0.0014678132720291615}, {"id": 621, "seek": + 403792, "start": 4060.8, "end": 4067.28, "text": " than it does with images just + because I don''t know the core intuition as much on the images as I do", "tokens": + [51508, 813, 309, 775, 365, 5267, 445, 570, 286, 500, 380, 458, 264, 4965, 24002, + 382, 709, 322, 264, 5267, 382, 286, 360, 51832], "temperature": 0.0, "avg_logprob": + -0.09673656116832387, "compression_ratio": 1.7318840579710144, "no_speech_prob": + 0.0014678132720291615}, {"id": 622, "seek": 406728, "start": 4067.28, "end": 4074.4, + "text": " for e-commerce. So I would probably start that way first. Yeah, I agree. 
+ And another thing,", "tokens": [50364, 337, 308, 12, 26926, 13, 407, 286, 576, 1391, + 722, 300, 636, 700, 13, 865, 11, 286, 3986, 13, 400, 1071, 551, 11, 50720], "temperature": + 0.0, "avg_logprob": -0.1799041863643762, "compression_ratio": 1.5346938775510204, + "no_speech_prob": 0.006555759813636541}, {"id": 623, "seek": 406728, "start": 4074.4, + "end": 4079.2000000000003, "text": " yeah, of course, Grant, you thank you, you + mentioned, you know, my medium blog post, there are", "tokens": [50720, 1338, 11, + 295, 1164, 11, 17529, 11, 291, 1309, 291, 11, 291, 2835, 11, 291, 458, 11, 452, + 6399, 6968, 2183, 11, 456, 366, 50960], "temperature": 0.0, "avg_logprob": -0.1799041863643762, + "compression_ratio": 1.5346938775510204, "no_speech_prob": 0.006555759813636541}, + {"id": 624, "seek": 406728, "start": 4079.2000000000003, "end": 4084.32, "text": + " a lot more people blogging on this, but I have a specific collection on medium, + 37 minutes", "tokens": [50960, 257, 688, 544, 561, 6968, 3249, 322, 341, 11, 457, + 286, 362, 257, 2685, 5765, 322, 6399, 11, 13435, 2077, 51216], "temperature": 0.0, + "avg_logprob": -0.1799041863643762, "compression_ratio": 1.5346938775510204, "no_speech_prob": + 0.006555759813636541}, {"id": 625, "seek": 406728, "start": 4084.96, "end": 4091.92, + "text": " by sheer reading time. 
You can go through like basics like exact can and + search all the way up to,", "tokens": [51248, 538, 23061, 3760, 565, 13, 509, 393, + 352, 807, 411, 14688, 411, 1900, 393, 293, 3164, 439, 264, 636, 493, 281, 11, 51596], + "temperature": 0.0, "avg_logprob": -0.1799041863643762, "compression_ratio": 1.5346938775510204, + "no_speech_prob": 0.006555759813636541}, {"id": 626, "seek": 409192, "start": 4092.8, + "end": 4097.76, "text": " you know, neural retrieval, which is approximate nearest + neighbor search because you cannot do", "tokens": [50408, 291, 458, 11, 18161, 19817, + 3337, 11, 597, 307, 30874, 23831, 5987, 3164, 570, 291, 2644, 360, 50656], "temperature": + 0.0, "avg_logprob": -0.1365921762254503, "compression_ratio": 1.6125, "no_speech_prob": + 0.003066863864660263}, {"id": 627, "seek": 409192, "start": 4097.76, "end": 4104.8, + "text": " exact can and search at scale. It will just not not scale. So you have + to kind of go and cut some", "tokens": [50656, 1900, 393, 293, 3164, 412, 4373, + 13, 467, 486, 445, 406, 406, 4373, 13, 407, 291, 362, 281, 733, 295, 352, 293, 1723, + 512, 51008], "temperature": 0.0, "avg_logprob": -0.1365921762254503, "compression_ratio": + 1.6125, "no_speech_prob": 0.003066863864660263}, {"id": 628, "seek": 409192, "start": + 4104.8, "end": 4109.68, "text": " corners, so to say, but actually in a more mathematical + sense, you create this algorithms that", "tokens": [51008, 12413, 11, 370, 281, + 584, 11, 457, 767, 294, 257, 544, 18894, 2020, 11, 291, 1884, 341, 14642, 300, 51252], + "temperature": 0.0, "avg_logprob": -0.1365921762254503, "compression_ratio": 1.6125, + "no_speech_prob": 0.003066863864660263}, {"id": 629, "seek": 409192, "start": 4109.68, + "end": 4117.2, "text": " are beautifully handling this complexity for you. So go + check it out. 
I think the next and probably", "tokens": [51252, 366, 16525, 13175, + 341, 14024, 337, 291, 13, 407, 352, 1520, 309, 484, 13, 286, 519, 264, 958, 293, + 1391, 51628], "temperature": 0.0, "avg_logprob": -0.1365921762254503, "compression_ratio": + 1.6125, "no_speech_prob": 0.003066863864660263}, {"id": 630, "seek": 411720, "start": + 4117.28, "end": 4124.72, "text": " last question, but not least, is from a shish, + is the search with a mail course right to step into", "tokens": [50368, 1036, 1168, + 11, 457, 406, 1935, 11, 307, 490, 257, 402, 742, 11, 307, 264, 3164, 365, 257, 10071, + 1164, 558, 281, 1823, 666, 50740], "temperature": 0.0, "avg_logprob": -0.1991677235082253, + "compression_ratio": 1.5875, "no_speech_prob": 0.006210990250110626}, {"id": 631, + "seek": 411720, "start": 4124.72, "end": 4131.28, "text": " if I''m looking to learn + about semantic search and add the functionality to SQL or no SQL databases?", "tokens": + [50740, 498, 286, 478, 1237, 281, 1466, 466, 47982, 3164, 293, 909, 264, 14980, + 281, 19200, 420, 572, 19200, 22380, 30, 51068], "temperature": 0.0, "avg_logprob": + -0.1991677235082253, "compression_ratio": 1.5875, "no_speech_prob": 0.006210990250110626}, + {"id": 632, "seek": 411720, "start": 4133.28, "end": 4136.88, "text": " That''s + an interesting question. I guess I haven''t thought about it in that sense. 
I mean, + I think,", "tokens": [51168, 663, 311, 364, 1880, 1168, 13, 286, 2041, 286, 2378, + 380, 1194, 466, 309, 294, 300, 2020, 13, 286, 914, 11, 286, 519, 11, 51348], "temperature": + 0.0, "avg_logprob": -0.1991677235082253, "compression_ratio": 1.5875, "no_speech_prob": + 0.006210990250110626}, {"id": 633, "seek": 411720, "start": 4139.36, "end": 4146.96, + "text": " you know, I think a lot of the techniques we use in the ML class relate + to semantic", "tokens": [51472, 291, 458, 11, 286, 519, 257, 688, 295, 264, 7512, + 321, 764, 294, 264, 21601, 1508, 10961, 281, 47982, 51852], "temperature": 0.0, + "avg_logprob": -0.1991677235082253, "compression_ratio": 1.5875, "no_speech_prob": + 0.006210990250110626}, {"id": 634, "seek": 414696, "start": 4146.96, "end": 4154.24, + "text": " search and relate to like how can we get better relevance out of the engine? + So semantic search being", "tokens": [50364, 3164, 293, 10961, 281, 411, 577, 393, + 321, 483, 1101, 32684, 484, 295, 264, 2848, 30, 407, 47982, 3164, 885, 50728], "temperature": + 0.0, "avg_logprob": -0.1753984769185384, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.0009869193891063333}, {"id": 635, "seek": 414696, "start": 4154.24, + "end": 4159.76, "text": " one of those types of capabilities, a kind of semantic + search often is a pretty loaded phrase. So", "tokens": [50728, 472, 295, 729, 3467, + 295, 10862, 11, 257, 733, 295, 47982, 3164, 2049, 307, 257, 1238, 13210, 9535, 13, + 407, 51004], "temperature": 0.0, "avg_logprob": -0.1753984769185384, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0009869193891063333}, {"id": 636, "seek": + 414696, "start": 4159.76, "end": 4167.12, "text": " depending on what you''re trying + to do there, as you should your mileage may vary. 
But we certainly", "tokens": [51004, + 5413, 322, 437, 291, 434, 1382, 281, 360, 456, 11, 382, 291, 820, 428, 43121, 815, + 10559, 13, 583, 321, 3297, 51372], "temperature": 0.0, "avg_logprob": -0.1753984769185384, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0009869193891063333}, + {"id": 637, "seek": 414696, "start": 4167.12, "end": 4173.76, "text": " cover things + like classifying your content, classifying your queries. We do learning to rank.", + "tokens": [51372, 2060, 721, 411, 1508, 5489, 428, 2701, 11, 1508, 5489, 428, 24109, + 13, 492, 360, 2539, 281, 6181, 13, 51704], "temperature": 0.0, "avg_logprob": -0.1753984769185384, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0009869193891063333}, + {"id": 638, "seek": 417376, "start": 4174.72, "end": 4182.64, "text": " We talk + about synonym expansion query, you know, smarter queries, better filters, all of + those kinds of", "tokens": [50412, 492, 751, 466, 5451, 12732, 11260, 14581, 11, + 291, 458, 11, 20294, 24109, 11, 1101, 15995, 11, 439, 295, 729, 3685, 295, 50808], + "temperature": 0.0, "avg_logprob": -0.16645102751882454, "compression_ratio": 1.574468085106383, + "no_speech_prob": 0.001611536368727684}, {"id": 639, "seek": 417376, "start": 4182.64, + "end": 4188.72, "text": " things, I think fall can be loosely coupled into semantic + search. 
If you''re talking more like you", "tokens": [50808, 721, 11, 286, 519, + 2100, 393, 312, 37966, 29482, 666, 47982, 3164, 13, 759, 291, 434, 1417, 544, 411, + 291, 51112], "temperature": 0.0, "avg_logprob": -0.16645102751882454, "compression_ratio": + 1.574468085106383, "no_speech_prob": 0.001611536368727684}, {"id": 640, "seek": + 417376, "start": 4188.72, "end": 4198.56, "text": " want to do like graph-based + inferences or, you know, using things like wiki data or dbpedia or", "tokens": [51112, + 528, 281, 360, 411, 4295, 12, 6032, 13596, 2667, 420, 11, 291, 458, 11, 1228, 721, + 411, 261, 9850, 1412, 420, 274, 65, 3452, 654, 420, 51604], "temperature": 0.0, + "avg_logprob": -0.16645102751882454, "compression_ratio": 1.574468085106383, "no_speech_prob": + 0.001611536368727684}, {"id": 641, "seek": 419856, "start": 4198.56, "end": 4204.64, + "text": " those kinds of things to infer relationships and do semantic search that + way. We don''t really", "tokens": [50364, 729, 3685, 295, 721, 281, 13596, 6159, + 293, 360, 47982, 3164, 300, 636, 13, 492, 500, 380, 534, 50668], "temperature": + 0.0, "avg_logprob": -0.1109636671402875, "compression_ratio": 1.5274725274725274, + "no_speech_prob": 0.000490899255964905}, {"id": 642, "seek": 419856, "start": 4204.64, + "end": 4212.4800000000005, "text": " cover those as much. 
We do base off of open + search, but I think the concepts apply in general.", "tokens": [50668, 2060, 729, + 382, 709, 13, 492, 360, 3096, 766, 295, 1269, 3164, 11, 457, 286, 519, 264, 10392, + 3079, 294, 2674, 13, 51060], "temperature": 0.0, "avg_logprob": -0.1109636671402875, + "compression_ratio": 1.5274725274725274, "no_speech_prob": 0.000490899255964905}, + {"id": 643, "seek": 419856, "start": 4213.68, "end": 4221.6, "text": " With the + SQL and no SQL databases, like I know a lot of them have kind of baseline search", + "tokens": [51120, 2022, 264, 19200, 293, 572, 19200, 22380, 11, 411, 286, 458, 257, + 688, 295, 552, 362, 733, 295, 20518, 3164, 51516], "temperature": 0.0, "avg_logprob": + -0.1109636671402875, "compression_ratio": 1.5274725274725274, "no_speech_prob": + 0.000490899255964905}, {"id": 644, "seek": 422160, "start": 4221.6, "end": 4229.360000000001, + "text": " functionality in them. And so you would be able to apply some of the principles + because a lot of", "tokens": [50364, 14980, 294, 552, 13, 400, 370, 291, 576, 312, + 1075, 281, 3079, 512, 295, 264, 9156, 570, 257, 688, 295, 50752], "temperature": + 0.0, "avg_logprob": -0.10869394322877289, "compression_ratio": 1.7130044843049328, + "no_speech_prob": 0.007272040005773306}, {"id": 645, "seek": 422160, "start": 4229.360000000001, + "end": 4238.240000000001, "text": " the principles we do, you actually do either + before indexing or before querying. 
So those would", "tokens": [50752, 264, 9156, + 321, 360, 11, 291, 767, 360, 2139, 949, 8186, 278, 420, 949, 7083, 1840, 13, 407, + 729, 576, 51196], "temperature": 0.0, "avg_logprob": -0.10869394322877289, "compression_ratio": + 1.7130044843049328, "no_speech_prob": 0.007272040005773306}, {"id": 646, "seek": + 422160, "start": 4238.240000000001, "end": 4243.280000000001, "text": " certainly + apply, you know, because at the end of the day, you''re just using those things + to then", "tokens": [51196, 3297, 3079, 11, 291, 458, 11, 570, 412, 264, 917, 295, + 264, 786, 11, 291, 434, 445, 1228, 729, 721, 281, 550, 51448], "temperature": 0.0, + "avg_logprob": -0.10869394322877289, "compression_ratio": 1.7130044843049328, "no_speech_prob": + 0.007272040005773306}, {"id": 647, "seek": 422160, "start": 4243.280000000001, "end": + 4249.52, "text": " generate a better query or a better document to be stored in + your engine. And so I don''t see", "tokens": [51448, 8460, 257, 1101, 14581, 420, + 257, 1101, 4166, 281, 312, 12187, 294, 428, 2848, 13, 400, 370, 286, 500, 380, 536, + 51760], "temperature": 0.0, "avg_logprob": -0.10869394322877289, "compression_ratio": + 1.7130044843049328, "no_speech_prob": 0.007272040005773306}, {"id": 648, "seek": + 424952, "start": 4249.68, "end": 4254.96, "text": " your reason why they went work + in a no-SQL store or a SQL store. It''s just then how do you", "tokens": [50372, + 428, 1778, 983, 436, 1437, 589, 294, 257, 572, 12, 39934, 3531, 420, 257, 19200, + 3531, 13, 467, 311, 445, 550, 577, 360, 291, 50636], "temperature": 0.0, "avg_logprob": + -0.16992644893312916, "compression_ratio": 1.5809128630705394, "no_speech_prob": + 0.005734000355005264}, {"id": 649, "seek": 424952, "start": 4254.96, "end": 4262.160000000001, + "text": " translate that into your query language, right? But we do use open search. 
+ All the examples are", "tokens": [50636, 13799, 300, 666, 428, 14581, 2856, 11, + 558, 30, 583, 321, 360, 764, 1269, 3164, 13, 1057, 264, 5110, 366, 50996], "temperature": + 0.0, "avg_logprob": -0.16992644893312916, "compression_ratio": 1.5809128630705394, + "no_speech_prob": 0.005734000355005264}, {"id": 650, "seek": 424952, "start": 4262.160000000001, + "end": 4268.4800000000005, "text": " open search. You would have to do the work + to leap to that, whatever it is your engine is doing.", "tokens": [50996, 1269, + 3164, 13, 509, 576, 362, 281, 360, 264, 589, 281, 19438, 281, 300, 11, 2035, 309, + 307, 428, 2848, 307, 884, 13, 51312], "temperature": 0.0, "avg_logprob": -0.16992644893312916, + "compression_ratio": 1.5809128630705394, "no_speech_prob": 0.005734000355005264}, + {"id": 651, "seek": 424952, "start": 4269.6, "end": 4275.360000000001, "text": " + Yeah, absolutely. And the good thing is that open search does have a K&N plugin. + They call it K&N", "tokens": [51368, 865, 11, 3122, 13, 400, 264, 665, 551, 307, + 300, 1269, 3164, 775, 362, 257, 591, 5, 45, 23407, 13, 814, 818, 309, 591, 5, 45, + 51656], "temperature": 0.0, "avg_logprob": -0.16992644893312916, "compression_ratio": + 1.5809128630705394, "no_speech_prob": 0.005734000355005264}, {"id": 652, "seek": + 427536, "start": 4275.36, "end": 4280.719999999999, "text": " plugin, but it''s + actually approximate nearest neighbor search. And so it''s off heap for those", + "tokens": [50364, 23407, 11, 457, 309, 311, 767, 30874, 23831, 5987, 3164, 13, 400, + 370, 309, 311, 766, 33591, 337, 729, 50632], "temperature": 0.0, "avg_logprob": + -0.11169684926668803, "compression_ratio": 1.531496062992126, "no_speech_prob": + 0.019404180347919464}, {"id": 653, "seek": 427536, "start": 4280.719999999999, "end": + 4288.32, "text": " who care. 
So it''s not inside Java, but it still allows you to + get a feel of how neural search will", "tokens": [50632, 567, 1127, 13, 407, 309, + 311, 406, 1854, 10745, 11, 457, 309, 920, 4045, 291, 281, 483, 257, 841, 295, 577, + 18161, 3164, 486, 51012], "temperature": 0.0, "avg_logprob": -0.11169684926668803, + "compression_ratio": 1.531496062992126, "no_speech_prob": 0.019404180347919464}, + {"id": 654, "seek": 427536, "start": 4288.32, "end": 4294.08, "text": " influence + your results at. And you can also, you know, mix and match, sort of using more traditional", + "tokens": [51012, 6503, 428, 3542, 412, 13, 400, 291, 393, 611, 11, 291, 458, 11, + 2890, 293, 2995, 11, 1333, 295, 1228, 544, 5164, 51300], "temperature": 0.0, "avg_logprob": + -0.11169684926668803, "compression_ratio": 1.531496062992126, "no_speech_prob": + 0.019404180347919464}, {"id": 655, "seek": 427536, "start": 4294.08, "end": 4300.48, + "text": " VM25 with this. Awesome. This was the last question. Thanks so much to + everyone who asked their", "tokens": [51300, 18038, 6074, 365, 341, 13, 10391, 13, + 639, 390, 264, 1036, 1168, 13, 2561, 370, 709, 281, 1518, 567, 2351, 641, 51620], + "temperature": 0.0, "avg_logprob": -0.11169684926668803, "compression_ratio": 1.531496062992126, + "no_speech_prob": 0.019404180347919464}, {"id": 656, "seek": 430048, "start": 4300.48, + "end": 4307.12, "text": " questions live. And, you know, consider joining the course + if you haven''t yet. And, Grant,", "tokens": [50364, 1651, 1621, 13, 400, 11, 291, + 458, 11, 1949, 5549, 264, 1164, 498, 291, 2378, 380, 1939, 13, 400, 11, 17529, 11, + 50696], "temperature": 0.0, "avg_logprob": -0.17734616994857788, "compression_ratio": + 1.6394849785407726, "no_speech_prob": 0.003930052742362022}, {"id": 657, "seek": + 430048, "start": 4307.12, "end": 4312.879999999999, "text": " thanks so much for + this session and for answering the question and sharing your wisdom. 
I''ve enjoyed", + "tokens": [50696, 3231, 370, 709, 337, 341, 5481, 293, 337, 13430, 264, 1168, 293, + 5414, 428, 10712, 13, 286, 600, 4626, 50984], "temperature": 0.0, "avg_logprob": + -0.17734616994857788, "compression_ratio": 1.6394849785407726, "no_speech_prob": + 0.003930052742362022}, {"id": 658, "seek": 430048, "start": 4312.879999999999, "end": + 4319.2, "text": " this conversation very much. Thank you. Thanks so much for having + me, Dmitry, and keep up the great", "tokens": [50984, 341, 3761, 588, 709, 13, 1044, + 291, 13, 2561, 370, 709, 337, 1419, 385, 11, 413, 3508, 627, 11, 293, 1066, 493, + 264, 869, 51300], "temperature": 0.0, "avg_logprob": -0.17734616994857788, "compression_ratio": + 1.6394849785407726, "no_speech_prob": 0.003930052742362022}, {"id": 659, "seek": + 430048, "start": 4319.2, "end": 4325.759999999999, "text": " work. I love the podcast. + And it''s awesome to see a search dedicated podcast out there. So", "tokens": [51300, + 589, 13, 286, 959, 264, 7367, 13, 400, 309, 311, 3476, 281, 536, 257, 3164, 8374, + 7367, 484, 456, 13, 407, 51628], "temperature": 0.0, "avg_logprob": -0.17734616994857788, + "compression_ratio": 1.6394849785407726, "no_speech_prob": 0.003930052742362022}, + {"id": 660, "seek": 432576, "start": 4326.400000000001, "end": 4331.76, "text": + " congrats and good luck with that. Thank you so much. All right. Bye-bye. Bye, + folks.", "tokens": [50396, 8882, 1720, 293, 665, 3668, 365, 300, 13, 1044, 291, + 370, 709, 13, 1057, 558, 13, 4621, 12, 6650, 13, 4621, 11, 4024, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.25870293837327224, "compression_ratio": 1.1734693877551021, + "no_speech_prob": 0.0075907097198069096}, {"id": 661, "seek": 432576, "start": 4332.72, + "end": 4341.68, "text": " Thanks, Dmitry. 
Thanks, Grant.", "tokens": [50712, 2561, + 11, 413, 3508, 627, 13, 2561, 11, 17529, 13, 51160], "temperature": 0.0, "avg_logprob": + -0.25870293837327224, "compression_ratio": 1.1734693877551021, "no_speech_prob": + 0.0075907097198069096}, {"id": 662, "seek": 435576, "start": 4355.76, "end": 4357.72, + "text": " All right.", "tokens": [50364, 1057, 558, 13, 50462], "temperature": 1.0, + "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, "no_speech_prob": + 0.5679353475570679}, {"id": 663, "seek": 435576, "start": 4357.84, "end": 4358.76, + "text": " Dmitry.", "tokens": [50468, 413, 3508, 627, 13, 50514], "temperature": + 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 664, "seek": 435576, "start": 4361.08, + "end": 4361.56, "text": " Right, now.", "tokens": [50630, 1779, 11, 586, 13, 50654], + "temperature": 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 665, "seek": 435576, "start": 4361.8, + "end": 4362.16, "text": " Two.", "tokens": [50666, 4453, 13, 50684], "temperature": + 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 666, "seek": 435576, "start": 4362.54, + "end": 4362.8, "text": " Two.", "tokens": [50703, 4453, 13, 50716], "temperature": + 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 667, "seek": 435576, "start": 4362.8, + "end": 4364.34, "text": " Until eight.", "tokens": [50716, 9088, 3180, 13, 50793], + "temperature": 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 668, "seek": 435576, "start": 4364.5, + "end": 4365.8, "text": " One.", "tokens": [50801, 1485, 13, 50866], "temperature": + 1.0, "avg_logprob": -3.493879631384095, 
"compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 669, "seek": 435576, "start": 4365.8, + "end": 4366.64, "text": " Four.", "tokens": [50866, 7451, 13, 50908], "temperature": + 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": 1.2868852459016393, + "no_speech_prob": 0.5679353475570679}, {"id": 670, "seek": 435576, "start": 4366.9400000000005, + "end": 4374.860000000001, "text": " Okay, let''s get back to what you want and now + we can direct that back to the millions.", "tokens": [50923, 1033, 11, 718, 311, + 483, 646, 281, 437, 291, 528, 293, 586, 321, 393, 2047, 300, 646, 281, 264, 6803, + 13, 51319], "temperature": 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": + 1.2868852459016393, "no_speech_prob": 0.5679353475570679}, {"id": 671, "seek": 435576, + "start": 4375.64, "end": 4376.24, "text": " Okay.", "tokens": [51358, 1033, 13, + 51388], "temperature": 1.0, "avg_logprob": -3.493879631384095, "compression_ratio": + 1.2868852459016393, "no_speech_prob": 0.5679353475570679}]' +--- + +Hello there, vector podcast is here. I'm Dimitri Khan and I'll be hosting this session. And just a few words on the logistics. Everyone in the audience feel free to submit your questions either through a Q&A panel or directly in the chat and we will try to handle as many questions as we can. +I'll save you words about core eyes. What's core eyes is a new education platform that transforms the way professionals build technical high demand skills through top industry instructors and collective peer learning. +And the format of their courses is innovative mixing live instructor sessions with real world projects and fireside chats like this one technique with operators who experts in their fields. +I will say a few words about myself as well, untraditionally on the podcast, but I think it becomes a tradition now second time. I said and Dimitri Khan I have a PhD in natural language processing. 
I've worked at a company called AlphaSense and helped to build the search stack.
+I spent about a decade there. I've been a principal AI scientist at Silo AI, an AI consulting company focusing on a number of ML verticals. And recently I joined TomTom as a senior product manager working on search. I've also been a contributor to and user of Quepid.
+It's a query rating tool; go check it out. It's an open source tool. So overall I've spent about 16 years developing search engines for startups and multinational technology giants. I also happen to be hosting this podcast, Vector Podcast; go check it out. I'll share the link in a second.
+And I'm also blogging on Medium about my findings in vector search. So you might hear me talking about vector search here and there. And today I'm super excited to have Grant Ingersoll with me. I've known Grant since about 2011.
+Not personally, but I've seen him on stage at, you know, the Berlin Buzzwords conference and Lucene/Solr Revolution. And he has been a long-time contributor in open source as well: Solr, Lucene, Mahout and others. And a very, very effective presenter.
+I just watched a few of his presentations as homework for this session. There will be some questions from there. But hey Grant, let's start with an introduction in your own words.
+Hey Dmitry, and thank you so much for having me on the Vector Podcast, and obviously props to CoRise here as well for helping sponsor this. Both Daniel Tunkelang and I are on the CoRise platform and really enjoying our time there. So real quick about myself: as you said, my name is Grant Ingersoll.
+I guess these days I'm a long-standing user, contributor and committer, and generally somebody who participates in the search space, if you will. I think I wrote my first Lucene code back in 2004 or so. I guess that maybe makes me old.
+As far as my background goes, I was one of the co-founders of Lucidworks, which is one of the leading companies in the search space.
I then left them in 2019 to become the chief technology officer at the Wikimedia Foundation.
+You probably know them better as the nonprofit behind Wikipedia and Wikidata. So I was the CTO there for two years. And then in August or so of 2021, I took some time off. And then in January of 2022, I went out on my own as a consultant and an instructor for CoRise. So here we are now.
+I am commonly doing work in what I would call fractional CTO land, which means I primarily help companies get their technology stack in order, make decisions about technology, hire teams, upgrade teams, do all the things that a CTO would do, often for small businesses and/or startups.
+And so that's really my background. Really happy to be here and looking forward to the podcast. Awesome, great to have you here, Grant. And also, you know, I finally have a chance to ask some questions and chat with you in this cozy atmosphere as well. And I wanted to start with a question.
+So I was watching a short interview you gave during Berlin Buzzwords 2016, where you said how you split your time as the then CTO of, I believe, Lucidworks. You said that you split your time between three C's: writing code, going to conferences and talking to customers.
+Now that you're independent, is this how you spend your time, or did you get some new letters of the alphabet? Yeah.
+Yeah, and there's often in there as well colleagues and co-workers, you know. Especially, you know, the CTO role is kind of a funny one, right? Depending on the company, it can mean a lot of different things. At some companies, CTOs are entirely outward facing.
+It's effectively a sales role or a marketing role, right? You're out evangelizing the product, you're talking to customers, etc. In a startup, the CTO is often the primary engineer.
+If you're a two-person startup and you're just getting off the ground, you probably have the CTO title if you're the technical one in that startup, and you're probably writing all the code, right?
+In other places, you're running your engineering team, and you may not be writing as much code, but you're responsible for the team.
+I guess over the years, I've worn all of those hats. I've been out doing conferences and evangelizing. I've done a lot of sales work; especially later on at Lucidworks, I did a lot of sales work as the company evolved and grew.
+When I was at Wikimedia, it was all pretty much internal: running the technology team, you know, helping make technology decisions, all of those kinds of things. So I wouldn't necessarily say it's changed much.
+I still do write some code, but not as much as I used to, I guess, when I was a full-time engineer. But yeah, it still roughly falls into those categories.
+Yeah, and I mean, having been a student on your course, I've really enjoyed the code that you've written to support the infrastructure of building the search engine. And I mean, you are still a highly technical person, so I wouldn't discount that.
+And I mean, this is something that is dear to my heart as well, me being an engineer, to talk to a like-minded person. And at the same conference, in 2017, you gave an excellent talk titled "BM25 is so yesterday: modern techniques for better search".
+And what's funny, and I'm going to share the link as well, but what's funny is that I don't know if you noticed it yourself, but you again have three C's in there. I wonder if you did it on purpose. What you have there are the building blocks of this kind of journey of building a search engine.
+So the first one is content. And you piggyback on Solr capabilities, but in general, it could be any search engine out there, with rules for content, like manual boosting, you know, landing pages, and so on.
The second C is collaboration.
+So that's, the way you put it, collective intelligence to predict user behavior based on historical aggregated data. And this is where I think recommenders come in, popularity signals, and so on.
+And last but not least, you have context, which is when you ask questions: who are you, where are you, you know, what have you done previously.
+And this is when you start doing market and user segmentation and venture into personalization and so on. Would you say that you view the search engine journey and development the same way today, or have you changed your perspective? I really need to go back and get a little more creative.
+I think I'm using the letter C there too many times in a row, but I mean, I think a lot of that still stands pretty true.
+Regardless of the engine you're using or whether you're using deep learning techniques or not, you know, at the end of the day, you're trying to match users to information that will help them make better decisions or be more informed, right?
+And you know, these days, I would probably add in one more. I'm trying to think of how I could be witty and make it into another C, but you know, in working with Daniel Tunkelang on this class, one of the things that has just absolutely wowed me is the query understanding aspect of it.
+And so maybe you could put that into the context category if you wanted. But, you know, realistically speaking, that's work you can do, especially in large-scale environments where you have a lot of queries, to really understand what users are asking or intending to ask when they put in a query.
+So I would probably throw that in there if I didn't include it back then. And so like I said, maybe that's part of your content, or your context is the actual query a user is asking, or the set of queries that a user is asking.
You know, but I still think a lot of that stands at the conceptual level, right? I mean, if you think about it, this is the Vector Podcast, right?
+All of this stuff, we're building vectors and then essentially calculating this fancy version of a cosine similarity between them.
+And at the end of the day, all of these techniques we're doing are effectively: how can we shape those vectors so that things that are meant to be closer together show up closer together, and for things that are not as related, the cosine is further apart, right?
+Like at the end of the day, that math doesn't change, yet all these techniques, whether it's deep learning, et cetera, are all about creating those vectors and doing that calculation, right?
+And so by understanding your content, you're shifting those vectors, you're transforming them in the space: you're adding synonyms, you're adding embeddings, all of those kinds of things, you're adding proper nouns, you're doing noun phrases, et cetera.
+By understanding your context, you're able to ask better queries, which is shifting the query vector, right? And by using popularity, et cetera, you're also shifting those vectors by essentially adding more weight to things that are more popular, right?
+You know, so at the end of the day, yeah, I would say I'd still stand by that, with the caveat of really bringing forward the query understanding aspect of it.
+Yeah, I think query understanding, as you put it brilliantly, is really an exciting space, and we actually recorded a podcast as well with Daniel Tunkelang, where he explained a lot of it; he has also blogged a lot about it. So go check it out.
+And in that same presentation, when you demoed the capabilities of the Lucidworks platform, where you played a lot with different ranking strategies, basically you pre-trained some of them and were able to switch them live, I felt like you are a tinkerer as well.
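Grant's framing here, that all of these techniques ultimately reshape vectors and a cosine similarity between them, can be illustrated with a toy sketch (not code from the talk or this repository; the `boost` function is purely hypothetical):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def boost(vec, dims, weight=2.0):
    # "Shifting the vector": up-weight selected dimensions, standing in for
    # synonym expansion, embeddings, or a popularity signal.
    return [x * weight if i in dims else x for i, x in enumerate(vec)]

query = [1.0, 0.0, 1.0]
doc = [0.5, 0.2, 0.9]
```

Up-weighting a dimension the query and document share pulls their cosine closer to 1, which is the whole game: every tuning trick moves related things closer together and unrelated things further apart.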
You enjoy really going deep into what a search engine can do and what you can extract from the data.
+And my question is, where do you see the balance between doing this in a more manual fashion, where you actually educate yourself, versus throwing it at a machine learning model? Yeah, it's a great question.
+I mean, you know, obviously, and I see my former colleague and co-founder Erik Hatcher is on, he used to always say "it depends", and I'd say it depends here of course as well. Which is, you know, there are some situations where you just don't have enough data for machine learning, right?
+So by default, you are going to be manually tuning the situation, right? You see that a lot in enterprise systems, especially smaller enterprise systems or in niche applications where, you know, effectively search just needs to be good enough.
+Maybe you're not monetizing search, and so, you know, you just kind of need it to be reasonably good, right? It's a feature in a much broader set of features that users are going to engage with.
+And so, you know, where and how you would use machine learning in those situations, you may or may not.
+In the situations where you have lots and lots of data, lots of users, you're probably monetizing search, whether that's via e-commerce or web search or ads or whatever. I think machine learning makes a lot more sense there, and it's a lot easier to run these types of experiments that allow you to tinker, not just with the hand-ranked models, though I think hand-ranking still has its place, right?
+Because it helps you form intuition about what is in your data, right?
+And that intuition is really important even in a machine learning world, because at the end of the day, even with machine learning, while you can try out a lot more features and approaches, you still have limited time, right?
And so, you still have to have some intuition about what's going to work, and I think there's no substitute for that intuition helping guide you to what matters. So for instance, in a learning-to-rank scenario where you're actually learning a ranking model, you are still often building up those systems using the features of your data.
+So you have to know what those features are. And one of the nice things is that with Lucene-based engines like OpenSearch or Solr or Elasticsearch, and I'm sure Vespa has the same kind of thing, you can go and play around with those: you can create your own function queries that allow you to roughly try out different formulas for ranking, and then you can go and turn those things into machine learning models, right?
+That learn a much more effective function than what you could come up with, right? So, I think even in this world of large data sets and machine learning, you're still going to have to build intuition, right? Yeah, absolutely.
+And in your own experience, and in the experience of the teams that you supported, how do you nurture this intuition? Like, do you read books? Do you constantly experiment?
+And also when it comes, you know, to understanding the fundamentals of search, let's say knowing how the TF-IDF formula is composed, or BM25 and what the trade-offs are, versus sort of going and actually experimenting and trying things out, where do you see that balance as well, for yourself maybe and also for the teams around you?
+Yeah, I mean, I think everybody will have their own, you know, kind of depending on where you come from, right? Like if you've done deep academic work, you're probably going to have a lot more understanding of the math and the theoretical side of it.
+And then you're going to have to develop the intuition of real-world data, right? How messy it is, how clunky it is, how full of junk and spam, et cetera, right?
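The fundamentals Dmitry asks about are compact enough to write down: the per-term Okapi BM25 score, with the common Lucene-style parameter defaults (a reference sketch, not code from this repository):

```python
import math

def bm25_term(tf, df, n_docs, dl, avgdl, k1=1.2, b=0.75):
    # IDF: rarer terms (lower document frequency) contribute more.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b controls how strongly longer-than-average documents are penalized.
    length_norm = 1 - b + b * dl / avgdl
    # k1 makes term frequency saturate instead of growing linearly.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)
```

The trade-offs live in the parameters: repeated terms help with diminishing returns (k1), and long documents are normalized down (b), which is exactly the kind of thing experimentation builds intuition for.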
Because a lot of times when you're dealing with academic data sets, they're pretty clean, right?
+Relatively speaking; they still of course have their own set of garbage and nuances in them.
+Whereas if you're an engineer, often what I see is that engineers come at it from a quantitative standpoint of, hey, I want to make sure this is scalable and reliable. So they're solving for the hardening-of-the-system problem first.
+And then they often will develop the relevance side of it, or the understanding of the data, second. Now again, broad generalizations there, because, you know, folks have all kinds of different backgrounds.
+But you know, as a leader and somebody who manages people in this space, I would often just work with you depending on what your background and understanding and intuition are.
+And then, you know, try to help you complement whatever it is you're missing there, right? Like I think you have to have an understanding of how these engines work.
+I've often seen folks who don't have an understanding of all the capabilities of these modern search engines recreate the wheel, right? They're reinventing the wheel because they're coming from the first principles of the math that they learned at the academic level.
+But they don't necessarily know how that applies to real data in the real world. Whereas a lot of these modern search engines, because they grew up in large-scale, you know, publicly traded, high-volume spaces.
+They've really been hardened on the engineering side, and they really know how to deal with all the nuances of real-world data, right?
+And so, by learning those kinds of things, you will be much more effective at bringing to bear your intuitions and understandings from whichever background that is.
Yeah, actually in the same presentation, you also said like, you've seen cases where you you come into helper company and they they point you to sort of like a data, the ace almost of 10,000 rules. +And so you you said they have that in principle, you could just remove solar or whatever search engine you have and just use those rules to retrieve documents, right? But when you go and ask specific questions, what what what this rule does? +The answer that you you illustrated was well, it was created by Joy, you know, and he quit five years ago. +So he then said it makes sense. So we keep it. So how do you go about convincing the organization or teams to change their perception and sort of like become more flexible and move into this flywheel of experiments? Yeah, it's hard. +And again, I think, you know, I mean, you have to look at incentives and first principles there, right? Like, again, if you're in this boat of like searches, just a feature, there may or may not be any incentive. +But if you're in this boat of like, hey, search is a really critical aspect of what we do. Our users use it all the time. It's key to revenue. It's key to timeliness or it's, you know, people's lives are on the line, et cetera. You're going to invest in making sure searches as capable as possible. +Those folks usually don't take much convincing once you can show them a better way, right? They're often already frustrated by the sheer number of rules that they have. +And so one of the things that can often work in those situations, I think is, you know, you can start to just learn the, you know, a lot of these machine learning systems will actually learn the set of rules, right? +And so if you want, you can just start to learn the rules by the fact that you're gathering your queries and your click logs and you're looking at the engagements users are having with the system, with the rules in place. +And then over time, you know, that will learn it. 
The harder part often is getting that last piece, which is true experimentation, whereby they actually have a system in place for running multivariate experiments or A/B tests, right?
And they can actually try out different approaches, see which one wins and which one's most effective, and then go with that until the next one beats it, right? That's a fair amount of engineering work to get in place.
It's also a fair amount of math to do in order to make sure it's appropriate. These days there are systems and tools that allow you to do it, but if you want to homegrow it, that can take a lot of work.
So getting people into that mindset is hard, especially in environments or company cultures where there's pride in being right. You sometimes see that in a lot of companies, where it's a whoever's-the-boss-has-to-be-right kind of situation.
Those types of companies are always going to struggle with an experiment mindset, because they reward, quote unquote, being right, as opposed to rewarding longer-term growth and incremental improvements with the occasional failures, right?
So you really have to look at company culture first, potentially reset that, and then build and bake in the necessary engineering work to make experiments work.
Yeah, absolutely. I agree with that, and with the same thought that, you know, without failures you cannot really breed a culture of creating cool new stuff, because you basically cannot unleash yourself to go and mess with your code base, right? And do things and create new stuff.
So you need to be brave, for sure. Well, as I think my friend Ted Dunning said, the cool thing about experimentation frameworks is you get to be wrong, and that's okay, right? You're actually right by the fact that you're wrong,
because you're right in the long run, right? Yes. Even if any given experiment is flat or bad, right?
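As a sketch of the "fair amount of math" involved, here is a minimal two-proportion z-test comparing click-through rates between an A and a B variant; the traffic numbers are invented for illustration:

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test for comparing CTR of variant A vs. variant B."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled click rate under the null hypothesis (no real difference).
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se

# Hypothetical numbers: variant B lifts CTR from 10% to 12% over 5,000 views each.
z = two_proportion_z(500, 5000, 600, 5000)
significant = abs(z) > 1.96  # roughly 95% confidence, two-sided
```

Real experimentation platforms do considerably more (sequential testing, variance reduction, guardrail metrics), but this is the core calculation behind "did B actually beat A, or is it noise?"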
But overall, you know, in the long run, you're going to win out, because it gets easier and easier for you to add in a new approach. Yeah, absolutely.
I think Turnbull also said something like that, you know, how you basically accumulate these bruises, right? Or scar tissue, as some other people say. So I think without doing things, and without failing as well, you can't learn. So I totally agree with that.
But still, for those who are still learning, and we are discussing, to some extent, the courses that you're going to be teaching: where do you start? Let's say you have some data, right? You have some click logs within your organization, or maybe you found some data set.
Where do you start? How do you go about dissecting that data set? What do you do with it as next steps, what should you maybe avoid, and what good things should you keep in mind?
Yeah, I mean, first off, a lot of companies aren't even all that great at actually collecting and managing their query logs, right?
So if you've got a search engine up and running and you want to improve it, I think the first thing you have to do, and again, this kind of goes back to first principles,
is ask: am I measuring the things that help me understand what users are doing? That's the first step, right? Make sure you're able to process your query logs and capture things like session history, what users clicked on, and what they saw.
A lot of companies will only measure what was clicked on, but they don't actually measure what was seen by the user, or at least inferred to be seen by the user.
And that can be a big loss, because with a lot of these machine learning systems, you need to know what wasn't chosen just as much as you need to know what was chosen, right? So really make sure you've got the instrumentation of your system in place.
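A minimal sketch of that instrumentation point, logging what was shown alongside what was clicked; the event schema and field names here are hypothetical, not any particular system's format:

```python
import json
import time
import uuid

def log_search_event(session_id, query, shown_doc_ids, clicked_doc_ids):
    """Record both the impression list and the clicks, so ranking models
    can later learn from what was *not* chosen, not just what was."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "session_id": session_id,
        "query": query,
        "shown": shown_doc_ids,      # what the user saw (or was served)
        "clicked": clicked_doc_ids,  # subset of "shown"
    }
    return json.dumps(event)

record = json.loads(log_search_event("s-42", "iphone 13 case",
                                     ["d1", "d2", "d3"], ["d2"]))
```

The important design choice is that `shown` is logged even when nothing was clicked; without it, you cannot distinguish "the user skipped this result" from "the user never saw it."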
A search engine is a great place to store all of that data as well, right? As Elastic has proven out with their use of search for logs, and Splunk as well, right? So make sure you're capturing all that stuff. And then again, I think this is where your intuition starts to come in.
So whenever I get a new data set, a new set of click logs, I start to look at: what are my most popular queries? What are users asking today? What are they asking overall? What led to zero results?
How often are they rewriting their queries, like they typed in a query and then they didn't like the results,
so they rewrote it? All of these things are pretty easily discoverable in query logs, right? So just start digging in and building some intuition for those things.
So for instance, one of the things we would do back when I was at Lucidworks is what we called head-tail analysis; long-tail analysis is another term you see in the literature, especially in the e-commerce world, where you have this power-law distribution where most people ask the same things over and over, but you often have a really long tail.
When you analyze the long tail in a lot of e-commerce situations, what you often find, for instance, is that the long tail is actually pretty highly correlated with the head queries, right? And so you develop that intuition of, why are these long-tail queries working or not working?
That can then help you do much better on all of your queries, right? And so from those click logs, you start to focus on: how do I improve my head or my torso queries, the ones that are most common?
And then as you go on, you can look at how to handle long-tail queries, depending on how important they are to you. And from that click log, you can then start to build... in some cases, it still might make sense for you to have rules.
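As a toy sketch of the head/long-tail analysis described above; the query log and the 50%-of-traffic cutoff are made up for illustration:

```python
from collections import Counter

def head_tail_split(queries, head_share=0.5):
    """Split a query log into 'head' queries covering roughly `head_share`
    of all traffic, and the long tail that makes up the rest."""
    counts = Counter(queries)
    total = sum(counts.values())
    head, covered = [], 0
    for query, count in counts.most_common():
        if covered >= head_share * total:
            break
        head.append(query)
        covered += count
    tail = [q for q in counts if q not in head]
    return head, tail

# A tiny fake e-commerce log: one dominant head query, a sparse tail.
log = (["iphone 13"] * 50 + ["tv"] * 30
       + ["iphone 13 silver 64gb"] * 2 + ["usb-c hub"] * 1)
head, tail = head_tail_split(log, head_share=0.5)
```

Even in this toy log you can see the correlation Grant mentions: the tail query "iphone 13 silver 64gb" is really the head query "iphone 13" plus attributes, so whatever you learn about the head can often be reused for the tail.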
And then you can also look at... again, I would try to look at the problem holistically: what's going to get me the most bang for my buck in terms of where I should spend my time, right?
So in the short run, rules are probably easier, but they're harder to maintain in the long run.
And of course, you can only manage so many rules on your own, even with several people, whereas machine learning may take more work up front, but in the long run it's probably easier to maintain.
Although I do still wonder if we're going to run into the same kinds of problems we have with rules with machine learning models, where we have so many different models being applied, built by different teams and applied in different scenarios,
and the next thing you know, you have a complexity problem on that front as well.
But luckily, with things like machine learning operations becoming more of a focus, and people getting much more rigorous about how they deploy and manage models, I think most of those problems will be mitigated in the long run. But it still goes back to the same core principle, which is: you need to have good housekeeping in order to be successful, both with rules and with machine learning models.
I don't know if that was kind of long-winded; I don't know if that answered the question or not. It does, it does.
I mean, it gives the intuition, especially where you said... that was an insight, actually, to me: the connection between head and tail, that 50% of the tail may correlate with your head. And that's amazing.
Like, 50% of these super hard queries could be kind of removed from that complexity space, right? Which is, again, your mileage may vary, right? It depends on your data set and your domain, but, you know, in e-commerce, right?
If iPhone 13, or whatever, is the head query, there's probably a tail query that's, you know, silver 64-gigabyte iPhone 13 with case, right? That's probably a tail query, or at least a torso query.
And once you have those types of realizations, you can start to link these up. And then the cool thing really is that the things you know about the head can apply to those types of tail queries as well.
And so you might be able to more effectively manage those tail queries, even without machine learning models. Yeah, absolutely. And just a quick reminder to our respected audience: feel free to send your questions.
Otherwise, I will ask all the questions myself, which, of course, I have; but I'm sure you guys and girls have some interesting cases. We did get a few questions already, and we'll answer them at the end of this session.
And coupled to that process of, you know, crafting the signals, training your model, deploying it, and the MLOps that you mentioned:
when it comes to measurement, how do you measure? How do you make sure that what happens right now in production still makes sense, that you don't need to take any hectic action, like pulling the model back or something like that?
What's your sense on that front? And maybe some measurements that you have deployed yourself and have been observing every single day and relying on? Again, it depends on what domain you work in.
But, you know, there's lots of literature on how to score and test your model.
So, things like precision and recall, where you're looking at what users are clicking on and whether they're finding the results; things like zero results; or, often, one of the things that I find helpful is what you would call surprising results, where documents are occurring fairly high up in the results, but they're not actually garnering the clicks that you would expect given that position.
So for instance, many people in search understand that there's a position bias that's just built into all of us as humans. We trust the machine, and so we click on the first one.
Well, if you consistently see that a document is appearing at, say, number one or number two in the results, but it's getting way fewer clicks than, say, the sixth or seventh document, that might be an indication that that document isn't particularly relevant, or that for whatever reason users aren't liking it.
So those kinds of more subtle metrics can also be informative.
I think, if you have an A/B experiment testing framework in place, obviously you can do all of your metrics around A/B testing: start with just giving a certain amount of traffic to your new approach, and then ramp up as it meets your metrics, whatever your targets are, if that's things like add-to-carts, et cetera.
You can ramp up those types of tests as the approach proves out. There are obviously things you can do offline as well, especially if you have enough query logs.
And if your index hasn't changed that much, but maybe just the approach you're taking has, then you can replay your logs, test things out, and effectively simulate what users might click on in those scenarios.
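The "surprising results" check described above, documents ranked high but under-clicked relative to the position prior, might be sketched like this; the priors, thresholds, and counts are invented for illustration:

```python
def surprising_results(impressions, expected_ctr):
    """Flag documents whose observed CTR at a position falls far below
    the position prior (the CTR you'd expect at that rank)."""
    flags = []
    for doc_id, position, views, clicks in impressions:
        observed = clicks / views
        prior = expected_ctr[position]
        # Require enough traffic, and a CTR less than half the prior.
        if views >= 100 and observed < 0.5 * prior:
            flags.append(doc_id)
    return flags

# A crude position prior: rank 1 usually attracts far more clicks than rank 6.
prior = {1: 0.30, 2: 0.18, 6: 0.05}
data = [
    ("doc_a", 1, 1000, 40),   # 4% CTR at rank 1 -- suspiciously low
    ("doc_b", 2, 1000, 170),  # roughly as expected
    ("doc_c", 6, 1000, 60),   # actually out-clicking its position
]
suspects = surprising_results(data, prior)
```

In practice the position prior would itself be estimated from your own logs (e.g. average CTR per rank across all queries), not hard-coded.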
And then of course there's the old-fashioned smell test: do these results look better to me as an expert? You obviously have to be careful there. Or to a small cohort of experts, you know, maybe your colleagues, et cetera, who might spend some time scoring.
So all of these things, I think, are techniques and measurements you can use to check whether results are good enough for them to go into production.
I think Ronny Kohavi, I forget the name of the book, but he has a really good book, along with a co-author, on online experimentation. It's probably these days the Bible of online experimentation. So I would encourage listeners to check that out.
And then there are lots of metrics that you can deploy that are pretty standard and well publicized; some quick googling should find those for people. Yeah, for sure.
Of course, I think you could measure some things like NDCG, which is offline, right? But you do need rated queries.
And as a contributor to Quepid, which is an open-source query rating system, I'm curious to hear your opinion. On one hand, of course, you can always go and sanity-check, smoke-test your ranker.
But that's maybe just for engineers or product managers, a smaller group, versus when you go and try to understand the intent of queries at larger scale with this manual effort.
Have you seen, have you deployed, such methods within organizations? What do you feel about doing this in companies on a more regular basis?
And also, as a shout-out to what you did in the Search with Machine Learning course: you did ask us to rate some queries and create a judgment list, to get a feel for the process.
And I think that by itself is a great idea, because it pushes you towards further understanding what it is that you're building for. So yeah.
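For reference, the NDCG metric mentioned above can be computed from a list of graded judgments in a few lines; the grades and the 0-3 scale here are illustrative:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of graded judgments."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the ideal (perfectly sorted) ordering, in [0, 1]."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Graded judgments for one query's top 4 results, in ranked order
# (e.g. collected with a tool like Quepid): 3 = perfect, 0 = irrelevant.
score = ndcg([3, 2, 0, 1])
```

Because NDCG only needs the judged grades in ranked order, it can be run offline against a fixed judgment list every time the ranker changes, which is exactly why rated queries are the prerequisite.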
Yeah, I mean, it makes a ton of sense, if you can afford it, to do offline evaluation using professional annotators. I don't know how good Mechanical Turk is these days, but something like Mechanical Turk, or, I forget what CrowdFlower is called now, or, I know we've worked with a company called Appen in the past: there are these companies out there that will provide you with a large number of annotators who will run your queries and then rank them for you.
And of course, you can use that as well.
So again, it often comes down to whether you're monetizing your search results, and folks who do monetize their search results will typically pay for those kinds of things, especially once they reach really large scales, you know, like your Amazons and the like.
Where and how much you can do that often comes down to budget and time, right?
So if you have the budget, I've seen companies do that... maybe not weekly, though there might be some that do it weekly at the really large scale; that gets really expensive. More often quarterly, or whenever there's a major update to the system, those kinds of things.
So by all means. I mean, I think often in this space, we love to say, oh, well, this is the way you do it.
And the reality is, you want a hybrid approach to most of these things, right? Because there's no one perfect model, and there's no one perfect way of evaluating a model, right?
And so you need to blend these and build up a broader sense of what actually works, right? Yeah, absolutely.
It's just, I guess, general awareness that these systems and approaches exist, so that when you feel stuck, when you don't generate ideas anymore for where you can improve your search engine, you can go deeper and try to involve them, I believe, to help you understand.
And before we move on to some higher-level questions, I still wanted to ask you a slightly more detailed question: if somebody in the audience, or a listener, wants to try to build a kind of end-to-end search engine at home,
what available data sets, tools, and algorithms exist today that will allow you to build this, train relevancy models, and all these building blocks of a search engine? Yeah, I mean, it's interesting.
I think in many ways we live in a golden age of search engines, right? There are several just top-notch, open-source, freely available search engines on the market.
There are a number of companies competing in this space, right? So picking an engine is almost like, hey, it's a plethora of riches.
It's a challenge to pick one because there are so many good choices, right? And you're often asking, what specific features or domains am I going to participate in? So, obviously, step one: choose a good engine.
And I think you really can't go wrong with any of the main ones, you know, the Lucene-based ones: Solr, Elasticsearch, OpenSearch. I haven't played with Vespa myself, but I think that one's coming on strong as well.
You see a lot of interesting capabilities coming out of that. And then you have, obviously, the companies behind them. Of course, I'm a co-founder of Lucidworks, and so still a big shout-out and big fan there, because I think they're doing a lot of interesting things.
But you also see a number of other players in that space, with deep learning or neural-based approaches, as well as blended, hybrid, or traditional approaches. So, one: start with your engine. See what it's capable of.
And then on the data set front, it really depends on what domain you're in. But I'm a big fan of starting with public data sets. TREC is a great place to get data sets across a large number of domains. You can also get queries.
So whether you want to do web search or e-commerce or legal or enterprise or medical, you can go to TREC, get a data set, and start indexing it and playing around with it. These days it's also just super easy to go crawl.
So get Scrapy or curl or wget, or one of these crawlers, and go crawl websites. And then you can start going from there. The query-log side tends to be a little bit harder, because companies don't like to release their queries.
But there are several data sets that do have some form of queries with them. They may not be enough for you to fully test all the features of an engine. So in our class, we use a really old data set from Best Buy that has query logs in it; well, query click logs.
But, for instance, it doesn't tell you what was shown to the user. It just tells you what they clicked on. And so you can't actually build full models, or effective models, with that.
But it's actually a really good e-commerce data set, because it has all of the problems of a data set that comes from a company. Namely, there's a lot of missing data in there. There's a lot of bad data. But there's also a lot of really good data.
And so, starting with those, I think you kind of just start to push the engine through its paces. Start with the tutorials, the basic features, and then see where you can go deeper.
Can you actually get best-in-class relevance measurement out of it?
Can you get best-in-class speed performance out of it? And then just work your way through the engine. And these days, you can typically do that in, say, less than a week.
And that's really amazing, right? Especially when you combine that with all the great information out on the web. I think when I was getting started, you had to go and really dig in underneath the hood and figure out a lot of those pieces yourself.
It would take several weeks, if not months, a month or more, to really feel like you understood an engine and where it fit. And these days, it's just so much easier to do that, which is awesome. Yeah, absolutely. And I remember during the course we had to do it within a week,
per project. So that was super exciting. And I think this would not be the Vector Podcast if I didn't also ask your opinion on vector search.
What's your feel for how it will augment the search engine experience, on the user side as well as on the development side? And, connected to that: what do you think the search engineer profession is going to be like soon? I think it's already shaping up in many ways.
Like, the boundary between data scientist and search engineer blends. Do you feel yourself like that? Do you think this is the direction we are going? Or do you think it's a fashion that will wear off at some point? Yeah, I mean, well, it's not going to wear off.
There's too much money, too much investment, and too much better results. I will state up front, I'm not an expert on these vector engines, right? It's kind of interesting.
I went back and looked through some of my talks, and I think I gave a talk in 2013 on what the Lucene and Solr community needed to do next. And one of the things was: we need to add support for dense vectors. That was 2013. I think we just got dense vector support in Solr.
Elasticsearch maybe was there a little bit sooner, but roughly the same time frame. There are plugins, of course, that have been around, like the k-NN plugins, things like that. Hey folks, this stuff is here to stay.
I mean, the really interesting questions... you're starting to see these hybrid models where, like, BM25 is still really good and really fast at that first-pass retrieval.
It's kind of hard to beat in terms of the scale at which you can get a first-pass ranking, right? And then you feed those results into much deeper or more capable engines. I think that's been around for a while, and academia has proven that out.
Clearly, using embeddings and vectors for things like query understanding and content understanding, and using tools like BERT, et cetera, for enriching your understanding of your content, and then making those searchable: that's all, I think, well and good.
I think the really interesting question will be whether the vector engines can add all of the layers that the sparse approaches have, I don't know about perfected, but added over the years: the faceting, the aggregations, the spell checking, the highlighting, all of those things that actually go into building a search application.
If the vector engines deliver all of those things and deliver better results, that's probably a no-brainer, right? In the meantime, we have these hybrids, because I think nobody is delivering all of the capabilities.
The other thing that's interesting with dense vectors, right, is that you can start to map multimodal data types all into the same engine: images and text and audio, et cetera, right? And again, I'm not an expert on this, but that's my understanding.
So then you can query across spaces, if you will. Again, I may not be using the right terminology here, but that, to me, is often... at least people talk about that like it's a holy grail. I'm not fully convinced people will actually search that way.
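As an aside, the hybrid pattern described above, a fast sparse first pass combined with dense similarity, is often implemented as a simple score blend. The scores, document IDs, and the min-max normalization choice below are all illustrative, not any particular engine's API:

```python
def hybrid_rerank(bm25_scores, dense_scores, alpha=0.5):
    """Blend a sparse (BM25) first pass with dense similarity scores.
    Both dicts map doc_id -> raw score; alpha weights the dense side."""
    def normalize(scores):
        # Min-max normalize so the two score scales are comparable.
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    sparse, dense = normalize(bm25_scores), normalize(dense_scores)
    blended = {doc: (1 - alpha) * sparse[doc] + alpha * dense.get(doc, 0.0)
               for doc in sparse}
    return sorted(blended, key=blended.get, reverse=True)

# Toy scores: BM25 favors doc1, while the embedding model favors doc3.
ranking = hybrid_rerank({"doc1": 12.0, "doc2": 8.0, "doc3": 7.0},
                        {"doc1": 0.2, "doc2": 0.5, "doc3": 0.9},
                        alpha=0.6)
```

In a real system the dense scores would come from an ANN index over the BM25 candidates (or a parallel retrieval pass), and the blend weight would be tuned with exactly the offline and online evaluation discussed earlier.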
I still think that remains to be seen, because there are a lot of implications for the user interface and the user experience, how you interact with that.
People have long talked about, oh, hey, I'm going to take a picture and then get my search results back, but every time I use those tools, I'm like, okay, that's nice, but it's still clunky from a user-experience standpoint, right?
So there's a lot of that work, above and beyond just the core engine, that has to be solved.
But clearly, there's a lot of money and effort going into it. And so as a search engineer, you can't ignore it; as a data scientist, you can't ignore it. And so you've got to get up to speed on how these are built.
I think all the major engines, open source and private, have some form of it at this point, of blended models.
Again, if you're in a domain that you don't have enough data for, these may or may not work. Although, again, one of the interesting things with these neural models is that you can often train on a general model and then just use a few examples from your domain to essentially tailor that general model to your environment, right?
One of my clients is doing this in the NLP space right now.
We're using a general model around analyzing contracts, and then we're applying domain-specific things to it.
And it's really interesting how effective it is with very few examples, right? That's an NLP problem, not a search problem, but I think you're going to just continue to see that trend grow and expand, right? So you've got to be on board with it. Yeah, absolutely.
And you can find, of course, more conversation on this podcast about this. But I think I agree with you that the multimodality aspect of vector search is quite exciting.
And where the data sits in images, for instance, that haven't been annotated yet, right?
And so many images are uploaded every single day, and videos. You know, if the model is able to transcend the domains so easily, like the CLIP model, for instance, built by OpenAI... it's not a perfect model.
Sometimes it fails, but sometimes it also amazes you, like, how could it figure out how to work so reliably on my data that it hasn't seen before? That's amazing.
Well, and it goes back to your earlier question, which is, at the end of the day, folks, go evaluate it and see whether it works better for you.
And like I said even earlier, they're all just vectors, and we're all just trying to calculate cosines between the user's query and the document vector. And so in some regards, we're just building a better vector, right? It's just a better vector. It has more information encoded in it.
And so if I can query that more effectively, then why wouldn't you use it? Yeah, yeah, exactly. And of course, there are other subtopics there, how to make it faster and so on, but I think eventually we will... hey, Google figured it out for 10% of the queries,
so I guess the rest of the world will catch up.
Before we continue to the questions from the audience, of which we have a few, I do love asking, and if you can keep it a little bit short, because we are short on time, but I'm still super, super interested to hear: what is your motivation to stay in this space?
You have tried so many things in your career, right? Looking at your LinkedIn profile, it's just experience after experience: fractional CTO and full-time CTO and engineer and so on, and book author.
What motivates you to stay in this space today, and also to go into education, teaching? Yeah, I mean, it's funny.
I think, well, even when I was at Wikimedia and I, quote unquote, left search, we still ran a very large search engine, and I always enjoyed my conversations with the search team at Wikimedia, just because it's such a high-traffic website, and search there, I think, does something like 6,000 queries per second or something like that.
So in some ways, and this is reflecting back on my career, I think I fell in love with language, and the way humans use language and find information, back circa 1999 or so, when I started at a small company called TextWise, run by Liz Liddy, who is one of the pioneers in the natural language processing field, and it just happened to have a search project that I started working on, right?
But to me, at the end of the day, this space... and this is why I went to Wikimedia.
So I'd say search isn't necessarily the through line, even though it appears to be the through line in my career. The deeper through line, I think, is that I am fascinated by how we can leverage computers to help users make more informed, more capable, more aware decisions in their lives, whether that's purchasing online, or political, or governmental, or whatever it is. I am fascinated by how we can help people make more informed decisions, because I think that's the thing that lifts us up, right?
And so education is an easy follow-on from that through line, right? The more people I can help use these tools, and also learn myself, the better off we'll all be, right? We have to use these tools to help us as humans get along better, et cetera, be more informed, and so on, so forth, right?
So that's probably the through line of the career: how do you help people find information and take action that makes us all better? Absolutely, this is very deep. Thanks so much.
I love asking this question because I'm super motivated to stay in the space, but I also love to see the facets and the motivations of other professionals like yourself that I'm looking up to. I really enjoyed this conversation.
Is there an announcement that you want to make in terms of the courses that you're going to be teaching soon? Yeah, that's great. I appreciate that, Dmitry. And I know we have some user questions, and I'm happy to stay on a little bit longer as well to get to those.
Yes, we actually have two classes coming up.
So one of the things we learned in the first run of Search with Machine Learning is that, effectively, we had one week of trying to get everybody onto the same page of how OpenSearch works and what the basics of search are,
and then we had three weeks of fairly intense machine learning in a search environment. And one of the things that happened in the class, because we didn't have a lot of prerequisites, is that we had a really wide array of students: folks who were deep experts like yourself, as well as folks totally new to this arena.
And what happened, I think, is that first week, for the new people, it was like, hey, this is too much for me to get up to speed on. And for the folks who had already done search, it was like, hey, I already know how to do all of this.
And so, trying to bridge that gap, I think we kind of ended up in this lukewarm area where nobody was quite satisfied.
So one of the things we did was split the new material out into a two-week class called Search Fundamentals, which covers all of the basic intuitions of search, whether it's deep-learning-based or sparse-vector-based.
And so we cover indexing, querying, faceting, spell checking, autocomplete, kind of all the building blocks of a search application.
And then with the machine learning class, because we're dropping that beginner week, we've now added neural retrieval, dense retrieval, into it as well. And so the Search Fundamentals class starts next Monday, June 6th. You can still sign up. It's $200.
There's a code, DGSearch 10. And then Search with Machine Learning is two weeks after that. And that's a four-week class. Both are project-intensive.
Every week, you're going to do a project, you're going to write code, you're going to interact with students, you're going to hear lectures, and so on, so forth.
In many ways, I think it's modeled after a university-style class, where every week you have homework, every week you have lectures, and so on, so forth. So yeah, please sign up. Yeah, that's awesome.
What I personally enjoyed during the course, the Search with Machine Learning four-week course, was the atmosphere. The atmosphere basically created itself amongst the students; there were over 100 people there on Slack helping each other. That was just amazing.
Somebody saved me a ton of time by just sharing, you know, a recipe that I followed to quickly get through some hurdle. And I learned, and, of course, I knew some stuff;
yes, I'm an expert in this field, but you can also put your expertise to a test when you run so fast during the course, and the support that you guys provided was amazing. I've enjoyed this conversation so much.
Now we are moving to the questions from the audience. And feel free to keep asking questions, please; we still have a few minutes. The first question comes from Avynash, who is currently testing the approach of a bi-encoder to find the top 10 most similar sentences,
and later passing those top 10 sentences to a cross-encoder model to find the most similar sentence among the top 10, using cosine similarity. I guess he's asking for advice: is this an appropriate method?
This is where my expertise just is not. So Avinash, I will apologize. +I do not know enough here to give you advice. +I would probably ask first, like, what is the actual problem you are trying to solve? You know, so if you're trying to find similar sentences, then from my understanding of it, my basic-level understanding of what you're describing, it sounds like a reasonable approach. +But there are people who are much more into this, and probably, Dmitry, you could answer this one better than I, but I have not played with or tried out those specific types of capabilities. So I don't have good advice there. I have worked in general on sentence similarity type problems. +It is always challenging. In fact, at my current company, one of my fractional clients, we are doing sentence similarity, or clause similarity, types of problems. And I think we are using similar modeling techniques, but I'm not doing the day-to-day modeling on that. +So I'm really just trusting the data scientists on that. Yeah, I can add to this that I happened to have given a community talk during the search with ML course. And there I actually go explicitly into this bi-encoder and cross-encoder. +So the only thing is that a cross-encoder is much more computationally intensive. And so you don't want to run it on a huge amount of sentences. And it looks like that's what you're doing. So that sounds sensible to me. I think I would pay more attention to testing your approach. +So make sure to reserve some part of your data set to test it carefully. Yeah, this is the cool thing for me coming back in from Wikiland: I'm learning so much now too. +Like, I've been digging my way through a lot of these things, but as you can see, this is why it's the golden age, because there are so many approaches and they're often improving state of the art every week, right? Yeah, exactly. A lot of things are happening.
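The two-stage setup Avinash describes, a cheap bi-encoder to shortlist candidates over the whole corpus and an expensive cross-encoder to rerank only the survivors, can be sketched as follows. This is a toy illustration: the bag-of-words `embed` function and the Jaccard-overlap `cross_score` are stand-ins for real trained models, chosen only to show the shape of the pipeline.

```python
import math

def embed(sentence, vocab):
    # Stand-in for a bi-encoder: each sentence is encoded INDEPENDENTLY
    # into a vector (here, simple term counts over a shared vocabulary).
    tokens = sentence.lower().split()
    return [tokens.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cross_score(query, candidate):
    # Stand-in for a cross-encoder: scores the (query, candidate) PAIR
    # jointly. A real cross-encoder reads both texts through one model,
    # which is more accurate but far more expensive per pair.
    q, c = set(query.lower().split()), set(candidate.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0

def retrieve_then_rerank(query, corpus, k=10):
    vocab = sorted({w for s in corpus + [query] for w in s.lower().split()})
    qv = embed(query, vocab)
    # Stage 1: cheap similarity over every document, keep the top k.
    shortlist = sorted(corpus, key=lambda s: cosine(qv, embed(s, vocab)),
                       reverse=True)[:k]
    # Stage 2: run the expensive scorer only on the k survivors.
    return max(shortlist, key=lambda s: cross_score(query, s))
```

The design point from the conversation holds regardless of the models used: the quadratic-cost pairwise scorer only ever sees `k` candidates, never the full corpus.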
+Another question I'm taking now from the chat: Carlos is asking, I'd like to know Grant's opinion and insight about learning to boost. He also gives a link to a presentation at a Haystack conference. I don't know if you're familiar with this approach, Grant. Can you say anything? I am not. +I'd like to know: learning to boost, interesting. Another thing to go learn. Yeah, I think it was a kind of learning to rank. I think it's related, but I actually don't know myself that much in detail, but that presentation was great. +It looked like a new thing, but at the same time kind of familiar. Basically, instead of learning to rank, you learn the boost values, as far as I remember. It sounds interesting and reasonable. +Again, at the end of the day, how do we shape these vectors? I know that's generic hand-waving, but I would take this and go try it. I think in most of these machine learning systems you're trying to learn weights that then shape the way that vector gets calculated. +If it works on your domain and it's fast enough and you can maintain it, then go for it. You don't need some expert's blessing on it. It certainly sounds interesting. LTR certainly has its own challenges in terms of tweaking and tuning. I know I've struggled with that with LTR a lot. +I know I've struggled with hand-tuned boosts a lot as well, so anything that helps do that I think would be good. Yeah, awesome. The next question comes from Nico. Hey, Nico, a former colleague from AlphaSense. +If you're hosting an information search engine which should catch new topics like COVID when it hit, how do you notice, proactively, that your boosting model or vector embedding model does not recognize queries related to these new topics? +Yeah, that's where I think the instrumentation of your system comes in, right? And the human in the loop on that instrumentation in the system, right?
+I mean, I think nobody talks about it, but even at the really large successful search engines, there are still people who are reviewing where things are working and not working, right? And generally they're doing it at the experimentation level, but people still dig into queries. +What queries are underperforming? What documents are underperforming? I think there are a lot of good tools out there for anomaly detection as well. +So recognizing when new queries are coming in is something that anomaly detection algorithms will work with, right? +You know, looking at your top queries, your trending queries, and then again, looking at those results, there are machine learning approaches to automatically identifying and alerting on those kinds of things, again, along the anomaly detection line. +But at the end of the day, you can always do that with people as well, right? And that's where humans maybe are still better at recognizing some of those things. Yeah, and I think you also alluded to this somewhat. +I mean, this question, to me, is like a chicken-and-egg problem, right? So if a new topic arises in the queries and also in the documents, but I haven't handled it before, then what can I do live? +So I think you said to try to measure things: if some top-ranking documents are not clicked, then that's probably a signal that something is smoky there. +Go check it out. Another thing that I think I could recommend, maybe from my side, is you could try to cluster your queries, actually. +And sometimes the funny thing is that queries are related in some way, right? +So if it's a completely new cluster, and usually dense retrieval helps a lot there, pre-trained models on your domain or maybe on some generic domain like news might still pick these things up and put them in the same basket; then ask some human annotators to go and check.
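The query-clustering idea can be sketched very simply: keep centroids of the clusters of historical query embeddings, and flag any new query embedding that is far from all of them as a candidate new topic for human review. This is a minimal sketch, assuming queries have already been embedded by some model into plain numeric vectors; the distance threshold is a made-up, domain-dependent parameter.

```python
import math

def dist(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def flag_novel_queries(known_clusters, new_query_vectors, threshold):
    """known_clusters: list of clusters, each a list of embedded
    historical queries. Returns the new query vectors that are farther
    than `threshold` from EVERY existing cluster centroid; these are
    the candidates to send to human annotators instead of reviewing
    the whole multimillion-entry query log."""
    centroids = [centroid(c) for c in known_clusters]
    return [q for q in new_query_vectors
            if min(dist(q, c) for c in centroids) > threshold]
```

In practice one would recompute clusters periodically (e.g. with k-means over recent query embeddings) and tune the threshold against labeled examples.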
+Instead of checking the whole multimillion log, you know, which would be super, super complicated. And you know, I agree. +And the nice thing about, like, you know, especially these engines, you know, there is still the good old BM25 case where at least the basic-level keywords are going to match. +And so if a new term comes in, for COVID, and it's in the documents, you'll at least probably get an exact match. You may not deal with the fuzzy matches all that well, but, you know, something's better than nothing. And then that allows you to start to iterate on it. Yeah, exactly. +So the next question from the Q&A panel is from Chris: for the search with ML course, which front-end frameworks are most students using for their projects? Front-end framework feels a little open-ended to me, but I mean, I can tell. +So one of the things we're doing in both classes is we try to work with a real data set and with a real search application. For better or for worse, we chose not to use notebooks. Notebooks are great for a lot of things, but I don't know that they always show you how actual applications work. +So we actually build out a really simple application. The front-end is, like, Tailwind CSS, and a really simple Flask serving layer for the APIs. And then we use OpenSearch for the search engine and things like fastText and a few other things for the ML side of it. +You know, we use the learning to rank plugin for OpenSearch; trying to think if there's anything else in our stack. It's primarily Python, but I think if you were a Java user, or any of the other languages where there are clients for OpenSearch, you would do just fine in the class. +You maybe just won't be able to use all of the Python capabilities that we have in the class. I hope that answers your question, Chris. The repositories, at least the base-level repositories, are all available under my GitHub.
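The earlier point about BM25, that a brand-new term like "covid" matches exactly as soon as it appears in documents, with no model retraining, can be illustrated with a minimal scorer. This is a compact sketch of the standard BM25 formula over pre-tokenized documents, not any particular engine's implementation.

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized doc in `docs` against `query_terms` with
    standard BM25: an idf weight per term, times a saturating term
    frequency that is normalized by document length."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = []
    for doc in docs:
        s = 0.0
        for t in query_terms:
            df = sum(1 for d in docs if t in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
            tf = doc.count(t)
            norm = 1 - b + b * len(doc) / avgdl
            s += idf * tf * (k1 + 1) / (tf + k1 * norm)
        scores.append(s)
    return scores
```

A query for a fresh term scores zero against documents that lack it and positive against any document that contains it, which is exactly the "something's better than nothing" baseline Grant describes while the fuzzier embedding models catch up.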
+So you can just go to my GitHub, which, excuse me, is g-s-i-n-g-e-r-s, and I'll put that in the chat. And then you can see the frameworks we use. Yeah, awesome. +And I can just, you know, you can pick these things up, or, if you know Python, it's probably easy for you, but if you don't, you can pick this up. And the next question is from the chat from Quasi, I hope I pronounce your name correctly. +These days, most of these sorts of approaches are based on transformers; for anyone who wants to try out an IR approach using transformers as a pet project, does Grant have any recommendations in terms of cloud services and tools? I don't have any specific recommendations. +I know I've looked at, well, there are several players. So, for instance, I saw somebody in one of the IR communities that I'm in post about, I think, Qdrant, Q-D-R-A-N-T, I don't know how it's pronounced. +I know there's Weaviate, I know there's Pinecone; Elastic, Solr, and OpenSearch all have dense vector retrieval capabilities. I've been playing around just getting started with Hugging Face. I'm a little late to the Hugging Face game when it comes to these things. +I know a lot of people I talk to use Colab to build and run these systems. And so I think you can probably get started. Again, Dmitry, you may have better tutorials. I know you've posted a bunch of stuff on Medium about how to get started in this, as have other people. +So I would start there, I guess; you probably won't go wrong with any one of those. And then for me, I always go back to this: I like to take a data set that I'm familiar with first rather than a technology that I'm unfamiliar with. +Whenever I'm learning something new, I start with something I'm familiar with and then try to apply that thing to the new technology, as opposed to picking the technology first and then trying to, you know, kind of go back and forth between the tutorials that they provide.
+But I always like to go back to a domain I'm familiar with because then I don't have to rebuild my intuition. Right. So for instance, I've never really done image search, but I've done e-commerce search all the time. +So it makes way more sense for me to try out transformers with e-commerce than it does with images, just because I don't know the core intuition as much on images as I do for e-commerce. So I would probably start that way first. Yeah, I agree. +And another thing, yeah, of course, Grant, thank you, you mentioned, you know, my Medium blog posts; there are a lot more people blogging on this, but I have a specific collection on Medium, 37 minutes by sheer reading time. +You can go through basics like exact k-NN search all the way up to, you know, neural retrieval, which is approximate nearest neighbor search, because you cannot do exact k-NN search at scale. It will just not scale. +So you have to kind of go and cut some corners, so to say, but actually in a more mathematical sense: you create these algorithms that beautifully handle this complexity for you. So go check it out. +I think the next, and probably last question, but not least, is from Ashish: is the search with ML course right to step into if I'm looking to learn about semantic search and add the functionality to SQL or NoSQL databases? That's an interesting question. +I guess I haven't thought about it in that sense. +I mean, I think a lot of the techniques we use in the ML class relate to semantic search and relate to, like, how can we get better relevance out of the engine? So semantic search being one of those types of capabilities, kind of; semantic search is often a pretty loaded phrase. +So depending on what you're trying to do there, Ashish, your mileage may vary. But we certainly cover things like classifying your content, classifying your queries. We do learning to rank.
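The distinction drawn above, exact k-NN versus the approximate nearest neighbor search that neural retrieval relies on, comes down to this baseline: exact search compares the query against every stored vector, which is correct but O(N) per query. A minimal sketch of that exact baseline (approximate indexes such as HNSW exist precisely to avoid this full scan):

```python
import heapq

def exact_knn(query, vectors, k):
    """Brute-force exact k-nearest-neighbor search: score the query
    against EVERY stored vector and keep the k closest indices.
    This is what approximate indexes (HNSW, IVF, etc.) trade away,
    accepting slightly imperfect results for sub-linear query time."""
    def d2(a, b):
        # Squared Euclidean distance; sufficient for ranking neighbors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: d2(query, vectors[i]))
```

At millions or billions of vectors the full scan per query becomes the bottleneck, which is the "cut some corners, but in a mathematical sense" point: ANN algorithms restructure the data so most vectors are never touched.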
+We talk about synonym expansion, you know, smarter queries, better filters; all of those kinds of things, I think, can be loosely grouped under semantic search. +If you're talking more like you want to do graph-based inferences or, you know, use things like Wikidata or DBpedia or those kinds of things to infer relationships and do semantic search that way, we don't really cover those as much. +We do base off of OpenSearch, but I think the concepts apply in general. With the SQL and NoSQL databases, I know a lot of them have kind of baseline search functionality in them. +And so you would be able to apply some of the principles, because a lot of the things we do, you actually do either before indexing or before querying. +So those would certainly apply, you know, because at the end of the day, you're just using those things to then generate a better query or a better document to be stored in your engine. And so I don't see a reason why they wouldn't work in a NoSQL store or a SQL store. +It's just then, how do you translate that into your query language, right? But we do use OpenSearch. All the examples are OpenSearch. You would have to do the work to leap to whatever it is your engine is doing. Yeah, absolutely. And the good thing is that OpenSearch does have a k-NN plugin. +They call it the k-NN plugin, but it's actually approximate nearest neighbor search. And it's off-heap, for those who care. So it's not inside Java, but it still allows you to get a feel of how neural search will influence your results. +And you can also, you know, mix and match, sort of using more traditional BM25 with this. Awesome. This was the last question. Thanks so much to everyone who asked their questions live. And, you know, consider joining the course if you haven't yet. +And, Grant, thanks so much for this session and for answering the questions and sharing your wisdom. I've enjoyed this conversation very much. Thank you.
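The k-NN plugin mentioned here is configured through the index mapping: an `index.knn` setting plus a `knn_vector` field, and then queried with a `knn` clause. The request bodies look roughly like the sketch below; the index and field names (`products`, `title_vector`) and the dimension 384 are made-up placeholders, and the bodies are shown as plain Python dicts rather than tied to any particular client library.

```python
import json

# Index creation body for OpenSearch's k-NN plugin: enable the plugin
# on the index and declare a vector field with its dimension.
# ("title_vector" and 384 are illustrative placeholders.)
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "title_vector": {"type": "knn_vector", "dimension": 384},
        }
    },
}

# Approximate nearest-neighbor query against that field. In a hybrid
# setup this can be mixed and matched with a traditional BM25 match
# clause, as discussed above.
knn_query = {
    "size": 10,
    "query": {
        "knn": {
            "title_vector": {
                "vector": [0.1] * 384,  # the embedded query text
                "k": 10,
            }
        }
    },
}

# Both bodies are plain JSON, ready to send to the REST API.
payload = json.dumps(knn_query)
```

These would be sent to the index-creation and `_search` endpoints respectively; the Python client for OpenSearch accepts the same dict bodies.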
Thanks so much for having me, Dmitry, and keep up the great work. I love the podcast. +And it's awesome to see a search-dedicated podcast out there. So congrats and good luck with that. Thank you so much. All right. Bye-bye. Bye, folks. Thanks, Dmitry. Thanks, Grant. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md b/transcripts_with_timestamps/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md new file mode 100644 index 0000000..a1bca5b --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/greg-kogan-pinecone-vector-podcast-with-dmitry-kan.md @@ -0,0 +1,2170 @@ +--- +description: '

Show notes:

1. Pinecone 2.0: https://www.pinecone.io/learn/pinecon... + It is GA and free: https://www.pinecone.io/learn/v2-pric...

2. Get your + “Love Thy Nearest Neighbour” t-shirt :) shoot an email to greg@pinecone.io

3. + Billion-Scale Approximate Nearest Neighbour Search Challenge: https://big-ann-benchmarks.com/index.... +

4. ANNOY: https://github.com/spotify/annoy

5. FAISS: https://github.com/facebookresearch/f... +

6. HNSW: https://github.com/nmslib/hnswlib

7. “How Zero Results Are + Killing Ecommerce Conversions” https://lucidworks.com/post/how-zero-... +

8. Try out Pinecone vector DB: https://app.pinecone.io/

9. Twitter: https://twitter.com/Pinecone_io

10. + LinkedIn: https://www.linkedin.com/company/pine...

11. Greg’s Twitter: + https://twitter.com/grigoriy_kogan +

12. Dmitry''s Twitter: https://twitter.com/DmitryKan

Watch on YouTube: https://www.youtube.com/watch?v=jT3i7NLwJ8w

' +image_url: https://media.rss.com/vector-podcast/20211206_061204_ed150262b3f862f73666d3cce317fc98.jpg +pub_date: Mon, 06 Dec 2021 18:00:04 GMT +title: Greg Kogan - Pinecone - Vector Podcast with Dmitry Kan +url: https://rss.com/podcasts/vector-podcast/334671 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 22.0, "text": " Hello + everyone, so, Dr. Podcast here. Today I have Greg Coggen with the charter of marketing.", + "tokens": [50364, 2425, 1518, 11, 370, 11, 2491, 13, 29972, 510, 13, 2692, 286, + 362, 11490, 383, 664, 1766, 365, 264, 27472, 295, 6370, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.6521179764359085, "compression_ratio": 1.0568181818181819, + "no_speech_prob": 0.08461221307516098}, {"id": 1, "seek": 2200, "start": 22.0, "end": + 31.0, "text": " He works for Pinecon. So today we will dive into Pinecon and maybe + Greg will give us some highlights as well. Hi Greg.", "tokens": [50364, 634, 1985, + 337, 33531, 1671, 13, 407, 965, 321, 486, 9192, 666, 33531, 1671, 293, 1310, 11490, + 486, 976, 505, 512, 14254, 382, 731, 13, 2421, 11490, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.2748660077952375, "compression_ratio": 1.6431372549019607, + "no_speech_prob": 0.5747199654579163}, {"id": 2, "seek": 2200, "start": 31.0, "end": + 34.0, "text": " It''s me, Tree. Thanks for having me.", "tokens": [50814, 467, 311, + 385, 11, 22291, 13, 2561, 337, 1419, 385, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.2748660077952375, "compression_ratio": 1.6431372549019607, "no_speech_prob": + 0.5747199654579163}, {"id": 3, "seek": 2200, "start": 34.0, "end": 49.0, "text": + " Yeah, awesome. Thanks for joining. 
So I was thinking maybe you can introduce yourself + to our audience because actually I personally was quite impressed that you''re so + technical and even though you''re in charge of marketing, you''re like your lingo + is so technical.", "tokens": [50964, 865, 11, 3476, 13, 2561, 337, 5549, 13, 407, + 286, 390, 1953, 1310, 291, 393, 5366, 1803, 281, 527, 4034, 570, 767, 286, 5665, + 390, 1596, 11679, 300, 291, 434, 370, 6191, 293, 754, 1673, 291, 434, 294, 4602, + 295, 6370, 11, 291, 434, 411, 428, 287, 18459, 307, 370, 6191, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.2748660077952375, "compression_ratio": 1.6431372549019607, + "no_speech_prob": 0.5747199654579163}, {"id": 4, "seek": 4900, "start": 49.0, "end": + 53.0, "text": " So technical, so can you do have some technical background?", "tokens": + [50364, 407, 6191, 11, 370, 393, 291, 360, 362, 512, 6191, 3678, 30, 50564], "temperature": + 0.0, "avg_logprob": -0.1855557131212811, "compression_ratio": 1.3533834586466165, + "no_speech_prob": 0.013169613666832447}, {"id": 5, "seek": 4900, "start": 53.0, + "end": 63.0, "text": " Yeah, I actually have a degree in naval architecture. It''s + an engineering degree and that was my career for three years.", "tokens": [50564, + 865, 11, 286, 767, 362, 257, 4314, 294, 33050, 9482, 13, 467, 311, 364, 7043, 4314, + 293, 300, 390, 452, 3988, 337, 1045, 924, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.1855557131212811, "compression_ratio": 1.3533834586466165, "no_speech_prob": + 0.013169613666832447}, {"id": 6, "seek": 6300, "start": 63.0, "end": 86.0, "text": + " And I did systems engineering and mechanical engineering electrical and so on. 
+ While I was doing that, I also was moonlighting as a web developer and taught myself + PHP and things like that and reading about startups and eventually became clear + that I should make my day jobs related to startups.", "tokens": [50364, 400, 286, + 630, 3652, 7043, 293, 12070, 7043, 12147, 293, 370, 322, 13, 3987, 286, 390, 884, + 300, 11, 286, 611, 390, 48058, 278, 382, 257, 3670, 10754, 293, 5928, 2059, 47298, + 293, 721, 411, 300, 293, 3760, 466, 28041, 293, 4728, 3062, 1850, 300, 286, 820, + 652, 452, 786, 4782, 4077, 281, 28041, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.11429223367723368, "compression_ratio": 1.5691489361702127, "no_speech_prob": + 0.05036507546901703}, {"id": 7, "seek": 8600, "start": 86.0, "end": 98.0, "text": + " And so I left my engineering career and went to work with startups with marketing + and I fell in love with it. That was about nine years ago.", "tokens": [50364, 400, + 370, 286, 1411, 452, 7043, 3988, 293, 1437, 281, 589, 365, 28041, 365, 6370, 293, + 286, 5696, 294, 959, 365, 309, 13, 663, 390, 466, 4949, 924, 2057, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.11116004812306371, "compression_ratio": 1.5705521472392638, + "no_speech_prob": 0.011474424041807652}, {"id": 8, "seek": 8600, "start": 98.0, + "end": 109.0, "text": " And I''ve been working with for the past eight years. 
I + was consulting and advising technical startups on marketing.", "tokens": [50964, + 400, 286, 600, 668, 1364, 365, 337, 264, 1791, 3180, 924, 13, 286, 390, 23682, 293, + 35598, 6191, 28041, 322, 6370, 13, 51514], "temperature": 0.0, "avg_logprob": -0.11116004812306371, + "compression_ratio": 1.5705521472392638, "no_speech_prob": 0.011474424041807652}, + {"id": 9, "seek": 10900, "start": 109.0, "end": 120.0, "text": " And I loved it + because I was able to use my engineering thinking and get along well with technical + founders and the", "tokens": [50364, 400, 286, 4333, 309, 570, 286, 390, 1075, 281, + 764, 452, 7043, 1953, 293, 483, 2051, 731, 365, 6191, 25608, 293, 264, 50914], "temperature": + 0.0, "avg_logprob": -0.12089078625043233, "compression_ratio": 1.4507042253521127, + "no_speech_prob": 0.006496563088148832}, {"id": 10, "seek": 10900, "start": 120.0, + "end": 128.0, "text": " like the coding foundation I had allowed me to get a grasp + for what it is the products do.", "tokens": [50914, 411, 264, 17720, 7030, 286, + 632, 4350, 385, 281, 483, 257, 21743, 337, 437, 309, 307, 264, 3383, 360, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.12089078625043233, "compression_ratio": 1.4507042253521127, + "no_speech_prob": 0.006496563088148832}, {"id": 11, "seek": 12800, "start": 128.0, + "end": 153.0, "text": " And last year I joined pineconous the VP of marketing and + the engineering background certainly helps we have a technical product technical + users and really everyone at the company has a very technical background, even our + director of product has a PhD in electrical engineering just to give you a sense.", + "tokens": [50364, 400, 1036, 1064, 286, 6869, 15113, 1671, 563, 264, 35812, 295, + 6370, 293, 264, 7043, 3678, 3297, 3665, 321, 362, 257, 6191, 1674, 6191, 5022, 293, + 534, 1518, 412, 264, 2237, 575, 257, 588, 6191, 3678, 11, 754, 527, 5391, 295, 1674, + 575, 257, 14476, 294, 12147, 7043, 445, 281, 976, 291, 257, 2020, 13, 51614], 
"temperature": + 0.0, "avg_logprob": -0.21626839395296776, "compression_ratio": 1.6031746031746033, + "no_speech_prob": 0.033137157559394836}, {"id": 12, "seek": 15300, "start": 153.0, + "end": 156.0, "text": " And I was like, wow, that''s impressive.", "tokens": [50364, + 400, 286, 390, 411, 11, 6076, 11, 300, 311, 8992, 13, 50514], "temperature": 0.0, + "avg_logprob": -0.5187439468671691, "compression_ratio": 1.3617021276595744, "no_speech_prob": + 0.3348621428012848}, {"id": 13, "seek": 15300, "start": 156.0, "end": 166.0, "text": + " Yeah, that''s like you mentioned to the H.P. actually this was one of the first + languages I called learned to code and decide Pearl, but yeah, this days.", "tokens": + [50514, 865, 11, 300, 311, 411, 291, 2835, 281, 264, 389, 13, 47, 13, 767, 341, + 390, 472, 295, 264, 700, 8650, 286, 1219, 3264, 281, 3089, 293, 4536, 24639, 11, + 457, 1338, 11, 341, 1708, 13, 51014], "temperature": 0.0, "avg_logprob": -0.5187439468671691, + "compression_ratio": 1.3617021276595744, "no_speech_prob": 0.3348621428012848}, + {"id": 14, "seek": 16600, "start": 166.0, "end": 181.0, "text": " I''m almost I + slowed down before before I told people I learned PHP because I know there''s a + bit of stigma with it. 
It was like messy and it''s like, you know, not as pristine + or.", "tokens": [50364, 286, 478, 1920, 286, 32057, 760, 949, 949, 286, 1907, 561, + 286, 3264, 47298, 570, 286, 458, 456, 311, 257, 857, 295, 27880, 365, 309, 13, 467, + 390, 411, 16191, 293, 309, 311, 411, 11, 291, 458, 11, 406, 382, 582, 42745, 420, + 13, 51114], "temperature": 0.0, "avg_logprob": -0.20353189328821694, "compression_ratio": + 1.5588235294117647, "no_speech_prob": 0.19927692413330078}, {"id": 15, "seek": 16600, + "start": 181.0, "end": 190.0, "text": " Yeah, as fancy as something else, but they + got the job job done like with with that foundation, a lot of other things made + a lot more sense.", "tokens": [51114, 865, 11, 382, 10247, 382, 746, 1646, 11, 457, + 436, 658, 264, 1691, 1691, 1096, 411, 365, 365, 300, 7030, 11, 257, 688, 295, 661, + 721, 1027, 257, 688, 544, 2020, 13, 51564], "temperature": 0.0, "avg_logprob": -0.20353189328821694, + "compression_ratio": 1.5588235294117647, "no_speech_prob": 0.19927692413330078}, + {"id": 16, "seek": 19000, "start": 190.0, "end": 207.0, "text": " Yeah, absolutely. + Yeah, I mean, I also enjoyed actually like it was one of the first jobs I got was + in PHP. So I built like a forum and every class in the code was starting with oops + and I was asking the new engineer doesn''t mean all OP like object oriented programming.", + "tokens": [50364, 865, 11, 3122, 13, 865, 11, 286, 914, 11, 286, 611, 4626, 767, + 411, 309, 390, 472, 295, 264, 700, 4782, 286, 658, 390, 294, 47298, 13, 407, 286, + 3094, 411, 257, 17542, 293, 633, 1508, 294, 264, 3089, 390, 2891, 365, 34166, 293, + 286, 390, 3365, 264, 777, 11403, 1177, 380, 914, 439, 23324, 411, 2657, 21841, 9410, + 13, 51214], "temperature": 0.0, "avg_logprob": -0.23643625699556792, "compression_ratio": + 1.583969465648855, "no_speech_prob": 0.09716351330280304}, {"id": 17, "seek": 19000, + "start": 207.0, "end": 215.0, "text": " And he said, no, it just means oops, I''m + not technical. 
So he wasn''t technical enough to know what is all P. But anyway, + that was kind of funny.", "tokens": [51214, 400, 415, 848, 11, 572, 11, 309, 445, + 1355, 34166, 11, 286, 478, 406, 6191, 13, 407, 415, 2067, 380, 6191, 1547, 281, + 458, 437, 307, 439, 430, 13, 583, 4033, 11, 300, 390, 733, 295, 4074, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.23643625699556792, "compression_ratio": 1.583969465648855, + "no_speech_prob": 0.09716351330280304}, {"id": 18, "seek": 21500, "start": 215.0, + "end": 235.0, "text": " Yeah, that''s cool. So basically like you have the technical + background. You also know how to explain things. I think it''s very important in + our profession at large and sounds like you''ve been you''ve been advancing into + this topic more and more to the level of becoming, you know, symbol or like BPO + marketing actually to be precise, right.", "tokens": [50364, 865, 11, 300, 311, + 1627, 13, 407, 1936, 411, 291, 362, 264, 6191, 3678, 13, 509, 611, 458, 577, 281, + 2903, 721, 13, 286, 519, 309, 311, 588, 1021, 294, 527, 7032, 412, 2416, 293, 3263, + 411, 291, 600, 668, 291, 600, 668, 27267, 666, 341, 4829, 544, 293, 544, 281, 264, + 1496, 295, 5617, 11, 291, 458, 11, 5986, 420, 411, 363, 34885, 6370, 767, 281, 312, + 13600, 11, 558, 13, 51364], "temperature": 0.0, "avg_logprob": -0.19429757720545718, + "compression_ratio": 1.540909090909091, "no_speech_prob": 0.10594156384468079}, + {"id": 19, "seek": 23500, "start": 235.0, "end": 254.0, "text": " Yeah, that''s + awesome. So tell me more a bit more about fine code like what what are you guys + building and yeah, I know that you''ve recently had a major upgrade of fine code. 
+ Maybe if you wish you could highlight some of the improvements you guys made.", + "tokens": [50364, 865, 11, 300, 311, 3476, 13, 407, 980, 385, 544, 257, 857, 544, + 466, 2489, 3089, 411, 437, 437, 366, 291, 1074, 2390, 293, 1338, 11, 286, 458, 300, + 291, 600, 3938, 632, 257, 2563, 11484, 295, 2489, 3089, 13, 2704, 498, 291, 3172, + 291, 727, 5078, 512, 295, 264, 13797, 291, 1074, 1027, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.2792580168125993, "compression_ratio": 1.4764705882352942, + "no_speech_prob": 0.054512519389390945}, {"id": 20, "seek": 25400, "start": 254.0, + "end": 266.0, "text": " Sure. So we''re building a vector database that makes it + very easy to deploy to build and deploy the vector search into production applications.", + "tokens": [50364, 4894, 13, 407, 321, 434, 2390, 257, 8062, 8149, 300, 1669, 309, + 588, 1858, 281, 7274, 281, 1322, 293, 7274, 264, 8062, 3164, 666, 4265, 5821, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.09147703647613525, "compression_ratio": + 1.4863013698630136, "no_speech_prob": 0.008751542307436466}, {"id": 21, "seek": + 25400, "start": 266.0, "end": 274.0, "text": " This is especially useful for semantic + search and recommendation systems.", "tokens": [50964, 639, 307, 2318, 4420, 337, + 47982, 3164, 293, 11879, 3652, 13, 51364], "temperature": 0.0, "avg_logprob": -0.09147703647613525, + "compression_ratio": 1.4863013698630136, "no_speech_prob": 0.008751542307436466}, + {"id": 22, "seek": 27400, "start": 275.0, "end": 289.0, "text": " There are, we + saw, I should say the founders saw that there''s several ways of doing this to try + and emulate the big companies like Facebook, Google, Microsoft and Spotify.", "tokens": + [50414, 821, 366, 11, 321, 1866, 11, 286, 820, 584, 264, 25608, 1866, 300, 456, + 311, 2940, 2098, 295, 884, 341, 281, 853, 293, 45497, 264, 955, 3431, 411, 4384, + 11, 3329, 11, 8116, 293, 29036, 13, 51114], "temperature": 0.0, "avg_logprob": -0.1673540472984314, + 
"compression_ratio": 1.2573529411764706, "no_speech_prob": 0.016666073352098465}, + {"id": 23, "seek": 28900, "start": 289.0, "end": 315.0, "text": " They all involved + a lot of engineering work and a lot of infrastructure work and maintenance to actually + make it run in production, whether you''re a small startup and have better things + to focus on or a big tech company and also have better things to focus on, especially + when supporting your search and recommender systems would involve like a big team + of engineers.", "tokens": [50364, 814, 439, 3288, 257, 688, 295, 7043, 589, 293, + 257, 688, 295, 6896, 589, 293, 11258, 281, 767, 652, 309, 1190, 294, 4265, 11, 1968, + 291, 434, 257, 1359, 18578, 293, 362, 1101, 721, 281, 1879, 322, 420, 257, 955, + 7553, 2237, 293, 611, 362, 1101, 721, 281, 1879, 322, 11, 2318, 562, 7231, 428, + 3164, 293, 2748, 260, 3652, 576, 9494, 411, 257, 955, 1469, 295, 11955, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.0883713813677226, "compression_ratio": 1.755980861244019, + "no_speech_prob": 0.008265172131359577}, {"id": 24, "seek": 31500, "start": 315.0, + "end": 341.0, "text": " So we recently announced pine cone 2.0 and that''s that''s + a major release that gets us closer to helping companies deploy this in production. 
+ So one of the biggest things we''ve heard from users is that to get this in production, + they need to emulate some of their traditional search features they had before, + but they''re trying to replace.", "tokens": [50364, 407, 321, 3938, 7548, 15113, + 19749, 568, 13, 15, 293, 300, 311, 300, 311, 257, 2563, 4374, 300, 2170, 505, 4966, + 281, 4315, 3431, 7274, 341, 294, 4265, 13, 407, 472, 295, 264, 3880, 721, 321, 600, + 2198, 490, 5022, 307, 300, 281, 483, 341, 294, 4265, 11, 436, 643, 281, 45497, 512, + 295, 641, 5164, 3164, 4122, 436, 632, 949, 11, 457, 436, 434, 1382, 281, 7406, 13, + 51664], "temperature": 0.0, "avg_logprob": -0.10980924841475813, "compression_ratio": + 1.6018957345971565, "no_speech_prob": 0.0026685791090130806}, {"id": 25, "seek": + 34100, "start": 341.0, "end": 351.0, "text": " And that was specifically filtering. + They wanted to have some control over the nearest neighbor search results that they + were getting through pine cone.", "tokens": [50364, 400, 300, 390, 4682, 30822, + 13, 814, 1415, 281, 362, 512, 1969, 670, 264, 23831, 5987, 3164, 3542, 300, 436, + 645, 1242, 807, 15113, 19749, 13, 50864], "temperature": 0.0, "avg_logprob": -0.10829769863801844, + "compression_ratio": 1.7333333333333334, "no_speech_prob": 0.004197240341454744}, + {"id": 26, "seek": 34100, "start": 351.0, "end": 369.0, "text": " Another thing + was cost since typically vector search nearest neighbor searches are done in memory + companies with millions and billions of items, which are the types of companies + that benefit most from pine cone.", "tokens": [50864, 3996, 551, 390, 2063, 1670, + 5850, 8062, 3164, 23831, 5987, 26701, 366, 1096, 294, 4675, 3431, 365, 6803, 293, + 17375, 295, 4754, 11, 597, 366, 264, 3467, 295, 3431, 300, 5121, 881, 490, 15113, + 19749, 13, 51764], "temperature": 0.0, "avg_logprob": -0.10829769863801844, "compression_ratio": + 1.7333333333333334, "no_speech_prob": 0.004197240341454744}, {"id": 27, "seek": + 36900, "start": 369.0, 
"end": 375.0, "text": " We''re finding it prohibitively expensive + to do vector search not just on pine cone, but anywhere.", "tokens": [50364, 492, + 434, 5006, 309, 16015, 2187, 356, 5124, 281, 360, 8062, 3164, 406, 445, 322, 15113, + 19749, 11, 457, 4992, 13, 50664], "temperature": 0.0, "avg_logprob": -0.14237532525692345, + "compression_ratio": 1.3988095238095237, "no_speech_prob": 0.0013494844315573573}, + {"id": 28, "seek": 36900, "start": 375.0, "end": 387.0, "text": " And so for them, + the barrier to getting into production wasn''t lack of engineering teams. It was + like just astronomical cost projections.", "tokens": [50664, 400, 370, 337, 552, + 11, 264, 13357, 281, 1242, 666, 4265, 2067, 380, 5011, 295, 7043, 5491, 13, 467, + 390, 411, 445, 49035, 2063, 32371, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.14237532525692345, "compression_ratio": 1.3988095238095237, "no_speech_prob": + 0.0013494844315573573}, {"id": 29, "seek": 38700, "start": 387.0, "end": 407.0, + "text": " And so for that, we are releasing hybrid storage, which stores part of + the, which basically stores some data on disk and a smaller amount of data in memory, + which reduces costs up to 10x, reduces infrastructure costs.", "tokens": [50364, + 400, 370, 337, 300, 11, 321, 366, 16327, 13051, 6725, 11, 597, 9512, 644, 295, 264, + 11, 597, 1936, 9512, 512, 1412, 322, 12355, 293, 257, 4356, 2372, 295, 1412, 294, + 4675, 11, 597, 18081, 5497, 493, 281, 1266, 87, 11, 18081, 6896, 5497, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.14196710197293028, "compression_ratio": 1.5174825174825175, + "no_speech_prob": 0.035922158509492874}, {"id": 30, "seek": 40700, "start": 407.0, + "end": 420.0, "text": " And we''re passing that along to users. So it''s going to + reduce it or manage the infrastructure, but their costs are going to go down as + well. 
And there''s some other things like sock to compliance.", "tokens": [50364, + 400, 321, 434, 8437, 300, 2051, 281, 5022, 13, 407, 309, 311, 516, 281, 5407, 309, + 420, 3067, 264, 6896, 11, 457, 641, 5497, 366, 516, 281, 352, 760, 382, 731, 13, + 400, 456, 311, 512, 661, 721, 411, 35302, 281, 15882, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.16702741257687834, "compression_ratio": 1.410071942446043, + "no_speech_prob": 0.005396661348640919}, {"id": 31, "seek": 42000, "start": 420.0, + "end": 425.0, "text": " They''re totally new rest API and Python client.", "tokens": + [50364, 814, 434, 3879, 777, 1472, 9362, 293, 15329, 6423, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.2201005959812599, "compression_ratio": 1.5196078431372548, + "no_speech_prob": 0.2912045121192932}, {"id": 32, "seek": 42000, "start": 425.0, + "end": 429.0, "text": " And console.", "tokens": [50614, 400, 11076, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.2201005959812599, "compression_ratio": 1.5196078431372548, + "no_speech_prob": 0.2912045121192932}, {"id": 33, "seek": 42000, "start": 429.0, + "end": 445.0, "text": " And a bunch of other things as well. So, yeah, and there''s + even more I can''t announce just yet, but we''re our engineering team is growing + in our development velocities picking up as well. So we''re going to have lots of + new things to share very soon.", "tokens": [50814, 400, 257, 3840, 295, 661, 721, + 382, 731, 13, 407, 11, 1338, 11, 293, 456, 311, 754, 544, 286, 393, 380, 7478, 445, + 1939, 11, 457, 321, 434, 527, 7043, 1469, 307, 4194, 294, 527, 3250, 7806, 1088, + 8867, 493, 382, 731, 13, 407, 321, 434, 516, 281, 362, 3195, 295, 777, 721, 281, + 2073, 588, 2321, 13, 51614], "temperature": 0.0, "avg_logprob": -0.2201005959812599, + "compression_ratio": 1.5196078431372548, "no_speech_prob": 0.2912045121192932}, + {"id": 34, "seek": 44500, "start": 445.0, "end": 459.0, "text": " Yeah, that''s + fantastic. Can''t wait. 
And then compress on the on the 2.0 release. But I just + noticed your t-shirt says love the nearest neighbor. Wow.", "tokens": [50364, 865, + 11, 300, 311, 5456, 13, 1664, 380, 1699, 13, 400, 550, 14778, 322, 264, 322, 264, + 568, 13, 15, 4374, 13, 583, 286, 445, 5694, 428, 256, 12, 15313, 1619, 959, 264, + 23831, 5987, 13, 3153, 13, 51064], "temperature": 0.0, "avg_logprob": -0.37235296689547026, + "compression_ratio": 1.2945205479452055, "no_speech_prob": 0.23936359584331512}, + {"id": 35, "seek": 44500, "start": 459.0, "end": 464.0, "text": " This is so relevant + to this discussion.", "tokens": [51064, 639, 307, 370, 7340, 281, 341, 5017, 13, + 51314], "temperature": 0.0, "avg_logprob": -0.37235296689547026, "compression_ratio": + 1.2945205479452055, "no_speech_prob": 0.23936359584331512}, {"id": 36, "seek": 46400, + "start": 464.0, "end": 469.0, "text": " We have lots more of these.", "tokens": + [50364, 492, 362, 3195, 544, 295, 613, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.3933400426592146, "compression_ratio": 1.3088235294117647, "no_speech_prob": + 0.4658838212490082}, {"id": 37, "seek": 46400, "start": 469.0, "end": 477.0, "text": + " Anyone can send me an email. I got pine cone that I know and I''ll get your form + to fill out to get one.", "tokens": [50614, 14643, 393, 2845, 385, 364, 3796, 13, + 286, 658, 15113, 19749, 300, 286, 458, 293, 286, 603, 483, 428, 1254, 281, 2836, + 484, 281, 483, 472, 13, 51014], "temperature": 0.0, "avg_logprob": -0.3933400426592146, + "compression_ratio": 1.3088235294117647, "no_speech_prob": 0.4658838212490082}, + {"id": 38, "seek": 46400, "start": 477.0, "end": 481.0, "text": " Oh, thanks. Thanks, + Greg. 
I''ll gladly wear it.", "tokens": [51014, 876, 11, 3231, 13, 2561, 11, 11490, + 13, 286, 603, 47307, 3728, 309, 13, 51214], "temperature": 0.0, "avg_logprob": -0.3933400426592146, + "compression_ratio": 1.3088235294117647, "no_speech_prob": 0.4658838212490082}, + {"id": 39, "seek": 48100, "start": 481.0, "end": 503.0, "text": " So yeah, I mean + that that covers the value prop behind your product. So I mean the key element for + me is also that as you said, you''re reducing cost and you know, like you provide + fully managed, you know, solution to better search. So teams don''t have to kind + of like run around, figure out some low level things and just get to business. That''s + great.", "tokens": [50364, 407, 1338, 11, 286, 914, 300, 300, 10538, 264, 2158, + 2365, 2261, 428, 1674, 13, 407, 286, 914, 264, 2141, 4478, 337, 385, 307, 611, 300, + 382, 291, 848, 11, 291, 434, 12245, 2063, 293, 291, 458, 11, 411, 291, 2893, 4498, + 6453, 11, 291, 458, 11, 3827, 281, 1101, 3164, 13, 407, 5491, 500, 380, 362, 281, + 733, 295, 411, 1190, 926, 11, 2573, 484, 512, 2295, 1496, 721, 293, 445, 483, 281, + 1606, 13, 663, 311, 869, 13, 51464], "temperature": 0.0, "avg_logprob": -0.16477466764904203, + "compression_ratio": 1.5855855855855856, "no_speech_prob": 0.353030264377594}, {"id": + 40, "seek": 50300, "start": 503.0, "end": 518.0, "text": " So the next thing I wanted + to ask you like more like on the lines of how you know there are different ways + of implementing vector search right and there are different algorithms. 
There is + an end bench marks that will be big and then benchmarks soon as well.", "tokens": + [50364, 407, 264, 958, 551, 286, 1415, 281, 1029, 291, 411, 544, 411, 322, 264, + 3876, 295, 577, 291, 458, 456, 366, 819, 2098, 295, 18114, 8062, 3164, 558, 293, + 456, 366, 819, 14642, 13, 821, 307, 364, 917, 10638, 10640, 300, 486, 312, 955, + 293, 550, 43751, 2321, 382, 731, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.2738764319621341, "compression_ratio": 1.6787564766839378, "no_speech_prob": + 0.06032079830765724}, {"id": 41, "seek": 50300, "start": 518.0, "end": 524.0, "text": + " That competition is going for listeners outshare the link as well.", "tokens": + [51114, 663, 6211, 307, 516, 337, 23274, 484, 2716, 543, 264, 2113, 382, 731, 13, + 51414], "temperature": 0.0, "avg_logprob": -0.2738764319621341, "compression_ratio": + 1.6787564766839378, "no_speech_prob": 0.06032079830765724}, {"id": 42, "seek": 52400, + "start": 524.0, "end": 552.0, "text": " But what ways did you kind of consider to + implement your tech. I know some parts of it are proprietary. So maybe you cannot + share too much detail, but maybe you can share some things give us a clue how you + do things on kind of like algorithmic side and also like kind of like speak to the + product that large like, you know, you mentioned, so see two compliance. 
So it was + very important for your customers, right.", "tokens": [50364, 583, 437, 2098, 630, + 291, 733, 295, 1949, 281, 4445, 428, 7553, 13, 286, 458, 512, 3166, 295, 309, 366, + 38992, 13, 407, 1310, 291, 2644, 2073, 886, 709, 2607, 11, 457, 1310, 291, 393, + 2073, 512, 721, 976, 505, 257, 13602, 577, 291, 360, 721, 322, 733, 295, 411, 9284, + 299, 1252, 293, 611, 411, 733, 295, 411, 1710, 281, 264, 1674, 300, 2416, 411, 11, + 291, 458, 11, 291, 2835, 11, 370, 536, 732, 15882, 13, 407, 309, 390, 588, 1021, + 337, 428, 4581, 11, 558, 13, 51764], "temperature": 0.0, "avg_logprob": -0.15645420935846144, + "compression_ratio": 1.699588477366255, "no_speech_prob": 0.03916977718472481}, + {"id": 43, "seek": 55200, "start": 552.0, "end": 557.0, "text": " So that also is + kind of included in the how part.", "tokens": [50364, 407, 300, 611, 307, 733, 295, + 5556, 294, 264, 577, 644, 13, 50614], "temperature": 0.0, "avg_logprob": -0.12182786729600695, + "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.0021915591787546873}, + {"id": 44, "seek": 55200, "start": 557.0, "end": 574.0, "text": " Yeah, I''ll be + a little lighter on the technical side because I would rather, I''d rather point + you to our docs and point people to our docs and some of the articles and examples + we have, then say something that''s imprecise from a technical standpoint.", "tokens": + [50614, 865, 11, 286, 603, 312, 257, 707, 11546, 322, 264, 6191, 1252, 570, 286, + 576, 2831, 11, 286, 1116, 2831, 935, 291, 281, 527, 45623, 293, 935, 561, 281, 527, + 45623, 293, 512, 295, 264, 11290, 293, 5110, 321, 362, 11, 550, 584, 746, 300, 311, + 704, 13867, 908, 490, 257, 6191, 15827, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.12182786729600695, "compression_ratio": 1.5957446808510638, "no_speech_prob": + 0.0021915591787546873}, {"id": 45, "seek": 57400, "start": 574.0, "end": 589.0, + "text": " Generally that there are sort of three layers, we see three layers in + the inside of vector 
search solution or vector database.", "tokens": [50364, 21082, + 300, 456, 366, 1333, 295, 1045, 7914, 11, 321, 536, 1045, 7914, 294, 264, 1854, + 295, 8062, 3164, 3827, 420, 8062, 8149, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.3259212630135672, "compression_ratio": 1.3263157894736841, "no_speech_prob": + 0.004582516849040985}, {"id": 46, "seek": 58900, "start": 589.0, "end": 598.0, "text": + " The lowest layer is the near neighbor search algorithm like annoy or hnsw.", "tokens": + [50364, 440, 12437, 4583, 307, 264, 2651, 5987, 3164, 9284, 411, 8759, 420, 276, + 3695, 86, 13, 50814], "temperature": 0.0, "avg_logprob": -0.22692115604877472, "compression_ratio": + 1.6022727272727273, "no_speech_prob": 0.017932498827576637}, {"id": 47, "seek": + 58900, "start": 598.0, "end": 606.0, "text": " Then there''s an index library, which + contains those algorithms and that''s like face.", "tokens": [50814, 1396, 456, + 311, 364, 8186, 6405, 11, 597, 8306, 729, 14642, 293, 300, 311, 411, 1851, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.22692115604877472, "compression_ratio": 1.6022727272727273, + "no_speech_prob": 0.017932498827576637}, {"id": 48, "seek": 58900, "start": 606.0, + "end": 615.0, "text": " And then there''s a shell around that, which we''re calling + vector database that provides things like live index updates and", "tokens": [51214, + 400, 550, 456, 311, 257, 8720, 926, 300, 11, 597, 321, 434, 5141, 8062, 8149, 300, + 6417, 721, 411, 1621, 8186, 9205, 293, 51664], "temperature": 0.0, "avg_logprob": + -0.22692115604877472, "compression_ratio": 1.6022727272727273, "no_speech_prob": + 0.017932498827576637}, {"id": 49, "seek": 61500, "start": 615.0, "end": 624.0, "text": + " crowd operations on vectors and filtering and metadata storage and things like + that.", "tokens": [50364, 6919, 7705, 322, 18875, 293, 30822, 293, 26603, 6725, + 293, 721, 411, 300, 13, 50814], "temperature": 0.0, "avg_logprob": -0.3079881417123895, + "compression_ratio": 
1.3153153153153154, "no_speech_prob": 0.0026434902101755142}, + {"id": 50, "seek": 61500, "start": 624.0, "end": 635.0, "text": " So for the index, + we, Pankoan does use face for exact search.", "tokens": [50814, 407, 337, 264, 8186, + 11, 321, 11, 430, 657, 78, 282, 775, 764, 1851, 337, 1900, 3164, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.3079881417123895, "compression_ratio": 1.3153153153153154, + "no_speech_prob": 0.0026434902101755142}, {"id": 51, "seek": 63500, "start": 635.0, + "end": 650.0, "text": " You can choose if what sort of engine you''re running and + a proprietary index for approximate search, which is obviously the bulk of use cases + for us.", "tokens": [50364, 509, 393, 2826, 498, 437, 1333, 295, 2848, 291, 434, + 2614, 293, 257, 38992, 8186, 337, 30874, 3164, 11, 597, 307, 2745, 264, 16139, 295, + 764, 3331, 337, 505, 13, 51114], "temperature": 0.0, "avg_logprob": -0.16239042843089385, + "compression_ratio": 1.2735042735042734, "no_speech_prob": 0.007020191755145788}, + {"id": 52, "seek": 65000, "start": 650.0, "end": 667.0, "text": " And we thought + a lot about performance comparisons, maybe even open sourcing that proprietary index + so we can, so we can be included in an end benchmark.", "tokens": [50364, 400, 321, + 1194, 257, 688, 466, 3389, 33157, 11, 1310, 754, 1269, 11006, 2175, 300, 38992, + 8186, 370, 321, 393, 11, 370, 321, 393, 312, 5556, 294, 364, 917, 18927, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.17587409700666154, "compression_ratio": 1.3076923076923077, + "no_speech_prob": 0.013330293819308281}, {"id": 53, "seek": 66700, "start": 667.0, + "end": 682.0, "text": " While we were thinking about that, we learned from users + that actually like eaking out slightly more slightly lower latencies or slightly + higher recall from the index was not really what they were after.", "tokens": [50364, + 3987, 321, 645, 1953, 466, 300, 11, 321, 3264, 490, 5022, 300, 767, 411, 308, 2456, + 484, 4748, 544, 4748, 3126, 
4465, 6464, 420, 4748, 2946, 9901, 490, 264, 8186, 390, + 406, 534, 437, 436, 645, 934, 13, 51114], "temperature": 0.0, "avg_logprob": -0.1679224406971651, + "compression_ratio": 1.6974358974358974, "no_speech_prob": 0.09428898990154266}, + {"id": 54, "seek": 66700, "start": 682.0, "end": 694.0, "text": " That''s not where + they were stuck. They were stuck on downstream things like horizontal scaling and + adding features to an index.", "tokens": [51114, 663, 311, 406, 689, 436, 645, 5541, + 13, 814, 645, 5541, 322, 30621, 721, 411, 12750, 21589, 293, 5127, 4122, 281, 364, + 8186, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1679224406971651, "compression_ratio": + 1.6974358974358974, "no_speech_prob": 0.09428898990154266}, {"id": 55, "seek": 69400, + "start": 694.0, "end": 703.0, "text": " Setting up the infrastructure and managing + it. And so since learning that and valid data that we focus much more on those things.", + "tokens": [50364, 21063, 493, 264, 6896, 293, 11642, 309, 13, 400, 370, 1670, 2539, + 300, 293, 7363, 1412, 300, 321, 1879, 709, 544, 322, 729, 721, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.2549197174781977, "compression_ratio": 1.4071428571428573, + "no_speech_prob": 0.01194772869348526}, {"id": 56, "seek": 69400, "start": 703.0, + "end": 709.0, "text": " And stayed with a proprietary index for the for approximate + search.", "tokens": [50814, 400, 9181, 365, 257, 38992, 8186, 337, 264, 337, 30874, + 3164, 13, 51114], "temperature": 0.0, "avg_logprob": -0.2549197174781977, "compression_ratio": + 1.4071428571428573, "no_speech_prob": 0.01194772869348526}, {"id": 57, "seek": 70900, + "start": 709.0, "end": 726.0, "text": " And sure enough, we find that even people + who ask a lot about this after they sign up and start using it, they really, you + know, the solve their use case and they don''t ask us about it again after that + from some other search or recommendation system to vector search.", "tokens": [50364, + 400, 988, 1547, 
11, 321, 915, 300, 754, 561, 567, 1029, 257, 688, 466, 341, 934, + 436, 1465, 493, 293, 722, 1228, 309, 11, 436, 534, 11, 291, 458, 11, 264, 5039, + 641, 764, 1389, 293, 436, 500, 380, 1029, 505, 466, 309, 797, 934, 300, 490, 512, + 661, 3164, 420, 11879, 1185, 281, 8062, 3164, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.1411050544990288, "compression_ratio": 1.702127659574468, "no_speech_prob": 0.08193209767341614}, + {"id": 58, "seek": 70900, "start": 726.0, "end": 734.0, "text": " And you''re just + looking for an easy way to run it in production. So that''s the use case just implement + vector search and production.", "tokens": [51214, 400, 291, 434, 445, 1237, 337, + 364, 1858, 636, 281, 1190, 309, 294, 4265, 13, 407, 300, 311, 264, 764, 1389, 445, + 4445, 8062, 3164, 293, 4265, 13, 51614], "temperature": 0.0, "avg_logprob": -0.1411050544990288, + "compression_ratio": 1.702127659574468, "no_speech_prob": 0.08193209767341614}, + {"id": 59, "seek": 73400, "start": 734.0, "end": 754.0, "text": " Or a lot of people + come to us from from like an application side, which is they don''t even know they + want to use vector search, but they know they want to replace their semantic their + keyword search with semantic search or they want to implement", "tokens": [50364, + 1610, 257, 688, 295, 561, 808, 281, 505, 490, 490, 411, 364, 3861, 1252, 11, 597, + 307, 436, 500, 380, 754, 458, 436, 528, 281, 764, 8062, 3164, 11, 457, 436, 458, + 436, 528, 281, 7406, 641, 47982, 641, 20428, 3164, 365, 47982, 3164, 420, 436, 528, + 281, 4445, 51364], "temperature": 0.0, "avg_logprob": -0.14257531795861586, "compression_ratio": + 1.6896551724137931, "no_speech_prob": 0.019311560317873955}, {"id": 60, "seek": + 75400, "start": 754.0, "end": 764.0, "text": " image similarity search that will + work on fuzzy matches or they want to do anomaly detection.", "tokens": [50364, + 3256, 32194, 3164, 300, 486, 589, 322, 34710, 10676, 420, 436, 528, 281, 360, 42737, + 17784, 13, 
50864], "temperature": 0.0, "avg_logprob": -0.22999132440445272, "compression_ratio": + 1.4701986754966887, "no_speech_prob": 0.07317647337913513}, {"id": 61, "seek": 75400, + "start": 764.0, "end": 777.0, "text": " So and or classification and things like + that. It really is it has as many applications as search information retrieval general.", + "tokens": [50864, 407, 293, 420, 21538, 293, 721, 411, 300, 13, 467, 534, 307, 309, + 575, 382, 867, 5821, 382, 3164, 1589, 19817, 3337, 2674, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.22999132440445272, "compression_ratio": 1.4701986754966887, + "no_speech_prob": 0.07317647337913513}, {"id": 62, "seek": 77700, "start": 777.0, + "end": 783.0, "text": " A lot of people come to us for vectors, excuse me for semantic + search.", "tokens": [50364, 316, 688, 295, 561, 808, 281, 505, 337, 18875, 11, 8960, + 385, 337, 47982, 3164, 13, 50664], "temperature": 0.0, "avg_logprob": -0.17912373477465485, + "compression_ratio": 1.6436170212765957, "no_speech_prob": 0.003295383183285594}, + {"id": 63, "seek": 77700, "start": 783.0, "end": 791.0, "text": " So they have their + embedding models like bird or something like that.", "tokens": [50664, 407, 436, + 362, 641, 12240, 3584, 5245, 411, 5255, 420, 746, 411, 300, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.17912373477465485, "compression_ratio": 1.6436170212765957, + "no_speech_prob": 0.003295383183285594}, {"id": 64, "seek": 77700, "start": 791.0, + "end": 803.0, "text": " And they got it working in the lab, the data science team + got semantic search working using embeddings. 
Now they''re like, okay, engineering + team or ML engineering team.", "tokens": [51064, 400, 436, 658, 309, 1364, 294, + 264, 2715, 11, 264, 1412, 3497, 1469, 658, 47982, 3164, 1364, 1228, 12240, 29432, + 13, 823, 436, 434, 411, 11, 1392, 11, 7043, 1469, 420, 21601, 7043, 1469, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.17912373477465485, "compression_ratio": 1.6436170212765957, + "no_speech_prob": 0.003295383183285594}, {"id": 65, "seek": 80300, "start": 803.0, + "end": 811.0, "text": " How do we get this in our product? How do we make this? + How do we keep latency below 200 milliseconds?", "tokens": [50364, 1012, 360, 321, + 483, 341, 294, 527, 1674, 30, 1012, 360, 321, 652, 341, 30, 1012, 360, 321, 1066, + 27043, 2507, 2331, 34184, 30, 50764], "temperature": 0.0, "avg_logprob": -0.17541929604350656, + "compression_ratio": 1.5314285714285714, "no_speech_prob": 0.0005434759077616036}, + {"id": 66, "seek": 80300, "start": 811.0, "end": 815.0, "text": " How do we add + filtering to this to give users control.", "tokens": [50764, 1012, 360, 321, 909, + 30822, 281, 341, 281, 976, 5022, 1969, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.17541929604350656, "compression_ratio": 1.5314285714285714, "no_speech_prob": + 0.0005434759077616036}, {"id": 67, "seek": 80300, "start": 815.0, "end": 823.0, + "text": " And the ML engineering team is then goes out and finds like, oh, we can + do this with something like bank home.", "tokens": [50964, 400, 264, 21601, 7043, + 1469, 307, 550, 1709, 484, 293, 10704, 411, 11, 1954, 11, 321, 393, 360, 341, 365, + 746, 411, 3765, 1280, 13, 51364], "temperature": 0.0, "avg_logprob": -0.17541929604350656, + "compression_ratio": 1.5314285714285714, "no_speech_prob": 0.0005434759077616036}, + {"id": 68, "seek": 82300, "start": 823.0, "end": 836.0, "text": " So those are those + are the typical use cases, I would say semantic search, the most common or somebody + just coming because they''re looking for vector search and 
regardless of what it''s + for.", "tokens": [50364, 407, 729, 366, 729, 366, 264, 7476, 764, 3331, 11, 286, + 576, 584, 47982, 3164, 11, 264, 881, 2689, 420, 2618, 445, 1348, 570, 436, 434, + 1237, 337, 8062, 3164, 293, 10060, 295, 437, 309, 311, 337, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.17686812797289217, "compression_ratio": 1.6473214285714286, + "no_speech_prob": 0.06906095147132874}, {"id": 69, "seek": 82300, "start": 836.0, + "end": 841.0, "text": " Yeah, yeah, from from our from our perspective for pine + cone.", "tokens": [51014, 865, 11, 1338, 11, 490, 490, 527, 490, 527, 4585, 337, + 15113, 19749, 13, 51264], "temperature": 0.0, "avg_logprob": -0.17686812797289217, + "compression_ratio": 1.6473214285714286, "no_speech_prob": 0.06906095147132874}, + {"id": 70, "seek": 82300, "start": 841.0, "end": 852.0, "text": " We don''t care + what your data is like if it''s in an embedding format, you can index it and then + you search through it.", "tokens": [51264, 492, 500, 380, 1127, 437, 428, 1412, + 307, 411, 498, 309, 311, 294, 364, 12240, 3584, 7877, 11, 291, 393, 8186, 309, 293, + 550, 291, 3164, 807, 309, 13, 51814], "temperature": 0.0, "avg_logprob": -0.17686812797289217, + "compression_ratio": 1.6473214285714286, "no_speech_prob": 0.06906095147132874}, + {"id": 71, "seek": 85200, "start": 852.0, "end": 867.0, "text": " Any it works with + any model, any any, you know, initial data and because we have a rest API, you can + call it from anywhere. 
So you can use it in a notebook, you can use it in the backend + application.", "tokens": [50364, 2639, 309, 1985, 365, 604, 2316, 11, 604, 604, + 11, 291, 458, 11, 5883, 1412, 293, 570, 321, 362, 257, 1472, 9362, 11, 291, 393, + 818, 309, 490, 4992, 13, 407, 291, 393, 764, 309, 294, 257, 21060, 11, 291, 393, + 764, 309, 294, 264, 38087, 3861, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.1936218955300071, "compression_ratio": 1.6846153846153846, "no_speech_prob": + 0.001894969493150711}, {"id": 72, "seek": 85200, "start": 867.0, "end": 870.0, "text": + " Yeah, we''re seeing a lot of interesting use cases.", "tokens": [51114, 865, 11, + 321, 434, 2577, 257, 688, 295, 1880, 764, 3331, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.1936218955300071, "compression_ratio": 1.6846153846153846, "no_speech_prob": + 0.001894969493150711}, {"id": 73, "seek": 85200, "start": 870.0, "end": 880.0, "text": + " Yeah, sounds great. Sounds like a lot actually of use case that you mentioned. + I mean, obviously it''s search, but then the answer to could also be like data science + that they want to run.", "tokens": [51264, 865, 11, 3263, 869, 13, 14576, 411, 257, + 688, 767, 295, 764, 1389, 300, 291, 2835, 13, 286, 914, 11, 2745, 309, 311, 3164, + 11, 457, 550, 264, 1867, 281, 727, 611, 312, 411, 1412, 3497, 300, 436, 528, 281, + 1190, 13, 51764], "temperature": 0.0, "avg_logprob": -0.1936218955300071, "compression_ratio": + 1.6846153846153846, "no_speech_prob": 0.001894969493150711}, {"id": 74, "seek": + 88000, "start": 880.0, "end": 892.0, "text": " If you take five, for instance, you + know, metadata science teams, they run like large scale experiments using the library, + but like obviously when that''s the data science part, that''s the exploration part.", + "tokens": [50364, 759, 291, 747, 1732, 11, 337, 5197, 11, 291, 458, 11, 26603, 3497, + 5491, 11, 436, 1190, 411, 2416, 4373, 12050, 1228, 264, 6405, 11, 457, 411, 2745, + 562, 300, 311, 264, 1412, 3497, 644, 11, 
300, 311, 264, 16197, 644, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.21556065877278646, "compression_ratio": 1.6444444444444444, + "no_speech_prob": 0.01978985033929348}, {"id": 75, "seek": 88000, "start": 892.0, + "end": 901.0, "text": " But the moment you want to put this out to prod, you''ll + face a bunch of kind of like low level engineering concerns, like, oh, how do I + do this? How do you do that?", "tokens": [50964, 583, 264, 1623, 291, 528, 281, + 829, 341, 484, 281, 15792, 11, 291, 603, 1851, 257, 3840, 295, 733, 295, 411, 2295, + 1496, 7043, 7389, 11, 411, 11, 1954, 11, 577, 360, 286, 360, 341, 30, 1012, 360, + 291, 360, 300, 30, 51414], "temperature": 0.0, "avg_logprob": -0.21556065877278646, + "compression_ratio": 1.6444444444444444, "no_speech_prob": 0.01978985033929348}, + {"id": 76, "seek": 90100, "start": 901.0, "end": 914.0, "text": " Reinventing the + wheel isn''t ever fun. Well, sometimes it''s fun if it''s kind of like they''d work, + but if you don''t have time, you kind of like when I''m both faster than obviously + you will want to use an existing solution for that.", "tokens": [50364, 42116, 2475, + 278, 264, 5589, 1943, 380, 1562, 1019, 13, 1042, 11, 2171, 309, 311, 1019, 498, + 309, 311, 733, 295, 411, 436, 1116, 589, 11, 457, 498, 291, 500, 380, 362, 565, + 11, 291, 733, 295, 411, 562, 286, 478, 1293, 4663, 813, 2745, 291, 486, 528, 281, + 764, 364, 6741, 3827, 337, 300, 13, 51014], "temperature": 0.0, "avg_logprob": -0.22148522563364315, + "compression_ratio": 1.6, "no_speech_prob": 0.30482327938079834}, {"id": 77, "seek": + 90100, "start": 914.0, "end": 920.0, "text": " Yeah, we find that, you know, for + the data science team, they don''t, it''s not their issue.", "tokens": [51014, 865, + 11, 321, 915, 300, 11, 291, 458, 11, 337, 264, 1412, 3497, 1469, 11, 436, 500, 380, + 11, 309, 311, 406, 641, 2734, 13, 51314], "temperature": 0.0, "avg_logprob": -0.22148522563364315, + "compression_ratio": 1.6, "no_speech_prob": 
0.30482327938079834}, {"id": 78, "seek": + 92000, "start": 920.0, "end": 927.0, "text": " They need to develop the model and + and prove that the method works.", "tokens": [50364, 814, 643, 281, 1499, 264, 2316, + 293, 293, 7081, 300, 264, 3170, 1985, 13, 50714], "temperature": 0.0, "avg_logprob": + -0.16659649440220425, "compression_ratio": 1.5380952380952382, "no_speech_prob": + 0.040135666728019714}, {"id": 79, "seek": 92000, "start": 927.0, "end": 932.0, "text": + " It becomes an engineering teams issue or the ML engineering teams issue.", "tokens": + [50714, 467, 3643, 364, 7043, 5491, 2734, 420, 264, 21601, 7043, 5491, 2734, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.16659649440220425, "compression_ratio": + 1.5380952380952382, "no_speech_prob": 0.040135666728019714}, {"id": 80, "seek": + 92000, "start": 932.0, "end": 948.0, "text": " And yeah, they''re often not exactly + lacking things to do. So, some organizations are all about like focusing on the + core product and trying to use managed services wherever possible.", "tokens": [50964, + 400, 1338, 11, 436, 434, 2049, 406, 2293, 20889, 721, 281, 360, 13, 407, 11, 512, + 6150, 366, 439, 466, 411, 8416, 322, 264, 4965, 1674, 293, 1382, 281, 764, 6453, + 3328, 8660, 1944, 13, 51764], "temperature": 0.0, "avg_logprob": -0.16659649440220425, + "compression_ratio": 1.5380952380952382, "no_speech_prob": 0.040135666728019714}, + {"id": 81, "seek": 94800, "start": 948.0, "end": 964.0, "text": " Others like to + develop things in house and prefer to take open source as much as possible. 
So I + think it depends on your, you know, how you prioritize your focus and what kind + of, you know, what''s your engineering culture at the company?", "tokens": [50364, + 20277, 411, 281, 1499, 721, 294, 1782, 293, 4382, 281, 747, 1269, 4009, 382, 709, + 382, 1944, 13, 407, 286, 519, 309, 5946, 322, 428, 11, 291, 458, 11, 577, 291, 25164, + 428, 1879, 293, 437, 733, 295, 11, 291, 458, 11, 437, 311, 428, 7043, 3713, 412, + 264, 2237, 30, 51164], "temperature": 0.0, "avg_logprob": -0.19071258293403373, + "compression_ratio": 1.5776892430278884, "no_speech_prob": 0.06385955214500427}, + {"id": 82, "seek": 94800, "start": 964.0, "end": 976.0, "text": " Yeah, absolutely. + And sounds like you also address the elements of like, so see to and I believe you + also will have GPR covered at some point already covered.", "tokens": [51164, 865, + 11, 3122, 13, 400, 3263, 411, 291, 611, 2985, 264, 4959, 295, 411, 11, 370, 536, + 281, 293, 286, 1697, 291, 611, 486, 362, 460, 15958, 5343, 412, 512, 935, 1217, + 5343, 13, 51764], "temperature": 0.0, "avg_logprob": -0.19071258293403373, "compression_ratio": + 1.5776892430278884, "no_speech_prob": 0.06385955214500427}, {"id": 83, "seek": 97600, + "start": 976.0, "end": 999.0, "text": " So we say we''re GDPR friendly, which means + there''s no, there''s no official certification you can get for being GDPR, you + can just be following the regulations and able to make the proper disclosures and + able to act on requests for deletion and things like that.", "tokens": [50364, 407, + 321, 584, 321, 434, 460, 35, 15958, 9208, 11, 597, 1355, 456, 311, 572, 11, 456, + 311, 572, 4783, 21775, 291, 393, 483, 337, 885, 460, 35, 15958, 11, 291, 393, 445, + 312, 3480, 264, 12563, 293, 1075, 281, 652, 264, 2296, 2983, 9389, 1303, 293, 1075, + 281, 605, 322, 12475, 337, 1103, 302, 313, 293, 721, 411, 300, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.1907348192655123, "compression_ratio": 1.5174418604651163, + "no_speech_prob": 
0.002655997173860669}, {"id": 84, "seek": 99900, "start": 999.0, + "end": 1014.0, "text": " These are the types of things like the security aspects. + It''s another thing that a data science team might not force to when they''re developing + like a factor search solutions to some application.", "tokens": [50364, 1981, 366, + 264, 3467, 295, 721, 411, 264, 3825, 7270, 13, 467, 311, 1071, 551, 300, 257, 1412, + 3497, 1469, 1062, 406, 3464, 281, 562, 436, 434, 6416, 411, 257, 5952, 3164, 6547, + 281, 512, 3861, 13, 51114], "temperature": 0.0, "avg_logprob": -0.16578189257917733, + "compression_ratio": 1.5271739130434783, "no_speech_prob": 0.07403605431318283}, + {"id": 85, "seek": 99900, "start": 1014.0, "end": 1020.0, "text": " But when it + goes engineering when you start talking about getting it into production.", "tokens": + [51114, 583, 562, 309, 1709, 7043, 562, 291, 722, 1417, 466, 1242, 309, 666, 4265, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.16578189257917733, "compression_ratio": + 1.5271739130434783, "no_speech_prob": 0.07403605431318283}, {"id": 86, "seek": 102000, + "start": 1020.0, "end": 1029.0, "text": " And depending on the company, you''re, + yeah, you start, you know, all these things come up. Does it meet our security, + does it pass our security review?", "tokens": [50364, 400, 5413, 322, 264, 2237, + 11, 291, 434, 11, 1338, 11, 291, 722, 11, 291, 458, 11, 439, 613, 721, 808, 493, + 13, 4402, 309, 1677, 527, 3825, 11, 775, 309, 1320, 527, 3825, 3131, 30, 50814], + "temperature": 0.0, "avg_logprob": -0.18620049081197598, "compression_ratio": 1.6865671641791045, + "no_speech_prob": 0.16128839552402496}, {"id": 87, "seek": 102000, "start": 1029.0, + "end": 1043.0, "text": " Does it pass our reliability requirements? 
Who''s going + to be on call if this thing goes down like all these things come up and we worry + about those things so that the users don''t have to.", "tokens": [50814, 4402, 309, + 1320, 527, 24550, 7728, 30, 2102, 311, 516, 281, 312, 322, 818, 498, 341, 551, 1709, + 760, 411, 439, 613, 721, 808, 493, 293, 321, 3292, 466, 729, 721, 370, 300, 264, + 5022, 500, 380, 362, 281, 13, 51514], "temperature": 0.0, "avg_logprob": -0.18620049081197598, + "compression_ratio": 1.6865671641791045, "no_speech_prob": 0.16128839552402496}, + {"id": 88, "seek": 104300, "start": 1043.0, "end": 1050.0, "text": " Yeah, that''s + a big benefit like to the users again to focus on what matters to them.", "tokens": + [50364, 865, 11, 300, 311, 257, 955, 5121, 411, 281, 264, 5022, 797, 281, 1879, + 322, 437, 7001, 281, 552, 13, 50714], "temperature": 0.0, "avg_logprob": -0.11077933060495478, + "compression_ratio": 1.6923076923076923, "no_speech_prob": 0.19690246880054474}, + {"id": 89, "seek": 104300, "start": 1050.0, "end": 1055.0, "text": " And by the + way, I don''t want to just so this doesn''t come off as like promotional.", "tokens": + [50714, 400, 538, 264, 636, 11, 286, 500, 380, 528, 281, 445, 370, 341, 1177, 380, + 808, 766, 382, 411, 41790, 13, 50964], "temperature": 0.0, "avg_logprob": -0.11077933060495478, + "compression_ratio": 1.6923076923076923, "no_speech_prob": 0.19690246880054474}, + {"id": 90, "seek": 104300, "start": 1055.0, "end": 1072.0, "text": " Anyone listening + to this can treat this as just heads up about what you should think about if you + want to get vector search in your production, even if you''re using some other solution, + like these are things you should plan for.", "tokens": [50964, 14643, 4764, 281, + 341, 393, 2387, 341, 382, 445, 8050, 493, 466, 437, 291, 820, 519, 466, 498, 291, + 528, 281, 483, 8062, 3164, 294, 428, 4265, 11, 754, 498, 291, 434, 1228, 512, 661, + 3827, 11, 411, 613, 366, 721, 291, 820, 1393, 337, 13, 51814], "temperature": 0.0, 
+ "avg_logprob": -0.11077933060495478, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.19690246880054474}, {"id": 91, "seek": 107200, "start": 1072.0, "end": 1076.0, + "text": " And start thinking about it and making.", "tokens": [50364, 400, 722, + 1953, 466, 309, 293, 1455, 13, 50564], "temperature": 0.0, "avg_logprob": -0.23016932203962998, + "compression_ratio": 1.4010695187165776, "no_speech_prob": 0.04371524974703789}, + {"id": 92, "seek": 107200, "start": 1076.0, "end": 1078.0, "text": " Leaving time + to do.", "tokens": [50564, 41253, 565, 281, 360, 13, 50664], "temperature": 0.0, + "avg_logprob": -0.23016932203962998, "compression_ratio": 1.4010695187165776, "no_speech_prob": + 0.04371524974703789}, {"id": 93, "seek": 107200, "start": 1078.0, "end": 1085.0, + "text": " Yeah, absolutely. You don''t want to be caught by surprise in those, those + items for sure.", "tokens": [50664, 865, 11, 3122, 13, 509, 500, 380, 528, 281, + 312, 5415, 538, 6365, 294, 729, 11, 729, 4754, 337, 988, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.23016932203962998, "compression_ratio": 1.4010695187165776, + "no_speech_prob": 0.04371524974703789}, {"id": 94, "seek": 107200, "start": 1085.0, + "end": 1093.0, "text": " Yeah, that''s awesome. By the way, I remember that you + guys also made a bunch of blog posts on like FICE and LSH.", "tokens": [51014, 865, + 11, 300, 311, 3476, 13, 3146, 264, 636, 11, 286, 1604, 300, 291, 1074, 611, 1027, + 257, 3840, 295, 6968, 12300, 322, 411, 479, 13663, 293, 441, 17308, 13, 51414], + "temperature": 0.0, "avg_logprob": -0.23016932203962998, "compression_ratio": 1.4010695187165776, + "no_speech_prob": 0.04371524974703789}, {"id": 95, "seek": 109300, "start": 1093.0, + "end": 1103.0, "text": " I mean, I really like the way you did it. 
You know, it''s + almost look what looks like a comic book, you know, you know, get, get, get deep + with, with these things.", "tokens": [50364, 286, 914, 11, 286, 534, 411, 264, 636, + 291, 630, 309, 13, 509, 458, 11, 309, 311, 1920, 574, 437, 1542, 411, 257, 13900, + 1446, 11, 291, 458, 11, 291, 458, 11, 483, 11, 483, 11, 483, 2452, 365, 11, 365, + 613, 721, 13, 50864], "temperature": 0.0, "avg_logprob": -0.22663739832436167, "compression_ratio": + 1.5977011494252873, "no_speech_prob": 0.2388550490140915}, {"id": 96, "seek": 109300, + "start": 1103.0, "end": 1110.0, "text": " And I think you also shared the source + code, like some notebooks. Is that right? Is that.", "tokens": [50864, 400, 286, + 519, 291, 611, 5507, 264, 4009, 3089, 11, 411, 512, 43782, 13, 1119, 300, 558, 30, + 1119, 300, 13, 51214], "temperature": 0.0, "avg_logprob": -0.22663739832436167, + "compression_ratio": 1.5977011494252873, "no_speech_prob": 0.2388550490140915}, + {"id": 97, "seek": 109300, "start": 1110.0, "end": 1115.0, "text": " Yeah, we, we, + we publish.", "tokens": [51214, 865, 11, 321, 11, 321, 11, 321, 11374, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.22663739832436167, "compression_ratio": 1.5977011494252873, + "no_speech_prob": 0.2388550490140915}, {"id": 98, "seek": 111500, "start": 1115.0, + "end": 1121.0, "text": " And articles on vector search on face on.", "tokens": [50364, + 400, 11290, 322, 8062, 3164, 322, 1851, 322, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.26717655694306786, "compression_ratio": 1.5027322404371584, "no_speech_prob": + 0.23333612084388733}, {"id": 99, "seek": 111500, "start": 1121.0, "end": 1128.0, + "text": " Semantic search different techniques and things like that. 
A lot of them + are done by the very talented.", "tokens": [50664, 14421, 7128, 3164, 819, 7512, + 293, 721, 411, 300, 13, 316, 688, 295, 552, 366, 1096, 538, 264, 588, 13467, 13, + 51014], "temperature": 0.0, "avg_logprob": -0.26717655694306786, "compression_ratio": + 1.5027322404371584, "no_speech_prob": 0.23333612084388733}, {"id": 100, "seek": + 111500, "start": 1128.0, "end": 1141.0, "text": " James breaks, I should give him + a shout out. We have a new one today about the index composite indexes and index + factory in face.", "tokens": [51014, 5678, 9857, 11, 286, 820, 976, 796, 257, 8043, + 484, 13, 492, 362, 257, 777, 472, 965, 466, 264, 8186, 25557, 8186, 279, 293, 8186, + 9265, 294, 1851, 13, 51664], "temperature": 0.0, "avg_logprob": -0.26717655694306786, + "compression_ratio": 1.5027322404371584, "no_speech_prob": 0.23333612084388733}, + {"id": 101, "seek": 114100, "start": 1141.0, "end": 1146.0, "text": " We share code + snippets and we have example notebooks for all of them.", "tokens": [50364, 492, + 2073, 3089, 35623, 1385, 293, 321, 362, 1365, 43782, 337, 439, 295, 552, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.14504288543354382, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.05774102360010147}, {"id": 102, "seek": 114100, "start": 1146.0, + "end": 1151.0, "text": " And yeah, we''re very happy to see people like them.", + "tokens": [50614, 400, 1338, 11, 321, 434, 588, 2055, 281, 536, 561, 411, 552, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.14504288543354382, "compression_ratio": + 1.6088888888888888, "no_speech_prob": 0.05774102360010147}, {"id": 103, "seek": + 114100, "start": 1151.0, "end": 1161.0, "text": " Even people who are not familiar + with vector search will see it and it peaks their interest because engineers like + to see how things work and learn new things.", "tokens": [50864, 2754, 561, 567, + 366, 406, 4963, 365, 8062, 3164, 486, 536, 309, 293, 309, 26897, 641, 1179, 570, + 11955, 411, 281, 536, 
577, 721, 589, 293, 1466, 777, 721, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.14504288543354382, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.05774102360010147}, {"id": 104, "seek": 114100, "start": 1161.0, + "end": 1167.0, "text": " And that''s our goal. It''s some of them have almost nothing + to do with pine cone.", "tokens": [51364, 400, 300, 311, 527, 3387, 13, 467, 311, + 512, 295, 552, 362, 1920, 1825, 281, 360, 365, 15113, 19749, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.14504288543354382, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.05774102360010147}, {"id": 105, "seek": 116700, "start": 1167.0, + "end": 1172.0, "text": " And we have more people to learn about.", "tokens": [50364, + 400, 321, 362, 544, 561, 281, 1466, 466, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.245787732741412, "compression_ratio": 1.5753968253968254, "no_speech_prob": 0.020519956946372986}, + {"id": 106, "seek": 116700, "start": 1172.0, "end": 1175.0, "text": " Vector search + to.", "tokens": [50614, 691, 20814, 3164, 281, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.245787732741412, "compression_ratio": 1.5753968253968254, "no_speech_prob": 0.020519956946372986}, + {"id": 107, "seek": 116700, "start": 1175.0, "end": 1182.0, "text": " Realize that + they can use vector search to replace their to improve the current applications + and.", "tokens": [50764, 8467, 1125, 300, 436, 393, 764, 8062, 3164, 281, 7406, + 641, 281, 3470, 264, 2190, 5821, 293, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.245787732741412, "compression_ratio": 1.5753968253968254, "no_speech_prob": 0.020519956946372986}, + {"id": 108, "seek": 116700, "start": 1182.0, "end": 1187.0, "text": " If we succeed + in that, I think it''ll certainly help us, but really everyone in the.", "tokens": + [51114, 759, 321, 7754, 294, 300, 11, 286, 519, 309, 603, 3297, 854, 505, 11, 457, + 534, 1518, 294, 264, 13, 51364], "temperature": 0.0, 
"avg_logprob": -0.245787732741412, + "compression_ratio": 1.5753968253968254, "no_speech_prob": 0.020519956946372986}, + {"id": 109, "seek": 116700, "start": 1187.0, "end": 1188.0, "text": " In this space.", + "tokens": [51364, 682, 341, 1901, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.245787732741412, "compression_ratio": 1.5753968253968254, "no_speech_prob": 0.020519956946372986}, + {"id": 110, "seek": 116700, "start": 1188.0, "end": 1195.0, "text": " Yeah, I mean, + absolutely. Those looks like our jam, you know, people are reading, citing and kind + of discussing on Slack and things like that.", "tokens": [51414, 865, 11, 286, 914, + 11, 3122, 13, 3950, 1542, 411, 527, 7872, 11, 291, 458, 11, 561, 366, 3760, 11, + 48749, 293, 733, 295, 10850, 322, 37211, 293, 721, 411, 300, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.245787732741412, "compression_ratio": 1.5753968253968254, + "no_speech_prob": 0.020519956946372986}, {"id": 111, "seek": 119500, "start": 1195.0, + "end": 1207.0, "text": " And yeah, it sounds like you guys are also kind of willing + to share your knowledge with the community, even like beyond kind of share, you + know, customer interaction and so on. 
Right.", "tokens": [50364, 400, 1338, 11, + 309, 3263, 411, 291, 1074, 366, 611, 733, 295, 4950, 281, 2073, 428, 3601, 365, + 264, 1768, 11, 754, 411, 4399, 733, 295, 2073, 11, 291, 458, 11, 5474, 9285, 293, + 370, 322, 13, 1779, 13, 50964], "temperature": 0.0, "avg_logprob": -0.15861816155283073, + "compression_ratio": 1.5051020408163265, "no_speech_prob": 0.01606069877743721}, + {"id": 112, "seek": 119500, "start": 1207.0, "end": 1209.0, "text": " So that''s + that''s awesome.", "tokens": [50964, 407, 300, 311, 300, 311, 3476, 13, 51064], + "temperature": 0.0, "avg_logprob": -0.15861816155283073, "compression_ratio": 1.5051020408163265, + "no_speech_prob": 0.01606069877743721}, {"id": 113, "seek": 119500, "start": 1209.0, + "end": 1210.0, "text": " Yeah.", "tokens": [51064, 865, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.15861816155283073, "compression_ratio": 1.5051020408163265, + "no_speech_prob": 0.01606069877743721}, {"id": 114, "seek": 119500, "start": 1210.0, + "end": 1216.0, "text": " I think we are moving slowly to the third section of our + podcast, which is why.", "tokens": [51114, 286, 519, 321, 366, 2684, 5692, 281, + 264, 2636, 3541, 295, 527, 7367, 11, 597, 307, 983, 13, 51414], "temperature": 0.0, + "avg_logprob": -0.15861816155283073, "compression_ratio": 1.5051020408163265, "no_speech_prob": + 0.01606069877743721}, {"id": 115, "seek": 121600, "start": 1216.0, "end": 1222.0, + "text": " And I think I know it''s a little bit more philosophical kind of stance + and what you do.", "tokens": [50364, 400, 286, 519, 286, 458, 309, 311, 257, 707, + 857, 544, 25066, 733, 295, 21033, 293, 437, 291, 360, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.16848244565598508, "compression_ratio": 1.6017316017316017, + "no_speech_prob": 0.11230132728815079}, {"id": 116, "seek": 121600, "start": 1222.0, + "end": 1229.0, "text": " And kind of like how you do it. I don''t know if you''ve + been reflecting on your journey. 
I know you said you joined last year.", "tokens": + [50664, 400, 733, 295, 411, 577, 291, 360, 309, 13, 286, 500, 380, 458, 498, 291, + 600, 668, 23543, 322, 428, 4671, 13, 286, 458, 291, 848, 291, 6869, 1036, 1064, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.16848244565598508, "compression_ratio": + 1.6017316017316017, "no_speech_prob": 0.11230132728815079}, {"id": 117, "seek": + 121600, "start": 1229.0, "end": 1231.0, "text": " Join bank on last year.", "tokens": + [51014, 19642, 3765, 322, 1036, 1064, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.16848244565598508, "compression_ratio": 1.6017316017316017, "no_speech_prob": + 0.11230132728815079}, {"id": 118, "seek": 121600, "start": 1231.0, "end": 1239.0, + "text": " But I guess I''ll start off by just asking you what motivates you to be + part of vector search development and this community as much.", "tokens": [51114, + 583, 286, 2041, 286, 603, 722, 766, 538, 445, 3365, 291, 437, 42569, 291, 281, 312, + 644, 295, 8062, 3164, 3250, 293, 341, 1768, 382, 709, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.16848244565598508, "compression_ratio": 1.6017316017316017, + "no_speech_prob": 0.11230132728815079}, {"id": 119, "seek": 123900, "start": 1239.0, + "end": 1256.0, "text": " Me personally, I''ve worked with 40, 40 startups when I + was consulting over 40 startups. And when I met, you know, the founder of pine cone.", + "tokens": [50364, 1923, 5665, 11, 286, 600, 2732, 365, 3356, 11, 3356, 28041, 562, + 286, 390, 23682, 670, 3356, 28041, 13, 400, 562, 286, 1131, 11, 291, 458, 11, 264, + 14917, 295, 15113, 19749, 13, 51214], "temperature": 0.0, "avg_logprob": -0.18929634094238282, + "compression_ratio": 1.463855421686747, "no_speech_prob": 0.005013223737478256}, + {"id": 120, "seek": 123900, "start": 1256.0, "end": 1265.0, "text": " And learned + about the product and about the space. 
I saw a familiar pattern, which caught my + attention.", "tokens": [51214, 400, 3264, 466, 264, 1674, 293, 466, 264, 1901, 13, + 286, 1866, 257, 4963, 5102, 11, 597, 5415, 452, 3202, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.18929634094238282, "compression_ratio": 1.463855421686747, + "no_speech_prob": 0.005013223737478256}, {"id": 121, "seek": 126500, "start": 1265.0, + "end": 1270.0, "text": " And the familiar pattern was from 2015.", "tokens": [50364, + 400, 264, 4963, 5102, 390, 490, 7546, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.3358294343295163, "compression_ratio": 1.4228855721393034, "no_speech_prob": + 0.042271923273801804}, {"id": 122, "seek": 126500, "start": 1270.0, "end": 1282.0, + "text": " Six years ago now, almost seven when I started working with the time, + very small company called Domino Data Lab, which.", "tokens": [50614, 11678, 924, + 2057, 586, 11, 1920, 3407, 562, 286, 1409, 1364, 365, 264, 565, 11, 588, 1359, 2237, + 1219, 16674, 2982, 11888, 10137, 11, 597, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.3358294343295163, "compression_ratio": 1.4228855721393034, "no_speech_prob": + 0.042271923273801804}, {"id": 123, "seek": 126500, "start": 1282.0, "end": 1290.0, + "text": " It''s an ML ops platform at the time, we call that a data science platform. 
+ It''s used by over 20% or the fortune 500 companies.", "tokens": [51214, 467, 311, + 364, 21601, 44663, 3663, 412, 264, 565, 11, 321, 818, 300, 257, 1412, 3497, 3663, + 13, 467, 311, 1143, 538, 670, 945, 4, 420, 264, 16531, 5923, 3431, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.3358294343295163, "compression_ratio": 1.4228855721393034, + "no_speech_prob": 0.042271923273801804}, {"id": 124, "seek": 129000, "start": 1290.0, + "end": 1296.0, "text": " And the time was a small team and it was a product for + data scientists, but like nobody knew exactly what is a data scientist.", "tokens": + [50364, 400, 264, 565, 390, 257, 1359, 1469, 293, 309, 390, 257, 1674, 337, 1412, + 7708, 11, 457, 411, 5079, 2586, 2293, 437, 307, 257, 1412, 12662, 13, 50664], "temperature": + 0.0, "avg_logprob": -0.2033338184598126, "compression_ratio": 1.6329787234042554, + "no_speech_prob": 0.04448205605149269}, {"id": 125, "seek": 129000, "start": 1296.0, + "end": 1302.0, "text": " Few people called themselves that even if they were doing + data science work.", "tokens": [50664, 33468, 561, 1219, 2969, 300, 754, 498, 436, + 645, 884, 1412, 3497, 589, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2033338184598126, + "compression_ratio": 1.6329787234042554, "no_speech_prob": 0.04448205605149269}, + {"id": 126, "seek": 129000, "start": 1302.0, "end": 1307.0, "text": " A lot of work + data science work was done on just people''s laptops.", "tokens": [50964, 316, 688, + 295, 589, 1412, 3497, 589, 390, 1096, 322, 445, 561, 311, 27642, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.2033338184598126, "compression_ratio": 1.6329787234042554, + "no_speech_prob": 0.04448205605149269}, {"id": 127, "seek": 129000, "start": 1307.0, + "end": 1310.0, "text": " And there''s no.", "tokens": [51214, 400, 456, 311, 572, + 13, 51364], "temperature": 0.0, "avg_logprob": -0.2033338184598126, "compression_ratio": + 1.6329787234042554, "no_speech_prob": 0.04448205605149269}, {"id": 128, "seek": 
+ 129000, "start": 1310.0, "end": 1314.0, "text": " It was a very young.", "tokens": + [51364, 467, 390, 257, 588, 2037, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.2033338184598126, "compression_ratio": 1.6329787234042554, "no_speech_prob": + 0.04448205605149269}, {"id": 129, "seek": 131400, "start": 1314.0, "end": 1317.0, + "text": " Area, let''s say.", "tokens": [50364, 19405, 11, 718, 311, 584, 13, 50514], + "temperature": 0.0, "avg_logprob": -0.19811121155233943, "compression_ratio": 1.4736842105263157, + "no_speech_prob": 0.005171893164515495}, {"id": 130, "seek": 131400, "start": 1317.0, + "end": 1321.0, "text": " Not not quite mature. There''s not a tooling for it and + so on.", "tokens": [50514, 1726, 406, 1596, 14442, 13, 821, 311, 406, 257, 46593, + 337, 309, 293, 370, 322, 13, 50714], "temperature": 0.0, "avg_logprob": -0.19811121155233943, + "compression_ratio": 1.4736842105263157, "no_speech_prob": 0.005171893164515495}, + {"id": 131, "seek": 131400, "start": 1321.0, "end": 1329.0, "text": " And over time, + over a few years, it became, of course, data science became.", "tokens": [50714, + 400, 670, 565, 11, 670, 257, 1326, 924, 11, 309, 3062, 11, 295, 1164, 11, 1412, + 3497, 3062, 13, 51114], "temperature": 0.0, "avg_logprob": -0.19811121155233943, + "compression_ratio": 1.4736842105263157, "no_speech_prob": 0.005171893164515495}, + {"id": 132, "seek": 131400, "start": 1329.0, "end": 1338.0, "text": " A core function + in many companies, like just like engineering and marketing and customer support.", + "tokens": [51114, 316, 4965, 2445, 294, 867, 3431, 11, 411, 445, 411, 7043, 293, + 6370, 293, 5474, 1406, 13, 51564], "temperature": 0.0, "avg_logprob": -0.19811121155233943, + "compression_ratio": 1.4736842105263157, "no_speech_prob": 0.005171893164515495}, + {"id": 133, "seek": 133800, "start": 1338.0, "end": 1346.0, "text": " And as that + happened, like having the right tooling for that function.", "tokens": [50364, 400, + 382, 300, 2011, 
11, 411, 1419, 264, 558, 46593, 337, 300, 2445, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.17683476209640503, "compression_ratio": 1.5135135135135136, + "no_speech_prob": 0.018351173028349876}, {"id": 134, "seek": 133800, "start": 1346.0, + "end": 1357.0, "text": " And kind of maturing the capabilities and making sure it''s + everything data sciences run can run in production securely and reliably and things + like that.", "tokens": [50764, 400, 733, 295, 3803, 1345, 264, 10862, 293, 1455, + 988, 309, 311, 1203, 1412, 17677, 1190, 393, 1190, 294, 4265, 38348, 293, 49927, + 293, 721, 411, 300, 13, 51314], "temperature": 0.0, "avg_logprob": -0.17683476209640503, + "compression_ratio": 1.5135135135135136, "no_speech_prob": 0.018351173028349876}, + {"id": 135, "seek": 135700, "start": 1357.0, "end": 1363.0, "text": " And so it + became more important. Of course, the companies that were.", "tokens": [50364, 400, + 370, 309, 3062, 544, 1021, 13, 2720, 1164, 11, 264, 3431, 300, 645, 13, 50664], + "temperature": 0.0, "avg_logprob": -0.2544973190516642, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.3018932342529297}, {"id": 136, "seek": 135700, "start": 1363.0, + "end": 1369.0, "text": " Solving those things were growing with that demand.", "tokens": + [50664, 7026, 798, 729, 721, 645, 4194, 365, 300, 4733, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.2544973190516642, "compression_ratio": 1.5454545454545454, + "no_speech_prob": 0.3018932342529297}, {"id": 137, "seek": 135700, "start": 1369.0, + "end": 1375.0, "text": " And so I wanted to be a part of that journey, that kind + of journey again.", "tokens": [50964, 400, 370, 286, 1415, 281, 312, 257, 644, 295, + 300, 4671, 11, 300, 733, 295, 4671, 797, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.2544973190516642, "compression_ratio": 1.5454545454545454, "no_speech_prob": + 0.3018932342529297}, {"id": 138, "seek": 135700, "start": 1375.0, "end": 1382.0, + "text": " And again, I saw in 
Pinecon, I saw product that is pretty early in the + space.", "tokens": [51264, 400, 797, 11, 286, 1866, 294, 33531, 1671, 11, 286, 1866, + 1674, 300, 307, 1238, 2440, 294, 264, 1901, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.2544973190516642, "compression_ratio": 1.5454545454545454, "no_speech_prob": + 0.3018932342529297}, {"id": 139, "seek": 138200, "start": 1382.0, "end": 1386.0, + "text": " And I saw a lot of data based concept and.", "tokens": [50364, 400, 286, + 1866, 257, 688, 295, 1412, 2361, 3410, 293, 13, 50564], "temperature": 0.0, "avg_logprob": + -0.2833099365234375, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0330592580139637}, {"id": 140, "seek": 138200, "start": 1386.0, "end": 1390.0, + "text": " We had to spend a lot of time explaining to people what that means they + weren''t getting it.", "tokens": [50564, 492, 632, 281, 3496, 257, 688, 295, 565, + 13468, 281, 561, 437, 300, 1355, 436, 4999, 380, 1242, 309, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.2833099365234375, "compression_ratio": 1.6440677966101696, + "no_speech_prob": 0.0330592580139637}, {"id": 141, "seek": 138200, "start": 1390.0, + "end": 1395.0, "text": " On the user side, you see many, many engineers doing ML + engineering work.", "tokens": [50764, 1282, 264, 4195, 1252, 11, 291, 536, 867, + 11, 867, 11955, 884, 21601, 7043, 589, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.2833099365234375, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0330592580139637}, {"id": 142, "seek": 138200, "start": 1395.0, "end": 1398.0, + "text": " We don''t yet call themselves ML engineers.", "tokens": [51014, 492, 500, + 380, 1939, 818, 2969, 21601, 11955, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.2833099365234375, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0330592580139637}, {"id": 143, "seek": 138200, "start": 1398.0, "end": 1400.0, + "text": " They''re still titled the software engineers.", "tokens": [51164, 
814, + 434, 920, 19841, 264, 4722, 11955, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.2833099365234375, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0330592580139637}, {"id": 144, "seek": 138200, "start": 1400.0, "end": 1406.0, + "text": " Or they might get data scientists, but they''re now working on like production + applications.", "tokens": [51264, 1610, 436, 1062, 483, 1412, 7708, 11, 457, 436, + 434, 586, 1364, 322, 411, 4265, 5821, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.2833099365234375, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0330592580139637}, {"id": 145, "seek": 140600, "start": 1406.0, "end": 1416.0, + "text": " And also we see that companies are struggling as they as they want to + take vector search out of the lab and into production production applications.", + "tokens": [50364, 400, 611, 321, 536, 300, 3431, 366, 9314, 382, 436, 382, 436, + 528, 281, 747, 8062, 3164, 484, 295, 264, 2715, 293, 666, 4265, 4265, 5821, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.16826950537191854, "compression_ratio": + 1.5728155339805825, "no_speech_prob": 0.0004191641346551478}, {"id": 146, "seek": + 140600, "start": 1416.0, "end": 1420.0, "text": " They''re running up against the + same challenges like the technology they have.", "tokens": [50864, 814, 434, 2614, + 493, 1970, 264, 912, 4759, 411, 264, 2899, 436, 362, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.16826950537191854, "compression_ratio": 1.5728155339805825, + "no_speech_prob": 0.0004191641346551478}, {"id": 147, "seek": 140600, "start": 1420.0, + "end": 1424.0, "text": " They had available wasn''t quite.", "tokens": [51064, 814, + 632, 2435, 2067, 380, 1596, 13, 51264], "temperature": 0.0, "avg_logprob": -0.16826950537191854, + "compression_ratio": 1.5728155339805825, "no_speech_prob": 0.0004191641346551478}, + {"id": 148, "seek": 140600, "start": 1424.0, "end": 1426.0, "text": " Built for + that.", "tokens": [51264, 49822, 337, 
300, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.16826950537191854, "compression_ratio": 1.5728155339805825, "no_speech_prob": + 0.0004191641346551478}, {"id": 149, "seek": 140600, "start": 1426.0, "end": 1432.0, + "text": " For huge scale and for like secure and reliable.", "tokens": [51364, 1171, + 2603, 4373, 293, 337, 411, 7144, 293, 12924, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.16826950537191854, "compression_ratio": 1.5728155339805825, "no_speech_prob": + 0.0004191641346551478}, {"id": 150, "seek": 143200, "start": 1432.0, "end": 1435.0, + "text": " And so that''s the environment.", "tokens": [50364, 400, 370, 300, 311, + 264, 2823, 13, 50514], "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.0345943458378315}, {"id": 151, "seek": 143200, + "start": 1435.0, "end": 1436.0, "text": " And.", "tokens": [50514, 400, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": 1.6029411764705883, + "no_speech_prob": 0.0345943458378315}, {"id": 152, "seek": 143200, "start": 1436.0, + "end": 1438.0, "text": " Yeah, that''s exciting to be.", "tokens": [50564, 865, + 11, 300, 311, 4670, 281, 312, 13, 50664], "temperature": 0.0, "avg_logprob": -0.3239966062741859, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.0345943458378315}, + {"id": 153, "seek": 143200, "start": 1438.0, "end": 1440.0, "text": " To be in an + emerging category like that.", "tokens": [50664, 1407, 312, 294, 364, 14989, 7719, + 411, 300, 13, 50764], "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.0345943458378315}, {"id": 154, "seek": 143200, + "start": 1440.0, "end": 1444.0, "text": " And solve a real need and see watch the + need.", "tokens": [50764, 400, 5039, 257, 957, 643, 293, 536, 1159, 264, 643, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": + 
1.6029411764705883, "no_speech_prob": 0.0345943458378315}, {"id": 155, "seek": 143200, + "start": 1444.0, "end": 1446.0, "text": " Grow.", "tokens": [50964, 18476, 13, 51064], + "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": 1.6029411764705883, + "no_speech_prob": 0.0345943458378315}, {"id": 156, "seek": 143200, "start": 1446.0, + "end": 1447.0, "text": " Yeah.", "tokens": [51064, 865, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": 1.6029411764705883, + "no_speech_prob": 0.0345943458378315}, {"id": 157, "seek": 143200, "start": 1447.0, + "end": 1450.0, "text": " That''s my personal, you know, that''s what motivates me.", + "tokens": [51114, 663, 311, 452, 2973, 11, 291, 458, 11, 300, 311, 437, 42569, 385, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": + 1.6029411764705883, "no_speech_prob": 0.0345943458378315}, {"id": 158, "seek": 143200, + "start": 1450.0, "end": 1451.0, "text": " And that''s why.", "tokens": [51264, 400, + 300, 311, 983, 13, 51314], "temperature": 0.0, "avg_logprob": -0.3239966062741859, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.0345943458378315}, + {"id": 159, "seek": 143200, "start": 1451.0, "end": 1452.0, "text": " So I''m excited + to be here.", "tokens": [51314, 407, 286, 478, 2919, 281, 312, 510, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.3239966062741859, "compression_ratio": 1.6029411764705883, + "no_speech_prob": 0.0345943458378315}, {"id": 160, "seek": 143200, "start": 1452.0, + "end": 1459.0, "text": " If you want to go even even on a more philosophical level, + like.", "tokens": [51364, 759, 291, 528, 281, 352, 754, 754, 322, 257, 544, 25066, + 1496, 11, 411, 13, 51714], "temperature": 0.0, "avg_logprob": -0.3239966062741859, + "compression_ratio": 1.6029411764705883, "no_speech_prob": 0.0345943458378315}, + {"id": 161, "seek": 145900, "start": 1459.0, "end": 1467.0, "text": " It''s really 
+ rewarding to me to.", "tokens": [50364, 467, 311, 534, 20063, 281, 385, 281, 13, + 50764], "temperature": 0.0, "avg_logprob": -0.24377586364746093, "compression_ratio": + 1.3823529411764706, "no_speech_prob": 0.005706429481506348}, {"id": 162, "seek": + 145900, "start": 1467.0, "end": 1470.0, "text": " Help grow.", "tokens": [50764, + 10773, 1852, 13, 50914], "temperature": 0.0, "avg_logprob": -0.24377586364746093, + "compression_ratio": 1.3823529411764706, "no_speech_prob": 0.005706429481506348}, + {"id": 163, "seek": 145900, "start": 1470.0, "end": 1476.0, "text": " The kinds + of technologies that are powering.", "tokens": [50914, 440, 3685, 295, 7943, 300, + 366, 1347, 278, 13, 51214], "temperature": 0.0, "avg_logprob": -0.24377586364746093, + "compression_ratio": 1.3823529411764706, "no_speech_prob": 0.005706429481506348}, + {"id": 164, "seek": 145900, "start": 1476.0, "end": 1483.0, "text": " Our like software + infrastructure, which, which, which, which everything in this world runs on today.", + "tokens": [51214, 2621, 411, 4722, 6896, 11, 597, 11, 597, 11, 597, 11, 597, 1203, + 294, 341, 1002, 6676, 322, 965, 13, 51564], "temperature": 0.0, "avg_logprob": -0.24377586364746093, + "compression_ratio": 1.3823529411764706, "no_speech_prob": 0.005706429481506348}, + {"id": 165, "seek": 148300, "start": 1483.0, "end": 1491.0, "text": " So it''s really + a big thing to do.", "tokens": [50364, 407, 309, 311, 534, 257, 955, 551, 281, 360, + 13, 50764], "temperature": 0.4, "avg_logprob": -0.5697867575656163, "compression_ratio": + 1.663594470046083, "no_speech_prob": 0.12653446197509766}, {"id": 166, "seek": 148300, + "start": 1491.0, "end": 1495.0, "text": " It''s a fact that it''s kind of behind + the scenes and under the hood that you know most consumers and most people don''t + know that.", "tokens": [50764, 467, 311, 257, 1186, 300, 309, 311, 733, 295, 2261, + 264, 8026, 293, 833, 264, 13376, 300, 291, 458, 881, 11883, 293, 881, 561, 500, + 380, 458, 300, 13, 
50964], "temperature": 0.4, "avg_logprob": -0.5697867575656163, + "compression_ratio": 1.663594470046083, "no_speech_prob": 0.12653446197509766}, + {"id": 167, "seek": 148300, "start": 1495.0, "end": 1502.0, "text": " Their Facebook + feed is powered by similarity search, or that their Google search is powered by + similar research.", "tokens": [50964, 6710, 4384, 3154, 307, 17786, 538, 32194, + 3164, 11, 420, 300, 641, 3329, 3164, 307, 17786, 538, 2531, 2132, 13, 51314], "temperature": + 0.4, "avg_logprob": -0.5697867575656163, "compression_ratio": 1.663594470046083, + "no_speech_prob": 0.12653446197509766}, {"id": 168, "seek": 148300, "start": 1502.0, + "end": 1505.0, "text": " But it even without them knowing it affects them tremendously.", + "tokens": [51314, 583, 309, 754, 1553, 552, 5276, 309, 11807, 552, 27985, 13, 51464], + "temperature": 0.4, "avg_logprob": -0.5697867575656163, "compression_ratio": 1.663594470046083, + "no_speech_prob": 0.12653446197509766}, {"id": 169, "seek": 148300, "start": 1505.0, + "end": 1509.0, "text": " I feel like we have a.", "tokens": [51464, 286, 841, 411, + 321, 362, 257, 13, 51664], "temperature": 0.4, "avg_logprob": -0.5697867575656163, + "compression_ratio": 1.663594470046083, "no_speech_prob": 0.12653446197509766}, + {"id": 170, "seek": 150900, "start": 1509.0, "end": 1512.0, "text": " And I think + that''s really.", "tokens": [50364, 400, 286, 519, 300, 311, 534, 13, 50514], "temperature": + 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": 1.9637305699481866, + "no_speech_prob": 0.02768482081592083}, {"id": 171, "seek": 150900, "start": 1512.0, + "end": 1513.0, "text": " I think that''s really.", "tokens": [50514, 286, 519, 300, + 311, 534, 13, 50564], "temperature": 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": + 1.9637305699481866, "no_speech_prob": 0.02768482081592083}, {"id": 172, "seek": + 150900, "start": 1513.0, "end": 1515.0, "text": " I think that''s really.", "tokens": + [50564, 
286, 519, 300, 311, 534, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 173, "seek": 150900, "start": 1515.0, "end": 1517.0, + "text": " I think that''s really.", "tokens": [50664, 286, 519, 300, 311, 534, 13, + 50764], "temperature": 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": + 1.9637305699481866, "no_speech_prob": 0.02768482081592083}, {"id": 174, "seek": + 150900, "start": 1517.0, "end": 1518.0, "text": " I think that''s really.", "tokens": + [50764, 286, 519, 300, 311, 534, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 175, "seek": 150900, "start": 1518.0, "end": 1519.0, + "text": " Yeah.", "tokens": [50814, 865, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 176, "seek": 150900, "start": 1519.0, "end": 1520.0, + "text": " Yeah.", "tokens": [50864, 865, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 177, "seek": 150900, "start": 1520.0, "end": 1521.0, + "text": " Yeah.", "tokens": [50914, 865, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 178, "seek": 150900, "start": 1521.0, "end": 1522.0, + "text": " Sounds.", "tokens": [50964, 14576, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 179, "seek": 150900, "start": 1522.0, "end": 1523.0, + "text": " So deep.", "tokens": [51014, 407, 2452, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.4781525223343461, "compression_ratio": 
1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 180, "seek": 150900, "start": 1523.0, "end": 1524.0, + "text": " I mean, your connection to it.", "tokens": [51064, 286, 914, 11, 428, + 4984, 281, 309, 13, 51114], "temperature": 0.0, "avg_logprob": -0.4781525223343461, + "compression_ratio": 1.9637305699481866, "no_speech_prob": 0.02768482081592083}, + {"id": 181, "seek": 150900, "start": 1524.0, "end": 1528.0, "text": " And in general, + like it sounds like you''re excited to be at the bleeding edge of stack, right?", + "tokens": [51114, 400, 294, 2674, 11, 411, 309, 3263, 411, 291, 434, 2919, 281, + 312, 412, 264, 19312, 4691, 295, 8630, 11, 558, 30, 51314], "temperature": 0.0, + "avg_logprob": -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 182, "seek": 150900, "start": 1528.0, "end": 1529.0, + "text": " So kind of like.", "tokens": [51314, 407, 733, 295, 411, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": 1.9637305699481866, + "no_speech_prob": 0.02768482081592083}, {"id": 183, "seek": 150900, "start": 1529.0, + "end": 1531.0, "text": " Building the next thing.", "tokens": [51364, 18974, 264, + 958, 551, 13, 51464], "temperature": 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": + 1.9637305699481866, "no_speech_prob": 0.02768482081592083}, {"id": 184, "seek": + 150900, "start": 1531.0, "end": 1532.0, "text": " It''s.", "tokens": [51464, 467, + 311, 13, 51514], "temperature": 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": + 1.9637305699481866, "no_speech_prob": 0.02768482081592083}, {"id": 185, "seek": + 150900, "start": 1532.0, "end": 1534.0, "text": " I think it''s always exciting.", + "tokens": [51514, 286, 519, 309, 311, 1009, 4670, 13, 51614], "temperature": 0.0, + "avg_logprob": -0.4781525223343461, "compression_ratio": 1.9637305699481866, "no_speech_prob": + 0.02768482081592083}, {"id": 186, "seek": 
150900, "start": 1534.0, "end": 1535.0, + "text": " Of course, it''s also.", "tokens": [51614, 2720, 1164, 11, 309, 311, 611, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.4781525223343461, "compression_ratio": + 1.9637305699481866, "no_speech_prob": 0.02768482081592083}, {"id": 187, "seek": + 153500, "start": 1535.0, "end": 1538.0, "text": " And in many ways.", "tokens": + [50364, 400, 294, 867, 2098, 13, 50514], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 188, "seek": 153500, "start": 1538.0, "end": 1539.0, "text": " Kind of.", + "tokens": [50514, 9242, 295, 13, 50564], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 189, "seek": 153500, "start": 1539.0, "end": 1541.0, "text": " Well, I don''t + want to use the word dangerous.", "tokens": [50564, 1042, 11, 286, 500, 380, 528, + 281, 764, 264, 1349, 5795, 13, 50664], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 190, "seek": 153500, "start": 1541.0, "end": 1542.0, "text": " I want to + use the word.", "tokens": [50664, 286, 528, 281, 764, 264, 1349, 13, 50714], "temperature": + 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.02419867180287838}, {"id": 191, "seek": 153500, "start": 1542.0, + "end": 1543.0, "text": " Kind of like intense.", "tokens": [50714, 9242, 295, 411, + 9447, 13, 50764], "temperature": 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, {"id": 192, "seek": + 153500, "start": 1543.0, "end": 1545.0, "text": " And you know, like.", "tokens": + [50764, 400, 291, 458, 11, 411, 13, 50864], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + 
"compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 193, "seek": 153500, "start": 1545.0, "end": 1546.0, "text": " It''s nice + and bold.", "tokens": [50864, 467, 311, 1481, 293, 11928, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.02419867180287838}, {"id": 194, "seek": 153500, "start": 1546.0, + "end": 1548.0, "text": " That was saying, right?", "tokens": [50914, 663, 390, 1566, + 11, 558, 30, 51014], "temperature": 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, {"id": 195, "seek": + 153500, "start": 1548.0, "end": 1549.0, "text": " Yeah.", "tokens": [51014, 865, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, {"id": 196, "seek": + 153500, "start": 1549.0, "end": 1550.0, "text": " Yeah.", "tokens": [51064, 865, + 13, 51114], "temperature": 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, {"id": 197, "seek": + 153500, "start": 1550.0, "end": 1551.0, "text": " It''s.", "tokens": [51114, 467, + 311, 13, 51164], "temperature": 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, {"id": 198, "seek": + 153500, "start": 1551.0, "end": 1552.0, "text": " For sure.", "tokens": [51164, + 1171, 988, 13, 51214], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 199, "seek": 153500, "start": 1552.0, "end": 1553.0, "text": " We don''t + know how the future will play out.", "tokens": [51214, 492, 500, 380, 458, 577, + 264, 2027, 486, 862, 484, 13, 51264], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + 
"compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 200, "seek": 153500, "start": 1553.0, "end": 1554.0, "text": " We have our + hopes and.", "tokens": [51264, 492, 362, 527, 13681, 293, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.02419867180287838}, {"id": 201, "seek": 153500, "start": 1554.0, + "end": 1556.0, "text": " And we''re making our bets.", "tokens": [51314, 400, 321, + 434, 1455, 527, 39922, 13, 51414], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 202, "seek": 153500, "start": 1556.0, "end": 1559.0, "text": " But.", "tokens": + [51414, 583, 13, 51564], "temperature": 0.0, "avg_logprob": -0.23426016445817618, + "compression_ratio": 1.6167400881057268, "no_speech_prob": 0.02419867180287838}, + {"id": 203, "seek": 153500, "start": 1559.0, "end": 1560.0, "text": " It''s exciting + to try it.", "tokens": [51564, 467, 311, 4670, 281, 853, 309, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.02419867180287838}, {"id": 204, "seek": 153500, "start": 1560.0, + "end": 1561.0, "text": " And it.", "tokens": [51614, 400, 309, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.02419867180287838}, {"id": 205, "seek": 153500, "start": 1561.0, + "end": 1562.0, "text": " That''s.", "tokens": [51664, 663, 311, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.02419867180287838}, {"id": 206, "seek": 153500, "start": 1562.0, + "end": 1563.0, "text": " It motivates us.", "tokens": [51714, 467, 42569, 505, 13, + 51764], "temperature": 0.0, "avg_logprob": -0.23426016445817618, "compression_ratio": + 
1.6167400881057268, "no_speech_prob": 0.02419867180287838}, {"id": 207, "seek": + 156300, "start": 1563.0, "end": 1564.0, "text": " And it''s.", "tokens": [50364, + 400, 309, 311, 13, 50414], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 208, "seek": 156300, "start": 1564.0, "end": 1565.0, "text": " It''s.", "tokens": + [50414, 467, 311, 13, 50464], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 209, "seek": 156300, "start": 1565.0, "end": 1567.0, "text": " Yeah, we''re + not looking for safe.", "tokens": [50464, 865, 11, 321, 434, 406, 1237, 337, 3273, + 13, 50564], "temperature": 0.0, "avg_logprob": -0.25297793404000707, "compression_ratio": + 1.568888888888889, "no_speech_prob": 0.009742332622408867}, {"id": 210, "seek": + 156300, "start": 1567.0, "end": 1568.0, "text": " For safety here.", "tokens": [50564, + 1171, 4514, 510, 13, 50614], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 211, "seek": 156300, "start": 1568.0, "end": 1569.0, "text": " Yeah.", "tokens": + [50614, 865, 13, 50664], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 212, "seek": 156300, "start": 1569.0, "end": 1570.0, "text": " Yeah.", "tokens": + [50664, 865, 13, 50714], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 213, "seek": 156300, "start": 1570.0, "end": 1571.0, "text": " Absolutely.", + "tokens": [50714, 7021, 13, 50764], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 214, "seek": 
156300, "start": 1571.0, "end": 1573.0, "text": " But on that + front, like on the future,", "tokens": [50764, 583, 322, 300, 1868, 11, 411, 322, + 264, 2027, 11, 50864], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 215, "seek": 156300, "start": 1573.0, "end": 1574.0, "text": " a little bit.", + "tokens": [50864, 257, 707, 857, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.25297793404000707, "compression_ratio": 1.568888888888889, "no_speech_prob": + 0.009742332622408867}, {"id": 216, "seek": 156300, "start": 1574.0, "end": 1578.0, + "text": " Touching on the future of this market, even though it''s emerging.", "tokens": + [50914, 20029, 278, 322, 264, 2027, 295, 341, 2142, 11, 754, 1673, 309, 311, 14989, + 13, 51114], "temperature": 0.0, "avg_logprob": -0.25297793404000707, "compression_ratio": + 1.568888888888889, "no_speech_prob": 0.009742332622408867}, {"id": 217, "seek": + 156300, "start": 1578.0, "end": 1581.0, "text": " You know, and it''s still unfolding + in many ways.", "tokens": [51114, 509, 458, 11, 293, 309, 311, 920, 44586, 294, + 867, 2098, 13, 51264], "temperature": 0.0, "avg_logprob": -0.25297793404000707, + "compression_ratio": 1.568888888888889, "no_speech_prob": 0.009742332622408867}, + {"id": 218, "seek": 156300, "start": 1581.0, "end": 1585.0, "text": " And there + are so many players already.", "tokens": [51264, 400, 456, 366, 370, 867, 4150, + 1217, 13, 51464], "temperature": 0.0, "avg_logprob": -0.25297793404000707, "compression_ratio": + 1.568888888888889, "no_speech_prob": 0.009742332622408867}, {"id": 219, "seek": + 156300, "start": 1585.0, "end": 1587.0, "text": " But I''m just thinking like.", + "tokens": [51464, 583, 286, 478, 445, 1953, 411, 13, 51564], "temperature": 0.0, + "avg_logprob": -0.25297793404000707, "compression_ratio": 1.568888888888889, "no_speech_prob": + 0.009742332622408867}, {"id": 220, "seek": 156300, 
"start": 1587.0, "end": 1589.0, + "text": " What do you think.", "tokens": [51564, 708, 360, 291, 519, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.25297793404000707, "compression_ratio": 1.568888888888889, + "no_speech_prob": 0.009742332622408867}, {"id": 221, "seek": 156300, "start": 1589.0, + "end": 1590.0, "text": " Kind of.", "tokens": [51664, 9242, 295, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.25297793404000707, "compression_ratio": 1.568888888888889, + "no_speech_prob": 0.009742332622408867}, {"id": 222, "seek": 159000, "start": 1590.0, + "end": 1593.0, "text": " What strategic items and missing on the market right now?", + "tokens": [50364, 708, 10924, 4754, 293, 5361, 322, 264, 2142, 558, 586, 30, 50514], + "temperature": 0.0, "avg_logprob": -0.17039137620192307, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.018524089828133583}, {"id": 223, "seek": 159000, "start": 1593.0, + "end": 1596.0, "text": " You know, when you think about not the data science part,", + "tokens": [50514, 509, 458, 11, 562, 291, 519, 466, 406, 264, 1412, 3497, 644, 11, + 50664], "temperature": 0.0, "avg_logprob": -0.17039137620192307, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.018524089828133583}, {"id": 224, "seek": + 159000, "start": 1596.0, "end": 1599.0, "text": " I think that data science is developed + quite well.", "tokens": [50664, 286, 519, 300, 1412, 3497, 307, 4743, 1596, 731, + 13, 50814], "temperature": 0.0, "avg_logprob": -0.17039137620192307, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.018524089828133583}, {"id": 225, "seek": + 159000, "start": 1599.0, "end": 1604.0, "text": " We have a lot of competing, you + know, algorithms and frameworks.", "tokens": [50814, 492, 362, 257, 688, 295, 15439, + 11, 291, 458, 11, 14642, 293, 29834, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.17039137620192307, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.018524089828133583}, {"id": 226, 
"seek": 159000, "start": 1604.0, "end": 1607.0, + "text": " But like more like on the business side, right?", "tokens": [51064, 583, + 411, 544, 411, 322, 264, 1606, 1252, 11, 558, 30, 51214], "temperature": 0.0, "avg_logprob": + -0.17039137620192307, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.018524089828133583}, {"id": 227, "seek": 159000, "start": 1607.0, "end": 1612.0, + "text": " And maybe that''s in line with like how users understand the systems.", + "tokens": [51214, 400, 1310, 300, 311, 294, 1622, 365, 411, 577, 5022, 1223, 264, + 3652, 13, 51464], "temperature": 0.0, "avg_logprob": -0.17039137620192307, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.018524089828133583}, {"id": 228, "seek": + 159000, "start": 1612.0, "end": 1614.0, "text": " Maybe they don''t understand enough.", + "tokens": [51464, 2704, 436, 500, 380, 1223, 1547, 13, 51564], "temperature": 0.0, + "avg_logprob": -0.17039137620192307, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.018524089828133583}, {"id": 229, "seek": 159000, "start": 1614.0, "end": 1617.0, + "text": " Like, or like what items and missing?", "tokens": [51564, 1743, 11, 420, + 411, 437, 4754, 293, 5361, 30, 51714], "temperature": 0.0, "avg_logprob": -0.17039137620192307, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.018524089828133583}, + {"id": 230, "seek": 159000, "start": 1617.0, "end": 1619.0, "text": " And maybe + you''re working on that.", "tokens": [51714, 400, 1310, 291, 434, 1364, 322, 300, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.17039137620192307, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.018524089828133583}, {"id": 231, "seek": + 161900, "start": 1619.0, "end": 1625.0, "text": " Maybe you''re willing to share + maybe not, but maybe something along those lines that we can discuss.", "tokens": + [50364, 2704, 291, 434, 4950, 281, 2073, 1310, 406, 11, 457, 1310, 746, 2051, 729, + 3876, 300, 321, 393, 2248, 13, 
50664], "temperature": 0.0, "avg_logprob": -0.1670883066513959, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.009171971119940281}, + {"id": 232, "seek": 161900, "start": 1625.0, "end": 1629.0, "text": " Yeah, I think + you actually.", "tokens": [50664, 865, 11, 286, 519, 291, 767, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.1670883066513959, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.009171971119940281}, {"id": 233, "seek": 161900, "start": 1629.0, + "end": 1632.0, "text": " You made the right point, which is.", "tokens": [50864, + 509, 1027, 264, 558, 935, 11, 597, 307, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.1670883066513959, "compression_ratio": 1.5555555555555556, "no_speech_prob": + 0.009171971119940281}, {"id": 234, "seek": 161900, "start": 1632.0, "end": 1637.0, + "text": " For a certain for a certain audience.", "tokens": [51014, 1171, 257, 1629, + 337, 257, 1629, 4034, 13, 51264], "temperature": 0.0, "avg_logprob": -0.1670883066513959, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.009171971119940281}, + {"id": 235, "seek": 161900, "start": 1637.0, "end": 1639.0, "text": " There''s not.", + "tokens": [51264, 821, 311, 406, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.1670883066513959, "compression_ratio": 1.5555555555555556, "no_speech_prob": + 0.009171971119940281}, {"id": 236, "seek": 161900, "start": 1639.0, "end": 1647.0, + "text": " I mean, there''s still more to be done for a very technical audience that''s + familiar with the vector search.", "tokens": [51364, 286, 914, 11, 456, 311, 920, + 544, 281, 312, 1096, 337, 257, 588, 6191, 4034, 300, 311, 4963, 365, 264, 8062, + 3164, 13, 51764], "temperature": 0.0, "avg_logprob": -0.1670883066513959, "compression_ratio": + 1.5555555555555556, "no_speech_prob": 0.009171971119940281}, {"id": 237, "seek": + 164700, "start": 1647.0, "end": 1651.0, "text": " They have a lot of tools in front + of them and.", "tokens": [50364, 814, 
362, 257, 688, 295, 3873, 294, 1868, 295, + 552, 293, 13, 50564], "temperature": 0.0, "avg_logprob": -0.1628240229009272, "compression_ratio": + 1.5576036866359446, "no_speech_prob": 0.006784122437238693}, {"id": 238, "seek": + 164700, "start": 1651.0, "end": 1655.0, "text": " And right now, whatever extra + features they needed, they''ve.", "tokens": [50564, 400, 558, 586, 11, 2035, 2857, + 4122, 436, 2978, 11, 436, 600, 13, 50764], "temperature": 0.0, "avg_logprob": -0.1628240229009272, + "compression_ratio": 1.5576036866359446, "no_speech_prob": 0.006784122437238693}, + {"id": 239, "seek": 164700, "start": 1655.0, "end": 1658.0, "text": " They''ve hopefully + figured out.", "tokens": [50764, 814, 600, 4696, 8932, 484, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.1628240229009272, "compression_ratio": 1.5576036866359446, + "no_speech_prob": 0.006784122437238693}, {"id": 240, "seek": 164700, "start": 1658.0, + "end": 1664.0, "text": " It''s everyone else who doesn''t yet understand this and + doesn''t quite see.", "tokens": [50914, 467, 311, 1518, 1646, 567, 1177, 380, 1939, + 1223, 341, 293, 1177, 380, 1596, 536, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.1628240229009272, "compression_ratio": 1.5576036866359446, "no_speech_prob": + 0.006784122437238693}, {"id": 241, "seek": 164700, "start": 1664.0, "end": 1668.0, + "text": " How it applies to their applications.", "tokens": [51214, 1012, 309, 13165, + 281, 641, 5821, 13, 51414], "temperature": 0.0, "avg_logprob": -0.1628240229009272, + "compression_ratio": 1.5576036866359446, "no_speech_prob": 0.006784122437238693}, + {"id": 242, "seek": 164700, "start": 1668.0, "end": 1676.0, "text": " And for whom + it''s not clear what, you know, how to choose an algorithm, how to tune it.", "tokens": + [51414, 400, 337, 7101, 309, 311, 406, 1850, 437, 11, 291, 458, 11, 577, 281, 2826, + 364, 9284, 11, 577, 281, 10864, 309, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.1628240229009272, 
"compression_ratio": 1.5576036866359446, "no_speech_prob": + 0.006784122437238693}, {"id": 243, "seek": 167600, "start": 1676.0, "end": 1680.0, + "text": " That''s, I think the future isn''t educating.", "tokens": [50364, 663, + 311, 11, 286, 519, 264, 2027, 1943, 380, 28835, 13, 50564], "temperature": 0.0, + "avg_logprob": -0.1755827780692808, "compression_ratio": 1.4759036144578312, "no_speech_prob": + 0.0002443080593366176}, {"id": 244, "seek": 167600, "start": 1680.0, "end": 1685.0, + "text": " Those people and those companies and then bringing this capability to + them.", "tokens": [50564, 3950, 561, 293, 729, 3431, 293, 550, 5062, 341, 13759, + 281, 552, 13, 50814], "temperature": 0.0, "avg_logprob": -0.1755827780692808, "compression_ratio": + 1.4759036144578312, "no_speech_prob": 0.0002443080593366176}, {"id": 245, "seek": + 167600, "start": 1685.0, "end": 1692.0, "text": " And that means just helping them + understand what it is, but it also means making the.", "tokens": [50814, 400, 300, + 1355, 445, 4315, 552, 1223, 437, 309, 307, 11, 457, 309, 611, 1355, 1455, 264, 13, + 51164], "temperature": 0.0, "avg_logprob": -0.1755827780692808, "compression_ratio": + 1.4759036144578312, "no_speech_prob": 0.0002443080593366176}, {"id": 246, "seek": + 167600, "start": 1692.0, "end": 1696.0, "text": " Products more accessible to them.", + "tokens": [51164, 47699, 544, 9515, 281, 552, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.1755827780692808, "compression_ratio": 1.4759036144578312, "no_speech_prob": + 0.0002443080593366176}, {"id": 247, "seek": 167600, "start": 1696.0, "end": 1699.0, + "text": " Like.", "tokens": [51364, 1743, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.1755827780692808, "compression_ratio": 1.4759036144578312, "no_speech_prob": + 0.0002443080593366176}, {"id": 248, "seek": 169900, "start": 1699.0, "end": 1706.0, + "text": " And they can care of some of the technical details so that they can just + focus on.", "tokens": [50364, 400, 
436, 393, 1127, 295, 512, 295, 264, 6191, 4365, + 370, 300, 436, 393, 445, 1879, 322, 13, 50714], "temperature": 0.0, "avg_logprob": + -0.1766753083183652, "compression_ratio": 1.58, "no_speech_prob": 0.002294097328558564}, + {"id": 249, "seek": 169900, "start": 1706.0, "end": 1710.0, "text": " Yeah, they''re + business side of things in their application.", "tokens": [50714, 865, 11, 436, + 434, 1606, 1252, 295, 721, 294, 641, 3861, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.1766753083183652, "compression_ratio": 1.58, "no_speech_prob": 0.002294097328558564}, + {"id": 250, "seek": 169900, "start": 1710.0, "end": 1716.0, "text": " And there + are many, many companies out there that can use vector search, but just haven''t + heard of it.", "tokens": [50914, 400, 456, 366, 867, 11, 867, 3431, 484, 456, 300, + 393, 764, 8062, 3164, 11, 457, 445, 2378, 380, 2198, 295, 309, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.1766753083183652, "compression_ratio": 1.58, "no_speech_prob": + 0.002294097328558564}, {"id": 251, "seek": 169900, "start": 1716.0, "end": 1718.0, + "text": " Don''t realize it.", "tokens": [51214, 1468, 380, 4325, 309, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.1766753083183652, "compression_ratio": 1.58, + "no_speech_prob": 0.002294097328558564}, {"id": 252, "seek": 169900, "start": 1718.0, + "end": 1720.0, "text": " And.", "tokens": [51314, 400, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.1766753083183652, "compression_ratio": 1.58, "no_speech_prob": + 0.002294097328558564}, {"id": 253, "seek": 169900, "start": 1720.0, "end": 1725.0, + "text": " I think the future is in reaching those people.", "tokens": [51414, 286, + 519, 264, 2027, 307, 294, 9906, 729, 561, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.1766753083183652, "compression_ratio": 1.58, "no_speech_prob": 0.002294097328558564}, + {"id": 254, "seek": 172500, "start": 1725.0, "end": 1728.0, "text": " And I think + even looking.", "tokens": [50364, 400, 
286, 519, 754, 1237, 13, 50514], "temperature": + 0.0, "avg_logprob": -0.2399420464175871, "compression_ratio": 1.5980392156862746, + "no_speech_prob": 0.0008687314111739397}, {"id": 255, "seek": 172500, "start": 1728.0, + "end": 1730.0, "text": " Beyond vector search and just.", "tokens": [50514, 19707, + 8062, 3164, 293, 445, 13, 50614], "temperature": 0.0, "avg_logprob": -0.2399420464175871, + "compression_ratio": 1.5980392156862746, "no_speech_prob": 0.0008687314111739397}, + {"id": 256, "seek": 172500, "start": 1730.0, "end": 1736.0, "text": " Vector embeddings + in general, I think as more, as more and more companies adopt.", "tokens": [50614, + 691, 20814, 12240, 29432, 294, 2674, 11, 286, 519, 382, 544, 11, 382, 544, 293, + 544, 3431, 6878, 13, 50914], "temperature": 0.0, "avg_logprob": -0.2399420464175871, + "compression_ratio": 1.5980392156862746, "no_speech_prob": 0.0008687314111739397}, + {"id": 257, "seek": 172500, "start": 1736.0, "end": 1738.0, "text": " Machine learning + and learn about.", "tokens": [50914, 22155, 2539, 293, 1466, 466, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.2399420464175871, "compression_ratio": 1.5980392156862746, + "no_speech_prob": 0.0008687314111739397}, {"id": 258, "seek": 172500, "start": 1738.0, + "end": 1739.0, "text": " And LP.", "tokens": [51014, 400, 38095, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.2399420464175871, "compression_ratio": 1.5980392156862746, + "no_speech_prob": 0.0008687314111739397}, {"id": 259, "seek": 172500, "start": 1739.0, + "end": 1744.0, "text": " And continue hiring for data scientists and now machine + learning engineers, which.", "tokens": [51064, 400, 2354, 15335, 337, 1412, 7708, + 293, 586, 3479, 2539, 11955, 11, 597, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.2399420464175871, "compression_ratio": 1.5980392156862746, "no_speech_prob": + 0.0008687314111739397}, {"id": 260, "seek": 172500, "start": 1744.0, "end": 1748.0, + "text": " By the way, are growing at a 
faster pace than.", "tokens": [51314, 3146, + 264, 636, 11, 366, 4194, 412, 257, 4663, 11638, 813, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.2399420464175871, "compression_ratio": 1.5980392156862746, + "no_speech_prob": 0.0008687314111739397}, {"id": 261, "seek": 172500, "start": 1748.0, + "end": 1750.0, "text": " Data scientists.", "tokens": [51514, 11888, 7708, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.2399420464175871, "compression_ratio": 1.5980392156862746, + "no_speech_prob": 0.0008687314111739397}, {"id": 262, "seek": 175000, "start": 1750.0, + "end": 1756.0, "text": " The number of people with machine learning engineer titles + on LinkedIn.", "tokens": [50364, 440, 1230, 295, 561, 365, 3479, 2539, 11403, 12992, + 322, 20657, 13, 50664], "temperature": 0.0, "avg_logprob": -0.27157571315765383, + "compression_ratio": 1.4136363636363636, "no_speech_prob": 0.0008564196759834886}, + {"id": 263, "seek": 175000, "start": 1756.0, "end": 1759.0, "text": " Group by something + like 16%.", "tokens": [50664, 10500, 538, 746, 411, 3165, 4, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.27157571315765383, "compression_ratio": 1.4136363636363636, + "no_speech_prob": 0.0008564196759834886}, {"id": 264, "seek": 175000, "start": 1759.0, + "end": 1764.0, "text": " In Q2 of this year, which is when I last.", "tokens": [50814, + 682, 1249, 17, 295, 341, 1064, 11, 597, 307, 562, 286, 1036, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.27157571315765383, "compression_ratio": 1.4136363636363636, + "no_speech_prob": 0.0008564196759834886}, {"id": 265, "seek": 175000, "start": 1764.0, + "end": 1767.0, "text": " Check this, where is data scientists grew by.", "tokens": + [51064, 6881, 341, 11, 689, 307, 1412, 7708, 6109, 538, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.27157571315765383, "compression_ratio": 1.4136363636363636, + "no_speech_prob": 0.0008564196759834886}, {"id": 266, "seek": 175000, "start": 1767.0, + "end": 1771.0, "text": " I 
don''t remember exactly, but single digits.", "tokens": + [51214, 286, 500, 380, 1604, 2293, 11, 457, 2167, 27011, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.27157571315765383, "compression_ratio": 1.4136363636363636, + "no_speech_prob": 0.0008564196759834886}, {"id": 267, "seek": 175000, "start": 1771.0, + "end": 1777.0, "text": " So and obviously not all those mental engineers are working + on vector search.", "tokens": [51414, 407, 293, 2745, 406, 439, 729, 4973, 11955, + 366, 1364, 322, 8062, 3164, 13, 51714], "temperature": 0.0, "avg_logprob": -0.27157571315765383, + "compression_ratio": 1.4136363636363636, "no_speech_prob": 0.0008564196759834886}, + {"id": 268, "seek": 177700, "start": 1777.0, "end": 1779.0, "text": " But.", "tokens": + [50364, 583, 13, 50464], "temperature": 0.0, "avg_logprob": -0.21351921979118796, + "compression_ratio": 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 269, + "seek": 177700, "start": 1779.0, "end": 1781.0, "text": " They will have more and + more.", "tokens": [50464, 814, 486, 362, 544, 293, 544, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.21351921979118796, "compression_ratio": 1.4375, "no_speech_prob": + 0.002354922704398632}, {"id": 270, "seek": 177700, "start": 1781.0, "end": 1783.0, + "text": " Vector embedding data.", "tokens": [50564, 691, 20814, 12240, 3584, 1412, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.21351921979118796, "compression_ratio": + 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 271, "seek": 177700, "start": + 1783.0, "end": 1786.0, "text": " But they''re trying to wrangle.", "tokens": [50664, + 583, 436, 434, 1382, 281, 928, 7846, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.21351921979118796, "compression_ratio": 1.4375, "no_speech_prob": 0.002354922704398632}, + {"id": 272, "seek": 177700, "start": 1786.0, "end": 1790.0, "text": " They want + to maintain and analyze.", "tokens": [50814, 814, 528, 281, 6909, 293, 12477, 13, + 51014], "temperature": 
0.0, "avg_logprob": -0.21351921979118796, "compression_ratio": + 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 273, "seek": 177700, "start": + 1790.0, "end": 1794.0, "text": " And in some cases search through, but also maybe + just.", "tokens": [51014, 400, 294, 512, 3331, 3164, 807, 11, 457, 611, 1310, 445, + 13, 51214], "temperature": 0.0, "avg_logprob": -0.21351921979118796, "compression_ratio": + 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 274, "seek": 177700, "start": + 1794.0, "end": 1796.0, "text": " Feed into other models.", "tokens": [51214, 33720, + 666, 661, 5245, 13, 51314], "temperature": 0.0, "avg_logprob": -0.21351921979118796, + "compression_ratio": 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 275, + "seek": 177700, "start": 1796.0, "end": 1797.0, "text": " And so on.", "tokens": + [51314, 400, 370, 322, 13, 51364], "temperature": 0.0, "avg_logprob": -0.21351921979118796, + "compression_ratio": 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 276, + "seek": 177700, "start": 1797.0, "end": 1799.0, "text": " And so.", "tokens": [51364, + 400, 370, 13, 51464], "temperature": 0.0, "avg_logprob": -0.21351921979118796, "compression_ratio": + 1.4375, "no_speech_prob": 0.002354922704398632}, {"id": 277, "seek": 177700, "start": + 1799.0, "end": 1806.0, "text": " In the past five years, we saw.", "tokens": [51464, + 682, 264, 1791, 1732, 924, 11, 321, 1866, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.21351921979118796, "compression_ratio": 1.4375, "no_speech_prob": 0.002354922704398632}, + {"id": 278, "seek": 180600, "start": 1806.0, "end": 1807.0, "text": " That''s.", + "tokens": [50364, 663, 311, 13, 50414], "temperature": 0.6, "avg_logprob": -0.7825347900390625, + "compression_ratio": 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, + {"id": 279, "seek": 180600, "start": 1807.0, "end": 1808.0, "text": " That''s why + we want to work with data.", "tokens": [50414, 663, 311, 983, 321, 528, 281, 
589, + 365, 1412, 13, 50464], "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": + 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, {"id": 280, "seek": + 180600, "start": 1808.0, "end": 1810.0, "text": " And a lot of questions have been + asked.", "tokens": [50464, 400, 257, 688, 295, 1651, 362, 668, 2351, 13, 50564], + "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": 1.7559055118110236, + "no_speech_prob": 0.003963829018175602}, {"id": 281, "seek": 180600, "start": 1810.0, + "end": 1811.0, "text": " And so there are.", "tokens": [50564, 400, 370, 456, 366, + 13, 50614], "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": + 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, {"id": 282, "seek": + 180600, "start": 1811.0, "end": 1814.0, "text": " And so they need to think about + the question of data warehouses and data lakes,", "tokens": [50614, 400, 370, 436, + 643, 281, 519, 466, 264, 1168, 295, 1412, 17464, 29578, 293, 1412, 25595, 11, 50764], + "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": 1.7559055118110236, + "no_speech_prob": 0.003963829018175602}, {"id": 283, "seek": 180600, "start": 1814.0, + "end": 1816.0, "text": " and really like.", "tokens": [50764, 293, 534, 411, 13, + 50864], "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": + 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, {"id": 284, "seek": + 180600, "start": 1816.0, "end": 1817.0, "text": " Companies.", "tokens": [50864, + 44031, 13, 50914], "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": + 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, {"id": 285, "seek": + 180600, "start": 1817.0, "end": 1819.0, "text": " Realizing they need to centralize + their all their data.", "tokens": [50914, 8467, 3319, 436, 643, 281, 5777, 1125, + 641, 439, 641, 1412, 13, 51014], "temperature": 0.6, 
"avg_logprob": -0.7825347900390625, + "compression_ratio": 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, + {"id": 286, "seek": 180600, "start": 1819.0, "end": 1822.0, "text": " For their + data science teams and analysts and so on.", "tokens": [51014, 1171, 641, 1412, + 3497, 5491, 293, 31388, 293, 370, 322, 13, 51164], "temperature": 0.6, "avg_logprob": + -0.7825347900390625, "compression_ratio": 1.7559055118110236, "no_speech_prob": + 0.003963829018175602}, {"id": 287, "seek": 180600, "start": 1822.0, "end": 1823.0, + "text": " We.", "tokens": [51164, 492, 13, 51214], "temperature": 0.6, "avg_logprob": + -0.7825347900390625, "compression_ratio": 1.7559055118110236, "no_speech_prob": + 0.003963829018175602}, {"id": 288, "seek": 180600, "start": 1823.0, "end": 1824.0, + "text": " We believe.", "tokens": [51214, 492, 1697, 13, 51264], "temperature": + 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": 1.7559055118110236, + "no_speech_prob": 0.003963829018175602}, {"id": 289, "seek": 180600, "start": 1824.0, + "end": 1825.0, "text": " The same will.", "tokens": [51264, 440, 912, 486, 13, 51314], + "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": 1.7559055118110236, + "no_speech_prob": 0.003963829018175602}, {"id": 290, "seek": 180600, "start": 1825.0, + "end": 1829.0, "text": " Companies will need the same thing for vector embeddings.", + "tokens": [51314, 44031, 486, 643, 264, 912, 551, 337, 8062, 12240, 29432, 13, 51514], + "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": 1.7559055118110236, + "no_speech_prob": 0.003963829018175602}, {"id": 291, "seek": 180600, "start": 1829.0, + "end": 1832.0, "text": " So the, they have.", "tokens": [51514, 407, 264, 11, 436, + 362, 13, 51664], "temperature": 0.6, "avg_logprob": -0.7825347900390625, "compression_ratio": + 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, {"id": 292, "seek": + 180600, "start": 1832.0, "end": 1834.0, 
"text": " One database for.", "tokens": + [51664, 1485, 8149, 337, 13, 51764], "temperature": 0.6, "avg_logprob": -0.7825347900390625, + "compression_ratio": 1.7559055118110236, "no_speech_prob": 0.003963829018175602}, + {"id": 293, "seek": 183400, "start": 1834.0, "end": 1842.88, "text": " all the can + feed applications, feed training and analysis and so on. So yeah, that might", "tokens": + [50364, 439, 264, 393, 3154, 5821, 11, 3154, 3097, 293, 5215, 293, 370, 322, 13, + 407, 1338, 11, 300, 1062, 50808], "temperature": 0.0, "avg_logprob": -0.2512951708854513, + "compression_ratio": 1.6088888888888888, "no_speech_prob": 0.08715543895959854}, + {"id": 294, "seek": 183400, "start": 1842.88, "end": 1849.12, "text": " be a few + years out and we''ll see if that ever happens. But those are the kinds of things", + "tokens": [50808, 312, 257, 1326, 924, 484, 293, 321, 603, 536, 498, 300, 1562, + 2314, 13, 583, 729, 366, 264, 3685, 295, 721, 51120], "temperature": 0.0, "avg_logprob": + -0.2512951708854513, "compression_ratio": 1.6088888888888888, "no_speech_prob": + 0.08715543895959854}, {"id": 295, "seek": 183400, "start": 1849.12, "end": 1854.16, + "text": " we''re thinking about often. Beyond research, how do we get, how do we + help people get more", "tokens": [51120, 321, 434, 1953, 466, 2049, 13, 19707, 2132, + 11, 577, 360, 321, 483, 11, 577, 360, 321, 854, 561, 483, 544, 51372], "temperature": + 0.0, "avg_logprob": -0.2512951708854513, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.08715543895959854}, {"id": 296, "seek": 183400, "start": 1854.16, + "end": 1859.68, "text": " use out of there? 
Yeah, I mean, so I guess it goes along + the lines also of producing docs and", "tokens": [51372, 764, 484, 295, 456, 30, + 865, 11, 286, 914, 11, 370, 286, 2041, 309, 1709, 2051, 264, 3876, 611, 295, 10501, + 45623, 293, 51648], "temperature": 0.0, "avg_logprob": -0.2512951708854513, "compression_ratio": + 1.6088888888888888, "no_speech_prob": 0.08715543895959854}, {"id": 297, "seek": + 185968, "start": 1859.68, "end": 1865.6000000000001, "text": " the kind of documentation, + kind of explaining and source code explaining things, right? How can I,", "tokens": + [50364, 264, 733, 295, 14333, 11, 733, 295, 13468, 293, 4009, 3089, 13468, 721, + 11, 558, 30, 1012, 393, 286, 11, 50660], "temperature": 0.0, "avg_logprob": -0.258727970123291, + "compression_ratio": 1.7767857142857142, "no_speech_prob": 0.016809910535812378}, + {"id": 298, "seek": 185968, "start": 1865.6000000000001, "end": 1871.8400000000001, + "text": " you know, keep the road running and kind of doing things with my, because + I don''t want to focus on", "tokens": [50660, 291, 458, 11, 1066, 264, 3060, 2614, + 293, 733, 295, 884, 721, 365, 452, 11, 570, 286, 500, 380, 528, 281, 1879, 322, + 50972], "temperature": 0.0, "avg_logprob": -0.258727970123291, "compression_ratio": + 1.7767857142857142, "no_speech_prob": 0.016809910535812378}, {"id": 299, "seek": + 185968, "start": 1871.8400000000001, "end": 1876.88, "text": " like, you know, need + to create the of the vector of search itself maybe, but actually what I really", + "tokens": [50972, 411, 11, 291, 458, 11, 643, 281, 1884, 264, 295, 264, 8062, 295, + 3164, 2564, 1310, 11, 457, 767, 437, 286, 534, 51224], "temperature": 0.0, "avg_logprob": + -0.258727970123291, "compression_ratio": 1.7767857142857142, "no_speech_prob": 0.016809910535812378}, + {"id": 300, "seek": 185968, "start": 1876.88, "end": 1883.44, "text": " want is + to achieve like, you know, my goal, right? 
You know, let''s say create a music search + service", "tokens": [51224, 528, 307, 281, 4584, 411, 11, 291, 458, 11, 452, 3387, + 11, 558, 30, 509, 458, 11, 718, 311, 584, 1884, 257, 1318, 3164, 2643, 51552], "temperature": + 0.0, "avg_logprob": -0.258727970123291, "compression_ratio": 1.7767857142857142, + "no_speech_prob": 0.016809910535812378}, {"id": 301, "seek": 188344, "start": 1883.44, + "end": 1891.2, "text": " or something like that, right? Yeah, exactly. Yeah, it''s, + that''s a trap. A lot of people and", "tokens": [50364, 420, 746, 411, 300, 11, + 558, 30, 865, 11, 2293, 13, 865, 11, 309, 311, 11, 300, 311, 257, 11487, 13, 316, + 688, 295, 561, 293, 50752], "temperature": 0.0, "avg_logprob": -0.20912809686346367, + "compression_ratio": 1.6267281105990783, "no_speech_prob": 0.003218613797798753}, + {"id": 302, "seek": 188344, "start": 1891.2, "end": 1898.8, "text": " companies + fall into the, they love the technology, they love, you know, they''re very proud + of,", "tokens": [50752, 3431, 2100, 666, 264, 11, 436, 959, 264, 2899, 11, 436, + 959, 11, 291, 458, 11, 436, 434, 588, 4570, 295, 11, 51132], "temperature": 0.0, + "avg_logprob": -0.20912809686346367, "compression_ratio": 1.6267281105990783, "no_speech_prob": + 0.003218613797798753}, {"id": 303, "seek": 188344, "start": 1898.8, "end": 1904.88, + "text": " yeah, building something unique. 
And as you should be, but you have to + remember that that", "tokens": [51132, 1338, 11, 2390, 746, 3845, 13, 400, 382, + 291, 820, 312, 11, 457, 291, 362, 281, 1604, 300, 300, 51436], "temperature": 0.0, + "avg_logprob": -0.20912809686346367, "compression_ratio": 1.6267281105990783, "no_speech_prob": + 0.003218613797798753}, {"id": 304, "seek": 188344, "start": 1906.24, "end": 1911.52, + "text": " people you''re serving are just trying to solve some, some business problem.", + "tokens": [51504, 561, 291, 434, 8148, 366, 445, 1382, 281, 5039, 512, 11, 512, + 1606, 1154, 13, 51768], "temperature": 0.0, "avg_logprob": -0.20912809686346367, + "compression_ratio": 1.6267281105990783, "no_speech_prob": 0.003218613797798753}, + {"id": 305, "seek": 191344, "start": 1913.44, "end": 1919.8400000000001, "text": + " Some of them, the early adopters, they''ll, they might be very curious about how, + how it works under", "tokens": [50364, 2188, 295, 552, 11, 264, 2440, 22486, 1559, + 11, 436, 603, 11, 436, 1062, 312, 588, 6369, 466, 577, 11, 577, 309, 1985, 833, + 50684], "temperature": 0.0, "avg_logprob": -0.12072736284007198, "compression_ratio": + 1.6736401673640167, "no_speech_prob": 7.660697883693501e-05}, {"id": 306, "seek": + 191344, "start": 1919.8400000000001, "end": 1926.16, "text": " the hood and they + might want to have the ability to pull some levers and turn some knobs, but the + vast", "tokens": [50684, 264, 13376, 293, 436, 1062, 528, 281, 362, 264, 3485, 281, + 2235, 512, 45571, 293, 1261, 512, 46999, 11, 457, 264, 8369, 51000], "temperature": + 0.0, "avg_logprob": -0.12072736284007198, "compression_ratio": 1.6736401673640167, + "no_speech_prob": 7.660697883693501e-05}, {"id": 307, "seek": 191344, "start": 1926.16, + "end": 1934.56, "text": " majority of people just want to implement machine learning + into the applications to create a smarter", "tokens": [51000, 6286, 295, 561, 445, + 528, 281, 4445, 3479, 2539, 666, 264, 5821, 281, 1884, 257, 20294, 51420], 
"temperature": + 0.0, "avg_logprob": -0.12072736284007198, "compression_ratio": 1.6736401673640167, + "no_speech_prob": 7.660697883693501e-05}, {"id": 308, "seek": 191344, "start": 1934.56, + "end": 1943.3600000000001, "text": " search function to increase user engagement, + things like that. Yeah, yeah, I think that was also", "tokens": [51420, 3164, 2445, + 281, 3488, 4195, 8742, 11, 721, 411, 300, 13, 865, 11, 1338, 11, 286, 519, 300, + 390, 611, 51860], "temperature": 0.0, "avg_logprob": -0.12072736284007198, "compression_ratio": + 1.6736401673640167, "no_speech_prob": 7.660697883693501e-05}, {"id": 309, "seek": + 194336, "start": 1943.6, "end": 1950.32, "text": " one recent article. I will also + link it in the, in the notes, explaining that you can apply", "tokens": [50376, + 472, 5162, 7222, 13, 286, 486, 611, 2113, 309, 294, 264, 11, 294, 264, 5570, 11, + 13468, 300, 291, 393, 3079, 50712], "temperature": 0.0, "avg_logprob": -0.17104810597945233, + "compression_ratio": 1.5809128630705394, "no_speech_prob": 0.0004022227949462831}, + {"id": 310, "seek": 194336, "start": 1950.32, "end": 1956.7199999999998, "text": + " a vector search to solve the zero heat problem in e-commerce. And, and that''s + how you can save,", "tokens": [50712, 257, 8062, 3164, 281, 5039, 264, 4018, 3738, + 1154, 294, 308, 12, 26926, 13, 400, 11, 293, 300, 311, 577, 291, 393, 3155, 11, + 51032], "temperature": 0.0, "avg_logprob": -0.17104810597945233, "compression_ratio": + 1.5809128630705394, "no_speech_prob": 0.0004022227949462831}, {"id": 311, "seek": + 194336, "start": 1956.7199999999998, "end": 1963.76, "text": " well, actually earn + money, right? So save the user experience in that sense. 
Yeah, so it sounds", "tokens": + [51032, 731, 11, 767, 6012, 1460, 11, 558, 30, 407, 3155, 264, 4195, 1752, 294, + 300, 2020, 13, 865, 11, 370, 309, 3263, 51384], "temperature": 0.0, "avg_logprob": + -0.17104810597945233, "compression_ratio": 1.5809128630705394, "no_speech_prob": + 0.0004022227949462831}, {"id": 312, "seek": 194336, "start": 1963.76, "end": 1968.8799999999999, + "text": " like more and more use cases are coming up. I mean, you guys at the forefront + of actually hearing", "tokens": [51384, 411, 544, 293, 544, 764, 3331, 366, 1348, + 493, 13, 286, 914, 11, 291, 1074, 412, 264, 27287, 295, 767, 4763, 51640], "temperature": + 0.0, "avg_logprob": -0.17104810597945233, "compression_ratio": 1.5809128630705394, + "no_speech_prob": 0.0004022227949462831}, {"id": 313, "seek": 196888, "start": 1968.88, + "end": 1975.7600000000002, "text": " what are the use cases, right? And kind of + hopefully you''ll be sharing some of those with the", "tokens": [50364, 437, 366, + 264, 764, 3331, 11, 558, 30, 400, 733, 295, 4696, 291, 603, 312, 5414, 512, 295, + 729, 365, 264, 50708], "temperature": 0.0, "avg_logprob": -0.1363173256749692, "compression_ratio": + 1.6157205240174672, "no_speech_prob": 0.0023115132935345173}, {"id": 314, "seek": + 196888, "start": 1975.7600000000002, "end": 1983.2, "text": " audience at large + and something we''ll learn from you guys. Yeah, I''ve, we''re still, we''re still", + "tokens": [50708, 4034, 412, 2416, 293, 746, 321, 603, 1466, 490, 291, 1074, 13, + 865, 11, 286, 600, 11, 321, 434, 920, 11, 321, 434, 920, 51080], "temperature": + 0.0, "avg_logprob": -0.1363173256749692, "compression_ratio": 1.6157205240174672, + "no_speech_prob": 0.0023115132935345173}, {"id": 315, "seek": 196888, "start": 1983.2, + "end": 1991.1200000000001, "text": " constantly surprised by what people want to + do with the vector search. 
And, yeah, we want to make", "tokens": [51080, 6460, + 6100, 538, 437, 561, 528, 281, 360, 365, 264, 8062, 3164, 13, 400, 11, 1338, 11, + 321, 528, 281, 652, 51476], "temperature": 0.0, "avg_logprob": -0.1363173256749692, + "compression_ratio": 1.6157205240174672, "no_speech_prob": 0.0023115132935345173}, + {"id": 316, "seek": 196888, "start": 1991.1200000000001, "end": 1998.16, "text": + " the product available to as many people as possible to see what they come up with.", + "tokens": [51476, 264, 1674, 2435, 281, 382, 867, 561, 382, 1944, 281, 536, 437, + 436, 808, 493, 365, 13, 51828], "temperature": 0.0, "avg_logprob": -0.1363173256749692, + "compression_ratio": 1.6157205240174672, "no_speech_prob": 0.0023115132935345173}, + {"id": 317, "seek": 199888, "start": 1999.68, "end": 2009.7600000000002, "text": + " I will say though, we also surprised in a way by how many people want to just + do vector search", "tokens": [50404, 286, 486, 584, 1673, 11, 321, 611, 6100, 294, + 257, 636, 538, 577, 867, 561, 528, 281, 445, 360, 8062, 3164, 50908], "temperature": + 0.0, "avg_logprob": -0.1249434719346974, "compression_ratio": 1.5297297297297296, + "no_speech_prob": 0.0010734334355220199}, {"id": 318, "seek": 199888, "start": 2009.7600000000002, + "end": 2019.2800000000002, "text": " on text data, which seems like such a simple + thing to us, maybe. 
But it gets back to this point", "tokens": [50908, 322, 2487, + 1412, 11, 597, 2544, 411, 1270, 257, 2199, 551, 281, 505, 11, 1310, 13, 583, 309, + 2170, 646, 281, 341, 935, 51384], "temperature": 0.0, "avg_logprob": -0.1249434719346974, + "compression_ratio": 1.5297297297297296, "no_speech_prob": 0.0010734334355220199}, + {"id": 319, "seek": 199888, "start": 2019.2800000000002, "end": 2025.1200000000001, + "text": " that not everyone is, you know, this far along as, as people in the vector + search community.", "tokens": [51384, 300, 406, 1518, 307, 11, 291, 458, 11, 341, + 1400, 2051, 382, 11, 382, 561, 294, 264, 8062, 3164, 1768, 13, 51676], "temperature": + 0.0, "avg_logprob": -0.1249434719346974, "compression_ratio": 1.5297297297297296, + "no_speech_prob": 0.0010734334355220199}, {"id": 320, "seek": 202512, "start": 2026.08, + "end": 2033.04, "text": " So we got to bring, we got to bring more people with us + and help them see that once they''re done", "tokens": [50412, 407, 321, 658, 281, + 1565, 11, 321, 658, 281, 1565, 544, 561, 365, 505, 293, 854, 552, 536, 300, 1564, + 436, 434, 1096, 50760], "temperature": 0.0, "avg_logprob": -0.18708024422327676, + "compression_ratio": 1.6877828054298643, "no_speech_prob": 0.01758735068142414}, + {"id": 321, "seek": 202512, "start": 2033.04, "end": 2037.28, "text": " with a semantic + search use case, there''s actually a lot more they can do with it. 
Yeah,", "tokens": + [50760, 365, 257, 47982, 3164, 764, 1389, 11, 456, 311, 767, 257, 688, 544, 436, + 393, 360, 365, 309, 13, 865, 11, 50972], "temperature": 0.0, "avg_logprob": -0.18708024422327676, + "compression_ratio": 1.6877828054298643, "no_speech_prob": 0.01758735068142414}, + {"id": 322, "seek": 202512, "start": 2037.28, "end": 2046.0, "text": " I think it''s + something that probably needs a bit of kind of discovery for everyone, but also + sort", "tokens": [50972, 286, 519, 309, 311, 746, 300, 1391, 2203, 257, 857, 295, + 733, 295, 12114, 337, 1518, 11, 457, 611, 1333, 51408], "temperature": 0.0, "avg_logprob": + -0.18708024422327676, "compression_ratio": 1.6877828054298643, "no_speech_prob": + 0.01758735068142414}, {"id": 323, "seek": 202512, "start": 2046.0, "end": 2051.2, + "text": " of like blogging more about that and sharing more about that that, no, + it''s not only text,", "tokens": [51408, 295, 411, 6968, 3249, 544, 466, 300, 293, + 5414, 544, 466, 300, 300, 11, 572, 11, 309, 311, 406, 787, 2487, 11, 51668], "temperature": + 0.0, "avg_logprob": -0.18708024422327676, "compression_ratio": 1.6877828054298643, + "no_speech_prob": 0.01758735068142414}, {"id": 324, "seek": 205120, "start": 2051.2, + "end": 2056.96, "text": " it''s actually everything that is inculcable as a vector, + right? And could be dense, it could be", "tokens": [50364, 309, 311, 767, 1203, + 300, 307, 834, 425, 66, 712, 382, 257, 8062, 11, 558, 30, 400, 727, 312, 18011, + 11, 309, 727, 312, 50652], "temperature": 0.0, "avg_logprob": -0.18632020950317382, + "compression_ratio": 1.6297872340425532, "no_speech_prob": 0.0022406429052352905}, + {"id": 325, "seek": 205120, "start": 2056.96, "end": 2062.72, "text": " sparse, + it could be whatever you have there as long as it''s a vector. 
Then you can send + it in", "tokens": [50652, 637, 11668, 11, 309, 727, 312, 2035, 291, 362, 456, 382, + 938, 382, 309, 311, 257, 8062, 13, 1396, 291, 393, 2845, 309, 294, 50940], "temperature": + 0.0, "avg_logprob": -0.18632020950317382, "compression_ratio": 1.6297872340425532, + "no_speech_prob": 0.0022406429052352905}, {"id": 326, "seek": 205120, "start": 2062.72, + "end": 2067.2799999999997, "text": " index and search and then you need tools to + choose the metric function, right? We didn''t talk about", "tokens": [50940, 8186, + 293, 3164, 293, 550, 291, 643, 3873, 281, 2826, 264, 20678, 2445, 11, 558, 30, 492, + 994, 380, 751, 466, 51168], "temperature": 0.0, "avg_logprob": -0.18632020950317382, + "compression_ratio": 1.6297872340425532, "no_speech_prob": 0.0022406429052352905}, + {"id": 327, "seek": 205120, "start": 2067.2799999999997, "end": 2075.52, "text": + " it, but I know you guys support like three major distances like Euclidean and + dot product and", "tokens": [51168, 309, 11, 457, 286, 458, 291, 1074, 1406, 411, + 1045, 2563, 22182, 411, 462, 1311, 31264, 282, 293, 5893, 1674, 293, 51580], "temperature": + 0.0, "avg_logprob": -0.18632020950317382, "compression_ratio": 1.6297872340425532, + "no_speech_prob": 0.0022406429052352905}, {"id": 328, "seek": 207552, "start": 2075.52, + "end": 2081.68, "text": " to sign. 
Yeah, so I mean, these are like more or less + this standards across many, you know,", "tokens": [50364, 281, 1465, 13, 865, 11, + 370, 286, 914, 11, 613, 366, 411, 544, 420, 1570, 341, 7787, 2108, 867, 11, 291, + 458, 11, 50672], "temperature": 0.0, "avg_logprob": -0.25909970788394704, "compression_ratio": + 1.4816753926701571, "no_speech_prob": 0.0017748078098520637}, {"id": 329, "seek": + 207552, "start": 2081.68, "end": 2086.72, "text": " data science applications, but + I''m sure there is somebody somewhere sitting in the garage and", "tokens": [50672, + 1412, 3497, 5821, 11, 457, 286, 478, 988, 456, 307, 2618, 4079, 3798, 294, 264, + 14400, 293, 50924], "temperature": 0.0, "avg_logprob": -0.25909970788394704, "compression_ratio": + 1.4816753926701571, "no_speech_prob": 0.0017748078098520637}, {"id": 330, "seek": + 207552, "start": 2086.72, "end": 2092.24, "text": " venting in new metric and probably + you will want to kind of provide plug-in architecture for that", "tokens": [50924, + 6931, 278, 294, 777, 20678, 293, 1391, 291, 486, 528, 281, 733, 295, 2893, 5452, + 12, 259, 9482, 337, 300, 51200], "temperature": 0.0, "avg_logprob": -0.25909970788394704, + "compression_ratio": 1.4816753926701571, "no_speech_prob": 0.0017748078098520637}, + {"id": 331, "seek": 209224, "start": 2092.3199999999997, "end": 2101.2, "text": + " case as well, right? Yeah, well, we have our own people in this figurative garages", + "tokens": [50368, 1389, 382, 731, 11, 558, 30, 865, 11, 731, 11, 321, 362, 527, + 1065, 561, 294, 341, 31094, 1166, 3691, 1660, 50812], "temperature": 0.0, "avg_logprob": + -0.17815639078617096, "compression_ratio": 1.5227272727272727, "no_speech_prob": + 0.005211774259805679}, {"id": 332, "seek": 209224, "start": 2102.72, "end": 2112.4799999999996, + "text": " working on stuff as well. 
But also to go back to the previous thing, the + vector database that", "tokens": [50888, 1364, 322, 1507, 382, 731, 13, 583, 611, + 281, 352, 646, 281, 264, 3894, 551, 11, 264, 8062, 8149, 300, 51376], "temperature": + 0.0, "avg_logprob": -0.17815639078617096, "compression_ratio": 1.5227272727272727, + "no_speech_prob": 0.005211774259805679}, {"id": 333, "seek": 209224, "start": 2112.4799999999996, + "end": 2119.8399999999997, "text": " surrounds the engine as well, which might just + look like more traditional database features", "tokens": [51376, 44576, 264, 2848, + 382, 731, 11, 597, 1062, 445, 574, 411, 544, 5164, 8149, 4122, 51744], "temperature": + 0.0, "avg_logprob": -0.17815639078617096, "compression_ratio": 1.5227272727272727, + "no_speech_prob": 0.005211774259805679}, {"id": 334, "seek": 211984, "start": 2120.32, + "end": 2127.52, "text": " rather than and simply applied to vector search rather + than some breakthrough algorithms or", "tokens": [50388, 2831, 813, 293, 2935, 6456, + 281, 8062, 3164, 2831, 813, 512, 22397, 14642, 420, 50748], "temperature": 0.0, + "avg_logprob": -0.2619583716759315, "compression_ratio": 1.5568181818181819, "no_speech_prob": + 0.0030265129171311855}, {"id": 335, "seek": 211984, "start": 2128.7200000000003, + "end": 2133.04, "text": " things like that. 
Although, yeah, you know, the filtering + that we introduced with point", "tokens": [50808, 721, 411, 300, 13, 5780, 11, 1338, + 11, 291, 458, 11, 264, 30822, 300, 321, 7268, 365, 935, 51024], "temperature": 0.0, + "avg_logprob": -0.2619583716759315, "compression_ratio": 1.5568181818181819, "no_speech_prob": + 0.0030265129171311855}, {"id": 336, "seek": 211984, "start": 2133.04, "end": 2143.92, + "text": " on 2.0 is doing single stage filtering on vector index was, let''s say, + let''s not say that it''s", "tokens": [51024, 322, 568, 13, 15, 307, 884, 2167, + 3233, 30822, 322, 8062, 8186, 390, 11, 718, 311, 584, 11, 718, 311, 406, 584, 300, + 309, 311, 51568], "temperature": 0.0, "avg_logprob": -0.2619583716759315, "compression_ratio": + 1.5568181818181819, "no_speech_prob": 0.0030265129171311855}, {"id": 337, "seek": + 214392, "start": 2144.0, "end": 2149.92, "text": " a collot of late nights in the + garage. Yeah, yeah, sounds exciting and sounds like what your", "tokens": [50368, + 257, 1263, 310, 295, 3469, 13249, 294, 264, 14400, 13, 865, 11, 1338, 11, 3263, + 4670, 293, 3263, 411, 437, 428, 50664], "temperature": 0.0, "avg_logprob": -0.19278801144577387, + "compression_ratio": 1.7153024911032029, "no_speech_prob": 0.007601321674883366}, + {"id": 338, "seek": 214392, "start": 2149.92, "end": 2157.84, "text": " customers + will benefit from, right? Almost immediately. Yeah, that''s fantastic. 
Yeah, I was + thinking,", "tokens": [50664, 4581, 486, 5121, 490, 11, 558, 30, 12627, 4258, 13, + 865, 11, 300, 311, 5456, 13, 865, 11, 286, 390, 1953, 11, 51060], "temperature": + 0.0, "avg_logprob": -0.19278801144577387, "compression_ratio": 1.7153024911032029, + "no_speech_prob": 0.007601321674883366}, {"id": 339, "seek": 214392, "start": 2157.84, + "end": 2163.6800000000003, "text": " like, do you want to add anything more on Vinegon + or like, for instance, if somebody wants to try it", "tokens": [51060, 411, 11, + 360, 291, 528, 281, 909, 1340, 544, 322, 40569, 10660, 420, 411, 11, 337, 5197, + 11, 498, 2618, 2738, 281, 853, 309, 51352], "temperature": 0.0, "avg_logprob": -0.19278801144577387, + "compression_ratio": 1.7153024911032029, "no_speech_prob": 0.007601321674883366}, + {"id": 340, "seek": 214392, "start": 2163.6800000000003, "end": 2169.28, "text": + " out today, what''s the process looks like or should they just shoot you at the + email? Yeah, well,", "tokens": [51352, 484, 965, 11, 437, 311, 264, 1399, 1542, + 411, 420, 820, 436, 445, 3076, 291, 412, 264, 3796, 30, 865, 11, 731, 11, 51632], + "temperature": 0.0, "avg_logprob": -0.19278801144577387, "compression_ratio": 1.7153024911032029, + "no_speech_prob": 0.007601321674883366}, {"id": 341, "seek": 214392, "start": 2169.28, + "end": 2173.2000000000003, "text": " if they want to shoot me an email, they''re + welcome to do that. If they want to a t-shirt,", "tokens": [51632, 498, 436, 528, + 281, 3076, 385, 364, 3796, 11, 436, 434, 2928, 281, 360, 300, 13, 759, 436, 528, + 281, 257, 256, 12, 15313, 11, 51828], "temperature": 0.0, "avg_logprob": -0.19278801144577387, + "compression_ratio": 1.7153024911032029, "no_speech_prob": 0.007601321674883366}, + {"id": 342, "seek": 217320, "start": 2173.2, "end": 2181.2, "text": " send me an + email, grabgetpanko.io. 
But actually, we want to make it very easy for people to + start", "tokens": [50364, 2845, 385, 364, 3796, 11, 4444, 847, 79, 657, 78, 13, + 1004, 13, 583, 767, 11, 321, 528, 281, 652, 309, 588, 1858, 337, 561, 281, 722, + 50764], "temperature": 0.0, "avg_logprob": -0.15735796585823725, "compression_ratio": + 1.7180616740088106, "no_speech_prob": 0.0009715275373309851}, {"id": 343, "seek": + 217320, "start": 2181.2, "end": 2187.8399999999997, "text": " and experiment with. + And so you can go to panko.io slash start and create a free account. And for", "tokens": + [50764, 293, 5120, 365, 13, 400, 370, 291, 393, 352, 281, 280, 657, 78, 13, 1004, + 17330, 722, 293, 1884, 257, 1737, 2696, 13, 400, 337, 51096], "temperature": 0.0, + "avg_logprob": -0.15735796585823725, "compression_ratio": 1.7180616740088106, "no_speech_prob": + 0.0009715275373309851}, {"id": 344, "seek": 217320, "start": 2187.8399999999997, + "end": 2197.04, "text": " small workloads, it''s actually free to use. You get one + pod, which is enough for, definitely enough", "tokens": [51096, 1359, 32452, 11, + 309, 311, 767, 1737, 281, 764, 13, 509, 483, 472, 2497, 11, 597, 307, 1547, 337, + 11, 2138, 1547, 51556], "temperature": 0.0, "avg_logprob": -0.15735796585823725, + "compression_ratio": 1.7180616740088106, "no_speech_prob": 0.0009715275373309851}, + {"id": 345, "seek": 217320, "start": 2197.04, "end": 2201.7599999999998, "text": + " for experimenting. And if you have a small workload, it''s enough for your production + use case.", "tokens": [51556, 337, 29070, 13, 400, 498, 291, 362, 257, 1359, 20139, + 11, 309, 311, 1547, 337, 428, 4265, 764, 1389, 13, 51792], "temperature": 0.0, "avg_logprob": + -0.15735796585823725, "compression_ratio": 1.7180616740088106, "no_speech_prob": + 0.0009715275373309851}, {"id": 346, "seek": 220320, "start": 2203.2, "end": 2207.12, + "text": " That''s the easiest and fastest way to sign up. 
You don''t have to talk + to anyone.", "tokens": [50364, 663, 311, 264, 12889, 293, 14573, 636, 281, 1465, + 493, 13, 509, 500, 380, 362, 281, 751, 281, 2878, 13, 50560], "temperature": 0.0, + "avg_logprob": -0.12020600788177006, "compression_ratio": 1.4514285714285715, "no_speech_prob": + 0.0009048613719642162}, {"id": 347, "seek": 220320, "start": 2209.3599999999997, + "end": 2218.08, "text": " If you need custom deployment configurations, like certain + availability zones, or", "tokens": [50672, 759, 291, 643, 2375, 19317, 31493, 11, + 411, 1629, 17945, 16025, 11, 420, 51108], "temperature": 0.0, "avg_logprob": -0.12020600788177006, + "compression_ratio": 1.4514285714285715, "no_speech_prob": 0.0009048613719642162}, + {"id": 348, "seek": 220320, "start": 2220.16, "end": 2225.6, "text": " anything + else, you can send me an email or you can use a contact form on our site and we''ll", + "tokens": [51212, 1340, 1646, 11, 291, 393, 2845, 385, 364, 3796, 420, 291, 393, + 764, 257, 3385, 1254, 322, 527, 3621, 293, 321, 603, 51484], "temperature": 0.0, + "avg_logprob": -0.12020600788177006, "compression_ratio": 1.4514285714285715, "no_speech_prob": + 0.0009048613719642162}, {"id": 349, "seek": 222560, "start": 2226.3199999999997, + "end": 2229.52, "text": " get you set up. And it''s almost as quick. We just have + to", "tokens": [50400, 483, 291, 992, 493, 13, 400, 309, 311, 1920, 382, 1702, 13, + 492, 445, 362, 281, 50560], "temperature": 0.0, "avg_logprob": -0.13275982471222572, + "compression_ratio": 1.555023923444976, "no_speech_prob": 0.008896099403500557}, + {"id": 350, "seek": 222560, "start": 2232.24, "end": 2238.56, "text": " set up some + configurations. But we want to help you get to production. 
And that means", "tokens": + [50696, 992, 493, 512, 31493, 13, 583, 321, 528, 281, 854, 291, 483, 281, 4265, + 13, 400, 300, 1355, 51012], "temperature": 0.0, "avg_logprob": -0.13275982471222572, + "compression_ratio": 1.555023923444976, "no_speech_prob": 0.008896099403500557}, + {"id": 351, "seek": 222560, "start": 2239.04, "end": 2243.36, "text": " not standing + in your way. So that''s the best way to do it. Go to panko.io slash start.", "tokens": + [51036, 406, 4877, 294, 428, 636, 13, 407, 300, 311, 264, 1151, 636, 281, 360, 309, + 13, 1037, 281, 280, 657, 78, 13, 1004, 17330, 722, 13, 51252], "temperature": 0.0, + "avg_logprob": -0.13275982471222572, "compression_ratio": 1.555023923444976, "no_speech_prob": + 0.008896099403500557}, {"id": 352, "seek": 222560, "start": 2244.4, "end": 2249.36, + "text": " Awesome. And we''ll make sure to link that in the notes as well. And you + said, what do you mean", "tokens": [51304, 10391, 13, 400, 321, 603, 652, 988, 281, + 2113, 300, 294, 264, 5570, 382, 731, 13, 400, 291, 848, 11, 437, 360, 291, 914, + 51552], "temperature": 0.0, "avg_logprob": -0.13275982471222572, "compression_ratio": + 1.555023923444976, "no_speech_prob": 0.008896099403500557}, {"id": 353, "seek": + 224936, "start": 2249.44, "end": 2256.0, "text": " Kubernetes, right? Kubernetes + pod. Oh yeah. I mean, we didn''t touch on this in this", "tokens": [50368, 23145, + 11, 558, 30, 23145, 2497, 13, 876, 1338, 13, 286, 914, 11, 321, 994, 380, 2557, + 322, 341, 294, 341, 50696], "temperature": 0.0, "avg_logprob": -0.2332640730816385, + "compression_ratio": 1.6150442477876106, "no_speech_prob": 0.02308342233300209}, + {"id": 354, "seek": 224936, "start": 2256.7200000000003, "end": 2262.6400000000003, + "text": " in this episode, but obviously you guys are scaling with Kubernetes. 
So + you''re also modern on", "tokens": [50732, 294, 341, 3500, 11, 457, 2745, 291, 1074, + 366, 21589, 365, 23145, 13, 407, 291, 434, 611, 4363, 322, 51028], "temperature": + 0.0, "avg_logprob": -0.2332640730816385, "compression_ratio": 1.6150442477876106, + "no_speech_prob": 0.02308342233300209}, {"id": 355, "seek": 224936, "start": 2262.6400000000003, + "end": 2270.08, "text": " that site as well. Oh yeah, we, we, you know, I should + have mentioned this when you asked about", "tokens": [51028, 300, 3621, 382, 731, + 13, 876, 1338, 11, 321, 11, 321, 11, 291, 458, 11, 286, 820, 362, 2835, 341, 562, + 291, 2351, 466, 51400], "temperature": 0.0, "avg_logprob": -0.2332640730816385, + "compression_ratio": 1.6150442477876106, "no_speech_prob": 0.02308342233300209}, + {"id": 356, "seek": 224936, "start": 2270.08, "end": 2278.08, "text": " the inner + workings. But yeah, we''re using Kubernetes to make the whole service horizontally", + "tokens": [51400, 264, 7284, 589, 1109, 13, 583, 1338, 11, 321, 434, 1228, 23145, + 281, 652, 264, 1379, 2643, 33796, 51800], "temperature": 0.0, "avg_logprob": -0.2332640730816385, + "compression_ratio": 1.6150442477876106, "no_speech_prob": 0.02308342233300209}, + {"id": 357, "seek": 227808, "start": 2278.08, "end": 2283.04, "text": " scalable. + And of course, the total managed on our side. So you don''t have to know anything + about", "tokens": [50364, 38481, 13, 400, 295, 1164, 11, 264, 3217, 6453, 322, 527, + 1252, 13, 407, 291, 500, 380, 362, 281, 458, 1340, 466, 50612], "temperature": 0.0, + "avg_logprob": -0.2074084886362855, "compression_ratio": 1.4795918367346939, "no_speech_prob": + 0.0007838575402274728}, {"id": 358, "seek": 227808, "start": 2283.04, "end": 2293.52, + "text": " containers or Kubernetes or, or worry about any of it. 
But I mean, Kafka + for streaming to support", "tokens": [50612, 17089, 420, 23145, 420, 11, 420, 3292, + 466, 604, 295, 309, 13, 583, 286, 914, 11, 47064, 337, 11791, 281, 1406, 51136], + "temperature": 0.0, "avg_logprob": -0.2074084886362855, "compression_ratio": 1.4795918367346939, + "no_speech_prob": 0.0007838575402274728}, {"id": 359, "seek": 227808, "start": 2293.52, + "end": 2304.16, "text": " streaming index updates or batch, batch updates. There + are load balancers that are API gateways", "tokens": [51136, 11791, 8186, 9205, + 420, 15245, 11, 15245, 9205, 13, 821, 366, 3677, 3119, 4463, 433, 300, 366, 9362, + 8539, 942, 51668], "temperature": 0.0, "avg_logprob": -0.2074084886362855, "compression_ratio": + 1.4795918367346939, "no_speech_prob": 0.0007838575402274728}, {"id": 360, "seek": + 230416, "start": 2304.24, "end": 2308.16, "text": " that are just a bunch of different. + There''s a key value store under the hood.", "tokens": [50368, 300, 366, 445, 257, + 3840, 295, 819, 13, 821, 311, 257, 2141, 2158, 3531, 833, 264, 13376, 13, 50564], + "temperature": 0.0, "avg_logprob": -0.16671541574839, "compression_ratio": 1.603846153846154, + "no_speech_prob": 0.011620491743087769}, {"id": 361, "seek": 230416, "start": 2310.48, + "end": 2314.48, "text": " If you want to see the architecture, you can cut our docs + and learn a bit more about it. But again,", "tokens": [50680, 759, 291, 528, 281, + 536, 264, 9482, 11, 291, 393, 1723, 527, 45623, 293, 1466, 257, 857, 544, 466, 309, + 13, 583, 797, 11, 50880], "temperature": 0.0, "avg_logprob": -0.16671541574839, + "compression_ratio": 1.603846153846154, "no_speech_prob": 0.011620491743087769}, + {"id": 362, "seek": 230416, "start": 2314.48, "end": 2318.3199999999997, "text": + " you don''t have to know anything about it. And that''s the point. 
We make it a,", + "tokens": [50880, 291, 500, 380, 362, 281, 458, 1340, 466, 309, 13, 400, 300, 311, + 264, 935, 13, 492, 652, 309, 257, 11, 51072], "temperature": 0.0, "avg_logprob": + -0.16671541574839, "compression_ratio": 1.603846153846154, "no_speech_prob": 0.011620491743087769}, + {"id": 363, "seek": 230416, "start": 2320.24, "end": 2326.8799999999997, "text": + " you just make your API calls and, and get your results and do something with those + results.", "tokens": [51168, 291, 445, 652, 428, 9362, 5498, 293, 11, 293, 483, + 428, 3542, 293, 360, 746, 365, 729, 3542, 13, 51500], "temperature": 0.0, "avg_logprob": + -0.16671541574839, "compression_ratio": 1.603846153846154, "no_speech_prob": 0.011620491743087769}, + {"id": 364, "seek": 230416, "start": 2326.8799999999997, "end": 2331.52, "text": + " Yeah, exactly. Fantastic. And by the way, are you planning to kind of", "tokens": + [51500, 865, 11, 2293, 13, 21320, 13, 400, 538, 264, 636, 11, 366, 291, 5038, 281, + 733, 295, 51732], "temperature": 0.0, "avg_logprob": -0.16671541574839, "compression_ratio": + 1.603846153846154, "no_speech_prob": 0.011620491743087769}, {"id": 365, "seek": + 233152, "start": 2332.48, "end": 2338.0, "text": " at some point, maybe open source, + or actually implement some things for the public to send the", "tokens": [50412, + 412, 512, 935, 11, 1310, 1269, 4009, 11, 420, 767, 4445, 512, 721, 337, 264, 1908, + 281, 2845, 264, 50688], "temperature": 0.0, "avg_logprob": -0.1901295848728455, + "compression_ratio": 1.6260504201680672, "no_speech_prob": 0.003159665036946535}, + {"id": 366, "seek": 233152, "start": 2338.0, "end": 2342.32, "text": " data in? + Well, do you think it''s not a problem at all? 
You know, kind of some kind of connector", + "tokens": [50688, 1412, 294, 30, 1042, 11, 360, 291, 519, 309, 311, 406, 257, 1154, + 412, 439, 30, 509, 458, 11, 733, 295, 512, 733, 295, 19127, 50904], "temperature": + 0.0, "avg_logprob": -0.1901295848728455, "compression_ratio": 1.6260504201680672, + "no_speech_prob": 0.003159665036946535}, {"id": 367, "seek": 233152, "start": 2342.32, + "end": 2348.8, "text": " called some kind of gluing code to the pinecon on the side + of integration, right? So I guess", "tokens": [50904, 1219, 512, 733, 295, 1563, + 9635, 3089, 281, 264, 15113, 1671, 322, 264, 1252, 295, 10980, 11, 558, 30, 407, + 286, 2041, 51228], "temperature": 0.0, "avg_logprob": -0.1901295848728455, "compression_ratio": + 1.6260504201680672, "no_speech_prob": 0.003159665036946535}, {"id": 368, "seek": + 233152, "start": 2348.8, "end": 2353.52, "text": " obviously clients will still + look at how do they plug in pinecon in the right part of the architecture.", "tokens": + [51228, 2745, 6982, 486, 920, 574, 412, 577, 360, 436, 5452, 294, 15113, 1671, 294, + 264, 558, 644, 295, 264, 9482, 13, 51464], "temperature": 0.0, "avg_logprob": -0.1901295848728455, + "compression_ratio": 1.6260504201680672, "no_speech_prob": 0.003159665036946535}, + {"id": 369, "seek": 235352, "start": 2354.4, "end": 2364.96, "text": " Yeah, we''re + thinking a lot about that. We''re looking at what are the most common data sources.", + "tokens": [50408, 865, 11, 321, 434, 1953, 257, 688, 466, 300, 13, 492, 434, 1237, + 412, 437, 366, 264, 881, 2689, 1412, 7139, 13, 50936], "temperature": 0.0, "avg_logprob": + -0.2683268422665803, "compression_ratio": 1.373015873015873, "no_speech_prob": 0.0046066101640462875}, + {"id": 370, "seek": 235352, "start": 2366.72, "end": 2376.4, "text": " What is typical + usage look like? And what''s the trickiest part for people? 
And", "tokens": [51024, + 708, 307, 7476, 14924, 574, 411, 30, 400, 437, 311, 264, 4282, 6495, 644, 337, 561, + 30, 400, 51508], "temperature": 0.0, "avg_logprob": -0.2683268422665803, "compression_ratio": + 1.373015873015873, "no_speech_prob": 0.0046066101640462875}, {"id": 371, "seek": + 237640, "start": 2376.64, "end": 2385.92, "text": " we are thinking about how to + make the trickiest part, parts easiest, as many people as possible.", "tokens": + [50376, 321, 366, 1953, 466, 577, 281, 652, 264, 4282, 6495, 644, 11, 3166, 12889, + 11, 382, 867, 561, 382, 1944, 13, 50840], "temperature": 0.0, "avg_logprob": -0.16395123799641928, + "compression_ratio": 1.583673469387755, "no_speech_prob": 0.0027856784872710705}, + {"id": 372, "seek": 237640, "start": 2386.8, "end": 2394.0, "text": " So can''t + say much more than that, but certainly we''ll have some common use cases covered + soon.", "tokens": [50884, 407, 393, 380, 584, 709, 544, 813, 300, 11, 457, 3297, + 321, 603, 362, 512, 2689, 764, 3331, 5343, 2321, 13, 51244], "temperature": 0.0, + "avg_logprob": -0.16395123799641928, "compression_ratio": 1.583673469387755, "no_speech_prob": + 0.0027856784872710705}, {"id": 373, "seek": 237640, "start": 2394.8, "end": 2400.2400000000002, + "text": " Yeah, yeah, sure. I mean, that''s so important. I mean, a lot of things + like in machine learning,", "tokens": [51284, 865, 11, 1338, 11, 988, 13, 286, 914, + 11, 300, 311, 370, 1021, 13, 286, 914, 11, 257, 688, 295, 721, 411, 294, 3479, 2539, + 11, 51556], "temperature": 0.0, "avg_logprob": -0.16395123799641928, "compression_ratio": + 1.583673469387755, "no_speech_prob": 0.0027856784872710705}, {"id": 374, "seek": + 237640, "start": 2400.2400000000002, "end": 2405.6, "text": " you know, that like + 80% goes to data collection and cleaning. 
And then in the end, you plug in some", + "tokens": [51556, 291, 458, 11, 300, 411, 4688, 4, 1709, 281, 1412, 5765, 293, 8924, + 13, 400, 550, 294, 264, 917, 11, 291, 5452, 294, 512, 51824], "temperature": 0.0, + "avg_logprob": -0.16395123799641928, "compression_ratio": 1.583673469387755, "no_speech_prob": + 0.0027856784872710705}, {"id": 375, "seek": 240560, "start": 2405.6, "end": 2411.2799999999997, + "text": " water, like, ooh, I sold the task, right? And the same kind of goes to + the trying databases or,", "tokens": [50364, 1281, 11, 411, 11, 17024, 11, 286, + 3718, 264, 5633, 11, 558, 30, 400, 264, 912, 733, 295, 1709, 281, 264, 1382, 22380, + 420, 11, 50648], "temperature": 0.0, "avg_logprob": -0.23021762742908722, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0008808431448414922}, {"id": 376, "seek": + 240560, "start": 2411.2799999999997, "end": 2416.7999999999997, "text": " you know, + software like, okay, how do I plug this in? And days go by and you''re still figuring + things", "tokens": [50648, 291, 458, 11, 4722, 411, 11, 1392, 11, 577, 360, 286, + 5452, 341, 294, 30, 400, 1708, 352, 538, 293, 291, 434, 920, 15213, 721, 50924], + "temperature": 0.0, "avg_logprob": -0.23021762742908722, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.0008808431448414922}, {"id": 377, "seek": 240560, "start": 2416.7999999999997, + "end": 2422.88, "text": " out. So I think that''s the, that''s something to address. 
+ And I guess you guys are doing that, right?", "tokens": [50924, 484, 13, 407, 286, + 519, 300, 311, 264, 11, 300, 311, 746, 281, 2985, 13, 400, 286, 2041, 291, 1074, + 366, 884, 300, 11, 558, 30, 51228], "temperature": 0.0, "avg_logprob": -0.23021762742908722, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0008808431448414922}, + {"id": 378, "seek": 240560, "start": 2424.24, "end": 2431.04, "text": " Yeah, yeah, + we definitely, and also a lot of people, you know, we expect people to keep their", + "tokens": [51296, 865, 11, 1338, 11, 321, 2138, 11, 293, 611, 257, 688, 295, 561, + 11, 291, 458, 11, 321, 2066, 561, 281, 1066, 641, 51636], "temperature": 0.0, "avg_logprob": + -0.23021762742908722, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.0008808431448414922}, {"id": 379, "seek": 243104, "start": 2431.04, "end": 2437.52, + "text": " data warehouse and their document store, you know, because we are, we + are your infected database. We''re", "tokens": [50364, 1412, 22244, 293, 641, 4166, + 3531, 11, 291, 458, 11, 570, 321, 366, 11, 321, 366, 428, 15414, 8149, 13, 492, + 434, 50688], "temperature": 0.0, "avg_logprob": -0.22446534253548886, "compression_ratio": + 1.5935828877005347, "no_speech_prob": 0.0006363232969306409}, {"id": 380, "seek": + 243104, "start": 2437.52, "end": 2447.44, "text": " not your blob storage or document + storage. So a lot of people use PINCO and alongside the data warehouse", "tokens": + [50688, 406, 428, 46115, 6725, 420, 4166, 6725, 13, 407, 257, 688, 295, 561, 764, + 430, 1464, 12322, 293, 12385, 264, 1412, 22244, 51184], "temperature": 0.0, "avg_logprob": + -0.22446534253548886, "compression_ratio": 1.5935828877005347, "no_speech_prob": + 0.0006363232969306409}, {"id": 381, "seek": 243104, "start": 2448.96, "end": 2456.08, + "text": " or some other database. 
And the easier and more seamless connections are + between the two,", "tokens": [51260, 420, 512, 661, 8149, 13, 400, 264, 3571, 293, + 544, 28677, 9271, 366, 1296, 264, 732, 11, 51616], "temperature": 0.0, "avg_logprob": + -0.22446534253548886, "compression_ratio": 1.5935828877005347, "no_speech_prob": + 0.0006363232969306409}, {"id": 382, "seek": 245608, "start": 2456.56, "end": 2461.36, + "text": " the easier it is to get factory search into production. So that''s what + we''re thinking about.", "tokens": [50388, 264, 3571, 309, 307, 281, 483, 9265, + 3164, 666, 4265, 13, 407, 300, 311, 437, 321, 434, 1953, 466, 13, 50628], "temperature": + 0.0, "avg_logprob": -0.25223926197398794, "compression_ratio": 1.5737051792828685, + "no_speech_prob": 0.00350744160823524}, {"id": 383, "seek": 245608, "start": 2462.0, + "end": 2468.4, "text": " Yeah, sounds great. Thanks. And yeah, I think we can wrap + up like I really enjoyed talking to you, Greg.", "tokens": [50660, 865, 11, 3263, + 869, 13, 2561, 13, 400, 1338, 11, 286, 519, 321, 393, 7019, 493, 411, 286, 534, + 4626, 1417, 281, 291, 11, 11490, 13, 50980], "temperature": 0.0, "avg_logprob": + -0.25223926197398794, "compression_ratio": 1.5737051792828685, "no_speech_prob": + 0.00350744160823524}, {"id": 384, "seek": 245608, "start": 2468.4, "end": 2474.96, + "text": " I mean, your, your T-shirt is the best. I, once I get it, I will wear + it as well. And I''ll be", "tokens": [50980, 286, 914, 11, 428, 11, 428, 314, 12, + 15313, 307, 264, 1151, 13, 286, 11, 1564, 286, 483, 309, 11, 286, 486, 3728, 309, + 382, 731, 13, 400, 286, 603, 312, 51308], "temperature": 0.0, "avg_logprob": -0.25223926197398794, + "compression_ratio": 1.5737051792828685, "no_speech_prob": 0.00350744160823524}, + {"id": 385, "seek": 245608, "start": 2474.96, "end": 2481.6, "text": " compatible + with the researcher. So thanks so much for your thoughts. I mean, this was super + deep. 
But I", "tokens": [51308, 18218, 365, 264, 21751, 13, 407, 3231, 370, 709, + 337, 428, 4598, 13, 286, 914, 11, 341, 390, 1687, 2452, 13, 583, 286, 51640], "temperature": + 0.0, "avg_logprob": -0.25223926197398794, "compression_ratio": 1.5737051792828685, + "no_speech_prob": 0.00350744160823524}, {"id": 386, "seek": 248160, "start": 2481.6, + "end": 2488.08, "text": " mean, also you shared some of your personal kind of, you + know, attitude and aspirations in this area.", "tokens": [50364, 914, 11, 611, 291, + 5507, 512, 295, 428, 2973, 733, 295, 11, 291, 458, 11, 10157, 293, 32458, 294, 341, + 1859, 13, 50688], "temperature": 0.0, "avg_logprob": -0.16516026910745873, "compression_ratio": + 1.5680933852140078, "no_speech_prob": 0.006213393993675709}, {"id": 387, "seek": + 248160, "start": 2488.08, "end": 2494.7999999999997, "text": " It''s still emerging, + but I mean, it''s great to see you guys at the forefront of it. And I hope to", + "tokens": [50688, 467, 311, 920, 14989, 11, 457, 286, 914, 11, 309, 311, 869, 281, + 536, 291, 1074, 412, 264, 27287, 295, 309, 13, 400, 286, 1454, 281, 51024], "temperature": + 0.0, "avg_logprob": -0.16516026910745873, "compression_ratio": 1.5680933852140078, + "no_speech_prob": 0.006213393993675709}, {"id": 388, "seek": 248160, "start": 2494.7999999999997, + "end": 2500.64, "text": " hear more. And just last question, where our listeners + can follow you or maybe like Twitter or LinkedIn,", "tokens": [51024, 1568, 544, + 13, 400, 445, 1036, 1168, 11, 689, 527, 23274, 393, 1524, 291, 420, 1310, 411, 5794, + 420, 20657, 11, 51316], "temperature": 0.0, "avg_logprob": -0.16516026910745873, + "compression_ratio": 1.5680933852140078, "no_speech_prob": 0.006213393993675709}, + {"id": 389, "seek": 248160, "start": 2500.64, "end": 2508.88, "text": " where are + you kind of publicly available? 
So for PINCO, you, on, on, we publish a lot of things + on", "tokens": [51316, 689, 366, 291, 733, 295, 14843, 2435, 30, 407, 337, 430, + 1464, 12322, 11, 291, 11, 322, 11, 322, 11, 321, 11374, 257, 688, 295, 721, 322, + 51728], "temperature": 0.0, "avg_logprob": -0.16516026910745873, "compression_ratio": + 1.5680933852140078, "no_speech_prob": 0.006213393993675709}, {"id": 390, "seek": + 250888, "start": 2508.88, "end": 2515.84, "text": " our website. So you can go to + PINCO.io and at the bottom, you can subscribe for email updates. And", "tokens": + [50364, 527, 3144, 13, 407, 291, 393, 352, 281, 430, 1464, 12322, 13, 1004, 293, + 412, 264, 2767, 11, 291, 393, 3022, 337, 3796, 9205, 13, 400, 50712], "temperature": + 0.0, "avg_logprob": -0.15501789930390147, "compression_ratio": 1.4326923076923077, + "no_speech_prob": 0.03184689208865166}, {"id": 391, "seek": 250888, "start": 2515.84, + "end": 2520.1600000000003, "text": " you get, you know, all these face articles + and things like that. You heard about, you''ll get them", "tokens": [50712, 291, + 483, 11, 291, 458, 11, 439, 613, 1851, 11290, 293, 721, 411, 300, 13, 509, 2198, + 466, 11, 291, 603, 483, 552, 50928], "temperature": 0.0, "avg_logprob": -0.15501789930390147, + "compression_ratio": 1.4326923076923077, "no_speech_prob": 0.03184689208865166}, + {"id": 392, "seek": 250888, "start": 2520.1600000000003, "end": 2527.36, "text": + " in your inbox on Twitter. We''re at PINCO and underscore IO. On LinkedIn, we also + have a big following", "tokens": [50928, 294, 428, 35067, 322, 5794, 13, 492, 434, + 412, 430, 1464, 12322, 293, 37556, 39839, 13, 1282, 20657, 11, 321, 611, 362, 257, + 955, 3480, 51288], "temperature": 0.0, "avg_logprob": -0.15501789930390147, "compression_ratio": + 1.4326923076923077, "no_speech_prob": 0.03184689208865166}, {"id": 393, "seek": + 252736, "start": 2528.32, "end": 2539.36, "text": " there. Me personally, I''m at + Gregory underscore Kogen. 
Gregory is GRI, GOR, IY underscore K-O-G-A-N.", "tokens": + [50412, 456, 13, 1923, 5665, 11, 286, 478, 412, 11490, 827, 37556, 591, 8799, 13, + 11490, 827, 307, 460, 5577, 11, 460, 2483, 11, 286, 56, 37556, 591, 12, 46, 12, + 38, 12, 32, 12, 45, 13, 50964], "temperature": 0.0, "avg_logprob": -0.3060902815598708, + "compression_ratio": 1.3175675675675675, "no_speech_prob": 0.02356462925672531}, + {"id": 394, "seek": 252736, "start": 2542.48, "end": 2547.6, "text": " But a lot + of things I post there are PINCO and related because that''s what I think about + a lot", "tokens": [51120, 583, 257, 688, 295, 721, 286, 2183, 456, 366, 430, 1464, + 12322, 293, 4077, 570, 300, 311, 437, 286, 519, 466, 257, 688, 51376], "temperature": + 0.0, "avg_logprob": -0.3060902815598708, "compression_ratio": 1.3175675675675675, + "no_speech_prob": 0.02356462925672531}, {"id": 395, "seek": 254760, "start": 2547.68, + "end": 2558.88, "text": " these days. And I''ll also add that big credit to you + for also leading the way with, with doing a", "tokens": [50368, 613, 1708, 13, 400, + 286, 603, 611, 909, 300, 955, 5397, 281, 291, 337, 611, 5775, 264, 636, 365, 11, + 365, 884, 257, 50928], "temperature": 0.0, "avg_logprob": -0.17860854012625557, + "compression_ratio": 1.5139664804469273, "no_speech_prob": 0.007292834110558033}, + {"id": 396, "seek": 254760, "start": 2558.88, "end": 2568.48, "text": " podcast + like this. Yeah, it''s exciting to see more people learn about this, about Factor + Search,", "tokens": [50928, 7367, 411, 341, 13, 865, 11, 309, 311, 4670, 281, 536, + 544, 561, 1466, 466, 341, 11, 466, 479, 15104, 17180, 11, 51408], "temperature": + 0.0, "avg_logprob": -0.17860854012625557, "compression_ratio": 1.5139664804469273, + "no_speech_prob": 0.007292834110558033}, {"id": 397, "seek": 254760, "start": 2568.48, + "end": 2574.3199999999997, "text": " and start thinking about it and implementing + it. 
And a lot of it is thanks to", "tokens": [51408, 293, 722, 1953, 466, 309, 293, + 18114, 309, 13, 400, 257, 688, 295, 309, 307, 3231, 281, 51700], "temperature": + 0.0, "avg_logprob": -0.17860854012625557, "compression_ratio": 1.5139664804469273, + "no_speech_prob": 0.007292834110558033}, {"id": 398, "seek": 257432, "start": 2574.8, + "end": 2582.2400000000002, "text": " evangelists like you who put in the work to + do that. So thanks to you. Glad to hear that,", "tokens": [50388, 24546, 1751, 411, + 291, 567, 829, 294, 264, 589, 281, 360, 300, 13, 407, 3231, 281, 291, 13, 28301, + 281, 1568, 300, 11, 50760], "temperature": 0.0, "avg_logprob": -0.21562442582907135, + "compression_ratio": 1.6311111111111112, "no_speech_prob": 0.018412329256534576}, + {"id": 399, "seek": 257432, "start": 2582.2400000000002, "end": 2587.1200000000003, + "text": " right? Thanks so much. And actually, I must say that I''m educating myself + equally on this journey.", "tokens": [50760, 558, 30, 2561, 370, 709, 13, 400, 767, + 11, 286, 1633, 584, 300, 286, 478, 28835, 2059, 12309, 322, 341, 4671, 13, 51004], + "temperature": 0.0, "avg_logprob": -0.21562442582907135, "compression_ratio": 1.6311111111111112, + "no_speech_prob": 0.018412329256534576}, {"id": 400, "seek": 257432, "start": 2587.1200000000003, + "end": 2594.0, "text": " So hopefully as part of this journey, you know, the listeners + and the readers can, can", "tokens": [51004, 407, 4696, 382, 644, 295, 341, 4671, + 11, 291, 458, 11, 264, 23274, 293, 264, 17147, 393, 11, 393, 51348], "temperature": + 0.0, "avg_logprob": -0.21562442582907135, "compression_ratio": 1.6311111111111112, + "no_speech_prob": 0.018412329256534576}, {"id": 401, "seek": 257432, "start": 2594.0, + "end": 2601.6000000000004, "text": " educate as well. So in the end, you know, value + increases by doing these things. 
So that''s,", "tokens": [51348, 16092, 382, 731, 13, 407, 294, 264, 917, 11, 291, 458, 11, 2158, 8637, 538, 884, 613, 721, 13, 407, 300, 311, 11, 51728], "temperature": 0.0, "avg_logprob": -0.21562442582907135, "compression_ratio": 1.6311111111111112, "no_speech_prob": 0.018412329256534576}, {"id": 402, "seek": 260160, "start": 2601.6, "end": 2608.7999999999997, "text": " that''s what drives me here. So thanks so much for joining this show. Yeah, I hope we can record", "tokens": [50364, 300, 311, 437, 11754, 385, 510, 13, 407, 3231, 370, 709, 337, 5549, 341, 855, 13, 865, 11, 286, 1454, 321, 393, 2136, 50724], "temperature": 0.0, "avg_logprob": -0.2604670891394982, "compression_ratio": 1.3541666666666667, "no_speech_prob": 0.009526950307190418}, {"id": 403, "seek": 260160, "start": 2608.7999999999997, "end": 2617.52, "text": " another one at some time down the road. Yeah, that would be awesome. Awesome. Thanks Greg. Bye bye.", "tokens": [50724, 1071, 472, 412, 512, 565, 760, 264, 3060, 13, 865, 11, 300, 576, 312, 3476, 13, 10391, 13, 2561, 11490, 13, 4621, 6543, 13, 51160], "temperature": 0.0, "avg_logprob": -0.2604670891394982, "compression_ratio": 1.3541666666666667, "no_speech_prob": 0.009526950307190418}]'
---

Hello everyone, and welcome to the podcast. Today I have Greg Kogan, who is in charge of marketing at Pinecone. We'll dive into Pinecone, and maybe Greg will give us some highlights as well. Hi, Greg.
Hi, thanks for having me.
Awesome, thanks for joining. I was thinking maybe you could introduce yourself to our audience, because I was personally quite impressed by how technical you are. Even though you're in charge of marketing, your lingo is very technical. So do you have a technical background?
Yeah, I actually have a degree in naval architecture. It's an engineering degree, and that was my career for three years.
I did systems engineering, mechanical engineering, electrical engineering, and so on.
While I was doing that, I was also moonlighting as a web developer. I taught myself PHP and things like that, and I was reading about startups, and eventually it became clear that I should make my day job related to startups.
So I left my engineering career and went to work with startups on marketing, and I fell in love with it. That was about nine years ago. For the past eight years I was consulting and advising technical startups on marketing.
I loved it, because I was able to use my engineering thinking and get along well with technical founders, and the coding foundation I had allowed me to get a grasp of what the products actually do.
Last year I joined Pinecone as the VP of marketing, and the engineering background certainly helps. We have a technical product and technical users, and really everyone at the company has a very technical background. Even our director of product has a PhD in electrical engineering, just to give you a sense.
Wow, that's impressive. You mentioned PHP; actually, that was one of the first languages I learned to code in, beside Perl. But yeah, these days...
I almost slow down before I tell people I learned PHP, because I know there's a bit of a stigma around it. It was messy and, you know, not as pristine or as fancy as something else, but it got the job done, and with that foundation a lot of other things made a lot more sense.
Yeah, absolutely. I also enjoyed it. Actually, one of the first jobs I got was in PHP. I built a forum, and every class in the code started with "oops," and I asked the engineer, "Doesn't that mean OOP, like object-oriented programming?" And he said, "No, it just means oops." So he wasn't technical enough to know what OOP is.
But anyway, that was kind of funny. Yeah, that's cool. So basically you have the technical background, and you also know how to explain things. I think that's very important in our profession at large, and it sounds like you've been advancing in this area more and more, to the level of becoming VP of marketing, to be precise. That's awesome.
So tell me a bit more about Pinecone. What are you guys building? I know you've recently had a major upgrade of Pinecone; maybe you could highlight some of the improvements you made.
Sure. We're building a vector database that makes it very easy to build and deploy vector search in production applications. This is especially useful for semantic search and recommendation systems.
The founders saw that there are several ways of doing this, trying to emulate big companies like Facebook, Google, Microsoft, and Spotify. They all involved a lot of engineering work and a lot of infrastructure work and maintenance to actually make it run in production, whether you're a small startup with better things to focus on or a big tech company that also has better things to focus on, especially when supporting your search and recommender systems would require a big team of engineers.
So we recently announced Pinecone 2.0, and that's a major release that gets us closer to helping companies deploy this in production. One of the biggest things we've heard from users is that to get this into production, they need to emulate some of the traditional search features of the systems they're trying to replace, and that was specifically filtering. They wanted to have some control over the nearest-neighbor search results they were getting through Pinecone.
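The filtering Greg describes, constraining nearest-neighbor results by metadata, can be sketched in a few lines. This is a toy pre-filtering example in plain Python, not Pinecone's implementation; the item structure and the `predicate` argument are invented for illustration, and a real system would filter inside an approximate index rather than scanning everything.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def filtered_nearest(query, items, predicate, k=2):
    """Rank only the items whose metadata passes `predicate`."""
    candidates = [(item_id, vec) for item_id, vec, meta in items if predicate(meta)]
    ranked = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

items = [
    ("a", [1.0, 0.0], {"category": "shoes"}),
    ("b", [0.9, 0.1], {"category": "hats"}),   # close to the query, but filtered out
    ("c", [0.0, 1.0], {"category": "shoes"}),
]
print(filtered_nearest([1.0, 0.05], items,
                       lambda m: m["category"] == "shoes", k=2))  # -> ['a', 'c']
```

Note that item "b" is nearer to the query than "c", yet the filter excludes it — exactly the kind of control over results that users were asking for.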
Another thing was cost, since typically nearest-neighbor searches are done in memory. Companies with millions or billions of items, which are the types of companies that benefit most from Pinecone, were finding it prohibitively expensive to do vector search, not just on Pinecone but anywhere. For them, the barrier to getting into production wasn't a lack of engineering teams; it was just astronomical cost projections.
So for that we are releasing hybrid storage, which basically stores some data on disk and a smaller amount of data in memory. That reduces infrastructure costs by up to 10x, and we're passing that along to users: we still manage the infrastructure, but their costs go down as well. And there are some other things, like SOC 2 compliance, a totally new REST API and Python client, a console, and a bunch of other things as well.
There's even more I can't announce just yet, but our engineering team is growing and our development velocity is picking up, so we're going to have lots of new things to share very soon.
Yeah, that's fantastic, can't wait. And congrats on the 2.0 release. I just noticed your t-shirt says "love thy nearest neighbor." Wow, that's so relevant to this discussion.
We have lots more of these. Anyone can send me an email, greg@pinecone.io, and I'll send you a form to fill out to get one.
Oh, thanks, Greg. I'll gladly wear it. So that covers the value prop behind your product. The key element for me is that, as you said, you're reducing cost, and you provide a fully managed solution for vector search, so teams don't have to run around figuring out low-level things and can just get to business. That's great.
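The hybrid storage trade-off Greg mentions, a small amount of data in memory and the bulk on disk, can be illustrated with a toy two-stage search: a cheap compressed sketch of every vector stays in "memory" and is used to shortlist candidates, and only the few shortlisted full vectors are read from "disk" for an exact rerank. This is only an illustration of the memory/disk idea under invented names, not Pinecone's actual design.

```python
def compress(vec):
    # Crude 1-bit-per-dimension sketch kept in RAM.
    return tuple(1 if x >= 0 else 0 for x in vec)

disk = {}      # full vectors: large, cheap storage (a dict standing in for disk)
memory = {}    # compressed sketches: small, fast storage

def upsert(item_id, vec):
    disk[item_id] = vec
    memory[item_id] = compress(vec)

def query(vec, top_k=1, shortlist=2):
    sketch = compress(vec)
    # Stage 1 (memory only): shortlist by sketch agreement.
    def agree(s):
        return sum(a == b for a, b in zip(s, sketch))
    cands = sorted(memory, key=lambda i: agree(memory[i]), reverse=True)[:shortlist]
    # Stage 2 (only a few "disk" reads): exact rerank of the shortlist.
    def dist2(v):
        return sum((a - b) ** 2 for a, b in zip(v, vec))
    return sorted(cands, key=lambda i: dist2(disk[i]))[:top_k]

upsert("a", [0.5, -0.5])
upsert("b", [0.4, 0.4])
upsert("c", [-0.6, 0.6])
print(query([0.45, 0.42]))  # -> ['b']
```

The cost saving comes from stage 1 touching only the tiny in-memory sketches, so RAM scales with the sketch size rather than with the full vectors.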
So the next thing I wanted to ask you is more along the lines of how. There are different ways of implementing vector search, right, and there are different algorithms. There is ANN-Benchmarks, and there will be a Big ANN Benchmarks competition soon as well; for listeners, I'll share the link. What approaches did you consider for implementing your tech? I know some parts of it are proprietary, so maybe you cannot share too much detail, but maybe you can give us a clue about how you do things on the algorithmic side, and also speak to the product at large. For example, you mentioned SOC 2 compliance; it was very important for your customers, right? So that's also part of the "how."
Yeah, I'll be a little lighter on the technical side, because I'd rather point people to our docs and some of the articles and examples we have than say something that's imprecise from a technical standpoint.
Generally, we see three layers inside a vector search solution, or vector database. The lowest layer is the nearest-neighbor search algorithm, like Annoy or HNSW. Then there's an index library, which contains those algorithms; that's something like Faiss. And then there's a shell around that, which we're calling the vector database, that provides things like live index updates, CRUD operations on vectors, filtering, metadata storage, and things like that.
For the index, Pinecone does use Faiss for exact search — you can choose what sort of engine you're running — and a proprietary index for approximate search, which is obviously the bulk of use cases for us. We thought a lot about performance comparisons, maybe even open-sourcing that proprietary index so we can be included in ANN benchmarks.
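The outermost of the three layers Greg outlines can be pictured with a toy version: a tiny in-memory store that wraps a brute-force exact index with CRUD operations and live updates. This is a sketch of the layering idea only, with invented names, not Pinecone's architecture; a real database would swap the linear scan for an approximate index like HNSW and add filtering and metadata storage at this same layer.

```python
class TinyVectorStore:
    """A minimal "vector database shell": CRUD around an exact index."""

    def __init__(self):
        self._vectors = {}   # id -> vector

    def upsert(self, item_id, vector):
        # Insert or overwrite: updates are live, no rebuild step.
        self._vectors[item_id] = vector

    def delete(self, item_id):
        self._vectors.pop(item_id, None)

    def query(self, vector, top_k=1):
        # Exact search: scan everything, rank by squared Euclidean distance.
        def dist2(v):
            return sum((a - b) ** 2 for a, b in zip(v, vector))
        ranked = sorted(self._vectors.items(), key=lambda kv: dist2(kv[1]))
        return [item_id for item_id, _ in ranked[:top_k]]

store = TinyVectorStore()
store.upsert("a", [0.0, 0.0])
store.upsert("b", [1.0, 1.0])
store.upsert("c", [0.9, 1.1])
store.delete("b")                          # deletions take effect immediately
print(store.query([1.0, 1.0], top_k=1))    # -> ['c']
```

The exact-scan `query` here plays the role of a flat index; the class around it is the "shell" layer that applications actually talk to.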
While we were thinking about that, we learned from users that eking out slightly lower latencies or slightly higher recall from the index was not really what they were after. That's not where they were stuck. They were stuck on downstream things: horizontal scaling, adding features to an index, setting up the infrastructure and managing it. Since learning and validating that, we focus much more on those things, and we stayed with a proprietary index for approximate search. And sure enough, we find that even people who ask a lot about this before they sign up, once they start using it and it solves their use case, don't ask us about it again.
A lot of people come to us from some other search or recommendation system, and they're just looking for an easy way to run vector search in production. So that's one use case: just implement vector search in production. Or a lot of people come to us from the application side: they don't even know they want to use vector search, but they know they want to replace their keyword search with semantic search, or they want to implement image similarity search that works on fuzzy matches, or they want to do anomaly detection or classification and things like that. It really has as many applications as search and information retrieval in general.
A lot of people come to us for semantic search. They have their embedding models, like BERT or something like that, and they got it working in the lab — the data science team got semantic search working using embeddings. Now they're like, okay, engineering team or ML engineering team: how do we get this into our product? How do we keep latency below 200 milliseconds? How do we add filtering to give users control? And the ML engineering team goes out and finds, oh, we can do this with something like Pinecone.
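The handoff Greg describes, a data science team with a working embedding model and an engineering team that needs query-time nearest-neighbor lookup, reduces to a small loop. In this sketch the `embed()` function is a stand-in bag-of-words placeholder so the example runs anywhere; a real pipeline would call a model like BERT, and the names are hypothetical.

```python
def embed(text, vocab=("deploy", "search", "vector", "pizza")):
    # Placeholder "embedding model": word counts over a tiny vocabulary.
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = [
    "how to deploy vector search",
    "best pizza in town",
]
# Offline step (the "lab" part): embed every document once.
index = [(doc, embed(doc)) for doc in docs]

def semantic_search(query, top_k=1):
    # Online step (the production part): embed the query, rank by similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda item: dot(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

print(semantic_search("deploy search"))  # -> ['how to deploy vector search']
```

Everything the engineering questions above are about — latency budgets, filtering, scaling — lives inside that `semantic_search` step once the corpus is large, which is exactly the part teams hand off to a vector database.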
+So those are the typical use cases, I would say: semantic search, the most common, or somebody just coming because they're looking for vector search, regardless of what it's for. Yeah, from our perspective at Pinecone, +we don't care what your data is. If it's in an embedding format, you can index it and then you search through it. It works with any model, any initial data, and because we have a REST API, you can call it from anywhere. +So you can use it in a notebook, you can use it in a backend application. Yeah, we're seeing a lot of interesting use cases. Yeah, sounds great. That's actually a lot of use cases that you mentioned. +I mean, obviously it's search, but it could also be data science work that teams want to run. +If you take Faiss, for instance, many data science teams run large-scale experiments using the library, but that's the data science part, the exploration part. +The moment you want to put this out to prod, you'll face a bunch of low-level engineering concerns, like, oh, how do I do this, how do I do that? Reinventing the wheel isn't ever fun. +Well, sometimes it's fun if you have the time for it, but if you don't and you want to move faster, obviously you will want to use an existing solution for that. Yeah, we find that for the data science team, it's not their issue. +They need to develop the model and prove that the method works. It becomes the engineering team's issue, or the ML engineering team's issue. And yeah, they're often not exactly lacking things to do. +So, some organizations are all about focusing on the core product and trying to use managed services wherever possible. Others like to develop things in house and prefer to take open source as much as possible.
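Since Greg mentions that everything goes through a REST API callable from a notebook or a backend, here is a hedged sketch of what the request bodies for a vector-database upsert and query might look like. The field names (`vectors`, `id`, `values`, `top_k`) are illustrative assumptions, not a verified copy of Pinecone's schema; consult their API docs for the real shapes.

```python
import json

# Hypothetical request payloads for a generic vector-database REST API.
# Field names are assumptions for illustration, not an official schema.

def build_upsert(vectors):
    """Package (id, embedding) pairs into an HTTP upsert body."""
    return {"vectors": [{"id": vid, "values": emb} for vid, emb in vectors]}

def build_query(embedding, top_k=3):
    """Package a similarity query: return the top_k nearest stored vectors."""
    return {"vector": embedding, "top_k": top_k}

upsert_body = build_upsert([("item-1", [0.1, 0.9]), ("item-2", [0.8, 0.2])])
query_body = build_query([0.1, 0.8], top_k=1)
print(json.dumps(query_body))  # → {"vector": [0.1, 0.8], "top_k": 1}
```

The point of the discussion is that whether this JSON is sent from a notebook cell or a production backend, it is the same HTTP call.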
+So I think it depends on how you prioritize your focus, and on what your engineering culture at the company is. Yeah, absolutely. +And it sounds like you also address elements like SOC 2, and I believe you also have GDPR covered, at some point or already? +So we say we're GDPR friendly, which means, well, there's no official certification you can get for GDPR; you can just be following the regulations, able to make the proper disclosures, and able to act on requests for deletion and things like that. +These are the types of things, like the security aspects, that a data science team might not foresee when they're developing a vector search solution for some application. But when it goes to engineering, when you start talking about getting it into production, +and depending on the company, all these things come up. +Does it pass our security review? Does it pass our reliability requirements? Who's going to be on call if this thing goes down? All these things come up, and we worry about those things so that the users don't have to. +Yeah, that's a big benefit to the users, again, to focus on what matters to them. And by the way, so this doesn't come off as promotional: +anyone listening to this can treat it as a heads-up about what you should think about if you want to get vector search into production, even if you're using some other solution. These are things you should plan for, start thinking about, and leave time to do. +Yeah, absolutely. You don't want to be caught by surprise on those items, for sure. Yeah, that's awesome. By the way, I remember that you guys also made a bunch of blog posts on things like Faiss and LSH. I really like the way you did it.
+You know, it almost looks like a comic book, and you really get deep into these things. And I think you also shared the source code, like some notebooks. Is that right? Yeah, we publish articles on vector search, on Faiss, on +semantic search, on different techniques and things like that. A lot of them are done by the very talented James Briggs, I should give him a shout-out. We have a new one today about composite indexes and the index factory in Faiss. +We share code snippets and we have example notebooks for all of them. And yeah, we're very happy to see people like them. Even people who are not familiar with vector search will see it and it piques their interest, because engineers like to see how things work and learn new things. +And that's our goal. Some of them have almost nothing to do with Pinecone. We want more people to learn about vector search, to realize that they can use vector search to improve their current applications. +If we succeed in that, I think it'll certainly help us, but really everyone in this space. Yeah, absolutely. Those articles are really our jam: people are reading them, citing them, and discussing them on Slack and things like that. +And yeah, it sounds like you guys are also willing to share your knowledge with the community, even beyond customer interactions and so on, right? So that's awesome. Yeah. I think we are moving slowly to the third section of our podcast, which is the "why". +It's a little bit more philosophical, about the stance behind what you do and how you do it. I don't know if you've been reflecting on your journey; I know you said you joined Pinecone last year. +But I guess I'll start off by just asking you: what motivates you to be part of vector search development and this community so much?
Me personally, I've worked with over 40 startups when I was consulting. And when I met the founder of Pinecone +and learned about the product and about the space, I saw a familiar pattern, which caught my attention. The familiar pattern was from 2015, six years ago now, almost seven, when I started working with a then very small company called Domino Data Lab, which +is an MLOps platform; at the time we called that a data science platform. It's used by over 20% of the Fortune 500 companies. At the time it was a small team, and it was a product for data scientists, but nobody knew exactly what a data scientist was. +Few people called themselves that, even if they were doing data science work. A lot of data science work was done on just people's laptops. It was a very young area, let's say, not quite mature. There wasn't tooling for it, and so on. +And over time, over a few years, data science became, of course, a core function in many companies, just like engineering and marketing and customer support. And as that happened, having the right tooling for that function, +maturing the capabilities, and making sure everything data scientists build can run in production securely and reliably became more important. Of course, the companies that were solving those things were growing with that demand. +And so I wanted to be a part of that kind of journey again. And I saw in Pinecone a product that is pretty early in the space. I saw the vector database concept, and we had to spend a lot of time explaining to people what that means; they weren't getting it. +On the user side, you see many, many engineers doing ML engineering work who don't yet call themselves ML engineers. They're still titled software engineers, or they might be data scientists, but they're now working on production applications.
+And also we see that companies are struggling as they want to take vector search out of the lab and into production applications. They're running up against the same challenges: the technology they had available wasn't quite built for that, +for huge scale, and for being secure and reliable. So that's the environment. And yeah, it's exciting to be in an emerging category like that, to solve a real need and watch the need grow. That's my personal motivation; that's what motivates me, +and why I'm excited to be here. If you want to go even on a more philosophical level: it's really rewarding to me to help grow the kinds of technologies that are powering our software infrastructure, which everything in this world runs on today. +So it's really a big thing to do. The fact that it's kind of behind the scenes and under the hood means most consumers and most people don't know that their Facebook feed is powered by similarity search, or that their Google search is powered by similarity search. +But even without them knowing it, it affects them tremendously. I feel like we have a part in that, and I think that's really rewarding. Yeah. Sounds so deep, I mean, your connection to it. +And in general, it sounds like you're excited to be at the bleeding edge of the stack, right? Building the next thing. I think it's always exciting. Of course, in many ways it's also, well, I don't want to use the word dangerous, +kind of intense, you know. It's nice and bold, as they say, right? Yeah. For sure. We don't know how the future will play out. We have our hopes, and we're making our bets. But it's exciting to try it, and that's what motivates us.
+Yeah, we're not looking for safety here. Yeah, absolutely. But on that front, touching a little bit on the future of this market: even though it's emerging, and it's still unfolding in many ways, there are so many players already. +But I'm just thinking: what do you think, what strategic items are missing in the market right now? Not the data science part; I think the data science side is developed quite well, we have a lot of competing algorithms and frameworks. +But more on the business side, right? And maybe that's in line with how users understand the systems; maybe they don't understand enough. What items are missing? Maybe you're working on that, +maybe you're willing to share, maybe not, but maybe something along those lines that we can discuss. Yeah, I think you actually made the right point, which is: for a certain audience, +I mean, there's still more to be done, but for a very technical audience that's familiar with vector search, they have a lot of tools in front of them, and right now, whatever extra features they needed, they've hopefully figured out. +It's everyone else, who doesn't yet understand this and doesn't quite see how it applies to their applications, and for whom it's not clear how to choose an algorithm or how to tune it. The future, I think, is in educating +those people and those companies, and then bringing this capability to them. And that means not just helping them understand what it is, but also making the products more accessible to them, taking care of some of the technical details so that they can just focus on +the business side of things in their application. And there are many, many companies out there that could use vector search, but just haven't heard of it, don't realize it.
I think the future is in reaching those people. And I think even looking beyond vector search, at +vector embeddings in general: more and more companies are adopting machine learning, learning about NLP, and continuing to hire data scientists, and now machine learning engineers, which, by the way, are growing at a faster pace than data scientists. +The number of people with machine learning engineer titles on LinkedIn grew by something like 16% in Q2 of this year, which is when I last checked this, whereas data scientists grew by, I don't remember exactly, but single digits. +Obviously not all those ML engineers are working on vector search. But they will have more and more vector embedding data that they're trying to wrangle, that they want to maintain and analyze, in some cases search through, but also maybe just feed into other models, and so on. +In the past five years, we saw the rise of data warehouses and data lakes, with companies realizing they need to centralize all their data +for their data science teams and analysts and so on. We believe companies will need the same thing for vector embeddings: one database for all the embeddings, which can feed applications, feed training and analysis, and so on. +So yeah, that might be a few years out, and we'll see if that ever happens. But those are the kinds of things we're thinking about often: +beyond search, how do we help people get more use out of their embeddings? Yeah, I guess it also goes along the lines of producing docs and documentation, and source code explaining things, right?
+Like, how can I keep things running and get on with my work? Because I don't want to focus on building the vector search itself; what I really want is to achieve my goal, right? +You know, let's say create a music search service or something like that, right? Yeah, exactly. +Yeah, that's a trap a lot of people and companies fall into: they love the technology, they're very proud of building something unique. +And as you should be, but you have to remember that the people you're serving are just trying to solve some business problem. + Some of them, the early adopters, might be very curious about how it works under the hood, and they might want the ability to pull some levers and turn some knobs, but the vast majority of people just want to implement machine learning in their applications, to create a smarter search function, to increase user engagement, things like that. +Yeah, I think there was also one recent article, I will link it in the notes, explaining that you can apply vector search to solve the zero-hit problem in e-commerce. And that's how you can save, well, actually earn, money, right? So save the user experience in that sense. +Yeah, so it sounds like more and more use cases are coming up. You guys are at the forefront of actually hearing what the use cases are, right? And hopefully you'll be sharing some of those with the audience at large, and we'll learn from you guys. +Yeah, we're still constantly surprised by what people want to do with vector search. And yeah, we want to make the product available to as many people as possible, to see what they come up with. +I will say, though, we're also surprised in a way by how many people want to just do vector search on text data, which seems like such a simple thing to us, maybe.
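The zero-hit problem mentioned above is that a strict keyword search returns nothing for queries whose exact terms don't appear in the catalog, even when relevant products exist. A common remedy is to fall back to embedding similarity when keyword matching comes up empty. A minimal sketch, with a toy catalog and hand-made 2-D "embeddings" (all data here is invented for illustration):

```python
import math

# Toy catalog: product title -> made-up embedding vector.
CATALOG = {
    "red running shoes": [0.9, 0.1],
    "blue denim jacket": [0.1, 0.9],
}

def keyword_search(query):
    """Naive keyword match: every query term must appear in the title."""
    terms = query.lower().split()
    return [title for title in CATALOG if all(t in title for t in terms)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query, query_embedding):
    """Zero-hit fallback: keyword results if any, else nearest by embedding."""
    hits = keyword_search(query)
    if hits:
        return hits
    return [max(CATALOG, key=lambda t: cosine(query_embedding, CATALOG[t]))]

# "sneakers" matches no title literally, so the embedding fallback kicks in.
print(search("sneakers", [0.95, 0.05]))  # → ['red running shoes']
```

In a real system the query embedding would come from a text encoder and the nearest-neighbor lookup from a vector index, but the fallback logic is this simple.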
But it gets back to this point that not everyone is as far along as people in the vector search community. +So we've got to bring more people with us and help them see that once they're done with a semantic search use case, there's actually a lot more they can do with it. +Yeah, I think it's something that probably needs a bit of discovery for everyone, but also blogging more about it and sharing more about it: no, it's not only text, it's actually everything that is encodable as a vector, right? +It could be dense, it could be sparse, it could be whatever you have there, as long as it's a vector. +Then you can send it in, index it, and search, and then you need tools to choose the metric function, right? We didn't talk about it, but I know you guys support three major distances: Euclidean, dot product, and cosine. +Yeah, these are more or less the standards across many data science applications, but I'm sure there is somebody somewhere sitting in a garage inventing a new metric, and probably you will want to provide a plug-in architecture for that case as well, right? +Yeah, well, we have our own people in these figurative garages working on stuff as well. +But also, to go back to the previous point, there's the vector database that surrounds the engine as well, which might just look like more traditional database features simply applied to vector search, rather than breakthrough algorithms or things like that. +Although, you know, the filtering that we introduced with Pinecone 2.0, doing single-stage filtering on a vector index, was, let's say... let's just say it took a lot of late nights in the garage. +Yeah, sounds exciting, and sounds like something your customers will benefit from almost immediately, right? Yeah, that's fantastic.
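The three distance measures named here (Euclidean, dot product, cosine) are each a one-liner, which is worth seeing side by side: cosine is just the dot product of the normalized vectors, so for unit-length embeddings dot product and cosine rank results identically. A plain-Python sketch:

```python
import math

# The three standard vector similarity/distance measures discussed above.

def euclidean(a, b):
    # Smaller = more similar (it is a distance, not a similarity).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    # Larger = more similar; sensitive to vector magnitude.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product of the normalized vectors; ignores magnitude entirely.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 0.0], [1.0, 1.0]
print(euclidean(a, b))          # → 1.0
print(dot(a, b))                # → 1.0
print(round(cosine(a, b), 4))   # → 0.7071
```

A pluggable-metric architecture, as speculated in the conversation, would amount to letting users register their own function with this same two-vectors-in, one-score-out signature.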
+Yeah, I was thinking: do you want to add anything more on Pinecone? For instance, if somebody wants to try it out today, what does the process look like? Should they just shoot you an email? Yeah, well, if they want to shoot me an email, they're welcome to do that. +If they want a t-shirt, send me an email: greg@pinecone.io. But actually, we want to make it very easy for people to start and experiment. So you can go to pinecone.io/start and create a free account. And for small workloads, it's actually free to use. +You get one pod, which is definitely enough for experimenting, and if you have a small workload, it's enough for your production use case. That's the easiest and fastest way to sign up; you don't have to talk to anyone. +If you need custom deployment configurations, like certain availability zones, or anything else, you can send me an email or use the contact form on our site and we'll get you set up. It's almost as quick; we just have to set up some configurations. +But we want to help you get to production, and that means not standing in your way. So that's the best way to do it: go to pinecone.io/start. Awesome. And we'll make sure to link that in the notes as well. And when you said "pod", you mean Kubernetes, right? A Kubernetes pod? Oh yeah. +I mean, we didn't touch on this in this episode, but obviously you guys are scaling with Kubernetes, so you're modern on that side as well. Oh yeah, I should have mentioned this when you asked about the inner workings. +But yeah, we're using Kubernetes to make the whole service horizontally scalable. And of course, it's totally managed on our side, so you don't have to know anything about containers or Kubernetes, or worry about any of it. +There's also Kafka for streaming, to support streaming index updates as well as batch updates. There are load balancers, there are API gateways, there are a bunch of different components.
There's a key-value store under the hood. +If you want to see the architecture, you can check our docs and learn a bit more about it. But again, you don't have to know anything about it, and that's the point. You just make your API calls, get your results, and do something with those results. Yeah, exactly. Fantastic. +And by the way, are you planning at some point to maybe open-source, or actually implement, some things for the public to send their data in? Or do you think it's not a problem at all? +You know, some kind of connector, some kind of gluing code to Pinecone on the integration side, right? Because obviously clients will still look at how they plug Pinecone into the right part of the architecture. +Yeah, we're thinking a lot about that. We're looking at what the most common data sources are, what typical usage looks like, and what the trickiest part is for people. And we're thinking about how to make the trickiest parts easiest for as many people as possible. +I can't say much more than that, but we'll certainly have some common use cases covered soon. Yeah, sure. I mean, that's so important. A lot of things in machine learning are like that: 80% goes to data collection and cleaning, +and then in the end you plug in some model, like, ooh, I solved the task, right? And the same kind of goes for trying out databases or software: okay, how do I plug this in? And days go by and you're still figuring things out. So that's something to address, +and I guess you guys are doing that, right? Yeah, definitely. And also, we expect people to keep their data warehouse and their document store, because we are your vector database; we're not your blob storage or document storage. +So a lot of people use Pinecone alongside a data warehouse or some other database.
And the easier and more seamless the connections are between the two, the easier it is to get vector search into production. So that's what we're thinking about. Yeah, sounds great. Thanks. +And yeah, I think we can wrap up. I really enjoyed talking to you, Greg. Your t-shirt is the best; once I get it, I will wear it as well, and I'll be matching with the researchers. So thanks so much for your thoughts. This was super deep. +But you also shared some of your personal attitude and aspirations in this area. It's still emerging, but it's great to see you guys at the forefront of it, and I hope to hear more. +And just the last question: where can our listeners follow you, maybe on Twitter or LinkedIn? Where are you publicly available? So for Pinecone, we publish a lot of things on our website: you can go to pinecone.io and at the bottom subscribe for email updates, +and you'll get all these Faiss articles and things like that you heard about in your inbox. On Twitter, we're at pinecone underscore io. On LinkedIn, we also have a big following there. Me personally, I'm at gregory underscore kogan. +Gregory is G-R-E-G-O-R-Y, underscore, K-O-G-A-N. But a lot of things I post there are Pinecone-related, because that's what I think about a lot these days. And I'll also add big credit to you for leading the way with doing a podcast like this. +Yeah, it's exciting to see more people learn about vector search and start thinking about it and implementing it. And a lot of it is thanks to evangelists like you who put in the work to do that. So thanks to you. Glad to hear that, right? Thanks so much. +And actually, I must say that I'm educating myself equally on this journey. So hopefully, as part of this journey, the listeners and the readers can educate themselves as well. In the end, value increases by doing these things.
So that's, that's what drives me here. +So thanks so much for joining this show. Yeah, I hope we can record another one at some time down the road. Yeah, that would be awesome. Awesome. Thanks Greg. Bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md b/transcripts_with_timestamps/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md new file mode 100644 index 0000000..1eb124f --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/jo-bergum-distinguished-engineer-yahoo-vespa-journey-of-vespa-from-sparse-into-neural-search.md @@ -0,0 +1,4169 @@ +--- +description: '

Topics:

00:00 Introduction

01:21 Jo Kristian’s background + in Search / Recommendations since 2001 in Fast Search & Transfer (FAST)

03:16 + Nice words about Trondheim

04:37 Role of NTNU in supplying search talent and + having roots in FAST

05:33 History of Vespa from keyword search

09:00 + Architecture of Vespa and programming language choice: C++ (content layer), Java + (HTTP requests and search plugins) and Python (pyvespa)

13:45 How Python API + enables evaluation of the latest ML models with Vespa and ONNX support

17:04 + Tensor data structure in Vespa and its use cases

22:23 Multi-stage ranking + pipeline use cases with Vespa

24:37 Optimizing your ranker for top 1. Bonus: + cool search course mentioned!

30:18 Fascination of Query Understanding, ways + to implement and its role in search UX

33:34 You need to have investment to + get great results in search

35:30 Game-changing vector search in Vespa and + impact of MS Marco Passage Ranking

38:44 User aspect of vector search algorithms

43:19 + Approximate vs exact nearest neighbor search tradeoffs

47:58 Misconceptions + in neural search

52:06 Ranking competitions, idea generation and BERT bi-encoder + dream

56:19 Helping wider community through improving search over CORD-19 + dataset

58:13 Multimodal search is where vector search shines

1:01:14 + Power of building fully-fledged demos

1:04:47 How to combine vector search + with sparse search: Reciprocal Rank Fusion

1:10:37 The philosophical WHY question: + Jo Kristian’s drive in the search field

1:21:43 Announcement on the coming + features from Vespa

- Jo Kristian’s Twitter: https://twitter.com/jobergum

- + Dmitry’s Twitter: https://twitter.com/DmitryKan

For + the Show Notes check: https://www.youtube.com/watch?v=UxEdoXtA9oM

' +image_url: https://media.rss.com/vector-podcast/20220412_120408_e18078d3137041275301d6bf045caa0e.jpg +pub_date: Tue, 12 Apr 2022 12:29:08 GMT +title: Jo Bergum - Distinguished Engineer, Yahoo! Vespa - Journey of Vespa from Sparse + into Neural Search +url: https://rss.com/podcasts/vector-podcast/452635 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 22.28, "text": " Everyone, + Vector Podcast is here. I hope you have been waiting for another episode.", "tokens": + [50364, 5198, 11, 691, 20814, 29972, 307, 510, 13, 286, 1454, 291, 362, 668, 3806, + 337, 1071, 3500, 13, 51478], "temperature": 0.0, "avg_logprob": -0.4239329383486793, + "compression_ratio": 1.2727272727272727, "no_speech_prob": 0.14029882848262787}, + {"id": 1, "seek": 0, "start": 22.28, "end": 28.88, "text": " And today I have a + rock star with me. Joe Christian Bergum, a distinguished engineer", "tokens": [51478, + 400, 965, 286, 362, 257, 3727, 3543, 365, 385, 13, 6807, 5778, 27511, 449, 11, 257, + 21702, 11403, 51808], "temperature": 0.0, "avg_logprob": -0.4239329383486793, "compression_ratio": + 1.2727272727272727, "no_speech_prob": 0.14029882848262787}, {"id": 2, "seek": 2888, + "start": 28.88, "end": 34.4, "text": " with Yahoo. And he has been super vocal in + the field of Vector Search. 
And he has been", "tokens": [50364, 365, 41757, 13, + 400, 415, 575, 668, 1687, 11657, 294, 264, 2519, 295, 691, 20814, 17180, 13, 400, + 415, 575, 668, 50640], "temperature": 0.0, "avg_logprob": -0.27326652125308387, + "compression_ratio": 1.5799086757990868, "no_speech_prob": 0.36996251344680786}, + {"id": 3, "seek": 2888, "start": 34.4, "end": 39.92, "text": " also advocating for + one of the famous Vector Search engines and actually like a platform.", "tokens": + [50640, 611, 32050, 337, 472, 295, 264, 4618, 691, 20814, 17180, 12982, 293, 767, + 411, 257, 3663, 13, 50916], "temperature": 0.0, "avg_logprob": -0.27326652125308387, + "compression_ratio": 1.5799086757990868, "no_speech_prob": 0.36996251344680786}, + {"id": 4, "seek": 2888, "start": 39.92, "end": 44.8, "text": " Shirley Jo can talk + more about it called Vespa. Hey Joe, how are you doing?", "tokens": [50916, 43275, + 3139, 393, 751, 544, 466, 309, 1219, 691, 279, 4306, 13, 1911, 6807, 11, 577, 366, + 291, 884, 30, 51160], "temperature": 0.0, "avg_logprob": -0.27326652125308387, "compression_ratio": + 1.5799086757990868, "no_speech_prob": 0.36996251344680786}, {"id": 5, "seek": 2888, + "start": 46.4, "end": 52.239999999999995, "text": " Hey Dimitri, I''m good, thanks. + How are you? I''m great. Thank you very much for taking time to", "tokens": [51240, + 1911, 20975, 270, 470, 11, 286, 478, 665, 11, 3231, 13, 1012, 366, 291, 30, 286, + 478, 869, 13, 1044, 291, 588, 709, 337, 1940, 565, 281, 51532], "temperature": 0.0, + "avg_logprob": -0.27326652125308387, "compression_ratio": 1.5799086757990868, "no_speech_prob": + 0.36996251344680786}, {"id": 6, "seek": 5224, "start": 52.24, "end": 60.400000000000006, + "text": " talk to me. It''s fantastic being here on your show. It''s become so popular. 
+ Thank you for that", "tokens": [50364, 751, 281, 385, 13, 467, 311, 5456, 885, 510, + 322, 428, 855, 13, 467, 311, 1813, 370, 3743, 13, 1044, 291, 337, 300, 50772], "temperature": + 0.0, "avg_logprob": -0.1979934310913086, "compression_ratio": 1.6153846153846154, + "no_speech_prob": 0.0923013761639595}, {"id": 7, "seek": 5224, "start": 60.400000000000006, + "end": 68.32000000000001, "text": " introduction. I''m not sure if I''m a rock star. + It''s really interesting to be here. I really", "tokens": [50772, 9339, 13, 286, + 478, 406, 988, 498, 286, 478, 257, 3727, 3543, 13, 467, 311, 534, 1880, 281, 312, + 510, 13, 286, 534, 51168], "temperature": 0.0, "avg_logprob": -0.1979934310913086, + "compression_ratio": 1.6153846153846154, "no_speech_prob": 0.0923013761639595}, + {"id": 8, "seek": 5224, "start": 68.32000000000001, "end": 74.64, "text": " look + forward for our conversation on Vector Search and maybe we''ll touch on language + models as well.", "tokens": [51168, 574, 2128, 337, 527, 3761, 322, 691, 20814, + 17180, 293, 1310, 321, 603, 2557, 322, 2856, 5245, 382, 731, 13, 51484], "temperature": + 0.0, "avg_logprob": -0.1979934310913086, "compression_ratio": 1.6153846153846154, + "no_speech_prob": 0.0923013761639595}, {"id": 9, "seek": 5224, "start": 74.64, "end": + 80.4, "text": " And they''ll talk a little bit about Vespa and the technology in + Vespa. I''m really excited.", "tokens": [51484, 400, 436, 603, 751, 257, 707, 857, + 466, 691, 279, 4306, 293, 264, 2899, 294, 691, 279, 4306, 13, 286, 478, 534, 2919, + 13, 51772], "temperature": 0.0, "avg_logprob": -0.1979934310913086, "compression_ratio": + 1.6153846153846154, "no_speech_prob": 0.0923013761639595}, {"id": 10, "seek": 8040, + "start": 81.04, "end": 86.96000000000001, "text": " Yeah, I''m looking forward to + that. And I mean, you are a rock star. 
I can hear you every way on", "tokens": [50396, + 865, 11, 286, 478, 1237, 2128, 281, 300, 13, 400, 286, 914, 11, 291, 366, 257, 3727, + 3543, 13, 286, 393, 1568, 291, 633, 636, 322, 50692], "temperature": 0.0, "avg_logprob": + -0.16460328249587225, "compression_ratio": 1.58008658008658, "no_speech_prob": 0.12272235006093979}, + {"id": 11, "seek": 8040, "start": 86.96000000000001, "end": 96.24000000000001, "text": + " Twitter and LinkedIn and blogging. And so what else? So this has been like this. + And I''m really", "tokens": [50692, 5794, 293, 20657, 293, 6968, 3249, 13, 400, + 370, 437, 1646, 30, 407, 341, 575, 668, 411, 341, 13, 400, 286, 478, 534, 51156], + "temperature": 0.0, "avg_logprob": -0.16460328249587225, "compression_ratio": 1.58008658008658, + "no_speech_prob": 0.12272235006093979}, {"id": 12, "seek": 8040, "start": 96.24000000000001, + "end": 103.60000000000001, "text": " glad to hear to talk to you here today. And + so as a tradition, could you please introduce yourself", "tokens": [51156, 5404, + 281, 1568, 281, 751, 281, 291, 510, 965, 13, 400, 370, 382, 257, 6994, 11, 727, + 291, 1767, 5366, 1803, 51524], "temperature": 0.0, "avg_logprob": -0.16460328249587225, + "compression_ratio": 1.58008658008658, "no_speech_prob": 0.12272235006093979}, {"id": + 13, "seek": 8040, "start": 103.60000000000001, "end": 107.28, "text": " however + you want to know the detail you want and we''ll take it from there?", "tokens": + [51524, 4461, 291, 528, 281, 458, 264, 2607, 291, 528, 293, 321, 603, 747, 309, + 490, 456, 30, 51708], "temperature": 0.0, "avg_logprob": -0.16460328249587225, "compression_ratio": + 1.58008658008658, "no_speech_prob": 0.12272235006093979}, {"id": 14, "seek": 10728, + "start": 107.36, "end": 115.68, "text": " Yeah. Yeah, so my name is Joe Christian + and I work for Yahoo. 
And I''ve been working for Yahoo''s", "tokens": [50368, 865, + 13, 865, 11, 370, 452, 1315, 307, 6807, 5778, 293, 286, 589, 337, 41757, 13, 400, + 286, 600, 668, 1364, 337, 41757, 311, 50784], "temperature": 0.0, "avg_logprob": + -0.22470376298234268, "compression_ratio": 1.5078534031413613, "no_speech_prob": + 0.04876043647527695}, {"id": 15, "seek": 10728, "start": 115.68, "end": 124.56, + "text": " is 2007. My current role in Yahoo is distinguished engineer and I work + on the Vespa platform.", "tokens": [50784, 307, 12656, 13, 1222, 2190, 3090, 294, + 41757, 307, 21702, 11403, 293, 286, 589, 322, 264, 691, 279, 4306, 3663, 13, 51228], + "temperature": 0.0, "avg_logprob": -0.22470376298234268, "compression_ratio": 1.5078534031413613, + "no_speech_prob": 0.04876043647527695}, {"id": 16, "seek": 10728, "start": 125.12, + "end": 132.88, "text": " And I''ve been working on Search and Recommendations since + about 2001. When I joined a company here", "tokens": [51256, 400, 286, 600, 668, + 1364, 322, 17180, 293, 49545, 521, 763, 1670, 466, 16382, 13, 1133, 286, 6869, 257, + 2237, 510, 51644], "temperature": 0.0, "avg_logprob": -0.22470376298234268, "compression_ratio": + 1.5078534031413613, "no_speech_prob": 0.04876043647527695}, {"id": 17, "seek": 13288, + "start": 132.88, "end": 139.04, "text": " as an intern during my studies, a company + called Fast Search and Transfer, an Norwegian company.", "tokens": [50364, 382, + 364, 2154, 1830, 452, 5313, 11, 257, 2237, 1219, 15968, 17180, 293, 35025, 11, 364, + 34875, 2237, 13, 50672], "temperature": 0.0, "avg_logprob": -0.21663724113913144, + "compression_ratio": 1.6088888888888888, "no_speech_prob": 0.014461626298725605}, + {"id": 18, "seek": 13288, "start": 140.07999999999998, "end": 144.64, "text": " + Back then they were doing web search with this web search engine called alldevab.com.", + "tokens": [50724, 5833, 550, 436, 645, 884, 3670, 3164, 365, 341, 3670, 3164, 2848, + 1219, 439, 40343, 455, 13, 1112, 13, 
50952], "temperature": 0.0, "avg_logprob": + -0.21663724113913144, "compression_ratio": 1.6088888888888888, "no_speech_prob": + 0.014461626298725605}, {"id": 19, "seek": 13288, "start": 145.2, "end": 152.48, + "text": " So they started around 98 I think trying to compete with Google and so + on. And then Yahoo came", "tokens": [50980, 407, 436, 1409, 926, 20860, 286, 519, + 1382, 281, 11831, 365, 3329, 293, 370, 322, 13, 400, 550, 41757, 1361, 51344], "temperature": + 0.0, "avg_logprob": -0.21663724113913144, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.014461626298725605}, {"id": 20, "seek": 13288, "start": 152.48, + "end": 157.68, "text": " along and bought the web search division. The team here + in Toronto. They also bought", "tokens": [51344, 2051, 293, 4243, 264, 3670, 3164, + 10044, 13, 440, 1469, 510, 294, 14140, 13, 814, 611, 4243, 51604], "temperature": + 0.0, "avg_logprob": -0.21663724113913144, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.014461626298725605}, {"id": 21, "seek": 15768, "start": 158.56, + "end": 168.96, "text": " all the way star and so on. So that was back in 2003. And + in 2004, Vespa was born. So and I joined", "tokens": [50408, 439, 264, 636, 3543, + 293, 370, 322, 13, 407, 300, 390, 646, 294, 16416, 13, 400, 294, 15817, 11, 691, + 279, 4306, 390, 4232, 13, 407, 293, 286, 6869, 50928], "temperature": 0.0, "avg_logprob": + -0.24940004348754882, "compression_ratio": 1.5372340425531914, "no_speech_prob": + 0.007330908440053463}, {"id": 22, "seek": 15768, "start": 170.32, "end": 178.8, + "text": " I actually worked in Fast in the enterprise search division for some time, + three years. 
And then", "tokens": [50996, 286, 767, 2732, 294, 15968, 294, 264, + 14132, 3164, 10044, 337, 512, 565, 11, 1045, 924, 13, 400, 550, 51420], "temperature": + 0.0, "avg_logprob": -0.24940004348754882, "compression_ratio": 1.5372340425531914, + "no_speech_prob": 0.007330908440053463}, {"id": 23, "seek": 15768, "start": 178.8, + "end": 186.64000000000001, "text": " I joined Yahoo in 2007. And since then I''m + been here working on search and Vespa in Yahoo. So", "tokens": [51420, 286, 6869, + 41757, 294, 12656, 13, 400, 1670, 550, 286, 478, 668, 510, 1364, 322, 3164, 293, + 691, 279, 4306, 294, 41757, 13, 407, 51812], "temperature": 0.0, "avg_logprob": + -0.24940004348754882, "compression_ratio": 1.5372340425531914, "no_speech_prob": + 0.007330908440053463}, {"id": 24, "seek": 18664, "start": 187.27999999999997, "end": + 192.95999999999998, "text": " that''s my background. I also hold a master degree + in computer science from the Norwegian University", "tokens": [50396, 300, 311, + 452, 3678, 13, 286, 611, 1797, 257, 4505, 4314, 294, 3820, 3497, 490, 264, 34875, + 3535, 50680], "temperature": 0.0, "avg_logprob": -0.23920736208066837, "compression_ratio": + 1.4695652173913043, "no_speech_prob": 0.005616862792521715}, {"id": 25, "seek": + 18664, "start": 192.95999999999998, "end": 198.0, "text": " here in Toronto. Oh + yeah, that''s great. Actually, by the way, I did visit Toronto Hame.", "tokens": + [50680, 510, 294, 14140, 13, 876, 1338, 11, 300, 311, 869, 13, 5135, 11, 538, 264, + 636, 11, 286, 630, 3441, 14140, 389, 529, 13, 50932], "temperature": 0.0, "avg_logprob": + -0.23920736208066837, "compression_ratio": 1.4695652173913043, "no_speech_prob": + 0.005616862792521715}, {"id": 26, "seek": 18664, "start": 198.0, "end": 204.32, + "text": " Was it 2007 for an interview with one search company? 
Not fast.", "tokens": + [50932, 3027, 309, 12656, 337, 364, 4049, 365, 472, 3164, 2237, 30, 1726, 2370, + 13, 51248], "temperature": 0.0, "avg_logprob": -0.23920736208066837, "compression_ratio": + 1.4695652173913043, "no_speech_prob": 0.005616862792521715}, {"id": 27, "seek": + 18664, "start": 206.0, "end": 212.56, "text": " But yeah, it was a great, great + visit. I mean, I love the city. It''s an amazing place.", "tokens": [51332, 583, + 1338, 11, 309, 390, 257, 869, 11, 869, 3441, 13, 286, 914, 11, 286, 959, 264, 2307, + 13, 467, 311, 364, 2243, 1081, 13, 51660], "temperature": 0.0, "avg_logprob": -0.23920736208066837, + "compression_ratio": 1.4695652173913043, "no_speech_prob": 0.005616862792521715}, + {"id": 28, "seek": 21256, "start": 213.04, "end": 219.92000000000002, "text": " + Yeah, it''s an amazing place. And it''s funny what you said about search and", "tokens": + [50388, 865, 11, 309, 311, 364, 2243, 1081, 13, 400, 309, 311, 4074, 437, 291, 848, + 466, 3164, 293, 50732], "temperature": 0.0, "avg_logprob": -0.257664540234734, "compression_ratio": + 1.4860335195530727, "no_speech_prob": 0.018701262772083282}, {"id": 29, "seek": + 21256, "start": 219.92000000000002, "end": 228.24, "text": " trial because it really + has a special, maybe special even in Europe because we at one time we had", "tokens": + [50732, 7308, 570, 309, 534, 575, 257, 2121, 11, 1310, 2121, 754, 294, 3315, 570, + 321, 412, 472, 565, 321, 632, 51148], "temperature": 0.0, "avg_logprob": -0.257664540234734, + "compression_ratio": 1.4860335195530727, "no_speech_prob": 0.018701262772083282}, + {"id": 30, "seek": 21256, "start": 228.24, "end": 236.32, "text": " both Google, + Bing and Yahoo here in in in in trial line. So that was a fantastic time. 
Google", + "tokens": [51148, 1293, 3329, 11, 30755, 293, 41757, 510, 294, 294, 294, 294, 7308, + 1622, 13, 407, 300, 390, 257, 5456, 565, 13, 3329, 51552], "temperature": 0.0, "avg_logprob": + -0.257664540234734, "compression_ratio": 1.4860335195530727, "no_speech_prob": 0.018701262772083282}, + {"id": 31, "seek": 23632, "start": 236.95999999999998, "end": 243.51999999999998, + "text": " shut down their office here in trial line. And but now we have a Microsoft + is here in", "tokens": [50396, 5309, 760, 641, 3398, 510, 294, 7308, 1622, 13, 400, + 457, 586, 321, 362, 257, 8116, 307, 510, 294, 50724], "temperature": 0.0, "avg_logprob": + -0.23053432263826068, "compression_ratio": 1.6079295154185023, "no_speech_prob": + 0.013439943082630634}, {"id": 32, "seek": 23632, "start": 243.51999999999998, "end": + 250.32, "text": " in tronheim and also Yahoo as office here in in tronheim. So there''s + a lot of search technology", "tokens": [50724, 294, 504, 266, 18673, 293, 611, 41757, + 382, 3398, 510, 294, 294, 504, 266, 18673, 13, 407, 456, 311, 257, 688, 295, 3164, + 2899, 51064], "temperature": 0.0, "avg_logprob": -0.23053432263826068, "compression_ratio": + 1.6079295154185023, "no_speech_prob": 0.013439943082630634}, {"id": 33, "seek": + 23632, "start": 250.32, "end": 255.76, "text": " competence here in tronheim. This + is amazing actually for for relatively small city, but I think", "tokens": [51064, + 39965, 510, 294, 504, 266, 18673, 13, 639, 307, 2243, 767, 337, 337, 7226, 1359, + 2307, 11, 457, 286, 519, 51336], "temperature": 0.0, "avg_logprob": -0.23053432263826068, + "compression_ratio": 1.6079295154185023, "no_speech_prob": 0.013439943082630634}, + {"id": 34, "seek": 23632, "start": 255.76, "end": 259.84, "text": " Tronheim used + to be a capital of Norway at some point in back in history. 
Yeah, in its", "tokens": + [51336, 1765, 266, 18673, 1143, 281, 312, 257, 4238, 295, 24354, 412, 512, 935, + 294, 646, 294, 2503, 13, 865, 11, 294, 1080, 51540], "temperature": 0.0, "avg_logprob": + -0.23053432263826068, "compression_ratio": 1.6079295154185023, "no_speech_prob": + 0.013439943082630634}, {"id": 35, "seek": 25984, "start": 259.84, "end": 266.08, + "text": " on point, back way back in the Viking days. Exactly. So now all these + Vikings are", "tokens": [50364, 322, 935, 11, 646, 636, 646, 294, 264, 40375, 1708, + 13, 7587, 13, 407, 586, 439, 613, 48761, 366, 50676], "temperature": 0.0, "avg_logprob": + -0.25572087547995825, "compression_ratio": 1.4979423868312758, "no_speech_prob": + 0.002604901557788253}, {"id": 36, "seek": 25984, "start": 269.52, "end": 276.32, + "text": " stopped going around with boats and harassing people. Now we developed + search technology now.", "tokens": [50848, 5936, 516, 926, 365, 17772, 293, 16910, + 278, 561, 13, 823, 321, 4743, 3164, 2899, 586, 13, 51188], "temperature": 0.0, "avg_logprob": + -0.25572087547995825, "compression_ratio": 1.4979423868312758, "no_speech_prob": + 0.002604901557788253}, {"id": 37, "seek": 25984, "start": 276.32, "end": 284.32, + "text": " Yeah, such a move. Wow. 
And I also understood that in tronheim, as you + said, there is the university.", "tokens": [51188, 865, 11, 1270, 257, 1286, 13, + 3153, 13, 400, 286, 611, 7320, 300, 294, 504, 266, 18673, 11, 382, 291, 848, 11, + 456, 307, 264, 5454, 13, 51588], "temperature": 0.0, "avg_logprob": -0.25572087547995825, + "compression_ratio": 1.4979423868312758, "no_speech_prob": 0.002604901557788253}, + {"id": 38, "seek": 25984, "start": 284.32, "end": 288.64, "text": " Is it actually + one of the talent supplies for this industry or engineering in general?", "tokens": + [51588, 1119, 309, 767, 472, 295, 264, 8301, 11768, 337, 341, 3518, 420, 7043, 294, + 2674, 30, 51804], "temperature": 0.0, "avg_logprob": -0.25572087547995825, "compression_ratio": + 1.4979423868312758, "no_speech_prob": 0.002604901557788253}, {"id": 39, "seek": + 28984, "start": 290.08, "end": 297.76, "text": " Yeah, it is. We have the largest + technical university in Norway, C and in tronheim. So as an old", "tokens": [50376, + 865, 11, 309, 307, 13, 492, 362, 264, 6443, 6191, 5454, 294, 24354, 11, 383, 293, + 294, 504, 266, 18673, 13, 407, 382, 364, 1331, 50760], "temperature": 0.0, "avg_logprob": + -0.31427082117053046, "compression_ratio": 1.5105263157894737, "no_speech_prob": + 0.003874805523082614}, {"id": 40, "seek": 28984, "start": 297.76, "end": 306.15999999999997, + "text": " kind of history and so it''s definitely one of the reasons why the search + companies evolved.", "tokens": [50760, 733, 295, 2503, 293, 370, 309, 311, 2138, + 472, 295, 264, 4112, 983, 264, 3164, 3431, 14178, 13, 51180], "temperature": 0.0, + "avg_logprob": -0.31427082117053046, "compression_ratio": 1.5105263157894737, "no_speech_prob": + 0.003874805523082614}, {"id": 41, "seek": 28984, "start": 306.15999999999997, "end": + 313.52, "text": " And the fast search and transfer of the company was founded by + people coming out of the university", "tokens": [51180, 400, 264, 2370, 3164, 293, + 5003, 295, 264, 2237, 390, 13234, 538, 
561, 1348, 484, 295, 264, 5454, 51548], "temperature": + 0.0, "avg_logprob": -0.31427082117053046, "compression_ratio": 1.5105263157894737, + "no_speech_prob": 0.003874805523082614}, {"id": 42, "seek": 31352, "start": 314.0, + "end": 319.28, "text": " here. So two point in the east week, very good swing and + so these two reggae and they they", "tokens": [50388, 510, 13, 407, 732, 935, 294, + 264, 10648, 1243, 11, 588, 665, 11173, 293, 370, 613, 732, 1121, 45534, 293, 436, + 436, 50652], "temperature": 0.0, "avg_logprob": -0.5256483476240557, "compression_ratio": + 1.644736842105263, "no_speech_prob": 0.032441671937704086}, {"id": 43, "seek": 31352, + "start": 319.28, "end": 324.64, "text": " came they actually started with FTP search + bucket back in like the night the seven. So and that", "tokens": [50652, 1361, 436, + 767, 1409, 365, 479, 16804, 3164, 13058, 646, 294, 411, 264, 1818, 264, 3407, 13, + 407, 293, 300, 50920], "temperature": 0.0, "avg_logprob": -0.5256483476240557, "compression_ratio": + 1.644736842105263, "no_speech_prob": 0.032441671937704086}, {"id": 44, "seek": 31352, + "start": 324.64, "end": 331.52, "text": " developed into this web search engine + and then eventually this became a Westpaw in Yahoo.", "tokens": [50920, 4743, 666, + 341, 3670, 3164, 2848, 293, 550, 4728, 341, 3062, 257, 4055, 79, 1607, 294, 41757, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.5256483476240557, "compression_ratio": + 1.644736842105263, "no_speech_prob": 0.032441671937704086}, {"id": 45, "seek": 31352, + "start": 332.32, "end": 338.15999999999997, "text": " Oh yeah, yeah, sounds great. 
+ So I can actually maybe touch on the backgrounds since I''ve mentioned", "tokens": + [51304, 876, 1338, 11, 1338, 11, 3263, 869, 13, 407, 286, 393, 767, 1310, 2557, + 322, 264, 17336, 1670, 286, 600, 2835, 51596], "temperature": 0.0, "avg_logprob": + -0.5256483476240557, "compression_ratio": 1.644736842105263, "no_speech_prob": 0.032441671937704086}, + {"id": 46, "seek": 33816, "start": 338.16, "end": 343.52000000000004, "text": " + now web search and you know how maybe not everybody has heard about Westpaw and + so Westpaw actually", "tokens": [50364, 586, 3670, 3164, 293, 291, 458, 577, 1310, + 406, 2201, 575, 2198, 466, 4055, 79, 1607, 293, 370, 4055, 79, 1607, 767, 50632], + "temperature": 0.0, "avg_logprob": -0.15635337220861556, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.008224464021623135}, {"id": 47, "seek": 33816, "start": 344.96000000000004, + "end": 352.16, "text": " we started developing Westpaw in 2004. So Yahoo said that + you know we brought you into the company.", "tokens": [50704, 321, 1409, 6416, 4055, + 79, 1607, 294, 15817, 13, 407, 41757, 848, 300, 291, 458, 321, 3038, 291, 666, 264, + 2237, 13, 51064], "temperature": 0.0, "avg_logprob": -0.15635337220861556, "compression_ratio": + 1.6538461538461537, "no_speech_prob": 0.008224464021623135}, {"id": 48, "seek": + 33816, "start": 352.16, "end": 359.12, "text": " We want you to build a vertical + search platform that we can use across our properties in Yahoo.", "tokens": [51064, + 492, 528, 291, 281, 1322, 257, 9429, 3164, 3663, 300, 321, 393, 764, 2108, 527, + 7221, 294, 41757, 13, 51412], "temperature": 0.0, "avg_logprob": -0.15635337220861556, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.008224464021623135}, + {"id": 49, "seek": 33816, "start": 359.12, "end": 367.28000000000003, "text": " + So for example, Yahoo finance, Yahoo news, they need to have some kind of search + engine. 
So", "tokens": [51412, 407, 337, 1365, 11, 41757, 10719, 11, 41757, 2583, + 11, 436, 643, 281, 362, 512, 733, 295, 3164, 2848, 13, 407, 51820], "temperature": + 0.0, "avg_logprob": -0.15635337220861556, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.008224464021623135}, {"id": 50, "seek": 36728, "start": 367.28, + "end": 373.59999999999997, "text": " and they gave that task to ask you in trial + and I''m so they started building Westpaw,", "tokens": [50364, 293, 436, 2729, 300, + 5633, 281, 1029, 291, 294, 7308, 293, 286, 478, 370, 436, 1409, 2390, 4055, 79, + 1607, 11, 50680], "temperature": 0.0, "avg_logprob": -0.21887676643602777, "compression_ratio": + 1.56, "no_speech_prob": 0.0003841678553726524}, {"id": 51, "seek": 36728, "start": + 374.32, "end": 380.23999999999995, "text": " the Westpaw platform using the routes + and the technology from the web search and putting that", "tokens": [50716, 264, + 4055, 79, 1607, 3663, 1228, 264, 18242, 293, 264, 2899, 490, 264, 3670, 3164, 293, + 3372, 300, 51012], "temperature": 0.0, "avg_logprob": -0.21887676643602777, "compression_ratio": + 1.56, "no_speech_prob": 0.0003841678553726524}, {"id": 52, "seek": 36728, "start": + 380.23999999999995, "end": 391.28, "text": " into a package that the verticals could + install and use. 
And then over time this so basically", "tokens": [51012, 666, 257, + 7372, 300, 264, 9429, 82, 727, 3625, 293, 764, 13, 400, 550, 670, 565, 341, 370, + 1936, 51564], "temperature": 0.0, "avg_logprob": -0.21887676643602777, "compression_ratio": + 1.56, "no_speech_prob": 0.0003841678553726524}, {"id": 53, "seek": 39128, "start": + 391.28, "end": 401.35999999999996, "text": " starting with basic BM25 like search + like keyword search and then gradually Westpaw added more", "tokens": [50364, 2891, + 365, 3875, 15901, 6074, 411, 3164, 411, 20428, 3164, 293, 550, 13145, 4055, 79, + 1607, 3869, 544, 50868], "temperature": 0.0, "avg_logprob": -0.2136540710926056, + "compression_ratio": 1.4795918367346939, "no_speech_prob": 0.0009724642150104046}, + {"id": 54, "seek": 39128, "start": 401.35999999999996, "end": 410.08, "text": " + features real time indexing, 10 source aggregations, grouping facets as well. So + it really developed", "tokens": [50868, 4122, 957, 565, 8186, 278, 11, 1266, 4009, + 16743, 763, 11, 40149, 49752, 382, 731, 13, 407, 309, 534, 4743, 51304], "temperature": + 0.0, "avg_logprob": -0.2136540710926056, "compression_ratio": 1.4795918367346939, + "no_speech_prob": 0.0009724642150104046}, {"id": 55, "seek": 39128, "start": 411.11999999999995, + "end": 418.88, "text": " over time and new requirements came in especially when + we started Westpaw it was around search", "tokens": [51356, 670, 565, 293, 777, + 7728, 1361, 294, 2318, 562, 321, 1409, 4055, 79, 1607, 309, 390, 926, 3164, 51744], + "temperature": 0.0, "avg_logprob": -0.2136540710926056, "compression_ratio": 1.4795918367346939, + "no_speech_prob": 0.0009724642150104046}, {"id": 56, "seek": 41888, "start": 419.84, + "end": 427.76, "text": " but in 2007, 2008 around that time Westpaw''s also started + to be used more of as a recommendation engine.", "tokens": [50412, 457, 294, 12656, + 11, 10389, 926, 300, 565, 4055, 79, 1607, 311, 611, 1409, 281, 312, 1143, 544, 295, + 382, 257, 11879, 2848, 13, 
50808], "temperature": 0.0, "avg_logprob": -0.2582565771566855, + "compression_ratio": 1.5934065934065933, "no_speech_prob": 0.013602495193481445}, + {"id": 57, "seek": 41888, "start": 427.76, "end": 433.36, "text": " So serving of + recommendations. So when you go to finance, Yahoo.com and there''s a set of articles", + "tokens": [50808, 407, 8148, 295, 10434, 13, 407, 562, 291, 352, 281, 10719, 11, + 41757, 13, 1112, 293, 456, 311, 257, 992, 295, 11290, 51088], "temperature": 0.0, + "avg_logprob": -0.2582565771566855, "compression_ratio": 1.5934065934065933, "no_speech_prob": + 0.013602495193481445}, {"id": 58, "seek": 41888, "start": 434.0, "end": 446.32, + "text": " that are recommended to you the serving engine doing that is Westpaw. + And then in 2017,", "tokens": [51120, 300, 366, 9628, 281, 291, 264, 8148, 2848, + 884, 300, 307, 4055, 79, 1607, 13, 400, 550, 294, 6591, 11, 51736], "temperature": + 0.0, "avg_logprob": -0.2582565771566855, "compression_ratio": 1.5934065934065933, + "no_speech_prob": 0.013602495193481445}, {"id": 59, "seek": 44632, "start": 446.96, + "end": 454.08, "text": " Yahoo decided that we''re going to open source Westpaw + to the world. 
So we open-sourced it using", "tokens": [50396, 41757, 3047, 300, + 321, 434, 516, 281, 1269, 4009, 4055, 79, 1607, 281, 264, 1002, 13, 407, 321, 1269, + 12, 82, 396, 1232, 309, 1228, 50752], "temperature": 0.0, "avg_logprob": -0.1816017598281672, + "compression_ratio": 1.5706806282722514, "no_speech_prob": 0.010644343681633472}, + {"id": 60, "seek": 44632, "start": 454.08, "end": 463.12, "text": " our Apache tool + license and we still continue to actively, very actively develop on Westpaw and + add new", "tokens": [50752, 527, 46597, 2290, 10476, 293, 321, 920, 2354, 281, 13022, + 11, 588, 13022, 1499, 322, 4055, 79, 1607, 293, 909, 777, 51204], "temperature": + 0.0, "avg_logprob": -0.1816017598281672, "compression_ratio": 1.5706806282722514, + "no_speech_prob": 0.010644343681633472}, {"id": 61, "seek": 44632, "start": 463.12, + "end": 470.8, "text": " features and so on. So that''s a kind of brief background. + So Westpaw is not new. It''s really kind of", "tokens": [51204, 4122, 293, 370, + 322, 13, 407, 300, 311, 257, 733, 295, 5353, 3678, 13, 407, 4055, 79, 1607, 307, + 406, 777, 13, 467, 311, 534, 733, 295, 51588], "temperature": 0.0, "avg_logprob": + -0.1816017598281672, "compression_ratio": 1.5706806282722514, "no_speech_prob": + 0.010644343681633472}, {"id": 62, "seek": 47080, "start": 470.8, "end": 476.56, + "text": " it has in a very long history and I think that''s also great thing and + we can talk maybe a little bit", "tokens": [50364, 309, 575, 294, 257, 588, 938, + 2503, 293, 286, 519, 300, 311, 611, 869, 551, 293, 321, 393, 751, 1310, 257, 707, + 857, 50652], "temperature": 0.0, "avg_logprob": -0.1923310226864285, "compression_ratio": + 1.5360824742268042, "no_speech_prob": 0.004410884343087673}, {"id": 63, "seek": + 47080, "start": 476.56, "end": 483.36, "text": " about it because you know we need + to develop software over time. 
There are a lot of changes you know", "tokens": [50652, + 466, 309, 570, 291, 458, 321, 643, 281, 1499, 4722, 670, 565, 13, 821, 366, 257, + 688, 295, 2962, 291, 458, 50992], "temperature": 0.0, "avg_logprob": -0.1923310226864285, + "compression_ratio": 1.5360824742268042, "no_speech_prob": 0.004410884343087673}, + {"id": 64, "seek": 47080, "start": 483.36, "end": 491.84000000000003, "text": " + in the infrastructure. There was no cloud, public cloud. There were no Kubernetes + and from 2004.", "tokens": [50992, 294, 264, 6896, 13, 821, 390, 572, 4588, 11, + 1908, 4588, 13, 821, 645, 572, 23145, 293, 490, 15817, 13, 51416], "temperature": + 0.0, "avg_logprob": -0.1923310226864285, "compression_ratio": 1.5360824742268042, + "no_speech_prob": 0.004410884343087673}, {"id": 65, "seek": 49184, "start": 491.91999999999996, + "end": 499.91999999999996, "text": " I started in 2007, you know a high power content + machine, content node machine would have maybe", "tokens": [50368, 286, 1409, 294, + 12656, 11, 291, 458, 257, 1090, 1347, 2701, 3479, 11, 2701, 9984, 3479, 576, 362, + 1310, 50768], "temperature": 0.0, "avg_logprob": -0.22727350286535314, "compression_ratio": + 1.5401069518716577, "no_speech_prob": 0.0069178631529212}, {"id": 66, "seek": 49184, + "start": 499.91999999999996, "end": 509.03999999999996, "text": " eight things of + RAM. And it would have maybe maximum 1 gigabit per second network. And if we go", + "tokens": [50768, 3180, 721, 295, 14561, 13, 400, 309, 576, 362, 1310, 6674, 502, + 8741, 455, 270, 680, 1150, 3209, 13, 400, 498, 321, 352, 51224], "temperature": + 0.0, "avg_logprob": -0.22727350286535314, "compression_ratio": 1.5401069518716577, + "no_speech_prob": 0.0069178631529212}, {"id": 67, "seek": 49184, "start": 509.03999999999996, + "end": 516.48, "text": " fast forward, you know, and it will have spinning disks. + And now we have NVME SSD disks. 
We have", "tokens": [51224, 2370, 2128, 11, 291, + 458, 11, 293, 309, 486, 362, 15640, 41617, 13, 400, 586, 321, 362, 46512, 15454, + 30262, 41617, 13, 492, 362, 51596], "temperature": 0.0, "avg_logprob": -0.22727350286535314, + "compression_ratio": 1.5401069518716577, "no_speech_prob": 0.0069178631529212}, + {"id": 68, "seek": 51648, "start": 516.48, "end": 524.64, "text": " nodes with four + terabytes, potentially of memory, lots of CPU power. So there''s like keeping up", + "tokens": [50364, 13891, 365, 1451, 1796, 24538, 11, 7263, 295, 4675, 11, 3195, + 295, 13199, 1347, 13, 407, 456, 311, 411, 5145, 493, 50772], "temperature": 0.0, + "avg_logprob": -0.2013667713512074, "compression_ratio": 1.592274678111588, "no_speech_prob": + 0.007568803150206804}, {"id": 69, "seek": 51648, "start": 525.52, "end": 532.0, + "text": " in improving the software and adopting it to the hardware and new hardware + and so on. It''s", "tokens": [50816, 294, 11470, 264, 4722, 293, 32328, 309, 281, + 264, 8837, 293, 777, 8837, 293, 370, 322, 13, 467, 311, 51140], "temperature": 0.0, + "avg_logprob": -0.2013667713512074, "compression_ratio": 1.592274678111588, "no_speech_prob": + 0.007568803150206804}, {"id": 70, "seek": 51648, "start": 532.0, "end": 537.12, + "text": " been really fun to watch. I think we did a good job actually making Westpaw + kind of modern", "tokens": [51140, 668, 534, 1019, 281, 1159, 13, 286, 519, 321, + 630, 257, 665, 1691, 767, 1455, 4055, 79, 1607, 733, 295, 4363, 51396], "temperature": + 0.0, "avg_logprob": -0.2013667713512074, "compression_ratio": 1.592274678111588, + "no_speech_prob": 0.007568803150206804}, {"id": 71, "seek": 51648, "start": 538.08, + "end": 544.72, "text": " from something that started in 2004. 
It turns like really + an exciting journey and really like", "tokens": [51444, 490, 746, 300, 1409, 294, + 15817, 13, 467, 4523, 411, 534, 364, 4670, 4671, 293, 534, 411, 51776], "temperature": + 0.0, "avg_logprob": -0.2013667713512074, "compression_ratio": 1.592274678111588, + "no_speech_prob": 0.007568803150206804}, {"id": 72, "seek": 54472, "start": 544.72, + "end": 550.88, "text": " starting from when you would explain like you know small + scale servers in the way all the way.", "tokens": [50364, 2891, 490, 562, 291, 576, + 2903, 411, 291, 458, 1359, 4373, 15909, 294, 264, 636, 439, 264, 636, 13, 50672], + "temperature": 0.0, "avg_logprob": -0.26578508723865857, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.004526263102889061}, {"id": 73, "seek": 54472, "start": 550.88, + "end": 556.08, "text": " And the technology has changed so much right? The disks + became faster I guess and you know", "tokens": [50672, 400, 264, 2899, 575, 3105, + 370, 709, 558, 30, 440, 41617, 3062, 4663, 286, 2041, 293, 291, 458, 50932], "temperature": + 0.0, "avg_logprob": -0.26578508723865857, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.004526263102889061}, {"id": 74, "seek": 54472, "start": 556.64, + "end": 562.0, "text": " the network has become faster. 
And like I remember like + in Silicon Valley, Citco, if you", "tokens": [50960, 264, 3209, 575, 1813, 4663, + 13, 400, 411, 286, 1604, 411, 294, 25351, 10666, 11, 18435, 1291, 11, 498, 291, + 51228], "temperature": 0.0, "avg_logprob": -0.26578508723865857, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.004526263102889061}, {"id": 75, "seek": 54472, + "start": 562.0, "end": 567.76, "text": " if you watched it, it like they had a case + when they optimized one module in the system and the", "tokens": [51228, 498, 291, + 6337, 309, 11, 309, 411, 436, 632, 257, 1389, 562, 436, 26941, 472, 10088, 294, + 264, 1185, 293, 264, 51516], "temperature": 0.0, "avg_logprob": -0.26578508723865857, + "compression_ratio": 1.608695652173913, "no_speech_prob": 0.004526263102889061}, + {"id": 76, "seek": 56776, "start": 567.76, "end": 575.4399999999999, "text": " whole + system went down because it''s way too fast. So it''s like it sounds like you have + done quite", "tokens": [50364, 1379, 1185, 1437, 760, 570, 309, 311, 636, 886, 2370, + 13, 407, 309, 311, 411, 309, 3263, 411, 291, 362, 1096, 1596, 50748], "temperature": + 0.0, "avg_logprob": -0.2155465690457091, "compression_ratio": 1.5951417004048583, + "no_speech_prob": 0.00247886567376554}, {"id": 77, "seek": 56776, "start": 575.4399999999999, + "end": 581.36, "text": " a bit of job to actually keep this shape of flow. And like + if I understood correctly, technically", "tokens": [50748, 257, 857, 295, 1691, + 281, 767, 1066, 341, 3909, 295, 3095, 13, 400, 411, 498, 286, 7320, 8944, 11, 12120, + 51044], "temperature": 0.0, "avg_logprob": -0.2155465690457091, "compression_ratio": + 1.5951417004048583, "no_speech_prob": 0.00247886567376554}, {"id": 78, "seek": 56776, + "start": 581.36, "end": 588.88, "text": " speaking, Westpaw or portion of Westpaw + is implemented in Java. 
And then portion in C or C++ and then", "tokens": [51044, + 4124, 11, 4055, 79, 1607, 420, 8044, 295, 4055, 79, 1607, 307, 12270, 294, 10745, + 13, 400, 550, 8044, 294, 383, 420, 383, 25472, 293, 550, 51420], "temperature": + 0.0, "avg_logprob": -0.2155465690457091, "compression_ratio": 1.5951417004048583, + "no_speech_prob": 0.00247886567376554}, {"id": 79, "seek": 56776, "start": 588.88, + "end": 595.92, "text": " you also have some Python. And maybe you can talk more + about the choice of languages and sort of", "tokens": [51420, 291, 611, 362, 512, + 15329, 13, 400, 1310, 291, 393, 751, 544, 466, 264, 3922, 295, 8650, 293, 1333, + 295, 51772], "temperature": 0.0, "avg_logprob": -0.2155465690457091, "compression_ratio": + 1.5951417004048583, "no_speech_prob": 0.00247886567376554}, {"id": 80, "seek": 59592, + "start": 596.24, "end": 601.76, "text": " culture that there isn''t the team. But + I''m also curious like around the same time. I''ve actually", "tokens": [50380, + 3713, 300, 456, 1943, 380, 264, 1469, 13, 583, 286, 478, 611, 6369, 411, 926, 264, + 912, 565, 13, 286, 600, 767, 50656], "temperature": 0.0, "avg_logprob": -0.1531265613644622, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.0023909492883831263}, + {"id": 81, "seek": 59592, "start": 601.76, "end": 609.4399999999999, "text": " seen + was also developing right quite quite fast. Did you kind of look at what that team + is doing", "tokens": [50656, 1612, 390, 611, 6416, 558, 1596, 1596, 2370, 13, 2589, + 291, 733, 295, 574, 412, 437, 300, 1469, 307, 884, 51040], "temperature": 0.0, "avg_logprob": + -0.1531265613644622, "compression_ratio": 1.5663716814159292, "no_speech_prob": + 0.0023909492883831263}, {"id": 82, "seek": 59592, "start": 610.24, "end": 613.92, + "text": " which is like an open source project? 
Was there something to loan from?", + "tokens": [51080, 597, 307, 411, 364, 1269, 4009, 1716, 30, 3027, 456, 746, 281, + 10529, 490, 30, 51264], "temperature": 0.0, "avg_logprob": -0.1531265613644622, + "compression_ratio": 1.5663716814159292, "no_speech_prob": 0.0023909492883831263}, + {"id": 83, "seek": 59592, "start": 618.4799999999999, "end": 623.4399999999999, + "text": " Yeah, so let me tackle the first questions around Westpaw and the kind + of languages that", "tokens": [51492, 865, 11, 370, 718, 385, 14896, 264, 700, 1651, + 926, 4055, 79, 1607, 293, 264, 733, 295, 8650, 300, 51740], "temperature": 0.0, + "avg_logprob": -0.1531265613644622, "compression_ratio": 1.5663716814159292, "no_speech_prob": + 0.0023909492883831263}, {"id": 84, "seek": 62344, "start": 623.44, "end": 631.12, + "text": " be used. And there''s a lot of things here to cover. So Westpaw is around + 1.7 million", "tokens": [50364, 312, 1143, 13, 400, 456, 311, 257, 688, 295, 721, + 510, 281, 2060, 13, 407, 4055, 79, 1607, 307, 926, 502, 13, 22, 2459, 50748], "temperature": + 0.0, "avg_logprob": -0.14532268478209714, "compression_ratio": 1.5108695652173914, + "no_speech_prob": 0.0014025408308953047}, {"id": 85, "seek": 62344, "start": 631.12, + "end": 639.6800000000001, "text": " lines of code, the total Westpaw platform. And + it''s a roughly 50% is written in Java. And 50%", "tokens": [50748, 3876, 295, 3089, + 11, 264, 3217, 4055, 79, 1607, 3663, 13, 400, 309, 311, 257, 9810, 2625, 4, 307, + 3720, 294, 10745, 13, 400, 2625, 4, 51176], "temperature": 0.0, "avg_logprob": -0.14532268478209714, + "compression_ratio": 1.5108695652173914, "no_speech_prob": 0.0014025408308953047}, + {"id": 86, "seek": 62344, "start": 639.6800000000001, "end": 647.2, "text": " is + written in C++. And why do we use two different languages and what are the trade-offs? 
+ So in the", "tokens": [51176, 307, 3720, 294, 383, 25472, 13, 400, 983, 360, 321, + 764, 732, 819, 8650, 293, 437, 366, 264, 4923, 12, 19231, 30, 407, 294, 264, 51552], + "temperature": 0.0, "avg_logprob": -0.14532268478209714, "compression_ratio": 1.5108695652173914, + "no_speech_prob": 0.0014025408308953047}, {"id": 87, "seek": 64720, "start": 647.2, + "end": 654.0, "text": " Westpaw architecture, we made a clear distinction between + what we call the cluster that holds the", "tokens": [50364, 4055, 79, 1607, 9482, + 11, 321, 1027, 257, 1850, 16844, 1296, 437, 321, 818, 264, 13630, 300, 9190, 264, + 50704], "temperature": 0.0, "avg_logprob": -0.09892934976622116, "compression_ratio": + 1.6860986547085202, "no_speech_prob": 0.0003257182252127677}, {"id": 88, "seek": + 64720, "start": 654.0, "end": 661.2800000000001, "text": " content where you actually + index and invert the documents and you have all the data structures for", "tokens": + [50704, 2701, 689, 291, 767, 8186, 293, 33966, 264, 8512, 293, 291, 362, 439, 264, + 1412, 9227, 337, 51068], "temperature": 0.0, "avg_logprob": -0.09892934976622116, + "compression_ratio": 1.6860986547085202, "no_speech_prob": 0.0003257182252127677}, + {"id": 89, "seek": 64720, "start": 661.2800000000001, "end": 666.08, "text": " fast + searching in these data structures. The content layer is written in C++ because + you''re", "tokens": [51068, 2370, 10808, 294, 613, 1412, 9227, 13, 440, 2701, 4583, + 307, 3720, 294, 383, 25472, 570, 291, 434, 51308], "temperature": 0.0, "avg_logprob": + -0.09892934976622116, "compression_ratio": 1.6860986547085202, "no_speech_prob": + 0.0003257182252127677}, {"id": 90, "seek": 64720, "start": 666.08, "end": 672.32, + "text": " managing a lot of data. You have the data that you need to have in memory + and so on. 
So", "tokens": [51308, 11642, 257, 688, 295, 1412, 13, 509, 362, 264, + 1412, 300, 291, 643, 281, 362, 294, 4675, 293, 370, 322, 13, 407, 51620], "temperature": + 0.0, "avg_logprob": -0.09892934976622116, "compression_ratio": 1.6860986547085202, + "no_speech_prob": 0.0003257182252127677}, {"id": 91, "seek": 67232, "start": 672.96, + "end": 680.96, "text": " and it needs to be fairly efficient. And then on there + what we call the stateless layer is the", "tokens": [50396, 293, 309, 2203, 281, + 312, 6457, 7148, 13, 400, 550, 322, 456, 437, 321, 818, 264, 2219, 4272, 4583, 307, + 264, 50796], "temperature": 0.0, "avg_logprob": -0.22794599533081056, "compression_ratio": + 1.5483870967741935, "no_speech_prob": 0.0021192936692386866}, {"id": 92, "seek": + 67232, "start": 680.96, "end": 688.96, "text": " layer that actually interacts with + user requests. So user requests comes in. It''s accepted by", "tokens": [50796, + 4583, 300, 767, 43582, 365, 4195, 12475, 13, 407, 4195, 12475, 1487, 294, 13, 467, + 311, 9035, 538, 51196], "temperature": 0.0, "avg_logprob": -0.22794599533081056, + "compression_ratio": 1.5483870967741935, "no_speech_prob": 0.0021192936692386866}, + {"id": 93, "seek": 67232, "start": 688.96, "end": 696.5600000000001, "text": " HSP + server and there you do and that layer is written in Java. So you can also then + deploy plugins.", "tokens": [51196, 389, 27921, 7154, 293, 456, 291, 360, 293, 300, + 4583, 307, 3720, 294, 10745, 13, 407, 291, 393, 611, 550, 7274, 33759, 13, 51576], + "temperature": 0.0, "avg_logprob": -0.22794599533081056, "compression_ratio": 1.5483870967741935, + "no_speech_prob": 0.0021192936692386866}, {"id": 94, "seek": 69656, "start": 696.64, + "end": 702.4, "text": " You can write your own searcher functions that can dispatch + the query and get a reply. 
And you", "tokens": [50368, 509, 393, 2464, 428, 1065, + 3164, 260, 6828, 300, 393, 36729, 264, 14581, 293, 483, 257, 16972, 13, 400, 291, + 50656], "temperature": 0.0, "avg_logprob": -0.1156198952787666, "compression_ratio": + 1.6478260869565218, "no_speech_prob": 0.002656460739672184}, {"id": 95, "seek": + 69656, "start": 702.4, "end": 708.3199999999999, "text": " don''t. It''s transparent + from a given searcher if you have a 100 node cluster or if you have a", "tokens": + [50656, 500, 380, 13, 467, 311, 12737, 490, 257, 2212, 3164, 260, 498, 291, 362, + 257, 2319, 9984, 13630, 420, 498, 291, 362, 257, 50952], "temperature": 0.0, "avg_logprob": + -0.1156198952787666, "compression_ratio": 1.6478260869565218, "no_speech_prob": + 0.002656460739672184}, {"id": 96, "seek": 69656, "start": 708.3199999999999, "end": + 715.3599999999999, "text": " single node cluster. So that''s kind of hidden away + when you deploy a plugin. So those languages", "tokens": [50952, 2167, 9984, 13630, + 13, 407, 300, 311, 733, 295, 7633, 1314, 562, 291, 7274, 257, 23407, 13, 407, 729, + 8650, 51304], "temperature": 0.0, "avg_logprob": -0.1156198952787666, "compression_ratio": + 1.6478260869565218, "no_speech_prob": 0.002656460739672184}, {"id": 97, "seek": + 69656, "start": 715.3599999999999, "end": 720.9599999999999, "text": " have different + trade-offs. So it''s a lot easier for people to write plugins using Java without", + "tokens": [51304, 362, 819, 4923, 12, 19231, 13, 407, 309, 311, 257, 688, 3571, + 337, 561, 281, 2464, 33759, 1228, 10745, 1553, 51584], "temperature": 0.0, "avg_logprob": + -0.1156198952787666, "compression_ratio": 1.6478260869565218, "no_speech_prob": + 0.002656460739672184}, {"id": 98, "seek": 72096, "start": 721.0400000000001, "end": + 728.32, "text": " shooting themselves in the foot using C++. 
So in the content layer + in C++ we don''t allow any kind", "tokens": [50368, 5942, 2969, 294, 264, 2671, + 1228, 383, 25472, 13, 407, 294, 264, 2701, 4583, 294, 383, 25472, 321, 500, 380, + 2089, 604, 733, 50732], "temperature": 0.0, "avg_logprob": -0.22338106099841665, + "compression_ratio": 1.706140350877193, "no_speech_prob": 0.0014847111888229847}, + {"id": 99, "seek": 72096, "start": 728.32, "end": 733.0400000000001, "text": " of + plugins. You can contribute or you can contribute to the open source but then it + needs to be", "tokens": [50732, 295, 33759, 13, 509, 393, 10586, 420, 291, 393, + 10586, 281, 264, 1269, 4009, 457, 550, 309, 2203, 281, 312, 50968], "temperature": + 0.0, "avg_logprob": -0.22338106099841665, "compression_ratio": 1.706140350877193, + "no_speech_prob": 0.0014847111888229847}, {"id": 100, "seek": 72096, "start": 733.0400000000001, + "end": 738.4000000000001, "text": " a kind of feature. We don''t allow you to embed + a library or something into the content layer.", "tokens": [50968, 257, 733, 295, + 4111, 13, 492, 500, 380, 2089, 291, 281, 12240, 257, 6405, 420, 746, 666, 264, 2701, + 4583, 13, 51236], "temperature": 0.0, "avg_logprob": -0.22338106099841665, "compression_ratio": + 1.706140350877193, "no_speech_prob": 0.0014847111888229847}, {"id": 101, "seek": + 72096, "start": 739.12, "end": 744.72, "text": " So that''s a trade-off. So then + you mentioned Python. We have a Python, what we call pi-wespa which is", "tokens": + [51272, 407, 300, 311, 257, 4923, 12, 4506, 13, 407, 550, 291, 2835, 15329, 13, + 492, 362, 257, 15329, 11, 437, 321, 818, 3895, 12, 86, 279, 4306, 597, 307, 51552], + "temperature": 0.0, "avg_logprob": -0.22338106099841665, "compression_ratio": 1.706140350877193, + "no_speech_prob": 0.0014847111888229847}, {"id": 102, "seek": 74472, "start": 745.44, + "end": 753.84, "text": " language binding on top of the HDP API. So it''s not of + the core kind of westpides. 
It''s an API where we built", "tokens": [50400, 2856, + 17359, 322, 1192, 295, 264, 389, 11373, 9362, 13, 407, 309, 311, 406, 295, 264, + 4965, 733, 295, 7009, 79, 1875, 13, 467, 311, 364, 9362, 689, 321, 3094, 50820], + "temperature": 0.0, "avg_logprob": -0.34311512154592594, "compression_ratio": 1.5181347150259068, + "no_speech_prob": 0.004754857160151005}, {"id": 103, "seek": 74472, "start": 755.84, + "end": 762.32, "text": " around interacting with westpa, doing model evaluation + and evaluating for example different", "tokens": [50920, 926, 18017, 365, 7009, + 4306, 11, 884, 2316, 13344, 293, 27479, 337, 1365, 819, 51244], "temperature": 0.0, + "avg_logprob": -0.34311512154592594, "compression_ratio": 1.5181347150259068, "no_speech_prob": + 0.004754857160151005}, {"id": 104, "seek": 74472, "start": 762.32, "end": 768.8000000000001, + "text": " retrieval and writing strategies. So that''s the kind of language. And + regarding your scene,", "tokens": [51244, 19817, 3337, 293, 3579, 9029, 13, 407, + 300, 311, 264, 733, 295, 2856, 13, 400, 8595, 428, 4145, 11, 51568], "temperature": + 0.0, "avg_logprob": -0.34311512154592594, "compression_ratio": 1.5181347150259068, + "no_speech_prob": 0.004754857160151005}, {"id": 105, "seek": 76880, "start": 768.8, + "end": 775.12, "text": " Apache Lucene. So if I recall correctly, I think Apache + Lucene started in 1998. So around", "tokens": [50364, 46597, 9593, 1450, 13, 407, + 498, 286, 9901, 8944, 11, 286, 519, 46597, 9593, 1450, 1409, 294, 21404, 13, 407, + 926, 50680], "temperature": 0.0, "avg_logprob": -0.23998415093672903, "compression_ratio": + 1.5836909871244635, "no_speech_prob": 0.004530901554971933}, {"id": 106, "seek": + 76880, "start": 775.12, "end": 782.8, "text": " time. 
So there''s a lot of inspiration + of course and it''s not that many ways you can build", "tokens": [50680, 565, 13, + 407, 456, 311, 257, 688, 295, 10249, 295, 1164, 293, 309, 311, 406, 300, 867, 2098, + 291, 393, 1322, 51064], "temperature": 0.0, "avg_logprob": -0.23998415093672903, + "compression_ratio": 1.5836909871244635, "no_speech_prob": 0.004530901554971933}, + {"id": 107, "seek": 76880, "start": 782.8, "end": 790.3199999999999, "text": " a + search engine. So I''m losing pretty much, it''s a really good library. So yeah, + definitely we look", "tokens": [51064, 257, 3164, 2848, 13, 407, 286, 478, 7027, + 1238, 709, 11, 309, 311, 257, 534, 665, 6405, 13, 407, 1338, 11, 2138, 321, 574, + 51440], "temperature": 0.0, "avg_logprob": -0.23998415093672903, "compression_ratio": + 1.5836909871244635, "no_speech_prob": 0.004530901554971933}, {"id": 108, "seek": + 76880, "start": 790.3199999999999, "end": 796.3199999999999, "text": " at what''s + happening in open source and they have a lot of admiration for the work and the", + "tokens": [51440, 412, 437, 311, 2737, 294, 1269, 4009, 293, 436, 362, 257, 688, + 295, 44597, 337, 264, 589, 293, 264, 51740], "temperature": 0.0, "avg_logprob": + -0.23998415093672903, "compression_ratio": 1.5836909871244635, "no_speech_prob": + 0.004530901554971933}, {"id": 109, "seek": 79632, "start": 796.32, "end": 801.6, + "text": " committers of Apache Lucene. I mean, it''s a great job that they''ve done + and they''ll be able to", "tokens": [50364, 5599, 1559, 295, 46597, 9593, 1450, + 13, 286, 914, 11, 309, 311, 257, 869, 1691, 300, 436, 600, 1096, 293, 436, 603, + 312, 1075, 281, 50628], "temperature": 0.0, "avg_logprob": -0.24683005923316592, + "compression_ratio": 1.63135593220339, "no_speech_prob": 0.000335570250172168}, + {"id": 110, "seek": 79632, "start": 801.6, "end": 809.6, "text": " develop this + over 20 years. 
And the core difference is between westpun, Apache Lucene is that + westpun", "tokens": [50628, 1499, 341, 670, 945, 924, 13, 400, 264, 4965, 2649, + 307, 1296, 7009, 79, 409, 11, 46597, 9593, 1450, 307, 300, 7009, 79, 409, 51028], + "temperature": 0.0, "avg_logprob": -0.24683005923316592, "compression_ratio": 1.63135593220339, + "no_speech_prob": 0.000335570250172168}, {"id": 111, "seek": 79632, "start": 809.6, + "end": 814.1600000000001, "text": " is a full kind of engine. So it becomes more + of like comparing westpun with elastic search or", "tokens": [51028, 307, 257, 1577, + 733, 295, 2848, 13, 407, 309, 3643, 544, 295, 411, 15763, 7009, 79, 409, 365, 17115, + 3164, 420, 51256], "temperature": 0.0, "avg_logprob": -0.24683005923316592, "compression_ratio": + 1.63135593220339, "no_speech_prob": 0.000335570250172168}, {"id": 112, "seek": 79632, + "start": 814.1600000000001, "end": 821.2, "text": " Apache Zoolar, which is kind + of an engine on top. So there''s no like westpun library which you", "tokens": [51256, + 46597, 1176, 1092, 289, 11, 597, 307, 733, 295, 364, 2848, 322, 1192, 13, 407, 456, + 311, 572, 411, 7009, 79, 409, 6405, 597, 291, 51608], "temperature": 0.0, "avg_logprob": + -0.24683005923316592, "compression_ratio": 1.63135593220339, "no_speech_prob": 0.000335570250172168}, + {"id": 113, "seek": 82120, "start": 821.2, "end": 825.76, "text": " can use. 
You + have to kind of buy the whole, you have to buy the whole platform.", "tokens": [50364, + 393, 764, 13, 509, 362, 281, 733, 295, 2256, 264, 1379, 11, 291, 362, 281, 2256, + 264, 1379, 3663, 13, 50592], "temperature": 0.0, "avg_logprob": -0.14024956737245833, + "compression_ratio": 1.7388059701492538, "no_speech_prob": 0.0014419262297451496}, + {"id": 114, "seek": 82120, "start": 826.72, "end": 831.84, "text": " Yes, basically + like a web server around it and all the components like the nodes and overseer", + "tokens": [50640, 1079, 11, 1936, 411, 257, 3670, 7154, 926, 309, 293, 439, 264, + 6677, 411, 264, 13891, 293, 11916, 260, 50896], "temperature": 0.0, "avg_logprob": + -0.14024956737245833, "compression_ratio": 1.7388059701492538, "no_speech_prob": + 0.0014419262297451496}, {"id": 115, "seek": 82120, "start": 831.84, "end": 836.72, + "text": " and other architectural elements. Yeah, for sure. And on the Python side, + I''m also curious like", "tokens": [50896, 293, 661, 26621, 4959, 13, 865, 11, 337, + 988, 13, 400, 322, 264, 15329, 1252, 11, 286, 478, 611, 6369, 411, 51140], "temperature": + 0.0, "avg_logprob": -0.14024956737245833, "compression_ratio": 1.7388059701492538, + "no_speech_prob": 0.0014419262297451496}, {"id": 116, "seek": 82120, "start": 836.72, + "end": 841.76, "text": " with all the development of models and you know, hugging + face and you can pretty much find a paper", "tokens": [51140, 365, 439, 264, 3250, + 295, 5245, 293, 291, 458, 11, 41706, 1851, 293, 291, 393, 1238, 709, 915, 257, 3035, + 51392], "temperature": 0.0, "avg_logprob": -0.14024956737245833, "compression_ratio": + 1.7388059701492538, "no_speech_prob": 0.0014419262297451496}, {"id": 117, "seek": + 82120, "start": 841.76, "end": 848.1600000000001, "text": " and then most likely + there is a model already available in some shape and form. 
And so the Python", "tokens": + [51392, 293, 550, 881, 3700, 456, 307, 257, 2316, 1217, 2435, 294, 512, 3909, 293, + 1254, 13, 400, 370, 264, 15329, 51712], "temperature": 0.0, "avg_logprob": -0.14024956737245833, + "compression_ratio": 1.7388059701492538, "no_speech_prob": 0.0014419262297451496}, + {"id": 118, "seek": 84816, "start": 848.24, "end": 854.24, "text": " layer in westpun + does it help you know newcomers to kind of easier experiment with these models in", + "tokens": [50368, 4583, 294, 7009, 79, 409, 775, 309, 854, 291, 458, 40014, 433, + 281, 733, 295, 3571, 5120, 365, 613, 5245, 294, 50668], "temperature": 0.0, "avg_logprob": + -0.20577346400210733, "compression_ratio": 1.5793991416309012, "no_speech_prob": + 0.0011269384995102882}, {"id": 119, "seek": 84816, "start": 854.24, "end": 862.9599999999999, + "text": " conjunction with westpun? We do hope so. And that was one of the goals + for making py westpun.", "tokens": [50668, 27482, 365, 7009, 79, 409, 30, 492, 360, + 1454, 370, 13, 400, 300, 390, 472, 295, 264, 5493, 337, 1455, 10664, 7009, 79, 409, + 13, 51104], "temperature": 0.0, "avg_logprob": -0.20577346400210733, "compression_ratio": + 1.5793991416309012, "no_speech_prob": 0.0011269384995102882}, {"id": 120, "seek": + 84816, "start": 862.9599999999999, "end": 868.64, "text": " So there are different + kind of use cases where you if you have like a more of a low", "tokens": [51104, + 407, 456, 366, 819, 733, 295, 764, 3331, 689, 291, 498, 291, 362, 411, 257, 544, + 295, 257, 2295, 51388], "temperature": 0.0, "avg_logprob": -0.20577346400210733, + "compression_ratio": 1.5793991416309012, "no_speech_prob": 0.0011269384995102882}, + {"id": 121, "seek": 84816, "start": 868.64, "end": 874.0799999999999, "text": " + query volume, maybe you have 200,000 documents or something like that, you know, + not really", "tokens": [51388, 14581, 5523, 11, 1310, 291, 362, 2331, 11, 1360, + 8512, 420, 746, 411, 300, 11, 291, 458, 11, 406, 534, 51660], 
"temperature": 0.0, + "avg_logprob": -0.20577346400210733, "compression_ratio": 1.5793991416309012, "no_speech_prob": + 0.0011269384995102882}, {"id": 122, "seek": 87408, "start": 874.32, "end": 881.2800000000001, + "text": " not really very low latency and so on. Then you can use Python and do + embeddings and you can play", "tokens": [50376, 406, 534, 588, 2295, 27043, 293, + 370, 322, 13, 1396, 291, 393, 764, 15329, 293, 360, 12240, 29432, 293, 291, 393, + 862, 50724], "temperature": 0.0, "avg_logprob": -0.16675648790724734, "compression_ratio": + 1.6462882096069869, "no_speech_prob": 0.0017631014343351126}, {"id": 123, "seek": + 87408, "start": 881.2800000000001, "end": 886.32, "text": " then it natively works + with hugging face and all those libraries that are typically written in", "tokens": + [50724, 550, 309, 8470, 356, 1985, 365, 41706, 1851, 293, 439, 729, 15148, 300, + 366, 5850, 3720, 294, 50976], "temperature": 0.0, "avg_logprob": -0.16675648790724734, + "compression_ratio": 1.6462882096069869, "no_speech_prob": 0.0017631014343351126}, + {"id": 124, "seek": 87408, "start": 886.32, "end": 893.84, "text": " Python. And + then you can use westpun, just purely HTTP based APIs and so on. 
The other option,", + "tokens": [50976, 15329, 13, 400, 550, 291, 393, 764, 7009, 79, 409, 11, 445, 17491, + 33283, 2361, 21445, 293, 370, 322, 13, 440, 661, 3614, 11, 51352], "temperature": + 0.0, "avg_logprob": -0.16675648790724734, "compression_ratio": 1.6462882096069869, + "no_speech_prob": 0.0017631014343351126}, {"id": 125, "seek": 87408, "start": 893.84, + "end": 899.36, "text": " which is more involved, I have to say, and that is that + you can take a transformer model,", "tokens": [51352, 597, 307, 544, 3288, 11, 286, + 362, 281, 584, 11, 293, 300, 307, 300, 291, 393, 747, 257, 31782, 2316, 11, 51628], + "temperature": 0.0, "avg_logprob": -0.16675648790724734, "compression_ratio": 1.6462882096069869, + "no_speech_prob": 0.0017631014343351126}, {"id": 126, "seek": 89936, "start": 899.44, + "end": 907.2, "text": " for example, and export it to one X format or on X, which + is open neural network exchange format.", "tokens": [50368, 337, 1365, 11, 293, + 10725, 309, 281, 472, 1783, 7877, 420, 322, 1783, 11, 597, 307, 1269, 18161, 3209, + 7742, 7877, 13, 50756], "temperature": 0.0, "avg_logprob": -0.15498650868733724, + "compression_ratio": 1.7174887892376682, "no_speech_prob": 0.005225573666393757}, + {"id": 127, "seek": 89936, "start": 907.2, "end": 916.08, "text": " So that''s a + kind of open neural network format that multiple companies like Microsoft, I think", + "tokens": [50756, 407, 300, 311, 257, 733, 295, 1269, 18161, 3209, 7877, 300, 3866, + 3431, 411, 8116, 11, 286, 519, 51200], "temperature": 0.0, "avg_logprob": -0.15498650868733724, + "compression_ratio": 1.7174887892376682, "no_speech_prob": 0.005225573666393757}, + {"id": 128, "seek": 89936, "start": 916.08, "end": 922.0, "text": " also Facebook + have rallied around, you know, this open format. 
So you can take the transformer", + "tokens": [51200, 611, 4384, 362, 31552, 1091, 926, 11, 291, 458, 11, 341, 1269, + 7877, 13, 407, 291, 393, 747, 264, 31782, 51496], "temperature": 0.0, "avg_logprob": + -0.15498650868733724, "compression_ratio": 1.7174887892376682, "no_speech_prob": + 0.005225573666393757}, {"id": 129, "seek": 89936, "start": 922.0, "end": 928.72, + "text": " models from the hugging face library and then you can export it to on + X and then you can import", "tokens": [51496, 5245, 490, 264, 41706, 1851, 6405, + 293, 550, 291, 393, 10725, 309, 281, 322, 1783, 293, 550, 291, 393, 974, 51832], + "temperature": 0.0, "avg_logprob": -0.15498650868733724, "compression_ratio": 1.7174887892376682, + "no_speech_prob": 0.005225573666393757}, {"id": 130, "seek": 92872, "start": 928.72, + "end": 936.32, "text": " all next models into westpun for evaluation. And westpun + we integrate with on X runtime,", "tokens": [50364, 439, 958, 5245, 666, 7009, 79, + 409, 337, 13344, 13, 400, 7009, 79, 409, 321, 13365, 365, 322, 1783, 34474, 11, + 50744], "temperature": 0.0, "avg_logprob": -0.22077992497658244, "compression_ratio": + 1.6244541484716157, "no_speech_prob": 0.0015304171247407794}, {"id": 131, "seek": + 92872, "start": 936.32, "end": 941.9200000000001, "text": " which is open source + library from Microsoft, which has a lot of different language findings,", "tokens": + [50744, 597, 307, 1269, 4009, 6405, 490, 8116, 11, 597, 575, 257, 688, 295, 819, + 2856, 16483, 11, 51024], "temperature": 0.0, "avg_logprob": -0.22077992497658244, + "compression_ratio": 1.6244541484716157, "no_speech_prob": 0.0015304171247407794}, + {"id": 132, "seek": 92872, "start": 941.9200000000001, "end": 949.2, "text": " Python, + C++, Java. So it''s a really great library and we integrate with that. 
So you don''t + use", "tokens": [51024, 15329, 11, 383, 25472, 11, 10745, 13, 407, 309, 311, 257, + 534, 869, 6405, 293, 321, 13365, 365, 300, 13, 407, 291, 500, 380, 764, 51388], + "temperature": 0.0, "avg_logprob": -0.22077992497658244, "compression_ratio": 1.6244541484716157, + "no_speech_prob": 0.0015304171247407794}, {"id": 133, "seek": 92872, "start": 949.2, + "end": 954.8000000000001, "text": " it directly, but we have like you can put the + model here, westpun you can be use it and you can", "tokens": [51388, 309, 3838, + 11, 457, 321, 362, 411, 291, 393, 829, 264, 2316, 510, 11, 7009, 79, 409, 291, 393, + 312, 764, 309, 293, 291, 393, 51668], "temperature": 0.0, "avg_logprob": -0.22077992497658244, + "compression_ratio": 1.6244541484716157, "no_speech_prob": 0.0015304171247407794}, + {"id": 134, "seek": 95480, "start": 954.88, "end": 961.52, "text": " invoke it and + so on. And those models and then you''re kind of a trade off between, you know,", + "tokens": [50368, 41117, 309, 293, 370, 322, 13, 400, 729, 5245, 293, 550, 291, + 434, 733, 295, 257, 4923, 766, 1296, 11, 291, 458, 11, 50700], "temperature": 0.0, + "avg_logprob": -0.17766854070848034, "compression_ratio": 1.6784452296819787, "no_speech_prob": + 0.006654925644397736}, {"id": 135, "seek": 95480, "start": 962.24, "end": 967.5999999999999, + "text": " getting to know westpun playing around with it and then, you know, maybe + low QPS, but in the", "tokens": [50736, 1242, 281, 458, 7009, 79, 409, 2433, 926, + 365, 309, 293, 550, 11, 291, 458, 11, 1310, 2295, 1249, 6273, 11, 457, 294, 264, + 51004], "temperature": 0.0, "avg_logprob": -0.17766854070848034, "compression_ratio": + 1.6784452296819787, "no_speech_prob": 0.006654925644397736}, {"id": 136, "seek": + 95480, "start": 967.5999999999999, "end": 973.1999999999999, "text": " scenario + where you have a really large scale, you want to do 100,000 per cent back and there''s", + "tokens": [51004, 9005, 689, 291, 362, 257, 534, 2416, 4373, 11, 291, 
528, 281, + 360, 2319, 11, 1360, 680, 1489, 646, 293, 456, 311, 51284], "temperature": 0.0, + "avg_logprob": -0.17766854070848034, "compression_ratio": 1.6784452296819787, "no_speech_prob": + 0.006654925644397736}, {"id": 137, "seek": 95480, "start": 973.1999999999999, "end": + 977.68, "text": " something like that, then you move it to on X and deploy it actually + inside the westpun cluster,", "tokens": [51284, 746, 411, 300, 11, 550, 291, 1286, + 309, 281, 322, 1783, 293, 7274, 309, 767, 1854, 264, 7009, 79, 409, 13630, 11, 51508], + "temperature": 0.0, "avg_logprob": -0.17766854070848034, "compression_ratio": 1.6784452296819787, + "no_speech_prob": 0.006654925644397736}, {"id": 138, "seek": 95480, "start": 977.68, + "end": 984.0799999999999, "text": " which has many benefits because then you don''t + transfer a lot of data over the network and so on,", "tokens": [51508, 597, 575, + 867, 5311, 570, 550, 291, 500, 380, 5003, 257, 688, 295, 1412, 670, 264, 3209, 293, + 370, 322, 11, 51828], "temperature": 0.0, "avg_logprob": -0.17766854070848034, "compression_ratio": + 1.6784452296819787, "no_speech_prob": 0.006654925644397736}, {"id": 139, "seek": + 98408, "start": 984.08, "end": 990.72, "text": " because network is still even, + you know, within the data centers, maybe the network limitations have", "tokens": + [50364, 570, 3209, 307, 920, 754, 11, 291, 458, 11, 1951, 264, 1412, 10898, 11, + 1310, 264, 3209, 15705, 362, 50696], "temperature": 0.0, "avg_logprob": -0.20668819386471984, + "compression_ratio": 1.6431718061674008, "no_speech_prob": 0.0009269010042771697}, + {"id": 140, "seek": 98408, "start": 991.76, "end": 1000.32, "text": " this sold + so you can get 10 gigs or 25 gigs even, but going cross region, then latency is + still", "tokens": [50748, 341, 3718, 370, 291, 393, 483, 1266, 34586, 420, 3552, + 34586, 754, 11, 457, 516, 3278, 4458, 11, 550, 27043, 307, 920, 51176], "temperature": + 0.0, "avg_logprob": -0.20668819386471984, "compression_ratio": 
1.6431718061674008, + "no_speech_prob": 0.0009269010042771697}, {"id": 141, "seek": 98408, "start": 1001.0400000000001, + "end": 1006.5600000000001, "text": " concern and that''s that''s one thing that + really fascinates me is that we''re still sometimes,", "tokens": [51212, 3136, 293, + 300, 311, 300, 311, 472, 551, 300, 534, 7184, 259, 1024, 385, 307, 300, 321, 434, + 920, 2171, 11, 51488], "temperature": 0.0, "avg_logprob": -0.20668819386471984, + "compression_ratio": 1.6431718061674008, "no_speech_prob": 0.0009269010042771697}, + {"id": 142, "seek": 98408, "start": 1006.5600000000001, "end": 1012.48, "text": + " you know, the use cases are bottlenecked by the speed of the light, right? So + yeah,", "tokens": [51488, 291, 458, 11, 264, 764, 3331, 366, 44641, 44118, 538, + 264, 3073, 295, 264, 1442, 11, 558, 30, 407, 1338, 11, 51784], "temperature": 0.0, + "avg_logprob": -0.20668819386471984, "compression_ratio": 1.6431718061674008, "no_speech_prob": + 0.0009269010042771697}, {"id": 143, "seek": 101248, "start": 1012.64, "end": 1016.88, + "text": " going from the east goes to the west coast and the US is easily 100 milliseconds. 
+ So", "tokens": [50372, 516, 490, 264, 10648, 1709, 281, 264, 7009, 8684, 293, 264, + 2546, 307, 3612, 2319, 34184, 13, 407, 50584], "temperature": 0.0, "avg_logprob": + -0.2670500095073993, "compression_ratio": 1.5533980582524272, "no_speech_prob": + 0.0024898636620491743}, {"id": 144, "seek": 101248, "start": 1017.9200000000001, + "end": 1021.36, "text": " hasn''t been yet canceled or sold so yeah, physics.", + "tokens": [50636, 6132, 380, 668, 1939, 24839, 420, 3718, 370, 1338, 11, 10649, + 13, 50808], "temperature": 0.0, "avg_logprob": -0.2670500095073993, "compression_ratio": + 1.5533980582524272, "no_speech_prob": 0.0024898636620491743}, {"id": 145, "seek": + 101248, "start": 1024.64, "end": 1031.2, "text": " Yeah, this is fantastic and and + so and also like even before we go into this wonderful world", "tokens": [50972, + 865, 11, 341, 307, 5456, 293, 293, 370, 293, 611, 411, 754, 949, 321, 352, 666, + 341, 3715, 1002, 51300], "temperature": 0.0, "avg_logprob": -0.2670500095073993, + "compression_ratio": 1.5533980582524272, "no_speech_prob": 0.0024898636620491743}, + {"id": 146, "seek": 101248, "start": 1031.2, "end": 1037.68, "text": " of models + and latest advancements like I''m still curious also to dig into the item that you", + "tokens": [51300, 295, 5245, 293, 6792, 7295, 1117, 411, 286, 478, 920, 6369, 611, + 281, 2528, 666, 264, 3174, 300, 291, 51624], "temperature": 0.0, "avg_logprob": + -0.2670500095073993, "compression_ratio": 1.5533980582524272, "no_speech_prob": + 0.0024898636620491743}, {"id": 147, "seek": 103768, "start": 1037.68, "end": 1044.72, + "text": " mentioned like you when when you have been evolving westpun over time, + you found a need to add", "tokens": [50364, 2835, 411, 291, 562, 562, 291, 362, + 668, 21085, 7009, 79, 409, 670, 565, 11, 291, 1352, 257, 643, 281, 909, 50716], + "temperature": 0.0, "avg_logprob": -0.19216473182935392, "compression_ratio": 1.8046511627906976, + "no_speech_prob": 0.003391799284145236}, {"id": 148, 
"seek": 103768, "start": 1044.72, + "end": 1049.92, "text": " something really interesting, some really interesting + data structures like tensors you mentioned and", "tokens": [50716, 746, 534, 1880, + 11, 512, 534, 1880, 1412, 9227, 411, 10688, 830, 291, 2835, 293, 50976], "temperature": + 0.0, "avg_logprob": -0.19216473182935392, "compression_ratio": 1.8046511627906976, + "no_speech_prob": 0.003391799284145236}, {"id": 149, "seek": 103768, "start": 1049.92, + "end": 1057.6000000000001, "text": " like could you elaborate a bit more on how + this need arise and also like, you know, what are the", "tokens": [50976, 411, 727, + 291, 20945, 257, 857, 544, 322, 577, 341, 643, 20288, 293, 611, 411, 11, 291, 458, + 11, 437, 366, 264, 51360], "temperature": 0.0, "avg_logprob": -0.19216473182935392, + "compression_ratio": 1.8046511627906976, "no_speech_prob": 0.003391799284145236}, + {"id": 150, "seek": 103768, "start": 1058.5600000000002, "end": 1064.3200000000002, + "text": " use cases, typical use cases for it today and also how accessible to an + average user of westpun", "tokens": [51408, 764, 3331, 11, 7476, 764, 3331, 337, + 309, 965, 293, 611, 577, 9515, 281, 364, 4274, 4195, 295, 7009, 79, 409, 51696], + "temperature": 0.0, "avg_logprob": -0.19216473182935392, "compression_ratio": 1.8046511627906976, + "no_speech_prob": 0.003391799284145236}, {"id": 151, "seek": 106432, "start": 1064.32, + "end": 1073.9199999999998, "text": " so to say. Yeah, so I''ll do a little bit of + history on that. 
So the best for document model you", "tokens": [50364, 370, 281, + 584, 13, 865, 11, 370, 286, 603, 360, 257, 707, 857, 295, 2503, 322, 300, 13, 407, + 264, 1151, 337, 4166, 2316, 291, 50844], "temperature": 0.0, "avg_logprob": -0.21822569105360243, + "compression_ratio": 1.7432432432432432, "no_speech_prob": 0.005370495840907097}, + {"id": 152, "seek": 106432, "start": 1073.9199999999998, "end": 1079.6, "text": + " write has a fixed kind of you have to have a defined schema in westpun. So you + have to define it", "tokens": [50844, 2464, 575, 257, 6806, 733, 295, 291, 362, + 281, 362, 257, 7642, 34078, 294, 7009, 79, 409, 13, 407, 291, 362, 281, 6964, 309, + 51128], "temperature": 0.0, "avg_logprob": -0.21822569105360243, "compression_ratio": + 1.7432432432432432, "no_speech_prob": 0.005370495840907097}, {"id": 153, "seek": + 106432, "start": 1079.6, "end": 1085.4399999999998, "text": " for instance, you + have a document type called document and it has a title, it has maybe some time", + "tokens": [51128, 337, 5197, 11, 291, 362, 257, 4166, 2010, 1219, 4166, 293, 309, + 575, 257, 4876, 11, 309, 575, 1310, 512, 565, 51420], "temperature": 0.0, "avg_logprob": + -0.21822569105360243, "compression_ratio": 1.7432432432432432, "no_speech_prob": + 0.005370495840907097}, {"id": 154, "seek": 106432, "start": 1085.4399999999998, + "end": 1092.0, "text": " stamp, it might be have an integer attribute. 
So there + are different like normal document model,", "tokens": [51420, 9921, 11, 309, 1062, + 312, 362, 364, 24922, 19667, 13, 407, 456, 366, 819, 411, 2710, 4166, 2316, 11, + 51748], "temperature": 0.0, "avg_logprob": -0.21822569105360243, "compression_ratio": + 1.7432432432432432, "no_speech_prob": 0.005370495840907097}, {"id": 155, "seek": + 109200, "start": 1092.0, "end": 1099.92, "text": " what you expect from kind of + any any schema oriented database and we also had vectors so you can do", "tokens": + [50364, 437, 291, 2066, 490, 733, 295, 604, 604, 34078, 21841, 8149, 293, 321, 611, + 632, 18875, 370, 291, 393, 360, 50760], "temperature": 0.0, "avg_logprob": -0.22447259426116944, + "compression_ratio": 1.7431192660550459, "no_speech_prob": 0.00042848754674196243}, + {"id": 156, "seek": 109200, "start": 1100.8, "end": 1106.88, "text": " early on + that you can actually do brute force dot products as part of ranking because that + was", "tokens": [50804, 2440, 322, 300, 291, 393, 767, 360, 47909, 3464, 5893, 3383, + 382, 644, 295, 17833, 570, 300, 390, 51108], "temperature": 0.0, "avg_logprob": + -0.22447259426116944, "compression_ratio": 1.7431192660550459, "no_speech_prob": + 0.00042848754674196243}, {"id": 157, "seek": 109200, "start": 1106.88, "end": 1112.72, + "text": " really popular among in in your you know for various ranking requirements + you will multiply or", "tokens": [51108, 534, 3743, 3654, 294, 294, 428, 291, 458, + 337, 3683, 17833, 7728, 291, 486, 12972, 420, 51400], "temperature": 0.0, "avg_logprob": + -0.22447259426116944, "compression_ratio": 1.7431192660550459, "no_speech_prob": + 0.00042848754674196243}, {"id": 158, "seek": 109200, "start": 1112.72, "end": 1117.44, + "text": " sorry, you will perform multiple different dot products over the documents + that you you''re", "tokens": [51400, 2597, 11, 291, 486, 2042, 3866, 819, 5893, + 3383, 670, 264, 8512, 300, 291, 291, 434, 51636], "temperature": 0.0, "avg_logprob": + 
-0.22447259426116944, "compression_ratio": 1.7431192660550459, "no_speech_prob": + 0.00042848754674196243}, {"id": 159, "seek": 111744, "start": 1117.44, "end": 1126.0800000000002, + "text": " queried as retreat then in around 2013 2014 the researchers in your outside, + you know, we really want", "tokens": [50364, 7083, 1091, 382, 15505, 550, 294, 926, + 9012, 8227, 264, 10309, 294, 428, 2380, 11, 291, 458, 11, 321, 534, 528, 50796], + "temperature": 0.0, "avg_logprob": -0.23751659393310548, "compression_ratio": 1.5181347150259068, + "no_speech_prob": 0.0006269071018323302}, {"id": 160, "seek": 111744, "start": 1126.0800000000002, + "end": 1133.28, "text": " to express these type of recommendation models where we + can use the general concept of tensor so", "tokens": [50796, 281, 5109, 613, 2010, + 295, 11879, 5245, 689, 321, 393, 764, 264, 2674, 3410, 295, 40863, 370, 51156], + "temperature": 0.0, "avg_logprob": -0.23751659393310548, "compression_ratio": 1.5181347150259068, + "no_speech_prob": 0.0006269071018323302}, {"id": 161, "seek": 111744, "start": 1133.28, + "end": 1139.6000000000001, "text": " not just storing a vector in the document but + even a matrix and they had some use cases around", "tokens": [51156, 406, 445, 26085, + 257, 8062, 294, 264, 4166, 457, 754, 257, 8141, 293, 436, 632, 512, 764, 3331, 926, + 51472], "temperature": 0.0, "avg_logprob": -0.23751659393310548, "compression_ratio": + 1.5181347150259068, "no_speech_prob": 0.0006269071018323302}, {"id": 162, "seek": + 113960, "start": 1140.56, "end": 1147.6, "text": " recommendation. 
So for instance + in the in the in the document you can represent in the matrix so", "tokens": [50412, + 11879, 13, 407, 337, 5197, 294, 264, 294, 264, 294, 264, 4166, 291, 393, 2906, 294, + 264, 8141, 370, 50764], "temperature": 0.0, "avg_logprob": -0.16803231693449475, + "compression_ratio": 1.9646464646464648, "no_speech_prob": 0.0013109511928632855}, + {"id": 163, "seek": 113960, "start": 1147.6, "end": 1155.4399999999998, "text": + " you can have multiple is this document popular in multiple different categories + for example that", "tokens": [50764, 291, 393, 362, 3866, 307, 341, 4166, 3743, + 294, 3866, 819, 10479, 337, 1365, 300, 51156], "temperature": 0.0, "avg_logprob": + -0.16803231693449475, "compression_ratio": 1.9646464646464648, "no_speech_prob": + 0.0013109511928632855}, {"id": 164, "seek": 113960, "start": 1155.4399999999998, + "end": 1161.6799999999998, "text": " you know this document is popular among people + that are interested in use this is in the ones that", "tokens": [51156, 291, 458, + 341, 4166, 307, 3743, 3654, 561, 300, 366, 3102, 294, 764, 341, 307, 294, 264, 2306, + 300, 51468], "temperature": 0.0, "avg_logprob": -0.16803231693449475, "compression_ratio": + 1.9646464646464648, "no_speech_prob": 0.0013109511928632855}, {"id": 165, "seek": + 113960, "start": 1161.6799999999998, "end": 1168.08, "text": " they''re interested + in finance and so on. 
So it''s a really like complex and complex like that you", + "tokens": [51468, 436, 434, 3102, 294, 10719, 293, 370, 322, 13, 407, 309, 311, + 257, 534, 411, 3997, 293, 3997, 411, 300, 291, 51788], "temperature": 0.0, "avg_logprob": + -0.16803231693449475, "compression_ratio": 1.9646464646464648, "no_speech_prob": + 0.0013109511928632855}, {"id": 166, "seek": 116808, "start": 1168.08, "end": 1173.1999999999998, + "text": " can actually have both the tensors in the in the document side but also + the query side and then", "tokens": [50364, 393, 767, 362, 1293, 264, 10688, 830, + 294, 264, 294, 264, 4166, 1252, 457, 611, 264, 14581, 1252, 293, 550, 50620], "temperature": + 0.0, "avg_logprob": -0.1717419007245232, "compression_ratio": 1.784037558685446, + "no_speech_prob": 4.991051537217572e-05}, {"id": 167, "seek": 116808, "start": 1173.1999999999998, + "end": 1179.84, "text": " you can do during the ranking phase you can evaluate these + kind of expressions so it''s a really", "tokens": [50620, 291, 393, 360, 1830, 264, + 17833, 5574, 291, 393, 13059, 613, 733, 295, 15277, 370, 309, 311, 257, 534, 50952], + "temperature": 0.0, "avg_logprob": -0.1717419007245232, "compression_ratio": 1.784037558685446, + "no_speech_prob": 4.991051537217572e-05}, {"id": 168, "seek": 116808, "start": 1181.4399999999998, + "end": 1188.6399999999999, "text": " it''s a really powerful the language and one + example concrete example is we haven''t touched on", "tokens": [51032, 309, 311, + 257, 534, 4005, 264, 2856, 293, 472, 1365, 9859, 1365, 307, 321, 2378, 380, 9828, + 322, 51392], "temperature": 0.0, "avg_logprob": -0.1717419007245232, "compression_ratio": + 1.784037558685446, "no_speech_prob": 4.991051537217572e-05}, {"id": 169, "seek": + 116808, "start": 1188.6399999999999, "end": 1195.1999999999998, "text": " the language + models and so on but for instance the callbert model which is contextualized late", + "tokens": [51392, 264, 2856, 5245, 293, 370, 322, 457, 337, 5197, 264, 818, 
4290, + 2316, 597, 307, 35526, 1602, 3469, 51720], "temperature": 0.0, "avg_logprob": -0.1717419007245232, + "compression_ratio": 1.784037558685446, "no_speech_prob": 4.991051537217572e-05}, + {"id": 170, "seek": 119520, "start": 1196.0800000000002, "end": 1203.52, "text": + " interaction overbert where you actually take the query is not represented as one + vector", "tokens": [50408, 9285, 670, 4290, 689, 291, 767, 747, 264, 14581, 307, + 406, 10379, 382, 472, 8062, 50780], "temperature": 0.0, "avg_logprob": -0.13808503935608682, + "compression_ratio": 1.9841269841269842, "no_speech_prob": 0.001746696187183261}, + {"id": 171, "seek": 119520, "start": 1203.52, "end": 1209.04, "text": " but each + of the terms in the queries represent the desivector and similar on the document + side", "tokens": [50780, 457, 1184, 295, 264, 2115, 294, 264, 24109, 2906, 264, + 730, 488, 1672, 293, 2531, 322, 264, 4166, 1252, 51056], "temperature": 0.0, "avg_logprob": + -0.13808503935608682, "compression_ratio": 1.9841269841269842, "no_speech_prob": + 0.001746696187183261}, {"id": 172, "seek": 119520, "start": 1209.04, "end": 1215.28, + "text": " each of the document terms are represented as a vector and then at runtime + you retrieve documents", "tokens": [51056, 1184, 295, 264, 4166, 2115, 366, 10379, + 382, 257, 8062, 293, 550, 412, 34474, 291, 30254, 8512, 51368], "temperature": 0.0, + "avg_logprob": -0.13808503935608682, "compression_ratio": 1.9841269841269842, "no_speech_prob": + 0.001746696187183261}, {"id": 173, "seek": 119520, "start": 1215.28, "end": 1221.44, + "text": " and then you rank them based on this maximum similarity function so it + takes the vector of the", "tokens": [51368, 293, 550, 291, 6181, 552, 2361, 322, + 341, 6674, 32194, 2445, 370, 309, 2516, 264, 8062, 295, 264, 51676], "temperature": + 0.0, "avg_logprob": -0.13808503935608682, "compression_ratio": 1.9841269841269842, + "no_speech_prob": 0.001746696187183261}, {"id": 174, "seek": 122144, "start": 
1221.44, + "end": 1228.72, "text": " first term and then it performs k dot products against + the vectors of the document terms and then", "tokens": [50364, 700, 1433, 293, 550, + 309, 26213, 350, 5893, 3383, 1970, 264, 18875, 295, 264, 4166, 2115, 293, 550, 50728], + "temperature": 0.0, "avg_logprob": -0.1543291532076322, "compression_ratio": 1.916256157635468, + "no_speech_prob": 0.00021294938051141798}, {"id": 175, "seek": 122144, "start": + 1228.72, "end": 1234.0, "text": " you you take the maximum of that score and then + you do that for all of the terms and the final is", "tokens": [50728, 291, 291, + 747, 264, 6674, 295, 300, 6175, 293, 550, 291, 360, 300, 337, 439, 295, 264, 2115, + 293, 264, 2572, 307, 50992], "temperature": 0.0, "avg_logprob": -0.1543291532076322, + "compression_ratio": 1.916256157635468, "no_speech_prob": 0.00021294938051141798}, + {"id": 176, "seek": 122144, "start": 1234.0, "end": 1239.6000000000001, "text": + " the it''s a sum so that was actually one of the things that I personally the tensors + hadn''t been", "tokens": [50992, 264, 309, 311, 257, 2408, 370, 300, 390, 767, 472, + 295, 264, 721, 300, 286, 5665, 264, 10688, 830, 8782, 380, 668, 51272], "temperature": + 0.0, "avg_logprob": -0.1543291532076322, "compression_ratio": 1.916256157635468, + "no_speech_prob": 0.00021294938051141798}, {"id": 177, "seek": 122144, "start": + 1239.6000000000001, "end": 1246.0800000000002, "text": " that much used for search + use cases but more around recommendation use cases when I when I when I", "tokens": + [51272, 300, 709, 1143, 337, 3164, 764, 3331, 457, 544, 926, 11879, 764, 3331, 562, + 286, 562, 286, 562, 286, 51596], "temperature": 0.0, "avg_logprob": -0.1543291532076322, + "compression_ratio": 1.916256157635468, "no_speech_prob": 0.00021294938051141798}, + {"id": 178, "seek": 124608, "start": 1246.6399999999999, "end": 1252.48, "text": + " saw callbert and I saw the maximum operator I was like this is just perfect fit + for for the 
best", "tokens": [50392, 1866, 818, 4290, 293, 286, 1866, 264, 6674, + 12973, 286, 390, 411, 341, 307, 445, 2176, 3318, 337, 337, 264, 1151, 50684], "temperature": + 0.0, "avg_logprob": -0.16759918512922994, "compression_ratio": 1.7399103139013452, + "no_speech_prob": 0.0010916386963799596}, {"id": 179, "seek": 124608, "start": 1252.48, + "end": 1262.1599999999999, "text": " potential it''s a perfect use case so yeah + yeah that''s one example yeah awesome once you described", "tokens": [50684, 3995, + 309, 311, 257, 2176, 764, 1389, 370, 1338, 1338, 300, 311, 472, 1365, 1338, 3476, + 1564, 291, 7619, 51168], "temperature": 0.0, "avg_logprob": -0.16759918512922994, + "compression_ratio": 1.7399103139013452, "no_speech_prob": 0.0010916386963799596}, + {"id": 180, "seek": 124608, "start": 1262.1599999999999, "end": 1266.6399999999999, + "text": " like when you go like many models today is like okay embedded spas that + you embed this paragraph", "tokens": [51168, 411, 562, 291, 352, 411, 867, 5245, + 965, 307, 411, 1392, 16741, 637, 296, 300, 291, 12240, 341, 18865, 51392], "temperature": + 0.0, "avg_logprob": -0.16759918512922994, "compression_ratio": 1.7399103139013452, + "no_speech_prob": 0.0010916386963799596}, {"id": 181, "seek": 124608, "start": 1266.6399999999999, + "end": 1274.48, "text": " whatever but if if you need to go world level that''s + like lots of data lots of computation right", "tokens": [51392, 2035, 457, 498, + 498, 291, 643, 281, 352, 1002, 1496, 300, 311, 411, 3195, 295, 1412, 3195, 295, + 24903, 558, 51784], "temperature": 0.0, "avg_logprob": -0.16759918512922994, "compression_ratio": + 1.7399103139013452, "no_speech_prob": 0.0010916386963799596}, {"id": 182, "seek": + 127448, "start": 1275.28, "end": 1281.6, "text": " so how you would even do this + sounds like tensors have found the use case there", "tokens": [50404, 370, 577, + 291, 576, 754, 360, 341, 3263, 411, 10688, 830, 362, 1352, 264, 764, 1389, 456, + 50720], "temperature": 0.0, 
"avg_logprob": -0.23685801714316182, "compression_ratio": + 1.8066037735849056, "no_speech_prob": 0.0004518931673374027}, {"id": 183, "seek": + 127448, "start": 1283.1200000000001, "end": 1290.88, "text": " yeah and in in in + in callbert what when we we represented callbert on less also we did a large sample", + "tokens": [50796, 1338, 293, 294, 294, 294, 294, 818, 4290, 437, 562, 321, 321, + 10379, 818, 4290, 322, 1570, 611, 321, 630, 257, 2416, 6889, 51184], "temperature": + 0.0, "avg_logprob": -0.23685801714316182, "compression_ratio": 1.8066037735849056, + "no_speech_prob": 0.0004518931673374027}, {"id": 184, "seek": 127448, "start": 1290.88, + "end": 1298.08, "text": " application around the ms marker dataset the passage ranking + dataset of mammoth marker so we made", "tokens": [51184, 3861, 926, 264, 275, 82, + 15247, 28872, 264, 11497, 17833, 28872, 295, 19033, 900, 15247, 370, 321, 1027, + 51544], "temperature": 0.0, "avg_logprob": -0.23685801714316182, "compression_ratio": + 1.8066037735849056, "no_speech_prob": 0.0004518931673374027}, {"id": 185, "seek": + 127448, "start": 1298.08, "end": 1304.0, "text": " a sample app where you can combine + these different retrieval and ranking strategies and but in our case", "tokens": + [51544, 257, 6889, 724, 689, 291, 393, 10432, 613, 819, 19817, 3337, 293, 17833, + 9029, 293, 457, 294, 527, 1389, 51840], "temperature": 0.0, "avg_logprob": -0.23685801714316182, + "compression_ratio": 1.8066037735849056, "no_speech_prob": 0.0004518931673374027}, + {"id": 186, "seek": 130400, "start": 1304.0, "end": 1310.4, "text": " we used callbert + as a re ranking model and that''s one of the really strength of of espice that", + "tokens": [50364, 321, 1143, 818, 4290, 382, 257, 319, 17833, 2316, 293, 300, 311, + 472, 295, 264, 534, 3800, 295, 295, 7089, 573, 300, 50684], "temperature": 0.0, + "avg_logprob": -0.1403299119737413, "compression_ratio": 1.8285714285714285, "no_speech_prob": + 0.00021134539565537125}, {"id": 187, 
"seek": 130400, "start": 1311.04, "end": 1319.6, + "text": " we allow you to express really complex retrieval and ranking pipelines + so that you do a query and", "tokens": [50716, 321, 2089, 291, 281, 5109, 534, 3997, + 19817, 3337, 293, 17833, 40168, 370, 300, 291, 360, 257, 14581, 293, 51144], "temperature": + 0.0, "avg_logprob": -0.1403299119737413, "compression_ratio": 1.8285714285714285, + "no_speech_prob": 0.00021134539565537125}, {"id": 188, "seek": 130400, "start": + 1319.6, "end": 1324.88, "text": " then each of the nodes involved in the query they + will do a local ranking or matching and then you", "tokens": [51144, 550, 1184, + 295, 264, 13891, 3288, 294, 264, 14581, 436, 486, 360, 257, 2654, 17833, 420, 14324, + 293, 550, 291, 51408], "temperature": 0.0, "avg_logprob": -0.1403299119737413, "compression_ratio": + 1.8285714285714285, "no_speech_prob": 0.00021134539565537125}, {"id": 189, "seek": + 130400, "start": 1324.88, "end": 1331.2, "text": " could have a second face locally + on each node and then when you have the kind of global view", "tokens": [51408, + 727, 362, 257, 1150, 1851, 16143, 322, 1184, 9984, 293, 550, 562, 291, 362, 264, + 733, 295, 4338, 1910, 51724], "temperature": 0.0, "avg_logprob": -0.1403299119737413, + "compression_ratio": 1.8285714285714285, "no_speech_prob": 0.00021134539565537125}, + {"id": 190, "seek": 133120, "start": 1331.28, "end": 1335.8400000000001, "text": + " after you have done the scatter gather then you can do another re ranking face + because then you have", "tokens": [50368, 934, 291, 362, 1096, 264, 34951, 5448, + 550, 291, 393, 360, 1071, 319, 17833, 1851, 570, 550, 291, 362, 50596], "temperature": + 0.0, "avg_logprob": -0.15650162469773066, "compression_ratio": 1.7075812274368232, + "no_speech_prob": 0.0010767659405246377}, {"id": 191, "seek": 133120, "start": 1335.8400000000001, + "end": 1341.8400000000001, "text": " the global view so there are a lot of possibilities + to kind of trade off between 
accuracy and cost", "tokens": [50596, 264, 4338, 1910, + 370, 456, 366, 257, 688, 295, 12178, 281, 733, 295, 4923, 766, 1296, 14170, 293, + 2063, 50896], "temperature": 0.0, "avg_logprob": -0.15650162469773066, "compression_ratio": + 1.7075812274368232, "no_speech_prob": 0.0010767659405246377}, {"id": 192, "seek": + 133120, "start": 1341.8400000000001, "end": 1347.76, "text": " them yeah yeah exactly + and actually as you''ve been describing this I also realized that", "tokens": [50896, + 552, 1338, 1338, 2293, 293, 767, 382, 291, 600, 668, 16141, 341, 286, 611, 5334, + 300, 51192], "temperature": 0.0, "avg_logprob": -0.15650162469773066, "compression_ratio": + 1.7075812274368232, "no_speech_prob": 0.0010767659405246377}, {"id": 193, "seek": + 133120, "start": 1348.64, "end": 1354.0, "text": " we''ve been recently discussing + in one of the podcasts about multi-stage runker right so", "tokens": [51236, 321, + 600, 668, 3938, 10850, 294, 472, 295, 264, 24045, 466, 4825, 12, 17882, 367, 3197, + 260, 558, 370, 51504], "temperature": 0.0, "avg_logprob": -0.15650162469773066, + "compression_ratio": 1.7075812274368232, "no_speech_prob": 0.0010767659405246377}, + {"id": 194, "seek": 133120, "start": 1354.0, "end": 1359.92, "text": " you could + have either a sparse or dense retrieval but you can then later use your graph algorithm", + "tokens": [51504, 291, 727, 362, 2139, 257, 637, 11668, 420, 18011, 19817, 3337, + 457, 291, 393, 550, 1780, 764, 428, 4295, 9284, 51800], "temperature": 0.0, "avg_logprob": + -0.15650162469773066, "compression_ratio": 1.7075812274368232, "no_speech_prob": + 0.0010767659405246377}, {"id": 195, "seek": 135992, "start": 1360.4, "end": 1368.48, + "text": " to kind of like re rank the items I think it was in the podcast with Yuri + Markov the author of H&SW algorithm", "tokens": [50388, 281, 733, 295, 411, 319, + 6181, 264, 4754, 286, 519, 309, 390, 294, 264, 7367, 365, 33901, 3934, 5179, 264, + 3793, 295, 389, 5, 50, 54, 9284, 50792], 
"temperature": 0.0, "avg_logprob": -0.28793848673502603, + "compression_ratio": 1.4688995215311005, "no_speech_prob": 0.0009381374693475664}, + {"id": 196, "seek": 135992, "start": 1368.48, "end": 1376.88, "text": " and so have + you have you seen any use cases based on espice you know for multi-stage ranking + pipeline?", "tokens": [50792, 293, 370, 362, 291, 362, 291, 1612, 604, 764, 3331, + 2361, 322, 7089, 573, 291, 458, 337, 4825, 12, 17882, 17833, 15517, 30, 51212], + "temperature": 0.0, "avg_logprob": -0.28793848673502603, "compression_ratio": 1.4688995215311005, + "no_speech_prob": 0.0009381374693475664}, {"id": 197, "seek": 135992, "start": 1379.1200000000001, + "end": 1388.96, "text": " Definitely I mean so both the search internally in our + we also see this outside from customers", "tokens": [51324, 12151, 286, 914, 370, + 1293, 264, 3164, 19501, 294, 527, 321, 611, 536, 341, 2380, 490, 4581, 51816], "temperature": + 0.0, "avg_logprob": -0.28793848673502603, "compression_ratio": 1.4688995215311005, + "no_speech_prob": 0.0009381374693475664}, {"id": 198, "seek": 138896, "start": 1388.96, + "end": 1394.0, "text": " using last but they do multi-stage retrieval and ranking + pipelines so there''s basically", "tokens": [50364, 1228, 1036, 457, 436, 360, 4825, + 12, 17882, 19817, 3337, 293, 17833, 40168, 370, 456, 311, 1936, 50616], "temperature": + 0.0, "avg_logprob": -0.15914190106275605, "compression_ratio": 1.7061611374407584, + "no_speech_prob": 0.0008941815467551351}, {"id": 199, "seek": 138896, "start": 1396.56, + "end": 1402.56, "text": " the reason why you do it typically is that it''s too expensive + to evaluate", "tokens": [50744, 264, 1778, 983, 291, 360, 309, 5850, 307, 300, 309, + 311, 886, 5124, 281, 13059, 51044], "temperature": 0.0, "avg_logprob": -0.15914190106275605, + "compression_ratio": 1.7061611374407584, "no_speech_prob": 0.0008941815467551351}, + {"id": 200, "seek": 138896, "start": 1404.24, "end": 1410.56, "text": " the kind + of 
final ranking model over all the documents right so you take some kind of approximation", + "tokens": [51128, 264, 733, 295, 2572, 17833, 2316, 670, 439, 264, 8512, 558, 370, + 291, 747, 512, 733, 295, 28023, 51444], "temperature": 0.0, "avg_logprob": -0.15914190106275605, + "compression_ratio": 1.7061611374407584, "no_speech_prob": 0.0008941815467551351}, + {"id": 201, "seek": 138896, "start": 1410.56, "end": 1416.64, "text": " of that + model and then you execute that as to kind of candidate the treaver and I think + one of the", "tokens": [51444, 295, 300, 2316, 293, 550, 291, 14483, 300, 382, 281, + 733, 295, 11532, 264, 2192, 20655, 293, 286, 519, 472, 295, 264, 51748], "temperature": + 0.0, "avg_logprob": -0.15914190106275605, "compression_ratio": 1.7061611374407584, + "no_speech_prob": 0.0008941815467551351}, {"id": 202, "seek": 141664, "start": 1416.72, + "end": 1423.2800000000002, "text": " we haven''t talked about the vector search + capabilities of VESPA yet but one of the beauties of VESPA is", "tokens": [50368, + 321, 2378, 380, 2825, 466, 264, 8062, 3164, 10862, 295, 691, 2358, 10297, 1939, + 457, 472, 295, 264, 1869, 530, 295, 691, 2358, 10297, 307, 50696], "temperature": + 0.0, "avg_logprob": -0.16820520765325997, "compression_ratio": 1.72, "no_speech_prob": + 0.0008973479270935059}, {"id": 203, "seek": 141664, "start": 1423.2800000000002, + "end": 1429.2, "text": " that we after we integrate it approximate nearest neighbor + searches that you can do a combination", "tokens": [50696, 300, 321, 934, 321, 13365, + 309, 30874, 23831, 5987, 26701, 300, 291, 393, 360, 257, 6562, 50992], "temperature": + 0.0, "avg_logprob": -0.16820520765325997, "compression_ratio": 1.72, "no_speech_prob": + 0.0008973479270935059}, {"id": 204, "seek": 141664, "start": 1430.16, "end": 1435.68, + "text": " when you actually do the matching and querying that you can combine it + the regular sparse or", "tokens": [51040, 562, 291, 767, 360, 264, 14324, 293, 7083, + 1840, 300, 
291, 393, 10432, 309, 264, 3890, 637, 11668, 420, 51316], "temperature": + 0.0, "avg_logprob": -0.16820520765325997, "compression_ratio": 1.72, "no_speech_prob": + 0.0008973479270935059}, {"id": 205, "seek": 141664, "start": 1435.68, "end": 1442.24, + "text": " keyword search with a vector search and then you re rank and it''s kind + of paradigm of having", "tokens": [51316, 20428, 3164, 365, 257, 8062, 3164, 293, + 550, 291, 319, 6181, 293, 309, 311, 733, 295, 24709, 295, 1419, 51644], "temperature": + 0.0, "avg_logprob": -0.16820520765325997, "compression_ratio": 1.72, "no_speech_prob": + 0.0008973479270935059}, {"id": 206, "seek": 144224, "start": 1442.32, "end": 1447.2, + "text": " multiple stages you know you see that in the question answering pipelines + as well right or you have", "tokens": [50368, 3866, 10232, 291, 458, 291, 536, 300, + 294, 264, 1168, 13430, 40168, 382, 731, 558, 420, 291, 362, 50612], "temperature": + 0.0, "avg_logprob": -0.1214578769825123, "compression_ratio": 1.7399103139013452, + "no_speech_prob": 0.002143857069313526}, {"id": 207, "seek": 144224, "start": 1447.68, + "end": 1454.0, "text": " retriever and then you have what they call a reader right + so it basically finds some candidate", "tokens": [50636, 19817, 331, 293, 550, 291, + 362, 437, 436, 818, 257, 15149, 558, 370, 309, 1936, 10704, 512, 11532, 50952], + "temperature": 0.0, "avg_logprob": -0.1214578769825123, "compression_ratio": 1.7399103139013452, + "no_speech_prob": 0.002143857069313526}, {"id": 208, "seek": 144224, "start": 1454.0, + "end": 1461.36, "text": " passages from Wikipedia and then extracts in a reader + but evaluating the reader which is a complex", "tokens": [50952, 31589, 490, 28999, + 293, 550, 8947, 82, 294, 257, 15149, 457, 27479, 264, 15149, 597, 307, 257, 3997, + 51320], "temperature": 0.0, "avg_logprob": -0.1214578769825123, "compression_ratio": + 1.7399103139013452, "no_speech_prob": 0.002143857069313526}, {"id": 209, "seek": + 144224, "start": 
1461.36, "end": 1466.24, "text": " typically a transformer model + where you input both the query and the document at the same time", "tokens": [51320, + 5850, 257, 31782, 2316, 689, 291, 4846, 1293, 264, 14581, 293, 264, 4166, 412, 264, + 912, 565, 51564], "temperature": 0.0, "avg_logprob": -0.1214578769825123, "compression_ratio": + 1.7399103139013452, "no_speech_prob": 0.002143857069313526}, {"id": 210, "seek": + 146624, "start": 1466.32, "end": 1472.56, "text": " into the deep neural network + it''s very complex to actually evaluate that overall the potential", "tokens": [50368, + 666, 264, 2452, 18161, 3209, 309, 311, 588, 3997, 281, 767, 13059, 300, 4787, 264, + 3995, 50680], "temperature": 0.0, "avg_logprob": -0.19715923982508043, "compression_ratio": + 1.7324561403508771, "no_speech_prob": 0.0029489686712622643}, {"id": 211, "seek": + 146624, "start": 1473.1200000000001, "end": 1479.52, "text": " passages and user + types. It''s like super intensive and I''m super curious to drill into this topic + of", "tokens": [50708, 31589, 293, 4195, 3467, 13, 467, 311, 411, 1687, 18957, 293, + 286, 478, 1687, 6369, 281, 11392, 666, 341, 4829, 295, 51028], "temperature": 0.0, + "avg_logprob": -0.19715923982508043, "compression_ratio": 1.7324561403508771, "no_speech_prob": + 0.0029489686712622643}, {"id": 212, "seek": 146624, "start": 1479.52, "end": 1485.92, + "text": " like combining you know neural search with sparse search actually before + that as you''ve been talking", "tokens": [51028, 411, 21928, 291, 458, 18161, 3164, + 365, 637, 11668, 3164, 767, 949, 300, 382, 291, 600, 668, 1417, 51348], "temperature": + 0.0, "avg_logprob": -0.19715923982508043, "compression_ratio": 1.7324561403508771, + "no_speech_prob": 0.0029489686712622643}, {"id": 213, "seek": 146624, "start": 1485.92, + "end": 1492.4, "text": " I''ve realized I''m actually now taking a search with machine + learning course dot by grand ingressol", "tokens": [51348, 286, 600, 5334, 286, + 478, 767, 586, 
1940, 257, 3164, 365, 3479, 2539, 1164, 5893, 538, 2697, 3957, 735, + 401, 51672], "temperature": 0.0, "avg_logprob": -0.19715923982508043, "compression_ratio": + 1.7324561403508771, "no_speech_prob": 0.0029489686712622643}, {"id": 214, "seek": + 149240, "start": 1492.5600000000002, "end": 1499.1200000000001, "text": " Daniel + thank you like it''s a fantastic course I can highly recommend it it''s super intense + as well", "tokens": [50372, 8033, 1309, 291, 411, 309, 311, 257, 5456, 1164, 286, + 393, 5405, 2748, 309, 309, 311, 1687, 9447, 382, 731, 50700], "temperature": 0.0, + "avg_logprob": -0.16151119413829984, "compression_ratio": 1.7522522522522523, "no_speech_prob": + 0.005738195031881332}, {"id": 215, "seek": 149240, "start": 1499.1200000000001, + "end": 1505.76, "text": " and I think yesterday grand mentioned that there are companies + which you know that they really", "tokens": [50700, 293, 286, 519, 5186, 2697, 2835, + 300, 456, 366, 3431, 597, 291, 458, 300, 436, 534, 51032], "temperature": 0.0, "avg_logprob": + -0.16151119413829984, "compression_ratio": 1.7522522522522523, "no_speech_prob": + 0.005738195031881332}, {"id": 216, "seek": 149240, "start": 1505.76, "end": 1511.92, + "text": " need to optimize only like top one or top two results and they have built + models to optimize only", "tokens": [51032, 643, 281, 19719, 787, 411, 1192, 472, + 420, 1192, 732, 3542, 293, 436, 362, 3094, 5245, 281, 19719, 787, 51340], "temperature": + 0.0, "avg_logprob": -0.16151119413829984, "compression_ratio": 1.7522522522522523, + "no_speech_prob": 0.005738195031881332}, {"id": 217, "seek": 149240, "start": 1511.92, + "end": 1518.96, "text": " that top one or top two which sounds like my mind blowing + right and that was like something maybe", "tokens": [51340, 300, 1192, 472, 420, + 1192, 732, 597, 3263, 411, 452, 1575, 15068, 558, 293, 300, 390, 411, 746, 1310, + 51692], "temperature": 0.0, "avg_logprob": -0.16151119413829984, "compression_ratio": + 
1.7522522522522523, "no_speech_prob": 0.005738195031881332}, {"id": 218, "seek": + 151896, "start": 1519.04, "end": 1525.3600000000001, "text": " this applies to web + scale to some extent and one of my recent experiences is actually in web scale", + "tokens": [50368, 341, 13165, 281, 3670, 4373, 281, 512, 8396, 293, 472, 295, 452, + 5162, 5235, 307, 767, 294, 3670, 4373, 50684], "temperature": 0.0, "avg_logprob": + -0.1052474021911621, "compression_ratio": 1.5797872340425532, "no_speech_prob": + 0.0009645342361181974}, {"id": 219, "seek": 151896, "start": 1525.3600000000001, + "end": 1532.0, "text": " search engine we have a mobile screen and so we can only + show top three results and the target is", "tokens": [50684, 3164, 2848, 321, 362, + 257, 6013, 2568, 293, 370, 321, 393, 787, 855, 1192, 1045, 3542, 293, 264, 3779, + 307, 51016], "temperature": 0.0, "avg_logprob": -0.1052474021911621, "compression_ratio": + 1.5797872340425532, "no_speech_prob": 0.0009645342361181974}, {"id": 220, "seek": + 151896, "start": 1532.0, "end": 1541.3600000000001, "text": " obviously to have + a high CTR and so we''ve quickly noticed that if you do a sparse search without + any", "tokens": [51016, 2745, 281, 362, 257, 1090, 19529, 49, 293, 370, 321, 600, + 2661, 5694, 300, 498, 291, 360, 257, 637, 11668, 3164, 1553, 604, 51484], "temperature": + 0.0, "avg_logprob": -0.1052474021911621, "compression_ratio": 1.5797872340425532, + "no_speech_prob": 0.0009645342361181974}, {"id": 221, "seek": 154136, "start": 1541.36, + "end": 1548.6399999999999, "text": " logic on the query whatsoever CTR is very low + so you have to do some tricks like", "tokens": [50364, 9952, 322, 264, 14581, 17076, + 19529, 49, 307, 588, 2295, 370, 291, 362, 281, 360, 512, 11733, 411, 50728], "temperature": + 0.0, "avg_logprob": -0.09863645059091074, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.003454318270087242}, {"id": 222, "seek": 154136, "start": 1549.1999999999998, + "end": 1553.52, 
"text": " query understanding and also trying to increase precision + at the same time maintaining the", "tokens": [50756, 14581, 3701, 293, 611, 1382, + 281, 3488, 18356, 412, 264, 912, 565, 14916, 264, 50972], "temperature": 0.0, "avg_logprob": + -0.09863645059091074, "compression_ratio": 1.6457399103139014, "no_speech_prob": + 0.003454318270087242}, {"id": 223, "seek": 154136, "start": 1553.52, "end": 1560.08, + "text": " diversity of search in some degree so you know basically it''s very easy + that with sparse search", "tokens": [50972, 8811, 295, 3164, 294, 512, 4314, 370, + 291, 458, 1936, 309, 311, 588, 1858, 300, 365, 637, 11668, 3164, 51300], "temperature": + 0.0, "avg_logprob": -0.09863645059091074, "compression_ratio": 1.6457399103139014, + "no_speech_prob": 0.003454318270087242}, {"id": 224, "seek": 154136, "start": 1560.08, + "end": 1567.1999999999998, "text": " you hit just the tip of that iceberg and basically + say okay I have three teacher jobs for you which", "tokens": [51300, 291, 2045, + 445, 264, 4125, 295, 300, 38880, 293, 1936, 584, 1392, 286, 362, 1045, 5027, 4782, + 337, 291, 597, 51656], "temperature": 0.0, "avg_logprob": -0.09863645059091074, + "compression_ratio": 1.6457399103139014, "no_speech_prob": 0.003454318270087242}, + {"id": 225, "seek": 156720, "start": 1567.52, "end": 1572.48, "text": " is not that + interesting because we don''t know if the user is looking for teacher jobs right + so", "tokens": [50380, 307, 406, 300, 1880, 570, 321, 500, 380, 458, 498, 264, 4195, + 307, 1237, 337, 5027, 4782, 558, 370, 50628], "temperature": 0.0, "avg_logprob": + -0.1585766077041626, "compression_ratio": 1.7017543859649122, "no_speech_prob": + 0.0016459384933114052}, {"id": 226, "seek": 156720, "start": 1572.48, "end": 1579.92, + "text": " so that''s that''s like have you seen cases like this I think these are + really really challenging ones", "tokens": [50628, 370, 300, 311, 300, 311, 411, + 362, 291, 1612, 3331, 411, 341, 286, 519, 613, 
366, 534, 534, 7595, 2306, 51000], + "temperature": 0.0, "avg_logprob": -0.1585766077041626, "compression_ratio": 1.7017543859649122, + "no_speech_prob": 0.0016459384933114052}, {"id": 227, "seek": 156720, "start": 1581.2, + "end": 1587.3600000000001, "text": " yeah but generally if you look at the results + like if you ever rate on MS Markov for example", "tokens": [51064, 1338, 457, 5101, + 498, 291, 574, 412, 264, 3542, 411, 498, 291, 1562, 3314, 322, 7395, 3934, 5179, + 337, 1365, 51372], "temperature": 0.0, "avg_logprob": -0.1585766077041626, "compression_ratio": + 1.7017543859649122, "no_speech_prob": 0.0016459384933114052}, {"id": 228, "seek": + 156720, "start": 1587.3600000000001, "end": 1594.32, "text": " and the official + metric there which is the mean the reciprocal rank right so if you get the perfect", + "tokens": [51372, 293, 264, 4783, 20678, 456, 597, 307, 264, 914, 264, 46948, 6181, + 558, 370, 498, 291, 483, 264, 2176, 51720], "temperature": 0.0, "avg_logprob": -0.1585766077041626, + "compression_ratio": 1.7017543859649122, "no_speech_prob": 0.0016459384933114052}, + {"id": 229, "seek": 159432, "start": 1594.32, "end": 1598.1599999999999, "text": + " the actual relevant document you''re able to retrieve it and put it in position + one", "tokens": [50364, 264, 3539, 7340, 4166, 291, 434, 1075, 281, 30254, 309, + 293, 829, 309, 294, 2535, 472, 50556], "temperature": 0.0, "avg_logprob": -0.10449729495578342, + "compression_ratio": 1.669683257918552, "no_speech_prob": 0.0007100481889210641}, + {"id": 230, "seek": 159432, "start": 1599.04, "end": 1604.32, "text": " that query + gets a score of one right but if you put it in the second place it''s gonna have + a score", "tokens": [50600, 300, 14581, 2170, 257, 6175, 295, 472, 558, 457, 498, + 291, 829, 309, 294, 264, 1150, 1081, 309, 311, 799, 362, 257, 6175, 50864], "temperature": + 0.0, "avg_logprob": -0.10449729495578342, "compression_ratio": 1.669683257918552, + "no_speech_prob": 
0.0007100481889210641}, {"id": 231, "seek": 159432, "start": 1604.32, + "end": 1611.2, "text": " of 0.5 I think that''s really good measure when you talk + about mobile screen precision at 1 2 3 so", "tokens": [50864, 295, 1958, 13, 20, + 286, 519, 300, 311, 534, 665, 3481, 562, 291, 751, 466, 6013, 2568, 18356, 412, + 502, 568, 805, 370, 51208], "temperature": 0.0, "avg_logprob": -0.10449729495578342, + "compression_ratio": 1.669683257918552, "no_speech_prob": 0.0007100481889210641}, + {"id": 232, "seek": 159432, "start": 1611.84, "end": 1617.6799999999998, "text": + " that''s really important but in the kind of retrieval not this stage retrieval + and ranking", "tokens": [51240, 300, 311, 534, 1021, 457, 294, 264, 733, 295, 19817, + 3337, 406, 341, 3233, 19817, 3337, 293, 17833, 51532], "temperature": 0.0, "avg_logprob": + -0.10449729495578342, "compression_ratio": 1.669683257918552, "no_speech_prob": + 0.0007100481889210641}, {"id": 233, "seek": 161768, "start": 1617.68, "end": 1625.6000000000001, + "text": " pipeline it makes sense to spend more of the computational budget within + the lakes SLA on those", "tokens": [50364, 15517, 309, 1669, 2020, 281, 3496, 544, + 295, 264, 28270, 4706, 1951, 264, 25595, 318, 11435, 322, 729, 50760], "temperature": + 0.0, "avg_logprob": -0.16362902522087097, "compression_ratio": 1.4736842105263157, + "no_speech_prob": 0.0033677159808576107}, {"id": 234, "seek": 161768, "start": 1625.6000000000001, + "end": 1633.92, "text": " top K hits right so like in when you go to Google today + and you do a search probably 100", "tokens": [50760, 1192, 591, 8664, 558, 370, + 411, 294, 562, 291, 352, 281, 3329, 965, 293, 291, 360, 257, 3164, 1391, 2319, 51176], + "temperature": 0.0, "avg_logprob": -0.16362902522087097, "compression_ratio": 1.4736842105263157, + "no_speech_prob": 0.0033677159808576107}, {"id": 235, "seek": 161768, "start": 1633.92, + "end": 1641.44, "text": " million documents will be excluded in just a fraction + of a 
millisecond right and then there are", "tokens": [51176, 2459, 8512, 486, 312, + 29486, 294, 445, 257, 14135, 295, 257, 27940, 18882, 558, 293, 550, 456, 366, 51552], + "temperature": 0.0, "avg_logprob": -0.16362902522087097, "compression_ratio": 1.4736842105263157, + "no_speech_prob": 0.0033677159808576107}, {"id": 236, "seek": 164144, "start": 1641.44, + "end": 1648.8, "text": " multiple stages and you can be sure that the kind of the + the 10 last documents from the previous", "tokens": [50364, 3866, 10232, 293, 291, + 393, 312, 988, 300, 264, 733, 295, 264, 264, 1266, 1036, 8512, 490, 264, 3894, 50732], + "temperature": 0.0, "avg_logprob": -0.1745302860553448, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.003506377572193742}, {"id": 237, "seek": 164144, "start": 1648.8, + "end": 1658.3200000000002, "text": " stages that the invest more time or computer + computational resources on those hits yeah yeah exactly", "tokens": [50732, 10232, + 300, 264, 1963, 544, 565, 420, 3820, 28270, 3593, 322, 729, 8664, 1338, 1338, 2293, + 51208], "temperature": 0.0, "avg_logprob": -0.1745302860553448, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.003506377572193742}, {"id": 238, "seek": + 164144, "start": 1658.3200000000002, "end": 1663.68, "text": " and also the good + thing is that is that because that''s when I talked about the West", "tokens": [51208, + 293, 611, 264, 665, 551, 307, 300, 307, 300, 570, 300, 311, 562, 286, 2825, 466, + 264, 4055, 51476], "temperature": 0.0, "avg_logprob": -0.1745302860553448, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.003506377572193742}, {"id": 239, "seek": + 164144, "start": 1663.68, "end": 1668.72, "text": " Architecture where you have + this division between stateless which is doing the scatter gather", "tokens": [51476, + 43049, 689, 291, 362, 341, 10044, 1296, 2219, 4272, 597, 307, 884, 264, 34951, 5448, + 51728], "temperature": 0.0, "avg_logprob": -0.1745302860553448, 
"compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.003506377572193742}, {"id": 240, "seek": + 166872, "start": 1669.3600000000001, "end": 1675.04, "text": " and each of the local + nodes is that basically in a search engine today you need to move", "tokens": [50396, + 293, 1184, 295, 264, 2654, 13891, 307, 300, 1936, 294, 257, 3164, 2848, 965, 291, + 643, 281, 1286, 50680], "temperature": 0.0, "avg_logprob": -0.20608916979157524, + "compression_ratio": 1.7877358490566038, "no_speech_prob": 0.0054605938494205475}, + {"id": 241, "seek": 166872, "start": 1675.92, "end": 1682.64, "text": " move computations + both to the data but there''s a lot of talk about you know moving or separating", + "tokens": [50724, 1286, 2807, 763, 1293, 281, 264, 1412, 457, 456, 311, 257, 688, + 295, 751, 466, 291, 458, 2684, 420, 29279, 51060], "temperature": 0.0, "avg_logprob": + -0.20608916979157524, "compression_ratio": 1.7877358490566038, "no_speech_prob": + 0.0054605938494205475}, {"id": 242, "seek": 166872, "start": 1682.64, "end": 1690.16, + "text": " compute from storage which is a huge thing right in the cloud but in search + in search use cases", "tokens": [51060, 14722, 490, 6725, 597, 307, 257, 2603, 551, + 558, 294, 264, 4588, 457, 294, 3164, 294, 3164, 764, 3331, 51436], "temperature": + 0.0, "avg_logprob": -0.20608916979157524, "compression_ratio": 1.7877358490566038, + "no_speech_prob": 0.0054605938494205475}, {"id": 243, "seek": 166872, "start": 1690.16, + "end": 1696.96, "text": " with high triplet high document or you need to be able + to do both you need to do fast computations", "tokens": [51436, 365, 1090, 1376, + 14657, 1090, 4166, 420, 291, 643, 281, 312, 1075, 281, 360, 1293, 291, 643, 281, + 360, 2370, 2807, 763, 51776], "temperature": 0.0, "avg_logprob": -0.20608916979157524, + "compression_ratio": 1.7877358490566038, "no_speech_prob": 0.0054605938494205475}, + {"id": 244, "seek": 169696, "start": 1697.92, "end": 1704.88, "text": " across multiple + 
nodes and then you transfer data like in last well each of the hits that you can", + "tokens": [50412, 2108, 3866, 13891, 293, 550, 291, 5003, 1412, 411, 294, 1036, + 731, 1184, 295, 264, 8664, 300, 291, 393, 50760], "temperature": 0.0, "avg_logprob": + -0.14356700940565628, "compression_ratio": 1.7066666666666668, "no_speech_prob": + 0.0014597148401662707}, {"id": 245, "seek": 169696, "start": 1705.92, "end": 1712.16, + "text": " include ranking features or so on that the subsequent phases can actually + use for re-ranking", "tokens": [50812, 4090, 17833, 4122, 420, 370, 322, 300, 264, + 19962, 18764, 393, 767, 764, 337, 319, 12, 20479, 278, 51124], "temperature": 0.0, + "avg_logprob": -0.14356700940565628, "compression_ratio": 1.7066666666666668, "no_speech_prob": + 0.0014597148401662707}, {"id": 246, "seek": 169696, "start": 1712.72, "end": 1718.64, + "text": " and the good thing is like I know you''ve done a talk about diversity + in search results and so on", "tokens": [51152, 293, 264, 665, 551, 307, 411, 286, + 458, 291, 600, 1096, 257, 751, 466, 8811, 294, 3164, 3542, 293, 370, 322, 51448], + "temperature": 0.0, "avg_logprob": -0.14356700940565628, "compression_ratio": 1.7066666666666668, + "no_speech_prob": 0.0014597148401662707}, {"id": 247, "seek": 169696, "start": 1718.64, + "end": 1725.8400000000001, "text": " is that you need to have that global view you + know in order to kind of optimize for for diversity", "tokens": [51448, 307, 300, + 291, 643, 281, 362, 300, 4338, 1910, 291, 458, 294, 1668, 281, 733, 295, 19719, + 337, 337, 8811, 51808], "temperature": 0.0, "avg_logprob": -0.14356700940565628, + "compression_ratio": 1.7066666666666668, "no_speech_prob": 0.0014597148401662707}, + {"id": 248, "seek": 172584, "start": 1725.84, "end": 1730.24, "text": " and then + you can kind of throw away a lot of the hits that you''re not going to show because + of", "tokens": [50364, 293, 550, 291, 393, 733, 295, 3507, 1314, 257, 688, 295, + 264, 8664, 300, 291, 
434, 406, 516, 281, 855, 570, 295, 50584], "temperature": 0.0, + "avg_logprob": -0.08826788039434524, "compression_ratio": 1.8818897637795275, "no_speech_prob": + 0.0001775760465534404}, {"id": 249, "seek": 172584, "start": 1730.24, "end": 1735.36, + "text": " kind of business constraints or diversity constraints and you don''t need + to invoke the heavy model", "tokens": [50584, 733, 295, 1606, 18491, 420, 8811, + 18491, 293, 291, 500, 380, 643, 281, 41117, 264, 4676, 2316, 50840], "temperature": + 0.0, "avg_logprob": -0.08826788039434524, "compression_ratio": 1.8818897637795275, + "no_speech_prob": 0.0001775760465534404}, {"id": 250, "seek": 172584, "start": 1736.24, + "end": 1741.9199999999998, "text": " for those hits yeah so yeah I think it''s interesting + for these kind of pipelines but one thing that", "tokens": [50884, 337, 729, 8664, + 1338, 370, 1338, 286, 519, 309, 311, 1880, 337, 613, 733, 295, 40168, 457, 472, + 551, 300, 51168], "temperature": 0.0, "avg_logprob": -0.08826788039434524, "compression_ratio": + 1.8818897637795275, "no_speech_prob": 0.0001775760465534404}, {"id": 251, "seek": + 172584, "start": 1741.9199999999998, "end": 1748.1599999999999, "text": " is challenging + regarding both the stage pipelines is that they interact with each other right", + "tokens": [51168, 307, 7595, 8595, 1293, 264, 3233, 40168, 307, 300, 436, 4648, + 365, 1184, 661, 558, 51480], "temperature": 0.0, "avg_logprob": -0.08826788039434524, + "compression_ratio": 1.8818897637795275, "no_speech_prob": 0.0001775760465534404}, + {"id": 252, "seek": 172584, "start": 1748.1599999999999, "end": 1754.3999999999999, + "text": " and if you do if you have like a system for training your model retraining + the model using", "tokens": [51480, 293, 498, 291, 360, 498, 291, 362, 411, 257, + 1185, 337, 3097, 428, 2316, 49356, 1760, 264, 2316, 1228, 51792], "temperature": + 0.0, "avg_logprob": -0.08826788039434524, "compression_ratio": 1.8818897637795275, + "no_speech_prob": 
0.0001775760465534404}, {"id": 253, "seek": 175440, "start": 1754.4, + "end": 1759.2800000000002, "text": " statistical features what are users clicking + on and so on the one of the features then you will", "tokens": [50364, 22820, 4122, + 437, 366, 5022, 9697, 322, 293, 370, 322, 264, 472, 295, 264, 4122, 550, 291, 486, + 50608], "temperature": 0.0, "avg_logprob": -0.15696592440550355, "compression_ratio": + 1.7412280701754386, "no_speech_prob": 0.000270951451966539}, {"id": 254, "seek": + 175440, "start": 1759.2800000000002, "end": 1766.5600000000002, "text": " have some + biases towards the actual the ranking algorithm that is in place today because that''s + the", "tokens": [50608, 362, 512, 32152, 3030, 264, 3539, 264, 17833, 9284, 300, + 307, 294, 1081, 965, 570, 300, 311, 264, 50972], "temperature": 0.0, "avg_logprob": + -0.15696592440550355, "compression_ratio": 1.7412280701754386, "no_speech_prob": + 0.000270951451966539}, {"id": 255, "seek": 175440, "start": 1766.5600000000002, + "end": 1774.16, "text": " model that is bringing interactions so you basically just + retrain on the top hits and that was what", "tokens": [50972, 2316, 300, 307, 5062, + 13280, 370, 291, 1936, 445, 1533, 7146, 322, 264, 1192, 8664, 293, 300, 390, 437, + 51352], "temperature": 0.0, "avg_logprob": -0.15696592440550355, "compression_ratio": + 1.7412280701754386, "no_speech_prob": 0.000270951451966539}, {"id": 256, "seek": + 175440, "start": 1774.16, "end": 1781.1200000000001, "text": " we saw on Amazon + as well as that when they started to improve the retriever so when it''s not actually", + "tokens": [51352, 321, 1866, 322, 6795, 382, 731, 382, 300, 562, 436, 1409, 281, + 3470, 264, 19817, 331, 370, 562, 309, 311, 406, 767, 51700], "temperature": 0.0, + "avg_logprob": -0.15696592440550355, "compression_ratio": 1.7412280701754386, "no_speech_prob": + 0.000270951451966539}, {"id": 257, "seek": 178112, "start": 1781.12, "end": 1789.6799999999998, + "text": " instead of having a 
BM25 like do BM25 and then rerank they had a mean + reciprocal rank score of 0.35", "tokens": [50364, 2602, 295, 1419, 257, 15901, 6074, + 411, 360, 15901, 6074, 293, 550, 319, 20479, 436, 632, 257, 914, 46948, 6181, 6175, + 295, 1958, 13, 8794, 50792], "temperature": 0.0, "avg_logprob": -0.14804854600325876, + "compression_ratio": 1.7844036697247707, "no_speech_prob": 0.0019400801975280046}, + {"id": 258, "seek": 178112, "start": 1789.6799999999998, "end": 1796.8, "text": + " or something and now after changing into a dense retriever now we''re talking + about 0.42 or", "tokens": [50792, 420, 746, 293, 586, 934, 4473, 666, 257, 18011, + 19817, 331, 586, 321, 434, 1417, 466, 1958, 13, 15628, 420, 51148], "temperature": + 0.0, "avg_logprob": -0.14804854600325876, "compression_ratio": 1.7844036697247707, + "no_speech_prob": 0.0019400801975280046}, {"id": 259, "seek": 178112, "start": 1796.8, + "end": 1802.7199999999998, "text": " something like that so by improving the improving + the retriever right because the retriever kind of", "tokens": [51148, 746, 411, + 300, 370, 538, 11470, 264, 11470, 264, 19817, 331, 558, 570, 264, 19817, 331, 733, + 295, 51444], "temperature": 0.0, "avg_logprob": -0.14804854600325876, "compression_ratio": + 1.7844036697247707, "no_speech_prob": 0.0019400801975280046}, {"id": 260, "seek": + 178112, "start": 1802.7199999999998, "end": 1808.2399999999998, "text": " sets the + upper bound you know because the rerank cannot really dream up you know the relevant + hits", "tokens": [51444, 6352, 264, 6597, 5472, 291, 458, 570, 264, 319, 20479, + 2644, 534, 3055, 493, 291, 458, 264, 7340, 8664, 51720], "temperature": 0.0, "avg_logprob": + -0.14804854600325876, "compression_ratio": 1.7844036697247707, "no_speech_prob": + 0.0019400801975280046}, {"id": 261, "seek": 180824, "start": 1808.24, "end": 1814.32, + "text": " if the retriever hasn''t retrieved it right so that''s an important point + you know in the retriever", "tokens": [50364, 498, 264, 
19817, 331, 6132, 380, 19817, + 937, 309, 558, 370, 300, 311, 364, 1021, 935, 291, 458, 294, 264, 19817, 331, 50668], + "temperature": 0.0, "avg_logprob": -0.11389731501673793, "compression_ratio": 1.8037037037037038, + "no_speech_prob": 0.0006391859496943653}, {"id": 262, "seek": 180824, "start": 1814.32, + "end": 1820.24, "text": " and writing stages so exactly and I think we can gradually + move into neural search and vector search", "tokens": [50668, 293, 3579, 10232, + 370, 2293, 293, 286, 519, 321, 393, 13145, 1286, 666, 18161, 3164, 293, 8062, 3164, + 50964], "temperature": 0.0, "avg_logprob": -0.11389731501673793, "compression_ratio": + 1.8037037037037038, "no_speech_prob": 0.0006391859496943653}, {"id": 263, "seek": + 180824, "start": 1820.24, "end": 1825.68, "text": " but like you know it was one + of the students question also yes then the same course how much you", "tokens": + [50964, 457, 411, 291, 458, 309, 390, 472, 295, 264, 1731, 1168, 611, 2086, 550, + 264, 912, 1164, 577, 709, 291, 51236], "temperature": 0.0, "avg_logprob": -0.11389731501673793, + "compression_ratio": 1.8037037037037038, "no_speech_prob": 0.0006391859496943653}, + {"id": 264, "seek": 180824, "start": 1825.68, "end": 1831.28, "text": " can actually + solve with the rerunquer if your first stage retriever didn''t even find what the", + "tokens": [51236, 393, 767, 5039, 365, 264, 43819, 409, 8035, 498, 428, 700, 3233, + 19817, 331, 994, 380, 754, 915, 437, 264, 51516], "temperature": 0.0, "avg_logprob": + -0.11389731501673793, "compression_ratio": 1.8037037037037038, "no_speech_prob": + 0.0006391859496943653}, {"id": 265, "seek": 180824, "start": 1831.28, "end": 1836.72, + "text": " user is looking for which means probably the query is not a match for + this search engine you know", "tokens": [51516, 4195, 307, 1237, 337, 597, 1355, + 1391, 264, 14581, 307, 406, 257, 2995, 337, 341, 3164, 2848, 291, 458, 51788], "temperature": + 0.0, "avg_logprob": -0.11389731501673793, 
"compression_ratio": 1.8037037037037038, + "no_speech_prob": 0.0006391859496943653}, {"id": 266, "seek": 183672, "start": 1837.28, + "end": 1842.64, "text": " let''s say they''re looking for a specific model of a + phone but they don''t even cell phones right so", "tokens": [50392, 718, 311, 584, + 436, 434, 1237, 337, 257, 2685, 2316, 295, 257, 2593, 457, 436, 500, 380, 754, 2815, + 10216, 558, 370, 50660], "temperature": 0.0, "avg_logprob": -0.15847271479917377, + "compression_ratio": 1.6899563318777293, "no_speech_prob": 0.0015201332280412316}, + {"id": 267, "seek": 183672, "start": 1843.28, "end": 1848.8, "text": " like and + I think the response from Daniel Tankilang on this one was that actually you can", + "tokens": [50692, 411, 293, 286, 519, 264, 4134, 490, 8033, 314, 27203, 25241, 322, + 341, 472, 390, 300, 767, 291, 393, 50968], "temperature": 0.0, "avg_logprob": -0.15847271479917377, + "compression_ratio": 1.6899563318777293, "no_speech_prob": 0.0015201332280412316}, + {"id": 268, "seek": 183672, "start": 1848.8, "end": 1854.56, "text": " implement + a query understanding system which will understand the query as much as it can and + if it", "tokens": [50968, 4445, 257, 14581, 3701, 1185, 597, 486, 1223, 264, 14581, + 382, 709, 382, 309, 393, 293, 498, 309, 51256], "temperature": 0.0, "avg_logprob": + -0.15847271479917377, "compression_ratio": 1.6899563318777293, "no_speech_prob": + 0.0015201332280412316}, {"id": 269, "seek": 183672, "start": 1854.56, "end": 1860.64, + "text": " knows that there are no such items in the database don''t even bother + searching for them and I think", "tokens": [51256, 3255, 300, 456, 366, 572, 1270, + 4754, 294, 264, 8149, 500, 380, 754, 8677, 10808, 337, 552, 293, 286, 519, 51560], + "temperature": 0.0, "avg_logprob": -0.15847271479917377, "compression_ratio": 1.6899563318777293, + "no_speech_prob": 0.0015201332280412316}, {"id": 270, "seek": 186064, "start": 1860.72, + "end": 1868.8000000000002, "text": " this was a 
really really clever advice on it + and and and he said that system worked extremely well", "tokens": [50368, 341, 390, + 257, 534, 534, 13494, 5192, 322, 309, 293, 293, 293, 415, 848, 300, 1185, 2732, + 4664, 731, 50772], "temperature": 0.0, "avg_logprob": -0.11532719511734812, "compression_ratio": + 1.6527777777777777, "no_speech_prob": 0.0026600901037454605}, {"id": 271, "seek": + 186064, "start": 1868.8000000000002, "end": 1875.44, "text": " so like for user + for user satisfaction to save their time right because in the end what we''re", + "tokens": [50772, 370, 411, 337, 4195, 337, 4195, 18715, 281, 3155, 641, 565, 558, + 570, 294, 264, 917, 437, 321, 434, 51104], "temperature": 0.0, "avg_logprob": -0.11532719511734812, + "compression_ratio": 1.6527777777777777, "no_speech_prob": 0.0026600901037454605}, + {"id": 272, "seek": 186064, "start": 1875.44, "end": 1881.1200000000001, "text": + " doing is actually optimizing the user journey which then translates into business + right so", "tokens": [51104, 884, 307, 767, 40425, 264, 4195, 4671, 597, 550, 28468, + 666, 1606, 558, 370, 51388], "temperature": 0.0, "avg_logprob": -0.11532719511734812, + "compression_ratio": 1.6527777777777777, "no_speech_prob": 0.0026600901037454605}, + {"id": 273, "seek": 186064, "start": 1881.1200000000001, "end": 1886.0, "text": + " that was a fantastic example of how you can also talk such search problem", "tokens": + [51388, 300, 390, 257, 5456, 1365, 295, 577, 291, 393, 611, 751, 1270, 3164, 1154, + 51632], "temperature": 0.0, "avg_logprob": -0.11532719511734812, "compression_ratio": + 1.6527777777777777, "no_speech_prob": 0.0026600901037454605}, {"id": 274, "seek": + 188600, "start": 1886.64, "end": 1895.52, "text": " yeah and that''s one area where + I wanted to because we''re building all of these sample applications", "tokens": + [50396, 1338, 293, 300, 311, 472, 1859, 689, 286, 1415, 281, 570, 321, 434, 2390, + 439, 295, 613, 6889, 5821, 50840], "temperature": 0.0, 
"avg_logprob": -0.12055834664238824, + "compression_ratio": 1.6939655172413792, "no_speech_prob": 0.0036166973877698183}, + {"id": 275, "seek": 188600, "start": 1895.52, "end": 1900.88, "text": " what you + can build with VASPA and query understanding has been one of the topics that I wanted + to build", "tokens": [50840, 437, 291, 393, 1322, 365, 691, 3160, 10297, 293, 14581, + 3701, 575, 668, 472, 295, 264, 8378, 300, 286, 1415, 281, 1322, 51108], "temperature": + 0.0, "avg_logprob": -0.12055834664238824, "compression_ratio": 1.6939655172413792, + "no_speech_prob": 0.0036166973877698183}, {"id": 276, "seek": 188600, "start": 1900.88, + "end": 1906.8, "text": " out to to demonstrate you know how to do that especially + building it using a transformer model", "tokens": [51108, 484, 281, 281, 11698, + 291, 458, 577, 281, 360, 300, 2318, 2390, 309, 1228, 257, 31782, 2316, 51404], "temperature": + 0.0, "avg_logprob": -0.12055834664238824, "compression_ratio": 1.6939655172413792, + "no_speech_prob": 0.0036166973877698183}, {"id": 277, "seek": 188600, "start": 1906.8, + "end": 1913.2, "text": " actually so you can have different ways of doing this but + one way of doing it is to use it as a", "tokens": [51404, 767, 370, 291, 393, 362, + 819, 2098, 295, 884, 341, 457, 472, 636, 295, 884, 309, 307, 281, 764, 309, 382, + 257, 51724], "temperature": 0.0, "avg_logprob": -0.12055834664238824, "compression_ratio": + 1.6939655172413792, "no_speech_prob": 0.0036166973877698183}, {"id": 278, "seek": + 191320, "start": 1913.2, "end": 1921.1200000000001, "text": " multi-label categorization + problem so given a query here are the intents and their probability", "tokens": + [50364, 4825, 12, 75, 18657, 19250, 2144, 1154, 370, 2212, 257, 14581, 510, 366, + 264, 560, 791, 293, 641, 8482, 50760], "temperature": 0.0, "avg_logprob": -0.11612093183729384, + "compression_ratio": 1.709090909090909, "no_speech_prob": 0.0005346767138689756}, + {"id": 279, "seek": 191320, "start": 1922.24, 
"end": 1927.3600000000001, "text": + " but what''s stopping me from doing this is that we need to work on kind of open + data sets now", "tokens": [50816, 457, 437, 311, 12767, 385, 490, 884, 341, 307, + 300, 321, 643, 281, 589, 322, 733, 295, 1269, 1412, 6352, 586, 51072], "temperature": + 0.0, "avg_logprob": -0.11612093183729384, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.0005346767138689756}, {"id": 280, "seek": 191320, "start": 1927.3600000000001, + "end": 1933.44, "text": " and there are very few query data sets in this way so + one approximation is actually to train", "tokens": [51072, 293, 456, 366, 588, 1326, + 14581, 1412, 6352, 294, 341, 636, 370, 472, 28023, 307, 767, 281, 3847, 51376], + "temperature": 0.0, "avg_logprob": -0.11612093183729384, "compression_ratio": 1.709090909090909, + "no_speech_prob": 0.0005346767138689756}, {"id": 281, "seek": 191320, "start": 1934.0, + "end": 1940.8, "text": " using the title so in the e-commerce set you can train + based on the titles but then you need to", "tokens": [51404, 1228, 264, 4876, 370, + 294, 264, 308, 12, 26926, 992, 291, 393, 3847, 2361, 322, 264, 12992, 457, 550, + 291, 643, 281, 51744], "temperature": 0.0, "avg_logprob": -0.11612093183729384, + "compression_ratio": 1.709090909090909, "no_speech_prob": 0.0005346767138689756}, + {"id": 282, "seek": 194080, "start": 1940.8, "end": 1947.2, "text": " have some + kind of label on you know is this you can do it around categories so you have the + title", "tokens": [50364, 362, 512, 733, 295, 7645, 322, 291, 458, 307, 341, 291, + 393, 360, 309, 926, 10479, 370, 291, 362, 264, 4876, 50684], "temperature": 0.0, + "avg_logprob": -0.12704996221205767, "compression_ratio": 1.8130841121495327, "no_speech_prob": + 0.0006706691347062588}, {"id": 283, "seek": 194080, "start": 1947.2, "end": 1953.76, + "text": " of the e-commerce listing and you have the category and the beauty of + this is that you''re actually", "tokens": [50684, 295, 264, 308, 12, 
26926, 22161, + 293, 291, 362, 264, 7719, 293, 264, 6643, 295, 341, 307, 300, 291, 434, 767, 51012], + "temperature": 0.0, "avg_logprob": -0.12704996221205767, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.0006706691347062588}, {"id": 284, "seek": 194080, "start": 1953.76, + "end": 1962.48, "text": " mapping free text queries into kind of a fixed predefined + vocabulary which is the categories", "tokens": [51012, 18350, 1737, 2487, 24109, + 666, 733, 295, 257, 6806, 659, 37716, 19864, 597, 307, 264, 10479, 51448], "temperature": + 0.0, "avg_logprob": -0.12704996221205767, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.0006706691347062588}, {"id": 285, "seek": 194080, "start": 1962.48, + "end": 1969.52, "text": " and then you can actually eliminate zero hits the zero + hits problem because you actually no longer", "tokens": [51448, 293, 550, 291, 393, + 767, 13819, 4018, 8664, 264, 4018, 8664, 1154, 570, 291, 767, 572, 2854, 51800], + "temperature": 0.0, "avg_logprob": -0.12704996221205767, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.0006706691347062588}, {"id": 286, "seek": 196952, "start": 1969.52, + "end": 1975.04, "text": " retrieving based on on the free text queries but you''re + retrieving based on the most kind of", "tokens": [50364, 19817, 798, 2361, 322, + 322, 264, 1737, 2487, 24109, 457, 291, 434, 19817, 798, 2361, 322, 264, 881, 733, + 295, 50640], "temperature": 0.0, "avg_logprob": -0.12792613528190402, "compression_ratio": + 2.102222222222222, "no_speech_prob": 0.0007940616924315691}, {"id": 287, "seek": + 196952, "start": 1975.04, "end": 1979.6, "text": " interesting categories so yeah + so these are really interesting you know topics and that''s one", "tokens": [50640, + 1880, 10479, 370, 1338, 370, 613, 366, 534, 1880, 291, 458, 8378, 293, 300, 311, + 472, 50868], "temperature": 0.0, "avg_logprob": -0.12792613528190402, "compression_ratio": + 2.102222222222222, "no_speech_prob": 
0.0007940616924315691}, {"id": 288, "seek": + 196952, "start": 1979.6, "end": 1985.44, "text": " thing that I find you know with + search you know and why I love being in search is that there''s", "tokens": [50868, + 551, 300, 286, 915, 291, 458, 365, 3164, 291, 458, 293, 983, 286, 959, 885, 294, + 3164, 307, 300, 456, 311, 51160], "temperature": 0.0, "avg_logprob": -0.12792613528190402, + "compression_ratio": 2.102222222222222, "no_speech_prob": 0.0007940616924315691}, + {"id": 289, "seek": 196952, "start": 1985.44, "end": 1990.56, "text": " there''s + there''s there''s a ton of things that you can you know build and you know there''s + query", "tokens": [51160, 456, 311, 456, 311, 456, 311, 257, 2952, 295, 721, 300, + 291, 393, 291, 458, 1322, 293, 291, 458, 456, 311, 14581, 51416], "temperature": + 0.0, "avg_logprob": -0.12792613528190402, "compression_ratio": 2.102222222222222, + "no_speech_prob": 0.0007940616924315691}, {"id": 290, "seek": 196952, "start": 1990.56, + "end": 1997.68, "text": " understanding you know there''s facets there''s retrieval + and ranking dense bars you know and then", "tokens": [51416, 3701, 291, 458, 456, + 311, 49752, 456, 311, 19817, 3337, 293, 17833, 18011, 10228, 291, 458, 293, 550, + 51772], "temperature": 0.0, "avg_logprob": -0.12792613528190402, "compression_ratio": + 2.102222222222222, "no_speech_prob": 0.0007940616924315691}, {"id": 291, "seek": + 199768, "start": 1997.68, "end": 2003.2, "text": " you have the scale of it you + know how to make it fast you know there''s so many things you know they''re", "tokens": + [50364, 291, 362, 264, 4373, 295, 309, 291, 458, 577, 281, 652, 309, 2370, 291, + 458, 456, 311, 370, 867, 721, 291, 458, 436, 434, 50640], "temperature": 0.0, "avg_logprob": + -0.11847272672151264, "compression_ratio": 1.9274193548387097, "no_speech_prob": + 0.0008521171403117478}, {"id": 292, "seek": 199768, "start": 2003.2, "end": 2009.04, + "text": " just query understanding you know it''s probably you know a 
full research + topic you know on", "tokens": [50640, 445, 14581, 3701, 291, 458, 309, 311, 1391, + 291, 458, 257, 1577, 2132, 4829, 291, 458, 322, 50932], "temperature": 0.0, "avg_logprob": + -0.11847272672151264, "compression_ratio": 1.9274193548387097, "no_speech_prob": + 0.0008521171403117478}, {"id": 293, "seek": 199768, "start": 2009.04, "end": 2015.1200000000001, + "text": " its own so there''s so many things involved in search so that''s yeah + yeah it''s like endless journey", "tokens": [50932, 1080, 1065, 370, 456, 311, 370, + 867, 721, 3288, 294, 3164, 370, 300, 311, 1338, 1338, 309, 311, 411, 16144, 4671, + 51236], "temperature": 0.0, "avg_logprob": -0.11847272672151264, "compression_ratio": + 1.9274193548387097, "no_speech_prob": 0.0008521171403117478}, {"id": 294, "seek": + 199768, "start": 2015.1200000000001, "end": 2020.5600000000002, "text": " I agree + it''s like yeah it is you can also dive into an old piece out of things or you can", + "tokens": [51236, 286, 3986, 309, 311, 411, 1338, 309, 307, 291, 393, 611, 9192, + 666, 364, 1331, 2522, 484, 295, 721, 420, 291, 393, 51508], "temperature": 0.0, + "avg_logprob": -0.11847272672151264, "compression_ratio": 1.9274193548387097, "no_speech_prob": + 0.0008521171403117478}, {"id": 295, "seek": 199768, "start": 2020.5600000000002, + "end": 2026.16, "text": " stay with like scaling of search or query parsing whatever + that you find passion and maybe your", "tokens": [51508, 1754, 365, 411, 21589, + 295, 3164, 420, 14581, 21156, 278, 2035, 300, 291, 915, 5418, 293, 1310, 428, 51788], + "temperature": 0.0, "avg_logprob": -0.11847272672151264, "compression_ratio": 1.9274193548387097, + "no_speech_prob": 0.0008521171403117478}, {"id": 296, "seek": 202616, "start": 2026.16, + "end": 2031.76, "text": " passion changes over time as well right so you did a bit + of an LP then you move to query parsing then", "tokens": [50364, 5418, 2962, 670, + 565, 382, 731, 558, 370, 291, 630, 257, 857, 295, 364, 38095, 550, 
291, 1286, 281, + 14581, 21156, 278, 550, 50644], "temperature": 0.0, "avg_logprob": -0.13073219873208916, + "compression_ratio": 1.7412587412587412, "no_speech_prob": 0.00043184225796721876}, + {"id": 297, "seek": 202616, "start": 2031.76, "end": 2037.0400000000002, "text": + " you move to maybe even scalability or whatever yeah I agree it''s like a fascinating + topic and also", "tokens": [50644, 291, 1286, 281, 1310, 754, 15664, 2310, 420, + 2035, 1338, 286, 3986, 309, 311, 411, 257, 10343, 4829, 293, 611, 50908], "temperature": + 0.0, "avg_logprob": -0.13073219873208916, "compression_ratio": 1.7412587412587412, + "no_speech_prob": 0.00043184225796721876}, {"id": 298, "seek": 202616, "start": + 2037.0400000000002, "end": 2042.0, "text": " what fascinates me is that on the other + side of things users are also not sleeping they''re puzzling", "tokens": [50908, + 437, 7184, 259, 1024, 385, 307, 300, 322, 264, 661, 1252, 295, 721, 5022, 366, 611, + 406, 8296, 436, 434, 18741, 1688, 51156], "temperature": 0.0, "avg_logprob": -0.13073219873208916, + "compression_ratio": 1.7412587412587412, "no_speech_prob": 0.00043184225796721876}, + {"id": 299, "seek": 202616, "start": 2042.0, "end": 2048.8, "text": " you all the + time with new queries seasonal changes data changes as well because in mid-size + a larger", "tokens": [51156, 291, 439, 264, 565, 365, 777, 24109, 27421, 2962, 1412, + 2962, 382, 731, 570, 294, 2062, 12, 27553, 257, 4833, 51496], "temperature": 0.0, + "avg_logprob": -0.13073219873208916, "compression_ratio": 1.7412587412587412, "no_speech_prob": + 0.00043184225796721876}, {"id": 300, "seek": 202616, "start": 2048.8, "end": 2054.56, + "text": " company it''s usually work of multiple teams and you know or departments + some departments looking", "tokens": [51496, 2237, 309, 311, 2673, 589, 295, 3866, + 5491, 293, 291, 458, 420, 15326, 512, 15326, 1237, 51784], "temperature": 0.0, "avg_logprob": + -0.13073219873208916, "compression_ratio": 
1.7412587412587412, "no_speech_prob": + 0.00043184225796721876}, {"id": 301, "seek": 205456, "start": 2054.56, "end": 2059.92, + "text": " after data some looking after ranking recommendation all the feature collection + and what not you know", "tokens": [50364, 934, 1412, 512, 1237, 934, 17833, 11879, + 439, 264, 4111, 5765, 293, 437, 406, 291, 458, 50632], "temperature": 0.0, "avg_logprob": + -0.08208456493559338, "compression_ratio": 1.9308943089430894, "no_speech_prob": + 0.0008439552038908005}, {"id": 302, "seek": 205456, "start": 2059.92, "end": 2065.92, + "text": " something somewhere can sometimes go wrong and you need to prepare for + it you need to interact you", "tokens": [50632, 746, 4079, 393, 2171, 352, 2085, + 293, 291, 643, 281, 5940, 337, 309, 291, 643, 281, 4648, 291, 50932], "temperature": + 0.0, "avg_logprob": -0.08208456493559338, "compression_ratio": 1.9308943089430894, + "no_speech_prob": 0.0008439552038908005}, {"id": 303, "seek": 205456, "start": 2065.92, + "end": 2070.32, "text": " need to kind of build a system that is resilient and it''s + it''s a fantastic fantastic space", "tokens": [50932, 643, 281, 733, 295, 1322, + 257, 1185, 300, 307, 23699, 293, 309, 311, 309, 311, 257, 5456, 5456, 1901, 51152], + "temperature": 0.0, "avg_logprob": -0.08208456493559338, "compression_ratio": 1.9308943089430894, + "no_speech_prob": 0.0008439552038908005}, {"id": 304, "seek": 205456, "start": 2072.64, + "end": 2077.12, "text": " yeah it really is and there''s so many methods and that''s + also one of the things you know people", "tokens": [51268, 1338, 309, 534, 307, + 293, 456, 311, 370, 867, 7150, 293, 300, 311, 611, 472, 295, 264, 721, 291, 458, + 561, 51492], "temperature": 0.0, "avg_logprob": -0.08208456493559338, "compression_ratio": + 1.9308943089430894, "no_speech_prob": 0.0008439552038908005}, {"id": 305, "seek": + 205456, "start": 2077.68, "end": 2081.36, "text": " you know they want to build + something that is great you know but even if 
you''re using a", "tokens": [51520, + 291, 458, 436, 528, 281, 1322, 746, 300, 307, 869, 291, 458, 457, 754, 498, 291, + 434, 1228, 257, 51704], "temperature": 0.0, "avg_logprob": -0.08208456493559338, + "compression_ratio": 1.9308943089430894, "no_speech_prob": 0.0008439552038908005}, + {"id": 306, "seek": 208136, "start": 2081.36, "end": 2086.4, "text": " passion to + see know if you''re using Westbar you know you need to have kind of some investment + of", "tokens": [50364, 5418, 281, 536, 458, 498, 291, 434, 1228, 4055, 5356, 291, + 458, 291, 643, 281, 362, 733, 295, 512, 6078, 295, 50616], "temperature": 0.0, "avg_logprob": + -0.14800088447436952, "compression_ratio": 1.8918918918918919, "no_speech_prob": + 0.001226239139214158}, {"id": 307, "seek": 208136, "start": 2086.4, "end": 2090.96, + "text": " actually getting great results and that the same thing if you''re using + like a vector search library or", "tokens": [50616, 767, 1242, 869, 3542, 293, 300, + 264, 912, 551, 498, 291, 434, 1228, 411, 257, 8062, 3164, 6405, 420, 50844], "temperature": + 0.0, "avg_logprob": -0.14800088447436952, "compression_ratio": 1.8918918918918919, + "no_speech_prob": 0.001226239139214158}, {"id": 308, "seek": 208136, "start": 2091.76, + "end": 2097.36, "text": " you know you need to have some kind of data pipeline for + your documents and queries so there''s", "tokens": [50884, 291, 458, 291, 643, 281, + 362, 512, 733, 295, 1412, 15517, 337, 428, 8512, 293, 24109, 370, 456, 311, 51164], + "temperature": 0.0, "avg_logprob": -0.14800088447436952, "compression_ratio": 1.8918918918918919, + "no_speech_prob": 0.001226239139214158}, {"id": 309, "seek": 208136, "start": 2097.92, + "end": 2104.2400000000002, "text": " you know I''m not a huge believer in you know + none of these technologies really work that well you", "tokens": [51192, 291, 458, + 286, 478, 406, 257, 2603, 23892, 294, 291, 458, 6022, 295, 613, 7943, 534, 589, + 300, 731, 291, 51508], "temperature": 0.0, 
"avg_logprob": -0.14800088447436952, + "compression_ratio": 1.8918918918918919, "no_speech_prob": 0.001226239139214158}, + {"id": 310, "seek": 208136, "start": 2104.2400000000002, "end": 2109.36, "text": + " know out of the box you know it''s it''s such as definitely not a sole problem + and even if you look", "tokens": [51508, 458, 484, 295, 264, 2424, 291, 458, 309, + 311, 309, 311, 1270, 382, 2138, 406, 257, 12321, 1154, 293, 754, 498, 291, 574, + 51764], "temperature": 0.0, "avg_logprob": -0.14800088447436952, "compression_ratio": + 1.8918918918918919, "no_speech_prob": 0.001226239139214158}, {"id": 311, "seek": + 210936, "start": 2109.36, "end": 2113.6, "text": " at Google you know they''re struggling + as well you know there are many queries of question answering", "tokens": [50364, + 412, 3329, 291, 458, 436, 434, 9314, 382, 731, 291, 458, 456, 366, 867, 24109, 295, + 1168, 13430, 50576], "temperature": 0.0, "avg_logprob": -0.1325650920038638, "compression_ratio": + 1.9453125, "no_speech_prob": 0.0014266303041949868}, {"id": 312, "seek": 210936, + "start": 2113.6, "end": 2118.7200000000003, "text": " and so on that they totally + get wrong you know and people want to build Google but they have like", "tokens": + [50576, 293, 370, 322, 300, 436, 3879, 483, 2085, 291, 458, 293, 561, 528, 281, + 1322, 3329, 457, 436, 362, 411, 50832], "temperature": 0.0, "avg_logprob": -0.1325650920038638, + "compression_ratio": 1.9453125, "no_speech_prob": 0.0014266303041949868}, {"id": + 313, "seek": 210936, "start": 2118.7200000000003, "end": 2126.32, "text": " maybe + two guys you know or or girls you know working on search you know you you you don''t + build a great", "tokens": [50832, 1310, 732, 1074, 291, 458, 420, 420, 4519, 291, + 458, 1364, 322, 3164, 291, 458, 291, 291, 291, 500, 380, 1322, 257, 869, 51212], + "temperature": 0.0, "avg_logprob": -0.1325650920038638, "compression_ratio": 1.9453125, + "no_speech_prob": 0.0014266303041949868}, {"id": 314, "seek": 210936, 
"start": 2126.32, + "end": 2132.4, "text": " search experience you know if you''re by a team of two + two people yeah yeah it''s a huge investment", "tokens": [51212, 3164, 1752, 291, + 458, 498, 291, 434, 538, 257, 1469, 295, 732, 732, 561, 1338, 1338, 309, 311, 257, + 2603, 6078, 51516], "temperature": 0.0, "avg_logprob": -0.1325650920038638, "compression_ratio": + 1.9453125, "no_speech_prob": 0.0014266303041949868}, {"id": 315, "seek": 210936, + "start": 2132.4, "end": 2139.28, "text": " and also like time investment not just + like you need to hire a lot of smart people but you need to", "tokens": [51516, + 293, 611, 411, 565, 6078, 406, 445, 411, 291, 643, 281, 11158, 257, 688, 295, 4069, + 561, 457, 291, 643, 281, 51860], "temperature": 0.0, "avg_logprob": -0.1325650920038638, + "compression_ratio": 1.9453125, "no_speech_prob": 0.0014266303041949868}, {"id": + 316, "seek": 213928, "start": 2139.28, "end": 2144.8, "text": " give them time to + actually go through all these challenges and you know now that you''ve mentioned", + "tokens": [50364, 976, 552, 565, 281, 767, 352, 807, 439, 613, 4759, 293, 291, 458, + 586, 300, 291, 600, 2835, 50640], "temperature": 0.0, "avg_logprob": -0.08299543857574462, + "compression_ratio": 1.7824074074074074, "no_speech_prob": 0.0021586879156529903}, + {"id": 317, "seek": 213928, "start": 2144.8, "end": 2151.76, "text": " vector search + I''m actually curious like when in Westbar journey you know did you first hear about", + "tokens": [50640, 8062, 3164, 286, 478, 767, 6369, 411, 562, 294, 4055, 5356, 4671, + 291, 458, 630, 291, 700, 1568, 466, 50988], "temperature": 0.0, "avg_logprob": -0.08299543857574462, + "compression_ratio": 1.7824074074074074, "no_speech_prob": 0.0021586879156529903}, + {"id": 318, "seek": 213928, "start": 2151.76, "end": 2156.2400000000002, "text": + " vector search and actually what caught your eye and you know like sometimes even + today when", "tokens": [50988, 8062, 3164, 293, 767, 437, 5415, 428, 
3313, 293, + 291, 458, 411, 2171, 754, 965, 562, 51212], "temperature": 0.0, "avg_logprob": -0.08299543857574462, + "compression_ratio": 1.7824074074074074, "no_speech_prob": 0.0021586879156529903}, + {"id": 319, "seek": 213928, "start": 2156.2400000000002, "end": 2161.36, "text": + " companies evaluate whether or not to take the neural search journey or stay with + this part search", "tokens": [51212, 3431, 13059, 1968, 420, 406, 281, 747, 264, + 18161, 3164, 4671, 420, 1754, 365, 341, 644, 3164, 51468], "temperature": 0.0, "avg_logprob": + -0.08299543857574462, "compression_ratio": 1.7824074074074074, "no_speech_prob": + 0.0021586879156529903}, {"id": 320, "seek": 216136, "start": 2161.6, "end": 2169.36, + "text": " journey it is not that obvious actually and and maybe you could share + some advice there as well but", "tokens": [50376, 4671, 309, 307, 406, 300, 6322, + 767, 293, 293, 1310, 291, 727, 2073, 512, 5192, 456, 382, 731, 457, 50764], "temperature": + 0.0, "avg_logprob": -0.13622127260480607, "compression_ratio": 1.620879120879121, + "no_speech_prob": 0.003026058431714773}, {"id": 321, "seek": 216136, "start": 2169.36, + "end": 2180.2400000000002, "text": " maybe first if you could also do a historical + deep dive there super exciting yeah it''s it''s so we''ve", "tokens": [50764, 1310, + 700, 498, 291, 727, 611, 360, 257, 8584, 2452, 9192, 456, 1687, 4670, 1338, 309, + 311, 309, 311, 370, 321, 600, 51308], "temperature": 0.0, "avg_logprob": -0.13622127260480607, + "compression_ratio": 1.620879120879121, "no_speech_prob": 0.003026058431714773}, + {"id": 322, "seek": 216136, "start": 2180.2400000000002, "end": 2186.96, "text": + " been using like dog products and so on for search but it''s been it''s been brute + force right so", "tokens": [51308, 668, 1228, 411, 3000, 3383, 293, 370, 322, 337, + 3164, 457, 309, 311, 668, 309, 311, 668, 47909, 3464, 558, 370, 51644], "temperature": + 0.0, "avg_logprob": -0.13622127260480607, "compression_ratio": 
1.620879120879121, + "no_speech_prob": 0.003026058431714773}, {"id": 323, "seek": 218696, "start": 2187.44, + "end": 2196.48, "text": " been able to do brute force vector search in Westbar for + a long time and then in 2018", "tokens": [50388, 668, 1075, 281, 360, 47909, 3464, + 8062, 3164, 294, 4055, 5356, 337, 257, 938, 565, 293, 550, 294, 6096, 50840], "temperature": + 0.0, "avg_logprob": -0.20160357157389322, "compression_ratio": 1.3656716417910448, + "no_speech_prob": 0.0004122816026210785}, {"id": 324, "seek": 218696, "start": 2197.84, + "end": 2208.56, "text": " bird came out and in January 2019 the researchers published + really great results on them as Marco", "tokens": [50908, 5255, 1361, 484, 293, + 294, 7061, 6071, 264, 10309, 6572, 534, 869, 3542, 322, 552, 382, 26535, 51444], + "temperature": 0.0, "avg_logprob": -0.20160357157389322, "compression_ratio": 1.3656716417910448, + "no_speech_prob": 0.0004122816026210785}, {"id": 325, "seek": 220856, "start": 2208.56, + "end": 2216.24, "text": " Pasadranking and then we like you know this is this bird + model you know how can we use it you know", "tokens": [50364, 14199, 345, 20479, + 278, 293, 550, 321, 411, 291, 458, 341, 307, 341, 5255, 2316, 291, 458, 577, 393, + 321, 764, 309, 291, 458, 50748], "temperature": 0.0, "avg_logprob": -0.15139446258544922, + "compression_ratio": 1.95, "no_speech_prob": 0.0015901551814749837}, {"id": 326, + "seek": 220856, "start": 2216.24, "end": 2221.2, "text": " is it you know there + are a lot of things you know to get your head around you know what its bird is", + "tokens": [50748, 307, 309, 291, 458, 456, 366, 257, 688, 295, 721, 291, 458, 281, + 483, 428, 1378, 926, 291, 458, 437, 1080, 5255, 307, 50996], "temperature": 0.0, + "avg_logprob": -0.15139446258544922, "compression_ratio": 1.95, "no_speech_prob": + 0.0015901551814749837}, {"id": 327, "seek": 220856, "start": 2221.2, "end": 2227.36, + "text": " and how to use it and then we saw that there basically were two 
ways of + of using it either as", "tokens": [50996, 293, 577, 281, 764, 309, 293, 550, 321, + 1866, 300, 456, 1936, 645, 732, 2098, 295, 295, 1228, 309, 2139, 382, 51304], "temperature": + 0.0, "avg_logprob": -0.15139446258544922, "compression_ratio": 1.95, "no_speech_prob": + 0.0015901551814749837}, {"id": 328, "seek": 220856, "start": 2227.36, "end": 2233.12, + "text": " a representation model where you encode the query and the document independently + and then you can", "tokens": [51304, 257, 10290, 2316, 689, 291, 2058, 1429, 264, + 14581, 293, 264, 4166, 21761, 293, 550, 291, 393, 51592], "temperature": 0.0, "avg_logprob": + -0.15139446258544922, "compression_ratio": 1.95, "no_speech_prob": 0.0015901551814749837}, + {"id": 329, "seek": 223312, "start": 2233.2, "end": 2239.52, "text": " build using + a vector search library you can you build the index of your corpus and then you + can", "tokens": [50368, 1322, 1228, 257, 8062, 3164, 6405, 291, 393, 291, 1322, + 264, 8186, 295, 428, 1181, 31624, 293, 550, 291, 393, 50684], "temperature": 0.0, + "avg_logprob": -0.23999903575483575, "compression_ratio": 1.6506550218340612, "no_speech_prob": + 0.0010553663596510887}, {"id": 330, "seek": 223312, "start": 2239.52, "end": 2245.68, + "text": " retrieve pretty efficiently if you have La Crocs Med Search version so + that actually was what", "tokens": [50684, 30254, 1238, 19621, 498, 291, 362, 2369, + 18965, 14368, 3982, 17180, 3037, 370, 300, 767, 390, 437, 50992], "temperature": + 0.0, "avg_logprob": -0.23999903575483575, "compression_ratio": 1.6506550218340612, + "no_speech_prob": 0.0010553663596510887}, {"id": 331, "seek": 223312, "start": 2245.68, + "end": 2253.44, "text": " motivated us in 2019 we started that work in summer August + to actually have vector search and then", "tokens": [50992, 14515, 505, 294, 6071, + 321, 1409, 300, 589, 294, 4266, 6897, 281, 767, 362, 8062, 3164, 293, 550, 51380], + "temperature": 0.0, "avg_logprob": -0.23999903575483575, 
"compression_ratio": 1.6506550218340612, + "no_speech_prob": 0.0010553663596510887}, {"id": 332, "seek": 223312, "start": 2254.08, + "end": 2260.3199999999997, "text": " also in term in Java there are a couple of + image search use cases around hamming distance", "tokens": [51412, 611, 294, 1433, + 294, 10745, 456, 366, 257, 1916, 295, 3256, 3164, 764, 3331, 926, 36600, 278, 4560, + 51724], "temperature": 0.0, "avg_logprob": -0.23999903575483575, "compression_ratio": + 1.6506550218340612, "no_speech_prob": 0.0010553663596510887}, {"id": 333, "seek": + 226032, "start": 2260.8, "end": 2267.04, "text": " so they were pushing for that + so there are multiple things and also our users it was open source by", "tokens": + [50388, 370, 436, 645, 7380, 337, 300, 370, 456, 366, 3866, 721, 293, 611, 527, + 5022, 309, 390, 1269, 4009, 538, 50700], "temperature": 0.0, "avg_logprob": -0.15801701338394827, + "compression_ratio": 1.7808219178082192, "no_speech_prob": 0.001198856974951923}, + {"id": 334, "seek": 226032, "start": 2267.04, "end": 2272.32, "text": " then so + users were also asking for it you know can Westbar do vector search you know we + see that we", "tokens": [50700, 550, 370, 5022, 645, 611, 3365, 337, 309, 291, 458, + 393, 4055, 5356, 360, 8062, 3164, 291, 458, 321, 536, 300, 321, 50964], "temperature": + 0.0, "avg_logprob": -0.15801701338394827, "compression_ratio": 1.7808219178082192, + "no_speech_prob": 0.001198856974951923}, {"id": 335, "seek": 226032, "start": 2272.32, + "end": 2278.4, "text": " could represent vectors but it''s not that cost efficient + if we need to do brute force so then we", "tokens": [50964, 727, 2906, 18875, 457, + 309, 311, 406, 300, 2063, 7148, 498, 321, 643, 281, 360, 47909, 3464, 370, 550, + 321, 51268], "temperature": 0.0, "avg_logprob": -0.15801701338394827, "compression_ratio": + 1.7808219178082192, "no_speech_prob": 0.001198856974951923}, {"id": 336, "seek": + 226032, "start": 2278.4, "end": 2284.0, "text": " start looking at 
you know it and + we had all the kind of building pieces we had the tensor the", "tokens": [51268, + 722, 1237, 412, 291, 458, 309, 293, 321, 632, 439, 264, 733, 295, 2390, 3755, 321, + 632, 264, 40863, 264, 51548], "temperature": 0.0, "avg_logprob": -0.15801701338394827, + "compression_ratio": 1.7808219178082192, "no_speech_prob": 0.001198856974951923}, + {"id": 337, "seek": 228400, "start": 2284.0, "end": 2290.72, "text": " document + models representing floats and all these numeric fields and so so we had it wasn''t + a lot", "tokens": [50364, 4166, 5245, 13460, 37878, 293, 439, 613, 7866, 299, 7909, + 293, 370, 370, 321, 632, 309, 2067, 380, 257, 688, 50700], "temperature": 0.0, "avg_logprob": + -0.11810681960161995, "compression_ratio": 1.6478260869565218, "no_speech_prob": + 0.0007994991610758007}, {"id": 338, "seek": 228400, "start": 2290.72, "end": 2297.28, + "text": " of work to get all the kind of pieces together but we had to implement + the algorithm and we did", "tokens": [50700, 295, 589, 281, 483, 439, 264, 733, + 295, 3755, 1214, 457, 321, 632, 281, 4445, 264, 9284, 293, 321, 630, 51028], "temperature": + 0.0, "avg_logprob": -0.11810681960161995, "compression_ratio": 1.6478260869565218, + "no_speech_prob": 0.0007994991610758007}, {"id": 339, "seek": 228400, "start": 2298.0, + "end": 2304.16, "text": " we did a pretty I think like one month where we actually + surveyed multiple algorithms for", "tokens": [51064, 321, 630, 257, 1238, 286, 519, + 411, 472, 1618, 689, 321, 767, 8984, 292, 3866, 14642, 337, 51372], "temperature": + 0.0, "avg_logprob": -0.11810681960161995, "compression_ratio": 1.6478260869565218, + "no_speech_prob": 0.0007994991610758007}, {"id": 340, "seek": 228400, "start": 2305.2, + "end": 2310.88, "text": " approximate vector search you know how could they fit + into the Westbar model of doing things so", "tokens": [51424, 30874, 8062, 3164, + 291, 458, 577, 727, 436, 3318, 666, 264, 4055, 5356, 2316, 295, 884, 721, 370, 51708], + 
"temperature": 0.0, "avg_logprob": -0.11810681960161995, "compression_ratio": 1.6478260869565218, + "no_speech_prob": 0.0007994991610758007}, {"id": 341, "seek": 231088, "start": 2310.88, + "end": 2316.2400000000002, "text": " that''s the background of kind of why why vector + search came to Westbar and I was really", "tokens": [50364, 300, 311, 264, 3678, + 295, 733, 295, 983, 983, 8062, 3164, 1361, 281, 4055, 5356, 293, 286, 390, 534, + 50632], "temperature": 0.0, "avg_logprob": -0.15628110278736462, "compression_ratio": + 1.6902654867256637, "no_speech_prob": 0.002095071831718087}, {"id": 342, "seek": + 231088, "start": 2316.2400000000002, "end": 2321.44, "text": " exciting when we + started working on that because there were a lot of interest in it right so there + were", "tokens": [50632, 4670, 562, 321, 1409, 1364, 322, 300, 570, 456, 645, 257, + 688, 295, 1179, 294, 309, 558, 370, 456, 645, 50892], "temperature": 0.0, "avg_logprob": + -0.15628110278736462, "compression_ratio": 1.6902654867256637, "no_speech_prob": + 0.002095071831718087}, {"id": 343, "seek": 231088, "start": 2321.44, "end": 2326.48, + "text": " people wanted to work on that project so yeah of course because it''s + something like a bleeding edge", "tokens": [50892, 561, 1415, 281, 589, 322, 300, + 1716, 370, 1338, 295, 1164, 570, 309, 311, 746, 411, 257, 19312, 4691, 51144], "temperature": + 0.0, "avg_logprob": -0.15628110278736462, "compression_ratio": 1.6902654867256637, + "no_speech_prob": 0.002095071831718087}, {"id": 344, "seek": 231088, "start": 2326.48, + "end": 2334.1600000000003, "text": " and like a new but also like one of the podcasts + I mentioned I think it was also with Yuri", "tokens": [51144, 293, 411, 257, 777, + 457, 611, 411, 472, 295, 264, 24045, 286, 2835, 286, 519, 309, 390, 611, 365, 33901, + 51528], "temperature": 0.0, "avg_logprob": -0.15628110278736462, "compression_ratio": + 1.6902654867256637, "no_speech_prob": 0.002095071831718087}, {"id": 345, "seek": + 233416, 
"start": 2334.16, "end": 2341.12, "text": " Malkov that I''ve I''ve had + a friend who worked in essentially vector search but he was a mathematician", "tokens": + [50364, 376, 667, 5179, 300, 286, 600, 286, 600, 632, 257, 1277, 567, 2732, 294, + 4476, 8062, 3164, 457, 415, 390, 257, 48281, 50712], "temperature": 0.0, "avg_logprob": + -0.119738208546358, "compression_ratio": 1.7161572052401746, "no_speech_prob": 0.01538281049579382}, + {"id": 346, "seek": 233416, "start": 2341.12, "end": 2347.44, "text": " himself + right so I also viewed it as a pure mathematical concept and I was like yeah he''s + playing", "tokens": [50712, 3647, 558, 370, 286, 611, 19174, 309, 382, 257, 6075, + 18894, 3410, 293, 286, 390, 411, 1338, 415, 311, 2433, 51028], "temperature": 0.0, + "avg_logprob": -0.119738208546358, "compression_ratio": 1.7161572052401746, "no_speech_prob": + 0.01538281049579382}, {"id": 347, "seek": 233416, "start": 2347.44, "end": 2352.64, + "text": " with some theoretical you know advancements and then he actually gave + a talk at Google you know", "tokens": [51028, 365, 512, 20864, 291, 458, 7295, 1117, + 293, 550, 415, 767, 2729, 257, 751, 412, 3329, 291, 458, 51288], "temperature": + 0.0, "avg_logprob": -0.119738208546358, "compression_ratio": 1.7161572052401746, + "no_speech_prob": 0.01538281049579382}, {"id": 348, "seek": 233416, "start": 2352.64, + "end": 2357.44, "text": " as well actually presenting this algorithm and the nearest + neighbor search essentially and how to", "tokens": [51288, 382, 731, 767, 15578, + 341, 9284, 293, 264, 23831, 5987, 3164, 4476, 293, 577, 281, 51528], "temperature": + 0.0, "avg_logprob": -0.119738208546358, "compression_ratio": 1.7161572052401746, + "no_speech_prob": 0.01538281049579382}, {"id": 349, "seek": 235744, "start": 2357.44, + "end": 2364.48, "text": " optimize it and even then I wasn''t like essentially buying + in and like okay it''s still mathematics", "tokens": [50364, 19719, 309, 293, 754, + 550, 286, 2067, 
380, 411, 4476, 6382, 294, 293, 411, 1392, 309, 311, 920, 18666, + 50716], "temperature": 0.0, "avg_logprob": -0.07550488236129925, "compression_ratio": + 1.6899563318777293, "no_speech_prob": 0.00816343817859888}, {"id": 350, "seek": + 235744, "start": 2364.48, "end": 2372.08, "text": " but then when I was reading + H&SW paper I saw them citing his work I was like wow so now these", "tokens": [50716, + 457, 550, 562, 286, 390, 3760, 389, 5, 50, 54, 3035, 286, 1866, 552, 48749, 702, + 589, 286, 390, 411, 6076, 370, 586, 613, 51096], "temperature": 0.0, "avg_logprob": + -0.07550488236129925, "compression_ratio": 1.6899563318777293, "no_speech_prob": + 0.00816343817859888}, {"id": 351, "seek": 235744, "start": 2372.08, "end": 2377.92, + "text": " paths have intersected so now this makes sense and you know usually it + excites me when it''s put", "tokens": [51096, 14518, 362, 27815, 292, 370, 586, + 341, 1669, 2020, 293, 291, 458, 2673, 309, 1624, 3324, 385, 562, 309, 311, 829, + 51388], "temperature": 0.0, "avg_logprob": -0.07550488236129925, "compression_ratio": + 1.6899563318777293, "no_speech_prob": 0.00816343817859888}, {"id": 352, "seek": + 235744, "start": 2377.92, "end": 2385.68, "text": " into practice is that how you + felt as well like was mathematics aspect of it like engaging for you", "tokens": + [51388, 666, 3124, 307, 300, 577, 291, 2762, 382, 731, 411, 390, 18666, 4171, 295, + 309, 411, 11268, 337, 291, 51776], "temperature": 0.0, "avg_logprob": -0.07550488236129925, + "compression_ratio": 1.6899563318777293, "no_speech_prob": 0.00816343817859888}, + {"id": 353, "seek": 238568, "start": 2385.68, "end": 2393.3599999999997, "text": + " or did you view it more like an engineering sort of yeah I''m definitely on the + engineering side", "tokens": [50364, 420, 630, 291, 1910, 309, 544, 411, 364, 7043, + 1333, 295, 1338, 286, 478, 2138, 322, 264, 7043, 1252, 50748], "temperature": 0.0, + "avg_logprob": -0.12282318904482085, "compression_ratio": 
1.8613861386138615, "no_speech_prob": + 0.0004701754660345614}, {"id": 354, "seek": 238568, "start": 2393.3599999999997, + "end": 2399.8399999999997, "text": " so I''m definitely on the engineering side + so for example on transformers you know I don''t care", "tokens": [50748, 370, 286, + 478, 2138, 322, 264, 7043, 1252, 370, 337, 1365, 322, 4088, 433, 291, 458, 286, + 500, 380, 1127, 51072], "temperature": 0.0, "avg_logprob": -0.12282318904482085, + "compression_ratio": 1.8613861386138615, "no_speech_prob": 0.0004701754660345614}, + {"id": 355, "seek": 238568, "start": 2399.8399999999997, "end": 2405.6, "text": + " about the deep neural network architecture how these interacts you know I basically + treat", "tokens": [51072, 466, 264, 2452, 18161, 3209, 9482, 577, 613, 43582, 291, + 458, 286, 1936, 2387, 51360], "temperature": 0.0, "avg_logprob": -0.12282318904482085, + "compression_ratio": 1.8613861386138615, "no_speech_prob": 0.0004701754660345614}, + {"id": 356, "seek": 238568, "start": 2405.6, "end": 2410.7999999999997, "text": + " as a black box you know this is this is the box and you need a tokenizer for it + okay what''s the", "tokens": [51360, 382, 257, 2211, 2424, 291, 458, 341, 307, 341, + 307, 264, 2424, 293, 291, 643, 257, 14862, 6545, 337, 309, 1392, 437, 311, 264, + 51620], "temperature": 0.0, "avg_logprob": -0.12282318904482085, "compression_ratio": + 1.8613861386138615, "no_speech_prob": 0.0004701754660345614}, {"id": 357, "seek": + 241080, "start": 2410.8, "end": 2417.04, "text": " tokenizer what''s saying people''s + output what can I use it for you know I''m not gonna do and they", "tokens": [50364, + 14862, 6545, 437, 311, 1566, 561, 311, 5598, 437, 393, 286, 764, 309, 337, 291, + 458, 286, 478, 406, 799, 360, 293, 436, 50676], "temperature": 0.0, "avg_logprob": + -0.22658492706634187, "compression_ratio": 1.6455696202531647, "no_speech_prob": + 0.0003873474197462201}, {"id": 358, "seek": 241080, "start": 2417.04, "end": 2422.8, + "text": " 
be I mean a lot of research actually study you know how can we build ultimate + in neural network", "tokens": [50676, 312, 286, 914, 257, 688, 295, 2132, 767, 2979, + 291, 458, 577, 393, 321, 1322, 9705, 294, 18161, 3209, 50964], "temperature": 0.0, + "avg_logprob": -0.22658492706634187, "compression_ratio": 1.6455696202531647, "no_speech_prob": + 0.0003873474197462201}, {"id": 359, "seek": 241080, "start": 2422.8, "end": 2429.2000000000003, + "text": " architecture so definitely know from for me that was not the math involved + but we we have some people", "tokens": [50964, 9482, 370, 2138, 458, 490, 337, 385, + 300, 390, 406, 264, 5221, 3288, 457, 321, 321, 362, 512, 561, 51284], "temperature": + 0.0, "avg_logprob": -0.22658492706634187, "compression_ratio": 1.6455696202531647, + "no_speech_prob": 0.0003873474197462201}, {"id": 360, "seek": 241080, "start": 2429.2000000000003, + "end": 2434.96, "text": " in our team with a heavy math background and you know + they can teach me a little bit about what", "tokens": [51284, 294, 527, 1469, 365, + 257, 4676, 5221, 3678, 293, 291, 458, 436, 393, 2924, 385, 257, 707, 857, 466, 437, + 51572], "temperature": 0.0, "avg_logprob": -0.22658492706634187, "compression_ratio": + 1.6455696202531647, "no_speech_prob": 0.0003873474197462201}, {"id": 361, "seek": + 243496, "start": 2434.96, "end": 2441.04, "text": " it''s a proper distance metric + and you know why why this one work and this one work so that", "tokens": [50364, + 309, 311, 257, 2296, 4560, 20678, 293, 291, 458, 983, 983, 341, 472, 589, 293, 341, + 472, 589, 370, 300, 50668], "temperature": 0.0, "avg_logprob": -0.16876449584960937, + "compression_ratio": 1.8269230769230769, "no_speech_prob": 0.0002527958422433585}, + {"id": 362, "seek": 243496, "start": 2441.04, "end": 2447.6, "text": " that was + really also a learning experience for for me to engage with such core team on this + feature", "tokens": [50668, 300, 390, 534, 611, 257, 2539, 1752, 337, 337, 385, + 281, 
4683, 365, 1270, 4965, 1469, 322, 341, 4111, 50996], "temperature": 0.0, "avg_logprob": + -0.16876449584960937, "compression_ratio": 1.8269230769230769, "no_speech_prob": + 0.0002527958422433585}, {"id": 363, "seek": 243496, "start": 2448.16, "end": 2454.16, + "text": " and a huge discussion we had you know who was you know one of my main + point was that you know", "tokens": [51024, 293, 257, 2603, 5017, 321, 632, 291, + 458, 567, 390, 291, 458, 472, 295, 452, 2135, 935, 390, 300, 291, 458, 51324], "temperature": + 0.0, "avg_logprob": -0.16876449584960937, "compression_ratio": 1.8269230769230769, + "no_speech_prob": 0.0002527958422433585}, {"id": 364, "seek": 243496, "start": 2454.16, + "end": 2460.2400000000002, "text": " we need to be able to integrate for users when + they want to use vector search they want to have", "tokens": [51324, 321, 643, 281, + 312, 1075, 281, 13365, 337, 5022, 562, 436, 528, 281, 764, 8062, 3164, 436, 528, + 281, 362, 51628], "temperature": 0.0, "avg_logprob": -0.16876449584960937, "compression_ratio": + 1.8269230769230769, "no_speech_prob": 0.0002527958422433585}, {"id": 365, "seek": + 246024, "start": 2460.9599999999996, "end": 2466.56, "text": " filters they want + to be able to express this in our query language so that you can combine", "tokens": + [50400, 15995, 436, 528, 281, 312, 1075, 281, 5109, 341, 294, 527, 14581, 2856, + 370, 300, 291, 393, 10432, 50680], "temperature": 0.0, "avg_logprob": -0.11005795279214549, + "compression_ratio": 1.8019323671497585, "no_speech_prob": 0.0026134189683943987}, + {"id": 366, "seek": 246024, "start": 2466.56, "end": 2472.4799999999996, "text": + " the best of both worlds and and that took really some time you know to to get + that right and that", "tokens": [50680, 264, 1151, 295, 1293, 13401, 293, 293, 300, + 1890, 534, 512, 565, 291, 458, 281, 281, 483, 300, 558, 293, 300, 50976], "temperature": + 0.0, "avg_logprob": -0.11005795279214549, "compression_ratio": 1.8019323671497585, + 
"no_speech_prob": 0.0026134189683943987}, {"id": 367, "seek": 246024, "start": 2472.4799999999996, + "end": 2477.6, "text": " was really you know really fun to see that that you actually + can write a query and say that", "tokens": [50976, 390, 534, 291, 458, 534, 1019, + 281, 536, 300, 300, 291, 767, 393, 2464, 257, 14581, 293, 584, 300, 51232], "temperature": + 0.0, "avg_logprob": -0.11005795279214549, "compression_ratio": 1.8019323671497585, + "no_speech_prob": 0.0026134189683943987}, {"id": 368, "seek": 246024, "start": 2478.16, + "end": 2483.6, "text": " hey give me documents that are near in vector space then + filter on this attribute but at the", "tokens": [51260, 4177, 976, 385, 8512, 300, + 366, 2651, 294, 8062, 1901, 550, 6608, 322, 341, 19667, 457, 412, 264, 51532], "temperature": + 0.0, "avg_logprob": -0.11005795279214549, "compression_ratio": 1.8019323671497585, + "no_speech_prob": 0.0026134189683943987}, {"id": 369, "seek": 248360, "start": 2483.6, + "end": 2491.2, "text": " same time also compute or retrieve based on the weekend + query operator which you heard about", "tokens": [50364, 912, 565, 611, 14722, 420, + 30254, 2361, 322, 264, 6711, 14581, 12973, 597, 291, 2198, 466, 50744], "temperature": + 0.0, "avg_logprob": -0.14506806522966867, "compression_ratio": 1.7268518518518519, + "no_speech_prob": 0.00020996364764869213}, {"id": 370, "seek": 248360, "start": + 2491.2, "end": 2497.44, "text": " weekend which is an optimization technique for + doing spatial chival and that you could actually", "tokens": [50744, 6711, 597, + 307, 364, 19618, 6532, 337, 884, 23598, 417, 3576, 293, 300, 291, 727, 767, 51056], + "temperature": 0.0, "avg_logprob": -0.14506806522966867, "compression_ratio": 1.7268518518518519, + "no_speech_prob": 0.00020996364764869213}, {"id": 371, "seek": 248360, "start": + 2497.44, "end": 2503.44, "text": " express that in the same query and I have to + say that I was really proud of our effort when", "tokens": [51056, 5109, 300, 
294, + 264, 912, 14581, 293, 286, 362, 281, 584, 300, 286, 390, 534, 4570, 295, 527, 4630, + 562, 51356], "temperature": 0.0, "avg_logprob": -0.14506806522966867, "compression_ratio": + 1.7268518518518519, "no_speech_prob": 0.00020996364764869213}, {"id": 372, "seek": + 248360, "start": 2503.44, "end": 2509.52, "text": " when it came out with that and + and could be able to combine it and that''s really if you look", "tokens": [51356, + 562, 309, 1361, 484, 365, 300, 293, 293, 727, 312, 1075, 281, 10432, 309, 293, 300, + 311, 534, 498, 291, 574, 51660], "temperature": 0.0, "avg_logprob": -0.14506806522966867, + "compression_ratio": 1.7268518518518519, "no_speech_prob": 0.00020996364764869213}, + {"id": 373, "seek": 250952, "start": 2509.52, "end": 2518.0, "text": " on the future + side I think vector search it''s been the biggest game changer for us was to actually", + "tokens": [50364, 322, 264, 2027, 1252, 286, 519, 8062, 3164, 309, 311, 668, 264, + 3880, 1216, 22822, 337, 505, 390, 281, 767, 50788], "temperature": 0.0, "avg_logprob": + -0.1991895916818202, "compression_ratio": 1.6946902654867257, "no_speech_prob": + 0.0013272097567096353}, {"id": 374, "seek": 250952, "start": 2518.0, "end": 2523.6, + "text": " integrate vector search because that''s speared a lot of interest into + VESPO yeah yeah and I can", "tokens": [50788, 13365, 8062, 3164, 570, 300, 311, + 768, 1642, 257, 688, 295, 1179, 666, 691, 2358, 34885, 1338, 1338, 293, 286, 393, + 51068], "temperature": 0.0, "avg_logprob": -0.1991895916818202, "compression_ratio": + 1.6946902654867257, "no_speech_prob": 0.0013272097567096353}, {"id": 375, "seek": + 250952, "start": 2523.6, "end": 2529.6, "text": " actually have people coming in + you know yeah but I can imagine that vector search can still", "tokens": [51068, + 767, 362, 561, 1348, 294, 291, 458, 1338, 457, 286, 393, 3811, 300, 8062, 3164, + 393, 920, 51368], "temperature": 0.0, "avg_logprob": -0.1991895916818202, "compression_ratio": + 
1.6946902654867257, "no_speech_prob": 0.0013272097567096353}, {"id": 376, "seek": + 250952, "start": 2530.48, "end": 2538.8, "text": " be useful in search as well as + recommendation systems right yeah exactly so so and that''s one of", "tokens": [51412, + 312, 4420, 294, 3164, 382, 731, 382, 11879, 3652, 558, 1338, 2293, 370, 370, 293, + 300, 311, 472, 295, 51828], "temperature": 0.0, "avg_logprob": -0.1991895916818202, + "compression_ratio": 1.6946902654867257, "no_speech_prob": 0.0013272097567096353}, + {"id": 377, "seek": 253880, "start": 2538.8, "end": 2545.84, "text": " the things + that you know you see that factorization machines dot products has been used for", + "tokens": [50364, 264, 721, 300, 291, 458, 291, 536, 300, 5952, 2144, 8379, 5893, + 3383, 575, 668, 1143, 337, 50716], "temperature": 0.0, "avg_logprob": -0.1209275045512635, + "compression_ratio": 1.7731481481481481, "no_speech_prob": 0.00017274596029892564}, + {"id": 378, "seek": 253880, "start": 2545.84, "end": 2553.04, "text": " recommendation + for a long time so you basically see the technology for search and recommendation", + "tokens": [50716, 11879, 337, 257, 938, 565, 370, 291, 1936, 536, 264, 2899, 337, + 3164, 293, 11879, 51076], "temperature": 0.0, "avg_logprob": -0.1209275045512635, + "compression_ratio": 1.7731481481481481, "no_speech_prob": 0.00017274596029892564}, + {"id": 379, "seek": 253880, "start": 2553.04, "end": 2561.36, "text": " use cases + kind of merging into the kind of same same space technology space and for those + type of", "tokens": [51076, 764, 3331, 733, 295, 44559, 666, 264, 733, 295, 912, + 912, 1901, 2899, 1901, 293, 337, 729, 2010, 295, 51492], "temperature": 0.0, "avg_logprob": + -0.1209275045512635, "compression_ratio": 1.7731481481481481, "no_speech_prob": + 0.00017274596029892564}, {"id": 380, "seek": 253880, "start": 2561.36, "end": 2567.1200000000003, + "text": " use cases I think VESPO is really strong technology and but the interesting + thing that I 
want to", "tokens": [51492, 764, 3331, 286, 519, 691, 2358, 34885, + 307, 534, 2068, 2899, 293, 457, 264, 1880, 551, 300, 286, 528, 281, 51780], "temperature": + 0.0, "avg_logprob": -0.1209275045512635, "compression_ratio": 1.7731481481481481, + "no_speech_prob": 0.00017274596029892564}, {"id": 381, "seek": 256712, "start": + 2567.12, "end": 2573.6, "text": " mention is that we have people coming in you know + asking about VESPO thinking that it was a vector", "tokens": [50364, 2152, 307, + 300, 321, 362, 561, 1348, 294, 291, 458, 3365, 466, 691, 2358, 34885, 1953, 300, + 309, 390, 257, 8062, 50688], "temperature": 0.0, "avg_logprob": -0.09232767166629914, + "compression_ratio": 1.7729257641921397, "no_speech_prob": 0.0006334540667012334}, + {"id": 382, "seek": 256712, "start": 2573.6, "end": 2578.64, "text": " search database + and then they realized hey you know there''s keywords there''s ranking there''s + a lot", "tokens": [50688, 3164, 8149, 293, 550, 436, 5334, 4177, 291, 458, 456, + 311, 21009, 456, 311, 17833, 456, 311, 257, 688, 50940], "temperature": 0.0, "avg_logprob": + -0.09232767166629914, "compression_ratio": 1.7729257641921397, "no_speech_prob": + 0.0006334540667012334}, {"id": 383, "seek": 256712, "start": 2578.64, "end": 2584.96, + "text": " of other features here you know so that''s been interesting for me you + know you know I see vector search", "tokens": [50940, 295, 661, 4122, 510, 291, + 458, 370, 300, 311, 668, 1880, 337, 385, 291, 458, 291, 458, 286, 536, 8062, 3164, + 51256], "temperature": 0.0, "avg_logprob": -0.09232767166629914, "compression_ratio": + 1.7729257641921397, "no_speech_prob": 0.0006334540667012334}, {"id": 384, "seek": + 256712, "start": 2584.96, "end": 2592.0, "text": " as a feature of VESPO in this + whole kind of serving engine but you can use for search and recommendation", "tokens": + [51256, 382, 257, 4111, 295, 691, 2358, 34885, 294, 341, 1379, 733, 295, 8148, 2848, + 457, 291, 393, 764, 337, 3164, 293, 11879, 
51608], "temperature": 0.0, "avg_logprob": + -0.09232767166629914, "compression_ratio": 1.7729257641921397, "no_speech_prob": + 0.0006334540667012334}, {"id": 385, "seek": 259200, "start": 2592.56, "end": 2599.04, + "text": " not like I see vector search as a very important feature but it''s like + one feature of VESPO", "tokens": [50392, 406, 411, 286, 536, 8062, 3164, 382, 257, + 588, 1021, 4111, 457, 309, 311, 411, 472, 4111, 295, 691, 2358, 34885, 50716], "temperature": + 0.0, "avg_logprob": -0.11946787779358611, "compression_ratio": 1.644736842105263, + "no_speech_prob": 0.0026857887860387564}, {"id": 386, "seek": 259200, "start": 2599.6, + "end": 2604.56, "text": " yeah I have to admit that part I probably played that + role in bringing those users onto you", "tokens": [50744, 1338, 286, 362, 281, 9796, + 300, 644, 286, 1391, 3737, 300, 3090, 294, 5062, 729, 5022, 3911, 291, 50992], "temperature": + 0.0, "avg_logprob": -0.11946787779358611, "compression_ratio": 1.644736842105263, + "no_speech_prob": 0.0026857887860387564}, {"id": 387, "seek": 259200, "start": 2605.2, + "end": 2609.92, "text": " through that blog post that I will of course mention and + did mention multiple times and where", "tokens": [51024, 807, 300, 6968, 2183, 300, + 286, 486, 295, 1164, 2152, 293, 630, 2152, 3866, 1413, 293, 689, 51260], "temperature": + 0.0, "avg_logprob": -0.11946787779358611, "compression_ratio": 1.644736842105263, + "no_speech_prob": 0.0026857887860387564}, {"id": 388, "seek": 259200, "start": 2609.92, + "end": 2616.32, "text": " I compare multiple you know now seven vector databases + and I did put VESPO in that corner just to", "tokens": [51260, 286, 6794, 3866, + 291, 458, 586, 3407, 8062, 22380, 293, 286, 630, 829, 691, 2358, 34885, 294, 300, + 4538, 445, 281, 51580], "temperature": 0.0, "avg_logprob": -0.11946787779358611, + "compression_ratio": 1.644736842105263, "no_speech_prob": 0.0026857887860387564}, + {"id": 389, "seek": 261632, "start": 2616.32, "end": 
2622.56, "text": " consider + only the vector part but I knew that you guys over a lot more and actually still + learn", "tokens": [50364, 1949, 787, 264, 8062, 644, 457, 286, 2586, 300, 291, 1074, + 670, 257, 688, 544, 293, 767, 920, 1466, 50676], "temperature": 0.0, "avg_logprob": + -0.07114389330841774, "compression_ratio": 1.6782608695652175, "no_speech_prob": + 0.001639834139496088}, {"id": 390, "seek": 261632, "start": 2622.56, "end": 2627.52, + "text": " at some point hopefully we''ll use VESPO in some project that I can actually + evaluate but yeah", "tokens": [50676, 412, 512, 935, 4696, 321, 603, 764, 691, 2358, + 34885, 294, 512, 1716, 300, 286, 393, 767, 13059, 457, 1338, 50924], "temperature": + 0.0, "avg_logprob": -0.07114389330841774, "compression_ratio": 1.6782608695652175, + "no_speech_prob": 0.001639834139496088}, {"id": 391, "seek": 261632, "start": 2627.52, + "end": 2633.04, "text": " you absolutely right that some of these systems are actually + beyond just vector search and you know", "tokens": [50924, 291, 3122, 558, 300, + 512, 295, 613, 3652, 366, 767, 4399, 445, 8062, 3164, 293, 291, 458, 51200], "temperature": + 0.0, "avg_logprob": -0.07114389330841774, "compression_ratio": 1.6782608695652175, + "no_speech_prob": 0.001639834139496088}, {"id": 392, "seek": 261632, "start": 2633.6000000000004, + "end": 2638.48, "text": " also the use cases like the way you view this right you + should actually take a step back and ask", "tokens": [51228, 611, 264, 764, 3331, + 411, 264, 636, 291, 1910, 341, 558, 291, 820, 767, 747, 257, 1823, 646, 293, 1029, + 51472], "temperature": 0.0, "avg_logprob": -0.07114389330841774, "compression_ratio": + 1.6782608695652175, "no_speech_prob": 0.001639834139496088}, {"id": 393, "seek": + 263848, "start": 2638.48, "end": 2645.28, "text": " yourself what is it that you + are trying to build yeah I think it''s really important and", "tokens": [50364, + 1803, 437, 307, 309, 300, 291, 366, 1382, 281, 1322, 1338, 286, 519, 
309, 311, 534, + 1021, 293, 50704], "temperature": 0.0, "avg_logprob": -0.2550050511079676, "compression_ratio": + 1.5080213903743316, "no_speech_prob": 0.0013134075561538339}, {"id": 394, "seek": + 263848, "start": 2646.32, "end": 2653.68, "text": " so when you look at vector search + and we didn''t so to clarify on the algorithm side after investigating", "tokens": + [50756, 370, 562, 291, 574, 412, 8062, 3164, 293, 321, 994, 380, 370, 281, 17594, + 322, 264, 9284, 1252, 934, 22858, 51124], "temperature": 0.0, "avg_logprob": -0.2550050511079676, + "compression_ratio": 1.5080213903743316, "no_speech_prob": 0.0013134075561538339}, + {"id": 395, "seek": 263848, "start": 2653.68, "end": 2661.04, "text": " an oil and + several techniques we went for your emalco''s H&SW algorithm so we implemented a", + "tokens": [51124, 364, 3184, 293, 2940, 7512, 321, 1437, 337, 428, 846, 304, 1291, + 311, 389, 5, 50, 54, 9284, 370, 321, 12270, 257, 51492], "temperature": 0.0, "avg_logprob": + -0.2550050511079676, "compression_ratio": 1.5080213903743316, "no_speech_prob": + 0.0013134075561538339}, {"id": 396, "seek": 266104, "start": 2661.04, "end": 2670.88, + "text": " version of that to be able to also handle filtering real-time updates + and so on so but I think", "tokens": [50364, 3037, 295, 300, 281, 312, 1075, 281, + 611, 4813, 30822, 957, 12, 3766, 9205, 293, 370, 322, 370, 457, 286, 519, 50856], + "temperature": 0.0, "avg_logprob": -0.08852616823636568, "compression_ratio": 1.4864864864864864, + "no_speech_prob": 0.0005162590532563627}, {"id": 397, "seek": 266104, "start": 2671.52, + "end": 2680.8, "text": " one discussion that is is not heard that often is that + vector search when you introduce", "tokens": [50888, 472, 5017, 300, 307, 307, 406, + 2198, 300, 2049, 307, 300, 8062, 3164, 562, 291, 5366, 51352], "temperature": 0.0, + "avg_logprob": -0.08852616823636568, "compression_ratio": 1.4864864864864864, "no_speech_prob": + 0.0005162590532563627}, {"id": 398, "seek": 
266104, "start": 2680.8, "end": 2687.84, + "text": " kind of H&SW or any technique you are losing some accuracy compared to + the brute force right", "tokens": [51352, 733, 295, 389, 5, 50, 54, 420, 604, 6532, + 291, 366, 7027, 512, 14170, 5347, 281, 264, 47909, 3464, 558, 51704], "temperature": + 0.0, "avg_logprob": -0.08852616823636568, "compression_ratio": 1.4864864864864864, + "no_speech_prob": 0.0005162590532563627}, {"id": 399, "seek": 268784, "start": 2688.8, + "end": 2695.36, "text": " so for example a data set that is called SIFT one million + documents you can do a single", "tokens": [50412, 370, 337, 1365, 257, 1412, 992, + 300, 307, 1219, 318, 12775, 51, 472, 2459, 8512, 291, 393, 360, 257, 2167, 50740], + "temperature": 0.0, "avg_logprob": -0.1700624421585438, "compression_ratio": 1.6026785714285714, + "no_speech_prob": 0.001758672297000885}, {"id": 400, "seek": 268784, "start": 2695.36, + "end": 2701.6000000000004, "text": " treaded route for search over those one million + vectors in about a hundred milliseconds", "tokens": [50740, 2192, 12777, 7955, 337, + 3164, 670, 729, 472, 2459, 18875, 294, 466, 257, 3262, 34184, 51052], "temperature": + 0.0, "avg_logprob": -0.1700624421585438, "compression_ratio": 1.6026785714285714, + "no_speech_prob": 0.001758672297000885}, {"id": 401, "seek": 268784, "start": 2702.32, + "end": 2710.96, "text": " right but if you do approximate then some parameters of + H&SW you might get down to 0.1", "tokens": [51088, 558, 457, 498, 291, 360, 30874, + 550, 512, 9834, 295, 389, 5, 50, 54, 291, 1062, 483, 760, 281, 1958, 13, 16, 51520], + "temperature": 0.0, "avg_logprob": -0.1700624421585438, "compression_ratio": 1.6026785714285714, + "no_speech_prob": 0.001758672297000885}, {"id": 402, "seek": 268784, "start": 2710.96, + "end": 2716.6400000000003, "text": " milliseconds as well using a library right + so it''s a thousand times faster but by doing that you", "tokens": [51520, 34184, + 382, 731, 1228, 257, 6405, 558, 370, 
309, 311, 257, 4714, 1413, 4663, 457, 538, + 884, 300, 291, 51804], "temperature": 0.0, "avg_logprob": -0.1700624421585438, "compression_ratio": + 1.6026785714285714, "no_speech_prob": 0.001758672297000885}, {"id": 403, "seek": + 271664, "start": 2716.64, "end": 2723.6, "text": " are losing some accuracy and + that''s kind of when I see blog posts about approximate vector search", "tokens": + [50364, 366, 7027, 512, 14170, 293, 300, 311, 733, 295, 562, 286, 536, 6968, 12300, + 466, 30874, 8062, 3164, 50712], "temperature": 0.0, "avg_logprob": -0.12471242178054083, + "compression_ratio": 1.6054054054054054, "no_speech_prob": 0.0003434710088185966}, + {"id": 404, "seek": 271664, "start": 2723.6, "end": 2730.64, "text": " without mentioning + the kind of trade-offs between recall and performance then I like you know", "tokens": + [50712, 1553, 18315, 264, 733, 295, 4923, 12, 19231, 1296, 9901, 293, 3389, 550, + 286, 411, 291, 458, 51064], "temperature": 0.0, "avg_logprob": -0.12471242178054083, + "compression_ratio": 1.6054054054054054, "no_speech_prob": 0.0003434710088185966}, + {"id": 405, "seek": 271664, "start": 2730.64, "end": 2740.08, "text": " you should + include the recall numbers because there''s really so it''s really I think it''s + really important", "tokens": [51064, 291, 820, 4090, 264, 9901, 3547, 570, 456, + 311, 534, 370, 309, 311, 534, 286, 519, 309, 311, 534, 1021, 51536], "temperature": + 0.0, "avg_logprob": -0.12471242178054083, "compression_ratio": 1.6054054054054054, + "no_speech_prob": 0.0003434710088185966}, {"id": 406, "seek": 274008, "start": 2740.16, + "end": 2746.56, "text": " for many use cases right it might be that you need to + do use a brute force because that kind", "tokens": [50368, 337, 867, 764, 3331, + 558, 309, 1062, 312, 300, 291, 643, 281, 360, 764, 257, 47909, 3464, 570, 300, 733, + 50688], "temperature": 0.0, "avg_logprob": -0.23524654743283294, "compression_ratio": + 1.885, "no_speech_prob": 0.002460829447954893}, {"id": 
407, "seek": 274008, "start": + 2746.56, "end": 2754.08, "text": " of approximative error that you introduce is + not acceptable right so we do have use cases in", "tokens": [50688, 295, 8542, 1166, + 6713, 300, 291, 5366, 307, 406, 15513, 558, 370, 321, 360, 362, 764, 3331, 294, + 51064], "temperature": 0.0, "avg_logprob": -0.23524654743283294, "compression_ratio": + 1.885, "no_speech_prob": 0.002460829447954893}, {"id": 408, "seek": 274008, "start": + 2754.08, "end": 2759.68, "text": " now that we actually use we don''t have like + large amount of documents that we actually use a brute", "tokens": [51064, 586, + 300, 321, 767, 764, 321, 500, 380, 362, 411, 2416, 2372, 295, 8512, 300, 321, 767, + 764, 257, 47909, 51344], "temperature": 0.0, "avg_logprob": -0.23524654743283294, + "compression_ratio": 1.885, "no_speech_prob": 0.002460829447954893}, {"id": 409, + "seek": 274008, "start": 2759.68, "end": 2766.16, "text": " force search and best + but best best supports brute force search so yeah yeah okay so you can", "tokens": + [51344, 3464, 3164, 293, 1151, 457, 1151, 1151, 9346, 47909, 3464, 3164, 370, 1338, + 1338, 1392, 370, 291, 393, 51668], "temperature": 0.0, "avg_logprob": -0.23524654743283294, + "compression_ratio": 1.885, "no_speech_prob": 0.002460829447954893}, {"id": 410, + "seek": 276616, "start": 2766.16, "end": 2772.24, "text": " switch and that''s the + beauty is that since we support this you just say in the query time you can", "tokens": + [50364, 3679, 293, 300, 311, 264, 6643, 307, 300, 1670, 321, 1406, 341, 291, 445, + 584, 294, 264, 14581, 565, 291, 393, 50668], "temperature": 0.0, "avg_logprob": + -0.09092583212741585, "compression_ratio": 1.9014778325123152, "no_speech_prob": + 0.00034351329668425024}, {"id": 411, "seek": 276616, "start": 2772.24, "end": 2780.56, + "text": " say approximate through your false and that means that you can take a + query run it using a brute force", "tokens": [50668, 584, 30874, 807, 428, 7908, + 293, 300, 1355, 
300, 291, 393, 747, 257, 14581, 1190, 309, 1228, 257, 47909, 3464, + 51084], "temperature": 0.0, "avg_logprob": -0.09092583212741585, "compression_ratio": + 1.9014778325123152, "no_speech_prob": 0.00034351329668425024}, {"id": 412, "seek": + 276616, "start": 2780.56, "end": 2785.2799999999997, "text": " and then you can + compare the result for the brute force which is exact with the approximation", "tokens": + [51084, 293, 550, 291, 393, 6794, 264, 1874, 337, 264, 47909, 3464, 597, 307, 1900, + 365, 264, 28023, 51320], "temperature": 0.0, "avg_logprob": -0.09092583212741585, + "compression_ratio": 1.9014778325123152, "no_speech_prob": 0.00034351329668425024}, + {"id": 413, "seek": 276616, "start": 2785.8399999999997, "end": 2790.72, "text": + " then you can compute the overlap between those two and that''s typically then + what''s used in", "tokens": [51348, 550, 291, 393, 14722, 264, 19959, 1296, 729, + 732, 293, 300, 311, 5850, 550, 437, 311, 1143, 294, 51592], "temperature": 0.0, + "avg_logprob": -0.09092583212741585, "compression_ratio": 1.9014778325123152, "no_speech_prob": + 0.00034351329668425024}, {"id": 414, "seek": 279072, "start": 2790.72, "end": 2797.8399999999997, + "text": " the recall at k right so I did two blog posts on what I call billion scale + vector search with with", "tokens": [50364, 264, 9901, 412, 350, 558, 370, 286, + 630, 732, 6968, 12300, 322, 437, 286, 818, 5218, 4373, 8062, 3164, 365, 365, 50720], + "temperature": 0.0, "avg_logprob": -0.16294440594348278, "compression_ratio": 1.7136563876651982, + "no_speech_prob": 0.00045209767995402217}, {"id": 415, "seek": 279072, "start": + 2797.8399999999997, "end": 2805.9199999999996, "text": " last one where I did deep + dive I think into these kind of trade-offs because when you introduce", "tokens": + [50720, 1036, 472, 689, 286, 630, 2452, 9192, 286, 519, 666, 613, 733, 295, 4923, + 12, 19231, 570, 562, 291, 5366, 51124], "temperature": 0.0, "avg_logprob": -0.16294440594348278, + 
"compression_ratio": 1.7136563876651982, "no_speech_prob": 0.00045209767995402217}, + {"id": 416, "seek": 279072, "start": 2805.9199999999996, "end": 2811.52, "text": + " approximate you also need to build these kind of index structures so in hnsw you + need to build the", "tokens": [51124, 30874, 291, 611, 643, 281, 1322, 613, 733, + 295, 8186, 9227, 370, 294, 276, 3695, 86, 291, 643, 281, 1322, 264, 51404], "temperature": + 0.0, "avg_logprob": -0.16294440594348278, "compression_ratio": 1.7136563876651982, + "no_speech_prob": 0.00045209767995402217}, {"id": 417, "seek": 279072, "start": + 2811.52, "end": 2817.6, "text": " graph right which is time and resource taking + time you know I''m costing memory so there are all", "tokens": [51404, 4295, 558, + 597, 307, 565, 293, 7684, 1940, 565, 291, 458, 286, 478, 37917, 4675, 370, 456, + 366, 439, 51708], "temperature": 0.0, "avg_logprob": -0.16294440594348278, "compression_ratio": + 1.7136563876651982, "no_speech_prob": 0.00045209767995402217}, {"id": 418, "seek": + 281760, "start": 2817.6, "end": 2823.04, "text": " these kind of trade-offs and + that''s generally I mean generally for search a lot of trade-offs but", "tokens": + [50364, 613, 733, 295, 4923, 12, 19231, 293, 300, 311, 5101, 286, 914, 5101, 337, + 3164, 257, 688, 295, 4923, 12, 19231, 457, 50636], "temperature": 0.0, "avg_logprob": + -0.23356497287750244, "compression_ratio": 1.8708133971291867, "no_speech_prob": + 0.002701298100873828}, {"id": 419, "seek": 281760, "start": 2823.04, "end": 2827.6, + "text": " especially around vector search I call it the jack of old trade-offs because + there''s so many things", "tokens": [50636, 2318, 926, 8062, 3164, 286, 818, 309, + 264, 7109, 295, 1331, 4923, 12, 19231, 570, 456, 311, 370, 867, 721, 50864], "temperature": + 0.0, "avg_logprob": -0.23356497287750244, "compression_ratio": 1.8708133971291867, + "no_speech_prob": 0.002701298100873828}, {"id": 420, "seek": 281760, "start": 2827.6, + "end": 2835.92, "text": 
" you know to consider you know memory usage this usage + CPU and so on so yeah that love the term jack", "tokens": [50864, 291, 458, 281, + 1949, 291, 458, 4675, 14924, 341, 14924, 13199, 293, 370, 322, 370, 1338, 300, 959, + 264, 1433, 7109, 51280], "temperature": 0.0, "avg_logprob": -0.23356497287750244, + "compression_ratio": 1.8708133971291867, "no_speech_prob": 0.002701298100873828}, + {"id": 421, "seek": 281760, "start": 2835.92, "end": 2843.2799999999997, "text": + " of old feed-offs yeah but it but it really is you know you really have so many + trade-offs and", "tokens": [51280, 295, 1331, 3154, 12, 19231, 1338, 457, 309, 457, + 309, 534, 307, 291, 458, 291, 534, 362, 370, 867, 4923, 12, 19231, 293, 51648], + "temperature": 0.0, "avg_logprob": -0.23356497287750244, "compression_ratio": 1.8708133971291867, + "no_speech_prob": 0.002701298100873828}, {"id": 422, "seek": 284328, "start": 2843.36, + "end": 2849.6000000000004, "text": " some companies you know maybe you have lots + of data but you don''t have any real tripe it right", "tokens": [50368, 512, 3431, + 291, 458, 1310, 291, 362, 3195, 295, 1412, 457, 291, 500, 380, 362, 604, 957, 1376, + 494, 309, 558, 50680], "temperature": 0.0, "avg_logprob": -0.15905709599339685, + "compression_ratio": 1.755980861244019, "no_speech_prob": 0.0010011696722358465}, + {"id": 423, "seek": 284328, "start": 2849.6000000000004, "end": 2856.4, "text": + " in that case maybe disk a and n or things that basically using disk is is a good + alternative", "tokens": [50680, 294, 300, 1389, 1310, 12355, 257, 293, 297, 420, + 721, 300, 1936, 1228, 12355, 307, 307, 257, 665, 8535, 51020], "temperature": 0.0, + "avg_logprob": -0.15905709599339685, "compression_ratio": 1.755980861244019, "no_speech_prob": + 0.0010011696722358465}, {"id": 424, "seek": 284328, "start": 2856.4, "end": 2861.76, + "text": " because when you''re buying servers in the cloud or renting servers in + the cloud you pay", "tokens": [51020, 570, 562, 291, 434, 
6382, 15909, 294, 264, + 4588, 420, 40598, 15909, 294, 264, 4588, 291, 1689, 51288], "temperature": 0.0, + "avg_logprob": -0.15905709599339685, "compression_ratio": 1.755980861244019, "no_speech_prob": + 0.0010011696722358465}, {"id": 425, "seek": 284328, "start": 2862.7200000000003, + "end": 2867.92, "text": " when you want to have this amount of memory you get this + amount of CPU right there comes in", "tokens": [51336, 562, 291, 528, 281, 362, + 341, 2372, 295, 4675, 291, 483, 341, 2372, 295, 13199, 558, 456, 1487, 294, 51596], + "temperature": 0.0, "avg_logprob": -0.15905709599339685, "compression_ratio": 1.755980861244019, + "no_speech_prob": 0.0010011696722358465}, {"id": 426, "seek": 286792, "start": 2868.88, + "end": 2874.4, "text": " a relationship between the CPU and the memory and and so + there are different trade-offs around", "tokens": [50412, 257, 2480, 1296, 264, + 13199, 293, 264, 4675, 293, 293, 370, 456, 366, 819, 4923, 12, 19231, 926, 50688], + "temperature": 0.0, "avg_logprob": -0.13332405868841676, "compression_ratio": 1.7065217391304348, + "no_speech_prob": 0.0007798340520821512}, {"id": 427, "seek": 286792, "start": 2874.4, + "end": 2880.0, "text": " you know what what''s actually going to use it for you + yeah exactly have you heard any other", "tokens": [50688, 291, 458, 437, 437, 311, + 767, 516, 281, 764, 309, 337, 291, 1338, 2293, 362, 291, 2198, 604, 661, 50968], + "temperature": 0.0, "avg_logprob": -0.13332405868841676, "compression_ratio": 1.7065217391304348, + "no_speech_prob": 0.0007798340520821512}, {"id": 428, "seek": 286792, "start": 2880.0, + "end": 2884.7200000000003, "text": " misconceptions about neural search at large + you know when somebody comes and says hey I want to", "tokens": [50968, 50012, 466, + 18161, 3164, 412, 2416, 291, 458, 562, 2618, 1487, 293, 1619, 4177, 286, 528, 281, + 51204], "temperature": 0.0, "avg_logprob": -0.13332405868841676, "compression_ratio": + 1.7065217391304348, "no_speech_prob": 
0.0007798340520821512}, {"id": 429, "seek": + 286792, "start": 2884.7200000000003, "end": 2889.84, "text": " implement a question + answering system you couldn''t principle use sparse search techniques or", "tokens": + [51204, 4445, 257, 1168, 13430, 1185, 291, 2809, 380, 8665, 764, 637, 11668, 3164, + 7512, 420, 51460], "temperature": 0.0, "avg_logprob": -0.13332405868841676, "compression_ratio": + 1.7065217391304348, "no_speech_prob": 0.0007798340520821512}, {"id": 430, "seek": + 286792, "start": 2889.84, "end": 2894.56, "text": " like query understanding techniques + you know to actually almost do it in the rule-based fashion", "tokens": [51460, + 411, 14581, 3701, 7512, 291, 458, 281, 767, 1920, 360, 309, 294, 264, 4978, 12, + 6032, 6700, 51696], "temperature": 0.0, "avg_logprob": -0.13332405868841676, "compression_ratio": + 1.7065217391304348, "no_speech_prob": 0.0007798340520821512}, {"id": 431, "seek": + 289456, "start": 2895.2799999999997, "end": 2902.08, "text": " but like neural search + on the other hand is like you know new sexy stuff everyone''s to try out so", "tokens": + [50400, 457, 411, 18161, 3164, 322, 264, 661, 1011, 307, 411, 291, 458, 777, 13701, + 1507, 1518, 311, 281, 853, 484, 370, 50740], "temperature": 0.0, "avg_logprob": + -0.13166793532993482, "compression_ratio": 1.7174887892376682, "no_speech_prob": + 0.002555578714236617}, {"id": 432, "seek": 289456, "start": 2902.08, "end": 2908.48, + "text": " the question is like have you heard of any misconceptions or something + that people think it''s", "tokens": [50740, 264, 1168, 307, 411, 362, 291, 2198, + 295, 604, 50012, 420, 746, 300, 561, 519, 309, 311, 51060], "temperature": 0.0, + "avg_logprob": -0.13166793532993482, "compression_ratio": 1.7174887892376682, "no_speech_prob": + 0.002555578714236617}, {"id": 433, "seek": 289456, "start": 2908.48, "end": 2916.88, + "text": " much easier than it is yeah that''s that''s I mean it''s a fantastic question + I think you know you", "tokens": 
[51060, 709, 3571, 813, 309, 307, 1338, 300, 311, + 300, 311, 286, 914, 309, 311, 257, 5456, 1168, 286, 519, 291, 458, 291, 51480], + "temperature": 0.0, "avg_logprob": -0.13166793532993482, "compression_ratio": 1.7174887892376682, + "no_speech_prob": 0.002555578714236617}, {"id": 434, "seek": 289456, "start": 2916.88, + "end": 2922.24, "text": " can just sit back you know this is I''m relaxed for a + few minutes because this is a topic that I", "tokens": [51480, 393, 445, 1394, 646, + 291, 458, 341, 307, 286, 478, 14628, 337, 257, 1326, 2077, 570, 341, 307, 257, 4829, + 300, 286, 51748], "temperature": 0.0, "avg_logprob": -0.13166793532993482, "compression_ratio": + 1.7174887892376682, "no_speech_prob": 0.002555578714236617}, {"id": 435, "seek": + 292224, "start": 2922.24, "end": 2929.12, "text": " really love um yeah so so so + the first time when we if you look at semantic search especially around", "tokens": + [50364, 534, 959, 1105, 1338, 370, 370, 370, 264, 700, 565, 562, 321, 498, 291, + 574, 412, 47982, 3164, 2318, 926, 50708], "temperature": 0.0, "avg_logprob": -0.16161622057904254, + "compression_ratio": 1.9086538461538463, "no_speech_prob": 0.0007816168363206089}, + {"id": 436, "seek": 292224, "start": 2929.12, "end": 2935.9199999999996, "text": + " vector search if we semantic search might mean a lot but if you look at the kind + of the typical", "tokens": [50708, 8062, 3164, 498, 321, 47982, 3164, 1062, 914, + 257, 688, 457, 498, 291, 574, 412, 264, 733, 295, 264, 7476, 51048], "temperature": + 0.0, "avg_logprob": -0.16161622057904254, "compression_ratio": 1.9086538461538463, + "no_speech_prob": 0.0007816168363206089}, {"id": 437, "seek": 292224, "start": 2935.9199999999996, + "end": 2940.64, "text": " that people use semantics search today is that you have + this vector search right you have independent", "tokens": [51048, 300, 561, 764, + 4361, 45298, 3164, 965, 307, 300, 291, 362, 341, 8062, 3164, 558, 291, 362, 6695, + 51284], "temperature": 0.0, 
"avg_logprob": -0.16161622057904254, "compression_ratio": + 1.9086538461538463, "no_speech_prob": 0.0007816168363206089}, {"id": 438, "seek": + 292224, "start": 2940.64, "end": 2946.7999999999997, "text": " query embedding in + the document embedding and so and if you base this if you take them pre-trained", + "tokens": [51284, 14581, 12240, 3584, 294, 264, 4166, 12240, 3584, 293, 370, 293, + 498, 291, 3096, 341, 498, 291, 747, 552, 659, 12, 17227, 2001, 51592], "temperature": + 0.0, "avg_logprob": -0.16161622057904254, "compression_ratio": 1.9086538461538463, + "no_speech_prob": 0.0007816168363206089}, {"id": 439, "seek": 294680, "start": 2946.8, + "end": 2953.6000000000004, "text": " language model from hugging phase and you just + pull that model and then you encode your queries", "tokens": [50364, 2856, 2316, + 490, 41706, 5574, 293, 291, 445, 2235, 300, 2316, 293, 550, 291, 2058, 1429, 428, + 24109, 50704], "temperature": 0.0, "avg_logprob": -0.16031797608332848, "compression_ratio": + 1.5978260869565217, "no_speech_prob": 0.0011512160999700427}, {"id": 440, "seek": + 294680, "start": 2953.6000000000004, "end": 2961.6800000000003, "text": " using + for instance the CLS token or the average over all tokens and the result that you + will get", "tokens": [50704, 1228, 337, 5197, 264, 12855, 50, 14862, 420, 264, 4274, + 670, 439, 22667, 293, 264, 1874, 300, 291, 486, 483, 51108], "temperature": 0.0, + "avg_logprob": -0.16031797608332848, "compression_ratio": 1.5978260869565217, "no_speech_prob": + 0.0011512160999700427}, {"id": 441, "seek": 294680, "start": 2961.6800000000003, + "end": 2970.6400000000003, "text": " from that is not going to compete at all with + the VM25 right because that language model has not been", "tokens": [51108, 490, + 300, 307, 406, 516, 281, 11831, 412, 439, 365, 264, 18038, 6074, 558, 570, 300, + 2856, 2316, 575, 406, 668, 51556], "temperature": 0.0, "avg_logprob": -0.16031797608332848, + "compression_ratio": 1.5978260869565217, 
"no_speech_prob": 0.0011512160999700427}, + {"id": 442, "seek": 297064, "start": 2970.72, "end": 2977.2, "text": " it''s only + been learned learning how to do mask language model right so it''s basically it''s + been", "tokens": [50368, 309, 311, 787, 668, 3264, 2539, 577, 281, 360, 6094, 2856, + 2316, 558, 370, 309, 311, 1936, 309, 311, 668, 50692], "temperature": 0.0, "avg_logprob": + -0.1080835276636584, "compression_ratio": 2.063025210084034, "no_speech_prob": 0.0019760611467063427}, + {"id": 443, "seek": 297064, "start": 2977.2, "end": 2982.16, "text": " trained on + predicting the next word right so it''s a deep neural network that''s it''s not + been trained", "tokens": [50692, 8895, 322, 32884, 264, 958, 1349, 558, 370, 309, + 311, 257, 2452, 18161, 3209, 300, 311, 309, 311, 406, 668, 8895, 50940], "temperature": + 0.0, "avg_logprob": -0.1080835276636584, "compression_ratio": 2.063025210084034, + "no_speech_prob": 0.0019760611467063427}, {"id": 444, "seek": 297064, "start": 2982.16, + "end": 2989.7599999999998, "text": " for that so it''s basically like taking some + deep neural network for my vacuum cleaner and put it", "tokens": [50940, 337, 300, + 370, 309, 311, 1936, 411, 1940, 512, 2452, 18161, 3209, 337, 452, 14224, 16532, + 293, 829, 309, 51320], "temperature": 0.0, "avg_logprob": -0.1080835276636584, "compression_ratio": + 2.063025210084034, "no_speech_prob": 0.0019760611467063427}, {"id": 445, "seek": + 297064, "start": 2989.7599999999998, "end": 2995.52, "text": " into my car you know + to try to try to try the car it''s not been trained for that right so that was", + "tokens": [51320, 666, 452, 1032, 291, 458, 281, 853, 281, 853, 281, 853, 264, 1032, + 309, 311, 406, 668, 8895, 337, 300, 558, 370, 300, 390, 51608], "temperature": 0.0, + "avg_logprob": -0.1080835276636584, "compression_ratio": 2.063025210084034, "no_speech_prob": + 0.0019760611467063427}, {"id": 446, "seek": 297064, "start": 2995.52, "end": 2999.2799999999997, + "text": " one of 
the things you know when we struggled as well when they looked + at bird and the other people", "tokens": [51608, 472, 295, 264, 721, 291, 458, 562, + 321, 19023, 382, 731, 562, 436, 2956, 412, 5255, 293, 264, 661, 561, 51796], "temperature": + 0.0, "avg_logprob": -0.1080835276636584, "compression_ratio": 2.063025210084034, + "no_speech_prob": 0.0019760611467063427}, {"id": 447, "seek": 299928, "start": 2999.44, + "end": 3004.32, "text": " like oh that''s so great and then we had the engine and + we could like compare it with VM25 and then", "tokens": [50372, 411, 1954, 300, + 311, 370, 869, 293, 550, 321, 632, 264, 2848, 293, 321, 727, 411, 6794, 309, 365, + 18038, 6074, 293, 550, 50616], "temperature": 0.0, "avg_logprob": -0.15717394633959697, + "compression_ratio": 1.6861924686192469, "no_speech_prob": 0.001105778617784381}, + {"id": 448, "seek": 299928, "start": 3004.32, "end": 3010.88, "text": " we did bird + here and there was like these results if you look at the actual information retrieval + benchmarks", "tokens": [50616, 321, 630, 5255, 510, 293, 456, 390, 411, 613, 3542, + 498, 291, 574, 412, 264, 3539, 1589, 19817, 3337, 43751, 50944], "temperature": + 0.0, "avg_logprob": -0.15717394633959697, "compression_ratio": 1.6861924686192469, + "no_speech_prob": 0.001105778617784381}, {"id": 449, "seek": 299928, "start": 3010.88, + "end": 3017.6000000000004, "text": " they''re like the results are not good they''re + they''re like really so then came the kind of you know", "tokens": [50944, 436, + 434, 411, 264, 3542, 366, 406, 665, 436, 434, 436, 434, 411, 534, 370, 550, 1361, + 264, 733, 295, 291, 458, 51280], "temperature": 0.0, "avg_logprob": -0.15717394633959697, + "compression_ratio": 1.6861924686192469, "no_speech_prob": 0.001105778617784381}, + {"id": 450, "seek": 299928, "start": 3017.6000000000004, "end": 3024.96, "text": + " realization I think that''s actually happened around industry as well in 2020 + when the DPR dense", "tokens": [51280, 25138, 
286, 519, 300, 311, 767, 2011, 926, + 3518, 382, 731, 294, 4808, 562, 264, 413, 15958, 18011, 51648], "temperature": 0.0, + "avg_logprob": -0.15717394633959697, "compression_ratio": 1.6861924686192469, "no_speech_prob": + 0.001105778617784381}, {"id": 451, "seek": 302496, "start": 3024.96, "end": 3031.92, + "text": " passage retriever paper came out from from a Facebook where they trained + on natural questions", "tokens": [50364, 11497, 19817, 331, 3035, 1361, 484, 490, + 490, 257, 4384, 689, 436, 8895, 322, 3303, 1651, 50712], "temperature": 0.0, "avg_logprob": + -0.1474355521954988, "compression_ratio": 1.8423645320197044, "no_speech_prob": + 0.0006372269708663225}, {"id": 452, "seek": 302496, "start": 3032.48, "end": 3037.04, + "text": " the Google dataset they actually trained this dense retriever and the + dense model using a", "tokens": [50740, 264, 3329, 28872, 436, 767, 8895, 341, 18011, + 19817, 331, 293, 264, 18011, 2316, 1228, 257, 50968], "temperature": 0.0, "avg_logprob": + -0.1474355521954988, "compression_ratio": 1.8423645320197044, "no_speech_prob": + 0.0006372269708663225}, {"id": 453, "seek": 302496, "start": 3037.04, "end": 3043.12, + "text": " contrastive loss and hard negative mining so they basically demonstrate + you know how to actually", "tokens": [50968, 8712, 488, 4470, 293, 1152, 3671, 15512, + 370, 436, 1936, 11698, 291, 458, 577, 281, 767, 51272], "temperature": 0.0, "avg_logprob": + -0.1474355521954988, "compression_ratio": 1.8423645320197044, "no_speech_prob": + 0.0006372269708663225}, {"id": 454, "seek": 302496, "start": 3043.12, "end": 3048.56, + "text": " train a dense retriever model and then we actually saw the results were + much better than than", "tokens": [51272, 3847, 257, 18011, 19817, 331, 2316, 293, + 550, 321, 767, 1866, 264, 3542, 645, 709, 1101, 813, 813, 51544], "temperature": + 0.0, "avg_logprob": -0.1474355521954988, "compression_ratio": 1.8423645320197044, + "no_speech_prob": 0.0006372269708663225}, {"id": 455, 
"seek": 304856, "start": 3048.56, + "end": 3057.2799999999997, "text": " VM25 in that but but it''s a huge so that''s + one area where I think that people", "tokens": [50364, 18038, 6074, 294, 300, 457, + 457, 309, 311, 257, 2603, 370, 300, 311, 472, 1859, 689, 286, 519, 300, 561, 50800], + "temperature": 0.0, "avg_logprob": -0.11676137343696925, "compression_ratio": 1.6576576576576576, + "no_speech_prob": 0.00048137936391867697}, {"id": 456, "seek": 304856, "start": + 3058.08, "end": 3064.48, "text": " just using the pre-trained model might not work + well especially if it''s not been tuned for retrieval", "tokens": [50840, 445, 1228, + 264, 659, 12, 17227, 2001, 2316, 1062, 406, 589, 731, 2318, 498, 309, 311, 406, + 668, 10870, 337, 19817, 3337, 51160], "temperature": 0.0, "avg_logprob": -0.11676137343696925, + "compression_ratio": 1.6576576576576576, "no_speech_prob": 0.00048137936391867697}, + {"id": 457, "seek": 304856, "start": 3064.48, "end": 3071.04, "text": " and even + if you look at MS Marco which is the largest data set out there that you can train", + "tokens": [51160, 293, 754, 498, 291, 574, 412, 7395, 26535, 597, 307, 264, 6443, + 1412, 992, 484, 456, 300, 291, 393, 3847, 51488], "temperature": 0.0, "avg_logprob": + -0.11676137343696925, "compression_ratio": 1.6576576576576576, "no_speech_prob": + 0.00048137936391867697}, {"id": 458, "seek": 304856, "start": 3071.04, "end": 3078.16, + "text": " a model on if you train a model on MS Marco and then you apply that model + into a different domain", "tokens": [51488, 257, 2316, 322, 498, 291, 3847, 257, + 2316, 322, 7395, 26535, 293, 550, 291, 3079, 300, 2316, 666, 257, 819, 9274, 51844], + "temperature": 0.0, "avg_logprob": -0.11676137343696925, "compression_ratio": 1.6576576576576576, + "no_speech_prob": 0.00048137936391867697}, {"id": 459, "seek": 307856, "start": + 3078.64, "end": 3088.08, "text": " so on a different dataset it might not outcompete + VM25 in fact it actually in many cases it is", 
"tokens": [50368, 370, 322, 257, + 819, 28872, 309, 1062, 406, 484, 21541, 3498, 18038, 6074, 294, 1186, 309, 767, + 294, 867, 3331, 309, 307, 50840], "temperature": 0.0, "avg_logprob": -0.16981151077773546, + "compression_ratio": 1.7300884955752212, "no_speech_prob": 0.0013801638269796968}, + {"id": 460, "seek": 307856, "start": 3088.08, "end": 3094.56, "text": " actually + underperforms compared to VM25 so and that''s why there''s a lot of interest and + especially", "tokens": [50840, 767, 833, 26765, 82, 5347, 281, 18038, 6074, 370, + 293, 300, 311, 983, 456, 311, 257, 688, 295, 1179, 293, 2318, 51164], "temperature": + 0.0, "avg_logprob": -0.16981151077773546, "compression_ratio": 1.7300884955752212, + "no_speech_prob": 0.0013801638269796968}, {"id": 461, "seek": 307856, "start": 3095.2, + "end": 3101.44, "text": " recently is like you know if we combine this exact matching + you know the actual user search for", "tokens": [51196, 3938, 307, 411, 291, 458, + 498, 321, 10432, 341, 1900, 14324, 291, 458, 264, 3539, 4195, 3164, 337, 51508], + "temperature": 0.0, "avg_logprob": -0.16981151077773546, "compression_ratio": 1.7300884955752212, + "no_speech_prob": 0.0013801638269796968}, {"id": 462, "seek": 307856, "start": 3101.44, + "end": 3107.6, "text": " this phrase but we also have the vector representation + you know how to combine that and that''s that''s", "tokens": [51508, 341, 9535, + 457, 321, 611, 362, 264, 8062, 10290, 291, 458, 577, 281, 10432, 300, 293, 300, + 311, 300, 311, 51816], "temperature": 0.0, "avg_logprob": -0.16981151077773546, + "compression_ratio": 1.7300884955752212, "no_speech_prob": 0.0013801638269796968}, + {"id": 463, "seek": 310760, "start": 3107.6, "end": 3115.44, "text": " actually + two of my colleagues are right now working on the beer dataset to they open the + PR to", "tokens": [50364, 767, 732, 295, 452, 7734, 366, 558, 586, 1364, 322, 264, + 8795, 28872, 281, 436, 1269, 264, 11568, 281, 50756], "temperature": 0.0, 
"avg_logprob": + -0.1751697770841829, "compression_ratio": 1.5991735537190082, "no_speech_prob": + 0.0009115648572333157}, {"id": 464, "seek": 310760, "start": 3115.44, "end": 3123.2799999999997, + "text": " the to the dataset to include VASP as well and then we will demonstrate + some methods for combining", "tokens": [50756, 264, 281, 264, 28872, 281, 4090, + 691, 3160, 47, 382, 731, 293, 550, 321, 486, 11698, 512, 7150, 337, 21928, 51148], + "temperature": 0.0, "avg_logprob": -0.1751697770841829, "compression_ratio": 1.5991735537190082, + "no_speech_prob": 0.0009115648572333157}, {"id": 465, "seek": 310760, "start": 3123.2799999999997, + "end": 3128.4, "text": " sparse and dense. Yeah that''s awesome like I''ve read + the beer paper after you referred it to me", "tokens": [51148, 637, 11668, 293, + 18011, 13, 865, 300, 311, 3476, 411, 286, 600, 1401, 264, 8795, 3035, 934, 291, + 10839, 309, 281, 385, 51404], "temperature": 0.0, "avg_logprob": -0.1751697770841829, + "compression_ratio": 1.5991735537190082, "no_speech_prob": 0.0009115648572333157}, + {"id": 466, "seek": 310760, "start": 3128.4, "end": 3133.2799999999997, "text": + " actually and it was quite eye-opening because it does compare not only sort of + like search engine", "tokens": [51404, 767, 293, 309, 390, 1596, 3313, 12, 404, + 4559, 570, 309, 775, 6794, 406, 787, 1333, 295, 411, 3164, 2848, 51648], "temperature": + 0.0, "avg_logprob": -0.1751697770841829, "compression_ratio": 1.5991735537190082, + "no_speech_prob": 0.0009115648572333157}, {"id": 467, "seek": 313328, "start": 3133.28, + "end": 3139.52, "text": " algorithms and approaches but also datasets and tasks + right which different tasks like searching", "tokens": [50364, 14642, 293, 11587, + 457, 611, 42856, 293, 9608, 558, 597, 819, 9608, 411, 10808, 50676], "temperature": + 0.0, "avg_logprob": -0.08148647176808324, "compression_ratio": 1.628099173553719, + "no_speech_prob": 0.004539810586720705}, {"id": 468, "seek": 313328, "start": 
3139.52, + "end": 3144.4, "text": " or answering questions may matter quite a lot and so it + was quite an eye-opening that first of all", "tokens": [50676, 420, 13430, 1651, + 815, 1871, 1596, 257, 688, 293, 370, 309, 390, 1596, 364, 3313, 12, 404, 4559, 300, + 700, 295, 439, 50920], "temperature": 0.0, "avg_logprob": -0.08148647176808324, + "compression_ratio": 1.628099173553719, "no_speech_prob": 0.004539810586720705}, + {"id": 469, "seek": 313328, "start": 3144.4, "end": 3150.8, "text": " VM25 is fairly + competitive so it''s not a loser not at all so like you should still consider using + it", "tokens": [50920, 18038, 6074, 307, 6457, 10043, 370, 309, 311, 406, 257, 24606, + 406, 412, 439, 370, 411, 291, 820, 920, 1949, 1228, 309, 51240], "temperature": + 0.0, "avg_logprob": -0.08148647176808324, "compression_ratio": 1.628099173553719, + "no_speech_prob": 0.004539810586720705}, {"id": 470, "seek": 313328, "start": 3150.8, + "end": 3156.48, "text": " like and actually maybe even keeping it as a strong baseline + in everything you do and I know some", "tokens": [51240, 411, 293, 767, 1310, 754, + 5145, 309, 382, 257, 2068, 20518, 294, 1203, 291, 360, 293, 286, 458, 512, 51524], + "temperature": 0.0, "avg_logprob": -0.08148647176808324, "compression_ratio": 1.628099173553719, + "no_speech_prob": 0.004539810586720705}, {"id": 471, "seek": 315648, "start": 3156.56, + "end": 3164.08, "text": " companies by the way still use TFIDF so maybe they should + also like first transition to VM25", "tokens": [50368, 3431, 538, 264, 636, 920, + 764, 40964, 2777, 37, 370, 1310, 436, 820, 611, 411, 700, 6034, 281, 18038, 6074, + 50744], "temperature": 0.0, "avg_logprob": -0.1442420482635498, "compression_ratio": + 1.6637554585152838, "no_speech_prob": 0.0011167546035721898}, {"id": 472, "seek": + 315648, "start": 3164.08, "end": 3169.04, "text": " and only then jump to neural + search techniques are like a denser trivel and and I think you also", "tokens": + [50744, 293, 787, 
550, 3012, 281, 18161, 3164, 7512, 366, 411, 257, 24505, 260, + 1376, 779, 293, 293, 286, 519, 291, 611, 50992], "temperature": 0.0, "avg_logprob": + -0.1442420482635498, "compression_ratio": 1.6637554585152838, "no_speech_prob": + 0.0011167546035721898}, {"id": 473, "seek": 315648, "start": 3169.04, "end": 3175.2, + "text": " mentioned that and I saw by the way that you have participated in various + competitions on denser", "tokens": [50992, 2835, 300, 293, 286, 1866, 538, 264, + 636, 300, 291, 362, 17978, 294, 3683, 26185, 322, 24505, 260, 51300], "temperature": + 0.0, "avg_logprob": -0.1442420482635498, "compression_ratio": 1.6637554585152838, + "no_speech_prob": 0.0011167546035721898}, {"id": 474, "seek": 315648, "start": 3175.2, + "end": 3182.88, "text": " trivel and on ranking like can you can you elaborate a + bit more like what drives your interest", "tokens": [51300, 1376, 779, 293, 322, + 17833, 411, 393, 291, 393, 291, 20945, 257, 857, 544, 411, 437, 11754, 428, 1179, + 51684], "temperature": 0.0, "avg_logprob": -0.1442420482635498, "compression_ratio": + 1.6637554585152838, "no_speech_prob": 0.0011167546035721898}, {"id": 475, "seek": + 318288, "start": 3182.88, "end": 3188.1600000000003, "text": " there because to + me that sounds more like academic interest in a way right but of course", "tokens": + [50364, 456, 570, 281, 385, 300, 3263, 544, 411, 7778, 1179, 294, 257, 636, 558, + 457, 295, 1164, 50628], "temperature": 0.0, "avg_logprob": -0.12444192513652232, + "compression_ratio": 1.6598360655737705, "no_speech_prob": 0.0009809269104152918}, + {"id": 476, "seek": 318288, "start": 3188.1600000000003, "end": 3191.6, "text": + " you''re also showcasing and probably bringing ideas back to that spa.", "tokens": + [50628, 291, 434, 611, 29794, 3349, 293, 1391, 5062, 3487, 646, 281, 300, 32543, + 13, 50800], "temperature": 0.0, "avg_logprob": -0.12444192513652232, "compression_ratio": + 1.6598360655737705, "no_speech_prob": 0.0009809269104152918}, {"id": 
477, "seek": + 318288, "start": 3194.0, "end": 3199.04, "text": " Yeah so the motivation was actually + around them as Marco passage ranking and", "tokens": [50920, 865, 370, 264, 12335, + 390, 767, 926, 552, 382, 26535, 11497, 17833, 293, 51172], "temperature": 0.0, "avg_logprob": + -0.12444192513652232, "compression_ratio": 1.6598360655737705, "no_speech_prob": + 0.0009809269104152918}, {"id": 478, "seek": 318288, "start": 3200.08, "end": 3204.56, + "text": " where we actually could use this dataset and then our dream when we started + to", "tokens": [51224, 689, 321, 767, 727, 764, 341, 28872, 293, 550, 527, 3055, + 562, 321, 1409, 281, 51448], "temperature": 0.0, "avg_logprob": -0.12444192513652232, + "compression_ratio": 1.6598360655737705, "no_speech_prob": 0.0009809269104152918}, + {"id": 479, "seek": 318288, "start": 3205.6, "end": 3211.04, "text": " implement + vector search was one thing and the other thing was you know how can we represent", + "tokens": [51500, 4445, 8062, 3164, 390, 472, 551, 293, 264, 661, 551, 390, 291, + 458, 577, 393, 321, 2906, 51772], "temperature": 0.0, "avg_logprob": -0.12444192513652232, + "compression_ratio": 1.6598360655737705, "no_speech_prob": 0.0009809269104152918}, + {"id": 480, "seek": 321288, "start": 3212.88, "end": 3218.4, "text": " the re-ranking + using bird in westbound so using the actual bird model inputting both the", "tokens": + [50364, 264, 319, 12, 20479, 278, 1228, 5255, 294, 7009, 18767, 370, 1228, 264, + 3539, 5255, 2316, 4846, 783, 1293, 264, 50640], "temperature": 0.0, "avg_logprob": + -0.1886046902163998, "compression_ratio": 1.663677130044843, "no_speech_prob": 0.0004989661974832416}, + {"id": 481, "seek": 321288, "start": 3218.4, "end": 3223.28, "text": " career and + the document at the same time so that was one dream we had and but we were looking + at", "tokens": [50640, 3988, 293, 264, 4166, 412, 264, 912, 565, 370, 300, 390, + 472, 3055, 321, 632, 293, 457, 321, 645, 1237, 412, 50884], 
"temperature": 0.0, + "avg_logprob": -0.1886046902163998, "compression_ratio": 1.663677130044843, "no_speech_prob": + 0.0004989661974832416}, {"id": 482, "seek": 321288, "start": 3223.28, "end": 3230.48, + "text": " the results and I think the first paper that we read it read that they + they used maybe a day", "tokens": [50884, 264, 3542, 293, 286, 519, 264, 700, 3035, + 300, 321, 1401, 309, 1401, 300, 436, 436, 1143, 1310, 257, 786, 51244], "temperature": + 0.0, "avg_logprob": -0.1886046902163998, "compression_ratio": 1.663677130044843, + "no_speech_prob": 0.0004989661974832416}, {"id": 483, "seek": 321288, "start": 3231.12, + "end": 3241.04, "text": " to with even with a GPU to actually perform 3600 queries + right so it was not really you know", "tokens": [51276, 281, 365, 754, 365, 257, + 18407, 281, 767, 2042, 8652, 628, 24109, 558, 370, 309, 390, 406, 534, 291, 458, + 51772], "temperature": 0.0, "avg_logprob": -0.1886046902163998, "compression_ratio": + 1.663677130044843, "no_speech_prob": 0.0004989661974832416}, {"id": 484, "seek": + 324104, "start": 3241.04, "end": 3247.7599999999998, "text": " how can we make this + practical and then two years later we actually did did beat", "tokens": [50364, + 577, 393, 321, 652, 341, 8496, 293, 550, 732, 924, 1780, 321, 767, 630, 630, 4224, + 50700], "temperature": 0.0, "avg_logprob": -0.1368877410888672, "compression_ratio": + 1.6787330316742082, "no_speech_prob": 0.0004709880449809134}, {"id": 485, "seek": + 324104, "start": 3248.56, "end": 3255.2799999999997, "text": " their benchmark and + to end represented on westbound and we were doing it less than 100 milliseconds", + "tokens": [50740, 641, 18927, 293, 281, 917, 10379, 322, 7009, 18767, 293, 321, + 645, 884, 309, 1570, 813, 2319, 34184, 51076], "temperature": 0.0, "avg_logprob": + -0.1368877410888672, "compression_ratio": 1.6787330316742082, "no_speech_prob": + 0.0004709880449809134}, {"id": 486, "seek": 324104, "start": 3256.0, "end": 3264.56, + "text": " so 
on CPU right so but there being a lot of learning to get there but + that was the motivation to", "tokens": [51112, 370, 322, 13199, 558, 370, 457, 456, + 885, 257, 688, 295, 2539, 281, 483, 456, 457, 300, 390, 264, 12335, 281, 51540], + "temperature": 0.0, "avg_logprob": -0.1368877410888672, "compression_ratio": 1.6787330316742082, + "no_speech_prob": 0.0004709880449809134}, {"id": 487, "seek": 324104, "start": 3264.56, + "end": 3269.92, "text": " kind of demonstrate that you can take this state of the + art or close to state of the art with", "tokens": [51540, 733, 295, 11698, 300, + 291, 393, 747, 341, 1785, 295, 264, 1523, 420, 1998, 281, 1785, 295, 264, 1523, + 365, 51808], "temperature": 0.0, "avg_logprob": -0.1368877410888672, "compression_ratio": + 1.6787330316742082, "no_speech_prob": 0.0004709880449809134}, {"id": 488, "seek": + 326992, "start": 3269.92, "end": 3276.8, "text": " three-wheel and ranking pipeline + from an open dataset which is how widely recognized and all the", "tokens": [50364, + 1045, 12, 22830, 293, 17833, 15517, 490, 364, 1269, 28872, 597, 307, 577, 13371, + 9823, 293, 439, 264, 50708], "temperature": 0.0, "avg_logprob": -0.222749733343357, + "compression_ratio": 1.820754716981132, "no_speech_prob": 0.0006659059436060488}, + {"id": 489, "seek": 326992, "start": 3276.8, "end": 3281.6, "text": " researchers + are actually publishing papers around it you can actually take that model and use", + "tokens": [50708, 10309, 366, 767, 17832, 10577, 926, 309, 291, 393, 767, 747, 300, + 2316, 293, 764, 50948], "temperature": 0.0, "avg_logprob": -0.222749733343357, "compression_ratio": + 1.820754716981132, "no_speech_prob": 0.0006659059436060488}, {"id": 490, "seek": + 326992, "start": 3281.6, "end": 3288.08, "text": " westbound and get those results + you know so it was one way of demonstrating that you can actually", "tokens": [50948, + 7009, 18767, 293, 483, 729, 3542, 291, 458, 370, 309, 390, 472, 636, 295, 29889, + 300, 291, 393, 767, 
51272], "temperature": 0.0, "avg_logprob": -0.222749733343357, + "compression_ratio": 1.820754716981132, "no_speech_prob": 0.0006659059436060488}, + {"id": 491, "seek": 326992, "start": 3288.08, "end": 3293.84, "text": " then you + can actually use these models with westbound and have it serve in your state so + that was", "tokens": [51272, 550, 291, 393, 767, 764, 613, 5245, 365, 7009, 18767, + 293, 362, 309, 4596, 294, 428, 1785, 370, 300, 390, 51560], "temperature": 0.0, + "avg_logprob": -0.222749733343357, "compression_ratio": 1.820754716981132, "no_speech_prob": + 0.0006659059436060488}, {"id": 492, "seek": 329384, "start": 3293.84, "end": 3300.08, + "text": " actually the motivation not on the kind of science side and so on but + I have to say that I really", "tokens": [50364, 767, 264, 12335, 406, 322, 264, + 733, 295, 3497, 1252, 293, 370, 322, 457, 286, 362, 281, 584, 300, 286, 534, 50676], + "temperature": 0.0, "avg_logprob": -0.06640084036465349, "compression_ratio": 1.7824074074074074, + "no_speech_prob": 0.000877921876963228}, {"id": 493, "seek": 329384, "start": 3301.04, + "end": 3307.1200000000003, "text": " would encourage everybody that works in search + to look at some of these open datasets you know", "tokens": [50724, 576, 5373, 2201, + 300, 1985, 294, 3164, 281, 574, 412, 512, 295, 613, 1269, 42856, 291, 458, 51028], + "temperature": 0.0, "avg_logprob": -0.06640084036465349, "compression_ratio": 1.7824074074074074, + "no_speech_prob": 0.000877921876963228}, {"id": 494, "seek": 329384, "start": 3307.1200000000003, + "end": 3313.44, "text": " play with them you know maybe you have some ideas you + know around search and how to do search and", "tokens": [51028, 862, 365, 552, 291, + 458, 1310, 291, 362, 512, 3487, 291, 458, 926, 3164, 293, 577, 281, 360, 3164, 293, + 51344], "temperature": 0.0, "avg_logprob": -0.06640084036465349, "compression_ratio": + 1.7824074074074074, "no_speech_prob": 0.000877921876963228}, {"id": 495, "seek": + 329384, 
"start": 3314.32, "end": 3320.1600000000003, "text": " there''s a lot of + talks about boosting this phrasing you know how actually does that impact the", + "tokens": [51388, 456, 311, 257, 688, 295, 6686, 466, 43117, 341, 7636, 3349, 291, + 458, 577, 767, 775, 300, 2712, 264, 51680], "temperature": 0.0, "avg_logprob": -0.06640084036465349, + "compression_ratio": 1.7824074074074074, "no_speech_prob": 0.000877921876963228}, + {"id": 496, "seek": 332016, "start": 3320.16, "end": 3328.3199999999997, "text": + " results on kind of a dataset and I can really recommend the track COVID which + is a dataset that was", "tokens": [50364, 3542, 322, 733, 295, 257, 28872, 293, + 286, 393, 534, 2748, 264, 2837, 4566, 597, 307, 257, 28872, 300, 390, 50772], "temperature": + 0.0, "avg_logprob": -0.12790567609998915, "compression_ratio": 1.6623931623931625, + "no_speech_prob": 0.0004997096839360893}, {"id": 497, "seek": 332016, "start": 3329.44, + "end": 3336.64, "text": " made at the start of the pandemic and it has about 50 + queries and deep judgments for each of the", "tokens": [50828, 1027, 412, 264, 722, + 295, 264, 5388, 293, 309, 575, 466, 2625, 24109, 293, 2452, 40337, 337, 1184, 295, + 264, 51188], "temperature": 0.0, "avg_logprob": -0.12790567609998915, "compression_ratio": + 1.6623931623931625, "no_speech_prob": 0.0004997096839360893}, {"id": 498, "seek": + 332016, "start": 3336.64, "end": 3341.44, "text": " queries and the collection is + rather small so you can play with it on a single node and so on so", "tokens": [51188, + 24109, 293, 264, 5765, 307, 2831, 1359, 370, 291, 393, 862, 365, 309, 322, 257, + 2167, 9984, 293, 370, 322, 370, 51428], "temperature": 0.0, "avg_logprob": -0.12790567609998915, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.0004997096839360893}, + {"id": 499, "seek": 332016, "start": 3341.44, "end": 3346.24, "text": " that I will + really encourage you know people in search to try out you know because then you + get", "tokens": 
[51428, 300, 286, 486, 534, 5373, 291, 458, 561, 294, 3164, 281, + 853, 484, 291, 458, 570, 550, 291, 483, 51668], "temperature": 0.0, "avg_logprob": + -0.12790567609998915, "compression_ratio": 1.6623931623931625, "no_speech_prob": + 0.0004997096839360893}, {"id": 500, "seek": 334624, "start": 3346.24, "end": 3352.3199999999997, + "text": " the feeling you know does it actually work does it actually you know compare + it with BM25 what", "tokens": [50364, 264, 2633, 291, 458, 775, 309, 767, 589, 775, + 309, 767, 291, 458, 6794, 309, 365, 15901, 6074, 437, 50668], "temperature": 0.0, + "avg_logprob": -0.15529566870795355, "compression_ratio": 1.7345132743362832, "no_speech_prob": + 0.0013958995696157217}, {"id": 501, "seek": 334624, "start": 3352.3199999999997, + "end": 3359.52, "text": " happens if I do phrase matching or do something clever + you know so I think that''s and and I''m really", "tokens": [50668, 2314, 498, 286, + 360, 9535, 14324, 420, 360, 746, 13494, 291, 458, 370, 286, 519, 300, 311, 293, + 293, 286, 478, 534, 51028], "temperature": 0.0, "avg_logprob": -0.15529566870795355, + "compression_ratio": 1.7345132743362832, "no_speech_prob": 0.0013958995696157217}, + {"id": 502, "seek": 334624, "start": 3359.52, "end": 3367.12, "text": " not a huge + fan of anecdotal query examples to see these kind of commercial actors with this + you know", "tokens": [51028, 406, 257, 2603, 3429, 295, 26652, 38180, 14581, 5110, + 281, 536, 613, 733, 295, 6841, 10037, 365, 341, 291, 458, 51408], "temperature": + 0.0, "avg_logprob": -0.15529566870795355, "compression_ratio": 1.7345132743362832, + "no_speech_prob": 0.0013958995696157217}, {"id": 503, "seek": 334624, "start": 3367.12, + "end": 3373.2799999999997, "text": " I''m searching for this on this magic results + you know I''m more into you know demonstrating that", "tokens": [51408, 286, 478, + 10808, 337, 341, 322, 341, 5585, 3542, 291, 458, 286, 478, 544, 666, 291, 458, 29889, + 300, 51716], "temperature": 0.0, 
"avg_logprob": -0.15529566870795355, "compression_ratio": + 1.7345132743362832, "no_speech_prob": 0.0013958995696157217}, {"id": 504, "seek": + 337328, "start": 3373.36, "end": 3378.1600000000003, "text": " actually Westpac + can do this and it has the funding and actually on the real datasets.", "tokens": + [50368, 767, 4055, 79, 326, 393, 360, 341, 293, 309, 575, 264, 6137, 293, 767, 322, + 264, 957, 42856, 13, 50608], "temperature": 0.0, "avg_logprob": -0.16603803634643555, + "compression_ratio": 1.7013574660633484, "no_speech_prob": 0.0014825869584456086}, + {"id": 505, "seek": 337328, "start": 3378.88, "end": 3384.7200000000003, "text": + " Yeah and I agree in the end of the day what matters is first of all can you apply + this tag as you", "tokens": [50644, 865, 293, 286, 3986, 294, 264, 917, 295, 264, + 786, 437, 7001, 307, 700, 295, 439, 393, 291, 3079, 341, 6162, 382, 291, 50936], + "temperature": 0.0, "avg_logprob": -0.16603803634643555, "compression_ratio": 1.7013574660633484, + "no_speech_prob": 0.0014825869584456086}, {"id": 506, "seek": 337328, "start": 3384.7200000000003, + "end": 3390.4, "text": " said in your real setting right in your domain then another + thing that you mentioned just now", "tokens": [50936, 848, 294, 428, 957, 3287, + 558, 294, 428, 9274, 550, 1071, 551, 300, 291, 2835, 445, 586, 51220], "temperature": + 0.0, "avg_logprob": -0.16603803634643555, "compression_ratio": 1.7013574660633484, + "no_speech_prob": 0.0014825869584456086}, {"id": 507, "seek": 337328, "start": 3391.44, + "end": 3397.6000000000004, "text": " you know the track COVID dataset so maybe as + the result of your research you might also impact on", "tokens": [51272, 291, 458, + 264, 2837, 4566, 28872, 370, 1310, 382, 264, 1874, 295, 428, 2132, 291, 1062, 611, + 2712, 322, 51580], "temperature": 0.0, "avg_logprob": -0.16603803634643555, "compression_ratio": + 1.7013574660633484, "no_speech_prob": 0.0014825869584456086}, {"id": 508, "seek": + 339760, "start": 3397.6, 
"end": 3403.2799999999997, "text": " the global situation + right maybe somewhere locally maybe somebody will use your work to actually", "tokens": + [50364, 264, 4338, 2590, 558, 1310, 4079, 16143, 1310, 2618, 486, 764, 428, 589, + 281, 767, 50648], "temperature": 0.0, "avg_logprob": -0.14964116977739939, "compression_ratio": + 1.6403508771929824, "no_speech_prob": 0.0006959047168493271}, {"id": 509, "seek": + 339760, "start": 3403.2799999999997, "end": 3411.36, "text": " implement a better + search system so I think that that''s also a fantastic segue to you know", "tokens": + [50648, 4445, 257, 1101, 3164, 1185, 370, 286, 519, 300, 300, 311, 611, 257, 5456, + 33850, 281, 291, 458, 51052], "temperature": 0.0, "avg_logprob": -0.14964116977739939, + "compression_ratio": 1.6403508771929824, "no_speech_prob": 0.0006959047168493271}, + {"id": 510, "seek": 339760, "start": 3412.3199999999997, "end": 3417.68, "text": + " the the context that you''re doing and that''s actually a very interesting point + because we had", "tokens": [51100, 264, 264, 4319, 300, 291, 434, 884, 293, 300, + 311, 767, 257, 588, 1880, 935, 570, 321, 632, 51368], "temperature": 0.0, "avg_logprob": + -0.14964116977739939, "compression_ratio": 1.6403508771929824, "no_speech_prob": + 0.0006959047168493271}, {"id": 511, "seek": 339760, "start": 3419.6, "end": 3426.56, + "text": " at the start of the pandemic we built a cord 19 search interface that + we published online so", "tokens": [51464, 412, 264, 722, 295, 264, 5388, 321, 3094, + 257, 12250, 1294, 3164, 9226, 300, 321, 6572, 2950, 370, 51812], "temperature": + 0.0, "avg_logprob": -0.14964116977739939, "compression_ratio": 1.6403508771929824, + "no_speech_prob": 0.0006959047168493271}, {"id": 512, "seek": 342656, "start": 3426.56, + "end": 3432.7999999999997, "text": " people who actually go and search this dataset + and people they were I don''t I don''t recall the", "tokens": [50364, 561, 567, + 767, 352, 293, 3164, 341, 28872, 293, 561, 436, 
645, 286, 500, 380, 286, 500, 380, + 9901, 264, 50676], "temperature": 0.0, "avg_logprob": -0.1679816927228655, "compression_ratio": + 1.83203125, "no_speech_prob": 0.0003276356728747487}, {"id": 513, "seek": 342656, + "start": 3432.7999999999997, "end": 3437.84, "text": " details but it''s still online + and people actually because of all open source so they forked it and", "tokens": + [50676, 4365, 457, 309, 311, 920, 2950, 293, 561, 767, 570, 295, 439, 1269, 4009, + 370, 436, 17716, 292, 309, 293, 50928], "temperature": 0.0, "avg_logprob": -0.1679816927228655, + "compression_ratio": 1.83203125, "no_speech_prob": 0.0003276356728747487}, {"id": + 514, "seek": 342656, "start": 3437.84, "end": 3442.24, "text": " then they started + using Westpac and based on that and I think it''s a much better shape", "tokens": + [50928, 550, 436, 1409, 1228, 4055, 79, 326, 293, 2361, 322, 300, 293, 286, 519, + 309, 311, 257, 709, 1101, 3909, 51148], "temperature": 0.0, "avg_logprob": -0.1679816927228655, + "compression_ratio": 1.83203125, "no_speech_prob": 0.0003276356728747487}, {"id": + 515, "seek": 342656, "start": 3443.04, "end": 3447.92, "text": " that service right + now than the the cord 19 search that we did so they actually built on that", "tokens": + [51188, 300, 2643, 558, 586, 813, 264, 264, 12250, 1294, 3164, 300, 321, 630, 370, + 436, 767, 3094, 322, 300, 51432], "temperature": 0.0, "avg_logprob": -0.1679816927228655, + "compression_ratio": 1.83203125, "no_speech_prob": 0.0003276356728747487}, {"id": + 516, "seek": 342656, "start": 3447.92, "end": 3454.4, "text": " work so so that''s + that''s great I love to put what I call sample applications you know how what", + "tokens": [51432, 589, 370, 370, 300, 311, 300, 311, 869, 286, 959, 281, 829, 437, + 286, 818, 6889, 5821, 291, 458, 577, 437, 51756], "temperature": 0.0, "avg_logprob": + -0.1679816927228655, "compression_ratio": 1.83203125, "no_speech_prob": 0.0003276356728747487}, + {"id": 517, "seek": 345440, "start": 
3454.4, "end": 3460.08, "text": " can you build + with with Westpac and and that''s actually a lot of my time these days are spent + on", "tokens": [50364, 393, 291, 1322, 365, 365, 4055, 79, 326, 293, 293, 300, 311, + 767, 257, 688, 295, 452, 565, 613, 1708, 366, 4418, 322, 50648], "temperature": + 0.0, "avg_logprob": -0.12521179953774253, "compression_ratio": 1.7568807339449541, + "no_speech_prob": 0.0006008953205309808}, {"id": 518, "seek": 345440, "start": 3460.64, + "end": 3467.92, "text": " making these sample applications smooth and easy to to + work with and especially we''ve been", "tokens": [50676, 1455, 613, 6889, 5821, + 5508, 293, 1858, 281, 281, 589, 365, 293, 2318, 321, 600, 668, 51040], "temperature": + 0.0, "avg_logprob": -0.12521179953774253, "compression_ratio": 1.7568807339449541, + "no_speech_prob": 0.0006008953205309808}, {"id": 519, "seek": 345440, "start": 3467.92, + "end": 3473.52, "text": " rather weak on the kind of UI putting together front dance + and so on so that''s actually some work", "tokens": [51040, 2831, 5336, 322, 264, + 733, 295, 15682, 3372, 1214, 1868, 4489, 293, 370, 322, 370, 300, 311, 767, 512, + 589, 51320], "temperature": 0.0, "avg_logprob": -0.12521179953774253, "compression_ratio": + 1.7568807339449541, "no_speech_prob": 0.0006008953205309808}, {"id": 520, "seek": + 345440, "start": 3473.52, "end": 3479.12, "text": " that I''m doing right now to + kind of build more of the product you know what can you build with it", "tokens": + [51320, 300, 286, 478, 884, 558, 586, 281, 733, 295, 1322, 544, 295, 264, 1674, + 291, 458, 437, 393, 291, 1322, 365, 309, 51600], "temperature": 0.0, "avg_logprob": + -0.12521179953774253, "compression_ratio": 1.7568807339449541, "no_speech_prob": + 0.0006008953205309808}, {"id": 521, "seek": 347912, "start": 3479.12, "end": 3485.6, + "text": " because people don''t get really excited about looking at Jason I output + you know to actually see", "tokens": [50364, 570, 561, 500, 380, 483, 534, 
2919, + 466, 1237, 412, 11181, 286, 5598, 291, 458, 281, 767, 536, 50688], "temperature": + 0.0, "avg_logprob": -0.14565244562485638, "compression_ratio": 1.8073394495412844, + "no_speech_prob": 0.0030232362914830446}, {"id": 522, "seek": 347912, "start": 3485.6, + "end": 3490.72, "text": " some interactions faces facets you know out the completion + and to actually build that whole experience", "tokens": [50688, 512, 13280, 8475, + 49752, 291, 458, 484, 264, 19372, 293, 281, 767, 1322, 300, 1379, 1752, 50944], + "temperature": 0.0, "avg_logprob": -0.14565244562485638, "compression_ratio": 1.8073394495412844, + "no_speech_prob": 0.0030232362914830446}, {"id": 523, "seek": 347912, "start": 3490.72, + "end": 3496.56, "text": " you know for the for the product people it''s like looking + at the engine when you actually want to", "tokens": [50944, 291, 458, 337, 264, + 337, 264, 1674, 561, 309, 311, 411, 1237, 412, 264, 2848, 562, 291, 767, 528, 281, + 51236], "temperature": 0.0, "avg_logprob": -0.14565244562485638, "compression_ratio": + 1.8073394495412844, "no_speech_prob": 0.0030232362914830446}, {"id": 524, "seek": + 347912, "start": 3496.56, "end": 3502.08, "text": " maybe look at the car right + and then you get fascinated by how shiny and sort of sleek it is and", "tokens": + [51236, 1310, 574, 412, 264, 1032, 558, 293, 550, 291, 483, 24597, 538, 577, 16997, + 293, 1333, 295, 43464, 309, 307, 293, 51512], "temperature": 0.0, "avg_logprob": + -0.14565244562485638, "compression_ratio": 1.8073394495412844, "no_speech_prob": + 0.0030232362914830446}, {"id": 525, "seek": 350208, "start": 3502.48, "end": 3509.04, + "text": " then you''re like I''m buying it yes yes I totally hear you there and + like actually in these", "tokens": [50384, 550, 291, 434, 411, 286, 478, 6382, 309, + 2086, 2086, 286, 3879, 1568, 291, 456, 293, 411, 767, 294, 613, 50712], "temperature": + 0.0, "avg_logprob": -0.10854914983113607, "compression_ratio": 1.7767441860465116, + "no_speech_prob": 
0.0009868770139291883}, {"id": 526, "seek": 350208, "start": 3509.04, + "end": 3515.2, "text": " use cases you know there are other platforms you know in + the neural search space also doing multiple", "tokens": [50712, 764, 3331, 291, + 458, 456, 366, 661, 9473, 291, 458, 294, 264, 18161, 3164, 1901, 611, 884, 3866, + 51020], "temperature": 0.0, "avg_logprob": -0.10854914983113607, "compression_ratio": + 1.7767441860465116, "no_speech_prob": 0.0009868770139291883}, {"id": 527, "seek": + 350208, "start": 3515.2, "end": 3522.0, "text": " demos have you been looking into + the direction of multimodal search does that excite you do you think", "tokens": + [51020, 33788, 362, 291, 668, 1237, 666, 264, 3513, 295, 32972, 378, 304, 3164, + 775, 300, 1624, 642, 291, 360, 291, 519, 51360], "temperature": 0.0, "avg_logprob": + -0.10854914983113607, "compression_ratio": 1.7767441860465116, "no_speech_prob": + 0.0009868770139291883}, {"id": 528, "seek": 350208, "start": 3522.88, "end": 3529.84, + "text": " it''s too much of a bling edge or niche use case or do you think it has + potential because", "tokens": [51404, 309, 311, 886, 709, 295, 257, 888, 278, 4691, + 420, 19956, 764, 1389, 420, 360, 291, 519, 309, 575, 3995, 570, 51752], "temperature": + 0.0, "avg_logprob": -0.10854914983113607, "compression_ratio": 1.7767441860465116, + "no_speech_prob": 0.0009868770139291883}, {"id": 529, "seek": 352984, "start": 3530.6400000000003, + "end": 3535.28, "text": " of the neural search crossing the boundary of text towards + the image audio and so on", "tokens": [50404, 295, 264, 18161, 3164, 14712, 264, + 12866, 295, 2487, 3030, 264, 3256, 6278, 293, 370, 322, 50636], "temperature": 0.0, + "avg_logprob": -0.1459303900252941, "compression_ratio": 1.7440758293838863, "no_speech_prob": + 0.0019839617889374495}, {"id": 530, "seek": 352984, "start": 3537.76, "end": 3544.2400000000002, + "text": " yeah I think multimodal is really where recto search is shining so this + is the area where 
you", "tokens": [50760, 1338, 286, 519, 32972, 378, 304, 307, + 534, 689, 11048, 78, 3164, 307, 18269, 370, 341, 307, 264, 1859, 689, 291, 51084], + "temperature": 0.0, "avg_logprob": -0.1459303900252941, "compression_ratio": 1.7440758293838863, + "no_speech_prob": 0.0019839617889374495}, {"id": 531, "seek": 352984, "start": 3544.96, + "end": 3551.84, "text": " it really shines I have some doubts about out of the main + like we discussed using a vector model", "tokens": [51120, 309, 534, 28056, 286, + 362, 512, 22618, 466, 484, 295, 264, 2135, 411, 321, 7152, 1228, 257, 8062, 2316, + 51464], "temperature": 0.0, "avg_logprob": -0.1459303900252941, "compression_ratio": + 1.7440758293838863, "no_speech_prob": 0.0019839617889374495}, {"id": 532, "seek": + 352984, "start": 3551.84, "end": 3556.8, "text": " for text search if you don''t + have any label training data and so on and adopted to your data", "tokens": [51464, + 337, 2487, 3164, 498, 291, 500, 380, 362, 604, 7645, 3097, 1412, 293, 370, 322, + 293, 12175, 281, 428, 1412, 51712], "temperature": 0.0, "avg_logprob": -0.1459303900252941, + "compression_ratio": 1.7440758293838863, "no_speech_prob": 0.0019839617889374495}, + {"id": 533, "seek": 355680, "start": 3557.6000000000004, "end": 3566.1600000000003, + "text": " using vector search alone for that I think is questionable but looking + at this multimodal where you", "tokens": [50404, 1228, 8062, 3164, 3312, 337, 300, + 286, 519, 307, 37158, 457, 1237, 412, 341, 32972, 378, 304, 689, 291, 50832], "temperature": + 0.0, "avg_logprob": -0.14201689370070833, "compression_ratio": 1.6901408450704225, + "no_speech_prob": 0.0010537938214838505}, {"id": 534, "seek": 355680, "start": 3566.1600000000003, + "end": 3572.32, "text": " combine both a transformer model and a typical image model + and you train that representation", "tokens": [50832, 10432, 1293, 257, 31782, 2316, + 293, 257, 7476, 3256, 2316, 293, 291, 3847, 300, 10290, 51140], "temperature": 0.0, + 
"avg_logprob": -0.14201689370070833, "compression_ratio": 1.6901408450704225, "no_speech_prob": + 0.0010537938214838505}, {"id": 535, "seek": 355680, "start": 3573.04, "end": 3577.84, + "text": " and from what I''ve seen from these models and we did a sample application + on this as well", "tokens": [51176, 293, 490, 437, 286, 600, 1612, 490, 613, 5245, + 293, 321, 630, 257, 6889, 3861, 322, 341, 382, 731, 51416], "temperature": 0.0, + "avg_logprob": -0.14201689370070833, "compression_ratio": 1.6901408450704225, "no_speech_prob": + 0.0010537938214838505}, {"id": 536, "seek": 355680, "start": 3578.5600000000004, + "end": 3585.92, "text": " using the clip embedding model from from open AI and looking + at the results I", "tokens": [51452, 1228, 264, 7353, 12240, 3584, 2316, 490, 490, + 1269, 7318, 293, 1237, 412, 264, 3542, 286, 51820], "temperature": 0.0, "avg_logprob": + -0.14201689370070833, "compression_ratio": 1.6901408450704225, "no_speech_prob": + 0.0010537938214838505}, {"id": 537, "seek": 358592, "start": 3586.48, "end": 3591.84, + "text": " I have to say that I''m really impressed by kind of just eyeballing I + don''t have any kind of", "tokens": [50392, 286, 362, 281, 584, 300, 286, 478, 534, + 11679, 538, 733, 295, 445, 38868, 278, 286, 500, 380, 362, 604, 733, 295, 50660], + "temperature": 0.0, "avg_logprob": -0.1111147953913762, "compression_ratio": 1.8256410256410256, + "no_speech_prob": 0.0037121912464499474}, {"id": 538, "seek": 358592, "start": 3593.04, + "end": 3598.48, "text": " I don''t have any hard data sets or but it''s really impressive + you know what that model can", "tokens": [50720, 286, 500, 380, 362, 604, 1152, + 1412, 6352, 420, 457, 309, 311, 534, 8992, 291, 458, 437, 300, 2316, 393, 50992], + "temperature": 0.0, "avg_logprob": -0.1111147953913762, "compression_ratio": 1.8256410256410256, + "no_speech_prob": 0.0037121912464499474}, {"id": 539, "seek": 358592, "start": 3598.48, + "end": 3605.2000000000003, "text": " can actually do so 
I definitely think that + multimodal is it''s very I don''t think it''s", "tokens": [50992, 393, 767, 360, + 370, 286, 2138, 519, 300, 32972, 378, 304, 307, 309, 311, 588, 286, 500, 380, 519, + 309, 311, 51328], "temperature": 0.0, "avg_logprob": -0.1111147953913762, "compression_ratio": + 1.8256410256410256, "no_speech_prob": 0.0037121912464499474}, {"id": 540, "seek": + 358592, "start": 3606.32, "end": 3613.76, "text": " I don''t think it''s that far + ahead I think because we have interest in representing clip", "tokens": [51384, + 286, 500, 380, 519, 309, 311, 300, 1400, 2286, 286, 519, 570, 321, 362, 1179, 294, + 13460, 7353, 51756], "temperature": 0.0, "avg_logprob": -0.1111147953913762, "compression_ratio": + 1.8256410256410256, "no_speech_prob": 0.0037121912464499474}, {"id": 541, "seek": + 361376, "start": 3614.32, "end": 3619.28, "text": " in best from from actual questions + I''m actually I''m seeing an email right now you know how to", "tokens": [50392, + 294, 1151, 490, 490, 3539, 1651, 286, 478, 767, 286, 478, 2577, 364, 3796, 558, + 586, 291, 458, 577, 281, 50640], "temperature": 0.0, "avg_logprob": -0.2134443995464279, + "compression_ratio": 1.7326732673267327, "no_speech_prob": 0.004124809056520462}, + {"id": 542, "seek": 361376, "start": 3620.2400000000002, "end": 3625.2000000000003, + "text": " they want to help on their schema and definitely they want to use clip", + "tokens": [50688, 436, 528, 281, 854, 322, 641, 34078, 293, 2138, 436, 528, 281, + 764, 7353, 50936], "temperature": 0.0, "avg_logprob": -0.2134443995464279, "compression_ratio": + 1.7326732673267327, "no_speech_prob": 0.004124809056520462}, {"id": 543, "seek": + 361376, "start": 3625.84, "end": 3631.6800000000003, "text": " yeah so definitely + I don''t think it''s that advanced at the moment and I think we''ll see", "tokens": + [50968, 1338, 370, 2138, 286, 500, 380, 519, 309, 311, 300, 7339, 412, 264, 1623, + 293, 286, 519, 321, 603, 536, 51260], "temperature": 0.0, 
"avg_logprob": -0.2134443995464279, + "compression_ratio": 1.7326732673267327, "no_speech_prob": 0.004124809056520462}, + {"id": 544, "seek": 361376, "start": 3632.8, "end": 3637.36, "text": " another thing + that I''m working on right now is that I talked about our sample applications I + want", "tokens": [51316, 1071, 551, 300, 286, 478, 1364, 322, 558, 586, 307, 300, + 286, 2825, 466, 527, 6889, 5821, 286, 528, 51544], "temperature": 0.0, "avg_logprob": + -0.2134443995464279, "compression_ratio": 1.7326732673267327, "no_speech_prob": + 0.004124809056520462}, {"id": 545, "seek": 363736, "start": 3637.36, "end": 3643.92, + "text": " to build a new sample application that demonstrates in a UI in an e-commerce + setting where you", "tokens": [50364, 281, 1322, 257, 777, 6889, 3861, 300, 31034, + 294, 257, 15682, 294, 364, 308, 12, 26926, 3287, 689, 291, 50692], "temperature": + 0.0, "avg_logprob": -0.13036282857259116, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.002691468223929405}, {"id": 546, "seek": 363736, "start": 3643.92, + "end": 3650.6400000000003, "text": " combine different kind of fussy matching exact + matching vector search all in the same query and", "tokens": [50692, 10432, 819, + 733, 295, 283, 26394, 14324, 1900, 14324, 8062, 3164, 439, 294, 264, 912, 14581, + 293, 51028], "temperature": 0.0, "avg_logprob": -0.13036282857259116, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.002691468223929405}, {"id": 547, "seek": + 363736, "start": 3650.6400000000003, "end": 3655.84, "text": " then you can have + some sliders where you actually slide these you know how does the result change", + "tokens": [51028, 550, 291, 393, 362, 512, 1061, 6936, 689, 291, 767, 4137, 613, + 291, 458, 577, 775, 264, 1874, 1319, 51288], "temperature": 0.0, "avg_logprob": + -0.13036282857259116, "compression_ratio": 1.6551724137931034, "no_speech_prob": + 0.002691468223929405}, {"id": 548, "seek": 363736, "start": 3655.84, "end": 3661.6, + 
"text": " and they change in real time so I just need some help on the on the react + front end because I''m", "tokens": [51288, 293, 436, 1319, 294, 957, 565, 370, 286, + 445, 643, 512, 854, 322, 264, 322, 264, 4515, 1868, 917, 570, 286, 478, 51576], + "temperature": 0.0, "avg_logprob": -0.13036282857259116, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.002691468223929405}, {"id": 549, "seek": 366160, "start": 3661.6, + "end": 3667.6, "text": " not I''m not a great JavaScript programmer I have to admit + so I need some help on that so yeah but", "tokens": [50364, 406, 286, 478, 406, + 257, 869, 15778, 32116, 286, 362, 281, 9796, 370, 286, 643, 512, 854, 322, 300, + 370, 1338, 457, 50664], "temperature": 0.0, "avg_logprob": -0.1433842764960395, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.0010606141295284033}, + {"id": 550, "seek": 366160, "start": 3667.6, "end": 3673.7599999999998, "text": + " I definitely think that multimodal vector search has a really has a huge number + of use cases yeah", "tokens": [50664, 286, 2138, 519, 300, 32972, 378, 304, 8062, + 3164, 575, 257, 534, 575, 257, 2603, 1230, 295, 764, 3331, 1338, 50972], "temperature": + 0.0, "avg_logprob": -0.1433842764960395, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.0010606141295284033}, {"id": 551, "seek": 366160, "start": 3674.7999999999997, + "end": 3682.64, "text": " I hope that amongst listeners of this podcast maybe there + are some with front end skills and maybe", "tokens": [51024, 286, 1454, 300, 12918, + 23274, 295, 341, 7367, 1310, 456, 366, 512, 365, 1868, 917, 3942, 293, 1310, 51416], + "temperature": 0.0, "avg_logprob": -0.1433842764960395, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.0010606141295284033}, {"id": 552, "seek": 366160, "start": 3682.64, + "end": 3688.3199999999997, "text": " since you''re building this for open source + you know that might be good use case as well to be", "tokens": [51416, 1670, 291, + 
434, 2390, 341, 337, 1269, 4009, 291, 458, 300, 1062, 312, 665, 764, 1389, 382, + 731, 281, 312, 51700], "temperature": 0.0, "avg_logprob": -0.1433842764960395, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.0010606141295284033}, {"id": 553, "seek": + 368832, "start": 3688.32, "end": 3694.96, "text": " contributing to this crazy journey + but we have yeah I mean that would I mean definitely we do see", "tokens": [50364, + 19270, 281, 341, 3219, 4671, 457, 321, 362, 1338, 286, 914, 300, 576, 286, 914, + 2138, 321, 360, 536, 50696], "temperature": 0.0, "avg_logprob": -0.15763405070585362, + "compression_ratio": 1.7130434782608697, "no_speech_prob": 0.002225163159891963}, + {"id": 554, "seek": 368832, "start": 3694.96, "end": 3702.2400000000002, "text": + " more involvement and contributions from from the in the kind of community around + VESPA so I think", "tokens": [50696, 544, 17447, 293, 15725, 490, 490, 264, 294, + 264, 733, 295, 1768, 926, 691, 2358, 10297, 370, 286, 519, 51060], "temperature": + 0.0, "avg_logprob": -0.15763405070585362, "compression_ratio": 1.7130434782608697, + "no_speech_prob": 0.002225163159891963}, {"id": 555, "seek": 368832, "start": 3702.2400000000002, + "end": 3707.28, "text": " we build a lot of the last two years of the community + side and people getting to know more about", "tokens": [51060, 321, 1322, 257, 688, + 295, 264, 1036, 732, 924, 295, 264, 1768, 1252, 293, 561, 1242, 281, 458, 544, 466, + 51312], "temperature": 0.0, "avg_logprob": -0.15763405070585362, "compression_ratio": + 1.7130434782608697, "no_speech_prob": 0.002225163159891963}, {"id": 556, "seek": + 368832, "start": 3707.28, "end": 3713.84, "text": " VESPA and actually starting + to contribute back both on the sample applications and also documentation", "tokens": + [51312, 691, 2358, 10297, 293, 767, 2891, 281, 10586, 646, 1293, 322, 264, 6889, + 5821, 293, 611, 14333, 51640], "temperature": 0.0, "avg_logprob": -0.15763405070585362, + 
"compression_ratio": 1.7130434782608697, "no_speech_prob": 0.002225163159891963}, + {"id": 557, "seek": 371384, "start": 3713.84, "end": 3718.8, "text": " and also + we''re seeing our more involved in contributing to the code so definitely yeah", + "tokens": [50364, 293, 611, 321, 434, 2577, 527, 544, 3288, 294, 19270, 281, 264, + 3089, 370, 2138, 1338, 50612], "temperature": 0.0, "avg_logprob": -0.20125305259620752, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.0013368910877034068}, + {"id": 558, "seek": 371384, "start": 3720.8, "end": 3726.56, "text": " so but I + think it''s from a product side it''s really important for us to and also we have + a", "tokens": [50712, 370, 457, 286, 519, 309, 311, 490, 257, 1674, 1252, 309, 311, + 534, 1021, 337, 505, 281, 293, 611, 321, 362, 257, 51000], "temperature": 0.0, "avg_logprob": + -0.20125305259620752, "compression_ratio": 1.619047619047619, "no_speech_prob": + 0.0013368910877034068}, {"id": 559, "seek": 371384, "start": 3726.56, "end": 3732.48, + "text": " commercial offering of VESPA where you actually have a hosted interface + hosted solution multi-region", "tokens": [51000, 6841, 8745, 295, 691, 2358, 10297, + 689, 291, 767, 362, 257, 19204, 9226, 19204, 3827, 4825, 12, 3375, 313, 51296], + "temperature": 0.0, "avg_logprob": -0.20125305259620752, "compression_ratio": 1.619047619047619, + "no_speech_prob": 0.0013368910877034068}, {"id": 560, "seek": 371384, "start": 3733.6800000000003, + "end": 3741.36, "text": " and to I think it we want VESPA to be able to run fully + fledged in your environment if you want", "tokens": [51356, 293, 281, 286, 519, + 309, 321, 528, 691, 2358, 10297, 281, 312, 1075, 281, 1190, 4498, 24114, 3004, 294, + 428, 2823, 498, 291, 528, 51740], "temperature": 0.0, "avg_logprob": -0.20125305259620752, + "compression_ratio": 1.619047619047619, "no_speech_prob": 0.0013368910877034068}, + {"id": 561, "seek": 374136, "start": 3741.36, "end": 3747.04, "text": " to use it + because 
it''s open source it''s our Pasha 20 if you want to use our cloud you are + welcome", "tokens": [50364, 281, 764, 309, 570, 309, 311, 1269, 4009, 309, 311, + 527, 430, 12137, 945, 498, 291, 528, 281, 764, 527, 4588, 291, 366, 2928, 50648], + "temperature": 0.0, "avg_logprob": -0.20485071035531852, "compression_ratio": 1.5638297872340425, + "no_speech_prob": 0.0011292911367490888}, {"id": 562, "seek": 374136, "start": 3747.04, + "end": 3756.08, "text": " to do that and to kind of have and the same kind of functionality + and what we add in the cloud is", "tokens": [50648, 281, 360, 300, 293, 281, 733, + 295, 362, 293, 264, 912, 733, 295, 14980, 293, 437, 321, 909, 294, 264, 4588, 307, + 51100], "temperature": 0.0, "avg_logprob": -0.20485071035531852, "compression_ratio": + 1.5638297872340425, "no_speech_prob": 0.0011292911367490888}, {"id": 563, "seek": + 374136, "start": 3756.6400000000003, "end": 3765.84, "text": " CICD pipelines how + to do multi-region failovers like in the US East US West you can have different", + "tokens": [51128, 383, 2532, 35, 40168, 577, 281, 360, 4825, 12, 3375, 313, 3061, + 25348, 411, 294, 264, 2546, 6747, 2546, 4055, 291, 393, 362, 819, 51588], "temperature": + 0.0, "avg_logprob": -0.20485071035531852, "compression_ratio": 1.5638297872340425, + "no_speech_prob": 0.0011292911367490888}, {"id": 564, "seek": 376584, "start": 3765.84, + "end": 3773.2000000000003, "text": " so all this kind of top and take care of sort + of take care of nodes failing and whatever you know", "tokens": [50364, 370, 439, + 341, 733, 295, 1192, 293, 747, 1127, 295, 1333, 295, 747, 1127, 295, 13891, 18223, + 293, 2035, 291, 458, 50732], "temperature": 0.0, "avg_logprob": -0.19817033278203644, + "compression_ratio": 1.864864864864865, "no_speech_prob": 0.0016855113208293915}, + {"id": 565, "seek": 376584, "start": 3773.2000000000003, "end": 3778.7200000000003, + "text": " the hole the kind of host the experience so and that''s been an issue + with our sample apps 
they have", "tokens": [50732, 264, 5458, 264, 733, 295, 3975, + 264, 1752, 370, 293, 300, 311, 668, 364, 2734, 365, 527, 6889, 7733, 436, 362, 51008], + "temperature": 0.0, "avg_logprob": -0.19817033278203644, "compression_ratio": 1.864864864864865, + "no_speech_prob": 0.0016855113208293915}, {"id": 566, "seek": 376584, "start": 3778.7200000000003, + "end": 3783.6800000000003, "text": " been like it has been some friction around + you know how to deploy them locally how to deploy", "tokens": [51008, 668, 411, + 309, 575, 668, 512, 17710, 926, 291, 458, 577, 281, 7274, 552, 16143, 577, 281, + 7274, 51256], "temperature": 0.0, "avg_logprob": -0.19817033278203644, "compression_ratio": + 1.864864864864865, "no_speech_prob": 0.0016855113208293915}, {"id": 567, "seek": + 376584, "start": 3783.6800000000003, "end": 3788.88, "text": " them to the cloud + so I''m trying to kind of bring them together so that they they work in in multiple", + "tokens": [51256, 552, 281, 264, 4588, 370, 286, 478, 1382, 281, 733, 295, 1565, + 552, 1214, 370, 300, 436, 436, 589, 294, 294, 3866, 51516], "temperature": 0.0, + "avg_logprob": -0.19817033278203644, "compression_ratio": 1.864864864864865, "no_speech_prob": + 0.0016855113208293915}, {"id": 568, "seek": 376584, "start": 3788.88, "end": 3795.1200000000003, + "text": " environments yeah that''s a lot of sense and I guess it takes a lot of + engineering effort to", "tokens": [51516, 12388, 1338, 300, 311, 257, 688, 295, + 2020, 293, 286, 2041, 309, 2516, 257, 688, 295, 7043, 4630, 281, 51828], "temperature": + 0.0, "avg_logprob": -0.19817033278203644, "compression_ratio": 1.864864864864865, + "no_speech_prob": 0.0016855113208293915}, {"id": 569, "seek": 379512, "start": 3795.12, + "end": 3801.44, "text": " also kind of cover all these different use cases so sounds + quite exciting and actually demoing", "tokens": [50364, 611, 733, 295, 2060, 439, + 613, 819, 764, 3331, 370, 3263, 1596, 4670, 293, 767, 10723, 278, 50680], "temperature": + 
0.0, "avg_logprob": -0.08196961602499318, "compression_ratio": 1.7244444444444444, + "no_speech_prob": 0.0008621862507425249}, {"id": 570, "seek": 379512, "start": 3801.44, + "end": 3810.0, "text": " the technology I think you know as you know other vector + databases have got it and I think it''s a", "tokens": [50680, 264, 2899, 286, 519, + 291, 458, 382, 291, 458, 661, 8062, 22380, 362, 658, 309, 293, 286, 519, 309, 311, + 257, 51108], "temperature": 0.0, "avg_logprob": -0.08196961602499318, "compression_ratio": + 1.7244444444444444, "no_speech_prob": 0.0008621862507425249}, {"id": 571, "seek": + 379512, "start": 3810.0, "end": 3817.3599999999997, "text": " such a low entry for + especially for non-technical people or those who are in charge of businesses", "tokens": + [51108, 1270, 257, 2295, 8729, 337, 2318, 337, 2107, 12, 29113, 804, 561, 420, 729, + 567, 366, 294, 4602, 295, 6011, 51476], "temperature": 0.0, "avg_logprob": -0.08196961602499318, + "compression_ratio": 1.7244444444444444, "no_speech_prob": 0.0008621862507425249}, + {"id": 572, "seek": 379512, "start": 3817.3599999999997, "end": 3823.92, "text": + " business units to actually make decisions and I think for them you know having + a relevant demo is", "tokens": [51476, 1606, 6815, 281, 767, 652, 5327, 293, 286, + 519, 337, 552, 291, 458, 1419, 257, 7340, 10723, 307, 51804], "temperature": 0.0, + "avg_logprob": -0.08196961602499318, "compression_ratio": 1.7244444444444444, "no_speech_prob": + 0.0008621862507425249}, {"id": 573, "seek": 382392, "start": 3823.92, "end": 3830.2400000000002, + "text": " going to be quite a game changer because if they need to reason about + your technology only through", "tokens": [50364, 516, 281, 312, 1596, 257, 1216, + 22822, 570, 498, 436, 643, 281, 1778, 466, 428, 2899, 787, 807, 50680], "temperature": + 0.0, "avg_logprob": -0.0946647510972134, "compression_ratio": 1.6835443037974684, + "no_speech_prob": 0.001372458878904581}, {"id": 574, "seek": 382392, "start": 
3830.2400000000002, + "end": 3838.0, "text": " the eyes of engineers in their company then probably that''s + that''s much longer path right yeah exactly", "tokens": [50680, 264, 2575, 295, + 11955, 294, 641, 2237, 550, 1391, 300, 311, 300, 311, 709, 2854, 3100, 558, 1338, + 2293, 51068], "temperature": 0.0, "avg_logprob": -0.0946647510972134, "compression_ratio": + 1.6835443037974684, "no_speech_prob": 0.001372458878904581}, {"id": 575, "seek": + 382392, "start": 3838.0, "end": 3844.88, "text": " and I want this experience to + be as smooth as possible so that you can get started with the sample", "tokens": + [51068, 293, 286, 528, 341, 1752, 281, 312, 382, 5508, 382, 1944, 370, 300, 291, + 393, 483, 1409, 365, 264, 6889, 51412], "temperature": 0.0, "avg_logprob": -0.0946647510972134, + "compression_ratio": 1.6835443037974684, "no_speech_prob": 0.001372458878904581}, + {"id": 576, "seek": 382392, "start": 3844.88, "end": 3852.7200000000003, "text": + " application run it locally get some data into it fire up your front and react + and you can interact", "tokens": [51412, 3861, 1190, 309, 16143, 483, 512, 1412, + 666, 309, 2610, 493, 428, 1868, 293, 4515, 293, 291, 393, 4648, 51804], "temperature": + 0.0, "avg_logprob": -0.0946647510972134, "compression_ratio": 1.6835443037974684, + "no_speech_prob": 0.001372458878904581}, {"id": 577, "seek": 385272, "start": 3852.72, + "end": 3858.64, "text": " with it and if you''re happy with it if you want to share + with your friends you can upload it to", "tokens": [50364, 365, 309, 293, 498, 291, + 434, 2055, 365, 309, 498, 291, 528, 281, 2073, 365, 428, 1855, 291, 393, 6580, 309, + 281, 50660], "temperature": 0.0, "avg_logprob": -0.13376969362782165, "compression_ratio": + 1.9208333333333334, "no_speech_prob": 0.003531272755935788}, {"id": 578, "seek": + 385272, "start": 3858.64, "end": 3863.8399999999997, "text": " the Westpac Cloud + and then you can share to URL to your friends and that''s a model that I really", + 
"tokens": [50660, 264, 4055, 79, 326, 8061, 293, 550, 291, 393, 2073, 281, 12905, + 281, 428, 1855, 293, 300, 311, 257, 2316, 300, 286, 534, 50920], "temperature": + 0.0, "avg_logprob": -0.13376969362782165, "compression_ratio": 1.9208333333333334, + "no_speech_prob": 0.003531272755935788}, {"id": 579, "seek": 385272, "start": 3863.8399999999997, + "end": 3868.8799999999997, "text": " believe in that you can it''s open source so + you can actually run it locally and then you can", "tokens": [50920, 1697, 294, + 300, 291, 393, 309, 311, 1269, 4009, 370, 291, 393, 767, 1190, 309, 16143, 293, + 550, 291, 393, 51172], "temperature": 0.0, "avg_logprob": -0.13376969362782165, + "compression_ratio": 1.9208333333333334, "no_speech_prob": 0.003531272755935788}, + {"id": 580, "seek": 385272, "start": 3868.8799999999997, "end": 3873.8399999999997, + "text": " take the cloud provider can actually take care of the hosting for you + so that''s", "tokens": [51172, 747, 264, 4588, 12398, 393, 767, 747, 1127, 295, + 264, 16058, 337, 291, 370, 300, 311, 51420], "temperature": 0.0, "avg_logprob": + -0.13376969362782165, "compression_ratio": 1.9208333333333334, "no_speech_prob": + 0.003531272755935788}, {"id": 581, "seek": 385272, "start": 3875.12, "end": 3880.3999999999996, + "text": " and right now we actually we are providing like free trials so you don''t + you only need an email", "tokens": [51484, 293, 558, 586, 321, 767, 321, 366, 6530, + 411, 1737, 12450, 370, 291, 500, 380, 291, 787, 643, 364, 3796, 51748], "temperature": + 0.0, "avg_logprob": -0.13376969362782165, "compression_ratio": 1.9208333333333334, + "no_speech_prob": 0.003531272755935788}, {"id": 582, "seek": 388040, "start": 3880.48, + "end": 3884.56, "text": " address for the Westpac Cloud you don''t need a credit + card or things like that so you can actually", "tokens": [50368, 2985, 337, 264, + 4055, 79, 326, 8061, 291, 500, 380, 643, 257, 5397, 2920, 420, 721, 411, 300, 370, + 291, 393, 767, 50572], 
"temperature": 0.0, "avg_logprob": -0.14439379085193982, + "compression_ratio": 1.7740740740740741, "no_speech_prob": 0.0016540519427508116}, + {"id": 583, "seek": 388040, "start": 3884.56, "end": 3890.56, "text": " play it + play with it and run with we can even leave a link where users can try out", "tokens": + [50572, 862, 309, 862, 365, 309, 293, 1190, 365, 321, 393, 754, 1856, 257, 2113, + 689, 5022, 393, 853, 484, 50872], "temperature": 0.0, "avg_logprob": -0.14439379085193982, + "compression_ratio": 1.7740740740740741, "no_speech_prob": 0.0016540519427508116}, + {"id": 584, "seek": 388040, "start": 3891.28, "end": 3896.2400000000002, "text": + " Westpac and subscribe so I think that will be quite beneficial and actually I + was thinking like even", "tokens": [50908, 4055, 79, 326, 293, 3022, 370, 286, 519, + 300, 486, 312, 1596, 14072, 293, 767, 286, 390, 1953, 411, 754, 51156], "temperature": + 0.0, "avg_logprob": -0.14439379085193982, "compression_ratio": 1.7740740740740741, + "no_speech_prob": 0.0016540519427508116}, {"id": 585, "seek": 388040, "start": 3896.2400000000002, + "end": 3902.8, "text": " though we a little bit drifted in our conversation away + from better search you did mention the exciting", "tokens": [51156, 1673, 321, 257, + 707, 857, 19699, 292, 294, 527, 3761, 1314, 490, 1101, 3164, 291, 630, 2152, 264, + 4670, 51484], "temperature": 0.0, "avg_logprob": -0.14439379085193982, "compression_ratio": + 1.7740740740740741, "no_speech_prob": 0.0016540519427508116}, {"id": 586, "seek": + 388040, "start": 3902.8, "end": 3908.4, "text": " space of combining you know better + search with smart search and I wanted to take it from the", "tokens": [51484, 1901, + 295, 21928, 291, 458, 1101, 3164, 365, 4069, 3164, 293, 286, 1415, 281, 747, 309, + 490, 264, 51764], "temperature": 0.0, "avg_logprob": -0.14439379085193982, "compression_ratio": + 1.7740740740740741, "no_speech_prob": 0.0016540519427508116}, {"id": 587, "seek": + 390840, "start": 3908.4, 
"end": 3914.96, "text": " angle of a non-technical user + right so let''s say they come to you and they say Joe can you actually", "tokens": + [50364, 5802, 295, 257, 2107, 12, 29113, 804, 4195, 558, 370, 718, 311, 584, 436, + 808, 281, 291, 293, 436, 584, 6807, 393, 291, 767, 50692], "temperature": 0.0, "avg_logprob": + -0.11715237990669582, "compression_ratio": 1.7180616740088106, "no_speech_prob": + 0.0014938532840460539}, {"id": 588, "seek": 390840, "start": 3914.96, "end": 3920.4, + "text": " enlighten me a little bit on how do I combine these things maybe I just + want to deep my toe and", "tokens": [50692, 18690, 268, 385, 257, 707, 857, 322, + 577, 360, 286, 10432, 613, 721, 1310, 286, 445, 528, 281, 2452, 452, 13976, 293, + 50964], "temperature": 0.0, "avg_logprob": -0.11715237990669582, "compression_ratio": + 1.7180616740088106, "no_speech_prob": 0.0014938532840460539}, {"id": 589, "seek": + 390840, "start": 3920.4, "end": 3927.84, "text": " vector search just to see what + it cannot cannot do in my domain what would you recommend them to do", "tokens": + [50964, 8062, 3164, 445, 281, 536, 437, 309, 2644, 2644, 360, 294, 452, 9274, 437, + 576, 291, 2748, 552, 281, 360, 51336], "temperature": 0.0, "avg_logprob": -0.11715237990669582, + "compression_ratio": 1.7180616740088106, "no_speech_prob": 0.0014938532840460539}, + {"id": 590, "seek": 390840, "start": 3927.84, "end": 3933.6800000000003, "text": + " assuming that they already have maybe like a smart search engine and maybe they + are evaluating", "tokens": [51336, 11926, 300, 436, 1217, 362, 1310, 411, 257, 4069, + 3164, 2848, 293, 1310, 436, 366, 27479, 51628], "temperature": 0.0, "avg_logprob": + -0.11715237990669582, "compression_ratio": 1.7180616740088106, "no_speech_prob": + 0.0014938532840460539}, {"id": 591, "seek": 393368, "start": 3933.7599999999998, + "end": 3946.08, "text": " Westpac as one candidate yeah so I think the question + is if you''re using Westpac it''s rather", "tokens": [50368, 
4055, 79, 326, 382, + 472, 11532, 1338, 370, 286, 519, 264, 1168, 307, 498, 291, 434, 1228, 4055, 79, + 326, 309, 311, 2831, 50984], "temperature": 0.0, "avg_logprob": -0.16778639088506284, + "compression_ratio": 1.5666666666666667, "no_speech_prob": 0.005347550846636295}, + {"id": 592, "seek": 393368, "start": 3946.08, "end": 3951.12, "text": " easy to + do this because you you can express it in the query and then you write the right + key", "tokens": [50984, 1858, 281, 360, 341, 570, 291, 291, 393, 5109, 309, 294, + 264, 14581, 293, 550, 291, 2464, 264, 558, 2141, 51236], "temperature": 0.0, "avg_logprob": + -0.16778639088506284, "compression_ratio": 1.5666666666666667, "no_speech_prob": + 0.005347550846636295}, {"id": 593, "seek": 393368, "start": 3951.12, "end": 3956.16, + "text": " profiles saying that you know this is how going to combine the sparse + ranking single for example", "tokens": [51236, 23693, 1566, 300, 291, 458, 341, + 307, 577, 516, 281, 10432, 264, 637, 11668, 17833, 2167, 337, 1365, 51488], "temperature": + 0.0, "avg_logprob": -0.16778639088506284, "compression_ratio": 1.5666666666666667, + "no_speech_prob": 0.005347550846636295}, {"id": 594, "seek": 395616, "start": 3956.16, + "end": 3964.0, "text": " be on 25 with retrieval for others that are not using Westpac + using for example elastic search and", "tokens": [50364, 312, 322, 3552, 365, 19817, + 3337, 337, 2357, 300, 366, 406, 1228, 4055, 79, 326, 1228, 337, 1365, 17115, 3164, + 293, 50756], "temperature": 0.0, "avg_logprob": -0.24895044391074878, "compression_ratio": + 1.7731481481481481, "no_speech_prob": 0.0003182266082148999}, {"id": 595, "seek": + 395616, "start": 3964.56, "end": 3969.92, "text": " open source of our shesolar + what we see is that they build a lot of infrastructure on top of", "tokens": [50784, + 1269, 4009, 295, 527, 402, 279, 401, 289, 437, 321, 536, 307, 300, 436, 1322, 257, + 688, 295, 6896, 322, 1192, 295, 51052], "temperature": 0.0, "avg_logprob": 
-0.24895044391074878, + "compression_ratio": 1.7731481481481481, "no_speech_prob": 0.0003182266082148999}, + {"id": 596, "seek": 395616, "start": 3969.92, "end": 3976.72, "text": " these so + they actually have the ranking layers outside of elastic search right so in that + case is", "tokens": [51052, 613, 370, 436, 767, 362, 264, 17833, 7914, 2380, 295, + 17115, 3164, 558, 370, 294, 300, 1389, 307, 51392], "temperature": 0.0, "avg_logprob": + -0.24895044391074878, "compression_ratio": 1.7731481481481481, "no_speech_prob": + 0.0003182266082148999}, {"id": 597, "seek": 395616, "start": 3976.72, "end": 3985.2, + "text": " you could have kind of a vector search library running at the side of + elastic search and then", "tokens": [51392, 291, 727, 362, 733, 295, 257, 8062, + 3164, 6405, 2614, 412, 264, 1252, 295, 17115, 3164, 293, 550, 51816], "temperature": + 0.0, "avg_logprob": -0.24895044391074878, "compression_ratio": 1.7731481481481481, + "no_speech_prob": 0.0003182266082148999}, {"id": 598, "seek": 398520, "start": 3985.8399999999997, + "end": 3991.6, "text": " retrieve and then you need to you need to keep those two + data stores in sync and then you can", "tokens": [50396, 30254, 293, 550, 291, 643, + 281, 291, 643, 281, 1066, 729, 732, 1412, 9512, 294, 20271, 293, 550, 291, 393, + 50684], "temperature": 0.0, "avg_logprob": -0.08734472592671712, "compression_ratio": + 1.9077669902912622, "no_speech_prob": 0.000683339312672615}, {"id": 599, "seek": + 398520, "start": 3991.6, "end": 3999.6, "text": " in parallel fetch okay give me + elastic search your best results and vector search database give me", "tokens": + [50684, 294, 8952, 23673, 1392, 976, 385, 17115, 3164, 428, 1151, 3542, 293, 8062, + 3164, 8149, 976, 385, 51084], "temperature": 0.0, "avg_logprob": -0.08734472592671712, + "compression_ratio": 1.9077669902912622, "no_speech_prob": 0.000683339312672615}, + {"id": 600, "seek": 398520, "start": 3999.6, "end": 4006.3999999999996, "text": + " your best 
results and then you can use a technique called reciprocal rank fusion
+ where you basically", "tokens": [51084, 428, 1151, 3542, 293, 550, 291, 393, 764,
+ 257, 6532, 1219, 46948, 6181, 23100, 689, 291, 1936, 51424], "temperature": 0.0,
+ "avg_logprob": -0.08734472592671712, "compression_ratio": 1.9077669902912622, "no_speech_prob":
+ 0.000683339312672615}, {"id": 601, "seek": 398520, "start": 4006.3999999999996,
+ "end": 4012.56, "text": " merged results based on you know are they are they ranking
+ you know it''s the document found in both", "tokens": [51424, 36427, 3542, 2361,
+ 322, 291, 458, 366, 436, 366, 436, 17833, 291, 458, 309, 311, 264, 4166, 1352, 294,
+ 1293, 51732], "temperature": 0.0, "avg_logprob": -0.08734472592671712, "compression_ratio":
+ 1.9077669902912622, "no_speech_prob": 0.000683339312672615}, {"id": 602, "seek":
+ 401256, "start": 4013.52, "end": 4018.32, "text": " so that''s that''s a powerful
+ technique of but you don''t have to actually know anything about", "tokens": [50412,
+ 370, 300, 311, 300, 311, 257, 4005, 6532, 295, 457, 291, 500, 380, 362, 281, 767,
+ 458, 1340, 466, 50652], "temperature": 0.0, "avg_logprob": -0.15275844486280418,
+ "compression_ratio": 1.7327188940092166, "no_speech_prob": 0.00033098013955168426},
+ {"id": 603, "seek": 401256, "start": 4018.32, "end": 4024.0, "text": " the distribution
+ of ranking scores and so on so google is writing a lot about reciprocal rank", "tokens":
+ [50652, 264, 7316, 295, 17833, 13444, 293, 370, 322, 370, 20742, 307, 3579, 257,
+ 688, 466, 46948, 6181, 50936], "temperature": 0.0, "avg_logprob": -0.15275844486280418,
+ "compression_ratio": 1.7327188940092166, "no_speech_prob": 0.00033098013955168426},
+ {"id": 604, "seek": 401256, "start": 4024.0, "end": 4030.56, "text": " fusion so
+ it''s interesting direction and that''s one thing we know from Bing and from others
+ from", "tokens": [50936, 23100, 370, 309, 311, 1880, 3513, 293, 300, 311, 472, 551,
+ 321, 458, 490, 30755, 293, 490, 2357, 490, 51264], "temperature": 0.0, "avg_logprob":
+ -0.15275844486280418, "compression_ratio": 1.7327188940092166, "no_speech_prob":
+ 0.00033098013955168426}, {"id": 605, "seek": 401256, "start": 4030.56, "end": 4038.16,
+ "text": " from both Bing and from bydo in in China is that they''re doing this kind
+ of mix mix retrieval", "tokens": [51264, 490, 1293, 30755, 293, 490, 538, 2595,
+ 294, 294, 3533, 307, 300, 436, 434, 884, 341, 733, 295, 2890, 2890, 19817, 3337,
+ 51644], "temperature": 0.0, "avg_logprob": -0.15275844486280418, "compression_ratio":
+ 1.7327188940092166, "no_speech_prob": 0.00033098013955168426}, {"id": 606, "seek":
+ 403816, "start": 4039.12, "end": 4044.56, "text": " with different systems for sparse
+ signals and then signals but but then you have for the regular", "tokens": [50412,
+ 365, 819, 3652, 337, 637, 11668, 12354, 293, 550, 12354, 457, 457, 550, 291, 362,
+ 337, 264, 3890, 50684], "temperature": 0.0, "avg_logprob": -0.23162523905436197,
+ "compression_ratio": 1.8066037735849056, "no_speech_prob": 0.004440918564796448},
+ {"id": 607, "seek": 403816, "start": 4044.56, "end": 4050.3199999999997, "text":
+ " uses you have a lot of moving parts right you have different data stores make
+ the manned ship", "tokens": [50684, 4960, 291, 362, 257, 688, 295, 2684, 3166, 558,
+ 291, 362, 819, 1412, 9512, 652, 264, 587, 9232, 5374, 50972], "temperature": 0.0,
+ "avg_logprob": -0.23162523905436197, "compression_ratio": 1.8066037735849056, "no_speech_prob":
+ 0.004440918564796448}, {"id": 608, "seek": 403816, "start": 4050.3199999999997,
+ "end": 4055.6, "text": " and that''s one of the things that we try to our advantage
+ is that when you''re using Westbyes that", "tokens": [50972, 293, 300, 311, 472,
+ 295, 264, 721, 300, 321, 853, 281, 527, 5002, 307, 300, 562, 291, 434, 1228, 4055,
+ 2322, 279, 300, 51236], "temperature": 0.0, "avg_logprob": -0.23162523905436197,
+ "compression_ratio": 1.8066037735849056, "no_speech_prob": 0.004440918564796448},
+ {"id": 609, "seek": 403816, "start": 4056.3199999999997, "end": 4061.8399999999997,
+ "text": " you know you you get these capabilities in the same engine you don''t
+ need to store the data in", "tokens": [51272, 291, 458, 291, 291, 483, 613, 10862,
+ 294, 264, 912, 2848, 291, 500, 380, 643, 281, 3531, 264, 1412, 294, 51548], "temperature":
+ 0.0, "avg_logprob": -0.23162523905436197, "compression_ratio": 1.8066037735849056,
+ "no_speech_prob": 0.004440918564796448}, {"id": 610, "seek": 406184, "start": 4061.84,
+ "end": 4068.7200000000003, "text": " different stores and having consistency problems
+ because of that yeah yeah so I will definitely", "tokens": [50364, 819, 9512, 293,
+ 1419, 14416, 2740, 570, 295, 300, 1338, 1338, 370, 286, 486, 2138, 50708], "temperature":
+ 0.0, "avg_logprob": -0.20454967169114102, "compression_ratio": 1.6755555555555555,
+ "no_speech_prob": 0.00220703799277544}, {"id": 611, "seek": 406184, "start": 4068.7200000000003,
+ "end": 4073.6800000000003, "text": " if you''re interested if you''re sitting there
+ today with open source or or elastic search", "tokens": [50708, 498, 291, 434, 3102,
+ 498, 291, 434, 3798, 456, 965, 365, 1269, 4009, 420, 420, 17115, 3164, 50956], "temperature":
+ 0.0, "avg_logprob": -0.20454967169114102, "compression_ratio": 1.6755555555555555,
+ "no_speech_prob": 0.00220703799277544}, {"id": 612, "seek": 406184, "start": 4074.88,
+ "end": 4080.6400000000003, "text": " and you don''t want to invest in in in in the
+ vast park you could try this batching the query", "tokens": [51016, 293, 291, 500,
+ 380, 528, 281, 1963, 294, 294, 294, 294, 264, 8369, 3884, 291, 727, 853, 341, 15245,
+ 278, 264, 14581, 51304], "temperature": 0.0, "avg_logprob": -0.20454967169114102,
+ "compression_ratio": 1.6755555555555555, "no_speech_prob": 0.00220703799277544},
+ {"id": 613, "seek": 406184, "start": 4080.6400000000003, "end": 4087.76, "text":
+ " and doing reciprocal rank fusion yeah yeah it could be like one way to actually
+ introduce something", "tokens": [51304, 293, 884, 46948, 6181, 23100, 1338, 1338,
+ 309, 727, 312, 411, 472, 636, 281, 767, 5366, 746, 51660], "temperature": 0.0, "avg_logprob":
+ -0.20454967169114102, "compression_ratio": 1.6755555555555555, "no_speech_prob":
+ 0.00220703799277544}, {"id": 614, "seek": 408776, "start": 4087.84, "end": 4094.48,
+ "text": " from more like semantic search if you view it that way right so that''s
+ a great idea because I think", "tokens": [50368, 490, 544, 411, 47982, 3164, 498,
+ 291, 1910, 309, 300, 636, 558, 370, 300, 311, 257, 869, 1558, 570, 286, 519, 50700],
+ "temperature": 0.0, "avg_logprob": -0.14632236256318934, "compression_ratio": 1.7327188940092166,
+ "no_speech_prob": 0.00031861523166298866}, {"id": 615, "seek": 408776, "start":
+ 4095.6800000000003, "end": 4102.08, "text": " there are multiple approaches to this
+ and I think if you are within one search engine", "tokens": [50760, 456, 366, 3866,
+ 11587, 281, 341, 293, 286, 519, 498, 291, 366, 1951, 472, 3164, 2848, 51080], "temperature":
+ 0.0, "avg_logprob": -0.14632236256318934, "compression_ratio": 1.7327188940092166,
+ "no_speech_prob": 0.00031861523166298866}, {"id": 616, "seek": 408776, "start":
+ 4103.52, "end": 4109.12, "text": " like say VESPA or elastic search open search
+ a solar would have you then I think you could in", "tokens": [51152, 411, 584, 691,
+ 2358, 10297, 420, 17115, 3164, 1269, 3164, 257, 7936, 576, 362, 291, 550, 286, 519,
+ 291, 727, 294, 51432], "temperature": 0.0, "avg_logprob": -0.14632236256318934,
+ "compression_ratio": 1.7327188940092166, "no_speech_prob": 0.00031861523166298866},
+ {"id": 617, "seek": 408776, "start": 4109.12, "end": 4116.72, "text": " principle
+ experiment with like fusing you know the neural search result with sparse search
+ using", "tokens": [51432, 8665, 5120, 365, 411, 283, 7981, 291, 458, 264, 18161,
+ 3164, 1874, 365, 637, 11668, 3164, 1228, 51812], "temperature": 0.0, "avg_logprob":
+ -0.14632236256318934, "compression_ratio": 1.7327188940092166, "no_speech_prob":
+ 0.00031861523166298866}, {"id": 618, "seek": 411672, "start": 4116.72, "end": 4123.92,
+ "text": " some kind of linear combination as you actually retrieve it right yeah
+ so so so so you so you can", "tokens": [50364, 512, 733, 295, 8213, 6562, 382, 291,
+ 767, 30254, 309, 558, 1338, 370, 370, 370, 370, 291, 370, 291, 393, 50724], "temperature":
+ 0.0, "avg_logprob": -0.1170221306811804, "compression_ratio": 1.835680751173709,
+ "no_speech_prob": 0.0007419881876558065}, {"id": 619, "seek": 411672, "start": 4123.92,
+ "end": 4129.12, "text": " actually use the linear combination but the great thing
+ about this rank fusion is that you don''t", "tokens": [50724, 767, 764, 264, 8213,
+ 6562, 457, 264, 869, 551, 466, 341, 6181, 23100, 307, 300, 291, 500, 380, 50984],
+ "temperature": 0.0, "avg_logprob": -0.1170221306811804, "compression_ratio": 1.835680751173709,
+ "no_speech_prob": 0.0007419881876558065}, {"id": 620, "seek": 411672, "start": 4129.12,
+ "end": 4136.0, "text": " simply you don''t look at the ranking scores so you basically
+ just fuse them by the order of their", "tokens": [50984, 2935, 291, 500, 380, 574,
+ 412, 264, 17833, 13444, 370, 291, 1936, 445, 31328, 552, 538, 264, 1668, 295, 641,
+ 51328], "temperature": 0.0, "avg_logprob": -0.1170221306811804, "compression_ratio":
+ 1.835680751173709, "no_speech_prob": 0.0007419881876558065}, {"id": 621, "seek":
+ 411672, "start": 4136.0, "end": 4142.8, "text": " returns so you don''t have to
+ know anything about the score distribution like EM25 it has basically", "tokens":
+ [51328, 11247, 370, 291, 500, 380, 362, 281, 458, 1340, 466, 264, 6175, 7316, 411,
+ 16237, 6074, 309, 575, 1936, 51668], "temperature": 0.0, "avg_logprob": -0.1170221306811804,
+ "compression_ratio": 1.835680751173709, "no_speech_prob": 0.0007419881876558065},
+ {"id": 622, "seek": 414280, "start": 4142.8, "end": 4150.56, "text": " unbounded
+ it could be 25 it could be 100 to be 5 right so it''s very difficult to to to combine
+ that", "tokens": [50364, 517, 18767, 292, 309, 727, 312, 3552, 309, 727, 312, 2319,
+ 281, 312, 1025, 558, 370, 309, 311, 588, 2252, 281, 281, 281, 10432, 300, 50752],
+ "temperature": 0.0, "avg_logprob": -0.17413631900326237, "compression_ratio": 1.6796536796536796,
+ "no_speech_prob": 0.0005067753954790533}, {"id": 623, "seek": 414280, "start": 4150.56,
+ "end": 4156.16, "text": " using a linear model because you have two signals you
+ know and one is number is going to be like this", "tokens": [50752, 1228, 257, 8213,
+ 2316, 570, 291, 362, 732, 12354, 291, 458, 293, 472, 307, 1230, 307, 516, 281, 312,
+ 411, 341, 51032], "temperature": 0.0, "avg_logprob": -0.17413631900326237, "compression_ratio":
+ 1.6796536796536796, "no_speech_prob": 0.0005067753954790533}, {"id": 624, "seek":
+ 414280, "start": 4156.16, "end": 4162.56, "text": " and now it was going to be between
+ 0 and 1 so reciprocal rank fusion is definitely you know", "tokens": [51032, 293,
+ 586, 309, 390, 516, 281, 312, 1296, 1958, 293, 502, 370, 46948, 6181, 23100, 307,
+ 2138, 291, 458, 51352], "temperature": 0.0, "avg_logprob": -0.17413631900326237,
+ "compression_ratio": 1.6796536796536796, "no_speech_prob": 0.0005067753954790533},
+ {"id": 625, "seek": 414280, "start": 4163.6, "end": 4170.400000000001, "text": "
+ interesting case actually this is super great point and hopefully we can provide
+ some links to", "tokens": [51404, 1880, 1389, 767, 341, 307, 1687, 869, 935, 293,
+ 4696, 321, 393, 2893, 512, 6123, 281, 51744], "temperature": 0.0, "avg_logprob":
+ -0.17413631900326237, "compression_ratio": 1.6796536796536796, "no_speech_prob":
+ 0.0005067753954790533}, {"id": 626, "seek": 417040, "start": 4170.48, "end": 4176.879999999999,
+ "text": " this because this technique because I think I heard this question multiple
+ times that would you", "tokens": [50368, 341, 570, 341, 6532, 570, 286, 519, 286,
+ 2198, 341, 1168, 3866, 1413, 300, 576, 291, 50688], "temperature": 0.0, "avg_logprob":
+ -0.10547225446586149, "compression_ratio": 1.7813953488372094, "no_speech_prob":
+ 0.0018489620415493846}, {"id": 627, "seek": 417040, "start": 4176.879999999999,
+ "end": 4184.0, "text": " set exactly just now that the score space is completely
+ different and they are not compatible with", "tokens": [50688, 992, 2293, 445, 586,
+ 300, 264, 6175, 1901, 307, 2584, 819, 293, 436, 366, 406, 18218, 365, 51044], "temperature":
+ 0.0, "avg_logprob": -0.10547225446586149, "compression_ratio": 1.7813953488372094,
+ "no_speech_prob": 0.0018489620415493846}, {"id": 628, "seek": 417040, "start": 4184.0,
+ "end": 4191.679999999999, "text": " each other and so you have to find a way to
+ still interleave them or merge them right so that", "tokens": [51044, 1184, 661,
+ 293, 370, 291, 362, 281, 915, 257, 636, 281, 920, 728, 306, 946, 552, 420, 22183,
+ 552, 558, 370, 300, 51428], "temperature": 0.0, "avg_logprob": -0.10547225446586149,
+ "compression_ratio": 1.7813953488372094, "no_speech_prob": 0.0018489620415493846},
+ {"id": 629, "seek": 417040, "start": 4192.719999999999, "end": 4197.5199999999995,
+ "text": " would you set exactly makes sense that you don''t pay attention to the
+ score space you actually", "tokens": [51480, 576, 291, 992, 2293, 1669, 2020, 300,
+ 291, 500, 380, 1689, 3202, 281, 264, 6175, 1901, 291, 767, 51720], "temperature":
+ 0.0, "avg_logprob": -0.10547225446586149, "compression_ratio": 1.7813953488372094,
+ "no_speech_prob": 0.0018489620415493846}, {"id": 630, "seek": 419752, "start": 4197.52,
+ "end": 4204.72, "text": " look at the order and you try your best to interleave
+ them yeah that makes total sense yeah yeah", "tokens": [50364, 574, 412, 264, 1668,
+ 293, 291, 853, 428, 1151, 281, 728, 306, 946, 552, 1338, 300, 1669, 3217, 2020,
+ 1338, 1338, 50724], "temperature": 0.0, "avg_logprob": -0.17798460535256258, "compression_ratio":
+ 1.7442922374429224, "no_speech_prob": 0.0006731762550771236}, {"id": 631, "seek":
+ 419752, "start": 4205.52, "end": 4211.52, "text": " there was actually a recent
+ recent paper on because there has been more interest in that these", "tokens": [50764,
+ 456, 390, 767, 257, 5162, 5162, 3035, 322, 570, 456, 575, 668, 544, 1179, 294, 300,
+ 613, 51064], "temperature": 0.0, "avg_logprob": -0.17798460535256258, "compression_ratio":
+ 1.7442922374429224, "no_speech_prob": 0.0006731762550771236}, {"id": 632, "seek":
+ 419752, "start": 4211.52, "end": 4217.92, "text": " dense models alone that they
+ generalize not that well what you''re using out of domain and one of", "tokens":
+ [51064, 18011, 5245, 3312, 300, 436, 2674, 1125, 406, 300, 731, 437, 291, 434, 1228,
+ 484, 295, 9274, 293, 472, 295, 51384], "temperature": 0.0, "avg_logprob": -0.17798460535256258,
+ "compression_ratio": 1.7442922374429224, "no_speech_prob": 0.0006731762550771236},
+ {"id": 633, "seek": 419752, "start": 4217.92, "end": 4223.84, "text": " the things
+ that the Google researchers were doing and showed promising results was using this",
+ "tokens": [51384, 264, 721, 300, 264, 3329, 10309, 645, 884, 293, 4712, 20257, 3542,
+ 390, 1228, 341, 51680], "temperature": 0.0, "avg_logprob": -0.17798460535256258,
+ "compression_ratio": 1.7442922374429224, "no_speech_prob": 0.0006731762550771236},
+ {"id": 634, "seek": 422384, "start": 4223.84, "end": 4230.72, "text": " rank fusion
+ and I''ve seen this rank fusion in a multiple Google papers so so it''s very interesting",
+ "tokens": [50364, 6181, 23100, 293, 286, 600, 1612, 341, 6181, 23100, 294, 257,
+ 3866, 3329, 10577, 370, 370, 309, 311, 588, 1880, 50708], "temperature": 0.0, "avg_logprob":
+ -0.17916328566414969, "compression_ratio": 1.7105263157894737, "no_speech_prob":
+ 0.001993240090087056}, {"id": 635, "seek": 422384, "start": 4230.72, "end": 4238.72,
+ "text": " the researchers they''re really interested in reciprocal rank fusion so
+ yeah sounds like a popular", "tokens": [50708, 264, 10309, 436, 434, 534, 3102,
+ 294, 46948, 6181, 23100, 370, 1338, 3263, 411, 257, 3743, 51108], "temperature":
+ 0.0, "avg_logprob": -0.17916328566414969, "compression_ratio": 1.7105263157894737,
+ "no_speech_prob": 0.001993240090087056}, {"id": 636, "seek": 422384, "start": 4238.72,
+ "end": 4246.0, "text": " technique yes I mean time flies and I really enjoy talking
+ it feels like we could record another", "tokens": [51108, 6532, 2086, 286, 914,
+ 565, 17414, 293, 286, 534, 2103, 1417, 309, 3417, 411, 321, 727, 2136, 1071, 51472],
+ "temperature": 0.0, "avg_logprob": -0.17916328566414969, "compression_ratio": 1.7105263157894737,
+ "no_speech_prob": 0.001993240090087056}, {"id": 637, "seek": 422384, "start": 4246.0,
+ "end": 4252.72, "text": " podcast what do you think I''m talking multiple topics
+ but I still really love to pick a brain on", "tokens": [51472, 7367, 437, 360, 291,
+ 519, 286, 478, 1417, 3866, 8378, 457, 286, 920, 534, 959, 281, 1888, 257, 3567,
+ 322, 51808], "temperature": 0.0, "avg_logprob": -0.17916328566414969, "compression_ratio":
+ 1.7105263157894737, "no_speech_prob": 0.001993240090087056}, {"id": 638, "seek":
+ 425272, "start": 4252.8, "end": 4264.400000000001, "text": " that philosophical
+ question and kind of ask you what what keeps you so interested like you are a",
+ "tokens": [50368, 300, 25066, 1168, 293, 733, 295, 1029, 291, 437, 437, 5965, 291,
+ 370, 3102, 411, 291, 366, 257, 50948], "temperature": 0.0, "avg_logprob": -0.1546521027882894,
+ "compression_ratio": 1.5769230769230769, "no_speech_prob": 0.0005918564274907112},
+ {"id": 639, "seek": 425272, "start": 4264.400000000001, "end": 4271.12, "text":
+ " loudmouth behind West Pine general but you also offer a bunch of advice right
+ like through your", "tokens": [50948, 6588, 22357, 2261, 4055, 33531, 2674, 457,
+ 291, 611, 2626, 257, 3840, 295, 5192, 558, 411, 807, 428, 51284], "temperature":
+ 0.0, "avg_logprob": -0.1546521027882894, "compression_ratio": 1.5769230769230769,
+ "no_speech_prob": 0.0005918564274907112}, {"id": 640, "seek": 425272, "start": 4271.12,
+ "end": 4277.12, "text": " blogs through your public presentations and even sharing
+ papers on Twitter at least for me was", "tokens": [51284, 31038, 807, 428, 1908,
+ 18964, 293, 754, 5414, 10577, 322, 5794, 412, 1935, 337, 385, 390, 51584], "temperature":
+ 0.0, "avg_logprob": -0.1546521027882894, "compression_ratio": 1.5769230769230769,
+ "no_speech_prob": 0.0005918564274907112}, {"id": 641, "seek": 427712, "start": 4277.12,
+ "end": 4283.84, "text": " super helpful that I could you know quickly read the paper
+ that you shared but what keeps you", "tokens": [50364, 1687, 4961, 300, 286, 727,
+ 291, 458, 2661, 1401, 264, 3035, 300, 291, 5507, 457, 437, 5965, 291, 50700], "temperature":
+ 0.0, "avg_logprob": -0.10353815102879005, "compression_ratio": 1.6803652968036529,
+ "no_speech_prob": 0.0009478465071879327}, {"id": 642, "seek": 427712, "start": 4283.84,
+ "end": 4290.4, "text": " motivated and interested to stay in this field and also
+ specifically you know maybe you think", "tokens": [50700, 14515, 293, 3102, 281,
+ 1754, 294, 341, 2519, 293, 611, 4682, 291, 458, 1310, 291, 519, 51028], "temperature":
+ 0.0, "avg_logprob": -0.10353815102879005, "compression_ratio": 1.6803652968036529,
+ "no_speech_prob": 0.0009478465071879327}, {"id": 643, "seek": 427712, "start": 4290.4,
+ "end": 4296.72, "text": " something is missing in the vector search space or in
+ general in in search space that you would like", "tokens": [51028, 746, 307, 5361,
+ 294, 264, 8062, 3164, 1901, 420, 294, 2674, 294, 294, 3164, 1901, 300, 291, 576,
+ 411, 51344], "temperature": 0.0, "avg_logprob": -0.10353815102879005, "compression_ratio":
+ 1.6803652968036529, "no_speech_prob": 0.0009478465071879327}, {"id": 644, "seek":
+ 427712, "start": 4296.72, "end": 4307.04, "text": " to fix yeah so it''s a great
+ philosophical question I think I''m not that excited", "tokens": [51344, 281, 3191,
+ 1338, 370, 309, 311, 257, 869, 25066, 1168, 286, 519, 286, 478, 406, 300, 2919,
+ 51860], "temperature": 0.0, "avg_logprob": -0.10353815102879005, "compression_ratio":
+ 1.6803652968036529, "no_speech_prob": 0.0009478465071879327}, {"id": 645, "seek":
+ 430712, "start": 4307.12, "end": 4315.76, "text": " about vector search I see that
+ as a technique so I''m more like excited about search because I think", "tokens":
+ [50364, 466, 8062, 3164, 286, 536, 300, 382, 257, 6532, 370, 286, 478, 544, 411,
+ 2919, 466, 3164, 570, 286, 519, 50796], "temperature": 0.0, "avg_logprob": -0.13897228240966797,
+ "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.001989323878660798},
+ {"id": 646, "seek": 430712, "start": 4315.76, "end": 4321.599999999999, "text":
+ " it''s such a fascinating problem we touched on it before you know you have query
+ categorization", "tokens": [50796, 309, 311, 1270, 257, 10343, 1154, 321, 9828,
+ 322, 309, 949, 291, 458, 291, 362, 14581, 19250, 2144, 51088], "temperature": 0.0,
+ "avg_logprob": -0.13897228240966797, "compression_ratio": 1.6666666666666667, "no_speech_prob":
+ 0.001989323878660798}, {"id": 647, "seek": 430712, "start": 4321.599999999999, "end":
+ 4328.64, "text": " spelling you have so many different aspects of building a great
+ search experience and also the", "tokens": [51088, 22254, 291, 362, 370, 867, 819,
+ 7270, 295, 2390, 257, 869, 3164, 1752, 293, 611, 264, 51440], "temperature": 0.0,
+ "avg_logprob": -0.13897228240966797, "compression_ratio": 1.6666666666666667, "no_speech_prob":
+ 0.001989323878660798}, {"id": 648, "seek": 430712, "start": 4328.64, "end": 4334.64,
+ "text": " scale thing is really appealing for me you know kind of passionate about
+ you know how can we do this", "tokens": [51440, 4373, 551, 307, 534, 23842, 337,
+ 385, 291, 458, 733, 295, 11410, 466, 291, 458, 577, 393, 321, 360, 341, 51740],
+ "temperature": 0.0, "avg_logprob": -0.13897228240966797, "compression_ratio": 1.6666666666666667,
+ "no_speech_prob": 0.001989323878660798}, {"id": 649, "seek": 433464, "start": 4334.72,
+ "end": 4340.8, "text": " billion scale how can we make it fast you know what if
+ we need 100,000 queries per second what if", "tokens": [50368, 5218, 4373, 577,
+ 393, 321, 652, 309, 2370, 291, 458, 437, 498, 321, 643, 2319, 11, 1360, 24109, 680,
+ 1150, 437, 498, 50672], "temperature": 0.0, "avg_logprob": -0.13269922766886966,
+ "compression_ratio": 1.6444444444444444, "no_speech_prob": 0.0014623100869357586},
+ {"id": 650, "seek": 433464, "start": 4340.8, "end": 4348.88, "text": " we need to
+ update all the documents in real time and within 12 hours or one hour or you know
+ where''s", "tokens": [50672, 321, 643, 281, 5623, 439, 264, 8512, 294, 957, 565,
+ 293, 1951, 2272, 2496, 420, 472, 1773, 420, 291, 458, 689, 311, 51076], "temperature":
+ 0.0, "avg_logprob": -0.13269922766886966, "compression_ratio": 1.6444444444444444,
+ "no_speech_prob": 0.0014623100869357586}, {"id": 651, "seek": 433464, "start": 4348.88,
+ "end": 4357.92, "text": " the limits you know where is the cloud going you know
+ this compute versus storage can we now move", "tokens": [51076, 264, 10406, 291,
+ 458, 689, 307, 264, 4588, 516, 291, 458, 341, 14722, 5717, 6725, 393, 321, 586,
+ 1286, 51528], "temperature": 0.0, "avg_logprob": -0.13269922766886966, "compression_ratio":
+ 1.6444444444444444, "no_speech_prob": 0.0014623100869357586}, {"id": 652, "seek":
+ 435792, "start": 4358.88, "end": 4365.68, "text": " more computations out of the
+ storage layer there are a lot of these exciting on the kind of system", "tokens":
+ [50412, 544, 2807, 763, 484, 295, 264, 6725, 4583, 456, 366, 257, 688, 295, 613,
+ 4670, 322, 264, 733, 295, 1185, 50752], "temperature": 0.0, "avg_logprob": -0.1777720658675484,
+ "compression_ratio": 1.7644444444444445, "no_speech_prob": 0.0009034093818627298},
+ {"id": 653, "seek": 435792, "start": 4365.68, "end": 4371.92, "text": " things but
+ on them kind of more science things you know how to build a great search experience
+ I think", "tokens": [50752, 721, 457, 322, 552, 733, 295, 544, 3497, 721, 291, 458,
+ 577, 281, 1322, 257, 869, 3164, 1752, 286, 519, 51064], "temperature": 0.0, "avg_logprob":
+ -0.1777720658675484, "compression_ratio": 1.7644444444444445, "no_speech_prob":
+ 0.0009034093818627298}, {"id": 654, "seek": 435792, "start": 4371.92, "end": 4378.0,
+ "text": " you need to have this kind of multiple techniques and we didn''t touch
+ on it but vector search is", "tokens": [51064, 291, 643, 281, 362, 341, 733, 295,
+ 3866, 7512, 293, 321, 994, 380, 2557, 322, 309, 457, 8062, 3164, 307, 51368], "temperature":
+ 0.0, "avg_logprob": -0.1777720658675484, "compression_ratio": 1.7644444444444445,
+ "no_speech_prob": 0.0009034093818627298}, {"id": 655, "seek": 435792, "start": 4378.0,
+ "end": 4386.0, "text": " one thing sparse search is one other but GBDT models tree
+ based models is really ruling the search", "tokens": [51368, 472, 551, 637, 11668,
+ 3164, 307, 472, 661, 457, 460, 33, 35, 51, 5245, 4230, 2361, 5245, 307, 534, 21437,
+ 264, 3164, 51768], "temperature": 0.0, "avg_logprob": -0.1777720658675484, "compression_ratio":
+ 1.7644444444444445, "no_speech_prob": 0.0009034093818627298}, {"id": 656, "seek":
+ 438600, "start": 4386.16, "end": 4393.6, "text": " or it''s kind of the hammer of
+ search because those models on tabular data statistical features", "tokens": [50372,
+ 420, 309, 311, 733, 295, 264, 13017, 295, 3164, 570, 729, 5245, 322, 4421, 1040,
+ 1412, 22820, 4122, 50744], "temperature": 0.0, "avg_logprob": -0.14175045911003561,
+ "compression_ratio": 1.6163793103448276, "no_speech_prob": 0.0004085246182512492},
+ {"id": 657, "seek": 438600, "start": 4393.6, "end": 4398.8, "text": " you know they
+ really show promising results and that''s another thing that I think is great",
+ "tokens": [50744, 291, 458, 436, 534, 855, 20257, 3542, 293, 300, 311, 1071, 551,
+ 300, 286, 519, 307, 869, 51004], "temperature": 0.0, "avg_logprob": -0.14175045911003561,
+ "compression_ratio": 1.6163793103448276, "no_speech_prob": 0.0004085246182512492},
+ {"id": 658, "seek": 438600, "start": 4398.8, "end": 4404.64, "text": " than about
+ west based that you can combine these GBDT models newer metals in the same ranking",
+ "tokens": [51004, 813, 466, 7009, 2361, 300, 291, 393, 10432, 613, 460, 33, 35,
+ 51, 5245, 17628, 22548, 294, 264, 912, 17833, 51296], "temperature": 0.0, "avg_logprob":
+ -0.14175045911003561, "compression_ratio": 1.6163793103448276, "no_speech_prob":
+ 0.0004085246182512492}, {"id": 659, "seek": 438600, "start": 4404.64, "end": 4412.0,
+ "text": " functions I don''t think that there''s a one single silver bullet for
+ retrieval I think there are", "tokens": [51296, 6828, 286, 500, 380, 519, 300, 456,
+ 311, 257, 472, 2167, 8753, 11632, 337, 19817, 3337, 286, 519, 456, 366, 51664],
+ "temperature": 0.0, "avg_logprob": -0.14175045911003561, "compression_ratio": 1.6163793103448276,
+ "no_speech_prob": 0.0004085246182512492}, {"id": 660, "seek": 441200, "start": 4412.0,
+ "end": 4417.68, "text": " multiple different singles like for instance let''s let''s
+ do vector search if you only do vector", "tokens": [50364, 3866, 819, 36334, 411,
+ 337, 5197, 718, 311, 718, 311, 360, 8062, 3164, 498, 291, 787, 360, 8062, 50648],
+ "temperature": 0.0, "avg_logprob": -0.14105597487441054, "compression_ratio": 1.9637096774193548,
+ "no_speech_prob": 0.002578939311206341}, {"id": 661, "seek": 441200, "start": 4417.68,
+ "end": 4424.48, "text": " search if Google only did vector search only on the text
+ right it would basically you have a lot", "tokens": [50648, 3164, 498, 3329, 787,
+ 630, 8062, 3164, 787, 322, 264, 2487, 558, 309, 576, 1936, 291, 362, 257, 688, 50988],
+ "temperature": 0.0, "avg_logprob": -0.14105597487441054, "compression_ratio": 1.9637096774193548,
+ "no_speech_prob": 0.002578939311206341}, {"id": 662, "seek": 441200, "start": 4424.48,
+ "end": 4430.72, "text": " of duplicates on the web you have low quality content
+ you know there''s page rank there other factors", "tokens": [50988, 295, 17154,
+ 1024, 322, 264, 3670, 291, 362, 2295, 3125, 2701, 291, 458, 456, 311, 3028, 6181,
+ 456, 661, 6771, 51300], "temperature": 0.0, "avg_logprob": -0.14105597487441054,
+ "compression_ratio": 1.9637096774193548, "no_speech_prob": 0.002578939311206341},
+ {"id": 663, "seek": 441200, "start": 4430.72, "end": 4436.64, "text": " you know
+ it''s not only vector search there''s that kind of different techniques so and so
+ that", "tokens": [51300, 291, 458, 309, 311, 406, 787, 8062, 3164, 456, 311, 300,
+ 733, 295, 819, 7512, 370, 293, 370, 300, 51596], "temperature": 0.0, "avg_logprob":
+ -0.14105597487441054, "compression_ratio": 1.9637096774193548, "no_speech_prob":
+ 0.002578939311206341}, {"id": 664, "seek": 441200, "start": 4436.64, "end": 4441.44,
+ "text": " means also that there''s a lot of things new things to learn you know
+ how do you do caricaturization", "tokens": [51596, 1355, 611, 300, 456, 311, 257,
+ 688, 295, 721, 777, 721, 281, 1466, 291, 458, 577, 360, 291, 360, 45732, 267, 374,
+ 2144, 51836], "temperature": 0.0, "avg_logprob": -0.14105597487441054, "compression_ratio":
+ 1.9637096774193548, "no_speech_prob": 0.002578939311206341}, {"id": 665, "seek":
+ 444144, "start": 4441.44, "end": 4448.5599999999995, "text": " you know how do you
+ how do you how do you actually determine which facets and then kind of navigation",
+ "tokens": [50364, 291, 458, 577, 360, 291, 577, 360, 291, 577, 360, 291, 767, 6997,
+ 597, 49752, 293, 550, 733, 295, 17346, 50720], "temperature": 0.0, "avg_logprob":
+ -0.09995260573270028, "compression_ratio": 1.8576923076923078, "no_speech_prob":
+ 0.0009046139311976731}, {"id": 666, "seek": 444144, "start": 4448.5599999999995,
+ "end": 4455.5199999999995, "text": " you''re going to show to the user and like
+ you touched on at the start you know if your user does", "tokens": [50720, 291,
+ 434, 516, 281, 855, 281, 264, 4195, 293, 411, 291, 9828, 322, 412, 264, 722, 291,
+ 458, 498, 428, 4195, 775, 51068], "temperature": 0.0, "avg_logprob": -0.09995260573270028,
+ "compression_ratio": 1.8576923076923078, "no_speech_prob": 0.0009046139311976731},
+ {"id": 667, "seek": 444144, "start": 4455.5199999999995, "end": 4461.28, "text":
+ " a query and we don''t have any good results you know should we just slow them
+ some random results", "tokens": [51068, 257, 14581, 293, 321, 500, 380, 362, 604,
+ 665, 3542, 291, 458, 820, 321, 445, 2964, 552, 512, 4974, 3542, 51356], "temperature":
+ 0.0, "avg_logprob": -0.09995260573270028, "compression_ratio": 1.8576923076923078,
+ "no_speech_prob": 0.0009046139311976731}, {"id": 668, "seek": 444144, "start": 4461.28,
+ "end": 4466.0, "text": " or should they say that hey you know I''m sorry but we
+ don''t have anything for your query so", "tokens": [51356, 420, 820, 436, 584, 300,
+ 4177, 291, 458, 286, 478, 2597, 457, 321, 500, 380, 362, 1340, 337, 428, 14581,
+ 370, 51592], "temperature": 0.0, "avg_logprob": -0.09995260573270028, "compression_ratio":
+ 1.8576923076923078, "no_speech_prob": 0.0009046139311976731}, {"id": 669, "seek":
+ 444144, "start": 4466.0, "end": 4470.799999999999, "text": " yeah that''s really
+ what motivates me is that it''s such a fantastic problem if you''re interested",
+ "tokens": [51592, 1338, 300, 311, 534, 437, 42569, 385, 307, 300, 309, 311, 1270,
+ 257, 5456, 1154, 498, 291, 434, 3102, 51832], "temperature": 0.0, "avg_logprob":
+ -0.09995260573270028, "compression_ratio": 1.8576923076923078, "no_speech_prob":
+ 0.0009046139311976731}, {"id": 670, "seek": 447080, "start": 4470.8, "end": 4477.52,
+ "text": " in scale and all these kind of things coming together so yeah yeah thanks
+ for that it''s deep and", "tokens": [50364, 294, 4373, 293, 439, 613, 733, 295,
+ 721, 1348, 1214, 370, 1338, 1338, 3231, 337, 300, 309, 311, 2452, 293, 50700], "temperature":
+ 0.0, "avg_logprob": -0.09366100392443069, "compression_ratio": 1.7444933920704846,
+ "no_speech_prob": 0.0006980682956054807}, {"id": 671, "seek": 447080, "start": 4477.52,
+ "end": 4486.24, "text": " it''s very wide and I think it''s like limitless space
+ and I hope also that newcomers feel it''s kind", "tokens": [50700, 309, 311, 588,
+ 4874, 293, 286, 519, 309, 311, 411, 4948, 1832, 1901, 293, 286, 1454, 611, 300,
+ 40014, 433, 841, 309, 311, 733, 51136], "temperature": 0.0, "avg_logprob": -0.09366100392443069,
+ "compression_ratio": 1.7444933920704846, "no_speech_prob": 0.0006980682956054807},
+ {"id": 672, "seek": 447080, "start": 4486.24, "end": 4492.16, "text": " of like
+ a low bar entry especially and we didn''t touch on this but especially with your
+ work and", "tokens": [51136, 295, 411, 257, 2295, 2159, 8729, 2318, 293, 321, 994,
+ 380, 2557, 322, 341, 457, 2318, 365, 428, 589, 293, 51432], "temperature": 0.0,
+ "avg_logprob": -0.09366100392443069, "compression_ratio": 1.7444933920704846, "no_speech_prob":
+ 0.0006980682956054807}, {"id": 673, "seek": 447080, "start": 4492.16, "end": 4500.08,
+ "text": " open source you know the support like you can go and slack or whatever
+ tool you''re using to communicate", "tokens": [51432, 1269, 4009, 291, 458, 264,
+ 1406, 411, 291, 393, 352, 293, 29767, 420, 2035, 2290, 291, 434, 1228, 281, 7890,
+ 51828], "temperature": 0.0, "avg_logprob": -0.09366100392443069, "compression_ratio":
+ 1.7444933920704846, "no_speech_prob": 0.0006980682956054807}, {"id": 674, "seek":
+ 450008, "start": 4500.08, "end": 4508.0, "text": " with your users and actually
+ listen and address their concerns questions and hopefully this", "tokens": [50364,
+ 365, 428, 5022, 293, 767, 2140, 293, 2985, 641, 7389, 1651, 293, 4696, 341, 50760],
+ "temperature": 0.0, "avg_logprob": -0.11716015715348094, "compression_ratio": 1.9282051282051282,
+ "no_speech_prob": 0.0008709453977644444}, {"id": 675, "seek": 450008, "start": 4508.0,
+ "end": 4516.64, "text": " opens more you know more possibilities for newcomers to
+ enter yeah I love I love actually", "tokens": [50760, 9870, 544, 291, 458, 544,
+ 12178, 337, 40014, 433, 281, 3242, 1338, 286, 959, 286, 959, 767, 51192], "temperature":
+ 0.0, "avg_logprob": -0.11716015715348094, "compression_ratio": 1.9282051282051282,
+ "no_speech_prob": 0.0008709453977644444}, {"id": 676, "seek": 450008, "start": 4517.6,
+ "end": 4522.4, "text": " it''s actually a weakness as well but I love answering
+ questions you know you can see me answering", "tokens": [51240, 309, 311, 767, 257,
+ 12772, 382, 731, 457, 286, 959, 13430, 1651, 291, 458, 291, 393, 536, 385, 13430,
+ 51480], "temperature": 0.0, "avg_logprob": -0.11716015715348094, "compression_ratio":
+ 1.9282051282051282, "no_speech_prob": 0.0008709453977644444}, {"id": 677, "seek":
+ 450008, "start": 4522.4, "end": 4527.84, "text": " questions on multiple slack spaces
+ you know I love people you know asking questions about search", "tokens": [51480,
+ 1651, 322, 3866, 29767, 7673, 291, 458, 286, 959, 561, 291, 458, 3365, 1651, 466,
+ 3164, 51752], "temperature": 0.0, "avg_logprob": -0.11716015715348094, "compression_ratio":
+ 1.9282051282051282, "no_speech_prob": 0.0008709453977644444}, {"id": 678, "seek":
+ 452784, "start": 4527.84, "end": 4535.52, "text": " so I really love that and I''m
+ and what really gets me if someone is struggling with something you know", "tokens":
+ [50364, 370, 286, 534, 959, 300, 293, 286, 478, 293, 437, 534, 2170, 385, 498, 1580,
+ 307, 9314, 365, 746, 291, 458, 50748], "temperature": 0.0, "avg_logprob": -0.13298741897734084,
+ "compression_ratio": 1.8364485981308412, "no_speech_prob": 0.0019649919122457504},
+ {"id": 679, "seek": 452784, "start": 4535.52, "end": 4541.4400000000005, "text":
+ " how can I do this with West Park and I''ll try to explain it you know you have
+ to do this and that", "tokens": [50748, 577, 393, 286, 360, 341, 365, 4055, 4964,
+ 293, 286, 603, 853, 281, 2903, 309, 291, 458, 291, 362, 281, 360, 341, 293, 300,
+ 51044], "temperature": 0.0, "avg_logprob": -0.13298741897734084, "compression_ratio":
+ 1.8364485981308412, "no_speech_prob": 0.0019649919122457504}, {"id": 680, "seek":
+ 452784, "start": 4541.4400000000005, "end": 4545.84, "text": " you know and then
+ I like I go back you know at the program saying you know we need to fix this you",
+ "tokens": [51044, 291, 458, 293, 550, 286, 411, 286, 352, 646, 291, 458, 412, 264,
+ 1461, 1566, 291, 458, 321, 643, 281, 3191, 341, 291, 51264], "temperature": 0.0,
+ "avg_logprob": -0.13298741897734084, "compression_ratio": 1.8364485981308412, "no_speech_prob":
+ 0.0019649919122457504}, {"id": 681, "seek": 452784, "start": 4545.84, "end": 4552.400000000001,
+ "text": " know we need to make this more easy for people to use right so it''s it''s
+ a both way thing and", "tokens": [51264, 458, 321, 643, 281, 652, 341, 544, 1858,
+ 337, 561, 281, 764, 558, 370, 309, 311, 309, 311, 257, 1293, 636, 551, 293, 51592],
+ "temperature": 0.0, "avg_logprob": -0.13298741897734084, "compression_ratio": 1.8364485981308412,
+ "no_speech_prob": 0.0019649919122457504}, {"id": 682, "seek": 455240, "start": 4552.48,
+ "end": 4557.759999999999, "text": " that''s one thing that I learned in my career
+ is that you know listen carefully to your users", "tokens": [50368, 300, 311, 472,
+ 551, 300, 286, 3264, 294, 452, 3988, 307, 300, 291, 458, 2140, 7500, 281, 428, 5022,
+ 50632], "temperature": 0.0, "avg_logprob": -0.1503269752759612, "compression_ratio":
+ 1.760180995475113, "no_speech_prob": 0.0011456109350547194}, {"id": 683, "seek":
+ 455240, "start": 4558.879999999999, "end": 4565.36, "text": " how they''re using
+ the product what are the pain points you know how to how does it feel to get started",
+ "tokens": [50688, 577, 436, 434, 1228, 264, 1674, 437, 366, 264, 1822, 2793, 291,
+ 458, 577, 281, 577, 775, 309, 841, 281, 483, 1409, 51012], "temperature": 0.0, "avg_logprob":
+ -0.1503269752759612, "compression_ratio": 1.760180995475113, "no_speech_prob": 0.0011456109350547194},
+ {"id": 684, "seek": 455240, "start": 4565.92, "end": 4573.679999999999, "text":
+ " are able to progress so that''s that''s really also motivating and and honestly
+ I think that some", "tokens": [51040, 366, 1075, 281, 4205, 370, 300, 311, 300,
+ 311, 534, 611, 41066, 293, 293, 6095, 286, 519, 300, 512, 51428], "temperature":
+ 0.0, "avg_logprob": -0.1503269752759612, "compression_ratio": 1.760180995475113,
+ "no_speech_prob": 0.0011456109350547194}, {"id": 685, "seek": 455240, "start": 4573.679999999999,
+ "end": 4581.839999999999, "text": " of the work that we''ve done using some of these
+ smaller transformer models has been has an impact", "tokens": [51428, 295, 264,
+ 589, 300, 321, 600, 1096, 1228, 512, 295, 613, 4356, 31782, 5245, 575, 668, 575,
+ 364, 2712, 51836], "temperature": 0.0, "avg_logprob": -0.1503269752759612, "compression_ratio":
+ 1.760180995475113, "no_speech_prob": 0.0011456109350547194}, {"id": 686, "seek":
+ 458184, "start": 4581.84, "end": 4587.360000000001, "text": " on the industry like
+ I got contacted by a person here on Twitter the other day said that you know I",
+ "tokens": [50364, 322, 264, 3518, 411, 286, 658, 21546, 538, 257, 954, 510, 322,
+ 5794, 264, 661, 786, 848, 300, 291, 458, 286, 50640], "temperature": 0.0, "avg_logprob":
+ -0.15301656174933773, "compression_ratio": 1.71875, "no_speech_prob": 0.0018938354915007949},
+ {"id": 687, "seek": 458184, "start": 4587.360000000001, "end": 4595.04, "text":
+ " saw your tweet about these smaller language models like not the birth base that
+ people usually", "tokens": [50640, 1866, 428, 15258, 466, 613, 4356, 2856, 5245,
+ 411, 406, 264, 3965, 3096, 300, 561, 2673, 51024], "temperature": 0.0, "avg_logprob":
+ -0.15301656174933773, "compression_ratio": 1.71875, "no_speech_prob": 0.0018938354915007949},
+ {"id": 688, "seek": 458184, "start": 4595.04, "end":
4602.88, "text": " turn okay + but this mini LM model which is a distilled 22 million parameters that actually + did", "tokens": [51024, 1261, 1392, 457, 341, 8382, 46529, 2316, 597, 307, 257, + 1483, 6261, 5853, 2459, 9834, 300, 767, 630, 51416], "temperature": 0.0, "avg_logprob": + -0.15301656174933773, "compression_ratio": 1.71875, "no_speech_prob": 0.0018938354915007949}, + {"id": 689, "seek": 458184, "start": 4602.88, "end": 4608.400000000001, "text": + " the demo that you can run in your browser and it said you know I saw your tweet + and I went ahead", "tokens": [51416, 264, 10723, 300, 291, 393, 1190, 294, 428, + 11185, 293, 309, 848, 291, 458, 286, 1866, 428, 15258, 293, 286, 1437, 2286, 51692], + "temperature": 0.0, "avg_logprob": -0.15301656174933773, "compression_ratio": 1.71875, + "no_speech_prob": 0.0018938354915007949}, {"id": 690, "seek": 460840, "start": 4608.719999999999, + "end": 4616.5599999999995, "text": " and tried it for my domain which was classification + of hate speech and then he like did a blog post", "tokens": [50380, 293, 3031, 309, + 337, 452, 9274, 597, 390, 21538, 295, 4700, 6218, 293, 550, 415, 411, 630, 257, + 6968, 2183, 50772], "temperature": 0.0, "avg_logprob": -0.13487017154693604, "compression_ratio": + 1.7312775330396475, "no_speech_prob": 0.003136327490210533}, {"id": 691, "seek": + 460840, "start": 4616.5599999999995, "end": 4621.2, "text": " on it and he mentioned + me and I think that was really like interesting for me to see that you know", "tokens": + [50772, 322, 309, 293, 415, 2835, 385, 293, 286, 519, 300, 390, 534, 411, 1880, + 337, 385, 281, 536, 300, 291, 458, 51004], "temperature": 0.0, "avg_logprob": -0.13487017154693604, + "compression_ratio": 1.7312775330396475, "no_speech_prob": 0.003136327490210533}, + {"id": 692, "seek": 460840, "start": 4621.759999999999, "end": 4627.44, "text": + " that I could share something that some people could actually make use of even + if it was outside of", "tokens": [51032, 
300, 286, 727, 2073, 746, 300, 512, 561, + 727, 767, 652, 764, 295, 754, 498, 309, 390, 2380, 295, 51316], "temperature": 0.0, + "avg_logprob": -0.13487017154693604, "compression_ratio": 1.7312775330396475, "no_speech_prob": + 0.003136327490210533}, {"id": 693, "seek": 460840, "start": 4627.44, "end": 4633.36, + "text": " search show and I learned a lot from especially around the relevance is + slack space that we are", "tokens": [51316, 3164, 855, 293, 286, 3264, 257, 688, + 490, 2318, 926, 264, 32684, 307, 29767, 1901, 300, 321, 366, 51612], "temperature": + 0.0, "avg_logprob": -0.13487017154693604, "compression_ratio": 1.7312775330396475, + "no_speech_prob": 0.003136327490210533}, {"id": 694, "seek": 463336, "start": 4633.36, + "end": 4640.08, "text": " both in the open social connections slack space so a lot + of discussion there rector search and we", "tokens": [50364, 1293, 294, 264, 1269, + 2093, 9271, 29767, 1901, 370, 257, 688, 295, 5017, 456, 319, 1672, 3164, 293, 321, + 50700], "temperature": 0.0, "avg_logprob": -0.2521303962258732, "compression_ratio": + 1.6756756756756757, "no_speech_prob": 0.0008492626948282123}, {"id": 695, "seek": + 463336, "start": 4640.08, "end": 4646.32, "text": " are you know sharing some blog + posts and then I ask Greg from Pinecoin a tough question maybe you", "tokens": [50700, + 366, 291, 458, 5414, 512, 6968, 12300, 293, 550, 286, 1029, 11490, 490, 33531, 8562, + 257, 4930, 1168, 1310, 291, 51012], "temperature": 0.0, "avg_logprob": -0.2521303962258732, + "compression_ratio": 1.6756756756756757, "no_speech_prob": 0.0008492626948282123}, + {"id": 696, "seek": 463336, "start": 4646.32, "end": 4652.08, "text": " know and + so I really love being there and discussing and I learn a lot from from other people + like", "tokens": [51012, 458, 293, 370, 286, 534, 959, 885, 456, 293, 10850, 293, + 286, 1466, 257, 688, 490, 490, 661, 561, 411, 51300], "temperature": 0.0, "avg_logprob": + -0.2521303962258732, "compression_ratio": 
1.6756756756756757, "no_speech_prob": + 0.0008492626948282123}, {"id": 697, "seek": 463336, "start": 4652.08, "end": 4657.599999999999, + "text": " just devins from elastic search and so and I''m from you and especially + around", "tokens": [51300, 445, 1905, 1292, 490, 17115, 3164, 293, 370, 293, 286, + 478, 490, 291, 293, 2318, 926, 51576], "temperature": 0.0, "avg_logprob": -0.2521303962258732, + "compression_ratio": 1.6756756756756757, "no_speech_prob": 0.0008492626948282123}, + {"id": 698, "seek": 465760, "start": 4657.92, "end": 4665.280000000001, "text": + " Berlin best search last year you did the AMA on the vector search and for me like + one of the key", "tokens": [50380, 13848, 1151, 3164, 1036, 1064, 291, 630, 264, + 6475, 32, 322, 264, 8062, 3164, 293, 337, 385, 411, 472, 295, 264, 2141, 50748], + "temperature": 0.0, "avg_logprob": -0.16211449846308282, "compression_ratio": 1.7123893805309736, + "no_speech_prob": 0.01185557059943676}, {"id": 699, "seek": 465760, "start": 4665.280000000001, + "end": 4672.08, "text": " moments was that Max Irvin your co-host of that he said + you know what if the user types of phrase", "tokens": [50748, 6065, 390, 300, 7402, + 9151, 4796, 428, 598, 12, 6037, 295, 300, 415, 848, 291, 458, 437, 498, 264, 4195, + 3467, 295, 9535, 51088], "temperature": 0.0, "avg_logprob": -0.16211449846308282, + "compression_ratio": 1.7123893805309736, "no_speech_prob": 0.01185557059943676}, + {"id": 700, "seek": 465760, "start": 4672.08, "end": 4679.200000000001, "text": + " query you know actually quote marks I want to search for this exact phrase you + know don''t show me", "tokens": [51088, 14581, 291, 458, 767, 6513, 10640, 286, + 528, 281, 3164, 337, 341, 1900, 9535, 291, 458, 500, 380, 855, 385, 51444], "temperature": + 0.0, "avg_logprob": -0.16211449846308282, "compression_ratio": 1.7123893805309736, + "no_speech_prob": 0.01185557059943676}, {"id": 701, "seek": 465760, "start": 4679.200000000001, + "end": 4684.240000000001, "text": 
" anything else give me that phrase you know and + that''s something that is really hard to do with", "tokens": [51444, 1340, 1646, + 976, 385, 300, 9535, 291, 458, 293, 300, 311, 746, 300, 307, 534, 1152, 281, 360, + 365, 51696], "temperature": 0.0, "avg_logprob": -0.16211449846308282, "compression_ratio": + 1.7123893805309736, "no_speech_prob": 0.01185557059943676}, {"id": 702, "seek": + 468424, "start": 4684.24, "end": 4689.92, "text": " vector search alone right because + you basically map it into this vector space and we do the", "tokens": [50364, 8062, + 3164, 3312, 558, 570, 291, 1936, 4471, 309, 666, 341, 8062, 1901, 293, 321, 360, + 264, 50648], "temperature": 0.0, "avg_logprob": -0.13975758388124662, "compression_ratio": + 1.6977777777777778, "no_speech_prob": 0.0005393280880525708}, {"id": 703, "seek": + 468424, "start": 4689.92, "end": 4695.44, "text": " similarities and that was a + key takeaway from me and that was a really eye-opener for me you know", "tokens": + [50648, 24197, 293, 300, 390, 257, 2141, 30681, 490, 385, 293, 300, 390, 257, 534, + 3313, 12, 404, 7971, 337, 385, 291, 458, 50924], "temperature": 0.0, "avg_logprob": + -0.13975758388124662, "compression_ratio": 1.6977777777777778, "no_speech_prob": + 0.0005393280880525708}, {"id": 704, "seek": 468424, "start": 4695.44, "end": 4702.0, + "text": " you need to be building out better examples of how actually to combine + sparse and then single so", "tokens": [50924, 291, 643, 281, 312, 2390, 484, 1101, + 5110, 295, 577, 767, 281, 10432, 637, 11668, 293, 550, 2167, 370, 51252], "temperature": + 0.0, "avg_logprob": -0.13975758388124662, "compression_ratio": 1.6977777777777778, + "no_speech_prob": 0.0005393280880525708}, {"id": 705, "seek": 468424, "start": 4702.0, + "end": 4711.28, "text": " yeah this is amazing and what I enjoyed what you said + is that you keep your practitioner hat on", "tokens": [51252, 1338, 341, 307, 2243, + 293, 437, 286, 4626, 437, 291, 848, 307, 300, 291, 1066, 428, 
32125, 2385, 322, + 51716], "temperature": 0.0, "avg_logprob": -0.13975758388124662, "compression_ratio": + 1.6977777777777778, "no_speech_prob": 0.0005393280880525708}, {"id": 706, "seek": + 471128, "start": 4711.28, "end": 4718.639999999999, "text": " so you don''t just + buy in easily into these new models or you don''t stay on the field of okay I''m", + "tokens": [50364, 370, 291, 500, 380, 445, 2256, 294, 3612, 666, 613, 777, 5245, + 420, 291, 500, 380, 1754, 322, 264, 2519, 295, 1392, 286, 478, 50732], "temperature": + 0.0, "avg_logprob": -0.07790875434875488, "compression_ratio": 1.7117903930131004, + "no_speech_prob": 0.0022152839228510857}, {"id": 707, "seek": 471128, "start": 4718.639999999999, + "end": 4724.8, "text": " only an engineer I don''t even know what machine learning + is because I think the profession is", "tokens": [50732, 787, 364, 11403, 286, 500, + 380, 754, 458, 437, 3479, 2539, 307, 570, 286, 519, 264, 7032, 307, 51040], "temperature": + 0.0, "avg_logprob": -0.07790875434875488, "compression_ratio": 1.7117903930131004, + "no_speech_prob": 0.0022152839228510857}, {"id": 708, "seek": 471128, "start": 4724.8, + "end": 4731.36, "text": " slowly changing and it''s like a blend of skills where + today you need to succeed as a search engineer", "tokens": [51040, 5692, 4473, 293, + 309, 311, 411, 257, 10628, 295, 3942, 689, 965, 291, 643, 281, 7754, 382, 257, 3164, + 11403, 51368], "temperature": 0.0, "avg_logprob": -0.07790875434875488, "compression_ratio": + 1.7117903930131004, "no_speech_prob": 0.0022152839228510857}, {"id": 709, "seek": + 471128, "start": 4731.36, "end": 4739.36, "text": " and maybe it shouldn''t be called + the search engineer anymore like I think it needs some new term but", "tokens": + [51368, 293, 1310, 309, 4659, 380, 312, 1219, 264, 3164, 11403, 3602, 411, 286, + 519, 309, 2203, 512, 777, 1433, 457, 51768], "temperature": 0.0, "avg_logprob": + -0.07790875434875488, "compression_ratio": 1.7117903930131004, 
"no_speech_prob": + 0.0022152839228510857}, {"id": 710, "seek": 473936, "start": 4739.36, "end": 4745.679999999999, + "text": " we will probably be stuck with it for the lack of a better term but eventually + you will need more", "tokens": [50364, 321, 486, 1391, 312, 5541, 365, 309, 337, + 264, 5011, 295, 257, 1101, 1433, 457, 4728, 291, 486, 643, 544, 50680], "temperature": + 0.0, "avg_logprob": -0.07570054314353249, "compression_ratio": 1.6195652173913044, + "no_speech_prob": 0.0026735481806099415}, {"id": 711, "seek": 473936, "start": 4745.679999999999, + "end": 4753.36, "text": " skills under your belt and I think of the work that you + are doing is amazing in sharing this knowledge", "tokens": [50680, 3942, 833, 428, + 10750, 293, 286, 519, 295, 264, 589, 300, 291, 366, 884, 307, 2243, 294, 5414, 341, + 3601, 51064], "temperature": 0.0, "avg_logprob": -0.07570054314353249, "compression_ratio": + 1.6195652173913044, "no_speech_prob": 0.0026735481806099415}, {"id": 712, "seek": + 473936, "start": 4753.36, "end": 4760.88, "text": " and that people can actually + reproduce it and I think that''s super super crucial for the progress", "tokens": + [51064, 293, 300, 561, 393, 767, 29501, 309, 293, 286, 519, 300, 311, 1687, 1687, + 11462, 337, 264, 4205, 51440], "temperature": 0.0, "avg_logprob": -0.07570054314353249, + "compression_ratio": 1.6195652173913044, "no_speech_prob": 0.0026735481806099415}, + {"id": 713, "seek": 476088, "start": 4760.88, "end": 4768.0, "text": " yeah I mean + thank you Dimitri that''s that''s really nice of you and you know that''s that''s", + "tokens": [50364, 1338, 286, 914, 1309, 291, 20975, 270, 470, 300, 311, 300, 311, + 534, 1481, 295, 291, 293, 291, 458, 300, 311, 300, 311, 50720], "temperature": 0.0, + "avg_logprob": -0.20147931235177177, "compression_ratio": 1.7735849056603774, "no_speech_prob": + 0.009716111235320568}, {"id": 714, "seek": 476088, "start": 4770.16, "end": 4778.24, + "text": " I yeah I think it''s it''s it''s 
actually true you know to share and I + think that what you said you", "tokens": [50828, 286, 1338, 286, 519, 309, 311, + 309, 311, 309, 311, 767, 2074, 291, 458, 281, 2073, 293, 286, 519, 300, 437, 291, + 848, 291, 51232], "temperature": 0.0, "avg_logprob": -0.20147931235177177, "compression_ratio": + 1.7735849056603774, "no_speech_prob": 0.009716111235320568}, {"id": 715, "seek": + 476088, "start": 4778.24, "end": 4787.76, "text": " know building a search team + today is really hard because especially since deep learning entered", "tokens": + [51232, 458, 2390, 257, 3164, 1469, 965, 307, 534, 1152, 570, 2318, 1670, 2452, + 2539, 9065, 51708], "temperature": 0.0, "avg_logprob": -0.20147931235177177, "compression_ratio": + 1.7735849056603774, "no_speech_prob": 0.009716111235320568}, {"id": 716, "seek": + 478776, "start": 4787.76, "end": 4794.24, "text": " the search field right so now + you need to know how to configure and do matching and boosting", "tokens": [50364, + 264, 3164, 2519, 558, 370, 586, 291, 643, 281, 458, 577, 281, 22162, 293, 360, 14324, + 293, 43117, 50688], "temperature": 0.0, "avg_logprob": -0.11616359667831593, "compression_ratio": + 1.8883495145631068, "no_speech_prob": 0.005274524912238121}, {"id": 717, "seek": + 478776, "start": 4794.24, "end": 4800.24, "text": " an elastic search and now you + also need you know how do I train the dense vector model and you know", "tokens": + [50688, 364, 17115, 3164, 293, 586, 291, 611, 643, 291, 458, 577, 360, 286, 3847, + 264, 18011, 8062, 2316, 293, 291, 458, 50988], "temperature": 0.0, "avg_logprob": + -0.11616359667831593, "compression_ratio": 1.8883495145631068, "no_speech_prob": + 0.005274524912238121}, {"id": 718, "seek": 478776, "start": 4800.24, "end": 4806.88, + "text": " how should I you know should I use birch they use birch large you know + does it handle multilingual", "tokens": [50988, 577, 820, 286, 291, 458, 820, 286, + 764, 1904, 339, 436, 764, 1904, 339, 2416, 291, 458, 775, 309, 4813, 
2120, 38219, + 51320], "temperature": 0.0, "avg_logprob": -0.11616359667831593, "compression_ratio": + 1.8883495145631068, "no_speech_prob": 0.005274524912238121}, {"id": 719, "seek": + 478776, "start": 4806.88, "end": 4812.88, "text": " text does it handle spell correction + you know they''re always kind of different things you know so", "tokens": [51320, + 2487, 775, 309, 4813, 9827, 19984, 291, 458, 436, 434, 1009, 733, 295, 819, 721, + 291, 458, 370, 51620], "temperature": 0.0, "avg_logprob": -0.11616359667831593, + "compression_ratio": 1.8883495145631068, "no_speech_prob": 0.005274524912238121}, + {"id": 720, "seek": 481288, "start": 4812.88, "end": 4821.4400000000005, "text": + " building a search team in 2022 it''s not easy because you need the kind of a mixed + NLP search you", "tokens": [50364, 2390, 257, 3164, 1469, 294, 20229, 309, 311, + 406, 1858, 570, 291, 643, 264, 733, 295, 257, 7467, 426, 45196, 3164, 291, 50792], + "temperature": 0.0, "avg_logprob": -0.12078923238834864, "compression_ratio": 1.529100529100529, + "no_speech_prob": 0.0005237549194134772}, {"id": 721, "seek": 481288, "start": 4821.4400000000005, + "end": 4825.92, "text": " know there are a lot of different things and that''s what + I love about it you know and I talked about", "tokens": [50792, 458, 456, 366, 257, + 688, 295, 819, 721, 293, 300, 311, 437, 286, 959, 466, 309, 291, 458, 293, 286, + 2825, 466, 51016], "temperature": 0.0, "avg_logprob": -0.12078923238834864, "compression_ratio": + 1.529100529100529, "no_speech_prob": 0.0005237549194134772}, {"id": 722, "seek": + 481288, "start": 4825.92, "end": 4834.56, "text": " on Twitter and you know in a + talk I did earlier as well that this neural paradigm shift has", "tokens": [51016, + 322, 5794, 293, 291, 458, 294, 257, 751, 286, 630, 3071, 382, 731, 300, 341, 18161, + 24709, 5513, 575, 51448], "temperature": 0.0, "avg_logprob": -0.12078923238834864, + "compression_ratio": 1.529100529100529, "no_speech_prob": 
0.0005237549194134772}, + {"id": 723, "seek": 483456, "start": 4834.56, "end": 4840.320000000001, "text": + " opened this kind of knowledge gap you know how to actually successfully apply + these methods", "tokens": [50364, 5625, 341, 733, 295, 3601, 7417, 291, 458, 577, + 281, 767, 10727, 3079, 613, 7150, 50652], "temperature": 0.0, "avg_logprob": -0.12762591482579022, + "compression_ratio": 1.6304347826086956, "no_speech_prob": 0.0012347548035904765}, + {"id": 724, "seek": 483456, "start": 4841.280000000001, "end": 4848.56, "text": + " and also on the technology side that we try to bring you know with VESPA that + you can kind of", "tokens": [50700, 293, 611, 322, 264, 2899, 1252, 300, 321, 853, + 281, 1565, 291, 458, 365, 691, 2358, 10297, 300, 291, 393, 733, 295, 51064], "temperature": + 0.0, "avg_logprob": -0.12762591482579022, "compression_ratio": 1.6304347826086956, + "no_speech_prob": 0.0012347548035904765}, {"id": 725, "seek": 483456, "start": 4848.56, + "end": 4855.92, "text": " combine different techniques we don''t have to throw away + 50 or 300 years of the inverted index", "tokens": [51064, 10432, 819, 7512, 321, + 500, 380, 362, 281, 3507, 1314, 2625, 420, 6641, 924, 295, 264, 38969, 8186, 51432], + "temperature": 0.0, "avg_logprob": -0.12762591482579022, "compression_ratio": 1.6304347826086956, + "no_speech_prob": 0.0012347548035904765}, {"id": 726, "seek": 483456, "start": 4855.92, + "end": 4861.84, "text": " you know we don''t need to throw that away you know it + still has value it''s going to have value", "tokens": [51432, 291, 458, 321, 500, + 380, 643, 281, 3507, 300, 1314, 291, 458, 309, 920, 575, 2158, 309, 311, 516, 281, + 362, 2158, 51728], "temperature": 0.0, "avg_logprob": -0.12762591482579022, "compression_ratio": + 1.6304347826086956, "no_speech_prob": 0.0012347548035904765}, {"id": 727, "seek": + 486184, "start": 4861.84, "end": 4867.84, "text": " probably forever you know so + so we don''t have to throw away so that''s interesting 
but that''s also", "tokens": + [50364, 1391, 5680, 291, 458, 370, 370, 321, 500, 380, 362, 281, 3507, 1314, 370, + 300, 311, 1880, 457, 300, 311, 611, 50664], "temperature": 0.0, "avg_logprob": -0.1293924635490485, + "compression_ratio": 1.7706766917293233, "no_speech_prob": 0.000996629474684596}, + {"id": 728, "seek": 486184, "start": 4867.84, "end": 4873.6, "text": " what it''s + been fascinating and I''ve said numerous times that I don''t think I''ve learned + that much", "tokens": [50664, 437, 309, 311, 668, 10343, 293, 286, 600, 848, 12546, + 1413, 300, 286, 500, 380, 519, 286, 600, 3264, 300, 709, 50952], "temperature": + 0.0, "avg_logprob": -0.1293924635490485, "compression_ratio": 1.7706766917293233, + "no_speech_prob": 0.000996629474684596}, {"id": 729, "seek": 486184, "start": 4873.6, + "end": 4879.360000000001, "text": " in my career that I''ve actually over the last + three years because reading papers it''s not being", "tokens": [50952, 294, 452, + 3988, 300, 286, 600, 767, 670, 264, 1036, 1045, 924, 570, 3760, 10577, 309, 311, + 406, 885, 51240], "temperature": 0.0, "avg_logprob": -0.1293924635490485, "compression_ratio": + 1.7706766917293233, "no_speech_prob": 0.000996629474684596}, {"id": 730, "seek": + 486184, "start": 4879.360000000001, "end": 4884.4800000000005, "text": " a big kind + of interest of mine earlier it''s been one of the system side engineering side", + "tokens": [51240, 257, 955, 733, 295, 1179, 295, 3892, 3071, 309, 311, 668, 472, + 295, 264, 1185, 1252, 7043, 1252, 51496], "temperature": 0.0, "avg_logprob": -0.1293924635490485, + "compression_ratio": 1.7706766917293233, "no_speech_prob": 0.000996629474684596}, + {"id": 731, "seek": 486184, "start": 4885.52, "end": 4890.400000000001, "text": + " but that''s been a high-opener for me you know to kind of how to apply these techniques + and", "tokens": [51548, 457, 300, 311, 668, 257, 1090, 12, 404, 7971, 337, 385, + 291, 458, 281, 733, 295, 577, 281, 3079, 613, 7512, 293, 51792], 
"temperature": + 0.0, "avg_logprob": -0.1293924635490485, "compression_ratio": 1.7706766917293233, + "no_speech_prob": 0.000996629474684596}, {"id": 732, "seek": 489040, "start": 4890.96, + "end": 4897.599999999999, "text": " yeah so and to learn you know and that''s the + great thing about open source is that you know we can", "tokens": [50392, 1338, + 370, 293, 281, 1466, 291, 458, 293, 300, 311, 264, 869, 551, 466, 1269, 4009, 307, + 300, 291, 458, 321, 393, 50724], "temperature": 0.0, "avg_logprob": -0.0845474296145969, + "compression_ratio": 1.813397129186603, "no_speech_prob": 0.0017564388690516353}, + {"id": 733, "seek": 489040, "start": 4897.599999999999, "end": 4908.32, "text": + " share ways of doing things yeah yeah absolutely sharing is caring and so much + comes back to you", "tokens": [50724, 2073, 2098, 295, 884, 721, 1338, 1338, 3122, + 5414, 307, 15365, 293, 370, 709, 1487, 646, 281, 291, 51260], "temperature": 0.0, + "avg_logprob": -0.0845474296145969, "compression_ratio": 1.813397129186603, "no_speech_prob": + 0.0017564388690516353}, {"id": 734, "seek": 489040, "start": 4908.32, "end": 4913.5199999999995, + "text": " as you said you know you get mentioned somewhere and you feel like you + didn''t do it in vain", "tokens": [51260, 382, 291, 848, 291, 458, 291, 483, 2835, + 4079, 293, 291, 841, 411, 291, 994, 380, 360, 309, 294, 22240, 51520], "temperature": + 0.0, "avg_logprob": -0.0845474296145969, "compression_ratio": 1.813397129186603, + "no_speech_prob": 0.0017564388690516353}, {"id": 735, "seek": 489040, "start": 4913.5199999999995, + "end": 4919.36, "text": " but also you might learn something new like a new use + case and I feel the same you know when", "tokens": [51520, 457, 611, 291, 1062, + 1466, 746, 777, 411, 257, 777, 764, 1389, 293, 286, 841, 264, 912, 291, 458, 562, + 51812], "temperature": 0.0, "avg_logprob": -0.0845474296145969, "compression_ratio": + 1.813397129186603, "no_speech_prob": 0.0017564388690516353}, {"id": 736, "seek": 
+ 491936, "start": 4919.36, "end": 4926.639999999999, "text": " when I blog or when + some video is viewed by somebody and then somebody says thank you even just", "tokens": + [50364, 562, 286, 6968, 420, 562, 512, 960, 307, 19174, 538, 2618, 293, 550, 2618, + 1619, 1309, 291, 754, 445, 50728], "temperature": 0.0, "avg_logprob": -0.06558184788144868, + "compression_ratio": 1.6869565217391305, "no_speech_prob": 0.0007935056346468627}, + {"id": 737, "seek": 491936, "start": 4926.639999999999, "end": 4932.719999999999, + "text": " multiple months after I did it and you know it''s just an amazing feeling + it''s like a sense of", "tokens": [50728, 3866, 2493, 934, 286, 630, 309, 293, 291, + 458, 309, 311, 445, 364, 2243, 2633, 309, 311, 411, 257, 2020, 295, 51032], "temperature": + 0.0, "avg_logprob": -0.06558184788144868, "compression_ratio": 1.6869565217391305, + "no_speech_prob": 0.0007935056346468627}, {"id": 738, "seek": 491936, "start": 4932.719999999999, + "end": 4941.12, "text": " connection as well especially in these days when we don''t + maybe meet socially as we used to but", "tokens": [51032, 4984, 382, 731, 2318, + 294, 613, 1708, 562, 321, 500, 380, 1310, 1677, 21397, 382, 321, 1143, 281, 457, + 51452], "temperature": 0.0, "avg_logprob": -0.06558184788144868, "compression_ratio": + 1.6869565217391305, "no_speech_prob": 0.0007935056346468627}, {"id": 739, "seek": + 491936, "start": 4941.679999999999, "end": 4948.5599999999995, "text": " that''s + a new actually evolutionary new way of connecting and I feel much more comfortable + and enjoying", "tokens": [51480, 300, 311, 257, 777, 767, 27567, 777, 636, 295, + 11015, 293, 286, 841, 709, 544, 4619, 293, 9929, 51824], "temperature": 0.0, "avg_logprob": + -0.06558184788144868, "compression_ratio": 1.6869565217391305, "no_speech_prob": + 0.0007935056346468627}, {"id": 740, "seek": 494856, "start": 4948.56, "end": 4957.04, + "text": " this more detailed conversation so maybe these interactions on Twitter + 
they they bring a lot more", "tokens": [50364, 341, 544, 9942, 3761, 370, 1310, + 613, 13280, 322, 5794, 436, 436, 1565, 257, 688, 544, 50788], "temperature": 0.0, + "avg_logprob": -0.13781012258222025, "compression_ratio": 1.655367231638418, "no_speech_prob": + 0.0008748817490413785}, {"id": 741, "seek": 494856, "start": 4957.04, "end": 4963.92, + "text": " value and I think this is super super great is there is something that + you would like to share on", "tokens": [50788, 2158, 293, 286, 519, 341, 307, 1687, + 1687, 869, 307, 456, 307, 746, 300, 291, 576, 411, 281, 2073, 322, 51132], "temperature": + 0.0, "avg_logprob": -0.13781012258222025, "compression_ratio": 1.655367231638418, + "no_speech_prob": 0.0008748817490413785}, {"id": 742, "seek": 494856, "start": 4963.92, + "end": 4970.240000000001, "text": " Vespa development or maybe something that users + might anticipate and maybe you want to point them", "tokens": [51132, 691, 279, + 4306, 3250, 420, 1310, 746, 300, 5022, 1062, 21685, 293, 1310, 291, 528, 281, 935, + 552, 51448], "temperature": 0.0, "avg_logprob": -0.13781012258222025, "compression_ratio": + 1.655367231638418, "no_speech_prob": 0.0008748817490413785}, {"id": 743, "seek": + 497024, "start": 4971.04, "end": 4974.32, "text": " to some tutorial that they might + you know take a look at", "tokens": [50404, 281, 512, 7073, 300, 436, 1062, 291, + 458, 747, 257, 574, 412, 50568], "temperature": 0.0, "avg_logprob": -0.14266654423304967, + "compression_ratio": 1.6386138613861385, "no_speech_prob": 0.00326779717579484}, + {"id": 744, "seek": 497024, "start": 4977.84, "end": 4982.5599999999995, "text": + " yeah so I can give a few product updates of what''s coming from Vespa we are coming", + "tokens": [50744, 1338, 370, 286, 393, 976, 257, 1326, 1674, 9205, 295, 437, 311, + 1348, 490, 691, 279, 4306, 321, 366, 1348, 50980], "temperature": 0.0, "avg_logprob": + -0.14266654423304967, "compression_ratio": 1.6386138613861385, "no_speech_prob": + 
0.00326779717579484}, {"id": 745, "seek": 497024, "start": 4983.76, "end": 4990.32, + "text": " gonna release some integrated dense models for Vespa so though you don''t + have to export so you", "tokens": [51040, 799, 4374, 512, 10919, 18011, 5245, 337, + 691, 279, 4306, 370, 1673, 291, 500, 380, 362, 281, 10725, 370, 291, 51368], "temperature": + 0.0, "avg_logprob": -0.14266654423304967, "compression_ratio": 1.6386138613861385, + "no_speech_prob": 0.00326779717579484}, {"id": 746, "seek": 497024, "start": 4990.32, + "end": 4997.28, "text": " can you can use these models off the shelf and then we + allow you to tune the Korean coder so you", "tokens": [51368, 393, 291, 393, 764, + 613, 5245, 766, 264, 15222, 293, 550, 321, 2089, 291, 281, 10864, 264, 6933, 17656, + 260, 370, 291, 51716], "temperature": 0.0, "avg_logprob": -0.14266654423304967, + "compression_ratio": 1.6386138613861385, "no_speech_prob": 0.00326779717579484}, + {"id": 747, "seek": 499728, "start": 4997.28, "end": 5004.08, "text": " have the + document code is frozen but then you can tune the Korean coder and then show you + how to", "tokens": [50364, 362, 264, 4166, 3089, 307, 12496, 457, 550, 291, 393, + 10864, 264, 6933, 17656, 260, 293, 550, 855, 291, 577, 281, 50704], "temperature": + 0.0, "avg_logprob": -0.1601293756720725, "compression_ratio": 1.683982683982684, + "no_speech_prob": 0.0006800630362704396}, {"id": 748, "seek": 499728, "start": 5004.08, + "end": 5009.36, "text": " combine these combining both dense as far so that''s one + thing that is coming out other thing is", "tokens": [50704, 10432, 613, 21928, 1293, + 18011, 382, 1400, 370, 300, 311, 472, 551, 300, 307, 1348, 484, 661, 551, 307, 50968], + "temperature": 0.0, "avg_logprob": -0.1601293756720725, "compression_ratio": 1.683982683982684, + "no_speech_prob": 0.0006800630362704396}, {"id": 749, "seek": 499728, "start": 5009.36, + "end": 5017.12, "text": " that we''re taking some steps regarding for love QPS use + cases because we 
designed Vespa you know", "tokens": [50968, 300, 321, 434, 1940, + 512, 4439, 8595, 337, 959, 1249, 6273, 764, 3331, 570, 321, 4761, 691, 279, 4306, + 291, 458, 51356], "temperature": 0.0, "avg_logprob": -0.1601293756720725, "compression_ratio": + 1.683982683982684, "no_speech_prob": 0.0006800630362704396}, {"id": 750, "seek": + 499728, "start": 5017.12, "end": 5024.32, "text": " to be kind of low single digit + male seconds on multiple different use cases but not everybody needs", "tokens": + [51356, 281, 312, 733, 295, 2295, 2167, 14293, 7133, 3949, 322, 3866, 819, 764, + 3331, 457, 406, 2201, 2203, 51716], "temperature": 0.0, "avg_logprob": -0.1601293756720725, + "compression_ratio": 1.683982683982684, "no_speech_prob": 0.0006800630362704396}, + {"id": 751, "seek": 502432, "start": 5024.32, "end": 5032.08, "text": " that so + we''re introducing some new options for memory management so that we can actually + run", "tokens": [50364, 300, 370, 321, 434, 15424, 512, 777, 3956, 337, 4675, 4592, + 370, 300, 321, 393, 767, 1190, 50752], "temperature": 0.0, "avg_logprob": -0.11968808540931115, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0025737411342561245}, + {"id": 752, "seek": 502432, "start": 5033.5199999999995, "end": 5039.44, "text": + " on service with less memory so that''s I think that''s gonna be a game changer + for certain", "tokens": [50824, 322, 2643, 365, 1570, 4675, 370, 300, 311, 286, + 519, 300, 311, 799, 312, 257, 1216, 22822, 337, 1629, 51120], "temperature": 0.0, + "avg_logprob": -0.11968808540931115, "compression_ratio": 1.6470588235294117, "no_speech_prob": + 0.0025737411342561245}, {"id": 753, "seek": 502432, "start": 5039.44, "end": 5050.88, + "text": " use cases that don''t need high throughput low latency so that''s two + things and yeah that''s I think", "tokens": [51120, 764, 3331, 300, 500, 380, 643, + 1090, 44629, 2295, 27043, 370, 300, 311, 732, 721, 293, 1338, 300, 311, 286, 519, + 51692], "temperature": 0.0, 
"avg_logprob": -0.11968808540931115, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0025737411342561245}, {"id": 754, "seek": + 505088, "start": 5050.88, "end": 5056.8, "text": " that''s more than enough you + know and there will there will be some some blogs I think about our", "tokens": + [50364, 300, 311, 544, 813, 1547, 291, 458, 293, 456, 486, 456, 486, 312, 512, 512, + 31038, 286, 519, 466, 527, 50660], "temperature": 0.0, "avg_logprob": -0.19128237599911896, + "compression_ratio": 1.6355932203389831, "no_speech_prob": 0.005922040902078152}, + {"id": 755, "seek": 505088, "start": 5056.8, "end": 5065.2, "text": " results on + the beer beer benchmark yeah yeah that''s and I''m gonna come out also with a blog + post on", "tokens": [50660, 3542, 322, 264, 8795, 8795, 18927, 1338, 1338, 300, + 311, 293, 286, 478, 799, 808, 484, 611, 365, 257, 6968, 2183, 322, 51080], "temperature": + 0.0, "avg_logprob": -0.19128237599911896, "compression_ratio": 1.6355932203389831, + "no_speech_prob": 0.005922040902078152}, {"id": 756, "seek": 505088, "start": 5065.2, + "end": 5071.84, "text": " a technique called span which is a paper for Microsoft + so m2n represented that on Vespa so it''s", "tokens": [51080, 257, 6532, 1219, 16174, + 597, 307, 257, 3035, 337, 8116, 370, 275, 17, 77, 10379, 300, 322, 691, 279, 4306, + 370, 309, 311, 51412], "temperature": 0.0, "avg_logprob": -0.19128237599911896, + "compression_ratio": 1.6355932203389831, "no_speech_prob": 0.005922040902078152}, + {"id": 757, "seek": 505088, "start": 5071.84, "end": 5078.56, "text": " really interesting + technique with the hybrid combination of HMSW and inverted file and you can", "tokens": + [51412, 534, 1880, 6532, 365, 264, 13051, 6562, 295, 389, 10288, 54, 293, 38969, + 3991, 293, 291, 393, 51748], "temperature": 0.0, "avg_logprob": -0.19128237599911896, + "compression_ratio": 1.6355932203389831, "no_speech_prob": 0.005922040902078152}, + {"id": 758, "seek": 507856, "start": 5078.56, "end": 
5085.04, "text": " represent + this m2n in Vespa so I''m gonna do a part three of my blog post serial billion scale", + "tokens": [50364, 2906, 341, 275, 17, 77, 294, 691, 279, 4306, 370, 286, 478, 799, + 360, 257, 644, 1045, 295, 452, 6968, 2183, 17436, 5218, 4373, 50688], "temperature": + 0.0, "avg_logprob": -0.1659896501930811, "compression_ratio": 1.6566523605150214, + "no_speech_prob": 0.0017443925607949495}, {"id": 759, "seek": 507856, "start": 5085.04, + "end": 5089.6, "text": " so that''s something I''m looking forward to but right + now I''m that kind of refracting a lot of", "tokens": [50688, 370, 300, 311, 746, + 286, 478, 1237, 2128, 281, 457, 558, 586, 286, 478, 300, 733, 295, 1895, 1897, 278, + 257, 688, 295, 50916], "temperature": 0.0, "avg_logprob": -0.1659896501930811, "compression_ratio": + 1.6566523605150214, "no_speech_prob": 0.0017443925607949495}, {"id": 760, "seek": + 507856, "start": 5089.6, "end": 5097.04, "text": " the sample applications and so + on to to get the experience more smoothly yeah yeah sounds fantastic", "tokens": + [50916, 264, 6889, 5821, 293, 370, 322, 281, 281, 483, 264, 1752, 544, 19565, 1338, + 1338, 3263, 5456, 51288], "temperature": 0.0, "avg_logprob": -0.1659896501930811, + "compression_ratio": 1.6566523605150214, "no_speech_prob": 0.0017443925607949495}, + {"id": 761, "seek": 507856, "start": 5097.04, "end": 5102.400000000001, "text": + " looking forward to it and we''ll make sure to link all the blog posts that you + mentioned especially", "tokens": [51288, 1237, 2128, 281, 309, 293, 321, 603, 652, + 988, 281, 2113, 439, 264, 6968, 12300, 300, 291, 2835, 2318, 51556], "temperature": + 0.0, "avg_logprob": -0.1659896501930811, "compression_ratio": 1.6566523605150214, + "no_speech_prob": 0.0017443925607949495}, {"id": 762, "seek": 510240, "start": 5102.719999999999, + "end": 5108.48, "text": " on billion scale vector search and other tutorials that + you mentioned and this is fantastic thank", "tokens": [50380, 322, 
5218, 4373, 8062, + 3164, 293, 661, 17616, 300, 291, 2835, 293, 341, 307, 5456, 1309, 50668], "temperature": + 0.0, "avg_logprob": -0.0654059490525579, "compression_ratio": 1.7345132743362832, + "no_speech_prob": 0.0033172317780554295}, {"id": 763, "seek": 510240, "start": 5108.48, + "end": 5115.04, "text": " you for doing this and keep doing this keep finding the + energy I know it''s stuck sometimes but I", "tokens": [50668, 291, 337, 884, 341, + 293, 1066, 884, 341, 1066, 5006, 264, 2281, 286, 458, 309, 311, 5541, 2171, 457, + 286, 50996], "temperature": 0.0, "avg_logprob": -0.0654059490525579, "compression_ratio": + 1.7345132743362832, "no_speech_prob": 0.0033172317780554295}, {"id": 764, "seek": + 510240, "start": 5115.04, "end": 5121.5199999999995, "text": " think it keeps you + also awake and sort of like pushing yourself forward and I think the best way to", + "tokens": [50996, 519, 309, 5965, 291, 611, 15994, 293, 1333, 295, 411, 7380, 1803, + 2128, 293, 286, 519, 264, 1151, 636, 281, 51320], "temperature": 0.0, "avg_logprob": + -0.0654059490525579, "compression_ratio": 1.7345132743362832, "no_speech_prob": + 0.0033172317780554295}, {"id": 765, "seek": 510240, "start": 5121.5199999999995, + "end": 5128.4, "text": " use your brain is actually doing something that is useful + be reading a paper or implementing code", "tokens": [51320, 764, 428, 3567, 307, + 767, 884, 746, 300, 307, 4420, 312, 3760, 257, 3035, 420, 18114, 3089, 51664], "temperature": + 0.0, "avg_logprob": -0.0654059490525579, "compression_ratio": 1.7345132743362832, + "no_speech_prob": 0.0033172317780554295}, {"id": 766, "seek": 512840, "start": 5128.4, + "end": 5135.2, "text": " or blogging about it so this is fantastic thanks so much + for your active contribution", "tokens": [50364, 420, 6968, 3249, 466, 309, 370, + 341, 307, 5456, 3231, 370, 709, 337, 428, 4967, 13150, 50704], "temperature": 0.0, + "avg_logprob": -0.09214245291317211, "compression_ratio": 1.7342995169082125, 
"no_speech_prob": + 0.010311142541468143}, {"id": 767, "seek": 512840, "start": 5135.839999999999, "end": + 5143.28, "text": " thank you thank you as well yeah um and I really enjoyed this + conversation I really hope we can", "tokens": [50736, 1309, 291, 1309, 291, 382, + 731, 1338, 1105, 293, 286, 534, 4626, 341, 3761, 286, 534, 1454, 321, 393, 51108], + "temperature": 0.0, "avg_logprob": -0.09214245291317211, "compression_ratio": 1.7342995169082125, + "no_speech_prob": 0.010311142541468143}, {"id": 768, "seek": 512840, "start": 5143.28, + "end": 5148.799999999999, "text": " record at some point down the road as well if + you will be open to it and I think we can", "tokens": [51108, 2136, 412, 512, 935, + 760, 264, 3060, 382, 731, 498, 291, 486, 312, 1269, 281, 309, 293, 286, 519, 321, + 393, 51384], "temperature": 0.0, "avg_logprob": -0.09214245291317211, "compression_ratio": + 1.7342995169082125, "no_speech_prob": 0.010311142541468143}, {"id": 769, "seek": + 512840, "start": 5148.799999999999, "end": 5154.879999999999, "text": " cover a + lot more topics as well but I wish you all the success in your endeavors and stay", + "tokens": [51384, 2060, 257, 688, 544, 8378, 382, 731, 457, 286, 3172, 291, 439, + 264, 2245, 294, 428, 49608, 293, 1754, 51688], "temperature": 0.0, "avg_logprob": + -0.09214245291317211, "compression_ratio": 1.7342995169082125, "no_speech_prob": + 0.010311142541468143}, {"id": 770, "seek": 515488, "start": 5154.88, "end": 5162.0, + "text": " warm and excited about the field yeah I will I mean such an exciting field + thank you very much", "tokens": [50364, 4561, 293, 2919, 466, 264, 2519, 1338, 286, + 486, 286, 914, 1270, 364, 4670, 2519, 1309, 291, 588, 709, 50720], "temperature": + 0.0, "avg_logprob": -0.1856441705123238, "compression_ratio": 1.1604938271604939, + "no_speech_prob": 0.018528467044234276}, {"id": 771, "seek": 516200, "start": 5162.0, + "end": 5172.96, "text": " Dimitri for hosting this and you know we''ll talk later + on 
thank you thank you and see you around bye bye", "tokens": [50368, 20975, 270, + 470, 337, 16058, 341, 293, 291, 458, 321, 603, 751, 1780, 322, 1309, 291, 1309, + 291, 293, 536, 291, 926, 6543, 6543, 50912], "temperature": 0.0, "avg_logprob": + -0.36714546768753614, "compression_ratio": 1.2380952380952381, "no_speech_prob": + 0.050745509564876556}, {"id": 772, "seek": 519200, "start": 5192.0, "end": 5194.8, + "text": " you", "tokens": [50372, 291, 50504], "temperature": 1.0, "avg_logprob": + -1.8751850128173828, "compression_ratio": 0.2727272727272727, "no_speech_prob": + 0.39786800742149353}]' +--- + +Everyone, Vector Podcast is here. I hope you have been waiting for another episode. Today I have a rock star with me: Jo Kristian Bergum, a distinguished engineer with Yahoo. He has been super vocal in the field of vector search. +He has also been advocating for one of the famous vector search engines, and actually a platform (surely Jo can talk more about it) called Vespa. Hey Jo, how are you doing? Hey Dmitry, I'm good, thanks. How are you? I'm great. Thank you very much for taking the time to talk to me. +It's fantastic being here on your show. It's become so popular. Thank you for that introduction. I'm not sure if I'm a rock star. It's really interesting to be here. I really look forward to our conversation on vector search, and maybe we'll touch on language models as well. +And we'll talk a little bit about Vespa and the technology in Vespa. I'm really excited. Yeah, I'm looking forward to that. And I mean, you are a rock star. I see you everywhere: on Twitter, on LinkedIn, blogging, and so on. +I'm really glad to get to talk to you here today. So, as is the tradition, could you please introduce yourself, however you want and in whatever detail you want, and we'll take it from there? Yeah. So my name is Jo Kristian and I work for Yahoo. I've been working for Yahoo since 2007.
+My current role at Yahoo is distinguished engineer, and I work on the Vespa platform. I've been working on search and recommendations since about 2001, when I joined a company here as an intern during my studies, a company called Fast Search and Transfer, a Norwegian company. +Back then they were doing web search with this web search engine called alltheweb.com. They started around '98, I think, trying to compete with Google and so on. And then Yahoo came along and bought the web search division, the team here in Trondheim. They also bought AltaVista and so on. +That was back in 2003. And in 2004, Vespa was born. I actually worked at Fast in the enterprise search division for some time, three years, and then I joined Yahoo in 2007. Since then I've been here working on search and Vespa at Yahoo. So that's my background. +I also hold a master's degree in computer science from the Norwegian university here in Trondheim. Oh yeah, that's great. Actually, by the way, I did visit Trondheim. Was it 2007? For an interview with one search company, not Fast. But yeah, it was a great, great visit. I mean, I love the city. +It's an amazing place. Yeah, it's an amazing place. And it's funny what you said about search and Trondheim, because it really has a special place, maybe special even in Europe, because at one time we had both Google, Bing and Yahoo here in Trondheim. So that was a fantastic time. +Google shut down their office here in Trondheim, but now Microsoft is here in Trondheim and Yahoo also has an office here in Trondheim. So there's a lot of search technology competence here in Trondheim. +This is amazing, actually, for a relatively small city, but I think Trondheim used to be the capital of Norway at some point back in history. Yeah, at one point, way back in the Viking days. Exactly. So now all these Vikings have stopped going around with boats harassing people. +Now we develop search technology instead.
Yeah, quite a move. Wow. And I also understood that in Trondheim, as you said, there is the university. Is it actually one of the talent supplies for this industry, or for engineering in general? Yeah, it is. +We have the largest technical university in Norway, NTNU, here in Trondheim. It has a long history, and it's definitely one of the reasons why the search companies evolved here. Fast Search and Transfer, the company, was founded by people coming out of the university here. +They actually started with FTP Search back in around '97, and that developed into this web search engine, and eventually this became Vespa at Yahoo. Oh yeah, sounds great. +So I can maybe touch on the background, since I've mentioned web search now, and, you know, maybe not everybody has heard about Vespa. We actually started developing Vespa in 2004. Yahoo said: you know, we brought you into the company. +We want you to build a vertical search platform that we can use across our properties at Yahoo. For example, Yahoo Finance and Yahoo News need to have some kind of search engine. +So they gave that task to us here in Trondheim, and they started building the Vespa platform, using the roots and the technology from the web search and putting that into a package that the verticals could install and use. +And then over time, basically starting with basic BM25-like search, keyword search, Vespa gradually added more features: real-time indexing, aggregations, grouping and facets as well. +So it really developed over time as new requirements came in. When we started, Vespa was about search, but around 2007, 2008, Vespa also started to be used more as a recommendation engine, so serving of recommendations. So when you go to finance.yahoo.com and there's a set of articles that are recommended to you, the serving engine doing that is Vespa. And then in 2017, Yahoo decided that we were going to open-source Vespa to the world. +So we open-sourced it under the Apache 2.0 license, and we still continue to very actively develop Vespa and add new features and so on. So that's a brief background. Vespa is not new. +It really has a very long history, and I think that's also a great thing, and we can maybe talk a little bit about it, because, you know, we need to develop software over time. There have been a lot of changes in the infrastructure. There was no cloud, no public cloud. +There were no Kubernetes back in 2004. When I started in 2007, you know, a high-powered content node machine would have maybe eight gigs of RAM, maybe maximum 1 gigabit per second of network, and it would have spinning disks. +And now we have NVMe SSD disks. We have nodes with potentially four terabytes of memory, and lots of CPU power. So keeping up, improving the software and adapting it to new hardware and so on, it's been really fun to watch. +I think we did a good job actually keeping Vespa modern, for something that started in 2004. It sounds like a really exciting journey, starting from, as you explained, small-scale servers all the way up. +And the technology has changed so much, right? The disks became faster, I guess, and the network has become faster. +And I remember in the Silicon Valley sitcom, if you watched it, they had a case where they optimized one module in the system and the whole system went down because it was way too fast. +So it sounds like you have done quite a bit of work to actually keep this ship afloat.
And if I understood correctly, technically speaking, Vespa, or a portion of Vespa, is implemented in Java, a portion in C or C++, and then you also have some Python. +And maybe you can talk more about the choice of languages and the sort of culture there is in the team. But I'm also curious: around the same time, I've actually seen Lucene was also developing quite fast. +Did you look at what that team was doing, which is an open source project? Was there something to learn from? Yeah, so let me tackle the first question, around Vespa and the languages that we use. And there's a lot here to cover. So the total Vespa platform is around 1.7 million lines of code, and roughly 50% is written in Java and 50% is written in C++. +And why do we use two different languages, and what are the trade-offs? +So in the Vespa architecture, we made a clear distinction around what we call the content cluster, which holds the content, where you actually index and invert the documents, and where you have all the data structures for fast searching. +The content layer is written in C++ because you're managing a lot of data; you have data that you need to keep in memory and so on, so it needs to be fairly efficient. And then what we call the stateless layer is the layer that actually interacts with user requests. +So a user request comes in, it's accepted by the HTTP server, and that layer is written in Java. There you can also deploy plugins: you can write your own searcher functions that can dispatch the query and get a reply. +It's transparent to a given searcher whether you have a 100-node cluster or a single-node cluster; that's hidden away when you deploy a plugin. So those languages have different trade-offs. +It's a lot easier for people to write plugins in Java without shooting themselves in the foot than in C++.
In the content layer, in C++, we don't allow any kind of plugins. You can contribute to the open source, but then it needs to be a proper feature. +We don't allow you to embed a library or something into the content layer. So that's a trade-off. Then you mentioned Python. We have what we call pyvespa, which is a language binding on top of the HTTP API, so it's not part of the core Vespa codebase. +It's an API we built around interacting with Vespa, doing model evaluation and evaluating, for example, different retrieval and ranking strategies. So that's the language side. And regarding Lucene, Apache Lucene: if I recall correctly, I think Apache Lucene started in 1998. +So around the same time. There's a lot of inspiration, of course, and there are not that many ways you can build a search engine. Lucene is pretty much a really good library. +So yeah, definitely we look at what's happening in open source, and we have a lot of admiration for the work and the committers of Apache Lucene. I mean, it's a great job that they've done, and that they've been able to develop this over 20 years. +The core difference between Vespa and Apache Lucene is that Vespa is a full engine. So it becomes more like comparing Vespa with Elasticsearch or Apache Solr, which are engines built on top. There's no Vespa library which you can use. +You have to buy the whole platform. Yes, basically like a web server around it and all the components, like the nodes, the overseer, and other architectural elements. Yeah, for sure. +And on the Python side, I'm also curious: with all the development of models, and, you know, Hugging Face, you can pretty much find a paper and most likely there is a model already available in some shape and form. +So does the Python layer in Vespa help, you know, newcomers experiment more easily with these models in conjunction with Vespa?
We do hope so. That was one of the goals for making pyvespa. +There are different kinds of use cases: if you have more of a low query volume, maybe 200,000 documents or something like that, and no really strict low-latency requirements and so on, +then you can use Python and do embeddings, and it natively works with Hugging Face and all those libraries that are typically written in Python. And then you can use Vespa through purely HTTP-based APIs and so on. +The other option, which is more involved, I have to say, is that you can take a transformer model, for example, and export it to ONNX format, the Open Neural Network Exchange format. +That's an open neural network format that multiple companies, Microsoft and I think also Facebook, have rallied around. +So you can take transformer models from the Hugging Face library, export them to ONNX, and then import the ONNX models into Vespa for evaluation. +In Vespa we integrate with ONNX Runtime, which is an open source library from Microsoft with a lot of different language bindings: Python, C++, Java. It's a really great library and we integrate with that. +You don't use it directly, but you can put the model into Vespa, and Vespa can use it and invoke it and so on.
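The pure-HTTP route mentioned here needs nothing Vespa-specific on the client side. A minimal sketch with only the standard library, assuming a reachable Vespa endpoint; the `/search/` handler, the YQL string, and the `query`/`ranking`/`hits` parameters follow Vespa's documented query API, while the endpoint and ranking-profile names are made up for illustration:

```python
import json
from urllib import request

def build_query(user_query: str, ranking_profile: str = "default", hits: int = 10) -> dict:
    """Build a request body for Vespa's Search API: YQL plus query parameters."""
    return {
        "yql": "select * from sources * where userQuery()",
        "query": user_query,
        "ranking": ranking_profile,
        "hits": hits,
    }

def search(endpoint: str, body: dict) -> dict:
    """POST the body to the /search/ handler and decode the JSON reply."""
    req = request.Request(
        endpoint.rstrip("/") + "/search/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Build (but don't send) a query; sending would need a running Vespa instance,
# e.g. search("http://localhost:8080", body).
body = build_query("vector search engines", ranking_profile="bm25", hits=3)
print(json.dumps(body, indent=2))
```

The same body shape is what pyvespa wraps for you; with the HTTP route you trade convenience for having zero extra dependencies.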
+ And for those models there's kind of a trade-off between, you know, getting to know Vespa and playing around with it at maybe low QPS, and the scenario where you have really large scale, you want to do 100,000 queries per second or something like that; then you move the model to ONNX and deploy it actually inside the Vespa cluster, which has many benefits, because then you don't transfer a lot of data over the network and so on. Because the network is still a factor: within the data centers, maybe the network limitations have been solved, so you can get 10 gigs or even 25 gigs, but going cross-region, latency is still a concern. And that's one thing that really fascinates me: sometimes the use cases are still bottlenecked by the speed of light, right? +So yeah, going from the east coast to the west coast in the US is easily 100 milliseconds. +The speed of light hasn't been canceled or solved yet, so yeah, physics. + Yeah, this is fantastic. And even before we go into this wonderful world of models and the latest advancements, I'm still curious to dig into the item that you mentioned: as you have been evolving Vespa over time, you found a need to add some really interesting data structures, like the tensors you mentioned. Could you elaborate a bit more on how this need arose, what the typical use cases for it are today, and also how accessible it is to an average user of Vespa, so to say? +Yeah, so I'll do a little bit of history on that. The Vespa document model, right, you have to have a defined schema in Vespa. +So you define, for instance, a document type, and it has a title, it has maybe some timestamp, it might have an integer attribute.
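The kind of schema described here might look like the following sketch in Vespa's schema definition language; the document type and field names are invented for illustration, and the general shape follows Vespa's documented schema syntax:

```
schema doc {
    document doc {
        field title type string {
            indexing: summary | index
        }
        field timestamp type long {
            indexing: summary | attribute
        }
        field popularity type int {
            indexing: summary | attribute
        }
    }
}
```

Fields marked `index` are inverted for text search (the C++ content layer discussed earlier), while `attribute` fields are kept in memory for fast ranking, grouping and sorting.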
+ So there's a normal document model, what you'd expect from any schema-oriented database, and we also had vectors early on, so you could actually do brute-force dot products as part of ranking, because that was really popular at Yahoo: for various ranking requirements you will perform multiple different dot products over the documents that your query has retrieved. Then around 2013, 2014, the researchers at Yahoo said: you know, we really want to express these types of recommendation models where we can use the general concept of a tensor, so not just storing a vector in the document, but even a matrix. And they had some use cases around recommendation. + So for instance, in the document you can represent a matrix: is this document popular in multiple different categories? For example, this document is popular among people that are interested in news, or among the ones that are interested in finance, and so on.
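The per-category popularity idea can be sketched in plain Python, with dicts standing in for Vespa's mapped tensors; the field and category names are made up, and the real thing would be a rank expression over a document tensor and a query tensor:

```python
# Document-side tensor: popularity of this document per category
# (a mapped tensor over a "category" dimension, in Vespa terms).
doc_popularity = {"news": 0.9, "finance": 0.2, "sports": 0.4}

# Query-side tensor: how interested the current user is in each category.
user_interest = {"news": 0.1, "finance": 0.8}

def tensor_dot(doc: dict, query: dict) -> float:
    """Sum-product over the shared categories, as a rank expression would compute."""
    return sum(doc.get(cat, 0.0) * weight for cat, weight in query.items())

score = tensor_dot(doc_popularity, user_interest)
print(score)  # 0.9*0.1 + 0.2*0.8, i.e. about 0.25
```

In Vespa this evaluation happens during the ranking phase on the content nodes, so the document tensors never leave the cluster.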
+ So it's really flexible, in that you can have tensors both on the document side and on the query side, and then during the ranking phase you can evaluate these kinds of expressions. So it's a really powerful language. One concrete example, and we haven't touched on the language models yet, is the ColBERT model, contextualized late interaction over BERT, where the query is not represented as one vector; instead, each of the terms in the query is represented as a vector, and similarly on the document side each of the document terms is represented as a vector. Then at runtime you retrieve documents and rank them based on this maximum similarity function: it takes the vector of the first query term, performs dot products against the vectors of the document terms, takes the maximum of those scores, does that for all of the query terms, and the final score is the sum. Tensors hadn't been used that much for search use cases, more around recommendation use cases, so when I saw ColBERT and I saw the maximum operator, I thought: this is just a perfect fit for the Vespa tensor framework, a perfect use case. So yeah, that's one example. Awesome. The way you described it: many models today are like, okay, here's the embedding space, you embed this paragraph or whatever, but if you need to go to word level, that's lots of data and lots of computation, right? So how would you even do this? Sounds like tensors have found their use case there. Yeah, and for ColBERT, when we implemented ColBERT on Vespa, we also did a large sample application around the MS MARCO dataset, the passage ranking dataset of MS MARCO. We made a sample app where you can combine these different retrieval and ranking strategies, and in our case we used ColBERT as a
re-ranking model. And that's one of the real strengths of Vespa: we allow you to express really complex retrieval and ranking pipelines. You do a query, and each of the nodes involved in the query will do local matching and ranking; then you can have a second phase locally on each node; and then, when you have the global view after you have done the scatter-gather, you can do another re-ranking phase, because then you have the global view. So there are a lot of possibilities to trade off between accuracy and cost. Yeah, exactly. And actually, as you've been describing this, I realized that we recently discussed multi-stage ranking in one of the podcasts: you could have either sparse or dense retrieval, but then later use a graph algorithm to re-rank the items. I think it was in the podcast with Yury Malkov, the author of the HNSW algorithm. So have you seen any use cases of Vespa, you know, for multi-stage ranking pipelines?
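The ColBERT MaxSim scoring described above (per query term, take the maximum dot product over the document's term vectors, then sum) can be sketched in a few lines of plain Python, with toy two-dimensional vectors standing in for real BERT term embeddings:

```python
def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query term vector, take the
    maximum dot product over all document term vectors, then sum those
    per-term maxima into the final relevance score."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy example: two query terms, three document terms.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
score = maxsim(query, doc)  # max(0.9, 0.2, 0.5) + max(0.1, 0.8, 0.5), about 1.7
```

This is exactly the shape of computation that Vespa's tensor rank expressions evaluate per document on the content nodes, which is why the max operator felt like such a natural fit.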
+ Definitely. I mean, both for search internally at Yahoo and from customers using Vespa outside, we see multi-stage retrieval and ranking pipelines. The reason why you typically do it is that it's too expensive to evaluate the final ranking model over all the documents, right? So you take some kind of approximation of that model and execute that as a candidate retriever. And, we haven't talked about the vector search capabilities of Vespa yet, but one of the beauties of Vespa is that, after we integrated approximate nearest neighbor search, when you actually do the matching and querying you can combine regular sparse or keyword search with vector search, and then you re-rank. This paradigm of having multiple stages, you see it in question answering pipelines as well, right? You have a retriever, and then you have what they call a reader: the retriever finds some candidate passages from Wikipedia, and then the reader extracts the answer. But the reader, typically a complex transformer model where you input both the query and the document at the same time into the deep neural network, is very expensive to evaluate over all the potential passages as the user types.
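The cheap-retriever-plus-expensive-re-ranker pattern described above can be sketched like this; both scoring functions are deliberately crude stand-ins (term overlap for something BM25-like, a length heuristic for a cross-encoder), not real models:

```python
def first_phase(query: str, doc: str) -> float:
    """Cheap approximation run over every candidate; simple term overlap
    standing in for a BM25-like first-phase score."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def second_phase(query: str, doc: str) -> float:
    """Expensive model run only on the survivors; a toy stand-in for a
    cross-encoder / reader that would be too costly to run on everything."""
    return first_phase(query, doc) + 1.0 / (1 + abs(len(doc.split()) - 8))

def rank(query, docs, k=2):
    # Stage 1: score everything cheaply and keep only the top-k candidates.
    candidates = sorted(docs, key=lambda d: first_phase(query, d), reverse=True)[:k]
    # Stage 2: spend the heavy compute budget only on those k hits.
    return sorted(candidates, key=lambda d: second_phase(query, d), reverse=True)

docs = [
    "vespa supports approximate nearest neighbor search",
    "keyword search with bm25 ranking",
    "cooking recipes for busy weekdays",
]
results = rank("nearest neighbor search", docs, k=2)
```

In Vespa terms, stage 1 corresponds to matching plus the per-node first-phase rank expression, and stage 2 to a second-phase (or global, post-scatter-gather) re-ranking over far fewer hits.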
+ It's super compute-intensive, and I'm super curious to drill into this topic of combining, you know, neural search with sparse search. Actually, before that, as you've been talking I've realized: I'm actually now taking the Search with Machine Learning course by Grant Ingersoll and Daniel Tunkelang. It's a fantastic course, I can highly recommend it, and it's super intense as well. I think yesterday Grant mentioned that there are companies which really need to optimize only the top one or top two results, and they have built models to optimize only that top one or top two, which sounds mind-blowing, right? And maybe this applies to web scale to some extent. One of my recent experiences is actually in a web-scale search engine: we have a mobile screen, so we can only show the top three results, and the target is obviously a high CTR. We quickly noticed that if you do a sparse search without any logic on the query whatsoever, CTR is very low. So you have to do some tricks, like query understanding, and also try to increase precision while maintaining the diversity of the results to some degree. Basically, with sparse search it's very easy to hit just the tip of that iceberg and say: okay, I have three teacher jobs for you. Which is not that interesting, because we don't know if the user is looking for teacher jobs, right? So have you seen cases like this? I think these are really challenging ones. Yeah, but generally, if you look at the results, if you evaluate on MS MARCO for example, the official metric there is the mean reciprocal rank, right? So if you retrieve the actual relevant document and put it in position one, that query gets a score of one, but if you put it in second place, it's going to have a score of 0.5. +I think that's a really good measure when you talk about mobile-screen precision at 1, 2, 3, so that's really important. But in this kind of multi-stage retrieval and ranking pipeline, it makes sense to spend more of the computational budget, within the latency SLA, on those top-k hits, right? Like when you go to Google today and do a search, probably 100 million documents will be excluded in just a fraction of a millisecond, right? And then there are multiple stages, and you can be sure that for the last 10 documents from the previous stages they invest more time and computational resources on those hits. Yeah, exactly. And the good thing is, and this is what I talked about with the Vespa architecture, where you have this division between the stateless layer, which is doing the scatter-gather, and each of the local nodes: basically, in a search engine today you need to move computation to the data. There's a lot of talk about moving or separating compute from storage, which is a huge thing in the cloud, right? But in search use cases with high throughput and high document volume you need to be able to do both: you need to do fast computations across multiple nodes, and then you transfer data. Like in Vespa, each of the hits can include ranking features and so on that the subsequent phases can actually use for re-ranking. And the good thing, and I know you've done a talk about diversity in search results and so on, is that you need to have that global view in order to optimize for diversity, and then you can throw away a lot of the hits that you're not going to show because of business constraints or diversity constraints, and you don't need to invoke the heavy model for those hits. So yeah, I think it's interesting for these kinds of pipelines. But one thing that is challenging regarding multi-stage pipelines is that the stages interact with each other, right? And
If you have a system for retraining your model using behavioral features, what users are clicking on and so on, then you will have some bias towards the ranking algorithm that is in place today, because that's the model generating the interactions; you basically retrain on the top hits. That's what we saw with Amazon as well: when they started to improve the retriever, instead of doing BM25 and then re-ranking, they had a mean reciprocal rank of about 0.35, and after changing to a dense retriever we're talking about 0.42 or something like that. Improving the retriever matters because the retriever sets the upper bound: the re-ranker cannot dream up relevant hits if the retriever hasn't retrieved them. That's an important point about the retrieval and ranking stages.

Exactly, and I think we can gradually move into neural search and vector search, but there was a student question in that same course: how much can you actually solve with the re-ranker if your first-stage retriever didn't even find what the user is looking for? That probably means the query is not a match for this search engine at all; say they're looking for a specific model of phone, but the shop doesn't even sell phones. Daniel Tunkelang's response was that you can implement a query understanding system that understands the query as far as it can, and if it knows there are no such items in the database, it doesn't even bother searching for them. I think that was really clever advice, and he said that system worked extremely well, saving users' time, because in the end what we're doing is optimizing the user journey, which then translates into business. So that was a fantastic example of how you can also tackle such a search problem.

Yeah, and that's one area where, because we're building all of these sample applications showing what you can build with Vespa, query understanding has been one of the topics I wanted to build out and demonstrate, especially using a transformer model. There are different ways of doing this, but one way is to treat it as a multi-label categorization problem: given a query, here are the intents and their probabilities. What's stopping me is that we need to work with open datasets, and there are very few query datasets of this kind. One approximation is to train using titles: in an e-commerce setting you can train on the titles of the listings, using the category as the label. The beauty of this is that you're mapping free-text queries into a fixed, predefined vocabulary, the categories, and then you can actually eliminate the zero-hits problem, because you're no longer retrieving based on the free-text query but on the most likely categories. These are really interesting topics, and that's one thing I find with search, and why I love being in it: there's a ton of things you can build. There's query understanding, there's facets, there's retrieval and ranking, dense and sparse, and then there's the scale of it, how to make it fast. Query understanding alone is probably a full research topic on its own. There are so many things involved in search.
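The title-to-category idea sketched here can be illustrated with a toy model. Below, simple per-token category counts stand in for the transformer classifier being described; the listings and the helper names are made up for illustration:

```python
from collections import Counter, defaultdict

def train_token_categories(listings):
    """Count how often each title token co-occurs with each category."""
    token_cats = defaultdict(Counter)
    for title, category in listings:
        for token in title.lower().split():
            token_cats[token][category] += 1
    return token_cats

def predict_categories(token_cats, query, top_n=2):
    """Score categories by summing per-token counts for the query."""
    scores = Counter()
    for token in query.lower().split():
        scores.update(token_cats.get(token, Counter()))
    return [cat for cat, _ in scores.most_common(top_n)]

listings = [
    ("leather office chair", "furniture"),
    ("ergonomic desk chair", "furniture"),
    ("usb-c charging cable", "electronics"),
]
model = train_token_categories(listings)
print(predict_categories(model, "cheap desk chair"))  # ['furniture']
print(predict_categories(model, "iphone 13 pro"))     # []
```

An empty prediction is exactly the signal the query-understanding advice above exploits: if no category matches, you can skip the search entirely instead of returning zero hits.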
Yeah, it's an endless journey, I agree. You can dive into the NLP side of things, or you can stay with the scaling of search, or query parsing, whatever you find passion in, and maybe your passion changes over time as well: you did a bit of NLP, then you moved to query parsing, then maybe even to scalability. It's a fascinating topic, and what also fascinates me is that on the other side the users are not sleeping either; they're puzzling you all the time with new queries, seasonal changes, data changes. In a mid-size or larger company it's usually the work of multiple teams or departments, some looking after data, some after ranking, recommendation, feature collection, and what not, and something somewhere can go wrong, so you need to prepare for it, you need to interact, you need to build a system that is resilient. It's a fantastic space.

It really is, and there are so many methods. People want to build something great, but even if you're using Vespa, you need some real investment to actually get great results, and the same thing if you're using a vector search library: you need some kind of data pipeline for your documents and queries. I'm not a huge believer that any of these technologies work that well out of the box. Search is definitely not a solved problem. Even Google struggles: there are many question-answering queries that they totally get wrong. People want to build Google, but they have maybe two guys, or girls, working on search, and you don't build a great search experience with a team of two people.

Yeah, it's a huge investment, and a time investment too: you don't just need to hire a lot of smart people, you need to give them time to work through all these challenges. Now that you've mentioned vector search, I'm curious: when in the Vespa journey did you first hear about vector search, and what caught your eye? Even today, when companies evaluate whether to take the neural search journey or stay with the sparse search journey, the choice is not obvious, so maybe you could share some advice there as well. But maybe first a historical deep dive, that would be super exciting.

Yeah, so we'd been using dot products and so on for search, but it was brute force; you've been able to do brute-force vector search in Vespa for a long time. Then in 2018 BERT came out, and in January 2019 researchers published really great results on MS MARCO passage ranking, and we thought: what is this BERT model, how can we use it? There's a lot to get your head around. We saw there were basically two ways of using it. One is as a representation model, where you encode the query and the document independently; then, using a vector search library, you can build an index of your corpus and retrieve pretty efficiently if you have approximate nearest-neighbor search. That's what motivated us, and in the summer of 2019, in August, we started the work to actually have vector search. Also, at Yahoo Japan there were a couple of image search use cases around Hamming distance, so they were pushing for it as well. So there were multiple drivers, and Vespa was open source by then, so users were also asking: can Vespa do vector search? We could represent vectors, but it's not cost-efficient if you have to do brute force. So we started looking at it, and we had all the building pieces, the tensors, the document models representing floats and all the numeric fields, so it wasn't a lot of work to get the pieces together, but we had to implement the algorithm. We spent I think about one month surveying multiple algorithms for approximate vector search and how they could fit into the Vespa model of doing things. That's the background of why vector search came to Vespa, and it was really exciting when we started working on it, because there was a lot of interest; people wanted to work on that project.

Of course, because it's bleeding edge and new. In one of the podcasts I mentioned, I think it was also about Yury Malkov, I talked about a friend of mine who worked essentially in vector search, but he was a mathematician, so I viewed it as a pure mathematical concept: he's playing with theoretical advancements. Then he actually gave a talk at Google presenting this algorithm for nearest-neighbor search and how to optimize it, and even then I wasn't really buying in; okay, it's still mathematics. But then, reading the HNSW paper, I saw them citing his work, and I thought: wow, now these paths have intersected, now this makes sense. It usually excites me when something is put into practice. Is that how you felt as well? Was the mathematical aspect engaging for you, or did you view it more as engineering?

I'm definitely on the engineering side. For example, with transformers I don't care about the deep neural network architecture and how the internals interact; I basically treat it as a black box: this is the box, and it needs a tokenizer, so okay, what's the tokenizer, what are the inputs and outputs, what can I use it for? There's a lot of research studying how to build the optimal neural network architecture, but that was not my part; the math was not for me. But we have people on our team with a heavy math background, and they can teach me a bit about what a proper distance metric is and why this one works and that one doesn't, so it was a real learning experience for me to engage with the core team on this feature. And we had a huge discussion where one of my main points was that this needs to be integrated for users: when they want to use vector search, they want filters, they want to be able to express this in our query language, so you can combine the best of both worlds. That took real time to get right, and it was really fun to see that you can write a query saying: give me documents that are near in vector space, filter on this attribute, and at the same time also retrieve based on the weakAnd query operator, which you've heard about, an optimization technique for sparse retrieval, all expressed in the same query. I have to say I was really proud of our effort when it came out and we were able to combine these. And on the feature side, I think the biggest game changer for us was actually integrating vector search, because that spurred a lot of interest in Vespa, and we had people coming in because of it.

Yeah, and I can imagine that vector search is useful in search as well as in recommendation systems, right?

Exactly. Factorization machines and dot products have been used for recommendation for a long time, so you see the technology for search and recommendation use cases merging into the same technology space, and for those types of use cases I think Vespa is a really strong technology. But the interesting thing is that we have people coming in asking about Vespa, thinking it's a vector search database, and then they realize: hey, there are keywords, there's ranking, there's a lot of other features here. That's been interesting for me. I see vector search as a very important feature of Vespa, in this whole serving engine that you can use for search and recommendation, but it's one feature of Vespa.

I have to admit I probably played a role in bringing those users to you, through that blog post which I will of course mention, and have mentioned multiple times, where I compare, by now, seven vector databases, and I did put Vespa in that corner by considering only the vector part. But I knew you offer a lot more, and I'm still learning; hopefully at some point we'll use Vespa in some project so I can actually evaluate it. You're absolutely right that some of these systems go well beyond vector search, and as you say, you should take a step back and ask yourself what it is you're actually trying to build.

Yeah, I think that's really important. And to clarify on the algorithm side: after investigating Annoy and several other techniques, we went for Yury Malkov's HNSW algorithm, and we implemented our own version of it so we could also handle filtering, real-time updates, and so on. But one discussion you don't hear that often is that when you introduce HNSW or any approximate technique, you're losing some accuracy compared to brute force. For example, on the SIFT dataset of one million vectors, you can do a single-threaded brute-force search over those million vectors in about a hundred milliseconds, but with approximate search and certain HNSW parameter settings you might get down to 0.1 milliseconds using a library. That's a thousand times faster, but by doing that you're losing some accuracy. When I see blog posts about approximate vector search that don't mention the trade-off between recall and performance, I think: you should include the recall numbers. It really matters, because for many use cases it might be that you need to use brute force, because the approximation error you introduce is not acceptable. We do have production use cases, without large document counts, that actually use brute-force search, and Vespa supports brute-force search too. The beauty is that, since we support both, at query time you can just set approximate to true or false: you can take a query, run it brute force, compare the exact result with the approximate one, and compute the overlap between the two, which is typically what's used as recall at k. I did two blog posts on what I call billion-scale vector search with Vespa, where I did a deep dive into these trade-offs, because when you introduce approximate search you also need to build the index structures: in HNSW you need to build the graph, which takes time and costs memory.
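The overlap check described here, running the same query exactly and approximately and comparing the result sets, is a few lines of Python. The two result lists below are made up for illustration:

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the exact top-k that the approximate search returned."""
    exact_top = set(exact_ids[:k])
    return len(exact_top & set(approx_ids[:k])) / len(exact_top)

# Toy example: approximate search missed one of the exact top-5 hits.
exact = ["d1", "d2", "d3", "d4", "d5"]   # brute-force (exact) results
approx = ["d1", "d2", "d4", "d5", "d9"]  # HNSW (approximate) results
print(recall_at_k(exact, approx, k=5))   # 0.8
```

Averaged over a query sample, this is the recall@k number the trade-off discussion asks blog posts to report alongside latency.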
These kinds of trade-offs exist generally in search, but especially around vector search; I call it the jack of all trade-offs, because there are so many things to consider: memory usage, disk usage, CPU, and so on.

I love the term "jack of all trade-offs".

But it really is like that; you have so many trade-offs. Some companies have lots of data but no real throughput requirement, and in that case disk-based ANN, approaches that basically use disk, can be a good alternative, because when you're buying or renting servers in the cloud, memory and CPU come in a fixed relationship: to get this amount of memory, you get that amount of CPU. So the trade-offs depend on what you're actually going to use it for.

Exactly. Have you heard any other misconceptions about neural search at large? Say somebody comes and says, hey, I want to implement a question-answering system. You could in principle use sparse search techniques, or query understanding techniques, to do it almost in a rule-based fashion, but neural search is the new sexy stuff everyone wants to try. So the question is: have you heard misconceptions, things people think are much easier than they are?

That's a fantastic question. I can just sit back and relax for a few minutes, because this is a topic that I really love. If you look at semantic search, it can mean a lot of things, but the typical way people do semantic search today is this vector search setup: an independent query embedding and document embedding. And if you take a pre-trained language model from Hugging Face, just pull that model and encode your queries using, for instance, the CLS token or the average over all tokens, the result you get is not going to compete at all with BM25. That language model has only learned to do masked language modeling; it's been trained to predict the masked word, and it's a deep neural network that has not been trained for retrieval. It's basically like taking the deep neural network from my vacuum cleaner and putting it into my car to try to drive the car: it's not been trained for that. That was one of the things we struggled with as well when we looked at BERT. Other people were saying, oh, this is so great, and we had the engine, so we could compare it with BM25, and we tried BERT here and there, and on the actual information retrieval benchmarks the results were just not good. Then came the realization, and I think it happened across the industry in 2020, when the DPR, dense passage retriever, paper came out from Facebook: training on Natural Questions, the Google dataset, they trained a dense retriever model using a contrastive loss and hard negative mining. They basically demonstrated how to actually train a dense retriever, and the results were much better than BM25. So that's one area where just using a pre-trained model might not work well, especially if it hasn't been tuned for retrieval. And even with MS MARCO, the largest dataset out there that you can train a model on: if you train a model on MS MARCO and then apply it to a different domain, a different dataset, it might not outcompete BM25; in fact, in many cases it actually underperforms compared to BM25. That's why there's a lot of interest, especially recently, in combining exact matching, the user actually searched for this phrase, with the vector representation, and in how to combine the two. Two of my colleagues are right now working on the BEIR dataset; they've opened a PR to include Vespa in the benchmark as well, and then we'll demonstrate some methods for combining sparse and dense.

Yeah, that's awesome. I read the BEIR paper after you referred it to me, and it was quite eye-opening, because it compares not only search algorithms and approaches but also datasets and tasks, and the task, whether it's searching or answering questions, can matter quite a lot. It was eye-opening that, first of all, BM25 is fairly competitive; it's not a loser at all, so you should still consider using it, and maybe even keep it as a strong baseline in everything you do. I know some companies, by the way, still use TF-IDF, so maybe they should first transition to BM25 and only then jump to neural search techniques like dense retrieval. I also saw that you have participated in various competitions on dense retrieval and ranking. Can you elaborate on what drives your interest there? To me that sounds like academic interest in a way, but of course you're also showcasing, and probably bringing ideas back to Vespa.
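Since BM25 keeps coming up as the baseline to beat, here is a minimal single-field sketch of the scoring function in Python, with k1 and b at their common defaults. A real engine evaluates this over posting lists rather than by scanning every document:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document for `query` with the classic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter()  # document frequency per term
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF, kept non-negative as in Lucene's variant.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = ["sparse retrieval with bm25", "dense retrieval with vectors"]
print(bm25_scores("sparse retrieval", docs))
```

Note that the scores are unbounded, which is exactly why naively mixing them with bounded cosine similarities is problematic, a point that comes up again later in the conversation.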
+ Yeah so the motivation was actually around them as Marco passage ranking and where we actually could use this dataset and then our dream when we started to implement vector search was one thing and the other thing was you know how can we represent the re-ranking using bird in westbound so using the actual bird model inputting both the career and the document at the same time so that was one dream we had and but we were looking at the results and I think the first paper that we read it read that they they used maybe a day to with even with a GPU to actually perform 3600 queries right so it was not really you know how can we make this practical and then two years later we actually did did beat their benchmark and to end represented on westbound and we were doing it less than 100 milliseconds so on CPU right so but there being a lot of learning to get there but that was the motivation to kind of demonstrate that you can take this state of the art or close to state of the art with three-wheel and ranking pipeline from an open dataset which is how widely recognized and all the researchers are actually publishing papers around it you can actually take that model and use westbound and get those results you know so it was one way of demonstrating that you can actually then you can actually use these models with westbound and have it serve in your state so that was actually the motivation not on the kind of science side and so on but I have to say that I really would encourage everybody that works in search to look at some of these open datasets you know play with them you know maybe you have some ideas you know around search and how to do search and there's a lot of talks about boosting this phrasing you know how actually does that impact the results on kind of a dataset and I can really recommend the track COVID which is a dataset that was made at the start of the pandemic and it has about 50 queries and deep judgments for each of the queries and the collection is 
rather small so you can play with it on a single node and so on so that I will really encourage you know people in search to try out you know because then you get the feeling you know does it actually work does it actually you know compare it with BM25 what happens if I do phrase matching or do something clever you know so I think that's and and I'm really not a huge fan of anecdotal query examples to see these kind of commercial actors with this you know I'm searching for this on this magic results you know I'm more into you know demonstrating that actually Westpac can do this and it has the funding and actually on the real datasets. + Yeah and I agree in the end of the day what matters is first of all can you apply this tag as you said in your real setting right in your domain then another thing that you mentioned just now you know the track COVID dataset so maybe as the result of your research you might also impact on the global situation right maybe somewhere locally maybe somebody will use your work to actually implement a better search system so I think that that's also a fantastic segue to you know the the context that you're doing and that's actually a very interesting point because we had at the start of the pandemic we built a cord 19 search interface that we published online so people who actually go and search this dataset and people they were I don't I don't recall the details but it's still online and people actually because of all open source so they forked it and then they started using Westpac and based on that and I think it's a much better shape that service right now than the the cord 19 search that we did so they actually built on that work so so that's that's great I love to put what I call sample applications you know how what can you build with with Westpac and and that's actually a lot of my time these days are spent on making these sample applications smooth and easy to to work with and especially we've been rather weak on the kind of UI 
putting together front dance and so on so that's actually some work that I'm doing right now to kind of build more of the product you know what can you build with it because people don't get really excited about looking at Jason I output you know to actually see some interactions faces facets you know out the completion and to actually build that whole experience you know for the for the product people it's like looking at the engine when you actually want to maybe look at the car right and then you get fascinated by how shiny and sort of sleek it is and then you're like I'm buying it yes yes I totally hear you there and like actually in these use cases you know there are other platforms you know in the neural search space also doing multiple demos have you been looking into the direction of multimodal search does that excite you do you think it's too much of a bling edge or niche use case or do you think it has potential because of the neural search crossing the boundary of text towards the image audio and so on yeah I think multimodal is really where recto search is shining so this is the area where you it really shines I have some doubts about out of the main like we discussed using a vector model for text search if you don't have any label training data and so on and adopted to your data using vector search alone for that I think is questionable but looking at this multimodal where you combine both a transformer model and a typical image model and you train that representation and from what I've seen from these models and we did a sample application on this as well using the clip embedding model from from open AI and looking at the results I I have to say that I'm really impressed by kind of just eyeballing I don't have any kind of I don't have any hard data sets or but it's really impressive you know what that model can can actually do so I definitely think that multimodal is it's very I don't think it's I don't think it's that far ahead I think because we 
have interest in representing clip in best from from actual questions I'm actually I'm seeing an email right now you know how to they want to help on their schema and definitely they want to use clip yeah so definitely I don't think it's that advanced at the moment and I think we'll see another thing that I'm working on right now is that I talked about our sample applications I want to build a new sample application that demonstrates in a UI in an e-commerce setting where you combine different kind of fussy matching exact matching vector search all in the same query and then you can have some sliders where you actually slide these you know how does the result change and they change in real time so I just need some help on the on the react front end because I'm not I'm not a great JavaScript programmer I have to admit so I need some help on that so yeah but I definitely think that multimodal vector search has a really has a huge number of use cases yeah I hope that amongst listeners of this podcast maybe there are some with front end skills and maybe since you're building this for open source you know that might be good use case as well to be contributing to this crazy journey but we have yeah I mean that would I mean definitely we do see more involvement and contributions from from the in the kind of community around VESPA so I think we build a lot of the last two years of the community side and people getting to know more about VESPA and actually starting to contribute back both on the sample applications and also documentation and also we're seeing our more involved in contributing to the code so definitely yeah so but I think it's from a product side it's really important for us to and also we have a commercial offering of VESPA where you actually have a hosted interface hosted solution multi-region and to I think it we want VESPA to be able to run fully fledged in your environment if you want to use it because it's open source it's our Pasha 20 if you want to 
use our cloud you are welcome to do that and to kind of have and the same kind of functionality and what we add in the cloud is CICD pipelines how to do multi-region failovers like in the US East US West you can have different so all this kind of top and take care of sort of take care of nodes failing and whatever you know the hole the kind of host the experience so and that's been an issue with our sample apps they have been like it has been some friction around you know how to deploy them locally how to deploy them to the cloud so I'm trying to kind of bring them together so that they they work in in multiple environments yeah that's a lot of sense and I guess it takes a lot of engineering effort to also kind of cover all these different use cases so sounds quite exciting and actually demoing the technology I think you know as you know other vector databases have got it and I think it's a such a low entry for especially for non-technical people or those who are in charge of businesses business units to actually make decisions and I think for them you know having a relevant demo is going to be quite a game changer because if they need to reason about your technology only through the eyes of engineers in their company then probably that's that's much longer path right yeah exactly and I want this experience to be as smooth as possible so that you can get started with the sample application run it locally get some data into it fire up your front and react and you can interact with it and if you're happy with it if you want to share with your friends you can upload it to the Westpac Cloud and then you can share to URL to your friends and that's a model that I really believe in that you can it's open source so you can actually run it locally and then you can take the cloud provider can actually take care of the hosting for you so that's and right now we actually we are providing like free trials so you don't you only need an email address for the Westpac Cloud you 
don't need a credit card or things like that so you can actually play it play with it and run with we can even leave a link where users can try out Westpac and subscribe so I think that will be quite beneficial and actually I was thinking like even though we a little bit drifted in our conversation away from better search you did mention the exciting space of combining you know better search with smart search and I wanted to take it from the angle of a non-technical user right so let's say they come to you and they say Joe can you actually enlighten me a little bit on how do I combine these things maybe I just want to deep my toe and vector search just to see what it cannot cannot do in my domain what would you recommend them to do assuming that they already have maybe like a smart search engine and maybe they are evaluating Westpac as one candidate yeah so I think the question is if you're using Westpac it's rather easy to do this because you you can express it in the query and then you write the right key profiles saying that you know this is how going to combine the sparse ranking single for example be on 25 with retrieval for others that are not using Westpac using for example elastic search and open source of our shesolar what we see is that they build a lot of infrastructure on top of these so they actually have the ranking layers outside of elastic search right so in that case is you could have kind of a vector search library running at the side of elastic search and then retrieve and then you need to you need to keep those two data stores in sync and then you can in parallel fetch okay give me elastic search your best results and vector search database give me your best results and then you can use a technique called reciprocal rank fusion where you basically merged results based on you know are they are they ranking you know it's the document found in both so that's that's a powerful technique of but you don't have to actually know anything about the 
distribution of ranking scores and so on. So Google is writing a lot about reciprocal rank fusion, so it's an interesting direction, and that's one thing we know from Bing and from Baidu in China: they're doing this kind of mixed retrieval with different systems for sparse signals and dense signals. But then, for the regular users, you have a lot of moving parts, right? You have different data stores to maintain. And that's one of the things that we try to turn to our advantage: when you're using Vespa, you know, you get these capabilities in the same engine, you don't need to store the data in different stores and have consistency problems because of that. Yeah, so definitely, if you're sitting there today with OpenSearch or Elasticsearch and you don't want to invest in Vespa, you could try this: querying both systems and doing reciprocal rank fusion. Yeah, it could be one way to actually introduce something more like semantic search, if you view it that way, right? So that's a great idea, because I think there are multiple approaches to this, and I think if you are within one search engine, like say Vespa or Elasticsearch, OpenSearch, Solr, what have you, then I think you could in principle experiment with fusing, you know, the neural search result with sparse search using some kind of linear combination as you actually retrieve it, right? Yeah, so you can actually use the linear combination, but the great thing about this rank fusion is that you simply don't look at the ranking scores, you basically just fuse the results by the order in which they're returned, so you don't have to know anything about the score distribution. Like BM25, it's basically unbounded: it could be 25, it could be 100, it could be 5, right? So it's very difficult to combine that using a linear model, because you have two signals, you know, and one number is going
to be like this and the other is going to be between 0 and 1. So reciprocal rank fusion is definitely, you know, an interesting case. Actually, this is a super great point, and hopefully we can provide some links to this technique, because I think I've heard this question multiple times: what you said exactly just now, that the score spaces are completely different and not compatible with each other, and so you have to find a way to still interleave or merge them, right? So what you said exactly makes sense: you don't pay attention to the score space, you actually look at the order, and you try your best to interleave them. Yeah, that makes total sense. Yeah, there was actually a recent paper on this, because there has been more interest in the fact that these dense models alone don't generalize that well when you're using them out of domain, and one of the things that the Google researchers were doing, and that showed promising results, was using this rank fusion. And I've seen this rank fusion in multiple Google papers, so it's very interesting: the researchers are really interested in reciprocal rank fusion. So yeah, it sounds like a popular technique. Yes. I mean, time flies, and I really enjoy talking; it feels like we could record another podcast, what do you think? We're touching multiple topics, but I'd still really love to pick your brain on that philosophical question and kind of ask you: what keeps you so interested? Like, you are a loud voice behind Vespa in general, but you also offer a bunch of advice, right, through your blogs, through your public presentations, and even by sharing papers on Twitter, which at least for me was super helpful, that I could, you know, quickly read the paper that you shared. But what keeps you motivated and interested to stay in this field? And also specifically, you know, maybe you think something is missing in the vector search space, or in general in the search space, that you would like to fix? Yeah, so it's a great philosophical question. I think
I'm not that excited about vector search; I see that as a technique. I'm more excited about search, because I think it's such a fascinating problem. We touched on it before: you have query categorization, spelling, you have so many different aspects of building a great search experience. And also the scale thing is really appealing for me; I'm kind of passionate about, you know, how can we do this at billion scale, how can we make it fast, you know, what if we need 100,000 queries per second, what if we need to update all the documents in real time, or within 12 hours or one hour, you know, where are the limits, where is the cloud going, this compute versus storage, can we move more computations out of the storage layer. There are a lot of these exciting, kind of, systems things. But on the more science side, you know, how to build a great search experience, I think you need multiple techniques, and we didn't touch on it, but vector search is one thing, sparse search is another, but GBDT models, tree-based models, are really ruling search; they're kind of the hammer of search, because those models on tabular data, on statistical features, you know, they really show promising results. And that's another thing that I think is great about Vespa: you can combine these GBDT models and neural models in the same ranking functions. I don't think there's one single silver bullet for retrieval; I think there are multiple different signals. Like, for instance, if Google only did vector search, only on the text, right, you basically have a lot of duplicates on the web, you have low-quality content, you know, there's PageRank, there are other factors; it's not only vector search, there are different techniques. And that means also that there are a lot of new things to learn, you know, how do you do query categorization, how do
you actually determine which facets and what kind of navigation you're going to show to the user. And like you touched on at the start, you know, if your user does a query and we don't have any good results, should we just show them some random results, or should we say, hey, you know, I'm sorry, but we don't have anything for your query? So yeah, that's really what motivates me: it's such a fantastic problem if you're interested in scale and all these things coming together. Yeah, thanks for that. It's deep and it's very wide, and I think it's a limitless space, and I hope also that newcomers feel it has kind of a low bar to entry, especially, and we didn't touch on this, but especially with your work on open source, you know, the support: you can go on Slack or whatever tool you're using to communicate with your users and actually listen and address their concerns and questions, and hopefully this opens more, you know, more possibilities for newcomers to enter. Yeah, I love, actually, it's actually a weakness as well, but I love answering questions. You can see me answering questions in multiple Slack spaces; I love people, you know, asking questions about search, so I really love that. And what really gets me is when someone is struggling with something, you know, how can I do this with Vespa, and I'll try to explain it, you know, you have to do this and that, and then I go back, you know, to the team saying, we need to fix this, we need to make this easier for people to use, right? So it's a both-ways thing, and that's one thing that I learned in my career: you know, listen carefully to your users, how they're using the product, what are the pain points, you know, how does it feel to get started, are you able to progress? So that's really also motivating. And honestly, I think that some of the work that we've done using some of these smaller transformer models has
been impactful for the industry. Like, I got contacted by a person on Twitter the other day who said, you know, I saw your tweet about these smaller language models, like not the BERT base that people usually turn to, but this MiniLM model, which is a distilled 22-million-parameter model, that I actually used for the demo that you can run in your browser. And he said, you know, I saw your tweet and I went ahead and tried it for my domain, which was classification of hate speech, and then he did a blog post on it and he mentioned me. And I think that was really interesting for me to see, that I could share something that some people could actually make use of, even if it was outside of search. And I learned a lot, especially around the Relevance Slack space that we are both in, the OpenSource Connections Slack space; there's a lot of discussion there about vector search, and we are, you know, sharing some blog posts, and then I ask Greg from Pinecone a tough question maybe, you know. So I really love being there and discussing, and I learn a lot from other people, like Josh Devins from Elasticsearch, and from you. And especially around Berlin Buzzwords last year, you did the AMA on vector search, and for me one of the key moments was when Max Irwin, your co-host on that, said, you know, what if the user types a phrase query, you know, actual quote marks: I want to search for this exact phrase, don't show me anything else, give me that phrase. And that's something that is really hard to do with vector search alone, right, because you basically map it into this vector space and do the similarities. And that was a key takeaway for me, and a real eye-opener, you know: you need to be building out better examples of how to actually combine sparse and dense signals. So yeah, this is amazing, and what I enjoyed in what you said is that you keep your practitioner hat on, so you don't just buy easily into these new models, or
you don't stay in the field of, okay, I'm only an engineer, I don't even know what machine learning is. Because I think the profession is slowly changing, and it's like a blend of skills that you need today to succeed as a search engineer, and maybe it shouldn't be called search engineer anymore; I think it needs some new term, but we will probably be stuck with it for lack of a better one. But eventually you will need more skills under your belt, and I think the work that you are doing is amazing in sharing this knowledge so that people can actually reproduce it, and I think that's super crucial for progress. Yeah, I mean, thank you, Dmitry, that's really nice of you, and, you know, I think it's actually true, you know, to share. And I think, as you said, building a search team today is really hard, especially since deep learning entered the search field, right? So now you need to know how to configure and do matching and boosting in Elasticsearch, and now you also need to know, how do I train the dense vector model, and, you know, should I use BERT base, should I use BERT large, does it handle multilingual text, does it handle spell correction? There are always, kind of, different things, you know. So building a search team in 2022 is not easy, because you need kind of a mix of NLP and search; there are a lot of different things, and that's what I love about it. And I talked about it on Twitter, and in a talk I did earlier as well, that this neural paradigm shift has opened this kind of knowledge gap, you know, how to actually successfully apply these methods, and also on the technology side, which we try to bridge, you know, with Vespa, where you can kind of combine different techniques. We don't have to throw away 50-odd years of the inverted index, you know; it still has value, it's going to have value probably forever, you know,
so we don't have to throw that away, and that's interesting. But that's also what has been fascinating, and I've said numerous times that I don't think I've learned as much at any point in my career as I actually have over the last three years, because reading papers hadn't been a big interest of mine earlier; it was more the system side, the engineering side. But that's been an eye-opener for me, you know, how to apply these techniques, and to learn. And that's the great thing about open source: you know, we can share ways of doing things. Yeah, absolutely, sharing is caring, and so much comes back to you. As you said, you know, you get mentioned somewhere and you feel like you didn't do it in vain, but also you might learn something new, like a new use case. And I feel the same, you know, when I blog, or when some video is viewed by somebody, and then somebody says thank you, even multiple months after I did it; it's just an amazing feeling. It's like a sense of connection as well, especially in these days when we maybe don't meet socially as we used to, but that's actually a new, evolutionary way of connecting, and I feel much more comfortable enjoying this kind of more detailed conversation. So maybe these interactions on Twitter, they bring a lot more value, and I think this is super great. Is there something that you would like to share on Vespa development, or maybe something that users might anticipate, and maybe you want to point them to some tutorial that they might, you know, take a look at? Yeah, so I can give a few product updates of what's coming from Vespa. We are going to release some integrated dense models for Vespa, so that you don't have to export anything; you can use these models off the shelf. And then we allow you to tune the query encoder, so the document encoder is frozen, but then you can tune the query encoder, and we'll show you how to combine both dense and sparse. So that's one thing
that is coming out. Another thing is that we're taking some steps regarding low-QPS use cases, because we designed Vespa, you know, to be kind of low single-digit milliseconds on multiple different use cases, but not everybody needs that. So we're introducing some new options for memory management so that we can actually run on servers with less memory, and I think that's going to be a game changer for certain use cases that don't need high throughput and low latency. So that's two things, and yeah, I think that's more than enough, you know. And there will be some blogs, I think, about our results on the BEIR benchmark. And I'm also going to come out with a blog post on a technique called SPANN, which is a paper from Microsoft; it's a really interesting technique, a hybrid combination of HNSW and an inverted file, and you can represent it in Vespa. So I'm going to do a part three of my blog post series on billion scale, so that's something I'm looking forward to, but right now I'm kind of refactoring a lot of the sample applications and so on to make the experience smoother. Yeah, sounds fantastic, looking forward to it, and we'll make sure to link all the blog posts that you mentioned, especially on billion-scale vector search, and the other tutorials that you mentioned. This is fantastic; thank you for doing this, and keep doing this, keep finding the energy. I know it's tough sometimes, but I think it also keeps you awake and sort of pushing yourself forward, and I think the best way to use your brain is actually doing something that is useful, be it reading a paper, or implementing code, or blogging about it. So this is fantastic; thanks so much for your active contribution. Thank you. Thank you as well. Yeah, and I really enjoyed this conversation. I really hope we can record at some point down the road as well, if you will be open to it, and I think we can cover a lot more topics as well. But I wish you all
the success in your endeavors, and stay warm and excited about the field. Yeah, I will. I mean, it's such an exciting field. Thank you very much, Dmitry, for hosting this, and, you know, we'll talk later on. Thank you. Thank you as well, and see you around. Bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md b/transcripts_with_timestamps/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md new file mode 100644 index 0000000..94d1d10 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/joan-fontanals-principal-engineer-jina-ai.md @@ -0,0 +1,2922 @@ +--- +description: '

Topics:

00:00 Intro

00:42 Joan''s background

01:46 + What attracted Joan''s attention in Jina as a company and product?

04:39 Main + area of focus for Joan in the product

05:46 How Open Source model works for + Jina?

08:38 Deeper dive into Jina.AI as a product and technology stack

11:57 + Does Jina fit the use cases of smaller / mid-size players with smaller amount of + data?

13:45 KNN/ANN algorithms available in Jina

16:05 BigANN competition + and BuddyPQ, increasing 12% in recall over FAISS

17:07 Does Jina support customers + in model training? Finetuner

20:46 How does Jina framework compare to Vector + Databases?

26:46 Jina''s investment in user-friendly APIs

31:04 Applications + of Jina beyond search engines, like question answering systems

33:20 How to + bring bits of neural search into traditional keyword retrieval? Connection to model + interpretability

41:14 Does Jina allow going multimodal, including images + / audio etc?

46:03 The magical question of Why

55:20 Product announcement + from Joan

Order your Jina swag https://docs.google.com/forms/d/e/1FAIpQLSedYVfqiwvdzWPX-blCpVu-tQoiFiUJQz2QnIHU1ggy1oyg/ + Use this promo code: vectorPodcastxJinaAI

Show notes:

- Jina.AI: https://jina.ai/

- + HNSW + PostgreSQL Indexer: [GitHub - jina-ai/executor-hnsw-postgres: A production-ready, + scalable Indexer for the Jina neural search framework, based on HNSW and PSQL](https://github.com/jina-ai/executor-h...)

- + pqlite: [GitHub - jina-ai/pqlite: A fast embedded library for Approximate Nearest + Neighbor Search integrated with the Jina ecosystem](https://github.com/jina-ai/pqlite)

- + BuddyPQ: [Billion-Scale Vector Search: Team Sisu and BuddyPQ | by Dmitry Kan | Big-ANN-Benchmarks + | Nov, 2021 | Medium](https://medium.com/big-ann-benchmarks...)

- PaddlePaddle: + [GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning + Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)](https://github.com/PaddlePaddle/Paddle)

- + Jina Finetuner: [Finetuner 0.3.1 documentation](https://finetuner.jina.ai/)

- [Not + All Vector Databases Are Made Equal | by Dmitry Kan | Towards Data Science](https://towardsdatascience.com/milvus...)

- + Fluent interface (method chaining): [Fluent interfaces in Python | Florian Einfalt + – Developer](https://florianeinfalt.de/posts/fluen...)

- Sujit Pal’s blog: + [Salmon Run](http://sujitpal.blogspot.com/)

- + ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626

Special + thanks to Saurabh Rai for the Podcast Thumbnail: https://twitter.com/srbhr_ https://www.linkedin.com/in/srbh077/

' +image_url: https://media.rss.com/vector-podcast/20220119_090157_f67877f44bb32ae14fd380d9328691ec.jpg +pub_date: Wed, 19 Jan 2022 21:02:57 GMT +title: Joan Fontanals - Principal Engineer - Jina AI +url: https://rss.com/podcasts/vector-podcast/366298 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 22.12, "text": " Hey + everyone, Bector Podcast is here and today we are continuing our quest to study + more", "tokens": [50364, 1911, 1518, 11, 363, 20814, 29972, 307, 510, 293, 965, + 321, 366, 9289, 527, 866, 281, 2979, 544, 51470], "temperature": 0.0, "avg_logprob": + -0.3706725293939764, "compression_ratio": 1.0602409638554218, "no_speech_prob": + 0.11737712472677231}, {"id": 1, "seek": 2212, "start": 22.12, "end": 31.64, "text": + " about Bector Technologies and Beding Technologies platforms and today I have a + guest from Gina AI,", "tokens": [50364, 466, 363, 20814, 46993, 293, 363, 9794, + 46993, 9473, 293, 965, 286, 362, 257, 8341, 490, 34711, 7318, 11, 50840], "temperature": + 0.0, "avg_logprob": -0.4015885848033277, "compression_ratio": 1.592783505154639, + "no_speech_prob": 0.3786201775074005}, {"id": 2, "seek": 2212, "start": 31.64, "end": + 36.68, "text": " his name is John Fontanelles and he is a principal engineer at + Gina AI. Hey John.", "tokens": [50840, 702, 1315, 307, 2619, 43901, 282, 19787, + 293, 415, 307, 257, 9716, 11403, 412, 34711, 7318, 13, 1911, 2619, 13, 51092], "temperature": + 0.0, "avg_logprob": -0.4015885848033277, "compression_ratio": 1.592783505154639, + "no_speech_prob": 0.3786201775074005}, {"id": 3, "seek": 2212, "start": 36.68, "end": + 40.28, "text": " Hello, nice to meet you.", "tokens": [51092, 2425, 11, 1481, 281, + 1677, 291, 13, 51272], "temperature": 0.0, "avg_logprob": -0.4015885848033277, "compression_ratio": + 1.592783505154639, "no_speech_prob": 0.3786201775074005}, {"id": 4, "seek": 2212, + "start": 40.28, "end": 45.88, "text": " Yeah, nice to meet you as well. 
Thanks for + joining me today and I''m really really excited to talk about", "tokens": [51272, + 865, 11, 1481, 281, 1677, 291, 382, 731, 13, 2561, 337, 5549, 385, 965, 293, 286, + 478, 534, 534, 2919, 281, 751, 466, 51552], "temperature": 0.0, "avg_logprob": -0.4015885848033277, + "compression_ratio": 1.592783505154639, "no_speech_prob": 0.3786201775074005}, {"id": + 5, "seek": 4588, "start": 45.88, "end": 52.760000000000005, "text": " what is Gina + AI? I know something I used to use some kind of predecessors of Gina AI in some + sense,", "tokens": [50364, 437, 307, 34711, 7318, 30, 286, 458, 746, 286, 1143, + 281, 764, 512, 733, 295, 24874, 45700, 295, 34711, 7318, 294, 512, 2020, 11, 50708], + "temperature": 0.0, "avg_logprob": -0.32326271713420907, "compression_ratio": 1.6652360515021458, + "no_speech_prob": 0.03313431888818741}, {"id": 6, "seek": 4588, "start": 52.760000000000005, + "end": 59.72, "text": " but not like Gina AI itself. But first of all I would like + you to introduce yourself,", "tokens": [50708, 457, 406, 411, 34711, 7318, 2564, + 13, 583, 700, 295, 439, 286, 576, 411, 291, 281, 5366, 1803, 11, 51056], "temperature": + 0.0, "avg_logprob": -0.32326271713420907, "compression_ratio": 1.6652360515021458, + "no_speech_prob": 0.03313431888818741}, {"id": 7, "seek": 4588, "start": 59.72, + "end": 68.36, "text": " your background to our listeners and to me. 
So, well I studied + engineering degree in Barcelona,", "tokens": [51056, 428, 3678, 281, 527, 23274, + 293, 281, 385, 13, 407, 11, 731, 286, 9454, 7043, 4314, 294, 21247, 11, 51488], + "temperature": 0.0, "avg_logprob": -0.32326271713420907, "compression_ratio": 1.6652360515021458, + "no_speech_prob": 0.03313431888818741}, {"id": 8, "seek": 4588, "start": 68.36, + "end": 74.36, "text": " not computer science, from general engineering with my AmiExelectical + Engineering, Mechanical Engineering,", "tokens": [51488, 406, 3820, 3497, 11, 490, + 2674, 7043, 365, 452, 2012, 72, 11149, 68, 1809, 804, 16215, 11, 30175, 804, 16215, + 11, 51788], "temperature": 0.0, "avg_logprob": -0.32326271713420907, "compression_ratio": + 1.6652360515021458, "no_speech_prob": 0.03313431888818741}, {"id": 9, "seek": 7436, + "start": 74.36, "end": 80.68, "text": " but I got into software engineering because + I was related with robotics and then when I started my", "tokens": [50364, 457, + 286, 658, 666, 4722, 7043, 570, 286, 390, 4077, 365, 34145, 293, 550, 562, 286, + 1409, 452, 50680], "temperature": 0.0, "avg_logprob": -0.22764238104762802, "compression_ratio": + 1.7972972972972974, "no_speech_prob": 0.0032917847856879234}, {"id": 10, "seek": + 7436, "start": 80.68, "end": 85.72, "text": " professional career I did software + engineering at different companies and industries and then I got", "tokens": [50680, + 4843, 3988, 286, 630, 4722, 7043, 412, 819, 3431, 293, 13284, 293, 550, 286, 658, + 50932], "temperature": 0.0, "avg_logprob": -0.22764238104762802, "compression_ratio": + 1.7972972972972974, "no_speech_prob": 0.0032917847856879234}, {"id": 11, "seek": + 7436, "start": 85.72, "end": 94.2, "text": " more into that engineering machine + learning and these kinds of fields and then I also did some work", "tokens": [50932, + 544, 666, 300, 7043, 3479, 2539, 293, 613, 3685, 295, 7909, 293, 550, 286, 611, + 630, 512, 589, 51356], "temperature": 0.0, "avg_logprob": 
-0.22764238104762802, + "compression_ratio": 1.7972972972972974, "no_speech_prob": 0.0032917847856879234}, + {"id": 12, "seek": 7436, "start": 94.2, "end": 103.24, "text": " on traditional + search, on web search, engine so on and then just live brought me to Gina which + was a", "tokens": [51356, 322, 5164, 3164, 11, 322, 3670, 3164, 11, 2848, 370, 322, + 293, 550, 445, 1621, 3038, 385, 281, 34711, 597, 390, 257, 51808], "temperature": + 0.0, "avg_logprob": -0.22764238104762802, "compression_ratio": 1.7972972972972974, + "no_speech_prob": 0.0032917847856879234}, {"id": 13, "seek": 10324, "start": 103.96, + "end": 110.03999999999999, "text": " good step in my career. Oh yeah, cool. So, + what caught your eye in Gina AI as a company and maybe", "tokens": [50400, 665, + 1823, 294, 452, 3988, 13, 876, 1338, 11, 1627, 13, 407, 11, 437, 5415, 428, 3313, + 294, 34711, 7318, 382, 257, 2237, 293, 1310, 50704], "temperature": 0.0, "avg_logprob": + -0.16933830579121908, "compression_ratio": 1.6382978723404256, "no_speech_prob": + 0.008950803428888321}, {"id": 14, "seek": 10324, "start": 110.03999999999999, "end": + 117.47999999999999, "text": " as a technology as a product or maybe the team? 
So, + for me what caught me the eye, it was like the", "tokens": [50704, 382, 257, 2899, + 382, 257, 1674, 420, 1310, 264, 1469, 30, 407, 11, 337, 385, 437, 5415, 385, 264, + 3313, 11, 309, 390, 411, 264, 51076], "temperature": 0.0, "avg_logprob": -0.16933830579121908, + "compression_ratio": 1.6382978723404256, "no_speech_prob": 0.008950803428888321}, + {"id": 15, "seek": 10324, "start": 117.47999999999999, "end": 124.84, "text": " + technology and the vision of I see that vector search embedding a semantic search + in general can", "tokens": [51076, 2899, 293, 264, 5201, 295, 286, 536, 300, 8062, + 3164, 12240, 3584, 257, 47982, 3164, 294, 2674, 393, 51444], "temperature": 0.0, + "avg_logprob": -0.16933830579121908, "compression_ratio": 1.6382978723404256, "no_speech_prob": + 0.008950803428888321}, {"id": 16, "seek": 10324, "start": 124.84, "end": 132.04, + "text": " revolutionize how we understand search and can bring it to the next level + also and adapt to", "tokens": [51444, 8894, 1125, 577, 321, 1223, 3164, 293, 393, + 1565, 309, 281, 264, 958, 1496, 611, 293, 6231, 281, 51804], "temperature": 0.0, + "avg_logprob": -0.16933830579121908, "compression_ratio": 1.6382978723404256, "no_speech_prob": + 0.008950803428888321}, {"id": 17, "seek": 13204, "start": 132.12, "end": 139.32, + "text": " different kind of data or so on and go beyond the typical search bar that + we are so much used to.", "tokens": [50368, 819, 733, 295, 1412, 420, 370, 322, + 293, 352, 4399, 264, 7476, 3164, 2159, 300, 321, 366, 370, 709, 1143, 281, 13, 50728], + "temperature": 0.0, "avg_logprob": -0.2180116203393829, "compression_ratio": 1.6462882096069869, + "no_speech_prob": 0.007553476840257645}, {"id": 18, "seek": 13204, "start": 139.95999999999998, + "end": 145.79999999999998, "text": " Yeah, yeah and I mean but Gina is like more + than just embedding or kind of it''s more like a", "tokens": [50760, 865, 11, 1338, + 293, 286, 914, 457, 34711, 307, 411, 544, 813, 445, 12240, 3584, 420, 
733, 295, + 309, 311, 544, 411, 257, 51052], "temperature": 0.0, "avg_logprob": -0.2180116203393829, + "compression_ratio": 1.6462882096069869, "no_speech_prob": 0.007553476840257645}, + {"id": 19, "seek": 13204, "start": 145.79999999999998, "end": 151.0, "text": " ecosystem + right like it has like marketplace it has many different building blocks and", "tokens": + [51052, 11311, 558, 411, 309, 575, 411, 19455, 309, 575, 867, 819, 2390, 8474, 293, + 51312], "temperature": 0.0, "avg_logprob": -0.2180116203393829, "compression_ratio": + 1.6462882096069869, "no_speech_prob": 0.007553476840257645}, {"id": 20, "seek": + 13204, "start": 151.0, "end": 157.48, "text": " components. This is what I think + most of the people that might be hearing us might be wondering much", "tokens": + [51312, 6677, 13, 639, 307, 437, 286, 519, 881, 295, 264, 561, 300, 1062, 312, 4763, + 505, 1062, 312, 6359, 709, 51636], "temperature": 0.0, "avg_logprob": -0.2180116203393829, + "compression_ratio": 1.6462882096069869, "no_speech_prob": 0.007553476840257645}, + {"id": 21, "seek": 15748, "start": 157.56, "end": 163.56, "text": " because it''s + a question that we receive a lot so we are not such another vector database as the + ones", "tokens": [50368, 570, 309, 311, 257, 1168, 300, 321, 4774, 257, 688, 370, + 321, 366, 406, 1270, 1071, 8062, 8149, 382, 264, 2306, 50668], "temperature": 0.0, + "avg_logprob": -0.14861750072903104, "compression_ratio": 1.8726415094339623, "no_speech_prob": + 0.001868902938440442}, {"id": 22, "seek": 15748, "start": 163.56, "end": 167.56, + "text": " that have been created in the podcast so we are treating the problem of + semantic search and we are", "tokens": [50668, 300, 362, 668, 2942, 294, 264, 7367, + 370, 321, 366, 15083, 264, 1154, 295, 47982, 3164, 293, 321, 366, 50868], "temperature": + 0.0, "avg_logprob": -0.14861750072903104, "compression_ratio": 1.8726415094339623, + "no_speech_prob": 0.001868902938440442}, {"id": 23, "seek": 15748, "start": 
167.56, + "end": 175.23999999999998, "text": " seeing this as a then-to-end problem and we + are trying to build an ecosystem to help the business", "tokens": [50868, 2577, + 341, 382, 257, 550, 12, 1353, 12, 521, 1154, 293, 321, 366, 1382, 281, 1322, 364, + 11311, 281, 854, 264, 1606, 51252], "temperature": 0.0, "avg_logprob": -0.14861750072903104, + "compression_ratio": 1.8726415094339623, "no_speech_prob": 0.001868902938440442}, + {"id": 24, "seek": 15748, "start": 175.23999999999998, "end": 182.67999999999998, + "text": " and developers to develop their own neural search based engines and for + that we are trying to build", "tokens": [51252, 293, 8849, 281, 1499, 641, 1065, + 18161, 3164, 2361, 12982, 293, 337, 300, 321, 366, 1382, 281, 1322, 51624], "temperature": + 0.0, "avg_logprob": -0.14861750072903104, "compression_ratio": 1.8726415094339623, + "no_speech_prob": 0.001868902938440442}, {"id": 25, "seek": 18268, "start": 182.68, + "end": 189.8, "text": " a ecosystem from the core to our document types where we + are also recently", "tokens": [50364, 257, 11311, 490, 264, 4965, 281, 527, 4166, + 3467, 689, 321, 366, 611, 3938, 50720], "temperature": 0.0, "avg_logprob": -0.18552758143498346, + "compression_ratio": 1.7102803738317758, "no_speech_prob": 0.00203107763081789}, + {"id": 26, "seek": 18268, "start": 191.64000000000001, "end": 197.72, "text": " + and launched this fine tuner project to help you with fine tuning your models for + your search", "tokens": [50812, 293, 8730, 341, 2489, 4267, 260, 1716, 281, 854, + 291, 365, 2489, 15164, 428, 5245, 337, 428, 3164, 51116], "temperature": 0.0, "avg_logprob": + -0.18552758143498346, "compression_ratio": 1.7102803738317758, "no_speech_prob": + 0.00203107763081789}, {"id": 27, "seek": 18268, "start": 197.72, "end": 204.04000000000002, + "text": " applications so we are building a whole family of products and projects + in this around this area", "tokens": [51116, 5821, 370, 321, 366, 2390, 257, 1379, + 1605, 
295, 3383, 293, 4455, 294, 341, 926, 341, 1859, 51432], "temperature": 0.0, + "avg_logprob": -0.18552758143498346, "compression_ratio": 1.7102803738317758, "no_speech_prob": + 0.00203107763081789}, {"id": 28, "seek": 18268, "start": 204.04000000000002, "end": + 209.24, "text": " of neural search. Yeah it sounds quite ambitious and it sounds + like all of these building blocks are", "tokens": [51432, 295, 18161, 3164, 13, + 865, 309, 3263, 1596, 20239, 293, 309, 3263, 411, 439, 295, 613, 2390, 8474, 366, + 51692], "temperature": 0.0, "avg_logprob": -0.18552758143498346, "compression_ratio": + 1.7102803738317758, "no_speech_prob": 0.00203107763081789}, {"id": 29, "seek": 20924, + "start": 209.24, "end": 215.32000000000002, "text": " really needed for anybody + who wants to venture into embedding world of semantic search or you", "tokens": + [50364, 534, 2978, 337, 4472, 567, 2738, 281, 18474, 666, 12240, 3584, 1002, 295, + 47982, 3164, 420, 291, 50668], "temperature": 0.0, "avg_logprob": -0.1806302865346273, + "compression_ratio": 1.7536231884057971, "no_speech_prob": 0.0025847635697573423}, + {"id": 30, "seek": 20924, "start": 215.32000000000002, "end": 223.32000000000002, + "text": " know kind of bringing the power of this deep learning models. 
So it goes + beyond only", "tokens": [50668, 458, 733, 295, 5062, 264, 1347, 295, 341, 2452, + 2539, 5245, 13, 407, 309, 1709, 4399, 787, 51068], "temperature": 0.0, "avg_logprob": + -0.1806302865346273, "compression_ratio": 1.7536231884057971, "no_speech_prob": + 0.0025847635697573423}, {"id": 31, "seek": 20924, "start": 224.76000000000002, "end": + 230.68, "text": " only embedding your data and searching through it you may want + to cut it into different pieces,", "tokens": [51140, 787, 12240, 3584, 428, 1412, + 293, 10808, 807, 309, 291, 815, 528, 281, 1723, 309, 666, 819, 3755, 11, 51436], + "temperature": 0.0, "avg_logprob": -0.1806302865346273, "compression_ratio": 1.7536231884057971, + "no_speech_prob": 0.0025847635697573423}, {"id": 32, "seek": 20924, "start": 231.4, + "end": 236.84, "text": " you may want to re-run it at the end, you may want to join + different modalities together", "tokens": [51472, 291, 815, 528, 281, 319, 12, 12997, + 309, 412, 264, 917, 11, 291, 815, 528, 281, 3917, 819, 1072, 16110, 1214, 51744], + "temperature": 0.0, "avg_logprob": -0.1806302865346273, "compression_ratio": 1.7536231884057971, + "no_speech_prob": 0.0025847635697573423}, {"id": 33, "seek": 23684, "start": 237.48, + "end": 243.56, "text": " so we are trying to give and make it easy for the user + to develop these applications so that they", "tokens": [50396, 370, 321, 366, 1382, + 281, 976, 293, 652, 309, 1858, 337, 264, 4195, 281, 1499, 613, 5821, 370, 300, 436, + 50700], "temperature": 0.0, "avg_logprob": -0.24583013931123338, "compression_ratio": + 1.6569037656903767, "no_speech_prob": 0.011452664621174335}, {"id": 34, "seek": + 23684, "start": 243.56, "end": 249.4, "text": " speak the same language and we hope + that they will all speak gene language. 
Oh yeah, oh yeah, for sure.", "tokens": + [50700, 1710, 264, 912, 2856, 293, 321, 1454, 300, 436, 486, 439, 1710, 12186, 2856, + 13, 876, 1338, 11, 1954, 1338, 11, 337, 988, 13, 50992], "temperature": 0.0, "avg_logprob": + -0.24583013931123338, "compression_ratio": 1.6569037656903767, "no_speech_prob": + 0.011452664621174335}, {"id": 35, "seek": 23684, "start": 249.4, "end": 256.36, + "text": " And GNI is open source, right? Yes, so can you speak a bit more like towards + the business model or", "tokens": [50992, 400, 460, 42496, 307, 1269, 4009, 11, + 558, 30, 1079, 11, 370, 393, 291, 1710, 257, 857, 544, 411, 3030, 264, 1606, 2316, + 420, 51340], "temperature": 0.0, "avg_logprob": -0.24583013931123338, "compression_ratio": + 1.6569037656903767, "no_speech_prob": 0.011452664621174335}, {"id": 36, "seek": + 23684, "start": 256.36, "end": 261.56, "text": " kind of how GNI kind of makes money + in a way like so basically it''s open source, anybody can go", "tokens": [51340, + 733, 295, 577, 460, 42496, 733, 295, 1669, 1460, 294, 257, 636, 411, 370, 1936, + 309, 311, 1269, 4009, 11, 4472, 393, 352, 51600], "temperature": 0.0, "avg_logprob": + -0.24583013931123338, "compression_ratio": 1.6569037656903767, "no_speech_prob": + 0.011452664621174335}, {"id": 37, "seek": 26156, "start": 261.56, "end": 266.36, + "text": " and download it and basically leverage in their work or is there something + that like you have some", "tokens": [50364, 293, 5484, 309, 293, 1936, 13982, 294, + 641, 589, 420, 307, 456, 746, 300, 411, 291, 362, 512, 50604], "temperature": 0.0, + "avg_logprob": -0.21195084398443048, "compression_ratio": 1.6939655172413792, "no_speech_prob": + 0.005834519863128662}, {"id": 38, "seek": 26156, "start": 266.36, "end": 272.52, + "text": " products for which customers can pay and kind of right now we are right + now we are completely open", "tokens": [50604, 3383, 337, 597, 4581, 393, 1689, + 293, 733, 295, 558, 586, 321, 366, 558, 586, 321, 366, 2584, 1269, 
50912], "temperature": + 0.0, "avg_logprob": -0.21195084398443048, "compression_ratio": 1.6939655172413792, + "no_speech_prob": 0.005834519863128662}, {"id": 39, "seek": 26156, "start": 272.52, + "end": 281.24, "text": " source everything that you can see in our report and stuff + each open for everyone. Yeah so so okay", "tokens": [50912, 4009, 1203, 300, 291, + 393, 536, 294, 527, 2275, 293, 1507, 1184, 1269, 337, 1518, 13, 865, 370, 370, 1392, + 51348], "temperature": 0.0, "avg_logprob": -0.21195084398443048, "compression_ratio": + 1.6939655172413792, "no_speech_prob": 0.005834519863128662}, {"id": 40, "seek": + 26156, "start": 281.24, "end": 285.64, "text": " and you are like mostly working + on back-and-side of things so you''re not interacting with direct", "tokens": [51348, + 293, 291, 366, 411, 5240, 1364, 322, 646, 12, 474, 12, 1812, 295, 721, 370, 291, + 434, 406, 18017, 365, 2047, 51568], "temperature": 0.0, "avg_logprob": -0.21195084398443048, + "compression_ratio": 1.6939655172413792, "no_speech_prob": 0.005834519863128662}, + {"id": 41, "seek": 28564, "start": 285.71999999999997, "end": 293.32, "text": " + customers right? Is that okay? I''m working mostly on the main products and what + do you hear", "tokens": [50368, 4581, 558, 30, 1119, 300, 1392, 30, 286, 478, 1364, + 5240, 322, 264, 2135, 3383, 293, 437, 360, 291, 1568, 50748], "temperature": 0.0, + "avg_logprob": -0.2294152123587472, "compression_ratio": 1.5372340425531914, "no_speech_prob": + 0.003993872553110123}, {"id": 42, "seek": 28564, "start": 293.32, "end": 301.8, + "text": " about use cases like how do they translate to your level of kind of day-to-day + job? 
So most of our", "tokens": [50748, 466, 764, 3331, 411, 577, 360, 436, 13799, + 281, 428, 1496, 295, 733, 295, 786, 12, 1353, 12, 810, 1691, 30, 407, 881, 295, + 527, 51172], "temperature": 0.0, "avg_logprob": -0.2294152123587472, "compression_ratio": + 1.5372340425531914, "no_speech_prob": 0.003993872553110123}, {"id": 43, "seek": + 28564, "start": 301.8, "end": 307.56, "text": " solution engineers that say that + are closer to clients and users they bring guys their pain points", "tokens": [51172, + 3827, 11955, 300, 584, 300, 366, 4966, 281, 6982, 293, 5022, 436, 1565, 1074, 641, + 1822, 2793, 51460], "temperature": 0.0, "avg_logprob": -0.2294152123587472, "compression_ratio": + 1.5372340425531914, "no_speech_prob": 0.003993872553110123}, {"id": 44, "seek": + 30756, "start": 308.12, "end": 315.64, "text": " on how they are trying to solve + users needs and some of the main use cases that we are trying to", "tokens": [50392, + 322, 577, 436, 366, 1382, 281, 5039, 5022, 2203, 293, 512, 295, 264, 2135, 764, + 3331, 300, 321, 366, 1382, 281, 50768], "temperature": 0.0, "avg_logprob": -0.19238075528826032, + "compression_ratio": 1.858974358974359, "no_speech_prob": 0.006196014583110809}, + {"id": 45, "seek": 30756, "start": 315.64, "end": 322.44, "text": " solve come from + textile search, e-match search, multimodal search that is something that we are", + "tokens": [50768, 5039, 808, 490, 42069, 3164, 11, 308, 12, 76, 852, 3164, 11, 32972, + 378, 304, 3164, 300, 307, 746, 300, 321, 366, 51108], "temperature": 0.0, "avg_logprob": + -0.19238075528826032, "compression_ratio": 1.858974358974359, "no_speech_prob": + 0.006196014583110809}, {"id": 46, "seek": 30756, "start": 322.44, "end": 330.68, + "text": " trying to excel at that is going beyond only just using search and text + or images to search maybe", "tokens": [51108, 1382, 281, 24015, 412, 300, 307, 516, + 4399, 787, 445, 1228, 3164, 293, 2487, 420, 5267, 281, 3164, 1310, 51520], "temperature": + 0.0, 
"avg_logprob": -0.19238075528826032, "compression_ratio": 1.858974358974359, + "no_speech_prob": 0.006196014583110809}, {"id": 47, "seek": 33068, "start": 331.24, + "end": 338.6, "text": " trying to have a combination of walls to power search to + the next level. So they might like bring", "tokens": [50392, 1382, 281, 362, 257, + 6562, 295, 7920, 281, 1347, 3164, 281, 264, 958, 1496, 13, 407, 436, 1062, 411, + 1565, 50760], "temperature": 0.0, "avg_logprob": -0.11148281097412109, "compression_ratio": + 1.6538461538461537, "no_speech_prob": 0.0063421037048101425}, {"id": 48, "seek": + 33068, "start": 338.6, "end": 345.24, "text": " some kind of use case that you need + to figure out on tech level right? Yes kind of translates to", "tokens": [50760, + 512, 733, 295, 764, 1389, 300, 291, 643, 281, 2573, 484, 322, 7553, 1496, 558, 30, + 1079, 733, 295, 28468, 281, 51092], "temperature": 0.0, "avg_logprob": -0.11148281097412109, + "compression_ratio": 1.6538461538461537, "no_speech_prob": 0.0063421037048101425}, + {"id": 49, "seek": 33068, "start": 345.24, "end": 350.04, "text": " you but on the + other hand like you said it''s open source so it means like there is like a bunch + of", "tokens": [51092, 291, 457, 322, 264, 661, 1011, 411, 291, 848, 309, 311, 1269, + 4009, 370, 309, 1355, 411, 456, 307, 411, 257, 3840, 295, 51332], "temperature": + 0.0, "avg_logprob": -0.11148281097412109, "compression_ratio": 1.6538461538461537, + "no_speech_prob": 0.0063421037048101425}, {"id": 50, "seek": 33068, "start": 350.04, + "end": 354.92, "text": " like GitHub issues coming in right and if you have like + Slack or I don''t know if you''re using", "tokens": [51332, 411, 23331, 2663, 1348, + 294, 558, 293, 498, 291, 362, 411, 37211, 420, 286, 500, 380, 458, 498, 291, 434, + 1228, 51576], "temperature": 0.0, "avg_logprob": -0.11148281097412109, "compression_ratio": + 1.6538461538461537, "no_speech_prob": 0.0063421037048101425}, {"id": 51, "seek": + 35492, "start": 355.0, "end": 
363.40000000000003, "text": " Slack anyway. Yeah, + so like probably every day like somebody you wake up and there are questions", "tokens": + [50368, 37211, 4033, 13, 865, 11, 370, 411, 1391, 633, 786, 411, 2618, 291, 6634, + 493, 293, 456, 366, 1651, 50788], "temperature": 0.0, "avg_logprob": -0.2297153053702889, + "compression_ratio": 1.6196581196581197, "no_speech_prob": 0.006950638722628355}, + {"id": 52, "seek": 35492, "start": 363.40000000000003, "end": 371.32, "text": " + there right? So it''s also clients in a way right? Yes for me my users are our clients + and we", "tokens": [50788, 456, 558, 30, 407, 309, 311, 611, 6982, 294, 257, 636, + 558, 30, 1079, 337, 385, 452, 5022, 366, 527, 6982, 293, 321, 51184], "temperature": + 0.0, "avg_logprob": -0.2297153053702889, "compression_ratio": 1.6196581196581197, + "no_speech_prob": 0.006950638722628355}, {"id": 53, "seek": 35492, "start": 371.32, + "end": 377.8, "text": " have to listen to them so that''s the big point of open + source in my opinion is this direct feedback", "tokens": [51184, 362, 281, 2140, + 281, 552, 370, 300, 311, 264, 955, 935, 295, 1269, 4009, 294, 452, 4800, 307, 341, + 2047, 5824, 51508], "temperature": 0.0, "avg_logprob": -0.2297153053702889, "compression_ratio": + 1.6196581196581197, "no_speech_prob": 0.006950638722628355}, {"id": 54, "seek": + 35492, "start": 377.8, "end": 384.04, "text": " from the users we can you can correct + your direction and you can measure if your APIs are", "tokens": [51508, 490, 264, + 5022, 321, 393, 291, 393, 3006, 428, 3513, 293, 291, 393, 3481, 498, 428, 21445, + 366, 51820], "temperature": 0.0, "avg_logprob": -0.2297153053702889, "compression_ratio": + 1.6196581196581197, "no_speech_prob": 0.006950638722628355}, {"id": 55, "seek": + 38404, "start": 384.04, "end": 390.12, "text": " or your design are too complex + for the user to rush or whatever so this direct feedback is", "tokens": [50364, + 420, 428, 1715, 366, 886, 3997, 337, 264, 4195, 281, 9300, 
420, 2035, 370, 341, + 2047, 5824, 307, 50668], "temperature": 0.0, "avg_logprob": -0.26971201463179156, + "compression_ratio": 1.6157205240174672, "no_speech_prob": 0.0017228100914508104}, + {"id": 56, "seek": 38404, "start": 390.12, "end": 400.44, "text": " really useful. + And to this point it''s manageable. Yeah yeah but it''s also like I guess I also", + "tokens": [50668, 534, 4420, 13, 400, 281, 341, 935, 309, 311, 38798, 13, 865, 1338, + 457, 309, 311, 611, 411, 286, 2041, 286, 611, 51184], "temperature": 0.0, "avg_logprob": + -0.26971201463179156, "compression_ratio": 1.6157205240174672, "no_speech_prob": + 0.0017228100914508104}, {"id": 57, "seek": 38404, "start": 400.44, "end": 407.96000000000004, + "text": " alluded to this and one of the podcasts was with Bob One Lloyd from from + semi like it''s also", "tokens": [51184, 33919, 281, 341, 293, 472, 295, 264, 24045, + 390, 365, 6085, 1485, 31401, 490, 490, 12909, 411, 309, 311, 611, 51560], "temperature": + 0.0, "avg_logprob": -0.26971201463179156, "compression_ratio": 1.6157205240174672, + "no_speech_prob": 0.0017228100914508104}, {"id": 58, "seek": 38404, "start": 407.96000000000004, + "end": 413.40000000000003, "text": " sometimes maybe to give up with all the questions + right? Like if you get all these questions", "tokens": [51560, 2171, 1310, 281, + 976, 493, 365, 439, 264, 1651, 558, 30, 1743, 498, 291, 483, 439, 613, 1651, 51832], + "temperature": 0.0, "avg_logprob": -0.26971201463179156, "compression_ratio": 1.6157205240174672, + "no_speech_prob": 0.0017228100914508104}, {"id": 59, "seek": 41340, "start": 413.4, + "end": 418.84, "text": " when do you find the answer to kind of really deeply answer? 
+ Yeah fine time for answer them yeah.", "tokens": [50364, 562, 360, 291, 915, 264, + 1867, 281, 733, 295, 534, 8760, 1867, 30, 865, 2489, 565, 337, 1867, 552, 1338, + 13, 50636], "temperature": 0.0, "avg_logprob": -0.17133570777045357, "compression_ratio": + 1.7309417040358743, "no_speech_prob": 0.002529696561396122}, {"id": 60, "seek": + 41340, "start": 418.84, "end": 424.76, "text": " So we are trying to grow our team + into knowing that the community is something that makes us", "tokens": [50636, 407, + 321, 366, 1382, 281, 1852, 527, 1469, 666, 5276, 300, 264, 1768, 307, 746, 300, + 1669, 505, 50932], "temperature": 0.0, "avg_logprob": -0.17133570777045357, "compression_ratio": + 1.7309417040358743, "no_speech_prob": 0.002529696561396122}, {"id": 61, "seek": + 41340, "start": 424.76, "end": 430.28, "text": " special and it''s important for + us to take care of our community so we are all trying to keep an eye", "tokens": + [50932, 2121, 293, 309, 311, 1021, 337, 505, 281, 747, 1127, 295, 527, 1768, 370, + 321, 366, 439, 1382, 281, 1066, 364, 3313, 51208], "temperature": 0.0, "avg_logprob": + -0.17133570777045357, "compression_ratio": 1.7309417040358743, "no_speech_prob": + 0.002529696561396122}, {"id": 62, "seek": 41340, "start": 430.28, "end": 437.71999999999997, + "text": " on the community. 
Yeah yeah yeah I remember like when I was developing + like search code we were", "tokens": [51208, 322, 264, 1768, 13, 865, 1338, 1338, + 286, 1604, 411, 562, 286, 390, 6416, 411, 3164, 3089, 321, 645, 51580], "temperature": + 0.0, "avg_logprob": -0.17133570777045357, "compression_ratio": 1.7309417040358743, + "no_speech_prob": 0.002529696561396122}, {"id": 63, "seek": 43772, "start": 437.8, + "end": 444.44000000000005, "text": " using like Apache Solar and I had to like customize + some parts of solar and listen and I remember like", "tokens": [50368, 1228, 411, + 46597, 22385, 293, 286, 632, 281, 411, 19734, 512, 3166, 295, 7936, 293, 2140, 293, + 286, 1604, 411, 50700], "temperature": 0.0, "avg_logprob": -0.17635962303648603, + "compression_ratio": 1.76, "no_speech_prob": 0.022289138287305832}, {"id": 64, "seek": + 43772, "start": 444.44000000000005, "end": 450.68, "text": " in order for me to + kind of get up to speed I had to go to this mailing list right? And so there are", + "tokens": [50700, 294, 1668, 337, 385, 281, 733, 295, 483, 493, 281, 3073, 286, + 632, 281, 352, 281, 341, 41612, 1329, 558, 30, 400, 370, 456, 366, 51012], "temperature": + 0.0, "avg_logprob": -0.17635962303648603, "compression_ratio": 1.76, "no_speech_prob": + 0.022289138287305832}, {"id": 65, "seek": 43772, "start": 450.68, "end": 455.32000000000005, + "text": " like thousands and thousands emails actually Apache Solar was super active + you know like in", "tokens": [51012, 411, 5383, 293, 5383, 12524, 767, 46597, 22385, + 390, 1687, 4967, 291, 458, 411, 294, 51244], "temperature": 0.0, "avg_logprob": + -0.17635962303648603, "compression_ratio": 1.76, "no_speech_prob": 0.022289138287305832}, + {"id": 66, "seek": 43772, "start": 455.32000000000005, "end": 461.72, "text": " + sillies in many ways and and I was like how can I keep up with all these questions + but like I do need", "tokens": [51244, 37160, 530, 294, 867, 2098, 293, 293, 286, + 390, 411, 577, 393, 286, 1066, 493, 365, 
439, 613, 1651, 457, 411, 286, 360, 643, + 51564], "temperature": 0.0, "avg_logprob": -0.17635962303648603, "compression_ratio": + 1.76, "no_speech_prob": 0.022289138287305832}, {"id": 67, "seek": 46172, "start": + 461.72, "end": 468.04, "text": " to somehow keep up and summarize maybe what what + is being asked there in order to understand", "tokens": [50364, 281, 6063, 1066, + 493, 293, 20858, 1310, 437, 437, 307, 885, 2351, 456, 294, 1668, 281, 1223, 50680], + "temperature": 0.0, "avg_logprob": -0.13355929955192233, "compression_ratio": 1.711111111111111, + "no_speech_prob": 0.002701396122574806}, {"id": 68, "seek": 46172, "start": 468.04, + "end": 474.04, "text": " it''s useful for me or not because when you ask a question + on the mailing list or like today on", "tokens": [50680, 309, 311, 4420, 337, 385, + 420, 406, 570, 562, 291, 1029, 257, 1168, 322, 264, 41612, 1329, 420, 411, 965, + 322, 50980], "temperature": 0.0, "avg_logprob": -0.13355929955192233, "compression_ratio": + 1.711111111111111, "no_speech_prob": 0.002701396122574806}, {"id": 69, "seek": 46172, + "start": 474.04, "end": 480.12, "text": " Slack sometimes you need to be ready to + pay back right? If somebody help you in the community like", "tokens": [50980, 37211, + 2171, 291, 643, 281, 312, 1919, 281, 1689, 646, 558, 30, 759, 2618, 854, 291, 294, + 264, 1768, 411, 51284], "temperature": 0.0, "avg_logprob": -0.13355929955192233, + "compression_ratio": 1.711111111111111, "no_speech_prob": 0.002701396122574806}, + {"id": 70, "seek": 46172, "start": 480.12, "end": 487.24, "text": " you sometimes + need to also pay back so it''s like it''s a game. 
When this is seen in the community + I", "tokens": [51284, 291, 2171, 643, 281, 611, 1689, 646, 370, 309, 311, 411, 309, + 311, 257, 1216, 13, 1133, 341, 307, 1612, 294, 264, 1768, 286, 51640], "temperature": + 0.0, "avg_logprob": -0.13355929955192233, "compression_ratio": 1.711111111111111, + "no_speech_prob": 0.002701396122574806}, {"id": 71, "seek": 48724, "start": 487.24, + "end": 491.8, "text": " think it''s really pleasant for all the team when community + interacts with each other and none of", "tokens": [50364, 519, 309, 311, 534, 16232, + 337, 439, 264, 1469, 562, 1768, 43582, 365, 1184, 661, 293, 6022, 295, 50592], "temperature": + 0.0, "avg_logprob": -0.16638494420934608, "compression_ratio": 1.8358778625954197, + "no_speech_prob": 0.0034056217409670353}, {"id": 72, "seek": 48724, "start": 491.8, + "end": 496.68, "text": " the no one in the team has to jump in because they so they + help each other that''s when I think", "tokens": [50592, 264, 572, 472, 294, 264, + 1469, 575, 281, 3012, 294, 570, 436, 370, 436, 854, 1184, 661, 300, 311, 562, 286, + 519, 50836], "temperature": 0.0, "avg_logprob": -0.16638494420934608, "compression_ratio": + 1.8358778625954197, "no_speech_prob": 0.0034056217409670353}, {"id": 73, "seek": + 48724, "start": 496.68, "end": 503.88, "text": " the community really scales and + really open source goes to the next level. 
Yeah it''s kind of", "tokens": [50836, + 264, 1768, 534, 17408, 293, 534, 1269, 4009, 1709, 281, 264, 958, 1496, 13, 865, + 309, 311, 733, 295, 51196], "temperature": 0.0, "avg_logprob": -0.16638494420934608, + "compression_ratio": 1.8358778625954197, "no_speech_prob": 0.0034056217409670353}, + {"id": 74, "seek": 48724, "start": 503.88, "end": 509.64, "text": " regenerating + itself and kind of the cultural element of it so and the community drives you forward + I mean", "tokens": [51196, 26358, 990, 2564, 293, 733, 295, 264, 6988, 4478, 295, + 309, 370, 293, 264, 1768, 11754, 291, 2128, 286, 914, 51484], "temperature": 0.0, + "avg_logprob": -0.16638494420934608, "compression_ratio": 1.8358778625954197, "no_speech_prob": + 0.0034056217409670353}, {"id": 75, "seek": 48724, "start": 511.16, "end": 516.52, + "text": " just driving force of the project from the interaction point and the feature + wise as well.", "tokens": [51560, 445, 4840, 3464, 295, 264, 1716, 490, 264, 9285, + 935, 293, 264, 4111, 10829, 382, 731, 13, 51828], "temperature": 0.0, "avg_logprob": + -0.16638494420934608, "compression_ratio": 1.8358778625954197, "no_speech_prob": + 0.0034056217409670353}, {"id": 76, "seek": 51652, "start": 516.6, "end": 523.24, + "text": " Yeah sounds good. 
So John tell me more about GNI itself like as a product + let''s say as a", "tokens": [50368, 865, 3263, 665, 13, 407, 2619, 980, 385, 544, + 466, 460, 42496, 2564, 411, 382, 257, 1674, 718, 311, 584, 382, 257, 50700], "temperature": + 0.0, "avg_logprob": -0.1818456252415975, "compression_ratio": 1.6120689655172413, + "no_speech_prob": 0.003542924067005515}, {"id": 77, "seek": 51652, "start": 523.24, + "end": 530.84, "text": " technology stack like what can I do as a user you know + using GNI and yeah like is it self-series", "tokens": [50700, 2899, 8630, 411, 437, + 393, 286, 360, 382, 257, 4195, 291, 458, 1228, 460, 42496, 293, 1338, 411, 307, + 309, 2698, 12, 12484, 530, 51080], "temperature": 0.0, "avg_logprob": -0.1818456252415975, + "compression_ratio": 1.6120689655172413, "no_speech_prob": 0.003542924067005515}, + {"id": 78, "seek": 51652, "start": 530.84, "end": 538.68, "text": " and so on. So + the main point of GNI is that we want to be with the user from the minute they", + "tokens": [51080, 293, 370, 322, 13, 407, 264, 2135, 935, 295, 460, 42496, 307, + 300, 321, 528, 281, 312, 365, 264, 4195, 490, 264, 3456, 436, 51472], "temperature": + 0.0, "avg_logprob": -0.1818456252415975, "compression_ratio": 1.6120689655172413, + "no_speech_prob": 0.003542924067005515}, {"id": 79, "seek": 51652, "start": 538.68, + "end": 544.68, "text": " are experimenting with their search application so for + instance we are written in Python and we", "tokens": [51472, 366, 29070, 365, 641, + 3164, 3861, 370, 337, 5197, 321, 366, 3720, 294, 15329, 293, 321, 51772], "temperature": + 0.0, "avg_logprob": -0.1818456252415975, "compression_ratio": 1.6120689655172413, + "no_speech_prob": 0.003542924067005515}, {"id": 80, "seek": 54468, "start": 544.68, + "end": 549.0, "text": " have a really nice API in Python to build with your documents + that can treat with any type of", "tokens": [50364, 362, 257, 534, 1481, 9362, 294, + 15329, 281, 1322, 365, 428, 8512, 300, 393, 2387, 365, 
604, 2010, 295, 50580], "temperature": + 0.0, "avg_logprob": -0.18774527595156715, "compression_ratio": 1.688073394495413, + "no_speech_prob": 0.0003409196506254375}, {"id": 81, "seek": 54468, "start": 549.0, + "end": 556.28, "text": " data, text, images, audio, video and we are trying to build + a really easy to use API for this", "tokens": [50580, 1412, 11, 2487, 11, 5267, + 11, 6278, 11, 960, 293, 321, 366, 1382, 281, 1322, 257, 534, 1858, 281, 764, 9362, + 337, 341, 50944], "temperature": 0.0, "avg_logprob": -0.18774527595156715, "compression_ratio": + 1.688073394495413, "no_speech_prob": 0.0003409196506254375}, {"id": 82, "seek": + 54468, "start": 556.28, "end": 563.7199999999999, "text": " for you to run locally + your solutions. The first experimental facing to wrap your code", "tokens": [50944, + 337, 291, 281, 1190, 16143, 428, 6547, 13, 440, 700, 17069, 7170, 281, 7019, 428, + 3089, 51316], "temperature": 0.0, "avg_logprob": -0.18774527595156715, "compression_ratio": + 1.688073394495413, "no_speech_prob": 0.0003409196506254375}, {"id": 83, "seek": + 54468, "start": 563.7199999999999, "end": 568.92, "text": " for processing loading + the files and for processing the images or whatever and embedding them", "tokens": + [51316, 337, 9007, 15114, 264, 7098, 293, 337, 9007, 264, 5267, 420, 2035, 293, + 12240, 3584, 552, 51576], "temperature": 0.0, "avg_logprob": -0.18774527595156715, + "compression_ratio": 1.688073394495413, "no_speech_prob": 0.0003409196506254375}, + {"id": 84, "seek": 56892, "start": 569.64, "end": 576.76, "text": " searching to + do a process many as neighbors or exact nearest neighbor search then once you have + this", "tokens": [50400, 10808, 281, 360, 257, 1399, 867, 382, 12512, 420, 1900, + 23831, 5987, 3164, 550, 1564, 291, 362, 341, 50756], "temperature": 0.0, "avg_logprob": + -0.2781321493427405, "compression_ratio": 1.8761904761904762, "no_speech_prob": + 0.0011840553488582373}, {"id": 85, "seek": 56892, "start": 578.12, "end": 
584.92, + "text": " we make it easy for you to wrap it in some microservices what we call + executors so first phase you", "tokens": [50824, 321, 652, 309, 1858, 337, 291, + 281, 7019, 309, 294, 512, 15547, 47480, 437, 321, 818, 7568, 830, 370, 700, 5574, + 291, 51164], "temperature": 0.0, "avg_logprob": -0.2781321493427405, "compression_ratio": + 1.8761904761904762, "no_speech_prob": 0.0011840553488582373}, {"id": 86, "seek": + 56892, "start": 584.92, "end": 590.92, "text": " deal with these document array + types that we have come with then you come with them to the next", "tokens": [51164, + 2028, 365, 613, 4166, 10225, 3467, 300, 321, 362, 808, 365, 550, 291, 808, 365, + 552, 281, 264, 958, 51464], "temperature": 0.0, "avg_logprob": -0.2781321493427405, + "compression_ratio": 1.8761904761904762, "no_speech_prob": 0.0011840553488582373}, + {"id": 87, "seek": 56892, "start": 590.92, "end": 596.04, "text": " layer is you + have it with the executors so you wrap your logic in different microservices and + then", "tokens": [51464, 4583, 307, 291, 362, 309, 365, 264, 7568, 830, 370, 291, + 7019, 428, 9952, 294, 819, 15547, 47480, 293, 550, 51720], "temperature": 0.0, "avg_logprob": + -0.2781321493427405, "compression_ratio": 1.8761904761904762, "no_speech_prob": + 0.0011840553488582373}, {"id": 88, "seek": 59604, "start": 596.04, "end": 603.0799999999999, + "text": " we put it in what we call a flow that is kind of a pipeline that is really + to scale locally or", "tokens": [50364, 321, 829, 309, 294, 437, 321, 818, 257, + 3095, 300, 307, 733, 295, 257, 15517, 300, 307, 534, 281, 4373, 16143, 420, 50716], + "temperature": 0.0, "avg_logprob": -0.1448435007139694, "compression_ratio": 1.632034632034632, + "no_speech_prob": 0.0012566702207550406}, {"id": 89, "seek": 59604, "start": 603.0799999999999, + "end": 608.8399999999999, "text": " remotely or even with Kubernetes so that you + have replication and scalability taken care for.", "tokens": [50716, 20824, 420, + 754, 
365, 23145, 370, 300, 291, 362, 39911, 293, 15664, 2310, 2726, 1127, 337, 13, + 51004], "temperature": 0.0, "avg_logprob": -0.1448435007139694, "compression_ratio": + 1.632034632034632, "no_speech_prob": 0.0012566702207550406}, {"id": 90, "seek": + 59604, "start": 609.64, "end": 617.56, "text": " So we are trying to bring you easily + from your day zero of development to the production system.", "tokens": [51044, + 407, 321, 366, 1382, 281, 1565, 291, 3612, 490, 428, 786, 4018, 295, 3250, 281, + 264, 4265, 1185, 13, 51440], "temperature": 0.0, "avg_logprob": -0.1448435007139694, + "compression_ratio": 1.632034632034632, "no_speech_prob": 0.0012566702207550406}, + {"id": 91, "seek": 59604, "start": 617.56, "end": 623.88, "text": " Yeah yeah sounds + good sounds comprehensive and like what if I would like to just use like a", "tokens": + [51440, 865, 1338, 3263, 665, 3263, 13914, 293, 411, 437, 498, 286, 576, 411, 281, + 445, 764, 411, 257, 51756], "temperature": 0.0, "avg_logprob": -0.1448435007139694, + "compression_ratio": 1.632034632034632, "no_speech_prob": 0.0012566702207550406}, + {"id": 92, "seek": 62388, "start": 623.96, "end": 631.0, "text": " hosted version + can I use a hosted version from Gina AI or do I need to do an operation?", "tokens": + [50368, 19204, 3037, 393, 286, 764, 257, 19204, 3037, 490, 34711, 7318, 420, 360, + 286, 643, 281, 360, 364, 6916, 30, 50720], "temperature": 0.0, "avg_logprob": -0.5413087463378906, + "compression_ratio": 1.6602316602316602, "no_speech_prob": 0.004382084123790264}, + {"id": 93, "seek": 62388, "start": 631.0, "end": 635.0, "text": " There is no hosted + version at this point yeah so it''s basically I need it''s like a", "tokens": [50720, + 821, 307, 572, 19204, 3037, 412, 341, 935, 1338, 370, 309, 311, 1936, 286, 643, + 309, 311, 411, 257, 50920], "temperature": 0.0, "avg_logprob": -0.5413087463378906, + "compression_ratio": 1.6602316602316602, "no_speech_prob": 0.004382084123790264}, + {"id": 94, "seek": 62388, 
"start": 636.12, "end": 639.96, "text": " Lego type of + thing right? Yes exactly. I will have a nice deployment.", "tokens": [50976, 28761, + 2010, 295, 551, 558, 30, 1079, 2293, 13, 286, 486, 362, 257, 1481, 19317, 13, 51168], + "temperature": 0.0, "avg_logprob": -0.5413087463378906, "compression_ratio": 1.6602316602316602, + "no_speech_prob": 0.004382084123790264}, {"id": 95, "seek": 62388, "start": 640.92, + "end": 646.04, "text": " And we have even this marketplace as you said this with + this helicopter cutter so you can share", "tokens": [51216, 400, 321, 362, 754, + 341, 19455, 382, 291, 848, 341, 365, 341, 19803, 25531, 370, 291, 393, 2073, 51472], + "temperature": 0.0, "avg_logprob": -0.5413087463378906, "compression_ratio": 1.6602316602316602, + "no_speech_prob": 0.004382084123790264}, {"id": 96, "seek": 62388, "start": 646.04, + "end": 651.08, "text": " publicly or privately with your colleagues or with the + community your meeting blocks that you", "tokens": [51472, 14843, 420, 31919, 365, + 428, 7734, 420, 365, 264, 1768, 428, 3440, 8474, 300, 291, 51724], "temperature": + 0.0, "avg_logprob": -0.5413087463378906, "compression_ratio": 1.6602316602316602, + "no_speech_prob": 0.004382084123790264}, {"id": 97, "seek": 65108, "start": 651.08, + "end": 655.72, "text": " may think they are useful for you. 
Yeah modern deep learning + models that you have packed", "tokens": [50364, 815, 519, 436, 366, 4420, 337, 291, + 13, 865, 4363, 2452, 2539, 5245, 300, 291, 362, 13265, 50596], "temperature": 0.0, + "avg_logprob": -0.4869543053637976, "compression_ratio": 1.5806451612903225, "no_speech_prob": + 0.0020556175149977207}, {"id": 98, "seek": 65108, "start": 656.44, "end": 660.76, + "text": " processing, copy-done, re-runking, even back to research research.", "tokens": + [50632, 9007, 11, 5055, 12, 67, 546, 11, 319, 12, 81, 3197, 278, 11, 754, 646, 281, + 2132, 2132, 13, 50848], "temperature": 0.0, "avg_logprob": -0.4869543053637976, + "compression_ratio": 1.5806451612903225, "no_speech_prob": 0.0020556175149977207}, + {"id": 99, "seek": 65108, "start": 660.76, "end": 669.96, "text": " Yeah so and + how does it align also with like companies or hubs like Hagen phase you know Hagen", + "tokens": [50848, 865, 370, 293, 577, 775, 309, 7975, 611, 365, 411, 3431, 420, + 46870, 411, 389, 4698, 5574, 291, 458, 389, 4698, 51308], "temperature": 0.0, "avg_logprob": + -0.4869543053637976, "compression_ratio": 1.5806451612903225, "no_speech_prob": + 0.0020556175149977207}, {"id": 100, "seek": 65108, "start": 669.96, "end": 674.6800000000001, + "text": " phase is also very famous on model side right? So like let''s show somebody + picks a model and", "tokens": [51308, 5574, 307, 611, 588, 4618, 322, 2316, 1252, + 558, 30, 407, 411, 718, 311, 855, 2618, 16137, 257, 2316, 293, 51544], "temperature": + 0.0, "avg_logprob": -0.4869543053637976, "compression_ratio": 1.5806451612903225, + "no_speech_prob": 0.0020556175149977207}, {"id": 101, "seek": 67468, "start": 674.68, + "end": 680.76, "text": " wants to bring it to Gina what''s the process there? 
So + it''s quite I would say having", "tokens": [50364, 2738, 281, 1565, 309, 281, 34711, + 437, 311, 264, 1399, 456, 30, 407, 309, 311, 1596, 286, 576, 584, 1419, 50668], + "temperature": 0.0, "avg_logprob": -0.246503784542992, "compression_ratio": 1.708133971291866, + "no_speech_prob": 0.0036826268769800663}, {"id": 102, "seek": 67468, "start": 680.76, + "end": 685.0, "text": " phase it''s quite inspirational for us in this sense in + this marketplace community", "tokens": [50668, 5574, 309, 311, 1596, 33554, 337, + 505, 294, 341, 2020, 294, 341, 19455, 1768, 50880], "temperature": 0.0, "avg_logprob": + -0.246503784542992, "compression_ratio": 1.708133971291866, "no_speech_prob": 0.0036826268769800663}, + {"id": 103, "seek": 67468, "start": 685.9599999999999, "end": 694.68, "text": " + and place it is quite similar but um Gina is this marketplace is related to our + executor so it", "tokens": [50928, 293, 1081, 309, 307, 1596, 2531, 457, 1105, 34711, + 307, 341, 19455, 307, 4077, 281, 527, 7568, 284, 370, 309, 51364], "temperature": + 0.0, "avg_logprob": -0.246503784542992, "compression_ratio": 1.708133971291866, + "no_speech_prob": 0.0036826268769800663}, {"id": 104, "seek": 67468, "start": 694.68, + "end": 702.3599999999999, "text": " goes beyond only models so it''s any subsystem + enabled and block that you can that you can build", "tokens": [51364, 1709, 4399, + 787, 5245, 370, 309, 311, 604, 2090, 9321, 15172, 293, 3461, 300, 291, 393, 300, + 291, 393, 1322, 51748], "temperature": 0.0, "avg_logprob": -0.246503784542992, "compression_ratio": + 1.708133971291866, "no_speech_prob": 0.0036826268769800663}, {"id": 105, "seek": + 70236, "start": 702.36, "end": 709.5600000000001, "text": " that is able to be part + of this of this Gina pipeline for us and we are trying to make it", "tokens": [50364, + 300, 307, 1075, 281, 312, 644, 295, 341, 295, 341, 34711, 15517, 337, 505, 293, + 321, 366, 1382, 281, 652, 309, 50724], "temperature": 0.0, "avg_logprob": 
-0.16105395952860516, + "compression_ratio": 1.7043795620437956, "no_speech_prob": 0.0008499606628902256}, + {"id": 106, "seek": 70236, "start": 709.5600000000001, "end": 715.32, "text": " + user-phone for you to localize it and use it in any way in a simple API and we''re + still working", "tokens": [50724, 4195, 12, 4977, 337, 291, 281, 2654, 1125, 309, + 293, 764, 309, 294, 604, 636, 294, 257, 2199, 9362, 293, 321, 434, 920, 1364, 51012], + "temperature": 0.0, "avg_logprob": -0.16105395952860516, "compression_ratio": 1.7043795620437956, + "no_speech_prob": 0.0008499606628902256}, {"id": 107, "seek": 70236, "start": 715.32, + "end": 720.6, "text": " to make it easier every time. Yeah of course because actually + you know it tends to get a lot of", "tokens": [51012, 281, 652, 309, 3571, 633, + 565, 13, 865, 295, 1164, 570, 767, 291, 458, 309, 12258, 281, 483, 257, 688, 295, + 51276], "temperature": 0.0, "avg_logprob": -0.16105395952860516, "compression_ratio": + 1.7043795620437956, "no_speech_prob": 0.0008499606628902256}, {"id": 108, "seek": + 70236, "start": 720.6, "end": 725.24, "text": " time you know the infrastructure + part like how do I bring my model let''s say I have a custom model", "tokens": [51276, + 565, 291, 458, 264, 6896, 644, 411, 577, 360, 286, 1565, 452, 2316, 718, 311, 584, + 286, 362, 257, 2375, 2316, 51508], "temperature": 0.0, "avg_logprob": -0.16105395952860516, + "compression_ratio": 1.7043795620437956, "no_speech_prob": 0.0008499606628902256}, + {"id": 109, "seek": 70236, "start": 725.24, "end": 732.28, "text": " and I want + to bring it inside Gina right so it serves as a embedding layer so how do I", "tokens": + [51508, 293, 286, 528, 281, 1565, 309, 1854, 34711, 558, 370, 309, 13451, 382, 257, + 12240, 3584, 4583, 370, 577, 360, 286, 51860], "temperature": 0.0, "avg_logprob": + -0.16105395952860516, "compression_ratio": 1.7043795620437956, "no_speech_prob": + 0.0008499606628902256}, {"id": 110, "seek": 73228, "start": 732.28, "end": 
739.0799999999999, + "text": " figure out all this scalability or latency parameters and so on so I think + so the first thing", "tokens": [50364, 2573, 484, 439, 341, 15664, 2310, 420, 27043, + 9834, 293, 370, 322, 370, 286, 519, 370, 264, 700, 551, 50704], "temperature": 0.0, + "avg_logprob": -0.25484494412882946, "compression_ratio": 1.7363636363636363, "no_speech_prob": + 0.0010467789834365249}, {"id": 111, "seek": 73228, "start": 739.0799999999999, "end": + 745.4, "text": " is to get it working we are having to we expose these with these + executors that have some API", "tokens": [50704, 307, 281, 483, 309, 1364, 321, + 366, 1419, 281, 321, 19219, 613, 365, 613, 7568, 830, 300, 362, 512, 9362, 51020], + "temperature": 0.0, "avg_logprob": -0.25484494412882946, "compression_ratio": 1.7363636363636363, + "no_speech_prob": 0.0010467789834365249}, {"id": 112, "seek": 73228, "start": 745.4, + "end": 753.88, "text": " and to that read requests with some maybe I inspired with + this fast API approach and then you have", "tokens": [51020, 293, 281, 300, 1401, + 12475, 365, 512, 1310, 286, 7547, 365, 341, 2370, 9362, 3109, 293, 550, 291, 362, + 51444], "temperature": 0.0, "avg_logprob": -0.25484494412882946, "compression_ratio": + 1.7363636363636363, "no_speech_prob": 0.0010467789834365249}, {"id": 113, "seek": + 73228, "start": 754.52, "end": 759.8, "text": " with this row you have the parameters + to replicate to scale and so on you you may run it in GPU", "tokens": [51476, 365, + 341, 5386, 291, 362, 264, 9834, 281, 25356, 281, 4373, 293, 370, 322, 291, 291, + 815, 1190, 309, 294, 18407, 51740], "temperature": 0.0, "avg_logprob": -0.25484494412882946, + "compression_ratio": 1.7363636363636363, "no_speech_prob": 0.0010467789834365249}, + {"id": 114, "seek": 75980, "start": 759.88, "end": 765.9599999999999, "text": " + whatever yeah yeah so like you can choose your cost kind of like factors right or", + "tokens": [50368, 2035, 1338, 1338, 370, 411, 291, 393, 2826, 428, 
2063, 733, 295, + 411, 6771, 558, 420, 50672], "temperature": 0.0, "avg_logprob": -0.23909169365377986, + "compression_ratio": 1.7230046948356808, "no_speech_prob": 0.001926393830217421}, + {"id": 115, "seek": 75980, "start": 765.9599999999999, "end": 771.0, "text": " based + on your cost factors you can choose it''s CPU actually and then latency and for + some models", "tokens": [50672, 2361, 322, 428, 2063, 6771, 291, 393, 2826, 309, + 311, 13199, 767, 293, 550, 27043, 293, 337, 512, 5245, 50924], "temperature": 0.0, + "avg_logprob": -0.23909169365377986, "compression_ratio": 1.7230046948356808, "no_speech_prob": + 0.001926393830217421}, {"id": 116, "seek": 75980, "start": 771.0, "end": 779.7199999999999, + "text": " actually CPU is fine so yeah I mean why not yeah it depends also on the + user needs so for instance", "tokens": [50924, 767, 13199, 307, 2489, 370, 1338, + 286, 914, 983, 406, 1338, 309, 5946, 611, 322, 264, 4195, 2203, 370, 337, 5197, + 51360], "temperature": 0.0, "avg_logprob": -0.23909169365377986, "compression_ratio": + 1.7230046948356808, "no_speech_prob": 0.001926393830217421}, {"id": 117, "seek": + 75980, "start": 780.5999999999999, "end": 788.4399999999999, "text": " we are also + seeing that neural search main not all is not needed to be only for these big", + "tokens": [51404, 321, 366, 611, 2577, 300, 18161, 3164, 2135, 406, 439, 307, 406, + 2978, 281, 312, 787, 337, 613, 955, 51796], "temperature": 0.0, "avg_logprob": -0.23909169365377986, + "compression_ratio": 1.7230046948356808, "no_speech_prob": 0.001926393830217421}, + {"id": 118, "seek": 78844, "start": 788.44, "end": 792.84, "text": " giants with + this big amount of data and big amount of resources so any company it''s more", + "tokens": [50364, 31894, 365, 341, 955, 2372, 295, 1412, 293, 955, 2372, 295, 3593, + 370, 604, 2237, 309, 311, 544, 50584], "temperature": 0.0, "avg_logprob": -0.1439342588748572, + "compression_ratio": 1.9076305220883534, "no_speech_prob": 
0.0014836308546364307}, + {"id": 119, "seek": 78844, "start": 792.84, "end": 799.4000000000001, "text": " + company can benefit from the power of the neural networks to power their search + so they may not need", "tokens": [50584, 2237, 393, 5121, 490, 264, 1347, 295, 264, + 18161, 9590, 281, 1347, 641, 3164, 370, 436, 815, 406, 643, 50912], "temperature": + 0.0, "avg_logprob": -0.1439342588748572, "compression_ratio": 1.9076305220883534, + "no_speech_prob": 0.0014836308546364307}, {"id": 120, "seek": 78844, "start": 800.7600000000001, + "end": 807.4000000000001, "text": " so much require so much resources or they may + not require so much speed so it''s about and so we", "tokens": [50980, 370, 709, + 3651, 370, 709, 3593, 420, 436, 815, 406, 3651, 370, 709, 3073, 370, 309, 311, 466, + 293, 370, 321, 51312], "temperature": 0.0, "avg_logprob": -0.1439342588748572, "compression_ratio": + 1.9076305220883534, "no_speech_prob": 0.0014836308546364307}, {"id": 121, "seek": + 78844, "start": 807.4000000000001, "end": 812.44, "text": " are giving the power + more or less to use yeah and kind of flexibility of the platform so because", "tokens": + [51312, 366, 2902, 264, 1347, 544, 420, 1570, 281, 764, 1338, 293, 733, 295, 12635, + 295, 264, 3663, 370, 570, 51564], "temperature": 0.0, "avg_logprob": -0.1439342588748572, + "compression_ratio": 1.9076305220883534, "no_speech_prob": 0.0014836308546364307}, + {"id": 122, "seek": 78844, "start": 812.44, "end": 816.84, "text": " essentially + if they wanted to do it from scratch then they would probably need to figure out", + "tokens": [51564, 4476, 498, 436, 1415, 281, 360, 309, 490, 8459, 550, 436, 576, + 1391, 643, 281, 2573, 484, 51784], "temperature": 0.0, "avg_logprob": -0.1439342588748572, + "compression_ratio": 1.9076305220883534, "no_speech_prob": 0.0014836308546364307}, + {"id": 123, "seek": 81684, "start": 816.84, "end": 824.52, "text": " similar things + like component isolation and scaling and yeah like an algorithm 
like a quality", + "tokens": [50364, 2531, 721, 411, 6542, 16001, 293, 21589, 293, 1338, 411, 364, + 9284, 411, 257, 3125, 50748], "temperature": 0.0, "avg_logprob": -0.17490789890289307, + "compression_ratio": 1.7971014492753623, "no_speech_prob": 0.00211620656773448}, + {"id": 124, "seek": 81684, "start": 824.52, "end": 830.9200000000001, "text": " + checks and so on and on the algorithm side you said like you have exact search as + well as", "tokens": [50748, 13834, 293, 370, 322, 293, 322, 264, 9284, 1252, 291, + 848, 411, 291, 362, 1900, 3164, 382, 731, 382, 51068], "temperature": 0.0, "avg_logprob": + -0.17490789890289307, "compression_ratio": 1.7971014492753623, "no_speech_prob": + 0.00211620656773448}, {"id": 125, "seek": 81684, "start": 830.9200000000001, "end": + 837.24, "text": " in exact search can you talk with more and kind of mention maybe + some algorithms that you support", "tokens": [51068, 294, 1900, 3164, 393, 291, + 751, 365, 544, 293, 733, 295, 2152, 1310, 512, 14642, 300, 291, 1406, 51384], "temperature": + 0.0, "avg_logprob": -0.17490789890289307, "compression_ratio": 1.7971014492753623, + "no_speech_prob": 0.00211620656773448}, {"id": 126, "seek": 81684, "start": 838.2800000000001, + "end": 845.24, "text": " so yeah so right now natively we support as the main native + quite optimized version of the", "tokens": [51436, 370, 1338, 370, 558, 586, 8470, + 356, 321, 1406, 382, 264, 2135, 8470, 1596, 26941, 3037, 295, 264, 51784], "temperature": + 0.0, "avg_logprob": -0.17490789890289307, "compression_ratio": 1.7971014492753623, + "no_speech_prob": 0.00211620656773448}, {"id": 127, "seek": 84524, "start": 845.64, + "end": 852.76, "text": " and exact nearest neighbor search but then for instance + one of these building blocks can be any", "tokens": [50384, 293, 1900, 23831, 5987, + 3164, 457, 550, 337, 5197, 472, 295, 613, 2390, 8474, 393, 312, 604, 50740], "temperature": + 0.0, "avg_logprob": -0.20129682277810984, "compression_ratio": 
1.7337278106508875, + "no_speech_prob": 0.0007803683984093368}, {"id": 128, "seek": 84524, "start": 852.76, + "end": 859.32, "text": " support wrapping any client for any other vector database + but for instance we just realized our own", "tokens": [50740, 1406, 21993, 604, + 6423, 337, 604, 661, 8062, 8149, 457, 337, 5197, 321, 445, 5334, 527, 1065, 51068], + "temperature": 0.0, "avg_logprob": -0.20129682277810984, "compression_ratio": 1.7337278106508875, + "no_speech_prob": 0.0007803683984093368}, {"id": 129, "seek": 84524, "start": 860.28, + "end": 866.76, "text": " and approximate nearest neighbor solution we have two of + them for instance that we have developed", "tokens": [51116, 293, 30874, 23831, + 5987, 3827, 321, 362, 732, 295, 552, 337, 5197, 300, 321, 362, 4743, 51440], "temperature": + 0.0, "avg_logprob": -0.20129682277810984, "compression_ratio": 1.7337278106508875, + "no_speech_prob": 0.0007803683984093368}, {"id": 130, "seek": 86676, "start": 866.76, + "end": 876.28, "text": " so much so we have one that is based on hsw plus a postgres + indexer a postgres database", "tokens": [50364, 370, 709, 370, 321, 362, 472, 300, + 307, 2361, 322, 276, 82, 86, 1804, 257, 2183, 45189, 8186, 260, 257, 2183, 45189, + 8149, 50840], "temperature": 0.0, "avg_logprob": -0.2965242119245632, "compression_ratio": + 1.6904761904761905, "no_speech_prob": 0.0017833516467362642}, {"id": 131, "seek": + 86676, "start": 876.28, "end": 882.68, "text": " for to require the documents and + then we have built our well we just released and in Slack", "tokens": [50840, 337, + 281, 3651, 264, 8512, 293, 550, 321, 362, 3094, 527, 731, 321, 445, 4736, 293, 294, + 37211, 51160], "temperature": 0.0, "avg_logprob": -0.2965242119245632, "compression_ratio": + 1.6904761904761905, "no_speech_prob": 0.0017833516467362642}, {"id": 132, "seek": + 86676, "start": 882.68, "end": 888.6, "text": " the community can start enjoying + it we have and build what we call pcolyte which", "tokens": 
[51160, 264, 1768, 393, + 722, 9929, 309, 321, 362, 293, 1322, 437, 321, 818, 280, 1291, 356, 975, 597, 51456], + "temperature": 0.0, "avg_logprob": -0.2965242119245632, "compression_ratio": 1.6904761904761905, + "no_speech_prob": 0.0017833516467362642}, {"id": 133, "seek": 86676, "start": 889.3199999999999, + "end": 896.36, "text": " and works with product quantization but also has support + for hsw you said pcolyte or how do you", "tokens": [51492, 293, 1985, 365, 1674, + 4426, 2144, 457, 611, 575, 1406, 337, 276, 82, 86, 291, 848, 280, 1291, 356, 975, + 420, 577, 360, 291, 51844], "temperature": 0.0, "avg_logprob": -0.2965242119245632, + "compression_ratio": 1.6904761904761905, "no_speech_prob": 0.0017833516467362642}, + {"id": 134, "seek": 89636, "start": 896.36, "end": 904.6, "text": " spell that? + P2Lite, P2Lite, which is like product quantization light version. Yes we", "tokens": + [50364, 9827, 300, 30, 430, 17, 43, 642, 11, 430, 17, 43, 642, 11, 597, 307, 411, + 1674, 4426, 2144, 1442, 3037, 13, 1079, 321, 50776], "temperature": 0.6, "avg_logprob": + -0.4572385281932597, "compression_ratio": 1.6017699115044248, "no_speech_prob": + 0.003951632417738438}, {"id": 135, "seek": 89636, "start": 905.48, "end": 910.36, + "text": " and profiltering options as well. Oh with preview and how in what sense + is it light,", "tokens": [50820, 293, 1740, 388, 391, 278, 3956, 382, 731, 13, 876, + 365, 14281, 293, 577, 294, 437, 2020, 307, 309, 1442, 11, 51064], "temperature": + 0.6, "avg_logprob": -0.4572385281932597, "compression_ratio": 1.6017699115044248, + "no_speech_prob": 0.003951632417738438}, {"id": 136, "seek": 89636, "start": 910.92, + "end": 917.32, "text": " compared to product quantization? 
No I have not been involved + so much in this spreader right now", "tokens": [51092, 5347, 281, 1674, 4426, 2144, + 30, 883, 286, 362, 406, 668, 3288, 370, 709, 294, 341, 3974, 260, 558, 586, 51412], + "temperature": 0.6, "avg_logprob": -0.4572385281932597, "compression_ratio": 1.6017699115044248, + "no_speech_prob": 0.003951632417738438}, {"id": 137, "seek": 89636, "start": 917.32, + "end": 925.8000000000001, "text": " so it''s a new thing but it is light in sense + of that it is quite embedded and it''s quite native", "tokens": [51412, 370, 309, + 311, 257, 777, 551, 457, 309, 307, 1442, 294, 2020, 295, 300, 309, 307, 1596, 16741, + 293, 309, 311, 1596, 8470, 51836], "temperature": 0.6, "avg_logprob": -0.4572385281932597, + "compression_ratio": 1.6017699115044248, "no_speech_prob": 0.003951632417738438}, + {"id": 138, "seek": 92580, "start": 925.8, "end": 931.8, "text": " to work with + our document type. So it''s not so general as any object, but it is really", "tokens": + [50364, 281, 589, 365, 527, 4166, 2010, 13, 407, 309, 311, 406, 370, 2674, 382, + 604, 2657, 11, 457, 309, 307, 534, 50664], "temperature": 0.0, "avg_logprob": -0.2690470654031505, + "compression_ratio": 1.6209386281588447, "no_speech_prob": 0.2745043933391571}, + {"id": 139, "seek": 92580, "start": 932.52, "end": 937.0799999999999, "text": " + built to integrate very easily with Jina. Oh, I see. Like with specific kind of", + "tokens": [50700, 3094, 281, 13365, 588, 3612, 365, 508, 1426, 13, 876, 11, 286, + 536, 13, 1743, 365, 2685, 733, 295, 50928], "temperature": 0.0, "avg_logprob": -0.2690470654031505, + "compression_ratio": 1.6209386281588447, "no_speech_prob": 0.2745043933391571}, + {"id": 140, "seek": 92580, "start": 937.7199999999999, "end": 944.12, "text": " + schema or document types. And it''s also open source. 
And do you do you like obviously + you can", "tokens": [50960, 34078, 420, 4166, 3467, 13, 400, 309, 311, 611, 1269, + 4009, 13, 400, 360, 291, 360, 291, 411, 2745, 291, 393, 51280], "temperature": 0.0, + "avg_logprob": -0.2690470654031505, "compression_ratio": 1.6209386281588447, "no_speech_prob": + 0.2745043933391571}, {"id": 141, "seek": 92580, "start": 944.12, "end": 949.24, + "text": " provide the links or we can also link in the show notes. But do you also + like have some kind of", "tokens": [51280, 2893, 264, 6123, 420, 321, 393, 611, + 2113, 294, 264, 855, 5570, 13, 583, 360, 291, 611, 411, 362, 512, 733, 295, 51536], + "temperature": 0.0, "avg_logprob": -0.2690470654031505, "compression_ratio": 1.6209386281588447, + "no_speech_prob": 0.2745043933391571}, {"id": 142, "seek": 92580, "start": 949.9599999999999, + "end": 954.8399999999999, "text": " latency analysis for this algorithm? Like has + it been conducted? Do you know? Yeah, there is", "tokens": [51572, 27043, 5215, + 337, 341, 9284, 30, 1743, 575, 309, 668, 13809, 30, 1144, 291, 458, 30, 865, 11, + 456, 307, 51816], "temperature": 0.0, "avg_logprob": -0.2690470654031505, "compression_ratio": + 1.6209386281588447, "no_speech_prob": 0.2745043933391571}, {"id": 143, "seek": 95484, + "start": 954.84, "end": 957.96, "text": " some benchmarks that you''re going to + find in the read. I cannot have the", "tokens": [50364, 512, 43751, 300, 291, 434, + 516, 281, 915, 294, 264, 1401, 13, 286, 2644, 362, 264, 50520], "temperature": 0.0, + "avg_logprob": -0.2548856642639753, "compression_ratio": 1.5714285714285714, "no_speech_prob": + 0.012339537963271141}, {"id": 144, "seek": 95484, "start": 958.52, "end": 963.4, + "text": " numbers in my head right now. 
Yeah, but I think for portion of our audience + it''s going to be", "tokens": [50548, 3547, 294, 452, 1378, 558, 586, 13, 865, 11, + 457, 286, 519, 337, 8044, 295, 527, 4034, 309, 311, 516, 281, 312, 50792], "temperature": + 0.0, "avg_logprob": -0.2548856642639753, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.012339537963271141}, {"id": 145, "seek": 95484, "start": 963.4, + "end": 969.1600000000001, "text": " interesting to check out because as you know, + like actually my team just completed", "tokens": [50792, 1880, 281, 1520, 484, 570, + 382, 291, 458, 11, 411, 767, 452, 1469, 445, 7365, 51080], "temperature": 0.0, "avg_logprob": + -0.2548856642639753, "compression_ratio": 1.5714285714285714, "no_speech_prob": + 0.012339537963271141}, {"id": 146, "seek": 95484, "start": 970.6800000000001, "end": + 975.08, "text": " participation in big A&N. I don''t know if you heard about this + competition. So it''s like", "tokens": [51156, 13487, 294, 955, 316, 5, 45, 13, + 286, 500, 380, 458, 498, 291, 2198, 466, 341, 6211, 13, 407, 309, 311, 411, 51376], + "temperature": 0.0, "avg_logprob": -0.2548856642639753, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.012339537963271141}, {"id": 147, "seek": 95484, "start": 975.08, + "end": 981.32, "text": " Villion scale approximate near nearest neighbor search. + So we invented like a new algorithm", "tokens": [51376, 691, 11836, 4373, 30874, + 2651, 23831, 5987, 3164, 13, 407, 321, 14479, 411, 257, 777, 9284, 51688], "temperature": + 0.0, "avg_logprob": -0.2548856642639753, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.012339537963271141}, {"id": 148, "seek": 98132, "start": 981.32, + "end": 987.1600000000001, "text": " called BUDGPQ. I will also link in the show + notes like the blog post about it. 
So we increased", "tokens": [50364, 1219, 363, + 9438, 38, 47, 48, 13, 286, 486, 611, 2113, 294, 264, 855, 5570, 411, 264, 6968, + 2183, 466, 309, 13, 407, 321, 6505, 50656], "temperature": 0.0, "avg_logprob": -0.24560226650413022, + "compression_ratio": 1.6058091286307055, "no_speech_prob": 0.0063160983845591545}, + {"id": 149, "seek": 98132, "start": 987.1600000000001, "end": 997.72, "text": " + recall by 12% over FIES model. So yeah, FIES algorithm. So yeah, I think it''s great + that you guys", "tokens": [50656, 9901, 538, 2272, 4, 670, 479, 40, 2358, 2316, + 13, 407, 1338, 11, 479, 40, 2358, 9284, 13, 407, 1338, 11, 286, 519, 309, 311, 869, + 300, 291, 1074, 51184], "temperature": 0.0, "avg_logprob": -0.24560226650413022, + "compression_ratio": 1.6058091286307055, "no_speech_prob": 0.0063160983845591545}, + {"id": 150, "seek": 98132, "start": 997.72, "end": 1003.32, "text": " also inventing. + I don''t know if we are testing to this billion scale. I think we are more in the", + "tokens": [51184, 611, 7962, 278, 13, 286, 500, 380, 458, 498, 321, 366, 4997, 281, + 341, 5218, 4373, 13, 286, 519, 321, 366, 544, 294, 264, 51464], "temperature": 0.0, + "avg_logprob": -0.24560226650413022, "compression_ratio": 1.6058091286307055, "no_speech_prob": + 0.0063160983845591545}, {"id": 151, "seek": 98132, "start": 1003.32, "end": 1009.96, + "text": " million scale. Yeah, actually, we also ventured into billion scale, but + in the process we figured", "tokens": [51464, 2459, 4373, 13, 865, 11, 767, 11, + 321, 611, 6931, 3831, 666, 5218, 4373, 11, 457, 294, 264, 1399, 321, 8932, 51796], + "temperature": 0.0, "avg_logprob": -0.24560226650413022, "compression_ratio": 1.6058091286307055, + "no_speech_prob": 0.0063160983845591545}, {"id": 152, "seek": 100996, "start": 1010.12, + "end": 1014.84, "text": " out a solution for million scale. So it''s not for billion + years. 
We don''t know yet if we can", "tokens": [50372, 484, 257, 3827, 337, 2459, + 4373, 13, 407, 309, 311, 406, 337, 5218, 924, 13, 492, 500, 380, 458, 1939, 498, + 321, 393, 50608], "temperature": 0.0, "avg_logprob": -0.15596600236563846, "compression_ratio": + 1.6513409961685823, "no_speech_prob": 0.004739090800285339}, {"id": 153, "seek": + 100996, "start": 1014.84, "end": 1018.44, "text": " generalize to that level, but + I think we can with some additional research.", "tokens": [50608, 2674, 1125, 281, + 300, 1496, 11, 457, 286, 519, 321, 393, 365, 512, 4497, 2132, 13, 50788], "temperature": + 0.0, "avg_logprob": -0.15596600236563846, "compression_ratio": 1.6513409961685823, + "no_speech_prob": 0.004739090800285339}, {"id": 154, "seek": 100996, "start": 1019.88, + "end": 1023.48, "text": " Well, this is the first version. So for sure, we will + try to improve it.", "tokens": [50860, 1042, 11, 341, 307, 264, 700, 3037, 13, 407, + 337, 988, 11, 321, 486, 853, 281, 3470, 309, 13, 51040], "temperature": 0.0, "avg_logprob": + -0.15596600236563846, "compression_ratio": 1.6513409961685823, "no_speech_prob": + 0.004739090800285339}, {"id": 155, "seek": 100996, "start": 1024.1200000000001, + "end": 1030.3600000000001, "text": " Yeah, awesome. Awesome. This is great. And + have you also helped customers to like train models?", "tokens": [51072, 865, 11, + 3476, 13, 10391, 13, 639, 307, 869, 13, 400, 362, 291, 611, 4254, 4581, 281, 411, + 3847, 5245, 30, 51384], "temperature": 0.0, "avg_logprob": -0.15596600236563846, + "compression_ratio": 1.6513409961685823, "no_speech_prob": 0.004739090800285339}, + {"id": 156, "seek": 100996, "start": 1032.04, "end": 1037.0, "text": " No, but we + don''t, we didn''t help customers. 
Well, we did from our solution point of view, + but", "tokens": [51468, 883, 11, 457, 321, 500, 380, 11, 321, 994, 380, 854, 4581, + 13, 1042, 11, 321, 630, 490, 527, 3827, 935, 295, 1910, 11, 457, 51716], "temperature": + 0.0, "avg_logprob": -0.15596600236563846, "compression_ratio": 1.6513409961685823, + "no_speech_prob": 0.004739090800285339}, {"id": 157, "seek": 103700, "start": 1037.0, + "end": 1041.64, "text": " this is an interesting topic because this is something + that of the, this is one of the pains that", "tokens": [50364, 341, 307, 364, 1880, + 4829, 570, 341, 307, 746, 300, 295, 264, 11, 341, 307, 472, 295, 264, 29774, 300, + 50596], "temperature": 0.0, "avg_logprob": -0.23163244226476648, "compression_ratio": + 1.613733905579399, "no_speech_prob": 0.0018255988834425807}, {"id": 158, "seek": + 103700, "start": 1041.64, "end": 1049.72, "text": " we found quite often with our + users. Like it was easy for them to go to that level, 70% let''s say", "tokens": + [50596, 321, 1352, 1596, 2049, 365, 527, 5022, 13, 1743, 309, 390, 1858, 337, 552, + 281, 352, 281, 300, 1496, 11, 5285, 4, 718, 311, 584, 51000], "temperature": 0.0, + "avg_logprob": -0.23163244226476648, "compression_ratio": 1.613733905579399, "no_speech_prob": + 0.0018255988834425807}, {"id": 159, "seek": 103700, "start": 1049.72, "end": 1056.52, + "text": " of accuracy with any deep learning model that all these tech giants have + developed, right?", "tokens": [51000, 295, 14170, 365, 604, 2452, 2539, 2316, 300, + 439, 613, 7553, 31894, 362, 4743, 11, 558, 30, 51340], "temperature": 0.0, "avg_logprob": + -0.23163244226476648, "compression_ratio": 1.613733905579399, "no_speech_prob": + 0.0018255988834425807}, {"id": 160, "seek": 103700, "start": 1056.52, "end": 1063.0, + "text": " But we believe that this last mile, this transfer learning part is important. 
+ And we are,", "tokens": [51340, 583, 321, 1697, 300, 341, 1036, 12620, 11, 341, + 5003, 2539, 644, 307, 1021, 13, 400, 321, 366, 11, 51664], "temperature": 0.0, "avg_logprob": + -0.23163244226476648, "compression_ratio": 1.613733905579399, "no_speech_prob": + 0.0018255988834425807}, {"id": 161, "seek": 106300, "start": 1063.72, "end": 1070.92, + "text": " and when we realize we started this project that is we called, well, we + know it''s already released,", "tokens": [50400, 293, 562, 321, 4325, 321, 1409, + 341, 1716, 300, 307, 321, 1219, 11, 731, 11, 321, 458, 309, 311, 1217, 4736, 11, + 50760], "temperature": 0.0, "avg_logprob": -0.3508202573086353, "compression_ratio": + 1.6212765957446809, "no_speech_prob": 0.0016702304128557444}, {"id": 162, "seek": + 106300, "start": 1070.92, "end": 1078.52, "text": " the fine tuner. Maybe we can + share that as well, where we try to make it easy for users to", "tokens": [50760, + 264, 2489, 4267, 260, 13, 2704, 321, 393, 2073, 300, 382, 731, 11, 689, 321, 853, + 281, 652, 309, 1858, 337, 5022, 281, 51140], "temperature": 0.0, "avg_logprob": + -0.3508202573086353, "compression_ratio": 1.6212765957446809, "no_speech_prob": + 0.0016702304128557444}, {"id": 163, "seek": 106300, "start": 1079.48, "end": 1084.84, + "text": " fine tune their models for a metric learning search applications. And + they are, and it is also", "tokens": [51188, 2489, 10864, 641, 5245, 337, 257, 20678, + 2539, 3164, 5821, 13, 400, 436, 366, 11, 293, 309, 307, 611, 51456], "temperature": + 0.0, "avg_logprob": -0.3508202573086353, "compression_ratio": 1.6212765957446809, + "no_speech_prob": 0.0016702304128557444}, {"id": 164, "seek": 106300, "start": 1084.84, + "end": 1092.04, "text": " framework agnostic for, we support fighters, TensorFlow + and paddle, paddle. 
So we realized this", "tokens": [51456, 8388, 623, 77, 19634, + 337, 11, 321, 1406, 19714, 11, 37624, 293, 31834, 11, 31834, 13, 407, 321, 5334, + 341, 51816], "temperature": 0.0, "avg_logprob": -0.3508202573086353, "compression_ratio": + 1.6212765957446809, "no_speech_prob": 0.0016702304128557444}, {"id": 165, "seek": + 109204, "start": 1092.04, "end": 1100.12, "text": " pain point for the users that + once we have everything running at home, the quality was not as expected.", "tokens": + [50364, 1822, 935, 337, 264, 5022, 300, 1564, 321, 362, 1203, 2614, 412, 1280, 11, + 264, 3125, 390, 406, 382, 5176, 13, 50768], "temperature": 0.0, "avg_logprob": -0.2245252245948428, + "compression_ratio": 1.6262626262626263, "no_speech_prob": 0.0024672262370586395}, + {"id": 166, "seek": 109204, "start": 1100.12, "end": 1105.1599999999999, "text": + " And this, and we are trying to get to help the user in our ecosystem to get to + this,", "tokens": [50768, 400, 341, 11, 293, 321, 366, 1382, 281, 483, 281, 854, + 264, 4195, 294, 527, 11311, 281, 483, 281, 341, 11, 51020], "temperature": 0.0, + "avg_logprob": -0.2245252245948428, "compression_ratio": 1.6262626262626263, "no_speech_prob": + 0.0024672262370586395}, {"id": 167, "seek": 109204, "start": 1106.2, "end": 1109.24, + "text": " yeah, to this level by using this fine tuner.", "tokens": [51072, 1338, + 11, 281, 341, 1496, 538, 1228, 341, 2489, 4267, 260, 13, 51224], "temperature": + 0.0, "avg_logprob": -0.2245252245948428, "compression_ratio": 1.6262626262626263, + "no_speech_prob": 0.0024672262370586395}, {"id": 168, "seek": 109204, "start": 1110.2, + "end": 1114.68, "text": " So basically, can you can you explain a bit more about + fine tuner? 
Like basically what,", "tokens": [51272, 407, 1936, 11, 393, 291, 393, + 291, 2903, 257, 857, 544, 466, 2489, 4267, 260, 30, 1743, 1936, 437, 11, 51496], + "temperature": 0.0, "avg_logprob": -0.2245252245948428, "compression_ratio": 1.6262626262626263, + "no_speech_prob": 0.0024672262370586395}, {"id": 169, "seek": 111468, "start": 1115.16, + "end": 1119.0800000000002, "text": " what input do I need to provide as a user into + this?", "tokens": [50388, 437, 4846, 360, 286, 643, 281, 2893, 382, 257, 4195, 666, + 341, 30, 50584], "temperature": 0.0, "avg_logprob": -0.19164693078329398, "compression_ratio": + 1.5700934579439252, "no_speech_prob": 0.0013237111270427704}, {"id": 170, "seek": + 111468, "start": 1119.8, "end": 1126.76, "text": " So fine tuner, it could feel + similar to any fighter dataset, for instance, but we are trying to put", "tokens": + [50620, 407, 2489, 4267, 260, 11, 309, 727, 841, 2531, 281, 604, 15932, 28872, 11, + 337, 5197, 11, 457, 321, 366, 1382, 281, 829, 50968], "temperature": 0.0, "avg_logprob": + -0.19164693078329398, "compression_ratio": 1.5700934579439252, "no_speech_prob": + 0.0013237111270427704}, {"id": 171, "seek": 111468, "start": 1126.76, "end": 1132.68, + "text": " our documents as our as the main citizen of our ecosystem. So you have + to wrap your", "tokens": [50968, 527, 8512, 382, 527, 382, 264, 2135, 13326, 295, + 527, 11311, 13, 407, 291, 362, 281, 7019, 428, 51264], "temperature": 0.0, "avg_logprob": + -0.19164693078329398, "compression_ratio": 1.5700934579439252, "no_speech_prob": + 0.0013237111270427704}, {"id": 172, "seek": 111468, "start": 1133.4, "end": 1139.8, + "text": " any of your data into our document types, which is really easy. 
So it''s + something easy to learn and", "tokens": [51300, 604, 295, 428, 1412, 666, 527, 4166, + 3467, 11, 597, 307, 534, 1858, 13, 407, 309, 311, 746, 1858, 281, 1466, 293, 51620], + "temperature": 0.0, "avg_logprob": -0.19164693078329398, "compression_ratio": 1.5700934579439252, + "no_speech_prob": 0.0013237111270427704}, {"id": 173, "seek": 113980, "start": 1139.8, + "end": 1147.72, "text": " easy to use. And then you can fit your models and we have + made it easy for you to use the most", "tokens": [50364, 1858, 281, 764, 13, 400, + 550, 291, 393, 3318, 428, 5245, 293, 321, 362, 1027, 309, 1858, 337, 291, 281, 764, + 264, 881, 50760], "temperature": 0.0, "avg_logprob": -0.20860824584960938, "compression_ratio": + 1.6555555555555554, "no_speech_prob": 0.004577599931508303}, {"id": 174, "seek": + 113980, "start": 1147.72, "end": 1154.6, "text": " typical, those functions we are + trying to introduce, hard negative mining. We are trying to make it easy", "tokens": + [50760, 7476, 11, 729, 6828, 321, 366, 1382, 281, 5366, 11, 1152, 3671, 15512, 13, + 492, 366, 1382, 281, 652, 309, 1858, 51104], "temperature": 0.0, "avg_logprob": + -0.20860824584960938, "compression_ratio": 1.6555555555555554, "no_speech_prob": + 0.004577599931508303}, {"id": 175, "seek": 113980, "start": 1154.6, "end": 1163.8799999999999, + "text": " for everyone to solve the common problems when having, when training for, + for search applications.", "tokens": [51104, 337, 1518, 281, 5039, 264, 2689, 2740, + 562, 1419, 11, 562, 3097, 337, 11, 337, 3164, 5821, 13, 51568], "temperature": 0.0, + "avg_logprob": -0.20860824584960938, "compression_ratio": 1.6555555555555554, "no_speech_prob": + 0.004577599931508303}, {"id": 176, "seek": 116388, "start": 1163.88, "end": 1170.0400000000002, + "text": " And we are also trying to make an interactive labeler that helps you interactively + through an easy", "tokens": [50364, 400, 321, 366, 611, 1382, 281, 652, 364, 15141, + 7645, 260, 300, 3665, 291, 4648, 
3413, 807, 364, 1858, 50672], "temperature": 0.0, + "avg_logprob": -0.19627386150938092, "compression_ratio": 1.6150627615062763, "no_speech_prob": + 0.006436491850763559}, {"id": 177, "seek": 116388, "start": 1170.0400000000002, + "end": 1177.88, "text": " to use UI and tag similar objects so that you can go together + with them. Yeah, yeah, so like,", "tokens": [50672, 281, 764, 15682, 293, 6162, + 2531, 6565, 370, 300, 291, 393, 352, 1214, 365, 552, 13, 865, 11, 1338, 11, 370, + 411, 11, 51064], "temperature": 0.0, "avg_logprob": -0.19627386150938092, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.006436491850763559}, {"id": 178, "seek": + 116388, "start": 1179.0800000000002, "end": 1185.3200000000002, "text": " kind of, + I mean, fine tuning can be a pipeline by itself, right? In itself. So like, how + do you get", "tokens": [51124, 733, 295, 11, 286, 914, 11, 2489, 15164, 393, 312, + 257, 15517, 538, 2564, 11, 558, 30, 682, 2564, 13, 407, 411, 11, 577, 360, 291, + 483, 51436], "temperature": 0.0, "avg_logprob": -0.19627386150938092, "compression_ratio": + 1.6150627615062763, "no_speech_prob": 0.006436491850763559}, {"id": 179, "seek": + 116388, "start": 1185.88, "end": 1191.0800000000002, "text": " these data samples + that you want to fine tune on? 
And you might have them with full launch or", "tokens": + [51464, 613, 1412, 10938, 300, 291, 528, 281, 2489, 10864, 322, 30, 400, 291, 1062, + 362, 552, 365, 1577, 4025, 420, 51724], "temperature": 0.0, "avg_logprob": -0.19627386150938092, + "compression_ratio": 1.6150627615062763, "no_speech_prob": 0.006436491850763559}, + {"id": 180, "seek": 119108, "start": 1191.8, "end": 1197.96, "text": " during test, + after launch, and it''s like, you know, the cycle and flywheel of success, so to + say,", "tokens": [50400, 1830, 1500, 11, 934, 4025, 11, 293, 309, 311, 411, 11, + 291, 458, 11, 264, 6586, 293, 3603, 22830, 295, 2245, 11, 370, 281, 584, 11, 50708], + "temperature": 0.0, "avg_logprob": -0.23574898974730238, "compression_ratio": 1.6581196581196582, + "no_speech_prob": 0.006464001722633839}, {"id": 181, "seek": 119108, "start": 1198.6799999999998, + "end": 1205.32, "text": " right? So do you cover like the full workflow until production, + including production, or is it", "tokens": [50744, 558, 30, 407, 360, 291, 2060, + 411, 264, 1577, 20993, 1826, 4265, 11, 3009, 4265, 11, 420, 307, 309, 51076], "temperature": + 0.0, "avg_logprob": -0.23574898974730238, "compression_ratio": 1.6581196581196582, + "no_speech_prob": 0.006464001722633839}, {"id": 182, "seek": 119108, "start": 1205.32, + "end": 1213.3999999999999, "text": " like pre-production? So for now, we are using + the just embedding model. And just to get embeddings that", "tokens": [51076, 411, + 659, 12, 40827, 30, 407, 337, 586, 11, 321, 366, 1228, 264, 445, 12240, 3584, 2316, + 13, 400, 445, 281, 483, 12240, 29432, 300, 51480], "temperature": 0.0, "avg_logprob": + -0.23574898974730238, "compression_ratio": 1.6581196581196582, "no_speech_prob": + 0.006464001722633839}, {"id": 183, "seek": 119108, "start": 1213.3999999999999, + "end": 1219.72, "text": " get better semantics out of your data set of your specific + use case. 
But we are in a really", "tokens": [51480, 483, 1101, 4361, 45298, 484, + 295, 428, 1412, 992, 295, 428, 2685, 764, 1389, 13, 583, 321, 366, 294, 257, 534, + 51796], "temperature": 0.0, "avg_logprob": -0.23574898974730238, "compression_ratio": + 1.6581196581196582, "no_speech_prob": 0.006464001722633839}, {"id": 184, "seek": + 121972, "start": 1220.52, "end": 1225.24, "text": " thing, it''s easier to point + to release or something around in there, so there''s a long way to go.", "tokens": + [50404, 551, 11, 309, 311, 3571, 281, 935, 281, 4374, 420, 746, 926, 294, 456, 11, + 370, 456, 311, 257, 938, 636, 281, 352, 13, 50640], "temperature": 0.0, "avg_logprob": + -0.2805020986509717, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.004531599581241608}, {"id": 185, "seek": 121972, "start": 1225.72, "end": 1229.32, + "text": " Yeah, for sure. But I mean, the direction is fantastic because that''s + exactly what,", "tokens": [50664, 865, 11, 337, 988, 13, 583, 286, 914, 11, 264, + 3513, 307, 5456, 570, 300, 311, 2293, 437, 11, 50844], "temperature": 0.0, "avg_logprob": + -0.2805020986509717, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.004531599581241608}, {"id": 186, "seek": 121972, "start": 1229.32, "end": 1235.64, + "text": " what addresses the real need, any user, like fine tuning. 
Like it''s all + fancy to take like", "tokens": [50844, 437, 16862, 264, 957, 643, 11, 604, 4195, + 11, 411, 2489, 15164, 13, 1743, 309, 311, 439, 10247, 281, 747, 411, 51160], "temperature": + 0.0, "avg_logprob": -0.2805020986509717, "compression_ratio": 1.6981818181818182, + "no_speech_prob": 0.004531599581241608}, {"id": 187, "seek": 121972, "start": 1235.64, + "end": 1241.32, "text": " a hugging case model or whatever, but like fine tuning + it to the level when you''re users beloved,", "tokens": [51160, 257, 41706, 1389, + 2316, 420, 2035, 11, 457, 411, 2489, 15164, 309, 281, 264, 1496, 562, 291, 434, + 5022, 14553, 11, 51444], "temperature": 0.0, "avg_logprob": -0.2805020986509717, + "compression_ratio": 1.6981818181818182, "no_speech_prob": 0.004531599581241608}, + {"id": 188, "seek": 121972, "start": 1241.32, "end": 1248.2, "text": " that''s a + different story. Yeah, that sounds great. But I also wanted to come back to your, + like,", "tokens": [51444, 300, 311, 257, 819, 1657, 13, 865, 11, 300, 3263, 869, + 13, 583, 286, 611, 1415, 281, 808, 646, 281, 428, 11, 411, 11, 51788], "temperature": + 0.0, "avg_logprob": -0.2805020986509717, "compression_ratio": 1.6981818181818182, + "no_speech_prob": 0.004531599581241608}, {"id": 189, "seek": 124820, "start": 1248.2, + "end": 1253.72, "text": " you mentioned that Gina AI doesn''t kind of compare to + vector databases, but I do get sometimes", "tokens": [50364, 291, 2835, 300, 34711, + 7318, 1177, 380, 733, 295, 6794, 281, 8062, 22380, 11, 457, 286, 360, 483, 2171, + 50640], "temperature": 0.0, "avg_logprob": -0.1569671630859375, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 0.004010406788438559}, {"id": 190, "seek": + 124820, "start": 1253.72, "end": 1260.3600000000001, "text": " questions like how + do these systems compare to each other? 
And you may or may not know, I''ve", "tokens": + [50640, 1651, 411, 577, 360, 613, 3652, 6794, 281, 1184, 661, 30, 400, 291, 815, + 420, 815, 406, 458, 11, 286, 600, 50972], "temperature": 0.0, "avg_logprob": -0.1569671630859375, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.004010406788438559}, + {"id": 191, "seek": 124820, "start": 1260.3600000000001, "end": 1265.56, "text": + " blocked about all vector databases I knew to that point and turns out they''ve + been six and then", "tokens": [50972, 15470, 466, 439, 8062, 22380, 286, 2586, 281, + 300, 935, 293, 4523, 484, 436, 600, 668, 2309, 293, 550, 51232], "temperature": + 0.0, "avg_logprob": -0.1569671630859375, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.004010406788438559}, {"id": 192, "seek": 124820, "start": 1265.56, + "end": 1271.56, "text": " the seventh one knocked on the door, so it''s also now + on the blog. But I didn''t cover Gina AI,", "tokens": [51232, 264, 17875, 472, 16914, + 322, 264, 2853, 11, 370, 309, 311, 611, 586, 322, 264, 6968, 13, 583, 286, 994, + 380, 2060, 34711, 7318, 11, 51532], "temperature": 0.0, "avg_logprob": -0.1569671630859375, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.004010406788438559}, + {"id": 193, "seek": 127156, "start": 1271.56, "end": 1280.6799999999998, "text": + " I didn''t cover deep sets haystack because I thought that Gina and haystack, they''re + like layers", "tokens": [50364, 286, 994, 380, 2060, 2452, 6352, 4842, 372, 501, + 570, 286, 1194, 300, 34711, 293, 4842, 372, 501, 11, 436, 434, 411, 7914, 50820], + "temperature": 0.0, "avg_logprob": -0.18398407178047377, "compression_ratio": 1.4871794871794872, + "no_speech_prob": 0.003168991534039378}, {"id": 194, "seek": 127156, "start": 1280.6799999999998, + "end": 1288.6, "text": " above a vector database. Is that the right thinking? Yes, + I think it makes sense. 
We are, we might", "tokens": [50820, 3673, 257, 8062, 8149, + 13, 1119, 300, 264, 558, 1953, 30, 1079, 11, 286, 519, 309, 1669, 2020, 13, 492, + 366, 11, 321, 1062, 51216], "temperature": 0.0, "avg_logprob": -0.18398407178047377, + "compression_ratio": 1.4871794871794872, "no_speech_prob": 0.003168991534039378}, + {"id": 195, "seek": 127156, "start": 1288.6, "end": 1296.6, "text": " try to develop + our solutions for the use cases that we may feel more worth. So that is, I mean,", + "tokens": [51216, 853, 281, 1499, 527, 6547, 337, 264, 764, 3331, 300, 321, 815, + 841, 544, 3163, 13, 407, 300, 307, 11, 286, 914, 11, 51616], "temperature": 0.0, + "avg_logprob": -0.18398407178047377, "compression_ratio": 1.4871794871794872, "no_speech_prob": + 0.003168991534039378}, {"id": 196, "seek": 129660, "start": 1296.6, "end": 1303.56, + "text": " the one is out there to do, but yeah, I think it''s right. We are trying + to, I think, vector databases", "tokens": [50364, 264, 472, 307, 484, 456, 281, + 360, 11, 457, 1338, 11, 286, 519, 309, 311, 558, 13, 492, 366, 1382, 281, 11, 286, + 519, 11, 8062, 22380, 50712], "temperature": 0.0, "avg_logprob": -0.18911768465625997, + "compression_ratio": 1.7630331753554502, "no_speech_prob": 0.0485207661986351}, + {"id": 197, "seek": 129660, "start": 1303.56, "end": 1309.0, "text": " cover one + of the parts or one of the challenges, maybe one of the main challenges of vector + search", "tokens": [50712, 2060, 472, 295, 264, 3166, 420, 472, 295, 264, 4759, + 11, 1310, 472, 295, 264, 2135, 4759, 295, 8062, 3164, 50984], "temperature": 0.0, + "avg_logprob": -0.18911768465625997, "compression_ratio": 1.7630331753554502, "no_speech_prob": + 0.0485207661986351}, {"id": 198, "seek": 129660, "start": 1309.0, "end": 1313.9599999999998, + "text": " or neural search, but we try to see the whole scope and the whole pipeline. 
+ So,", "tokens": [50984, 420, 18161, 3164, 11, 457, 321, 853, 281, 536, 264, 1379, + 11923, 293, 264, 1379, 15517, 13, 407, 11, 51232], "temperature": 0.0, "avg_logprob": + -0.18911768465625997, "compression_ratio": 1.7630331753554502, "no_speech_prob": + 0.0485207661986351}, {"id": 199, "seek": 129660, "start": 1315.08, "end": 1321.7199999999998, + "text": " in Gina, we can use, you can wrap some client that will use any of the + big vector searches,", "tokens": [51288, 294, 34711, 11, 321, 393, 764, 11, 291, + 393, 7019, 512, 6423, 300, 486, 764, 604, 295, 264, 955, 8062, 26701, 11, 51620], + "temperature": 0.0, "avg_logprob": -0.18911768465625997, "compression_ratio": 1.7630331753554502, + "no_speech_prob": 0.0485207661986351}, {"id": 200, "seek": 132172, "start": 1321.72, + "end": 1326.76, "text": " big data research of how there have you done any integration + with some vector database?", "tokens": [50364, 955, 1412, 2132, 295, 577, 456, 362, + 291, 1096, 604, 10980, 365, 512, 8062, 8149, 30, 50616], "temperature": 0.0, "avg_logprob": + -0.2602545304731889, "compression_ratio": 1.678030303030303, "no_speech_prob": 0.004188275430351496}, + {"id": 201, "seek": 132172, "start": 1328.1200000000001, "end": 1333.16, "text": + " Not ourselves right now, but it would be, we might do it in the future.", "tokens": + [50684, 1726, 4175, 558, 586, 11, 457, 309, 576, 312, 11, 321, 1062, 360, 309, 294, + 264, 2027, 13, 50936], "temperature": 0.0, "avg_logprob": -0.2602545304731889, "compression_ratio": + 1.678030303030303, "no_speech_prob": 0.004188275430351496}, {"id": 202, "seek": + 132172, "start": 1333.16, "end": 1339.0, "text": " Okay, yeah, because for now, + you did mention that you offer GNN and algorithms, which to me", "tokens": [50936, + 1033, 11, 1338, 11, 570, 337, 586, 11, 291, 630, 2152, 300, 291, 2626, 460, 45, + 45, 293, 14642, 11, 597, 281, 385, 51228], "temperature": 0.0, "avg_logprob": -0.2602545304731889, + "compression_ratio": 1.678030303030303, 
"no_speech_prob": 0.004188275430351496}, + {"id": 203, "seek": 132172, "start": 1339.0, "end": 1344.28, "text": " sounds like + a core building block of vector database, but then of course in vector database,", + "tokens": [51228, 3263, 411, 257, 4965, 2390, 3461, 295, 8062, 8149, 11, 457, 550, + 295, 1164, 294, 8062, 8149, 11, 51492], "temperature": 0.0, "avg_logprob": -0.2602545304731889, + "compression_ratio": 1.678030303030303, "no_speech_prob": 0.004188275430351496}, + {"id": 204, "seek": 132172, "start": 1344.28, "end": 1349.4, "text": " you have + many more things, right? Like, where do you store objects? How you store them? What + about", "tokens": [51492, 291, 362, 867, 544, 721, 11, 558, 30, 1743, 11, 689, 360, + 291, 3531, 6565, 30, 1012, 291, 3531, 552, 30, 708, 466, 51748], "temperature": + 0.0, "avg_logprob": -0.2602545304731889, "compression_ratio": 1.678030303030303, + "no_speech_prob": 0.004188275430351496}, {"id": 205, "seek": 134940, "start": 1349.4, + "end": 1358.2800000000002, "text": " filters and so on? 
But we are trying to cover + from the, for instance, we are not some,", "tokens": [50364, 15995, 293, 370, 322, + 30, 583, 321, 366, 1382, 281, 2060, 490, 264, 11, 337, 5197, 11, 321, 366, 406, + 512, 11, 50808], "temperature": 0.0, "avg_logprob": -0.2384679424228953, "compression_ratio": + 1.4943181818181819, "no_speech_prob": 0.0033319646026939154}, {"id": 206, "seek": + 134940, "start": 1358.2800000000002, "end": 1364.2, "text": " some people for some + use cases, and just exactly as neighbor search might work just fine,", "tokens": + [50808, 512, 561, 337, 512, 764, 3331, 11, 293, 445, 2293, 382, 5987, 3164, 1062, + 589, 445, 2489, 11, 51104], "temperature": 0.0, "avg_logprob": -0.2384679424228953, + "compression_ratio": 1.4943181818181819, "no_speech_prob": 0.0033319646026939154}, + {"id": 207, "seek": 134940, "start": 1364.2, "end": 1371.24, "text": " and they + don''t need to worry about configuring fancy A&N models for their recall speed", + "tokens": [51104, 293, 436, 500, 380, 643, 281, 3292, 466, 6662, 1345, 10247, 316, + 5, 45, 5245, 337, 641, 9901, 3073, 51456], "temperature": 0.0, "avg_logprob": -0.2384679424228953, + "compression_ratio": 1.4943181818181819, "no_speech_prob": 0.0033319646026939154}, + {"id": 208, "seek": 137124, "start": 1372.2, "end": 1379.56, "text": " requirements. + So, I think there is room for everyone. So, I think it just, you have to offer", + "tokens": [50412, 7728, 13, 407, 11, 286, 519, 456, 307, 1808, 337, 1518, 13, 407, + 11, 286, 519, 309, 445, 11, 291, 362, 281, 2626, 50780], "temperature": 0.0, "avg_logprob": + -0.2002204123963701, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.011460872367024422}, {"id": 209, "seek": 137124, "start": 1380.2, "end": 1385.32, + "text": " what is right for the right use case and the right need. Yeah, of course. 
+ And by the way,", "tokens": [50812, 437, 307, 558, 337, 264, 558, 764, 1389, 293, + 264, 558, 643, 13, 865, 11, 295, 1164, 13, 400, 538, 264, 636, 11, 51068], "temperature": + 0.0, "avg_logprob": -0.2002204123963701, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.011460872367024422}, {"id": 210, "seek": 137124, "start": 1385.32, + "end": 1392.36, "text": " what''s the core programming language used in Gina? So, + our core programming language is Python,", "tokens": [51068, 437, 311, 264, 4965, + 9410, 2856, 1143, 294, 34711, 30, 407, 11, 527, 4965, 9410, 2856, 307, 15329, 11, + 51420], "temperature": 0.0, "avg_logprob": -0.2002204123963701, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.011460872367024422}, {"id": 211, "seek": + 137124, "start": 1392.36, "end": 1399.88, "text": " because we are more like, since + we are this pipeline and we are like a glue ecosystem,", "tokens": [51420, 570, + 321, 366, 544, 411, 11, 1670, 321, 366, 341, 15517, 293, 321, 366, 411, 257, 8998, + 11311, 11, 51796], "temperature": 0.0, "avg_logprob": -0.2002204123963701, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.011460872367024422}, {"id": 212, "seek": + 139988, "start": 1399.96, "end": 1407.5600000000002, "text": " most of our operations + are wrapping models that run in optimized languages or something,", "tokens": [50368, + 881, 295, 527, 7705, 366, 21993, 5245, 300, 1190, 294, 26941, 8650, 420, 746, 11, + 50748], "temperature": 0.0, "avg_logprob": -0.15994265805120053, "compression_ratio": + 1.543103448275862, "no_speech_prob": 0.002710112603381276}, {"id": 213, "seek": + 139988, "start": 1407.5600000000002, "end": 1413.8000000000002, "text": " and that + also Python helps us to iterate really fast, which other languages might slow us + down.", "tokens": [50748, 293, 300, 611, 15329, 3665, 505, 281, 44497, 534, 2370, + 11, 597, 661, 8650, 1062, 2964, 505, 760, 13, 51060], "temperature": 0.0, "avg_logprob": + 
-0.15994265805120053, "compression_ratio": 1.543103448275862, "no_speech_prob": + 0.002710112603381276}, {"id": 214, "seek": 139988, "start": 1414.3600000000001, + "end": 1419.96, "text": " Yeah, that''s true. And does it also apply to the A&N + algorithms that you mentioned,", "tokens": [51088, 865, 11, 300, 311, 2074, 13, + 400, 775, 309, 611, 3079, 281, 264, 316, 5, 45, 14642, 300, 291, 2835, 11, 51368], + "temperature": 0.0, "avg_logprob": -0.15994265805120053, "compression_ratio": 1.543103448275862, + "no_speech_prob": 0.002710112603381276}, {"id": 215, "seek": 139988, "start": 1419.96, + "end": 1427.8000000000002, "text": " like BQLite? Is it also Python? I don''t know + if we are, for instance, I think we are also", "tokens": [51368, 411, 363, 48, 43, + 642, 30, 1119, 309, 611, 15329, 30, 286, 500, 380, 458, 498, 321, 366, 11, 337, + 5197, 11, 286, 519, 321, 366, 611, 51760], "temperature": 0.0, "avg_logprob": -0.15994265805120053, + "compression_ratio": 1.543103448275862, "no_speech_prob": 0.002710112603381276}, + {"id": 216, "seek": 142780, "start": 1427.8799999999999, "end": 1437.56, "text": + " using some bindings for H&N. So, you are using probably C++ version of H&N SW + binding to Python,", "tokens": [50368, 1228, 512, 14786, 1109, 337, 389, 5, 45, + 13, 407, 11, 291, 366, 1228, 1391, 383, 25472, 3037, 295, 389, 5, 45, 20346, 17359, + 281, 15329, 11, 50852], "temperature": 0.0, "avg_logprob": -0.28086015913221574, + "compression_ratio": 1.4793814432989691, "no_speech_prob": 0.003570443019270897}, + {"id": 217, "seek": 142780, "start": 1437.56, "end": 1444.2, "text": " right? Yes, + that''s for sure. 
But I don''t know if some of, for the H&N SWD, yes, for some other", + "tokens": [50852, 558, 30, 1079, 11, 300, 311, 337, 988, 13, 583, 286, 500, 380, + 458, 498, 512, 295, 11, 337, 264, 389, 5, 45, 20346, 35, 11, 2086, 11, 337, 512, + 661, 51184], "temperature": 0.0, "avg_logprob": -0.28086015913221574, "compression_ratio": + 1.4793814432989691, "no_speech_prob": 0.003570443019270897}, {"id": 218, "seek": + 142780, "start": 1444.2, "end": 1453.0, "text": " parts, I don''t know, we are trying + to optimize whatever we find. Yeah, but it sounds cool that,", "tokens": [51184, + 3166, 11, 286, 500, 380, 458, 11, 321, 366, 1382, 281, 19719, 2035, 321, 915, 13, + 865, 11, 457, 309, 3263, 1627, 300, 11, 51624], "temperature": 0.0, "avg_logprob": + -0.28086015913221574, "compression_ratio": 1.4793814432989691, "no_speech_prob": + 0.003570443019270897}, {"id": 219, "seek": 145300, "start": 1453.0, "end": 1459.88, + "text": " you know, if we still continue kind of this comparison a little bit between + Gina and vector databases,", "tokens": [50364, 291, 458, 11, 498, 321, 920, 2354, + 733, 295, 341, 9660, 257, 707, 857, 1296, 34711, 293, 8062, 22380, 11, 50708], "temperature": + 0.0, "avg_logprob": -0.3140570660854908, "compression_ratio": 1.5360360360360361, + "no_speech_prob": 0.010131534188985825}, {"id": 220, "seek": 145300, "start": 1460.84, + "end": 1464.68, "text": " like vector databases, if you pick them, let''s say BIAV8 + is implemented in Go,", "tokens": [50756, 411, 8062, 22380, 11, 498, 291, 1888, + 552, 11, 718, 311, 584, 363, 6914, 53, 23, 307, 12270, 294, 1037, 11, 50948], "temperature": + 0.0, "avg_logprob": -0.3140570660854908, "compression_ratio": 1.5360360360360361, + "no_speech_prob": 0.010131534188985825}, {"id": 221, "seek": 145300, "start": 1466.36, + "end": 1471.56, "text": " what grant is implemented in Rust? So, these are compiling + languages, right? 
So,", "tokens": [51032, 437, 6386, 307, 12270, 294, 34952, 30, + 407, 11, 613, 366, 715, 4883, 8650, 11, 558, 30, 407, 11, 51292], "temperature": + 0.0, "avg_logprob": -0.3140570660854908, "compression_ratio": 1.5360360360360361, + "no_speech_prob": 0.010131534188985825}, {"id": 222, "seek": 145300, "start": 1472.6, + "end": 1481.72, "text": " VESPA is like Java plus some C, I think, C++ and mostly + Java. So, like, nobody", "tokens": [51344, 691, 2358, 10297, 307, 411, 10745, 1804, + 512, 383, 11, 286, 519, 11, 383, 25472, 293, 5240, 10745, 13, 407, 11, 411, 11, + 5079, 51800], "temperature": 0.0, "avg_logprob": -0.3140570660854908, "compression_ratio": + 1.5360360360360361, "no_speech_prob": 0.010131534188985825}, {"id": 223, "seek": + 148172, "start": 1481.72, "end": 1488.04, "text": " implements the vector search + in pure Python, because it''s very, it''s going to be very", "tokens": [50364, 704, + 17988, 264, 8062, 3164, 294, 6075, 15329, 11, 570, 309, 311, 588, 11, 309, 311, + 516, 281, 312, 588, 50680], "temperature": 0.0, "avg_logprob": -0.265682578086853, + "compression_ratio": 1.6088888888888888, "no_speech_prob": 0.0032989559695124626}, + {"id": 224, "seek": 148172, "start": 1488.04, "end": 1493.72, "text": " taxing on + the latency, you know? Sure. 
No, but the expensive operation, we are not running.", + "tokens": [50680, 3366, 278, 322, 264, 27043, 11, 291, 458, 30, 4894, 13, 883, 11, + 457, 264, 5124, 6916, 11, 321, 366, 406, 2614, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.265682578086853, "compression_ratio": 1.6088888888888888, "no_speech_prob": 0.0032989559695124626}, + {"id": 225, "seek": 148172, "start": 1493.72, "end": 1499.56, "text": " So, for + instance, the nearest neighbor search we are doing, we are based on NAMPA operations,", + "tokens": [50964, 407, 11, 337, 5197, 11, 264, 23831, 5987, 3164, 321, 366, 884, + 11, 321, 366, 2361, 322, 426, 2865, 10297, 7705, 11, 51256], "temperature": 0.0, + "avg_logprob": -0.265682578086853, "compression_ratio": 1.6088888888888888, "no_speech_prob": + 0.0032989559695124626}, {"id": 226, "seek": 148172, "start": 1499.56, "end": 1504.2, + "text": " which are optimized at NAMPA level, and the approximations neighbors, + I think, most of the", "tokens": [51256, 597, 366, 26941, 412, 426, 2865, 10297, + 1496, 11, 293, 264, 8542, 763, 12512, 11, 286, 519, 11, 881, 295, 264, 51488], "temperature": + 0.0, "avg_logprob": -0.265682578086853, "compression_ratio": 1.6088888888888888, + "no_speech_prob": 0.0032989559695124626}, {"id": 227, "seek": 150420, "start": 1504.68, + "end": 1511.0800000000002, "text": " heavy lifting is done on the C++ level from, + I''m just covering our bindings.", "tokens": [50388, 4676, 15798, 307, 1096, 322, + 264, 383, 25472, 1496, 490, 11, 286, 478, 445, 10322, 527, 14786, 1109, 13, 50708], + "temperature": 0.0, "avg_logprob": -0.24465321559532016, "compression_ratio": 1.5338983050847457, + "no_speech_prob": 0.008307828567922115}, {"id": 228, "seek": 150420, "start": 1511.88, + "end": 1517.72, "text": " Yeah, and I''m still curious about BQ Lite, like, is it + the C or is it Python, but I think we need", "tokens": [50748, 865, 11, 293, 286, + 478, 920, 6369, 466, 363, 48, 32986, 11, 411, 11, 307, 309, 264, 383, 420, 307, + 309, 
15329, 11, 457, 286, 519, 321, 643, 51040], "temperature": 0.0, "avg_logprob": + -0.24465321559532016, "compression_ratio": 1.5338983050847457, "no_speech_prob": + 0.008307828567922115}, {"id": 229, "seek": 150420, "start": 1517.72, "end": 1524.1200000000001, + "text": " to check the documentation. Yes. Yeah, I''m curious because like, I''ve + actually invented a new", "tokens": [51040, 281, 1520, 264, 14333, 13, 1079, 13, + 865, 11, 286, 478, 6369, 570, 411, 11, 286, 600, 767, 14479, 257, 777, 51360], "temperature": + 0.0, "avg_logprob": -0.24465321559532016, "compression_ratio": 1.5338983050847457, + "no_speech_prob": 0.008307828567922115}, {"id": 230, "seek": 150420, "start": 1524.1200000000001, + "end": 1529.96, "text": " algorithm in NAMPA search, but I haven''t published it + widely, it''s open source, but I haven''t", "tokens": [51360, 9284, 294, 426, 2865, + 10297, 3164, 11, 457, 286, 2378, 380, 6572, 309, 13371, 11, 309, 311, 1269, 4009, + 11, 457, 286, 2378, 380, 51652], "temperature": 0.0, "avg_logprob": -0.24465321559532016, + "compression_ratio": 1.5338983050847457, "no_speech_prob": 0.008307828567922115}, + {"id": 231, "seek": 152996, "start": 1530.6000000000001, "end": 1535.56, "text": + " done the thorough benchmarking. 
And what I''ve faced is that, you know, like, + in Python, even though", "tokens": [50396, 1096, 264, 12934, 18927, 278, 13, 400, + 437, 286, 600, 11446, 307, 300, 11, 291, 458, 11, 411, 11, 294, 15329, 11, 754, + 1673, 50644], "temperature": 0.0, "avg_logprob": -0.14417409896850586, "compression_ratio": + 1.6213991769547325, "no_speech_prob": 0.007663533557206392}, {"id": 232, "seek": + 152996, "start": 1535.56, "end": 1542.28, "text": " I optimized all parts of the + algorithm, I''m using preallocation and NAMPA, it still runs out of", "tokens": + [50644, 286, 26941, 439, 3166, 295, 264, 9284, 11, 286, 478, 1228, 659, 336, 27943, + 293, 426, 2865, 10297, 11, 309, 920, 6676, 484, 295, 50980], "temperature": 0.0, + "avg_logprob": -0.14417409896850586, "compression_ratio": 1.6213991769547325, "no_speech_prob": + 0.007663533557206392}, {"id": 233, "seek": 152996, "start": 1542.28, "end": 1548.52, + "text": " memory, runs out of memory as in it leaks memory, and it doesn''t explain, + like, Python virtual machine", "tokens": [50980, 4675, 11, 6676, 484, 295, 4675, + 382, 294, 309, 28885, 4675, 11, 293, 309, 1177, 380, 2903, 11, 411, 11, 15329, 6374, + 3479, 51292], "temperature": 0.0, "avg_logprob": -0.14417409896850586, "compression_ratio": + 1.6213991769547325, "no_speech_prob": 0.007663533557206392}, {"id": 234, "seek": + 152996, "start": 1548.52, "end": 1552.28, "text": " doesn''t tell you where, like, + you don''t have tools. Okay, there are some tools, but they''re not", "tokens": + [51292, 1177, 380, 980, 291, 689, 11, 411, 11, 291, 500, 380, 362, 3873, 13, 1033, + 11, 456, 366, 512, 3873, 11, 457, 436, 434, 406, 51480], "temperature": 0.0, "avg_logprob": + -0.14417409896850586, "compression_ratio": 1.6213991769547325, "no_speech_prob": + 0.007663533557206392}, {"id": 235, "seek": 155228, "start": 1552.28, "end": 1558.68, + "text": " useful. Like, no, you''re showing a little stuff. No. 
So, and I''ve been + like a little bit like", "tokens": [50364, 4420, 13, 1743, 11, 572, 11, 291, 434, + 4099, 257, 707, 1507, 13, 883, 13, 407, 11, 293, 286, 600, 668, 411, 257, 707, 857, + 411, 50684], "temperature": 0.0, "avg_logprob": -0.2797810993497334, "compression_ratio": + 1.6178571428571429, "no_speech_prob": 0.02168191783130169}, {"id": 236, "seek": + 155228, "start": 1558.68, "end": 1564.2, "text": " desperate, and I''ve been thinking, + okay, should I now move into RAST GO territory, which might be", "tokens": [50684, + 17601, 11, 293, 286, 600, 668, 1953, 11, 1392, 11, 820, 286, 586, 1286, 666, 497, + 20398, 10365, 11360, 11, 597, 1062, 312, 50960], "temperature": 0.0, "avg_logprob": + -0.2797810993497334, "compression_ratio": 1.6178571428571429, "no_speech_prob": + 0.02168191783130169}, {"id": 237, "seek": 155228, "start": 1564.2, "end": 1568.92, + "text": " a little bit more dangerous, like, even though I do have some experience + in C++, but you know, like,", "tokens": [50960, 257, 707, 857, 544, 5795, 11, 411, + 11, 754, 1673, 286, 360, 362, 512, 1752, 294, 383, 25472, 11, 457, 291, 458, 11, + 411, 11, 51196], "temperature": 0.0, "avg_logprob": -0.2797810993497334, "compression_ratio": + 1.6178571428571429, "no_speech_prob": 0.02168191783130169}, {"id": 238, "seek": + 155228, "start": 1568.92, "end": 1572.12, "text": " do I want to go there now? 
Like, + Python is much more comfortable.", "tokens": [51196, 360, 286, 528, 281, 352, 456, + 586, 30, 1743, 11, 15329, 307, 709, 544, 4619, 13, 51356], "temperature": 0.0, "avg_logprob": + -0.2797810993497334, "compression_ratio": 1.6178571428571429, "no_speech_prob": + 0.02168191783130169}, {"id": 239, "seek": 155228, "start": 1573.16, "end": 1580.92, + "text": " The, I think, depends on the later you are working with, and it''s, so + I think that by offering", "tokens": [51408, 440, 11, 286, 519, 11, 5946, 322, 264, + 1780, 291, 366, 1364, 365, 11, 293, 309, 311, 11, 370, 286, 519, 300, 538, 8745, + 51796], "temperature": 0.0, "avg_logprob": -0.2797810993497334, "compression_ratio": + 1.6178571428571429, "no_speech_prob": 0.02168191783130169}, {"id": 240, "seek": + 158092, "start": 1580.92, "end": 1588.2, "text": " Python APIs in the field, if + machine learning will attract, then we''ll make everybody much easier to use.", + "tokens": [50364, 15329, 21445, 294, 264, 2519, 11, 498, 3479, 2539, 486, 5049, + 11, 550, 321, 603, 652, 2201, 709, 3571, 281, 764, 13, 50728], "temperature": 0.0, + "avg_logprob": -0.1616580287615458, "compression_ratio": 1.6056910569105691, "no_speech_prob": + 0.0022548306733369827}, {"id": 241, "seek": 158092, "start": 1588.92, "end": 1597.0800000000002, + "text": " Then if you get API rights, the API is right, you might then bind it to + whatever of your favorite", "tokens": [50764, 1396, 498, 291, 483, 9362, 4601, 11, + 264, 9362, 307, 558, 11, 291, 1062, 550, 14786, 309, 281, 2035, 295, 428, 2954, + 51172], "temperature": 0.0, "avg_logprob": -0.1616580287615458, "compression_ratio": + 1.6056910569105691, "no_speech_prob": 0.0022548306733369827}, {"id": 242, "seek": + 158092, "start": 1597.0800000000002, "end": 1602.68, "text": " languages, but I + think getting the comfortable API for that developer to use and to love using", + "tokens": [51172, 8650, 11, 457, 286, 519, 1242, 264, 4619, 9362, 337, 300, 10754, + 281, 764, 293, 281, 
959, 1228, 51452], "temperature": 0.0, "avg_logprob": -0.1616580287615458, + "compression_ratio": 1.6056910569105691, "no_speech_prob": 0.0022548306733369827}, + {"id": 243, "seek": 158092, "start": 1602.68, "end": 1609.0800000000002, "text": + " is one of the key first steps. So, do you invest a lot into building these APIs? + Can you give an", "tokens": [51452, 307, 472, 295, 264, 2141, 700, 4439, 13, 407, + 11, 360, 291, 1963, 257, 688, 666, 2390, 613, 21445, 30, 1664, 291, 976, 364, 51772], + "temperature": 0.0, "avg_logprob": -0.1616580287615458, "compression_ratio": 1.6056910569105691, + "no_speech_prob": 0.0022548306733369827}, {"id": 244, "seek": 160908, "start": 1609.08, + "end": 1615.3999999999999, "text": " example of like some API within Gina that kind + of makes the workflow easier for?", "tokens": [50364, 1365, 295, 411, 512, 9362, + 1951, 34711, 300, 733, 295, 1669, 264, 20993, 3571, 337, 30, 50680], "temperature": + 0.0, "avg_logprob": -0.1389803575432819, "compression_ratio": 1.6478260869565218, + "no_speech_prob": 0.000956216361373663}, {"id": 245, "seek": 160908, "start": 1615.3999999999999, + "end": 1620.6, "text": " So, for instance, we are trying to improve a lot in this + document. 
So, documents are our central", "tokens": [50680, 407, 11, 337, 5197, + 11, 321, 366, 1382, 281, 3470, 257, 688, 294, 341, 4166, 13, 407, 11, 8512, 366, + 527, 5777, 50940], "temperature": 0.0, "avg_logprob": -0.1389803575432819, "compression_ratio": + 1.6478260869565218, "no_speech_prob": 0.000956216361373663}, {"id": 246, "seek": + 160908, "start": 1620.6, "end": 1627.8, "text": " logic, and documents are raised + that these are the two core members of our family in the ecosystem.", "tokens": + [50940, 9952, 11, 293, 8512, 366, 6005, 300, 613, 366, 264, 732, 4965, 2679, 295, + 527, 1605, 294, 264, 11311, 13, 51300], "temperature": 0.0, "avg_logprob": -0.1389803575432819, + "compression_ratio": 1.6478260869565218, "no_speech_prob": 0.000956216361373663}, + {"id": 247, "seek": 160908, "start": 1627.8, "end": 1636.28, "text": " So, we are + spending a lot of time on making them easy to use. For instance, with this fluent + pattern,", "tokens": [51300, 407, 11, 321, 366, 6434, 257, 688, 295, 565, 322, 1455, + 552, 1858, 281, 764, 13, 1171, 5197, 11, 365, 341, 40799, 5102, 11, 51724], "temperature": + 0.0, "avg_logprob": -0.1389803575432819, "compression_ratio": 1.6478260869565218, + "no_speech_prob": 0.000956216361373663}, {"id": 248, "seek": 163628, "start": 1636.28, + "end": 1641.48, "text": " we are trying to invest a lot of time on finding the best + way, the more Python way to work on it.", "tokens": [50364, 321, 366, 1382, 281, + 1963, 257, 688, 295, 565, 322, 5006, 264, 1151, 636, 11, 264, 544, 15329, 636, 281, + 589, 322, 309, 13, 50624], "temperature": 0.0, "avg_logprob": -0.24750641414097377, + "compression_ratio": 1.6008583690987124, "no_speech_prob": 0.005503923632204533}, + {"id": 249, "seek": 163628, "start": 1642.84, "end": 1647.6399999999999, "text": + " Yeah, so it''s a constant evolution, try and error. 
Yeah, of course, but it''s,", + "tokens": [50692, 865, 11, 370, 309, 311, 257, 5754, 9303, 11, 853, 293, 6713, 13, + 865, 11, 295, 1164, 11, 457, 309, 311, 11, 50932], "temperature": 0.0, "avg_logprob": + -0.24750641414097377, "compression_ratio": 1.6008583690987124, "no_speech_prob": + 0.005503923632204533}, {"id": 250, "seek": 163628, "start": 1648.44, "end": 1654.44, + "text": " it''s like APIs is like exactly that layer, which is essentially like + facing the customer, right?", "tokens": [50972, 309, 311, 411, 21445, 307, 411, + 2293, 300, 4583, 11, 597, 307, 4476, 411, 7170, 264, 5474, 11, 558, 30, 51272], + "temperature": 0.0, "avg_logprob": -0.24750641414097377, "compression_ratio": 1.6008583690987124, + "no_speech_prob": 0.005503923632204533}, {"id": 251, "seek": 163628, "start": 1654.44, + "end": 1661.56, "text": " And you don''t know the scenarios they will use it in, + and sometimes they might kind of surprise you,", "tokens": [51272, 400, 291, 500, + 380, 458, 264, 15077, 436, 486, 764, 309, 294, 11, 293, 2171, 436, 1062, 733, 295, + 6365, 291, 11, 51628], "temperature": 0.0, "avg_logprob": -0.24750641414097377, + "compression_ratio": 1.6008583690987124, "no_speech_prob": 0.005503923632204533}, + {"id": 252, "seek": 166156, "start": 1661.56, "end": 1666.9199999999998, "text": + " or they might say, okay, I found some work around for your like missing parts, + but then you think,", "tokens": [50364, 420, 436, 1062, 584, 11, 1392, 11, 286, + 1352, 512, 589, 926, 337, 428, 411, 5361, 3166, 11, 457, 550, 291, 519, 11, 50632], + "temperature": 0.0, "avg_logprob": -0.13931622551482858, "compression_ratio": 1.6047430830039526, + "no_speech_prob": 0.0029693488031625748}, {"id": 253, "seek": 166156, "start": 1666.9199999999998, + "end": 1673.3999999999999, "text": " okay, I didn''t think about it, right? 
The + API layer is a fantastic way of talking to your client through", "tokens": [50632, + 1392, 11, 286, 994, 380, 519, 466, 309, 11, 558, 30, 440, 9362, 4583, 307, 257, + 5456, 636, 295, 1417, 281, 428, 6423, 807, 50956], "temperature": 0.0, "avg_logprob": + -0.13931622551482858, "compression_ratio": 1.6047430830039526, "no_speech_prob": + 0.0029693488031625748}, {"id": 254, "seek": 166156, "start": 1673.3999999999999, + "end": 1680.36, "text": " like API contract in a way, right? Yeah, and it''s a quite + a big challenge I would say to have the", "tokens": [50956, 411, 9362, 4364, 294, + 257, 636, 11, 558, 30, 865, 11, 293, 309, 311, 257, 1596, 257, 955, 3430, 286, 576, + 584, 281, 362, 264, 51304], "temperature": 0.0, "avg_logprob": -0.13931622551482858, + "compression_ratio": 1.6047430830039526, "no_speech_prob": 0.0029693488031625748}, + {"id": 255, "seek": 166156, "start": 1680.36, "end": 1688.6799999999998, "text": + " right balance between ease of use and flexibility. So, what belongs there and + what doesn''t belong there?", "tokens": [51304, 558, 4772, 1296, 12708, 295, 764, + 293, 12635, 13, 407, 11, 437, 12953, 456, 293, 437, 1177, 380, 5784, 456, 30, 51720], + "temperature": 0.0, "avg_logprob": -0.13931622551482858, "compression_ratio": 1.6047430830039526, + "no_speech_prob": 0.0029693488031625748}, {"id": 256, "seek": 168868, "start": 1688.68, + "end": 1694.68, "text": " Because there''s always a risk to put too much functionality + in one same thing and make it very", "tokens": [50364, 1436, 456, 311, 1009, 257, + 3148, 281, 829, 886, 709, 14980, 294, 472, 912, 551, 293, 652, 309, 588, 50664], + "temperature": 0.0, "avg_logprob": -0.2866224351820055, "compression_ratio": 1.6232558139534883, + "no_speech_prob": 0.011051030829548836}, {"id": 257, "seek": 168868, "start": 1694.68, + "end": 1701.64, "text": " powerful, but make it a nightmare to use. 
Yeah, so in + these, in these balance, I think there is the key", "tokens": [50664, 4005, 11, + 457, 652, 309, 257, 18724, 281, 764, 13, 865, 11, 370, 294, 613, 11, 294, 613, 4772, + 11, 286, 519, 456, 307, 264, 2141, 51012], "temperature": 0.0, "avg_logprob": -0.2866224351820055, + "compression_ratio": 1.6232558139534883, "no_speech_prob": 0.011051030829548836}, + {"id": 258, "seek": 168868, "start": 1702.68, "end": 1707.48, "text": " what is + your choice when you have to choose? Let''s say it''s a balance of flexibility,", + "tokens": [51064, 437, 307, 428, 3922, 562, 291, 362, 281, 2826, 30, 961, 311, 584, + 309, 311, 257, 4772, 295, 12635, 11, 51304], "temperature": 0.0, "avg_logprob": + -0.2866224351820055, "compression_ratio": 1.6232558139534883, "no_speech_prob": + 0.011051030829548836}, {"id": 259, "seek": 168868, "start": 1707.48, "end": 1711.0, + "text": " or like flexibility, or what did you say the ease of use, right?", "tokens": + [51304, 420, 411, 12635, 11, 420, 437, 630, 291, 584, 264, 12708, 295, 764, 11, + 558, 30, 51480], "temperature": 0.0, "avg_logprob": -0.2866224351820055, "compression_ratio": + 1.6232558139534883, "no_speech_prob": 0.011051030829548836}, {"id": 260, "seek": + 171100, "start": 1711.08, "end": 1715.16, "text": " ease of use. 
I think we are + now, I''m now", "tokens": [50368, 12708, 295, 764, 13, 286, 519, 321, 366, 586, + 11, 286, 478, 586, 50572], "temperature": 0.0, "avg_logprob": -0.2741630639922753, + "compression_ratio": 1.6176470588235294, "no_speech_prob": 0.02000425010919571}, + {"id": 261, "seek": 171100, "start": 1716.36, "end": 1722.44, "text": " attending + to go for the ease of use because for instance, with these open source, I read that", + "tokens": [50632, 15862, 281, 352, 337, 264, 12708, 295, 764, 570, 337, 5197, 11, + 365, 613, 1269, 4009, 11, 286, 1401, 300, 50936], "temperature": 0.0, "avg_logprob": + -0.2741630639922753, "compression_ratio": 1.6176470588235294, "no_speech_prob": + 0.02000425010919571}, {"id": 262, "seek": 171100, "start": 1722.44, "end": 1729.48, + "text": " open source teaches you well. I think at some point, we did a nearly down + to well the APIs and", "tokens": [50936, 1269, 4009, 16876, 291, 731, 13, 286, 519, + 412, 512, 935, 11, 321, 630, 257, 6217, 760, 281, 731, 264, 21445, 293, 51288], + "temperature": 0.0, "avg_logprob": -0.2741630639922753, "compression_ratio": 1.6176470588235294, + "no_speech_prob": 0.02000425010919571}, {"id": 263, "seek": 171100, "start": 1729.48, + "end": 1733.88, "text": " it was a little bit complex to use. You could do a lot + of things, but at the end maybe not everybody", "tokens": [51288, 309, 390, 257, + 707, 857, 3997, 281, 764, 13, 509, 727, 360, 257, 688, 295, 721, 11, 457, 412, 264, + 917, 1310, 406, 2201, 51508], "temperature": 0.0, "avg_logprob": -0.2741630639922753, + "compression_ratio": 1.6176470588235294, "no_speech_prob": 0.02000425010919571}, + {"id": 264, "seek": 173388, "start": 1734.8400000000001, "end": 1740.8400000000001, + "text": " was doing. So, I think it''s of use for the first century barrier. 
It''s + the most important thing.", "tokens": [50412, 390, 884, 13, 407, 11, 286, 519, 309, + 311, 295, 764, 337, 264, 700, 4901, 13357, 13, 467, 311, 264, 881, 1021, 551, 13, + 50712], "temperature": 0.0, "avg_logprob": -0.22011393359583667, "compression_ratio": + 1.6631578947368422, "no_speech_prob": 0.011186445131897926}, {"id": 265, "seek": + 173388, "start": 1741.64, "end": 1746.5200000000002, "text": " Yeah, and I mean, + also like it''s interesting, you know, like if you have a real API, let''s say", + "tokens": [50752, 865, 11, 293, 286, 914, 11, 611, 411, 309, 311, 1880, 11, 291, + 458, 11, 411, 498, 291, 362, 257, 957, 9362, 11, 718, 311, 584, 50996], "temperature": + 0.0, "avg_logprob": -0.22011393359583667, "compression_ratio": 1.6631578947368422, + "no_speech_prob": 0.011186445131897926}, {"id": 266, "seek": 173388, "start": 1746.5200000000002, + "end": 1751.8000000000002, "text": " deploy it somewhere and it''s a published contract + and people are sending queries there,", "tokens": [50996, 7274, 309, 4079, 293, + 309, 311, 257, 6572, 4364, 293, 561, 366, 7750, 24109, 456, 11, 51260], "temperature": + 0.0, "avg_logprob": -0.22011393359583667, "compression_ratio": 1.6631578947368422, + "no_speech_prob": 0.011186445131897926}, {"id": 267, "seek": 173388, "start": 1751.8000000000002, + "end": 1757.3200000000002, "text": " then you know actually which endpoints which + features are being used which are not, which options", "tokens": [51260, 550, 291, + 458, 767, 597, 917, 20552, 597, 4122, 366, 885, 1143, 597, 366, 406, 11, 597, 3956, + 51536], "temperature": 0.0, "avg_logprob": -0.22011393359583667, "compression_ratio": + 1.6631578947368422, "no_speech_prob": 0.011186445131897926}, {"id": 268, "seek": + 173388, "start": 1757.3200000000002, "end": 1761.72, "text": " are completely ignored + even though you put them in the dogs, right? 
But how do you go about this", "tokens": + [51536, 366, 2584, 19735, 754, 1673, 291, 829, 552, 294, 264, 7197, 11, 558, 30, + 583, 577, 360, 291, 352, 466, 341, 51756], "temperature": 0.0, "avg_logprob": -0.22011393359583667, + "compression_ratio": 1.6631578947368422, "no_speech_prob": 0.011186445131897926}, + {"id": 269, "seek": 176172, "start": 1762.44, "end": 1769.64, "text": " in the open + source code? Like somebody downloads your code, they use it somewhere, you don''t + know how.", "tokens": [50400, 294, 264, 1269, 4009, 3089, 30, 1743, 2618, 36553, + 428, 3089, 11, 436, 764, 309, 4079, 11, 291, 500, 380, 458, 577, 13, 50760], "temperature": + 0.0, "avg_logprob": -0.20005519866943358, "compression_ratio": 1.6282051282051282, + "no_speech_prob": 0.014889408834278584}, {"id": 270, "seek": 176172, "start": 1770.3600000000001, + "end": 1775.08, "text": " So, how do you collect these analytics from them? Do you + just send like call out messages,", "tokens": [50796, 407, 11, 577, 360, 291, 2500, + 613, 15370, 490, 552, 30, 1144, 291, 445, 2845, 411, 818, 484, 7897, 11, 51032], + "temperature": 0.0, "avg_logprob": -0.20005519866943358, "compression_ratio": 1.6282051282051282, + "no_speech_prob": 0.014889408834278584}, {"id": 271, "seek": 176172, "start": 1775.08, + "end": 1779.88, "text": " hey guys, what do you use, what do you don''t? 
Right now + we are trying to keep attention on who is", "tokens": [51032, 4177, 1074, 11, 437, + 360, 291, 764, 11, 437, 360, 291, 500, 380, 30, 1779, 586, 321, 366, 1382, 281, + 1066, 3202, 322, 567, 307, 51272], "temperature": 0.0, "avg_logprob": -0.20005519866943358, + "compression_ratio": 1.6282051282051282, "no_speech_prob": 0.014889408834278584}, + {"id": 272, "seek": 176172, "start": 1779.88, "end": 1786.44, "text": " using guys, + what, and when people ask us, we try to get the most information out of them,", + "tokens": [51272, 1228, 1074, 11, 437, 11, 293, 562, 561, 1029, 505, 11, 321, 853, + 281, 483, 264, 881, 1589, 484, 295, 552, 11, 51600], "temperature": 0.0, "avg_logprob": + -0.20005519866943358, "compression_ratio": 1.6282051282051282, "no_speech_prob": + 0.014889408834278584}, {"id": 273, "seek": 178644, "start": 1787.16, "end": 1791.48, + "text": " not information on the business of how they use it, how they feel. So + right now the", "tokens": [50400, 406, 1589, 322, 264, 1606, 295, 577, 436, 764, + 309, 11, 577, 436, 841, 13, 407, 558, 586, 264, 50616], "temperature": 0.0, "avg_logprob": + -0.2196634732759916, "compression_ratio": 1.7246963562753037, "no_speech_prob": + 0.007910187356173992}, {"id": 274, "seek": 178644, "start": 1791.48, "end": 1796.28, + "text": " community is the only source of information we have. That''s the open + source. What?", "tokens": [50616, 1768, 307, 264, 787, 4009, 295, 1589, 321, 362, + 13, 663, 311, 264, 1269, 4009, 13, 708, 30, 50856], "temperature": 0.0, "avg_logprob": + -0.2196634732759916, "compression_ratio": 1.7246963562753037, "no_speech_prob": + 0.007910187356173992}, {"id": 275, "seek": 178644, "start": 1797.24, "end": 1802.2, + "text": " How do you talk to them? 
Like do you like send like messages saying, hey + guys, can you vote", "tokens": [50904, 1012, 360, 291, 751, 281, 552, 30, 1743, + 360, 291, 411, 2845, 411, 7897, 1566, 11, 4177, 1074, 11, 393, 291, 4740, 51152], + "temperature": 0.0, "avg_logprob": -0.2196634732759916, "compression_ratio": 1.7246963562753037, + "no_speech_prob": 0.007910187356173992}, {"id": 276, "seek": 178644, "start": 1802.2, + "end": 1807.72, "text": " about keeping this feature and removing that one or not + exactly like this, but would you see", "tokens": [51152, 466, 5145, 341, 4111, 293, + 12720, 300, 472, 420, 406, 2293, 411, 341, 11, 457, 576, 291, 536, 51428], "temperature": + 0.0, "avg_logprob": -0.2196634732759916, "compression_ratio": 1.7246963562753037, + "no_speech_prob": 0.007910187356173992}, {"id": 277, "seek": 178644, "start": 1808.2, + "end": 1811.8, "text": " people that are more engaged or more or less engaged, people + that are more", "tokens": [51452, 561, 300, 366, 544, 8237, 420, 544, 420, 1570, + 8237, 11, 561, 300, 366, 544, 51632], "temperature": 0.0, "avg_logprob": -0.2196634732759916, + "compression_ratio": 1.7246963562753037, "no_speech_prob": 0.007910187356173992}, + {"id": 278, "seek": 181180, "start": 1812.6, "end": 1817.96, "text": " finding it + more easier or less or having more difficulties with your with your solution.", + "tokens": [50404, 5006, 309, 544, 3571, 420, 1570, 420, 1419, 544, 14399, 365, 428, + 365, 428, 3827, 13, 50672], "temperature": 0.0, "avg_logprob": -0.2089354294996995, + "compression_ratio": 1.5674157303370786, "no_speech_prob": 0.010515459813177586}, + {"id": 279, "seek": 181180, "start": 1819.3999999999999, "end": 1827.0, "text": + " So it''s and sometimes we have a development relations team that try to get also + feedback from", "tokens": [50744, 407, 309, 311, 293, 2171, 321, 362, 257, 3250, + 2299, 1469, 300, 853, 281, 483, 611, 5824, 490, 51124], "temperature": 0.0, "avg_logprob": + -0.2089354294996995, "compression_ratio": 
1.5674157303370786, "no_speech_prob": + 0.010515459813177586}, {"id": 280, "seek": 181180, "start": 1827.0, "end": 1834.2, + "text": " from the community in many terms. So this is a global effort. But in the + end you have you have a", "tokens": [51124, 490, 264, 1768, 294, 867, 2115, 13, + 407, 341, 307, 257, 4338, 4630, 13, 583, 294, 264, 917, 291, 362, 291, 362, 257, + 51484], "temperature": 0.0, "avg_logprob": -0.2089354294996995, "compression_ratio": + 1.5674157303370786, "no_speech_prob": 0.010515459813177586}, {"id": 281, "seek": + 183420, "start": 1834.2, "end": 1839.64, "text": " say, right? Like no matter what + they ask you have a say, is that right? Well, I mean,", "tokens": [50364, 584, 11, + 558, 30, 1743, 572, 1871, 437, 436, 1029, 291, 362, 257, 584, 11, 307, 300, 558, + 30, 1042, 11, 286, 914, 11, 50636], "temperature": 0.0, "avg_logprob": -0.2862526782147296, + "compression_ratio": 1.4974358974358974, "no_speech_prob": 0.02727777510881424}, + {"id": 282, "seek": 183420, "start": 1840.92, "end": 1848.76, "text": " sometimes + you cannot please the community to all the extent, I don''t know, we have to keep + a road map.", "tokens": [50700, 2171, 291, 2644, 1767, 264, 1768, 281, 439, 264, + 8396, 11, 286, 500, 380, 458, 11, 321, 362, 281, 1066, 257, 3060, 4471, 13, 51092], + "temperature": 0.0, "avg_logprob": -0.2862526782147296, "compression_ratio": 1.4974358974358974, + "no_speech_prob": 0.02727777510881424}, {"id": 283, "seek": 183420, "start": 1849.56, + "end": 1856.04, "text": " For instance, people may want you to build something that + is emanated, but maybe not so significant for", "tokens": [51132, 1171, 5197, 11, + 561, 815, 528, 291, 281, 1322, 746, 300, 307, 28211, 770, 11, 457, 1310, 406, 370, + 4776, 337, 51456], "temperature": 0.0, "avg_logprob": -0.2862526782147296, "compression_ratio": + 1.4974358974358974, "no_speech_prob": 0.02727777510881424}, {"id": 284, "seek": + 185604, "start": 1856.36, "end": 1865.96, "text": " search 
solutions. This is quite + a confusion, I think. So beyond search like where can I use GNA,", "tokens": [50380, + 3164, 6547, 13, 639, 307, 1596, 257, 15075, 11, 286, 519, 13, 407, 4399, 3164, 411, + 689, 393, 286, 764, 460, 5321, 11, 50860], "temperature": 0.0, "avg_logprob": -0.30593849891840025, + "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.005071389023214579}, + {"id": 285, "seek": 185604, "start": 1865.96, "end": 1870.76, "text": " what kind + of other use cases have you seen beyond like kind of similarity search?", "tokens": + [50860, 437, 733, 295, 661, 764, 3331, 362, 291, 1612, 4399, 411, 733, 295, 32194, + 3164, 30, 51100], "temperature": 0.0, "avg_logprob": -0.30593849891840025, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.005071389023214579}, {"id": 286, "seek": + 185604, "start": 1872.36, "end": 1879.32, "text": " So since we are building these + abstractions, it is quite easy for you to use these abstractions", "tokens": [51180, + 407, 1670, 321, 366, 2390, 613, 12649, 626, 11, 309, 307, 1596, 1858, 337, 291, + 281, 764, 613, 12649, 626, 51528], "temperature": 0.0, "avg_logprob": -0.30593849891840025, + "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.005071389023214579}, + {"id": 287, "seek": 185604, "start": 1879.32, "end": 1885.6399999999999, "text": + " for building any classification model or anything as you really did, you could + even deploy something", "tokens": [51528, 337, 2390, 604, 21538, 2316, 420, 1340, + 382, 291, 534, 630, 11, 291, 727, 754, 7274, 746, 51844], "temperature": 0.0, "avg_logprob": + -0.30593849891840025, "compression_ratio": 1.6741071428571428, "no_speech_prob": + 0.005071389023214579}, {"id": 288, "seek": 188564, "start": 1885.64, "end": 1892.8400000000001, + "text": " and use GNA to easily deploy and scale and use your segmenter and object + segmenter model.", "tokens": [50364, 293, 764, 460, 5321, 281, 3612, 7274, 293, + 4373, 293, 764, 428, 9469, 260, 293, 2657, 9469, 
260, 2316, 13, 50724], "temperature": + 0.0, "avg_logprob": -0.18430510811183765, "compression_ratio": 1.7135922330097086, + "no_speech_prob": 0.0018924744799733162}, {"id": 289, "seek": 188564, "start": 1893.4, + "end": 1898.5200000000002, "text": " But this is this is something that you could + do, but GNA is born and will", "tokens": [50752, 583, 341, 307, 341, 307, 746, 300, + 291, 727, 360, 11, 457, 460, 5321, 307, 4232, 293, 486, 51008], "temperature": 0.0, + "avg_logprob": -0.18430510811183765, "compression_ratio": 1.7135922330097086, "no_speech_prob": + 0.0018924744799733162}, {"id": 290, "seek": 188564, "start": 1899.4, "end": 1907.48, + "text": " will be working to implement new research solutions. So you could still + use this but might not", "tokens": [51052, 486, 312, 1364, 281, 4445, 777, 2132, + 6547, 13, 407, 291, 727, 920, 764, 341, 457, 1062, 406, 51456], "temperature": 0.0, + "avg_logprob": -0.18430510811183765, "compression_ratio": 1.7135922330097086, "no_speech_prob": + 0.0018924744799733162}, {"id": 291, "seek": 188564, "start": 1907.48, "end": 1912.68, + "text": " be the best tool for it. So we are not born for that, but you could do + it. But we can see that", "tokens": [51456, 312, 264, 1151, 2290, 337, 309, 13, + 407, 321, 366, 406, 4232, 337, 300, 11, 457, 291, 727, 360, 309, 13, 583, 321, 393, + 536, 300, 51716], "temperature": 0.0, "avg_logprob": -0.18430510811183765, "compression_ratio": + 1.7135922330097086, "no_speech_prob": 0.0018924744799733162}, {"id": 292, "seek": + 191268, "start": 1913.16, "end": 1918.1200000000001, "text": " we are done. 
You + can do this because for instance classification or segmenting can object can be", + "tokens": [50388, 321, 366, 1096, 13, 509, 393, 360, 341, 570, 337, 5197, 21538, + 420, 9469, 278, 393, 2657, 393, 312, 50636], "temperature": 0.0, "avg_logprob": + -0.34182002932526345, "compression_ratio": 1.7511737089201878, "no_speech_prob": + 0.005156046710908413}, {"id": 293, "seek": 191268, "start": 1918.1200000000001, + "end": 1923.48, "text": " part of your pipeline, but in theory we are born to support + search applications.", "tokens": [50636, 644, 295, 428, 15517, 11, 457, 294, 5261, + 321, 366, 4232, 281, 1406, 3164, 5821, 13, 50904], "temperature": 0.0, "avg_logprob": + -0.34182002932526345, "compression_ratio": 1.7511737089201878, "no_speech_prob": + 0.005156046710908413}, {"id": 294, "seek": 191268, "start": 1924.28, "end": 1929.96, + "text": " Yeah, yeah. So like, or for example, something that is or search applications + or something that you", "tokens": [50944, 865, 11, 1338, 13, 407, 411, 11, 420, + 337, 1365, 11, 746, 300, 307, 420, 3164, 5821, 420, 746, 300, 291, 51228], "temperature": + 0.0, "avg_logprob": -0.34182002932526345, "compression_ratio": 1.7511737089201878, + "no_speech_prob": 0.005156046710908413}, {"id": 295, "seek": 191268, "start": 1929.96, + "end": 1936.8400000000001, "text": " can frame as a search application right now, + for instance, a question-answering system that you", "tokens": [51228, 393, 3920, + 382, 257, 3164, 3861, 558, 586, 11, 337, 5197, 11, 257, 1168, 12, 43904, 278, 1185, + 300, 291, 51572], "temperature": 0.0, "avg_logprob": -0.34182002932526345, "compression_ratio": + 1.7511737089201878, "no_speech_prob": 0.005156046710908413}, {"id": 296, "seek": + 193684, "start": 1936.9199999999998, "end": 1940.9199999999998, "text": " can frame + as a part where you will do something like research or", "tokens": [50368, 393, + 3920, 382, 257, 644, 689, 291, 486, 360, 746, 411, 2132, 420, 50568], "temperature": + 0.0, "avg_logprob": 
-0.20958545684814453, "compression_ratio": 1.7892561983471074, + "no_speech_prob": 0.0024767343420535326}, {"id": 297, "seek": 193684, "start": 1941.8, + "end": 1948.36, "text": " spare search and then you have some real or model that + extracts more information from something.", "tokens": [50612, 13798, 3164, 293, + 550, 291, 362, 512, 957, 420, 2316, 300, 8947, 82, 544, 1589, 490, 746, 13, 50940], + "temperature": 0.0, "avg_logprob": -0.20958545684814453, "compression_ratio": 1.7892561983471074, + "no_speech_prob": 0.0024767343420535326}, {"id": 298, "seek": 193684, "start": 1948.36, + "end": 1952.76, "text": " So anything that falls into this domain, you can do it. + Yeah, I guess you can also", "tokens": [50940, 407, 1340, 300, 8804, 666, 341, 9274, + 11, 291, 393, 360, 309, 13, 865, 11, 286, 2041, 291, 393, 611, 51160], "temperature": + 0.0, "avg_logprob": -0.20958545684814453, "compression_ratio": 1.7892561983471074, + "no_speech_prob": 0.0024767343420535326}, {"id": 299, "seek": 193684, "start": 1953.8, + "end": 1960.12, "text": " like based on the research and some practice happening + in data augmentation based on retrieving,", "tokens": [51212, 411, 2361, 322, 264, + 2132, 293, 512, 3124, 2737, 294, 1412, 14501, 19631, 2361, 322, 19817, 798, 11, + 51528], "temperature": 0.0, "avg_logprob": -0.20958545684814453, "compression_ratio": + 1.7892561983471074, "no_speech_prob": 0.0024767343420535326}, {"id": 300, "seek": + 193684, "start": 1960.12, "end": 1965.6399999999999, "text": " you can also formulate + data augmentation as a process of search in principle, right? 
So the", "tokens": + [51528, 291, 393, 611, 47881, 1412, 14501, 19631, 382, 257, 1399, 295, 3164, 294, + 8665, 11, 558, 30, 407, 264, 51804], "temperature": 0.0, "avg_logprob": -0.20958545684814453, + "compression_ratio": 1.7892561983471074, "no_speech_prob": 0.0024767343420535326}, + {"id": 301, "seek": 196564, "start": 1965.64, "end": 1971.72, "text": " output will + be your augmented data, but you use search in the middle.", "tokens": [50364, 5598, + 486, 312, 428, 36155, 1412, 11, 457, 291, 764, 3164, 294, 264, 2808, 13, 50668], + "temperature": 0.0, "avg_logprob": -0.268896210059691, "compression_ratio": 1.577092511013216, + "no_speech_prob": 0.0038744050543755293}, {"id": 302, "seek": 196564, "start": 1972.5200000000002, + "end": 1978.92, "text": " Yes, actually that might be but search can be so many + problems can be framed into search. I", "tokens": [50708, 1079, 11, 767, 300, 1062, + 312, 457, 3164, 393, 312, 370, 867, 2740, 393, 312, 30420, 666, 3164, 13, 286, 51028], + "temperature": 0.0, "avg_logprob": -0.268896210059691, "compression_ratio": 1.577092511013216, + "no_speech_prob": 0.0038744050543755293}, {"id": 303, "seek": 196564, "start": 1978.92, + "end": 1987.16, "text": " don''t know. At the end is like vectors are somehow like + the truth, not like the semantic information,", "tokens": [51028, 500, 380, 458, + 13, 1711, 264, 917, 307, 411, 18875, 366, 6063, 411, 264, 3494, 11, 406, 411, 264, + 47982, 1589, 11, 51440], "temperature": 0.0, "avg_logprob": -0.268896210059691, + "compression_ratio": 1.577092511013216, "no_speech_prob": 0.0038744050543755293}, + {"id": 304, "seek": 196564, "start": 1987.16, "end": 1993.72, "text": " so how we + don''t understand exactly why but is encoded there, right? 
So just by clustering + them", "tokens": [51440, 370, 577, 321, 500, 380, 1223, 2293, 983, 457, 307, 2058, + 12340, 456, 11, 558, 30, 407, 445, 538, 596, 48673, 552, 51768], "temperature": + 0.0, "avg_logprob": -0.268896210059691, "compression_ratio": 1.577092511013216, + "no_speech_prob": 0.0038744050543755293}, {"id": 305, "seek": 199372, "start": 1993.72, + "end": 1998.52, "text": " together, somehow we have some understanding, so so many + things to confirm with.", "tokens": [50364, 1214, 11, 6063, 321, 362, 512, 3701, + 11, 370, 370, 867, 721, 281, 9064, 365, 13, 50604], "temperature": 0.0, "avg_logprob": + -0.22372118048711653, "compression_ratio": 1.5229681978798586, "no_speech_prob": + 0.0020545891020447016}, {"id": 306, "seek": 199372, "start": 2000.84, "end": 2005.32, + "text": " I also wanted to ask you a little bit like closer to the similarity search + itself,", "tokens": [50720, 286, 611, 1415, 281, 1029, 291, 257, 707, 857, 411, + 4966, 281, 264, 32194, 3164, 2564, 11, 50944], "temperature": 0.0, "avg_logprob": + -0.22372118048711653, "compression_ratio": 1.5229681978798586, "no_speech_prob": + 0.0020545891020447016}, {"id": 307, "seek": 199372, "start": 2005.32, "end": 2011.88, + "text": " you know, let''s say I built a traditional kind of text search engine, + okay? And I''m moving away", "tokens": [50944, 291, 458, 11, 718, 311, 584, 286, + 3094, 257, 5164, 733, 295, 2487, 3164, 2848, 11, 1392, 30, 400, 286, 478, 2684, + 1314, 51272], "temperature": 0.0, "avg_logprob": -0.22372118048711653, "compression_ratio": + 1.5229681978798586, "no_speech_prob": 0.0020545891020447016}, {"id": 308, "seek": + 199372, "start": 2011.88, "end": 2017.56, "text": " from BM25, which is like probably + majority of this market today. 
So I''m thinking, okay,", "tokens": [51272, 490, + 15901, 6074, 11, 597, 307, 411, 1391, 6286, 295, 341, 2142, 965, 13, 407, 286, 478, + 1953, 11, 1392, 11, 51556], "temperature": 0.0, "avg_logprob": -0.22372118048711653, + "compression_ratio": 1.5229681978798586, "no_speech_prob": 0.0020545891020447016}, + {"id": 309, "seek": 199372, "start": 2017.56, "end": 2021.56, "text": " what are + these cool kids doing? Maybe I should try it out, plug in some bird model.", "tokens": + [51556, 437, 366, 613, 1627, 2301, 884, 30, 2704, 286, 820, 853, 309, 484, 11, 5452, + 294, 512, 5255, 2316, 13, 51756], "temperature": 0.0, "avg_logprob": -0.22372118048711653, + "compression_ratio": 1.5229681978798586, "no_speech_prob": 0.0020545891020447016}, + {"id": 310, "seek": 202156, "start": 2022.12, "end": 2030.2, "text": " So and but + then in my UI, I am also showing snippets and it''s very easy to show snippets when + it''s a", "tokens": [50392, 407, 293, 457, 550, 294, 452, 15682, 11, 286, 669, 611, + 4099, 35623, 1385, 293, 309, 311, 588, 1858, 281, 855, 35623, 1385, 562, 309, 311, + 257, 50796], "temperature": 0.0, "avg_logprob": -0.19450630680207284, "compression_ratio": + 1.665137614678899, "no_speech_prob": 0.01120330486446619}, {"id": 311, "seek": 202156, + "start": 2030.2, "end": 2037.6399999999999, "text": " keyboard search, right? 
So + what should I do or what can I do with model like bird and genie AI", "tokens": + [50796, 10186, 3164, 11, 558, 30, 407, 437, 820, 286, 360, 420, 437, 393, 286, 360, + 365, 2316, 411, 5255, 293, 1049, 414, 7318, 51168], "temperature": 0.0, "avg_logprob": + -0.19450630680207284, "compression_ratio": 1.665137614678899, "no_speech_prob": + 0.01120330486446619}, {"id": 312, "seek": 202156, "start": 2037.6399999999999, "end": + 2041.32, "text": " to show snippets or something that will resemble snippets to + the users?", "tokens": [51168, 281, 855, 35623, 1385, 420, 746, 300, 486, 36870, + 35623, 1385, 281, 264, 5022, 30, 51352], "temperature": 0.0, "avg_logprob": -0.19450630680207284, + "compression_ratio": 1.665137614678899, "no_speech_prob": 0.01120330486446619}, + {"id": 313, "seek": 202156, "start": 2042.84, "end": 2048.84, "text": " Maybe you + can also change your information so that you check where the attention is put in + your", "tokens": [51428, 2704, 291, 393, 611, 1319, 428, 1589, 370, 300, 291, 1520, + 689, 264, 3202, 307, 829, 294, 428, 51728], "temperature": 0.0, "avg_logprob": -0.19450630680207284, + "compression_ratio": 1.665137614678899, "no_speech_prob": 0.01120330486446619}, + {"id": 314, "seek": 204884, "start": 2048.84, "end": 2055.6400000000003, "text": + " model or somehow, but yeah, I think also there is a thing that we are framed and + we have been", "tokens": [50364, 2316, 420, 6063, 11, 457, 1338, 11, 286, 519, 611, + 456, 307, 257, 551, 300, 321, 366, 30420, 293, 321, 362, 668, 50704], "temperature": + 0.0, "avg_logprob": -0.22729716350122825, "compression_ratio": 1.6543778801843319, + "no_speech_prob": 0.0010044240625575185}, {"id": 315, "seek": 204884, "start": 2055.6400000000003, + "end": 2062.1200000000003, "text": " grown into this keyboard search that it''s + so interpretable and so easy to use and even so", "tokens": [50704, 7709, 666, 341, + 10186, 3164, 300, 309, 311, 370, 7302, 712, 293, 370, 1858, 281, 764, 293, 754, + 370, 
51028], "temperature": 0.0, "avg_logprob": -0.22729716350122825, "compression_ratio": + 1.6543778801843319, "no_speech_prob": 0.0010044240625575185}, {"id": 316, "seek": + 204884, "start": 2062.1200000000003, "end": 2066.84, "text": " easy to hack. So + how, you know, you know, you as a user know how to drive your", "tokens": [51028, + 1858, 281, 10339, 13, 407, 577, 11, 291, 458, 11, 291, 458, 11, 291, 382, 257, 4195, + 458, 577, 281, 3332, 428, 51264], "temperature": 0.0, "avg_logprob": -0.22729716350122825, + "compression_ratio": 1.6543778801843319, "no_speech_prob": 0.0010044240625575185}, + {"id": 317, "seek": 204884, "start": 2066.84, "end": 2073.08, "text": " search if + you don''t find it, right? Okay, this word might find you here. And I think since + these", "tokens": [51264, 3164, 498, 291, 500, 380, 915, 309, 11, 558, 30, 1033, + 11, 341, 1349, 1062, 915, 291, 510, 13, 400, 286, 519, 1670, 613, 51576], "temperature": + 0.0, "avg_logprob": -0.22729716350122825, "compression_ratio": 1.6543778801843319, + "no_speech_prob": 0.0010044240625575185}, {"id": 318, "seek": 207308, "start": 2073.08, + "end": 2081.4, "text": " models are kind of black box for many of us, I think in + this kind of sense, this interpretability is", "tokens": [50364, 5245, 366, 733, + 295, 2211, 2424, 337, 867, 295, 505, 11, 286, 519, 294, 341, 733, 295, 2020, 11, + 341, 7302, 2310, 307, 50780], "temperature": 0.0, "avg_logprob": -0.21640856903378325, + "compression_ratio": 1.654708520179372, "no_speech_prob": 0.0023971577174961567}, + {"id": 319, "seek": 207308, "start": 2081.4, "end": 2085.56, "text": " one of the + main challenges and I think one of the main focus that research should go.", "tokens": + [50780, 472, 295, 264, 2135, 4759, 293, 286, 519, 472, 295, 264, 2135, 1879, 300, + 2132, 820, 352, 13, 50988], "temperature": 0.0, "avg_logprob": -0.21640856903378325, + "compression_ratio": 1.654708520179372, "no_speech_prob": 0.0023971577174961567}, + {"id": 320, "seek": 207308, 
"start": 2086.2, "end": 2091.08, "text": " Yeah, but + I mean, you you you call it out as an interpretability, but like for the user sent", + "tokens": [51020, 865, 11, 457, 286, 914, 11, 291, 291, 291, 818, 309, 484, 382, + 364, 7302, 2310, 11, 457, 411, 337, 264, 4195, 2279, 51264], "temperature": 0.0, + "avg_logprob": -0.21640856903378325, "compression_ratio": 1.654708520179372, "no_speech_prob": + 0.0023971577174961567}, {"id": 321, "seek": 207308, "start": 2091.08, "end": 2096.2799999999997, + "text": " for me, let''s say I''m a product manager, I don''t care about is it bird + model, is it VM25?", "tokens": [51264, 337, 385, 11, 718, 311, 584, 286, 478, 257, + 1674, 6598, 11, 286, 500, 380, 1127, 466, 307, 309, 5255, 2316, 11, 307, 309, 18038, + 6074, 30, 51524], "temperature": 0.0, "avg_logprob": -0.21640856903378325, "compression_ratio": + 1.654708520179372, "no_speech_prob": 0.0023971577174961567}, {"id": 322, "seek": + 209628, "start": 2096.36, "end": 2102.92, "text": " I used to see snippets, I want + to see them now. So like, what should the point in VM25?", "tokens": [50368, 286, + 1143, 281, 536, 35623, 1385, 11, 286, 528, 281, 536, 552, 586, 13, 407, 411, 11, + 437, 820, 264, 935, 294, 18038, 6074, 30, 50696], "temperature": 0.0, "avg_logprob": + -0.2243355941772461, "compression_ratio": 1.5875, "no_speech_prob": 0.008157463744282722}, + {"id": 323, "seek": 209628, "start": 2103.6400000000003, "end": 2111.0, "text": + " I can give you a snippet because I know why I have this solution. Here is, well, + it correlates and", "tokens": [50732, 286, 393, 976, 291, 257, 35623, 302, 570, + 286, 458, 983, 286, 362, 341, 3827, 13, 1692, 307, 11, 731, 11, 309, 13983, 1024, + 293, 51100], "temperature": 0.0, "avg_logprob": -0.2243355941772461, "compression_ratio": + 1.5875, "no_speech_prob": 0.008157463744282722}, {"id": 324, "seek": 209628, "start": + 2111.0, "end": 2116.1200000000003, "text": " but where the information that I want + is there. 
Maybe for instance, in dinner, one of the main", "tokens": [51100, 457, + 689, 264, 1589, 300, 286, 528, 307, 456, 13, 2704, 337, 5197, 11, 294, 6148, 11, + 472, 295, 264, 2135, 51356], "temperature": 0.0, "avg_logprob": -0.2243355941772461, + "compression_ratio": 1.5875, "no_speech_prob": 0.008157463744282722}, {"id": 325, + "seek": 209628, "start": 2117.5600000000004, "end": 2122.0400000000004, "text": + " building blocks that we have is our document is our recursive structure because + most of the things,", "tokens": [51428, 2390, 8474, 300, 321, 362, 307, 527, 4166, + 307, 527, 20560, 488, 3877, 570, 881, 295, 264, 721, 11, 51652], "temperature": + 0.0, "avg_logprob": -0.2243355941772461, "compression_ratio": 1.5875, "no_speech_prob": + 0.008157463744282722}, {"id": 326, "seek": 212204, "start": 2122.04, "end": 2126.68, + "text": " for instance, if you find the search, if you search a document, a text + or document, you might need", "tokens": [50364, 337, 5197, 11, 498, 291, 915, 264, + 3164, 11, 498, 291, 3164, 257, 4166, 11, 257, 2487, 420, 4166, 11, 291, 1062, 643, + 50596], "temperature": 0.0, "avg_logprob": -0.18899994753719715, "compression_ratio": + 1.8267326732673268, "no_speech_prob": 0.005950938444584608}, {"id": 327, "seek": + 212204, "start": 2126.68, "end": 2134.2799999999997, "text": " to break into paragraphs + into digs and so on. 
So maybe what you can do is you do the vector search", "tokens": + [50596, 281, 1821, 666, 48910, 666, 2528, 82, 293, 370, 322, 13, 407, 1310, 437, + 291, 393, 360, 307, 291, 360, 264, 8062, 3164, 50976], "temperature": 0.0, "avg_logprob": + -0.18899994753719715, "compression_ratio": 1.8267326732673268, "no_speech_prob": + 0.005950938444584608}, {"id": 328, "seek": 212204, "start": 2134.2799999999997, + "end": 2139.8, "text": " at the variety level is at sentence level, but then the + results might be shown at paragraph or", "tokens": [50976, 412, 264, 5673, 1496, + 307, 412, 8174, 1496, 11, 457, 550, 264, 3542, 1062, 312, 4898, 412, 18865, 420, + 51252], "temperature": 0.0, "avg_logprob": -0.18899994753719715, "compression_ratio": + 1.8267326732673268, "no_speech_prob": 0.005950938444584608}, {"id": 329, "seek": + 212204, "start": 2139.8, "end": 2144.68, "text": " at sentence level. So you can + highlight very easily the sentence that really", "tokens": [51252, 412, 8174, 1496, + 13, 407, 291, 393, 5078, 588, 3612, 264, 8174, 300, 534, 51496], "temperature": + 0.0, "avg_logprob": -0.18899994753719715, "compression_ratio": 1.8267326732673268, + "no_speech_prob": 0.005950938444584608}, {"id": 330, "seek": 214468, "start": 2145.64, + "end": 2154.7599999999998, "text": " and drove the search to this page. 
I remember + actually, I don''t know if you know the block,", "tokens": [50412, 293, 13226, 264, + 3164, 281, 341, 3028, 13, 286, 1604, 767, 11, 286, 500, 380, 458, 498, 291, 458, + 264, 3461, 11, 50868], "temperature": 0.0, "avg_logprob": -0.3132782716017503, "compression_ratio": + 1.446808510638298, "no_speech_prob": 0.01839555613696575}, {"id": 331, "seek": 214468, + "start": 2154.7599999999998, "end": 2160.9199999999996, "text": " was it salmon + run, like Sujitpal, he''s doing a lot of blogging in the area of like,", "tokens": + [50868, 390, 309, 18518, 1190, 11, 411, 2746, 73, 270, 31862, 11, 415, 311, 884, + 257, 688, 295, 6968, 3249, 294, 264, 1859, 295, 411, 11, 51176], "temperature": + 0.0, "avg_logprob": -0.3132782716017503, "compression_ratio": 1.446808510638298, + "no_speech_prob": 0.01839555613696575}, {"id": 332, "seek": 214468, "start": 2162.44, + "end": 2167.7999999999997, "text": " here is the problem, how do I solve it? And + then he, quite usually he goes into deep learning or", "tokens": [51252, 510, 307, + 264, 1154, 11, 577, 360, 286, 5039, 309, 30, 400, 550, 415, 11, 1596, 2673, 415, + 1709, 666, 2452, 2539, 420, 51520], "temperature": 0.0, "avg_logprob": -0.3132782716017503, + "compression_ratio": 1.446808510638298, "no_speech_prob": 0.01839555613696575}, + {"id": 333, "seek": 216780, "start": 2168.52, "end": 2173.96, "text": " trying out + some vector search, maybe or not. 
And I remember like he was saying that", "tokens": + [50400, 1382, 484, 512, 8062, 3164, 11, 1310, 420, 406, 13, 400, 286, 1604, 411, + 415, 390, 1566, 300, 50672], "temperature": 0.0, "avg_logprob": -0.2136070556640625, + "compression_ratio": 1.8804780876494025, "no_speech_prob": 0.020604722201824188}, + {"id": 334, "seek": 216780, "start": 2176.36, "end": 2180.52, "text": " to solve + this snippeting problem, how he would do it because he comes from his additional", + "tokens": [50792, 281, 5039, 341, 35623, 9880, 1154, 11, 577, 415, 576, 360, 309, + 570, 415, 1487, 490, 702, 4497, 51000], "temperature": 0.0, "avg_logprob": -0.2136070556640625, + "compression_ratio": 1.8804780876494025, "no_speech_prob": 0.020604722201824188}, + {"id": 335, "seek": 216780, "start": 2180.52, "end": 2186.6000000000004, "text": + " search and I do in a way. And like he said, okay, you can kind of build, like, + if I remember correctly,", "tokens": [51000, 3164, 293, 286, 360, 294, 257, 636, + 13, 400, 411, 415, 848, 11, 1392, 11, 291, 393, 733, 295, 1322, 11, 411, 11, 498, + 286, 1604, 8944, 11, 51304], "temperature": 0.0, "avg_logprob": -0.2136070556640625, + "compression_ratio": 1.8804780876494025, "no_speech_prob": 0.020604722201824188}, + {"id": 336, "seek": 216780, "start": 2187.48, "end": 2191.4, "text": " if you can + do like almost like a dictionary, right? So let''s say you take a word, you can + embed it,", "tokens": [51348, 498, 291, 393, 360, 411, 1920, 411, 257, 25890, 11, + 558, 30, 407, 718, 311, 584, 291, 747, 257, 1349, 11, 291, 393, 12240, 309, 11, + 51544], "temperature": 0.0, "avg_logprob": -0.2136070556640625, "compression_ratio": + 1.8804780876494025, "no_speech_prob": 0.020604722201824188}, {"id": 337, "seek": + 216780, "start": 2191.4, "end": 2195.8, "text": " take a word, you can embed it, + like you can embed a dictionary, right? 
Now when you found that", "tokens": [51544, + 747, 257, 1349, 11, 291, 393, 12240, 309, 11, 411, 291, 393, 12240, 257, 25890, + 11, 558, 30, 823, 562, 291, 1352, 300, 51764], "temperature": 0.0, "avg_logprob": + -0.2136070556640625, "compression_ratio": 1.8804780876494025, "no_speech_prob": + 0.020604722201824188}, {"id": 338, "seek": 219580, "start": 2195.8, "end": 2202.28, + "text": " document, you can kind of from embeddings, you can map back to the words. + If they happen to be", "tokens": [50364, 4166, 11, 291, 393, 733, 295, 490, 12240, + 29432, 11, 291, 393, 4471, 646, 281, 264, 2283, 13, 759, 436, 1051, 281, 312, 50688], + "temperature": 0.0, "avg_logprob": -0.13280782197651111, "compression_ratio": 1.6866359447004609, + "no_speech_prob": 0.0025919703766703606}, {"id": 339, "seek": 219580, "start": 2202.28, + "end": 2207.2400000000002, "text": " closed enough, like geometrically, you can + find closed enough words. So you can kind of try to say,", "tokens": [50688, 5395, + 1547, 11, 411, 12956, 81, 984, 11, 291, 393, 915, 5395, 1547, 2283, 13, 407, 291, + 393, 733, 295, 853, 281, 584, 11, 50936], "temperature": 0.0, "avg_logprob": -0.13280782197651111, + "compression_ratio": 1.6866359447004609, "no_speech_prob": 0.0025919703766703606}, + {"id": 340, "seek": 219580, "start": 2207.2400000000002, "end": 2213.4, "text": + " okay, maybe these keywords are representative of this text, but I''m not 100% + sure, but at least you try.", "tokens": [50936, 1392, 11, 1310, 613, 21009, 366, + 12424, 295, 341, 2487, 11, 457, 286, 478, 406, 2319, 4, 988, 11, 457, 412, 1935, + 291, 853, 13, 51244], "temperature": 0.0, "avg_logprob": -0.13280782197651111, "compression_ratio": + 1.6866359447004609, "no_speech_prob": 0.0025919703766703606}, {"id": 341, "seek": + 219580, "start": 2214.2000000000003, "end": 2217.2400000000002, "text": " So you + go backwards, like reverse engineering from the embeddings.", "tokens": [51284, + 407, 291, 352, 12204, 11, 411, 9943, 7043, 490, 264, 
12240, 29432, 13, 51436], "temperature": + 0.0, "avg_logprob": -0.13280782197651111, "compression_ratio": 1.6866359447004609, + "no_speech_prob": 0.0025919703766703606}, {"id": 342, "seek": 221724, "start": 2218.2, + "end": 2225.64, "text": " It''s interesting, sir. You may need to go through all + the pain of dramatizing kind of these kind of", "tokens": [50412, 467, 311, 1880, + 11, 4735, 13, 509, 815, 643, 281, 352, 807, 439, 264, 1822, 295, 42749, 3319, 733, + 295, 613, 733, 295, 50784], "temperature": 0.0, "avg_logprob": -0.3267186608644995, + "compression_ratio": 1.6048387096774193, "no_speech_prob": 0.010406569577753544}, + {"id": 343, "seek": 221724, "start": 2225.64, "end": 2231.7999999999997, "text": + " stuff that you may have saved by going through semantic search and now you are + back to it. So,", "tokens": [50784, 1507, 300, 291, 815, 362, 6624, 538, 516, 807, + 47982, 3164, 293, 586, 291, 366, 646, 281, 309, 13, 407, 11, 51092], "temperature": + 0.0, "avg_logprob": -0.3267186608644995, "compression_ratio": 1.6048387096774193, + "no_speech_prob": 0.010406569577753544}, {"id": 344, "seek": 221724, "start": 2232.8399999999997, + "end": 2239.24, "text": " like straight-offs, but yeah, it might be a good... Yeah, + dramatization is another thing, but like I think", "tokens": [51144, 411, 2997, + 12, 19231, 11, 457, 1338, 11, 309, 1062, 312, 257, 665, 485, 865, 11, 42749, 2144, + 307, 1071, 551, 11, 457, 411, 286, 519, 51464], "temperature": 0.0, "avg_logprob": + -0.3267186608644995, "compression_ratio": 1.6048387096774193, "no_speech_prob": + 0.010406569577753544}, {"id": 345, "seek": 221724, "start": 2239.7999999999997, + "end": 2246.3599999999997, "text": " there was this paper from, I believe Google + about byte level training, right? 
So they don''t care", "tokens": [51492, 456, 390, + 341, 3035, 490, 11, 286, 1697, 3329, 466, 40846, 1496, 3097, 11, 558, 30, 407, 436, + 500, 380, 1127, 51820], "temperature": 0.0, "avg_logprob": -0.3267186608644995, + "compression_ratio": 1.6048387096774193, "no_speech_prob": 0.010406569577753544}, + {"id": 346, "seek": 224636, "start": 2246.36, "end": 2252.6, "text": " if it''s + like lemma or if it''s like suffix or prefix, they just go byte level. They don''t + go sub-word", "tokens": [50364, 498, 309, 311, 411, 7495, 1696, 420, 498, 309, 311, + 411, 3889, 970, 420, 46969, 11, 436, 445, 352, 40846, 1496, 13, 814, 500, 380, 352, + 1422, 12, 7462, 50676], "temperature": 0.0, "avg_logprob": -0.13911275251196065, + "compression_ratio": 1.6958333333333333, "no_speech_prob": 0.001156385987997055}, + {"id": 347, "seek": 224636, "start": 2252.6, "end": 2258.36, "text": " level. They + go byte level. And then with byte level, you can essentially kind of like, okay, + now I can compute", "tokens": [50676, 1496, 13, 814, 352, 40846, 1496, 13, 400, + 550, 365, 40846, 1496, 11, 291, 393, 4476, 733, 295, 411, 11, 1392, 11, 586, 286, + 393, 14722, 50964], "temperature": 0.0, "avg_logprob": -0.13911275251196065, "compression_ratio": + 1.6958333333333333, "no_speech_prob": 0.001156385987997055}, {"id": 348, "seek": + 224636, "start": 2258.36, "end": 2264.04, "text": " the distance again, right? Okay, + how close is this to this dictionary word or not? 
But then again,", "tokens": [50964, + 264, 4560, 797, 11, 558, 30, 1033, 11, 577, 1998, 307, 341, 281, 341, 25890, 1349, + 420, 406, 30, 583, 550, 797, 11, 51248], "temperature": 0.0, "avg_logprob": -0.13911275251196065, + "compression_ratio": 1.6958333333333333, "no_speech_prob": 0.001156385987997055}, + {"id": 349, "seek": 224636, "start": 2264.6, "end": 2269.7200000000003, "text": + " from there, in order to produce a snippet that will look like natural language, + you will have to", "tokens": [51276, 490, 456, 11, 294, 1668, 281, 5258, 257, 35623, + 302, 300, 486, 574, 411, 3303, 2856, 11, 291, 486, 362, 281, 51532], "temperature": + 0.0, "avg_logprob": -0.13911275251196065, "compression_ratio": 1.6958333333333333, + "no_speech_prob": 0.001156385987997055}, {"id": 350, "seek": 226972, "start": 2269.7999999999997, + "end": 2277.16, "text": " use some kind of model like GPT or like in general, generate + the sentence. And at that point,", "tokens": [50368, 764, 512, 733, 295, 2316, 411, + 26039, 51, 420, 411, 294, 2674, 11, 8460, 264, 8174, 13, 400, 412, 300, 935, 11, + 50736], "temperature": 0.0, "avg_logprob": -0.22329714638846263, "compression_ratio": + 1.455, "no_speech_prob": 0.0027010105550289154}, {"id": 351, "seek": 226972, "start": + 2277.16, "end": 2283.16, "text": " it might actually go completely different direction + from your text, right? Start like hallucinating or", "tokens": [50736, 309, 1062, + 767, 352, 2584, 819, 3513, 490, 428, 2487, 11, 558, 30, 6481, 411, 35212, 8205, + 420, 51036], "temperature": 0.0, "avg_logprob": -0.22329714638846263, "compression_ratio": + 1.455, "no_speech_prob": 0.0027010105550289154}, {"id": 352, "seek": 226972, "start": + 2284.2, "end": 2292.68, "text": " write a news item that doesn''t exist. So yeah. 
+ Well, maybe you can use these extractive models", "tokens": [51088, 2464, 257, 2583, + 3174, 300, 1177, 380, 2514, 13, 407, 1338, 13, 1042, 11, 1310, 291, 393, 764, 613, + 8947, 488, 5245, 51512], "temperature": 0.0, "avg_logprob": -0.22329714638846263, + "compression_ratio": 1.455, "no_speech_prob": 0.0027010105550289154}, {"id": 353, + "seek": 229268, "start": 2292.7599999999998, "end": 2302.3599999999997, "text": + " from a sentence, giving a context, but nature, all these top-notch research is + basically.", "tokens": [50368, 490, 257, 8174, 11, 2902, 257, 4319, 11, 457, 3687, + 11, 439, 613, 1192, 12, 2247, 339, 2132, 307, 1936, 13, 50848], "temperature": 0.0, + "avg_logprob": -0.32912081400553383, "compression_ratio": 1.5265957446808511, "no_speech_prob": + 0.011391001753509045}, {"id": 354, "seek": 229268, "start": 2303.08, "end": 2308.7599999999998, + "text": " Yeah, yeah. But I mean, like attention, what you mentioned, attention + probably can be used here,", "tokens": [50884, 865, 11, 1338, 13, 583, 286, 914, + 11, 411, 3202, 11, 437, 291, 2835, 11, 3202, 1391, 393, 312, 1143, 510, 11, 51168], + "temperature": 0.0, "avg_logprob": -0.32912081400553383, "compression_ratio": 1.5265957446808511, + "no_speech_prob": 0.011391001753509045}, {"id": 355, "seek": 229268, "start": 2308.7599999999998, + "end": 2314.9199999999996, "text": " right? 
So like you can ask the model, okay, + what did you pay attention to when you did the matching,", "tokens": [51168, 558, + 30, 407, 411, 291, 393, 1029, 264, 2316, 11, 1392, 11, 437, 630, 291, 1689, 3202, + 281, 562, 291, 630, 264, 14324, 11, 51476], "temperature": 0.0, "avg_logprob": -0.32912081400553383, + "compression_ratio": 1.5265957446808511, "no_speech_prob": 0.011391001753509045}, + {"id": 356, "seek": 231492, "start": 2315.7200000000003, "end": 2322.6800000000003, + "text": " but still it''s not some people, as you say, like you can say it interpretability, + but on that hand,", "tokens": [50404, 457, 920, 309, 311, 406, 512, 561, 11, 382, + 291, 584, 11, 411, 291, 393, 584, 309, 7302, 2310, 11, 457, 322, 300, 1011, 11, + 50752], "temperature": 0.0, "avg_logprob": -0.14122381015699736, "compression_ratio": + 1.6842105263157894, "no_speech_prob": 0.01424292754381895}, {"id": 357, "seek": + 231492, "start": 2322.6800000000003, "end": 2328.44, "text": " it''s kind of like + when you go specifically to that product case, you need that snippet or you need", + "tokens": [50752, 309, 311, 733, 295, 411, 562, 291, 352, 4682, 281, 300, 1674, + 1389, 11, 291, 643, 300, 35623, 302, 420, 291, 643, 51040], "temperature": 0.0, + "avg_logprob": -0.14122381015699736, "compression_ratio": 1.6842105263157894, "no_speech_prob": + 0.01424292754381895}, {"id": 358, "seek": 231492, "start": 2328.44, "end": 2334.6, + "text": " that kind of context of the match. Or like if you said mathematics and + it picked algebra,", "tokens": [51040, 300, 733, 295, 4319, 295, 264, 2995, 13, + 1610, 411, 498, 291, 848, 18666, 293, 309, 6183, 21989, 11, 51348], "temperature": + 0.0, "avg_logprob": -0.14122381015699736, "compression_ratio": 1.6842105263157894, + "no_speech_prob": 0.01424292754381895}, {"id": 359, "seek": 231492, "start": 2334.6, + "end": 2339.08, "text": " like why did it pick algebra? At least can you explain? 
+ Because here it''s more or less obvious,", "tokens": [51348, 411, 983, 630, 309, + 1888, 21989, 30, 1711, 1935, 393, 291, 2903, 30, 1436, 510, 309, 311, 544, 420, + 1570, 6322, 11, 51572], "temperature": 0.0, "avg_logprob": -0.14122381015699736, + "compression_ratio": 1.6842105263157894, "no_speech_prob": 0.01424292754381895}, + {"id": 360, "seek": 233908, "start": 2339.08, "end": 2344.52, "text": " but in a + specific domain, it might not be, right? Yes, like what do we do?", "tokens": [50364, + 457, 294, 257, 2685, 9274, 11, 309, 1062, 406, 312, 11, 558, 30, 1079, 11, 411, + 437, 360, 321, 360, 30, 50636], "temperature": 0.0, "avg_logprob": -0.2686802724773964, + "compression_ratio": 1.5818181818181818, "no_speech_prob": 0.019221186637878418}, + {"id": 361, "seek": 233908, "start": 2346.68, "end": 2351.24, "text": " Maybe you + are not using the right tool, I don''t know. Maybe we are obsessed on using the", + "tokens": [50744, 2704, 291, 366, 406, 1228, 264, 558, 2290, 11, 286, 500, 380, + 458, 13, 2704, 321, 366, 16923, 322, 1228, 264, 50972], "temperature": 0.0, "avg_logprob": + -0.2686802724773964, "compression_ratio": 1.5818181818181818, "no_speech_prob": + 0.019221186637878418}, {"id": 362, "seek": 233908, "start": 2352.2799999999997, + "end": 2360.04, "text": " declared for everything. 
But I think these two walls of + keyword, what we call traditional", "tokens": [51024, 15489, 337, 1203, 13, 583, + 286, 519, 613, 732, 7920, 295, 20428, 11, 437, 321, 818, 5164, 51412], "temperature": + 0.0, "avg_logprob": -0.2686802724773964, "compression_ratio": 1.5818181818181818, + "no_speech_prob": 0.019221186637878418}, {"id": 363, "seek": 233908, "start": 2360.04, + "end": 2365.4, "text": " search and this neural search, I think they can be combined + to power things to the next level.", "tokens": [51412, 3164, 293, 341, 18161, 3164, + 11, 286, 519, 436, 393, 312, 9354, 281, 1347, 721, 281, 264, 958, 1496, 13, 51680], + "temperature": 0.0, "avg_logprob": -0.2686802724773964, "compression_ratio": 1.5818181818181818, + "no_speech_prob": 0.019221186637878418}, {"id": 364, "seek": 236540, "start": 2365.48, + "end": 2371.7200000000003, "text": " I think they need to be enemies and there is + good and bad team both sides. Do you have any thoughts", "tokens": [50368, 286, + 519, 436, 643, 281, 312, 7805, 293, 456, 307, 665, 293, 1578, 1469, 1293, 4881, + 13, 1144, 291, 362, 604, 4598, 50680], "temperature": 0.0, "avg_logprob": -0.24581652323404948, + "compression_ratio": 1.5904255319148937, "no_speech_prob": 0.009404975920915604}, + {"id": 365, "seek": 236540, "start": 2371.7200000000003, "end": 2381.96, "text": + " how you would combine? 
For instance, in any solution, you can have solution, I + don''t know, maybe you can", "tokens": [50680, 577, 291, 576, 10432, 30, 1171, 5197, + 11, 294, 604, 3827, 11, 291, 393, 362, 3827, 11, 286, 500, 380, 458, 11, 1310, 291, + 393, 51192], "temperature": 0.0, "avg_logprob": -0.24581652323404948, "compression_ratio": + 1.5904255319148937, "no_speech_prob": 0.009404975920915604}, {"id": 366, "seek": + 236540, "start": 2381.96, "end": 2391.0, "text": " get results based on both sides + and then at our ranking steps consider what is best, you know,", "tokens": [51192, + 483, 3542, 2361, 322, 1293, 4881, 293, 550, 412, 527, 17833, 4439, 1949, 437, 307, + 1151, 11, 291, 458, 11, 51644], "temperature": 0.0, "avg_logprob": -0.24581652323404948, + "compression_ratio": 1.5904255319148937, "no_speech_prob": 0.009404975920915604}, + {"id": 367, "seek": 239100, "start": 2391.64, "end": 2402.28, "text": " is this + a complex query? Maybe I''m looking more for some semantically reached solution.", + "tokens": [50396, 307, 341, 257, 3997, 14581, 30, 2704, 286, 478, 1237, 544, 337, + 512, 4361, 49505, 6488, 3827, 13, 50928], "temperature": 0.0, "avg_logprob": -0.3067806995276249, + "compression_ratio": 1.451086956521739, "no_speech_prob": 0.008017083629965782}, + {"id": 368, "seek": 239100, "start": 2403.24, "end": 2410.68, "text": " Did this + guy just send a couple of keywords? Is it semantically reached enough? No,", "tokens": + [50976, 2589, 341, 2146, 445, 2845, 257, 1916, 295, 21009, 30, 1119, 309, 4361, + 49505, 6488, 1547, 30, 883, 11, 51348], "temperature": 0.0, "avg_logprob": -0.3067806995276249, + "compression_ratio": 1.451086956521739, "no_speech_prob": 0.008017083629965782}, + {"id": 369, "seek": 239100, "start": 2411.32, "end": 2419.0, "text": " this user + might be expecting keyword based feedback. Yeah, that''s true. 
Well, you could even + go", "tokens": [51380, 341, 4195, 1062, 312, 9650, 20428, 2361, 5824, 13, 865, 11, + 300, 311, 2074, 13, 1042, 11, 291, 727, 754, 352, 51764], "temperature": 0.0, "avg_logprob": + -0.3067806995276249, "compression_ratio": 1.451086956521739, "no_speech_prob": 0.008017083629965782}, + {"id": 370, "seek": 241900, "start": 2419.0, "end": 2426.36, "text": " as simple + as giving that control to users. So, if they know that it''s keyword, they first + want to", "tokens": [50364, 382, 2199, 382, 2902, 300, 1969, 281, 5022, 13, 407, + 11, 498, 436, 458, 300, 309, 311, 20428, 11, 436, 700, 528, 281, 50732], "temperature": + 0.0, "avg_logprob": -0.3150469636263913, "compression_ratio": 1.5901639344262295, + "no_speech_prob": 0.007296734489500523}, {"id": 371, "seek": 241900, "start": 2426.36, + "end": 2431.32, "text": " go with what they know, what works or may not work and + then if they are not satisfied enough,", "tokens": [50732, 352, 365, 437, 436, 458, + 11, 437, 1985, 420, 815, 406, 589, 293, 550, 498, 436, 366, 406, 11239, 1547, 11, + 50980], "temperature": 0.0, "avg_logprob": -0.3150469636263913, "compression_ratio": + 1.5901639344262295, "no_speech_prob": 0.007296734489500523}, {"id": 372, "seek": + 241900, "start": 2431.32, "end": 2438.6, "text": " then they optimize for equal, + they might go into explorative mode, that''s on the similarity search.", "tokens": + [50980, 550, 436, 19719, 337, 2681, 11, 436, 1062, 352, 666, 24765, 1166, 4391, + 11, 300, 311, 322, 264, 32194, 3164, 13, 51344], "temperature": 0.0, "avg_logprob": + -0.3150469636263913, "compression_ratio": 1.5901639344262295, "no_speech_prob": + 0.007296734489500523}, {"id": 373, "seek": 243860, "start": 2439.0, "end": 2448.2799999999997, + "text": " That might be quite viable. So, it''s interesting. 
The problem is that + keyword search,", "tokens": [50384, 663, 1062, 312, 1596, 22024, 13, 407, 11, 309, + 311, 1880, 13, 440, 1154, 307, 300, 20428, 3164, 11, 50848], "temperature": 0.0, + "avg_logprob": -0.3180640846170405, "compression_ratio": 1.5570175438596492, "no_speech_prob": + 0.1174476370215416}, {"id": 374, "seek": 243860, "start": 2448.2799999999997, "end": + 2455.72, "text": " well, as far as search, it might have not a good future for image + based search or any other", "tokens": [50848, 731, 11, 382, 1400, 382, 3164, 11, + 309, 1062, 362, 406, 257, 665, 2027, 337, 3256, 2361, 3164, 420, 604, 661, 51220], + "temperature": 0.0, "avg_logprob": -0.3180640846170405, "compression_ratio": 1.5570175438596492, + "no_speech_prob": 0.1174476370215416}, {"id": 375, "seek": 243860, "start": 2455.72, + "end": 2461.16, "text": " mobility related search. Yeah, exactly. The moment you + go beyond text, what do you do?", "tokens": [51220, 16199, 4077, 3164, 13, 865, + 11, 2293, 13, 440, 1623, 291, 352, 4399, 2487, 11, 437, 360, 291, 360, 30, 51492], + "temperature": 0.0, "avg_logprob": -0.3180640846170405, "compression_ratio": 1.5570175438596492, + "no_speech_prob": 0.1174476370215416}, {"id": 376, "seek": 243860, "start": 2461.96, + "end": 2467.64, "text": " That''s a big power, I think, and the big future that + Neurochurch has ahead. 
There is where", "tokens": [51532, 663, 311, 257, 955, 1347, + 11, 286, 519, 11, 293, 264, 955, 2027, 300, 1734, 7052, 339, 2476, 575, 2286, 13, + 821, 307, 689, 51816], "temperature": 0.0, "avg_logprob": -0.3180640846170405, "compression_ratio": + 1.5570175438596492, "no_speech_prob": 0.1174476370215416}, {"id": 377, "seek": 246860, + "start": 2468.92, "end": 2473.0, "text": " not any traditional search solution, + I think, will keep up.", "tokens": [50380, 406, 604, 5164, 3164, 3827, 11, 286, + 519, 11, 486, 1066, 493, 13, 50584], "temperature": 0.0, "avg_logprob": -0.24995876761043773, + "compression_ratio": 1.6164383561643836, "no_speech_prob": 0.005052740685641766}, + {"id": 378, "seek": 246860, "start": 2473.96, "end": 2476.8399999999997, "text": + " So, if I want to build a multimodal search, can I pick some", "tokens": [50632, + 407, 11, 498, 286, 528, 281, 1322, 257, 32972, 378, 304, 3164, 11, 393, 286, 1888, + 512, 50776], "temperature": 0.0, "avg_logprob": -0.24995876761043773, "compression_ratio": + 1.6164383561643836, "no_speech_prob": 0.005052740685641766}, {"id": 379, "seek": + 246860, "start": 2477.72, "end": 2482.04, "text": " executor from the marketplace + and plug it into Gina today and do it?", "tokens": [50820, 7568, 284, 490, 264, + 19455, 293, 5452, 309, 666, 34711, 965, 293, 360, 309, 30, 51036], "temperature": + 0.0, "avg_logprob": -0.24995876761043773, "compression_ratio": 1.6164383561643836, + "no_speech_prob": 0.005052740685641766}, {"id": 380, "seek": 246860, "start": 2483.56, + "end": 2487.48, "text": " Yeah, I don''t know. I think we have some, for instance, + but you can use clip", "tokens": [51112, 865, 11, 286, 500, 380, 458, 13, 286, 519, + 321, 362, 512, 11, 337, 5197, 11, 457, 291, 393, 764, 7353, 51308], "temperature": + 0.0, "avg_logprob": -0.24995876761043773, "compression_ratio": 1.6164383561643836, + "no_speech_prob": 0.005052740685641766}, {"id": 381, "seek": 246860, "start": 2488.2, + "end": 2493.0, "text": " that clip. 
You can use clip to encode. I think there is + audio, you can text, or there is", "tokens": [51344, 300, 7353, 13, 509, 393, 764, + 7353, 281, 2058, 1429, 13, 286, 519, 456, 307, 6278, 11, 291, 393, 2487, 11, 420, + 456, 307, 51584], "temperature": 0.0, "avg_logprob": -0.24995876761043773, "compression_ratio": + 1.6164383561643836, "no_speech_prob": 0.005052740685641766}, {"id": 382, "seek": + 249300, "start": 2493.72, "end": 2499.24, "text": " image and text, and it performs + very well. We have wrapped it in one of these executors and", "tokens": [50400, + 3256, 293, 2487, 11, 293, 309, 26213, 588, 731, 13, 492, 362, 14226, 309, 294, 472, + 295, 613, 7568, 830, 293, 50676], "temperature": 0.0, "avg_logprob": -0.28962413124416186, + "compression_ratio": 1.6069868995633187, "no_speech_prob": 0.006543709896504879}, + {"id": 383, "seek": 249300, "start": 2499.24, "end": 2504.36, "text": " hot modules, + and you can use these clip models to do your close model search.", "tokens": [50676, + 2368, 16679, 11, 293, 291, 393, 764, 613, 7353, 5245, 281, 360, 428, 1998, 2316, + 3164, 13, 50932], "temperature": 0.0, "avg_logprob": -0.28962413124416186, "compression_ratio": + 1.6069868995633187, "no_speech_prob": 0.006543709896504879}, {"id": 384, "seek": + 249300, "start": 2505.96, "end": 2511.56, "text": " It''s quite efficient without + the match-faint tuning to search for images given text and the other", "tokens": + [51012, 467, 311, 1596, 7148, 1553, 264, 2995, 12, 69, 5114, 15164, 281, 3164, 337, + 5267, 2212, 2487, 293, 264, 661, 51292], "temperature": 0.0, "avg_logprob": -0.28962413124416186, + "compression_ratio": 1.6069868995633187, "no_speech_prob": 0.006543709896504879}, + {"id": 385, "seek": 249300, "start": 2511.56, "end": 2517.8, "text": " way around. + It''s quite impressive. Yeah, that sounds cool. 
When I was thinking, if I want to + combine", "tokens": [51292, 636, 926, 13, 467, 311, 1596, 8992, 13, 865, 11, 300, + 3263, 1627, 13, 1133, 286, 390, 1953, 11, 498, 286, 528, 281, 10432, 51604], "temperature": + 0.0, "avg_logprob": -0.28962413124416186, "compression_ratio": 1.6069868995633187, + "no_speech_prob": 0.006543709896504879}, {"id": 386, "seek": 251780, "start": 2517.88, + "end": 2523.6400000000003, "text": " like speech, text, and image, then I need to + probably come up with some meta model of that.", "tokens": [50368, 411, 6218, 11, + 2487, 11, 293, 3256, 11, 550, 286, 643, 281, 1391, 808, 493, 365, 512, 19616, 2316, + 295, 300, 13, 50656], "temperature": 0.0, "avg_logprob": -0.19282292210778526, "compression_ratio": + 1.6944444444444444, "no_speech_prob": 0.006866848096251488}, {"id": 387, "seek": + 251780, "start": 2523.6400000000003, "end": 2529.4, "text": " Right? There is some + research in this area where it is not that like modalities are treated", "tokens": + [50656, 1779, 30, 821, 307, 512, 2132, 294, 341, 1859, 689, 309, 307, 406, 300, + 411, 1072, 16110, 366, 8668, 50944], "temperature": 0.0, "avg_logprob": -0.19282292210778526, + "compression_ratio": 1.6944444444444444, "no_speech_prob": 0.006866848096251488}, + {"id": 388, "seek": 251780, "start": 2529.4, "end": 2535.1600000000003, "text": + " differently and encoded separately, but where they are considered together, even + there is some", "tokens": [50944, 7614, 293, 2058, 12340, 14759, 11, 457, 689, 436, + 366, 4888, 1214, 11, 754, 456, 307, 512, 51232], "temperature": 0.0, "avg_logprob": + -0.19282292210778526, "compression_ratio": 1.6944444444444444, "no_speech_prob": + 0.006866848096251488}, {"id": 389, "seek": 251780, "start": 2535.1600000000003, + "end": 2541.2400000000002, "text": " research where there is multimodality and some + contact switch, so they move the vector.", "tokens": [51232, 2132, 689, 456, 307, + 32972, 378, 1860, 293, 512, 3385, 3679, 11, 370, 436, 1286, 264, 
8062, 13, 51536], + "temperature": 0.0, "avg_logprob": -0.19282292210778526, "compression_ratio": 1.6944444444444444, + "no_speech_prob": 0.006866848096251488}, {"id": 390, "seek": 254124, "start": 2541.72, + "end": 2548.04, "text": " So that''s also possible to get the latest research, wrap + it into one of these models and", "tokens": [50388, 407, 300, 311, 611, 1944, 281, + 483, 264, 6792, 2132, 11, 7019, 309, 666, 472, 295, 613, 5245, 293, 50704], "temperature": + 0.0, "avg_logprob": -0.2535543441772461, "compression_ratio": 1.6123348017621146, + "no_speech_prob": 0.003322682110592723}, {"id": 391, "seek": 254124, "start": 2548.04, + "end": 2554.4399999999996, "text": " deploy it in production. But this is not so + easy. For us, we didn''t focus on building these", "tokens": [50704, 7274, 309, + 294, 4265, 13, 583, 341, 307, 406, 370, 1858, 13, 1171, 505, 11, 321, 994, 380, + 1879, 322, 2390, 613, 51024], "temperature": 0.0, "avg_logprob": -0.2535543441772461, + "compression_ratio": 1.6123348017621146, "no_speech_prob": 0.003322682110592723}, + {"id": 392, "seek": 254124, "start": 2554.4399999999996, "end": 2561.16, "text": + " front scratch, but we''re also looking to having these top-notch researchers into + building", "tokens": [51024, 1868, 8459, 11, 457, 321, 434, 611, 1237, 281, 1419, + 613, 1192, 12, 2247, 339, 10309, 666, 2390, 51360], "temperature": 0.0, "avg_logprob": + -0.2535543441772461, "compression_ratio": 1.6123348017621146, "no_speech_prob": + 0.003322682110592723}, {"id": 393, "seek": 254124, "start": 2562.12, "end": 2570.52, + "text": " these modules in here. 
So like, in that case, would you prefer communities + to help out to bring", "tokens": [51408, 613, 16679, 294, 510, 13, 407, 411, 11, + 294, 300, 1389, 11, 576, 291, 4382, 4456, 281, 854, 484, 281, 1565, 51828], "temperature": + 0.0, "avg_logprob": -0.2535543441772461, "compression_ratio": 1.6123348017621146, + "no_speech_prob": 0.003322682110592723}, {"id": 394, "seek": 257052, "start": 2570.52, + "end": 2578.92, "text": " in the model, or are you helping to do that? Right now, + we are driving this direction to offer", "tokens": [50364, 294, 264, 2316, 11, 420, + 366, 291, 4315, 281, 360, 300, 30, 1779, 586, 11, 321, 366, 4840, 341, 3513, 281, + 2626, 50784], "temperature": 0.0, "avg_logprob": -0.3101684794706457, "compression_ratio": + 1.619718309859155, "no_speech_prob": 0.0037170234136283398}, {"id": 395, "seek": + 257052, "start": 2578.92, "end": 2585.08, "text": " these for the community. I think + that our dream as an open source is to have the community", "tokens": [50784, 613, + 337, 264, 1768, 13, 286, 519, 300, 527, 3055, 382, 364, 1269, 4009, 307, 281, 362, + 264, 1768, 51092], "temperature": 0.0, "avg_logprob": -0.3101684794706457, "compression_ratio": + 1.619718309859155, "no_speech_prob": 0.0037170234136283398}, {"id": 396, "seek": + 257052, "start": 2585.08, "end": 2592.2, "text": " flourish and be alive upon itself. 
+ So the future should be community driven.", "tokens": [51092, 38311, 293, 312, 5465, + 3564, 2564, 13, 407, 264, 2027, 820, 312, 1768, 9555, 13, 51448], "temperature": + 0.0, "avg_logprob": -0.3101684794706457, "compression_ratio": 1.619718309859155, + "no_speech_prob": 0.0037170234136283398}, {"id": 397, "seek": 257052, "start": 2593.16, + "end": 2599.0, "text": " Yeah, because in the end, community might also know kind + of when these growths be,", "tokens": [51496, 865, 11, 570, 294, 264, 917, 11, 1768, + 1062, 611, 458, 733, 295, 562, 613, 4599, 82, 312, 11, 51788], "temperature": 0.0, + "avg_logprob": -0.3101684794706457, "compression_ratio": 1.619718309859155, "no_speech_prob": + 0.0037170234136283398}, {"id": 398, "seek": 259900, "start": 2599.0, "end": 2604.92, + "text": " you know, community will be kind of helping each other. Like some of these + things will become", "tokens": [50364, 291, 458, 11, 1768, 486, 312, 733, 295, 4315, + 1184, 661, 13, 1743, 512, 295, 613, 721, 486, 1813, 50660], "temperature": 0.0, + "avg_logprob": -0.13075381462727118, "compression_ratio": 1.8638132295719845, "no_speech_prob": + 0.004427563864737749}, {"id": 399, "seek": 259900, "start": 2605.48, "end": 2610.76, + "text": " what you may call commodity to some extent, right? 
Or at least the way + you integrate might become", "tokens": [50688, 437, 291, 815, 818, 29125, 281, 512, + 8396, 11, 558, 30, 1610, 412, 1935, 264, 636, 291, 13365, 1062, 1813, 50952], "temperature": + 0.0, "avg_logprob": -0.13075381462727118, "compression_ratio": 1.8638132295719845, + "no_speech_prob": 0.004427563864737749}, {"id": 400, "seek": 259900, "start": 2610.76, + "end": 2615.48, "text": " commodity and the use cases might become commodity and + there will be new use cases which are", "tokens": [50952, 29125, 293, 264, 764, + 3331, 1062, 1813, 29125, 293, 456, 486, 312, 777, 764, 3331, 597, 366, 51188], "temperature": + 0.0, "avg_logprob": -0.13075381462727118, "compression_ratio": 1.8638132295719845, + "no_speech_prob": 0.004427563864737749}, {"id": 401, "seek": 259900, "start": 2615.48, + "end": 2620.44, "text": " untapped, but I think community can definitely help out + each other. What we might need to focus on", "tokens": [51188, 517, 1328, 3320, + 11, 457, 286, 519, 1768, 393, 2138, 854, 484, 1184, 661, 13, 708, 321, 1062, 643, + 281, 1879, 322, 51436], "temperature": 0.0, "avg_logprob": -0.13075381462727118, + "compression_ratio": 1.8638132295719845, "no_speech_prob": 0.004427563864737749}, + {"id": 402, "seek": 259900, "start": 2620.44, "end": 2627.08, "text": " to make + these models easier to use or easier to find if we have a marketplace where everything,", + "tokens": [51436, 281, 652, 613, 5245, 3571, 281, 764, 420, 3571, 281, 915, 498, + 321, 362, 257, 19455, 689, 1203, 11, 51768], "temperature": 0.0, "avg_logprob": + -0.13075381462727118, "compression_ratio": 1.8638132295719845, "no_speech_prob": + 0.004427563864737749}, {"id": 403, "seek": 262708, "start": 2627.08, "end": 2632.68, + "text": " maybe we need to help the community on finding what they need in every + time.", "tokens": [50364, 1310, 321, 643, 281, 854, 264, 1768, 322, 5006, 437, 436, + 643, 294, 633, 565, 13, 50644], "temperature": 0.0, "avg_logprob": -0.2001635996500651, + 
"compression_ratio": 1.5260663507109005, "no_speech_prob": 0.005009770393371582}, + {"id": 404, "seek": 262708, "start": 2633.48, "end": 2640.36, "text": " Yeah, yeah. + Content wise, hopefully there is a time where community is the main contributor + there.", "tokens": [50684, 865, 11, 1338, 13, 30078, 10829, 11, 4696, 456, 307, + 257, 565, 689, 1768, 307, 264, 2135, 42859, 456, 13, 51028], "temperature": 0.0, + "avg_logprob": -0.2001635996500651, "compression_ratio": 1.5260663507109005, "no_speech_prob": + 0.005009770393371582}, {"id": 405, "seek": 262708, "start": 2641.16, "end": 2644.44, + "text": " Was there something else in Gina that we should know about as users?", + "tokens": [51068, 3027, 456, 746, 1646, 294, 34711, 300, 321, 820, 458, 466, 382, + 5022, 30, 51232], "temperature": 0.0, "avg_logprob": -0.2001635996500651, "compression_ratio": + 1.5260663507109005, "no_speech_prob": 0.005009770393371582}, {"id": 406, "seek": + 262708, "start": 2645.16, "end": 2650.44, "text": " Some cool feature or some system + that you think doesn''t exist in competitors?", "tokens": [51268, 2188, 1627, 4111, + 420, 512, 1185, 300, 291, 519, 1177, 380, 2514, 294, 18333, 30, 51532], "temperature": + 0.0, "avg_logprob": -0.2001635996500651, "compression_ratio": 1.5260663507109005, + "no_speech_prob": 0.005009770393371582}, {"id": 407, "seek": 265044, "start": 2650.44, + "end": 2653.48, "text": " Is there something at all to school?", "tokens": [50364, + 1119, 456, 746, 412, 439, 281, 1395, 30, 50516], "temperature": 0.0, "avg_logprob": + -0.526416318962373, "compression_ratio": 1.592964824120603, "no_speech_prob": 0.03870006650686264}, + {"id": 408, "seek": 265044, "start": 2654.92, "end": 2661.16, "text": " I don''t + know right now about competitors. 
So I think what I like the most is the easy,", + "tokens": [50588, 286, 500, 380, 458, 558, 586, 466, 18333, 13, 407, 286, 519, 437, + 286, 411, 264, 881, 307, 264, 1858, 11, 50900], "temperature": 0.0, "avg_logprob": + -0.526416318962373, "compression_ratio": 1.592964824120603, "no_speech_prob": 0.03870006650686264}, + {"id": 409, "seek": 265044, "start": 2662.04, "end": 2670.76, "text": " the easy + views and the time saving. You go out to our readme and you try to build from zero + to", "tokens": [50944, 264, 1858, 6809, 293, 264, 565, 6816, 13, 509, 352, 484, + 281, 527, 1401, 1398, 293, 291, 853, 281, 1322, 490, 4018, 281, 51380], "temperature": + 0.0, "avg_logprob": -0.526416318962373, "compression_ratio": 1.592964824120603, + "no_speech_prob": 0.03870006650686264}, {"id": 410, "seek": 265044, "start": 2672.2000000000003, + "end": 2676.36, "text": " to the plighting core net is an neural search solution + and image search solution. I think you will", "tokens": [51452, 281, 264, 499, 397, + 278, 4965, 2533, 307, 364, 18161, 3164, 3827, 293, 3256, 3164, 3827, 13, 286, 519, + 291, 486, 51660], "temperature": 0.0, "avg_logprob": -0.526416318962373, "compression_ratio": + 1.592964824120603, "no_speech_prob": 0.03870006650686264}, {"id": 411, "seek": 267636, + "start": 2676.76, "end": 2683.56, "text": " you will all enjoy days and nights. + Yeah. Yeah. So it''s like kind of well-oiled machine.", "tokens": [50384, 291, 486, + 439, 2103, 1708, 293, 13249, 13, 865, 13, 865, 13, 407, 309, 311, 411, 733, 295, + 731, 12, 78, 7292, 3479, 13, 50724], "temperature": 0.0, "avg_logprob": -0.29376270294189455, + "compression_ratio": 1.5676855895196506, "no_speech_prob": 0.025605594739317894}, + {"id": 412, "seek": 267636, "start": 2684.36, "end": 2691.32, "text": " But can + I also bring it up on my laptop? Yes. 
You can try on your laptop everything.", "tokens": + [50764, 583, 393, 286, 611, 1565, 309, 493, 322, 452, 10732, 30, 1079, 13, 509, + 393, 853, 322, 428, 10732, 1203, 13, 51112], "temperature": 0.0, "avg_logprob": + -0.29376270294189455, "compression_ratio": 1.5676855895196506, "no_speech_prob": + 0.025605594739317894}, {"id": 413, "seek": 267636, "start": 2691.32, "end": 2699.1600000000003, + "text": " So the point is you may not be able to index so many images but you can + get the first feeling", "tokens": [51112, 407, 264, 935, 307, 291, 815, 406, 312, + 1075, 281, 8186, 370, 867, 5267, 457, 291, 393, 483, 264, 700, 2633, 51504], "temperature": + 0.0, "avg_logprob": -0.29376270294189455, "compression_ratio": 1.5676855895196506, + "no_speech_prob": 0.025605594739317894}, {"id": 414, "seek": 267636, "start": 2699.1600000000003, + "end": 2703.6400000000003, "text": " with your laptop. Yeah, I mean if I want to + be like a demo to impress my manager, you know,", "tokens": [51504, 365, 428, 10732, + 13, 865, 11, 286, 914, 498, 286, 528, 281, 312, 411, 257, 10723, 281, 6729, 452, + 6598, 11, 291, 458, 11, 51728], "temperature": 0.0, "avg_logprob": -0.29376270294189455, + "compression_ratio": 1.5676855895196506, "no_speech_prob": 0.025605594739317894}, + {"id": 415, "seek": 270364, "start": 2704.12, "end": 2709.4, "text": " so I usually + use my laptop right? Like that''s maybe one way. Gina is really for that.", "tokens": + [50388, 370, 286, 2673, 764, 452, 10732, 558, 30, 1743, 300, 311, 1310, 472, 636, + 13, 34711, 307, 534, 337, 300, 13, 50652], "temperature": 0.0, "avg_logprob": -0.2590641055190772, + "compression_ratio": 1.7153846153846153, "no_speech_prob": 0.02220165729522705}, + {"id": 416, "seek": 270364, "start": 2709.7999999999997, "end": 2714.68, "text": + " Yeah, that''s that''s pretty cool. 
And I think also like it''s nice that you said + it''s", "tokens": [50672, 865, 11, 300, 311, 300, 311, 1238, 1627, 13, 400, 286, + 519, 611, 411, 309, 311, 1481, 300, 291, 848, 309, 311, 50916], "temperature": 0.0, + "avg_logprob": -0.2590641055190772, "compression_ratio": 1.7153846153846153, "no_speech_prob": + 0.02220165729522705}, {"id": 417, "seek": 270364, "start": 2715.56, "end": 2722.6, + "text": " Python friendly so it opens doors to so many things like especially like + on hiking face it''s", "tokens": [50960, 15329, 9208, 370, 309, 9870, 8077, 281, + 370, 867, 721, 411, 2318, 411, 322, 23784, 1851, 309, 311, 51312], "temperature": + 0.0, "avg_logprob": -0.2590641055190772, "compression_ratio": 1.7153846153846153, + "no_speech_prob": 0.02220165729522705}, {"id": 418, "seek": 270364, "start": 2722.6, + "end": 2727.24, "text": " pretty much all Python right so I need to pick some models + like it in and do I need to", "tokens": [51312, 1238, 709, 439, 15329, 558, 370, + 286, 643, 281, 1888, 512, 5245, 411, 309, 294, 293, 360, 286, 643, 281, 51544], + "temperature": 0.0, "avg_logprob": -0.2590641055190772, "compression_ratio": 1.7153846153846153, + "no_speech_prob": 0.02220165729522705}, {"id": 419, "seek": 270364, "start": 2727.24, + "end": 2732.2, "text": " containerize it maybe or figure out isolation and so on + like just plug it in and start using it.", "tokens": [51544, 10129, 1125, 309, 1310, + 420, 2573, 484, 16001, 293, 370, 322, 411, 445, 5452, 309, 294, 293, 722, 1228, + 309, 13, 51792], "temperature": 0.0, "avg_logprob": -0.2590641055190772, "compression_ratio": + 1.7153846153846153, "no_speech_prob": 0.02220165729522705}, {"id": 420, "seek": + 273220, "start": 2732.2, "end": 2740.04, "text": " I think that''s also a great + boost to productivity and actually kind of implementing the use case", "tokens": + [50364, 286, 519, 300, 311, 611, 257, 869, 9194, 281, 15604, 293, 767, 733, 295, + 18114, 264, 764, 1389, 50756], "temperature": 0.0, 
"avg_logprob": -0.1659651096050556, + "compression_ratio": 1.541237113402062, "no_speech_prob": 0.0006990849506109953}, + {"id": 421, "seek": 273220, "start": 2740.04, "end": 2747.24, "text": " rather than + focusing on some mundane components and parts and processes right? Yeah and even + these", "tokens": [50756, 2831, 813, 8416, 322, 512, 43497, 6677, 293, 3166, 293, + 7555, 558, 30, 865, 293, 754, 613, 51116], "temperature": 0.0, "avg_logprob": -0.1659651096050556, + "compression_ratio": 1.541237113402062, "no_speech_prob": 0.0006990849506109953}, + {"id": 422, "seek": 273220, "start": 2747.24, "end": 2755.3999999999996, "text": + " modules that we have they are already containerized for you. So we have on our + end we build a container", "tokens": [51116, 16679, 300, 321, 362, 436, 366, 1217, + 10129, 1602, 337, 291, 13, 407, 321, 362, 322, 527, 917, 321, 1322, 257, 10129, + 51524], "temperature": 0.0, "avg_logprob": -0.1659651096050556, "compression_ratio": + 1.541237113402062, "no_speech_prob": 0.0006990849506109953}, {"id": 423, "seek": + 275540, "start": 2756.04, "end": 2763.0, "text": " for you so that you can be in + an isolated way with your all your dependencies and stuff.", "tokens": [50396, 337, + 291, 370, 300, 291, 393, 312, 294, 364, 14621, 636, 365, 428, 439, 428, 36606, 293, + 1507, 13, 50744], "temperature": 0.0, "avg_logprob": -0.23395265534866688, "compression_ratio": + 1.5194805194805194, "no_speech_prob": 0.003008848987519741}, {"id": 424, "seek": + 275540, "start": 2763.0, "end": 2768.44, "text": " Yeah, yeah. Sounds great. 
I mean + I think we now have pretty good understanding of Gina.", "tokens": [50744, 865, + 11, 1338, 13, 14576, 869, 13, 286, 914, 286, 519, 321, 586, 362, 1238, 665, 3701, + 295, 34711, 13, 51016], "temperature": 0.0, "avg_logprob": -0.23395265534866688, + "compression_ratio": 1.5194805194805194, "no_speech_prob": 0.003008848987519741}, + {"id": 425, "seek": 275540, "start": 2768.44, "end": 2773.8, "text": " Of course + we didn''t read the docs yet but it''s sounds promising so I hope some of", "tokens": + [51016, 2720, 1164, 321, 994, 380, 1401, 264, 45623, 1939, 457, 309, 311, 3263, + 20257, 370, 286, 1454, 512, 295, 51284], "temperature": 0.0, "avg_logprob": -0.23395265534866688, + "compression_ratio": 1.5194805194805194, "no_speech_prob": 0.003008848987519741}, + {"id": 426, "seek": 275540, "start": 2773.8, "end": 2780.6800000000003, "text": + " listeners and audience will take it out. I wanted to go more into this kind of + philosophical", "tokens": [51284, 23274, 293, 4034, 486, 747, 309, 484, 13, 286, + 1415, 281, 352, 544, 666, 341, 733, 295, 25066, 51628], "temperature": 0.0, "avg_logprob": + -0.23395265534866688, "compression_ratio": 1.5194805194805194, "no_speech_prob": + 0.003008848987519741}, {"id": 427, "seek": 278068, "start": 2780.68, "end": 2785.16, + "text": " level like what what drives you in this space? Like you said that you''ve + been working in web", "tokens": [50364, 1496, 411, 437, 437, 11754, 291, 294, 341, + 1901, 30, 1743, 291, 848, 300, 291, 600, 668, 1364, 294, 3670, 50588], "temperature": + 0.0, "avg_logprob": -0.20741359486299402, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.009304668754339218}, {"id": 428, "seek": 278068, "start": 2785.16, + "end": 2789.96, "text": " scale search as well before right? 
And like some other + search and engineering in general.", "tokens": [50588, 4373, 3164, 382, 731, 949, + 558, 30, 400, 411, 512, 661, 3164, 293, 7043, 294, 2674, 13, 50828], "temperature": + 0.0, "avg_logprob": -0.20741359486299402, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.009304668754339218}, {"id": 429, "seek": 278068, "start": 2790.6, + "end": 2796.68, "text": " So what drives you here now in this area when you join + Gina and why why you join Gina?", "tokens": [50860, 407, 437, 11754, 291, 510, 586, + 294, 341, 1859, 562, 291, 3917, 34711, 293, 983, 983, 291, 3917, 34711, 30, 51164], + "temperature": 0.0, "avg_logprob": -0.20741359486299402, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.009304668754339218}, {"id": 430, "seek": 278068, "start": 2797.64, + "end": 2805.48, "text": " So I joined Gina especially as I was in this traditional + search space I was working on training", "tokens": [51212, 407, 286, 6869, 34711, + 2318, 382, 286, 390, 294, 341, 5164, 3164, 1901, 286, 390, 1364, 322, 3097, 51604], + "temperature": 0.0, "avg_logprob": -0.20741359486299402, "compression_ratio": 1.7464114832535884, + "no_speech_prob": 0.009304668754339218}, {"id": 431, "seek": 280548, "start": 2805.48, + "end": 2815.96, "text": " ranking models. So what drove me more is to enable this + search system, this search experience", "tokens": [50364, 17833, 5245, 13, 407, + 437, 13226, 385, 544, 307, 281, 9528, 341, 3164, 1185, 11, 341, 3164, 1752, 50888], + "temperature": 0.0, "avg_logprob": -0.26310432563393804, "compression_ratio": 1.4947368421052631, + "no_speech_prob": 0.0009810883784666657}, {"id": 432, "seek": 280548, "start": 2815.96, + "end": 2823.64, "text": " instance beyond text. 
I was super curious about how we + can extend it''s impressive to me how the", "tokens": [50888, 5197, 4399, 2487, + 13, 286, 390, 1687, 6369, 466, 577, 321, 393, 10101, 309, 311, 8992, 281, 385, 577, + 264, 51272], "temperature": 0.0, "avg_logprob": -0.26310432563393804, "compression_ratio": + 1.4947368421052631, "no_speech_prob": 0.0009810883784666657}, {"id": 433, "seek": + 280548, "start": 2823.64, "end": 2829.32, "text": " same framework of getting something + that extracts meaningful vectors with semantic information", "tokens": [51272, 912, + 8388, 295, 1242, 746, 300, 8947, 82, 10995, 18875, 365, 47982, 1589, 51556], "temperature": + 0.0, "avg_logprob": -0.26310432563393804, "compression_ratio": 1.4947368421052631, + "no_speech_prob": 0.0009810883784666657}, {"id": 434, "seek": 282932, "start": 2830.04, + "end": 2837.32, "text": " can be used for images, for video, for audio, for anything. + These frameworks I think it has a lot", "tokens": [50400, 393, 312, 1143, 337, 5267, + 11, 337, 960, 11, 337, 6278, 11, 337, 1340, 13, 1981, 29834, 286, 519, 309, 575, + 257, 688, 50764], "temperature": 0.0, "avg_logprob": -0.24318938657461878, "compression_ratio": + 1.660633484162896, "no_speech_prob": 0.005732949823141098}, {"id": 435, "seek": + 282932, "start": 2837.32, "end": 2845.1600000000003, "text": " of features because + it''s quite and also how the how the different research areas from different", "tokens": + [50764, 295, 4122, 570, 309, 311, 1596, 293, 611, 577, 264, 577, 264, 819, 2132, + 3179, 490, 819, 51156], "temperature": 0.0, "avg_logprob": -0.24318938657461878, + "compression_ratio": 1.660633484162896, "no_speech_prob": 0.005732949823141098}, + {"id": 436, "seek": 282932, "start": 2845.1600000000003, "end": 2849.48, "text": + " modalities interact with each other. I don''t know I don''t know. 
Trans the conversion", + "tokens": [51156, 1072, 16110, 4648, 365, 1184, 661, 13, 286, 500, 380, 458, 286, + 500, 380, 458, 13, 6531, 264, 14298, 51372], "temperature": 0.0, "avg_logprob": + -0.24318938657461878, "compression_ratio": 1.660633484162896, "no_speech_prob": + 0.005732949823141098}, {"id": 437, "seek": 282932, "start": 2849.48, "end": 2856.04, + "text": " neural network appeared even some text classification used to do these + then appeared the", "tokens": [51372, 18161, 3209, 8516, 754, 512, 2487, 21538, + 1143, 281, 360, 613, 550, 8516, 264, 51700], "temperature": 0.0, "avg_logprob": + -0.24318938657461878, "compression_ratio": 1.660633484162896, "no_speech_prob": + 0.005732949823141098}, {"id": 438, "seek": 285604, "start": 2856.04, "end": 2862.12, + "text": " transformer right now the computer vision community is getting in love + with transformers.", "tokens": [50364, 31782, 558, 586, 264, 3820, 5201, 1768, 307, + 1242, 294, 959, 365, 4088, 433, 13, 50668], "temperature": 0.0, "avg_logprob": -0.26536799025261537, + "compression_ratio": 1.6460176991150441, "no_speech_prob": 0.00353757431730628}, + {"id": 439, "seek": 285604, "start": 2862.92, "end": 2869.32, "text": " These back + and forth I think that it''s impressive but also if you think of the magic of getting", + "tokens": [50708, 1981, 646, 293, 5220, 286, 519, 300, 309, 311, 8992, 457, 611, + 498, 291, 519, 295, 264, 5585, 295, 1242, 51028], "temperature": 0.0, "avg_logprob": + -0.26536799025261537, "compression_ratio": 1.6460176991150441, "no_speech_prob": + 0.00353757431730628}, {"id": 440, "seek": 285604, "start": 2869.32, "end": 2875.24, + "text": " this vector and having so much meaning there it''s quite amazing. 
Yeah + it''s true I mean it''s", "tokens": [51028, 341, 8062, 293, 1419, 370, 709, 3620, + 456, 309, 311, 1596, 2243, 13, 865, 309, 311, 2074, 286, 914, 309, 311, 51324], + "temperature": 0.0, "avg_logprob": -0.26536799025261537, "compression_ratio": 1.6460176991150441, + "no_speech_prob": 0.00353757431730628}, {"id": 441, "seek": 285604, "start": 2875.72, + "end": 2883.32, "text": " very powerful you know like that the the the sheer fact + that you don''t need to build a synonym", "tokens": [51348, 588, 4005, 291, 458, + 411, 300, 264, 264, 264, 23061, 1186, 300, 291, 500, 380, 643, 281, 1322, 257, 5451, + 12732, 51728], "temperature": 0.0, "avg_logprob": -0.26536799025261537, "compression_ratio": + 1.6460176991150441, "no_speech_prob": 0.00353757431730628}, {"id": 442, "seek": + 288332, "start": 2883.32, "end": 2889.6400000000003, "text": " like dictionary if + you go full text right like it just tells you that yeah mathematics is close to", + "tokens": [50364, 411, 25890, 498, 291, 352, 1577, 2487, 558, 411, 309, 445, 5112, + 291, 300, 1338, 18666, 307, 1998, 281, 50680], "temperature": 0.0, "avg_logprob": + -0.10486579209231259, "compression_ratio": 1.7627906976744185, "no_speech_prob": + 0.010772858746349812}, {"id": 443, "seek": 288332, "start": 2889.6400000000003, + "end": 2895.96, "text": " algebra or you know like you throw data at it and it''s + an unsupervised right it just tells you", "tokens": [50680, 21989, 420, 291, 458, + 411, 291, 3507, 1412, 412, 309, 293, 309, 311, 364, 2693, 12879, 24420, 558, 309, + 445, 5112, 291, 50996], "temperature": 0.0, "avg_logprob": -0.10486579209231259, + "compression_ratio": 1.7627906976744185, "no_speech_prob": 0.010772858746349812}, + {"id": 444, "seek": 288332, "start": 2895.96, "end": 2901.6400000000003, "text": + " hey I''ve trained it up like now okay I can tell you what''s close to each other + geometrically", "tokens": [50996, 4177, 286, 600, 8895, 309, 493, 411, 586, 1392, + 286, 393, 980, 291, 437, 311, 
1998, 281, 1184, 661, 12956, 81, 984, 51280], "temperature": + 0.0, "avg_logprob": -0.10486579209231259, "compression_ratio": 1.7627906976744185, + "no_speech_prob": 0.010772858746349812}, {"id": 445, "seek": 288332, "start": 2901.6400000000003, + "end": 2907.2400000000002, "text": " it also has the mathematical beauty there right + geometric closeness rather than kind of some", "tokens": [51280, 309, 611, 575, + 264, 18894, 6643, 456, 558, 33246, 2611, 15264, 2831, 813, 733, 295, 512, 51560], + "temperature": 0.0, "avg_logprob": -0.10486579209231259, "compression_ratio": 1.7627906976744185, + "no_speech_prob": 0.010772858746349812}, {"id": 446, "seek": 290724, "start": 2907.24, + "end": 2914.68, "text": " obscure strange abstract sparse closeness it''s quite + elegant yeah yeah you have", "tokens": [50364, 34443, 5861, 12649, 637, 11668, 2611, + 15264, 309, 311, 1596, 21117, 1338, 1338, 291, 362, 50736], "temperature": 0.0, + "avg_logprob": -0.2207587787083217, "compression_ratio": 1.7109004739336493, "no_speech_prob": + 0.003598213894292712}, {"id": 447, "seek": 290724, "start": 2914.68, "end": 2919.0, + "text": " tracked all these knowledge all these and you have that this simple thing + that you can", "tokens": [50736, 31703, 439, 613, 3601, 439, 613, 293, 291, 362, + 300, 341, 2199, 551, 300, 291, 393, 50952], "temperature": 0.0, "avg_logprob": -0.2207587787083217, + "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.003598213894292712}, + {"id": 448, "seek": 290724, "start": 2919.64, "end": 2926.52, "text": " imagine + in your head as a 3D space and that is simple as algebra from I don''t know which + grade", "tokens": [50984, 3811, 294, 428, 1378, 382, 257, 805, 35, 1901, 293, 300, + 307, 2199, 382, 21989, 490, 286, 500, 380, 458, 597, 7204, 51328], "temperature": + 0.0, "avg_logprob": -0.2207587787083217, "compression_ratio": 1.7109004739336493, + "no_speech_prob": 0.003598213894292712}, {"id": 449, "seek": 290724, "start": 2926.52, + "end": 
2932.52, "text": " but quite simple yeah I think in simplicity there is a + lot of beauty yeah it''s very easy to explain", "tokens": [51328, 457, 1596, 2199, + 1338, 286, 519, 294, 25632, 456, 307, 257, 688, 295, 6643, 1338, 309, 311, 588, + 1858, 281, 2903, 51628], "temperature": 0.0, "avg_logprob": -0.2207587787083217, + "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.003598213894292712}, + {"id": 450, "seek": 293252, "start": 2932.6, "end": 2938.68, "text": " to your granny + like I''m doing this you know like it''s 3D space kind of there are points and I''m + just", "tokens": [50368, 281, 428, 44797, 411, 286, 478, 884, 341, 291, 458, 411, + 309, 311, 805, 35, 1901, 733, 295, 456, 366, 2793, 293, 286, 478, 445, 50672], "temperature": + 0.0, "avg_logprob": -0.14355464463823298, "compression_ratio": 1.7623318385650224, + "no_speech_prob": 0.003687912132591009}, {"id": 451, "seek": 293252, "start": 2938.68, + "end": 2945.48, "text": " looking the closest one I expect to have something that + puts things close to each other that makes", "tokens": [50672, 1237, 264, 13699, + 472, 286, 2066, 281, 362, 746, 300, 8137, 721, 1998, 281, 1184, 661, 300, 1669, + 51012], "temperature": 0.0, "avg_logprob": -0.14355464463823298, "compression_ratio": + 1.7623318385650224, "no_speech_prob": 0.003687912132591009}, {"id": 452, "seek": + 293252, "start": 2945.48, "end": 2951.4, "text": " sense together that is what we + expect from these black box exactly exactly and then and then the", "tokens": [51012, + 2020, 1214, 300, 307, 437, 321, 2066, 490, 613, 2211, 2424, 2293, 2293, 293, 550, + 293, 550, 264, 51308], "temperature": 0.0, "avg_logprob": -0.14355464463823298, + "compression_ratio": 1.7623318385650224, "no_speech_prob": 0.003687912132591009}, + {"id": 453, "seek": 293252, "start": 2951.4, "end": 2955.96, "text": " question + of scale like if you go to 10 million hundred million billion billion then okay + can you", "tokens": [51308, 1168, 295, 4373, 411, 498, 
291, 352, 281, 1266, 2459, + 3262, 2459, 5218, 5218, 550, 1392, 393, 291, 51536], "temperature": 0.0, "avg_logprob": + -0.14355464463823298, "compression_ratio": 1.7623318385650224, "no_speech_prob": + 0.003687912132591009}, {"id": 454, "seek": 295596, "start": 2956.04, "end": 2964.52, + "text": " trade some of that closeness precision you know and kind of get past the + speed so yeah it''s very", "tokens": [50368, 4923, 512, 295, 300, 2611, 15264, 18356, + 291, 458, 293, 733, 295, 483, 1791, 264, 3073, 370, 1338, 309, 311, 588, 50792], + "temperature": 0.0, "avg_logprob": -0.16583452958327075, "compression_ratio": 1.7349397590361446, + "no_speech_prob": 0.0026092110201716423}, {"id": 455, "seek": 295596, "start": 2964.52, + "end": 2970.6, "text": " interesting I mean it''s um does it does it interest you + more like on deep learning side or on", "tokens": [50792, 1880, 286, 914, 309, 311, + 1105, 775, 309, 775, 309, 1179, 291, 544, 411, 322, 2452, 2539, 1252, 420, 322, + 51096], "temperature": 0.0, "avg_logprob": -0.16583452958327075, "compression_ratio": + 1.7349397590361446, "no_speech_prob": 0.0026092110201716423}, {"id": 456, "seek": + 295596, "start": 2970.6, "end": 2978.52, "text": " mathematics side or engineering + side like or maybe some other side in every side from mathematics", "tokens": [51096, + 18666, 1252, 420, 7043, 1252, 411, 420, 1310, 512, 661, 1252, 294, 633, 1252, 490, + 18666, 51492], "temperature": 0.0, "avg_logprob": -0.16583452958327075, "compression_ratio": + 1.7349397590361446, "no_speech_prob": 0.0026092110201716423}, {"id": 457, "seek": + 297852, "start": 2979.48, "end": 2986.92, "text": " I enjoy a lot the beauty of + it sometimes it''s too obscure for me but I really like and understand", "tokens": + [50412, 286, 2103, 257, 688, 264, 6643, 295, 309, 2171, 309, 311, 886, 34443, 337, + 385, 457, 286, 534, 411, 293, 1223, 50784], "temperature": 0.0, "avg_logprob": -0.13012472299429087, + "compression_ratio": 1.6101694915254237, 
"no_speech_prob": 0.020431529730558395}, + {"id": 458, "seek": 297852, "start": 2986.92, "end": 2996.7599999999998, "text": + " deep learning I like it although I feel that some of some of the research doesn''t + seem to be so", "tokens": [50784, 2452, 2539, 286, 411, 309, 4878, 286, 841, 300, + 512, 295, 512, 295, 264, 2132, 1177, 380, 1643, 281, 312, 370, 51276], "temperature": + 0.0, "avg_logprob": -0.13012472299429087, "compression_ratio": 1.6101694915254237, + "no_speech_prob": 0.020431529730558395}, {"id": 459, "seek": 297852, "start": 2996.7599999999998, + "end": 3005.32, "text": " innovative and maybe we should spend more time checking + other paths and deep learning which", "tokens": [51276, 12999, 293, 1310, 321, 820, + 3496, 544, 565, 8568, 661, 14518, 293, 2452, 2539, 597, 51704], "temperature": 0.0, + "avg_logprob": -0.13012472299429087, "compression_ratio": 1.6101694915254237, "no_speech_prob": + 0.020431529730558395}, {"id": 460, "seek": 300532, "start": 3005.32, "end": 3010.2000000000003, + "text": " are the paths like I don''t know you should be honest I don''t know I''m + just feel I just feel that", "tokens": [50364, 366, 264, 14518, 411, 286, 500, 380, + 458, 291, 820, 312, 3245, 286, 500, 380, 458, 286, 478, 445, 841, 286, 445, 841, + 300, 50608], "temperature": 0.0, "avg_logprob": -0.22727193114578084, "compression_ratio": + 1.7085201793721974, "no_speech_prob": 0.0034583949018269777}, {"id": 461, "seek": + 300532, "start": 3010.76, "end": 3015.96, "text": " there''s so much literature + that I cannot keep up with and then from the engineering side I think", "tokens": + [50636, 456, 311, 370, 709, 10394, 300, 286, 2644, 1066, 493, 365, 293, 550, 490, + 264, 7043, 1252, 286, 519, 50896], "temperature": 0.0, "avg_logprob": -0.22727193114578084, + "compression_ratio": 1.7085201793721974, "no_speech_prob": 0.0034583949018269777}, + {"id": 462, "seek": 300532, "start": 3015.96, "end": 3024.28, "text": " it''s cool + it''s just space I think I also 
can provide more value and sometimes concepts are", + "tokens": [50896, 309, 311, 1627, 309, 311, 445, 1901, 286, 519, 286, 611, 393, + 2893, 544, 2158, 293, 2171, 10392, 366, 51312], "temperature": 0.0, "avg_logprob": + -0.22727193114578084, "compression_ratio": 1.7085201793721974, "no_speech_prob": + 0.0034583949018269777}, {"id": 463, "seek": 300532, "start": 3024.28, "end": 3031.0, + "text": " to abstract from yeah like for me like I want to call out but you like + point on is deep learning", "tokens": [51312, 281, 12649, 490, 1338, 411, 337, 385, + 411, 286, 528, 281, 818, 484, 457, 291, 411, 935, 322, 307, 2452, 2539, 51648], + "temperature": 0.0, "avg_logprob": -0.22727193114578084, "compression_ratio": 1.7085201793721974, + "no_speech_prob": 0.0034583949018269777}, {"id": 464, "seek": 303100, "start": 3031.0, + "end": 3037.96, "text": " the only way you know like for example one scary thing + is that these models are becoming more", "tokens": [50364, 264, 787, 636, 291, 458, + 411, 337, 1365, 472, 6958, 551, 307, 300, 613, 5245, 366, 5617, 544, 50712], "temperature": + 0.0, "avg_logprob": -0.11038015080594468, "compression_ratio": 1.7894736842105263, + "no_speech_prob": 0.002288981806486845}, {"id": 465, "seek": 303100, "start": 3037.96, + "end": 3043.32, "text": " and more kind of parameterized so you have like hundreds + of billions parameters maybe billion", "tokens": [50712, 293, 544, 733, 295, 13075, + 1602, 370, 291, 362, 411, 6779, 295, 17375, 9834, 1310, 5218, 50980], "temperature": + 0.0, "avg_logprob": -0.11038015080594468, "compression_ratio": 1.7894736842105263, + "no_speech_prob": 0.002288981806486845}, {"id": 466, "seek": 303100, "start": 3043.32, + "end": 3050.92, "text": " trillion like how many more can you have zillion parameters + in there but first of all it''s", "tokens": [50980, 18723, 411, 577, 867, 544, 393, + 291, 362, 710, 11836, 9834, 294, 456, 457, 700, 295, 439, 309, 311, 51360], "temperature": + 0.0, "avg_logprob": 
-0.11038015080594468, "compression_ratio": 1.7894736842105263, + "no_speech_prob": 0.002288981806486845}, {"id": 467, "seek": 303100, "start": 3050.92, + "end": 3055.72, "text": " impractical so if you take that model you try to plug + it in it doesn''t plug in because it''s too", "tokens": [51360, 704, 1897, 804, + 370, 498, 291, 747, 300, 2316, 291, 853, 281, 5452, 309, 294, 309, 1177, 380, 5452, + 294, 570, 309, 311, 886, 51600], "temperature": 0.0, "avg_logprob": -0.11038015080594468, + "compression_ratio": 1.7894736842105263, "no_speech_prob": 0.002288981806486845}, + {"id": 468, "seek": 305572, "start": 3055.72, "end": 3062.8399999999997, "text": + " expensive and also you might not have that much data in the first place right + so why should you", "tokens": [50364, 5124, 293, 611, 291, 1062, 406, 362, 300, + 709, 1412, 294, 264, 700, 1081, 558, 370, 983, 820, 291, 50720], "temperature": + 0.0, "avg_logprob": -0.12929350679570978, "compression_ratio": 1.6858407079646018, + "no_speech_prob": 0.0019374772673472762}, {"id": 469, "seek": 305572, "start": 3062.8399999999997, + "end": 3070.3599999999997, "text": " care like web scale search engines probably + will but like you as a researcher in let''s say a startup", "tokens": [50720, 1127, + 411, 3670, 4373, 3164, 12982, 1391, 486, 457, 411, 291, 382, 257, 21751, 294, 718, + 311, 584, 257, 18578, 51096], "temperature": 0.0, "avg_logprob": -0.12929350679570978, + "compression_ratio": 1.6858407079646018, "no_speech_prob": 0.0019374772673472762}, + {"id": 470, "seek": 305572, "start": 3070.3599999999997, "end": 3076.7599999999998, + "text": " you don''t know if you need that much you need to sell solve that specific + thing right so it will", "tokens": [51096, 291, 500, 380, 458, 498, 291, 643, 300, + 709, 291, 643, 281, 3607, 5039, 300, 2685, 551, 558, 370, 309, 486, 51416], "temperature": + 0.0, "avg_logprob": -0.12929350679570978, "compression_ratio": 1.6858407079646018, + "no_speech_prob": 
0.0019374772673472762}, {"id": 471, "seek": 305572, "start": 3076.7599999999998, + "end": 3083.3999999999996, "text": " it will look really strange to bring this huge + microscope and like GBT model in and say", "tokens": [51416, 309, 486, 574, 534, + 5861, 281, 1565, 341, 2603, 29753, 293, 411, 460, 33853, 2316, 294, 293, 584, 51748], + "temperature": 0.0, "avg_logprob": -0.12929350679570978, "compression_ratio": 1.6858407079646018, + "no_speech_prob": 0.0019374772673472762}, {"id": 472, "seek": 308340, "start": 3083.4, + "end": 3089.0, "text": " this is what we need to use and then like the whole budget + goes into paying that model or whatever", "tokens": [50364, 341, 307, 437, 321, + 643, 281, 764, 293, 550, 411, 264, 1379, 4706, 1709, 666, 6229, 300, 2316, 420, + 2035, 50644], "temperature": 0.0, "avg_logprob": -0.13988312896417113, "compression_ratio": + 1.786046511627907, "no_speech_prob": 0.006199564319103956}, {"id": 473, "seek": + 308340, "start": 3089.0, "end": 3096.36, "text": " you know like it''s in it''s + in practical so that that direction by itself like I think it''s a little", "tokens": + [50644, 291, 458, 411, 309, 311, 294, 309, 311, 294, 8496, 370, 300, 300, 3513, + 538, 2564, 411, 286, 519, 309, 311, 257, 707, 51012], "temperature": 0.0, "avg_logprob": + -0.13988312896417113, "compression_ratio": 1.786046511627907, "no_speech_prob": + 0.006199564319103956}, {"id": 474, "seek": 308340, "start": 3096.36, "end": 3102.36, + "text": " bit like doomed or like I don''t know like how you feel about it yeah + it''s a it''s a race where I", "tokens": [51012, 857, 411, 33847, 420, 411, 286, + 500, 380, 458, 411, 577, 291, 841, 466, 309, 1338, 309, 311, 257, 309, 311, 257, + 4569, 689, 286, 51312], "temperature": 0.0, "avg_logprob": -0.13988312896417113, + "compression_ratio": 1.786046511627907, "no_speech_prob": 0.006199564319103956}, + {"id": 475, "seek": 308340, "start": 3102.36, "end": 3108.12, "text": " add another + layer and get more parameters 
and I win and I think I''m not but it feels that", + "tokens": [51312, 909, 1071, 4583, 293, 483, 544, 9834, 293, 286, 1942, 293, 286, + 519, 286, 478, 406, 457, 309, 3417, 300, 51600], "temperature": 0.0, "avg_logprob": + -0.13988312896417113, "compression_ratio": 1.786046511627907, "no_speech_prob": + 0.006199564319103956}, {"id": 476, "seek": 310812, "start": 3108.8399999999997, + "end": 3115.16, "text": " the first step to move away from this is to really understand + how things are learned and why", "tokens": [50400, 264, 700, 1823, 281, 1286, 1314, + 490, 341, 307, 281, 534, 1223, 577, 721, 366, 3264, 293, 983, 50716], "temperature": + 0.0, "avg_logprob": -0.1281579907020826, "compression_ratio": 1.8516746411483254, + "no_speech_prob": 0.004840914625674486}, {"id": 477, "seek": 310812, "start": 3115.16, + "end": 3120.52, "text": " are learned the way they are I don''t know any match you + have these models to bring back to the", "tokens": [50716, 366, 3264, 264, 636, + 436, 366, 286, 500, 380, 458, 604, 2995, 291, 362, 613, 5245, 281, 1565, 646, 281, + 264, 50984], "temperature": 0.0, "avg_logprob": -0.1281579907020826, "compression_ratio": + 1.8516746411483254, "no_speech_prob": 0.004840914625674486}, {"id": 478, "seek": + 310812, "start": 3120.52, "end": 3127.4, "text": " image where the where the filters + are learned more or less you have some idea or where the model is", "tokens": [50984, + 3256, 689, 264, 689, 264, 15995, 366, 3264, 544, 420, 1570, 291, 362, 512, 1558, + 420, 689, 264, 2316, 307, 51328], "temperature": 0.0, "avg_logprob": -0.1281579907020826, + "compression_ratio": 1.8516746411483254, "no_speech_prob": 0.004840914625674486}, + {"id": 479, "seek": 310812, "start": 3127.4, "end": 3133.88, "text": " looking but + maybe to put more research on slow down let''s slow down these race and let''s understand", + "tokens": [51328, 1237, 457, 1310, 281, 829, 544, 2132, 322, 2964, 760, 718, 311, + 2964, 760, 613, 4569, 293, 718, 311, 1223, 51652], 
"temperature": 0.0, "avg_logprob": + -0.1281579907020826, "compression_ratio": 1.8516746411483254, "no_speech_prob": + 0.004840914625674486}, {"id": 480, "seek": 313388, "start": 3133.96, "end": 3140.44, + "text": " and maybe we find a way to make it more sustainable for everyone yeah + because I remember like", "tokens": [50368, 293, 1310, 321, 915, 257, 636, 281, + 652, 309, 544, 11235, 337, 1518, 1338, 570, 286, 1604, 411, 50692], "temperature": + 0.0, "avg_logprob": -0.1108939124316704, "compression_ratio": 1.7168141592920354, + "no_speech_prob": 0.005945454817265272}, {"id": 481, "seek": 313388, "start": 3140.44, + "end": 3146.04, "text": " when I was doing my PhD in machine translation it was + using like statistical models like Moses and", "tokens": [50692, 562, 286, 390, + 884, 452, 14476, 294, 3479, 12853, 309, 390, 1228, 411, 22820, 5245, 411, 17580, + 293, 50972], "temperature": 0.0, "avg_logprob": -0.1108939124316704, "compression_ratio": + 1.7168141592920354, "no_speech_prob": 0.005945454817265272}, {"id": 482, "seek": + 313388, "start": 3146.04, "end": 3151.56, "text": " you know statistical machine + translation and so it would suffer from things like out of vocabulary", "tokens": + [50972, 291, 458, 22820, 3479, 12853, 293, 370, 309, 576, 9753, 490, 721, 411, 484, + 295, 19864, 51248], "temperature": 0.0, "avg_logprob": -0.1108939124316704, "compression_ratio": + 1.7168141592920354, "no_speech_prob": 0.005945454817265272}, {"id": 483, "seek": + 313388, "start": 3151.56, "end": 3158.2000000000003, "text": " and you know how + do I bring syntax in and whatnot but then like when deep learning came like all", + "tokens": [51248, 293, 291, 458, 577, 360, 286, 1565, 28431, 294, 293, 25882, 457, + 550, 411, 562, 2452, 2539, 1361, 411, 439, 51580], "temperature": 0.0, "avg_logprob": + -0.1108939124316704, "compression_ratio": 1.7168141592920354, "no_speech_prob": + 0.005945454817265272}, {"id": 484, "seek": 315820, "start": 3158.2, "end": 3164.04, + "text": 
" of a sudden you see that it translates much much better and you think + wow probably probably", "tokens": [50364, 295, 257, 3990, 291, 536, 300, 309, 28468, + 709, 709, 1101, 293, 291, 519, 6076, 1391, 1391, 50656], "temperature": 0.0, "avg_logprob": + -0.09600380117242986, "compression_ratio": 1.7851851851851852, "no_speech_prob": + 0.0020337915048003197}, {"id": 485, "seek": 315820, "start": 3164.04, "end": 3169.48, + "text": " they solved it now right the claim from 50s that we will solve machine + translation probably now is", "tokens": [50656, 436, 13041, 309, 586, 558, 264, + 3932, 490, 2625, 82, 300, 321, 486, 5039, 3479, 12853, 1391, 586, 307, 50928], "temperature": + 0.0, "avg_logprob": -0.09600380117242986, "compression_ratio": 1.7851851851851852, + "no_speech_prob": 0.0020337915048003197}, {"id": 486, "seek": 315820, "start": 3169.48, + "end": 3175.8799999999997, "text": " delivered that the promise but but then you + notice it''s fluent but it''s kind of like I don''t want", "tokens": [50928, 10144, + 300, 264, 6228, 457, 457, 550, 291, 3449, 309, 311, 40799, 457, 309, 311, 733, 295, + 411, 286, 500, 380, 528, 51248], "temperature": 0.0, "avg_logprob": -0.09600380117242986, + "compression_ratio": 1.7851851851851852, "no_speech_prob": 0.0020337915048003197}, + {"id": 487, "seek": 315820, "start": 3175.8799999999997, "end": 3180.52, "text": + " to use the word stupid but it just doesn''t get it right like it it makes wolf + subject an object", "tokens": [51248, 281, 764, 264, 1349, 6631, 457, 309, 445, + 1177, 380, 483, 309, 558, 411, 309, 309, 1669, 19216, 3983, 364, 2657, 51480], "temperature": + 0.0, "avg_logprob": -0.09600380117242986, "compression_ratio": 1.7851851851851852, + "no_speech_prob": 0.0020337915048003197}, {"id": 488, "seek": 315820, "start": 3180.52, + "end": 3186.6, "text": " easily it may go and hallucinate about something that doesn''t + exist there or it actually goes and", "tokens": [51480, 3612, 309, 815, 352, 293, + 35212, 
13923, 466, 746, 300, 1177, 380, 2514, 456, 420, 309, 767, 1709, 293, 51784], + "temperature": 0.0, "avg_logprob": -0.09600380117242986, "compression_ratio": 1.7851851851851852, + "no_speech_prob": 0.0020337915048003197}, {"id": 489, "seek": 318660, "start": 3186.6, + "end": 3192.92, "text": " translates into like single letters all of all of a sudden + you know or repeating engrams or like", "tokens": [50364, 28468, 666, 411, 2167, + 7825, 439, 295, 439, 295, 257, 3990, 291, 458, 420, 18617, 465, 1342, 82, 420, 411, + 50680], "temperature": 0.0, "avg_logprob": -0.23640477657318115, "compression_ratio": + 1.7455357142857142, "no_speech_prob": 0.0007487453403882682}, {"id": 490, "seek": + 318660, "start": 3192.92, "end": 3200.2, "text": " you see that it didn''t exactly + solve it right you wouldn''t trans your life to such a system yet", "tokens": [50680, + 291, 536, 300, 309, 994, 380, 2293, 5039, 309, 558, 291, 2759, 380, 1145, 428, 993, + 281, 1270, 257, 1185, 1939, 51044], "temperature": 0.0, "avg_logprob": -0.23640477657318115, + "compression_ratio": 1.7455357142857142, "no_speech_prob": 0.0007487453403882682}, + {"id": 491, "seek": 318660, "start": 3200.2, "end": 3205.56, "text": " no yeah and + then you kind of come back and like okay and I used to do it like in a rule-based + approach", "tokens": [51044, 572, 1338, 293, 550, 291, 733, 295, 808, 646, 293, + 411, 1392, 293, 286, 1143, 281, 360, 309, 411, 294, 257, 4978, 12, 6032, 3109, 51312], + "temperature": 0.0, "avg_logprob": -0.23640477657318115, "compression_ratio": 1.7455357142857142, + "no_speech_prob": 0.0007487453403882682}, {"id": 492, "seek": 318660, "start": 3205.56, + "end": 3212.7599999999998, "text": " so I could understand the syntax of the of + the sentence and then semantics like nod in the tree", "tokens": [51312, 370, 286, + 727, 1223, 264, 28431, 295, 264, 295, 264, 8174, 293, 550, 4361, 45298, 411, 15224, + 294, 264, 4230, 51672], "temperature": 0.0, "avg_logprob": -0.23640477657318115, 
+ "compression_ratio": 1.7455357142857142, "no_speech_prob": 0.0007487453403882682}, + {"id": 493, "seek": 321276, "start": 3213.7200000000003, "end": 3219.88, "text": + " and then when I translate I use some semantic like function and it''s all well-defined + in the", "tokens": [50412, 293, 550, 562, 286, 13799, 286, 764, 512, 47982, 411, + 2445, 293, 309, 311, 439, 731, 12, 37716, 294, 264, 50720], "temperature": 0.0, + "avg_logprob": -0.1517265131185343, "compression_ratio": 1.6933333333333334, "no_speech_prob": + 0.005797029007226229}, {"id": 494, "seek": 321276, "start": 3219.88, "end": 3224.6000000000004, + "text": " the astronomy of semantic functions and so on like okay now I go back + to deploying do you have", "tokens": [50720, 264, 37844, 295, 47982, 6828, 293, + 370, 322, 411, 1392, 586, 286, 352, 646, 281, 34198, 360, 291, 362, 50956], "temperature": + 0.0, "avg_logprob": -0.1517265131185343, "compression_ratio": 1.6933333333333334, + "no_speech_prob": 0.005797029007226229}, {"id": 495, "seek": 321276, "start": 3224.6000000000004, + "end": 3231.1600000000003, "text": " anything like that no it''s like just space + maybe there is a there should be a way to combine I don''t", "tokens": [50956, 1340, + 411, 300, 572, 309, 311, 411, 445, 1901, 1310, 456, 307, 257, 456, 820, 312, 257, + 636, 281, 10432, 286, 500, 380, 51284], "temperature": 0.0, "avg_logprob": -0.1517265131185343, + "compression_ratio": 1.6933333333333334, "no_speech_prob": 0.005797029007226229}, + {"id": 496, "seek": 321276, "start": 3231.1600000000003, "end": 3236.28, "text": + " know we have built I think as humans we have built this complex way of talking + to each other", "tokens": [51284, 458, 321, 362, 3094, 286, 519, 382, 6255, 321, + 362, 3094, 341, 3997, 636, 295, 1417, 281, 1184, 661, 51540], "temperature": 0.0, + "avg_logprob": -0.1517265131185343, "compression_ratio": 1.6933333333333334, "no_speech_prob": + 0.005797029007226229}, {"id": 497, "seek": 323628, "start": 3236.36, 
"end": 3245.0, + "text": " which I mean multiple languages and stuff and there is no way that all + these language can go back", "tokens": [50368, 597, 286, 914, 3866, 8650, 293, 1507, + 293, 456, 307, 572, 636, 300, 439, 613, 2856, 393, 352, 646, 50800], "temperature": + 0.0, "avg_logprob": -0.35277674275059856, "compression_ratio": 1.576271186440678, + "no_speech_prob": 0.0032870080322027206}, {"id": 498, "seek": 323628, "start": 3245.0, + "end": 3255.88, "text": " to this deep learning world world it seems country to + if div at least yeah yeah so you are certain that", "tokens": [50800, 281, 341, + 2452, 2539, 1002, 1002, 309, 2544, 1941, 281, 498, 3414, 412, 1935, 1338, 1338, + 370, 291, 366, 1629, 300, 51344], "temperature": 0.0, "avg_logprob": -0.35277674275059856, + "compression_ratio": 1.576271186440678, "no_speech_prob": 0.0032870080322027206}, + {"id": 499, "seek": 323628, "start": 3255.88, "end": 3262.76, "text": " like maybe + the voice of those who build alternative models to deploying off a", "tokens": [51344, + 411, 1310, 264, 3177, 295, 729, 567, 1322, 8535, 5245, 281, 34198, 766, 257, 51688], + "temperature": 0.0, "avg_logprob": -0.35277674275059856, "compression_ratio": 1.576271186440678, + "no_speech_prob": 0.0032870080322027206}, {"id": 500, "seek": 326276, "start": 3262.76, + "end": 3268.5200000000004, "text": " alternative approach should be maybe louder + yeah I think that we may suffer from the bias of", "tokens": [50364, 8535, 3109, + 820, 312, 1310, 22717, 1338, 286, 519, 300, 321, 815, 9753, 490, 264, 12577, 295, + 50652], "temperature": 0.0, "avg_logprob": -0.17831427630256205, "compression_ratio": + 1.7924528301886793, "no_speech_prob": 0.0033132873941212893}, {"id": 501, "seek": + 326276, "start": 3268.5200000000004, "end": 3274.0400000000004, "text": " the winner + no I mean maybe the first one who opens a door might not do in the race because", + "tokens": [50652, 264, 8507, 572, 286, 914, 1310, 264, 700, 472, 567, 9870, 257, + 
2853, 1062, 406, 360, 294, 264, 4569, 570, 50928], "temperature": 0.0, "avg_logprob": + -0.17831427630256205, "compression_ratio": 1.7924528301886793, "no_speech_prob": + 0.0033132873941212893}, {"id": 502, "seek": 326276, "start": 3274.76, "end": 3281.6400000000003, + "text": " but even if they show another way that the race might go I think they + might deserve more attention", "tokens": [50964, 457, 754, 498, 436, 855, 1071, + 636, 300, 264, 4569, 1062, 352, 286, 519, 436, 1062, 9948, 544, 3202, 51308], "temperature": + 0.0, "avg_logprob": -0.17831427630256205, "compression_ratio": 1.7924528301886793, + "no_speech_prob": 0.0033132873941212893}, {"id": 503, "seek": 326276, "start": 3282.36, + "end": 3290.36, "text": " yeah yeah this this is quite deep thanks for this white + white section like you like you you think", "tokens": [51344, 1338, 1338, 341, 341, + 307, 1596, 2452, 3231, 337, 341, 2418, 2418, 3541, 411, 291, 411, 291, 291, 519, + 51744], "temperature": 0.0, "avg_logprob": -0.17831427630256205, "compression_ratio": + 1.7924528301886793, "no_speech_prob": 0.0033132873941212893}, {"id": 504, "seek": + 329036, "start": 3290.36, "end": 3295.6400000000003, "text": " about it a lot like + kind of okay not to be biased okay yes there are challenges but", "tokens": [50364, + 466, 309, 257, 688, 411, 733, 295, 1392, 406, 281, 312, 28035, 1392, 2086, 456, + 366, 4759, 457, 50628], "temperature": 0.0, "avg_logprob": -0.1141632263918957, + "compression_ratio": 1.6807511737089202, "no_speech_prob": 0.0027028413023799658}, + {"id": 505, "seek": 329036, "start": 3296.36, "end": 3302.76, "text": " it might + not be the only right approach and giving you experience as well like you can judge + a", "tokens": [50664, 309, 1062, 406, 312, 264, 787, 558, 3109, 293, 2902, 291, + 1752, 382, 731, 411, 291, 393, 6995, 257, 50984], "temperature": 0.0, "avg_logprob": + -0.1141632263918957, "compression_ratio": 1.6807511737089202, "no_speech_prob": + 0.0027028413023799658}, {"id": 
506, "seek": 329036, "start": 3302.76, "end": 3310.36, + "text": " little bit like with with your open eyes I think we should explore more + and maybe not one", "tokens": [50984, 707, 857, 411, 365, 365, 428, 1269, 2575, + 286, 519, 321, 820, 6839, 544, 293, 1310, 406, 472, 51364], "temperature": 0.0, + "avg_logprob": -0.1141632263918957, "compression_ratio": 1.6807511737089202, "no_speech_prob": + 0.0027028413023799658}, {"id": 507, "seek": 329036, "start": 3311.1600000000003, + "end": 3317.0, "text": " focus of that yeah and and probably explore with Gina right + that''s the talk for sure yeah", "tokens": [51404, 1879, 295, 300, 1338, 293, 293, + 1391, 6839, 365, 34711, 558, 300, 311, 264, 751, 337, 988, 1338, 51696], "temperature": + 0.0, "avg_logprob": -0.1141632263918957, "compression_ratio": 1.6807511737089202, + "no_speech_prob": 0.0027028413023799658}, {"id": 508, "seek": 331700, "start": 3317.64, + "end": 3322.44, "text": " so this is this is super great is there something you + want to share you already shared that the", "tokens": [50396, 370, 341, 307, 341, + 307, 1687, 869, 307, 456, 746, 291, 528, 281, 2073, 291, 1217, 5507, 300, 264, 50636], + "temperature": 0.0, "avg_logprob": -0.1539045447733865, "compression_ratio": 1.6436781609195403, + "no_speech_prob": 0.005217256955802441}, {"id": 509, "seek": 331700, "start": 3322.44, + "end": 3328.68, "text": " fine tuner is available so our listeners can go and check + it out right is there something else that", "tokens": [50636, 2489, 4267, 260, 307, + 2435, 370, 527, 23274, 393, 352, 293, 1520, 309, 484, 558, 307, 456, 746, 1646, + 300, 50948], "temperature": 0.0, "avg_logprob": -0.1539045447733865, "compression_ratio": + 1.6436781609195403, "no_speech_prob": 0.005217256955802441}, {"id": 510, "seek": + 331700, "start": 3329.32, "end": 3339.48, "text": " like we need to be expecting + and early next year we should be releasing our 3.0 version so", "tokens": [50980, + 411, 321, 643, 281, 312, 9650, 293, 
2440, 958, 1064, 321, 820, 312, 16327, 527, + 805, 13, 15, 3037, 370, 51488], "temperature": 0.0, "avg_logprob": -0.1539045447733865, + "compression_ratio": 1.6436781609195403, "no_speech_prob": 0.005217256955802441}, + {"id": 511, "seek": 333948, "start": 3340.44, "end": 3350.76, "text": " the stay + tuned for that yeah yeah comment we will be moving fast and the next times yeah + this is", "tokens": [50412, 264, 1754, 10870, 337, 300, 1338, 1338, 2871, 321, 486, + 312, 2684, 2370, 293, 264, 958, 1413, 1338, 341, 307, 50928], "temperature": 0.0, + "avg_logprob": -0.24917275977857184, "compression_ratio": 1.653179190751445, "no_speech_prob": + 0.005728862248361111}, {"id": 512, "seek": 333948, "start": 3350.76, "end": 3356.12, + "text": " fantastic I mean thanks so much for all this information and detail on + Gina and also like your", "tokens": [50928, 5456, 286, 914, 3231, 370, 709, 337, + 439, 341, 1589, 293, 2607, 322, 34711, 293, 611, 411, 428, 51196], "temperature": + 0.0, "avg_logprob": -0.24917275977857184, "compression_ratio": 1.653179190751445, + "no_speech_prob": 0.005728862248361111}, {"id": 513, "seek": 333948, "start": 3357.2400000000002, + "end": 3363.56, "text": " your ambition and then kind of like you''re thinking here + I mean it''s really nice that you keep", "tokens": [51252, 428, 22814, 293, 550, + 733, 295, 411, 291, 434, 1953, 510, 286, 914, 309, 311, 534, 1481, 300, 291, 1066, + 51568], "temperature": 0.0, "avg_logprob": -0.24917275977857184, "compression_ratio": + 1.653179190751445, "no_speech_prob": 0.005728862248361111}, {"id": 514, "seek": + 336356, "start": 3363.56, "end": 3370.68, "text": " your open mind available to + all of our listeners yeah thanks so much it was a pleasure to talk to", "tokens": + [50364, 428, 1269, 1575, 2435, 281, 439, 295, 527, 23274, 1338, 3231, 370, 709, + 309, 390, 257, 6834, 281, 751, 281, 50720], "temperature": 0.0, "avg_logprob": -0.3123651146888733, + "compression_ratio": 1.608, "no_speech_prob": 
0.0056417565792799}, {"id": 515, "seek": + 336356, "start": 3370.68, "end": 3379.16, "text": " you John today yeah thank you + so much yeah thank you much looking forward to 3.0 yeah thank you bye bye", "tokens": + [50720, 291, 2619, 965, 1338, 1309, 291, 370, 709, 1338, 1309, 291, 709, 1237, 2128, + 281, 805, 13, 15, 1338, 1309, 291, 6543, 6543, 51144], "temperature": 0.0, "avg_logprob": + -0.3123651146888733, "compression_ratio": 1.608, "no_speech_prob": 0.0056417565792799}, + {"id": 516, "seek": 339356, "start": 3393.56, "end": 3396.16, "text": " you", "tokens": + [50406, 291, 50494], "temperature": 1.0, "avg_logprob": -2.8450746536254883, "compression_ratio": + 0.2727272727272727, "no_speech_prob": 0.537224292755127}]' +--- + +Hey everyone, Vector Podcast is here, and today we are continuing our quest to learn more about vector technologies and embedding technologies platforms. Today I have a guest from Jina AI; his name is Joan Fontanals and he is a principal engineer at Jina AI. Hey Joan. Hello, nice to meet you. +Yeah, nice to meet you as well. Thanks for joining me today, and I'm really excited to talk about what Jina AI is. I used to use some kind of predecessors of Jina AI in some sense, but not Jina AI itself. +But first of all I would like you to introduce yourself and your background to our listeners and to me. + So, well, I studied an engineering degree in Barcelona, not computer science: a general engineering degree with a mix of electrical engineering and mechanical engineering. But I got into software engineering because I was working with robotics, and then, when I started my professional career, I did software engineering at different companies and industries. Then I got more into data engineering, machine learning and these kinds of fields, and I also did some work on traditional search, on web search engines and so on, and then life just brought me to Jina, which was a good step in my career. +Oh yeah, cool.
+So, what caught your eye in Jina AI, as a company, maybe as a technology, as a product, or maybe the team? + So, for me, what caught my eye was the technology and the vision: I see that vector search and embedding-based semantic search in general can revolutionize how we understand search, can bring it to the next level, adapt to different kinds of data and so on, and go beyond the typical search bar that we are so used to. +Yeah, yeah, and I mean, Jina is more than just embeddings, it's more like an ecosystem, right? It has a marketplace, it has many different building blocks and components. + This is what I think most of the people that might be hearing us might be wondering about, because it's a question that we receive a lot: we are not just another vector database like the ones that have been covered on the podcast. We are treating the problem of semantic search as an end-to-end problem, and we are trying to build an ecosystem to help businesses and developers develop their own neural-search-based engines. For that we are building an ecosystem, from our core document types to the recently launched Finetuner project that helps you fine-tune your models for your search applications, so we are building a whole family of products and projects around this area of neural search. +Yeah, it sounds quite ambitious, and it sounds like all of these building blocks are really needed for anybody who wants to venture into the embedding world of semantic search, or, you know, bring in the power of these deep learning models.
+ So it goes beyond only embedding your data and searching through it: you may want to cut it into different pieces, you may want to re-rank it at the end, you may want to join different modalities together. So we are trying to make it easy for the user to develop these applications so that they speak the same language, and we hope that they will all speak the Jina language. +Oh yeah, oh yeah, for sure. +And Jina is open source, right? + Yes. So can you speak a bit more towards the business model, how Jina makes money? Basically it's open source, anybody can go and download it and leverage it in their work, or is there something like products for which customers can pay? Right now we are completely open source; everything that you can see in our repos and so on is open for everyone. +Yeah, okay, and you are mostly working on the backend side of things, so you're not interacting with direct customers, right? Is that correct? I'm working mostly on the main products. And what do you hear about use cases, how do they translate to your level, to your day-to-day job? + So most of our solution engineers, let's say, that are closer to clients and users, they bring us their pain points on how they are trying to solve users' needs, and some of the main use cases that we are trying to solve come from text search, image search, multimodal search. That is something that we are trying to excel at: going beyond only using text or images to search, maybe trying to have a combination of both to power search to the next level. +So they might bring some kind of use case that you need to figure out on the tech level, right? +Yes, it kind of translates to you. But on the other hand, like you said, it's open source, so it means there are a bunch of GitHub issues coming in, right, and if you have Slack, I don't know if you're using Slack anyway.
+Yeah, so probably every day you wake up and there are questions there, right? So they're also clients in a way, right? + Yes, for me my users are our clients and we have to listen to them. That's the big point of open source in my opinion: this direct feedback from the users. You can correct your direction, and you can measure if your APIs or your design are too complex for the user to grasp, or whatever, so this direct feedback is really useful. +And to this point it's manageable? +Yeah, yeah, but I guess, and I also alluded to this in one of the podcasts with Bob van Luijt from SeMI, it's also sometimes maybe hard to keep up with all the questions, right? Like if you get all these questions, when do you find the time to really answer them deeply? +Yeah, finding time to answer them, yeah. +So we are trying to grow our team, knowing that the community is something that makes us special, and it's important for us to take care of our community, so we are all trying to keep an eye on it. +Yeah, yeah, I remember when I was developing search code we were using Apache Solr, and I had to customize some parts of Solr and Lucene, and I remember that in order to get up to speed I had to go to this mailing list, right? + And so there are thousands and thousands of emails; Apache Solr was actually super active, you know, in many ways, and I was like, how can I keep up with all these questions? But I do need to somehow keep up, and maybe summarize what is being asked there in order to understand whether it's useful for me or not, because when you ask a question on the mailing list, or today on Slack, sometimes you need to be ready to pay back, right? +If somebody helps you in the community, you sometimes also need to pay back, so it's like a game.
+ When this happens in the community, I think it's really pleasant for all the team: when the community interacts with each other and no one in the team has to jump in, because they help each other, that's when I think the community really scales and open source really goes to the next level. +Yeah, it's kind of regenerating itself, and there's the cultural element of it, and the community drives you forward, I mean, it's the driving force of the project, from the interaction point of view and feature-wise as well. Yeah, sounds good. +So Joan, tell me more about Jina itself, as a product, let's say as a technology stack. What can I do as a user using Jina, and is it self-serve, and so on? + So the main point of Jina is that we want to be with the user from the minute they start experimenting with their search application. So for instance, we are written in Python and we have a really nice API in Python to work with your documents, which can handle any type of data: text, images, audio, video. We are trying to build a really easy-to-use API for you to run your solutions locally. + The first, experimental phase is to wrap your code for processing: loading the files, processing the images or whatever, embedding them, and searching, doing approximate nearest neighbor or exact nearest neighbor search. Then, once you have this, we make it easy for you to wrap it in microservices, what we call executors. So in the first phase you deal with these document array types that we come with, then in the next layer you have the executors, where you wrap your logic in different microservices, and then we put them in what we call a flow, which is kind of a pipeline that is ready to scale locally or remotely, or even with Kubernetes, so that you have replication and scalability taken care of. +So we are trying to bring you easily from your day zero of development to the production system.
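The document → executor → flow pattern described above can be sketched in plain Python. This is an illustrative mock of the pattern only, not the actual Jina API; the names `Doc`, `embed`, `index` and `Flow` here are hypothetical stand-ins:

```python
from typing import Callable, Dict, List

# A "document" is just a dict holding content plus an optional embedding.
Doc = Dict[str, object]

def embed(docs: List[Doc]) -> List[Doc]:
    # Toy "executor": embed text as character-frequency vectors over a-z.
    for d in docs:
        text = str(d["text"]).lower()
        d["embedding"] = [text.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
    return docs

def index(docs: List[Doc], store: List[Doc]) -> List[Doc]:
    # Toy "executor": persist documents into an in-memory store.
    store.extend(docs)
    return docs

class Flow:
    """A pipeline chaining executor steps, loosely like Jina chains microservices."""
    def __init__(self) -> None:
        self.steps: List[Callable[[List[Doc]], List[Doc]]] = []

    def add(self, step: Callable[[List[Doc]], List[Doc]]) -> "Flow":
        self.steps.append(step)
        return self  # allow chained .add() calls

    def post(self, docs: List[Doc]) -> List[Doc]:
        for step in self.steps:
            docs = step(docs)
        return docs

store: List[Doc] = []
f = Flow().add(embed).add(lambda docs: index(docs, store))
f.post([{"text": "neural search"}, {"text": "vector database"}])
print(len(store))  # prints 2: both documents indexed with embeddings
```

The point of the pattern is that each step is an isolated unit, so a real framework can replicate or move individual steps without the caller changing the pipeline definition.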
+Yeah, yeah, sounds good, sounds comprehensive. And what if I would like to just use a hosted version, can I use a hosted version from Jina AI, or do I need to operate it myself? There is no hosted version at this point. Yeah, so basically it's like a Lego type of thing, right? +Yes, exactly. +And I will have a nice deployment. And we even have this marketplace, as you said, this Hub, so you can share, publicly or privately, with your colleagues or with the community, the building blocks that you think may be useful. +Yeah, modern deep learning models that you have packaged for processing, re-ranking, even vector search. +Yeah, so how does it align with companies or hubs like Hugging Face? You know, Hugging Face is also very famous on the model side, right? So let's say somebody picks a model and wants to bring it to Jina, what's the process there? + So, I would say Hugging Face is quite inspirational for us in this sense; in this marketplace and community aspect it is quite similar. But Jina's marketplace is related to our executors, so it goes beyond only models: it's any subsystem, any building block that you can build that is able to be part of this Jina pipeline. And we are trying to make it user-friendly for you to load it and use it through a simple API, and we're still working to make it easier all the time.
+ Yeah, of course, because actually, you know, the infrastructure part tends to take a lot of time. Like, how do I bring my model, let's say I have a custom model and I want to bring it inside Jina so it serves as an embedding layer? How do I figure out all these scalability or latency parameters and so on? So, the first thing is to get it working: we expose these executors that have some API to read requests, maybe inspired by this FastAPI approach, and then with the flow you have the parameters to replicate, to scale and so on; you may run it on GPU, whatever. Yeah, yeah, so you can choose based on your cost factors: it can actually be CPU, and then latency; for some models CPU is actually fine, so yeah, I mean, why not. Yeah, it also depends on the user needs. For instance, we are also seeing that neural search does not need to be only for these big giants with their big amounts of data and resources; any company, even a smaller company, can benefit from the power of neural networks to power their search. They may not require so many resources, or they may not require so much speed, so we are giving the power, more or less, to the user. Yeah, and the flexibility of the platform, because essentially, if they wanted to do it from scratch, they would probably need to figure out similar things, like component isolation and scaling, and, on the algorithm side, quality checks and so on. And on the algorithm side, you said you have exact search as well as inexact search; can you talk more about that and maybe mention some algorithms that you support? So yeah, right now natively we support, as the main native option, a quite optimized version of exact nearest neighbor search, but then, for instance, one of these building blocks can support wrapping any client for any other vector
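The "exact nearest neighbor search" mentioned here is conceptually just a brute-force scan over all vectors. A minimal cosine-similarity version in plain Python (an illustrative sketch, not Jina's optimized implementation; `exact_search` is a hypothetical name):

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity: dot product of the vectors over the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def exact_search(query: List[float], vectors: List[List[float]],
                 k: int = 2) -> List[Tuple[int, float]]:
    # Score every stored vector (O(n * d)), then keep the top-k.
    # Exact but expensive at scale, which is why approximate methods like HNSW exist.
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(exact_search([1.0, 0.1], vectors, k=2))
```

Brute force guarantees perfect recall; the approximate indexes discussed next trade a little recall for sub-linear query time.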
database. But, for instance, we just released our own approximate nearest neighbor solutions; we have two of them that we have developed ourselves. We have one that is based on HNSW plus a Postgres database to retrieve the documents, and then we have built, well, we just released it in Slack and the community can start enjoying it, what we call PQLite, which works with product quantization but also has support for HNSW. You said PQLite? How do you spell that? +PQLite, PQLite, which is like a product quantization light version. +Yes, and with pre-filtering options as well. +Oh, with pre-filtering. And in what sense is it light, compared to product quantization? I have not been involved so much in this product, it's a new thing, but it is light in the sense that it is quite embedded and quite native to work with our document type. +So it's not as general as any object, but it is really built to integrate very easily with Jina. Oh, I see. Like with a specific kind of schema or document types. And it's also open source. And obviously you can provide the links, or we can also link them in the show notes. +But do you also have some kind of latency analysis for this algorithm? Has it been conducted, do you know? Yeah, there are some benchmarks that you are going to find in the README. I don't have the numbers in my head right now. +Yeah, but I think for a portion of our audience it's going to be interesting to check out, because actually my team just completed participation in Big ANN. I don't know if you heard about this competition. It's billion-scale approximate nearest neighbor search. +So we invented a new algorithm called BUDGPQ. I will also link the blog post about it in the show notes. We increased recall by 12% over the FAISS baseline. So yeah, I think it's great that you guys are also inventing things.
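Since PQLite is described above as built on product quantization, here is a toy NumPy sketch of the core PQ idea only (not PQLite's or FAISS's actual implementation; the "training" step simply samples codewords instead of running k-means): split each vector into sub-vectors, quantize each sub-vector against a small codebook, and store only the codeword IDs.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(data, n_sub, k):
    """Pick k codewords per subspace (toy 'training': sample data points)."""
    sub_dim = data.shape[1] // n_sub
    books = []
    for s in range(n_sub):
        chunk = data[:, s * sub_dim:(s + 1) * sub_dim]
        books.append(chunk[rng.choice(len(chunk), k, replace=False)])
    return books

def pq_encode(vecs, books):
    """Replace each sub-vector with the id of its nearest codeword."""
    n_sub = len(books)
    sub_dim = vecs.shape[1] // n_sub
    codes = np.empty((len(vecs), n_sub), dtype=np.uint8)
    for s, book in enumerate(books):
        chunk = vecs[:, s * sub_dim:(s + 1) * sub_dim]
        d = ((chunk[:, None, :] - book[None, :, :]) ** 2).sum(-1)
        codes[:, s] = d.argmin(1)
    return codes

def pq_decode(codes, books):
    """Approximate reconstruction by codeword lookup."""
    return np.hstack([books[s][codes[:, s]] for s in range(len(books))])

data = rng.normal(size=(1000, 32)).astype(np.float32)
books = train_codebooks(data, n_sub=8, k=16)
codes = pq_encode(data, books)   # 8 bytes per vector instead of 128
approx = pq_decode(codes, books)
err = np.mean((data - approx) ** 2)
```

The compression is lossy: `err` measures the reconstruction error the index trades away for a 16x smaller memory footprint.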
I don't know if we have tested at this billion scale. +I think we are more in the million scale. Yeah, actually, we also ventured into billion scale, but in the process we figured out a solution for million scale. So it's not for billions yet. We don't know if we can generalize to that level, but I think we can with some additional research. +Well, this is the first version, so for sure we will try to improve it. Yeah, awesome. This is great. And have you also helped customers train models? No, we didn't help customers. +Well, we did from our solution point of view, but this is an interesting topic, because this is one of the pains that we found quite often with our users. +It was easy for them to get to some level, let's say 70% of accuracy, with any deep learning model that all these tech giants have developed, right? But we believe that this last mile, this transfer learning part, is important. +And when we realized that, we started this project that we called, well, it's already released, the Finetuner. Maybe we can share that as well. There we try to make it easy for users to fine-tune their models for metric learning search applications. +And it is also framework agnostic: we support PyTorch, TensorFlow and Paddle. So we realized this pain point for the users, that once they had everything running, the quality was not as expected. +And we are trying to help the user in our ecosystem get to this level by using this Finetuner. +So basically, can you explain a bit more about Finetuner? Like, what input do I need to provide as a user? So Finetuner could feel similar to any PyTorch dataset, for instance, but we are trying to put our Documents as the first-class citizen of our ecosystem.
+So you have to wrap any of your data into our Document types, which is really easy. So it's something easy to learn and easy to use. And then you can fit your models, and we have made it easy for you to use the most typical loss functions; we are trying to introduce hard negative mining. +We are trying to make it easy for everyone to solve the common problems when training for search applications. +And we are also trying to make an interactive labeler that helps you, interactively, through an easy-to-use UI, tag similar objects. Yeah, so, I mean, fine-tuning can be a pipeline by itself, right? In itself. +So like, how do you get these data samples that you want to fine-tune on? You might have them before launch, or collect them during testing or after launch, and it's like, you know, the cycle and flywheel of success, so to say, right? +So do you cover the full workflow up to and including production, or is it pre-production? For now, we are just producing the embedding model. +Just to get embeddings with better semantics for your data set, for your specific use case. But we are at a really early release, or somewhere around there, so there's a long way to go. Yeah, for sure. +But I mean, the direction is fantastic, because that's exactly what addresses the real need of any user: fine-tuning. It's all fancy to take a Hugging Face model or whatever, but fine-tuning it to the level where your users love it, that's a different story. +Yeah, that sounds great. +But I also wanted to come back to: you mentioned that Jina AI doesn't really compare to vector databases, but I do sometimes get questions about how these systems compare to each other.
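The hard negative mining mentioned a moment ago can be sketched in a few lines of NumPy. This is a hedged illustration of the general technique, not Finetuner's API: given an anchor item, the most useful negatives for a metric learning loss are the ones the current model already scores as most similar to the anchor.

```python
import numpy as np

def hard_negatives(anchor, pool, labels, anchor_label, n=2):
    """Return indices of the n negatives (different label) most similar
    to the anchor. These 'hard' examples give the most informative
    gradients when training with triplet or contrastive losses."""
    sims = pool @ anchor / (
        np.linalg.norm(pool, axis=1) * np.linalg.norm(anchor) + 1e-9
    )
    neg_idx = np.flatnonzero(labels != anchor_label)
    order = neg_idx[np.argsort(-sims[neg_idx])]  # most similar first
    return order[:n]

rng = np.random.default_rng(1)
pool = rng.normal(size=(10, 8))                  # toy embeddings
labels = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])
anchor = pool[0]
hard = hard_negatives(anchor, pool, labels, anchor_label=0, n=2)
```

In a real loop you would re-mine negatives every few epochs, since "hard" changes as the model improves.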
+And you may or may not know, I've blogged about all the vector databases I knew up to that point, and it turns out there were six, and then a seventh one knocked on the door, so it's also now on the blog. +But I didn't cover Jina AI, and I didn't cover deepset's Haystack, because I thought that Jina and Haystack are like layers above a vector database. Is that the right thinking? Yes, I think it makes sense. We might try to develop our own solutions for the use cases that we feel are most worthwhile. +But yeah, I think it's right. I think vector databases cover one of the parts, maybe one of the main challenges, of vector search or neural search, but we try to see the whole scope and the whole pipeline. +So in Jina you can wrap some client that will use any of the big vector databases out there. Have you done any integration with some vector database? Not ourselves right now, but we might do it in the future. +Okay, yeah, because for now, you did mention that you offer ANN algorithms, which to me sounds like a core building block of a vector database, but then of course in a vector database you have many more things, right? Like, where do you store objects? How do you store them? +What about filters and so on? But we are trying to cover... for instance, for some use cases, exact nearest neighbor search might work just fine, and people don't need to worry about configuring fancy ANN models for their recall and speed requirements. +So I think there is room for everyone. You just have to offer what is right for the right use case and the right need. Yeah, of course. +And by the way, what's the core programming language used in Jina?
+ So, our core programming language is Python. Since we are this pipeline, this glue ecosystem, most of our operations are wrapping models that run in optimized languages, and Python helps us iterate really fast, where other languages might slow us down. +Yeah, that's true. And does it also apply to the ANN algorithms that you mentioned, like PQLite? Is it also Python? I think we are also using some bindings for HNSW. So you are probably using the C++ version of HNSW with bindings to Python, right? Yes, that's for sure. +For HNSW, yes; for some other parts, I don't know. We are trying to optimize whatever we find. +Yeah, but it sounds cool, you know, if we still continue this comparison a little bit between Jina and vector databases. The vector databases, if you pick them: let's say Weaviate is implemented in Go, Qdrant is implemented in Rust. So these are compiled languages, right? +And Vespa is Java plus some C++, I think, but mostly Java. +So, like, nobody implements vector search in pure Python, because it's going to be very taxing on the latency, you know? Sure. But we are not running the expensive operations in Python. +For instance, the exact nearest neighbor search we are doing is based on NumPy operations, which are optimized at the NumPy level, and for approximate nearest neighbors, I think most of the heavy lifting is done at the C++ level behind the bindings we wrap. +Yeah, and I'm still curious about PQLite, whether it's C or Python, but I think we need to check the documentation. Yes. +Yeah, I'm curious because I've actually invented a new algorithm in ANN search, but I haven't published it widely. It's open source, but I haven't done the thorough benchmarking.
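The "exact nearest neighbor search based on NumPy operations" pattern described above can be sketched like this (a minimal illustration of the idea, not Jina's actual code): write the distance computation as one vectorized expression so all the arithmetic runs inside NumPy's C core instead of a Python loop.

```python
import numpy as np

def exact_knn(queries, corpus, k=3):
    """Brute-force exact k-NN. Squared distances come from the identity
    ||q - c||^2 = ||q||^2 - 2 q.c + ||c||^2, computed as one matrix op."""
    d2 = (
        (queries ** 2).sum(1)[:, None]
        - 2.0 * queries @ corpus.T
        + (corpus ** 2).sum(1)[None, :]
    )
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]  # unordered top-k
    rows = np.arange(len(queries))[:, None]
    order = np.argsort(d2[rows, idx], axis=1)        # sort only the k winners
    return idx[rows, order]

rng = np.random.default_rng(2)
corpus = rng.normal(size=(500, 16)).astype(np.float32)
nn = exact_knn(corpus[:5], corpus, k=3)
# each query vector is in the corpus, so its own index should come back first
```

`argpartition` keeps the cost at O(n) per query instead of a full O(n log n) sort, which matters once the corpus grows.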
+ And what I've faced is that, you know, in Python, even though I optimized all parts of the algorithm, I'm using preallocation and NumPy, it still runs out of memory. Runs out of memory as in it leaks memory, and the Python virtual machine doesn't tell you where; you don't really have the tools. +Okay, there are some tools, but they're not that useful; they show you very little. +So I've been a little bit desperate, and I've been thinking, okay, should I now move into Rust or Go territory, which might be a little bit more demanding? Even though I do have some experience in C++, do I want to go there now? +Like, Python is much more comfortable. +I think it depends on the layer you are working with. By offering Python APIs in the machine learning field, you make it much easier for everybody to use. +Then, if you get the API right, you might bind it to whatever your favorite language is, but I think getting a comfortable API that developers love to use is one of the key first steps. +So, do you invest a lot into building these APIs? Can you give an example of some API within Jina that makes the workflow easier? So, for instance, we are trying to improve a lot on this Document. +Documents are our central logic; Documents and DocumentArrays are the two core members of our family in the ecosystem. So we are spending a lot of time on making them easy to use. +For instance, with this fluent pattern, we are investing a lot of time in finding the best, most Pythonic way to work with it. Yeah, so it's a constant evolution, trial and error. +Yeah, of course, but APIs are exactly that layer which is essentially facing the customer, right?
+And you don't know the scenarios they will use it in, and sometimes they might surprise you, or they might say, okay, I found some workaround for your missing parts, and then you think, okay, I didn't think about that, right? +The API layer is a fantastic way of talking to your client, through an API contract in a way, right? Yeah, and it's quite a big challenge, I would say, to have the right balance between ease of use and flexibility. +So, what belongs there and what doesn't? Because there's always a risk of putting too much functionality into one thing and making it very powerful, but a nightmare to use. +Yeah, so in this balance, I think, is the key. What is your choice when you have to choose? Let's say it's a balance of flexibility versus, what did you say, ease of use, right? Ease of use. +I think I now tend to go for ease of use, because open source teaches you well. I think at some point we made the APIs quite powerful and they were a little bit complex to use. +You could do a lot of things, but in the end maybe not everybody was doing them. So I think ease of use, for the first entry barrier, is the most important thing. + Yeah, and it's also interesting: if you have a real API, let's say deployed somewhere, and it's a published contract and people are sending queries there, then you know which endpoints and which features are being used and which are not, which options are completely ignored even though you put them in the docs, right? +But how do you go about this in open source code? Somebody downloads your code, they use it somewhere, you don't know how. +So, how do you collect these analytics from them? Do you just send call-out messages: hey guys, what do you use, what don't you?
+Right now we are trying to pay attention to who is using it and what for, and when people ask us things, we try to get the most information out of them: not information about their business, but how they use it, how they feel. +So right now the community is the only source of information we have. That's open source. +How do you talk to them? + Like, do you send messages saying, hey guys, can you vote on keeping this feature and removing that one? Not exactly like this, but you see people that are more or less engaged, people that are finding it easier or having more difficulties with your solution. +And we have a developer relations team that tries to also get feedback from the community in many forms. So this is a global effort. +But in the end you have a say, right? Like, no matter what they ask, you have a say. Is that right? Well, I mean, sometimes you cannot please the community to the full extent; we have to keep a roadmap. +For instance, people may want you to build something that they demand, but that is maybe not so significant for search solutions. This is quite a common situation, I think. +So beyond search, where can I use Jina? What kind of other use cases have you seen beyond similarity search? +Since we are building these abstractions, it is quite easy for you to use them for building any classification model or anything, really. You could even use Jina to easily deploy and scale your object segmentation model. +This is something that you could do, but Jina was born, and will keep working, to implement neural search solutions. So you could still use it for that, but it might not be the best tool for it.
+You can do this, because, for instance, classification or segmenting an object can be part of your pipeline, but in principle we were born to support search applications. Yeah, yeah. + So like, search applications, or something that you can frame as a search application. For instance, a question-answering system, which you can frame as a part where you do something like retrieval, a sparse search, and then you have some reader model that extracts the information from the results. +So anything that falls into this domain, you can do. +Yeah, I guess you can also, based on the research and practice happening in retrieval-based data augmentation, formulate data augmentation as a process of search, in principle, right? So the output will be your augmented data, but you use search in the middle. +Yes, actually, so many problems can be framed as search. +In the end, vectors somehow hold the truth, the semantic information; we don't understand exactly why, but it is encoded there, right? So just by clustering them together, somehow we have some understanding. So many things to play with. +I also wanted to ask you something a little closer to similarity search itself. Let's say I built a traditional text search engine, okay? And I'm moving away from BM25, which is probably the majority of this market today. +So I'm thinking, okay, what are these cool kids doing? Maybe I should try it out, plug in some BERT model. +But then in my UI I am also showing snippets, and it's very easy to show snippets when it's keyword search, right? So what should I do, or what can I do, with a model like BERT and Jina AI to show snippets, or something that will resemble snippets, to the users?
+Maybe you can also use attention information, so that you check where the attention is put in your model. But yeah, I think there is a thing: we have been framed by and have grown into this keyword search that is so interpretable, so easy to use, and even so easy to hack. +You as a user know how to drive your search if you don't find what you want, right? Okay, this word might get you there. +And since these models are kind of a black box for many of us, I think this interpretability is one of the main challenges, and one of the main areas where research should focus. +Yeah, but you call it interpretability; for the user, say for me, let's say I'm a product manager, I don't care if it's a BERT model or BM25. I used to see snippets, and I want to see them now. +That's the point: with BM25 I can give you a snippet, because I know why I have this result; here is where the information that I want is. +Maybe, for instance, in Jina one of the main building blocks we have is our Document with its recursive structure, because for most things, if you search a text document, you might need to break it into paragraphs, into chunks, and so on. +So maybe what you can do is run the vector search at a granular level, at sentence level, but then show the results at paragraph level. So you can very easily highlight the sentence that really drove the search to this page. +I remember, actually, I don't know if you know the blog, was it Salmon Run, by Sujit Pal? He does a lot of blogging in the style of: here is the problem, how do I solve it? And quite often he goes into deep learning or trying out some vector search, maybe or not.
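The "match at sentence level, display at paragraph level" idea above can be sketched as follows. This is a toy illustration, not Jina's Document API: a bag-of-words vectorizer stands in for a real neural sentence encoder, and each indexed sentence remembers its parent paragraph so the UI can show the paragraph while highlighting the matched sentence.

```python
import numpy as np

def embed(texts, vocab):
    """Stand-in encoder: normalized bag-of-words vectors.
    A real system would use a neural sentence encoder here."""
    out = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.lower().split():
            if w in vocab:
                out[i, vocab[w]] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-9)

paragraphs = [
    "Cats sleep a lot. They nap through most of the day.",
    "Vector search finds similar items. It uses embeddings instead of keywords.",
]

# index at sentence level, but remember each sentence's parent paragraph
sentences, parent = [], []
for p_id, p in enumerate(paragraphs):
    for s in p.split(". "):
        sentences.append(s.strip(". "))
        parent.append(p_id)

vocab = {w: i for i, w in enumerate(
    sorted({w for s in sentences for w in s.lower().split()}))}
S = embed(sentences, vocab)
q = embed(["search with embeddings"], vocab)[0]

best = int(np.argmax(S @ q))          # cosine similarity (rows are unit norm)
snippet = sentences[best]             # sentence that drove the match
context = paragraphs[parent[best]]    # paragraph shown to the user
```

The snippet/context split is the point: ranking happens on fine-grained chunks, presentation on the coarser unit the reader actually wants to see.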
+And I remember he was saying how he would solve this snippeting problem, because he comes from traditional search, and I do in a way too. +And he said, okay, if I remember correctly, you can build almost like a dictionary, right? So you take a word and embed it, take a word and embed it; you can embed a whole dictionary, right? +Now, when you have found that document, you can map back from the embeddings to the words. +If they happen to be close enough, geometrically, you can find close-enough words. So you can try to say, okay, maybe these keywords are representative of this text. You're not 100% sure, but at least you try. So you go backwards, reverse engineering from the embeddings. +It's interesting, sure. You may need to go through all the pain of lemmatizing and that kind of stuff, which you may have avoided by going to semantic search, and now you are back to it. So, trade-offs, but yeah, it might be a good... +Yeah, lemmatization is another thing, but I think there was this paper, from Google I believe, about byte-level training, right? So they don't care if it's a lemma or a suffix or a prefix, they just go byte level. They don't go sub-word level, they go byte level. +And then with byte level, you can essentially compute the distance again, right? Okay, how close is this to this dictionary word or not? +But then again, from there, in order to produce a snippet that will look like natural language, you would have to use some kind of model like GPT to generate the sentence. +And at that point, it might actually go in a completely different direction from your text, right? Start hallucinating, or write a news item that doesn't exist. So yeah. +Well, maybe you can use these extractive models that extract from a sentence, given a context, but all this is top-notch research, basically.
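The "embed a dictionary, then map a document embedding back to nearby words" idea from the Salmon Run discussion can be sketched like this. Everything here is hypothetical: the vocabulary is tiny and the word vectors are random stand-ins for trained embeddings, constructed so the document lands near "algebra".

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["algebra", "geometry", "calculus", "poetry", "cooking"]

# stand-in word vectors; a real system would use trained embeddings
W = rng.normal(size=(len(vocab), 16))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# pretend the document embedding landed between 'algebra' and 'geometry'
doc = W[0] + 0.5 * W[1]
doc /= np.linalg.norm(doc)

# reverse lookup: which dictionary words are geometrically closest?
sims = W @ doc
nearest = [vocab[i] for i in np.argsort(-sims)[:2]]
# 'nearest' acts as pseudo-keywords explaining the match
```

As the conversation notes, these recovered words are only suggestive keywords, not a guaranteed explanation of why the model matched.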
Yeah, yeah. +But I mean, attention, what you mentioned, attention probably can be used here, right? + So you can ask the model, okay, what did you pay attention to when you did the matching? But still, as you say, you can call it interpretability, but on the other hand, when you go to that specific product case, you need that snippet, you need that context of the match. +Or if I said mathematics and it picked algebra, why did it pick algebra? Can it at least explain? Because here it's more or less obvious, but in a specific domain it might not be, right? Yes. So what do we do? Maybe you are not using the right tool, I don't know. +Maybe we are obsessed with using deep learning for everything. But I think these two worlds, keyword, what we call traditional search, and this neural search, can be combined to power things to the next level. I don't think they need to be enemies, and there are good and bad things on both sides. +Do you have any thoughts on how you would combine them? For instance, in a solution, maybe you can get results from both sides and then, at a re-ranking step, consider what is best. You know, is this a complex query? +Maybe I'm looking for some semantically rich solution. +Did this user just send a couple of keywords? Is it semantically rich enough? No, this user might be expecting keyword-based feedback. Yeah, that's true. Well, you could even go as simple as giving that control to the users. +So, if they know the keywords, they first go with what they know works, and then, if they are not satisfied, they might go into explorative mode, that is, the similarity search. That might be quite viable. So, it's interesting.
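The "get results from both sides, decide at a re-ranking step" idea above is often done with simple score fusion. This is a toy sketch of one assumed scheme (linear blending of normalized scores), not a feature of any particular product; the doc IDs and scores are made up.

```python
def fuse(keyword_hits, vector_hits, alpha=0.5):
    """Blend two result sets: score = alpha * keyword + (1 - alpha) * vector.
    Each input maps doc_id -> score already normalized to [0, 1].
    alpha near 1 trusts BM25-style matching; near 0 trusts the embeddings."""
    ids = set(keyword_hits) | set(vector_hits)
    fused = {
        d: alpha * keyword_hits.get(d, 0.0)
           + (1 - alpha) * vector_hits.get(d, 0.0)
        for d in ids
    }
    return sorted(fused, key=fused.get, reverse=True)

keyword_hits = {"doc1": 0.9, "doc2": 0.4}   # e.g. normalized BM25 scores
vector_hits = {"doc2": 0.8, "doc3": 0.7}    # e.g. cosine similarities

print(fuse(keyword_hits, vector_hits, alpha=0.7))  # -> ['doc1', 'doc2', 'doc3']
```

The per-query choice the guest describes ("did this user just send a couple of keywords?") then reduces to picking `alpha` per query, or even exposing that knob to the user.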
+The problem is that keyword search might not have a good future for image-based search or any other modality-related search. Yeah, exactly. The moment you go beyond text, what do you do? That's a big power, I think, and the big future that neural search has ahead. +That is where no traditional search solution, I think, will keep up. So, if I want to build a multimodal search, can I pick some executor from the marketplace and plug it into Jina today and do it? Yeah. I think we have some; for instance, you can use CLIP. +You can use CLIP to encode. I don't know if there is audio and text, but there is image and text, and it performs very well. We have wrapped it in one of these executors, in our hub modules, and you can use these CLIP models to do your cross-modal search. +It's quite efficient, without much fine-tuning, to search for images given text and the other way around. It's quite impressive. Yeah, that sounds cool. I was thinking, if I want to combine speech, text, and image, then I probably need to come up with some meta-model for that, right? +There is some research in this area where modalities are not treated differently and encoded separately, but are considered together; there is even some research on multimodality with some context switching, so they move the vectors. +So it's also possible to take the latest research, wrap it into one of these modules, and deploy it in production. But this is not so easy. We didn't focus on building these from scratch, but we're also looking into having these top-notch researchers build these modules here. +So, in that case, would you prefer the community to help bring in the model, or are you helping to do that? Right now, we are driving this direction, offering these to the community. I think our dream as an open source project is to have the community flourish and be alive on its own.
+So the future should be community driven. Yeah, because in the end the community might also know where this grows; the community will be helping each other. +Some of these things will become what you may call commodity, to some extent, right? Or at least the way you integrate might become commodity, and the use cases might become commodity, and there will be new use cases which are untapped, but I think the community can definitely help each other out. +What we might need to focus on is making these models easier to use, or easier to find. If we have a marketplace where everything lives, maybe we need to help the community find what they need at any time. Yeah, yeah. +Content-wise, hopefully there comes a time when the community is the main contributor there. +Was there something else in Jina that we should know about as users? Some cool feature, or some system that you think doesn't exist in competitors? I don't know about competitors right now. +I think what I like the most is the ease of use and the time saving. You go to our README and you try to build, from zero to deployment, a neural search solution, an image search solution. I think you will enjoy it. Yeah. +So it's like a well-oiled machine. But can I also bring it up on my laptop? Yes. You can try everything on your laptop. The point is, you may not be able to index so many images, but you can get the first feeling on your laptop. +Yeah, I mean, if I want to build a demo to impress my manager, you know, I usually use my laptop, right? That's maybe one way. Jina is really good for that. Yeah, that's pretty cool.
+ And I think it's also nice that it's Python-friendly, so it opens doors to so many things. Especially since on Hugging Face it's pretty much all Python, right? So I can pick some models, plug them in, and do I need to containerize anything, or figure out isolation and so on? Just plug it in and start using it. +I think that's a great boost to productivity: actually implementing the use case rather than focusing on mundane components, parts and processes, right? Yeah, and even these modules that we have are already containerized for you. +On our end we build a container for you, so that you can run in an isolated way with all your dependencies and stuff. Yeah, sounds great. I think we now have a pretty good understanding of Jina. +Of course, we didn't read the docs yet, but it sounds promising, so I hope some of our listeners and audience will try it out. +I wanted to go to a more philosophical level: what drives you in this space? You said that you've been working in web-scale search before, right? And some other search engineering in general. +So what drives you here, now, in this area? Why did you join Jina? So, I joined Jina because, as I was in this traditional search space working on training ranking models, what drove me was to enable this search experience beyond text. +I was super curious about how we can extend it. It's impressive to me how the same framework, getting something that extracts meaningful vectors with semantic information, can be used for images, for video, for audio, for anything. +This framework, I think, has a lot of future, and it's also interesting how the different research areas from different modalities interact with each other.
+Convolutional neural networks appeared, and even some text classification used to use them; then the transformer appeared, and right now the computer vision community is falling in love with transformers. +This back and forth, I think, is impressive. And if you think of the magic of getting this vector with so much meaning in it, it's quite amazing. + Yeah, it's true, it's very powerful. The sheer fact that you don't need to build a synonym dictionary, as you would for full-text search: it just tells you that mathematics is close to algebra. You throw data at it, and, unsupervised, it just says, hey, I've been trained, now I can tell you what's close to each other geometrically. It also has mathematical beauty there: geometric closeness, rather than some obscure, strange, abstract, sparse closeness. It's quite elegant. Yeah, you have abstracted all this knowledge into this simple thing that you can imagine in your head as a 3D space, and that is as simple as the algebra from, I don't know which grade, but quite simple. Yeah, I think in simplicity there is a lot of beauty. It's very easy to explain to your granny: I'm doing this, it's a 3D space, there are points, and I'm just looking for the closest one. I expect things that make sense together to be put close to each other; that is what we expect from these black boxes. Exactly. And then the question of scale: if you go to 10 million, 100 million, a billion, then, okay, can you trade some of that closeness precision and get back the speed? So yeah, it's very interesting. Does it interest you more on the deep learning side, or the mathematics side, or the engineering side, or maybe some other side? On every side. From mathematics, I enjoy the beauty of it a lot; sometimes it's too obscure for me, but I really like and
understand deep learning. I like it, although I feel that some of the research doesn't seem to be so innovative, and maybe we should spend more time checking paths other than deep learning. Which paths? I don't know, to be honest. I just feel that there's so much literature that I cannot keep up with it. And then, from the engineering side, I think it's a cool space where I can also provide more value; sometimes concepts are too abstract for me. Yeah, I want to pick up on the point you made: is deep learning the only way? For example, one scary thing is that these models are becoming more and more heavily parameterized, so you have hundreds of billions of parameters, maybe a trillion; how many more can you have, a zillion parameters in there? But first of all, it's impractical: if you take that model and try to plug it in, it doesn't plug in, because it's too expensive, and also you might not have that much data in the first place, right? So why should you care? Web-scale search engines probably will, but you, as a researcher in, let's say, a startup, you need to solve your specific thing, right? So it would look really strange to bring this huge GPT model in and say, this is what we need to use, and then the whole budget goes into paying for that model, you know? It's impractical. So that direction by itself, I think, is a little bit doomed, or, I don't know, how do you feel about it? Yeah, it's a race where I add another layer, get more parameters, and I win; and I don't think that's it. It feels that the first step to move away from this is to really understand how things are learned, and why they are learned the way they are. In image models, you can bring it back to the image where the filters are learned; more or less, you have some idea of where
the model is looking. But maybe we should put more research on, slow down, let's slow down this race, and let's understand; maybe we find a way to make it more sustainable for everyone. Yeah, because I remember when I was doing my PhD in machine translation, it was using statistical models, like Moses, you know, statistical machine translation, and so it would suffer from things like out-of-vocabulary words, and, you know, how do I bring syntax in, and whatnot. But then when deep learning came, all of a sudden you see that it translates much, much better, and you think: wow, probably they solved it now, right? The claim from the 50s that we will solve machine translation, probably now the promise is delivered. But then you notice it's fluent, but it's kind of, I don't want to use the word stupid, but it just doesn't get it right: it mixes up subject and object easily, it may go and hallucinate about something that doesn't exist there, or it actually goes and translates into single letters all of a sudden, you know, or repeating n-grams. You see that it didn't exactly solve it, right? You wouldn't trust your life to such a system yet. No. Yeah, and then you kind of come back, like, okay. And I used to do it in a rule-based approach, so I could understand the syntax of the sentence, and then semantics, like nodes in the tree, and then when I translate, I use some semantic function, and it's all well-defined in the taxonomy of semantic functions and so on. Okay, now I go back to deep learning: do you have anything like that? No, it's just space. Maybe there should be a way to combine them, I don't know. As humans, we have built this complex way of talking to each other, I mean, multiple languages and stuff, and there is no way that all these languages can go back to this deep learning world; it seems counterintuitive, at least. Yeah, yeah. So you are certain that maybe the voice of those who
build alternative models to deep learning, of an alternative approach, should maybe be louder? Yeah, I think that we may suffer from the bias of the winner. I mean, maybe the first one who opens a door might not win the race, but even if they show another way that the race might go, I think they might deserve more attention. Yeah, this is quite deep, thanks for this wide section. You think about it a lot, kind of: okay, not to be biased; okay, yes, there are challenges, but it might not be the only right approach. And given your experience as well, you can judge a little bit with your open eyes. I think we should explore more, and maybe not only focus on that. Yeah, and probably explore with Jina, right? That's the talk for sure. So this is super great. Is there something you want to share? You already shared that the fine-tuner is available, so our listeners can go and check it out, right? Is there something else that we need to be expecting? And early next year we should be releasing our 3. +0 version, so stay tuned for that. Yeah, we will be moving fast in the coming times. Yeah, this is fantastic. Thanks so much for all this information and detail on Jina, and also your ambition and your thinking here. It's really nice that you keep an open mind, available to all of our listeners. Yeah, thanks so much, it was a pleasure to talk to you, John, today. Yeah, thank you so much. Thank you so much, looking forward to 3. +0. Yeah, thank you, bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md new file mode 100644 index 0000000..5ee960a --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/louis-brandy-sql-meets-vector-search-at-rockset.md @@ -0,0 +1,3062 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=TiwqVlDpsl8

00:00 + Intro

00:42 + Louis''s background

05:39 From Facebook + to Rockset

07:41 + Embeddings prior to deep learning / LLM era

12:35 + What''s Rockset as a product

15:27 Use cases

18:04 + RocksDB as part of Rockset

20:33 AI capabilities: + ANN index, hybrid search

25:11 Types of + hybrid search

28:05 + Can one learn the alpha?

30:03 Louis''s + prediction of the future of vector search

33:55 + RAG and other AI capabilities

41:46 + Call out to the Vector Search community

46:16 + Vector Databases vs Databases

49:16 + Question of WHY

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240501_010549_d5b842295c8b59f78ff9fa1e488d2af8.png +pub_date: Wed, 01 May 2024 13:54:39 GMT +title: Louis Brandy - SQL meets Vector Search at Rockset +url: https://rss.com/podcasts/vector-podcast/1460893 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 20.22, "text": " Hello + there, vector podcast. Season three and this promised I''m trying to shoot for 30", + "tokens": [50364, 2425, 456, 11, 8062, 7367, 13, 16465, 1045, 293, 341, 10768, 286, + 478, 1382, 281, 3076, 337, 2217, 51375], "temperature": 0.0, "avg_logprob": -0.2997669966324516, + "compression_ratio": 1.303030303030303, "no_speech_prob": 0.18687711656093597}, + {"id": 1, "seek": 0, "start": 20.22, "end": 24.96, "text": " minute episodes. Let''s + see how I''m going to do on this one. I''m super excited to have", "tokens": [51375, + 3456, 9313, 13, 961, 311, 536, 577, 286, 478, 516, 281, 360, 322, 341, 472, 13, + 286, 478, 1687, 2919, 281, 362, 51612], "temperature": 0.0, "avg_logprob": -0.2997669966324516, + "compression_ratio": 1.303030303030303, "no_speech_prob": 0.18687711656093597}, + {"id": 2, "seek": 2496, "start": 24.96, "end": 31.200000000000003, "text": " Luis + Brandy, vice president of engineering at Rockset. I know you guys are building database.", + "tokens": [50364, 25133, 11119, 88, 11, 11964, 3868, 295, 7043, 412, 6922, 3854, + 13, 286, 458, 291, 1074, 366, 2390, 8149, 13, 50676], "temperature": 0.0, "avg_logprob": + -0.1938008483575315, "compression_ratio": 1.5198412698412698, "no_speech_prob": + 0.36037302017211914}, {"id": 3, "seek": 2496, "start": 31.200000000000003, "end": + 36.24, "text": " Hey Luis, how are you doing? I''m doing great. So far so good. 
+ Thank you for having me today.", "tokens": [50676, 1911, 25133, 11, 577, 366, 291, + 884, 30, 286, 478, 884, 869, 13, 407, 1400, 370, 665, 13, 1044, 291, 337, 1419, + 385, 965, 13, 50928], "temperature": 0.0, "avg_logprob": -0.1938008483575315, "compression_ratio": + 1.5198412698412698, "no_speech_prob": 0.36037302017211914}, {"id": 4, "seek": 2496, + "start": 36.24, "end": 44.0, "text": " Oh yeah, excited. Excited to learn about + Rockset as well. But before that, it''s a tradition. Could", "tokens": [50928, 876, + 1338, 11, 2919, 13, 9368, 1226, 281, 1466, 466, 6922, 3854, 382, 731, 13, 583, 949, + 300, 11, 309, 311, 257, 6994, 13, 7497, 51316], "temperature": 0.0, "avg_logprob": + -0.1938008483575315, "compression_ratio": 1.5198412698412698, "no_speech_prob": + 0.36037302017211914}, {"id": 5, "seek": 2496, "start": 44.0, "end": 50.480000000000004, + "text": " you please introduce yourself a little bit about your background and how + you got to your stage in", "tokens": [51316, 291, 1767, 5366, 1803, 257, 707, 857, + 466, 428, 3678, 293, 577, 291, 658, 281, 428, 3233, 294, 51640], "temperature": + 0.0, "avg_logprob": -0.1938008483575315, "compression_ratio": 1.5198412698412698, + "no_speech_prob": 0.36037302017211914}, {"id": 6, "seek": 5048, "start": 50.48, + "end": 57.36, "text": " your professional life? Sure. So I''ve been at Rockset for + two years and change over two years.", "tokens": [50364, 428, 4843, 993, 30, 4894, + 13, 407, 286, 600, 668, 412, 6922, 3854, 337, 732, 924, 293, 1319, 670, 732, 924, + 13, 50708], "temperature": 0.0, "avg_logprob": -0.13005469062111594, "compression_ratio": + 1.623931623931624, "no_speech_prob": 0.0070395926013588905}, {"id": 7, "seek": 5048, + "start": 58.64, "end": 65.84, "text": " VP of engineering. Before that, I was at + Facebook for 11 years. 
So I did roughly three things at", "tokens": [50772, 35812, + 295, 7043, 13, 4546, 300, 11, 286, 390, 412, 4384, 337, 2975, 924, 13, 407, 286, + 630, 9810, 1045, 721, 412, 51132], "temperature": 0.0, "avg_logprob": -0.13005469062111594, + "compression_ratio": 1.623931623931624, "no_speech_prob": 0.0070395926013588905}, + {"id": 8, "seek": 5048, "start": 65.84, "end": 70.88, "text": " Facebook and it''s + funny because even the ones that feel least relevant have become more relevant", + "tokens": [51132, 4384, 293, 309, 311, 4074, 570, 754, 264, 2306, 300, 841, 1935, + 7340, 362, 1813, 544, 7340, 51384], "temperature": 0.0, "avg_logprob": -0.13005469062111594, + "compression_ratio": 1.623931623931624, "no_speech_prob": 0.0070395926013588905}, + {"id": 9, "seek": 5048, "start": 70.88, "end": 77.28, "text": " recently. I did + spam fighting infrastructure for my first much of time at Facebook and that", "tokens": + [51384, 3938, 13, 286, 630, 24028, 5237, 6896, 337, 452, 700, 709, 295, 565, 412, + 4384, 293, 300, 51704], "temperature": 0.0, "avg_logprob": -0.13005469062111594, + "compression_ratio": 1.623931623931624, "no_speech_prob": 0.0070395926013588905}, + {"id": 10, "seek": 7728, "start": 77.28, "end": 81.04, "text": " involved like two + large systems. One was like a super real time system, which turns into the", "tokens": + [50364, 3288, 411, 732, 2416, 3652, 13, 1485, 390, 411, 257, 1687, 957, 565, 1185, + 11, 597, 4523, 666, 264, 50552], "temperature": 0.0, "avg_logprob": -0.14345688606376078, + "compression_ratio": 1.8031746031746032, "no_speech_prob": 0.01672694832086563}, + {"id": 11, "seek": 7728, "start": 81.04, "end": 84.8, "text": " real time database + we''re going to talk about today. 
And the other was we did a lot of vector", "tokens": + [50552, 957, 565, 8149, 321, 434, 516, 281, 751, 466, 965, 13, 400, 264, 661, 390, + 321, 630, 257, 688, 295, 8062, 50740], "temperature": 0.0, "avg_logprob": -0.14345688606376078, + "compression_ratio": 1.8031746031746032, "no_speech_prob": 0.01672694832086563}, + {"id": 12, "seek": 7728, "start": 84.8, "end": 90.48, "text": " clustering. Like + back I was doing vectors but way before they were cool. This time was like 2011,", + "tokens": [50740, 596, 48673, 13, 1743, 646, 286, 390, 884, 18875, 457, 636, 949, + 436, 645, 1627, 13, 639, 565, 390, 411, 10154, 11, 51024], "temperature": 0.0, "avg_logprob": + -0.14345688606376078, "compression_ratio": 1.8031746031746032, "no_speech_prob": + 0.01672694832086563}, {"id": 13, "seek": 7728, "start": 90.48, "end": 96.72, "text": + " 2015 or so. And we used vectors a lot in in spam fighting and image classification. + And this", "tokens": [51024, 7546, 420, 370, 13, 400, 321, 1143, 18875, 257, 688, + 294, 294, 24028, 5237, 293, 3256, 21538, 13, 400, 341, 51336], "temperature": 0.0, + "avg_logprob": -0.14345688606376078, "compression_ratio": 1.8031746031746032, "no_speech_prob": + 0.01672694832086563}, {"id": 14, "seek": 7728, "start": 96.72, "end": 100.64, "text": + " is like even before like the deep learning took over the world like this is right + before deep", "tokens": [51336, 307, 411, 754, 949, 411, 264, 2452, 2539, 1890, + 670, 264, 1002, 411, 341, 307, 558, 949, 2452, 51532], "temperature": 0.0, "avg_logprob": + -0.14345688606376078, "compression_ratio": 1.8031746031746032, "no_speech_prob": + 0.01672694832086563}, {"id": 15, "seek": 7728, "start": 100.64, "end": 104.48, "text": + " learning changed everything in this in this space. But we were using vectors a + lot. 
We built some", "tokens": [51532, 2539, 3105, 1203, 294, 341, 294, 341, 1901, + 13, 583, 321, 645, 1228, 18875, 257, 688, 13, 492, 3094, 512, 51724], "temperature": + 0.0, "avg_logprob": -0.14345688606376078, "compression_ratio": 1.8031746031746032, + "no_speech_prob": 0.01672694832086563}, {"id": 16, "seek": 10448, "start": 104.48, + "end": 109.2, "text": " pretty powerful systems actually built like large scale + vector clustering. I don''t know before", "tokens": [50364, 1238, 4005, 3652, 767, + 3094, 411, 2416, 4373, 8062, 596, 48673, 13, 286, 500, 380, 458, 949, 50600], "temperature": + 0.0, "avg_logprob": -0.1368819399996921, "compression_ratio": 1.750915750915751, + "no_speech_prob": 0.00035316290450282395}, {"id": 17, "seek": 10448, "start": 109.2, + "end": 114.64, "text": " it was cool. Now everyone''s building large scale vector + applications. And then I worked on a", "tokens": [50600, 309, 390, 1627, 13, 823, + 1518, 311, 2390, 2416, 4373, 8062, 5821, 13, 400, 550, 286, 2732, 322, 257, 50872], + "temperature": 0.0, "avg_logprob": -0.1368819399996921, "compression_ratio": 1.750915750915751, + "no_speech_prob": 0.00035316290450282395}, {"id": 18, "seek": 10448, "start": 114.64, + "end": 119.92, "text": " lot of other stuff at my time at Facebook. So there was + a lot of core C++ core libraries, a lot", "tokens": [50872, 688, 295, 661, 1507, + 412, 452, 565, 412, 4384, 13, 407, 456, 390, 257, 688, 295, 4965, 383, 25472, 4965, + 15148, 11, 257, 688, 51136], "temperature": 0.0, "avg_logprob": -0.1368819399996921, + "compression_ratio": 1.750915750915751, "no_speech_prob": 0.00035316290450282395}, + {"id": 19, "seek": 10448, "start": 119.92, "end": 124.72, "text": " of infrastructure + stuff. I worked on an open source stuff called folly and thrift. 
So these are", + "tokens": [51136, 295, 6896, 1507, 13, 286, 2732, 322, 364, 1269, 4009, 1507, 1219, + 726, 13020, 293, 739, 2008, 13, 407, 613, 366, 51376], "temperature": 0.0, "avg_logprob": + -0.1368819399996921, "compression_ratio": 1.750915750915751, "no_speech_prob": 0.00035316290450282395}, + {"id": 20, "seek": 10448, "start": 124.72, "end": 130.08, "text": " basically like + core libraries that Facebook has released over the years. And the theme of all this", + "tokens": [51376, 1936, 411, 4965, 15148, 300, 4384, 575, 4736, 670, 264, 924, 13, + 400, 264, 6314, 295, 439, 341, 51644], "temperature": 0.0, "avg_logprob": -0.1368819399996921, + "compression_ratio": 1.750915750915751, "no_speech_prob": 0.00035316290450282395}, + {"id": 21, "seek": 13008, "start": 130.08, "end": 134.96, "text": " is like highly + scalable infra. And then I did some real time and some vector stuff back in the", + "tokens": [50364, 307, 411, 5405, 38481, 23654, 13, 400, 550, 286, 630, 512, 957, + 565, 293, 512, 8062, 1507, 646, 294, 264, 50608], "temperature": 0.0, "avg_logprob": + -0.19945635347284824, "compression_ratio": 1.6632302405498283, "no_speech_prob": + 0.004424632992595434}, {"id": 22, "seek": 13008, "start": 134.96, "end": 139.60000000000002, + "text": " spam fighting days. It''s not totally applicable necessarily to the modern + world. But it''s still", "tokens": [50608, 24028, 5237, 1708, 13, 467, 311, 406, + 3879, 21142, 4725, 281, 264, 4363, 1002, 13, 583, 309, 311, 920, 50840], "temperature": + 0.0, "avg_logprob": -0.19945635347284824, "compression_ratio": 1.6632302405498283, + "no_speech_prob": 0.004424632992595434}, {"id": 23, "seek": 13008, "start": 139.60000000000002, + "end": 144.4, "text": " pretty interesting background. 
It a very interesting confluence + of things that have brought me to", "tokens": [50840, 1238, 1880, 3678, 13, 467, + 257, 588, 1880, 1497, 40432, 295, 721, 300, 362, 3038, 385, 281, 51080], "temperature": + 0.0, "avg_logprob": -0.19945635347284824, "compression_ratio": 1.6632302405498283, + "no_speech_prob": 0.004424632992595434}, {"id": 24, "seek": 13008, "start": 144.4, + "end": 150.08, "text": " Roxette. So yeah, that''s my life story roughly and in + nutshell, there''s more. But I think that", "tokens": [51080, 44427, 3007, 13, 407, + 1338, 11, 300, 311, 452, 993, 1657, 9810, 293, 294, 37711, 11, 456, 311, 544, 13, + 583, 286, 519, 300, 51364], "temperature": 0.0, "avg_logprob": -0.19945635347284824, + "compression_ratio": 1.6632302405498283, "no_speech_prob": 0.004424632992595434}, + {"id": 25, "seek": 13008, "start": 150.08, "end": 156.4, "text": " will do for the + for the intro. Yeah, for sure. Very exciting, really exciting. I heard about thrift.", + "tokens": [51364, 486, 360, 337, 264, 337, 264, 12897, 13, 865, 11, 337, 988, 13, + 4372, 4670, 11, 534, 4670, 13, 286, 2198, 466, 739, 2008, 13, 51680], "temperature": + 0.0, "avg_logprob": -0.19945635347284824, "compression_ratio": 1.6632302405498283, + "no_speech_prob": 0.004424632992595434}, {"id": 26, "seek": 15640, "start": 156.4, + "end": 162.56, "text": " And I also remember like early on many years ago, when + some of you guys were on stage, you know,", "tokens": [50364, 400, 286, 611, 1604, + 411, 2440, 322, 867, 924, 2057, 11, 562, 512, 295, 291, 1074, 645, 322, 3233, 11, + 291, 458, 11, 50672], "temperature": 0.0, "avg_logprob": -0.1499397090223969, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.008209636434912682}, {"id": 27, "seek": + 15640, "start": 162.56, "end": 167.36, "text": " from the engineering at Facebook, + you would constantly, you know, hint to the point that yeah,", "tokens": [50672, + 490, 264, 7043, 412, 4384, 11, 291, 576, 6460, 11, 291, 458, 11, 12075, 281, 264, 
+ 935, 300, 1338, 11, 50912], "temperature": 0.0, "avg_logprob": -0.1499397090223969, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.008209636434912682}, + {"id": 28, "seek": 15640, "start": 167.36, "end": 173.04000000000002, "text": " + we ran out of the capabilities of this database. So we needed to scale up. We needed + to build a new one", "tokens": [50912, 321, 5872, 484, 295, 264, 10862, 295, 341, + 8149, 13, 407, 321, 2978, 281, 4373, 493, 13, 492, 2978, 281, 1322, 257, 777, 472, + 51196], "temperature": 0.0, "avg_logprob": -0.1499397090223969, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.008209636434912682}, {"id": 29, "seek": + 15640, "start": 173.04000000000002, "end": 179.04000000000002, "text": " sometimes. + And that was really really interesting that it''s constantly like, you know, you''re + always", "tokens": [51196, 2171, 13, 400, 300, 390, 534, 534, 1880, 300, 309, 311, + 6460, 411, 11, 291, 458, 11, 291, 434, 1009, 51496], "temperature": 0.0, "avg_logprob": + -0.1499397090223969, "compression_ratio": 1.7272727272727273, "no_speech_prob": + 0.008209636434912682}, {"id": 30, "seek": 15640, "start": 179.04000000000002, "end": + 184.16, "text": " battle against too many images, too many videos and so on and + so forth. Yeah, one thing that I''ve", "tokens": [51496, 4635, 1970, 886, 867, 5267, + 11, 886, 867, 2145, 293, 370, 322, 293, 370, 5220, 13, 865, 11, 472, 551, 300, 286, + 600, 51752], "temperature": 0.0, "avg_logprob": -0.1499397090223969, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.008209636434912682}, {"id": 31, "seek": + 18416, "start": 184.16, "end": 190.8, "text": " always said was that like everything + is broken at scale. 
Like like, there''s this idea that sometimes", "tokens": [50364, + 1009, 848, 390, 300, 411, 1203, 307, 5463, 412, 4373, 13, 1743, 411, 11, 456, 311, + 341, 1558, 300, 2171, 50696], "temperature": 0.0, "avg_logprob": -0.13620163654458933, + "compression_ratio": 1.7664233576642336, "no_speech_prob": 0.002339428523555398}, + {"id": 32, "seek": 18416, "start": 190.8, "end": 196.56, "text": " you reach for + the right tool for the job, but the reality is like when when you push the even + the", "tokens": [50696, 291, 2524, 337, 264, 558, 2290, 337, 264, 1691, 11, 457, + 264, 4103, 307, 411, 562, 562, 291, 2944, 264, 754, 264, 50984], "temperature": + 0.0, "avg_logprob": -0.13620163654458933, "compression_ratio": 1.7664233576642336, + "no_speech_prob": 0.002339428523555398}, {"id": 33, "seek": 18416, "start": 196.56, + "end": 201.12, "text": " right tool to the limit, it will fall over and you''ll + find yourself doing things that other people,", "tokens": [50984, 558, 2290, 281, + 264, 4948, 11, 309, 486, 2100, 670, 293, 291, 603, 915, 1803, 884, 721, 300, 661, + 561, 11, 51212], "temperature": 0.0, "avg_logprob": -0.13620163654458933, "compression_ratio": + 1.7664233576642336, "no_speech_prob": 0.002339428523555398}, {"id": 34, "seek": + 18416, "start": 201.12, "end": 204.48, "text": " you know, rebuilding something + that other people take for granted. Like my favorite example of this", "tokens": + [51212, 291, 458, 11, 36717, 746, 300, 661, 561, 747, 337, 12344, 13, 1743, 452, + 2954, 1365, 295, 341, 51380], "temperature": 0.0, "avg_logprob": -0.13620163654458933, + "compression_ratio": 1.7664233576642336, "no_speech_prob": 0.002339428523555398}, + {"id": 35, "seek": 18416, "start": 204.48, "end": 210.32, "text": " is at Facebook, + we had a team on on my core C++ group that was working on Malak. 
Like", "tokens": + [51380, 307, 412, 4384, 11, 321, 632, 257, 1469, 322, 322, 452, 4965, 383, 25472, + 1594, 300, 390, 1364, 322, 5746, 514, 13, 1743, 51672], "temperature": 0.0, "avg_logprob": + -0.13620163654458933, "compression_ratio": 1.7664233576642336, "no_speech_prob": + 0.002339428523555398}, {"id": 36, "seek": 21032, "start": 211.28, "end": 216.64, + "text": " who who works on Malak? It turns out there are people that work at Malak. + They''re most of them work at", "tokens": [50412, 567, 567, 1985, 322, 5746, 514, + 30, 467, 4523, 484, 456, 366, 561, 300, 589, 412, 5746, 514, 13, 814, 434, 881, + 295, 552, 589, 412, 50680], "temperature": 0.0, "avg_logprob": -0.18197216811003508, + "compression_ratio": 1.6322314049586777, "no_speech_prob": 0.012771518900990486}, + {"id": 37, "seek": 21032, "start": 216.64, "end": 221.12, "text": " like a place + like Facebook or Google or places like that, but but that''s like the kind of thing + where", "tokens": [50680, 411, 257, 1081, 411, 4384, 420, 3329, 420, 3190, 411, + 300, 11, 457, 457, 300, 311, 411, 264, 733, 295, 551, 689, 50904], "temperature": + 0.0, "avg_logprob": -0.18197216811003508, "compression_ratio": 1.6322314049586777, + "no_speech_prob": 0.012771518900990486}, {"id": 38, "seek": 21032, "start": 221.12, + "end": 226.0, "text": " you can save a lot of money by making tiny improvements + to Malak. So it''s worth doing.", "tokens": [50904, 291, 393, 3155, 257, 688, 295, + 1460, 538, 1455, 5870, 13797, 281, 5746, 514, 13, 407, 309, 311, 3163, 884, 13, + 51148], "temperature": 0.0, "avg_logprob": -0.18197216811003508, "compression_ratio": + 1.6322314049586777, "no_speech_prob": 0.012771518900990486}, {"id": 39, "seek": + 21032, "start": 226.0, "end": 233.35999999999999, "text": " It''s amazing. I remember + I did a bit of a C++ as well. 
I guess like you could say two and a half years.", + "tokens": [51148, 467, 311, 2243, 13, 286, 1604, 286, 630, 257, 857, 295, 257, 383, + 25472, 382, 731, 13, 286, 2041, 411, 291, 727, 584, 732, 293, 257, 1922, 924, 13, + 51516], "temperature": 0.0, "avg_logprob": -0.18197216811003508, "compression_ratio": + 1.6322314049586777, "no_speech_prob": 0.012771518900990486}, {"id": 40, "seek": + 23336, "start": 234.32000000000002, "end": 242.0, "text": " And at some point in + the 90 virus company here in Finland, I had to choose which Malak will it be,", + "tokens": [50412, 400, 412, 512, 935, 294, 264, 4289, 5752, 2237, 510, 294, 24869, + 11, 286, 632, 281, 2826, 597, 5746, 514, 486, 309, 312, 11, 50796], "temperature": + 0.0, "avg_logprob": -0.15402721891216203, "compression_ratio": 1.58, "no_speech_prob": + 0.03078543022274971}, {"id": 41, "seek": 23336, "start": 242.0, "end": 247.12, "text": + " right? And I had to sort of discuss with my team and I was like, struck really, + is that really the", "tokens": [50796, 558, 30, 400, 286, 632, 281, 1333, 295, 2248, + 365, 452, 1469, 293, 286, 390, 411, 11, 13159, 534, 11, 307, 300, 534, 264, 51052], + "temperature": 0.0, "avg_logprob": -0.15402721891216203, "compression_ratio": 1.58, + "no_speech_prob": 0.03078543022274971}, {"id": 42, "seek": 23336, "start": 247.12, + "end": 252.4, "text": " thing we need to discuss? And they said, yeah, actually + you won''t believe because we are running on", "tokens": [51052, 551, 321, 643, + 281, 2248, 30, 400, 436, 848, 11, 1338, 11, 767, 291, 1582, 380, 1697, 570, 321, + 366, 2614, 322, 51316], "temperature": 0.0, "avg_logprob": -0.15402721891216203, + "compression_ratio": 1.58, "no_speech_prob": 0.03078543022274971}, {"id": 43, "seek": + 23336, "start": 252.4, "end": 258.96000000000004, "text": " a mobile phone back + then it was this Microsoft''s Windows mobile, I guess it was called, right? 
So", + "tokens": [51316, 257, 6013, 2593, 646, 550, 309, 390, 341, 8116, 311, 8591, 6013, + 11, 286, 2041, 309, 390, 1219, 11, 558, 30, 407, 51644], "temperature": 0.0, "avg_logprob": + -0.15402721891216203, "compression_ratio": 1.58, "no_speech_prob": 0.03078543022274971}, + {"id": 44, "seek": 25896, "start": 259.03999999999996, "end": 264.47999999999996, + "text": " you have to be really careful to the round. Yeah, I mean, there''s only + here for Malak''s in the", "tokens": [50368, 291, 362, 281, 312, 534, 5026, 281, + 264, 3098, 13, 865, 11, 286, 914, 11, 456, 311, 787, 510, 337, 5746, 514, 311, 294, + 264, 50640], "temperature": 0.0, "avg_logprob": -0.16987393596979578, "compression_ratio": + 1.6526315789473685, "no_speech_prob": 0.021107446402311325}, {"id": 45, "seek": + 25896, "start": 264.47999999999996, "end": 269.2, "text": " world. So you might + have chosen ours. Who knows? Amazing. All these say is that you''ve been really,", + "tokens": [50640, 1002, 13, 407, 291, 1062, 362, 8614, 11896, 13, 2102, 3255, 30, + 14165, 13, 1057, 613, 584, 307, 300, 291, 600, 668, 534, 11, 50876], "temperature": + 0.0, "avg_logprob": -0.16987393596979578, "compression_ratio": 1.6526315789473685, + "no_speech_prob": 0.021107446402311325}, {"id": 46, "seek": 25896, "start": 269.2, + "end": 275.03999999999996, "text": " really deep and low level. And so I think you + you doubled in coding obviously, right?", "tokens": [50876, 534, 2452, 293, 2295, + 1496, 13, 400, 370, 286, 519, 291, 291, 24405, 294, 17720, 2745, 11, 558, 30, 51168], + "temperature": 0.0, "avg_logprob": -0.16987393596979578, "compression_ratio": 1.6526315789473685, + "no_speech_prob": 0.021107446402311325}, {"id": 47, "seek": 25896, "start": 276.0, + "end": 280.15999999999997, "text": " Yeah. So I was a fairly technical. I''ve been + a manager for relatively long time. 
I don''t know,", "tokens": [51216, 865, 13, + 407, 286, 390, 257, 6457, 6191, 13, 286, 600, 668, 257, 6598, 337, 7226, 938, 565, + 13, 286, 500, 380, 458, 11, 51424], "temperature": 0.0, "avg_logprob": -0.16987393596979578, + "compression_ratio": 1.6526315789473685, "no_speech_prob": 0.021107446402311325}, + {"id": 48, "seek": 25896, "start": 280.15999999999997, "end": 285.84, "text": " + 12 years or so, but I''ve always been a fairly technical manager in my path. And + so for example,", "tokens": [51424, 2272, 924, 420, 370, 11, 457, 286, 600, 1009, + 668, 257, 6457, 6191, 6598, 294, 452, 3100, 13, 400, 370, 337, 1365, 11, 51708], + "temperature": 0.0, "avg_logprob": -0.16987393596979578, "compression_ratio": 1.6526315789473685, + "no_speech_prob": 0.021107446402311325}, {"id": 49, "seek": 28584, "start": 286.0, + "end": 289.84, "text": " for years I worked in the course, SQL''s plus libraries + at Facebook, even while I was a manager,", "tokens": [50372, 337, 924, 286, 2732, + 294, 264, 1164, 11, 19200, 311, 1804, 15148, 412, 4384, 11, 754, 1339, 286, 390, + 257, 6598, 11, 50564], "temperature": 0.0, "avg_logprob": -0.20760056229888416, + "compression_ratio": 1.6643356643356644, "no_speech_prob": 0.004974909592419863}, + {"id": 50, "seek": 28584, "start": 289.84, "end": 297.2, "text": " even a director. + I was on the SQL standards committee for a while and doing things like that.", "tokens": + [50564, 754, 257, 5391, 13, 286, 390, 322, 264, 19200, 7787, 7482, 337, 257, 1339, + 293, 884, 721, 411, 300, 13, 50932], "temperature": 0.0, "avg_logprob": -0.20760056229888416, + "compression_ratio": 1.6643356643356644, "no_speech_prob": 0.004974909592419863}, + {"id": 51, "seek": 28584, "start": 298.96, "end": 306.15999999999997, "text": " + Sorry, I got paged. Everything''s fine. So yeah, I''ve definitely worked in the + code. 
I''ve tried", "tokens": [51020, 4919, 11, 286, 658, 280, 2980, 13, 5471, 311, + 2489, 13, 407, 1338, 11, 286, 600, 2138, 2732, 294, 264, 3089, 13, 286, 600, 3031, + 51380], "temperature": 0.0, "avg_logprob": -0.20760056229888416, "compression_ratio": + 1.6643356643356644, "no_speech_prob": 0.004974909592419863}, {"id": 52, "seek": + 28584, "start": 306.15999999999997, "end": 309.67999999999995, "text": " to stay + as hands on as possible. In most recent years, it''s become increasingly difficult.", + "tokens": [51380, 281, 1754, 382, 2377, 322, 382, 1944, 13, 682, 881, 5162, 924, + 11, 309, 311, 1813, 12980, 2252, 13, 51556], "temperature": 0.0, "avg_logprob": + -0.20760056229888416, "compression_ratio": 1.6643356643356644, "no_speech_prob": + 0.004974909592419863}, {"id": 53, "seek": 28584, "start": 310.4, "end": 315.12, + "text": " I just I don''t know, it''s sort of the dark side of management. You slowly + slide into more managerial", "tokens": [51592, 286, 445, 286, 500, 380, 458, 11, + 309, 311, 1333, 295, 264, 2877, 1252, 295, 4592, 13, 509, 5692, 4137, 666, 544, + 6598, 831, 51828], "temperature": 0.0, "avg_logprob": -0.20760056229888416, "compression_ratio": + 1.6643356643356644, "no_speech_prob": 0.004974909592419863}, {"id": 54, "seek": + 31512, "start": 315.12, "end": 320.4, "text": " things. But I still try to stay + about as hands on as I possibly can. Oh, tell me about,", "tokens": [50364, 721, + 13, 583, 286, 920, 853, 281, 1754, 466, 382, 2377, 322, 382, 286, 6264, 393, 13, + 876, 11, 980, 385, 466, 11, 50628], "temperature": 0.0, "avg_logprob": -0.18076297429602917, + "compression_ratio": 1.6559139784946237, "no_speech_prob": 0.013700243085622787}, + {"id": 55, "seek": 31512, "start": 320.4, "end": 326.24, "text": " tell me about + me about that. 
I mean, I''m also on the product management side today and", "tokens": + [50628, 980, 385, 466, 385, 466, 300, 13, 286, 914, 11, 286, 478, 611, 322, 264, + 1674, 4592, 1252, 965, 293, 50920], "temperature": 0.0, "avg_logprob": -0.18076297429602917, + "compression_ratio": 1.6559139784946237, "no_speech_prob": 0.013700243085622787}, + {"id": 56, "seek": 31512, "start": 326.24, "end": 331.04, "text": " previously a + manager of people as well. And I''m like, am I sliding backwards? Do I need to?", + "tokens": [50920, 8046, 257, 6598, 295, 561, 382, 731, 13, 400, 286, 478, 411, 11, + 669, 286, 21169, 12204, 30, 1144, 286, 643, 281, 30, 51160], "temperature": 0.0, + "avg_logprob": -0.18076297429602917, "compression_ratio": 1.6559139784946237, "no_speech_prob": + 0.013700243085622787}, {"id": 57, "seek": 31512, "start": 331.76, "end": 338.0, + "text": " Sometimes I do, but it''s not on the same level as it used to be for sure. + But it makes sense to", "tokens": [51196, 4803, 286, 360, 11, 457, 309, 311, 406, + 322, 264, 912, 1496, 382, 309, 1143, 281, 312, 337, 988, 13, 583, 309, 1669, 2020, + 281, 51508], "temperature": 0.0, "avg_logprob": -0.18076297429602917, "compression_ratio": + 1.6559139784946237, "no_speech_prob": 0.013700243085622787}, {"id": 58, "seek": + 31512, "start": 338.0, "end": 343.92, "text": " stay on these topics. And and then + after all these years, you decided to move to Roxette. 
I''ve read", "tokens": [51508, + 1754, 322, 613, 8378, 13, 400, 293, 550, 934, 439, 613, 924, 11, 291, 3047, 281, + 1286, 281, 44427, 3007, 13, 286, 600, 1401, 51804], "temperature": 0.0, "avg_logprob": + -0.18076297429602917, "compression_ratio": 1.6559139784946237, "no_speech_prob": + 0.013700243085622787}, {"id": 59, "seek": 34392, "start": 344.64000000000004, "end": + 351.04, "text": " a blog post that I think you''ve written for the company where + you give the reasons why you did so", "tokens": [50400, 257, 6968, 2183, 300, 286, + 519, 291, 600, 3720, 337, 264, 2237, 689, 291, 976, 264, 4112, 983, 291, 630, 370, + 50720], "temperature": 0.0, "avg_logprob": -0.15379995169098845, "compression_ratio": + 1.6141078838174274, "no_speech_prob": 0.020054491236805916}, {"id": 60, "seek": + 34392, "start": 351.04, "end": 356.48, "text": " and you explain about the team + strengths and so on support. Some of them are from Facebook as well.", "tokens": + [50720, 293, 291, 2903, 466, 264, 1469, 16986, 293, 370, 322, 1406, 13, 2188, 295, + 552, 366, 490, 4384, 382, 731, 13, 50992], "temperature": 0.0, "avg_logprob": -0.15379995169098845, + "compression_ratio": 1.6141078838174274, "no_speech_prob": 0.020054491236805916}, + {"id": 61, "seek": 34392, "start": 356.48, "end": 365.52000000000004, "text": " + Today matter, right? Can you sort of repeat that story a little bit like why you + moved from a big", "tokens": [50992, 2692, 1871, 11, 558, 30, 1664, 291, 1333, 295, + 7149, 300, 1657, 257, 707, 857, 411, 983, 291, 4259, 490, 257, 955, 51444], "temperature": + 0.0, "avg_logprob": -0.15379995169098845, "compression_ratio": 1.6141078838174274, + "no_speech_prob": 0.020054491236805916}, {"id": 62, "seek": 34392, "start": 365.52000000000004, + "end": 372.08000000000004, "text": " company you could say, right? To a startup. + So the answer is in short is the people. 
The core", "tokens": [51444, 2237, 291, + 727, 584, 11, 558, 30, 1407, 257, 18578, 13, 407, 264, 1867, 307, 294, 2099, 307, + 264, 561, 13, 440, 4965, 51772], "temperature": 0.0, "avg_logprob": -0.15379995169098845, + "compression_ratio": 1.6141078838174274, "no_speech_prob": 0.020054491236805916}, + {"id": 63, "seek": 37208, "start": 372.08, "end": 376.64, "text": " group at Roxette + is a bunch of, I shouldn''t say the core group now. The core group now has grown + a", "tokens": [50364, 1594, 412, 44427, 3007, 307, 257, 3840, 295, 11, 286, 4659, + 380, 584, 264, 4965, 1594, 586, 13, 440, 4965, 1594, 586, 575, 7709, 257, 50592], + "temperature": 0.0, "avg_logprob": -0.18809340610977046, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.0033300681971013546}, {"id": 64, "seek": 37208, "start": 376.64, + "end": 382.88, "text": " lot, but the original founding or was a bunch of extremely + strong Facebook people that I knew from", "tokens": [50592, 688, 11, 457, 264, 3380, + 22223, 420, 390, 257, 3840, 295, 4664, 2068, 4384, 561, 300, 286, 2586, 490, 50904], + "temperature": 0.0, "avg_logprob": -0.18809340610977046, "compression_ratio": 1.8065693430656935, + "no_speech_prob": 0.0033300681971013546}, {"id": 65, "seek": 37208, "start": 382.88, + "end": 388.15999999999997, "text": " Facebook and from. And so you know, you mentioned + rebuilding databases. 
For example, like two of the", "tokens": [50904, 4384, 293, + 490, 13, 400, 370, 291, 458, 11, 291, 2835, 36717, 22380, 13, 1171, 1365, 11, 411, + 732, 295, 264, 51168], "temperature": 0.0, "avg_logprob": -0.18809340610977046, + "compression_ratio": 1.8065693430656935, "no_speech_prob": 0.0033300681971013546}, + {"id": 66, "seek": 37208, "start": 388.15999999999997, "end": 391.76, "text": " + main people were probably three of the main people responsible for rebuilding databases + at Facebook", "tokens": [51168, 2135, 561, 645, 1391, 1045, 295, 264, 2135, 561, + 6250, 337, 36717, 22380, 412, 4384, 51348], "temperature": 0.0, "avg_logprob": -0.18809340610977046, + "compression_ratio": 1.8065693430656935, "no_speech_prob": 0.0033300681971013546}, + {"id": 67, "seek": 37208, "start": 391.76, "end": 398.08, "text": " are at Roxette. + So Drupal, who''s who''s our CTO was built RoxDB at Facebook. And that was part + of", "tokens": [51348, 366, 412, 44427, 3007, 13, 407, 413, 11976, 304, 11, 567, + 311, 567, 311, 527, 383, 15427, 390, 3094, 44427, 27735, 412, 4384, 13, 400, 300, + 390, 644, 295, 51664], "temperature": 0.0, "avg_logprob": -0.18809340610977046, + "compression_ratio": 1.8065693430656935, "no_speech_prob": 0.0033300681971013546}, + {"id": 68, "seek": 39808, "start": 398.56, "end": 404.24, "text": " replacing the + storage plan of my sequel and a highly scalable way at Facebook. 
And then of course,", + "tokens": [50388, 19139, 264, 6725, 1393, 295, 452, 20622, 293, 257, 5405, 38481, + 636, 412, 4384, 13, 400, 550, 295, 1164, 11, 50672], "temperature": 0.0, "avg_logprob": + -0.21484406415153953, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.0004236118693370372}, {"id": 69, "seek": 39808, "start": 404.24, "end": 408.47999999999996, + "text": " the graph database that powers literally all of Facebook like Facebook + is a graph and it is", "tokens": [50672, 264, 4295, 8149, 300, 8674, 3736, 439, + 295, 4384, 411, 4384, 307, 257, 4295, 293, 309, 307, 50884], "temperature": 0.0, + "avg_logprob": -0.21484406415153953, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.0004236118693370372}, {"id": 70, "seek": 39808, "start": 408.47999999999996, "end": + 413.36, "text": " primarily powered by a graph database called Tau. Nathan and Vencat + are two people who worked", "tokens": [50884, 10029, 17786, 538, 257, 4295, 8149, + 1219, 314, 1459, 13, 20634, 293, 11182, 18035, 366, 732, 561, 567, 2732, 51128], + "temperature": 0.0, "avg_logprob": -0.21484406415153953, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.0004236118693370372}, {"id": 71, "seek": 39808, "start": 413.36, + "end": 420.15999999999997, "text": " who worked on that at at extensively at like + tech leads and founders in some sense of that project", "tokens": [51128, 567, 2732, + 322, 300, 412, 412, 32636, 412, 411, 7553, 6689, 293, 25608, 294, 512, 2020, 295, + 300, 1716, 51468], "temperature": 0.0, "avg_logprob": -0.21484406415153953, "compression_ratio": + 1.6724890829694323, "no_speech_prob": 0.0004236118693370372}, {"id": 72, "seek": + 42016, "start": 420.8, "end": 429.52000000000004, "text": " Facebook. So this is + like extremely pedigree group. 
So to me, I don''t care so much about it,", "tokens": + [50396, 4384, 13, 407, 341, 307, 411, 4664, 5670, 328, 701, 1594, 13, 407, 281, + 385, 11, 286, 500, 380, 1127, 370, 709, 466, 309, 11, 50832], "temperature": 0.0, + "avg_logprob": -0.1625074109723491, "compression_ratio": 1.7814814814814814, "no_speech_prob": + 0.020830150693655014}, {"id": 73, "seek": 42016, "start": 429.52000000000004, "end": + 433.76000000000005, "text": " they''re also like genuinely amazing people to work + with and work around. And so this is kind of", "tokens": [50832, 436, 434, 611, + 411, 17839, 2243, 561, 281, 589, 365, 293, 589, 926, 13, 400, 370, 341, 307, 733, + 295, 51044], "temperature": 0.0, "avg_logprob": -0.1625074109723491, "compression_ratio": + 1.7814814814814814, "no_speech_prob": 0.020830150693655014}, {"id": 74, "seek": + 42016, "start": 433.76000000000005, "end": 439.12, "text": " this idea of like, + hey, you want to join us startup with a bunch of the smartest people you''ve ever", + "tokens": [51044, 341, 1558, 295, 411, 11, 4177, 11, 291, 528, 281, 3917, 505, 18578, + 365, 257, 3840, 295, 264, 41491, 561, 291, 600, 1562, 51312], "temperature": 0.0, + "avg_logprob": -0.1625074109723491, "compression_ratio": 1.7814814814814814, "no_speech_prob": + 0.020830150693655014}, {"id": 75, "seek": 42016, "start": 439.12, "end": 444.72, + "text": " worked with and like try to do something and worst case scenario, you + know, it all goes, you know,", "tokens": [51312, 2732, 365, 293, 411, 853, 281, + 360, 746, 293, 5855, 1389, 9005, 11, 291, 458, 11, 309, 439, 1709, 11, 291, 458, + 11, 51592], "temperature": 0.0, "avg_logprob": -0.1625074109723491, "compression_ratio": + 1.7814814814814814, "no_speech_prob": 0.020830150693655014}, {"id": 76, "seek": + 42016, "start": 444.72, "end": 448.48, "text": " kaput or whatever, but you have + like you worked with like some of the best people on a really", "tokens": [51592, + 13816, 325, 420, 2035, 11, 457, 291, 362, 411, 291, 2732, 365, 
411, 512, 295, 264, + 1151, 561, 322, 257, 534, 51780], "temperature": 0.0, "avg_logprob": -0.1625074109723491, + "compression_ratio": 1.7814814814814814, "no_speech_prob": 0.020830150693655014}, + {"id": 77, "seek": 44848, "start": 448.48, "end": 451.76, "text": " interesting + problem for a couple years. And I was like, yeah, I''m in for that. There''s a", + "tokens": [50364, 1880, 1154, 337, 257, 1916, 924, 13, 400, 286, 390, 411, 11, 1338, + 11, 286, 478, 294, 337, 300, 13, 821, 311, 257, 50528], "temperature": 0.0, "avg_logprob": + -0.16544021478220194, "compression_ratio": 1.6736842105263159, "no_speech_prob": + 0.0017827838892117143}, {"id": 78, "seek": 44848, "start": 451.76, "end": 455.84000000000003, + "text": " longer version of that story, but that''s that is really the central reason + of why I ended up", "tokens": [50528, 2854, 3037, 295, 300, 1657, 11, 457, 300, + 311, 300, 307, 534, 264, 5777, 1778, 295, 983, 286, 4590, 493, 50732], "temperature": + 0.0, "avg_logprob": -0.16544021478220194, "compression_ratio": 1.6736842105263159, + "no_speech_prob": 0.0017827838892117143}, {"id": 79, "seek": 44848, "start": 455.84000000000003, + "end": 462.48, "text": " I ended up switching. Yeah, I mean, it sounds like a brilliant + reason too. 
But I''m also interested", "tokens": [50732, 286, 4590, 493, 16493, + 13, 865, 11, 286, 914, 11, 309, 3263, 411, 257, 10248, 1778, 886, 13, 583, 286, + 478, 611, 3102, 51064], "temperature": 0.0, "avg_logprob": -0.16544021478220194, + "compression_ratio": 1.6736842105263159, "no_speech_prob": 0.0017827838892117143}, + {"id": 80, "seek": 44848, "start": 462.48, "end": 468.16, "text": " you said you''ve + been using embeddings before like Facebook and on vectors and you said that prior + to", "tokens": [51064, 291, 848, 291, 600, 668, 1228, 12240, 29432, 949, 411, 4384, + 293, 322, 18875, 293, 291, 848, 300, 4059, 281, 51348], "temperature": 0.0, "avg_logprob": + -0.16544021478220194, "compression_ratio": 1.6736842105263159, "no_speech_prob": + 0.0017827838892117143}, {"id": 81, "seek": 44848, "start": 469.04, "end": 475.92, + "text": " deep learning era. So can you explain a bit like how these vectors were + sort of created if it''s", "tokens": [51392, 2452, 2539, 4249, 13, 407, 393, 291, + 2903, 257, 857, 411, 577, 613, 18875, 645, 1333, 295, 2942, 498, 309, 311, 51736], + "temperature": 0.0, "avg_logprob": -0.16544021478220194, "compression_ratio": 1.6736842105263159, + "no_speech_prob": 0.0017827838892117143}, {"id": 82, "seek": 47592, "start": 475.92, + "end": 481.52000000000004, "text": " possible? So there is some sensitivity here, + but it''s not maybe for the reason you think it''s not", "tokens": [50364, 1944, + 30, 407, 456, 307, 512, 19392, 510, 11, 457, 309, 311, 406, 1310, 337, 264, 1778, + 291, 519, 309, 311, 406, 50644], "temperature": 0.0, "avg_logprob": -0.16989439277238744, + "compression_ratio": 1.72, "no_speech_prob": 0.0009300808887928724}, {"id": 83, + "seek": 47592, "start": 481.52000000000004, "end": 487.52000000000004, "text": " + a trade sensitivity. The sensitivity with abuse abuse use cases. 
What we were doing + was image", "tokens": [50644, 257, 4923, 19392, 13, 440, 19392, 365, 9852, 9852, + 764, 3331, 13, 708, 321, 645, 884, 390, 3256, 50944], "temperature": 0.0, "avg_logprob": + -0.16989439277238744, "compression_ratio": 1.72, "no_speech_prob": 0.0009300808887928724}, + {"id": 84, "seek": 47592, "start": 487.52000000000004, "end": 494.88, "text": " + classification. And and most of this is I''m not going to go into too much detail + for maybe obvious", "tokens": [50944, 21538, 13, 400, 293, 881, 295, 341, 307, 286, + 478, 406, 516, 281, 352, 666, 886, 709, 2607, 337, 1310, 6322, 51312], "temperature": + 0.0, "avg_logprob": -0.16989439277238744, "compression_ratio": 1.72, "no_speech_prob": + 0.0009300808887928724}, {"id": 85, "seek": 47592, "start": 494.88, "end": 502.08000000000004, + "text": " reasons, but there are there are images that you are not allowed to to + to use or put up and they", "tokens": [51312, 4112, 11, 457, 456, 366, 456, 366, + 5267, 300, 291, 366, 406, 4350, 281, 281, 281, 764, 420, 829, 493, 293, 436, 51672], + "temperature": 0.0, "avg_logprob": -0.16989439277238744, "compression_ratio": 1.72, + "no_speech_prob": 0.0009300808887928724}, {"id": 86, "seek": 50208, "start": 502.08, + "end": 506.88, "text": " and and obviously what they don''t want to do is hand all + these companies like the images and say", "tokens": [50364, 293, 293, 2745, 437, + 436, 500, 380, 528, 281, 360, 307, 1011, 439, 613, 3431, 411, 264, 5267, 293, 584, + 50604], "temperature": 0.0, "avg_logprob": -0.13278803011266196, "compression_ratio": + 1.7661870503597121, "no_speech_prob": 0.00045941834105178714}, {"id": 87, "seek": + 50208, "start": 506.88, "end": 513.4399999999999, "text": " if you see this illegal + image, tell us. So oftentimes they give you hashes. 
And but these aren''t actual", + "tokens": [50604, 498, 291, 536, 341, 11905, 3256, 11, 980, 505, 13, 407, 18349, + 436, 976, 291, 575, 8076, 13, 400, 457, 613, 3212, 380, 3539, 50932], "temperature": + 0.0, "avg_logprob": -0.13278803011266196, "compression_ratio": 1.7661870503597121, + "no_speech_prob": 0.00045941834105178714}, {"id": 88, "seek": 50208, "start": 513.4399999999999, + "end": 517.52, "text": " hashes. They are not a hash of the illegal image. They + are a locality sensitive hash and they''re", "tokens": [50932, 575, 8076, 13, 814, + 366, 406, 257, 22019, 295, 264, 11905, 3256, 13, 814, 366, 257, 1628, 1860, 9477, + 22019, 293, 436, 434, 51136], "temperature": 0.0, "avg_logprob": -0.13278803011266196, + "compression_ratio": 1.7661870503597121, "no_speech_prob": 0.00045941834105178714}, + {"id": 89, "seek": 50208, "start": 517.52, "end": 523.92, "text": " a vector. What + they are is literally a vector. And Euclidean distance is the measurement of so + you", "tokens": [51136, 257, 8062, 13, 708, 436, 366, 307, 3736, 257, 8062, 13, + 400, 462, 1311, 31264, 282, 4560, 307, 264, 13160, 295, 370, 291, 51456], "temperature": + 0.0, "avg_logprob": -0.13278803011266196, "compression_ratio": 1.7661870503597121, + "no_speech_prob": 0.00045941834105178714}, {"id": 90, "seek": 50208, "start": 523.92, + "end": 529.6, "text": " basically have a a classic vector search problem. You''re + given a pile of vectors. If there''s a", "tokens": [51456, 1936, 362, 257, 257, + 7230, 8062, 3164, 1154, 13, 509, 434, 2212, 257, 14375, 295, 18875, 13, 759, 456, + 311, 257, 51740], "temperature": 0.0, "avg_logprob": -0.13278803011266196, "compression_ratio": + 1.7661870503597121, "no_speech_prob": 0.00045941834105178714}, {"id": 91, "seek": + 52960, "start": 529.6, "end": 534.4, "text": " technology known as photo DNA that + you can look up. 
It''s it''s not as far as I know it''s not like an", "tokens": + [50364, 2899, 2570, 382, 5052, 8272, 300, 291, 393, 574, 493, 13, 467, 311, 309, + 311, 406, 382, 1400, 382, 286, 458, 309, 311, 406, 411, 364, 50604], "temperature": + 0.0, "avg_logprob": -0.15064579678564957, "compression_ratio": 1.6382113821138211, + "no_speech_prob": 0.00031559172202832997}, {"id": 92, "seek": 52960, "start": 534.4, + "end": 539.6800000000001, "text": " open standard. So you don''t actually it''s + not actually in the public domain. What it actually is,", "tokens": [50604, 1269, + 3832, 13, 407, 291, 500, 380, 767, 309, 311, 406, 767, 294, 264, 1908, 9274, 13, + 708, 309, 767, 307, 11, 50868], "temperature": 0.0, "avg_logprob": -0.15064579678564957, + "compression_ratio": 1.6382113821138211, "no_speech_prob": 0.00031559172202832997}, + {"id": 93, "seek": 52960, "start": 539.6800000000001, "end": 545.6, "text": " but + it''s effectively a mechanism for turning images into vectors that''s used as this + hashing mechanism.", "tokens": [50868, 457, 309, 311, 8659, 257, 7513, 337, 6246, + 5267, 666, 18875, 300, 311, 1143, 382, 341, 575, 571, 7513, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.15064579678564957, "compression_ratio": 1.6382113821138211, + "no_speech_prob": 0.00031559172202832997}, {"id": 94, "seek": 52960, "start": 546.16, + "end": 555.36, "text": " And and so Facebook built a bunch of infrastructure to + flag hashes that came through for reasons that", "tokens": [51192, 400, 293, 370, + 4384, 3094, 257, 3840, 295, 6896, 281, 7166, 575, 8076, 300, 1361, 807, 337, 4112, + 300, 51652], "temperature": 0.0, "avg_logprob": -0.15064579678564957, "compression_ratio": + 1.6382113821138211, "no_speech_prob": 0.00031559172202832997}, {"id": 95, "seek": + 55536, "start": 556.32, "end": 561.12, "text": " are not fun to talk about. Let''s + put it that way. Like they''re there. 
There''s like again, I don''t want", "tokens": + [50412, 366, 406, 1019, 281, 751, 466, 13, 961, 311, 829, 309, 300, 636, 13, 1743, + 436, 434, 456, 13, 821, 311, 411, 797, 11, 286, 500, 380, 528, 50652], "temperature": + 0.0, "avg_logprob": -0.15344659062742277, "compression_ratio": 1.745583038869258, + "no_speech_prob": 0.003532773582264781}, {"id": 96, "seek": 55536, "start": 561.12, + "end": 566.16, "text": " to get into it. It''s kind of it''s awful, right? But at + the end of the day, like you have you have", "tokens": [50652, 281, 483, 666, 309, + 13, 467, 311, 733, 295, 309, 311, 11232, 11, 558, 30, 583, 412, 264, 917, 295, 264, + 786, 11, 411, 291, 362, 291, 362, 50904], "temperature": 0.0, "avg_logprob": -0.15344659062742277, + "compression_ratio": 1.745583038869258, "no_speech_prob": 0.003532773582264781}, + {"id": 97, "seek": 55536, "start": 566.16, "end": 573.52, "text": " vectors flowing + into the system. And what you''re doing every single upload is is doing essentially + a", "tokens": [50904, 18875, 13974, 666, 264, 1185, 13, 400, 437, 291, 434, 884, + 633, 2167, 6580, 307, 307, 884, 4476, 257, 51272], "temperature": 0.0, "avg_logprob": + -0.15344659062742277, "compression_ratio": 1.745583038869258, "no_speech_prob": + 0.003532773582264781}, {"id": 98, "seek": 55536, "start": 573.52, "end": 578.32, + "text": " vector search. You''re saying, Hey, given this corpus of vectors is this + vector that fly that''s", "tokens": [51272, 8062, 3164, 13, 509, 434, 1566, 11, + 1911, 11, 2212, 341, 1181, 31624, 295, 18875, 307, 341, 8062, 300, 3603, 300, 311, + 51512], "temperature": 0.0, "avg_logprob": -0.15344659062742277, "compression_ratio": + 1.745583038869258, "no_speech_prob": 0.003532773582264781}, {"id": 99, "seek": 55536, + "start": 578.32, "end": 584.88, "text": " coming in match any use. That was the + basic core of the system. 
But once you have this like these", "tokens": [51512, + 1348, 294, 2995, 604, 764, 13, 663, 390, 264, 3875, 4965, 295, 264, 1185, 13, 583, + 1564, 291, 362, 341, 411, 613, 51840], "temperature": 0.0, "avg_logprob": -0.15344659062742277, + "compression_ratio": 1.745583038869258, "no_speech_prob": 0.003532773582264781}, + {"id": 100, "seek": 58488, "start": 584.88, "end": 588.72, "text": " vectors, you + can start to do other abuse things. So for example, you can start clustering vectors.", + "tokens": [50364, 18875, 11, 291, 393, 722, 281, 360, 661, 9852, 721, 13, 407, 337, + 1365, 11, 291, 393, 722, 596, 48673, 18875, 13, 50556], "temperature": 0.0, "avg_logprob": + -0.1964050654707284, "compression_ratio": 1.835820895522388, "no_speech_prob": 0.00021399892284534872}, + {"id": 101, "seek": 58488, "start": 588.72, "end": 593.52, "text": " You can build + vector clusters. And that way you can find like neighborhoods of images, like similar", + "tokens": [50556, 509, 393, 1322, 8062, 23313, 13, 400, 300, 636, 291, 393, 915, + 411, 20052, 295, 5267, 11, 411, 2531, 50796], "temperature": 0.0, "avg_logprob": + -0.1964050654707284, "compression_ratio": 1.835820895522388, "no_speech_prob": 0.00021399892284534872}, + {"id": 102, "seek": 58488, "start": 593.52, "end": 601.76, "text": " images. Now + here similar is here similar means something quite different. Because these were + not", "tokens": [50796, 5267, 13, 823, 510, 2531, 307, 510, 2531, 1355, 746, 1596, + 819, 13, 1436, 613, 645, 406, 51208], "temperature": 0.0, "avg_logprob": -0.1964050654707284, + "compression_ratio": 1.835820895522388, "no_speech_prob": 0.00021399892284534872}, + {"id": 103, "seek": 58488, "start": 601.76, "end": 606.48, "text": " like semantic + similarity. 
So this is not like what you would get from an embedding today from + like", "tokens": [51208, 411, 47982, 32194, 13, 407, 341, 307, 406, 411, 437, 291, + 576, 483, 490, 364, 12240, 3584, 965, 490, 411, 51444], "temperature": 0.0, "avg_logprob": + -0.1964050654707284, "compression_ratio": 1.835820895522388, "no_speech_prob": 0.00021399892284534872}, + {"id": 104, "seek": 58488, "start": 606.48, "end": 612.64, "text": " say, you know, + any of the modern. Yeah, I heard clip. Yeah, whatever. Yeah. These were these were", + "tokens": [51444, 584, 11, 291, 458, 11, 604, 295, 264, 4363, 13, 865, 11, 286, + 2198, 7353, 13, 865, 11, 2035, 13, 865, 13, 1981, 645, 613, 645, 51752], "temperature": + 0.0, "avg_logprob": -0.1964050654707284, "compression_ratio": 1.835820895522388, + "no_speech_prob": 0.00021399892284534872}, {"id": 105, "seek": 61264, "start": 612.64, + "end": 619.84, "text": " these were much more like text textual. I mean, texture. + People familiar with like image processing", "tokens": [50364, 613, 645, 709, 544, + 411, 2487, 2487, 901, 13, 286, 914, 11, 8091, 13, 3432, 4963, 365, 411, 3256, 9007, + 50724], "temperature": 0.0, "avg_logprob": -0.1727774852030986, "compression_ratio": + 1.7158273381294964, "no_speech_prob": 0.0003227063571102917}, {"id": 106, "seek": + 61264, "start": 619.84, "end": 625.76, "text": " techniques. This is these are vectors + based on things like local pixel gradients or wavelet", "tokens": [50724, 7512, + 13, 639, 307, 613, 366, 18875, 2361, 322, 721, 411, 2654, 19261, 2771, 2448, 420, + 22144, 302, 51020], "temperature": 0.0, "avg_logprob": -0.1727774852030986, "compression_ratio": + 1.7158273381294964, "no_speech_prob": 0.0003227063571102917}, {"id": 107, "seek": + 61264, "start": 625.76, "end": 630.96, "text": " transforms things like that. 
So + when we say images were similar, we mean to things like, you know,", "tokens": [51020, + 35592, 721, 411, 300, 13, 407, 562, 321, 584, 5267, 645, 2531, 11, 321, 914, 281, + 721, 411, 11, 291, 458, 11, 51280], "temperature": 0.0, "avg_logprob": -0.1727774852030986, + "compression_ratio": 1.7158273381294964, "no_speech_prob": 0.0003227063571102917}, + {"id": 108, "seek": 61264, "start": 630.96, "end": 637.84, "text": " like rebalancing + the white scale or or changing the hue and saturation like like those kinds of", + "tokens": [51280, 411, 319, 2645, 8779, 264, 2418, 4373, 420, 420, 4473, 264, 24967, + 293, 27090, 411, 411, 729, 3685, 295, 51624], "temperature": 0.0, "avg_logprob": + -0.1727774852030986, "compression_ratio": 1.7158273381294964, "no_speech_prob": + 0.0003227063571102917}, {"id": 109, "seek": 61264, "start": 637.84, "end": 641.92, + "text": " image manipulation or re encoding it right from a different JPEG, different + JPEG encoding.", "tokens": [51624, 3256, 26475, 420, 319, 43430, 309, 558, 490, + 257, 819, 508, 5208, 38, 11, 819, 508, 5208, 38, 43430, 13, 51828], "temperature": + 0.0, "avg_logprob": -0.1727774852030986, "compression_ratio": 1.7158273381294964, + "no_speech_prob": 0.0003227063571102917}, {"id": 110, "seek": 64192, "start": 641.92, + "end": 647.36, "text": " Like it was tolerant to that kind of manipulation, not + like it wasn''t like finding images of elephants,", "tokens": [50364, 1743, 309, + 390, 45525, 281, 300, 733, 295, 26475, 11, 406, 411, 309, 2067, 380, 411, 5006, + 5267, 295, 33015, 11, 50636], "temperature": 0.0, "avg_logprob": -0.21305700302124023, + "compression_ratio": 1.640495867768595, "no_speech_prob": 0.001825169543735683}, + {"id": 111, "seek": 64192, "start": 647.36, "end": 652.4799999999999, "text": " + like that''s that''s not what it was doing. Yeah, yeah, I remember I took a course. 
+ Actually, I studied", "tokens": [50636, 411, 300, 311, 300, 311, 406, 437, 309, + 390, 884, 13, 865, 11, 1338, 11, 286, 1604, 286, 1890, 257, 1164, 13, 5135, 11, + 286, 9454, 50892], "temperature": 0.0, "avg_logprob": -0.21305700302124023, "compression_ratio": + 1.640495867768595, "no_speech_prob": 0.001825169543735683}, {"id": 112, "seek": + 64192, "start": 652.4799999999999, "end": 659.28, "text": " master degree in Finland + here dedicated to data security. And one of the courses was about,", "tokens": [50892, + 4505, 4314, 294, 24869, 510, 8374, 281, 1412, 3825, 13, 400, 472, 295, 264, 7712, + 390, 466, 11, 51232], "temperature": 0.0, "avg_logprob": -0.21305700302124023, "compression_ratio": + 1.640495867768595, "no_speech_prob": 0.001825169543735683}, {"id": 113, "seek": + 64192, "start": 659.28, "end": 667.4399999999999, "text": " you know, how you can + temper with images that had watermarks, right? So like, yeah, and then how do", + "tokens": [51232, 291, 458, 11, 577, 291, 393, 3393, 365, 5267, 300, 632, 1281, + 37307, 11, 558, 30, 407, 411, 11, 1338, 11, 293, 550, 577, 360, 51640], "temperature": + 0.0, "avg_logprob": -0.21305700302124023, "compression_ratio": 1.640495867768595, + "no_speech_prob": 0.001825169543735683}, {"id": 114, "seek": 66744, "start": 667.44, + "end": 673.2, "text": " you make that watermark resilient to any tempering that + might happen on the image level, right?", "tokens": [50364, 291, 652, 300, 1281, + 5638, 23699, 281, 604, 3393, 278, 300, 1062, 1051, 322, 264, 3256, 1496, 11, 558, + 30, 50652], "temperature": 0.0, "avg_logprob": -0.15820659838224713, "compression_ratio": + 1.6192468619246863, "no_speech_prob": 0.0037283434066921473}, {"id": 115, "seek": + 66744, "start": 673.2, "end": 679.6800000000001, "text": " On any of the bands and + stuff. And as you explained, he was in stuff. 
So that''s basically they digital", + "tokens": [50652, 1282, 604, 295, 264, 13543, 293, 1507, 13, 400, 382, 291, 8825, + 11, 415, 390, 294, 1507, 13, 407, 300, 311, 1936, 436, 4562, 50976], "temperature": + 0.0, "avg_logprob": -0.15820659838224713, "compression_ratio": 1.6192468619246863, + "no_speech_prob": 0.0037283434066921473}, {"id": 116, "seek": 66744, "start": 679.6800000000001, + "end": 684.8000000000001, "text": " image processing is the word to Google if someone + wants to. And then it''s like a big, big topic.", "tokens": [50976, 3256, 9007, + 307, 264, 1349, 281, 3329, 498, 1580, 2738, 281, 13, 400, 550, 309, 311, 411, 257, + 955, 11, 955, 4829, 13, 51232], "temperature": 0.0, "avg_logprob": -0.15820659838224713, + "compression_ratio": 1.6192468619246863, "no_speech_prob": 0.0037283434066921473}, + {"id": 117, "seek": 66744, "start": 686.24, "end": 691.36, "text": " But what struck + me and what you explained is that every image upload had to go through that", "tokens": + [51304, 583, 437, 13159, 385, 293, 437, 291, 8825, 307, 300, 633, 3256, 6580, 632, + 281, 352, 807, 300, 51560], "temperature": 0.0, "avg_logprob": -0.15820659838224713, + "compression_ratio": 1.6192468619246863, "no_speech_prob": 0.0037283434066921473}, + {"id": 118, "seek": 69136, "start": 692.08, "end": 699.2, "text": " process, which + means it had to be super scalable. 
And also your database of vectors would", "tokens": + [50400, 1399, 11, 597, 1355, 309, 632, 281, 312, 1687, 38481, 13, 400, 611, 428, + 8149, 295, 18875, 576, 50756], "temperature": 0.0, "avg_logprob": -0.14114008144456514, + "compression_ratio": 1.602510460251046, "no_speech_prob": 0.012525550089776516}, + {"id": 119, "seek": 69136, "start": 699.2, "end": 704.32, "text": " be ever growing + all the time as the image passes through or doesn''t, you would need to add it", + "tokens": [50756, 312, 1562, 4194, 439, 264, 565, 382, 264, 3256, 11335, 807, 420, + 1177, 380, 11, 291, 576, 643, 281, 909, 309, 51012], "temperature": 0.0, "avg_logprob": + -0.14114008144456514, "compression_ratio": 1.602510460251046, "no_speech_prob": + 0.012525550089776516}, {"id": 120, "seek": 69136, "start": 704.32, "end": 709.84, + "text": " somewhere to your vector space. So in this case, no, this is the one advantage + we had, because we", "tokens": [51012, 4079, 281, 428, 8062, 1901, 13, 407, 294, + 341, 1389, 11, 572, 11, 341, 307, 264, 472, 5002, 321, 632, 11, 570, 321, 51288], + "temperature": 0.0, "avg_logprob": -0.14114008144456514, "compression_ratio": 1.602510460251046, + "no_speech_prob": 0.012525550089776516}, {"id": 121, "seek": 69136, "start": 709.84, + "end": 717.6, "text": " only care about matches to a specific relatively small set. + Oh, I see, I see. So it''s like a set that", "tokens": [51288, 787, 1127, 466, 10676, + 281, 257, 2685, 7226, 1359, 992, 13, 876, 11, 286, 536, 11, 286, 536, 13, 407, 309, + 311, 411, 257, 992, 300, 51676], "temperature": 0.0, "avg_logprob": -0.14114008144456514, + "compression_ratio": 1.602510460251046, "no_speech_prob": 0.012525550089776516}, + {"id": 122, "seek": 71760, "start": 717.6800000000001, "end": 724.24, "text": " + shouldn''t grow ideally, right? Yeah, or very, very nominal. I see. Yeah. 
And so + this is, so it''s", "tokens": [50368, 4659, 380, 1852, 22915, 11, 558, 30, 865, + 11, 420, 588, 11, 588, 41641, 13, 286, 536, 13, 865, 13, 400, 370, 341, 307, 11, + 370, 309, 311, 50696], "temperature": 0.0, "avg_logprob": -0.2112926079974911, "compression_ratio": + 1.6563573883161513, "no_speech_prob": 0.003292010398581624}, {"id": 123, "seek": + 71760, "start": 724.24, "end": 730.8000000000001, "text": " funny because that that''s + a big difference that that makes it easy in that era. Today, you''d have", "tokens": + [50696, 4074, 570, 300, 300, 311, 257, 955, 2649, 300, 300, 1669, 309, 1858, 294, + 300, 4249, 13, 2692, 11, 291, 1116, 362, 51024], "temperature": 0.0, "avg_logprob": + -0.2112926079974911, "compression_ratio": 1.6563573883161513, "no_speech_prob": + 0.003292010398581624}, {"id": 124, "seek": 71760, "start": 730.8000000000001, "end": + 734.88, "text": " to bust out all the A and N stuff and maybe stuff we''ll get into + to really be able to do a really", "tokens": [51024, 281, 19432, 484, 439, 264, + 316, 293, 426, 1507, 293, 1310, 1507, 321, 603, 483, 666, 281, 534, 312, 1075, 281, + 360, 257, 534, 51228], "temperature": 0.0, "avg_logprob": -0.2112926079974911, "compression_ratio": + 1.6563573883161513, "no_speech_prob": 0.003292010398581624}, {"id": 125, "seek": + 71760, "start": 734.88, "end": 740.08, "text": " much more scalable vector search. + So this was really more about evaluating a relatively fixed", "tokens": [51228, + 709, 544, 38481, 8062, 3164, 13, 407, 341, 390, 534, 544, 466, 27479, 257, 7226, + 6806, 51488], "temperature": 0.0, "avg_logprob": -0.2112926079974911, "compression_ratio": + 1.6563573883161513, "no_speech_prob": 0.003292010398581624}, {"id": 126, "seek": + 71760, "start": 740.88, "end": 747.28, "text": " set of vectors. 
So you can hyper + optimize how that was organized in like a, and, but evaluating", "tokens": [51528, + 992, 295, 18875, 13, 407, 291, 393, 9848, 19719, 577, 300, 390, 9983, 294, 411, + 257, 11, 293, 11, 457, 27479, 51848], "temperature": 0.0, "avg_logprob": -0.2112926079974911, + "compression_ratio": 1.6563573883161513, "no_speech_prob": 0.003292010398581624}, + {"id": 127, "seek": 74728, "start": 747.28, "end": 752.0, "text": " it at an insane + scale. So the update problem wasn''t very hard, but the evaluation problem was", + "tokens": [50364, 309, 412, 364, 10838, 4373, 13, 407, 264, 5623, 1154, 2067, 380, + 588, 1152, 11, 457, 264, 13344, 1154, 390, 50600], "temperature": 0.0, "avg_logprob": + -0.19094404923288447, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.001189122791402042}, {"id": 128, "seek": 74728, "start": 752.0, "end": 758.0799999999999, + "text": " like it needed to be extremely high scale. Yeah, a bunch of questions + in my mind, but let''s move", "tokens": [50600, 411, 309, 2978, 281, 312, 4664, + 1090, 4373, 13, 865, 11, 257, 3840, 295, 1651, 294, 452, 1575, 11, 457, 718, 311, + 1286, 50904], "temperature": 0.0, "avg_logprob": -0.19094404923288447, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 0.001189122791402042}, {"id": 129, "seek": + 74728, "start": 758.0799999999999, "end": 764.16, "text": " move on to Roxette. + Tell me more about the what part, you know, what it is as the product.", "tokens": + [50904, 1286, 322, 281, 44427, 3007, 13, 5115, 385, 544, 466, 264, 437, 644, 11, + 291, 458, 11, 437, 309, 307, 382, 264, 1674, 13, 51208], "temperature": 0.0, "avg_logprob": + -0.19094404923288447, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.001189122791402042}, {"id": 130, "seek": 74728, "start": 764.48, "end": 770.72, + "text": " And then slowly, let''s go deeper into the technology side. Yeah. 
So my + standard", "tokens": [51224, 400, 550, 5692, 11, 718, 311, 352, 7731, 666, 264, + 2899, 1252, 13, 865, 13, 407, 452, 3832, 51536], "temperature": 0.0, "avg_logprob": + -0.19094404923288447, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.001189122791402042}, {"id": 131, "seek": 77072, "start": 771.6800000000001, "end": + 777.6, "text": " statement of what Roxette is is Roxette is a search and analytics + database built for the cloud.", "tokens": [50412, 5629, 295, 437, 44427, 3007, 307, + 307, 44427, 3007, 307, 257, 3164, 293, 15370, 8149, 3094, 337, 264, 4588, 13, 50708], + "temperature": 0.0, "avg_logprob": -0.1229516967894539, "compression_ratio": 1.8850574712643677, + "no_speech_prob": 0.0064694019965827465}, {"id": 132, "seek": 77072, "start": 779.6, + "end": 785.52, "text": " And that''s a bunch of, I forgot one, it''s a real time + search and analytics database for the cloud.", "tokens": [50808, 400, 300, 311, + 257, 3840, 295, 11, 286, 5298, 472, 11, 309, 311, 257, 957, 565, 3164, 293, 15370, + 8149, 337, 264, 4588, 13, 51104], "temperature": 0.0, "avg_logprob": -0.1229516967894539, + "compression_ratio": 1.8850574712643677, "no_speech_prob": 0.0064694019965827465}, + {"id": 133, "seek": 77072, "start": 785.52, "end": 788.88, "text": " Now that''s + a bunch of little buzz worries that, you know, it''s very easy to get lost in the + kind", "tokens": [51104, 823, 300, 311, 257, 3840, 295, 707, 13036, 16340, 300, + 11, 291, 458, 11, 309, 311, 588, 1858, 281, 483, 2731, 294, 264, 733, 51272], "temperature": + 0.0, "avg_logprob": -0.1229516967894539, "compression_ratio": 1.8850574712643677, + "no_speech_prob": 0.0064694019965827465}, {"id": 134, "seek": 77072, "start": 788.88, + "end": 794.64, "text": " of marketing feel of that, but each of those words does + like non trivial amounts of work and what", "tokens": [51272, 295, 6370, 841, 295, + 300, 11, 457, 1184, 295, 729, 2283, 775, 411, 2107, 26703, 11663, 295, 589, 293, + 437, 51560], 
"temperature": 0.0, "avg_logprob": -0.1229516967894539, "compression_ratio": + 1.8850574712643677, "no_speech_prob": 0.0064694019965827465}, {"id": 135, "seek": + 77072, "start": 794.64, "end": 798.64, "text": " it is I''m really trying to build + here. So first of all, it''s a search and analytics database. So here,", "tokens": + [51560, 309, 307, 286, 478, 534, 1382, 281, 1322, 510, 13, 407, 700, 295, 439, 11, + 309, 311, 257, 3164, 293, 15370, 8149, 13, 407, 510, 11, 51760], "temperature": + 0.0, "avg_logprob": -0.1229516967894539, "compression_ratio": 1.8850574712643677, + "no_speech_prob": 0.0064694019965827465}, {"id": 136, "seek": 79864, "start": 798.64, + "end": 803.68, "text": " what we mean is like a like an OLAP style analytics database + is like it''s it''s like that''s where", "tokens": [50364, 437, 321, 914, 307, 411, + 257, 411, 364, 39191, 4715, 3758, 15370, 8149, 307, 411, 309, 311, 309, 311, 411, + 300, 311, 689, 50616], "temperature": 0.0, "avg_logprob": -0.13680712930087385, + "compression_ratio": 1.8136882129277567, "no_speech_prob": 0.0001771239476511255}, + {"id": 137, "seek": 79864, "start": 803.68, "end": 808.08, "text": " we''re starting. + We want to run an analytics type queries and this I won''t get into all of this,", + "tokens": [50616, 321, 434, 2891, 13, 492, 528, 281, 1190, 364, 15370, 2010, 24109, + 293, 341, 286, 1582, 380, 483, 666, 439, 295, 341, 11, 50836], "temperature": 0.0, + "avg_logprob": -0.13680712930087385, "compression_ratio": 1.8136882129277567, "no_speech_prob": + 0.0001771239476511255}, {"id": 138, "seek": 79864, "start": 808.08, "end": 813.4399999999999, + "text": " but this is like separate from your OLTP style databases. 
So this is not + my sequel, not a large", "tokens": [50836, 457, 341, 307, 411, 4994, 490, 428, 39191, + 16804, 3758, 22380, 13, 407, 341, 307, 406, 452, 20622, 11, 406, 257, 2416, 51104], + "temperature": 0.0, "avg_logprob": -0.13680712930087385, "compression_ratio": 1.8136882129277567, + "no_speech_prob": 0.0001771239476511255}, {"id": 139, "seek": 79864, "start": 813.4399999999999, + "end": 819.84, "text": " transactional thing. It is a but is it OLAP style database. + And search and analytics is a very", "tokens": [51104, 46688, 1966, 551, 13, 467, + 307, 257, 457, 307, 309, 39191, 4715, 3758, 8149, 13, 400, 3164, 293, 15370, 307, + 257, 588, 51424], "temperature": 0.0, "avg_logprob": -0.13680712930087385, "compression_ratio": + 1.8136882129277567, "no_speech_prob": 0.0001771239476511255}, {"id": 140, "seek": + 79864, "start": 819.84, "end": 825.04, "text": " interesting pairing in this world + because systems like elastic search or very search oriented", "tokens": [51424, + 1880, 32735, 294, 341, 1002, 570, 3652, 411, 17115, 3164, 420, 588, 3164, 21841, + 51684], "temperature": 0.0, "avg_logprob": -0.13680712930087385, "compression_ratio": + 1.8136882129277567, "no_speech_prob": 0.0001771239476511255}, {"id": 141, "seek": + 82504, "start": 825.04, "end": 829.12, "text": " system systems like rocks that + have analytics styles, but these are actually not that different", "tokens": [50364, + 1185, 3652, 411, 10989, 300, 362, 15370, 13273, 11, 457, 613, 366, 767, 406, 300, + 819, 50568], "temperature": 0.0, "avg_logprob": -0.14332838276870377, "compression_ratio": + 1.8580645161290323, "no_speech_prob": 0.0005571605870500207}, {"id": 142, "seek": + 82504, "start": 829.12, "end": 833.76, "text": " architecturally. They''re very + the way you use them may feel different. 
The primitives you''re", "tokens": [50568, + 6331, 6512, 13, 814, 434, 588, 264, 636, 291, 764, 552, 815, 841, 819, 13, 440, + 2886, 38970, 291, 434, 50800], "temperature": 0.0, "avg_logprob": -0.14332838276870377, + "compression_ratio": 1.8580645161290323, "no_speech_prob": 0.0005571605870500207}, + {"id": 143, "seek": 82504, "start": 833.76, "end": 838.7199999999999, "text": " + using feel different, but all that sits fairly shallowly in the technology. The + underlying", "tokens": [50800, 1228, 841, 819, 11, 457, 439, 300, 12696, 6457, 20488, + 356, 294, 264, 2899, 13, 440, 14217, 51048], "temperature": 0.0, "avg_logprob": + -0.14332838276870377, "compression_ratio": 1.8580645161290323, "no_speech_prob": + 0.0005571605870500207}, {"id": 144, "seek": 82504, "start": 838.7199999999999, "end": + 842.0799999999999, "text": " architecture of these systems ends up looking quite + similar. So search and analytics actually go", "tokens": [51048, 9482, 295, 613, + 3652, 5314, 493, 1237, 1596, 2531, 13, 407, 3164, 293, 15370, 767, 352, 51216], + "temperature": 0.0, "avg_logprob": -0.14332838276870377, "compression_ratio": 1.8580645161290323, + "no_speech_prob": 0.0005571605870500207}, {"id": 145, "seek": 82504, "start": 842.0799999999999, + "end": 847.04, "text": " together quite nicely from like a I can do both. Maybe + I don''t do both well, but that will mostly", "tokens": [51216, 1214, 1596, 9594, + 490, 411, 257, 286, 393, 360, 1293, 13, 2704, 286, 500, 380, 360, 1293, 731, 11, + 457, 300, 486, 5240, 51464], "temperature": 0.0, "avg_logprob": -0.14332838276870377, + "compression_ratio": 1.8580645161290323, "no_speech_prob": 0.0005571605870500207}, + {"id": 146, "seek": 82504, "start": 847.04, "end": 852.9599999999999, "text": " + exist at the top, not not in the not in the infrastructure. It''s in the cloud. 
+ So the whole system is", "tokens": [51464, 2514, 412, 264, 1192, 11, 406, 406, 294, + 264, 406, 294, 264, 6896, 13, 467, 311, 294, 264, 4588, 13, 407, 264, 1379, 1185, + 307, 51760], "temperature": 0.0, "avg_logprob": -0.14332838276870377, "compression_ratio": + 1.8580645161290323, "no_speech_prob": 0.0005571605870500207}, {"id": 147, "seek": + 85296, "start": 852.96, "end": 857.76, "text": " built to be elastic from the beginning. + So if you send me twice as much data, I can scale you out", "tokens": [50364, 3094, + 281, 312, 17115, 490, 264, 2863, 13, 407, 498, 291, 2845, 385, 6091, 382, 709, 1412, + 11, 286, 393, 4373, 291, 484, 50604], "temperature": 0.0, "avg_logprob": -0.1125994548201561, + "compression_ratio": 1.7310344827586206, "no_speech_prob": 0.0008639035513624549}, + {"id": 148, "seek": 85296, "start": 857.76, "end": 862.32, "text": " in a way that + you know, like it just works. You don''t have to worry. You''re not reprovisioning + more", "tokens": [50604, 294, 257, 636, 300, 291, 458, 11, 411, 309, 445, 1985, + 13, 509, 500, 380, 362, 281, 3292, 13, 509, 434, 406, 1085, 340, 6763, 278, 544, + 50832], "temperature": 0.0, "avg_logprob": -0.1125994548201561, "compression_ratio": + 1.7310344827586206, "no_speech_prob": 0.0008639035513624549}, {"id": 149, "seek": + 85296, "start": 862.32, "end": 867.84, "text": " machines to double your cluster + size or anything like that. And then real time. 
So our focus has", "tokens": [50832, + 8379, 281, 3834, 428, 13630, 2744, 420, 1340, 411, 300, 13, 400, 550, 957, 565, + 13, 407, 527, 1879, 575, 51108], "temperature": 0.0, "avg_logprob": -0.1125994548201561, + "compression_ratio": 1.7310344827586206, "no_speech_prob": 0.0008639035513624549}, + {"id": 150, "seek": 85296, "start": 867.84, "end": 872.96, "text": " always been + real time, which is to say specifically most people when they think of real time + they want", "tokens": [51108, 1009, 668, 957, 565, 11, 597, 307, 281, 584, 4682, + 881, 561, 562, 436, 519, 295, 957, 565, 436, 528, 51364], "temperature": 0.0, "avg_logprob": + -0.1125994548201561, "compression_ratio": 1.7310344827586206, "no_speech_prob": + 0.0008639035513624549}, {"id": 151, "seek": 85296, "start": 872.96, "end": 878.48, + "text": " their queries to be fast, but the real heart of real time is ingest latency. + So if you send me new data,", "tokens": [51364, 641, 24109, 281, 312, 2370, 11, + 457, 264, 957, 1917, 295, 957, 565, 307, 3957, 377, 27043, 13, 407, 498, 291, 2845, + 385, 777, 1412, 11, 51640], "temperature": 0.0, "avg_logprob": -0.1125994548201561, + "compression_ratio": 1.7310344827586206, "no_speech_prob": 0.0008639035513624549}, + {"id": 152, "seek": 87848, "start": 878.48, "end": 883.2, "text": " how quickly + does that data get manifested in the queries? If it''s tomorrow, if it shows up + in", "tokens": [50364, 577, 2661, 775, 300, 1412, 483, 42775, 294, 264, 24109, 30, + 759, 309, 311, 4153, 11, 498, 309, 3110, 493, 294, 50600], "temperature": 0.0, "avg_logprob": + -0.16038077218191965, "compression_ratio": 1.7314487632508835, "no_speech_prob": + 0.001409567310474813}, {"id": 153, "seek": 87848, "start": 883.2, "end": 887.52, + "text": " tomorrow''s queries, you''re not that''s not a real time system. 
And there''s + a lot of systems like this,", "tokens": [50600, 4153, 311, 24109, 11, 291, 434, + 406, 300, 311, 406, 257, 957, 565, 1185, 13, 400, 456, 311, 257, 688, 295, 3652, + 411, 341, 11, 50816], "temperature": 0.0, "avg_logprob": -0.16038077218191965, "compression_ratio": + 1.7314487632508835, "no_speech_prob": 0.001409567310474813}, {"id": 154, "seek": + 87848, "start": 887.52, "end": 894.88, "text": " these very big batch style, like + mega exabyte type of like doob clusters that you you can query", "tokens": [50816, + 613, 588, 955, 15245, 3758, 11, 411, 17986, 454, 34529, 2010, 295, 411, 360, 996, + 23313, 300, 291, 291, 393, 14581, 51184], "temperature": 0.0, "avg_logprob": -0.16038077218191965, + "compression_ratio": 1.7314487632508835, "no_speech_prob": 0.001409567310474813}, + {"id": 155, "seek": 87848, "start": 894.88, "end": 900.24, "text": " yesterday''s + data, right? And get and get like genuinely enormous amounts of data. That is not", + "tokens": [51184, 5186, 311, 1412, 11, 558, 30, 400, 483, 293, 483, 411, 17839, + 11322, 11663, 295, 1412, 13, 663, 307, 406, 51452], "temperature": 0.0, "avg_logprob": + -0.16038077218191965, "compression_ratio": 1.7314487632508835, "no_speech_prob": + 0.001409567310474813}, {"id": 156, "seek": 87848, "start": 900.24, "end": 905.6, + "text": " rock set like as that''s not rock sets problem. 
But for us, it''s like, + hey, if you want like last minutes", "tokens": [51452, 3727, 992, 411, 382, 300, + 311, 406, 3727, 6352, 1154, 13, 583, 337, 505, 11, 309, 311, 411, 11, 4177, 11, + 498, 291, 528, 411, 1036, 2077, 51720], "temperature": 0.0, "avg_logprob": -0.16038077218191965, + "compression_ratio": 1.7314487632508835, "no_speech_prob": 0.001409567310474813}, + {"id": 157, "seek": 90560, "start": 905.6, "end": 910.88, "text": " data and it''s + ideally several zero smaller of a working set, then that''s where rock set is meant", + "tokens": [50364, 1412, 293, 309, 311, 22915, 2940, 4018, 4356, 295, 257, 1364, + 992, 11, 550, 300, 311, 689, 3727, 992, 307, 4140, 50628], "temperature": 0.0, "avg_logprob": + -0.14660627824546646, "compression_ratio": 1.7689530685920578, "no_speech_prob": + 0.010947044007480145}, {"id": 158, "seek": 90560, "start": 910.88, "end": 915.6800000000001, + "text": " is meant to work really well. And so this is like the heart of this is + what we''ve set out to build", "tokens": [50628, 307, 4140, 281, 589, 534, 731, + 13, 400, 370, 341, 307, 411, 264, 1917, 295, 341, 307, 437, 321, 600, 992, 484, + 281, 1322, 50868], "temperature": 0.0, "avg_logprob": -0.14660627824546646, "compression_ratio": + 1.7689530685920578, "no_speech_prob": 0.010947044007480145}, {"id": 159, "seek": + 90560, "start": 916.48, "end": 923.76, "text": " at a high level. And I don''t know + if you want to do want me to keep going. I don''t know. I feel like", "tokens": + [50908, 412, 257, 1090, 1496, 13, 400, 286, 500, 380, 458, 498, 291, 528, 281, 360, + 528, 385, 281, 1066, 516, 13, 286, 500, 380, 458, 13, 286, 841, 411, 51272], "temperature": + 0.0, "avg_logprob": -0.14660627824546646, "compression_ratio": 1.7689530685920578, + "no_speech_prob": 0.010947044007480145}, {"id": 160, "seek": 90560, "start": 923.76, + "end": 929.52, "text": " I''ve already said too much. I want I want no, it''s amazing. + It''s a good start. 
I wanted to stay", "tokens": [51272, 286, 600, 1217, 848, 886, + 709, 13, 286, 528, 286, 528, 572, 11, 309, 311, 2243, 13, 467, 311, 257, 665, 722, + 13, 286, 1415, 281, 1754, 51560], "temperature": 0.0, "avg_logprob": -0.14660627824546646, + "compression_ratio": 1.7689530685920578, "no_speech_prob": 0.010947044007480145}, + {"id": 161, "seek": 90560, "start": 929.52, "end": 935.12, "text": " a little bit + on the product side. If you go now and flip over to the use cases for the moment. + So", "tokens": [51560, 257, 707, 857, 322, 264, 1674, 1252, 13, 759, 291, 352, 586, + 293, 7929, 670, 281, 264, 764, 3331, 337, 264, 1623, 13, 407, 51840], "temperature": + 0.0, "avg_logprob": -0.14660627824546646, "compression_ratio": 1.7689530685920578, + "no_speech_prob": 0.010947044007480145}, {"id": 162, "seek": 93512, "start": 935.44, + "end": 941.6, "text": " what are the typical use cases and sort of can you zoom + out as much as possible, maybe even giving,", "tokens": [50380, 437, 366, 264, 7476, + 764, 3331, 293, 1333, 295, 393, 291, 8863, 484, 382, 709, 382, 1944, 11, 1310, 754, + 2902, 11, 50688], "temperature": 0.0, "avg_logprob": -0.1525605773925781, "compression_ratio": + 1.7859778597785978, "no_speech_prob": 0.0006826731842011213}, {"id": 163, "seek": + 93512, "start": 941.6, "end": 946.08, "text": " you know, even if hypothetical, + it''s fine. For example, so products that use your product.", "tokens": [50688, + 291, 458, 11, 754, 498, 33053, 11, 309, 311, 2489, 13, 1171, 1365, 11, 370, 3383, + 300, 764, 428, 1674, 13, 50912], "temperature": 0.0, "avg_logprob": -0.1525605773925781, + "compression_ratio": 1.7859778597785978, "no_speech_prob": 0.0006826731842011213}, + {"id": 164, "seek": 93512, "start": 946.72, "end": 952.96, "text": " Yeah. 
So we + we have a bunch of customers in a bunch of different domains and it''s we can even + go.", "tokens": [50944, 865, 13, 407, 321, 321, 362, 257, 3840, 295, 4581, 294, + 257, 3840, 295, 819, 25514, 293, 309, 311, 321, 393, 754, 352, 13, 51256], "temperature": + 0.0, "avg_logprob": -0.1525605773925781, "compression_ratio": 1.7859778597785978, + "no_speech_prob": 0.0006826731842011213}, {"id": 165, "seek": 93512, "start": 952.96, + "end": 956.32, "text": " So so one way to think about this is just like who''s using + it and why like what domains are they", "tokens": [51256, 407, 370, 472, 636, 281, + 519, 466, 341, 307, 445, 411, 567, 311, 1228, 309, 293, 983, 411, 437, 25514, 366, + 436, 51424], "temperature": 0.0, "avg_logprob": -0.1525605773925781, "compression_ratio": + 1.7859778597785978, "no_speech_prob": 0.0006826731842011213}, {"id": 166, "seek": + 93512, "start": 956.32, "end": 961.52, "text": " using it in? And so for example, + we have a bunch of gaming customers. So this is like there''s real", "tokens": [51424, + 1228, 309, 294, 30, 400, 370, 337, 1365, 11, 321, 362, 257, 3840, 295, 9703, 4581, + 13, 407, 341, 307, 411, 456, 311, 957, 51684], "temperature": 0.0, "avg_logprob": + -0.1525605773925781, "compression_ratio": 1.7859778597785978, "no_speech_prob": + 0.0006826731842011213}, {"id": 167, "seek": 96152, "start": 961.52, "end": 967.12, + "text": " time events occurring in games. 
Imagine an online an online game of some + sort and", "tokens": [50364, 565, 3931, 18386, 294, 2813, 13, 11739, 364, 2950, + 364, 2950, 1216, 295, 512, 1333, 293, 50644], "temperature": 0.0, "avg_logprob": + -0.15728677037250566, "compression_ratio": 1.6355555555555557, "no_speech_prob": + 0.004070555791258812}, {"id": 168, "seek": 96152, "start": 970.96, "end": 976.3199999999999, + "text": " they''re collating that information constantly and having it be real uptime + say leader boards or", "tokens": [50836, 436, 434, 1263, 990, 300, 1589, 6460, 293, + 1419, 309, 312, 957, 493, 3766, 584, 5263, 13293, 420, 51104], "temperature": 0.0, + "avg_logprob": -0.15728677037250566, "compression_ratio": 1.6355555555555557, "no_speech_prob": + 0.004070555791258812}, {"id": 169, "seek": 96152, "start": 976.3199999999999, "end": + 981.76, "text": " things like that are happening. There''s a lot there''s several + actually like logistics and supply", "tokens": [51104, 721, 411, 300, 366, 2737, + 13, 821, 311, 257, 688, 456, 311, 2940, 767, 411, 27420, 293, 5847, 51376], "temperature": + 0.0, "avg_logprob": -0.15728677037250566, "compression_ratio": 1.6355555555555557, + "no_speech_prob": 0.004070555791258812}, {"id": 170, "seek": 96152, "start": 981.76, + "end": 988.56, "text": " chain type people using it. 
So like where is my package + right now or where is the boat in the", "tokens": [51376, 5021, 2010, 561, 1228, + 309, 13, 407, 411, 689, 307, 452, 7372, 558, 586, 420, 689, 307, 264, 6582, 294, + 264, 51716], "temperature": 0.0, "avg_logprob": -0.15728677037250566, "compression_ratio": + 1.6355555555555557, "no_speech_prob": 0.004070555791258812}, {"id": 171, "seek": + 98856, "start": 988.56, "end": 994.88, "text": " ocean like these kinds of queries + are like very commonly done, you know, like where is the where", "tokens": [50364, + 7810, 411, 613, 3685, 295, 24109, 366, 411, 588, 12719, 1096, 11, 291, 458, 11, + 411, 689, 307, 264, 689, 50680], "temperature": 0.0, "avg_logprob": -0.1868603689628735, + "compression_ratio": 1.7963636363636364, "no_speech_prob": 0.0016653829952701926}, + {"id": 172, "seek": 98856, "start": 994.88, "end": 998.16, "text": " they''re basically + tracking their entire supply chain trying to find shortages and what''s going to", + "tokens": [50680, 436, 434, 1936, 11603, 641, 2302, 5847, 5021, 1382, 281, 915, + 46765, 293, 437, 311, 516, 281, 50844], "temperature": 0.0, "avg_logprob": -0.1868603689628735, + "compression_ratio": 1.7963636363636364, "no_speech_prob": 0.0016653829952701926}, + {"id": 173, "seek": 98856, "start": 998.16, "end": 1003.5999999999999, "text": " + create problems down the line in like a logistic type settings. There''s a lot of + FinTech. There''s", "tokens": [50844, 1884, 2740, 760, 264, 1622, 294, 411, 257, + 3565, 3142, 2010, 6257, 13, 821, 311, 257, 688, 295, 3773, 36050, 13, 821, 311, + 51116], "temperature": 0.0, "avg_logprob": -0.1868603689628735, "compression_ratio": + 1.7963636363636364, "no_speech_prob": 0.0016653829952701926}, {"id": 174, "seek": + 98856, "start": 1003.5999999999999, "end": 1009.04, "text": " a lot of financial + financial firms use it a lot of fraud detection. 
So again fraud and spam these", + "tokens": [51116, 257, 688, 295, 4669, 4669, 18055, 764, 309, 257, 688, 295, 14560, + 17784, 13, 407, 797, 14560, 293, 24028, 613, 51388], "temperature": 0.0, "avg_logprob": + -0.1868603689628735, "compression_ratio": 1.7963636363636364, "no_speech_prob": + 0.0016653829952701926}, {"id": 175, "seek": 98856, "start": 1009.04, "end": 1014.16, + "text": " are very real time problems. You can''t like detect yesterday spam or + fraud. That''s like really harmful.", "tokens": [51388, 366, 588, 957, 565, 2740, + 13, 509, 393, 380, 411, 5531, 5186, 24028, 420, 14560, 13, 663, 311, 411, 534, 19727, + 13, 51644], "temperature": 0.0, "avg_logprob": -0.1868603689628735, "compression_ratio": + 1.7963636363636364, "no_speech_prob": 0.0016653829952701926}, {"id": 176, "seek": + 101416, "start": 1014.16, "end": 1021.8399999999999, "text": " You got to you need + to know now right. And then a lot of like recommendation and like product", "tokens": + [50364, 509, 658, 281, 291, 643, 281, 458, 586, 558, 13, 400, 550, 257, 688, 295, + 411, 11879, 293, 411, 1674, 50748], "temperature": 0.0, "avg_logprob": -0.20070678904905156, + "compression_ratio": 1.7259259259259259, "no_speech_prob": 0.0022025706712156534}, + {"id": 177, "seek": 101416, "start": 1021.8399999999999, "end": 1026.48, "text": + " experience. So anytime that like you want to power a user facing experience, you + almost always", "tokens": [50748, 1752, 13, 407, 13038, 300, 411, 291, 528, 281, + 1347, 257, 4195, 7170, 1752, 11, 291, 1920, 1009, 50980], "temperature": 0.0, "avg_logprob": + -0.20070678904905156, "compression_ratio": 1.7259259259259259, "no_speech_prob": + 0.0022025706712156534}, {"id": 178, "seek": 101416, "start": 1026.48, "end": 1031.2, + "text": " need that to be real time. 
So you know example I like to use is there''s + a there''s a there''s a place", "tokens": [50980, 643, 300, 281, 312, 957, 565, + 13, 407, 291, 458, 1365, 286, 411, 281, 764, 307, 456, 311, 257, 456, 311, 257, + 456, 311, 257, 1081, 51216], "temperature": 0.0, "avg_logprob": -0.20070678904905156, + "compression_ratio": 1.7259259259259259, "no_speech_prob": 0.0022025706712156534}, + {"id": 179, "seek": 101416, "start": 1031.2, "end": 1036.0, "text": " called what + not if you go to what not.com if you''ve never heard of it. What not is basically + a", "tokens": [51216, 1219, 437, 406, 498, 291, 352, 281, 437, 406, 13, 1112, 498, + 291, 600, 1128, 2198, 295, 309, 13, 708, 406, 307, 1936, 257, 51456], "temperature": + 0.0, "avg_logprob": -0.20070678904905156, "compression_ratio": 1.7259259259259259, + "no_speech_prob": 0.0022025706712156534}, {"id": 180, "seek": 101416, "start": 1036.0, + "end": 1041.6, "text": " streaming site for buying and selling. So it''s sort of + eBay meets Twitch kind of a", "tokens": [51456, 11791, 3621, 337, 6382, 293, 6511, + 13, 407, 309, 311, 1333, 295, 33803, 13961, 22222, 733, 295, 257, 51736], "temperature": + 0.0, "avg_logprob": -0.20070678904905156, "compression_ratio": 1.7259259259259259, + "no_speech_prob": 0.0022025706712156534}, {"id": 181, "seek": 104160, "start": 1042.1599999999999, + "end": 1045.4399999999998, "text": " easiest way I could describe it. But what''s + really cool about that is you have a recommendation", "tokens": [50392, 12889, 636, + 286, 727, 6786, 309, 13, 583, 437, 311, 534, 1627, 466, 300, 307, 291, 362, 257, + 11879, 50556], "temperature": 0.0, "avg_logprob": -0.15352546746003712, "compression_ratio": + 1.9233333333333333, "no_speech_prob": 0.020020080730319023}, {"id": 182, "seek": + 104160, "start": 1045.4399999999998, "end": 1049.76, "text": " problem like I want + to buy something or people selling it. 
So I it''s really in the sites.", "tokens": + [50556, 1154, 411, 286, 528, 281, 2256, 746, 420, 561, 6511, 309, 13, 407, 286, + 309, 311, 534, 294, 264, 7533, 13, 50772], "temperature": 0.0, "avg_logprob": -0.15352546746003712, + "compression_ratio": 1.9233333333333333, "no_speech_prob": 0.020020080730319023}, + {"id": 183, "seek": 104160, "start": 1049.76, "end": 1053.9199999999998, "text": + " And my interest were you to show me like you might want to check these things + out. That''s like a", "tokens": [50772, 400, 452, 1179, 645, 291, 281, 855, 385, + 411, 291, 1062, 528, 281, 1520, 613, 721, 484, 13, 663, 311, 411, 257, 50980], "temperature": + 0.0, "avg_logprob": -0.15352546746003712, "compression_ratio": 1.9233333333333333, + "no_speech_prob": 0.020020080730319023}, {"id": 184, "seek": 104160, "start": 1053.9199999999998, + "end": 1059.76, "text": " recommendation problem. But it''s like really real time + right. It has to match me to online sellers", "tokens": [50980, 11879, 1154, 13, + 583, 309, 311, 411, 534, 957, 565, 558, 13, 467, 575, 281, 2995, 385, 281, 2950, + 31276, 51272], "temperature": 0.0, "avg_logprob": -0.15352546746003712, "compression_ratio": + 1.9233333333333333, "no_speech_prob": 0.020020080730319023}, {"id": 185, "seek": + 104160, "start": 1059.76, "end": 1064.48, "text": " at any given moment. And so + it''s a recommendation system that has to get built needs decent amount", "tokens": + [51272, 412, 604, 2212, 1623, 13, 400, 370, 309, 311, 257, 11879, 1185, 300, 575, + 281, 483, 3094, 2203, 8681, 2372, 51508], "temperature": 0.0, "avg_logprob": -0.15352546746003712, + "compression_ratio": 1.9233333333333333, "no_speech_prob": 0.020020080730319023}, + {"id": 186, "seek": 104160, "start": 1064.48, "end": 1069.52, "text": " of needs + high scale. And it also needs to be real time. 
It needs to use a lot of real time + data.", "tokens": [51508, 295, 2203, 1090, 4373, 13, 400, 309, 611, 2203, 281, 312, + 957, 565, 13, 467, 2203, 281, 764, 257, 688, 295, 957, 565, 1412, 13, 51760], "temperature": + 0.0, "avg_logprob": -0.15352546746003712, "compression_ratio": 1.9233333333333333, + "no_speech_prob": 0.020020080730319023}, {"id": 187, "seek": 106952, "start": 1069.52, + "end": 1074.72, "text": " So these are all use cases for Rockset. These are every + one of these is real customers", "tokens": [50364, 407, 613, 366, 439, 764, 3331, + 337, 6922, 3854, 13, 1981, 366, 633, 472, 295, 613, 307, 957, 4581, 50624], "temperature": + 0.0, "avg_logprob": -0.23288567860921225, "compression_ratio": 1.6852791878172588, + "no_speech_prob": 0.0013946861727163196}, {"id": 188, "seek": 106952, "start": 1075.44, + "end": 1082.16, "text": " using Rockset to do something. Yeah, for sure. Now I want + to go back to jump back to tech side.", "tokens": [50660, 1228, 6922, 3854, 281, + 360, 746, 13, 865, 11, 337, 988, 13, 823, 286, 528, 281, 352, 646, 281, 3012, 646, + 281, 7553, 1252, 13, 50996], "temperature": 0.0, "avg_logprob": -0.23288567860921225, + "compression_ratio": 1.6852791878172588, "no_speech_prob": 0.0013946861727163196}, + {"id": 189, "seek": 106952, "start": 1082.72, "end": 1087.76, "text": " So Rockset + and inside it are you using RocksDB or something else?", "tokens": [51024, 407, + 6922, 3854, 293, 1854, 309, 366, 291, 1228, 6922, 82, 27735, 420, 746, 1646, 30, + 51276], "temperature": 0.0, "avg_logprob": -0.23288567860921225, "compression_ratio": + 1.6852791878172588, "no_speech_prob": 0.0013946861727163196}, {"id": 190, "seek": + 106952, "start": 1087.76, "end": 1095.12, "text": " So okay. So okay. Are we using + RocksDB? 
So first of all do we know what RocksDB is?", "tokens": [51276, 407, 1392, + 13, 407, 1392, 13, 2014, 321, 1228, 6922, 82, 27735, 30, 407, 700, 295, 439, 360, + 321, 458, 437, 6922, 82, 27735, 307, 30, 51644], "temperature": 0.0, "avg_logprob": + -0.23288567860921225, "compression_ratio": 1.6852791878172588, "no_speech_prob": + 0.0013946861727163196}, {"id": 191, "seek": 109512, "start": 1095.12, "end": 1100.2399999999998, + "text": " Just everyone''s on the same page. RocksDB is an engine that was built + by Drew Bat Facebook.", "tokens": [50364, 1449, 1518, 311, 322, 264, 912, 3028, + 13, 6922, 82, 27735, 307, 364, 2848, 300, 390, 3094, 538, 25550, 10066, 4384, 13, + 50620], "temperature": 0.0, "avg_logprob": -0.17456136211272208, "compression_ratio": + 1.6702898550724639, "no_speech_prob": 0.0019574747420847416}, {"id": 192, "seek": + 109512, "start": 1100.8799999999999, "end": 1104.3999999999999, "text": " And I + shouldn''t say by Drew, but by a team that Drew was a part of. Like he was one of + the", "tokens": [50652, 400, 286, 4659, 380, 584, 538, 25550, 11, 457, 538, 257, + 1469, 300, 25550, 390, 257, 644, 295, 13, 1743, 415, 390, 472, 295, 264, 50828], + "temperature": 0.0, "avg_logprob": -0.17456136211272208, "compression_ratio": 1.6702898550724639, + "no_speech_prob": 0.0019574747420847416}, {"id": 193, "seek": 109512, "start": 1104.3999999999999, + "end": 1107.84, "text": " original founders of that team. There''s certainly a lot + of people involved in RocksDB.", "tokens": [50828, 3380, 25608, 295, 300, 1469, + 13, 821, 311, 3297, 257, 688, 295, 561, 3288, 294, 6922, 82, 27735, 13, 51000], + "temperature": 0.0, "avg_logprob": -0.17456136211272208, "compression_ratio": 1.6702898550724639, + "no_speech_prob": 0.0019574747420847416}, {"id": 194, "seek": 109512, "start": 1107.84, + "end": 1113.76, "text": " It''s a key value store right. 
It''s built sort of to + scale very well and sort of do log structure", "tokens": [51000, 467, 311, 257, + 2141, 2158, 3531, 558, 13, 467, 311, 3094, 1333, 295, 281, 4373, 588, 731, 293, + 1333, 295, 360, 3565, 3877, 51296], "temperature": 0.0, "avg_logprob": -0.17456136211272208, + "compression_ratio": 1.6702898550724639, "no_speech_prob": 0.0019574747420847416}, + {"id": 195, "seek": 109512, "start": 1113.76, "end": 1122.0, "text": " merge over + time. Rockset absolutely uses RocksDB as its storage plane. And so there''s a lot + of", "tokens": [51296, 22183, 670, 565, 13, 6922, 3854, 3122, 4960, 6922, 82, 27735, + 382, 1080, 6725, 5720, 13, 400, 370, 456, 311, 257, 688, 295, 51708], "temperature": + 0.0, "avg_logprob": -0.17456136211272208, "compression_ratio": 1.6702898550724639, + "no_speech_prob": 0.0019574747420847416}, {"id": 196, "seek": 112200, "start": 1122.0, + "end": 1127.84, "text": " Rockset built on top of RocksDB. So Rockset is not RocksDB + as a service. That is not what Rockset is.", "tokens": [50364, 6922, 3854, 3094, + 322, 1192, 295, 6922, 82, 27735, 13, 407, 6922, 3854, 307, 406, 6922, 82, 27735, + 382, 257, 2643, 13, 663, 307, 406, 437, 6922, 3854, 307, 13, 50656], "temperature": + 0.0, "avg_logprob": -0.12445149539915984, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.00041014549788087606}, {"id": 197, "seek": 112200, "start": + 1127.84, "end": 1136.64, "text": " We do use it as the storage plane of Rockset. 
+ And we do take heavy advantage of, again,", "tokens": [50656, 492, 360, 764, 309, + 382, 264, 6725, 5720, 295, 6922, 3854, 13, 400, 321, 360, 747, 4676, 5002, 295, + 11, 797, 11, 51096], "temperature": 0.0, "avg_logprob": -0.12445149539915984, "compression_ratio": + 1.7649253731343284, "no_speech_prob": 0.00041014549788087606}, {"id": 198, "seek": + 112200, "start": 1136.64, "end": 1140.72, "text": " to get into the technical weeds + a little bit like log structured merges to keep our indexes", "tokens": [51096, + 281, 483, 666, 264, 6191, 26370, 257, 707, 857, 411, 3565, 18519, 3551, 2880, 281, + 1066, 527, 8186, 279, 51300], "temperature": 0.0, "avg_logprob": -0.12445149539915984, + "compression_ratio": 1.7649253731343284, "no_speech_prob": 0.00041014549788087606}, + {"id": 199, "seek": 112200, "start": 1141.28, "end": 1146.56, "text": " sort of + up to date continuously. And that is a big part of like the real timeness of Rockset.", + "tokens": [51328, 1333, 295, 493, 281, 4002, 15684, 13, 400, 300, 307, 257, 955, + 644, 295, 411, 264, 957, 524, 15264, 295, 6922, 3854, 13, 51592], "temperature": + 0.0, "avg_logprob": -0.12445149539915984, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.00041014549788087606}, {"id": 200, "seek": 112200, "start": + 1146.56, "end": 1151.2, "text": " Like being able to update the index continuously + and having this like heavy weight infrastructure", "tokens": [51592, 1743, 885, + 1075, 281, 5623, 264, 8186, 15684, 293, 1419, 341, 411, 4676, 3364, 6896, 51824], + "temperature": 0.0, "avg_logprob": -0.12445149539915984, "compression_ratio": 1.7649253731343284, + "no_speech_prob": 0.00041014549788087606}, {"id": 201, "seek": 115120, "start": + 1151.2, "end": 1155.52, "text": " to merge these indexes and then the kind of the + appendonly log structured way that you do.", "tokens": [50364, 281, 22183, 613, + 8186, 279, 293, 550, 264, 733, 295, 264, 724, 521, 25202, 3565, 18519, 636, 300, + 291, 360, 13, 50580], 
"temperature": 0.0, "avg_logprob": -0.21169595795918286, "compression_ratio": + 1.711111111111111, "no_speech_prob": 0.0015068243956193328}, {"id": 202, "seek": + 115120, "start": 1156.4, "end": 1161.2, "text": " And the LSM world is part of the + secret sauce. It''s not that secret, but it''s part of the secret", "tokens": [50624, + 400, 264, 441, 26693, 1002, 307, 644, 295, 264, 4054, 4880, 13, 467, 311, 406, 300, + 4054, 11, 457, 309, 311, 644, 295, 264, 4054, 50864], "temperature": 0.0, "avg_logprob": + -0.21169595795918286, "compression_ratio": 1.711111111111111, "no_speech_prob": + 0.0015068243956193328}, {"id": 203, "seek": 115120, "start": 1161.2, "end": 1167.44, + "text": " sauce of Rockset. Yeah, for sure. But then also all these things like + vector search, you know,", "tokens": [50864, 4880, 295, 6922, 3854, 13, 865, 11, + 337, 988, 13, 583, 550, 611, 439, 613, 721, 411, 8062, 3164, 11, 291, 458, 11, 51176], + "temperature": 0.0, "avg_logprob": -0.21169595795918286, "compression_ratio": 1.711111111111111, + "no_speech_prob": 0.0015068243956193328}, {"id": 204, "seek": 115120, "start": 1168.16, + "end": 1173.76, "text": " storing the embeddings. Is that also happening outside + of RocksDB? Basically, in the layer you explained.", "tokens": [51212, 26085, 264, + 12240, 29432, 13, 1119, 300, 611, 2737, 2380, 295, 6922, 82, 27735, 30, 8537, 11, + 294, 264, 4583, 291, 8825, 13, 51492], "temperature": 0.0, "avg_logprob": -0.21169595795918286, + "compression_ratio": 1.711111111111111, "no_speech_prob": 0.0015068243956193328}, + {"id": 205, "seek": 115120, "start": 1175.1200000000001, "end": 1180.56, "text": + " So hold on, you asked about vector, what were the things? 
Oh, embeddings.", "tokens": + [51560, 407, 1797, 322, 11, 291, 2351, 466, 8062, 11, 437, 645, 264, 721, 30, 876, + 11, 12240, 29432, 13, 51832], "temperature": 0.0, "avg_logprob": -0.21169595795918286, + "compression_ratio": 1.711111111111111, "no_speech_prob": 0.0015068243956193328}, + {"id": 206, "seek": 118056, "start": 1180.56, "end": 1184.6399999999999, "text": + " And embeddings and vector search itself and the sort of a and n indexes presumably.", + "tokens": [50364, 400, 12240, 29432, 293, 8062, 3164, 2564, 293, 264, 1333, 295, + 257, 293, 297, 8186, 279, 26742, 13, 50568], "temperature": 0.0, "avg_logprob": + -0.1622507226376133, "compression_ratio": 1.7638376383763839, "no_speech_prob": + 0.0004625711590051651}, {"id": 207, "seek": 118056, "start": 1184.6399999999999, + "end": 1190.8, "text": " Yeah. So the a and n index, so we''ve added, we''ve extended + RocksDB a little bit to kind of have", "tokens": [50568, 865, 13, 407, 264, 257, + 293, 297, 8186, 11, 370, 321, 600, 3869, 11, 321, 600, 10913, 6922, 82, 27735, 257, + 707, 857, 281, 733, 295, 362, 50876], "temperature": 0.0, "avg_logprob": -0.1622507226376133, + "compression_ratio": 1.7638376383763839, "no_speech_prob": 0.0004625711590051651}, + {"id": 208, "seek": 118056, "start": 1190.8, "end": 1195.84, "text": " this notion + of a blob of memory that you attach to a particular thing. It''s what''s going to + be the", "tokens": [50876, 341, 10710, 295, 257, 46115, 295, 4675, 300, 291, 5085, + 281, 257, 1729, 551, 13, 467, 311, 437, 311, 516, 281, 312, 264, 51128], "temperature": + 0.0, "avg_logprob": -0.1622507226376133, "compression_ratio": 1.7638376383763839, + "no_speech_prob": 0.0004625711590051651}, {"id": 209, "seek": 118056, "start": 1195.84, + "end": 1202.0, "text": " a and n index. And then you can build custom operators + to merge them, for example. 
And so that we do,", "tokens": [51128, 257, 293, 297, + 8186, 13, 400, 550, 291, 393, 1322, 2375, 19077, 281, 22183, 552, 11, 337, 1365, + 13, 400, 370, 300, 321, 360, 11, 51436], "temperature": 0.0, "avg_logprob": -0.1622507226376133, + "compression_ratio": 1.7638376383763839, "no_speech_prob": 0.0004625711590051651}, + {"id": 210, "seek": 118056, "start": 1202.0, "end": 1208.0, "text": " we do essentially + shove the a and n index into this. And so it gets into RocksDB. RocksDB doesn''t", + "tokens": [51436, 321, 360, 4476, 35648, 264, 257, 293, 297, 8186, 666, 341, 13, + 400, 370, 309, 2170, 666, 6922, 82, 27735, 13, 6922, 82, 27735, 1177, 380, 51736], + "temperature": 0.0, "avg_logprob": -0.1622507226376133, "compression_ratio": 1.7638376383763839, + "no_speech_prob": 0.0004625711590051651}, {"id": 211, "seek": 120800, "start": 1208.0, + "end": 1212.48, "text": " know about a and n indexes. It just knows there''s a blob + of memory that it has to log structure merge", "tokens": [50364, 458, 466, 257, + 293, 297, 8186, 279, 13, 467, 445, 3255, 456, 311, 257, 46115, 295, 4675, 300, 309, + 575, 281, 3565, 3877, 22183, 50588], "temperature": 0.0, "avg_logprob": -0.1709749548284857, + "compression_ratio": 1.676595744680851, "no_speech_prob": 0.0006812462816014886}, + {"id": 212, "seek": 120800, "start": 1212.48, "end": 1219.6, "text": " down the + road. As far as embeddings, for us, that''s just arrays. So for us, an embedding + is just a", "tokens": [50588, 760, 264, 3060, 13, 1018, 1400, 382, 12240, 29432, + 11, 337, 505, 11, 300, 311, 445, 41011, 13, 407, 337, 505, 11, 364, 12240, 3584, + 307, 445, 257, 50944], "temperature": 0.0, "avg_logprob": -0.1709749548284857, "compression_ratio": + 1.676595744680851, "no_speech_prob": 0.0006812462816014886}, {"id": 213, "seek": + 120800, "start": 1219.6, "end": 1225.04, "text": " vector. And for us, a vector + is just an array. 
There''s no real difference in the way these things", "tokens": + [50944, 8062, 13, 400, 337, 505, 11, 257, 8062, 307, 445, 364, 10225, 13, 821, 311, + 572, 957, 2649, 294, 264, 636, 613, 721, 51216], "temperature": 0.0, "avg_logprob": + -0.1709749548284857, "compression_ratio": 1.676595744680851, "no_speech_prob": 0.0006812462816014886}, + {"id": 214, "seek": 120800, "start": 1225.04, "end": 1233.92, "text": " are stored. + And those are stored in RocksDB. Yeah, got it. And so, and basically, what other + AI", "tokens": [51216, 366, 12187, 13, 400, 729, 366, 12187, 294, 6922, 82, 27735, + 13, 865, 11, 658, 309, 13, 400, 370, 11, 293, 1936, 11, 437, 661, 7318, 51660], + "temperature": 0.0, "avg_logprob": -0.1709749548284857, "compression_ratio": 1.676595744680851, + "no_speech_prob": 0.0006812462816014886}, {"id": 215, "seek": 123392, "start": 1233.92, + "end": 1239.6000000000001, "text": " capabilities does RocksDB offer, you know, + basically everything? What''s the secret source of that", "tokens": [50364, 10862, + 775, 6922, 82, 27735, 2626, 11, 291, 458, 11, 1936, 1203, 30, 708, 311, 264, 4054, + 4009, 295, 300, 50648], "temperature": 0.0, "avg_logprob": -0.16754087854604252, + "compression_ratio": 1.8081180811808117, "no_speech_prob": 0.018749956041574478}, + {"id": 216, "seek": 123392, "start": 1239.6000000000001, "end": 1247.3600000000001, + "text": " thing? So they''re facing, right? But still. So I have two, there''s a + few things to talk about here.", "tokens": [50648, 551, 30, 407, 436, 434, 7170, + 11, 558, 30, 583, 920, 13, 407, 286, 362, 732, 11, 456, 311, 257, 1326, 721, 281, + 751, 466, 510, 13, 51036], "temperature": 0.0, "avg_logprob": -0.16754087854604252, + "compression_ratio": 1.8081180811808117, "no_speech_prob": 0.018749956041574478}, + {"id": 217, "seek": 123392, "start": 1247.3600000000001, "end": 1251.6000000000001, + "text": " We talk about secret sauce. 
So one thing we skipped over about one thing + that''s worth touching on", "tokens": [51036, 492, 751, 466, 4054, 4880, 13, 407, + 472, 551, 321, 30193, 670, 466, 472, 551, 300, 311, 3163, 11175, 322, 51248], "temperature": + 0.0, "avg_logprob": -0.16754087854604252, "compression_ratio": 1.8081180811808117, + "no_speech_prob": 0.018749956041574478}, {"id": 218, "seek": 123392, "start": 1251.6000000000001, + "end": 1258.64, "text": " in terms of RocksDB or architecture is RocksDB has two + things that you hope every database has,", "tokens": [51248, 294, 2115, 295, 6922, + 82, 27735, 420, 9482, 307, 6922, 82, 27735, 575, 732, 721, 300, 291, 1454, 633, + 8149, 575, 11, 51600], "temperature": 0.0, "avg_logprob": -0.16754087854604252, + "compression_ratio": 1.8081180811808117, "no_speech_prob": 0.018749956041574478}, + {"id": 219, "seek": 123392, "start": 1258.64, "end": 1263.3600000000001, "text": + " but not every database has one is we have disaggregated storage, like fully disaggregated + storage.", "tokens": [51600, 457, 406, 633, 8149, 575, 472, 307, 321, 362, 10414, + 11027, 770, 6725, 11, 411, 4498, 10414, 11027, 770, 6725, 13, 51836], "temperature": + 0.0, "avg_logprob": -0.16754087854604252, "compression_ratio": 1.8081180811808117, + "no_speech_prob": 0.018749956041574478}, {"id": 220, "seek": 126336, "start": 1263.36, + "end": 1267.52, "text": " So if you double your storage, you can, you can, basically, + you can double your storage,", "tokens": [50364, 407, 498, 291, 3834, 428, 6725, + 11, 291, 393, 11, 291, 393, 11, 1936, 11, 291, 393, 3834, 428, 6725, 11, 50572], + "temperature": 0.0, "avg_logprob": -0.14591153462727866, "compression_ratio": 2.089605734767025, + "no_speech_prob": 0.0007795862038619816}, {"id": 221, "seek": 126336, "start": 1267.52, + "end": 1272.0, "text": " you can double your compute, you can do either. You don''t + have to do both, right? 
You can, they", "tokens": [50572, 291, 393, 3834, 428, 14722, + 11, 291, 393, 360, 2139, 13, 509, 500, 380, 362, 281, 360, 1293, 11, 558, 30, 509, + 393, 11, 436, 50796], "temperature": 0.0, "avg_logprob": -0.14591153462727866, "compression_ratio": + 2.089605734767025, "no_speech_prob": 0.0007795862038619816}, {"id": 222, "seek": + 126336, "start": 1272.0, "end": 1275.9199999999998, "text": " are stable. They, + there''s compute optimized machines and storage optimized machines, and you can", + "tokens": [50796, 366, 8351, 13, 814, 11, 456, 311, 14722, 26941, 8379, 293, 6725, + 26941, 8379, 11, 293, 291, 393, 50992], "temperature": 0.0, "avg_logprob": -0.14591153462727866, + "compression_ratio": 2.089605734767025, "no_speech_prob": 0.0007795862038619816}, + {"id": 223, "seek": 126336, "start": 1275.9199999999998, "end": 1282.08, "text": + " add to either group independently. We also have compute and compute isolation. + So you can set aside", "tokens": [50992, 909, 281, 2139, 1594, 21761, 13, 492, 611, + 362, 14722, 293, 14722, 16001, 13, 407, 291, 393, 992, 7359, 51300], "temperature": + 0.0, "avg_logprob": -0.14591153462727866, "compression_ratio": 2.089605734767025, + "no_speech_prob": 0.0007795862038619816}, {"id": 224, "seek": 126336, "start": 1282.08, + "end": 1287.6799999999998, "text": " a set of machines, for example, just to do + ingest and a different set of machines, just to do queries.", "tokens": [51300, + 257, 992, 295, 8379, 11, 337, 1365, 11, 445, 281, 360, 3957, 377, 293, 257, 819, + 992, 295, 8379, 11, 445, 281, 360, 24109, 13, 51580], "temperature": 0.0, "avg_logprob": + -0.14591153462727866, "compression_ratio": 2.089605734767025, "no_speech_prob": + 0.0007795862038619816}, {"id": 225, "seek": 126336, "start": 1287.6799999999998, + "end": 1291.84, "text": " And they both operate on the same backend, for example. + You can go farther than that. 
You can have", "tokens": [51580, 400, 436, 1293, 9651, + 322, 264, 912, 38087, 11, 337, 1365, 13, 509, 393, 352, 20344, 813, 300, 13, 509, + 393, 362, 51788], "temperature": 0.0, "avg_logprob": -0.14591153462727866, "compression_ratio": + 2.089605734767025, "no_speech_prob": 0.0007795862038619816}, {"id": 226, "seek": + 129184, "start": 1292.32, "end": 1296.1599999999999, "text": " different groups + of machines for different sets of queries or by 10 in or whatever you can go", "tokens": + [50388, 819, 3935, 295, 8379, 337, 819, 6352, 295, 24109, 420, 538, 1266, 294, 420, + 2035, 291, 393, 352, 50580], "temperature": 0.0, "avg_logprob": -0.14488281637935316, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0013978761853650212}, + {"id": 227, "seek": 129184, "start": 1296.1599999999999, "end": 1301.6, "text": + " wild with this idea, isolating compute from it''s from each other, right? Once + you have disaggregated", "tokens": [50580, 4868, 365, 341, 1558, 11, 48912, 14722, + 490, 309, 311, 490, 1184, 661, 11, 558, 30, 3443, 291, 362, 10414, 11027, 770, 50852], + "temperature": 0.0, "avg_logprob": -0.14488281637935316, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0013978761853650212}, {"id": 228, "seek": 129184, "start": 1301.6, + "end": 1308.3999999999999, "text": " storage, this is an idea you can do. 
This is + already really powerful for AI use cases, like in a way", "tokens": [50852, 6725, + 11, 341, 307, 364, 1558, 291, 393, 360, 13, 639, 307, 1217, 534, 4005, 337, 7318, + 764, 3331, 11, 411, 294, 257, 636, 51192], "temperature": 0.0, "avg_logprob": -0.14488281637935316, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0013978761853650212}, + {"id": 229, "seek": 129184, "start": 1308.3999999999999, "end": 1313.36, "text": + " you don''t necessarily appreciate, because what it means is I have a way to do + my index rebuilds,", "tokens": [51192, 291, 500, 380, 4725, 4449, 11, 570, 437, + 309, 1355, 307, 286, 362, 257, 636, 281, 360, 452, 8186, 16877, 82, 11, 51440], + "temperature": 0.0, "avg_logprob": -0.14488281637935316, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0013978761853650212}, {"id": 230, "seek": 129184, "start": 1313.36, + "end": 1319.28, "text": " which are expensive in a vector world, away from the machines + handling queries. Like I''m not,", "tokens": [51440, 597, 366, 5124, 294, 257, 8062, + 1002, 11, 1314, 490, 264, 8379, 13175, 24109, 13, 1743, 286, 478, 406, 11, 51736], + "temperature": 0.0, "avg_logprob": -0.14488281637935316, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0013978761853650212}, {"id": 231, "seek": 131928, "start": 1319.28, + "end": 1323.76, "text": " like what''s not going to happen is the machine, the database + is going to bog itself down doing", "tokens": [50364, 411, 437, 311, 406, 516, 281, + 1051, 307, 264, 3479, 11, 264, 8149, 307, 516, 281, 26132, 2564, 760, 884, 50588], + "temperature": 0.0, "avg_logprob": -0.14676791223986396, "compression_ratio": 1.6810035842293907, + "no_speech_prob": 0.0016930574784055352}, {"id": 232, "seek": 131928, "start": 1323.76, + "end": 1328.24, "text": " an index updates of some sort while queries are trying + to be served and you''re going to get time", "tokens": [50588, 364, 8186, 9205, + 295, 512, 1333, 1339, 24109, 366, 1382, 281, 
312, 7584, 293, 291, 434, 516, 281, + 483, 565, 50812], "temperature": 0.0, "avg_logprob": -0.14676791223986396, "compression_ratio": + 1.6810035842293907, "no_speech_prob": 0.0016930574784055352}, {"id": 233, "seek": + 131928, "start": 1328.24, "end": 1334.0, "text": " out. So being able to actually + separate out compute is very powerful in these AI settings.", "tokens": [50812, + 484, 13, 407, 885, 1075, 281, 767, 4994, 484, 14722, 307, 588, 4005, 294, 613, 7318, + 6257, 13, 51100], "temperature": 0.0, "avg_logprob": -0.14676791223986396, "compression_ratio": + 1.6810035842293907, "no_speech_prob": 0.0016930574784055352}, {"id": 234, "seek": + 131928, "start": 1334.0, "end": 1338.96, "text": " Again, another example, no one''s + done this like in total anger yet, but it''s going to come,", "tokens": [51100, + 3764, 11, 1071, 1365, 11, 572, 472, 311, 1096, 341, 411, 294, 3217, 10240, 1939, + 11, 457, 309, 311, 516, 281, 808, 11, 51348], "temperature": 0.0, "avg_logprob": + -0.14676791223986396, "compression_ratio": 1.6810035842293907, "no_speech_prob": + 0.0016930574784055352}, {"id": 235, "seek": 131928, "start": 1338.96, "end": 1344.8, + "text": " which is like the hey, I have a god awful amount of vectors. I want to + update them to the next", "tokens": [51348, 597, 307, 411, 264, 4177, 11, 286, 362, + 257, 3044, 11232, 2372, 295, 18875, 13, 286, 528, 281, 5623, 552, 281, 264, 958, + 51640], "temperature": 0.0, "avg_logprob": -0.14676791223986396, "compression_ratio": + 1.6810035842293907, "no_speech_prob": 0.0016930574784055352}, {"id": 236, "seek": + 134480, "start": 1344.8, "end": 1349.52, "text": " generation of my the new open + AI model has come out. 
I want to rerun the entire data set.", "tokens": [50364, + 5125, 295, 452, 264, 777, 1269, 7318, 2316, 575, 808, 484, 13, 286, 528, 281, 43819, + 409, 264, 2302, 1412, 992, 13, 50600], "temperature": 0.0, "avg_logprob": -0.16023313698648406, + "compression_ratio": 1.6725352112676057, "no_speech_prob": 0.0026966959703713655}, + {"id": 237, "seek": 134480, "start": 1350.56, "end": 1355.68, "text": " We can do + that in this kind of like off on the side fashion in a way that just reduz it all + in", "tokens": [50652, 492, 393, 360, 300, 294, 341, 733, 295, 411, 766, 322, 264, + 1252, 6700, 294, 257, 636, 300, 445, 2182, 3334, 309, 439, 294, 50908], "temperature": + 0.0, "avg_logprob": -0.16023313698648406, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.0026966959703713655}, {"id": 238, "seek": 134480, "start": 1355.68, + "end": 1360.48, "text": " place without affecting the running application as an + example. So that''s one like kind of very", "tokens": [50908, 1081, 1553, 17476, + 264, 2614, 3861, 382, 364, 1365, 13, 407, 300, 311, 472, 411, 733, 295, 588, 51148], + "temperature": 0.0, "avg_logprob": -0.16023313698648406, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.0026966959703713655}, {"id": 239, "seek": 134480, "start": 1360.48, + "end": 1366.24, "text": " architectural found is a very database type of a feature + that you you will miss it if you don''t", "tokens": [51148, 26621, 1352, 307, 257, + 588, 8149, 2010, 295, 257, 4111, 300, 291, 291, 486, 1713, 309, 498, 291, 500, 380, + 51436], "temperature": 0.0, "avg_logprob": -0.16023313698648406, "compression_ratio": + 1.6725352112676057, "no_speech_prob": 0.0026966959703713655}, {"id": 240, "seek": + 134480, "start": 1366.24, "end": 1372.56, "text": " have it when the day comes. 
+ Moving up to kind of more AI level things, the other thing that we have", "tokens": + [51436, 362, 309, 562, 264, 786, 1487, 13, 14242, 493, 281, 733, 295, 544, 7318, + 1496, 721, 11, 264, 661, 551, 300, 321, 362, 51752], "temperature": 0.0, "avg_logprob": + -0.16023313698648406, "compression_ratio": 1.6725352112676057, "no_speech_prob": + 0.0026966959703713655}, {"id": 241, "seek": 137256, "start": 1372.56, "end": 1379.76, + "text": " is like we have a huge pile of infrastructure of like doing SQL and relational + queries, right?", "tokens": [50364, 307, 411, 321, 362, 257, 2603, 14375, 295, 6896, + 295, 411, 884, 19200, 293, 38444, 24109, 11, 558, 30, 50724], "temperature": 0.0, + "avg_logprob": -0.1363154867421026, "compression_ratio": 1.7482014388489209, "no_speech_prob": + 0.0016974152531474829}, {"id": 242, "seek": 137256, "start": 1380.3999999999999, + "end": 1386.3999999999999, "text": " In this system that''s separate from the vector + stuff. So when the vector stuff gets mixed with that", "tokens": [50756, 682, 341, + 1185, 300, 311, 4994, 490, 264, 8062, 1507, 13, 407, 562, 264, 8062, 1507, 2170, + 7467, 365, 300, 51056], "temperature": 0.0, "avg_logprob": -0.1363154867421026, + "compression_ratio": 1.7482014388489209, "no_speech_prob": 0.0016974152531474829}, + {"id": 243, "seek": 137256, "start": 1386.3999999999999, "end": 1391.9199999999998, + "text": " stuff, the things get very powerful and very magical. 
And so this gets + you into so there''s it''s", "tokens": [51056, 1507, 11, 264, 721, 483, 588, 4005, + 293, 588, 12066, 13, 400, 370, 341, 2170, 291, 666, 370, 456, 311, 309, 311, 51332], + "temperature": 0.0, "avg_logprob": -0.1363154867421026, "compression_ratio": 1.7482014388489209, + "no_speech_prob": 0.0016974152531474829}, {"id": 244, "seek": 137256, "start": 1391.9199999999998, + "end": 1396.1599999999999, "text": " funny because database people talk a certain + way and AI people talk a certain way and a lot of", "tokens": [51332, 4074, 570, + 8149, 561, 751, 257, 1629, 636, 293, 7318, 561, 751, 257, 1629, 636, 293, 257, 688, + 295, 51544], "temperature": 0.0, "avg_logprob": -0.1363154867421026, "compression_ratio": + 1.7482014388489209, "no_speech_prob": 0.0016974152531474829}, {"id": 245, "seek": + 137256, "start": 1396.1599999999999, "end": 1399.36, "text": " times they''re actually + saying the same thing, but they use none of the same words. And so they don''t", + "tokens": [51544, 1413, 436, 434, 767, 1566, 264, 912, 551, 11, 457, 436, 764, 6022, + 295, 264, 912, 2283, 13, 400, 370, 436, 500, 380, 51704], "temperature": 0.0, "avg_logprob": + -0.1363154867421026, "compression_ratio": 1.7482014388489209, "no_speech_prob": + 0.0016974152531474829}, {"id": 246, "seek": 139936, "start": 1399.36, "end": 1403.84, + "text": " know they''re talking about the same thing, but as an example, like so + in an AI context, things like", "tokens": [50364, 458, 436, 434, 1417, 466, 264, + 912, 551, 11, 457, 382, 364, 1365, 11, 411, 370, 294, 364, 7318, 4319, 11, 721, + 411, 50588], "temperature": 0.0, "avg_logprob": -0.19275750404547068, "compression_ratio": + 1.8972332015810276, "no_speech_prob": 0.0008676409488543868}, {"id": 247, "seek": + 139936, "start": 1403.84, "end": 1409.76, "text": " metadata filtering or hybrid + search, these are all things Rockset does out of the box. 
Like", "tokens": [50588, + 26603, 30822, 420, 13051, 3164, 11, 613, 366, 439, 721, 6922, 3854, 775, 484, 295, + 264, 2424, 13, 1743, 50884], "temperature": 0.0, "avg_logprob": -0.19275750404547068, + "compression_ratio": 1.8972332015810276, "no_speech_prob": 0.0008676409488543868}, + {"id": 248, "seek": 139936, "start": 1409.76, "end": 1415.28, "text": " metadata + filtering in an AI context or a vector context, that''s just like the wear clause + of a", "tokens": [50884, 26603, 30822, 294, 364, 7318, 4319, 420, 257, 8062, 4319, + 11, 300, 311, 445, 411, 264, 3728, 25925, 295, 257, 51160], "temperature": 0.0, + "avg_logprob": -0.19275750404547068, "compression_ratio": 1.8972332015810276, "no_speech_prob": + 0.0008676409488543868}, {"id": 249, "seek": 139936, "start": 1415.28, "end": 1419.76, + "text": " SQL query. Like that''s all that is like where X is created and this time + is created and that.", "tokens": [51160, 19200, 14581, 13, 1743, 300, 311, 439, + 300, 307, 411, 689, 1783, 307, 2942, 293, 341, 565, 307, 2942, 293, 300, 13, 51384], + "temperature": 0.0, "avg_logprob": -0.19275750404547068, "compression_ratio": 1.8972332015810276, + "no_speech_prob": 0.0008676409488543868}, {"id": 250, "seek": 139936, "start": 1419.76, + "end": 1425.04, "text": " Like so for us, it''s that''s all done. Like metadata + filtering is easy. That''s not a hard problem at", "tokens": [51384, 1743, 370, + 337, 505, 11, 309, 311, 300, 311, 439, 1096, 13, 1743, 26603, 30822, 307, 1858, + 13, 663, 311, 406, 257, 1152, 1154, 412, 51648], "temperature": 0.0, "avg_logprob": + -0.19275750404547068, "compression_ratio": 1.8972332015810276, "no_speech_prob": + 0.0008676409488543868}, {"id": 251, "seek": 142504, "start": 1425.04, "end": 1430.1599999999999, + "text": " all. All you have to do is you know, I have a super powerful query language. 
+ I''m query optimizer.", "tokens": [50364, 439, 13, 1057, 291, 362, 281, 360, 307, + 291, 458, 11, 286, 362, 257, 1687, 4005, 14581, 2856, 13, 286, 478, 14581, 5028, + 6545, 13, 50620], "temperature": 0.0, "avg_logprob": -0.14817818556681717, "compression_ratio": + 1.6753246753246753, "no_speech_prob": 7.794262637617067e-05}, {"id": 252, "seek": + 142504, "start": 1430.1599999999999, "end": 1434.8799999999999, "text": " All you + have to do is kind of merge that with the a and n kind of vector search and I get + like", "tokens": [50620, 1057, 291, 362, 281, 360, 307, 733, 295, 22183, 300, 365, + 264, 257, 293, 297, 733, 295, 8062, 3164, 293, 286, 483, 411, 50856], "temperature": + 0.0, "avg_logprob": -0.14817818556681717, "compression_ratio": 1.6753246753246753, + "no_speech_prob": 7.794262637617067e-05}, {"id": 253, "seek": 142504, "start": 1434.8799999999999, + "end": 1439.6, "text": " metadata filtering is like a not that''s not a hard problem + for us to solve. Like it would be for", "tokens": [50856, 26603, 30822, 307, 411, + 257, 406, 300, 311, 406, 257, 1152, 1154, 337, 505, 281, 5039, 13, 1743, 309, 576, + 312, 337, 51092], "temperature": 0.0, "avg_logprob": -0.14817818556681717, "compression_ratio": + 1.6753246753246753, "no_speech_prob": 7.794262637617067e-05}, {"id": 254, "seek": + 142504, "start": 1439.6, "end": 1446.72, "text": " others to solve. And so I do + think we really shine in situations where a you care about real time", "tokens": + [51092, 2357, 281, 5039, 13, 400, 370, 286, 360, 519, 321, 534, 12207, 294, 6851, + 689, 257, 291, 1127, 466, 957, 565, 51448], "temperature": 0.0, "avg_logprob": -0.14817818556681717, + "compression_ratio": 1.6753246753246753, "no_speech_prob": 7.794262637617067e-05}, + {"id": 255, "seek": 144672, "start": 1446.72, "end": 1456.72, "text": " ingest, + be you care about any kind of hybrid or metadata filtering. 
Rockset''s really good + as well", "tokens": [50364, 3957, 377, 11, 312, 291, 1127, 466, 604, 733, 295, 13051, + 420, 26603, 30822, 13, 6922, 3854, 311, 534, 665, 382, 731, 50864], "temperature": + 0.0, "avg_logprob": -0.16194987297058105, "compression_ratio": 1.6485355648535565, + "no_speech_prob": 0.000734504486899823}, {"id": 256, "seek": 144672, "start": 1456.72, + "end": 1464.32, "text": " for for kind of raw vector power, but I wouldn''t say + we''re like the best database in the world for like,", "tokens": [50864, 337, 337, + 733, 295, 8936, 8062, 1347, 11, 457, 286, 2759, 380, 584, 321, 434, 411, 264, 1151, + 8149, 294, 264, 1002, 337, 411, 11, 51244], "temperature": 0.0, "avg_logprob": -0.16194987297058105, + "compression_ratio": 1.6485355648535565, "no_speech_prob": 0.000734504486899823}, + {"id": 257, "seek": 144672, "start": 1465.28, "end": 1468.96, "text": " I don''t + know, I view it more like we kind of going where our customers are taking us. Like + if", "tokens": [51292, 286, 500, 380, 458, 11, 286, 1910, 309, 544, 411, 321, 733, + 295, 516, 689, 527, 4581, 366, 1940, 505, 13, 1743, 498, 51476], "temperature": + 0.0, "avg_logprob": -0.16194987297058105, "compression_ratio": 1.6485355648535565, + "no_speech_prob": 0.000734504486899823}, {"id": 258, "seek": 144672, "start": 1468.96, + "end": 1473.04, "text": " the customers came to me as like, Hey, if you if you if + I can have 10 times more vectors and like", "tokens": [51476, 264, 4581, 1361, 281, + 385, 382, 411, 11, 1911, 11, 498, 291, 498, 291, 498, 286, 393, 362, 1266, 1413, + 544, 18875, 293, 411, 51680], "temperature": 0.0, "avg_logprob": -0.16194987297058105, + "compression_ratio": 1.6485355648535565, "no_speech_prob": 0.000734504486899823}, + {"id": 259, "seek": 147304, "start": 1473.6, "end": 1477.92, "text": " 4% more precision + and recall, if you implemented this slightly better algorithm with these parameters,", + "tokens": [50392, 1017, 4, 544, 18356, 293, 9901, 11, 498, 291, 12270, 
341, 4748, + 1101, 9284, 365, 613, 9834, 11, 50608], "temperature": 0.0, "avg_logprob": -0.12673638774230417, + "compression_ratio": 1.7132867132867133, "no_speech_prob": 0.002413652604445815}, + {"id": 260, "seek": 147304, "start": 1477.92, "end": 1485.04, "text": " we would + do it. But almost always it''s like they want we want that like hybrid search seems + to be", "tokens": [50608, 321, 576, 360, 309, 13, 583, 1920, 1009, 309, 311, 411, + 436, 528, 321, 528, 300, 411, 13051, 3164, 2544, 281, 312, 50964], "temperature": + 0.0, "avg_logprob": -0.12673638774230417, "compression_ratio": 1.7132867132867133, + "no_speech_prob": 0.002413652604445815}, {"id": 261, "seek": 147304, "start": 1485.04, + "end": 1490.32, "text": " the king. Like it''s it''s it''s merging these things. + And that''s where like a lot of our effort has", "tokens": [50964, 264, 4867, 13, + 1743, 309, 311, 309, 311, 309, 311, 44559, 613, 721, 13, 400, 300, 311, 689, 411, + 257, 688, 295, 527, 4630, 575, 51228], "temperature": 0.0, "avg_logprob": -0.12673638774230417, + "compression_ratio": 1.7132867132867133, "no_speech_prob": 0.002413652604445815}, + {"id": 262, "seek": 147304, "start": 1490.32, "end": 1495.28, "text": " gone is + into making the hybrid search story like making these two worlds work together like", + "tokens": [51228, 2780, 307, 666, 1455, 264, 13051, 3164, 1657, 411, 1455, 613, + 732, 13401, 589, 1214, 411, 51476], "temperature": 0.0, "avg_logprob": -0.12673638774230417, + "compression_ratio": 1.7132867132867133, "no_speech_prob": 0.002413652604445815}, + {"id": 263, "seek": 147304, "start": 1495.28, "end": 1500.6399999999999, "text": + " fairly seamlessly. 
Like be able to say like show me the closest 10 vectors that + were updated in the", "tokens": [51476, 6457, 38083, 13, 1743, 312, 1075, 281, 584, + 411, 855, 385, 264, 13699, 1266, 18875, 300, 645, 10588, 294, 264, 51744], "temperature": + 0.0, "avg_logprob": -0.12673638774230417, "compression_ratio": 1.7132867132867133, + "no_speech_prob": 0.002413652604445815}, {"id": 264, "seek": 150064, "start": 1500.64, + "end": 1507.6000000000001, "text": " last 10 minutes. Like that that kind of query + is really powerful. And that''s kind of what we''ve", "tokens": [50364, 1036, 1266, + 2077, 13, 1743, 300, 300, 733, 295, 14581, 307, 534, 4005, 13, 400, 300, 311, 733, + 295, 437, 321, 600, 50712], "temperature": 0.0, "avg_logprob": -0.20999789441752637, + "compression_ratio": 1.7388059701492538, "no_speech_prob": 0.00041285910992883146}, + {"id": 265, "seek": 150064, "start": 1507.6000000000001, "end": 1513.1200000000001, + "text": " been focused on in terms of in terms of. But I guess the timestamp example + you gave it''s also", "tokens": [50712, 668, 5178, 322, 294, 2115, 295, 294, 2115, + 295, 13, 583, 286, 2041, 264, 49108, 1215, 1365, 291, 2729, 309, 311, 611, 50988], + "temperature": 0.0, "avg_logprob": -0.20999789441752637, "compression_ratio": 1.7388059701492538, + "no_speech_prob": 0.00041285910992883146}, {"id": 266, "seek": 150064, "start": + 1513.1200000000001, "end": 1518.16, "text": " like metadata check right. It''s kind + of like way close where you say between a and b timestamps.", "tokens": [50988, + 411, 26603, 1520, 558, 13, 467, 311, 733, 295, 411, 636, 1998, 689, 291, 584, 1296, + 257, 293, 272, 49108, 23150, 13, 51240], "temperature": 0.0, "avg_logprob": -0.20999789441752637, + "compression_ratio": 1.7388059701492538, "no_speech_prob": 0.00041285910992883146}, + {"id": 267, "seek": 150064, "start": 1518.16, "end": 1523.68, "text": " Yes. Yes. 
+ But like hybrid search at least the way I''m hearing people do this is that", "tokens": + [51240, 1079, 13, 1079, 13, 583, 411, 13051, 3164, 412, 1935, 264, 636, 286, 478, + 4763, 561, 360, 341, 307, 300, 51516], "temperature": 0.0, "avg_logprob": -0.20999789441752637, + "compression_ratio": 1.7388059701492538, "no_speech_prob": 0.00041285910992883146}, + {"id": 268, "seek": 150064, "start": 1524.24, "end": 1530.0, "text": " let''s say + take the search domains example. You might have a keyword search right which is + your", "tokens": [51544, 718, 311, 584, 747, 264, 3164, 25514, 1365, 13, 509, 1062, + 362, 257, 20428, 3164, 558, 597, 307, 428, 51832], "temperature": 0.0, "avg_logprob": + -0.20999789441752637, "compression_ratio": 1.7388059701492538, "no_speech_prob": + 0.00041285910992883146}, {"id": 269, "seek": 153000, "start": 1530.0, "end": 1536.96, + "text": " sparse index and then you have your then syntax vector search. And you + want to combine the two in", "tokens": [50364, 637, 11668, 8186, 293, 550, 291, + 362, 428, 550, 28431, 8062, 3164, 13, 400, 291, 528, 281, 10432, 264, 732, 294, + 50712], "temperature": 0.0, "avg_logprob": -0.19386354569465883, "compression_ratio": + 1.7536764705882353, "no_speech_prob": 0.0020784453954547644}, {"id": 270, "seek": + 153000, "start": 1536.96, "end": 1543.2, "text": " some way. For example, you could + say I still trust keyword search. So let''s give it 75% of weight", "tokens": [50712, + 512, 636, 13, 1171, 1365, 11, 291, 727, 584, 286, 920, 3361, 20428, 3164, 13, 407, + 718, 311, 976, 309, 9562, 4, 295, 3364, 51024], "temperature": 0.0, "avg_logprob": + -0.19386354569465883, "compression_ratio": 1.7536764705882353, "no_speech_prob": + 0.0020784453954547644}, {"id": 271, "seek": 153000, "start": 1543.2, "end": 1548.96, + "text": " and then 25% goes to vector. 
And then you combine them into leave them + in some merging strategy.", "tokens": [51024, 293, 550, 3552, 4, 1709, 281, 8062, + 13, 400, 550, 291, 10432, 552, 666, 1856, 552, 294, 512, 44559, 5206, 13, 51312], + "temperature": 0.0, "avg_logprob": -0.19386354569465883, "compression_ratio": 1.7536764705882353, + "no_speech_prob": 0.0020784453954547644}, {"id": 272, "seek": 153000, "start": 1548.96, + "end": 1553.2, "text": " And then you return back to the user. Is this how you see + hybrid search or do you see?", "tokens": [51312, 400, 550, 291, 2736, 646, 281, + 264, 4195, 13, 1119, 341, 577, 291, 536, 13051, 3164, 420, 360, 291, 536, 30, 51524], + "temperature": 0.0, "avg_logprob": -0.19386354569465883, "compression_ratio": 1.7536764705882353, + "no_speech_prob": 0.0020784453954547644}, {"id": 273, "seek": 153000, "start": 1554.0, + "end": 1558.4, "text": " So I have a whole ran here. You might you might have yes + you''ve unlocked my ran here. So let''s go", "tokens": [51564, 407, 286, 362, 257, + 1379, 5872, 510, 13, 509, 1062, 291, 1062, 362, 2086, 291, 600, 30180, 452, 5872, + 510, 13, 407, 718, 311, 352, 51784], "temperature": 0.0, "avg_logprob": -0.19386354569465883, + "compression_ratio": 1.7536764705882353, "no_speech_prob": 0.0020784453954547644}, + {"id": 274, "seek": 155840, "start": 1558.88, "end": 1563.52, "text": " so hybrid + search is one of these very overloaded terms exactly as you have kind of this is + kind", "tokens": [50388, 370, 13051, 3164, 307, 472, 295, 613, 588, 28777, 292, + 2115, 2293, 382, 291, 362, 733, 295, 341, 307, 733, 50620], "temperature": 0.0, + "avg_logprob": -0.13490611051036194, "compression_ratio": 1.888030888030888, "no_speech_prob": + 0.0010457593016326427}, {"id": 275, "seek": 155840, "start": 1563.52, "end": 1568.4, + "text": " of sometimes what people mean. 
Sometimes people do they they smuggle metadata + filtering as a hybrid", "tokens": [50620, 295, 2171, 437, 561, 914, 13, 4803, 561, + 360, 436, 436, 899, 31726, 26603, 30822, 382, 257, 13051, 50864], "temperature": + 0.0, "avg_logprob": -0.13490611051036194, "compression_ratio": 1.888030888030888, + "no_speech_prob": 0.0010457593016326427}, {"id": 276, "seek": 155840, "start": 1568.4, + "end": 1573.6000000000001, "text": " search. Strictly speaking under my definition, + metadata filtering is a kind of hybrid search. It", "tokens": [50864, 3164, 13, + 745, 3740, 356, 4124, 833, 452, 7123, 11, 26603, 30822, 307, 257, 733, 295, 13051, + 3164, 13, 467, 51124], "temperature": 0.0, "avg_logprob": -0.13490611051036194, + "compression_ratio": 1.888030888030888, "no_speech_prob": 0.0010457593016326427}, + {"id": 277, "seek": 155840, "start": 1573.6000000000001, "end": 1579.0400000000002, + "text": " just sort of has extreme weights right like it''s weight one if it matches + and zero if it doesn''t.", "tokens": [51124, 445, 1333, 295, 575, 8084, 17443, 558, + 411, 309, 311, 3364, 472, 498, 309, 10676, 293, 4018, 498, 309, 1177, 380, 13, 51396], + "temperature": 0.0, "avg_logprob": -0.13490611051036194, "compression_ratio": 1.888030888030888, + "no_speech_prob": 0.0010457593016326427}, {"id": 278, "seek": 155840, "start": 1579.0400000000002, + "end": 1586.0, "text": " And so it''s kind of like a weighted hybrid search. 
You + can also do this kind of linear combination", "tokens": [51396, 400, 370, 309, 311, + 733, 295, 411, 257, 32807, 13051, 3164, 13, 509, 393, 611, 360, 341, 733, 295, 8213, + 6562, 51744], "temperature": 0.0, "avg_logprob": -0.13490611051036194, "compression_ratio": + 1.888030888030888, "no_speech_prob": 0.0010457593016326427}, {"id": 279, "seek": + 158600, "start": 1586.0, "end": 1591.04, "text": " hybrid search right like I have + a BM 25 keyword type a ranker which by the way, rockset can do like", "tokens": + [50364, 13051, 3164, 558, 411, 286, 362, 257, 15901, 3552, 20428, 2010, 257, 6181, + 260, 597, 538, 264, 636, 11, 10989, 302, 393, 360, 411, 50616], "temperature": 0.0, + "avg_logprob": -0.18300769774894404, "compression_ratio": 1.7870036101083033, "no_speech_prob": + 0.003641925984993577}, {"id": 280, "seek": 158600, "start": 1591.04, "end": 1596.24, + "text": " rockset has this rockset you can build you can do this order by keyword + ranking limit 10 like you", "tokens": [50616, 10989, 302, 575, 341, 10989, 302, + 291, 393, 1322, 291, 393, 360, 341, 1668, 538, 20428, 17833, 4948, 1266, 411, 291, + 50876], "temperature": 0.0, "avg_logprob": -0.18300769774894404, "compression_ratio": + 1.7870036101083033, "no_speech_prob": 0.003641925984993577}, {"id": 281, "seek": + 158600, "start": 1596.24, "end": 1601.52, "text": " could write that. And then you + can also then do the vector limit 10 like show me the 10 closest", "tokens": [50876, + 727, 2464, 300, 13, 400, 550, 291, 393, 611, 550, 360, 264, 8062, 4948, 1266, 411, + 855, 385, 264, 1266, 13699, 51140], "temperature": 0.0, "avg_logprob": -0.18300769774894404, + "compression_ratio": 1.7870036101083033, "no_speech_prob": 0.003641925984993577}, + {"id": 282, "seek": 158600, "start": 1601.52, "end": 1609.84, "text": " vectors. 
+ There''s nothing stopping you from saying or you know order by 0.25 of that plus + 0.75 of that", "tokens": [51140, 18875, 13, 821, 311, 1825, 12767, 291, 490, 1566, + 420, 291, 458, 1668, 538, 1958, 13, 6074, 295, 300, 1804, 1958, 13, 11901, 295, + 300, 51556], "temperature": 0.0, "avg_logprob": -0.18300769774894404, "compression_ratio": + 1.7870036101083033, "no_speech_prob": 0.003641925984993577}, {"id": 283, "seek": + 158600, "start": 1609.84, "end": 1615.52, "text": " for example, in your in your + example. So that kind of linear combination hybrid search is is doable", "tokens": + [51556, 337, 1365, 11, 294, 428, 294, 428, 1365, 13, 407, 300, 733, 295, 8213, 6562, + 13051, 3164, 307, 307, 41183, 51840], "temperature": 0.0, "avg_logprob": -0.18300769774894404, + "compression_ratio": 1.7870036101083033, "no_speech_prob": 0.003641925984993577}, + {"id": 284, "seek": 161552, "start": 1615.52, "end": 1619.04, "text": " like that + that''s how that''s how you could do rockset. Sorry, that''s how you can do that + kind of", "tokens": [50364, 411, 300, 300, 311, 577, 300, 311, 577, 291, 727, 360, + 10989, 302, 13, 4919, 11, 300, 311, 577, 291, 393, 360, 300, 733, 295, 50540], "temperature": + 0.0, "avg_logprob": -0.11747104030544475, "compression_ratio": 1.8365019011406845, + "no_speech_prob": 0.0003555835864972323}, {"id": 285, "seek": 161552, "start": 1619.04, + "end": 1624.4, "text": " hybrid search on rockset today. 
Now you people do do slightly + more advanced things than this by", "tokens": [50540, 13051, 3164, 322, 10989, 302, + 965, 13, 823, 291, 561, 360, 360, 4748, 544, 7339, 721, 813, 341, 538, 50808], "temperature": + 0.0, "avg_logprob": -0.11747104030544475, "compression_ratio": 1.8365019011406845, + "no_speech_prob": 0.0003555835864972323}, {"id": 286, "seek": 161552, "start": 1624.4, + "end": 1628.72, "text": " the way there are like you can go beyond that in hybrid + search and get into things like by encoding", "tokens": [50808, 264, 636, 456, 366, + 411, 291, 393, 352, 4399, 300, 294, 13051, 3164, 293, 483, 666, 721, 411, 538, 43430, + 51024], "temperature": 0.0, "avg_logprob": -0.11747104030544475, "compression_ratio": + 1.8365019011406845, "no_speech_prob": 0.0003555835864972323}, {"id": 287, "seek": + 161552, "start": 1628.72, "end": 1635.68, "text": " and crossing coding where you + really do try to take the the expanded vector space and treat it", "tokens": [51024, + 293, 14712, 17720, 689, 291, 534, 360, 853, 281, 747, 264, 264, 14342, 8062, 1901, + 293, 2387, 309, 51372], "temperature": 0.0, "avg_logprob": -0.11747104030544475, + "compression_ratio": 1.8365019011406845, "no_speech_prob": 0.0003555835864972323}, + {"id": 288, "seek": 161552, "start": 1635.68, "end": 1641.76, "text": " non-linearly + so it''s no longer a linear combination of the two halves. And we''ve this is this + is", "tokens": [51372, 2107, 12, 28263, 356, 370, 309, 311, 572, 2854, 257, 8213, + 6562, 295, 264, 732, 38490, 13, 400, 321, 600, 341, 307, 341, 307, 51676], "temperature": + 0.0, "avg_logprob": -0.11747104030544475, "compression_ratio": 1.8365019011406845, + "no_speech_prob": 0.0003555835864972323}, {"id": 289, "seek": 164176, "start": 1641.76, + "end": 1648.96, "text": " something we are actively looking at. 
I don''t so it''s + I don''t think it''s hard to add like it''s", "tokens": [50364, 746, 321, 366, 13022, + 1237, 412, 13, 286, 500, 380, 370, 309, 311, 286, 500, 380, 519, 309, 311, 1152, + 281, 909, 411, 309, 311, 50724], "temperature": 0.0, "avg_logprob": -0.10175528064850838, + "compression_ratio": 1.9516129032258065, "no_speech_prob": 0.0010528104612603784}, + {"id": 290, "seek": 164176, "start": 1648.96, "end": 1653.76, "text": " easy extension + onto onto onto the current system but it''s more of like a science question like", + "tokens": [50724, 1858, 10320, 3911, 3911, 3911, 264, 2190, 1185, 457, 309, 311, + 544, 295, 411, 257, 3497, 1168, 411, 50964], "temperature": 0.0, "avg_logprob": + -0.10175528064850838, "compression_ratio": 1.9516129032258065, "no_speech_prob": + 0.0010528104612603784}, {"id": 291, "seek": 164176, "start": 1653.76, "end": 1658.96, + "text": " it''s more of like if you tell me what to add I''ll add it sure that''s + easy but it''s like what do we", "tokens": [50964, 309, 311, 544, 295, 411, 498, + 291, 980, 385, 437, 281, 909, 286, 603, 909, 309, 988, 300, 311, 1858, 457, 309, + 311, 411, 437, 360, 321, 51224], "temperature": 0.0, "avg_logprob": -0.10175528064850838, + "compression_ratio": 1.9516129032258065, "no_speech_prob": 0.0010528104612603784}, + {"id": 292, "seek": 164176, "start": 1658.96, "end": 1662.32, "text": " add like + what''s the right crossing code I don''t know that''s a much harder problem that''s + like much", "tokens": [51224, 909, 411, 437, 311, 264, 558, 14712, 3089, 286, 500, + 380, 458, 300, 311, 257, 709, 6081, 1154, 300, 311, 411, 709, 51392], "temperature": + 0.0, "avg_logprob": -0.10175528064850838, "compression_ratio": 1.9516129032258065, + "no_speech_prob": 0.0010528104612603784}, {"id": 293, "seek": 164176, "start": 1662.32, + "end": 1667.52, "text": " more of a scientific question in terms of like do I need + to train an encoder for your particular", "tokens": [51392, 544, 295, 257, 8134, + 1168, 294, 
2115, 295, 411, 360, 286, 643, 281, 3847, 364, 2058, 19866, 337, 428, + 1729, 51652], "temperature": 0.0, "avg_logprob": -0.10175528064850838, "compression_ratio": + 1.9516129032258065, "no_speech_prob": 0.0010528104612603784}, {"id": 294, "seek": + 166752, "start": 1667.68, "end": 1671.36, "text": " use case is there such a thing + as a good off the shelf one right so that''s that''s kind of where", "tokens": [50372, + 764, 1389, 307, 456, 1270, 257, 551, 382, 257, 665, 766, 264, 15222, 472, 558, 370, + 300, 311, 300, 311, 733, 295, 689, 50556], "temperature": 0.0, "avg_logprob": -0.15393006088387254, + "compression_ratio": 1.8795180722891567, "no_speech_prob": 0.004065278917551041}, + {"id": 295, "seek": 166752, "start": 1671.36, "end": 1675.12, "text": " we''re at + with this but but in terms of adding that functionality that is like an active you''ve", + "tokens": [50556, 321, 434, 412, 365, 341, 457, 457, 294, 2115, 295, 5127, 300, + 14980, 300, 307, 411, 364, 4967, 291, 600, 50744], "temperature": 0.0, "avg_logprob": + -0.15393006088387254, "compression_ratio": 1.8795180722891567, "no_speech_prob": + 0.004065278917551041}, {"id": 296, "seek": 166752, "start": 1675.12, "end": 1678.8799999999999, + "text": " you''ve this is the this is the frontier right now for us that''s for + those people that they''re", "tokens": [50744, 291, 600, 341, 307, 264, 341, 307, + 264, 35853, 558, 586, 337, 505, 300, 311, 337, 729, 561, 300, 436, 434, 50932], + "temperature": 0.0, "avg_logprob": -0.15393006088387254, "compression_ratio": 1.8795180722891567, + "no_speech_prob": 0.004065278917551041}, {"id": 297, "seek": 166752, "start": 1678.8799999999999, + "end": 1687.92, "text": " trying to go beyond the kind of bilinear yeah wait yeah + another thing yeah bilinear is it''s", "tokens": [50932, 1382, 281, 352, 4399, 264, + 733, 295, 8588, 533, 289, 1338, 1699, 1338, 1071, 551, 1338, 8588, 533, 289, 307, + 309, 311, 51384], "temperature": 0.0, "avg_logprob": -0.15393006088387254, 
"compression_ratio": + 1.8795180722891567, "no_speech_prob": 0.004065278917551041}, {"id": 298, "seek": + 166752, "start": 1687.92, "end": 1692.24, "text": " it''s amazing maybe you can + share some resources as well for for me and the audience to read", "tokens": [51384, + 309, 311, 2243, 1310, 291, 393, 2073, 512, 3593, 382, 731, 337, 337, 385, 293, 264, + 4034, 281, 1401, 51600], "temperature": 0.0, "avg_logprob": -0.15393006088387254, + "compression_ratio": 1.8795180722891567, "no_speech_prob": 0.004065278917551041}, + {"id": 299, "seek": 169224, "start": 1692.56, "end": 1699.36, "text": " but also + I thought you know when hybrid search sort of topic emerged right in in the vector", + "tokens": [50380, 457, 611, 286, 1194, 291, 458, 562, 13051, 3164, 1333, 295, 4829, + 20178, 558, 294, 294, 264, 8062, 50720], "temperature": 0.0, "avg_logprob": -0.24188362373100533, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.003579241456463933}, + {"id": 300, "seek": 169224, "start": 1699.36, "end": 1707.84, "text": " database + world you know bb8 pine cone milbus what you''re like and so on I think one thing + that was", "tokens": [50720, 8149, 1002, 291, 458, 272, 65, 23, 15113, 19749, 1962, + 21441, 437, 291, 434, 411, 293, 370, 322, 286, 519, 472, 551, 300, 390, 51144], + "temperature": 0.0, "avg_logprob": -0.24188362373100533, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.003579241456463933}, {"id": 301, "seek": 169224, "start": 1707.84, + "end": 1715.1200000000001, "text": " overlooked and I really wanted to tap into + that at some point is to learn the alpha rise because", "tokens": [51144, 32269, + 293, 286, 534, 1415, 281, 5119, 666, 300, 412, 512, 935, 307, 281, 1466, 264, 8961, + 6272, 570, 51508], "temperature": 0.0, "avg_logprob": -0.24188362373100533, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.003579241456463933}, {"id": 302, "seek": + 169224, "start": 1715.1200000000001, "end": 1720.24, "text": " it''s not 
given like + how should you come if you go with linear combination you know what should be", + "tokens": [51508, 309, 311, 406, 2212, 411, 577, 820, 291, 808, 498, 291, 352, 365, + 8213, 6562, 291, 458, 437, 820, 312, 51764], "temperature": 0.0, "avg_logprob": + -0.24188362373100533, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.003579241456463933}, {"id": 303, "seek": 172024, "start": 1720.24, "end": 1728.88, + "text": " the alpha for your data yeah it''s fun yeah this is what''s certain like + the search community has", "tokens": [50364, 264, 8961, 337, 428, 1412, 1338, 309, + 311, 1019, 1338, 341, 307, 437, 311, 1629, 411, 264, 3164, 1768, 575, 50796], "temperature": + 0.0, "avg_logprob": -0.14784541220035194, "compression_ratio": 1.9173553719008265, + "no_speech_prob": 0.0009606542298570275}, {"id": 304, "seek": 172024, "start": 1728.88, + "end": 1733.68, "text": " been doing things like this for a long time like search + people are quite familiar with this idea of", "tokens": [50796, 668, 884, 721, 411, + 341, 337, 257, 938, 565, 411, 3164, 561, 366, 1596, 4963, 365, 341, 1558, 295, 51036], + "temperature": 0.0, "avg_logprob": -0.14784541220035194, "compression_ratio": 1.9173553719008265, + "no_speech_prob": 0.0009606542298570275}, {"id": 305, "seek": 172024, "start": 1734.4, + "end": 1738.72, "text": " I have a semantic search system I have a keyword ranking + system I have an alpha", "tokens": [51072, 286, 362, 257, 47982, 3164, 1185, 286, + 362, 257, 20428, 17833, 1185, 286, 362, 364, 8961, 51288], "temperature": 0.0, "avg_logprob": + -0.14784541220035194, "compression_ratio": 1.9173553719008265, "no_speech_prob": + 0.0009606542298570275}, {"id": 306, "seek": 172024, "start": 1739.36, "end": 1743.6, + "text": " and I''ve learned I learned that alpha and I inject it into my system + and then they''ve even", "tokens": [51320, 293, 286, 600, 3264, 286, 3264, 300, + 8961, 293, 286, 10711, 309, 666, 452, 1185, 293, 550, 436, 600, 754, 51532], 
"temperature": + 0.0, "avg_logprob": -0.14784541220035194, "compression_ratio": 1.9173553719008265, + "no_speech_prob": 0.0009606542298570275}, {"id": 307, "seek": 172024, "start": 1743.6, + "end": 1747.2, "text": " gone farther like search has this whole like wand idea + people I don''t know if people are familiar", "tokens": [51532, 2780, 20344, 411, + 3164, 575, 341, 1379, 411, 14304, 1558, 561, 286, 500, 380, 458, 498, 561, 366, + 4963, 51712], "temperature": 0.0, "avg_logprob": -0.14784541220035194, "compression_ratio": + 1.9173553719008265, "no_speech_prob": 0.0009606542298570275}, {"id": 308, "seek": + 174720, "start": 1747.2, "end": 1753.04, "text": " so again we have to have all + these community like weekend weekend exactly right joe bersham listening to", "tokens": + [50364, 370, 797, 321, 362, 281, 362, 439, 613, 1768, 411, 6711, 6711, 2293, 558, + 1488, 68, 272, 433, 4822, 4764, 281, 50656], "temperature": 0.0, "avg_logprob": + -0.194007044253142, "compression_ratio": 1.9094488188976377, "no_speech_prob": 0.005349722225219011}, + {"id": 309, "seek": 174720, "start": 1753.04, "end": 1757.92, "text": " this podcast + will probably say yeah I know what you''re talking about yeah yeah yeah so it''s + funny", "tokens": [50656, 341, 7367, 486, 1391, 584, 1338, 286, 458, 437, 291, 434, + 1417, 466, 1338, 1338, 1338, 370, 309, 311, 4074, 50900], "temperature": 0.0, "avg_logprob": + -0.194007044253142, "compression_ratio": 1.9094488188976377, "no_speech_prob": 0.005349722225219011}, + {"id": 310, "seek": 174720, "start": 1757.92, "end": 1763.8400000000001, "text": + " because like the vector community is sort of like I mean it''s not it''s not rehashing + it''s not", "tokens": [50900, 570, 411, 264, 8062, 1768, 307, 1333, 295, 411, 286, + 914, 309, 311, 406, 309, 311, 406, 22355, 11077, 309, 311, 406, 51196], "temperature": + 0.0, "avg_logprob": -0.194007044253142, "compression_ratio": 1.9094488188976377, + "no_speech_prob": 0.005349722225219011}, {"id": 311, 
"seek": 174720, "start": 1763.8400000000001, + "end": 1767.76, "text": " relearning because it''s got this new thing this a and + n things what''s got to like drag a and", "tokens": [51196, 2951, 2341, 570, 309, + 311, 658, 341, 777, 551, 341, 257, 293, 297, 721, 437, 311, 658, 281, 411, 5286, + 257, 293, 51392], "temperature": 0.0, "avg_logprob": -0.194007044253142, "compression_ratio": + 1.9094488188976377, "no_speech_prob": 0.005349722225219011}, {"id": 312, "seek": + 174720, "start": 1767.76, "end": 1773.2, "text": " through the search history of + like things these other kinds of things so yes so so learning the", "tokens": [51392, + 807, 264, 3164, 2503, 295, 411, 721, 613, 661, 3685, 295, 721, 370, 2086, 370, 370, + 2539, 264, 51664], "temperature": 0.0, "avg_logprob": -0.194007044253142, "compression_ratio": + 1.9094488188976377, "no_speech_prob": 0.005349722225219011}, {"id": 313, "seek": + 177320, "start": 1773.2, "end": 1779.92, "text": " ant parameter um this is not + a particularly hard thing to do using rock set but it''s not a thing we", "tokens": + [50364, 2511, 13075, 1105, 341, 307, 406, 257, 4098, 1152, 551, 281, 360, 1228, + 3727, 992, 457, 309, 311, 406, 257, 551, 321, 50700], "temperature": 0.0, "avg_logprob": + -0.14020461710090312, "compression_ratio": 2.0041152263374484, "no_speech_prob": + 0.002518066205084324}, {"id": 314, "seek": 177320, "start": 1780.56, "end": 1786.8, + "text": " we don''t we don''t help you like I don''t have a button you push to to + automatically learn your", "tokens": [50732, 321, 500, 380, 321, 500, 380, 854, + 291, 411, 286, 500, 380, 362, 257, 2960, 291, 2944, 281, 281, 6772, 1466, 428, 51044], + "temperature": 0.0, "avg_logprob": -0.14020461710090312, "compression_ratio": 2.0041152263374484, + "no_speech_prob": 0.002518066205084324}, {"id": 315, "seek": 177320, "start": 1786.8, + "end": 1791.1200000000001, "text": " your alpha like you can send me whatever query + you want it can have whatever alpha in it you 
want", "tokens": [51044, 428, 8961, + 411, 291, 393, 2845, 385, 2035, 14581, 291, 528, 309, 393, 362, 2035, 8961, 294, + 309, 291, 528, 51260], "temperature": 0.0, "avg_logprob": -0.14020461710090312, + "compression_ratio": 2.0041152263374484, "no_speech_prob": 0.002518066205084324}, + {"id": 316, "seek": 177320, "start": 1791.1200000000001, "end": 1796.32, "text": + " you can build whatever system you can query us arbitrarily to generate an alpha + however you''d like", "tokens": [51260, 291, 393, 1322, 2035, 1185, 291, 393, 14581, + 505, 19071, 3289, 281, 8460, 364, 8961, 4461, 291, 1116, 411, 51520], "temperature": + 0.0, "avg_logprob": -0.14020461710090312, "compression_ratio": 2.0041152263374484, + "no_speech_prob": 0.002518066205084324}, {"id": 317, "seek": 177320, "start": 1796.32, + "end": 1800.56, "text": " and and then send me the queries with the alpha you you + you you you dreaded and ready but that''s", "tokens": [51520, 293, 293, 550, 2845, + 385, 264, 24109, 365, 264, 8961, 291, 291, 291, 291, 291, 22236, 292, 293, 1919, + 457, 300, 311, 51732], "temperature": 0.0, "avg_logprob": -0.14020461710090312, + "compression_ratio": 2.0041152263374484, "no_speech_prob": 0.002518066205084324}, + {"id": 318, "seek": 180056, "start": 1800.56, "end": 1805.44, "text": " that''s + roughly how that''s going to look today for us yeah do you think at all maybe picking + a", "tokens": [50364, 300, 311, 9810, 577, 300, 311, 516, 281, 574, 965, 337, 505, + 1338, 360, 291, 519, 412, 439, 1310, 8867, 257, 50608], "temperature": 0.0, "avg_logprob": + -0.15571122030610018, "compression_ratio": 1.836, "no_speech_prob": 0.002275618491694331}, + {"id": 319, "seek": 180056, "start": 1805.44, "end": 1811.52, "text": " little bit + into the future and sort of inspiration do you think at all that the industry you + iin", "tokens": [50608, 707, 857, 666, 264, 2027, 293, 1333, 295, 10249, 360, 291, + 519, 412, 439, 300, 264, 3518, 291, 741, 259, 50912], "temperature": 0.0, 
"avg_logprob": + -0.15571122030610018, "compression_ratio": 1.836, "no_speech_prob": 0.002275618491694331}, + {"id": 320, "seek": 180056, "start": 1812.1599999999999, "end": 1817.44, "text": + " one day will end up suggesting these values to their users you know learning from + the data and", "tokens": [50944, 472, 786, 486, 917, 493, 18094, 613, 4190, 281, + 641, 5022, 291, 458, 2539, 490, 264, 1412, 293, 51208], "temperature": 0.0, "avg_logprob": + -0.15571122030610018, "compression_ratio": 1.836, "no_speech_prob": 0.002275618491694331}, + {"id": 321, "seek": 180056, "start": 1817.44, "end": 1822.24, "text": " sort of + maybe even like you know looking at how things behave you know introduction", "tokens": + [51208, 1333, 295, 1310, 754, 411, 291, 458, 1237, 412, 577, 721, 15158, 291, 458, + 9339, 51448], "temperature": 0.0, "avg_logprob": -0.15571122030610018, "compression_ratio": + 1.836, "no_speech_prob": 0.002275618491694331}, {"id": 322, "seek": 180056, "start": + 1822.72, "end": 1827.6799999999998, "text": " what do people click although yeah + there is a risk of going too much into the application", "tokens": [51472, 437, + 360, 561, 2052, 4878, 1338, 456, 307, 257, 3148, 295, 516, 886, 709, 666, 264, 3861, + 51720], "temperature": 0.0, "avg_logprob": -0.15571122030610018, "compression_ratio": + 1.836, "no_speech_prob": 0.002275618491694331}, {"id": 323, "seek": 182768, "start": + 1828.24, "end": 1835.76, "text": " logic which you probably do not want to do but + I know I my view is kind of like", "tokens": [50392, 9952, 597, 291, 1391, 360, + 406, 528, 281, 360, 457, 286, 458, 286, 452, 1910, 307, 733, 295, 411, 50768], "temperature": + 0.0, "avg_logprob": -0.10111207740251409, "compression_ratio": 1.7788461538461537, + "no_speech_prob": 0.0014382046647369862}, {"id": 324, "seek": 182768, "start": 1836.5600000000002, + "end": 1842.16, "text": " so once upon a time I had a similar feeling this reminds + me of a similar discussion that didn''t", "tokens": 
[50808, 370, 1564, 3564, 257, + 565, 286, 632, 257, 2531, 2633, 341, 12025, 385, 295, 257, 2531, 5017, 300, 994, + 380, 51088], "temperature": 0.0, "avg_logprob": -0.10111207740251409, "compression_ratio": + 1.7788461538461537, "no_speech_prob": 0.0014382046647369862}, {"id": 325, "seek": + 182768, "start": 1842.16, "end": 1848.0800000000002, "text": " happen that long + ago which was like around feature stores like database people looked at feature", + "tokens": [51088, 1051, 300, 938, 2057, 597, 390, 411, 926, 4111, 9512, 411, 8149, + 561, 2956, 412, 4111, 51384], "temperature": 0.0, "avg_logprob": -0.10111207740251409, + "compression_ratio": 1.7788461538461537, "no_speech_prob": 0.0014382046647369862}, + {"id": 326, "seek": 182768, "start": 1848.0800000000002, "end": 1852.16, "text": + " stores and we''re like what do you need a feature store for like you just use + a database store for", "tokens": [51384, 9512, 293, 321, 434, 411, 437, 360, 291, + 643, 257, 4111, 3531, 337, 411, 291, 445, 764, 257, 8149, 3531, 337, 51588], "temperature": + 0.0, "avg_logprob": -0.10111207740251409, "compression_ratio": 1.7788461538461537, + "no_speech_prob": 0.0014382046647369862}, {"id": 327, "seek": 185216, "start": 1852.16, + "end": 1858.64, "text": " features and the reality is most feature stores that''s + what they are they are databases that on top", "tokens": [50364, 4122, 293, 264, + 4103, 307, 881, 4111, 9512, 300, 311, 437, 436, 366, 436, 366, 22380, 300, 322, + 1192, 50688], "temperature": 0.0, "avg_logprob": -0.22868998297329607, "compression_ratio": + 1.9367588932806323, "no_speech_prob": 0.0004790632228832692}, {"id": 328, "seek": + 185216, "start": 1858.64, "end": 1862.8000000000002, "text": " of them put a lot + of things to help manage as a first class citizen the lifetime of a feature", "tokens": + [50688, 295, 552, 829, 257, 688, 295, 721, 281, 854, 3067, 382, 257, 700, 1508, + 13326, 264, 11364, 295, 257, 4111, 50896], "temperature": 0.0, "avg_logprob": 
-0.22868998297329607, + "compression_ratio": 1.9367588932806323, "no_speech_prob": 0.0004790632228832692}, + {"id": 329, "seek": 185216, "start": 1864.16, "end": 1869.76, "text": " like orchestration + platforms like like techton and he''s like orc orc orcust and that orchestration + systems", "tokens": [50964, 411, 14161, 2405, 9473, 411, 411, 7553, 1756, 293, 415, + 311, 411, 420, 66, 420, 66, 420, 66, 381, 293, 300, 14161, 2405, 3652, 51244], "temperature": + 0.0, "avg_logprob": -0.22868998297329607, "compression_ratio": 1.9367588932806323, + "no_speech_prob": 0.0004790632228832692}, {"id": 330, "seek": 185216, "start": 1869.76, + "end": 1875.3600000000001, "text": " that''s that''s what you''re I think it no + matter what there''ll always be a database in there", "tokens": [51244, 300, 311, + 300, 311, 437, 291, 434, 286, 519, 309, 572, 1871, 437, 456, 603, 1009, 312, 257, + 8149, 294, 456, 51524], "temperature": 0.0, "avg_logprob": -0.22868998297329607, + "compression_ratio": 1.9367588932806323, "no_speech_prob": 0.0004790632228832692}, + {"id": 331, "seek": 185216, "start": 1875.3600000000001, "end": 1879.28, "text": + " and something like rockset will be in there and the question of whether or not + rockset the company", "tokens": [51524, 293, 746, 411, 10989, 302, 486, 312, 294, + 456, 293, 264, 1168, 295, 1968, 420, 406, 10989, 302, 264, 2237, 51720], "temperature": + 0.0, "avg_logprob": -0.22868998297329607, "compression_ratio": 1.9367588932806323, + "no_speech_prob": 0.0004790632228832692}, {"id": 332, "seek": 187928, "start": 1879.36, + "end": 1883.68, "text": " is like a larger piece of software that has rockset the + database and some orchestration layers", "tokens": [50368, 307, 411, 257, 4833, + 2522, 295, 4722, 300, 575, 10989, 302, 264, 8149, 293, 512, 14161, 2405, 7914, 50584], + "temperature": 0.0, "avg_logprob": -0.0850755724796029, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.0013353492831811309}, {"id": 333, "seek": 187928, 
"start": 1883.68, + "end": 1887.84, "text": " above it to help you do these kinds of things that''s + a harder question I think that", "tokens": [50584, 3673, 309, 281, 854, 291, 360, + 613, 3685, 295, 721, 300, 311, 257, 6081, 1168, 286, 519, 300, 50792], "temperature": + 0.0, "avg_logprob": -0.0850755724796029, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.0013353492831811309}, {"id": 334, "seek": 187928, "start": 1888.72, + "end": 1894.08, "text": " if you ask me to make a prediction about where things + are going my guess is that for the foreseeable", "tokens": [50836, 498, 291, 1029, + 385, 281, 652, 257, 17630, 466, 689, 721, 366, 516, 452, 2041, 307, 300, 337, 264, + 38736, 712, 51104], "temperature": 0.0, "avg_logprob": -0.0850755724796029, "compression_ratio": + 1.6919642857142858, "no_speech_prob": 0.0013353492831811309}, {"id": 335, "seek": + 187928, "start": 1894.08, "end": 1902.56, "text": " future hybrid searches king + of some kind so very few problems will be purely vector search I that''s", "tokens": + [51104, 2027, 13051, 26701, 4867, 295, 512, 733, 370, 588, 1326, 2740, 486, 312, + 17491, 8062, 3164, 286, 300, 311, 51528], "temperature": 0.0, "avg_logprob": -0.0850755724796029, + "compression_ratio": 1.6919642857142858, "no_speech_prob": 0.0013353492831811309}, + {"id": 336, "seek": 190256, "start": 1902.56, "end": 1909.76, "text": " my guess + almost all will be will we greatly benefited by some form of hybridization even + if it''s", "tokens": [50364, 452, 2041, 1920, 439, 486, 312, 486, 321, 14147, 33605, + 538, 512, 1254, 295, 13051, 2144, 754, 498, 309, 311, 50724], "temperature": 0.0, + "avg_logprob": -0.11390698674213455, "compression_ratio": 1.668103448275862, "no_speech_prob": + 0.00033994796103797853}, {"id": 337, "seek": 190256, "start": 1909.76, "end": 1918.3999999999999, + "text": " just metadata filtering um and then that means that the more advanced + search techniques that will", "tokens": [50724, 445, 26603, 
30822, 1105, 293, 550, + 300, 1355, 300, 264, 544, 7339, 3164, 7512, 300, 486, 51156], "temperature": 0.0, + "avg_logprob": -0.11390698674213455, "compression_ratio": 1.668103448275862, "no_speech_prob": + 0.00033994796103797853}, {"id": 338, "seek": 190256, "start": 1918.3999999999999, + "end": 1923.2, "text": " slowly migrate over which means things like alpha learning + and weak and and all these other kinds", "tokens": [51156, 5692, 31821, 670, 597, + 1355, 721, 411, 8961, 2539, 293, 5336, 293, 293, 439, 613, 661, 3685, 51396], "temperature": + 0.0, "avg_logprob": -0.11390698674213455, "compression_ratio": 1.668103448275862, + "no_speech_prob": 0.00033994796103797853}, {"id": 339, "seek": 190256, "start": + 1923.2, "end": 1929.6, "text": " of higher level two-stage retrieval type ideas + that that come from the search world I do think", "tokens": [51396, 295, 2946, 1496, + 732, 12, 17882, 19817, 3337, 2010, 3487, 300, 300, 808, 490, 264, 3164, 1002, 286, + 360, 519, 51716], "temperature": 0.0, "avg_logprob": -0.11390698674213455, "compression_ratio": + 1.668103448275862, "no_speech_prob": 0.00033994796103797853}, {"id": 340, "seek": + 192960, "start": 1929.6, "end": 1935.04, "text": " will come over and more and more + influence the vector search world because the vector search", "tokens": [50364, + 486, 808, 670, 293, 544, 293, 544, 6503, 264, 8062, 3164, 1002, 570, 264, 8062, + 3164, 50636], "temperature": 0.0, "avg_logprob": -0.1513976331027049, "compression_ratio": + 1.687719298245614, "no_speech_prob": 0.004446651786565781}, {"id": 341, "seek": + 192960, "start": 1935.04, "end": 1939.76, "text": " ultimately is a form of search + so it shouldn''t be surprising that most of these same ideas are still", "tokens": + [50636, 6284, 307, 257, 1254, 295, 3164, 370, 309, 4659, 380, 312, 8830, 300, 881, + 295, 613, 912, 3487, 366, 920, 50872], "temperature": 0.0, "avg_logprob": -0.1513976331027049, + "compression_ratio": 1.687719298245614, "no_speech_prob": 
0.004446651786565781}, + {"id": 342, "seek": 192960, "start": 1939.76, "end": 1946.7199999999998, "text": + " still apply yeah for sure I mean there is this extreme example from Mark Cuban + the episode on", "tokens": [50872, 920, 3079, 1338, 337, 988, 286, 914, 456, 307, + 341, 8084, 1365, 490, 3934, 31547, 264, 3500, 322, 51220], "temperature": 0.0, "avg_logprob": + -0.1513976331027049, "compression_ratio": 1.687719298245614, "no_speech_prob": 0.004446651786565781}, + {"id": 343, "seek": 192960, "start": 1946.7199999999998, "end": 1953.04, "text": + " Lex Friedman podcast that they just finished listening to he says that probably + in the future all", "tokens": [51220, 24086, 17605, 1601, 7367, 300, 436, 445, 4335, + 4764, 281, 415, 1619, 300, 1391, 294, 264, 2027, 439, 51536], "temperature": 0.0, + "avg_logprob": -0.1513976331027049, "compression_ratio": 1.687719298245614, "no_speech_prob": + 0.004446651786565781}, {"id": 344, "seek": 192960, "start": 1953.04, "end": 1958.7199999999998, + "text": " of us will have their own LLAMs trained for whatever reason you know for + example you want to do", "tokens": [51536, 295, 505, 486, 362, 641, 1065, 441, 43, + 2865, 82, 8895, 337, 2035, 1778, 291, 458, 337, 1365, 291, 528, 281, 360, 51820], + "temperature": 0.0, "avg_logprob": -0.1513976331027049, "compression_ratio": 1.687719298245614, + "no_speech_prob": 0.004446651786565781}, {"id": 345, "seek": 195872, "start": 1958.72, + "end": 1964.56, "text": " stock trading and so you start you know draining your + model maybe on specific subset of stocks or", "tokens": [50364, 4127, 9529, 293, + 370, 291, 722, 291, 458, 42916, 428, 2316, 1310, 322, 2685, 25993, 295, 12966, 420, + 50656], "temperature": 0.0, "avg_logprob": -0.14325469605466154, "compression_ratio": + 1.7863636363636364, "no_speech_prob": 0.010126732289791107}, {"id": 346, "seek": + 195872, "start": 1964.56, "end": 1972.16, "text": " whatever and then it will help + you it will augment augment you as they say 
yeah as an entity yeah I", "tokens": + [50656, 2035, 293, 550, 309, 486, 854, 291, 309, 486, 29919, 29919, 291, 382, 436, + 584, 1338, 382, 364, 13977, 1338, 286, 51036], "temperature": 0.0, "avg_logprob": + -0.14325469605466154, "compression_ratio": 1.7863636363636364, "no_speech_prob": + 0.010126732289791107}, {"id": 347, "seek": 195872, "start": 1972.16, "end": 1978.72, + "text": " would love I would love for a chat GPT that could like put in making an + email sketch me a skeleton", "tokens": [51036, 576, 959, 286, 576, 959, 337, 257, + 5081, 26039, 51, 300, 727, 411, 829, 294, 1455, 364, 3796, 12325, 385, 257, 25204, + 51364], "temperature": 0.0, "avg_logprob": -0.14325469605466154, "compression_ratio": + 1.7863636363636364, "no_speech_prob": 0.010126732289791107}, {"id": 348, "seek": + 195872, "start": 1978.72, "end": 1983.84, "text": " email in my voice like the chat + GPT voice if I say hey write me an email to say this to somebody", "tokens": [51364, + 3796, 294, 452, 3177, 411, 264, 5081, 26039, 51, 3177, 498, 286, 584, 4177, 2464, + 385, 364, 3796, 281, 584, 341, 281, 2618, 51620], "temperature": 0.0, "avg_logprob": + -0.14325469605466154, "compression_ratio": 1.7863636363636364, "no_speech_prob": + 0.010126732289791107}, {"id": 349, "seek": 198384, "start": 1983.84, "end": 1988.1599999999999, + "text": " it''s not my voice right it''s a I don''t know it''s a little too corporate + my voice is a little bit more", "tokens": [50364, 309, 311, 406, 452, 3177, 558, + 309, 311, 257, 286, 500, 380, 458, 309, 311, 257, 707, 886, 10896, 452, 3177, 307, + 257, 707, 857, 544, 50580], "temperature": 0.0, "avg_logprob": -0.13255721803695436, + "compression_ratio": 1.996031746031746, "no_speech_prob": 0.01226211991161108}, + {"id": 350, "seek": 198384, "start": 1988.1599999999999, "end": 1994.48, "text": + " yeah so it''d be cool to have it like learn my voice and be able to you know write + me a skeleton that", "tokens": [50580, 1338, 370, 309, 1116, 312, 1627, 281, 
362, + 309, 411, 1466, 452, 3177, 293, 312, 1075, 281, 291, 458, 2464, 385, 257, 25204, + 300, 50896], "temperature": 0.0, "avg_logprob": -0.13255721803695436, "compression_ratio": + 1.996031746031746, "no_speech_prob": 0.01226211991161108}, {"id": 351, "seek": 198384, + "start": 1994.48, "end": 1999.52, "text": " of something that was like sounded like + me that would be that would be awesome I''ll be I''m there for", "tokens": [50896, + 295, 746, 300, 390, 411, 17714, 411, 385, 300, 576, 312, 300, 576, 312, 3476, 286, + 603, 312, 286, 478, 456, 337, 51148], "temperature": 0.0, "avg_logprob": -0.13255721803695436, + "compression_ratio": 1.996031746031746, "no_speech_prob": 0.01226211991161108}, + {"id": 352, "seek": 198384, "start": 1999.52, "end": 2005.9199999999998, "text": + " that yeah what I would like that some model or whatever it is would remind me + that I forgot to drink", "tokens": [51148, 300, 1338, 437, 286, 576, 411, 300, 512, + 2316, 420, 2035, 309, 307, 576, 4160, 385, 300, 286, 5298, 281, 2822, 51468], "temperature": + 0.0, "avg_logprob": -0.13255721803695436, "compression_ratio": 1.996031746031746, + "no_speech_prob": 0.01226211991161108}, {"id": 353, "seek": 198384, "start": 2005.9199999999998, + "end": 2010.3999999999999, "text": " water you know something like that so it learns + my habits and it knows that it''s bad for my health", "tokens": [51468, 1281, 291, + 458, 746, 411, 300, 370, 309, 27152, 452, 14100, 293, 309, 3255, 300, 309, 311, + 1578, 337, 452, 1585, 51692], "temperature": 0.0, "avg_logprob": -0.13255721803695436, + "compression_ratio": 1.996031746031746, "no_speech_prob": 0.01226211991161108}, + {"id": 354, "seek": 201040, "start": 2010.96, "end": 2015.52, "text": " you know + remember to do these remember to stand out remember to walk things like this you + know", "tokens": [50392, 291, 458, 1604, 281, 360, 613, 1604, 281, 1463, 484, 1604, + 281, 1792, 721, 411, 341, 291, 458, 50620], "temperature": 0.0, "avg_logprob": 
-0.20290459034054778, + "compression_ratio": 1.8019323671497585, "no_speech_prob": 0.0164946336299181}, + {"id": 355, "seek": 201040, "start": 2016.88, "end": 2023.52, "text": " I drank + some water that''s good everyone drinks water yes yeah please do because it''s very + healthy", "tokens": [50688, 286, 21011, 512, 1281, 300, 311, 665, 1518, 12142, 1281, + 2086, 1338, 1767, 360, 570, 309, 311, 588, 4627, 51020], "temperature": 0.0, "avg_logprob": + -0.20290459034054778, "compression_ratio": 1.8019323671497585, "no_speech_prob": + 0.0164946336299181}, {"id": 356, "seek": 201040, "start": 2023.52, "end": 2028.4, + "text": " you need to drink I guess two liters a day whatever some people do forget + this and then they say have", "tokens": [51020, 291, 643, 281, 2822, 286, 2041, + 732, 32323, 257, 786, 2035, 512, 561, 360, 2870, 341, 293, 550, 436, 584, 362, 51264], + "temperature": 0.0, "avg_logprob": -0.20290459034054778, "compression_ratio": 1.8019323671497585, + "no_speech_prob": 0.0164946336299181}, {"id": 357, "seek": 201040, "start": 2028.88, + "end": 2033.0400000000002, "text": " you know I have to take pink here or so whatever + no you don''t just drink water", "tokens": [51288, 291, 458, 286, 362, 281, 747, + 7022, 510, 420, 370, 2035, 572, 291, 500, 380, 445, 2822, 1281, 51496], "temperature": + 0.0, "avg_logprob": -0.20290459034054778, "compression_ratio": 1.8019323671497585, + "no_speech_prob": 0.0164946336299181}, {"id": 358, "seek": 203304, "start": 2033.2, + "end": 2043.12, "text": " but so what else do you want to share about Rockset you + know as an offering as a AI", "tokens": [50372, 457, 370, 437, 1646, 360, 291, 528, + 281, 2073, 466, 6922, 3854, 291, 458, 382, 364, 8745, 382, 257, 7318, 50868], "temperature": + 0.0, "avg_logprob": -0.23303544180733818, "compression_ratio": 1.6790123456790123, + "no_speech_prob": 0.00903274118900299}, {"id": 359, "seek": 203304, "start": 2044.08, + "end": 2050.32, "text": " enabler you know maybe do you guys 
plan to support rag + or do you think rag is sort of like", "tokens": [50916, 465, 455, 1918, 291, 458, + 1310, 360, 291, 1074, 1393, 281, 1406, 17539, 420, 360, 291, 519, 17539, 307, 1333, + 295, 411, 51228], "temperature": 0.0, "avg_logprob": -0.23303544180733818, "compression_ratio": + 1.6790123456790123, "no_speech_prob": 0.00903274118900299}, {"id": 360, "seek": + 203304, "start": 2050.32, "end": 2055.52, "text": " client side you know thing as + well that people can do you know using your tap things like that no", "tokens": + [51228, 6423, 1252, 291, 458, 551, 382, 731, 300, 561, 393, 360, 291, 458, 1228, + 428, 5119, 721, 411, 300, 572, 51488], "temperature": 0.0, "avg_logprob": -0.23303544180733818, + "compression_ratio": 1.6790123456790123, "no_speech_prob": 0.00903274118900299}, + {"id": 361, "seek": 205552, "start": 2056.16, "end": 2062.56, "text": " no we we + um we actually have a bunch of rag type style use cases on Rockset today and I do", + "tokens": [50396, 572, 321, 321, 1105, 321, 767, 362, 257, 3840, 295, 17539, 2010, + 3758, 764, 3331, 322, 6922, 3854, 965, 293, 286, 360, 50716], "temperature": 0.0, + "avg_logprob": -0.13970440312435753, "compression_ratio": 1.7522522522522523, "no_speech_prob": + 0.000697075854986906}, {"id": 362, "seek": 205552, "start": 2062.56, "end": 2068.0, + "text": " I do think Rockset naturally supports rag but it''s interesting so like + I guess one of the my kind", "tokens": [50716, 286, 360, 519, 6922, 3854, 8195, + 9346, 17539, 457, 309, 311, 1880, 370, 411, 286, 2041, 472, 295, 264, 452, 733, + 50988], "temperature": 0.0, "avg_logprob": -0.13970440312435753, "compression_ratio": + 1.7522522522522523, "no_speech_prob": 0.000697075854986906}, {"id": 363, "seek": + 205552, "start": 2068.0, "end": 2074.72, "text": " of open questions is pure rag + and I''m making up a term here but but but it is actually one of the very", "tokens": + [50988, 295, 1269, 1651, 307, 6075, 17539, 293, 286, 478, 1455, 493, 257, 1433, + 
510, 457, 457, 457, 309, 307, 767, 472, 295, 264, 588, 51324], "temperature": 0.0, + "avg_logprob": -0.13970440312435753, "compression_ratio": 1.7522522522522523, "no_speech_prob": + 0.000697075854986906}, {"id": 364, "seek": 205552, "start": 2074.72, "end": 2081.28, + "text": " few like almost perfect vector use cases in its pure vector search but + I''m actually not convinced", "tokens": [51324, 1326, 411, 1920, 2176, 8062, 764, + 3331, 294, 1080, 6075, 8062, 3164, 457, 286, 478, 767, 406, 12561, 51652], "temperature": + 0.0, "avg_logprob": -0.13970440312435753, "compression_ratio": 1.7522522522522523, + "no_speech_prob": 0.000697075854986906}, {"id": 365, "seek": 208128, "start": 2081.36, + "end": 2087.28, "text": " because even most of the people that we''re we know that + are doing like rag style things want", "tokens": [50368, 570, 754, 881, 295, 264, + 561, 300, 321, 434, 321, 458, 300, 366, 884, 411, 17539, 3758, 721, 528, 50664], + "temperature": 0.0, "avg_logprob": -0.09950314946921475, "compression_ratio": 1.7971698113207548, + "no_speech_prob": 0.0014873042237013578}, {"id": 366, "seek": 208128, "start": 2087.84, + "end": 2094.2400000000002, "text": " are also doing some amount of boosting and + or metadata filtering to like further augment like", "tokens": [50692, 366, 611, + 884, 512, 2372, 295, 43117, 293, 420, 26603, 30822, 281, 411, 3052, 29919, 411, + 51012], "temperature": 0.0, "avg_logprob": -0.09950314946921475, "compression_ratio": + 1.7971698113207548, "no_speech_prob": 0.0014873042237013578}, {"id": 367, "seek": + 208128, "start": 2094.2400000000002, "end": 2101.6800000000003, "text": " hybrid + augmented the retrieval that augments the generation um uh and so so for example + like hey", "tokens": [51012, 13051, 36155, 264, 19817, 3337, 300, 14501, 1117, 264, + 5125, 1105, 2232, 293, 370, 370, 337, 1365, 411, 4177, 51384], "temperature": 0.0, + "avg_logprob": -0.09950314946921475, "compression_ratio": 1.7971698113207548, "no_speech_prob": + 
0.0014873042237013578}, {"id": 368, "seek": 208128, "start": 2101.6800000000003, + "end": 2108.2400000000002, "text": " if the user asks about a certain thing when + you search for blurbs to augment the generation boost", "tokens": [51384, 498, 264, + 4195, 8962, 466, 257, 1629, 551, 562, 291, 3164, 337, 14257, 929, 281, 29919, 264, + 5125, 9194, 51712], "temperature": 0.0, "avg_logprob": -0.09950314946921475, "compression_ratio": + 1.7971698113207548, "no_speech_prob": 0.0014873042237013578}, {"id": 369, "seek": + 210824, "start": 2108.24, "end": 2112.4799999999996, "text": " the more recent ones + kind of a kind of a thing like there''s this kind of thing that gets injected", + "tokens": [50364, 264, 544, 5162, 2306, 733, 295, 257, 733, 295, 257, 551, 411, + 456, 311, 341, 733, 295, 551, 300, 2170, 36967, 50576], "temperature": 0.0, "avg_logprob": + -0.056673914194107056, "compression_ratio": 1.7853881278538812, "no_speech_prob": + 0.00019456676091067493}, {"id": 370, "seek": 210824, "start": 2112.4799999999996, + "end": 2120.3999999999996, "text": " into these systems um yeah so I''m I''m we + you can build this with Rockset today and I''m quite keen", "tokens": [50576, 666, + 613, 3652, 1105, 1338, 370, 286, 478, 286, 478, 321, 291, 393, 1322, 341, 365, 6922, + 3854, 965, 293, 286, 478, 1596, 20297, 50972], "temperature": 0.0, "avg_logprob": + -0.056673914194107056, "compression_ratio": 1.7853881278538812, "no_speech_prob": + 0.00019456676091067493}, {"id": 371, "seek": 210824, "start": 2120.3999999999996, + "end": 2126.9599999999996, "text": " on on these kinds of use cases I would say + that like like looking forward I''m I am quite interested", "tokens": [50972, 322, + 322, 613, 3685, 295, 764, 3331, 286, 576, 584, 300, 411, 411, 1237, 2128, 286, 478, + 286, 669, 1596, 3102, 51300], "temperature": 0.0, "avg_logprob": -0.056673914194107056, + "compression_ratio": 1.7853881278538812, "no_speech_prob": 0.00019456676091067493}, + {"id": 372, "seek": 210824, 
"start": 2126.9599999999996, "end": 2134.08, "text": + " in this kind of emerging dynamic of like where the real value is from here they''re + sort of like", "tokens": [51300, 294, 341, 733, 295, 14989, 8546, 295, 411, 689, + 264, 957, 2158, 307, 490, 510, 436, 434, 1333, 295, 411, 51656], "temperature": + 0.0, "avg_logprob": -0.056673914194107056, "compression_ratio": 1.7853881278538812, + "no_speech_prob": 0.00019456676091067493}, {"id": 373, "seek": 213408, "start": + 2134.08, "end": 2140.72, "text": " at least three dimensions things could go one + is like better and better an an algorithms that", "tokens": [50364, 412, 1935, 1045, + 12819, 721, 727, 352, 472, 307, 411, 1101, 293, 1101, 364, 364, 14642, 300, 50696], + "temperature": 0.0, "avg_logprob": -0.13713808946831282, "compression_ratio": 1.8037383177570094, + "no_speech_prob": 0.00011774120503105223}, {"id": 374, "seek": 213408, "start": + 2140.72, "end": 2148.3199999999997, "text": " squeeze more performance and more + scale and more whatever recall out of out of everything out of", "tokens": [50696, + 13578, 544, 3389, 293, 544, 4373, 293, 544, 2035, 9901, 484, 295, 484, 295, 1203, + 484, 295, 51076], "temperature": 0.0, "avg_logprob": -0.13713808946831282, "compression_ratio": + 1.8037383177570094, "no_speech_prob": 0.00011774120503105223}, {"id": 375, "seek": + 213408, "start": 2148.3199999999997, "end": 2153.84, "text": " every bite of RAM + and so forth and so on another direction is incrementability so a lot of these", + "tokens": [51076, 633, 7988, 295, 14561, 293, 370, 5220, 293, 370, 322, 1071, 3513, + 307, 26200, 2310, 370, 257, 688, 295, 613, 51352], "temperature": 0.0, "avg_logprob": + -0.13713808946831282, "compression_ratio": 1.8037383177570094, "no_speech_prob": + 0.00011774120503105223}, {"id": 376, "seek": 213408, "start": 2153.84, "end": 2160.0, + "text": " there''s a lot of a lot of these like really advanced really strong systems + sorry a n n indexes are", "tokens": [51352, 456, 
311, 257, 688, 295, 257, 688, 295, + 613, 411, 534, 7339, 534, 2068, 3652, 2597, 257, 297, 297, 8186, 279, 366, 51660], + "temperature": 0.0, "avg_logprob": -0.13713808946831282, "compression_ratio": 1.8037383177570094, + "no_speech_prob": 0.00011774120503105223}, {"id": 377, "seek": 216000, "start": + 2160.0, "end": 2166.96, "text": " not updateable easily so the sort of updateability + destroys a lot of what you just worked really hard", "tokens": [50364, 406, 5623, + 712, 3612, 370, 264, 1333, 295, 5623, 2310, 36714, 257, 688, 295, 437, 291, 445, + 2732, 534, 1152, 50712], "temperature": 0.0, "avg_logprob": -0.10931411411451257, + "compression_ratio": 1.8021978021978022, "no_speech_prob": 0.0004392066621221602}, + {"id": 378, "seek": 216000, "start": 2166.96, "end": 2172.72, "text": " to build + or you spend way too much CPU to do it so which is better like which which in on + in real", "tokens": [50712, 281, 1322, 420, 291, 3496, 636, 886, 709, 13199, 281, + 360, 309, 370, 597, 307, 1101, 411, 597, 597, 294, 322, 294, 957, 51000], "temperature": + 0.0, "avg_logprob": -0.10931411411451257, "compression_ratio": 1.8021978021978022, + "no_speech_prob": 0.0004392066621221602}, {"id": 379, "seek": 216000, "start": 2172.72, + "end": 2177.84, "text": " life which are the what I rather update twice as fast + or twice as painlessly or what I rather get", "tokens": [51000, 993, 597, 366, 264, + 437, 286, 2831, 5623, 6091, 382, 2370, 420, 6091, 382, 1822, 12048, 420, 437, 286, + 2831, 483, 51256], "temperature": 0.0, "avg_logprob": -0.10931411411451257, "compression_ratio": + 1.8021978021978022, "no_speech_prob": 0.0004392066621221602}, {"id": 380, "seek": + 216000, "start": 2177.84, "end": 2182.32, "text": " three and a half percent more + you know on my precision recall and then the third dimension is how", "tokens": + [51256, 1045, 293, 257, 1922, 3043, 544, 291, 458, 322, 452, 18356, 9901, 293, 550, + 264, 2636, 10139, 307, 577, 51480], "temperature": 0.0, 
"avg_logprob": -0.10931411411451257, + "compression_ratio": 1.8021978021978022, "no_speech_prob": 0.0004392066621221602}, + {"id": 381, "seek": 216000, "start": 2182.32, "end": 2188.4, "text": " do these + things integrate with other indexes right so certain a and n indexes are much better + at", "tokens": [51480, 360, 613, 721, 13365, 365, 661, 8186, 279, 558, 370, 1629, + 257, 293, 297, 8186, 279, 366, 709, 1101, 412, 51784], "temperature": 0.0, "avg_logprob": + -0.10931411411451257, "compression_ratio": 1.8021978021978022, "no_speech_prob": + 0.0004392066621221602}, {"id": 382, "seek": 218840, "start": 2188.4, "end": 2193.92, + "text": " doing meditative filter at scale than other ones are and so you know if + there''s more value in that", "tokens": [50364, 884, 1205, 14275, 6608, 412, 4373, + 813, 661, 2306, 366, 293, 370, 291, 458, 498, 456, 311, 544, 2158, 294, 300, 50640], + "temperature": 0.0, "avg_logprob": -0.13066453519074814, "compression_ratio": 1.866412213740458, + "no_speech_prob": 0.00030513416277244687}, {"id": 383, "seek": 218840, "start": + 2193.92, "end": 2200.2400000000002, "text": " than the 3% I got over here then I + so it''s not all together clear we are pretty heavily betting on", "tokens": [50640, + 813, 264, 805, 4, 286, 658, 670, 510, 550, 286, 370, 309, 311, 406, 439, 1214, 1850, + 321, 366, 1238, 10950, 34246, 322, 50956], "temperature": 0.0, "avg_logprob": -0.13066453519074814, + "compression_ratio": 1.866412213740458, "no_speech_prob": 0.00030513416277244687}, + {"id": 384, "seek": 218840, "start": 2200.2400000000002, "end": 2206.1600000000003, + "text": " the I shouldn''t say we''re betting on it like right now we we got the + hybrid stuff relatively easily", "tokens": [50956, 264, 286, 4659, 380, 584, 321, + 434, 34246, 322, 309, 411, 558, 586, 321, 321, 658, 264, 13051, 1507, 7226, 3612, + 51252], "temperature": 0.0, "avg_logprob": -0.13066453519074814, "compression_ratio": + 1.866412213740458, "no_speech_prob": 
0.00030513416277244687}, {"id": 385, "seek": + 218840, "start": 2206.1600000000003, "end": 2210.08, "text": " so that''s the thing + that we''re building heavily because all the all the hybridization has been", "tokens": + [51252, 370, 300, 311, 264, 551, 300, 321, 434, 2390, 10950, 570, 439, 264, 439, + 264, 13051, 2144, 575, 668, 51448], "temperature": 0.0, "avg_logprob": -0.13066453519074814, + "compression_ratio": 1.866412213740458, "no_speech_prob": 0.00030513416277244687}, + {"id": 386, "seek": 218840, "start": 2210.08, "end": 2216.0, "text": " and the the + incrementability because that''s core like so for us the incrementability is like + not", "tokens": [51448, 293, 264, 264, 26200, 2310, 570, 300, 311, 4965, 411, 370, + 337, 505, 264, 26200, 2310, 307, 411, 406, 51744], "temperature": 0.0, "avg_logprob": + -0.13066453519074814, "compression_ratio": 1.866412213740458, "no_speech_prob": + 0.00030513416277244687}, {"id": 387, "seek": 221600, "start": 2216.32, "end": 2220.24, + "text": " you have to have that I can''t use an an index that requires like overnight + training like that''s", "tokens": [50380, 291, 362, 281, 362, 300, 286, 393, 380, + 764, 364, 364, 8186, 300, 7029, 411, 13935, 3097, 411, 300, 311, 50576], "temperature": + 0.0, "avg_logprob": -0.12285808865114939, "compression_ratio": 2.0385964912280703, + "no_speech_prob": 0.00858211424201727}, {"id": 388, "seek": 221600, "start": 2220.24, + "end": 2224.88, "text": " not a thing that that rock that doesn''t work with rocks + x we were trying to be real time and then I", "tokens": [50576, 406, 257, 551, 300, + 300, 3727, 300, 1177, 380, 589, 365, 10989, 2031, 321, 645, 1382, 281, 312, 957, + 565, 293, 550, 286, 50808], "temperature": 0.0, "avg_logprob": -0.12285808865114939, + "compression_ratio": 2.0385964912280703, "no_speech_prob": 0.00858211424201727}, + {"id": 389, "seek": 221600, "start": 2224.88, "end": 2228.64, "text": " guess there''s + like one fourth dimension that could blow all this 
up which is that somehow the", + "tokens": [50808, 2041, 456, 311, 411, 472, 6409, 10139, 300, 727, 6327, 439, 341, + 493, 597, 307, 300, 6063, 264, 50996], "temperature": 0.0, "avg_logprob": -0.12285808865114939, + "compression_ratio": 2.0385964912280703, "no_speech_prob": 0.00858211424201727}, + {"id": 390, "seek": 221600, "start": 2228.64, "end": 2233.84, "text": " vectors + get so good that none of the rest of this matters like maybe there is maybe there + is no rag", "tokens": [50996, 18875, 483, 370, 665, 300, 6022, 295, 264, 1472, 295, + 341, 7001, 411, 1310, 456, 307, 1310, 456, 307, 572, 17539, 51256], "temperature": + 0.0, "avg_logprob": -0.12285808865114939, "compression_ratio": 2.0385964912280703, + "no_speech_prob": 0.00858211424201727}, {"id": 391, "seek": 221600, "start": 2233.84, + "end": 2238.48, "text": " maybe there is no like maybe the vectors just good enough + maybe the machine is smart enough that", "tokens": [51256, 1310, 456, 307, 572, + 411, 1310, 264, 18875, 445, 665, 1547, 1310, 264, 3479, 307, 4069, 1547, 300, 51488], + "temperature": 0.0, "avg_logprob": -0.12285808865114939, "compression_ratio": 2.0385964912280703, + "no_speech_prob": 0.00858211424201727}, {"id": 392, "seek": 221600, "start": 2238.48, + "end": 2242.32, "text": " we don''t need any of the rest of this we don''t need + any hybrids I think that''s unlikely in the", "tokens": [51488, 321, 500, 380, 643, + 604, 295, 264, 1472, 295, 341, 321, 500, 380, 643, 604, 2477, 1443, 3742, 286, 519, + 300, 311, 17518, 294, 264, 51680], "temperature": 0.0, "avg_logprob": -0.12285808865114939, + "compression_ratio": 2.0385964912280703, "no_speech_prob": 0.00858211424201727}, + {"id": 393, "seek": 224232, "start": 2242.32, "end": 2246.8, "text": " short and + medium term but who knows in the in the long that''s probably require some kind + of", "tokens": [50364, 2099, 293, 6399, 1433, 457, 567, 3255, 294, 264, 294, 264, + 938, 300, 311, 1391, 3651, 512, 733, 295, 50588], "temperature": 
0.0, "avg_logprob": + -0.16735186857335707, "compression_ratio": 1.7130044843049328, "no_speech_prob": + 0.011528562754392624}, {"id": 394, "seek": 224232, "start": 2246.8, "end": 2254.0800000000004, + "text": " singularity yes it is jump right because that means that you do not need + foundational models from", "tokens": [50588, 20010, 507, 2086, 309, 307, 3012, 558, + 570, 300, 1355, 300, 291, 360, 406, 643, 32195, 5245, 490, 50952], "temperature": + 0.0, "avg_logprob": -0.16735186857335707, "compression_ratio": 1.7130044843049328, + "no_speech_prob": 0.011528562754392624}, {"id": 395, "seek": 224232, "start": 2254.0800000000004, + "end": 2260.1600000000003, "text": " metal whoever right you could train it from + scratch and if you can do it within a couple minutes", "tokens": [50952, 5760, 11387, + 558, 291, 727, 3847, 309, 490, 8459, 293, 498, 291, 393, 360, 309, 1951, 257, 1916, + 2077, 51256], "temperature": 0.0, "avg_logprob": -0.16735186857335707, "compression_ratio": + 1.7130044843049328, "no_speech_prob": 0.011528562754392624}, {"id": 396, "seek": + 224232, "start": 2260.1600000000003, "end": 2267.84, "text": " then why would you + bother taking those models right that''s very interesting it''s my that that''s", + "tokens": [51256, 550, 983, 576, 291, 8677, 1940, 729, 5245, 558, 300, 311, 588, + 1880, 309, 311, 452, 300, 300, 311, 51640], "temperature": 0.0, "avg_logprob": -0.16735186857335707, + "compression_ratio": 1.7130044843049328, "no_speech_prob": 0.011528562754392624}, + {"id": 397, "seek": 226784, "start": 2267.92, "end": 2273.44, "text": " that''s + why I said there''s three and then I threw the fourth one in because I it''s it''s + not impossible", "tokens": [50368, 300, 311, 983, 286, 848, 456, 311, 1045, 293, + 550, 286, 11918, 264, 6409, 472, 294, 570, 286, 309, 311, 309, 311, 406, 6243, 50644], + "temperature": 0.0, "avg_logprob": -0.1486361821492513, "compression_ratio": 1.7207207207207207, + "no_speech_prob": 0.005666608456522226}, {"id": 
398, "seek": 226784, "start": 2273.44, + "end": 2280.2400000000002, "text": " but I think it''s not likely not anytime exactly + I mean it''s probably if this is about to happen", "tokens": [50644, 457, 286, 519, + 309, 311, 406, 3700, 406, 13038, 2293, 286, 914, 309, 311, 1391, 498, 341, 307, + 466, 281, 1051, 50984], "temperature": 0.0, "avg_logprob": -0.1486361821492513, + "compression_ratio": 1.7207207207207207, "no_speech_prob": 0.005666608456522226}, + {"id": 399, "seek": 226784, "start": 2280.96, "end": 2286.32, "text": " then probably + we would already see the room like you know the signals of that but today", "tokens": + [51020, 550, 1391, 321, 576, 1217, 536, 264, 1808, 411, 291, 458, 264, 12354, 295, + 300, 457, 965, 51288], "temperature": 0.0, "avg_logprob": -0.1486361821492513, "compression_ratio": + 1.7207207207207207, "no_speech_prob": 0.005666608456522226}, {"id": 400, "seek": + 226784, "start": 2287.2000000000003, "end": 2292.8, "text": " still we can see how + these giants keep training the models and they keep open sourcing sometimes", "tokens": + [51332, 920, 321, 393, 536, 577, 613, 31894, 1066, 3097, 264, 5245, 293, 436, 1066, + 1269, 11006, 2175, 2171, 51612], "temperature": 0.0, "avg_logprob": -0.1486361821492513, + "compression_ratio": 1.7207207207207207, "no_speech_prob": 0.005666608456522226}, + {"id": 401, "seek": 229280, "start": 2292.96, "end": 2299.76, "text": " encodes + sometimes for real but yeah it''s it''s another topic to cover I have a very practical", + "tokens": [50372, 2058, 4789, 2171, 337, 957, 457, 1338, 309, 311, 309, 311, 1071, + 4829, 281, 2060, 286, 362, 257, 588, 8496, 50712], "temperature": 0.0, "avg_logprob": + -0.19653683739739494, "compression_ratio": 1.5628415300546448, "no_speech_prob": + 0.0026188548654317856}, {"id": 402, "seek": 229280, "start": 2299.76, "end": 2305.76, + "text": " question as well so for example if I do have a model and that model could + be from Higging Face for", "tokens": [50712, 1168, 
382, 731, 370, 337, 1365, 498, + 286, 360, 362, 257, 2316, 293, 300, 2316, 727, 312, 490, 389, 328, 3249, 4047, 337, + 51012], "temperature": 0.0, "avg_logprob": -0.19653683739739494, "compression_ratio": + 1.5628415300546448, "no_speech_prob": 0.0026188548654317856}, {"id": 403, "seek": + 229280, "start": 2305.76, "end": 2314.4, "text": " example so it''s not mine how + do I bring the embeddings to Rockset can I leverage the Rockset''s", "tokens": [51012, + 1365, 370, 309, 311, 406, 3892, 577, 360, 286, 1565, 264, 12240, 29432, 281, 6922, + 3854, 393, 286, 13982, 264, 6922, 3854, 311, 51444], "temperature": 0.0, "avg_logprob": + -0.19653683739739494, "compression_ratio": 1.5628415300546448, "no_speech_prob": + 0.0026188548654317856}, {"id": 404, "seek": 231440, "start": 2314.4, "end": 2323.6, + "text": " infrastructure to compute the embeddings themselves so this the answer + in short is no today and it is", "tokens": [50364, 6896, 281, 14722, 264, 12240, + 29432, 2969, 370, 341, 264, 1867, 294, 2099, 307, 572, 965, 293, 309, 307, 50824], + "temperature": 0.0, "avg_logprob": -0.09754545792289403, "compression_ratio": 1.8389513108614233, + "no_speech_prob": 0.0016795805422589183}, {"id": 405, "seek": 231440, "start": 2323.6, + "end": 2329.04, "text": " on my list super high on my list so there is a customer + who came to me tomorrow and was like hey", "tokens": [50824, 322, 452, 1329, 1687, + 1090, 322, 452, 1329, 370, 456, 307, 257, 5474, 567, 1361, 281, 385, 4153, 293, + 390, 411, 4177, 51096], "temperature": 0.0, "avg_logprob": -0.09754545792289403, + "compression_ratio": 1.8389513108614233, "no_speech_prob": 0.0016795805422589183}, + {"id": 406, "seek": 231440, "start": 2330.08, "end": 2335.52, "text": " I want to + run this model using your infrastructure over my data I''d probably find a way to + make", "tokens": [51148, 286, 528, 281, 1190, 341, 2316, 1228, 428, 6896, 670, 452, + 1412, 286, 1116, 1391, 915, 257, 636, 281, 652, 51420], "temperature": 0.0, 
"avg_logprob": + -0.09754545792289403, "compression_ratio": 1.8389513108614233, "no_speech_prob": + 0.0016795805422589183}, {"id": 407, "seek": 231440, "start": 2335.52, "end": 2339.36, + "text": " that work for that like an existing customer like I would like because + that''s a feature I want to", "tokens": [51420, 300, 589, 337, 300, 411, 364, 6741, + 5474, 411, 286, 576, 411, 570, 300, 311, 257, 4111, 286, 528, 281, 51612], "temperature": + 0.0, "avg_logprob": -0.09754545792289403, "compression_ratio": 1.8389513108614233, + "no_speech_prob": 0.0016795805422589183}, {"id": 408, "seek": 231440, "start": 2339.36, + "end": 2343.36, "text": " build I''m like waiting for the excuse to build that the + problem for me is it''s just really hard to", "tokens": [51612, 1322, 286, 478, + 411, 3806, 337, 264, 8960, 281, 1322, 300, 264, 1154, 337, 385, 307, 309, 311, 445, + 534, 1152, 281, 51812], "temperature": 0.0, "avg_logprob": -0.09754545792289403, + "compression_ratio": 1.8389513108614233, "no_speech_prob": 0.0016795805422589183}, + {"id": 409, "seek": 234336, "start": 2343.36, "end": 2351.52, "text": " build generally + like if it was like call this API or support these exact kind of models it''s not", + "tokens": [50364, 1322, 5101, 411, 498, 309, 390, 411, 818, 341, 9362, 420, 1406, + 613, 1900, 733, 295, 5245, 309, 311, 406, 50772], "temperature": 0.0, "avg_logprob": + -0.044462160269419355, "compression_ratio": 1.7765567765567765, "no_speech_prob": + 0.0018911833176389337}, {"id": 410, "seek": 234336, "start": 2351.52, "end": 2355.6, + "text": " so hard but to do it in general without having like a specific customer + demand it''s a little bit", "tokens": [50772, 370, 1152, 457, 281, 360, 309, 294, + 2674, 1553, 1419, 411, 257, 2685, 5474, 4733, 309, 311, 257, 707, 857, 50976], "temperature": + 0.0, "avg_logprob": -0.044462160269419355, "compression_ratio": 1.7765567765567765, + "no_speech_prob": 0.0018911833176389337}, {"id": 411, "seek": 234336, "start": 2355.6, 
+ "end": 2360.48, "text": " trickier so we can kind of wait until that take a little + bit more shape but we have the pieces in", "tokens": [50976, 4282, 811, 370, 321, + 393, 733, 295, 1699, 1826, 300, 747, 257, 707, 857, 544, 3909, 457, 321, 362, 264, + 3755, 294, 51220], "temperature": 0.0, "avg_logprob": -0.044462160269419355, "compression_ratio": + 1.7765567765567765, "no_speech_prob": 0.0018911833176389337}, {"id": 412, "seek": + 234336, "start": 2360.48, "end": 2364.2400000000002, "text": " place like it''s + not hard for me to spin up a bunch of machines that run on your data and write", + "tokens": [51220, 1081, 411, 309, 311, 406, 1152, 337, 385, 281, 6060, 493, 257, + 3840, 295, 8379, 300, 1190, 322, 428, 1412, 293, 2464, 51408], "temperature": 0.0, + "avg_logprob": -0.044462160269419355, "compression_ratio": 1.7765567765567765, "no_speech_prob": + 0.0018911833176389337}, {"id": 413, "seek": 234336, "start": 2364.2400000000002, + "end": 2371.04, "text": " to your database I just it''s the actual like last mile + of wire of like what code do I run how do I", "tokens": [51408, 281, 428, 8149, + 286, 445, 309, 311, 264, 3539, 411, 1036, 12620, 295, 6234, 295, 411, 437, 3089, + 360, 286, 1190, 577, 360, 286, 51748], "temperature": 0.0, "avg_logprob": -0.044462160269419355, + "compression_ratio": 1.7765567765567765, "no_speech_prob": 0.0018911833176389337}, + {"id": 414, "seek": 237104, "start": 2371.04, "end": 2375.52, "text": " secure that + code you know like that kind of stuff that''s like what''s missing from us so today", + "tokens": [50364, 7144, 300, 3089, 291, 458, 411, 300, 733, 295, 1507, 300, 311, + 411, 437, 311, 5361, 490, 505, 370, 965, 50588], "temperature": 0.0, "avg_logprob": + -0.12991098846708024, "compression_ratio": 1.8098859315589353, "no_speech_prob": + 0.0019807470962405205}, {"id": 415, "seek": 237104, "start": 2375.52, "end": 2378.72, + "text": " and today you have to give me the embedding you''re gonna have to run + them and put 
them in a", "tokens": [50588, 293, 965, 291, 362, 281, 976, 385, 264, + 12240, 3584, 291, 434, 799, 362, 281, 1190, 552, 293, 829, 552, 294, 257, 50748], + "temperature": 0.0, "avg_logprob": -0.12991098846708024, "compression_ratio": 1.8098859315589353, + "no_speech_prob": 0.0019807470962405205}, {"id": 416, "seek": 237104, "start": 2378.72, + "end": 2386.0, "text": " rock set but this is at the top of my list of sort of features + I want to build yeah I mean it", "tokens": [50748, 3727, 992, 457, 341, 307, 412, + 264, 1192, 295, 452, 1329, 295, 1333, 295, 4122, 286, 528, 281, 1322, 1338, 286, + 914, 309, 51112], "temperature": 0.0, "avg_logprob": -0.12991098846708024, "compression_ratio": + 1.8098859315589353, "no_speech_prob": 0.0019807470962405205}, {"id": 417, "seek": + 237104, "start": 2386.0, "end": 2393.04, "text": " just sounds and by the way you + know if you take database today probably you could divide them into", "tokens": + [51112, 445, 3263, 293, 538, 264, 636, 291, 458, 498, 291, 747, 8149, 965, 1391, + 291, 727, 9845, 552, 666, 51464], "temperature": 0.0, "avg_logprob": -0.12991098846708024, + "compression_ratio": 1.8098859315589353, "no_speech_prob": 0.0019807470962405205}, + {"id": 418, "seek": 237104, "start": 2393.04, "end": 2397.7599999999998, "text": + " two groups you know using these dimensions specifically whether or not you can + compute embeddings", "tokens": [51464, 732, 3935, 291, 458, 1228, 613, 12819, 4682, + 1968, 420, 406, 291, 393, 14722, 12240, 29432, 51700], "temperature": 0.0, "avg_logprob": + -0.12991098846708024, "compression_ratio": 1.8098859315589353, "no_speech_prob": + 0.0019807470962405205}, {"id": 419, "seek": 239776, "start": 2397.76, "end": 2405.28, + "text": " inside and sometimes you do not want that because you want to like fine + tune the model and obviously", "tokens": [50364, 1854, 293, 2171, 291, 360, 406, + 528, 300, 570, 291, 528, 281, 411, 2489, 10864, 264, 2316, 293, 2745, 50740], "temperature": + 0.0, 
"avg_logprob": -0.08197278766841679, "compression_ratio": 1.7739130434782608, + "no_speech_prob": 0.0028080677147954702}, {"id": 420, "seek": 239776, "start": 2405.28, + "end": 2410.48, "text": " the database wouldn''t have access to it unless there + is a very easy way to plug it in which I haven''t", "tokens": [50740, 264, 8149, + 2759, 380, 362, 2105, 281, 309, 5969, 456, 307, 257, 588, 1858, 636, 281, 5452, + 309, 294, 597, 286, 2378, 380, 51000], "temperature": 0.0, "avg_logprob": -0.08197278766841679, + "compression_ratio": 1.7739130434782608, "no_speech_prob": 0.0028080677147954702}, + {"id": 421, "seek": 239776, "start": 2410.48, "end": 2417.28, "text": " seen by + the way probably I''m missing something but I haven''t seen it and everyone today + has some sort", "tokens": [51000, 1612, 538, 264, 636, 1391, 286, 478, 5361, 746, + 457, 286, 2378, 380, 1612, 309, 293, 1518, 965, 575, 512, 1333, 51340], "temperature": + 0.0, "avg_logprob": -0.08197278766841679, "compression_ratio": 1.7739130434782608, + "no_speech_prob": 0.0028080677147954702}, {"id": 422, "seek": 239776, "start": 2417.28, + "end": 2424.6400000000003, "text": " of vector support you know both the traditional + databases as well as this new breed of vector databases", "tokens": [51340, 295, + 8062, 1406, 291, 458, 1293, 264, 5164, 22380, 382, 731, 382, 341, 777, 18971, 295, + 8062, 22380, 51708], "temperature": 0.0, "avg_logprob": -0.08197278766841679, "compression_ratio": + 1.7739130434782608, "no_speech_prob": 0.0028080677147954702}, {"id": 423, "seek": + 242464, "start": 2425.6, "end": 2429.2799999999997, "text": " but yeah that''s interesting + that''s interesting that you guys are looking in that direction", "tokens": [50412, + 457, 1338, 300, 311, 1880, 300, 311, 1880, 300, 291, 1074, 366, 1237, 294, 300, + 3513, 50596], "temperature": 0.0, "avg_logprob": -0.15169025551189075, "compression_ratio": + 1.9790575916230366, "no_speech_prob": 0.008275533095002174}, {"id": 424, "seek": + 242464, 
"start": 2431.2799999999997, "end": 2438.48, "text": " what else you know + like if if someone wants in the audience wants to try rock set today you know", + "tokens": [50696, 437, 1646, 291, 458, 411, 498, 498, 1580, 2738, 294, 264, 4034, + 2738, 281, 853, 3727, 992, 965, 291, 458, 51056], "temperature": 0.0, "avg_logprob": + -0.15169025551189075, "compression_ratio": 1.9790575916230366, "no_speech_prob": + 0.008275533095002174}, {"id": 425, "seek": 242464, "start": 2439.12, "end": 2444.08, + "text": " do they need to pay it right away well can they have some free tier to + play around oh there''s", "tokens": [51088, 360, 436, 643, 281, 1689, 309, 558, + 1314, 731, 393, 436, 362, 512, 1737, 12362, 281, 862, 926, 1954, 456, 311, 51336], + "temperature": 0.0, "avg_logprob": -0.15169025551189075, "compression_ratio": 1.9790575916230366, + "no_speech_prob": 0.008275533095002174}, {"id": 426, "seek": 242464, "start": 2444.08, + "end": 2450.0, "text": " free there''s free tiers yeah so you can play around you + can play around for free in rock set and", "tokens": [51336, 1737, 456, 311, 1737, + 40563, 1338, 370, 291, 393, 862, 926, 291, 393, 862, 926, 337, 1737, 294, 3727, + 992, 293, 51632], "temperature": 0.0, "avg_logprob": -0.15169025551189075, "compression_ratio": + 1.9790575916230366, "no_speech_prob": 0.008275533095002174}, {"id": 427, "seek": + 245000, "start": 2450.96, "end": 2455.92, "text": " uh if anybody is like super + interested and they have something interesting and they they they", "tokens": [50412, + 2232, 498, 4472, 307, 411, 1687, 3102, 293, 436, 362, 746, 1880, 293, 436, 436, + 436, 50660], "temperature": 0.0, "avg_logprob": -0.14585168738114207, "compression_ratio": + 1.8237547892720307, "no_speech_prob": 0.00036798944347538054}, {"id": 428, "seek": + 245000, "start": 2455.92, "end": 2460.88, "text": " can always email us too um we + we will try to find a way to make make make that stuff work as much as", "tokens": + [50660, 393, 1009, 3796, 505, 
886, 1105, 321, 321, 486, 853, 281, 915, 257, 636, + 281, 652, 652, 652, 300, 1507, 589, 382, 709, 382, 50908], "temperature": 0.0, "avg_logprob": + -0.14585168738114207, "compression_ratio": 1.8237547892720307, "no_speech_prob": + 0.00036798944347538054}, {"id": 429, "seek": 245000, "start": 2460.88, "end": 2465.52, + "text": " possible but yes there is a free tier you can go back around with it yeah + um and um", "tokens": [50908, 1944, 457, 2086, 456, 307, 257, 1737, 12362, 291, + 393, 352, 646, 926, 365, 309, 1338, 1105, 293, 1105, 51140], "temperature": 0.0, + "avg_logprob": -0.14585168738114207, "compression_ratio": 1.8237547892720307, "no_speech_prob": + 0.00036798944347538054}, {"id": 430, "seek": 245000, "start": 2467.84, "end": 2471.76, + "text": " it is managed so that the one thing you have to understand about rock + set is the managed service", "tokens": [51256, 309, 307, 6453, 370, 300, 264, 472, + 551, 291, 362, 281, 1223, 466, 3727, 992, 307, 264, 6453, 2643, 51452], "temperature": + 0.0, "avg_logprob": -0.14585168738114207, "compression_ratio": 1.8237547892720307, + "no_speech_prob": 0.00036798944347538054}, {"id": 431, "seek": 245000, "start": + 2471.76, "end": 2475.6, "text": " right so you''re not going to download it and + run it or whatever it''s not that''s not the way it works", "tokens": [51452, 558, + 370, 291, 434, 406, 516, 281, 5484, 309, 293, 1190, 309, 420, 2035, 309, 311, 406, + 300, 311, 406, 264, 636, 309, 1985, 51644], "temperature": 0.0, "avg_logprob": -0.14585168738114207, + "compression_ratio": 1.8237547892720307, "no_speech_prob": 0.00036798944347538054}, + {"id": 432, "seek": 247560, "start": 2475.6, "end": 2481.2799999999997, "text": + " no and and by the way that''s exactly the advantage for businesses right and that''s + why we do have", "tokens": [50364, 572, 293, 293, 538, 264, 636, 300, 311, 2293, + 264, 5002, 337, 6011, 558, 293, 300, 311, 983, 321, 360, 362, 50648], "temperature": + 0.0, "avg_logprob": 
-0.1159331202507019, "compression_ratio": 1.8106060606060606, + "no_speech_prob": 0.0008684160420671105}, {"id": 433, "seek": 247560, "start": 2481.2799999999997, + "end": 2486.72, "text": " different business models you know because in the end + of the day you''re not doing this only", "tokens": [50648, 819, 1606, 5245, 291, + 458, 570, 294, 264, 917, 295, 264, 786, 291, 434, 406, 884, 341, 787, 50920], "temperature": + 0.0, "avg_logprob": -0.1159331202507019, "compression_ratio": 1.8106060606060606, + "no_speech_prob": 0.0008684160420671105}, {"id": 434, "seek": 247560, "start": 2486.72, + "end": 2492.48, "text": " for fun you you really need to run money too for the company + to grow and and build more things", "tokens": [50920, 337, 1019, 291, 291, 534, + 643, 281, 1190, 1460, 886, 337, 264, 2237, 281, 1852, 293, 293, 1322, 544, 721, + 51208], "temperature": 0.0, "avg_logprob": -0.1159331202507019, "compression_ratio": + 1.8106060606060606, "no_speech_prob": 0.0008684160420671105}, {"id": 435, "seek": + 247560, "start": 2492.48, "end": 2499.68, "text": " for your users and so that''s + absolutely legit uh approach not everything needs to be open source", "tokens": + [51208, 337, 428, 5022, 293, 370, 300, 311, 3122, 10275, 2232, 3109, 406, 1203, + 2203, 281, 312, 1269, 4009, 51568], "temperature": 0.0, "avg_logprob": -0.1159331202507019, + "compression_ratio": 1.8106060606060606, "no_speech_prob": 0.0008684160420671105}, + {"id": 436, "seek": 247560, "start": 2499.68, "end": 2505.36, "text": " you chose + it that way but it''s great that you have free tier and we can also link it in the + show", "tokens": [51568, 291, 5111, 309, 300, 636, 457, 309, 311, 869, 300, 291, + 362, 1737, 12362, 293, 321, 393, 611, 2113, 309, 294, 264, 855, 51852], "temperature": + 0.0, "avg_logprob": -0.1159331202507019, "compression_ratio": 1.8106060606060606, + "no_speech_prob": 0.0008684160420671105}, {"id": 437, "seek": 250536, "start": 2505.44, + "end": 2512.7200000000003, "text": 
" notes sure um what are you looking at you know + do you need some you said you have already so", "tokens": [50368, 5570, 988, 1105, + 437, 366, 291, 1237, 412, 291, 458, 360, 291, 643, 512, 291, 848, 291, 362, 1217, + 370, 50732], "temperature": 0.0, "avg_logprob": -0.14704912049429758, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.0011592706432566047}, {"id": 438, "seek": + 250536, "start": 2512.7200000000003, "end": 2520.88, "text": " many clients in different + nations different verticals what else would you benefit from by sharing", "tokens": + [50732, 867, 6982, 294, 819, 11035, 819, 9429, 82, 437, 1646, 576, 291, 5121, 490, + 538, 5414, 51140], "temperature": 0.0, "avg_logprob": -0.14704912049429758, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.0011592706432566047}, {"id": 439, "seek": + 250536, "start": 2521.44, "end": 2528.4, "text": " rock set into a wider community + you know through these podcasts all right so there''s a lot of", "tokens": [51168, + 3727, 992, 666, 257, 11842, 1768, 291, 458, 807, 613, 24045, 439, 558, 370, 456, + 311, 257, 688, 295, 51516], "temperature": 0.0, "avg_logprob": -0.14704912049429758, + "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.0011592706432566047}, + {"id": 440, "seek": 250536, "start": 2528.4, "end": 2533.52, "text": " ways to answer + this question but but this is the vector group right so selfishly I kind of", "tokens": + [51516, 2098, 281, 1867, 341, 1168, 457, 457, 341, 307, 264, 8062, 1594, 558, 370, + 19074, 356, 286, 733, 295, 51772], "temperature": 0.0, "avg_logprob": -0.14704912049429758, + "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.0011592706432566047}, + {"id": 441, "seek": 253352, "start": 2533.52, "end": 2540.0, "text": " already hinted + at this is I''m trying to get a clearer sense by where the value is going to come", + "tokens": [50364, 1217, 12075, 292, 412, 341, 307, 286, 478, 1382, 281, 483, 257, + 26131, 2020, 538, 689, 264, 
2158, 307, 516, 281, 808, 50688], "temperature": 0.0, + "avg_logprob": -0.07006655890366127, "compression_ratio": 1.8358778625954197, "no_speech_prob": + 0.0005382033996284008}, {"id": 442, "seek": 253352, "start": 2540.0, "end": 2545.12, + "text": " from in for vectors in the in the short and medium term for people like + there''s a lot of people", "tokens": [50688, 490, 294, 337, 18875, 294, 264, 294, + 264, 2099, 293, 6399, 1433, 337, 561, 411, 456, 311, 257, 688, 295, 561, 50944], + "temperature": 0.0, "avg_logprob": -0.07006655890366127, "compression_ratio": 1.8358778625954197, + "no_speech_prob": 0.0005382033996284008}, {"id": 443, "seek": 253352, "start": 2545.12, + "end": 2550.0, "text": " out there and we saw this there''s a million people trying + to oh my god vectors are happening how", "tokens": [50944, 484, 456, 293, 321, 1866, + 341, 456, 311, 257, 2459, 561, 1382, 281, 1954, 452, 3044, 18875, 366, 2737, 577, + 51188], "temperature": 0.0, "avg_logprob": -0.07006655890366127, "compression_ratio": + 1.8358778625954197, "no_speech_prob": 0.0005382033996284008}, {"id": 444, "seek": + 253352, "start": 2550.0, "end": 2556.24, "text": " do I plug this into my business + like is there it''s can I use this and we''ve seen a bunch of", "tokens": [51188, + 360, 286, 5452, 341, 666, 452, 1606, 411, 307, 456, 309, 311, 393, 286, 764, 341, + 293, 321, 600, 1612, 257, 3840, 295, 51500], "temperature": 0.0, "avg_logprob": + -0.07006655890366127, "compression_ratio": 1.8358778625954197, "no_speech_prob": + 0.0005382033996284008}, {"id": 445, "seek": 253352, "start": 2556.24, "end": 2560.96, + "text": " interesting super novel use cases like things you would not expect and + you know there''s an insurance", "tokens": [51500, 1880, 1687, 7613, 764, 3331, + 411, 721, 291, 576, 406, 2066, 293, 291, 458, 456, 311, 364, 7214, 51736], "temperature": + 0.0, "avg_logprob": -0.07006655890366127, "compression_ratio": 1.8358778625954197, + "no_speech_prob": 
0.0005382033996284008}, {"id": 446, "seek": 256096, "start": 2560.96, + "end": 2566.08, "text": " company that want to that wants to like scan internal + documents you know do they want to do search", "tokens": [50364, 2237, 300, 528, + 281, 300, 2738, 281, 411, 11049, 6920, 8512, 291, 458, 360, 436, 528, 281, 360, + 3164, 50620], "temperature": 0.0, "avg_logprob": -0.1157717008269235, "compression_ratio": + 1.9186602870813396, "no_speech_prob": 0.0012765832943841815}, {"id": 447, "seek": + 256096, "start": 2566.08, "end": 2574.4, "text": " they want to do like internal + search semantic search um and so for me my my most selfish interest here", "tokens": + [50620, 436, 528, 281, 360, 411, 6920, 3164, 47982, 3164, 1105, 293, 370, 337, 385, + 452, 452, 881, 19074, 1179, 510, 51036], "temperature": 0.0, "avg_logprob": -0.1157717008269235, + "compression_ratio": 1.9186602870813396, "no_speech_prob": 0.0012765832943841815}, + {"id": 448, "seek": 256096, "start": 2574.4, "end": 2580.88, "text": " is to really + get a clear picture of like which of these like little subdomains is actually really", + "tokens": [51036, 307, 281, 534, 483, 257, 1850, 3036, 295, 411, 597, 295, 613, + 411, 707, 1422, 4121, 2315, 307, 767, 534, 51360], "temperature": 0.0, "avg_logprob": + -0.1157717008269235, "compression_ratio": 1.9186602870813396, "no_speech_prob": + 0.0012765832943841815}, {"id": 449, "seek": 256096, "start": 2581.2, "end": 2586.4, + "text": " providing like real value like what is really what is really like taking + off it''s hard it''s sometimes", "tokens": [51376, 6530, 411, 957, 2158, 411, 437, + 307, 534, 437, 307, 534, 411, 1940, 766, 309, 311, 1152, 309, 311, 2171, 51636], + "temperature": 0.0, "avg_logprob": -0.1157717008269235, "compression_ratio": 1.9186602870813396, + "no_speech_prob": 0.0012765832943841815}, {"id": 450, "seek": 258640, "start": 2586.56, + "end": 2590.56, "text": " it''s hard to tell like who''s just messing around because + everyone''s messing 
around literally everyone", "tokens": [50372, 309, 311, 1152, + 281, 980, 411, 567, 311, 445, 23258, 926, 570, 1518, 311, 23258, 926, 3736, 1518, + 50572], "temperature": 0.0, "avg_logprob": -0.11061093596374097, "compression_ratio": + 2.079310344827586, "no_speech_prob": 0.008515751920640469}, {"id": 451, "seek": + 258640, "start": 2590.56, "end": 2595.28, "text": " is messing around and who''s + like actually latch on to something that''s got some real legs and every", "tokens": + [50572, 307, 23258, 926, 293, 567, 311, 411, 767, 31837, 322, 281, 746, 300, 311, + 658, 512, 957, 5668, 293, 633, 50808], "temperature": 0.0, "avg_logprob": -0.11061093596374097, + "compression_ratio": 2.079310344827586, "no_speech_prob": 0.008515751920640469}, + {"id": 452, "seek": 258640, "start": 2595.28, "end": 2599.52, "text": " time we + find a customer that''s got like real legs we dig in we''re like all in we''re like + all right how", "tokens": [50808, 565, 321, 915, 257, 5474, 300, 311, 658, 411, + 957, 5668, 321, 2528, 294, 321, 434, 411, 439, 294, 321, 434, 411, 439, 558, 577, + 51020], "temperature": 0.0, "avg_logprob": -0.11061093596374097, "compression_ratio": + 2.079310344827586, "no_speech_prob": 0.008515751920640469}, {"id": 453, "seek": + 258640, "start": 2599.52, "end": 2604.96, "text": " can we help you like let me + let''s you know again like the the I''m waiting for one of these people to", "tokens": + [51020, 393, 321, 854, 291, 411, 718, 385, 718, 311, 291, 458, 797, 411, 264, 264, + 286, 478, 3806, 337, 472, 295, 613, 561, 281, 51292], "temperature": 0.0, "avg_logprob": + -0.11061093596374097, "compression_ratio": 2.079310344827586, "no_speech_prob": + 0.008515751920640469}, {"id": 454, "seek": 258640, "start": 2604.96, "end": 2608.4, + "text": " come back and be like can we retrain our embedding so like all right yeah + let''s go build it right so", "tokens": [51292, 808, 646, 293, 312, 411, 393, 321, + 1533, 7146, 527, 12240, 3584, 370, 411, 439, 558, 1338, 
718, 311, 352, 1322, 309, + 558, 370, 51464], "temperature": 0.0, "avg_logprob": -0.11061093596374097, "compression_ratio": + 2.079310344827586, "no_speech_prob": 0.008515751920640469}, {"id": 455, "seek": + 258640, "start": 2608.4, "end": 2613.76, "text": " that''s kind of my yeah I I want + people to keep messing around with this stuff I I want to figure", "tokens": [51464, + 300, 311, 733, 295, 452, 1338, 286, 286, 528, 561, 281, 1066, 23258, 926, 365, 341, + 1507, 286, 286, 528, 281, 2573, 51732], "temperature": 0.0, "avg_logprob": -0.11061093596374097, + "compression_ratio": 2.079310344827586, "no_speech_prob": 0.008515751920640469}, + {"id": 456, "seek": 261376, "start": 2613.84, "end": 2617.84, "text": " like all + of us messing around it is going to find where it gets traction like where we can + get our", "tokens": [50368, 411, 439, 295, 505, 23258, 926, 309, 307, 516, 281, + 915, 689, 309, 2170, 23558, 411, 689, 321, 393, 483, 527, 50568], "temperature": + 0.0, "avg_logprob": -0.10648101010768533, "compression_ratio": 1.9387096774193548, + "no_speech_prob": 0.0008747426327317953}, {"id": 457, "seek": 261376, "start": 2617.84, + "end": 2621.76, "text": " hooks in and where things start to start to really make + progress and then I just want to hear", "tokens": [50568, 26485, 294, 293, 689, + 721, 722, 281, 722, 281, 534, 652, 4205, 293, 550, 286, 445, 528, 281, 1568, 50764], + "temperature": 0.0, "avg_logprob": -0.10648101010768533, "compression_ratio": 1.9387096774193548, + "no_speech_prob": 0.0008747426327317953}, {"id": 458, "seek": 261376, "start": 2621.76, + "end": 2626.8, "text": " from those people like I want to know what what you need + every time we talk to someone it''s something new", "tokens": [50764, 490, 729, + 561, 411, 286, 528, 281, 458, 437, 437, 291, 643, 633, 565, 321, 751, 281, 1580, + 309, 311, 746, 777, 51016], "temperature": 0.0, "avg_logprob": -0.10648101010768533, + "compression_ratio": 1.9387096774193548, "no_speech_prob": 
0.0008747426327317953}, + {"id": 459, "seek": 261376, "start": 2626.8, "end": 2633.36, "text": " and surprising + right um and that''s kind of though yeah when the real world intersects with all + this like", "tokens": [51016, 293, 8830, 558, 1105, 293, 300, 311, 733, 295, 1673, + 1338, 562, 264, 957, 1002, 27815, 82, 365, 439, 341, 411, 51344], "temperature": + 0.0, "avg_logprob": -0.10648101010768533, "compression_ratio": 1.9387096774193548, + "no_speech_prob": 0.0008747426327317953}, {"id": 460, "seek": 261376, "start": 2633.36, + "end": 2639.1200000000003, "text": " you know uh in my head it''s all an indexes + and graph theory or whatever but but uh when the real", "tokens": [51344, 291, 458, + 2232, 294, 452, 1378, 309, 311, 439, 364, 8186, 279, 293, 4295, 5261, 420, 2035, + 457, 457, 2232, 562, 264, 957, 51632], "temperature": 0.0, "avg_logprob": -0.10648101010768533, + "compression_ratio": 1.9387096774193548, "no_speech_prob": 0.0008747426327317953}, + {"id": 461, "seek": 261376, "start": 2639.1200000000003, "end": 2643.0400000000004, + "text": " word intersects is always something like simple that you need that would + make your life a lot easier", "tokens": [51632, 1349, 27815, 82, 307, 1009, 746, + 411, 2199, 300, 291, 643, 300, 576, 652, 428, 993, 257, 688, 3571, 51828], "temperature": + 0.0, "avg_logprob": -0.10648101010768533, "compression_ratio": 1.9387096774193548, + "no_speech_prob": 0.0008747426327317953}, {"id": 462, "seek": 264304, "start": 2643.04, + "end": 2648.96, "text": " and that''s the kind of stuff that I''m eager to hear + yeah I think uh I could share with you without", "tokens": [50364, 293, 300, 311, + 264, 733, 295, 1507, 300, 286, 478, 18259, 281, 1568, 1338, 286, 519, 2232, 286, + 727, 2073, 365, 291, 1553, 50660], "temperature": 0.0, "avg_logprob": -0.23126888275146484, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.001815369469113648}, + {"id": 463, "seek": 264304, "start": 2648.96, "end": 2654.48, "text": " 
saying what + would be okay uh one uh member of my team said hey we''re we''re we''re", "tokens": + [50660, 1566, 437, 576, 312, 1392, 2232, 472, 2232, 4006, 295, 452, 1469, 848, 4177, + 321, 434, 321, 434, 321, 434, 50936], "temperature": 0.0, "avg_logprob": -0.23126888275146484, + "compression_ratio": 1.7123287671232876, "no_speech_prob": 0.001815369469113648}, + {"id": 464, "seek": 264304, "start": 2654.48, "end": 2662.32, "text": " using one + one um search engine today which also has you know beyond the um sparse indexals + that", "tokens": [50936, 1228, 472, 472, 1105, 3164, 2848, 965, 597, 611, 575, 291, + 458, 4399, 264, 1105, 637, 11668, 8186, 1124, 300, 51328], "temperature": 0.0, "avg_logprob": + -0.23126888275146484, "compression_ratio": 1.7123287671232876, "no_speech_prob": + 0.001815369469113648}, {"id": 465, "seek": 264304, "start": 2662.32, "end": 2668.48, + "text": " vector search support and so he was saying okay they''re using hnsw algorithm + but I cannot tweak the", "tokens": [51328, 8062, 3164, 1406, 293, 370, 415, 390, + 1566, 1392, 436, 434, 1228, 276, 3695, 86, 9284, 457, 286, 2644, 29879, 264, 51636], + "temperature": 0.0, "avg_logprob": -0.23126888275146484, "compression_ratio": 1.7123287671232876, + "no_speech_prob": 0.001815369469113648}, {"id": 466, "seek": 266848, "start": 2668.48, + "end": 2675.6, "text": " amp parameter and I forgot what was the second parameter + and look because I cannot do that recall", "tokens": [50364, 18648, 13075, 293, + 286, 5298, 437, 390, 264, 1150, 13075, 293, 574, 570, 286, 2644, 360, 300, 9901, + 50720], "temperature": 0.0, "avg_logprob": -0.13528889279032863, "compression_ratio": + 1.6981981981981982, "no_speech_prob": 0.004841654561460018}, {"id": 467, "seek": + 266848, "start": 2675.6, "end": 2682.48, "text": " is really below what it needs + to be it just doesn''t work and then he went online it''s an open source", "tokens": + [50720, 307, 534, 2507, 437, 309, 2203, 281, 312, 309, 445, 1177, 380, 
589, 293, + 550, 415, 1437, 2950, 309, 311, 364, 1269, 4009, 51064], "temperature": 0.0, "avg_logprob": + -0.13528889279032863, "compression_ratio": 1.6981981981981982, "no_speech_prob": + 0.004841654561460018}, {"id": 468, "seek": 266848, "start": 2682.48, "end": 2690.2400000000002, + "text": " database he typed you know the issue on github and they realized oh we + missed really important", "tokens": [51064, 8149, 415, 33941, 291, 458, 264, 2734, + 322, 290, 355, 836, 293, 436, 5334, 1954, 321, 6721, 534, 1021, 51452], "temperature": + 0.0, "avg_logprob": -0.13528889279032863, "compression_ratio": 1.6981981981981982, + "no_speech_prob": 0.004841654561460018}, {"id": 469, "seek": 266848, "start": 2690.2400000000002, + "end": 2698.0, "text": " thing so they quickly uh expose the parameters and so he + now can tune them right so", "tokens": [51452, 551, 370, 436, 2661, 2232, 19219, + 264, 9834, 293, 370, 415, 586, 393, 10864, 552, 558, 370, 51840], "temperature": + 0.0, "avg_logprob": -0.13528889279032863, "compression_ratio": 1.6981981981981982, + "no_speech_prob": 0.004841654561460018}, {"id": 470, "seek": 269848, "start": 2698.96, + "end": 2704.96, "text": " so yeah the tuning of the index is another this is a good + one right so a lot of these systems have", "tokens": [50388, 370, 1338, 264, 15164, + 295, 264, 8186, 307, 1071, 341, 307, 257, 665, 472, 558, 370, 257, 688, 295, 613, + 3652, 362, 50688], "temperature": 0.0, "avg_logprob": -0.09588472304805633, "compression_ratio": + 1.8435114503816794, "no_speech_prob": 0.0006601736531592906}, {"id": 471, "seek": + 269848, "start": 2704.96, "end": 2710.16, "text": " like a tier there''s like a + coarse grain and a fine grain so you have hnsw over IVF or hnsw over", "tokens": + [50688, 411, 257, 12362, 456, 311, 411, 257, 39312, 12837, 293, 257, 2489, 12837, + 370, 291, 362, 276, 3695, 86, 670, 15967, 37, 420, 276, 3695, 86, 670, 50948], "temperature": + 0.0, "avg_logprob": -0.09588472304805633, "compression_ratio": 
1.8435114503816794, + "no_speech_prob": 0.0006601736531592906}, {"id": 472, "seek": 269848, "start": 2710.16, + "end": 2715.12, "text": " or IVF or IVF and then each of these has parameters and + so you get these like massive config strings", "tokens": [50948, 420, 15967, 37, + 420, 15967, 37, 293, 550, 1184, 295, 613, 575, 9834, 293, 370, 291, 483, 613, 411, + 5994, 6662, 13985, 51196], "temperature": 0.0, "avg_logprob": -0.09588472304805633, + "compression_ratio": 1.8435114503816794, "no_speech_prob": 0.0006601736531592906}, + {"id": 473, "seek": 269848, "start": 2715.12, "end": 2723.04, "text": " that set + that say how these are built um and we we expose this so you can do all this stuff + but", "tokens": [51196, 300, 992, 300, 584, 577, 613, 366, 3094, 1105, 293, 321, + 321, 19219, 341, 370, 291, 393, 360, 439, 341, 1507, 457, 51592], "temperature": + 0.0, "avg_logprob": -0.09588472304805633, "compression_ratio": 1.8435114503816794, + "no_speech_prob": 0.0006601736531592906}, {"id": 474, "seek": 269848, "start": 2723.92, + "end": 2728.0, "text": " in real life if you''re building like what what number + do you even pick like how do you know", "tokens": [51636, 294, 957, 993, 498, 291, + 434, 2390, 411, 437, 437, 1230, 360, 291, 754, 1888, 411, 577, 360, 291, 458, 51840], + "temperature": 0.0, "avg_logprob": -0.09588472304805633, "compression_ratio": 1.8435114503816794, + "no_speech_prob": 0.0006601736531592906}, {"id": 475, "seek": 272848, "start": 2728.48, + "end": 2731.84, "text": " I don''t know that person must have gone through a lot + to decide they needed to change that", "tokens": [50364, 286, 500, 380, 458, 300, + 954, 1633, 362, 2780, 807, 257, 688, 281, 4536, 436, 2978, 281, 1319, 300, 50532], + "temperature": 0.0, "avg_logprob": -0.11468374832816745, "compression_ratio": 1.8396946564885497, + "no_speech_prob": 0.0002649218949954957}, {"id": 476, "seek": 272848, "start": 2731.84, + "end": 2735.68, "text": " ever because it''s not obvious it''s not 
like oh yeah + you it''s like you look at the data like 16s", "tokens": [50532, 1562, 570, 309, + 311, 406, 6322, 309, 311, 406, 411, 1954, 1338, 291, 309, 311, 411, 291, 574, 412, + 264, 1412, 411, 3165, 82, 50724], "temperature": 0.0, "avg_logprob": -0.11468374832816745, + "compression_ratio": 1.8396946564885497, "no_speech_prob": 0.0002649218949954957}, + {"id": 477, "seek": 272848, "start": 2735.68, "end": 2742.72, "text": " wrong like + the infrastructure to like optimize this system is not trivial and then even if + you do", "tokens": [50724, 2085, 411, 264, 6896, 281, 411, 19719, 341, 1185, 307, + 406, 26703, 293, 550, 754, 498, 291, 360, 51076], "temperature": 0.0, "avg_logprob": + -0.11468374832816745, "compression_ratio": 1.8396946564885497, "no_speech_prob": + 0.0002649218949954957}, {"id": 478, "seek": 272848, "start": 2742.72, "end": 2746.8, + "text": " optimize it you have to rerun everything you have to rebuild that index + right once you kind of", "tokens": [51076, 19719, 309, 291, 362, 281, 43819, 409, + 1203, 291, 362, 281, 16877, 300, 8186, 558, 1564, 291, 733, 295, 51280], "temperature": + 0.0, "avg_logprob": -0.11468374832816745, "compression_ratio": 1.8396946564885497, + "no_speech_prob": 0.0002649218949954957}, {"id": 479, "seek": 272848, "start": 2746.8, + "end": 2753.28, "text": " trained it so to speak so yeah I think that''s a that''s + a huge area where our our infrastructure is not", "tokens": [51280, 8895, 309, 370, + 281, 1710, 370, 1338, 286, 519, 300, 311, 257, 300, 311, 257, 2603, 1859, 689, 527, + 527, 6896, 307, 406, 51604], "temperature": 0.0, "avg_logprob": -0.11468374832816745, + "compression_ratio": 1.8396946564885497, "no_speech_prob": 0.0002649218949954957}, + {"id": 480, "seek": 275328, "start": 2753.36, "end": 2761.6000000000004, "text": + " helpful at the moment yeah but I''m sure you will learn in general excited like + Luis look you have", "tokens": [50368, 4961, 412, 264, 1623, 1338, 457, 286, 478, + 988, 291, 486, 
1466, 294, 2674, 2919, 411, 25133, 574, 291, 362, 50780], "temperature": + 0.0, "avg_logprob": -0.1946825632234899, "compression_ratio": 1.6593886462882097, + "no_speech_prob": 0.002556996885687113}, {"id": 481, "seek": 275328, "start": 2761.6000000000004, + "end": 2768.32, "text": " so much information that I think we should record another + episode as well down the road as you guys", "tokens": [50780, 370, 709, 1589, 300, + 286, 519, 321, 820, 2136, 1071, 3500, 382, 731, 760, 264, 3060, 382, 291, 1074, + 51116], "temperature": 0.0, "avg_logprob": -0.1946825632234899, "compression_ratio": + 1.6593886462882097, "no_speech_prob": 0.002556996885687113}, {"id": 482, "seek": + 275328, "start": 2768.32, "end": 2775.1200000000003, "text": " progressing on the + database and you add all this interesting you know tweaks that and not", "tokens": + [51116, 36305, 322, 264, 8149, 293, 291, 909, 439, 341, 1880, 291, 458, 46664, 300, + 293, 406, 51456], "temperature": 0.0, "avg_logprob": -0.1946825632234899, "compression_ratio": + 1.6593886462882097, "no_speech_prob": 0.002556996885687113}, {"id": 483, "seek": + 275328, "start": 2775.1200000000003, "end": 2780.8, "text": " to the database as + well but I''m also super excited about the direction because basically you", "tokens": + [51456, 281, 264, 8149, 382, 731, 457, 286, 478, 611, 1687, 2919, 466, 264, 3513, + 570, 1936, 291, 51740], "temperature": 0.0, "avg_logprob": -0.1946825632234899, + "compression_ratio": 1.6593886462882097, "no_speech_prob": 0.002556996885687113}, + {"id": 484, "seek": 278080, "start": 2781.28, "end": 2789.1200000000003, "text": + " offer like if you take pure vector databases you know they do not implement SQL + support right", "tokens": [50388, 2626, 411, 498, 291, 747, 6075, 8062, 22380, 291, + 458, 436, 360, 406, 4445, 19200, 1406, 558, 50780], "temperature": 0.0, "avg_logprob": + -0.1702940434585383, "compression_ratio": 1.6741071428571428, "no_speech_prob": + 0.003910748288035393}, {"id": 485, 
"seek": 278080, "start": 2789.52, "end": 2794.0, + "text": " right they like the purpose of what what the existence is something else + right they''ve been", "tokens": [50800, 558, 436, 411, 264, 4334, 295, 437, 437, + 264, 9123, 307, 746, 1646, 558, 436, 600, 668, 51024], "temperature": 0.0, "avg_logprob": + -0.1702940434585383, "compression_ratio": 1.6741071428571428, "no_speech_prob": + 0.003910748288035393}, {"id": 486, "seek": 278080, "start": 2794.6400000000003, + "end": 2802.88, "text": " designed to have vectors as the first class citizens and + so they they make it super easy to", "tokens": [51056, 4761, 281, 362, 18875, 382, + 264, 700, 1508, 7180, 293, 370, 436, 436, 652, 309, 1687, 1858, 281, 51468], "temperature": + 0.0, "avg_logprob": -0.1702940434585383, "compression_ratio": 1.6741071428571428, + "no_speech_prob": 0.003910748288035393}, {"id": 487, "seek": 278080, "start": 2802.88, + "end": 2809.28, "text": " plug in a model or actually have the model you know almost + pulled from hugging face or some other", "tokens": [51468, 5452, 294, 257, 2316, + 420, 767, 362, 264, 2316, 291, 458, 1920, 7373, 490, 41706, 1851, 420, 512, 661, + 51788], "temperature": 0.0, "avg_logprob": -0.1702940434585383, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.003910748288035393}, {"id": 488, "seek": + 280928, "start": 2809.28, "end": 2817.1200000000003, "text": " model storage model + model hub but then when you want to do some facets or whatever you want to", "tokens": + [50364, 2316, 6725, 2316, 2316, 11838, 457, 550, 562, 291, 528, 281, 360, 512, 49752, + 420, 2035, 291, 528, 281, 50756], "temperature": 0.0, "avg_logprob": -0.11756068087638692, + "compression_ratio": 1.7935779816513762, "no_speech_prob": 0.004267509561032057}, + {"id": 489, "seek": 280928, "start": 2817.1200000000003, "end": 2822.2400000000002, + "text": " call them aggregations right that''s not as easy depends on database probably + as well but I''ve seen", "tokens": [50756, 818, 
552, 16743, 763, 558, 300, 311, + 406, 382, 1858, 5946, 322, 8149, 1391, 382, 731, 457, 286, 600, 1612, 51012], "temperature": + 0.0, "avg_logprob": -0.11756068087638692, "compression_ratio": 1.7935779816513762, + "no_speech_prob": 0.004267509561032057}, {"id": 490, "seek": 280928, "start": 2822.2400000000002, + "end": 2827.76, "text": " some I don''t want to name them but in any case they know + it''s it''s a weak point and it''s probably", "tokens": [51012, 512, 286, 500, 380, + 528, 281, 1315, 552, 457, 294, 604, 1389, 436, 458, 309, 311, 309, 311, 257, 5336, + 935, 293, 309, 311, 1391, 51288], "temperature": 0.0, "avg_logprob": -0.11756068087638692, + "compression_ratio": 1.7935779816513762, "no_speech_prob": 0.004267509561032057}, + {"id": 491, "seek": 280928, "start": 2827.76, "end": 2834.6400000000003, "text": + " because they do not want to serve that segment of the market maybe they do it''s + partially right but", "tokens": [51288, 570, 436, 360, 406, 528, 281, 4596, 300, + 9469, 295, 264, 2142, 1310, 436, 360, 309, 311, 18886, 558, 457, 51632], "temperature": + 0.0, "avg_logprob": -0.11756068087638692, "compression_ratio": 1.7935779816513762, + "no_speech_prob": 0.004267509561032057}, {"id": 492, "seek": 283464, "start": 2834.64, + "end": 2842.08, "text": " it''s so hard yeah exactly yeah I mean I think that the + really good vector databases who succeed", "tokens": [50364, 309, 311, 370, 1152, + 1338, 2293, 1338, 286, 914, 286, 519, 300, 264, 534, 665, 8062, 22380, 567, 7754, + 50736], "temperature": 0.0, "avg_logprob": -0.07968366371010835, "compression_ratio": + 1.849056603773585, "no_speech_prob": 0.0001273619564017281}, {"id": 493, "seek": + 283464, "start": 2842.08, "end": 2847.92, "text": " will slowly turn into databases + and databases will turn into like these things are merging they''re", "tokens": + [50736, 486, 5692, 1261, 666, 22380, 293, 22380, 486, 1261, 666, 411, 613, 721, + 366, 44559, 436, 434, 51028], "temperature": 0.0, 
"avg_logprob": -0.07968366371010835, + "compression_ratio": 1.849056603773585, "no_speech_prob": 0.0001273619564017281}, + {"id": 494, "seek": 283464, "start": 2847.92, "end": 2852.16, "text": " just coming + at each other for different directions like like if you''re at a vector database + if you''re", "tokens": [51028, 445, 1348, 412, 1184, 661, 337, 819, 11095, 411, + 411, 498, 291, 434, 412, 257, 8062, 8149, 498, 291, 434, 51240], "temperature": + 0.0, "avg_logprob": -0.07968366371010835, "compression_ratio": 1.849056603773585, + "no_speech_prob": 0.0001273619564017281}, {"id": 495, "seek": 283464, "start": 2852.16, + "end": 2855.8399999999997, "text": " building a vector database and you''re looking + at your metadata filtering support you''re like I", "tokens": [51240, 2390, 257, + 8062, 8149, 293, 291, 434, 1237, 412, 428, 26603, 30822, 1406, 291, 434, 411, 286, + 51424], "temperature": 0.0, "avg_logprob": -0.07968366371010835, "compression_ratio": + 1.849056603773585, "no_speech_prob": 0.0001273619564017281}, {"id": 496, "seek": + 283464, "start": 2855.8399999999997, "end": 2859.6, "text": " can''t make this more + powerful without just reinventing SQL like at some point I''m going to have to", + "tokens": [51424, 393, 380, 652, 341, 544, 4005, 1553, 445, 33477, 278, 19200, 411, + 412, 512, 935, 286, 478, 516, 281, 362, 281, 51612], "temperature": 0.0, "avg_logprob": + -0.07968366371010835, "compression_ratio": 1.849056603773585, "no_speech_prob": + 0.0001273619564017281}, {"id": 497, "seek": 285960, "start": 2859.6, "end": 2864.72, + "text": " just build SQL and so one day they''re going to bite the bullet and we''ll + I mean maybe not SQL but", "tokens": [50364, 445, 1322, 19200, 293, 370, 472, 786, + 436, 434, 516, 281, 7988, 264, 11632, 293, 321, 603, 286, 914, 1310, 406, 19200, + 457, 50620], "temperature": 0.0, "avg_logprob": -0.09612897924474768, "compression_ratio": + 1.8576923076923078, "no_speech_prob": 0.00153682054951787}, {"id": 498, "seek": + 
285960, "start": 2864.72, "end": 2869.6, "text": " something you know SQL complete + if you will right because you just need all that stuff and then", "tokens": [50620, + 746, 291, 458, 19200, 3566, 498, 291, 486, 558, 570, 291, 445, 643, 439, 300, 1507, + 293, 550, 50864], "temperature": 0.0, "avg_logprob": -0.09612897924474768, "compression_ratio": + 1.8576923076923078, "no_speech_prob": 0.00153682054951787}, {"id": 499, "seek": + 285960, "start": 2869.6, "end": 2874.96, "text": " pretty soon you get into the + problem of like hey my metadata filter is the slow part of this", "tokens": [50864, + 1238, 2321, 291, 483, 666, 264, 1154, 295, 411, 4177, 452, 26603, 6608, 307, 264, + 2964, 644, 295, 341, 51132], "temperature": 0.0, "avg_logprob": -0.09612897924474768, + "compression_ratio": 1.8576923076923078, "no_speech_prob": 0.00153682054951787}, + {"id": 500, "seek": 285960, "start": 2874.96, "end": 2880.72, "text": " of my thing + so now what like oh now I''m doing query optimization like SQL query optimization + like", "tokens": [51132, 295, 452, 551, 370, 586, 437, 411, 1954, 586, 286, 478, + 884, 14581, 19618, 411, 19200, 14581, 19618, 411, 51420], "temperature": 0.0, "avg_logprob": + -0.09612897924474768, "compression_ratio": 1.8576923076923078, "no_speech_prob": + 0.00153682054951787}, {"id": 501, "seek": 285960, "start": 2880.72, "end": 2886.08, + "text": " now I''m building query optimizers like metadata filter optimizers and + you know so we have all that", "tokens": [51420, 586, 286, 478, 2390, 14581, 5028, + 22525, 411, 26603, 6608, 5028, 22525, 293, 291, 458, 370, 321, 362, 439, 300, 51688], + "temperature": 0.0, "avg_logprob": -0.09612897924474768, "compression_ratio": 1.8576923076923078, + "no_speech_prob": 0.00153682054951787}, {"id": 502, "seek": 288608, "start": 2886.08, + "end": 2890.16, "text": " like we brought all that to the party right like I have + a I have a cost based optimizer for my", "tokens": [50364, 411, 321, 3038, 439, + 300, 281, 
264, 3595, 558, 411, 286, 362, 257, 286, 362, 257, 2063, 2361, 5028, 6545, + 337, 452, 50568], "temperature": 0.0, "avg_logprob": -0.10259328925091288, "compression_ratio": + 1.8089887640449438, "no_speech_prob": 0.00031465725624002516}, {"id": 503, "seek": + 288608, "start": 2890.16, "end": 2895.36, "text": " SQL query so if your metadata + filter does crazy stuff I can like do you know all kinds of SQL", "tokens": [50568, + 19200, 14581, 370, 498, 428, 26603, 6608, 775, 3219, 1507, 286, 393, 411, 360, 291, + 458, 439, 3685, 295, 19200, 50828], "temperature": 0.0, "avg_logprob": -0.10259328925091288, + "compression_ratio": 1.8089887640449438, "no_speech_prob": 0.00031465725624002516}, + {"id": 504, "seek": 288608, "start": 2895.36, "end": 2901.7599999999998, "text": + " magic to like to optimize this query but on the flip side like yeah like so everybody''s + everybody", "tokens": [50828, 5585, 281, 411, 281, 19719, 341, 14581, 457, 322, + 264, 7929, 1252, 411, 1338, 411, 370, 2201, 311, 2201, 51148], "temperature": 0.0, + "avg_logprob": -0.10259328925091288, "compression_ratio": 1.8089887640449438, "no_speech_prob": + 0.00031465725624002516}, {"id": 505, "seek": 288608, "start": 2903.2799999999997, + "end": 2907.92, "text": " I think the good systems need all this stuff it so it''s + just we took a hard problem we took two hard", "tokens": [51224, 286, 519, 264, + 665, 3652, 643, 439, 341, 1507, 309, 370, 309, 311, 445, 321, 1890, 257, 1152, 1154, + 321, 1890, 732, 1152, 51456], "temperature": 0.0, "avg_logprob": -0.10259328925091288, + "compression_ratio": 1.8089887640449438, "no_speech_prob": 0.00031465725624002516}, + {"id": 506, "seek": 288608, "start": 2907.92, "end": 2912.08, "text": " problems + and we say congratulations this now one hard problem and it''s like okay well okay + it''s", "tokens": [51456, 2740, 293, 321, 584, 13568, 341, 586, 472, 1152, 1154, + 293, 309, 311, 411, 1392, 731, 1392, 309, 311, 51664], "temperature": 0.0, "avg_logprob": + 
-0.10259328925091288, "compression_ratio": 1.8089887640449438, "no_speech_prob": + 0.00031465725624002516}, {"id": 507, "seek": 291208, "start": 2912.08, "end": 2918.4, + "text": " a big hard problem yeah I love how you you model it that this database + is and non-data", "tokens": [50364, 257, 955, 1152, 1154, 1338, 286, 959, 577, 291, + 291, 2316, 309, 300, 341, 8149, 307, 293, 2107, 12, 67, 3274, 50680], "temperature": + 0.0, "avg_logprob": -0.14094980372938998, "compression_ratio": 1.76036866359447, + "no_speech_prob": 0.0022365718614310026}, {"id": 508, "seek": 291208, "start": 2918.4, + "end": 2922.56, "text": " bases sort of will converge eventually even though everyone + I think at this point calls themselves", "tokens": [50680, 17949, 1333, 295, 486, + 41881, 4728, 754, 1673, 1518, 286, 519, 412, 341, 935, 5498, 2969, 50888], "temperature": + 0.0, "avg_logprob": -0.14094980372938998, "compression_ratio": 1.76036866359447, + "no_speech_prob": 0.0022365718614310026}, {"id": 509, "seek": 291208, "start": 2922.56, + "end": 2930.48, "text": " a database yeah probably minor exceptions but still you + are spot on on whether or not first of all", "tokens": [50888, 257, 8149, 1338, + 1391, 6696, 22847, 457, 920, 291, 366, 4008, 322, 322, 1968, 420, 406, 700, 295, + 439, 51284], "temperature": 0.0, "avg_logprob": -0.14094980372938998, "compression_ratio": + 1.76036866359447, "no_speech_prob": 0.0022365718614310026}, {"id": 510, "seek": + 291208, "start": 2930.48, "end": 2934.64, "text": " what is a database right and + then whether or not you have all these features that that need to be", "tokens": + [51284, 437, 307, 257, 8149, 558, 293, 550, 1968, 420, 406, 291, 362, 439, 613, + 4122, 300, 300, 643, 281, 312, 51492], "temperature": 0.0, "avg_logprob": -0.14094980372938998, + "compression_ratio": 1.76036866359447, "no_speech_prob": 0.0022365718614310026}, + {"id": 511, "seek": 293464, "start": 2934.64, "end": 2942.8799999999997, "text": + " supported and and also 
like really importantly the world is used to having SQL + databases right so", "tokens": [50364, 8104, 293, 293, 611, 411, 534, 8906, 264, + 1002, 307, 1143, 281, 1419, 19200, 22380, 558, 370, 50776], "temperature": 0.0, + "avg_logprob": -0.14325880200675364, "compression_ratio": 1.684873949579832, "no_speech_prob": + 0.0021565337665379047}, {"id": 512, "seek": 293464, "start": 2942.8799999999997, + "end": 2949.12, "text": " like if you sort of I don''t have a better analogy but + basically if you develop something and you say", "tokens": [50776, 411, 498, 291, + 1333, 295, 286, 500, 380, 362, 257, 1101, 21663, 457, 1936, 498, 291, 1499, 746, + 293, 291, 584, 51088], "temperature": 0.0, "avg_logprob": -0.14325880200675364, + "compression_ratio": 1.684873949579832, "no_speech_prob": 0.0021565337665379047}, + {"id": 513, "seek": 293464, "start": 2949.12, "end": 2956.08, "text": " it can run + but cannot walk and you''re like okay but sometimes you need to walk right that''s + amazing", "tokens": [51088, 309, 393, 1190, 457, 2644, 1792, 293, 291, 434, 411, + 1392, 457, 2171, 291, 643, 281, 1792, 558, 300, 311, 2243, 51436], "temperature": + 0.0, "avg_logprob": -0.14325880200675364, "compression_ratio": 1.684873949579832, + "no_speech_prob": 0.0021565337665379047}, {"id": 514, "seek": 293464, "start": 2956.08, + "end": 2961.52, "text": " before we close I really like to ask this question with + some people find it a little awkward to answer", "tokens": [51436, 949, 321, 1998, + 286, 534, 411, 281, 1029, 341, 1168, 365, 512, 561, 915, 309, 257, 707, 11411, 281, + 1867, 51708], "temperature": 0.0, "avg_logprob": -0.14325880200675364, "compression_ratio": + 1.684873949579832, "no_speech_prob": 0.0021565337665379047}, {"id": 515, "seek": + 296152, "start": 2962.16, "end": 2968.88, "text": " but I do feel it''s important + it''s a little bit philosophical and I ask what drives you it used to be", "tokens": + [50396, 457, 286, 360, 841, 309, 311, 1021, 309, 311, 257, 707, 
857, 25066, 293, + 286, 1029, 437, 11754, 291, 309, 1143, 281, 312, 50732], "temperature": 0.0, "avg_logprob": + -0.1462329047066825, "compression_ratio": 1.6201117318435754, "no_speech_prob": + 0.008417884819209576}, {"id": 516, "seek": 296152, "start": 2968.88, "end": 2976.24, + "text": " why you do this but basically when you wake up you know you are driven + to continue but what''s", "tokens": [50732, 983, 291, 360, 341, 457, 1936, 562, + 291, 6634, 493, 291, 458, 291, 366, 9555, 281, 2354, 457, 437, 311, 51100], "temperature": + 0.0, "avg_logprob": -0.1462329047066825, "compression_ratio": 1.6201117318435754, + "no_speech_prob": 0.008417884819209576}, {"id": 517, "seek": 296152, "start": 2976.24, + "end": 2981.92, "text": " inside that spinning you''ve been through it right you''ve + been doing this for so many years also", "tokens": [51100, 1854, 300, 15640, 291, + 600, 668, 807, 309, 558, 291, 600, 668, 884, 341, 337, 370, 867, 924, 611, 51384], + "temperature": 0.0, "avg_logprob": -0.1462329047066825, "compression_ratio": 1.6201117318435754, + "no_speech_prob": 0.008417884819209576}, {"id": 518, "seek": 298192, "start": 2982.0, + "end": 2991.76, "text": " Facebook at scale but you want to continue to do that + so I am the way I think about this is there''s", "tokens": [50368, 4384, 412, 4373, + 457, 291, 528, 281, 2354, 281, 360, 300, 370, 286, 669, 264, 636, 286, 519, 466, + 341, 307, 456, 311, 50856], "temperature": 0.0, "avg_logprob": -0.08384686787923178, + "compression_ratio": 1.6627906976744187, "no_speech_prob": 0.003769760252907872}, + {"id": 519, "seek": 298192, "start": 2993.6, "end": 2999.44, "text": " there''s + like a shiny problem at the heart of all this that I love and if you if you let + me", "tokens": [50948, 456, 311, 411, 257, 16997, 1154, 412, 264, 1917, 295, 439, + 341, 300, 286, 959, 293, 498, 291, 498, 291, 718, 385, 51240], "temperature": 0.0, + "avg_logprob": -0.08384686787923178, "compression_ratio": 1.6627906976744187, 
"no_speech_prob": + 0.003769760252907872}, {"id": 520, "seek": 298192, "start": 3000.2400000000002, + "end": 3008.08, "text": " I will sit there and like I will be happy if like I just + come into work every day and like look", "tokens": [51280, 286, 486, 1394, 456, + 293, 411, 286, 486, 312, 2055, 498, 411, 286, 445, 808, 666, 589, 633, 786, 293, + 411, 574, 51672], "temperature": 0.0, "avg_logprob": -0.08384686787923178, "compression_ratio": + 1.6627906976744187, "no_speech_prob": 0.003769760252907872}, {"id": 521, "seek": + 300808, "start": 3008.08, "end": 3013.04, "text": " through the corridors and and + fix bugs anything that''s crashing then look through the profiles and", "tokens": + [50364, 807, 264, 46920, 293, 293, 3191, 15120, 1340, 300, 311, 26900, 550, 574, + 807, 264, 23693, 293, 50612], "temperature": 0.0, "avg_logprob": -0.11945095768681278, + "compression_ratio": 1.9918367346938775, "no_speech_prob": 0.004618676844984293}, + {"id": 522, "seek": 300808, "start": 3013.04, "end": 3019.52, "text": " like optimize + code I can just do this this just makes me happy and so like building like reliable", + "tokens": [50612, 411, 19719, 3089, 286, 393, 445, 360, 341, 341, 445, 1669, 385, + 2055, 293, 370, 411, 2390, 411, 12924, 50936], "temperature": 0.0, "avg_logprob": + -0.11945095768681278, "compression_ratio": 1.9918367346938775, "no_speech_prob": + 0.004618676844984293}, {"id": 523, "seek": 300808, "start": 3019.52, "end": 3025.36, + "text": " scalable systems make me happy so there''s like this shiny problem in + the middle of all this and", "tokens": [50936, 38481, 3652, 652, 385, 2055, 370, + 456, 311, 411, 341, 16997, 1154, 294, 264, 2808, 295, 439, 341, 293, 51228], "temperature": + 0.0, "avg_logprob": -0.11945095768681278, "compression_ratio": 1.9918367346938775, + "no_speech_prob": 0.004618676844984293}, {"id": 524, "seek": 300808, "start": 3025.36, + "end": 3030.72, "text": " it''s like the common thread through everything that I + could just 
do and be happy and it it''s rewarded", "tokens": [51228, 309, 311, 411, + 264, 2689, 7207, 807, 1203, 300, 286, 727, 445, 360, 293, 312, 2055, 293, 309, 309, + 311, 29105, 51496], "temperature": 0.0, "avg_logprob": -0.11945095768681278, "compression_ratio": + 1.9918367346938775, "no_speech_prob": 0.004618676844984293}, {"id": 525, "seek": + 300808, "start": 3030.72, "end": 3036.4, "text": " and rewarding right and so that + like the basis of like it''s really easy to like this stuff so", "tokens": [51496, + 293, 20063, 558, 293, 370, 300, 411, 264, 5143, 295, 411, 309, 311, 534, 1858, 281, + 411, 341, 1507, 370, 51780], "temperature": 0.0, "avg_logprob": -0.11945095768681278, + "compression_ratio": 1.9918367346938775, "no_speech_prob": 0.004618676844984293}, + {"id": 526, "seek": 303640, "start": 3036.4, "end": 3040.88, "text": " then obviously + you have to extend upon that like the way that you get driven beyond the shiny", + "tokens": [50364, 550, 2745, 291, 362, 281, 10101, 3564, 300, 411, 264, 636, 300, + 291, 483, 9555, 4399, 264, 16997, 50588], "temperature": 0.0, "avg_logprob": -0.07752901012614621, + "compression_ratio": 1.9254901960784314, "no_speech_prob": 0.00015686126425862312}, + {"id": 527, "seek": 303640, "start": 3040.88, "end": 3045.28, "text": " thing because + you know I could go do that for like Minecraft mods I don''t have to do that for + databases", "tokens": [50588, 551, 570, 291, 458, 286, 727, 352, 360, 300, 337, + 411, 21029, 30899, 286, 500, 380, 362, 281, 360, 300, 337, 22380, 50808], "temperature": + 0.0, "avg_logprob": -0.07752901012614621, "compression_ratio": 1.9254901960784314, + "no_speech_prob": 0.00015686126425862312}, {"id": 528, "seek": 303640, "start": + 3046.32, "end": 3054.8, "text": " is like in some larger mission that like you feel + connected to so for me the the mission here was", "tokens": [50860, 307, 411, 294, + 512, 4833, 4447, 300, 411, 291, 841, 4582, 281, 370, 337, 385, 264, 264, 4447, 510, + 390, 51284], 
"temperature": 0.0, "avg_logprob": -0.07752901012614621, "compression_ratio": + 1.9254901960784314, "no_speech_prob": 0.00015686126425862312}, {"id": 529, "seek": + 303640, "start": 3054.8, "end": 3059.92, "text": " a little bit too old the people + was actually kind of the original driving force like this is the", "tokens": [51284, + 257, 707, 857, 886, 1331, 264, 561, 390, 767, 733, 295, 264, 3380, 4840, 3464, 411, + 341, 307, 264, 51540], "temperature": 0.0, "avg_logprob": -0.07752901012614621, + "compression_ratio": 1.9254901960784314, "no_speech_prob": 0.00015686126425862312}, + {"id": 530, "seek": 303640, "start": 3059.92, "end": 3063.76, "text": " people I + don''t even care what we''re I don''t care what we''re doing like let''s go do it + like us as a", "tokens": [51540, 561, 286, 500, 380, 754, 1127, 437, 321, 434, 286, + 500, 380, 1127, 437, 321, 434, 884, 411, 718, 311, 352, 360, 309, 411, 505, 382, + 257, 51732], "temperature": 0.0, "avg_logprob": -0.07752901012614621, "compression_ratio": + 1.9254901960784314, "no_speech_prob": 0.00015686126425862312}, {"id": 531, "seek": + 306376, "start": 3063.76, "end": 3068.8, "text": " group that''s gonna be fun but + then the whole AI thing I mean look I we can get philosophical you", "tokens": [50364, + 1594, 300, 311, 799, 312, 1019, 457, 550, 264, 1379, 7318, 551, 286, 914, 574, 286, + 321, 393, 483, 25066, 291, 50616], "temperature": 0.0, "avg_logprob": -0.06892404040774784, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.0020170139614492655}, + {"id": 532, "seek": 306376, "start": 3068.8, "end": 3074.8, "text": " want to get + philosophical do it real quick there''s like two or three nominations for technologies", + "tokens": [50616, 528, 281, 483, 25066, 360, 309, 957, 1702, 456, 311, 411, 732, + 420, 1045, 46331, 337, 7943, 50916], "temperature": 0.0, "avg_logprob": -0.06892404040774784, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.0020170139614492655}, + {"id": 533, 
"seek": 306376, "start": 3074.8, "end": 3081.6800000000003, "text": + " that will change the 21st century like and I you got to work pretty hard to not + put AI at the top", "tokens": [50916, 300, 486, 1319, 264, 5080, 372, 4901, 411, + 293, 286, 291, 658, 281, 589, 1238, 1152, 281, 406, 829, 7318, 412, 264, 1192, 51260], + "temperature": 0.0, "avg_logprob": -0.06892404040774784, "compression_ratio": 1.7636363636363637, + "no_speech_prob": 0.0020170139614492655}, {"id": 534, "seek": 306376, "start": 3081.6800000000003, + "end": 3086.96, "text": " of that list maybe there''s some other ones you could + argue like maybe nuclear fusion is a 21st", "tokens": [51260, 295, 300, 1329, 1310, + 456, 311, 512, 661, 2306, 291, 727, 9695, 411, 1310, 8179, 23100, 307, 257, 5080, + 372, 51524], "temperature": 0.0, "avg_logprob": -0.06892404040774784, "compression_ratio": + 1.7636363636363637, "no_speech_prob": 0.0020170139614492655}, {"id": 535, "seek": + 306376, "start": 3086.96, "end": 3092.0, "text": " century revolution maybe gene + editing like I don''t know you could come up with something but like", "tokens": + [51524, 4901, 8894, 1310, 12186, 10000, 411, 286, 500, 380, 458, 291, 727, 808, + 493, 365, 746, 457, 411, 51776], "temperature": 0.0, "avg_logprob": -0.06892404040774784, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.0020170139614492655}, + {"id": 536, "seek": 309200, "start": 3092.0, "end": 3099.12, "text": " chances are + that AI is gonna be like a defining 21st century technology so you''re gonna let + me play", "tokens": [50364, 10486, 366, 300, 7318, 307, 799, 312, 411, 257, 17827, + 5080, 372, 4901, 2899, 370, 291, 434, 799, 718, 385, 862, 50720], "temperature": + 0.0, "avg_logprob": -0.17327480316162108, "compression_ratio": 1.726027397260274, + "no_speech_prob": 0.0004677653778344393}, {"id": 537, "seek": 309200, "start": 3099.12, + "end": 3105.76, "text": " with my shiny toys in that yeah that''s okay I''m out + of bed now right I''ll get 
out of bed I''ll come", "tokens": [50720, 365, 452, 16997, + 13753, 294, 300, 1338, 300, 311, 1392, 286, 478, 484, 295, 2901, 586, 558, 286, + 603, 483, 484, 295, 2901, 286, 603, 808, 51052], "temperature": 0.0, "avg_logprob": + -0.17327480316162108, "compression_ratio": 1.726027397260274, "no_speech_prob": + 0.0004677653778344393}, {"id": 538, "seek": 309200, "start": 3105.76, "end": 3110.72, + "text": " I''ll come get out of bed and I will let''s go let''s go build something + so that''s I think that''s", "tokens": [51052, 286, 603, 808, 483, 484, 295, 2901, + 293, 286, 486, 718, 311, 352, 718, 311, 352, 1322, 746, 370, 300, 311, 286, 519, + 300, 311, 51300], "temperature": 0.0, "avg_logprob": -0.17327480316162108, "compression_ratio": + 1.726027397260274, "no_speech_prob": 0.0004677653778344393}, {"id": 539, "seek": + 309200, "start": 3110.72, "end": 3116.4, "text": " my answer to your question amazing + and I think you got it on from here to really and", "tokens": [51300, 452, 1867, + 281, 428, 1168, 2243, 293, 286, 519, 291, 658, 309, 322, 490, 510, 281, 534, 293, + 51584], "temperature": 0.0, "avg_logprob": -0.17327480316162108, "compression_ratio": + 1.726027397260274, "no_speech_prob": 0.0004677653778344393}, {"id": 540, "seek": + 311640, "start": 3116.4, "end": 3124.7200000000003, "text": " and they saw passion + knowledge so you did see the movements so I''m really excited to see what", "tokens": + [50364, 293, 436, 1866, 5418, 3601, 370, 291, 630, 536, 264, 9981, 370, 286, 478, + 534, 2919, 281, 536, 437, 50780], "temperature": 0.0, "avg_logprob": -0.2578787857227111, + "compression_ratio": 1.6741071428571428, "no_speech_prob": 0.007722989656031132}, + {"id": 541, "seek": 311640, "start": 3124.7200000000003, "end": 3131.52, "text": + " you guys got a built thank you so much for joining me today to discuss yes we + didn''t go to the", "tokens": [50780, 291, 1074, 658, 257, 3094, 1309, 291, 370, + 709, 337, 5549, 385, 965, 281, 2248, 2086, 321, 994, 380, 
352, 281, 264, 51120], + "temperature": 0.0, "avg_logprob": -0.2578787857227111, "compression_ratio": 1.6741071428571428, + "no_speech_prob": 0.007722989656031132}, {"id": 542, "seek": 311640, "start": 3131.52, + "end": 3138.08, "text": " NM tuning this algo and this is how the algorithm goes + but hey I really enjoyed the product level", "tokens": [51120, 426, 44, 15164, 341, + 8655, 293, 341, 307, 577, 264, 9284, 1709, 457, 4177, 286, 534, 4626, 264, 1674, + 1496, 51448], "temperature": 0.0, "avg_logprob": -0.2578787857227111, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.007722989656031132}, {"id": 543, "seek": + 311640, "start": 3138.08, "end": 3146.0, "text": " this is what this is on the money + some during my company say yeah fantastic thank you so", "tokens": [51448, 341, + 307, 437, 341, 307, 322, 264, 1460, 512, 1830, 452, 2237, 584, 1338, 5456, 1309, + 291, 370, 51844], "temperature": 0.0, "avg_logprob": -0.2578787857227111, "compression_ratio": + 1.6741071428571428, "no_speech_prob": 0.007722989656031132}, {"id": 544, "seek": + 314600, "start": 3146.0, "end": 3153.28, "text": " much Luis you know enjoy your + day and let''s talk soon awesome thank you for having me and yeah happy", "tokens": + [50364, 709, 25133, 291, 458, 2103, 428, 786, 293, 718, 311, 751, 2321, 3476, 1309, + 291, 337, 1419, 385, 293, 1338, 2055, 50728], "temperature": 0.0, "avg_logprob": + -0.2339029592626235, "compression_ratio": 1.275229357798165, "no_speech_prob": 0.02100030519068241}, + {"id": 545, "seek": 314600, "start": 3153.28, "end": 3156.0, "text": " to chat again + all right cheers bye bye", "tokens": [50728, 281, 5081, 797, 439, 558, 15301, 6543, + 6543, 50864], "temperature": 0.0, "avg_logprob": -0.2339029592626235, "compression_ratio": + 1.275229357798165, "no_speech_prob": 0.02100030519068241}]' +--- + +Hello there, vector podcast. Season three and this promised I'm trying to shoot for 30 minute episodes. Let's see how I'm going to do on this one. 
I'm super excited to have Louis Brandy, vice president of engineering at Rockset. I know you guys are building a database.
+Hey Louis, how are you doing? I'm doing great. So far so good. Thank you for having me today. Oh yeah, excited. Excited to learn about Rockset as well. But before that, it's a tradition.
+Could you please introduce yourself, a little bit about your background and how you got to this stage in your professional life? Sure. So I've been at Rockset for two years and change, a bit over two years, as VP of engineering. Before that, I was at Facebook for 11 years.
+So I did roughly three things at Facebook, and it's funny because even the ones that feel least relevant have become more relevant recently. I did spam fighting infrastructure for my first stretch of time at Facebook, and that involved two large systems.
+One was a super real time system, which turns into the real time database we're going to talk about today. And the other was, we did a lot of vector clustering. Like, back then I was doing vectors, but way before they were cool. This was like 2011 to 2015 or so.
+And we used vectors a lot in spam fighting and image classification. And this is even before deep learning took over the world, like right before deep learning changed everything in this space. But we were using vectors a lot.
+We built some pretty powerful systems, actually built large scale vector clustering, you know, before it was cool. Now everyone's building large scale vector applications. And then I worked on a lot of other stuff in my time at Facebook.
+So there was a lot of core C++, core libraries, a lot of infrastructure stuff. I worked on open source stuff called Folly and Thrift. So these are basically core libraries that Facebook has released over the years. And the theme of all this is highly scalable infra.
+And then I did some real time and some vector stuff back in the spam fighting days.
It's not totally applicable necessarily to the modern world, but it's still a pretty interesting background. It's a very interesting confluence of things that have brought me to Rockset.
+So yeah, that's my life story, roughly and in a nutshell. There's more, but I think that will do for the intro. Yeah, for sure. Very exciting, really exciting. I've heard about Thrift.
+And I also remember, like, early on many years ago, when some of you guys were on stage, you know, from engineering at Facebook, you would constantly hint at the point that, yeah, we ran out of the capabilities of this database, so we needed to scale up.
+We needed to build a new one sometimes. And that was really, really interesting, that it's constantly like, you know, you're always battling against too many images, too many videos, and so on and so forth. Yeah, one thing that I've always said was that everything is broken at scale.
+Like, there's this idea that sometimes you reach for the right tool for the job, but the reality is, when you push even the right tool to the limit, it will fall over, and you'll find yourself rebuilding something that other people take for granted.
+Like, my favorite example of this is at Facebook, we had a team in my core C++ group that was working on malloc. Like, who works on malloc? It turns out there are people that work on malloc.
+Most of them work at a place like Facebook or Google or places like that, but that's the kind of thing where you can save a lot of money by making tiny improvements to malloc. So it's worth doing. It's amazing. I remember I did a bit of C++ as well.
+I guess you could say two and a half years.
+And at some point, at the anti-virus company here in Finland, I had to choose which malloc it would be, right? And I had to discuss it with my team, and I was struck, like, really? Is that really the thing we need to discuss?
+And they said, yeah, actually you won't believe it, because we were running on a mobile phone, back then it was Microsoft's Windows Mobile, I guess it was called, right? So you had to be really careful all around.
+Yeah, I mean, there's only like four mallocs in the world. So you might have chosen ours. Who knows? Amazing. All this says is that you've been really, really deep and low level. And so I think you dabbled in coding, obviously, right? Yeah. So I was fairly technical.
+I've been a manager for a relatively long time, I don't know, 12 years or so, but I've always been a fairly technical manager in my path. And so, for example, for years I worked on the core C++ libraries at Facebook, even while I was a manager, even a director.
+I was on the C++ standards committee for a while and doing things like that. Sorry, I got paged. Everything's fine. So yeah, I've definitely worked in the code. I've tried to stay as hands on as possible. In most recent years, it's become increasingly difficult.
+I just, I don't know, it's sort of the dark side of management. You slowly slide into more managerial things. But I still try to stay about as hands on as I possibly can. Oh, tell me about that.
+I mean, I'm also on the product management side today, and previously a manager of people as well. And I'm like, am I sliding backwards? Do I need to? Sometimes I do, but it's not on the same level as it used to be, for sure. But it makes sense to stay on these topics.
+And then after all these years, you decided to move to Rockset. I've read a blog post that I think you've written for the company, where you give the reasons why you did so, and you explain about the team's strengths and so on. Some of them are from Facebook as well.
+They matter, right? Can you sort of repeat that story a little bit, like why you moved from a big company, you could say, right, to a startup? So the answer, in short, is the people.
The core group at Rockset is a bunch of, well, I shouldn't say the core group now.
+The core group now has grown a lot, but the original founding group was a bunch of extremely strong Facebook people that I knew from Facebook. And so, you know, you mentioned rebuilding databases.
+For example, two of the main people, probably three of the main people, responsible for rebuilding databases at Facebook are at Rockset. So Dhruba, who's our CTO, built RocksDB at Facebook.
+And that was part of replacing the storage plane of MySQL in a highly scalable way at Facebook. And then of course, the graph database that powers literally all of Facebook, like Facebook is a graph, and it is primarily powered by a graph database called TAO.
+Nathan and Venkat are two people who worked on that extensively, as like tech leads and founders, in some sense, of that project at Facebook. So this is an extremely pedigreed group.
+So to me, and I don't care so much about that part, they're also genuinely amazing people to work with and work around.
+And so this is kind of this idea of like, hey, you want to join a startup with a bunch of the smartest people you've ever worked with and try to do something, and worst case scenario, you know, it all goes kaput or whatever, but you worked with some of the best people on a really interesting problem for a couple of years.
+And I was like, yeah, I'm in for that. There's a longer version of that story, but that is really the central reason of why I ended up switching. Yeah, I mean, it sounds like a brilliant reason too.
+But I'm also interested, you said you've been using embeddings before, like at Facebook, and on vectors, and you said that was prior to the deep learning era.
+So can you explain a bit how these vectors were sort of created, if possible? So there is some sensitivity here, but it's maybe not for the reason you think. It's not a trade sensitivity.
The sensitivity is with the abuse use cases. What we were doing was image classification.
+And most of this, I'm not going to go into too much detail for maybe obvious reasons, but there are images that you are not allowed to use or put up, and obviously what they don't want to do is hand all these companies the images and say, if you see this illegal image, tell us.
+So oftentimes they give you hashes. But these aren't actual hashes. They are not a hash of the illegal image. They are a locality sensitive hash, and they're a vector. What they are is literally a vector.
+And Euclidean distance is the measurement, so you basically have a classic vector search problem. You're given a pile of vectors. There's a technology known as PhotoDNA that you can look up. As far as I know, it's not like an open standard.
+So it's not actually in the public domain what it actually is, but it's effectively a mechanism for turning images into vectors that's used as this hashing mechanism.
+And so Facebook built a bunch of infrastructure to flag hashes that came through, for reasons that are not fun to talk about. Let's put it that way. There's, again, I don't want to get into it.
+It's kind of awful, right? But at the end of the day, you have vectors flowing into the system. And what you're doing on every single upload is essentially a vector search.
+You're saying, hey, given this corpus of vectors, does this vector that's coming in match any of these? That was the basic core of the system. But once you have these vectors, you can start to do other abuse things. So for example, you can start clustering vectors.
+You can build vector clusters. And that way you can find neighborhoods of images, like similar images. Now, here, similar means something quite different. Because these were not like semantic similarity.
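The matching loop described here — a small, fixed corpus of flagged hash vectors, with every incoming upload checked against all of them by Euclidean distance — can be sketched in a few lines. The vectors, dimensionality, and threshold below are invented for illustration; PhotoDNA's actual vector format and matching rule are not public.

```python
import math

# Hypothetical flagged hash vectors (a fixed, relatively small set).
FLAGGED = [
    (0.12, 0.80, 0.33, 0.51),
    (0.90, 0.10, 0.44, 0.27),
]
MATCH_THRESHOLD = 0.05  # illustrative distance cutoff, not a real value

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matches_flagged(upload_vec):
    # An exact linear scan is fine here: the corpus stays small, and only
    # the upload volume is huge, so each check is cheap and parallelizable.
    return any(euclidean(upload_vec, f) <= MATCH_THRESHOLD for f in FLAGGED)

print(matches_flagged((0.12, 0.80, 0.33, 0.52)))  # True: near a flagged vector
print(matches_flagged((0.50, 0.50, 0.50, 0.50)))  # False: far from all of them
```

The key asymmetry the conversation returns to later: because the flagged set barely grows, there is no index-update problem, only a throughput problem on the query side.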
So this is not what you would get from an embedding today, from, say, any of the modern models. Yeah, like CLIP. Yeah, whatever, exactly. These were much more textural. People familiar with image processing techniques will recognize this: these are vectors based on things like local pixel gradients or wavelet transforms, things like that. So when we say images were similar, we mean with respect to things like rebalancing the white scale, or changing the hue and saturation, those kinds of image manipulations, or re-encoding it with a different JPEG encoding. It was tolerant to that kind of manipulation. It wasn't finding images of elephants; that's not what it was doing.

Yeah, I remember I took a course, actually, during my master's degree here in Finland, dedicated to data security. And one of the courses was about how you can tamper with images that have watermarks, right? And then, how do you make that watermark resilient to any tampering that might happen at the image level, on any of the bands, and so on, as you explained. So basically, digital image processing is the term to Google if someone wants to; it's a big, big topic. But what struck me in what you explained is that every image upload had to go through that process, which means it had to be super scalable. And also, your database of vectors would be ever-growing, because as each image passes through, or doesn't, you would need to add it somewhere to your vector space.

So in this case, no. This is the one advantage we had, because we only cared about matches to a specific, relatively small set.

Oh, I see. So it's a set that shouldn't grow, ideally, right?

Yeah, or only very, very nominally. I see. Yeah.
And so it's funny, because that's the big difference that made it easy in that era. Today, you'd have to bust out all the ANN stuff, and maybe things we'll get into, to be able to do a much more scalable vector search. So this was really more about evaluating a relatively fixed set of vectors, so you could hyper-optimize how that set was organized, but evaluating it at an insane scale. So the update problem wasn't very hard, but the evaluation problem needed to be extremely high scale.

Yeah, a bunch of questions in my mind, but let's move on to Rockset. Tell me more about the "what" part, you know, what it is as a product, and then slowly let's go deeper into the technology side.

Yeah. So my standard statement of what Rockset is: Rockset is a search and analytics database built for the cloud. And, I forgot one word, it's a real-time search and analytics database for the cloud. Now, that's a bunch of little buzzwords, and it's very easy to get lost in the marketing feel of that, but each of those words does a non-trivial amount of work in what I'm really trying to build here. So first of all, it's a search and analytics database. Here, what we mean is an OLAP-style analytics database; that's where we're starting. We want to run analytics-type queries, and, I won't get into all of this, but this is separate from your OLTP-style databases. So this is not MySQL; it's not a large transactional thing. It is an OLAP-style database. And search and analytics is a very interesting pairing in this world, because systems like Elasticsearch, which are very search-oriented, and systems like Rockset, which have analytics styles, are actually not that different architecturally. The way you use them may feel different.
The primitives you're using feel different, but all of that sits fairly shallowly in the technology; the underlying architecture of these systems ends up looking quite similar. So search and analytics actually go together quite nicely, from an "I can do both" perspective. Maybe I don't do both well, but that will mostly exist at the top, not in the infrastructure. Next, it's in the cloud. So the whole system is built to be elastic from the beginning. If you send me twice as much data, I can scale you out in a way that just works. You don't have to worry; you're not reprovisioning more machines to double your cluster size or anything like that. And then real-time. Our focus has always been real-time, which is to say, most people, when they think of real-time, want their queries to be fast, but the real heart of real-time is ingest latency. If you send me new data, how quickly does that data get manifested in the queries? If it shows up in tomorrow's queries, that's not a real-time system. And there are a lot of systems like this, these very big batch-style, mega-exabyte Hadoop-type clusters where you can query yesterday's data, right? And get genuinely enormous amounts of data. That is not Rockset; that's not Rockset's problem. For us, it's like, hey, if you want the last minute's data, ideally on a working set several zeros smaller, then that's where Rockset is meant to work really well. And so this is the heart of what we've set out to build, at a high level. And I don't know if you want me to keep going; I feel like I've already said too much.

No, it's amazing, it's a good start. I wanted to stay a little bit on the product side, and flip over to the use cases for a moment.
So what are the typical use cases? Can you zoom out as much as possible, maybe even giving hypothetical examples, that's fine. For example, products that use your product.

Yeah. So we have a bunch of customers in a bunch of different domains. One way to think about this is just who's using it and why, like what domains they're using it in. So for example, we have a bunch of gaming customers. There are real-time events occurring in games. Imagine an online game of some sort: they're collating that information constantly and keeping, say, leaderboards or things like that genuinely up to date. There are also several logistics and supply chain type people using it. So, where is my package right now, or where is the boat in the ocean? These kinds of queries are very commonly done; they're basically tracking their entire supply chain, trying to find shortages and what's going to create problems down the line, in a logistics-type setting. There's a lot of fintech; a lot of financial firms use it, a lot of fraud detection. Again, fraud and spam are very real-time problems. You can't detect yesterday's spam or fraud; that's really harmful. You need to know now, right? And then a lot of recommendation and product experience. Anytime you want to power a user-facing experience, you almost always need that to be real-time. So an example I like to use: there's a place called Whatnot. Go to whatnot.com if you've never heard of it. Whatnot is basically a streaming site for buying and selling. It's sort of eBay meets Twitch, the easiest way I can describe it. But what's really cool about that is you have a recommendation problem: I want to buy something, and people are selling things.
So when I'm on the site, it's in their interest to show me things like, you might want to check these out. That's a recommendation problem, but it's really real-time, right? It has to match me to online sellers at any given moment. And so it's a recommendation system that needs high scale, and it also needs to be real-time; it needs to use a lot of real-time data. So these are all use cases for Rockset. Every one of these is real customers using Rockset to do something.

Yeah, for sure. Now I want to jump back to the tech side. So Rockset, inside it, are you using RocksDB or something else?

So, okay. Are we using RocksDB? First of all, do we know what RocksDB is? Just so everyone's on the same page: RocksDB is an engine that was built by Dhruba at Facebook. And I shouldn't say by Dhruba, but by a team that Dhruba was a part of; he was one of the original founders of that team, and there were certainly a lot of people involved in RocksDB. It's a key-value store, right? It's built to scale very well, and to do log-structured merges over time. Rockset absolutely uses RocksDB as its storage plane, and there's a lot of Rockset built on top of RocksDB. So Rockset is not RocksDB-as-a-service; that is not what Rockset is. We do use it as the storage plane of Rockset, and we do take heavy advantage of, again, to get into the technical weeds a little bit, log-structured merges to keep our indexes up to date continuously. And that is a big part of the real-timeness of Rockset. Being able to update the index continuously, having this heavyweight infrastructure to merge these indexes, and the kind of append-only, log-structured way you do that in the LSM world, is part of the secret sauce. It's not that secret, but it's part of the secret sauce of Rockset. Yeah, for sure.
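The log-structured merge idea described above can be illustrated with a toy sketch: writes land in an in-memory table, get flushed to immutable sorted segments, and segments are periodically compacted, with the newest value winning. This is only the general LSM pattern, not RocksDB's actual (C++) implementation, and every name here is made up.

```python
# Toy sketch of a log-structured merge (LSM) store. Writes go to an
# in-memory table, which is flushed to immutable sorted "segments";
# segments are periodically compacted (merged), newest value wins.
class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []            # oldest first; each is a sorted dict
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value    # writes never rewrite old segments
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def get(self, key):
        if key in self.memtable:      # check newest data first
            return self.memtable[key]
        for seg in reversed(self.segments):
            if key in seg:
                return seg[key]
        return None

    def _flush(self):
        self.segments.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def compact(self):
        # Merge all segments into one; later segments shadow earlier ones.
        merged = {}
        for seg in self.segments:
            merged.update(seg)
        self.segments = [dict(sorted(merged.items()))]

db = ToyLSM()
db.put("a", 1)
db.put("b", 2)   # hits the limit, triggers a flush
db.put("a", 3)   # newer value shadows the flushed one
db.compact()
print(db.get("a"))  # 3
```

The append-only writes and background merges are what make continuous index updates cheap, which is the property being leaned on for real-time ingest.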
But then all these things like vector search, storing the embeddings, is that also happening inside RocksDB, basically in the layer you explained?

So hold on, you asked about vectors, and what were the other things? Oh, embeddings. Embeddings, and vector search itself, and the ANN indexes, presumably.

Yeah. So for the ANN index, we've extended RocksDB a little bit to have this notion of a blob of memory that you attach to a particular thing, which is going to be the ANN index. And then you can build custom operators to merge them, for example. So we do essentially shove the ANN index into this, and it goes into RocksDB. RocksDB doesn't know about ANN indexes; it just knows there's a blob of memory that it has to log-structure-merge down the road. As for embeddings, for us that's just arrays. For us, an embedding is just a vector, and a vector is just an array. There's no real difference in the way these things are stored, and those are stored in RocksDB.

Yeah, got it. And basically, what other AI capabilities does Rockset offer? What's the secret sauce of that thing? They're user-facing, right? But still.

So there are a few things to talk about here, since we're talking about secret sauce. One thing we skipped over that's worth touching on in terms of Rockset's architecture is that Rockset has two things that you hope every database has, but not every database does. One is fully disaggregated storage. So you can double your storage, or you can double your compute; you can do either. You don't have to do both, right? They're independently scalable: there are compute-optimized machines and storage-optimized machines, and you can add to either group independently. We also have compute-compute isolation.
So you can set aside a set of machines, for example, just to do ingest, and a different set of machines just to do queries, and they both operate on the same backend. You can go further than that: you can have different groups of machines for different sets of queries, or per tenant, or whatever; you can go wild with this idea of isolating compute from compute, right? Once you have disaggregated storage, this is an idea you can implement. This is already really powerful for AI use cases, in a way you don't necessarily appreciate, because what it means is I have a way to do my index rebuilds, which are expensive in a vector world, away from the machines handling queries. What's not going to happen is the database bogging itself down doing an index update of some sort while queries are trying to be served, and you get timeouts. So being able to actually separate out compute is very powerful in these AI settings. Another example, no one's done this in total anger yet, but it's coming: hey, I have a god-awful number of vectors, and I want to update them to the next generation; the new OpenAI model has come out, and I want to rerun the entire data set. We can do that in this off-to-the-side fashion, in a way that just redoes it all in place without affecting the running application. So that's one very architectural, very database-type feature that you will miss if you don't have it when the day comes. Moving up to more AI-level things, the other thing we have is a huge pile of infrastructure for doing SQL and relational queries, right? In this system, that's separate from the vector stuff. And when the vector stuff gets mixed with that stuff, things get very powerful and very magical.
And so this gets you into, well, it's funny, because database people talk a certain way and AI people talk a certain way, and a lot of the time they're actually saying the same thing, but they use none of the same words. So they don't know they're talking about the same thing. As an example, in an AI context, things like metadata filtering or hybrid search: these are all things Rockset does out of the box. Metadata filtering in an AI or vector context, that's just the WHERE clause of a SQL query. That's all it is: where X is greater than this, and the time is greater than that. So for us, that's all done; metadata filtering is easy, that's not a hard problem at all. We have a super powerful query language and a query optimizer. All you have to do is merge that with the ANN vector search, and metadata filtering is not a hard problem for us to solve, the way it would be for others. And so I do think we really shine in situations where (a) you care about real-time ingest, and (b) you care about any kind of hybrid search or metadata filtering. Rockset is really good as well for raw vector power, but I wouldn't say we're the best database in the world for that; I view it more as we're going where our customers are taking us. If a customer came to me and said, hey, I could have 10 times more vectors and 4% more precision and recall if you implemented this slightly better algorithm with these parameters, we would do it. But almost always, hybrid search seems to be the king. It's merging these things. And that's where a lot of our effort has gone: into making the hybrid search story work, making these two worlds work together fairly seamlessly.
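The "metadata filtering is just a WHERE clause" point above can be sketched as a two-step query: a SQL predicate pre-filters the rows, and the survivors are ranked by vector distance. The table, columns, and data below are invented for illustration, and Rockset itself does this inside one SQL query with its own functions; this only shows the shape of the idea.

```python
# Sketch: metadata filtering as a plain SQL WHERE clause, followed by
# brute-force vector ranking of the filtered rows. All names are made up.
import sqlite3, json, math

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, category TEXT, ts INTEGER, emb TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [
        (1, "gaming", 100, json.dumps([0.1, 0.9])),
        (2, "fintech", 200, json.dumps([0.8, 0.2])),
        (3, "gaming", 300, json.dumps([0.2, 0.8])),
    ],
)

def search(query_vec, category, min_ts, k=2):
    # The WHERE clause *is* the metadata filter.
    cur = conn.execute(
        "SELECT id, emb FROM docs WHERE category = ? AND ts >= ?",
        (category, min_ts),
    )
    scored = []
    for doc_id, emb in cur:
        dist = math.dist(query_vec, json.loads(emb))  # Euclidean distance
        scored.append((dist, doc_id))
    return [doc_id for _, doc_id in sorted(scored)[:k]]

print(search([0.15, 0.85], category="gaming", min_ts=150))  # -> [3]
```

With a real query optimizer, filter and vector ranking can be planned together rather than run as two passes, which is the advantage being described.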
Being able to say, show me the closest 10 vectors that were updated in the last 10 minutes, that kind of query is really powerful, and that's what we've been focused on.

But I guess the timestamp example you gave is also a metadata check, right? It's kind of like a WHERE clause where you say between timestamps A and B.

Yes. Yes.

But hybrid search, at least the way I'm hearing people do it, is this: let's take the search domain example. You might have a keyword search, right, which is your sparse index, and then you have your semantic vector search. And you want to combine the two in some way. For example, you could say, I still trust keyword search, so let's give it 75% of the weight, and 25% goes to the vector side. Then you combine them with some merging strategy and return the result to the user. Is this how you see hybrid search?

So I have a whole rant here. You've unlocked my rant. So let's go. Hybrid search is one of these very overloaded terms, exactly as you say. This is sometimes what people mean; sometimes people smuggle metadata filtering in as hybrid search. Strictly speaking, under my definition, metadata filtering is a kind of hybrid search; it just has extreme weights, right? Weight one if it matches and zero if it doesn't. So it's kind of a weighted hybrid search. You can also do this linear-combination hybrid search, right? I have a BM25 keyword-type ranker, which, by the way, Rockset can do: in Rockset you can write ORDER BY your keyword ranking, LIMIT 10. And then you can also do the vector LIMIT 10, show me the 10 closest vectors. There's nothing stopping you from saying ORDER BY 0.25 of one plus 0.75 of the other, as in your example.
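The linear-combination hybrid search just described can be sketched as follows. The scores below are made-up stand-ins for BM25 and cosine similarity, and the min-max normalization is one common choice among several, not anything specific to Rockset.

```python
# Sketch of linear-combination hybrid search: blend a keyword-ranker
# score (e.g. BM25) with a vector-similarity score using a weight alpha.
def hybrid_rank(keyword_scores, vector_scores, alpha=0.75, k=3):
    """alpha weights the keyword side; (1 - alpha) weights the vector side.

    Scores are min-max normalized first so the two scales are comparable.
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    kw, vec = normalize(keyword_scores), normalize(vector_scores)
    docs = kw.keys() & vec.keys()   # docs retrieved by both stages
    combined = {d: alpha * kw[d] + (1 - alpha) * vec[d] for d in docs}
    return sorted(combined, key=combined.get, reverse=True)[:k]

keyword_scores = {"d1": 12.0, "d2": 7.5, "d3": 1.0}   # pretend BM25 scores
vector_scores = {"d1": 0.20, "d2": 0.90, "d3": 0.85}  # pretend cosine sims
print(hybrid_rank(keyword_scores, vector_scores, alpha=0.75))
# keyword-heavy alpha -> ['d1', 'd2', 'd3']
print(hybrid_rank(keyword_scores, vector_scores, alpha=0.25))
# vector-heavy alpha -> ['d2', 'd3', 'd1']
```

Flipping alpha flips the ranking here, which is exactly why the question of how to learn alpha for a given data set matters.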
So that kind of linear-combination hybrid search is doable; that's how you can do that kind of hybrid search on Rockset today. Now, people do slightly more advanced things than this, by the way. You can go beyond that in hybrid search and get into things like bi-encoding and cross-encoding, where you really do try to take the expanded vector space and treat it non-linearly, so it's no longer a linear combination of the two halves. And this is something we are actively looking at. I don't think it's hard to add; it's an easy extension onto the current system. But it's more of a science question. If you tell me what to add, I'll add it, sure, that's easy. But what do we add? What's the right cross-encoder? I don't know; that's a much harder problem, much more of a scientific question: do I need to train an encoder for your particular use case, or is there such a thing as a good off-the-shelf one? So that's where we're at with this. But in terms of adding that functionality, this is the frontier for us right now, for those people trying to go beyond the kind of bilinear setup.

Yeah, it's amazing. Maybe you can share some resources as well, for me and the audience to read. But also, I thought, you know, when the hybrid search topic emerged in the vector database world, you know, Weaviate, Pinecone, Milvus, whatever, and so on, one thing that was overlooked, and I really wanted to tap into it at some point, is how the alpha arises. Because it's not a given: if you go with a linear combination, what should the alpha be for your data?

Yeah, it's fun. This is where the search community has been doing
things like this for a long time. Search people are quite familiar with this idea: I have a semantic search system, I have a keyword ranking system, I have an alpha; I learn that alpha and inject it into my system. And they've gone even further; search has this whole WAND idea, I don't know if people are familiar with it. So again, we have all these communities. Like WAND, exactly right. Jo Bergum, listening to this podcast, will probably say, yeah, I know what you're talking about. Yeah. So it's funny, because the vector community is sort of, I mean, it's not rehashing, it's not relearning, because it's got this new thing, this ANN stuff, but it's got to drag ANN through the search history of all these other ideas. So yes, learning the alpha parameter: this is not a particularly hard thing to do using Rockset, but it's not a thing we help you with. I don't have a button you push to automatically learn your alpha. You can send me whatever query you want, with whatever alpha in it you want; you can build whatever system, query us arbitrarily to generate an alpha however you'd like, and then send me the queries with the alpha you derived. That's roughly how it's going to look today for us.

Yeah. Do you think at all, maybe peeking a little bit into the future, that the industry will one day end up suggesting these values to users? You know, learning from the data, maybe even looking at how things behave in production, what people click? Although there is a risk of going too much into the application logic, which you probably do not want to do.

So my view is kind of like, once upon a time I had a similar feeling. This reminds me of a similar discussion that happened not that long ago, which was
around feature stores. Database people looked at feature stores and were like, what do you need a feature store for? Just use a database to store your features. And the reality is, most feature stores are exactly that: they are databases with a lot of things put on top to help manage, as a first-class citizen, the lifetime of a feature, like orchestration platforms such as Tecton, orchestration systems. I think no matter what, there will always be a database in there, and something like Rockset will be in there. And the question of whether Rockset the company becomes a larger piece of software that has Rockset the database plus some orchestration layers above it to help you do these kinds of things, that's a harder question. If you ask me to make a prediction about where things are going, my guess is that for the foreseeable future, hybrid search of some kind is king. Very few problems will be purely vector search; that's my guess. Almost all will benefit greatly from some form of hybridization, even if it's just metadata filtering. And that means the more advanced search techniques will slowly migrate over, which means things like alpha learning, and WAND, and all these other higher-level, two-stage-retrieval-type ideas that come from the search world. I do think they will come over and influence the vector search world more and more, because vector search ultimately is a form of search, so it shouldn't be surprising that most of these same ideas still apply.

Yeah, for sure. I mean, there's this extreme example from Mark Cuban, in the episode of the Lex Fridman podcast that I just finished listening to. He says that probably in the future all of us will have our own LLMs, trained for whatever purpose. For example, you want to do stock trading, and so you start training your model, maybe on a specific subset of stocks
or whatever, and then it will help you; it will augment you, as they say, as an entity.

Yeah. I would love a ChatGPT that could, for making an email, sketch me a skeleton email in my voice. Because the ChatGPT voice, if I say, hey, write me an email to say this to somebody, it's not my voice, right? It's, I don't know, a little too corporate; my voice is a little different. So it'd be cool to have it learn my voice and be able to write me a skeleton of something that sounded like me. That would be awesome. I'm there for that.

Yeah, and what I would like is some model, or whatever it is, that would remind me that I forgot to drink water, you know? Something like that. So it learns my habits, and it knows what's bad for my health: remember to do these things, remember to stand up, remember to walk, things like this.

You know, I drank some water, that's good. Everyone drinks water, yes.

Yeah, please do, because it's very healthy. You need to drink, I guess, two liters a day or whatever. Some people forget this, and then they say, I have to take pills or whatever; no, you don't, just drink water. But so, what else do you want to share about Rockset as an offering, as an AI enabler? For example, do you guys plan to support RAG, or do you think RAG is sort of a client-side thing that people can do using your tech?

No, we actually have a bunch of RAG-style use cases on Rockset today, and I do think Rockset naturally supports RAG. But it's interesting. One of my open questions is about pure RAG, and I'm making up a term here, but it is actually one of the very few almost perfect vector use cases: it's pure vector search. But I'm actually not convinced, because even most of the people we know who are doing RAG-style things are also doing some amount of boosting and/or metadata filtering
to further augment, to hybrid-augment, the retrieval that augments the generation. So, for example: hey, if the user asks about a certain thing, then when you search for blurbs to augment the generation, boost the more recent ones, that kind of thing. This kind of logic gets injected into these systems. You can build this with Rockset today, and I'm quite keen on these kinds of use cases. I would say that, looking forward, I am quite interested in this emerging dynamic of where the real value goes from here. There are at least three dimensions things could go in. One is better and better ANN algorithms that squeeze more performance, more scale, more recall out of every byte of RAM and so forth. Another direction is incrementality: a lot of these really advanced, really strong ANN indexes are not easily updateable, so updates either destroy a lot of what you just worked really hard to build, or cost way too much CPU. So which is better? In real life, would I rather update twice as fast, or twice as painlessly, or would I rather get three and a half percent more precision and recall? And then the third dimension is how these things integrate with other indexes, right? Certain ANN indexes are much better at doing metadata filtering at scale than others are. And so, if there's more value in that than in the 3% I got over here, then, so it's not altogether clear. We are pretty heavily betting on, well, I shouldn't say betting: right now we got the hybrid stuff relatively easily, so that's the thing we're building heavily, because of all the hybridization, and the incrementality, because that's core. For us, incrementality is not optional; you have to
have that. I can't use an ANN index that requires overnight training; that doesn't work with Rockset, because we're trying to be real-time. And then I guess there's a fourth dimension that could blow all this up, which is that somehow the vectors get so good that none of the rest of this matters. Maybe there is no RAG; maybe the vectors are just good enough, maybe the machine is smart enough, that we don't need any of the rest of this, we don't need any hybrids. I think that's unlikely in the short and medium term, but who knows, in the long term.

That would probably require some kind of singularity jump, right? Because that means you wouldn't need foundational models from Meta or whoever; you could train one from scratch, and if you can do it within a couple of minutes, why would you bother taking those models, right? That's very interesting.

That's why I said there are three, and then I threw the fourth one in, because it's not impossible, but I think it's not likely anytime soon.

Exactly. I mean, if this were about to happen, we would probably already see the signals of it, but today we still see these giants keep training the models, and they keep open-sourcing them, sometimes in quotes, sometimes for real. But yeah, that's another topic to cover. I have a very practical question as well. For example, if I do have a model, and that model could be from Hugging Face, for example, so it's not mine, how do I bring the embeddings to Rockset? Can I leverage Rockset's infrastructure to compute the embeddings themselves?

So the answer, in short, is no today, and it is super high on my list. If a customer came to me tomorrow and said, hey, I want to run this model using your infrastructure over my data, I'd probably find a way to make that work for an existing customer, because that's a
feature I want to build. I'm waiting for the excuse to build it. The problem for me is that it's just really hard to build generally. If it were "call this API" or "support these exact kinds of models", it wouldn't be so hard, but doing it in general, without a specific customer demand, is a little trickier. So we can wait until that takes a little more shape. But we have the pieces in place; it's not hard for me to spin up a bunch of machines that run over your data and write to your database. It's the actual last mile, like what code do I run, how do I secure that code, that kind of stuff, that's what's missing for us today. So today, you have to give me the embeddings; you're going to have to compute them yourself and put them in Rockset. But this is at the top of my list of features I want to build.

Yeah. And by the way, if you take databases today, you could probably divide them into two groups using this dimension specifically: whether or not you can compute embeddings inside. And sometimes you do not want that, because you want to fine-tune the model, and obviously the database wouldn't have access to it unless there's a very easy way to plug it in, which I haven't seen, by the way; probably I'm missing something, but I haven't seen it. And everyone today has some sort of vector support, both the traditional databases as well as this new breed of vector databases. But yeah, it's interesting that you guys are looking in that direction. What else? If someone in the audience wants to try Rockset today, do they need to pay right away, or can they get some free tier to play around with?

Oh, there's a free tier, yeah. So you can play around for free in Rockset, and if anybody is super interested and has something interesting, they can always email us too. We will
try to find a way to make that work as much as possible. But yes, there is a free tier you can play around with. And it is managed; the one thing you have to understand about Rockset is that it's a managed service, right? So you're not going to download it and run it yourself or whatever; that's not the way it works.

No, and by the way, that's exactly the advantage for businesses, right? And that's why we have different business models, because at the end of the day you're not doing this only for fun; you really need to earn money too, for the company to grow and build more things for your users. And so that's an absolutely legit approach; not everything needs to be open source. You chose it that way, and it's great that you have a free tier, and we can also link it in the show notes. Sure. What are you looking for? You said you already have so many clients in different domains, different verticals. What else would you benefit from by sharing Rockset with a wider community, you know, through these podcasts?

All right, so there are a lot of ways to answer this question, but this is the vector group, right? So, selfishly, and I kind of already hinted at this, I'm trying to get a clearer sense of where the value is going to come from for vectors in the short and medium term. There are a lot of people out there, and we've seen this: there are a million people going, oh my god, vectors are happening, how do I plug this into my business? Can I use this? And we've seen a bunch of interesting, super novel use cases, things you would not expect. You know, there's an insurance company that wants to scan internal documents; they want to do internal search, semantic search. And so my most selfish interest here is to really get a clear picture of which of these little subdomains is actually providing
real value, of what is really taking off. Sometimes it's hard to tell who's just messing around, because everyone's messing around, literally everyone is messing around, and who's actually latched on to something that's got some real legs. And every time we find a customer that's got real legs, we dig in; we're all in. We're like, all right, how can we help you? Again, I'm waiting for one of these people to come back and say, can we retrain our embeddings?, so we can go, all right, let's go build it. So that's kind of it: I want people to keep messing around with this stuff. All of us messing around is going to find where it gets traction, where we can get our hooks in and where things start to really make progress, and then I just want to hear from those people; I want to know what you need. Every time we talk to someone it's something new and surprising. And that's kind of the thing: when the real world intersects with all this... in my head it's all indexes and graph theory or whatever, but when the real world intersects, it's always something simple that you need, something that would make your life a lot easier, and that's the kind of stuff I'm eager to hear.

Yeah. I could share one example with you without saying who it was. One member of my team said: hey, we're using one search engine today which, beyond the sparse index, also has vector search support. And he was saying: okay, they're using the HNSW algorithm, but I cannot tweak the M parameter (and I forgot what the second parameter was), and because I cannot do that, recall is really below what it needs to be; it just doesn't work. And then he went online (it's an open-source database), filed the issue on GitHub, and they realized: oh, we missed a really important thing. So they quickly exposed the parameters, and now he can tune them.

Right, so the tuning of the index is another one; this is a good one. A lot of these systems have tiers, a coarse grain and a fine grain, so you have HNSW over IVF, or just IVF, and then each of these has parameters, and so you get these massive config strings that say how these indexes are built. And we expose this, so you can do all this stuff. But in real life, if you're building something, what number do you even pick? How do you know? I don't know; that person must have gone through a lot to decide they needed to change that at all, because it's not obvious. It's not like you look at the data and go, yeah, 16 is wrong. The infrastructure to optimize this system is not trivial, and even if you do optimize it, you have to rerun everything; you have to rebuild that index once you've tuned it, so to speak. So yeah, I think that's a huge area where our infrastructure is not helpful at the moment.

Yeah, but I'm sure you will learn. In general I'm excited. Look, Luis, you have so much information that I think we should record another episode down the road, as you guys make progress on the database and add all these interesting tweaks. And I'm also super excited about the direction, because basically, if you take pure vector databases, they do not implement SQL support, right? The purpose of their existence is something else: they've been designed to have vectors as first-class citizens, so they make it super easy to plug in a model, or to pull the model almost directly from Hugging Face or some other model hub. But then when you want to do some facets, or whatever you want to call them, aggregations, that's not as easy. It probably depends on the database as well, but I've seen some; I don't want to name them,
but in any case, they know it's a weak point, and it's probably because they do not want to serve that segment of the market. Or maybe they do.

It's partially right, but it's also so hard.

Yeah, exactly.

Yeah. I mean, I think the really good vector databases, the ones that succeed, will slowly turn into databases, and databases will turn into... these things are merging; they're just coming at each other from different directions. If you're building a vector database and you're looking at your metadata-filtering support, at some point you go: I can't make this more powerful without just reinventing SQL; at some point I'm going to have to just build SQL. So one day they're going to bite the bullet, and, well, maybe not SQL, but something SQL-complete, if you will, because you just need all that stuff. And then pretty soon you get into the problem of: hey, my metadata filter is the slow part of my thing, so now what? Oh, now I'm doing query optimization, SQL query optimization; now I'm building query optimizers, metadata-filter optimizers. And we have all that; we brought all that to the party. I have a cost-based optimizer for my SQL queries, so if your metadata filter does crazy stuff, I can do all kinds of SQL magic to optimize that query. But on the flip side, I think the good systems need all this stuff. So we took two hard problems and we said: congratulations, this is now one hard problem. And it's like, okay, well, it's a big hard problem.

Yeah, I love how you model it, that databases and non-databases will sort of converge eventually, even though at this point I think everyone calls themselves a database, with maybe minor exceptions. But you are spot on: first of all, on what a database even is, and then on whether or not you have all these features that need to be supported. And also, really importantly, the world is used to having SQL databases, right? I don't have a better analogy, but basically, if you develop something and you say it can run but cannot walk, well, okay, but sometimes you need to walk, right?

That's amazing. Before we close, I really like to ask this question. Some people find it a little awkward to answer, but I do feel it's important; it's a little bit philosophical. I ask: what drives you? It used to be "why do you do this?", but basically: when you wake up, you are driven to continue, but what is it inside that's spinning you? You've been through it, you've been doing this for so many years, also at Facebook, at scale, and you want to continue to do it.

So, the way I think about this is that there's a shiny problem at the heart of all this that I love, and if you let me, I will sit there and I will be happy if I just come into work every day, look through the corridors and fix bugs, anything that's crashing, then look through the profiles and optimize code. I can just do this; this just makes me happy. Building reliable, scalable systems makes me happy. So there's this shiny problem in the middle of all this, and it's the common thread through everything, something I could just do and be happy, and it's rewarded and rewarding. So that's the basis: it's really easy to like this stuff. Then, obviously, you have to extend upon that. The way you get driven beyond the shiny thing (because I could go do that for Minecraft mods; I don't have to do it for databases) is some larger mission that you feel connected to. And for me, the mission here was a little bit twofold. The people were actually kind of the original driving force: it's the people; I don't even care what we're doing, let's go do it, us as a group, that's going to be fun.
But then the whole AI thing... I mean, look, we can get philosophical; you want to get philosophical, let's do it real quick. There are like two or three nominations for technologies that will change the 21st century, and you've got to work pretty hard not to put AI at the top of that list. Maybe there are some others you could argue for: maybe nuclear fusion is a 21st-century revolution, maybe gene editing, I don't know, you could come up with something. But chances are that AI is going to be a defining 21st-century technology. So, you're going to let me play with my shiny toys on that? Yeah, that's okay, I'm out of bed now. I'll get out of bed, I'll come in, and let's go, let's go build something. I think that's my answer to your question.

Amazing. And I think you've got it: the passion, the knowledge; you did see the movements, so I'm really excited to see what you guys get built. Thank you so much for joining me today to discuss. Yes, we didn't go down to the N-and-M tuning of this algo and how exactly the algorithm goes, but hey, I really enjoyed the product level; this is, as some at my company say, on the money. Fantastic. Thank you so much, Luis; enjoy your day, and let's talk soon.

Awesome. Thank you for having me, and yeah, happy to chat again.

All right, cheers. Bye-bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md b/transcripts_with_timestamps/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md new file mode 100644 index 0000000..c354063 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/malte-pietsch-cto-deepset-passion-in-nlp-and-bridging-the-academia-industry-gap-with-haystack.md @@ -0,0 +1,3051 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=N5Brb7Rzc2c

Topics:

00:00 Introduction

01:12 Malte’s background

07:58 NLP crossing paths with Search

11:20 Product discovery: early stage repetitive use cases pre-dating Haystack

16:25 Acyclic directed graph for modeling a complex search pipeline

18:22 Early integrations with Vector Databases

20:09 Aha!-use case in Haystack

23:23 Capabilities of Haystack today

30:11 Deepset Cloud: end-to-end deployment, experiment tracking, observability, evaluation, debugging and communicating with stakeholders

39:00 Examples of value for the end-users of Deepset Cloud

46:00 Success metrics

50:35 Where Haystack is taking us beyond MLOps for search experimentation

57:13 Haystack as a smart assistant to guide experiments

1:02:49 Multimodality

1:05:53 Future of the Vector Search / NLP field: large language models

1:15:13 Incorporating knowledge into Language Models & an Open NLP Meetup on this topic

1:16:25 The magical question of WHY

1:23:47 Announcements from Malte

Show notes:

- Haystack: https://github.com/deepset-ai/haystack/

- Deepset Cloud: https://www.deepset.ai/deepset-cloud

- Tutorial: Build Your First QA System: https://haystack.deepset.ai/tutorials/v0.5.0/first-qa-system

- Open NLP Meetup on Sep 29th (Nils Reimers talking about “Incorporating New Knowledge Into LMs”): https://www.meetup.com/open-nlp-meetup/events/287159377/

- Atlas Paper (Few shot learning with retrieval augmented large language models): https://arxiv.org/abs/2208.03299

- Zero click search: https://www.searchmetrics.com/glossary/zero-click-searches/

Very large LMs:

- 540B PaLM by Google: https://lnkd.in/eajsjCMr

- 11B Atlas by Meta: https://lnkd.in/eENzNkrG

- 20B AlexaTM by Amazon: https://lnkd.in/eyBaZDTy

- Players in Vector Search: https://www.youtube.com/watch?v=8IOpgmXf5r8 https://dmitry-kan.medium.com/players-in-vector-search-video-2fd390d00d6

- Click Residual: A Query Success Metric: https://observer.wunderwood.org/2022/08/08/click-residual-a-query-success-metric/

- Tutorials and papers around incorporating Knowledge into Language Models: https://cs.stanford.edu/people/cgzhu/

' +image_url: https://media.rss.com/vector-podcast/20220830_070827_46ba9c40226c9b5c8e39886c99b0aea3.jpg +pub_date: Tue, 30 Aug 2022 07:27:26 GMT +title: Malte Pietsch - CTO, Deepset - Passion in NLP and bridging the academia-industry + gap with Haystack +url: https://rss.com/podcasts/vector-podcast/599924 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 29.0, "text": " Hello + there, Vector Podcast. Season 2, we are relaunching after summer and it was a little", + "tokens": [50364, 2425, 456, 11, 691, 20814, 29972, 13, 16465, 568, 11, 321, 366, + 5195, 46079, 934, 4266, 293, 309, 390, 257, 707, 51814], "temperature": 0.0, "avg_logprob": + -0.43664016723632815, "compression_ratio": 1.0588235294117647, "no_speech_prob": + 0.09692271053791046}, {"id": 1, "seek": 2900, "start": 29.0, "end": 33.08, "text": + " bit of break last episode was from Berlin buzzwords and today,", "tokens": [50364, + 857, 295, 1821, 1036, 3500, 390, 490, 13848, 13036, 13832, 293, 965, 11, 50568], + "temperature": 0.0, "avg_logprob": -0.31167654389316596, "compression_ratio": 1.5369649805447472, + "no_speech_prob": 0.5647331476211548}, {"id": 2, "seek": 2900, "start": 33.08, "end": + 40.96, "text": " coincidentally, we have a guest from Berlin, multi-peach, a studio + of deep set, the company", "tokens": [50568, 13001, 36578, 11, 321, 362, 257, 8341, + 490, 13848, 11, 4825, 12, 494, 608, 11, 257, 6811, 295, 2452, 992, 11, 264, 2237, + 50962], "temperature": 0.0, "avg_logprob": -0.31167654389316596, "compression_ratio": + 1.5369649805447472, "no_speech_prob": 0.5647331476211548}, {"id": 3, "seek": 2900, + "start": 40.96, "end": 46.400000000000006, "text": " behind Haystack. 
So we''re + going to be diving into what I call a neural framework, but I wonder", "tokens": + [50962, 2261, 8721, 372, 501, 13, 407, 321, 434, 516, 281, 312, 20241, 666, 437, + 286, 818, 257, 18161, 8388, 11, 457, 286, 2441, 51234], "temperature": 0.0, "avg_logprob": + -0.31167654389316596, "compression_ratio": 1.5369649805447472, "no_speech_prob": + 0.5647331476211548}, {"id": 4, "seek": 2900, "start": 46.400000000000006, "end": + 54.0, "text": " if Malta would give a different picture there, but still very interested + to learn and dive", "tokens": [51234, 498, 5746, 1328, 576, 976, 257, 819, 3036, + 456, 11, 457, 920, 588, 3102, 281, 1466, 293, 9192, 51614], "temperature": 0.0, + "avg_logprob": -0.31167654389316596, "compression_ratio": 1.5369649805447472, "no_speech_prob": + 0.5647331476211548}, {"id": 5, "seek": 2900, "start": 54.0, "end": 56.8, "text": + " into multiple topics there. Hey, Malta, how you doing?", "tokens": [51614, 666, + 3866, 8378, 456, 13, 1911, 11, 5746, 1328, 11, 577, 291, 884, 30, 51754], "temperature": + 0.0, "avg_logprob": -0.31167654389316596, "compression_ratio": 1.5369649805447472, + "no_speech_prob": 0.5647331476211548}, {"id": 6, "seek": 5680, "start": 57.8, "end": + 60.8, "text": " I''m good doing great. Thanks for having me today. How are you doing?", + "tokens": [50414, 286, 478, 665, 884, 869, 13, 2561, 337, 1419, 385, 965, 13, 1012, + 366, 291, 884, 30, 50564], "temperature": 0.0, "avg_logprob": -0.20092537379500888, + "compression_ratio": 1.6444444444444444, "no_speech_prob": 0.041570864617824554}, + {"id": 7, "seek": 5680, "start": 60.8, "end": 66.8, "text": " I''m good. I''m great. + It''s still summer. 
It''s super hot as we were exchanging before the recording.", + "tokens": [50564, 286, 478, 665, 13, 286, 478, 869, 13, 467, 311, 920, 4266, 13, + 467, 311, 1687, 2368, 382, 321, 645, 6210, 9741, 949, 264, 6613, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.20092537379500888, "compression_ratio": 1.6444444444444444, + "no_speech_prob": 0.041570864617824554}, {"id": 8, "seek": 5680, "start": 66.8, + "end": 74.8, "text": " It''s super, super hot, but I like it. So yeah, I think before + we dive into what is Haystack,", "tokens": [50864, 467, 311, 1687, 11, 1687, 2368, + 11, 457, 286, 411, 309, 13, 407, 1338, 11, 286, 519, 949, 321, 9192, 666, 437, 307, + 8721, 372, 501, 11, 51264], "temperature": 0.0, "avg_logprob": -0.20092537379500888, + "compression_ratio": 1.6444444444444444, "no_speech_prob": 0.041570864617824554}, + {"id": 9, "seek": 5680, "start": 74.8, "end": 83.8, "text": " I really like to learn + about yourself and what is your background and how did you find yourself in this + space", "tokens": [51264, 286, 534, 411, 281, 1466, 466, 1803, 293, 437, 307, 428, + 3678, 293, 577, 630, 291, 915, 1803, 294, 341, 1901, 51714], "temperature": 0.0, + "avg_logprob": -0.20092537379500888, "compression_ratio": 1.6444444444444444, "no_speech_prob": + 0.041570864617824554}, {"id": 10, "seek": 8380, "start": 83.8, "end": 89.8, "text": + " of what we call Vector Search? I wonder if you describe it differently, but I + call it Vector Search,", "tokens": [50364, 295, 437, 321, 818, 691, 20814, 17180, + 30, 286, 2441, 498, 291, 6786, 309, 7614, 11, 457, 286, 818, 309, 691, 20814, 17180, + 11, 50664], "temperature": 0.0, "avg_logprob": -0.18857783856599228, "compression_ratio": + 1.5622119815668203, "no_speech_prob": 0.14885388314723969}, {"id": 11, "seek": 8380, + "start": 89.8, "end": 93.8, "text": " Vector Search players. 
So can you tell a bit + about that?", "tokens": [50664, 691, 20814, 17180, 4150, 13, 407, 393, 291, 980, + 257, 857, 466, 300, 30, 50864], "temperature": 0.0, "avg_logprob": -0.18857783856599228, + "compression_ratio": 1.5622119815668203, "no_speech_prob": 0.14885388314723969}, + {"id": 12, "seek": 8380, "start": 93.8, "end": 100.8, "text": " Yeah, I''m sure + I''m happy. So I would say my background is mostly in NLP engineering,", "tokens": + [50864, 865, 11, 286, 478, 988, 286, 478, 2055, 13, 407, 286, 576, 584, 452, 3678, + 307, 5240, 294, 426, 45196, 7043, 11, 51214], "temperature": 0.0, "avg_logprob": + -0.18857783856599228, "compression_ratio": 1.5622119815668203, "no_speech_prob": + 0.14885388314723969}, {"id": 13, "seek": 8380, "start": 100.8, "end": 107.8, "text": + " what I would call probably these days. And during my studies, I basically had + no clue about NLP.", "tokens": [51214, 437, 286, 576, 818, 1391, 613, 1708, 13, + 400, 1830, 452, 5313, 11, 286, 1936, 632, 572, 13602, 466, 426, 45196, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.18857783856599228, "compression_ratio": 1.5622119815668203, + "no_speech_prob": 0.14885388314723969}, {"id": 14, "seek": 10780, "start": 108.8, + "end": 115.8, "text": " I think it wasn''t really any part of our coursework or + something really a thing.", "tokens": [50414, 286, 519, 309, 2067, 380, 534, 604, + 644, 295, 527, 1164, 1902, 420, 746, 534, 257, 551, 13, 50764], "temperature": 0.0, + "avg_logprob": -0.19991210569818335, "compression_ratio": 1.6055045871559632, "no_speech_prob": + 0.04129105433821678}, {"id": 15, "seek": 10780, "start": 115.8, "end": 122.8, "text": + " And for me, all then started basically after my studies, went to the research + project in the US,", "tokens": [50764, 400, 337, 385, 11, 439, 550, 1409, 1936, + 934, 452, 5313, 11, 1437, 281, 264, 2132, 1716, 294, 264, 2546, 11, 51114], "temperature": + 0.0, "avg_logprob": -0.19991210569818335, "compression_ratio": 1.6055045871559632, + 
"no_speech_prob": 0.04129105433821678}, {"id": 16, "seek": 10780, "start": 122.8, + "end": 126.8, "text": " which was at the intersection of machine learning and healthcare.", + "tokens": [51114, 597, 390, 412, 264, 15236, 295, 3479, 2539, 293, 8884, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.19991210569818335, "compression_ratio": 1.6055045871559632, + "no_speech_prob": 0.04129105433821678}, {"id": 17, "seek": 10780, "start": 126.8, + "end": 134.8, "text": " And the big, big focus there was on numerical data. So we + were basically trying to find signals, patterns,", "tokens": [51314, 400, 264, 955, + 11, 955, 1879, 456, 390, 322, 29054, 1412, 13, 407, 321, 645, 1936, 1382, 281, 915, + 12354, 11, 8294, 11, 51714], "temperature": 0.0, "avg_logprob": -0.19991210569818335, + "compression_ratio": 1.6055045871559632, "no_speech_prob": 0.04129105433821678}, + {"id": 18, "seek": 13480, "start": 135.8, "end": 141.8, "text": " and laboratory + measurements for kidney disease patients to predict some kind of risks.", "tokens": + [50414, 293, 16523, 15383, 337, 19000, 4752, 4209, 281, 6069, 512, 733, 295, 10888, + 13, 50714], "temperature": 0.0, "avg_logprob": -0.16551578367078626, "compression_ratio": + 1.6122448979591837, "no_speech_prob": 0.007430925965309143}, {"id": 19, "seek": + 13480, "start": 141.8, "end": 149.8, "text": " And there was all the kind of numerical + data. 
And NLP wasn''t really really scope of that project,", "tokens": [50714, 400, + 456, 390, 439, 264, 733, 295, 29054, 1412, 13, 400, 426, 45196, 2067, 380, 534, + 534, 11923, 295, 300, 1716, 11, 51114], "temperature": 0.0, "avg_logprob": -0.16551578367078626, + "compression_ratio": 1.6122448979591837, "no_speech_prob": 0.007430925965309143}, + {"id": 20, "seek": 13480, "start": 149.8, "end": 160.8, "text": " but there was + for me, that basically one kind of event that made me then get in touch with NLP + and eventually fell at fall in love.", "tokens": [51114, 457, 456, 390, 337, 385, + 11, 300, 1936, 472, 733, 295, 2280, 300, 1027, 385, 550, 483, 294, 2557, 365, 426, + 45196, 293, 4728, 5696, 412, 2100, 294, 959, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.16551578367078626, "compression_ratio": 1.6122448979591837, "no_speech_prob": + 0.007430925965309143}, {"id": 21, "seek": 16080, "start": 161.8, "end": 167.8, "text": + " And it was really in this project, we tried to predict a lot of these risk factors + through a lot of,", "tokens": [50414, 400, 309, 390, 534, 294, 341, 1716, 11, 321, + 3031, 281, 6069, 257, 688, 295, 613, 3148, 6771, 807, 257, 688, 295, 11, 50714], + "temperature": 0.0, "avg_logprob": -0.18421941333346897, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.0091433459892869}, {"id": 22, "seek": 16080, "start": 167.8, + "end": 172.8, "text": " I would say, quite fancy modeling to get some good signals.", + "tokens": [50714, 286, 576, 584, 11, 1596, 10247, 15983, 281, 483, 512, 665, 12354, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.18421941333346897, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0091433459892869}, {"id": 23, "seek": 16080, + "start": 172.8, "end": 183.8, "text": " And at the end, it kind of worked. 
We were + able to predict some risks, but when we then talked to doctors and showed them these + results or asked for their feedback,", "tokens": [50964, 400, 412, 264, 917, 11, + 309, 733, 295, 2732, 13, 492, 645, 1075, 281, 6069, 512, 10888, 11, 457, 562, 321, + 550, 2825, 281, 8778, 293, 4712, 552, 613, 3542, 420, 2351, 337, 641, 5824, 11, + 51514], "temperature": 0.0, "avg_logprob": -0.18421941333346897, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0091433459892869}, {"id": 24, "seek": 16080, + "start": 183.8, "end": 189.8, "text": " they said, yeah, yeah, that''s all correct. + Yeah, but it''s not really new. We knew that before.", "tokens": [51514, 436, 848, + 11, 1338, 11, 1338, 11, 300, 311, 439, 3006, 13, 865, 11, 457, 309, 311, 406, 534, + 777, 13, 492, 2586, 300, 949, 13, 51814], "temperature": 0.0, "avg_logprob": -0.18421941333346897, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0091433459892869}, + {"id": 25, "seek": 18980, "start": 190.8, "end": 199.8, "text": " But this part + here, this is like, this is an interesting one. And this is what we do there. And + that was basically the only small part where we,", "tokens": [50414, 583, 341, 644, + 510, 11, 341, 307, 411, 11, 341, 307, 364, 1880, 472, 13, 400, 341, 307, 437, 321, + 360, 456, 13, 400, 300, 390, 1936, 264, 787, 1359, 644, 689, 321, 11, 50864], "temperature": + 0.0, "avg_logprob": -0.20698158187095564, "compression_ratio": 1.8256880733944953, + "no_speech_prob": 0.0024313770700246096}, {"id": 26, "seek": 18980, "start": 199.8, + "end": 210.8, "text": " where we looked at written notes of of doctors during treatments. 
+ And from a modeling perspective, that was really, I would say, nothing fancy, nothing + advanced,", "tokens": [50864, 689, 321, 2956, 412, 3720, 5570, 295, 295, 8778, 1830, + 15795, 13, 400, 490, 257, 15983, 4585, 11, 300, 390, 534, 11, 286, 576, 584, 11, + 1825, 10247, 11, 1825, 7339, 11, 51414], "temperature": 0.0, "avg_logprob": -0.20698158187095564, + "compression_ratio": 1.8256880733944953, "no_speech_prob": 0.0024313770700246096}, + {"id": 27, "seek": 18980, "start": 210.8, "end": 215.8, "text": " nothing where + we spend a lot of time. But at the end, it was the point, I think, where the,", + "tokens": [51414, 1825, 689, 321, 3496, 257, 688, 295, 565, 13, 583, 412, 264, 917, + 11, 309, 390, 264, 935, 11, 286, 519, 11, 689, 264, 11, 51664], "temperature": 0.0, + "avg_logprob": -0.20698158187095564, "compression_ratio": 1.8256880733944953, "no_speech_prob": + 0.0024313770700246096}, {"id": 28, "seek": 21580, "start": 216.8, "end": 223.8, + "text": " the doctor''s physician saw the biggest value. And that kind of got me + to think again, thought, okay, well, like,", "tokens": [50414, 264, 4631, 311, 16456, + 1866, 264, 3880, 2158, 13, 400, 300, 733, 295, 658, 385, 281, 519, 797, 11, 1194, + 11, 1392, 11, 731, 11, 411, 11, 50764], "temperature": 0.0, "avg_logprob": -0.2544753991284417, + "compression_ratio": 1.6491935483870968, "no_speech_prob": 0.016170455142855644}, + {"id": 29, "seek": 21580, "start": 224.8, "end": 231.8, "text": " just this kind + of data source, it was something they couldn''t really access before. 
And now with + this, like, very simple,", "tokens": [50814, 445, 341, 733, 295, 1412, 4009, 11, + 309, 390, 746, 436, 2809, 380, 534, 2105, 949, 13, 400, 586, 365, 341, 11, 411, + 11, 588, 2199, 11, 51164], "temperature": 0.0, "avg_logprob": -0.2544753991284417, + "compression_ratio": 1.6491935483870968, "no_speech_prob": 0.016170455142855644}, + {"id": 30, "seek": 21580, "start": 231.8, "end": 241.8, "text": " native methods, + they somehow saw a value, a new thing. And that''s basically where I thought, oh, + what, it''s cool. What can you actually then do with more advanced methods of,", + "tokens": [51164, 8470, 7150, 11, 436, 6063, 1866, 257, 2158, 11, 257, 777, 551, + 13, 400, 300, 311, 1936, 689, 286, 1194, 11, 1954, 11, 437, 11, 309, 311, 1627, + 13, 708, 393, 291, 767, 550, 360, 365, 544, 7339, 7150, 295, 11, 51664], "temperature": + 0.0, "avg_logprob": -0.2544753991284417, "compression_ratio": 1.6491935483870968, + "no_speech_prob": 0.016170455142855644}, {"id": 31, "seek": 24180, "start": 242.8, + "end": 253.8, "text": " if you have more fancy models, how can you make this kind + of unused data source than accessible. 
And yeah, basically, realizing this, the + power of it.", "tokens": [50414, 498, 291, 362, 544, 10247, 5245, 11, 577, 393, + 291, 652, 341, 733, 295, 44383, 1412, 4009, 813, 9515, 13, 400, 1338, 11, 1936, + 11, 16734, 341, 11, 264, 1347, 295, 309, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.28173087193415713, "compression_ratio": 1.6431924882629108, "no_speech_prob": + 0.002493217820301652}, {"id": 32, "seek": 24180, "start": 254.8, "end": 268.8, "text": + " And that''s basically when it then started digging deeper, working more on energy, + at some point, then set left research, because I was really interested in seeing + these models working the real world.", "tokens": [51014, 400, 300, 311, 1936, 562, + 309, 550, 1409, 17343, 7731, 11, 1364, 544, 322, 2281, 11, 412, 512, 935, 11, 550, + 992, 1411, 2132, 11, 570, 286, 390, 534, 3102, 294, 2577, 613, 5245, 1364, 264, + 957, 1002, 13, 51714], "temperature": 0.0, "avg_logprob": -0.28173087193415713, + "compression_ratio": 1.6431924882629108, "no_speech_prob": 0.002493217820301652}, + {"id": 33, "seek": 26880, "start": 269.8, "end": 288.8, "text": " How do they work + at scale? How can they really then solve problems every day? And basically, and + came back to Germany, worked in a couple of startups, always just say, NAP at scale, + kind of intersection, a lot in online advertisement, recommend our systems.", "tokens": + [50414, 1012, 360, 436, 589, 412, 4373, 30, 1012, 393, 436, 534, 550, 5039, 2740, + 633, 786, 30, 400, 1936, 11, 293, 1361, 646, 281, 7244, 11, 2732, 294, 257, 1916, + 295, 28041, 11, 1009, 445, 584, 11, 426, 4715, 412, 4373, 11, 733, 295, 15236, 11, + 257, 688, 294, 2950, 31370, 11, 2748, 527, 3652, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.27561473846435547, "compression_ratio": 1.4277777777777778, "no_speech_prob": + 0.007864031009376049}, {"id": 34, "seek": 28880, "start": 289.8, "end": 308.8, "text": + " And then eventually four years ago, then we started sort of deep set. 
And together + with two colleagues, we found the deep set basically because we saw this big motion + appeared was kind of piling up.", "tokens": [50414, 400, 550, 4728, 1451, 924, 2057, + 11, 550, 321, 1409, 1333, 295, 2452, 992, 13, 400, 1214, 365, 732, 7734, 11, 321, + 1352, 264, 2452, 992, 1936, 570, 321, 1866, 341, 955, 5394, 8516, 390, 733, 295, + 280, 4883, 493, 13, 51364], "temperature": 0.0, "avg_logprob": -0.2754840638902452, + "compression_ratio": 1.434782608695652, "no_speech_prob": 0.09630188345909119}, + {"id": 35, "seek": 30880, "start": 308.8, "end": 321.8, "text": " There was a whole + like still pre transformers, but there were early science, I think, on research + that, that things are becoming more feasible and super interesting things became + possible.", "tokens": [50364, 821, 390, 257, 1379, 411, 920, 659, 4088, 433, 11, + 457, 456, 645, 2440, 3497, 11, 286, 519, 11, 322, 2132, 300, 11, 300, 721, 366, + 5617, 544, 26648, 293, 1687, 1880, 721, 3062, 1944, 13, 51014], "temperature": 0.0, + "avg_logprob": -0.19103850920995077, "compression_ratio": 1.752, "no_speech_prob": + 0.061876144260168076}, {"id": 36, "seek": 30880, "start": 322.8, "end": 337.8, "text": + " At the same time, we also saw that there''s this big gap, you know, like things + becoming possible on research side, didn''t really mean people were using it in + production in the industry. 
And I think we were at this, this interesting bubble + back then.", "tokens": [51064, 1711, 264, 912, 565, 11, 321, 611, 1866, 300, 456, + 311, 341, 955, 7417, 11, 291, 458, 11, 411, 721, 5617, 1944, 322, 2132, 1252, 11, + 994, 380, 534, 914, 561, 645, 1228, 309, 294, 4265, 294, 264, 3518, 13, 400, 286, + 519, 321, 645, 412, 341, 11, 341, 1880, 12212, 646, 550, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.19103850920995077, "compression_ratio": 1.752, "no_speech_prob": + 0.061876144260168076}, {"id": 37, "seek": 33780, "start": 337.8, "end": 348.8, "text": + " We did it, we applied deep learning models at scale, saw how that worked, but + also saw how much of work it actually is of manual work to get it done.", "tokens": + [50364, 492, 630, 309, 11, 321, 6456, 2452, 2539, 5245, 412, 4373, 11, 1866, 577, + 300, 2732, 11, 457, 611, 1866, 577, 709, 295, 589, 309, 767, 307, 295, 9688, 589, + 281, 483, 309, 1096, 13, 50914], "temperature": 0.0, "avg_logprob": -0.20719686760959855, + "compression_ratio": 1.6682242990654206, "no_speech_prob": 0.009239474311470985}, + {"id": 38, "seek": 33780, "start": 348.8, "end": 366.8, "text": " And basically + up the early days of deep set were mainly around, how can we bridge that gap, how + can we get latest models from research into production in the industry, what kind + of product tooling can we do.", "tokens": [50914, 400, 1936, 493, 264, 2440, 1708, + 295, 2452, 992, 645, 8704, 926, 11, 577, 393, 321, 7283, 300, 7417, 11, 577, 393, + 321, 483, 6792, 5245, 490, 2132, 666, 4265, 294, 264, 3518, 11, 437, 733, 295, 1674, + 46593, 393, 321, 360, 13, 51814], "temperature": 0.0, "avg_logprob": -0.20719686760959855, + "compression_ratio": 1.6682242990654206, "no_speech_prob": 0.009239474311470985}, + {"id": 39, "seek": 36780, "start": 367.8, "end": 372.8, "text": " And can we build + to make that transition easier.", "tokens": [50364, 400, 393, 321, 1322, 281, 652, + 300, 6034, 3571, 13, 50614], "temperature": 0.0, 
"avg_logprob": -0.28563130912134205, + "compression_ratio": 1.477124183006536, "no_speech_prob": 0.0020190493669360876}, + {"id": 40, "seek": 36780, "start": 372.8, "end": 380.8, "text": " Yeah, and that''s + basically how we, we ended up in the, in the startup world building building out + deep set.", "tokens": [50614, 865, 11, 293, 300, 311, 1936, 577, 321, 11, 321, 4590, + 493, 294, 264, 11, 294, 264, 18578, 1002, 2390, 2390, 484, 2452, 992, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.28563130912134205, "compression_ratio": 1.477124183006536, + "no_speech_prob": 0.0020190493669360876}, {"id": 41, "seek": 36780, "start": 380.8, + "end": 388.8, "text": " And, yeah, initially, that was really more about we saw + this problem.", "tokens": [51014, 400, 11, 1338, 11, 9105, 11, 300, 390, 534, 544, + 466, 321, 1866, 341, 1154, 13, 51414], "temperature": 0.0, "avg_logprob": -0.28563130912134205, + "compression_ratio": 1.477124183006536, "no_speech_prob": 0.0020190493669360876}, + {"id": 42, "seek": 38880, "start": 388.8, "end": 397.8, "text": " We had a couple + of product hypothesis, but we didn''t, didn''t like say place a bet on directly + on one of them.", "tokens": [50364, 492, 632, 257, 1916, 295, 1674, 17291, 11, 457, + 321, 994, 380, 11, 994, 380, 411, 584, 1081, 257, 778, 322, 3838, 322, 472, 295, + 552, 13, 50814], "temperature": 0.0, "avg_logprob": -0.18604428840406012, "compression_ratio": + 1.5337423312883436, "no_speech_prob": 0.2281481772661209}, {"id": 43, "seek": 38880, + "start": 397.8, "end": 407.8, "text": " We rather said, okay, let''s, let''s go + out there. 
Let''s really try to understand for one year what are really repetitive + use cases out there.", "tokens": [50814, 492, 2831, 848, 11, 1392, 11, 718, 311, + 11, 718, 311, 352, 484, 456, 13, 961, 311, 534, 853, 281, 1223, 337, 472, 1064, + 437, 366, 534, 29404, 764, 3331, 484, 456, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.18604428840406012, "compression_ratio": 1.5337423312883436, "no_speech_prob": + 0.2281481772661209}, {"id": 44, "seek": 40780, "start": 407.8, "end": 420.8, "text": + " What are really the pain points of other enterprise teams that are working in + that field and then kind of settling on a product and then building it out.", "tokens": + [50364, 708, 366, 534, 264, 1822, 2793, 295, 661, 14132, 5491, 300, 366, 1364, 294, + 300, 2519, 293, 550, 733, 295, 33841, 322, 257, 1674, 293, 550, 2390, 309, 484, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.17773635570819563, "compression_ratio": + 1.4797297297297298, "no_speech_prob": 0.03821157291531563}, {"id": 45, "seek": 40780, + "start": 420.8, "end": 426.8, "text": " Yeah, that''s basically after one year, + how we ended up in search.", "tokens": [51014, 865, 11, 300, 311, 1936, 934, 472, + 1064, 11, 577, 321, 4590, 493, 294, 3164, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.17773635570819563, "compression_ratio": 1.4797297297297298, "no_speech_prob": + 0.03821157291531563}, {"id": 46, "seek": 42680, "start": 426.8, "end": 452.8, "text": + " And of course, I would say really the one use case, the dominant use case, there + was present in every company that we worked with and that was really a big say, + valuable use case, where the push not only came from the developers who wanted to + do something better, but also actually from the, from the business side where people + saw big value inside Eric.", "tokens": [50364, 400, 295, 1164, 11, 286, 576, 584, + 534, 264, 472, 764, 1389, 11, 264, 15657, 764, 1389, 11, 456, 390, 1974, 294, 633, + 2237, 300, 321, 2732, 365, 293, 300, 390, 
534, 257, 955, 584, 11, 8263, 764, 1389, + 11, 689, 264, 2944, 406, 787, 1361, 490, 264, 8849, 567, 1415, 281, 360, 746, 1101, + 11, 457, 611, 767, 490, 264, 11, 490, 264, 1606, 1252, 689, 561, 1866, 955, 2158, + 1854, 9336, 13, 51664], "temperature": 0.0, "avg_logprob": -0.2226718511336889, + "compression_ratio": 1.6857142857142857, "no_speech_prob": 0.5127701163291931}, + {"id": 47, "seek": 45280, "start": 452.8, "end": 463.8, "text": " I use Google every + day, where can''t we have something similar in our product or our internal data + sets and and that thing was something that got us done really interested.", "tokens": + [50364, 286, 764, 3329, 633, 786, 11, 689, 393, 380, 321, 362, 746, 2531, 294, 527, + 1674, 420, 527, 6920, 1412, 6352, 293, 293, 300, 551, 390, 746, 300, 658, 505, 1096, + 534, 3102, 13, 50914], "temperature": 0.0, "avg_logprob": -0.21586583734868647, + "compression_ratio": 1.704, "no_speech_prob": 0.01011658739298582}, {"id": 48, "seek": + 45280, "start": 463.8, "end": 477.8, "text": " And on the same time that on the + the tech side, basically learning more and more about the pain points, why is it + actually so difficult for for people in these and these enterprises to build modern + search systems, what could you actually do to help them.", "tokens": [50914, 400, + 322, 264, 912, 565, 300, 322, 264, 264, 7553, 1252, 11, 1936, 2539, 544, 293, 544, + 466, 264, 1822, 2793, 11, 983, 307, 309, 767, 370, 2252, 337, 337, 561, 294, 613, + 293, 613, 29034, 281, 1322, 4363, 3164, 3652, 11, 437, 727, 291, 767, 360, 281, + 854, 552, 13, 51614], "temperature": 0.0, "avg_logprob": -0.21586583734868647, "compression_ratio": + 1.704, "no_speech_prob": 0.01011658739298582}, {"id": 49, "seek": 47780, "start": + 477.8, "end": 504.8, "text": " Yeah, that''s fascinating. Actually four or five + years ago, could you have imagined that an L.P. 
would cross paths with search because + like in many ways, this bar search, which existed for many, many years before was + in some sense, I sense it that way in mailing list, let''s say a patch is solar + mailing list, people were dreaming about applying an L.P. in some way, compared + to what is happening right now.", "tokens": [50364, 865, 11, 300, 311, 10343, 13, + 5135, 1451, 420, 1732, 924, 2057, 11, 727, 291, 362, 16590, 300, 364, 441, 13, 47, + 13, 576, 3278, 14518, 365, 3164, 570, 411, 294, 867, 2098, 11, 341, 2159, 3164, + 11, 597, 13135, 337, 867, 11, 867, 924, 949, 390, 294, 512, 2020, 11, 286, 2020, + 309, 300, 636, 294, 41612, 1329, 11, 718, 311, 584, 257, 9972, 307, 7936, 41612, + 1329, 11, 561, 645, 21475, 466, 9275, 364, 441, 13, 47, 13, 294, 512, 636, 11, 5347, + 281, 437, 307, 2737, 558, 586, 13, 51714], "temperature": 0.0, "avg_logprob": -0.22746465603510538, + "compression_ratio": 1.6370967741935485, "no_speech_prob": 0.10398313403129578}, + {"id": 50, "seek": 50480, "start": 504.8, "end": 532.8, "text": " I don''t want + to downplay those efforts, but I''m saying things like you could embed a post tag, + part of speech tag on on term level, and then use that during search again, you + need to run some kind of parser on the query, and then use that payload information + to filter through let''s say adjectives and verbs or something bad, you know, I + don''t know if there was any practical application in place, probably there was.", + "tokens": [50364, 286, 500, 380, 528, 281, 760, 2858, 729, 6484, 11, 457, 286, 478, + 1566, 721, 411, 291, 727, 12240, 257, 2183, 6162, 11, 644, 295, 6218, 6162, 322, + 322, 1433, 1496, 11, 293, 550, 764, 300, 1830, 3164, 797, 11, 291, 643, 281, 1190, + 512, 733, 295, 21156, 260, 322, 264, 14581, 11, 293, 550, 764, 300, 30918, 1589, + 281, 6608, 807, 718, 311, 584, 29378, 1539, 293, 30051, 420, 746, 1578, 11, 291, + 458, 11, 286, 500, 380, 458, 498, 456, 390, 604, 8496, 3861, 294, 1081, 11, 1391, + 456, 390, 13, 
51764], "temperature": 0.0, "avg_logprob": -0.10818461781924534, "compression_ratio": + 1.6482213438735178, "no_speech_prob": 0.043999310582876205}, {"id": 51, "seek": + 53280, "start": 532.8, "end": 551.8, "text": " But again, if you compare that to + what is happening today, you basically have a vast array of models right in deep + learning models that can be applied directly to search using vector search approach, + could you have imagined this happening when you when you were about to start the + company.", "tokens": [50364, 583, 797, 11, 498, 291, 6794, 300, 281, 437, 307, 2737, + 965, 11, 291, 1936, 362, 257, 8369, 10225, 295, 5245, 558, 294, 2452, 2539, 5245, + 300, 393, 312, 6456, 3838, 281, 3164, 1228, 8062, 3164, 3109, 11, 727, 291, 362, + 16590, 341, 2737, 562, 291, 562, 291, 645, 466, 281, 722, 264, 2237, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.09641309511863579, "compression_ratio": 1.6235955056179776, + "no_speech_prob": 0.015384525991976261}, {"id": 52, "seek": 55180, "start": 551.8, + "end": 579.8, "text": " No, I would say I was I think what we we had big big say + dreams about N.A.P. 
and we we were true believers that that things become easier + and say more feasible in production, but that was more actually under I would say + transfer learning side and making models to say more easily adoptable to certain + domains for search, I think that was for us.", "tokens": [50364, 883, 11, 286, 576, + 584, 286, 390, 286, 519, 437, 321, 321, 632, 955, 955, 584, 7505, 466, 426, 13, + 32, 13, 47, 13, 293, 321, 321, 645, 2074, 23125, 300, 300, 721, 1813, 3571, 293, + 584, 544, 26648, 294, 4265, 11, 457, 300, 390, 544, 767, 833, 286, 576, 584, 5003, + 2539, 1252, 293, 1455, 5245, 281, 584, 544, 3612, 6878, 712, 281, 1629, 25514, 337, + 3164, 11, 286, 519, 300, 390, 337, 505, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.24728968143463134, "compression_ratio": 1.6507177033492824, "no_speech_prob": + 0.1343073695898056}, {"id": 53, "seek": 57980, "start": 579.8, "end": 599.8, "text": + " And only then on our journey where we kind of realized, oh, like that''s actually + two interesting different fields kind of connecting over time right and also I felt + from at least from my perspective, from a community side from the people who worked + on information retrieval.", "tokens": [50364, 400, 787, 550, 322, 527, 4671, 689, + 321, 733, 295, 5334, 11, 1954, 11, 411, 300, 311, 767, 732, 1880, 819, 7909, 733, + 295, 11015, 670, 565, 558, 293, 611, 286, 2762, 490, 412, 1935, 490, 452, 4585, + 11, 490, 257, 1768, 1252, 490, 264, 561, 567, 2732, 322, 1589, 19817, 3337, 13, + 51364], "temperature": 0.0, "avg_logprob": -0.2910017047012061, "compression_ratio": + 1.4972677595628416, "no_speech_prob": 0.04161360487341881}, {"id": 54, "seek": 59980, + "start": 599.8, "end": 626.8, "text": " I think for a long time, a big, like a lot + of skeptic people, I wouldn''t be talking about any key or dance dance retrieval + for good reason right because I think there was also like a lot of hype around deep + learning and still what''s a lot of promises that were made like that 
it will just + outperform space retrieval out of the box.", "tokens": [50364, 286, 519, 337, 257, + 938, 565, 11, 257, 955, 11, 411, 257, 688, 295, 19128, 299, 561, 11, 286, 2759, + 380, 312, 1417, 466, 604, 2141, 420, 4489, 4489, 19817, 3337, 337, 665, 1778, 558, + 570, 286, 519, 456, 390, 611, 411, 257, 688, 295, 24144, 926, 2452, 2539, 293, 920, + 437, 311, 257, 688, 295, 16403, 300, 645, 1027, 411, 300, 309, 486, 445, 484, 26765, + 1901, 19817, 3337, 484, 295, 264, 2424, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.2797529969034316, "compression_ratio": 1.6305418719211822, "no_speech_prob": + 0.15161512792110443}, {"id": 55, "seek": 62680, "start": 626.8, "end": 632.8, "text": + " And then I think many of these promises were not hold for a long time.", "tokens": + [50364, 400, 550, 286, 519, 867, 295, 613, 16403, 645, 406, 1797, 337, 257, 938, + 565, 13, 50664], "temperature": 0.0, "avg_logprob": -0.1850408471148947, "compression_ratio": + 1.5876288659793814, "no_speech_prob": 0.029741784557700157}, {"id": 56, "seek": + 62680, "start": 632.8, "end": 647.8, "text": " But I think then basically there + was another phase where I think people realized, oh, actually now it''s kind of + starting to work and not only just in research and these ivory towers and lab settings + but actually also in reality at scale.", "tokens": [50664, 583, 286, 519, 550, 1936, + 456, 390, 1071, 5574, 689, 286, 519, 561, 5334, 11, 1954, 11, 767, 586, 309, 311, + 733, 295, 2891, 281, 589, 293, 406, 787, 445, 294, 2132, 293, 613, 49218, 25045, + 293, 2715, 6257, 457, 767, 611, 294, 4103, 412, 4373, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.1850408471148947, "compression_ratio": 1.5876288659793814, + "no_speech_prob": 0.029741784557700157}, {"id": 57, "seek": 64780, "start": 647.8, + "end": 669.8, "text": " And I think that was then fast also here, the moment where + I''ve got really interesting and I think since then just crazy to see how things + are progressing when 
thinking about a multi model search or now just was like more + I say going away from document retrieval to maybe something like question answering + which we do a lot.", "tokens": [50364, 400, 286, 519, 300, 390, 550, 2370, 611, + 510, 11, 264, 1623, 689, 286, 600, 658, 534, 1880, 293, 286, 519, 1670, 550, 445, + 3219, 281, 536, 577, 721, 366, 36305, 562, 1953, 466, 257, 4825, 2316, 3164, 420, + 586, 445, 390, 411, 544, 286, 584, 516, 1314, 490, 4166, 19817, 3337, 281, 1310, + 746, 411, 1168, 13430, 597, 321, 360, 257, 688, 13, 51464], "temperature": 0.0, + "avg_logprob": -0.26162394355325136, "compression_ratio": 1.603960396039604, "no_speech_prob": + 0.20492742955684662}, {"id": 58, "seek": 66980, "start": 669.8, "end": 679.8, "text": + " And really really crazy to see what''s possible these days and I couldn''t have + imagined that it''s going so fast.", "tokens": [50364, 400, 534, 534, 3219, 281, + 536, 437, 311, 1944, 613, 1708, 293, 286, 2809, 380, 362, 16590, 300, 309, 311, + 516, 370, 2370, 13, 50864], "temperature": 0.0, "avg_logprob": -0.218989186472707, + "compression_ratio": 1.4805825242718447, "no_speech_prob": 0.12468904256820679}, + {"id": 59, "seek": 66980, "start": 679.8, "end": 683.8, "text": " Yeah, and there + are a lot of contributors as well, of course.", "tokens": [50864, 865, 11, 293, + 456, 366, 257, 688, 295, 45627, 382, 731, 11, 295, 1164, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.218989186472707, "compression_ratio": 1.4805825242718447, + "no_speech_prob": 0.12468904256820679}, {"id": 60, "seek": 66980, "start": 683.8, + "end": 692.8, "text": " I just happened to give a talk about players in vector search. 
+ I will link it in the show notes, which was just published with C''s.", "tokens": + [51064, 286, 445, 2011, 281, 976, 257, 751, 466, 4150, 294, 8062, 3164, 13, 286, + 486, 2113, 309, 294, 264, 855, 5570, 11, 597, 390, 445, 6572, 365, 383, 311, 13, + 51514], "temperature": 0.0, "avg_logprob": -0.218989186472707, "compression_ratio": + 1.4805825242718447, "no_speech_prob": 0.12468904256820679}, {"id": 61, "seek": 69280, + "start": 692.8, "end": 704.8, "text": " London IR meet up, but even that during + that presentation, I felt like I''m scratching the the tip of the iceberg in some + sense, I know there is so much happening.", "tokens": [50364, 7042, 16486, 1677, + 493, 11, 457, 754, 300, 1830, 300, 5860, 11, 286, 2762, 411, 286, 478, 29699, 264, + 264, 4125, 295, 264, 38880, 294, 512, 2020, 11, 286, 458, 456, 307, 370, 709, 2737, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.2050486711355356, "compression_ratio": + 1.6798418972332017, "no_speech_prob": 0.31858739256858826}, {"id": 62, "seek": 69280, + "start": 704.8, "end": 719.8, "text": " And in Heystack, like did you have a vision + for the product, like you said, you didn''t know what the product will be, but you + knew sort of the repetitive use cases in a way, right, and also challenges, can + you share some of the early day challenges that you saw.", "tokens": [50964, 400, + 294, 1911, 372, 501, 11, 411, 630, 291, 362, 257, 5201, 337, 264, 1674, 11, 411, + 291, 848, 11, 291, 994, 380, 458, 437, 264, 1674, 486, 312, 11, 457, 291, 2586, + 1333, 295, 264, 29404, 764, 3331, 294, 257, 636, 11, 558, 11, 293, 611, 4759, 11, + 393, 291, 2073, 512, 295, 264, 2440, 786, 4759, 300, 291, 1866, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.2050486711355356, "compression_ratio": 1.6798418972332017, + "no_speech_prob": 0.31858739256858826}, {"id": 63, "seek": 71980, "start": 719.8, + "end": 729.8, "text": " And do you think that they are solved today or are they + still kind of like in the mix of we need to 
fix something''s there.", "tokens": + [50364, 400, 360, 291, 519, 300, 436, 366, 13041, 965, 420, 366, 436, 920, 733, + 295, 411, 294, 264, 2890, 295, 321, 643, 281, 3191, 746, 311, 456, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.16654385113325276, "compression_ratio": 1.5408805031446542, + "no_speech_prob": 0.09999927133321762}, {"id": 64, "seek": 71980, "start": 729.8, + "end": 738.8, "text": " So I think that was basically all about this first year + of deep set, where we did these learnings where wasn''t that clear.", "tokens": + [50864, 407, 286, 519, 300, 390, 1936, 439, 466, 341, 700, 1064, 295, 2452, 992, + 11, 689, 321, 630, 613, 2539, 82, 689, 2067, 380, 300, 1850, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.16654385113325276, "compression_ratio": 1.5408805031446542, + "no_speech_prob": 0.09999927133321762}, {"id": 65, "seek": 73880, "start": 738.8, + "end": 748.8, "text": " But after that year, I think we had a lot of clear insights + and at least for us, a clear vision also for Heystack, what we want to want to solve + there.", "tokens": [50364, 583, 934, 300, 1064, 11, 286, 519, 321, 632, 257, 688, + 295, 1850, 14310, 293, 412, 1935, 337, 505, 11, 257, 1850, 5201, 611, 337, 1911, + 372, 501, 11, 437, 321, 528, 281, 528, 281, 5039, 456, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.17051504770914713, "compression_ratio": 1.5625, "no_speech_prob": + 0.021068288013339043}, {"id": 66, "seek": 73880, "start": 748.8, "end": 761.8, "text": + " And I would say the big challenge, the big problem that we focused on that we + saw in the industry was having just all these get up technologies and.", "tokens": + [50864, 400, 286, 576, 584, 264, 955, 3430, 11, 264, 955, 1154, 300, 321, 5178, + 322, 300, 321, 1866, 294, 264, 3518, 390, 1419, 445, 439, 613, 483, 493, 7943, 293, + 13, 51514], "temperature": 0.0, "avg_logprob": -0.17051504770914713, "compression_ratio": + 1.5625, "no_speech_prob": 0.021068288013339043}, {"id": 67, "seek": 76180, 
"start": + 761.8, "end": 774.8, "text": " And basically Heystack is trying and always as I + would say as a design philosophy design principle has two things in place that try + to bring these data technologies together in a meaningful way.", "tokens": [50364, + 400, 1936, 1911, 372, 501, 307, 1382, 293, 1009, 382, 286, 576, 584, 382, 257, 1715, + 10675, 1715, 8665, 575, 732, 721, 294, 1081, 300, 853, 281, 1565, 613, 1412, 7943, + 1214, 294, 257, 10995, 636, 13, 51014], "temperature": 0.0, "avg_logprob": -0.24527852963178587, + "compression_ratio": 1.6454545454545455, "no_speech_prob": 0.029410675168037415}, + {"id": 68, "seek": 76180, "start": 774.8, "end": 789.8, "text": " And what I mean + with that is basically if you think about search it''s what say really it''s a lot + more than then model right and it typically you have factor databases.", "tokens": + [51014, 400, 437, 286, 914, 365, 300, 307, 1936, 498, 291, 519, 466, 3164, 309, + 311, 437, 584, 534, 309, 311, 257, 688, 544, 813, 550, 2316, 558, 293, 309, 5850, + 291, 362, 5952, 22380, 13, 51764], "temperature": 0.0, "avg_logprob": -0.24527852963178587, + "compression_ratio": 1.6454545454545455, "no_speech_prob": 0.029410675168037415}, + {"id": 69, "seek": 78980, "start": 789.8, "end": 799.8, "text": " And you may be + chained together multiple models, you have something you want to do at indexing + time, you have other things you want to do a query time.", "tokens": [50364, 400, + 291, 815, 312, 417, 3563, 1214, 3866, 5245, 11, 291, 362, 746, 291, 528, 281, 360, + 412, 8186, 278, 565, 11, 291, 362, 661, 721, 291, 528, 281, 360, 257, 14581, 565, + 13, 50864], "temperature": 0.0, "avg_logprob": -0.15708896590442192, "compression_ratio": + 1.685, "no_speech_prob": 0.018163319677114487}, {"id": 70, "seek": 78980, "start": + 799.8, "end": 812.8, "text": " And for each of these say kind of components that + you need at the end, there are so many different options that you''re that you can + plug in and often 
it''s hard to say in the early days.", "tokens": [50864, 400, + 337, 1184, 295, 613, 584, 733, 295, 6677, 300, 291, 643, 412, 264, 917, 11, 456, + 366, 370, 867, 819, 3956, 300, 291, 434, 300, 291, 393, 5452, 294, 293, 2049, 309, + 311, 1152, 281, 584, 294, 264, 2440, 1708, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.15708896590442192, "compression_ratio": 1.685, "no_speech_prob": 0.018163319677114487}, + {"id": 71, "seek": 81280, "start": 812.8, "end": 829.8, "text": " And then you know, + do I go for elastic search or something like pine cone electrical database, do I + go for this model or that model, do I need a, I don''t know, just the retriever + in my pipeline or do I actually also need to add a re rank or something else.", + "tokens": [50364, 400, 550, 291, 458, 11, 360, 286, 352, 337, 17115, 3164, 420, + 746, 411, 15113, 19749, 12147, 8149, 11, 360, 286, 352, 337, 341, 2316, 420, 300, + 2316, 11, 360, 286, 643, 257, 11, 286, 500, 380, 458, 11, 445, 264, 19817, 331, + 294, 452, 15517, 420, 360, 286, 767, 611, 643, 281, 909, 257, 319, 6181, 420, 746, + 1646, 13, 51214], "temperature": 0.0, "avg_logprob": -0.3190241300142728, "compression_ratio": + 1.6, "no_speech_prob": 0.08912377059459686}, {"id": 72, "seek": 82980, "start": + 829.8, "end": 835.8, "text": " And we just saw that teams are aware of actually + spending a lot of time on.", "tokens": [50364, 400, 321, 445, 1866, 300, 5491, 366, + 3650, 295, 767, 6434, 257, 688, 295, 565, 322, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.3001950127737863, "compression_ratio": 1.0273972602739727, "no_speech_prob": + 0.05606675148010254}, {"id": 73, "seek": 83580, "start": 835.8, "end": 849.8, "text": + " And then we''re doing these things together manually. 
And even when they had it + once there was and constant or maintenance work or iterations where they have to + exchange one component of the system.", "tokens": [50364, 400, 550, 321, 434, 884, + 613, 721, 1214, 16945, 13, 400, 754, 562, 436, 632, 309, 1564, 456, 390, 293, 5754, + 420, 11258, 589, 420, 36540, 689, 436, 362, 281, 7742, 472, 6542, 295, 264, 1185, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.30508428906637525, "compression_ratio": + 1.6195652173913044, "no_speech_prob": 0.47956162691116333}, {"id": 74, "seek": 83580, + "start": 849.8, "end": 857.8, "text": " And that was really just slowing them down + a lot and sometimes even then causing that a project got.", "tokens": [51064, 400, + 300, 390, 534, 445, 26958, 552, 760, 257, 688, 293, 2171, 754, 550, 9853, 300, 257, + 1716, 658, 13, 51464], "temperature": 0.0, "avg_logprob": -0.30508428906637525, + "compression_ratio": 1.6195652173913044, "no_speech_prob": 0.47956162691116333}, + {"id": 75, "seek": 85780, "start": 857.8, "end": 869.8, "text": " So over time, + not really ending up in production, but kind of dying at the prototyping stage, + because it just took so long and and things got kind of sidetracked.", "tokens": + [50364, 407, 670, 565, 11, 406, 534, 8121, 493, 294, 4265, 11, 457, 733, 295, 8639, + 412, 264, 46219, 3381, 3233, 11, 570, 309, 445, 1890, 370, 938, 293, 293, 721, 658, + 733, 295, 20822, 27965, 25949, 13, 50964], "temperature": 0.0, "avg_logprob": -0.28619562784830727, + "compression_ratio": 1.5572139303482586, "no_speech_prob": 0.017590856179594994}, + {"id": 76, "seek": 85780, "start": 869.8, "end": 878.8, "text": " And with hastag, + we basically tried to solve that and having very clear building blocks like, for + example, the retriever, which very clean their face.", "tokens": [50964, 400, 365, + 6581, 559, 11, 321, 1936, 3031, 281, 5039, 300, 293, 1419, 588, 1850, 2390, 8474, + 411, 11, 337, 1365, 11, 264, 19817, 331, 11, 597, 588, 2541, 641, 1851, 13, 51414], + 
"temperature": 0.0, "avg_logprob": -0.28619562784830727, "compression_ratio": 1.5572139303482586, + "no_speech_prob": 0.017590856179594994}, {"id": 77, "seek": 87880, "start": 878.8, + "end": 896.8, "text": " And within that you can swap a lot of different technology + models and the same for a slew vector database document stores where you can very + easily change between something like elastic search, pine cone, we veate and whatnot.", + "tokens": [50364, 400, 1951, 300, 291, 393, 18135, 257, 688, 295, 819, 2899, 5245, + 293, 264, 912, 337, 257, 2426, 86, 8062, 8149, 4166, 9512, 689, 291, 393, 588, 3612, + 1319, 1296, 746, 411, 17115, 3164, 11, 15113, 19749, 11, 321, 1241, 473, 293, 25882, + 13, 51264], "temperature": 0.0, "avg_logprob": -0.43700941403706867, "compression_ratio": + 1.4394904458598725, "no_speech_prob": 0.0024109873920679092}, {"id": 78, "seek": + 89680, "start": 896.8, "end": 916.8, "text": " So I would say that''s the was the + one thing this building blocks and trying to get the focus of developers back on + making these creative decisions what they actually want to have in their pipeline, + trying it out with with anti users, rather than just spending time on doing things + together.", "tokens": [50364, 407, 286, 576, 584, 300, 311, 264, 390, 264, 472, + 551, 341, 2390, 8474, 293, 1382, 281, 483, 264, 1879, 295, 8849, 646, 322, 1455, + 613, 5880, 5327, 437, 436, 767, 528, 281, 362, 294, 641, 15517, 11, 1382, 309, 484, + 365, 365, 6061, 5022, 11, 2831, 813, 445, 6434, 565, 322, 884, 721, 1214, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.18553767204284669, "compression_ratio": 1.5508021390374331, + "no_speech_prob": 0.017965903505682945}, {"id": 79, "seek": 91680, "start": 916.8, + "end": 933.8, "text": " And the second thing is I would say very deep concept also + in hastag up pipelines. So really what we saw is it''s not just one model. 
It''s + typically a couple of steps that you want to have there.", "tokens": [50364, 400, + 264, 1150, 551, 307, 286, 576, 584, 588, 2452, 3410, 611, 294, 6581, 559, 493, 40168, + 13, 407, 534, 437, 321, 1866, 307, 309, 311, 406, 445, 472, 2316, 13, 467, 311, + 5850, 257, 1916, 295, 4439, 300, 291, 528, 281, 362, 456, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.18035824444829202, "compression_ratio": 1.3661971830985915, + "no_speech_prob": 0.04663967341184616}, {"id": 80, "seek": 93380, "start": 934.8, + "end": 950.8, "text": " So in hastag we started early on having direct as to click + graphs where you can have different notes and basically when you have a query or + indexing time file that kind of hits the pipeline, you can root it for this graph.", + "tokens": [50414, 407, 294, 6581, 559, 321, 1409, 2440, 322, 1419, 2047, 382, 281, + 2052, 24877, 689, 291, 393, 362, 819, 5570, 293, 1936, 562, 291, 362, 257, 14581, + 420, 8186, 278, 565, 3991, 300, 733, 295, 8664, 264, 15517, 11, 291, 393, 5593, + 309, 337, 341, 4295, 13, 51214], "temperature": 0.0, "avg_logprob": -0.24802230386173024, + "compression_ratio": 1.4701986754966887, "no_speech_prob": 0.16086244583129883}, + {"id": 81, "seek": 95080, "start": 950.8, "end": 963.8, "text": " That can be very + easy. There is a set of a query. 
I do put it to a retriever and I get back my documents + or can go basically quite complex where you say all like depending on the query + type.", "tokens": [50364, 663, 393, 312, 588, 1858, 13, 821, 307, 257, 992, 295, + 257, 14581, 13, 286, 360, 829, 309, 281, 257, 19817, 331, 293, 286, 483, 646, 452, + 8512, 420, 393, 352, 1936, 1596, 3997, 689, 291, 584, 439, 411, 5413, 322, 264, + 14581, 2010, 13, 51014], "temperature": 0.0, "avg_logprob": -0.2815162502989477, + "compression_ratio": 1.3768115942028984, "no_speech_prob": 0.09382446855306625}, + {"id": 82, "seek": 96380, "start": 963.8, "end": 980.8, "text": " If it''s a keyword + query, I rooted a certain path in my graph, my pipeline, or if it''s a question, + maybe I go a different way and I have different models, I''m basically involved + in my in my search request.", "tokens": [50364, 759, 309, 311, 257, 20428, 14581, + 11, 286, 25277, 257, 1629, 3100, 294, 452, 4295, 11, 452, 15517, 11, 420, 498, 309, + 311, 257, 1168, 11, 1310, 286, 352, 257, 819, 636, 293, 286, 362, 819, 5245, 11, + 286, 478, 1936, 3288, 294, 452, 294, 452, 3164, 5308, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.2510858751692862, "compression_ratio": 1.4265734265734267, + "no_speech_prob": 0.03975219652056694}, {"id": 83, "seek": 98080, "start": 980.8, + "end": 1005.8, "text": " And these two, I was here, the core principles in hastag + up. That''s very interesting. So that second thing they are cyclic graph with a + love for very complex scenarios, right. Like as you explained, we couldn''t principle + support question answering use case side by side with the kind of like normal search + with theory, rankers and stuff, right. 
Is that correct.", "tokens": [50364, 400, + 613, 732, 11, 286, 390, 510, 11, 264, 4965, 9156, 294, 6581, 559, 493, 13, 663, + 311, 588, 1880, 13, 407, 300, 1150, 551, 436, 366, 38154, 1050, 4295, 365, 257, + 959, 337, 588, 3997, 15077, 11, 558, 13, 1743, 382, 291, 8825, 11, 321, 2809, 380, + 8665, 1406, 1168, 13430, 764, 1389, 1252, 538, 1252, 365, 264, 733, 295, 411, 2710, + 3164, 365, 5261, 11, 6181, 433, 293, 1507, 11, 558, 13, 1119, 300, 3006, 13, 51614], + "temperature": 0.0, "avg_logprob": -0.318989939805938, "compression_ratio": 1.5427350427350428, + "no_speech_prob": 0.09990677982568741}, {"id": 84, "seek": 100580, "start": 1005.8, + "end": 1020.8, "text": " Exactly. So that''s what we basically learned from customers + like when we saw there was a big interest in something like question answering and + people say, wow, that''s amazing. Can we use that for our website or for our product + here.", "tokens": [50364, 7587, 13, 407, 300, 311, 437, 321, 1936, 3264, 490, 4581, + 411, 562, 321, 1866, 456, 390, 257, 955, 1179, 294, 746, 411, 1168, 13430, 293, + 561, 584, 11, 6076, 11, 300, 311, 2243, 13, 1664, 321, 764, 300, 337, 527, 3144, + 420, 337, 527, 1674, 510, 13, 51114], "temperature": 0.0, "avg_logprob": -0.18177322240976188, + "compression_ratio": 1.45, "no_speech_prob": 0.023152818903326988}, {"id": 85, "seek": + 102080, "start": 1021.8, "end": 1033.8, "text": " But doing that switch in a production + case is quite tough, right. 
Like if people are used to do keyword queries and they + know I know I have to enter your keywords to get basically my results.", "tokens": + [50414, 583, 884, 300, 3679, 294, 257, 4265, 1389, 307, 1596, 4930, 11, 558, 13, + 1743, 498, 561, 366, 1143, 281, 360, 20428, 24109, 293, 436, 458, 286, 458, 286, + 362, 281, 3242, 428, 21009, 281, 483, 1936, 452, 3542, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.2019599776670157, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.2565976679325104}, {"id": 86, "seek": 102080, "start": 1033.8, + "end": 1045.8, "text": " And then from one day to the other, you switch to more + semantic queries, maybe more questions or also I think dance retrieval, if you really + have more sentences that you use.", "tokens": [51014, 400, 550, 490, 472, 786, 281, + 264, 661, 11, 291, 3679, 281, 544, 47982, 24109, 11, 1310, 544, 1651, 420, 611, + 286, 519, 4489, 19817, 3337, 11, 498, 291, 534, 362, 544, 16579, 300, 291, 764, + 13, 51614], "temperature": 0.0, "avg_logprob": -0.2019599776670157, "compression_ratio": + 1.6266666666666667, "no_speech_prob": 0.2565976679325104}, {"id": 87, "seek": 104580, + "start": 1045.8, "end": 1056.8, "text": " It takes some time for people to adjust + and we saw that in a couple of scenarios that basically the traffic kind of requests + that come in.", "tokens": [50364, 467, 2516, 512, 565, 337, 561, 281, 4369, 293, + 321, 1866, 300, 294, 257, 1916, 295, 15077, 300, 1936, 264, 6419, 733, 295, 12475, + 300, 808, 294, 13, 50914], "temperature": 0.0, "avg_logprob": -0.19762579600016275, + "compression_ratio": 1.5748792270531402, "no_speech_prob": 0.002506805118173361}, + {"id": 88, "seek": 104580, "start": 1056.8, "end": 1069.8, "text": " Start a lot + with keyword queries and then over time slowly shift towards more semantic queries. 
+ When people realize, oh, I can actually also ask a question and all this like, like + Google.", "tokens": [50914, 6481, 257, 688, 365, 20428, 24109, 293, 550, 670, 565, + 5692, 5513, 3030, 544, 47982, 24109, 13, 1133, 561, 4325, 11, 1954, 11, 286, 393, + 767, 611, 1029, 257, 1168, 293, 439, 341, 411, 11, 411, 3329, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.19762579600016275, "compression_ratio": 1.5748792270531402, + "no_speech_prob": 0.002506805118173361}, {"id": 89, "seek": 106980, "start": 1069.8, + "end": 1098.8, "text": " And then there''s a trend, but you need everything to have + an option your system to allow both for certain time and and hasty basically with + the query classifier where you can initially basically classify is that a question + or a keyword query or you could go with also semantically like what a topic level + saying all like this is a query for certain type of category in my my document set.", + "tokens": [50364, 400, 550, 456, 311, 257, 6028, 11, 457, 291, 643, 1203, 281, 362, + 364, 3614, 428, 1185, 281, 2089, 1293, 337, 1629, 565, 293, 293, 6581, 88, 1936, + 365, 264, 14581, 1508, 9902, 689, 291, 393, 9105, 1936, 33872, 307, 300, 257, 1168, + 420, 257, 20428, 14581, 420, 291, 727, 352, 365, 611, 4361, 49505, 411, 437, 257, + 4829, 1496, 1566, 439, 411, 341, 307, 257, 14581, 337, 1629, 2010, 295, 7719, 294, + 452, 452, 4166, 992, 13, 51814], "temperature": 0.0, "avg_logprob": -0.286276514937238, + "compression_ratio": 1.7017543859649122, "no_speech_prob": 0.05763981118798256}, + {"id": 90, "seek": 109880, "start": 1098.8, "end": 1101.8, "text": " And then maybe + do something different.", "tokens": [50364, 400, 550, 1310, 360, 746, 819, 13, 50514], + "temperature": 0.0, "avg_logprob": -0.2573685091595317, "compression_ratio": 1.7053571428571428, + "no_speech_prob": 0.013653156347572803}, {"id": 91, "seek": 109880, "start": 1101.8, + "end": 1110.8, "text": " And like early on Hey stack did it integrate with any database + per 
se was it like the last search back then.", "tokens": [50514, 400, 411, 2440, + 322, 1911, 8630, 630, 309, 13365, 365, 604, 8149, 680, 369, 390, 309, 411, 264, + 1036, 3164, 646, 550, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2573685091595317, + "compression_ratio": 1.7053571428571428, "no_speech_prob": 0.013653156347572803}, + {"id": 92, "seek": 109880, "start": 1110.8, "end": 1117.8, "text": " Yeah, like + the basically starting point was the last search was the very first document store + we had.", "tokens": [50964, 865, 11, 411, 264, 1936, 2891, 935, 390, 264, 1036, + 3164, 390, 264, 588, 700, 4166, 3531, 321, 632, 13, 51314], "temperature": 0.0, + "avg_logprob": -0.2573685091595317, "compression_ratio": 1.7053571428571428, "no_speech_prob": + 0.013653156347572803}, {"id": 93, "seek": 109880, "start": 1117.8, "end": 1125.8, + "text": " But the last search back then didn''t I believe didn''t support neural + search right so how did you actually gel these things together.", "tokens": [51314, + 583, 264, 1036, 3164, 646, 550, 994, 380, 286, 1697, 994, 380, 1406, 18161, 3164, + 558, 370, 577, 630, 291, 767, 4087, 613, 721, 1214, 13, 51714], "temperature": 0.0, + "avg_logprob": -0.2573685091595317, "compression_ratio": 1.7053571428571428, "no_speech_prob": + 0.013653156347572803}, {"id": 94, "seek": 112580, "start": 1125.8, "end": 1129.8, + "text": " Yeah, that was just that kind of coming in over time right so it was.", + "tokens": [50364, 865, 11, 300, 390, 445, 300, 733, 295, 1348, 294, 670, 565, 558, + 370, 309, 390, 13, 50564], "temperature": 0.0, "avg_logprob": -0.24707120259602863, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.012001951225101948}, + {"id": 95, "seek": 112580, "start": 1129.8, "end": 1134.8, "text": " Think the the + era where elastic search was for us was really.", "tokens": [50564, 6557, 264, 264, + 4249, 689, 17115, 3164, 390, 337, 505, 390, 534, 13, 50814], "temperature": 0.0, + "avg_logprob": -0.24707120259602863, 
"compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.012001951225101948}, {"id": 96, "seek": 112580, "start": 1134.8, "end": 1141.8, + "text": " We came from a question answering use cases a lot and there was really + like how do we scale that how can we now.", "tokens": [50814, 492, 1361, 490, 257, + 1168, 13430, 764, 3331, 257, 688, 293, 456, 390, 534, 411, 577, 360, 321, 4373, + 300, 577, 393, 321, 586, 13, 51164], "temperature": 0.0, "avg_logprob": -0.24707120259602863, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.012001951225101948}, + {"id": 97, "seek": 112580, "start": 1141.8, "end": 1151.8, "text": " Ask questions + not on a single document and single small passage, but how can we do it actually + on millions of files and.", "tokens": [51164, 12320, 1651, 406, 322, 257, 2167, + 4166, 293, 2167, 1359, 11497, 11, 457, 577, 393, 321, 360, 309, 767, 322, 6803, + 295, 7098, 293, 13, 51664], "temperature": 0.0, "avg_logprob": -0.24707120259602863, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.012001951225101948}, + {"id": 98, "seek": 115180, "start": 1151.8, "end": 1165.8, "text": " And the 25 + work as a retriever step before that was was okay was not not too bad and that''s + kind of how it started and then very fast evolved into into a say back to search + direction.", "tokens": [50364, 400, 264, 3552, 589, 382, 257, 19817, 331, 1823, + 949, 300, 390, 390, 1392, 390, 406, 406, 886, 1578, 293, 300, 311, 733, 295, 577, + 309, 1409, 293, 550, 588, 2370, 14178, 666, 666, 257, 584, 646, 281, 3164, 3513, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.2835579367244945, "compression_ratio": + 1.6136363636363635, "no_speech_prob": 0.0006131429108791053}, {"id": 99, "seek": + 115180, "start": 1165.8, "end": 1170.8, "text": " Where we had them a files basically + as a as a next document store.", "tokens": [51064, 2305, 321, 632, 552, 257, 7098, + 1936, 382, 257, 382, 257, 958, 4166, 3531, 13, 51314], "temperature": 0.0, 
"avg_logprob": + -0.2835579367244945, "compression_ratio": 1.6136363636363635, "no_speech_prob": + 0.0006131429108791053}, {"id": 100, "seek": 115180, "start": 1170.8, "end": 1178.8, + "text": " In combination with some some SQL database for for the metadata and so + on and then it basically kind of.", "tokens": [51314, 682, 6562, 365, 512, 512, + 19200, 8149, 337, 337, 264, 26603, 293, 370, 322, 293, 550, 309, 1936, 733, 295, + 13, 51714], "temperature": 0.0, "avg_logprob": -0.2835579367244945, "compression_ratio": + 1.6136363636363635, "no_speech_prob": 0.0006131429108791053}, {"id": 101, "seek": + 117880, "start": 1178.8, "end": 1190.8, "text": " I think took off on the lecture + database side with the nervous we via a pine cone and so on and so forth open search + today is also part of the face deck.", "tokens": [50364, 286, 519, 1890, 766, 322, + 264, 7991, 8149, 1252, 365, 264, 6296, 321, 5766, 257, 15113, 19749, 293, 370, 322, + 293, 370, 5220, 1269, 3164, 965, 307, 611, 644, 295, 264, 1851, 9341, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.42298108477925145, "compression_ratio": 1.5299539170506913, + "no_speech_prob": 0.012067960575222969}, {"id": 102, "seek": 117880, "start": 1190.8, + "end": 1193.8, "text": " But that was I think then just.", "tokens": [50964, 583, + 300, 390, 286, 519, 550, 445, 13, 51114], "temperature": 0.0, "avg_logprob": -0.42298108477925145, + "compression_ratio": 1.5299539170506913, "no_speech_prob": 0.012067960575222969}, + {"id": 103, "seek": 117880, "start": 1193.8, "end": 1197.8, "text": " Half half + here after we launched a stick.", "tokens": [51114, 15917, 1922, 510, 934, 321, + 8730, 257, 2897, 13, 51314], "temperature": 0.0, "avg_logprob": -0.42298108477925145, + "compression_ratio": 1.5299539170506913, "no_speech_prob": 0.012067960575222969}, + {"id": 104, "seek": 117880, "start": 1197.8, "end": 1204.8, "text": " Oh yeah, that''s + awesome. That sounds quite quick. 
I know that BBA was also emerging about the same + time.", "tokens": [51314, 876, 1338, 11, 300, 311, 3476, 13, 663, 3263, 1596, 1702, + 13, 286, 458, 300, 363, 9295, 390, 611, 14989, 466, 264, 912, 565, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.42298108477925145, "compression_ratio": 1.5299539170506913, + "no_speech_prob": 0.012067960575222969}, {"id": 105, "seek": 120480, "start": 1204.8, + "end": 1209.8, "text": " And then and then neighbors I guess as well. Yeah, that''s + that''s that sounds super cool.", "tokens": [50364, 400, 550, 293, 550, 12512, 286, + 2041, 382, 731, 13, 865, 11, 300, 311, 300, 311, 300, 3263, 1687, 1627, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.18779731750488282, "compression_ratio": 1.6634615384615385, + "no_speech_prob": 0.055835068225860596}, {"id": 106, "seek": 120480, "start": 1209.8, + "end": 1228.8, "text": " And was there any as you were approaching your clients + or like prospects was there any specific use case that you would be demoing with + because you knew this would trigger the aha moment like question answering or maybe + a specific domain where you did that.", "tokens": [50614, 400, 390, 456, 604, 382, + 291, 645, 14908, 428, 6982, 420, 411, 32933, 390, 456, 604, 2685, 764, 1389, 300, + 291, 576, 312, 10723, 278, 365, 570, 291, 2586, 341, 576, 7875, 264, 47340, 1623, + 411, 1168, 13430, 420, 1310, 257, 2685, 9274, 689, 291, 630, 300, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.18779731750488282, "compression_ratio": 1.6634615384615385, + "no_speech_prob": 0.055835068225860596}, {"id": 107, "seek": 122880, "start": 1228.8, + "end": 1238.8, "text": " Yeah, I would say we were for us it was a lot around question + answering back then that was really very great that I think many of these aha moments.", + "tokens": [50364, 865, 11, 286, 576, 584, 321, 645, 337, 505, 309, 390, 257, 688, + 926, 1168, 13430, 646, 550, 300, 390, 534, 588, 869, 300, 286, 519, 867, 295, 613, + 47340, 6065, 13, 50864], 
"temperature": 0.0, "avg_logprob": -0.2535906303219679, + "compression_ratio": 1.645021645021645, "no_speech_prob": 0.0030102732125669718}, + {"id": 108, "seek": 122880, "start": 1238.8, "end": 1246.8, "text": " As to remember + we were at one client and when this meeting and it was like on the in the financial + domain.", "tokens": [50864, 1018, 281, 1604, 321, 645, 412, 472, 6423, 293, 562, + 341, 3440, 293, 309, 390, 411, 322, 264, 294, 264, 4669, 9274, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.2535906303219679, "compression_ratio": 1.645021645021645, + "no_speech_prob": 0.0030102732125669718}, {"id": 109, "seek": 122880, "start": 1246.8, + "end": 1257.8, "text": " So we''re interested in asking questions on financial reports + of certain companies and basically accelerating their analysis.", "tokens": [51264, + 407, 321, 434, 3102, 294, 3365, 1651, 322, 4669, 7122, 295, 1629, 3431, 293, 1936, + 34391, 641, 5215, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2535906303219679, + "compression_ratio": 1.645021645021645, "no_speech_prob": 0.0030102732125669718}, + {"id": 110, "seek": 125780, "start": 1257.8, "end": 1275.8, "text": " And at one + point in this meeting we showed what you can do with question answering ask these + questions and they also like suggested own questions that we should ask and they + work so they were that point and convinced oh like that''s not fake.", "tokens": + [50364, 400, 412, 472, 935, 294, 341, 3440, 321, 4712, 437, 291, 393, 360, 365, + 1168, 13430, 1029, 613, 1651, 293, 436, 611, 411, 10945, 1065, 1651, 300, 321, 820, + 1029, 293, 436, 589, 370, 436, 645, 300, 935, 293, 12561, 1954, 411, 300, 311, 406, + 7592, 13, 51264], "temperature": 0.0, "avg_logprob": -0.2992697697059781, "compression_ratio": + 1.6394557823129252, "no_speech_prob": 0.0038970394525676966}, {"id": 111, "seek": + 127580, "start": 1275.8, "end": 1304.8, "text": " And like smoke and mirror here. 
+ And the basically the boss of the department was standing up and shouting like wow + that''s that''s amazing and went out of the office and at the office next door and + and carried over colleagues and said like you have to see that and that was actually + even before we started building hastag but was these kind of moments were very important + to see like this is something.", "tokens": [50364, 400, 411, 8439, 293, 8013, 510, + 13, 400, 264, 1936, 264, 5741, 295, 264, 5882, 390, 4877, 493, 293, 20382, 411, + 6076, 300, 311, 300, 311, 2243, 293, 1437, 484, 295, 264, 3398, 293, 412, 264, 3398, + 958, 2853, 293, 293, 9094, 670, 7734, 293, 848, 411, 291, 362, 281, 536, 300, 293, + 300, 390, 767, 754, 949, 321, 1409, 2390, 6581, 559, 457, 390, 613, 733, 295, 6065, + 645, 588, 1021, 281, 536, 411, 341, 307, 746, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.3284063913735999, "compression_ratio": 1.7467248908296944, "no_speech_prob": + 0.33951041102409363}, {"id": 112, "seek": 130480, "start": 1304.8, "end": 1319.8, + "text": " That is not just fascinating for for techies like we were but also say + business people and users see that value and see value and their work for it.", + "tokens": [50364, 663, 307, 406, 445, 10343, 337, 337, 7553, 530, 411, 321, 645, + 457, 611, 584, 1606, 561, 293, 5022, 536, 300, 2158, 293, 536, 2158, 293, 641, 589, + 337, 309, 13, 51114], "temperature": 0.0, "avg_logprob": -0.15099378313337053, "compression_ratio": + 1.3703703703703705, "no_speech_prob": 0.00979765597730875}, {"id": 113, "seek": + 131980, "start": 1319.8, "end": 1346.8, "text": " I can imagine that and it''s like + a class of what we call knowledge workers right it''s something that you spend so + much time on crafting this queries and I have spent some time in the full text finance + I would say at alpha sense and remember some of the clients they had accumulated + Boolean queries over a period of 20 years right and they were like so long it''s + like several 
pages.", "tokens": [50364, 286, 393, 3811, 300, 293, 309, 311, 411, + 257, 1508, 295, 437, 321, 818, 3601, 5600, 558, 309, 311, 746, 300, 291, 3496, 370, + 709, 565, 322, 29048, 341, 24109, 293, 286, 362, 4418, 512, 565, 294, 264, 1577, + 2487, 10719, 286, 576, 584, 412, 8961, 2020, 293, 1604, 512, 295, 264, 6982, 436, + 632, 31346, 23351, 28499, 24109, 670, 257, 2896, 295, 945, 924, 558, 293, 436, 645, + 411, 370, 938, 309, 311, 411, 2940, 7183, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.11544191546556426, "compression_ratio": 1.6493506493506493, "no_speech_prob": + 0.5693442225456238}, {"id": 114, "seek": 134680, "start": 1346.8, "end": 1375.8, + "text": " When you when you when you slap that into solar it runs for three minutes + because our index layout was not what it is today and was not very optimal and it''s + crazy to see what what people kind of start doing as work around right so we are + at a similar case with a with an airplane manufacturer was not financial domain + but really on some more maintenance level analyzing", "tokens": [50364, 1133, 291, + 562, 291, 562, 291, 21075, 300, 666, 7936, 309, 6676, 337, 1045, 2077, 570, 527, + 8186, 13333, 390, 406, 437, 309, 307, 965, 293, 390, 406, 588, 16252, 293, 309, + 311, 3219, 281, 536, 437, 437, 561, 733, 295, 722, 884, 382, 589, 926, 558, 370, + 321, 366, 412, 257, 2531, 1389, 365, 257, 365, 364, 17130, 18022, 390, 406, 4669, + 9274, 457, 534, 322, 512, 544, 11258, 1496, 23663, 51814], "temperature": 0.0, "avg_logprob": + -0.1869453505465859, "compression_ratio": 1.6742081447963801, "no_speech_prob": + 0.03269074112176895}, {"id": 115, "seek": 137580, "start": 1375.8, "end": 1399.8, + "text": " basically issues that come up maybe in certain technical areas and they + also have like this crazy Boolean search queries and people just became experts + and crafting that but it took them really long like asking for sending one query + creating this query I was taking easily like minutes.", "tokens": 
[50364, 1936, + 2663, 300, 808, 493, 1310, 294, 1629, 6191, 3179, 293, 436, 611, 362, 411, 341, + 3219, 23351, 28499, 3164, 24109, 293, 561, 445, 3062, 8572, 293, 29048, 300, 457, + 309, 1890, 552, 534, 938, 411, 3365, 337, 7750, 472, 14581, 4084, 341, 14581, 286, + 390, 1940, 3612, 411, 2077, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2198630766435103, + "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.01872316189110279}, + {"id": 116, "seek": 139980, "start": 1399.8, "end": 1428.8, "text": " Yeah exactly + and so what hey stack is today can you can you elaborate a bit on the architecture + and maybe if it''s possible if you find it easy if you put if you pick what say + use case actually I recently I was talking to one stakeholder who wanted to build + a chatbot but it was a very specific domain so that chatbot would actually ask you + some kind of philosophical question.", "tokens": [50364, 865, 2293, 293, 370, 437, + 4177, 8630, 307, 965, 393, 291, 393, 291, 20945, 257, 857, 322, 264, 9482, 293, + 1310, 498, 309, 311, 1944, 498, 291, 915, 309, 1858, 498, 291, 829, 498, 291, 1888, + 437, 584, 764, 1389, 767, 286, 3938, 286, 390, 1417, 281, 472, 43406, 567, 1415, + 281, 1322, 257, 5081, 18870, 457, 309, 390, 257, 588, 2685, 9274, 370, 300, 5081, + 18870, 576, 767, 1029, 291, 512, 733, 295, 25066, 1168, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.1847158670425415, "compression_ratio": 1.6563876651982379, + "no_speech_prob": 0.024833954870700836}, {"id": 117, "seek": 142980, "start": 1429.8, + "end": 1453.8, "text": " So I think it''s a very difficult then like questions sort + of a little bit like distracting you from from what''s going on let''s say you are + on a conference and in a lot of things go through your mind but you don''t register + maybe what''s going on you don''t get see the value and and that Zenbot might kind + of ask you and well essentially allow you to pause and reflect right.", "tokens": + [50364, 407, 286, 519, 309, 
311, 257, 588, 2252, 550, 411, 1651, 1333, 295, 257, + 707, 857, 411, 36689, 291, 490, 490, 437, 311, 516, 322, 718, 311, 584, 291, 366, + 322, 257, 7586, 293, 294, 257, 688, 295, 721, 352, 807, 428, 1575, 457, 291, 500, + 380, 7280, 1310, 437, 311, 516, 322, 291, 500, 380, 483, 536, 264, 2158, 293, 293, + 300, 22387, 18870, 1062, 733, 295, 1029, 291, 293, 731, 4476, 2089, 291, 281, 10465, + 293, 5031, 558, 13, 51564], "temperature": 0.0, "avg_logprob": -0.25512452967026655, + "compression_ratio": 1.6986301369863013, "no_speech_prob": 0.08729945868253708}, + {"id": 118, "seek": 145380, "start": 1453.8, "end": 1472.8, "text": " What I realized + is that yeah I could pick another shelf model let''s say question answering bird + or something but it probably wouldn''t work on what I want right my domain is different + and I had an electronic book with this Zen type of statements.", "tokens": [50364, + 708, 286, 5334, 307, 300, 1338, 286, 727, 1888, 1071, 15222, 2316, 718, 311, 584, + 1168, 13430, 5255, 420, 746, 457, 309, 1391, 2759, 380, 589, 322, 437, 286, 528, + 558, 452, 9274, 307, 819, 293, 286, 632, 364, 10092, 1446, 365, 341, 22387, 2010, + 295, 12363, 13, 51314], "temperature": 0.0, "avg_logprob": -0.13467521850879377, + "compression_ratio": 1.4080459770114941, "no_speech_prob": 0.03453877195715904}, + {"id": 119, "seek": 147280, "start": 1472.8, "end": 1485.8, "text": " So this one + question I''m hinting to is kind of fine tuning or maybe even right retraining right + but where would I start with hey stack and can you walk me through the architecture.", + "tokens": [50364, 407, 341, 472, 1168, 286, 478, 12075, 278, 281, 307, 733, 295, + 2489, 15164, 420, 1310, 754, 558, 49356, 1760, 558, 457, 689, 576, 286, 722, 365, + 4177, 8630, 293, 393, 291, 1792, 385, 807, 264, 9482, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.21200343540736608, "compression_ratio": 1.3636363636363635, + "no_speech_prob": 0.16099584102630615}, {"id": 120, "seek": 148580, "start": 
1485.8, + "end": 1514.8, "text": " So as mentioned earlier into core principles are these + building blocks and using this building blocks to assemble pipelines and I would + say the core we come from is question answering and search but by now I would say + the framework has evolved a lot in that direction if you have a lot of different + notes and can support a lot of different use cases going to translation zero short + classification.", "tokens": [50414, 407, 382, 2835, 3071, 666, 4965, 9156, 366, + 613, 2390, 8474, 293, 1228, 341, 2390, 8474, 281, 22364, 40168, 293, 286, 576, 584, + 264, 4965, 321, 808, 490, 307, 1168, 13430, 293, 3164, 457, 538, 586, 286, 576, + 584, 264, 8388, 575, 14178, 257, 688, 294, 300, 3513, 498, 291, 362, 257, 688, 295, + 819, 5570, 293, 393, 1406, 257, 688, 295, 819, 764, 3331, 516, 281, 12853, 4018, + 2099, 21538, 13, 51814], "temperature": 0.0, "avg_logprob": -0.20851414998372395, + "compression_ratio": 1.7723214285714286, "no_speech_prob": 0.0936017706990242}, + {"id": 121, "seek": 151580, "start": 1515.8, "end": 1524.8, "text": " And you could + produce these notes in isolation or you can kind of assemble them and use them within + your search pipeline.", "tokens": [50364, 400, 291, 727, 5258, 613, 5570, 294, 16001, + 420, 291, 393, 733, 295, 22364, 552, 293, 764, 552, 1951, 428, 3164, 15517, 13, + 50814], "temperature": 0.0, "avg_logprob": -0.19384542004815464, "compression_ratio": + 1.6775510204081632, "no_speech_prob": 0.002056156052276492}, {"id": 122, "seek": + 151580, "start": 1524.8, "end": 1544.8, "text": " So usually I think what what our + users through and how they start is now they often come with a kind of search use + case pick one of the standard pipelines that we have so we can very easily the few + lines of Python created pipeline for no it''s a question answering or maybe dance + retrieval.", "tokens": [50814, 407, 2673, 286, 519, 437, 437, 527, 5022, 807, 293, + 577, 436, 722, 307, 586, 436, 2049, 808, 365, 
257, 733, 295, 3164, 764, 1389, 1888, + 472, 295, 264, 3832, 40168, 300, 321, 362, 370, 321, 393, 588, 3612, 264, 1326, + 3876, 295, 15329, 2942, 15517, 337, 572, 309, 311, 257, 1168, 13430, 420, 1310, + 4489, 19817, 3337, 13, 51814], "temperature": 0.0, "avg_logprob": -0.19384542004815464, + "compression_ratio": 1.6775510204081632, "no_speech_prob": 0.002056156052276492}, + {"id": 123, "seek": 154580, "start": 1545.8, "end": 1566.8, "text": " Pick a document + store you pick one model from for example the hackenface model hub and and we give + some recommendations on which models might be my people starting point and then + it''s very easy actually to just put your files into a into a pipeline can be PDF + files we do the conversion basically for you there''s a note for it.", "tokens": + [50364, 14129, 257, 4166, 3531, 291, 1888, 472, 2316, 490, 337, 1365, 264, 10339, + 268, 2868, 2316, 11838, 293, 293, 321, 976, 512, 10434, 322, 597, 5245, 1062, 312, + 452, 561, 2891, 935, 293, 550, 309, 311, 588, 1858, 767, 281, 445, 829, 428, 7098, + 666, 257, 666, 257, 15517, 393, 312, 17752, 7098, 321, 360, 264, 14298, 1936, 337, + 291, 456, 311, 257, 3637, 337, 309, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.2539628794495489, "compression_ratio": 1.5497630331753554, "no_speech_prob": + 0.006160913500934839}, {"id": 124, "seek": 156680, "start": 1567.8, "end": 1584.8, + "text": " And just have a basic say demo system up and running in a few minutes + and that''s often already I think a good good starting point if you are maybe also + new to that field if you just want to try it quickly out on this kind of ebooks + that you mentioned.", "tokens": [50414, 400, 445, 362, 257, 3875, 584, 10723, 1185, + 493, 293, 2614, 294, 257, 1326, 2077, 293, 300, 311, 2049, 1217, 286, 519, 257, + 665, 665, 2891, 935, 498, 291, 366, 1310, 611, 777, 281, 300, 2519, 498, 291, 445, + 528, 281, 853, 309, 2661, 484, 322, 341, 733, 295, 308, 15170, 300, 291, 2835, 13, + 51264], 
"temperature": 0.0, "avg_logprob": -0.15944649001299324, "compression_ratio": + 1.494047619047619, "no_speech_prob": 0.08734586834907532}, {"id": 125, "seek": 158480, + "start": 1585.8, "end": 1610.8, "text": " And get a get a first let''s say quality + of understanding how good piece of the shelf pipelines for my use case get this + first data point and then basically enter the I would say next next steps typically + in your project if you see all like this is promising but not enough for really + going to production.", "tokens": [50414, 400, 483, 257, 483, 257, 700, 718, 311, + 584, 3125, 295, 3701, 577, 665, 2522, 295, 264, 15222, 40168, 337, 452, 764, 1389, + 483, 341, 700, 1412, 935, 293, 550, 1936, 3242, 264, 286, 576, 584, 958, 958, 4439, + 5850, 294, 428, 1716, 498, 291, 536, 439, 411, 341, 307, 20257, 457, 406, 1547, + 337, 534, 516, 281, 4265, 13, 51664], "temperature": 0.0, "avg_logprob": -0.17646285891532898, + "compression_ratio": 1.5751295336787565, "no_speech_prob": 0.10711150616407394}, + {"id": 126, "seek": 161080, "start": 1610.8, "end": 1629.8, "text": " And then typically + go more in this experimentation mode they say all it''s now maybe evaluate compare + a couple of different models let''s maybe adjust this pipeline a bit or add a re-ranker + maybe or go maybe to the to a hybrid retriever pipeline where we come.", "tokens": + [50364, 400, 550, 5850, 352, 544, 294, 341, 37142, 4391, 436, 584, 439, 309, 311, + 586, 1310, 13059, 6794, 257, 1916, 295, 819, 5245, 718, 311, 1310, 4369, 341, 15517, + 257, 857, 420, 909, 257, 319, 12, 20479, 260, 1310, 420, 352, 1310, 281, 264, 281, + 257, 13051, 19817, 331, 15517, 689, 321, 808, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.29111651716561154, "compression_ratio": 1.5266272189349113, "no_speech_prob": + 0.02739662677049637}, {"id": 127, "seek": 162980, "start": 1629.8, "end": 1650.8, + "text": " Basically have a 25 retriever in parallel to a dense retriever and we + join these documents and 
hastic has a lot of functionality that makes that easy + to to basically change a pipeline as you wonder very quickly and then evaluate if + that gives you any any benefit.", "tokens": [50364, 8537, 362, 257, 3552, 19817, + 331, 294, 8952, 281, 257, 18011, 19817, 331, 293, 321, 3917, 613, 8512, 293, 6581, + 299, 575, 257, 688, 295, 14980, 300, 1669, 300, 1858, 281, 281, 1936, 1319, 257, + 15517, 382, 291, 2441, 588, 2661, 293, 550, 13059, 498, 300, 2709, 291, 604, 604, + 5121, 13, 51414], "temperature": 0.0, "avg_logprob": -0.24349680968693324, "compression_ratio": + 1.5380116959064327, "no_speech_prob": 0.013260602951049805}, {"id": 128, "seek": + 165080, "start": 1651.8, "end": 1678.8, "text": " If these say of the shelf options + and combinations are not enough for use case then yeah you can go down the fine + tuning route I would say we have also have a source the notation tool labeling tool + where you can create training data and basically fine tune parts of your pipeline + retriever or reader for question answering.", "tokens": [50414, 759, 613, 584, 295, + 264, 15222, 3956, 293, 21267, 366, 406, 1547, 337, 764, 1389, 550, 1338, 291, 393, + 352, 760, 264, 2489, 15164, 7955, 286, 576, 584, 321, 362, 611, 362, 257, 4009, + 264, 24657, 2290, 40244, 2290, 689, 291, 393, 1884, 3097, 1412, 293, 1936, 2489, + 10864, 3166, 295, 428, 15517, 19817, 331, 420, 15149, 337, 1168, 13430, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.2928010793832632, "compression_ratio": 1.6649484536082475, + "no_speech_prob": 0.015037382952868938}, {"id": 129, "seek": 167880, "start": 1678.8, + "end": 1683.8, "text": " So basically I would say everything from a quick prototype + tool.", "tokens": [50364, 407, 1936, 286, 576, 584, 1203, 490, 257, 1702, 19475, + 2290, 13, 50614], "temperature": 0.0, "avg_logprob": -0.3243922551472982, "compression_ratio": + 1.393103448275862, "no_speech_prob": 0.001257824245840311}, {"id": 130, "seek": + 167880, "start": 1683.8, "end": 1693.8, "text": 
" Let''s do some some experiments + here and there to then going and production and deploying it with a with a basic + rest API until basically.", "tokens": [50614, 961, 311, 360, 512, 512, 12050, 510, + 293, 456, 281, 550, 516, 293, 4265, 293, 34198, 309, 365, 257, 365, 257, 3875, 1472, + 9362, 1826, 1936, 13, 51114], "temperature": 0.0, "avg_logprob": -0.3243922551472982, + "compression_ratio": 1.393103448275862, "no_speech_prob": 0.001257824245840311}, + {"id": 131, "seek": 169380, "start": 1694.8, "end": 1720.8, "text": " Sounds cool + and so in that experimentation mode I guess one one one aspect is like fine tuning + you mentioned right the other is kind of like what building blocks I could plug + in right and I know you guys have really good documentation is there something like + a tutorial or or some kind of walk through that would even help me discover is a + user what are the options.", "tokens": [50414, 14576, 1627, 293, 370, 294, 300, + 37142, 4391, 286, 2041, 472, 472, 472, 4171, 307, 411, 2489, 15164, 291, 2835, 558, + 264, 661, 307, 733, 295, 411, 437, 2390, 8474, 286, 727, 5452, 294, 558, 293, 286, + 458, 291, 1074, 362, 534, 665, 14333, 307, 456, 746, 411, 257, 7073, 420, 420, 512, + 733, 295, 1792, 807, 300, 576, 754, 854, 385, 4411, 307, 257, 4195, 437, 366, 264, + 3956, 13, 51714], "temperature": 0.0, "avg_logprob": -0.08503927230834961, "compression_ratio": + 1.6712328767123288, "no_speech_prob": 0.06906592845916748}, {"id": 132, "seek": + 172080, "start": 1721.8, "end": 1736.8, "text": " So we have a couple of different + different tutorials showing you what kind of notes also you can use like many people + are not aware of for example options that can do it indexing time that might be + helpful so.", "tokens": [50414, 407, 321, 362, 257, 1916, 295, 819, 819, 17616, + 4099, 291, 437, 733, 295, 5570, 611, 291, 393, 764, 411, 867, 561, 366, 406, 3650, + 295, 337, 1365, 3956, 300, 393, 360, 309, 8186, 278, 565, 300, 1062, 312, 4961, + 370, 13, 
51164], "temperature": 0.0, "avg_logprob": -0.13306047605431598, "compression_ratio": + 1.4822695035460993, "no_speech_prob": 0.005889758467674255}, {"id": 133, "seek": + 173680, "start": 1737.8, "end": 1754.8, "text": " For example, like enriching your + documents with metadata can be incredibly powerful later at search time because + you can then filter on your search space to make more categories that that you''re + interested in.", "tokens": [50414, 1171, 1365, 11, 411, 18849, 278, 428, 8512, 365, + 26603, 393, 312, 6252, 4005, 1780, 412, 3164, 565, 570, 291, 393, 550, 6608, 322, + 428, 3164, 1901, 281, 652, 544, 10479, 300, 300, 291, 434, 3102, 294, 13, 51264], + "temperature": 0.0, "avg_logprob": -0.2171345211210705, "compression_ratio": 1.4217687074829932, + "no_speech_prob": 0.03299427404999733}, {"id": 134, "seek": 175480, "start": 1754.8, + "end": 1759.8, "text": " And there we have for example, the stories that show you + how easily you can.", "tokens": [50364, 400, 456, 321, 362, 337, 1365, 11, 264, + 3676, 300, 855, 291, 577, 3612, 291, 393, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.23244542208584873, "compression_ratio": 1.6352201257861636, "no_speech_prob": + 0.0044640968553721905}, {"id": 135, "seek": 175480, "start": 1759.8, "end": 1773.8, + "text": " For example, classify documents that you index to certain categories and + then later on at query time use these categories to narrow down your search space + filter for these categories.", "tokens": [50614, 1171, 1365, 11, 33872, 8512, 300, + 291, 8186, 281, 1629, 10479, 293, 550, 1780, 322, 412, 14581, 565, 764, 613, 10479, + 281, 9432, 760, 428, 3164, 1901, 6608, 337, 613, 10479, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.23244542208584873, "compression_ratio": 1.6352201257861636, + "no_speech_prob": 0.0044640968553721905}, {"id": 136, "seek": 177380, "start": 1774.8, + "end": 1785.8, "text": " And on the model side, say if you are now you know that + you want to have a say QA 
model reader and you know interested in what model you + want.", "tokens": [50414, 400, 322, 264, 2316, 1252, 11, 584, 498, 291, 366, 586, + 291, 458, 300, 291, 528, 281, 362, 257, 584, 1249, 32, 2316, 15149, 293, 291, 458, + 3102, 294, 437, 2316, 291, 528, 13, 50964], "temperature": 0.0, "avg_logprob": -0.1968604496547154, + "compression_ratio": 1.5743589743589743, "no_speech_prob": 0.020196301862597466}, + {"id": 137, "seek": 177380, "start": 1785.8, "end": 1799.8, "text": " I would probably + suggest you just go to our benchmarks page which is linked from documentation there + we have a couple of comparisons in terms of accuracy and speed.", "tokens": [50964, + 286, 576, 1391, 3402, 291, 445, 352, 281, 527, 43751, 3028, 597, 307, 9408, 490, + 14333, 456, 321, 362, 257, 1916, 295, 33157, 294, 2115, 295, 14170, 293, 3073, 13, + 51664], "temperature": 0.0, "avg_logprob": -0.1968604496547154, "compression_ratio": + 1.5743589743589743, "no_speech_prob": 0.020196301862597466}, {"id": 138, "seek": + 179980, "start": 1799.8, "end": 1808.8, "text": " But also we have most of our own + models on the hackenface model hub which appears to find this information and model + cards.", "tokens": [50364, 583, 611, 321, 362, 881, 295, 527, 1065, 5245, 322, 264, + 10339, 268, 2868, 2316, 11838, 597, 7038, 281, 915, 341, 1589, 293, 2316, 5632, + 13, 50814], "temperature": 0.0, "avg_logprob": -0.23439445495605468, "compression_ratio": + 1.6566523605150214, "no_speech_prob": 0.011053085327148438}, {"id": 139, "seek": + 179980, "start": 1809.8, "end": 1826.8, "text": " Yeah, that''s awesome. 
So you + guys in addition to open source version that I could I presume could host completely + myself right I still have a bunch of questions on that open source side but still + you also offer the cloud version you call deep set cloud is right.", "tokens": [50864, + 865, 11, 300, 311, 3476, 13, 407, 291, 1074, 294, 4500, 281, 1269, 4009, 3037, 300, + 286, 727, 286, 43283, 727, 3975, 2584, 2059, 558, 286, 920, 362, 257, 3840, 295, + 1651, 322, 300, 1269, 4009, 1252, 457, 920, 291, 611, 2626, 264, 4588, 3037, 291, + 818, 2452, 992, 4588, 307, 558, 13, 51714], "temperature": 0.0, "avg_logprob": -0.23439445495605468, + "compression_ratio": 1.6566523605150214, "no_speech_prob": 0.011053085327148438}, + {"id": 140, "seek": 182680, "start": 1826.8, "end": 1839.8, "text": " Can you explain + what users get with that I presume scalability but maybe something else and I think + we can we can leave a link to in the show notes as well for those users who want + to try it out.", "tokens": [50364, 1664, 291, 2903, 437, 5022, 483, 365, 300, 286, + 43283, 15664, 2310, 457, 1310, 746, 1646, 293, 286, 519, 321, 393, 321, 393, 1856, + 257, 2113, 281, 294, 264, 855, 5570, 382, 731, 337, 729, 5022, 567, 528, 281, 853, + 309, 484, 13, 51014], "temperature": 0.0, "avg_logprob": -0.21587915530149965, "compression_ratio": + 1.6198347107438016, "no_speech_prob": 0.01978258602321148}, {"id": 141, "seek": + 182680, "start": 1839.8, "end": 1852.8, "text": " Yeah, basically hey stack the + open source predictors will be a Python framework and you can do everything you + want there to prototype the experiments and if you want also go to production with + it.", "tokens": [51014, 865, 11, 1936, 4177, 8630, 264, 1269, 4009, 6069, 830, 486, + 312, 257, 15329, 8388, 293, 291, 393, 360, 1203, 291, 528, 456, 281, 19475, 264, + 12050, 293, 498, 291, 528, 611, 352, 281, 4265, 365, 309, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.21587915530149965, "compression_ratio": 1.6198347107438016, + 
"no_speech_prob": 0.01978258602321148}, {"id": 142, "seek": 185280, "start": 1852.8, + "end": 1881.8, "text": " But you also found in basically in addition to that people + want something more like they want to really host the platform where it''s really + end to end and basically you have faster workflows so really what''s covering the + whole lifecycle of an application from early prototyping to running many experiments + and parallel getting more guidance.", "tokens": [50364, 583, 291, 611, 1352, 294, + 1936, 294, 4500, 281, 300, 561, 528, 746, 544, 411, 436, 528, 281, 534, 3975, 264, + 3663, 689, 309, 311, 534, 917, 281, 917, 293, 1936, 291, 362, 4663, 43461, 370, + 534, 437, 311, 10322, 264, 1379, 45722, 295, 364, 3861, 490, 2440, 46219, 3381, + 281, 2614, 867, 12050, 293, 8952, 1242, 544, 10056, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.27022168040275574, "compression_ratio": 1.6394230769230769, "no_speech_prob": + 0.028763873502612114}, {"id": 143, "seek": 188180, "start": 1881.8, "end": 1891.8, + "text": " What''s on from your eye perspective on what to launch investigating certain + documents in a faster way.", "tokens": [50364, 708, 311, 322, 490, 428, 3313, 4585, + 322, 437, 281, 4025, 22858, 1629, 8512, 294, 257, 4663, 636, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.2527727782726288, "compression_ratio": 1.4973821989528795, + "no_speech_prob": 0.02090139500796795}, {"id": 144, "seek": 188180, "start": 1891.8, + "end": 1906.8, "text": " Then to OK now I did all these experiments and I want ever + kind of one click path to production and I don''t want to bother with any scaling + and basically a productionizing on my side.", "tokens": [50864, 1396, 281, 2264, + 586, 286, 630, 439, 613, 12050, 293, 286, 528, 1562, 733, 295, 472, 2052, 3100, + 281, 4265, 293, 286, 500, 380, 528, 281, 8677, 365, 604, 21589, 293, 1936, 257, + 4265, 3319, 322, 452, 1252, 13, 51614], "temperature": 0.0, "avg_logprob": -0.2527727782726288, + 
"compression_ratio": 1.4973821989528795, "no_speech_prob": 0.02090139500796795}, + {"id": 145, "seek": 190680, "start": 1906.8, "end": 1933.8, "text": " And this is + basically what what we do with these at cloud so if you imagine as a host the platform + the cloud the SaaS platform where you develop your NAP applications and can easily + bring them to production and monitor them afterwards so really the I would say whole + life cycle and especially what''s going on getting your.", "tokens": [50364, 400, + 341, 307, 1936, 437, 437, 321, 360, 365, 613, 412, 4588, 370, 498, 291, 3811, 382, + 257, 3975, 264, 3663, 264, 4588, 264, 49733, 3663, 689, 291, 1499, 428, 426, 4715, + 5821, 293, 393, 3612, 1565, 552, 281, 4265, 293, 6002, 552, 10543, 370, 534, 264, + 286, 576, 584, 1379, 993, 6586, 293, 2318, 437, 311, 516, 322, 1242, 428, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.3750101725260417, "compression_ratio": 1.5707317073170732, + "no_speech_prob": 0.006859649904072285}, {"id": 146, "seek": 193380, "start": 1933.8, + "end": 1948.8, "text": " Getting your NAP pipelines faster to production as you + would probably do it on a just Python level and then continue monitoring them and + having this close group is to later want to maintain them.", "tokens": [50364, 13674, + 428, 426, 4715, 40168, 4663, 281, 4265, 382, 291, 576, 1391, 360, 309, 322, 257, + 445, 15329, 1496, 293, 550, 2354, 11028, 552, 293, 1419, 341, 1998, 1594, 307, 281, + 1780, 528, 281, 6909, 552, 13, 51114], "temperature": 0.0, "avg_logprob": -0.3244152999505764, + "compression_ratio": 1.3829787234042554, "no_speech_prob": 0.019420376047492027}, + {"id": 147, "seek": 194880, "start": 1948.8, "end": 1977.8, "text": " So it sounds + cool and since it''s kind of like so with open source version I presume I could + do kind of a local development on my PC right and then go and use some deployment + pipeline to deploy with cloud version I have sort of like managed haystack right + and now thinking 
about developer experience are you guys moving more towards cloud + tools as well you know like for example.", "tokens": [50364, 407, 309, 3263, 1627, + 293, 1670, 309, 311, 733, 295, 411, 370, 365, 1269, 4009, 3037, 286, 43283, 286, + 727, 360, 733, 295, 257, 2654, 3250, 322, 452, 6465, 558, 293, 550, 352, 293, 764, + 512, 19317, 15517, 281, 7274, 365, 4588, 3037, 286, 362, 1333, 295, 411, 6453, 4842, + 372, 501, 558, 293, 586, 1953, 466, 10754, 1752, 366, 291, 1074, 2684, 544, 3030, + 4588, 3873, 382, 731, 291, 458, 411, 337, 1365, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.15316334253625025, "compression_ratio": 1.6578947368421053, "no_speech_prob": + 0.7715344429016113}, {"id": 148, "seek": 197780, "start": 1977.8, "end": 1990.8, + "text": " A code editor could be in the clouds or the changes and click click the + button and off it goes I don''t even need to download it locally right or or do + you see some other trend with your users.", "tokens": [50364, 316, 3089, 9839, 727, + 312, 294, 264, 12193, 420, 264, 2962, 293, 2052, 2052, 264, 2960, 293, 766, 309, + 1709, 286, 500, 380, 754, 643, 281, 5484, 309, 16143, 558, 420, 420, 360, 291, 536, + 512, 661, 6028, 365, 428, 5022, 13, 51014], "temperature": 0.0, "avg_logprob": -0.16123879474142325, + "compression_ratio": 1.411764705882353, "no_speech_prob": 0.007400870323181152}, + {"id": 149, "seek": 199080, "start": 1991.8, "end": 2017.8, "text": " No like we + maybe that''s also an important point so it''s still a developer platform right + so we are not in a low code no code space and what we really try is basically giving + developers the option to customize components and that then goes through coding + and and there we have for example editors directly on the platform where you can.", + "tokens": [50414, 883, 411, 321, 1310, 300, 311, 611, 364, 1021, 935, 370, 309, + 311, 920, 257, 10754, 3663, 558, 370, 321, 366, 406, 294, 257, 2295, 3089, 572, + 3089, 1901, 293, 437, 321, 534, 853, 307, 1936, 2902, 
8849, 264, 3614, 281, 19734, + 6677, 293, 300, 550, 1709, 807, 17720, 293, 293, 456, 321, 362, 337, 1365, 31446, + 3838, 322, 264, 3663, 689, 291, 393, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.21557789954586307, "compression_ratio": 1.6262135922330097, "no_speech_prob": + 0.06608215719461441}, {"id": 150, "seek": 201780, "start": 2017.8, "end": 2039.8, + "text": " Edit for example just the young definition of pipelines and quickly switch + certain parameters if you want to do that and then it''s basically there''s a hosted + notebooks where you can also easily kind of open these resources like a pipeline + and we automatically create some Python code of it in notebook that you can then.", + "tokens": [50364, 33241, 337, 1365, 445, 264, 2037, 7123, 295, 40168, 293, 2661, + 3679, 1629, 9834, 498, 291, 528, 281, 360, 300, 293, 550, 309, 311, 1936, 456, 311, + 257, 19204, 43782, 689, 291, 393, 611, 3612, 733, 295, 1269, 613, 3593, 411, 257, + 15517, 293, 321, 6772, 1884, 512, 15329, 3089, 295, 309, 294, 21060, 300, 291, 393, + 550, 13, 51464], "temperature": 0.0, "avg_logprob": -0.15287343282548208, "compression_ratio": + 1.5841584158415842, "no_speech_prob": 0.0018619955517351627}, {"id": 151, "seek": + 203980, "start": 2039.8, "end": 2044.8, "text": " Then edit as you as you know it + also from haystack open source.", "tokens": [50364, 1396, 8129, 382, 291, 382, 291, + 458, 309, 611, 490, 4842, 372, 501, 1269, 4009, 13, 50614], "temperature": 0.0, + "avg_logprob": -0.2969833427751568, "compression_ratio": 1.618811881188119, "no_speech_prob": + 0.0009909897344186902}, {"id": 152, "seek": 203980, "start": 2045.8, "end": 2064.8, + "text": " Adjust the sort of certain component debug it maybe at another one and + then it''s basically just one Python line again to move away from the Python code + in your notebook to the production artifacts to the pipeline that is then deployed + and then can run production.", "tokens": [50664, 34049, 264, 1333, 295, 1629, 
6542, + 24083, 309, 1310, 412, 1071, 472, 293, 550, 309, 311, 1936, 445, 472, 15329, 1622, + 797, 281, 1286, 1314, 490, 264, 15329, 3089, 294, 428, 21060, 281, 264, 4265, 24617, + 281, 264, 15517, 300, 307, 550, 17826, 293, 550, 393, 1190, 4265, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.2969833427751568, "compression_ratio": 1.618811881188119, + "no_speech_prob": 0.0009909897344186902}, {"id": 153, "seek": 206480, "start": 2065.8, + "end": 2080.8, "text": " Yeah sounds cool and if a user has some as a user I mean + it could be a company right so let''s say they have an established tool set you + know maybe if the usage maker maybe they don''t maybe use something else.", "tokens": + [50414, 865, 3263, 1627, 293, 498, 257, 4195, 575, 512, 382, 257, 4195, 286, 914, + 309, 727, 312, 257, 2237, 558, 370, 718, 311, 584, 436, 362, 364, 7545, 2290, 992, + 291, 458, 1310, 498, 264, 14924, 17127, 1310, 436, 500, 380, 1310, 764, 746, 1646, + 13, 51164], "temperature": 0.0, "avg_logprob": -0.1538731431307858, "compression_ratio": + 1.5891891891891892, "no_speech_prob": 0.013970565982162952}, {"id": 154, "seek": + 206480, "start": 2081.8, "end": 2089.8, "text": " How do you reach these tools said + that is kind of outside of haystack do you have to.", "tokens": [51214, 1012, 360, + 291, 2524, 613, 3873, 848, 300, 307, 733, 295, 2380, 295, 4842, 372, 501, 360, 291, + 362, 281, 13, 51614], "temperature": 0.0, "avg_logprob": -0.1538731431307858, "compression_ratio": + 1.5891891891891892, "no_speech_prob": 0.013970565982162952}, {"id": 155, "seek": + 208980, "start": 2089.8, "end": 2118.8, "text": " I would say in most cases not + so you will I mean what were very very basically stop I would say with with the + cloud is when you have your pipeline to NAP service and you have your rest API that + you expose that''s kind of where we stop so there''s a lot of I would say stuff + in a company that is built around it when you''re into your product and also on + the other side 
of where do the files come from where does that.", "tokens": [50414, + 286, 576, 584, 294, 881, 3331, 406, 370, 291, 486, 286, 914, 437, 645, 588, 588, + 1936, 1590, 286, 576, 584, 365, 365, 264, 4588, 307, 562, 291, 362, 428, 15517, + 281, 426, 4715, 2643, 293, 291, 362, 428, 1472, 9362, 300, 291, 19219, 300, 311, + 733, 295, 689, 321, 1590, 370, 456, 311, 257, 688, 295, 286, 576, 584, 1507, 294, + 257, 2237, 300, 307, 3094, 926, 309, 562, 291, 434, 666, 428, 1674, 293, 611, 322, + 264, 661, 1252, 295, 689, 360, 264, 7098, 808, 490, 689, 775, 300, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.3394919144479852, "compression_ratio": 1.7468354430379747, + "no_speech_prob": 0.006194399204105139}, {"id": 156, "seek": 211980, "start": 2119.8, + "end": 2123.8, "text": " Data come from how you think it into into a deep set cloud.", + "tokens": [50364, 11888, 808, 490, 577, 291, 519, 309, 666, 666, 257, 2452, 992, + 4588, 13, 50564], "temperature": 0.0, "avg_logprob": -0.318569540977478, "compression_ratio": + 1.3795620437956204, "no_speech_prob": 0.0023937236983329058}, {"id": 157, "seek": + 211980, "start": 2124.8, "end": 2129.8, "text": " But within that space we rather + see people.", "tokens": [50614, 583, 1951, 300, 1901, 321, 2831, 536, 561, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.318569540977478, "compression_ratio": 1.3795620437956204, + "no_speech_prob": 0.0023937236983329058}, {"id": 158, "seek": 211980, "start": 2130.8, + "end": 2137.8, "text": " Customers who appreciate it that''s like fully integrated + and they don''t usually then.", "tokens": [50914, 16649, 433, 567, 4449, 309, 300, + 311, 411, 4498, 10919, 293, 436, 500, 380, 2673, 550, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.318569540977478, "compression_ratio": 1.3795620437956204, + "no_speech_prob": 0.0023937236983329058}, {"id": 159, "seek": 213780, "start": 2138.8, + "end": 2146.8, "text": " Want to stay on on sage maker if they are on it for these + NAP use cases so from 
our perspective.", "tokens": [50414, 11773, 281, 1754, 322, + 322, 19721, 17127, 498, 436, 366, 322, 309, 337, 613, 426, 4715, 764, 3331, 370, + 490, 527, 4585, 13, 50814], "temperature": 0.0, "avg_logprob": -0.3936958659778942, + "compression_ratio": 1.4675324675324675, "no_speech_prob": 0.08074244111776352}, + {"id": 160, "seek": 213780, "start": 2147.8, "end": 2155.8, "text": " There are + the other are these more generic solutions that are not specific for NAP the car + work for any kind of machine learning.", "tokens": [50864, 821, 366, 264, 661, 366, + 613, 544, 19577, 6547, 300, 366, 406, 2685, 337, 426, 4715, 264, 1032, 589, 337, + 604, 733, 295, 3479, 2539, 13, 51264], "temperature": 0.0, "avg_logprob": -0.3936958659778942, + "compression_ratio": 1.4675324675324675, "no_speech_prob": 0.08074244111776352}, + {"id": 161, "seek": 215580, "start": 2155.8, "end": 2161.8, "text": " But if you + really have cases where you want to be faster on your NAP use cases.", "tokens": + [50364, 583, 498, 291, 534, 362, 3331, 689, 291, 528, 281, 312, 4663, 322, 428, + 426, 4715, 764, 3331, 13, 50664], "temperature": 0.0, "avg_logprob": -0.29431327444608096, + "compression_ratio": 1.511111111111111, "no_speech_prob": 0.051327504217624664}, + {"id": 162, "seek": 215580, "start": 2161.8, "end": 2178.8, "text": " Want to have + more say support on that side that''s basically where where deep set cloud and comes + into play and to give you an example your think of experiments should evaluate these + pipelines.", "tokens": [50664, 11773, 281, 362, 544, 584, 1406, 322, 300, 1252, + 300, 311, 1936, 689, 689, 2452, 992, 4588, 293, 1487, 666, 862, 293, 281, 976, 291, + 364, 1365, 428, 519, 295, 12050, 820, 13059, 613, 40168, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.29431327444608096, "compression_ratio": 1.511111111111111, + "no_speech_prob": 0.051327504217624664}, {"id": 163, "seek": 217880, "start": 2178.8, + "end": 2195.8, "text": " And then you have to do give 
basically a lot of options + to investigate predictions and what do these metrics actually say and this is a + thing is something that is usually missing and solutions like sage maker.", "tokens": + [50364, 400, 550, 291, 362, 281, 360, 976, 1936, 257, 688, 295, 3956, 281, 15013, + 21264, 293, 437, 360, 613, 16367, 767, 584, 293, 341, 307, 257, 551, 307, 746, 300, + 307, 2673, 5361, 293, 6547, 411, 19721, 17127, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.31479220390319823, "compression_ratio": 1.7168949771689497, "no_speech_prob": + 0.07646393030881882}, {"id": 164, "seek": 217880, "start": 2195.8, "end": 2202.8, + "text": " You have to then really combine with many other tools and build in there + like a lot of extra stuff.", "tokens": [51214, 509, 362, 281, 550, 534, 10432, 365, + 867, 661, 3873, 293, 1322, 294, 456, 411, 257, 688, 295, 2857, 1507, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.31479220390319823, "compression_ratio": 1.7168949771689497, + "no_speech_prob": 0.07646393030881882}, {"id": 165, "seek": 217880, "start": 2202.8, + "end": 2206.8, "text": " And that basically comes all together already with deep + set cloud.", "tokens": [51564, 400, 300, 1936, 1487, 439, 1214, 1217, 365, 2452, + 992, 4588, 13, 51764], "temperature": 0.0, "avg_logprob": -0.31479220390319823, + "compression_ratio": 1.7168949771689497, "no_speech_prob": 0.07646393030881882}, + {"id": 166, "seek": 220680, "start": 2206.8, "end": 2213.8, "text": " So get it + right so deep set cloud with offer me sort of an evaluation tool set right.", "tokens": + [50364, 407, 483, 309, 558, 370, 2452, 992, 4588, 365, 2626, 385, 1333, 295, 364, + 13344, 2290, 992, 558, 13, 50714], "temperature": 0.0, "avg_logprob": -0.18572496564200755, + "compression_ratio": 1.7173913043478262, "no_speech_prob": 0.001505466760136187}, + {"id": 167, "seek": 220680, "start": 2213.8, "end": 2219.8, "text": " Can I get + the same in the open source version or it''s not present there.", "tokens": 
[50714, + 1664, 286, 483, 264, 912, 294, 264, 1269, 4009, 3037, 420, 309, 311, 406, 1974, + 456, 13, 51014], "temperature": 0.0, "avg_logprob": -0.18572496564200755, "compression_ratio": + 1.7173913043478262, "no_speech_prob": 0.001505466760136187}, {"id": 168, "seek": + 220680, "start": 2219.8, "end": 2224.8, "text": " You can basically evaluate single + pipelines also in the open source version.", "tokens": [51014, 509, 393, 1936, 13059, + 2167, 40168, 611, 294, 264, 1269, 4009, 3037, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.18572496564200755, "compression_ratio": 1.7173913043478262, "no_speech_prob": + 0.001505466760136187}, {"id": 169, "seek": 220680, "start": 2224.8, "end": 2234.8, + "text": " The difference is that basically in deep set cloud you have a full overview + over your project where we track all your experiments you can kind of compare them.", + "tokens": [51264, 440, 2649, 307, 300, 1936, 294, 2452, 992, 4588, 291, 362, 257, + 1577, 12492, 670, 428, 1716, 689, 321, 2837, 439, 428, 12050, 291, 393, 733, 295, + 6794, 552, 13, 51764], "temperature": 0.0, "avg_logprob": -0.18572496564200755, + "compression_ratio": 1.7173913043478262, "no_speech_prob": 0.001505466760136187}, + {"id": 170, "seek": 223480, "start": 2234.8, "end": 2250.8, "text": " Launch easily + 20 experiments in parallel and this is actually on large data sets and with open + source I think and generally you would need to provision a lot of machines GPUs + to run that in parallel.", "tokens": [50364, 28119, 3612, 945, 12050, 294, 8952, + 293, 341, 307, 767, 322, 2416, 1412, 6352, 293, 365, 1269, 4009, 286, 519, 293, + 5101, 291, 576, 643, 281, 17225, 257, 688, 295, 8379, 18407, 82, 281, 1190, 300, + 294, 8952, 13, 51164], "temperature": 0.0, "avg_logprob": -0.2792954112208167, "compression_ratio": + 1.3724137931034484, "no_speech_prob": 0.004351471550762653}, {"id": 171, "seek": + 225080, "start": 2250.8, "end": 2260.8, "text": " And that''s basically what one + thing that we 
offer and deep set cloud and the other is basically the I would say + just the you I love layer over it.", "tokens": [50364, 400, 300, 311, 1936, 437, + 472, 551, 300, 321, 2626, 293, 2452, 992, 4588, 293, 264, 661, 307, 1936, 264, 286, + 576, 584, 445, 264, 291, 286, 959, 4583, 670, 309, 13, 50864], "temperature": 0.0, + "avg_logprob": -0.26469106259553327, "compression_ratio": 1.5549738219895288, "no_speech_prob": + 0.24975942075252533}, {"id": 172, "seek": 225080, "start": 2260.8, "end": 2273.8, + "text": " So of course I can work with what Hey stack on and get basically a report + around my experiments again maybe a panel state of frame I get some metrics.", "tokens": + [50864, 407, 295, 1164, 286, 393, 589, 365, 437, 1911, 8630, 322, 293, 483, 1936, + 257, 2275, 926, 452, 12050, 797, 1310, 257, 4831, 1785, 295, 3920, 286, 483, 512, + 16367, 13, 51514], "temperature": 0.0, "avg_logprob": -0.26469106259553327, "compression_ratio": + 1.5549738219895288, "no_speech_prob": 0.24975942075252533}, {"id": 173, "seek": + 227380, "start": 2273.8, "end": 2287.8, "text": " What we do when as you on top + in deep set cloud is allowing people to interact with this kind of data more easily + like finding examples of queries that fail that.", "tokens": [50364, 708, 321, 360, + 562, 382, 291, 322, 1192, 294, 2452, 992, 4588, 307, 8293, 561, 281, 4648, 365, + 341, 733, 295, 1412, 544, 3612, 411, 5006, 5110, 295, 24109, 300, 3061, 300, 13, + 51064], "temperature": 0.0, "avg_logprob": -0.2974823624340456, "compression_ratio": + 1.5326633165829147, "no_speech_prob": 0.0484265498816967}, {"id": 174, "seek": 227380, + "start": 2287.8, "end": 2292.8, "text": " Or that are successful getting feedback + from also end users so collaborating.", "tokens": [51064, 1610, 300, 366, 4406, + 1242, 5824, 490, 611, 917, 5022, 370, 30188, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.2974823624340456, "compression_ratio": 1.5326633165829147, "no_speech_prob": + 0.0484265498816967}, {"id": 
175, "seek": 227380, "start": 2292.8, "end": 2297.8, + "text": " Basically the the persons who use that search system at the end.", "tokens": + [51314, 8537, 264, 264, 14453, 567, 764, 300, 3164, 1185, 412, 264, 917, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.2974823624340456, "compression_ratio": 1.5326633165829147, + "no_speech_prob": 0.0484265498816967}, {"id": 176, "seek": 229780, "start": 2297.8, + "end": 2315.8, "text": " And now that''s also what I think what we what we saw a + lot that yeah you can extract your predictions and maybe it''s like a CSV and then + you shared with your next colleague who then I''m kind of rates or give say human + evaluation if these queries makes sense or not.", "tokens": [50364, 400, 586, 300, + 311, 611, 437, 286, 519, 437, 321, 437, 321, 1866, 257, 688, 300, 1338, 291, 393, + 8947, 428, 21264, 293, 1310, 309, 311, 411, 257, 48814, 293, 550, 291, 5507, 365, + 428, 958, 13532, 567, 550, 286, 478, 733, 295, 6846, 420, 976, 584, 1952, 13344, + 498, 613, 24109, 1669, 2020, 420, 406, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.2676256836437788, "compression_ratio": 1.4887640449438202, "no_speech_prob": + 0.012977584265172482}, {"id": 177, "seek": 231580, "start": 2315.8, "end": 2322.8, + "text": " But again this is like a lot of friction you have them a lot of these + sees these are exifies floating around.", "tokens": [50364, 583, 797, 341, 307, + 411, 257, 688, 295, 17710, 291, 362, 552, 257, 688, 295, 613, 8194, 613, 366, 454, + 11221, 12607, 926, 13, 50714], "temperature": 0.0, "avg_logprob": -0.3132982616183124, + "compression_ratio": 1.6945812807881773, "no_speech_prob": 0.011917325668036938}, + {"id": 178, "seek": 231580, "start": 2322.8, "end": 2338.8, "text": " And what we + would be what we do is I think bring this together again having it in one place + that you can also in future easily reuse that for other experiments and even use + it for training and and have it in this in one central place.", 
"tokens": [50714, + 400, 437, 321, 576, 312, 437, 321, 360, 307, 286, 519, 1565, 341, 1214, 797, 1419, + 309, 294, 472, 1081, 300, 291, 393, 611, 294, 2027, 3612, 26225, 300, 337, 661, + 12050, 293, 754, 764, 309, 337, 3097, 293, 293, 362, 309, 294, 341, 294, 472, 5777, + 1081, 13, 51514], "temperature": 0.0, "avg_logprob": -0.3132982616183124, "compression_ratio": + 1.6945812807881773, "no_speech_prob": 0.011917325668036938}, {"id": 179, "seek": + 233880, "start": 2338.8, "end": 2348.8, "text": " Yeah sounds amazing from what + I gather this sounds like a end to end ML ops platforms specifically for an LP neural + search right.", "tokens": [50364, 865, 3263, 2243, 490, 437, 286, 5448, 341, 3263, + 411, 257, 917, 281, 917, 21601, 44663, 9473, 4682, 337, 364, 38095, 18161, 3164, + 558, 13, 50864], "temperature": 0.0, "avg_logprob": -0.250203937292099, "compression_ratio": + 1.5299539170506913, "no_speech_prob": 0.03293554112315178}, {"id": 180, "seek": + 233880, "start": 2348.8, "end": 2362.8, "text": " Exactly you have thought through + so many things not only the developer side of things like experimentation but also + you know debugging and actually going through the feedback from stakeholders or + users.", "tokens": [50864, 7587, 291, 362, 1194, 807, 370, 867, 721, 406, 787, 264, + 10754, 1252, 295, 721, 411, 37142, 457, 611, 291, 458, 45592, 293, 767, 516, 807, + 264, 5824, 490, 17779, 420, 5022, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.250203937292099, "compression_ratio": 1.5299539170506913, "no_speech_prob": 0.03293554112315178}, + {"id": 181, "seek": 236280, "start": 2362.8, "end": 2365.8, "text": " And then communicating + with them.", "tokens": [50364, 400, 550, 17559, 365, 552, 13, 50514], "temperature": + 0.0, "avg_logprob": -0.190742990863857, "compression_ratio": 1.6532663316582914, + "no_speech_prob": 0.018567346036434174}, {"id": 182, "seek": 236280, "start": 2365.8, + "end": 2387.8, "text": " Yeah and I think this is like something that 
is missed + in many projects like this like end user collaboration and from our experience this + should really happen in in a very early stage of a project that also kind of continuously + when when you move to production and even when you are production.", "tokens": [50514, + 865, 293, 286, 519, 341, 307, 411, 746, 300, 307, 6721, 294, 867, 4455, 411, 341, + 411, 917, 4195, 9363, 293, 490, 527, 1752, 341, 820, 534, 1051, 294, 294, 257, 588, + 2440, 3233, 295, 257, 1716, 300, 611, 733, 295, 15684, 562, 562, 291, 1286, 281, + 4265, 293, 754, 562, 291, 366, 4265, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.190742990863857, "compression_ratio": 1.6532663316582914, "no_speech_prob": 0.018567346036434174}, + {"id": 183, "seek": 238780, "start": 2387.8, "end": 2399.8, "text": " And I think + this is something which is if you don''t have the right tooling that''s very annoying + to you probably like just building a demo like a UI for some search system.", "tokens": + [50364, 400, 286, 519, 341, 307, 746, 597, 307, 498, 291, 500, 380, 362, 264, 558, + 46593, 300, 311, 588, 11304, 281, 291, 1391, 411, 445, 2390, 257, 10723, 411, 257, + 15682, 337, 512, 3164, 1185, 13, 50964], "temperature": 0.0, "avg_logprob": -0.3199255923007397, + "compression_ratio": 1.6745098039215687, "no_speech_prob": 0.06848172098398209}, + {"id": 184, "seek": 238780, "start": 2399.8, "end": 2416.8, "text": " If you are + not a front end developer if you''re an LP engineer it takes some extra time and + even with something extremely these days it''s still is then annoying to do it properly + and if you''re an enterprise maybe draft some access to it''s permission words.", + "tokens": [50964, 759, 291, 366, 406, 257, 1868, 917, 10754, 498, 291, 434, 364, + 38095, 11403, 309, 2516, 512, 2857, 565, 293, 754, 365, 746, 4664, 613, 1708, 309, + 311, 920, 307, 550, 11304, 281, 360, 309, 6108, 293, 498, 291, 434, 364, 14132, + 1310, 11206, 512, 2105, 281, 309, 311, 11226, 2283, 13, 51814], 
"temperature": 0.0, + "avg_logprob": -0.3199255923007397, "compression_ratio": 1.6745098039215687, "no_speech_prob": + 0.06848172098398209}, {"id": 185, "seek": 241680, "start": 2416.8, "end": 2426.8, + "text": " But it''s so important I think when you look at what projects work out + at the end what pipelines more customers go to production.", "tokens": [50364, 583, + 309, 311, 370, 1021, 286, 519, 562, 291, 574, 412, 437, 4455, 589, 484, 412, 264, + 917, 437, 40168, 544, 4581, 352, 281, 4265, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.1524914332798549, "compression_ratio": 1.7027027027027026, "no_speech_prob": + 0.008107648231089115}, {"id": 186, "seek": 241680, "start": 2426.8, "end": 2443.8, + "text": " It''s really a big criteria I think in the early days like sharing a demo + with your colleagues and end users really the first pipeline you have more or less + giving it to the hands of users and seeing what what they think about it and how + they use it.", "tokens": [50864, 467, 311, 534, 257, 955, 11101, 286, 519, 294, + 264, 2440, 1708, 411, 5414, 257, 10723, 365, 428, 7734, 293, 917, 5022, 534, 264, + 700, 15517, 291, 362, 544, 420, 1570, 2902, 309, 281, 264, 2377, 295, 5022, 293, + 2577, 437, 437, 436, 519, 466, 309, 293, 577, 436, 764, 309, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.1524914332798549, "compression_ratio": 1.7027027027027026, + "no_speech_prob": 0.008107648231089115}, {"id": 187, "seek": 244380, "start": 2443.8, + "end": 2462.8, "text": " And there were so many examples where NLP engineers thought + they they knew what people were were searching but after these kind of demo sessions + or like sharing it I want to see what what people actually do there.", "tokens": + [50364, 400, 456, 645, 370, 867, 5110, 689, 426, 45196, 11955, 1194, 436, 436, 2586, + 437, 561, 645, 645, 10808, 457, 934, 613, 733, 295, 10723, 11081, 420, 411, 5414, + 309, 286, 528, 281, 536, 437, 437, 561, 767, 360, 456, 13, 51314], "temperature": + 0.0, 
"avg_logprob": -0.3400594923231337, "compression_ratio": 1.4791666666666667, + "no_speech_prob": 0.007221141830086708}, {"id": 188, "seek": 246280, "start": 2462.8, + "end": 2473.8, "text": " And then they realized oh like they use a lot of key work + queries or they never put a question mark at the end or they have a lot of misspellings + what else.", "tokens": [50364, 400, 550, 436, 5334, 1954, 411, 436, 764, 257, 688, + 295, 2141, 589, 24109, 420, 436, 1128, 829, 257, 1168, 1491, 412, 264, 917, 420, + 436, 362, 257, 688, 295, 1713, 49241, 1109, 437, 1646, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.20001360949348002, "compression_ratio": 1.5681818181818181, + "no_speech_prob": 0.14753645658493042}, {"id": 189, "seek": 246280, "start": 2473.8, + "end": 2482.8, "text": " So I think there''s a lot of early learnings that you can + make as a developer from these demos and understanding it out.", "tokens": [50914, + 407, 286, 519, 456, 311, 257, 688, 295, 2440, 2539, 82, 300, 291, 393, 652, 382, + 257, 10754, 490, 613, 33788, 293, 3701, 309, 484, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.20001360949348002, "compression_ratio": 1.5681818181818181, "no_speech_prob": + 0.14753645658493042}, {"id": 190, "seek": 248280, "start": 2482.8, "end": 2495.8, + "text": " And also I think on the other side just creating this early aha moment + this kind of wow effect and some trust on the end user side is also crucial.", "tokens": + [50364, 400, 611, 286, 519, 322, 264, 661, 1252, 445, 4084, 341, 2440, 47340, 1623, + 341, 733, 295, 6076, 1802, 293, 512, 3361, 322, 264, 917, 4195, 1252, 307, 611, + 11462, 13, 51014], "temperature": 0.0, "avg_logprob": -0.17290538339053882, "compression_ratio": + 1.6837606837606838, "no_speech_prob": 0.03404049575328827}, {"id": 191, "seek": + 248280, "start": 2495.8, "end": 2511.8, "text": " So I would say that''s a cycle + one point very early demo getting this initial feedback and then probably the second + point that we see 
often is when you then had a time of running your experiments + tuning your pipeline kind of the way to production.", "tokens": [51014, 407, 286, + 576, 584, 300, 311, 257, 6586, 472, 935, 588, 2440, 10723, 1242, 341, 5883, 5824, + 293, 550, 1391, 264, 1150, 935, 300, 321, 536, 2049, 307, 562, 291, 550, 632, 257, + 565, 295, 2614, 428, 12050, 15164, 428, 15517, 733, 295, 264, 636, 281, 4265, 13, + 51814], "temperature": 0.0, "avg_logprob": -0.17290538339053882, "compression_ratio": + 1.6837606837606838, "no_speech_prob": 0.03404049575328827}, {"id": 192, "seek": + 251180, "start": 2511.8, "end": 2527.8, "text": " I think then at some point a second + phase where you you just do again some manual evaluation with end user so not completely + relying on on machine learning metrics.", "tokens": [50364, 286, 519, 550, 412, + 512, 935, 257, 1150, 5574, 689, 291, 291, 445, 360, 797, 512, 9688, 13344, 365, + 917, 4195, 370, 406, 2584, 24140, 322, 322, 3479, 2539, 16367, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.25162489754813055, "compression_ratio": 1.3666666666666667, + "no_speech_prob": 0.003285330254584551}, {"id": 193, "seek": 252780, "start": 2527.8, + "end": 2545.8, "text": " Because we think there''s some kind of metric blindness + in the industry sometimes you just kind of get obsessed with your one metric that + you optimize in these experiments and whatever it is just increasing it from experiment + experiment.", "tokens": [50364, 1436, 321, 519, 456, 311, 512, 733, 295, 20678, + 46101, 294, 264, 3518, 2171, 291, 445, 733, 295, 483, 16923, 365, 428, 472, 20678, + 300, 291, 19719, 294, 613, 12050, 293, 2035, 309, 307, 445, 5662, 309, 490, 5120, + 5120, 13, 51264], "temperature": 0.0, "avg_logprob": -0.14726734161376953, "compression_ratio": + 1.5945945945945945, "no_speech_prob": 0.16876477003097534}, {"id": 194, "seek": + 254580, "start": 2545.8, "end": 2557.8, "text": " And you go to production and you + realize wow okay this metric is doesn''t say 
say anything about the user set of + satisfaction that I have in the end.", "tokens": [50364, 400, 291, 352, 281, 4265, + 293, 291, 4325, 6076, 1392, 341, 20678, 307, 1177, 380, 584, 584, 1340, 466, 264, + 4195, 992, 295, 18715, 300, 286, 362, 294, 264, 917, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.21130960052077835, "compression_ratio": 1.59375, "no_speech_prob": + 0.08091925084590912}, {"id": 195, "seek": 254580, "start": 2557.8, "end": 2574.8, + "text": " And there are so many examples from our customers where just handing out + this pipeline showing kind of like search queries and results and then collecting + some easy kind of thumbs up thumbs down feedback and.", "tokens": [50964, 400, 456, + 366, 370, 867, 5110, 490, 527, 4581, 689, 445, 34774, 484, 341, 15517, 4099, 733, + 295, 411, 3164, 24109, 293, 3542, 293, 550, 12510, 512, 1858, 733, 295, 8838, 493, + 8838, 760, 5824, 293, 13, 51814], "temperature": 0.0, "avg_logprob": -0.21130960052077835, + "compression_ratio": 1.59375, "no_speech_prob": 0.08091925084590912}, {"id": 196, + "seek": 257480, "start": 2574.8, "end": 2592.8, "text": " And then trying to correlate + is that really what we also saw in our experiments in our metrics and in the thing + in many cases was that either the pipeline was not yet ready for production and + they were like it''s.", "tokens": [50364, 400, 550, 1382, 281, 48742, 307, 300, + 534, 437, 321, 611, 1866, 294, 527, 12050, 294, 527, 16367, 293, 294, 264, 551, + 294, 867, 3331, 390, 300, 2139, 264, 15517, 390, 406, 1939, 1919, 337, 4265, 293, + 436, 645, 411, 309, 311, 13, 51264], "temperature": 0.0, "avg_logprob": -0.2779214940172561, + "compression_ratio": 1.50354609929078, "no_speech_prob": 0.0039782715030014515}, + {"id": 197, "seek": 259280, "start": 2592.8, "end": 2613.8, "text": " The far less + accurate than we thought or also case where it was the other way around where teams + thought are stuck we will never go beyond and like a for a for one score of 60% + 
we do not here it''s it''s not working.", "tokens": [50364, 440, 1400, 1570, 8559, + 813, 321, 1194, 420, 611, 1389, 689, 309, 390, 264, 661, 636, 926, 689, 5491, 1194, + 366, 5541, 321, 486, 1128, 352, 4399, 293, 411, 257, 337, 257, 337, 472, 6175, 295, + 4060, 4, 321, 360, 406, 510, 309, 311, 309, 311, 406, 1364, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.40870486565355985, "compression_ratio": 1.4758620689655173, + "no_speech_prob": 0.21634607017040253}, {"id": 198, "seek": 261380, "start": 2613.8, + "end": 2623.8, "text": " And they kind of handed out this this predictions or like + get this demo and then people actually don''t like notes like these predictions + are perfectly fine.", "tokens": [50364, 400, 436, 733, 295, 16013, 484, 341, 341, + 21264, 420, 411, 483, 341, 10723, 293, 550, 561, 767, 500, 380, 411, 5570, 411, + 613, 21264, 366, 6239, 2489, 13, 50864], "temperature": 0.0, "avg_logprob": -0.2570039524751551, + "compression_ratio": 1.627659574468085, "no_speech_prob": 0.12585090100765228}, + {"id": 199, "seek": 261380, "start": 2623.8, "end": 2637.8, "text": " And when you + then dig deeper I think it''s often that engineers not look enough into the data + I think I''m just kind of rely on this high level metric.", "tokens": [50864, 400, + 562, 291, 550, 2528, 7731, 286, 519, 309, 311, 2049, 300, 11955, 406, 574, 1547, + 666, 264, 1412, 286, 519, 286, 478, 445, 733, 295, 10687, 322, 341, 1090, 1496, + 20678, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2570039524751551, "compression_ratio": + 1.627659574468085, "no_speech_prob": 0.12585090100765228}, {"id": 200, "seek": 263780, + "start": 2637.8, "end": 2640.8, "text": " And the thing especially nowadays.", "tokens": + [50364, 400, 264, 551, 2318, 13434, 13, 50514], "temperature": 0.0, "avg_logprob": + -0.29621750011778714, "compression_ratio": 1.5029940119760479, "no_speech_prob": + 0.00845425482839346}, {"id": 201, "seek": 263780, "start": 2640.8, "end": 2650.8, + "text": " These 
metrics only tell the part of the story because you''re like for + question answering also for search.", "tokens": [50514, 1981, 16367, 787, 980, 264, + 644, 295, 264, 1657, 570, 291, 434, 411, 337, 1168, 13430, 611, 337, 3164, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.29621750011778714, "compression_ratio": 1.5029940119760479, + "no_speech_prob": 0.00845425482839346}, {"id": 202, "seek": 263780, "start": 2650.8, + "end": 2661.8, "text": " If you have a relation data set and let''s say you always + label the exact answer for certain question or query.", "tokens": [51014, 759, 291, + 362, 257, 9721, 1412, 992, 293, 718, 311, 584, 291, 1009, 7645, 264, 1900, 1867, + 337, 1629, 1168, 420, 14581, 13, 51564], "temperature": 0.0, "avg_logprob": -0.29621750011778714, + "compression_ratio": 1.5029940119760479, "no_speech_prob": 0.00845425482839346}, + {"id": 203, "seek": 266180, "start": 2661.8, "end": 2665.8, "text": " There''s just + so many ways how you know.", "tokens": [50364, 821, 311, 445, 370, 867, 2098, 577, + 291, 458, 13, 50564], "temperature": 0.0, "avg_logprob": -0.23199141515444402, "compression_ratio": + 1.5441176470588236, "no_speech_prob": 0.00421946682035923}, {"id": 204, "seek": + 266180, "start": 2665.8, "end": 2673.8, "text": " Can give a correct answer for + for question that is different to this label so to give an example.", "tokens": + [50564, 1664, 976, 257, 3006, 1867, 337, 337, 1168, 300, 307, 819, 281, 341, 7645, + 370, 281, 976, 364, 1365, 13, 50964], "temperature": 0.0, "avg_logprob": -0.23199141515444402, + "compression_ratio": 1.5441176470588236, "no_speech_prob": 0.00421946682035923}, + {"id": 205, "seek": 266180, "start": 2673.8, "end": 2678.8, "text": " And we have + many customers financially domain so typical question there is.", "tokens": [50964, + 400, 321, 362, 867, 4581, 20469, 9274, 370, 7476, 1168, 456, 307, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.23199141515444402, "compression_ratio": 1.5441176470588236, + 
"no_speech_prob": 0.00421946682035923}, {"id": 206, "seek": 266180, "start": 2678.8, + "end": 2686.8, "text": " How will revenue evolve next year and maybe in your data + set and the evaluation data set you labeled.", "tokens": [51214, 1012, 486, 9324, + 16693, 958, 1064, 293, 1310, 294, 428, 1412, 992, 293, 264, 13344, 1412, 992, 291, + 21335, 13, 51614], "temperature": 0.0, "avg_logprob": -0.23199141515444402, "compression_ratio": + 1.5441176470588236, "no_speech_prob": 0.00421946682035923}, {"id": 207, "seek": + 268680, "start": 2686.8, "end": 2690.8, "text": " It will increase by 12%.", "tokens": + [50364, 467, 486, 3488, 538, 2272, 6856, 50564], "temperature": 0.0, "avg_logprob": + -0.17184983662196568, "compression_ratio": 1.3032786885245902, "no_speech_prob": + 0.00619047274813056}, {"id": 208, "seek": 268680, "start": 2690.8, "end": 2703.8, + "text": " And now at the prediction time your model maybe finds another passage + or generates the answer and says it will significantly increase.", "tokens": [50564, + 400, 586, 412, 264, 17630, 565, 428, 2316, 1310, 10704, 1071, 11497, 420, 23815, + 264, 1867, 293, 1619, 309, 486, 10591, 3488, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.17184983662196568, "compression_ratio": 1.3032786885245902, "no_speech_prob": + 0.00619047274813056}, {"id": 209, "seek": 270380, "start": 2703.8, "end": 2716.8, + "text": " So like there''s no overlap at all from a lexical side still both answers + make sense and and are correct and we can probably debate now which one is more + accurate.", "tokens": [50364, 407, 411, 456, 311, 572, 19959, 412, 439, 490, 257, + 476, 87, 804, 1252, 920, 1293, 6338, 652, 2020, 293, 293, 366, 3006, 293, 321, 393, + 1391, 7958, 586, 597, 472, 307, 544, 8559, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.16805486511765866, "compression_ratio": 1.5061728395061729, "no_speech_prob": + 0.10800694674253464}, {"id": 210, "seek": 270380, "start": 2716.8, "end": 2722.8, + "text": " But in many 
cases there is they basically give the same same answer semantically.", + "tokens": [51014, 583, 294, 867, 3331, 456, 307, 436, 1936, 976, 264, 912, 912, + 1867, 4361, 49505, 13, 51314], "temperature": 0.0, "avg_logprob": -0.16805486511765866, + "compression_ratio": 1.5061728395061729, "no_speech_prob": 0.10800694674253464}, + {"id": 211, "seek": 272280, "start": 2722.8, "end": 2728.8, "text": " But they''re + just formulated very differently and that''s where I would say traditional metrics + fail.", "tokens": [50364, 583, 436, 434, 445, 48936, 588, 7614, 293, 300, 311, 689, + 286, 576, 584, 5164, 16367, 3061, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.23958090812929214, "compression_ratio": 1.4919786096256684, "no_speech_prob": + 0.06622686982154846}, {"id": 212, "seek": 272280, "start": 2728.8, "end": 2741.8, + "text": " So yeah, we need better metrics and we basically did some research work + on that and also part of the haystack where you can do like more semantic answer + similarity or as a metric.", "tokens": [50664, 407, 1338, 11, 321, 643, 1101, 16367, + 293, 321, 1936, 630, 512, 2132, 589, 322, 300, 293, 611, 644, 295, 264, 4842, 372, + 501, 689, 291, 393, 360, 411, 544, 47982, 1867, 32194, 420, 382, 257, 20678, 13, + 51314], "temperature": 0.0, "avg_logprob": -0.23958090812929214, "compression_ratio": + 1.4919786096256684, "no_speech_prob": 0.06622686982154846}, {"id": 213, "seek": + 274180, "start": 2741.8, "end": 2760.8, "text": " But it''s of course also just + I think looking at your data and looking at these predictions and seeing if they''re + really wrong on or if they''re actually okay and maybe it''s some problem of metrics + or you are labeling process where maybe you need to collect more different options + that are okay.", "tokens": [50364, 583, 309, 311, 295, 1164, 611, 445, 286, 519, + 1237, 412, 428, 1412, 293, 1237, 412, 613, 21264, 293, 2577, 498, 436, 434, 534, + 2085, 322, 420, 498, 436, 434, 767, 1392, 293, 1310, 309, 311, 
512, 1154, 295, 16367, + 420, 291, 366, 40244, 1399, 689, 1310, 291, 643, 281, 2500, 544, 819, 3956, 300, + 366, 1392, 13, 51314], "temperature": 0.0, "avg_logprob": -0.16979574388073337, + "compression_ratio": 1.6368715083798884, "no_speech_prob": 0.10116980969905853}, + {"id": 214, "seek": 276080, "start": 2760.8, "end": 2779.8, "text": " Yeah, I totally + agree it''s like it''s it''s a challenge of intersecting user language with whatever + machinery you have to answer that right be it''s part search be dense search doesn''t + matter like users don''t care what they care is that their language is understood + and often enough it''s not.", "tokens": [50364, 865, 11, 286, 3879, 3986, 309, 311, + 411, 309, 311, 309, 311, 257, 3430, 295, 27815, 278, 4195, 2856, 365, 2035, 27302, + 291, 362, 281, 1867, 300, 558, 312, 309, 311, 644, 3164, 312, 18011, 3164, 1177, + 380, 1871, 411, 5022, 500, 380, 1127, 437, 436, 1127, 307, 300, 641, 2856, 307, + 7320, 293, 2049, 1547, 309, 311, 406, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.1133083701133728, "compression_ratio": 1.5879120879120878, "no_speech_prob": + 0.009111515246331692}, {"id": 215, "seek": 277980, "start": 2779.8, "end": 2790.8, + "text": " Especially around things like bird if we go dance bird model doesn''t + understand engagements right there was a research paper on that and that might actually + harm.", "tokens": [50364, 8545, 926, 721, 411, 5255, 498, 321, 352, 4489, 5255, + 2316, 1177, 380, 1223, 44978, 558, 456, 390, 257, 2132, 3035, 322, 300, 293, 300, + 1062, 767, 6491, 13, 50914], "temperature": 0.0, "avg_logprob": -0.13662292843773252, + "compression_ratio": 1.564516129032258, "no_speech_prob": 0.23053868114948273}, + {"id": 216, "seek": 277980, "start": 2790.8, "end": 2800.8, "text": " There was + even a Google example where it''s showing the opposite like you say I don''t want + that but they say yes you actually do.", "tokens": [50914, 821, 390, 754, 257, 3329, + 1365, 689, 309, 311, 4099, 
264, 6182, 411, 291, 584, 286, 500, 380, 528, 300, 457, + 436, 584, 2086, 291, 767, 360, 13, 51414], "temperature": 0.0, "avg_logprob": -0.13662292843773252, + "compression_ratio": 1.564516129032258, "no_speech_prob": 0.23053868114948273}, + {"id": 217, "seek": 280080, "start": 2800.8, "end": 2805.8, "text": " And then take + that medicine which might be harmful.", "tokens": [50364, 400, 550, 747, 300, 7195, + 597, 1062, 312, 19727, 13, 50614], "temperature": 0.0, "avg_logprob": -0.15870278222220285, + "compression_ratio": 1.6319018404907975, "no_speech_prob": 0.10229014605283737}, + {"id": 218, "seek": 280080, "start": 2805.8, "end": 2818.8, "text": " And then the + metrics is essentially what I get from what you just described essentially you might + have offline metrics right let''s say and DCG or precision or recall whatever and + then you have online metrics right.", "tokens": [50614, 400, 550, 264, 16367, 307, + 4476, 437, 286, 483, 490, 437, 291, 445, 7619, 4476, 291, 1062, 362, 21857, 16367, + 558, 718, 311, 584, 293, 9114, 38, 420, 18356, 420, 9901, 2035, 293, 550, 291, 362, + 2950, 16367, 558, 13, 51264], "temperature": 0.0, "avg_logprob": -0.15870278222220285, + "compression_ratio": 1.6319018404907975, "no_speech_prob": 0.10229014605283737}, + {"id": 219, "seek": 281880, "start": 2818.8, "end": 2836.8, "text": " And actually + crafting the online metrics is is also our an art and it''s never ending journey + and just recently I came across one blog post which was shared by a former Netflix + engineer.", "tokens": [50364, 400, 767, 29048, 264, 2950, 16367, 307, 307, 611, + 527, 364, 1523, 293, 309, 311, 1128, 8121, 4671, 293, 445, 3938, 286, 1361, 2108, + 472, 6968, 2183, 597, 390, 5507, 538, 257, 5819, 12778, 11403, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.14310580492019653, "compression_ratio": 1.3214285714285714, + "no_speech_prob": 0.1011769026517868}, {"id": 220, "seek": 283680, "start": 2836.8, + "end": 2861.8, "text": " I will make sure 
to link it in the show notes as well describing + click residual metric right so it''s what is you expected success on on on on that + let''s say segment of your market whatever on the queries versus what you got and + then people still keep trying and trying and trying but just doesn''t deliver so + you could have these as a low hanging fruit to fix your system right and so.", "tokens": + [50364, 286, 486, 652, 988, 281, 2113, 309, 294, 264, 855, 5570, 382, 731, 16141, + 2052, 27980, 20678, 558, 370, 309, 311, 437, 307, 291, 5176, 2245, 322, 322, 322, + 322, 300, 718, 311, 584, 9469, 295, 428, 2142, 2035, 322, 264, 24109, 5717, 437, + 291, 658, 293, 550, 561, 920, 1066, 1382, 293, 1382, 293, 1382, 457, 445, 1177, + 380, 4239, 370, 291, 727, 362, 613, 382, 257, 2295, 8345, 6773, 281, 3191, 428, + 1185, 558, 293, 370, 13, 51614], "temperature": 0.0, "avg_logprob": -0.11219907094197101, + "compression_ratio": 1.6695652173913043, "no_speech_prob": 0.3428858816623688}, + {"id": 221, "seek": 286180, "start": 2862.8, "end": 2885.8, "text": " Do you see + that maybe that''s already happening in haystack or do you see that that might happen + that I as a user might be able to describe my metric let''s say in the form of Python + or JavaScript code whatever plug it into haystack and let it measure what I want + and kind of mimic the online metric in substance.", "tokens": [50414, 1144, 291, + 536, 300, 1310, 300, 311, 1217, 2737, 294, 4842, 372, 501, 420, 360, 291, 536, 300, + 300, 1062, 1051, 300, 286, 382, 257, 4195, 1062, 312, 1075, 281, 6786, 452, 20678, + 718, 311, 584, 294, 264, 1254, 295, 15329, 420, 15778, 3089, 2035, 5452, 309, 666, + 4842, 372, 501, 293, 718, 309, 3481, 437, 286, 528, 293, 733, 295, 31075, 264, 2950, + 20678, 294, 12961, 13, 51564], "temperature": 0.0, "avg_logprob": -0.12431187099880642, + "compression_ratio": 1.61139896373057, "no_speech_prob": 0.015542225912213326}, + {"id": 222, "seek": 288580, "start": 2885.8, "end": 2909.8, "text": " So I 
think + like providing kind of custom metrics yeah yeah yeah yeah and you can can can do + that to some degree already like plugging in basically like a Python function and + forwarding it that''s the one way I think the other is probably on a on a note level + so you can imagine this pipeline they''re providing at some point", "tokens": [50364, + 407, 286, 519, 411, 6530, 733, 295, 2375, 16367, 1338, 1338, 1338, 1338, 293, 291, + 393, 393, 393, 360, 300, 281, 512, 4314, 1217, 411, 42975, 294, 1936, 411, 257, + 15329, 2445, 293, 2128, 278, 309, 300, 311, 264, 472, 636, 286, 519, 264, 661, 307, + 1391, 322, 257, 322, 257, 3637, 1496, 370, 291, 393, 3811, 341, 15517, 436, 434, + 6530, 412, 512, 935, 51564], "temperature": 0.0, "avg_logprob": -0.3537139339723449, + "compression_ratio": 1.7127659574468086, "no_speech_prob": 0.13700120151042938}, + {"id": 223, "seek": 290980, "start": 2909.8, "end": 2936.8, "text": " that you can + have a lot of connections be it answers or documents so you can also easily kind + of add custom notes various have like this this no check now I''ll compare it to + whatever you want or like maybe on an online setting kind of write some locks somewhere + like take some some signals from from from the early query", "tokens": [50364, 300, + 291, 393, 362, 257, 688, 295, 9271, 312, 309, 6338, 420, 8512, 370, 291, 393, 611, + 3612, 733, 295, 909, 2375, 5570, 3683, 362, 411, 341, 341, 572, 1520, 586, 286, + 603, 6794, 309, 281, 2035, 291, 528, 420, 411, 1310, 322, 364, 2950, 3287, 733, + 295, 2464, 512, 20703, 4079, 411, 747, 512, 512, 12354, 490, 490, 490, 264, 2440, + 14581, 51714], "temperature": 0.4, "avg_logprob": -0.6333732604980469, "compression_ratio": + 1.675392670157068, "no_speech_prob": 0.08251698315143585}, {"id": 224, "seek": 293680, + "start": 2936.8, "end": 2966.76, "text": " to an extensive the way you can monitor + it. 
So yeah I think there''s that''s probably one of the kind of next steps where + we see it''s more and more online metrics more and more online experiments I would + say right now where we see big parts of the market I think that''s the more in that + phase of developing experimenting finding the pipeline getting it initially to production + and having", "tokens": [50364, 281, 364, 13246, 264, 636, 291, 393, 6002, 309, 13, + 407, 1338, 286, 519, 456, 311, 300, 311, 1391, 472, 295, 264, 733, 295, 958, 4439, + 689, 321, 536, 309, 311, 544, 293, 544, 2950, 16367, 544, 293, 544, 2950, 12050, + 286, 576, 584, 558, 586, 689, 321, 536, 955, 3166, 295, 264, 2142, 286, 519, 300, + 311, 264, 544, 294, 300, 5574, 295, 6416, 29070, 5006, 264, 15517, 1242, 309, 9105, + 281, 4265, 293, 1419, 51862], "temperature": 0.0, "avg_logprob": -0.26680656626254695, + "compression_ratio": 1.7824074074074074, "no_speech_prob": 0.05011191591620445}, + {"id": 225, "seek": 296680, "start": 2966.8, "end": 2981.8, "text": " their radio + would say smooth journey and having a fast path to production kind of high success + rates for these projects and I would say it''s very right now focused on more.", + "tokens": [50364, 641, 6477, 576, 584, 5508, 4671, 293, 1419, 257, 2370, 3100, 281, + 4265, 733, 295, 1090, 2245, 6846, 337, 613, 4455, 293, 286, 576, 584, 309, 311, + 588, 558, 586, 5178, 322, 544, 13, 51114], "temperature": 0.0, "avg_logprob": -0.34129288322047185, + "compression_ratio": 1.376, "no_speech_prob": 0.0011878025252372026}, {"id": 226, + "seek": 298180, "start": 2981.8, "end": 3009.8, "text": " But yeah I would say further + down the road if you really think about the whole and add up life cycle I think + on the monitoring side there''s this logic and one online metrics but also then + things like data drift my queries actually shift into a different direction to these + things a lot of our query profiles and we think like what I actually these use case + how how are how can we 
describe this", "tokens": [50364, 583, 1338, 286, 576, 584, + 3052, 760, 264, 3060, 498, 291, 534, 519, 466, 264, 1379, 293, 909, 493, 993, 6586, + 286, 519, 322, 264, 11028, 1252, 456, 311, 341, 9952, 293, 472, 2950, 16367, 457, + 611, 550, 721, 411, 1412, 19699, 452, 24109, 767, 5513, 666, 257, 819, 3513, 281, + 613, 721, 257, 688, 295, 527, 14581, 23693, 293, 321, 519, 411, 437, 286, 767, 613, + 764, 1389, 577, 577, 366, 577, 393, 321, 6786, 341, 51764], "temperature": 0.0, + "avg_logprob": -0.4757468612105758, "compression_ratio": 1.7161572052401746, "no_speech_prob": + 0.7357689142227173}, {"id": 227, "seek": 300980, "start": 3009.8, "end": 3032.8, + "text": " query distribution and this can be on a formal level like say again questions + for those keyboard queries but could be also on a topic level to understand what + is a profile at point a I mean we can match it with certain pipelines but also is + that kind of changing over time.", "tokens": [50364, 14581, 7316, 293, 341, 393, + 312, 322, 257, 9860, 1496, 411, 584, 797, 1651, 337, 729, 10186, 24109, 457, 727, + 312, 611, 322, 257, 4829, 1496, 281, 1223, 437, 307, 257, 7964, 412, 935, 257, 286, + 914, 321, 393, 2995, 309, 365, 1629, 40168, 457, 611, 307, 300, 733, 295, 4473, + 670, 565, 13, 51514], "temperature": 0.0, "avg_logprob": -0.31107313879605, "compression_ratio": + 1.5423728813559323, "no_speech_prob": 0.020919276401400566}, {"id": 228, "seek": + 303280, "start": 3033.4, "end": 3057.8, "text": " Yeah yeah you you somewhat anticipate + like expected my question or sort of partly answered my question and my next question + about where do you see the biggest effort in haystack and and deep set cloud going + let''s say beyond ML ops you know tightening the knobs and making sure that this + flies and works correctly.", "tokens": [50394, 865, 1338, 291, 291, 8344, 21685, + 411, 5176, 452, 1168, 420, 1333, 295, 17031, 10103, 452, 1168, 293, 452, 958, 1168, + 466, 689, 360, 291, 536, 264, 3880, 4630, 294, 
4842, 372, 501, 293, 293, 2452, 992, + 4588, 516, 718, 311, 584, 4399, 21601, 44663, 291, 458, 42217, 264, 46999, 293, + 1455, 988, 300, 341, 17414, 293, 1985, 8944, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.2557430565357208, "compression_ratio": 1.6051282051282052, "no_speech_prob": + 0.07559827715158463}, {"id": 229, "seek": 305780, "start": 3057.8, "end": 3087.6000000000004, + "text": " More towards I know you guys also hiring a product manager so sort of + like more on vision side and connected to that if you will what do you think is + missing on the market today still maybe in understanding maybe in perception level + maybe in tooling you already alluded also to things like metric blindness right + and and and maybe when users get stuck and thinking that this is a wrong system + but actually it''s not they just didn''t look the right way and things like that.", + "tokens": [50394, 5048, 3030, 286, 458, 291, 1074, 611, 15335, 257, 1674, 6598, + 370, 1333, 295, 411, 544, 322, 5201, 1252, 293, 4582, 281, 300, 498, 291, 486, 437, + 360, 291, 519, 307, 5361, 322, 264, 2142, 965, 920, 1310, 294, 3701, 1310, 294, + 12860, 1496, 1310, 294, 46593, 291, 1217, 33919, 611, 281, 721, 411, 20678, 46101, + 558, 293, 293, 293, 1310, 562, 5022, 483, 5541, 293, 1953, 300, 341, 307, 257, 2085, + 1185, 457, 767, 309, 311, 406, 436, 445, 994, 380, 574, 264, 558, 636, 293, 721, + 411, 300, 13, 51854], "temperature": 0.0, "avg_logprob": -0.13475291272427173, "compression_ratio": + 1.7481481481481482, "no_speech_prob": 0.011465130373835564}, {"id": 230, "seek": + 308780, "start": 3088.4, "end": 3109.8, "text": " Yeah and there''s I think the + ton of works to left I think we are we already talked about it I think things progressed + a lot in the last years it''s crazy to see but still I feel it''s with the in the + middle of it or just starting and so much more work and things you can improve and + do better.", "tokens": [50394, 865, 293, 456, 311, 286, 519, 264, 2952, 
295, 1985, + 281, 1411, 286, 519, 321, 366, 321, 1217, 2825, 466, 309, 286, 519, 721, 36789, + 257, 688, 294, 264, 1036, 924, 309, 311, 3219, 281, 536, 457, 920, 286, 841, 309, + 311, 365, 264, 294, 264, 2808, 295, 309, 420, 445, 2891, 293, 370, 709, 544, 589, + 293, 721, 291, 393, 3470, 293, 360, 1101, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.2778404780796596, "compression_ratio": 1.6166666666666667, "no_speech_prob": + 0.011329004541039467}, {"id": 231, "seek": 310980, "start": 3110.8, "end": 3129.8, + "text": " Yeah I would say for us right now there''s like a lot of different directions + but I think especially on the on the open source side we want to improve the developer + experience also like simplifying the first steps within haystack I think it can + be still overwhelming and I really want to make sure that", "tokens": [50414, 865, + 286, 576, 584, 337, 505, 558, 586, 456, 311, 411, 257, 688, 295, 819, 11095, 457, + 286, 519, 2318, 322, 264, 322, 264, 1269, 4009, 1252, 321, 528, 281, 3470, 264, + 10754, 1752, 611, 411, 6883, 5489, 264, 700, 4439, 1951, 4842, 372, 501, 286, 519, + 309, 393, 312, 920, 13373, 293, 286, 534, 528, 281, 652, 988, 300, 51364], "temperature": + 0.0, "avg_logprob": -0.18127349019050598, "compression_ratio": 1.5759162303664922, + "no_speech_prob": 0.13318483531475067}, {"id": 232, "seek": 312980, "start": 3129.8, + "end": 3145.8, "text": " get as many people to the first aha moment like using all + your own data asking a few questions comparing sparse to dense retrieval and really + experiencing this first hand I think this is one of the things we work on.", "tokens": + [50364, 483, 382, 867, 561, 281, 264, 700, 47340, 1623, 411, 1228, 439, 428, 1065, + 1412, 3365, 257, 1326, 1651, 15763, 637, 11668, 281, 18011, 19817, 3337, 293, 534, + 11139, 341, 700, 1011, 286, 519, 341, 307, 472, 295, 264, 721, 321, 589, 322, 13, + 51164], "temperature": 0.0, "avg_logprob": -0.3492608865102132, "compression_ratio": + 
1.4896551724137932, "no_speech_prob": 0.09030165523290634}, {"id": 233, "seek": + 314580, "start": 3145.8, "end": 3174.8, "text": " Then a lot around multi model + so we recently added support for tables within haystack so I think one interesting + direction right now that you can really query into these kind of tables in your + documents but maybe also further down the road into your SQL database as another + data source and then of course everything around images videos audio and it''s also + interesting for us I think for our customers.", "tokens": [50364, 1396, 257, 688, + 926, 4825, 2316, 370, 321, 3938, 3869, 1406, 337, 8020, 1951, 4842, 372, 501, 370, + 286, 519, 472, 1880, 3513, 558, 586, 300, 291, 393, 534, 14581, 666, 613, 733, 295, + 8020, 294, 428, 8512, 457, 1310, 611, 3052, 760, 264, 3060, 666, 428, 19200, 8149, + 382, 1071, 1412, 4009, 293, 550, 295, 1164, 1203, 926, 5267, 2145, 6278, 293, 309, + 311, 611, 1880, 337, 505, 286, 519, 337, 527, 4581, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.17817988762488732, "compression_ratio": 1.654320987654321, "no_speech_prob": + 0.40422317385673523}, {"id": 234, "seek": 317580, "start": 3175.8, "end": 3185.8, + "text": " Because it''s typically less important than kind of tax in tables but + still I think it''s interesting interesting options that you can do there.", "tokens": + [50364, 1436, 309, 311, 5850, 1570, 1021, 813, 733, 295, 3366, 294, 8020, 457, 920, + 286, 519, 309, 311, 1880, 1880, 3956, 300, 291, 393, 360, 456, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.23722707489390432, "compression_ratio": 1.646808510638298, + "no_speech_prob": 0.0018835271475836635}, {"id": 235, "seek": 317580, "start": 3185.8, + "end": 3204.8, "text": " So yeah I think that''s like a lot on on open source side + and deep set cloud are we recently launched basically the experiments module that + was one big step forward there and now it''s a lot around giving there also guidance + and suggestions like.", 
"tokens": [50864, 407, 1338, 286, 519, 300, 311, 411, 257, + 688, 322, 322, 1269, 4009, 1252, 293, 2452, 992, 4588, 366, 321, 3938, 8730, 1936, + 264, 12050, 10088, 300, 390, 472, 955, 1823, 2128, 456, 293, 586, 309, 311, 257, + 688, 926, 2902, 456, 611, 10056, 293, 13396, 411, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.23722707489390432, "compression_ratio": 1.646808510638298, "no_speech_prob": + 0.0018835271475836635}, {"id": 236, "seek": 320480, "start": 3204.8, "end": 3233.8, + "text": " Like for example now I have the experiment I ran an experiment I''ve like + a lot of these metrics I have a lot of data that was somehow generated but as it''s + not a single model anymore it''s like a pipeline I really want to understand as + a data scientist okay like where should I not focus on or like where what''s probably + a good way forward to improve this pipeline is a rather the retrieval problem is + a rather.", "tokens": [50364, 1743, 337, 1365, 586, 286, 362, 264, 5120, 286, 5872, + 364, 5120, 286, 600, 411, 257, 688, 295, 613, 16367, 286, 362, 257, 688, 295, 1412, + 300, 390, 6063, 10833, 457, 382, 309, 311, 406, 257, 2167, 2316, 3602, 309, 311, + 411, 257, 15517, 286, 534, 528, 281, 1223, 382, 257, 1412, 12662, 1392, 411, 689, + 820, 286, 406, 1879, 322, 420, 411, 689, 437, 311, 1391, 257, 665, 636, 2128, 281, + 3470, 341, 15517, 307, 257, 2831, 264, 19817, 3337, 1154, 307, 257, 2831, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.23560462527804904, "compression_ratio": 1.7446808510638299, + "no_speech_prob": 0.010290063917636871}, {"id": 237, "seek": 323380, "start": 3233.8, + "end": 3246.8, "text": " Another note that I should improve is maybe something wrong + with my evaluation data set should I go back to labeling and like giving these kind + of at least making these kind of analysis easier.", "tokens": [50364, 3996, 3637, + 300, 286, 820, 3470, 307, 1310, 746, 2085, 365, 452, 13344, 1412, 992, 820, 286, + 352, 646, 281, 40244, 293, 411, 2902, 
613, 733, 295, 412, 1935, 1455, 613, 733, + 295, 5215, 3571, 13, 51014], "temperature": 0.0, "avg_logprob": -0.2450826911516087, + "compression_ratio": 1.6961538461538461, "no_speech_prob": 0.0007211799384094775}, + {"id": 238, "seek": 323380, "start": 3246.8, "end": 3262.8, "text": " It''s something + that we work on right now and then I think further down the road that will be for + us a lot expanding in this world ML of life cycles what we talk about right monitoring + without just making it simpler to integrate it at both ends so.", "tokens": [51014, + 467, 311, 746, 300, 321, 589, 322, 558, 586, 293, 550, 286, 519, 3052, 760, 264, + 3060, 300, 486, 312, 337, 505, 257, 688, 14702, 294, 341, 1002, 21601, 295, 993, + 17796, 437, 321, 751, 466, 558, 11028, 1553, 445, 1455, 309, 18587, 281, 13365, + 309, 412, 1293, 5314, 370, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2450826911516087, + "compression_ratio": 1.6961538461538461, "no_speech_prob": 0.0007211799384094775}, + {"id": 239, "seek": 326280, "start": 3262.8, "end": 3279.8, "text": " Basically + on the one side ingesting your source data more easily and thinking it more easily + into into deep set cloud so that you can say I know either maybe I have a wiki system + that I use maybe I don''t know I use notion or maybe I use.", "tokens": [50364, + 8537, 322, 264, 472, 1252, 3957, 8714, 428, 4009, 1412, 544, 3612, 293, 1953, 309, + 544, 3612, 666, 666, 2452, 992, 4588, 370, 300, 291, 393, 584, 286, 458, 2139, 1310, + 286, 362, 257, 261, 9850, 1185, 300, 286, 764, 1310, 286, 500, 380, 458, 286, 764, + 10710, 420, 1310, 286, 764, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1941745824981154, + "compression_ratio": 1.6013513513513513, "no_speech_prob": 0.020814063027501106}, + {"id": 240, "seek": 327980, "start": 3279.8, "end": 3304.8, "text": " So I think + it''s just a little bit more conflict or I have a not another elastic such class + that we already my my documents that I''m interested in so we 
having there kind + of smooth connectors that you can can import your data and directly work on it and + then on the other end the API now how can I easily get now a kind of search bar + or search functionality in my final product.", "tokens": [50364, 407, 286, 519, + 309, 311, 445, 257, 707, 857, 544, 6596, 420, 286, 362, 257, 406, 1071, 17115, 1270, + 1508, 300, 321, 1217, 452, 452, 8512, 300, 286, 478, 3102, 294, 370, 321, 1419, + 456, 733, 295, 5508, 31865, 300, 291, 393, 393, 974, 428, 1412, 293, 3838, 589, + 322, 309, 293, 550, 322, 264, 661, 917, 264, 9362, 586, 577, 393, 286, 3612, 483, + 586, 257, 733, 295, 3164, 2159, 420, 3164, 14980, 294, 452, 2572, 1674, 13, 51614], + "temperature": 0.4, "avg_logprob": -0.5478533089879047, "compression_ratio": 1.6391304347826088, + "no_speech_prob": 0.3419419825077057}, {"id": 241, "seek": 330480, "start": 3304.8, + "end": 3330.8, "text": " So there''s a lot of things and then everything around + fine tuning few short learning with large language models that something we are + quite excited about because I think mentioned I think right now there''s already + made a big step forward that you there are a lot of use cases where you don''t need + to train at all anymore and then maybe that''s a misperception that you also see + in the market.", "tokens": [50364, 407, 456, 311, 257, 688, 295, 721, 293, 550, + 1203, 926, 2489, 15164, 1326, 2099, 2539, 365, 2416, 2856, 5245, 300, 746, 321, + 366, 1596, 2919, 466, 570, 286, 519, 2835, 286, 519, 558, 586, 456, 311, 1217, 1027, + 257, 955, 1823, 2128, 300, 291, 456, 366, 257, 688, 295, 764, 3331, 689, 291, 500, + 380, 643, 281, 3847, 412, 439, 3602, 293, 550, 1310, 300, 311, 257, 3346, 610, 7311, + 300, 291, 611, 536, 294, 264, 2142, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.2031043868467032, "compression_ratio": 1.7074235807860263, "no_speech_prob": + 0.0442170575261116}, {"id": 242, "seek": 333080, "start": 3330.8, "end": 3359.8, + "text": " I think to the typical 
users come to us and say like oh yeah this use + case how can I train and then we usually ask did you really need to train your own + model like have you tried this and that take these kind of combinations and kind + of models that are out there certain sentence transformers certain pre trained QA + models or rank models and that no no but like our use cases are different and that + won''t work and.", "tokens": [50364, 286, 519, 281, 264, 7476, 5022, 808, 281, 505, + 293, 584, 411, 1954, 1338, 341, 764, 1389, 577, 393, 286, 3847, 293, 550, 321, 2673, + 1029, 630, 291, 534, 643, 281, 3847, 428, 1065, 2316, 411, 362, 291, 3031, 341, + 293, 300, 747, 613, 733, 295, 21267, 293, 733, 295, 5245, 300, 366, 484, 456, 1629, + 8174, 4088, 433, 1629, 659, 8895, 1249, 32, 5245, 420, 6181, 5245, 293, 300, 572, + 572, 457, 411, 527, 764, 3331, 366, 819, 293, 300, 1582, 380, 589, 293, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.3410816616482205, "compression_ratio": 1.8157894736842106, + "no_speech_prob": 0.02054857648909092}, {"id": 243, "seek": 335980, "start": 3359.8, + "end": 3381.8, "text": " In many cases it does or at least they''re surprised how + good it is already and maybe it''s enough to get started on it and so I think that''s + one misperception still I think there are then also these cases to be fair where + fine tuning still helps right and where you really care about if you.", "tokens": + [50364, 682, 867, 3331, 309, 775, 420, 412, 1935, 436, 434, 6100, 577, 665, 309, + 307, 1217, 293, 1310, 309, 311, 1547, 281, 483, 1409, 322, 309, 293, 370, 286, 519, + 300, 311, 472, 3346, 610, 7311, 920, 286, 519, 456, 366, 550, 611, 613, 3331, 281, + 312, 3143, 689, 2489, 15164, 920, 3665, 558, 293, 689, 291, 534, 1127, 466, 498, + 291, 13, 51464], "temperature": 0.0, "avg_logprob": -0.14951805570232335, "compression_ratio": + 1.5846994535519126, "no_speech_prob": 0.032929182052612305}, {"id": 244, "seek": + 338180, "start": 3381.8, "end": 3410.8, "text": " So you can 
go to percentage points + better accuracy and where you then go down and say let''s now start labeling let''s + collect either like we in this manual labeling process or maybe from some more noisy + maybe real time like a production data where you saw what people search what they + clicked how can we use that maybe for training that''s something where we see big + potential probably for next year.", "tokens": [50364, 407, 291, 393, 352, 281, 9668, + 2793, 1101, 14170, 293, 689, 291, 550, 352, 760, 293, 584, 718, 311, 586, 722, 40244, + 718, 311, 2500, 2139, 411, 321, 294, 341, 9688, 40244, 1399, 420, 1310, 490, 512, + 544, 24518, 1310, 957, 565, 411, 257, 4265, 1412, 689, 291, 1866, 437, 561, 3164, + 437, 436, 23370, 577, 393, 321, 764, 300, 1310, 337, 3097, 300, 311, 746, 689, 321, + 536, 955, 3995, 1391, 337, 958, 1064, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.2801186525368992, "compression_ratio": 1.7008547008547008, "no_speech_prob": + 0.3999885022640228}, {"id": 245, "seek": 341180, "start": 3411.8, "end": 3431.8, + "text": " And basically want to simplify this domain adaptation to have less manual + effort and basically more automated way of of training it and that I think was also + that in the direction of maybe large language models.", "tokens": [50364, 400, 1936, + 528, 281, 20460, 341, 9274, 21549, 281, 362, 1570, 9688, 4630, 293, 1936, 544, 18473, + 636, 295, 295, 3097, 309, 293, 300, 286, 519, 390, 611, 300, 294, 264, 3513, 295, + 1310, 2416, 2856, 5245, 13, 51364], "temperature": 0.0, "avg_logprob": -0.2421353885105678, + "compression_ratio": 1.4652777777777777, "no_speech_prob": 0.00703832320868969}, + {"id": 246, "seek": 343180, "start": 3431.8, "end": 3456.8, "text": " Yeah sounds + cool and if we go in even in look even further into the future would say I don''t + know 5 10 years out do you think that haystack at some point may even start suggesting + the user what to try you know if you go and set up a key PI for yourself right you + 
end goal and then through the chain and that I see click graph it looks like finds + a weak node and say yes something is going on there.", "tokens": [50364, 865, 3263, + 1627, 293, 498, 321, 352, 294, 754, 294, 574, 754, 3052, 666, 264, 2027, 576, 584, + 286, 500, 380, 458, 1025, 1266, 924, 484, 360, 291, 519, 300, 4842, 372, 501, 412, + 512, 935, 815, 754, 722, 18094, 264, 4195, 437, 281, 853, 291, 458, 498, 291, 352, + 293, 992, 493, 257, 2141, 27176, 337, 1803, 558, 291, 917, 3387, 293, 550, 807, + 264, 5021, 293, 300, 286, 536, 2052, 4295, 309, 1542, 411, 10704, 257, 5336, 9984, + 293, 584, 2086, 746, 307, 516, 322, 456, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.24444268339423722, "compression_ratio": 1.6446280991735538, "no_speech_prob": + 0.09969537705183029}, {"id": 247, "seek": 345680, "start": 3456.8, "end": 3471.8, + "text": " Then it would actually suggest you also to try some other model do you + think it''s possible or do you think it''s a wrong direction at all like to you + drive and leave this to the creativity of your users.", "tokens": [50364, 1396, + 309, 576, 767, 3402, 291, 611, 281, 853, 512, 661, 2316, 360, 291, 519, 309, 311, + 1944, 420, 360, 291, 519, 309, 311, 257, 2085, 3513, 412, 439, 411, 281, 291, 3332, + 293, 1856, 341, 281, 264, 12915, 295, 428, 5022, 13, 51114], "temperature": 0.0, + "avg_logprob": -0.16148829967417616, "compression_ratio": 1.4744525547445255, "no_speech_prob": + 0.009724569506943226}, {"id": 248, "seek": 347180, "start": 3471.8, "end": 3485.8, + "text": " I think it''s a combination of both so I definitely think that helps to + accelerate and certain parts of your work so especially I think suggesting what + experiment to run next or what it could be something you can try.", "tokens": [50364, + 286, 519, 309, 311, 257, 6562, 295, 1293, 370, 286, 2138, 519, 300, 3665, 281, 21341, + 293, 1629, 3166, 295, 428, 589, 370, 2318, 286, 519, 18094, 437, 5120, 281, 1190, + 958, 420, 437, 309, 727, 312, 746, 291, 
393, 853, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.2008727775825249, "compression_ratio": 1.598360655737705, "no_speech_prob": + 0.0762307271361351}, {"id": 249, "seek": 347180, "start": 3485.8, "end": 3496.8, + "text": " So I''m a big fan of that and I think we don''t need to go probably like + 5 or 10 years down the road that is happening already sooner so I can and haystack + and deep set cloud.", "tokens": [51064, 407, 286, 478, 257, 955, 3429, 295, 300, + 293, 286, 519, 321, 500, 380, 643, 281, 352, 1391, 411, 1025, 420, 1266, 924, 760, + 264, 3060, 300, 307, 2737, 1217, 15324, 370, 286, 393, 293, 4842, 372, 501, 293, + 2452, 992, 4588, 13, 51614], "temperature": 0.0, "avg_logprob": -0.2008727775825249, + "compression_ratio": 1.598360655737705, "no_speech_prob": 0.0762307271361351}, {"id": + 250, "seek": 349680, "start": 3496.8, "end": 3525.8, "text": " And maybe just like + one thing we are so we have our company something called Hockey Friday so it''s + like one Friday every month where every person the company can work on whatever + they want so really hacking on crazy ideas trying stuff out and I know that this + Friday people are working on a generative model where you basically give in you + describe what you want like what kind of pipeline so you can type in.", "tokens": + [50364, 400, 1310, 445, 411, 472, 551, 321, 366, 370, 321, 362, 527, 2237, 746, + 1219, 389, 46164, 6984, 370, 309, 311, 411, 472, 6984, 633, 1618, 689, 633, 954, + 264, 2237, 393, 589, 322, 2035, 436, 528, 370, 534, 31422, 322, 3219, 3487, 1382, + 1507, 484, 293, 286, 458, 300, 341, 6984, 561, 366, 1364, 322, 257, 1337, 1166, + 2316, 689, 291, 1936, 976, 294, 291, 6786, 437, 291, 528, 411, 437, 733, 295, 15517, + 370, 291, 393, 2010, 294, 13, 51814], "temperature": 0.0, "avg_logprob": -0.23581998488482306, + "compression_ratio": 1.718487394957983, "no_speech_prob": 0.14596544206142426}, + {"id": 251, "seek": 352580, "start": 3525.8, "end": 3551.8, "text": " And let''s + 
say I want documents such pipeline that works on legal data that is very fast something + like that and the output is basically a YAML file that describes this haystack pipeline + which you can then easily kind of load and Python try out and also write a load + and then deep set cloud and run it there.", "tokens": [50364, 400, 718, 311, 584, + 286, 528, 8512, 1270, 15517, 300, 1985, 322, 5089, 1412, 300, 307, 588, 2370, 746, + 411, 300, 293, 264, 5598, 307, 1936, 257, 398, 2865, 43, 3991, 300, 15626, 341, + 4842, 372, 501, 15517, 597, 291, 393, 550, 3612, 733, 295, 3677, 293, 15329, 853, + 484, 293, 611, 2464, 257, 3677, 293, 550, 2452, 992, 4588, 293, 1190, 309, 456, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.15587123926135077, "compression_ratio": + 1.5505050505050506, "no_speech_prob": 0.00043805621680803597}, {"id": 252, "seek": + 355180, "start": 3551.8, "end": 3571.8, "text": " So that''s actually we are experimenting + with right now and and of course some time for the down the road I could see that + you can take also like signals from from what we know from what worked on certain + domains and and basically fuse that in into this maybe a generative process.", "tokens": + [50364, 407, 300, 311, 767, 321, 366, 29070, 365, 558, 586, 293, 293, 295, 1164, + 512, 565, 337, 264, 760, 264, 3060, 286, 727, 536, 300, 291, 393, 747, 611, 411, + 12354, 490, 490, 437, 321, 458, 490, 437, 2732, 322, 1629, 25514, 293, 293, 1936, + 31328, 300, 294, 666, 341, 1310, 257, 1337, 1166, 1399, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.2281182289123535, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.018495267257094383}, {"id": 253, "seek": 357180, "start": 3571.8, + "end": 3581.8, "text": " Yeah, it sounds cool actually reminded me of the time when + I was doing my PhD something like 12 years ago a bit more.", "tokens": [50364, 865, + 11, 309, 3263, 1627, 767, 15920, 385, 295, 264, 565, 562, 286, 390, 884, 452, 14476, + 746, 411, 2272, 924, 
2057, 257, 857, 544, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.13349807780721915, "compression_ratio": 1.6041666666666667, "no_speech_prob": + 0.046169932931661606}, {"id": 254, "seek": 357180, "start": 3581.8, "end": 3600.8, + "text": " I had a collaborator who wrote a paper on taking taking the user text + and converging that into C++ code and the use case I don''t remember exactly all + the details of the use case but I remember it was some way in the airport so like + they do a lot of this routine work.", "tokens": [50864, 286, 632, 257, 5091, 1639, + 567, 4114, 257, 3035, 322, 1940, 1940, 264, 4195, 2487, 293, 9652, 3249, 300, 666, + 383, 25472, 3089, 293, 264, 764, 1389, 286, 500, 380, 1604, 2293, 439, 264, 4365, + 295, 264, 764, 1389, 457, 286, 1604, 309, 390, 512, 636, 294, 264, 10155, 370, 411, + 436, 360, 257, 688, 295, 341, 9927, 589, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.13349807780721915, "compression_ratio": 1.6041666666666667, "no_speech_prob": + 0.046169932931661606}, {"id": 255, "seek": 360080, "start": 3600.8, "end": 3615.8, + "text": " And instead of repeating it you could actually build a smarter system + right so you think this could be the future of haystack or maybe the industry at + large.", "tokens": [50364, 400, 2602, 295, 18617, 309, 291, 727, 767, 1322, 257, + 20294, 1185, 558, 370, 291, 519, 341, 727, 312, 264, 2027, 295, 4842, 372, 501, + 420, 1310, 264, 3518, 412, 2416, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.10084089967939588, "compression_ratio": 1.3652173913043477, "no_speech_prob": + 0.01292424462735653}, {"id": 256, "seek": 361580, "start": 3616.8, "end": 3633.8, + "text": " Yeah, at least I think it''s like one if you want element that helps accelerating + right so if you also if you look at the core pilot right now I like it a lot for + calling and I''m still in many cases surprised what what co pilot suggests you''re + as a as a note on the code level.", "tokens": [50414, 865, 11, 412, 1935, 
286, 519, + 309, 311, 411, 472, 498, 291, 528, 4478, 300, 3665, 34391, 558, 370, 498, 291, 611, + 498, 291, 574, 412, 264, 4965, 9691, 558, 586, 286, 411, 309, 257, 688, 337, 5141, + 293, 286, 478, 920, 294, 867, 3331, 6100, 437, 437, 598, 9691, 13409, 291, 434, + 382, 257, 382, 257, 3637, 322, 264, 3089, 1496, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.29271122946668027, "compression_ratio": 1.5771428571428572, "no_speech_prob": + 0.10706516355276108}, {"id": 257, "seek": 363380, "start": 3633.8, "end": 3647.8, + "text": " And I think something similar as also positive on the machine learning + side and you are not only generates a correct code but really something that fits + for for use case and to describe it.", "tokens": [50364, 400, 286, 519, 746, 2531, + 382, 611, 3353, 322, 264, 3479, 2539, 1252, 293, 291, 366, 406, 787, 23815, 257, + 3006, 3089, 457, 534, 746, 300, 9001, 337, 337, 764, 1389, 293, 281, 6786, 309, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.26022115434919085, "compression_ratio": + 1.6210526315789473, "no_speech_prob": 0.009107493795454502}, {"id": 258, "seek": + 363380, "start": 3647.8, "end": 3654.8, "text": " I mean I think it''s like if you + think about the big up picture I think it''s one piece that helps you in your workflow.", + "tokens": [51064, 286, 914, 286, 519, 309, 311, 411, 498, 291, 519, 466, 264, 955, + 493, 3036, 286, 519, 309, 311, 472, 2522, 300, 3665, 291, 294, 428, 20993, 13, 51414], + "temperature": 0.0, "avg_logprob": -0.26022115434919085, "compression_ratio": 1.6210526315789473, + "no_speech_prob": 0.009107493795454502}, {"id": 259, "seek": 365480, "start": 3654.8, + "end": 3664.8, "text": " I think it''s there''s still like many many other pieces + that we need to get right and that won''t be that''s it a holy grail I think at + the end.", "tokens": [50364, 286, 519, 309, 311, 456, 311, 920, 411, 867, 867, 661, + 3755, 300, 321, 643, 281, 483, 558, 293, 300, 1582, 380, 312, 300, 311, 309, 257, + 
10622, 1295, 388, 286, 519, 412, 264, 917, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.22905368390290634, "compression_ratio": 1.7212389380530972, "no_speech_prob": + 0.022865615785121918}, {"id": 260, "seek": 365480, "start": 3664.8, "end": 3683.8, + "text": " What I really believe in is that you need a framework or a platform where + we want to call it where you can easily compare things on your data and and I think + this helps a lot then and creating transparency in the market creating also like + kind of.", "tokens": [50864, 708, 286, 534, 1697, 294, 307, 300, 291, 643, 257, + 8388, 420, 257, 3663, 689, 321, 528, 281, 818, 309, 689, 291, 393, 3612, 6794, 721, + 322, 428, 1412, 293, 293, 286, 519, 341, 3665, 257, 688, 550, 293, 4084, 17131, + 294, 264, 2142, 4084, 611, 411, 733, 295, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.22905368390290634, "compression_ratio": 1.7212389380530972, "no_speech_prob": + 0.022865615785121918}, {"id": 261, "seek": 368380, "start": 3683.8, "end": 3712.8, + "text": " Trust for your own use case that you are not basically doing a technology + choice before you actually started working on your use case and that I think holds + for vector databases where maybe today this is a good choice for you but maybe I + know one year down the road maybe you want to switch this I think this market is + so early that it''s very hard to place a batch right now on one of these technologies + and similarly I think this is on the model.", "tokens": [50364, 11580, 337, 428, + 1065, 764, 1389, 300, 291, 366, 406, 1936, 884, 257, 2899, 3922, 949, 291, 767, + 1409, 1364, 322, 428, 764, 1389, 293, 300, 286, 519, 9190, 337, 8062, 22380, 689, + 1310, 965, 341, 307, 257, 665, 3922, 337, 291, 457, 1310, 286, 458, 472, 1064, 760, + 264, 3060, 1310, 291, 528, 281, 3679, 341, 286, 519, 341, 2142, 307, 370, 2440, + 300, 309, 311, 588, 1152, 281, 1081, 257, 15245, 558, 586, 322, 472, 295, 613, 7943, + 293, 14138, 286, 519, 341, 307, 322, 264, 
2316, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.19237743540013091, "compression_ratio": 1.8024193548387097, "no_speech_prob": + 0.013304518535733223}, {"id": 262, "seek": 371280, "start": 3712.8, "end": 3741.8, + "text": " Modeling side there''s like so so much crazy bus around large language + models and can firstly see the trend going there but it''s also I think very important + to to understand if that''s really useful for your use case now how it compares + to much smaller models and and this should be easy right this shouldn''t this shouldn''t + be big part of your project it should be rather.", "tokens": [50364, 6583, 11031, + 1252, 456, 311, 411, 370, 370, 709, 3219, 1255, 926, 2416, 2856, 5245, 293, 393, + 27376, 536, 264, 6028, 516, 456, 457, 309, 311, 611, 286, 519, 588, 1021, 281, 281, + 1223, 498, 300, 311, 534, 4420, 337, 428, 764, 1389, 586, 577, 309, 38334, 281, + 709, 4356, 5245, 293, 293, 341, 820, 312, 1858, 558, 341, 4659, 380, 341, 4659, + 380, 312, 955, 644, 295, 428, 1716, 309, 820, 312, 2831, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.23783538914933988, "compression_ratio": 1.7289719626168225, + "no_speech_prob": 0.004387346561998129}, {"id": 263, "seek": 374180, "start": 3741.8, + "end": 3756.8, "text": " You were trying to think about options you want to try + maybe getting some suggestions as well there but this would be I think this is a + human creativity part as well and then the the actual.", "tokens": [50364, 509, + 645, 1382, 281, 519, 466, 3956, 291, 528, 281, 853, 1310, 1242, 512, 13396, 382, + 731, 456, 457, 341, 576, 312, 286, 519, 341, 307, 257, 1952, 12915, 644, 382, 731, + 293, 550, 264, 264, 3539, 13, 51114], "temperature": 0.0, "avg_logprob": -0.18044514723227056, + "compression_ratio": 1.6633165829145728, "no_speech_prob": 0.02033403143286705}, + {"id": 264, "seek": 374180, "start": 3756.8, "end": 3767.8, "text": " So a swapping + of components and comparing their making them comparable I think that''s 
nothing + where you should spend time as a developer on.", "tokens": [51114, 407, 257, 1693, + 10534, 295, 6677, 293, 15763, 641, 1455, 552, 25323, 286, 519, 300, 311, 1825, 689, + 291, 820, 3496, 565, 382, 257, 10754, 322, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.18044514723227056, "compression_ratio": 1.6633165829145728, "no_speech_prob": + 0.02033403143286705}, {"id": 265, "seek": 376780, "start": 3767.8, "end": 3796.8, + "text": " And like connected to the question about future maybe causing off of on + that we recently built with my colleague are netalman a multi model and multilingual + search demo right where we used clip model of the shelf without any fine tuning + on web data and it showed us really really amazing results right so like where keyword + search cannot find because simply.", "tokens": [50414, 400, 411, 4582, 281, 264, + 1168, 466, 2027, 1310, 9853, 766, 295, 322, 300, 321, 3938, 3094, 365, 452, 13532, + 366, 2533, 304, 1601, 257, 4825, 2316, 293, 2120, 38219, 3164, 10723, 558, 689, + 321, 1143, 7353, 2316, 295, 264, 15222, 1553, 604, 2489, 15164, 322, 3670, 1412, + 293, 309, 4712, 505, 534, 534, 2243, 3542, 558, 370, 411, 689, 20428, 3164, 2644, + 915, 570, 2935, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2451728003365653, + "compression_ratio": 1.634703196347032, "no_speech_prob": 0.03368542715907097}, + {"id": 266, "seek": 379780, "start": 3797.8, "end": 3808.8, "text": " We metadata + doesn''t have it and it''s multilingual right so and it type it the same query with + neural retrieval and it gets it.", "tokens": [50364, 492, 26603, 1177, 380, 362, + 309, 293, 309, 311, 2120, 38219, 558, 370, 293, 309, 2010, 309, 264, 912, 14581, + 365, 18161, 19817, 3337, 293, 309, 2170, 309, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.1457685743059431, "compression_ratio": 1.6396396396396395, "no_speech_prob": + 0.02699311263859272}, {"id": 267, "seek": 379780, "start": 3808.8, "end": 3823.8, + "text": " Is there anything 
stopping high stack to move into that direction as well + sort of like crossing the boundary of only text right so like you did say multi + model in the context of let''s say queering a table but I could also query an image.", + "tokens": [50914, 1119, 456, 1340, 12767, 1090, 8630, 281, 1286, 666, 300, 3513, + 382, 731, 1333, 295, 411, 14712, 264, 12866, 295, 787, 2487, 558, 370, 411, 291, + 630, 584, 4825, 2316, 294, 264, 4319, 295, 718, 311, 584, 631, 1794, 257, 3199, + 457, 286, 727, 611, 14581, 364, 3256, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.1457685743059431, "compression_ratio": 1.6396396396396395, "no_speech_prob": + 0.02699311263859272}, {"id": 268, "seek": 382380, "start": 3823.8, "end": 3827.8, + "text": " So the same with the test stack is going in that direction as well.", + "tokens": [50364, 407, 264, 912, 365, 264, 1500, 8630, 307, 516, 294, 300, 3513, + 382, 731, 13, 50564], "temperature": 0.0, "avg_logprob": -0.363587605584528, "compression_ratio": + 1.6732283464566928, "no_speech_prob": 0.012012441642582417}, {"id": 269, "seek": + 382380, "start": 3827.8, "end": 3852.8, "text": " Yeah so we are actually like real + right now working on it so we have a first case where we want to support where you + have a text query but you can query also into images from the right side side and + then basically now other way around would be probably one of the later ones they + have an image as a query until I want to find different media types, I''d say.", + "tokens": [50564, 865, 370, 321, 366, 767, 411, 957, 558, 586, 1364, 322, 309, 370, + 321, 362, 257, 700, 1389, 689, 321, 528, 281, 1406, 689, 291, 362, 257, 2487, 14581, + 457, 291, 393, 14581, 611, 666, 5267, 490, 264, 558, 1252, 1252, 293, 550, 1936, + 586, 661, 636, 926, 576, 312, 1391, 472, 295, 264, 1780, 2306, 436, 362, 364, 3256, + 382, 257, 14581, 1826, 286, 528, 281, 915, 819, 3021, 3467, 11, 286, 1116, 584, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.363587605584528, 
"compression_ratio": + 1.6732283464566928, "no_speech_prob": 0.012012441642582417}, {"id": 270, "seek": + 385280, "start": 3852.8, "end": 3858.8, "text": " But yeah this is like definitely + what we right now working on.", "tokens": [50364, 583, 1338, 341, 307, 411, 2138, + 437, 321, 558, 586, 1364, 322, 13, 50664], "temperature": 0.0, "avg_logprob": -0.2822669681749846, + "compression_ratio": 1.5925925925925926, "no_speech_prob": 0.007458253763616085}, + {"id": 271, "seek": 385280, "start": 3858.8, "end": 3868.8, "text": " I think I + also think we need to think always see what are the big use cases and what kind + of customers you have and how do we use it.", "tokens": [50664, 286, 519, 286, 611, + 519, 321, 643, 281, 519, 1009, 536, 437, 366, 264, 955, 764, 3331, 293, 437, 733, + 295, 4581, 291, 362, 293, 577, 360, 321, 764, 309, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.2822669681749846, "compression_ratio": 1.5925925925925926, "no_speech_prob": + 0.007458253763616085}, {"id": 272, "seek": 385280, "start": 3868.8, "end": 3877.8, + "text": " I think with images there''s a lot of interesting use cases mainly in + e-commerce I would say that''s cool.", "tokens": [51164, 286, 519, 365, 5267, 456, + 311, 257, 688, 295, 1880, 764, 3331, 8704, 294, 308, 12, 26926, 286, 576, 584, 300, + 311, 1627, 13, 51614], "temperature": 0.0, "avg_logprob": -0.2822669681749846, "compression_ratio": + 1.5925925925925926, "no_speech_prob": 0.007458253763616085}, {"id": 273, "seek": + 387780, "start": 3877.8, "end": 3883.8, "text": " Yeah, we are already supported + to some degree and will support more I think in the next month.", "tokens": [50364, + 865, 11, 321, 366, 1217, 8104, 281, 512, 4314, 293, 486, 1406, 544, 286, 519, 294, + 264, 958, 1618, 13, 50664], "temperature": 0.0, "avg_logprob": -0.18296910336143093, + "compression_ratio": 1.6931818181818181, "no_speech_prob": 0.1519637256860733}, + {"id": 274, "seek": 387780, "start": 3883.8, "end": 3905.8, "text": " That''s 
great + to learn and that also means that I need to adjust my classification because I''ve + been presenting what I know about the players in in vector database and neural frameworks + and specifically for haystack I put NLP as the main vertical and I think largely + you guys still advertise that as the main vertical but I think nothing stops you + from.", "tokens": [50664, 663, 311, 869, 281, 1466, 293, 300, 611, 1355, 300, 286, + 643, 281, 4369, 452, 21538, 570, 286, 600, 668, 15578, 437, 286, 458, 466, 264, + 4150, 294, 294, 8062, 8149, 293, 18161, 29834, 293, 4682, 337, 4842, 372, 501, 286, + 829, 426, 45196, 382, 264, 2135, 9429, 293, 286, 519, 11611, 291, 1074, 920, 35379, + 300, 382, 264, 2135, 9429, 457, 286, 519, 1825, 10094, 291, 490, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.18296910336143093, "compression_ratio": 1.6931818181818181, + "no_speech_prob": 0.1519637256860733}, {"id": 275, "seek": 390580, "start": 3905.8, + "end": 3913.8, "text": " Switching that to multi modality right so NLP computer + vision and maybe even speech at some point.", "tokens": [50364, 13893, 278, 300, + 281, 4825, 1072, 1860, 558, 370, 426, 45196, 3820, 5201, 293, 1310, 754, 6218, 412, + 512, 935, 13, 50764], "temperature": 0.0, "avg_logprob": -0.17239184319218503, "compression_ratio": + 1.5260869565217392, "no_speech_prob": 0.01824888400733471}, {"id": 276, "seek": + 390580, "start": 3913.8, "end": 3931.8, "text": " Yeah totally I think our approaches + there''s just a bit like doing one thing to quite a depth first and then moving + on to the next rather than let''s say starting with very high level basic support + for all modalities and then kind of growing all of them.", "tokens": [50764, 865, + 3879, 286, 519, 527, 11587, 456, 311, 445, 257, 857, 411, 884, 472, 551, 281, 1596, + 257, 7161, 700, 293, 550, 2684, 322, 281, 264, 958, 2831, 813, 718, 311, 584, 2891, + 365, 588, 1090, 1496, 3875, 1406, 337, 439, 1072, 16110, 293, 550, 733, 295, 4194, + 439, 295, 552, 
13, 51664], "temperature": 0.0, "avg_logprob": -0.17239184319218503, + "compression_ratio": 1.5260869565217392, "no_speech_prob": 0.01824888400733471}, + {"id": 277, "seek": 393180, "start": 3931.8, "end": 3950.8, "text": " So what we + rather did in the past and still doing is very deep support for texts and we haven''t + there everything in place before kind of moving on to the next that''s a bit of + a philosophy question maybe a strategic questions what you want to do it.", "tokens": + [50364, 407, 437, 321, 2831, 630, 294, 264, 1791, 293, 920, 884, 307, 588, 2452, + 1406, 337, 15765, 293, 321, 2378, 380, 456, 1203, 294, 1081, 949, 733, 295, 2684, + 322, 281, 264, 958, 300, 311, 257, 857, 295, 257, 10675, 1168, 1310, 257, 10924, + 1651, 437, 291, 528, 281, 360, 309, 13, 51314], "temperature": 0.0, "avg_logprob": + -0.3619434152330671, "compression_ratio": 1.5121951219512195, "no_speech_prob": + 0.030860206112265587}, {"id": 278, "seek": 395080, "start": 3950.8, "end": 3968.8, + "text": " So this field multi is changing quite a lot right so a lot of things generative + models really big large models models that I don''t know even how to use yet you + know like dali.", "tokens": [50364, 407, 341, 2519, 4825, 307, 4473, 1596, 257, + 688, 558, 370, 257, 688, 295, 721, 1337, 1166, 5245, 534, 955, 2416, 5245, 5245, + 300, 286, 500, 380, 458, 754, 577, 281, 764, 1939, 291, 458, 411, 274, 5103, 13, + 51264], "temperature": 0.0, "avg_logprob": -0.2677145669626635, "compression_ratio": + 1.4112903225806452, "no_speech_prob": 0.3014005124568939}, {"id": 279, "seek": 396880, + "start": 3968.8, "end": 3982.8, "text": " Of course beyond just kind of experimental + interest but probably there will be some use cases where do you think else the trends + are going in this space.", "tokens": [50364, 2720, 1164, 4399, 445, 733, 295, 17069, + 1179, 457, 1391, 456, 486, 312, 512, 764, 3331, 689, 360, 291, 519, 1646, 264, 13892, + 366, 516, 294, 341, 1901, 13, 51064], "temperature": 
0.0, "avg_logprob": -0.10289331638451779, + "compression_ratio": 1.3076923076923077, "no_speech_prob": 0.12392525374889374}, + {"id": 280, "seek": 398280, "start": 3982.8, "end": 4003.8, "text": " Yeah so we + like want one big trend I think for sure is these large language models and everything + around it and as I talked earlier about it the questions like where is it right + now and is it already today really usable is it already kind of worth investigating + them comparing them for for your own use cases.", "tokens": [50364, 865, 370, 321, + 411, 528, 472, 955, 6028, 286, 519, 337, 988, 307, 613, 2416, 2856, 5245, 293, 1203, + 926, 309, 293, 382, 286, 2825, 3071, 466, 309, 264, 1651, 411, 689, 307, 309, 558, + 586, 293, 307, 309, 1217, 965, 534, 29975, 307, 309, 1217, 733, 295, 3163, 22858, + 552, 15763, 552, 337, 337, 428, 1065, 764, 3331, 13, 51414], "temperature": 0.0, + "avg_logprob": -0.16247457265853882, "compression_ratio": 1.6263157894736842, "no_speech_prob": + 0.12506522238254547}, {"id": 281, "seek": 400380, "start": 4003.8, "end": 4032.8, + "text": " I think there we are I would say still in an early phase it''s look at + for example GPT 3 and and I think it''s high months to the quite nice analysis earlier + this year where compared embedding some GPT 3 and will more standard size transformers + and there we think we saw it''s the performance is it''s not bad but it''s also + definitely not our performing.", "tokens": [50364, 286, 519, 456, 321, 366, 286, + 576, 584, 920, 294, 364, 2440, 5574, 309, 311, 574, 412, 337, 1365, 26039, 51, 805, + 293, 293, 286, 519, 309, 311, 1090, 2493, 281, 264, 1596, 1481, 5215, 3071, 341, + 1064, 689, 5347, 12240, 3584, 512, 26039, 51, 805, 293, 486, 544, 3832, 2744, 4088, + 433, 293, 456, 321, 519, 321, 1866, 309, 311, 264, 3389, 307, 309, 311, 406, 1578, + 457, 309, 311, 611, 2138, 406, 527, 10205, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.3959727817111545, "compression_ratio": 1.6129032258064515, 
"no_speech_prob": + 0.0796964168548584}, {"id": 282, "seek": 403280, "start": 4032.8, "end": 4045.8, + "text": " You say regular size models which are a thousand times smaller cost a + few dollars in not thousands and tens of thousands of dollars for for your influence + costs.", "tokens": [50364, 509, 584, 3890, 2744, 5245, 597, 366, 257, 4714, 1413, + 4356, 2063, 257, 1326, 3808, 294, 406, 5383, 293, 10688, 295, 5383, 295, 3808, 337, + 337, 428, 6503, 5497, 13, 51014], "temperature": 0.0, "avg_logprob": -0.22282164805644267, + "compression_ratio": 1.6313131313131313, "no_speech_prob": 0.005298241041600704}, + {"id": 283, "seek": 403280, "start": 4045.8, "end": 4058.8, "text": " So I think + that''s it''s basically right now as to let''s see case by case that it makes sense + for use case but if you think look a bit further into the next years.", "tokens": + [51014, 407, 286, 519, 300, 311, 309, 311, 1936, 558, 586, 382, 281, 718, 311, 536, + 1389, 538, 1389, 300, 309, 1669, 2020, 337, 764, 1389, 457, 498, 291, 519, 574, + 257, 857, 3052, 666, 264, 958, 924, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.22282164805644267, "compression_ratio": 1.6313131313131313, "no_speech_prob": + 0.005298241041600704}, {"id": 284, "seek": 405880, "start": 4058.8, "end": 4087.8, + "text": " I''m pretty sure and convinced that this is only a matter of time until + we see more and more large language models really in production also in search pipelines + in production and think that now it''s this phase of figuring out how can we make + them really more efficient more more reliable so we really can trust these these + results there.", "tokens": [50364, 286, 478, 1238, 988, 293, 12561, 300, 341, 307, + 787, 257, 1871, 295, 565, 1826, 321, 536, 544, 293, 544, 2416, 2856, 5245, 534, + 294, 4265, 611, 294, 3164, 40168, 294, 4265, 293, 519, 300, 586, 309, 311, 341, + 5574, 295, 15213, 484, 577, 393, 321, 652, 552, 534, 544, 7148, 544, 544, 12924, + 370, 321, 534, 393, 3361, 613, 
613, 3542, 456, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.14773211759679458, "compression_ratio": 1.6884422110552764, "no_speech_prob": + 0.027059154585003853}, {"id": 285, "seek": 408780, "start": 4087.8, "end": 4116.8, + "text": " Not going to be an easier way update to new knowledge and I would really + but now look a lot into and what I''m personally quite excited about is now this + I think area of research around retrieval based NLP so yes on the one hand side + kind of scaling up the models making them bigger because we think learned and over + last years that they are good few short learners and I think that''s really good.", + "tokens": [50364, 1726, 516, 281, 312, 364, 3571, 636, 5623, 281, 777, 3601, 293, + 286, 576, 534, 457, 586, 574, 257, 688, 666, 293, 437, 286, 478, 5665, 1596, 2919, + 466, 307, 586, 341, 286, 519, 1859, 295, 2132, 926, 19817, 3337, 2361, 426, 45196, + 370, 2086, 322, 264, 472, 1011, 1252, 733, 295, 21589, 493, 264, 5245, 1455, 552, + 3801, 570, 321, 519, 3264, 293, 670, 1036, 924, 300, 436, 366, 665, 1326, 2099, + 23655, 293, 286, 519, 300, 311, 534, 665, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.34569773563118866, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.025945376604795456}, {"id": 286, "seek": 411680, "start": 4116.8, "end": 4135.8, + "text": " And that''s of course exciting because you can just take these models + and kind of throw a task at them and they will perform so less manual work of of + annotating data creating domain specific data sets and so on.", "tokens": [50364, + 400, 300, 311, 295, 1164, 4670, 570, 291, 393, 445, 747, 613, 5245, 293, 733, 295, + 3507, 257, 5633, 412, 552, 293, 436, 486, 2042, 370, 1570, 9688, 589, 295, 295, + 25339, 990, 1412, 4084, 9274, 2685, 1412, 6352, 293, 370, 322, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.21598241684284616, "compression_ratio": 1.4452054794520548, + "no_speech_prob": 0.01452100370079279}, {"id": 287, "seek": 413580, 
"start": 4135.8, + "end": 4164.8, "text": " But I think we also saw that they are not very efficient + and there are these other problems. How do you how do you actually now teach not + to be free about recent events or about your own domain knowledge and typically + I think these these data sets that you that you want to search in they''re not static + right so there''s a constantly evolving and you really want to retrain these crazy + models every few days or weeks just to kind of catch up with us.", "tokens": [50364, + 583, 286, 519, 321, 611, 1866, 300, 436, 366, 406, 588, 7148, 293, 456, 366, 613, + 661, 2740, 13, 1012, 360, 291, 577, 360, 291, 767, 586, 2924, 406, 281, 312, 1737, + 466, 5162, 3931, 420, 466, 428, 1065, 9274, 3601, 293, 5850, 286, 519, 613, 613, + 1412, 6352, 300, 291, 300, 291, 528, 281, 3164, 294, 436, 434, 406, 13437, 558, + 370, 456, 311, 257, 6460, 21085, 293, 291, 534, 528, 281, 1533, 7146, 613, 3219, + 5245, 633, 1326, 1708, 420, 3259, 445, 281, 733, 295, 3745, 493, 365, 505, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.18238230253520765, "compression_ratio": 1.7470817120622568, + "no_speech_prob": 0.03828652203083038}, {"id": 288, "seek": 416580, "start": 4166.8, + "end": 4180.8, "text": " And I think that''s like where this stream of retrieval + based or achievement at models is super interesting and I think there''s a lot of + cool work.", "tokens": [50414, 400, 286, 519, 300, 311, 411, 689, 341, 4309, 295, + 19817, 3337, 2361, 420, 15838, 412, 5245, 307, 1687, 1880, 293, 286, 519, 456, 311, + 257, 688, 295, 1627, 589, 13, 51114], "temperature": 0.0, "avg_logprob": -0.3552790233067104, + "compression_ratio": 1.3394495412844036, "no_speech_prob": 0.0035831511486321688}, + {"id": 289, "seek": 418080, "start": 4180.8, "end": 4189.8, "text": " So just this + week we''re back from from Patrick Lewis publication around the Atlas model.", "tokens": + [50364, 407, 445, 341, 1243, 321, 434, 646, 490, 490, 13980, 17412, 19953, 926, + 
264, 32485, 2316, 13, 50814], "temperature": 0.0, "avg_logprob": -0.3222217082977295, + "compression_ratio": 1.569377990430622, "no_speech_prob": 0.03529032692313194}, + {"id": 290, "seek": 418080, "start": 4189.8, "end": 4193.8, "text": " Sure if you + saw it. But there''s basically the idea.", "tokens": [50814, 4894, 498, 291, 1866, + 309, 13, 583, 456, 311, 1936, 264, 1558, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.3222217082977295, "compression_ratio": 1.569377990430622, "no_speech_prob": 0.03529032692313194}, + {"id": 291, "seek": 418080, "start": 4193.8, "end": 4208.8, "text": " Can we can + we somehow remove the say the memory part from these big models and it kind of outsource + it to a database to an index and then at a query time we still have like a large + model.", "tokens": [51014, 1664, 321, 393, 321, 6063, 4159, 264, 584, 264, 4675, + 644, 490, 613, 955, 5245, 293, 309, 733, 295, 14758, 2948, 309, 281, 257, 8149, + 281, 364, 8186, 293, 550, 412, 257, 14581, 565, 321, 920, 362, 411, 257, 2416, 2316, + 13, 51764], "temperature": 0.0, "avg_logprob": -0.3222217082977295, "compression_ratio": + 1.569377990430622, "no_speech_prob": 0.03529032692313194}, {"id": 292, "seek": 420880, + "start": 4208.8, "end": 4237.8, "text": " Can we look complex reasoning, but it''s + kind of basing the generation on some retrieve documents and that can be useful + for search but can be also for an effect checking or other use cases and and long + story short, I think they have interesting they did love interesting experiments + and that paper that show that you can actually outsource quite a bit of these parameters + of this memory into into a", "tokens": [50364, 1664, 321, 574, 3997, 21577, 11, + 457, 309, 311, 733, 295, 987, 278, 264, 5125, 322, 512, 30254, 8512, 293, 300, 393, + 312, 4420, 337, 3164, 457, 393, 312, 611, 337, 364, 1802, 8568, 420, 661, 764, 3331, + 293, 293, 938, 1657, 2099, 11, 286, 519, 436, 362, 1880, 436, 630, 959, 1880, 12050, + 293, 300, 
3035, 300, 855, 300, 291, 393, 767, 14758, 2948, 1596, 257, 857, 295, + 613, 9834, 295, 341, 4675, 666, 666, 257, 51814], "temperature": 0.0, "avg_logprob": + -0.2299939379279996, "compression_ratio": 1.7155172413793103, "no_speech_prob": + 0.005177072249352932}, {"id": 293, "seek": 423780, "start": 4237.8, "end": 4246.8, + "text": " say a vector like the database and and still keep the few shot capabilities + of these giant language models.", "tokens": [50364, 584, 257, 8062, 411, 264, 8149, + 293, 293, 920, 1066, 264, 1326, 3347, 10862, 295, 613, 7410, 2856, 5245, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.2909165721828655, "compression_ratio": 1.5664739884393064, + "no_speech_prob": 0.0033793686889111996}, {"id": 294, "seek": 423780, "start": 4246.8, + "end": 4262.8, "text": " And I think this is like a super cool route like larger + models but still not putting everything in it, not not blowing up parameters, parameters + size unreasonably.", "tokens": [50814, 400, 286, 519, 341, 307, 411, 257, 1687, + 1627, 7955, 411, 4833, 5245, 457, 920, 406, 3372, 1203, 294, 309, 11, 406, 406, + 15068, 493, 9834, 11, 9834, 2744, 20584, 1258, 1188, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.2909165721828655, "compression_ratio": 1.5664739884393064, + "no_speech_prob": 0.0033793686889111996}, {"id": 295, "seek": 426280, "start": 4263.8, + "end": 4270.8, "text": " Let''s do combining it with now let''s say an external + document base or knowledge base.", "tokens": [50414, 961, 311, 360, 21928, 309, + 365, 586, 718, 311, 584, 364, 8320, 4166, 3096, 420, 3601, 3096, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.2332797604937886, "compression_ratio": 1.6291079812206573, + "no_speech_prob": 0.16929569840431213}, {"id": 296, "seek": 426280, "start": 4270.8, + "end": 4286.8, "text": " Yeah, I think it''s the topic attached upon it''s fascinating + that on one hand, let''s say you have a model, right? 
And if you if you keep retraining + it or fine tuning it on on latest data, you may run into this. I think it''s called + catastrophic forgetting, right?", "tokens": [50764, 865, 11, 286, 519, 309, 311, + 264, 4829, 8570, 3564, 309, 311, 10343, 300, 322, 472, 1011, 11, 718, 311, 584, + 291, 362, 257, 2316, 11, 558, 30, 400, 498, 291, 498, 291, 1066, 49356, 1760, 309, + 420, 2489, 15164, 309, 322, 322, 6792, 1412, 11, 291, 815, 1190, 666, 341, 13, 286, + 519, 309, 311, 1219, 34915, 25428, 11, 558, 30, 51564], "temperature": 0.0, "avg_logprob": + -0.2332797604937886, "compression_ratio": 1.6291079812206573, "no_speech_prob": + 0.16929569840431213}, {"id": 297, "seek": 428680, "start": 4286.8, "end": 4294.8, + "text": " Like things that we as humans know that I don''t know what is liquid kind + of on high level without going into chemistry.", "tokens": [50364, 1743, 721, 300, + 321, 382, 6255, 458, 300, 286, 500, 380, 458, 437, 307, 6553, 733, 295, 322, 1090, + 1496, 1553, 516, 666, 12558, 13, 50764], "temperature": 0.0, "avg_logprob": -0.16509604166789227, + "compression_ratio": 1.6339285714285714, "no_speech_prob": 0.38385775685310364}, + {"id": 298, "seek": 428680, "start": 4294.8, "end": 4309.8, "text": " And it''s + not that we think about it every single day when we drink water, but like it''s + not that we actually forget it if somebody asks us right no matter how many news + or papers, whatever the red books right we still remember the basic facts and", + "tokens": [50764, 400, 309, 311, 406, 300, 321, 519, 466, 309, 633, 2167, 786, 562, + 321, 2822, 1281, 11, 457, 411, 309, 311, 406, 300, 321, 767, 2870, 309, 498, 2618, + 8962, 505, 558, 572, 1871, 577, 867, 2583, 420, 10577, 11, 2035, 264, 2182, 3642, + 558, 321, 920, 1604, 264, 3875, 9130, 293, 51514], "temperature": 0.0, "avg_logprob": + -0.16509604166789227, "compression_ratio": 1.6339285714285714, "no_speech_prob": + 0.38385775685310364}, {"id": 299, "seek": 430980, "start": 4309.8, "end": 4322.8, + 
"text": " and I think what you just said with the Atlas model right so approach + outsourcing that memory into some database that you can maybe even control and say, + okay, these facts need to stay.", "tokens": [50364, 293, 286, 519, 437, 291, 445, + 848, 365, 264, 32485, 2316, 558, 370, 3109, 14758, 41849, 300, 4675, 666, 512, 8149, + 300, 291, 393, 1310, 754, 1969, 293, 584, 11, 1392, 11, 613, 9130, 643, 281, 1754, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.17950237424750076, "compression_ratio": + 1.5953488372093023, "no_speech_prob": 0.09019295126199722}, {"id": 300, "seek": + 430980, "start": 4322.8, "end": 4332.8, "text": " I never want them to go away no + matter what right, these are like basic principles and maybe they exist in every + domain like finance or healthcare and so on.", "tokens": [51014, 286, 1128, 528, + 552, 281, 352, 1314, 572, 1871, 437, 558, 11, 613, 366, 411, 3875, 9156, 293, 1310, + 436, 2514, 294, 633, 9274, 411, 10719, 420, 8884, 293, 370, 322, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.17950237424750076, "compression_ratio": 1.5953488372093023, + "no_speech_prob": 0.09019295126199722}, {"id": 301, "seek": 433280, "start": 4332.8, + "end": 4337.8, "text": " And yeah, I think this is interesting direction.", "tokens": + [50364, 400, 1338, 11, 286, 519, 341, 307, 1880, 3513, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.31480225920677185, "compression_ratio": 1.5210526315789474, + "no_speech_prob": 0.1434401124715805}, {"id": 302, "seek": 433280, "start": 4337.8, + "end": 4354.8, "text": " Yeah, absolutely all these facts change right can also + be that over time you have to adjust facts or knowledge and this is way easier I + think if you have it explicitly somewhere in documents, not so much in the just + in the parameters model.", "tokens": [50614, 865, 11, 3122, 439, 613, 9130, 1319, + 558, 393, 611, 312, 300, 670, 565, 291, 362, 281, 4369, 9130, 420, 3601, 293, 341, + 307, 636, 3571, 286, 519, 498, 291, 362, 
309, 20803, 4079, 294, 8512, 11, 406, 370, + 709, 294, 264, 445, 294, 264, 9834, 2316, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.31480225920677185, "compression_ratio": 1.5210526315789474, "no_speech_prob": + 0.1434401124715805}, {"id": 303, "seek": 435480, "start": 4354.8, "end": 4370.8, + "text": " Yeah, exactly exactly and like maybe just one example that comes to my + mind is like CT CT''s change names right and so you could still go back and say + what was the name of that CD between you know 1995 and 2000 right something like + that.", "tokens": [50364, 865, 11, 2293, 2293, 293, 411, 1310, 445, 472, 1365, 300, + 1487, 281, 452, 1575, 307, 411, 19529, 19529, 311, 1319, 5288, 558, 293, 370, 291, + 727, 920, 352, 646, 293, 584, 437, 390, 264, 1315, 295, 300, 6743, 1296, 291, 458, + 22601, 293, 8132, 558, 746, 411, 300, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.12375709745619032, "compression_ratio": 1.4478527607361964, "no_speech_prob": + 0.05547681078314781}, {"id": 304, "seek": 437080, "start": 4370.8, "end": 4383.8, + "text": " Yeah, or presidents of nations also change right so for this kind of queries + I think you want to make sure that you''re up to date and change it.", "tokens": + [50364, 865, 11, 420, 27611, 295, 11035, 611, 1319, 558, 370, 337, 341, 733, 295, + 24109, 286, 519, 291, 528, 281, 652, 988, 300, 291, 434, 493, 281, 4002, 293, 1319, + 309, 13, 51014], "temperature": 0.0, "avg_logprob": -0.23905001746283638, "compression_ratio": + 1.2743362831858407, "no_speech_prob": 0.1190975084900856}, {"id": 305, "seek": 438380, + "start": 4383.8, "end": 4397.8, "text": " Yeah, and I think maybe coming back to + search understanding the context will place such a huge role once these models become + even more mature and available and knowledge aware.", "tokens": [50364, 865, 11, + 293, 286, 519, 1310, 1348, 646, 281, 3164, 3701, 264, 4319, 486, 1081, 1270, 257, + 2603, 3090, 1564, 613, 5245, 1813, 754, 544, 14442, 293, 2435, 293, 3601, 
3650, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.12950106461842856, "compression_ratio": + 1.3435114503816794, "no_speech_prob": 0.1525879204273224}, {"id": 306, "seek": 439780, + "start": 4397.8, "end": 4424.8, "text": " But but the challenge of extracting contacts + from the query still is there if I say who is the president of the United States, + it might you know conclude that i''m asking about now present but if I was couple + programs above already saying setting the stage about specific period of time in + the past it could actually reason that I''m maybe not asking about presence right.", + "tokens": [50364, 583, 457, 264, 3430, 295, 49844, 15836, 490, 264, 14581, 920, + 307, 456, 498, 286, 584, 567, 307, 264, 3868, 295, 264, 2824, 3040, 11, 309, 1062, + 291, 458, 16886, 300, 741, 478, 3365, 466, 586, 1974, 457, 498, 286, 390, 1916, + 4268, 3673, 1217, 1566, 3287, 264, 3233, 466, 2685, 2896, 295, 565, 294, 264, 1791, + 309, 727, 767, 1778, 300, 286, 478, 1310, 406, 3365, 466, 6814, 558, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.16103317260742187, "compression_ratio": 1.6228070175438596, + "no_speech_prob": 0.48550447821617126}, {"id": 307, "seek": 442480, "start": 4424.8, + "end": 4436.8, "text": " Exactly could do this reasoning or could you ask a clarifying + question right or say like all like here are a couple of options that you mean this + like as you may want more to win a human conversation.", "tokens": [50364, 7587, + 727, 360, 341, 21577, 420, 727, 291, 1029, 257, 6093, 5489, 1168, 558, 420, 584, + 411, 439, 411, 510, 366, 257, 1916, 295, 3956, 300, 291, 914, 341, 411, 382, 291, + 815, 528, 544, 281, 1942, 257, 1952, 3761, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.2118764321009318, "compression_ratio": 1.778225806451613, "no_speech_prob": 0.013676443137228489}, + {"id": 308, "seek": 442480, "start": 4436.8, "end": 4453.8, "text": " Yeah, so I + think it''s called conversational information retrieval right and I think 
that we + might start seeing this blend of what probably today is called chatbot and a search + engine but it could be a search engine which is just clarifying.", "tokens": [50964, + 865, 11, 370, 286, 519, 309, 311, 1219, 2615, 1478, 1589, 19817, 3337, 558, 293, + 286, 519, 300, 321, 1062, 722, 2577, 341, 10628, 295, 437, 1391, 965, 307, 1219, + 5081, 18870, 293, 257, 3164, 2848, 457, 309, 727, 312, 257, 3164, 2848, 597, 307, + 445, 6093, 5489, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2118764321009318, + "compression_ratio": 1.778225806451613, "no_speech_prob": 0.013676443137228489}, + {"id": 309, "seek": 445380, "start": 4453.8, "end": 4471.8, "text": " Yeah, I mean + overall I think it''s it''s that''s a thing also is in the field I think we are + seeing that the search what we understand undersurge is evolving right so it''s + not so much anymore so I think about web search engines.", "tokens": [50364, 865, + 11, 286, 914, 4787, 286, 519, 309, 311, 309, 311, 300, 311, 257, 551, 611, 307, + 294, 264, 2519, 286, 519, 321, 366, 2577, 300, 264, 3164, 437, 321, 1223, 16692, + 374, 432, 307, 21085, 558, 370, 309, 311, 406, 370, 709, 3602, 370, 286, 519, 466, + 3670, 3164, 12982, 13, 51264], "temperature": 0.0, "avg_logprob": -0.3456519671848842, + "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.006885879673063755}, + {"id": 310, "seek": 447180, "start": 4471.8, "end": 4480.8, "text": " Yeah, in few + cases you were still you search search and you click on the website and then you''ll + search somewhere your information.", "tokens": [50364, 865, 11, 294, 1326, 3331, + 291, 645, 920, 291, 3164, 3164, 293, 291, 2052, 322, 264, 3144, 293, 550, 291, 603, + 3164, 4079, 428, 1589, 13, 50814], "temperature": 0.0, "avg_logprob": -0.2595294189453125, + "compression_ratio": 1.7064676616915422, "no_speech_prob": 0.08032990247011185}, + {"id": 311, "seek": 447180, "start": 4480.8, "end": 4497.8, "text": " But in many + cases we will kind of zero click search 
now where you have your query and within + the search results you already find what you want at and i think this is just yeah + getting more and more popular that.", "tokens": [50814, 583, 294, 867, 3331, 321, + 486, 733, 295, 4018, 2052, 3164, 586, 689, 291, 362, 428, 14581, 293, 1951, 264, + 3164, 3542, 291, 1217, 915, 437, 291, 528, 412, 293, 741, 519, 341, 307, 445, 1338, + 1242, 544, 293, 544, 3743, 300, 13, 51664], "temperature": 0.0, "avg_logprob": -0.2595294189453125, + "compression_ratio": 1.7064676616915422, "no_speech_prob": 0.08032990247011185}, + {"id": 312, "seek": 449780, "start": 4497.8, "end": 4512.8, "text": " Yeah, you''re + not providing say there''s the route to go to another knowledge source but you''re + trying to really answer the query directly and there''s no need to go further.", + "tokens": [50364, 865, 11, 291, 434, 406, 6530, 584, 456, 311, 264, 7955, 281, 352, + 281, 1071, 3601, 4009, 457, 291, 434, 1382, 281, 534, 1867, 264, 14581, 3838, 293, + 456, 311, 572, 643, 281, 352, 3052, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.2128302574157715, "compression_ratio": 1.4132231404958677, "no_speech_prob": + 0.009024864062666893}, {"id": 313, "seek": 451280, "start": 4512.8, "end": 4526.8, + "text": " I will also try to remember to link one paper maybe it''s like a series + of papers from Microsoft where they try to embed knowledge into the language model + and that''s.", "tokens": [50364, 286, 486, 611, 853, 281, 1604, 281, 2113, 472, + 3035, 1310, 309, 311, 411, 257, 2638, 295, 10577, 490, 8116, 689, 436, 853, 281, + 12240, 3601, 666, 264, 2856, 2316, 293, 300, 311, 13, 51064], "temperature": 0.0, + "avg_logprob": -0.12951921161852384, "compression_ratio": 1.3306451612903225, "no_speech_prob": + 0.3616473376750946}, {"id": 314, "seek": 452680, "start": 4527.8, "end": 4543.8, + "text": " I think it''s a very interesting direction as well as also embedding knowledge + graphs into the model right because one way, as you said, and I 
think that trend + probably still there that yeah you can keep adding parameters more and more billion + trillions.", "tokens": [50414, 286, 519, 309, 311, 257, 588, 1880, 3513, 382, 731, + 382, 611, 12240, 3584, 3601, 24877, 666, 264, 2316, 558, 570, 472, 636, 11, 382, + 291, 848, 11, 293, 286, 519, 300, 6028, 1391, 920, 456, 300, 1338, 291, 393, 1066, + 5127, 9834, 544, 293, 544, 5218, 504, 46279, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.130066297672413, "compression_ratio": 1.5149700598802396, "no_speech_prob": 0.15515585243701935}, + {"id": 315, "seek": 454380, "start": 4543.8, "end": 4553.8, "text": " But at some + point it just simply becomes an on practical in practical right to to have such + a large model in production and then how do you find unit.", "tokens": [50364, 583, + 412, 512, 935, 309, 445, 2935, 3643, 364, 322, 8496, 294, 8496, 558, 281, 281, 362, + 1270, 257, 2416, 2316, 294, 4265, 293, 550, 577, 360, 291, 915, 4985, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.18731724132191052, "compression_ratio": 1.5157232704402517, + "no_speech_prob": 0.01904473640024662}, {"id": 316, "seek": 454380, "start": 4553.8, + "end": 4559.8, "text": " But again it doesn''t capture the relationships well enough + right if you didn''t explain it.", "tokens": [50864, 583, 797, 309, 1177, 380, 7983, + 264, 6159, 731, 1547, 558, 498, 291, 994, 380, 2903, 309, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.18731724132191052, "compression_ratio": 1.5157232704402517, + "no_speech_prob": 0.01904473640024662}, {"id": 317, "seek": 455980, "start": 4560.8, + "end": 4582.8, "text": " Absolutely and just thinking of it we have actually a meetup + end of September so if you''re interested or anyone who''s listening where a needs + Rimas will also talk about exactly that topic how do you kind of incorporate knowledge + into a language model and and that would be our end of September as well just look + for maybe can link it in the show notes.", "tokens": 
[50414, 7021, 293, 445, 1953, + 295, 309, 321, 362, 767, 257, 1677, 1010, 917, 295, 7216, 370, 498, 291, 434, 3102, + 420, 2878, 567, 311, 4764, 689, 257, 2203, 497, 17957, 486, 611, 751, 466, 2293, + 300, 4829, 577, 360, 291, 733, 295, 16091, 3601, 666, 257, 2856, 2316, 293, 293, + 300, 576, 312, 527, 917, 295, 7216, 382, 731, 445, 574, 337, 1310, 393, 2113, 309, + 294, 264, 855, 5570, 13, 51514], "temperature": 0.0, "avg_logprob": -0.32280543009440105, + "compression_ratio": 1.5855855855855856, "no_speech_prob": 0.033445827662944794}, + {"id": 318, "seek": 458280, "start": 4582.8, "end": 4594.8, "text": " So I will + manage you to experience absolutely will do that gladly my favorite question I know + you touched many times as well during this podcast which I really really enjoyed.", + "tokens": [50364, 407, 286, 486, 3067, 291, 281, 1752, 3122, 486, 360, 300, 47307, + 452, 2954, 1168, 286, 458, 291, 9828, 867, 1413, 382, 731, 1830, 341, 7367, 597, + 286, 534, 534, 4626, 13, 50964], "temperature": 0.0, "avg_logprob": -0.25164263407389326, + "compression_ratio": 1.681159420289855, "no_speech_prob": 0.33660757541656494}, + {"id": 319, "seek": 458280, "start": 4594.8, "end": 4609.8, "text": " But what what + else drives you beyond you know you have a role as a city or you have a role as + a pioneer in this space and maybe educating and reaching more and more people.", + "tokens": [50964, 583, 437, 437, 1646, 11754, 291, 4399, 291, 458, 291, 362, 257, + 3090, 382, 257, 2307, 420, 291, 362, 257, 3090, 382, 257, 37668, 294, 341, 1901, + 293, 1310, 28835, 293, 9906, 544, 293, 544, 561, 13, 51714], "temperature": 0.0, + "avg_logprob": -0.25164263407389326, "compression_ratio": 1.681159420289855, "no_speech_prob": + 0.33660757541656494}, {"id": 320, "seek": 460980, "start": 4610.8, "end": 4617.8, + "text": " Is there something else that drives you sort of beyond the tech itself + in this field.", "tokens": [50414, 1119, 456, 746, 1646, 300, 11754, 291, 1333, + 295, 4399, 
264, 7553, 2564, 294, 341, 2519, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.2299879921807183, "compression_ratio": 1.6965811965811965, "no_speech_prob": + 0.07786066085100174}, {"id": 321, "seek": 460980, "start": 4617.8, "end": 4623.8, + "text": " Yeah like I mean that I think my make sight of my passion for an appears + here.", "tokens": [50764, 865, 411, 286, 914, 300, 286, 519, 452, 652, 7860, 295, + 452, 5418, 337, 364, 7038, 510, 13, 51064], "temperature": 0.0, "avg_logprob": -0.2299879921807183, + "compression_ratio": 1.6965811965811965, "no_speech_prob": 0.07786066085100174}, + {"id": 322, "seek": 460980, "start": 4623.8, "end": 4626.8, "text": " I hope that + came came through.", "tokens": [51064, 286, 1454, 300, 1361, 1361, 807, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.2299879921807183, "compression_ratio": 1.6965811965811965, + "no_speech_prob": 0.07786066085100174}, {"id": 323, "seek": 460980, "start": 4626.8, + "end": 4638.8, "text": " But for me like the technology is the one thing but then + really seeing how you solve problems with that like how you can make annoying work + of financial analyst faster and better like just seeing that.", "tokens": [51214, + 583, 337, 385, 411, 264, 2899, 307, 264, 472, 551, 457, 550, 534, 2577, 577, 291, + 5039, 2740, 365, 300, 411, 577, 291, 393, 652, 11304, 589, 295, 4669, 19085, 4663, + 293, 1101, 411, 445, 2577, 300, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2299879921807183, + "compression_ratio": 1.6965811965811965, "no_speech_prob": 0.07786066085100174}, + {"id": 324, "seek": 463880, "start": 4638.8, "end": 4646.8, "text": " Either say + first and because they are customer or it''s a more indirectly.", "tokens": [50364, + 13746, 584, 700, 293, 570, 436, 366, 5474, 420, 309, 311, 257, 544, 37779, 13, 50764], + "temperature": 0.0, "avg_logprob": -0.22878915385196083, "compression_ratio": 1.5792079207920793, + "no_speech_prob": 0.013838470913469791}, {"id": 325, "seek": 463880, 
"start": 4646.8, + "end": 4649.8, "text": " If you know that this is now kind of possible.", "tokens": + [50764, 759, 291, 458, 300, 341, 307, 586, 733, 295, 1944, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.22878915385196083, "compression_ratio": 1.5792079207920793, + "no_speech_prob": 0.013838470913469791}, {"id": 326, "seek": 463880, "start": 4649.8, + "end": 4664.8, "text": " So I think it''s like still a big driver for me personally + and I think I want to think I absolutely love about open source that it''s not just + paying users commercial users where you kind of see that.", "tokens": [50914, 407, + 286, 519, 309, 311, 411, 920, 257, 955, 6787, 337, 385, 5665, 293, 286, 519, 286, + 528, 281, 519, 286, 3122, 959, 466, 1269, 4009, 300, 309, 311, 406, 445, 6229, 5022, + 6841, 5022, 689, 291, 733, 295, 536, 300, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.22878915385196083, "compression_ratio": 1.5792079207920793, "no_speech_prob": + 0.013838470913469791}, {"id": 327, "seek": 466480, "start": 4664.8, "end": 4678.8, + "text": " But we are really this huge community by now from haystack where there''s + so many different people with different backgrounds, different use cases and it''s.", + "tokens": [50364, 583, 321, 366, 534, 341, 2603, 1768, 538, 586, 490, 4842, 372, + 501, 689, 456, 311, 370, 867, 819, 561, 365, 819, 17336, 11, 819, 764, 3331, 293, + 309, 311, 13, 51064], "temperature": 0.0, "avg_logprob": -0.29184861864362444, "compression_ratio": + 1.5522388059701493, "no_speech_prob": 0.16269415616989136}, {"id": 328, "seek": + 466480, "start": 4678.8, "end": 4689.8, "text": " For me often like just end of + the day really like scrolling through and all get up issues kind of questions that + come in or on Slack when are we on discord.", "tokens": [51064, 1171, 385, 2049, + 411, 445, 917, 295, 264, 786, 534, 411, 29053, 807, 293, 439, 483, 493, 2663, 733, + 295, 1651, 300, 808, 294, 420, 322, 37211, 562, 366, 321, 322, 32989, 13, 51614], + 
"temperature": 0.0, "avg_logprob": -0.29184861864362444, "compression_ratio": 1.5522388059701493, + "no_speech_prob": 0.16269415616989136}, {"id": 329, "seek": 468980, "start": 4689.8, + "end": 4701.8, "text": " Like what what people are actually building with that and + and it''s really cool to see what they kind of use case come up with but also how + far this actually got that it''s.", "tokens": [50364, 1743, 437, 437, 561, 366, + 767, 2390, 365, 300, 293, 293, 309, 311, 534, 1627, 281, 536, 437, 436, 733, 295, + 764, 1389, 808, 493, 365, 457, 611, 577, 1400, 341, 767, 658, 300, 309, 311, 13, + 50964], "temperature": 0.0, "avg_logprob": -0.2533611437169517, "compression_ratio": + 1.4491525423728813, "no_speech_prob": 0.03317677974700928}, {"id": 330, "seek": + 470180, "start": 4701.8, "end": 4711.8, "text": " And using so many companies all + around the world from big tech to classical enterprise to start up to build their + products on top.", "tokens": [50364, 400, 1228, 370, 867, 3431, 439, 926, 264, 1002, + 490, 955, 7553, 281, 13735, 14132, 281, 722, 493, 281, 1322, 641, 3383, 322, 1192, + 13, 50864], "temperature": 0.0, "avg_logprob": -0.28388368672338027, "compression_ratio": + 1.5263157894736843, "no_speech_prob": 0.2008688747882843}, {"id": 331, "seek": 470180, + "start": 4711.8, "end": 4722.8, "text": " And that thing is a stick to one of my + biggest motivation boosters that you can get at seeing the community appreciating + using it.", "tokens": [50864, 400, 300, 551, 307, 257, 2897, 281, 472, 295, 452, + 3880, 12335, 748, 40427, 300, 291, 393, 483, 412, 2577, 264, 1768, 3616, 990, 1228, + 309, 13, 51414], "temperature": 0.0, "avg_logprob": -0.28388368672338027, "compression_ratio": + 1.5263157894736843, "no_speech_prob": 0.2008688747882843}, {"id": 332, "seek": 472280, + "start": 4722.8, "end": 4741.8, "text": " And and probably also like on tip beyond + get up just recently ran into a guy in a bar who he and Berlin who who use taste + that and 
let''s definitely something I never would have imagined a few years ago.", + "tokens": [50364, 400, 293, 1391, 611, 411, 322, 4125, 4399, 483, 493, 445, 3938, + 5872, 666, 257, 2146, 294, 257, 2159, 567, 415, 293, 13848, 567, 567, 764, 3939, + 300, 293, 718, 311, 2138, 746, 286, 1128, 576, 362, 16590, 257, 1326, 924, 2057, + 13, 51314], "temperature": 0.0, "avg_logprob": -0.3452634608491938, "compression_ratio": + 1.3835616438356164, "no_speech_prob": 0.0756370946764946}, {"id": 333, "seek": 474180, + "start": 4741.8, "end": 4752.8, "text": " And this kind of happens or what we said + of a glass here when we find a bit of a vision and thought about some goals at the + company offside.", "tokens": [50364, 400, 341, 733, 295, 2314, 420, 437, 321, 848, + 295, 257, 4276, 510, 562, 321, 915, 257, 857, 295, 257, 5201, 293, 1194, 466, 512, + 5493, 412, 264, 2237, 766, 1812, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.3188341547933857, "compression_ratio": 1.6134453781512605, "no_speech_prob": + 0.289371520280838}, {"id": 334, "seek": 474180, "start": 4752.8, "end": 4769.8, + "text": " I think one of us for the open source side that people start putting say + haystack experience into their job requirements or the other way around people putting + that in the CVs and we thought oh, I guess this is maybe three years down the road.", + "tokens": [50914, 286, 519, 472, 295, 505, 337, 264, 1269, 4009, 1252, 300, 561, + 722, 3372, 584, 4842, 372, 501, 1752, 666, 641, 1691, 7728, 420, 264, 661, 636, + 926, 561, 3372, 300, 294, 264, 22995, 82, 293, 321, 1194, 1954, 11, 286, 2041, 341, + 307, 1310, 1045, 924, 760, 264, 3060, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.3188341547933857, "compression_ratio": 1.6134453781512605, "no_speech_prob": + 0.289371520280838}, {"id": 335, "seek": 476980, "start": 4769.8, "end": 4780.8, + "text": " But then a few weeks afterwards we saw these first job postings where + this was required and also TVs where this was mentioned.", 
"tokens": [50364, 583, + 550, 257, 1326, 3259, 10543, 321, 1866, 613, 700, 1691, 2183, 1109, 689, 341, 390, + 4739, 293, 611, 38085, 689, 341, 390, 2835, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.22198439263678216, "compression_ratio": 1.5809523809523809, "no_speech_prob": + 0.0012213498121127486}, {"id": 336, "seek": 476980, "start": 4780.8, "end": 4796.8, + "text": " So I think it''s just cool to see how you can leave a footprint and beyond + let''s say you are immediate bubble but really kind of spreads it''s open it''s + all digital it''s kind of connected in the world right.", "tokens": [50914, 407, + 286, 519, 309, 311, 445, 1627, 281, 536, 577, 291, 393, 1856, 257, 24222, 293, 4399, + 718, 311, 584, 291, 366, 11629, 12212, 457, 534, 733, 295, 25728, 309, 311, 1269, + 309, 311, 439, 4562, 309, 311, 733, 295, 4582, 294, 264, 1002, 558, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.22198439263678216, "compression_ratio": 1.5809523809523809, + "no_speech_prob": 0.0012213498121127486}, {"id": 337, "seek": 479680, "start": 4796.8, + "end": 4802.8, "text": " And leaving those kind of footprint is what I enjoy.", + "tokens": [50364, 400, 5012, 729, 733, 295, 24222, 307, 437, 286, 2103, 13, 50664], + "temperature": 0.0, "avg_logprob": -0.31329286420667496, "compression_ratio": 1.5353535353535352, + "no_speech_prob": 0.09440034627914429}, {"id": 338, "seek": 479680, "start": 4802.8, + "end": 4823.8, "text": " And yeah, search in I think as a domain in us just for + me really interesting because it''s so diverse as you can go in many directions + can dive very deep into an IP can think a lot about the user side at the end for + what use cases you can make it work.", "tokens": [50664, 400, 1338, 11, 3164, 294, + 286, 519, 382, 257, 9274, 294, 505, 445, 337, 385, 534, 1880, 570, 309, 311, 370, + 9521, 382, 291, 393, 352, 294, 867, 11095, 393, 9192, 588, 2452, 666, 364, 8671, + 393, 519, 257, 688, 466, 264, 4195, 1252, 412, 264, 917, 337, 437, 764, 3331, 
291, + 393, 652, 309, 589, 13, 51714], "temperature": 0.0, "avg_logprob": -0.31329286420667496, + "compression_ratio": 1.5353535353535352, "no_speech_prob": 0.09440034627914429}, + {"id": 339, "seek": 482380, "start": 4823.8, "end": 4827.8, "text": " And can think + a lot about scalability.", "tokens": [50364, 400, 393, 519, 257, 688, 466, 15664, + 2310, 13, 50564], "temperature": 0.0, "avg_logprob": -0.22565984725952148, "compression_ratio": + 1.6402116402116402, "no_speech_prob": 0.07637202739715576}, {"id": 340, "seek": + 482380, "start": 4827.8, "end": 4837.8, "text": " It''s just I think the one of + the most for my point most exciting and diverse applications of technology right + now.", "tokens": [50564, 467, 311, 445, 286, 519, 264, 472, 295, 264, 881, 337, + 452, 935, 881, 4670, 293, 9521, 5821, 295, 2899, 558, 586, 13, 51064], "temperature": + 0.0, "avg_logprob": -0.22565984725952148, "compression_ratio": 1.6402116402116402, + "no_speech_prob": 0.07637202739715576}, {"id": 341, "seek": 482380, "start": 4837.8, + "end": 4849.8, "text": " And and one way I think you can really relate to like really + can think okay what what is actually possible what kind of information you can make + accessible.", "tokens": [51064, 400, 293, 472, 636, 286, 519, 291, 393, 534, 10961, + 281, 411, 534, 393, 519, 1392, 437, 437, 307, 767, 1944, 437, 733, 295, 1589, 291, + 393, 652, 9515, 13, 51664], "temperature": 0.0, "avg_logprob": -0.22565984725952148, + "compression_ratio": 1.6402116402116402, "no_speech_prob": 0.07637202739715576}, + {"id": 342, "seek": 484980, "start": 4849.8, "end": 4853.8, "text": " And that''s + that''s obviously the beauty of it.", "tokens": [50364, 400, 300, 311, 300, 311, + 2745, 264, 6643, 295, 309, 13, 50564], "temperature": 0.0, "avg_logprob": -0.22360794127933561, + "compression_ratio": 1.567251461988304, "no_speech_prob": 0.057433754205703735}, + {"id": 343, "seek": 484980, "start": 4853.8, "end": 4869.8, "text": " Yeah, it''s + beautifully 
put thanks thanks for sharing I know some some of the guests that I + asked this question would probably think hey why is this philosophical question + I''m just you know doing it I like it but that''s it.", "tokens": [50564, 865, 11, + 309, 311, 16525, 829, 3231, 3231, 337, 5414, 286, 458, 512, 512, 295, 264, 9804, + 300, 286, 2351, 341, 1168, 576, 1391, 519, 4177, 983, 307, 341, 25066, 1168, 286, + 478, 445, 291, 458, 884, 309, 286, 411, 309, 457, 300, 311, 309, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.22360794127933561, "compression_ratio": 1.567251461988304, + "no_speech_prob": 0.057433754205703735}, {"id": 344, "seek": 486980, "start": 4869.8, + "end": 4887.8, "text": " But I think it gives so much to towards you know you reflecting + on what you do because that might also influence your choices in in the tech or + in how you approach your users what message you send and so on and so forth and + maybe reconsider some things as well.", "tokens": [50364, 583, 286, 519, 309, 2709, + 370, 709, 281, 3030, 291, 458, 291, 23543, 322, 437, 291, 360, 570, 300, 1062, 611, + 6503, 428, 7994, 294, 294, 264, 7553, 420, 294, 577, 291, 3109, 428, 5022, 437, + 3636, 291, 2845, 293, 370, 322, 293, 370, 5220, 293, 1310, 40497, 512, 721, 382, + 731, 13, 51264], "temperature": 0.0, "avg_logprob": -0.10283544607329786, "compression_ratio": + 1.535294117647059, "no_speech_prob": 0.09484151005744934}, {"id": 345, "seek": 488780, + "start": 4887.8, "end": 4916.8, "text": " And an open source part you reminded me + of one story when it was my first time visit in the US I think it was 2015 and it + was a patchy coin I was crossing on the traffic light you know on the pedestrian + crossing and it was like this narrow avenue you know not narrow white and every + right and select takes on my account like few minutes but it''s of course not minutes + maybe 20 seconds.", "tokens": [50364, 400, 364, 1269, 4009, 644, 291, 15920, 385, + 295, 472, 1657, 562, 309, 390, 452, 700, 565, 
3441, 294, 264, 2546, 286, 519, 309, + 390, 7546, 293, 309, 390, 257, 9972, 88, 11464, 286, 390, 14712, 322, 264, 6419, + 1442, 291, 458, 322, 264, 33947, 14712, 293, 309, 390, 411, 341, 9432, 39230, 291, + 458, 406, 9432, 2418, 293, 633, 558, 293, 3048, 2516, 322, 452, 2696, 411, 1326, + 2077, 457, 309, 311, 295, 1164, 406, 2077, 1310, 945, 3949, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.18648285585291247, "compression_ratio": 1.6594827586206897, + "no_speech_prob": 0.41918399930000305}, {"id": 346, "seek": 491680, "start": 4916.8, + "end": 4927.8, "text": " And I think it''s really bouncing to me from the other + side of the road saying I know you I was like no it''s impossible it''s my first + time visit you know I don''t I''m not a public figure.", "tokens": [50364, 400, + 286, 519, 309, 311, 534, 27380, 281, 385, 490, 264, 661, 1252, 295, 264, 3060, 1566, + 286, 458, 291, 286, 390, 411, 572, 309, 311, 6243, 309, 311, 452, 700, 565, 3441, + 291, 458, 286, 500, 380, 286, 478, 406, 257, 1908, 2573, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.28622476914349726, "compression_ratio": 1.5922330097087378, + "no_speech_prob": 0.0880960151553154}, {"id": 347, "seek": 491680, "start": 4927.8, + "end": 4937.8, "text": " How is it possible and he said he because you build look + it''s one of the open source kind of you seen in the extriders that I used to work + on.", "tokens": [50914, 1012, 307, 309, 1944, 293, 415, 848, 415, 570, 291, 1322, + 574, 309, 311, 472, 295, 264, 1269, 4009, 733, 295, 291, 1612, 294, 264, 16455, + 6936, 300, 286, 1143, 281, 589, 322, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.28622476914349726, "compression_ratio": 1.5922330097087378, "no_speech_prob": + 0.0880960151553154}, {"id": 348, "seek": 493780, "start": 4937.8, "end": 4957.8, + "text": " You know which I inherited from its original creator Andrej Blyetski and + that''s it he didn''t stop to say anything else but but he made my day you know + and I think what 
you felt in the bar was probably similar knowing that that person + uses haystack and you know it''s amazing.", "tokens": [50364, 509, 458, 597, 286, + 27091, 490, 1080, 3380, 14181, 20667, 73, 363, 356, 1385, 2984, 293, 300, 311, 309, + 415, 994, 380, 1590, 281, 584, 1340, 1646, 457, 457, 415, 1027, 452, 786, 291, 458, + 293, 286, 519, 437, 291, 2762, 294, 264, 2159, 390, 1391, 2531, 5276, 300, 300, + 954, 4960, 4842, 372, 501, 293, 291, 458, 309, 311, 2243, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.18135158943407464, "compression_ratio": 1.4972677595628416, + "no_speech_prob": 0.28599128127098083}, {"id": 349, "seek": 495780, "start": 4957.8, + "end": 4973.8, "text": " Absolutely because it''s just it feels very honest right + it feels like it is is not because we know it''s crazy marketing or anything like + that it''s just like a really like a natural community thing and and just building + something that''s useful for others.", "tokens": [50364, 7021, 570, 309, 311, 445, + 309, 3417, 588, 3245, 558, 309, 3417, 411, 309, 307, 307, 406, 570, 321, 458, 309, + 311, 3219, 6370, 420, 1340, 411, 300, 309, 311, 445, 411, 257, 534, 411, 257, 3303, + 1768, 551, 293, 293, 445, 2390, 746, 300, 311, 4420, 337, 2357, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.19310691621568468, "compression_ratio": 1.6178343949044587, + "no_speech_prob": 0.29695138335227966}, {"id": 350, "seek": 497380, "start": 4973.8, + "end": 4991.8, "text": " Yeah exactly which probably reinforces you and gives you + these well in this case direct feedback well not only specifics of your of your + platform but actually the fact that they''re using it and relying on and building + a business and that tells the two decisions you made in the architecture and so + on and so forth that''s amazing.", "tokens": [50364, 865, 2293, 597, 1391, 20520, + 887, 291, 293, 2709, 291, 613, 731, 294, 341, 1389, 2047, 5824, 731, 406, 787, 28454, + 295, 428, 295, 428, 3663, 457, 767, 264, 1186, 300, 436, 
434, 1228, 309, 293, 24140, + 322, 293, 2390, 257, 1606, 293, 300, 5112, 264, 732, 5327, 291, 1027, 294, 264, + 9482, 293, 370, 322, 293, 370, 5220, 300, 311, 2243, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.16160714448387944, "compression_ratio": 1.645, "no_speech_prob": + 0.1877872794866562}, {"id": 351, "seek": 499180, "start": 4991.8, "end": 5006.8, + "text": " Yeah I mean like from a company perspective that''s one of the fastest + feedback cycles you can have right and like seeing diverse use cases diverse developer + person on us how they approach things what they''re struggling with.", "tokens": + [50364, 865, 286, 914, 411, 490, 257, 2237, 4585, 300, 311, 472, 295, 264, 14573, + 5824, 17796, 291, 393, 362, 558, 293, 411, 2577, 9521, 764, 3331, 9521, 10754, 954, + 322, 505, 577, 436, 3109, 721, 437, 436, 434, 9314, 365, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.27308099023227034, "compression_ratio": 1.5, "no_speech_prob": + 0.022672582417726517}, {"id": 352, "seek": 499180, "start": 5006.8, "end": 5011.8, + "text": " Yeah also that angle it was fast yeah absolutely crucial.", "tokens": + [51114, 865, 611, 300, 5802, 309, 390, 2370, 1338, 3122, 11462, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.27308099023227034, "compression_ratio": 1.5, "no_speech_prob": + 0.022672582417726517}, {"id": 353, "seek": 501180, "start": 5011.8, "end": 5029.8, + "text": " I think it''s the best and it''s like I think it''s Elon Musk who said + the best setting is when your user fell in love with your product and once you just + succeed so yeah there you go amazing and I''ve enjoyed this podcast so much is there + anything you want to announce to our listeners.", "tokens": [50364, 286, 519, 309, + 311, 264, 1151, 293, 309, 311, 411, 286, 519, 309, 311, 28498, 26019, 567, 848, + 264, 1151, 3287, 307, 562, 428, 4195, 5696, 294, 959, 365, 428, 1674, 293, 1564, + 291, 445, 7754, 370, 1338, 456, 291, 352, 2243, 293, 286, 600, 4626, 341, 7367, + 370, 709, 
307, 456, 1340, 291, 528, 281, 7478, 281, 527, 23274, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.08603833271906926, "compression_ratio": 1.5810055865921788, + "no_speech_prob": 0.11487273871898651}, {"id": 354, "seek": 502980, "start": 5029.8, + "end": 5052.8, "text": " Yeah we''ve just the meet up I already mentioned so if + you''re interested in LP that''s happening in September it will be hybrid so you + can join online but if you''re if you happen to be in Berlin we also have a small + on site event and then yeah of course if you haven''t tried hastag yet maybe check + it out on GitHub.", "tokens": [50364, 865, 321, 600, 445, 264, 1677, 493, 286, 1217, + 2835, 370, 498, 291, 434, 3102, 294, 38095, 300, 311, 2737, 294, 7216, 309, 486, + 312, 13051, 370, 291, 393, 3917, 2950, 457, 498, 291, 434, 498, 291, 1051, 281, + 312, 294, 13848, 321, 611, 362, 257, 1359, 322, 3621, 2280, 293, 550, 1338, 295, + 1164, 498, 291, 2378, 380, 3031, 6581, 559, 1939, 1310, 1520, 309, 484, 322, 23331, + 13, 51514], "temperature": 0.2, "avg_logprob": -0.18091154098510742, "compression_ratio": + 1.5120772946859904, "no_speech_prob": 0.49246010184288025}, {"id": 355, "seek": + 505280, "start": 5052.8, "end": 5080.8, "text": " As a prompt every promise you + can get an easy first pipeline up and running and just give it a try to try to question + answering if you haven''t if you''re more coming from traditional search and down + on deep set cloud as mentioned we just released a big new model on experiments with + still an early stage with with the product but we have an early access program so + if you''re interested if you''re", "tokens": [50364, 1018, 257, 12391, 633, 6228, + 291, 393, 483, 364, 1858, 700, 15517, 493, 293, 2614, 293, 445, 976, 309, 257, 853, + 281, 853, 281, 1168, 13430, 498, 291, 2378, 380, 498, 291, 434, 544, 1348, 490, + 5164, 3164, 293, 760, 322, 2452, 992, 4588, 382, 2835, 321, 445, 4736, 257, 955, + 777, 2316, 322, 12050, 365, 920, 364, 2440, 3233, 365, 
365, 264, 1674, 457, 321, + 362, 364, 2440, 2105, 1461, 370, 498, 291, 434, 3102, 498, 291, 434, 51764], "temperature": + 0.0, "avg_logprob": -0.2995092323027461, "compression_ratio": 1.7356828193832599, + "no_speech_prob": 0.2435733675956726}, {"id": 356, "seek": 508080, "start": 5080.8, + "end": 5101.8, "text": " having a lot of use case that you want to bring to production + in a fast way where you think about how to scale it how to actually find that pipeline + how to collaborate with with your end users and get some feedback there just reach + out to us and then we can can get you on the on the early access program.", "tokens": + [50364, 1419, 257, 688, 295, 764, 1389, 300, 291, 528, 281, 1565, 281, 4265, 294, + 257, 2370, 636, 689, 291, 519, 466, 577, 281, 4373, 309, 577, 281, 767, 915, 300, + 15517, 577, 281, 18338, 365, 365, 428, 917, 5022, 293, 483, 512, 5824, 456, 445, + 2524, 484, 281, 505, 293, 550, 321, 393, 393, 483, 291, 322, 264, 322, 264, 2440, + 2105, 1461, 13, 51414], "temperature": 0.0, "avg_logprob": -0.22766752804026885, + "compression_ratio": 1.6576086956521738, "no_speech_prob": 0.041459083557128906}, + {"id": 357, "seek": 510180, "start": 5101.8, "end": 5130.8, "text": " Amazing thanks + so much multi have enjoyed again saying this and this was deep and thoughtful and + we will make sure to link all the all the goodies that you mentioned in the show + notes and I hope to meet some day maybe in Berlin maybe somewhere else but absolutely + yeah let''s make that happen and I totally enjoyed our conversation as well so thanks + but for having me.", "tokens": [50384, 14165, 3231, 370, 709, 4825, 362, 4626, 797, + 1566, 341, 293, 341, 390, 2452, 293, 21566, 293, 321, 486, 652, 988, 281, 2113, + 439, 264, 439, 264, 44072, 300, 291, 2835, 294, 264, 855, 5570, 293, 286, 1454, + 281, 1677, 512, 786, 1310, 294, 13848, 1310, 4079, 1646, 457, 3122, 1338, 718, 311, + 652, 300, 1051, 293, 286, 3879, 4626, 527, 3761, 382, 731, 370, 3231, 457, 337, + 1419, 
385, 13, 51814], "temperature": 0.0, "avg_logprob": -0.19059837186658704, + "compression_ratio": 1.660633484162896, "no_speech_prob": 0.0754031240940094}, {"id": + 358, "seek": 513180, "start": 5131.8, "end": 5139.8, "text": " It''s definitely + interesting fantastic all the best with haystack and and with your research and + development.", "tokens": [50364, 467, 311, 2138, 1880, 5456, 439, 264, 1151, 365, + 4842, 372, 501, 293, 293, 365, 428, 2132, 293, 3250, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.354872076851981, "compression_ratio": 1.3545454545454545, + "no_speech_prob": 0.007043542806059122}, {"id": 359, "seek": 513180, "start": 5140.6, + "end": 5143.8, "text": " Thanks a lot thanks all the bye bye bye.", "tokens": [50804, + 2561, 257, 688, 3231, 439, 264, 6543, 6543, 6543, 13, 50964], "temperature": 0.0, + "avg_logprob": -0.354872076851981, "compression_ratio": 1.3545454545454545, "no_speech_prob": + 0.007043542806059122}, {"id": 360, "seek": 516180, "start": 5161.8, "end": 5163.8, + "text": " you", "tokens": [50364, 291, 50464], "temperature": 0.0, "avg_logprob": + -0.7324387431144714, "compression_ratio": 0.2727272727272727, "no_speech_prob": + 0.705066442489624}]' +--- +
+Hello there, Vector Podcast. Season 2, we are relaunching after summer, and there was a little bit of a break; the last episode was from Berlin Buzzwords, and today, coincidentally, we have a guest from Berlin, Malte Pietsch, CTO of deepset, the company behind Haystack.
+So we're going to be diving into what I call a neural framework, but I wonder if Malte would give a different picture there; still, I'm very interested to learn and dive into multiple topics. Hey, Malte, how are you doing? I'm good, doing great. Thanks for having me today.
+How are you doing? I'm good. I'm great. It's still summer. It's super hot, as we were exchanging before the recording. It's super, super hot, but I like it.
+So yeah, I think before we dive into what is Haystack, I would really like to learn about yourself: what is your background, and how did you find yourself in this space of what we call vector search? I wonder if you describe it differently, but I call it vector search, vector search players.
+So can you tell a bit about that? Yeah, sure, I'm happy to. So I would say my background is mostly in NLP engineering, as I would probably call it these days. And during my studies, I basically had no clue about NLP. I think it wasn't really part of our coursework or really a thing.
+And for me, it all started basically after my studies, when I went to a research project in the US, which was at the intersection of machine learning and healthcare. And the big, big focus there was on numerical data.
+So we were basically trying to find signals and patterns in laboratory measurements of kidney disease patients to predict some kinds of risks. And that was all kind of numerical data.
+And NLP wasn't really in the scope of that project, but there was, for me, basically this one kind of event that made me then get in touch with NLP and eventually fall in love.
+It was really that in this project, we tried to predict a lot of these risk factors through a lot of, I would say, quite fancy modeling to get some good signals. And at the end, it kind of worked.
+We were able to predict some risks, but when we then talked to doctors and showed them these results or asked for their feedback, they said, yeah, yeah, that's all correct. Yeah, but it's not really new. We knew that before. But this part here, this is like, this is an interesting one.
+And this is what we do there. And that was basically the only small part where we looked at written notes of doctors during treatments. And from a modeling perspective, that was really, I would say, nothing fancy, nothing advanced, nothing where we spent a lot of time.
+But at the end, it was the point, I think, where the doctors, the physicians, saw the biggest value. And that kind of got me to think again; I thought, okay, well, just this kind of data source, it was something they couldn't really access before.
+And now, with these very simple, naive methods, they somehow saw a value, a new thing. And that's basically where I thought, oh, wow, it's cool.
+What can you actually then do with more advanced methods? If you have more fancy models, how can you make this kind of unused data source then accessible? And yeah, basically realizing this, the power of it.
+And that's basically when I then started digging deeper, working more on NLP. At some point, I then left research, because I was really interested in seeing these models working in the real world.
+How do they work at scale? How can they really solve problems every day? And basically, I came back to Germany, worked in a couple of startups, always at, let's say, the NLP-at-scale kind of intersection, a lot in online advertisement, recommender systems.
+And then eventually, four years ago, we started deepset. Together with two colleagues, we founded deepset basically because we saw this big momentum that was kind of piling up.
+It was still pre-transformers, but there were early signs, I think, in research that things were becoming more feasible and super interesting things became possible.
+At the same time, we also saw that there's this big gap, you know: things becoming possible on the research side didn't really mean people were using it in production in the industry. And I think we were in this interesting bubble back then.
+We applied deep learning models at scale, saw how that worked, but also saw how much work it actually is, how much manual work, to get it done.
+And basically the early days of deepset were mainly around: how can we bridge that gap, how can we get the latest models from research into production in the industry, what kind of products and tooling can we build to make that transition easier?
+Yeah, and that's basically how we ended up in the startup world, building out deepset. And, yeah, initially, it was really more that we saw this problem. We had a couple of product hypotheses, but we didn't, let's say, place a bet directly on one of them.
+We rather said, okay, let's go out there. Let's really try to understand, for one year, what are the really repetitive use cases out there? What are really the pain points of enterprise teams that are working in that field? And then settle on a product and build it out.
+Yeah, that's basically how, after one year, we ended up in search.
+And of course, I would say search was really the one use case, the dominant use case, that was present in every company that we worked with, and that was really a, let's say, valuable use case, where the push not only came from the developers who wanted to do something better, but also actually from the business side, where people saw big value in it.
+"I use Google every day, why can't we have something similar in our product or on our internal data sets?" And that was something that got us really interested.
+And at the same time, on the tech side, we were basically learning more and more about the pain points: why is it actually so difficult for people in these enterprises to build modern search systems, and what could you actually do to help them? Yeah, that's fascinating.
+Actually, four or five years ago, could you have imagined that NLP
+would cross paths with search? Because, like, in many ways, this sparse search, which existed for many, many years before, was in some sense... I sensed it that way in mailing lists, let's say the Apache Solr mailing list: people were dreaming about applying NLP
+in some way, compared to what is happening right now.
+I don't want to downplay those efforts, but I'm saying things like: you could embed a POS tag, a part-of-speech tag, on the term level, and then use that during search; again, you need to run some kind of parser on the query, and then use that payload information to filter through, let's say, adjectives and verbs or something. But, you know, I don't know if there was any practical application in place; probably there was.
+But again, if you compare that to what is happening today, you basically have a vast array of models, right, deep learning models, that can be applied directly to search using the vector search approach. Could you have imagined this happening when you were about to start the company?
+No, I would say, I think we had big, let's say, big dreams about NLP,
+and we were true believers that things would become easier and, let's say, more feasible in production, but that was actually more on, I would say, the transfer learning side and making models, let's say, more easily adaptable to certain domains. For search, I think, that came for us
+only then on our journey, when we kind of realized, oh, that's actually two interesting different fields kind of connecting over time, right. And also I felt, at least from my perspective, from the community side, from the people who worked on information retrieval,
+I think for a long time there were a lot of skeptical people when we were talking about ANN or dense, dense retrieval, for good reason, right, because I think there was also a lot of hype around deep learning, and there were a lot of promises that were made, like that it will just outperform sparse retrieval out of the box.
+And then I think many of these promises did not hold for a long time.
+But I think then basically there was another phase where people realized, oh, actually now it's kind of starting to work, and not only in research, in these ivory towers and lab settings, but actually also in reality, at scale.
+And I think that was then for us also, here, the moment where it got really interesting, and I think since then it's just crazy to see how things are progressing, thinking about multimodal search, or now, let's say, going away from document retrieval to maybe something like question answering, which we do a lot.
+And it's really, really crazy to see what's possible these days, and I couldn't have imagined that it would go so fast. Yeah, and there are a lot of contributors as well, of course. I just happened to give a talk about players in vector search.
+I will link it in the show notes; it was just published with the London IR Meetup. But even during that presentation, I felt like I was scratching the tip of the iceberg in some sense; I know there is so much happening.
+And in Haystack, like, did you have a vision for the product? Like you said, you didn't know what the product would be, but you knew sort of the repetitive use cases, in a way, right, and also the challenges. Can you share some of the early-day challenges that you saw?
+And do you think that they are solved today, or are they still kind of in the mix of "we need to fix some things there"? So I think that was basically all about this first year of deepset, where we did these learnings, where it wasn't that clear.
+But after that year, I think we had a lot of clear insights and, at least for us, a clear vision also for Haystack, what we wanted to solve there.
+And I would say the big challenge, the big problem that we focused on, that we saw in the industry, was having to glue all these different technologies together.
+And basically Haystack is trying, always, I would say as a design philosophy, a design principle, to have two things in place that try to bring these different technologies together in a meaningful way.
+And what I mean with that is basically: if you think about search, it's, let's say, really a lot more than one model, right? Typically you have vector databases,
+and you maybe chain together multiple models; you have something you want to do at indexing time, you have other things you want to do at query time.
+And for each of these, say, kinds of components that you need at the end, there are so many different options that you can plug in, and often it's hard to say in the early days:
+do I go for Elasticsearch or something like Pinecone as a vector database? Do I go for this model or that model? Do I need, I don't know, just a retriever in my pipeline, or do I actually also need to add a re-ranker or something else?
+And we just saw that teams were actually spending a lot of time on gluing these things together manually. And even when they had it done once, there was constant maintenance work, or iterations where they had to exchange one component of the system.
+And that was really just slowing them down a lot, and sometimes even causing that a project, over time, didn't really end up in production but kind of died at the prototyping stage, because it just took so long and things got kind of sidetracked.
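The building-block idea described in this passage, one clean retriever interface behind which concrete backends can be swapped without touching the rest of the pipeline, can be sketched in plain Python. This is a toy illustration, not the actual Haystack API; the `Retriever` interface and `KeywordRetriever` backend are invented for this sketch:

```python
from abc import ABC, abstractmethod

class Retriever(ABC):
    """Minimal retriever interface: any backend that returns documents for a query."""
    @abstractmethod
    def retrieve(self, query: str, top_k: int = 3) -> list:
        ...

class KeywordRetriever(Retriever):
    """Toy keyword backend: ranks documents by how many query terms they contain."""
    def __init__(self, docs):
        self.docs = docs

    def retrieve(self, query: str, top_k: int = 3) -> list:
        terms = set(query.lower().split())
        # Sort by descending term overlap with the query.
        ranked = sorted(self.docs, key=lambda d: -len(terms & set(d.lower().split())))
        return ranked[:top_k]

docs = ["haystack builds search pipelines", "vector databases store embeddings"]
retriever = KeywordRetriever(docs)  # any other Retriever subclass could be swapped in
print(retriever.retrieve("search pipelines", top_k=1))
```

Swapping one backend for another then means changing only which subclass you construct; code written against `retrieve()` stays the same, which is the maintenance problem described above.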
+And with Haystack, we basically tried to solve that by having very clear building blocks, like, for example, the retriever, with a very clean interface.
+And within that you can swap a lot of different technologies and models, and the same for, let's say, the vector databases, the document stores, where you can very easily change between something like Elasticsearch, Pinecone, Weaviate and whatnot.
+So I would say that was the one thing, these building blocks, and trying to get the focus of developers back on making these creative decisions, what they actually want to have in their pipeline, trying it out with end users, rather than just spending time on gluing things together.
+And the second thing is, I would say, a very deep concept in Haystack: pipelines. So really, what we saw is it's not just one model. It's typically a couple of steps that you want to have there.
+So in Haystack we started early on having directed acyclic graphs, where you can have different nodes, and basically when a query, or at indexing time a file, kind of hits the pipeline, you can route it through this graph. That can be very easy: there is, let's say, a query,
+I put it to a retriever and I get back my documents. Or it can go basically quite complex, where you say, okay, depending on the query type:
+if it's a keyword query, I route it a certain path in my graph, my pipeline; or if it's a question, maybe I go a different way and I have different models basically involved in my search request. And these two are, I would say, the core principles in Haystack. That's very interesting.
+So that second thing, the directed acyclic graph, would allow for very complex scenarios, right? Like, as you explained, we could in principle support a question answering use case side by side with the kind of, like, normal search with re-rankers and stuff, right? Is that correct? Exactly.
+So that's what we basically learned from customers: like, we saw there was a big interest in something like question answering, and people would say, wow, that's amazing, can we use that for our website or for our product here? But doing that switch in a production case is quite tough, right?
+Like, if people are used to doing keyword queries, and they know, "I have to enter keywords to get basically my results."
+And then, from one day to the other, you switch to more semantic queries, maybe more questions, or also, I think, dense retrieval, if you really have more sentences that you use.
+It takes some time for people to adjust, and we saw that in a couple of scenarios: basically the traffic, the kind of requests that come in, starts a lot with keyword queries and then over time slowly shifts towards more semantic queries,
+when people realize, oh, I can actually also ask a question here, just like in Google.
+And that's the trend, but you need to have an option in your system to allow both for a certain time, and Haystack does that basically with the query classifier, where you can initially classify: is that a question or a keyword query? Or you could also go semantically, like at a topic level, saying, okay, this is a query for a certain type of category in my document set,
+and then maybe do something different. And, like, early on, did Haystack integrate with any database per se? Was it Elasticsearch back then? Yeah, basically the starting point was Elasticsearch; it was the very first document store we had.
+But Elasticsearch back then, I believe, didn't support neural search, right? So how did you actually gel these things together? Yeah, that was just kind of coming in over time, right? I think the era where Elasticsearch was for us was really:
+we came from question answering use cases a lot, and there it was really like, how do we scale that? How can we now ask questions not on a single document, a single small passage, but actually on millions of files?
+And BM25 worked as a retriever step before that; it was okay, not too bad. And that's kind of how it started, and then it very quickly evolved into, let's say, the vector search direction, where we had FAISS basically as the next document store.
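The query-classifier routing described here, inspect the incoming query and send it down either a keyword branch or a question branch of the pipeline graph, can be sketched with the standard library alone. The rule-based classifier below is an invented stand-in for illustration; the real classifier discussed in the interview is a pipeline node and is typically model-based:

```python
def classify(query: str) -> str:
    """Toy heuristic: treat wh-questions and queries ending in '?' as questions."""
    q = query.lower().strip()
    wh_words = ("who", "what", "when", "where", "why", "how")
    return "question" if q.endswith("?") or q.startswith(wh_words) else "keyword"

def keyword_branch(query: str) -> str:
    return f"keyword search for: {query}"

def question_branch(query: str) -> str:
    return f"extractive QA for: {query}"

def run_pipeline(query: str) -> str:
    # The "graph" here is just two branches; the classifier node picks the path.
    branch = question_branch if classify(query) == "question" else keyword_branch
    return branch(query)

print(run_pipeline("haystack pipelines"))        # routed down the keyword branch
print(run_pipeline("what is a document store?")) # routed down the question branch
```

Both query styles keep working side by side, which is the transition period described above: keyword traffic still hits the keyword path while question traffic grows.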
+In combination with some SQL database for the metadata and so on. And then it basically, I think, took off on the vector database side with Milvus, Weaviate, Pinecone and so on and so forth; OpenSearch today is also part of Haystack. But that was, I think, just
+half a year after we launched Haystack. Oh yeah, that's awesome. That sounds quite quick. I know that Weaviate was also emerging at about the same time. And then Milvus, I guess, as well. Yeah, that sounds super cool.
+And as you were approaching your clients, or, like, prospects, was there any specific use case that you would be demoing, because you knew this would trigger the aha moment, like question answering, or maybe a specific domain where you did that?
+Yeah, I would say for us it was a lot around question answering back then; that was really what created, I think, many of these aha moments. I still remember we were at one client, in this meeting, and it was in the financial domain.
+So they were interested in asking questions on financial reports of certain companies and basically accelerating their analysis.
+And at one point in this meeting we showed what you can do with question answering, asked these questions, and they also suggested their own questions that we should ask, and they worked. So at that point they were convinced, oh, that's not fake, not smoke and mirrors here.
+And basically the boss of the department was standing up and shouting, like, wow, that's amazing, and went out of the office and into the office next door and carried over colleagues and said, you have to see that. And that was actually even before we started building Haystack, but these kinds of moments were very important, to see, like, this is something
+that is not just fascinating for techies like we were, but also, let's say, business people and users see that value, see value for their work in it.
+I can imagine that, and it's like a class of what we call knowledge workers, right? It's something that you spend so much time on, crafting these queries. And I have spent some time in full-text finance, I would say, at AlphaSense, and I remember some of the clients had accumulated Boolean queries over a period of 20 years, right, and they were so long, like several pages.
+When you slap that into Solr, it runs for three minutes, because our index layout was not what it is today and was not very optimal. And it's crazy to see what people kind of start doing as a workaround, right? So we had a similar case with an airplane manufacturer; it was not the financial domain but really more on the maintenance level, analyzing basically issues that come up maybe in certain technical areas. And they also had these crazy Boolean search queries, and people just became experts in crafting them, but it took them really long; like, sending one query, creating this query, was easily taking minutes.
+Yeah, exactly. And so, what Haystack is today: can you elaborate a bit on the architecture, and maybe, if possible, pick, let's say, a use case? Actually, recently I was talking to one stakeholder who wanted to build a chatbot, but it was a very specific domain, so that chatbot would actually ask you some kind of philosophical question.
+So I think it's these very difficult, Zen-like questions, sort of a little bit distracting you from what's going on. Let's say you are at a conference and a lot of things go through your mind, but you maybe don't register what's going on, you don't get to see the value, and that Zen bot might kind of ask you and, well, essentially allow you to pause and reflect, right?
+What I realized is that, yeah, I could pick an off-the-shelf model, let's say a question-answering BERT or something, but it probably wouldn't work for what I want, right? My domain is different, and I had an electronic book with these Zen-type statements.
+So the question I'm hinting at is fine-tuning, or maybe even retraining. But where would I start with Haystack? Can you walk me through the architecture?
+So, as mentioned earlier, the core principles are these building blocks, and using these building blocks to assemble pipelines. I would say the core we come from is question answering and search, but by now the framework has evolved a lot: there are a lot of different nodes, and it can support a lot of different use cases, going to translation, zero-shot classification.
+And you can use these nodes in isolation, or you can assemble them and use them within your search pipeline.
+So usually, I think, how our users start is: they often come with some kind of search use case and pick one of the standard pipelines that we have. With a few lines of Python you can very easily create a pipeline for, let's say, question answering or maybe dense retrieval.
+You pick a document store, you pick one model from, for example, the Hugging Face model hub, and we give some recommendations on which models might be a good starting point. And then it's actually very easy to just put your files into a pipeline; they can be PDF files, we do the conversion basically for you, there's a node for it.
+And you have a basic, say, demo system up and running in a few minutes, and that's often already a good starting point if you are maybe new to that field, or if you just want to try it out quickly on these kinds of ebooks that you mentioned.
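The building-block idea described here, nodes assembled into a pipeline, can be sketched in plain Python. All names below are illustrative stand-ins, not Haystack's actual API:

```python
# Minimal sketch of the node/pipeline idea: each node is a callable
# that transforms a shared state dict, and a pipeline just chains nodes.
# The node names and logic are made up, not Haystack's real classes.

def pdf_converter(state):
    # Pretend we extracted text from the files listed in state["files"].
    state["documents"] = [f"text of {f}" for f in state["files"]]
    return state

def retriever(state):
    # Keep only documents mentioning the query term (stand-in for BM25/dense).
    q = state["query"].lower()
    state["documents"] = [d for d in state["documents"] if q in d.lower()]
    return state

def reader(state):
    # Stand-in for an extractive QA model: return the first matching document.
    state["answer"] = state["documents"][0] if state["documents"] else None
    return state

def run_pipeline(nodes, state):
    for node in nodes:
        state = node(state)
    return state

result = run_pipeline(
    [pdf_converter, retriever, reader],
    {"files": ["report.pdf", "notes.pdf"], "query": "report"},
)
print(result["answer"])  # text of report.pdf
```

The point is only the shape: converters, retrievers, and readers are interchangeable nodes, so swapping one implementation for another leaves the rest of the pipeline untouched.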
+You get a first, let's say, understanding of the quality of these off-the-shelf pipelines for your use case, get this first data point, and then enter what are typically the next steps in your project, if you see: this is promising, but not enough for really going to production.
+Then you typically go more into this experimentation mode: let's now maybe evaluate and compare a couple of different models, let's maybe adjust this pipeline a bit, or add a re-ranker, or maybe go to a hybrid retrieval pipeline, where we
+basically have a BM25 retriever in parallel to a dense retriever and join these documents. Haystack has a lot of functionality that makes it easy to change a pipeline as you want, very quickly, and then evaluate whether that gives you any benefit.
+If these, say, off-the-shelf options and combinations are not enough for your use case, then yeah, you can go down the fine-tuning route. We also have an open-source annotation tool, a labeling tool, where you can create training data and basically fine-tune parts of your pipeline, the retriever or the reader for question answering.
+So basically, I would say, everything from a quick prototype, to "let's do some experiments here and there," to then going to production and deploying it with a basic REST API.
+Sounds cool. And so in that experimentation mode, I guess one aspect is fine-tuning, as you mentioned, right? The other is which building blocks I could plug in. And I know you guys have really good documentation; is there something like a tutorial, or some kind of walkthrough, that would even help me, as a user, discover what the options are?
+So we have a couple of different tutorials showing you what kinds of nodes you can use. For example, many people are not aware of the options you have at indexing time that might be helpful.
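The join step in such a hybrid pipeline can be illustrated with reciprocal rank fusion, one common way to merge a sparse and a dense ranking. This is a sketch of the general technique, not Haystack's own join node, which offers its own merge strategies:

```python
# Reciprocal rank fusion: each retriever contributes 1/(k + rank + 1)
# per document; documents ranked well by both retrievers float to the top.

def reciprocal_rank_fusion(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]   # sparse retriever output
dense_ranking = ["doc_b", "doc_d", "doc_a"]  # dense retriever output

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Here doc_b wins because both retrievers rank it highly, even though neither puts it strictly first; that is exactly the benefit the hybrid setup is after.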
+For example, enriching your documents with metadata can be incredibly powerful later at search time, because you can then filter your search space down to the categories that you're interested in. And there we have, for example, tutorials that show you how easily you can do that.
+For example, classify documents that you index into certain categories, and then later on, at query time, use these categories to narrow down your search space by filtering for them.
+And on the model side: say you now know that you want a QA model, a reader, and you're interested in which model to pick.
+I would probably suggest you just go to our benchmarks page, which is linked from the documentation; there we have a couple of comparisons in terms of accuracy and speed. But we also have most of our own models on the Hugging Face model hub, where you find this information in the model cards.
+Yeah, that's awesome. So, in addition to the open source version, which I presume I could host completely myself, right, and I still have a bunch of questions on that open source side, you also offer the cloud version, you call it deepset Cloud, is that right?
+Can you explain what users get with that? I presume scalability, but maybe something else. And I think we can leave a link in the show notes as well, for those users who want to try it out.
+Yeah, basically Haystack, the open source product, is a Python framework, and you can do everything you want there: prototype, run experiments, and, if you want, also go to production with it.
+But we also found that, in addition to that, people want something more: they want a really hosted platform that is truly end to end, where you have faster workflows, so really covering the whole lifecycle of an application, from early prototyping to running many experiments in parallel and getting more guidance.
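The index-time classification idea can be sketched as follows; the keyword lookup is a trivial stand-in for a real document classifier, and all names and data here are made up:

```python
# Sketch of index-time enrichment: tag each document with a category when
# indexing, then restrict the search space by category at query time.

CATEGORIES = {"revenue": "finance", "court": "legal", "patient": "health"}

def classify(text):
    # Trivial keyword "classifier"; a real setup would use a model here.
    for keyword, category in CATEGORIES.items():
        if keyword in text.lower():
            return category
    return "other"

index = []
for doc in ["Revenue grew 12% last quarter.", "The court dismissed the case."]:
    index.append({"content": doc, "meta": {"category": classify(doc)}})

def search(query, category=None):
    # Filter on metadata first, then do the (here: naive substring) search.
    docs = [d for d in index if category is None or d["meta"]["category"] == category]
    return [d["content"] for d in docs if query.lower() in d["content"].lower()]

print(search("quarter", category="finance"))  # ['Revenue grew 12% last quarter.']
```

The metadata filter shrinks the candidate set before any ranking happens, which is why this pattern pays off at search time.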
+Guidance on what to try, and investigating certain documents in a faster way. And then: OK, now I did all these experiments, and I want a kind of one-click path to production; I don't want to bother with any scaling and productionizing on my side.
+And this is basically what we do with deepset Cloud. So imagine it as a hosted platform, a SaaS platform, where you develop your NLP applications and can easily bring them to production and monitor them afterwards. So really the whole lifecycle, and especially
+getting your NLP pipelines to production faster than you would probably manage at the pure Python level, and then continuing to monitor them and closing the loop when you later want to maintain them.
+Sounds cool. So with the open source version, I presume, I could do local development on my PC, right, and then use some deployment pipeline to deploy; with the cloud version I have sort of a managed Haystack. And now, thinking about developer experience: are you guys moving more towards cloud tools as well? Like, for example,
+a code editor could be in the cloud: I make the changes, click the button, and off it goes; I don't even need to download anything locally, right? Or do you see some other trend with your users?
+No, and maybe that's also an important point: it's still a developer platform, right? We are not in the low-code/no-code space, and what we really try is to give developers the option to customize components, and that happens through coding. And there we have, for example, editors directly on the platform where you can
+edit, for example, just the YAML definition of pipelines and quickly switch certain parameters, if you want to do that. And then there are basically hosted notebooks, where you can also easily open these resources, like a pipeline, and we automatically create some Python code for it in a notebook that you can then
+edit, as you know it also from Haystack open source.
+Adjust a certain component, debug it, maybe add another one, and then it's basically just one Python line again to move from the Python code in your notebook back to the production artifact, to the pipeline that is then deployed and can run in production.
+Yeah, sounds cool. And if a user, and as a user I mean it could be a company, right, has an established toolset, maybe they use SageMaker, maybe they use something else: how do you reach into that toolset that is outside of Haystack? Do you have to?
+I would say in most cases not. Where we basically stop with the cloud is: when you have your pipeline as an NLP service and you have your REST API that you expose, that's kind of where we stop. So there's a lot of, I would say, stuff in a company that is built around that, when you integrate it into your product, and also on the other side: where do the files come from, where does the
+data come from, how do you bring it into deepset Cloud. But within that space we rather see customers who appreciate that it's fully integrated, and they don't usually want to stay on SageMaker, if they are on it, for these NLP use cases. So from our perspective,
+those are the more generic solutions that are not specific to NLP but work for any kind of machine learning. But if you really have cases where you want to be faster on your NLP use cases
+and want to have more, say, support on that side, that's basically where deepset Cloud comes into play. To give you an example: think of experiments, where you evaluate these pipelines.
+There you have to give people a lot of options to investigate predictions and what these metrics actually say, and this is something that is usually missing in solutions like SageMaker.
+You would have to combine them with many other tools and build a lot of extra stuff in there. And that all comes together already with deepset Cloud.
+So, did I get it right that deepset Cloud offers me a sort of evaluation toolset? Can I get the same in the open source version, or is it not present there?
+You can basically evaluate single pipelines in the open source version too.
+The difference is that in deepset Cloud you have a full overview of your project, where we track all your experiments and you can compare them.
+You can easily launch 20 experiments in parallel, and this actually works on large datasets; with open source, generally, you would need to provision a lot of machines, GPUs, to run that in parallel.
+That's one thing that we offer in deepset Cloud, and the other is, I would say, the UI layer over it. Of course I can work with Haystack and get a report around my experiments, maybe a pandas DataFrame, and I get some metrics.
+What we additionally do on top in deepset Cloud is allow people to interact with this kind of data more easily: finding examples of queries that fail, or that are successful, and getting feedback from end users, so collaborating with
+basically the people who use that search system in the end.
+And that's also what we saw a lot: you can extract your predictions, maybe as a CSV, and then you share it with your colleague, who then rates them or gives, say, a human evaluation of whether the results for these queries make sense or not.
+But again, this is a lot of friction; you have a lot of these CSVs floating around.
+And what we do is, I think, bring this together again, having it in one place, so that you can also easily reuse it in the future for other experiments, and even use it for training, and have it in this one central place.
+Yeah, sounds amazing. From what I gather, this sounds like an end-to-end MLOps platform, specifically for NLP and neural search, right?
+Exactly. You have thought through so many things: not only the developer side of things, like experimentation, but also debugging, and actually going through the feedback from stakeholders or users, and then communicating with them.
+Yeah, and I think this is something that is missed in many projects, this end-user collaboration. From our experience this should really happen at a very early stage of a project, and then kind of continuously when you move to production, and even when you are in production.
+And I think this is something which, if you don't have the right tooling, is very annoying: probably just building a demo, like a UI for some search system.
+If you are not a front-end developer, if you're an NLP engineer, it takes some extra time, and even with something like Streamlit these days it is still annoying to do it properly, and in an enterprise you maybe have to sort out access and permissions.
+But it's so important, I think, when you look at which projects work out in the end, which pipelines of our customers go to production.
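The "one central place" idea can be sketched as a minimal feedback store, assuming a simple thumbs-up/down signal per query; every name here is hypothetical, not part of any deepset Cloud API:

```python
# Sketch: instead of CSV files floating around, every piece of end-user
# feedback lands in one store, keyed by query, so the same records can
# later feed evaluation reports or training data.

feedback_store = []

def record_feedback(query, answer, thumbs_up):
    feedback_store.append({"query": query, "answer": answer, "positive": thumbs_up})

record_feedback("revenue next year?", "it will increase", True)
record_feedback("revenue next year?", "no answer found", False)
record_feedback("who is the CEO?", "Jane Doe", True)

def satisfaction(query):
    # Per-query satisfaction rate for an evaluation report.
    votes = [f["positive"] for f in feedback_store if f["query"] == query]
    return sum(votes) / len(votes)

print(satisfaction("revenue next year?"))  # 0.5
```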
+It's really a big criterion, I think: in the early days, sharing a demo with your colleagues and end users, really the first pipeline you have, more or less giving it into the hands of users and seeing what they think about it and how they use it.
+And there were so many examples where NLP engineers thought they knew what people were searching for, but after these kinds of demo sessions, or sharing it, they got to see what people actually do there.
+And then they realized: oh, they use a lot of keyword queries, or they never put a question mark at the end, or they have a lot of misspellings, what have you. So I think there are a lot of early learnings that you can make as a developer from these demos and from understanding all of that.
+And also, I think, on the other side, just creating this early aha moment, this kind of wow effect, and some trust on the end-user side, is also crucial.
+So I would say that's cycle point one: a very early demo, getting this initial feedback. And then probably the second point that we see often is when you have had a stretch of running your experiments, tuning your pipeline, on the way to production.
+I think then at some point there's a second phase where you just do, again, some manual evaluation with end users, so you're not completely relying on machine-learning metrics.
+Because we think there's a kind of metric blindness in the industry: sometimes you just get obsessed with this one metric that you optimize in these experiments, whatever it is, just increasing it from experiment to experiment.
+And you go to production and you realize: wow, okay, this metric doesn't say anything about the user satisfaction that I get in the end.
+And there are so many examples from our customers where just handing out this pipeline, showing, say, search queries and results, and then collecting some easy thumbs-up/thumbs-down feedback,
+and then trying to correlate: is that really what we also saw in our experiments, in our metrics? And the thing in many cases was that either the pipeline was not yet ready for production and was far less accurate than they thought,
+or there were also cases where it was the other way around, where teams thought: we're stuck, we will never get beyond, like, an F1 score of 60%; we're not getting anywhere here, it's not working.
+And then they handed out these predictions, or gave out this demo, and people actually said: no, these predictions are perfectly fine.
+And when you then dig deeper, I think it's often that engineers don't look enough into the data and just rely on this high-level metric. And the thing is, especially nowadays, these metrics only tell part of the story, for question answering but also for search.
+If you have an evaluation dataset, and let's say you always label the exact answer for a certain question or query, there are just so many ways a model can give a correct answer to a question that is different from this label. To give an example:
+we have many customers in the financial domain, so a typical question there is: how will revenue evolve next year? And maybe in your dataset, the evaluation dataset, you labeled: "it will increase by 12%".
+And now at prediction time your model maybe finds another passage, or generates the answer, and says: "it will significantly increase". So there's barely any overlap on the lexical side; still, both answers make sense and are correct, and we can probably debate now which one is more accurate.
+But in many cases they basically give the same answer semantically; they're just formulated very differently, and that's where, I would say, traditional metrics fail.
+So yeah, we need better metrics, and we did some research work on that; it's also part of Haystack, where you can use something like semantic answer similarity as a metric.
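The revenue example can be scored with the traditional lexical metrics to show the failure mode: exact match is 0, and token-level F1 gets partial credit mostly for shared filler words, even though both answers are semantically fine. A minimal sketch of SQuAD-style EM and token F1:

```python
# Lexical QA metrics on two semantically equivalent answers.

def exact_match(pred, gold):
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred, gold):
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

gold = "it will increase by 12%"
pred = "it will significantly increase"
print(exact_match(pred, gold), round(token_f1(pred, gold), 2))  # 0 0.67
```

The 0.67 comes entirely from "it", "will", and "increase"; the same F1 would reward a wrong answer that happened to reuse those words, which is the gap a semantic similarity metric tries to close.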
+But it's of course also just, I think, looking at your data and looking at these predictions, and seeing whether they're really wrong or actually okay, and whether maybe it's a problem of the metrics or of your labeling process, where maybe you need to collect more of the different answers that are okay.
+Yeah, I totally agree. It's the challenge of intersecting user language with whatever machinery you have to answer it, be it sparse search, be it dense search, it doesn't matter. Users don't care; what they care about is that their language is understood, and often enough it's not.
+Especially around things like BERT, if we go dense: a BERT model doesn't understand negations, right? There was a research paper on that, and that might actually do harm. There was even a Google example showing the opposite: you say "I don't want that," and the system says, "yes, you actually do."
+"So take that medicine," which might be harmful. And then the metrics: essentially, what I get from what you just described is that you have offline metrics, let's say NDCG or precision or recall, whatever, and then you have online metrics, right?
+And actually crafting the online metrics is also an art, and a never-ending journey. Just recently I came across one blog post which was shared by a former Netflix engineer.
+I will make sure to link it in the show notes as well. It describes a click-residual metric: what your expected success is on, let's say, that segment of your market, on those queries, versus what you actually got. People still keep trying and trying, but the system just doesn't deliver, so you could take these cases as low-hanging fruit to fix your system.
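One of the offline metrics just mentioned, NDCG, can be computed directly from a ranked list of graded relevance judgments; a minimal sketch:

```python
# NDCG: discounted cumulative gain of the actual ranking, normalized by
# the DCG of the ideal (best possible) ordering of the same judgments.
import math

def dcg(relevances):
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1, 0]))           # perfect ranking -> 1.0
print(round(ndcg([0, 1, 2, 3]), 3))  # same documents, worst order -> < 1.0
```

This is the offline side; the online metrics discussed next (click-based signals and the like) can only be measured against live user traffic.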
+So do you see, and maybe that's already happening in Haystack, or do you see that it might happen, that I as a user could describe my own metric, let's say in the form of Python or JavaScript code, whatever, plug it into Haystack, and let it measure what I want, kind of mimicking the online metric in substance?
+So, providing custom metrics: yeah, you can do that to some degree already, by basically plugging in a Python function and forwarding it. That's one way. The other is probably on the node level: you can imagine that in this pipeline there are, at various points, a lot of connections, be it answers or documents, so you can also easily add custom nodes that say, "at this point, compare the output to whatever you want," or, in an online setting, write some logs somewhere, take some signals from the real queries, and that's a way you can monitor it.
+So yeah, I think that's probably one of the next steps, where we'll see more and more online metrics, more and more online experiments. Right now, I think big parts of the market are more in the phase of developing, experimenting, finding the pipeline, getting it initially to production, and having a really, I would say, smooth journey, a fast path to production, and high success rates for these projects. And I would say we're right now focused more on that.
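Plugging in a custom metric as a Python function might look like this sketch; the evaluator and the metric are both hypothetical, not Haystack's actual evaluation API:

```python
# Sketch of the "pass in a Python function" idea: the evaluation loop
# accepts any callable (prediction, gold) -> float.

def evaluate(predictions, golds, metric):
    scores = [metric(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)

# A hypothetical business-specific metric: full credit only when the
# numeric tokens in prediction and gold answer match.
def numbers_match(pred, gold):
    digits = lambda s: {t for t in s.split() if any(c.isdigit() for c in t)}
    return 1.0 if digits(pred) == digits(gold) else 0.0

preds = ["revenue rose 12%", "profit fell"]
golds = ["12% revenue growth", "profit fell 3%"]
print(evaluate(preds, golds, numbers_match))  # 0.5
```

Because the metric is just a callable, the same loop could take an off-the-shelf metric or a proxy for an online signal without the evaluator changing.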
+But yeah, I would say further down the road, if you really think about the whole NLP lifecycle, on the monitoring side there is the logging and the online metrics, but also things like data drift: are my queries actually shifting in a different direction? And related to that, query profiles: what actually characterizes this use case, how can we describe this query distribution? This can be on a formal level, like, say, questions versus keyword queries, but it could also be on a topic level, to understand what the profile is at one point, match it with certain pipelines, and also see whether it's changing over time.
+Yeah, you somewhat anticipated, or partly answered, my next question, about where you see the biggest effort in Haystack and deepset Cloud going, let's say beyond MLOps, you know, tightening the knobs and making sure that this flies and works correctly.
+More towards, well, I know you guys are also hiring a product manager, so more on the vision side. And connected to that, if you will: what do you think is still missing in the market today? Maybe in understanding, maybe at the perception level, maybe in tooling. You already alluded to things like metric blindness, and users getting stuck and thinking it's the wrong system, when actually it's not, they just didn't look the right way, and things like that.
+Yeah, there's, I think, a ton of work left. We already talked about it: things progressed a lot in the last years, it's crazy to see, but still I feel we're in the middle of it, or just starting, and there's so much more work, and things you can improve and do better.
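One simple drift signal along the lines described here, the share of question-form queries versus keyword queries across two time windows, can be sketched like this (heuristics and data made up):

```python
# Sketch of a formal-level query-profile drift signal: compare the
# fraction of question-style queries between two time windows.

QUESTION_WORDS = ("how", "what", "why", "when", "who", "where", "is", "does")

def question_share(queries):
    hits = sum(q.lower().startswith(QUESTION_WORDS) or q.endswith("?") for q in queries)
    return hits / len(queries)

week_1 = ["how to export data?", "what is a node", "pricing", "api limits"]
week_2 = ["pricing", "api limits", "sso setup", "what is a node"]

drift = question_share(week_1) - question_share(week_2)
print(round(drift, 2))  # 0.25
```

A real monitor would add topic-level comparisons (for example, over embedding clusters) and an alerting threshold, but the shape is the same: profile each window, compare the distributions.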
+Yeah, I would say for us right now there are a lot of different directions, but especially on the open source side we want to improve the developer experience, also simplifying the first steps within Haystack. I think it can still be overwhelming, and I really want to make sure we get as many people as possible to the first aha moment: using your own data, asking a few questions, comparing sparse to dense retrieval, and really experiencing this first hand. I think that's one of the things we work on.
+Then a lot around multimodality: we recently added support for tables within Haystack, so I think one interesting direction right now is that you can really query into these kinds of tables in your documents, and maybe also, further down the road, into your SQL database as another data source. And then of course everything around images, video, and audio is also interesting for us and, I think, for our customers.
+It's typically less important than text and tables, but still I think these are interesting options that you can pursue there.
+So yeah, that's a lot on the open source side. On deepset Cloud, we recently launched the experiments module, which was one big step forward there, and now it's a lot about also giving guidance and suggestions there.
+Like, for example: now I ran an experiment, I have a lot of these metrics, I have a lot of data that was somehow generated, but since it's not a single model anymore, it's a pipeline, I really want to understand, as a data scientist, okay, where should I now focus, what's probably a good way forward to improve this pipeline? Is it rather a retrieval problem, is it rather
+another node that I should improve, is maybe something wrong with my evaluation dataset, should I go back to labeling? And at the least, making these kinds of analyses easier.
+That's something that we work on right now. And then, I think, further down the road it will be a lot about expanding along this whole ML lifecycle that we talked about, right, monitoring and so on, but also just making it simpler to integrate at both ends.
+On the one side, ingesting your source data more easily, bringing it more easily into deepset Cloud, so that you can say: maybe I have a wiki system that I use, maybe I use Notion, or maybe I use
+Confluence, or I have another Elasticsearch cluster that already holds the documents I'm interested in. So having smooth connectors there, so that you can import your data and directly work on it. And then, on the other end, the API: how can I now easily get a search bar, or search functionality, into my final product?
+So there are a lot of things there. And then everything around fine-tuning and few-shot learning with large language models, which is something we are quite excited about. Because, as I mentioned, I think there has already been a big step forward, in that there are a lot of use cases where you don't need to train at all anymore. And maybe that's a misperception that you still see in the market.
+I think typical users come to us and say: oh yeah, for this use case, how can I train? And then we usually ask: do you really need to train your own model? Have you tried this and that, these kinds of combinations of the models that are out there, certain sentence-transformers, certain pre-trained QA models or ranking models? And they say: no, no, our use case is different, that won't work. And
+in many cases it does work, or at least they're surprised how good it already is, and maybe it's enough to get started with. So I think that's one misperception. Still, there are then also these cases, to be fair, where fine-tuning still helps, and where you really care about whether you
+can get a couple of percentage points better accuracy, and where you then go down that route and say: let's now start labeling, let's collect data, either in this manual labeling process or maybe from noisier, real production data, where you saw what people searched and what they clicked, and how we can maybe use that for training. That's somewhere we see big potential, probably for next year.
+We basically want to simplify this domain adaptation, to have less manual effort and a more automated way of training, and I think that also goes in the direction of large language models.
+Yeah, sounds cool. And if we look even further into the future, say, I don't know, 5 or 10 years out: do you think that Haystack at some point may even start suggesting to the user what to try? You know, you go and set up a KPI for yourself, your end goal, and then through the chain, the DAG, the click graph, it finds a weak node, say, and tells you: something is going on there.
+Then it would actually suggest you try some other model. Do you think that's possible, or do you think it's the wrong direction altogether, and you'd rather leave this to the creativity of your users?
+I think it's a combination of both. I definitely think it helps to accelerate certain parts of your work, so especially, I think, suggesting what experiment to run next, or what could be something to try.
+So I'm a big fan of that, and I think we don't need to go 5 or 10 years down the road; that will happen much sooner, in Haystack and deepset Cloud.
+And maybe just one thing on that: at our company we have something called Hack Friday, one Friday every month where every person in the company can work on whatever they want, so really hacking on crazy ideas, trying stuff out. And I know that this Friday people are working on a generative model where you basically describe what you want, what kind of pipeline, so you can type in,
+let's say, "I want a document search pipeline that works on legal data and that is very fast," something like that, and the output is basically a YAML file that describes this Haystack pipeline, which you can then easily load in Python and try out, or also load into deepset Cloud and run it there.
+So that's actually what we are experimenting with right now, and of course, some time further down the road, I could see that you could also take signals from what we know worked in certain domains and basically fuse that into this, maybe, generative process.
+Yeah, it sounds cool. It actually reminded me of the time when I was doing my PhD, something like 12 years ago, a bit more.
+I had a collaborator who wrote a paper on taking user text and converting it into C++ code. I don't remember exactly all the details of the use case, but I remember it was something at an airport, where they do a lot of this routine work.
+And instead of repeating it, you could actually build a smarter system, right? So do you think this could be the future of Haystack, or maybe the industry at large?
+Yeah, at least I think it's, if you want, one element that helps with accelerating. If you also look at Copilot right now: I like it a lot for coding, and I'm still, in many cases, surprised by what Copilot suggests there on the code level.
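A pipeline-describing YAML file of the kind mentioned here might look like the following sketch, modeled on Haystack 1.x pipeline definitions; the component choices and parameters are illustrative, not actual output of the generative model being discussed:

```yaml
# Illustrative Haystack-1.x-style pipeline definition: named components
# wired into a query pipeline. Component types and params are examples.
version: "1.9"
components:
  - name: DocumentStore
    type: ElasticsearchDocumentStore
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 10
  - name: Reader
    type: FARMReader
    params:
      model_name_or_path: deepset/roberta-base-squad2
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Reader
        inputs: [Retriever]
```

Because the whole pipeline is declarative, a generated file like this can be loaded and run without writing any glue code, which is what makes it a natural target for a text-to-pipeline model.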
+And I think something similar is also possible on the machine learning side, where you not only generate correct code but really something that fits your use case as you describe it.
+I mean, if you think about the big picture, I think it's one piece that helps you in your workflow. There are still many, many other pieces that we need to get right, and it won't be, let's say, the holy grail in the end.
+What I really believe in is that you need a framework, or a platform, whatever you want to call it, where you can easily compare things on your data. And I think this helps a lot in creating transparency in the market, and also creating
+trust for your own use case, so that you are not basically making a technology choice before you have actually started working on your use case. And I think that holds for vector databases, where maybe today this one is a good choice for you, but maybe, I don't know, one year down the road you want to switch. I think this market is so early that it's very hard to place a bet right now on one of these technologies. And similarly, I think, this holds on the
+modeling side. There's so much crazy buzz around large language models, and you can clearly see the trend going there, but it's also, I think, very important to understand whether that's really useful for your use case right now, and how it compares to much smaller models. And this should be easy, right? This shouldn't be a big part of your project. It should rather be:
+you think about the options you want to try, maybe getting some suggestions as well there, but that would be, I think, the human-creativity part. And then the actual
+swapping of components, and comparing them, making them comparable: I think that's nothing you should spend time on as a developer.
+And connected to the question about the future, maybe closing off on that: we recently built, with a colleague of mine, a multimodal and multilingual search demo, where we used a CLIP model off the shelf, without any fine-tuning, on web data, and it showed us really amazing results. Like, cases where keyword search cannot find anything, because, simply,
+the metadata doesn't have it, and it's multilingual, right? And you type the same query with neural retrieval and it gets it.
+Is there anything stopping Haystack from moving in that direction as well, sort of crossing the boundary of text only? You did say multimodal in the context of, let's say, querying a table, but I could also query an image.
+So is Haystack going in that direction as well?
+Yeah, so we are actually right now working on it. We have a first case that we want to support where you have a text query but you can also query into images on the other side. And then basically the other way around, having an image as a query to find different media types, would probably be one of the later ones, I'd say.
+But yeah, this is definitely what we are working on right now. I also think we always need to see what the big use cases are, what kinds of customers there are, and how they would use it.
+I think with images there are a lot of interesting use cases, mainly in e-commerce, I would say. So yes, it's already supported to some degree, and we will support more, I think, in the next months.
+That's great to learn, and that also means that I need to adjust my classification, because I've been presenting what I know about the players in vector databases and neural frameworks, and specifically for Haystack I put NLP as the main vertical, and I think largely you guys still advertise that as the main vertical. But I think nothing stops you from
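The CLIP-style retrieval described here reduces to nearest-neighbor search in a shared embedding space: text and images are embedded into the same vector space, and ranking is just cosine similarity. A sketch with made-up toy vectors standing in for real model embeddings:

```python
# Cross-modal retrieval sketch: a text query vector is compared against
# image vectors that live in the same embedding space. The 3-d vectors
# here are invented; real CLIP embeddings have hundreds of dimensions.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.9, 0.1, 0.0]                    # pretend embedding of a text query
images = {"scooter.jpg": [0.8, 0.2, 0.1],      # pretend image embeddings
          "cat.jpg": [0.1, 0.1, 0.9]}

best = max(images, key=lambda name: cosine(query_vec, images[name]))
print(best)  # scooter.jpg
```

This is also why the approach is naturally multilingual: as long as queries in different languages embed near the same images, the same similarity search works unchanged.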
+extending that to multimodality, right? So NLP, computer vision, and maybe even speech at some point. +Yeah, totally. I think our approach there is just a bit like doing one thing to quite a depth first and then moving on to the next, rather than, let's say, starting with very high-level basic support for all modalities and then growing all of them. +So what we rather did in the past, and are still doing, is very deep support for text, and we want to have everything in place there before moving on to the next. That's a bit of a philosophy question, maybe a strategy question, how you want to do it. +So this multimodal field is changing quite a lot, right? A lot of things: generative models, really big models, models that I don't even know how to use yet, like DALL-E, +beyond just experimental interest, although probably there will be some use cases. Where do you think the trends are going in this space? + Yeah, so one big trend, I think, is for sure these large language models and everything around them, and as I talked about earlier, the question is where this is right now: is it already really usable today, is it already worth investigating and comparing them for your own use cases? + I think there we are, I would say, still in an early phase. Look, for example, at GPT-3: there was a quite nice analysis earlier this year where embeddings from GPT-3 were compared with more standard-size transformers, and there I think we saw that the performance is not bad, but it's also definitely not outperforming, +let's say, regular-size models, which are a thousand times smaller and cost a few dollars, not thousands or tens of thousands of dollars, in inference costs. +So I think right now it's basically: let's see, case by case, whether it makes sense for your use case. But if you look a bit further into the next years...
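The cost gap being described is easy to make concrete with back-of-envelope arithmetic. The corpus size and per-1k-token prices below are illustrative placeholders, not quotes from any provider:

```python
def embedding_cost(num_docs, tokens_per_doc, price_per_1k_tokens):
    """Total cost in dollars of embedding a corpus once."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1000 * price_per_1k_tokens

corpus = dict(num_docs=10_000_000, tokens_per_doc=200)

# Hypothetical prices: a hosted large-model embedding API vs. a small
# self-hosted sentence encoder (amortized compute), ~1000x apart.
large = embedding_cost(**corpus, price_per_1k_tokens=0.02)
small = embedding_cost(**corpus, price_per_1k_tokens=0.0001)
print(f"large: ${large:,.0f}, small: ${small:,.0f}")  # large: $40,000, small: $200
```

At a roughly thousand-fold price difference, the same ten-million-document corpus moves from tens of thousands of dollars to a few hundred, which is why the "compare case by case" advice matters before committing to the biggest model.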
+ I'm pretty sure, and convinced, that it is only a matter of time until we see more and more large language models really in production, also in search pipelines, and I think right now we are in this phase of figuring out how we can make them more efficient and more reliable, so we can really trust their results, + and how to update them to new knowledge in an easier way. What I now look into a lot, and what I'm personally quite excited about, is this area of research around retrieval-based NLP. So yes, on the one hand, there is scaling up the models, making them bigger, because we learned over the last years that they are good few-shot learners, and I think that's really good. +And that's of course exciting, because you can just take these models and throw a task at them and they will perform, so there is less manual work of annotating data, creating domain-specific datasets, and so on. +But I think we also saw that they are not very efficient, and there are these other problems: + how do you actually teach them about recent events or about your own domain knowledge? Typically, these datasets that you want to search in are not static, right? They are constantly evolving, and you don't really want to retrain these huge models every few days or weeks just to catch up. +And I think that's where this stream of retrieval-based, or retrieval-augmented, models is super interesting, and there's a lot of cool work. Just this week there was the publication from Patrick Lewis and colleagues around the Atlas model. Not sure if you saw it. But the basic idea is: +can we somehow remove, say, the memory part from these big models and outsource it to a database, to an index, so that at query time we still have a large model
+ that can do complex reasoning, but it bases the generation on some retrieved documents. And that can be useful for search, but also for fact checking or other use cases. Long story short, I think they did a lot of interesting experiments in that paper that show you can actually outsource quite a bit of these parameters, of this memory, into, say, a vector database, and still keep the few-shot capabilities of these giant language models. +And I think this is a super cool route: larger models, but still not putting everything into them, not blowing up the parameter size unreasonably, and instead combining them with, let's say, an external document base or knowledge base. +Yeah, I think the topic you touched upon is fascinating: on one hand, let's say you have a model, right? And if you keep retraining it or fine-tuning it on the latest data, you may run into this... +I think it's called catastrophic forgetting, right? Like things that we as humans know, say what a liquid is, on a high level, without going into chemistry. + It's not that we think about it every single day when we drink water, but we don't actually forget it if somebody asks us. No matter how many news articles, papers, or books we read, we still remember the basic facts. And I think what you just said about the Atlas model, this approach of outsourcing that memory into some database that you can maybe even control and say: okay, these facts need to stay, +I never want them to go away, no matter what. These are basic principles, and maybe they exist in every domain, like finance or healthcare and so on. And yeah, I think this is an interesting direction.
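The Atlas-style split discussed here, reasoning in the model and facts in an external index, can be sketched in a few lines. Everything below is a toy stand-in: the word-overlap `retrieve` replaces a real vector index, and `generate` replaces a real seq2seq model conditioned on the retrieved passages:

```python
# Toy sketch of retrieve-then-generate: factual "memory" lives in an external
# index, and the (stubbed) generator only sees what is retrieved at query time.

knowledge_base = {
    "fact-1": "Saint Petersburg was named Leningrad from 1924 to 1991.",
    "fact-2": "Bonn was the capital of West Germany until 1990.",
}

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank facts by word overlap with the query (stand-in for a vector index)."""
    q = set(query.lower().split())
    scored = sorted(knowledge_base.values(),
                    key=lambda fact: len(q & set(fact.lower().split())),
                    reverse=True)
    return scored[:top_k]

def generate(query: str) -> str:
    """Stand-in for a generator conditioned on retrieved passages."""
    context = " ".join(retrieve(query))
    return f"Based on: {context}"

# Updating knowledge means writing to the index -- no retraining of the model.
knowledge_base["fact-3"] = "Nur-Sultan was renamed back to Astana in 2022."
answer = generate("What was Saint Petersburg called before 1991?")
print(answer)
```

The point of the last lines is the one made in the conversation: updating what the system "knows" is an index write, not a retraining run, and facts you control can be pinned in place.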
+Yeah, absolutely, and all these facts change, right? It can also be that over time you have to adjust facts or knowledge, and this is way easier, I think, if you have it explicitly somewhere in documents, not just in the parameters of the model. +Yeah, exactly, and maybe just one example that comes to my mind: cities change names, right? So you could still go back and ask what the name of that city was between, you know, 1995 and 2000, something like that. +Yeah, or presidents of nations also change, right? So for these kinds of queries you want to make sure that you are up to date. +Yeah, and I think, coming back to search, understanding the context will play such a huge role once these models become even more mature, available, and knowledge-aware. + But the challenge of extracting context from the query is still there. If I ask who the president of the United States is, it might conclude that I'm asking about the present, but if a couple of paragraphs above I was already setting the stage about a specific period of time in the past, it could actually reason that maybe I'm not asking about the present. +Exactly, it could do this reasoning, or it could ask a clarifying question, right? Or say: here are a couple of options, did you mean this? Like you would want in a human conversation. +Yeah, I think it's called conversational information retrieval, and I think we might start seeing this blend of what today is probably called a chatbot and a search engine, but it could be a search engine which is just clarifying. +Yeah, I mean, overall, that's a thing we are also seeing in the field: what we understand as search is evolving, right? It's not so much anymore, if I think about web search engines, +that in most cases you search, click on a website, and then look for your information somewhere there.
+But in many cases we now have a kind of zero-click search, where you have your query and within the search results you already find what you want, and I think this is just getting more and more popular. +Yeah, you are not providing, say, a route to another knowledge source; you are trying to really answer the query directly, so there is no need to go further. +I will also try to remember to link one paper, maybe it's a series of papers, from Microsoft, where they try to embed knowledge into the language model, and +I think that's a very interesting direction, as is embedding knowledge graphs into the model, right? Because one way, as you said, and I think that trend is probably still there, is that you can keep adding parameters, more and more billions, trillions. +But at some point it simply becomes impractical to have such a large model in production, and then how do you fine-tune it? And again, it doesn't capture the relationships well enough if you didn't explain them. + Absolutely, and just thinking of it: we actually have a meetup at the end of September, if you, or anyone who's listening, are interested, with a talk about exactly that topic, how you incorporate knowledge into a language model. That's at the end of September as well; just look for it, or maybe we can link it in the show notes. +Sure, we will gladly do that. Now my favorite question, which I know you touched on many times already during this podcast, which I really enjoyed: +what else drives you? Beyond, you know, your role as a CTO, or your role as a pioneer in this space, educating and reaching more and more people. Is there something else that drives you, beyond the tech itself in this field? +Yeah, I mean, I think my passion for NLP appears here; I hope that came through.
+But for me, the technology is one thing, but then it's really about seeing how you solve problems with it, like how you can make the annoying work of a financial analyst faster and better. Just seeing that, either, let's say, first-hand because they are a customer, or more indirectly, +knowing that this is now possible. So I think that is still a big driver for me personally. And one thing I absolutely love about open source is that it's not just paying users, commercial users, where you see that: +we really have this huge community by now around haystack, with so many different people with different backgrounds and different use cases, and +for me it's often, at the end of the day, really just scrolling through all the GitHub issues and the questions that come in, or on Slack, well, now we are on Discord, +and seeing what people are actually building with it. It's really cool to see what use cases they come up with, but also how far this has actually come: +it's used by so many companies all around the world, from big tech to classical enterprises to startups, to build their products on top. And that is one of the biggest motivation boosters you can get, seeing the community appreciating and using it. +And probably also, to go beyond GitHub: just recently I ran into a guy in a bar here in Berlin who uses haystack, and that's definitely something I never would have imagined a few years ago. +And this kind of thing happens. A while ago, when we formed a bit of a vision and thought about some goals at a company offsite, +I think one of the goals for the open source side was that people would start putting, say, haystack experience into their job requirements, or the other way around, putting it in their CVs, and we thought: well, this is maybe three years down the road. +But then a few weeks afterwards we saw the first job postings where this was required, and also CVs where it was mentioned.
+So I think it's just cool to see how you can leave a footprint beyond, let's say, your immediate bubble, because it really spreads: it's open, it's all digital, it's all connected in the world, right? And leaving that kind of footprint is what I enjoy. +And yeah, search as a domain is, for me, just really interesting because it's so diverse: you can go in many directions, you can dive very deep into NLP, you can think a lot about the user side and, in the end, which use cases you can make it work for, and you can think a lot about scalability. +It's just, from my point of view, one of the most exciting and diverse applications of technology right now. And it's one you can really relate to: you can really think about what is actually possible, what kind of information you can make accessible. +And that's obviously the beauty of it. Yeah, it's beautifully put, thanks for sharing. I know some of the guests I ask this question would probably think: hey, why this philosophical question? I'm just, you know, doing it, I like it, and that's it. +But I think it adds so much to reflecting on what you do, because that might also influence your choices in the tech, or in how you approach your users, what message you send, and so on and so forth, and maybe make you reconsider some things as well. + And the open source part reminded me of one story. It was my first time visiting the US, I think it was 2015, at ApacheCon. I was crossing at a traffic light, you know, at a pedestrian crossing, and it was this wide avenue, not a narrow one, and the light takes, by my account, like a few minutes, though it's of course not minutes, maybe 20 seconds. +And a guy is literally bouncing towards me from the other side of the road, saying: I know you! I was like: no, it's impossible, it's my first time visiting, you know, I'm not a public figure.
+How is that possible? And he said: it's because of what you build. Look, it's one of the open source Lucene tools that I used to work on, +you know, which I inherited from its original creator, Andrzej Bialecki. And that's it, he didn't stop to say anything else, but he made my day, you know. And I think what you felt in the bar was probably similar: knowing that that person uses haystack. It's amazing. +Absolutely, because it just feels very honest, right? It feels like it's not because of some crazy marketing or anything like that; it's just a really natural community thing, just building something that's useful for others. + Yeah, exactly, which probably reinforces you and gives you, well, in this case, direct feedback, not only on the specifics of your platform, but on the fact that they are using it, relying on it, and building a business on it, and that validates the decisions you made in the architecture and so on and so forth. That's amazing. +Yeah, I mean, from a company perspective that's one of the fastest feedback cycles you can have, right? And seeing diverse use cases, diverse developer personas, how they approach things, what they are struggling with: yeah, that angle is also absolutely crucial. +I think it's the best, and I think it was Elon Musk who said the best setting is when your users fall in love with your product, and then you just succeed. So yeah, there you go. Amazing. I've enjoyed this podcast so much. Is there anything you want to announce to our listeners? + Yeah, there's the meetup I already mentioned: so if you're interested in NLP, that's happening in September. It will be hybrid, so you can join online, but if you happen to be in Berlin, we also have a small on-site event. And then, yeah, of course, if you haven't tried haystack yet, maybe check it out on GitHub.
+ As promised, you can get an easy first pipeline up and running, so just give it a try, try out question answering if you haven't, if you're coming more from traditional search. And then there's deepset Cloud, as mentioned: we just released it, and we are still at an early stage with the product, but we have an early access program. So if you're interested, if you have a use case that you want to bring to production in a fast way, where you think about how to scale it, how to actually fine-tune that pipeline, how to collaborate with your end users and get some feedback there, just reach out to us and we can get you on the early access program. + Amazing. Thanks so much, Malte. I have enjoyed, again saying this, this was deep and thoughtful, and we will make sure to link all the goodies that you mentioned in the show notes. And I hope we meet some day, maybe in Berlin, maybe somewhere else. Absolutely, yeah, let's make that happen. I totally enjoyed our conversation as well, so thanks for having me. +It was definitely interesting. Fantastic, all the best with haystack and with your research and development. Thanks a lot, thanks, all the best, bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md b/transcripts_with_timestamps/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md new file mode 100644 index 0000000..4efa56a --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/max-irwin-founder-max-io-on-economics-of-scale-in-embedding-computation-with-mighty.md @@ -0,0 +1,6142 @@ +--- +description: '

00:00 Introduction

01:10 Max''s deep experience in search and + how he transitioned from structured data

08:28 Query-term dependence problem + and Max''s perception of the Vector Search field

12:46 Is vector search a + solution looking for a problem?

20:16 How to move embeddings computation from + GPU to CPU and retain GPU latency?

27:51 Plug-in neural model into Java? Example + with a Hugging Face model

33:02 Web-server Mighty and its philosophy

35:33 + How Mighty compares to in-DB embedding layer, like Weaviate or Vespa

39:40 + The importance of fault-tolerance in search backends

43:31 Unit economics + of Mighty

50:18 Mighty distribution and supported operating systems

54:57 + The secret sauce behind Mighty''s insane fast-ness

59:48 What a customer is + paying for when buying Mighty

1:01:45 How will Max track the usage of Mighty: + is it commercial or research use?

1:04:39 Role of Open Source Community to + grow business

1:10:58 Max''s vision for Mighty connectors to popular vector + databases

1:18:09 What tooling is missing beyond Mighty in vector search pipelines

1:22:34 + Fine-tuning models, metric learning and Max''s call for partnerships

1:26:37 + MLOps perspective of neural pipelines and Mighty''s role in it

1:30:04 Mighty + vs AWS Inferentia vs Hugging Face Infinity

1:35:50 What''s left in ML for + those who are not into Python

1:40:50 The philosophical (and magical) question + of WHY

1:48:15 Announcements from Max

25% discount for the first year + of using Mighty in your great product / project with promo code VECTOR:

https://bit.ly/3QekTWE

Show notes:

- + Max''s blog about BERT and search relevance: https://opensourceconnections.com/blog/2019/11/05/understanding-bert-and-search-relevance/

- + Case study and unit economics of Mighty: https://max.io/blog/encoding-the-federal-register.html

- + Not All Vector Databases Are Made Equal: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

Watch + on YouTube: https://youtu.be/LnF4hbl1cE4

' +image_url: https://media.rss.com/vector-podcast/20220616_060650_51fed3f5cf98ff1ddb61cc17e11e43be.jpg +pub_date: Thu, 16 Jun 2022 18:27:50 GMT +title: Max Irwin - Founder, MAX.IO - On economics of scale in embedding computation + with Mighty +url: https://rss.com/podcasts/vector-podcast/522301 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 23.0, "text": " Hello, + vector podcast is here.", "tokens": [50364, 2425, 11, 8062, 7367, 307, 510, 13, + 51514], "temperature": 0.0, "avg_logprob": -0.7542389956387606, "compression_ratio": + 0.7894736842105263, "no_speech_prob": 0.11571534723043442}, {"id": 1, "seek": 2300, + "start": 24.0, "end": 27.0, "text": " And today I''m going to be talking to Max + Irwin.", "tokens": [50414, 400, 965, 286, 478, 516, 281, 312, 1417, 281, 7402, 9151, + 9136, 13, 50564], "temperature": 0.0, "avg_logprob": -0.20569798946380616, "compression_ratio": + 1.4684210526315788, "no_speech_prob": 0.6937023401260376}, {"id": 2, "seek": 2300, + "start": 27.0, "end": 32.0, "text": " He''s this star in the search engine business + in search engine world.", "tokens": [50564, 634, 311, 341, 3543, 294, 264, 3164, + 2848, 1606, 294, 3164, 2848, 1002, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.20569798946380616, "compression_ratio": 1.4684210526315788, "no_speech_prob": + 0.6937023401260376}, {"id": 3, "seek": 2300, "start": 32.0, "end": 36.0, "text": + " He has been doubling also in NLP a lot.", "tokens": [50814, 634, 575, 668, 33651, + 611, 294, 426, 45196, 257, 688, 13, 51014], "temperature": 0.0, "avg_logprob": -0.20569798946380616, + "compression_ratio": 1.4684210526315788, "no_speech_prob": 0.6937023401260376}, + {"id": 4, "seek": 2300, "start": 36.0, "end": 39.0, "text": " I don''t know 20 years. 
+ It''s huge amount of time.", "tokens": [51014, 286, 500, 380, 458, 945, 924, 13, + 467, 311, 2603, 2372, 295, 565, 13, 51164], "temperature": 0.0, "avg_logprob": -0.20569798946380616, + "compression_ratio": 1.4684210526315788, "no_speech_prob": 0.6937023401260376}, + {"id": 5, "seek": 2300, "start": 39.0, "end": 49.0, "text": " And I mean, he has + been consulting in this space, also building products.", "tokens": [51164, 400, + 286, 914, 11, 415, 575, 668, 23682, 294, 341, 1901, 11, 611, 2390, 3383, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.20569798946380616, "compression_ratio": 1.4684210526315788, + "no_speech_prob": 0.6937023401260376}, {"id": 6, "seek": 4900, "start": 49.0, "end": + 53.0, "text": " And now he''s focusing on building his new product.", "tokens": + [50364, 400, 586, 415, 311, 8416, 322, 2390, 702, 777, 1674, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.12551216725949887, "compression_ratio": 1.5982905982905984, + "no_speech_prob": 0.10351726412773132}, {"id": 7, "seek": 4900, "start": 53.0, "end": + 58.0, "text": " And he''s the founder of company called max.io, which is also a + website.", "tokens": [50564, 400, 415, 311, 264, 14917, 295, 2237, 1219, 11469, + 13, 1004, 11, 597, 307, 611, 257, 3144, 13, 50814], "temperature": 0.0, "avg_logprob": + -0.12551216725949887, "compression_ratio": 1.5982905982905984, "no_speech_prob": + 0.10351726412773132}, {"id": 8, "seek": 4900, "start": 58.0, "end": 62.0, "text": + " You can go check it out. 
And he''s building a mighty inference server.", "tokens": + [50814, 509, 393, 352, 1520, 309, 484, 13, 400, 415, 311, 2390, 257, 21556, 38253, + 7154, 13, 51014], "temperature": 0.0, "avg_logprob": -0.12551216725949887, "compression_ratio": + 1.5982905982905984, "no_speech_prob": 0.10351726412773132}, {"id": 9, "seek": 4900, + "start": 62.0, "end": 66.0, "text": " And the number of other tools that I''m sure + Max will talk about today.", "tokens": [51014, 400, 264, 1230, 295, 661, 3873, 300, + 286, 478, 988, 7402, 486, 751, 466, 965, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.12551216725949887, "compression_ratio": 1.5982905982905984, "no_speech_prob": + 0.10351726412773132}, {"id": 10, "seek": 4900, "start": 66.0, "end": 67.0, "text": + " Hey, Max, how are you doing?", "tokens": [51214, 1911, 11, 7402, 11, 577, 366, + 291, 884, 30, 51264], "temperature": 0.0, "avg_logprob": -0.12551216725949887, "compression_ratio": + 1.5982905982905984, "no_speech_prob": 0.10351726412773132}, {"id": 11, "seek": 4900, + "start": 67.0, "end": 69.0, "text": " I''m doing great. How are you?", "tokens": + [51264, 286, 478, 884, 869, 13, 1012, 366, 291, 30, 51364], "temperature": 0.0, + "avg_logprob": -0.12551216725949887, "compression_ratio": 1.5982905982905984, "no_speech_prob": + 0.10351726412773132}, {"id": 12, "seek": 4900, "start": 69.0, "end": 73.0, "text": + " I''m great. 
And thanks so much for joining me today.", "tokens": [51364, 286, + 478, 869, 13, 400, 3231, 370, 709, 337, 5549, 385, 965, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.12551216725949887, "compression_ratio": 1.5982905982905984, + "no_speech_prob": 0.10351726412773132}, {"id": 13, "seek": 7300, "start": 73.0, + "end": 76.0, "text": " I''m very happy to be talking to you today.", "tokens": [50364, + 286, 478, 588, 2055, 281, 312, 1417, 281, 291, 965, 13, 50514], "temperature": 0.0, + "avg_logprob": -0.3944159527214206, "compression_ratio": 2.151898734177215, "no_speech_prob": + 0.25695300102233887}, {"id": 14, "seek": 7300, "start": 76.0, "end": 79.0, "text": + " I''m very happy to be talking to you today.", "tokens": [50514, 286, 478, 588, + 2055, 281, 312, 1417, 281, 291, 965, 13, 50664], "temperature": 0.0, "avg_logprob": + -0.3944159527214206, "compression_ratio": 2.151898734177215, "no_speech_prob": 0.25695300102233887}, + {"id": 15, "seek": 7300, "start": 79.0, "end": 82.0, "text": " I''m very happy to + be talking to you today.", "tokens": [50664, 286, 478, 588, 2055, 281, 312, 1417, + 281, 291, 965, 13, 50814], "temperature": 0.0, "avg_logprob": -0.3944159527214206, + "compression_ratio": 2.151898734177215, "no_speech_prob": 0.25695300102233887}, + {"id": 16, "seek": 7300, "start": 82.0, "end": 84.0, "text": " I''m very happy to + be talking to you today.", "tokens": [50814, 286, 478, 588, 2055, 281, 312, 1417, + 281, 291, 965, 13, 50914], "temperature": 0.0, "avg_logprob": -0.3944159527214206, + "compression_ratio": 2.151898734177215, "no_speech_prob": 0.25695300102233887}, + {"id": 17, "seek": 7300, "start": 84.0, "end": 90.0, "text": " And I''m learning + about my tea and all the things that you''re cooking there.", "tokens": [50914, + 400, 286, 478, 2539, 466, 452, 5817, 293, 439, 264, 721, 300, 291, 434, 6361, 456, + 13, 51214], "temperature": 0.0, "avg_logprob": -0.3944159527214206, "compression_ratio": + 2.151898734177215, "no_speech_prob": 
0.25695300102233887}, {"id": 18, "seek": 7300, + "start": 90.0, "end": 96.0, "text": " But I think as a tradition, could you start + with introducing yourself first?", "tokens": [51214, 583, 286, 519, 382, 257, 6994, + 11, 727, 291, 722, 365, 15424, 1803, 700, 30, 51514], "temperature": 0.0, "avg_logprob": + -0.3944159527214206, "compression_ratio": 2.151898734177215, "no_speech_prob": 0.25695300102233887}, + {"id": 19, "seek": 7300, "start": 96.0, "end": 98.0, "text": " Sure. Yeah. Hi.", + "tokens": [51514, 4894, 13, 865, 13, 2421, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.3944159527214206, "compression_ratio": 2.151898734177215, "no_speech_prob": 0.25695300102233887}, + {"id": 20, "seek": 9800, "start": 98.0, "end": 101.0, "text": " So I''m good to + go on my own business.", "tokens": [50364, 407, 286, 478, 665, 281, 352, 322, 452, + 1065, 1606, 13, 50514], "temperature": 0.6, "avg_logprob": -0.7845254224889419, + "compression_ratio": 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, + {"id": 21, "seek": 9800, "start": 101.0, "end": 104.0, "text": " I''m good to go + on my own business.", "tokens": [50514, 286, 478, 665, 281, 352, 322, 452, 1065, + 1606, 13, 50664], "temperature": 0.6, "avg_logprob": -0.7845254224889419, "compression_ratio": + 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, {"id": 22, "seek": 9800, + "start": 104.0, "end": 105.0, "text": " So I''m a doctor.", "tokens": [50664, 407, + 286, 478, 257, 4631, 13, 50714], "temperature": 0.6, "avg_logprob": -0.7845254224889419, + "compression_ratio": 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, + {"id": 23, "seek": 9800, "start": 105.0, "end": 108.0, "text": " I''m a doctor.", + "tokens": [50714, 286, 478, 257, 4631, 13, 50864], "temperature": 0.6, "avg_logprob": + -0.7845254224889419, "compression_ratio": 2.3445945945945947, "no_speech_prob": + 0.12286259233951569}, {"id": 24, "seek": 9800, "start": 108.0, "end": 109.0, "text": + " I''m a doctor.", 
"tokens": [50864, 286, 478, 257, 4631, 13, 50914], "temperature": + 0.6, "avg_logprob": -0.7845254224889419, "compression_ratio": 2.3445945945945947, + "no_speech_prob": 0.12286259233951569}, {"id": 25, "seek": 9800, "start": 109.0, + "end": 110.0, "text": " I''m a doctor.", "tokens": [50914, 286, 478, 257, 4631, + 13, 50964], "temperature": 0.6, "avg_logprob": -0.7845254224889419, "compression_ratio": + 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, {"id": 26, "seek": 9800, + "start": 110.0, "end": 112.0, "text": " I''m a doctor.", "tokens": [50964, 286, + 478, 257, 4631, 13, 51064], "temperature": 0.6, "avg_logprob": -0.7845254224889419, + "compression_ratio": 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, + {"id": 27, "seek": 9800, "start": 112.0, "end": 113.0, "text": " I''m a doctor.", + "tokens": [51064, 286, 478, 257, 4631, 13, 51114], "temperature": 0.6, "avg_logprob": + -0.7845254224889419, "compression_ratio": 2.3445945945945947, "no_speech_prob": + 0.12286259233951569}, {"id": 28, "seek": 9800, "start": 113.0, "end": 115.0, "text": + " I''m a doctor.", "tokens": [51114, 286, 478, 257, 4631, 13, 51214], "temperature": + 0.6, "avg_logprob": -0.7845254224889419, "compression_ratio": 2.3445945945945947, + "no_speech_prob": 0.12286259233951569}, {"id": 29, "seek": 9800, "start": 115.0, + "end": 116.0, "text": " I''m a doctor.", "tokens": [51214, 286, 478, 257, 4631, + 13, 51264], "temperature": 0.6, "avg_logprob": -0.7845254224889419, "compression_ratio": + 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, {"id": 30, "seek": 9800, + "start": 116.0, "end": 117.0, "text": " I''m a doctor.", "tokens": [51264, 286, + 478, 257, 4631, 13, 51314], "temperature": 0.6, "avg_logprob": -0.7845254224889419, + "compression_ratio": 2.3445945945945947, "no_speech_prob": 0.12286259233951569}, + {"id": 31, "seek": 9800, "start": 117.0, "end": 118.0, "text": " I''m a doctor.", + "tokens": [51314, 286, 478, 257, 4631, 13, 51364], 
"temperature": 0.6, "avg_logprob": + -0.7845254224889419, "compression_ratio": 2.3445945945945947, "no_speech_prob": + 0.12286259233951569}, {"id": 32, "seek": 9800, "start": 118.0, "end": 122.0, "text": + " And sometimes I get a lot of things to do with my career.", "tokens": [51364, + 400, 2171, 286, 483, 257, 688, 295, 721, 281, 360, 365, 452, 3988, 13, 51564], "temperature": + 0.6, "avg_logprob": -0.7845254224889419, "compression_ratio": 2.3445945945945947, + "no_speech_prob": 0.12286259233951569}, {"id": 33, "seek": 9800, "start": 122.0, + "end": 126.0, "text": " In fact, when I was a younger, I didn''t do so well in my + language course.", "tokens": [51564, 682, 1186, 11, 562, 286, 390, 257, 7037, 11, + 286, 994, 380, 360, 370, 731, 294, 452, 2856, 1164, 13, 51764], "temperature": 0.6, + "avg_logprob": -0.7845254224889419, "compression_ratio": 2.3445945945945947, "no_speech_prob": + 0.12286259233951569}, {"id": 34, "seek": 12600, "start": 126.0, "end": 129.76, "text": + " started in 2015-2016 with actual product development around NLP.", "tokens": [50364, + 1409, 294, 7546, 12, 41103, 365, 3539, 1674, 3250, 926, 426, 45196, 13, 50552], + "temperature": 0.0, "avg_logprob": -0.2042112986246745, "compression_ratio": 1.5689655172413792, + "no_speech_prob": 0.06913179159164429}, {"id": 35, "seek": 12600, "start": 131.6, + "end": 139.44, "text": " With search, I''ve been doing search since about 2010-2011. 
+ Again, it''s fuzzy when I actually first started,", "tokens": [50644, 2022, 3164, + 11, 286, 600, 668, 884, 3164, 1670, 466, 9657, 12, 2009, 5348, 13, 3764, 11, 309, + 311, 34710, 562, 286, 767, 700, 1409, 11, 51036], "temperature": 0.0, "avg_logprob": + -0.2042112986246745, "compression_ratio": 1.5689655172413792, "no_speech_prob": + 0.06913179159164429}, {"id": 36, "seek": 12600, "start": 139.44, "end": 148.32, + "text": " but I think the first real serious thing I did with search was when I + went to take my first solar", "tokens": [51036, 457, 286, 519, 264, 700, 957, 3156, + 551, 286, 630, 365, 3164, 390, 562, 286, 1437, 281, 747, 452, 700, 7936, 51480], + "temperature": 0.0, "avg_logprob": -0.2042112986246745, "compression_ratio": 1.5689655172413792, + "no_speech_prob": 0.06913179159164429}, {"id": 37, "seek": 12600, "start": 148.32, + "end": 153.52, "text": " training course, which was one of the, when Lucid Works + still had solar training and they had", "tokens": [51480, 3097, 1164, 11, 597, 390, + 472, 295, 264, 11, 562, 9593, 327, 27914, 920, 632, 7936, 3097, 293, 436, 632, 51740], + "temperature": 0.0, "avg_logprob": -0.2042112986246745, "compression_ratio": 1.5689655172413792, + "no_speech_prob": 0.06913179159164429}, {"id": 38, "seek": 15352, "start": 153.52, + "end": 160.88000000000002, "text": " contractors coming to give training. So that + was, that was in 2012, but I''d been messing around", "tokens": [50364, 28377, 1348, + 281, 976, 3097, 13, 407, 300, 390, 11, 300, 390, 294, 9125, 11, 457, 286, 1116, + 668, 23258, 926, 50732], "temperature": 0.0, "avg_logprob": -0.18199107622859453, + "compression_ratio": 1.5648535564853556, "no_speech_prob": 0.0015786823350936174}, + {"id": 39, "seek": 15352, "start": 160.88000000000002, "end": 168.32000000000002, + "text": " with engines before that. 
And I started on an engine called DT Search, + which was the C++", "tokens": [50732, 365, 12982, 949, 300, 13, 400, 286, 1409, + 322, 364, 2848, 1219, 413, 51, 17180, 11, 597, 390, 264, 383, 25472, 51104], "temperature": + 0.0, "avg_logprob": -0.18199107622859453, "compression_ratio": 1.5648535564853556, + "no_speech_prob": 0.0015786823350936174}, {"id": 40, "seek": 15352, "start": 171.12, + "end": 175.12, "text": " closed source engine, but you could buy the code for like + a thousand dollars a year. So the", "tokens": [51244, 5395, 4009, 2848, 11, 457, + 291, 727, 2256, 264, 3089, 337, 411, 257, 4714, 3808, 257, 1064, 13, 407, 264, 51444], + "temperature": 0.0, "avg_logprob": -0.18199107622859453, "compression_ratio": 1.5648535564853556, + "no_speech_prob": 0.0015786823350936174}, {"id": 41, "seek": 15352, "start": 175.12, + "end": 182.8, "text": " company I was working for, MetiRex, we actually bought the + code. And I was, I was the newbie with", "tokens": [51444, 2237, 286, 390, 1364, + 337, 11, 6377, 72, 49, 3121, 11, 321, 767, 4243, 264, 3089, 13, 400, 286, 390, 11, + 286, 390, 264, 777, 7392, 365, 51828], "temperature": 0.0, "avg_logprob": -0.18199107622859453, + "compression_ratio": 1.5648535564853556, "no_speech_prob": 0.0015786823350936174}, + {"id": 42, "seek": 18280, "start": 182.8, "end": 187.92000000000002, "text": " search. + I mean, we had guys been working with it for a while. And they''d built a whole + platform", "tokens": [50364, 3164, 13, 286, 914, 11, 321, 632, 1074, 668, 1364, + 365, 309, 337, 257, 1339, 13, 400, 436, 1116, 3094, 257, 1379, 3663, 50620], "temperature": + 0.0, "avg_logprob": -0.17406098819473415, "compression_ratio": 1.5578512396694215, + "no_speech_prob": 0.0006656002951785922}, {"id": 43, "seek": 18280, "start": 187.92000000000002, + "end": 194.56, "text": " around DT Search. And then I was starting to show its age. 
+ So we started shifting over to solar.", "tokens": [50620, 926, 413, 51, 17180, 13, + 400, 550, 286, 390, 2891, 281, 855, 1080, 3205, 13, 407, 321, 1409, 17573, 670, + 281, 7936, 13, 50952], "temperature": 0.0, "avg_logprob": -0.17406098819473415, + "compression_ratio": 1.5578512396694215, "no_speech_prob": 0.0006656002951785922}, + {"id": 44, "seek": 18280, "start": 196.24, "end": 201.76000000000002, "text": " + But yeah, since I started that, but well, before that, I did a little bunch of computer", + "tokens": [51036, 583, 1338, 11, 1670, 286, 1409, 300, 11, 457, 731, 11, 949, 300, + 11, 286, 630, 257, 707, 3840, 295, 3820, 51312], "temperature": 0.0, "avg_logprob": + -0.17406098819473415, "compression_ratio": 1.5578512396694215, "no_speech_prob": + 0.0006656002951785922}, {"id": 45, "seek": 18280, "start": 201.76000000000002, "end": + 208.4, "text": " programs. So like the 20 years, 22 years-ish stuff that''s in my + bio, like I''ve been, I graduated", "tokens": [51312, 4268, 13, 407, 411, 264, 945, + 924, 11, 5853, 924, 12, 742, 1507, 300, 311, 294, 452, 12198, 11, 411, 286, 600, + 668, 11, 286, 13693, 51644], "temperature": 0.0, "avg_logprob": -0.17406098819473415, + "compression_ratio": 1.5578512396694215, "no_speech_prob": 0.0006656002951785922}, + {"id": 46, "seek": 20840, "start": 208.4, "end": 212.96, "text": " university in + the year 2000, and I''ve been, you know, working professionally software ever since.", + "tokens": [50364, 5454, 294, 264, 1064, 8132, 11, 293, 286, 600, 668, 11, 291, 458, + 11, 1364, 27941, 4722, 1562, 1670, 13, 50592], "temperature": 0.0, "avg_logprob": + -0.19830952088038126, "compression_ratio": 1.556910569105691, "no_speech_prob": + 0.0018238467164337635}, {"id": 47, "seek": 20840, "start": 213.76000000000002, "end": + 221.92000000000002, "text": " But with Search, I, I really got interested in in + Search around 2012, is when I really said,", "tokens": [50632, 583, 365, 17180, + 11, 286, 11, 286, 534, 658, 3102, 294, 294, 
17180, 926, 9125, 11, 307, 562, 286, + 534, 848, 11, 51040], "temperature": 0.0, "avg_logprob": -0.19830952088038126, "compression_ratio": + 1.556910569105691, "no_speech_prob": 0.0018238467164337635}, {"id": 48, "seek": + 20840, "start": 221.92000000000002, "end": 226.4, "text": " wow, this is amazing. + This is so much different from what I''ve been doing before. So that''s when", "tokens": + [51040, 6076, 11, 341, 307, 2243, 13, 639, 307, 370, 709, 819, 490, 437, 286, 600, + 668, 884, 949, 13, 407, 300, 311, 562, 51264], "temperature": 0.0, "avg_logprob": + -0.19830952088038126, "compression_ratio": 1.556910569105691, "no_speech_prob": + 0.0018238467164337635}, {"id": 49, "seek": 20840, "start": 226.4, "end": 231.84, + "text": " I really do have had first into into the problem space in the domain. + Yeah. And some people say", "tokens": [51264, 286, 534, 360, 362, 632, 700, 666, + 666, 264, 1154, 1901, 294, 264, 9274, 13, 865, 13, 400, 512, 561, 584, 51536], "temperature": + 0.0, "avg_logprob": -0.19830952088038126, "compression_ratio": 1.556910569105691, + "no_speech_prob": 0.0018238467164337635}, {"id": 50, "seek": 23184, "start": 232.48, + "end": 239.20000000000002, "text": " that many of us ended up in Search field by + accident, as well as actually NLP. 
I''ve been talking", "tokens": [50396, 300, 867, + 295, 505, 4590, 493, 294, 17180, 2519, 538, 6398, 11, 382, 731, 382, 767, 426, 45196, + 13, 286, 600, 668, 1417, 50732], "temperature": 0.0, "avg_logprob": -0.1389677281282386, + "compression_ratio": 1.6508620689655173, "no_speech_prob": 0.03212380409240723}, + {"id": 51, "seek": 23184, "start": 239.20000000000002, "end": 244.72, "text": " + to one professor here in the University of Helsinki has built machine translation + team, very,", "tokens": [50732, 281, 472, 8304, 510, 294, 264, 3535, 295, 45429, + 41917, 575, 3094, 3479, 12853, 1469, 11, 588, 11, 51008], "temperature": 0.0, "avg_logprob": + -0.1389677281282386, "compression_ratio": 1.6508620689655173, "no_speech_prob": + 0.03212380409240723}, {"id": 52, "seek": 23184, "start": 244.72, "end": 252.72, + "text": " very strong one. And, and he has built the, the system called Opus. And, + and, and he actually said", "tokens": [51008, 588, 2068, 472, 13, 400, 11, 293, + 415, 575, 3094, 264, 11, 264, 1185, 1219, 12011, 301, 13, 400, 11, 293, 11, 293, + 415, 767, 848, 51408], "temperature": 0.0, "avg_logprob": -0.1389677281282386, "compression_ratio": + 1.6508620689655173, "no_speech_prob": 0.03212380409240723}, {"id": 53, "seek": 23184, + "start": 252.72, "end": 259.04, "text": " that he ended up in NLP also by accident + because it was just an offer from a professor and he", "tokens": [51408, 300, 415, + 4590, 493, 294, 426, 45196, 611, 538, 6398, 570, 309, 390, 445, 364, 2626, 490, + 257, 8304, 293, 415, 51724], "temperature": 0.0, "avg_logprob": -0.1389677281282386, + "compression_ratio": 1.6508620689655173, "no_speech_prob": 0.03212380409240723}, + {"id": 54, "seek": 25904, "start": 259.04, "end": 263.52000000000004, "text": " + decided to take it and he turned out to be quite good at it, you know. 
But he also + had another", "tokens": [50364, 3047, 281, 747, 309, 293, 415, 3574, 484, 281, 312, + 1596, 665, 412, 309, 11, 291, 458, 13, 583, 415, 611, 632, 1071, 50588], "temperature": + 0.0, "avg_logprob": -0.10589849031888522, "compression_ratio": 1.63135593220339, + "no_speech_prob": 0.004477100912481546}, {"id": 55, "seek": 25904, "start": 263.52000000000004, + "end": 268.56, "text": " option just to go and work in in Germany, he''s from Germany, + to work in Germany in some company,", "tokens": [50588, 3614, 445, 281, 352, 293, + 589, 294, 294, 7244, 11, 415, 311, 490, 7244, 11, 281, 589, 294, 7244, 294, 512, + 2237, 11, 50840], "temperature": 0.0, "avg_logprob": -0.10589849031888522, "compression_ratio": + 1.63135593220339, "no_speech_prob": 0.004477100912481546}, {"id": 56, "seek": 25904, + "start": 268.56, "end": 274.96000000000004, "text": " database company. And, and, + and likely he didn''t take that path. How was it for you? How do you feel", "tokens": + [50840, 8149, 2237, 13, 400, 11, 293, 11, 293, 3700, 415, 994, 380, 747, 300, 3100, + 13, 1012, 390, 309, 337, 291, 30, 1012, 360, 291, 841, 51160], "temperature": 0.0, + "avg_logprob": -0.10589849031888522, "compression_ratio": 1.63135593220339, "no_speech_prob": + 0.004477100912481546}, {"id": 57, "seek": 25904, "start": 274.96000000000004, "end": + 281.84000000000003, "text": " about yourself and then ending up in the in the in + this space? That''s a great question. It''s", "tokens": [51160, 466, 1803, 293, + 550, 8121, 493, 294, 264, 294, 264, 294, 341, 1901, 30, 663, 311, 257, 869, 1168, + 13, 467, 311, 51504], "temperature": 0.0, "avg_logprob": -0.10589849031888522, "compression_ratio": + 1.63135593220339, "no_speech_prob": 0.004477100912481546}, {"id": 58, "seek": 28184, + "start": 281.84, "end": 293.52, "text": " interesting. I feel like ending it up, + I, it was definitely somewhat accidental. 
I found, I,", "tokens": [50364, 1880, + 13, 286, 841, 411, 8121, 309, 493, 11, 286, 11, 309, 390, 2138, 8344, 38094, 13, + 286, 1352, 11, 286, 11, 50948], "temperature": 0.0, "avg_logprob": -0.16171821425942814, + "compression_ratio": 1.572972972972973, "no_speech_prob": 0.0006110819522291422}, + {"id": 59, "seek": 28184, "start": 293.52, "end": 299.03999999999996, "text": " + I had the pleasure of meeting so many people in search through my different positions + that I was", "tokens": [50948, 286, 632, 264, 6834, 295, 3440, 370, 867, 561, 294, + 3164, 807, 452, 819, 8432, 300, 286, 390, 51224], "temperature": 0.0, "avg_logprob": + -0.16171821425942814, "compression_ratio": 1.572972972972973, "no_speech_prob": + 0.0006110819522291422}, {"id": 60, "seek": 28184, "start": 299.03999999999996, "end": + 306.55999999999995, "text": " working with and the varying degrees of expertise. + I found that a lot of people who got involved with", "tokens": [51224, 1364, 365, + 293, 264, 22984, 5310, 295, 11769, 13, 286, 1352, 300, 257, 688, 295, 561, 567, + 658, 3288, 365, 51600], "temperature": 0.0, "avg_logprob": -0.16171821425942814, + "compression_ratio": 1.572972972972973, "no_speech_prob": 0.0006110819522291422}, + {"id": 61, "seek": 30656, "start": 306.56, "end": 313.36, "text": " machine learning + found out about search because TFI, DF and all that stuff is like an algorithm", + "tokens": [50364, 3479, 2539, 1352, 484, 466, 3164, 570, 314, 38568, 11, 48336, + 293, 439, 300, 1507, 307, 411, 364, 9284, 50704], "temperature": 0.0, "avg_logprob": + -0.1660832511054145, "compression_ratio": 1.6551724137931034, "no_speech_prob": + 9.055841655936092e-05}, {"id": 62, "seek": 30656, "start": 313.36, "end": 317.28000000000003, + "text": " and it''s like, oh, there''s this whole language problem behind search, + so we have to figure out.", "tokens": [50704, 293, 309, 311, 411, 11, 1954, 11, + 456, 311, 341, 1379, 2856, 1154, 2261, 3164, 11, 370, 321, 362, 281, 2573, 484, + 13, 
50900], "temperature": 0.0, "avg_logprob": -0.1660832511054145, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 9.055841655936092e-05}, {"id": 63, "seek": + 30656, "start": 317.28000000000003, "end": 321.28000000000003, "text": " And then + the search people get involved in machine learning because, oh, this language problem", + "tokens": [50900, 400, 550, 264, 3164, 561, 483, 3288, 294, 3479, 2539, 570, 11, + 1954, 11, 341, 2856, 1154, 51100], "temperature": 0.0, "avg_logprob": -0.1660832511054145, + "compression_ratio": 1.6551724137931034, "no_speech_prob": 9.055841655936092e-05}, + {"id": 64, "seek": 30656, "start": 321.28000000000003, "end": 329.84000000000003, + "text": " is horrible. How do we solve it with automation and learning? So I, I + accidentally stumbled on it", "tokens": [51100, 307, 9263, 13, 1012, 360, 321, 5039, + 309, 365, 17769, 293, 2539, 30, 407, 286, 11, 286, 15715, 36668, 322, 309, 51528], + "temperature": 0.0, "avg_logprob": -0.1660832511054145, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 9.055841655936092e-05}, {"id": 65, "seek": 32984, "start": 329.84, + "end": 336.96, "text": " because I took, it was a, it was a role that was in like + healthcare compliance. 
And I was", "tokens": [50364, 570, 286, 1890, 11, 309, 390, + 257, 11, 309, 390, 257, 3090, 300, 390, 294, 411, 8884, 15882, 13, 400, 286, 390, + 50720], "temperature": 0.0, "avg_logprob": -0.1749326012351296, "compression_ratio": + 1.705223880597015, "no_speech_prob": 0.0008606132469139993}, {"id": 66, "seek": + 32984, "start": 336.96, "end": 341.03999999999996, "text": " interested in that + domain specifically and search just happened to be a really important problem", + "tokens": [50720, 3102, 294, 300, 9274, 4682, 293, 3164, 445, 2011, 281, 312, 257, + 534, 1021, 1154, 50924], "temperature": 0.0, "avg_logprob": -0.1749326012351296, + "compression_ratio": 1.705223880597015, "no_speech_prob": 0.0008606132469139993}, + {"id": 67, "seek": 32984, "start": 341.03999999999996, "end": 345.03999999999996, + "text": " in that space. So that''s how I kind of got into the, the technical domain + of search.", "tokens": [50924, 294, 300, 1901, 13, 407, 300, 311, 577, 286, 733, + 295, 658, 666, 264, 11, 264, 6191, 9274, 295, 3164, 13, 51124], "temperature": 0.0, + "avg_logprob": -0.1749326012351296, "compression_ratio": 1.705223880597015, "no_speech_prob": + 0.0008606132469139993}, {"id": 68, "seek": 32984, "start": 346.88, "end": 352.79999999999995, + "text": " And it just was so much more fascinating than like the stuff that I was + used to with,", "tokens": [51216, 400, 309, 445, 390, 370, 709, 544, 10343, 813, + 411, 264, 1507, 300, 286, 390, 1143, 281, 365, 11, 51512], "temperature": 0.0, "avg_logprob": + -0.1749326012351296, "compression_ratio": 1.705223880597015, "no_speech_prob": 0.0008606132469139993}, + {"id": 69, "seek": 32984, "start": 352.79999999999995, "end": 358.79999999999995, + "text": " crud, you know, just create read update delete and just workflow applications, + which I''d been doing", "tokens": [51512, 941, 532, 11, 291, 458, 11, 445, 1884, + 1401, 5623, 12097, 293, 445, 20993, 5821, 11, 597, 286, 1116, 668, 884, 51812], + "temperature": 0.0, 
"avg_logprob": -0.1749326012351296, "compression_ratio": 1.705223880597015, + "no_speech_prob": 0.0008606132469139993}, {"id": 70, "seek": 35880, "start": 358.8, + "end": 365.76, "text": " for, you know, 10 to 12 years at that point. Yeah. Yeah, + I mean, for me, like searching, you know, like,", "tokens": [50364, 337, 11, 291, + 458, 11, 1266, 281, 2272, 924, 412, 300, 935, 13, 865, 13, 865, 11, 286, 914, 11, + 337, 385, 11, 411, 10808, 11, 291, 458, 11, 411, 11, 50712], "temperature": 0.0, + "avg_logprob": -0.17931307142025957, "compression_ratio": 1.6300813008130082, "no_speech_prob": + 0.00387572986073792}, {"id": 71, "seek": 35880, "start": 366.64, "end": 375.12, + "text": " I think I started 2002, 2003 academically, but then it was like seven + years past and I still couldn''t", "tokens": [50756, 286, 519, 286, 1409, 17822, + 11, 16416, 48944, 11, 457, 550, 309, 390, 411, 3407, 924, 1791, 293, 286, 920, 2809, + 380, 51180], "temperature": 0.0, "avg_logprob": -0.17931307142025957, "compression_ratio": + 1.6300813008130082, "no_speech_prob": 0.00387572986073792}, {"id": 72, "seek": 35880, + "start": 375.12, "end": 380.88, "text": " find a niche or a job for myself because + there haven''t been then many search companies in Finland", "tokens": [51180, 915, + 257, 19956, 420, 257, 1691, 337, 2059, 570, 456, 2378, 380, 668, 550, 867, 3164, + 3431, 294, 24869, 51468], "temperature": 0.0, "avg_logprob": -0.17931307142025957, + "compression_ratio": 1.6300813008130082, "no_speech_prob": 0.00387572986073792}, + {"id": 73, "seek": 35880, "start": 380.88, "end": 388.32, "text": " actually at + that point. And then I found a company which I joined in 2010, AlfaSense. 
And it + was", "tokens": [51468, 767, 412, 300, 935, 13, 400, 550, 286, 1352, 257, 2237, + 597, 286, 6869, 294, 9657, 11, 967, 11771, 50, 1288, 13, 400, 309, 390, 51840], + "temperature": 0.0, "avg_logprob": -0.17931307142025957, "compression_ratio": 1.6300813008130082, + "no_speech_prob": 0.00387572986073792}, {"id": 74, "seek": 38832, "start": 388.32, + "end": 394.8, "text": " Apache Solar, you see in everything new, but it was still + somehow inviting. And I think the first", "tokens": [50364, 46597, 22385, 11, 291, + 536, 294, 1203, 777, 11, 457, 309, 390, 920, 6063, 18202, 13, 400, 286, 519, 264, + 700, 50688], "temperature": 0.0, "avg_logprob": -0.1527200946371064, "compression_ratio": + 1.74822695035461, "no_speech_prob": 0.0015541462926194072}, {"id": 75, "seek": 38832, + "start": 394.8, "end": 398.71999999999997, "text": " time when I, when I''ve built + the backend and I was like, okay, somebody is going to use this,", "tokens": [50688, + 565, 562, 286, 11, 562, 286, 600, 3094, 264, 38087, 293, 286, 390, 411, 11, 1392, + 11, 2618, 307, 516, 281, 764, 341, 11, 50884], "temperature": 0.0, "avg_logprob": + -0.1527200946371064, "compression_ratio": 1.74822695035461, "no_speech_prob": 0.0015541462926194072}, + {"id": 76, "seek": 38832, "start": 398.71999999999997, "end": 404.24, "text": " + somebody is going to type the queries and we''ll try to find information. So I also + tried it out", "tokens": [50884, 2618, 307, 516, 281, 2010, 264, 24109, 293, 321, + 603, 853, 281, 915, 1589, 13, 407, 286, 611, 3031, 309, 484, 51160], "temperature": + 0.0, "avg_logprob": -0.1527200946371064, "compression_ratio": 1.74822695035461, + "no_speech_prob": 0.0015541462926194072}, {"id": 77, "seek": 38832, "start": 404.24, + "end": 410.56, "text": " and kind of like maybe work, maybe didn''t, I wasn''t the, + the, the user of this system. 
I didn''t know", "tokens": [51160, 293, 733, 295, + 411, 1310, 589, 11, 1310, 994, 380, 11, 286, 2067, 380, 264, 11, 264, 11, 264, 4195, + 295, 341, 1185, 13, 286, 994, 380, 458, 51476], "temperature": 0.0, "avg_logprob": + -0.1527200946371064, "compression_ratio": 1.74822695035461, "no_speech_prob": 0.0015541462926194072}, + {"id": 78, "seek": 38832, "start": 410.56, "end": 415.84, "text": " what to type. + So I was just grabbing some phrases from the documents and see, okay, does it find + or not,", "tokens": [51476, 437, 281, 2010, 13, 407, 286, 390, 445, 23771, 512, + 20312, 490, 264, 8512, 293, 536, 11, 1392, 11, 775, 309, 915, 420, 406, 11, 51740], + "temperature": 0.0, "avg_logprob": -0.1527200946371064, "compression_ratio": 1.74822695035461, + "no_speech_prob": 0.0015541462926194072}, {"id": 79, "seek": 41584, "start": 415.84, + "end": 422.08, "text": " you know? So is this something that also like attracted + you like, okay, findability, right?", "tokens": [50364, 291, 458, 30, 407, 307, + 341, 746, 300, 611, 411, 15912, 291, 411, 11, 1392, 11, 915, 2310, 11, 558, 30, + 50676], "temperature": 0.0, "avg_logprob": -0.14896703474592454, "compression_ratio": + 1.7028112449799198, "no_speech_prob": 0.0015000887215137482}, {"id": 80, "seek": + 41584, "start": 422.08, "end": 426.64, "text": " Like discovery or maybe discovery + is the next stage, but even the findability itself.", "tokens": [50676, 1743, 12114, + 420, 1310, 12114, 307, 264, 958, 3233, 11, 457, 754, 264, 915, 2310, 2564, 13, 50904], + "temperature": 0.0, "avg_logprob": -0.14896703474592454, "compression_ratio": 1.7028112449799198, + "no_speech_prob": 0.0015000887215137482}, {"id": 81, "seek": 41584, "start": 428.15999999999997, + "end": 431.67999999999995, "text": " Yeah, I guess search was really my first step + towards", "tokens": [50980, 865, 11, 286, 2041, 3164, 390, 534, 452, 700, 1823, + 3030, 51156], "temperature": 0.0, "avg_logprob": -0.14896703474592454, "compression_ratio": + 
1.7028112449799198, "no_speech_prob": 0.0015000887215137482}, {"id": 82, "seek": + 41584, "start": 433.35999999999996, "end": 439.52, "text": " working with real complex + data that wasn''t so unstructured, unstructured data, right? You kind of,", "tokens": + [51240, 1364, 365, 957, 3997, 1412, 300, 2067, 380, 370, 18799, 46847, 11, 18799, + 46847, 1412, 11, 558, 30, 509, 733, 295, 11, 51548], "temperature": 0.0, "avg_logprob": + -0.14896703474592454, "compression_ratio": 1.7028112449799198, "no_speech_prob": + 0.0015000887215137482}, {"id": 83, "seek": 41584, "start": 440.15999999999997, "end": + 444.55999999999995, "text": " you kind of reach a limit with structured data at + some point of getting stuff into databases,", "tokens": [51580, 291, 733, 295, 2524, + 257, 4948, 365, 18519, 1412, 412, 512, 935, 295, 1242, 1507, 666, 22380, 11, 51800], + "temperature": 0.0, "avg_logprob": -0.14896703474592454, "compression_ratio": 1.7028112449799198, + "no_speech_prob": 0.0015000887215137482}, {"id": 84, "seek": 44456, "start": 444.56, + "end": 447.84, "text": " getting it out and things like that. And you can, you can + spend a lifetime in that work.", "tokens": [50364, 1242, 309, 484, 293, 721, 411, + 300, 13, 400, 291, 393, 11, 291, 393, 3496, 257, 11364, 294, 300, 589, 13, 50528], + "temperature": 0.0, "avg_logprob": -0.1303442970651095, "compression_ratio": 1.8975409836065573, + "no_speech_prob": 0.0005779014900326729}, {"id": 85, "seek": 44456, "start": 448.64, + "end": 457.44, "text": " But I felt like I''d been doing it for a while. 
And with, + with search, it was like this,", "tokens": [50568, 583, 286, 2762, 411, 286, 1116, + 668, 884, 309, 337, 257, 1339, 13, 400, 365, 11, 365, 3164, 11, 309, 390, 411, 341, + 11, 51008], "temperature": 0.0, "avg_logprob": -0.1303442970651095, "compression_ratio": + 1.8975409836065573, "no_speech_prob": 0.0005779014900326729}, {"id": 86, "seek": + 44456, "start": 457.44, "end": 462.4, "text": " this weird world where it''s like + all this unknown stuff and you don''t know what to do. So it''s", "tokens": [51008, + 341, 3657, 1002, 689, 309, 311, 411, 439, 341, 9841, 1507, 293, 291, 500, 380, 458, + 437, 281, 360, 13, 407, 309, 311, 51256], "temperature": 0.0, "avg_logprob": -0.1303442970651095, + "compression_ratio": 1.8975409836065573, "no_speech_prob": 0.0005779014900326729}, + {"id": 87, "seek": 44456, "start": 462.4, "end": 466.96, "text": " this unsolved + problem. I felt like databases and things like that were like this solved problem,", + "tokens": [51256, 341, 2693, 29110, 1154, 13, 286, 2762, 411, 22380, 293, 721, 411, + 300, 645, 411, 341, 13041, 1154, 11, 51484], "temperature": 0.0, "avg_logprob": + -0.1303442970651095, "compression_ratio": 1.8975409836065573, "no_speech_prob": + 0.0005779014900326729}, {"id": 88, "seek": 44456, "start": 466.96, "end": 474.0, + "text": " where search, search wasn''t a solved problem and still isn''t. Now with + the work, if I had been", "tokens": [51484, 689, 3164, 11, 3164, 2067, 380, 257, + 13041, 1154, 293, 920, 1943, 380, 13, 823, 365, 264, 589, 11, 498, 286, 632, 668, + 51836], "temperature": 0.0, "avg_logprob": -0.1303442970651095, "compression_ratio": + 1.8975409836065573, "no_speech_prob": 0.0005779014900326729}, {"id": 89, "seek": + 47400, "start": 474.0, "end": 478.08, "text": " doing the same database work, that''s + all no code right now. 
You can just create the same stuff I", "tokens": [50364, + 884, 264, 912, 8149, 589, 11, 300, 311, 439, 572, 3089, 558, 586, 13, 509, 393, + 445, 1884, 264, 912, 1507, 286, 50568], "temperature": 0.0, "avg_logprob": -0.15399330381363158, + "compression_ratio": 1.7018867924528303, "no_speech_prob": 0.0005444454145617783}, + {"id": 90, "seek": 47400, "start": 478.08, "end": 481.68, "text": " was doing with + no code tools. You don''t even have to be a programmer if you don''t want to", "tokens": + [50568, 390, 884, 365, 572, 3089, 3873, 13, 509, 500, 380, 754, 362, 281, 312, 257, + 32116, 498, 291, 500, 380, 528, 281, 50748], "temperature": 0.0, "avg_logprob": + -0.15399330381363158, "compression_ratio": 1.7018867924528303, "no_speech_prob": + 0.0005444454145617783}, {"id": 91, "seek": 47400, "start": 482.32, "end": 486.4, + "text": " with the level that we were doing it, you know, in the mid 2000s. So,", + "tokens": [50780, 365, 264, 1496, 300, 321, 645, 884, 309, 11, 291, 458, 11, 294, + 264, 2062, 8132, 82, 13, 407, 11, 50984], "temperature": 0.0, "avg_logprob": -0.15399330381363158, + "compression_ratio": 1.7018867924528303, "no_speech_prob": 0.0005444454145617783}, + {"id": 92, "seek": 47400, "start": 489.12, "end": 495.36, "text": " yeah, now it + is. And it''s still, it''s still unsolved. Even when we start talking, you know,", + "tokens": [51120, 1338, 11, 586, 309, 307, 13, 400, 309, 311, 920, 11, 309, 311, + 920, 2693, 29110, 13, 2754, 562, 321, 722, 1417, 11, 291, 458, 11, 51432], "temperature": + 0.0, "avg_logprob": -0.15399330381363158, "compression_ratio": 1.7018867924528303, + "no_speech_prob": 0.0005444454145617783}, {"id": 93, "seek": 47400, "start": 495.36, + "end": 499.68, "text": " we''re going to talk about vectors, of course, but vector + search. 
But that''s still an unsolved problem.", "tokens": [51432, 321, 434, 516, + 281, 751, 466, 18875, 11, 295, 1164, 11, 457, 8062, 3164, 13, 583, 300, 311, 920, + 364, 2693, 29110, 1154, 13, 51648], "temperature": 0.0, "avg_logprob": -0.15399330381363158, + "compression_ratio": 1.7018867924528303, "no_speech_prob": 0.0005444454145617783}, + {"id": 94, "seek": 49968, "start": 499.68, "end": 504.8, "text": " It''s like another + tool, but you still have all these issues that you have to take into account.", + "tokens": [50364, 467, 311, 411, 1071, 2290, 11, 457, 291, 920, 362, 439, 613, 2663, + 300, 291, 362, 281, 747, 666, 2696, 13, 50620], "temperature": 0.0, "avg_logprob": + -0.1564168632030487, "compression_ratio": 1.5991735537190082, "no_speech_prob": + 0.006438321899622679}, {"id": 95, "seek": 49968, "start": 505.6, "end": 512.32, + "text": " Yeah. So endless exploration. Yeah, it''s like infinite quest in many + ways. There is like a", "tokens": [50660, 865, 13, 407, 16144, 16197, 13, 865, 11, + 309, 311, 411, 13785, 866, 294, 867, 2098, 13, 821, 307, 411, 257, 50996], "temperature": + 0.0, "avg_logprob": -0.1564168632030487, "compression_ratio": 1.5991735537190082, + "no_speech_prob": 0.006438321899622679}, {"id": 96, "seek": 49968, "start": 512.32, + "end": 519.84, "text": " limitless amount of tasks to solve. But then, so somehow + in your career, there was a turn that you", "tokens": [50996, 4948, 1832, 2372, + 295, 9608, 281, 5039, 13, 583, 550, 11, 370, 6063, 294, 428, 3988, 11, 456, 390, + 257, 1261, 300, 291, 51372], "temperature": 0.0, "avg_logprob": -0.1564168632030487, + "compression_ratio": 1.5991735537190082, "no_speech_prob": 0.006438321899622679}, + {"id": 97, "seek": 49968, "start": 519.84, "end": 528.0, "text": " decided to get + closer to this vector search field. 
I just wanted to hear your kind of first reaction,", + "tokens": [51372, 3047, 281, 483, 4966, 281, 341, 8062, 3164, 2519, 13, 286, 445, + 1415, 281, 1568, 428, 733, 295, 700, 5480, 11, 51780], "temperature": 0.0, "avg_logprob": + -0.1564168632030487, "compression_ratio": 1.5991735537190082, "no_speech_prob": + 0.006438321899622679}, {"id": 98, "seek": 52800, "start": 528.0, "end": 533.92, + "text": " like what did you think about it? When did you hear about it? And also, + what attracted you?", "tokens": [50364, 411, 437, 630, 291, 519, 466, 309, 30, 1133, + 630, 291, 1568, 466, 309, 30, 400, 611, 11, 437, 15912, 291, 30, 50660], "temperature": + 0.0, "avg_logprob": -0.12184692251271215, "compression_ratio": 1.6492890995260663, + "no_speech_prob": 0.00047050503781065345}, {"id": 99, "seek": 52800, "start": 536.96, + "end": 542.72, "text": " I''d say the first thing that really attracted me towards + vector search was the birth paper", "tokens": [50812, 286, 1116, 584, 264, 700, + 551, 300, 534, 15912, 385, 3030, 8062, 3164, 390, 264, 3965, 3035, 51100], "temperature": + 0.0, "avg_logprob": -0.12184692251271215, "compression_ratio": 1.6492890995260663, + "no_speech_prob": 0.00047050503781065345}, {"id": 100, "seek": 52800, "start": 544.16, + "end": 548.56, "text": " that was written in 2018, but I didn''t I didn''t come + across it until 2019.", "tokens": [51172, 300, 390, 3720, 294, 6096, 11, 457, 286, + 994, 380, 286, 994, 380, 808, 2108, 309, 1826, 6071, 13, 51392], "temperature": + 0.0, "avg_logprob": -0.12184692251271215, "compression_ratio": 1.6492890995260663, + "no_speech_prob": 0.00047050503781065345}, {"id": 101, "seek": 52800, "start": 550.4, + "end": 554.0, "text": " And Google had written a blog about how they were using + it for their for their web search.", "tokens": [51484, 400, 3329, 632, 3720, 257, + 6968, 466, 577, 436, 645, 1228, 309, 337, 641, 337, 641, 3670, 3164, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.12184692251271215, 
"compression_ratio": 1.6492890995260663, + "no_speech_prob": 0.00047050503781065345}, {"id": 102, "seek": 55400, "start": 554.96, + "end": 559.92, "text": " And you know, you could download some Python and get this + stuff to work.", "tokens": [50412, 400, 291, 458, 11, 291, 727, 5484, 512, 15329, + 293, 483, 341, 1507, 281, 589, 13, 50660], "temperature": 0.0, "avg_logprob": -0.18963942876676235, + "compression_ratio": 1.4976303317535544, "no_speech_prob": 0.007488514296710491}, + {"id": 103, "seek": 55400, "start": 561.44, "end": 566.08, "text": " But the reason + why I was so fascinated by that is because of", "tokens": [50736, 583, 264, 1778, + 983, 286, 390, 370, 24597, 538, 300, 307, 570, 295, 50968], "temperature": 0.0, + "avg_logprob": -0.18963942876676235, "compression_ratio": 1.4976303317535544, "no_speech_prob": + 0.007488514296710491}, {"id": 104, "seek": 55400, "start": 568.4, "end": 577.92, + "text": " you know, working in search already six years. No, let''s do some math. + So, you know, eight years at", "tokens": [51084, 291, 458, 11, 1364, 294, 3164, + 1217, 2309, 924, 13, 883, 11, 718, 311, 360, 512, 5221, 13, 407, 11, 291, 458, 11, + 3180, 924, 412, 51560], "temperature": 0.0, "avg_logprob": -0.18963942876676235, + "compression_ratio": 1.4976303317535544, "no_speech_prob": 0.007488514296710491}, + {"id": 105, "seek": 55400, "start": 577.92, "end": 583.6, "text": " that point, + I had been stumbling along with the vocabulary problem. 
The query term", "tokens": + [51560, 300, 935, 11, 286, 632, 668, 342, 14188, 2051, 365, 264, 19864, 1154, 13, + 440, 14581, 1433, 51844], "temperature": 0.0, "avg_logprob": -0.18963942876676235, + "compression_ratio": 1.4976303317535544, "no_speech_prob": 0.007488514296710491}, + {"id": 106, "seek": 58360, "start": 583.6, "end": 589.0400000000001, "text": " dependence + problem, as we call it, where, okay, well, to solve this, you have to create a bunch + of", "tokens": [50364, 31704, 1154, 11, 382, 321, 818, 309, 11, 689, 11, 1392, 11, + 731, 11, 281, 5039, 341, 11, 291, 362, 281, 1884, 257, 3840, 295, 50636], "temperature": + 0.0, "avg_logprob": -0.19538683104283602, "compression_ratio": 1.7757847533632287, + "no_speech_prob": 0.0017192689701914787}, {"id": 107, "seek": 58360, "start": 589.0400000000001, + "end": 593.44, "text": " synonyms and then you get to a certain level of advancement + and then you create a taxonomy and then", "tokens": [50636, 5451, 2526, 2592, 293, + 550, 291, 483, 281, 257, 1629, 1496, 295, 35764, 293, 550, 291, 1884, 257, 3366, + 23423, 293, 550, 50856], "temperature": 0.0, "avg_logprob": -0.19538683104283602, + "compression_ratio": 1.7757847533632287, "no_speech_prob": 0.0017192689701914787}, + {"id": 108, "seek": 58360, "start": 593.44, "end": 600.16, "text": " you know, you + created a knowledge graph. 
And you know, before before birth, we''d started playing", + "tokens": [50856, 291, 458, 11, 291, 2942, 257, 3601, 4295, 13, 400, 291, 458, 11, + 949, 949, 3965, 11, 321, 1116, 1409, 2433, 51192], "temperature": 0.0, "avg_logprob": + -0.19538683104283602, "compression_ratio": 1.7757847533632287, "no_speech_prob": + 0.0017192689701914787}, {"id": 109, "seek": 58360, "start": 600.16, "end": 607.2, + "text": " around with word to veck and saying, oh, can you know, can these type + of embeddings be used to solve", "tokens": [51192, 926, 365, 1349, 281, 1241, 547, + 293, 1566, 11, 1954, 11, 393, 291, 458, 11, 393, 613, 2010, 295, 12240, 29432, 312, + 1143, 281, 5039, 51544], "temperature": 0.0, "avg_logprob": -0.19538683104283602, + "compression_ratio": 1.7757847533632287, "no_speech_prob": 0.0017192689701914787}, + {"id": 110, "seek": 60720, "start": 608.08, "end": 612.48, "text": " this whack-able + problem with synonyms and knowledge graph vocabulary expansion?", "tokens": [50408, + 341, 42877, 12, 712, 1154, 365, 5451, 2526, 2592, 293, 3601, 4295, 19864, 11260, + 30, 50628], "temperature": 0.0, "avg_logprob": -0.14181370735168458, "compression_ratio": + 1.6451612903225807, "no_speech_prob": 0.001170991687104106}, {"id": 111, "seek": + 60720, "start": 613.12, "end": 618.24, "text": " The answer turned out to be no + with word to veck. It didn''t work as well as we''d hoped.", "tokens": [50660, 440, + 1867, 3574, 484, 281, 312, 572, 365, 1349, 281, 1241, 547, 13, 467, 994, 380, 589, + 382, 731, 382, 321, 1116, 19737, 13, 50916], "temperature": 0.0, "avg_logprob": + -0.14181370735168458, "compression_ratio": 1.6451612903225807, "no_speech_prob": + 0.001170991687104106}, {"id": 112, "seek": 60720, "start": 618.24, "end": 624.6400000000001, + "text": " It helped with some things, but not, but it harmed with others. 
So it + produced a lot of noise and", "tokens": [50916, 467, 4254, 365, 512, 721, 11, 457, + 406, 11, 457, 309, 41478, 365, 2357, 13, 407, 309, 7126, 257, 688, 295, 5658, 293, + 51236], "temperature": 0.0, "avg_logprob": -0.14181370735168458, "compression_ratio": + 1.6451612903225807, "no_speech_prob": 0.001170991687104106}, {"id": 113, "seek": + 60720, "start": 624.6400000000001, "end": 629.36, "text": " and you know, maybe + we didn''t give it a good enough chance, but we saw, okay, we can train this", "tokens": + [51236, 293, 291, 458, 11, 1310, 321, 994, 380, 976, 309, 257, 665, 1547, 2931, + 11, 457, 321, 1866, 11, 1392, 11, 321, 393, 3847, 341, 51472], "temperature": 0.0, + "avg_logprob": -0.14181370735168458, "compression_ratio": 1.6451612903225807, "no_speech_prob": + 0.001170991687104106}, {"id": 114, "seek": 60720, "start": 629.36, "end": 635.0400000000001, + "text": " thing pretty quick and we can get this model from our content. But there''s + still this problem. So", "tokens": [51472, 551, 1238, 1702, 293, 321, 393, 483, + 341, 2316, 490, 527, 2701, 13, 583, 456, 311, 920, 341, 1154, 13, 407, 51756], "temperature": + 0.0, "avg_logprob": -0.14181370735168458, "compression_ratio": 1.6451612903225807, + "no_speech_prob": 0.001170991687104106}, {"id": 115, "seek": 63504, "start": 636.0, + "end": 642.64, "text": " when I started to play around with some of the Python tools + that were available for", "tokens": [50412, 562, 286, 1409, 281, 862, 926, 365, + 512, 295, 264, 15329, 3873, 300, 645, 2435, 337, 50744], "temperature": 0.0, "avg_logprob": + -0.15774615427081504, "compression_ratio": 1.6681614349775784, "no_speech_prob": + 0.0004325170593801886}, {"id": 116, "seek": 63504, "start": 643.52, "end": 648.56, + "text": " for Bert and large language networks, which actually used word to veck + as the pre-processing step", "tokens": [50788, 337, 29594, 293, 2416, 2856, 9590, + 11, 597, 767, 1143, 1349, 281, 1241, 547, 382, 264, 659, 12, 41075, 278, 1823, 
51040], + "temperature": 0.0, "avg_logprob": -0.15774615427081504, "compression_ratio": 1.6681614349775784, + "no_speech_prob": 0.0004325170593801886}, {"id": 117, "seek": 63504, "start": 649.8399999999999, + "end": 654.4, "text": " to get the first to get the first encodings and then with + first embeddings and then use those", "tokens": [51104, 281, 483, 264, 700, 281, + 483, 264, 700, 2058, 378, 1109, 293, 550, 365, 700, 12240, 29432, 293, 550, 764, + 729, 51332], "temperature": 0.0, "avg_logprob": -0.15774615427081504, "compression_ratio": + 1.6681614349775784, "no_speech_prob": 0.0004325170593801886}, {"id": 118, "seek": + 63504, "start": 654.4, "end": 660.9599999999999, "text": " identifiers to go forward. + I really saw something there. I saw actual similarity where I didn''t,", "tokens": + [51332, 2473, 23463, 281, 352, 2128, 13, 286, 534, 1866, 746, 456, 13, 286, 1866, + 3539, 32194, 689, 286, 994, 380, 11, 51660], "temperature": 0.0, "avg_logprob": + -0.15774615427081504, "compression_ratio": 1.6681614349775784, "no_speech_prob": + 0.0004325170593801886}, {"id": 119, "seek": 66096, "start": 660.96, "end": 666.72, + "text": " I just saw kind of co-occurrence with word to veck before. Yeah, these + things are, you see them", "tokens": [50364, 286, 445, 1866, 733, 295, 598, 12, + 905, 14112, 10760, 365, 1349, 281, 1241, 547, 949, 13, 865, 11, 613, 721, 366, 11, + 291, 536, 552, 50652], "temperature": 0.0, "avg_logprob": -0.12455715093397557, + "compression_ratio": 1.7472527472527473, "no_speech_prob": 0.0007773492834530771}, + {"id": 120, "seek": 66096, "start": 666.72, "end": 671.36, "text": " in the same + context. 
But with actual linguistic similarity, the first time I saw that was with", + "tokens": [50652, 294, 264, 912, 4319, 13, 583, 365, 3539, 43002, 32194, 11, 264, + 700, 565, 286, 1866, 300, 390, 365, 50884], "temperature": 0.0, "avg_logprob": -0.12455715093397557, + "compression_ratio": 1.7472527472527473, "no_speech_prob": 0.0007773492834530771}, + {"id": 121, "seek": 66096, "start": 671.36, "end": 676.5600000000001, "text": " + Bert and that''s where all the hype came from. And then the next step with Bert + is like, okay,", "tokens": [50884, 29594, 293, 300, 311, 689, 439, 264, 24144, 1361, + 490, 13, 400, 550, 264, 958, 1823, 365, 29594, 307, 411, 11, 1392, 11, 51144], "temperature": + 0.0, "avg_logprob": -0.12455715093397557, "compression_ratio": 1.7472527472527473, + "no_speech_prob": 0.0007773492834530771}, {"id": 122, "seek": 66096, "start": 676.5600000000001, + "end": 681.2, "text": " I have these vectors. Now what do I do with them? And then + I said, okay, well, I have to use a", "tokens": [51144, 286, 362, 613, 18875, 13, + 823, 437, 360, 286, 360, 365, 552, 30, 400, 550, 286, 848, 11, 1392, 11, 731, 11, + 286, 362, 281, 764, 257, 51376], "temperature": 0.0, "avg_logprob": -0.12455715093397557, + "compression_ratio": 1.7472527472527473, "no_speech_prob": 0.0007773492834530771}, + {"id": 123, "seek": 66096, "start": 681.2, "end": 686.88, "text": " dot product, + right? I have to use a cosine similarity. Okay, let me just do that. And then I + say,", "tokens": [51376, 5893, 1674, 11, 558, 30, 286, 362, 281, 764, 257, 23565, + 32194, 13, 1033, 11, 718, 385, 445, 360, 300, 13, 400, 550, 286, 584, 11, 51660], + "temperature": 0.0, "avg_logprob": -0.12455715093397557, "compression_ratio": 1.7472527472527473, + "no_speech_prob": 0.0007773492834530771}, {"id": 124, "seek": 68688, "start": 686.96, + "end": 692.16, "text": " oh, you can''t just do that across every vector. It''s + impossible. 
You have to do something else.", "tokens": [50368, 1954, 11, 291, 393, + 380, 445, 360, 300, 2108, 633, 8062, 13, 467, 311, 6243, 13, 509, 362, 281, 360, + 746, 1646, 13, 50628], "temperature": 0.0, "avg_logprob": -0.1347291197957872, "compression_ratio": + 1.4646464646464648, "no_speech_prob": 0.0005540549173019826}, {"id": 125, "seek": + 68688, "start": 692.16, "end": 700.64, "text": " And then you go on this learning + path, right? So that''s where I ended up. And I had actually written", "tokens": + [50628, 400, 550, 291, 352, 322, 341, 2539, 3100, 11, 558, 30, 407, 300, 311, 689, + 286, 4590, 493, 13, 400, 286, 632, 767, 3720, 51052], "temperature": 0.0, "avg_logprob": + -0.1347291197957872, "compression_ratio": 1.4646464646464648, "no_speech_prob": + 0.0005540549173019826}, {"id": 126, "seek": 68688, "start": 700.64, "end": 709.4399999999999, + "text": " a blog post in 2019, you know, about, and I think that post was, you know, + widely accepted by", "tokens": [51052, 257, 6968, 2183, 294, 6071, 11, 291, 458, + 11, 466, 11, 293, 286, 519, 300, 2183, 390, 11, 291, 458, 11, 13371, 9035, 538, + 51492], "temperature": 0.0, "avg_logprob": -0.1347291197957872, "compression_ratio": + 1.4646464646464648, "no_speech_prob": 0.0005540549173019826}, {"id": 127, "seek": + 70944, "start": 709.44, "end": 718.5600000000001, "text": " community, it''s still + in the open source connections blog. 
And it was really, it was really showing like,", + "tokens": [50364, 1768, 11, 309, 311, 920, 294, 264, 1269, 4009, 9271, 6968, 13, + 400, 309, 390, 534, 11, 309, 390, 534, 4099, 411, 11, 50820], "temperature": 0.0, + "avg_logprob": -0.1168779797024197, "compression_ratio": 1.7387387387387387, "no_speech_prob": + 0.002807541051879525}, {"id": 128, "seek": 70944, "start": 718.5600000000001, "end": + 723.0400000000001, "text": " hey, this is, this is a change, you know, it''s not + just Google that''s going to be doing this.", "tokens": [50820, 4177, 11, 341, 307, + 11, 341, 307, 257, 1319, 11, 291, 458, 11, 309, 311, 406, 445, 3329, 300, 311, 516, + 281, 312, 884, 341, 13, 51044], "temperature": 0.0, "avg_logprob": -0.1168779797024197, + "compression_ratio": 1.7387387387387387, "no_speech_prob": 0.002807541051879525}, + {"id": 129, "seek": 70944, "start": 723.0400000000001, "end": 728.8000000000001, + "text": " Like, this is really interesting. And a lot of people agreed and there''s, + there was like this", "tokens": [51044, 1743, 11, 341, 307, 534, 1880, 13, 400, + 257, 688, 295, 561, 9166, 293, 456, 311, 11, 456, 390, 411, 341, 51332], "temperature": + 0.0, "avg_logprob": -0.1168779797024197, "compression_ratio": 1.7387387387387387, + "no_speech_prob": 0.002807541051879525}, {"id": 130, "seek": 70944, "start": 728.8000000000001, + "end": 734.6400000000001, "text": " movement that kind of happened after that. And + a lot of other people were coming to the same", "tokens": [51332, 3963, 300, 733, + 295, 2011, 934, 300, 13, 400, 257, 688, 295, 661, 561, 645, 1348, 281, 264, 912, + 51624], "temperature": 0.0, "avg_logprob": -0.1168779797024197, "compression_ratio": + 1.7387387387387387, "no_speech_prob": 0.002807541051879525}, {"id": 131, "seek": + 73464, "start": 734.64, "end": 742.96, "text": " conclusions, but there were a lot + of challenges. 
So with vector search and approximate nearest", "tokens": [50364, + 22865, 11, 457, 456, 645, 257, 688, 295, 4759, 13, 407, 365, 8062, 3164, 293, 30874, + 23831, 50780], "temperature": 0.0, "avg_logprob": -0.1271048684914907, "compression_ratio": + 1.6905829596412556, "no_speech_prob": 0.002390810288488865}, {"id": 132, "seek": + 73464, "start": 742.96, "end": 752.96, "text": " neighbor search, you know, that''s, + it would, that''s just the tool to solve the problem. It''s like,", "tokens": [50780, + 5987, 3164, 11, 291, 458, 11, 300, 311, 11, 309, 576, 11, 300, 311, 445, 264, 2290, + 281, 5039, 264, 1154, 13, 467, 311, 411, 11, 51280], "temperature": 0.0, "avg_logprob": + -0.1271048684914907, "compression_ratio": 1.6905829596412556, "no_speech_prob": + 0.002390810288488865}, {"id": 133, "seek": 73464, "start": 752.96, "end": 756.48, + "text": " you know, you start with this problem over here, and then you go like + 10 steps over here,", "tokens": [51280, 291, 458, 11, 291, 722, 365, 341, 1154, + 670, 510, 11, 293, 550, 291, 352, 411, 1266, 4439, 670, 510, 11, 51456], "temperature": + 0.0, "avg_logprob": -0.1271048684914907, "compression_ratio": 1.6905829596412556, + "no_speech_prob": 0.002390810288488865}, {"id": 134, "seek": 73464, "start": 756.48, + "end": 760.64, "text": " and finally, you get to vector searching. 
Okay, this is, + this is a potential solution, right?", "tokens": [51456, 293, 2721, 11, 291, 483, + 281, 8062, 10808, 13, 1033, 11, 341, 307, 11, 341, 307, 257, 3995, 3827, 11, 558, + 30, 51664], "temperature": 0.0, "avg_logprob": -0.1271048684914907, "compression_ratio": + 1.6905829596412556, "no_speech_prob": 0.002390810288488865}, {"id": 135, "seek": + 76064, "start": 761.1999999999999, "end": 764.4, "text": " This is the core of the + potential solution with all this stuff in the middle.", "tokens": [50392, 639, 307, + 264, 4965, 295, 264, 3995, 3827, 365, 439, 341, 1507, 294, 264, 2808, 13, 50552], + "temperature": 0.0, "avg_logprob": -0.13434708913167318, "compression_ratio": 1.5319148936170213, + "no_speech_prob": 0.0066062333062291145}, {"id": 136, "seek": 76064, "start": 765.84, + "end": 772.56, "text": " Yeah, but have you felt that I should read this blog and + we''ll definitely link it in the show notes.", "tokens": [50624, 865, 11, 457, 362, + 291, 2762, 300, 286, 820, 1401, 341, 6968, 293, 321, 603, 2138, 2113, 309, 294, + 264, 855, 5570, 13, 50960], "temperature": 0.0, "avg_logprob": -0.13434708913167318, + "compression_ratio": 1.5319148936170213, "no_speech_prob": 0.0066062333062291145}, + {"id": 137, "seek": 76064, "start": 773.1999999999999, "end": 780.96, "text": " + But sometimes when I look at vector search, let''s say demos or applications or + algorithms,", "tokens": [50992, 583, 2171, 562, 286, 574, 412, 8062, 3164, 11, 718, + 311, 584, 33788, 420, 5821, 420, 14642, 11, 51380], "temperature": 0.0, "avg_logprob": + -0.13434708913167318, "compression_ratio": 1.5319148936170213, "no_speech_prob": + 0.0066062333062291145}, {"id": 138, "seek": 76064, "start": 781.92, "end": 788.0, + "text": " I get a feeling that you might just think, okay, I have a solution. 
Let + me find a problem.", "tokens": [51428, 286, 483, 257, 2633, 300, 291, 1062, 445, + 519, 11, 1392, 11, 286, 362, 257, 3827, 13, 961, 385, 915, 257, 1154, 13, 51732], + "temperature": 0.0, "avg_logprob": -0.13434708913167318, "compression_ratio": 1.5319148936170213, + "no_speech_prob": 0.0066062333062291145}, {"id": 139, "seek": 78800, "start": 788.0, + "end": 798.4, "text": " Because it''s, it''s all semitical. I mean, it''s so sexy, + right? Do you, do you think this is one", "tokens": [50364, 1436, 309, 311, 11, + 309, 311, 439, 4361, 270, 804, 13, 286, 914, 11, 309, 311, 370, 13701, 11, 558, + 30, 1144, 291, 11, 360, 291, 519, 341, 307, 472, 50884], "temperature": 0.0, "avg_logprob": + -0.2098242441813151, "compression_ratio": 1.566137566137566, "no_speech_prob": 0.008189880289137363}, + {"id": 140, "seek": 78800, "start": 798.4, "end": 804.88, "text": " of the sort + of misconceptions, you know, in this field, or do you think that it''s well-past + that already?", "tokens": [50884, 295, 264, 1333, 295, 50012, 11, 291, 458, 11, + 294, 341, 2519, 11, 420, 360, 291, 519, 300, 309, 311, 731, 12, 79, 525, 300, 1217, + 30, 51208], "temperature": 0.0, "avg_logprob": -0.2098242441813151, "compression_ratio": + 1.566137566137566, "no_speech_prob": 0.008189880289137363}, {"id": 141, "seek": + 78800, "start": 805.92, "end": 812.72, "text": " That''s a great question. I don''t + know if, I don''t think it''s a solution looking for a problem.", "tokens": [51260, + 663, 311, 257, 869, 1168, 13, 286, 500, 380, 458, 498, 11, 286, 500, 380, 519, 309, + 311, 257, 3827, 1237, 337, 257, 1154, 13, 51600], "temperature": 0.0, "avg_logprob": + -0.2098242441813151, "compression_ratio": 1.566137566137566, "no_speech_prob": 0.008189880289137363}, + {"id": 142, "seek": 81272, "start": 812.72, "end": 818.32, "text": " I don''t think + that''s true. 
I think there, it actually does solve some problems.", "tokens": [50364, + 286, 500, 380, 519, 300, 311, 2074, 13, 286, 519, 456, 11, 309, 767, 775, 5039, + 512, 2740, 13, 50644], "temperature": 0.0, "avg_logprob": -0.1123572826385498, "compression_ratio": + 1.7188940092165899, "no_speech_prob": 0.0018509075744077563}, {"id": 143, "seek": + 81272, "start": 819.52, "end": 827.12, "text": " But I do agree that it gets, you + know, there''s a lot of gray area. And how do you arrive at that", "tokens": [50704, + 583, 286, 360, 3986, 300, 309, 2170, 11, 291, 458, 11, 456, 311, 257, 688, 295, + 10855, 1859, 13, 400, 577, 360, 291, 8881, 412, 300, 51084], "temperature": 0.0, + "avg_logprob": -0.1123572826385498, "compression_ratio": 1.7188940092165899, "no_speech_prob": + 0.0018509075744077563}, {"id": 144, "seek": 81272, "start": 827.12, "end": 833.28, + "text": " from, I need to find things as a person? You know, and all the things + that you have to go through", "tokens": [51084, 490, 11, 286, 643, 281, 915, 721, + 382, 257, 954, 30, 509, 458, 11, 293, 439, 264, 721, 300, 291, 362, 281, 352, 807, + 51392], "temperature": 0.0, "avg_logprob": -0.1123572826385498, "compression_ratio": + 1.7188940092165899, "no_speech_prob": 0.0018509075744077563}, {"id": 145, "seek": + 81272, "start": 833.28, "end": 838.72, "text": " until vector search actually means + something that is a solution. 
I think there''s, there''s a lot of", "tokens": [51392, + 1826, 8062, 3164, 767, 1355, 746, 300, 307, 257, 3827, 13, 286, 519, 456, 311, 11, + 456, 311, 257, 688, 295, 51664], "temperature": 0.0, "avg_logprob": -0.1123572826385498, + "compression_ratio": 1.7188940092165899, "no_speech_prob": 0.0018509075744077563}, + {"id": 146, "seek": 83872, "start": 838.72, "end": 842.24, "text": " people who + picked it up and say, okay, we could just use this and it''s going to solve, solve + these", "tokens": [50364, 561, 567, 6183, 309, 493, 293, 584, 11, 1392, 11, 321, + 727, 445, 764, 341, 293, 309, 311, 516, 281, 5039, 11, 5039, 613, 50540], "temperature": + 0.0, "avg_logprob": -0.13758413314819337, "compression_ratio": 1.5766129032258065, + "no_speech_prob": 0.0017870538868010044}, {"id": 147, "seek": 83872, "start": 842.24, + "end": 848.4, "text": " problems. But it doesn''t do that, right? Because search + is not just about similarity, you know, you can", "tokens": [50540, 2740, 13, 583, + 309, 1177, 380, 360, 300, 11, 558, 30, 1436, 3164, 307, 406, 445, 466, 32194, 11, + 291, 458, 11, 291, 393, 50848], "temperature": 0.0, "avg_logprob": -0.13758413314819337, + "compression_ratio": 1.5766129032258065, "no_speech_prob": 0.0017870538868010044}, + {"id": 148, "seek": 83872, "start": 848.4, "end": 856.64, "text": " express a query + similarity with a document using TFI DFBM25, you know, the sentence transformer,", + "tokens": [50848, 5109, 257, 14581, 32194, 365, 257, 4166, 1228, 314, 38568, 48336, + 18345, 6074, 11, 291, 458, 11, 264, 8174, 31782, 11, 51260], "temperature": 0.0, + "avg_logprob": -0.13758413314819337, "compression_ratio": 1.5766129032258065, "no_speech_prob": + 0.0017870538868010044}, {"id": 149, "seek": 83872, "start": 856.64, "end": 863.2, + "text": " you know, cosine distance, whatever. But that''s only the similarity. 
+ There''s also like the,", "tokens": [51260, 291, 458, 11, 23565, 4560, 11, 2035, + 13, 583, 300, 311, 787, 264, 32194, 13, 821, 311, 611, 411, 264, 11, 51588], "temperature": + 0.0, "avg_logprob": -0.13758413314819337, "compression_ratio": 1.5766129032258065, + "no_speech_prob": 0.0017870538868010044}, {"id": 150, "seek": 86320, "start": 863.2800000000001, + "end": 868.32, "text": " the need that the person has to what they have. So it''s, + it''s a bunch of", "tokens": [50368, 264, 643, 300, 264, 954, 575, 281, 437, 436, + 362, 13, 407, 309, 311, 11, 309, 311, 257, 3840, 295, 50620], "temperature": 0.0, + "avg_logprob": -0.09141857856142838, "compression_ratio": 1.7549407114624507, "no_speech_prob": + 0.00377467623911798}, {"id": 151, "seek": 86320, "start": 869.5200000000001, "end": + 873.84, "text": " candidate documents that are similar, but what''s the actual document + you need? So that''s where a lot", "tokens": [50680, 11532, 8512, 300, 366, 2531, + 11, 457, 437, 311, 264, 3539, 4166, 291, 643, 30, 407, 300, 311, 689, 257, 688, + 50896], "temperature": 0.0, "avg_logprob": -0.09141857856142838, "compression_ratio": + 1.7549407114624507, "no_speech_prob": 0.00377467623911798}, {"id": 152, "seek": + 86320, "start": 873.84, "end": 878.88, "text": " of other things come into play. 
+ It''s just one piece in a much larger search or recommendations", "tokens": [50896, + 295, 661, 721, 808, 666, 862, 13, 467, 311, 445, 472, 2522, 294, 257, 709, 4833, + 3164, 420, 10434, 51148], "temperature": 0.0, "avg_logprob": -0.09141857856142838, + "compression_ratio": 1.7549407114624507, "no_speech_prob": 0.00377467623911798}, + {"id": 153, "seek": 86320, "start": 878.88, "end": 883.76, "text": " platform, you + know, you still have to take on all the other signals and, you know,", "tokens": + [51148, 3663, 11, 291, 458, 11, 291, 920, 362, 281, 747, 322, 439, 264, 661, 12354, + 293, 11, 291, 458, 11, 51392], "temperature": 0.0, "avg_logprob": -0.09141857856142838, + "compression_ratio": 1.7549407114624507, "no_speech_prob": 0.00377467623911798}, + {"id": 154, "seek": 86320, "start": 885.36, "end": 890.48, "text": " common now + in the, in the more mature platforms is, you know, you have some learning to rank", + "tokens": [51472, 2689, 586, 294, 264, 11, 294, 264, 544, 14442, 9473, 307, 11, + 291, 458, 11, 291, 362, 512, 2539, 281, 6181, 51728], "temperature": 0.0, "avg_logprob": + -0.09141857856142838, "compression_ratio": 1.7549407114624507, "no_speech_prob": + 0.00377467623911798}, {"id": 155, "seek": 89048, "start": 890.48, "end": 896.48, + "text": " algorithm that takes, you know, me and Vector similarity is one, is one + feature in, in a learning", "tokens": [50364, 9284, 300, 2516, 11, 291, 458, 11, + 385, 293, 691, 20814, 32194, 307, 472, 11, 307, 472, 4111, 294, 11, 294, 257, 2539, + 50664], "temperature": 0.0, "avg_logprob": -0.20427821766246448, "compression_ratio": + 1.6428571428571428, "no_speech_prob": 0.0007845693035051227}, {"id": 156, "seek": + 89048, "start": 896.48, "end": 902.64, "text": " to rank model. 
Along with, you + know, BM25 with the title, BM25 with the body, you know, the number of", "tokens": + [50664, 281, 6181, 2316, 13, 17457, 365, 11, 291, 458, 11, 15901, 6074, 365, 264, + 4876, 11, 15901, 6074, 365, 264, 1772, 11, 291, 458, 11, 264, 1230, 295, 50972], + "temperature": 0.0, "avg_logprob": -0.20427821766246448, "compression_ratio": 1.6428571428571428, + "no_speech_prob": 0.0007845693035051227}, {"id": 157, "seek": 89048, "start": 903.36, + "end": 909.76, "text": " clicks, the date, all this other stuff. And it''s, it''s + a piece. But the thing that the piece", "tokens": [51008, 18521, 11, 264, 4002, + 11, 439, 341, 661, 1507, 13, 400, 309, 311, 11, 309, 311, 257, 2522, 13, 583, 264, + 551, 300, 264, 2522, 51328], "temperature": 0.0, "avg_logprob": -0.20427821766246448, + "compression_ratio": 1.6428571428571428, "no_speech_prob": 0.0007845693035051227}, + {"id": 158, "seek": 89048, "start": 909.76, "end": 919.6, "text": " solves is that + query term dependence problem, whereas like I don''t have to, in a, in a, sometimes,", + "tokens": [51328, 39890, 307, 300, 14581, 1433, 31704, 1154, 11, 9735, 411, 286, + 500, 380, 362, 281, 11, 294, 257, 11, 294, 257, 11, 2171, 11, 51820], "temperature": + 0.0, "avg_logprob": -0.20427821766246448, "compression_ratio": 1.6428571428571428, + "no_speech_prob": 0.0007845693035051227}, {"id": 159, "seek": 91960, "start": 919.6, + "end": 923.9200000000001, "text": " you know, I don''t have to go in and, and craft + synonyms by hand, and I don''t have this endless task of", "tokens": [50364, 291, + 458, 11, 286, 500, 380, 362, 281, 352, 294, 293, 11, 293, 8448, 5451, 2526, 2592, + 538, 1011, 11, 293, 286, 500, 380, 362, 341, 16144, 5633, 295, 50580], "temperature": + 0.0, "avg_logprob": -0.17329160902235244, "compression_ratio": 1.8160919540229885, + "no_speech_prob": 0.0018212435534223914}, {"id": 160, "seek": 91960, "start": 923.9200000000001, + "end": 927.84, "text": " doing that. 
You just, you kind of have all these other + tasks that you still have to do, but", "tokens": [50580, 884, 300, 13, 509, 445, + 11, 291, 733, 295, 362, 439, 613, 661, 9608, 300, 291, 920, 362, 281, 360, 11, 457, + 50776], "temperature": 0.0, "avg_logprob": -0.17329160902235244, "compression_ratio": + 1.8160919540229885, "no_speech_prob": 0.0018212435534223914}, {"id": 161, "seek": + 91960, "start": 928.48, "end": 934.8000000000001, "text": " that one maybe has kept + it bay a little bit. Yeah, yeah, absolutely. I mean, maybe I can", "tokens": [50808, + 300, 472, 1310, 575, 4305, 309, 13642, 257, 707, 857, 13, 865, 11, 1338, 11, 3122, + 13, 286, 914, 11, 1310, 286, 393, 51124], "temperature": 0.0, "avg_logprob": -0.17329160902235244, + "compression_ratio": 1.8160919540229885, "no_speech_prob": 0.0018212435534223914}, + {"id": 162, "seek": 91960, "start": 935.76, "end": 942.64, "text": " a little bit + like, restate my question, or sort of like, clarify what I meant, I guess, when + you read,", "tokens": [51172, 257, 707, 857, 411, 11, 1472, 473, 452, 1168, 11, + 420, 1333, 295, 411, 11, 17594, 437, 286, 4140, 11, 286, 2041, 11, 562, 291, 1401, + 11, 51516], "temperature": 0.0, "avg_logprob": -0.17329160902235244, "compression_ratio": + 1.8160919540229885, "no_speech_prob": 0.0018212435534223914}, {"id": 163, "seek": + 91960, "start": 943.36, "end": 948.72, "text": " I think when you read the paper, + like, birth or similar papers, they also say, hey, we,", "tokens": [51552, 286, + 519, 562, 291, 1401, 264, 3035, 11, 411, 11, 3965, 420, 2531, 10577, 11, 436, 611, + 584, 11, 4177, 11, 321, 11, 51820], "temperature": 0.0, "avg_logprob": -0.17329160902235244, + "compression_ratio": 1.8160919540229885, "no_speech_prob": 0.0018212435534223914}, + {"id": 164, "seek": 94960, "start": 950.08, "end": 955.2, "text": " we ran down + this on downstream task, like sentiment analysis, we also did question answering,", + "tokens": [50388, 321, 5872, 760, 341, 322, 30621, 5633, 11, 
411, 16149, 5215, 11, + 321, 611, 630, 1168, 13430, 11, 50644], "temperature": 0.0, "avg_logprob": -0.13802738811658777, + "compression_ratio": 1.6276150627615062, "no_speech_prob": 0.0023776732850819826}, + {"id": 165, "seek": 94960, "start": 955.2, "end": 960.4, "text": " we did recommendation, + all these other things. And it works great. Which kind of like pushes you", "tokens": + [50644, 321, 630, 11879, 11, 439, 613, 661, 721, 13, 400, 309, 1985, 869, 13, 3013, + 733, 295, 411, 21020, 291, 50904], "temperature": 0.0, "avg_logprob": -0.13802738811658777, + "compression_ratio": 1.6276150627615062, "no_speech_prob": 0.0023776732850819826}, + {"id": 166, "seek": 94960, "start": 960.4, "end": 966.72, "text": " to think in + the direction that is this a universal language model or approach that I can now + take", "tokens": [50904, 281, 519, 294, 264, 3513, 300, 307, 341, 257, 11455, 2856, + 2316, 420, 3109, 300, 286, 393, 586, 747, 51220], "temperature": 0.0, "avg_logprob": + -0.13802738811658777, "compression_ratio": 1.6276150627615062, "no_speech_prob": + 0.0023776732850819826}, {"id": 167, "seek": 94960, "start": 966.72, "end": 973.2, + "text": " and apply to everywhere, every task. And the answer is actually no, because, + hey, I mean, if you are", "tokens": [51220, 293, 3079, 281, 5315, 11, 633, 5633, + 13, 400, 264, 1867, 307, 767, 572, 11, 570, 11, 4177, 11, 286, 914, 11, 498, 291, + 366, 51544], "temperature": 0.0, "avg_logprob": -0.13802738811658777, "compression_ratio": + 1.6276150627615062, "no_speech_prob": 0.0023776732850819826}, {"id": 168, "seek": + 97320, "start": 973.2, "end": 979.9200000000001, "text": " in healthcare and they + trained on news, it''s not going to work. 
So the vocabulary still was not", "tokens": + [50364, 294, 8884, 293, 436, 8895, 322, 2583, 11, 309, 311, 406, 516, 281, 589, + 13, 407, 264, 19864, 920, 390, 406, 50700], "temperature": 0.0, "avg_logprob": -0.16023155428328603, + "compression_ratio": 1.588235294117647, "no_speech_prob": 0.007176055572926998}, + {"id": 169, "seek": 97320, "start": 979.9200000000001, "end": 986.32, "text": " + excluded from this journey. So if it''s mismatch, it''s mismatch. But the model + itself, of course,", "tokens": [50700, 29486, 490, 341, 4671, 13, 407, 498, 309, + 311, 23220, 852, 11, 309, 311, 23220, 852, 13, 583, 264, 2316, 2564, 11, 295, 1164, + 11, 51020], "temperature": 0.0, "avg_logprob": -0.16023155428328603, "compression_ratio": + 1.588235294117647, "no_speech_prob": 0.007176055572926998}, {"id": 170, "seek": + 97320, "start": 986.32, "end": 992.72, "text": " is a clever piece of, you know, + attack, which you can then take and kind of apply fine tune, maybe,", "tokens": + [51020, 307, 257, 13494, 2522, 295, 11, 291, 458, 11, 2690, 11, 597, 291, 393, 550, + 747, 293, 733, 295, 3079, 2489, 10864, 11, 1310, 11, 51340], "temperature": 0.0, + "avg_logprob": -0.16023155428328603, "compression_ratio": 1.588235294117647, "no_speech_prob": + 0.007176055572926998}, {"id": 171, "seek": 97320, "start": 992.72, "end": 1000.72, + "text": " or retrain on your data. So I think that that''s, that''s one way to look + at it, right?", "tokens": [51340, 420, 1533, 7146, 322, 428, 1412, 13, 407, 286, + 519, 300, 300, 311, 11, 300, 311, 472, 636, 281, 574, 412, 309, 11, 558, 30, 51740], + "temperature": 0.0, "avg_logprob": -0.16023155428328603, "compression_ratio": 1.588235294117647, + "no_speech_prob": 0.007176055572926998}, {"id": 172, "seek": 100072, "start": 1000.96, + "end": 1010.96, "text": " It is, but I think that we, we see a huge, still a huge + gap in the domain, right? 
I think there", "tokens": [50376, 467, 307, 11, 457, 286, + 519, 300, 321, 11, 321, 536, 257, 2603, 11, 920, 257, 2603, 7417, 294, 264, 9274, + 11, 558, 30, 286, 519, 456, 50876], "temperature": 0.0, "avg_logprob": -0.1841632544276226, + "compression_ratio": 1.5966850828729282, "no_speech_prob": 0.010473725385963917}, + {"id": 173, "seek": 100072, "start": 1010.96, "end": 1016.4, "text": " are a lot + of organizations that can just make use of retrain models and fine tune them. But,", + "tokens": [50876, 366, 257, 688, 295, 6150, 300, 393, 445, 652, 764, 295, 1533, + 7146, 5245, 293, 2489, 10864, 552, 13, 583, 11, 51148], "temperature": 0.0, "avg_logprob": + -0.1841632544276226, "compression_ratio": 1.5966850828729282, "no_speech_prob": + 0.010473725385963917}, {"id": 174, "seek": 100072, "start": 1018.08, "end": 1023.9200000000001, + "text": " you know, we, I know that there are still domains that you can''t do that. + Like, if you go up and you", "tokens": [51232, 291, 458, 11, 321, 11, 286, 458, + 300, 456, 366, 920, 25514, 300, 291, 393, 380, 360, 300, 13, 1743, 11, 498, 291, + 352, 493, 293, 291, 51524], "temperature": 0.0, "avg_logprob": -0.1841632544276226, + "compression_ratio": 1.5966850828729282, "no_speech_prob": 0.010473725385963917}, + {"id": 175, "seek": 102392, "start": 1023.92, "end": 1032.56, "text": " try, you + know, something that''s fine tuned, like law, right? Law is like its own language. + I wouldn''t", "tokens": [50364, 853, 11, 291, 458, 11, 746, 300, 311, 2489, 10870, + 11, 411, 2101, 11, 558, 30, 7744, 307, 411, 1080, 1065, 2856, 13, 286, 2759, 380, + 50796], "temperature": 0.0, "avg_logprob": -0.10807442429042098, "compression_ratio": + 1.75, "no_speech_prob": 0.0016423448687419295}, {"id": 176, "seek": 102392, "start": + 1032.56, "end": 1037.68, "text": " even, like law written in English, I wouldn''t + even call that English. 
I''d call that, you know, legal", "tokens": [50796, 754, + 11, 411, 2101, 3720, 294, 3669, 11, 286, 2759, 380, 754, 818, 300, 3669, 13, 286, + 1116, 818, 300, 11, 291, 458, 11, 5089, 51052], "temperature": 0.0, "avg_logprob": + -0.10807442429042098, "compression_ratio": 1.75, "no_speech_prob": 0.0016423448687419295}, + {"id": 177, "seek": 102392, "start": 1037.68, "end": 1046.8, "text": " English, + right? Because just the structure, the vocabulary, the grammar, all this stuff is + so", "tokens": [51052, 3669, 11, 558, 30, 1436, 445, 264, 3877, 11, 264, 19864, + 11, 264, 22317, 11, 439, 341, 1507, 307, 370, 51508], "temperature": 0.0, "avg_logprob": + -0.10807442429042098, "compression_ratio": 1.75, "no_speech_prob": 0.0016423448687419295}, + {"id": 178, "seek": 102392, "start": 1046.8, "end": 1051.92, "text": " different + than what''s in like a Wikipedia article or in the news or something like that, + right?", "tokens": [51508, 819, 813, 437, 311, 294, 411, 257, 28999, 7222, 420, + 294, 264, 2583, 420, 746, 411, 300, 11, 558, 30, 51764], "temperature": 0.0, "avg_logprob": + -0.10807442429042098, "compression_ratio": 1.75, "no_speech_prob": 0.0016423448687419295}, + {"id": 179, "seek": 105192, "start": 1052.4, "end": 1060.0, "text": " So, when you + try to do a fine tuning on a pretrained model that''s trained on, you know, let''s + say like", "tokens": [50388, 407, 11, 562, 291, 853, 281, 360, 257, 2489, 15164, + 322, 257, 1162, 31774, 2316, 300, 311, 8895, 322, 11, 291, 458, 11, 718, 311, 584, + 411, 50768], "temperature": 0.0, "avg_logprob": -0.20728203685013288, "compression_ratio": + 1.6434782608695653, "no_speech_prob": 0.0008146049221977592}, {"id": 180, "seek": + 105192, "start": 1060.0, "end": 1065.04, "text": " onto notice 5, which is a bunch + of collections of like, you know, news, Wikipedia, like general", "tokens": [50768, + 3911, 3449, 1025, 11, 597, 307, 257, 3840, 295, 16641, 295, 411, 11, 291, 458, 11, + 2583, 11, 28999, 11, 411, 2674, 51020], 
"temperature": 0.0, "avg_logprob": -0.20728203685013288, + "compression_ratio": 1.6434782608695653, "no_speech_prob": 0.0008146049221977592}, + {"id": 181, "seek": 105192, "start": 1065.04, "end": 1070.88, "text": " knowledge + that most people use. When you find tune it, there''s still a gap. There''s, there''s", + "tokens": [51020, 3601, 300, 881, 561, 764, 13, 1133, 291, 915, 10864, 309, 11, + 456, 311, 920, 257, 7417, 13, 821, 311, 11, 456, 311, 51312], "temperature": 0.0, + "avg_logprob": -0.20728203685013288, "compression_ratio": 1.6434782608695653, "no_speech_prob": + 0.0008146049221977592}, {"id": 182, "seek": 105192, "start": 1070.88, "end": 1078.48, + "text": " something missing, right? Because the original trained model was lacking + this context.", "tokens": [51312, 746, 5361, 11, 558, 30, 1436, 264, 3380, 8895, + 2316, 390, 20889, 341, 4319, 13, 51692], "temperature": 0.0, "avg_logprob": -0.20728203685013288, + "compression_ratio": 1.6434782608695653, "no_speech_prob": 0.0008146049221977592}, + {"id": 183, "seek": 107848, "start": 1079.2, "end": 1087.52, "text": " And that''s, + that''s only for the content also. That''s just, that''s just the content. 
And when + people", "tokens": [50400, 400, 300, 311, 11, 300, 311, 787, 337, 264, 2701, 611, + 13, 663, 311, 445, 11, 300, 311, 445, 264, 2701, 13, 400, 562, 561, 50816], "temperature": + 0.0, "avg_logprob": -0.1696626773247352, "compression_ratio": 1.8064516129032258, + "no_speech_prob": 0.004847658798098564}, {"id": 184, "seek": 107848, "start": 1087.52, + "end": 1092.8, "text": " search and they type in terms, you know, you can imagine + like this, this Venn diagram of like,", "tokens": [50816, 3164, 293, 436, 2010, + 294, 2115, 11, 291, 458, 11, 291, 393, 3811, 411, 341, 11, 341, 691, 1857, 10686, + 295, 411, 11, 51080], "temperature": 0.0, "avg_logprob": -0.1696626773247352, "compression_ratio": + 1.8064516129032258, "no_speech_prob": 0.004847658798098564}, {"id": 185, "seek": + 107848, "start": 1092.8, "end": 1097.04, "text": " well, here''s, here''s all of + the content over here that you''ve trained on. And then here''s all the", "tokens": + [51080, 731, 11, 510, 311, 11, 510, 311, 439, 295, 264, 2701, 670, 510, 300, 291, + 600, 8895, 322, 13, 400, 550, 510, 311, 439, 264, 51292], "temperature": 0.0, "avg_logprob": + -0.1696626773247352, "compression_ratio": 1.8064516129032258, "no_speech_prob": + 0.004847658798098564}, {"id": 186, "seek": 107848, "start": 1097.04, "end": 1101.52, + "text": " terms that your people, that the users know, right? And you try to like + bring these closer together", "tokens": [51292, 2115, 300, 428, 561, 11, 300, 264, + 5022, 458, 11, 558, 30, 400, 291, 853, 281, 411, 1565, 613, 4966, 1214, 51516], + "temperature": 0.0, "avg_logprob": -0.1696626773247352, "compression_ratio": 1.8064516129032258, + "no_speech_prob": 0.004847658798098564}, {"id": 187, "seek": 110152, "start": 1101.6, + "end": 1111.84, "text": " somehow, right? 
If the model was trained on content that + is like up here, then you''re going to have", "tokens": [50368, 6063, 11, 558, 30, + 759, 264, 2316, 390, 8895, 322, 2701, 300, 307, 411, 493, 510, 11, 550, 291, 434, + 516, 281, 362, 50880], "temperature": 0.0, "avg_logprob": -0.15758652262168354, + "compression_ratio": 1.6147540983606556, "no_speech_prob": 0.01173362322151661}, + {"id": 188, "seek": 110152, "start": 1111.84, "end": 1116.16, "text": " trouble + like kind of putting it together. I don''t know if you can do a good job in my hands + showing", "tokens": [50880, 5253, 411, 733, 295, 3372, 309, 1214, 13, 286, 500, + 380, 458, 498, 291, 393, 360, 257, 665, 1691, 294, 452, 2377, 4099, 51096], "temperature": + 0.0, "avg_logprob": -0.15758652262168354, "compression_ratio": 1.6147540983606556, + "no_speech_prob": 0.01173362322151661}, {"id": 189, "seek": 110152, "start": 1116.16, + "end": 1124.8, "text": " this, but no, you''re doing perfect job there. So I think + that one of the one of the big existing", "tokens": [51096, 341, 11, 457, 572, 11, + 291, 434, 884, 2176, 1691, 456, 13, 407, 286, 519, 300, 472, 295, 264, 472, 295, + 264, 955, 6741, 51528], "temperature": 0.0, "avg_logprob": -0.15758652262168354, + "compression_ratio": 1.6147540983606556, "no_speech_prob": 0.01173362322151661}, + {"id": 190, "seek": 110152, "start": 1124.8, "end": 1130.56, "text": " problems + is pre-training still costs like a ridiculous amount of money and is out of the + reach of", "tokens": [51528, 2740, 307, 659, 12, 17227, 1760, 920, 5497, 411, 257, + 11083, 2372, 295, 1460, 293, 307, 484, 295, 264, 2524, 295, 51816], "temperature": + 0.0, "avg_logprob": -0.15758652262168354, "compression_ratio": 1.6147540983606556, + "no_speech_prob": 0.01173362322151661}, {"id": 191, "seek": 113056, "start": 1130.56, + "end": 1139.76, "text": " most teams. 
Yeah, I''ve read, I''ve read papers, you know, + one of them was by Microsoft showing like,", "tokens": [50364, 881, 5491, 13, 865, + 11, 286, 600, 1401, 11, 286, 600, 1401, 10577, 11, 291, 458, 11, 472, 295, 552, + 390, 538, 8116, 4099, 411, 11, 50824], "temperature": 0.0, "avg_logprob": -0.11304883858592239, + "compression_ratio": 1.6563876651982379, "no_speech_prob": 0.0005696497391909361}, + {"id": 192, "seek": 113056, "start": 1139.76, "end": 1144.24, "text": " if you, + you know, the bird vocabulary is like 30,000 words or something like that. If you + increase", "tokens": [50824, 498, 291, 11, 291, 458, 11, 264, 5255, 19864, 307, + 411, 2217, 11, 1360, 2283, 420, 746, 411, 300, 13, 759, 291, 3488, 51048], "temperature": + 0.0, "avg_logprob": -0.11304883858592239, "compression_ratio": 1.6563876651982379, + "no_speech_prob": 0.0005696497391909361}, {"id": 193, "seek": 113056, "start": 1144.24, + "end": 1154.8, "text": " the vocabulary size to like 100,000 words, then the model + generalizes much better. And you,", "tokens": [51048, 264, 19864, 2744, 281, 411, + 2319, 11, 1360, 2283, 11, 550, 264, 2316, 2674, 5660, 709, 1101, 13, 400, 291, 11, + 51576], "temperature": 0.0, "avg_logprob": -0.11304883858592239, "compression_ratio": + 1.6563876651982379, "no_speech_prob": 0.0005696497391909361}, {"id": 194, "seek": + 113056, "start": 1154.8, "end": 1157.9199999999998, "text": " of course, you expand + the content and the domains that are involved in that training.", "tokens": [51576, + 295, 1164, 11, 291, 5268, 264, 2701, 293, 264, 25514, 300, 366, 3288, 294, 300, + 3097, 13, 51732], "temperature": 0.0, "avg_logprob": -0.11304883858592239, "compression_ratio": + 1.6563876651982379, "no_speech_prob": 0.0005696497391909361}, {"id": 195, "seek": + 115792, "start": 1158.4, "end": 1166.3200000000002, "text": " So I think you, I + think we''re going to see some more of that. 
The world is still stuck on this 30,000", + "tokens": [50388, 407, 286, 519, 291, 11, 286, 519, 321, 434, 516, 281, 536, 512, + 544, 295, 300, 13, 440, 1002, 307, 920, 5541, 322, 341, 2217, 11, 1360, 50784], + "temperature": 0.0, "avg_logprob": -0.16254050835319186, "compression_ratio": 1.58008658008658, + "no_speech_prob": 0.002149689244106412}, {"id": 196, "seek": 115792, "start": 1166.3200000000002, + "end": 1173.44, "text": " terms in the pre-trained space of things like onto notes + because it''s just so expensive,", "tokens": [50784, 2115, 294, 264, 659, 12, 17227, + 2001, 1901, 295, 721, 411, 3911, 5570, 570, 309, 311, 445, 370, 5124, 11, 51140], + "temperature": 0.0, "avg_logprob": -0.16254050835319186, "compression_ratio": 1.58008658008658, + "no_speech_prob": 0.002149689244106412}, {"id": 197, "seek": 115792, "start": 1173.44, + "end": 1178.3200000000002, "text": " it''s train models and Google and Microsoft + and Facebook and these companies that train models,", "tokens": [51140, 309, 311, + 3847, 5245, 293, 3329, 293, 8116, 293, 4384, 293, 613, 3431, 300, 3847, 5245, 11, + 51384], "temperature": 0.0, "avg_logprob": -0.16254050835319186, "compression_ratio": + 1.58008658008658, "no_speech_prob": 0.002149689244106412}, {"id": 198, "seek": 115792, + "start": 1178.3200000000002, "end": 1182.96, "text": " they''re not going to bother + open sourcing those. 
Maybe they will at some point,", "tokens": [51384, 436, 434, + 406, 516, 281, 8677, 1269, 11006, 2175, 729, 13, 2704, 436, 486, 412, 512, 935, + 11, 51616], "temperature": 0.0, "avg_logprob": -0.16254050835319186, "compression_ratio": + 1.58008658008658, "no_speech_prob": 0.002149689244106412}, {"id": 199, "seek": 118296, + "start": 1183.04, "end": 1187.8400000000001, "text": " but I think we''re going + to need to see big companies that are specific in that domain,", "tokens": [50368, + 457, 286, 519, 321, 434, 516, 281, 643, 281, 536, 955, 3431, 300, 366, 2685, 294, + 300, 9274, 11, 50608], "temperature": 0.0, "avg_logprob": -0.1331671679461444, "compression_ratio": + 1.8611111111111112, "no_speech_prob": 0.0076689026318490505}, {"id": 200, "seek": + 118296, "start": 1187.8400000000001, "end": 1192.64, "text": " train those models + and then open sourcing them. But if you spend millions of dollars to train a model", + "tokens": [50608, 3847, 729, 5245, 293, 550, 1269, 11006, 2175, 552, 13, 583, 498, + 291, 3496, 6803, 295, 3808, 281, 3847, 257, 2316, 50848], "temperature": 0.0, "avg_logprob": + -0.1331671679461444, "compression_ratio": 1.8611111111111112, "no_speech_prob": + 0.0076689026318490505}, {"id": 201, "seek": 118296, "start": 1192.64, "end": 1197.76, + "text": " and you''re a big private company, are you going to open sourcing the + model weights? 
Probably not,", "tokens": [50848, 293, 291, 434, 257, 955, 4551, + 2237, 11, 366, 291, 516, 281, 1269, 11006, 2175, 264, 2316, 17443, 30, 9210, 406, + 11, 51104], "temperature": 0.0, "avg_logprob": -0.1331671679461444, "compression_ratio": + 1.8611111111111112, "no_speech_prob": 0.0076689026318490505}, {"id": 202, "seek": + 118296, "start": 1197.76, "end": 1201.92, "text": " you''re going to keep it for + yourself because that''s huge value, it''s huge value for your product.", "tokens": + [51104, 291, 434, 516, 281, 1066, 309, 337, 1803, 570, 300, 311, 2603, 2158, 11, + 309, 311, 2603, 2158, 337, 428, 1674, 13, 51312], "temperature": 0.0, "avg_logprob": + -0.1331671679461444, "compression_ratio": 1.8611111111111112, "no_speech_prob": + 0.0076689026318490505}, {"id": 203, "seek": 118296, "start": 1203.2, "end": 1206.96, + "text": " I guess you open sourced the idea sort of if you publish, okay, here''s + the bird model,", "tokens": [51376, 286, 2041, 291, 1269, 11006, 1232, 264, 1558, + 1333, 295, 498, 291, 11374, 11, 1392, 11, 510, 311, 264, 5255, 2316, 11, 51564], + "temperature": 0.0, "avg_logprob": -0.1331671679461444, "compression_ratio": 1.8611111111111112, + "no_speech_prob": 0.0076689026318490505}, {"id": 204, "seek": 118296, "start": 1206.96, + "end": 1210.8, "text": " here''s the mom model or whatever. 
But then go train it + yourself.", "tokens": [51564, 510, 311, 264, 1225, 2316, 420, 2035, 13, 583, 550, + 352, 3847, 309, 1803, 13, 51756], "temperature": 0.0, "avg_logprob": -0.1331671679461444, + "compression_ratio": 1.8611111111111112, "no_speech_prob": 0.0076689026318490505}, + {"id": 205, "seek": 121080, "start": 1210.8, "end": 1215.12, "text": " Yeah, yeah, + if you have a couple million dollars lying around.", "tokens": [50364, 865, 11, + 1338, 11, 498, 291, 362, 257, 1916, 2459, 3808, 8493, 926, 13, 50580], "temperature": + 0.0, "avg_logprob": -0.2861937673468339, "compression_ratio": 1.525, "no_speech_prob": + 0.014519112184643745}, {"id": 206, "seek": 121080, "start": 1215.12, "end": 1221.6, + "text": " Yeah, and then I was also talking to in another episode, I mean, Ahmaad, + who used to work at Google,", "tokens": [50580, 865, 11, 293, 550, 286, 390, 611, + 1417, 281, 294, 1071, 3500, 11, 286, 914, 11, 2438, 1696, 345, 11, 567, 1143, 281, + 589, 412, 3329, 11, 50904], "temperature": 0.0, "avg_logprob": -0.2861937673468339, + "compression_ratio": 1.525, "no_speech_prob": 0.014519112184643745}, {"id": 207, + "seek": 121080, "start": 1222.24, "end": 1231.04, "text": " and he said that entire + teams would be dedicated on a quarterly basis to do the expensive fine-tuning work", + "tokens": [50936, 293, 415, 848, 300, 2302, 5491, 576, 312, 8374, 322, 257, 38633, + 5143, 281, 360, 264, 5124, 2489, 12, 83, 37726, 589, 51376], "temperature": 0.0, + "avg_logprob": -0.2861937673468339, "compression_ratio": 1.525, "no_speech_prob": + 0.014519112184643745}, {"id": 208, "seek": 121080, "start": 1231.84, "end": 1237.68, + "text": " with burrito similar models. 
So can you imagine that it''s like a team''s + effort and this people,", "tokens": [51416, 365, 2779, 17492, 2531, 5245, 13, 407, + 393, 291, 3811, 300, 309, 311, 411, 257, 1469, 311, 4630, 293, 341, 561, 11, 51708], + "temperature": 0.0, "avg_logprob": -0.2861937673468339, "compression_ratio": 1.525, + "no_speech_prob": 0.014519112184643745}, {"id": 209, "seek": 123768, "start": 1237.68, + "end": 1243.8400000000001, "text": " some of them invented the model, some of them + didn''t, but you know, with all the resources that Google", "tokens": [50364, 512, + 295, 552, 14479, 264, 2316, 11, 512, 295, 552, 994, 380, 11, 457, 291, 458, 11, + 365, 439, 264, 3593, 300, 3329, 50672], "temperature": 0.0, "avg_logprob": -0.14056586344307714, + "compression_ratio": 1.6791666666666667, "no_speech_prob": 0.01326596550643444}, + {"id": 210, "seek": 123768, "start": 1243.8400000000001, "end": 1250.96, "text": + " has to fine tune them for three months. So I don''t think this is out of reach + of startups. And I mean,", "tokens": [50672, 575, 281, 2489, 10864, 552, 337, 1045, + 2493, 13, 407, 286, 500, 380, 519, 341, 307, 484, 295, 2524, 295, 28041, 13, 400, + 286, 914, 11, 51028], "temperature": 0.0, "avg_logprob": -0.14056586344307714, "compression_ratio": + 1.6791666666666667, "no_speech_prob": 0.01326596550643444}, {"id": 211, "seek": + 123768, "start": 1250.96, "end": 1254.8, "text": " there are other things that are + out of reach, like, and this is where you saw the gap with MITI,", "tokens": [51028, + 456, 366, 661, 721, 300, 366, 484, 295, 2524, 11, 411, 11, 293, 341, 307, 689, 291, + 1866, 264, 7417, 365, 13100, 40, 11, 51220], "temperature": 0.0, "avg_logprob": + -0.14056586344307714, "compression_ratio": 1.6791666666666667, "no_speech_prob": + 0.01326596550643444}, {"id": 212, "seek": 123768, "start": 1254.8, "end": 1263.8400000000001, + "text": " I want to get closer to the MITI now. 
So there is, you know, every time + I install a vector database,", "tokens": [51220, 286, 528, 281, 483, 4966, 281, + 264, 13100, 40, 586, 13, 407, 456, 307, 11, 291, 458, 11, 633, 565, 286, 3625, 257, + 8062, 8149, 11, 51672], "temperature": 0.0, "avg_logprob": -0.14056586344307714, + "compression_ratio": 1.6791666666666667, "no_speech_prob": 0.01326596550643444}, + {"id": 213, "seek": 126384, "start": 1263.84, "end": 1270.3999999999999, "text": + " I''m not going to name one. And it says, hey, you know, it will be faster if you + use GPUs. And I''m like,", "tokens": [50364, 286, 478, 406, 516, 281, 1315, 472, + 13, 400, 309, 1619, 11, 4177, 11, 291, 458, 11, 309, 486, 312, 4663, 498, 291, 764, + 18407, 82, 13, 400, 286, 478, 411, 11, 50692], "temperature": 0.0, "avg_logprob": + -0.17104445370760832, "compression_ratio": 1.6574074074074074, "no_speech_prob": + 0.004702328238636255}, {"id": 214, "seek": 126384, "start": 1270.3999999999999, + "end": 1277.28, "text": " okay, I''m a startup. I don''t have GPUs. 
You know, so + this is, I think one of the gaps that you saw", "tokens": [50692, 1392, 11, 286, + 478, 257, 18578, 13, 286, 500, 380, 362, 18407, 82, 13, 509, 458, 11, 370, 341, + 307, 11, 286, 519, 472, 295, 264, 15031, 300, 291, 1866, 51036], "temperature": + 0.0, "avg_logprob": -0.17104445370760832, "compression_ratio": 1.6574074074074074, + "no_speech_prob": 0.004702328238636255}, {"id": 215, "seek": 126384, "start": 1277.28, + "end": 1282.8799999999999, "text": " with MITI, but are there other gaps that you + saw that you are addressing with MITI server?", "tokens": [51036, 365, 13100, 40, + 11, 457, 366, 456, 661, 15031, 300, 291, 1866, 300, 291, 366, 14329, 365, 13100, + 40, 7154, 30, 51316], "temperature": 0.0, "avg_logprob": -0.17104445370760832, "compression_ratio": + 1.6574074074074074, "no_speech_prob": 0.004702328238636255}, {"id": 216, "seek": + 126384, "start": 1285.4399999999998, "end": 1292.32, "text": " Yes, so the NLP world + right now, and the vector world right now,", "tokens": [51444, 1079, 11, 370, 264, + 426, 45196, 1002, 558, 586, 11, 293, 264, 8062, 1002, 558, 586, 11, 51788], "temperature": + 0.0, "avg_logprob": -0.17104445370760832, "compression_ratio": 1.6574074074074074, + "no_speech_prob": 0.004702328238636255}, {"id": 217, "seek": 129232, "start": 1292.96, + "end": 1296.96, "text": " they all they talk about is Python, Python, Python, Python, + everything is in Python.", "tokens": [50396, 436, 439, 436, 751, 466, 307, 15329, + 11, 15329, 11, 15329, 11, 15329, 11, 1203, 307, 294, 15329, 13, 50596], "temperature": + 0.0, "avg_logprob": -0.23446333926656973, "compression_ratio": 1.6682926829268292, + "no_speech_prob": 0.0007650977931916714}, {"id": 218, "seek": 129232, "start": 1297.9199999999998, + "end": 1300.8799999999999, "text": " When you get to production, you use something + else, but it''s Python, Python, Python.", "tokens": [50644, 1133, 291, 483, 281, + 4265, 11, 291, 764, 746, 1646, 11, 457, 309, 311, 15329, 11, 15329, 11, 
15329, 13, + 50792], "temperature": 0.0, "avg_logprob": -0.23446333926656973, "compression_ratio": + 1.6682926829268292, "no_speech_prob": 0.0007650977931916714}, {"id": 219, "seek": + 129232, "start": 1304.3999999999999, "end": 1312.24, "text": " So I wanted to, I + came from a non-Python background. I started with C,", "tokens": [50968, 407, 286, + 1415, 281, 11, 286, 1361, 490, 257, 2107, 12, 47, 88, 11943, 3678, 13, 286, 1409, + 365, 383, 11, 51360], "temperature": 0.0, "avg_logprob": -0.23446333926656973, "compression_ratio": + 1.6682926829268292, "no_speech_prob": 0.0007650977931916714}, {"id": 220, "seek": + 129232, "start": 1313.36, "end": 1320.24, "text": " Pascal when I was really young + and then my seed programming is terrible, I''m sure. Then I discovered,", "tokens": + [51416, 41723, 562, 286, 390, 534, 2037, 293, 550, 452, 8871, 9410, 307, 6237, 11, + 286, 478, 988, 13, 1396, 286, 6941, 11, 51760], "temperature": 0.0, "avg_logprob": + -0.23446333926656973, "compression_ratio": 1.6682926829268292, "no_speech_prob": + 0.0007650977931916714}, {"id": 221, "seek": 132024, "start": 1320.24, "end": 1325.68, + "text": " you know, intermediate, intermediately compiled languages, Java, C sharp, + things like that.", "tokens": [50364, 291, 458, 11, 19376, 11, 19376, 356, 36548, + 8650, 11, 10745, 11, 383, 8199, 11, 721, 411, 300, 13, 50636], "temperature": 0.0, + "avg_logprob": -0.1762496381998062, "compression_ratio": 1.7946768060836502, "no_speech_prob": + 0.002240551169961691}, {"id": 222, "seek": 132024, "start": 1325.68, "end": 1330.88, + "text": " And that was like early 2000s for me. 
And I kind of went, I was in the + Microsoft world,", "tokens": [50636, 400, 300, 390, 411, 2440, 8132, 82, 337, 385, + 13, 400, 286, 733, 295, 1437, 11, 286, 390, 294, 264, 8116, 1002, 11, 50896], "temperature": + 0.0, "avg_logprob": -0.1762496381998062, "compression_ratio": 1.7946768060836502, + "no_speech_prob": 0.002240551169961691}, {"id": 223, "seek": 132024, "start": 1330.88, + "end": 1336.48, "text": " so I was doing C sharp for a while. And then I found, + and all the while I''ve been doing Java script", "tokens": [50896, 370, 286, 390, + 884, 383, 8199, 337, 257, 1339, 13, 400, 550, 286, 1352, 11, 293, 439, 264, 1339, + 286, 600, 668, 884, 10745, 5755, 51176], "temperature": 0.0, "avg_logprob": -0.1762496381998062, + "compression_ratio": 1.7946768060836502, "no_speech_prob": 0.002240551169961691}, + {"id": 224, "seek": 132024, "start": 1336.48, "end": 1343.92, "text": " because + of, you know, I was involved in the web, so in the mid 90s, and that''s how I got + involved", "tokens": [51176, 570, 295, 11, 291, 458, 11, 286, 390, 3288, 294, 264, + 3670, 11, 370, 294, 264, 2062, 4289, 82, 11, 293, 300, 311, 577, 286, 658, 3288, + 51548], "temperature": 0.0, "avg_logprob": -0.1762496381998062, "compression_ratio": + 1.7946768060836502, "no_speech_prob": 0.002240551169961691}, {"id": 225, "seek": + 132024, "start": 1343.92, "end": 1348.56, "text": " with content and content data + and all this stuff. It''s just all web stuff. And then you got to", "tokens": [51548, + 365, 2701, 293, 2701, 1412, 293, 439, 341, 1507, 13, 467, 311, 445, 439, 3670, 1507, + 13, 400, 550, 291, 658, 281, 51780], "temperature": 0.0, "avg_logprob": -0.1762496381998062, + "compression_ratio": 1.7946768060836502, "no_speech_prob": 0.002240551169961691}, + {"id": 226, "seek": 134856, "start": 1348.56, "end": 1353.52, "text": " know JavaScript + if you do anything with the web. 
So it was like C sharp and JavaScript for me for + a while.", "tokens": [50364, 458, 15778, 498, 291, 360, 1340, 365, 264, 3670, 13, + 407, 309, 390, 411, 383, 8199, 293, 15778, 337, 385, 337, 257, 1339, 13, 50612], + "temperature": 0.0, "avg_logprob": -0.23894014444437112, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.0007655072840861976}, {"id": 227, "seek": 134856, "start": 1355.12, + "end": 1360.72, "text": " So I know that there''s a gap. If you go and if you go + and you go into the JavaScript world,", "tokens": [50692, 407, 286, 458, 300, 456, + 311, 257, 7417, 13, 759, 291, 352, 293, 498, 291, 352, 293, 291, 352, 666, 264, + 15778, 1002, 11, 50972], "temperature": 0.0, "avg_logprob": -0.23894014444437112, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0007655072840861976}, + {"id": 228, "seek": 134856, "start": 1360.72, "end": 1369.9199999999998, "text": + " the node, or, you know, TypeScript or those things, Dino now, there''s nothing. + You want to do NLP?", "tokens": [50972, 264, 9984, 11, 420, 11, 291, 458, 11, 15576, + 14237, 420, 729, 721, 11, 413, 2982, 586, 11, 456, 311, 1825, 13, 509, 528, 281, + 360, 426, 45196, 30, 51432], "temperature": 0.0, "avg_logprob": -0.23894014444437112, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0007655072840861976}, + {"id": 229, "seek": 134856, "start": 1369.9199999999998, "end": 1376.96, "text": + " Learn Python. That''s pretty much the suggestion. Same with C sharp, you know. + Okay, well, there''s", "tokens": [51432, 17216, 15329, 13, 663, 311, 1238, 709, + 264, 16541, 13, 10635, 365, 383, 8199, 11, 291, 458, 13, 1033, 11, 731, 11, 456, + 311, 51784], "temperature": 0.0, "avg_logprob": -0.23894014444437112, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.0007655072840861976}, {"id": 230, "seek": + 137696, "start": 1376.96, "end": 1382.0, "text": " there''s some libraries out there, + but they''re clunky. 
Nobody really, you know, Microsoft probably", "tokens": [50364, + 456, 311, 512, 15148, 484, 456, 11, 457, 436, 434, 596, 25837, 13, 9297, 534, 11, + 291, 458, 11, 8116, 1391, 50616], "temperature": 0.0, "avg_logprob": -0.21946386992931366, + "compression_ratio": 1.7201365187713311, "no_speech_prob": 0.000817342079244554}, + {"id": 231, "seek": 137696, "start": 1382.0, "end": 1386.56, "text": " uses them, + right? Because they''re Microsoft and they built C sharp and everybody''s doing + Microsoft stuff.", "tokens": [50616, 4960, 552, 11, 558, 30, 1436, 436, 434, 8116, + 293, 436, 3094, 383, 8199, 293, 2201, 311, 884, 8116, 1507, 13, 50844], "temperature": + 0.0, "avg_logprob": -0.21946386992931366, "compression_ratio": 1.7201365187713311, + "no_speech_prob": 0.000817342079244554}, {"id": 232, "seek": 137696, "start": 1386.56, + "end": 1391.8400000000001, "text": " But, you know, outside of outside of Microsoft, + like who''s using C sharp for for natural language", "tokens": [50844, 583, 11, + 291, 458, 11, 2380, 295, 2380, 295, 8116, 11, 411, 567, 311, 1228, 383, 8199, 337, + 337, 3303, 2856, 51108], "temperature": 0.0, "avg_logprob": -0.21946386992931366, + "compression_ratio": 1.7201365187713311, "no_speech_prob": 0.000817342079244554}, + {"id": 233, "seek": 137696, "start": 1391.8400000000001, "end": 1399.44, "text": + " process to train models? No, but, and to host models, you know, okay, well, to + do it, you have to", "tokens": [51108, 1399, 281, 3847, 5245, 30, 883, 11, 457, + 11, 293, 281, 3975, 5245, 11, 291, 458, 11, 1392, 11, 731, 11, 281, 360, 309, 11, + 291, 362, 281, 51488], "temperature": 0.0, "avg_logprob": -0.21946386992931366, + "compression_ratio": 1.7201365187713311, "no_speech_prob": 0.000817342079244554}, + {"id": 234, "seek": 137696, "start": 1399.44, "end": 1405.04, "text": " jump through + all these hoops. And it''s really hard. 
So unless you want to like put Python in + your stack,", "tokens": [51488, 3012, 807, 439, 613, 1106, 3370, 13, 400, 309, 311, + 534, 1152, 13, 407, 5969, 291, 528, 281, 411, 829, 15329, 294, 428, 8630, 11, 51768], + "temperature": 0.0, "avg_logprob": -0.21946386992931366, "compression_ratio": 1.7201365187713311, + "no_speech_prob": 0.000817342079244554}, {"id": 235, "seek": 140504, "start": 1405.84, + "end": 1413.76, "text": " which is basically a non-starter for a lot of teams. A + lot of teams, they work in languages like", "tokens": [50404, 597, 307, 1936, 257, + 2107, 12, 33969, 337, 257, 688, 295, 5491, 13, 316, 688, 295, 5491, 11, 436, 589, + 294, 8650, 411, 50800], "temperature": 0.0, "avg_logprob": -0.1466675176248922, + "compression_ratio": 1.462686567164179, "no_speech_prob": 0.0013073710724711418}, + {"id": 236, "seek": 140504, "start": 1413.76, "end": 1423.2, "text": " node, JavaScript, + C sharp, Java, Ruby, Go. Like there''s so many huge languages out there that just", + "tokens": [50800, 9984, 11, 15778, 11, 383, 8199, 11, 10745, 11, 19907, 11, 1037, + 13, 1743, 456, 311, 370, 867, 2603, 8650, 484, 456, 300, 445, 51272], "temperature": + 0.0, "avg_logprob": -0.1466675176248922, "compression_ratio": 1.462686567164179, + "no_speech_prob": 0.0013073710724711418}, {"id": 237, "seek": 140504, "start": 1423.2, + "end": 1429.36, "text": " can''t touch these models. 
So I wanted something that + kind of broke out of this shell, this Python,", "tokens": [51272, 393, 380, 2557, + 613, 5245, 13, 407, 286, 1415, 746, 300, 733, 295, 6902, 484, 295, 341, 8720, 11, + 341, 15329, 11, 51580], "temperature": 0.0, "avg_logprob": -0.1466675176248922, + "compression_ratio": 1.462686567164179, "no_speech_prob": 0.0013073710724711418}, + {"id": 238, "seek": 142936, "start": 1430.0, "end": 1435.84, "text": " this Python + like enclosure of like how do you get this stuff into the hands of other people", + "tokens": [50396, 341, 15329, 411, 34093, 295, 411, 577, 360, 291, 483, 341, 1507, + 666, 264, 2377, 295, 661, 561, 50688], "temperature": 0.0, "avg_logprob": -0.2190812753171337, + "compression_ratio": 1.7162162162162162, "no_speech_prob": 0.001832067035138607}, + {"id": 239, "seek": 142936, "start": 1435.84, "end": 1441.36, "text": " just want + to build web applications. They don''t want to go and, you know, go into the Python + family.", "tokens": [50688, 445, 528, 281, 1322, 3670, 5821, 13, 814, 500, 380, + 528, 281, 352, 293, 11, 291, 458, 11, 352, 666, 264, 15329, 1605, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.2190812753171337, "compression_ratio": 1.7162162162162162, + "no_speech_prob": 0.001832067035138607}, {"id": 240, "seek": 142936, "start": 1442.7199999999998, + "end": 1449.28, "text": " So that was that was one of the one of the starting catalysts + from from Mighty InfraServer.", "tokens": [51032, 407, 300, 390, 300, 390, 472, + 295, 264, 472, 295, 264, 2891, 23868, 82, 490, 490, 45874, 11537, 424, 31859, 331, + 13, 51360], "temperature": 0.0, "avg_logprob": -0.2190812753171337, "compression_ratio": + 1.7162162162162162, "no_speech_prob": 0.001832067035138607}, {"id": 241, "seek": + 142936, "start": 1451.6799999999998, "end": 1458.1599999999999, "text": " I, I there + are there is one tool that I have to use that is Python because it has to you have + to", "tokens": [51480, 286, 11, 286, 456, 366, 456, 307, 472, 2290, 
300, 286, 362, + 281, 764, 300, 307, 15329, 570, 309, 575, 281, 291, 362, 281, 51804], "temperature": + 0.0, "avg_logprob": -0.2190812753171337, "compression_ratio": 1.7162162162162162, + "no_speech_prob": 0.001832067035138607}, {"id": 242, "seek": 145816, "start": 1458.16, + "end": 1463.2, "text": " convert a model and I convert the model to Onyx, which + is most people know about Onyx if you''re in", "tokens": [50364, 7620, 257, 2316, + 293, 286, 7620, 264, 2316, 281, 1282, 88, 87, 11, 597, 307, 881, 561, 458, 466, + 1282, 88, 87, 498, 291, 434, 294, 50616], "temperature": 0.0, "avg_logprob": -0.22187744568441517, + "compression_ratio": 1.5879828326180256, "no_speech_prob": 0.0003088583762291819}, + {"id": 243, "seek": 145816, "start": 1463.2, "end": 1469.44, "text": " the NLP world + by now, which is it''s ONNX, that''s for an open neural network exchange. And", + "tokens": [50616, 264, 426, 45196, 1002, 538, 586, 11, 597, 307, 309, 311, 422, + 45, 45, 55, 11, 300, 311, 337, 364, 1269, 18161, 3209, 7742, 13, 400, 50928], "temperature": + 0.0, "avg_logprob": -0.22187744568441517, "compression_ratio": 1.5879828326180256, + "no_speech_prob": 0.0003088583762291819}, {"id": 244, "seek": 145816, "start": 1470.16, + "end": 1476.96, "text": " is this intermediary format that can be used generically. + It''s like an open model format.", "tokens": [50964, 307, 341, 15184, 822, 7877, + 300, 393, 312, 1143, 1337, 984, 13, 467, 311, 411, 364, 1269, 2316, 7877, 13, 51304], + "temperature": 0.0, "avg_logprob": -0.22187744568441517, "compression_ratio": 1.5879828326180256, + "no_speech_prob": 0.0003088583762291819}, {"id": 245, "seek": 145816, "start": 1478.3200000000002, + "end": 1484.0, "text": " Now there are runtimes that you can take Onyx and Onyx + models and run them. 
So the biggest,", "tokens": [51372, 823, 456, 366, 49435, 1532, + 300, 291, 393, 747, 1282, 88, 87, 293, 1282, 88, 87, 5245, 293, 1190, 552, 13, 407, + 264, 3880, 11, 51656], "temperature": 0.0, "avg_logprob": -0.22187744568441517, + "compression_ratio": 1.5879828326180256, "no_speech_prob": 0.0003088583762291819}, + {"id": 246, "seek": 148400, "start": 1484.96, "end": 1490.72, "text": " the biggest + one is Onyx runtime and that''s developed by Microsoft, it''s open source. I see + licensed", "tokens": [50412, 264, 3880, 472, 307, 1282, 88, 87, 34474, 293, 300, + 311, 4743, 538, 8116, 11, 309, 311, 1269, 4009, 13, 286, 536, 25225, 50700], "temperature": + 0.0, "avg_logprob": -0.1664989249220172, "compression_ratio": 1.6569037656903767, + "no_speech_prob": 0.0006226053810678422}, {"id": 247, "seek": 148400, "start": 1491.6, + "end": 1499.2, "text": " and that''s written in C++. But there are bindings for + other languages and community contributes", "tokens": [50744, 293, 300, 311, 3720, + 294, 383, 25472, 13, 583, 456, 366, 14786, 1109, 337, 661, 8650, 293, 1768, 32035, + 51124], "temperature": 0.0, "avg_logprob": -0.1664989249220172, "compression_ratio": + 1.6569037656903767, "no_speech_prob": 0.0006226053810678422}, {"id": 248, "seek": + 148400, "start": 1499.2, "end": 1504.96, "text": " bindings. So you can use Onyx + runtime in Python if you want to. 
You can, and you''ll get like for those", "tokens": + [51124, 14786, 1109, 13, 407, 291, 393, 764, 1282, 88, 87, 34474, 294, 15329, 498, + 291, 528, 281, 13, 509, 393, 11, 293, 291, 603, 483, 411, 337, 729, 51412], "temperature": + 0.0, "avg_logprob": -0.1664989249220172, "compression_ratio": 1.6569037656903767, + "no_speech_prob": 0.0006226053810678422}, {"id": 249, "seek": 148400, "start": 1504.96, + "end": 1510.64, "text": " Python people who want to host models in Python, just + convert your model to Onyx and host it in a", "tokens": [51412, 15329, 561, 567, + 528, 281, 3975, 5245, 294, 15329, 11, 445, 7620, 428, 2316, 281, 1282, 88, 87, 293, + 3975, 309, 294, 257, 51696], "temperature": 0.0, "avg_logprob": -0.1664989249220172, + "compression_ratio": 1.6569037656903767, "no_speech_prob": 0.0006226053810678422}, + {"id": 250, "seek": 151064, "start": 1510.64, "end": 1515.2800000000002, "text": + " Python Onyx runtime. It''ll double the speed of the model inference, like out + of the box. You don''t", "tokens": [50364, 15329, 1282, 88, 87, 34474, 13, 467, + 603, 3834, 264, 3073, 295, 264, 2316, 38253, 11, 411, 484, 295, 264, 2424, 13, 509, + 500, 380, 50596], "temperature": 0.0, "avg_logprob": -0.21714177861943976, "compression_ratio": + 1.75, "no_speech_prob": 0.0002043155545834452}, {"id": 251, "seek": 151064, "start": + 1515.2800000000002, "end": 1519.1200000000001, "text": " have to do anything. You + press like a button. You don''t, you clone the repo, you press a button,", "tokens": + [50596, 362, 281, 360, 1340, 13, 509, 1886, 411, 257, 2960, 13, 509, 500, 380, 11, + 291, 26506, 264, 49040, 11, 291, 1886, 257, 2960, 11, 50788], "temperature": 0.0, + "avg_logprob": -0.21714177861943976, "compression_ratio": 1.75, "no_speech_prob": + 0.0002043155545834452}, {"id": 252, "seek": 151064, "start": 1519.1200000000001, + "end": 1527.1200000000001, "text": " then twice, twice as fast. 
But for others, + you know, there''s binding for C sharp, there''s bindings for", "tokens": [50788, + 550, 6091, 11, 6091, 382, 2370, 13, 583, 337, 2357, 11, 291, 458, 11, 456, 311, + 17359, 337, 383, 8199, 11, 456, 311, 14786, 1109, 337, 51188], "temperature": 0.0, + "avg_logprob": -0.21714177861943976, "compression_ratio": 1.75, "no_speech_prob": + 0.0002043155545834452}, {"id": 253, "seek": 151064, "start": 1527.1200000000001, + "end": 1534.96, "text": " Java, there''s, there might be bindings for Ruby. I haven''t + looked probably bindings for go. And even", "tokens": [51188, 10745, 11, 456, 311, + 11, 456, 1062, 312, 14786, 1109, 337, 19907, 13, 286, 2378, 380, 2956, 1391, 14786, + 1109, 337, 352, 13, 400, 754, 51580], "temperature": 0.0, "avg_logprob": -0.21714177861943976, + "compression_ratio": 1.75, "no_speech_prob": 0.0002043155545834452}, {"id": 254, + "seek": 153496, "start": 1534.96, "end": 1543.6000000000001, "text": " if Microsoft + doesn''t support them, the community builds them. So you can do this, but there''s + this", "tokens": [50364, 498, 8116, 1177, 380, 1406, 552, 11, 264, 1768, 15182, + 552, 13, 407, 291, 393, 360, 341, 11, 457, 456, 311, 341, 50796], "temperature": + 0.0, "avg_logprob": -0.1317237115675403, "compression_ratio": 1.8157894736842106, + "no_speech_prob": 0.0016603866824880242}, {"id": 255, "seek": 153496, "start": 1543.6000000000001, + "end": 1548.96, "text": " other problem that you have. 
The other problem is that, + well, those are just the model weights.", "tokens": [50796, 661, 1154, 300, 291, + 362, 13, 440, 661, 1154, 307, 300, 11, 731, 11, 729, 366, 445, 264, 2316, 17443, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.1317237115675403, "compression_ratio": + 1.8157894736842106, "no_speech_prob": 0.0016603866824880242}, {"id": 256, "seek": + 153496, "start": 1548.96, "end": 1553.2, "text": " And if you''re talking about, + and hosting the runtime for the model weights, so you put in inputs", "tokens": + [51064, 400, 498, 291, 434, 1417, 466, 11, 293, 16058, 264, 34474, 337, 264, 2316, + 17443, 11, 370, 291, 829, 294, 15743, 51276], "temperature": 0.0, "avg_logprob": + -0.1317237115675403, "compression_ratio": 1.8157894736842106, "no_speech_prob": + 0.0016603866824880242}, {"id": 257, "seek": 153496, "start": 1553.2, "end": 1558.72, + "text": " and you get outputs. But where do you get the inputs from? Well, you have + to tokenize text,", "tokens": [51276, 293, 291, 483, 23930, 13, 583, 689, 360, 291, + 483, 264, 15743, 490, 30, 1042, 11, 291, 362, 281, 14862, 1125, 2487, 11, 51552], + "temperature": 0.0, "avg_logprob": -0.1317237115675403, "compression_ratio": 1.8157894736842106, + "no_speech_prob": 0.0016603866824880242}, {"id": 258, "seek": 153496, "start": 1559.3600000000001, + "end": 1564.32, "text": " you have to do all the stuff to prepare it to pre-process + it. And then when you tokenize and you do", "tokens": [51584, 291, 362, 281, 360, + 439, 264, 1507, 281, 5940, 309, 281, 659, 12, 41075, 309, 13, 400, 550, 562, 291, + 14862, 1125, 293, 291, 360, 51832], "temperature": 0.0, "avg_logprob": -0.1317237115675403, + "compression_ratio": 1.8157894736842106, "no_speech_prob": 0.0016603866824880242}, + {"id": 259, "seek": 156432, "start": 1564.32, "end": 1574.08, "text": " pre-processing, + then you can pass in those, the tokenized data as inputs. 
But all the tokenizers", + "tokens": [50364, 659, 12, 41075, 278, 11, 550, 291, 393, 1320, 294, 729, 11, 264, + 14862, 1602, 1412, 382, 15743, 13, 583, 439, 264, 14862, 22525, 50852], "temperature": + 0.0, "avg_logprob": -0.17863023281097412, "compression_ratio": 1.7339449541284404, + "no_speech_prob": 0.00027670248528011143}, {"id": 260, "seek": 156432, "start": + 1574.08, "end": 1581.52, "text": " are written in Python. So now you have to, now + you have that problem. So I actually used Rust", "tokens": [50852, 366, 3720, 294, + 15329, 13, 407, 586, 291, 362, 281, 11, 586, 291, 362, 300, 1154, 13, 407, 286, + 767, 1143, 34952, 51224], "temperature": 0.0, "avg_logprob": -0.17863023281097412, + "compression_ratio": 1.7339449541284404, "no_speech_prob": 0.00027670248528011143}, + {"id": 261, "seek": 156432, "start": 1581.52, "end": 1588.08, "text": " for mighty + inference server because hugging face based their tokenizer, their fast tokenizers", + "tokens": [51224, 337, 21556, 38253, 7154, 570, 41706, 1851, 2361, 641, 14862, 6545, + 11, 641, 2370, 14862, 22525, 51552], "temperature": 0.0, "avg_logprob": -0.17863023281097412, + "compression_ratio": 1.7339449541284404, "no_speech_prob": 0.00027670248528011143}, + {"id": 262, "seek": 156432, "start": 1588.08, "end": 1591.76, "text": " on Rust, + they wrote it in Rust, and they offer bindings for Python. So if you, if you install", + "tokens": [51552, 322, 34952, 11, 436, 4114, 309, 294, 34952, 11, 293, 436, 2626, + 14786, 1109, 337, 15329, 13, 407, 498, 291, 11, 498, 291, 3625, 51736], "temperature": + 0.0, "avg_logprob": -0.17863023281097412, "compression_ratio": 1.7339449541284404, + "no_speech_prob": 0.00027670248528011143}, {"id": 263, "seek": 159176, "start": + 1591.76, "end": 1598.48, "text": " a fast tokenizer in Python, you''re actually + using Rust bindings for that. 
So I wrote a web server", "tokens": [50364, 257, 2370, + 14862, 6545, 294, 15329, 11, 291, 434, 767, 1228, 34952, 14786, 1109, 337, 300, + 13, 407, 286, 4114, 257, 3670, 7154, 50700], "temperature": 0.0, "avg_logprob": + -0.21274781227111816, "compression_ratio": 1.6260504201680672, "no_speech_prob": + 0.00030918169068172574}, {"id": 264, "seek": 159176, "start": 1598.48, "end": 1606.96, + "text": " that wraps the Rust tokenizers and on X Run time. And I wrote a whole + bunch of code for pipeline", "tokens": [50700, 300, 25831, 264, 34952, 14862, 22525, + 293, 322, 1783, 8950, 565, 13, 400, 286, 4114, 257, 1379, 3840, 295, 3089, 337, + 15517, 51124], "temperature": 0.0, "avg_logprob": -0.21274781227111816, "compression_ratio": + 1.6260504201680672, "no_speech_prob": 0.00030918169068172574}, {"id": 265, "seek": + 159176, "start": 1606.96, "end": 1613.36, "text": " specific stuff like question + answering, sentence transformers, sequence classification, which is", "tokens": + [51124, 2685, 1507, 411, 1168, 13430, 11, 8174, 4088, 433, 11, 8310, 21538, 11, + 597, 307, 51444], "temperature": 0.0, "avg_logprob": -0.21274781227111816, "compression_ratio": + 1.6260504201680672, "no_speech_prob": 0.00030918169068172574}, {"id": 266, "seek": + 159176, "start": 1613.36, "end": 1621.2, "text": " like sentiment analysis token + classifications. That''s like, entity recognition. 
And I''m working", "tokens": + [51444, 411, 16149, 5215, 14862, 1508, 7833, 13, 663, 311, 411, 11, 13977, 11150, + 13, 400, 286, 478, 1364, 51836], "temperature": 0.0, "avg_logprob": -0.21274781227111816, + "compression_ratio": 1.6260504201680672, "no_speech_prob": 0.00030918169068172574}, + {"id": 267, "seek": 162120, "start": 1621.3600000000001, "end": 1628.8, "text": + " on others also, but it''s so much faster, it''s so much faster than Python, like + it''s not even close.", "tokens": [50372, 322, 2357, 611, 11, 457, 309, 311, 370, + 709, 4663, 11, 309, 311, 370, 709, 4663, 813, 15329, 11, 411, 309, 311, 406, 754, + 1998, 13, 50744], "temperature": 0.0, "avg_logprob": -0.13338863736107237, "compression_ratio": + 1.6609442060085837, "no_speech_prob": 0.0020275639835745096}, {"id": 268, "seek": + 162120, "start": 1631.2, "end": 1636.72, "text": " It''s probably like three or + four times as fast without any fine tuning of it. And I''ve gone", "tokens": [50864, + 467, 311, 1391, 411, 1045, 420, 1451, 1413, 382, 2370, 1553, 604, 2489, 15164, 295, + 309, 13, 400, 286, 600, 2780, 51140], "temperature": 0.0, "avg_logprob": -0.13338863736107237, + "compression_ratio": 1.6609442060085837, "no_speech_prob": 0.0020275639835745096}, + {"id": 269, "seek": 162120, "start": 1636.72, "end": 1642.56, "text": " through + fine tuning. So I haven''t compared it to Python in a long time, but I might be + like five", "tokens": [51140, 807, 2489, 15164, 13, 407, 286, 2378, 380, 5347, 309, + 281, 15329, 294, 257, 938, 565, 11, 457, 286, 1062, 312, 411, 1732, 51432], "temperature": + 0.0, "avg_logprob": -0.13338863736107237, "compression_ratio": 1.6609442060085837, + "no_speech_prob": 0.0020275639835745096}, {"id": 270, "seek": 162120, "start": 1642.56, + "end": 1650.16, "text": " times as fast as Python right now on CPU. You can also + use GPU if you want. 
And it''s, you maintain", "tokens": [51432, 1413, 382, 2370, + 382, 15329, 558, 586, 322, 13199, 13, 509, 393, 611, 764, 18407, 498, 291, 528, + 13, 400, 309, 311, 11, 291, 6909, 51812], "temperature": 0.0, "avg_logprob": -0.13338863736107237, + "compression_ratio": 1.6609442060085837, "no_speech_prob": 0.0020275639835745096}, + {"id": 271, "seek": 165016, "start": 1650.16, "end": 1658.72, "text": " the same, + the same speed. It''s just as fast. Yeah. Well, it''s just as fast as the, the ratio + of speed is", "tokens": [50364, 264, 912, 11, 264, 912, 3073, 13, 467, 311, 445, + 382, 2370, 13, 865, 13, 1042, 11, 309, 311, 445, 382, 2370, 382, 264, 11, 264, 8509, + 295, 3073, 307, 50792], "temperature": 0.0, "avg_logprob": -0.16960127848499226, + "compression_ratio": 1.736842105263158, "no_speech_prob": 0.002098849043250084}, + {"id": 272, "seek": 165016, "start": 1658.72, "end": 1664.0, "text": " like the, + you know, if you took the model and Python and you put it in a GPU versus you take + the model", "tokens": [50792, 411, 264, 11, 291, 458, 11, 498, 291, 1890, 264, 2316, + 293, 15329, 293, 291, 829, 309, 294, 257, 18407, 5717, 291, 747, 264, 2316, 51056], + "temperature": 0.0, "avg_logprob": -0.16960127848499226, "compression_ratio": 1.736842105263158, + "no_speech_prob": 0.002098849043250084}, {"id": 273, "seek": 165016, "start": 1664.0, + "end": 1669.2, "text": " and on X Run time, you put it in the GPU, you get it''s + far faster.", "tokens": [51056, 293, 322, 1783, 8950, 565, 11, 291, 829, 309, 294, + 264, 18407, 11, 291, 483, 309, 311, 1400, 4663, 13, 51316], "temperature": 0.0, + "avg_logprob": -0.16960127848499226, "compression_ratio": 1.736842105263158, "no_speech_prob": + 0.002098849043250084}, {"id": 274, "seek": 165016, "start": 1670.72, "end": 1675.28, + "text": " And you say like when you said bindings in other languages, you know, + like Java C sharp.", "tokens": [51392, 400, 291, 584, 411, 562, 291, 848, 14786, + 1109, 294, 661, 8650, 11, 291, 
458, 11, 411, 10745, 383, 8199, 13, 51620], "temperature": + 0.0, "avg_logprob": -0.16960127848499226, "compression_ratio": 1.736842105263158, + "no_speech_prob": 0.002098849043250084}, {"id": 275, "seek": 167528, "start": 1676.24, + "end": 1683.28, "text": " So if my stack is in Java, I can take this model and kind + of plug it in into my Java code.", "tokens": [50412, 407, 498, 452, 8630, 307, 294, + 10745, 11, 286, 393, 747, 341, 2316, 293, 733, 295, 5452, 309, 294, 666, 452, 10745, + 3089, 13, 50764], "temperature": 0.0, "avg_logprob": -0.18871515933598312, "compression_ratio": + 1.6926605504587156, "no_speech_prob": 0.014592043124139309}, {"id": 276, "seek": + 167528, "start": 1684.6399999999999, "end": 1689.84, "text": " You can take a, you + can take a, let''s take a hugging face model, for example, like just", "tokens": + [50832, 509, 393, 747, 257, 11, 291, 393, 747, 257, 11, 718, 311, 747, 257, 41706, + 1851, 2316, 11, 337, 1365, 11, 411, 445, 51092], "temperature": 0.0, "avg_logprob": + -0.18871515933598312, "compression_ratio": 1.6926605504587156, "no_speech_prob": + 0.014592043124139309}, {"id": 277, "seek": 167528, "start": 1691.12, "end": 1695.76, + "text": " let''s say brick based on case, you know, most people know that one. Brat + based on case, you can", "tokens": [51156, 718, 311, 584, 16725, 2361, 322, 1389, + 11, 291, 458, 11, 881, 561, 458, 300, 472, 13, 1603, 267, 2361, 322, 1389, 11, 291, + 393, 51388], "temperature": 0.0, "avg_logprob": -0.18871515933598312, "compression_ratio": + 1.6926605504587156, "no_speech_prob": 0.014592043124139309}, {"id": 278, "seek": + 167528, "start": 1695.76, "end": 1700.3999999999999, "text": " export that to Onyx + with hugging face code in Python. 
And you have now you have an Onyx model.", "tokens": + [51388, 10725, 300, 281, 1282, 88, 87, 365, 41706, 1851, 3089, 294, 15329, 13, 400, + 291, 362, 586, 291, 362, 364, 1282, 88, 87, 2316, 13, 51620], "temperature": 0.0, + "avg_logprob": -0.18871515933598312, "compression_ratio": 1.6926605504587156, "no_speech_prob": + 0.014592043124139309}, {"id": 279, "seek": 170040, "start": 1700.8000000000002, + "end": 1712.3200000000002, "text": " Now you can, in Java, you can stand up a class + that wraps the Onyx runtime and you, and you load", "tokens": [50384, 823, 291, + 393, 11, 294, 10745, 11, 291, 393, 1463, 493, 257, 1508, 300, 25831, 264, 1282, + 88, 87, 34474, 293, 291, 11, 293, 291, 3677, 50960], "temperature": 0.0, "avg_logprob": + -0.1804190947946194, "compression_ratio": 1.8045454545454545, "no_speech_prob": + 0.0014941992703825235}, {"id": 280, "seek": 170040, "start": 1712.3200000000002, + "end": 1718.5600000000002, "text": " the model into memory with on X run time in + your class. And then you can create methods around that", "tokens": [50960, 264, + 2316, 666, 4675, 365, 322, 1783, 1190, 565, 294, 428, 1508, 13, 400, 550, 291, 393, + 1884, 7150, 926, 300, 51272], "temperature": 0.0, "avg_logprob": -0.1804190947946194, + "compression_ratio": 1.8045454545454545, "no_speech_prob": 0.0014941992703825235}, + {"id": 281, "seek": 170040, "start": 1718.5600000000002, "end": 1724.0, "text": + " class, right? And then you can call, you can call it and you can say, I''m going + to pass in the inputs", "tokens": [51272, 1508, 11, 558, 30, 400, 550, 291, 393, + 818, 11, 291, 393, 818, 309, 293, 291, 393, 584, 11, 286, 478, 516, 281, 1320, 294, + 264, 15743, 51544], "temperature": 0.0, "avg_logprob": -0.1804190947946194, "compression_ratio": + 1.8045454545454545, "no_speech_prob": 0.0014941992703825235}, {"id": 282, "seek": + 170040, "start": 1724.0, "end": 1728.8000000000002, "text": " and I''m going to + get out. And that''s all in Java now. 
Well, with the C++ wrapper for Onyx runtime,", + "tokens": [51544, 293, 286, 478, 516, 281, 483, 484, 13, 400, 300, 311, 439, 294, + 10745, 586, 13, 1042, 11, 365, 264, 383, 25472, 46906, 337, 1282, 88, 87, 34474, + 11, 51784], "temperature": 0.0, "avg_logprob": -0.1804190947946194, "compression_ratio": + 1.8045454545454545, "no_speech_prob": 0.0014941992703825235}, {"id": 283, "seek": + 172880, "start": 1728.8, "end": 1737.2, "text": " of course, but the connect, but + to wrap that C++ runtime, there have to be bindings between the", "tokens": [50364, + 295, 1164, 11, 457, 264, 1745, 11, 457, 281, 7019, 300, 383, 25472, 34474, 11, 456, + 362, 281, 312, 14786, 1109, 1296, 264, 50784], "temperature": 0.0, "avg_logprob": + -0.1769964055317204, "compression_ratio": 1.5828877005347595, "no_speech_prob": + 0.0011279038153588772}, {"id": 284, "seek": 172880, "start": 1737.2, "end": 1744.56, + "text": " language. So Java has to have some application interface to talk to C++. + Yeah, which is GNI, right?", "tokens": [50784, 2856, 13, 407, 10745, 575, 281, 362, + 512, 3861, 9226, 281, 751, 281, 383, 25472, 13, 865, 11, 597, 307, 460, 42496, 11, + 558, 30, 51152], "temperature": 0.0, "avg_logprob": -0.1769964055317204, "compression_ratio": + 1.5828877005347595, "no_speech_prob": 0.0011279038153588772}, {"id": 285, "seek": + 172880, "start": 1744.56, "end": 1753.36, "text": " Java native interface, I think. + Yeah, I think so. Java. Yeah. So that part, like having Java talk to", "tokens": + [51152, 10745, 8470, 9226, 11, 286, 519, 13, 865, 11, 286, 519, 370, 13, 10745, + 13, 865, 13, 407, 300, 644, 11, 411, 1419, 10745, 751, 281, 51592], "temperature": + 0.0, "avg_logprob": -0.1769964055317204, "compression_ratio": 1.5828877005347595, + "no_speech_prob": 0.0011279038153588772}, {"id": 286, "seek": 175336, "start": 1753.4399999999998, + "end": 1758.32, "text": " Onyx runtime is taken care of already. 
You still have + to write all the other stuff around it,", "tokens": [50368, 1282, 88, 87, 34474, + 307, 2726, 1127, 295, 1217, 13, 509, 920, 362, 281, 2464, 439, 264, 661, 1507, 926, + 309, 11, 50612], "temperature": 0.0, "avg_logprob": -0.18286997133547123, "compression_ratio": + 1.6050420168067228, "no_speech_prob": 0.0015153585700318217}, {"id": 287, "seek": + 175336, "start": 1758.32, "end": 1763.1999999999998, "text": " like to you to leverage + it. But that''s, you know, where programmers used to that sort of thing.", "tokens": + [50612, 411, 281, 291, 281, 13982, 309, 13, 583, 300, 311, 11, 291, 458, 11, 689, + 41504, 1143, 281, 300, 1333, 295, 551, 13, 50856], "temperature": 0.0, "avg_logprob": + -0.18286997133547123, "compression_ratio": 1.6050420168067228, "no_speech_prob": + 0.0015153585700318217}, {"id": 288, "seek": 175336, "start": 1763.1999999999998, + "end": 1775.6799999999998, "text": " You know, Java, you can, you can do that. And + I think, I don''t know if it''s, I don''t know how much", "tokens": [50856, 509, + 458, 11, 10745, 11, 291, 393, 11, 291, 393, 360, 300, 13, 400, 286, 519, 11, 286, + 500, 380, 458, 498, 309, 311, 11, 286, 500, 380, 458, 577, 709, 51480], "temperature": + 0.0, "avg_logprob": -0.18286997133547123, "compression_ratio": 1.6050420168067228, + "no_speech_prob": 0.0015153585700318217}, {"id": 289, "seek": 175336, "start": 1775.6799999999998, + "end": 1781.12, "text": " we''ve seen it, but Jeff Zemorek, who works at open source + connections, I know that is like he", "tokens": [51480, 321, 600, 1612, 309, 11, + 457, 7506, 1176, 443, 418, 74, 11, 567, 1985, 412, 1269, 4009, 9271, 11, 286, 458, + 300, 307, 411, 415, 51752], "temperature": 0.0, "avg_logprob": -0.18286997133547123, + "compression_ratio": 1.6050420168067228, "no_speech_prob": 0.0015153585700318217}, + {"id": 290, "seek": 178112, "start": 1781.12, "end": 1785.84, "text": " was working + on a project where he, you know, he could try to load an Onyx runtime in 
open,", + "tokens": [50364, 390, 1364, 322, 257, 1716, 689, 415, 11, 291, 458, 11, 415, 727, + 853, 281, 3677, 364, 1282, 88, 87, 34474, 294, 1269, 11, 50600], "temperature": + 0.0, "avg_logprob": -0.23622053703375623, "compression_ratio": 1.575, "no_speech_prob": + 0.001816479256376624}, {"id": 291, "seek": 178112, "start": 1785.84, "end": 1793.04, + "text": " in open NLP, which is a Java program. So trying to get an Onyx model in + open NLP. And I think he", "tokens": [50600, 294, 1269, 426, 45196, 11, 597, 307, + 257, 10745, 1461, 13, 407, 1382, 281, 483, 364, 1282, 88, 87, 2316, 294, 1269, 426, + 45196, 13, 400, 286, 519, 415, 50960], "temperature": 0.0, "avg_logprob": -0.23622053703375623, + "compression_ratio": 1.575, "no_speech_prob": 0.001816479256376624}, {"id": 292, + "seek": 178112, "start": 1793.04, "end": 1799.04, "text": " succeeded. I haven''t + seen code for that. Yeah. Yeah. But that''s what I''m exactly. Yeah, that''s,", + "tokens": [50960, 20263, 13, 286, 2378, 380, 1612, 3089, 337, 300, 13, 865, 13, + 865, 13, 583, 300, 311, 437, 286, 478, 2293, 13, 865, 11, 300, 311, 11, 51260], + "temperature": 0.0, "avg_logprob": -0.23622053703375623, "compression_ratio": 1.575, + "no_speech_prob": 0.001816479256376624}, {"id": 293, "seek": 178112, "start": 1799.04, + "end": 1806.08, "text": " that''s awesome. So I mean, the reason I''m asking is + because I witnessed this tectonic shift in", "tokens": [51260, 300, 311, 3476, 13, + 407, 286, 914, 11, 264, 1778, 286, 478, 3365, 307, 570, 286, 21519, 341, 535, 349, + 11630, 5513, 294, 51612], "temperature": 0.0, "avg_logprob": -0.23622053703375623, + "compression_ratio": 1.575, "no_speech_prob": 0.001816479256376624}, {"id": 294, + "seek": 180608, "start": 1806.24, "end": 1815.36, "text": " in my previous company + where we had the entire stack in Java. 
Even though we started with Pearl,", "tokens": + [50372, 294, 452, 3894, 2237, 689, 321, 632, 264, 2302, 8630, 294, 10745, 13, 2754, + 1673, 321, 1409, 365, 24639, 11, 50828], "temperature": 0.0, "avg_logprob": -0.22057157693449983, + "compression_ratio": 1.5756302521008403, "no_speech_prob": 0.0024803695268929005}, + {"id": 295, "seek": 180608, "start": 1815.36, "end": 1823.4399999999998, "text": + " but we had to read right everything into Java, just didn''t scale on Pearl. And, + yeah, and I mean,", "tokens": [50828, 457, 321, 632, 281, 1401, 558, 1203, 666, + 10745, 11, 445, 994, 380, 4373, 322, 24639, 13, 400, 11, 1338, 11, 293, 286, 914, + 11, 51232], "temperature": 0.0, "avg_logprob": -0.22057157693449983, "compression_ratio": + 1.5756302521008403, "no_speech_prob": 0.0024803695268929005}, {"id": 296, "seek": + 180608, "start": 1823.4399999999998, "end": 1829.4399999999998, "text": " we had + Apache Solar on one end as the open source search engine also written in Java. And,", + "tokens": [51232, 321, 632, 46597, 22385, 322, 472, 917, 382, 264, 1269, 4009, 3164, + 2848, 611, 3720, 294, 10745, 13, 400, 11, 51532], "temperature": 0.0, "avg_logprob": + -0.22057157693449983, "compression_ratio": 1.5756302521008403, "no_speech_prob": + 0.0024803695268929005}, {"id": 297, "seek": 180608, "start": 1829.4399999999998, + "end": 1835.4399999999998, "text": " you know, when we would customize it, we would + write plugins in Java and so on. But then,", "tokens": [51532, 291, 458, 11, 562, + 321, 576, 19734, 309, 11, 321, 576, 2464, 33759, 294, 10745, 293, 370, 322, 13, + 583, 550, 11, 51832], "temperature": 0.0, "avg_logprob": -0.22057157693449983, "compression_ratio": + 1.5756302521008403, "no_speech_prob": 0.0024803695268929005}, {"id": 298, "seek": + 183544, "start": 1835.76, "end": 1842.48, "text": " when we wanted to introduce + AI into the pipeline, of course, everything was in Python. 
We hired", "tokens": + [50380, 562, 321, 1415, 281, 5366, 7318, 666, 264, 15517, 11, 295, 1164, 11, 1203, + 390, 294, 15329, 13, 492, 13144, 50716], "temperature": 0.0, "avg_logprob": -0.12215498515537807, + "compression_ratio": 1.6166666666666667, "no_speech_prob": 0.003057604655623436}, + {"id": 299, "seek": 183544, "start": 1842.48, "end": 1849.2, "text": " people who + could only do Python, nothing else, fresh grads. And, and now you are stuck with + this", "tokens": [50716, 561, 567, 727, 787, 360, 15329, 11, 1825, 1646, 11, 4451, + 677, 5834, 13, 400, 11, 293, 586, 291, 366, 5541, 365, 341, 51052], "temperature": + 0.0, "avg_logprob": -0.12215498515537807, "compression_ratio": 1.6166666666666667, + "no_speech_prob": 0.003057604655623436}, {"id": 300, "seek": 183544, "start": 1849.2, + "end": 1855.92, "text": " new architecture. Okay, you have Python as one step in + the pipeline. How do you call it? How do", "tokens": [51052, 777, 9482, 13, 1033, + 11, 291, 362, 15329, 382, 472, 1823, 294, 264, 15517, 13, 1012, 360, 291, 818, 309, + 30, 1012, 360, 51388], "temperature": 0.0, "avg_logprob": -0.12215498515537807, + "compression_ratio": 1.6166666666666667, "no_speech_prob": 0.003057604655623436}, + {"id": 301, "seek": 183544, "start": 1855.92, "end": 1861.8400000000001, "text": + " you handle errors? How do you scale this thing? Right? And we were also moving + to Kubernetes to add", "tokens": [51388, 291, 4813, 13603, 30, 1012, 360, 291, 4373, + 341, 551, 30, 1779, 30, 400, 321, 645, 611, 2684, 281, 23145, 281, 909, 51684], + "temperature": 0.0, "avg_logprob": -0.12215498515537807, "compression_ratio": 1.6166666666666667, + "no_speech_prob": 0.003057604655623436}, {"id": 302, "seek": 186184, "start": 1861.84, + "end": 1871.4399999999998, "text": " to this crazy mix. 
And, and what we ended up + doing is that we would have a synchronous processor,", "tokens": [50364, 281, 341, + 3219, 2890, 13, 400, 11, 293, 437, 321, 4590, 493, 884, 307, 300, 321, 576, 362, + 257, 44743, 15321, 11, 50844], "temperature": 0.0, "avg_logprob": -0.13527759240598095, + "compression_ratio": 1.5378486055776892, "no_speech_prob": 0.004806575831025839}, + {"id": 303, "seek": 186184, "start": 1871.4399999999998, "end": 1878.1599999999999, + "text": " plugged in in every place where you have Python to abstract Python away + from Java. Right? So you", "tokens": [50844, 25679, 294, 294, 633, 1081, 689, 291, + 362, 15329, 281, 12649, 15329, 1314, 490, 10745, 13, 1779, 30, 407, 291, 51180], + "temperature": 0.0, "avg_logprob": -0.13527759240598095, "compression_ratio": 1.5378486055776892, + "no_speech_prob": 0.004806575831025839}, {"id": 304, "seek": 186184, "start": 1878.1599999999999, + "end": 1883.28, "text": " would kind of like just say send this message to an SQS + queue. And on the other end, there is", "tokens": [51180, 576, 733, 295, 411, 445, + 584, 2845, 341, 3636, 281, 364, 318, 48, 50, 18639, 13, 400, 322, 264, 661, 917, + 11, 456, 307, 51436], "temperature": 0.0, "avg_logprob": -0.13527759240598095, "compression_ratio": + 1.5378486055776892, "no_speech_prob": 0.004806575831025839}, {"id": 305, "seek": + 186184, "start": 1883.28, "end": 1889.84, "text": " somebody consuming it. Can you + imagine how scalable this can be? It works. It works. You can also", "tokens": [51436, + 2618, 19867, 309, 13, 1664, 291, 3811, 577, 38481, 341, 393, 312, 30, 467, 1985, + 13, 467, 1985, 13, 509, 393, 611, 51764], "temperature": 0.0, "avg_logprob": -0.13527759240598095, + "compression_ratio": 1.5378486055776892, "no_speech_prob": 0.004806575831025839}, + {"id": 306, "seek": 188984, "start": 1889.84, "end": 1896.24, "text": " like scale + it locally. 
But as the whole architecture, I don''t think it''s a very kind of smooth", + "tokens": [50364, 411, 4373, 309, 16143, 13, 583, 382, 264, 1379, 9482, 11, 286, + 500, 380, 519, 309, 311, 257, 588, 733, 295, 5508, 50684], "temperature": 0.0, "avg_logprob": + -0.17292574177617612, "compression_ratio": 1.5814977973568283, "no_speech_prob": + 0.002885310212150216}, {"id": 307, "seek": 188984, "start": 1896.24, "end": 1902.08, + "text": " in a way solution, like not to mention that the performance element of + it is just not taken care of.", "tokens": [50684, 294, 257, 636, 3827, 11, 411, + 406, 281, 2152, 300, 264, 3389, 4478, 295, 309, 307, 445, 406, 2726, 1127, 295, + 13, 50976], "temperature": 0.0, "avg_logprob": -0.17292574177617612, "compression_ratio": + 1.5814977973568283, "no_speech_prob": 0.002885310212150216}, {"id": 308, "seek": + 188984, "start": 1903.9199999999998, "end": 1911.52, "text": " And what you say + now, essentially, like with ONX binding in Java, we could just train that model", + "tokens": [51068, 400, 437, 291, 584, 586, 11, 4476, 11, 411, 365, 9299, 55, 17359, + 294, 10745, 11, 321, 727, 445, 3847, 300, 2316, 51448], "temperature": 0.0, "avg_logprob": + -0.17292574177617612, "compression_ratio": 1.5814977973568283, "no_speech_prob": + 0.002885310212150216}, {"id": 309, "seek": 188984, "start": 1911.52, "end": 1915.1999999999998, + "text": " and then export it in ONX format and then use it directly in Java.", "tokens": + [51448, 293, 550, 10725, 309, 294, 9299, 55, 7877, 293, 550, 764, 309, 3838, 294, + 10745, 13, 51632], "temperature": 0.0, "avg_logprob": -0.17292574177617612, "compression_ratio": + 1.5814977973568283, "no_speech_prob": 0.002885310212150216}, {"id": 310, "seek": + 191520, "start": 1916.0800000000002, "end": 1922.64, "text": " You can''t, yes. + But you still have to get the inputs to the model. 
So if", "tokens": [50408, 509, + 393, 380, 11, 2086, 13, 583, 291, 920, 362, 281, 483, 264, 15743, 281, 264, 2316, + 13, 407, 498, 50736], "temperature": 0.0, "avg_logprob": -0.15294763565063477, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.011475927196443081}, {"id": 311, "seek": + 191520, "start": 1925.2, "end": 1929.04, "text": " if it''s like an image or something + like that, it''s usually pretty easy. But if it''s text, then you", "tokens": [50864, + 498, 309, 311, 411, 364, 3256, 420, 746, 411, 300, 11, 309, 311, 2673, 1238, 1858, + 13, 583, 498, 309, 311, 2487, 11, 550, 291, 51056], "temperature": 0.0, "avg_logprob": + -0.15294763565063477, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.011475927196443081}, {"id": 312, "seek": 191520, "start": 1929.04, "end": 1934.96, + "text": " have to tokenize first. And you have to use the right tokenizer. And you + have to do, you have to", "tokens": [51056, 362, 281, 14862, 1125, 700, 13, 400, + 291, 362, 281, 764, 264, 558, 14862, 6545, 13, 400, 291, 362, 281, 360, 11, 291, + 362, 281, 51352], "temperature": 0.0, "avg_logprob": -0.15294763565063477, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.011475927196443081}, {"id": 313, "seek": + 191520, "start": 1934.96, "end": 1939.04, "text": " kind of jump through a bunch + of hoops to get it to work correctly. 
So it''s probably", "tokens": [51352, 733, + 295, 3012, 807, 257, 3840, 295, 1106, 3370, 281, 483, 309, 281, 589, 8944, 13, 407, + 309, 311, 1391, 51556], "temperature": 0.0, "avg_logprob": -0.15294763565063477, + "compression_ratio": 1.6923076923076923, "no_speech_prob": 0.011475927196443081}, + {"id": 314, "seek": 193904, "start": 1939.84, "end": 1945.12, "text": " a month''s + worth of work to get a tokenizer working in Java the way you needed to work.", "tokens": + [50404, 257, 1618, 311, 3163, 295, 589, 281, 483, 257, 14862, 6545, 1364, 294, 10745, + 264, 636, 291, 2978, 281, 589, 13, 50668], "temperature": 0.0, "avg_logprob": -0.19555811260057532, + "compression_ratio": 1.705607476635514, "no_speech_prob": 0.00319071882404387}, + {"id": 315, "seek": 193904, "start": 1945.84, "end": 1950.48, "text": " Yeah. And + maybe you could, in principle, share this tokenizer between tasks, right? So", "tokens": + [50704, 865, 13, 400, 1310, 291, 727, 11, 294, 8665, 11, 2073, 341, 14862, 6545, + 1296, 9608, 11, 558, 30, 407, 50936], "temperature": 0.0, "avg_logprob": -0.19555811260057532, + "compression_ratio": 1.705607476635514, "no_speech_prob": 0.00319071882404387}, + {"id": 316, "seek": 193904, "start": 1950.48, "end": 1957.36, "text": " it''s for + sentiment or for entity recognition in principle, you could use the same tokenizer. + Yeah.", "tokens": [50936, 309, 311, 337, 16149, 420, 337, 13977, 11150, 294, 8665, + 11, 291, 727, 764, 264, 912, 14862, 6545, 13, 865, 13, 51280], "temperature": 0.0, + "avg_logprob": -0.19555811260057532, "compression_ratio": 1.705607476635514, "no_speech_prob": + 0.00319071882404387}, {"id": 317, "seek": 193904, "start": 1958.3999999999999, "end": + 1964.0, "text": " Right. 
So the tokenizer is, so the tokenizer relies on the vocabulary + and the configuration,", "tokens": [51332, 1779, 13, 407, 264, 14862, 6545, 307, + 11, 370, 264, 14862, 6545, 30910, 322, 264, 19864, 293, 264, 11694, 11, 51612], + "temperature": 0.0, "avg_logprob": -0.19555811260057532, "compression_ratio": 1.705607476635514, + "no_speech_prob": 0.00319071882404387}, {"id": 318, "seek": 196400, "start": 1964.0, + "end": 1969.6, "text": " which is bound to the model. So the model is dependent + on these things. So if you have a generalized", "tokens": [50364, 597, 307, 5472, + 281, 264, 2316, 13, 407, 264, 2316, 307, 12334, 322, 613, 721, 13, 407, 498, 291, + 362, 257, 44498, 50644], "temperature": 0.0, "avg_logprob": -0.22368507385253905, + "compression_ratio": 1.5108695652173914, "no_speech_prob": 0.008656044490635395}, + {"id": 319, "seek": 196400, "start": 1969.6, "end": 1978.4, "text": " way to load + the vocabulary and the configuration, then yes, you could just take the thing and", + "tokens": [50644, 636, 281, 3677, 264, 19864, 293, 264, 11694, 11, 550, 2086, 11, + 291, 727, 445, 747, 264, 551, 293, 51084], "temperature": 0.0, "avg_logprob": -0.22368507385253905, + "compression_ratio": 1.5108695652173914, "no_speech_prob": 0.008656044490635395}, + {"id": 320, "seek": 196400, "start": 1980.8, "end": 1987.28, "text": " your new + stack. 
But having said all this with MITY, you took a different, you know,", "tokens": + [51204, 428, 777, 8630, 13, 583, 1419, 848, 439, 341, 365, 13100, 56, 11, 291, 1890, + 257, 819, 11, 291, 458, 11, 51528], "temperature": 0.0, "avg_logprob": -0.22368507385253905, + "compression_ratio": 1.5108695652173914, "no_speech_prob": 0.008656044490635395}, + {"id": 321, "seek": 198728, "start": 1987.44, "end": 1994.0, "text": " approach, + like the philosophy behind MIT, you''ll offer it as a web server, right?", "tokens": + [50372, 3109, 11, 411, 264, 10675, 2261, 13100, 11, 291, 603, 2626, 309, 382, 257, + 3670, 7154, 11, 558, 30, 50700], "temperature": 0.0, "avg_logprob": -0.16905337688969632, + "compression_ratio": 1.574468085106383, "no_speech_prob": 0.011271785944700241}, + {"id": 322, "seek": 198728, "start": 1994.0, "end": 1998.96, "text": " Yeah. And + again, can you tell me more about it? I mean, I''m sure you can open a lot of detail.", + "tokens": [50700, 865, 13, 400, 797, 11, 393, 291, 980, 385, 544, 466, 309, 30, + 286, 914, 11, 286, 478, 988, 291, 393, 1269, 257, 688, 295, 2607, 13, 50948], "temperature": + 0.0, "avg_logprob": -0.16905337688969632, "compression_ratio": 1.574468085106383, + "no_speech_prob": 0.011271785944700241}, {"id": 323, "seek": 198728, "start": 1998.96, + "end": 2009.2, "text": " Yeah, the reason I went that route is because when you, + when you want to do model inference,", "tokens": [50948, 865, 11, 264, 1778, 286, + 1437, 300, 7955, 307, 570, 562, 291, 11, 562, 291, 528, 281, 360, 2316, 38253, 11, + 51460], "temperature": 0.0, "avg_logprob": -0.16905337688969632, "compression_ratio": + 1.574468085106383, "no_speech_prob": 0.011271785944700241}, {"id": 324, "seek": + 198728, "start": 2010.48, "end": 2016.0, "text": " you want to give it as much compute + as possible, right? 
And you kind of want it to be its own thing.", "tokens": [51524, + 291, 528, 281, 976, 309, 382, 709, 14722, 382, 1944, 11, 558, 30, 400, 291, 733, + 295, 528, 309, 281, 312, 1080, 1065, 551, 13, 51800], "temperature": 0.0, "avg_logprob": + -0.16905337688969632, "compression_ratio": 1.574468085106383, "no_speech_prob": + 0.011271785944700241}, {"id": 325, "seek": 201600, "start": 2016.64, "end": 2022.24, + "text": " So I went the microservice route. I''m not, I''m not saying microservices + are the way of the", "tokens": [50396, 407, 286, 1437, 264, 15547, 25006, 7955, + 13, 286, 478, 406, 11, 286, 478, 406, 1566, 15547, 47480, 366, 264, 636, 295, 264, + 50676], "temperature": 0.0, "avg_logprob": -0.13343693846363133, "compression_ratio": + 1.728301886792453, "no_speech_prob": 0.0013234539655968547}, {"id": 326, "seek": + 201600, "start": 2022.24, "end": 2027.28, "text": " future and they''re better than + model it''s and all this stuff. But the idea of coupling", "tokens": [50676, 2027, + 293, 436, 434, 1101, 813, 2316, 309, 311, 293, 439, 341, 1507, 13, 583, 264, 1558, + 295, 37447, 50928], "temperature": 0.0, "avg_logprob": -0.13343693846363133, "compression_ratio": + 1.728301886792453, "no_speech_prob": 0.0013234539655968547}, {"id": 327, "seek": + 201600, "start": 2027.28, "end": 2033.28, "text": " with this, you know, this model + inference is part of like your regular application code.", "tokens": [50928, 365, + 341, 11, 291, 458, 11, 341, 2316, 38253, 307, 644, 295, 411, 428, 3890, 3861, 3089, + 13, 51228], "temperature": 0.0, "avg_logprob": -0.13343693846363133, "compression_ratio": + 1.728301886792453, "no_speech_prob": 0.0013234539655968547}, {"id": 328, "seek": + 201600, "start": 2034.4, "end": 2038.4, "text": " Maybe you don''t want to do that, + you know, you want to have this other service that can,", "tokens": [51284, 2704, + 291, 500, 380, 528, 281, 360, 300, 11, 291, 458, 11, 291, 528, 281, 362, 341, 661, + 2643, 300, 393, 11, 51484], 
"temperature": 0.0, "avg_logprob": -0.13343693846363133, + "compression_ratio": 1.728301886792453, "no_speech_prob": 0.0013234539655968547}, + {"id": 329, "seek": 201600, "start": 2039.2, "end": 2045.52, "text": " and this + is part of like the bigger ML ops question, which is, well, how often should I update + models?", "tokens": [51524, 293, 341, 307, 644, 295, 411, 264, 3801, 21601, 44663, + 1168, 11, 597, 307, 11, 731, 11, 577, 2049, 820, 286, 5623, 5245, 30, 51840], "temperature": + 0.0, "avg_logprob": -0.13343693846363133, "compression_ratio": 1.728301886792453, + "no_speech_prob": 0.0013234539655968547}, {"id": 330, "seek": 204552, "start": 2045.52, + "end": 2052.0, "text": " What are the things that I just know about, you know, drift + and all these things that are like,", "tokens": [50364, 708, 366, 264, 721, 300, + 286, 445, 458, 466, 11, 291, 458, 11, 19699, 293, 439, 613, 721, 300, 366, 411, + 11, 50688], "temperature": 0.0, "avg_logprob": -0.10393639907096196, "compression_ratio": + 1.7342342342342343, "no_speech_prob": 0.00017055209900718182}, {"id": 331, "seek": + 204552, "start": 2052.0, "end": 2056.4, "text": " what about logging and all this + stuff? It''s like, well, okay, you need a way to do this. 
And if you", "tokens": + [50688, 437, 466, 27991, 293, 439, 341, 1507, 30, 467, 311, 411, 11, 731, 11, 1392, + 11, 291, 643, 257, 636, 281, 360, 341, 13, 400, 498, 291, 50908], "temperature": + 0.0, "avg_logprob": -0.10393639907096196, "compression_ratio": 1.7342342342342343, + "no_speech_prob": 0.00017055209900718182}, {"id": 332, "seek": 204552, "start": + 2057.04, "end": 2062.08, "text": " embed model inference in your own code, now you''re + also responsible for all this stuff, right?", "tokens": [50940, 12240, 2316, 38253, + 294, 428, 1065, 3089, 11, 586, 291, 434, 611, 6250, 337, 439, 341, 1507, 11, 558, + 30, 51192], "temperature": 0.0, "avg_logprob": -0.10393639907096196, "compression_ratio": + 1.7342342342342343, "no_speech_prob": 0.00017055209900718182}, {"id": 333, "seek": + 204552, "start": 2063.68, "end": 2070.32, "text": " So as a, as a microservice, + you can evolve that microservice and say, all right, this thing is", "tokens": [51272, + 407, 382, 257, 11, 382, 257, 15547, 25006, 11, 291, 393, 16693, 300, 15547, 25006, + 293, 584, 11, 439, 558, 11, 341, 551, 307, 51604], "temperature": 0.0, "avg_logprob": + -0.10393639907096196, "compression_ratio": 1.7342342342342343, "no_speech_prob": + 0.00017055209900718182}, {"id": 334, "seek": 207032, "start": 2070.32, "end": 2077.76, + "text": " responsible for model inference and that''s it, right? 
And then all the + side effects around that of like,", "tokens": [50364, 6250, 337, 2316, 38253, 293, + 300, 311, 309, 11, 558, 30, 400, 550, 439, 264, 1252, 5065, 926, 300, 295, 411, + 11, 50736], "temperature": 0.0, "avg_logprob": -0.1442105953509991, "compression_ratio": + 1.7056277056277056, "no_speech_prob": 0.0006310308817774057}, {"id": 335, "seek": + 207032, "start": 2077.76, "end": 2083.2000000000003, "text": " okay, well, you need + a new model, but if you have to AB test models, what if you want to do logging,", + "tokens": [50736, 1392, 11, 731, 11, 291, 643, 257, 777, 2316, 11, 457, 498, 291, + 362, 281, 13838, 1500, 5245, 11, 437, 498, 291, 528, 281, 360, 27991, 11, 51008], + "temperature": 0.0, "avg_logprob": -0.1442105953509991, "compression_ratio": 1.7056277056277056, + "no_speech_prob": 0.0006310308817774057}, {"id": 336, "seek": 207032, "start": 2083.2000000000003, + "end": 2089.84, "text": " what if you want to do all these other things? You can + evolve that in its own way and it''s in the", "tokens": [51008, 437, 498, 291, 528, + 281, 360, 439, 613, 661, 721, 30, 509, 393, 16693, 300, 294, 1080, 1065, 636, 293, + 309, 311, 294, 264, 51340], "temperature": 0.0, "avg_logprob": -0.1442105953509991, + "compression_ratio": 1.7056277056277056, "no_speech_prob": 0.0006310308817774057}, + {"id": 337, "seek": 207032, "start": 2089.84, "end": 2096.7200000000003, "text": + " separation of concerns makes much more sense. So, and then it kind of gets you + out of the,", "tokens": [51340, 14634, 295, 7389, 1669, 709, 544, 2020, 13, 407, + 11, 293, 550, 309, 733, 295, 2170, 291, 484, 295, 264, 11, 51684], "temperature": + 0.0, "avg_logprob": -0.1442105953509991, "compression_ratio": 1.7056277056277056, + "no_speech_prob": 0.0006310308817774057}, {"id": 338, "seek": 209672, "start": 2097.12, + "end": 2103.4399999999996, "text": " it gets you out of the problem of like, okay, + well, am I going to build a mighty for Ruby? 
Am I going", "tokens": [50384, 309, + 2170, 291, 484, 295, 264, 1154, 295, 411, 11, 1392, 11, 731, 11, 669, 286, 516, + 281, 1322, 257, 21556, 337, 19907, 30, 2012, 286, 516, 50700], "temperature": 0.0, + "avg_logprob": -0.1849060196807419, "compression_ratio": 1.8278388278388278, "no_speech_prob": + 0.0056561920791864395}, {"id": 339, "seek": 209672, "start": 2103.4399999999996, + "end": 2108.0, "text": " to build mighty for node? Am I going to build ready mighty + for go? Like, I don''t have to do that. I", "tokens": [50700, 281, 1322, 21556, + 337, 9984, 30, 2012, 286, 516, 281, 1322, 1919, 21556, 337, 352, 30, 1743, 11, 286, + 500, 380, 362, 281, 360, 300, 13, 286, 50928], "temperature": 0.0, "avg_logprob": + -0.1849060196807419, "compression_ratio": 1.8278388278388278, "no_speech_prob": + 0.0056561920791864395}, {"id": 340, "seek": 209672, "start": 2108.0, "end": 2114.56, + "text": " can just build mighty inference server as a web server or a GRPC, which + own, you know, it''s on,", "tokens": [50928, 393, 445, 1322, 21556, 38253, 7154, + 382, 257, 3670, 7154, 420, 257, 10903, 12986, 11, 597, 1065, 11, 291, 458, 11, 309, + 311, 322, 11, 51256], "temperature": 0.0, "avg_logprob": -0.1849060196807419, "compression_ratio": + 1.8278388278388278, "no_speech_prob": 0.0056561920791864395}, {"id": 341, "seek": + 209672, "start": 2114.56, "end": 2118.9599999999996, "text": " it''s on the roadmap. + I don''t know how long that''s going to take, but now you have this thing. And then", + "tokens": [51256, 309, 311, 322, 264, 35738, 13, 286, 500, 380, 458, 577, 938, 300, + 311, 516, 281, 747, 11, 457, 586, 291, 362, 341, 551, 13, 400, 550, 51476], "temperature": + 0.0, "avg_logprob": -0.1849060196807419, "compression_ratio": 1.8278388278388278, + "no_speech_prob": 0.0056561920791864395}, {"id": 342, "seek": 209672, "start": 2118.9599999999996, + "end": 2124.16, "text": " I just have to write client libraries. And the APIs always + the same. 
The client libraries for HTTP", "tokens": [51476, 286, 445, 362, 281, + 2464, 6423, 15148, 13, 400, 264, 21445, 1009, 264, 912, 13, 440, 6423, 15148, 337, + 33283, 51736], "temperature": 0.0, "avg_logprob": -0.1849060196807419, "compression_ratio": + 1.8278388278388278, "no_speech_prob": 0.0056561920791864395}, {"id": 343, "seek": + 212416, "start": 2124.24, "end": 2135.2799999999997, "text": " are super easy. So + yeah. And if you compare this, let''s say we take a database, like VV8 or", "tokens": + [50368, 366, 1687, 1858, 13, 407, 1338, 13, 400, 498, 291, 6794, 341, 11, 718, 311, + 584, 321, 747, 257, 8149, 11, 411, 691, 53, 23, 420, 50920], "temperature": 0.0, + "avg_logprob": -0.21464213346823668, "compression_ratio": 1.4739583333333333, "no_speech_prob": + 0.007369358092546463}, {"id": 344, "seek": 212416, "start": 2135.2799999999997, + "end": 2143.2799999999997, "text": " SBA, they have inference inside them, right? + So like, if you already bought into that solution,", "tokens": [50920, 318, 9295, + 11, 436, 362, 38253, 1854, 552, 11, 558, 30, 407, 411, 11, 498, 291, 1217, 4243, + 666, 300, 3827, 11, 51320], "temperature": 0.0, "avg_logprob": -0.21464213346823668, + "compression_ratio": 1.4739583333333333, "no_speech_prob": 0.007369358092546463}, + {"id": 345, "seek": 212416, "start": 2143.2799999999997, "end": 2149.2799999999997, + "text": " in principle, you could do this. The only caveat I think is that if you + have your custom model,", "tokens": [51320, 294, 8665, 11, 291, 727, 360, 341, 13, + 440, 787, 43012, 286, 519, 307, 300, 498, 291, 362, 428, 2375, 2316, 11, 51620], + "temperature": 0.0, "avg_logprob": -0.21464213346823668, "compression_ratio": 1.4739583333333333, + "no_speech_prob": 0.007369358092546463}, {"id": 346, "seek": 214928, "start": 2149.28, + "end": 2156.0800000000004, "text": " you''ll have to go an extra mile to actually + integrate it inside this database, right? 
And", "tokens": [50364, 291, 603, 362, + 281, 352, 364, 2857, 12620, 281, 767, 13365, 309, 1854, 341, 8149, 11, 558, 30, + 400, 50704], "temperature": 0.0, "avg_logprob": -0.14749718612095095, "compression_ratio": + 1.606837606837607, "no_speech_prob": 0.0033544660545885563}, {"id": 347, "seek": + 214928, "start": 2156.0800000000004, "end": 2162.2400000000002, "text": " at that + point, with VV8, I think you will have to master go with SBA, you''ll have to master + the C", "tokens": [50704, 412, 300, 935, 11, 365, 691, 53, 23, 11, 286, 519, 291, + 486, 362, 281, 4505, 352, 365, 318, 9295, 11, 291, 603, 362, 281, 4505, 264, 383, + 51012], "temperature": 0.0, "avg_logprob": -0.14749718612095095, "compression_ratio": + 1.606837606837607, "no_speech_prob": 0.0033544660545885563}, {"id": 348, "seek": + 214928, "start": 2162.2400000000002, "end": 2167.6800000000003, "text": " plus plus + or Java. I''m not sure. I''m not an expert in that, but there is a podcast with + Joe", "tokens": [51012, 1804, 1804, 420, 10745, 13, 286, 478, 406, 988, 13, 286, + 478, 406, 364, 5844, 294, 300, 11, 457, 456, 307, 257, 7367, 365, 6807, 51284], + "temperature": 0.0, "avg_logprob": -0.14749718612095095, "compression_ratio": 1.606837606837607, + "no_speech_prob": 0.0033544660545885563}, {"id": 349, "seek": 214928, "start": 2167.6800000000003, + "end": 2175.0400000000004, "text": " Bergum that you can check out. But yes, so + how would you kind of like on product side, how would", "tokens": [51284, 27511, + 449, 300, 291, 393, 1520, 484, 13, 583, 2086, 11, 370, 577, 576, 291, 733, 295, + 411, 322, 1674, 1252, 11, 577, 576, 51652], "temperature": 0.0, "avg_logprob": -0.14749718612095095, + "compression_ratio": 1.606837606837607, "no_speech_prob": 0.0033544660545885563}, + {"id": 350, "seek": 217504, "start": 2175.2, "end": 2184.8, "text": " you compare + mititude that approach? So VESPA uses on X-Rontane. VESPA wraps on X-Rontane. 
I + believe", "tokens": [50372, 291, 6794, 2194, 4377, 300, 3109, 30, 407, 691, 2358, + 10297, 4960, 322, 1783, 12, 49, 896, 1929, 13, 691, 2358, 10297, 25831, 322, 1783, + 12, 49, 896, 1929, 13, 286, 1697, 50852], "temperature": 0.0, "avg_logprob": -0.25370802999544545, + "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.00028813848621211946}, + {"id": 351, "seek": 217504, "start": 2184.8, "end": 2188.72, "text": " it''s on + X-Rontane. I know they use on X models. I don''t know how to present it on X-Rontane.", + "tokens": [50852, 309, 311, 322, 1783, 12, 49, 896, 1929, 13, 286, 458, 436, 764, + 322, 1783, 5245, 13, 286, 500, 380, 458, 577, 281, 1974, 309, 322, 1783, 12, 49, + 896, 1929, 13, 51048], "temperature": 0.0, "avg_logprob": -0.25370802999544545, + "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.00028813848621211946}, + {"id": 352, "seek": 217504, "start": 2190.8, "end": 2197.12, "text": " So you''d + still have to go through the step of doing that. With VV8, it''s a little bit different.", + "tokens": [51152, 407, 291, 1116, 920, 362, 281, 352, 807, 264, 1823, 295, 884, + 300, 13, 2022, 691, 53, 23, 11, 309, 311, 257, 707, 857, 819, 13, 51468], "temperature": + 0.0, "avg_logprob": -0.25370802999544545, "compression_ratio": 1.6830357142857142, + "no_speech_prob": 0.00028813848621211946}, {"id": 353, "seek": 217504, "start": + 2197.12, "end": 2202.32, "text": " With VV8, you have these things called modules. + And then the modules are typically like", "tokens": [51468, 2022, 691, 53, 23, 11, + 291, 362, 613, 721, 1219, 16679, 13, 400, 550, 264, 16679, 366, 5850, 411, 51728], + "temperature": 0.0, "avg_logprob": -0.25370802999544545, "compression_ratio": 1.6830357142857142, + "no_speech_prob": 0.00028813848621211946}, {"id": 354, "seek": 220232, "start": + 2202.96, "end": 2212.0800000000004, "text": " Docker containers with APIs exposed. 
+ And then there''s logic written in the module code for", "tokens": [50396, 33772, + 17089, 365, 21445, 9495, 13, 400, 550, 456, 311, 9952, 3720, 294, 264, 10088, 3089, + 337, 50852], "temperature": 0.0, "avg_logprob": -0.14255503245762416, "compression_ratio": + 1.5603864734299517, "no_speech_prob": 0.00028645788552239537}, {"id": 355, "seek": + 220232, "start": 2212.0800000000004, "end": 2218.7200000000003, "text": " VV8 that + will wrap that API. And it''s easier if you just copy and paste a model and then + change", "tokens": [50852, 691, 53, 23, 300, 486, 7019, 300, 9362, 13, 400, 309, + 311, 3571, 498, 291, 445, 5055, 293, 9163, 257, 2316, 293, 550, 1319, 51184], "temperature": + 0.0, "avg_logprob": -0.14255503245762416, "compression_ratio": 1.5603864734299517, + "no_speech_prob": 0.00028645788552239537}, {"id": 356, "seek": 220232, "start": + 2218.7200000000003, "end": 2221.84, "text": " stuff to match the API of the thing + that you have in a Docker container.", "tokens": [51184, 1507, 281, 2995, 264, 9362, + 295, 264, 551, 300, 291, 362, 294, 257, 33772, 10129, 13, 51340], "temperature": + 0.0, "avg_logprob": -0.14255503245762416, "compression_ratio": 1.5603864734299517, + "no_speech_prob": 0.00028645788552239537}, {"id": 357, "seek": 220232, "start": + 2224.32, "end": 2230.1600000000003, "text": " So it''s not that much work. 
You still + have to know go to do it.", "tokens": [51464, 407, 309, 311, 406, 300, 709, 589, + 13, 509, 920, 362, 281, 458, 352, 281, 360, 309, 13, 51756], "temperature": 0.0, + "avg_logprob": -0.14255503245762416, "compression_ratio": 1.5603864734299517, "no_speech_prob": + 0.00028645788552239537}, {"id": 358, "seek": 223232, "start": 2232.88, "end": 2243.1200000000003, + "text": " And yeah, I think the other problem that I have with that approach, and + I''m not saying it''s wrong,", "tokens": [50392, 400, 1338, 11, 286, 519, 264, 661, + 1154, 300, 286, 362, 365, 300, 3109, 11, 293, 286, 478, 406, 1566, 309, 311, 2085, + 11, 50904], "temperature": 0.0, "avg_logprob": -0.19556961832819758, "compression_ratio": + 1.4270833333333333, "no_speech_prob": 0.0015615865122526884}, {"id": 359, "seek": + 223232, "start": 2243.76, "end": 2250.4, "text": " but from my perspective, so if + you look at the documentation actually for a couple of vector", "tokens": [50936, + 457, 490, 452, 4585, 11, 370, 498, 291, 574, 412, 264, 14333, 767, 337, 257, 1916, + 295, 8062, 51268], "temperature": 0.0, "avg_logprob": -0.19556961832819758, "compression_ratio": + 1.4270833333333333, "no_speech_prob": 0.0015615865122526884}, {"id": 360, "seek": + 223232, "start": 2250.4, "end": 2256.56, "text": " search engines, I''m not sure + of VESPA, but I think VV8 and maybe another will say,", "tokens": [51268, 3164, + 12982, 11, 286, 478, 406, 988, 295, 691, 2358, 10297, 11, 457, 286, 519, 691, 53, + 23, 293, 1310, 1071, 486, 584, 11, 51576], "temperature": 0.0, "avg_logprob": -0.19556961832819758, + "compression_ratio": 1.4270833333333333, "no_speech_prob": 0.0015615865122526884}, + {"id": 361, "seek": 225656, "start": 2256.72, "end": 2262.48, "text": " OK, well, + it''s better to use a GPU for inference and then CPU for the vector search,", "tokens": + [50372, 2264, 11, 731, 11, 309, 311, 1101, 281, 764, 257, 18407, 337, 38253, 293, + 550, 13199, 337, 264, 8062, 3164, 11, 50660], "temperature": 
0.0, "avg_logprob": + -0.15691626202929151, "compression_ratio": 1.6576576576576576, "no_speech_prob": + 0.0019621665123850107}, {"id": 362, "seek": 225656, "start": 2262.48, "end": 2270.24, + "text": " right? Because you want to provide as many workers to the search algorithm + as possible.", "tokens": [50660, 558, 30, 1436, 291, 528, 281, 2893, 382, 867, 5600, + 281, 264, 3164, 9284, 382, 1944, 13, 51048], "temperature": 0.0, "avg_logprob": + -0.15691626202929151, "compression_ratio": 1.6576576576576576, "no_speech_prob": + 0.0019621665123850107}, {"id": 363, "seek": 225656, "start": 2270.24, "end": 2278.4, + "text": " And you don''t want the inference, the model inference, and the vector + search fighting for resources.", "tokens": [51048, 400, 291, 500, 380, 528, 264, + 38253, 11, 264, 2316, 38253, 11, 293, 264, 8062, 3164, 5237, 337, 3593, 13, 51456], + "temperature": 0.0, "avg_logprob": -0.15691626202929151, "compression_ratio": 1.6576576576576576, + "no_speech_prob": 0.0019621665123850107}, {"id": 364, "seek": 225656, "start": 2278.96, + "end": 2284.24, "text": " Because both are very expensive, right? 
So they say, hey, + if you have GPU, then all your model", "tokens": [51484, 1436, 1293, 366, 588, 5124, + 11, 558, 30, 407, 436, 584, 11, 4177, 11, 498, 291, 362, 18407, 11, 550, 439, 428, + 2316, 51748], "temperature": 0.0, "avg_logprob": -0.15691626202929151, "compression_ratio": + 1.6576576576576576, "no_speech_prob": 0.0019621665123850107}, {"id": 365, "seek": + 228424, "start": 2284.24, "end": 2289.04, "text": " inferences and GPU and your + vector search is all CPU and you get this one perfect box and", "tokens": [50364, + 13596, 2667, 293, 18407, 293, 428, 8062, 3164, 307, 439, 13199, 293, 291, 483, 341, + 472, 2176, 2424, 293, 50604], "temperature": 0.0, "avg_logprob": -0.1686912737394634, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.00260981940664351}, + {"id": 366, "seek": 228424, "start": 2289.04, "end": 2296.4799999999996, "text": + " everything just works. But OK, well, what if you want to scale beyond that? You + can only send so", "tokens": [50604, 1203, 445, 1985, 13, 583, 2264, 11, 731, 11, + 437, 498, 291, 528, 281, 4373, 4399, 300, 30, 509, 393, 787, 2845, 370, 50976], + "temperature": 0.0, "avg_logprob": -0.1686912737394634, "compression_ratio": 1.5932203389830508, + "no_speech_prob": 0.00260981940664351}, {"id": 367, "seek": 228424, "start": 2296.4799999999996, + "end": 2304.56, "text": " many documents into a GPU at a time. What if I need 12 + machines? 
Well, now I need 12 machines", "tokens": [50976, 867, 8512, 666, 257, + 18407, 412, 257, 565, 13, 708, 498, 286, 643, 2272, 8379, 30, 1042, 11, 586, 286, + 643, 2272, 8379, 51380], "temperature": 0.0, "avg_logprob": -0.1686912737394634, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.00260981940664351}, + {"id": 368, "seek": 228424, "start": 2304.56, "end": 2312.08, "text": " that are + all hosting VV8 and they''re all hosting mighty or whatever your inference solution + is,", "tokens": [51380, 300, 366, 439, 16058, 691, 53, 23, 293, 436, 434, 439, 16058, + 21556, 420, 2035, 428, 38253, 3827, 307, 11, 51756], "temperature": 0.0, "avg_logprob": + -0.1686912737394634, "compression_ratio": 1.5932203389830508, "no_speech_prob": + 0.00260981940664351}, {"id": 369, "seek": 231208, "start": 2312.08, "end": 2316.7999999999997, + "text": " all it wants, right? So this goes back to the separation of concerns,", + "tokens": [50364, 439, 309, 2738, 11, 558, 30, 407, 341, 1709, 646, 281, 264, 14634, + 295, 7389, 11, 50600], "temperature": 0.0, "avg_logprob": -0.13059596186098846, + "compression_ratio": 1.7213740458015268, "no_speech_prob": 0.003360312432050705}, + {"id": 370, "seek": 231208, "start": 2316.7999999999997, "end": 2321.44, "text": + " problems. Like, well, what if I have a lot of documents that I need to process? 
+ And it doesn''t take", "tokens": [50600, 2740, 13, 1743, 11, 731, 11, 437, 498, + 286, 362, 257, 688, 295, 8512, 300, 286, 643, 281, 1399, 30, 400, 309, 1177, 380, + 747, 50832], "temperature": 0.0, "avg_logprob": -0.13059596186098846, "compression_ratio": + 1.7213740458015268, "no_speech_prob": 0.003360312432050705}, {"id": 371, "seek": + 231208, "start": 2321.44, "end": 2326.56, "text": " that long to get them into the + vector search by the vectors, but processing those documents", "tokens": [50832, + 300, 938, 281, 483, 552, 666, 264, 8062, 3164, 538, 264, 18875, 11, 457, 9007, 729, + 8512, 51088], "temperature": 0.0, "avg_logprob": -0.13059596186098846, "compression_ratio": + 1.7213740458015268, "no_speech_prob": 0.003360312432050705}, {"id": 372, "seek": + 231208, "start": 2326.56, "end": 2332.48, "text": " takes a long time. So I have + to pre-process. Well, now you''ve kind of got like this situation where", "tokens": + [51088, 2516, 257, 938, 565, 13, 407, 286, 362, 281, 659, 12, 41075, 13, 1042, 11, + 586, 291, 600, 733, 295, 658, 411, 341, 2590, 689, 51384], "temperature": 0.0, "avg_logprob": + -0.13059596186098846, "compression_ratio": 1.7213740458015268, "no_speech_prob": + 0.003360312432050705}, {"id": 373, "seek": 231208, "start": 2332.48, "end": 2338.72, + "text": " you might need another solution to do this batch pre-processing, right? + In another place.", "tokens": [51384, 291, 1062, 643, 1071, 3827, 281, 360, 341, + 15245, 659, 12, 41075, 278, 11, 558, 30, 682, 1071, 1081, 13, 51696], "temperature": + 0.0, "avg_logprob": -0.13059596186098846, "compression_ratio": 1.7213740458015268, + "no_speech_prob": 0.003360312432050705}, {"id": 374, "seek": 233872, "start": 2339.3599999999997, + "end": 2345.3599999999997, "text": " And then you bypass the module when you integrate + into VV8. 
You just want to send the", "tokens": [50396, 400, 550, 291, 24996, 264, + 10088, 562, 291, 13365, 666, 691, 53, 23, 13, 509, 445, 528, 281, 2845, 264, 50696], + "temperature": 0.0, "avg_logprob": -0.19641963640848795, "compression_ratio": 1.7338403041825095, + "no_speech_prob": 0.005385193973779678}, {"id": 375, "seek": 233872, "start": 2345.3599999999997, + "end": 2350.3199999999997, "text": " vectors directly to VV8 so you don''t have + any inference. You send the vector to it. So,", "tokens": [50696, 18875, 3838, 281, + 691, 53, 23, 370, 291, 500, 380, 362, 604, 38253, 13, 509, 2845, 264, 8062, 281, + 309, 13, 407, 11, 50944], "temperature": 0.0, "avg_logprob": -0.19641963640848795, + "compression_ratio": 1.7338403041825095, "no_speech_prob": 0.005385193973779678}, + {"id": 376, "seek": 233872, "start": 2351.2, "end": 2358.0, "text": " again, it''s + like this. I''m not saying it''s wrong. I think it''s a great idea because you can + just", "tokens": [50988, 797, 11, 309, 311, 411, 341, 13, 286, 478, 406, 1566, 309, + 311, 2085, 13, 286, 519, 309, 311, 257, 869, 1558, 570, 291, 393, 445, 51328], "temperature": + 0.0, "avg_logprob": -0.19641963640848795, "compression_ratio": 1.7338403041825095, + "no_speech_prob": 0.005385193973779678}, {"id": 377, "seek": 233872, "start": 2358.0, + "end": 2362.64, "text": " install something that will just work, right? You don''t + have to install like three different things", "tokens": [51328, 3625, 746, 300, + 486, 445, 589, 11, 558, 30, 509, 500, 380, 362, 281, 3625, 411, 1045, 819, 721, + 51560], "temperature": 0.0, "avg_logprob": -0.19641963640848795, "compression_ratio": + 1.7338403041825095, "no_speech_prob": 0.005385193973779678}, {"id": 378, "seek": + 233872, "start": 2362.64, "end": 2368.48, "text": " and try to figure it all out. 
+ So I think that getting up to speed on that is probably", "tokens": [51560, 293, + 853, 281, 2573, 309, 439, 484, 13, 407, 286, 519, 300, 1242, 493, 281, 3073, 322, + 300, 307, 1391, 51852], "temperature": 0.0, "avg_logprob": -0.19641963640848795, + "compression_ratio": 1.7338403041825095, "no_speech_prob": 0.005385193973779678}, + {"id": 379, "seek": 236848, "start": 2368.48, "end": 2374.96, "text": " quick. But + in the long term, like the scalability overall, I think that you now have this coupling", + "tokens": [50364, 1702, 13, 583, 294, 264, 938, 1433, 11, 411, 264, 15664, 2310, + 4787, 11, 286, 519, 300, 291, 586, 362, 341, 37447, 50688], "temperature": 0.0, + "avg_logprob": -0.12974992835003396, "compression_ratio": 1.6245614035087719, "no_speech_prob": + 0.0026637595146894455}, {"id": 380, "seek": 236848, "start": 2374.96, "end": 2378.16, + "text": " and it''s a bit of a challenge. So I don''t know how that gets resolved.", + "tokens": [50688, 293, 309, 311, 257, 857, 295, 257, 3430, 13, 407, 286, 500, 380, + 458, 577, 300, 2170, 20772, 13, 50848], "temperature": 0.0, "avg_logprob": -0.12974992835003396, + "compression_ratio": 1.6245614035087719, "no_speech_prob": 0.0026637595146894455}, + {"id": 381, "seek": 236848, "start": 2379.28, "end": 2384.64, "text": " Yeah, that''s + actually a good point because you reminded me of, I don''t remember precisely what + we", "tokens": [50904, 865, 11, 300, 311, 767, 257, 665, 935, 570, 291, 15920, 385, + 295, 11, 286, 500, 380, 1604, 13402, 437, 321, 51172], "temperature": 0.0, "avg_logprob": + -0.12974992835003396, "compression_ratio": 1.6245614035087719, "no_speech_prob": + 0.0026637595146894455}, {"id": 382, "seek": 236848, "start": 2384.64, "end": 2391.12, + "text": " were sort of balancing between, but like with solar and a Java pipeline + in front of it. 
So the", "tokens": [51172, 645, 1333, 295, 22495, 1296, 11, 457, + 411, 365, 7936, 293, 257, 10745, 15517, 294, 1868, 295, 309, 13, 407, 264, 51496], + "temperature": 0.0, "avg_logprob": -0.12974992835003396, "compression_ratio": 1.6245614035087719, + "no_speech_prob": 0.0026637595146894455}, {"id": 383, "seek": 236848, "start": 2391.12, + "end": 2397.6, "text": " pipeline would process documents as they come in, you know, + chunk them, classify them, run sentiment", "tokens": [51496, 15517, 576, 1399, 8512, + 382, 436, 808, 294, 11, 291, 458, 11, 16635, 552, 11, 33872, 552, 11, 1190, 16149, + 51820], "temperature": 0.0, "avg_logprob": -0.12974992835003396, "compression_ratio": + 1.6245614035087719, "no_speech_prob": 0.0026637595146894455}, {"id": 384, "seek": + 239760, "start": 2397.6, "end": 2404.56, "text": " analysis on them and so on. We + were thinking, okay, some of these things could be computed inside", "tokens": [50364, + 5215, 322, 552, 293, 370, 322, 13, 492, 645, 1953, 11, 1392, 11, 512, 295, 613, + 721, 727, 312, 40610, 1854, 50712], "temperature": 0.0, "avg_logprob": -0.15172861019770303, + "compression_ratio": 1.603448275862069, "no_speech_prob": 0.0005646685021929443}, + {"id": 385, "seek": 239760, "start": 2404.56, "end": 2410.72, "text": " solar. 
Like + we could write some clever plugin which actually does, I mean, solar has a lot of", + "tokens": [50712, 7936, 13, 1743, 321, 727, 2464, 512, 13494, 23407, 597, 767, 775, + 11, 286, 914, 11, 7936, 575, 257, 688, 295, 51020], "temperature": 0.0, "avg_logprob": + -0.15172861019770303, "compression_ratio": 1.603448275862069, "no_speech_prob": + 0.0005646685021929443}, {"id": 386, "seek": 239760, "start": 2410.72, "end": 2415.92, + "text": " things there, you know, like before it indexes the document, you can run + like a ton of things.", "tokens": [51020, 721, 456, 11, 291, 458, 11, 411, 949, + 309, 8186, 279, 264, 4166, 11, 291, 393, 1190, 411, 257, 2952, 295, 721, 13, 51280], + "temperature": 0.0, "avg_logprob": -0.15172861019770303, "compression_ratio": 1.603448275862069, + "no_speech_prob": 0.0005646685021929443}, {"id": 387, "seek": 239760, "start": 2415.92, + "end": 2420.08, "text": " I think OpenNLP is one example, right? You could plug + in and it runs something there.", "tokens": [51280, 286, 519, 7238, 45, 45196, 307, + 472, 1365, 11, 558, 30, 509, 727, 5452, 294, 293, 309, 6676, 746, 456, 13, 51488], + "temperature": 0.0, "avg_logprob": -0.15172861019770303, "compression_ratio": 1.603448275862069, + "no_speech_prob": 0.0005646685021929443}, {"id": 388, "seek": 242008, "start": 2420.48, + "end": 2428.88, "text": " And I remember that my manager, like who was a VP of engineering, + he came and said, hey,", "tokens": [50384, 400, 286, 1604, 300, 452, 6598, 11, 411, + 567, 390, 257, 35812, 295, 7043, 11, 415, 1361, 293, 848, 11, 4177, 11, 50804], + "temperature": 0.0, "avg_logprob": -0.14155952297911353, "compression_ratio": 1.5601659751037344, + "no_speech_prob": 0.011596374213695526}, {"id": 389, "seek": 242008, "start": 2428.88, + "end": 2434.64, "text": " what if we lose solar? So we computed everything inside + solar, stored it and lost it. 
Then what?", "tokens": [50804, 437, 498, 321, 3624, + 7936, 30, 407, 321, 40610, 1203, 1854, 7936, 11, 12187, 309, 293, 2731, 309, 13, + 1396, 437, 30, 51092], "temperature": 0.0, "avg_logprob": -0.14155952297911353, + "compression_ratio": 1.5601659751037344, "no_speech_prob": 0.011596374213695526}, + {"id": 390, "seek": 242008, "start": 2435.44, "end": 2440.64, "text": " Like now + you need to bring it up back really quickly and usually what you want to do is probably + like", "tokens": [51132, 1743, 586, 291, 643, 281, 1565, 309, 493, 646, 534, 2661, + 293, 2673, 437, 291, 528, 281, 360, 307, 1391, 411, 51392], "temperature": 0.0, + "avg_logprob": -0.14155952297911353, "compression_ratio": 1.5601659751037344, "no_speech_prob": + 0.011596374213695526}, {"id": 391, "seek": 242008, "start": 2440.64, "end": 2446.08, + "text": " replicate some shard and off you go, right? But if you don''t have that + data, you need to", "tokens": [51392, 25356, 512, 402, 515, 293, 766, 291, 352, + 11, 558, 30, 583, 498, 291, 500, 380, 362, 300, 1412, 11, 291, 643, 281, 51664], + "temperature": 0.0, "avg_logprob": -0.14155952297911353, "compression_ratio": 1.5601659751037344, + "no_speech_prob": 0.011596374213695526}, {"id": 392, "seek": 244608, "start": 2446.08, + "end": 2451.84, "text": " recompute it now. So you don''t have any intermediate + storage. Solar is not the storage. Solar is", "tokens": [50364, 48000, 1169, 309, + 586, 13, 407, 291, 500, 380, 362, 604, 19376, 6725, 13, 22385, 307, 406, 264, 6725, + 13, 22385, 307, 50652], "temperature": 0.0, "avg_logprob": -0.1170184378530465, + "compression_ratio": 1.75, "no_speech_prob": 0.002784757874906063}, {"id": 393, + "seek": 244608, "start": 2451.84, "end": 2457.68, "text": " the database. 
And so + we backtracked and we said, okay, we will compute everything and store it", "tokens": + [50652, 264, 8149, 13, 400, 370, 321, 646, 19466, 292, 293, 321, 848, 11, 1392, + 11, 321, 486, 14722, 1203, 293, 3531, 309, 50944], "temperature": 0.0, "avg_logprob": + -0.1170184378530465, "compression_ratio": 1.75, "no_speech_prob": 0.002784757874906063}, + {"id": 394, "seek": 244608, "start": 2457.68, "end": 2464.4, "text": " in S3, you + know, in file storage. And if in the event of losing solar, we will restore it and", + "tokens": [50944, 294, 318, 18, 11, 291, 458, 11, 294, 3991, 6725, 13, 400, 498, + 294, 264, 2280, 295, 7027, 7936, 11, 321, 486, 15227, 309, 293, 51280], "temperature": + 0.0, "avg_logprob": -0.1170184378530465, "compression_ratio": 1.75, "no_speech_prob": + 0.002784757874906063}, {"id": 395, "seek": 244608, "start": 2464.4, "end": 2472.3199999999997, + "text": " reindex everything on the fly. So I mean, that kind of also like, you + know, resurrected that", "tokens": [51280, 319, 471, 3121, 1203, 322, 264, 3603, + 13, 407, 286, 914, 11, 300, 733, 295, 611, 411, 11, 291, 458, 11, 48825, 300, 51676], + "temperature": 0.0, "avg_logprob": -0.1170184378530465, "compression_ratio": 1.75, + "no_speech_prob": 0.002784757874906063}, {"id": 396, "seek": 247232, "start": 2472.56, + "end": 2478.48, "text": " situation that also be deviated or quadrant to any other + database. If you lose the fact,", "tokens": [50376, 2590, 300, 611, 312, 31219, + 770, 420, 46856, 281, 604, 661, 8149, 13, 759, 291, 3624, 264, 1186, 11, 50672], + "temperature": 0.0, "avg_logprob": -0.1890843255179269, "compression_ratio": 1.7889908256880733, + "no_speech_prob": 0.004681700374931097}, {"id": 397, "seek": 247232, "start": 2478.48, + "end": 2483.36, "text": " if you lose the database, you lose the vectors. 
So if + you have computed them inside the database,", "tokens": [50672, 498, 291, 3624, + 264, 8149, 11, 291, 3624, 264, 18875, 13, 407, 498, 291, 362, 40610, 552, 1854, + 264, 8149, 11, 50916], "temperature": 0.0, "avg_logprob": -0.1890843255179269, "compression_ratio": + 1.7889908256880733, "no_speech_prob": 0.004681700374931097}, {"id": 398, "seek": + 247232, "start": 2483.36, "end": 2488.88, "text": " now bringing it back and then + turning it on and say, hey, please compute my vectors again, please,", "tokens": + [50916, 586, 5062, 309, 646, 293, 550, 6246, 309, 322, 293, 584, 11, 4177, 11, 1767, + 14722, 452, 18875, 797, 11, 1767, 11, 51192], "temperature": 0.0, "avg_logprob": + -0.1890843255179269, "compression_ratio": 1.7889908256880733, "no_speech_prob": + 0.004681700374931097}, {"id": 399, "seek": 247232, "start": 2488.88, "end": 2497.92, + "text": " please, please, you know, just too much time. You''re exactly right. And + this is a lesson that I learned.", "tokens": [51192, 1767, 11, 1767, 11, 291, 458, + 11, 445, 886, 709, 565, 13, 509, 434, 2293, 558, 13, 400, 341, 307, 257, 6898, 300, + 286, 3264, 13, 51644], "temperature": 0.0, "avg_logprob": -0.1890843255179269, "compression_ratio": + 1.7889908256880733, "no_speech_prob": 0.004681700374931097}, {"id": 400, "seek": + 249792, "start": 2498.88, "end": 2503.28, "text": " I didn''t learn this lesson + the hard way, thankfully. But this is just a lesson I learned picking", "tokens": + [50412, 286, 994, 380, 1466, 341, 6898, 264, 1152, 636, 11, 27352, 13, 583, 341, + 307, 445, 257, 6898, 286, 3264, 8867, 50632], "temperature": 0.0, "avg_logprob": + -0.13989214536522618, "compression_ratio": 1.7194244604316546, "no_speech_prob": + 0.0031802107114344835}, {"id": 401, "seek": 249792, "start": 2503.28, "end": 2510.32, + "text": " stuff up when I was at, when I was at Walter''s Clure, which is a huge + publishing firm. 
And you have,", "tokens": [50632, 1507, 493, 562, 286, 390, 412, + 11, 562, 286, 390, 412, 21572, 311, 2033, 540, 11, 597, 307, 257, 2603, 17832, 6174, + 13, 400, 291, 362, 11, 50984], "temperature": 0.0, "avg_logprob": -0.13989214536522618, + "compression_ratio": 1.7194244604316546, "no_speech_prob": 0.0031802107114344835}, + {"id": 402, "seek": 249792, "start": 2510.32, "end": 2514.8, "text": " you have + your content, which is like editorial content, primary source content. And it''s,", + "tokens": [50984, 291, 362, 428, 2701, 11, 597, 307, 411, 33412, 2701, 11, 6194, + 4009, 2701, 13, 400, 309, 311, 11, 51208], "temperature": 0.0, "avg_logprob": -0.13989214536522618, + "compression_ratio": 1.7194244604316546, "no_speech_prob": 0.0031802107114344835}, + {"id": 403, "seek": 249792, "start": 2515.76, "end": 2521.92, "text": " it''s written + in such a way where it''s it''s pretty raw from a machine perspective, you know. + And", "tokens": [51256, 309, 311, 3720, 294, 1270, 257, 636, 689, 309, 311, 309, + 311, 1238, 8936, 490, 257, 3479, 4585, 11, 291, 458, 13, 400, 51564], "temperature": + 0.0, "avg_logprob": -0.13989214536522618, "compression_ratio": 1.7194244604316546, + "no_speech_prob": 0.0031802107114344835}, {"id": 404, "seek": 249792, "start": 2521.92, + "end": 2526.0, "text": " then it goes through a series of enrichments and transformations. + So eventually it reaches the", "tokens": [51564, 550, 309, 1709, 807, 257, 2638, + 295, 18849, 1117, 293, 34852, 13, 407, 4728, 309, 14235, 264, 51768], "temperature": + 0.0, "avg_logprob": -0.13989214536522618, "compression_ratio": 1.7194244604316546, + "no_speech_prob": 0.0031802107114344835}, {"id": 405, "seek": 252600, "start": 2526.0, + "end": 2531.12, "text": " search engine. 
But every step along the way, it''s like, + okay, well, we need to add topics to classify", "tokens": [50364, 3164, 2848, 13, + 583, 633, 1823, 2051, 264, 636, 11, 309, 311, 411, 11, 1392, 11, 731, 11, 321, 643, + 281, 909, 8378, 281, 33872, 50620], "temperature": 0.0, "avg_logprob": -0.1483428147587463, + "compression_ratio": 1.8661710037174721, "no_speech_prob": 0.0020140265114605427}, + {"id": 406, "seek": 252600, "start": 2531.12, "end": 2535.68, "text": " topics, + right? So I''m going to add the topics. And then I''m going to save that state that''s + now on", "tokens": [50620, 8378, 11, 558, 30, 407, 286, 478, 516, 281, 909, 264, + 8378, 13, 400, 550, 286, 478, 516, 281, 3155, 300, 1785, 300, 311, 586, 322, 50848], + "temperature": 0.0, "avg_logprob": -0.1483428147587463, "compression_ratio": 1.8661710037174721, + "no_speech_prob": 0.0020140265114605427}, {"id": 407, "seek": 252600, "start": 2535.68, + "end": 2541.28, "text": " disk somewhere back to, okay, well, now I have to, you + know, add this other thing, you know, do any", "tokens": [50848, 12355, 4079, 646, + 281, 11, 1392, 11, 731, 11, 586, 286, 362, 281, 11, 291, 458, 11, 909, 341, 661, + 551, 11, 291, 458, 11, 360, 604, 51128], "temperature": 0.0, "avg_logprob": -0.1483428147587463, + "compression_ratio": 1.8661710037174721, "no_speech_prob": 0.0020140265114605427}, + {"id": 408, "seek": 252600, "start": 2541.28, "end": 2545.68, "text": " recognition + or something. That''s also saved, right? So you have all these intermediate steps. + So if", "tokens": [51128, 11150, 420, 746, 13, 663, 311, 611, 6624, 11, 558, 30, + 407, 291, 362, 439, 613, 19376, 4439, 13, 407, 498, 51348], "temperature": 0.0, + "avg_logprob": -0.1483428147587463, "compression_ratio": 1.8661710037174721, "no_speech_prob": + 0.0020140265114605427}, {"id": 409, "seek": 252600, "start": 2545.68, "end": 2550.32, + "text": " you lose anything, it''s really easy. 
You don''t have to rerun the entire, + you have to rerun the entire", "tokens": [51348, 291, 3624, 1340, 11, 309, 311, + 534, 1858, 13, 509, 500, 380, 362, 281, 43819, 409, 264, 2302, 11, 291, 362, 281, + 43819, 409, 264, 2302, 51580], "temperature": 0.0, "avg_logprob": -0.1483428147587463, + "compression_ratio": 1.8661710037174721, "no_speech_prob": 0.0020140265114605427}, + {"id": 410, "seek": 255032, "start": 2550.32, "end": 2557.04, "text": " pipeline. + It takes you months to do that. Not just days, but like literally months to start + from", "tokens": [50364, 15517, 13, 467, 2516, 291, 2493, 281, 360, 300, 13, 1726, + 445, 1708, 11, 457, 411, 3736, 2493, 281, 722, 490, 50700], "temperature": 0.0, + "avg_logprob": -0.13068068118495796, "compression_ratio": 1.7956989247311828, "no_speech_prob": + 0.004148334264755249}, {"id": 411, "seek": 255032, "start": 2557.04, "end": 2564.0, + "text": " scratch with content. So that''s like a disastrous scenario. So this lesson + that you learn is, okay,", "tokens": [50700, 8459, 365, 2701, 13, 407, 300, 311, + 411, 257, 44502, 9005, 13, 407, 341, 6898, 300, 291, 1466, 307, 11, 1392, 11, 51048], + "temperature": 0.0, "avg_logprob": -0.13068068118495796, "compression_ratio": 1.7956989247311828, + "no_speech_prob": 0.004148334264755249}, {"id": 412, "seek": 255032, "start": 2564.0, + "end": 2567.6800000000003, "text": " well, yeah, you don''t do, you don''t do everything + all in one place. Because if you lose it, then it''s", "tokens": [51048, 731, 11, + 1338, 11, 291, 500, 380, 360, 11, 291, 500, 380, 360, 1203, 439, 294, 472, 1081, + 13, 1436, 498, 291, 3624, 309, 11, 550, 309, 311, 51232], "temperature": 0.0, "avg_logprob": + -0.13068068118495796, "compression_ratio": 1.7956989247311828, "no_speech_prob": + 0.004148334264755249}, {"id": 413, "seek": 255032, "start": 2567.6800000000003, + "end": 2573.76, "text": " all gone. You have to start from scratch. So yeah, separating + concerns in that way. 
And then the idea", "tokens": [51232, 439, 2780, 13, 509, + 362, 281, 722, 490, 8459, 13, 407, 1338, 11, 29279, 7389, 294, 300, 636, 13, 400, + 550, 264, 1558, 51536], "temperature": 0.0, "avg_logprob": -0.13068068118495796, + "compression_ratio": 1.7956989247311828, "no_speech_prob": 0.004148334264755249}, + {"id": 414, "seek": 255032, "start": 2573.76, "end": 2577.92, "text": " of, well, + you can plug this thing in anywhere along the chain now, you know, you have this, + you have", "tokens": [51536, 295, 11, 731, 11, 291, 393, 5452, 341, 551, 294, 4992, + 2051, 264, 5021, 586, 11, 291, 458, 11, 291, 362, 341, 11, 291, 362, 51744], "temperature": + 0.0, "avg_logprob": -0.13068068118495796, "compression_ratio": 1.7956989247311828, + "no_speech_prob": 0.004148334264755249}, {"id": 415, "seek": 257792, "start": 2577.92, + "end": 2581.6800000000003, "text": " a microservice, you can put it in, you can + put it anywhere. And then you can, you don''t even have to", "tokens": [50364, 257, + 15547, 25006, 11, 291, 393, 829, 309, 294, 11, 291, 393, 829, 309, 4992, 13, 400, + 550, 291, 393, 11, 291, 500, 380, 754, 362, 281, 50552], "temperature": 0.0, "avg_logprob": + -0.12371279808782762, "compression_ratio": 1.9673202614379084, "no_speech_prob": + 0.0010272186482325196}, {"id": 416, "seek": 257792, "start": 2581.6800000000003, + "end": 2586.0, "text": " just take the vectors and then stick them in the search + engine, right? But what if you want, what if", "tokens": [50552, 445, 747, 264, + 18875, 293, 550, 2897, 552, 294, 264, 3164, 2848, 11, 558, 30, 583, 437, 498, 291, + 528, 11, 437, 498, 50768], "temperature": 0.0, "avg_logprob": -0.12371279808782762, + "compression_ratio": 1.9673202614379084, "no_speech_prob": 0.0010272186482325196}, + {"id": 417, "seek": 257792, "start": 2586.0, "end": 2590.08, "text": " you need + the vectors and you want to do something else? 
What if you have like a recommendations + platform", "tokens": [50768, 291, 643, 264, 18875, 293, 291, 528, 281, 360, 746, + 1646, 30, 708, 498, 291, 362, 411, 257, 10434, 3663, 50972], "temperature": 0.0, + "avg_logprob": -0.12371279808782762, "compression_ratio": 1.9673202614379084, "no_speech_prob": + 0.0010272186482325196}, {"id": 418, "seek": 257792, "start": 2590.08, "end": 2593.52, + "text": " and you have this other system over here and you want to do this other + stuff? It''s like, well,", "tokens": [50972, 293, 291, 362, 341, 661, 1185, 670, + 510, 293, 291, 528, 281, 360, 341, 661, 1507, 30, 467, 311, 411, 11, 731, 11, 51144], + "temperature": 0.0, "avg_logprob": -0.12371279808782762, "compression_ratio": 1.9673202614379084, + "no_speech_prob": 0.0010272186482325196}, {"id": 419, "seek": 257792, "start": 2593.52, + "end": 2598.0, "text": " now you have to think about routing and all these other + things. But if you just have an easy way to", "tokens": [51144, 586, 291, 362, 281, + 519, 466, 32722, 293, 439, 613, 661, 721, 13, 583, 498, 291, 445, 362, 364, 1858, + 636, 281, 51368], "temperature": 0.0, "avg_logprob": -0.12371279808782762, "compression_ratio": + 1.9673202614379084, "no_speech_prob": 0.0010272186482325196}, {"id": 420, "seek": + 257792, "start": 2598.0, "end": 2604.16, "text": " get vectors, you know, plug it + anywhere along the stack, then that''s up to you. You know, there''s no", "tokens": + [51368, 483, 18875, 11, 291, 458, 11, 5452, 309, 4992, 2051, 264, 8630, 11, 550, + 300, 311, 493, 281, 291, 13, 509, 458, 11, 456, 311, 572, 51676], "temperature": + 0.0, "avg_logprob": -0.12371279808782762, "compression_ratio": 1.9673202614379084, + "no_speech_prob": 0.0010272186482325196}, {"id": 421, "seek": 260416, "start": 2605.12, + "end": 2610.3199999999997, "text": " prescribed way of doing things. It''s a Lego. 
+ You put the Lego wherever you want.", "tokens": [50412, 29099, 636, 295, 884, 721, + 13, 467, 311, 257, 28761, 13, 509, 829, 264, 28761, 8660, 291, 528, 13, 50672], + "temperature": 0.0, "avg_logprob": -0.21572314026535197, "compression_ratio": 1.5690376569037656, + "no_speech_prob": 0.0036720873322337866}, {"id": 422, "seek": 260416, "start": 2610.3199999999997, + "end": 2619.7599999999998, "text": " Yeah, that''s a great point because we also + implemented like an algorithm, which was it computing", "tokens": [50672, 865, 11, + 300, 311, 257, 869, 935, 570, 321, 611, 12270, 411, 364, 9284, 11, 597, 390, 309, + 15866, 51144], "temperature": 0.0, "avg_logprob": -0.21572314026535197, "compression_ratio": + 1.5690376569037656, "no_speech_prob": 0.0036720873322337866}, {"id": 423, "seek": + 260416, "start": 2619.7599999999998, "end": 2627.04, "text": " some topics, I think. + And we used fast text and work to back vectors. But we didn''t need the vectors", + "tokens": [51144, 512, 8378, 11, 286, 519, 13, 400, 321, 1143, 2370, 2487, 293, + 589, 281, 646, 18875, 13, 583, 321, 994, 380, 643, 264, 18875, 51508], "temperature": + 0.0, "avg_logprob": -0.21572314026535197, "compression_ratio": 1.5690376569037656, + "no_speech_prob": 0.0036720873322337866}, {"id": 424, "seek": 260416, "start": 2627.04, + "end": 2632.48, "text": " in the end in the downstream system. We just computed + them, clustered, ran some magic algorithm,", "tokens": [51508, 294, 264, 917, 294, + 264, 30621, 1185, 13, 492, 445, 40610, 552, 11, 596, 38624, 11, 5872, 512, 5585, + 9284, 11, 51780], "temperature": 0.0, "avg_logprob": -0.21572314026535197, "compression_ratio": + 1.5690376569037656, "no_speech_prob": 0.0036720873322337866}, {"id": 425, "seek": + 263248, "start": 2633.12, "end": 2639.52, "text": " you know, produced topics and + then you store the topics. 
So you store actual words in some", "tokens": [50396, + 291, 458, 11, 7126, 8378, 293, 550, 291, 3531, 264, 8378, 13, 407, 291, 3531, 3539, + 2283, 294, 512, 50716], "temperature": 0.0, "avg_logprob": -0.2163111822945731, + "compression_ratio": 1.618421052631579, "no_speech_prob": 0.0015619659097865224}, + {"id": 426, "seek": 263248, "start": 2639.52, "end": 2644.2400000000002, "text": + " database, so index them in the search engine. So yeah, you''re absolutely right. + Like, sometimes", "tokens": [50716, 8149, 11, 370, 8186, 552, 294, 264, 3164, 2848, + 13, 407, 1338, 11, 291, 434, 3122, 558, 13, 1743, 11, 2171, 50952], "temperature": + 0.0, "avg_logprob": -0.2163111822945731, "compression_ratio": 1.618421052631579, + "no_speech_prob": 0.0015619659097865224}, {"id": 427, "seek": 263248, "start": 2644.2400000000002, + "end": 2652.96, "text": " you don''t need the vectors, but they are still the medium + to get to your target. So, and so,", "tokens": [50952, 291, 500, 380, 643, 264, + 18875, 11, 457, 436, 366, 920, 264, 6399, 281, 483, 281, 428, 3779, 13, 407, 11, + 293, 370, 11, 51388], "temperature": 0.0, "avg_logprob": -0.2163111822945731, "compression_ratio": + 1.618421052631579, "no_speech_prob": 0.0015619659097865224}, {"id": 428, "seek": + 263248, "start": 2653.68, "end": 2658.64, "text": " but you''ve, I''ve seen the + blog posts, which will also link, you''ve published on marks.io,", "tokens": [51424, + 457, 291, 600, 11, 286, 600, 1612, 264, 6968, 12300, 11, 597, 486, 611, 2113, 11, + 291, 600, 6572, 322, 10640, 13, 1004, 11, 51672], "temperature": 0.0, "avg_logprob": + -0.2163111822945731, "compression_ratio": 1.618421052631579, "no_speech_prob": 0.0015619659097865224}, + {"id": 429, "seek": 265864, "start": 2659.2799999999997, "end": 2665.92, "text": + " so discussing sort of almost like a unit, unit economy of this thing. 
Like, if + I have MIT", "tokens": [50396, 370, 10850, 1333, 295, 1920, 411, 257, 4985, 11, + 4985, 5010, 295, 341, 551, 13, 1743, 11, 498, 286, 362, 13100, 50728], "temperature": + 0.0, "avg_logprob": -0.22191854964855107, "compression_ratio": 1.5701357466063348, + "no_speech_prob": 0.012499794363975525}, {"id": 430, "seek": 265864, "start": 2666.7999999999997, + "end": 2670.96, "text": " gazillion amount of servers, how it will play out, you + know, how much", "tokens": [50772, 26232, 11836, 2372, 295, 15909, 11, 577, 309, + 486, 862, 484, 11, 291, 458, 11, 577, 709, 50980], "temperature": 0.0, "avg_logprob": + -0.22191854964855107, "compression_ratio": 1.5701357466063348, "no_speech_prob": + 0.012499794363975525}, {"id": 431, "seek": 265864, "start": 2672.24, "end": 2677.8399999999997, + "text": " separation of concern and also resource separation, all these things, + and how economical it is", "tokens": [51044, 14634, 295, 3136, 293, 611, 7684, 14634, + 11, 439, 613, 721, 11, 293, 577, 42473, 309, 307, 51324], "temperature": 0.0, "avg_logprob": + -0.22191854964855107, "compression_ratio": 1.5701357466063348, "no_speech_prob": + 0.012499794363975525}, {"id": 432, "seek": 265864, "start": 2678.4, "end": 2684.3199999999997, + "text": " in the end. Is this something that you are proposing? 
So let''s say if + somebody takes MIT and", "tokens": [51352, 294, 264, 917, 13, 1119, 341, 746, 300, + 291, 366, 29939, 30, 407, 718, 311, 584, 498, 2618, 2516, 13100, 293, 51648], "temperature": + 0.0, "avg_logprob": -0.22191854964855107, "compression_ratio": 1.5701357466063348, + "no_speech_prob": 0.012499794363975525}, {"id": 433, "seek": 268432, "start": 2684.32, + "end": 2690.2400000000002, "text": " wants to scale it, you know, like all of a + sudden you get, instead of 10,000 documents, you get", "tokens": [50364, 2738, 281, + 4373, 309, 11, 291, 458, 11, 411, 439, 295, 257, 3990, 291, 483, 11, 2602, 295, + 1266, 11, 1360, 8512, 11, 291, 483, 50660], "temperature": 0.0, "avg_logprob": -0.13023217180941968, + "compression_ratio": 1.5726141078838174, "no_speech_prob": 0.00242774304933846}, + {"id": 434, "seek": 268432, "start": 2690.2400000000002, "end": 2694.8, "text": + " 10 million documents to process, right? Because somebody changed somewhere in + the pipeline,", "tokens": [50660, 1266, 2459, 8512, 281, 1399, 11, 558, 30, 1436, + 2618, 3105, 4079, 294, 264, 15517, 11, 50888], "temperature": 0.0, "avg_logprob": + -0.13023217180941968, "compression_ratio": 1.5726141078838174, "no_speech_prob": + 0.00242774304933846}, {"id": 435, "seek": 268432, "start": 2694.8, "end": 2700.56, + "text": " and now we need to rerun the whole thing. So, how would you, what is your + recommendation also", "tokens": [50888, 293, 586, 321, 643, 281, 43819, 409, 264, + 1379, 551, 13, 407, 11, 577, 576, 291, 11, 437, 307, 428, 11879, 611, 51176], "temperature": + 0.0, "avg_logprob": -0.13023217180941968, "compression_ratio": 1.5726141078838174, + "no_speech_prob": 0.00242774304933846}, {"id": 436, "seek": 268432, "start": 2700.56, + "end": 2709.52, "text": " on the economy side? 
How do you see MIT playing a role + in making this huge thing more economical?", "tokens": [51176, 322, 264, 5010, 1252, + 30, 1012, 360, 291, 536, 13100, 2433, 257, 3090, 294, 1455, 341, 2603, 551, 544, + 42473, 30, 51624], "temperature": 0.0, "avg_logprob": -0.13023217180941968, "compression_ratio": + 1.5726141078838174, "no_speech_prob": 0.00242774304933846}, {"id": 437, "seek": + 270952, "start": 2709.92, "end": 2721.12, "text": " So, the first thing, the first + thing that I see is that you can, you can calculate the cost ahead", "tokens": [50384, + 407, 11, 264, 700, 551, 11, 264, 700, 551, 300, 286, 536, 307, 300, 291, 393, 11, + 291, 393, 8873, 264, 2063, 2286, 50944], "temperature": 0.0, "avg_logprob": -0.1829650707733937, + "compression_ratio": 1.5978260869565217, "no_speech_prob": 0.005748170427978039}, + {"id": 438, "seek": 270952, "start": 2721.12, "end": 2729.28, "text": " of time, + because it''s absolutely linearly scalable, right? You take, so MIT itself sits + on one CPU,", "tokens": [50944, 295, 565, 11, 570, 309, 311, 3122, 43586, 38481, + 11, 558, 30, 509, 747, 11, 370, 13100, 2564, 12696, 322, 472, 13199, 11, 51352], + "temperature": 0.0, "avg_logprob": -0.1829650707733937, "compression_ratio": 1.5978260869565217, + "no_speech_prob": 0.005748170427978039}, {"id": 439, "seek": 270952, "start": 2730.08, + "end": 2735.12, "text": " right? It sits on one thread, I''ll even say a thread, + because these days you have cores and CPUs", "tokens": [51392, 558, 30, 467, 12696, + 322, 472, 7207, 11, 286, 603, 754, 584, 257, 7207, 11, 570, 613, 1708, 291, 362, + 24826, 293, 13199, 82, 51644], "temperature": 0.0, "avg_logprob": -0.1829650707733937, + "compression_ratio": 1.5978260869565217, "no_speech_prob": 0.005748170427978039}, + {"id": 440, "seek": 273512, "start": 2735.12, "end": 2741.8399999999997, "text": + " and threads and it gets messed up. 
You can tell MIT to use multiple threads in + certain situations", "tokens": [50364, 293, 19314, 293, 309, 2170, 16507, 493, 13, + 509, 393, 980, 13100, 281, 764, 3866, 19314, 294, 1629, 6851, 50700], "temperature": + 0.0, "avg_logprob": -0.19406892557059768, "compression_ratio": 1.7092198581560283, + "no_speech_prob": 0.0006323584821075201}, {"id": 441, "seek": 273512, "start": 2741.8399999999997, + "end": 2746.7999999999997, "text": " that you want to, but the example for bash + processing that I use, which I actually learned from", "tokens": [50700, 300, 291, + 528, 281, 11, 457, 264, 1365, 337, 46183, 9007, 300, 286, 764, 11, 597, 286, 767, + 3264, 490, 50948], "temperature": 0.0, "avg_logprob": -0.19406892557059768, "compression_ratio": + 1.7092198581560283, "no_speech_prob": 0.0006323584821075201}, {"id": 442, "seek": + 273512, "start": 2746.7999999999997, "end": 2754.08, "text": " the VESPITE because + they wrote an amazing blog post in, I think it was early January, they released + a", "tokens": [50948, 264, 691, 2358, 47, 3927, 36, 570, 436, 4114, 364, 2243, 6968, + 2183, 294, 11, 286, 519, 309, 390, 2440, 7061, 11, 436, 4736, 257, 51312], "temperature": + 0.0, "avg_logprob": -0.19406892557059768, "compression_ratio": 1.7092198581560283, + "no_speech_prob": 0.0006323584821075201}, {"id": 443, "seek": 273512, "start": 2754.08, + "end": 2759.2799999999997, "text": " blog post talking about this exact problem + of, you know, do you have one process across multiple", "tokens": [51312, 6968, + 2183, 1417, 466, 341, 1900, 1154, 295, 11, 291, 458, 11, 360, 291, 362, 472, 1399, + 2108, 3866, 51572], "temperature": 0.0, "avg_logprob": -0.19406892557059768, "compression_ratio": + 1.7092198581560283, "no_speech_prob": 0.0006323584821075201}, {"id": 444, "seek": + 273512, "start": 2759.2799999999997, "end": 2764.24, "text": " threads? Do you have + multiple processes? 
So, if you go with the multiple processes route,", "tokens": + [51572, 19314, 30, 1144, 291, 362, 3866, 7555, 30, 407, 11, 498, 291, 352, 365, + 264, 3866, 7555, 7955, 11, 51820], "temperature": 0.0, "avg_logprob": -0.19406892557059768, + "compression_ratio": 1.7092198581560283, "no_speech_prob": 0.0006323584821075201}, + {"id": 445, "seek": 276512, "start": 2765.2799999999997, "end": 2775.52, "text": + " let''s say I take, I take a bunch of documents and I pass them in and I have some + level of consistency", "tokens": [50372, 718, 311, 584, 286, 747, 11, 286, 747, + 257, 3840, 295, 8512, 293, 286, 1320, 552, 294, 293, 286, 362, 512, 1496, 295, 14416, + 50884], "temperature": 0.0, "avg_logprob": -0.17621022179013207, "compression_ratio": + 1.751111111111111, "no_speech_prob": 0.0031097624450922012}, {"id": 446, "seek": + 276512, "start": 2775.52, "end": 2782.56, "text": " in the document size, which + you usually do. Pass them in and it takes you X as long, it takes you", "tokens": + [50884, 294, 264, 4166, 2744, 11, 597, 291, 2673, 360, 13, 10319, 552, 294, 293, + 309, 2516, 291, 1783, 382, 938, 11, 309, 2516, 291, 51236], "temperature": 0.0, + "avg_logprob": -0.17621022179013207, "compression_ratio": 1.751111111111111, "no_speech_prob": + 0.0031097624450922012}, {"id": 447, "seek": 276512, "start": 2782.56, "end": 2787.8399999999997, + "text": " X to get all of your documents, inference, right? So, you have that number + and you know how long it", "tokens": [51236, 1783, 281, 483, 439, 295, 428, 8512, + 11, 38253, 11, 558, 30, 407, 11, 291, 362, 300, 1230, 293, 291, 458, 577, 938, 309, + 51500], "temperature": 0.0, "avg_logprob": -0.17621022179013207, "compression_ratio": + 1.751111111111111, "no_speech_prob": 0.0031097624450922012}, {"id": 448, "seek": + 276512, "start": 2787.8399999999997, "end": 2794.16, "text": " took and you know + how much, how much content you processed in terms of bytes. 
Well, what if I,", "tokens": + [51500, 1890, 293, 291, 458, 577, 709, 11, 577, 709, 2701, 291, 18846, 294, 2115, + 295, 36088, 13, 1042, 11, 437, 498, 286, 11, 51816], "temperature": 0.0, "avg_logprob": + -0.17621022179013207, "compression_ratio": 1.751111111111111, "no_speech_prob": + 0.0031097624450922012}, {"id": 449, "seek": 279416, "start": 2794.16, "end": 2800.3999999999996, + "text": " if I add, if I add another process now and I''m doing this purely paralyzeable, + so half of my documents", "tokens": [50364, 498, 286, 909, 11, 498, 286, 909, 1071, + 1399, 586, 293, 286, 478, 884, 341, 17491, 32645, 1381, 712, 11, 370, 1922, 295, + 452, 8512, 50676], "temperature": 0.0, "avg_logprob": -0.1267544315979544, "compression_ratio": + 1.6652542372881356, "no_speech_prob": 0.0006732585607096553}, {"id": 450, "seek": + 279416, "start": 2800.3999999999996, "end": 2806.16, "text": " go here, half of + my documents go there, it''s what I said exactly is linearly scalable. I add a CPU,", + "tokens": [50676, 352, 510, 11, 1922, 295, 452, 8512, 352, 456, 11, 309, 311, 437, + 286, 848, 2293, 307, 43586, 38481, 13, 286, 909, 257, 13199, 11, 50964], "temperature": + 0.0, "avg_logprob": -0.1267544315979544, "compression_ratio": 1.6652542372881356, + "no_speech_prob": 0.0006732585607096553}, {"id": 451, "seek": 279416, "start": 2806.7999999999997, + "end": 2815.7599999999998, "text": " it has the time, right? It has the time that + it takes to do this. 
So, if I have a situation where", "tokens": [50996, 309, 575, + 264, 565, 11, 558, 30, 467, 575, 264, 565, 300, 309, 2516, 281, 360, 341, 13, 407, + 11, 498, 286, 362, 257, 2590, 689, 51444], "temperature": 0.0, "avg_logprob": -0.1267544315979544, + "compression_ratio": 1.6652542372881356, "no_speech_prob": 0.0006732585607096553}, + {"id": 452, "seek": 279416, "start": 2815.7599999999998, "end": 2821.44, "text": + " I''ve said, okay, I did 10,000 documents, it took me X, now I have to do a million + documents.", "tokens": [51444, 286, 600, 848, 11, 1392, 11, 286, 630, 1266, 11, + 1360, 8512, 11, 309, 1890, 385, 1783, 11, 586, 286, 362, 281, 360, 257, 2459, 8512, + 13, 51728], "temperature": 0.0, "avg_logprob": -0.1267544315979544, "compression_ratio": + 1.6652542372881356, "no_speech_prob": 0.0006732585607096553}, {"id": 453, "seek": + 282144, "start": 2822.0, "end": 2828.32, "text": " How long do I want it to take? + You can actually write down the calculation and say, I need,", "tokens": [50392, + 1012, 938, 360, 286, 528, 309, 281, 747, 30, 509, 393, 767, 2464, 760, 264, 17108, + 293, 584, 11, 286, 643, 11, 50708], "temperature": 0.0, "avg_logprob": -0.15547142406501394, + "compression_ratio": 1.5756302521008403, "no_speech_prob": 0.006124628242105246}, + {"id": 454, "seek": 282144, "start": 2828.32, "end": 2833.2000000000003, "text": + " I need this exact infrastructure, which is a huge problem right now, a lot of + people don''t know that.", "tokens": [50708, 286, 643, 341, 1900, 6896, 11, 597, + 307, 257, 2603, 1154, 558, 586, 11, 257, 688, 295, 561, 500, 380, 458, 300, 13, + 50952], "temperature": 0.0, "avg_logprob": -0.15547142406501394, "compression_ratio": + 1.5756302521008403, "no_speech_prob": 0.006124628242105246}, {"id": 455, "seek": + 282144, "start": 2833.2000000000003, "end": 2839.52, "text": " It''s like, okay, + let''s just add a lot of GPUs and see what happens, you know. 
You can, you can spend", + "tokens": [50952, 467, 311, 411, 11, 1392, 11, 718, 311, 445, 909, 257, 688, 295, + 18407, 82, 293, 536, 437, 2314, 11, 291, 458, 13, 509, 393, 11, 291, 393, 3496, + 51268], "temperature": 0.0, "avg_logprob": -0.15547142406501394, "compression_ratio": + 1.5756302521008403, "no_speech_prob": 0.006124628242105246}, {"id": 456, "seek": + 282144, "start": 2839.52, "end": 2845.2000000000003, "text": " the time to go through + and do that calculation, but it''s not so straightforward.", "tokens": [51268, 264, + 565, 281, 352, 807, 293, 360, 300, 17108, 11, 457, 309, 311, 406, 370, 15325, 13, + 51552], "temperature": 0.0, "avg_logprob": -0.15547142406501394, "compression_ratio": + 1.5756302521008403, "no_speech_prob": 0.006124628242105246}, {"id": 457, "seek": + 284520, "start": 2845.8399999999997, "end": 2854.3999999999996, "text": " And you''d + have to do it like, you''d have to cost it yourself. I haven''t released it, but + I want", "tokens": [50396, 400, 291, 1116, 362, 281, 360, 309, 411, 11, 291, 1116, + 362, 281, 2063, 309, 1803, 13, 286, 2378, 380, 4736, 309, 11, 457, 286, 528, 50824], + "temperature": 0.0, "avg_logprob": -0.1919794630730289, "compression_ratio": 1.5891891891891892, + "no_speech_prob": 0.031021952629089355}, {"id": 458, "seek": 284520, "start": 2854.3999999999996, + "end": 2859.2799999999997, "text": " to have a calculator that says, how many bytes + do you have and, you know, how long do you want to spend?", "tokens": [50824, 281, + 362, 257, 24993, 300, 1619, 11, 577, 867, 36088, 360, 291, 362, 293, 11, 291, 458, + 11, 577, 938, 360, 291, 528, 281, 3496, 30, 51068], "temperature": 0.0, "avg_logprob": + -0.1919794630730289, "compression_ratio": 1.5891891891891892, "no_speech_prob": + 0.031021952629089355}, {"id": 459, "seek": 284520, "start": 2859.2799999999997, + "end": 2869.68, "text": " And I can say, well, it''ll cost you this in Amazon or + whatever. 
So, that''s, that''s one thing.", "tokens": [51068, 400, 286, 393, 584, + 11, 731, 11, 309, 603, 2063, 291, 341, 294, 6795, 420, 2035, 13, 407, 11, 300, 311, + 11, 300, 311, 472, 551, 13, 51588], "temperature": 0.0, "avg_logprob": -0.1919794630730289, + "compression_ratio": 1.5891891891891892, "no_speech_prob": 0.031021952629089355}, + {"id": 460, "seek": 286968, "start": 2870.3999999999996, "end": 2877.6, "text": + " I also want it so, I mentioned GPUs, it''s like, this is, I built it so it works + on CPU.", "tokens": [50400, 286, 611, 528, 309, 370, 11, 286, 2835, 18407, 82, 11, + 309, 311, 411, 11, 341, 307, 11, 286, 3094, 309, 370, 309, 1985, 322, 13199, 13, + 50760], "temperature": 0.0, "avg_logprob": -0.24736902848729547, "compression_ratio": + 1.6299559471365639, "no_speech_prob": 0.0028566857799887657}, {"id": 461, "seek": + 286968, "start": 2878.56, "end": 2885.2799999999997, "text": " If you are a company + that''s getting into this stuff and this, this, this idea of the", "tokens": [50808, + 759, 291, 366, 257, 2237, 300, 311, 1242, 666, 341, 1507, 293, 341, 11, 341, 11, + 341, 1558, 295, 264, 51144], "temperature": 0.0, "avg_logprob": -0.24736902848729547, + "compression_ratio": 1.6299559471365639, "no_speech_prob": 0.0028566857799887657}, + {"id": 462, "seek": 286968, "start": 2885.2799999999997, "end": 2890.8799999999997, + "text": " unit economy, like, how long does it take to process something? And what''s + the cost and, you know,", "tokens": [51144, 4985, 5010, 11, 411, 11, 577, 938, 775, + 309, 747, 281, 1399, 746, 30, 400, 437, 311, 264, 2063, 293, 11, 291, 458, 11, 51424], + "temperature": 0.0, "avg_logprob": -0.24736902848729547, "compression_ratio": 1.6299559471365639, + "no_speech_prob": 0.0028566857799887657}, {"id": 463, "seek": 286968, "start": 2890.8799999999997, + "end": 2896.3999999999996, "text": " how do I scale it? But the, the, the, the billion + documents. 
If I''m coming into this ecosystem and", "tokens": [51424, 577, 360, + 286, 4373, 309, 30, 583, 264, 11, 264, 11, 264, 11, 264, 5218, 8512, 13, 759, 286, + 478, 1348, 666, 341, 11311, 293, 51700], "temperature": 0.0, "avg_logprob": -0.24736902848729547, + "compression_ratio": 1.6299559471365639, "no_speech_prob": 0.0028566857799887657}, + {"id": 464, "seek": 289640, "start": 2896.4, "end": 2903.44, "text": " content processing, + and I''m used to working in Java or, you know, C sharp or something like that.", + "tokens": [50364, 2701, 9007, 11, 293, 286, 478, 1143, 281, 1364, 294, 10745, 420, + 11, 291, 458, 11, 383, 8199, 420, 746, 411, 300, 13, 50716], "temperature": 0.0, + "avg_logprob": -0.17613258018149985, "compression_ratio": 1.560483870967742, "no_speech_prob": + 0.0010718363337218761}, {"id": 465, "seek": 289640, "start": 2905.12, "end": 2909.6800000000003, + "text": " Now you''re telling me I need to buy GPUs, like I have to run GPUs, and + then I go check the prices,", "tokens": [50800, 823, 291, 434, 3585, 385, 286, 643, + 281, 2256, 18407, 82, 11, 411, 286, 362, 281, 1190, 18407, 82, 11, 293, 550, 286, + 352, 1520, 264, 7901, 11, 51028], "temperature": 0.0, "avg_logprob": -0.17613258018149985, + "compression_ratio": 1.560483870967742, "no_speech_prob": 0.0010718363337218761}, + {"id": 466, "seek": 289640, "start": 2909.6800000000003, "end": 2914.8, "text": + " I''m like, well, that''s not how much we spend on infrastructure. That''s not + in our budget. I''m", "tokens": [51028, 286, 478, 411, 11, 731, 11, 300, 311, 406, + 577, 709, 321, 3496, 322, 6896, 13, 663, 311, 406, 294, 527, 4706, 13, 286, 478, + 51284], "temperature": 0.0, "avg_logprob": -0.17613258018149985, "compression_ratio": + 1.560483870967742, "no_speech_prob": 0.0010718363337218761}, {"id": 467, "seek": + 289640, "start": 2914.8, "end": 2920.8, "text": " sorry to tell you. So maybe we + can''t even do this. 
So I wanted to have a way where you could get", "tokens": [51284, + 2597, 281, 980, 291, 13, 407, 1310, 321, 393, 380, 754, 360, 341, 13, 407, 286, + 1415, 281, 362, 257, 636, 689, 291, 727, 483, 51584], "temperature": 0.0, "avg_logprob": + -0.17613258018149985, "compression_ratio": 1.560483870967742, "no_speech_prob": + 0.0010718363337218761}, {"id": 468, "seek": 292080, "start": 2920.8, "end": 2925.84, + "text": " around that problem where you could just use CPU and it''s a straightforward + understanding of the cost", "tokens": [50364, 926, 300, 1154, 689, 291, 727, 445, + 764, 13199, 293, 309, 311, 257, 15325, 3701, 295, 264, 2063, 50616], "temperature": + 0.0, "avg_logprob": -0.2150675045546665, "compression_ratio": 1.615702479338843, + "no_speech_prob": 0.001631997642107308}, {"id": 469, "seek": 292080, "start": 2927.04, + "end": 2933.6800000000003, "text": " that you''d have to put in. I haven''t checked + Amazon, I haven''t checked Amazon prices in a little while,", "tokens": [50676, + 300, 291, 1116, 362, 281, 829, 294, 13, 286, 2378, 380, 10033, 6795, 11, 286, 2378, + 380, 10033, 6795, 7901, 294, 257, 707, 1339, 11, 51008], "temperature": 0.0, "avg_logprob": + -0.2150675045546665, "compression_ratio": 1.615702479338843, "no_speech_prob": 0.001631997642107308}, + {"id": 470, "seek": 292080, "start": 2933.6800000000003, "end": 2941.52, "text": + " but I might as well be posted online, which is, which is another cloud platform. 
+ I just, the pricing", "tokens": [51008, 457, 286, 1062, 382, 731, 312, 9437, 2950, + 11, 597, 307, 11, 597, 307, 1071, 4588, 3663, 13, 286, 445, 11, 264, 17621, 51400], + "temperature": 0.0, "avg_logprob": -0.2150675045546665, "compression_ratio": 1.615702479338843, + "no_speech_prob": 0.001631997642107308}, {"id": 471, "seek": 292080, "start": 2941.52, + "end": 2950.0, "text": " is better and I just, like, they were actually recently + purchased by a huge content,", "tokens": [51400, 307, 1101, 293, 286, 445, 11, 411, + 11, 436, 645, 767, 3938, 14734, 538, 257, 2603, 2701, 11, 51824], "temperature": + 0.0, "avg_logprob": -0.2150675045546665, "compression_ratio": 1.615702479338843, + "no_speech_prob": 0.001631997642107308}, {"id": 472, "seek": 295080, "start": 2951.2000000000003, + "end": 2957.52, "text": " management system, uh, it starts with an, I forget the + name, whatever. Anyway, I use line-out and", "tokens": [50384, 4592, 1185, 11, 2232, + 11, 309, 3719, 365, 364, 11, 286, 2870, 264, 1315, 11, 2035, 13, 5684, 11, 286, + 764, 1622, 12, 346, 293, 50700], "temperature": 0.0, "avg_logprob": -0.22877515709918478, + "compression_ratio": 1.5646551724137931, "no_speech_prob": 0.0037144243251532316}, + {"id": 473, "seek": 295080, "start": 2957.52, "end": 2963.6800000000003, "text": + " it''s, uh, it''s, it''s cheap for CPUs. Like, it''s great, but you want to, you + want to run a GPU,", "tokens": [50700, 309, 311, 11, 2232, 11, 309, 311, 11, 309, + 311, 7084, 337, 13199, 82, 13, 1743, 11, 309, 311, 869, 11, 457, 291, 528, 281, + 11, 291, 528, 281, 1190, 257, 18407, 11, 51008], "temperature": 0.0, "avg_logprob": + -0.22877515709918478, "compression_ratio": 1.5646551724137931, "no_speech_prob": + 0.0037144243251532316}, {"id": 474, "seek": 295080, "start": 2963.6800000000003, + "end": 2970.5600000000004, "text": " it''s like $500 a month or $1,000 a month. 
+ And that''s a lot of money for one machine,", "tokens": [51008, 309, 311, 411, 1848, + 7526, 257, 1618, 420, 1848, 16, 11, 1360, 257, 1618, 13, 400, 300, 311, 257, 688, + 295, 1460, 337, 472, 3479, 11, 51352], "temperature": 0.0, "avg_logprob": -0.22877515709918478, + "compression_ratio": 1.5646551724137931, "no_speech_prob": 0.0037144243251532316}, + {"id": 475, "seek": 295080, "start": 2970.5600000000004, "end": 2976.96, "text": + " and most teams are not willing to spend that. If you want to do fractional, you + know,", "tokens": [51352, 293, 881, 5491, 366, 406, 4950, 281, 3496, 300, 13, 759, + 291, 528, 281, 360, 17948, 1966, 11, 291, 458, 11, 51672], "temperature": 0.0, "avg_logprob": + -0.22877515709918478, "compression_ratio": 1.5646551724137931, "no_speech_prob": + 0.0037144243251532316}, {"id": 476, "seek": 297696, "start": 2976.96, "end": 2984.32, + "text": " on AWS is probably for actionable GPUs, I think, but it''s still expensive. + And now you''re, now,", "tokens": [50364, 322, 17650, 307, 1391, 337, 45098, 18407, + 82, 11, 286, 519, 11, 457, 309, 311, 920, 5124, 13, 400, 586, 291, 434, 11, 586, + 11, 50732], "temperature": 0.0, "avg_logprob": -0.19726796735797011, "compression_ratio": + 1.5932203389830508, "no_speech_prob": 0.001627921941690147}, {"id": 477, "seek": + 297696, "start": 2985.6, "end": 2989.92, "text": " it''s like this cost that never + goes away. Like, once you do it, it''s like, well, it''s there,", "tokens": [50796, + 309, 311, 411, 341, 2063, 300, 1128, 1709, 1314, 13, 1743, 11, 1564, 291, 360, 309, + 11, 309, 311, 411, 11, 731, 11, 309, 311, 456, 11, 51012], "temperature": 0.0, "avg_logprob": + -0.19726796735797011, "compression_ratio": 1.5932203389830508, "no_speech_prob": + 0.001627921941690147}, {"id": 478, "seek": 297696, "start": 2989.92, "end": 2996.8, + "text": " it''s there for a long time, you know, CPUs are a commodity. 
Um, GPUs, + you have to fight with the,", "tokens": [51012, 309, 311, 456, 337, 257, 938, 565, + 11, 291, 458, 11, 13199, 82, 366, 257, 29125, 13, 3301, 11, 18407, 82, 11, 291, + 362, 281, 2092, 365, 264, 11, 51356], "temperature": 0.0, "avg_logprob": -0.19726796735797011, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.001627921941690147}, + {"id": 479, "seek": 297696, "start": 2996.8, "end": 3005.2, "text": " with the cryptocurrency + crowd for the costs, all this stuff. So, yes, CPUs the way to go.", "tokens": [51356, + 365, 264, 28809, 6919, 337, 264, 5497, 11, 439, 341, 1507, 13, 407, 11, 2086, 11, + 13199, 82, 264, 636, 281, 352, 13, 51776], "temperature": 0.0, "avg_logprob": -0.19726796735797011, + "compression_ratio": 1.5932203389830508, "no_speech_prob": 0.001627921941690147}, + {"id": 480, "seek": 300520, "start": 3005.4399999999996, "end": 3011.7599999999998, + "text": " I can imagine that GPUs can be used during the model training or fine + tuning, but during serving,", "tokens": [50376, 286, 393, 3811, 300, 18407, 82, + 393, 312, 1143, 1830, 264, 2316, 3097, 420, 2489, 15164, 11, 457, 1830, 8148, 11, + 50692], "temperature": 0.0, "avg_logprob": -0.1661984776876059, "compression_ratio": + 1.5, "no_speech_prob": 0.005041016731411219}, {"id": 481, "seek": 300520, "start": + 3011.7599999999998, "end": 3020.0, "text": " that sounds way too expensive. Right. + Yeah, yeah, that makes a lot of sense. 
And, um, and so,", "tokens": [50692, 300, + 3263, 636, 886, 5124, 13, 1779, 13, 865, 11, 1338, 11, 300, 1669, 257, 688, 295, + 2020, 13, 400, 11, 1105, 11, 293, 370, 11, 51104], "temperature": 0.0, "avg_logprob": + -0.1661984776876059, "compression_ratio": 1.5, "no_speech_prob": 0.005041016731411219}, + {"id": 482, "seek": 300520, "start": 3021.04, "end": 3028.08, "text": " now when + you offer my, how exactly you offer it, it''s, it''s a binary package, right, uh, + that I can", "tokens": [51156, 586, 562, 291, 2626, 452, 11, 577, 2293, 291, 2626, + 309, 11, 309, 311, 11, 309, 311, 257, 17434, 7372, 11, 558, 11, 2232, 11, 300, 286, + 393, 51508], "temperature": 0.0, "avg_logprob": -0.1661984776876059, "compression_ratio": + 1.5, "no_speech_prob": 0.005041016731411219}, {"id": 483, "seek": 302808, "start": + 3028.08, "end": 3035.44, "text": " install and, and, and basically run on my, on + my system, and I can decide whether it will be like,", "tokens": [50364, 3625, 293, + 11, 293, 11, 293, 1936, 1190, 322, 452, 11, 322, 452, 1185, 11, 293, 286, 393, 4536, + 1968, 309, 486, 312, 411, 11, 50732], "temperature": 0.0, "avg_logprob": -0.21533725788066913, + "compression_ratio": 1.5245901639344261, "no_speech_prob": 0.004794891923666}, {"id": + 484, "seek": 302808, "start": 3035.44, "end": 3042.0, "text": " a standalone kind + of script or it will be a pod in Kubernetes or Docker image and some other", "tokens": + [50732, 257, 37454, 733, 295, 5755, 420, 309, 486, 312, 257, 2497, 294, 23145, 420, + 33772, 3256, 293, 512, 661, 51060], "temperature": 0.0, "avg_logprob": -0.21533725788066913, + "compression_ratio": 1.5245901639344261, "no_speech_prob": 0.004794891923666}, {"id": + 485, "seek": 302808, "start": 3043.12, "end": 3051.52, "text": " non Kubernetes. + Um, so is that right? That''s right. 
It''s, it''s a very small executable.", "tokens": + [51116, 2107, 23145, 13, 3301, 11, 370, 307, 300, 558, 30, 663, 311, 558, 13, 467, + 311, 11, 309, 311, 257, 588, 1359, 7568, 712, 13, 51536], "temperature": 0.0, "avg_logprob": + -0.21533725788066913, "compression_ratio": 1.5245901639344261, "no_speech_prob": + 0.004794891923666}, {"id": 486, "seek": 305152, "start": 3051.52, "end": 3061.28, + "text": " Um, it''s, it''s so Linux is a first class citizen. Um, Windows is, it''ll + run on Windows. It''ll run on", "tokens": [50364, 3301, 11, 309, 311, 11, 309, 311, + 370, 18734, 307, 257, 700, 1508, 13326, 13, 3301, 11, 8591, 307, 11, 309, 603, 1190, + 322, 8591, 13, 467, 603, 1190, 322, 50852], "temperature": 0.0, "avg_logprob": -0.16736780014713254, + "compression_ratio": 1.5983935742971886, "no_speech_prob": 0.003780386410653591}, + {"id": 487, "seek": 305152, "start": 3061.28, "end": 3067.6, "text": " Mac, but + I''ve heard people running it on Mac M1, but they had to like do a lot of stuff + to like", "tokens": [50852, 5707, 11, 457, 286, 600, 2198, 561, 2614, 309, 322, + 5707, 376, 16, 11, 457, 436, 632, 281, 411, 360, 257, 688, 295, 1507, 281, 411, + 51168], "temperature": 0.0, "avg_logprob": -0.16736780014713254, "compression_ratio": + 1.5983935742971886, "no_speech_prob": 0.003780386410653591}, {"id": 488, "seek": + 305152, "start": 3067.6, "end": 3072.08, "text": " fix dependencies and it wasn''t + really working that well. And I think what, what''s it called Rosetta", "tokens": + [51168, 3191, 36606, 293, 309, 2067, 380, 534, 1364, 300, 731, 13, 400, 286, 519, + 437, 11, 437, 311, 309, 1219, 11144, 16593, 51392], "temperature": 0.0, "avg_logprob": + -0.16736780014713254, "compression_ratio": 1.5983935742971886, "no_speech_prob": + 0.003780386410653591}, {"id": 489, "seek": 305152, "start": 3072.08, "end": 3078.72, + "text": " or something? 
I think it''s still using that like to, to do the X86 like + bridge, like the translation,", "tokens": [51392, 420, 746, 30, 286, 519, 309, 311, + 920, 1228, 300, 411, 281, 11, 281, 360, 264, 1783, 22193, 411, 7283, 11, 411, 264, + 12853, 11, 51724], "temperature": 0.0, "avg_logprob": -0.16736780014713254, "compression_ratio": + 1.5983935742971886, "no_speech_prob": 0.003780386410653591}, {"id": 490, "seek": + 307872, "start": 3078.72, "end": 3084.7999999999997, "text": " visualization. Um, + so Mac M1, it''s not, uh, I wouldn''t consider it working. I''ve also seen some", + "tokens": [50364, 25801, 13, 3301, 11, 370, 5707, 376, 16, 11, 309, 311, 406, 11, + 2232, 11, 286, 2759, 380, 1949, 309, 1364, 13, 286, 600, 611, 1612, 512, 50668], + "temperature": 0.0, "avg_logprob": -0.15476605491916628, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.002198495902121067}, {"id": 491, "seek": 307872, "start": 3084.7999999999997, + "end": 3089.12, "text": " other problems on, on Mac that I''m trying to resolve. + It works fine, works on my machine, right,", "tokens": [50668, 661, 2740, 322, 11, + 322, 5707, 300, 286, 478, 1382, 281, 14151, 13, 467, 1985, 2489, 11, 1985, 322, + 452, 3479, 11, 558, 11, 50884], "temperature": 0.0, "avg_logprob": -0.15476605491916628, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.002198495902121067}, + {"id": 492, "seek": 307872, "start": 3089.12, "end": 3092.9599999999996, "text": + " that, that type of thing, but really it''s meant to be running links. Um, you + can run it in Docker.", "tokens": [50884, 300, 11, 300, 2010, 295, 551, 11, 457, + 534, 309, 311, 4140, 281, 312, 2614, 6123, 13, 3301, 11, 291, 393, 1190, 309, 294, + 33772, 13, 51076], "temperature": 0.0, "avg_logprob": -0.15476605491916628, "compression_ratio": + 1.696969696969697, "no_speech_prob": 0.002198495902121067}, {"id": 493, "seek": + 307872, "start": 3092.9599999999996, "end": 3097.68, "text": " It''s really easy + to get started in Docker. 
Uh, so you can download the executable and run it on your", + "tokens": [51076, 467, 311, 534, 1858, 281, 483, 1409, 294, 33772, 13, 4019, 11, + 370, 291, 393, 5484, 264, 7568, 712, 293, 1190, 309, 322, 428, 51312], "temperature": + 0.0, "avg_logprob": -0.15476605491916628, "compression_ratio": 1.696969696969697, + "no_speech_prob": 0.002198495902121067}, {"id": 494, "seek": 307872, "start": 3097.68, + "end": 3103.52, "text": " Mac, um, or you can just download the Docker and use that, + which is probably a little bit more straightforward.", "tokens": [51312, 5707, 11, + 1105, 11, 420, 291, 393, 445, 5484, 264, 33772, 293, 764, 300, 11, 597, 307, 1391, + 257, 707, 857, 544, 15325, 13, 51604], "temperature": 0.0, "avg_logprob": -0.15476605491916628, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.002198495902121067}, + {"id": 495, "seek": 310352, "start": 3104.48, "end": 3109.6, "text": " Um, then + you don''t have to worry about other dependencies. Uh, with Linux, I don''t, if + you''re running it", "tokens": [50412, 3301, 11, 550, 291, 500, 380, 362, 281, 3292, + 466, 661, 36606, 13, 4019, 11, 365, 18734, 11, 286, 500, 380, 11, 498, 291, 434, + 2614, 309, 50668], "temperature": 0.0, "avg_logprob": -0.1407222897987666, "compression_ratio": + 1.6971830985915493, "no_speech_prob": 0.0024130267556756735}, {"id": 496, "seek": + 310352, "start": 3110.08, "end": 3115.28, "text": " on, uh, on Linux machines, you + can use the Docker if you''re doing like Kubernetes and that stuff.", "tokens": + [50692, 322, 11, 2232, 11, 322, 18734, 8379, 11, 291, 393, 764, 264, 33772, 498, + 291, 434, 884, 411, 23145, 293, 300, 1507, 13, 50952], "temperature": 0.0, "avg_logprob": + -0.1407222897987666, "compression_ratio": 1.6971830985915493, "no_speech_prob": + 0.0024130267556756735}, {"id": 497, "seek": 310352, "start": 3115.28, "end": 3122.4, + "text": " Great. Run it in Docker. 
Um, just make sure that you sort out like in + your pod or whatever,", "tokens": [50952, 3769, 13, 8950, 309, 294, 33772, 13, 3301, + 11, 445, 652, 988, 300, 291, 1333, 484, 411, 294, 428, 2497, 420, 2035, 11, 51308], + "temperature": 0.0, "avg_logprob": -0.1407222897987666, "compression_ratio": 1.6971830985915493, + "no_speech_prob": 0.0024130267556756735}, {"id": 498, "seek": 310352, "start": 3122.4, + "end": 3127.68, "text": " like how much compute you''re actually giving it. Um, + because model inference doesn''t,", "tokens": [51308, 411, 577, 709, 14722, 291, + 434, 767, 2902, 309, 13, 3301, 11, 570, 2316, 38253, 1177, 380, 11, 51572], "temperature": + 0.0, "avg_logprob": -0.1407222897987666, "compression_ratio": 1.6971830985915493, + "no_speech_prob": 0.0024130267556756735}, {"id": 499, "seek": 310352, "start": 3127.68, + "end": 3131.92, "text": " it''s not just a mighty. It''s like all model inference + is really, really heavy. It''s really expensive.", "tokens": [51572, 309, 311, 406, + 445, 257, 21556, 13, 467, 311, 411, 439, 2316, 38253, 307, 534, 11, 534, 4676, 13, + 467, 311, 534, 5124, 13, 51784], "temperature": 0.0, "avg_logprob": -0.1407222897987666, + "compression_ratio": 1.6971830985915493, "no_speech_prob": 0.0024130267556756735}, + {"id": 500, "seek": 313192, "start": 3131.92, "end": 3137.2000000000003, "text": + " It wants a lot of, wants a lot of compute, not so much memory, but compute.", + "tokens": [50364, 467, 2738, 257, 688, 295, 11, 2738, 257, 688, 295, 14722, 11, + 406, 370, 709, 4675, 11, 457, 14722, 13, 50628], "temperature": 0.0, "avg_logprob": + -0.1825498100218734, "compression_ratio": 1.6345381526104417, "no_speech_prob": + 0.0004321828600950539}, {"id": 501, "seek": 313192, "start": 3137.76, "end": 3144.08, + "text": " So just be sure to give it enough, um, to satisfy your needs and do time. 
+ I haven''t done Kubernetes test myself.", "tokens": [50656, 407, 445, 312, 988, + 281, 976, 309, 1547, 11, 1105, 11, 281, 19319, 428, 2203, 293, 360, 565, 13, 286, + 2378, 380, 1096, 23145, 1500, 2059, 13, 50972], "temperature": 0.0, "avg_logprob": + -0.1825498100218734, "compression_ratio": 1.6345381526104417, "no_speech_prob": + 0.0004321828600950539}, {"id": 502, "seek": 313192, "start": 3145.2000000000003, + "end": 3153.76, "text": " Uh, but I like to run, I''m, I''m old school. Like this + whole Docker thing. Yeah. Okay. I''ll, uh, I''ll make a Docker file.", "tokens": + [51028, 4019, 11, 457, 286, 411, 281, 1190, 11, 286, 478, 11, 286, 478, 1331, 1395, + 13, 1743, 341, 1379, 33772, 551, 13, 865, 13, 1033, 13, 286, 603, 11, 2232, 11, + 286, 603, 652, 257, 33772, 3991, 13, 51456], "temperature": 0.0, "avg_logprob": + -0.1825498100218734, "compression_ratio": 1.6345381526104417, "no_speech_prob": + 0.0004321828600950539}, {"id": 503, "seek": 313192, "start": 3153.76, "end": 3158.96, + "text": " Sure. You can use it in Docker. Um, it''s on the Docker hub. Uh, but I + like to just install stuff.", "tokens": [51456, 4894, 13, 509, 393, 764, 309, 294, + 33772, 13, 3301, 11, 309, 311, 322, 264, 33772, 11838, 13, 4019, 11, 457, 286, 411, + 281, 445, 3625, 1507, 13, 51716], "temperature": 0.0, "avg_logprob": -0.1825498100218734, + "compression_ratio": 1.6345381526104417, "no_speech_prob": 0.0004321828600950539}, + {"id": 504, "seek": 315896, "start": 3159.04, "end": 3164.08, "text": " The old + fashioned way. 
Uh, in Ubuntu, I just, you know, download the download the thing.", + "tokens": [50368, 440, 1331, 40646, 636, 13, 4019, 11, 294, 30230, 45605, 11, 286, + 445, 11, 291, 458, 11, 5484, 264, 5484, 264, 551, 13, 50620], "temperature": 0.0, + "avg_logprob": -0.21900687073216293, "compression_ratio": 1.7983870967741935, "no_speech_prob": + 0.0019972093869000673}, {"id": 505, "seek": 315896, "start": 3164.08, "end": 3171.28, + "text": " It''s a tarball and you, you know, it''s at the tarball and you''re good + to go. Uh, and, uh, the way you", "tokens": [50620, 467, 311, 257, 3112, 3129, 293, + 291, 11, 291, 458, 11, 309, 311, 412, 264, 3112, 3129, 293, 291, 434, 665, 281, + 352, 13, 4019, 11, 293, 11, 2232, 11, 264, 636, 291, 50980], "temperature": 0.0, + "avg_logprob": -0.21900687073216293, "compression_ratio": 1.7983870967741935, "no_speech_prob": + 0.0019972093869000673}, {"id": 506, "seek": 315896, "start": 3171.28, "end": 3176.2400000000002, + "text": " start it is actually, it''s a, it''s a rest program with a, with a library + dependency, which is on", "tokens": [50980, 722, 309, 307, 767, 11, 309, 311, 257, + 11, 309, 311, 257, 1472, 1461, 365, 257, 11, 365, 257, 6405, 33621, 11, 597, 307, + 322, 51228], "temperature": 0.0, "avg_logprob": -0.21900687073216293, "compression_ratio": + 1.7983870967741935, "no_speech_prob": 0.0019972093869000673}, {"id": 507, "seek": + 315896, "start": 3176.2400000000002, "end": 3180.56, "text": " extra one time. Um, + because it''s dynamically linked. 
It''s not statically linked.", "tokens": [51228, + 2857, 472, 565, 13, 3301, 11, 570, 309, 311, 43492, 9408, 13, 467, 311, 406, 2219, + 984, 9408, 13, 51444], "temperature": 0.0, "avg_logprob": -0.21900687073216293, + "compression_ratio": 1.7983870967741935, "no_speech_prob": 0.0019972093869000673}, + {"id": 508, "seek": 315896, "start": 3181.76, "end": 3186.08, "text": " But, uh, + to start it, you can either start one core or you specify the model.", "tokens": + [51504, 583, 11, 2232, 11, 281, 722, 309, 11, 291, 393, 2139, 722, 472, 4965, 420, + 291, 16500, 264, 2316, 13, 51720], "temperature": 0.0, "avg_logprob": -0.21900687073216293, + "compression_ratio": 1.7983870967741935, "no_speech_prob": 0.0019972093869000673}, + {"id": 509, "seek": 318608, "start": 3186.72, "end": 3190.64, "text": " Or there''s + a thing that says it''s called mighty cluster. It''s just a back script, back script.", + "tokens": [50396, 1610, 456, 311, 257, 551, 300, 1619, 309, 311, 1219, 21556, 13630, + 13, 467, 311, 445, 257, 646, 5755, 11, 646, 5755, 13, 50592], "temperature": 0.0, + "avg_logprob": -0.2029423787043645, "compression_ratio": 1.6807017543859648, "no_speech_prob": + 0.007068520411849022}, {"id": 510, "seek": 318608, "start": 3190.64, "end": 3195.52, + "text": " And it''ll look and check how many cores you have on the machine and it''ll + start a process for", "tokens": [50592, 400, 309, 603, 574, 293, 1520, 577, 867, + 24826, 291, 362, 322, 264, 3479, 293, 309, 603, 722, 257, 1399, 337, 50836], "temperature": + 0.0, "avg_logprob": -0.2029423787043645, "compression_ratio": 1.6807017543859648, + "no_speech_prob": 0.007068520411849022}, {"id": 511, "seek": 318608, "start": 3195.52, + "end": 3202.16, "text": " every core that you have. So it does this work. 
Um, and + it takes like less than half a second for", "tokens": [50836, 633, 4965, 300, 291, + 362, 13, 407, 309, 775, 341, 589, 13, 3301, 11, 293, 309, 2516, 411, 1570, 813, + 1922, 257, 1150, 337, 51168], "temperature": 0.0, "avg_logprob": -0.2029423787043645, + "compression_ratio": 1.6807017543859648, "no_speech_prob": 0.007068520411849022}, + {"id": 512, "seek": 318608, "start": 3202.16, "end": 3208.16, "text": " each quarter + startup. It is, I, I actually put that in on purpose. That''s a limit I put in to + slow it", "tokens": [51168, 1184, 6555, 18578, 13, 467, 307, 11, 286, 11, 286, 767, + 829, 300, 294, 322, 4334, 13, 663, 311, 257, 4948, 286, 829, 294, 281, 2964, 309, + 51468], "temperature": 0.0, "avg_logprob": -0.2029423787043645, "compression_ratio": + 1.6807017543859648, "no_speech_prob": 0.007068520411849022}, {"id": 513, "seek": + 318608, "start": 3208.16, "end": 3212.4, "text": " down a little bit. So it didn''t + let go off the rails. Um, but you could probably take that", "tokens": [51468, 760, + 257, 707, 857, 13, 407, 309, 994, 380, 718, 352, 766, 264, 27649, 13, 3301, 11, + 457, 291, 727, 1391, 747, 300, 51680], "temperature": 0.0, "avg_logprob": -0.2029423787043645, + "compression_ratio": 1.6807017543859648, "no_speech_prob": 0.007068520411849022}, + {"id": 514, "seek": 321240, "start": 3212.4, "end": 3217.6, "text": " limit off. + You could just go and modify the bash script and, uh, and see how, see how quickly + it''ll", "tokens": [50364, 4948, 766, 13, 509, 727, 445, 352, 293, 16927, 264, 46183, + 5755, 293, 11, 2232, 11, 293, 536, 577, 11, 536, 577, 2661, 309, 603, 50624], "temperature": + 0.0, "avg_logprob": -0.1596168268506772, "compression_ratio": 1.6266094420600858, + "no_speech_prob": 0.0004996072966605425}, {"id": 515, "seek": 321240, "start": 3217.6, + "end": 3224.4, "text": " start up. 
So I, so that blog post that you mentioned before, + um, like I rented on 128 cores.", "tokens": [50624, 722, 493, 13, 407, 286, 11, + 370, 300, 6968, 2183, 300, 291, 2835, 949, 11, 1105, 11, 411, 286, 32381, 322, 29810, + 24826, 13, 50964], "temperature": 0.0, "avg_logprob": -0.1596168268506772, "compression_ratio": + 1.6266094420600858, "no_speech_prob": 0.0004996072966605425}, {"id": 516, "seek": + 321240, "start": 3224.4, "end": 3228.7200000000003, "text": " So it would take like, + I actually took the rails off and let it start up really quickly. Um,", "tokens": + [50964, 407, 309, 576, 747, 411, 11, 286, 767, 1890, 264, 27649, 766, 293, 718, + 309, 722, 493, 534, 2661, 13, 3301, 11, 51180], "temperature": 0.0, "avg_logprob": + -0.1596168268506772, "compression_ratio": 1.6266094420600858, "no_speech_prob": + 0.0004996072966605425}, {"id": 517, "seek": 321240, "start": 3229.6, "end": 3237.44, + "text": " but it can take, it can take a moment to start it up on every single core. + Uh, and, yeah, you", "tokens": [51224, 457, 309, 393, 747, 11, 309, 393, 747, 257, + 1623, 281, 722, 309, 493, 322, 633, 2167, 4965, 13, 4019, 11, 293, 11, 1338, 11, + 291, 51616], "temperature": 0.0, "avg_logprob": -0.1596168268506772, "compression_ratio": + 1.6266094420600858, "no_speech_prob": 0.0004996072966605425}, {"id": 518, "seek": + 323744, "start": 3237.52, "end": 3241.84, "text": " you could do it in Docker. You + could do it bare metal. Um, if there''s any people out there using", "tokens": [50368, + 291, 727, 360, 309, 294, 33772, 13, 509, 727, 360, 309, 6949, 5760, 13, 3301, 11, + 498, 456, 311, 604, 561, 484, 456, 1228, 50584], "temperature": 0.0, "avg_logprob": + -0.16772691286527194, "compression_ratio": 1.6982456140350877, "no_speech_prob": + 0.011370191350579262}, {"id": 519, "seek": 323744, "start": 3241.84, "end": 3246.7200000000003, + "text": " windows, I''d love to hear from you. 
Um, I''ve heard some feedback from + Mac and Linux, but I haven''t", "tokens": [50584, 9309, 11, 286, 1116, 959, 281, + 1568, 490, 291, 13, 3301, 11, 286, 600, 2198, 512, 5824, 490, 5707, 293, 18734, + 11, 457, 286, 2378, 380, 50828], "temperature": 0.0, "avg_logprob": -0.16772691286527194, + "compression_ratio": 1.6982456140350877, "no_speech_prob": 0.011370191350579262}, + {"id": 520, "seek": 323744, "start": 3246.7200000000003, "end": 3251.36, "text": + " gotten any windows feedback. So I don''t even know if it''s worth building it + for windows these days.", "tokens": [50828, 5768, 604, 9309, 5824, 13, 407, 286, + 500, 380, 754, 458, 498, 309, 311, 3163, 2390, 309, 337, 9309, 613, 1708, 13, 51060], + "temperature": 0.0, "avg_logprob": -0.16772691286527194, "compression_ratio": 1.6982456140350877, + "no_speech_prob": 0.011370191350579262}, {"id": 521, "seek": 323744, "start": 3251.36, + "end": 3258.2400000000002, "text": " Maybe not. Yeah, I think it depends. If I don''t + know what should be this scenario is like you", "tokens": [51060, 2704, 406, 13, + 865, 11, 286, 519, 309, 5946, 13, 759, 286, 500, 380, 458, 437, 820, 312, 341, 9005, + 307, 411, 291, 51404], "temperature": 0.0, "avg_logprob": -0.16772691286527194, + "compression_ratio": 1.6982456140350877, "no_speech_prob": 0.011370191350579262}, + {"id": 522, "seek": 323744, "start": 3258.2400000000002, "end": 3264.32, "text": + " are a developer on windows. And for some reason, you don''t go on your server + side to like you,", "tokens": [51404, 366, 257, 10754, 322, 9309, 13, 400, 337, + 512, 1778, 11, 291, 500, 380, 352, 322, 428, 7154, 1252, 281, 411, 291, 11, 51708], + "temperature": 0.0, "avg_logprob": -0.16772691286527194, "compression_ratio": 1.6982456140350877, + "no_speech_prob": 0.011370191350579262}, {"id": 523, "seek": 326432, "start": 3264.4, + "end": 3268.7200000000003, "text": " we still want to develop everything locally, + right? 
So you want to bring up, I saw such guys actually", "tokens": [50368, 321, + 920, 528, 281, 1499, 1203, 16143, 11, 558, 30, 407, 291, 528, 281, 1565, 493, 11, + 286, 1866, 1270, 1074, 767, 50584], "temperature": 0.0, "avg_logprob": -0.1669043712928647, + "compression_ratio": 1.7035714285714285, "no_speech_prob": 0.006542309653013945}, + {"id": 524, "seek": 326432, "start": 3268.7200000000003, "end": 3275.1200000000003, + "text": " in my team, they wanted to bring every single server service on their + laptop. Yeah. And that''s", "tokens": [50584, 294, 452, 1469, 11, 436, 1415, 281, + 1565, 633, 2167, 7154, 2643, 322, 641, 10732, 13, 865, 13, 400, 300, 311, 50904], + "temperature": 0.0, "avg_logprob": -0.1669043712928647, "compression_ratio": 1.7035714285714285, + "no_speech_prob": 0.006542309653013945}, {"id": 525, "seek": 326432, "start": 3275.1200000000003, + "end": 3279.92, "text": " how they developed. They didn''t want to depend on any + external connection. Right. Even,", "tokens": [50904, 577, 436, 4743, 13, 814, 994, + 380, 528, 281, 5672, 322, 604, 8320, 4984, 13, 1779, 13, 2754, 11, 51144], "temperature": + 0.0, "avg_logprob": -0.1669043712928647, "compression_ratio": 1.7035714285714285, + "no_speech_prob": 0.006542309653013945}, {"id": 526, "seek": 326432, "start": 3279.92, + "end": 3285.76, "text": " even Docker is like a pain in windows these days sometimes, + right? So I know that I know the", "tokens": [51144, 754, 33772, 307, 411, 257, + 1822, 294, 9309, 613, 1708, 2171, 11, 558, 30, 407, 286, 458, 300, 286, 458, 264, + 51436], "temperature": 0.0, "avg_logprob": -0.1669043712928647, "compression_ratio": + 1.7035714285714285, "no_speech_prob": 0.006542309653013945}, {"id": 527, "seek": + 326432, "start": 3285.76, "end": 3291.36, "text": " windows ecosystem, because I + used to, I used to be in in the 2000s. 
That''s the, that''s the mindset.", "tokens": + [51436, 9309, 11311, 11, 570, 286, 1143, 281, 11, 286, 1143, 281, 312, 294, 294, + 264, 8132, 82, 13, 663, 311, 264, 11, 300, 311, 264, 12543, 13, 51716], "temperature": + 0.0, "avg_logprob": -0.1669043712928647, "compression_ratio": 1.7035714285714285, + "no_speech_prob": 0.006542309653013945}, {"id": 528, "seek": 329136, "start": 3291.36, + "end": 3294.6400000000003, "text": " It''s like I''m just going to run everything + natively in windows. Yeah.", "tokens": [50364, 467, 311, 411, 286, 478, 445, 516, + 281, 1190, 1203, 8470, 356, 294, 9309, 13, 865, 13, 50528], "temperature": 0.0, + "avg_logprob": -0.17491290148566752, "compression_ratio": 1.5728155339805825, "no_speech_prob": + 0.011930027045309544}, {"id": 529, "seek": 329136, "start": 3296.6400000000003, + "end": 3304.2400000000002, "text": " And like when I tried mighty on on Mac, I think + it took like some seconds to boot,", "tokens": [50628, 400, 411, 562, 286, 3031, + 21556, 322, 322, 5707, 11, 286, 519, 309, 1890, 411, 512, 3949, 281, 11450, 11, + 51008], "temperature": 0.0, "avg_logprob": -0.17491290148566752, "compression_ratio": + 1.5728155339805825, "no_speech_prob": 0.011930027045309544}, {"id": 530, "seek": + 329136, "start": 3304.2400000000002, "end": 3309.04, "text": " but the moment it + booted, I was like shooting some queries and like to compute the vectors and", "tokens": + [51008, 457, 264, 1623, 309, 11450, 292, 11, 286, 390, 411, 5942, 512, 24109, 293, + 411, 281, 14722, 264, 18875, 293, 51248], "temperature": 0.0, "avg_logprob": -0.17491290148566752, + "compression_ratio": 1.5728155339805825, "no_speech_prob": 0.011930027045309544}, + {"id": 531, "seek": 329136, "start": 3309.04, "end": 3315.1200000000003, "text": + " it was insanely fast. 
Is it only on a secret source in this insane fastness?", + "tokens": [51248, 309, 390, 40965, 2370, 13, 1119, 309, 787, 322, 257, 4054, 4009, + 294, 341, 10838, 2370, 1287, 30, 51552], "temperature": 0.0, "avg_logprob": -0.17491290148566752, + "compression_ratio": 1.5728155339805825, "no_speech_prob": 0.011930027045309544}, + {"id": 532, "seek": 331512, "start": 3316.08, "end": 3322.4, "text": " I mean, if + you''re used to running models and Python, it''ll seem insanely fast. A lot of", + "tokens": [50412, 286, 914, 11, 498, 291, 434, 1143, 281, 2614, 5245, 293, 15329, + 11, 309, 603, 1643, 40965, 2370, 13, 316, 688, 295, 50728], "temperature": 0.0, + "avg_logprob": -0.24038882785373264, "compression_ratio": 1.569377990430622, "no_speech_prob": + 0.007889417931437492}, {"id": 533, "seek": 331512, "start": 3322.4, "end": 3328.4, + "text": " it is on. That''s they get most of the credit there. Yes. But there''s + a lot of other stuff that", "tokens": [50728, 309, 307, 322, 13, 663, 311, 436, + 483, 881, 295, 264, 5397, 456, 13, 1079, 13, 583, 456, 311, 257, 688, 295, 661, + 1507, 300, 51028], "temperature": 0.0, "avg_logprob": -0.24038882785373264, "compression_ratio": + 1.569377990430622, "no_speech_prob": 0.007889417931437492}, {"id": 534, "seek": + 331512, "start": 3328.4, "end": 3334.0, "text": " goes into it, which is like the + tokenization and the pre processing and the post processing is", "tokens": [51028, + 1709, 666, 309, 11, 597, 307, 411, 264, 14862, 2144, 293, 264, 659, 9007, 293, 264, + 2183, 9007, 307, 51308], "temperature": 0.0, "avg_logprob": -0.24038882785373264, + "compression_ratio": 1.569377990430622, "no_speech_prob": 0.007889417931437492}, + {"id": 535, "seek": 331512, "start": 3334.0, "end": 3338.7999999999997, "text": + " just, it''s fast is I''ve been using Rust for it and", "tokens": [51308, 445, + 11, 309, 311, 2370, 307, 286, 600, 668, 1228, 34952, 337, 309, 293, 51548], "temperature": + 0.0, "avg_logprob": -0.24038882785373264, 
"compression_ratio": 1.569377990430622, + "no_speech_prob": 0.007889417931437492}, {"id": 536, "seek": 333880, "start": 3339.36, + "end": 3345.52, "text": " Rust is a, Rust is a really interesting language. It''s + it''s gotten me back into systems programming.", "tokens": [50392, 34952, 307, 257, + 11, 34952, 307, 257, 534, 1880, 2856, 13, 467, 311, 309, 311, 5768, 385, 646, 666, + 3652, 9410, 13, 50700], "temperature": 0.0, "avg_logprob": -0.17074422367283557, + "compression_ratio": 1.7463235294117647, "no_speech_prob": 0.004121183417737484}, + {"id": 537, "seek": 333880, "start": 3346.5600000000004, "end": 3350.4, "text": + " I''m not here to say that like Rust is like the most amazing thing ever. There + are things I love", "tokens": [50752, 286, 478, 406, 510, 281, 584, 300, 411, 34952, + 307, 411, 264, 881, 2243, 551, 1562, 13, 821, 366, 721, 286, 959, 50944], "temperature": + 0.0, "avg_logprob": -0.17074422367283557, "compression_ratio": 1.7463235294117647, + "no_speech_prob": 0.004121183417737484}, {"id": 538, "seek": 333880, "start": 3350.4, + "end": 3355.04, "text": " about it, the things that are like, I don''t know if I + would do that way, but you''re supposed to", "tokens": [50944, 466, 309, 11, 264, + 721, 300, 366, 411, 11, 286, 500, 380, 458, 498, 286, 576, 360, 300, 636, 11, 457, + 291, 434, 3442, 281, 51176], "temperature": 0.0, "avg_logprob": -0.17074422367283557, + "compression_ratio": 1.7463235294117647, "no_speech_prob": 0.004121183417737484}, + {"id": 539, "seek": 333880, "start": 3355.04, "end": 3360.48, "text": " do things + a certain way because the compiler understands that it''ll super optimize it for + you.", "tokens": [51176, 360, 721, 257, 1629, 636, 570, 264, 31958, 15146, 300, + 309, 603, 1687, 19719, 309, 337, 291, 13, 51448], "temperature": 0.0, "avg_logprob": + -0.17074422367283557, "compression_ratio": 1.7463235294117647, "no_speech_prob": + 0.004121183417737484}, {"id": 540, "seek": 333880, "start": 3361.04, "end": 3364.8, + 
"text": " It''s hard to, it''s hard to wrap your brain around it if you''re if you''re + from a dynamic", "tokens": [51476, 467, 311, 1152, 281, 11, 309, 311, 1152, 281, + 7019, 428, 3567, 926, 309, 498, 291, 434, 498, 291, 434, 490, 257, 8546, 51664], + "temperature": 0.0, "avg_logprob": -0.17074422367283557, "compression_ratio": 1.7463235294117647, + "no_speech_prob": 0.004121183417737484}, {"id": 541, "seek": 336480, "start": 3364.8, + "end": 3370.96, "text": " really typed language like Python or JavaScript. It''s + hard to get a handle on node. My my compiled", "tokens": [50364, 534, 33941, 2856, + 411, 15329, 420, 15778, 13, 467, 311, 1152, 281, 483, 257, 4813, 322, 9984, 13, + 1222, 452, 36548, 50672], "temperature": 0.0, "avg_logprob": -0.19009941668549846, + "compression_ratio": 1.7634408602150538, "no_speech_prob": 0.010129575617611408}, + {"id": 542, "seek": 336480, "start": 3370.96, "end": 3377.1200000000003, "text": + " background like, you know, typed, typed programming languages compile ahead of + time. I was used", "tokens": [50672, 3678, 411, 11, 291, 458, 11, 33941, 11, 33941, + 9410, 8650, 31413, 2286, 295, 565, 13, 286, 390, 1143, 50980], "temperature": 0.0, + "avg_logprob": -0.19009941668549846, "compression_ratio": 1.7634408602150538, "no_speech_prob": + 0.010129575617611408}, {"id": 543, "seek": 336480, "start": 3377.1200000000003, + "end": 3382.2400000000002, "text": " to that from my previous life. So I was able + to pick it up again and I read the I read I just read", "tokens": [50980, 281, 300, + 490, 452, 3894, 993, 13, 407, 286, 390, 1075, 281, 1888, 309, 493, 797, 293, 286, + 1401, 264, 286, 1401, 286, 445, 1401, 51236], "temperature": 0.0, "avg_logprob": + -0.19009941668549846, "compression_ratio": 1.7634408602150538, "no_speech_prob": + 0.010129575617611408}, {"id": 544, "seek": 336480, "start": 3382.2400000000002, + "end": 3387.04, "text": " the rest book. There''s a free book out there. 
I actually + bought the I bought a paperback because", "tokens": [51236, 264, 1472, 1446, 13, + 821, 311, 257, 1737, 1446, 484, 456, 13, 286, 767, 4243, 264, 286, 4243, 257, 3035, + 3207, 570, 51476], "temperature": 0.0, "avg_logprob": -0.19009941668549846, "compression_ratio": + 1.7634408602150538, "no_speech_prob": 0.010129575617611408}, {"id": 545, "seek": + 336480, "start": 3387.04, "end": 3394.1600000000003, "text": " I like paperbacks + and I like hard covers like actual paper these days. So I was reading it like that.", + "tokens": [51476, 286, 411, 3035, 17758, 293, 286, 411, 1152, 10538, 411, 3539, + 3035, 613, 1708, 13, 407, 286, 390, 3760, 309, 411, 300, 13, 51832], "temperature": + 0.0, "avg_logprob": -0.19009941668549846, "compression_ratio": 1.7634408602150538, + "no_speech_prob": 0.010129575617611408}, {"id": 546, "seek": 339480, "start": 3395.6000000000004, + "end": 3399.28, "text": " And just going through the examples took me a couple weeks + to get a handle on Rust.", "tokens": [50404, 400, 445, 516, 807, 264, 5110, 1890, + 385, 257, 1916, 3259, 281, 483, 257, 4813, 322, 34952, 13, 50588], "temperature": + 0.0, "avg_logprob": -0.13685197096604568, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.001487237517721951}, {"id": 547, "seek": 339480, "start": 3401.2000000000003, + "end": 3409.52, "text": " That gets a lot of the credit as well, the Rust language. + It''s just it optimizes and you know,", "tokens": [50684, 663, 2170, 257, 688, 295, + 264, 5397, 382, 731, 11, 264, 34952, 2856, 13, 467, 311, 445, 309, 5028, 5660, 293, + 291, 458, 11, 51100], "temperature": 0.0, "avg_logprob": -0.13685197096604568, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.001487237517721951}, {"id": 548, "seek": + 339480, "start": 3409.52, "end": 3417.44, "text": " you have to learn this field + that I''m in now with model inference. 
It''s like the super niche field of", "tokens": + [51100, 291, 362, 281, 1466, 341, 2519, 300, 286, 478, 294, 586, 365, 2316, 38253, + 13, 467, 311, 411, 264, 1687, 19956, 2519, 295, 51496], "temperature": 0.0, "avg_logprob": + -0.13685197096604568, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.001487237517721951}, {"id": 549, "seek": 339480, "start": 3417.44, "end": 3422.32, + "text": " like you have to understand the hardware and you have to understand the + machine learning.", "tokens": [51496, 411, 291, 362, 281, 1223, 264, 8837, 293, + 291, 362, 281, 1223, 264, 3479, 2539, 13, 51740], "temperature": 0.0, "avg_logprob": + -0.13685197096604568, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.001487237517721951}, {"id": 550, "seek": 342232, "start": 3423.04, "end": 3428.7200000000003, + "text": " And there are those two fields are like so different. There are very few + people out there that are", "tokens": [50400, 400, 456, 366, 729, 732, 7909, 366, + 411, 370, 819, 13, 821, 366, 588, 1326, 561, 484, 456, 300, 366, 50684], "temperature": + 0.0, "avg_logprob": -0.16213827495333516, "compression_ratio": 1.5125628140703518, + "no_speech_prob": 0.001896693604066968}, {"id": 551, "seek": 342232, "start": 3429.52, + "end": 3441.04, "text": " really good in both. So I know that there''s a word vectorization. 
+ So vectorized on the CPU is like,", "tokens": [50724, 534, 665, 294, 1293, 13, 407, + 286, 458, 300, 456, 311, 257, 1349, 8062, 2144, 13, 407, 8062, 1602, 322, 264, 13199, + 307, 411, 11, 51300], "temperature": 0.0, "avg_logprob": -0.16213827495333516, "compression_ratio": + 1.5125628140703518, "no_speech_prob": 0.001896693604066968}, {"id": 552, "seek": + 342232, "start": 3441.04, "end": 3448.4, "text": " well, if I have to do a calculation + with eight with a byte and you know, I have a register at 64 bits.", "tokens": [51300, + 731, 11, 498, 286, 362, 281, 360, 257, 17108, 365, 3180, 365, 257, 40846, 293, 291, + 458, 11, 286, 362, 257, 7280, 412, 12145, 9239, 13, 51668], "temperature": 0.0, + "avg_logprob": -0.16213827495333516, "compression_ratio": 1.5125628140703518, "no_speech_prob": + 0.001896693604066968}, {"id": 553, "seek": 344840, "start": 3449.36, "end": 3456.32, + "text": " But I have an eight bit byte like, well, I can vectorize and I can do + eight calculations because it''s", "tokens": [50412, 583, 286, 362, 364, 3180, 857, + 40846, 411, 11, 731, 11, 286, 393, 8062, 1125, 293, 286, 393, 360, 3180, 20448, + 570, 309, 311, 50760], "temperature": 0.0, "avg_logprob": -0.2842003984271355, "compression_ratio": + 1.5905511811023623, "no_speech_prob": 0.0031938592437654734}, {"id": 554, "seek": + 344840, "start": 3456.32, "end": 3464.4, "text": " with SIMD, same instruction multiple + data. So that so Rust, if you turn on certain compile flags,", "tokens": [50760, + 365, 24738, 35, 11, 912, 10951, 3866, 1412, 13, 407, 300, 370, 34952, 11, 498, 291, + 1261, 322, 1629, 31413, 23265, 11, 51164], "temperature": 0.0, "avg_logprob": -0.2842003984271355, + "compression_ratio": 1.5905511811023623, "no_speech_prob": 0.0031938592437654734}, + {"id": 555, "seek": 344840, "start": 3464.4, "end": 3469.6, "text": " it''ll do + that for you automatically. So you get that speed up. 
So I turn those knobs all + the way up.", "tokens": [51164, 309, 603, 360, 300, 337, 291, 6772, 13, 407, 291, + 483, 300, 3073, 493, 13, 407, 286, 1261, 729, 46999, 439, 264, 636, 493, 13, 51424], + "temperature": 0.0, "avg_logprob": -0.2842003984271355, "compression_ratio": 1.5905511811023623, + "no_speech_prob": 0.0031938592437654734}, {"id": 556, "seek": 344840, "start": 3469.6, + "end": 3476.88, "text": " I said use all the use AVX1 and 2 if the process supports + it and most processes do these days if you''re", "tokens": [51424, 286, 848, 764, + 439, 264, 764, 30198, 55, 16, 293, 568, 498, 264, 1399, 9346, 309, 293, 881, 7555, + 360, 613, 1708, 498, 291, 434, 51788], "temperature": 0.0, "avg_logprob": -0.2842003984271355, + "compression_ratio": 1.5905511811023623, "no_speech_prob": 0.0031938592437654734}, + {"id": 557, "seek": 347688, "start": 3476.88, "end": 3483.76, "text": " on X86. + ARM has a different set. I haven''t gotten into the ARM world that I have to get + an M1 Mac and", "tokens": [50364, 322, 1783, 22193, 13, 45209, 575, 257, 819, 992, + 13, 286, 2378, 380, 5768, 666, 264, 45209, 1002, 300, 286, 362, 281, 483, 364, 376, + 16, 5707, 293, 50708], "temperature": 0.0, "avg_logprob": -0.10746487704190341, + "compression_ratio": 1.6694214876033058, "no_speech_prob": 0.001158062950707972}, + {"id": 558, "seek": 347688, "start": 3483.76, "end": 3490.08, "text": " I''m going + to start messing around with all that. But if you know that stuff and you know how + to turn it", "tokens": [50708, 286, 478, 516, 281, 722, 23258, 926, 365, 439, 300, + 13, 583, 498, 291, 458, 300, 1507, 293, 291, 458, 577, 281, 1261, 309, 51024], "temperature": + 0.0, "avg_logprob": -0.10746487704190341, "compression_ratio": 1.6694214876033058, + "no_speech_prob": 0.001158062950707972}, {"id": 559, "seek": 347688, "start": 3490.08, + "end": 3496.96, "text": " on, Rust does the rest for you. 
You kind of have to write + your code a certain way so that, you know,", "tokens": [51024, 322, 11, 34952, 775, + 264, 1472, 337, 291, 13, 509, 733, 295, 362, 281, 2464, 428, 3089, 257, 1629, 636, + 370, 300, 11, 291, 458, 11, 51368], "temperature": 0.0, "avg_logprob": -0.10746487704190341, + "compression_ratio": 1.6694214876033058, "no_speech_prob": 0.001158062950707972}, + {"id": 560, "seek": 347688, "start": 3496.96, "end": 3501.12, "text": " Rust will + do the optimization a certain way. You can''t think like old school. You have to + kind of", "tokens": [51368, 34952, 486, 360, 264, 19618, 257, 1629, 636, 13, 509, + 393, 380, 519, 411, 1331, 1395, 13, 509, 362, 281, 733, 295, 51576], "temperature": + 0.0, "avg_logprob": -0.10746487704190341, "compression_ratio": 1.6694214876033058, + "no_speech_prob": 0.001158062950707972}, {"id": 561, "seek": 350112, "start": 3501.12, + "end": 3507.8399999999997, "text": " think in Rust world a little bit. But doing + that, now you get all this extra, all this extra speed", "tokens": [50364, 519, + 294, 34952, 1002, 257, 707, 857, 13, 583, 884, 300, 11, 586, 291, 483, 439, 341, + 2857, 11, 439, 341, 2857, 3073, 50700], "temperature": 0.0, "avg_logprob": -0.16897632678349814, + "compression_ratio": 1.5958333333333334, "no_speech_prob": 0.006385597866028547}, + {"id": 562, "seek": 350112, "start": 3508.64, "end": 3512.64, "text": " from pretty + much nothing just from writing your code a certain way turning on a couple of compiled", + "tokens": [50740, 490, 1238, 709, 1825, 445, 490, 3579, 428, 3089, 257, 1629, 636, + 6246, 322, 257, 1916, 295, 36548, 50940], "temperature": 0.0, "avg_logprob": -0.16897632678349814, + "compression_ratio": 1.5958333333333334, "no_speech_prob": 0.006385597866028547}, + {"id": 563, "seek": 350112, "start": 3512.64, "end": 3518.96, "text": " flags. That''s + why it was so fast. 
Yeah, but you still needed to figure all of these out and I", + "tokens": [50940, 23265, 13, 663, 311, 983, 309, 390, 370, 2370, 13, 865, 11, 457, + 291, 920, 2978, 281, 2573, 439, 295, 613, 484, 293, 286, 51256], "temperature": + 0.0, "avg_logprob": -0.16897632678349814, "compression_ratio": 1.5958333333333334, + "no_speech_prob": 0.006385597866028547}, {"id": 564, "seek": 350112, "start": 3518.96, + "end": 3525.6, "text": " remember you were you were saying that you had a bunch + of weeks, you know, coding on stuff.", "tokens": [51256, 1604, 291, 645, 291, 645, + 1566, 300, 291, 632, 257, 3840, 295, 3259, 11, 291, 458, 11, 17720, 322, 1507, 13, + 51588], "temperature": 0.0, "avg_logprob": -0.16897632678349814, "compression_ratio": + 1.5958333333333334, "no_speech_prob": 0.006385597866028547}, {"id": 565, "seek": + 352560, "start": 3526.56, "end": 3532.24, "text": " Yeah, you get things done because + I know and many of us probably know here in the audience that", "tokens": [50412, + 865, 11, 291, 483, 721, 1096, 570, 286, 458, 293, 867, 295, 505, 1391, 458, 510, + 294, 264, 4034, 300, 50696], "temperature": 0.0, "avg_logprob": -0.1683180191937615, + "compression_ratio": 1.583673469387755, "no_speech_prob": 0.053860876709222794}, + {"id": 566, "seek": 352560, "start": 3532.24, "end": 3538.7999999999997, "text": + " if you are a programmer, you might say, yeah, I can do it. But you cannot actually + estimate when", "tokens": [50696, 498, 291, 366, 257, 32116, 11, 291, 1062, 584, + 11, 1338, 11, 286, 393, 360, 309, 13, 583, 291, 2644, 767, 12539, 562, 51024], "temperature": + 0.0, "avg_logprob": -0.1683180191937615, "compression_ratio": 1.583673469387755, + "no_speech_prob": 0.053860876709222794}, {"id": 567, "seek": 352560, "start": 3538.7999999999997, + "end": 3544.96, "text": " you will be done. 
So you get into the weeds and like, + oh my god, it just like UTF or something else", "tokens": [51024, 291, 486, 312, + 1096, 13, 407, 291, 483, 666, 264, 26370, 293, 411, 11, 1954, 452, 3044, 11, 309, + 445, 411, 624, 20527, 420, 746, 1646, 51332], "temperature": 0.0, "avg_logprob": + -0.1683180191937615, "compression_ratio": 1.583673469387755, "no_speech_prob": 0.053860876709222794}, + {"id": 568, "seek": 352560, "start": 3544.96, "end": 3549.7599999999998, "text": + " doesn''t work or like, I''m sending a requested fails, whatever, what''s going + on and you spend so", "tokens": [51332, 1177, 380, 589, 420, 411, 11, 286, 478, + 7750, 257, 16436, 18199, 11, 2035, 11, 437, 311, 516, 322, 293, 291, 3496, 370, + 51572], "temperature": 0.0, "avg_logprob": -0.1683180191937615, "compression_ratio": + 1.583673469387755, "no_speech_prob": 0.053860876709222794}, {"id": 569, "seek": + 354976, "start": 3549.76, "end": 3555.1200000000003, "text": " much time or if you''re + doing an algorithm, that''s another story. That''s like an internet journey", "tokens": + [50364, 709, 565, 420, 498, 291, 434, 884, 364, 9284, 11, 300, 311, 1071, 1657, + 13, 663, 311, 411, 364, 4705, 4671, 50632], "temperature": 0.0, "avg_logprob": -0.16823814465449408, + "compression_ratio": 1.6623931623931625, "no_speech_prob": 0.0027399607934057713}, + {"id": 570, "seek": 354976, "start": 3555.1200000000003, "end": 3562.8, "text": + " there, like debugging all these states. 
And I mean, I''m just trying to say that + even though you", "tokens": [50632, 456, 11, 411, 45592, 439, 613, 4368, 13, 400, + 286, 914, 11, 286, 478, 445, 1382, 281, 584, 300, 754, 1673, 291, 51016], "temperature": + 0.0, "avg_logprob": -0.16823814465449408, "compression_ratio": 1.6623931623931625, + "no_speech_prob": 0.0027399607934057713}, {"id": 571, "seek": 354976, "start": 3562.8, + "end": 3570.96, "text": " you make it sound so easy to to master Rust and you know, + to go through all these maze and make", "tokens": [51016, 291, 652, 309, 1626, 370, + 1858, 281, 281, 4505, 34952, 293, 291, 458, 11, 281, 352, 807, 439, 613, 33032, + 293, 652, 51424], "temperature": 0.0, "avg_logprob": -0.16823814465449408, "compression_ratio": + 1.6623931623931625, "no_speech_prob": 0.0027399607934057713}, {"id": 572, "seek": + 354976, "start": 3570.96, "end": 3578.2400000000002, "text": " it the way compiler + wants it, it''s still time. It''s a lot of time. It''s skill. And so you master + it.", "tokens": [51424, 309, 264, 636, 31958, 2738, 309, 11, 309, 311, 920, 565, + 13, 467, 311, 257, 688, 295, 565, 13, 467, 311, 5389, 13, 400, 370, 291, 4505, 309, + 13, 51788], "temperature": 0.0, "avg_logprob": -0.16823814465449408, "compression_ratio": + 1.6623931623931625, "no_speech_prob": 0.0027399607934057713}, {"id": 573, "seek": + 357824, "start": 3578.24, "end": 3584.7999999999997, "text": " And that''s why and + in the end, you know, the end result was not given. You earned it, right?", "tokens": + [50364, 400, 300, 311, 983, 293, 294, 264, 917, 11, 291, 458, 11, 264, 917, 1874, + 390, 406, 2212, 13, 509, 12283, 309, 11, 558, 30, 50692], "temperature": 0.0, "avg_logprob": + -0.17792512785713627, "compression_ratio": 1.706896551724138, "no_speech_prob": + 0.0017735135043039918}, {"id": 574, "seek": 357824, "start": 3585.4399999999996, + "end": 3592.08, "text": " So why not turn this into the business? 
So now on the + business side, I''m thinking, how do you offer", "tokens": [50724, 407, 983, 406, + 1261, 341, 666, 264, 1606, 30, 407, 586, 322, 264, 1606, 1252, 11, 286, 478, 1953, + 11, 577, 360, 291, 2626, 51056], "temperature": 0.0, "avg_logprob": -0.17792512785713627, + "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0017735135043039918}, + {"id": 575, "seek": 357824, "start": 3592.08, "end": 3599.3599999999997, "text": + " Rust? Like, so how do you offer excuse me, mighty? So you have you have the the + binary, you have the", "tokens": [51056, 34952, 30, 1743, 11, 370, 577, 360, 291, + 2626, 8960, 385, 11, 21556, 30, 407, 291, 362, 291, 362, 264, 264, 17434, 11, 291, + 362, 264, 51420], "temperature": 0.0, "avg_logprob": -0.17792512785713627, "compression_ratio": + 1.706896551724138, "no_speech_prob": 0.0017735135043039918}, {"id": 576, "seek": + 357824, "start": 3599.3599999999997, "end": 3605.4399999999996, "text": " like the + model will be shipped separately somehow outside of binary, right? But what is a + customer I''m", "tokens": [51420, 411, 264, 2316, 486, 312, 25312, 14759, 6063, + 2380, 295, 17434, 11, 558, 30, 583, 437, 307, 257, 5474, 286, 478, 51724], "temperature": + 0.0, "avg_logprob": -0.17792512785713627, "compression_ratio": 1.706896551724138, + "no_speech_prob": 0.0017735135043039918}, {"id": 577, "seek": 360544, "start": 3605.44, + "end": 3612.4, "text": " paying for? And yeah, and also kind of ahead of time, a + question, can you give a discount code", "tokens": [50364, 6229, 337, 30, 400, 1338, + 11, 293, 611, 733, 295, 2286, 295, 565, 11, 257, 1168, 11, 393, 291, 976, 257, 11635, + 3089, 50712], "temperature": 0.0, "avg_logprob": -0.13891210468537216, "compression_ratio": + 1.5925925925925926, "no_speech_prob": 0.0020615719258785248}, {"id": 578, "seek": + 360544, "start": 3612.4, "end": 3622.2400000000002, "text": " for our audience to + try it? Oh, that''s a great question. Um, uh, yes. 
So my business model is,", "tokens": + [50712, 337, 527, 4034, 281, 853, 309, 30, 876, 11, 300, 311, 257, 869, 1168, 13, + 3301, 11, 2232, 11, 2086, 13, 407, 452, 1606, 2316, 307, 11, 51204], "temperature": + 0.0, "avg_logprob": -0.13891210468537216, "compression_ratio": 1.5925925925925926, + "no_speech_prob": 0.0020615719258785248}, {"id": 579, "seek": 360544, "start": 3622.96, + "end": 3627.76, "text": " is again, old school, because I''ve been doing software + for a long time. So it''s licensed software,", "tokens": [51240, 307, 797, 11, 1331, + 1395, 11, 570, 286, 600, 668, 884, 4722, 337, 257, 938, 565, 13, 407, 309, 311, + 25225, 4722, 11, 51480], "temperature": 0.0, "avg_logprob": -0.13891210468537216, + "compression_ratio": 1.5925925925925926, "no_speech_prob": 0.0020615719258785248}, + {"id": 580, "seek": 360544, "start": 3627.76, "end": 3632.96, "text": " right? You + pay, you pay a license, you get to use the software. Um, I''m still trying to figure + out", "tokens": [51480, 558, 30, 509, 1689, 11, 291, 1689, 257, 10476, 11, 291, + 483, 281, 764, 264, 4722, 13, 3301, 11, 286, 478, 920, 1382, 281, 2573, 484, 51740], + "temperature": 0.0, "avg_logprob": -0.13891210468537216, "compression_ratio": 1.5925925925925926, + "no_speech_prob": 0.0020615719258785248}, {"id": 581, "seek": 363296, "start": 3632.96, + "end": 3638.32, "text": " the exact price point. Um, some people say, some people + say it''s too cheap, which is interesting,", "tokens": [50364, 264, 1900, 3218, + 935, 13, 3301, 11, 512, 561, 584, 11, 512, 561, 584, 309, 311, 886, 7084, 11, 597, + 307, 1880, 11, 50632], "temperature": 0.0, "avg_logprob": -0.16370959933713186, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0046398574486374855}, + {"id": 582, "seek": 363296, "start": 3638.32, "end": 3643.6, "text": " because I + didn''t, I didn''t think so. 
Um, some people think I say I should charge more money + for it.", "tokens": [50632, 570, 286, 994, 380, 11, 286, 994, 380, 519, 370, 13, + 3301, 11, 512, 561, 519, 286, 584, 286, 820, 4602, 544, 1460, 337, 309, 13, 50896], + "temperature": 0.0, "avg_logprob": -0.16370959933713186, "compression_ratio": 1.7419354838709677, + "no_speech_prob": 0.0046398574486374855}, {"id": 583, "seek": 363296, "start": 3644.0, + "end": 3650.0, "text": " Uh, it''s $99 a month right now. When this podcast is published + and after that, it may change.", "tokens": [50916, 4019, 11, 309, 311, 1848, 8494, + 257, 1618, 558, 586, 13, 1133, 341, 7367, 307, 6572, 293, 934, 300, 11, 309, 815, + 1319, 13, 51216], "temperature": 0.0, "avg_logprob": -0.16370959933713186, "compression_ratio": + 1.7419354838709677, "no_speech_prob": 0.0046398574486374855}, {"id": 584, "seek": + 363296, "start": 3650.56, "end": 3655.2, "text": " If you get it, I don''t have, + it''s a light up to strike. I can go in and create a discount code", "tokens": [51244, + 759, 291, 483, 309, 11, 286, 500, 380, 362, 11, 309, 311, 257, 1442, 493, 281, 9302, + 13, 286, 393, 352, 294, 293, 1884, 257, 11635, 3089, 51476], "temperature": 0.0, + "avg_logprob": -0.16370959933713186, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.0046398574486374855}, {"id": 585, "seek": 363296, "start": 3655.84, "end": 3660.32, + "text": " for folks. I don''t have a code right now. 
But if you, if you email me + and you say I heard about you", "tokens": [51508, 337, 4024, 13, 286, 500, 380, + 362, 257, 3089, 558, 586, 13, 583, 498, 291, 11, 498, 291, 3796, 385, 293, 291, + 584, 286, 2198, 466, 291, 51732], "temperature": 0.0, "avg_logprob": -0.16370959933713186, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.0046398574486374855}, + {"id": 586, "seek": 366032, "start": 3660.4, "end": 3665.28, "text": " on the vector + podcast, um, follow the link in the description, like follow the link in the notes + and", "tokens": [50368, 322, 264, 8062, 7367, 11, 1105, 11, 1524, 264, 2113, 294, + 264, 3855, 11, 411, 1524, 264, 2113, 294, 264, 5570, 293, 50612], "temperature": + 0.0, "avg_logprob": -0.15423403845893013, "compression_ratio": 1.6866359447004609, + "no_speech_prob": 0.0013247025199234486}, {"id": 587, "seek": 366032, "start": 3665.92, + "end": 3669.6000000000004, "text": " email, we''ll, we''ll set something up, um, + so you can get a discount.", "tokens": [50644, 3796, 11, 321, 603, 11, 321, 603, + 992, 746, 493, 11, 1105, 11, 370, 291, 393, 483, 257, 11635, 13, 50828], "temperature": + 0.0, "avg_logprob": -0.15423403845893013, "compression_ratio": 1.6866359447004609, + "no_speech_prob": 0.0013247025199234486}, {"id": 588, "seek": 366032, "start": 3670.96, + "end": 3677.1200000000003, "text": " Uh, that''s the way it works. But that''s, + uh, that''s for commercial. 
So if you''re using it commercially,", "tokens": [50896, + 4019, 11, 300, 311, 264, 636, 309, 1985, 13, 583, 300, 311, 11, 2232, 11, 300, 311, + 337, 6841, 13, 407, 498, 291, 434, 1228, 309, 41751, 11, 51204], "temperature": + 0.0, "avg_logprob": -0.15423403845893013, "compression_ratio": 1.6866359447004609, + "no_speech_prob": 0.0013247025199234486}, {"id": 589, "seek": 366032, "start": 3677.1200000000003, + "end": 3683.76, "text": " um, and you''re making money from it, uh, then, you know, + I, I ask you pay a license, please.", "tokens": [51204, 1105, 11, 293, 291, 434, + 1455, 1460, 490, 309, 11, 2232, 11, 550, 11, 291, 458, 11, 286, 11, 286, 1029, 291, + 1689, 257, 10476, 11, 1767, 13, 51536], "temperature": 0.0, "avg_logprob": -0.15423403845893013, + "compression_ratio": 1.6866359447004609, "no_speech_prob": 0.0013247025199234486}, + {"id": 590, "seek": 368376, "start": 3683.92, "end": 3691.84, "text": " If you are + a nonprofit charity, um, or just using it, you''re a student, um, or you just have + a", "tokens": [50372, 759, 291, 366, 257, 23348, 16863, 11, 1105, 11, 420, 445, + 1228, 309, 11, 291, 434, 257, 3107, 11, 1105, 11, 420, 291, 445, 362, 257, 50768], + "temperature": 0.0, "avg_logprob": -0.21080346540971237, "compression_ratio": 1.7300884955752212, + "no_speech_prob": 0.0044619301334023476}, {"id": 591, "seek": 368376, "start": 3691.84, + "end": 3696.96, "text": " side project, you''re messing around, you just want to + get some vectors, go and install it, you know,", "tokens": [50768, 1252, 1716, 11, + 291, 434, 23258, 926, 11, 291, 445, 528, 281, 483, 512, 18875, 11, 352, 293, 3625, + 309, 11, 291, 458, 11, 51024], "temperature": 0.0, "avg_logprob": -0.21080346540971237, + "compression_ratio": 1.7300884955752212, "no_speech_prob": 0.0044619301334023476}, + {"id": 592, "seek": 368376, "start": 3696.96, "end": 3701.1200000000003, "text": + " don''t worry about it. 
Um, but if you put it in production and you''re, and you''re + charging money for", "tokens": [51024, 500, 380, 3292, 466, 309, 13, 3301, 11, 457, + 498, 291, 829, 309, 294, 4265, 293, 291, 434, 11, 293, 291, 434, 11379, 1460, 337, + 51232], "temperature": 0.0, "avg_logprob": -0.21080346540971237, "compression_ratio": + 1.7300884955752212, "no_speech_prob": 0.0044619301334023476}, {"id": 593, "seek": + 368376, "start": 3701.1200000000003, "end": 3707.76, "text": " your product, then + please, please buy a list. Yeah. Yeah. To have questions, Sign, um, how will", "tokens": + [51232, 428, 1674, 11, 550, 1767, 11, 1767, 2256, 257, 1329, 13, 865, 13, 865, 13, + 1407, 362, 1651, 11, 13515, 11, 1105, 11, 577, 486, 51564], "temperature": 0.0, + "avg_logprob": -0.21080346540971237, "compression_ratio": 1.7300884955752212, "no_speech_prob": + 0.0044619301334023476}, {"id": 594, "seek": 370776, "start": 3707.84, "end": 3713.1200000000003, + "text": " you track who is using it for commercial and who is using it for a hobbyist + project?", "tokens": [50368, 291, 2837, 567, 307, 1228, 309, 337, 6841, 293, 567, + 307, 1228, 309, 337, 257, 18240, 468, 1716, 30, 50632], "temperature": 0.0, "avg_logprob": + -0.14765692220150844, "compression_ratio": 1.6380090497737556, "no_speech_prob": + 0.0018436595564708114}, {"id": 595, "seek": 370776, "start": 3713.92, "end": 3722.1600000000003, + "text": " That''s a great question. And, and I don''t, I don''t track that. Um, + I''m also, I''m really into", "tokens": [50672, 663, 311, 257, 869, 1168, 13, 400, + 11, 293, 286, 500, 380, 11, 286, 500, 380, 2837, 300, 13, 3301, 11, 286, 478, 611, + 11, 286, 478, 534, 666, 51084], "temperature": 0.0, "avg_logprob": -0.14765692220150844, + "compression_ratio": 1.6380090497737556, "no_speech_prob": 0.0018436595564708114}, + {"id": 596, "seek": 370776, "start": 3723.2000000000003, "end": 3729.1200000000003, + "text": " uh, privacy and safety on the web. 
So I don''t like the idea of like putting + in a whole bunch of", "tokens": [51136, 2232, 11, 11427, 293, 4514, 322, 264, 3670, + 13, 407, 286, 500, 380, 411, 264, 1558, 295, 411, 3372, 294, 257, 1379, 3840, 295, + 51432], "temperature": 0.0, "avg_logprob": -0.14765692220150844, "compression_ratio": + 1.6380090497737556, "no_speech_prob": 0.0018436595564708114}, {"id": 597, "seek": + 370776, "start": 3729.1200000000003, "end": 3735.0400000000004, "text": " tracking + into lemon tree. Um, I think that''s a terrible way to run a product these days.", + "tokens": [51432, 11603, 666, 11356, 4230, 13, 3301, 11, 286, 519, 300, 311, 257, + 6237, 636, 281, 1190, 257, 1674, 613, 1708, 13, 51728], "temperature": 0.0, "avg_logprob": + -0.14765692220150844, "compression_ratio": 1.6380090497737556, "no_speech_prob": + 0.0018436595564708114}, {"id": 598, "seek": 373504, "start": 3735.04, "end": 3745.84, + "text": " Um, I, the only thing it does is I, uh, have it ask when it first starts + up, it just asks the server", "tokens": [50364, 3301, 11, 286, 11, 264, 787, 551, + 309, 775, 307, 286, 11, 2232, 11, 362, 309, 1029, 562, 309, 700, 3719, 493, 11, + 309, 445, 8962, 264, 7154, 50904], "temperature": 0.0, "avg_logprob": -0.13559315347263956, + "compression_ratio": 1.6569037656903767, "no_speech_prob": 0.0042787641286849976}, + {"id": 599, "seek": 373504, "start": 3745.84, "end": 3751.6, "text": " for what + the latest version is. And it''ll tell you if there''s a new version. So with that, + I see,", "tokens": [50904, 337, 437, 264, 6792, 3037, 307, 13, 400, 309, 603, 980, + 291, 498, 456, 311, 257, 777, 3037, 13, 407, 365, 300, 11, 286, 536, 11, 51192], + "temperature": 0.0, "avg_logprob": -0.13559315347263956, "compression_ratio": 1.6569037656903767, + "no_speech_prob": 0.0042787641286849976}, {"id": 600, "seek": 373504, "start": 3751.6, + "end": 3758.72, "text": " I see that, um, okay, the, you know, somebody asked for + a new version. 
Uh, and I anonymize all the", "tokens": [51192, 286, 536, 300, 11, + 1105, 11, 1392, 11, 264, 11, 291, 458, 11, 2618, 2351, 337, 257, 777, 3037, 13, + 4019, 11, 293, 286, 37293, 1125, 439, 264, 51548], "temperature": 0.0, "avg_logprob": + -0.13559315347263956, "compression_ratio": 1.6569037656903767, "no_speech_prob": + 0.0042787641286849976}, {"id": 601, "seek": 373504, "start": 3758.72, "end": 3763.92, + "text": " IP addresses. So I don''t even know who, like, there''s nothing. There''s + no user information at all.", "tokens": [51548, 8671, 16862, 13, 407, 286, 500, + 380, 754, 458, 567, 11, 411, 11, 456, 311, 1825, 13, 821, 311, 572, 4195, 1589, + 412, 439, 13, 51808], "temperature": 0.0, "avg_logprob": -0.13559315347263956, "compression_ratio": + 1.6569037656903767, "no_speech_prob": 0.0042787641286849976}, {"id": 602, "seek": + 376392, "start": 3763.92, "end": 3769.92, "text": " So I just use that to kind of + track, um, how often it starts. And it''s, I, I see like maybe,", "tokens": [50364, + 407, 286, 445, 764, 300, 281, 733, 295, 2837, 11, 1105, 11, 577, 2049, 309, 3719, + 13, 400, 309, 311, 11, 286, 11, 286, 536, 411, 1310, 11, 50664], "temperature": + 0.0, "avg_logprob": -0.13836362806417174, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.00038933835458010435}, {"id": 603, "seek": 376392, "start": + 3771.04, "end": 3780.8, "text": " maybe five downloads a day, um, right now. That''s + what I do. So if, if you''re running it,", "tokens": [50720, 1310, 1732, 36553, + 257, 786, 11, 1105, 11, 558, 586, 13, 663, 311, 437, 286, 360, 13, 407, 498, 11, + 498, 291, 434, 2614, 309, 11, 51208], "temperature": 0.0, "avg_logprob": -0.13836362806417174, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.00038933835458010435}, + {"id": 604, "seek": 376392, "start": 3780.8, "end": 3787.28, "text": " you''re for + pirating it, I can''t stop you. Uh, spending my time trying to stop you. 
Uh, it''s + not,", "tokens": [51208, 291, 434, 337, 13528, 990, 309, 11, 286, 393, 380, 1590, + 291, 13, 4019, 11, 6434, 452, 565, 1382, 281, 1590, 291, 13, 4019, 11, 309, 311, + 406, 11, 51532], "temperature": 0.0, "avg_logprob": -0.13836362806417174, "compression_ratio": + 1.631578947368421, "no_speech_prob": 0.00038933835458010435}, {"id": 605, "seek": + 376392, "start": 3787.28, "end": 3792.08, "text": " it''s not worth my energy. Yeah. + And I''d much rather, uh, I''d much rather work with teams who", "tokens": [51532, + 309, 311, 406, 3163, 452, 2281, 13, 865, 13, 400, 286, 1116, 709, 2831, 11, 2232, + 11, 286, 1116, 709, 2831, 589, 365, 5491, 567, 51772], "temperature": 0.0, "avg_logprob": + -0.13836362806417174, "compression_ratio": 1.631578947368421, "no_speech_prob": + 0.00038933835458010435}, {"id": 606, "seek": 379208, "start": 3792.08, "end": 3796.16, + "text": " really want to gain something. So if you do buy a license, I''ll work + with you on setting it up", "tokens": [50364, 534, 528, 281, 6052, 746, 13, 407, + 498, 291, 360, 2256, 257, 10476, 11, 286, 603, 589, 365, 291, 322, 3287, 309, 493, + 50568], "temperature": 0.0, "avg_logprob": -0.10490030914772558, "compression_ratio": + 1.8452830188679246, "no_speech_prob": 0.0010457357857376337}, {"id": 607, "seek": + 379208, "start": 3796.16, "end": 3802.4, "text": " and telling you how to use it + and working it and working on it with you. 
Um, it''s not advertised,", "tokens": + [50568, 293, 3585, 291, 577, 281, 764, 309, 293, 1364, 309, 293, 1364, 322, 309, + 365, 291, 13, 3301, 11, 309, 311, 406, 42310, 11, 50880], "temperature": 0.0, "avg_logprob": + -0.10490030914772558, "compression_ratio": 1.8452830188679246, "no_speech_prob": + 0.0010457357857376337}, {"id": 608, "seek": 379208, "start": 3802.4, "end": 3809.2, + "text": " but around model inference itself, I''m happy to, uh, offer services, + uh, to get your model up and", "tokens": [50880, 457, 926, 2316, 38253, 2564, 11, + 286, 478, 2055, 281, 11, 2232, 11, 2626, 3328, 11, 2232, 11, 281, 483, 428, 2316, + 493, 293, 51220], "temperature": 0.0, "avg_logprob": -0.10490030914772558, "compression_ratio": + 1.8452830188679246, "no_speech_prob": 0.0010457357857376337}, {"id": 609, "seek": + 379208, "start": 3809.2, "end": 3814.3199999999997, "text": " running and making + sure that it''s running optimally, um, even doing a model conversion with you,", + "tokens": [51220, 2614, 293, 1455, 988, 300, 309, 311, 2614, 5028, 379, 11, 1105, + 11, 754, 884, 257, 2316, 14298, 365, 291, 11, 51476], "temperature": 0.0, "avg_logprob": + -0.10490030914772558, "compression_ratio": 1.8452830188679246, "no_speech_prob": + 0.0010457357857376337}, {"id": 610, "seek": 379208, "start": 3814.3199999999997, + "end": 3821.2, "text": " setting you, setting you up for that stuff. Um, but that''s, + that''s not advertised. 
It does say, like,", "tokens": [51476, 3287, 291, 11, 3287, + 291, 493, 337, 300, 1507, 13, 3301, 11, 457, 300, 311, 11, 300, 311, 406, 42310, + 13, 467, 775, 584, 11, 411, 11, 51820], "temperature": 0.0, "avg_logprob": -0.10490030914772558, + "compression_ratio": 1.8452830188679246, "no_speech_prob": 0.0010457357857376337}, + {"id": 611, "seek": 382120, "start": 3821.6, "end": 3826.08, "text": " I''ll spend + an hour with you if you buy a subscription to get you set up, but if you need more + help", "tokens": [50384, 286, 603, 3496, 364, 1773, 365, 291, 498, 291, 2256, 257, + 17231, 281, 483, 291, 992, 493, 11, 457, 498, 291, 643, 544, 854, 50608], "temperature": + 0.0, "avg_logprob": -0.14842058577627507, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.0005218836013227701}, {"id": 612, "seek": 382120, "start": 3826.08, + "end": 3834.0, "text": " than that, you know, let me know. Uh, now there''s another + tier, which is like, if you''re Amazon,", "tokens": [50608, 813, 300, 11, 291, 458, + 11, 718, 385, 458, 13, 4019, 11, 586, 456, 311, 1071, 12362, 11, 597, 307, 411, + 11, 498, 291, 434, 6795, 11, 51004], "temperature": 0.0, "avg_logprob": -0.14842058577627507, + "compression_ratio": 1.608695652173913, "no_speech_prob": 0.0005218836013227701}, + {"id": 613, "seek": 382120, "start": 3834.72, "end": 3839.2, "text": " Amazon would + never buy my, they have their own world. 
But if you''re like a cloud provider,", + "tokens": [51040, 6795, 576, 1128, 2256, 452, 11, 436, 362, 641, 1065, 1002, 13, + 583, 498, 291, 434, 411, 257, 4588, 12398, 11, 51264], "temperature": 0.0, "avg_logprob": + -0.14842058577627507, "compression_ratio": 1.608695652173913, "no_speech_prob": + 0.0005218836013227701}, {"id": 614, "seek": 382120, "start": 3839.2, "end": 3844.96, + "text": " or if you want to offer it as an API, that it doesn''t count because it''s, + it''s per,", "tokens": [51264, 420, 498, 291, 528, 281, 2626, 309, 382, 364, 9362, + 11, 300, 309, 1177, 380, 1207, 570, 309, 311, 11, 309, 311, 680, 11, 51552], "temperature": + 0.0, "avg_logprob": -0.14842058577627507, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.0005218836013227701}, {"id": 615, "seek": 384496, "start": 3845.92, + "end": 3852.88, "text": " it''s per product that I sell the license for. So if you + are selling it as a cloud provider,", "tokens": [50412, 309, 311, 680, 1674, 300, + 286, 3607, 264, 10476, 337, 13, 407, 498, 291, 366, 6511, 309, 382, 257, 4588, 12398, + 11, 50760], "temperature": 0.0, "avg_logprob": -0.1285796598954634, "compression_ratio": + 1.5836909871244635, "no_speech_prob": 0.00031709924223832786}, {"id": 616, "seek": + 384496, "start": 3852.88, "end": 3858.4, "text": " or as an API, and you''ve got + like a thousand clients that are now using my money, well, I,", "tokens": [50760, + 420, 382, 364, 9362, 11, 293, 291, 600, 658, 411, 257, 4714, 6982, 300, 366, 586, + 1228, 452, 1460, 11, 731, 11, 286, 11, 51036], "temperature": 0.0, "avg_logprob": + -0.1285796598954634, "compression_ratio": 1.5836909871244635, "no_speech_prob": + 0.00031709924223832786}, {"id": 617, "seek": 384496, "start": 3858.4, "end": 3865.84, + "text": " I actually count all of those clients as a mighty user. 
So I don''t have + a price published,", "tokens": [51036, 286, 767, 1207, 439, 295, 729, 6982, 382, + 257, 21556, 4195, 13, 407, 286, 500, 380, 362, 257, 3218, 6572, 11, 51408], "temperature": + 0.0, "avg_logprob": -0.1285796598954634, "compression_ratio": 1.5836909871244635, + "no_speech_prob": 0.00031709924223832786}, {"id": 618, "seek": 384496, "start": + 3865.84, "end": 3871.28, "text": " but if you have that situation, I''m not going + to charge you 99 dollars a month for each client.", "tokens": [51408, 457, 498, + 291, 362, 300, 2590, 11, 286, 478, 406, 516, 281, 4602, 291, 11803, 3808, 257, 1618, + 337, 1184, 6423, 13, 51680], "temperature": 0.0, "avg_logprob": -0.1285796598954634, + "compression_ratio": 1.5836909871244635, "no_speech_prob": 0.00031709924223832786}, + {"id": 619, "seek": 387128, "start": 3871.28, "end": 3876.1600000000003, "text": + " That''s that, that, you know, if you''re running that type of business, contact + me and we''ll work,", "tokens": [50364, 663, 311, 300, 11, 300, 11, 291, 458, 11, + 498, 291, 434, 2614, 300, 2010, 295, 1606, 11, 3385, 385, 293, 321, 603, 589, 11, + 50608], "temperature": 0.0, "avg_logprob": -0.1755919683547247, "compression_ratio": + 1.606837606837607, "no_speech_prob": 0.010349803604185581}, {"id": 620, "seek": + 387128, "start": 3876.1600000000003, "end": 3881.6800000000003, "text": " we''ll + work something out. Yeah, that''s perfect. I mean, sounds like a solid model. I + mean,", "tokens": [50608, 321, 603, 589, 746, 484, 13, 865, 11, 300, 311, 2176, + 13, 286, 914, 11, 3263, 411, 257, 5100, 2316, 13, 286, 914, 11, 50884], "temperature": + 0.0, "avg_logprob": -0.1755919683547247, "compression_ratio": 1.606837606837607, + "no_speech_prob": 0.010349803604185581}, {"id": 621, "seek": 387128, "start": 3881.6800000000003, + "end": 3887.44, "text": " for the start, for sure. 
And another like favorite question + I have, and I''ve been asking this question", "tokens": [50884, 337, 264, 722, 11, + 337, 988, 13, 400, 1071, 411, 2954, 1168, 286, 362, 11, 293, 286, 600, 668, 3365, + 341, 1168, 51172], "temperature": 0.0, "avg_logprob": -0.1755919683547247, "compression_ratio": + 1.606837606837607, "no_speech_prob": 0.010349803604185581}, {"id": 622, "seek": + 387128, "start": 3887.44, "end": 3895.6800000000003, "text": " also to open source + players like VV8, um, and I think quadrant. Um, so basically, um,", "tokens": [51172, + 611, 281, 1269, 4009, 4150, 411, 691, 53, 23, 11, 1105, 11, 293, 286, 519, 46856, + 13, 3301, 11, 370, 1936, 11, 1105, 11, 51584], "temperature": 0.0, "avg_logprob": + -0.1755919683547247, "compression_ratio": 1.606837606837607, "no_speech_prob": 0.010349803604185581}, + {"id": 623, "seek": 389568, "start": 3896.64, "end": 3903.2, "text": " have you + thought, you know, one way of kind of building that connection that may yield a + business", "tokens": [50412, 362, 291, 1194, 11, 291, 458, 11, 472, 636, 295, 733, + 295, 2390, 300, 4984, 300, 815, 11257, 257, 1606, 50740], "temperature": 0.0, "avg_logprob": + -0.13726764029644906, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.002891132840886712}, {"id": 624, "seek": 389568, "start": 3903.2, "end": 3909.3599999999997, + "text": " case for you is what you just explained, right? So somebody buys a license + and then you scale with", "tokens": [50740, 1389, 337, 291, 307, 437, 291, 445, + 8825, 11, 558, 30, 407, 2618, 28153, 257, 10476, 293, 550, 291, 4373, 365, 51048], + "temperature": 0.0, "avg_logprob": -0.13726764029644906, "compression_ratio": 1.6182572614107884, + "no_speech_prob": 0.002891132840886712}, {"id": 625, "seek": 389568, "start": 3909.3599999999997, + "end": 3914.64, "text": " them, you explain how to make it better, how to tune it, + maybe implement some features. 
Another", "tokens": [51048, 552, 11, 291, 2903, 577, + 281, 652, 309, 1101, 11, 577, 281, 10864, 309, 11, 1310, 4445, 512, 4122, 13, 3996, + 51312], "temperature": 0.0, "avg_logprob": -0.13726764029644906, "compression_ratio": + 1.6182572614107884, "no_speech_prob": 0.002891132840886712}, {"id": 626, "seek": + 389568, "start": 3914.64, "end": 3921.2799999999997, "text": " route is to open + a Slack channel or Discord, whatever. And, you know, invite users there and then", + "tokens": [51312, 7955, 307, 281, 1269, 257, 37211, 2269, 420, 32623, 11, 2035, + 13, 400, 11, 291, 458, 11, 7980, 5022, 456, 293, 550, 51644], "temperature": 0.0, + "avg_logprob": -0.13726764029644906, "compression_ratio": 1.6182572614107884, "no_speech_prob": + 0.002891132840886712}, {"id": 627, "seek": 392128, "start": 3921.28, "end": 3926.0, + "text": " start talking to them. And maybe you''ll have some open source components + as well at some point,", "tokens": [50364, 722, 1417, 281, 552, 13, 400, 1310, 291, + 603, 362, 512, 1269, 4009, 6677, 382, 731, 412, 512, 935, 11, 50600], "temperature": + 0.0, "avg_logprob": -0.10573368687783519, "compression_ratio": 1.6198347107438016, + "no_speech_prob": 0.0014794785529375076}, {"id": 628, "seek": 392128, "start": 3926.0, + "end": 3931.2000000000003, "text": " you know, I don''t know a tool that helps me + to convert my model into representation that might", "tokens": [50600, 291, 458, + 11, 286, 500, 380, 458, 257, 2290, 300, 3665, 385, 281, 7620, 452, 2316, 666, 10290, + 300, 1062, 50860], "temperature": 0.0, "avg_logprob": -0.10573368687783519, "compression_ratio": + 1.6198347107438016, "no_speech_prob": 0.0014794785529375076}, {"id": 629, "seek": + 392128, "start": 3931.2000000000003, "end": 3938.8, "text": " you can read. 
Um, + have you considered also taking that open source route as one way of building", + "tokens": [50860, 291, 393, 1401, 13, 3301, 11, 362, 291, 4888, 611, 1940, 300, + 1269, 4009, 7955, 382, 472, 636, 295, 2390, 51240], "temperature": 0.0, "avg_logprob": + -0.10573368687783519, "compression_ratio": 1.6198347107438016, "no_speech_prob": + 0.0014794785529375076}, {"id": 630, "seek": 392128, "start": 3938.8, "end": 3946.5600000000004, + "text": " that community of some of which will be your users and paying customers? + Uh, great question. I don''t have", "tokens": [51240, 300, 1768, 295, 512, 295, + 597, 486, 312, 428, 5022, 293, 6229, 4581, 30, 4019, 11, 869, 1168, 13, 286, 500, + 380, 362, 51628], "temperature": 0.0, "avg_logprob": -0.10573368687783519, "compression_ratio": + 1.6198347107438016, "no_speech_prob": 0.0014794785529375076}, {"id": 631, "seek": + 394656, "start": 3947.2, "end": 3955.2, "text": " a, I don''t have a Slack, I don''t + have a Slack myself. Um, I''m a member in many other slacks.", "tokens": [50396, + 257, 11, 286, 500, 380, 362, 257, 37211, 11, 286, 500, 380, 362, 257, 37211, 2059, + 13, 3301, 11, 286, 478, 257, 4006, 294, 867, 661, 1061, 7424, 13, 50796], "temperature": + 0.0, "avg_logprob": -0.1792139350821119, "compression_ratio": 1.7454545454545454, + "no_speech_prob": 0.0023044534027576447}, {"id": 632, "seek": 394656, "start": 3956.08, + "end": 3962.72, "text": " I could set up a Discord, I''m on Discord, um, mostly + just for the ML ops community in Discord.", "tokens": [50840, 286, 727, 992, 493, + 257, 32623, 11, 286, 478, 322, 32623, 11, 1105, 11, 5240, 445, 337, 264, 21601, + 44663, 1768, 294, 32623, 13, 51172], "temperature": 0.0, "avg_logprob": -0.1792139350821119, + "compression_ratio": 1.7454545454545454, "no_speech_prob": 0.0023044534027576447}, + {"id": 633, "seek": 394656, "start": 3963.68, "end": 3968.64, "text": " But I could + just start like a thread or a channel in that. 
I don''t know if mighty itself needs + its", "tokens": [51220, 583, 286, 727, 445, 722, 411, 257, 7207, 420, 257, 2269, + 294, 300, 13, 286, 500, 380, 458, 498, 21556, 2564, 2203, 1080, 51468], "temperature": + 0.0, "avg_logprob": -0.1792139350821119, "compression_ratio": 1.7454545454545454, + "no_speech_prob": 0.0023044534027576447}, {"id": 634, "seek": 394656, "start": 3968.64, + "end": 3975.68, "text": " own Slack by itself. Um, I think it''s more of a community. + It would be part of another community.", "tokens": [51468, 1065, 37211, 538, 2564, + 13, 3301, 11, 286, 519, 309, 311, 544, 295, 257, 1768, 13, 467, 576, 312, 644, 295, + 1071, 1768, 13, 51820], "temperature": 0.0, "avg_logprob": -0.1792139350821119, + "compression_ratio": 1.7454545454545454, "no_speech_prob": 0.0023044534027576447}, + {"id": 635, "seek": 397656, "start": 3977.2, "end": 3981.44, "text": " One of the, + one of the annoying things for me is I have to go and join like 12 million slacks", + "tokens": [50396, 1485, 295, 264, 11, 472, 295, 264, 11304, 721, 337, 385, 307, + 286, 362, 281, 352, 293, 3917, 411, 2272, 2459, 1061, 7424, 50608], "temperature": + 0.0, "avg_logprob": -0.1495165475984899, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.0019218268571421504}, {"id": 636, "seek": 397656, "start": 3981.44, + "end": 3985.52, "text": " because everybody has their own Slack and it doesn''t + work with one another. Discord does that", "tokens": [50608, 570, 2201, 575, 641, + 1065, 37211, 293, 309, 1177, 380, 589, 365, 472, 1071, 13, 32623, 775, 300, 50812], + "temperature": 0.0, "avg_logprob": -0.1495165475984899, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.0019218268571421504}, {"id": 637, "seek": 397656, "start": 3985.52, + "end": 3991.68, "text": " way better. Slack, we got to have words. Um, you got to + make it easier. 
I have like four email", "tokens": [50812, 636, 1101, 13, 37211, + 11, 321, 658, 281, 362, 2283, 13, 3301, 11, 291, 658, 281, 652, 309, 3571, 13, 286, + 362, 411, 1451, 3796, 51120], "temperature": 0.0, "avg_logprob": -0.1495165475984899, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.0019218268571421504}, + {"id": 638, "seek": 397656, "start": 3991.68, "end": 3997.12, "text": " addresses + or five email addresses across like 12 different slacks right now. I can''t keep + track of", "tokens": [51120, 16862, 420, 1732, 3796, 16862, 2108, 411, 2272, 819, + 1061, 7424, 558, 586, 13, 286, 393, 380, 1066, 2837, 295, 51392], "temperature": + 0.0, "avg_logprob": -0.1495165475984899, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.0019218268571421504}, {"id": 639, "seek": 397656, "start": 3997.12, + "end": 4003.04, "text": " them. Um, but in terms of open, of open source, I already + have a bunch of open source projects. So", "tokens": [51392, 552, 13, 3301, 11, + 457, 294, 2115, 295, 1269, 11, 295, 1269, 4009, 11, 286, 1217, 362, 257, 3840, 295, + 1269, 4009, 4455, 13, 407, 51688], "temperature": 0.0, "avg_logprob": -0.1495165475984899, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.0019218268571421504}, + {"id": 640, "seek": 400304, "start": 4004.0, "end": 4015.84, "text": " uh, there + is, um, max.io but spelled out, M-A-X-D-O-T-I-O on GitHub. Somebody already took + max-io.", "tokens": [50412, 2232, 11, 456, 307, 11, 1105, 11, 11469, 13, 1004, 457, + 34388, 484, 11, 376, 12, 32, 12, 55, 12, 35, 12, 46, 12, 51, 12, 40, 12, 46, 322, + 23331, 13, 13463, 1217, 1890, 11469, 12, 1004, 13, 51004], "temperature": 0.0, "avg_logprob": + -0.22393798828125, "compression_ratio": 1.3923444976076556, "no_speech_prob": 0.0030506616458296776}, + {"id": 641, "seek": 400304, "start": 4015.84, "end": 4024.0, "text": " We can''t + have dots in GitHub names. Um, that''s fine. You know, names or names. 
Uh, so there''s + mighty", "tokens": [51004, 492, 393, 380, 362, 15026, 294, 23331, 5288, 13, 3301, + 11, 300, 311, 2489, 13, 509, 458, 11, 5288, 420, 5288, 13, 4019, 11, 370, 456, 311, + 21556, 51412], "temperature": 0.0, "avg_logprob": -0.22393798828125, "compression_ratio": + 1.3923444976076556, "no_speech_prob": 0.0030506616458296776}, {"id": 642, "seek": + 400304, "start": 4024.0, "end": 4029.04, "text": " convert, which I actually, I''m + working on updating that because it''s based on, um, optimum,", "tokens": [51412, + 7620, 11, 597, 286, 767, 11, 286, 478, 1364, 322, 25113, 300, 570, 309, 311, 2361, + 322, 11, 1105, 11, 39326, 11, 51664], "temperature": 0.0, "avg_logprob": -0.22393798828125, + "compression_ratio": 1.3923444976076556, "no_speech_prob": 0.0030506616458296776}, + {"id": 643, "seek": 402904, "start": 4029.04, "end": 4034.8, "text": " which is + a hugging-face repository that does model conversion. It''s a very light wrapper + around", "tokens": [50364, 597, 307, 257, 41706, 12, 2868, 25841, 300, 775, 2316, + 14298, 13, 467, 311, 257, 588, 1442, 46906, 926, 50652], "temperature": 0.0, "avg_logprob": + -0.13433128244736614, "compression_ratio": 1.6081632653061224, "no_speech_prob": + 0.0006002169102430344}, {"id": 644, "seek": 402904, "start": 4035.84, "end": 4044.08, + "text": " optimum. It basically just converts the model for you, uh, bundles the + tokenizer and a configuration.", "tokens": [50704, 39326, 13, 467, 1936, 445, 38874, + 264, 2316, 337, 291, 11, 2232, 11, 13882, 904, 264, 14862, 6545, 293, 257, 11694, + 13, 51116], "temperature": 0.0, "avg_logprob": -0.13433128244736614, "compression_ratio": + 1.6081632653061224, "no_speech_prob": 0.0006002169102430344}, {"id": 645, "seek": + 402904, "start": 4044.08, "end": 4049.2799999999997, "text": " That''s it. Uh, it''s + pretty straightforward. Um, you can do that yourself. 
You don''t have to use that.", + "tokens": [51116, 663, 311, 309, 13, 4019, 11, 309, 311, 1238, 15325, 13, 3301, + 11, 291, 393, 360, 300, 1803, 13, 509, 500, 380, 362, 281, 764, 300, 13, 51376], + "temperature": 0.0, "avg_logprob": -0.13433128244736614, "compression_ratio": 1.6081632653061224, + "no_speech_prob": 0.0006002169102430344}, {"id": 646, "seek": 402904, "start": 4049.84, + "end": 4055.92, "text": " So that, but that''s open source. Um, there''s also mighty + batch, which is a node program, which", "tokens": [51404, 407, 300, 11, 457, 300, + 311, 1269, 4009, 13, 3301, 11, 456, 311, 611, 21556, 15245, 11, 597, 307, 257, 9984, + 1461, 11, 597, 51708], "temperature": 0.0, "avg_logprob": -0.13433128244736614, + "compression_ratio": 1.6081632653061224, "no_speech_prob": 0.0006002169102430344}, + {"id": 647, "seek": 405592, "start": 4056.7200000000003, "end": 4062.0, "text": + " is a way to do, uh, concurrent batch processing of documents,", "tokens": [50404, + 307, 257, 636, 281, 360, 11, 2232, 11, 37702, 15245, 9007, 295, 8512, 11, 50668], + "temperature": 0.0, "avg_logprob": -0.17158190863473075, "compression_ratio": 1.5029239766081872, + "no_speech_prob": 0.0006857871194370091}, {"id": 648, "seek": 405592, "start": 4062.56, + "end": 4073.2000000000003, "text": " into vectors, pointing it at a mighty server. 
+ Um, that''s best described in the blog post that I wrote", "tokens": [50696, 666, + 18875, 11, 12166, 309, 412, 257, 21556, 7154, 13, 3301, 11, 300, 311, 1151, 7619, + 294, 264, 6968, 2183, 300, 286, 4114, 51228], "temperature": 0.0, "avg_logprob": + -0.17158190863473075, "compression_ratio": 1.5029239766081872, "no_speech_prob": + 0.0006857871194370091}, {"id": 649, "seek": 405592, "start": 4073.2000000000003, + "end": 4078.64, "text": " and how that works, um, about, you know, the, the blog + post being, uh, converting the code of", "tokens": [51228, 293, 577, 300, 1985, + 11, 1105, 11, 466, 11, 291, 458, 11, 264, 11, 264, 6968, 2183, 885, 11, 2232, 11, + 29942, 264, 3089, 295, 51500], "temperature": 0.0, "avg_logprob": -0.17158190863473075, + "compression_ratio": 1.5029239766081872, "no_speech_prob": 0.0006857871194370091}, + {"id": 650, "seek": 407864, "start": 4078.64, "end": 4087.52, "text": " federal + regulations. Um, that''s on, it''s on the homepage of max.io. And, uh, there''s + also a bunch of", "tokens": [50364, 6019, 12563, 13, 3301, 11, 300, 311, 322, 11, + 309, 311, 322, 264, 31301, 295, 11469, 13, 1004, 13, 400, 11, 2232, 11, 456, 311, + 611, 257, 3840, 295, 50808], "temperature": 0.0, "avg_logprob": -0.1253059442760875, + "compression_ratio": 1.625531914893617, "no_speech_prob": 0.00027877523098140955}, + {"id": 651, "seek": 407864, "start": 4087.52, "end": 4092.4, "text": " other open + source projects that I haven''t talked about yet. 
So there''s now node mighty, which + is", "tokens": [50808, 661, 1269, 4009, 4455, 300, 286, 2378, 380, 2825, 466, 1939, + 13, 407, 456, 311, 586, 9984, 21556, 11, 597, 307, 51052], "temperature": 0.0, "avg_logprob": + -0.1253059442760875, "compression_ratio": 1.625531914893617, "no_speech_prob": 0.00027877523098140955}, + {"id": 652, "seek": 407864, "start": 4092.4, "end": 4098.8, "text": " basically + just an API client for node that will talk to mighty, um, it does connection pooling.", + "tokens": [51052, 1936, 445, 364, 9362, 6423, 337, 9984, 300, 486, 751, 281, 21556, + 11, 1105, 11, 309, 775, 4984, 7005, 278, 13, 51372], "temperature": 0.0, "avg_logprob": + -0.1253059442760875, "compression_ratio": 1.625531914893617, "no_speech_prob": 0.00027877523098140955}, + {"id": 653, "seek": 407864, "start": 4098.8, "end": 4104.639999999999, "text": " + So if you have like four mighty, four mighty cores running, it''ll talk to all the, + it''ll", "tokens": [51372, 407, 498, 291, 362, 411, 1451, 21556, 11, 1451, 21556, + 24826, 2614, 11, 309, 603, 751, 281, 439, 264, 11, 309, 603, 51664], "temperature": + 0.0, "avg_logprob": -0.1253059442760875, "compression_ratio": 1.625531914893617, + "no_speech_prob": 0.00027877523098140955}, {"id": 654, "seek": 410464, "start": + 4104.64, "end": 4109.92, "text": " negotiate which core to use, um, when it makes + a call. So that''s really easy to use in like an express", "tokens": [50364, 21713, + 597, 4965, 281, 764, 11, 1105, 11, 562, 309, 1669, 257, 818, 13, 407, 300, 311, + 534, 1858, 281, 764, 294, 411, 364, 5109, 50628], "temperature": 0.0, "avg_logprob": + -0.12416741583082411, "compression_ratio": 1.6040816326530611, "no_speech_prob": + 0.000273947196546942}, {"id": 655, "seek": 410464, "start": 4109.92, "end": 4116.88, + "text": " server. 
Um, I also wrote two other node modules while I was at it, uh, + that aren''t from mighty,", "tokens": [50628, 7154, 13, 3301, 11, 286, 611, 4114, + 732, 661, 9984, 16679, 1339, 286, 390, 412, 309, 11, 2232, 11, 300, 3212, 380, 490, + 21556, 11, 50976], "temperature": 0.0, "avg_logprob": -0.12416741583082411, "compression_ratio": + 1.6040816326530611, "no_speech_prob": 0.000273947196546942}, {"id": 656, "seek": + 410464, "start": 4116.88, "end": 4122.320000000001, "text": " but I wrote node quadrant. + So now there''s a node client for, for the quadrant vector database.", "tokens": + [50976, 457, 286, 4114, 9984, 46856, 13, 407, 586, 456, 311, 257, 9984, 6423, 337, + 11, 337, 264, 46856, 8062, 8149, 13, 51248], "temperature": 0.0, "avg_logprob": + -0.12416741583082411, "compression_ratio": 1.6040816326530611, "no_speech_prob": + 0.000273947196546942}, {"id": 657, "seek": 410464, "start": 4123.12, "end": 4131.68, + "text": " And I told, uh, uh, I told the guys at quadrant that this exists. I''m + trying to work on a blog post", "tokens": [51288, 400, 286, 1907, 11, 2232, 11, + 2232, 11, 286, 1907, 264, 1074, 412, 46856, 300, 341, 8198, 13, 286, 478, 1382, + 281, 589, 322, 257, 6968, 2183, 51716], "temperature": 0.0, "avg_logprob": -0.12416741583082411, + "compression_ratio": 1.6040816326530611, "no_speech_prob": 0.000273947196546942}, + {"id": 658, "seek": 413168, "start": 4131.68, "end": 4136.4800000000005, "text": + " out of an asset, I guess this is the announcement. Um, but I''ll, I''ll, I''ll + publish something.", "tokens": [50364, 484, 295, 364, 11999, 11, 286, 2041, 341, + 307, 264, 12847, 13, 3301, 11, 457, 286, 603, 11, 286, 603, 11, 286, 603, 11374, + 746, 13, 50604], "temperature": 0.0, "avg_logprob": -0.1966743999057346, "compression_ratio": + 1.795539033457249, "no_speech_prob": 0.0013255112571641803}, {"id": 659, "seek": + 413168, "start": 4136.4800000000005, "end": 4142.72, "text": " There''s going to + be a demo. I also wrote node pine cone. 
So well, it''s pine cone dash I.O. So now", + "tokens": [50604, 821, 311, 516, 281, 312, 257, 10723, 13, 286, 611, 4114, 9984, + 15113, 19749, 13, 407, 731, 11, 309, 311, 15113, 19749, 8240, 286, 13, 46, 13, 407, + 586, 50916], "temperature": 0.0, "avg_logprob": -0.1966743999057346, "compression_ratio": + 1.795539033457249, "no_speech_prob": 0.0013255112571641803}, {"id": 660, "seek": + 413168, "start": 4142.72, "end": 4149.04, "text": " there''s a node JS integration + into pine cone. So you can talk to pine cone from, from node from", "tokens": [50916, + 456, 311, 257, 9984, 33063, 10980, 666, 15113, 19749, 13, 407, 291, 393, 751, 281, + 15113, 19749, 490, 11, 490, 9984, 490, 51232], "temperature": 0.0, "avg_logprob": + -0.1966743999057346, "compression_ratio": 1.795539033457249, "no_speech_prob": 0.0013255112571641803}, + {"id": 661, "seek": 413168, "start": 4149.04, "end": 4153.4400000000005, "text": + " a kind of express server or something. Um, the guys at pine cone don''t know what + that I wrote that", "tokens": [51232, 257, 733, 295, 5109, 7154, 420, 746, 13, 3301, + 11, 264, 1074, 412, 15113, 19749, 500, 380, 458, 437, 300, 286, 4114, 300, 51452], + "temperature": 0.0, "avg_logprob": -0.1966743999057346, "compression_ratio": 1.795539033457249, + "no_speech_prob": 0.0013255112571641803}, {"id": 662, "seek": 413168, "start": 4153.4400000000005, + "end": 4160.8, "text": " yet because it wasn''t, I didn''t, I just put it out there. + It''s on npm. Um, so I gotta, I gotta,", "tokens": [51452, 1939, 570, 309, 2067, + 380, 11, 286, 994, 380, 11, 286, 445, 829, 309, 484, 456, 13, 467, 311, 322, 297, + 14395, 13, 3301, 11, 370, 286, 3428, 11, 286, 3428, 11, 51820], "temperature": 0.0, + "avg_logprob": -0.1966743999057346, "compression_ratio": 1.795539033457249, "no_speech_prob": + 0.0013255112571641803}, {"id": 663, "seek": 416080, "start": 4160.8, "end": 4164.56, + "text": " I gotta work that out. And they might want it. 
If you guys, if you want + this, you know, I,", "tokens": [50364, 286, 3428, 589, 300, 484, 13, 400, 436, 1062, + 528, 309, 13, 759, 291, 1074, 11, 498, 291, 528, 341, 11, 291, 458, 11, 286, 11, + 50552], "temperature": 0.0, "avg_logprob": -0.14496816387613312, "compression_ratio": + 1.7563636363636363, "no_speech_prob": 0.0006712747272104025}, {"id": 664, "seek": + 416080, "start": 4164.56, "end": 4169.4400000000005, "text": " I just wanted something + that I could use, but it''s your name. So please take the, take the package", "tokens": + [50552, 286, 445, 1415, 746, 300, 286, 727, 764, 11, 457, 309, 311, 428, 1315, 13, + 407, 1767, 747, 264, 11, 747, 264, 7372, 50796], "temperature": 0.0, "avg_logprob": + -0.14496816387613312, "compression_ratio": 1.7563636363636363, "no_speech_prob": + 0.0006712747272104025}, {"id": 665, "seek": 416080, "start": 4169.4400000000005, + "end": 4174.4800000000005, "text": " from me. If you don''t get upset that I used + your name. Um, I just wanted a tool that I could use", "tokens": [50796, 490, 385, + 13, 759, 291, 500, 380, 483, 8340, 300, 286, 1143, 428, 1315, 13, 3301, 11, 286, + 445, 1415, 257, 2290, 300, 286, 727, 764, 51048], "temperature": 0.0, "avg_logprob": + -0.14496816387613312, "compression_ratio": 1.7563636363636363, "no_speech_prob": + 0.0006712747272104025}, {"id": 666, "seek": 416080, "start": 4174.4800000000005, + "end": 4180.4800000000005, "text": " for my own node JS testing. Um, but then this + stuff integrates with, with mighty really easily. 
So", "tokens": [51048, 337, 452, + 1065, 9984, 33063, 4997, 13, 3301, 11, 457, 550, 341, 1507, 3572, 1024, 365, 11, + 365, 21556, 534, 3612, 13, 407, 51348], "temperature": 0.0, "avg_logprob": -0.14496816387613312, + "compression_ratio": 1.7563636363636363, "no_speech_prob": 0.0006712747272104025}, + {"id": 667, "seek": 416080, "start": 4180.4800000000005, "end": 4184.4800000000005, + "text": " I have all these node clients now and I''m fricking focusing, focusing + on JavaScript first. So all", "tokens": [51348, 286, 362, 439, 613, 9984, 6982, + 586, 293, 286, 478, 431, 10401, 8416, 11, 8416, 322, 15778, 700, 13, 407, 439, 51548], + "temperature": 0.0, "avg_logprob": -0.14496816387613312, "compression_ratio": 1.7563636363636363, + "no_speech_prob": 0.0006712747272104025}, {"id": 668, "seek": 418448, "start": 4184.48, + "end": 4189.839999999999, "text": " the stuff is going to be released. It''s already, + it''s already up there. It''s on npm. It''s on my,", "tokens": [50364, 264, 1507, + 307, 516, 281, 312, 4736, 13, 467, 311, 1217, 11, 309, 311, 1217, 493, 456, 13, + 467, 311, 322, 297, 14395, 13, 467, 311, 322, 452, 11, 50632], "temperature": 0.0, + "avg_logprob": -0.12790803909301757, "compression_ratio": 1.7048611111111112, "no_speech_prob": + 0.0005388341378420591}, {"id": 669, "seek": 418448, "start": 4189.839999999999, + "end": 4196.959999999999, "text": " my GitHub. Uh, it''s, it''s free to use. It + maybe needs a little bit more polish. I haven''t fully", "tokens": [50632, 452, + 23331, 13, 4019, 11, 309, 311, 11, 309, 311, 1737, 281, 764, 13, 467, 1310, 2203, + 257, 707, 857, 544, 20452, 13, 286, 2378, 380, 4498, 50988], "temperature": 0.0, + "avg_logprob": -0.12790803909301757, "compression_ratio": 1.7048611111111112, "no_speech_prob": + 0.0005388341378420591}, {"id": 670, "seek": 418448, "start": 4196.959999999999, + "end": 4202.639999999999, "text": " mapped out the APIs. I just mapped out the course + stuff that I needed to do. 
So it doesn''t do things", "tokens": [50988, 33318, 484, + 264, 21445, 13, 286, 445, 33318, 484, 264, 1164, 1507, 300, 286, 2978, 281, 360, + 13, 407, 309, 1177, 380, 360, 721, 51272], "temperature": 0.0, "avg_logprob": -0.12790803909301757, + "compression_ratio": 1.7048611111111112, "no_speech_prob": 0.0005388341378420591}, + {"id": 671, "seek": 418448, "start": 4202.639999999999, "end": 4207.599999999999, + "text": " like the scroll command, you know, where you can scroll through all the + points on quadrant. But I", "tokens": [51272, 411, 264, 11369, 5622, 11, 291, 458, + 11, 689, 291, 393, 11369, 807, 439, 264, 2793, 322, 46856, 13, 583, 286, 51520], + "temperature": 0.0, "avg_logprob": -0.12790803909301757, "compression_ratio": 1.7048611111111112, + "no_speech_prob": 0.0005388341378420591}, {"id": 672, "seek": 418448, "start": 4207.599999999999, + "end": 4212.08, "text": " don''t know how much of a use for that is it''s really + easy to add that. I just didn''t have the time.", "tokens": [51520, 500, 380, 458, + 577, 709, 295, 257, 764, 337, 300, 307, 309, 311, 534, 1858, 281, 909, 300, 13, + 286, 445, 994, 380, 362, 264, 565, 13, 51744], "temperature": 0.0, "avg_logprob": + -0.12790803909301757, "compression_ratio": 1.7048611111111112, "no_speech_prob": + 0.0005388341378420591}, {"id": 673, "seek": 421208, "start": 4212.64, "end": 4218.64, + "text": " So yeah, there''s, there''s a bunch of open source work that I''ve been + doing. Um, I also want to", "tokens": [50392, 407, 1338, 11, 456, 311, 11, 456, + 311, 257, 3840, 295, 1269, 4009, 589, 300, 286, 600, 668, 884, 13, 3301, 11, 286, + 611, 528, 281, 50692], "temperature": 0.0, "avg_logprob": -0.16313892940305313, + "compression_ratio": 1.7882882882882882, "no_speech_prob": 0.0008735001320019364}, + {"id": 674, "seek": 421208, "start": 4218.64, "end": 4227.84, "text": " mention + I''m working on starter applications. 
So I have, I have right now, uh, basically + it''s like a,", "tokens": [50692, 2152, 286, 478, 1364, 322, 22465, 5821, 13, 407, + 286, 362, 11, 286, 362, 558, 586, 11, 2232, 11, 1936, 309, 311, 411, 257, 11, 51152], + "temperature": 0.0, "avg_logprob": -0.16313892940305313, "compression_ratio": 1.7882882882882882, + "no_speech_prob": 0.0008735001320019364}, {"id": 675, "seek": 421208, "start": 4227.84, + "end": 4235.28, "text": " it''s like a starter app that works with node and node + mighty and quadrant. Um, and also node mighty", "tokens": [51152, 309, 311, 411, + 257, 22465, 724, 300, 1985, 365, 9984, 293, 9984, 21556, 293, 46856, 13, 3301, 11, + 293, 611, 9984, 21556, 51524], "temperature": 0.0, "avg_logprob": -0.16313892940305313, + "compression_ratio": 1.7882882882882882, "no_speech_prob": 0.0008735001320019364}, + {"id": 676, "seek": 421208, "start": 4235.28, "end": 4241.28, "text": " and pine + cone. I have two starter apps that aren''t released yet that I''m working on polishing + up and,", "tokens": [51524, 293, 15113, 19749, 13, 286, 362, 732, 22465, 7733, 300, + 3212, 380, 4736, 1939, 300, 286, 478, 1364, 322, 47258, 493, 293, 11, 51824], "temperature": + 0.0, "avg_logprob": -0.16313892940305313, "compression_ratio": 1.7882882882882882, + "no_speech_prob": 0.0008735001320019364}, {"id": 677, "seek": 424128, "start": 4241.28, + "end": 4246.24, "text": " and getting out there where it''s where it''s really easy + if you''re a node, if you''re a job script", "tokens": [50364, 293, 1242, 484, 456, + 689, 309, 311, 689, 309, 311, 534, 1858, 498, 291, 434, 257, 9984, 11, 498, 291, + 434, 257, 1691, 5755, 50612], "temperature": 0.0, "avg_logprob": -0.20601383209228516, + "compression_ratio": 1.6244725738396624, "no_speech_prob": 0.0005491789197549224}, + {"id": 678, "seek": 424128, "start": 4246.24, "end": 4250.96, "text": " person to + just take documents, convert them into vectors, load them into a vector database,", + "tokens": [50612, 954, 281, 445, 747, 
8512, 11, 7620, 552, 666, 18875, 11, 3677, + 552, 666, 257, 8062, 8149, 11, 50848], "temperature": 0.0, "avg_logprob": -0.20601383209228516, + "compression_ratio": 1.6244725738396624, "no_speech_prob": 0.0005491789197549224}, + {"id": 679, "seek": 424128, "start": 4251.84, "end": 4260.719999999999, "text": + " and have a search app running using them. Yeah, that''s fantastic. I mean, so + much to unpack. And I", "tokens": [50892, 293, 362, 257, 3164, 724, 2614, 1228, + 552, 13, 865, 11, 300, 311, 5456, 13, 286, 914, 11, 370, 709, 281, 26699, 13, 400, + 286, 51336], "temperature": 0.0, "avg_logprob": -0.20601383209228516, "compression_ratio": + 1.6244725738396624, "no_speech_prob": 0.0005491789197549224}, {"id": 680, "seek": + 424128, "start": 4260.719999999999, "end": 4269.5199999999995, "text": " think this + could be one of the like a we''re witnessing, um, a community written, uh, software + for", "tokens": [51336, 519, 341, 727, 312, 472, 295, 264, 411, 257, 321, 434, 39233, + 11, 1105, 11, 257, 1768, 3720, 11, 2232, 11, 4722, 337, 51776], "temperature": 0.0, + "avg_logprob": -0.20601383209228516, "compression_ratio": 1.6244725738396624, "no_speech_prob": + 0.0005491789197549224}, {"id": 681, "seek": 426952, "start": 4269.52, "end": 4275.76, + "text": " a close software company. I mean, pine cone is a close software company, + right? And we have an", "tokens": [50364, 257, 1998, 4722, 2237, 13, 286, 914, 11, + 15113, 19749, 307, 257, 1998, 4722, 2237, 11, 558, 30, 400, 321, 362, 364, 50676], + "temperature": 0.0, "avg_logprob": -0.24457542494972154, "compression_ratio": 1.6375, + "no_speech_prob": 0.0016823967453092337}, {"id": 682, "seek": 426952, "start": 4275.76, + "end": 4283.360000000001, "text": " episode with Greg Kogan, who is a chief marketing + marketing officer with pine cone. 
Um, we can connect", "tokens": [50676, 3500, 365, + 11490, 591, 21576, 11, 567, 307, 257, 9588, 6370, 6370, 8456, 365, 15113, 19749, + 13, 3301, 11, 321, 393, 1745, 51056], "temperature": 0.0, "avg_logprob": -0.24457542494972154, + "compression_ratio": 1.6375, "no_speech_prob": 0.0016823967453092337}, {"id": 683, + "seek": 426952, "start": 4283.360000000001, "end": 4288.96, "text": " you to and + you can discuss the future. Yeah, I talked to Greg. You know, we''re working on + some stuff.", "tokens": [51056, 291, 281, 293, 291, 393, 2248, 264, 2027, 13, 865, + 11, 286, 2825, 281, 11490, 13, 509, 458, 11, 321, 434, 1364, 322, 512, 1507, 13, + 51336], "temperature": 0.0, "avg_logprob": -0.24457542494972154, "compression_ratio": + 1.6375, "no_speech_prob": 0.0016823967453092337}, {"id": 684, "seek": 426952, "start": + 4290.320000000001, "end": 4295.92, "text": " But what, what, my question is what + made you write those connectors? Like, did you think that", "tokens": [51404, 583, + 437, 11, 437, 11, 452, 1168, 307, 437, 1027, 291, 2464, 729, 31865, 30, 1743, 11, + 630, 291, 519, 300, 51684], "temperature": 0.0, "avg_logprob": -0.24457542494972154, + "compression_ratio": 1.6375, "no_speech_prob": 0.0016823967453092337}, {"id": 685, + "seek": 429592, "start": 4296.4800000000005, "end": 4302.88, "text": " this would + also pave the way to using mighty, you know, plugging in mighty in the pipeline?", + "tokens": [50392, 341, 576, 611, 28870, 264, 636, 281, 1228, 21556, 11, 291, 458, + 11, 42975, 294, 21556, 294, 264, 15517, 30, 50712], "temperature": 0.0, "avg_logprob": + -0.12615863096366808, "compression_ratio": 1.6973684210526316, "no_speech_prob": + 0.002458206145092845}, {"id": 686, "seek": 429592, "start": 4302.88, "end": 4308.4800000000005, + "text": " Let''s say if I''m a pine cone user and I can have a node pine cone connector + at the same time as", "tokens": [50712, 961, 311, 584, 498, 286, 478, 257, 15113, + 19749, 4195, 293, 286, 393, 362, 257, 9984, 15113, 
19749, 19127, 412, 264, 912, + 565, 382, 50992], "temperature": 0.0, "avg_logprob": -0.12615863096366808, "compression_ratio": + 1.6973684210526316, "no_speech_prob": 0.002458206145092845}, {"id": 687, "seek": + 429592, "start": 4308.4800000000005, "end": 4314.32, "text": " mighty. I''d say + half half that, you know, there is, you know, I do want to promote mighty, of course.", + "tokens": [50992, 21556, 13, 286, 1116, 584, 1922, 1922, 300, 11, 291, 458, 11, + 456, 307, 11, 291, 458, 11, 286, 360, 528, 281, 9773, 21556, 11, 295, 1164, 13, + 51284], "temperature": 0.0, "avg_logprob": -0.12615863096366808, "compression_ratio": + 1.6973684210526316, "no_speech_prob": 0.002458206145092845}, {"id": 688, "seek": + 429592, "start": 4315.12, "end": 4322.56, "text": " But again, I want to bring these + tools outside of the Python ecosystem. If you look at the vector", "tokens": [51324, + 583, 797, 11, 286, 528, 281, 1565, 613, 3873, 2380, 295, 264, 15329, 11311, 13, + 759, 291, 574, 412, 264, 8062, 51696], "temperature": 0.0, "avg_logprob": -0.12615863096366808, + "compression_ratio": 1.6973684210526316, "no_speech_prob": 0.002458206145092845}, + {"id": 689, "seek": 432256, "start": 4322.56, "end": 4328.0, "text": " databases + right now, with it, with the exception of, uh, with VV8, VV8 does a great job of + having", "tokens": [50364, 22380, 558, 586, 11, 365, 309, 11, 365, 264, 11183, 295, + 11, 2232, 11, 365, 691, 53, 23, 11, 691, 53, 23, 775, 257, 869, 1691, 295, 1419, + 50636], "temperature": 0.0, "avg_logprob": -0.20592634139522428, "compression_ratio": + 1.7610294117647058, "no_speech_prob": 0.0020707380026578903}, {"id": 690, "seek": + 432256, "start": 4328.0, "end": 4334.72, "text": " different clients for different, + um, different languages and stacks, um, Vespa as well. 
But both,", "tokens": [50636, + 819, 6982, 337, 819, 11, 1105, 11, 819, 8650, 293, 30792, 11, 1105, 11, 691, 279, + 4306, 382, 731, 13, 583, 1293, 11, 50972], "temperature": 0.0, "avg_logprob": -0.20592634139522428, + "compression_ratio": 1.7610294117647058, "no_speech_prob": 0.0020707380026578903}, + {"id": 691, "seek": 432256, "start": 4335.4400000000005, "end": 4340.8, "text": + " both quadrant and pine cone right now, it''s all Python. Well, like, quadrant, + quadrant is written", "tokens": [51008, 1293, 46856, 293, 15113, 19749, 558, 586, + 11, 309, 311, 439, 15329, 13, 1042, 11, 411, 11, 46856, 11, 46856, 307, 3720, 51276], + "temperature": 0.0, "avg_logprob": -0.20592634139522428, "compression_ratio": 1.7610294117647058, + "no_speech_prob": 0.0020707380026578903}, {"id": 692, "seek": 432256, "start": 4340.8, + "end": 4346.64, "text": " in rust, but their client right now is their, their first + class client is in Python. Um,", "tokens": [51276, 294, 15259, 11, 457, 641, 6423, + 558, 586, 307, 641, 11, 641, 700, 1508, 6423, 307, 294, 15329, 13, 3301, 11, 51568], + "temperature": 0.0, "avg_logprob": -0.20592634139522428, "compression_ratio": 1.7610294117647058, + "no_speech_prob": 0.0020707380026578903}, {"id": 693, "seek": 432256, "start": 4347.360000000001, + "end": 4352.4800000000005, "text": " which they did that because obviously everybody + who has to get vectors has to use Python anyway,", "tokens": [51604, 597, 436, 630, + 300, 570, 2745, 2201, 567, 575, 281, 483, 18875, 575, 281, 764, 15329, 4033, 11, + 51860], "temperature": 0.0, "avg_logprob": -0.20592634139522428, "compression_ratio": + 1.7610294117647058, "no_speech_prob": 0.0020707380026578903}, {"id": 694, "seek": + 435248, "start": 4352.48, "end": 4359.919999999999, "text": " uh, or they used to. + Um, so that''s why they chose Python. 
At least that''s, that''s what I speculate.", + "tokens": [50364, 2232, 11, 420, 436, 1143, 281, 13, 3301, 11, 370, 300, 311, 983, + 436, 5111, 15329, 13, 1711, 1935, 300, 311, 11, 300, 311, 437, 286, 40775, 13, 50736], + "temperature": 0.0, "avg_logprob": -0.13662703335285187, "compression_ratio": 1.8091603053435115, + "no_speech_prob": 0.00022973115846980363}, {"id": 695, "seek": 435248, "start": + 4360.32, "end": 4366.32, "text": " Um, and pine cone as well, all their examples + are in notebook form, um, in Jupyter notebook form.", "tokens": [50756, 3301, 11, + 293, 15113, 19749, 382, 731, 11, 439, 641, 5110, 366, 294, 21060, 1254, 11, 1105, + 11, 294, 22125, 88, 391, 21060, 1254, 13, 51056], "temperature": 0.0, "avg_logprob": + -0.13662703335285187, "compression_ratio": 1.8091603053435115, "no_speech_prob": + 0.00022973115846980363}, {"id": 696, "seek": 435248, "start": 4366.32, "end": 4370.4, + "text": " You go in and you want to do a somatic search example, that''s a Python + notebook.", "tokens": [51056, 509, 352, 294, 293, 291, 528, 281, 360, 257, 3307, + 2399, 3164, 1365, 11, 300, 311, 257, 15329, 21060, 13, 51260], "temperature": 0.0, + "avg_logprob": -0.13662703335285187, "compression_ratio": 1.8091603053435115, "no_speech_prob": + 0.00022973115846980363}, {"id": 697, "seek": 435248, "start": 4371.759999999999, + "end": 4376.08, "text": " I''m not crazy about Python notebooks. 
I think Python + notebooks are good for illustrating like", "tokens": [51328, 286, 478, 406, 3219, + 466, 15329, 43782, 13, 286, 519, 15329, 43782, 366, 665, 337, 8490, 8754, 411, 51544], + "temperature": 0.0, "avg_logprob": -0.13662703335285187, "compression_ratio": 1.8091603053435115, + "no_speech_prob": 0.00022973115846980363}, {"id": 698, "seek": 435248, "start": + 4376.08, "end": 4381.759999999999, "text": " ideas and sketches, uh, for papers, + but it''s really hard for me to look at a Python notebook and say,", "tokens": [51544, + 3487, 293, 34547, 11, 2232, 11, 337, 10577, 11, 457, 309, 311, 534, 1152, 337, 385, + 281, 574, 412, 257, 15329, 21060, 293, 584, 11, 51828], "temperature": 0.0, "avg_logprob": + -0.13662703335285187, "compression_ratio": 1.8091603053435115, "no_speech_prob": + 0.00022973115846980363}, {"id": 699, "seek": 438176, "start": 4381.76, "end": 4387.52, + "text": " here''s how I make this into a working application. Um, it doesn''t translate + well because the,", "tokens": [50364, 510, 311, 577, 286, 652, 341, 666, 257, 1364, + 3861, 13, 3301, 11, 309, 1177, 380, 13799, 731, 570, 264, 11, 50652], "temperature": + 0.0, "avg_logprob": -0.10886617986167349, "compression_ratio": 1.7765567765567765, + "no_speech_prob": 0.0005089626647531986}, {"id": 700, "seek": 438176, "start": 4387.52, + "end": 4390.88, "text": " the architecture isn''t there. It''s a bunch of cells + that you run in order. That''s not how,", "tokens": [50652, 264, 9482, 1943, 380, + 456, 13, 467, 311, 257, 3840, 295, 5438, 300, 291, 1190, 294, 1668, 13, 663, 311, + 406, 577, 11, 50820], "temperature": 0.0, "avg_logprob": -0.10886617986167349, "compression_ratio": + 1.7765567765567765, "no_speech_prob": 0.0005089626647531986}, {"id": 701, "seek": + 438176, "start": 4391.52, "end": 4399.04, "text": " you know, real world applications + work. 
So the idea is to just get these tools and get these ideas", "tokens": [50852, + 291, 458, 11, 957, 1002, 5821, 589, 13, 407, 264, 1558, 307, 281, 445, 483, 613, + 3873, 293, 483, 613, 3487, 51228], "temperature": 0.0, "avg_logprob": -0.10886617986167349, + "compression_ratio": 1.7765567765567765, "no_speech_prob": 0.0005089626647531986}, + {"id": 702, "seek": 438176, "start": 4399.04, "end": 4403.52, "text": " and capabilities + out into the hands of a lot of other people who want to be able to use this stuff", + "tokens": [51228, 293, 10862, 484, 666, 264, 2377, 295, 257, 688, 295, 661, 561, + 567, 528, 281, 312, 1075, 281, 764, 341, 1507, 51452], "temperature": 0.0, "avg_logprob": + -0.10886617986167349, "compression_ratio": 1.7765567765567765, "no_speech_prob": + 0.0005089626647531986}, {"id": 703, "seek": 438176, "start": 4403.52, "end": 4407.84, + "text": " and are not familiar with Python, they''re not familiar with NLP, but + they want to be able to use this,", "tokens": [51452, 293, 366, 406, 4963, 365, + 15329, 11, 436, 434, 406, 4963, 365, 426, 45196, 11, 457, 436, 528, 281, 312, 1075, + 281, 764, 341, 11, 51668], "temperature": 0.0, "avg_logprob": -0.10886617986167349, + "compression_ratio": 1.7765567765567765, "no_speech_prob": 0.0005089626647531986}, + {"id": 704, "seek": 440784, "start": 4407.84, "end": 4413.04, "text": " uh, this + new technology because they might have a business problem to try to solve.", "tokens": + [50364, 2232, 11, 341, 777, 2899, 570, 436, 1062, 362, 257, 1606, 1154, 281, 853, + 281, 5039, 13, 50624], "temperature": 0.0, "avg_logprob": -0.18054136899438236, + "compression_ratio": 1.5378486055776892, "no_speech_prob": 0.008108616806566715}, + {"id": 705, "seek": 440784, "start": 4413.04, "end": 4418.16, "text": " So you''re + thinking actually about engineers who are day to day productizing their code and + thinking,", "tokens": [50624, 407, 291, 434, 1953, 767, 466, 11955, 567, 366, 786, + 281, 786, 1674, 3319, 641, 3089, 293, 
1953, 11, 50880], "temperature": 0.0, "avg_logprob": + -0.18054136899438236, "compression_ratio": 1.5378486055776892, "no_speech_prob": + 0.008108616806566715}, {"id": 706, "seek": 440784, "start": 4418.16, "end": 4425.52, + "text": " okay, yeah, I need a embedding layer, but I don''t care about notebooks. + I''m not a Pythonista or whatever.", "tokens": [50880, 1392, 11, 1338, 11, 286, + 643, 257, 12240, 3584, 4583, 11, 457, 286, 500, 380, 1127, 466, 43782, 13, 286, + 478, 406, 257, 15329, 5236, 420, 2035, 13, 51248], "temperature": 0.0, "avg_logprob": + -0.18054136899438236, "compression_ratio": 1.5378486055776892, "no_speech_prob": + 0.008108616806566715}, {"id": 707, "seek": 440784, "start": 4425.52, "end": 4433.28, + "text": " So, you know, just give me the tool. Exactly. Yeah, that''s fantastic. + I mean, and by the tools,", "tokens": [51248, 407, 11, 291, 458, 11, 445, 976, 385, + 264, 2290, 13, 7587, 13, 865, 11, 300, 311, 5456, 13, 286, 914, 11, 293, 538, 264, + 3873, 11, 51636], "temperature": 0.0, "avg_logprob": -0.18054136899438236, "compression_ratio": + 1.5378486055776892, "no_speech_prob": 0.008108616806566715}, {"id": 708, "seek": + 443328, "start": 4433.28, "end": 4437.759999999999, "text": " you also like disclose + something like ahead of time with me that, to me, that, um, you,", "tokens": [50364, + 291, 611, 411, 36146, 746, 411, 2286, 295, 565, 365, 385, 300, 11, 281, 385, 11, + 300, 11, 1105, 11, 291, 11, 50588], "temperature": 0.0, "avg_logprob": -0.13792694755222487, + "compression_ratio": 1.7692307692307692, "no_speech_prob": 0.00717752892524004}, + {"id": 709, "seek": 443328, "start": 4438.32, "end": 4444.16, "text": " like one + of the overarching goals for you is to develop as many tools for the vector search", + "tokens": [50616, 411, 472, 295, 264, 45501, 5493, 337, 291, 307, 281, 1499, 382, + 867, 3873, 337, 264, 8062, 3164, 50908], "temperature": 0.0, "avg_logprob": -0.13792694755222487, + "compression_ratio": 1.7692307692307692, 
"no_speech_prob": 0.00717752892524004}, + {"id": 710, "seek": 443328, "start": 4444.16, "end": 4449.84, "text": " community + as possible. And like some of the tools you mentioned, like go beyond, you know,", + "tokens": [50908, 1768, 382, 1944, 13, 400, 411, 512, 295, 264, 3873, 291, 2835, + 11, 411, 352, 4399, 11, 291, 458, 11, 51192], "temperature": 0.0, "avg_logprob": + -0.13792694755222487, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.00717752892524004}, {"id": 711, "seek": 443328, "start": 4449.84, "end": 4455.84, + "text": " pure engineering components, like connectors, you said, uh, maybe like + fine tuning a model or", "tokens": [51192, 6075, 7043, 6677, 11, 411, 31865, 11, + 291, 848, 11, 2232, 11, 1310, 411, 2489, 15164, 257, 2316, 420, 51492], "temperature": + 0.0, "avg_logprob": -0.13792694755222487, "compression_ratio": 1.7692307692307692, + "no_speech_prob": 0.00717752892524004}, {"id": 712, "seek": 443328, "start": 4455.84, + "end": 4460.719999999999, "text": " something of that sort, which at that point, + I think you''re stepping, uh, on the ground of, you", "tokens": [51492, 746, 295, + 300, 1333, 11, 597, 412, 300, 935, 11, 286, 519, 291, 434, 16821, 11, 2232, 11, + 322, 264, 2727, 295, 11, 291, 51736], "temperature": 0.0, "avg_logprob": -0.13792694755222487, + "compression_ratio": 1.7692307692307692, "no_speech_prob": 0.00717752892524004}, + {"id": 713, "seek": 446072, "start": 4460.88, "end": 4467.360000000001, "text": + " know, other startups like Gina and, um, you know, deep set and so on. 
Do you feel + that way or do you", "tokens": [50372, 458, 11, 661, 28041, 411, 34711, 293, 11, + 1105, 11, 291, 458, 11, 2452, 992, 293, 370, 322, 13, 1144, 291, 841, 300, 636, + 420, 360, 291, 50696], "temperature": 0.0, "avg_logprob": -0.13885197146185513, + "compression_ratio": 1.6260504201680672, "no_speech_prob": 0.0015320490347221494}, + {"id": 714, "seek": 446072, "start": 4467.360000000001, "end": 4471.52, "text": + " not concern yourself with those and you''re just thinking, okay, what''s missing + in the field? I''m", "tokens": [50696, 406, 3136, 1803, 365, 729, 293, 291, 434, + 445, 1953, 11, 1392, 11, 437, 311, 5361, 294, 264, 2519, 30, 286, 478, 50904], "temperature": + 0.0, "avg_logprob": -0.13885197146185513, "compression_ratio": 1.6260504201680672, + "no_speech_prob": 0.0015320490347221494}, {"id": 715, "seek": 446072, "start": 4471.52, + "end": 4480.56, "text": " going to add it. I''m going to open source it. Yeah, same. + So, uh, deep set is like, it''s all Python.", "tokens": [50904, 516, 281, 909, 309, + 13, 286, 478, 516, 281, 1269, 4009, 309, 13, 865, 11, 912, 13, 407, 11, 2232, 11, + 2452, 992, 307, 411, 11, 309, 311, 439, 15329, 13, 51356], "temperature": 0.0, "avg_logprob": + -0.13885197146185513, "compression_ratio": 1.6260504201680672, "no_speech_prob": + 0.0015320490347221494}, {"id": 716, "seek": 446072, "start": 4480.56, "end": 4486.4800000000005, + "text": " Again, um, Gina, I think is a lot of Python, right? I''m not as familiar + with Gina. Yeah,", "tokens": [51356, 3764, 11, 1105, 11, 34711, 11, 286, 519, 307, + 257, 688, 295, 15329, 11, 558, 30, 286, 478, 406, 382, 4963, 365, 34711, 13, 865, + 11, 51652], "temperature": 0.0, "avg_logprob": -0.13885197146185513, "compression_ratio": + 1.6260504201680672, "no_speech_prob": 0.0015320490347221494}, {"id": 717, "seek": + 448648, "start": 4486.48, "end": 4494.0, "text": " they are, they''re Python mostly. + Yeah. 
It''s, it''s, there''s a huge opportunity, uh, to make", "tokens": [50364, + 436, 366, 11, 436, 434, 15329, 5240, 13, 865, 13, 467, 311, 11, 309, 311, 11, 456, + 311, 257, 2603, 2650, 11, 2232, 11, 281, 652, 50740], "temperature": 0.0, "avg_logprob": + -0.18279426976254112, "compression_ratio": 1.4404145077720207, "no_speech_prob": + 0.00047038114280439913}, {"id": 718, "seek": 448648, "start": 4494.0, "end": 4507.12, + "text": " these tools available to non Python, um, stacks. And I don''t, I, before + I started working in", "tokens": [50740, 613, 3873, 2435, 281, 2107, 15329, 11, + 1105, 11, 30792, 13, 400, 286, 500, 380, 11, 286, 11, 949, 286, 1409, 1364, 294, + 51396], "temperature": 0.0, "avg_logprob": -0.18279426976254112, "compression_ratio": + 1.4404145077720207, "no_speech_prob": 0.00047038114280439913}, {"id": 719, "seek": + 448648, "start": 4507.12, "end": 4512.799999999999, "text": " machine learning, + I''ve never even considered Python as, as an application framework. You know,", + "tokens": [51396, 3479, 2539, 11, 286, 600, 1128, 754, 4888, 15329, 382, 11, 382, + 364, 3861, 8388, 13, 509, 458, 11, 51680], "temperature": 0.0, "avg_logprob": -0.18279426976254112, + "compression_ratio": 1.4404145077720207, "no_speech_prob": 0.00047038114280439913}, + {"id": 720, "seek": 451280, "start": 4512.88, "end": 4519.360000000001, "text": + " people are using like Django, flash and stuff like that. Um, but for me, it was + like, uh,", "tokens": [50368, 561, 366, 1228, 411, 33464, 17150, 11, 7319, 293, + 1507, 411, 300, 13, 3301, 11, 457, 337, 385, 11, 309, 390, 411, 11, 2232, 11, 50692], + "temperature": 0.0, "avg_logprob": -0.1309790228405138, "compression_ratio": 1.763157894736842, + "no_speech_prob": 0.0004544277908280492}, {"id": 721, "seek": 451280, "start": 4519.360000000001, + "end": 4523.6, "text": " it''s not that it didn''t take it seriously. 
I just felt + it wasn''t, it wasn''t something that", "tokens": [50692, 309, 311, 406, 300, 309, + 994, 380, 747, 309, 6638, 13, 286, 445, 2762, 309, 2067, 380, 11, 309, 2067, 380, + 746, 300, 50904], "temperature": 0.0, "avg_logprob": -0.1309790228405138, "compression_ratio": + 1.763157894736842, "no_speech_prob": 0.0004544277908280492}, {"id": 722, "seek": + 451280, "start": 4524.24, "end": 4530.400000000001, "text": " I would have chosen + to use aside from, you know, a lot of other, a lot of other stacks. So there", "tokens": + [50936, 286, 576, 362, 8614, 281, 764, 7359, 490, 11, 291, 458, 11, 257, 688, 295, + 661, 11, 257, 688, 295, 661, 30792, 13, 407, 456, 51244], "temperature": 0.0, "avg_logprob": + -0.1309790228405138, "compression_ratio": 1.763157894736842, "no_speech_prob": 0.0004544277908280492}, + {"id": 723, "seek": 451280, "start": 4530.400000000001, "end": 4534.88, "text": + " is so many other teams out there that want to be able to use these things, but + now they have to,", "tokens": [51244, 307, 370, 867, 661, 5491, 484, 456, 300, 528, + 281, 312, 1075, 281, 764, 613, 721, 11, 457, 586, 436, 362, 281, 11, 51468], "temperature": + 0.0, "avg_logprob": -0.1309790228405138, "compression_ratio": 1.763157894736842, + "no_speech_prob": 0.0004544277908280492}, {"id": 724, "seek": 451280, "start": 4534.88, + "end": 4541.52, "text": " oh, Python, Python, Python, it''s nonstop. So we got to + break out of that somehow. Um, and, uh,", "tokens": [51468, 1954, 11, 15329, 11, + 15329, 11, 15329, 11, 309, 311, 2107, 13559, 13, 407, 321, 658, 281, 1821, 484, + 295, 300, 6063, 13, 3301, 11, 293, 11, 2232, 11, 51800], "temperature": 0.0, "avg_logprob": + -0.1309790228405138, "compression_ratio": 1.763157894736842, "no_speech_prob": 0.0004544277908280492}, + {"id": 725, "seek": 454152, "start": 4541.52, "end": 4546.0, "text": " I''m starting + with node because the JavaScript ecosystem is just absolutely enormous. 
I think", + "tokens": [50364, 286, 478, 2891, 365, 9984, 570, 264, 15778, 11311, 307, 445, 3122, + 11322, 13, 286, 519, 50588], "temperature": 0.0, "avg_logprob": -0.13762704166797324, + "compression_ratio": 1.7348484848484849, "no_speech_prob": 0.0015206891112029552}, + {"id": 726, "seek": 454152, "start": 4546.0, "end": 4550.400000000001, "text": " + people underestimate the size of the JavaScript ecosystem. If you''re in machine + learning and you''re", "tokens": [50588, 561, 35826, 264, 2744, 295, 264, 15778, + 11311, 13, 759, 291, 434, 294, 3479, 2539, 293, 291, 434, 50808], "temperature": + 0.0, "avg_logprob": -0.13762704166797324, "compression_ratio": 1.7348484848484849, + "no_speech_prob": 0.0015206891112029552}, {"id": 727, "seek": 454152, "start": 4550.400000000001, + "end": 4558.160000000001, "text": " listening to this podcast right now, like there, + there are like maybe a hundred people for every one", "tokens": [50808, 4764, 281, + 341, 7367, 558, 586, 11, 411, 456, 11, 456, 366, 411, 1310, 257, 3262, 561, 337, + 633, 472, 51196], "temperature": 0.0, "avg_logprob": -0.13762704166797324, "compression_ratio": + 1.7348484848484849, "no_speech_prob": 0.0015206891112029552}, {"id": 728, "seek": + 454152, "start": 4558.160000000001, "end": 4566.080000000001, "text": " of you who''s + using JavaScript in, in, for applications. Like that''s how big it is. Um, so that, + uh,", "tokens": [51196, 295, 291, 567, 311, 1228, 15778, 294, 11, 294, 11, 337, + 5821, 13, 1743, 300, 311, 577, 955, 309, 307, 13, 3301, 11, 370, 300, 11, 2232, + 11, 51592], "temperature": 0.0, "avg_logprob": -0.13762704166797324, "compression_ratio": + 1.7348484848484849, "no_speech_prob": 0.0015206891112029552}, {"id": 729, "seek": + 454152, "start": 4566.080000000001, "end": 4568.72, "text": " I''m starting there. 
+ I just know it''s just an enormous community.", "tokens": [51592, 286, 478, 2891, + 456, 13, 286, 445, 458, 309, 311, 445, 364, 11322, 1768, 13, 51724], "temperature": + 0.0, "avg_logprob": -0.13762704166797324, "compression_ratio": 1.7348484848484849, + "no_speech_prob": 0.0015206891112029552}, {"id": 730, "seek": 456872, "start": 4569.4400000000005, + "end": 4574.8, "text": " And not only for front end development, you know, we need + to emphasize this because you also have", "tokens": [50400, 400, 406, 787, 337, + 1868, 917, 3250, 11, 291, 458, 11, 321, 643, 281, 16078, 341, 570, 291, 611, 362, + 50668], "temperature": 0.0, "avg_logprob": -0.19019221296214095, "compression_ratio": + 1.6074380165289257, "no_speech_prob": 0.01118532195687294}, {"id": 731, "seek": + 456872, "start": 4574.8, "end": 4581.12, "text": " server side JavaScript, like + Node.js and others. And it''s, it''s huge. And a lot of software,", "tokens": [50668, + 7154, 1252, 15778, 11, 411, 38640, 13, 25530, 293, 2357, 13, 400, 309, 311, 11, + 309, 311, 2603, 13, 400, 257, 688, 295, 4722, 11, 50984], "temperature": 0.0, "avg_logprob": + -0.19019221296214095, "compression_ratio": 1.6074380165289257, "no_speech_prob": + 0.01118532195687294}, {"id": 732, "seek": 456872, "start": 4582.0, "end": 4589.360000000001, + "text": " which is kind of the, the middleware between your super cool search engine + or your, your vector database,", "tokens": [51028, 597, 307, 733, 295, 264, 11, + 264, 2808, 3039, 1296, 428, 1687, 1627, 3164, 2848, 420, 428, 11, 428, 8062, 8149, + 11, 51396], "temperature": 0.0, "avg_logprob": -0.19019221296214095, "compression_ratio": + 1.6074380165289257, "no_speech_prob": 0.01118532195687294}, {"id": 733, "seek": + 456872, "start": 4590.0, "end": 4594.96, "text": " and the front end, you have a + lot of middleware written in node because it''s so much easier.", "tokens": [51428, + 293, 264, 1868, 917, 11, 291, 362, 257, 688, 295, 2808, 3039, 3720, 294, 9984, 570, + 309, 311, 
370, 709, 3571, 13, 51676], "temperature": 0.0, "avg_logprob": -0.19019221296214095, + "compression_ratio": 1.6074380165289257, "no_speech_prob": 0.01118532195687294}, + {"id": 734, "seek": 459496, "start": 4595.68, "end": 4600.24, "text": " Oh, well, + not easy. I don''t know. Is it easier? But I think it''s just the pervasive, you + know,", "tokens": [50400, 876, 11, 731, 11, 406, 1858, 13, 286, 500, 380, 458, 13, + 1119, 309, 3571, 30, 583, 286, 519, 309, 311, 445, 264, 680, 39211, 11, 291, 458, + 11, 50628], "temperature": 0.0, "avg_logprob": -0.15575005326952254, "compression_ratio": + 1.6394849785407726, "no_speech_prob": 0.006392363924533129}, {"id": 735, "seek": + 459496, "start": 4600.24, "end": 4606.0, "text": " nature of JavaScript. Yeah, I + don''t know if I''d say node is easier than Python. I think it''s,", "tokens": [50628, + 3687, 295, 15778, 13, 865, 11, 286, 500, 380, 458, 498, 286, 1116, 584, 9984, 307, + 3571, 813, 15329, 13, 286, 519, 309, 311, 11, 50916], "temperature": 0.0, "avg_logprob": + -0.15575005326952254, "compression_ratio": 1.6394849785407726, "no_speech_prob": + 0.006392363924533129}, {"id": 736, "seek": 459496, "start": 4606.8, "end": 4611.12, + "text": " you know, I think they''re similar actually in a lot of ways. The syntax + is a little bit different,", "tokens": [50956, 291, 458, 11, 286, 519, 436, 434, + 2531, 767, 294, 257, 688, 295, 2098, 13, 440, 28431, 307, 257, 707, 857, 819, 11, + 51172], "temperature": 0.0, "avg_logprob": -0.15575005326952254, "compression_ratio": + 1.6394849785407726, "no_speech_prob": 0.006392363924533129}, {"id": 737, "seek": + 459496, "start": 4611.12, "end": 4620.64, "text": " you know, curly braces versus + tabs. 
But, uh, I think that node, we''re, we''re getting away from", "tokens": [51172, + 291, 458, 11, 32066, 41537, 5717, 20743, 13, 583, 11, 2232, 11, 286, 519, 300, 9984, + 11, 321, 434, 11, 321, 434, 1242, 1314, 490, 51648], "temperature": 0.0, "avg_logprob": + -0.15575005326952254, "compression_ratio": 1.6394849785407726, "no_speech_prob": + 0.006392363924533129}, {"id": 738, "seek": 462064, "start": 4620.64, "end": 4625.76, + "text": " vectors right now. But node started because the JavaScript was the language + of the web.", "tokens": [50364, 18875, 558, 586, 13, 583, 9984, 1409, 570, 264, + 15778, 390, 264, 2856, 295, 264, 3670, 13, 50620], "temperature": 0.0, "avg_logprob": + -0.16725805193878884, "compression_ratio": 1.5896226415094339, "no_speech_prob": + 0.0004629429313354194}, {"id": 739, "seek": 462064, "start": 4626.72, "end": 4632.320000000001, + "text": " And people didn''t want to learn another language to also write back end + code.", "tokens": [50668, 400, 561, 994, 380, 528, 281, 1466, 1071, 2856, 281, 611, + 2464, 646, 917, 3089, 13, 50948], "temperature": 0.0, "avg_logprob": -0.16725805193878884, + "compression_ratio": 1.5896226415094339, "no_speech_prob": 0.0004629429313354194}, + {"id": 740, "seek": 462064, "start": 4634.0, "end": 4639.200000000001, "text": " + You know, you were using Pearl, right? So a lot of the, there was a lot of time + where it was like", "tokens": [51032, 509, 458, 11, 291, 645, 1228, 24639, 11, 558, + 30, 407, 257, 688, 295, 264, 11, 456, 390, 257, 688, 295, 565, 689, 309, 390, 411, + 51292], "temperature": 0.0, "avg_logprob": -0.16725805193878884, "compression_ratio": + 1.5896226415094339, "no_speech_prob": 0.0004629429313354194}, {"id": 741, "seek": + 462064, "start": 4639.200000000001, "end": 4643.68, "text": " Pearl, PHP, plus JavaScript, + right? 
There was that whole world out there.", "tokens": [51292, 24639, 11, 47298, + 11, 1804, 15778, 11, 558, 30, 821, 390, 300, 1379, 1002, 484, 456, 13, 51516], "temperature": + 0.0, "avg_logprob": -0.16725805193878884, "compression_ratio": 1.5896226415094339, + "no_speech_prob": 0.0004629429313354194}, {"id": 742, "seek": 464368, "start": 4644.56, + "end": 4651.280000000001, "text": " Um, so that''s where node came from. It came + from the web, the web front end. So that''s,", "tokens": [50408, 3301, 11, 370, + 300, 311, 689, 9984, 1361, 490, 13, 467, 1361, 490, 264, 3670, 11, 264, 3670, 1868, + 917, 13, 407, 300, 311, 11, 50744], "temperature": 0.0, "avg_logprob": -0.21919116795620072, + "compression_ratio": 1.6923076923076923, "no_speech_prob": 0.0033956628758460283}, + {"id": 743, "seek": 464368, "start": 4651.92, "end": 4655.92, "text": " web front + end is enormous. And they all, and a lot of them just adopted node. And the node + had", "tokens": [50776, 3670, 1868, 917, 307, 11322, 13, 400, 436, 439, 11, 293, + 257, 688, 295, 552, 445, 12175, 9984, 13, 400, 264, 9984, 632, 50976], "temperature": + 0.0, "avg_logprob": -0.21919116795620072, "compression_ratio": 1.6923076923076923, + "no_speech_prob": 0.0033956628758460283}, {"id": 744, "seek": 464368, "start": 4655.92, + "end": 4663.360000000001, "text": " its own hype cycle, like 2010 through 2014 was + like maybe nodes heyday where it just was like", "tokens": [50976, 1080, 1065, 24144, + 6586, 11, 411, 9657, 807, 8227, 390, 411, 1310, 13891, 4177, 810, 689, 309, 445, + 390, 411, 51348], "temperature": 0.0, "avg_logprob": -0.21919116795620072, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.0033956628758460283}, {"id": 745, "seek": + 464368, "start": 4663.360000000001, "end": 4669.68, "text": " through the roof. + Everything was node.js. Um, it was going crazy. 
Now it''s all, now it''s all, uh,", + "tokens": [51348, 807, 264, 8418, 13, 5471, 390, 9984, 13, 25530, 13, 3301, 11, + 309, 390, 516, 3219, 13, 823, 309, 311, 439, 11, 586, 309, 311, 439, 11, 2232, 11, + 51664], "temperature": 0.0, "avg_logprob": -0.21919116795620072, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.0033956628758460283}, {"id": 746, "seek": + 466968, "start": 4669.68, "end": 4675.4400000000005, "text": " uh, you know, machine + learning and AI. A lot of people got involved in this, in this world.", "tokens": + [50364, 2232, 11, 291, 458, 11, 3479, 2539, 293, 7318, 13, 316, 688, 295, 561, 658, + 3288, 294, 341, 11, 294, 341, 1002, 13, 50652], "temperature": 0.0, "avg_logprob": + -0.16469749296554412, "compression_ratio": 1.60352422907489, "no_speech_prob": 0.003676739986985922}, + {"id": 747, "seek": 466968, "start": 4676.0, "end": 4681.280000000001, "text": " + But there''s still a huge, a huge section of the world that''s written on top of + node", "tokens": [50680, 583, 456, 311, 920, 257, 2603, 11, 257, 2603, 3541, 295, + 264, 1002, 300, 311, 3720, 322, 1192, 295, 9984, 50944], "temperature": 0.0, "avg_logprob": + -0.16469749296554412, "compression_ratio": 1.60352422907489, "no_speech_prob": 0.003676739986985922}, + {"id": 748, "seek": 466968, "start": 4681.84, "end": 4687.4400000000005, "text": + " from applications that started in, in, you know, the early 2010s. And it evolved + ever since.", "tokens": [50972, 490, 5821, 300, 1409, 294, 11, 294, 11, 291, 458, + 11, 264, 2440, 9657, 82, 13, 400, 309, 14178, 1562, 1670, 13, 51252], "temperature": + 0.0, "avg_logprob": -0.16469749296554412, "compression_ratio": 1.60352422907489, + "no_speech_prob": 0.003676739986985922}, {"id": 749, "seek": 466968, "start": 4688.8, + "end": 4694.4800000000005, "text": " Yeah. But back to tools. 
Like, so you said + in the early notes you shared, like you also want to", "tokens": [51320, 865, 13, + 583, 646, 281, 3873, 13, 1743, 11, 370, 291, 848, 294, 264, 2440, 5570, 291, 5507, + 11, 411, 291, 611, 528, 281, 51604], "temperature": 0.0, "avg_logprob": -0.16469749296554412, + "compression_ratio": 1.60352422907489, "no_speech_prob": 0.003676739986985922}, + {"id": 750, "seek": 469448, "start": 4694.5599999999995, "end": 4700.16, "text": + " address some of the, uh, problem solved problems like in model fine tuning or + some other like pipeline", "tokens": [50368, 2985, 512, 295, 264, 11, 2232, 11, + 1154, 13041, 2740, 411, 294, 2316, 2489, 15164, 420, 512, 661, 411, 15517, 50648], + "temperature": 0.0, "avg_logprob": -0.18292916010296534, "compression_ratio": 1.7509578544061302, + "no_speech_prob": 0.00938659068197012}, {"id": 751, "seek": 469448, "start": 4700.16, + "end": 4707.04, "text": " steps that, that may be precede the, uh, the, the embedding + layer that you have now addressed with", "tokens": [50648, 4439, 300, 11, 300, 815, + 312, 16969, 68, 264, 11, 2232, 11, 264, 11, 264, 12240, 3584, 4583, 300, 291, 362, + 586, 13847, 365, 50992], "temperature": 0.0, "avg_logprob": -0.18292916010296534, + "compression_ratio": 1.7509578544061302, "no_speech_prob": 0.00938659068197012}, + {"id": 752, "seek": 469448, "start": 4707.04, "end": 4710.08, "text": " MITIS. So + what are your thoughts there? What, what do you think is missing?", "tokens": [50992, + 13100, 2343, 13, 407, 437, 366, 428, 4598, 456, 30, 708, 11, 437, 360, 291, 519, + 307, 5361, 30, 51144], "temperature": 0.0, "avg_logprob": -0.18292916010296534, + "compression_ratio": 1.7509578544061302, "no_speech_prob": 0.00938659068197012}, + {"id": 753, "seek": 469448, "start": 4712.16, "end": 4716.4, "text": " I don''t, + yeah, I don''t know if I''m going to get into actual model tuning. 
I think that,", + "tokens": [51248, 286, 500, 380, 11, 1338, 11, 286, 500, 380, 458, 498, 286, 478, + 516, 281, 483, 666, 3539, 2316, 15164, 13, 286, 519, 300, 11, 51460], "temperature": + 0.0, "avg_logprob": -0.18292916010296534, "compression_ratio": 1.7509578544061302, + "no_speech_prob": 0.00938659068197012}, {"id": 754, "seek": 469448, "start": 4717.839999999999, + "end": 4722.879999999999, "text": " first of all, I''m not, I''m not as good as, + I''m not as good training models as other people.", "tokens": [51532, 700, 295, + 439, 11, 286, 478, 406, 11, 286, 478, 406, 382, 665, 382, 11, 286, 478, 406, 382, + 665, 3097, 5245, 382, 661, 561, 13, 51784], "temperature": 0.0, "avg_logprob": -0.18292916010296534, + "compression_ratio": 1.7509578544061302, "no_speech_prob": 0.00938659068197012}, + {"id": 755, "seek": 472288, "start": 4722.88, "end": 4727.76, "text": " There are + other people that are suited to train models. But I do think there''s a lot of other", + "tokens": [50364, 821, 366, 661, 561, 300, 366, 24736, 281, 3847, 5245, 13, 583, + 286, 360, 519, 456, 311, 257, 688, 295, 661, 50608], "temperature": 0.0, "avg_logprob": + -0.18495378040132068, "compression_ratio": 1.5826446280991735, "no_speech_prob": + 0.00019972323207184672}, {"id": 756, "seek": 472288, "start": 4727.76, "end": 4736.16, + "text": " information that is lacking in model in the, the ML ops world and vector, + and vector search.", "tokens": [50608, 1589, 300, 307, 20889, 294, 2316, 294, 264, + 11, 264, 21601, 44663, 1002, 293, 8062, 11, 293, 8062, 3164, 13, 51028], "temperature": + 0.0, "avg_logprob": -0.18495378040132068, "compression_ratio": 1.5826446280991735, + "no_speech_prob": 0.00019972323207184672}, {"id": 757, "seek": 472288, "start": + 4737.4400000000005, "end": 4742.32, "text": " One of them is just like, well, how + similar are these things, right? 
What, what''s the distribution", "tokens": [51092, + 1485, 295, 552, 307, 445, 411, 11, 731, 11, 577, 2531, 366, 613, 721, 11, 558, 30, + 708, 11, 437, 311, 264, 7316, 51336], "temperature": 0.0, "avg_logprob": -0.18495378040132068, + "compression_ratio": 1.5826446280991735, "no_speech_prob": 0.00019972323207184672}, + {"id": 758, "seek": 472288, "start": 4742.32, "end": 4750.4800000000005, "text": + " of similarities? Um, I think we, V8 said they, they do support, uh, some of that + and, uh, Vespas,", "tokens": [51336, 295, 24197, 30, 3301, 11, 286, 519, 321, 11, + 691, 23, 848, 436, 11, 436, 360, 1406, 11, 2232, 11, 512, 295, 300, 293, 11, 2232, + 11, 691, 13361, 296, 11, 51744], "temperature": 0.0, "avg_logprob": -0.18495378040132068, + "compression_ratio": 1.5826446280991735, "no_speech_prob": 0.00019972323207184672}, + {"id": 759, "seek": 475048, "start": 4750.48, "end": 4757.2, "text": " what''s some + of that in logging. But, um, I don''t know about Pankoam, uh, I''m pretty sure, + rust,", "tokens": [50364, 437, 311, 512, 295, 300, 294, 27991, 13, 583, 11, 1105, + 11, 286, 500, 380, 458, 466, 430, 657, 78, 335, 11, 2232, 11, 286, 478, 1238, 988, + 11, 15259, 11, 50700], "temperature": 0.0, "avg_logprob": -0.2034935992697011, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.00035570000181905925}, {"id": 760, "seek": + 475048, "start": 4757.2, "end": 4762.959999999999, "text": " uh, I''m sure, pretty + sure quadrant does not. So it''s like, what do I mean by this? 
It''s like, if I,", + "tokens": [50700, 2232, 11, 286, 478, 988, 11, 1238, 988, 46856, 775, 406, 13, 407, + 309, 311, 411, 11, 437, 360, 286, 914, 538, 341, 30, 467, 311, 411, 11, 498, 286, + 11, 50988], "temperature": 0.0, "avg_logprob": -0.2034935992697011, "compression_ratio": + 1.6167400881057268, "no_speech_prob": 0.00035570000181905925}, {"id": 761, "seek": + 475048, "start": 4762.959999999999, "end": 4772.48, "text": " if I, uh, have a vector + and I get, and I do a query against, um, uh, quadrant, for example,", "tokens": + [50988, 498, 286, 11, 2232, 11, 362, 257, 8062, 293, 286, 483, 11, 293, 286, 360, + 257, 14581, 1970, 11, 1105, 11, 2232, 11, 46856, 11, 337, 1365, 11, 51464], "temperature": + 0.0, "avg_logprob": -0.2034935992697011, "compression_ratio": 1.6167400881057268, + "no_speech_prob": 0.00035570000181905925}, {"id": 762, "seek": 475048, "start": + 4772.48, "end": 4776.24, "text": " I get back a list of documents that are nearest + neighbors and the similarities.", "tokens": [51464, 286, 483, 646, 257, 1329, 295, + 8512, 300, 366, 23831, 12512, 293, 264, 24197, 13, 51652], "temperature": 0.0, "avg_logprob": + -0.2034935992697011, "compression_ratio": 1.6167400881057268, "no_speech_prob": + 0.00035570000181905925}, {"id": 763, "seek": 477624, "start": 4777.2, "end": 4782.719999999999, + "text": " Well, where does that fit? Like, if I get it back in the first document + is like point four or", "tokens": [50412, 1042, 11, 689, 775, 300, 3318, 30, 1743, + 11, 498, 286, 483, 309, 646, 294, 264, 700, 4166, 307, 411, 935, 1451, 420, 50688], + "temperature": 0.0, "avg_logprob": -0.1438651597204287, "compression_ratio": 1.7470817120622568, + "no_speech_prob": 0.003709448967128992}, {"id": 764, "seek": 477624, "start": 4782.719999999999, + "end": 4789.04, "text": " not similar, right? Is that good? Is that bad? 
Like, what + are the, what''s, what''s, what''s real,", "tokens": [50688, 406, 2531, 11, 558, + 30, 1119, 300, 665, 30, 1119, 300, 1578, 30, 1743, 11, 437, 366, 264, 11, 437, 311, + 11, 437, 311, 11, 437, 311, 957, 11, 51004], "temperature": 0.0, "avg_logprob": + -0.1438651597204287, "compression_ratio": 1.7470817120622568, "no_speech_prob": + 0.003709448967128992}, {"id": 765, "seek": 477624, "start": 4789.04, "end": 4793.04, + "text": " what''s real good similar? Maybe, maybe the best similarities are like + point eight range.", "tokens": [51004, 437, 311, 957, 665, 2531, 30, 2704, 11, 1310, + 264, 1151, 24197, 366, 411, 935, 3180, 3613, 13, 51204], "temperature": 0.0, "avg_logprob": + -0.1438651597204287, "compression_ratio": 1.7470817120622568, "no_speech_prob": + 0.003709448967128992}, {"id": 766, "seek": 477624, "start": 4793.599999999999, "end": + 4798.96, "text": " So now I know that, well, in terms of my entire corpus and how + people usually query,", "tokens": [51232, 407, 586, 286, 458, 300, 11, 731, 11, + 294, 2115, 295, 452, 2302, 1181, 31624, 293, 577, 561, 2673, 14581, 11, 51500], + "temperature": 0.0, "avg_logprob": -0.1438651597204287, "compression_ratio": 1.7470817120622568, + "no_speech_prob": 0.003709448967128992}, {"id": 767, "seek": 477624, "start": 4798.96, + "end": 4804.16, "text": " this result is actually not that great. And there''s a + lot of questions to be answered", "tokens": [51500, 341, 1874, 307, 767, 406, 300, + 869, 13, 400, 456, 311, 257, 688, 295, 1651, 281, 312, 10103, 51760], "temperature": + 0.0, "avg_logprob": -0.1438651597204287, "compression_ratio": 1.7470817120622568, + "no_speech_prob": 0.003709448967128992}, {"id": 768, "seek": 480416, "start": 4805.12, + "end": 4810.08, "text": " around that stuff. And so I think that''s lacking in a + lot of ways. 
I don''t know if that''s the right", "tokens": [50412, 926, 300, 1507, + 13, 400, 370, 286, 519, 300, 311, 20889, 294, 257, 688, 295, 2098, 13, 286, 500, + 380, 458, 498, 300, 311, 264, 558, 50660], "temperature": 0.0, "avg_logprob": -0.14286590936615712, + "compression_ratio": 1.8122605363984674, "no_speech_prob": 0.00830679852515459}, + {"id": 769, "seek": 480416, "start": 4810.08, "end": 4814.5599999999995, "text": + " fit for Mighty though. I think there''s just external tools that, you know, I''m + kicking around.", "tokens": [50660, 3318, 337, 45874, 1673, 13, 286, 519, 456, 311, + 445, 8320, 3873, 300, 11, 291, 458, 11, 286, 478, 19137, 926, 13, 50884], "temperature": + 0.0, "avg_logprob": -0.14286590936615712, "compression_ratio": 1.8122605363984674, + "no_speech_prob": 0.00830679852515459}, {"id": 770, "seek": 480416, "start": 4814.5599999999995, + "end": 4821.2, "text": " All that stuff would be open source. I''m really interested + in, in Mighty being, uh, the area of", "tokens": [50884, 1057, 300, 1507, 576, 312, + 1269, 4009, 13, 286, 478, 534, 3102, 294, 11, 294, 45874, 885, 11, 2232, 11, 264, + 1859, 295, 51216], "temperature": 0.0, "avg_logprob": -0.14286590936615712, "compression_ratio": + 1.8122605363984674, "no_speech_prob": 0.00830679852515459}, {"id": 771, "seek": + 480416, "start": 4821.2, "end": 4826.0, "text": " the business and then all the + other stuff is open source to make things easier for people to use,", "tokens": + [51216, 264, 1606, 293, 550, 439, 264, 661, 1507, 307, 1269, 4009, 281, 652, 721, + 3571, 337, 561, 281, 764, 11, 51456], "temperature": 0.0, "avg_logprob": -0.14286590936615712, + "compression_ratio": 1.8122605363984674, "no_speech_prob": 0.00830679852515459}, + {"id": 772, "seek": 480416, "start": 4826.5599999999995, "end": 4834.08, "text": + " uh, these things. 
But yeah, there, there''s a lot, there''s a lot of stuff in + terms of", "tokens": [51484, 2232, 11, 613, 721, 13, 583, 1338, 11, 456, 11, 456, + 311, 257, 688, 11, 456, 311, 257, 688, 295, 1507, 294, 2115, 295, 51860], "temperature": + 0.0, "avg_logprob": -0.14286590936615712, "compression_ratio": 1.8122605363984674, + "no_speech_prob": 0.00830679852515459}, {"id": 773, "seek": 483408, "start": 4834.08, + "end": 4841.84, "text": " uh, in the MLObs world, there''s like model drift. It''s + like, well, I used, you know, if,", "tokens": [50364, 2232, 11, 294, 264, 21601, + 46, 929, 1002, 11, 456, 311, 411, 2316, 19699, 13, 467, 311, 411, 11, 731, 11, 286, + 1143, 11, 291, 458, 11, 498, 11, 50752], "temperature": 0.0, "avg_logprob": -0.175953077233356, + "compression_ratio": 1.5380434782608696, "no_speech_prob": 0.0010293223895132542}, + {"id": 774, "seek": 483408, "start": 4841.84, "end": 4848.72, "text": " let''s say + I have like 100, uh, 100 sentences, right? And I vectorized these against, you know,", + "tokens": [50752, 718, 311, 584, 286, 362, 411, 2319, 11, 2232, 11, 2319, 16579, + 11, 558, 30, 400, 286, 8062, 1602, 613, 1970, 11, 291, 458, 11, 51096], "temperature": + 0.0, "avg_logprob": -0.175953077233356, "compression_ratio": 1.5380434782608696, + "no_speech_prob": 0.0010293223895132542}, {"id": 775, "seek": 483408, "start": 4848.72, + "end": 4855.6, "text": " model one dot two dot three, right? And I got back, um, + I got back a list of, uh, vectors. Now I''ve", "tokens": [51096, 2316, 472, 5893, + 732, 5893, 1045, 11, 558, 30, 400, 286, 658, 646, 11, 1105, 11, 286, 658, 646, 257, + 1329, 295, 11, 2232, 11, 18875, 13, 823, 286, 600, 51440], "temperature": 0.0, "avg_logprob": + -0.175953077233356, "compression_ratio": 1.5380434782608696, "no_speech_prob": 0.0010293223895132542}, + {"id": 776, "seek": 485560, "start": 4855.68, "end": 4863.76, "text": " upgraded + my model. I have model one dot three dot eight, right? 
And now I, now I, uh, run + my test", "tokens": [50368, 24133, 452, 2316, 13, 286, 362, 2316, 472, 5893, 1045, + 5893, 3180, 11, 558, 30, 400, 586, 286, 11, 586, 286, 11, 2232, 11, 1190, 452, 1500, + 50772], "temperature": 0.0, "avg_logprob": -0.1391047349910146, "compression_ratio": + 1.641350210970464, "no_speech_prob": 0.0005055960500612855}, {"id": 777, "seek": + 485560, "start": 4863.76, "end": 4869.92, "text": " vectors, my test sentences through + and I get different vectors. Like how, how much has changed?", "tokens": [50772, + 18875, 11, 452, 1500, 16579, 807, 293, 286, 483, 819, 18875, 13, 1743, 577, 11, + 577, 709, 575, 3105, 30, 51080], "temperature": 0.0, "avg_logprob": -0.1391047349910146, + "compression_ratio": 1.641350210970464, "no_speech_prob": 0.0005055960500612855}, + {"id": 778, "seek": 485560, "start": 4869.92, "end": 4874.08, "text": " What''s + the difference there? So there''s this whole world around measuring model drift. + And there''s", "tokens": [51080, 708, 311, 264, 2649, 456, 30, 407, 456, 311, 341, + 1379, 1002, 926, 13389, 2316, 19699, 13, 400, 456, 311, 51288], "temperature": 0.0, + "avg_logprob": -0.1391047349910146, "compression_ratio": 1.641350210970464, "no_speech_prob": + 0.0005055960500612855}, {"id": 779, "seek": 485560, "start": 4874.08, "end": 4880.72, + "text": " some, there''s some interesting open source tools on this already. But + they''re written in Python.", "tokens": [51288, 512, 11, 456, 311, 512, 1880, 1269, + 4009, 3873, 322, 341, 1217, 13, 583, 436, 434, 3720, 294, 15329, 13, 51620], "temperature": + 0.0, "avg_logprob": -0.1391047349910146, "compression_ratio": 1.641350210970464, + "no_speech_prob": 0.0005055960500612855}, {"id": 780, "seek": 488072, "start": 4881.52, + "end": 4885.68, "text": " So now you have to use Python to do all this stuff. 
So + I''m trying to understand what,", "tokens": [50404, 407, 586, 291, 362, 281, 764, + 15329, 281, 360, 439, 341, 1507, 13, 407, 286, 478, 1382, 281, 1223, 437, 11, 50612], + "temperature": 0.0, "avg_logprob": -0.16797809190647575, "compression_ratio": 1.6217391304347826, + "no_speech_prob": 0.0015828986652195454}, {"id": 781, "seek": 488072, "start": 4886.8, + "end": 4893.84, "text": " what the tools, uh, what tools could be written that are + not in Python land. Um, that could expose", "tokens": [50668, 437, 264, 3873, 11, + 2232, 11, 437, 3873, 727, 312, 3720, 300, 366, 406, 294, 15329, 2117, 13, 3301, + 11, 300, 727, 19219, 51020], "temperature": 0.0, "avg_logprob": -0.16797809190647575, + "compression_ratio": 1.6217391304347826, "no_speech_prob": 0.0015828986652195454}, + {"id": 782, "seek": 488072, "start": 4893.84, "end": 4900.56, "text": " these statistics + and this important information to people who, um, you know, who don''t want to", + "tokens": [51020, 613, 12523, 293, 341, 1021, 1589, 281, 561, 567, 11, 1105, 11, + 291, 458, 11, 567, 500, 380, 528, 281, 51356], "temperature": 0.0, "avg_logprob": + -0.16797809190647575, "compression_ratio": 1.6217391304347826, "no_speech_prob": + 0.0015828986652195454}, {"id": 783, "seek": 488072, "start": 4900.56, "end": 4905.360000000001, + "text": " marry themselves to Python. Yeah, yeah, absolutely. 
This sounds like you + touched also on this", "tokens": [51356, 9747, 2969, 281, 15329, 13, 865, 11, 1338, + 11, 3122, 13, 639, 3263, 411, 291, 9828, 611, 322, 341, 51596], "temperature": 0.0, + "avg_logprob": -0.16797809190647575, "compression_ratio": 1.6217391304347826, "no_speech_prob": + 0.0015828986652195454}, {"id": 784, "seek": 490536, "start": 4905.36, "end": 4912.32, + "text": " very important topic, which I think, uh, is, is, uh, known as a metric + learning where, um,", "tokens": [50364, 588, 1021, 4829, 11, 597, 286, 519, 11, + 2232, 11, 307, 11, 307, 11, 2232, 11, 2570, 382, 257, 20678, 2539, 689, 11, 1105, + 11, 50712], "temperature": 0.0, "avg_logprob": -0.14546428862072172, "compression_ratio": + 1.6428571428571428, "no_speech_prob": 0.002350588794797659}, {"id": 785, "seek": + 490536, "start": 4913.12, "end": 4918.0, "text": " like on one hand, you do want + to know what is the optimal distance and maybe you need to fine tune", "tokens": + [50752, 411, 322, 472, 1011, 11, 291, 360, 528, 281, 458, 437, 307, 264, 16252, + 4560, 293, 1310, 291, 643, 281, 2489, 10864, 50996], "temperature": 0.0, "avg_logprob": + -0.14546428862072172, "compression_ratio": 1.6428571428571428, "no_speech_prob": + 0.002350588794797659}, {"id": 786, "seek": 490536, "start": 4918.0, "end": 4923.759999999999, + "text": " your model or maybe your data is not good fit for this model and so on. + But you do need the tools.", "tokens": [50996, 428, 2316, 420, 1310, 428, 1412, + 307, 406, 665, 3318, 337, 341, 2316, 293, 370, 322, 13, 583, 291, 360, 643, 264, + 3873, 13, 51284], "temperature": 0.0, "avg_logprob": -0.14546428862072172, "compression_ratio": + 1.6428571428571428, "no_speech_prob": 0.002350588794797659}, {"id": 787, "seek": + 490536, "start": 4923.759999999999, "end": 4931.759999999999, "text": " Maybe it''s + something like Cupid for, you know, ranking, um, evaluation and tuning. 
You would + also have", "tokens": [51284, 2704, 309, 311, 746, 411, 383, 6127, 337, 11, 291, + 458, 11, 17833, 11, 1105, 11, 13344, 293, 15164, 13, 509, 576, 611, 362, 51684], + "temperature": 0.0, "avg_logprob": -0.14546428862072172, "compression_ratio": 1.6428571428571428, + "no_speech_prob": 0.002350588794797659}, {"id": 788, "seek": 493176, "start": 4931.76, + "end": 4937.2, "text": " some tool which is Cupid like maybe even with the UI way. + You can load this vectors, visualize them", "tokens": [50364, 512, 2290, 597, 307, + 383, 6127, 411, 1310, 754, 365, 264, 15682, 636, 13, 509, 393, 3677, 341, 18875, + 11, 23273, 552, 50636], "temperature": 0.0, "avg_logprob": -0.17643913200923375, + "compression_ratio": 1.5776892430278884, "no_speech_prob": 0.003824483370408416}, + {"id": 789, "seek": 493176, "start": 4937.2, "end": 4942.08, "text": " and see, + okay, how do they fit together? What''s missing and so on and then have the stats + on it,", "tokens": [50636, 293, 536, 11, 1392, 11, 577, 360, 436, 3318, 1214, 30, + 708, 311, 5361, 293, 370, 322, 293, 550, 362, 264, 18152, 322, 309, 11, 50880], + "temperature": 0.0, "avg_logprob": -0.17643913200923375, "compression_ratio": 1.5776892430278884, + "no_speech_prob": 0.003824483370408416}, {"id": 790, "seek": 493176, "start": 4942.08, + "end": 4948.64, "text": " right? So you can actually run the statistics and, um, + you know, I''m gonna let Eric write that tool.", "tokens": [50880, 558, 30, 407, + 291, 393, 767, 1190, 264, 12523, 293, 11, 1105, 11, 291, 458, 11, 286, 478, 799, + 718, 9336, 2464, 300, 2290, 13, 51208], "temperature": 0.0, "avg_logprob": -0.17643913200923375, + "compression_ratio": 1.5776892430278884, "no_speech_prob": 0.003824483370408416}, + {"id": 791, "seek": 493176, "start": 4948.64, "end": 4954.8, "text": " I love Cupid. + Cupid is so great. Eric, go write Cupid for vector search. 
Yeah, I think, uh, we + can", "tokens": [51208, 286, 959, 383, 6127, 13, 383, 6127, 307, 370, 869, 13, 9336, + 11, 352, 2464, 383, 6127, 337, 8062, 3164, 13, 865, 11, 286, 519, 11, 2232, 11, + 321, 393, 51516], "temperature": 0.0, "avg_logprob": -0.17643913200923375, "compression_ratio": + 1.5776892430278884, "no_speech_prob": 0.003824483370408416}, {"id": 792, "seek": + 495480, "start": 4954.8, "end": 4962.72, "text": " pair up on that maybe all of + us contribute, make it open source. Um, but yeah, um, I think this is", "tokens": + [50364, 6119, 493, 322, 300, 1310, 439, 295, 505, 10586, 11, 652, 309, 1269, 4009, + 13, 3301, 11, 457, 1338, 11, 1105, 11, 286, 519, 341, 307, 50760], "temperature": + 0.0, "avg_logprob": -0.15817988152597465, "compression_ratio": 1.6458333333333333, + "no_speech_prob": 0.007086475845426321}, {"id": 793, "seek": 495480, "start": 4962.72, + "end": 4970.16, "text": " one way to look at it, right? Um, and I think quadrant, + um, developers, they, they push the metric", "tokens": [50760, 472, 636, 281, 574, + 412, 309, 11, 558, 30, 3301, 11, 293, 286, 519, 46856, 11, 1105, 11, 8849, 11, 436, + 11, 436, 2944, 264, 20678, 51132], "temperature": 0.0, "avg_logprob": -0.15817988152597465, + "compression_ratio": 1.6458333333333333, "no_speech_prob": 0.007086475845426321}, + {"id": 794, "seek": 495480, "start": 4970.16, "end": 4976.320000000001, "text": + " learning quite heavily forward by the time this podcast is, uh, this episode is + out. 
There will be", "tokens": [51132, 2539, 1596, 10950, 2128, 538, 264, 565, 341, + 7367, 307, 11, 2232, 11, 341, 3500, 307, 484, 13, 821, 486, 312, 51440], "temperature": + 0.0, "avg_logprob": -0.15817988152597465, "compression_ratio": 1.6458333333333333, + "no_speech_prob": 0.007086475845426321}, {"id": 795, "seek": 495480, "start": 4976.320000000001, + "end": 4984.08, "text": " another episode with a developer from quadrant who is + actually very big on, on this idea of metric", "tokens": [51440, 1071, 3500, 365, + 257, 10754, 490, 46856, 567, 307, 767, 588, 955, 322, 11, 322, 341, 1558, 295, 20678, + 51828], "temperature": 0.0, "avg_logprob": -0.15817988152597465, "compression_ratio": + 1.6458333333333333, "no_speech_prob": 0.007086475845426321}, {"id": 796, "seek": + 498408, "start": 4984.08, "end": 4990.88, "text": " learning. And, uh, he opens + sources, of course, everything. And I mean, he offers tools and also like,", "tokens": + [50364, 2539, 13, 400, 11, 2232, 11, 415, 9870, 7139, 11, 295, 1164, 11, 1203, 13, + 400, 286, 914, 11, 415, 7736, 3873, 293, 611, 411, 11, 50704], "temperature": 0.0, + "avg_logprob": -0.13319195441479953, "compression_ratio": 1.6178861788617886, "no_speech_prob": + 0.012398069724440575}, {"id": 797, "seek": 498408, "start": 4991.76, "end": 4996.8, + "text": " papers that you can read and indicate yourself on this space. 
I think + this is something that", "tokens": [50748, 10577, 300, 291, 393, 1401, 293, 13330, + 1803, 322, 341, 1901, 13, 286, 519, 341, 307, 746, 300, 51000], "temperature": 0.0, + "avg_logprob": -0.13319195441479953, "compression_ratio": 1.6178861788617886, "no_speech_prob": + 0.012398069724440575}, {"id": 798, "seek": 498408, "start": 4996.8, "end": 5002.96, + "text": " barely is scratched at the moment by the community, by, by even the end + users, you know, they don''t", "tokens": [51000, 10268, 307, 40513, 412, 264, 1623, + 538, 264, 1768, 11, 538, 11, 538, 754, 264, 917, 5022, 11, 291, 458, 11, 436, 500, + 380, 51308], "temperature": 0.0, "avg_logprob": -0.13319195441479953, "compression_ratio": + 1.6178861788617886, "no_speech_prob": 0.012398069724440575}, {"id": 799, "seek": + 498408, "start": 5002.96, "end": 5009.84, "text": " know. Okay, I take clip model. + I have the images, plug them in together, works fine. I''m done. What if", "tokens": + [51308, 458, 13, 1033, 11, 286, 747, 7353, 2316, 13, 286, 362, 264, 5267, 11, 5452, + 552, 294, 1214, 11, 1985, 2489, 13, 286, 478, 1096, 13, 708, 498, 51652], "temperature": + 0.0, "avg_logprob": -0.13319195441479953, "compression_ratio": 1.6178861788617886, + "no_speech_prob": 0.012398069724440575}, {"id": 800, "seek": 500984, "start": 5009.84, + "end": 5016.0, "text": " it doesn''t work? 
What if you have some images, you never + find them for any query, but you do", "tokens": [50364, 309, 1177, 380, 589, 30, + 708, 498, 291, 362, 512, 5267, 11, 291, 1128, 915, 552, 337, 604, 14581, 11, 457, + 291, 360, 50672], "temperature": 0.0, "avg_logprob": -0.11690245027895327, "compression_ratio": + 1.7330960854092528, "no_speech_prob": 0.0021353857591748238}, {"id": 801, "seek": + 500984, "start": 5016.0, "end": 5020.8, "text": " want to find them because it''s + a name of some product that was recently released and you do want to,", "tokens": + [50672, 528, 281, 915, 552, 570, 309, 311, 257, 1315, 295, 512, 1674, 300, 390, + 3938, 4736, 293, 291, 360, 528, 281, 11, 50912], "temperature": 0.0, "avg_logprob": + -0.11690245027895327, "compression_ratio": 1.7330960854092528, "no_speech_prob": + 0.0021353857591748238}, {"id": 802, "seek": 500984, "start": 5020.8, "end": 5027.76, + "text": " to showcase it, right? And you''re not using keyword search there. It''s + a name. You''re using, um,", "tokens": [50912, 281, 20388, 309, 11, 558, 30, 400, + 291, 434, 406, 1228, 20428, 3164, 456, 13, 467, 311, 257, 1315, 13, 509, 434, 1228, + 11, 1105, 11, 51260], "temperature": 0.0, "avg_logprob": -0.11690245027895327, "compression_ratio": + 1.7330960854092528, "no_speech_prob": 0.0021353857591748238}, {"id": 803, "seek": + 500984, "start": 5027.76, "end": 5032.8, "text": " vectors to retrieve it, right? + So it thinks like this. I mean, it''s kind of like, there''s a bunch of", "tokens": + [51260, 18875, 281, 30254, 309, 11, 558, 30, 407, 309, 7309, 411, 341, 13, 286, + 914, 11, 309, 311, 733, 295, 411, 11, 456, 311, 257, 3840, 295, 51512], "temperature": + 0.0, "avg_logprob": -0.11690245027895327, "compression_ratio": 1.7330960854092528, + "no_speech_prob": 0.0021353857591748238}, {"id": 804, "seek": 500984, "start": 5032.8, + "end": 5039.76, "text": " topics there. One, another one favorite that I like is + the, uh, robustness, right? 
So if I have", "tokens": [51512, 8378, 456, 13, 1485, + 11, 1071, 472, 2954, 300, 286, 411, 307, 264, 11, 2232, 11, 13956, 1287, 11, 558, + 30, 407, 498, 286, 362, 51860], "temperature": 0.0, "avg_logprob": -0.11690245027895327, + "compression_ratio": 1.7330960854092528, "no_speech_prob": 0.0021353857591748238}, + {"id": 805, "seek": 503976, "start": 5040.16, "end": 5046.8, "text": " an aircraft, + I rotated a little bit and all of a sudden I find kittens instead of the aircrafts.", + "tokens": [50384, 364, 9465, 11, 286, 42146, 257, 707, 857, 293, 439, 295, 257, + 3990, 286, 915, 47363, 2602, 295, 264, 9465, 82, 13, 50716], "temperature": 0.0, + "avg_logprob": -0.20732696383607155, "compression_ratio": 1.5991902834008098, "no_speech_prob": + 0.005572644528001547}, {"id": 806, "seek": 503976, "start": 5046.8, "end": 5051.76, + "text": " And this is what Connor Shorten showed yesterday on on on the genometer + and was amazing. I mean,", "tokens": [50716, 400, 341, 307, 437, 33133, 16881, 268, + 4712, 5186, 322, 322, 322, 264, 1049, 13606, 293, 390, 2243, 13, 286, 914, 11, 50964], + "temperature": 0.0, "avg_logprob": -0.20732696383607155, "compression_ratio": 1.5991902834008098, + "no_speech_prob": 0.005572644528001547}, {"id": 807, "seek": 503976, "start": 5051.76, + "end": 5058.16, "text": " robustness. You just change slightly your input and you + just, yeah, it doesn''t work. 
So I think there", "tokens": [50964, 13956, 1287, + 13, 509, 445, 1319, 4748, 428, 4846, 293, 291, 445, 11, 1338, 11, 309, 1177, 380, + 589, 13, 407, 286, 519, 456, 51284], "temperature": 0.0, "avg_logprob": -0.20732696383607155, + "compression_ratio": 1.5991902834008098, "no_speech_prob": 0.005572644528001547}, + {"id": 808, "seek": 503976, "start": 5058.16, "end": 5064.4800000000005, "text": + " is a lot of things missing, but like you, like from what I sense in your answer, + like it feels like", "tokens": [51284, 307, 257, 688, 295, 721, 5361, 11, 457, 411, + 291, 11, 411, 490, 437, 286, 2020, 294, 428, 1867, 11, 411, 309, 3417, 411, 51600], + "temperature": 0.0, "avg_logprob": -0.20732696383607155, "compression_ratio": 1.5991902834008098, + "no_speech_prob": 0.005572644528001547}, {"id": 809, "seek": 506448, "start": 5064.48, + "end": 5070.799999999999, "text": " you do still want to keep your focus on mighty + and push that as further along as possible, right?", "tokens": [50364, 291, 360, + 920, 528, 281, 1066, 428, 1879, 322, 21556, 293, 2944, 300, 382, 3052, 2051, 382, + 1944, 11, 558, 30, 50680], "temperature": 0.0, "avg_logprob": -0.16498756408691406, + "compression_ratio": 1.5340314136125655, "no_speech_prob": 0.0002988640044350177}, + {"id": 810, "seek": 506448, "start": 5075.12, "end": 5081.839999999999, "text": + " Yes. And I want to, what I really want is I, I love that people download and install + it and use it", "tokens": [50896, 1079, 13, 400, 286, 528, 281, 11, 437, 286, 534, + 528, 307, 286, 11, 286, 959, 300, 561, 5484, 293, 3625, 309, 293, 764, 309, 51232], + "temperature": 0.0, "avg_logprob": -0.16498756408691406, "compression_ratio": 1.5340314136125655, + "no_speech_prob": 0.0002988640044350177}, {"id": 811, "seek": 506448, "start": 5081.839999999999, + "end": 5088.24, "text": " and do whatever they want, uh, to get vectors with my, + that''s awesome. 
I''m really trying to find", "tokens": [51232, 293, 360, 2035, + 436, 528, 11, 2232, 11, 281, 483, 18875, 365, 452, 11, 300, 311, 3476, 13, 286, + 478, 534, 1382, 281, 915, 51552], "temperature": 0.0, "avg_logprob": -0.16498756408691406, + "compression_ratio": 1.5340314136125655, "no_speech_prob": 0.0002988640044350177}, + {"id": 812, "seek": 508824, "start": 5088.24, "end": 5095.76, "text": " partners. + I''m really trying to find partners who, um, who want to just really make it super + easy", "tokens": [50364, 4462, 13, 286, 478, 534, 1382, 281, 915, 4462, 567, 11, + 1105, 11, 567, 528, 281, 445, 534, 652, 309, 1687, 1858, 50740], "temperature": + 0.0, "avg_logprob": -0.11893547148931594, "compression_ratio": 1.6132596685082874, + "no_speech_prob": 0.0005097273970022798}, {"id": 813, "seek": 508824, "start": 5096.719999999999, + "end": 5104.88, "text": " uh, to do, uh, inference, model inference at scale. Um, + so for example, I haven''t gotten any", "tokens": [50788, 2232, 11, 281, 360, 11, + 2232, 11, 38253, 11, 2316, 38253, 412, 4373, 13, 3301, 11, 370, 337, 1365, 11, 286, + 2378, 380, 5768, 604, 51196], "temperature": 0.0, "avg_logprob": -0.11893547148931594, + "compression_ratio": 1.6132596685082874, "no_speech_prob": 0.0005097273970022798}, + {"id": 814, "seek": 508824, "start": 5104.88, "end": 5112.08, "text": " replies. + I''ve been like spamming, uh, not spamming. I''ve been, uh, emailing and trying + to get in touch", "tokens": [51196, 42289, 13, 286, 600, 668, 411, 24028, 2810, + 11, 2232, 11, 406, 24028, 2810, 13, 286, 600, 668, 11, 2232, 11, 3796, 278, 293, + 1382, 281, 483, 294, 2557, 51556], "temperature": 0.0, "avg_logprob": -0.11893547148931594, + "compression_ratio": 1.6132596685082874, "no_speech_prob": 0.0005097273970022798}, + {"id": 815, "seek": 511208, "start": 5112.24, "end": 5118.32, "text": " with like + cloud cloud providers, right, to say serverless inference. 
If you could offer serverless", + "tokens": [50372, 365, 411, 4588, 4588, 11330, 11, 558, 11, 281, 584, 7154, 1832, + 38253, 13, 759, 291, 727, 2626, 7154, 1832, 50676], "temperature": 0.0, "avg_logprob": + -0.13872390747070312, "compression_ratio": 1.7008547008547008, "no_speech_prob": + 0.000725920544937253}, {"id": 816, "seek": 511208, "start": 5118.32, "end": 5124.72, + "text": " inference, right, through lambdas or whatever, that''s like so many people + are asking for that, you know,", "tokens": [50676, 38253, 11, 558, 11, 807, 10097, + 27476, 420, 2035, 11, 300, 311, 411, 370, 867, 561, 366, 3365, 337, 300, 11, 291, + 458, 11, 50996], "temperature": 0.0, "avg_logprob": -0.13872390747070312, "compression_ratio": + 1.7008547008547008, "no_speech_prob": 0.000725920544937253}, {"id": 817, "seek": + 511208, "start": 5125.6, "end": 5132.88, "text": " you can''t do that with Python + tools these days. Um, you can do it. It''s just going to, it would take", "tokens": + [51040, 291, 393, 380, 360, 300, 365, 15329, 3873, 613, 1708, 13, 3301, 11, 291, + 393, 360, 309, 13, 467, 311, 445, 516, 281, 11, 309, 576, 747, 51404], "temperature": + 0.0, "avg_logprob": -0.13872390747070312, "compression_ratio": 1.7008547008547008, + "no_speech_prob": 0.000725920544937253}, {"id": 818, "seek": 511208, "start": 5132.88, + "end": 5138.48, "text": " forever and it would be really expensive and really slow. + Um, but there''s such an opportunity", "tokens": [51404, 5680, 293, 309, 576, 312, + 534, 5124, 293, 534, 2964, 13, 3301, 11, 457, 456, 311, 1270, 364, 2650, 51684], + "temperature": 0.0, "avg_logprob": -0.13872390747070312, "compression_ratio": 1.7008547008547008, + "no_speech_prob": 0.000725920544937253}, {"id": 819, "seek": 513848, "start": 5139.28, + "end": 5148.16, "text": " for cloud providers to make it super easy. 
So you can + have, you know, you want to get content from,", "tokens": [50404, 337, 4588, 11330, + 281, 652, 309, 1687, 1858, 13, 407, 291, 393, 362, 11, 291, 458, 11, 291, 528, 281, + 483, 2701, 490, 11, 50848], "temperature": 0.0, "avg_logprob": -0.10755076127893784, + "compression_ratio": 1.6553191489361703, "no_speech_prob": 0.001585534424521029}, + {"id": 820, "seek": 513848, "start": 5148.959999999999, "end": 5154.32, "text": + " from point A into, uh, into your recommendation engine or your vector database + or whatever,", "tokens": [50888, 490, 935, 316, 666, 11, 2232, 11, 666, 428, 11879, + 2848, 420, 428, 8062, 8149, 420, 2035, 11, 51156], "temperature": 0.0, "avg_logprob": + -0.10755076127893784, "compression_ratio": 1.6553191489361703, "no_speech_prob": + 0.001585534424521029}, {"id": 821, "seek": 513848, "start": 5154.32, "end": 5160.5599999999995, + "text": " you know, do you want to stand up like the big GPU server in the middle + to get this? No, you don''t", "tokens": [51156, 291, 458, 11, 360, 291, 528, 281, + 1463, 493, 411, 264, 955, 18407, 7154, 294, 264, 2808, 281, 483, 341, 30, 883, 11, + 291, 500, 380, 51468], "temperature": 0.0, "avg_logprob": -0.10755076127893784, + "compression_ratio": 1.6553191489361703, "no_speech_prob": 0.001585534424521029}, + {"id": 822, "seek": 513848, "start": 5160.5599999999995, "end": 5165.5199999999995, + "text": " want to do that. Um, if you can avoid it. So how about something that''s + that serverless and people", "tokens": [51468, 528, 281, 360, 300, 13, 3301, 11, + 498, 291, 393, 5042, 309, 13, 407, 577, 466, 746, 300, 311, 300, 7154, 1832, 293, + 561, 51716], "temperature": 0.0, "avg_logprob": -0.10755076127893784, "compression_ratio": + 1.6553191489361703, "no_speech_prob": 0.001585534424521029}, {"id": 823, "seek": + 516552, "start": 5165.52, "end": 5171.120000000001, "text": " can just run? So I''m + trying to find partners there. 
I''m trying to find partners who, uh, who have", + "tokens": [50364, 393, 445, 1190, 30, 407, 286, 478, 1382, 281, 915, 4462, 456, + 13, 286, 478, 1382, 281, 915, 4462, 567, 11, 2232, 11, 567, 362, 50644], "temperature": + 0.0, "avg_logprob": -0.1581282948338708, "compression_ratio": 1.84375, "no_speech_prob": + 0.0009631191496737301}, {"id": 824, "seek": 516552, "start": 5172.8, "end": 5179.52, + "text": " search platforms and, um, and other and other platforms or just see this + as a Lego and their stack", "tokens": [50728, 3164, 9473, 293, 11, 1105, 11, 293, + 661, 293, 661, 9473, 420, 445, 536, 341, 382, 257, 28761, 293, 641, 8630, 51064], + "temperature": 0.0, "avg_logprob": -0.1581282948338708, "compression_ratio": 1.84375, + "no_speech_prob": 0.0009631191496737301}, {"id": 825, "seek": 516552, "start": 5180.0, + "end": 5185.120000000001, "text": " and things that''s going to make it easier and + they don''t want to, you know, hire a team and spend", "tokens": [51088, 293, 721, + 300, 311, 516, 281, 652, 309, 3571, 293, 436, 500, 380, 528, 281, 11, 291, 458, + 11, 11158, 257, 1469, 293, 3496, 51344], "temperature": 0.0, "avg_logprob": -0.1581282948338708, + "compression_ratio": 1.84375, "no_speech_prob": 0.0009631191496737301}, {"id": 826, + "seek": 516552, "start": 5185.120000000001, "end": 5189.84, "text": " months building + this thing and trying to figure it out. Um, you can do that of course. 
Go, uh,", + "tokens": [51344, 2493, 2390, 341, 551, 293, 1382, 281, 2573, 309, 484, 13, 3301, + 11, 291, 393, 360, 300, 295, 1164, 13, 1037, 11, 2232, 11, 51580], "temperature": + 0.0, "avg_logprob": -0.1581282948338708, "compression_ratio": 1.84375, "no_speech_prob": + 0.0009631191496737301}, {"id": 827, "seek": 516552, "start": 5189.84, "end": 5192.4800000000005, + "text": " go do that, but, you know, you can save yourself a lot of time and pay + and buy it.", "tokens": [51580, 352, 360, 300, 11, 457, 11, 291, 458, 11, 291, 393, + 3155, 1803, 257, 688, 295, 565, 293, 1689, 293, 2256, 309, 13, 51712], "temperature": + 0.0, "avg_logprob": -0.1581282948338708, "compression_ratio": 1.84375, "no_speech_prob": + 0.0009631191496737301}, {"id": 828, "seek": 519248, "start": 5193.28, "end": 5195.759999999999, + "text": " Um, by working with stuff that''s already there.", "tokens": [50404, 3301, + 11, 538, 1364, 365, 1507, 300, 311, 1217, 456, 13, 50528], "temperature": 0.0, "avg_logprob": + -0.24413514368742414, "compression_ratio": 1.5905511811023623, "no_speech_prob": + 0.005984840914607048}, {"id": 829, "seek": 519248, "start": 5197.2, "end": 5200.959999999999, + "text": " Yeah, that makes sense. 
I mean, probably companies like the likes of", + "tokens": [50600, 865, 11, 300, 1669, 2020, 13, 286, 914, 11, 1391, 3431, 411, 264, + 5902, 295, 50788], "temperature": 0.0, "avg_logprob": -0.24413514368742414, "compression_ratio": + 1.5905511811023623, "no_speech_prob": 0.005984840914607048}, {"id": 830, "seek": + 519248, "start": 5200.959999999999, "end": 5208.639999999999, "text": " Algolea + or, right, exactly, but potentially elastic, you know, because they, both of these,", + "tokens": [50788, 967, 1571, 306, 64, 420, 11, 558, 11, 2293, 11, 457, 7263, 17115, + 11, 291, 458, 11, 570, 436, 11, 1293, 295, 613, 11, 51172], "temperature": 0.0, + "avg_logprob": -0.24413514368742414, "compression_ratio": 1.5905511811023623, "no_speech_prob": + 0.005984840914607048}, {"id": 831, "seek": 519248, "start": 5209.44, "end": 5215.28, + "text": " want to get closer to the neural search even though maybe they were not + wired up originally to", "tokens": [51212, 528, 281, 483, 4966, 281, 264, 18161, + 3164, 754, 1673, 1310, 436, 645, 406, 27415, 493, 7993, 281, 51504], "temperature": + 0.0, "avg_logprob": -0.24413514368742414, "compression_ratio": 1.5905511811023623, + "no_speech_prob": 0.005984840914607048}, {"id": 832, "seek": 519248, "start": 5215.28, + "end": 5221.5199999999995, "text": " be vector search databases, but they do have + the components like elastic based on Lussin and Algolea", "tokens": [51504, 312, + 8062, 3164, 22380, 11, 457, 436, 360, 362, 264, 6677, 411, 17115, 2361, 322, 441, + 2023, 259, 293, 967, 1571, 306, 64, 51816], "temperature": 0.0, "avg_logprob": -0.24413514368742414, + "compression_ratio": 1.5905511811023623, "no_speech_prob": 0.005984840914607048}, + {"id": 833, "seek": 522152, "start": 5222.320000000001, "end": 5227.6, "text": " + probably based also Lussin. I''m not sure, but I''m sure that they''re looking at + this field. 
So I mean,", "tokens": [50404, 1391, 2361, 611, 441, 2023, 259, 13, + 286, 478, 406, 988, 11, 457, 286, 478, 988, 300, 436, 434, 1237, 412, 341, 2519, + 13, 407, 286, 914, 11, 50668], "temperature": 0.0, "avg_logprob": -0.19526315391610521, + "compression_ratio": 1.592, "no_speech_prob": 0.0017663463950157166}, {"id": 834, + "seek": 522152, "start": 5227.6, "end": 5234.400000000001, "text": " for them and + now we are getting a little bit into MLOPS and Vision, um, that you also shared + a little", "tokens": [50668, 337, 552, 293, 586, 321, 366, 1242, 257, 707, 857, + 666, 21601, 46, 6273, 293, 25170, 11, 1105, 11, 300, 291, 611, 5507, 257, 707, 51008], + "temperature": 0.0, "avg_logprob": -0.19526315391610521, "compression_ratio": 1.592, + "no_speech_prob": 0.0017663463950157166}, {"id": 835, "seek": 522152, "start": 5234.400000000001, + "end": 5244.320000000001, "text": " bit ahead of time that, um, might it could be + one of the components in the MLOPS ecosystem, right?", "tokens": [51008, 857, 2286, + 295, 565, 300, 11, 1105, 11, 1062, 309, 727, 312, 472, 295, 264, 6677, 294, 264, + 21601, 46, 6273, 11311, 11, 558, 30, 51504], "temperature": 0.0, "avg_logprob": + -0.19526315391610521, "compression_ratio": 1.592, "no_speech_prob": 0.0017663463950157166}, + {"id": 836, "seek": 522152, "start": 5245.76, "end": 5251.120000000001, "text": + " Yeah, absolutely. Not just a standalone kind of script, which I download and then + I''m thinking,", "tokens": [51576, 865, 11, 3122, 13, 1726, 445, 257, 37454, 733, + 295, 5755, 11, 597, 286, 5484, 293, 550, 286, 478, 1953, 11, 51844], "temperature": + 0.0, "avg_logprob": -0.19526315391610521, "compression_ratio": 1.592, "no_speech_prob": + 0.0017663463950157166}, {"id": 837, "seek": 525112, "start": 5251.12, "end": 5257.5199999999995, + "text": " okay, where do I plug it in? Right? 
I mean, if it was, if it was, are + you thinking in that direction", "tokens": [50364, 1392, 11, 689, 360, 286, 5452, + 309, 294, 30, 1779, 30, 286, 914, 11, 498, 309, 390, 11, 498, 309, 390, 11, 366, + 291, 1953, 294, 300, 3513, 50684], "temperature": 0.0, "avg_logprob": -0.17146593048459008, + "compression_ratio": 1.4607843137254901, "no_speech_prob": 0.0010670768097043037}, + {"id": 838, "seek": 525112, "start": 5257.5199999999995, "end": 5265.68, "text": + " as well yourself? Like, okay, identifying the tools and systems where MIT could + kind of like play", "tokens": [50684, 382, 731, 1803, 30, 1743, 11, 1392, 11, 16696, + 264, 3873, 293, 3652, 689, 13100, 727, 733, 295, 411, 862, 51092], "temperature": + 0.0, "avg_logprob": -0.17146593048459008, "compression_ratio": 1.4607843137254901, + "no_speech_prob": 0.0010670768097043037}, {"id": 839, "seek": 525112, "start": 5265.68, + "end": 5280.24, "text": " a long role of the embedding software? Yeah, absolutely. + Um, it''s, I have to, if, the other thing I", "tokens": [51092, 257, 938, 3090, + 295, 264, 12240, 3584, 4722, 30, 865, 11, 3122, 13, 3301, 11, 309, 311, 11, 286, + 362, 281, 11, 498, 11, 264, 661, 551, 286, 51820], "temperature": 0.0, "avg_logprob": + -0.17146593048459008, "compression_ratio": 1.4607843137254901, "no_speech_prob": + 0.0010670768097043037}, {"id": 840, "seek": 528024, "start": 5280.24, "end": 5285.599999999999, + "text": " want to figure out is, does it make sense as it is right now as a, as + a web server? Like that for", "tokens": [50364, 528, 281, 2573, 484, 307, 11, 775, + 309, 652, 2020, 382, 309, 307, 558, 586, 382, 257, 11, 382, 257, 3670, 7154, 30, + 1743, 300, 337, 50632], "temperature": 0.0, "avg_logprob": -0.1551298630976044, + "compression_ratio": 1.6055776892430278, "no_speech_prob": 0.0005562129081226885}, + {"id": 841, "seek": 528024, "start": 5285.599999999999, "end": 5292.08, "text": + " every case, probably not. There''s probably situations. 
GRPC was one request, + um, that I have to figure", "tokens": [50632, 633, 1389, 11, 1391, 406, 13, 821, + 311, 1391, 6851, 13, 10903, 12986, 390, 472, 5308, 11, 1105, 11, 300, 286, 362, + 281, 2573, 50956], "temperature": 0.0, "avg_logprob": -0.1551298630976044, "compression_ratio": + 1.6055776892430278, "no_speech_prob": 0.0005562129081226885}, {"id": 842, "seek": + 528024, "start": 5292.08, "end": 5298.24, "text": " that out. So that makes it a + little bit easier to, to, um, to bind it to certain application layers.", "tokens": + [50956, 300, 484, 13, 407, 300, 1669, 309, 257, 707, 857, 3571, 281, 11, 281, 11, + 1105, 11, 281, 14786, 309, 281, 1629, 3861, 7914, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.1551298630976044, "compression_ratio": 1.6055776892430278, "no_speech_prob": + 0.0005562129081226885}, {"id": 843, "seek": 528024, "start": 5299.36, "end": 5308.96, + "text": " Uh, but yeah, it''s, it''s meant to be flexible for you sticking a model + that your model, um, you know,", "tokens": [51320, 4019, 11, 457, 1338, 11, 309, + 311, 11, 309, 311, 4140, 281, 312, 11358, 337, 291, 13465, 257, 2316, 300, 428, + 2316, 11, 1105, 11, 291, 458, 11, 51800], "temperature": 0.0, "avg_logprob": -0.1551298630976044, + "compression_ratio": 1.6055776892430278, "no_speech_prob": 0.0005562129081226885}, + {"id": 844, "seek": 530896, "start": 5309.84, "end": 5316.8, "text": " and, and + you, you run it how you want. 
The, the other thing that I found was that I, I met + a lot of", "tokens": [50408, 293, 11, 293, 291, 11, 291, 1190, 309, 577, 291, 528, + 13, 440, 11, 264, 661, 551, 300, 286, 1352, 390, 300, 286, 11, 286, 1131, 257, 688, + 295, 50756], "temperature": 0.0, "avg_logprob": -0.18921325798321487, "compression_ratio": + 1.7295373665480427, "no_speech_prob": 0.001124868169426918}, {"id": 845, "seek": + 530896, "start": 5316.8, "end": 5322.0, "text": " people who were like scratching + their heads saying, like, which model should I use also, right?", "tokens": [50756, + 561, 567, 645, 411, 29699, 641, 8050, 1566, 11, 411, 11, 597, 2316, 820, 286, 764, + 611, 11, 558, 30, 51016], "temperature": 0.0, "avg_logprob": -0.18921325798321487, + "compression_ratio": 1.7295373665480427, "no_speech_prob": 0.001124868169426918}, + {"id": 846, "seek": 530896, "start": 5322.0, "end": 5326.24, "text": " As my, my + first model or, or whatever. And I just want to start playing around with this. + So that''s", "tokens": [51016, 1018, 452, 11, 452, 700, 2316, 420, 11, 420, 2035, + 13, 400, 286, 445, 528, 281, 722, 2433, 926, 365, 341, 13, 407, 300, 311, 51228], + "temperature": 0.0, "avg_logprob": -0.18921325798321487, "compression_ratio": 1.7295373665480427, + "no_speech_prob": 0.001124868169426918}, {"id": 847, "seek": 530896, "start": 5326.24, + "end": 5331.68, "text": " the other thing I did is I, is I have like default models + that I, that I chose that I know work well", "tokens": [51228, 264, 661, 551, 286, + 630, 307, 286, 11, 307, 286, 362, 411, 7576, 5245, 300, 286, 11, 300, 286, 5111, + 300, 286, 458, 589, 731, 51500], "temperature": 0.0, "avg_logprob": -0.18921325798321487, + "compression_ratio": 1.7295373665480427, "no_speech_prob": 0.001124868169426918}, + {"id": 848, "seek": 530896, "start": 5331.68, "end": 5337.2, "text": " because, + you know, especially like Neil''s rumors, he''s amazing and he''s done amazing, + um,", "tokens": [51500, 570, 11, 291, 458, 11, 2318, 411, 
18615, 311, 21201, 11, + 415, 311, 2243, 293, 415, 311, 1096, 2243, 11, 1105, 11, 51776], "temperature": + 0.0, "avg_logprob": -0.18921325798321487, "compression_ratio": 1.7295373665480427, + "no_speech_prob": 0.001124868169426918}, {"id": 849, "seek": 533720, "start": 5338.16, + "end": 5344.08, "text": " community development around, around expert and the models + that he''s trained and the", "tokens": [50412, 1768, 3250, 926, 11, 926, 5844, 293, + 264, 5245, 300, 415, 311, 8895, 293, 264, 50708], "temperature": 0.0, "avg_logprob": + -0.12591260429320297, "compression_ratio": 1.7732342007434945, "no_speech_prob": + 0.0006102172774262726}, {"id": 850, "seek": 533720, "start": 5344.08, "end": 5348.72, + "text": " documentation he''s published around why certain models are good and others + are bad. So other people,", "tokens": [50708, 14333, 415, 311, 6572, 926, 983, 1629, + 5245, 366, 665, 293, 2357, 366, 1578, 13, 407, 661, 561, 11, 50940], "temperature": + 0.0, "avg_logprob": -0.12591260429320297, "compression_ratio": 1.7732342007434945, + "no_speech_prob": 0.0006102172774262726}, {"id": 851, "seek": 533720, "start": 5348.72, + "end": 5353.2, "text": " they don''t know of, of, of this stuff. So it''s just like, + well, you don''t have to go off and learn", "tokens": [50940, 436, 500, 380, 458, + 295, 11, 295, 11, 295, 341, 1507, 13, 407, 309, 311, 445, 411, 11, 731, 11, 291, + 500, 380, 362, 281, 352, 766, 293, 1466, 51164], "temperature": 0.0, "avg_logprob": + -0.12591260429320297, "compression_ratio": 1.7732342007434945, "no_speech_prob": + 0.0006102172774262726}, {"id": 852, "seek": 533720, "start": 5353.2, "end": 5359.5199999999995, + "text": " and understand, um, right away, why, why I should choose one model versus + another. 
It''s a hard", "tokens": [51164, 293, 1223, 11, 1105, 11, 558, 1314, 11, + 983, 11, 983, 286, 820, 2826, 472, 2316, 5717, 1071, 13, 467, 311, 257, 1152, 51480], + "temperature": 0.0, "avg_logprob": -0.12591260429320297, "compression_ratio": 1.7732342007434945, + "no_speech_prob": 0.0006102172774262726}, {"id": 853, "seek": 533720, "start": 5359.5199999999995, + "end": 5363.84, "text": " decision to make. So there''s some, there''s some defaults + that I chose. So it''s really easy to get", "tokens": [51480, 3537, 281, 652, 13, + 407, 456, 311, 512, 11, 456, 311, 512, 7576, 82, 300, 286, 5111, 13, 407, 309, 311, + 534, 1858, 281, 483, 51696], "temperature": 0.0, "avg_logprob": -0.12591260429320297, + "compression_ratio": 1.7732342007434945, "no_speech_prob": 0.0006102172774262726}, + {"id": 854, "seek": 536384, "start": 5363.84, "end": 5368.400000000001, "text": + " started. So the, so the vectors themselves right off the bat or if you do question + answering,", "tokens": [50364, 1409, 13, 407, 264, 11, 370, 264, 18875, 2969, 558, + 766, 264, 7362, 420, 498, 291, 360, 1168, 13430, 11, 50592], "temperature": 0.0, + "avg_logprob": -0.1491843561331431, "compression_ratio": 1.6075949367088607, "no_speech_prob": + 0.0009784846333786845}, {"id": 855, "seek": 536384, "start": 5368.400000000001, + "end": 5375.28, "text": " it''ll be, it''ll be pretty good. Like for, for regular, + regular English, not domain specific.", "tokens": [50592, 309, 603, 312, 11, 309, + 603, 312, 1238, 665, 13, 1743, 337, 11, 337, 3890, 11, 3890, 3669, 11, 406, 9274, + 2685, 13, 50936], "temperature": 0.0, "avg_logprob": -0.1491843561331431, "compression_ratio": + 1.6075949367088607, "no_speech_prob": 0.0009784846333786845}, {"id": 856, "seek": + 536384, "start": 5375.28, "end": 5381.28, "text": " You, you still have to do fine + tuning for most cases. 
But you''re not going to start fine tuning", "tokens": [50936, + 509, 11, 291, 920, 362, 281, 360, 2489, 15164, 337, 881, 3331, 13, 583, 291, 434, + 406, 516, 281, 722, 2489, 15164, 51236], "temperature": 0.0, "avg_logprob": -0.1491843561331431, + "compression_ratio": 1.6075949367088607, "no_speech_prob": 0.0009784846333786845}, + {"id": 857, "seek": 536384, "start": 5381.28, "end": 5387.84, "text": " before you + even know how this thing performs like in the beginning, right? You want to try + a model", "tokens": [51236, 949, 291, 754, 458, 577, 341, 551, 26213, 411, 294, + 264, 2863, 11, 558, 30, 509, 528, 281, 853, 257, 2316, 51564], "temperature": 0.0, + "avg_logprob": -0.1491843561331431, "compression_ratio": 1.6075949367088607, "no_speech_prob": + 0.0009784846333786845}, {"id": 858, "seek": 538784, "start": 5387.92, "end": 5395.52, + "text": " and see what, how close it is. Um, so there''s some starting, starting + work there. I know Algolia is", "tokens": [50368, 293, 536, 437, 11, 577, 1998, + 309, 307, 13, 3301, 11, 370, 456, 311, 512, 2891, 11, 2891, 589, 456, 13, 286, 458, + 967, 70, 29760, 307, 50748], "temperature": 0.0, "avg_logprob": -0.17935154801708156, + "compression_ratio": 1.6556016597510372, "no_speech_prob": 0.015115798451006413}, + {"id": 859, "seek": 538784, "start": 5395.52, "end": 5399.52, "text": " getting + to the vector search stuff. So I don''t know. Maybe they, they, maybe they don''t + know how to", "tokens": [50748, 1242, 281, 264, 8062, 3164, 1507, 13, 407, 286, + 500, 380, 458, 13, 2704, 436, 11, 436, 11, 1310, 436, 500, 380, 458, 577, 281, 50948], + "temperature": 0.0, "avg_logprob": -0.17935154801708156, "compression_ratio": 1.6556016597510372, + "no_speech_prob": 0.015115798451006413}, {"id": 860, "seek": 538784, "start": 5399.52, + "end": 5405.84, "text": " choose a model. So you guys, you can use my default model + if you want. 
It''s just a, yeah, absolutely.", "tokens": [50948, 2826, 257, 2316, + 13, 407, 291, 1074, 11, 291, 393, 764, 452, 7576, 2316, 498, 291, 528, 13, 467, + 311, 445, 257, 11, 1338, 11, 3122, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.17935154801708156, "compression_ratio": 1.6556016597510372, "no_speech_prob": + 0.015115798451006413}, {"id": 861, "seek": 538784, "start": 5405.84, "end": 5412.96, + "text": " I mean, so far, what I hear from you is that my tea has, uh, the qualities, + let''s say, it can run", "tokens": [51264, 286, 914, 11, 370, 1400, 11, 437, 286, + 1568, 490, 291, 307, 300, 452, 5817, 575, 11, 2232, 11, 264, 16477, 11, 718, 311, + 584, 11, 309, 393, 1190, 51620], "temperature": 0.0, "avg_logprob": -0.17935154801708156, + "compression_ratio": 1.6556016597510372, "no_speech_prob": 0.015115798451006413}, + {"id": 862, "seek": 541296, "start": 5412.96, "end": 5419.2, "text": " on pure CPU, + which is a win on cost. It scales, which is also a win on cost in the long term.", + "tokens": [50364, 322, 6075, 13199, 11, 597, 307, 257, 1942, 322, 2063, 13, 467, + 17408, 11, 597, 307, 611, 257, 1942, 322, 2063, 294, 264, 938, 1433, 13, 50676], + "temperature": 0.0, "avg_logprob": -0.1636808552873244, "compression_ratio": 1.654867256637168, + "no_speech_prob": 0.03533283621072769}, {"id": 863, "seek": 541296, "start": 5419.76, + "end": 5429.52, "text": " Right. Um, and it also, uh, is insanely fast, which is + a win on product. It''s a win on, go to", "tokens": [50704, 1779, 13, 3301, 11, + 293, 309, 611, 11, 2232, 11, 307, 40965, 2370, 11, 597, 307, 257, 1942, 322, 1674, + 13, 467, 311, 257, 1942, 322, 11, 352, 281, 51192], "temperature": 0.0, "avg_logprob": + -0.1636808552873244, "compression_ratio": 1.654867256637168, "no_speech_prob": 0.03533283621072769}, + {"id": 864, "seek": 541296, "start": 5429.52, "end": 5435.52, "text": " market. 
+ Like I have this document, how quickly it travels through the pipeline and is searchable.", + "tokens": [51192, 2142, 13, 1743, 286, 362, 341, 4166, 11, 577, 2661, 309, 19863, + 807, 264, 15517, 293, 307, 3164, 712, 13, 51492], "temperature": 0.0, "avg_logprob": + -0.1636808552873244, "compression_ratio": 1.654867256637168, "no_speech_prob": 0.03533283621072769}, + {"id": 865, "seek": 541296, "start": 5436.56, "end": 5442.16, "text": " Right. So + I mean, it''s important use case. In some cases, like paramount, you know, like", + "tokens": [51544, 1779, 13, 407, 286, 914, 11, 309, 311, 1021, 764, 1389, 13, 682, + 512, 3331, 11, 411, 6220, 792, 11, 291, 458, 11, 411, 51824], "temperature": 0.0, + "avg_logprob": -0.1636808552873244, "compression_ratio": 1.654867256637168, "no_speech_prob": + 0.03533283621072769}, {"id": 866, "seek": 544216, "start": 5442.16, "end": 5448.32, + "text": " financial space, you know, a document came out. I wanted to be indexed + right away. Like a second", "tokens": [50364, 4669, 1901, 11, 291, 458, 11, 257, + 4166, 1361, 484, 13, 286, 1415, 281, 312, 8186, 292, 558, 1314, 13, 1743, 257, 1150, + 50672], "temperature": 0.0, "avg_logprob": -0.15753915376752337, "compression_ratio": + 1.563265306122449, "no_speech_prob": 0.006090919487178326}, {"id": 867, "seek": + 544216, "start": 5448.32, "end": 5453.76, "text": " after, I don''t want to wait + five minutes. It will be way too late for me to make a decision. So,", "tokens": + [50672, 934, 11, 286, 500, 380, 528, 281, 1699, 1732, 2077, 13, 467, 486, 312, 636, + 886, 3469, 337, 385, 281, 652, 257, 3537, 13, 407, 11, 50944], "temperature": 0.0, + "avg_logprob": -0.15753915376752337, "compression_ratio": 1.563265306122449, "no_speech_prob": + 0.006090919487178326}, {"id": 868, "seek": 544216, "start": 5454.639999999999, "end": + 5461.36, "text": " I mean, is there something else? 
Like you, and maybe if you, + if you could compare now or point", "tokens": [50988, 286, 914, 11, 307, 456, 746, + 1646, 30, 1743, 291, 11, 293, 1310, 498, 291, 11, 498, 291, 727, 6794, 586, 420, + 935, 51324], "temperature": 0.0, "avg_logprob": -0.15753915376752337, "compression_ratio": + 1.563265306122449, "no_speech_prob": 0.006090919487178326}, {"id": 869, "seek": + 544216, "start": 5461.36, "end": 5467.04, "text": " us to the blog post, you know, + uh, with other vendors like Amazon has in French, uh, you know,", "tokens": [51324, + 505, 281, 264, 6968, 2183, 11, 291, 458, 11, 2232, 11, 365, 661, 22056, 411, 6795, + 575, 294, 5522, 11, 2232, 11, 291, 458, 11, 51608], "temperature": 0.0, "avg_logprob": + -0.15753915376752337, "compression_ratio": 1.563265306122449, "no_speech_prob": + 0.006090919487178326}, {"id": 870, "seek": 546704, "start": 5467.12, "end": 5475.04, + "text": " hugging face has infinity, infinity, right? Um, and then, uh, and video, + I think they also had", "tokens": [50368, 41706, 1851, 575, 13202, 11, 13202, 11, + 558, 30, 3301, 11, 293, 550, 11, 2232, 11, 293, 960, 11, 286, 519, 436, 611, 632, + 50764], "temperature": 0.0, "avg_logprob": -0.1803096889220562, "compression_ratio": + 1.626086956521739, "no_speech_prob": 0.007530526723712683}, {"id": 871, "seek": + 546704, "start": 5475.04, "end": 5480.8, "text": " some layer. I forgot its name, + but like those are probably fairly expensive. They probably", "tokens": [50764, + 512, 4583, 13, 286, 5298, 1080, 1315, 11, 457, 411, 729, 366, 1391, 6457, 5124, + 13, 814, 1391, 51052], "temperature": 0.0, "avg_logprob": -0.1803096889220562, "compression_ratio": + 1.626086956521739, "no_speech_prob": 0.007530526723712683}, {"id": 872, "seek": + 546704, "start": 5480.8, "end": 5488.64, "text": " are not $90 per piece. So what, + what, what is your thinking there? 
So like you, you, I think you", "tokens": [51052, + 366, 406, 1848, 7771, 680, 2522, 13, 407, 437, 11, 437, 11, 437, 307, 428, 1953, + 456, 30, 407, 411, 291, 11, 291, 11, 286, 519, 291, 51444], "temperature": 0.0, + "avg_logprob": -0.1803096889220562, "compression_ratio": 1.626086956521739, "no_speech_prob": + 0.007530526723712683}, {"id": 873, "seek": 546704, "start": 5488.64, "end": 5494.48, + "text": " also are vocal in this space or like in that direction that mighty is + much more economical.", "tokens": [51444, 611, 366, 11657, 294, 341, 1901, 420, + 411, 294, 300, 3513, 300, 21556, 307, 709, 544, 42473, 13, 51736], "temperature": + 0.0, "avg_logprob": -0.1803096889220562, "compression_ratio": 1.626086956521739, + "no_speech_prob": 0.007530526723712683}, {"id": 874, "seek": 549448, "start": 5494.48, + "end": 5501.759999999999, "text": " Uh, than these more expensive solutions, but + they probably offer something else as well, but like,", "tokens": [50364, 4019, + 11, 813, 613, 544, 5124, 6547, 11, 457, 436, 1391, 2626, 746, 1646, 382, 731, 11, + 457, 411, 11, 50728], "temperature": 0.0, "avg_logprob": -0.2381542696811185, "compression_ratio": + 1.6455696202531647, "no_speech_prob": 0.011571727693080902}, {"id": 875, "seek": + 549448, "start": 5501.759999999999, "end": 5510.24, "text": " you have an issue + for sure. Yeah. 
Um, I think that, so the interesting thing, if you, if you get", + "tokens": [50728, 291, 362, 364, 2734, 337, 988, 13, 865, 13, 3301, 11, 286, 519, + 300, 11, 370, 264, 1880, 551, 11, 498, 291, 11, 498, 291, 483, 51152], "temperature": + 0.0, "avg_logprob": -0.2381542696811185, "compression_ratio": 1.6455696202531647, + "no_speech_prob": 0.011571727693080902}, {"id": 876, "seek": 549448, "start": 5510.24, + "end": 5515.04, "text": " involved with like, if you, if you get into Amazon, like + in French, yeah, and all this stuff,", "tokens": [51152, 3288, 365, 411, 11, 498, + 291, 11, 498, 291, 483, 666, 6795, 11, 411, 294, 5522, 11, 1338, 11, 293, 439, 341, + 1507, 11, 51392], "temperature": 0.0, "avg_logprob": -0.2381542696811185, "compression_ratio": + 1.6455696202531647, "no_speech_prob": 0.011571727693080902}, {"id": 877, "seek": + 549448, "start": 5515.679999999999, "end": 5521.04, "text": " they crafted like + their entire, like they build their own hardware. Um, they have their neural core,", + "tokens": [51424, 436, 36213, 411, 641, 2302, 11, 411, 436, 1322, 641, 1065, 8837, + 13, 3301, 11, 436, 362, 641, 18161, 4965, 11, 51692], "temperature": 0.0, "avg_logprob": + -0.2381542696811185, "compression_ratio": 1.6455696202531647, "no_speech_prob": + 0.011571727693080902}, {"id": 878, "seek": 552104, "start": 5522.0, "end": 5529.6, + "text": " um, that all the stuff is based around. And that''s like, it''s lockin. + It''s big time lockin, right?", "tokens": [50412, 1105, 11, 300, 439, 264, 1507, + 307, 2361, 926, 13, 400, 300, 311, 411, 11, 309, 311, 4017, 259, 13, 467, 311, 955, + 565, 4017, 259, 11, 558, 30, 50792], "temperature": 0.0, "avg_logprob": -0.19079274868746415, + "compression_ratio": 1.632034632034632, "no_speech_prob": 0.001660916837863624}, + {"id": 879, "seek": 552104, "start": 5529.6, "end": 5539.28, "text": " Um, uh, this + is just a web API. You can just use it. 
I, I think that, um, I, I''ve considered", + "tokens": [50792, 3301, 11, 2232, 11, 341, 307, 445, 257, 3670, 9362, 13, 509, 393, + 445, 764, 309, 13, 286, 11, 286, 519, 300, 11, 1105, 11, 286, 11, 286, 600, 4888, + 51276], "temperature": 0.0, "avg_logprob": -0.19079274868746415, "compression_ratio": + 1.632034632034632, "no_speech_prob": 0.001660916837863624}, {"id": 880, "seek": + 552104, "start": 5539.28, "end": 5544.8, "text": " also like hosting an API, like, + uh, hugging face, hugging faces like one of the most amazing", "tokens": [51276, + 611, 411, 16058, 364, 9362, 11, 411, 11, 2232, 11, 41706, 1851, 11, 41706, 8475, + 411, 472, 295, 264, 881, 2243, 51552], "temperature": 0.0, "avg_logprob": -0.19079274868746415, + "compression_ratio": 1.632034632034632, "no_speech_prob": 0.001660916837863624}, + {"id": 881, "seek": 552104, "start": 5544.8, "end": 5550.0, "text": " software companies + ever. It''s like, that''s like the real community driven open source stuff.", "tokens": + [51552, 4722, 3431, 1562, 13, 467, 311, 411, 11, 300, 311, 411, 264, 957, 1768, + 9555, 1269, 4009, 1507, 13, 51812], "temperature": 0.0, "avg_logprob": -0.19079274868746415, + "compression_ratio": 1.632034632034632, "no_speech_prob": 0.001660916837863624}, + {"id": 882, "seek": 555000, "start": 5550.0, "end": 5554.48, "text": " They do such + amazing work. So I don''t want to, I don''t want to say anything bad about hugging", + "tokens": [50364, 814, 360, 1270, 2243, 589, 13, 407, 286, 500, 380, 528, 281, 11, + 286, 500, 380, 528, 281, 584, 1340, 1578, 466, 41706, 50588], "temperature": 0.0, + "avg_logprob": -0.1598685642458358, "compression_ratio": 1.6973684210526316, "no_speech_prob": + 0.000646044674795121}, {"id": 883, "seek": 555000, "start": 5554.48, "end": 5560.0, + "text": " face because I really have nothing bad to say at all. 
Um, but, you know, + Infinity definitely has", "tokens": [50588, 1851, 570, 286, 534, 362, 1825, 1578, + 281, 584, 412, 439, 13, 3301, 11, 457, 11, 291, 458, 11, 34762, 2138, 575, 50864], + "temperature": 0.0, "avg_logprob": -0.1598685642458358, "compression_ratio": 1.6973684210526316, + "no_speech_prob": 0.000646044674795121}, {"id": 884, "seek": 555000, "start": 5560.0, + "end": 5567.28, "text": " a fit for the market, which is like, you know, if you + are like Walmart and you need a solution, okay.", "tokens": [50864, 257, 3318, 337, + 264, 2142, 11, 597, 307, 411, 11, 291, 458, 11, 498, 291, 366, 411, 25237, 293, + 291, 643, 257, 3827, 11, 1392, 13, 51228], "temperature": 0.0, "avg_logprob": -0.1598685642458358, + "compression_ratio": 1.6973684210526316, "no_speech_prob": 0.000646044674795121}, + {"id": 885, "seek": 555000, "start": 5568.88, "end": 5574.8, "text": " Hacking face + infinity is in your budget. Go pay for it, you know. Um, that''s the type of thing", + "tokens": [51308, 389, 14134, 1851, 13202, 307, 294, 428, 4706, 13, 1037, 1689, + 337, 309, 11, 291, 458, 13, 3301, 11, 300, 311, 264, 2010, 295, 551, 51604], "temperature": + 0.0, "avg_logprob": -0.1598685642458358, "compression_ratio": 1.6973684210526316, + "no_speech_prob": 0.000646044674795121}, {"id": 886, "seek": 557480, "start": 5574.88, + "end": 5582.72, "text": " that Walmart should use. 
Um, but if, if you are just like, + if you''re a five person developer team", "tokens": [50368, 300, 25237, 820, 764, + 13, 3301, 11, 457, 498, 11, 498, 291, 366, 445, 411, 11, 498, 291, 434, 257, 1732, + 954, 10754, 1469, 50760], "temperature": 0.0, "avg_logprob": -0.13758040428161622, + "compression_ratio": 1.6222222222222222, "no_speech_prob": 0.0006148762768134475}, + {"id": 887, "seek": 557480, "start": 5582.72, "end": 5587.4400000000005, "text": + " or like, even a, if you work at a company that''s like, you know, 300 people,", + "tokens": [50760, 420, 411, 11, 754, 257, 11, 498, 291, 589, 412, 257, 2237, 300, + 311, 411, 11, 291, 458, 11, 6641, 561, 11, 50996], "temperature": 0.0, "avg_logprob": + -0.13758040428161622, "compression_ratio": 1.6222222222222222, "no_speech_prob": + 0.0006148762768134475}, {"id": 888, "seek": 557480, "start": 5589.2, "end": 5596.24, + "text": " infinity is like really, really expensive. Um, so there is a, there is + a market segmentation there.", "tokens": [51084, 13202, 307, 411, 534, 11, 534, + 5124, 13, 3301, 11, 370, 456, 307, 257, 11, 456, 307, 257, 2142, 9469, 399, 456, + 13, 51436], "temperature": 0.0, "avg_logprob": -0.13758040428161622, "compression_ratio": + 1.6222222222222222, "no_speech_prob": 0.0006148762768134475}, {"id": 889, "seek": + 557480, "start": 5596.24, "end": 5600.72, "text": " There''s a difference between, + okay, well, how much can you afford and who can you hire and", "tokens": [51436, + 821, 311, 257, 2649, 1296, 11, 1392, 11, 731, 11, 577, 709, 393, 291, 6157, 293, + 567, 393, 291, 11158, 293, 51660], "temperature": 0.0, "avg_logprob": -0.13758040428161622, + "compression_ratio": 1.6222222222222222, "no_speech_prob": 0.0006148762768134475}, + {"id": 890, "seek": 560072, "start": 5600.88, "end": 5606.56, "text": " what''s + the level of, um, internal support that you have to put around this thing and how + does it all fit?", "tokens": [50372, 437, 311, 264, 1496, 295, 11, 1105, 11, 6920, + 1406, 300, 
291, 362, 281, 829, 926, 341, 551, 293, 577, 775, 309, 439, 3318, 30, + 50656], "temperature": 0.0, "avg_logprob": -0.16906849173612373, "compression_ratio": + 1.7169117647058822, "no_speech_prob": 0.002512376755475998}, {"id": 891, "seek": + 560072, "start": 5608.320000000001, "end": 5613.12, "text": " The teams that are + just starting off that need to use something that, that works really fast,", "tokens": + [50744, 440, 5491, 300, 366, 445, 2891, 766, 300, 643, 281, 764, 746, 300, 11, 300, + 1985, 534, 2370, 11, 50984], "temperature": 0.0, "avg_logprob": -0.16906849173612373, + "compression_ratio": 1.7169117647058822, "no_speech_prob": 0.002512376755475998}, + {"id": 892, "seek": 560072, "start": 5613.12, "end": 5618.320000000001, "text": + " easy to use, then that''s, that''s where it might fit. So I don''t think mighty + can,", "tokens": [50984, 1858, 281, 764, 11, 550, 300, 311, 11, 300, 311, 689, 309, + 1062, 3318, 13, 407, 286, 500, 380, 519, 21556, 393, 11, 51244], "temperature": + 0.0, "avg_logprob": -0.16906849173612373, "compression_ratio": 1.7169117647058822, + "no_speech_prob": 0.002512376755475998}, {"id": 893, "seek": 560072, "start": 5618.320000000001, + "end": 5623.92, "text": " competes with infinity because honestly, I, uh, you know, + hey, Walmart, if you want me as a", "tokens": [51244, 2850, 279, 365, 13202, 570, + 6095, 11, 286, 11, 2232, 11, 291, 458, 11, 4177, 11, 25237, 11, 498, 291, 528, 385, + 382, 257, 51524], "temperature": 0.0, "avg_logprob": -0.16906849173612373, "compression_ratio": + 1.7169117647058822, "no_speech_prob": 0.002512376755475998}, {"id": 894, "seek": + 560072, "start": 5623.92, "end": 5628.400000000001, "text": " customer, if you want, + if you want to buy mighty, sure, go ahead, you know, let''s talk or you", "tokens": + [51524, 5474, 11, 498, 291, 528, 11, 498, 291, 528, 281, 2256, 21556, 11, 988, 11, + 352, 2286, 11, 291, 458, 11, 718, 311, 751, 420, 291, 51748], "temperature": 0.0, + "avg_logprob": 
-0.16906849173612373, "compression_ratio": 1.7169117647058822, "no_speech_prob": + 0.002512376755475998}, {"id": 895, "seek": 562840, "start": 5628.4, "end": 5632.799999999999, + "text": " can pay the 99 bucks a month. But, you know, that''s not, that''s not + one target. I''m trying to", "tokens": [50364, 393, 1689, 264, 11803, 11829, 257, + 1618, 13, 583, 11, 291, 458, 11, 300, 311, 406, 11, 300, 311, 406, 472, 3779, 13, + 286, 478, 1382, 281, 50584], "temperature": 0.0, "avg_logprob": -0.17214930499041523, + "compression_ratio": 1.5, "no_speech_prob": 0.0036120289005339146}, {"id": 896, + "seek": 562840, "start": 5632.799999999999, "end": 5639.2, "text": " make it super + easy for everybody else. Um, somebody high rank recently, uh, connected with me,", + "tokens": [50584, 652, 309, 1687, 1858, 337, 2201, 1646, 13, 3301, 11, 2618, 1090, + 6181, 3938, 11, 2232, 11, 4582, 365, 385, 11, 50904], "temperature": 0.0, "avg_logprob": + -0.17214930499041523, "compression_ratio": 1.5, "no_speech_prob": 0.0036120289005339146}, + {"id": 897, "seek": 562840, "start": 5639.2, "end": 5645.2, "text": " a LinkedIn, + I think some kind of VP of engineering, hey, if you''re looking into embeddings, + contact max.", "tokens": [50904, 257, 20657, 11, 286, 519, 512, 733, 295, 35812, + 295, 7043, 11, 4177, 11, 498, 291, 434, 1237, 666, 12240, 29432, 11, 3385, 11469, + 13, 51204], "temperature": 0.0, "avg_logprob": -0.17214930499041523, "compression_ratio": + 1.5, "no_speech_prob": 0.0036120289005339146}, {"id": 898, "seek": 562840, "start": + 5648.32, "end": 5656.0, "text": " Really? 
But like, so we understand, uh, infinity + a little bit better because I didn''t try this at all.", "tokens": [51360, 4083, + 30, 583, 411, 11, 370, 321, 1223, 11, 2232, 11, 13202, 257, 707, 857, 1101, 570, + 286, 994, 380, 853, 341, 412, 439, 13, 51744], "temperature": 0.0, "avg_logprob": + -0.17214930499041523, "compression_ratio": 1.5, "no_speech_prob": 0.0036120289005339146}, + {"id": 899, "seek": 565600, "start": 5656.96, "end": 5662.64, "text": " Um, is this + like some kind of web service that you basically buy subscription for like,", "tokens": + [50412, 3301, 11, 307, 341, 411, 512, 733, 295, 3670, 2643, 300, 291, 1936, 2256, + 17231, 337, 411, 11, 50696], "temperature": 0.0, "avg_logprob": -0.2154098885958312, + "compression_ratio": 1.78544061302682, "no_speech_prob": 0.005826977081596851}, + {"id": 900, "seek": 565600, "start": 5662.64, "end": 5667.92, "text": " sauce kind + of thing? No, it''s like a dark container. I think infinity is a dark container. + Um,", "tokens": [50696, 4880, 733, 295, 551, 30, 883, 11, 309, 311, 411, 257, 2877, + 10129, 13, 286, 519, 13202, 307, 257, 2877, 10129, 13, 3301, 11, 50960], "temperature": + 0.0, "avg_logprob": -0.2154098885958312, "compression_ratio": 1.78544061302682, + "no_speech_prob": 0.005826977081596851}, {"id": 901, "seek": 565600, "start": 5669.28, + "end": 5674.08, "text": " I don''t know, it might be, it might even be written in + Rust, I''m not sure. 
Consider tokenizers are", "tokens": [51028, 286, 500, 380, + 458, 11, 309, 1062, 312, 11, 309, 1062, 754, 312, 3720, 294, 34952, 11, 286, 478, + 406, 988, 13, 17416, 14862, 22525, 366, 51268], "temperature": 0.0, "avg_logprob": + -0.2154098885958312, "compression_ratio": 1.78544061302682, "no_speech_prob": 0.005826977081596851}, + {"id": 902, "seek": 565600, "start": 5674.08, "end": 5678.48, "text": " written + in Rust, they may have done, I may have done some, infinity came out before mighty, + so they", "tokens": [51268, 3720, 294, 34952, 11, 436, 815, 362, 1096, 11, 286, + 815, 362, 1096, 512, 11, 13202, 1361, 484, 949, 21556, 11, 370, 436, 51488], "temperature": + 0.0, "avg_logprob": -0.2154098885958312, "compression_ratio": 1.78544061302682, + "no_speech_prob": 0.005826977081596851}, {"id": 903, "seek": 565600, "start": 5678.48, + "end": 5683.12, "text": " may have done something. So it''s a perfect competitor + for, for mighty in that sense.", "tokens": [51488, 815, 362, 1096, 746, 13, 407, + 309, 311, 257, 2176, 27266, 337, 11, 337, 21556, 294, 300, 2020, 13, 51720], "temperature": + 0.0, "avg_logprob": -0.2154098885958312, "compression_ratio": 1.78544061302682, + "no_speech_prob": 0.005826977081596851}, {"id": 904, "seek": 568312, "start": 5683.12, + "end": 5689.2, "text": " Um, I mean, I mean, no time pricing, but I mean, the only + package itself, right? So basically,", "tokens": [50364, 3301, 11, 286, 914, 11, + 286, 914, 11, 572, 565, 17621, 11, 457, 286, 914, 11, 264, 787, 7372, 2564, 11, + 558, 30, 407, 1936, 11, 50668], "temperature": 0.0, "avg_logprob": -0.26638587315877277, + "compression_ratio": 1.6885964912280702, "no_speech_prob": 0.00759807089343667}, + {"id": 905, "seek": 568312, "start": 5689.2, "end": 5695.28, "text": " it''s like + Docker Docker anyway. Yeah. 
And I think, I think, I think my, well, I think infinity", + "tokens": [50668, 309, 311, 411, 33772, 33772, 4033, 13, 865, 13, 400, 286, 519, + 11, 286, 519, 11, 286, 519, 452, 11, 731, 11, 286, 519, 13202, 50972], "temperature": + 0.0, "avg_logprob": -0.26638587315877277, "compression_ratio": 1.6885964912280702, + "no_speech_prob": 0.00759807089343667}, {"id": 906, "seek": 568312, "start": 5695.28, + "end": 5702.88, "text": " encourages GPU like you, they want you to use GPU for + it. But that''s like, I think infinity fits", "tokens": [50972, 28071, 18407, 411, + 291, 11, 436, 528, 291, 281, 764, 18407, 337, 309, 13, 583, 300, 311, 411, 11, 286, + 519, 13202, 9001, 51352], "temperature": 0.0, "avg_logprob": -0.26638587315877277, + "compression_ratio": 1.6885964912280702, "no_speech_prob": 0.00759807089343667}, + {"id": 907, "seek": 568312, "start": 5702.88, "end": 5708.4, "text": " well. If + you have like, you know, a million requests an hour, something like that scale, + you know?", "tokens": [51352, 731, 13, 759, 291, 362, 411, 11, 291, 458, 11, 257, + 2459, 12475, 364, 1773, 11, 746, 411, 300, 4373, 11, 291, 458, 30, 51628], "temperature": + 0.0, "avg_logprob": -0.26638587315877277, "compression_ratio": 1.6885964912280702, + "no_speech_prob": 0.00759807089343667}, {"id": 908, "seek": 570840, "start": 5708.4, + "end": 5716.4, "text": " Yeah. 
Um, if you have like 20,000 requests a day or a thousand + requests a day, you know,", "tokens": [50364, 865, 13, 3301, 11, 498, 291, 362, + 411, 945, 11, 1360, 12475, 257, 786, 420, 257, 4714, 12475, 257, 786, 11, 291, 458, + 11, 50764], "temperature": 0.0, "avg_logprob": -0.1739403020555728, "compression_ratio": + 1.7674418604651163, "no_speech_prob": 0.008607376366853714}, {"id": 909, "seek": + 570840, "start": 5716.4, "end": 5722.639999999999, "text": " that, that range, a + hundred thousand, you know, I think by these perfect for that, you know,", "tokens": + [50764, 300, 11, 300, 3613, 11, 257, 3262, 4714, 11, 291, 458, 11, 286, 519, 538, + 613, 2176, 337, 300, 11, 291, 458, 11, 51076], "temperature": 0.0, "avg_logprob": + -0.1739403020555728, "compression_ratio": 1.7674418604651163, "no_speech_prob": + 0.008607376366853714}, {"id": 910, "seek": 570840, "start": 5722.639999999999, "end": + 5728.16, "text": " it''s not, you don''t have to have like this huge scale. It can + get bigger. You can just, you know,", "tokens": [51076, 309, 311, 406, 11, 291, + 500, 380, 362, 281, 362, 411, 341, 2603, 4373, 13, 467, 393, 483, 3801, 13, 509, + 393, 445, 11, 291, 458, 11, 51352], "temperature": 0.0, "avg_logprob": -0.1739403020555728, + "compression_ratio": 1.7674418604651163, "no_speech_prob": 0.008607376366853714}, + {"id": 911, "seek": 570840, "start": 5728.16, "end": 5733.12, "text": " spend more + money on hardware than scale it up as much as you want. You can support, you can + support,", "tokens": [51352, 3496, 544, 1460, 322, 8837, 813, 4373, 309, 493, 382, + 709, 382, 291, 528, 13, 509, 393, 1406, 11, 291, 393, 1406, 11, 51600], "temperature": + 0.0, "avg_logprob": -0.1739403020555728, "compression_ratio": 1.7674418604651163, + "no_speech_prob": 0.008607376366853714}, {"id": 912, "seek": 573312, "start": 5733.599999999999, + "end": 5738.24, "text": " you know, a million requests a day if you want to, you''re + a 10 million. 
You just have to put more", "tokens": [50388, 291, 458, 11, 257, 2459, + 12475, 257, 786, 498, 291, 528, 281, 11, 291, 434, 257, 1266, 2459, 13, 509, 445, + 362, 281, 829, 544, 50620], "temperature": 0.0, "avg_logprob": -0.11717043099579988, + "compression_ratio": 1.6556016597510372, "no_speech_prob": 0.0022550441790372133}, + {"id": 913, "seek": 573312, "start": 5738.24, "end": 5744.24, "text": " hardware + behind it. So I think I''m just competing in a different market. I don''t think, + I don''t think", "tokens": [50620, 8837, 2261, 309, 13, 407, 286, 519, 286, 478, + 445, 15439, 294, 257, 819, 2142, 13, 286, 500, 380, 519, 11, 286, 500, 380, 519, + 50920], "temperature": 0.0, "avg_logprob": -0.11717043099579988, "compression_ratio": + 1.6556016597510372, "no_speech_prob": 0.0022550441790372133}, {"id": 914, "seek": + 573312, "start": 5744.24, "end": 5752.08, "text": " infinity and I are targeting + the same, the same businesses. Yeah. Yeah. And I mean, you do have the", "tokens": + [50920, 13202, 293, 286, 366, 17918, 264, 912, 11, 264, 912, 6011, 13, 865, 13, + 865, 13, 400, 286, 914, 11, 291, 360, 362, 264, 51312], "temperature": 0.0, "avg_logprob": + -0.11717043099579988, "compression_ratio": 1.6556016597510372, "no_speech_prob": + 0.0022550441790372133}, {"id": 915, "seek": 573312, "start": 5752.08, "end": 5760.16, + "text": " edge on the fact that you want to address the community beyond Python. + So like, I think it''s a big,", "tokens": [51312, 4691, 322, 264, 1186, 300, 291, + 528, 281, 2985, 264, 1768, 4399, 15329, 13, 407, 411, 11, 286, 519, 309, 311, 257, + 955, 11, 51716], "temperature": 0.0, "avg_logprob": -0.11717043099579988, "compression_ratio": + 1.6556016597510372, "no_speech_prob": 0.0022550441790372133}, {"id": 916, "seek": + 576016, "start": 5760.96, "end": 5769.92, "text": " it''s a big message to send. 
+ Um, and in some ways through you, you channeled this, this feeling that,", "tokens": + [50404, 309, 311, 257, 955, 3636, 281, 2845, 13, 3301, 11, 293, 294, 512, 2098, + 807, 291, 11, 291, 2269, 292, 341, 11, 341, 2633, 300, 11, 50852], "temperature": + 0.0, "avg_logprob": -0.23092596871512278, "compression_ratio": 1.6596638655462186, + "no_speech_prob": 0.013590123504400253}, {"id": 917, "seek": 576016, "start": 5769.92, + "end": 5776.24, "text": " hey, this guy is in Node.js, a job, I probably feel like + left out from this big thing, but it''s", "tokens": [50852, 4177, 11, 341, 2146, + 307, 294, 38640, 13, 25530, 11, 257, 1691, 11, 286, 1391, 841, 411, 1411, 484, 490, + 341, 955, 551, 11, 457, 309, 311, 51168], "temperature": 0.0, "avg_logprob": -0.23092596871512278, + "compression_ratio": 1.6596638655462186, "no_speech_prob": 0.013590123504400253}, + {"id": 918, "seek": 576016, "start": 5776.24, "end": 5781.84, "text": " probably + not true. I mean, I know also there is this deep learning for J and blah, blah, + blah, but like,", "tokens": [51168, 1391, 406, 2074, 13, 286, 914, 11, 286, 458, + 611, 456, 307, 341, 2452, 2539, 337, 508, 293, 12288, 11, 12288, 11, 12288, 11, + 457, 411, 11, 51448], "temperature": 0.0, "avg_logprob": -0.23092596871512278, "compression_ratio": + 1.6596638655462186, "no_speech_prob": 0.013590123504400253}, {"id": 919, "seek": + 576016, "start": 5781.84, "end": 5787.28, "text": " it''s like an island in the + ocean, probably comparing it to it. It''s amazing software. It just", "tokens": + [51448, 309, 311, 411, 364, 6077, 294, 264, 7810, 11, 1391, 15763, 309, 281, 309, + 13, 467, 311, 2243, 4722, 13, 467, 445, 51720], "temperature": 0.0, "avg_logprob": + -0.23092596871512278, "compression_ratio": 1.6596638655462186, "no_speech_prob": + 0.013590123504400253}, {"id": 920, "seek": 578728, "start": 5787.28, "end": 5794.0, + "text": " didn''t get the adoption that Python got. Yeah. 
I remember going through + these internal pains myself,", "tokens": [50364, 994, 380, 483, 264, 19215, 300, + 15329, 658, 13, 865, 13, 286, 1604, 516, 807, 613, 6920, 29774, 2059, 11, 50700], + "temperature": 0.0, "avg_logprob": -0.25029105369490806, "compression_ratio": 1.6296296296296295, + "no_speech_prob": 0.003511376678943634}, {"id": 921, "seek": 578728, "start": 5794.0, + "end": 5803.04, "text": " right? When I was, when it was like 2015, 2016, and I + started, and I started getting deep learning", "tokens": [50700, 558, 30, 1133, + 286, 390, 11, 562, 309, 390, 411, 7546, 11, 6549, 11, 293, 286, 1409, 11, 293, 286, + 1409, 1242, 2452, 2539, 51152], "temperature": 0.0, "avg_logprob": -0.25029105369490806, + "compression_ratio": 1.6296296296296295, "no_speech_prob": 0.003511376678943634}, + {"id": 922, "seek": 578728, "start": 5803.04, "end": 5809.44, "text": " training + and I took course error courses and rings courses on machine learning and stuff. + I started", "tokens": [51152, 3097, 293, 286, 1890, 1164, 6713, 7712, 293, 11136, + 7712, 322, 3479, 2539, 293, 1507, 13, 286, 1409, 51472], "temperature": 0.0, "avg_logprob": + -0.25029105369490806, "compression_ratio": 1.6296296296296295, "no_speech_prob": + 0.003511376678943634}, {"id": 923, "seek": 578728, "start": 5809.44, "end": 5815.599999999999, + "text": " it off with Octav, which is an open source, uh, is a mathematical or whatever. + It''s a new Octav,", "tokens": [51472, 309, 766, 365, 6788, 706, 11, 597, 307, 364, + 1269, 4009, 11, 2232, 11, 307, 257, 18894, 420, 2035, 13, 467, 311, 257, 777, 6788, + 706, 11, 51780], "temperature": 0.0, "avg_logprob": -0.25029105369490806, "compression_ratio": + 1.6296296296296295, "no_speech_prob": 0.003511376678943634}, {"id": 924, "seek": + 581560, "start": 5815.6, "end": 5820.4800000000005, "text": " but it''s like, it''s + its own language, right? So it''s mathematics, um, just just code. 
But then,", "tokens": + [50364, 457, 309, 311, 411, 11, 309, 311, 1080, 1065, 2856, 11, 558, 30, 407, 309, + 311, 18666, 11, 1105, 11, 445, 445, 3089, 13, 583, 550, 11, 50608], "temperature": + 0.0, "avg_logprob": -0.1813757832845052, "compression_ratio": 1.8154981549815499, + "no_speech_prob": 0.0008447906002402306}, {"id": 925, "seek": 581560, "start": 5820.4800000000005, + "end": 5824.4800000000005, "text": " like the next courses were all Python. And + I was like, Oh, no, I have to learn Python. I don''t know", "tokens": [50608, 411, + 264, 958, 7712, 645, 439, 15329, 13, 400, 286, 390, 411, 11, 876, 11, 572, 11, 286, + 362, 281, 1466, 15329, 13, 286, 500, 380, 458, 50808], "temperature": 0.0, "avg_logprob": + -0.1813757832845052, "compression_ratio": 1.8154981549815499, "no_speech_prob": + 0.0008447906002402306}, {"id": 926, "seek": 581560, "start": 5824.4800000000005, + "end": 5830.08, "text": " Python. I have to know, no, no, no, no, no, no language + to use this stuff. Okay, fine. I''ll do it.", "tokens": [50808, 15329, 13, 286, + 362, 281, 458, 11, 572, 11, 572, 11, 572, 11, 572, 11, 572, 11, 572, 2856, 281, + 764, 341, 1507, 13, 1033, 11, 2489, 13, 286, 603, 360, 309, 13, 51088], "temperature": + 0.0, "avg_logprob": -0.1813757832845052, "compression_ratio": 1.8154981549815499, + "no_speech_prob": 0.0008447906002402306}, {"id": 927, "seek": 581560, "start": 5830.08, + "end": 5834.72, "text": " Right? So I went down that and I learned Python and I + got pretty good at it. 
Um, but there''s a lot", "tokens": [51088, 1779, 30, 407, + 286, 1437, 760, 300, 293, 286, 3264, 15329, 293, 286, 658, 1238, 665, 412, 309, + 13, 3301, 11, 457, 456, 311, 257, 688, 51320], "temperature": 0.0, "avg_logprob": + -0.1813757832845052, "compression_ratio": 1.8154981549815499, "no_speech_prob": + 0.0008447906002402306}, {"id": 928, "seek": 581560, "start": 5834.72, "end": 5838.56, + "text": " of people who just don''t want to take that step, you know, they want + to, they want to ship code in", "tokens": [51320, 295, 561, 567, 445, 500, 380, + 528, 281, 747, 300, 1823, 11, 291, 458, 11, 436, 528, 281, 11, 436, 528, 281, 5374, + 3089, 294, 51512], "temperature": 0.0, "avg_logprob": -0.1813757832845052, "compression_ratio": + 1.8154981549815499, "no_speech_prob": 0.0008447906002402306}, {"id": 929, "seek": + 583856, "start": 5838.56, "end": 5845.84, "text": " their, in their stack. So it''s, + it''s a big ask to say, if you want to use these awesome tools,", "tokens": [50364, + 641, 11, 294, 641, 8630, 13, 407, 309, 311, 11, 309, 311, 257, 955, 1029, 281, 584, + 11, 498, 291, 528, 281, 764, 613, 3476, 3873, 11, 50728], "temperature": 0.0, "avg_logprob": + -0.15885656782724325, "compression_ratio": 1.6869158878504673, "no_speech_prob": + 0.0017256569117307663}, {"id": 930, "seek": 583856, "start": 5845.84, "end": 5849.120000000001, + "text": " you got to use, you got to, you got to convert, you got to convert your + language.", "tokens": [50728, 291, 658, 281, 764, 11, 291, 658, 281, 11, 291, 658, + 281, 7620, 11, 291, 658, 281, 7620, 428, 2856, 13, 50892], "temperature": 0.0, "avg_logprob": + -0.15885656782724325, "compression_ratio": 1.6869158878504673, "no_speech_prob": + 0.0017256569117307663}, {"id": 931, "seek": 583856, "start": 5850.56, "end": 5858.96, + "text": " Yeah, yeah, exactly. 
And if you''re, you know, if you''re not into data + science or machine learning,", "tokens": [50964, 865, 11, 1338, 11, 2293, 13, 400, + 498, 291, 434, 11, 291, 458, 11, 498, 291, 434, 406, 666, 1412, 3497, 420, 3479, + 2539, 11, 51384], "temperature": 0.0, "avg_logprob": -0.15885656782724325, "compression_ratio": + 1.6869158878504673, "no_speech_prob": 0.0017256569117307663}, {"id": 932, "seek": + 583856, "start": 5859.6, "end": 5866.8, "text": " then why would you enter Python + at all? Like it has no, no like single winning point,", "tokens": [51416, 550, 983, + 576, 291, 3242, 15329, 412, 439, 30, 1743, 309, 575, 572, 11, 572, 411, 2167, 8224, + 935, 11, 51776], "temperature": 0.0, "avg_logprob": -0.15885656782724325, "compression_ratio": + 1.6869158878504673, "no_speech_prob": 0.0017256569117307663}, {"id": 933, "seek": + 586680, "start": 5867.76, "end": 5875.360000000001, "text": " well, maybe simplicity, + but hey, is that it? You know, um, and it''s like lose deposition. Of course,", + "tokens": [50412, 731, 11, 1310, 25632, 11, 457, 4177, 11, 307, 300, 309, 30, 509, + 458, 11, 1105, 11, 293, 309, 311, 411, 3624, 1367, 5830, 13, 2720, 1164, 11, 50792], + "temperature": 0.0, "avg_logprob": -0.16830779021640993, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.008484384045004845}, {"id": 934, "seek": 586680, "start": 5875.360000000001, + "end": 5880.0, "text": " you can make it more strict with typing and blah, blah, + blah, but like still, but like it took me,", "tokens": [50792, 291, 393, 652, 309, + 544, 10910, 365, 18444, 293, 12288, 11, 12288, 11, 12288, 11, 457, 411, 920, 11, + 457, 411, 309, 1890, 385, 11, 51024], "temperature": 0.0, "avg_logprob": -0.16830779021640993, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.008484384045004845}, + {"id": 935, "seek": 586680, "start": 5880.0, "end": 5884.64, "text": " I think it + took me actually good three years to learn Python properly. 
Because it''s like,", + "tokens": [51024, 286, 519, 309, 1890, 385, 767, 665, 1045, 924, 281, 1466, 15329, + 6108, 13, 1436, 309, 311, 411, 11, 51256], "temperature": 0.0, "avg_logprob": -0.16830779021640993, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.008484384045004845}, + {"id": 936, "seek": 586680, "start": 5884.64, "end": 5889.52, "text": " not like, + okay, oh, I understand how to do the for loop. I understand the indentation and + blah,", "tokens": [51256, 406, 411, 11, 1392, 11, 1954, 11, 286, 1223, 577, 281, + 360, 264, 337, 6367, 13, 286, 1223, 264, 44494, 399, 293, 12288, 11, 51500], "temperature": + 0.0, "avg_logprob": -0.16830779021640993, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.008484384045004845}, {"id": 937, "seek": 588952, "start": 5889.6, + "end": 5897.84, "text": " but like to actually master it, right? Like, you know, + avoid stupid loading of the model multiple times", "tokens": [50368, 457, 411, 281, + 767, 4505, 309, 11, 558, 30, 1743, 11, 291, 458, 11, 5042, 6631, 15114, 295, 264, + 2316, 3866, 1413, 50780], "temperature": 0.0, "avg_logprob": -0.3020557495484869, + "compression_ratio": 1.4527363184079602, "no_speech_prob": 0.005055190995335579}, + {"id": 938, "seek": 588952, "start": 5897.84, "end": 5907.92, "text": " in G unicorn. 
+ So, so I think the, all like, sithon, I didn''t enter thyson, the sithon world,", + "tokens": [50780, 294, 460, 28122, 13, 407, 11, 370, 286, 519, 264, 11, 439, 411, + 11, 262, 355, 266, 11, 286, 994, 380, 3242, 258, 749, 266, 11, 264, 262, 355, 266, + 1002, 11, 51284], "temperature": 0.0, "avg_logprob": -0.3020557495484869, "compression_ratio": + 1.4527363184079602, "no_speech_prob": 0.005055190995335579}, {"id": 939, "seek": + 588952, "start": 5907.92, "end": 5914.240000000001, "text": " likely, but, but even + just writing normal soft when Python takes a lot of time, productizing it", "tokens": + [51284, 3700, 11, 457, 11, 457, 754, 445, 3579, 2710, 2787, 562, 15329, 2516, 257, + 688, 295, 565, 11, 1674, 3319, 309, 51600], "temperature": 0.0, "avg_logprob": -0.3020557495484869, + "compression_ratio": 1.4527363184079602, "no_speech_prob": 0.005055190995335579}, + {"id": 940, "seek": 591424, "start": 5914.24, "end": 5921.04, "text": " takes a + lot of time. So, so why would you enter it if you are not after the tasty machine + learning", "tokens": [50364, 2516, 257, 688, 295, 565, 13, 407, 11, 370, 983, 576, + 291, 3242, 309, 498, 291, 366, 406, 934, 264, 11535, 3479, 2539, 50704], "temperature": + 0.0, "avg_logprob": -0.08920461405878481, "compression_ratio": 1.676056338028169, + "no_speech_prob": 0.003312513465061784}, {"id": 941, "seek": 591424, "start": 5921.04, + "end": 5926.5599999999995, "text": " and data science? So why would you consider + even converting your software stack into this?", "tokens": [50704, 293, 1412, 3497, + 30, 407, 983, 576, 291, 1949, 754, 29942, 428, 4722, 8630, 666, 341, 30, 50980], + "temperature": 0.0, "avg_logprob": -0.08920461405878481, "compression_ratio": 1.676056338028169, + "no_speech_prob": 0.003312513465061784}, {"id": 942, "seek": 591424, "start": 5927.5199999999995, + "end": 5931.44, "text": " So it should be the other way around. 
And I think you''re + doing a great job there with Mighty,", "tokens": [51028, 407, 309, 820, 312, 264, + 661, 636, 926, 13, 400, 286, 519, 291, 434, 884, 257, 869, 1691, 456, 365, 45874, + 11, 51224], "temperature": 0.0, "avg_logprob": -0.08920461405878481, "compression_ratio": + 1.676056338028169, "no_speech_prob": 0.003312513465061784}, {"id": 943, "seek": + 591424, "start": 5932.16, "end": 5937.28, "text": " basically offered as a service + offered as maybe in the future as some kind of library or some kind", "tokens": + [51260, 1936, 8059, 382, 257, 2643, 8059, 382, 1310, 294, 264, 2027, 382, 512, 733, + 295, 6405, 420, 512, 733, 51516], "temperature": 0.0, "avg_logprob": -0.08920461405878481, + "compression_ratio": 1.676056338028169, "no_speech_prob": 0.003312513465061784}, + {"id": 944, "seek": 591424, "start": 5937.28, "end": 5941.5199999999995, "text": + " of environment. I mean, Microsoft has been doing a bunch of these things. I don''t + know if you", "tokens": [51516, 295, 2823, 13, 286, 914, 11, 8116, 575, 668, 884, + 257, 3840, 295, 613, 721, 13, 286, 500, 380, 458, 498, 291, 51728], "temperature": + 0.0, "avg_logprob": -0.08920461405878481, "compression_ratio": 1.676056338028169, + "no_speech_prob": 0.003312513465061784}, {"id": 945, "seek": 594152, "start": 5941.52, + "end": 5948.8, "text": " remember the CLR common language runtime. So like, you, + you, you bring up the, the visual studio and", "tokens": [50364, 1604, 264, 12855, + 49, 2689, 2856, 34474, 13, 407, 411, 11, 291, 11, 291, 11, 291, 1565, 493, 264, + 11, 264, 5056, 6811, 293, 50728], "temperature": 0.0, "avg_logprob": -0.12301448539451316, + "compression_ratio": 1.5846774193548387, "no_speech_prob": 0.0010254178196191788}, + {"id": 946, "seek": 594152, "start": 5948.8, "end": 5954.8, "text": " you can say, + okay, my project will be in Pearl compiled and run for Java. I don''t remember. 
+ It", "tokens": [50728, 291, 393, 584, 11, 1392, 11, 452, 1716, 486, 312, 294, 24639, + 36548, 293, 1190, 337, 10745, 13, 286, 500, 380, 1604, 13, 467, 51028], "temperature": + 0.0, "avg_logprob": -0.12301448539451316, "compression_ratio": 1.5846774193548387, + "no_speech_prob": 0.0010254178196191788}, {"id": 947, "seek": 594152, "start": 5954.8, + "end": 5959.6, "text": " was crazy. I was just experimenting with it. And I was + like, I barely knew any of these languages", "tokens": [51028, 390, 3219, 13, 286, + 390, 445, 29070, 365, 309, 13, 400, 286, 390, 411, 11, 286, 10268, 2586, 604, 295, + 613, 8650, 51268], "temperature": 0.0, "avg_logprob": -0.12301448539451316, "compression_ratio": + 1.5846774193548387, "no_speech_prob": 0.0010254178196191788}, {"id": 948, "seek": + 594152, "start": 5959.6, "end": 5967.52, "text": " as a student, but I was fascinated + by the idea. It didn''t fly, I think, but it was, it was amazing.", "tokens": [51268, + 382, 257, 3107, 11, 457, 286, 390, 24597, 538, 264, 1558, 13, 467, 994, 380, 3603, + 11, 286, 519, 11, 457, 309, 390, 11, 309, 390, 2243, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.12301448539451316, "compression_ratio": 1.5846774193548387, + "no_speech_prob": 0.0010254178196191788}, {"id": 949, "seek": 596752, "start": 5968.4800000000005, + "end": 5978.400000000001, "text": " Yeah, absolutely. 
And I, you know, I would, + I did, I did play around with the idea of, well,", "tokens": [50412, 865, 11, 3122, + 13, 400, 286, 11, 291, 458, 11, 286, 576, 11, 286, 630, 11, 286, 630, 862, 926, + 365, 264, 1558, 295, 11, 731, 11, 50908], "temperature": 0.0, "avg_logprob": -0.17765399142428562, + "compression_ratio": 1.672340425531915, "no_speech_prob": 0.011254729703068733}, + {"id": 950, "seek": 596752, "start": 5978.400000000001, "end": 5982.56, "text": + " what if you don''t even have to download Mighty, what if I was playing around + with this idea from the", "tokens": [50908, 437, 498, 291, 500, 380, 754, 362, 281, + 5484, 45874, 11, 437, 498, 286, 390, 2433, 926, 365, 341, 1558, 490, 264, 51116], + "temperature": 0.0, "avg_logprob": -0.17765399142428562, "compression_ratio": 1.672340425531915, + "no_speech_prob": 0.011254729703068733}, {"id": 951, "seek": 596752, "start": 5982.56, + "end": 5988.160000000001, "text": " NPM perspective, like, what if you just installed + it from NPM command? And I thought, that''s a little", "tokens": [51116, 426, 18819, + 4585, 11, 411, 11, 437, 498, 291, 445, 8899, 309, 490, 426, 18819, 5622, 30, 400, + 286, 1194, 11, 300, 311, 257, 707, 51396], "temperature": 0.0, "avg_logprob": -0.17765399142428562, + "compression_ratio": 1.672340425531915, "no_speech_prob": 0.011254729703068733}, + {"id": 952, "seek": 596752, "start": 5988.160000000001, "end": 5995.6, "text": " + bit heavyweight. Do I want to bring in this thing from, you know, I could. I don''t + know if that''s", "tokens": [51396, 857, 4676, 12329, 13, 1144, 286, 528, 281, 1565, + 294, 341, 551, 490, 11, 291, 458, 11, 286, 727, 13, 286, 500, 380, 458, 498, 300, + 311, 51768], "temperature": 0.0, "avg_logprob": -0.17765399142428562, "compression_ratio": + 1.672340425531915, "no_speech_prob": 0.011254729703068733}, {"id": 953, "seek": + 599560, "start": 5996.400000000001, "end": 6001.360000000001, "text": " if I should + do that. 
And I also don''t want to set false expectations to. And maybe this is + just", "tokens": [50404, 498, 286, 820, 360, 300, 13, 400, 286, 611, 500, 380, 528, + 281, 992, 7908, 9843, 281, 13, 400, 1310, 341, 307, 445, 50652], "temperature": + 0.0, "avg_logprob": -0.1276414017928274, "compression_ratio": 1.9078947368421053, + "no_speech_prob": 0.002071540569886565}, {"id": 954, "seek": 599560, "start": 6001.360000000001, + "end": 6006.56, "text": " because I''m not great at marketing, but I don''t want + to set the expectation of you just do NPM", "tokens": [50652, 570, 286, 478, 406, + 869, 412, 6370, 11, 457, 286, 500, 380, 528, 281, 992, 264, 14334, 295, 291, 445, + 360, 426, 18819, 50912], "temperature": 0.0, "avg_logprob": -0.1276414017928274, + "compression_ratio": 1.9078947368421053, "no_speech_prob": 0.002071540569886565}, + {"id": 955, "seek": 599560, "start": 6006.56, "end": 6010.96, "text": " install, + look, Mighty server. And then you have a perfectly running thing because it''s, + it''s more", "tokens": [50912, 3625, 11, 574, 11, 45874, 7154, 13, 400, 550, 291, + 362, 257, 6239, 2614, 551, 570, 309, 311, 11, 309, 311, 544, 51132], "temperature": + 0.0, "avg_logprob": -0.1276414017928274, "compression_ratio": 1.9078947368421053, + "no_speech_prob": 0.002071540569886565}, {"id": 956, "seek": 599560, "start": 6010.96, + "end": 6015.52, "text": " than that. You have to scale it properly. You have to + give it its own compute. You have to choose", "tokens": [51132, 813, 300, 13, 509, + 362, 281, 4373, 309, 6108, 13, 509, 362, 281, 976, 309, 1080, 1065, 14722, 13, 509, + 362, 281, 2826, 51360], "temperature": 0.0, "avg_logprob": -0.1276414017928274, + "compression_ratio": 1.9078947368421053, "no_speech_prob": 0.002071540569886565}, + {"id": 957, "seek": 599560, "start": 6015.52, "end": 6019.280000000001, "text": + " the appropriate model. 
You have to, you have to do certain things to really get + the most out of it.", "tokens": [51360, 264, 6854, 2316, 13, 509, 362, 281, 11, + 291, 362, 281, 360, 1629, 721, 281, 534, 483, 264, 881, 484, 295, 309, 13, 51548], + "temperature": 0.0, "avg_logprob": -0.1276414017928274, "compression_ratio": 1.9078947368421053, + "no_speech_prob": 0.002071540569886565}, {"id": 958, "seek": 599560, "start": 6020.160000000001, + "end": 6025.52, "text": " So I don''t want to set false expectations where somebody + deploys it and, and it''s like, it''s,", "tokens": [51592, 407, 286, 500, 380, 528, + 281, 992, 7908, 9843, 689, 2618, 368, 49522, 309, 293, 11, 293, 309, 311, 411, 11, + 309, 311, 11, 51860], "temperature": 0.0, "avg_logprob": -0.1276414017928274, "compression_ratio": + 1.9078947368421053, "no_speech_prob": 0.002071540569886565}, {"id": 959, "seek": + 602552, "start": 6025.52, "end": 6031.040000000001, "text": " doesn''t work well + at all because they just did NPM install Mighty server, which doesn''t exist by", + "tokens": [50364, 1177, 380, 589, 731, 412, 439, 570, 436, 445, 630, 426, 18819, + 3625, 45874, 7154, 11, 597, 1177, 380, 2514, 538, 50640], "temperature": 0.0, "avg_logprob": + -0.14515689038854884, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.0005770266288891435}, {"id": 960, "seek": 602552, "start": 6031.040000000001, + "end": 6037.6, "text": " the way. Don''t try that. And then it didn''t, and then + it didn''t work. So I, you know, there is a", "tokens": [50640, 264, 636, 13, 1468, + 380, 853, 300, 13, 400, 550, 309, 994, 380, 11, 293, 550, 309, 994, 380, 589, 13, + 407, 286, 11, 291, 458, 11, 456, 307, 257, 50968], "temperature": 0.0, "avg_logprob": + -0.14515689038854884, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.0005770266288891435}, {"id": 961, "seek": 602552, "start": 6037.6, "end": 6042.080000000001, + "text": " little bit of knowledge that you do that that you do have to attain. 
So + I want to pass, you know,", "tokens": [50968, 707, 857, 295, 3601, 300, 291, 360, + 300, 300, 291, 360, 362, 281, 23766, 13, 407, 286, 528, 281, 1320, 11, 291, 458, + 11, 51192], "temperature": 0.0, "avg_logprob": -0.14515689038854884, "compression_ratio": + 1.6912280701754385, "no_speech_prob": 0.0005770266288891435}, {"id": 962, "seek": + 602552, "start": 6043.4400000000005, "end": 6047.4400000000005, "text": " you do + have to familiarize yourself with some concepts. That doesn''t mean learning an + entirely", "tokens": [51260, 291, 360, 362, 281, 4963, 1125, 1803, 365, 512, 10392, + 13, 663, 1177, 380, 914, 2539, 364, 7696, 51460], "temperature": 0.0, "avg_logprob": + -0.14515689038854884, "compression_ratio": 1.6912280701754385, "no_speech_prob": + 0.0005770266288891435}, {"id": 963, "seek": 602552, "start": 6047.4400000000005, + "end": 6053.120000000001, "text": " new language and stack. Yeah, it''s more like + probably like I''m a lobster devil. So somebody can", "tokens": [51460, 777, 2856, + 293, 8630, 13, 865, 11, 309, 311, 544, 411, 1391, 411, 286, 478, 257, 33198, 13297, + 13, 407, 2618, 393, 51744], "temperature": 0.0, "avg_logprob": -0.14515689038854884, + "compression_ratio": 1.6912280701754385, "no_speech_prob": 0.0005770266288891435}, + {"id": 964, "seek": 605312, "start": 6053.12, "end": 6059.44, "text": " pick it + up. And I mean, learning that way is much faster than actually, you know, figuring + out how", "tokens": [50364, 1888, 309, 493, 13, 400, 286, 914, 11, 2539, 300, 636, + 307, 709, 4663, 813, 767, 11, 291, 458, 11, 15213, 484, 577, 50680], "temperature": + 0.0, "avg_logprob": -0.138465409327035, "compression_ratio": 1.5532786885245902, + "no_speech_prob": 0.005983843468129635}, {"id": 965, "seek": 605312, "start": 6059.44, + "end": 6064.32, "text": " the how will I plug it into my Java code or C++ code or + whatever. 
So yeah, of course.", "tokens": [50680, 264, 577, 486, 286, 5452, 309, + 666, 452, 10745, 3089, 420, 383, 25472, 3089, 420, 2035, 13, 407, 1338, 11, 295, + 1164, 13, 50924], "temperature": 0.0, "avg_logprob": -0.138465409327035, "compression_ratio": + 1.5532786885245902, "no_speech_prob": 0.005983843468129635}, {"id": 966, "seek": + 605312, "start": 6065.84, "end": 6072.64, "text": " I think we, like I''ve really + enjoyed this conversation, Max, we went deep into all these aspects.", "tokens": + [51000, 286, 519, 321, 11, 411, 286, 600, 534, 4626, 341, 3761, 11, 7402, 11, 321, + 1437, 2452, 666, 439, 613, 7270, 13, 51340], "temperature": 0.0, "avg_logprob": + -0.138465409327035, "compression_ratio": 1.5532786885245902, "no_speech_prob": 0.005983843468129635}, + {"id": 967, "seek": 605312, "start": 6072.64, "end": 6078.0, "text": " Maybe not + we can record another episode, you know, going in another direction. I''m sure there + is", "tokens": [51340, 2704, 406, 321, 393, 2136, 1071, 3500, 11, 291, 458, 11, + 516, 294, 1071, 3513, 13, 286, 478, 988, 456, 307, 51608], "temperature": 0.0, "avg_logprob": + -0.138465409327035, "compression_ratio": 1.5532786885245902, "no_speech_prob": 0.005983843468129635}, + {"id": 968, "seek": 607800, "start": 6078.0, "end": 6086.56, "text": " like million, + million directions to take. 
But I enjoy asking this question of philosophical why,", + "tokens": [50364, 411, 2459, 11, 2459, 11095, 281, 747, 13, 583, 286, 2103, 3365, + 341, 1168, 295, 25066, 983, 11, 50792], "temperature": 0.0, "avg_logprob": -0.22510557396467343, + "compression_ratio": 1.5659574468085107, "no_speech_prob": 0.0058393762446939945}, + {"id": 969, "seek": 607800, "start": 6087.12, "end": 6094.0, "text": " if you can + still spare a few minutes, like why why why why why I''m fascinated by this", "tokens": + [50820, 498, 291, 393, 920, 13798, 257, 1326, 2077, 11, 411, 983, 983, 983, 983, + 983, 286, 478, 24597, 538, 341, 51164], "temperature": 0.0, "avg_logprob": -0.22510557396467343, + "compression_ratio": 1.5659574468085107, "no_speech_prob": 0.0058393762446939945}, + {"id": 970, "seek": 607800, "start": 6094.0, "end": 6101.44, "text": " field of + vector search. What brought you into it? And I remember I will also make sure to", + "tokens": [51164, 2519, 295, 8062, 3164, 13, 708, 3038, 291, 666, 309, 30, 400, + 286, 1604, 286, 486, 611, 652, 988, 281, 51536], "temperature": 0.0, "avg_logprob": + -0.22510557396467343, "compression_ratio": 1.5659574468085107, "no_speech_prob": + 0.0058393762446939945}, {"id": 971, "seek": 607800, "start": 6101.44, "end": 6107.76, + "text": " mention this that we did form a team with you and you you responded positively + to my inquiry to", "tokens": [51536, 2152, 341, 300, 321, 630, 1254, 257, 1469, + 365, 291, 293, 291, 291, 15806, 25795, 281, 452, 25736, 281, 51852], "temperature": + 0.0, "avg_logprob": -0.22510557396467343, "compression_ratio": 1.5659574468085107, + "no_speech_prob": 0.0058393762446939945}, {"id": 972, "seek": 610800, "start": 6108.0, + "end": 6116.08, "text": " compete in in billion scale and then competition. 
And + you basically single handedly, almost driven", "tokens": [50364, 11831, 294, 294, + 5218, 4373, 293, 550, 6211, 13, 400, 291, 1936, 2167, 1011, 13516, 11, 1920, 9555, + 50768], "temperature": 0.0, "avg_logprob": -0.25516940222846135, "compression_ratio": + 1.5737704918032787, "no_speech_prob": 0.0024721981026232243}, {"id": 973, "seek": + 610800, "start": 6116.08, "end": 6122.96, "text": " the idea of body PQ. Of course, + we also have Alexander, similar who was helping you and all of us", "tokens": [50768, + 264, 1558, 295, 1772, 430, 48, 13, 2720, 1164, 11, 321, 611, 362, 14845, 11, 2531, + 567, 390, 4315, 291, 293, 439, 295, 505, 51112], "temperature": 0.0, "avg_logprob": + -0.25516940222846135, "compression_ratio": 1.5737704918032787, "no_speech_prob": + 0.0024721981026232243}, {"id": 974, "seek": 610800, "start": 6122.96, "end": 6129.84, + "text": " been brainstorming with you. But so that was kind of like maybe academic + fascination with it,", "tokens": [51112, 668, 35245, 278, 365, 291, 13, 583, 370, + 300, 390, 733, 295, 411, 1310, 7778, 7184, 2486, 365, 309, 11, 51456], "temperature": + 0.0, "avg_logprob": -0.25516940222846135, "compression_ratio": 1.5737704918032787, + "no_speech_prob": 0.0024721981026232243}, {"id": 975, "seek": 610800, "start": 6129.84, + "end": 6135.84, "text": " right? But other other facets that that keep you going + also giving your background in search,", "tokens": [51456, 558, 30, 583, 661, 661, + 49752, 300, 300, 1066, 291, 516, 611, 2902, 428, 3678, 294, 3164, 11, 51756], "temperature": + 0.0, "avg_logprob": -0.25516940222846135, "compression_ratio": 1.5737704918032787, + "no_speech_prob": 0.0024721981026232243}, {"id": 976, "seek": 613584, "start": 6135.84, + "end": 6145.2, "text": " which was pre vector search. 
Yeah, I''d say just my endless + curiosity into things, you know,", "tokens": [50364, 597, 390, 659, 8062, 3164, + 13, 865, 11, 286, 1116, 584, 445, 452, 16144, 18769, 666, 721, 11, 291, 458, 11, + 50832], "temperature": 0.0, "avg_logprob": -0.1450278917948405, "compression_ratio": + 1.7048458149779735, "no_speech_prob": 0.006520905531942844}, {"id": 977, "seek": + 613584, "start": 6145.2, "end": 6150.72, "text": " think a lot of us have that as, + you know, if you''re listening to this podcast, the audience,", "tokens": [50832, + 519, 257, 688, 295, 505, 362, 300, 382, 11, 291, 458, 11, 498, 291, 434, 4764, 281, + 341, 7367, 11, 264, 4034, 11, 51108], "temperature": 0.0, "avg_logprob": -0.1450278917948405, + "compression_ratio": 1.7048458149779735, "no_speech_prob": 0.006520905531942844}, + {"id": 978, "seek": 613584, "start": 6150.72, "end": 6155.360000000001, "text": + " there''s probably a lot of you are very curious about just check technology in + general and the limitations", "tokens": [51108, 456, 311, 1391, 257, 688, 295, 291, + 366, 588, 6369, 466, 445, 1520, 2899, 294, 2674, 293, 264, 15705, 51340], "temperature": + 0.0, "avg_logprob": -0.1450278917948405, "compression_ratio": 1.7048458149779735, + "no_speech_prob": 0.006520905531942844}, {"id": 979, "seek": 613584, "start": 6155.360000000001, + "end": 6161.12, "text": " of technology and what''s positive and getting to that + new magical thing and trying something for", "tokens": [51340, 295, 2899, 293, 437, + 311, 3353, 293, 1242, 281, 300, 777, 12066, 551, 293, 1382, 746, 337, 51628], "temperature": + 0.0, "avg_logprob": -0.1450278917948405, "compression_ratio": 1.7048458149779735, + "no_speech_prob": 0.006520905531942844}, {"id": 980, "seek": 616112, "start": 6161.12, + "end": 6166.64, "text": " the first time and saying, Oh my God, this is incredible. 
+ I can''t believe this actually worked that I", "tokens": [50364, 264, 700, 565, + 293, 1566, 11, 876, 452, 1265, 11, 341, 307, 4651, 13, 286, 393, 380, 1697, 341, + 767, 2732, 300, 286, 50640], "temperature": 0.0, "avg_logprob": -0.12363483224596296, + "compression_ratio": 1.640495867768595, "no_speech_prob": 0.0021597477607429028}, + {"id": 981, "seek": 616112, "start": 6166.64, "end": 6176.88, "text": " did this. + So, so it''s that. I mean, I, you know, I''m in my 40s now. So I''ve gone through + that cycle a", "tokens": [50640, 630, 341, 13, 407, 11, 370, 309, 311, 300, 13, + 286, 914, 11, 286, 11, 291, 458, 11, 286, 478, 294, 452, 3356, 82, 586, 13, 407, + 286, 600, 2780, 807, 300, 6586, 257, 51152], "temperature": 0.0, "avg_logprob": + -0.12363483224596296, "compression_ratio": 1.640495867768595, "no_speech_prob": + 0.0021597477607429028}, {"id": 982, "seek": 616112, "start": 6176.88, "end": 6183.2, + "text": " lot of times where I''ve tried something and it was amazing. I do really + feel that there''s a lot of", "tokens": [51152, 688, 295, 1413, 689, 286, 600, 3031, + 746, 293, 309, 390, 2243, 13, 286, 360, 534, 841, 300, 456, 311, 257, 688, 295, + 51468], "temperature": 0.0, "avg_logprob": -0.12363483224596296, "compression_ratio": + 1.640495867768595, "no_speech_prob": 0.0021597477607429028}, {"id": 983, "seek": + 616112, "start": 6183.2, "end": 6188.48, "text": " practicality to it though, you + know, in my wisdom now. I see that yeah, just because something", "tokens": [51468, + 8496, 507, 281, 309, 1673, 11, 291, 458, 11, 294, 452, 10712, 586, 13, 286, 536, + 300, 1338, 11, 445, 570, 746, 51732], "temperature": 0.0, "avg_logprob": -0.12363483224596296, + "compression_ratio": 1.640495867768595, "no_speech_prob": 0.0021597477607429028}, + {"id": 984, "seek": 618848, "start": 6188.48, "end": 6193.44, "text": " looks cool + doesn''t mean it''s the best thing in the world and it should be used everywhere. 
+ So I,", "tokens": [50364, 1542, 1627, 1177, 380, 914, 309, 311, 264, 1151, 551, + 294, 264, 1002, 293, 309, 820, 312, 1143, 5315, 13, 407, 286, 11, 50612], "temperature": + 0.0, "avg_logprob": -0.11597125044146787, "compression_ratio": 1.615702479338843, + "no_speech_prob": 0.0005102952709421515}, {"id": 985, "seek": 618848, "start": 6193.44, + "end": 6205.04, "text": " I see the practical, the practical use and need for vector + search. Whether or not it turns out to be", "tokens": [50612, 286, 536, 264, 8496, + 11, 264, 8496, 764, 293, 643, 337, 8062, 3164, 13, 8503, 420, 406, 309, 4523, 484, + 281, 312, 51192], "temperature": 0.0, "avg_logprob": -0.11597125044146787, "compression_ratio": + 1.615702479338843, "no_speech_prob": 0.0005102952709421515}, {"id": 986, "seek": + 618848, "start": 6205.04, "end": 6211.2, "text": " like the end all be all the search, + you know, that debate is open, right? But I don''t think it is.", "tokens": [51192, + 411, 264, 917, 439, 312, 439, 264, 3164, 11, 291, 458, 11, 300, 7958, 307, 1269, + 11, 558, 30, 583, 286, 500, 380, 519, 309, 307, 13, 51500], "temperature": 0.0, + "avg_logprob": -0.11597125044146787, "compression_ratio": 1.615702479338843, "no_speech_prob": + 0.0005102952709421515}, {"id": 987, "seek": 618848, "start": 6211.2, "end": 6217.04, + "text": " I think it''s just one piece in the puzzle. But it does solve this whole + class of problems that", "tokens": [51500, 286, 519, 309, 311, 445, 472, 2522, 294, + 264, 12805, 13, 583, 309, 775, 5039, 341, 1379, 1508, 295, 2740, 300, 51792], "temperature": + 0.0, "avg_logprob": -0.11597125044146787, "compression_ratio": 1.615702479338843, + "no_speech_prob": 0.0005102952709421515}, {"id": 988, "seek": 621704, "start": 6217.04, + "end": 6223.5199999999995, "text": " were unsolvable before if you go back 10 years. 
+ When I first started in search, the types of things", "tokens": [50364, 645, 2693, + 401, 17915, 949, 498, 291, 352, 646, 1266, 924, 13, 1133, 286, 700, 1409, 294, 3164, + 11, 264, 3467, 295, 721, 50688], "temperature": 0.0, "avg_logprob": -0.11372274206590283, + "compression_ratio": 1.8007380073800738, "no_speech_prob": 0.0010112569434568286}, + {"id": 989, "seek": 621704, "start": 6223.5199999999995, "end": 6229.36, "text": + " that I''m doing right now. And I''ll give you an example and I actually, you know, + I set this to", "tokens": [50688, 300, 286, 478, 884, 558, 586, 13, 400, 286, 603, + 976, 291, 364, 1365, 293, 286, 767, 11, 291, 458, 11, 286, 992, 341, 281, 50980], + "temperature": 0.0, "avg_logprob": -0.11372274206590283, "compression_ratio": 1.8007380073800738, + "no_speech_prob": 0.0010112569434568286}, {"id": 990, "seek": 621704, "start": 6229.36, + "end": 6234.64, "text": " somebody the other day. It''s like, you know, the first + time I installed solar, this is even, you", "tokens": [50980, 2618, 264, 661, 786, + 13, 467, 311, 411, 11, 291, 458, 11, 264, 700, 565, 286, 8899, 7936, 11, 341, 307, + 754, 11, 291, 51244], "temperature": 0.0, "avg_logprob": -0.11372274206590283, "compression_ratio": + 1.8007380073800738, "no_speech_prob": 0.0010112569434568286}, {"id": 991, "seek": + 621704, "start": 6234.64, "end": 6238.16, "text": " know, maybe elastic search was + around at that time, but maybe it was compass search. It wasn''t even", "tokens": + [51244, 458, 11, 1310, 17115, 3164, 390, 926, 412, 300, 565, 11, 457, 1310, 309, + 390, 10707, 3164, 13, 467, 2067, 380, 754, 51420], "temperature": 0.0, "avg_logprob": + -0.11372274206590283, "compression_ratio": 1.8007380073800738, "no_speech_prob": + 0.0010112569434568286}, {"id": 992, "seek": 621704, "start": 6238.16, "end": 6243.04, + "text": " elastic yet. 
The first time I installed solar and I put in some documents, + I was like, wow, this", "tokens": [51420, 17115, 1939, 13, 440, 700, 565, 286, 8899, + 7936, 293, 286, 829, 294, 512, 8512, 11, 286, 390, 411, 11, 6076, 11, 341, 51664], + "temperature": 0.0, "avg_logprob": -0.11372274206590283, "compression_ratio": 1.8007380073800738, + "no_speech_prob": 0.0010112569434568286}, {"id": 993, "seek": 624304, "start": 6243.04, + "end": 6248.32, "text": " is amazing. Like I can do a search. This is so much better + than that crappy index that I was using", "tokens": [50364, 307, 2243, 13, 1743, + 286, 393, 360, 257, 3164, 13, 639, 307, 370, 709, 1101, 813, 300, 36531, 8186, 300, + 286, 390, 1228, 50628], "temperature": 0.0, "avg_logprob": -0.1406606760892001, + "compression_ratio": 1.7357142857142858, "no_speech_prob": 0.00035309369559399784}, + {"id": 994, "seek": 624304, "start": 6248.32, "end": 6255.44, "text": " on SQL Server. + So it was just really, it was like that type of amazement. But then you,", "tokens": + [50628, 322, 19200, 25684, 13, 407, 309, 390, 445, 534, 11, 309, 390, 411, 300, + 2010, 295, 669, 921, 1712, 13, 583, 550, 291, 11, 50984], "temperature": 0.0, "avg_logprob": + -0.1406606760892001, "compression_ratio": 1.7357142857142858, "no_speech_prob": + 0.00035309369559399784}, {"id": 995, "seek": 624304, "start": 6255.44, "end": 6260.64, + "text": " you know, you work with it over time, you see the limitations and it''s + like, oh, this got it out", "tokens": [50984, 291, 458, 11, 291, 589, 365, 309, + 670, 565, 11, 291, 536, 264, 15705, 293, 309, 311, 411, 11, 1954, 11, 341, 658, + 309, 484, 51244], "temperature": 0.0, "avg_logprob": -0.1406606760892001, "compression_ratio": + 1.7357142857142858, "no_speech_prob": 0.00035309369559399784}, {"id": 996, "seek": + 624304, "start": 6260.64, "end": 6265.28, "text": " of these synonyms and all these + other problems and all this stuff. 
I''ll say that, you know, when you", "tokens": + [51244, 295, 613, 5451, 2526, 2592, 293, 439, 613, 661, 2740, 293, 439, 341, 1507, + 13, 286, 603, 584, 300, 11, 291, 458, 11, 562, 291, 51476], "temperature": 0.0, + "avg_logprob": -0.1406606760892001, "compression_ratio": 1.7357142857142858, "no_speech_prob": + 0.00035309369559399784}, {"id": 997, "seek": 624304, "start": 6265.28, "end": 6270.24, + "text": " when you first start off in like the relevance of solar, like out of the + box, you take their example,", "tokens": [51476, 562, 291, 700, 722, 766, 294, 411, + 264, 32684, 295, 7936, 11, 411, 484, 295, 264, 2424, 11, 291, 747, 641, 1365, 11, + 51724], "temperature": 0.0, "avg_logprob": -0.1406606760892001, "compression_ratio": + 1.7357142857142858, "no_speech_prob": 0.00035309369559399784}, {"id": 998, "seek": + 627024, "start": 6271.12, "end": 6278.48, "text": " schema XML and you and you add + some documents to it and you get back stuff and you''re like, okay,", "tokens": + [50408, 34078, 43484, 293, 291, 293, 291, 909, 512, 8512, 281, 309, 293, 291, 483, + 646, 1507, 293, 291, 434, 411, 11, 1392, 11, 50776], "temperature": 0.0, "avg_logprob": + -0.17839252587520715, "compression_ratio": 1.6569037656903767, "no_speech_prob": + 0.0020829071290791035}, {"id": 999, "seek": 627024, "start": 6278.48, "end": 6285.679999999999, + "text": " this is cool. 
If you take that feeling and then you once and I''ll just + use quadrant for an example", "tokens": [50776, 341, 307, 1627, 13, 759, 291, 747, + 300, 2633, 293, 550, 291, 1564, 293, 286, 603, 445, 764, 46856, 337, 364, 1365, + 51136], "temperature": 0.0, "avg_logprob": -0.17839252587520715, "compression_ratio": + 1.6569037656903767, "no_speech_prob": 0.0020829071290791035}, {"id": 1000, "seek": + 627024, "start": 6285.679999999999, "end": 6291.84, "text": " because quadrant is + in my opinion, like super easy to use, like you just doc or pull quadrant and", + "tokens": [51136, 570, 46856, 307, 294, 452, 4800, 11, 411, 1687, 1858, 281, 764, + 11, 411, 291, 445, 3211, 420, 2235, 46856, 293, 51444], "temperature": 0.0, "avg_logprob": + -0.17839252587520715, "compression_ratio": 1.6569037656903767, "no_speech_prob": + 0.0020829071290791035}, {"id": 1001, "seek": 627024, "start": 6291.84, "end": 6298.639999999999, + "text": " use throw some and stuff in there. Especially now with this node thing. + So when I did that, the first", "tokens": [51444, 764, 3507, 512, 293, 1507, 294, + 456, 13, 8545, 586, 365, 341, 9984, 551, 13, 407, 562, 286, 630, 300, 11, 264, 700, + 51784], "temperature": 0.0, "avg_logprob": -0.17839252587520715, "compression_ratio": + 1.6569037656903767, "no_speech_prob": 0.0020829071290791035}, {"id": 1002, "seek": + 629864, "start": 6298.64, "end": 6302.8, "text": " time I used quadrant and I wrote + this node wrapper and I just chucked in a whole bunch of documents,", "tokens": + [50364, 565, 286, 1143, 46856, 293, 286, 4114, 341, 9984, 46906, 293, 286, 445, + 20870, 292, 294, 257, 1379, 3840, 295, 8512, 11, 50572], "temperature": 0.0, "avg_logprob": + -0.12379133208724093, "compression_ratio": 1.7906137184115523, "no_speech_prob": + 0.0004194905632175505}, {"id": 1003, "seek": 629864, "start": 6302.8, "end": 6308.4800000000005, + "text": " I saw that like just the out of the box relevance. 
And I''m not saying + this is fine tuned, like this", "tokens": [50572, 286, 1866, 300, 411, 445, 264, + 484, 295, 264, 2424, 32684, 13, 400, 286, 478, 406, 1566, 341, 307, 2489, 10870, + 11, 411, 341, 50856], "temperature": 0.0, "avg_logprob": -0.12379133208724093, "compression_ratio": + 1.7906137184115523, "no_speech_prob": 0.0004194905632175505}, {"id": 1004, "seek": + 629864, "start": 6308.4800000000005, "end": 6314.56, "text": " isn''t something + production worthy. But just the out of the box relevance, I was like, this is better", + "tokens": [50856, 1943, 380, 746, 4265, 14829, 13, 583, 445, 264, 484, 295, 264, + 2424, 32684, 11, 286, 390, 411, 11, 341, 307, 1101, 51160], "temperature": 0.0, + "avg_logprob": -0.12379133208724093, "compression_ratio": 1.7906137184115523, "no_speech_prob": + 0.0004194905632175505}, {"id": 1005, "seek": 629864, "start": 6314.56, "end": 6321.52, + "text": " and I would spend in my opinion less time worrying about this than I would + with an inverted index,", "tokens": [51160, 293, 286, 576, 3496, 294, 452, 4800, + 1570, 565, 18788, 466, 341, 813, 286, 576, 365, 364, 38969, 8186, 11, 51508], "temperature": + 0.0, "avg_logprob": -0.12379133208724093, "compression_ratio": 1.7906137184115523, + "no_speech_prob": 0.0004194905632175505}, {"id": 1006, "seek": 629864, "start": + 6321.52, "end": 6328.240000000001, "text": " you know, just because well, yeah, + maybe the results aren''t like super precise all the time and", "tokens": [51508, + 291, 458, 11, 445, 570, 731, 11, 1338, 11, 1310, 264, 3542, 3212, 380, 411, 1687, + 13600, 439, 264, 565, 293, 51844], "temperature": 0.0, "avg_logprob": -0.12379133208724093, + "compression_ratio": 1.7906137184115523, "no_speech_prob": 0.0004194905632175505}, + {"id": 1007, "seek": 632824, "start": 6328.32, "end": 6334.639999999999, "text": + " things like that. 
But if I''m on a team and it''s like, I got this search bar + and I got this content and", "tokens": [50368, 721, 411, 300, 13, 583, 498, 286, + 478, 322, 257, 1469, 293, 309, 311, 411, 11, 286, 658, 341, 3164, 2159, 293, 286, + 658, 341, 2701, 293, 50684], "temperature": 0.0, "avg_logprob": -0.10822598492657697, + "compression_ratio": 1.92578125, "no_speech_prob": 0.001967298099771142}, {"id": + 1008, "seek": 632824, "start": 6336.48, "end": 6340.88, "text": " I don''t want + to worry about it, right? I don''t want to worry about it. I just wanted to work. + I", "tokens": [50776, 286, 500, 380, 528, 281, 3292, 466, 309, 11, 558, 30, 286, + 500, 380, 528, 281, 3292, 466, 309, 13, 286, 445, 1415, 281, 589, 13, 286, 50996], + "temperature": 0.0, "avg_logprob": -0.10822598492657697, "compression_ratio": 1.92578125, + "no_speech_prob": 0.001967298099771142}, {"id": 1009, "seek": 632824, "start": 6340.88, + "end": 6345.2, "text": " wanted to surface stuff that''s like reasonably accurate. + It doesn''t have to be the best search in the", "tokens": [50996, 1415, 281, 3753, + 1507, 300, 311, 411, 23551, 8559, 13, 467, 1177, 380, 362, 281, 312, 264, 1151, + 3164, 294, 264, 51212], "temperature": 0.0, "avg_logprob": -0.10822598492657697, + "compression_ratio": 1.92578125, "no_speech_prob": 0.001967298099771142}, {"id": + 1010, "seek": 632824, "start": 6345.2, "end": 6350.88, "text": " world. But it''s + a cost for me. It''s a cost for me as a team. I don''t make money from search, but", + "tokens": [51212, 1002, 13, 583, 309, 311, 257, 2063, 337, 385, 13, 467, 311, 257, + 2063, 337, 385, 382, 257, 1469, 13, 286, 500, 380, 652, 1460, 490, 3164, 11, 457, + 51496], "temperature": 0.0, "avg_logprob": -0.10822598492657697, "compression_ratio": + 1.92578125, "no_speech_prob": 0.001967298099771142}, {"id": 1011, "seek": 632824, + "start": 6350.88, "end": 6355.92, "text": " it''s something I have to support. 
I + think vector search offers are really, really good solution", "tokens": [51496, + 309, 311, 746, 286, 362, 281, 1406, 13, 286, 519, 8062, 3164, 7736, 366, 534, 11, + 534, 665, 3827, 51748], "temperature": 0.0, "avg_logprob": -0.10822598492657697, + "compression_ratio": 1.92578125, "no_speech_prob": 0.001967298099771142}, {"id": + 1012, "seek": 635592, "start": 6356.24, "end": 6364.72, "text": " because it''s + not like you have to chase that endless bug of this doesn''t even have anything + to do", "tokens": [50380, 570, 309, 311, 406, 411, 291, 362, 281, 15359, 300, 16144, + 7426, 295, 341, 1177, 380, 754, 362, 1340, 281, 360, 50804], "temperature": 0.0, + "avg_logprob": -0.1995239512125651, "compression_ratio": 1.597883597883598, "no_speech_prob": + 0.008552688173949718}, {"id": 1013, "seek": 635592, "start": 6364.72, "end": 6374.4800000000005, + "text": " with my search. I search for, you know, what is the best hiking boot or + something like that, you know?", "tokens": [50804, 365, 452, 3164, 13, 286, 3164, + 337, 11, 291, 458, 11, 437, 307, 264, 1151, 23784, 11450, 420, 746, 411, 300, 11, + 291, 458, 30, 51292], "temperature": 0.0, "avg_logprob": -0.1995239512125651, "compression_ratio": + 1.597883597883598, "no_speech_prob": 0.008552688173949718}, {"id": 1014, "seek": + 635592, "start": 6374.4800000000005, "end": 6381.2, "text": " And all the documents + they matched what 10 times, but there''s no semblance of hiking boot or anything", + "tokens": [51292, 400, 439, 264, 8512, 436, 21447, 437, 1266, 1413, 11, 457, 456, + 311, 572, 20775, 37271, 295, 23784, 11450, 420, 1340, 51628], "temperature": 0.0, + "avg_logprob": -0.1995239512125651, "compression_ratio": 1.597883597883598, "no_speech_prob": + 0.008552688173949718}, {"id": 1015, "seek": 638120, "start": 6381.28, "end": 6385.5199999999995, + "text": " in my document. This is terrible. 
You know, you don''t get anything like + that in vector search.", "tokens": [50368, 294, 452, 4166, 13, 639, 307, 6237, 13, + 509, 458, 11, 291, 500, 380, 483, 1340, 411, 300, 294, 8062, 3164, 13, 50580], "temperature": + 0.0, "avg_logprob": -0.1636473083496094, "compression_ratio": 1.7404580152671756, + "no_speech_prob": 0.0034643705002963543}, {"id": 1016, "seek": 638120, "start": + 6386.08, "end": 6391.2, "text": " And that''s, I think, the appeal. You know, when + you get into like real,", "tokens": [50608, 400, 300, 311, 11, 286, 519, 11, 264, + 13668, 13, 509, 458, 11, 562, 291, 483, 666, 411, 957, 11, 50864], "temperature": + 0.0, "avg_logprob": -0.1636473083496094, "compression_ratio": 1.7404580152671756, + "no_speech_prob": 0.0034643705002963543}, {"id": 1017, "seek": 638120, "start": + 6392.4, "end": 6398.5599999999995, "text": " real production, like highly tuned + search, it''s just one piece. But just for the teams that''s like", "tokens": [50924, + 957, 4265, 11, 411, 5405, 10870, 3164, 11, 309, 311, 445, 472, 2522, 13, 583, 445, + 337, 264, 5491, 300, 311, 411, 51232], "temperature": 0.0, "avg_logprob": -0.1636473083496094, + "compression_ratio": 1.7404580152671756, "no_speech_prob": 0.0034643705002963543}, + {"id": 1018, "seek": 638120, "start": 6398.5599999999995, "end": 6403.36, "text": + " out of the box, I want to work and I don''t want to deal with it. I think it''s + a better,", "tokens": [51232, 484, 295, 264, 2424, 11, 286, 528, 281, 589, 293, + 286, 500, 380, 528, 281, 2028, 365, 309, 13, 286, 519, 309, 311, 257, 1101, 11, + 51472], "temperature": 0.0, "avg_logprob": -0.1636473083496094, "compression_ratio": + 1.7404580152671756, "no_speech_prob": 0.0034643705002963543}, {"id": 1019, "seek": + 638120, "start": 6404.16, "end": 6409.5199999999995, "text": " I think it''s a better + solution than elasticer solar. 
You end up spending a lot less time and headache.", + "tokens": [51512, 286, 519, 309, 311, 257, 1101, 3827, 813, 17115, 260, 7936, 13, + 509, 917, 493, 6434, 257, 688, 1570, 565, 293, 23520, 13, 51780], "temperature": + 0.0, "avg_logprob": -0.1636473083496094, "compression_ratio": 1.7404580152671756, + "no_speech_prob": 0.0034643705002963543}, {"id": 1020, "seek": 640952, "start": + 6410.4800000000005, "end": 6414.4800000000005, "text": " Yeah, that''s amazing. + That''s that''s so deep. And in what you said,", "tokens": [50412, 865, 11, 300, + 311, 2243, 13, 663, 311, 300, 311, 370, 2452, 13, 400, 294, 437, 291, 848, 11, 50612], + "temperature": 0.0, "avg_logprob": -0.16556003656280174, "compression_ratio": 1.72, + "no_speech_prob": 0.009662752971053123}, {"id": 1021, "seek": 640952, "start": 6416.4800000000005, + "end": 6423.040000000001, "text": " speaks, speaks and sings a practitioner, but + I think also speaks and sings a dreamer.", "tokens": [50712, 10789, 11, 10789, 293, + 23250, 257, 32125, 11, 457, 286, 519, 611, 10789, 293, 23250, 257, 3055, 260, 13, + 51040], "temperature": 0.0, "avg_logprob": -0.16556003656280174, "compression_ratio": + 1.72, "no_speech_prob": 0.009662752971053123}, {"id": 1022, "seek": 640952, "start": + 6423.68, "end": 6429.68, "text": " I think you dream of better ways of searching + things, right? 
And like you went through it", "tokens": [51072, 286, 519, 291, 3055, + 295, 1101, 2098, 295, 10808, 721, 11, 558, 30, 400, 411, 291, 1437, 807, 309, 51372], + "temperature": 0.0, "avg_logprob": -0.16556003656280174, "compression_ratio": 1.72, + "no_speech_prob": 0.009662752971053123}, {"id": 1023, "seek": 640952, "start": 6430.56, + "end": 6438.160000000001, "text": " practically, but also, you know, when you when + you get so deep into practical elements, you get stuck", "tokens": [51416, 15667, + 11, 457, 611, 11, 291, 458, 11, 562, 291, 562, 291, 483, 370, 2452, 666, 8496, 4959, + 11, 291, 483, 5541, 51796], "temperature": 0.0, "avg_logprob": -0.16556003656280174, + "compression_ratio": 1.72, "no_speech_prob": 0.009662752971053123}, {"id": 1024, + "seek": 643816, "start": 6438.24, "end": 6443.28, "text": " into it and you''re + like, you''re thinking in the in the framework of the given system,", "tokens": + [50368, 666, 309, 293, 291, 434, 411, 11, 291, 434, 1953, 294, 264, 294, 264, 8388, + 295, 264, 2212, 1185, 11, 50620], "temperature": 0.0, "avg_logprob": -0.13786199903979743, + "compression_ratio": 1.6622222222222223, "no_speech_prob": 0.0015538427978754044}, + {"id": 1025, "seek": 643816, "start": 6443.28, "end": 6448.96, "text": " of a given + language even, right? Sometimes the paradigms that you read through the docs and + you", "tokens": [50620, 295, 257, 2212, 2856, 754, 11, 558, 30, 4803, 264, 13480, + 328, 2592, 300, 291, 1401, 807, 264, 45623, 293, 291, 50904], "temperature": 0.0, + "avg_logprob": -0.13786199903979743, "compression_ratio": 1.6622222222222223, "no_speech_prob": + 0.0015538427978754044}, {"id": 1026, "seek": 643816, "start": 6448.96, "end": 6455.2, + "text": " keep thinking about them, it''s hard to unstick yourself from them. 
And + I mean, the fact that", "tokens": [50904, 1066, 1953, 466, 552, 11, 309, 311, 1152, + 281, 517, 11881, 1803, 490, 552, 13, 400, 286, 914, 11, 264, 1186, 300, 51216], + "temperature": 0.0, "avg_logprob": -0.13786199903979743, "compression_ratio": 1.6622222222222223, + "no_speech_prob": 0.0015538427978754044}, {"id": 1027, "seek": 643816, "start": + 6455.2, "end": 6462.48, "text": " vector search field was born is magical in many + ways. So I feel like you feel the same. And I mean,", "tokens": [51216, 8062, 3164, + 2519, 390, 4232, 307, 12066, 294, 867, 2098, 13, 407, 286, 841, 411, 291, 841, 264, + 912, 13, 400, 286, 914, 11, 51580], "temperature": 0.0, "avg_logprob": -0.13786199903979743, + "compression_ratio": 1.6622222222222223, "no_speech_prob": 0.0015538427978754044}, + {"id": 1028, "seek": 646248, "start": 6462.48, "end": 6469.36, "text": " the fact + that you also ventured with me and others into building a new algorithm for vector + search", "tokens": [50364, 264, 1186, 300, 291, 611, 6931, 3831, 365, 385, 293, + 2357, 666, 2390, 257, 777, 9284, 337, 8062, 3164, 50708], "temperature": 0.0, "avg_logprob": + -0.11177883353284611, "compression_ratio": 1.6863636363636363, "no_speech_prob": + 0.0067966715432703495}, {"id": 1029, "seek": 646248, "start": 6469.36, "end": 6476.32, + "text": " also says that that you wanted to go as deep as implementing an algorithm, + right? So", "tokens": [50708, 611, 1619, 300, 300, 291, 1415, 281, 352, 382, 2452, + 382, 18114, 364, 9284, 11, 558, 30, 407, 51056], "temperature": 0.0, "avg_logprob": + -0.11177883353284611, "compression_ratio": 1.6863636363636363, "no_speech_prob": + 0.0067966715432703495}, {"id": 1030, "seek": 646248, "start": 6476.32, "end": 6481.2, + "text": " which what could be sexier than implementing an algorithm? I mean, I don''t + know. 
Of course,", "tokens": [51056, 597, 437, 727, 312, 3260, 811, 813, 18114, + 364, 9284, 30, 286, 914, 11, 286, 500, 380, 458, 13, 2720, 1164, 11, 51300], "temperature": + 0.0, "avg_logprob": -0.11177883353284611, "compression_ratio": 1.6863636363636363, + "no_speech_prob": 0.0067966715432703495}, {"id": 1031, "seek": 646248, "start": + 6481.2, "end": 6486.48, "text": " all other things are also sexy, but I''m just + saying that it''s very it''s very complex. It''s very", "tokens": [51300, 439, 661, + 721, 366, 611, 13701, 11, 457, 286, 478, 445, 1566, 300, 309, 311, 588, 309, 311, + 588, 3997, 13, 467, 311, 588, 51564], "temperature": 0.0, "avg_logprob": -0.11177883353284611, + "compression_ratio": 1.6863636363636363, "no_speech_prob": 0.0067966715432703495}, + {"id": 1032, "seek": 648648, "start": 6487.28, "end": 6495.5199999999995, "text": + " demanding in intellectually demanding work. So that that''s amazing. Thanks so + much for this depth.", "tokens": [50404, 19960, 294, 46481, 19960, 589, 13, 407, + 300, 300, 311, 2243, 13, 2561, 370, 709, 337, 341, 7161, 13, 50816], "temperature": + 0.0, "avg_logprob": -0.19223440847089213, "compression_ratio": 1.7657657657657657, + "no_speech_prob": 0.006982909515500069}, {"id": 1033, "seek": 648648, "start": 6495.5199999999995, + "end": 6501.839999999999, "text": " And is there something you would like to share + or announce with, you know, on mighty or maybe on", "tokens": [50816, 400, 307, + 456, 746, 291, 576, 411, 281, 2073, 420, 7478, 365, 11, 291, 458, 11, 322, 21556, + 420, 1310, 322, 51132], "temperature": 0.0, "avg_logprob": -0.19223440847089213, + "compression_ratio": 1.7657657657657657, "no_speech_prob": 0.006982909515500069}, + {"id": 1034, "seek": 648648, "start": 6501.839999999999, "end": 6506.799999999999, + "text": " something you''re going to be presenting on Berlin buzzwords, I know as + well, but is there something", "tokens": [51132, 746, 291, 434, 516, 281, 312, 15578, + 322, 13848, 13036, 13832, 11, 
286, 458, 382, 731, 11, 457, 307, 456, 746, 51380], + "temperature": 0.0, "avg_logprob": -0.19223440847089213, "compression_ratio": 1.7657657657657657, + "no_speech_prob": 0.006982909515500069}, {"id": 1035, "seek": 648648, "start": 6506.799999999999, + "end": 6515.12, "text": " that you would like to share? Yeah, so I am presenting + a Berlin buzzwords. I am putting together", "tokens": [51380, 300, 291, 576, 411, + 281, 2073, 30, 865, 11, 370, 286, 669, 15578, 257, 13848, 13036, 13832, 13, 286, + 669, 3372, 1214, 51796], "temperature": 0.0, "avg_logprob": -0.19223440847089213, + "compression_ratio": 1.7657657657657657, "no_speech_prob": 0.006982909515500069}, + {"id": 1036, "seek": 651512, "start": 6515.68, "end": 6522.0, "text": " a charity + event to hack on vector search. And that''s going to be on May 5th. I don''t know + when this", "tokens": [50392, 257, 16863, 2280, 281, 10339, 322, 8062, 3164, 13, + 400, 300, 311, 516, 281, 312, 322, 1891, 1025, 392, 13, 286, 500, 380, 458, 562, + 341, 50708], "temperature": 0.0, "avg_logprob": -0.13522368344393643, "compression_ratio": + 1.6625, "no_speech_prob": 0.0016674576327204704}, {"id": 1037, "seek": 651512, "start": + 6522.0, "end": 6528.4, "text": " podcast will be published, but on May 5th, I want + to get and I want it to be it''s just going to be", "tokens": [50708, 7367, 486, + 312, 6572, 11, 457, 322, 1891, 1025, 392, 11, 286, 528, 281, 483, 293, 286, 528, + 309, 281, 312, 309, 311, 445, 516, 281, 312, 51028], "temperature": 0.0, "avg_logprob": + -0.13522368344393643, "compression_ratio": 1.6625, "no_speech_prob": 0.0016674576327204704}, + {"id": 1038, "seek": 651512, "start": 6528.4, "end": 6535.04, "text": " an all day + learning session on and I''m not charging money for this. This is like free. 
I just + want to", "tokens": [51028, 364, 439, 786, 2539, 5481, 322, 293, 286, 478, 406, + 11379, 1460, 337, 341, 13, 639, 307, 411, 1737, 13, 286, 445, 528, 281, 51360], + "temperature": 0.0, "avg_logprob": -0.13522368344393643, "compression_ratio": 1.6625, + "no_speech_prob": 0.0016674576327204704}, {"id": 1039, "seek": 651512, "start": + 6535.599999999999, "end": 6540.72, "text": " show people how to use these tools + if you''re not in the Python world. If you''re part of the Python", "tokens": [51388, + 855, 561, 577, 281, 764, 613, 3873, 498, 291, 434, 406, 294, 264, 15329, 1002, 13, + 759, 291, 434, 644, 295, 264, 15329, 51644], "temperature": 0.0, "avg_logprob": + -0.13522368344393643, "compression_ratio": 1.6625, "no_speech_prob": 0.0016674576327204704}, + {"id": 1040, "seek": 654072, "start": 6540.72, "end": 6548.0, "text": " world, you + want to join amazing. Great. But I want to do an all day like hackathon, where I''ll + show", "tokens": [50364, 1002, 11, 291, 528, 281, 3917, 2243, 13, 3769, 13, 583, + 286, 528, 281, 360, 364, 439, 786, 411, 10339, 18660, 11, 689, 286, 603, 855, 50728], + "temperature": 0.0, "avg_logprob": -0.19846921920776367, "compression_ratio": 1.6431535269709543, + "no_speech_prob": 0.0018031391082331538}, {"id": 1041, "seek": 654072, "start": + 6548.0, "end": 6553.04, "text": " you how to get these things up and running hack + away at it by the end you''ll have, you know, a working", "tokens": [50728, 291, + 577, 281, 483, 613, 721, 493, 293, 2614, 10339, 1314, 412, 309, 538, 264, 917, 291, + 603, 362, 11, 291, 458, 11, 257, 1364, 50980], "temperature": 0.0, "avg_logprob": + -0.19846921920776367, "compression_ratio": 1.6431535269709543, "no_speech_prob": + 0.0018031391082331538}, {"id": 1042, "seek": 654072, "start": 6553.04, "end": 6558.72, + "text": " or working example on your own. 
And all the money we''re all the time + we''re going to raise money for", "tokens": [50980, 420, 1364, 1365, 322, 428, 1065, + 13, 400, 439, 264, 1460, 321, 434, 439, 264, 565, 321, 434, 516, 281, 5300, 1460, + 337, 51264], "temperature": 0.0, "avg_logprob": -0.19846921920776367, "compression_ratio": + 1.6431535269709543, "no_speech_prob": 0.0018031391082331538}, {"id": 1043, "seek": + 654072, "start": 6558.72, "end": 6565.12, "text": " charity, specifically around + refugees and displaced people, you know, because of the horrible", "tokens": [51264, + 16863, 11, 4682, 926, 18301, 293, 33692, 561, 11, 291, 458, 11, 570, 295, 264, 9263, + 51584], "temperature": 0.0, "avg_logprob": -0.19846921920776367, "compression_ratio": + 1.6431535269709543, "no_speech_prob": 0.0018031391082331538}, {"id": 1044, "seek": + 656512, "start": 6565.2, "end": 6572.88, "text": " things that are happening in + Ukraine and other parts of the world as well. You know, getting", "tokens": [50368, + 721, 300, 366, 2737, 294, 14081, 293, 661, 3166, 295, 264, 1002, 382, 731, 13, 509, + 458, 11, 1242, 50752], "temperature": 0.0, "avg_logprob": -0.1525156878042912, "compression_ratio": + 1.4742268041237114, "no_speech_prob": 0.001272257068194449}, {"id": 1045, "seek": + 656512, "start": 6574.0, "end": 6579.76, "text": " getting some learning happening + and also raising money for charity seems like a great way to spend", "tokens": [50808, + 1242, 512, 2539, 2737, 293, 611, 11225, 1460, 337, 16863, 2544, 411, 257, 869, 636, + 281, 3496, 51096], "temperature": 0.0, "avg_logprob": -0.1525156878042912, "compression_ratio": + 1.4742268041237114, "no_speech_prob": 0.001272257068194449}, {"id": 1046, "seek": + 656512, "start": 6579.76, "end": 6588.4, "text": " time. So I plan to host that + on May 5th. 
It''s probably going to be on Twitch because I want to", "tokens": [51096, + 565, 13, 407, 286, 1393, 281, 3975, 300, 322, 1891, 1025, 392, 13, 467, 311, 1391, + 516, 281, 312, 322, 22222, 570, 286, 528, 281, 51528], "temperature": 0.0, "avg_logprob": + -0.1525156878042912, "compression_ratio": 1.4742268041237114, "no_speech_prob": + 0.001272257068194449}, {"id": 1047, "seek": 658840, "start": 6589.36, "end": 6594.799999999999, + "text": " just to be an open drop in drop out format. You can come, you can go. + It''s not going to be like a", "tokens": [50412, 445, 281, 312, 364, 1269, 3270, + 294, 3270, 484, 7877, 13, 509, 393, 808, 11, 291, 393, 352, 13, 467, 311, 406, 516, + 281, 312, 411, 257, 50684], "temperature": 0.0, "avg_logprob": -0.1657773866428165, + "compression_ratio": 1.6986301369863013, "no_speech_prob": 0.022870071232318878}, + {"id": 1048, "seek": 658840, "start": 6594.799999999999, "end": 6600.0, "text": + " controlled zoom where you, you know, it''s like that. It''s going to be on Twitch + with chat and stuff", "tokens": [50684, 10164, 8863, 689, 291, 11, 291, 458, 11, + 309, 311, 411, 300, 13, 467, 311, 516, 281, 312, 322, 22222, 365, 5081, 293, 1507, + 50944], "temperature": 0.0, "avg_logprob": -0.1657773866428165, "compression_ratio": + 1.6986301369863013, "no_speech_prob": 0.022870071232318878}, {"id": 1049, "seek": + 658840, "start": 6600.0, "end": 6604.32, "text": " like that. So I''m going to get + it all set up. Details are going to come out shortly. By the time", "tokens": [50944, + 411, 300, 13, 407, 286, 478, 516, 281, 483, 309, 439, 992, 493, 13, 42811, 366, + 516, 281, 808, 484, 13392, 13, 3146, 264, 565, 51160], "temperature": 0.0, "avg_logprob": + -0.1657773866428165, "compression_ratio": 1.6986301369863013, "no_speech_prob": + 0.022870071232318878}, {"id": 1050, "seek": 658840, "start": 6604.32, "end": 6609.2, + "text": " this is published, maybe the details will be available already and we''ll + drop a link. 
Yeah, awesome.", "tokens": [51160, 341, 307, 6572, 11, 1310, 264, 4365, + 486, 312, 2435, 1217, 293, 321, 603, 3270, 257, 2113, 13, 865, 11, 3476, 13, 51404], + "temperature": 0.0, "avg_logprob": -0.1657773866428165, "compression_ratio": 1.6986301369863013, + "no_speech_prob": 0.022870071232318878}, {"id": 1051, "seek": 658840, "start": 6609.2, + "end": 6617.44, "text": " This sounds amazing that you also keep thinking about + this sensitive topics or like what''s happening", "tokens": [51404, 639, 3263, 2243, + 300, 291, 611, 1066, 1953, 466, 341, 9477, 8378, 420, 411, 437, 311, 2737, 51816], + "temperature": 0.0, "avg_logprob": -0.1657773866428165, "compression_ratio": 1.6986301369863013, + "no_speech_prob": 0.022870071232318878}, {"id": 1052, "seek": 661744, "start": 6618.0, + "end": 6623.919999999999, "text": " in the world and you are also contributing with + your skills into a good course here.", "tokens": [50392, 294, 264, 1002, 293, 291, + 366, 611, 19270, 365, 428, 3942, 666, 257, 665, 1164, 510, 13, 50688], "temperature": + 0.0, "avg_logprob": -0.2411188013413373, "compression_ratio": 1.5023474178403755, + "no_speech_prob": 0.007671094965189695}, {"id": 1053, "seek": 661744, "start": 6624.32, + "end": 6628.719999999999, "text": " Thanks so much. 
I will try to publish this podcast + before May 5th.", "tokens": [50708, 2561, 370, 709, 13, 286, 486, 853, 281, 11374, + 341, 7367, 949, 1891, 1025, 392, 13, 50928], "temperature": 0.0, "avg_logprob": + -0.2411188013413373, "compression_ratio": 1.5023474178403755, "no_speech_prob": + 0.007671094965189695}, {"id": 1054, "seek": 661744, "start": 6630.5599999999995, + "end": 6636.24, "text": " So make sure that somebody can join and get chappin, of + course, we can do the", "tokens": [51020, 407, 652, 988, 300, 2618, 393, 3917, 293, + 483, 417, 1746, 259, 11, 295, 1164, 11, 321, 393, 360, 264, 51304], "temperature": + 0.0, "avg_logprob": -0.2411188013413373, "compression_ratio": 1.5023474178403755, + "no_speech_prob": 0.007671094965189695}, {"id": 1055, "seek": 661744, "start": 6636.24, + "end": 6644.4, "text": " the most media, social media push. This is amazing. Thanks + so much, Max. I''ve enjoyed this", "tokens": [51304, 264, 881, 3021, 11, 2093, 3021, + 2944, 13, 639, 307, 2243, 13, 2561, 370, 709, 11, 7402, 13, 286, 600, 4626, 341, + 51712], "temperature": 0.0, "avg_logprob": -0.2411188013413373, "compression_ratio": + 1.5023474178403755, "no_speech_prob": 0.007671094965189695}, {"id": 1056, "seek": + 664440, "start": 6644.4, "end": 6654.24, "text": " conversation thoroughly. We went + into depth and with and everything all dimensions. It''s a multi-dimensional", "tokens": + [50364, 3761, 17987, 13, 492, 1437, 666, 7161, 293, 365, 293, 1203, 439, 12819, + 13, 467, 311, 257, 4825, 12, 18759, 50856], "temperature": 0.0, "avg_logprob": -0.1875454818501192, + "compression_ratio": 1.5172413793103448, "no_speech_prob": 0.02902468480169773}, + {"id": 1057, "seek": 664440, "start": 6654.24, "end": 6662.0, "text": " conversation. 
+ So thanks so much and keep it up and I''m curious to hear news about Mighty and + the", "tokens": [50856, 3761, 13, 407, 3231, 370, 709, 293, 1066, 309, 493, 293, + 286, 478, 6369, 281, 1568, 2583, 466, 45874, 293, 264, 51244], "temperature": 0.0, + "avg_logprob": -0.1875454818501192, "compression_ratio": 1.5172413793103448, "no_speech_prob": + 0.02902468480169773}, {"id": 1058, "seek": 664440, "start": 6662.0, "end": 6669.12, + "text": " tooling around it and also looking forward to your building buzzwords + presentation. Yeah, thank you so", "tokens": [51244, 46593, 926, 309, 293, 611, + 1237, 2128, 281, 428, 2390, 13036, 13832, 5860, 13, 865, 11, 1309, 291, 370, 51600], + "temperature": 0.0, "avg_logprob": -0.1875454818501192, "compression_ratio": 1.5172413793103448, + "no_speech_prob": 0.02902468480169773}, {"id": 1059, "seek": 666912, "start": 6669.12, + "end": 6679.44, "text": " much, Dima. It''s great to chat. Yeah, thank you, Max. + Cheers. Cheers. Take care. Bye-bye.", "tokens": [50364, 709, 11, 413, 4775, 13, + 467, 311, 869, 281, 5081, 13, 865, 11, 1309, 291, 11, 7402, 13, 13006, 13, 13006, + 13, 3664, 1127, 13, 4621, 12, 6650, 13, 50880], "temperature": 0.0, "avg_logprob": + -0.31698110699653625, "compression_ratio": 1.0348837209302326, "no_speech_prob": + 0.038853757083415985}]' +--- + +Hello, vector podcast is here. And today I'm going to be talking to Max Irwin. He's this star in the search engine business in search engine world. He has been doubling also in NLP a lot. I don't know 20 years. It's huge amount of time. +And I mean, he has been consulting in this space, also building products. And now he's focusing on building his new product. And he's the founder of company called max.io, which is also a website. You can go check it out. And he's building a mighty inference server. +And the number of other tools that I'm sure Max will talk about today. Hey, Max, how are you doing? I'm doing great. How are you? I'm great. 
And thanks so much for joining me today. I'm very happy to be talking to you today.
+And I'm learning about Mighty and all the things that you're cooking there. But I think, as a tradition, could you start with introducing yourself first? Sure. Yeah. Hi. So I'm good to go on my own business.
+So I'm a doctor. And sometimes I get a lot of things to do with my career. In fact, when I was younger, I didn't do so well in my language courses. I started in 2015-2016 with actual product development around NLP. With search, I've been doing search since about 2010-2011.
+Again, it's fuzzy when I actually first started, but I think the first real serious thing I did with search was when I went to take my first Solr training course, back when Lucidworks still had Solr training and they had contractors coming to give it.
+So that was in 2012, but I'd been messing around with engines before that. I started on an engine called dtSearch, which was a C++ closed-source engine, but you could buy the code for like a thousand dollars a year.
+So the company I was working for, MetiRex, we actually bought the code. And I was the newbie with search. I mean, we had guys who'd been working with it for a while, and they'd built a whole platform around dtSearch. And then it was starting to show its age, so we started shifting over to Solr.
+But yeah, before that, I did a whole bunch of computer programming. So the 20 years, 22-years-ish stuff that's in my bio: I graduated university in the year 2000, and I've been, you know, working professionally in software ever since.
+But with search, I really got interested around 2012; that's when I really said, wow, this is amazing. This is so much different from what I've been doing before. So that's when I really dove head-first into the problem space, into the domain. Yeah.
+And some people say that many of us ended up in the search field by accident, and in NLP as well. I've been talking to one professor here at the University of Helsinki who has built a machine translation team, a very, very strong one, and he has built the system called Opus.
+And he actually said that he ended up in NLP also by accident, because it was just an offer from a professor and he decided to take it, and he turned out to be quite good at it, you know.
+But he also had another option, just to go and work in Germany, he's from Germany, at some database company. And luckily he didn't take that path.
+How was it for you? How do you feel about ending up in this space? That's a great question. It's interesting. I feel like ending up here was definitely somewhat accidental.
+I had the pleasure of meeting so many people in search through the different positions that I was working in, with varying degrees of expertise.
+I found that a lot of people who got involved with machine learning found out about search because TF-IDF and all that stuff is like an algorithm, and it's like, oh, there's this whole language problem behind search that we have to figure out.
+And then the search people get involved in machine learning because, oh, this language problem is horrible, how do we solve it with automation and learning? So I accidentally stumbled on it because I took a role that was in, like, healthcare compliance.
+And I was interested in that domain specifically, and search just happened to be a really important problem in that space.
So that's how I kind of got into the technical domain of search.
+And it just was so much more fascinating than the stuff that I was used to with CRUD, you know, just create-read-update-delete, and just workflow applications, which I'd been doing for, you know, 10 to 12 years at that point. Yeah.
+Yeah, I mean, for me, with search, I think I started 2002, 2003 academically, but then like seven years passed and I still couldn't find a niche or a job for myself, because there weren't many search companies in Finland at that point.
+And then I found a company which I joined in 2010, AlphaSense. And it was Apache Solr, you see, and everything was new, but it was still somehow inviting.
+And I think the first time when I built the backend, I was like, okay, somebody is going to use this, somebody is going to type the queries and we'll try to find information. So I also tried it out, and it kind of maybe worked, maybe didn't. I wasn't the user of this system.
+I didn't know what to type. So I was just grabbing some phrases from the documents to see, okay, does it find them or not, you know? So is this something that also attracted you, like, okay, findability, right? Like discovery, or maybe discovery is the next stage, but even findability itself.
+Yeah, I guess search was really my first step towards working with really complex data that wasn't so structured, unstructured data, right? You kind of reach a limit with structured data at some point, getting stuff into databases, getting it out and things like that.
+And you can spend a lifetime in that work. But I felt like I'd been doing it for a while. And with search, it was this weird world where it's all this unknown stuff and you don't know what to do. So it's this unsolved problem.
+I felt like databases and things like that were this solved problem, where search wasn't a solved problem and still isn't. If I had been doing the same database work now, that's all no-code. You can just create the same stuff I was doing with no-code tools.
+You don't even have to be a programmer if you don't want to, at the level that we were doing it, you know, in the mid-2000s. So, yeah, now it is. And search is still unsolved. Even when we start talking, you know, we're going to talk about vectors, of course, vector search.
+But that's still an unsolved problem. It's like another tool, but you still have all these issues that you have to take into account. Yeah. So endless exploration. Yeah, it's like an infinite quest in many ways. There is a limitless amount of tasks to solve.
+But then, somehow in your career, there was a turn where you decided to get closer to this vector search field.
+I just wanted to hear your first reaction, like, what did you think about it? When did you hear about it? And also, what attracted you?
+I'd say the first thing that really attracted me towards vector search was the BERT paper that was written in 2018, but I didn't come across it until 2019.
+And Google had written a blog about how they were using it for their web search. And you know, you could download some Python and get this stuff to work. But the reason why I was so fascinated by that is because of, you know, working in search already six years. No, let's do some math.
+So, you know, eight years at that point, I had been stumbling along with the vocabulary problem.
+The query term dependence problem, as we call it, where, okay, well, to solve this, you have to create a bunch of synonyms, and then you get to a certain level of advancement and you create a taxonomy, and then, you know, you create a knowledge graph.
+And you know, before BERT, we'd started playing around with word2vec and saying, oh, can these types of embeddings be used to solve this vocabulary problem with synonyms and knowledge graph vocabulary expansion? The answer turned out to be no with word2vec.
+It didn't work as well as we'd hoped. It helped with some things, but it harmed others. So it produced a lot of noise, and, you know, maybe we didn't give it a good enough chance, but we saw, okay, we can train this thing pretty quick and we can get this model from our content.
+But there's still this problem.
+So when I started to play around with some of the Python tools that were available for BERT and large language models, which actually used word2vec as the pre-processing step to get the first encodings, and then the first embeddings, and then use those identifiers to go forward,
+I really saw something there. I saw actual similarity, where before, with word2vec, I just saw kind of co-occurrence. Yeah, these things are, you see them in the same context.
+But actual linguistic similarity, the first time I saw that was with BERT, and that's where all the hype came from. And then the next step with BERT is like, okay, I have these vectors.
+Now what do I do with them? And then I said, okay, well, I have to use a dot product, right? I have to use a cosine similarity. Okay, let me just do that. And then I say, oh, you can't just do that across every vector. It's impossible. You have to do something else.
+And then you go on this learning path, right? So that's where I ended up. And I had actually written a blog post in 2019 about this, and I think that post was widely accepted by the community; it's still on the OpenSource Connections blog.
+And it was really showing, like, hey, this is a change, you know, it's not just Google that's going to be doing this.
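+The dot-product step described above can be sketched in a few lines of pure Python. The names and toy vectors here are invented for illustration; the point is exactly the one made in the conversation: exact cosine scoring works, it just cannot scale to every vector in a large corpus, which is why approximate nearest neighbor indexes exist.

```python
# Toy sketch: ranking document embeddings against a query embedding
# with cosine similarity, in pure Python.
import math

def cosine(a, b):
    # cosine similarity = dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# toy 3-d "embeddings"; real sentence embeddings have hundreds of dimensions
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.9, 0.4],
}

# brute force is O(N * dims) per query -- fine for two documents,
# impossible for millions, hence ANN structures like HNSW
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

+Here `doc_a` ranks first because it points in nearly the same direction as the query vector.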
Like, this is really interesting. And a lot of people agreed, and there was this movement that kind of happened after that.
+And a lot of other people were coming to the same conclusions, but there were a lot of challenges. So with vector search and approximate nearest neighbor search, you know, that's just the tool to solve the problem.
+It's like, you know, you start with this problem over here, and then you go like ten steps over here, and finally you get to vector search. Okay, this is a potential solution, right? This is the core of the potential solution, with all this stuff in the middle.
+Yeah, I feel that I should read this blog, and we'll definitely link it in the show notes. But sometimes when I look at vector search, let's say demos or applications or algorithms, I get a feeling that you might just think, okay, I have a solution, let me find a problem.
+Because it's all so magical. I mean, it's so sexy, right? Do you think this is one of the sort of misconceptions in this field, or do you think it's well past that already? That's a great question. I don't think it's a solution looking for a problem.
+I don't think that's true. I think it actually does solve some problems. But I do agree that, you know, there's a lot of gray area.
+And how do you arrive at that from "I need to find things" as a person? You know, and all the things that you have to go through until vector search actually means something that is a solution.
+I think there are a lot of people who picked it up and said, okay, we could just use this and it's going to solve these problems.
+But it doesn't do that, right? Because search is not just about similarity. You know, you can express a query's similarity with a document using TF-IDF, BM25, you know, a sentence transformer, cosine distance, whatever. But that's only the similarity.
+There's also the need that the person has versus what they have. So it's a bunch of candidate documents that are similar, but what's the actual document you need? That's where a lot of other things come into play.
+It's just one piece in a much larger search or recommendations platform. You still have to take on all the other signals, and, you know, common now in the more mature platforms is that you have some learning-to-rank algorithm, and vector similarity is one feature in a learning-to-rank model.
+Along with, you know, BM25 with the title, BM25 with the body, the number of clicks, the date, all this other stuff. And it's a piece.
+But the thing that piece solves is that query term dependence problem, where sometimes, you know, I don't have to go in and craft synonyms by hand, and I don't have this endless task of doing that.
+You kind of have all these other tasks that you still have to do, but that one has maybe been kept at bay a little bit. Yeah, yeah, absolutely.
+I mean, maybe I can restate my question a little bit, or sort of clarify what I meant. I guess when you read the paper, BERT or similar papers, they also say, hey, we ran this on downstream tasks, like sentiment analysis, we also did question answering, we did recommendation, all these other things.
+And it works great. Which kind of pushes you to think in the direction: is this a universal language model, or an approach that I can now take and apply everywhere, to every task?
+And the answer is actually no, because, hey, if you are in healthcare and they trained on news, it's not going to work. So the vocabulary still was not excluded from this journey. If it's a mismatch, it's a mismatch.
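+The learning-to-rank framing mentioned above, where vector similarity sits next to BM25 scores, clicks, and recency as just one feature, can be pictured with a minimal linear scorer. The feature names and weights below are invented for illustration; real systems learn the weights from relevance judgments (for example with LambdaMART) rather than hand-setting them.

```python
# Sketch: vector similarity as one feature among many in a linear
# learning-to-rank scorer. Weights and features are made up for this example.
WEIGHTS = {
    "bm25_title": 2.0,   # lexical match against the title
    "bm25_body": 1.0,    # lexical match against the body
    "vector_sim": 1.5,   # embedding cosine similarity -- one feature, not the whole story
    "clicks": 0.3,       # behavioral signal
    "recency": 0.2,      # date-based signal
}

def ltr_score(features):
    # linear combination of per-document feature values
    return sum(WEIGHTS[name] * value for name, value in features.items())

candidates = {
    "doc_a": {"bm25_title": 0.2, "bm25_body": 0.5, "vector_sim": 0.9,
              "clicks": 0.1, "recency": 0.4},
    "doc_b": {"bm25_title": 0.8, "bm25_body": 0.6, "vector_sim": 0.2,
              "clicks": 0.7, "recency": 0.1},
}
ranked = sorted(candidates, key=lambda d: ltr_score(candidates[d]), reverse=True)
```

+Note that `doc_b` can outrank `doc_a` despite a lower vector similarity, because the other signals carry weight too.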
+But the model itself, of course, is a clever piece of, you know, tech, which you can then take and apply, fine-tune, maybe, or retrain on your data.
+So I think that's one way to look at it, right? It is, but I think that we still see a huge gap in the domain, right? I think there are a lot of organizations that can just make use of pretrained models and fine-tune them.
+But, you know, I know that there are still domains where you can't do that. Like, if you go and you try something that's fine-tuned on, like, law, right? Law is like its own language. Law written in English, I wouldn't even call that English.
+I'd call that, you know, legal English, right? Because the structure, the vocabulary, the grammar, all this stuff is so different from what's in, like, a Wikipedia article or in the news or something like that, right?
+So, when you try to do fine-tuning on a pretrained model that's trained on, let's say, OntoNotes 5, which is a bunch of collections of, you know, news, Wikipedia, like general knowledge that most people use,
+when you fine-tune it, there's still a gap. There's something missing, right? Because the original trained model was lacking this context. And that's only for the content. That's just the content.
+And when people search and they type in terms, you know, you can imagine this Venn diagram of, well, here's all of the content over here that you've trained on.
+And then here's all the terms that your users know, right? And you try to bring these closer together somehow, right? If the model was trained on content that is, like, up here, then you're going to have trouble kind of putting it together.
+I don't know if I can do a good job with my hands showing this. No, you're doing a perfect job there.
So I think that one of the big existing problems is that pre-training still costs a ridiculous amount of money and is out of the reach of most teams.
+Yeah, I've read papers, you know, one of them was by Microsoft, showing that, you know, the BERT vocabulary is like 30,000 words or something like that. If you increase the vocabulary size to like 100,000 words, then the model generalizes much better.
+And, of course, you expand the content and the domains that are involved in that training. So I think we're going to see some more of that.
+The world is still stuck on these 30,000 terms in the pre-trained space of things like OntoNotes because it's just so expensive to train models, and Google and Microsoft and Facebook and these companies that train models, they're not going to bother open-sourcing those.
+Maybe they will at some point, but I think we're going to need to see big companies that are specific to a domain train those models and then open-source them.
+But if you spend millions of dollars to train a model and you're a big private company, are you going to open-source the model weights? Probably not, you're going to keep it for yourself, because that's huge value for your product.
+I guess you open-source the idea, sort of, if you publish, okay, here's the BERT model, here's the MUM model or whatever. But then go train it yourself. Yeah, yeah, if you have a couple million dollars lying around.
+Yeah, and I was also talking, in another episode, to Ahmaad, who used to work at Google, and he said that entire teams would be dedicated on a quarterly basis to do the expensive fine-tuning work with BERT or similar models.
+So can you imagine, it's like a team's effort, and these people, some of them invented the model, some of them didn't, but you know, with all the resources that Google has, fine-tuning them for three months. So I think this is out of reach of startups.
And there are other things that are out of reach, and this is where you saw the gap with Mighty; I want to get closer to Mighty now. Every time I install a vector database, and I'm not going to name one, it says: hey, it will be faster if you use GPUs. And I'm like, okay, I'm a startup. I don't have GPUs. So this is, I think, one of the gaps you saw. But are there other gaps that you're addressing with Mighty Inference Server?

Yes. So the NLP world right now, and the vector world right now, all they talk about is Python, Python, Python. Everything is in Python. When you get to production you use something else, but it's Python, Python, Python. I came from a non-Python background. I started with C and Pascal when I was really young, and my C programming is terrible now, I'm sure. Then I discovered intermediately compiled languages, Java, C#, things like that. That was the early 2000s for me. I was in the Microsoft world, so I was doing C# for a while.

And all the while I'd been doing JavaScript, because I'd been involved with the web since the mid-90s, and that's how I got involved with content and content data and all this stuff. It's all web stuff, and you have to know JavaScript if you do anything with the web. So it was C# and JavaScript for me for a while. So I know there's a gap. If you go into the JavaScript world, Node, TypeScript, Deno now, there's nothing. You want to do NLP? Learn Python. That's pretty much the suggestion. Same with C#: okay, there are some libraries out there, but they're clunky. Nobody really uses them. Microsoft probably does, because they're Microsoft, they built C#, and everybody there does Microsoft stuff.
But outside of Microsoft, who's using C# for natural language processing, for training models? And for hosting models? To do it, you have to jump through all these hoops, and it's really hard, unless you want to put Python in your stack, which is basically a non-starter for a lot of teams. A lot of teams work in languages like Node and JavaScript, C#, Java, Ruby, Go. There are so many huge languages out there that just can't touch these models.

So I wanted something that broke out of this shell, this Python enclosure: how do you get this stuff into the hands of people who just want to build web applications? They don't want to move into the Python family. That was one of the starting catalysts for Mighty Inference Server.

There is one tool I have to use that is Python, because you have to convert the model, and I convert it to ONNX. Most people in the NLP world know about ONNX by now; it stands for Open Neural Network Exchange. It's an intermediary format that can be used generically, like an open model format. And there are runtimes that can take ONNX models and run them. The biggest one is ONNX Runtime, developed by Microsoft. It's open source, MIT licensed, and written in C++. But there are bindings for other languages, and the community contributes bindings, so you can use ONNX Runtime in Python if you want to.

For those Python people who want to host models in Python: just convert your model to ONNX and host it in ONNX Runtime in Python. It'll double the speed of model inference, out of the box. You don't have to do anything. You clone the repo, you press a button, and it's twice as fast.
For others, there are bindings for C#, bindings for Java; there might be bindings for Ruby, I haven't looked, and probably bindings for Go. Even if Microsoft doesn't support them, the community builds them. So you can do this, but then there's another problem: those are just the model weights and the runtime for the model weights, so you put in inputs and get outputs. But where do you get the inputs from? Well, you have to tokenize the text; you have to do all the work to prepare it, to pre-process it. Then, once you've tokenized and pre-processed, you can pass in that tokenized data as inputs. But all the tokenizers are written in Python. So now you have that problem.

So I actually used Rust for Mighty Inference Server, because Hugging Face based their fast tokenizers on Rust. They wrote them in Rust and offer bindings for Python, so if you use a fast tokenizer in Python, you're actually calling into Rust. So I wrote a web server that wraps the Rust tokenizers and ONNX Runtime, plus a whole bunch of code for pipeline-specific stuff: question answering, sentence transformers, sequence classification (which covers things like sentiment analysis), and token classification (which covers things like entity recognition). I'm working on others as well.

And it's so much faster than Python; it's not even close. It's probably three or four times as fast without any tuning, and I've gone through tuning. I haven't compared it to Python in a long time, but it might be five times as fast as Python right now on CPU. You can also use GPU if you want, and you maintain the same speed advantage.
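The tokenize, infer, post-process flow described here can be sketched in miniature. The toy whitespace tokenizer and the `infer` stub below are illustrative stand-ins only: a real server would use Hugging Face's Rust-backed tokenizers and an ONNX Runtime inference session.

```python
# Sketch of the pre-process -> inference -> post-process pipeline discussed
# above. The vocabulary, tokenizer, and infer() stub are toy stand-ins, not
# a real model's components.

VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}

def tokenize(text: str) -> list[int]:
    """Toy whitespace tokenizer: maps words to vocabulary ids."""
    ids = [VOCAB.get(w, VOCAB["[UNK]"]) for w in text.lower().split()]
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]

def infer(input_ids: list[int]) -> list[float]:
    """Stub for the model forward pass; a real server calls ONNX Runtime here."""
    # Pretend each token contributes to a one-dimensional "embedding" we mean-pool.
    return [sum(input_ids) / len(input_ids)]

def embed(text: str) -> list[float]:
    # The server's whole job: tokenize (pre-process), run the model, post-process.
    return infer(tokenize(text))
```

The point of the sketch is the shape of the problem: the model weights alone are useless without the tokenizer that produced their training inputs, which is why Mighty bundles both behind one endpoint.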
Well, the speed ratio holds: if you take the model in Python and put it on a GPU, versus taking the model in ONNX Runtime and putting it on the GPU, ONNX Runtime is still far faster.

And when you say bindings in other languages, like Java or C#: so if my stack is in Java, I can take this model and plug it into my Java code?

You can. Take a Hugging Face model, for example, say BERT base uncased, most people know that one. You can export it to ONNX with Hugging Face code in Python, and now you have an ONNX model. Then, in Java, you can stand up a class that wraps ONNX Runtime, load the model into memory with ONNX Runtime inside that class, and create methods around it. Then you can call it: pass in the inputs, get out the outputs. And that's all in Java now. Well, with the C++ core of ONNX Runtime underneath, of course. But to wrap that C++ runtime there have to be bindings between the languages, so Java has to have some interface to talk to C++. Which is JNI, right, the Java Native Interface? Yeah, I think so.

So that part, having Java talk to ONNX Runtime, is taken care of already. You still have to write all the other stuff around it to leverage it, but programmers are used to that sort of thing; in Java, you can do that. And I don't know how much we've seen of it, but Jeff Zemerick, who works at OpenSource Connections, was working on a project where he tried to load ONNX Runtime into OpenNLP, which is a Java program. So, getting an ONNX model into OpenNLP. I think he succeeded; I haven't seen the code for it.
Yeah, exactly. That's awesome. The reason I'm asking is because I witnessed this tectonic shift at my previous company, where we had the entire stack in Java. We actually started with Perl, but we had to rewrite everything in Java; it just didn't scale in Perl. And we had Apache Solr on one end as the open source search engine, also written in Java, and when we customized it, we wrote plugins in Java and so on.

But then, when we wanted to introduce AI into the pipeline, of course everything was in Python. We hired people who could only do Python, nothing else, fresh grads. And now you're stuck with this new architecture: you have Python as one step in the pipeline. How do you call it? How do you handle errors? How do you scale this thing? And we were also moving to Kubernetes, to add to this crazy mix.

What we ended up doing was putting an asynchronous processor in every place where we had Python, to abstract Python away from Java. So you would just send a message to an SQS queue, and on the other end there is somebody consuming it. You can imagine how scalable this can be. It works, and you can scale it out, but as an overall architecture I don't think it's a very smooth solution, not to mention that the performance side of it is just not taken care of.

And what you're saying now, essentially, is that with ONNX bindings in Java we could just train the model, export it in ONNX format, and then use it directly in Java? You can, yes. But you still have to get the inputs to the model. If it's an image or something like that, it's usually pretty easy. But if it's text, then you have to tokenize first, and you have to use the right tokenizer.
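The queue-based decoupling described here can be shown in miniature with an in-process thread-safe queue standing in for SQS; the producer plays the role of the Java pipeline and the consumer plays the role of the Python ML step. All names and the toy "sentiment" rule are illustrative.

```python
import queue
import threading

# In-process stand-in for the SQS pattern above: the producer enqueues
# documents, and a separate consumer thread processes them asynchronously.
tasks: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()

def consumer() -> None:
    while True:
        doc = tasks.get()
        if doc is None:          # sentinel value: shut the worker down
            break
        # Stand-in for the expensive ML step (e.g. sentiment analysis).
        results.put((doc, "positive" if "good" in doc else "neutral"))
        tasks.task_done()

worker = threading.Thread(target=consumer)
worker.start()
for doc in ["a good day", "plain text"]:
    tasks.put(doc)               # "send this message to the queue"
tasks.put(None)
worker.join()
```

The decoupling buys independent scaling, but as noted above it adds operational surface: error handling, retries, and latency all live in the queue layer now.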
And you have to jump through a bunch of hoops to get it working correctly. It's probably a month's worth of work to get a tokenizer working in Java the way you need it to.

Yeah. And maybe you could, in principle, share this tokenizer between tasks? So for sentiment or for entity recognition, in principle you could use the same tokenizer? Yeah. Right.

So the tokenizer relies on the vocabulary and the configuration, which are bound to the model; the model is dependent on these things. So if you have a generalized way to load the vocabulary and the configuration, then yes, you could carry the tokenizer into your new stack.

But having said all this, with Mighty you took a different approach. The philosophy behind Mighty is that you offer it as a web server, right? Yeah. Can you tell me more about it?

Yeah. The reason I went that route is that when you want to do model inference, you want to give it as much compute as possible, and you want it to be its own thing. So I went the microservice route. I'm not saying microservices are the way of the future and better than monoliths and all that. But coupling model inference into your regular application code, maybe you don't want to do that. You want this separate service. And this is part of the bigger MLOps question: how often should I update models? What about the things we know about, drift and so on? What about logging? You need a way to handle all of this, and if you embed model inference in your own code, now you're also responsible for all of it.
As a microservice, you can evolve that service on its own and say: this thing is responsible for model inference, and that's it. Then all the concerns around it, you need a new model, you want to A/B test models, you want logging, you want all these other things, can evolve in their own way, and the separation of concerns makes much more sense.

It also gets me out of the problem of: am I going to build a Mighty for Ruby? A Mighty for Node? A Mighty for Go? I don't have to. I can just build Mighty Inference Server as a web server, or gRPC, which is on the roadmap, though I don't know how long that will take. Now you have this one thing, and I just have to write client libraries, and the API is always the same. Client libraries for HTTP are super easy.

And if you compare this with, let's say, a database like Weaviate or Vespa, they have inference inside them, right? If you've already bought into that solution, in principle you could do this. The only caveat, I think, is that if you have a custom model, you'll have to go the extra mile to integrate it inside the database. At that point, with Weaviate I think you'll have to master Go; with Vespa, C++ or Java, I'm not sure. I'm not an expert there, but there is a podcast episode with Jo Bergum that you can check out. So, on the product side, how would you compare Mighty to that approach?

So Vespa wraps ONNX Runtime, I believe. I know they use ONNX models; I don't know exactly how they present ONNX Runtime. So you'd still have to go through the conversion step. With Weaviate, it's a little different. With Weaviate, you have these things called modules.
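A minimal sketch of the inference-as-a-web-server idea, using only the standard library. The `/embeddings` endpoint name, the request shape, and the stubbed embedding are all hypothetical; this is not Mighty's actual API, just the architectural pattern of one service that only does inference.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def fake_embed(text: str) -> list[float]:
    # Stand-in for real inference (tokenize + ONNX Runtime forward pass).
    return [float(len(text)), float(sum(map(ord, text)) % 100)]

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read a JSON body like {"text": "..."} and answer with a vector.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        text = json.loads(body)["text"]
        payload = json.dumps({"vector": fake_embed(text)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args) -> None:  # keep the demo quiet
        pass

def serve(port: int = 0) -> ThreadingHTTPServer:
    # port=0 asks the OS for any free port; call serve_forever() in a thread.
    return ThreadingHTTPServer(("127.0.0.1", port), InferenceHandler)
```

Because the contract is just HTTP plus JSON, any stack (Java, Ruby, Go, Node) can call it, which is the separation-of-concerns argument made above.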
The modules are typically Docker containers with APIs exposed, and there's logic written in the module code for Weaviate that wraps that API. It's easiest to copy and paste an existing module and change things to match the API of whatever you have in your Docker container, so it's not that much work, but you still have to know Go to do it.

And I think the other problem I have with that approach, and I'm not saying it's wrong, but from my perspective: if you look at the documentation for a couple of vector search engines, I'm not sure about Vespa, but I think Weaviate and maybe another will say: okay, it's better to use a GPU for inference and the CPU for the vector search, because you want to give as many workers to the search algorithm as possible, and you don't want model inference and vector search fighting over resources, because both are very expensive. So they say: if you have a GPU, then all your model inference runs on GPU, your vector search runs on CPU, you get this one perfect box, and everything just works.

But what if you want to scale beyond that? You can only send so many documents into a GPU at a time. What if I need 12 machines? Now I need 12 machines that are all hosting Weaviate and all hosting Mighty, or whatever your inference solution is. So this goes back to the separation-of-concerns problem. What if I have a lot of documents to process, and it doesn't take long to get the vectors into the vector search, but processing those documents takes a long time? So I have to pre-process. Now you might need another solution, in another place, to do that batch pre-processing. And then you bypass the module when you integrate with Weaviate: you just send the vectors directly, so no inference happens there.
You just send the vectors to it. So again, I'm not saying it's wrong. I think it's a great idea, because you can install one thing that just works; you don't have to install three different things and figure it all out. Getting up to speed on it is probably quick. But in the long term, for overall scalability, you now have this coupling, and that's a challenge. I don't know how it gets resolved.

Yeah, that's actually a good point, because you reminded me of something. I don't remember precisely what we were balancing between, but we had Solr with a Java pipeline in front of it. The pipeline would process documents as they came in: chunk them, classify them, run sentiment analysis on them, and so on. We were thinking: okay, some of these things could be computed inside Solr. We could write some clever plugin; Solr has a lot of hooks there, and before it indexes a document you can run a ton of things. I think OpenNLP is one example: you can plug it in and it runs right there.

And I remember my manager, the VP of engineering, came and said: hey, what if we lose Solr? Say we computed everything inside Solr, stored it there, and lost it. Then what? You need to bring it back really quickly, and usually you'd just replicate some shard and off you go. But if you don't have that data, you have to recompute it now; you have no intermediate storage. Solr is not the storage; Solr is not the database. So we backtracked and said: okay, we'll compute everything and store it in S3, in file storage, and in the event of losing Solr, we'll restore and reindex everything on the fly. And that situation applies equally to Weaviate or Qdrant or any other database.
If you lose the database, you lose the vectors. So if you computed them inside the database, bringing it back up and saying, hey, please compute my vectors again, just takes too much time.

You're exactly right, and this is a lesson I learned. Not the hard way, thankfully, but one I picked up when I was at Wolters Kluwer, which is a huge publishing firm. You have your content, which is editorial, primary-source content, written in a way that's pretty raw from a machine perspective. Then it goes through a series of enrichments and transformations until it eventually reaches the search engine. At every step along the way it's: okay, we need to classify topics, so I add the topics, and then I save that state, which is now on disk somewhere. Then: okay, now I add this other thing, entity recognition or something. That's also saved. So you have all these intermediate steps, and if you lose anything, recovery is really easy: you don't have to rerun the entire pipeline. Rerunning the whole pipeline from scratch with that content takes months. Not days, literally months. That's a disastrous scenario.

So the lesson is: you don't do everything all in one place, because if you lose it, it's all gone and you start from scratch. Separating concerns in that way. And then there's the idea that you can plug this thing in anywhere along the chain: you have a microservice, you can put it anywhere. And you don't even have to just take the vectors and stick them into the search engine.
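The "save every enrichment step to disk" pipeline described here can be sketched as a resumable chain of stages. The stage names and toy enrichments are made up for illustration; the point is that a stage whose artifact already exists is skipped, so losing the search engine (or crashing mid-run) only costs the steps that never ran.

```python
import json
from pathlib import Path

def add_topics(doc: dict) -> dict:
    # Toy classifier standing in for a real topic-classification step.
    doc["topics"] = ["law"] if "court" in doc["text"] else ["general"]
    return doc

def add_entities(doc: dict) -> dict:
    # Toy recognizer standing in for real named-entity recognition.
    doc["entities"] = [w for w in doc["text"].split() if w.istitle()]
    return doc

STAGES = [("topics", add_topics), ("entities", add_entities)]

def run_pipeline(doc: dict, workdir: Path) -> dict:
    for name, stage in STAGES:
        artifact = workdir / f"{doc['id']}.{name}.json"
        if artifact.exists():                        # resume from saved state
            doc = json.loads(artifact.read_text())
        else:
            doc = stage(doc)
            artifact.write_text(json.dumps(doc))     # checkpoint this step
    return doc
```

Reindexing the search engine from these artifacts is then a cheap replay of the final files, not a months-long recomputation.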
But what if you need the vectors and you want to do something else with them? What if you have a recommendations platform and this other system over here and you want to do other stuff? Well, now you have to think about routing and all these other things. But if you just have an easy way to get vectors, you can plug it in anywhere along the stack, and then it's up to you. There's no prescribed way of doing things. It's a Lego; you put the Lego wherever you want.

Yeah, that's a great point, because we also implemented an algorithm that computed topics, I think. We used fastText and word2vec vectors, but we didn't need the vectors themselves in the downstream system. We computed them, clustered them, ran some magic algorithm, produced topics, and stored the topics. You store actual words in some database, or index them in the search engine. So yeah, you're absolutely right: sometimes you don't need the vectors, but they're still the medium to get to your target.

So, I've seen the blog post, which we'll also link, that you published on max.io, discussing almost a unit economy of this thing. If I have a gazillion Mighty servers, how does it play out? How much separation of concerns and resource separation, and how economical is it in the end? Is this something you're proposing? Say somebody takes Mighty and wants to scale it: all of a sudden, instead of 10,000 documents, you get 10 million documents to process, because somebody changed something upstream in the pipeline and now you need to rerun the whole thing. What is your recommendation on the economic side? How do you see Mighty making this huge job more economical?
So the first thing I see is that you can calculate the cost ahead of time, because it's absolutely linearly scalable. Mighty itself sits on one CPU, on one thread; I'll even say a thread, because these days you have cores and CPUs and threads and it gets messy. You can tell Mighty to use multiple threads in certain situations if you want.

But the batch-processing example I use, which I actually learned from the Vespa team, because they wrote an amazing blog post, I think in early January, is about this exact question: do you run one process across multiple threads, or multiple processes? If you go the multiple-processes route: say I take a bunch of documents and pass them in, and I have some consistency in the document size, which you usually do. It takes some amount of time X to run inference on all of them. So you have that number: you know how long it took and how much content you processed in bytes. Now, if I add another process, and this is purely parallelizable, half the documents go here, half go there, it's exactly linearly scalable, as I said. I add a CPU, it halves the time.

So if I'm in a situation where I did 10,000 documents and it took me X, and now I have to do a million documents, I ask: how long do I want it to take? You can actually write down the calculation and say: I need this exact infrastructure. That's a huge problem right now; a lot of people don't do that. It's: okay, let's just add a lot of GPUs and see what happens. You can spend the time to go through the calculation, but it's not so straightforward.
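The calculation described here, measure one core, then scale out linearly, fits in a few lines. The sample numbers in the test are made up for illustration.

```python
import math

def cores_needed(measured_docs: int, measured_seconds: float,
                 target_docs: int, target_seconds: float) -> int:
    """Cores required to hit a deadline, assuming perfectly linear scaling
    across independent single-core processes (one document set split evenly)."""
    per_core_rate = measured_docs / measured_seconds  # docs/sec on one core
    needed_rate = target_docs / target_seconds        # docs/sec overall
    # Round up: you can't provision a fraction of a core-process.
    return math.ceil(needed_rate / per_core_rate)
```

For example, if 10,000 documents took 500 seconds on one core (20 docs/sec), processing a million documents within one hour needs about 278 docs/sec, i.e. 14 cores. The "perfectly linear" assumption is the speaker's own claim about single-core inference processes; real deployments should re-measure at scale.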
And you'd have to cost it out yourself. I haven't released it, but I want to build a calculator that asks: how many bytes do you have, and how long do you want to spend? And it tells you: it'll cost you this much on Amazon or wherever. So that's one thing.

I also mentioned GPUs. I built Mighty so it works on CPU. If you're a company getting into this stuff, into this idea of the unit economy, how long does it take to process something, what does it cost, how do I scale it to a billion documents, and you're used to working in Java or C# or something like that, and now you're told you need to buy GPUs, you go check the prices and say: well, that's not how much we spend on infrastructure. That's not in our budget, I'm sorry. So maybe we can't even do this. So I wanted a way around that problem, where you can just use CPU with a straightforward understanding of the cost.

I haven't checked Amazon prices in a little while, but I should also mention Linode, which is another cloud platform. The pricing is better, and they were actually recently purchased by a huge company, it starts with an... I forget the name, whatever. Anyway, I use Linode, and it's cheap for CPUs. It's great. But if you want to run a GPU, it's like $500 or $1,000 a month, and that's a lot of money for one machine; most teams are not willing to spend that. You can do fractional GPUs on AWS, I think, but it's still expensive. And it's a cost that never goes away.
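The cost calculator imagined here, bytes in, deadline in, dollars out, is simple arithmetic once per-core throughput is measured. All rates and prices below are made-up placeholders, not real cloud quotes.

```python
import math

def processing_cost(corpus_bytes: int, bytes_per_core_second: float,
                    deadline_hours: float, price_per_core_hour: float) -> float:
    """Estimated dollars to process `corpus_bytes` within `deadline_hours`,
    assuming linear scaling across single-core inference processes."""
    core_seconds = corpus_bytes / bytes_per_core_second   # total CPU work
    cores = max(1, math.ceil(core_seconds / (deadline_hours * 3600)))
    # You pay for the provisioned cores for the whole deadline window.
    return cores * deadline_hours * price_per_core_hour
```

With hypothetical numbers, a 10 GB corpus at 1 MB/sec per core is 10,000 core-seconds; within a 2-hour window that needs 2 cores, and at an assumed $0.05 per core-hour the run costs $0.20. The same formula makes the GPU comparison concrete: a fixed $500-plus monthly GPU bill only wins if its throughput advantage beats the equivalent pile of cheap CPU cores.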
Once you commit to it, it's there for a long time. CPUs are a commodity; with GPUs you have to fight with the cryptocurrency crowd over the price, all that. So yes, CPU is the way to go. I can imagine GPUs being used for model training or fine-tuning, but for serving, that sounds way too expensive. Right. Yeah, that makes a lot of sense.

And so, how exactly do you offer Mighty? It's a binary package, right, that I can install and run on my system, and I can decide whether it runs standalone, or as a pod in Kubernetes, or as a Docker image on some non-Kubernetes setup?

That's right. It's a very small executable. Linux is a first-class citizen. It'll run on Windows. It'll run on Mac, though I've heard from people running it on M1 Macs that they had to do a lot of work to fix dependencies, and it wasn't really working that well. I think it still relies on Rosetta to do the x86 translation. So on M1, I wouldn't consider it working. I've also seen some other problems on Mac that I'm trying to resolve; it works on my machine, that type of thing, but really it's meant to run on Linux.

You can run it in Docker; it's really easy to get started that way. So you can download the executable and run it on your Mac, or just pull the Docker image and use that, which is probably a little more straightforward, since then you don't have to worry about other dependencies. On Linux machines, use Docker if you're doing Kubernetes and that kind of thing. Great, run it in Docker.
Just make sure that in your pod, or wherever, you sort out how much compute you're actually giving it, because model inference, not just Mighty, all model inference, is really heavy. It's really expensive. It wants a lot of compute, not so much memory, but compute. So be sure to give it enough to satisfy your needs, and measure it.

I haven't done Kubernetes tests myself. I'm old school: this whole Docker thing, okay, I'll make a Dockerfile, sure, you can use it in Docker, it's on Docker Hub. But I like to install stuff the old-fashioned way. On Ubuntu you just download the thing, it's a tarball, you untar it, and you're good to go.

The way you start it: it's a Rust program with one library dependency, which is ONNX Runtime, because it's dynamically linked, not statically linked. You can either start a single instance, where you specify the model, or use a thing called mighty-cluster, which is just a bash script. It checks how many cores you have on the machine and starts a process for every core. It takes about half a second per core to start up; I actually put that delay in on purpose, a limit to slow it down a little so it didn't go off the rails. But you could take that limit off: just modify the bash script and see how quickly it starts. In the blog post you mentioned before, I ran it on 128 cores; there I actually took the rails off and let it start up really quickly. But it can take a moment to start a process on every single core.
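The mighty-cluster idea, one worker process per core, launched with a small stagger, can be sketched like this. The placeholder worker command, the port scheme, and the function name are all hypothetical; the real mighty-cluster is a bash script launching the actual inference server.

```python
import os
import subprocess
import sys
import time

def start_cluster(base_port: int = 5050, stagger_seconds: float = 0.5):
    """Launch one worker process per available core, staggered slightly so
    the machine isn't slammed by simultaneous startups."""
    procs = []
    for core in range(os.cpu_count() or 1):
        port = base_port + core
        # Placeholder worker: a Python one-liner standing in for the server,
        # which in the real script would listen on its own port.
        procs.append(subprocess.Popen(
            [sys.executable, "-c", f"print('worker on port {port}')"]))
        time.sleep(stagger_seconds)  # the deliberate startup rate limit
    return procs
```

Dropping `stagger_seconds` to zero is the "taking the rails off" mentioned above: all cores come up nearly at once, at the cost of a burst of simultaneous startup load.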
And yeah, you could do it in Docker, you could do it bare metal. If there are any people out there using it on Windows, I'd love to hear from you. I've heard feedback from Mac and Linux users, but no Windows feedback at all, so I don't even know if it's worth building for Windows these days. Maybe not.

Yeah, I think it depends. The scenario would be: you're a developer on Windows, and for some reason you still want to develop everything locally rather than on the server side. I've actually seen such folks on my team: they wanted to bring every single service up on their laptop, and that's how they developed. They didn't want to depend on any external connection. And even Docker is a pain on Windows sometimes, right? I know the Windows ecosystem, because I was in it in the 2000s. That's the mindset: I'm just going to run everything natively on Windows.

Yeah. And when I tried Mighty on Mac, I think it took a few seconds to boot, but the moment it booted, I was firing queries at it to compute vectors, and it was insanely fast. Is there some secret sauce behind this insane fastness?

I mean, if you're used to running models in Python, it'll seem insanely fast. A lot of it is ONNX; they get most of the credit there. But there's a lot of other stuff that goes into it: the tokenization, the pre-processing, and the post-processing are fast because I've been using Rust for them. Rust is a really interesting language; it's gotten me back into systems programming. I'm not here to say that Rust is the most amazing thing ever.
There are things I love about it, and things where I'm not sure I'd do it that way, but you're supposed to do things a certain way because the compiler understands it and will super-optimize it for you. It's hard to wrap your brain around if you come from a dynamically typed language like Python or JavaScript. My background in compiled, typed, ahead-of-time languages from my previous life meant I was able to pick it up again. I read the Rust book; there's a free version online, but I actually bought the paperback, because I like actual paper these days. Just going through the examples, it took me a couple of weeks to get a handle on Rust. The Rust language gets a lot of the credit as well: it just optimizes.

And you have to learn this field I'm in now, model inference. It's a super-niche field where you have to understand both the hardware and the machine learning, and those two fields are so different that there are very few people who are really good at both. So there's this thing called vectorization. Vectorization on the CPU means: if I have to do a calculation on a byte, and I have a 64-bit register but my value is only 8 bits, I can vectorize and do eight calculations at once. That's SIMD: same instruction, multiple data. Rust, if you turn on certain compile flags, will do that for you automatically, and you get that speed-up. So I turned those knobs all the way up: use AVX and AVX2 if the processor supports them, and most do these days if you're on x86. ARM has a different instruction set.
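The "eight 8-bit calculations in one 64-bit register" idea can be illustrated with SWAR (SIMD within a register) arithmetic in plain Python. Real SIMD uses dedicated CPU instructions that Rust's autovectorizer emits under flags like AVX2, but the lane-wise principle is the same: one 64-bit operation acts on eight packed byte lanes at once.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # the low 7 bits of each of the 8 byte lanes
HIGH = 0x8080808080808080   # the top bit of each byte lane

def swar_add8(x: int, y: int) -> int:
    """Add eight 8-bit lanes packed into 64-bit ints, modulo 256 per lane,
    with no carry ever leaking between neighboring lanes."""
    # Sum the low 7 bits of every lane; these sums can't spill into a neighbor.
    low = (x & LOW7) + (y & LOW7)
    # Fix up each lane's top bit separately via XOR.
    return low ^ ((x ^ y) & HIGH)

def pack(bs: bytes) -> int:
    return int.from_bytes(bs, "little")

def unpack(n: int) -> bytes:
    return n.to_bytes(8, "little")
```

One addition here does eight byte-additions' worth of work, which is exactly the throughput argument: the hardware's SIMD units push the same trick much further, but only if the code and compile flags let the compiler see it.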
I haven't gotten into the ARM world yet; I have to get an M1 Mac, and then I'm going to start messing around with all that. But if you know that stuff and you know how to turn it on, Rust does the rest for you. You kind of have to write your code a certain way so that Rust will do the optimization a certain way. You can't think old-school; you have to think in Rust world a little bit. But doing that, you get all this extra speed for pretty much nothing, just from writing your code a certain way and turning on a couple of compile flags. That's why it was so fast.

Yeah, but you still needed to figure all of this out, and I remember you saying that you spent a bunch of weeks, you know, coding on stuff until you got things done. Because I know, and many of us here in the audience probably know, that if you are a programmer, you might say, yeah, I can do it, but you cannot actually estimate when you will be done. You get into the weeds, and it's like, oh my god, UTF-8 or something else doesn't work, or I'm sending a request and it fails, whatever, what's going on, and you spend so much time. Or if you're doing an algorithm, that's another story; that's a whole journey of its own, debugging all these states. I'm just trying to say that even though you make it sound so easy to master Rust and, you know, to go through this maze and make it the way the compiler wants, it's still time. It's a lot of time. It's skill. And so you mastered it, and in the end, you know, the end result was not given; you earned it, right? So why not turn this into a business?

So now, on the business side, I'm thinking: how do you offer Rust? Excuse me, how do you offer mighty? You have the binary, and the model will be shipped separately somehow, outside of the binary, right? But what am I, as a customer, paying for?
And yeah, also a question kind of ahead of time: can you give a discount code for our audience to try it? Oh, that's a great question.

Um, yes. So my business model is, again, old-school, because I've been doing software for a long time. It's licensed software, right? You pay for a license, you get to use the software. I'm still trying to figure out the exact price point. Some people say it's too cheap, which is interesting, because I didn't think so; some people say I should charge more money for it. It's $99 a month right now, when this podcast is published; after that, it may change. It's all set up through Stripe, so I can go in and create a discount code for folks. I don't have a code right now. But if you email me and say you heard about me on the Vector Podcast, follow the link in the notes and email, and we'll set something up so you can get a discount. That's the way it works. But that's for commercial use. So if you're using it commercially and you're making money from it, then, you know, I ask that you pay for a license, please. If you are a nonprofit or a charity, or you're a student, or you just have a side project, you're messing around and you just want to get some vectors: go and install it, don't worry about it. But if you put it in production and you're charging money for your product, then please, please buy a license.

Yeah. I do have questions, though. How will you track who is using it commercially and who is using it for a hobbyist project? That's a great question. And I don't track that. I'm really into privacy and safety on the web, so I don't like the idea of putting a whole bunch of tracking telemetry into it. I think that's a terrible way to run a product these days.
The only thing it does is, when it first starts up, it asks the server what the latest version is, and it'll tell you if there's a new version. With that, I see that, okay, somebody asked for a new version. And I anonymize all the IP addresses, so I don't even know who; there's no user information at all. I just use that to get a rough sense of how often it starts, and I see maybe five downloads a day right now. That's all I do. So if you're running it, if you're pirating it, I can't stop you. I'm not spending my time trying to stop you; it's not worth my energy. Yeah. I'd much rather work with teams who really want to gain something. So if you do buy a license, I'll work with you on setting it up, telling you how to use it, and working on it with you. It's not advertised, but around model inference itself, I'm happy to offer services to get your model up and running and make sure it's running optimally, even doing a model conversion with you, setting you up for that stuff. That's not advertised; it does say I'll spend an hour with you if you buy a subscription to get you set up, but if you need more help than that, you know, let me know.

Now, there's another tier, which is: if you're Amazon, well, Amazon would never buy mighty, they have their own world. But if you're a cloud provider, or if you want to offer it as an API, that's different, because I sell the license per product. So if you are selling it as a cloud provider or as an API, and you've got, like, a thousand clients that are now using mighty, well, I actually count all of those clients as mighty users.
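The anonymized version check could look something like this sketch. The exact scheme is an assumption; the conversation only says the IPs are anonymized before being counted, not how. A common approach is to zero the host octet so the stored address can no longer identify a user:

```rust
use std::net::Ipv4Addr;

/// Zero the last octet before logging. This particular scheme is an
/// illustration, not mighty's documented behavior.
fn anonymize(ip: Ipv4Addr) -> Ipv4Addr {
    let o = ip.octets();
    Ipv4Addr::new(o[0], o[1], o[2], 0)
}

fn main() {
    // A client at 203.0.113.42 is recorded only as its /24 network.
    let seen = anonymize(Ipv4Addr::new(203, 0, 113, 42));
    println!("{}", seen); // prints 203.0.113.0
}
```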
So I don't have a price published, but if you have that situation, I'm not going to charge you $99 a month for each client. If you're running that type of business, contact me and we'll work something out. Yeah, that's perfect. I mean, it sounds like a solid model, for the start, for sure.

And another favorite question I have, and I've been asking this question also to open-source players like Weaviate and, I think, Qdrant. So basically, have you thought: one way of building the kind of connection that may yield a business case for you is what you just explained, right? Somebody buys a license and then you scale with them; you explain how to make it better, how to tune it, maybe implement some features. Another route is to open a Slack channel or a Discord, whatever, and, you know, invite users there and start talking to them. And maybe you'll have some open-source components as well at some point; I don't know, a tool that helps me convert my model into a representation that mighty can read. Have you considered also taking that open-source route as one way of building a community, some of whom will be your users and paying customers?

Great question. I don't have a Slack myself; I'm a member of many other Slacks. I could set up a Discord; I'm on Discord, mostly just for the MLOps community. But I could just start a thread or a channel in that. I don't know if mighty itself needs its own Slack. I think it would be part of another community. One of the annoying things for me is that I have to go and join, like, 12 million Slacks, because everybody has their own Slack and they don't work with one another. Discord does that way better. Slack, we've got to have words. You've got to make it easier.
I have, like, four or five email addresses across twelve different Slacks right now; I can't keep track of them. But in terms of open source, I already have a bunch of open-source projects. There is maxdotio, spelled out M-A-X-D-O-T-I-O, on GitHub. Somebody already took max-io, and we can't have dots in GitHub names. That's fine, names are names. So there's mighty convert, which I'm actually working on updating, because it's based on Optimum, a Hugging Face repository that does model conversion. It's a very light wrapper around Optimum; it basically just converts the model for you and bundles the tokenizer and a configuration. That's it. It's pretty straightforward. You can do that yourself; you don't have to use it. But that's open source. There's also mighty batch, which is a Node program, a way to do concurrent batch processing of documents into vectors, pointing at a mighty server. That's best described in the blog post I wrote about how that works, the one about converting the Code of Federal Regulations; it's on the homepage of max.io.

And there's also a bunch of other open-source projects that I haven't talked about yet. There's now node mighty, which is basically just an API client for Node that talks to mighty, and it does connection pooling. So if you have, like, four mighty cores running, it'll negotiate which core to use when it makes a call. That's really easy to use in, say, an Express server. I also wrote two other Node modules while I was at it that aren't for mighty. I wrote node qdrant, so now there's a Node client for the Qdrant vector database, and I told the guys at Qdrant that this exists.
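The "negotiate which core to use" idea behind that pooled client can be sketched as simple round-robin selection over the running mighty endpoints. This is a minimal illustration, not node-mighty's actual internals (which aren't shown in the conversation); the endpoint URLs are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A tiny round-robin pool: each call picks the next endpoint,
/// spreading requests evenly across the running mighty cores.
struct Pool {
    endpoints: Vec<String>,
    next: AtomicUsize,
}

impl Pool {
    fn new(endpoints: Vec<String>) -> Self {
        Pool { endpoints, next: AtomicUsize::new(0) }
    }

    /// Return the next endpoint in rotation.
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.endpoints.len();
        &self.endpoints[i]
    }
}

fn main() {
    // Hypothetical addresses for two mighty cores on one host.
    let pool = Pool::new(vec![
        "http://localhost:5050".into(),
        "http://localhost:5051".into(),
    ]);
    println!("{}", pool.pick()); // prints http://localhost:5050
    println!("{}", pool.pick()); // prints http://localhost:5051
    println!("{}", pool.pick()); // prints http://localhost:5050
}
```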
I'm trying to work a blog post out of that; I guess this is the announcement, but I'll publish something, and there's going to be a demo. I also wrote node pinecone. Well, it's pinecone-io. So now there's a Node.js integration for Pinecone; you can talk to Pinecone from Node, from an Express server or something. The guys at Pinecone don't know that I wrote that yet, because I just put it out there. It's on npm. So I've got to work that out, and they might want it. If you guys want this, you know, I just wanted something that I could use, but it's your name, so please take the package from me, if you're not upset that I used your name. I just wanted a tool I could use for my own Node.js testing. But then all this stuff integrates with mighty really easily. So I have all these Node clients now, and I'm focusing on JavaScript first. All this stuff is already released; it's on npm, it's on my GitHub, and it's free to use. It maybe needs a little more polish. I haven't fully mapped out the APIs; I just mapped out the core stuff that I needed. So it doesn't do things like the scroll command, you know, where you can scroll through all the points in Qdrant. I don't know how much use there is for that; it's really easy to add, I just didn't have the time. So yeah, there's a bunch of open-source work that I've been doing.

I also want to mention that I'm working on starter applications. I have right now basically a starter app that works with Node, node mighty, and Qdrant, and also node mighty and Pinecone.
I have two starter apps that aren't released yet that I'm working on polishing up and getting out there, where it's really easy, if you're a Node, a JavaScript person, to just take documents, convert them into vectors, load them into a vector database, and have a search app running on them.

Yeah, that's fantastic. I mean, so much to unpack. And I think we're witnessing community-written software for a closed-source software company. I mean, Pinecone is a closed-source company, right? And we have an episode with Greg Kogan, who is the chief marketing officer at Pinecone. We can connect you two and you can discuss the future. Yeah, I've talked to Greg. You know, we're working on some stuff.

But my question is: what made you write those connectors? Did you think this would also pave the way to using mighty, plugging mighty into the pipeline? Say I'm a Pinecone user and I can have a node pinecone connector at the same time as mighty. I'd say half and half, you know. I do want to promote mighty, of course. But again, I want to bring these tools outside of the Python ecosystem. If you look at the vector databases right now, with the exception of Weaviate, which does a great job of having different clients for different languages and stacks, and Vespa as well, both Qdrant and Pinecone right now are all Python. Well, Qdrant is written in Rust, but their first-class client right now is in Python. They did that because obviously everybody who has to get vectors has to use Python anyway, or they used to; so that's why they chose Python. At least, that's what I speculate. And Pinecone as well: all their examples are in notebook form, in Jupyter notebook form.
You go in and you want to do a semantic search example, and that's a Python notebook. I'm not crazy about Python notebooks. I think they're good for illustrating ideas and sketches, for papers, but it's really hard for me to look at a Python notebook and say, here's how I make this into a working application. It doesn't translate well, because the architecture isn't there. It's a bunch of cells that you run in order; that's not how real-world applications work. So the idea is to get these tools, these ideas and capabilities, out into the hands of a lot of other people who want to be able to use this stuff, who are not familiar with Python or NLP, but who want to use this new technology because they might have a business problem to solve.

So you're actually thinking about engineers who are productizing their code day to day and thinking: okay, yeah, I need an embedding layer, but I don't care about notebooks, I'm not a Pythonista or whatever. So, you know, just give me the tool. Exactly. Yeah, that's fantastic. And speaking of tools, you also disclosed something to me ahead of time: one of the overarching goals for you is to develop as many tools for the vector search community as possible. And some of the tools you mentioned go beyond pure engineering components like connectors; you said maybe fine-tuning a model or something of that sort, at which point I think you're stepping onto the ground of other startups, like Jina and, you know, deepset and so on. Do you feel that way, or do you not concern yourself with them, and you're just thinking: okay, what's missing in the field? I'm going to add it, I'm going to open-source it.

Yeah, the same. So, deepset is all Python. Again, Jina, I think, is a lot of Python, right?
I'm not as familiar with Jina. Yeah, they're mostly Python. Yeah. There's a huge opportunity to make these tools available to non-Python stacks. Before I started working in machine learning, I had never even considered Python as an application framework. You know, people are using Django, Flask, and stuff like that. But for me, it's not that I didn't take it seriously; I just felt it wasn't something I would have chosen to use over a lot of other stacks. So there are so many other teams out there that want to be able to use these things, but right now it's Python, Python, Python, nonstop. We've got to break out of that somehow. And I'm starting with Node, because the JavaScript ecosystem is just absolutely enormous. I think people underestimate its size. If you're in machine learning and you're listening to this podcast right now: there are maybe a hundred people using JavaScript for applications for every one of you. That's how big it is. So that's where I'm starting; I just know it's an enormous community.

And not only for front-end development; we need to emphasize this, because you also have server-side JavaScript, like Node.js and others. And it's huge. A lot of the software that sits as middleware between your super-cool search engine, or your vector database, and the front end is written in Node, because it's so much easier. Oh well, not easier, I don't know. Is it easier? I think it's just the pervasive nature of JavaScript. Yeah, I don't know if I'd say Node is easier than Python. I think they're actually similar in a lot of ways. The syntax is a little different, you know, curly braces versus tabs.
But I think that Node, and we're getting away from vectors now, Node started because JavaScript was the language of the web, and people didn't want to learn another language to also write back-end code. You know, you were using Perl, right? There was a long stretch when it was Perl, PHP, plus JavaScript; there was that whole world out there. So that's where Node came from: the web front end. The web front end is enormous, and a lot of those people just adopted Node. And Node had its own hype cycle; 2010 through 2014 was maybe Node's heyday, when it just went through the roof. Everything was Node.js; it was going crazy. Now it's all machine learning and AI, and a lot of people got involved in this world. But there's still a huge section of the world that's written on top of Node, from applications that started in the early 2010s and evolved ever since. Yeah.

But back to tools. You said in the early notes you shared that you also want to address some of the unsolved problems in model fine-tuning, or some of the other pipeline steps that may precede the embedding layer you have now addressed with mighty. What are your thoughts there? What do you think is missing?

Yeah, I don't know if I'm going to get into actual model tuning. First of all, I'm not as good at training models as other people; there are other people better suited to train models. But I do think there's a lot of other information that is lacking in the MLOps world and in vector search. One of them is just: well, how similar are these things, right? What's the distribution of similarities?
I think Weaviate said they support some of that, and Vespa has some of that in logging. But I don't know about Pinecone, and I'm pretty sure Qdrant does not. So what do I mean by this? If I have a vector and I do a query against, for example, Qdrant, I get back a list of documents that are nearest neighbors, and their similarities. Well, where does that fit? If the first document I get back is at, like, 0.4, is that similar? Is that good? Is that bad? What counts as really good similarity? Maybe the best similarities are in the 0.8 range. So now I know that, in terms of my entire corpus and how people usually query, this result is actually not that great. There are a lot of questions to be answered around that stuff, and I think that's lacking in a lot of ways. I don't know if that's the right fit for mighty, though; I think those are external tools that I'm kicking around. All that stuff would be open source. I'm really interested in mighty being the business side, and then all the other stuff is open source, to make these things easier for people to use.

But yeah, there's a lot of stuff in the MLOps world, like model drift. It's like: let's say I have 100 sentences, and I vectorized them against model 1.2.3, and I got back a list of vectors. Now I've upgraded my model; I have model 1.3.8, and I run my test sentences through and I get different vectors. How much has changed? What's the difference there? So there's this whole world around measuring model drift.
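Both ideas above, interpreting a raw similarity score and measuring drift across model versions, reduce to cosine-similarity arithmetic. A minimal sketch (the averaging drift score is one simple metric of my choosing, not a tool the guest describes):

```rust
/// Cosine similarity between two vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// A crude drift score: average cosine similarity between the vectors
/// the old and new model produce for the same test sentences.
/// 1.0 means nothing changed; lower means more drift.
fn drift(old: &[Vec<f32>], new: &[Vec<f32>]) -> f32 {
    let sum: f32 = old.iter().zip(new).map(|(a, b)| cosine(a, b)).sum();
    sum / old.len() as f32
}

fn main() {
    // Two test sentences, embedded by model "1.2.3" and then "1.3.8".
    let v1 = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let v2 = vec![vec![1.0, 0.0], vec![1.0, 0.0]]; // second sentence moved
    println!("{}", drift(&v1, &v2)); // prints 0.5
}
```

The same `cosine` helper is what you would use to build the score-distribution statistics (percentiles of similarity across a corpus) that tell you whether a 0.4 result is actually good for your data.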
And there are some interesting open-source tools for this already, but they're written in Python, so you have to use Python to do all of it. So I'm trying to understand what tools could be written outside of Python land that expose these statistics, this important information, to people who don't want to marry themselves to Python. Yeah, yeah, absolutely.

It sounds like you've also touched on this very important topic, which I think is known as metric learning, where, on the one hand, you want to know what the optimal distance is, and maybe you need to fine-tune your model, or maybe your data is not a good fit for this model, and so on. But you need the tools. Maybe it's something like Quepid for, you know, ranking evaluation and tuning; you could have some Quepid-like tool, maybe even with a UI, where you can load these vectors, visualize them, and see: okay, how do they fit together? What's missing? And then have the stats on it, right? So you can actually run the statistics. And, you know, I'm going to let Eric write that tool. I love Quepid; Quepid is so great. Eric, go write Quepid for vector search. Yeah, I think we can pair up on that; maybe all of us contribute and make it open source.

But yeah, I think this is one way to look at it. And I think the Qdrant developers push metric learning quite heavily forward. By the time this episode is out, there will be another episode with a developer from Qdrant who is very big on this idea of metric learning. And he open-sources everything, of course; he offers tools and also papers that you can read to educate yourself in this space. I think this is barely scratched at the moment by the community, and even by the end users; you know, they don't know.
Okay, I take a CLIP model, I have the images, I plug them in together, it works fine, I'm done. But what if it doesn't work? What if you have some images you never find for any query, but you do want to find them, because it's the name of some product that was recently released and you want to showcase it, right? And you're not using keyword search there; it's a name, and you're using vectors to retrieve it. So, things like this. I mean, there's a bunch of topics there. Another favorite of mine is robustness, right? If I have an aircraft and I rotate it a little bit, all of a sudden I find kittens instead of aircraft. This is what Connor Shorten showed yesterday, and it was amazing. I mean, robustness: you change your input slightly and, yeah, it just doesn't work. So I think there are a lot of things missing. But from what I sense in your answer, it feels like you still want to keep your focus on mighty and push that as far along as possible, right? Yes.

And what I really want is, I love that people download and install it and use it and do whatever they want to get vectors with mighty; that's awesome. I'm really trying to find partners. I'm really trying to find partners who want to make it super easy to do model inference at scale. So, for example, I haven't gotten any replies; I've been, like, spamming, no, not spamming, I've been emailing and trying to get in touch with cloud providers, right, to say: serverless inference. If you could offer serverless inference through Lambdas or whatever, so many people are asking for that. You can't do that with Python tools these days. Well, you can; it would just take forever, and it would be really expensive and really slow.
But there's such an opportunity for cloud providers to make it super easy. You want to get content from point A into your recommendation engine or your vector database or whatever; do you want to stand up a big GPU server in the middle to do this? No, you don't, if you can avoid it. So how about something that's serverless that people can just run? I'm trying to find partners there. I'm trying to find partners who have search platforms and other platforms, who see this as a Lego in their stack, something that's going to make things easier, and who don't want to hire a team and spend months building this thing and trying to figure it out. You can do that, of course; go do that. But, you know, you can save yourself a lot of time and pain and buy it, by working with stuff that's already there. Yeah, that makes sense.

I mean, probably companies like the likes of Algolia, right, exactly, but potentially Elastic, you know, because both of these want to get closer to neural search, even though maybe they were not originally wired up to be vector search databases. But they do have the components: Elastic is based on Lucene, and Algolia is probably also based on Lucene; I'm not sure, but I'm sure they're looking at this field. So, I mean, for them, and now we're getting a little bit into MLOps and vision, you also shared a little ahead of time that mighty could be one of the components in the MLOps ecosystem, right? Yeah, absolutely. Not just a standalone kind of script that I download and then think: okay, where do I plug it in? Right? Are you thinking in that direction as well yourself? Like, identifying the tools and systems where mighty could play the role of the embedding software? Yeah, absolutely.
The other thing I want to figure out is: does it make sense as it is right now, as a web server? For every case, probably not; there are probably situations. gRPC was one request that I have to figure out, which would make it a little easier to bind it to certain application layers. But yeah, it's meant to be flexible: you stick in your model, and you run it how you want.

The other thing I found was that I met a lot of people who were scratching their heads saying, which model should I use, right? As my first model, or whatever; I just want to start playing around with this. So that's the other thing I did: I ship default models that I chose, that I know work well. Especially, you know, Nils Reimers, he's amazing, and he's done amazing community development around SBERT and the models that he's trained, and the documentation he's published around why certain models are good and others are bad. Other people don't know about this stuff. So it's like: you don't have to go off and learn and understand right away why you should choose one model versus another; it's a hard decision to make. So there are some defaults that I chose, and it's really easy to get started. The vectors themselves right off the bat, or if you do question answering, will be pretty good, for regular English, not domain-specific. You still have to do fine-tuning for most cases. But you're not going to start fine-tuning before you even know how this thing performs in the beginning, right? You want to try a model and see how close it is. So there's some starting work there. I know Algolia is getting into the vector search stuff, so I don't know.
Maybe they don't know how to choose a model. So, you guys, you can use my default model if you want. Yeah, absolutely. I mean, so far, what I hear from you is that mighty has these qualities, let's say: it can run on pure CPU, which is a win on cost. It scales, which is also a win on cost in the long term. Right. And it's also insanely fast, which is a win on product, a win on go-to-market: I have this document, how quickly does it travel through the pipeline and become searchable? Right. So I mean, that's an important use case. In some cases it's paramount, you know, like in the financial space: a document came out and I want it indexed right away, a second after; I don't want to wait five minutes, that would be way too late for me to make a decision.

So, I mean, is there something else? Maybe you could compare now, or point us to a blog post, with other vendors: Amazon has Inferentia, you know, Hugging Face has Infinity, right? And Nvidia, I think they also had some layer; I forgot its name. But those are probably fairly expensive; they're probably not $99 a month. So what is your thinking there? I think you're also vocal in this space, in the direction that mighty is much more economical than these more expensive solutions. They probably offer something else as well, but you have a niche for sure. Yeah.

I think that, so, the interesting thing: if you get into Amazon, like Inferentia, yeah, and all this stuff, they crafted their entire, well, they built their own hardware. They have their Neuron core that all this stuff is based around. And that's lock-in, it's big-time lock-in, right? Whereas this is just a web API; you can just use it.
I've also considered hosting an API. Hugging Face is one of the most amazing software companies ever; that's the real community-driven open-source stuff, and they do such amazing work. So I don't want to say anything bad about Hugging Face, because I really have nothing bad to say at all. But, you know, Infinity definitely has a fit in the market, which is: if you are Walmart and you need a solution, okay, Hugging Face Infinity is in your budget, go pay for it. That's the type of thing Walmart should use. But if you're a five-person developer team, or even if you work at a company of, say, 300 people, Infinity is really, really expensive. So there is a market segmentation there. There's a difference of: okay, how much can you afford, who can you hire, what level of internal support do you have to put around this thing, and how does it all fit? The teams that are just starting off, that need something that works really fast and is easy to use, that's where mighty fits. So I don't think mighty competes with Infinity, because, honestly, hey, Walmart, if you want to buy mighty, sure, go ahead, let's talk, or you can pay the 99 bucks a month. But that's not my target; I'm trying to make it super easy for everybody else. Somebody high-ranking recently connected with me on LinkedIn, I think some kind of VP of engineering: hey, if you're looking into embeddings, contact Max.

Really? So that we understand Infinity a little better, because I didn't try it at all: is it some kind of web service that you basically buy a subscription for, a SaaS kind of thing? No, it's a Docker container.
I think Infinity is a Docker container.
+I don't know, it might even be written in Rust; considering their tokenizers are written in Rust, they may have done something there. Infinity came out before Mighty. So it's a proper competitor for Mighty in that sense.
+I don't know their pricing, but in terms of the packaging itself, basically it's Docker versus Docker anyway. And I think Infinity encourages GPU; they want you to use a GPU for it. I think Infinity fits well if you have something like a million requests an hour, that kind of scale.
+If you have 20,000 requests a day, or a thousand, or a hundred thousand, that range, I think Mighty is perfect for that. You don't have to have huge scale. And it can get bigger:
+you can just spend more money on hardware and scale it up as much as you want. You can support a million requests a day if you want to, or ten million; you just have to put more hardware behind it. So I think I'm competing in a different market.
+I don't think Infinity and I are targeting the same businesses. Yeah. And I mean, you do have the edge in that you want to address the community beyond Python. I think that's a big message to send.
+In some ways you channel this feeling: hey, this person is in Node.js or Java and probably feels left out from this big thing, but it's probably not true.
+I know there's also Deeplearning4j and so on, but it's like an island in the ocean by comparison. It's amazing software; it just didn't get the adoption that Python got. Yeah.
+I remember going through these internal pains myself. It was like 2015, 2016, and I started getting into deep learning training; I took Coursera courses, Andrew Ng's courses on machine learning and so on.
+I started off with Octave, which is an open source mathematical language, GNU Octave; it's its own language, right? Mathematics, just as code. But then the next courses were all Python. And I was like: oh no, I have to learn Python.
+I don't know Python. I have to learn a whole new language to use this stuff. Okay, fine, I'll do it. So I went down that path, learned Python, and got pretty good at it.
+But there's a lot of people who just don't want to take that step. They want to ship code in their own stack. So it's a big ask to say: if you want to use these awesome tools, you've got to convert your language.
+Yeah, exactly. And if you're not into data science or machine learning, why would you enter Python at all? It has no single winning point. Well, maybe simplicity, but is that it? And it's loosely typed.
+Of course you can make it more strict with typing and so on, but still, I think it took me a good three years to learn Python properly. Because it's not just, okay, I understand how to do the for loop,
+I understand the indentation and so on; it's actually mastering it, like avoiding stupidly loading the model multiple times in Gunicorn.
+And that's without Cython; I didn't enter the Cython world, luckily. But even just writing normal software in Python takes a lot of time, and productizing it takes a lot of time.
+So why would you enter it if you're not after the tasty machine learning and data science? Why would you even consider converting your software stack into this? It should be the other way around.
+And I think you're doing a great job there with Mighty: offered as a service, and maybe in the future as some kind of library or environment. Microsoft has done a bunch of these things; I don't know if you remember the CLR, the Common Language Runtime.
+You bring up Visual Studio and you can say, okay, my project will be in Perl, compiled and run for... I don't remember. It was crazy. I was just experimenting with it; I barely knew any of these languages as a student, but I was fascinated by the idea.
+It didn't fly, I think, but it was amazing. Yeah, absolutely.
+And I did play around with the idea of: what if you don't even have to download Mighty? I was playing with this from the npm perspective. What if you just installed it with an npm command? And I thought, that's a little bit heavyweight.
+Do I want to bring in this thing that way? I could. I don't know if I should do that. And I also don't want to set false expectations.
+Maybe this is just because I'm not great at marketing, but I don't want to set the expectation that you just do npm install, like, mighty-server, and then you have a perfectly running thing. Because it's more than that. You have to scale it properly. You have to give it its own compute.
+You have to choose the appropriate model. You have to do certain things to really get the most out of it.
+So I don't want to set false expectations where somebody deploys it and it doesn't work well at all, just because they did npm install mighty-server. Which doesn't exist, by the way. Don't try that.
And then it didn't work.
+So there is a little bit of knowledge that you do have to attain; you do have to familiarize yourself with some concepts. That doesn't mean learning an entirely new language and stack. Yeah, it's more like something somebody can pick up.
+And learning that way is much faster than figuring out how to plug it into my Java code or C++ code or whatever. So yeah, of course. I've really enjoyed this conversation, Max; we went deep into all these aspects.
+Maybe we can record another episode going in another direction. I'm sure there are a million directions to take.
+But I enjoy asking the philosophical why, if you can still spare a few minutes. Why this field of vector search? What brought you into it?
+And I'll make sure to mention this too: we did form a team together. You responded positively to my inquiry to compete in the billion-scale ANN competition, and you almost single-handedly drove the idea of Buddy PQ.
+Of course we also had Alexander, who was helping, and all of us were brainstorming with you. So that was maybe an academic fascination with it, right? But what other facets keep you going, given your background in search, which was pre-vector-search?
+Yeah, I'd say just my endless curiosity about things. I think a lot of us have that; if you're listening to this podcast, probably a lot of you in the audience are very curious about technology in general, and the limitations of technology, and getting to that new magical thing, trying something for the first time and saying: oh my God, this is incredible.
+I can't believe this actually worked, that I did this. So it's that. I mean, I'm in my 40s now, so I've gone through that cycle a lot of times, where I've tried something and it was amazing. But I do feel there's a lot of practicality to it, in my wisdom now.
+I see that just because something looks cool doesn't mean it's the best thing in the world and should be used everywhere. So I see the practical use and need for vector search.
+Whether or not it turns out to be the be-all and end-all of search, that debate is open, right? But I don't think it is. I think it's just one piece of the puzzle. But it does solve a whole class of problems that were unsolvable if you go back 10 years,
+when I first started in search, compared to the types of things I'm doing right now. And I'll give you an example; I actually said this to somebody the other day.
+The first time I installed Solr: maybe Elasticsearch was around at that time, or maybe it was still Compass, it wasn't even Elastic yet. The first time I installed Solr and put in some documents, I was like: wow, this is amazing.
+I can do a search! This is so much better than that crappy index I was using on SQL Server. It was that type of amazement.
+But then you work with it over time and you see the limitations, and it's like: oh, I've got to add all these synonyms, and all these other problems, and all this stuff.
+I'll say that when you first start off with the relevance of Solr out of the box, you take their example schema.xml, you add some documents to it, you get stuff back, and you're like: okay, this is cool.
+Take that feeling, and then, and I'll just use Qdrant as an example because Qdrant is, in my opinion, super easy to use: you just docker pull qdrant and throw some stuff in there. Especially now with this Node thing.
+So when I did that, the first time I used Qdrant, I wrote this Node wrapper and just chucked in a whole bunch of documents, and I saw the out-of-the-box relevance. And I'm not saying this was fine-tuned; this isn't something production-worthy.
+But just on out-of-the-box relevance, I was like: this is better, and in my opinion I would spend less time worrying about it than I would with an inverted index. Sure, maybe the results aren't super precise all the time, and things like that.
+But if I'm on a team and it's like: I've got this search bar and I've got this content and I don't want to worry about it, right? I don't want to worry about it. I just want it to work. I want to surface stuff that's reasonably accurate. It doesn't have to be the best search in the world.
+It's a cost for me as a team. I don't make money from search, but it's something I have to support. I think vector search offers a really, really good solution there, because you don't have to chase that endless class of bugs: "this result doesn't even have anything to do with my search."
+I searched for, you know, "what is the best hiking boot," and a document matched "what" ten times, but there's no semblance of hiking boots or anything in the document. This is terrible. You don't get anything like that in vector search.
+And that's the appeal, I think. When you get into real production, highly tuned search, it's just one piece. But for the teams that want it to work out of the box and don't want to deal with it,
+I think it's a better solution than Elasticsearch or Solr.
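The "out of the box relevance" described here comes from ranking documents by vector similarity to a query embedding rather than by term matches. A toy sketch in plain JavaScript; the 3-dimensional vectors are hand-made stand-ins for real model embeddings, and nothing here is Qdrant-specific:

```javascript
// Toy nearest-neighbour ranking by cosine similarity.
// Vectors are hand-made 3-d examples; in practice they would come
// from an embedding model (e.g. one served by Mighty).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function rank(query, docs) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}

const docs = [
  { id: "hiking-boot-review", vector: [0.9, 0.1, 0.0] },
  { id: "pasta-recipe", vector: [0.0, 0.2, 0.9] },
];
console.log(rank([1, 0, 0], docs)[0].id); // prints: hiking-boot-review
```

A query vector that points nowhere near a document's vector scores near zero, which is why the "matched 'what' ten times" class of false positives doesn't arise.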
You end up spending a lot less time and headache. Yeah, that's amazing. That's so deep. And in what you said speaks, and sings, a practitioner, but I think also a dreamer.
+I think you dream of better ways of searching things, right?
+You went through it practically, but also, when you get so deep into the practical elements, you get stuck in them. You start thinking in the framework of the given system, of a given language even, right?
+The paradigms you read about in the docs, you keep thinking in them, and it's hard to unstick yourself from them.
+And the fact that the vector search field was born is magical in many ways. I feel like you feel the same.
+And the fact that you also ventured with me and others into building a new algorithm for vector search says that you wanted to go as deep as implementing an algorithm. And what could be sexier than implementing an algorithm? I don't know.
+Of course, all the other things are also sexy; I'm just saying it's very complex, very intellectually demanding work. So that's amazing. Thanks so much for this depth.
+And is there something you'd like to share or announce, on Mighty, or maybe on what you're going to present at Berlin Buzzwords? Yeah, so I am presenting at Berlin Buzzwords.
+And I'm putting together a charity event to hack on vector search. That's going to be on May 5th. I don't know when this podcast will be published, but on May 5th it's going to be an all-day learning session, and I'm not charging money for this.
+It's free. I just want to show people how to use these tools if you're not in the Python world. And if you're part of the Python world and you want to join, amazing, great.
+I want to do an all-day hackathon where I'll show you how to get these things up and running, you hack away at it, and by the end you'll have a working example of your own.
+And the whole time, we're going to raise money for charity, specifically around refugees and displaced people, because of the horrible things that are happening in Ukraine and other parts of the world as well.
+Getting some learning happening and also raising money for charity seems like a great way to spend time. So I plan to host that on May 5th. It's probably going to be on Twitch, because I want it to be an open, drop-in drop-out format. You can come, you can go.
+It's not going to be a controlled Zoom; it's going to be on Twitch, with chat and all that. So I'm going to get it all set up. Details are coming out shortly.
+By the time this is published, maybe the details will be available already, and we'll drop a link. Yeah, awesome. It sounds amazing that you keep thinking about these sensitive topics, about what's happening in the world, and that you're contributing your skills to a good cause here.
+Thanks so much. I'll try to publish this podcast before May 5th so people can join, and of course we can do the social media push. This is amazing. Thanks so much, Max. I've enjoyed this conversation thoroughly.
+We went into depth across all dimensions; it's a multi-dimensional conversation. So thanks so much, keep it up, and I'm curious to hear news about Mighty and the tooling around it. Also looking forward to your Berlin Buzzwords presentation. Yeah, thank you so much, Dima.
+It's great to chat. Yeah, thank you, Max. Cheers. Cheers. Take care. Bye-bye.
\ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/saurabh-rai-growing-resume-matcher.md b/transcripts_with_timestamps/vector-podcast/saurabh-rai-growing-resume-matcher.md new file mode 100644 index 0000000..480cb17 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/saurabh-rai-growing-resume-matcher.md @@ -0,0 +1,1396 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=nx6BH9Z_gBA

Topics:

00:00 + Intro - how do you like our new design?

00:52 + Greets

01:55 + Saurabh''s background

03:04 Resume Matcher: + 4.5K stars, 800 community members, 1.5K forks

04:11 + How did you grow the project?

05:42 + Target audience and how to use Resume Matcher

09:00 + How did you attract so many contributors?

12:47 + Architecture aspects

15:10 Cloud or + not

16:12 + Challenges in maintaining OS projects

17:56 + Developer marketing with Swirl AI Connect

21:13 + What you (listener) can help with

22:52 + What drives you?

Show notes:

- Resume Matcher: https://github.com/srbhr/Resume-Matcher

website: + https://resumematcher.fyi/

- + Ultimate CV by Martin John Yate: https://www.amazon.com/Ultimate-CV-Cr...

- + fastembed: https://github.com/qdrant/fastembed

- + Swirl: https://github.com/swirlai/swirl-search

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240412_070424_039c919c74991b595a1fa22f4c6cb1dd.png +pub_date: Fri, 12 Apr 2024 19:17:29 GMT +title: Saurabh Rai - Growing Resume Matcher +url: https://rss.com/podcasts/vector-podcast/1434941 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 23.28, "text": " Hello + there, this is Vector Podcast.", "tokens": [50364, 2425, 456, 11, 341, 307, 691, + 20814, 29972, 13, 51528], "temperature": 0.0, "avg_logprob": -0.43252109659129173, + "compression_ratio": 1.0543478260869565, "no_speech_prob": 0.20141193270683289}, + {"id": 1, "seek": 0, "start": 23.28, "end": 27.04, "text": " Third season, I''m + sure you''ve been waiting for new episodes.", "tokens": [51528, 12548, 3196, 11, + 286, 478, 988, 291, 600, 668, 3806, 337, 777, 9313, 13, 51716], "temperature": 0.0, + "avg_logprob": -0.43252109659129173, "compression_ratio": 1.0543478260869565, "no_speech_prob": + 0.20141193270683289}, {"id": 2, "seek": 2704, "start": 27.68, "end": 30.4, "text": + " There have been something happening in my family on positive side,", "tokens": + [50396, 821, 362, 668, 746, 2737, 294, 452, 1605, 322, 3353, 1252, 11, 50532], "temperature": + 0.0, "avg_logprob": -0.315221100710751, "compression_ratio": 1.4639639639639639, + "no_speech_prob": 0.6254786252975464}, {"id": 3, "seek": 2704, "start": 30.96, "end": + 37.68, "text": " so I was a little bit like distracted on a way, but I''m super + excited to have my guest with me.", "tokens": [50560, 370, 286, 390, 257, 707, 857, + 411, 21658, 322, 257, 636, 11, 457, 286, 478, 1687, 2919, 281, 362, 452, 8341, 365, + 385, 13, 50896], "temperature": 0.0, "avg_logprob": -0.315221100710751, "compression_ratio": + 1.4639639639639639, "no_speech_prob": 0.6254786252975464}, {"id": 4, "seek": 2704, + "start": 37.68, "end": 44.879999999999995, "text": " It''s Sarah Prie, who is a + software developer. 
He also doubles in DevRel.", "tokens": [50896, 467, 311, 9519, + 2114, 414, 11, 567, 307, 257, 4722, 10754, 13, 634, 611, 31634, 294, 9096, 49029, + 13, 51256], "temperature": 0.0, "avg_logprob": -0.315221100710751, "compression_ratio": + 1.4639639639639639, "no_speech_prob": 0.6254786252975464}, {"id": 5, "seek": 2704, + "start": 44.879999999999995, "end": 48.879999999999995, "text": " He has an open + source project, which is like Skyrocketing in stars.", "tokens": [51256, 634, 575, + 364, 1269, 4009, 1716, 11, 597, 307, 411, 9879, 37463, 278, 294, 6105, 13, 51456], + "temperature": 0.0, "avg_logprob": -0.315221100710751, "compression_ratio": 1.4639639639639639, + "no_speech_prob": 0.6254786252975464}, {"id": 6, "seek": 2704, "start": 50.480000000000004, + "end": 51.76, "text": " Yeah, welcome, Sarah.", "tokens": [51536, 865, 11, 2928, + 11, 9519, 13, 51600], "temperature": 0.0, "avg_logprob": -0.315221100710751, "compression_ratio": + 1.4639639639639639, "no_speech_prob": 0.6254786252975464}, {"id": 7, "seek": 5176, + "start": 51.76, "end": 57.519999999999996, "text": " It''s a high-demetripe, and + it''s a pleasure to be on the first episode of the third season,", "tokens": [50364, + 467, 311, 257, 1090, 12, 10730, 302, 470, 494, 11, 293, 309, 311, 257, 6834, 281, + 312, 322, 264, 700, 3500, 295, 264, 2636, 3196, 11, 50652], "temperature": 0.0, + "avg_logprob": -0.20411616563796997, "compression_ratio": 1.6631205673758864, "no_speech_prob": + 0.09891565889120102}, {"id": 8, "seek": 5176, "start": 57.519999999999996, "end": + 60.239999999999995, "text": " and it''s a pretty amazing introduction that you do.", + "tokens": [50652, 293, 309, 311, 257, 1238, 2243, 9339, 300, 291, 360, 13, 50788], + "temperature": 0.0, "avg_logprob": -0.20411616563796997, "compression_ratio": 1.6631205673758864, + "no_speech_prob": 0.09891565889120102}, {"id": 9, "seek": 5176, "start": 60.239999999999995, + "end": 66.96, "text": " So, hey, audience, I am Sarah. 
I am a software developer + with more than two years of experience.", "tokens": [50788, 407, 11, 4177, 11, 4034, + 11, 286, 669, 9519, 13, 286, 669, 257, 4722, 10754, 365, 544, 813, 732, 924, 295, + 1752, 13, 51124], "temperature": 0.0, "avg_logprob": -0.20411616563796997, "compression_ratio": + 1.6631205673758864, "no_speech_prob": 0.09891565889120102}, {"id": 10, "seek": 5176, + "start": 67.75999999999999, "end": 70.96, "text": " I''ve been doing a lot of open + source projects.", "tokens": [51164, 286, 600, 668, 884, 257, 688, 295, 1269, 4009, + 4455, 13, 51324], "temperature": 0.0, "avg_logprob": -0.20411616563796997, "compression_ratio": + 1.6631205673758864, "no_speech_prob": 0.09891565889120102}, {"id": 11, "seek": 5176, + "start": 70.96, "end": 75.44, "text": " And there is one more thing that I should + also mention, which is also very important.", "tokens": [51324, 400, 456, 307, 472, + 544, 551, 300, 286, 820, 611, 2152, 11, 597, 307, 611, 588, 1021, 13, 51548], "temperature": + 0.0, "avg_logprob": -0.20411616563796997, "compression_ratio": 1.6631205673758864, + "no_speech_prob": 0.09891565889120102}, {"id": 12, "seek": 5176, "start": 75.44, + "end": 81.44, "text": " Well, at least for me, and I know for you as well, is that + you are the designer on this podcast,", "tokens": [51548, 1042, 11, 412, 1935, 337, + 385, 11, 293, 286, 458, 337, 291, 382, 731, 11, 307, 300, 291, 366, 264, 11795, + 322, 341, 7367, 11, 51848], "temperature": 0.0, "avg_logprob": -0.20411616563796997, + "compression_ratio": 1.6631205673758864, "no_speech_prob": 0.09891565889120102}, + {"id": 13, "seek": 8144, "start": 81.44, "end": 85.28, "text": " and I''m sure that + you will be designing this very episode as well.", "tokens": [50364, 293, 286, 478, + 988, 300, 291, 486, 312, 14685, 341, 588, 3500, 382, 731, 13, 50556], "temperature": + 0.0, "avg_logprob": -0.21005839489875955, "compression_ratio": 1.6586538461538463, + "no_speech_prob": 0.0030059360433369875}, {"id": 14, 
"seek": 8144, "start": 86.64, + "end": 89.67999999999999, "text": " Oh, yeah. Yeah, it''s going to be fun, like + designing your own,", "tokens": [50624, 876, 11, 1338, 13, 865, 11, 309, 311, 516, + 281, 312, 1019, 11, 411, 14685, 428, 1065, 11, 50776], "temperature": 0.0, "avg_logprob": + -0.21005839489875955, "compression_ratio": 1.6586538461538463, "no_speech_prob": + 0.0030059360433369875}, {"id": 15, "seek": 8144, "start": 90.88, "end": 94.4, "text": + " like editing the own episode and the banner and so on.", "tokens": [50836, 411, + 10000, 264, 1065, 3500, 293, 264, 24348, 293, 370, 322, 13, 51012], "temperature": + 0.0, "avg_logprob": -0.21005839489875955, "compression_ratio": 1.6586538461538463, + "no_speech_prob": 0.0030059360433369875}, {"id": 16, "seek": 8144, "start": 94.4, + "end": 99.03999999999999, "text": " So it''s pretty amazing going from designer + open source and then here.", "tokens": [51012, 407, 309, 311, 1238, 2243, 516, 490, + 11795, 1269, 4009, 293, 550, 510, 13, 51244], "temperature": 0.0, "avg_logprob": + -0.21005839489875955, "compression_ratio": 1.6586538461538463, "no_speech_prob": + 0.0030059360433369875}, {"id": 17, "seek": 8144, "start": 99.68, "end": 101.6, "text": + " Yeah, absolutely.", "tokens": [51276, 865, 11, 3122, 13, 51372], "temperature": + 0.0, "avg_logprob": -0.21005839489875955, "compression_ratio": 1.6586538461538463, + "no_speech_prob": 0.0030059360433369875}, {"id": 18, "seek": 8144, "start": 103.03999999999999, + "end": 109.2, "text": " So, yeah, if you check out some of the designs that drew + your attention,", "tokens": [51444, 407, 11, 1338, 11, 498, 291, 1520, 484, 512, + 295, 264, 11347, 300, 12804, 428, 3202, 11, 51752], "temperature": 0.0, "avg_logprob": + -0.21005839489875955, "compression_ratio": 1.6586538461538463, "no_speech_prob": + 0.0030059360433369875}, {"id": 19, "seek": 10920, "start": 109.92, "end": 113.04, + "text": " you should know that this was, these were done by Sarah.", "tokens": [50400, 
+ 291, 820, 458, 300, 341, 390, 11, 613, 645, 1096, 538, 9519, 13, 50556], "temperature": + 0.0, "avg_logprob": -0.20494035927646131, "compression_ratio": 1.542857142857143, + "no_speech_prob": 0.00799267366528511}, {"id": 20, "seek": 10920, "start": 113.04, + "end": 114.72, "text": " I''m really excited to have you here.", "tokens": [50556, + 286, 478, 534, 2919, 281, 362, 291, 510, 13, 50640], "temperature": 0.0, "avg_logprob": + -0.20494035927646131, "compression_ratio": 1.542857142857143, "no_speech_prob": + 0.00799267366528511}, {"id": 21, "seek": 10920, "start": 115.60000000000001, "end": + 121.44, "text": " As usual, we start with an intro. Could you introduce yourself + to our audience?", "tokens": [50684, 1018, 7713, 11, 321, 722, 365, 364, 12897, + 13, 7497, 291, 5366, 1803, 281, 527, 4034, 30, 50976], "temperature": 0.0, "avg_logprob": + -0.20494035927646131, "compression_ratio": 1.542857142857143, "no_speech_prob": + 0.00799267366528511}, {"id": 22, "seek": 10920, "start": 124.48, "end": 128.16, + "text": " Like what''s your background and how you got here?", "tokens": [51128, + 1743, 437, 311, 428, 3678, 293, 577, 291, 658, 510, 30, 51312], "temperature": 0.0, + "avg_logprob": -0.20494035927646131, "compression_ratio": 1.542857142857143, "no_speech_prob": + 0.00799267366528511}, {"id": 23, "seek": 10920, "start": 128.16, "end": 128.4, "text": + " Yeah.", "tokens": [51312, 865, 13, 51324], "temperature": 0.0, "avg_logprob": + -0.20494035927646131, "compression_ratio": 1.542857142857143, "no_speech_prob": + 0.00799267366528511}, {"id": 24, "seek": 10920, "start": 131.36, "end": 135.92000000000002, + "text": " Okay, so I have a background in computer science and engineering coming + up with an engineering", "tokens": [51472, 1033, 11, 370, 286, 362, 257, 3678, 294, + 3820, 3497, 293, 7043, 1348, 493, 365, 364, 7043, 51700], "temperature": 0.0, "avg_logprob": + -0.20494035927646131, "compression_ratio": 1.542857142857143, "no_speech_prob": + 
0.00799267366528511}, {"id": 25, "seek": 13592, "start": 135.92, "end": 140.39999999999998, + "text": " degree right after the COVID second wave hit in India.", "tokens": [50364, + 4314, 558, 934, 264, 4566, 1150, 5772, 2045, 294, 5282, 13, 50588], "temperature": + 0.0, "avg_logprob": -0.19363869114925986, "compression_ratio": 1.5418326693227091, + "no_speech_prob": 0.035353872925043106}, {"id": 26, "seek": 13592, "start": 141.04, + "end": 147.04, "text": " And after that, I''ve been doing like a full stack development + for a very large corporate company", "tokens": [50620, 400, 934, 300, 11, 286, 600, + 668, 884, 411, 257, 1577, 8630, 3250, 337, 257, 588, 2416, 10896, 2237, 50920], + "temperature": 0.0, "avg_logprob": -0.19363869114925986, "compression_ratio": 1.5418326693227091, + "no_speech_prob": 0.035353872925043106}, {"id": 27, "seek": 13592, "start": 147.83999999999997, + "end": 150.79999999999998, "text": " out there, probably been there for two and + a half years.", "tokens": [50960, 484, 456, 11, 1391, 668, 456, 337, 732, 293, 257, + 1922, 924, 13, 51108], "temperature": 0.0, "avg_logprob": -0.19363869114925986, + "compression_ratio": 1.5418326693227091, "no_speech_prob": 0.035353872925043106}, + {"id": 28, "seek": 13592, "start": 150.79999999999998, "end": 155.11999999999998, + "text": " And apart from that, I was pretty much involved into open source projects,", + "tokens": [51108, 400, 4936, 490, 300, 11, 286, 390, 1238, 709, 3288, 666, 1269, + 4009, 4455, 11, 51324], "temperature": 0.0, "avg_logprob": -0.19363869114925986, + "compression_ratio": 1.5418326693227091, "no_speech_prob": 0.035353872925043106}, + {"id": 29, "seek": 13592, "start": 155.76, "end": 161.51999999999998, "text": " + vector search, machine learning in AI, and all the same spaces that you are in, + that''s how I found you.", "tokens": [51356, 8062, 3164, 11, 3479, 2539, 294, 7318, + 11, 293, 439, 264, 912, 7673, 300, 291, 366, 294, 11, 300, 311, 577, 286, 1352, + 291, 13, 51644], 
"temperature": 0.0, "avg_logprob": -0.19363869114925986, "compression_ratio": + 1.5418326693227091, "no_speech_prob": 0.035353872925043106}, {"id": 30, "seek": + 16152, "start": 162.16, "end": 168.96, "text": " And the other amazing team members + out there that we have collaborated and I''ve designed for as well.", "tokens": + [50396, 400, 264, 661, 2243, 1469, 2679, 484, 456, 300, 321, 362, 42463, 293, 286, + 600, 4761, 337, 382, 731, 13, 50736], "temperature": 0.0, "avg_logprob": -0.15221518093777686, + "compression_ratio": 1.7096774193548387, "no_speech_prob": 0.011774989776313305}, + {"id": 31, "seek": 16152, "start": 168.96, "end": 174.88, "text": " So that was + an interest that I''ve kept on to work towards artificial intelligence,", "tokens": + [50736, 407, 300, 390, 364, 1179, 300, 286, 600, 4305, 322, 281, 589, 3030, 11677, + 7599, 11, 51032], "temperature": 0.0, "avg_logprob": -0.15221518093777686, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.011774989776313305}, {"id": 32, "seek": + 16152, "start": 174.88, "end": 180.56, "text": " machine learning, vector spaces, + vector search, vector databases and all those interesting things.", "tokens": [51032, + 3479, 2539, 11, 8062, 7673, 11, 8062, 3164, 11, 8062, 22380, 293, 439, 729, 1880, + 721, 13, 51316], "temperature": 0.0, "avg_logprob": -0.15221518093777686, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.011774989776313305}, {"id": 33, "seek": + 16152, "start": 180.56, "end": 182.8, "text": " And yeah, and all thing open source + as well.", "tokens": [51316, 400, 1338, 11, 293, 439, 551, 1269, 4009, 382, 731, + 13, 51428], "temperature": 0.0, "avg_logprob": -0.15221518093777686, "compression_ratio": + 1.7096774193548387, "no_speech_prob": 0.011774989776313305}, {"id": 34, "seek": + 16152, "start": 183.28, "end": 190.56, "text": " So with all that''s happening in + the last two years with me, I ended up creating this project of", "tokens": [51452, + 407, 365, 439, 300, 
311, 2737, 294, 264, 1036, 732, 924, 365, 385, 11, 286, 4590, + 493, 4084, 341, 1716, 295, 51816], "temperature": 0.0, "avg_logprob": -0.15221518093777686, + "compression_ratio": 1.7096774193548387, "no_speech_prob": 0.011774989776313305}, + {"id": 35, "seek": 19056, "start": 190.56, "end": 200.32, "text": " mine called + resume master, which is started to like gain more attention than I initially assumed.", + "tokens": [50364, 3892, 1219, 15358, 4505, 11, 597, 307, 1409, 281, 411, 6052, 544, + 3202, 813, 286, 9105, 15895, 13, 50852], "temperature": 0.0, "avg_logprob": -0.21606808398143354, + "compression_ratio": 1.4821428571428572, "no_speech_prob": 0.007892799563705921}, + {"id": 36, "seek": 19056, "start": 200.32, "end": 205.84, "text": " And that''s + where it all like blew up like 4000 nearly 4500 GitHub stars,", "tokens": [50852, + 400, 300, 311, 689, 309, 439, 411, 19075, 493, 411, 31104, 6217, 6905, 628, 23331, + 6105, 11, 51128], "temperature": 0.0, "avg_logprob": -0.21606808398143354, "compression_ratio": + 1.4821428571428572, "no_speech_prob": 0.007892799563705921}, {"id": 37, "seek": + 19056, "start": 206.72, "end": 212.56, "text": " huge amount of traffic on the website + and a lot of downloads, maybe like more than 1000 folks.", "tokens": [51172, 2603, + 2372, 295, 6419, 322, 264, 3144, 293, 257, 688, 295, 36553, 11, 1310, 411, 544, + 813, 9714, 4024, 13, 51464], "temperature": 0.0, "avg_logprob": -0.21606808398143354, + "compression_ratio": 1.4821428571428572, "no_speech_prob": 0.007892799563705921}, + {"id": 38, "seek": 19056, "start": 213.2, "end": 215.92000000000002, "text": " I''ve + got like 800 members in my community.", "tokens": [51496, 286, 600, 658, 411, 13083, + 2679, 294, 452, 1768, 13, 51632], "temperature": 0.0, "avg_logprob": -0.21606808398143354, + "compression_ratio": 1.4821428571428572, "no_speech_prob": 0.007892799563705921}, + {"id": 39, "seek": 19056, "start": 216.48000000000002, "end": 217.6, "text": " So + it''s pretty 
amazing.", "tokens": [51660, 407, 309, 311, 1238, 2243, 13, 51716], + "temperature": 0.0, "avg_logprob": -0.21606808398143354, "compression_ratio": 1.4821428571428572, + "no_speech_prob": 0.007892799563705921}, {"id": 40, "seek": 21760, "start": 218.23999999999998, + "end": 219.84, "text": " Yeah, this is insane and crazy.", "tokens": [50396, 865, + 11, 341, 307, 10838, 293, 3219, 13, 50476], "temperature": 0.0, "avg_logprob": -0.18685936730755262, + "compression_ratio": 1.7178571428571427, "no_speech_prob": 0.023342495784163475}, + {"id": 41, "seek": 21760, "start": 219.84, "end": 224.88, "text": " I remember we + were chatting together about this project and you said,", "tokens": [50476, 286, + 1604, 321, 645, 24654, 1214, 466, 341, 1716, 293, 291, 848, 11, 50728], "temperature": + 0.0, "avg_logprob": -0.18685936730755262, "compression_ratio": 1.7178571428571427, + "no_speech_prob": 0.023342495784163475}, {"id": 42, "seek": 21760, "start": 224.88, + "end": 230.07999999999998, "text": " you have this project, which is kind of like, + you know, a cadaver interest or something like that", "tokens": [50728, 291, 362, + 341, 1716, 11, 597, 307, 733, 295, 411, 11, 291, 458, 11, 257, 8411, 331, 1179, + 420, 746, 411, 300, 50988], "temperature": 0.0, "avg_logprob": -0.18685936730755262, + "compression_ratio": 1.7178571428571427, "no_speech_prob": 0.023342495784163475}, + {"id": 43, "seek": 21760, "start": 230.07999999999998, "end": 231.68, "text": " + that you''ve been doing on the side, right?", "tokens": [50988, 300, 291, 600, 668, + 884, 322, 264, 1252, 11, 558, 30, 51068], "temperature": 0.0, "avg_logprob": -0.18685936730755262, + "compression_ratio": 1.7178571428571427, "no_speech_prob": 0.023342495784163475}, + {"id": 44, "seek": 21760, "start": 231.68, "end": 237.12, "text": " And then we + were chatting, I give you like a small advice to rename it slightly.", "tokens": + [51068, 400, 550, 321, 645, 24654, 11, 286, 976, 291, 411, 257, 1359, 5192, 281, + 36741, 
309, 4748, 13, 51340], "temperature": 0.0, "avg_logprob": -0.18685936730755262, + "compression_ratio": 1.7178571428571427, "no_speech_prob": 0.023342495784163475}, + {"id": 45, "seek": 21760, "start": 237.12, "end": 241.28, "text": " And then you + decided to really go public with it or whatever.", "tokens": [51340, 400, 550, 291, + 3047, 281, 534, 352, 1908, 365, 309, 420, 2035, 13, 51548], "temperature": 0.0, + "avg_logprob": -0.18685936730755262, "compression_ratio": 1.7178571428571427, "no_speech_prob": + 0.023342495784163475}, {"id": 46, "seek": 21760, "start": 241.28, "end": 246.95999999999998, + "text": " Like just an insane amount of growth that you''ve made there from like + a couple of stars to 4000", "tokens": [51548, 1743, 445, 364, 10838, 2372, 295, + 4599, 300, 291, 600, 1027, 456, 490, 411, 257, 1916, 295, 6105, 281, 31104, 51832], + "temperature": 0.0, "avg_logprob": -0.18685936730755262, "compression_ratio": 1.7178571428571427, + "no_speech_prob": 0.023342495784163475}, {"id": 47, "seek": 24696, "start": 247.04000000000002, + "end": 250.8, "text": " and half stars. Like how did you do that? 
Like I still don''t + understand.", "tokens": [50368, 293, 1922, 6105, 13, 1743, 577, 630, 291, 360, 300, + 30, 1743, 286, 920, 500, 380, 1223, 13, 50556], "temperature": 0.0, "avg_logprob": + -0.19997474584686623, "compression_ratio": 1.5564853556485356, "no_speech_prob": + 0.0018659131601452827}, {"id": 48, "seek": 24696, "start": 252.72, "end": 259.52, + "text": " So the amazing part with the whole thing was that as you mentioned, the + initially it was called", "tokens": [50652, 407, 264, 2243, 644, 365, 264, 1379, + 551, 390, 300, 382, 291, 2835, 11, 264, 9105, 309, 390, 1219, 50992], "temperature": + 0.0, "avg_logprob": -0.19997474584686623, "compression_ratio": 1.5564853556485356, + "no_speech_prob": 0.0018659131601452827}, {"id": 49, "seek": 24696, "start": 260.08, + "end": 264.48, "text": " name resume matching and it was pretty much focused on + the algorithm.", "tokens": [51020, 1315, 15358, 14324, 293, 309, 390, 1238, 709, + 5178, 322, 264, 9284, 13, 51240], "temperature": 0.0, "avg_logprob": -0.19997474584686623, + "compression_ratio": 1.5564853556485356, "no_speech_prob": 0.0018659131601452827}, + {"id": 50, "seek": 24696, "start": 264.48, "end": 266.08, "text": " And like when + we were talking about it,", "tokens": [51240, 400, 411, 562, 321, 645, 1417, 466, + 309, 11, 51320], "temperature": 0.0, "avg_logprob": -0.19997474584686623, "compression_ratio": + 1.5564853556485356, "no_speech_prob": 0.0018659131601452827}, {"id": 51, "seek": + 24696, "start": 266.8, "end": 272.64, "text": " to make it more public, friendly, + have a nice intuitive dashboard so that people can interact", "tokens": [51356, + 281, 652, 309, 544, 1908, 11, 9208, 11, 362, 257, 1481, 21769, 18342, 370, 300, + 561, 393, 4648, 51648], "temperature": 0.0, "avg_logprob": -0.19997474584686623, + "compression_ratio": 1.5564853556485356, "no_speech_prob": 0.0018659131601452827}, + {"id": 52, "seek": 27264, "start": 272.64, "end": 275.91999999999996, "text": " + apart from the command 
line stuff that I had before.", "tokens": [50364, 4936, 490, + 264, 5622, 1622, 1507, 300, 286, 632, 949, 13, 50528], "temperature": 0.0, "avg_logprob": + -0.13267572643687425, "compression_ratio": 1.714859437751004, "no_speech_prob": + 0.01109365001320839}, {"id": 53, "seek": 27264, "start": 276.47999999999996, "end": + 284.08, "text": " So that was the whole thing to create up a product that people + can use and that introduced me to", "tokens": [50556, 407, 300, 390, 264, 1379, + 551, 281, 1884, 493, 257, 1674, 300, 561, 393, 764, 293, 300, 7268, 385, 281, 50936], + "temperature": 0.0, "avg_logprob": -0.13267572643687425, "compression_ratio": 1.714859437751004, + "no_speech_prob": 0.01109365001320839}, {"id": 54, "seek": 27264, "start": 284.08, + "end": 290.15999999999997, "text": " something called as developer marketing. It''s + not just that you have the product or you have", "tokens": [50936, 746, 1219, 382, + 10754, 6370, 13, 467, 311, 406, 445, 300, 291, 362, 264, 1674, 420, 291, 362, 51240], + "temperature": 0.0, "avg_logprob": -0.13267572643687425, "compression_ratio": 1.714859437751004, + "no_speech_prob": 0.01109365001320839}, {"id": 55, "seek": 27264, "start": 290.15999999999997, + "end": 295.52, "text": " the code, but there has to be something special about it + that you can do so that it grows.", "tokens": [51240, 264, 3089, 11, 457, 456, 575, + 281, 312, 746, 2121, 466, 309, 300, 291, 393, 360, 370, 300, 309, 13156, 13, 51508], + "temperature": 0.0, "avg_logprob": -0.13267572643687425, "compression_ratio": 1.714859437751004, + "no_speech_prob": 0.01109365001320839}, {"id": 56, "seek": 27264, "start": 295.52, + "end": 301.76, "text": " I mean, we have a huge probably millions of open source + project in GitHub and GitLab as well.", "tokens": [51508, 286, 914, 11, 321, 362, + 257, 2603, 1391, 6803, 295, 1269, 4009, 1716, 294, 23331, 293, 16939, 37880, 382, + 731, 13, 51820], "temperature": 0.0, "avg_logprob": -0.13267572643687425, 
"compression_ratio": + 1.714859437751004, "no_speech_prob": 0.01109365001320839}, {"id": 57, "seek": 30176, + "start": 302.08, "end": 307.36, "text": " What makes a difference between something + that has 100 stars or maybe less to something that", "tokens": [50380, 708, 1669, + 257, 2649, 1296, 746, 300, 575, 2319, 6105, 420, 1310, 1570, 281, 746, 300, 50644], + "temperature": 0.0, "avg_logprob": -0.13835869957419003, "compression_ratio": 1.7488584474885844, + "no_speech_prob": 0.0036265328526496887}, {"id": 58, "seek": 30176, "start": 307.36, + "end": 315.76, "text": " has 4000 or maybe more than that? That has the element + of marketing out there and a finally", "tokens": [50644, 575, 31104, 420, 1310, + 544, 813, 300, 30, 663, 575, 264, 4478, 295, 6370, 484, 456, 293, 257, 2721, 51064], + "temperature": 0.0, "avg_logprob": -0.13835869957419003, "compression_ratio": 1.7488584474885844, + "no_speech_prob": 0.0036265328526496887}, {"id": 59, "seek": 30176, "start": 315.76, + "end": 323.44, "text": " polished product. So that''s the key differentiator between + everything is that. 
Writing software is", "tokens": [51064, 29079, 1674, 13, 407, + 300, 311, 264, 2141, 27372, 1639, 1296, 1203, 307, 300, 13, 32774, 4722, 307, 51448], + "temperature": 0.0, "avg_logprob": -0.13835869957419003, "compression_ratio": 1.7488584474885844, + "no_speech_prob": 0.0036265328526496887}, {"id": 60, "seek": 30176, "start": 323.44, + "end": 330.15999999999997, "text": " essential, but if you don''t do the marketing, + if you don''t do the advertising or evangelizing about", "tokens": [51448, 7115, + 11, 457, 498, 291, 500, 380, 360, 264, 6370, 11, 498, 291, 500, 380, 360, 264, 13097, + 420, 24546, 3319, 466, 51784], "temperature": 0.0, "avg_logprob": -0.13835869957419003, + "compression_ratio": 1.7488584474885844, "no_speech_prob": 0.0036265328526496887}, + {"id": 61, "seek": 33016, "start": 330.16, "end": 336.40000000000003, "text": " + your whole stuff, then it probably doesn''t get as much attention it wants to.", + "tokens": [50364, 428, 1379, 1507, 11, 550, 309, 1391, 1177, 380, 483, 382, 709, + 3202, 309, 2738, 281, 13, 50676], "temperature": 0.0, "avg_logprob": -0.24053984249339383, + "compression_ratio": 1.5296803652968036, "no_speech_prob": 0.012702646665275097}, + {"id": 62, "seek": 33016, "start": 336.40000000000003, "end": 341.20000000000005, + "text": " So that was the game changer for resume matching.", "tokens": [50676, + 407, 300, 390, 264, 1216, 22822, 337, 15358, 14324, 13, 50916], "temperature": 0.0, + "avg_logprob": -0.24053984249339383, "compression_ratio": 1.5296803652968036, "no_speech_prob": + 0.012702646665275097}, {"id": 63, "seek": 33016, "start": 341.84000000000003, "end": + 345.12, "text": " This is an amazing journey that you''ve had there.", "tokens": + [50948, 639, 307, 364, 2243, 4671, 300, 291, 600, 632, 456, 13, 51112], "temperature": + 0.0, "avg_logprob": -0.24053984249339383, "compression_ratio": 1.5296803652968036, + "no_speech_prob": 0.012702646665275097}, {"id": 64, "seek": 33016, "start": 346.88, + "end": 350.16, "text": 
" And this is where I want to ask you next. My next question,", + "tokens": [51200, 400, 341, 307, 689, 286, 528, 281, 1029, 291, 958, 13, 1222, 958, + 1168, 11, 51364], "temperature": 0.0, "avg_logprob": -0.24053984249339383, "compression_ratio": + 1.5296803652968036, "no_speech_prob": 0.012702646665275097}, {"id": 65, "seek": + 33016, "start": 350.16, "end": 358.64000000000004, "text": " can you explain what + resume matcher does and how is it relevant or useful to pretty much anybody,", "tokens": + [51364, 393, 291, 2903, 437, 15358, 2995, 260, 775, 293, 577, 307, 309, 7340, 420, + 4420, 281, 1238, 709, 4472, 11, 51788], "temperature": 0.0, "avg_logprob": -0.24053984249339383, + "compression_ratio": 1.5296803652968036, "no_speech_prob": 0.012702646665275097}, + {"id": 66, "seek": 35864, "start": 358.64, "end": 360.56, "text": " I guess, but + maybe developers first, right?", "tokens": [50364, 286, 2041, 11, 457, 1310, 8849, + 700, 11, 558, 30, 50460], "temperature": 0.0, "avg_logprob": -0.18611644744873046, + "compression_ratio": 1.7224489795918367, "no_speech_prob": 0.008626842871308327}, + {"id": 67, "seek": 35864, "start": 363.2, "end": 367.84, "text": " The target audience + for resume matcher was developers out there. I''m a software developer,", "tokens": + [50592, 440, 3779, 4034, 337, 15358, 2995, 260, 390, 8849, 484, 456, 13, 286, 478, + 257, 4722, 10754, 11, 50824], "temperature": 0.0, "avg_logprob": -0.18611644744873046, + "compression_ratio": 1.7224489795918367, "no_speech_prob": 0.008626842871308327}, + {"id": 68, "seek": 35864, "start": 367.84, "end": 373.52, "text": " so I knew like + what was the challenge with the whole project. 
What resume matcher does is like,", + "tokens": [50824, 370, 286, 2586, 411, 437, 390, 264, 3430, 365, 264, 1379, 1716, + 13, 708, 15358, 2995, 260, 775, 307, 411, 11, 51108], "temperature": 0.0, "avg_logprob": + -0.18611644744873046, "compression_ratio": 1.7224489795918367, "no_speech_prob": + 0.008626842871308327}, {"id": 69, "seek": 35864, "start": 373.52, "end": 380.56, + "text": " it is a reverse ATS tool for your resume is out there. It takes up certain + keywords from job", "tokens": [51108, 309, 307, 257, 9943, 316, 7327, 2290, 337, + 428, 15358, 307, 484, 456, 13, 467, 2516, 493, 1629, 21009, 490, 1691, 51460], "temperature": + 0.0, "avg_logprob": -0.18611644744873046, "compression_ratio": 1.7224489795918367, + "no_speech_prob": 0.008626842871308327}, {"id": 70, "seek": 35864, "start": 380.56, + "end": 386.88, "text": " descriptions and then it matches with your resume out there. + It suggests that probably you can add", "tokens": [51460, 24406, 293, 550, 309, + 10676, 365, 428, 15358, 484, 456, 13, 467, 13409, 300, 1391, 291, 393, 909, 51776], + "temperature": 0.0, "avg_logprob": -0.18611644744873046, "compression_ratio": 1.7224489795918367, + "no_speech_prob": 0.008626842871308327}, {"id": 71, "seek": 38688, "start": 386.96, + "end": 393.04, "text": " some extra keywords. An example would be, hey, I am a Java + developer. I''m going to apply for this", "tokens": [50368, 512, 2857, 21009, 13, + 1107, 1365, 576, 312, 11, 4177, 11, 286, 669, 257, 10745, 10754, 13, 286, 478, 516, + 281, 3079, 337, 341, 50672], "temperature": 0.0, "avg_logprob": -0.20089414168377312, + "compression_ratio": 1.6455696202531647, "no_speech_prob": 0.029341978952288628}, + {"id": 72, "seek": 38688, "start": 393.04, "end": 398.48, "text": " full stack developer + job that I know that I can do it well. 
The full stack developer job contains", "tokens": + [50672, 1577, 8630, 10754, 1691, 300, 286, 458, 300, 286, 393, 360, 309, 731, 13, + 440, 1577, 8630, 10754, 1691, 8306, 50944], "temperature": 0.0, "avg_logprob": -0.20089414168377312, + "compression_ratio": 1.6455696202531647, "no_speech_prob": 0.029341978952288628}, + {"id": 73, "seek": 38688, "start": 398.48, "end": 406.15999999999997, "text": " + certain extra keywords. I have written Java, but the job description has mentioned + it as G2EE or", "tokens": [50944, 1629, 2857, 21009, 13, 286, 362, 3720, 10745, + 11, 457, 264, 1691, 3855, 575, 2835, 309, 382, 460, 17, 36, 36, 420, 51328], "temperature": + 0.0, "avg_logprob": -0.20089414168377312, "compression_ratio": 1.6455696202531647, + "no_speech_prob": 0.029341978952288628}, {"id": 74, "seek": 38688, "start": 406.15999999999997, + "end": 414.24, "text": " Java to enterprise edition or probably another package. + Maybe it is, it has maven, it has spring,", "tokens": [51328, 10745, 281, 14132, + 11377, 420, 1391, 1071, 7372, 13, 2704, 309, 307, 11, 309, 575, 463, 553, 11, 309, + 575, 5587, 11, 51732], "temperature": 0.0, "avg_logprob": -0.20089414168377312, + "compression_ratio": 1.6455696202531647, "no_speech_prob": 0.029341978952288628}, + {"id": 75, "seek": 41424, "start": 414.24, "end": 419.28000000000003, "text": " + and I have written as spring boot. Certain keyword mismatches there, which anybody + can", "tokens": [50364, 293, 286, 362, 3720, 382, 5587, 11450, 13, 13407, 20428, + 23220, 852, 279, 456, 11, 597, 4472, 393, 50616], "temperature": 0.0, "avg_logprob": + -0.293464998404185, "compression_ratio": 1.6419213973799127, "no_speech_prob": 0.017837978899478912}, + {"id": 76, "seek": 41424, "start": 419.28000000000003, "end": 425.6, "text": " make. + Then that is something that resume matcher fills in. 
It would go out, parse your + resume,", "tokens": [50616, 652, 13, 1396, 300, 307, 746, 300, 15358, 2995, 260, + 22498, 294, 13, 467, 576, 352, 484, 11, 48377, 428, 15358, 11, 50932], "temperature": + 0.0, "avg_logprob": -0.293464998404185, "compression_ratio": 1.6419213973799127, + "no_speech_prob": 0.017837978899478912}, {"id": 77, "seek": 41424, "start": 425.6, + "end": 432.72, "text": " match it with a lot of job descriptions out there. It''s + going to suggest your similarity score as", "tokens": [50932, 2995, 309, 365, 257, + 688, 295, 1691, 24406, 484, 456, 13, 467, 311, 516, 281, 3402, 428, 32194, 6175, + 382, 51288], "temperature": 0.0, "avg_logprob": -0.293464998404185, "compression_ratio": + 1.6419213973799127, "no_speech_prob": 0.017837978899478912}, {"id": 78, "seek": + 41424, "start": 432.72, "end": 439.68, "text": " well. That, hey, this is the job, + this is how much you match. That''s where the webtoe similarity", "tokens": [51288, + 731, 13, 663, 11, 4177, 11, 341, 307, 264, 1691, 11, 341, 307, 577, 709, 291, 2995, + 13, 663, 311, 689, 264, 3670, 1353, 68, 32194, 51636], "temperature": 0.0, "avg_logprob": + -0.293464998404185, "compression_ratio": 1.6419213973799127, "no_speech_prob": 0.017837978899478912}, + {"id": 79, "seek": 43968, "start": 439.76, "end": 445.36, "text": " comes in. You + mentioned ATS. What does it stand for? Can you expand it?", "tokens": [50368, 1487, + 294, 13, 509, 2835, 316, 7327, 13, 708, 775, 309, 1463, 337, 30, 1664, 291, 5268, + 309, 30, 50648], "temperature": 0.0, "avg_logprob": -0.20689331966897714, "compression_ratio": + 1.4913793103448276, "no_speech_prob": 0.04439431428909302}, {"id": 80, "seek": 43968, + "start": 445.36, "end": 451.84000000000003, "text": " Oh, yeah, for shows. For those + who don''t know, ATS is an applicant tracking system. 
There''s", "tokens": [50648, + 876, 11, 1338, 11, 337, 3110, 13, 1171, 729, 567, 500, 380, 458, 11, 316, 7327, + 307, 364, 30915, 11603, 1185, 13, 821, 311, 50972], "temperature": 0.0, "avg_logprob": + -0.20689331966897714, "compression_ratio": 1.4913793103448276, "no_speech_prob": + 0.04439431428909302}, {"id": 81, "seek": 43968, "start": 451.84000000000003, "end": + 459.28000000000003, "text": " like a lot of them. What actually happens is that + when you apply to a company, any likes,", "tokens": [50972, 411, 257, 688, 295, + 552, 13, 708, 767, 2314, 307, 300, 562, 291, 3079, 281, 257, 2237, 11, 604, 5902, + 11, 51344], "temperature": 0.0, "avg_logprob": -0.20689331966897714, "compression_ratio": + 1.4913793103448276, "no_speech_prob": 0.04439431428909302}, {"id": 82, "seek": 43968, + "start": 459.28000000000003, "end": 464.64, "text": " from startup to a large corporate + company, you submit their resumes and they get ingested by", "tokens": [51344, 490, + 18578, 281, 257, 2416, 10896, 2237, 11, 291, 10315, 641, 48068, 293, 436, 483, 3957, + 21885, 538, 51612], "temperature": 0.0, "avg_logprob": -0.20689331966897714, "compression_ratio": + 1.4913793103448276, "no_speech_prob": 0.04439431428909302}, {"id": 83, "seek": 46464, + "start": 464.71999999999997, "end": 472.08, "text": " these applicant tracking system. + These software''s taken your resume, extract all the keywords,", "tokens": [50368, + 613, 30915, 11603, 1185, 13, 1981, 4722, 311, 2726, 428, 15358, 11, 8947, 439, 264, + 21009, 11, 50736], "temperature": 0.0, "avg_logprob": -0.15314285406905614, "compression_ratio": + 1.596638655462185, "no_speech_prob": 0.015142472460865974}, {"id": 84, "seek": 46464, + "start": 472.08, "end": 477.91999999999996, "text": " and then they run some similarity + scores against the job description. 
It also do some custom", "tokens": [50736, 293, + 550, 436, 1190, 512, 32194, 13444, 1970, 264, 1691, 3855, 13, 467, 611, 360, 512, + 2375, 51028], "temperature": 0.0, "avg_logprob": -0.15314285406905614, "compression_ratio": + 1.596638655462185, "no_speech_prob": 0.015142472460865974}, {"id": 85, "seek": 46464, + "start": 477.91999999999996, "end": 483.91999999999996, "text": " keywords searching + as well, but that''s to per company basis. It''s like optimizing your resume,", + "tokens": [51028, 21009, 10808, 382, 731, 11, 457, 300, 311, 281, 680, 2237, 5143, + 13, 467, 311, 411, 40425, 428, 15358, 11, 51328], "temperature": 0.0, "avg_logprob": + -0.15314285406905614, "compression_ratio": 1.596638655462185, "no_speech_prob": + 0.015142472460865974}, {"id": 86, "seek": 46464, "start": 483.91999999999996, "end": + 489.12, "text": " right? I remember I was reading one book. I will need to look + it up to link it in the show notes,", "tokens": [51328, 558, 30, 286, 1604, 286, + 390, 3760, 472, 1446, 13, 286, 486, 643, 281, 574, 309, 493, 281, 2113, 309, 294, + 264, 855, 5570, 11, 51588], "temperature": 0.0, "avg_logprob": -0.15314285406905614, + "compression_ratio": 1.596638655462185, "no_speech_prob": 0.015142472460865974}, + {"id": 87, "seek": 48912, "start": 489.68, "end": 495.84000000000003, "text": " + but basically it''s a book about how you should approach job seeking, but part of + it is", "tokens": [50392, 457, 1936, 309, 311, 257, 1446, 466, 577, 291, 820, 3109, + 1691, 11670, 11, 457, 644, 295, 309, 307, 50700], "temperature": 0.0, "avg_logprob": + -0.14707955070163892, "compression_ratio": 1.6206896551724137, "no_speech_prob": + 0.008435395546257496}, {"id": 88, "seek": 48912, "start": 495.84000000000003, "end": + 501.2, "text": " basically writing your resume, right? 
I remember that it was actually + starting in reverse way,", "tokens": [50700, 1936, 3579, 428, 15358, 11, 558, 30, + 286, 1604, 300, 309, 390, 767, 2891, 294, 9943, 636, 11, 50968], "temperature": + 0.0, "avg_logprob": -0.14707955070163892, "compression_ratio": 1.6206896551724137, + "no_speech_prob": 0.008435395546257496}, {"id": 89, "seek": 48912, "start": 501.2, + "end": 509.44, "text": " the same way you just explained, but it would first ask + you to list the jobs that way you want to", "tokens": [50968, 264, 912, 636, 291, + 445, 8825, 11, 457, 309, 576, 700, 1029, 291, 281, 1329, 264, 4782, 300, 636, 291, + 528, 281, 51380], "temperature": 0.0, "avg_logprob": -0.14707955070163892, "compression_ratio": + 1.6206896551724137, "no_speech_prob": 0.008435395546257496}, {"id": 90, "seek": + 48912, "start": 509.44, "end": 514.4, "text": " apply, right? Sort of like scope + them. Then you kind of try to summarize what''s common in there,", "tokens": [51380, + 3079, 11, 558, 30, 26149, 295, 411, 11923, 552, 13, 1396, 291, 733, 295, 853, 281, + 20858, 437, 311, 2689, 294, 456, 11, 51628], "temperature": 0.0, "avg_logprob": + -0.14707955070163892, "compression_ratio": 1.6206896551724137, "no_speech_prob": + 0.008435395546257496}, {"id": 91, "seek": 51440, "start": 514.4, "end": 519.52, + "text": " if there is some commonality, and then you go backwards to your experience + and you try to kind of", "tokens": [50364, 498, 456, 307, 512, 2689, 1860, 11, 293, + 550, 291, 352, 12204, 281, 428, 1752, 293, 291, 853, 281, 733, 295, 50620], "temperature": + 0.0, "avg_logprob": -0.19392798928653493, "compression_ratio": 1.6593406593406594, + "no_speech_prob": 0.007620566058903933}, {"id": 92, "seek": 51440, "start": 519.52, + "end": 524.3199999999999, "text": " match these things in a proper way, right? 
You + even need to start your resume from the key", "tokens": [50620, 2995, 613, 721, + 294, 257, 2296, 636, 11, 558, 30, 509, 754, 643, 281, 722, 428, 15358, 490, 264, + 2141, 50860], "temperature": 0.0, "avg_logprob": -0.19392798928653493, "compression_ratio": + 1.6593406593406594, "no_speech_prob": 0.007620566058903933}, {"id": 93, "seek": + 51440, "start": 525.12, "end": 531.04, "text": " skill that this job adds a looking + for, and that''s how your resume is going to stand out. That''s", "tokens": [50900, + 5389, 300, 341, 1691, 10860, 257, 1237, 337, 11, 293, 300, 311, 577, 428, 15358, + 307, 516, 281, 1463, 484, 13, 663, 311, 51196], "temperature": 0.0, "avg_logprob": + -0.19392798928653493, "compression_ratio": 1.6593406593406594, "no_speech_prob": + 0.007620566058903933}, {"id": 94, "seek": 51440, "start": 531.04, "end": 535.28, + "text": " amazing way that you cracked it. Basically, I don''t know if you''ve read + that book, but...", "tokens": [51196, 2243, 636, 300, 291, 25140, 309, 13, 8537, + 11, 286, 500, 380, 458, 498, 291, 600, 1401, 300, 1446, 11, 457, 485, 51408], "temperature": + 0.0, "avg_logprob": -0.19392798928653493, "compression_ratio": 1.6593406593406594, + "no_speech_prob": 0.007620566058903933}, {"id": 95, "seek": 51440, "start": 535.28, + "end": 544.16, "text": " No, I''m not. That''s amazing. This tool has now how many + contributors? It''s an", "tokens": [51408, 883, 11, 286, 478, 406, 13, 663, 311, + 2243, 13, 639, 2290, 575, 586, 577, 867, 45627, 30, 467, 311, 364, 51852], "temperature": + 0.0, "avg_logprob": -0.19392798928653493, "compression_ratio": 1.6593406593406594, + "no_speech_prob": 0.007620566058903933}, {"id": 96, "seek": 54416, "start": 544.16, + "end": 556.56, "text": " open source, right? Sorry, I missed it a bit. So this tool + is open source, right? 
Yeah, open source with", "tokens": [50364, 1269, 4009, 11, + 558, 30, 4919, 11, 286, 6721, 309, 257, 857, 13, 407, 341, 2290, 307, 1269, 4009, + 11, 558, 30, 865, 11, 1269, 4009, 365, 50984], "temperature": 0.0, "avg_logprob": + -0.18375730514526367, "compression_ratio": 1.4898989898989898, "no_speech_prob": + 0.005246844608336687}, {"id": 97, "seek": 54416, "start": 556.56, "end": 563.52, + "text": " Apache 2.0 license. Yeah, and how many... Part of the journey, it''s not + just getting stars,", "tokens": [50984, 46597, 568, 13, 15, 10476, 13, 865, 11, + 293, 577, 867, 485, 4100, 295, 264, 4671, 11, 309, 311, 406, 445, 1242, 6105, 11, + 51332], "temperature": 0.0, "avg_logprob": -0.18375730514526367, "compression_ratio": + 1.4898989898989898, "no_speech_prob": 0.005246844608336687}, {"id": 98, "seek": + 54416, "start": 563.52, "end": 570.9599999999999, "text": " you did get stars, which + is amazing, but I think an even more interesting result, at least for me,", "tokens": + [51332, 291, 630, 483, 6105, 11, 597, 307, 2243, 11, 457, 286, 519, 364, 754, 544, + 1880, 1874, 11, 412, 1935, 337, 385, 11, 51704], "temperature": 0.0, "avg_logprob": + -0.18375730514526367, "compression_ratio": 1.4898989898989898, "no_speech_prob": + 0.005246844608336687}, {"id": 99, "seek": 57096, "start": 570.96, "end": 577.0400000000001, + "text": " is that you got contributors besides yourself. And I think last time I + checked was today,", "tokens": [50364, 307, 300, 291, 658, 45627, 11868, 1803, 13, + 400, 286, 519, 1036, 565, 286, 10033, 390, 965, 11, 50668], "temperature": 0.0, + "avg_logprob": -0.19580082226825016, "compression_ratio": 1.5593220338983051, "no_speech_prob": + 0.039533793926239014}, {"id": 100, "seek": 57096, "start": 577.0400000000001, "end": + 582.32, "text": " like you have, I don''t know, dozens of them. How did that happen? 
+ Yeah, more than...", "tokens": [50668, 411, 291, 362, 11, 286, 500, 380, 458, 11, + 18431, 295, 552, 13, 1012, 630, 300, 1051, 30, 865, 11, 544, 813, 485, 50932], "temperature": + 0.0, "avg_logprob": -0.19580082226825016, "compression_ratio": 1.5593220338983051, + "no_speech_prob": 0.039533793926239014}, {"id": 101, "seek": 57096, "start": 584.08, + "end": 590.8000000000001, "text": " So this advice would be more towards the open + source companies out there. We''ve seen a lot of", "tokens": [51020, 407, 341, 5192, + 576, 312, 544, 3030, 264, 1269, 4009, 3431, 484, 456, 13, 492, 600, 1612, 257, 688, + 295, 51356], "temperature": 0.0, "avg_logprob": -0.19580082226825016, "compression_ratio": + 1.5593220338983051, "no_speech_prob": 0.039533793926239014}, {"id": 102, "seek": + 57096, "start": 590.8000000000001, "end": 596.24, "text": " them getting started. + Why a combinator has a lot of... They''ve started funding a lot of open source", + "tokens": [51356, 552, 1242, 1409, 13, 1545, 257, 2512, 31927, 575, 257, 688, 295, + 485, 814, 600, 1409, 6137, 257, 688, 295, 1269, 4009, 51628], "temperature": 0.0, + "avg_logprob": -0.19580082226825016, "compression_ratio": 1.5593220338983051, "no_speech_prob": + 0.039533793926239014}, {"id": 103, "seek": 59624, "start": 596.32, "end": 601.12, + "text": " companies out there, which is pretty amazing. We''re from vector database, + machine learning,", "tokens": [50368, 3431, 484, 456, 11, 597, 307, 1238, 2243, + 13, 492, 434, 490, 8062, 8149, 11, 3479, 2539, 11, 50608], "temperature": 0.0, "avg_logprob": + -0.12437812631780451, "compression_ratio": 1.701818181818182, "no_speech_prob": + 0.04596226289868355}, {"id": 104, "seek": 59624, "start": 602.5600000000001, "end": + 609.12, "text": " tools, generative AI tools, all those amazing things. 
So the interesting + part of this is that", "tokens": [50680, 3873, 11, 1337, 1166, 7318, 3873, 11, 439, + 729, 2243, 721, 13, 407, 264, 1880, 644, 295, 341, 307, 300, 51008], "temperature": + 0.0, "avg_logprob": -0.12437812631780451, "compression_ratio": 1.701818181818182, + "no_speech_prob": 0.04596226289868355}, {"id": 105, "seek": 59624, "start": 610.0, + "end": 615.04, "text": " trending gets you visibility. Like, if you are on the GitHub + trending feed, it will give you", "tokens": [51052, 28692, 2170, 291, 19883, 13, + 1743, 11, 498, 291, 366, 322, 264, 23331, 28692, 3154, 11, 309, 486, 976, 291, 51304], + "temperature": 0.0, "avg_logprob": -0.12437812631780451, "compression_ratio": 1.701818181818182, + "no_speech_prob": 0.04596226289868355}, {"id": 106, "seek": 59624, "start": 615.04, + "end": 620.4, "text": " visibility. And there are all the people out there who are + like looking for their next content.", "tokens": [51304, 19883, 13, 400, 456, 366, + 439, 264, 561, 484, 456, 567, 366, 411, 1237, 337, 641, 958, 2701, 13, 51572], "temperature": + 0.0, "avg_logprob": -0.12437812631780451, "compression_ratio": 1.701818181818182, + "no_speech_prob": 0.04596226289868355}, {"id": 107, "seek": 59624, "start": 620.4, + "end": 625.6800000000001, "text": " So we have a lot of YouTubers out there who + talk about open source projects. We have a lot of", "tokens": [51572, 407, 321, + 362, 257, 688, 295, 30571, 484, 456, 567, 751, 466, 1269, 4009, 4455, 13, 492, 362, + 257, 688, 295, 51836], "temperature": 0.0, "avg_logprob": -0.12437812631780451, + "compression_ratio": 1.701818181818182, "no_speech_prob": 0.04596226289868355}, + {"id": 108, "seek": 62568, "start": 625.68, "end": 630.7199999999999, "text": " + blog writers out there who tried about... 
Even I have also written about it in certain", + "tokens": [50364, 6968, 13491, 484, 456, 567, 3031, 466, 485, 2754, 286, 362, 611, + 3720, 466, 309, 294, 1629, 50616], "temperature": 0.0, "avg_logprob": -0.17008523683290225, + "compression_ratio": 1.6996336996336996, "no_speech_prob": 0.005732365883886814}, + {"id": 109, "seek": 62568, "start": 632.4, "end": 637.4399999999999, "text": " in + certain of my blogs as well, like the other different tools. So trending feed gives + you", "tokens": [50700, 294, 1629, 295, 452, 31038, 382, 731, 11, 411, 264, 661, + 819, 3873, 13, 407, 28692, 3154, 2709, 291, 50952], "temperature": 0.0, "avg_logprob": + -0.17008523683290225, "compression_ratio": 1.6996336996336996, "no_speech_prob": + 0.005732365883886814}, {"id": 110, "seek": 62568, "start": 637.4399999999999, "end": + 642.0799999999999, "text": " visibility. People take it out and then people tweet + about the project. People talk in their", "tokens": [50952, 19883, 13, 3432, 747, + 309, 484, 293, 550, 561, 15258, 466, 264, 1716, 13, 3432, 751, 294, 641, 51184], + "temperature": 0.0, "avg_logprob": -0.17008523683290225, "compression_ratio": 1.6996336996336996, + "no_speech_prob": 0.005732365883886814}, {"id": 111, "seek": 62568, "start": 642.0799999999999, + "end": 648.64, "text": " different forums. It gets reshared. So this gives you a + really good visibility. Probably maybe like", "tokens": [51184, 819, 26998, 13, + 467, 2170, 725, 71, 1642, 13, 407, 341, 2709, 291, 257, 534, 665, 19883, 13, 9210, + 1310, 411, 51512], "temperature": 0.0, "avg_logprob": -0.17008523683290225, "compression_ratio": + 1.6996336996336996, "no_speech_prob": 0.005732365883886814}, {"id": 112, "seek": + 62568, "start": 648.64, "end": 655.4399999999999, "text": " improves up your SEO + a bit. That I would say. 
And once you are visible, it''s going to be like", "tokens": + [51512, 24771, 493, 428, 22964, 257, 857, 13, 663, 286, 576, 584, 13, 400, 1564, + 291, 366, 8974, 11, 309, 311, 516, 281, 312, 411, 51852], "temperature": 0.0, "avg_logprob": + -0.17008523683290225, "compression_ratio": 1.6996336996336996, "no_speech_prob": + 0.005732365883886814}, {"id": 113, "seek": 65544, "start": 655.44, "end": 659.7600000000001, + "text": " more people are going to download that project. They will try it out. + And if they find out a bug,", "tokens": [50364, 544, 561, 366, 516, 281, 5484, 300, + 1716, 13, 814, 486, 853, 309, 484, 13, 400, 498, 436, 915, 484, 257, 7426, 11, 50580], + "temperature": 0.0, "avg_logprob": -0.14161189256516177, "compression_ratio": 1.7296296296296296, + "no_speech_prob": 0.005547659937292337}, {"id": 114, "seek": 65544, "start": 659.7600000000001, + "end": 664.32, "text": " they will contribute it. And some other people are really + enthusiastic. They want to learn", "tokens": [50580, 436, 486, 10586, 309, 13, 400, + 512, 661, 561, 366, 534, 28574, 13, 814, 528, 281, 1466, 50808], "temperature": + 0.0, "avg_logprob": -0.14161189256516177, "compression_ratio": 1.7296296296296296, + "no_speech_prob": 0.005547659937292337}, {"id": 115, "seek": 65544, "start": 664.32, + "end": 670.1600000000001, "text": " hopping into your community. They will talk + about like, hey, can I contribute? Is this an issue", "tokens": [50808, 47199, 666, + 428, 1768, 13, 814, 486, 751, 466, 411, 11, 4177, 11, 393, 286, 10586, 30, 1119, + 341, 364, 2734, 51100], "temperature": 0.0, "avg_logprob": -0.14161189256516177, + "compression_ratio": 1.7296296296296296, "no_speech_prob": 0.005547659937292337}, + {"id": 116, "seek": 65544, "start": 670.1600000000001, "end": 676.6400000000001, + "text": " that I want to talk about? 
So the whole thing with the resume match was + the same as well.", "tokens": [51100, 300, 286, 528, 281, 751, 466, 30, 407, 264, + 1379, 551, 365, 264, 15358, 2995, 390, 264, 912, 382, 731, 13, 51424], "temperature": + 0.0, "avg_logprob": -0.14161189256516177, "compression_ratio": 1.7296296296296296, + "no_speech_prob": 0.005547659937292337}, {"id": 117, "seek": 65544, "start": 676.6400000000001, + "end": 684.5600000000001, "text": " Yeah, it''s amazing. And then I feel like maybe + these people also found your project relevant", "tokens": [51424, 865, 11, 309, + 311, 2243, 13, 400, 550, 286, 841, 411, 1310, 613, 561, 611, 1352, 428, 1716, 7340, + 51820], "temperature": 0.0, "avg_logprob": -0.14161189256516177, "compression_ratio": + 1.7296296296296296, "no_speech_prob": 0.005547659937292337}, {"id": 118, "seek": + 68456, "start": 684.56, "end": 690.7199999999999, "text": " to themselves or maybe + to their connections. I was checking your product hand and I see some", "tokens": + [50364, 281, 2969, 420, 1310, 281, 641, 9271, 13, 286, 390, 8568, 428, 1674, 1011, + 293, 286, 536, 512, 50672], "temperature": 0.0, "avg_logprob": -0.17783841981992618, + "compression_ratio": 1.534412955465587, "no_speech_prob": 0.013679549098014832}, + {"id": 119, "seek": 68456, "start": 690.7199999999999, "end": 695.76, "text": " + comments where people say that, hey, this is amazing. I also recommended it to a + friend of mine", "tokens": [50672, 3053, 689, 561, 584, 300, 11, 4177, 11, 341, + 307, 2243, 13, 286, 611, 9628, 309, 281, 257, 1277, 295, 3892, 50924], "temperature": + 0.0, "avg_logprob": -0.17783841981992618, "compression_ratio": 1.534412955465587, + "no_speech_prob": 0.013679549098014832}, {"id": 120, "seek": 68456, "start": 695.76, + "end": 702.64, "text": " who''s also looking for a job. 
And I think this market, + of course, became like IT sector became", "tokens": [50924, 567, 311, 611, 1237, + 337, 257, 1691, 13, 400, 286, 519, 341, 2142, 11, 295, 1164, 11, 3062, 411, 6783, + 6977, 3062, 51268], "temperature": 0.0, "avg_logprob": -0.17783841981992618, "compression_ratio": + 1.534412955465587, "no_speech_prob": 0.013679549098014832}, {"id": 121, "seek": + 68456, "start": 702.64, "end": 709.92, "text": " a little volatile. Probably around + the world, right? So you kind of kicked in with this project", "tokens": [51268, + 257, 707, 34377, 13, 9210, 926, 264, 1002, 11, 558, 30, 407, 291, 733, 295, 14609, + 294, 365, 341, 1716, 51632], "temperature": 0.0, "avg_logprob": -0.17783841981992618, + "compression_ratio": 1.534412955465587, "no_speech_prob": 0.013679549098014832}, + {"id": 122, "seek": 70992, "start": 710.16, "end": 719.76, "text": " right? Yes, + I would say that may be my timing was perfect in that sense. Like the moment,", + "tokens": [50376, 558, 30, 1079, 11, 286, 576, 584, 300, 815, 312, 452, 10822, 390, + 2176, 294, 300, 2020, 13, 1743, 264, 1623, 11, 50856], "temperature": 0.0, "avg_logprob": + -0.24509391269168337, "compression_ratio": 1.5555555555555556, "no_speech_prob": + 0.027569327503442764}, {"id": 123, "seek": 70992, "start": 722.56, "end": 726.88, + "text": " I would not like to stay in that sense, but I just wrote the product. + I never knew that the", "tokens": [50996, 286, 576, 406, 411, 281, 1754, 294, 300, + 2020, 11, 457, 286, 445, 4114, 264, 1674, 13, 286, 1128, 2586, 300, 264, 51212], + "temperature": 0.0, "avg_logprob": -0.24509391269168337, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.027569327503442764}, {"id": 124, "seek": 70992, "start": 726.88, + "end": 734.7199999999999, "text": " market was going to be in that same thing. 
So + yeah, the resume measure took the advantage of time,", "tokens": [51212, 2142, 390, + 516, 281, 312, 294, 300, 912, 551, 13, 407, 1338, 11, 264, 15358, 3481, 1890, 264, + 5002, 295, 565, 11, 51604], "temperature": 0.0, "avg_logprob": -0.24509391269168337, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.027569327503442764}, + {"id": 125, "seek": 73472, "start": 735.44, "end": 741.44, "text": " we''re also + seeing a lot of generative AI based open source startups. Lama index is pretty amazing", + "tokens": [50400, 321, 434, 611, 2577, 257, 688, 295, 1337, 1166, 7318, 2361, 1269, + 4009, 28041, 13, 441, 2404, 8186, 307, 1238, 2243, 50700], "temperature": 0.0, "avg_logprob": + -0.22733075011010265, "compression_ratio": 1.5214007782101167, "no_speech_prob": + 0.07012353092432022}, {"id": 126, "seek": 73472, "start": 741.44, "end": 746.4, + "text": " in that sense. Like I know Lama index and Langshane trying them out when + they were like really small,", "tokens": [50700, 294, 300, 2020, 13, 1743, 286, + 458, 441, 2404, 8186, 293, 13313, 2716, 1929, 1382, 552, 484, 562, 436, 645, 411, + 534, 1359, 11, 50948], "temperature": 0.0, "avg_logprob": -0.22733075011010265, + "compression_ratio": 1.5214007782101167, "no_speech_prob": 0.07012353092432022}, + {"id": 127, "seek": 73472, "start": 747.2, "end": 753.36, "text": " they were just + growing out there. And then after like 12, 13 months, I believe? Well, yeah,", "tokens": + [50988, 436, 645, 445, 4194, 484, 456, 13, 400, 550, 934, 411, 2272, 11, 3705, 2493, + 11, 286, 1697, 30, 1042, 11, 1338, 11, 51296], "temperature": 0.0, "avg_logprob": + -0.22733075011010265, "compression_ratio": 1.5214007782101167, "no_speech_prob": + 0.07012353092432022}, {"id": 128, "seek": 73472, "start": 753.36, "end": 759.12, + "text": " around nearly a year, maybe less. Now they''re like full-blown companies. 
+ They''ve got an investment.", "tokens": [51296, 926, 6217, 257, 1064, 11, 1310, + 1570, 13, 823, 436, 434, 411, 1577, 12, 5199, 648, 3431, 13, 814, 600, 658, 364, + 6078, 13, 51584], "temperature": 0.0, "avg_logprob": -0.22733075011010265, "compression_ratio": + 1.5214007782101167, "no_speech_prob": 0.07012353092432022}, {"id": 129, "seek": + 75912, "start": 759.12, "end": 765.6, "text": " They have their own cloud tier. + So maybe the whole next big shift towards open source.", "tokens": [50364, 814, + 362, 641, 1065, 4588, 12362, 13, 407, 1310, 264, 1379, 958, 955, 5513, 3030, 1269, + 4009, 13, 50688], "temperature": 0.0, "avg_logprob": -0.1610635450516624, "compression_ratio": + 1.5474137931034482, "no_speech_prob": 0.043372150510549545}, {"id": 130, "seek": + 75912, "start": 766.32, "end": 773.04, "text": " Yeah, exactly. Well, getting a + bit more technical, can you also unveil the some of the", "tokens": [50724, 865, + 11, 2293, 13, 1042, 11, 1242, 257, 857, 544, 6191, 11, 393, 291, 611, 31009, 388, + 264, 512, 295, 264, 51060], "temperature": 0.0, "avg_logprob": -0.1610635450516624, + "compression_ratio": 1.5474137931034482, "no_speech_prob": 0.043372150510549545}, + {"id": 131, "seek": 75912, "start": 773.04, "end": 778.48, "text": " architecture + decisions you made for this tool? I know that you are using vector search.", "tokens": + [51060, 9482, 5327, 291, 1027, 337, 341, 2290, 30, 286, 458, 300, 291, 366, 1228, + 8062, 3164, 13, 51332], "temperature": 0.0, "avg_logprob": -0.1610635450516624, + "compression_ratio": 1.5474137931034482, "no_speech_prob": 0.043372150510549545}, + {"id": 132, "seek": 75912, "start": 778.48, "end": 783.6800000000001, "text": " + Is that right? Like what are you using? You are using some library or database. 
+ Maybe you started", "tokens": [51332, 1119, 300, 558, 30, 1743, 437, 366, 291, 1228, + 30, 509, 366, 1228, 512, 6405, 420, 8149, 13, 2704, 291, 1409, 51592], "temperature": + 0.0, "avg_logprob": -0.1610635450516624, "compression_ratio": 1.5474137931034482, + "no_speech_prob": 0.043372150510549545}, {"id": 133, "seek": 78368, "start": 783.76, + "end": 788.9599999999999, "text": " using a database. Can you explain a bit more + about the architecture of your system?", "tokens": [50368, 1228, 257, 8149, 13, + 1664, 291, 2903, 257, 857, 544, 466, 264, 9482, 295, 428, 1185, 30, 50628], "temperature": + 0.0, "avg_logprob": -0.26963843797382553, "compression_ratio": 1.5303867403314917, + "no_speech_prob": 0.015871595591306686}, {"id": 134, "seek": 78368, "start": 791.52, + "end": 799.92, "text": " I use like basic tools such as PDF minor or PDF extraction + tools, word text extraction tools,", "tokens": [50756, 286, 764, 411, 3875, 3873, + 1270, 382, 17752, 6696, 420, 17752, 30197, 3873, 11, 1349, 2487, 30197, 3873, 11, + 51176], "temperature": 0.0, "avg_logprob": -0.26963843797382553, "compression_ratio": + 1.5303867403314917, "no_speech_prob": 0.015871595591306686}, {"id": 135, "seek": + 78368, "start": 799.92, "end": 808.7199999999999, "text": " I would say. I use that. + Then I use libraries such as Spacy, Annel TK. 
And yeah, Spacy and Annel TK", "tokens": + [51176, 286, 576, 584, 13, 286, 764, 300, 13, 1396, 286, 764, 15148, 1270, 382, + 1738, 2551, 11, 1107, 6396, 314, 42, 13, 400, 1338, 11, 1738, 2551, 293, 1107, 6396, + 314, 42, 51616], "temperature": 0.0, "avg_logprob": -0.26963843797382553, "compression_ratio": + 1.5303867403314917, "no_speech_prob": 0.015871595591306686}, {"id": 136, "seek": + 80872, "start": 809.36, "end": 816.08, "text": " code that I''ve written to combine + them and use algorithms to extract chunks of text.", "tokens": [50396, 3089, 300, + 286, 600, 3720, 281, 10432, 552, 293, 764, 14642, 281, 8947, 24004, 295, 2487, 13, + 50732], "temperature": 0.0, "avg_logprob": -0.171519564486098, "compression_ratio": + 1.7272727272727273, "no_speech_prob": 0.10209817439317703}, {"id": 137, "seek": + 80872, "start": 816.08, "end": 822.1600000000001, "text": " And then I use something + called as, so I was using quadrant vector database to do the vector", "tokens": + [50732, 400, 550, 286, 764, 746, 1219, 382, 11, 370, 286, 390, 1228, 46856, 8062, + 8149, 281, 360, 264, 8062, 51036], "temperature": 0.0, "avg_logprob": -0.171519564486098, + "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.10209817439317703}, + {"id": 138, "seek": 80872, "start": 822.1600000000001, "end": 827.36, "text": " + embedding. But later on quadrant introduced something called as fast embed, which + is pretty", "tokens": [51036, 12240, 3584, 13, 583, 1780, 322, 46856, 7268, 746, + 1219, 382, 2370, 12240, 11, 597, 307, 1238, 51296], "temperature": 0.0, "avg_logprob": + -0.171519564486098, "compression_ratio": 1.7272727272727273, "no_speech_prob": 0.10209817439317703}, + {"id": 139, "seek": 80872, "start": 827.36, "end": 834.72, "text": " amazing. And + it can do the text to vector embedding on the fly. 
And then using that, I can,", + "tokens": [51296, 2243, 13, 400, 309, 393, 360, 264, 2487, 281, 8062, 12240, 3584, + 322, 264, 3603, 13, 400, 550, 1228, 300, 11, 286, 393, 11, 51664], "temperature": + 0.0, "avg_logprob": -0.171519564486098, "compression_ratio": 1.7272727272727273, + "no_speech_prob": 0.10209817439317703}, {"id": 140, "seek": 83472, "start": 834.72, + "end": 840.24, "text": " I''ve written like someone contributed the code to do the + co-science similarity for that.", "tokens": [50364, 286, 600, 3720, 411, 1580, 18434, + 264, 3089, 281, 360, 264, 598, 12, 82, 6699, 32194, 337, 300, 13, 50640], "temperature": + 0.0, "avg_logprob": -0.18911594152450562, "compression_ratio": 1.6106194690265487, + "no_speech_prob": 0.003894459456205368}, {"id": 141, "seek": 83472, "start": 841.28, + "end": 849.9200000000001, "text": " So extract the text, do the analysis using Spacy, + Annel TK, and there''s also another", "tokens": [50692, 407, 8947, 264, 2487, 11, + 360, 264, 5215, 1228, 1738, 2551, 11, 1107, 6396, 314, 42, 11, 293, 456, 311, 611, + 1071, 51124], "temperature": 0.0, "avg_logprob": -0.18911594152450562, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.003894459456205368}, {"id": 142, "seek": + 83472, "start": 849.9200000000001, "end": 856.72, "text": " wrapper on top of Spacy. + Use that. Get the code of the data, visualize it, and then send it to", "tokens": + [51124, 46906, 322, 1192, 295, 1738, 2551, 13, 8278, 300, 13, 3240, 264, 3089, 295, + 264, 1412, 11, 23273, 309, 11, 293, 550, 2845, 309, 281, 51464], "temperature": + 0.0, "avg_logprob": -0.18911594152450562, "compression_ratio": 1.6106194690265487, + "no_speech_prob": 0.003894459456205368}, {"id": 143, "seek": 83472, "start": 856.72, + "end": 863.28, "text": " the people out there for the vector embedding. 
But the + search is something that I would like to", "tokens": [51464, 264, 561, 484, 456, + 337, 264, 8062, 12240, 3584, 13, 583, 264, 3164, 307, 746, 300, 286, 576, 411, 281, + 51792], "temperature": 0.0, "avg_logprob": -0.18911594152450562, "compression_ratio": + 1.6106194690265487, "no_speech_prob": 0.003894459456205368}, {"id": 144, "seek": + 86328, "start": 863.28, "end": 868.3199999999999, "text": " introduce. Maybe generative + AI is also that something that I''m working on as well. So the", "tokens": [50364, + 5366, 13, 2704, 1337, 1166, 7318, 307, 611, 300, 746, 300, 286, 478, 1364, 322, + 382, 731, 13, 407, 264, 50616], "temperature": 0.0, "avg_logprob": -0.20775381442719856, + "compression_ratio": 1.6498194945848375, "no_speech_prob": 0.0069742887280881405}, + {"id": 145, "seek": 86328, "start": 868.3199999999999, "end": 874.72, "text": " + current goal in the couple of months is to get a dynamic ATS that takes in your + resume,", "tokens": [50616, 2190, 3387, 294, 264, 1916, 295, 2493, 307, 281, 483, + 257, 8546, 316, 7327, 300, 2516, 294, 428, 15358, 11, 50936], "temperature": 0.0, + "avg_logprob": -0.20775381442719856, "compression_ratio": 1.6498194945848375, "no_speech_prob": + 0.0069742887280881405}, {"id": 146, "seek": 86328, "start": 875.8399999999999, "end": + 880.8, "text": " takes in the job description, optimizes it without hallucinating, + without adding some extra", "tokens": [50992, 2516, 294, 264, 1691, 3855, 11, 5028, + 5660, 309, 1553, 35212, 8205, 11, 1553, 5127, 512, 2857, 51240], "temperature": + 0.0, "avg_logprob": -0.20775381442719856, "compression_ratio": 1.6498194945848375, + "no_speech_prob": 0.0069742887280881405}, {"id": 147, "seek": 86328, "start": 880.8, + "end": 887.28, "text": " keywords out there. Like as a chava developer, you don''t + want cloud. 
If you haven''t worked in cloud,", "tokens": [51240, 21009, 484, 456, + 13, 1743, 382, 257, 417, 4061, 10754, 11, 291, 500, 380, 528, 4588, 13, 759, 291, + 2378, 380, 2732, 294, 4588, 11, 51564], "temperature": 0.0, "avg_logprob": -0.20775381442719856, + "compression_ratio": 1.6498194945848375, "no_speech_prob": 0.0069742887280881405}, + {"id": 148, "seek": 86328, "start": 887.28, "end": 892.72, "text": " then you don''t + want Kubernetes to be suddenly added into your resume. Or maybe it, you", "tokens": + [51564, 550, 291, 500, 380, 528, 23145, 281, 312, 5800, 3869, 666, 428, 15358, 13, + 1610, 1310, 309, 11, 291, 51836], "temperature": 0.0, "avg_logprob": -0.20775381442719856, + "compression_ratio": 1.6498194945848375, "no_speech_prob": 0.0069742887280881405}, + {"id": 149, "seek": 89272, "start": 892.72, "end": 897.0400000000001, "text": " + don''t want it to say that, hey, this guy has worked out. Some other company and + created", "tokens": [50364, 500, 380, 528, 309, 281, 584, 300, 11, 4177, 11, 341, + 2146, 575, 2732, 484, 13, 2188, 661, 2237, 293, 2942, 50580], "temperature": 0.0, + "avg_logprob": -0.25177805474463927, "compression_ratio": 1.623931623931624, "no_speech_prob": + 0.0038600496482104063}, {"id": 150, "seek": 89272, "start": 898.0, "end": 905.9200000000001, + "text": " open AI is any any item stuff that comes out. So limiting the hallucination + and all that thing", "tokens": [50628, 1269, 7318, 307, 604, 604, 3174, 1507, 300, + 1487, 484, 13, 407, 22083, 264, 35212, 2486, 293, 439, 300, 551, 51024], "temperature": + 0.0, "avg_logprob": -0.25177805474463927, "compression_ratio": 1.623931623931624, + "no_speech_prob": 0.0038600496482104063}, {"id": 151, "seek": 89272, "start": 906.5600000000001, + "end": 913.76, "text": " to introduce generative AI is somewhere in between. Yeah, + yeah, yeah, but sure. 
And are you also like", "tokens": [51056, 281, 5366, 1337, + 1166, 7318, 307, 4079, 294, 1296, 13, 865, 11, 1338, 11, 1338, 11, 457, 988, 13, + 400, 366, 291, 611, 411, 51416], "temperature": 0.0, "avg_logprob": -0.25177805474463927, + "compression_ratio": 1.623931623931624, "no_speech_prob": 0.0038600496482104063}, + {"id": 152, "seek": 89272, "start": 913.76, "end": 921.36, "text": " planning to + make it like a cloud version or something that you host for other people to access?", + "tokens": [51416, 5038, 281, 652, 309, 411, 257, 4588, 3037, 420, 746, 300, 291, + 3975, 337, 661, 561, 281, 2105, 30, 51796], "temperature": 0.0, "avg_logprob": -0.25177805474463927, + "compression_ratio": 1.623931623931624, "no_speech_prob": 0.0038600496482104063}, + {"id": 153, "seek": 92136, "start": 921.36, "end": 926.96, "text": " Or is it so + that at the moment, it''s mostly like self service, right? So people need to download", + "tokens": [50364, 1610, 307, 309, 370, 300, 412, 264, 1623, 11, 309, 311, 5240, + 411, 2698, 2643, 11, 558, 30, 407, 561, 643, 281, 5484, 50644], "temperature": 0.0, + "avg_logprob": -0.16978683471679687, "compression_ratio": 1.6081632653061224, "no_speech_prob": + 0.0033344458788633347}, {"id": 154, "seek": 92136, "start": 926.96, "end": 935.28, + "text": " your repository, set it up and start using it. 
The challenges with cloud + is that cloud is not free.", "tokens": [50644, 428, 25841, 11, 992, 309, 493, 293, + 722, 1228, 309, 13, 440, 4759, 365, 4588, 307, 300, 4588, 307, 406, 1737, 13, 51060], + "temperature": 0.0, "avg_logprob": -0.16978683471679687, "compression_ratio": 1.6081632653061224, + "no_speech_prob": 0.0033344458788633347}, {"id": 155, "seek": 92136, "start": 936.72, + "end": 942.48, "text": " And if I were was to introduce a cloud variant that has + to be like, have a paywall, people have to", "tokens": [51132, 400, 498, 286, 645, + 390, 281, 5366, 257, 4588, 17501, 300, 575, 281, 312, 411, 11, 362, 257, 1689, 16256, + 11, 561, 362, 281, 51420], "temperature": 0.0, "avg_logprob": -0.16978683471679687, + "compression_ratio": 1.6081632653061224, "no_speech_prob": 0.0033344458788633347}, + {"id": 156, "seek": 92136, "start": 942.48, "end": 949.76, "text": " log in, they + have to subscribe and all that thing. It''s just a far fetched goal that if I want + to,", "tokens": [51420, 3565, 294, 11, 436, 362, 281, 3022, 293, 439, 300, 551, + 13, 467, 311, 445, 257, 1400, 23673, 292, 3387, 300, 498, 286, 528, 281, 11, 51784], + "temperature": 0.0, "avg_logprob": -0.16978683471679687, "compression_ratio": 1.6081632653061224, + "no_speech_prob": 0.0033344458788633347}, {"id": 157, "seek": 94976, "start": 949.76, + "end": 956.0, "text": " I can get a paid version of it. But what it really want + is downloadable version that people can", "tokens": [50364, 286, 393, 483, 257, + 4835, 3037, 295, 309, 13, 583, 437, 309, 534, 528, 307, 5484, 712, 3037, 300, 561, + 393, 50676], "temperature": 0.0, "avg_logprob": -0.21781382353409476, "compression_ratio": + 1.6772727272727272, "no_speech_prob": 0.00868752971291542}, {"id": 158, "seek": + 94976, "start": 956.0, "end": 961.52, "text": " download and easily access, not + just software developers, but maybe everyone out there. 
Like a Mac,", "tokens": + [50676, 5484, 293, 3612, 2105, 11, 406, 445, 4722, 8849, 11, 457, 1310, 1518, 484, + 456, 13, 1743, 257, 5707, 11, 50952], "temperature": 0.0, "avg_logprob": -0.21781382353409476, + "compression_ratio": 1.6772727272727272, "no_speech_prob": 0.00868752971291542}, + {"id": 159, "seek": 94976, "start": 962.08, "end": 968.24, "text": " Mac OS app + or an iOS app that people can download and start to. Yeah, yeah, for sure.", "tokens": + [50980, 5707, 12731, 724, 420, 364, 17430, 724, 300, 561, 393, 5484, 293, 722, 281, + 13, 865, 11, 1338, 11, 337, 988, 13, 51288], "temperature": 0.0, "avg_logprob": + -0.21781382353409476, "compression_ratio": 1.6772727272727272, "no_speech_prob": + 0.00868752971291542}, {"id": 160, "seek": 94976, "start": 969.12, "end": 975.6, + "text": " Yeah, that makes a lot of sense for sure. Yeah, you also mentioned something + about like", "tokens": [51332, 865, 11, 300, 1669, 257, 688, 295, 2020, 337, 988, + 13, 865, 11, 291, 611, 2835, 746, 466, 411, 51656], "temperature": 0.0, "avg_logprob": + -0.21781382353409476, "compression_ratio": 1.6772727272727272, "no_speech_prob": + 0.00868752971291542}, {"id": 161, "seek": 97560, "start": 976.16, "end": 982.64, + "text": " challenges maintaining this project. I don''t know. 
Is this is a challenging + to maintain a project that", "tokens": [50392, 4759, 14916, 341, 1716, 13, 286, + 500, 380, 458, 13, 1119, 341, 307, 257, 7595, 281, 6909, 257, 1716, 300, 50716], + "temperature": 0.0, "avg_logprob": -0.18686716416302848, "compression_ratio": 1.6330275229357798, + "no_speech_prob": 0.023049090057611465}, {"id": 162, "seek": 97560, "start": 982.64, + "end": 988.72, "text": " you''ve been like the author of, but now you have so many + contributors, maybe also a growing", "tokens": [50716, 291, 600, 668, 411, 264, + 3793, 295, 11, 457, 586, 291, 362, 370, 867, 45627, 11, 1310, 611, 257, 4194, 51020], + "temperature": 0.0, "avg_logprob": -0.18686716416302848, "compression_ratio": 1.6330275229357798, + "no_speech_prob": 0.023049090057611465}, {"id": 163, "seek": 97560, "start": 988.72, + "end": 994.96, "text": " demand of things and people make decisions and so on. How + do you coordinate this thing today?", "tokens": [51020, 4733, 295, 721, 293, 561, + 652, 5327, 293, 370, 322, 13, 1012, 360, 291, 15670, 341, 551, 965, 30, 51332], + "temperature": 0.0, "avg_logprob": -0.18686716416302848, "compression_ratio": 1.6330275229357798, + "no_speech_prob": 0.023049090057611465}, {"id": 164, "seek": 97560, "start": 994.96, + "end": 1002.88, "text": " Is it a challenge? Yeah, that''s an interesting question + by the way.", "tokens": [51332, 1119, 309, 257, 3430, 30, 865, 11, 300, 311, 364, + 1880, 1168, 538, 264, 636, 13, 51728], "temperature": 0.0, "avg_logprob": -0.18686716416302848, + "compression_ratio": 1.6330275229357798, "no_speech_prob": 0.023049090057611465}, + {"id": 165, "seek": 100288, "start": 1003.36, "end": 1011.36, "text": " Yeah, it + becomes challenging after a certain time. 
So when you''re trending, it gives you + a really", "tokens": [50388, 865, 11, 309, 3643, 7595, 934, 257, 1629, 565, 13, + 407, 562, 291, 434, 28692, 11, 309, 2709, 291, 257, 534, 50788], "temperature": + 0.0, "avg_logprob": -0.23481357340909997, "compression_ratio": 1.568, "no_speech_prob": + 0.006400200072675943}, {"id": 166, "seek": 100288, "start": 1011.36, "end": 1016.64, + "text": " amazing intense dopamine hit and you''re excited about it. You check out + the GitHub Fidei, you have", "tokens": [50788, 2243, 9447, 37219, 2045, 293, 291, + 434, 2919, 466, 309, 13, 509, 1520, 484, 264, 23331, 479, 482, 72, 11, 291, 362, + 51052], "temperature": 0.0, "avg_logprob": -0.23481357340909997, "compression_ratio": + 1.568, "no_speech_prob": 0.006400200072675943}, {"id": 167, "seek": 100288, "start": + 1016.64, "end": 1021.4399999999999, "text": " grown by 200 stars and everything. + I''ve seen like a lot of people joining that are talking to you.", "tokens": [51052, + 7709, 538, 2331, 6105, 293, 1203, 13, 286, 600, 1612, 411, 257, 688, 295, 561, 5549, + 300, 366, 1417, 281, 291, 13, 51292], "temperature": 0.0, "avg_logprob": -0.23481357340909997, + "compression_ratio": 1.568, "no_speech_prob": 0.006400200072675943}, {"id": 168, + "seek": 100288, "start": 1022.08, "end": 1028.0, "text": " And then after that, + after the whole phase fades out, there comes in a time when I have a lot of", "tokens": + [51324, 400, 550, 934, 300, 11, 934, 264, 1379, 5574, 32679, 484, 11, 456, 1487, + 294, 257, 565, 562, 286, 362, 257, 688, 295, 51620], "temperature": 0.0, "avg_logprob": + -0.23481357340909997, "compression_ratio": 1.568, "no_speech_prob": 0.006400200072675943}, + {"id": 169, "seek": 102800, "start": 1028.0, "end": 1034.88, "text": " full requests + out there, which we have to talk about. 
And then even if the full requests are there,", + "tokens": [50364, 1577, 12475, 484, 456, 11, 597, 321, 362, 281, 751, 466, 13, 400, + 550, 754, 498, 264, 1577, 12475, 366, 456, 11, 50708], "temperature": 0.0, "avg_logprob": + -0.17392182857432265, "compression_ratio": 1.7031963470319635, "no_speech_prob": + 0.007528632413595915}, {"id": 170, "seek": 102800, "start": 1034.88, "end": 1040.08, + "text": " you have to download them, test them, and then there could be things like + you have to request", "tokens": [50708, 291, 362, 281, 5484, 552, 11, 1500, 552, + 11, 293, 550, 456, 727, 312, 721, 411, 291, 362, 281, 5308, 50968], "temperature": + 0.0, "avg_logprob": -0.17392182857432265, "compression_ratio": 1.7031963470319635, + "no_speech_prob": 0.007528632413595915}, {"id": 171, "seek": 102800, "start": 1040.08, + "end": 1045.92, "text": " changes and it all takes up a lot of time. So it does + become a challenging but", "tokens": [50968, 2962, 293, 309, 439, 2516, 493, 257, + 688, 295, 565, 13, 407, 309, 775, 1813, 257, 7595, 457, 51260], "temperature": 0.0, + "avg_logprob": -0.17392182857432265, "compression_ratio": 1.7031963470319635, "no_speech_prob": + 0.007528632413595915}, {"id": 172, "seek": 102800, "start": 1045.92, "end": 1051.28, + "text": " hay, but that''s the part of process. Like even you do the same thing + with at your workplace as well.", "tokens": [51260, 4842, 11, 457, 300, 311, 264, + 644, 295, 1399, 13, 1743, 754, 291, 360, 264, 912, 551, 365, 412, 428, 15328, 382, + 731, 13, 51528], "temperature": 0.0, "avg_logprob": -0.17392182857432265, "compression_ratio": + 1.7031963470319635, "no_speech_prob": 0.007528632413595915}, {"id": 173, "seek": + 105128, "start": 1051.28, "end": 1059.92, "text": " So I won''t focus more much + on that. It''s the work that''s there. 
Yeah, it''s", "tokens": [50364, 407, 286, + 1582, 380, 1879, 544, 709, 322, 300, 13, 467, 311, 264, 589, 300, 311, 456, 13, + 865, 11, 309, 311, 50796], "temperature": 0.0, "avg_logprob": -0.21808244826945852, + "compression_ratio": 1.5555555555555556, "no_speech_prob": 0.009932621382176876}, + {"id": 174, "seek": 105128, "start": 1059.92, "end": 1065.92, "text": " more of + any open source project. Yeah, exactly. So it''s kind of like marathon and you need + to", "tokens": [50796, 544, 295, 604, 1269, 4009, 1716, 13, 865, 11, 2293, 13, 407, + 309, 311, 733, 295, 411, 27601, 293, 291, 643, 281, 51096], "temperature": 0.0, + "avg_logprob": -0.21808244826945852, "compression_ratio": 1.5555555555555556, "no_speech_prob": + 0.009932621382176876}, {"id": 175, "seek": 105128, "start": 1065.92, "end": 1072.32, + "text": " dedicate some chunk of your day to do stuff, right? To maintain it, to + keep it alive.", "tokens": [51096, 30718, 512, 16635, 295, 428, 786, 281, 360, 1507, + 11, 558, 30, 1407, 6909, 309, 11, 281, 1066, 309, 5465, 13, 51416], "temperature": + 0.0, "avg_logprob": -0.21808244826945852, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.009932621382176876}, {"id": 176, "seek": 105128, "start": 1073.76, + "end": 1080.56, "text": " Oh yeah, that''s amazing. You also mentioned that you + basically replicated this success of your,", "tokens": [51488, 876, 1338, 11, 300, + 311, 2243, 13, 509, 611, 2835, 300, 291, 1936, 46365, 341, 2245, 295, 428, 11, 51828], + "temperature": 0.0, "avg_logprob": -0.21808244826945852, "compression_ratio": 1.5555555555555556, + "no_speech_prob": 0.009932621382176876}, {"id": 177, "seek": 108056, "start": 1080.56, + "end": 1084.96, "text": " you became like a marketing guru in some sense, right? 
+ Like I know that there are, there are even", "tokens": [50364, 291, 3062, 411, 257, + 6370, 29949, 294, 512, 2020, 11, 558, 30, 1743, 286, 458, 300, 456, 366, 11, 456, + 366, 754, 50584], "temperature": 0.0, "avg_logprob": -0.10424149036407471, "compression_ratio": + 1.7157894736842105, "no_speech_prob": 0.005120519548654556}, {"id": 178, "seek": + 108056, "start": 1084.96, "end": 1090.0, "text": " like companies that have open + source repositories that haven''t had such a success that you''ve had in", "tokens": + [50584, 411, 3431, 300, 362, 1269, 4009, 22283, 2083, 300, 2378, 380, 632, 1270, + 257, 2245, 300, 291, 600, 632, 294, 50836], "temperature": 0.0, "avg_logprob": -0.10424149036407471, + "compression_ratio": 1.7157894736842105, "no_speech_prob": 0.005120519548654556}, + {"id": 179, "seek": 108056, "start": 1090.0, "end": 1096.1599999999999, "text": + " just a few weeks. I mean, I haven''t just really amazed by this. You also picked + up another project", "tokens": [50836, 445, 257, 1326, 3259, 13, 286, 914, 11, 286, + 2378, 380, 445, 534, 20507, 538, 341, 13, 509, 611, 6183, 493, 1071, 1716, 51144], + "temperature": 0.0, "avg_logprob": -0.10424149036407471, "compression_ratio": 1.7157894736842105, + "no_speech_prob": 0.005120519548654556}, {"id": 180, "seek": 108056, "start": 1097.04, + "end": 1104.1599999999999, "text": " and you kind of like replicated the same process + there and you grew it to like really big level.", "tokens": [51188, 293, 291, 733, + 295, 411, 46365, 264, 912, 1399, 456, 293, 291, 6109, 309, 281, 411, 534, 955, 1496, + 13, 51544], "temperature": 0.0, "avg_logprob": -0.10424149036407471, "compression_ratio": + 1.7157894736842105, "no_speech_prob": 0.005120519548654556}, {"id": 181, "seek": + 108056, "start": 1104.1599999999999, "end": 1109.6799999999998, "text": " So can + you talk a bit more about what is this project about? 
And yeah, what''s your role + there?", "tokens": [51544, 407, 393, 291, 751, 257, 857, 544, 466, 437, 307, 341, + 1716, 466, 30, 400, 1338, 11, 437, 311, 428, 3090, 456, 30, 51820], "temperature": + 0.0, "avg_logprob": -0.10424149036407471, "compression_ratio": 1.7157894736842105, + "no_speech_prob": 0.005120519548654556}, {"id": 182, "seek": 110968, "start": 1109.68, + "end": 1119.1200000000001, "text": " And what do you do? Yep. So while I was doing + resume matching and all this thing, I was looking", "tokens": [50364, 400, 437, + 360, 291, 360, 30, 7010, 13, 407, 1339, 286, 390, 884, 15358, 14324, 293, 439, 341, + 551, 11, 286, 390, 1237, 50836], "temperature": 0.0, "avg_logprob": -0.23890346220169945, + "compression_ratio": 1.5465587044534412, "no_speech_prob": 0.006029557436704636}, + {"id": 183, "seek": 110968, "start": 1119.1200000000001, "end": 1125.28, "text": + " for a search engine out there like something that could easily federate the queries + across", "tokens": [50836, 337, 257, 3164, 2848, 484, 456, 411, 746, 300, 727, 3612, + 38024, 473, 264, 24109, 2108, 51144], "temperature": 0.0, "avg_logprob": -0.23890346220169945, + "compression_ratio": 1.5465587044534412, "no_speech_prob": 0.006029557436704636}, + {"id": 184, "seek": 110968, "start": 1125.28, "end": 1130.8, "text": " different + sources and extract information. And probably the guest has also paid on the previous", + "tokens": [51144, 819, 7139, 293, 8947, 1589, 13, 400, 1391, 264, 8341, 575, 611, + 4835, 322, 264, 3894, 51420], "temperature": 0.0, "avg_logprob": -0.23890346220169945, + "compression_ratio": 1.5465587044534412, "no_speech_prob": 0.006029557436704636}, + {"id": 185, "seek": 110968, "start": 1130.8, "end": 1138.96, "text": " version of + season two of vector podcast. That''s when it was world search. 
So we had a talk + with Sid", "tokens": [51420, 3037, 295, 3196, 732, 295, 8062, 7367, 13, 663, 311, + 562, 309, 390, 1002, 3164, 13, 407, 321, 632, 257, 751, 365, 19797, 51828], "temperature": + 0.0, "avg_logprob": -0.23890346220169945, "compression_ratio": 1.5465587044534412, + "no_speech_prob": 0.006029557436704636}, {"id": 186, "seek": 113968, "start": 1139.68, + "end": 1144.88, "text": " and we probably kicked off like, hey, we had similar interest. + He was also into search", "tokens": [50364, 293, 321, 1391, 14609, 766, 411, 11, + 4177, 11, 321, 632, 2531, 1179, 13, 634, 390, 611, 666, 3164, 50624], "temperature": + 0.0, "avg_logprob": -0.16747237224968112, "compression_ratio": 1.5690376569037656, + "no_speech_prob": 0.0027433601208031178}, {"id": 187, "seek": 113968, "start": 1144.88, + "end": 1151.28, "text": " and embedding and all those AI stuff. And I was also interested + in that. So I joined in as", "tokens": [50624, 293, 12240, 3584, 293, 439, 729, + 7318, 1507, 13, 400, 286, 390, 611, 3102, 294, 300, 13, 407, 286, 6869, 294, 382, + 50944], "temperature": 0.0, "avg_logprob": -0.16747237224968112, "compression_ratio": + 1.5690376569037656, "no_speech_prob": 0.0027433601208031178}, {"id": 188, "seek": + 113968, "start": 1151.28, "end": 1158.16, "text": " a open source contributor out + there. We talked about like, hey, how can we replicate the success", "tokens": [50944, + 257, 1269, 4009, 42859, 484, 456, 13, 492, 2825, 466, 411, 11, 4177, 11, 577, 393, + 321, 25356, 264, 2245, 51288], "temperature": 0.0, "avg_logprob": -0.16747237224968112, + "compression_ratio": 1.5690376569037656, "no_speech_prob": 0.0027433601208031178}, + {"id": 189, "seek": 113968, "start": 1158.16, "end": 1164.88, "text": " for resume + matching that to Swirl? 
And it wasn''t much of a serious role, I would say, or a + full-time", "tokens": [51288, 337, 15358, 14324, 300, 281, 3926, 1648, 30, 400, + 309, 2067, 380, 709, 295, 257, 3156, 3090, 11, 286, 576, 584, 11, 420, 257, 1577, + 12, 3766, 51624], "temperature": 0.0, "avg_logprob": -0.16747237224968112, "compression_ratio": + 1.5690376569037656, "no_speech_prob": 0.0027433601208031178}, {"id": 190, "seek": + 116488, "start": 1165.0400000000002, "end": 1171.44, "text": " role, but I did the + testing of like the, does the principles work somewhere else as well?", "tokens": + [50372, 3090, 11, 457, 286, 630, 264, 4997, 295, 411, 264, 11, 775, 264, 9156, 589, + 4079, 1646, 382, 731, 30, 50692], "temperature": 0.0, "avg_logprob": -0.2061401657436205, + "compression_ratio": 1.556910569105691, "no_speech_prob": 0.03927352651953697}, + {"id": 191, "seek": 116488, "start": 1171.44, "end": 1177.8400000000001, "text": + " So I took that as a challenge and the sole search went down from say a hundred + GitHub stars to", "tokens": [50692, 407, 286, 1890, 300, 382, 257, 3430, 293, 264, + 12321, 3164, 1437, 760, 490, 584, 257, 3262, 23331, 6105, 281, 51012], "temperature": + 0.0, "avg_logprob": -0.2061401657436205, "compression_ratio": 1.556910569105691, + "no_speech_prob": 0.03927352651953697}, {"id": 192, "seek": 116488, "start": 1177.8400000000001, + "end": 1186.0800000000002, "text": " somewhere between 1400 now. And I did that + around like two months, I would say. So it depends on", "tokens": [51012, 4079, + 1296, 46795, 586, 13, 400, 286, 630, 300, 926, 411, 732, 2493, 11, 286, 576, 584, + 13, 407, 309, 5946, 322, 51424], "temperature": 0.0, "avg_logprob": -0.2061401657436205, + "compression_ratio": 1.556910569105691, "no_speech_prob": 0.03927352651953697}, + {"id": 193, "seek": 116488, "start": 1186.0800000000002, "end": 1191.68, "text": + " project and then timing, not everything is going to be like resume oriented. 
It''s + some more practical", "tokens": [51424, 1716, 293, 550, 10822, 11, 406, 1203, 307, + 516, 281, 312, 411, 15358, 21841, 13, 467, 311, 512, 544, 8496, 51704], "temperature": + 0.0, "avg_logprob": -0.2061401657436205, "compression_ratio": 1.556910569105691, + "no_speech_prob": 0.03927352651953697}, {"id": 194, "seek": 119168, "start": 1191.76, + "end": 1198.64, "text": " and more enterprisey thing with sort. What they do in + just is that there are AI Connect platform,", "tokens": [50368, 293, 544, 14132, + 88, 551, 365, 1333, 13, 708, 436, 360, 294, 445, 307, 300, 456, 366, 7318, 11653, + 3663, 11, 50712], "temperature": 0.0, "avg_logprob": -0.2764475005013602, "compression_ratio": + 1.5791666666666666, "no_speech_prob": 0.023742007091641426}, {"id": 195, "seek": + 119168, "start": 1198.64, "end": 1206.16, "text": " they connect internal large + language models to your enterprise data sources such as teams and", "tokens": [50712, + 436, 1745, 6920, 2416, 2856, 5245, 281, 428, 14132, 1412, 7139, 1270, 382, 5491, + 293, 51088], "temperature": 0.0, "avg_logprob": -0.2764475005013602, "compression_ratio": + 1.5791666666666666, "no_speech_prob": 0.023742007091641426}, {"id": 196, "seek": + 119168, "start": 1206.72, "end": 1213.52, "text": " outlook. So while they have + some developer audience, it''s pretty focused towards something like", "tokens": + [51116, 26650, 13, 407, 1339, 436, 362, 512, 10754, 4034, 11, 309, 311, 1238, 5178, + 3030, 746, 411, 51456], "temperature": 0.0, "avg_logprob": -0.2764475005013602, + "compression_ratio": 1.5791666666666666, "no_speech_prob": 0.023742007091641426}, + {"id": 197, "seek": 119168, "start": 1213.52, "end": 1218.64, "text": " a niche + out there. 
Instead of like chat GPT or something which is generic and even can use.", + "tokens": [51456, 257, 19956, 484, 456, 13, 7156, 295, 411, 5081, 26039, 51, 420, + 746, 597, 307, 19577, 293, 754, 393, 764, 13, 51712], "temperature": 0.0, "avg_logprob": + -0.2764475005013602, "compression_ratio": 1.5791666666666666, "no_speech_prob": + 0.023742007091641426}, {"id": 198, "seek": 121864, "start": 1218.64, "end": 1223.0400000000002, + "text": " So early something which will cater to let''s just say five percent or + even three percent of the", "tokens": [50364, 407, 2440, 746, 597, 486, 21557, 281, + 718, 311, 445, 584, 1732, 3043, 420, 754, 1045, 3043, 295, 264, 50584], "temperature": + 0.0, "avg_logprob": -0.17315555847797198, "compression_ratio": 1.6485355648535565, + "no_speech_prob": 0.00622699037194252}, {"id": 199, "seek": 121864, "start": 1223.0400000000002, + "end": 1229.76, "text": " audience out there. So that''s that. I mean, that was + the challenge and I talked to the team. We did", "tokens": [50584, 4034, 484, 456, + 13, 407, 300, 311, 300, 13, 286, 914, 11, 300, 390, 264, 3430, 293, 286, 2825, 281, + 264, 1469, 13, 492, 630, 50920], "temperature": 0.0, "avg_logprob": -0.17315555847797198, + "compression_ratio": 1.6485355648535565, "no_speech_prob": 0.00622699037194252}, + {"id": 200, "seek": 121864, "start": 1229.76, "end": 1236.5600000000002, "text": + " some changes, wrote some blogs and we arrived at like a really good substantial + result. So as I could", "tokens": [50920, 512, 2962, 11, 4114, 512, 31038, 293, + 321, 6678, 412, 411, 257, 534, 665, 16726, 1874, 13, 407, 382, 286, 727, 51260], + "temperature": 0.0, "avg_logprob": -0.17315555847797198, "compression_ratio": 1.6485355648535565, + "no_speech_prob": 0.00622699037194252}, {"id": 201, "seek": 121864, "start": 1236.5600000000002, + "end": 1243.6000000000001, "text": " say, the principles work for all the companies + out there. 
Swirl has been could be say one of the", "tokens": [51260, 584, 11, 264, + 9156, 589, 337, 439, 264, 3431, 484, 456, 13, 3926, 1648, 575, 668, 727, 312, 584, + 472, 295, 264, 51612], "temperature": 0.0, "avg_logprob": -0.17315555847797198, + "compression_ratio": 1.6485355648535565, "no_speech_prob": 0.00622699037194252}, + {"id": 202, "seek": 124360, "start": 1243.6, "end": 1250.08, "text": " extreme use + cases that hey, this is catering to enterprise, but it works for them. Then it can + work for", "tokens": [50364, 8084, 764, 3331, 300, 4177, 11, 341, 307, 21557, 278, + 281, 14132, 11, 457, 309, 1985, 337, 552, 13, 1396, 309, 393, 589, 337, 50688], + "temperature": 0.0, "avg_logprob": -0.17909312504594044, "compression_ratio": 1.5793650793650793, + "no_speech_prob": 0.00839119404554367}, {"id": 203, "seek": 124360, "start": 1250.9599999999998, + "end": 1258.08, "text": " any generic public facing project. Yeah, yeah, amazing. + And please do check out the episode with", "tokens": [50732, 604, 19577, 1908, 7170, + 1716, 13, 865, 11, 1338, 11, 2243, 13, 400, 1767, 360, 1520, 484, 264, 3500, 365, + 51088], "temperature": 0.0, "avg_logprob": -0.17909312504594044, "compression_ratio": + 1.5793650793650793, "no_speech_prob": 0.00839119404554367}, {"id": 204, "seek": + 124360, "start": 1258.08, "end": 1265.36, "text": " seed prop stain that who who + created Swirl and he''s very driven individual with a lot of experience", "tokens": + [51088, 8871, 2365, 16441, 300, 567, 567, 2942, 3926, 1648, 293, 415, 311, 588, + 9555, 2609, 365, 257, 688, 295, 1752, 51452], "temperature": 0.0, "avg_logprob": + -0.17909312504594044, "compression_ratio": 1.5793650793650793, "no_speech_prob": + 0.00839119404554367}, {"id": 205, "seek": 124360, "start": 1265.36, "end": 1270.9599999999998, + "text": " in search engine development and software development at large. Please + check it out. 
Yeah, that''s", "tokens": [51452, 294, 3164, 2848, 3250, 293, 4722, + 3250, 412, 2416, 13, 2555, 1520, 309, 484, 13, 865, 11, 300, 311, 51732], "temperature": + 0.0, "avg_logprob": -0.17909312504594044, "compression_ratio": 1.5793650793650793, + "no_speech_prob": 0.00839119404554367}, {"id": 206, "seek": 127096, "start": 1270.96, + "end": 1279.3600000000001, "text": " amazing. Yeah, I also want to ask you where + do you go from here, right? So do you need some help", "tokens": [50364, 2243, 13, + 865, 11, 286, 611, 528, 281, 1029, 291, 689, 360, 291, 352, 490, 510, 11, 558, 30, + 407, 360, 291, 643, 512, 854, 50784], "temperature": 0.0, "avg_logprob": -0.20595436347158333, + "compression_ratio": 1.5384615384615385, "no_speech_prob": 0.010653882287442684}, + {"id": 207, "seek": 127096, "start": 1279.3600000000001, "end": 1286.64, "text": + " with the resume marcher or you have enough of help, but you need I don''t know, + you need some", "tokens": [50784, 365, 264, 15358, 8368, 260, 420, 291, 362, 1547, + 295, 854, 11, 457, 291, 643, 286, 500, 380, 458, 11, 291, 643, 512, 51148], "temperature": + 0.0, "avg_logprob": -0.20595436347158333, "compression_ratio": 1.5384615384615385, + "no_speech_prob": 0.010653882287442684}, {"id": 208, "seek": 127096, "start": 1288.0, + "end": 1296.0, "text": " maybe cloud, cloud credits to host it. Something like that. + Well, anything, anything else.", "tokens": [51216, 1310, 4588, 11, 4588, 16816, + 281, 3975, 309, 13, 6595, 411, 300, 13, 1042, 11, 1340, 11, 1340, 1646, 13, 51616], + "temperature": 0.0, "avg_logprob": -0.20595436347158333, "compression_ratio": 1.5384615384615385, + "no_speech_prob": 0.010653882287442684}, {"id": 209, "seek": 129600, "start": 1296.24, + "end": 1304.48, "text": " Okay, yeah, for sure. Like any help would be appreciated + with resume match out there. 
If you can donate", "tokens": [50376, 1033, 11, 1338, + 11, 337, 988, 13, 1743, 604, 854, 576, 312, 17169, 365, 15358, 2995, 484, 456, 13, + 759, 291, 393, 17751, 50788], "temperature": 0.0, "avg_logprob": -0.22586980779120264, + "compression_ratio": 1.6455696202531647, "no_speech_prob": 0.028985027223825455}, + {"id": 210, "seek": 129600, "start": 1304.48, "end": 1310.96, "text": " to the project + out there, it can help like drive the motivation out there to do stuff. Maybe what", + "tokens": [50788, 281, 264, 1716, 484, 456, 11, 309, 393, 854, 411, 3332, 264, 12335, + 484, 456, 281, 360, 1507, 13, 2704, 437, 51112], "temperature": 0.0, "avg_logprob": + -0.22586980779120264, "compression_ratio": 1.6455696202531647, "no_speech_prob": + 0.028985027223825455}, {"id": 211, "seek": 129600, "start": 1310.96, "end": 1316.4, + "text": " what I would really help is, especially is this in the generative AI offering + out there that I''m", "tokens": [51112, 437, 286, 576, 534, 854, 307, 11, 2318, + 307, 341, 294, 264, 1337, 1166, 7318, 8745, 484, 456, 300, 286, 478, 51384], "temperature": + 0.0, "avg_logprob": -0.22586980779120264, "compression_ratio": 1.6455696202531647, + "no_speech_prob": 0.028985027223825455}, {"id": 212, "seek": 129600, "start": 1316.4, + "end": 1323.76, "text": " going to develop, maybe help and test it out with different + AI providers. 
We also have like", "tokens": [51384, 516, 281, 1499, 11, 1310, 854, + 293, 1500, 309, 484, 365, 819, 7318, 11330, 13, 492, 611, 362, 411, 51752], "temperature": + 0.0, "avg_logprob": -0.22586980779120264, "compression_ratio": 1.6455696202531647, + "no_speech_prob": 0.028985027223825455}, {"id": 213, "seek": 132376, "start": 1324.48, + "end": 1330.4, "text": " open source model, mixed trial, Google''s, Gemma, all those + things and then we have charge", "tokens": [50400, 1269, 4009, 2316, 11, 7467, 7308, + 11, 3329, 311, 11, 22894, 1696, 11, 439, 729, 721, 293, 550, 321, 362, 4602, 50696], + "temperature": 0.0, "avg_logprob": -0.19154447317123413, "compression_ratio": 1.5924369747899159, + "no_speech_prob": 0.01943705603480339}, {"id": 214, "seek": 132376, "start": 1330.4, + "end": 1337.44, "text": " GPT. So really would like to help in orchestrating the + whole project around how to do it in a more", "tokens": [50696, 26039, 51, 13, 407, + 534, 576, 411, 281, 854, 294, 14161, 8754, 264, 1379, 1716, 926, 577, 281, 360, + 309, 294, 257, 544, 51048], "temperature": 0.0, "avg_logprob": -0.19154447317123413, + "compression_ratio": 1.5924369747899159, "no_speech_prob": 0.01943705603480339}, + {"id": 215, "seek": 132376, "start": 1337.44, "end": 1344.08, "text": " better generative + manner. So that''s I would really need anyone''s help out there. And of course,", + "tokens": [51048, 1101, 1337, 1166, 9060, 13, 407, 300, 311, 286, 576, 534, 643, + 2878, 311, 854, 484, 456, 13, 400, 295, 1164, 11, 51380], "temperature": 0.0, "avg_logprob": + -0.19154447317123413, "compression_ratio": 1.5924369747899159, "no_speech_prob": + 0.01943705603480339}, {"id": 216, "seek": 132376, "start": 1344.08, "end": 1349.28, + "text": " cloud hosting. 
If we can test a model out there that can help us with + cloud hosting or someone", "tokens": [51380, 4588, 16058, 13, 759, 321, 393, 1500, + 257, 2316, 484, 456, 300, 393, 854, 505, 365, 4588, 16058, 420, 1580, 51640], "temperature": + 0.0, "avg_logprob": -0.19154447317123413, "compression_ratio": 1.5924369747899159, + "no_speech_prob": 0.01943705603480339}, {"id": 217, "seek": 134928, "start": 1349.28, + "end": 1354.6399999999999, "text": " who would like to sponsor the project. That + would be the better one because it gets a lot of", "tokens": [50364, 567, 576, 411, + 281, 16198, 264, 1716, 13, 663, 576, 312, 264, 1101, 472, 570, 309, 2170, 257, 688, + 295, 50632], "temperature": 0.0, "avg_logprob": -0.18002832998143564, "compression_ratio": + 1.7727272727272727, "no_speech_prob": 0.0297043826431036}, {"id": 218, "seek": 134928, + "start": 1355.44, "end": 1360.72, "text": " traffic the website. I can put you up + on the website. If you''d like to sponsor, it can reach out to", "tokens": [50672, + 6419, 264, 3144, 13, 286, 393, 829, 291, 493, 322, 264, 3144, 13, 759, 291, 1116, + 411, 281, 16198, 11, 309, 393, 2524, 484, 281, 50936], "temperature": 0.0, "avg_logprob": + -0.18002832998143564, "compression_ratio": 1.7727272727272727, "no_speech_prob": + 0.0297043826431036}, {"id": 219, "seek": 134928, "start": 1360.72, "end": 1367.44, + "text": " me. I will drop in maybe that''s fantastic. I''m sure there will be someone + reaching out or at least", "tokens": [50936, 385, 13, 286, 486, 3270, 294, 1310, + 300, 311, 5456, 13, 286, 478, 988, 456, 486, 312, 1580, 9906, 484, 420, 412, 1935, + 51272], "temperature": 0.0, "avg_logprob": -0.18002832998143564, "compression_ratio": + 1.7727272727272727, "no_speech_prob": 0.0297043826431036}, {"id": 220, "seek": 134928, + "start": 1367.44, "end": 1374.6399999999999, "text": " checking out the website + using using the tool. Yeah, that''s amazing. 
Yeah, I also like to ask this", "tokens": + [51272, 8568, 484, 264, 3144, 1228, 1228, 264, 2290, 13, 865, 11, 300, 311, 2243, + 13, 865, 11, 286, 611, 411, 281, 1029, 341, 51632], "temperature": 0.0, "avg_logprob": + -0.18002832998143564, "compression_ratio": 1.7727272727272727, "no_speech_prob": + 0.0297043826431036}, {"id": 221, "seek": 137464, "start": 1374.64, "end": 1381.5200000000002, + "text": " a little bit more philosophical question by the end of the podcast, which + I used to ask it like", "tokens": [50364, 257, 707, 857, 544, 25066, 1168, 538, + 264, 917, 295, 264, 7367, 11, 597, 286, 1143, 281, 1029, 309, 411, 50708], "temperature": + 0.0, "avg_logprob": -0.14996301500420822, "compression_ratio": 1.6166666666666667, + "no_speech_prob": 0.016011372208595276}, {"id": 222, "seek": 137464, "start": 1381.5200000000002, + "end": 1390.0800000000002, "text": " why do you do things? But maybe I will try + to rephrase it in this third season. What drives you", "tokens": [50708, 983, 360, + 291, 360, 721, 30, 583, 1310, 286, 486, 853, 281, 319, 44598, 651, 309, 294, 341, + 2636, 3196, 13, 708, 11754, 291, 51136], "temperature": 0.0, "avg_logprob": -0.14996301500420822, + "compression_ratio": 1.6166666666666667, "no_speech_prob": 0.016011372208595276}, + {"id": 223, "seek": 137464, "start": 1390.0800000000002, "end": 1398.88, "text": + " when you wake up? What drives you? Why are you doing this? What drives you in + your open source job?", "tokens": [51136, 562, 291, 6634, 493, 30, 708, 11754, 291, + 30, 1545, 366, 291, 884, 341, 30, 708, 11754, 291, 294, 428, 1269, 4009, 1691, 30, + 51576], "temperature": 0.0, "avg_logprob": -0.14996301500420822, "compression_ratio": + 1.6166666666666667, "no_speech_prob": 0.016011372208595276}, {"id": 224, "seek": + 139888, "start": 1398.88, "end": 1412.3200000000002, "text": " Pretty amazing question. + What drives you? 
So I would like to quote Sandika as the Roman philosopher", "tokens": + [50364, 10693, 2243, 1168, 13, 708, 11754, 291, 30, 407, 286, 576, 411, 281, 6513, + 7985, 5439, 382, 264, 8566, 29805, 51036], "temperature": 0.0, "avg_logprob": -0.2146397829055786, + "compression_ratio": 1.4923076923076923, "no_speech_prob": 0.017006559297442436}, + {"id": 225, "seek": 139888, "start": 1412.3200000000002, "end": 1419.6000000000001, + "text": " out there for a philosophical question. He says life is short and our + time on planet earth is", "tokens": [51036, 484, 456, 337, 257, 25066, 1168, 13, + 634, 1619, 993, 307, 2099, 293, 527, 565, 322, 5054, 4120, 307, 51400], "temperature": + 0.0, "avg_logprob": -0.2146397829055786, "compression_ratio": 1.4923076923076923, + "no_speech_prob": 0.017006559297442436}, {"id": 226, "seek": 139888, "start": 1419.6000000000001, + "end": 1428.16, "text": " pretty limited. And if you use it, well, the same life + could be long enough for that. So it''s not", "tokens": [51400, 1238, 5567, 13, + 400, 498, 291, 764, 309, 11, 731, 11, 264, 912, 993, 727, 312, 938, 1547, 337, 300, + 13, 407, 309, 311, 406, 51828], "temperature": 0.0, "avg_logprob": -0.2146397829055786, + "compression_ratio": 1.4923076923076923, "no_speech_prob": 0.017006559297442436}, + {"id": 227, "seek": 142816, "start": 1428.16, "end": 1434.72, "text": " about why + you start with why, but you just start with something. And I started with open source.", + "tokens": [50364, 466, 983, 291, 722, 365, 983, 11, 457, 291, 445, 722, 365, 746, + 13, 400, 286, 1409, 365, 1269, 4009, 13, 50692], "temperature": 0.0, "avg_logprob": + -0.18133024918405632, "compression_ratio": 1.758139534883721, "no_speech_prob": + 0.03593135252594948}, {"id": 228, "seek": 142816, "start": 1435.28, "end": 1441.28, + "text": " And it went from like anybody can do open source. 
All you need is a laptop + or a computer", "tokens": [50720, 400, 309, 1437, 490, 411, 4472, 393, 360, 1269, + 4009, 13, 1057, 291, 643, 307, 257, 10732, 420, 257, 3820, 51020], "temperature": + 0.0, "avg_logprob": -0.18133024918405632, "compression_ratio": 1.758139534883721, + "no_speech_prob": 0.03593135252594948}, {"id": 229, "seek": 142816, "start": 1441.28, + "end": 1447.44, "text": " and how to do get it. And it starts with there. You build + out a public project. You talk about it.", "tokens": [51020, 293, 577, 281, 360, + 483, 309, 13, 400, 309, 3719, 365, 456, 13, 509, 1322, 484, 257, 1908, 1716, 13, + 509, 751, 466, 309, 13, 51328], "temperature": 0.0, "avg_logprob": -0.18133024918405632, + "compression_ratio": 1.758139534883721, "no_speech_prob": 0.03593135252594948}, + {"id": 230, "seek": 142816, "start": 1447.44, "end": 1452.0, "text": " Share the + same idea. You contribute to other projects. And the whole chain starts from there.", + "tokens": [51328, 14945, 264, 912, 1558, 13, 509, 10586, 281, 661, 4455, 13, 400, + 264, 1379, 5021, 3719, 490, 456, 13, 51556], "temperature": 0.0, "avg_logprob": + -0.18133024918405632, "compression_ratio": 1.758139534883721, "no_speech_prob": + 0.03593135252594948}, {"id": 231, "seek": 145200, "start": 1452.72, "end": 1460.56, + "text": " And then that''s how I found all these amazing people out there, including + you coming up with", "tokens": [50400, 400, 550, 300, 311, 577, 286, 1352, 439, + 613, 2243, 561, 484, 456, 11, 3009, 291, 1348, 493, 365, 50792], "temperature": + 0.0, "avg_logprob": -0.20586034303070397, "compression_ratio": 1.6977777777777778, + "no_speech_prob": 0.08886586129665375}, {"id": 232, "seek": 145200, "start": 1460.56, + "end": 1465.2, "text": " how to improve resume and match it. 
I''ve even met some + really grateful amazing people out there as well.", "tokens": [50792, 577, 281, + 3470, 15358, 293, 2995, 309, 13, 286, 600, 754, 1131, 512, 534, 7941, 2243, 561, + 484, 456, 382, 731, 13, 51024], "temperature": 0.0, "avg_logprob": -0.20586034303070397, + "compression_ratio": 1.6977777777777778, "no_speech_prob": 0.08886586129665375}, + {"id": 233, "seek": 145200, "start": 1466.08, "end": 1471.28, "text": " So it''s + in the beginning when I was doing the whole thing, it never occurred to me that + this", "tokens": [51068, 407, 309, 311, 294, 264, 2863, 562, 286, 390, 884, 264, + 1379, 551, 11, 309, 1128, 11068, 281, 385, 300, 341, 51328], "temperature": 0.0, + "avg_logprob": -0.20586034303070397, "compression_ratio": 1.6977777777777778, "no_speech_prob": + 0.08886586129665375}, {"id": 234, "seek": 145200, "start": 1471.28, "end": 1477.28, + "text": " project could go out and become a really trending project. It never did. + I mean, it was the", "tokens": [51328, 1716, 727, 352, 484, 293, 1813, 257, 534, + 28692, 1716, 13, 467, 1128, 630, 13, 286, 914, 11, 309, 390, 264, 51628], "temperature": + 0.0, "avg_logprob": -0.20586034303070397, "compression_ratio": 1.6977777777777778, + "no_speech_prob": 0.08886586129665375}, {"id": 235, "seek": 147728, "start": 1477.28, + "end": 1483.84, "text": " wildest maybe like 0.01% of the tree. This can''t this + go trending. 
And then it did happen.", "tokens": [50364, 4868, 377, 1310, 411, 1958, + 13, 10607, 4, 295, 264, 4230, 13, 639, 393, 380, 341, 352, 28692, 13, 400, 550, + 309, 630, 1051, 13, 50692], "temperature": 0.0, "avg_logprob": -0.22711768952926786, + "compression_ratio": 1.6283185840707965, "no_speech_prob": 0.017215127125382423}, + {"id": 236, "seek": 147728, "start": 1484.3999999999999, "end": 1490.96, "text": + " And not only that it happened like, well, more times to be go out and become this + whole trending", "tokens": [50720, 400, 406, 787, 300, 309, 2011, 411, 11, 731, + 11, 544, 1413, 281, 312, 352, 484, 293, 1813, 341, 1379, 28692, 51048], "temperature": + 0.0, "avg_logprob": -0.22711768952926786, "compression_ratio": 1.6283185840707965, + "no_speech_prob": 0.017215127125382423}, {"id": 237, "seek": 147728, "start": 1490.96, + "end": 1498.72, "text": " thing. So that was pretty amazing. So in the beginning, + it was let''s do just something. We have time,", "tokens": [51048, 551, 13, 407, + 300, 390, 1238, 2243, 13, 407, 294, 264, 2863, 11, 309, 390, 718, 311, 360, 445, + 746, 13, 492, 362, 565, 11, 51436], "temperature": 0.0, "avg_logprob": -0.22711768952926786, + "compression_ratio": 1.6283185840707965, "no_speech_prob": 0.017215127125382423}, + {"id": 238, "seek": 147728, "start": 1498.72, "end": 1504.48, "text": " not wasted, + don''t waste time. Let''s get started. And it actually spans it out.", "tokens": + [51436, 406, 19496, 11, 500, 380, 5964, 565, 13, 961, 311, 483, 1409, 13, 400, 309, + 767, 44086, 309, 484, 13, 51724], "temperature": 0.0, "avg_logprob": -0.22711768952926786, + "compression_ratio": 1.6283185840707965, "no_speech_prob": 0.017215127125382423}, + {"id": 239, "seek": 150448, "start": 1505.1200000000001, "end": 1510.72, "text": + " More action you take, the better it gets. Yeah, absolutely. 
So keep on doing this.", + "tokens": [50396, 5048, 3069, 291, 747, 11, 264, 1101, 309, 2170, 13, 865, 11, 3122, + 13, 407, 1066, 322, 884, 341, 13, 50676], "temperature": 0.0, "avg_logprob": -0.249063299159811, + "compression_ratio": 1.5518672199170125, "no_speech_prob": 0.07910144329071045}, + {"id": 240, "seek": 150448, "start": 1510.72, "end": 1517.76, "text": " And also + you are very driven like every time we talk of podcasts, I learned something from + you.", "tokens": [50676, 400, 611, 291, 366, 588, 9555, 411, 633, 565, 321, 751, + 295, 24045, 11, 286, 3264, 746, 490, 291, 13, 51028], "temperature": 0.0, "avg_logprob": + -0.249063299159811, "compression_ratio": 1.5518672199170125, "no_speech_prob": 0.07910144329071045}, + {"id": 241, "seek": 150448, "start": 1517.76, "end": 1522.96, "text": " You give + me a couple of links and I start checking them. So it seems like you always on the + edge.", "tokens": [51028, 509, 976, 385, 257, 1916, 295, 6123, 293, 286, 722, 8568, + 552, 13, 407, 309, 2544, 411, 291, 1009, 322, 264, 4691, 13, 51288], "temperature": + 0.0, "avg_logprob": -0.249063299159811, "compression_ratio": 1.5518672199170125, + "no_speech_prob": 0.07910144329071045}, {"id": 242, "seek": 150448, "start": 1523.84, + "end": 1530.64, "text": " That''s amazing. Yeah, thank you very much. Zara, I really + enjoyed chatting to you. I''m sure we''ll", "tokens": [51332, 663, 311, 2243, 13, + 865, 11, 1309, 291, 588, 709, 13, 1176, 2419, 11, 286, 534, 4626, 24654, 281, 291, + 13, 286, 478, 988, 321, 603, 51672], "temperature": 0.0, "avg_logprob": -0.249063299159811, + "compression_ratio": 1.5518672199170125, "no_speech_prob": 0.07910144329071045}, + {"id": 243, "seek": 153064, "start": 1530.64, "end": 1537.5200000000002, "text": + " connect more. 
I''m looking forward to the design, the craziest design that you + will come up for", "tokens": [50364, 1745, 544, 13, 286, 478, 1237, 2128, 281, 264, + 1715, 11, 264, 46339, 1715, 300, 291, 486, 808, 493, 337, 50708], "temperature": + 0.0, "avg_logprob": -0.22710519570570725, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.035777363926172256}, {"id": 244, "seek": 153064, "start": 1537.5200000000002, + "end": 1544.4, "text": " this episode. Of course. And all the best with resume and + mature and with your blogging. And I", "tokens": [50708, 341, 3500, 13, 2720, 1164, + 13, 400, 439, 264, 1151, 365, 15358, 293, 14442, 293, 365, 428, 6968, 3249, 13, + 400, 286, 51052], "temperature": 0.0, "avg_logprob": -0.22710519570570725, "compression_ratio": + 1.5714285714285714, "no_speech_prob": 0.035777363926172256}, {"id": 245, "seek": + 153064, "start": 1544.4, "end": 1549.8400000000001, "text": " also know like you''re + using a ton of modern tools, right? Like chat GPT and you apply them to work.", + "tokens": [51052, 611, 458, 411, 291, 434, 1228, 257, 2952, 295, 4363, 3873, 11, + 558, 30, 1743, 5081, 26039, 51, 293, 291, 3079, 552, 281, 589, 13, 51324], "temperature": + 0.0, "avg_logprob": -0.22710519570570725, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.035777363926172256}, {"id": 246, "seek": 153064, "start": 1549.8400000000001, + "end": 1556.48, "text": " That''s amazing. Yeah, for sure. And yes, thank you to + me, Trey. 
Thank you for having me on this", "tokens": [51324, 663, 311, 2243, 13, + 865, 11, 337, 988, 13, 400, 2086, 11, 1309, 291, 281, 385, 11, 314, 7950, 13, 1044, + 291, 337, 1419, 385, 322, 341, 51656], "temperature": 0.0, "avg_logprob": -0.22710519570570725, + "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.035777363926172256}, + {"id": 247, "seek": 155648, "start": 1556.48, "end": 1567.2, "text": " podcast and + giving me the opportunity to talk about resume and action.", "tokens": [50364, 7367, + 293, 2902, 385, 264, 2650, 281, 751, 466, 15358, 293, 3069, 13, 50900], "temperature": + 0.0, "avg_logprob": -0.5187486410140991, "compression_ratio": 1.0294117647058822, + "no_speech_prob": 0.23877781629562378}]' +--- + +Hello there, this is Vector Podcast. Third season, I'm sure you've been waiting for new episodes. There have been something happening in my family on positive side, so I was a little bit like distracted on a way, but I'm super excited to have my guest with me. +It's Sarah Prie, who is a software developer. He also doubles in DevRel. He has an open source project, which is like Skyrocketing in stars. Yeah, welcome, Sarah. +It's a high-demetripe, and it's a pleasure to be on the first episode of the third season, and it's a pretty amazing introduction that you do. So, hey, audience, I am Sarah. I am a software developer with more than two years of experience. I've been doing a lot of open source projects. +And there is one more thing that I should also mention, which is also very important. Well, at least for me, and I know for you as well, is that you are the designer on this podcast, and I'm sure that you will be designing this very episode as well. Oh, yeah. +Yeah, it's going to be fun, like designing your own, like editing the own episode and the banner and so on. So it's pretty amazing going from designer open source and then here. Yeah, absolutely. 
So, yeah, if you check out some of the designs that drew your attention, you should know that these were done by Sarah. I'm really excited to have you here. As usual, we start with an intro.
Could you introduce yourself to our audience? Like, what's your background and how did you get here? Yeah. Okay, so I have a background in computer science and engineering, graduating with an engineering degree right after the COVID second wave hit India.
After that, I've been doing full-stack development at a very large corporate company, and I've probably been there for two and a half years.
Apart from that, I was pretty much involved in open source projects, vector search, machine learning and AI, and all the same spaces that you are in; that's how I found you, and the other amazing team members out there that I've collaborated with and designed for as well.
So that was an interest I've kept up: working towards artificial intelligence, machine learning, vector spaces, vector search, vector databases, and all those interesting things. And yeah, everything open source as well.
With everything that's happened in the last two years, I ended up creating this project of mine called Resume Matcher, which started to gain more attention than I initially assumed.
And that's where it all blew up, to nearly 4,500 GitHub stars, a huge amount of traffic on the website, and a lot of downloads, maybe more than 1,000 folks. I've got about 800 members in my community. So it's pretty amazing. Yeah, this is insane and crazy.
I remember we were chatting together about this project and you said you have this project, which is kind of like, you know, a side interest or something like that that you've been doing on the side, right? And then we were chatting, and I gave you a small piece of advice to rename it slightly.
And then you decided to really go public with it or whatever.
Like, just an insane amount of growth that you've made there, from a couple of stars to four and a half thousand stars. Like, how did you do that? I still don't understand.
So the amazing part of the whole thing was that, as you mentioned, initially it was just called resume matching and it was pretty much focused on the algorithm.
And when we were talking about it, the idea was to make it more public-friendly, to have a nice, intuitive dashboard so that people can interact with it, apart from the command-line stuff that I had before.
So that was the whole thing: to create a product that people can use. And that introduced me to something called developer marketing. It's not just that you have the product or you have the code; there has to be something special that you can do so that it grows.
I mean, we have probably millions of open source projects on GitHub and GitLab as well. What makes the difference between something that has 100 stars, or maybe less, and something that has 4,000 or maybe more than that? It's the element of marketing out there and a finely polished product.
So that's the key differentiator. Writing software is essential, but if you don't do the marketing, if you don't do the advertising or evangelizing about your whole stuff, then it probably doesn't get as much attention as it deserves.
So that was the game changer for resume matching. This is an amazing journey that you've had there. And this is where I want to ask you next.
My next question: can you explain what Resume Matcher does and how it is relevant or useful to pretty much anybody, I guess, but maybe developers first, right? The target audience for Resume Matcher was developers out there.
I'm a software developer, so I knew what the challenge was with the whole project. What Resume Matcher does is, it is a reverse ATS tool for your resume.
It takes certain keywords from job descriptions and then matches them against your resume. +It suggests that you can probably add some extra keywords. An example would be: hey, I am a Java developer, and I'm going to apply for this full stack developer job that I know I can do well. The full stack developer job description contains certain extra keywords. +I have written Java, but the job description mentions it as J2EE, or Java 2 Enterprise Edition, or perhaps another package. Maybe it has Maven, it has Spring, and I have written Spring Boot. There are certain keyword mismatches there, which anybody can make. +That is something that Resume Matcher fills in. It will go out, parse your resume, and match it with a lot of job descriptions out there. It's going to suggest your similarity score as well: hey, this is the job, and this is how much you match. That's where the vector similarity comes in. +You mentioned ATS. What does it stand for? Can you expand it? Oh yeah, for sure. For those who don't know, ATS is an applicant tracking system. There are a lot of them. +What actually happens is that when you apply to a company, anywhere from a startup to a large corporation, you submit your resume and it gets ingested by these applicant tracking systems. +These systems take your resume, extract all the keywords, and then run some similarity scores against the job description. They also do some custom keyword searching, but that's on a per-company basis. It's like optimizing your resume, right? I remember I was reading one book. +I will need to look it up to link it in the show notes, but basically it's a book about how you should approach job seeking, and part of it is basically about writing your resume, right? +I remember that it actually started in a reverse way, the same way you just explained: it would first ask you to list the jobs that you want to apply for, right? Sort of scope them.
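The keyword-matching idea Saurabh describes above can be sketched in a few lines of Python. This is a minimal illustration, not Resume Matcher's actual code; the tokenizer and the stop-word list are assumptions:

```python
import re

# Tiny illustrative stop-word list; a real tool would use a proper one.
STOP_WORDS = {"a", "an", "and", "the", "for", "with", "in", "of", "to"}

def keywords(text: str) -> set[str]:
    """Lowercase word tokens (keeping +, # for terms like c++/c#), minus stop words."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower())) - STOP_WORDS

def keyword_gap(resume: str, job_description: str):
    """Return JD keywords missing from the resume, plus a simple overlap score."""
    resume_kw, jd_kw = keywords(resume), keywords(job_description)
    missing = jd_kw - resume_kw
    score = len(jd_kw & resume_kw) / len(jd_kw) if jd_kw else 0.0
    return missing, score

missing, score = keyword_gap(
    "Java developer with Spring Boot experience",
    "Full stack developer: Java, J2EE, Maven, Spring",
)
print(sorted(missing), round(score, 2))
```

Here the hypothetical Java developer's resume already covers "java", "developer" and "spring", so the tool would flag terms like "j2ee" and "maven" as keywords worth adding.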
+Then you try to summarize what's common in there, if there is some commonality, and then you go backwards to your experience and try to match these things in a proper way, right? +You even need to start your resume from the key skill that this job ad is looking for, and that's how your resume is going to stand out. +That's an amazing way that you cracked it. I don't know if you've read that book, but... No, I haven't. That's amazing. This tool now has how many contributors? It's open source, right? Sorry, I missed it a bit. So this tool is open source, right? Yeah, open source with an Apache 2.0 license. +Yeah, and how many... Part of the journey is not just getting stars, and you did get stars, which is amazing, but I think an even more interesting result, at least for me, is that you got contributors besides yourself. +And I think the last time I checked was today; you have, I don't know, dozens of them. How did that happen? Yeah, more than that... So this advice would be more towards the open source companies out there. We've seen a lot of them getting started. Y Combinator has a lot of... +They've started funding a lot of open source companies, which is pretty amazing, from vector databases and machine learning tools to generative AI tools, all those amazing things. So the interesting part of this is that trending gets you visibility. +If you are on the GitHub trending feed, it will give you visibility. And there are all the people out there who are looking for their next piece of content. So we have a lot of YouTubers who talk about open source projects. We have a lot of blog writers who write about them. +Even I have written about it in some of my blogs as well, alongside other tools. So the trending feed gives you visibility. People pick it up, people tweet about the project, people talk in their different forums. It gets reshared.
+So this gives you really good visibility. It probably also improves your SEO a bit, I would say. And once you are visible, more people are going to download the project. They will try it out, and if they find a bug, they will contribute a fix. +And some other people are really enthusiastic. They want to learn, hopping into your community. They will ask, hey, can I contribute? Is this an issue I can pick up? So the whole thing with Resume Matcher was the same as well. Yeah, it's amazing. +And then I feel like maybe these people also found your project relevant to themselves or to their connections. I was checking your Product Hunt page and I see some comments where people say, hey, this is amazing, I also recommended it to a friend of mine who's looking for a job. +And I think this market, the IT sector, became a little volatile, probably around the world, right? So you kind of kicked in with this project at the right time, right? Yes, I would say that maybe my timing was perfect in that sense. +I would not like to put it that way, but I just built the product. I never knew that the market was going to be like that. So yeah, Resume Matcher took advantage of the timing. We're also seeing a lot of generative AI based open source startups. +LlamaIndex is pretty amazing in that sense. I knew LlamaIndex and LangChain, trying them out when they were really small, when they were just growing. And then after, what, 12, 13 months? Well, yeah, around a year, maybe less. Now they're full-blown companies. +They've got investment. They have their own cloud tier. So maybe the whole next big shift is towards open source. Yeah, exactly. Well, getting a bit more technical, can you also unveil some of the architecture decisions you made for this tool? I know that you are using vector search. +Is that right? What are you using?
Are you using some library or database? Maybe you started using a database? Can you explain a bit more about the architecture of your system? I use basic tools such as pdfminer and other PDF extraction tools, word and text extraction tools, I would say. +I use those. Then I use libraries such as spaCy and NLTK, and code that I've written to combine them and use algorithms to extract chunks of text. And then I use, well, I was using the Qdrant vector database to do the vector embedding. +But later on Qdrant introduced something called FastEmbed, which is pretty amazing, and it can do the text-to-vector embedding on the fly. And using that, someone contributed the code to do the cosine similarity. +So: extract the text, do the analysis using spaCy and NLTK (there's also another wrapper on top of spaCy that I use), get the core of the data, visualize it, and then send it on for the vector embedding. But the search is something that I would still like to introduce. +Generative AI is also something that I'm working on. So the current goal for the next couple of months is to get a dynamic ATS that takes in your resume, takes in the job description, and optimizes it without hallucinating, without adding extra keywords. +As a Java developer, you don't want cloud. If you haven't worked in the cloud, then you don't want Kubernetes to be suddenly added into your resume. Or you don't want it to say that, hey, this guy has worked at some other company or created OpenAI, any made-up stuff like that. +So limiting the hallucination and all of that while introducing generative AI is somewhere in between. Yeah, for sure. +And are you also planning to make a cloud version, something that you host for other people to access? Or is it that at the moment it's mostly self service, right?
So people need to download your repository, set it up and start using it. +The challenge with cloud is that cloud is not free. And if I were to introduce a cloud variant, it would have to have a paywall; people would have to log in, subscribe and all of that. It's just a far-fetched goal that, if I wanted to, I could build a paid version of it. +But what I really want is a downloadable version that people can download and easily access, not just software developers, but maybe everyone out there. Like a macOS app or an iOS app that people can download and start using. Yeah, for sure. That makes a lot of sense. +You also mentioned something about the challenges of maintaining this project. Is it challenging to maintain a project that you've been the author of, but where you now have so many contributors, maybe also a growing demand for things, and people making decisions, and so on? +How do you coordinate this today? Is it a challenge? Yeah, that's an interesting question, by the way. It becomes challenging after a certain time. When you're trending, it gives you a really intense dopamine hit and you're excited about it. +You check out the GitHub feed, you've grown by 200 stars and everything. I've seen a lot of people joining and talking to you. And then, after the whole phase fades out, there comes a time when I have a lot of pull requests that we have to talk about. +And even once the pull requests are there, you have to pull them, test them, and then there could be things where you have to request changes, and it all takes up a lot of time. So it does become challenging, but hey, that's part of the process. +You do the same thing at your workplace as well, so I won't focus too much on that. It's the work that's there. Yeah, it's the same for any open source project. Yeah, exactly.
+So it's kind of like a marathon, and you need to dedicate some chunk of your day to doing stuff, right? To maintain it, to keep it alive. Oh yeah, that's amazing. +You also mentioned that you basically replicated this success; you became like a marketing guru in some sense, right? I know there are even companies with open source repositories that haven't had the success you've had in just a few weeks. +I'm just really amazed by this. You also picked up another project, replicated the same process there, and grew it to a really big level. So can you talk a bit more about what this project is about? What's your role there, and what do you do? Yep. +So while I was doing Resume Matcher and all this, I was looking for a search engine out there, something that could easily federate queries across different sources and extract information. And this guest has also appeared on a previous episode of season two of Vector Podcast. +That's when I found Swirl Search. So we had a talk with Sid and we kicked off from, hey, we have similar interests. He was also into search and embeddings and all the AI stuff, and I was interested in that too. So I joined in as an open source contributor. +We talked about, hey, how can we replicate the success of Resume Matcher for Swirl? It wasn't much of a serious role, I would say, or a full-time role, but I did the testing of whether the principles work somewhere else as well. +So I took that as a challenge, and Swirl Search went from, say, a hundred GitHub stars to somewhere around 1,400 now. +And I did that in around two months, I would say. So it depends on the project and the timing; not everything is going to be resume-oriented. Swirl is a more practical, more enterprisey thing.
+What they do, in essence, is that they are an AI connect platform; they connect internal large language models to your enterprise data sources such as Teams and Outlook. So while they have some developer audience, it's pretty focused on a niche. +Unlike ChatGPT or something generic that everyone can use, it's really something that will cater to, let's say, five percent or even three percent of the audience out there. So that's that. I mean, that was the challenge, and I talked to the team. +We made some changes, wrote some blogs, and arrived at a really good, substantial result. So I would say the principles work for all the companies out there. Swirl could be considered one of the extreme use cases: hey, this is catering to enterprise, but it works for them. +So it can work for any generic public-facing project. Yeah, amazing. And please do check out the episode with Sid Probstein, who created Swirl; he's a very driven individual with a lot of experience in search engine development and software development at large. +Please check it out. Yeah, that's amazing. I also want to ask you, where do you go from here? Do you need some help with Resume Matcher, or do you have enough help but need, I don't know, maybe some cloud credits to host it? Something like that. +Well, anything else, too. Okay, yeah, for sure. Any help would be appreciated with Resume Matcher. If you can donate to the project, it helps drive the motivation to do stuff. +What would really help, especially, is with the generative AI offering that I'm going to develop: maybe helping to test it out with different AI providers. We also have open source models, Mistral, Google's Gemma, all those things, and then we have ChatGPT.
+So I would really welcome help in orchestrating the whole project around how to do it in a better generative manner. That's where I would really need anyone's help out there. And of course, cloud hosting. +If someone can help us with cloud hosting, or someone would like to sponsor the project, that would be even better, because the website gets a lot of traffic. I can put you up on the website; if you'd like to sponsor, you can reach out to me. +I will drop it in. That's fantastic. I'm sure there will be someone reaching out, or at least checking out the website and using the tool. Yeah, that's amazing. +I also like to ask a little bit more of a philosophical question at the end of the podcast, which I used to phrase as: why do you do things? But maybe I will try to rephrase it in this third season. +What drives you when you wake up? What drives you? Why are you doing this? What drives you in your open source work? Pretty amazing question. What drives you? So, for a philosophical question, I would like to quote Seneca, the Roman philosopher. +He says life is short and our time on planet Earth is pretty limited, and if you use it well, the same life could be long enough. So it's not about starting with why; you just start with something. And I started with open source. And it went from: anybody can do open source. +All you need is a laptop or a computer and to know how to use Git. And it starts from there. You build out a public project. You talk about it, share the idea, contribute to other projects. And the whole chain starts from there. +And that's how I found all these amazing people out there, including you, coming up with how to improve Resume Matcher. I've even met some really grateful, amazing people as well. +So in the beginning, when I was doing the whole thing, it never occurred to me that this project could go out and become a really trending project.
It never did. I mean, it was the wildest, maybe like a 0.01% chance, that this could go trending. And then it did happen. +And not only did it happen, it went out and became this whole trending thing multiple times. So that was pretty amazing. So in the beginning it was: let's just do something. We have time, don't waste it. Let's get started. And it actually panned out. +The more action you take, the better it gets. Yeah, absolutely. So keep on doing this. And also, you are very driven; every time we talk podcasts, I learn something from you. You give me a couple of links and I start checking them. So it seems like you are always on the edge. That's amazing. +Yeah, thank you very much. Saurabh, I really enjoyed chatting with you. I'm sure we'll connect more. I'm looking forward to the design, the craziest design that you will come up with for this episode. Of course. And all the best with Resume Matcher and with your blogging. +And I also know you're using a ton of modern tools, right? Like ChatGPT, and you apply them to your work. That's amazing. Yeah, for sure. And yes, thank you, Dmitry. Thank you for having me on this podcast and giving me the opportunity to talk about Resume Matcher. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md b/transcripts_with_timestamps/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md new file mode 100644 index 0000000..fe94217 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/sid-probstein-creator-of-swirl-search-in-siloed-data-with-llms.md @@ -0,0 +1,5254 @@ +--- +description: '

Topics:

00:00 Intro

00:22 Quick demo of SWIRL on the + summary transcript of this episode

01:29 Sid’s background

08:50 Enterprise + vs Federated search

17:48 How vector search covers for missing folksonomy + in enterprise data

26:07 Relevancy from vector search standpoint

31:58 + How ChatGPT improves programmer’s productivity

32:57 Demo!

45:23 Google + PSE

53:10 Ideal user of SWIRL

57:22 Where SWIRL sits architecturally

1:01:46 + How to evolve SWIRL with domain expertise

1:04:59 Reasons to go open source

1:10:54 + How SWIRL and Sid interact with ChatGPT

1:23:22 The magical question of WHY

1:27:58 + Sid’s announcements to the community

YouTube version: https://www.youtube.com/watch?v=vhQ5LM5pK_Y

Design + by Saurabh Rai: https://twitter.com/_srbhr_ + Check out his Resume Matcher project: https://www.resumematcher.fyi/

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20230722_050704_2b439b236c5d93de6718cfecda81d779.jpg +pub_date: Sat, 22 Jul 2023 05:03:26 GMT +title: Sid Probstein - Creator of SWIRL - Search in siloed data with LLMs +url: https://rss.com/podcasts/vector-podcast/1047952 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 28.72, "text": " In + this episode, you will learn about Swirl, MetaSearch Engine with large language + models", "tokens": [50364, 682, 341, 3500, 11, 291, 486, 1466, 466, 3926, 1648, + 11, 6377, 64, 10637, 1178, 7659, 365, 2416, 2856, 5245, 51800], "temperature": 0.0, + "avg_logprob": -0.3252786000569661, "compression_ratio": 1.0229885057471264, "no_speech_prob": + 0.08927268534898758}, {"id": 1, "seek": 2872, "start": 28.72, "end": 36.16, "text": + " for your silo data. Here you can see how it works for the summary transcript of + this episode", "tokens": [50364, 337, 428, 3425, 78, 1412, 13, 1692, 291, 393, 536, + 577, 309, 1985, 337, 264, 12691, 24444, 295, 341, 3500, 50736], "temperature": 0.0, + "avg_logprob": -0.33554888346109046, "compression_ratio": 1.4162679425837321, "no_speech_prob": + 0.19984905421733856}, {"id": 2, "seek": 2872, "start": 36.16, "end": 40.16, "text": + " created with the tool called ClearWord.", "tokens": [50736, 2942, 365, 264, 2290, + 1219, 14993, 54, 765, 13, 50936], "temperature": 0.0, "avg_logprob": -0.33554888346109046, + "compression_ratio": 1.4162679425837321, "no_speech_prob": 0.19984905421733856}, + {"id": 3, "seek": 2872, "start": 40.16, "end": 46.08, "text": " Hello there, Vector + Podcast Season 2, and today I''m super excited to be talking to", "tokens": [50936, + 2425, 456, 11, 691, 20814, 29972, 16465, 568, 11, 293, 965, 286, 478, 1687, 2919, + 281, 312, 1417, 281, 51232], "temperature": 0.0, "avg_logprob": -0.33554888346109046, + "compression_ratio": 1.4162679425837321, "no_speech_prob": 0.19984905421733856}, + {"id": 4, "seek": 2872, "start": 46.08, "end": 53.44, "text": " 
CIPROP STING, the + creator of Swirl Search. It''s a federated VectorSearch Engine.", "tokens": [51232, + 383, 9139, 7142, 47, 4904, 3017, 11, 264, 14181, 295, 3926, 1648, 17180, 13, 467, + 311, 257, 38024, 770, 691, 20814, 10637, 1178, 7659, 13, 51600], "temperature": + 0.0, "avg_logprob": -0.33554888346109046, "compression_ratio": 1.4162679425837321, + "no_speech_prob": 0.19984905421733856}, {"id": 5, "seek": 5344, "start": 53.44, + "end": 58.72, "text": " If I''m correct, but I would not hear more from CIT himself. + Hello, CIT, how are you?", "tokens": [50364, 759, 286, 478, 3006, 11, 457, 286, + 576, 406, 1568, 544, 490, 383, 3927, 3647, 13, 2425, 11, 383, 3927, 11, 577, 366, + 291, 30, 50628], "temperature": 0.0, "avg_logprob": -0.2250738236510638, "compression_ratio": + 1.5427350427350428, "no_speech_prob": 0.04136265814304352}, {"id": 6, "seek": 5344, + "start": 59.36, "end": 62.879999999999995, "text": " I''m doing great. It''s really + great to be here. Thank you so much for inviting me.", "tokens": [50660, 286, 478, + 884, 869, 13, 467, 311, 534, 869, 281, 312, 510, 13, 1044, 291, 370, 709, 337, 18202, + 385, 13, 50836], "temperature": 0.0, "avg_logprob": -0.2250738236510638, "compression_ratio": + 1.5427350427350428, "no_speech_prob": 0.04136265814304352}, {"id": 7, "seek": 5344, + "start": 62.879999999999995, "end": 71.36, "text": " Yeah, thanks for joining. I''m + sure you are very busy building Swirl, and I''m really curious to", "tokens": [50836, + 865, 11, 3231, 337, 5549, 13, 286, 478, 988, 291, 366, 588, 5856, 2390, 3926, 1648, + 11, 293, 286, 478, 534, 6369, 281, 51260], "temperature": 0.0, "avg_logprob": -0.2250738236510638, + "compression_ratio": 1.5427350427350428, "no_speech_prob": 0.04136265814304352}, + {"id": 8, "seek": 5344, "start": 71.36, "end": 77.92, "text": " learn more about + it. 
I missed all the discussion, you know, how chat GPT is going to change things.", + "tokens": [51260, 1466, 544, 466, 309, 13, 286, 6721, 439, 264, 5017, 11, 291, 458, + 11, 577, 5081, 26039, 51, 307, 516, 281, 1319, 721, 13, 51588], "temperature": 0.0, + "avg_logprob": -0.2250738236510638, "compression_ratio": 1.5427350427350428, "no_speech_prob": + 0.04136265814304352}, {"id": 9, "seek": 7792, "start": 78.48, "end": 85.6, "text": + " You know, is it going to conquer us or whatnot? But yeah, I mean, I''m really + interested to hear", "tokens": [50392, 509, 458, 11, 307, 309, 516, 281, 24136, + 505, 420, 25882, 30, 583, 1338, 11, 286, 914, 11, 286, 478, 534, 3102, 281, 1568, + 50748], "temperature": 0.0, "avg_logprob": -0.15245074272155762, "compression_ratio": + 1.616326530612245, "no_speech_prob": 0.17178726196289062}, {"id": 10, "seek": 7792, + "start": 85.6, "end": 91.76, "text": " how you guys are doing, how you guys are + building this. And traditionally, we start with your background", "tokens": [50748, + 577, 291, 1074, 366, 884, 11, 577, 291, 1074, 366, 2390, 341, 13, 400, 19067, 11, + 321, 722, 365, 428, 3678, 51056], "temperature": 0.0, "avg_logprob": -0.15245074272155762, + "compression_ratio": 1.616326530612245, "no_speech_prob": 0.17178726196289062}, + {"id": 11, "seek": 7792, "start": 91.76, "end": 98.64, "text": " because we really + want to know how you got here. Absolutely, no. And it''s been an interesting", "tokens": + [51056, 570, 321, 534, 528, 281, 458, 577, 291, 658, 510, 13, 7021, 11, 572, 13, + 400, 309, 311, 668, 364, 1880, 51400], "temperature": 0.0, "avg_logprob": -0.15245074272155762, + "compression_ratio": 1.616326530612245, "no_speech_prob": 0.17178726196289062}, + {"id": 12, "seek": 7792, "start": 98.64, "end": 105.76, "text": " journey. Swirl + actually is my, the 12th venture I''ve been lucky enough to work on. 
I started actually", + "tokens": [51400, 4671, 13, 3926, 1648, 767, 307, 452, 11, 264, 2272, 392, 18474, + 286, 600, 668, 6356, 1547, 281, 589, 322, 13, 286, 1409, 767, 51756], "temperature": + 0.0, "avg_logprob": -0.15245074272155762, "compression_ratio": 1.616326530612245, + "no_speech_prob": 0.17178726196289062}, {"id": 13, "seek": 10576, "start": 105.76, + "end": 113.76, "text": " at a free email company called FreeMarkMail. You might + remember Juno, our vastly more successful", "tokens": [50364, 412, 257, 1737, 3796, + 2237, 1219, 11551, 15168, 44, 864, 13, 509, 1062, 1604, 8492, 78, 11, 527, 41426, + 544, 4406, 50764], "temperature": 0.0, "avg_logprob": -0.16761928285871233, "compression_ratio": + 1.45, "no_speech_prob": 0.009068218991160393}, {"id": 14, "seek": 10576, "start": + 113.76, "end": 120.4, "text": " competitor. It was a great, great lesson in marketing + and customer acquisition. But long story short,", "tokens": [50764, 27266, 13, 467, + 390, 257, 869, 11, 869, 6898, 294, 6370, 293, 5474, 21668, 13, 583, 938, 1657, 2099, + 11, 51096], "temperature": 0.0, "avg_logprob": -0.16761928285871233, "compression_ratio": + 1.45, "no_speech_prob": 0.009068218991160393}, {"id": 15, "seek": 10576, "start": + 120.4, "end": 128.08, "text": " you know, my dad was an MIT professor, and he suggested, + or he was interested in computers,", "tokens": [51096, 291, 458, 11, 452, 3546, + 390, 364, 13100, 8304, 11, 293, 415, 10945, 11, 420, 415, 390, 3102, 294, 10807, + 11, 51480], "temperature": 0.0, "avg_logprob": -0.16761928285871233, "compression_ratio": + 1.45, "no_speech_prob": 0.009068218991160393}, {"id": 16, "seek": 12808, "start": + 128.16000000000003, "end": 138.16000000000003, "text": " and somewhere around, it + was too long ago, but I was about 12 and I picked up a TRS 80 with 16K of RAM,", + "tokens": [50368, 293, 4079, 926, 11, 309, 390, 886, 938, 2057, 11, 457, 286, 390, + 466, 2272, 293, 286, 6183, 493, 257, 15176, 50, 4688, 365, 3165, 42, 295, 14561, 
+ 11, 50868], "temperature": 0.0, "avg_logprob": -0.18698006798239314, "compression_ratio": + 1.434782608695652, "no_speech_prob": 0.007260477636009455}, {"id": 17, "seek": 12808, + "start": 138.16000000000003, "end": 144.64000000000001, "text": " I think, in a + cassette tape for storage. And we went to a couple of, actually, we went to two", + "tokens": [50868, 286, 519, 11, 294, 257, 40514, 7314, 337, 6725, 13, 400, 321, + 1437, 281, 257, 1916, 295, 11, 767, 11, 321, 1437, 281, 732, 51192], "temperature": + 0.0, "avg_logprob": -0.18698006798239314, "compression_ratio": 1.434782608695652, + "no_speech_prob": 0.007260477636009455}, {"id": 18, "seek": 12808, "start": 144.64000000000001, + "end": 149.76000000000002, "text": " classes together, and then he didn''t want + to do it anymore, but I stayed with it. And I have always", "tokens": [51192, 5359, + 1214, 11, 293, 550, 415, 994, 380, 528, 281, 360, 309, 3602, 11, 457, 286, 9181, + 365, 309, 13, 400, 286, 362, 1009, 51448], "temperature": 0.0, "avg_logprob": -0.18698006798239314, + "compression_ratio": 1.434782608695652, "no_speech_prob": 0.007260477636009455}, + {"id": 19, "seek": 14976, "start": 149.76, "end": 159.67999999999998, "text": " + loved getting that computer to do things that we wanted to do. And so I guess ever + since then,", "tokens": [50364, 4333, 1242, 300, 3820, 281, 360, 721, 300, 321, + 1415, 281, 360, 13, 400, 370, 286, 2041, 1562, 1670, 550, 11, 50860], "temperature": + 0.0, "avg_logprob": -0.1794012466279587, "compression_ratio": 1.58130081300813, + "no_speech_prob": 0.0018941774033010006}, {"id": 20, "seek": 14976, "start": 159.67999999999998, + "end": 165.04, "text": " I followed the tech path, so I was lucky enough to do my + undergrad at MIT. 
I actually have an MBA,", "tokens": [50860, 286, 6263, 264, 7553, + 3100, 11, 370, 286, 390, 6356, 1547, 281, 360, 452, 14295, 412, 13100, 13, 286, + 767, 362, 364, 26674, 11, 51128], "temperature": 0.0, "avg_logprob": -0.1794012466279587, + "compression_ratio": 1.58130081300813, "no_speech_prob": 0.0018941774033010006}, + {"id": 21, "seek": 14976, "start": 165.04, "end": 173.12, "text": " though, I''m + one of those MBA CTOs. And mostly I''ve worked building software and leading teams + to", "tokens": [51128, 1673, 11, 286, 478, 472, 295, 729, 26674, 383, 15427, 82, + 13, 400, 5240, 286, 600, 2732, 2390, 4722, 293, 5775, 5491, 281, 51532], "temperature": + 0.0, "avg_logprob": -0.1794012466279587, "compression_ratio": 1.58130081300813, + "no_speech_prob": 0.0018941774033010006}, {"id": 22, "seek": 14976, "start": 173.12, + "end": 178.56, "text": " build products and services. So some of them have been + a TIVIO, which is now actually service now,", "tokens": [51532, 1322, 3383, 293, + 3328, 13, 407, 512, 295, 552, 362, 668, 257, 314, 10375, 15167, 11, 597, 307, 586, + 767, 2643, 586, 11, 51804], "temperature": 0.0, "avg_logprob": -0.1794012466279587, + "compression_ratio": 1.58130081300813, "no_speech_prob": 0.0018941774033010006}, + {"id": 23, "seek": 17856, "start": 179.36, "end": 182.4, "text": " which is obviously + one of the unicorns out there. They really", "tokens": [50404, 597, 307, 2745, 472, + 295, 264, 28122, 82, 484, 456, 13, 814, 534, 50556], "temperature": 0.0, "avg_logprob": + -0.22655503890093634, "compression_ratio": 1.6072727272727272, "no_speech_prob": + 0.0016472884453833103}, {"id": 24, "seek": 17856, "start": 183.04, "end": 190.0, + "text": " totally disrupted the knowledge base and help desk space. 
And it''s an + incredible application", "tokens": [50588, 3879, 42271, 264, 3601, 3096, 293, 854, + 10026, 1901, 13, 400, 309, 311, 364, 4651, 3861, 50936], "temperature": 0.0, "avg_logprob": + -0.22655503890093634, "compression_ratio": 1.6072727272727272, "no_speech_prob": + 0.0016472884453833103}, {"id": 25, "seek": 17856, "start": 190.0, "end": 196.88, + "text": " of interesting core technology at the beginning, when things were whiteboardy. + I''ve worked in a", "tokens": [50936, 295, 1880, 4965, 2899, 412, 264, 2863, 11, + 562, 721, 645, 2418, 3787, 88, 13, 286, 600, 2732, 294, 257, 51280], "temperature": + 0.0, "avg_logprob": -0.22655503890093634, "compression_ratio": 1.6072727272727272, + "no_speech_prob": 0.0016472884453833103}, {"id": 26, "seek": 17856, "start": 196.88, + "end": 201.68, "text": " couple of other search companies, and with some other search + companies, I was lucky to spend a", "tokens": [51280, 1916, 295, 661, 3164, 3431, + 11, 293, 365, 512, 661, 3164, 3431, 11, 286, 390, 6356, 281, 3496, 257, 51520], + "temperature": 0.0, "avg_logprob": -0.22655503890093634, "compression_ratio": 1.6072727272727272, + "no_speech_prob": 0.0016472884453833103}, {"id": 27, "seek": 17856, "start": 201.68, + "end": 207.12, "text": " little time with Mercedes Arabian over at BA Insight, which + was a very cool and also Jeff Fried,", "tokens": [51520, 707, 565, 365, 22899, 8625, + 952, 670, 412, 21050, 9442, 397, 11, 597, 390, 257, 588, 1627, 293, 611, 7506, 17605, + 11, 51792], "temperature": 0.0, "avg_logprob": -0.22655503890093634, "compression_ratio": + 1.6072727272727272, "no_speech_prob": 0.0016472884453833103}, {"id": 28, "seek": + 20712, "start": 207.12, "end": 212.08, "text": " very cool company. 
And since I + know those guys back from fast, another company that I worked at,", "tokens": [50364, + 588, 1627, 2237, 13, 400, 1670, 286, 458, 729, 1074, 646, 490, 2370, 11, 1071, 2237, + 300, 286, 2732, 412, 11, 50612], "temperature": 0.0, "avg_logprob": -0.24365767117204337, + "compression_ratio": 1.5637860082304527, "no_speech_prob": 0.0026222297456115484}, + {"id": 29, "seek": 20712, "start": 212.08, "end": 219.68, "text": " now Microsoft, + fast was one of the early players in enterprise search that had an excellent product", + "tokens": [50612, 586, 8116, 11, 2370, 390, 472, 295, 264, 2440, 4150, 294, 14132, + 3164, 300, 632, 364, 7103, 1674, 50992], "temperature": 0.0, "avg_logprob": -0.24365767117204337, + "compression_ratio": 1.5637860082304527, "no_speech_prob": 0.0026222297456115484}, + {"id": 30, "seek": 20712, "start": 219.68, "end": 225.36, "text": " that scaled + and right as Google was sort of becoming a household name and just intermediate", + "tokens": [50992, 300, 36039, 293, 558, 382, 3329, 390, 1333, 295, 5617, 257, 9888, + 1315, 293, 445, 19376, 51276], "temperature": 0.0, "avg_logprob": -0.24365767117204337, + "compression_ratio": 1.5637860082304527, "no_speech_prob": 0.0026222297456115484}, + {"id": 31, "seek": 20712, "start": 225.36, "end": 233.44, "text": " everybody. We + had the tool to build the catalog, the e-catalog, that mostly for publishers,", + "tokens": [51276, 2201, 13, 492, 632, 264, 2290, 281, 1322, 264, 19746, 11, 264, + 308, 12, 18035, 44434, 11, 300, 5240, 337, 30421, 11, 51680], "temperature": 0.0, + "avg_logprob": -0.24365767117204337, "compression_ratio": 1.5637860082304527, "no_speech_prob": + 0.0026222297456115484}, {"id": 32, "seek": 23344, "start": 233.44, "end": 237.92, + "text": " and but then it really spread out and started to affect intranets. 
And + it was truly there that I", "tokens": [50364, 293, 457, 550, 309, 534, 3974, 484, + 293, 1409, 281, 3345, 560, 4257, 1385, 13, 400, 309, 390, 4908, 456, 300, 286, 50588], + "temperature": 0.0, "avg_logprob": -0.192283743518894, "compression_ratio": 1.582236842105263, + "no_speech_prob": 0.01615281216800213}, {"id": 33, "seek": 23344, "start": 237.92, + "end": 245.35999999999999, "text": " saw the power of search and how it could change + almost everything from the business perspective.", "tokens": [50588, 1866, 264, + 1347, 295, 3164, 293, 577, 309, 727, 1319, 1920, 1203, 490, 264, 1606, 4585, 13, + 50960], "temperature": 0.0, "avg_logprob": -0.192283743518894, "compression_ratio": + 1.582236842105263, "no_speech_prob": 0.01615281216800213}, {"id": 34, "seek": 23344, + "start": 245.92, "end": 250.8, "text": " You know, business intelligence and reporting + and all of these systems that have been around for", "tokens": [50988, 509, 458, + 11, 1606, 7599, 293, 10031, 293, 439, 295, 613, 3652, 300, 362, 668, 926, 337, 51232], + "temperature": 0.0, "avg_logprob": -0.192283743518894, "compression_ratio": 1.582236842105263, + "no_speech_prob": 0.01615281216800213}, {"id": 35, "seek": 23344, "start": 250.8, + "end": 257.04, "text": " 70, 80 years, they''re what we settle for. But everybody, + you know, from Brin and Page on,", "tokens": [51232, 5285, 11, 4688, 924, 11, 436, + 434, 437, 321, 11852, 337, 13, 583, 2201, 11, 291, 458, 11, 490, 1603, 259, 293, + 21217, 322, 11, 51544], "temperature": 0.0, "avg_logprob": -0.192283743518894, "compression_ratio": + 1.582236842105263, "no_speech_prob": 0.01615281216800213}, {"id": 36, "seek": 23344, + "start": 257.04, "end": 263.04, "text": " right, and way before that, we''re all + inspired by that Star Trek computer. 
Why can''t we just ask it,", "tokens": [51544, + 558, 11, 293, 636, 949, 300, 11, 321, 434, 439, 7547, 538, 300, 5705, 25845, 3820, + 13, 1545, 393, 380, 321, 445, 1029, 309, 11, 51844], "temperature": 0.0, "avg_logprob": + -0.192283743518894, "compression_ratio": 1.582236842105263, "no_speech_prob": 0.01615281216800213}, + {"id": 37, "seek": 26304, "start": 263.04, "end": 267.68, "text": " you know, it + seems like it''s not that hard. And now of course, not to give away the lead, right?", + "tokens": [50364, 291, 458, 11, 309, 2544, 411, 309, 311, 406, 300, 1152, 13, 400, + 586, 295, 1164, 11, 406, 281, 976, 1314, 264, 1477, 11, 558, 30, 50596], "temperature": + 0.0, "avg_logprob": -0.11837277292203502, "compression_ratio": 1.6928571428571428, + "no_speech_prob": 0.0009549632668495178}, {"id": 38, "seek": 26304, "start": 267.68, + "end": 272.32, "text": " But there''s definitely something doing that and it''s + been a long time coming. But that is not", "tokens": [50596, 583, 456, 311, 2138, + 746, 884, 300, 293, 309, 311, 668, 257, 938, 565, 1348, 13, 583, 300, 307, 406, + 50828], "temperature": 0.0, "avg_logprob": -0.11837277292203502, "compression_ratio": + 1.6928571428571428, "no_speech_prob": 0.0009549632668495178}, {"id": 39, "seek": + 26304, "start": 273.04, "end": 280.08000000000004, "text": " structured data. Well, + let''s not argue about the semantics, but it''s not what people refer to", "tokens": + [50864, 18519, 1412, 13, 1042, 11, 718, 311, 406, 9695, 466, 264, 4361, 45298, 11, + 457, 309, 311, 406, 437, 561, 2864, 281, 51216], "temperature": 0.0, "avg_logprob": + -0.11837277292203502, "compression_ratio": 1.6928571428571428, "no_speech_prob": + 0.0009549632668495178}, {"id": 40, "seek": 26304, "start": 280.08000000000004, "end": + 284.08000000000004, "text": " as structured. 
It''s not database data metrics and + KPIs and sales numbers and things like that.", "tokens": [51216, 382, 18519, 13, + 467, 311, 406, 8149, 1412, 16367, 293, 41371, 6802, 293, 5763, 3547, 293, 721, 411, + 300, 13, 51416], "temperature": 0.0, "avg_logprob": -0.11837277292203502, "compression_ratio": + 1.6928571428571428, "no_speech_prob": 0.0009549632668495178}, {"id": 41, "seek": + 26304, "start": 285.28000000000003, "end": 290.08000000000004, "text": " I think + that it was really at fast and also at Northern Light Technology, which is still + going", "tokens": [51476, 286, 519, 300, 309, 390, 534, 412, 2370, 293, 611, 412, + 14335, 8279, 15037, 11, 597, 307, 920, 516, 51716], "temperature": 0.0, "avg_logprob": + -0.11837277292203502, "compression_ratio": 1.6928571428571428, "no_speech_prob": + 0.0009549632668495178}, {"id": 42, "seek": 29008, "start": 290.08, "end": 294.0, + "text": " strong, by the way, with some fantastic indexing search. And now they''re + doing question answering.", "tokens": [50364, 2068, 11, 538, 264, 636, 11, 365, + 512, 5456, 8186, 278, 3164, 13, 400, 586, 436, 434, 884, 1168, 13430, 13, 50560], + "temperature": 0.0, "avg_logprob": -0.14486512116023473, "compression_ratio": 1.6785714285714286, + "no_speech_prob": 0.002761778188869357}, {"id": 43, "seek": 29008, "start": 295.28, + "end": 299.91999999999996, "text": " First place I really touched search, right, + was at Northern Light. It''s the human interface. And", "tokens": [50624, 2386, + 1081, 286, 534, 9828, 3164, 11, 558, 11, 390, 412, 14335, 8279, 13, 467, 311, 264, + 1952, 9226, 13, 400, 50856], "temperature": 0.0, "avg_logprob": -0.14486512116023473, + "compression_ratio": 1.6785714285714286, "no_speech_prob": 0.002761778188869357}, + {"id": 44, "seek": 29008, "start": 300.64, "end": 306.15999999999997, "text": " + we feel like it should be coming along faster. 
And now the text after many years + of indexing", "tokens": [50892, 321, 841, 411, 309, 820, 312, 1348, 2051, 4663, + 13, 400, 586, 264, 2487, 934, 867, 924, 295, 8186, 278, 51168], "temperature": 0.0, + "avg_logprob": -0.14486512116023473, "compression_ratio": 1.6785714285714286, "no_speech_prob": + 0.002761778188869357}, {"id": 45, "seek": 29008, "start": 306.15999999999997, "end": + 311.84, "text": " and vector search, right? And the advances driven by Google so + much, right? Transformer architectures", "tokens": [51168, 293, 8062, 3164, 11, + 558, 30, 400, 264, 25297, 9555, 538, 3329, 370, 709, 11, 558, 30, 27938, 260, 6331, + 1303, 51452], "temperature": 0.0, "avg_logprob": -0.14486512116023473, "compression_ratio": + 1.6785714285714286, "no_speech_prob": 0.002761778188869357}, {"id": 46, "seek": + 29008, "start": 311.84, "end": 317.52, "text": " and vectors. And that has all come + together into a pretty amazing place. And so", "tokens": [51452, 293, 18875, 13, + 400, 300, 575, 439, 808, 1214, 666, 257, 1238, 2243, 1081, 13, 400, 370, 51736], + "temperature": 0.0, "avg_logprob": -0.14486512116023473, "compression_ratio": 1.6785714285714286, + "no_speech_prob": 0.002761778188869357}, {"id": 47, "seek": 31752, "start": 318.47999999999996, + "end": 324.79999999999995, "text": " long story short, that background led me to + create swirl because I noticed a couple things.", "tokens": [50412, 938, 1657, 2099, + 11, 300, 3678, 4684, 385, 281, 1884, 30310, 570, 286, 5694, 257, 1916, 721, 13, + 50728], "temperature": 0.0, "avg_logprob": -0.2211082758528463, "compression_ratio": + 1.69377990430622, "no_speech_prob": 0.0010598271619528532}, {"id": 48, "seek": 31752, + "start": 326.0, "end": 331.03999999999996, "text": " It really came down to three + things. 
One is that there''s, there are silos, super silos,", "tokens": [50788, + 467, 534, 1361, 760, 281, 1045, 721, 13, 1485, 307, 300, 456, 311, 11, 456, 366, + 48893, 11, 1687, 48893, 11, 51040], "temperature": 0.0, "avg_logprob": -0.2211082758528463, + "compression_ratio": 1.69377990430622, "no_speech_prob": 0.0010598271619528532}, + {"id": 49, "seek": 31752, "start": 331.03999999999996, "end": 336.79999999999995, + "text": " like service now. Service now really did get a lot of the knowledge bases + and a lot of those,", "tokens": [51040, 411, 2643, 586, 13, 9561, 586, 534, 630, + 483, 257, 688, 295, 264, 3601, 17949, 293, 257, 688, 295, 729, 11, 51328], "temperature": + 0.0, "avg_logprob": -0.2211082758528463, "compression_ratio": 1.69377990430622, + "no_speech_prob": 0.0010598271619528532}, {"id": 50, "seek": 31752, "start": 337.52, + "end": 341.52, "text": " a lot of the help desk, you know, the tickets base with + the streams and tickets.", "tokens": [51364, 257, 688, 295, 264, 854, 10026, 11, + 291, 458, 11, 264, 12628, 3096, 365, 264, 15842, 293, 12628, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.2211082758528463, "compression_ratio": 1.69377990430622, + "no_speech_prob": 0.0010598271619528532}, {"id": 51, "seek": 34152, "start": 342.32, + "end": 348.71999999999997, "text": " M365 kind of won the files race at least right + along with email. 
And I guess they''ve done very well.", "tokens": [50404, 376, + 11309, 20, 733, 295, 1582, 264, 7098, 4569, 412, 1935, 558, 2051, 365, 3796, 13, + 400, 286, 2041, 436, 600, 1096, 588, 731, 13, 50724], "temperature": 0.0, "avg_logprob": + -0.16326441083635604, "compression_ratio": 1.5671140939597314, "no_speech_prob": + 0.004269641358405352}, {"id": 52, "seek": 34152, "start": 349.59999999999997, "end": + 354.08, "text": " Obviously, very impressive performance to build teams to the large + community that it has developed.", "tokens": [50768, 7580, 11, 588, 8992, 3389, + 281, 1322, 5491, 281, 264, 2416, 1768, 300, 309, 575, 4743, 13, 50992], "temperature": + 0.0, "avg_logprob": -0.16326441083635604, "compression_ratio": 1.5671140939597314, + "no_speech_prob": 0.004269641358405352}, {"id": 53, "seek": 34152, "start": 354.64, + "end": 359.12, "text": " So, and then there are others, right? There''s certainly + Salesforce, a great example of where most", "tokens": [51020, 407, 11, 293, 550, + 456, 366, 2357, 11, 558, 30, 821, 311, 3297, 40398, 11, 257, 869, 1365, 295, 689, + 881, 51244], "temperature": 0.0, "avg_logprob": -0.16326441083635604, "compression_ratio": + 1.5671140939597314, "no_speech_prob": 0.004269641358405352}, {"id": 54, "seek": + 34152, "start": 359.12, "end": 364.71999999999997, "text": " of the CRM data now + lives. Snowflakes, another one, you can''t really get a copy of these. 
I mean,", + "tokens": [51244, 295, 264, 14123, 44, 1412, 586, 2909, 13, 14827, 3423, 3419, 11, + 1071, 472, 11, 291, 393, 380, 534, 483, 257, 5055, 295, 613, 13, 286, 914, 11, 51524], + "temperature": 0.0, "avg_logprob": -0.16326441083635604, "compression_ratio": 1.5671140939597314, + "no_speech_prob": 0.004269641358405352}, {"id": 55, "seek": 34152, "start": 364.71999999999997, + "end": 367.84, "text": " moving the data out from Snowflake is relatively easy, + but the others,", "tokens": [51524, 2684, 264, 1412, 484, 490, 14827, 3423, 619, + 307, 7226, 1858, 11, 457, 264, 2357, 11, 51680], "temperature": 0.0, "avg_logprob": + -0.16326441083635604, "compression_ratio": 1.5671140939597314, "no_speech_prob": + 0.004269641358405352}, {"id": 56, "seek": 36784, "start": 368.23999999999995, "end": + 374.64, "text": " there''s a complicated API there. Salesforce has thousands of + tables. So, you can''t really get", "tokens": [50384, 456, 311, 257, 6179, 9362, + 456, 13, 40398, 575, 5383, 295, 8020, 13, 407, 11, 291, 393, 380, 534, 483, 50704], + "temperature": 0.0, "avg_logprob": -0.1893357907311391, "compression_ratio": 1.6895306859205776, + "no_speech_prob": 0.006047777831554413}, {"id": 57, "seek": 36784, "start": 374.64, + "end": 380.0, "text": " that data anymore, but yet it has some of the most important + ideas, concepts, and knowledge in your", "tokens": [50704, 300, 1412, 3602, 11, + 457, 1939, 309, 575, 512, 295, 264, 881, 1021, 3487, 11, 10392, 11, 293, 3601, 294, + 428, 50972], "temperature": 0.0, "avg_logprob": -0.1893357907311391, "compression_ratio": + 1.6895306859205776, "no_speech_prob": 0.006047777831554413}, {"id": 58, "seek": + 36784, "start": 380.0, "end": 385.91999999999996, "text": " entire company. So, + that''s when I realized something that had been tried before. 
MetaSearch,", "tokens": + [50972, 2302, 2237, 13, 407, 11, 300, 311, 562, 286, 5334, 746, 300, 632, 668, 3031, + 949, 13, 6377, 64, 10637, 1178, 11, 51268], "temperature": 0.0, "avg_logprob": -0.1893357907311391, + "compression_ratio": 1.6895306859205776, "no_speech_prob": 0.006047777831554413}, + {"id": 59, "seek": 36784, "start": 385.91999999999996, "end": 389.76, "text": " + right, or federated search. I think MetaSearch is clear up because now sometimes + people say", "tokens": [51268, 558, 11, 420, 38024, 770, 3164, 13, 286, 519, 6377, + 64, 10637, 1178, 307, 1850, 493, 570, 586, 2171, 561, 584, 51460], "temperature": + 0.0, "avg_logprob": -0.1893357907311391, "compression_ratio": 1.6895306859205776, + "no_speech_prob": 0.006047777831554413}, {"id": 60, "seek": 36784, "start": 390.47999999999996, + "end": 397.59999999999997, "text": " federated search is about e-commerce federation. + The MetaSearch was hard to do because of", "tokens": [51496, 38024, 770, 3164, 307, + 466, 308, 12, 26926, 4636, 5053, 13, 440, 6377, 64, 10637, 1178, 390, 1152, 281, + 360, 570, 295, 51852], "temperature": 0.0, "avg_logprob": -0.1893357907311391, "compression_ratio": + 1.6895306859205776, "no_speech_prob": 0.006047777831554413}, {"id": 61, "seek": + 39760, "start": 397.6, "end": 402.08000000000004, "text": " connectivity, right? + Like it could take you months to just get somebody to change a network", "tokens": + [50364, 21095, 11, 558, 30, 1743, 309, 727, 747, 291, 2493, 281, 445, 483, 2618, + 281, 1319, 257, 3209, 50588], "temperature": 0.0, "avg_logprob": -0.16252016813858697, + "compression_ratio": 1.5766666666666667, "no_speech_prob": 0.000260751461610198}, + {"id": 62, "seek": 39760, "start": 402.08000000000004, "end": 406.56, "text": " + thing or to put a VPN in place or either change permissions. 
That was expensive + in large enterprise.", "tokens": [50588, 551, 420, 281, 829, 257, 24512, 294, 1081, + 420, 2139, 1319, 32723, 13, 663, 390, 5124, 294, 2416, 14132, 13, 50812], "temperature": + 0.0, "avg_logprob": -0.16252016813858697, "compression_ratio": 1.5766666666666667, + "no_speech_prob": 0.000260751461610198}, {"id": 63, "seek": 39760, "start": 407.12, + "end": 411.6, "text": " But now, especially with public services, pretty much everything + has an API. The perimeter", "tokens": [50840, 583, 586, 11, 2318, 365, 1908, 3328, + 11, 1238, 709, 1203, 575, 364, 9362, 13, 440, 32404, 51064], "temperature": 0.0, + "avg_logprob": -0.16252016813858697, "compression_ratio": 1.5766666666666667, "no_speech_prob": + 0.000260751461610198}, {"id": 64, "seek": 39760, "start": 411.6, "end": 416.40000000000003, + "text": " doesn''t exist the way I used to. And so, you can query everything. So, + that left the problem", "tokens": [51064, 1177, 380, 2514, 264, 636, 286, 1143, + 281, 13, 400, 370, 11, 291, 393, 14581, 1203, 13, 407, 11, 300, 1411, 264, 1154, + 51304], "temperature": 0.0, "avg_logprob": -0.16252016813858697, "compression_ratio": + 1.5766666666666667, "no_speech_prob": 0.000260751461610198}, {"id": 65, "seek": + 39760, "start": 416.40000000000003, "end": 421.76000000000005, "text": " of can + you make sense of things? 
And that''s of course, what we''re here about, right, + is vectors.", "tokens": [51304, 295, 393, 291, 652, 2020, 295, 721, 30, 400, 300, + 311, 295, 1164, 11, 437, 321, 434, 510, 466, 11, 558, 11, 307, 18875, 13, 51572], + "temperature": 0.0, "avg_logprob": -0.16252016813858697, "compression_ratio": 1.5766666666666667, + "no_speech_prob": 0.000260751461610198}, {"id": 66, "seek": 42176, "start": 421.92, + "end": 428.08, "text": " The power of vector search and vector similarity, specifically, + right, self-cosign vector", "tokens": [50372, 440, 1347, 295, 8062, 3164, 293, 8062, + 32194, 11, 4682, 11, 558, 11, 2698, 12, 6877, 788, 8062, 50680], "temperature": + 0.0, "avg_logprob": -0.20515808377947126, "compression_ratio": 1.7045454545454546, + "no_speech_prob": 0.058903735131025314}, {"id": 67, "seek": 42176, "start": 428.08, + "end": 432.88, "text": " similarity that we use in Swirl to make sense of completely + disparate and very, very,", "tokens": [50680, 32194, 300, 321, 764, 294, 3926, 1648, + 281, 652, 2020, 295, 2584, 14548, 473, 293, 588, 11, 588, 11, 50920], "temperature": + 0.0, "avg_logprob": -0.20515808377947126, "compression_ratio": 1.7045454545454546, + "no_speech_prob": 0.058903735131025314}, {"id": 68, "seek": 42176, "start": 434.0, + "end": 438.08, "text": " very incompatible results, if you will. And it''s shocking + how well it works.", "tokens": [50976, 588, 40393, 267, 964, 3542, 11, 498, 291, + 486, 13, 400, 309, 311, 18776, 577, 731, 309, 1985, 13, 51180], "temperature": 0.0, + "avg_logprob": -0.20515808377947126, "compression_ratio": 1.7045454545454546, "no_speech_prob": + 0.058903735131025314}, {"id": 69, "seek": 42176, "start": 438.71999999999997, "end": + 442.4, "text": " That that''s when I saw it work, I said there''s more to this than + I thought it now. 
It seems", "tokens": [51212, 663, 300, 311, 562, 286, 1866, 309, + 589, 11, 286, 848, 456, 311, 544, 281, 341, 813, 286, 1194, 309, 586, 13, 467, 2544, + 51396], "temperature": 0.0, "avg_logprob": -0.20515808377947126, "compression_ratio": + 1.7045454545454546, "no_speech_prob": 0.058903735131025314}, {"id": 70, "seek": + 42176, "start": 442.4, "end": 446.08, "text": " I''m not the only one. So, but anyway, + that''s a little bit of the story in my background. I hope", "tokens": [51396, 286, + 478, 406, 264, 787, 472, 13, 407, 11, 457, 4033, 11, 300, 311, 257, 707, 857, 295, + 264, 1657, 294, 452, 3678, 13, 286, 1454, 51580], "temperature": 0.0, "avg_logprob": + -0.20515808377947126, "compression_ratio": 1.7045454545454546, "no_speech_prob": + 0.058903735131025314}, {"id": 71, "seek": 42176, "start": 446.08, "end": 450.4, + "text": " that that made some sense. Yeah, it''s very solid background. You reminded + me of one,", "tokens": [51580, 300, 300, 1027, 512, 2020, 13, 865, 11, 309, 311, + 588, 5100, 3678, 13, 509, 15920, 385, 295, 472, 11, 51796], "temperature": 0.0, + "avg_logprob": -0.20515808377947126, "compression_ratio": 1.7045454545454546, "no_speech_prob": + 0.058903735131025314}, {"id": 72, "seek": 45040, "start": 450.96, "end": 455.67999999999995, + "text": " I don''t remember the name of that computer, but like didn''t have the + display the way we have today,", "tokens": [50392, 286, 500, 380, 1604, 264, 1315, + 295, 300, 3820, 11, 457, 411, 994, 380, 362, 264, 4674, 264, 636, 321, 362, 965, + 11, 50628], "temperature": 0.0, "avg_logprob": -0.19299275324894832, "compression_ratio": + 1.6506849315068493, "no_speech_prob": 0.05183985456824303}, {"id": 73, "seek": 45040, + "start": 455.67999999999995, "end": 461.2, "text": " right? It just had the keyboard. + And then it had the cassette. 
And so, my friend and I were", "tokens": [50628, 558, + 30, 467, 445, 632, 264, 10186, 13, 400, 550, 309, 632, 264, 40514, 13, 400, 370, + 11, 452, 1277, 293, 286, 645, 50904], "temperature": 0.0, "avg_logprob": -0.19299275324894832, + "compression_ratio": 1.6506849315068493, "no_speech_prob": 0.05183985456824303}, + {"id": 74, "seek": 45040, "start": 461.2, "end": 467.12, "text": " sitting there + for several minutes to boot it. And then there was some game like Mario or whatever.", + "tokens": [50904, 3798, 456, 337, 2940, 2077, 281, 11450, 309, 13, 400, 550, 456, + 390, 512, 1216, 411, 9343, 420, 2035, 13, 51200], "temperature": 0.0, "avg_logprob": + -0.19299275324894832, "compression_ratio": 1.6506849315068493, "no_speech_prob": + 0.05183985456824303}, {"id": 75, "seek": 45040, "start": 468.56, "end": 474.88, + "text": " That was on the cool Apple twos. I was always envious of the Apple two, + you know, kids. Because", "tokens": [51272, 663, 390, 322, 264, 1627, 6373, 683, + 329, 13, 286, 390, 1009, 465, 1502, 295, 264, 6373, 732, 11, 291, 458, 11, 2301, + 13, 1436, 51588], "temperature": 0.0, "avg_logprob": -0.19299275324894832, "compression_ratio": + 1.6506849315068493, "no_speech_prob": 0.05183985456824303}, {"id": 76, "seek": 45040, + "start": 474.88, "end": 479.76, "text": " you''re right, on the TRS-80, we only + had block graphics. It was hilarious. But it didn''t move a", "tokens": [51588, + 291, 434, 558, 11, 322, 264, 15176, 50, 12, 4702, 11, 321, 787, 632, 3461, 11837, + 13, 467, 390, 19796, 13, 583, 309, 994, 380, 1286, 257, 51832], "temperature": 0.0, + "avg_logprob": -0.19299275324894832, "compression_ratio": 1.6506849315068493, "no_speech_prob": + 0.05183985456824303}, {"id": 77, "seek": 47976, "start": 479.76, "end": 484.8, "text": + " little bit faster in a way. Like you get to wait a long time for Apple upgrades. 
+ But I remember", "tokens": [50364, 707, 857, 4663, 294, 257, 636, 13, 1743, 291, + 483, 281, 1699, 257, 938, 565, 337, 6373, 24868, 13, 583, 286, 1604, 50616], "temperature": + 0.0, "avg_logprob": -0.1847791515412878, "compression_ratio": 1.595890410958904, + "no_speech_prob": 0.0028061617631465197}, {"id": 78, "seek": 47976, "start": 485.92, + "end": 491.28, "text": " the TRS-80, there was an incredible ecosystem of things + you could add to it. So, memory. And then", "tokens": [50672, 264, 15176, 50, 12, + 4702, 11, 456, 390, 364, 4651, 11311, 295, 721, 291, 727, 909, 281, 309, 13, 407, + 11, 4675, 13, 400, 550, 50940], "temperature": 0.0, "avg_logprob": -0.1847791515412878, + "compression_ratio": 1.595890410958904, "no_speech_prob": 0.0028061617631465197}, + {"id": 79, "seek": 47976, "start": 491.28, "end": 495.68, "text": " there was a + company called Percom that put out disk drives. Wow, a disk drive. That was a game", + "tokens": [50940, 456, 390, 257, 2237, 1219, 3026, 1112, 300, 829, 484, 12355, 11754, + 13, 3153, 11, 257, 12355, 3332, 13, 663, 390, 257, 1216, 51160], "temperature": + 0.0, "avg_logprob": -0.1847791515412878, "compression_ratio": 1.595890410958904, + "no_speech_prob": 0.0028061617631465197}, {"id": 80, "seek": 47976, "start": 495.68, + "end": 500.64, "text": " changer if you played with a cassette recorder. 
Although, + who didn''t love switching your parents''", "tokens": [51160, 22822, 498, 291, 3737, + 365, 257, 40514, 37744, 13, 5780, 11, 567, 994, 380, 959, 16493, 428, 3152, 6, 51408], + "temperature": 0.0, "avg_logprob": -0.1847791515412878, "compression_ratio": 1.595890410958904, + "no_speech_prob": 0.0028061617631465197}, {"id": 81, "seek": 47976, "start": 500.64, + "end": 503.59999999999997, "text": " cassettes with the with the data tape so they''d + put it on in the car and we go,", "tokens": [51408, 21943, 16049, 365, 264, 365, + 264, 1412, 7314, 370, 436, 1116, 829, 309, 322, 294, 264, 1032, 293, 321, 352, 11, + 51556], "temperature": 0.0, "avg_logprob": -0.1847791515412878, "compression_ratio": + 1.595890410958904, "no_speech_prob": 0.0028061617631465197}, {"id": 82, "seek": + 50360, "start": 503.6, "end": 509.44, "text": " okay, are they going to stop and + turn that off? It was a hilarious prank. A great way to get", "tokens": [50364, + 1392, 11, 366, 436, 516, 281, 1590, 293, 1261, 300, 766, 30, 467, 390, 257, 19796, + 19794, 13, 316, 869, 636, 281, 483, 50656], "temperature": 0.0, "avg_logprob": -0.2387035157945421, + "compression_ratio": 1.6529209621993126, "no_speech_prob": 0.010534117929637432}, + {"id": 83, "seek": 50360, "start": 509.44, "end": 516.0, "text": " some sound. But + disk drives then gave, right? First, there were the five in a quarter or actually", + "tokens": [50656, 512, 1626, 13, 583, 12355, 11754, 550, 2729, 11, 558, 30, 2386, + 11, 456, 645, 264, 1732, 294, 257, 6555, 420, 767, 50984], "temperature": 0.0, "avg_logprob": + -0.2387035157945421, "compression_ratio": 1.6529209621993126, "no_speech_prob": + 0.010534117929637432}, {"id": 84, "seek": 50360, "start": 516.0, "end": 519.9200000000001, + "text": " eight inch, then five in a quarter. And then finally, they''ve got to + the cassette. 
I was at that", "tokens": [50984, 3180, 7227, 11, 550, 1732, 294, + 257, 6555, 13, 400, 550, 2721, 11, 436, 600, 658, 281, 264, 40514, 13, 286, 390, + 412, 300, 51180], "temperature": 0.0, "avg_logprob": -0.2387035157945421, "compression_ratio": + 1.6529209621993126, "no_speech_prob": 0.010534117929637432}, {"id": 85, "seek": + 50360, "start": 519.9200000000001, "end": 526.4, "text": " point, it was sort of + replaced, right? Then the IBM PC showed up. And that was a bit of a game", "tokens": + [51180, 935, 11, 309, 390, 1333, 295, 10772, 11, 558, 30, 1396, 264, 23487, 6465, + 4712, 493, 13, 400, 300, 390, 257, 857, 295, 257, 1216, 51504], "temperature": 0.0, + "avg_logprob": -0.2387035157945421, "compression_ratio": 1.6529209621993126, "no_speech_prob": + 0.010534117929637432}, {"id": 86, "seek": 50360, "start": 526.4, "end": 533.0400000000001, + "text": " changer. But the Apple always had better graphics. Yeah, absolutely. I + just wanted to come back to", "tokens": [51504, 22822, 13, 583, 264, 6373, 1009, + 632, 1101, 11837, 13, 865, 11, 3122, 13, 286, 445, 1415, 281, 808, 646, 281, 51836], + "temperature": 0.0, "avg_logprob": -0.2387035157945421, "compression_ratio": 1.6529209621993126, + "no_speech_prob": 0.010534117929637432}, {"id": 87, "seek": 53360, "start": 533.6, + "end": 538.32, "text": " what you just said about federated search and enterprise + search. 
I think I remember hearing about", "tokens": [50364, 437, 291, 445, 848, + 466, 38024, 770, 3164, 293, 14132, 3164, 13, 286, 519, 286, 1604, 4763, 466, 50600], + "temperature": 0.0, "avg_logprob": -0.12487641334533692, "compression_ratio": 1.592, + "no_speech_prob": 0.047529760748147964}, {"id": 88, "seek": 53360, "start": 538.32, + "end": 544.88, "text": " enterprise search was it like 15, 16, 17 years ago, I don''t + remember in the university when one of", "tokens": [50600, 14132, 3164, 390, 309, + 411, 2119, 11, 3165, 11, 3282, 924, 2057, 11, 286, 500, 380, 1604, 294, 264, 5454, + 562, 472, 295, 50928], "temperature": 0.0, "avg_logprob": -0.12487641334533692, + "compression_ratio": 1.592, "no_speech_prob": 0.047529760748147964}, {"id": 89, + "seek": 53360, "start": 544.88, "end": 551.0400000000001, "text": " my supervisors + was focusing on it and he was saying, this is the next big thing. And once it''s + figured", "tokens": [50928, 452, 42218, 390, 8416, 322, 309, 293, 415, 390, 1566, + 11, 341, 307, 264, 958, 955, 551, 13, 400, 1564, 309, 311, 8932, 51236], "temperature": + 0.0, "avg_logprob": -0.12487641334533692, "compression_ratio": 1.592, "no_speech_prob": + 0.047529760748147964}, {"id": 90, "seek": 53360, "start": 551.0400000000001, "end": + 561.0400000000001, "text": " out, you know, we will be rich. But somehow it didn''t + happen. 
And then later in my career, I heard", "tokens": [51236, 484, 11, 291, 458, + 11, 321, 486, 312, 4593, 13, 583, 6063, 309, 994, 380, 1051, 13, 400, 550, 1780, + 294, 452, 3988, 11, 286, 2198, 51736], "temperature": 0.0, "avg_logprob": -0.12487641334533692, + "compression_ratio": 1.592, "no_speech_prob": 0.047529760748147964}, {"id": 91, + "seek": 56104, "start": 561.52, "end": 569.52, "text": " term federated search in + connection to, okay, we have our own search engine, we have clients data,", "tokens": + [50388, 1433, 38024, 770, 3164, 294, 4984, 281, 11, 1392, 11, 321, 362, 527, 1065, + 3164, 2848, 11, 321, 362, 6982, 1412, 11, 50788], "temperature": 0.0, "avg_logprob": + -0.172983447710673, "compression_ratio": 1.579591836734694, "no_speech_prob": 0.013129369355738163}, + {"id": 92, "seek": 56104, "start": 570.0799999999999, "end": 575.68, "text": " can + we combine the two without needing them to upload their data to our servers? Because + in some", "tokens": [50816, 393, 321, 10432, 264, 732, 1553, 18006, 552, 281, 6580, + 641, 1412, 281, 527, 15909, 30, 1436, 294, 512, 51096], "temperature": 0.0, "avg_logprob": + -0.172983447710673, "compression_ratio": 1.579591836734694, "no_speech_prob": 0.013129369355738163}, + {"id": 93, "seek": 56104, "start": 575.68, "end": 581.5999999999999, "text": " cases, + they wouldn''t trust us, you know, securing it''s enough and so on. So forest. And + then we", "tokens": [51096, 3331, 11, 436, 2759, 380, 3361, 505, 11, 291, 458, 11, + 33640, 309, 311, 1547, 293, 370, 322, 13, 407, 6719, 13, 400, 550, 321, 51392], + "temperature": 0.0, "avg_logprob": -0.172983447710673, "compression_ratio": 1.579591836734694, + "no_speech_prob": 0.013129369355738163}, {"id": 94, "seek": 56104, "start": 581.5999999999999, + "end": 588.7199999999999, "text": " were confronted with the fact that maybe it + will incur quite a bit of latency. 
And also even in", "tokens": [51392, 645, 31257, + 365, 264, 1186, 300, 1310, 309, 486, 35774, 1596, 257, 857, 295, 27043, 13, 400, + 611, 754, 294, 51748], "temperature": 0.0, "avg_logprob": -0.172983447710673, "compression_ratio": + 1.579591836734694, "no_speech_prob": 0.013129369355738163}, {"id": 95, "seek": 58872, + "start": 588.72, "end": 594.5600000000001, "text": " the first place, how we would + build this. But you know, like before we even get there, how do you", "tokens": + [50364, 264, 700, 1081, 11, 577, 321, 576, 1322, 341, 13, 583, 291, 458, 11, 411, + 949, 321, 754, 483, 456, 11, 577, 360, 291, 50656], "temperature": 0.0, "avg_logprob": + -0.15520266200719254, "compression_ratio": 1.670995670995671, "no_speech_prob": + 0.000723956385627389}, {"id": 96, "seek": 58872, "start": 595.36, "end": 603.2, + "text": " relate enterprise search versus federated search? So, so I think they''re, + they''re different in", "tokens": [50696, 10961, 14132, 3164, 5717, 38024, 770, + 3164, 30, 407, 11, 370, 286, 519, 436, 434, 11, 436, 434, 819, 294, 51088], "temperature": + 0.0, "avg_logprob": -0.15520266200719254, "compression_ratio": 1.670995670995671, + "no_speech_prob": 0.000723956385627389}, {"id": 97, "seek": 58872, "start": 603.2, + "end": 608.64, "text": " that enterprise search is about a realm. Right. 
Enterprise + search means usually not public sources.", "tokens": [51088, 300, 14132, 3164, 307, + 466, 257, 15355, 13, 1779, 13, 26696, 3164, 1355, 2673, 406, 1908, 7139, 13, 51360], + "temperature": 0.0, "avg_logprob": -0.15520266200719254, "compression_ratio": 1.670995670995671, + "no_speech_prob": 0.000723956385627389}, {"id": 98, "seek": 58872, "start": 609.6800000000001, + "end": 615.0400000000001, "text": " And I think it''s important to differentiate + the problems of the large enterprise and even the", "tokens": [51412, 400, 286, + 519, 309, 311, 1021, 281, 23203, 264, 2740, 295, 264, 2416, 14132, 293, 754, 264, + 51680], "temperature": 0.0, "avg_logprob": -0.15520266200719254, "compression_ratio": + 1.670995670995671, "no_speech_prob": 0.000723956385627389}, {"id": 99, "seek": 61504, + "start": 615.04, "end": 620.48, "text": " medium enterprise are not the same as + the sort of small, small and medium enterprise enterprise.", "tokens": [50364, 6399, + 14132, 366, 406, 264, 912, 382, 264, 1333, 295, 1359, 11, 1359, 293, 6399, 14132, + 14132, 13, 50636], "temperature": 0.0, "avg_logprob": -0.1115419864654541, "compression_ratio": + 1.7179487179487178, "no_speech_prob": 0.0004416282754391432}, {"id": 100, "seek": + 61504, "start": 621.28, "end": 624.88, "text": " Maybe that''s not a great dividing + line. But definitely the large enterprise has a very different", "tokens": [50676, + 2704, 300, 311, 406, 257, 869, 26764, 1622, 13, 583, 2138, 264, 2416, 14132, 575, + 257, 588, 819, 50856], "temperature": 0.0, "avg_logprob": -0.1115419864654541, "compression_ratio": + 1.7179487179487178, "no_speech_prob": 0.0004416282754391432}, {"id": 101, "seek": + 61504, "start": 624.88, "end": 628.7199999999999, "text": " set of problems. 
It''s + so much more about, you know, global distribution and languages and", "tokens": + [50856, 992, 295, 2740, 13, 467, 311, 370, 709, 544, 466, 11, 291, 458, 11, 4338, + 7316, 293, 8650, 293, 51048], "temperature": 0.0, "avg_logprob": -0.1115419864654541, + "compression_ratio": 1.7179487179487178, "no_speech_prob": 0.0004416282754391432}, + {"id": 102, "seek": 61504, "start": 628.7199999999999, "end": 635.36, "text": " + regulation. If you''re a, you know, small company like like swirl ink, we have five + people,", "tokens": [51048, 15062, 13, 759, 291, 434, 257, 11, 291, 458, 11, 1359, + 2237, 411, 411, 30310, 11276, 11, 321, 362, 1732, 561, 11, 51380], "temperature": + 0.0, "avg_logprob": -0.1115419864654541, "compression_ratio": 1.7179487179487178, + "no_speech_prob": 0.0004416282754391432}, {"id": 103, "seek": 61504, "start": 635.36, + "end": 640.24, "text": " we can work off of almost anything. I mean, and we don''t + have the silo problem because we just", "tokens": [51380, 321, 393, 589, 766, 295, + 1920, 1340, 13, 286, 914, 11, 293, 321, 500, 380, 362, 264, 3425, 78, 1154, 570, + 321, 445, 51624], "temperature": 0.0, "avg_logprob": -0.1115419864654541, "compression_ratio": + 1.7179487179487178, "no_speech_prob": 0.0004416282754391432}, {"id": 104, "seek": + 64024, "start": 640.32, "end": 644.96, "text": " have picked, you know, we have + four. But it''s interesting. We do still have the silo problem,", "tokens": [50368, + 362, 6183, 11, 291, 458, 11, 321, 362, 1451, 13, 583, 309, 311, 1880, 13, 492, 360, + 920, 362, 264, 3425, 78, 1154, 11, 50600], "temperature": 0.0, "avg_logprob": -0.11713124884933722, + "compression_ratio": 1.6354515050167224, "no_speech_prob": 0.0015854957746341825}, + {"id": 105, "seek": 64024, "start": 644.96, "end": 649.36, "text": " right. 
And + as I''m going to show you, just when we were trying to find the steering document + for this", "tokens": [50600, 558, 13, 400, 382, 286, 478, 516, 281, 855, 291, 11, + 445, 562, 321, 645, 1382, 281, 915, 264, 14823, 4166, 337, 341, 50820], "temperature": + 0.0, "avg_logprob": -0.11713124884933722, "compression_ratio": 1.6354515050167224, + "no_speech_prob": 0.0015854957746341825}, {"id": 106, "seek": 64024, "start": 649.36, + "end": 654.0, "text": " discussion, I realized I was hunting around which silo did + I put it in instead of just going to", "tokens": [50820, 5017, 11, 286, 5334, 286, + 390, 12599, 926, 597, 3425, 78, 630, 286, 829, 309, 294, 2602, 295, 445, 516, 281, + 51052], "temperature": 0.0, "avg_logprob": -0.11713124884933722, "compression_ratio": + 1.6354515050167224, "no_speech_prob": 0.0015854957746341825}, {"id": 107, "seek": + 64024, "start": 654.0, "end": 660.72, "text": " search. So it''s funny that we''ve + trained ourselves to work that way. That I think it''s a reflection", "tokens": + [51052, 3164, 13, 407, 309, 311, 4074, 300, 321, 600, 8895, 4175, 281, 589, 300, + 636, 13, 663, 286, 519, 309, 311, 257, 12914, 51388], "temperature": 0.0, "avg_logprob": + -0.11713124884933722, "compression_ratio": 1.6354515050167224, "no_speech_prob": + 0.0015854957746341825}, {"id": 108, "seek": 64024, "start": 660.72, "end": 667.28, + "text": " of the reality that in the large enterprise, it''s exactly what you said + entitlements are extremely", "tokens": [51388, 295, 264, 4103, 300, 294, 264, 2416, + 14132, 11, 309, 311, 2293, 437, 291, 848, 14789, 17988, 366, 4664, 51716], "temperature": + 0.0, "avg_logprob": -0.11713124884933722, "compression_ratio": 1.6354515050167224, + "no_speech_prob": 0.0015854957746341825}, {"id": 109, "seek": 66728, "start": 667.28, + "end": 674.0799999999999, "text": " important. 
You''re talking about crown jewel + data like PL product data or customer feedback.", "tokens": [50364, 1021, 13, 509, + 434, 1417, 466, 11841, 16010, 1412, 411, 6999, 1674, 1412, 420, 5474, 5824, 13, + 50704], "temperature": 0.0, "avg_logprob": -0.13779831840878443, "compression_ratio": + 1.6297577854671281, "no_speech_prob": 0.0014001547824591398}, {"id": 110, "seek": + 66728, "start": 674.0799999999999, "end": 680.24, "text": " CRM data is much less + sensitive in some ways. Also data that you might purchase, it''s very common", "tokens": + [50704, 14123, 44, 1412, 307, 709, 1570, 9477, 294, 512, 2098, 13, 2743, 1412, 300, + 291, 1062, 8110, 11, 309, 311, 588, 2689, 51012], "temperature": 0.0, "avg_logprob": + -0.13779831840878443, "compression_ratio": 1.6297577854671281, "no_speech_prob": + 0.0014001547824591398}, {"id": 111, "seek": 66728, "start": 680.24, "end": 685.52, + "text": " for companies to build and or purchase data sets and assemble them or + assemble derivative sets.", "tokens": [51012, 337, 3431, 281, 1322, 293, 420, 8110, + 1412, 6352, 293, 22364, 552, 420, 22364, 13760, 6352, 13, 51276], "temperature": + 0.0, "avg_logprob": -0.13779831840878443, "compression_ratio": 1.6297577854671281, + "no_speech_prob": 0.0014001547824591398}, {"id": 112, "seek": 66728, "start": 685.52, + "end": 691.28, "text": " These would be incredibly valuable for lots of uses. 
The + simplest one, right, usually as sales", "tokens": [51276, 1981, 576, 312, 6252, + 8263, 337, 3195, 295, 4960, 13, 440, 22811, 472, 11, 558, 11, 2673, 382, 5763, 51564], + "temperature": 0.0, "avg_logprob": -0.13779831840878443, "compression_ratio": 1.6297577854671281, + "no_speech_prob": 0.0014001547824591398}, {"id": 113, "seek": 66728, "start": 691.28, + "end": 695.6, "text": " or the most obvious one is help sales, help partners sell + more at the knowledge companies,", "tokens": [51564, 420, 264, 881, 6322, 472, 307, + 854, 5763, 11, 854, 4462, 3607, 544, 412, 264, 3601, 3431, 11, 51780], "temperature": + 0.0, "avg_logprob": -0.13779831840878443, "compression_ratio": 1.6297577854671281, + "no_speech_prob": 0.0014001547824591398}, {"id": 114, "seek": 69560, "start": 695.6, + "end": 700.16, "text": " help the sales people better understand their customers + or industries. And there''s a massive", "tokens": [50364, 854, 264, 5763, 561, 1101, + 1223, 641, 4581, 420, 13284, 13, 400, 456, 311, 257, 5994, 50592], "temperature": + 0.0, "avg_logprob": -0.09958425842889465, "compression_ratio": 1.675, "no_speech_prob": + 0.000511071237269789}, {"id": 115, "seek": 69560, "start": 700.16, "end": 707.52, + "text": " amount of information overload. So the problems are different. They''re + acute. They''re willing to spend", "tokens": [50592, 2372, 295, 1589, 28777, 13, + 407, 264, 2740, 366, 819, 13, 814, 434, 24390, 13, 814, 434, 4950, 281, 3496, 50960], + "temperature": 0.0, "avg_logprob": -0.09958425842889465, "compression_ratio": 1.675, + "no_speech_prob": 0.000511071237269789}, {"id": 116, "seek": 69560, "start": 707.52, + "end": 712.5600000000001, "text": " significant money, right, and invest in really + creating a better world. 
I think now,", "tokens": [50960, 4776, 1460, 11, 558, 11, + 293, 1963, 294, 534, 4084, 257, 1101, 1002, 13, 286, 519, 586, 11, 51212], "temperature": + 0.0, "avg_logprob": -0.09958425842889465, "compression_ratio": 1.675, "no_speech_prob": + 0.000511071237269789}, {"id": 117, "seek": 69560, "start": 714.4, "end": 718.0, + "text": " maybe one of the most important trends is people are not so interested + in more search boxes.", "tokens": [51304, 1310, 472, 295, 264, 881, 1021, 13892, + 307, 561, 366, 406, 370, 3102, 294, 544, 3164, 9002, 13, 51484], "temperature": + 0.0, "avg_logprob": -0.09958425842889465, "compression_ratio": 1.675, "no_speech_prob": + 0.000511071237269789}, {"id": 118, "seek": 69560, "start": 718.72, "end": 724.32, + "text": " They want to build proactive systems that bring people the information + that they need. And this", "tokens": [51520, 814, 528, 281, 1322, 28028, 3652, 300, + 1565, 561, 264, 1589, 300, 436, 643, 13, 400, 341, 51800], "temperature": 0.0, "avg_logprob": + -0.09958425842889465, "compression_ratio": 1.675, "no_speech_prob": 0.000511071237269789}, + {"id": 119, "seek": 72432, "start": 724.32, "end": 728.8000000000001, "text": " + has been a long time thing in sales with things like LinkedIn Navigator, right? + A lot of the", "tokens": [50364, 575, 668, 257, 938, 565, 551, 294, 5763, 365, 721, + 411, 20657, 9219, 28895, 11, 558, 30, 316, 688, 295, 264, 50588], "temperature": + 0.0, "avg_logprob": -0.13354564596105506, "compression_ratio": 1.6982456140350877, + "no_speech_prob": 0.0004095844051335007}, {"id": 120, "seek": 72432, "start": 728.8000000000001, + "end": 733.44, "text": " public data gets harvested and brought to you. 
But think + about all of that incredibly rich,", "tokens": [50588, 1908, 1412, 2170, 40994, + 293, 3038, 281, 291, 13, 583, 519, 466, 439, 295, 300, 6252, 4593, 11, 50820], "temperature": + 0.0, "avg_logprob": -0.13354564596105506, "compression_ratio": 1.6982456140350877, + "no_speech_prob": 0.0004095844051335007}, {"id": 121, "seek": 72432, "start": 733.44, + "end": 739.6800000000001, "text": " valuable internal data and needing to bring + that and how hard it is to bring that to people inside", "tokens": [50820, 8263, + 6920, 1412, 293, 18006, 281, 1565, 300, 293, 577, 1152, 309, 307, 281, 1565, 300, + 281, 561, 1854, 51132], "temperature": 0.0, "avg_logprob": -0.13354564596105506, + "compression_ratio": 1.6982456140350877, "no_speech_prob": 0.0004095844051335007}, + {"id": 122, "seek": 72432, "start": 739.6800000000001, "end": 745.44, "text": " + the enterprise because of those entitlement lines. So federated or meta search is + a technical", "tokens": [51132, 264, 14132, 570, 295, 729, 14789, 3054, 3876, 13, + 407, 38024, 770, 420, 19616, 3164, 307, 257, 6191, 51420], "temperature": 0.0, "avg_logprob": + -0.13354564596105506, "compression_ratio": 1.6982456140350877, "no_speech_prob": + 0.0004095844051335007}, {"id": 123, "seek": 72432, "start": 745.44, "end": 751.6, + "text": " approach, which says rather than an in traditional enterprise search, + traditionally, the tool is indexing.", "tokens": [51420, 3109, 11, 597, 1619, 2831, + 813, 364, 294, 5164, 14132, 3164, 11, 19067, 11, 264, 2290, 307, 8186, 278, 13, + 51728], "temperature": 0.0, "avg_logprob": -0.13354564596105506, "compression_ratio": + 1.6982456140350877, "no_speech_prob": 0.0004095844051335007}, {"id": 124, "seek": + 75160, "start": 751.84, "end": 759.6, "text": " So you take the data from all the + sources that you need to query, which usually,", "tokens": [50376, 407, 291, 747, + 264, 1412, 490, 439, 264, 7139, 300, 291, 643, 281, 14581, 11, 597, 2673, 11, 50764], + "temperature": 0.0, 
"avg_logprob": -0.15041414896647134, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.0007877394673414528}, {"id": 125, "seek": 75160, "start": 760.24, + "end": 764.96, "text": " since that''s hundreds, if not thousands, inside the large + enterprise, usually start with a few.", "tokens": [50796, 1670, 300, 311, 6779, + 11, 498, 406, 5383, 11, 1854, 264, 2416, 14132, 11, 2673, 722, 365, 257, 1326, 13, + 51032], "temperature": 0.0, "avg_logprob": -0.15041414896647134, "compression_ratio": + 1.6470588235294117, "no_speech_prob": 0.0007877394673414528}, {"id": 126, "seek": + 75160, "start": 766.24, "end": 771.44, "text": " And you extract the data, meaning + you pull it all out, then you have to remodel it because", "tokens": [51096, 400, + 291, 8947, 264, 1412, 11, 3620, 291, 2235, 309, 439, 484, 11, 550, 291, 362, 281, + 890, 41147, 309, 570, 51356], "temperature": 0.0, "avg_logprob": -0.15041414896647134, + "compression_ratio": 1.6470588235294117, "no_speech_prob": 0.0007877394673414528}, + {"id": 127, "seek": 75160, "start": 772.08, "end": 775.76, "text": " you could leave + it sort of as is, but the odds are high that won''t help with search. You need to", + "tokens": [51388, 291, 727, 1856, 309, 1333, 295, 382, 307, 11, 457, 264, 17439, + 366, 1090, 300, 1582, 380, 854, 365, 3164, 13, 509, 643, 281, 51572], "temperature": + 0.0, "avg_logprob": -0.15041414896647134, "compression_ratio": 1.6470588235294117, + "no_speech_prob": 0.0007877394673414528}, {"id": 128, "seek": 77576, "start": 776.48, + "end": 783.36, "text": " make at least some of the fields, things like title and + body line up. 
So you map those things over", "tokens": [50400, 652, 412, 1935, 512, + 295, 264, 7909, 11, 721, 411, 4876, 293, 1772, 1622, 493, 13, 407, 291, 4471, 729, + 721, 670, 50744], "temperature": 0.0, "avg_logprob": -0.1474829021253084, "compression_ratio": + 1.6583333333333334, "no_speech_prob": 0.0015358083182945848}, {"id": 129, "seek": + 77576, "start": 783.36, "end": 788.8, "text": " and you have to make sure that the + set of entitlements, meaning whose author I see stuff, all of that from", "tokens": + [50744, 293, 291, 362, 281, 652, 988, 300, 264, 992, 295, 14789, 17988, 11, 3620, + 6104, 3793, 286, 536, 1507, 11, 439, 295, 300, 490, 51016], "temperature": 0.0, + "avg_logprob": -0.1474829021253084, "compression_ratio": 1.6583333333333334, "no_speech_prob": + 0.0015358083182945848}, {"id": 130, "seek": 77576, "start": 788.8, "end": 794.08, + "text": " all the silos has to be aggregated and correctly rationalized and put + together, then you index it.", "tokens": [51016, 439, 264, 48893, 575, 281, 312, + 16743, 770, 293, 8944, 15090, 1602, 293, 829, 1214, 11, 550, 291, 8186, 309, 13, + 51280], "temperature": 0.0, "avg_logprob": -0.1474829021253084, "compression_ratio": + 1.6583333333333334, "no_speech_prob": 0.0015358083182945848}, {"id": 131, "seek": + 77576, "start": 794.96, "end": 800.56, "text": " Indexing is a technical process + like creating a structure like the back of most books or most", "tokens": [51324, + 33552, 278, 307, 257, 6191, 1399, 411, 4084, 257, 3877, 411, 264, 646, 295, 881, + 3642, 420, 881, 51604], "temperature": 0.0, "avg_logprob": -0.1474829021253084, + "compression_ratio": 1.6583333333333334, "no_speech_prob": 0.0015358083182945848}, + {"id": 132, "seek": 80056, "start": 800.56, "end": 805.28, "text": " long books, + a list of words with basically page numbers, but in this case, they''re slightly + more", "tokens": [50364, 938, 3642, 11, 257, 1329, 295, 2283, 365, 1936, 3028, 3547, + 11, 457, 294, 341, 1389, 11, 436, 434, 4748, 544, 
50600], "temperature": 0.0, "avg_logprob": + -0.11037262364437705, "compression_ratio": 1.58, "no_speech_prob": 0.0005225735949352384}, + {"id": 133, "seek": 80056, "start": 805.28, "end": 809.68, "text": " complex. They + might identify the documents in the field and the exact token that it occurs in.", + "tokens": [50600, 3997, 13, 814, 1062, 5876, 264, 8512, 294, 264, 2519, 293, 264, + 1900, 14862, 300, 309, 11843, 294, 13, 50820], "temperature": 0.0, "avg_logprob": + -0.11037262364437705, "compression_ratio": 1.58, "no_speech_prob": 0.0005225735949352384}, + {"id": 134, "seek": 80056, "start": 809.68, "end": 818.56, "text": " So you have + this kind of data structure. And you just have to keep it up to date anytime anything + changes.", "tokens": [50820, 407, 291, 362, 341, 733, 295, 1412, 3877, 13, 400, + 291, 445, 362, 281, 1066, 309, 493, 281, 4002, 13038, 1340, 2962, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.11037262364437705, "compression_ratio": 1.58, "no_speech_prob": + 0.0005225735949352384}, {"id": 135, "seek": 80056, "start": 820.0799999999999, "end": + 825.76, "text": " So it''s really hard. I have been very lucky to work in search + and fast was a phenomenal indexing", "tokens": [51340, 407, 309, 311, 534, 1152, + 13, 286, 362, 668, 588, 6356, 281, 589, 294, 3164, 293, 2370, 390, 257, 17778, 8186, + 278, 51624], "temperature": 0.0, "avg_logprob": -0.11037262364437705, "compression_ratio": + 1.58, "no_speech_prob": 0.0005225735949352384}, {"id": 136, "seek": 82576, "start": + 825.76, "end": 832.64, "text": " company and it innovated in indexing beyond the + pale. I really incredible stuff. 
So fast was one", "tokens": [50364, 2237, 293, + 309, 5083, 770, 294, 8186, 278, 4399, 264, 19546, 13, 286, 534, 4651, 1507, 13, + 407, 2370, 390, 472, 50708], "temperature": 0.0, "avg_logprob": -0.195271746802876, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00911737885326147}, + {"id": 137, "seek": 82576, "start": 832.64, "end": 838.88, "text": " of the first + companies to do updateable indices. You could actually update it. Then a lot of + the stuff", "tokens": [50708, 295, 264, 700, 3431, 281, 360, 5623, 712, 43840, 13, + 509, 727, 767, 5623, 309, 13, 1396, 257, 688, 295, 264, 1507, 51020], "temperature": + 0.0, "avg_logprob": -0.195271746802876, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.00911737885326147}, {"id": 138, "seek": 82576, "start": 838.88, + "end": 845.12, "text": " that they did is advanced vector. We did it fast, but you + know, me a tiny bit, right? Whatever the", "tokens": [51020, 300, 436, 630, 307, + 7339, 8062, 13, 492, 630, 309, 2370, 11, 457, 291, 458, 11, 385, 257, 5870, 857, + 11, 558, 30, 8541, 264, 51332], "temperature": 0.0, "avg_logprob": -0.195271746802876, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00911737885326147}, + {"id": 139, "seek": 82576, "start": 845.12, "end": 850.0, "text": " nuggets were, + but they went on. They went so far with engine development at fast. And now it''s + by", "tokens": [51332, 42663, 645, 11, 457, 436, 1437, 322, 13, 814, 1437, 370, + 1400, 365, 2848, 3250, 412, 2370, 13, 400, 586, 309, 311, 538, 51576], "temperature": + 0.0, "avg_logprob": -0.195271746802876, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.00911737885326147}, {"id": 140, "seek": 82576, "start": 850.0, + "end": 854.08, "text": " the way available through the Vespa project, right? 
If + you go to Vespa.de, all that stuff is available,", "tokens": [51576, 264, 636, 2435, + 807, 264, 691, 279, 4306, 1716, 11, 558, 30, 759, 291, 352, 281, 691, 279, 4306, + 13, 1479, 11, 439, 300, 1507, 307, 2435, 11, 51780], "temperature": 0.0, "avg_logprob": + -0.195271746802876, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00911737885326147}, + {"id": 141, "seek": 85408, "start": 854.08, "end": 860.64, "text": " open source + to. Yeah, we have an episode with Vespa. You probably would joke. Yes.", "tokens": + [50364, 1269, 4009, 281, 13, 865, 11, 321, 362, 364, 3500, 365, 691, 279, 4306, + 13, 509, 1391, 576, 7647, 13, 1079, 13, 50692], "temperature": 0.0, "avg_logprob": + -0.23930153891304942, "compression_ratio": 1.6366906474820144, "no_speech_prob": + 0.0016994241159409285}, {"id": 142, "seek": 85408, "start": 862.32, "end": 869.2, + "text": " He''s my hero on on Twitter. So incredible advances at fast and frankly + at a", "tokens": [50776, 634, 311, 452, 5316, 322, 322, 5794, 13, 407, 4651, 25297, + 412, 2370, 293, 11939, 412, 257, 51120], "temperature": 0.0, "avg_logprob": -0.23930153891304942, + "compression_ratio": 1.6366906474820144, "no_speech_prob": 0.0016994241159409285}, + {"id": 143, "seek": 85408, "start": 869.2, "end": 873.5200000000001, "text": " TVO, + you know, there were a bunch of patents filed. 
Some very smart people worked on + that problem", "tokens": [51120, 3558, 46, 11, 291, 458, 11, 456, 645, 257, 3840, + 295, 38142, 18789, 13, 2188, 588, 4069, 561, 2732, 322, 300, 1154, 51336], "temperature": + 0.0, "avg_logprob": -0.23930153891304942, "compression_ratio": 1.6366906474820144, + "no_speech_prob": 0.0016994241159409285}, {"id": 144, "seek": 85408, "start": 873.5200000000001, + "end": 879.0400000000001, "text": " and came up with incredible ways to interlink + data by combining graph and and a traditional inverted", "tokens": [51336, 293, + 1361, 493, 365, 4651, 2098, 281, 728, 22473, 1412, 538, 21928, 4295, 293, 293, 257, + 5164, 38969, 51612], "temperature": 0.0, "avg_logprob": -0.23930153891304942, "compression_ratio": + 1.6366906474820144, "no_speech_prob": 0.0016994241159409285}, {"id": 145, "seek": + 85408, "start": 879.0400000000001, "end": 883.2, "text": " index and doing things + like then adding that to machine learning and doing things like predicting", "tokens": + [51612, 8186, 293, 884, 721, 411, 550, 5127, 300, 281, 3479, 2539, 293, 884, 721, + 411, 32884, 51820], "temperature": 0.0, "avg_logprob": -0.23930153891304942, "compression_ratio": + 1.6366906474820144, "no_speech_prob": 0.0016994241159409285}, {"id": 146, "seek": + 88320, "start": 883.44, "end": 891.36, "text": " the answer to a service ticket. + So there''s no end of indexing. It''s just hard. That''s all.", "tokens": [50376, + 264, 1867, 281, 257, 2643, 10550, 13, 407, 456, 311, 572, 917, 295, 8186, 278, 13, + 467, 311, 445, 1152, 13, 663, 311, 439, 13, 50772], "temperature": 0.0, "avg_logprob": + -0.1300945281982422, "compression_ratio": 1.5508474576271187, "no_speech_prob": + 0.0007845716900192201}, {"id": 147, "seek": 88320, "start": 892.0, "end": 896.08, + "text": " It''s just hard. And especially when you want to combine silos. 
And so + over the years,", "tokens": [50804, 467, 311, 445, 1152, 13, 400, 2318, 562, 291, + 528, 281, 10432, 48893, 13, 400, 370, 670, 264, 924, 11, 51008], "temperature": + 0.0, "avg_logprob": -0.1300945281982422, "compression_ratio": 1.5508474576271187, + "no_speech_prob": 0.0007845716900192201}, {"id": 148, "seek": 88320, "start": 896.08, + "end": 900.88, "text": " I''ve bumped into people who have had the multi silo problem + in grade numbers. There is one", "tokens": [51008, 286, 600, 42696, 666, 561, 567, + 362, 632, 264, 4825, 3425, 78, 1154, 294, 7204, 3547, 13, 821, 307, 472, 51248], + "temperature": 0.0, "avg_logprob": -0.1300945281982422, "compression_ratio": 1.5508474576271187, + "no_speech_prob": 0.0007845716900192201}, {"id": 149, "seek": 88320, "start": 900.88, + "end": 908.0, "text": " consulting company that has more than 500 silos, separate + installations of elastic, literally from", "tokens": [51248, 23682, 2237, 300, 575, + 544, 813, 5923, 48893, 11, 4994, 41932, 295, 17115, 11, 3736, 490, 51604], "temperature": + 0.0, "avg_logprob": -0.1300945281982422, "compression_ratio": 1.5508474576271187, + "no_speech_prob": 0.0007845716900192201}, {"id": 150, "seek": 90800, "start": 908.0, + "end": 912.56, "text": " version two to version eight or whatever they''re on now, + right? 
Because that was a standard.", "tokens": [50364, 3037, 732, 281, 3037, 3180, + 420, 2035, 436, 434, 322, 586, 11, 558, 30, 1436, 300, 390, 257, 3832, 13, 50592], + "temperature": 0.0, "avg_logprob": -0.13735894899110537, "compression_ratio": 1.6537102473498233, + "no_speech_prob": 0.004973128903657198}, {"id": 151, "seek": 90800, "start": 912.56, + "end": 917.36, "text": " And when they got a JSON data set or a database or they + bought something or they did a hackathon", "tokens": [50592, 400, 562, 436, 658, + 257, 31828, 1412, 992, 420, 257, 8149, 420, 436, 4243, 746, 420, 436, 630, 257, + 10339, 18660, 50832], "temperature": 0.0, "avg_logprob": -0.13735894899110537, "compression_ratio": + 1.6537102473498233, "no_speech_prob": 0.004973128903657198}, {"id": 152, "seek": + 90800, "start": 918.0, "end": 923.12, "text": " invariably, the documents ended + up in some elastic with some security on it. And now", "tokens": [50864, 33270, + 1188, 11, 264, 8512, 4590, 493, 294, 512, 17115, 365, 512, 3825, 322, 309, 13, 400, + 586, 51120], "temperature": 0.0, "avg_logprob": -0.13735894899110537, "compression_ratio": + 1.6537102473498233, "no_speech_prob": 0.004973128903657198}, {"id": 153, "seek": + 90800, "start": 923.92, "end": 930.56, "text": " the some of the variation right + in partner and case team performance is attributed internally", "tokens": [51160, + 264, 512, 295, 264, 12990, 558, 294, 4975, 293, 1389, 1469, 3389, 307, 30976, 19501, + 51492], "temperature": 0.0, "avg_logprob": -0.13735894899110537, "compression_ratio": + 1.6537102473498233, "no_speech_prob": 0.004973128903657198}, {"id": 154, "seek": + 90800, "start": 930.56, "end": 936.72, "text": " through surveys to who knows where + to get the data. 
If you know, oh, I know to talk to this person,", "tokens": [51492, + 807, 22711, 281, 567, 3255, 689, 281, 483, 264, 1412, 13, 759, 291, 458, 11, 1954, + 11, 286, 458, 281, 751, 281, 341, 954, 11, 51800], "temperature": 0.0, "avg_logprob": + -0.13735894899110537, "compression_ratio": 1.6537102473498233, "no_speech_prob": + 0.004973128903657198}, {"id": 155, "seek": 93672, "start": 937.28, "end": 942.72, + "text": " they will have the key to unlock this particular thing that I can then + use to say, hey, look what", "tokens": [50392, 436, 486, 362, 264, 2141, 281, 11634, + 341, 1729, 551, 300, 286, 393, 550, 764, 281, 584, 11, 4177, 11, 574, 437, 50664], + "temperature": 0.0, "avg_logprob": -0.1374705660659655, "compression_ratio": 1.725, + "no_speech_prob": 7.81923154136166e-05}, {"id": 156, "seek": 93672, "start": 942.72, + "end": 946.64, "text": " we did this incredible work we did in your industry before + or look at this incredible work we did", "tokens": [50664, 321, 630, 341, 4651, + 589, 321, 630, 294, 428, 3518, 949, 420, 574, 412, 341, 4651, 589, 321, 630, 50860], + "temperature": 0.0, "avg_logprob": -0.1374705660659655, "compression_ratio": 1.725, + "no_speech_prob": 7.81923154136166e-05}, {"id": 157, "seek": 93672, "start": 946.64, + "end": 952.48, "text": " for you in the past, right? A new partner might not know + that. They''ve done five engagements that", "tokens": [50860, 337, 291, 294, 264, + 1791, 11, 558, 30, 316, 777, 4975, 1062, 406, 458, 300, 13, 814, 600, 1096, 1732, + 44978, 300, 51152], "temperature": 0.0, "avg_logprob": -0.1374705660659655, "compression_ratio": + 1.725, "no_speech_prob": 7.81923154136166e-05}, {"id": 158, "seek": 93672, "start": + 952.48, "end": 960.32, "text": " were very similar. So it''s that kind of and I + think the word is systematic. 
People want to be", "tokens": [51152, 645, 588, 2531, + 13, 407, 309, 311, 300, 733, 295, 293, 286, 519, 264, 1349, 307, 27249, 13, 3432, + 528, 281, 312, 51544], "temperature": 0.0, "avg_logprob": -0.1374705660659655, "compression_ratio": + 1.725, "no_speech_prob": 7.81923154136166e-05}, {"id": 159, "seek": 93672, "start": + 960.32, "end": 965.2, "text": " very much more systematic now because everyone is + too busy and there''s information overload. So", "tokens": [51544, 588, 709, 544, + 27249, 586, 570, 1518, 307, 886, 5856, 293, 456, 311, 1589, 28777, 13, 407, 51788], + "temperature": 0.0, "avg_logprob": -0.1374705660659655, "compression_ratio": 1.725, + "no_speech_prob": 7.81923154136166e-05}, {"id": 160, "seek": 96672, "start": 967.44, + "end": 974.88, "text": " that''s really the to break those lines down. My view is + enterprise search now really desperately,", "tokens": [50400, 300, 311, 534, 264, + 281, 1821, 729, 3876, 760, 13, 1222, 1910, 307, 14132, 3164, 586, 534, 23726, 11, + 50772], "temperature": 0.0, "avg_logprob": -0.1684650130893873, "compression_ratio": + 1.7136563876651982, "no_speech_prob": 0.003799359081313014}, {"id": 161, "seek": + 96672, "start": 975.44, "end": 981.6800000000001, "text": " desperately critically + requires meta search. It''s the only choice you cannot you''re downloading,", "tokens": + [50800, 23726, 22797, 7029, 19616, 3164, 13, 467, 311, 264, 787, 3922, 291, 2644, + 291, 434, 32529, 11, 51112], "temperature": 0.0, "avg_logprob": -0.1684650130893873, + "compression_ratio": 1.7136563876651982, "no_speech_prob": 0.003799359081313014}, + {"id": 162, "seek": 96672, "start": 982.48, "end": 987.9200000000001, "text": " + you know, pulling out all of the data, even if you were to desire that. 
It''s very + hard to do.", "tokens": [51152, 291, 458, 11, 8407, 484, 439, 295, 264, 1412, 11, + 754, 498, 291, 645, 281, 7516, 300, 13, 467, 311, 588, 1152, 281, 360, 13, 51424], + "temperature": 0.0, "avg_logprob": -0.1684650130893873, "compression_ratio": 1.7136563876651982, + "no_speech_prob": 0.003799359081313014}, {"id": 163, "seek": 96672, "start": 989.2, + "end": 993.12, "text": " Now you''re because you have to basically the old way would + be to pull all the data out of everything", "tokens": [51488, 823, 291, 434, 570, + 291, 362, 281, 1936, 264, 1331, 636, 576, 312, 281, 2235, 439, 264, 1412, 484, 295, + 1203, 51684], "temperature": 0.0, "avg_logprob": -0.1684650130893873, "compression_ratio": + 1.7136563876651982, "no_speech_prob": 0.003799359081313014}, {"id": 164, "seek": + 99312, "start": 993.12, "end": 1000.8, "text": " and sort of filter it down. Why + not search it? Yeah, our search is to say it''s out there now.", "tokens": [50364, + 293, 1333, 295, 6608, 309, 760, 13, 1545, 406, 3164, 309, 30, 865, 11, 527, 3164, + 307, 281, 584, 309, 311, 484, 456, 586, 13, 50748], "temperature": 0.0, "avg_logprob": + -0.166215530549637, "compression_ratio": 1.6125, "no_speech_prob": 0.002496548928320408}, + {"id": 165, "seek": 99312, "start": 1000.8, "end": 1006.16, "text": " The vendors + are doing incredible things. I mean, service now from where it was years ago to + where", "tokens": [50748, 440, 22056, 366, 884, 4651, 721, 13, 286, 914, 11, 2643, + 586, 490, 689, 309, 390, 924, 2057, 281, 689, 51016], "temperature": 0.0, "avg_logprob": + -0.166215530549637, "compression_ratio": 1.6125, "no_speech_prob": 0.002496548928320408}, + {"id": 166, "seek": 99312, "start": 1006.16, "end": 1011.28, "text": " it is today. + It''s incredible. 
There''s an amazing team of people working away on that and that''s", + "tokens": [51016, 309, 307, 965, 13, 467, 311, 4651, 13, 821, 311, 364, 2243, 1469, + 295, 561, 1364, 1314, 322, 300, 293, 300, 311, 51272], "temperature": 0.0, "avg_logprob": + -0.166215530549637, "compression_ratio": 1.6125, "no_speech_prob": 0.002496548928320408}, + {"id": 167, "seek": 99312, "start": 1011.28, "end": 1017.52, "text": " true of most + applications now. Somebody''s working on search. It has a nice high quality API. + So let", "tokens": [51272, 2074, 295, 881, 5821, 586, 13, 13463, 311, 1364, 322, + 3164, 13, 467, 575, 257, 1481, 1090, 3125, 9362, 13, 407, 718, 51584], "temperature": + 0.0, "avg_logprob": -0.166215530549637, "compression_ratio": 1.6125, "no_speech_prob": + 0.002496548928320408}, {"id": 168, "seek": 101752, "start": 1017.68, "end": 1023.68, + "text": " them do their thing. Let them master it. But search and the other thing, + the interesting that makes", "tokens": [50372, 552, 360, 641, 551, 13, 961, 552, + 4505, 309, 13, 583, 3164, 293, 264, 661, 551, 11, 264, 1880, 300, 1669, 50672], + "temperature": 0.0, "avg_logprob": -0.13225844928196498, "compression_ratio": 1.8202247191011236, + "no_speech_prob": 0.003028707578778267}, {"id": 169, "seek": 101752, "start": 1023.68, + "end": 1028.08, "text": " meta search particularly powerful for the enterprise is + you''re always searching on behalf of something.", "tokens": [50672, 19616, 3164, + 4098, 4005, 337, 264, 14132, 307, 291, 434, 1009, 10808, 322, 9490, 295, 746, 13, + 50892], "temperature": 0.0, "avg_logprob": -0.13225844928196498, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.003028707578778267}, {"id": 170, "seek": + 101752, "start": 1028.08, "end": 1034.32, "text": " Right. And that avoids something + that avoids it. It goes with the flow. 
It goes with the grain", "tokens": [50892, + 1779, 13, 400, 300, 3641, 3742, 746, 300, 3641, 3742, 309, 13, 467, 1709, 365, 264, + 3095, 13, 467, 1709, 365, 264, 12837, 51204], "temperature": 0.0, "avg_logprob": + -0.13225844928196498, "compression_ratio": 1.8202247191011236, "no_speech_prob": + 0.003028707578778267}, {"id": 171, "seek": 101752, "start": 1035.28, "end": 1038.96, + "text": " of the enterprise architecture. You''re supposed to query on behalf of + something and if you do,", "tokens": [51252, 295, 264, 14132, 9482, 13, 509, 434, + 3442, 281, 14581, 322, 9490, 295, 746, 293, 498, 291, 360, 11, 51436], "temperature": + 0.0, "avg_logprob": -0.13225844928196498, "compression_ratio": 1.8202247191011236, + "no_speech_prob": 0.003028707578778267}, {"id": 172, "seek": 101752, "start": 1038.96, + "end": 1042.8799999999999, "text": " in theory, the app can just maintain the context. + It only gets tricky when you start saying,", "tokens": [51436, 294, 5261, 11, 264, + 724, 393, 445, 6909, 264, 4319, 13, 467, 787, 2170, 12414, 562, 291, 722, 1566, + 11, 51632], "temperature": 0.0, "avg_logprob": -0.13225844928196498, "compression_ratio": + 1.8202247191011236, "no_speech_prob": 0.003028707578778267}, {"id": 173, "seek": + 104288, "start": 1042.88, "end": 1047.68, "text": " oh, I want to combine these + five together. At the data level, when you do it at the user level,", "tokens": + [50364, 1954, 11, 286, 528, 281, 10432, 613, 1732, 1214, 13, 1711, 264, 1412, 1496, + 11, 562, 291, 360, 309, 412, 264, 4195, 1496, 11, 50604], "temperature": 0.0, "avg_logprob": + -0.1771651127541712, "compression_ratio": 1.7749077490774907, "no_speech_prob": + 0.0022522613871842623}, {"id": 174, "seek": 104288, "start": 1047.68, "end": 1051.8400000000001, + "text": " that''s fine. 
Either the user was authorized to see all three or they + weren''t or they were able", "tokens": [50604, 300, 311, 2489, 13, 13746, 264, 4195, + 390, 28312, 281, 536, 439, 1045, 420, 436, 4999, 380, 420, 436, 645, 1075, 50812], + "temperature": 0.0, "avg_logprob": -0.1771651127541712, "compression_ratio": 1.7749077490774907, + "no_speech_prob": 0.0022522613871842623}, {"id": 175, "seek": 104288, "start": 1051.8400000000001, + "end": 1057.5200000000002, "text": " to see a portion of it or they weren''t. That''s + the way things work in the enterprise. So that''s", "tokens": [50812, 281, 536, + 257, 8044, 295, 309, 420, 436, 4999, 380, 13, 663, 311, 264, 636, 721, 589, 294, + 264, 14132, 13, 407, 300, 311, 51096], "temperature": 0.0, "avg_logprob": -0.1771651127541712, + "compression_ratio": 1.7749077490774907, "no_speech_prob": 0.0022522613871842623}, + {"id": 176, "seek": 104288, "start": 1058.8000000000002, "end": 1063.8400000000001, + "text": " that''s the subtle difference, right? To delineate them. Yeah. And why + the potentials there is that", "tokens": [51160, 300, 311, 264, 13743, 2649, 11, + 558, 30, 1407, 1103, 533, 473, 552, 13, 865, 13, 400, 983, 264, 3995, 82, 456, 307, + 300, 51412], "temperature": 0.0, "avg_logprob": -0.1771651127541712, "compression_ratio": + 1.7749077490774907, "no_speech_prob": 0.0022522613871842623}, {"id": 177, "seek": + 104288, "start": 1063.8400000000001, "end": 1070.16, "text": " indexing is costly. + And yet, on the. Yeah. 
And you described it really eloquently in a way that", "tokens": + [51412, 8186, 278, 307, 28328, 13, 400, 1939, 11, 322, 264, 13, 865, 13, 400, 291, + 7619, 309, 534, 38682, 47519, 294, 257, 636, 300, 51728], "temperature": 0.0, "avg_logprob": + -0.1771651127541712, "compression_ratio": 1.7749077490774907, "no_speech_prob": + 0.0022522613871842623}, {"id": 178, "seek": 107016, "start": 1071.0400000000002, + "end": 1078.16, "text": " to some extent by implementing meta search, you wouldn''t + need to solve indexing issues. You", "tokens": [50408, 281, 512, 8396, 538, 18114, + 19616, 3164, 11, 291, 2759, 380, 643, 281, 5039, 8186, 278, 2663, 13, 509, 50764], + "temperature": 0.0, "avg_logprob": -0.15108410353513108, "compression_ratio": 1.6814159292035398, + "no_speech_prob": 0.001609696657396853}, {"id": 179, "seek": 107016, "start": 1078.16, + "end": 1084.72, "text": " wouldn''t need to solve entitlement issues, right? You + kind of like use the existing proxies.", "tokens": [50764, 2759, 380, 643, 281, + 5039, 14789, 3054, 2663, 11, 558, 30, 509, 733, 295, 411, 764, 264, 6741, 447, 87, + 530, 13, 51092], "temperature": 0.0, "avg_logprob": -0.15108410353513108, "compression_ratio": + 1.6814159292035398, "no_speech_prob": 0.001609696657396853}, {"id": 180, "seek": + 107016, "start": 1085.28, "end": 1093.1200000000001, "text": " But there is one + remaining bit that I''m really curious about. So if you look at, let''s say what + Google", "tokens": [51120, 583, 456, 307, 472, 8877, 857, 300, 286, 478, 534, 6369, + 466, 13, 407, 498, 291, 574, 412, 11, 718, 311, 584, 437, 3329, 51512], "temperature": + 0.0, "avg_logprob": -0.15108410353513108, "compression_ratio": 1.6814159292035398, + "no_speech_prob": 0.001609696657396853}, {"id": 181, "seek": 107016, "start": 1093.1200000000001, + "end": 1099.92, "text": " did to the web search is that they looked at what you + could call a proxonym effect. 
So other", "tokens": [51512, 630, 281, 264, 3670, + 3164, 307, 300, 436, 2956, 412, 437, 291, 727, 818, 257, 447, 87, 12732, 1802, 13, + 407, 661, 51852], "temperature": 0.0, "avg_logprob": -0.15108410353513108, "compression_ratio": + 1.6814159292035398, "no_speech_prob": 0.001609696657396853}, {"id": 182, "seek": + 109992, "start": 1099.92, "end": 1106.88, "text": " people created pages linked + to more important pages, hubs, and then you invent the algorithm,", "tokens": [50364, + 561, 2942, 7183, 9408, 281, 544, 1021, 7183, 11, 46870, 11, 293, 550, 291, 7962, + 264, 9284, 11, 50712], "temperature": 0.0, "avg_logprob": -0.18699480978290686, + "compression_ratio": 1.6278026905829597, "no_speech_prob": 0.00293486169539392}, + {"id": 183, "seek": 109992, "start": 1107.52, "end": 1113.44, "text": " create it + to you. But you still kind of like rely on what others did in a way, right?", "tokens": + [50744, 1884, 309, 281, 291, 13, 583, 291, 920, 733, 295, 411, 10687, 322, 437, + 2357, 630, 294, 257, 636, 11, 558, 30, 51040], "temperature": 0.0, "avg_logprob": + -0.18699480978290686, "compression_ratio": 1.6278026905829597, "no_speech_prob": + 0.00293486169539392}, {"id": 184, "seek": 109992, "start": 1114.3200000000002, "end": + 1119.6000000000001, "text": " And so now you have the page rank algorithm, how you + how you rank the documents and all of a", "tokens": [51084, 400, 370, 586, 291, + 362, 264, 3028, 6181, 9284, 11, 577, 291, 577, 291, 6181, 264, 8512, 293, 439, 295, + 257, 51348], "temperature": 0.0, "avg_logprob": -0.18699480978290686, "compression_ratio": + 1.6278026905829597, "no_speech_prob": 0.00293486169539392}, {"id": 185, "seek": + 109992, "start": 1119.6000000000001, "end": 1124.5600000000002, "text": " sudden, + this is the breakthrough and this looks a lot more relevant. 
In enterprise search,", + "tokens": [51348, 3990, 11, 341, 307, 264, 22397, 293, 341, 1542, 257, 688, 544, + 7340, 13, 682, 14132, 3164, 11, 51596], "temperature": 0.0, "avg_logprob": -0.18699480978290686, + "compression_ratio": 1.6278026905829597, "no_speech_prob": 0.00293486169539392}, + {"id": 186, "seek": 112456, "start": 1125.2, "end": 1129.84, "text": " you don''t + necessarily have this. Okay, you do have documents that are being created, created,", + "tokens": [50396, 291, 500, 380, 4725, 362, 341, 13, 1033, 11, 291, 360, 362, 8512, + 300, 366, 885, 2942, 11, 2942, 11, 50628], "temperature": 0.0, "avg_logprob": -0.16532469267892366, + "compression_ratio": 1.603305785123967, "no_speech_prob": 0.04346649348735809}, + {"id": 187, "seek": 112456, "start": 1129.84, "end": 1136.0, "text": " and so forth. + But then as you said, there is a lot of silos, right? And so things get created.", + "tokens": [50628, 293, 370, 5220, 13, 583, 550, 382, 291, 848, 11, 456, 307, 257, + 688, 295, 48893, 11, 558, 30, 400, 370, 721, 483, 2942, 13, 50936], "temperature": + 0.0, "avg_logprob": -0.16532469267892366, "compression_ratio": 1.603305785123967, + "no_speech_prob": 0.04346649348735809}, {"id": 188, "seek": 112456, "start": 1136.0, + "end": 1141.36, "text": " There is no single place where you can say, what happened? + What did I miss? What do you have on this", "tokens": [50936, 821, 307, 572, 2167, + 1081, 689, 291, 393, 584, 11, 437, 2011, 30, 708, 630, 286, 1713, 30, 708, 360, + 291, 362, 322, 341, 51204], "temperature": 0.0, "avg_logprob": -0.16532469267892366, + "compression_ratio": 1.603305785123967, "no_speech_prob": 0.04346649348735809}, + {"id": 189, "seek": 112456, "start": 1141.36, "end": 1150.56, "text": " topic and + so forth? Just today in the morning, I was browsing through Office 365. 
They have + like a", "tokens": [51204, 4829, 293, 370, 5220, 30, 1449, 965, 294, 264, 2446, + 11, 286, 390, 38602, 807, 8935, 22046, 13, 814, 362, 411, 257, 51664], "temperature": + 0.0, "avg_logprob": -0.16532469267892366, "compression_ratio": 1.603305785123967, + "no_speech_prob": 0.04346649348735809}, {"id": 190, "seek": 115056, "start": 1150.56, + "end": 1157.28, "text": " single page, which shows me all the documents that either + I interacted with or someone interacted", "tokens": [50364, 2167, 3028, 11, 597, + 3110, 385, 439, 264, 8512, 300, 2139, 286, 49621, 365, 420, 1580, 49621, 50700], + "temperature": 0.0, "avg_logprob": -0.13872027146188837, "compression_ratio": 1.7232142857142858, + "no_speech_prob": 0.021721016615629196}, {"id": 191, "seek": 115056, "start": 1157.28, + "end": 1163.28, "text": " with and I am part of that group. And I can search there. + That was helpful actually. That''s all the", "tokens": [50700, 365, 293, 286, 669, + 644, 295, 300, 1594, 13, 400, 286, 393, 3164, 456, 13, 663, 390, 4961, 767, 13, + 663, 311, 439, 264, 51000], "temperature": 0.0, "avg_logprob": -0.13872027146188837, + "compression_ratio": 1.7232142857142858, "no_speech_prob": 0.021721016615629196}, + {"id": 192, "seek": 115056, "start": 1163.28, "end": 1171.84, "text": " lot of save + the lot of time. But again, it doesn''t have confidence. It doesn''t have Salesforce. + It", "tokens": [51000, 688, 295, 3155, 264, 688, 295, 565, 13, 583, 797, 11, 309, + 1177, 380, 362, 6687, 13, 467, 1177, 380, 362, 40398, 13, 467, 51428], "temperature": + 0.0, "avg_logprob": -0.13872027146188837, "compression_ratio": 1.7232142857142858, + "no_speech_prob": 0.021721016615629196}, {"id": 193, "seek": 115056, "start": 1171.84, + "end": 1179.28, "text": " doesn''t have a bunch of other places where it would go. 
+ So I guess one missing component,", "tokens": [51428, 1177, 380, 362, 257, 3840, + 295, 661, 3190, 689, 309, 576, 352, 13, 407, 286, 2041, 472, 5361, 6542, 11, 51800], + "temperature": 0.0, "avg_logprob": -0.13872027146188837, "compression_ratio": 1.7232142857142858, + "no_speech_prob": 0.021721016615629196}, {"id": 194, "seek": 117928, "start": 1179.28, + "end": 1186.16, "text": " still in enterprise search was how would you rank these + documents, right? Because you don''t have", "tokens": [50364, 920, 294, 14132, 3164, + 390, 577, 576, 291, 6181, 613, 8512, 11, 558, 30, 1436, 291, 500, 380, 362, 50708], + "temperature": 0.0, "avg_logprob": -0.17276717341223427, "compression_ratio": 1.5330396475770924, + "no_speech_prob": 0.0022269641049206257}, {"id": 195, "seek": 117928, "start": 1186.16, + "end": 1192.8, "text": " a lot of signals. You simply have the documents themselves. + And so would you say that", "tokens": [50708, 257, 688, 295, 12354, 13, 509, 2935, + 362, 264, 8512, 2969, 13, 400, 370, 576, 291, 584, 300, 51040], "temperature": 0.0, + "avg_logprob": -0.17276717341223427, "compression_ratio": 1.5330396475770924, "no_speech_prob": + 0.0022269641049206257}, {"id": 196, "seek": 117928, "start": 1192.8, "end": 1199.12, + "text": " vector search now opens up this horizon for us? It helps solve this problem.", + "tokens": [51040, 8062, 3164, 586, 9870, 493, 341, 18046, 337, 505, 30, 467, 3665, + 5039, 341, 1154, 13, 51356], "temperature": 0.0, "avg_logprob": -0.17276717341223427, + "compression_ratio": 1.5330396475770924, "no_speech_prob": 0.0022269641049206257}, + {"id": 197, "seek": 117928, "start": 1200.16, "end": 1207.84, "text": " Absolutely. + And I think if we untangle it a little bit, it gets back to Google. 
In fact,", "tokens": + [51408, 7021, 13, 400, 286, 519, 498, 321, 1701, 7846, 309, 257, 707, 857, 11, 309, + 2170, 646, 281, 3329, 13, 682, 1186, 11, 51792], "temperature": 0.0, "avg_logprob": + -0.17276717341223427, "compression_ratio": 1.5330396475770924, "no_speech_prob": + 0.0022269641049206257}, {"id": 198, "seek": 120784, "start": 1207.84, "end": 1212.9599999999998, + "text": " it goes right back to Google. Google had the challenge of make they had + the biggest", "tokens": [50364, 309, 1709, 558, 646, 281, 3329, 13, 3329, 632, 264, + 3430, 295, 652, 436, 632, 264, 3880, 50620], "temperature": 0.0, "avg_logprob": + -0.14617651542731092, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.0018704732647165656}, {"id": 199, "seek": 120784, "start": 1213.9199999999998, + "end": 1222.32, "text": " data set in history. The web incredibly interlinked. And + they did the absolute best job of", "tokens": [50668, 1412, 992, 294, 2503, 13, + 440, 3670, 6252, 728, 22473, 292, 13, 400, 436, 630, 264, 8236, 1151, 1691, 295, + 51088], "temperature": 0.0, "avg_logprob": -0.14617651542731092, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.0018704732647165656}, {"id": 200, "seek": + 120784, "start": 1222.32, "end": 1227.52, "text": " figuring out how to model that + structure. You weren''t searching every web page every time you", "tokens": [51088, + 15213, 484, 577, 281, 2316, 300, 3877, 13, 509, 4999, 380, 10808, 633, 3670, 3028, + 633, 565, 291, 51348], "temperature": 0.0, "avg_logprob": -0.14617651542731092, + "compression_ratio": 1.6923076923076923, "no_speech_prob": 0.0018704732647165656}, + {"id": 201, "seek": 120784, "start": 1227.52, "end": 1232.72, "text": " searched. + You were searching a structure that in fact is a large language model. Right? 
That''s", + "tokens": [51348, 22961, 13, 509, 645, 10808, 257, 3877, 300, 294, 1186, 307, 257, + 2416, 2856, 2316, 13, 1779, 30, 663, 311, 51608], "temperature": 0.0, "avg_logprob": + -0.14617651542731092, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.0018704732647165656}, {"id": 202, "seek": 120784, "start": 1232.72, "end": 1237.52, + "text": " what they built. They were the one they pioneered it. But it was the very + first one. Or no, that''s", "tokens": [51608, 437, 436, 3094, 13, 814, 645, 264, + 472, 436, 19761, 4073, 309, 13, 583, 309, 390, 264, 588, 700, 472, 13, 1610, 572, + 11, 300, 311, 51848], "temperature": 0.0, "avg_logprob": -0.14617651542731092, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.0018704732647165656}, {"id": 203, "seek": + 123752, "start": 1237.52, "end": 1242.48, "text": " probably not true at all. Burr + was an early one that got popular. I don''t want to make, I have no", "tokens": + [50364, 1391, 406, 2074, 412, 439, 13, 7031, 81, 390, 364, 2440, 472, 300, 658, + 3743, 13, 286, 500, 380, 528, 281, 652, 11, 286, 362, 572, 50612], "temperature": + 0.0, "avg_logprob": -0.10617377758026122, "compression_ratio": 1.7056737588652482, + "no_speech_prob": 0.0009625177481211722}, {"id": 204, "seek": 123752, "start": 1242.48, + "end": 1247.28, "text": " idea. Right? What came first? But Burr was certainly the + one that was the game changer. It was very", "tokens": [50612, 1558, 13, 1779, 30, + 708, 1361, 700, 30, 583, 7031, 81, 390, 3297, 264, 472, 300, 390, 264, 1216, 22822, + 13, 467, 390, 588, 50852], "temperature": 0.0, "avg_logprob": -0.10617377758026122, + "compression_ratio": 1.7056737588652482, "no_speech_prob": 0.0009625177481211722}, + {"id": 205, "seek": 123752, "start": 1247.28, "end": 1254.56, "text": " recognized. + That''s where the real popularization of transformer models I think came from. 
And + it''s", "tokens": [50852, 9823, 13, 663, 311, 689, 264, 957, 3743, 2144, 295, 31782, + 5245, 286, 519, 1361, 490, 13, 400, 309, 311, 51216], "temperature": 0.0, "avg_logprob": + -0.10617377758026122, "compression_ratio": 1.7056737588652482, "no_speech_prob": + 0.0009625177481211722}, {"id": 206, "seek": 123752, "start": 1254.56, "end": 1259.68, + "text": " that structure. What is that structure? It''s a structure that can evaluate + results almost", "tokens": [51216, 300, 3877, 13, 708, 307, 300, 3877, 30, 467, + 311, 257, 3877, 300, 393, 13059, 3542, 1920, 51472], "temperature": 0.0, "avg_logprob": + -0.10617377758026122, "compression_ratio": 1.7056737588652482, "no_speech_prob": + 0.0009625177481211722}, {"id": 207, "seek": 123752, "start": 1259.68, "end": 1267.04, + "text": " independent of the results themselves. You don''t have to look at every + web page. And so that''s", "tokens": [51472, 6695, 295, 264, 3542, 2969, 13, 509, + 500, 380, 362, 281, 574, 412, 633, 3670, 3028, 13, 400, 370, 300, 311, 51840], "temperature": + 0.0, "avg_logprob": -0.10617377758026122, "compression_ratio": 1.7056737588652482, + "no_speech_prob": 0.0009625177481211722}, {"id": 208, "seek": 126704, "start": 1267.04, + "end": 1271.68, "text": " the key. In fact, you''re absolutely right. I think there + have been many attempts to do meta", "tokens": [50364, 264, 2141, 13, 682, 1186, + 11, 291, 434, 3122, 558, 13, 286, 519, 456, 362, 668, 867, 15257, 281, 360, 19616, + 50596], "temperature": 0.0, "avg_logprob": -0.13549367119284236, "compression_ratio": + 1.6464285714285714, "no_speech_prob": 0.003937163390219212}, {"id": 209, "seek": + 126704, "start": 1271.68, "end": 1276.1599999999999, "text": " search and federated + search even against APIs. 
But you then end up with just all the results.", "tokens": + [50596, 3164, 293, 38024, 770, 3164, 754, 1970, 21445, 13, 583, 291, 550, 917, 493, + 365, 445, 439, 264, 3542, 13, 50820], "temperature": 0.0, "avg_logprob": -0.13549367119284236, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.003937163390219212}, + {"id": 210, "seek": 126704, "start": 1277.04, "end": 1280.24, "text": " Tiled or + whatever it is, but it''s just all the results. And that doesn''t help with", "tokens": + [50864, 314, 7292, 420, 2035, 309, 307, 11, 457, 309, 311, 445, 439, 264, 3542, + 13, 400, 300, 1177, 380, 854, 365, 51024], "temperature": 0.0, "avg_logprob": -0.13549367119284236, + "compression_ratio": 1.6464285714285714, "no_speech_prob": 0.003937163390219212}, + {"id": 211, "seek": 126704, "start": 1280.24, "end": 1285.92, "text": " information + overload. It also doesn''t really get to the potential. So the key is, and what''s", + "tokens": [51024, 1589, 28777, 13, 467, 611, 1177, 380, 534, 483, 281, 264, 3995, + 13, 407, 264, 2141, 307, 11, 293, 437, 311, 51308], "temperature": 0.0, "avg_logprob": + -0.13549367119284236, "compression_ratio": 1.6464285714285714, "no_speech_prob": + 0.003937163390219212}, {"id": 212, "seek": 126704, "start": 1285.92, "end": 1292.56, + "text": " world uses, we use a large language model. There''s more to it, right? + There''s a relevancy algorithm", "tokens": [51308, 1002, 4960, 11, 321, 764, 257, + 2416, 2856, 2316, 13, 821, 311, 544, 281, 309, 11, 558, 30, 821, 311, 257, 25916, + 6717, 9284, 51640], "temperature": 0.0, "avg_logprob": -0.13549367119284236, "compression_ratio": + 1.6464285714285714, "no_speech_prob": 0.003937163390219212}, {"id": 213, "seek": + 129256, "start": 1292.56, "end": 1296.3999999999999, "text": " around it. There''s + a similarity pipeline around it, right? 
But at the end of the day,", "tokens": [50364, + 926, 309, 13, 821, 311, 257, 32194, 15517, 926, 309, 11, 558, 30, 583, 412, 264, + 917, 295, 264, 786, 11, 50556], "temperature": 0.0, "avg_logprob": -0.10684136421449723, + "compression_ratio": 1.5822784810126582, "no_speech_prob": 0.0030361844692379236}, + {"id": 214, "seek": 129256, "start": 1297.04, "end": 1303.76, "text": " there''s + a large model that evaluates data as vectors with real numbers. And it allows you + to do", "tokens": [50588, 456, 311, 257, 2416, 2316, 300, 6133, 1024, 1412, 382, + 18875, 365, 957, 3547, 13, 400, 309, 4045, 291, 281, 360, 50924], "temperature": + 0.0, "avg_logprob": -0.10684136421449723, "compression_ratio": 1.5822784810126582, + "no_speech_prob": 0.0030361844692379236}, {"id": 215, "seek": 129256, "start": 1303.76, + "end": 1310.48, "text": " incredible comparisons. And the thing that, as I put this + together, I wrote it nights and weekends", "tokens": [50924, 4651, 33157, 13, 400, + 264, 551, 300, 11, 382, 286, 829, 341, 1214, 11, 286, 4114, 309, 13249, 293, 23595, + 51260], "temperature": 0.0, "avg_logprob": -0.10684136421449723, "compression_ratio": + 1.5822784810126582, "no_speech_prob": 0.0030361844692379236}, {"id": 216, "seek": + 129256, "start": 1311.76, "end": 1318.72, "text": " starting last year. And when + I started to get results from it, I was shocked because I did not", "tokens": [51324, + 2891, 1036, 1064, 13, 400, 562, 286, 1409, 281, 483, 3542, 490, 309, 11, 286, 390, + 12763, 570, 286, 630, 406, 51672], "temperature": 0.0, "avg_logprob": -0.10684136421449723, + "compression_ratio": 1.5822784810126582, "no_speech_prob": 0.0030361844692379236}, + {"id": 217, "seek": 131872, "start": 1318.72, "end": 1326.4, "text": " expect it + to go to be as good as it came out. The thing about relevancy, right? 
It''s, oh, + man,", "tokens": [50364, 2066, 309, 281, 352, 281, 312, 382, 665, 382, 309, 1361, + 484, 13, 440, 551, 466, 25916, 6717, 11, 558, 30, 467, 311, 11, 1954, 11, 587, 11, + 50748], "temperature": 0.0, "avg_logprob": -0.14171066904455665, "compression_ratio": + 1.6605839416058394, "no_speech_prob": 0.003876862581819296}, {"id": 218, "seek": + 131872, "start": 1326.4, "end": 1331.2, "text": " we can always say we can, we''ll + identify it when we see it. But building tests around it", "tokens": [50748, 321, + 393, 1009, 584, 321, 393, 11, 321, 603, 5876, 309, 562, 321, 536, 309, 13, 583, + 2390, 6921, 926, 309, 50988], "temperature": 0.0, "avg_logprob": -0.14171066904455665, + "compression_ratio": 1.6605839416058394, "no_speech_prob": 0.003876862581819296}, + {"id": 219, "seek": 131872, "start": 1331.2, "end": 1336.88, "text": " is very difficult. + You come out with gold standards. And I love all the tooling. There''s so much", + "tokens": [50988, 307, 588, 2252, 13, 509, 808, 484, 365, 3821, 7787, 13, 400, 286, + 959, 439, 264, 46593, 13, 821, 311, 370, 709, 51272], "temperature": 0.0, "avg_logprob": + -0.14171066904455665, "compression_ratio": 1.6605839416058394, "no_speech_prob": + 0.003876862581819296}, {"id": 220, "seek": 131872, "start": 1336.88, "end": 1340.64, + "text": " good tooling around it. But at the end of the day, it all depends, fundamentally, + on really", "tokens": [51272, 665, 46593, 926, 309, 13, 583, 412, 264, 917, 295, + 264, 786, 11, 309, 439, 5946, 11, 17879, 11, 322, 534, 51460], "temperature": 0.0, + "avg_logprob": -0.14171066904455665, "compression_ratio": 1.6605839416058394, "no_speech_prob": + 0.003876862581819296}, {"id": 221, "seek": 131872, "start": 1341.3600000000001, + "end": 1347.28, "text": " some set of finite labeled outcomes, right? That''s what + it is. 
I found another way", "tokens": [51496, 512, 992, 295, 19362, 21335, 10070, + 11, 558, 30, 663, 311, 437, 309, 307, 13, 286, 1352, 1071, 636, 51792], "temperature": + 0.0, "avg_logprob": -0.14171066904455665, "compression_ratio": 1.6605839416058394, + "no_speech_prob": 0.003876862581819296}, {"id": 222, "seek": 134728, "start": 1347.44, + "end": 1354.0, "text": " to measure the relevancy without doing that. And the way + to do that is how far to the right", "tokens": [50372, 281, 3481, 264, 25916, 6717, + 1553, 884, 300, 13, 400, 264, 636, 281, 360, 300, 307, 577, 1400, 281, 264, 558, + 50700], "temperature": 0.0, "avg_logprob": -0.19455844099803637, "compression_ratio": + 1.7393364928909953, "no_speech_prob": 0.0018015250097960234}, {"id": 223, "seek": + 134728, "start": 1354.56, "end": 1362.08, "text": " are the term hits. And in, when + you''re using swirl, it favors because of the large language,", "tokens": [50728, + 366, 264, 1433, 8664, 13, 400, 294, 11, 562, 291, 434, 1228, 30310, 11, 309, 40554, + 570, 295, 264, 2416, 2856, 11, 51104], "temperature": 0.0, "avg_logprob": -0.19455844099803637, + "compression_ratio": 1.7393364928909953, "no_speech_prob": 0.0018015250097960234}, + {"id": 224, "seek": 134728, "start": 1362.08, "end": 1370.6399999999999, "text": + " the mall we used, it favors hits that are to the left, beginning of the sentence. + It views", "tokens": [51104, 264, 16026, 321, 1143, 11, 309, 40554, 8664, 300, 366, + 281, 264, 1411, 11, 2863, 295, 264, 8174, 13, 467, 6809, 51532], "temperature": + 0.0, "avg_logprob": -0.19455844099803637, "compression_ratio": 1.7393364928909953, + "no_speech_prob": 0.0018015250097960234}, {"id": 225, "seek": 134728, "start": 1370.6399999999999, + "end": 1375.76, "text": " aboutness as being at the beginning of the sentence. 
It''s + extremely good at discriminating,", "tokens": [51532, 466, 1287, 382, 885, 412, + 264, 2863, 295, 264, 8174, 13, 467, 311, 4664, 665, 412, 20828, 990, 11, 51788], + "temperature": 0.0, "avg_logprob": -0.19455844099803637, "compression_ratio": 1.7393364928909953, + "no_speech_prob": 0.0018015250097960234}, {"id": 226, "seek": 137576, "start": 1375.76, + "end": 1383.52, "text": " again, identifying hits that are in passing. So I think + we can all agree. Relovancy, I''ve always", "tokens": [50364, 797, 11, 16696, 8664, + 300, 366, 294, 8437, 13, 407, 286, 519, 321, 393, 439, 3986, 13, 8738, 5179, 6717, + 11, 286, 600, 1009, 50752], "temperature": 0.0, "avg_logprob": -0.24532432556152345, + "compression_ratio": 1.6485355648535565, "no_speech_prob": 0.0020069556776434183}, + {"id": 227, "seek": 137576, "start": 1383.52, "end": 1389.2, "text": " viewed relevancy + as a bit of a stepped function. The absolute top is exactly what I searched for", + "tokens": [50752, 19174, 25916, 6717, 382, 257, 857, 295, 257, 15251, 2445, 13, + 440, 8236, 1192, 307, 2293, 437, 286, 22961, 337, 51036], "temperature": 0.0, "avg_logprob": + -0.24532432556152345, "compression_ratio": 1.6485355648535565, "no_speech_prob": + 0.0020069556776434183}, {"id": 228, "seek": 137576, "start": 1389.52, "end": 1393.6, + "text": " in the entire field of the title and the body, right? 
At the end of the, + the hits at the beginning", "tokens": [51052, 294, 264, 2302, 2519, 295, 264, 4876, + 293, 264, 1772, 11, 558, 30, 1711, 264, 917, 295, 264, 11, 264, 8664, 412, 264, + 2863, 51256], "temperature": 0.0, "avg_logprob": -0.24532432556152345, "compression_ratio": + 1.6485355648535565, "no_speech_prob": 0.0020069556776434183}, {"id": 229, "seek": + 137576, "start": 1393.6, "end": 1398.24, "text": " of the body, we can probably + agree that''s got to be a good hit, the degree there''s a title in a body.", "tokens": + [51256, 295, 264, 1772, 11, 321, 393, 1391, 3986, 300, 311, 658, 281, 312, 257, + 665, 2045, 11, 264, 4314, 456, 311, 257, 4876, 294, 257, 1772, 13, 51488], "temperature": + 0.0, "avg_logprob": -0.24532432556152345, "compression_ratio": 1.6485355648535565, + "no_speech_prob": 0.0020069556776434183}, {"id": 230, "seek": 139824, "start": 1398.48, + "end": 1407.2, "text": " And then we all fear the terrible mention, right? The enemy + of relevancy is one mention of New", "tokens": [50376, 400, 550, 321, 439, 4240, + 264, 6237, 2152, 11, 558, 30, 440, 5945, 295, 25916, 6717, 307, 472, 2152, 295, + 1873, 50812], "temperature": 0.0, "avg_logprob": -0.13642728446733834, "compression_ratio": + 1.7356828193832599, "no_speech_prob": 0.008744871243834496}, {"id": 231, "seek": + 139824, "start": 1407.2, "end": 1411.6, "text": " York at the very end, right? It''s + like they''re in the list of cities that absolutely have nothing to do", "tokens": + [50812, 3609, 412, 264, 588, 917, 11, 558, 30, 467, 311, 411, 436, 434, 294, 264, + 1329, 295, 6486, 300, 3122, 362, 1825, 281, 360, 51032], "temperature": 0.0, "avg_logprob": + -0.13642728446733834, "compression_ratio": 1.7356828193832599, "no_speech_prob": + 0.008744871243834496}, {"id": 232, "seek": 139824, "start": 1411.6, "end": 1418.48, + "text": " with the big apple. 
And that''s what I found is that the relevancy function, + the lower you are in", "tokens": [51032, 365, 264, 955, 10606, 13, 400, 300, 311, + 437, 286, 1352, 307, 300, 264, 25916, 6717, 2445, 11, 264, 3126, 291, 366, 294, + 51376], "temperature": 0.0, "avg_logprob": -0.13642728446733834, "compression_ratio": + 1.7356828193832599, "no_speech_prob": 0.008744871243834496}, {"id": 233, "seek": + 139824, "start": 1418.48, "end": 1423.52, "text": " the result list, the more to + the right your search terms are. And the relevancy is what, the other", "tokens": + [51376, 264, 1874, 1329, 11, 264, 544, 281, 264, 558, 428, 3164, 2115, 366, 13, + 400, 264, 25916, 6717, 307, 437, 11, 264, 661, 51628], "temperature": 0.0, "avg_logprob": + -0.13642728446733834, "compression_ratio": 1.7356828193832599, "no_speech_prob": + 0.008744871243834496}, {"id": 234, "seek": 142352, "start": 1423.52, "end": 1428.48, + "text": " thing about meta searches is you don''t have the documents. I believe + that an evidence-based approach", "tokens": [50364, 551, 466, 19616, 26701, 307, + 291, 500, 380, 362, 264, 8512, 13, 286, 1697, 300, 364, 4467, 12, 6032, 3109, 50612], + "temperature": 0.0, "avg_logprob": -0.17577994497198807, "compression_ratio": 1.625531914893617, + "no_speech_prob": 0.0018853303045034409}, {"id": 235, "seek": 142352, "start": 1428.48, + "end": 1437.28, "text": " is critical. Did the silo return the search terms that + you, the user put in, are they visible in", "tokens": [50612, 307, 4924, 13, 2589, + 264, 3425, 78, 2736, 264, 3164, 2115, 300, 291, 11, 264, 4195, 829, 294, 11, 366, + 436, 8974, 294, 51052], "temperature": 0.0, "avg_logprob": -0.17577994497198807, + "compression_ratio": 1.625531914893617, "no_speech_prob": 0.0018853303045034409}, + {"id": 236, "seek": 142352, "start": 1437.28, "end": 1442.0, "text": " the results? + If they''re not visible, then there''s a question. 
So that''s the other piece of + it is we do", "tokens": [51052, 264, 3542, 30, 759, 436, 434, 406, 8974, 11, 550, + 456, 311, 257, 1168, 13, 407, 300, 311, 264, 661, 2522, 295, 309, 307, 321, 360, + 51288], "temperature": 0.0, "avg_logprob": -0.17577994497198807, "compression_ratio": + 1.625531914893617, "no_speech_prob": 0.0018853303045034409}, {"id": 237, "seek": + 142352, "start": 1442.0, "end": 1448.24, "text": " use an evident space metric combined + with similarity to say to rank and it works.", "tokens": [51288, 764, 364, 16371, + 1901, 20678, 9354, 365, 32194, 281, 584, 281, 6181, 293, 309, 1985, 13, 51600], + "temperature": 0.0, "avg_logprob": -0.17577994497198807, "compression_ratio": 1.625531914893617, + "no_speech_prob": 0.0018853303045034409}, {"id": 238, "seek": 144824, "start": 1449.2, + "end": 1456.4, "text": " And now, so that said, here''s all the stuff that I just + left out. You have to normalize the query,", "tokens": [50412, 400, 586, 11, 370, + 300, 848, 11, 510, 311, 439, 264, 1507, 300, 286, 445, 1411, 484, 13, 509, 362, + 281, 2710, 1125, 264, 14581, 11, 50772], "temperature": 0.0, "avg_logprob": -0.1387463088082795, + "compression_ratio": 1.6334661354581674, "no_speech_prob": 0.012598407454788685}, + {"id": 239, "seek": 144824, "start": 1457.36, "end": 1461.36, "text": " especially + if you interpret the query and rewrite it. One of the most important things about + meta searches", "tokens": [50820, 2318, 498, 291, 7302, 264, 14581, 293, 28132, + 309, 13, 1485, 295, 264, 881, 1021, 721, 466, 19616, 26701, 51020], "temperature": + 0.0, "avg_logprob": -0.1387463088082795, "compression_ratio": 1.6334661354581674, + "no_speech_prob": 0.012598407454788685}, {"id": 240, "seek": 144824, "start": 1461.36, + "end": 1468.8, "text": " you can''t send the same query to every endpoint. Not all + endpoints are equal. 
Some endpoints, for example,", "tokens": [51020, 291, 393, + 380, 2845, 264, 912, 14581, 281, 633, 35795, 13, 1726, 439, 917, 20552, 366, 2681, + 13, 2188, 917, 20552, 11, 337, 1365, 11, 51392], "temperature": 0.0, "avg_logprob": + -0.1387463088082795, "compression_ratio": 1.6334661354581674, "no_speech_prob": + 0.012598407454788685}, {"id": 241, "seek": 144824, "start": 1468.8, "end": 1474.24, + "text": " might be a database that''s really able to target one field at a time + effectively. So for example,", "tokens": [51392, 1062, 312, 257, 8149, 300, 311, + 534, 1075, 281, 3779, 472, 2519, 412, 257, 565, 8659, 13, 407, 337, 1365, 11, 51664], + "temperature": 0.0, "avg_logprob": -0.1387463088082795, "compression_ratio": 1.6334661354581674, + "no_speech_prob": 0.012598407454788685}, {"id": 242, "seek": 147424, "start": 1474.24, + "end": 1479.36, "text": " they might be a repository that knows about companies. + So if your search is electric vehicle Tesla,", "tokens": [50364, 436, 1062, 312, + 257, 25841, 300, 3255, 466, 3431, 13, 407, 498, 428, 3164, 307, 5210, 5864, 13666, + 11, 50620], "temperature": 0.0, "avg_logprob": -0.16039108211158687, "compression_ratio": + 1.857677902621723, "no_speech_prob": 0.0033554621040821075}, {"id": 243, "seek": + 147424, "start": 1479.36, "end": 1484.24, "text": " don''t send electric vehicle + in, just send Tesla. So we provide a way to mark that. So we''re all", "tokens": + [50620, 500, 380, 2845, 5210, 5864, 294, 11, 445, 2845, 13666, 13, 407, 321, 2893, + 257, 636, 281, 1491, 300, 13, 407, 321, 434, 439, 50864], "temperature": 0.0, "avg_logprob": + -0.16039108211158687, "compression_ratio": 1.857677902621723, "no_speech_prob": + 0.0033554621040821075}, {"id": 244, "seek": 147424, "start": 1484.24, "end": 1489.2, + "text": " has the ability to tag each search provider with what it knows about. 
+ So you''d write that electric", "tokens": [50864, 575, 264, 3485, 281, 6162, 1184, + 3164, 12398, 365, 437, 309, 3255, 466, 13, 407, 291, 1116, 2464, 300, 5210, 51112], + "temperature": 0.0, "avg_logprob": -0.16039108211158687, "compression_ratio": 1.857677902621723, + "no_speech_prob": 0.0033554621040821075}, {"id": 245, "seek": 147424, "start": 1489.2, + "end": 1497.84, "text": " vehicle company colon Tesla. Tesla goes just to the company + silos, the query. Everybody else drops the", "tokens": [51112, 5864, 2237, 8255, + 13666, 13, 13666, 1709, 445, 281, 264, 2237, 48893, 11, 264, 14581, 13, 7646, 1646, + 11438, 264, 51544], "temperature": 0.0, "avg_logprob": -0.16039108211158687, "compression_ratio": + 1.857677902621723, "no_speech_prob": 0.0033554621040821075}, {"id": 246, "seek": + 147424, "start": 1497.84, "end": 1502.16, "text": " tag. So Google gets electric + vehicle Tesla, which of course, it doesn''t have a magnificent job on.", "tokens": + [51544, 6162, 13, 407, 3329, 2170, 5210, 5864, 13666, 11, 597, 295, 1164, 11, 309, + 1177, 380, 362, 257, 23690, 1691, 322, 13, 51760], "temperature": 0.0, "avg_logprob": + -0.16039108211158687, "compression_ratio": 1.857677902621723, "no_speech_prob": + 0.0033554621040821075}, {"id": 247, "seek": 150216, "start": 1502.16, "end": 1506.88, + "text": " So then you have to normalize the query when you''re scoring, as well + as you have to normalize each", "tokens": [50364, 407, 550, 291, 362, 281, 2710, + 1125, 264, 14581, 562, 291, 434, 22358, 11, 382, 731, 382, 291, 362, 281, 2710, + 1125, 1184, 50600], "temperature": 0.0, "avg_logprob": -0.12436229926495512, "compression_ratio": + 1.7167235494880546, "no_speech_prob": 0.0015101874014362693}, {"id": 248, "seek": + 150216, "start": 1506.88, "end": 1513.2, "text": " field, right, as normal. Freshness + is certainly an interesting thing. 
I found the model also works best if", "tokens": + [50600, 2519, 11, 558, 11, 382, 2710, 13, 22843, 1287, 307, 3297, 364, 1880, 551, + 13, 286, 1352, 264, 2316, 611, 1985, 1151, 498, 50916], "temperature": 0.0, "avg_logprob": + -0.12436229926495512, "compression_ratio": 1.7167235494880546, "no_speech_prob": + 0.0015101874014362693}, {"id": 249, "seek": 150216, "start": 1513.2, "end": 1520.0800000000002, + "text": " we add a boost based on the top topness of the results. So if a repository + gave it rank number one,", "tokens": [50916, 321, 909, 257, 9194, 2361, 322, 264, + 1192, 1192, 1287, 295, 264, 3542, 13, 407, 498, 257, 25841, 2729, 309, 6181, 1230, + 472, 11, 51260], "temperature": 0.0, "avg_logprob": -0.12436229926495512, "compression_ratio": + 1.7167235494880546, "no_speech_prob": 0.0015101874014362693}, {"id": 250, "seek": + 150216, "start": 1520.0800000000002, "end": 1524.88, "text": " we should probably + at least factor that in a little bit. And then of course, there''s things like", + "tokens": [51260, 321, 820, 1391, 412, 1935, 5952, 300, 294, 257, 707, 857, 13, + 400, 550, 295, 1164, 11, 456, 311, 721, 411, 51500], "temperature": 0.0, "avg_logprob": + -0.12436229926495512, "compression_ratio": 1.7167235494880546, "no_speech_prob": + 0.0015101874014362693}, {"id": 251, "seek": 150216, "start": 1524.88, "end": 1530.48, + "text": " number of hits. And vector similarity is ultimately used, right? We aggregate + vector similarities to", "tokens": [51500, 1230, 295, 8664, 13, 400, 8062, 32194, + 307, 6284, 1143, 11, 558, 30, 492, 26118, 8062, 24197, 281, 51780], "temperature": + 0.0, "avg_logprob": -0.12436229926495512, "compression_ratio": 1.7167235494880546, + "no_speech_prob": 0.0015101874014362693}, {"id": 252, "seek": 153048, "start": 1530.48, + "end": 1536.48, "text": " reflect the evidence level. And then the strength of it, + right, is captured in the similarity. 
So a lot", "tokens": [50364, 5031, 264, 4467, + 1496, 13, 400, 550, 264, 3800, 295, 309, 11, 558, 11, 307, 11828, 294, 264, 32194, + 13, 407, 257, 688, 50664], "temperature": 0.0, "avg_logprob": -0.11056848091654259, + "compression_ratio": 1.610655737704918, "no_speech_prob": 0.002110518282279372}, + {"id": 253, "seek": 153048, "start": 1536.48, "end": 1543.44, "text": " went into + it. It''s probably the most awful module in our in our repo, but somebody smarter + will", "tokens": [50664, 1437, 666, 309, 13, 467, 311, 1391, 264, 881, 11232, 10088, + 294, 527, 294, 527, 49040, 11, 457, 2618, 20294, 486, 51012], "temperature": 0.0, + "avg_logprob": -0.11056848091654259, "compression_ratio": 1.610655737704918, "no_speech_prob": + 0.002110518282279372}, {"id": 254, "seek": 153048, "start": 1543.44, "end": 1549.04, + "text": " rewrite it soon, but it works. And that''s the important thing. And that + is why I''m here today,", "tokens": [51012, 28132, 309, 2321, 11, 457, 309, 1985, + 13, 400, 300, 311, 264, 1021, 551, 13, 400, 300, 307, 983, 286, 478, 510, 965, 11, + 51292], "temperature": 0.0, "avg_logprob": -0.11056848091654259, "compression_ratio": + 1.610655737704918, "no_speech_prob": 0.002110518282279372}, {"id": 255, "seek": + 153048, "start": 1549.84, "end": 1555.6, "text": " right? I have exited other ventures + because I believe in this so much. And I put together a little", "tokens": [51332, + 558, 30, 286, 362, 454, 1226, 661, 6931, 1303, 570, 286, 1697, 294, 341, 370, 709, + 13, 400, 286, 829, 1214, 257, 707, 51620], "temperature": 0.0, "avg_logprob": -0.11056848091654259, + "compression_ratio": 1.610655737704918, "no_speech_prob": 0.002110518282279372}, + {"id": 256, "seek": 155560, "start": 1556.56, "end": 1561.76, "text": " company + that is here to support it. It''s 100% open source under Apache 2.0. 
Get it or grab + the", "tokens": [50412, 2237, 300, 307, 510, 281, 1406, 309, 13, 467, 311, 2319, + 4, 1269, 4009, 833, 46597, 568, 13, 15, 13, 3240, 309, 420, 4444, 264, 50672], "temperature": + 0.0, "avg_logprob": -0.1441481304168701, "compression_ratio": 1.5261044176706828, + "no_speech_prob": 0.056096650660037994}, {"id": 257, "seek": 155560, "start": 1561.76, + "end": 1569.04, "text": " darker and you can make it run in two lines. And you''ll + see. Yeah, that sounds so fantastic. And I''m", "tokens": [50672, 12741, 293, 291, + 393, 652, 309, 1190, 294, 732, 3876, 13, 400, 291, 603, 536, 13, 865, 11, 300, 3263, + 370, 5456, 13, 400, 286, 478, 51036], "temperature": 0.0, "avg_logprob": -0.1441481304168701, + "compression_ratio": 1.5261044176706828, "no_speech_prob": 0.056096650660037994}, + {"id": 258, "seek": 155560, "start": 1569.04, "end": 1574.8799999999999, "text": + " sure our listeners will take a look, especially because it''s open source. It''s + much easier to", "tokens": [51036, 988, 527, 23274, 486, 747, 257, 574, 11, 2318, + 570, 309, 311, 1269, 4009, 13, 467, 311, 709, 3571, 281, 51328], "temperature": + 0.0, "avg_logprob": -0.1441481304168701, "compression_ratio": 1.5261044176706828, + "no_speech_prob": 0.056096650660037994}, {"id": 259, "seek": 155560, "start": 1575.6, + "end": 1583.4399999999998, "text": " you know, start hacking over the weekend or + something. I also wanted to ask you before you", "tokens": [51364, 291, 458, 11, + 722, 31422, 670, 264, 6711, 420, 746, 13, 286, 611, 1415, 281, 1029, 291, 949, 291, + 51756], "temperature": 0.0, "avg_logprob": -0.1441481304168701, "compression_ratio": + 1.5261044176706828, "no_speech_prob": 0.056096650660037994}, {"id": 260, "seek": + 158344, "start": 1583.92, "end": 1588.96, "text": " show us some demos. 
I think + this will be really, really interesting and change in format of the", "tokens": + [50388, 855, 505, 512, 33788, 13, 286, 519, 341, 486, 312, 534, 11, 534, 1880, 293, + 1319, 294, 7877, 295, 264, 50640], "temperature": 0.0, "avg_logprob": -0.15100580646145728, + "compression_ratio": 1.6398305084745763, "no_speech_prob": 0.003923231270164251}, + {"id": 261, "seek": 158344, "start": 1588.96, "end": 1598.96, "text": " podcast + to some extent. You mentioned the similarity aspect of vector search, right? And + so probably", "tokens": [50640, 7367, 281, 512, 8396, 13, 509, 2835, 264, 32194, + 4171, 295, 8062, 3164, 11, 558, 30, 400, 370, 1391, 51140], "temperature": 0.0, + "avg_logprob": -0.15100580646145728, "compression_ratio": 1.6398305084745763, "no_speech_prob": + 0.003923231270164251}, {"id": 262, "seek": 158344, "start": 1598.96, "end": 1604.88, + "text": " it also exists in keyword search to some extent, but there, as you said, + we trained ourselves to", "tokens": [51140, 309, 611, 8198, 294, 20428, 3164, 281, + 512, 8396, 11, 457, 456, 11, 382, 291, 848, 11, 321, 8895, 4175, 281, 51436], "temperature": + 0.0, "avg_logprob": -0.15100580646145728, "compression_ratio": 1.6398305084745763, + "no_speech_prob": 0.003923231270164251}, {"id": 263, "seek": 158344, "start": 1604.88, + "end": 1610.88, "text": " look at what we see. And if we see how a keyword, we kind + of trusted this more. 
Although this", "tokens": [51436, 574, 412, 437, 321, 536, + 13, 400, 498, 321, 536, 577, 257, 20428, 11, 321, 733, 295, 16034, 341, 544, 13, + 5780, 341, 51736], "temperature": 0.0, "avg_logprob": -0.15100580646145728, "compression_ratio": + 1.6398305084745763, "no_speech_prob": 0.003923231270164251}, {"id": 264, "seek": + 161088, "start": 1610.88, "end": 1617.44, "text": " probably varies per case, but + in similarity search and vector search, this similarity is a", "tokens": [50364, + 1391, 21716, 680, 1389, 11, 457, 294, 32194, 3164, 293, 8062, 3164, 11, 341, 32194, + 307, 257, 50692], "temperature": 0.0, "avg_logprob": -0.13356580016433553, "compression_ratio": + 1.759433962264151, "no_speech_prob": 0.006972862407565117}, {"id": 265, "seek": + 161088, "start": 1617.44, "end": 1623.92, "text": " play, right? So like, if you + cannot find a top result, or even like a middle relevant result,", "tokens": [50692, + 862, 11, 558, 30, 407, 411, 11, 498, 291, 2644, 915, 257, 1192, 1874, 11, 420, 754, + 411, 257, 2808, 7340, 1874, 11, 51016], "temperature": 0.0, "avg_logprob": -0.13356580016433553, + "compression_ratio": 1.759433962264151, "no_speech_prob": 0.006972862407565117}, + {"id": 266, "seek": 161088, "start": 1623.92, "end": 1630.0800000000002, "text": + " you only find like very distant ones. How do you detect this and how do you deal + with this?", "tokens": [51016, 291, 787, 915, 411, 588, 17275, 2306, 13, 1012, 360, + 291, 5531, 341, 293, 577, 360, 291, 2028, 365, 341, 30, 51324], "temperature": 0.0, + "avg_logprob": -0.13356580016433553, "compression_ratio": 1.759433962264151, "no_speech_prob": + 0.006972862407565117}, {"id": 267, "seek": 161088, "start": 1632.0, "end": 1636.88, + "text": " So the similarity will be poor and there''ll be no evidence. 
So the score + will be low and it will", "tokens": [51420, 407, 264, 32194, 486, 312, 4716, 293, + 456, 603, 312, 572, 4467, 13, 407, 264, 6175, 486, 312, 2295, 293, 309, 486, 51664], + "temperature": 0.0, "avg_logprob": -0.13356580016433553, "compression_ratio": 1.759433962264151, + "no_speech_prob": 0.006972862407565117}, {"id": 268, "seek": 163688, "start": 1636.88, + "end": 1642.16, "text": " be end up dropped to the back of the result list. That''s + the key. Now, there are a bunch of reasons", "tokens": [50364, 312, 917, 493, 8119, + 281, 264, 646, 295, 264, 1874, 1329, 13, 663, 311, 264, 2141, 13, 823, 11, 456, + 366, 257, 3840, 295, 4112, 50628], "temperature": 0.0, "avg_logprob": -0.13867864770404362, + "compression_ratio": 1.6551724137931034, "no_speech_prob": 0.005918482784181833}, + {"id": 269, "seek": 163688, "start": 1642.16, "end": 1647.7600000000002, "text": + " that can happen though. One of those reasons could be that perhaps we do not understand + the domain,", "tokens": [50628, 300, 393, 1051, 1673, 13, 1485, 295, 729, 4112, + 727, 312, 300, 4317, 321, 360, 406, 1223, 264, 9274, 11, 50908], "temperature": + 0.0, "avg_logprob": -0.13867864770404362, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.005918482784181833}, {"id": 270, "seek": 163688, "start": 1647.7600000000002, + "end": 1654.48, "text": " as well as the silo does. 
And one thing, one example of + that is perhaps we''re dealing with", "tokens": [50908, 382, 731, 382, 264, 3425, + 78, 775, 13, 400, 472, 551, 11, 472, 1365, 295, 300, 307, 4317, 321, 434, 6260, + 365, 51244], "temperature": 0.0, "avg_logprob": -0.13867864770404362, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.005918482784181833}, {"id": 271, "seek": + 163688, "start": 1654.48, "end": 1659.44, "text": " transformations of entities, + very often dictionary based, or sometimes it''s more subtle,", "tokens": [51244, + 34852, 295, 16667, 11, 588, 2049, 25890, 2361, 11, 420, 2171, 309, 311, 544, 13743, + 11, 51492], "temperature": 0.0, "avg_logprob": -0.13867864770404362, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.005918482784181833}, {"id": 272, "seek": + 163688, "start": 1660.24, "end": 1666.0, "text": " but one of the things we learned + very quickly is QueryGee is an open, an amazing open source package", "tokens": + [51532, 457, 472, 295, 264, 721, 321, 3264, 588, 2661, 307, 2326, 2109, 38, 1653, + 307, 364, 1269, 11, 364, 2243, 1269, 4009, 7372, 51820], "temperature": 0.0, "avg_logprob": + -0.13867864770404362, "compression_ratio": 1.6551724137931034, "no_speech_prob": + 0.005918482784181833}, {"id": 273, "seek": 166600, "start": 1666.0, "end": 1671.6, + "text": " that''s used with Elastic Solar, an open source, an open search, I should + say. And it rewrites", "tokens": [50364, 300, 311, 1143, 365, 2699, 2750, 22385, + 11, 364, 1269, 4009, 11, 364, 1269, 3164, 11, 286, 820, 584, 13, 400, 309, 319, + 86, 30931, 50644], "temperature": 0.0, "avg_logprob": -0.10958928267161051, "compression_ratio": + 1.6714801444043321, "no_speech_prob": 0.0011835844488814473}, {"id": 274, "seek": + 166600, "start": 1671.6, "end": 1675.12, "text": " queries. It''s kind of the standard + for it. It''s very common to find it. 
It''s really amazing.", "tokens": [50644, + 24109, 13, 467, 311, 733, 295, 264, 3832, 337, 309, 13, 467, 311, 588, 2689, 281, + 915, 309, 13, 467, 311, 534, 2243, 13, 50820], "temperature": 0.0, "avg_logprob": + -0.10958928267161051, "compression_ratio": 1.6714801444043321, "no_speech_prob": + 0.0011835844488814473}, {"id": 275, "seek": 166600, "start": 1675.84, "end": 1681.04, + "text": " So here, the idea is that the silo knows something that we don''t. So + we actually have an", "tokens": [50856, 407, 510, 11, 264, 1558, 307, 300, 264, + 3425, 78, 3255, 746, 300, 321, 500, 380, 13, 407, 321, 767, 362, 364, 51116], "temperature": + 0.0, "avg_logprob": -0.10958928267161051, "compression_ratio": 1.6714801444043321, + "no_speech_prob": 0.0011835844488814473}, {"id": 276, "seek": 166600, "start": 1681.04, + "end": 1686.88, "text": " integration now where we listen to the feedback that comes + from each engine. So if they report,", "tokens": [51116, 10980, 586, 689, 321, 2140, + 281, 264, 5824, 300, 1487, 490, 1184, 2848, 13, 407, 498, 436, 2275, 11, 51408], + "temperature": 0.0, "avg_logprob": -0.10958928267161051, "compression_ratio": 1.6714801444043321, + "no_speech_prob": 0.0011835844488814473}, {"id": 277, "seek": 166600, "start": 1686.88, + "end": 1692.08, "text": " for example, if they highlight hits, we check the similarity. + The similarity is high enough", "tokens": [51408, 337, 1365, 11, 498, 436, 5078, + 8664, 11, 321, 1520, 264, 32194, 13, 440, 32194, 307, 1090, 1547, 51668], "temperature": + 0.0, "avg_logprob": -0.10958928267161051, "compression_ratio": 1.6714801444043321, + "no_speech_prob": 0.0011835844488814473}, {"id": 278, "seek": 169208, "start": 1692.1599999999999, + "end": 1698.32, "text": " and we''ll honor that. 
And that''s another idea, where + we want each of those silos to,", "tokens": [50368, 293, 321, 603, 5968, 300, 13, + 400, 300, 311, 1071, 1558, 11, 689, 321, 528, 1184, 295, 729, 48893, 281, 11, 50676], + "temperature": 0.0, "avg_logprob": -0.20130513984466267, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0036669510882347822}, {"id": 279, "seek": 169208, "start": 1698.32, + "end": 1702.3999999999999, "text": " we want to honor their feedback. Now, we''re + not today, but in the future,", "tokens": [50676, 321, 528, 281, 5968, 641, 5824, + 13, 823, 11, 321, 434, 406, 965, 11, 457, 294, 264, 2027, 11, 50880], "temperature": + 0.0, "avg_logprob": -0.20130513984466267, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0036669510882347822}, {"id": 280, "seek": 169208, "start": 1703.4399999999998, + "end": 1708.56, "text": " why not requery based on expanding our vocabulary around + the search? Those are all things", "tokens": [50932, 983, 406, 319, 358, 2109, 2361, + 322, 14702, 527, 19864, 926, 264, 3164, 30, 3950, 366, 439, 721, 51188], "temperature": + 0.0, "avg_logprob": -0.20130513984466267, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.0036669510882347822}, {"id": 281, "seek": 169208, "start": 1708.56, + "end": 1711.28, "text": " that can be done. 
And by the way, we''d love to get a + pull request if someone wants to do that.", "tokens": [51188, 300, 393, 312, 1096, + 13, 400, 538, 264, 636, 11, 321, 1116, 959, 281, 483, 257, 2235, 5308, 498, 1580, + 2738, 281, 360, 300, 13, 51324], "temperature": 0.0, "avg_logprob": -0.20130513984466267, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0036669510882347822}, + {"id": 282, "seek": 169208, "start": 1713.12, "end": 1718.32, "text": " That honestly + is kind of the key to the whole thing.", "tokens": [51416, 663, 6095, 307, 733, + 295, 264, 2141, 281, 264, 1379, 551, 13, 51676], "temperature": 0.0, "avg_logprob": + -0.20130513984466267, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.0036669510882347822}, {"id": 283, "seek": 171832, "start": 1719.28, "end": 1725.04, + "text": " Yeah. So you kind of like learn to innovate. Anyway, you have multiple + voter problems,", "tokens": [50412, 865, 13, 407, 291, 733, 295, 411, 1466, 281, + 33444, 13, 5684, 11, 291, 362, 3866, 21722, 2740, 11, 50700], "temperature": 0.0, + "avg_logprob": -0.21927140740787282, "compression_ratio": 1.547486033519553, "no_speech_prob": + 0.02564738504588604}, {"id": 284, "seek": 171832, "start": 1725.6799999999998, "end": + 1733.6799999999998, "text": " but you also want to really hear the signal, hear + out the signal from every of the voter,", "tokens": [50732, 457, 291, 611, 528, + 281, 534, 1568, 264, 6358, 11, 1568, 484, 264, 6358, 490, 633, 295, 264, 21722, + 11, 51132], "temperature": 0.0, "avg_logprob": -0.21927140740787282, "compression_ratio": + 1.547486033519553, "no_speech_prob": 0.02564738504588604}, {"id": 285, "seek": 171832, + "start": 1733.6799999999998, "end": 1743.76, "text": " and sort of like make sure + that you roll this up to the best formula, right? 
The best representation", "tokens": + [51132, 293, 1333, 295, 411, 652, 988, 300, 291, 3373, 341, 493, 281, 264, 1151, + 8513, 11, 558, 30, 440, 1151, 10290, 51636], "temperature": 0.0, "avg_logprob": + -0.21927140740787282, "compression_ratio": 1.547486033519553, "no_speech_prob": + 0.02564738504588604}, {"id": 286, "seek": 174376, "start": 1743.84, "end": 1750.0, + "text": " of this signal to the user. That''s right. Absolutely. And then you can + label some of those", "tokens": [50368, 295, 341, 6358, 281, 264, 4195, 13, 663, + 311, 558, 13, 7021, 13, 400, 550, 291, 393, 7645, 512, 295, 729, 50676], "temperature": + 0.0, "avg_logprob": -0.20709907059120922, "compression_ratio": 1.7065217391304348, + "no_speech_prob": 0.004691182170063257}, {"id": 287, "seek": 174376, "start": 1750.0, + "end": 1754.08, "text": " sounds because you''re right. Some of them are getting + really smart. Just some examples. I''ll", "tokens": [50676, 3263, 570, 291, 434, + 558, 13, 2188, 295, 552, 366, 1242, 534, 4069, 13, 1449, 512, 5110, 13, 286, 603, + 50880], "temperature": 0.0, "avg_logprob": -0.20709907059120922, "compression_ratio": + 1.7065217391304348, "no_speech_prob": 0.004691182170063257}, {"id": 288, "seek": + 174376, "start": 1754.08, "end": 1760.48, "text": " throw out some Vectara. Amazing, + amazing, incredible vector database. That''s probably an inadequate", "tokens": + [50880, 3507, 484, 512, 691, 557, 2419, 13, 14165, 11, 2243, 11, 4651, 8062, 8149, + 13, 663, 311, 1391, 364, 42107, 51200], "temperature": 0.0, "avg_logprob": -0.20709907059120922, + "compression_ratio": 1.7065217391304348, "no_speech_prob": 0.004691182170063257}, + {"id": 289, "seek": 174376, "start": 1760.48, "end": 1767.2, "text": " description, + but it answers questions on your documents. 
There''s a revolution in Vector,", "tokens": + [51200, 3855, 11, 457, 309, 6338, 1651, 322, 428, 8512, 13, 821, 311, 257, 8894, + 294, 691, 557, 284, 11, 51536], "temperature": 0.0, "avg_logprob": -0.20709907059120922, + "compression_ratio": 1.7065217391304348, "no_speech_prob": 0.004691182170063257}, + {"id": 290, "seek": 174376, "start": 1768.24, "end": 1772.72, "text": " in Vector + search. Some people are focused very much on performance, right? Some people are + focused", "tokens": [51588, 294, 691, 557, 284, 3164, 13, 2188, 561, 366, 5178, + 588, 709, 322, 3389, 11, 558, 30, 2188, 561, 366, 5178, 51812], "temperature": 0.0, + "avg_logprob": -0.20709907059120922, "compression_ratio": 1.7065217391304348, "no_speech_prob": + 0.004691182170063257}, {"id": 291, "seek": 177272, "start": 1772.72, "end": 1777.44, + "text": " on knowledge. Some people are focused on exporting Vectors. So I think + the enterprise, especially", "tokens": [50364, 322, 3601, 13, 2188, 561, 366, 5178, + 322, 44686, 691, 557, 830, 13, 407, 286, 519, 264, 14132, 11, 2318, 50600], "temperature": + 0.0, "avg_logprob": -0.15711737101056936, "compression_ratio": 1.7098976109215016, + "no_speech_prob": 0.0005176396225579083}, {"id": 292, "seek": 177272, "start": 1777.44, + "end": 1783.44, "text": " large enterprise, already has dozens of indexing tools + and engines and others. And there will be many", "tokens": [50600, 2416, 14132, + 11, 1217, 575, 18431, 295, 8186, 278, 3873, 293, 12982, 293, 2357, 13, 400, 456, + 486, 312, 867, 50900], "temperature": 0.0, "avg_logprob": -0.15711737101056936, + "compression_ratio": 1.7098976109215016, "no_speech_prob": 0.0005176396225579083}, + {"id": 293, "seek": 177272, "start": 1783.44, "end": 1787.92, "text": " of these + too, special case, right? 
There''ll be some that are incredible at customer service.", + "tokens": [50900, 295, 613, 886, 11, 2121, 1389, 11, 558, 30, 821, 603, 312, 512, + 300, 366, 4651, 412, 5474, 2643, 13, 51124], "temperature": 0.0, "avg_logprob": + -0.15711737101056936, "compression_ratio": 1.7098976109215016, "no_speech_prob": + 0.0005176396225579083}, {"id": 294, "seek": 177272, "start": 1787.92, "end": 1794.64, + "text": " And some will be incredible at exception handling. Some will be incredible + at perhaps sales pre-qualification.", "tokens": [51124, 400, 512, 486, 312, 4651, + 412, 11183, 13175, 13, 2188, 486, 312, 4651, 412, 4317, 5763, 659, 12, 22345, 3774, + 13, 51460], "temperature": 0.0, "avg_logprob": -0.15711737101056936, "compression_ratio": + 1.7098976109215016, "no_speech_prob": 0.0005176396225579083}, {"id": 295, "seek": + 177272, "start": 1794.64, "end": 1800.16, "text": " You know, I just sort of learned + from the past examples. Watson was going to diagnose everything,", "tokens": [51460, + 509, 458, 11, 286, 445, 1333, 295, 3264, 490, 264, 1791, 5110, 13, 25640, 390, 516, + 281, 36238, 1203, 11, 51736], "temperature": 0.0, "avg_logprob": -0.15711737101056936, + "compression_ratio": 1.7098976109215016, "no_speech_prob": 0.0005176396225579083}, + {"id": 296, "seek": 180016, "start": 1800.16, "end": 1803.92, "text": " right? And + I think what it ultimately did well was pre-approval authorizations.", "tokens": + [50364, 558, 30, 400, 286, 519, 437, 309, 6284, 630, 731, 390, 659, 12, 35821, 3337, + 3793, 14455, 13, 50552], "temperature": 0.0, "avg_logprob": -0.12786371477188602, + "compression_ratio": 1.6402877697841727, "no_speech_prob": 0.004836043808609247}, + {"id": 297, "seek": 180016, "start": 1805.3600000000001, "end": 1810.3200000000002, + "text": " So, but over time, I think it''s clearly these will all become more automated. 
+ And so then,", "tokens": [50624, 407, 11, 457, 670, 565, 11, 286, 519, 309, 311, + 4448, 613, 486, 439, 1813, 544, 18473, 13, 400, 370, 550, 11, 50872], "temperature": + 0.0, "avg_logprob": -0.12786371477188602, "compression_ratio": 1.6402877697841727, + "no_speech_prob": 0.004836043808609247}, {"id": 298, "seek": 180016, "start": 1810.3200000000002, + "end": 1814.3200000000002, "text": " but you still need a way, if you''re trying + to figure out what''s new across these salads,", "tokens": [50872, 457, 291, 920, + 643, 257, 636, 11, 498, 291, 434, 1382, 281, 2573, 484, 437, 311, 777, 2108, 613, + 48025, 11, 51072], "temperature": 0.0, "avg_logprob": -0.12786371477188602, "compression_ratio": + 1.6402877697841727, "no_speech_prob": 0.004836043808609247}, {"id": 299, "seek": + 180016, "start": 1814.3200000000002, "end": 1818.96, "text": " you still need a + way to query them all. And so this world''s happy. We have an integration with chat", + "tokens": [51072, 291, 920, 643, 257, 636, 281, 14581, 552, 439, 13, 400, 370, 341, + 1002, 311, 2055, 13, 492, 362, 364, 10980, 365, 5081, 51304], "temperature": 0.0, + "avg_logprob": -0.12786371477188602, "compression_ratio": 1.6402877697841727, "no_speech_prob": + 0.004836043808609247}, {"id": 300, "seek": 180016, "start": 1818.96, "end": 1825.8400000000001, + "text": " GPT. You can query chat GPT. In fact, by default, we query it if you put + your key in every time.", "tokens": [51304, 26039, 51, 13, 509, 393, 14581, 5081, + 26039, 51, 13, 682, 1186, 11, 538, 7576, 11, 321, 14581, 309, 498, 291, 829, 428, + 2141, 294, 633, 565, 13, 51648], "temperature": 0.0, "avg_logprob": -0.12786371477188602, + "compression_ratio": 1.6402877697841727, "no_speech_prob": 0.004836043808609247}, + {"id": 301, "seek": 182584, "start": 1825.9199999999998, "end": 1830.72, "text": + " So you and we rewrite the query. If the query is a question, we just pass it right + along. 
If it''s not,", "tokens": [50368, 407, 291, 293, 321, 28132, 264, 14581, + 13, 759, 264, 14581, 307, 257, 1168, 11, 321, 445, 1320, 309, 558, 2051, 13, 759, + 309, 311, 406, 11, 50608], "temperature": 0.0, "avg_logprob": -0.158424617737297, + "compression_ratio": 1.7366412213740459, "no_speech_prob": 0.002280054846778512}, + {"id": 302, "seek": 182584, "start": 1830.72, "end": 1834.1599999999999, "text": + " we ask rewrite it using a prompt to something like tell me about", "tokens": [50608, + 321, 1029, 28132, 309, 1228, 257, 12391, 281, 746, 411, 980, 385, 466, 50780], "temperature": + 0.0, "avg_logprob": -0.158424617737297, "compression_ratio": 1.7366412213740459, + "no_speech_prob": 0.002280054846778512}, {"id": 303, "seek": 182584, "start": 1835.6, + "end": 1840.3999999999999, "text": " thing. So you get a summary, right, which we + pop up for, I think we also have a query processor.", "tokens": [50852, 551, 13, + 407, 291, 483, 257, 12691, 11, 558, 11, 597, 321, 1665, 493, 337, 11, 286, 519, + 321, 611, 362, 257, 14581, 15321, 13, 51092], "temperature": 0.0, "avg_logprob": + -0.158424617737297, "compression_ratio": 1.7366412213740459, "no_speech_prob": 0.002280054846778512}, + {"id": 304, "seek": 182584, "start": 1841.04, "end": 1845.84, "text": " So you can + have a chat GPT change the user''s query. Like you could say rewrite this to a Boolean,", + "tokens": [51124, 407, 291, 393, 362, 257, 5081, 26039, 51, 1319, 264, 4195, 311, + 14581, 13, 1743, 291, 727, 584, 28132, 341, 281, 257, 23351, 28499, 11, 51364], + "temperature": 0.0, "avg_logprob": -0.158424617737297, "compression_ratio": 1.7366412213740459, + "no_speech_prob": 0.002280054846778512}, {"id": 305, "seek": 182584, "start": 1845.84, + "end": 1851.12, "text": " or rewrite this why not to a vector. 
But in the end, right, + it''s going to do that on its own", "tokens": [51364, 420, 28132, 341, 983, 406, + 281, 257, 8062, 13, 583, 294, 264, 917, 11, 558, 11, 309, 311, 516, 281, 360, 300, + 322, 1080, 1065, 51628], "temperature": 0.0, "avg_logprob": -0.158424617737297, + "compression_ratio": 1.7366412213740459, "no_speech_prob": 0.002280054846778512}, + {"id": 306, "seek": 185112, "start": 1851.1999999999998, "end": 1857.4399999999998, + "text": " on the back side of things. So when you''re trying to solve problems in + the enterprise, the key is", "tokens": [50368, 322, 264, 646, 1252, 295, 721, 13, + 407, 562, 291, 434, 1382, 281, 5039, 2740, 294, 264, 14132, 11, 264, 2141, 307, + 50680], "temperature": 0.0, "avg_logprob": -0.10358491758020913, "compression_ratio": + 1.7686832740213523, "no_speech_prob": 0.005133743863552809}, {"id": 307, "seek": + 185112, "start": 1857.4399999999998, "end": 1863.52, "text": " you need an interface + to write to. And it would be nice not to have to write code to connect to", "tokens": + [50680, 291, 643, 364, 9226, 281, 2464, 281, 13, 400, 309, 576, 312, 1481, 406, + 281, 362, 281, 2464, 3089, 281, 1745, 281, 50984], "temperature": 0.0, "avg_logprob": + -0.10358491758020913, "compression_ratio": 1.7686832740213523, "no_speech_prob": + 0.005133743863552809}, {"id": 308, "seek": 185112, "start": 1863.52, "end": 1868.3999999999999, + "text": " all these things, getting back to your question about architecture. And + so those are the key abstractions", "tokens": [50984, 439, 613, 721, 11, 1242, 646, + 281, 428, 1168, 466, 9482, 13, 400, 370, 729, 366, 264, 2141, 12649, 626, 51228], + "temperature": 0.0, "avg_logprob": -0.10358491758020913, "compression_ratio": 1.7686832740213523, + "no_speech_prob": 0.005133743863552809}, {"id": 309, "seek": 185112, "start": 1868.3999999999999, + "end": 1872.7199999999998, "text": " in Swirl. 
Swirl, you don''t have to write code + to connect to an endpoint that we already have a", "tokens": [51228, 294, 3926, + 1648, 13, 3926, 1648, 11, 291, 500, 380, 362, 281, 2464, 3089, 281, 1745, 281, 364, + 35795, 300, 321, 1217, 362, 257, 51444], "temperature": 0.0, "avg_logprob": -0.10358491758020913, + "compression_ratio": 1.7686832740213523, "no_speech_prob": 0.005133743863552809}, + {"id": 310, "seek": 185112, "start": 1872.7199999999998, "end": 1877.6799999999998, + "text": " connector to. You just write a search provider. All you need to know is + JSON path and maybe be able to", "tokens": [51444, 19127, 281, 13, 509, 445, 2464, + 257, 3164, 12398, 13, 1057, 291, 643, 281, 458, 307, 31828, 3100, 293, 1310, 312, + 1075, 281, 51692], "temperature": 0.0, "avg_logprob": -0.10358491758020913, "compression_ratio": + 1.7686832740213523, "no_speech_prob": 0.005133743863552809}, {"id": 311, "seek": + 187768, "start": 1877.68, "end": 1882.0800000000002, "text": " read a little developer + API dog. Right. And then that you''ll pretty much be able to get the search", "tokens": + [50364, 1401, 257, 707, 10754, 9362, 3000, 13, 1779, 13, 400, 550, 300, 291, 603, + 1238, 709, 312, 1075, 281, 483, 264, 3164, 50584], "temperature": 0.0, "avg_logprob": + -0.1342258297029089, "compression_ratio": 1.6609589041095891, "no_speech_prob": + 0.00152679649181664}, {"id": 312, "seek": 187768, "start": 1882.0800000000002, "end": + 1887.8400000000001, "text": " provider. If you need to write a connector. And of + course, here''s the punch line. 
Well, I think", "tokens": [50584, 12398, 13, 759, + 291, 643, 281, 2464, 257, 19127, 13, 400, 295, 1164, 11, 510, 311, 264, 8135, 1622, + 13, 1042, 11, 286, 519, 50872], "temperature": 0.0, "avg_logprob": -0.1342258297029089, + "compression_ratio": 1.6609589041095891, "no_speech_prob": 0.00152679649181664}, + {"id": 313, "seek": 187768, "start": 1887.8400000000001, "end": 1892.88, "text": + " it will probably take you a couple hours, depending on your skill at Python. But + on average,", "tokens": [50872, 309, 486, 1391, 747, 291, 257, 1916, 2496, 11, 5413, + 322, 428, 5389, 412, 15329, 13, 583, 322, 4274, 11, 51124], "temperature": 0.0, + "avg_logprob": -0.1342258297029089, "compression_ratio": 1.6609589041095891, "no_speech_prob": + 0.00152679649181664}, {"id": 314, "seek": 187768, "start": 1892.88, "end": 1898.0, + "text": " it shouldn''t take more than two hours because I can give you a prompt. + And we can teach chat GPT", "tokens": [51124, 309, 4659, 380, 747, 544, 813, 732, + 2496, 570, 286, 393, 976, 291, 257, 12391, 13, 400, 321, 393, 2924, 5081, 26039, + 51, 51380], "temperature": 0.0, "avg_logprob": -0.1342258297029089, "compression_ratio": + 1.6609589041095891, "no_speech_prob": 0.00152679649181664}, {"id": 315, "seek": + 187768, "start": 1898.0, "end": 1903.04, "text": " about the connector class. You + should be able to get that done in a couple hours just fixing up what", "tokens": + [51380, 466, 264, 19127, 1508, 13, 509, 820, 312, 1075, 281, 483, 300, 1096, 294, + 257, 1916, 2496, 445, 19442, 493, 437, 51632], "temperature": 0.0, "avg_logprob": + -0.1342258297029089, "compression_ratio": 1.6609589041095891, "no_speech_prob": + 0.00152679649181664}, {"id": 316, "seek": 190304, "start": 1903.04, "end": 1909.76, + "text": " it does. I found that about 70% of the time, it will produce a workable + connector. 
Just", "tokens": [50364, 309, 775, 13, 286, 1352, 300, 466, 5285, 4, + 295, 264, 565, 11, 309, 486, 5258, 257, 589, 712, 19127, 13, 1449, 50700], "temperature": + 0.0, "avg_logprob": -0.16571494280281712, "compression_ratio": 1.6418439716312057, + "no_speech_prob": 0.002055591205134988}, {"id": 317, "seek": 190304, "start": 1909.76, + "end": 1914.6399999999999, "text": " fast. The right prompt, right? Teach it how + to teach it our connector class. And give it the", "tokens": [50700, 2370, 13, 440, + 558, 12391, 11, 558, 30, 26816, 309, 577, 281, 2924, 309, 527, 19127, 1508, 13, + 400, 976, 309, 264, 50944], "temperature": 0.0, "avg_logprob": -0.16571494280281712, + "compression_ratio": 1.6418439716312057, "no_speech_prob": 0.002055591205134988}, + {"id": 318, "seek": 190304, "start": 1914.6399999999999, "end": 1919.6, "text": + " right prompt and bang, you have sort of almost working codes. Yeah, I think this + is the best part", "tokens": [50944, 558, 12391, 293, 8550, 11, 291, 362, 1333, + 295, 1920, 1364, 14211, 13, 865, 11, 286, 519, 341, 307, 264, 1151, 644, 51192], + "temperature": 0.0, "avg_logprob": -0.16571494280281712, "compression_ratio": 1.6418439716312057, + "no_speech_prob": 0.002055591205134988}, {"id": 319, "seek": 190304, "start": 1919.6, + "end": 1926.1599999999999, "text": " of interfaces like chat GPT systems like chat + GPT is that you can outsource this mundane work", "tokens": [51192, 295, 28416, + 411, 5081, 26039, 51, 3652, 411, 5081, 26039, 51, 307, 300, 291, 393, 14758, 2948, + 341, 43497, 589, 51520], "temperature": 0.0, "avg_logprob": -0.16571494280281712, + "compression_ratio": 1.6418439716312057, "no_speech_prob": 0.002055591205134988}, + {"id": 320, "seek": 190304, "start": 1926.1599999999999, "end": 1932.72, "text": + " and really focus on the actual thing. I was actually born away myself. 
And to + some extent,", "tokens": [51520, 293, 534, 1879, 322, 264, 3539, 551, 13, 286, 390, + 767, 4232, 1314, 2059, 13, 400, 281, 512, 8396, 11, 51848], "temperature": 0.0, + "avg_logprob": -0.16571494280281712, "compression_ratio": 1.6418439716312057, "no_speech_prob": + 0.002055591205134988}, {"id": 321, "seek": 193272, "start": 1932.72, "end": 1938.72, + "text": " scared you weeks ago when I was just saying, Hey, can you create a Python + code which we''ll talk to", "tokens": [50364, 5338, 291, 3259, 2057, 562, 286, 390, + 445, 1566, 11, 1911, 11, 393, 291, 1884, 257, 15329, 3089, 597, 321, 603, 751, 281, + 50664], "temperature": 0.0, "avg_logprob": -0.23262975592362253, "compression_ratio": + 1.5120967741935485, "no_speech_prob": 0.0064412672072649}, {"id": 322, "seek": 193272, + "start": 1939.44, "end": 1945.76, "text": " TomTom search API map search API. And + it did create it. It just asked me to insert like a", "tokens": [50700, 5041, 23442, + 3164, 9362, 4471, 3164, 9362, 13, 400, 309, 630, 1884, 309, 13, 467, 445, 2351, + 385, 281, 8969, 411, 257, 51016], "temperature": 0.0, "avg_logprob": -0.23262975592362253, + "compression_ratio": 1.5120967741935485, "no_speech_prob": 0.0064412672072649}, + {"id": 323, "seek": 193272, "start": 1946.88, "end": 1952.48, "text": " token. So + I just subscribe developer token. But I was really blown away. Like I would have", + "tokens": [51072, 14862, 13, 407, 286, 445, 3022, 10754, 14862, 13, 583, 286, 390, + 534, 16479, 1314, 13, 1743, 286, 576, 362, 51352], "temperature": 0.0, "avg_logprob": + -0.23262975592362253, "compression_ratio": 1.5120967741935485, "no_speech_prob": + 0.0064412672072649}, {"id": 324, "seek": 193272, "start": 1952.48, "end": 1958.16, + "text": " spent probably like several half a days here and there figuring things + out, right? 
If it wasn''t", "tokens": [51352, 4418, 1391, 411, 2940, 1922, 257, + 1708, 510, 293, 456, 15213, 721, 484, 11, 558, 30, 759, 309, 2067, 380, 51636], + "temperature": 0.0, "avg_logprob": -0.23262975592362253, "compression_ratio": 1.5120967741935485, + "no_speech_prob": 0.0064412672072649}, {"id": 325, "seek": 195816, "start": 1958.16, + "end": 1966.48, "text": " TomTom or some other API. But yeah, I think this is amazing. + And I think, well, I believe that you", "tokens": [50364, 5041, 23442, 420, 512, + 661, 9362, 13, 583, 1338, 11, 286, 519, 341, 307, 2243, 13, 400, 286, 519, 11, 731, + 11, 286, 1697, 300, 291, 50780], "temperature": 0.0, "avg_logprob": -0.14554504344337865, + "compression_ratio": 1.5076923076923077, "no_speech_prob": 0.005329399835318327}, + {"id": 326, "seek": 195816, "start": 1966.48, "end": 1972.96, "text": " guys are + documenting a lot. But if you if you haven''t yet, which just explained within documents,", + "tokens": [50780, 1074, 366, 42360, 257, 688, 13, 583, 498, 291, 498, 291, 2378, + 380, 1939, 11, 597, 445, 8825, 1951, 8512, 11, 51104], "temperature": 0.0, "avg_logprob": + -0.14554504344337865, "compression_ratio": 1.5076923076923077, "no_speech_prob": + 0.005329399835318327}, {"id": 327, "seek": 195816, "start": 1972.96, "end": 1981.44, + "text": " I think that could save a lot of time for developers. But I was wondering + maybe you would like to", "tokens": [51104, 286, 519, 300, 727, 3155, 257, 688, + 295, 565, 337, 8849, 13, 583, 286, 390, 6359, 1310, 291, 576, 411, 281, 51528], + "temperature": 0.0, "avg_logprob": -0.14554504344337865, "compression_ratio": 1.5076923076923077, + "no_speech_prob": 0.005329399835318327}, {"id": 328, "seek": 198144, "start": 1981.44, + "end": 1988.96, "text": " show us a demo of swirl and then we''ll dig deeper into + that. Absolutely. 
So let me share my screen.", "tokens": [50364, 855, 505, 257, + 10723, 295, 30310, 293, 550, 321, 603, 2528, 7731, 666, 300, 13, 7021, 13, 407, + 718, 385, 2073, 452, 2568, 13, 50740], "temperature": 0.0, "avg_logprob": -0.15176015286832242, + "compression_ratio": 1.510989010989011, "no_speech_prob": 0.006961983162909746}, + {"id": 329, "seek": 198144, "start": 1990.0800000000002, "end": 1999.1200000000001, + "text": " So hopefully you can see my screen. Yes. So this is swirl. Actually, I''ll + start here.", "tokens": [50796, 407, 4696, 291, 393, 536, 452, 2568, 13, 1079, 13, + 407, 341, 307, 30310, 13, 5135, 11, 286, 603, 722, 510, 13, 51248], "temperature": + 0.0, "avg_logprob": -0.15176015286832242, "compression_ratio": 1.510989010989011, + "no_speech_prob": 0.006961983162909746}, {"id": 330, "seek": 198144, "start": 2000.24, + "end": 2008.64, "text": " This is the swirl repo. Everything you need to get started + is here. The readme describes,", "tokens": [51304, 639, 307, 264, 30310, 49040, + 13, 5471, 291, 643, 281, 483, 1409, 307, 510, 13, 440, 1401, 1398, 15626, 11, 51724], + "temperature": 0.0, "avg_logprob": -0.15176015286832242, "compression_ratio": 1.510989010989011, + "no_speech_prob": 0.006961983162909746}, {"id": 331, "seek": 200864, "start": 2009.44, + "end": 2014.16, "text": " pretty much the two commands you need to run to get swirl + running if you have Docker.", "tokens": [50404, 1238, 709, 264, 732, 16901, 291, + 643, 281, 1190, 281, 483, 30310, 2614, 498, 291, 362, 33772, 13, 50640], "temperature": + 0.0, "avg_logprob": -0.13183361291885376, "compression_ratio": 1.6232394366197183, + "no_speech_prob": 0.01708962209522724}, {"id": 332, "seek": 200864, "start": 2014.16, + "end": 2019.76, "text": " There are more detailed instructions if you want to download. 
+ Everything that you''ll see here runs,", "tokens": [50640, 821, 366, 544, 9942, + 9415, 498, 291, 528, 281, 5484, 13, 5471, 300, 291, 603, 536, 510, 6676, 11, 50920], + "temperature": 0.0, "avg_logprob": -0.13183361291885376, "compression_ratio": 1.6232394366197183, + "no_speech_prob": 0.01708962209522724}, {"id": 333, "seek": 200864, "start": 2020.64, + "end": 2026.48, "text": " we have automated tests against everything. We have a + whole CICD environment. And support,", "tokens": [50964, 321, 362, 18473, 6921, + 1970, 1203, 13, 492, 362, 257, 1379, 383, 2532, 35, 2823, 13, 400, 1406, 11, 51256], + "temperature": 0.0, "avg_logprob": -0.13183361291885376, "compression_ratio": 1.6232394366197183, + "no_speech_prob": 0.01708962209522724}, {"id": 334, "seek": 200864, "start": 2026.48, + "end": 2030.4, "text": " I just want to be clear, is free. Please just join our + Slack channel and we''re happy to help", "tokens": [51256, 286, 445, 528, 281, 312, + 1850, 11, 307, 1737, 13, 2555, 445, 3917, 527, 37211, 2269, 293, 321, 434, 2055, + 281, 854, 51452], "temperature": 0.0, "avg_logprob": -0.13183361291885376, "compression_ratio": + 1.6232394366197183, "no_speech_prob": 0.01708962209522724}, {"id": 335, "seek": + 200864, "start": 2031.2800000000002, "end": 2038.5600000000002, "text": " anytime, + anywhere. Now, when you get swirl installed locally, as I have it, you''ll get this", + "tokens": [51496, 13038, 11, 4992, 13, 823, 11, 562, 291, 483, 30310, 8899, 16143, + 11, 382, 286, 362, 309, 11, 291, 603, 483, 341, 51860], "temperature": 0.0, "avg_logprob": + -0.13183361291885376, "compression_ratio": 1.6232394366197183, "no_speech_prob": + 0.01708962209522724}, {"id": 336, "seek": 203856, "start": 2038.56, "end": 2045.44, + "text": " nice homepage. But ultimately, what most people want to see is the UI. + So this is Spyglass. 
It''s an", "tokens": [50364, 1481, 31301, 13, 583, 6284, 11, + 437, 881, 561, 528, 281, 536, 307, 264, 15682, 13, 407, 341, 307, 35854, 28851, + 13, 467, 311, 364, 50708], "temperature": 0.0, "avg_logprob": -0.16613170888164255, + "compression_ratio": 1.5856573705179282, "no_speech_prob": 0.000386636151233688}, + {"id": 337, "seek": 203856, "start": 2045.44, "end": 2053.2799999999997, "text": + " open source UI produced by a sister company, KMW, actually long time friend Kevin + Waters. And he''s", "tokens": [50708, 1269, 4009, 15682, 7126, 538, 257, 4892, 2237, + 11, 591, 44, 54, 11, 767, 938, 565, 1277, 9954, 46743, 13, 400, 415, 311, 51100], + "temperature": 0.0, "avg_logprob": -0.16613170888164255, "compression_ratio": 1.5856573705179282, + "no_speech_prob": 0.000386636151233688}, {"id": 338, "seek": 203856, "start": 2053.2799999999997, + "end": 2060.64, "text": " a long time committer and a contributor to the open source + community as well. So Spyglass is a great", "tokens": [51100, 257, 938, 565, 800, + 3904, 293, 257, 42859, 281, 264, 1269, 4009, 1768, 382, 731, 13, 407, 35854, 28851, + 307, 257, 869, 51468], "temperature": 0.0, "avg_logprob": -0.16613170888164255, + "compression_ratio": 1.5856573705179282, "no_speech_prob": 0.000386636151233688}, + {"id": 339, "seek": 203856, "start": 2060.64, "end": 2065.84, "text": " starting + point for building user interfaces. It has a lot of the key building blocks. And + so here,", "tokens": [51468, 2891, 935, 337, 2390, 4195, 28416, 13, 467, 575, 257, + 688, 295, 264, 2141, 2390, 8474, 13, 400, 370, 510, 11, 51728], "temperature": 0.0, + "avg_logprob": -0.16613170888164255, "compression_ratio": 1.5856573705179282, "no_speech_prob": + 0.000386636151233688}, {"id": 340, "seek": 206584, "start": 2065.92, "end": 2074.1600000000003, + "text": " yesterday, I was thinking about how we wrote a document. You sent me a + document to use. 
And", "tokens": [50368, 5186, 11, 286, 390, 1953, 466, 577, 321, + 4114, 257, 4166, 13, 509, 2279, 385, 257, 4166, 281, 764, 13, 400, 50780], "temperature": + 0.0, "avg_logprob": -0.16641987363497415, "compression_ratio": 1.6120689655172413, + "no_speech_prob": 0.00249667395837605}, {"id": 341, "seek": 206584, "start": 2075.44, + "end": 2080.48, "text": " I admit today, I was seeing you''re going, where is that + document? And I actually said,", "tokens": [50844, 286, 9796, 965, 11, 286, 390, + 2577, 291, 434, 516, 11, 689, 307, 300, 4166, 30, 400, 286, 767, 848, 11, 51096], + "temperature": 0.0, "avg_logprob": -0.16641987363497415, "compression_ratio": 1.6120689655172413, + "no_speech_prob": 0.00249667395837605}, {"id": 342, "seek": 206584, "start": 2080.48, + "end": 2085.84, "text": " okay, it''s in Microsoft Outlook and I found it. But I + forgot that I could search because one of", "tokens": [51096, 1392, 11, 309, 311, + 294, 8116, 5925, 12747, 293, 286, 1352, 309, 13, 583, 286, 5298, 300, 286, 727, + 3164, 570, 472, 295, 51364], "temperature": 0.0, "avg_logprob": -0.16641987363497415, + "compression_ratio": 1.6120689655172413, "no_speech_prob": 0.00249667395837605}, + {"id": 343, "seek": 206584, "start": 2085.84, "end": 2090.08, "text": " the great + things about that''s coming out in swirl version two, which is going to drop next + month in", "tokens": [51364, 264, 869, 721, 466, 300, 311, 1348, 484, 294, 30310, + 3037, 732, 11, 597, 307, 516, 281, 3270, 958, 1618, 294, 51576], "temperature": + 0.0, "avg_logprob": -0.16641987363497415, "compression_ratio": 1.6120689655172413, + "no_speech_prob": 0.00249667395837605}, {"id": 344, "seek": 209008, "start": 2090.56, + "end": 2098.72, "text": " May is we have full M365 support. So you can do the OAuth + two dance. 
And I''ve actually searched through", "tokens": [50388, 1891, 307, 321, + 362, 1577, 376, 11309, 20, 1406, 13, 407, 291, 393, 360, 264, 48424, 2910, 732, + 4489, 13, 400, 286, 600, 767, 22961, 807, 50796], "temperature": 0.0, "avg_logprob": + -0.14870882969276578, "compression_ratio": 1.6244897959183673, "no_speech_prob": + 0.0049230908043682575}, {"id": 345, "seek": 209008, "start": 2098.72, "end": 2104.7999999999997, + "text": " my M365. And here''s my acceptance of your your meeting, actually, some + other references to it. And", "tokens": [50796, 452, 376, 11309, 20, 13, 400, 510, + 311, 452, 20351, 295, 428, 428, 3440, 11, 767, 11, 512, 661, 15400, 281, 309, 13, + 400, 51100], "temperature": 0.0, "avg_logprob": -0.14870882969276578, "compression_ratio": + 1.6244897959183673, "no_speech_prob": 0.0049230908043682575}, {"id": 346, "seek": + 209008, "start": 2104.7999999999997, "end": 2109.7599999999998, "text": " then here, + document number four, document shared with you, vector podcast. So if I had searched,", + "tokens": [51100, 550, 510, 11, 4166, 1230, 1451, 11, 4166, 5507, 365, 291, 11, + 8062, 7367, 13, 407, 498, 286, 632, 22961, 11, 51348], "temperature": 0.0, "avg_logprob": + -0.14870882969276578, "compression_ratio": 1.6244897959183673, "no_speech_prob": + 0.0049230908043682575}, {"id": 347, "seek": 209008, "start": 2109.7599999999998, + "end": 2114.16, "text": " it would have been the fourth hit above the fold. And + I actually haven''t done the relevancy tuning", "tokens": [51348, 309, 576, 362, + 668, 264, 6409, 2045, 3673, 264, 4860, 13, 400, 286, 767, 2378, 380, 1096, 264, + 25916, 6717, 15164, 51568], "temperature": 0.0, "avg_logprob": -0.14870882969276578, + "compression_ratio": 1.6244897959183673, "no_speech_prob": 0.0049230908043682575}, + {"id": 348, "seek": 211416, "start": 2114.3999999999996, "end": 2121.44, "text": + " on email or one drive yet. So it worked well enough to come up. 
But what I think + you can see again", "tokens": [50376, 322, 3796, 420, 472, 3332, 1939, 13, 407, + 309, 2732, 731, 1547, 281, 808, 493, 13, 583, 437, 286, 519, 291, 393, 536, 797, + 50728], "temperature": 0.0, "avg_logprob": -0.1231204754597432, "compression_ratio": + 1.529100529100529, "no_speech_prob": 0.0007240819395519793}, {"id": 349, "seek": + 211416, "start": 2122.48, "end": 2131.7599999999998, "text": " is the matches are + early in the document. It favors them. First of all, of course, it likes both", + "tokens": [50780, 307, 264, 10676, 366, 2440, 294, 264, 4166, 13, 467, 40554, 552, + 13, 2386, 295, 439, 11, 295, 1164, 11, 309, 5902, 1293, 51244], "temperature": 0.0, + "avg_logprob": -0.1231204754597432, "compression_ratio": 1.529100529100529, "no_speech_prob": + 0.0007240819395519793}, {"id": 350, "seek": 211416, "start": 2131.7599999999998, + "end": 2138.8799999999997, "text": " terms together, but it favors them without + with some exceptions. It favors the term that''s to", "tokens": [51244, 2115, 1214, + 11, 457, 309, 40554, 552, 1553, 365, 512, 22847, 13, 467, 40554, 264, 1433, 300, + 311, 281, 51600], "temperature": 0.0, "avg_logprob": -0.1231204754597432, "compression_ratio": + 1.529100529100529, "no_speech_prob": 0.0007240819395519793}, {"id": 351, "seek": + 213888, "start": 2138.96, "end": 2142.8, "text": " the left. And so you can see + there were a lot of results, but only a few really ranked.", "tokens": [50368, 264, + 1411, 13, 400, 370, 291, 393, 536, 456, 645, 257, 688, 295, 3542, 11, 457, 787, + 257, 1326, 534, 20197, 13, 50560], "temperature": 0.0, "avg_logprob": -0.12193725420081097, + "compression_ratio": 1.5899581589958158, "no_speech_prob": 0.0023759170435369015}, + {"id": 352, "seek": 213888, "start": 2144.2400000000002, "end": 2149.28, "text": + " Yeah, hi. And that''s the key, right? I scan it. I''m pretty much done now. 
And + I can say,", "tokens": [50632, 865, 11, 4879, 13, 400, 300, 311, 264, 2141, 11, + 558, 30, 286, 11049, 309, 13, 286, 478, 1238, 709, 1096, 586, 13, 400, 286, 393, + 584, 11, 50884], "temperature": 0.0, "avg_logprob": -0.12193725420081097, "compression_ratio": + 1.5899581589958158, "no_speech_prob": 0.0023759170435369015}, {"id": 353, "seek": + 213888, "start": 2149.28, "end": 2153.28, "text": " you know, I probably want to + go look in my email or my one drive. That''s more than likely where it is.", "tokens": + [50884, 291, 458, 11, 286, 1391, 528, 281, 352, 574, 294, 452, 3796, 420, 452, 472, + 3332, 13, 663, 311, 544, 813, 3700, 689, 309, 307, 13, 51084], "temperature": 0.0, + "avg_logprob": -0.12193725420081097, "compression_ratio": 1.5899581589958158, "no_speech_prob": + 0.0023759170435369015}, {"id": 354, "seek": 213888, "start": 2153.84, "end": 2161.84, + "text": " And I can go and do that, you know, very simply. Right, there we go. Now + I have in the top three. So", "tokens": [51112, 400, 286, 393, 352, 293, 360, 300, + 11, 291, 458, 11, 588, 2935, 13, 1779, 11, 456, 321, 352, 13, 823, 286, 362, 294, + 264, 1192, 1045, 13, 407, 51512], "temperature": 0.0, "avg_logprob": -0.12193725420081097, + "compression_ratio": 1.5899581589958158, "no_speech_prob": 0.0023759170435369015}, + {"id": 355, "seek": 216184, "start": 2162.0, "end": 2169.92, "text": " the power + of meta search though is more than just that. I will just let''s do that.", "tokens": + [50372, 264, 1347, 295, 19616, 3164, 1673, 307, 544, 813, 445, 300, 13, 286, 486, + 445, 718, 311, 360, 300, 13, 50768], "temperature": 0.0, "avg_logprob": -0.317051751273019, + "compression_ratio": 1.4821428571428572, "no_speech_prob": 0.00201908266171813}, + {"id": 356, "seek": 216184, "start": 2169.92, "end": 2177.36, "text": " Is it like + a Django app or? Yes. Yeah. Yeah. 
So the stack is the stack is", "tokens": [50768, + 1119, 309, 411, 257, 33464, 17150, 724, 420, 30, 1079, 13, 865, 13, 865, 13, 407, + 264, 8630, 307, 264, 8630, 307, 51140], "temperature": 0.0, "avg_logprob": -0.317051751273019, + "compression_ratio": 1.4821428571428572, "no_speech_prob": 0.00201908266171813}, + {"id": 357, "seek": 216184, "start": 2179.1200000000003, "end": 2186.0, "text": + " rabbit Django Python celery, although we''re not using too much celery and SQLite + or Postgres", "tokens": [51228, 19509, 33464, 17150, 15329, 37643, 11, 4878, 321, + 434, 406, 1228, 886, 709, 37643, 293, 19200, 642, 420, 10223, 45189, 51572], "temperature": + 0.0, "avg_logprob": -0.317051751273019, "compression_ratio": 1.4821428571428572, + "no_speech_prob": 0.00201908266171813}, {"id": 358, "seek": 218600, "start": 2186.88, + "end": 2193.04, "text": " with a lot of packages. We use NLTK, Spacey, Jason Path, + some others.", "tokens": [50408, 365, 257, 688, 295, 17401, 13, 492, 764, 426, 43, + 51, 42, 11, 8705, 88, 11, 11181, 21914, 11, 512, 2357, 13, 50716], "temperature": + 0.0, "avg_logprob": -0.2485576919887377, "compression_ratio": 1.4889867841409692, + "no_speech_prob": 0.0026296114083379507}, {"id": 359, "seek": 218600, "start": 2200.64, + "end": 2205.12, "text": " So now, so here I am running my electric vehicle company, + Colin Tesla.", "tokens": [51096, 407, 586, 11, 370, 510, 286, 669, 2614, 452, 5210, + 5864, 2237, 11, 29253, 13666, 13, 51320], "temperature": 0.0, "avg_logprob": -0.2485576919887377, + "compression_ratio": 1.4889867841409692, "no_speech_prob": 0.0026296114083379507}, + {"id": 360, "seek": 218600, "start": 2206.0, "end": 2209.84, "text": " This is an + earlier version of the software. 
So you''re going to see some, there''s one bug + here,", "tokens": [51364, 639, 307, 364, 3071, 3037, 295, 264, 4722, 13, 407, 291, + 434, 516, 281, 536, 512, 11, 456, 311, 472, 7426, 510, 11, 51556], "temperature": + 0.0, "avg_logprob": -0.2485576919887377, "compression_ratio": 1.4889867841409692, + "no_speech_prob": 0.0026296114083379507}, {"id": 361, "seek": 218600, "start": 2209.84, + "end": 2214.96, "text": " which is you''ll see the emphasis tags instead of having + them render, just because I reloaded the older", "tokens": [51556, 597, 307, 291, + 603, 536, 264, 16271, 18632, 2602, 295, 1419, 552, 15529, 11, 445, 570, 286, 25628, + 292, 264, 4906, 51812], "temperature": 0.0, "avg_logprob": -0.2485576919887377, + "compression_ratio": 1.4889867841409692, "no_speech_prob": 0.0026296114083379507}, + {"id": 362, "seek": 221496, "start": 2214.96, "end": 2223.36, "text": " version. + But here we can see a lot more sources than just, just, you know, enterprise sources.", + "tokens": [50364, 3037, 13, 583, 510, 321, 393, 536, 257, 688, 544, 7139, 813, 445, + 11, 445, 11, 291, 458, 11, 14132, 7139, 13, 50784], "temperature": 0.0, "avg_logprob": + -0.1451706832714295, "compression_ratio": 1.58130081300813, "no_speech_prob": 0.0006754028145223856}, + {"id": 363, "seek": 221496, "start": 2223.36, "end": 2229.2, "text": " And particular, + one of the things that the swirl adaptive query processor does is it rewrites", + "tokens": [50784, 400, 1729, 11, 472, 295, 264, 721, 300, 264, 30310, 27912, 14581, + 15321, 775, 307, 309, 319, 86, 30931, 51076], "temperature": 0.0, "avg_logprob": + -0.1451706832714295, "compression_ratio": 1.58130081300813, "no_speech_prob": 0.0006754028145223856}, + {"id": 364, "seek": 221496, "start": 2229.2, "end": 2237.04, "text": " this query. 
+ Most repositories will get the search electric vehicle Tesla with the company tag + removed.", "tokens": [51076, 341, 14581, 13, 4534, 22283, 2083, 486, 483, 264, 3164, + 5210, 5864, 13666, 365, 264, 2237, 6162, 7261, 13, 51468], "temperature": 0.0, "avg_logprob": + -0.1451706832714295, "compression_ratio": 1.58130081300813, "no_speech_prob": 0.0006754028145223856}, + {"id": 365, "seek": 221496, "start": 2237.04, "end": 2243.52, "text": " However, + the company funding database in BigQuery, which I just fixed, will actually only + get the", "tokens": [51468, 2908, 11, 264, 2237, 6137, 8149, 294, 43422, 11, 597, + 286, 445, 6806, 11, 486, 767, 787, 483, 264, 51792], "temperature": 0.0, "avg_logprob": + -0.1451706832714295, "compression_ratio": 1.58130081300813, "no_speech_prob": 0.0006754028145223856}, + {"id": 366, "seek": 224352, "start": 2243.52, "end": 2249.92, "text": " query Tesla. + So if we now look at the results, you know, we''ll see fairly traditional high quality", + "tokens": [50364, 14581, 13666, 13, 407, 498, 321, 586, 574, 412, 264, 3542, 11, + 291, 458, 11, 321, 603, 536, 6457, 5164, 1090, 3125, 50684], "temperature": 0.0, + "avg_logprob": -0.09529762268066407, "compression_ratio": 1.5502008032128514, "no_speech_prob": + 0.0003635244211181998}, {"id": 367, "seek": 224352, "start": 2249.92, "end": 2256.96, + "text": " content here about electric vehicles with Tesla favored early on. So for + example, it loves this", "tokens": [50684, 2701, 510, 466, 5210, 8948, 365, 13666, + 44420, 2440, 322, 13, 407, 337, 1365, 11, 309, 6752, 341, 51036], "temperature": + 0.0, "avg_logprob": -0.09529762268066407, "compression_ratio": 1.5502008032128514, + "no_speech_prob": 0.0003635244211181998}, {"id": 368, "seek": 224352, "start": 2256.96, + "end": 2262.8, "text": " hit with Tesla right at the beginning of the body. 
Most + of these, I think are pretty good hits.", "tokens": [51036, 2045, 365, 13666, 558, + 412, 264, 2863, 295, 264, 1772, 13, 4534, 295, 613, 11, 286, 519, 366, 1238, 665, + 8664, 13, 51328], "temperature": 0.0, "avg_logprob": -0.09529762268066407, "compression_ratio": + 1.5502008032128514, "no_speech_prob": 0.0003635244211181998}, {"id": 369, "seek": + 224352, "start": 2262.8, "end": 2268.16, "text": " And here, here''s a database + hit. This is from BigQuery. It''s a company funding record. So Tesla", "tokens": + [51328, 400, 510, 11, 510, 311, 257, 8149, 2045, 13, 639, 307, 490, 43422, 13, 467, + 311, 257, 2237, 6137, 2136, 13, 407, 13666, 51596], "temperature": 0.0, "avg_logprob": + -0.09529762268066407, "compression_ratio": 1.5502008032128514, "no_speech_prob": + 0.0003635244211181998}, {"id": 370, "seek": 226816, "start": 2268.16, "end": 2276.64, + "text": " Motors raised a large series C back in 2006. This is an old database of + funding records from Kaggle.", "tokens": [50364, 40118, 6005, 257, 2416, 2638, 383, + 646, 294, 14062, 13, 639, 307, 364, 1331, 8149, 295, 6137, 7724, 490, 48751, 22631, + 13, 50788], "temperature": 0.0, "avg_logprob": -0.14827345477210152, "compression_ratio": + 1.4333333333333333, "no_speech_prob": 0.018362978473305702}, {"id": 371, "seek": + 226816, "start": 2277.44, "end": 2284.7999999999997, "text": " Now, a couple of + things I want to point out on the fly swirl allows you to turn a database record + into", "tokens": [50828, 823, 11, 257, 1916, 295, 721, 286, 528, 281, 935, 484, + 322, 264, 3603, 30310, 4045, 291, 281, 1261, 257, 8149, 2136, 666, 51196], "temperature": + 0.0, "avg_logprob": -0.14827345477210152, "compression_ratio": 1.4333333333333333, + "no_speech_prob": 0.018362978473305702}, {"id": 372, "seek": 226816, "start": 2284.7999999999997, + "end": 2292.48, "text": " a sort of pseudo document. 
You can actually just write + this as a Python expression and use braces", "tokens": [51196, 257, 1333, 295, 35899, + 4166, 13, 509, 393, 767, 445, 2464, 341, 382, 257, 15329, 6114, 293, 764, 41537, + 51580], "temperature": 0.0, "avg_logprob": -0.14827345477210152, "compression_ratio": + 1.4333333333333333, "no_speech_prob": 0.018362978473305702}, {"id": 373, "seek": + 229248, "start": 2292.48, "end": 2299.44, "text": " to refer to the fields. And + I''ll show that in a second. In addition, though, swirl has a fixed", "tokens": + [50364, 281, 2864, 281, 264, 7909, 13, 400, 286, 603, 855, 300, 294, 257, 1150, + 13, 682, 4500, 11, 1673, 11, 30310, 575, 257, 6806, 50712], "temperature": 0.0, + "avg_logprob": -0.1593510945638021, "compression_ratio": 1.7035714285714285, "no_speech_prob": + 0.0703338086605072}, {"id": 374, "seek": 229248, "start": 2299.44, "end": 2305.68, + "text": " schema URL title, body date published date retrieved and author, but it + also has a payload field.", "tokens": [50712, 34078, 12905, 4876, 11, 1772, 4002, + 6572, 4002, 19817, 937, 293, 3793, 11, 457, 309, 611, 575, 257, 30918, 2519, 13, + 51024], "temperature": 0.0, "avg_logprob": -0.1593510945638021, "compression_ratio": + 1.7035714285714285, "no_speech_prob": 0.0703338086605072}, {"id": 375, "seek": 229248, + "start": 2305.68, "end": 2310.72, "text": " The payload field can hold anything. + And by default, anything that you don''t specify for mapping", "tokens": [51024, + 440, 30918, 2519, 393, 1797, 1340, 13, 400, 538, 7576, 11, 1340, 300, 291, 500, + 380, 16500, 337, 18350, 51276], "temperature": 0.0, "avg_logprob": -0.1593510945638021, + "compression_ratio": 1.7035714285714285, "no_speech_prob": 0.0703338086605072}, + {"id": 376, "seek": 229248, "start": 2310.72, "end": 2314.96, "text": " goes into + the payload. You can also say, please don''t put anything in the payload. 
So here,", + "tokens": [51276, 1709, 666, 264, 30918, 13, 509, 393, 611, 584, 11, 1767, 500, + 380, 829, 1340, 294, 264, 30918, 13, 407, 510, 11, 51488], "temperature": 0.0, "avg_logprob": + -0.1593510945638021, "compression_ratio": 1.7035714285714285, "no_speech_prob": + 0.0703338086605072}, {"id": 377, "seek": 229248, "start": 2314.96, "end": 2320.4, + "text": " the fields are also repeated, right? As data items so that if I want, + did I could extract those", "tokens": [51488, 264, 7909, 366, 611, 10477, 11, 558, + 30, 1018, 1412, 4754, 370, 300, 498, 286, 528, 11, 630, 286, 727, 8947, 729, 51760], + "temperature": 0.0, "avg_logprob": -0.1593510945638021, "compression_ratio": 1.7035714285714285, + "no_speech_prob": 0.0703338086605072}, {"id": 378, "seek": 232040, "start": 2320.4, + "end": 2325.76, "text": " individually? And the idea here is you have a normalized + record that reflects the sort of", "tokens": [50364, 16652, 30, 400, 264, 1558, + 510, 307, 291, 362, 257, 48704, 2136, 300, 18926, 264, 1333, 295, 50632], "temperature": + 0.0, "avg_logprob": -0.12956732717053643, "compression_ratio": 1.6305084745762712, + "no_speech_prob": 0.003615084569901228}, {"id": 379, "seek": 232040, "start": 2325.76, + "end": 2330.56, "text": " top relevancy items. So you know whether or not you should + go deeper. And then the payload will", "tokens": [50632, 1192, 25916, 6717, 4754, + 13, 407, 291, 458, 1968, 420, 406, 291, 820, 352, 7731, 13, 400, 550, 264, 30918, + 486, 50872], "temperature": 0.0, "avg_logprob": -0.12956732717053643, "compression_ratio": + 1.6305084745762712, "no_speech_prob": 0.003615084569901228}, {"id": 380, "seek": + 232040, "start": 2330.56, "end": 2335.36, "text": " have anything extra that you + might need to make that decision. 
So for example, if we look a little", "tokens": + [50872, 362, 1340, 2857, 300, 291, 1062, 643, 281, 652, 300, 3537, 13, 407, 337, + 1365, 11, 498, 321, 574, 257, 707, 51112], "temperature": 0.0, "avg_logprob": -0.12956732717053643, + "compression_ratio": 1.6305084745762712, "no_speech_prob": 0.003615084569901228}, + {"id": 381, "seek": 232040, "start": 2335.36, "end": 2341.92, "text": " further + down, here''s a result from NLresearch.com. Northern Light, the same company I started + while", "tokens": [51112, 3052, 760, 11, 510, 311, 257, 1874, 490, 426, 43, 49838, + 1178, 13, 1112, 13, 14335, 8279, 11, 264, 912, 2237, 286, 1409, 1339, 51440], "temperature": + 0.0, "avg_logprob": -0.12956732717053643, "compression_ratio": 1.6305084745762712, + "no_speech_prob": 0.003615084569901228}, {"id": 382, "seek": 232040, "start": 2341.92, + "end": 2346.08, "text": " working on search, or I learned a lot about search at + was really the first company I worked for.", "tokens": [51440, 1364, 322, 3164, + 11, 420, 286, 3264, 257, 688, 466, 3164, 412, 390, 534, 264, 700, 2237, 286, 2732, + 337, 13, 51648], "temperature": 0.0, "avg_logprob": -0.12956732717053643, "compression_ratio": + 1.6305084745762712, "no_speech_prob": 0.003615084569901228}, {"id": 383, "seek": + 234608, "start": 2346.08, "end": 2351.92, "text": " Still going strong. One of the + things they do is extract super high quality news from the web.", "tokens": [50364, + 8291, 516, 2068, 13, 1485, 295, 264, 721, 436, 360, 307, 8947, 1687, 1090, 3125, + 2583, 490, 264, 3670, 13, 50656], "temperature": 0.0, "avg_logprob": -0.12547557374351045, + "compression_ratio": 1.6793103448275861, "no_speech_prob": 0.0012360269902274013}, + {"id": 384, "seek": 234608, "start": 2351.92, "end": 2357.36, "text": " And they + field it and they classify it and can really do rich searching. 
So here is an article", + "tokens": [50656, 400, 436, 2519, 309, 293, 436, 33872, 309, 293, 393, 534, 360, + 4593, 10808, 13, 407, 510, 307, 364, 7222, 50928], "temperature": 0.0, "avg_logprob": + -0.12547557374351045, "compression_ratio": 1.6793103448275861, "no_speech_prob": + 0.0012360269902274013}, {"id": 385, "seek": 234608, "start": 2357.36, "end": 2363.2, + "text": " that they pulled together about, you know, basically it''s not so much + about the electric vehicle market.", "tokens": [50928, 300, 436, 7373, 1214, 466, + 11, 291, 458, 11, 1936, 309, 311, 406, 370, 709, 466, 264, 5210, 5864, 2142, 13, + 51220], "temperature": 0.0, "avg_logprob": -0.12547557374351045, "compression_ratio": + 1.6793103448275861, "no_speech_prob": 0.0012360269902274013}, {"id": 386, "seek": + 234608, "start": 2363.2, "end": 2366.4, "text": " It''s about Tesla. So it ranked + a little bit lower. In this case, there were some other ones that", "tokens": [51220, + 467, 311, 466, 13666, 13, 407, 309, 20197, 257, 707, 857, 3126, 13, 682, 341, 1389, + 11, 456, 645, 512, 661, 2306, 300, 51380], "temperature": 0.0, "avg_logprob": -0.12547557374351045, + "compression_ratio": 1.6793103448275861, "no_speech_prob": 0.0012360269902274013}, + {"id": 387, "seek": 234608, "start": 2366.4, "end": 2371.12, "text": " ranked higher. + They have some nice data that we like to capture and put in the payload as well.", + "tokens": [51380, 20197, 2946, 13, 814, 362, 512, 1481, 1412, 300, 321, 411, 281, + 7983, 293, 829, 294, 264, 30918, 382, 731, 13, 51616], "temperature": 0.0, "avg_logprob": + -0.12547557374351045, "compression_ratio": 1.6793103448275861, "no_speech_prob": + 0.0012360269902274013}, {"id": 388, "seek": 237112, "start": 2371.68, "end": 2378.64, + "text": " So this really is the core of swirl. And you say it has things like facets. 
+ For example,", "tokens": [50392, 407, 341, 534, 307, 264, 4965, 295, 30310, 13, + 400, 291, 584, 309, 575, 721, 411, 49752, 13, 1171, 1365, 11, 50740], "temperature": + 0.0, "avg_logprob": -0.2449361589219835, "compression_ratio": 1.5480769230769231, + "no_speech_prob": 0.00207709101960063}, {"id": 389, "seek": 237112, "start": 2378.64, + "end": 2386.16, "text": " we use U-track internally to track issues. So if I want + to, you know, just switch to those,", "tokens": [50740, 321, 764, 624, 12, 19466, + 19501, 281, 2837, 2663, 13, 407, 498, 286, 528, 281, 11, 291, 458, 11, 445, 3679, + 281, 729, 11, 51116], "temperature": 0.0, "avg_logprob": -0.2449361589219835, "compression_ratio": + 1.5480769230769231, "no_speech_prob": 0.00207709101960063}, {"id": 390, "seek": + 237112, "start": 2386.16, "end": 2394.3199999999997, "text": " it''ll bring just + those up. Oh, looks like I pooped on that one. Another thing you can do when you''re", + "tokens": [51116, 309, 603, 1565, 445, 729, 493, 13, 876, 11, 1542, 411, 286, 17153, + 292, 322, 300, 472, 13, 3996, 551, 291, 393, 360, 562, 291, 434, 51524], "temperature": + 0.0, "avg_logprob": -0.2449361589219835, "compression_ratio": 1.5480769230769231, + "no_speech_prob": 0.00207709101960063}, {"id": 391, "seek": 237112, "start": 2394.3199999999997, + "end": 2398.08, "text": " running, oops, looks like just a second.", "tokens": [51524, + 2614, 11, 34166, 11, 1542, 411, 445, 257, 1150, 13, 51712], "temperature": 0.0, + "avg_logprob": -0.2449361589219835, "compression_ratio": 1.5480769230769231, "no_speech_prob": + 0.00207709101960063}, {"id": 392, "seek": 240112, "start": 2401.3599999999997, "end": + 2413.2, "text": " Another thing you can do, we have the concept of mixers. 
Not for + drinks, but for results.", "tokens": [50376, 3996, 551, 291, 393, 360, 11, 321, + 362, 264, 3410, 295, 2890, 433, 13, 1726, 337, 12142, 11, 457, 337, 3542, 13, 50968], + "temperature": 0.0, "avg_logprob": -0.16944787940200495, "compression_ratio": 1.6403508771929824, + "no_speech_prob": 0.0036291347350925207}, {"id": 393, "seek": 240112, "start": 2413.7599999999998, + "end": 2418.64, "text": " You can mix the results up by default. We do it by relevancy, + but you can specify different", "tokens": [50996, 509, 393, 2890, 264, 3542, 493, + 538, 7576, 13, 492, 360, 309, 538, 25916, 6717, 11, 457, 291, 393, 16500, 819, 51240], + "temperature": 0.0, "avg_logprob": -0.16944787940200495, "compression_ratio": 1.6403508771929824, + "no_speech_prob": 0.0036291347350925207}, {"id": 394, "seek": 240112, "start": 2418.64, + "end": 2424.64, "text": " mixers. For example, the date mixer will focus on it. + Well, date sorted and it hides anything", "tokens": [51240, 2890, 433, 13, 1171, + 1365, 11, 264, 4002, 24063, 486, 1879, 322, 309, 13, 1042, 11, 4002, 25462, 293, + 309, 35953, 1340, 51540], "temperature": 0.0, "avg_logprob": -0.16944787940200495, + "compression_ratio": 1.6403508771929824, "no_speech_prob": 0.0036291347350925207}, + {"id": 395, "seek": 240112, "start": 2424.64, "end": 2429.68, "text": " that doesn''t + have a date published. The round robin mixer, on the other hand, still sort of honors", + "tokens": [51540, 300, 1177, 380, 362, 257, 4002, 6572, 13, 440, 3098, 3870, 259, + 24063, 11, 322, 264, 661, 1011, 11, 920, 1333, 295, 26884, 51792], "temperature": + 0.0, "avg_logprob": -0.16944787940200495, "compression_ratio": 1.6403508771929824, + "no_speech_prob": 0.0036291347350925207}, {"id": 396, "seek": 242968, "start": 2429.68, + "end": 2436.56, "text": " relevancy, but it just takes one from each result. 
So + you get a cross section of the results.", "tokens": [50364, 25916, 6717, 11, 457, + 309, 445, 2516, 472, 490, 1184, 1874, 13, 407, 291, 483, 257, 3278, 3541, 295, 264, + 3542, 13, 50708], "temperature": 0.0, "avg_logprob": -0.08485564407037229, "compression_ratio": + 1.740909090909091, "no_speech_prob": 0.003491308307275176}, {"id": 397, "seek": + 242968, "start": 2436.56, "end": 2442.3199999999997, "text": " So here, for example, + you know, just looking at the top five, one result, the best result from", "tokens": + [50708, 407, 510, 11, 337, 1365, 11, 291, 458, 11, 445, 1237, 412, 264, 1192, 1732, + 11, 472, 1874, 11, 264, 1151, 1874, 490, 50996], "temperature": 0.0, "avg_logprob": + -0.08485564407037229, "compression_ratio": 1.740909090909091, "no_speech_prob": + 0.003491308307275176}, {"id": 398, "seek": 242968, "start": 2442.3199999999997, + "end": 2448.3999999999996, "text": " each silo right here at the top. And of course, + here I''m arguing a little bit about the relevancy", "tokens": [50996, 1184, 3425, + 78, 558, 510, 412, 264, 1192, 13, 400, 295, 1164, 11, 510, 286, 478, 19697, 257, + 707, 857, 466, 264, 25916, 6717, 51300], "temperature": 0.0, "avg_logprob": -0.08485564407037229, + "compression_ratio": 1.740909090909091, "no_speech_prob": 0.003491308307275176}, + {"id": 399, "seek": 242968, "start": 2448.3999999999996, "end": 2452.16, "text": + " of this right in one of our support tickets. 
So you see everything kind of just + brought together", "tokens": [51300, 295, 341, 558, 294, 472, 295, 527, 1406, 12628, + 13, 407, 291, 536, 1203, 733, 295, 445, 3038, 1214, 51488], "temperature": 0.0, + "avg_logprob": -0.08485564407037229, "compression_ratio": 1.740909090909091, "no_speech_prob": + 0.003491308307275176}, {"id": 400, "seek": 245216, "start": 2452.24, "end": 2454.72, + "text": " for me and I can decide which things I might like to do.", "tokens": [50368, + 337, 385, 293, 286, 393, 4536, 597, 721, 286, 1062, 411, 281, 360, 13, 50492], "temperature": + 0.0, "avg_logprob": -0.1923727035522461, "compression_ratio": 1.715481171548117, + "no_speech_prob": 0.15056148171424866}, {"id": 401, "seek": 245216, "start": 2457.2, + "end": 2463.2799999999997, "text": " Yeah, maybe it could, I mean, I''m just commenting + as we go, but maybe visually it could also show", "tokens": [50616, 865, 11, 1310, + 309, 727, 11, 286, 914, 11, 286, 478, 445, 29590, 382, 321, 352, 11, 457, 1310, + 19622, 309, 727, 611, 855, 50920], "temperature": 0.0, "avg_logprob": -0.1923727035522461, + "compression_ratio": 1.715481171548117, "no_speech_prob": 0.15056148171424866}, + {"id": 402, "seek": 245216, "start": 2463.2799999999997, "end": 2467.3599999999997, + "text": " where this comes from, right? Because you do have on the left the sources.", + "tokens": [50920, 689, 341, 1487, 490, 11, 558, 30, 1436, 291, 360, 362, 322, 264, + 1411, 264, 7139, 13, 51124], "temperature": 0.0, "avg_logprob": -0.1923727035522461, + "compression_ratio": 1.715481171548117, "no_speech_prob": 0.15056148171424866}, + {"id": 403, "seek": 245216, "start": 2467.92, "end": 2472.3999999999996, "text": + " Yes. So could actually say this comes from here, this comes from there. 
But again,", + "tokens": [51152, 1079, 13, 407, 727, 767, 584, 341, 1487, 490, 510, 11, 341, 1487, + 490, 456, 13, 583, 797, 11, 51376], "temperature": 0.0, "avg_logprob": -0.1923727035522461, + "compression_ratio": 1.715481171548117, "no_speech_prob": 0.15056148171424866}, + {"id": 404, "seek": 245216, "start": 2472.3999999999996, "end": 2477.2, "text": + " the combined view is also excellent. It''s just if you needed to know, right? + If you need to know,", "tokens": [51376, 264, 9354, 1910, 307, 611, 7103, 13, 467, + 311, 445, 498, 291, 2978, 281, 458, 11, 558, 30, 759, 291, 643, 281, 458, 11, 51616], + "temperature": 0.0, "avg_logprob": -0.1923727035522461, "compression_ratio": 1.715481171548117, + "no_speech_prob": 0.15056148171424866}, {"id": 405, "seek": 247720, "start": 2477.2, + "end": 2484.24, "text": " where did I get this from, right? That''s right. So we + do, we do, we keep the source in the result", "tokens": [50364, 689, 630, 286, 483, + 341, 490, 11, 558, 30, 663, 311, 558, 13, 407, 321, 360, 11, 321, 360, 11, 321, + 1066, 264, 4009, 294, 264, 1874, 50716], "temperature": 0.0, "avg_logprob": -0.14283222016834077, + "compression_ratio": 1.6493506493506493, "no_speech_prob": 0.008677611127495766}, + {"id": 406, "seek": 247720, "start": 2484.24, "end": 2489.4399999999996, "text": + " here, along with whoever the source tells us the author is. However, in the in + this version,", "tokens": [50716, 510, 11, 2051, 365, 11387, 264, 4009, 5112, 505, + 264, 3793, 307, 13, 2908, 11, 294, 264, 294, 341, 3037, 11, 50976], "temperature": + 0.0, "avg_logprob": -0.14283222016834077, "compression_ratio": 1.6493506493506493, + "no_speech_prob": 0.008677611127495766}, {"id": 407, "seek": 247720, "start": 2489.4399999999996, + "end": 2494.72, "text": " we didn''t get to it. We like to report the original rank. 
+ So you should see IT news and I''ll", "tokens": [50976, 321, 994, 380, 483, 281, + 309, 13, 492, 411, 281, 2275, 264, 3380, 6181, 13, 407, 291, 820, 536, 6783, 2583, + 293, 286, 603, 51240], "temperature": 0.0, "avg_logprob": -0.14283222016834077, + "compression_ratio": 1.6493506493506493, "no_speech_prob": 0.008677611127495766}, + {"id": 408, "seek": 247720, "start": 2494.72, "end": 2500.72, "text": " research + one here. It''s the number one result in the two point O version. Actually, there''s + a new", "tokens": [51240, 2132, 472, 510, 13, 467, 311, 264, 1230, 472, 1874, 294, + 264, 732, 935, 422, 3037, 13, 5135, 11, 456, 311, 257, 777, 51540], "temperature": + 0.0, "avg_logprob": -0.14283222016834077, "compression_ratio": 1.6493506493506493, + "no_speech_prob": 0.008677611127495766}, {"id": 409, "seek": 250072, "start": 2500.72, + "end": 2506.3999999999996, "text": " version that''s coming out. I think we''re + going to just do a bug fix on this. The latest version 10.1,", "tokens": [50364, + 3037, 300, 311, 1348, 484, 13, 286, 519, 321, 434, 516, 281, 445, 360, 257, 7426, + 3191, 322, 341, 13, 440, 6792, 3037, 1266, 13, 16, 11, 50648], "temperature": 0.0, + "avg_logprob": -0.127621134375311, "compression_ratio": 1.6772334293948126, "no_speech_prob": + 0.015672797337174416}, {"id": 410, "seek": 250072, "start": 2507.12, "end": 2510.8799999999997, + "text": " which is in the repo now, fixes that in a couple other issues. So if you + just get the newest,", "tokens": [50684, 597, 307, 294, 264, 49040, 586, 11, 32539, + 300, 294, 257, 1916, 661, 2663, 13, 407, 498, 291, 445, 483, 264, 17569, 11, 50872], + "temperature": 0.0, "avg_logprob": -0.127621134375311, "compression_ratio": 1.6772334293948126, + "no_speech_prob": 0.015672797337174416}, {"id": 411, "seek": 250072, "start": 2510.8799999999997, + "end": 2515.2, "text": " you''ll be good. In two point O that we have a little bit + of a new treatment for this. 
I think", "tokens": [50872, 291, 603, 312, 665, 13, + 682, 732, 935, 422, 300, 321, 362, 257, 707, 857, 295, 257, 777, 5032, 337, 341, + 13, 286, 519, 51088], "temperature": 0.0, "avg_logprob": -0.127621134375311, "compression_ratio": + 1.6772334293948126, "no_speech_prob": 0.015672797337174416}, {"id": 412, "seek": + 250072, "start": 2515.2, "end": 2520.0, "text": " you''ll like a lot better. But + before I jump to that, you asked me a really important question.", "tokens": [51088, + 291, 603, 411, 257, 688, 1101, 13, 583, 949, 286, 3012, 281, 300, 11, 291, 2351, + 385, 257, 534, 1021, 1168, 13, 51328], "temperature": 0.0, "avg_logprob": -0.127621134375311, + "compression_ratio": 1.6772334293948126, "no_speech_prob": 0.015672797337174416}, + {"id": 413, "seek": 250072, "start": 2520.0, "end": 2525.7599999999998, "text": + " Right? So how this is great, this UI will evolve. It''s here so that you can show + the power, right?", "tokens": [51328, 1779, 30, 407, 577, 341, 307, 869, 11, 341, + 15682, 486, 16693, 13, 467, 311, 510, 370, 300, 291, 393, 855, 264, 1347, 11, 558, + 30, 51616], "temperature": 0.0, "avg_logprob": -0.127621134375311, "compression_ratio": + 1.6772334293948126, "no_speech_prob": 0.015672797337174416}, {"id": 414, "seek": + 250072, "start": 2525.7599999999998, "end": 2530.16, "text": " And we ship it integrated. + But from a developer perspective, none of this is super helpful, right?", "tokens": + [51616, 400, 321, 5374, 309, 10919, 13, 583, 490, 257, 10754, 4585, 11, 6022, 295, + 341, 307, 1687, 4961, 11, 558, 30, 51836], "temperature": 0.0, "avg_logprob": -0.127621134375311, + "compression_ratio": 1.6772334293948126, "no_speech_prob": 0.015672797337174416}, + {"id": 415, "seek": 253016, "start": 2530.16, "end": 2534.96, "text": " How do I + integrate this with an existing UI? So that''s what I really wanted to show you + next. 
So", "tokens": [50364, 1012, 360, 286, 13365, 341, 365, 364, 6741, 15682, + 30, 407, 300, 311, 437, 286, 534, 1415, 281, 855, 291, 958, 13, 407, 50604], "temperature": + 0.0, "avg_logprob": -0.10717669866418325, "compression_ratio": 1.6088709677419355, + "no_speech_prob": 0.0006971502443775535}, {"id": 416, "seek": 253016, "start": 2535.92, + "end": 2543.6, "text": " first, how do we connect to something? The answer is a + search provider definition. So this definition", "tokens": [50652, 700, 11, 577, + 360, 321, 1745, 281, 746, 30, 440, 1867, 307, 257, 3164, 12398, 7123, 13, 407, 341, + 7123, 51036], "temperature": 0.0, "avg_logprob": -0.10717669866418325, "compression_ratio": + 1.6088709677419355, "no_speech_prob": 0.0006971502443775535}, {"id": 417, "seek": + 253016, "start": 2543.6, "end": 2550.3999999999996, "text": " right here, this text + record, mostly JSON, but mostly just strings. This configures are out of the", "tokens": + [51036, 558, 510, 11, 341, 2487, 2136, 11, 5240, 31828, 11, 457, 5240, 445, 13985, + 13, 639, 6662, 1303, 366, 484, 295, 264, 51376], "temperature": 0.0, "avg_logprob": + -0.10717669866418325, "compression_ratio": 1.6088709677419355, "no_speech_prob": + 0.0006971502443775535}, {"id": 418, "seek": 253016, "start": 2550.3999999999996, + "end": 2557.7599999999998, "text": " box request get connector to query a search + provider, to query in particular this Google programmable", "tokens": [51376, 2424, + 5308, 483, 19127, 281, 14581, 257, 3164, 12398, 11, 281, 14581, 294, 1729, 341, + 3329, 37648, 712, 51744], "temperature": 0.0, "avg_logprob": -0.10717669866418325, + "compression_ratio": 1.6088709677419355, "no_speech_prob": 0.0006971502443775535}, + {"id": 419, "seek": 255776, "start": 2557.76, "end": 2561.92, "text": " search engine + that I put together. 
And actually, we ship with three of them preset and please + feel", "tokens": [50364, 3164, 2848, 300, 286, 829, 1214, 13, 400, 767, 11, 321, + 5374, 365, 1045, 295, 552, 32081, 293, 1767, 841, 50572], "temperature": 0.0, "avg_logprob": + -0.12575600147247315, "compression_ratio": 1.591503267973856, "no_speech_prob": + 0.005927330814301968}, {"id": 420, "seek": 255776, "start": 2561.92, "end": 2568.32, + "text": " free to share our keys. We''re happy we want to make sure that something + is working for everybody,", "tokens": [50572, 1737, 281, 2073, 527, 9317, 13, 492, + 434, 2055, 321, 528, 281, 652, 988, 300, 746, 307, 1364, 337, 2201, 11, 50892], + "temperature": 0.0, "avg_logprob": -0.12575600147247315, "compression_ratio": 1.591503267973856, + "no_speech_prob": 0.005927330814301968}, {"id": 421, "seek": 255776, "start": 2568.32, + "end": 2575.2000000000003, "text": " right? Out of the box. So further in this, + are the things you''d expect? You can configure this with", "tokens": [50892, 558, + 30, 5925, 295, 264, 2424, 13, 407, 3052, 294, 341, 11, 366, 264, 721, 291, 1116, + 2066, 30, 509, 393, 22162, 341, 365, 51236], "temperature": 0.0, "avg_logprob": + -0.12575600147247315, "compression_ratio": 1.591503267973856, "no_speech_prob": + 0.005927330814301968}, {"id": 422, "seek": 255776, "start": 2575.2000000000003, + "end": 2581.44, "text": " by providing a URL. You can construct the URL by pulling + in fields from the query mappings.", "tokens": [51236, 538, 6530, 257, 12905, 13, + 509, 393, 7690, 264, 12905, 538, 8407, 294, 7909, 490, 264, 14581, 463, 28968, 13, + 51548], "temperature": 0.0, "avg_logprob": -0.12575600147247315, "compression_ratio": + 1.591503267973856, "no_speech_prob": 0.005927330814301968}, {"id": 423, "seek": + 255776, "start": 2581.44, "end": 2586.1600000000003, "text": " So the only thing + that ever really changes in a Google PSC is the CX code. 
Everything else you can", + "tokens": [51548, 407, 264, 787, 551, 300, 1562, 534, 2962, 294, 257, 3329, 8168, + 34, 307, 264, 383, 55, 3089, 13, 5471, 1646, 291, 393, 51784], "temperature": 0.0, + "avg_logprob": -0.12575600147247315, "compression_ratio": 1.591503267973856, "no_speech_prob": + 0.005927330814301968}, {"id": 424, "seek": 258616, "start": 2586.16, "end": 2592.16, + "text": " just copy and paste. You can put dozens of them in. Also here are some + of the important system", "tokens": [50364, 445, 5055, 293, 9163, 13, 509, 393, + 829, 18431, 295, 552, 294, 13, 2743, 510, 366, 512, 295, 264, 1021, 1185, 50664], + "temperature": 0.0, "avg_logprob": -0.13341502256171647, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.00013744051102548838}, {"id": 425, "seek": 258616, "start": + 2592.16, "end": 2599.52, "text": " things that help the system work, help us process + this. So we have four different processing", "tokens": [50664, 721, 300, 854, 264, + 1185, 589, 11, 854, 505, 1399, 341, 13, 407, 321, 362, 1451, 819, 9007, 51032], + "temperature": 0.0, "avg_logprob": -0.13341502256171647, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.00013744051102548838}, {"id": 426, "seek": 258616, "start": + 2599.52, "end": 2605.2, "text": " pipelines built into Swirl. One is a prequery + that runs before Federation. 
And then there''s a", "tokens": [51032, 40168, 3094, + 666, 3926, 1648, 13, 1485, 307, 257, 659, 358, 2109, 300, 6676, 949, 27237, 13, + 400, 550, 456, 311, 257, 51316], "temperature": 0.0, "avg_logprob": -0.13341502256171647, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.00013744051102548838}, + {"id": 427, "seek": 258616, "start": 2606.3199999999997, "end": 2610.96, "text": + " query processing pipeline that runs for each connector or actually actually say + search provider,", "tokens": [51372, 14581, 9007, 15517, 300, 6676, 337, 1184, 19127, + 420, 767, 767, 584, 3164, 12398, 11, 51604], "temperature": 0.0, "avg_logprob": + -0.13341502256171647, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.00013744051102548838}, {"id": 428, "seek": 261096, "start": 2610.96, "end": 2617.04, + "text": " which is an instance, a configured instance of a connector. Then each + of those also has a", "tokens": [50364, 597, 307, 364, 5197, 11, 257, 30538, 5197, + 295, 257, 19127, 13, 1396, 1184, 295, 729, 611, 575, 257, 50668], "temperature": + 0.0, "avg_logprob": -0.09814239216742114, "compression_ratio": 1.7418181818181817, + "no_speech_prob": 0.0004108204448129982}, {"id": 429, "seek": 261096, "start": 2617.04, + "end": 2621.84, "text": " result processing pipeline, which transforms the results + from the source into our normalized", "tokens": [50668, 1874, 9007, 15517, 11, 597, + 35592, 264, 3542, 490, 264, 4009, 666, 527, 48704, 50908], "temperature": 0.0, "avg_logprob": + -0.09814239216742114, "compression_ratio": 1.7418181818181817, "no_speech_prob": + 0.0004108204448129982}, {"id": 430, "seek": 261096, "start": 2621.84, "end": 2626.16, + "text": " format. 
And then there''s a post result processing that does things like + relevancy ranking where you", "tokens": [50908, 7877, 13, 400, 550, 456, 311, 257, + 2183, 1874, 9007, 300, 775, 721, 411, 25916, 6717, 17833, 689, 291, 51124], "temperature": + 0.0, "avg_logprob": -0.09814239216742114, "compression_ratio": 1.7418181818181817, + "no_speech_prob": 0.0004108204448129982}, {"id": 431, "seek": 261096, "start": 2626.16, + "end": 2631.44, "text": " want all of the data. And they''re all different. By the + way, there''s an object model behind Swirl. So", "tokens": [51124, 528, 439, 295, + 264, 1412, 13, 400, 436, 434, 439, 819, 13, 3146, 264, 636, 11, 456, 311, 364, 2657, + 2316, 2261, 3926, 1648, 13, 407, 51388], "temperature": 0.0, "avg_logprob": -0.09814239216742114, + "compression_ratio": 1.7418181818181817, "no_speech_prob": 0.0004108204448129982}, + {"id": 432, "seek": 261096, "start": 2631.44, "end": 2636.08, "text": " grading + these things is really simple. There are different base classes for those and they + set", "tokens": [51388, 35540, 613, 721, 307, 534, 2199, 13, 821, 366, 819, 3096, + 5359, 337, 729, 293, 436, 992, 51620], "temperature": 0.0, "avg_logprob": -0.09814239216742114, + "compression_ratio": 1.7418181818181817, "no_speech_prob": 0.0004108204448129982}, + {"id": 433, "seek": 263608, "start": 2636.16, "end": 2641.92, "text": " you up with + everything you need. So essentially you come in, you have a Python model or I should + say", "tokens": [50368, 291, 493, 365, 1203, 291, 643, 13, 407, 4476, 291, 808, + 294, 11, 291, 362, 257, 15329, 2316, 420, 286, 820, 584, 50656], "temperature": + 0.0, "avg_logprob": -0.12926271646329673, "compression_ratio": 1.5515873015873016, + "no_speech_prob": 0.002594432793557644}, {"id": 434, "seek": 263608, "start": 2641.92, + "end": 2645.92, "text": " a Django model object to operate on. 
All you have to do + is change it and exit and you''re done.", "tokens": [50656, 257, 33464, 17150, 2316, + 2657, 281, 9651, 322, 13, 1057, 291, 362, 281, 360, 307, 1319, 309, 293, 11043, + 293, 291, 434, 1096, 13, 50856], "temperature": 0.0, "avg_logprob": -0.12926271646329673, + "compression_ratio": 1.5515873015873016, "no_speech_prob": 0.002594432793557644}, + {"id": 435, "seek": 263608, "start": 2646.48, "end": 2654.3199999999997, "text": + " Simple, simple. Also, we map out the different query capabilities of each provider + in the query", "tokens": [50884, 21532, 11, 2199, 13, 2743, 11, 321, 4471, 484, + 264, 819, 14581, 10862, 295, 1184, 12398, 294, 264, 14581, 51276], "temperature": + 0.0, "avg_logprob": -0.12926271646329673, "compression_ratio": 1.5515873015873016, + "no_speech_prob": 0.002594432793557644}, {"id": 436, "seek": 263608, "start": 2654.3199999999997, + "end": 2660.56, "text": " mappings. So how do you tell a given endpoint to sort + by date? This is how you add this to the URL.", "tokens": [51276, 463, 28968, 13, + 407, 577, 360, 291, 980, 257, 2212, 35795, 281, 1333, 538, 4002, 30, 639, 307, 577, + 291, 909, 341, 281, 264, 12905, 13, 51588], "temperature": 0.0, "avg_logprob": -0.12926271646329673, + "compression_ratio": 1.5515873015873016, "no_speech_prob": 0.002594432793557644}, + {"id": 437, "seek": 266056, "start": 2660.56, "end": 2668.32, "text": " How do you + page the results? This is how. Result index is a Swirl capability where we can provide + you", "tokens": [50364, 1012, 360, 291, 3028, 264, 3542, 30, 639, 307, 577, 13, + 5015, 723, 8186, 307, 257, 3926, 1648, 13759, 689, 321, 393, 2893, 291, 50752], + "temperature": 0.0, "avg_logprob": -0.16142276417125356, "compression_ratio": 1.7841409691629957, + "no_speech_prob": 0.0010711950017139316}, {"id": 438, "seek": 266056, "start": 2668.32, + "end": 2674.48, "text": " with the index number. You can also use result page. So + the count or the page that you want. 
And here''s", "tokens": [50752, 365, 264, 8186, + 1230, 13, 509, 393, 611, 764, 1874, 3028, 13, 407, 264, 1207, 420, 264, 3028, 300, + 291, 528, 13, 400, 510, 311, 51060], "temperature": 0.0, "avg_logprob": -0.16142276417125356, + "compression_ratio": 1.7841409691629957, "no_speech_prob": 0.0010711950017139316}, + {"id": 439, "seek": 266056, "start": 2674.48, "end": 2680.88, "text": " an important + one too, the not character. So do the silo support not as a term. This one doesn''t.", + "tokens": [51060, 364, 1021, 472, 886, 11, 264, 406, 2517, 13, 407, 360, 264, 3425, + 78, 1406, 406, 382, 257, 1433, 13, 639, 472, 1177, 380, 13, 51380], "temperature": + 0.0, "avg_logprob": -0.16142276417125356, "compression_ratio": 1.7841409691629957, + "no_speech_prob": 0.0010711950017139316}, {"id": 440, "seek": 266056, "start": 2681.44, + "end": 2687.92, "text": " It does not support not as a term. But it supports the + not character. So as an example, now if I go to", "tokens": [51408, 467, 775, 406, + 1406, 406, 382, 257, 1433, 13, 583, 309, 9346, 264, 406, 2517, 13, 407, 382, 364, + 1365, 11, 586, 498, 286, 352, 281, 51732], "temperature": 0.0, "avg_logprob": -0.16142276417125356, + "compression_ratio": 1.7841409691629957, "no_speech_prob": 0.0010711950017139316}, + {"id": 441, "seek": 268792, "start": 2687.92, "end": 2692.2400000000002, "text": + " the search object, I can run a search. 
I''ll run it for knowledge management.", + "tokens": [50364, 264, 3164, 2657, 11, 286, 393, 1190, 257, 3164, 13, 286, 603, + 1190, 309, 337, 3601, 4592, 13, 50580], "temperature": 0.0, "avg_logprob": -0.20566890003916982, + "compression_ratio": 1.5137614678899083, "no_speech_prob": 0.001280047930777073}, + {"id": 442, "seek": 268792, "start": 2695.92, "end": 2699.44, "text": " So actually, + I''ll just let that one run for a second.", "tokens": [50764, 407, 767, 11, 286, + 603, 445, 718, 300, 472, 1190, 337, 257, 1150, 13, 50940], "temperature": 0.0, "avg_logprob": + -0.20566890003916982, "compression_ratio": 1.5137614678899083, "no_speech_prob": + 0.001280047930777073}, {"id": 443, "seek": 268792, "start": 2703.36, "end": 2709.6800000000003, + "text": " There we go. I got my chat. I have the wrong chat GPT API key. But that''s + okay. Everybody knows", "tokens": [51136, 821, 321, 352, 13, 286, 658, 452, 5081, + 13, 286, 362, 264, 2085, 5081, 26039, 51, 9362, 2141, 13, 583, 300, 311, 1392, 13, + 7646, 3255, 51452], "temperature": 0.0, "avg_logprob": -0.20566890003916982, "compression_ratio": + 1.5137614678899083, "no_speech_prob": 0.001280047930777073}, {"id": 444, "seek": + 268792, "start": 2709.6800000000003, "end": 2716.4, "text": " what we would say + about this stuff. So actually the query I really want to do is Elon Musk not Twitter.", + "tokens": [51452, 437, 321, 576, 584, 466, 341, 1507, 13, 407, 767, 264, 14581, + 286, 534, 528, 281, 360, 307, 28498, 26019, 406, 5794, 13, 51788], "temperature": + 0.0, "avg_logprob": -0.20566890003916982, "compression_ratio": 1.5137614678899083, + "no_speech_prob": 0.001280047930777073}, {"id": 445, "seek": 271640, "start": 2716.48, + "end": 2722.08, "text": " So perfectly legitimate query, right? 
What''s going on + in Elon Musk''s world that''s not related to Twitter?", "tokens": [50368, 407, 6239, + 17956, 14581, 11, 558, 30, 708, 311, 516, 322, 294, 28498, 26019, 311, 1002, 300, + 311, 406, 4077, 281, 5794, 30, 50648], "temperature": 0.0, "avg_logprob": -0.2309645784312281, + "compression_ratio": 1.5875912408759123, "no_speech_prob": 0.0017051455797627568}, + {"id": 446, "seek": 271640, "start": 2722.4, "end": 2728.96, "text": " Now here''s + the thing. Google PSC will not understand that query. Everybody says what Google + doesn''t understand,", "tokens": [50664, 823, 510, 311, 264, 551, 13, 3329, 8168, + 34, 486, 406, 1223, 300, 14581, 13, 7646, 1619, 437, 3329, 1177, 380, 1223, 11, + 50992], "temperature": 0.0, "avg_logprob": -0.2309645784312281, "compression_ratio": + 1.5875912408759123, "no_speech_prob": 0.0017051455797627568}, {"id": 447, "seek": + 271640, "start": 2728.96, "end": 2734.48, "text": " not no web Google does, but + Google programmable search engine does not honor or not. 
And in fact,", "tokens": + [50992, 406, 572, 3670, 3329, 775, 11, 457, 3329, 37648, 712, 3164, 2848, 775, 406, + 5968, 420, 406, 13, 400, 294, 1186, 11, 51268], "temperature": 0.0, "avg_logprob": + -0.2309645784312281, "compression_ratio": 1.5875912408759123, "no_speech_prob": + 0.0017051455797627568}, {"id": 448, "seek": 271640, "start": 2734.48, "end": 2736.96, + "text": " just to prove it, PSC.google.com.", "tokens": [51268, 445, 281, 7081, + 309, 11, 8168, 34, 13, 1571, 3127, 13, 1112, 13, 51392], "temperature": 0.0, "avg_logprob": + -0.2309645784312281, "compression_ratio": 1.5875912408759123, "no_speech_prob": + 0.0017051455797627568}, {"id": 449, "seek": 271640, "start": 2739.12, "end": 2744.56, + "text": " By the way, before I talk to you, I didn''t know of this system existence + myself, PSC.", "tokens": [51500, 3146, 264, 636, 11, 949, 286, 751, 281, 291, 11, + 286, 994, 380, 458, 295, 341, 1185, 9123, 2059, 11, 8168, 34, 13, 51772], "temperature": + 0.0, "avg_logprob": -0.2309645784312281, "compression_ratio": 1.5875912408759123, + "no_speech_prob": 0.0017051455797627568}, {"id": 450, "seek": 274456, "start": 2745.44, + "end": 2750.56, "text": " Oh my gosh. For web slicing up the web, it is incredible. + I mean, it takes two seconds to build it,", "tokens": [50408, 876, 452, 6502, 13, + 1171, 3670, 46586, 493, 264, 3670, 11, 309, 307, 4651, 13, 286, 914, 11, 309, 2516, + 732, 3949, 281, 1322, 309, 11, 50664], "temperature": 0.0, "avg_logprob": -0.14057522349887425, + "compression_ratio": 1.5296442687747036, "no_speech_prob": 0.003534645540639758}, + {"id": 451, "seek": 274456, "start": 2750.56, "end": 2756.96, "text": " right? So + and you just give it examples. So here''s the thing. 
You can go, here''s the public + URL for", "tokens": [50664, 558, 30, 407, 293, 291, 445, 976, 309, 5110, 13, 407, + 510, 311, 264, 551, 13, 509, 393, 352, 11, 510, 311, 264, 1908, 12905, 337, 50984], + "temperature": 0.0, "avg_logprob": -0.14057522349887425, "compression_ratio": 1.5296442687747036, + "no_speech_prob": 0.003534645540639758}, {"id": 452, "seek": 274456, "start": 2756.96, + "end": 2761.92, "text": " one of the programmable search engines I put in and I''ll + do the same exact query. Elon Musk.", "tokens": [50984, 472, 295, 264, 37648, 712, + 3164, 12982, 286, 829, 294, 293, 286, 603, 360, 264, 912, 1900, 14581, 13, 28498, + 26019, 13, 51232], "temperature": 0.0, "avg_logprob": -0.14057522349887425, "compression_ratio": + 1.5296442687747036, "no_speech_prob": 0.003534645540639758}, {"id": 453, "seek": + 274456, "start": 2764.88, "end": 2772.24, "text": " Okay, so the very first result + has Twitter in it, right? It''s right there. In fact, the second", "tokens": [51380, + 1033, 11, 370, 264, 588, 700, 1874, 575, 5794, 294, 309, 11, 558, 30, 467, 311, + 558, 456, 13, 682, 1186, 11, 264, 1150, 51748], "temperature": 0.0, "avg_logprob": + -0.14057522349887425, "compression_ratio": 1.5296442687747036, "no_speech_prob": + 0.003534645540639758}, {"id": 454, "seek": 277224, "start": 2772.24, "end": 2776.9599999999996, + "text": " result also has Twitter. Google programmable search engine is not going + through the full Google", "tokens": [50364, 1874, 611, 575, 5794, 13, 3329, 37648, + 712, 3164, 2848, 307, 406, 516, 807, 264, 1577, 3329, 50600], "temperature": 0.0, + "avg_logprob": -0.13128133920522836, "compression_ratio": 1.651639344262295, "no_speech_prob": + 0.0016597436042502522}, {"id": 455, "seek": 277224, "start": 2776.9599999999996, + "end": 2784.8799999999997, "text": " parser. And it does not honor the not. However, + if I say this, it works perfectly. 
The plus-minus syntax", "tokens": [50600, 21156, + 260, 13, 400, 309, 775, 406, 5968, 264, 406, 13, 2908, 11, 498, 286, 584, 341, 11, + 309, 1985, 6239, 13, 440, 1804, 12, 2367, 301, 28431, 50996], "temperature": 0.0, + "avg_logprob": -0.13128133920522836, "compression_ratio": 1.651639344262295, "no_speech_prob": + 0.0016597436042502522}, {"id": 456, "seek": 277224, "start": 2784.8799999999997, + "end": 2793.8399999999997, "text": " works. Okay. So now when we look at this definition, + it says the not character for Google PSC is minus.", "tokens": [50996, 1985, 13, + 1033, 13, 407, 586, 562, 321, 574, 412, 341, 7123, 11, 309, 1619, 264, 406, 2517, + 337, 3329, 8168, 34, 307, 3175, 13, 51444], "temperature": 0.0, "avg_logprob": -0.13128133920522836, + "compression_ratio": 1.651639344262295, "no_speech_prob": 0.0016597436042502522}, + {"id": 457, "seek": 277224, "start": 2794.56, "end": 2799.04, "text": " So now if + we look at the search I ran, let''s look at the search object. It''s another object + inside", "tokens": [51480, 407, 586, 498, 321, 574, 412, 264, 3164, 286, 5872, 11, + 718, 311, 574, 412, 264, 3164, 2657, 13, 467, 311, 1071, 2657, 1854, 51704], "temperature": + 0.0, "avg_logprob": -0.13128133920522836, "compression_ratio": 1.651639344262295, + "no_speech_prob": 0.0016597436042502522}, {"id": 458, "seek": 279904, "start": 2799.04, + "end": 2804.08, "text": " a swirl. Why is there a search object? Because in MetaSearch, + it takes a few seconds to get the", "tokens": [50364, 257, 30310, 13, 1545, 307, + 456, 257, 3164, 2657, 30, 1436, 294, 6377, 64, 10637, 1178, 11, 309, 2516, 257, + 1326, 3949, 281, 483, 264, 50616], "temperature": 0.0, "avg_logprob": -0.14475833303560087, + "compression_ratio": 1.7318840579710144, "no_speech_prob": 0.02041260525584221}, + {"id": 459, "seek": 279904, "start": 2804.08, "end": 2810.08, "text": " results + from everything. And you may want to look at that data over and over again. 
In fact, + one of", "tokens": [50616, 3542, 490, 1203, 13, 400, 291, 815, 528, 281, 574, 412, + 300, 1412, 670, 293, 670, 797, 13, 682, 1186, 11, 472, 295, 50916], "temperature": + 0.0, "avg_logprob": -0.14475833303560087, "compression_ratio": 1.7318840579710144, + "no_speech_prob": 0.02041260525584221}, {"id": 460, "seek": 279904, "start": 2810.08, + "end": 2813.92, "text": " the cool things you can do is swirl is you can set the + subscribe function, swirl well, then", "tokens": [50916, 264, 1627, 721, 291, 393, + 360, 307, 30310, 307, 291, 393, 992, 264, 3022, 2445, 11, 30310, 731, 11, 550, 51108], + "temperature": 0.0, "avg_logprob": -0.14475833303560087, "compression_ratio": 1.7318840579710144, + "no_speech_prob": 0.02041260525584221}, {"id": 461, "seek": 279904, "start": 2813.92, + "end": 2818.56, "text": " recheck for new results every so often and update and + mark them new and you can even get an update", "tokens": [51108, 319, 15723, 337, + 777, 3542, 633, 370, 2049, 293, 5623, 293, 1491, 552, 777, 293, 291, 393, 754, 483, + 364, 5623, 51340], "temperature": 0.0, "avg_logprob": -0.14475833303560087, "compression_ratio": + 1.7318840579710144, "no_speech_prob": 0.02041260525584221}, {"id": 462, "seek": + 279904, "start": 2818.56, "end": 2822.48, "text": " things like that, right? So + alert mode if you will or subscribe mode as we like to call it.", "tokens": [51340, + 721, 411, 300, 11, 558, 30, 407, 9615, 4391, 498, 291, 486, 420, 3022, 4391, 382, + 321, 411, 281, 818, 309, 13, 51536], "temperature": 0.0, "avg_logprob": -0.14475833303560087, + "compression_ratio": 1.7318840579710144, "no_speech_prob": 0.02041260525584221}, + {"id": 463, "seek": 282248, "start": 2823.36, "end": 2830.2400000000002, "text": + " So let''s take a look at the search object. 
What this object contains for starters, + a block of messages", "tokens": [50408, 407, 718, 311, 747, 257, 574, 412, 264, + 3164, 2657, 13, 708, 341, 2657, 8306, 337, 35131, 11, 257, 3461, 295, 7897, 50752], + "temperature": 0.0, "avg_logprob": -0.0993785759837357, "compression_ratio": 1.5833333333333333, + "no_speech_prob": 0.0053709992207586765}, {"id": 464, "seek": 282248, "start": 2830.2400000000002, + "end": 2836.16, "text": " that explain exactly what was done to the query. And here + you can see the adaptive query processor", "tokens": [50752, 300, 2903, 2293, 437, + 390, 1096, 281, 264, 14581, 13, 400, 510, 291, 393, 536, 264, 27912, 14581, 15321, + 51048], "temperature": 0.0, "avg_logprob": -0.0993785759837357, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 0.0053709992207586765}, {"id": 465, "seek": + 282248, "start": 2836.16, "end": 2843.12, "text": " rewrote the queries for Google + PSC from Elon Musk not Twitter to Elon Musk minus Twitter. So this", "tokens": [51048, + 319, 7449, 1370, 264, 24109, 337, 3329, 8168, 34, 490, 28498, 26019, 406, 5794, + 281, 28498, 26019, 3175, 5794, 13, 407, 341, 51396], "temperature": 0.0, "avg_logprob": + -0.0993785759837357, "compression_ratio": 1.5833333333333333, "no_speech_prob": + 0.0053709992207586765}, {"id": 466, "seek": 282248, "start": 2843.12, "end": 2847.6, + "text": " way we guarantee you''re going to get the right result, not a bad result. + Oh, and also our relevancy", "tokens": [51396, 636, 321, 10815, 291, 434, 516, 281, + 483, 264, 558, 1874, 11, 406, 257, 1578, 1874, 13, 876, 11, 293, 611, 527, 25916, + 6717, 51620], "temperature": 0.0, "avg_logprob": -0.0993785759837357, "compression_ratio": + 1.5833333333333333, "no_speech_prob": 0.0053709992207586765}, {"id": 467, "seek": + 284760, "start": 2847.6, "end": 2852.4, "text": " model checks. 
If you have a nodded + term in your query and it finds it in relevancy, we drop it", "tokens": [50364, + 2316, 13834, 13, 759, 291, 362, 257, 15224, 9207, 1433, 294, 428, 14581, 293, 309, + 10704, 309, 294, 25916, 6717, 11, 321, 3270, 309, 50604], "temperature": 0.0, "avg_logprob": + -0.13176261054144967, "compression_ratio": 1.606060606060606, "no_speech_prob": + 0.011119586415588856}, {"id": 468, "seek": 284760, "start": 2852.4, "end": 2856.48, + "text": " to the bottom and say we actually put a special flag on it. We say this + was a bad result.", "tokens": [50604, 281, 264, 2767, 293, 584, 321, 767, 829, 257, + 2121, 7166, 322, 309, 13, 492, 584, 341, 390, 257, 1578, 1874, 13, 50808], "temperature": + 0.0, "avg_logprob": -0.13176261054144967, "compression_ratio": 1.606060606060606, + "no_speech_prob": 0.011119586415588856}, {"id": 469, "seek": 284760, "start": 2858.4, + "end": 2863.2799999999997, "text": " Most of the others though, frankly, just either + didn''t know they don''t handle not. You track", "tokens": [50904, 4534, 295, 264, + 2357, 1673, 11, 11939, 11, 445, 2139, 994, 380, 458, 436, 500, 380, 4813, 406, 13, + 509, 2837, 51148], "temperature": 0.0, "avg_logprob": -0.13176261054144967, "compression_ratio": + 1.606060606060606, "no_speech_prob": 0.011119586415588856}, {"id": 470, "seek": + 284760, "start": 2863.2799999999997, "end": 2867.92, "text": " doesn''t handle a + knot at all. So we removed it completely and just say, go and give us what you''ve", + "tokens": [51148, 1177, 380, 4813, 257, 16966, 412, 439, 13, 407, 321, 7261, 309, + 2584, 293, 445, 584, 11, 352, 293, 976, 505, 437, 291, 600, 51380], "temperature": + 0.0, "avg_logprob": -0.13176261054144967, "compression_ratio": 1.606060606060606, + "no_speech_prob": 0.011119586415588856}, {"id": 471, "seek": 284760, "start": 2867.92, + "end": 2873.2, "text": " got for that. And for others, we probably would have left + them. 
Looking at the results, there''s also", "tokens": [51380, 658, 337, 300, 13, + 400, 337, 2357, 11, 321, 1391, 576, 362, 1411, 552, 13, 11053, 412, 264, 3542, 11, + 456, 311, 611, 51644], "temperature": 0.0, "avg_logprob": -0.13176261054144967, + "compression_ratio": 1.606060606060606, "no_speech_prob": 0.011119586415588856}, + {"id": 472, "seek": 287320, "start": 2873.2, "end": 2878.0, "text": " an info block. + This is all JSON. So it''s straightforward for developer using Python. It''s", "tokens": + [50364, 364, 13614, 3461, 13, 639, 307, 439, 31828, 13, 407, 309, 311, 15325, 337, + 10754, 1228, 15329, 13, 467, 311, 50604], "temperature": 0.0, "avg_logprob": -0.12787823019356565, + "compression_ratio": 1.6262626262626263, "no_speech_prob": 0.0014749389374628663}, + {"id": 473, "seek": 287320, "start": 2878.0, "end": 2883.7599999999998, "text": + " little lists and dictionaries. There''s a result that describes what each of the + different sources", "tokens": [50604, 707, 14511, 293, 22352, 4889, 13, 821, 311, + 257, 1874, 300, 15626, 437, 1184, 295, 264, 819, 7139, 50892], "temperature": 0.0, + "avg_logprob": -0.12787823019356565, "compression_ratio": 1.6262626262626263, "no_speech_prob": + 0.0014749389374628663}, {"id": 474, "seek": 287320, "start": 2883.7599999999998, + "end": 2889.68, "text": " gave back. Easy to parse if you want to build that. You + have a filter URL so you can construct your own", "tokens": [50892, 2729, 646, 13, + 16002, 281, 48377, 498, 291, 528, 281, 1322, 300, 13, 509, 362, 257, 6608, 12905, + 370, 291, 393, 7690, 428, 1065, 51188], "temperature": 0.0, "avg_logprob": -0.12787823019356565, + "compression_ratio": 1.6262626262626263, "no_speech_prob": 0.0014749389374628663}, + {"id": 475, "seek": 287320, "start": 2889.68, "end": 2895.8399999999997, "text": + " facet display and to jump to any given provider. We actually give you the query + that we ran. 
So if", "tokens": [51188, 1915, 302, 4674, 293, 281, 3012, 281, 604, + 2212, 12398, 13, 492, 767, 976, 291, 264, 14581, 300, 321, 5872, 13, 407, 498, 51496], + "temperature": 0.0, "avg_logprob": -0.12787823019356565, "compression_ratio": 1.6262626262626263, + "no_speech_prob": 0.0014749389374628663}, {"id": 476, "seek": 287320, "start": 2895.8399999999997, + "end": 2899.68, "text": " you want to check the results, assuming you have the right + credentials, there''s the results.", "tokens": [51496, 291, 528, 281, 1520, 264, + 3542, 11, 11926, 291, 362, 264, 558, 27404, 11, 456, 311, 264, 3542, 13, 51688], + "temperature": 0.0, "avg_logprob": -0.12787823019356565, "compression_ratio": 1.6262626262626263, + "no_speech_prob": 0.0014749389374628663}, {"id": 477, "seek": 289968, "start": 2899.68, + "end": 2904.96, "text": " So I can actually go look at and modify my JSON. And then + as you would expect, there''s a summary", "tokens": [50364, 407, 286, 393, 767, + 352, 574, 412, 293, 16927, 452, 31828, 13, 400, 550, 382, 291, 576, 2066, 11, 456, + 311, 257, 12691, 50628], "temperature": 0.0, "avg_logprob": -0.149194670505211, + "compression_ratio": 1.6482758620689655, "no_speech_prob": 0.002980087650939822}, + {"id": 478, "seek": 289968, "start": 2904.96, "end": 2912.56, "text": " of what + was found. So here''s what we actually searched. The overall query, if you want + to rerun", "tokens": [50628, 295, 437, 390, 1352, 13, 407, 510, 311, 437, 321, 767, + 22961, 13, 440, 4787, 14581, 11, 498, 291, 528, 281, 43819, 409, 51008], "temperature": + 0.0, "avg_logprob": -0.149194670505211, "compression_ratio": 1.6482758620689655, + "no_speech_prob": 0.002980087650939822}, {"id": 479, "seek": 289968, "start": 2912.56, + "end": 2916.48, "text": " or update a query or rescore it, you can do that right + from the result list. 
So those links are", "tokens": [51008, 420, 5623, 257, 14581, + 420, 9610, 418, 309, 11, 291, 393, 360, 300, 558, 490, 264, 1874, 1329, 13, 407, + 729, 6123, 366, 51204], "temperature": 0.0, "avg_logprob": -0.149194670505211, "compression_ratio": + 1.6482758620689655, "no_speech_prob": 0.002980087650939822}, {"id": 480, "seek": + 289968, "start": 2916.48, "end": 2922.64, "text": " available. We summarize the + federation results and the time. Give you the next page of results,", "tokens": + [51204, 2435, 13, 492, 20858, 264, 4636, 5053, 3542, 293, 264, 565, 13, 5303, 291, + 264, 958, 3028, 295, 3542, 11, 51512], "temperature": 0.0, "avg_logprob": -0.149194670505211, + "compression_ratio": 1.6482758620689655, "no_speech_prob": 0.002980087650939822}, + {"id": 481, "seek": 289968, "start": 2922.64, "end": 2926.48, "text": " everything + stored in swirl. So you can page through. By the way, you can also set a retention", + "tokens": [51512, 1203, 12187, 294, 30310, 13, 407, 291, 393, 3028, 807, 13, 3146, + 264, 636, 11, 291, 393, 611, 992, 257, 22871, 51704], "temperature": 0.0, "avg_logprob": + -0.149194670505211, "compression_ratio": 1.6482758620689655, "no_speech_prob": 0.002980087650939822}, + {"id": 482, "seek": 292648, "start": 2927.2, "end": 2931.6, "text": " or exploration + factor if you want. So results will simply disappear for secure applications. You", + "tokens": [50400, 420, 16197, 5952, 498, 291, 528, 13, 407, 3542, 486, 2935, 11596, + 337, 7144, 5821, 13, 509, 50620], "temperature": 0.0, "avg_logprob": -0.14801109786582203, + "compression_ratio": 1.72992700729927, "no_speech_prob": 0.008976401761174202}, + {"id": 483, "seek": 292648, "start": 2931.6, "end": 2937.52, "text": " can even + do it. So there''s no storage at all. And then the results. 
So from a developer + perspective,", "tokens": [50620, 393, 754, 360, 309, 13, 407, 456, 311, 572, 6725, + 412, 439, 13, 400, 550, 264, 3542, 13, 407, 490, 257, 10754, 4585, 11, 50916], "temperature": + 0.0, "avg_logprob": -0.14801109786582203, "compression_ratio": 1.72992700729927, + "no_speech_prob": 0.008976401761174202}, {"id": 484, "seek": 292648, "start": 2938.16, + "end": 2943.44, "text": " literally, I''m going to extract the results dictionary + or sorry, the results list from this", "tokens": [50948, 3736, 11, 286, 478, 516, + 281, 8947, 264, 3542, 25890, 420, 2597, 11, 264, 3542, 1329, 490, 341, 51212], "temperature": + 0.0, "avg_logprob": -0.14801109786582203, "compression_ratio": 1.72992700729927, + "no_speech_prob": 0.008976401761174202}, {"id": 485, "seek": 292648, "start": 2943.44, + "end": 2948.0, "text": " structure that I get back when I call it. And I''m going + to iterate on that and each thing''s", "tokens": [51212, 3877, 300, 286, 483, 646, + 562, 286, 818, 309, 13, 400, 286, 478, 516, 281, 44497, 322, 300, 293, 1184, 551, + 311, 51440], "temperature": 0.0, "avg_logprob": -0.14801109786582203, "compression_ratio": + 1.72992700729927, "no_speech_prob": 0.008976401761174202}, {"id": 486, "seek": 292648, + "start": 2948.0, "end": 2953.12, "text": " a dictionary. It''s a flat dictionary + with as the things you would expect pretty much, right?", "tokens": [51440, 257, + 25890, 13, 467, 311, 257, 4962, 25890, 365, 382, 264, 721, 291, 576, 2066, 1238, + 709, 11, 558, 30, 51696], "temperature": 0.0, "avg_logprob": -0.14801109786582203, + "compression_ratio": 1.72992700729927, "no_speech_prob": 0.008976401761174202}, + {"id": 487, "seek": 295312, "start": 2953.12, "end": 2960.08, "text": " Title URL, + body, date published date retrieved and author. 
Everything else is meta information.", + "tokens": [50364, 26768, 12905, 11, 1772, 11, 4002, 6572, 4002, 19817, 937, 293, + 3793, 13, 5471, 1646, 307, 19616, 1589, 13, 50712], "temperature": 0.0, "avg_logprob": + -0.16446457879017976, "compression_ratio": 1.586092715231788, "no_speech_prob": + 0.002359856851398945}, {"id": 488, "seek": 295312, "start": 2960.08, "end": 2966.3199999999997, + "text": " So what what search provider responded, what the rank was, our scores + a score. There''s various", "tokens": [50712, 407, 437, 437, 3164, 12398, 15806, + 11, 437, 264, 6181, 390, 11, 527, 13444, 257, 6175, 13, 821, 311, 3683, 51024], + "temperature": 0.0, "avg_logprob": -0.16446457879017976, "compression_ratio": 1.586092715231788, + "no_speech_prob": 0.002359856851398945}, {"id": 489, "seek": 295312, "start": 2966.3199999999997, + "end": 2970.96, "text": " techniques to turn that into a probability or a confidence + level if you would like. We may do", "tokens": [51024, 7512, 281, 1261, 300, 666, + 257, 8482, 420, 257, 6687, 1496, 498, 291, 576, 411, 13, 492, 815, 360, 51256], + "temperature": 0.0, "avg_logprob": -0.16446457879017976, "compression_ratio": 1.586092715231788, + "no_speech_prob": 0.002359856851398945}, {"id": 490, "seek": 295312, "start": 2970.96, + "end": 2975.6, "text": " that in the future. I think it''s if people wanted it, + we''d love to hear about it. I think for now,", "tokens": [51256, 300, 294, 264, + 2027, 13, 286, 519, 309, 311, 498, 561, 1415, 309, 11, 321, 1116, 959, 281, 1568, + 466, 309, 13, 286, 519, 337, 586, 11, 51488], "temperature": 0.0, "avg_logprob": + -0.16446457879017976, "compression_ratio": 1.586092715231788, "no_speech_prob": + 0.002359856851398945}, {"id": 491, "seek": 295312, "start": 2975.6, "end": 2981.6, + "text": " though, people seem to be very happy just with rank. 
Most importantly, + and really, this is what", "tokens": [51488, 1673, 11, 561, 1643, 281, 312, 588, + 2055, 445, 365, 6181, 13, 4534, 8906, 11, 293, 534, 11, 341, 307, 437, 51788], "temperature": + 0.0, "avg_logprob": -0.16446457879017976, "compression_ratio": 1.586092715231788, + "no_speech_prob": 0.002359856851398945}, {"id": 492, "seek": 298160, "start": 2981.6, + "end": 2988.0, "text": " swirls ultimate value is, we explain exactly why the result + matched and why it scored as it did.", "tokens": [50364, 30310, 82, 9705, 2158, + 307, 11, 321, 2903, 2293, 983, 264, 1874, 21447, 293, 983, 309, 18139, 382, 309, + 630, 13, 50684], "temperature": 0.0, "avg_logprob": -0.169556786032284, "compression_ratio": + 1.6483050847457628, "no_speech_prob": 0.0012338421074673533}, {"id": 493, "seek": + 298160, "start": 2988.48, "end": 2993.2, "text": " So for example, we in this case, + of course, there are no stems for a name, but we do as basically", "tokens": [50708, + 407, 337, 1365, 11, 321, 294, 341, 1389, 11, 295, 1164, 11, 456, 366, 572, 27600, + 337, 257, 1315, 11, 457, 321, 360, 382, 1936, 50944], "temperature": 0.0, "avg_logprob": + -0.169556786032284, "compression_ratio": 1.6483050847457628, "no_speech_prob": 0.0012338421074673533}, + {"id": 494, "seek": 298160, "start": 2993.2, "end": 3000.3199999999997, "text": + " we use nltk, we stem to maximize recall. Then you''ll see the actual extracted + hits, the actual", "tokens": [50944, 321, 764, 297, 2282, 74, 11, 321, 12312, 281, + 19874, 9901, 13, 1396, 291, 603, 536, 264, 3539, 34086, 8664, 11, 264, 3539, 51300], + "temperature": 0.0, "avg_logprob": -0.169556786032284, "compression_ratio": 1.6483050847457628, + "no_speech_prob": 0.0012338421074673533}, {"id": 495, "seek": 298160, "start": 3000.3199999999997, + "end": 3005.12, "text": " hit, not the lower case tokenized version, right? So we + extract the actual hit. 
And then we produce", "tokens": [51300, 2045, 11, 406, 264, + 3126, 1389, 14862, 1602, 3037, 11, 558, 30, 407, 321, 8947, 264, 3539, 2045, 13, + 400, 550, 321, 5258, 51540], "temperature": 0.0, "avg_logprob": -0.169556786032284, + "compression_ratio": 1.6483050847457628, "no_speech_prob": 0.0012338421074673533}, + {"id": 496, "seek": 300512, "start": 3005.12, "end": 3013.52, "text": " the score, + which is this is the cosine similarity between the query and the text around it + in the result.", "tokens": [50364, 264, 6175, 11, 597, 307, 341, 307, 264, 23565, + 32194, 1296, 264, 14581, 293, 264, 2487, 926, 309, 294, 264, 1874, 13, 50784], "temperature": + 0.0, "avg_logprob": -0.1206219372925935, "compression_ratio": 1.8066914498141264, + "no_speech_prob": 0.0025992179289460182}, {"id": 497, "seek": 300512, "start": 3014.08, + "end": 3019.04, "text": " So we kind of sentence tokenize the result that we get + and then we''re basically looking", "tokens": [50812, 407, 321, 733, 295, 8174, + 14862, 1125, 264, 1874, 300, 321, 483, 293, 550, 321, 434, 1936, 1237, 51060], "temperature": + 0.0, "avg_logprob": -0.1206219372925935, "compression_ratio": 1.8066914498141264, + "no_speech_prob": 0.0025992179289460182}, {"id": 498, "seek": 300512, "start": 3019.04, + "end": 3024.64, "text": " to try and stay within that sentence and see how relevant + it is. 
And ultimately, we also adjust", "tokens": [51060, 281, 853, 293, 1754, 1951, + 300, 8174, 293, 536, 577, 7340, 309, 307, 13, 400, 6284, 11, 321, 611, 4369, 51340], + "temperature": 0.0, "avg_logprob": -0.1206219372925935, "compression_ratio": 1.8066914498141264, + "no_speech_prob": 0.0025992179289460182}, {"id": 499, "seek": 300512, "start": 3024.64, + "end": 3028.48, "text": " since we are sending different queries to different systems + and of course, different systems have", "tokens": [51340, 1670, 321, 366, 7750, + 819, 24109, 281, 819, 3652, 293, 295, 1164, 11, 819, 3652, 362, 51532], "temperature": + 0.0, "avg_logprob": -0.1206219372925935, "compression_ratio": 1.8066914498141264, + "no_speech_prob": 0.0025992179289460182}, {"id": 500, "seek": 300512, "start": 3028.48, + "end": 3032.7999999999997, "text": " different result links on average. We do adjustments + for both of those. We also give you the exact", "tokens": [51532, 819, 1874, 6123, + 322, 4274, 13, 492, 360, 18624, 337, 1293, 295, 729, 13, 492, 611, 976, 291, 264, + 1900, 51748], "temperature": 0.0, "avg_logprob": -0.1206219372925935, "compression_ratio": + 1.8066914498141264, "no_speech_prob": 0.0025992179289460182}, {"id": 501, "seek": + 303280, "start": 3032.8, "end": 3041.52, "text": " token locations for everything + that''s hit and ready to rank from there. Wow. So much is done behind", "tokens": + [50364, 14862, 9253, 337, 1203, 300, 311, 2045, 293, 1919, 281, 6181, 490, 456, + 13, 3153, 13, 407, 709, 307, 1096, 2261, 50800], "temperature": 0.0, "avg_logprob": + -0.1566390722570285, "compression_ratio": 1.580110497237569, "no_speech_prob": 0.007770642172545195}, + {"id": 502, "seek": 303280, "start": 3041.52, "end": 3050.4, "text": " the scenes + here. And so much is simplified on the other side, on the outer side. 
That''s amazing.", + "tokens": [50800, 264, 8026, 510, 13, 400, 370, 709, 307, 26335, 322, 264, 661, + 1252, 11, 322, 264, 10847, 1252, 13, 663, 311, 2243, 13, 51244], "temperature": + 0.0, "avg_logprob": -0.1566390722570285, "compression_ratio": 1.580110497237569, + "no_speech_prob": 0.007770642172545195}, {"id": 503, "seek": 303280, "start": 3051.6800000000003, + "end": 3057.76, "text": " And how many systems do you support or which systems do + you support out of the box today?", "tokens": [51308, 400, 577, 867, 3652, 360, + 291, 1406, 420, 597, 3652, 360, 291, 1406, 484, 295, 264, 2424, 965, 30, 51612], + "temperature": 0.0, "avg_logprob": -0.1566390722570285, "compression_ratio": 1.580110497237569, + "no_speech_prob": 0.007770642172545195}, {"id": 504, "seek": 305776, "start": 3058.6400000000003, + "end": 3063.44, "text": " So I''m happy to say we have connectors to all of the + major open source search engines, including solar,", "tokens": [50408, 407, 286, + 478, 2055, 281, 584, 321, 362, 31865, 281, 439, 295, 264, 2563, 1269, 4009, 3164, + 12982, 11, 3009, 7936, 11, 50648], "temperature": 0.0, "avg_logprob": -0.1943179902576265, + "compression_ratio": 1.6, "no_speech_prob": 0.010007666423916817}, {"id": 505, "seek": + 305776, "start": 3063.92, "end": 3071.1200000000003, "text": " AWS open search or + open search.org, I should say, an elastic search. 
We also support the main", "tokens": + [50672, 17650, 1269, 3164, 420, 1269, 3164, 13, 4646, 11, 286, 820, 584, 11, 364, + 17115, 3164, 13, 492, 611, 1406, 264, 2135, 51032], "temperature": 0.0, "avg_logprob": + -0.1943179902576265, "compression_ratio": 1.6, "no_speech_prob": 0.010007666423916817}, + {"id": 506, "seek": 305776, "start": 3071.1200000000003, "end": 3077.92, "text": + " open databases, Postgres, SQLite, also some of the more traditional cloud ones, + Google BigQuery, for", "tokens": [51032, 1269, 22380, 11, 10223, 45189, 11, 19200, + 642, 11, 611, 512, 295, 264, 544, 5164, 4588, 2306, 11, 3329, 43422, 11, 337, 51372], + "temperature": 0.0, "avg_logprob": -0.1943179902576265, "compression_ratio": 1.6, + "no_speech_prob": 0.010007666423916817}, {"id": 507, "seek": 305776, "start": 3077.92, + "end": 3085.5200000000004, "text": " example. And we are in the process of adding, + as I mentioned, M365. We also have, as of the last one,", "tokens": [51372, 1365, + 13, 400, 321, 366, 294, 264, 1399, 295, 5127, 11, 382, 286, 2835, 11, 376, 11309, + 20, 13, 492, 611, 362, 11, 382, 295, 264, 1036, 472, 11, 51752], "temperature": + 0.0, "avg_logprob": -0.1943179902576265, "compression_ratio": 1.6, "no_speech_prob": + 0.010007666423916817}, {"id": 508, "seek": 308552, "start": 3085.52, "end": 3091.04, + "text": " you can connect to Atlassian using our request get. You can connect to + UTRAC. 
So many of the", "tokens": [50364, 291, 393, 1745, 281, 11000, 640, 952, + 1228, 527, 5308, 483, 13, 509, 393, 1745, 281, 624, 51, 3750, 34, 13, 407, 867, + 295, 264, 50640], "temperature": 0.0, "avg_logprob": -0.1748656382602928, "compression_ratio": + 1.625, "no_speech_prob": 0.011157287284731865}, {"id": 509, "seek": 308552, "start": + 3091.6, "end": 3096.0, "text": " sophisticated repositories, you can actually just + use the request get connector to talk to them.", "tokens": [50668, 16950, 22283, + 2083, 11, 291, 393, 767, 445, 764, 264, 5308, 483, 19127, 281, 751, 281, 552, 13, + 50888], "temperature": 0.0, "avg_logprob": -0.1748656382602928, "compression_ratio": + 1.625, "no_speech_prob": 0.011157287284731865}, {"id": 510, "seek": 308552, "start": + 3096.8, "end": 3101.92, "text": " And M365 and Slack are coming in our next release, + which is next month.", "tokens": [50928, 400, 376, 11309, 20, 293, 37211, 366, 1348, + 294, 527, 958, 4374, 11, 597, 307, 958, 1618, 13, 51184], "temperature": 0.0, "avg_logprob": + -0.1748656382602928, "compression_ratio": 1.625, "no_speech_prob": 0.011157287284731865}, + {"id": 511, "seek": 308552, "start": 3102.96, "end": 3109.28, "text": " Well, I + think especially Slack or any like messenger that also has this kind of APIs that", + "tokens": [51236, 1042, 11, 286, 519, 2318, 37211, 420, 604, 411, 26599, 300, 611, + 575, 341, 733, 295, 21445, 300, 51552], "temperature": 0.0, "avg_logprob": -0.1748656382602928, + "compression_ratio": 1.625, "no_speech_prob": 0.011157287284731865}, {"id": 512, + "seek": 308552, "start": 3109.28, "end": 3114.64, "text": " can utilize, I think + that''s going to be like a big thing in my opinion, because so much is", "tokens": + [51552, 393, 16117, 11, 286, 519, 300, 311, 516, 281, 312, 411, 257, 955, 551, 294, + 452, 4800, 11, 570, 370, 709, 307, 51820], "temperature": 0.0, "avg_logprob": -0.1748656382602928, + "compression_ratio": 1.625, "no_speech_prob": 0.011157287284731865}, {"id": 513, + 
"seek": 311464, "start": 3114.64, "end": 3120.96, "text": " happening in Slack or + similar platforms. You know, so much knowledge is kind of written there in", "tokens": + [50364, 2737, 294, 37211, 420, 2531, 9473, 13, 509, 458, 11, 370, 709, 3601, 307, + 733, 295, 3720, 456, 294, 50680], "temperature": 0.0, "avg_logprob": -0.16908105936917392, + "compression_ratio": 1.48828125, "no_speech_prob": 0.0022082871291786432}, {"id": + 514, "seek": 311464, "start": 3120.96, "end": 3125.7599999999998, "text": " public + channel. So you''re in your own direct messages, right? It''s possible to access + them.", "tokens": [50680, 1908, 2269, 13, 407, 291, 434, 294, 428, 1065, 2047, 7897, + 11, 558, 30, 467, 311, 1944, 281, 2105, 552, 13, 50920], "temperature": 0.0, "avg_logprob": + -0.16908105936917392, "compression_ratio": 1.48828125, "no_speech_prob": 0.0022082871291786432}, + {"id": 515, "seek": 311464, "start": 3126.64, "end": 3133.2, "text": " Then I think + that this is amazing. We even support Microsoft Teams in the next release, full", + "tokens": [50964, 1396, 286, 519, 300, 341, 307, 2243, 13, 492, 754, 1406, 8116, + 24702, 294, 264, 958, 4374, 11, 1577, 51292], "temperature": 0.0, "avg_logprob": + -0.16908105936917392, "compression_ratio": 1.48828125, "no_speech_prob": 0.0022082871291786432}, + {"id": 516, "seek": 311464, "start": 3133.2, "end": 3138.64, "text": " search of + messages, also all the shared objects, depending on configuration. 
And if you''re + familiar", "tokens": [51292, 3164, 295, 7897, 11, 611, 439, 264, 5507, 6565, 11, + 5413, 322, 11694, 13, 400, 498, 291, 434, 4963, 51564], "temperature": 0.0, "avg_logprob": + -0.16908105936917392, "compression_ratio": 1.48828125, "no_speech_prob": 0.0022082871291786432}, + {"id": 517, "seek": 313864, "start": 3139.12, "end": 3147.92, "text": " with the + M365 OpenID connect, the infrastructure and sort of that ecosystem, it''s entirely + under", "tokens": [50388, 365, 264, 376, 11309, 20, 7238, 2777, 1745, 11, 264, 6896, + 293, 1333, 295, 300, 11311, 11, 309, 311, 7696, 833, 50828], "temperature": 0.0, + "avg_logprob": -0.1988883399963379, "compression_ratio": 1.5039370078740157, "no_speech_prob": + 0.0131960054859519}, {"id": 518, "seek": 313864, "start": 3147.92, "end": 3153.68, + "text": " the deployers control. Swirl is just software. I mean, we have a hosted + platform which you get", "tokens": [50828, 264, 7274, 433, 1969, 13, 3926, 1648, + 307, 445, 4722, 13, 286, 914, 11, 321, 362, 257, 19204, 3663, 597, 291, 483, 51116], + "temperature": 0.0, "avg_logprob": -0.1988883399963379, "compression_ratio": 1.5039370078740157, + "no_speech_prob": 0.0131960054859519}, {"id": 519, "seek": 313864, "start": 3153.68, + "end": 3159.44, "text": " connected to, but the permissioning, all of that is actually + done on the on the owner''s side. And", "tokens": [51116, 4582, 281, 11, 457, 264, + 11226, 278, 11, 439, 295, 300, 307, 767, 1096, 322, 264, 322, 264, 7289, 311, 1252, + 13, 400, 51404], "temperature": 0.0, "avg_logprob": -0.1988883399963379, "compression_ratio": + 1.5039370078740157, "no_speech_prob": 0.0131960054859519}, {"id": 520, "seek": 313864, + "start": 3159.44, "end": 3166.24, "text": " you can turn it off in one second for + any reason you''re uncomfortable. 
But Swirl 2.0, again,", "tokens": [51404, 291, + 393, 1261, 309, 766, 294, 472, 1150, 337, 604, 1778, 291, 434, 10532, 13, 583, 3926, + 1648, 568, 13, 15, 11, 797, 11, 51744], "temperature": 0.0, "avg_logprob": -0.1988883399963379, + "compression_ratio": 1.5039370078740157, "no_speech_prob": 0.0131960054859519}, + {"id": 521, "seek": 316624, "start": 3166.24, "end": 3172.8799999999997, "text": + " we''ll be coming out next month, has all of the OAuth and OIDC capabilities so + that you''re really", "tokens": [50364, 321, 603, 312, 1348, 484, 958, 1618, 11, + 575, 439, 295, 264, 48424, 2910, 293, 422, 2777, 34, 10862, 370, 300, 291, 434, + 534, 50696], "temperature": 0.0, "avg_logprob": -0.18983150565105936, "compression_ratio": + 1.4980237154150198, "no_speech_prob": 0.005239561200141907}, {"id": 522, "seek": + 316624, "start": 3172.8799999999997, "end": 3177.9199999999996, "text": " just connecting + your Microsoft account, searching through that stuff. And there''s no other user", + "tokens": [50696, 445, 11015, 428, 8116, 2696, 11, 10808, 807, 300, 1507, 13, 400, + 456, 311, 572, 661, 4195, 50948], "temperature": 0.0, "avg_logprob": -0.18983150565105936, + "compression_ratio": 1.4980237154150198, "no_speech_prob": 0.005239561200141907}, + {"id": 523, "seek": 316624, "start": 3177.9199999999996, "end": 3184.72, "text": + " interfaces or IDs or anything like that. It''s all seamless. And again, I''ll + completely", "tokens": [50948, 28416, 420, 48212, 420, 1340, 411, 300, 13, 467, + 311, 439, 28677, 13, 400, 797, 11, 286, 603, 2584, 51288], "temperature": 0.0, "avg_logprob": + -0.18983150565105936, "compression_ratio": 1.4980237154150198, "no_speech_prob": + 0.005239561200141907}, {"id": 524, "seek": 316624, "start": 3185.6, "end": 3192.3999999999996, + "text": " controlled by the deploy inside that M36510 and owner. Yeah, fantastic. 
+ Is there something else you", "tokens": [51332, 10164, 538, 264, 7274, 1854, 300, + 376, 11309, 20, 3279, 293, 7289, 13, 865, 11, 5456, 13, 1119, 456, 746, 1646, 291, + 51672], "temperature": 0.0, "avg_logprob": -0.18983150565105936, "compression_ratio": + 1.4980237154150198, "no_speech_prob": 0.005239561200141907}, {"id": 525, "seek": + 319240, "start": 3192.4, "end": 3199.28, "text": " want to show on this demo? Or + we want to go back to our audio mode video and audio for those", "tokens": [50364, + 528, 281, 855, 322, 341, 10723, 30, 1610, 321, 528, 281, 352, 646, 281, 527, 6278, + 4391, 960, 293, 6278, 337, 729, 50708], "temperature": 0.0, "avg_logprob": -0.13696183368658563, + "compression_ratio": 1.710914454277286, "no_speech_prob": 0.025341203436255455}, + {"id": 526, "seek": 319240, "start": 3199.28, "end": 3203.6800000000003, "text": + " who are listening only. All right. I hope that was more than enough. You know, + there''s a ton to", "tokens": [50708, 567, 366, 4764, 787, 13, 1057, 558, 13, 286, + 1454, 300, 390, 544, 813, 1547, 13, 509, 458, 11, 456, 311, 257, 2952, 281, 50928], + "temperature": 0.0, "avg_logprob": -0.13696183368658563, "compression_ratio": 1.710914454277286, + "no_speech_prob": 0.025341203436255455}, {"id": 527, "seek": 319240, "start": 3203.6800000000003, + "end": 3207.6, "text": " show. I just want to give a little flavor for it. And in + particular, you know, we''re really focused", "tokens": [50928, 855, 13, 286, 445, + 528, 281, 976, 257, 707, 6813, 337, 309, 13, 400, 294, 1729, 11, 291, 458, 11, 321, + 434, 534, 5178, 51124], "temperature": 0.0, "avg_logprob": -0.13696183368658563, + "compression_ratio": 1.710914454277286, "no_speech_prob": 0.025341203436255455}, + {"id": 528, "seek": 319240, "start": 3207.6, "end": 3212.8, "text": " on making + this easy for developers. That''s the current audience. 
I think there''s lots more + we can do", "tokens": [51124, 322, 1455, 341, 1858, 337, 8849, 13, 663, 311, 264, + 2190, 4034, 13, 286, 519, 456, 311, 3195, 544, 321, 393, 360, 51384], "temperature": + 0.0, "avg_logprob": -0.13696183368658563, "compression_ratio": 1.710914454277286, + "no_speech_prob": 0.025341203436255455}, {"id": 529, "seek": 319240, "start": 3212.8, + "end": 3217.04, "text": " in the future. But if you want to add a bunch of sources + or solve a multi-silver search problem,", "tokens": [51384, 294, 264, 2027, 13, + 583, 498, 291, 528, 281, 909, 257, 3840, 295, 7139, 420, 5039, 257, 4825, 12, 30605, + 331, 3164, 1154, 11, 51596], "temperature": 0.0, "avg_logprob": -0.13696183368658563, + "compression_ratio": 1.710914454277286, "no_speech_prob": 0.025341203436255455}, + {"id": 530, "seek": 319240, "start": 3217.04, "end": 3222.32, "text": " that''s + what Swirls intended to do. That''s a... It''s amazing. It''s amazing. And how do + you see", "tokens": [51596, 300, 311, 437, 3926, 1648, 82, 10226, 281, 360, 13, + 663, 311, 257, 485, 467, 311, 2243, 13, 467, 311, 2243, 13, 400, 577, 360, 291, + 536, 51860], "temperature": 0.0, "avg_logprob": -0.13696183368658563, "compression_ratio": + 1.710914454277286, "no_speech_prob": 0.025341203436255455}, {"id": 531, "seek": + 322232, "start": 3223.04, "end": 3229.76, "text": " the clientele? Like, what is + the ideal client for this system? How do you want to interact with", "tokens": [50400, + 264, 6423, 16884, 30, 1743, 11, 437, 307, 264, 7157, 6423, 337, 341, 1185, 30, 1012, + 360, 291, 528, 281, 4648, 365, 50736], "temperature": 0.0, "avg_logprob": -0.16753488540649414, + "compression_ratio": 1.572, "no_speech_prob": 0.003833875060081482}, {"id": 532, + "seek": 322232, "start": 3229.76, "end": 3236.1600000000003, "text": " these clients? + And how do you see... 
Or maybe you''ll already experience this, you know, first + steps to", "tokens": [50736, 613, 6982, 30, 400, 577, 360, 291, 536, 485, 1610, + 1310, 291, 603, 1217, 1752, 341, 11, 291, 458, 11, 700, 4439, 281, 51056], "temperature": + 0.0, "avg_logprob": -0.16753488540649414, "compression_ratio": 1.572, "no_speech_prob": + 0.003833875060081482}, {"id": 533, "seek": 322232, "start": 3237.04, "end": 3245.28, + "text": " succeeding on this path? So I honestly, people who are using it today + are doing three things with it.", "tokens": [51100, 47912, 322, 341, 3100, 30, 407, + 286, 6095, 11, 561, 567, 366, 1228, 309, 965, 366, 884, 1045, 721, 365, 309, 13, + 51512], "temperature": 0.0, "avg_logprob": -0.16753488540649414, "compression_ratio": + 1.572, "no_speech_prob": 0.003833875060081482}, {"id": 534, "seek": 322232, "start": + 3245.28, "end": 3250.6400000000003, "text": " And I''m super curious, right, as + to which ones of these will evolve. I think the most basic,", "tokens": [51512, + 400, 286, 478, 1687, 6369, 11, 558, 11, 382, 281, 597, 2306, 295, 613, 486, 16693, + 13, 286, 519, 264, 881, 3875, 11, 51780], "temperature": 0.0, "avg_logprob": -0.16753488540649414, + "compression_ratio": 1.572, "no_speech_prob": 0.003833875060081482}, {"id": 535, + "seek": 325064, "start": 3250.64, "end": 3254.0, "text": " you know, or interesting + use case, right, or the sort of like the most obvious use case", "tokens": [50364, + 291, 458, 11, 420, 1880, 764, 1389, 11, 558, 11, 420, 264, 1333, 295, 411, 264, + 881, 6322, 764, 1389, 50532], "temperature": 0.0, "avg_logprob": -0.10049458452173181, + "compression_ratio": 1.7725856697819315, "no_speech_prob": 0.0027524952311068773}, + {"id": 536, "seek": 325064, "start": 3254.64, "end": 3259.44, "text": " is one search + box to rule them all, apart from the Lord of the Rings reference. 
But honestly,", + "tokens": [50564, 307, 472, 3164, 2424, 281, 4978, 552, 439, 11, 4936, 490, 264, + 3257, 295, 264, 38543, 6408, 13, 583, 6095, 11, 50804], "temperature": 0.0, "avg_logprob": + -0.10049458452173181, "compression_ratio": 1.7725856697819315, "no_speech_prob": + 0.0027524952311068773}, {"id": 537, "seek": 325064, "start": 3259.44, "end": 3264.3199999999997, + "text": " that''s been so hard. If you''ve done a lot of enterprise search projects, + normally, you know,", "tokens": [50804, 300, 311, 668, 370, 1152, 13, 759, 291, + 600, 1096, 257, 688, 295, 14132, 3164, 4455, 11, 5646, 11, 291, 458, 11, 51048], + "temperature": 0.0, "avg_logprob": -0.10049458452173181, "compression_ratio": 1.7725856697819315, + "no_speech_prob": 0.0027524952311068773}, {"id": 538, "seek": 325064, "start": 3264.3199999999997, + "end": 3268.96, "text": " for the initial scope, and it''s expensive, and it takes + about a year or whatever, you know,", "tokens": [51048, 337, 264, 5883, 11923, 11, + 293, 309, 311, 5124, 11, 293, 309, 2516, 466, 257, 1064, 420, 2035, 11, 291, 458, + 11, 51280], "temperature": 0.0, "avg_logprob": -0.10049458452173181, "compression_ratio": + 1.7725856697819315, "no_speech_prob": 0.0027524952311068773}, {"id": 539, "seek": + 325064, "start": 3268.96, "end": 3273.8399999999997, "text": " you get a couple + silos in place, and things are good, and people like it. But adding silos over time", + "tokens": [51280, 291, 483, 257, 1916, 48893, 294, 1081, 11, 293, 721, 366, 665, + 11, 293, 561, 411, 309, 13, 583, 5127, 48893, 670, 565, 51524], "temperature": 0.0, + "avg_logprob": -0.10049458452173181, "compression_ratio": 1.7725856697819315, "no_speech_prob": + 0.0027524952311068773}, {"id": 540, "seek": 325064, "start": 3273.8399999999997, + "end": 3279.6, "text": " is super costly, and it''s hard, and this is the way to + do it. 
You have a great existing search index,", "tokens": [51524, 307, 1687, 28328, + 11, 293, 309, 311, 1152, 11, 293, 341, 307, 264, 636, 281, 360, 309, 13, 509, 362, + 257, 869, 6741, 3164, 8186, 11, 51812], "temperature": 0.0, "avg_logprob": -0.10049458452173181, + "compression_ratio": 1.7725856697819315, "no_speech_prob": 0.0027524952311068773}, + {"id": 541, "seek": 327960, "start": 3279.6, "end": 3286.48, "text": " you have + a search UI, awesome. Connect the search index to Swirl, and connect your UI to + Swirl.", "tokens": [50364, 291, 362, 257, 3164, 15682, 11, 3476, 13, 11653, 264, + 3164, 8186, 281, 3926, 1648, 11, 293, 1745, 428, 15682, 281, 3926, 1648, 13, 50708], + "temperature": 0.0, "avg_logprob": -0.10534522268507215, "compression_ratio": 1.6, + "no_speech_prob": 0.002256683073937893}, {"id": 542, "seek": 327960, "start": 3286.48, + "end": 3291.44, "text": " Now you can add a whole bunch of other sources and get + great ranking, and you don''t have to change", "tokens": [50708, 823, 291, 393, + 909, 257, 1379, 3840, 295, 661, 7139, 293, 483, 869, 17833, 11, 293, 291, 500, 380, + 362, 281, 1319, 50956], "temperature": 0.0, "avg_logprob": -0.10534522268507215, + "compression_ratio": 1.6, "no_speech_prob": 0.002256683073937893}, {"id": 543, "seek": + 327960, "start": 3291.44, "end": 3297.68, "text": " the UI necessarily. For the + most part, every search UI has URL, title, and body, and maybe a date.", "tokens": + [50956, 264, 15682, 4725, 13, 1171, 264, 881, 644, 11, 633, 3164, 15682, 575, 12905, + 11, 4876, 11, 293, 1772, 11, 293, 1310, 257, 4002, 13, 51268], "temperature": 0.0, + "avg_logprob": -0.10534522268507215, "compression_ratio": 1.6, "no_speech_prob": + 0.002256683073937893}, {"id": 544, "seek": 327960, "start": 3297.68, "end": 3302.72, + "text": " Okay, so if, for starters, you can just take those. 
And if you have more, + right, if you want to do", "tokens": [51268, 1033, 11, 370, 498, 11, 337, 35131, + 11, 291, 393, 445, 747, 729, 13, 400, 498, 291, 362, 544, 11, 558, 11, 498, 291, + 528, 281, 360, 51520], "temperature": 0.0, "avg_logprob": -0.10534522268507215, + "compression_ratio": 1.6, "no_speech_prob": 0.002256683073937893}, {"id": 545, "seek": + 330272, "start": 3302.72, "end": 3308.64, "text": " a source facet, that''s cool. + From there, I think, you know, people with Python, right, Django,", "tokens": [50364, + 257, 4009, 1915, 302, 11, 300, 311, 1627, 13, 3358, 456, 11, 286, 519, 11, 291, + 458, 11, 561, 365, 15329, 11, 558, 11, 33464, 17150, 11, 50660], "temperature": + 0.0, "avg_logprob": -0.14359905101634837, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.006232177373021841}, {"id": 546, "seek": 330272, "start": 3309.68, + "end": 3313.6, "text": " experience, and who want to take this and tailor it, we''d + love to help, we''d love to hear what you''re", "tokens": [50712, 1752, 11, 293, + 567, 528, 281, 747, 341, 293, 33068, 309, 11, 321, 1116, 959, 281, 854, 11, 321, + 1116, 959, 281, 1568, 437, 291, 434, 50908], "temperature": 0.0, "avg_logprob": + -0.14359905101634837, "compression_ratio": 1.697594501718213, "no_speech_prob": + 0.006232177373021841}, {"id": 547, "seek": 330272, "start": 3313.6, "end": 3318.9599999999996, + "text": " doing. Again, please, the Slack supports all free, just join up the community + and get in there,", "tokens": [50908, 884, 13, 3764, 11, 1767, 11, 264, 37211, 9346, + 439, 1737, 11, 445, 3917, 493, 264, 1768, 293, 483, 294, 456, 11, 51176], "temperature": + 0.0, "avg_logprob": -0.14359905101634837, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.006232177373021841}, {"id": 548, "seek": 330272, "start": 3318.9599999999996, + "end": 3323.52, "text": " and tell us what''s going on, or ask. 
And I think there''s + lots of other people who are working with", "tokens": [51176, 293, 980, 505, 437, + 311, 516, 322, 11, 420, 1029, 13, 400, 286, 519, 456, 311, 3195, 295, 661, 561, + 567, 366, 1364, 365, 51404], "temperature": 0.0, "avg_logprob": -0.14359905101634837, + "compression_ratio": 1.697594501718213, "no_speech_prob": 0.006232177373021841}, + {"id": 549, "seek": 330272, "start": 3323.52, "end": 3329.4399999999996, "text": + " it too, who are started to, you know, answer questions and things like that. The + second thing, though,", "tokens": [51404, 309, 886, 11, 567, 366, 1409, 281, 11, + 291, 458, 11, 1867, 1651, 293, 721, 411, 300, 13, 440, 1150, 551, 11, 1673, 11, + 51700], "temperature": 0.0, "avg_logprob": -0.14359905101634837, "compression_ratio": + 1.697594501718213, "no_speech_prob": 0.006232177373021841}, {"id": 550, "seek": + 332944, "start": 3329.92, "end": 3334.32, "text": " there are definitely use cases + where people really want to monitor multiple sources,", "tokens": [50388, 456, 366, + 2138, 764, 3331, 689, 561, 534, 528, 281, 6002, 3866, 7139, 11, 50608], "temperature": + 0.0, "avg_logprob": -0.12125456919435595, "compression_ratio": 1.6421052631578947, + "no_speech_prob": 0.0020573094952851534}, {"id": 551, "seek": 332944, "start": 3334.32, + "end": 3339.36, "text": " and push notifications out, like, to Slack, and to teams, + and things like that. That''s a very different", "tokens": [50608, 293, 2944, 13426, + 484, 11, 411, 11, 281, 37211, 11, 293, 281, 5491, 11, 293, 721, 411, 300, 13, 663, + 311, 257, 588, 819, 50860], "temperature": 0.0, "avg_logprob": -0.12125456919435595, + "compression_ratio": 1.6421052631578947, "no_speech_prob": 0.0020573094952851534}, + {"id": 552, "seek": 332944, "start": 3339.36, "end": 3345.04, "text": " model. 
I + don''t know if that''s for everybody, but I think it''s, in a way, that''s the future.", + "tokens": [50860, 2316, 13, 286, 500, 380, 458, 498, 300, 311, 337, 2201, 11, 457, + 286, 519, 309, 311, 11, 294, 257, 636, 11, 300, 311, 264, 2027, 13, 51144], "temperature": + 0.0, "avg_logprob": -0.12125456919435595, "compression_ratio": 1.6421052631578947, + "no_speech_prob": 0.0020573094952851534}, {"id": 553, "seek": 332944, "start": 3346.16, + "end": 3350.4, "text": " Right, we shouldn''t have to ask when going to a search + box takes time, and then I still have to parse it.", "tokens": [51200, 1779, 11, + 321, 4659, 380, 362, 281, 1029, 562, 516, 281, 257, 3164, 2424, 2516, 565, 11, 293, + 550, 286, 920, 362, 281, 48377, 309, 13, 51412], "temperature": 0.0, "avg_logprob": + -0.12125456919435595, "compression_ratio": 1.6421052631578947, "no_speech_prob": + 0.0020573094952851534}, {"id": 554, "seek": 332944, "start": 3351.44, "end": 3356.0, + "text": " Depending on what you know, Swirl doesn''t do any profiling or anything + like that,", "tokens": [51464, 22539, 322, 437, 291, 458, 11, 3926, 1648, 1177, + 380, 360, 604, 1740, 4883, 420, 1340, 411, 300, 11, 51692], "temperature": 0.0, + "avg_logprob": -0.12125456919435595, "compression_ratio": 1.6421052631578947, "no_speech_prob": + 0.0020573094952851534}, {"id": 555, "seek": 335600, "start": 3356.0, "end": 3360.4, + "text": " depending on what you know, you''re the builder of search apps, right, + or inside apps.", "tokens": [50364, 5413, 322, 437, 291, 458, 11, 291, 434, 264, + 27377, 295, 3164, 7733, 11, 558, 11, 420, 1854, 7733, 13, 50584], "temperature": + 0.0, "avg_logprob": -0.13668978460903825, "compression_ratio": 1.7746913580246915, + "no_speech_prob": 0.0021033899392932653}, {"id": 556, "seek": 335600, "start": 3361.6, + "end": 3366.16, "text": " You should be able to target them, but the barrier is + usually not what we know about the user,", "tokens": [50644, 509, 820, 312, 1075, + 281, 3779, 552, 11, 
457, 264, 13357, 307, 2673, 406, 437, 321, 458, 466, 264, 4195, + 11, 50872], "temperature": 0.0, "avg_logprob": -0.13668978460903825, "compression_ratio": + 1.7746913580246915, "no_speech_prob": 0.0021033899392932653}, {"id": 557, "seek": + 335600, "start": 3366.16, "end": 3371.04, "text": " right? Since they''re an employee, + we might have skill knowledge about them, right? We probably have", "tokens": [50872, + 558, 30, 4162, 436, 434, 364, 10738, 11, 321, 1062, 362, 5389, 3601, 466, 552, 11, + 558, 30, 492, 1391, 362, 51116], "temperature": 0.0, "avg_logprob": -0.13668978460903825, + "compression_ratio": 1.7746913580246915, "no_speech_prob": 0.0021033899392932653}, + {"id": 558, "seek": 335600, "start": 3371.04, "end": 3376.32, "text": " access in + theory to some other information about their job function and department and who + they", "tokens": [51116, 2105, 294, 5261, 281, 512, 661, 1589, 466, 641, 1691, 2445, + 293, 5882, 293, 567, 436, 51380], "temperature": 0.0, "avg_logprob": -0.13668978460903825, + "compression_ratio": 1.7746913580246915, "no_speech_prob": 0.0021033899392932653}, + {"id": 559, "seek": 335600, "start": 3376.32, "end": 3381.12, "text": " talk to. + So it shouldn''t be that hard, but the problem isn''t knowing that stuff. The problem + is", "tokens": [51380, 751, 281, 13, 407, 309, 4659, 380, 312, 300, 1152, 11, 457, + 264, 1154, 1943, 380, 5276, 300, 1507, 13, 440, 1154, 307, 51620], "temperature": + 0.0, "avg_logprob": -0.13668978460903825, "compression_ratio": 1.7746913580246915, + "no_speech_prob": 0.0021033899392932653}, {"id": 560, "seek": 335600, "start": 3381.12, + "end": 3385.44, "text": " saying, okay, well, how do I get content, right? How do + I get that out? 
So again, hook it up to Swirl.", "tokens": [51620, 1566, 11, 1392, + 11, 731, 11, 577, 360, 286, 483, 2701, 11, 558, 30, 1012, 360, 286, 483, 300, 484, + 30, 407, 797, 11, 6328, 309, 493, 281, 3926, 1648, 13, 51836], "temperature": 0.0, + "avg_logprob": -0.13668978460903825, "compression_ratio": 1.7746913580246915, "no_speech_prob": + 0.0021033899392932653}, {"id": 561, "seek": 338600, "start": 3386.24, "end": 3391.92, + "text": " Build a watch list, which can be essentially a group of queries or set + of search objects with", "tokens": [50376, 11875, 257, 1159, 1329, 11, 597, 393, + 312, 4476, 257, 1594, 295, 24109, 420, 992, 295, 3164, 6565, 365, 50660], "temperature": + 0.0, "avg_logprob": -0.11151714492262456, "compression_ratio": 1.7321428571428572, + "no_speech_prob": 0.0031236843205988407}, {"id": 562, "seek": 338600, "start": 3391.92, + "end": 3397.12, "text": " the subscribe function turned on, you know, for a bunch + of topics, push that data out to the people", "tokens": [50660, 264, 3022, 2445, + 3574, 322, 11, 291, 458, 11, 337, 257, 3840, 295, 8378, 11, 2944, 300, 1412, 484, + 281, 264, 561, 50920], "temperature": 0.0, "avg_logprob": -0.11151714492262456, + "compression_ratio": 1.7321428571428572, "no_speech_prob": 0.0031236843205988407}, + {"id": 563, "seek": 338600, "start": 3397.12, "end": 3401.6, "text": " who need + to know, create groups, use service accounts to search as opposed to using individual", + "tokens": [50920, 567, 643, 281, 458, 11, 1884, 3935, 11, 764, 2643, 9402, 281, + 3164, 382, 8851, 281, 1228, 2609, 51144], "temperature": 0.0, "avg_logprob": -0.11151714492262456, + "compression_ratio": 1.7321428571428572, "no_speech_prob": 0.0031236843205988407}, + {"id": 564, "seek": 338600, "start": 3401.6, "end": 3407.84, "text": " users, right? 
+ Targeting individual users, not super valuable for proactive delivery, but on a + group", "tokens": [51144, 5022, 11, 558, 30, 24586, 278, 2609, 5022, 11, 406, 1687, + 8263, 337, 28028, 8982, 11, 457, 322, 257, 1594, 51456], "temperature": 0.0, "avg_logprob": + -0.11151714492262456, "compression_ratio": 1.7321428571428572, "no_speech_prob": + 0.0031236843205988407}, {"id": 565, "seek": 338600, "start": 3407.84, "end": 3414.72, + "text": " basis, very valuable. So tell, right, create an industry feed that, you + know, if you really know", "tokens": [51456, 5143, 11, 588, 8263, 13, 407, 980, + 11, 558, 11, 1884, 364, 3518, 3154, 300, 11, 291, 458, 11, 498, 291, 534, 458, 51800], + "temperature": 0.0, "avg_logprob": -0.11151714492262456, "compression_ratio": 1.7321428571428572, + "no_speech_prob": 0.0031236843205988407}, {"id": 566, "seek": 341472, "start": 3414.72, + "end": 3420.3999999999996, "text": " where to get the best industry data, why not + make that systematic? Why not make that data available", "tokens": [50364, 689, + 281, 483, 264, 1151, 3518, 1412, 11, 983, 406, 652, 300, 27249, 30, 1545, 406, 652, + 300, 1412, 2435, 50648], "temperature": 0.0, "avg_logprob": -0.1317386465557551, + "compression_ratio": 1.7298245614035088, "no_speech_prob": 0.0063559189438819885}, + {"id": 567, "seek": 341472, "start": 3420.3999999999996, "end": 3424.08, "text": + " to everybody who''s out there trying to talk to those folks through whatever, + through their mobile?", "tokens": [50648, 281, 2201, 567, 311, 484, 456, 1382, 281, + 751, 281, 729, 4024, 807, 2035, 11, 807, 641, 6013, 30, 50832], "temperature": 0.0, + "avg_logprob": -0.1317386465557551, "compression_ratio": 1.7298245614035088, "no_speech_prob": + 0.0063559189438819885}, {"id": 568, "seek": 341472, "start": 3424.7999999999997, + "end": 3428.3999999999996, "text": " And this is a thing like trying to do end-to-end + enterprise search is super hard. 
You got to", "tokens": [50868, 400, 341, 307, 257, + 551, 411, 1382, 281, 360, 917, 12, 1353, 12, 521, 14132, 3164, 307, 1687, 1152, + 13, 509, 658, 281, 51048], "temperature": 0.0, "avg_logprob": -0.1317386465557551, + "compression_ratio": 1.7298245614035088, "no_speech_prob": 0.0063559189438819885}, + {"id": 569, "seek": 341472, "start": 3428.3999999999996, "end": 3432.64, "text": + " get people to adopt your solution. Why would what do you want my mobile app for? + You probably already", "tokens": [51048, 483, 561, 281, 6878, 428, 3827, 13, 1545, + 576, 437, 360, 291, 528, 452, 6013, 724, 337, 30, 509, 1391, 1217, 51260], "temperature": + 0.0, "avg_logprob": -0.1317386465557551, "compression_ratio": 1.7298245614035088, + "no_speech_prob": 0.0063559189438819885}, {"id": 570, "seek": 341472, "start": 3432.64, + "end": 3437.8399999999997, "text": " have a cool one. You might already have five. + So it''s all about just putting that data out there so", "tokens": [51260, 362, + 257, 1627, 472, 13, 509, 1062, 1217, 362, 1732, 13, 407, 309, 311, 439, 466, 445, + 3372, 300, 1412, 484, 456, 370, 51520], "temperature": 0.0, "avg_logprob": -0.1317386465557551, + "compression_ratio": 1.7298245614035088, "no_speech_prob": 0.0063559189438819885}, + {"id": 571, "seek": 343784, "start": 3437.84, "end": 3443.92, "text": " people can + keep building fast. That''s it. Yeah, this is amazing. 
I mean, you", "tokens": [50364, + 561, 393, 1066, 2390, 2370, 13, 663, 311, 309, 13, 865, 11, 341, 307, 2243, 13, + 286, 914, 11, 291, 50668], "temperature": 0.0, "avg_logprob": -0.1939904530843099, + "compression_ratio": 1.6904761904761905, "no_speech_prob": 0.0061910031363368034}, + {"id": 572, "seek": 343784, "start": 3445.04, "end": 3451.28, "text": " you simplified + a lot in how you presented, you simplified a lot and you solved so many", "tokens": + [50724, 291, 26335, 257, 688, 294, 577, 291, 8212, 11, 291, 26335, 257, 688, 293, + 291, 13041, 370, 867, 51036], "temperature": 0.0, "avg_logprob": -0.1939904530843099, + "compression_ratio": 1.6904761904761905, "no_speech_prob": 0.0061910031363368034}, + {"id": 573, "seek": 343784, "start": 3452.56, "end": 3459.04, "text": " edge case, + like not edge case, but like this really challenging things that are like showstoppers", + "tokens": [51100, 4691, 1389, 11, 411, 406, 4691, 1389, 11, 457, 411, 341, 534, + 7595, 721, 300, 366, 411, 855, 13559, 21819, 51424], "temperature": 0.0, "avg_logprob": + -0.1939904530843099, "compression_ratio": 1.6904761904761905, "no_speech_prob": + 0.0061910031363368034}, {"id": 574, "seek": 343784, "start": 3459.04, "end": 3463.84, + "text": " sometimes, you know, like, okay, I have this existing search demo app + or something, you know,", "tokens": [51424, 2171, 11, 291, 458, 11, 411, 11, 1392, + 11, 286, 362, 341, 6741, 3164, 10723, 724, 420, 746, 11, 291, 458, 11, 51664], "temperature": + 0.0, "avg_logprob": -0.1939904530843099, "compression_ratio": 1.6904761904761905, + "no_speech_prob": 0.0061910031363368034}, {"id": 575, "seek": 346384, "start": 3463.84, + "end": 3471.6000000000004, "text": " it''s used within my department. I just want + to add one data source. 
Now, what do I do, right?", "tokens": [50364, 309, 311, + 1143, 1951, 452, 5882, 13, 286, 445, 528, 281, 909, 472, 1412, 4009, 13, 823, 11, + 437, 360, 286, 360, 11, 558, 30, 50752], "temperature": 0.0, "avg_logprob": -0.15803291247441217, + "compression_ratio": 1.6394849785407726, "no_speech_prob": 0.00521750608459115}, + {"id": 576, "seek": 346384, "start": 3471.6000000000004, "end": 3476.4, "text": + " Do I really need to change my UI? Do I really need to rewrite the back end and + things like that?", "tokens": [50752, 1144, 286, 534, 643, 281, 1319, 452, 15682, + 30, 1144, 286, 534, 643, 281, 28132, 264, 646, 917, 293, 721, 411, 300, 30, 50992], + "temperature": 0.0, "avg_logprob": -0.15803291247441217, "compression_ratio": 1.6394849785407726, + "no_speech_prob": 0.00521750608459115}, {"id": 577, "seek": 346384, "start": 3476.4, + "end": 3484.96, "text": " And so I could actually, when I introduce swirl, will + it actually precede every search back end", "tokens": [50992, 400, 370, 286, 727, + 767, 11, 562, 286, 5366, 30310, 11, 486, 309, 767, 16969, 68, 633, 3164, 646, 917, + 51420], "temperature": 0.0, "avg_logprob": -0.15803291247441217, "compression_ratio": + 1.6394849785407726, "no_speech_prob": 0.00521750608459115}, {"id": 578, "seek": + 346384, "start": 3484.96, "end": 3493.28, "text": " call between UI and the search + back end? That''s how I do it now. 
And like, we''re setting it up.", "tokens": [51420, + 818, 1296, 15682, 293, 264, 3164, 646, 917, 30, 663, 311, 577, 286, 360, 309, 586, + 13, 400, 411, 11, 321, 434, 3287, 309, 493, 13, 51836], "temperature": 0.0, "avg_logprob": + -0.15803291247441217, "compression_ratio": 1.6394849785407726, "no_speech_prob": + 0.00521750608459115}, {"id": 579, "seek": 349328, "start": 3493.28, "end": 3499.92, + "text": " We use it internally and that''s the way to do it rather than querying + an index,", "tokens": [50364, 492, 764, 309, 19501, 293, 300, 311, 264, 636, 281, + 360, 309, 2831, 813, 7083, 1840, 364, 8186, 11, 50696], "temperature": 0.0, "avg_logprob": + -0.18405945833064308, "compression_ratio": 1.6774193548387097, "no_speech_prob": + 0.0009623169898986816}, {"id": 580, "seek": 349328, "start": 3500.6400000000003, + "end": 3505.36, "text": " you know, and then create just queries world and have + it query all of those things. And what you", "tokens": [50732, 291, 458, 11, 293, + 550, 1884, 445, 24109, 1002, 293, 362, 309, 14581, 439, 295, 729, 721, 13, 400, + 437, 291, 50968], "temperature": 0.0, "avg_logprob": -0.18405945833064308, "compression_ratio": + 1.6774193548387097, "no_speech_prob": 0.0009623169898986816}, {"id": 581, "seek": + 349328, "start": 3505.36, "end": 3511.2000000000003, "text": " get is the best results + from across all sources. Now, that''s no substitute though, right, from going", + "tokens": [50968, 483, 307, 264, 1151, 3542, 490, 2108, 439, 7139, 13, 823, 11, + 300, 311, 572, 15802, 1673, 11, 558, 11, 490, 516, 51260], "temperature": 0.0, "avg_logprob": + -0.18405945833064308, "compression_ratio": 1.6774193548387097, "no_speech_prob": + 0.0009623169898986816}, {"id": 582, "seek": 349328, "start": 3511.2000000000003, + "end": 3515.6800000000003, "text": " into the silo. Sometimes you need to go into + the silo. 
They have in addition to a great search API", "tokens": [51260, 666, 264, + 3425, 78, 13, 4803, 291, 643, 281, 352, 666, 264, 3425, 78, 13, 814, 362, 294, 4500, + 281, 257, 869, 3164, 9362, 51484], "temperature": 0.0, "avg_logprob": -0.18405945833064308, + "compression_ratio": 1.6774193548387097, "no_speech_prob": 0.0009623169898986816}, + {"id": 583, "seek": 349328, "start": 3516.2400000000002, "end": 3521.76, "text": + " and a lot of business logic right on their side, like query synums. There''s a + lot more. You", "tokens": [51512, 293, 257, 688, 295, 1606, 9952, 558, 322, 641, + 1252, 11, 411, 14581, 5451, 8099, 13, 821, 311, 257, 688, 544, 13, 509, 51788], + "temperature": 0.0, "avg_logprob": -0.18405945833064308, "compression_ratio": 1.6774193548387097, + "no_speech_prob": 0.0009623169898986816}, {"id": 584, "seek": 352176, "start": 3521.76, + "end": 3527.36, "text": " probably want to view the object in their environment + versus in swirl, we can create a copy of it or", "tokens": [50364, 1391, 528, 281, + 1910, 264, 2657, 294, 641, 2823, 5717, 294, 30310, 11, 321, 393, 1884, 257, 5055, + 295, 309, 420, 50644], "temperature": 0.0, "avg_logprob": -0.13036083406017673, + "compression_ratio": 1.6744186046511629, "no_speech_prob": 0.0013863056665286422}, + {"id": 585, "seek": 352176, "start": 3527.36, "end": 3532.48, "text": " whatever, + like everybody else does. We don''t. If somebody wants to do preview, you know, + there are so", "tokens": [50644, 2035, 11, 411, 2201, 1646, 775, 13, 492, 500, 380, + 13, 759, 2618, 2738, 281, 360, 14281, 11, 291, 458, 11, 456, 366, 370, 50900], "temperature": + 0.0, "avg_logprob": -0.13036083406017673, "compression_ratio": 1.6744186046511629, + "no_speech_prob": 0.0013863056665286422}, {"id": 586, "seek": 352176, "start": 3532.48, + "end": 3538.0, "text": " many technologies to do that, but why? 
Instead, take, I + think the best thing to do is after the user has", "tokens": [50900, 867, 7943, + 281, 360, 300, 11, 457, 983, 30, 7156, 11, 747, 11, 286, 519, 264, 1151, 551, 281, + 360, 307, 934, 264, 4195, 575, 51176], "temperature": 0.0, "avg_logprob": -0.13036083406017673, + "compression_ratio": 1.6744186046511629, "no_speech_prob": 0.0013863056665286422}, + {"id": 587, "seek": 352176, "start": 3538.0, "end": 3543.2000000000003, "text": + " scanned the shallow results that swirl gives you immediately two, two, three seconds, + that''s nothing", "tokens": [51176, 45089, 264, 20488, 3542, 300, 30310, 2709, 291, + 4258, 732, 11, 732, 11, 1045, 3949, 11, 300, 311, 1825, 51436], "temperature": 0.0, + "avg_logprob": -0.13036083406017673, "compression_ratio": 1.6744186046511629, "no_speech_prob": + 0.0013863056665286422}, {"id": 588, "seek": 352176, "start": 3543.2000000000003, + "end": 3547.5200000000004, "text": " compared to the time it takes to go to each + silo. After you''ve done three silos, you''re already", "tokens": [51436, 5347, + 281, 264, 565, 309, 2516, 281, 352, 281, 1184, 3425, 78, 13, 2381, 291, 600, 1096, + 1045, 48893, 11, 291, 434, 1217, 51652], "temperature": 0.0, "avg_logprob": -0.13036083406017673, + "compression_ratio": 1.6744186046511629, "no_speech_prob": 0.0013863056665286422}, + {"id": 589, "seek": 354752, "start": 3547.52, "end": 3552.32, "text": " way saving, + right? But then say, okay, look, it''s obvious to me that the best results here + are maybe", "tokens": [50364, 636, 6816, 11, 558, 30, 583, 550, 584, 11, 1392, 11, + 574, 11, 309, 311, 6322, 281, 385, 300, 264, 1151, 3542, 510, 366, 1310, 50604], + "temperature": 0.0, "avg_logprob": -0.15086651611328125, "compression_ratio": 1.62, + "no_speech_prob": 0.005448661744594574}, {"id": 590, "seek": 354752, "start": 3552.32, + "end": 3557.36, "text": " in one drive in this folder or maybe it''s in this team''s + chat or these teams chats. 
So now click,", "tokens": [50604, 294, 472, 3332, 294, + 341, 10820, 420, 1310, 309, 311, 294, 341, 1469, 311, 5081, 420, 613, 5491, 38057, + 13, 407, 586, 2052, 11, 50856], "temperature": 0.0, "avg_logprob": -0.15086651611328125, + "compression_ratio": 1.62, "no_speech_prob": 0.005448661744594574}, {"id": 591, + "seek": 354752, "start": 3557.84, "end": 3561.6, "text": " go into that environment + and hopefully you can then, right, traverse the data and get what you", "tokens": + [50880, 352, 666, 300, 2823, 293, 4696, 291, 393, 550, 11, 558, 11, 45674, 264, + 1412, 293, 483, 437, 291, 51068], "temperature": 0.0, "avg_logprob": -0.15086651611328125, + "compression_ratio": 1.62, "no_speech_prob": 0.005448661744594574}, {"id": 592, + "seek": 354752, "start": 3561.6, "end": 3569.2, "text": " actually need. And down + the road, when those repositories are serving up answers, right? We have", "tokens": + [51068, 767, 643, 13, 400, 760, 264, 3060, 11, 562, 729, 22283, 2083, 366, 8148, + 493, 6338, 11, 558, 30, 492, 362, 51448], "temperature": 0.0, "avg_logprob": -0.15086651611328125, + "compression_ratio": 1.62, "no_speech_prob": 0.005448661744594574}, {"id": 593, + "seek": 354752, "start": 3569.2, "end": 3575.04, "text": " mentioned chat, GPT much, + but I assume you''ve seen the Microsoft co-pilot demo. How long before", "tokens": + [51448, 2835, 5081, 11, 26039, 51, 709, 11, 457, 286, 6552, 291, 600, 1612, 264, + 8116, 598, 12, 79, 31516, 10723, 13, 1012, 938, 949, 51740], "temperature": 0.0, + "avg_logprob": -0.15086651611328125, "compression_ratio": 1.62, "no_speech_prob": + 0.005448661744594574}, {"id": 594, "seek": 357504, "start": 3575.04, "end": 3578.72, + "text": " that''s pushing the data back, as opposed to you asking for it, right? 
+ It''s saying, oh, here''s", "tokens": [50364, 300, 311, 7380, 264, 1412, 646, 11, + 382, 8851, 281, 291, 3365, 337, 309, 11, 558, 30, 467, 311, 1566, 11, 1954, 11, + 510, 311, 50548], "temperature": 0.0, "avg_logprob": -0.11402137162255459, "compression_ratio": + 1.6909722222222223, "no_speech_prob": 0.003813371295109391}, {"id": 595, "seek": + 357504, "start": 3578.72, "end": 3584.16, "text": " the summary you need today. + If you knew what to tell it, it could probably do that for you. So I", "tokens": + [50548, 264, 12691, 291, 643, 965, 13, 759, 291, 2586, 437, 281, 980, 309, 11, 309, + 727, 1391, 360, 300, 337, 291, 13, 407, 286, 50820], "temperature": 0.0, "avg_logprob": + -0.11402137162255459, "compression_ratio": 1.6909722222222223, "no_speech_prob": + 0.003813371295109391}, {"id": 596, "seek": 357504, "start": 3584.16, "end": 3589.84, + "text": " think that''s the new landscape. The much more important thing than the + one search box to rule them all", "tokens": [50820, 519, 300, 311, 264, 777, 9661, + 13, 440, 709, 544, 1021, 551, 813, 264, 472, 3164, 2424, 281, 4978, 552, 439, 51104], + "temperature": 0.0, "avg_logprob": -0.11402137162255459, "compression_ratio": 1.6909722222222223, + "no_speech_prob": 0.003813371295109391}, {"id": 597, "seek": 357504, "start": 3589.84, + "end": 3594.72, "text": " is to use the power of meta-search to connect systems + together and deliver information to the stuff", "tokens": [51104, 307, 281, 764, + 264, 1347, 295, 19616, 12, 405, 1178, 281, 1745, 3652, 1214, 293, 4239, 1589, 281, + 264, 1507, 51348], "temperature": 0.0, "avg_logprob": -0.11402137162255459, "compression_ratio": + 1.6909722222222223, "no_speech_prob": 0.003813371295109391}, {"id": 598, "seek": + 357504, "start": 3594.72, "end": 3601.2, "text": " you have already, to the workflows + that work and make value already. 
Whether that''s Slack or,", "tokens": [51348, + 291, 362, 1217, 11, 281, 264, 43461, 300, 589, 293, 652, 2158, 1217, 13, 8503, 300, + 311, 37211, 420, 11, 51672], "temperature": 0.0, "avg_logprob": -0.11402137162255459, + "compression_ratio": 1.6909722222222223, "no_speech_prob": 0.003813371295109391}, + {"id": 599, "seek": 360120, "start": 3602.0, "end": 3608.64, "text": " a newsletter + or a notification to a Salesforce queue, that''s the what you should do. The world", + "tokens": [50404, 257, 26469, 420, 257, 11554, 281, 257, 40398, 18639, 11, 300, + 311, 264, 437, 291, 820, 360, 13, 440, 1002, 50736], "temperature": 0.0, "avg_logprob": + -0.20692779620488486, "compression_ratio": 1.6302521008403361, "no_speech_prob": + 0.010704675689339638}, {"id": 600, "seek": 360120, "start": 3608.64, "end": 3616.24, + "text": " doesn''t need another search you are. Yeah, especially like today, I saw + a message on Slack for one", "tokens": [50736, 1177, 380, 643, 1071, 3164, 291, + 366, 13, 865, 11, 2318, 411, 965, 11, 286, 1866, 257, 3636, 322, 37211, 337, 472, + 51116], "temperature": 0.0, "avg_logprob": -0.20692779620488486, "compression_ratio": + 1.6302521008403361, "no_speech_prob": 0.010704675689339638}, {"id": 601, "seek": + 360120, "start": 3616.24, "end": 3624.08, "text": " of the senior managers saying, + hey, what''s the password to this thing? 
And I can imagine that", "tokens": [51116, + 295, 264, 7965, 14084, 1566, 11, 4177, 11, 437, 311, 264, 11524, 281, 341, 551, + 30, 400, 286, 393, 3811, 300, 51508], "temperature": 0.0, "avg_logprob": -0.20692779620488486, + "compression_ratio": 1.6302521008403361, "no_speech_prob": 0.010704675689339638}, + {"id": 602, "seek": 360120, "start": 3624.08, "end": 3630.64, "text": " in the business + schedules, you know, they don''t have access, they don''t have the password right + now,", "tokens": [51508, 294, 264, 1606, 28078, 11, 291, 458, 11, 436, 500, 380, + 362, 2105, 11, 436, 500, 380, 362, 264, 11524, 558, 586, 11, 51836], "temperature": + 0.0, "avg_logprob": -0.20692779620488486, "compression_ratio": 1.6302521008403361, + "no_speech_prob": 0.010704675689339638}, {"id": 603, "seek": 363064, "start": 3630.64, + "end": 3635.3599999999997, "text": " they will switch to another topic, but maybe + this topic was still important and maybe even one", "tokens": [50364, 436, 486, + 3679, 281, 1071, 4829, 11, 457, 1310, 341, 4829, 390, 920, 1021, 293, 1310, 754, + 472, 50600], "temperature": 0.0, "avg_logprob": -0.09891825850291919, "compression_ratio": + 1.676595744680851, "no_speech_prob": 0.0022112398874014616}, {"id": 604, "seek": + 363064, "start": 3635.3599999999997, "end": 3642.08, "text": " important, but they + just don''t want to wait. And what you say is that in principle, they could have", + "tokens": [50600, 1021, 11, 457, 436, 445, 500, 380, 528, 281, 1699, 13, 400, 437, + 291, 584, 307, 300, 294, 8665, 11, 436, 727, 362, 50936], "temperature": 0.0, "avg_logprob": + -0.09891825850291919, "compression_ratio": 1.676595744680851, "no_speech_prob": + 0.0022112398874014616}, {"id": 605, "seek": 363064, "start": 3642.08, "end": 3654.0, + "text": " configured it once and access it as many times as they need. Exactly. + Exactly. 
And it''s not uncommon in", "tokens": [50936, 30538, 309, 1564, 293, 2105, + 309, 382, 867, 1413, 382, 436, 643, 13, 7587, 13, 7587, 13, 400, 309, 311, 406, + 29289, 294, 51532], "temperature": 0.0, "avg_logprob": -0.09891825850291919, "compression_ratio": + 1.676595744680851, "no_speech_prob": 0.0022112398874014616}, {"id": 606, "seek": + 363064, "start": 3654.0, "end": 3659.7599999999998, "text": " the world of, you + know, consulting, strategic consulting, tech strategy, that the most powerful", + "tokens": [51532, 264, 1002, 295, 11, 291, 458, 11, 23682, 11, 10924, 23682, 11, + 7553, 5206, 11, 300, 264, 881, 4005, 51820], "temperature": 0.0, "avg_logprob": + -0.09891825850291919, "compression_ratio": 1.676595744680851, "no_speech_prob": + 0.0022112398874014616}, {"id": 607, "seek": 365976, "start": 3659.76, "end": 3666.6400000000003, + "text": " people are analysts and admins because, you know, partners are very busy, + right, talking to and", "tokens": [50364, 561, 366, 31388, 293, 5910, 1292, 570, + 11, 291, 458, 11, 4462, 366, 588, 5856, 11, 558, 11, 1417, 281, 293, 50708], "temperature": + 0.0, "avg_logprob": -0.12218938271204631, "compression_ratio": 1.5617529880478087, + "no_speech_prob": 0.0026510958559811115}, {"id": 608, "seek": 365976, "start": 3666.6400000000003, + "end": 3673.0400000000004, "text": " solving client problems and finding new ones. + So they rely on those folks to have access to all the", "tokens": [50708, 12606, + 6423, 2740, 293, 5006, 777, 2306, 13, 407, 436, 10687, 322, 729, 4024, 281, 362, + 2105, 281, 439, 264, 51028], "temperature": 0.0, "avg_logprob": -0.12218938271204631, + "compression_ratio": 1.5617529880478087, "no_speech_prob": 0.0026510958559811115}, + {"id": 609, "seek": 365976, "start": 3673.0400000000004, "end": 3679.5200000000004, + "text": " systems and to go scour them. And of course, that''s a waste, right? 
Probably + nobody loves scouring", "tokens": [51028, 3652, 293, 281, 352, 795, 396, 552, 13, + 400, 295, 1164, 11, 300, 311, 257, 5964, 11, 558, 30, 9210, 5079, 6752, 795, 40510, + 51352], "temperature": 0.0, "avg_logprob": -0.12218938271204631, "compression_ratio": + 1.5617529880478087, "no_speech_prob": 0.0026510958559811115}, {"id": 610, "seek": + 365976, "start": 3679.5200000000004, "end": 3686.96, "text": " those silos, but + even more, we cannot be 100% systematic all the time. But with technologies like", + "tokens": [51352, 729, 48893, 11, 457, 754, 544, 11, 321, 2644, 312, 2319, 4, 27249, + 439, 264, 565, 13, 583, 365, 7943, 411, 51724], "temperature": 0.0, "avg_logprob": + -0.12218938271204631, "compression_ratio": 1.5617529880478087, "no_speech_prob": + 0.0026510958559811115}, {"id": 611, "seek": 368696, "start": 3686.96, "end": 3691.68, + "text": " meta-search and push technologies and there''s a million things you could + use and there''s a million", "tokens": [50364, 19616, 12, 405, 1178, 293, 2944, + 7943, 293, 456, 311, 257, 2459, 721, 291, 727, 764, 293, 456, 311, 257, 2459, 50600], + "temperature": 0.0, "avg_logprob": -0.1458443986608627, "compression_ratio": 1.6909871244635193, + "no_speech_prob": 0.003051463281735778}, {"id": 612, "seek": 368696, "start": 3691.68, + "end": 3698.96, "text": " ways to deliver those things, the opportunity is really + there to let those people work on something", "tokens": [50600, 2098, 281, 4239, + 729, 721, 11, 264, 2650, 307, 534, 456, 281, 718, 729, 561, 589, 322, 746, 50964], + "temperature": 0.0, "avg_logprob": -0.1458443986608627, "compression_ratio": 1.6909871244635193, + "no_speech_prob": 0.003051463281735778}, {"id": 613, "seek": 368696, "start": 3698.96, + "end": 3703.84, "text": " else, right, to create value in other ways and not just + be scouring it for everything that''s relevant,", "tokens": [50964, 1646, 11, 558, + 11, 281, 1884, 2158, 294, 661, 2098, 293, 406, 445, 312, 795, 40510, 309, 337, 
1203, + 300, 311, 7340, 11, 51208], "temperature": 0.0, "avg_logprob": -0.1458443986608627, + "compression_ratio": 1.6909871244635193, "no_speech_prob": 0.003051463281735778}, + {"id": 614, "seek": 368696, "start": 3703.84, "end": 3711.44, "text": " for every, + you know, give the best chance. Yeah, absolutely. And how do you view the problem", + "tokens": [51208, 337, 633, 11, 291, 458, 11, 976, 264, 1151, 2931, 13, 865, 11, + 3122, 13, 400, 577, 360, 291, 1910, 264, 1154, 51588], "temperature": 0.0, "avg_logprob": + -0.1458443986608627, "compression_ratio": 1.6909871244635193, "no_speech_prob": + 0.003051463281735778}, {"id": 615, "seek": 371144, "start": 3712.2400000000002, + "end": 3718.48, "text": " of, or do you think it''s a problem at all, of evolving + such a search engine, you know, like,", "tokens": [50404, 295, 11, 420, 360, 291, + 519, 309, 311, 257, 1154, 412, 439, 11, 295, 21085, 1270, 257, 3164, 2848, 11, 291, + 458, 11, 411, 11, 50716], "temperature": 0.0, "avg_logprob": -0.20866667506206465, + "compression_ratio": 1.5509259259259258, "no_speech_prob": 0.009355876594781876}, + {"id": 616, "seek": 371144, "start": 3719.36, "end": 3727.04, "text": " if, if I + have the main experts who could actually label results for me for these queries,", + "tokens": [50760, 498, 11, 498, 286, 362, 264, 2135, 8572, 567, 727, 767, 7645, + 3542, 337, 385, 337, 613, 24109, 11, 51144], "temperature": 0.0, "avg_logprob": + -0.20866667506206465, "compression_ratio": 1.5509259259259258, "no_speech_prob": + 0.009355876594781876}, {"id": 617, "seek": 371144, "start": 3727.84, "end": 3732.56, + "text": " could I somehow integrate this into the process with swirl?", "tokens": + [51184, 727, 286, 6063, 13365, 341, 666, 264, 1399, 365, 30310, 30, 51420], "temperature": + 0.0, "avg_logprob": -0.20866667506206465, "compression_ratio": 1.5509259259259258, + "no_speech_prob": 0.009355876594781876}, {"id": 618, "seek": 371144, "start": 3733.6, + "end": 3740.32, "text": " 
Absolutely. So that brings me actually nice lead into + the third use case that the people are", "tokens": [51472, 7021, 13, 407, 300, 5607, + 385, 767, 1481, 1477, 666, 264, 2636, 764, 1389, 300, 264, 561, 366, 51808], "temperature": + 0.0, "avg_logprob": -0.20866667506206465, "compression_ratio": 1.5509259259259258, + "no_speech_prob": 0.009355876594781876}, {"id": 619, "seek": 374032, "start": 3740.32, + "end": 3746.48, "text": " starting to look at with swirl. So exactly what you said, + maybe I''m trying to build the chat GPT", "tokens": [50364, 2891, 281, 574, 412, + 365, 30310, 13, 407, 2293, 437, 291, 848, 11, 1310, 286, 478, 1382, 281, 1322, 264, + 5081, 26039, 51, 50672], "temperature": 0.0, "avg_logprob": -0.12266995587686855, + "compression_ratio": 1.6838487972508591, "no_speech_prob": 0.00021481956355273724}, + {"id": 620, "seek": 374032, "start": 3746.48, "end": 3750.96, "text": " of my business, + okay, maybe it doesn''t even have to be that, maybe it has to be something,", "tokens": + [50672, 295, 452, 1606, 11, 1392, 11, 1310, 309, 1177, 380, 754, 362, 281, 312, + 300, 11, 1310, 309, 575, 281, 312, 746, 11, 50896], "temperature": 0.0, "avg_logprob": + -0.12266995587686855, "compression_ratio": 1.6838487972508591, "no_speech_prob": + 0.00021481956355273724}, {"id": 621, "seek": 374032, "start": 3750.96, "end": 3757.28, + "text": " even a simpler version. How would I automate handling of an exception + when processing a mortgage,", "tokens": [50896, 754, 257, 18587, 3037, 13, 1012, + 576, 286, 31605, 13175, 295, 364, 11183, 562, 9007, 257, 20236, 11, 51212], "temperature": + 0.0, "avg_logprob": -0.12266995587686855, "compression_ratio": 1.6838487972508591, + "no_speech_prob": 0.00021481956355273724}, {"id": 622, "seek": 374032, "start": + 3757.28, "end": 3762.88, "text": " as an example? How could I automate that? That''s + really hard. 
That is probably not a rules-based system.", "tokens": [51212, 382, + 364, 1365, 30, 1012, 727, 286, 31605, 300, 30, 663, 311, 534, 1152, 13, 663, 307, + 1391, 406, 257, 4474, 12, 6032, 1185, 13, 51492], "temperature": 0.0, "avg_logprob": + -0.12266995587686855, "compression_ratio": 1.6838487972508591, "no_speech_prob": + 0.00021481956355273724}, {"id": 623, "seek": 374032, "start": 3763.84, "end": 3768.8, + "text": " But it''s exactly what you said. I need labels, right? So you''re going + to have your humans go scour,", "tokens": [51540, 583, 309, 311, 2293, 437, 291, + 848, 13, 286, 643, 16949, 11, 558, 30, 407, 291, 434, 516, 281, 362, 428, 6255, + 352, 795, 396, 11, 51788], "temperature": 0.0, "avg_logprob": -0.12266995587686855, + "compression_ratio": 1.6838487972508591, "no_speech_prob": 0.00021481956355273724}, + {"id": 624, "seek": 376880, "start": 3769.6000000000004, "end": 3774.8, "text": + " whatever, the various locations slack and teams and various products. And hopefully + they find", "tokens": [50404, 2035, 11, 264, 3683, 9253, 29767, 293, 5491, 293, + 3683, 3383, 13, 400, 4696, 436, 915, 50664], "temperature": 0.0, "avg_logprob": + -0.14234038073607166, "compression_ratio": 1.634453781512605, "no_speech_prob": + 0.0040127853862941265}, {"id": 625, "seek": 376880, "start": 3774.8, "end": 3780.6400000000003, + "text": " them and they label them. Why not use MetaSearch for that? 
So if you can + MetaSearch those things and use", "tokens": [50664, 552, 293, 436, 7645, 552, 13, + 1545, 406, 764, 6377, 64, 10637, 1178, 337, 300, 30, 407, 498, 291, 393, 6377, 64, + 10637, 1178, 729, 721, 293, 764, 50956], "temperature": 0.0, "avg_logprob": -0.14234038073607166, + "compression_ratio": 1.634453781512605, "no_speech_prob": 0.0040127853862941265}, + {"id": 626, "seek": 376880, "start": 3780.6400000000003, "end": 3786.96, "text": + " the language model, right, to basically say, I''m going to label anything over + a certain score", "tokens": [50956, 264, 2856, 2316, 11, 558, 11, 281, 1936, 584, + 11, 286, 478, 516, 281, 7645, 1340, 670, 257, 1629, 6175, 51272], "temperature": + 0.0, "avg_logprob": -0.14234038073607166, "compression_ratio": 1.634453781512605, + "no_speech_prob": 0.0040127853862941265}, {"id": 627, "seek": 376880, "start": 3787.6800000000003, + "end": 3792.6400000000003, "text": " as being about this thing, then I give it a + bunch of labels, let it hit, get a bunch of targets,", "tokens": [51308, 382, 885, + 466, 341, 551, 11, 550, 286, 976, 309, 257, 3840, 295, 16949, 11, 718, 309, 2045, + 11, 483, 257, 3840, 295, 12911, 11, 51556], "temperature": 0.0, "avg_logprob": -0.14234038073607166, + "compression_ratio": 1.634453781512605, "no_speech_prob": 0.0040127853862941265}, + {"id": 628, "seek": 379264, "start": 3792.64, "end": 3797.68, "text": " let it go, + find those things, hold the documents, because you will need the documents.", "tokens": + [50364, 718, 309, 352, 11, 915, 729, 721, 11, 1797, 264, 8512, 11, 570, 291, 486, + 643, 264, 8512, 13, 50616], "temperature": 0.0, "avg_logprob": -0.14220977056594122, + "compression_ratio": 1.829875518672199, "no_speech_prob": 0.0009466343908570707}, + {"id": 629, "seek": 379264, "start": 3798.56, "end": 3804.3199999999997, "text": + " The difficulty of pulling documents compared to searching documents in M365 is + one permission.", "tokens": [50660, 440, 10360, 295, 8407, 8512, 5347, 281, 
10808, + 8512, 294, 376, 11309, 20, 307, 472, 11226, 13, 50948], "temperature": 0.0, "avg_logprob": + -0.14220977056594122, "compression_ratio": 1.829875518672199, "no_speech_prob": + 0.0009466343908570707}, {"id": 630, "seek": 379264, "start": 3805.52, "end": 3810.3199999999997, + "text": " So we are today, right, if you install swirl against M365, against your + tenant,", "tokens": [51008, 407, 321, 366, 965, 11, 558, 11, 498, 291, 3625, 30310, + 1970, 376, 11309, 20, 11, 1970, 428, 31000, 11, 51248], "temperature": 0.0, "avg_logprob": + -0.14220977056594122, "compression_ratio": 1.829875518672199, "no_speech_prob": + 0.0009466343908570707}, {"id": 631, "seek": 379264, "start": 3810.3199999999997, + "end": 3816.3199999999997, "text": " you are granting permission for swirl on behalf + of some user, right, to search through the one", "tokens": [51248, 291, 366, 50204, + 11226, 337, 30310, 322, 9490, 295, 512, 4195, 11, 558, 11, 281, 3164, 807, 264, + 472, 51548], "temperature": 0.0, "avg_logprob": -0.14220977056594122, "compression_ratio": + 1.829875518672199, "no_speech_prob": 0.0009466343908570707}, {"id": 632, "seek": + 379264, "start": 3816.3199999999997, "end": 3822.3199999999997, "text": " drive + files. So you could also give a permission to fetch those files. 
So use swirl,", + "tokens": [51548, 3332, 7098, 13, 407, 291, 727, 611, 976, 257, 11226, 281, 23673, + 729, 7098, 13, 407, 764, 30310, 11, 51848], "temperature": 0.0, "avg_logprob": -0.14220977056594122, + "compression_ratio": 1.829875518672199, "no_speech_prob": 0.0009466343908570707}, + {"id": 633, "seek": 382264, "start": 3822.7999999999997, "end": 3828.7999999999997, + "text": " to find the documents that are about the exception handling across silos, + label the ones that are", "tokens": [50372, 281, 915, 264, 8512, 300, 366, 466, + 264, 11183, 13175, 2108, 48893, 11, 7645, 264, 2306, 300, 366, 50672], "temperature": + 0.0, "avg_logprob": -0.1558786372548526, "compression_ratio": 1.625514403292181, + "no_speech_prob": 0.00024645490339025855}, {"id": 634, "seek": 382264, "start": + 3828.7999999999997, "end": 3834.64, "text": " above a certain threshold. Perhaps + you could display those in a UI and let the, let the analyst check", "tokens": [50672, + 3673, 257, 1629, 14678, 13, 10517, 291, 727, 4674, 729, 294, 257, 15682, 293, 718, + 264, 11, 718, 264, 19085, 1520, 50964], "temperature": 0.0, "avg_logprob": -0.1558786372548526, + "compression_ratio": 1.625514403292181, "no_speech_prob": 0.00024645490339025855}, + {"id": 635, "seek": 382264, "start": 3834.64, "end": 3840.16, "text": " the labels, + you could use a cool tool like Prodigy as an example, right, from explosion, the + same", "tokens": [50964, 264, 16949, 11, 291, 727, 764, 257, 1627, 2290, 411, 1705, + 25259, 88, 382, 364, 1365, 11, 558, 11, 490, 15673, 11, 264, 912, 51240], "temperature": + 0.0, "avg_logprob": -0.1558786372548526, "compression_ratio": 1.625514403292181, + "no_speech_prob": 0.00024645490339025855}, {"id": 636, "seek": 382264, "start": + 3840.16, "end": 3848.4, "text": " folks who make spacey, which is what we use in + in swirl. 
And I think from there you would say you", "tokens": [51240, 4024, 567, + 652, 1901, 88, 11, 597, 307, 437, 321, 764, 294, 294, 30310, 13, 400, 286, 519, + 490, 456, 291, 576, 584, 291, 51652], "temperature": 0.0, "avg_logprob": -0.1558786372548526, + "compression_ratio": 1.625514403292181, "no_speech_prob": 0.00024645490339025855}, + {"id": 637, "seek": 384840, "start": 3848.7200000000003, "end": 3853.04, "text": + " if you trusted the labels, if the labels were good enough, you could actually + do your first run,", "tokens": [50380, 498, 291, 16034, 264, 16949, 11, 498, 264, + 16949, 645, 665, 1547, 11, 291, 727, 767, 360, 428, 700, 1190, 11, 50596], "temperature": + 0.0, "avg_logprob": -0.17304945373535155, "compression_ratio": 1.745644599303136, + "no_speech_prob": 0.0018037580884993076}, {"id": 638, "seek": 384840, "start": 3853.04, + "end": 3859.6, "text": " right, take 25 or 40% or whatever your preferred number + of the labeled results out, build the machine", "tokens": [50596, 558, 11, 747, + 3552, 420, 3356, 4, 420, 2035, 428, 16494, 1230, 295, 264, 21335, 3542, 484, 11, + 1322, 264, 3479, 50924], "temperature": 0.0, "avg_logprob": -0.17304945373535155, + "compression_ratio": 1.745644599303136, "no_speech_prob": 0.0018037580884993076}, + {"id": 639, "seek": 384840, "start": 3859.6, "end": 3865.2000000000003, "text": + " learning model with the rest, and then test with the, you know, with the holdout + set, do the, you know,", "tokens": [50924, 2539, 2316, 365, 264, 1472, 11, 293, + 550, 1500, 365, 264, 11, 291, 458, 11, 365, 264, 1797, 346, 992, 11, 360, 264, 11, + 291, 458, 11, 51204], "temperature": 0.0, "avg_logprob": -0.17304945373535155, "compression_ratio": + 1.745644599303136, "no_speech_prob": 0.0018037580884993076}, {"id": 640, "seek": + 384840, "start": 3865.2000000000003, "end": 3870.88, "text": " if it''s bad, build + a confusion matrix, etc, etc. There you go. 
And at least now you''re reviewing and", + "tokens": [51204, 498, 309, 311, 1578, 11, 1322, 257, 15075, 8141, 11, 5183, 11, + 5183, 13, 821, 291, 352, 13, 400, 412, 1935, 586, 291, 434, 19576, 293, 51488], + "temperature": 0.0, "avg_logprob": -0.17304945373535155, "compression_ratio": 1.745644599303136, + "no_speech_prob": 0.0018037580884993076}, {"id": 641, "seek": 384840, "start": 3870.88, + "end": 3874.96, "text": " refining and adjusting the threshold as opposed to going + and starting with hand labeling of data.", "tokens": [51488, 1895, 1760, 293, 23559, + 264, 14678, 382, 8851, 281, 516, 293, 2891, 365, 1011, 40244, 295, 1412, 13, 51692], + "temperature": 0.0, "avg_logprob": -0.17304945373535155, "compression_ratio": 1.745644599303136, + "no_speech_prob": 0.0018037580884993076}, {"id": 642, "seek": 387496, "start": 3875.92, + "end": 3880.32, "text": " Yeah. Yeah. That''s, that''s a great application for meta + search and language models.", "tokens": [50412, 865, 13, 865, 13, 663, 311, 11, + 300, 311, 257, 869, 3861, 337, 19616, 3164, 293, 2856, 5245, 13, 50632], "temperature": + 0.0, "avg_logprob": -0.2142306926638581, "compression_ratio": 1.6575342465753424, + "no_speech_prob": 0.025662627071142197}, {"id": 643, "seek": 387496, "start": 3880.32, + "end": 3885.28, "text": " Exactly. 
And you explain basically kind of like in the, + in the most straightforward way, you know,", "tokens": [50632, 7587, 13, 400, 291, + 2903, 1936, 733, 295, 411, 294, 264, 11, 294, 264, 881, 15325, 636, 11, 291, 458, + 11, 50880], "temperature": 0.0, "avg_logprob": -0.2142306926638581, "compression_ratio": + 1.6575342465753424, "no_speech_prob": 0.025662627071142197}, {"id": 644, "seek": + 387496, "start": 3885.28, "end": 3891.04, "text": " in a machine learning, out of + machine learning model training testing validation, right,", "tokens": [50880, 294, + 257, 3479, 2539, 11, 484, 295, 3479, 2539, 2316, 3097, 4997, 24071, 11, 558, 11, + 51168], "temperature": 0.0, "avg_logprob": -0.2142306926638581, "compression_ratio": + 1.6575342465753424, "no_speech_prob": 0.025662627071142197}, {"id": 645, "seek": + 387496, "start": 3892.0, "end": 3899.28, "text": " which doesn''t escape in the + search world doesn''t escape from this. I think this is amazing.", "tokens": [51216, + 597, 1177, 380, 7615, 294, 264, 3164, 1002, 1177, 380, 7615, 490, 341, 13, 286, + 519, 341, 307, 2243, 13, 51580], "temperature": 0.0, "avg_logprob": -0.2142306926638581, + "compression_ratio": 1.6575342465753424, "no_speech_prob": 0.025662627071142197}, + {"id": 646, "seek": 389928, "start": 3900.2400000000002, "end": 3910.2400000000002, + "text": " You chose as the model for your product, open source. You have some thoughts + on this. 
I really like", "tokens": [50412, 509, 5111, 382, 264, 2316, 337, 428, + 1674, 11, 1269, 4009, 13, 509, 362, 512, 4598, 322, 341, 13, 286, 534, 411, 50912], + "temperature": 0.0, "avg_logprob": -0.16929204647357649, "compression_ratio": 1.4545454545454546, + "no_speech_prob": 0.03539002314209938}, {"id": 647, "seek": 389928, "start": 3910.2400000000002, + "end": 3916.5600000000004, "text": " this question when I asked this to, I think + I asked it to Bob Van Lloyd from VV8 as well.", "tokens": [50912, 341, 1168, 562, + 286, 2351, 341, 281, 11, 286, 519, 286, 2351, 309, 281, 6085, 8979, 31401, 490, + 691, 53, 23, 382, 731, 13, 51228], "temperature": 0.0, "avg_logprob": -0.16929204647357649, + "compression_ratio": 1.4545454545454546, "no_speech_prob": 0.03539002314209938}, + {"id": 648, "seek": 389928, "start": 3917.2000000000003, "end": 3923.52, "text": + " You know, why did you guys, you know, looking at your competition, let''s say + Pinecon didn''t choose", "tokens": [51260, 509, 458, 11, 983, 630, 291, 1074, 11, + 291, 458, 11, 1237, 412, 428, 6211, 11, 718, 311, 584, 33531, 1671, 994, 380, 2826, + 51576], "temperature": 0.0, "avg_logprob": -0.16929204647357649, "compression_ratio": + 1.4545454545454546, "no_speech_prob": 0.03539002314209938}, {"id": 649, "seek": + 392352, "start": 3923.52, "end": 3929.04, "text": " open source for some reasons + that are valid for them, but you guys did. And so did you,", "tokens": [50364, 1269, + 4009, 337, 512, 4112, 300, 366, 7363, 337, 552, 11, 457, 291, 1074, 630, 13, 400, + 370, 630, 291, 11, 50640], "temperature": 0.0, "avg_logprob": -0.17529916763305664, + "compression_ratio": 1.6194690265486726, "no_speech_prob": 0.014030051417648792}, + {"id": 650, "seek": 392352, "start": 3930.48, "end": 3939.52, "text": " what makes + you believe in this model work better? 
Because in some sense, it does require a + lot of", "tokens": [50712, 437, 1669, 291, 1697, 294, 341, 2316, 589, 1101, 30, + 1436, 294, 512, 2020, 11, 309, 775, 3651, 257, 688, 295, 51164], "temperature": + 0.0, "avg_logprob": -0.17529916763305664, "compression_ratio": 1.6194690265486726, + "no_speech_prob": 0.014030051417648792}, {"id": 651, "seek": 392352, "start": 3939.52, + "end": 3944.4, "text": " like public facing work, right? You need to explain, you + need to document, you need to", "tokens": [51164, 411, 1908, 7170, 589, 11, 558, + 30, 509, 643, 281, 2903, 11, 291, 643, 281, 4166, 11, 291, 643, 281, 51408], "temperature": + 0.0, "avg_logprob": -0.17529916763305664, "compression_ratio": 1.6194690265486726, + "no_speech_prob": 0.014030051417648792}, {"id": 652, "seek": 392352, "start": 3945.04, + "end": 3951.2, "text": " review pull requests with all the goodies that come with + this, of course, right? But there is", "tokens": [51440, 3131, 2235, 12475, 365, + 439, 264, 44072, 300, 808, 365, 341, 11, 295, 1164, 11, 558, 30, 583, 456, 307, + 51748], "temperature": 0.0, "avg_logprob": -0.17529916763305664, "compression_ratio": + 1.6194690265486726, "no_speech_prob": 0.014030051417648792}, {"id": 653, "seek": + 395120, "start": 3951.3599999999997, "end": 3958.3199999999997, "text": " an extra + work involved, but you definitely get some benefits. What is your thinking here?", + "tokens": [50372, 364, 2857, 589, 3288, 11, 457, 291, 2138, 483, 512, 5311, 13, + 708, 307, 428, 1953, 510, 30, 50720], "temperature": 0.0, "avg_logprob": -0.12645489519292658, + "compression_ratio": 1.524390243902439, "no_speech_prob": 0.003641559509560466}, + {"id": 654, "seek": 395120, "start": 3960.56, "end": 3967.52, "text": " The truth + is I''ve been an open source person forever. 
I just believe in it, whether it was,", + "tokens": [50832, 440, 3494, 307, 286, 600, 668, 364, 1269, 4009, 954, 5680, 13, + 286, 445, 1697, 294, 309, 11, 1968, 309, 390, 11, 51180], "temperature": 0.0, "avg_logprob": + -0.12645489519292658, "compression_ratio": 1.524390243902439, "no_speech_prob": + 0.003641559509560466}, {"id": 655, "seek": 395120, "start": 3968.3199999999997, + "end": 3973.7599999999998, "text": " Jeff Hammerbacker''s amazing comment about + how it''s too bad that everyone''s spending their time", "tokens": [51220, 7506, + 33722, 3207, 260, 311, 2243, 2871, 466, 577, 309, 311, 886, 1578, 300, 1518, 311, + 6434, 641, 565, 51492], "temperature": 0.0, "avg_logprob": -0.12645489519292658, + "compression_ratio": 1.524390243902439, "no_speech_prob": 0.003641559509560466}, + {"id": 656, "seek": 395120, "start": 3973.7599999999998, "end": 3979.7599999999998, + "text": " on clicks, right? And he believes that the data science approach benefits + hugely from open source.", "tokens": [51492, 322, 18521, 11, 558, 30, 400, 415, + 12307, 300, 264, 1412, 3497, 3109, 5311, 27417, 490, 1269, 4009, 13, 51792], "temperature": + 0.0, "avg_logprob": -0.12645489519292658, "compression_ratio": 1.524390243902439, + "no_speech_prob": 0.003641559509560466}, {"id": 657, "seek": 397976, "start": 3979.76, + "end": 3986.8, "text": " That''s so true. Joseph Jackson, the notable VC, right, + has written so much about its open source", "tokens": [50364, 663, 311, 370, 2074, + 13, 11170, 10647, 11, 264, 22556, 41922, 11, 558, 11, 575, 3720, 370, 709, 466, + 1080, 1269, 4009, 50716], "temperature": 0.0, "avg_logprob": -0.1254420757293701, + "compression_ratio": 1.589430894308943, "no_speech_prob": 0.0014395508915185928}, + {"id": 658, "seek": 397976, "start": 3986.8, "end": 3991.84, "text": " software + that''s really eating the world. It''s eating at a considerably higher rate. 
And + the reason", "tokens": [50716, 4722, 300, 311, 534, 3936, 264, 1002, 13, 467, 311, + 3936, 412, 257, 31308, 2946, 3314, 13, 400, 264, 1778, 50968], "temperature": 0.0, + "avg_logprob": -0.1254420757293701, "compression_ratio": 1.589430894308943, "no_speech_prob": + 0.0014395508915185928}, {"id": 659, "seek": 397976, "start": 3991.84, "end": 3998.8, + "text": " I think is it''s a few things. One is trust. One is trust. You know, during + the pandemic, I think the", "tokens": [50968, 286, 519, 307, 309, 311, 257, 1326, + 721, 13, 1485, 307, 3361, 13, 1485, 307, 3361, 13, 509, 458, 11, 1830, 264, 5388, + 11, 286, 519, 264, 51316], "temperature": 0.0, "avg_logprob": -0.1254420757293701, + "compression_ratio": 1.589430894308943, "no_speech_prob": 0.0014395508915185928}, + {"id": 660, "seek": 397976, "start": 3998.8, "end": 4005.44, "text": " large enterprise + saw a lot of promising young ventures just not make it. And if you bet on one", + "tokens": [51316, 2416, 14132, 1866, 257, 688, 295, 20257, 2037, 6931, 1303, 445, + 406, 652, 309, 13, 400, 498, 291, 778, 322, 472, 51648], "temperature": 0.0, "avg_logprob": + -0.1254420757293701, "compression_ratio": 1.589430894308943, "no_speech_prob": 0.0014395508915185928}, + {"id": 661, "seek": 400544, "start": 4005.44, "end": 4009.92, "text": " of those + technologies, you probably didn''t get the technology. Or maybe you did, right? + I don''t", "tokens": [50364, 295, 729, 7943, 11, 291, 1391, 994, 380, 483, 264, + 2899, 13, 1610, 1310, 291, 630, 11, 558, 30, 286, 500, 380, 50588], "temperature": + 0.0, "avg_logprob": -0.10269281243075844, "compression_ratio": 1.7359154929577465, + "no_speech_prob": 0.0021498538553714752}, {"id": 662, "seek": 400544, "start": 4009.92, + "end": 4014.88, "text": " know, but there was a certain amount of risk involved + in that. 
And open source does, although people,", "tokens": [50588, 458, 11, 457, + 456, 390, 257, 1629, 2372, 295, 3148, 3288, 294, 300, 13, 400, 1269, 4009, 775, + 11, 4878, 561, 11, 50836], "temperature": 0.0, "avg_logprob": -0.10269281243075844, + "compression_ratio": 1.7359154929577465, "no_speech_prob": 0.0021498538553714752}, + {"id": 663, "seek": 400544, "start": 4014.88, "end": 4019.28, "text": " I don''t + think want to take the code and run with it, they want to know that they could, + if they had", "tokens": [50836, 286, 500, 380, 519, 528, 281, 747, 264, 3089, 293, + 1190, 365, 309, 11, 436, 528, 281, 458, 300, 436, 727, 11, 498, 436, 632, 51056], + "temperature": 0.0, "avg_logprob": -0.10269281243075844, "compression_ratio": 1.7359154929577465, + "no_speech_prob": 0.0021498538553714752}, {"id": 664, "seek": 400544, "start": 4019.28, + "end": 4025.68, "text": " to. The second thing though, the trust is much deeper + when you have a commercial company that supports", "tokens": [51056, 281, 13, 440, + 1150, 551, 1673, 11, 264, 3361, 307, 709, 7731, 562, 291, 362, 257, 6841, 2237, + 300, 9346, 51376], "temperature": 0.0, "avg_logprob": -0.10269281243075844, "compression_ratio": + 1.7359154929577465, "no_speech_prob": 0.0021498538553714752}, {"id": 665, "seek": + 400544, "start": 4025.68, "end": 4031.44, "text": " open source, the so-called commercial + open source model, because it does require that public", "tokens": [51376, 1269, + 4009, 11, 264, 370, 12, 11880, 6841, 1269, 4009, 2316, 11, 570, 309, 775, 3651, + 300, 1908, 51664], "temperature": 0.0, "avg_logprob": -0.10269281243075844, "compression_ratio": + 1.7359154929577465, "no_speech_prob": 0.0021498538553714752}, {"id": 666, "seek": + 403144, "start": 4031.44, "end": 4037.04, "text": " investment, that public discipline. + We''re all about people using it. 
There''s no sales.", "tokens": [50364, 6078, 11, + 300, 1908, 13635, 13, 492, 434, 439, 466, 561, 1228, 309, 13, 821, 311, 572, 5763, + 13, 50644], "temperature": 0.0, "avg_logprob": -0.11111964175575657, "compression_ratio": + 1.5862068965517242, "no_speech_prob": 0.0028009426314383745}, {"id": 667, "seek": + 403144, "start": 4037.6, "end": 4042.96, "text": " Nobody has that title. We''re + here to make people successful using it. And I''m not sure,", "tokens": [50672, + 9297, 575, 300, 4876, 13, 492, 434, 510, 281, 652, 561, 4406, 1228, 309, 13, 400, + 286, 478, 406, 988, 11, 50940], "temperature": 0.0, "avg_logprob": -0.11111964175575657, + "compression_ratio": 1.5862068965517242, "no_speech_prob": 0.0028009426314383745}, + {"id": 668, "seek": 403144, "start": 4043.76, "end": 4047.36, "text": " to be honest + with you, how all the different ways it''s going to evolve, but we want to evolve", + "tokens": [50980, 281, 312, 3245, 365, 291, 11, 577, 439, 264, 819, 2098, 309, 311, + 516, 281, 16693, 11, 457, 321, 528, 281, 16693, 51160], "temperature": 0.0, "avg_logprob": + -0.11111964175575657, "compression_ratio": 1.5862068965517242, "no_speech_prob": + 0.0028009426314383745}, {"id": 669, "seek": 403144, "start": 4048.32, "end": 4055.52, + "text": " in line with what the actual community needs. You know, I think you start + with a kernel of an idea,", "tokens": [51208, 294, 1622, 365, 437, 264, 3539, 1768, + 2203, 13, 509, 458, 11, 286, 519, 291, 722, 365, 257, 28256, 295, 364, 1558, 11, + 51568], "temperature": 0.0, "avg_logprob": -0.11111964175575657, "compression_ratio": + 1.5862068965517242, "no_speech_prob": 0.0028009426314383745}, {"id": 670, "seek": + 405552, "start": 4055.6, "end": 4061.28, "text": " and I''ve worked in search enough + to have that. 
But beyond that, it''s a collective thing.", "tokens": [50368, 293, + 286, 600, 2732, 294, 3164, 1547, 281, 362, 300, 13, 583, 4399, 300, 11, 309, 311, + 257, 12590, 551, 13, 50652], "temperature": 0.0, "avg_logprob": -0.1342171819586503, + "compression_ratio": 1.625, "no_speech_prob": 0.01445038802921772}, {"id": 671, + "seek": 405552, "start": 4062.48, "end": 4067.44, "text": " I love the way Vespa, + as an example, is so open to look at how well it''s evolving in the", "tokens": + [50712, 286, 959, 264, 636, 691, 279, 4306, 11, 382, 364, 1365, 11, 307, 370, 1269, + 281, 574, 412, 577, 731, 309, 311, 21085, 294, 264, 50960], "temperature": 0.0, + "avg_logprob": -0.1342171819586503, "compression_ratio": 1.625, "no_speech_prob": + 0.01445038802921772}, {"id": 672, "seek": 405552, "start": 4068.0, "end": 4074.48, + "text": " place that the community that needs it. I think there''s a similar community. + And what is out there", "tokens": [50988, 1081, 300, 264, 1768, 300, 2203, 309, + 13, 286, 519, 456, 311, 257, 2531, 1768, 13, 400, 437, 307, 484, 456, 51312], "temperature": + 0.0, "avg_logprob": -0.1342171819586503, "compression_ratio": 1.625, "no_speech_prob": + 0.01445038802921772}, {"id": 673, "seek": 405552, "start": 4074.48, "end": 4080.88, + "text": " for them are a bunch of potentially some good and some unknown vendors, + some interesting open source", "tokens": [51312, 337, 552, 366, 257, 3840, 295, + 7263, 512, 665, 293, 512, 9841, 22056, 11, 512, 1880, 1269, 4009, 51632], "temperature": + 0.0, "avg_logprob": -0.1342171819586503, "compression_ratio": 1.625, "no_speech_prob": + 0.01445038802921772}, {"id": 674, "seek": 408088, "start": 4080.96, "end": 4085.36, + "text": " products, some of which might take a lot of work together. 
And maybe their + stories about", "tokens": [50368, 3383, 11, 512, 295, 597, 1062, 747, 257, 688, + 295, 589, 1214, 13, 400, 1310, 641, 3676, 466, 50588], "temperature": 0.0, "avg_logprob": + -0.17879707122517524, "compression_ratio": 1.627177700348432, "no_speech_prob": + 0.006980721838772297}, {"id": 675, "seek": 408088, "start": 4086.2400000000002, + "end": 4089.84, "text": " super hot projects where there''s one committee, and they + go on vacation for two months and", "tokens": [50632, 1687, 2368, 4455, 689, 456, + 311, 472, 7482, 11, 293, 436, 352, 322, 12830, 337, 732, 2493, 293, 50812], "temperature": + 0.0, "avg_logprob": -0.17879707122517524, "compression_ratio": 1.627177700348432, + "no_speech_prob": 0.006980721838772297}, {"id": 676, "seek": 408088, "start": 4089.84, + "end": 4095.28, "text": " everything falls apart, where they lose interest after + two years, and they leave with 2000 tickets.", "tokens": [50812, 1203, 8804, 4936, + 11, 689, 436, 3624, 1179, 934, 732, 924, 11, 293, 436, 1856, 365, 8132, 12628, 13, + 51084], "temperature": 0.0, "avg_logprob": -0.17879707122517524, "compression_ratio": + 1.627177700348432, "no_speech_prob": 0.006980721838772297}, {"id": 677, "seek": + 408088, "start": 4095.28, "end": 4100.4800000000005, "text": " It''s good to know + that there''s a little commercial entity. But ultimately, aren''t the greatest", + "tokens": [51084, 467, 311, 665, 281, 458, 300, 456, 311, 257, 707, 6841, 13977, + 13, 583, 6284, 11, 3212, 380, 264, 6636, 51344], "temperature": 0.0, "avg_logprob": + -0.17879707122517524, "compression_ratio": 1.627177700348432, "no_speech_prob": + 0.006980721838772297}, {"id": 678, "seek": 408088, "start": 4100.4800000000005, + "end": 4106.4800000000005, "text": " innovations coming from open source, open AI? 
+ Most of the pieces out there, that''s why there", "tokens": [51344, 24283, 1348, + 490, 1269, 4009, 11, 1269, 7318, 30, 4534, 295, 264, 3755, 484, 456, 11, 300, 311, + 983, 456, 51644], "temperature": 0.0, "avg_logprob": -0.17879707122517524, "compression_ratio": + 1.627177700348432, "no_speech_prob": 0.006980721838772297}, {"id": 679, "seek": + 410648, "start": 4106.48, "end": 4110.959999999999, "text": " have been so many + replications. And that''s the last piece of it. It''s provable. You know,", "tokens": + [50364, 362, 668, 370, 867, 3248, 24847, 13, 400, 300, 311, 264, 1036, 2522, 295, + 309, 13, 467, 311, 1439, 712, 13, 509, 458, 11, 50588], "temperature": 0.0, "avg_logprob": + -0.18282179246869004, "compression_ratio": 1.6, "no_speech_prob": 0.005276723299175501}, + {"id": 680, "seek": 410648, "start": 4111.759999999999, "end": 4115.759999999999, + "text": " you can take my word for it. You can look at all the charts and stuff. + But with two", "tokens": [50628, 291, 393, 747, 452, 1349, 337, 309, 13, 509, 393, + 574, 412, 439, 264, 17767, 293, 1507, 13, 583, 365, 732, 50828], "temperature": + 0.0, "avg_logprob": -0.18282179246869004, "compression_ratio": 1.6, "no_speech_prob": + 0.005276723299175501}, {"id": 681, "seek": 410648, "start": 4116.5599999999995, + "end": 4120.879999999999, "text": " commands, if you have Docker running, you can + get swirl going, and you can see for yourself.", "tokens": [50868, 16901, 11, 498, + 291, 362, 33772, 2614, 11, 291, 393, 483, 30310, 516, 11, 293, 291, 393, 536, 337, + 1803, 13, 51084], "temperature": 0.0, "avg_logprob": -0.18282179246869004, "compression_ratio": + 1.6, "no_speech_prob": 0.005276723299175501}, {"id": 682, "seek": 410648, "start": + 4122.0, "end": 4126.0, "text": " Yeah. If it doesn''t do something, well, help us + make it better. 
Sorry, go ahead.", "tokens": [51140, 865, 13, 759, 309, 1177, 380, + 360, 746, 11, 731, 11, 854, 505, 652, 309, 1101, 13, 4919, 11, 352, 2286, 13, 51340], + "temperature": 0.0, "avg_logprob": -0.18282179246869004, "compression_ratio": 1.6, + "no_speech_prob": 0.005276723299175501}, {"id": 683, "seek": 410648, "start": 4126.0, + "end": 4130.08, "text": " Exactly. Exactly. No, I mean, that exactly proves it because", + "tokens": [51340, 7587, 13, 7587, 13, 883, 11, 286, 914, 11, 300, 2293, 25019, 309, + 570, 51544], "temperature": 0.0, "avg_logprob": -0.18282179246869004, "compression_ratio": + 1.6, "no_speech_prob": 0.005276723299175501}, {"id": 684, "seek": 413008, "start": + 4130.5599999999995, "end": 4138.64, "text": " however magical the software is, if + you are the engineer, you really want to like, you know,", "tokens": [50388, 4461, + 12066, 264, 4722, 307, 11, 498, 291, 366, 264, 11403, 11, 291, 534, 528, 281, 411, + 11, 291, 458, 11, 50792], "temperature": 0.0, "avg_logprob": -0.1522594041462186, + "compression_ratio": 1.554945054945055, "no_speech_prob": 0.04024715721607208}, + {"id": 685, "seek": 413008, "start": 4138.64, "end": 4146.24, "text": " open the + engine and see what''s going on there. How can I modify this? 
How can I plug this + in?", "tokens": [50792, 1269, 264, 2848, 293, 536, 437, 311, 516, 322, 456, 13, + 1012, 393, 286, 16927, 341, 30, 1012, 393, 286, 5452, 341, 294, 30, 51172], "temperature": + 0.0, "avg_logprob": -0.1522594041462186, "compression_ratio": 1.554945054945055, + "no_speech_prob": 0.04024715721607208}, {"id": 686, "seek": 413008, "start": 4147.2, + "end": 4155.2, "text": " Because if it''s not open, I guess, well, maybe someone + will blame me and say, no, this is wrong.", "tokens": [51220, 1436, 498, 309, 311, + 406, 1269, 11, 286, 2041, 11, 731, 11, 1310, 1580, 486, 10127, 385, 293, 584, 11, + 572, 11, 341, 307, 2085, 13, 51620], "temperature": 0.0, "avg_logprob": -0.1522594041462186, + "compression_ratio": 1.554945054945055, "no_speech_prob": 0.04024715721607208}, + {"id": 687, "seek": 415520, "start": 4155.28, "end": 4160.4, "text": " But you know, + if it''s like an API that I need to pay for, what''s the path for me to get", "tokens": + [50368, 583, 291, 458, 11, 498, 309, 311, 411, 364, 9362, 300, 286, 643, 281, 1689, + 337, 11, 437, 311, 264, 3100, 337, 385, 281, 483, 50624], "temperature": 0.0, "avg_logprob": + -0.13435068997469815, "compression_ratio": 1.6443661971830985, "no_speech_prob": + 0.0394461527466774}, {"id": 688, "seek": 415520, "start": 4160.639999999999, "end": + 4165.44, "text": " into hacking? Should I buy it on my own credit card? 
Or should + I call my manager and say, hey,", "tokens": [50636, 666, 31422, 30, 6454, 286, 2256, + 309, 322, 452, 1065, 5397, 2920, 30, 1610, 820, 286, 818, 452, 6598, 293, 584, 11, + 4177, 11, 50876], "temperature": 0.0, "avg_logprob": -0.13435068997469815, "compression_ratio": + 1.6443661971830985, "no_speech_prob": 0.0394461527466774}, {"id": 689, "seek": 415520, + "start": 4165.44, "end": 4170.4, "text": " can you, well, and usually what happens + if you look at Pinecon, for example, they will have,", "tokens": [50876, 393, 291, + 11, 731, 11, 293, 2673, 437, 2314, 498, 291, 574, 412, 33531, 1671, 11, 337, 1365, + 11, 436, 486, 362, 11, 51124], "temperature": 0.0, "avg_logprob": -0.13435068997469815, + "compression_ratio": 1.6443661971830985, "no_speech_prob": 0.0394461527466774}, + {"id": 690, "seek": 415520, "start": 4170.4, "end": 4176.4, "text": " they will + allocate like a free tier, right? And so you can kind of hack with free tier. If + you run", "tokens": [51124, 436, 486, 35713, 411, 257, 1737, 12362, 11, 558, 30, + 400, 370, 291, 393, 733, 295, 10339, 365, 1737, 12362, 13, 759, 291, 1190, 51424], + "temperature": 0.0, "avg_logprob": -0.13435068997469815, "compression_ratio": 1.6443661971830985, + "no_speech_prob": 0.0394461527466774}, {"id": 691, "seek": 415520, "start": 4176.4, + "end": 4182.48, "text": " out, then you''ll call your manager, I guess. Right. And + nothing wrong with that too. I mean,", "tokens": [51424, 484, 11, 550, 291, 603, + 818, 428, 6598, 11, 286, 2041, 13, 1779, 13, 400, 1825, 2085, 365, 300, 886, 13, + 286, 914, 11, 51728], "temperature": 0.0, "avg_logprob": -0.13435068997469815, "compression_ratio": + 1.6443661971830985, "no_speech_prob": 0.0394461527466774}, {"id": 692, "seek": 418248, + "start": 4182.48, "end": 4186.959999999999, "text": " but I think that that''s just + a facilitation of the try and buy process. 
It''s still a commercial", "tokens": + [50364, 457, 286, 519, 300, 300, 311, 445, 257, 10217, 4614, 295, 264, 853, 293, + 2256, 1399, 13, 467, 311, 920, 257, 6841, 50588], "temperature": 0.0, "avg_logprob": + -0.12968287392268105, "compression_ratio": 1.66, "no_speech_prob": 0.0054180677980184555}, + {"id": 693, "seek": 418248, "start": 4186.959999999999, "end": 4193.36, "text": + " company. You can''t know for sure. Right. And honestly, that works for many companies. + There''s no one", "tokens": [50588, 2237, 13, 509, 393, 380, 458, 337, 988, 13, + 1779, 13, 400, 6095, 11, 300, 1985, 337, 867, 3431, 13, 821, 311, 572, 472, 50908], + "temperature": 0.0, "avg_logprob": -0.12968287392268105, "compression_ratio": 1.66, + "no_speech_prob": 0.0054180677980184555}, {"id": 694, "seek": 418248, "start": 4193.36, + "end": 4199.12, "text": " size fits all. My point is this. I think for solving complex, + the kinds of complex multi-silow problems", "tokens": [50908, 2744, 9001, 439, 13, + 1222, 935, 307, 341, 13, 286, 519, 337, 12606, 3997, 11, 264, 3685, 295, 3997, 4825, + 12, 30605, 305, 2740, 51196], "temperature": 0.0, "avg_logprob": -0.12968287392268105, + "compression_ratio": 1.66, "no_speech_prob": 0.0054180677980184555}, {"id": 695, + "seek": 418248, "start": 4199.12, "end": 4202.879999999999, "text": " in the large + enterprise where where I have been very lucky to work before and where I think, + at", "tokens": [51196, 294, 264, 2416, 14132, 689, 689, 286, 362, 668, 588, 6356, + 281, 589, 949, 293, 689, 286, 519, 11, 412, 51384], "temperature": 0.0, "avg_logprob": + -0.12968287392268105, "compression_ratio": 1.66, "no_speech_prob": 0.0054180677980184555}, + {"id": 696, "seek": 418248, "start": 4202.879999999999, "end": 4206.799999999999, + "text": " least to some degree, I hear about the problems, right? Even if I don''t + understand them. 
I hear about", "tokens": [51384, 1935, 281, 512, 4314, 11, 286, + 1568, 466, 264, 2740, 11, 558, 30, 2754, 498, 286, 500, 380, 1223, 552, 13, 286, + 1568, 466, 51580], "temperature": 0.0, "avg_logprob": -0.12968287392268105, "compression_ratio": + 1.66, "no_speech_prob": 0.0054180677980184555}, {"id": 697, "seek": 420680, "start": + 4206.96, "end": 4214.320000000001, "text": " the problems. Open source is the winning + model because it is so tailorable. You know, no one has the", "tokens": [50372, + 264, 2740, 13, 7238, 4009, 307, 264, 8224, 2316, 570, 309, 307, 370, 6838, 284, + 712, 13, 509, 458, 11, 572, 472, 575, 264, 50740], "temperature": 0.0, "avg_logprob": + -0.11268501897012034, "compression_ratio": 1.7560975609756098, "no_speech_prob": + 0.005842366721481085}, {"id": 698, "seek": 420680, "start": 4214.320000000001, "end": + 4218.4800000000005, "text": " same thing. Everybody has seven of everything, I think, + in the large enterprise. And then there''s", "tokens": [50740, 912, 551, 13, 7646, + 575, 3407, 295, 1203, 11, 286, 519, 11, 294, 264, 2416, 14132, 13, 400, 550, 456, + 311, 50948], "temperature": 0.0, "avg_logprob": -0.11268501897012034, "compression_ratio": + 1.7560975609756098, "no_speech_prob": 0.005842366721481085}, {"id": 699, "seek": + 420680, "start": 4218.4800000000005, "end": 4223.28, "text": " regulation and compliance + regulatory systems, all that stuff. Those things are the ones, those are", "tokens": + [50948, 15062, 293, 15882, 18260, 3652, 11, 439, 300, 1507, 13, 3950, 721, 366, + 264, 2306, 11, 729, 366, 51188], "temperature": 0.0, "avg_logprob": -0.11268501897012034, + "compression_ratio": 1.7560975609756098, "no_speech_prob": 0.005842366721481085}, + {"id": 700, "seek": 420680, "start": 4223.28, "end": 4228.4800000000005, "text": + " the actual barriers. So open source is most adoptable in that regard. 
And then + I think as long as there''s", "tokens": [51188, 264, 3539, 13565, 13, 407, 1269, + 4009, 307, 881, 6878, 712, 294, 300, 3843, 13, 400, 550, 286, 519, 382, 938, 382, + 456, 311, 51448], "temperature": 0.0, "avg_logprob": -0.11268501897012034, "compression_ratio": + 1.7560975609756098, "no_speech_prob": 0.005842366721481085}, {"id": 701, "seek": + 420680, "start": 4228.4800000000005, "end": 4233.360000000001, "text": " someone + who, you know, as long as there''s some option to say, well, they''re not disappearing, + right?", "tokens": [51448, 1580, 567, 11, 291, 458, 11, 382, 938, 382, 456, 311, + 512, 3614, 281, 584, 11, 731, 11, 436, 434, 406, 34900, 11, 558, 30, 51692], "temperature": + 0.0, "avg_logprob": -0.11268501897012034, "compression_ratio": 1.7560975609756098, + "no_speech_prob": 0.005842366721481085}, {"id": 702, "seek": 423336, "start": 4233.36, + "end": 4237.44, "text": " They''re not. There''s still someone to help us who really + knows how this thing works.", "tokens": [50364, 814, 434, 406, 13, 821, 311, 920, + 1580, 281, 854, 505, 567, 534, 3255, 577, 341, 551, 1985, 13, 50568], "temperature": + 0.0, "avg_logprob": -0.25137206597056816, "compression_ratio": 1.5680272108843538, + "no_speech_prob": 0.009913946501910686}, {"id": 703, "seek": 423336, "start": 4238.0, + "end": 4245.2, "text": " It''s, it''s safe and tailorable. And that''s what''s really + driving so much of the growth,", "tokens": [50596, 467, 311, 11, 309, 311, 3273, + 293, 6838, 284, 712, 13, 400, 300, 311, 437, 311, 534, 4840, 370, 709, 295, 264, + 4599, 11, 50956], "temperature": 0.0, "avg_logprob": -0.25137206597056816, "compression_ratio": + 1.5680272108843538, "no_speech_prob": 0.009913946501910686}, {"id": 704, "seek": + 423336, "start": 4245.759999999999, "end": 4251.92, "text": " the incredible growth + in the software. Again, chat GPT, right? Paper, wood methods, not. 
It''s", "tokens": + [50984, 264, 4651, 4599, 294, 264, 4722, 13, 3764, 11, 5081, 26039, 51, 11, 558, + 30, 24990, 11, 4576, 7150, 11, 406, 13, 467, 311, 51292], "temperature": 0.0, "avg_logprob": + -0.25137206597056816, "compression_ratio": 1.5680272108843538, "no_speech_prob": + 0.009913946501910686}, {"id": 705, "seek": 423336, "start": 4251.92, "end": 4256.48, + "text": " being commercialized, but that''s no surprise. Yeah, I mean, it wouldn''t + probably exist if, like,", "tokens": [51292, 885, 6841, 1602, 11, 457, 300, 311, + 572, 6365, 13, 865, 11, 286, 914, 11, 309, 2759, 380, 1391, 2514, 498, 11, 411, + 11, 51520], "temperature": 0.0, "avg_logprob": -0.25137206597056816, "compression_ratio": + 1.5680272108843538, "no_speech_prob": 0.009913946501910686}, {"id": 706, "seek": + 423336, "start": 4256.48, "end": 4263.12, "text": " just yesterday I was hacking + something for going to bed. And it was super slow because I think it", "tokens": + [51520, 445, 5186, 286, 390, 31422, 746, 337, 516, 281, 2901, 13, 400, 309, 390, + 1687, 2964, 570, 286, 519, 309, 51852], "temperature": 0.0, "avg_logprob": -0.25137206597056816, + "compression_ratio": 1.5680272108843538, "no_speech_prob": 0.009913946501910686}, + {"id": 707, "seek": 426312, "start": 4263.12, "end": 4268.8, "text": " was US daytime. + Everyone was probably hacking there as well. But I was fine with that. It was typing", + "tokens": [50364, 390, 2546, 31908, 13, 5198, 390, 1391, 31422, 456, 382, 731, 13, + 583, 286, 390, 2489, 365, 300, 13, 467, 390, 18444, 50648], "temperature": 0.0, + "avg_logprob": -0.13506454278614896, "compression_ratio": 1.6416382252559727, "no_speech_prob": + 0.0028039070311933756}, {"id": 708, "seek": 426312, "start": 4268.8, "end": 4275.04, + "text": " slowly, giving me some code snippets. 
But could it have given me this + code snippets if they were not", "tokens": [50648, 5692, 11, 2902, 385, 512, 3089, + 35623, 1385, 13, 583, 727, 309, 362, 2212, 385, 341, 3089, 35623, 1385, 498, 436, + 645, 406, 50960], "temperature": 0.0, "avg_logprob": -0.13506454278614896, "compression_ratio": + 1.6416382252559727, "no_speech_prob": 0.0028039070311933756}, {"id": 709, "seek": + 426312, "start": 4275.04, "end": 4280.0, "text": " online, if they were not like + on GitHub or somewhere else, right? So I think it''s kind of", "tokens": [50960, + 2950, 11, 498, 436, 645, 406, 411, 322, 23331, 420, 4079, 1646, 11, 558, 30, 407, + 286, 519, 309, 311, 733, 295, 51208], "temperature": 0.0, "avg_logprob": -0.13506454278614896, + "compression_ratio": 1.6416382252559727, "no_speech_prob": 0.0028039070311933756}, + {"id": 710, "seek": 426312, "start": 4280.0, "end": 4285.44, "text": " extending + on the shoulders of giants again. Totally. I completely agree with you. And it''s", + "tokens": [51208, 24360, 322, 264, 10245, 295, 31894, 797, 13, 22837, 13, 286, 2584, + 3986, 365, 291, 13, 400, 309, 311, 51480], "temperature": 0.0, "avg_logprob": -0.13506454278614896, + "compression_ratio": 1.6416382252559727, "no_speech_prob": 0.0028039070311933756}, + {"id": 711, "seek": 426312, "start": 4285.44, "end": 4290.96, "text": " extremely + limited. Look, it was trained at least partly the noncode part, right? It was on + Reddit.", "tokens": [51480, 4664, 5567, 13, 2053, 11, 309, 390, 8895, 412, 1935, + 17031, 264, 2107, 22332, 644, 11, 558, 30, 467, 390, 322, 32210, 13, 51756], "temperature": + 0.0, "avg_logprob": -0.13506454278614896, "compression_ratio": 1.6416382252559727, + "no_speech_prob": 0.0028039070311933756}, {"id": 712, "seek": 429096, "start": 4291.6, + "end": 4297.12, "text": " It reads like Reddit. 
It has a little bit of a know it, + Ollie, you know, and it gives the sort of", "tokens": [50396, 467, 15700, 411, 32210, + 13, 467, 575, 257, 707, 857, 295, 257, 458, 309, 11, 35089, 11, 291, 458, 11, 293, + 309, 2709, 264, 1333, 295, 50672], "temperature": 0.0, "avg_logprob": -0.14392166137695311, + "compression_ratio": 1.6821428571428572, "no_speech_prob": 0.00977780856192112}, + {"id": 713, "seek": 429096, "start": 4297.12, "end": 4303.76, "text": " like consensus + answer. Now, that''s great for code as long as the consensus data is modern, current,", + "tokens": [50672, 411, 19115, 1867, 13, 823, 11, 300, 311, 869, 337, 3089, 382, + 938, 382, 264, 19115, 1412, 307, 4363, 11, 2190, 11, 51004], "temperature": 0.0, + "avg_logprob": -0.14392166137695311, "compression_ratio": 1.6821428571428572, "no_speech_prob": + 0.00977780856192112}, {"id": 714, "seek": 429096, "start": 4303.76, "end": 4309.2, + "text": " and available. So it''s never going to teach you, it probably won''t teach + you that much about enterprise", "tokens": [51004, 293, 2435, 13, 407, 309, 311, + 1128, 516, 281, 2924, 291, 11, 309, 1391, 1582, 380, 2924, 291, 300, 709, 466, 14132, + 51276], "temperature": 0.0, "avg_logprob": -0.14392166137695311, "compression_ratio": + 1.6821428571428572, "no_speech_prob": 0.00977780856192112}, {"id": 715, "seek": + 429096, "start": 4309.2, "end": 4312.96, "text": " integration patterns and enterprise + workloads. But it''ll teach you a lot about open source.", "tokens": [51276, 10980, + 8294, 293, 14132, 32452, 13, 583, 309, 603, 2924, 291, 257, 688, 466, 1269, 4009, + 13, 51464], "temperature": 0.0, "avg_logprob": -0.14392166137695311, "compression_ratio": + 1.6821428571428572, "no_speech_prob": 0.00977780856192112}, {"id": 716, "seek": + 429096, "start": 4314.4800000000005, "end": 4317.68, "text": " I write with it, + I try to write with it almost every day. 
And I can say this,", "tokens": [51540, + 286, 2464, 365, 309, 11, 286, 853, 281, 2464, 365, 309, 1920, 633, 786, 13, 400, + 286, 393, 584, 341, 11, 51700], "temperature": 0.0, "avg_logprob": -0.14392166137695311, + "compression_ratio": 1.6821428571428572, "no_speech_prob": 0.00977780856192112}, + {"id": 717, "seek": 431768, "start": 4318.64, "end": 4325.200000000001, "text": + " it''s very good at filling in a class function. If you teach it a class, it''s + very good at that.", "tokens": [50412, 309, 311, 588, 665, 412, 10623, 294, 257, + 1508, 2445, 13, 759, 291, 2924, 309, 257, 1508, 11, 309, 311, 588, 665, 412, 300, + 13, 50740], "temperature": 0.0, "avg_logprob": -0.09471610260009766, "compression_ratio": + 1.8108108108108107, "no_speech_prob": 0.007828890345990658}, {"id": 718, "seek": + 431768, "start": 4325.200000000001, "end": 4329.6, "text": " That seems to be, and + that''s really, I think, commodity work, right? How to connect to X?", "tokens": + [50740, 663, 2544, 281, 312, 11, 293, 300, 311, 534, 11, 286, 519, 11, 29125, 589, + 11, 558, 30, 1012, 281, 1745, 281, 1783, 30, 50960], "temperature": 0.0, "avg_logprob": + -0.09471610260009766, "compression_ratio": 1.8108108108108107, "no_speech_prob": + 0.007828890345990658}, {"id": 719, "seek": 431768, "start": 4330.64, "end": 4335.4400000000005, + "text": " It''s very, very disruptive there. It''s also potentially disruptive to + a lot of natural language", "tokens": [51012, 467, 311, 588, 11, 588, 37865, 456, + 13, 467, 311, 611, 7263, 37865, 281, 257, 688, 295, 3303, 2856, 51252], "temperature": + 0.0, "avg_logprob": -0.09471610260009766, "compression_ratio": 1.8108108108108107, + "no_speech_prob": 0.007828890345990658}, {"id": 720, "seek": 431768, "start": 4335.4400000000005, + "end": 4340.96, "text": " tasks. 
I think that''s the way it is because it is at + the end of the day a giant natural language", "tokens": [51252, 9608, 13, 286, 519, + 300, 311, 264, 636, 309, 307, 570, 309, 307, 412, 264, 917, 295, 264, 786, 257, + 7410, 3303, 2856, 51528], "temperature": 0.0, "avg_logprob": -0.09471610260009766, + "compression_ratio": 1.8108108108108107, "no_speech_prob": 0.007828890345990658}, + {"id": 721, "seek": 431768, "start": 4340.96, "end": 4346.320000000001, "text": + " model, right? So it''s not surprising. It can do things like translation. It''s + very good at", "tokens": [51528, 2316, 11, 558, 30, 407, 309, 311, 406, 8830, 13, + 467, 393, 360, 721, 411, 12853, 13, 467, 311, 588, 665, 412, 51796], "temperature": + 0.0, "avg_logprob": -0.09471610260009766, "compression_ratio": 1.8108108108108107, + "no_speech_prob": 0.007828890345990658}, {"id": 722, "seek": 434632, "start": 4346.32, + "end": 4351.2, "text": " rewriting a query to make it broader. It knows how to rewrite + a query to make it Boolean. Those", "tokens": [50364, 319, 19868, 257, 14581, 281, + 652, 309, 13227, 13, 467, 3255, 577, 281, 28132, 257, 14581, 281, 652, 309, 23351, + 28499, 13, 3950, 50608], "temperature": 0.0, "avg_logprob": -0.15012995923151734, + "compression_ratio": 1.6619718309859155, "no_speech_prob": 0.002547267824411392}, + {"id": 723, "seek": 434632, "start": 4351.2, "end": 4356.5599999999995, "text": + " are never going to change. But getting the data to it. 
Again, if you want to build + the chat GPT", "tokens": [50608, 366, 1128, 516, 281, 1319, 13, 583, 1242, 264, + 1412, 281, 309, 13, 3764, 11, 498, 291, 528, 281, 1322, 264, 5081, 26039, 51, 50876], + "temperature": 0.0, "avg_logprob": -0.15012995923151734, "compression_ratio": 1.6619718309859155, + "no_speech_prob": 0.002547267824411392}, {"id": 724, "seek": 434632, "start": 4356.5599999999995, + "end": 4362.0, "text": " of mortgage exception handling, you''re going to need to + pull a lot of internal data, label it carefully.", "tokens": [50876, 295, 20236, + 11183, 13175, 11, 291, 434, 516, 281, 643, 281, 2235, 257, 688, 295, 6920, 1412, + 11, 7645, 309, 7500, 13, 51148], "temperature": 0.0, "avg_logprob": -0.15012995923151734, + "compression_ratio": 1.6619718309859155, "no_speech_prob": 0.002547267824411392}, + {"id": 725, "seek": 434632, "start": 4362.96, "end": 4368.16, "text": " That''s + that, and that, and you might discover you don''t have enough. That could also be + the case.", "tokens": [51196, 663, 311, 300, 11, 293, 300, 11, 293, 291, 1062, 4411, + 291, 500, 380, 362, 1547, 13, 663, 727, 611, 312, 264, 1389, 13, 51456], "temperature": + 0.0, "avg_logprob": -0.15012995923151734, "compression_ratio": 1.6619718309859155, + "no_speech_prob": 0.002547267824411392}, {"id": 726, "seek": 434632, "start": 4368.16, + "end": 4372.32, "text": " There''s a whole synthetic data market that''s ready to + solve that problem. So,", "tokens": [51456, 821, 311, 257, 1379, 23420, 1412, 2142, + 300, 311, 1919, 281, 5039, 300, 1154, 13, 407, 11, 51664], "temperature": 0.0, "avg_logprob": + -0.15012995923151734, "compression_ratio": 1.6619718309859155, "no_speech_prob": + 0.002547267824411392}, {"id": 727, "seek": 437232, "start": 4373.28, "end": 4377.04, + "text": " but in a larger surprise, I think it''s much more of the other problem. + We can''t get to it. 
We know", "tokens": [50412, 457, 294, 257, 4833, 6365, 11, + 286, 519, 309, 311, 709, 544, 295, 264, 661, 1154, 13, 492, 393, 380, 483, 281, + 309, 13, 492, 458, 50600], "temperature": 0.0, "avg_logprob": -0.19560877482096353, + "compression_ratio": 1.5387596899224807, "no_speech_prob": 0.009804571978747845}, + {"id": 728, "seek": 437232, "start": 4377.04, "end": 4384.5599999999995, "text": + " it''s there. On that front, have you actually considered implementing a chat GPT + plugin so that I can go", "tokens": [50600, 309, 311, 456, 13, 1282, 300, 1868, + 11, 362, 291, 767, 4888, 18114, 257, 5081, 26039, 51, 23407, 370, 300, 286, 393, + 352, 50976], "temperature": 0.0, "avg_logprob": -0.19560877482096353, "compression_ratio": + 1.5387596899224807, "no_speech_prob": 0.009804571978747845}, {"id": 729, "seek": + 437232, "start": 4384.5599999999995, "end": 4391.04, "text": " as a user configure + things at my tokens? And now, boom, I can search my internal data lakes.", "tokens": + [50976, 382, 257, 4195, 22162, 721, 412, 452, 22667, 30, 400, 586, 11, 9351, 11, + 286, 393, 3164, 452, 6920, 1412, 25595, 13, 51300], "temperature": 0.0, "avg_logprob": + -0.19560877482096353, "compression_ratio": 1.5387596899224807, "no_speech_prob": + 0.009804571978747845}, {"id": 730, "seek": 437232, "start": 4391.92, "end": 4397.84, + "text": " So we have, we are integrated with chat GPT. There''s a connector so you + can query it. We, by default,", "tokens": [51344, 407, 321, 362, 11, 321, 366, 10919, + 365, 5081, 26039, 51, 13, 821, 311, 257, 19127, 370, 291, 393, 14581, 309, 13, 492, + 11, 538, 7576, 11, 51640], "temperature": 0.0, "avg_logprob": -0.19560877482096353, + "compression_ratio": 1.5387596899224807, "no_speech_prob": 0.009804571978747845}, + {"id": 731, "seek": 439784, "start": 4397.84, "end": 4401.84, "text": " send every + query to it. 
We also have a query processor, and we will soon have a result", "tokens": + [50364, 2845, 633, 14581, 281, 309, 13, 492, 611, 362, 257, 14581, 15321, 11, 293, + 321, 486, 2321, 362, 257, 1874, 50564], "temperature": 0.0, "avg_logprob": -0.12579126594480405, + "compression_ratio": 1.7132616487455197, "no_speech_prob": 0.006975292693823576}, + {"id": 732, "seek": 439784, "start": 4401.84, "end": 4406.8, "text": " processor + that will summarize your results for you. But frankly, I think several people have + already", "tokens": [50564, 15321, 300, 486, 20858, 428, 3542, 337, 291, 13, 583, + 11939, 11, 286, 519, 2940, 561, 362, 1217, 50812], "temperature": 0.0, "avg_logprob": + -0.12579126594480405, "compression_ratio": 1.7132616487455197, "no_speech_prob": + 0.006975292693823576}, {"id": 733, "seek": 439784, "start": 4406.8, "end": 4412.16, + "text": " done stuff like that. So you just copy and paste the links. You can probably + get that. I think that''s", "tokens": [50812, 1096, 1507, 411, 300, 13, 407, 291, + 445, 5055, 293, 9163, 264, 6123, 13, 509, 393, 1391, 483, 300, 13, 286, 519, 300, + 311, 51080], "temperature": 0.0, "avg_logprob": -0.12579126594480405, "compression_ratio": + 1.7132616487455197, "no_speech_prob": 0.006975292693823576}, {"id": 734, "seek": + 439784, "start": 4413.68, "end": 4417.6, "text": " that''s really an essential piece + of it. Now, to query, like generate queries from chat GPT,", "tokens": [51156, 300, + 311, 534, 364, 7115, 2522, 295, 309, 13, 823, 11, 281, 14581, 11, 411, 8460, 24109, + 490, 5081, 26039, 51, 11, 51352], "temperature": 0.0, "avg_logprob": -0.12579126594480405, + "compression_ratio": 1.7132616487455197, "no_speech_prob": 0.006975292693823576}, + {"id": 735, "seek": 439784, "start": 4417.6, "end": 4422.56, "text": " I think that''s + easy to do. Right. Someone can do that. But this is my point. 
There will be other", + "tokens": [51352, 286, 519, 300, 311, 1858, 281, 360, 13, 1779, 13, 8734, 393, 360, + 300, 13, 583, 341, 307, 452, 935, 13, 821, 486, 312, 661, 51600], "temperature": + 0.0, "avg_logprob": -0.12579126594480405, "compression_ratio": 1.7132616487455197, + "no_speech_prob": 0.006975292693823576}, {"id": 736, "seek": 442256, "start": 4423.120000000001, + "end": 4428.240000000001, "text": " GPs. We refer to chat GPT as a question answer, + right? Or questions. If you say question, Colin,", "tokens": [50392, 460, 23043, + 13, 492, 2864, 281, 5081, 26039, 51, 382, 257, 1168, 1867, 11, 558, 30, 1610, 1651, + 13, 759, 291, 584, 1168, 11, 29253, 11, 50648], "temperature": 0.0, "avg_logprob": + -0.17330613683481685, "compression_ratio": 1.7014388489208634, "no_speech_prob": + 0.008556035347282887}, {"id": 737, "seek": 442256, "start": 4428.240000000001, "end": + 4433.280000000001, "text": " put your question in. We''ll send it to chat GPT. I + am sure people are looking at the amazing", "tokens": [50648, 829, 428, 1168, 294, + 13, 492, 603, 2845, 309, 281, 5081, 26039, 51, 13, 286, 669, 988, 561, 366, 1237, + 412, 264, 2243, 50900], "temperature": 0.0, "avg_logprob": -0.17330613683481685, + "compression_ratio": 1.7014388489208634, "no_speech_prob": 0.008556035347282887}, + {"id": 738, "seek": 442256, "start": 4433.280000000001, "end": 4438.96, "text": + " platforms you''ve just mentioned, right? All of them. 
Those are going to end up + deployed in different", "tokens": [50900, 9473, 291, 600, 445, 2835, 11, 558, 30, + 1057, 295, 552, 13, 3950, 366, 516, 281, 917, 493, 17826, 294, 819, 51184], "temperature": + 0.0, "avg_logprob": -0.17330613683481685, "compression_ratio": 1.7014388489208634, + "no_speech_prob": 0.008556035347282887}, {"id": 739, "seek": 442256, "start": 4438.96, + "end": 4446.96, "text": " parts of the enterprise, answering questions, summarizing, + extracting, predicting, prescribing.", "tokens": [51184, 3166, 295, 264, 14132, + 11, 13430, 1651, 11, 14611, 3319, 11, 49844, 11, 32884, 11, 1183, 39541, 13, 51584], + "temperature": 0.0, "avg_logprob": -0.17330613683481685, "compression_ratio": 1.7014388489208634, + "no_speech_prob": 0.008556035347282887}, {"id": 740, "seek": 442256, "start": 4446.96, + "end": 4450.4800000000005, "text": " There will be all those things out there. And + the key will be, how do you get at them?", "tokens": [51584, 821, 486, 312, 439, + 729, 721, 484, 456, 13, 400, 264, 2141, 486, 312, 11, 577, 360, 291, 483, 412, 552, + 30, 51760], "temperature": 0.0, "avg_logprob": -0.17330613683481685, "compression_ratio": + 1.7014388489208634, "no_speech_prob": 0.008556035347282887}, {"id": 741, "seek": + 445048, "start": 4451.44, "end": 4455.44, "text": " Yeah. It''s still the problem. + Right. Just because you have something that will comment on the", "tokens": [50412, + 865, 13, 467, 311, 920, 264, 1154, 13, 1779, 13, 1449, 570, 291, 362, 746, 300, + 486, 2871, 322, 264, 50612], "temperature": 0.0, "avg_logprob": -0.1542581266111082, + "compression_ratio": 1.6512455516014235, "no_speech_prob": 0.010725785978138447}, + {"id": 742, "seek": 445048, "start": 4455.44, "end": 4461.679999999999, "text": + " financial implications of a federal rule change, for example, right? 
Doesn''t + mean anyone''s going to", "tokens": [50612, 4669, 16602, 295, 257, 6019, 4978, 1319, + 11, 337, 1365, 11, 558, 30, 12955, 380, 914, 2878, 311, 516, 281, 50924], "temperature": + 0.0, "avg_logprob": -0.1542581266111082, "compression_ratio": 1.6512455516014235, + "no_speech_prob": 0.010725785978138447}, {"id": 743, "seek": 445048, "start": 4461.679999999999, + "end": 4467.839999999999, "text": " go look at it. So, but if you made sure that + every day or whatever it is or every that we were", "tokens": [50924, 352, 574, + 412, 309, 13, 407, 11, 457, 498, 291, 1027, 988, 300, 633, 786, 420, 2035, 309, + 307, 420, 633, 300, 321, 645, 51232], "temperature": 0.0, "avg_logprob": -0.1542581266111082, + "compression_ratio": 1.6512455516014235, "no_speech_prob": 0.010725785978138447}, + {"id": 744, "seek": 445048, "start": 4467.839999999999, "end": 4472.48, "text": + " checking for new temporal updates from that and those were being pushed out to + the people who", "tokens": [51232, 8568, 337, 777, 30881, 9205, 490, 300, 293, 729, + 645, 885, 9152, 484, 281, 264, 561, 567, 51464], "temperature": 0.0, "avg_logprob": + -0.1542581266111082, "compression_ratio": 1.6512455516014235, "no_speech_prob": + 0.010725785978138447}, {"id": 745, "seek": 445048, "start": 4472.48, "end": 4475.919999999999, + "text": " needed to know that and read it, especially if you could check that they + read it.", "tokens": [51464, 2978, 281, 458, 300, 293, 1401, 309, 11, 2318, 498, + 291, 727, 1520, 300, 436, 1401, 309, 13, 51636], "temperature": 0.0, "avg_logprob": + -0.1542581266111082, "compression_ratio": 1.6512455516014235, "no_speech_prob": + 0.010725785978138447}, {"id": 746, "seek": 447592, "start": 4476.4, "end": 4483.28, + "text": " If you could imagine doing something like pushing information to analysts + or somebody who''s", "tokens": [50388, 759, 291, 727, 3811, 884, 746, 411, 7380, + 1589, 281, 31388, 420, 2618, 567, 311, 50732], "temperature": 0.0, "avg_logprob": + 
-0.12917191529076946, "compression_ratio": 1.7364864864864864, "no_speech_prob": + 0.006023196037858725}, {"id": 747, "seek": 447592, "start": 4483.28, "end": 4486.96, + "text": " taking action on it and then tracking to see who read it and then watching + their performance,", "tokens": [50732, 1940, 3069, 322, 309, 293, 550, 11603, 281, + 536, 567, 1401, 309, 293, 550, 1976, 641, 3389, 11, 50916], "temperature": 0.0, + "avg_logprob": -0.12917191529076946, "compression_ratio": 1.7364864864864864, "no_speech_prob": + 0.006023196037858725}, {"id": 748, "seek": 447592, "start": 4487.92, "end": 4490.96, + "text": " I am sure that that will be a thing in the financial services world.", + "tokens": [50964, 286, 669, 988, 300, 300, 486, 312, 257, 551, 294, 264, 4669, 3328, + 1002, 13, 51116], "temperature": 0.0, "avg_logprob": -0.12917191529076946, "compression_ratio": + 1.7364864864864864, "no_speech_prob": 0.006023196037858725}, {"id": 749, "seek": + 447592, "start": 4492.0, "end": 4496.32, "text": " You know, it''s a tough world. + There''s very used to a high level of governance, if you will,", "tokens": [51168, + 509, 458, 11, 309, 311, 257, 4930, 1002, 13, 821, 311, 588, 1143, 281, 257, 1090, + 1496, 295, 17449, 11, 498, 291, 486, 11, 51384], "temperature": 0.0, "avg_logprob": + -0.12917191529076946, "compression_ratio": 1.7364864864864864, "no_speech_prob": + 0.006023196037858725}, {"id": 750, "seek": 447592, "start": 4496.32, "end": 4500.64, + "text": " but I think that that''s the kind of system that will ultimately produce + the automation", "tokens": [51384, 457, 286, 519, 300, 300, 311, 264, 733, 295, + 1185, 300, 486, 6284, 5258, 264, 17769, 51600], "temperature": 0.0, "avg_logprob": + -0.12917191529076946, "compression_ratio": 1.7364864864864864, "no_speech_prob": + 0.006023196037858725}, {"id": 751, "seek": 447592, "start": 4501.36, "end": 4505.36, + "text": " where the chat GPT will be able to solve the mortgage exception. 
So, on + its own,", "tokens": [51636, 689, 264, 5081, 26039, 51, 486, 312, 1075, 281, 5039, + 264, 20236, 11183, 13, 407, 11, 322, 1080, 1065, 11, 51836], "temperature": 0.0, + "avg_logprob": -0.12917191529076946, "compression_ratio": 1.7364864864864864, "no_speech_prob": + 0.006023196037858725}, {"id": 752, "seek": 450592, "start": 4505.92, "end": 4509.04, + "text": " 90% of the time, right? 10% of the time engaging a human.", "tokens": + [50364, 4289, 4, 295, 264, 565, 11, 558, 30, 1266, 4, 295, 264, 565, 11268, 257, + 1952, 13, 50520], "temperature": 0.0, "avg_logprob": -0.15538631690727486, "compression_ratio": + 1.5023041474654377, "no_speech_prob": 0.002235033782199025}, {"id": 753, "seek": + 450592, "start": 4510.56, "end": 4518.0, "text": " Yes. That''s somewhat scary, + but I think it could also be liberating if done well. And I think", "tokens": [50596, + 1079, 13, 663, 311, 8344, 6958, 11, 457, 286, 519, 309, 727, 611, 312, 6774, 990, + 498, 1096, 731, 13, 400, 286, 519, 50968], "temperature": 0.0, "avg_logprob": -0.15538631690727486, + "compression_ratio": 1.5023041474654377, "no_speech_prob": 0.002235033782199025}, + {"id": 754, "seek": 450592, "start": 4518.0, "end": 4523.4400000000005, "text": + " there is a big discussion on this topic going on. How do we collectively as a + humanity,", "tokens": [50968, 456, 307, 257, 955, 5017, 322, 341, 4829, 516, 322, + 13, 1012, 360, 321, 24341, 382, 257, 10243, 11, 51240], "temperature": 0.0, "avg_logprob": + -0.15538631690727486, "compression_ratio": 1.5023041474654377, "no_speech_prob": + 0.002235033782199025}, {"id": 755, "seek": 450592, "start": 4524.0, "end": 4530.4, + "text": " you know, make sure that this tech doesn''t host us, right? 
Doesn''t just + kick us out of", "tokens": [51268, 291, 458, 11, 652, 988, 300, 341, 7553, 1177, + 380, 3975, 505, 11, 558, 30, 12955, 380, 445, 4437, 505, 484, 295, 51588], "temperature": + 0.0, "avg_logprob": -0.15538631690727486, "compression_ratio": 1.5023041474654377, + "no_speech_prob": 0.002235033782199025}, {"id": 756, "seek": 453040, "start": 4531.04, + "end": 4540.0, "text": " our professions or, you know, we still have a way to, I + mean, even just going back to yesterday''s", "tokens": [50396, 527, 38129, 420, + 11, 291, 458, 11, 321, 920, 362, 257, 636, 281, 11, 286, 914, 11, 754, 445, 516, + 646, 281, 5186, 311, 50844], "temperature": 0.0, "avg_logprob": -0.13544075375511533, + "compression_ratio": 1.6016260162601625, "no_speech_prob": 0.02668783999979496}, + {"id": 757, "seek": 453040, "start": 4540.0, "end": 4545.2, "text": " example, I + was going really in circles. I was just drawing some pins on the map using chat + GPT.", "tokens": [50844, 1365, 11, 286, 390, 516, 534, 294, 13040, 13, 286, 390, + 445, 6316, 512, 16392, 322, 264, 4471, 1228, 5081, 26039, 51, 13, 51104], "temperature": + 0.0, "avg_logprob": -0.13544075375511533, "compression_ratio": 1.6016260162601625, + "no_speech_prob": 0.02668783999979496}, {"id": 758, "seek": 453040, "start": 4545.2, + "end": 4551.2, "text": " And it couldn''t get exactly the crux of what I was asking. + And so I went to the kitchen. 
I thought,", "tokens": [51104, 400, 309, 2809, 380, + 483, 2293, 264, 5140, 87, 295, 437, 286, 390, 3365, 13, 400, 370, 286, 1437, 281, + 264, 6525, 13, 286, 1194, 11, 51404], "temperature": 0.0, "avg_logprob": -0.13544075375511533, + "compression_ratio": 1.6016260162601625, "no_speech_prob": 0.02668783999979496}, + {"id": 759, "seek": 453040, "start": 4551.2, "end": 4556.48, "text": " just for + two minutes and I thought, okay, I can just break down my code in two parts without + telling", "tokens": [51404, 445, 337, 732, 2077, 293, 286, 1194, 11, 1392, 11, 286, + 393, 445, 1821, 760, 452, 3089, 294, 732, 3166, 1553, 3585, 51668], "temperature": + 0.0, "avg_logprob": -0.13544075375511533, "compression_ratio": 1.6016260162601625, + "no_speech_prob": 0.02668783999979496}, {"id": 760, "seek": 455648, "start": 4556.48, + "end": 4561.44, "text": " chat GPT what I''m doing and just run everything in my + ID and boom, I''m done because I was", "tokens": [50364, 5081, 26039, 51, 437, 286, + 478, 884, 293, 445, 1190, 1203, 294, 452, 7348, 293, 9351, 11, 286, 478, 1096, 570, + 286, 390, 50612], "temperature": 0.0, "avg_logprob": -0.1262016607367474, "compression_ratio": + 1.5446808510638297, "no_speech_prob": 0.024406125769019127}, {"id": 761, "seek": + 455648, "start": 4561.44, "end": 4568.799999999999, "text": " reasonably going in + circles. And maybe it''s just me unable to, you know, engineer better prompt,", + "tokens": [50612, 23551, 516, 294, 13040, 13, 400, 1310, 309, 311, 445, 385, 11299, + 281, 11, 291, 458, 11, 11403, 1101, 12391, 11, 50980], "temperature": 0.0, "avg_logprob": + -0.1262016607367474, "compression_ratio": 1.5446808510638297, "no_speech_prob": + 0.024406125769019127}, {"id": 762, "seek": 455648, "start": 4568.799999999999, "end": + 4575.12, "text": " so engineer better questions. Or maybe chat GPT does have limitations + as well. 
You never know.", "tokens": [50980, 370, 11403, 1101, 1651, 13, 1610, 1310, + 5081, 26039, 51, 775, 362, 15705, 382, 731, 13, 509, 1128, 458, 13, 51296], "temperature": + 0.0, "avg_logprob": -0.1262016607367474, "compression_ratio": 1.5446808510638297, + "no_speech_prob": 0.024406125769019127}, {"id": 763, "seek": 455648, "start": 4575.839999999999, + "end": 4581.599999999999, "text": " But it did help me probably like 90% of the + work was done using that interaction.", "tokens": [51332, 583, 309, 630, 854, 385, + 1391, 411, 4289, 4, 295, 264, 589, 390, 1096, 1228, 300, 9285, 13, 51620], "temperature": + 0.0, "avg_logprob": -0.1262016607367474, "compression_ratio": 1.5446808510638297, + "no_speech_prob": 0.024406125769019127}, {"id": 764, "seek": 458160, "start": 4582.400000000001, + "end": 4587.360000000001, "text": " Like I would have spent several half a days + as they call them or whatever evenings,", "tokens": [50404, 1743, 286, 576, 362, + 4418, 2940, 1922, 257, 1708, 382, 436, 818, 552, 420, 2035, 42835, 11, 50652], "temperature": + 0.0, "avg_logprob": -0.20467409369063705, "compression_ratio": 1.4757281553398058, + "no_speech_prob": 0.020998766645789146}, {"id": 765, "seek": 458160, "start": 4587.360000000001, + "end": 4592.64, "text": " figuring out all these things. Like what library should + they use to connect to open source", "tokens": [50652, 15213, 484, 439, 613, 721, + 13, 1743, 437, 6405, 820, 436, 764, 281, 1745, 281, 1269, 4009, 50916], "temperature": + 0.0, "avg_logprob": -0.20467409369063705, "compression_ratio": 1.4757281553398058, + "no_speech_prob": 0.020998766645789146}, {"id": 766, "seek": 458160, "start": 4595.04, + "end": 4598.72, "text": " map or whatever. 
You know, how do I drop pins?", "tokens": + [51036, 4471, 420, 2035, 13, 509, 458, 11, 577, 360, 286, 3270, 16392, 30, 51220], + "temperature": 0.0, "avg_logprob": -0.20467409369063705, "compression_ratio": 1.4757281553398058, + "no_speech_prob": 0.020998766645789146}, {"id": 767, "seek": 458160, "start": 4600.240000000001, + "end": 4607.280000000001, "text": " Absolutely. The chat GPT is the perfect replacement + for the more senior developer,", "tokens": [51296, 7021, 13, 440, 5081, 26039, 51, + 307, 264, 2176, 14419, 337, 264, 544, 7965, 10754, 11, 51648], "temperature": 0.0, + "avg_logprob": -0.20467409369063705, "compression_ratio": 1.4757281553398058, "no_speech_prob": + 0.020998766645789146}, {"id": 768, "seek": 460728, "start": 4607.28, "end": 4611.759999999999, + "text": " who will answer your texts right or your Slack, sorry Dave, my name''s + mine.", "tokens": [50364, 567, 486, 1867, 428, 15765, 558, 420, 428, 37211, 11, + 2597, 11017, 11, 452, 1315, 311, 3892, 13, 50588], "temperature": 0.0, "avg_logprob": + -0.2849465647051411, "compression_ratio": 1.6498194945848375, "no_speech_prob": + 0.004442207049578428}, {"id": 769, "seek": 460728, "start": 4612.639999999999, "end": + 4617.2, "text": " You know, like that used to do work until you''re blocked and + then you go find somebody and say,", "tokens": [50632, 509, 458, 11, 411, 300, 1143, + 281, 360, 589, 1826, 291, 434, 15470, 293, 550, 291, 352, 915, 2618, 293, 584, 11, + 50860], "temperature": 0.0, "avg_logprob": -0.2849465647051411, "compression_ratio": + 1.6498194945848375, "no_speech_prob": 0.004442207049578428}, {"id": 770, "seek": + 460728, "start": 4617.2, "end": 4623.44, "text": " okay, so I can''t figure this + out. This was pre-internet, right? 
Now, for a long time, we had stack trace or", + "tokens": [50860, 1392, 11, 370, 286, 393, 380, 2573, 341, 484, 13, 639, 390, 659, + 12, 259, 2231, 302, 11, 558, 30, 823, 11, 337, 257, 938, 565, 11, 321, 632, 8630, + 13508, 420, 51172], "temperature": 0.0, "avg_logprob": -0.2849465647051411, "compression_ratio": + 1.6498194945848375, "no_speech_prob": 0.004442207049578428}, {"id": 771, "seek": + 460728, "start": 4626.16, "end": 4630.5599999999995, "text": " the other thing that + chat GPT has completely revised. Yeah, stack overflow.", "tokens": [51308, 264, + 661, 551, 300, 5081, 26039, 51, 575, 2584, 35228, 13, 865, 11, 8630, 37772, 13, + 51528], "temperature": 0.0, "avg_logprob": -0.2849465647051411, "compression_ratio": + 1.6498194945848375, "no_speech_prob": 0.004442207049578428}, {"id": 772, "seek": + 460728, "start": 4631.28, "end": 4636.32, "text": " Stack overflow. Right. Exactly. + Now it is we have stack overflow. For a while, we had stack overflow.", "tokens": + [51564, 37649, 37772, 13, 1779, 13, 7587, 13, 823, 309, 307, 321, 362, 8630, 37772, + 13, 1171, 257, 1339, 11, 321, 632, 8630, 37772, 13, 51816], "temperature": 0.0, + "avg_logprob": -0.2849465647051411, "compression_ratio": 1.6498194945848375, "no_speech_prob": + 0.004442207049578428}, {"id": 773, "seek": 463632, "start": 4636.32, "end": 4640.16, + "text": " And then now chat GPT, it''s funny. I forgot the name because I use chat + GPT instead.", "tokens": [50364, 400, 550, 586, 5081, 26039, 51, 11, 309, 311, 4074, + 13, 286, 5298, 264, 1315, 570, 286, 764, 5081, 26039, 51, 2602, 13, 50556], "temperature": + 0.0, "avg_logprob": -0.14326433371041566, "compression_ratio": 1.6114864864864864, + "no_speech_prob": 0.0007970345322974026}, {"id": 774, "seek": 463632, "start": 4640.16, + "end": 4646.4, "text": " I haven''t Googled for a code thing in so long. I can''t + even replace your habit, right? 
Your memory", "tokens": [50556, 286, 2378, 380, + 45005, 1493, 337, 257, 3089, 551, 294, 370, 938, 13, 286, 393, 380, 754, 7406, 428, + 7164, 11, 558, 30, 2260, 4675, 50868], "temperature": 0.0, "avg_logprob": -0.14326433371041566, + "compression_ratio": 1.6114864864864864, "no_speech_prob": 0.0007970345322974026}, + {"id": 775, "seek": 463632, "start": 4646.4, "end": 4651.12, "text": " and habit + in some sense. Yeah. Well, you know, we all get good at evaluating those, right?", + "tokens": [50868, 293, 7164, 294, 512, 2020, 13, 865, 13, 1042, 11, 291, 458, 11, + 321, 439, 483, 665, 412, 27479, 729, 11, 558, 30, 51104], "temperature": 0.0, "avg_logprob": + -0.14326433371041566, "compression_ratio": 1.6114864864864864, "no_speech_prob": + 0.0007970345322974026}, {"id": 776, "seek": 463632, "start": 4651.12, "end": 4655.679999999999, + "text": " The stack overflow articles like, okay, so when''s it from? How many upvotes + does it have? Is there a", "tokens": [51104, 440, 8630, 37772, 11290, 411, 11, 1392, + 11, 370, 562, 311, 309, 490, 30, 1012, 867, 493, 85, 17251, 775, 309, 362, 30, 1119, + 456, 257, 51332], "temperature": 0.0, "avg_logprob": -0.14326433371041566, "compression_ratio": + 1.6114864864864864, "no_speech_prob": 0.0007970345322974026}, {"id": 777, "seek": + 463632, "start": 4655.679999999999, "end": 4661.04, "text": " good response? Does + it have the green check mark? Chat GPT is pretty much bringing you back the green", + "tokens": [51332, 665, 4134, 30, 4402, 309, 362, 264, 3092, 1520, 1491, 30, 27503, + 26039, 51, 307, 1238, 709, 5062, 291, 646, 264, 3092, 51600], "temperature": 0.0, + "avg_logprob": -0.14326433371041566, "compression_ratio": 1.6114864864864864, "no_speech_prob": + 0.0007970345322974026}, {"id": 778, "seek": 466104, "start": 4661.04, "end": 4666.96, + "text": " check mark answer. So there''s no point anymore. That''s what it''s good + at. 
I totally review.", "tokens": [50364, 1520, 1491, 1867, 13, 407, 456, 311, 572, + 935, 3602, 13, 663, 311, 437, 309, 311, 665, 412, 13, 286, 3879, 3131, 13, 50660], + "temperature": 0.0, "avg_logprob": -0.15141577287153765, "compression_ratio": 1.6109215017064846, + "no_speech_prob": 0.007223762106150389}, {"id": 779, "seek": 466104, "start": 4666.96, + "end": 4671.92, "text": " It''s funny. You mentioned this because exactly same thought + across my mind when I was interacting", "tokens": [50660, 467, 311, 4074, 13, 509, + 2835, 341, 570, 2293, 912, 1194, 2108, 452, 1575, 562, 286, 390, 18017, 50908], + "temperature": 0.0, "avg_logprob": -0.15141577287153765, "compression_ratio": 1.6109215017064846, + "no_speech_prob": 0.007223762106150389}, {"id": 780, "seek": 466104, "start": 4671.92, + "end": 4678.8, "text": " with chat GPT. So that was like relating to my experience + with stack overflow, doing some small", "tokens": [50908, 365, 5081, 26039, 51, + 13, 407, 300, 390, 411, 23968, 281, 452, 1752, 365, 8630, 37772, 11, 884, 512, 1359, + 51252], "temperature": 0.0, "avg_logprob": -0.15141577287153765, "compression_ratio": + 1.6109215017064846, "no_speech_prob": 0.007223762106150389}, {"id": 781, "seek": + 466104, "start": 4678.8, "end": 4684.48, "text": " Android application. And I''ve + ran into the issue which was described in like something like 20", "tokens": [51252, + 8853, 3861, 13, 400, 286, 600, 5872, 666, 264, 2734, 597, 390, 7619, 294, 411, 746, + 411, 945, 51536], "temperature": 0.0, "avg_logprob": -0.15141577287153765, "compression_ratio": + 1.6109215017064846, "no_speech_prob": 0.007223762106150389}, {"id": 782, "seek": + 466104, "start": 4684.48, "end": 4690.24, "text": " questions and answers on exactly + same topic. 
And everyone had a green, you know, check mark", "tokens": [51536, 1651, + 293, 6338, 322, 2293, 912, 4829, 13, 400, 1518, 632, 257, 3092, 11, 291, 458, 11, + 1520, 1491, 51824], "temperature": 0.0, "avg_logprob": -0.15141577287153765, "compression_ratio": + 1.6109215017064846, "no_speech_prob": 0.007223762106150389}, {"id": 783, "seek": + 469024, "start": 4690.24, "end": 4697.2, "text": " upvotes, but nothing worked. + And in the end, I found just one of them that worked. And you know,", "tokens": + [50364, 493, 85, 17251, 11, 457, 1825, 2732, 13, 400, 294, 264, 917, 11, 286, 1352, + 445, 472, 295, 552, 300, 2732, 13, 400, 291, 458, 11, 50712], "temperature": 0.0, + "avg_logprob": -0.12891268957228888, "compression_ratio": 1.673913043478261, "no_speech_prob": + 0.0066907997243106365}, {"id": 784, "seek": 469024, "start": 4697.2, "end": 4703.76, + "text": " that was like the process in a way like iterative, repetitive, and also + in some sense for", "tokens": [50712, 300, 390, 411, 264, 1399, 294, 257, 636, 411, + 17138, 1166, 11, 29404, 11, 293, 611, 294, 512, 2020, 337, 51040], "temperature": + 0.0, "avg_logprob": -0.12891268957228888, "compression_ratio": 1.673913043478261, + "no_speech_prob": 0.0066907997243106365}, {"id": 785, "seek": 469024, "start": 4703.76, + "end": 4709.04, "text": " trading, but then in the end, when you achieve it, you + know, it''s fine. You achieve what you want.", "tokens": [51040, 9529, 11, 457, + 550, 294, 264, 917, 11, 562, 291, 4584, 309, 11, 291, 458, 11, 309, 311, 2489, 13, + 509, 4584, 437, 291, 528, 13, 51304], "temperature": 0.0, "avg_logprob": -0.12891268957228888, + "compression_ratio": 1.673913043478261, "no_speech_prob": 0.0066907997243106365}, + {"id": 786, "seek": 469024, "start": 4709.04, "end": 4714.24, "text": " With chat + GPT is somewhat similar, but the experience is different. 
I don''t need to type + that much.", "tokens": [51304, 2022, 5081, 26039, 51, 307, 8344, 2531, 11, 457, + 264, 1752, 307, 819, 13, 286, 500, 380, 643, 281, 2010, 300, 709, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.12891268957228888, "compression_ratio": 1.673913043478261, + "no_speech_prob": 0.0066907997243106365}, {"id": 787, "seek": 471424, "start": 4714.24, + "end": 4721.12, "text": " I mean, I don''t need to type something into Google then + go to Stackflow, you know, read this thing,", "tokens": [50364, 286, 914, 11, 286, + 500, 380, 643, 281, 2010, 746, 666, 3329, 550, 352, 281, 37649, 10565, 11, 291, + 458, 11, 1401, 341, 551, 11, 50708], "temperature": 0.0, "avg_logprob": -0.18470687095565025, + "compression_ratio": 1.6192468619246863, "no_speech_prob": 0.016192669048905373}, + {"id": 788, "seek": 471424, "start": 4721.12, "end": 4727.599999999999, "text": + " comprehend it, and then apply it. With chat GPT, all of this is condensed. It''s + like all of", "tokens": [50708, 38183, 309, 11, 293, 550, 3079, 309, 13, 2022, 5081, + 26039, 51, 11, 439, 295, 341, 307, 36398, 13, 467, 311, 411, 439, 295, 51032], "temperature": + 0.0, "avg_logprob": -0.18470687095565025, "compression_ratio": 1.6192468619246863, + "no_speech_prob": 0.016192669048905373}, {"id": 789, "seek": 471424, "start": 4727.599999999999, + "end": 4733.04, "text": " these steps just condensed and meet just literally typing + what I want and getting something on the screen.", "tokens": [51032, 613, 4439, + 445, 36398, 293, 1677, 445, 3736, 18444, 437, 286, 528, 293, 1242, 746, 322, 264, + 2568, 13, 51304], "temperature": 0.0, "avg_logprob": -0.18470687095565025, "compression_ratio": + 1.6192468619246863, "no_speech_prob": 0.016192669048905373}, {"id": 790, "seek": + 471424, "start": 4733.5199999999995, "end": 4741.599999999999, "text": " Right. + This part by itself is amazing. 
It is hard to predict where how far that will go.", + "tokens": [51328, 1779, 13, 639, 644, 538, 2564, 307, 2243, 13, 467, 307, 1152, + 281, 6069, 689, 577, 1400, 300, 486, 352, 13, 51732], "temperature": 0.0, "avg_logprob": + -0.18470687095565025, "compression_ratio": 1.6192468619246863, "no_speech_prob": + 0.016192669048905373}, {"id": 791, "seek": 474160, "start": 4741.68, "end": 4749.120000000001, + "text": " But I think that one thing is very clear. The M365 silo is probably the + most important one going", "tokens": [50368, 583, 286, 519, 300, 472, 551, 307, + 588, 1850, 13, 440, 376, 11309, 20, 3425, 78, 307, 1391, 264, 881, 1021, 472, 516, + 50740], "temperature": 0.0, "avg_logprob": -0.1410447359085083, "compression_ratio": + 1.640495867768595, "no_speech_prob": 0.004218554589897394}, {"id": 792, "seek": + 474160, "start": 4749.120000000001, "end": 4755.92, "text": " forward because it''s + going to kind of automatically be taking the knowledge, which is very present in", + "tokens": [50740, 2128, 570, 309, 311, 516, 281, 733, 295, 6772, 312, 1940, 264, + 3601, 11, 597, 307, 588, 1974, 294, 51080], "temperature": 0.0, "avg_logprob": -0.1410447359085083, + "compression_ratio": 1.640495867768595, "no_speech_prob": 0.004218554589897394}, + {"id": 793, "seek": 474160, "start": 4755.92, "end": 4760.64, "text": " outlook, + right? Maybe not so much encounter, but in your email is a lot of knowledge there + in teams.", "tokens": [51080, 26650, 11, 558, 30, 2704, 406, 370, 709, 8593, 11, + 457, 294, 428, 3796, 307, 257, 688, 295, 3601, 456, 294, 5491, 13, 51316], "temperature": + 0.0, "avg_logprob": -0.1410447359085083, "compression_ratio": 1.640495867768595, + "no_speech_prob": 0.004218554589897394}, {"id": 794, "seek": 474160, "start": 4760.64, + "end": 4765.360000000001, "text": " There''s a lot of knowledge there. 
Documents, + probably a decent amount there too, although I think", "tokens": [51316, 821, 311, + 257, 688, 295, 3601, 456, 13, 16024, 4697, 11, 1391, 257, 8681, 2372, 456, 886, + 11, 4878, 286, 519, 51552], "temperature": 0.0, "avg_logprob": -0.1410447359085083, + "compression_ratio": 1.640495867768595, "no_speech_prob": 0.004218554589897394}, + {"id": 795, "seek": 476536, "start": 4765.36, "end": 4771.12, "text": " that tends + to be more scattered. But effectively, right? Chat GPT was trained from Reddit,", + "tokens": [50364, 300, 12258, 281, 312, 544, 21986, 13, 583, 8659, 11, 558, 30, + 27503, 26039, 51, 390, 8895, 490, 32210, 11, 50652], "temperature": 0.0, "avg_logprob": + -0.09605575644451639, "compression_ratio": 1.5495867768595042, "no_speech_prob": + 0.0017753250431269407}, {"id": 796, "seek": 476536, "start": 4771.12, "end": 4779.28, + "text": " which is chat. Teams is chat. Outlook is sort of chat. So there''s no + doubt that maybe those early", "tokens": [50652, 597, 307, 5081, 13, 24702, 307, + 5081, 13, 5925, 12747, 307, 1333, 295, 5081, 13, 407, 456, 311, 572, 6385, 300, + 1310, 729, 2440, 51060], "temperature": 0.0, "avg_logprob": -0.09605575644451639, + "compression_ratio": 1.5495867768595042, "no_speech_prob": 0.0017753250431269407}, + {"id": 797, "seek": 476536, "start": 4779.28, "end": 4785.5199999999995, "text": + " interactions will come through that channel. But I do think that exactly as you + said early on,", "tokens": [51060, 13280, 486, 808, 807, 300, 2269, 13, 583, 286, + 360, 519, 300, 2293, 382, 291, 848, 2440, 322, 11, 51372], "temperature": 0.0, "avg_logprob": + -0.09605575644451639, "compression_ratio": 1.5495867768595042, "no_speech_prob": + 0.0017753250431269407}, {"id": 798, "seek": 476536, "start": 4785.5199999999995, + "end": 4790.719999999999, "text": " Microsoft is never going to make it easy to + talk to anybody else. 
They still come from that", "tokens": [51372, 8116, 307, 1128, + 516, 281, 652, 309, 1858, 281, 751, 281, 4472, 1646, 13, 814, 920, 808, 490, 300, + 51632], "temperature": 0.0, "avg_logprob": -0.09605575644451639, "compression_ratio": + 1.5495867768595042, "no_speech_prob": 0.0017753250431269407}, {"id": 799, "seek": + 479072, "start": 4790.8, "end": 4797.04, "text": " position of silo dominance or + whatever it is. They don''t like to work with Salesforce.", "tokens": [50368, 2535, + 295, 3425, 78, 34987, 420, 2035, 309, 307, 13, 814, 500, 380, 411, 281, 589, 365, + 40398, 13, 50680], "temperature": 0.0, "avg_logprob": -0.15088270043814053, "compression_ratio": + 1.6293103448275863, "no_speech_prob": 0.005442452616989613}, {"id": 800, "seek": + 479072, "start": 4797.04, "end": 4803.280000000001, "text": " Salesforce doesn''t + like to work with them. Nobody likes to use the non-great product in someone", "tokens": + [50680, 40398, 1177, 380, 411, 281, 589, 365, 552, 13, 9297, 5902, 281, 764, 264, + 2107, 12, 40753, 1674, 294, 1580, 50992], "temperature": 0.0, "avg_logprob": -0.15088270043814053, + "compression_ratio": 1.6293103448275863, "no_speech_prob": 0.005442452616989613}, + {"id": 801, "seek": 479072, "start": 4803.280000000001, "end": 4811.4400000000005, + "text": " else''s stack just because we''re trying to consolidate. So that''s why + it persists. 
And that is very real", "tokens": [50992, 1646, 311, 8630, 445, 570, + 321, 434, 1382, 281, 49521, 13, 407, 300, 311, 983, 309, 868, 1751, 13, 400, 300, + 307, 588, 957, 51400], "temperature": 0.0, "avg_logprob": -0.15088270043814053, + "compression_ratio": 1.6293103448275863, "no_speech_prob": 0.005442452616989613}, + {"id": 802, "seek": 479072, "start": 4811.4400000000005, "end": 4817.6, "text": + " and exacerbates the problem, the walls between the silos, and then throw in all + the others.", "tokens": [51400, 293, 38819, 1024, 264, 1154, 11, 264, 7920, 1296, + 264, 48893, 11, 293, 550, 3507, 294, 439, 264, 2357, 13, 51708], "temperature": + 0.0, "avg_logprob": -0.15088270043814053, "compression_ratio": 1.6293103448275863, + "no_speech_prob": 0.005442452616989613}, {"id": 803, "seek": 481760, "start": 4818.08, + "end": 4824.160000000001, "text": " After you get the basic whatever, big five, + then you have all the elastics and open searches and", "tokens": [50388, 2381, 291, + 483, 264, 3875, 2035, 11, 955, 1732, 11, 550, 291, 362, 439, 264, 806, 21598, 293, + 1269, 26701, 293, 50692], "temperature": 0.0, "avg_logprob": -0.32651522424485946, + "compression_ratio": 1.5051546391752577, "no_speech_prob": 0.014431041665375233}, + {"id": 804, "seek": 481760, "start": 4824.160000000001, "end": 4834.96, "text": + " solars and postgres and to say nothing of the applications. So one group is using + swirl to look", "tokens": [50692, 1404, 685, 293, 2183, 45189, 293, 281, 584, 1825, + 295, 264, 5821, 13, 407, 472, 1594, 307, 1228, 30310, 281, 574, 51232], "temperature": + 0.0, "avg_logprob": -0.32651522424485946, "compression_ratio": 1.5051546391752577, + "no_speech_prob": 0.014431041665375233}, {"id": 805, "seek": 481760, "start": 4834.96, + "end": 4840.96, "text": " at five different ticket systems. They''re all just ticket. 
+ You track is one from JetBrains and then", "tokens": [51232, 412, 1732, 819, 10550, + 3652, 13, 814, 434, 439, 445, 10550, 13, 509, 2837, 307, 472, 490, 28730, 45606, + 1292, 293, 550, 51532], "temperature": 0.0, "avg_logprob": -0.32651522424485946, + "compression_ratio": 1.5051546391752577, "no_speech_prob": 0.014431041665375233}, + {"id": 806, "seek": 484096, "start": 4841.52, "end": 4849.36, "text": " you''re + on the, there''s some others. Okay, that''s a really interesting problem. The cost + to migrate", "tokens": [50392, 291, 434, 322, 264, 11, 456, 311, 512, 2357, 13, + 1033, 11, 300, 311, 257, 534, 1880, 1154, 13, 440, 2063, 281, 31821, 50784], "temperature": + 0.0, "avg_logprob": -0.2110749880472819, "compression_ratio": 1.617117117117117, + "no_speech_prob": 0.02523277886211872}, {"id": 807, "seek": 484096, "start": 4849.36, + "end": 4853.2, "text": " all that stuff would be just, it''s not even, I don''t + think it''s necessarily that much money.", "tokens": [50784, 439, 300, 1507, 576, + 312, 445, 11, 309, 311, 406, 754, 11, 286, 500, 380, 519, 309, 311, 4725, 300, 709, + 1460, 13, 50976], "temperature": 0.0, "avg_logprob": -0.2110749880472819, "compression_ratio": + 1.617117117117117, "no_speech_prob": 0.02523277886211872}, {"id": 808, "seek": 484096, + "start": 4853.2, "end": 4858.88, "text": " It''s just a massive amount of pain. 
+ If you could figure out how to do it, probably some transfer", "tokens": [50976, + 467, 311, 445, 257, 5994, 2372, 295, 1822, 13, 759, 291, 727, 2573, 484, 577, 281, + 360, 309, 11, 1391, 512, 5003, 51260], "temperature": 0.0, "avg_logprob": -0.2110749880472819, + "compression_ratio": 1.617117117117117, "no_speech_prob": 0.02523277886211872}, + {"id": 809, "seek": 484096, "start": 4858.88, "end": 4863.04, "text": " much, it''s + not that much money, but it is a tremendous amount of work.", "tokens": [51260, + 709, 11, 309, 311, 406, 300, 709, 1460, 11, 457, 309, 307, 257, 10048, 2372, 295, + 589, 13, 51468], "temperature": 0.0, "avg_logprob": -0.2110749880472819, "compression_ratio": + 1.617117117117117, "no_speech_prob": 0.02523277886211872}, {"id": 810, "seek": 486304, + "start": 4863.6, "end": 4872.08, "text": " Yes, I think you probably don''t realize + yourself yet, but from the way you explain this,", "tokens": [50392, 1079, 11, 286, + 519, 291, 1391, 500, 380, 4325, 1803, 1939, 11, 457, 490, 264, 636, 291, 2903, 341, + 11, 50816], "temperature": 0.0, "avg_logprob": -0.16856873736662023, "compression_ratio": + 1.4615384615384615, "no_speech_prob": 0.07404676079750061}, {"id": 811, "seek": + 486304, "start": 4872.08, "end": 4878.24, "text": " it feels like you''ve invented + JetGPT for the search part. 
I mean, in some sense,", "tokens": [50816, 309, 3417, + 411, 291, 600, 14479, 28730, 38, 47, 51, 337, 264, 3164, 644, 13, 286, 914, 11, + 294, 512, 2020, 11, 51124], "temperature": 0.0, "avg_logprob": -0.16856873736662023, + "compression_ratio": 1.4615384615384615, "no_speech_prob": 0.07404676079750061}, + {"id": 812, "seek": 486304, "start": 4879.04, "end": 4887.76, "text": " like simplifying + things, not actually, as you said, not requesting anyone to physically reinvent", + "tokens": [51164, 411, 6883, 5489, 721, 11, 406, 767, 11, 382, 291, 848, 11, 406, + 31937, 2878, 281, 9762, 33477, 51600], "temperature": 0.0, "avg_logprob": -0.16856873736662023, + "compression_ratio": 1.4615384615384615, "no_speech_prob": 0.07404676079750061}, + {"id": 813, "seek": 488776, "start": 4887.84, "end": 4893.68, "text": " things like + move data here and there, which can take years, sometimes like dozens of years,", + "tokens": [50368, 721, 411, 1286, 1412, 510, 293, 456, 11, 597, 393, 747, 924, 11, + 2171, 411, 18431, 295, 924, 11, 50660], "temperature": 0.0, "avg_logprob": -0.12154824393136161, + "compression_ratio": 1.6891891891891893, "no_speech_prob": 0.10080519318580627}, + {"id": 814, "seek": 488776, "start": 4893.68, "end": 4902.24, "text": " people simply + don''t do this. And also access to the data, like today, I only remember a fraction", + "tokens": [50660, 561, 2935, 500, 380, 360, 341, 13, 400, 611, 2105, 281, 264, 1412, + 11, 411, 965, 11, 286, 787, 1604, 257, 14135, 51088], "temperature": 0.0, "avg_logprob": + -0.12154824393136161, "compression_ratio": 1.6891891891891893, "no_speech_prob": + 0.10080519318580627}, {"id": 815, "seek": 488776, "start": 4902.24, "end": 4907.2, + "text": " of things that I did. I literally forget things that I''ve done yesterday. 
+ I might sometimes", "tokens": [51088, 295, 721, 300, 286, 630, 13, 286, 3736, 2870, + 721, 300, 286, 600, 1096, 5186, 13, 286, 1062, 2171, 51336], "temperature": 0.0, + "avg_logprob": -0.12154824393136161, "compression_ratio": 1.6891891891891893, "no_speech_prob": + 0.10080519318580627}, {"id": 816, "seek": 488776, "start": 4907.2, "end": 4913.2, + "text": " reflect and I remember something a week ago or so, but it''s still, it''s + because of information", "tokens": [51336, 5031, 293, 286, 1604, 746, 257, 1243, + 2057, 420, 370, 11, 457, 309, 311, 920, 11, 309, 311, 570, 295, 1589, 51636], "temperature": + 0.0, "avg_logprob": -0.12154824393136161, "compression_ratio": 1.6891891891891893, + "no_speech_prob": 0.10080519318580627}, {"id": 817, "seek": 491320, "start": 4913.2, + "end": 4919.2, "text": " overload, and I need to make decisions, I need to scramble + something together quickly", "tokens": [50364, 28777, 11, 293, 286, 643, 281, 652, + 5327, 11, 286, 643, 281, 795, 48382, 746, 1214, 2661, 50664], "temperature": 0.0, + "avg_logprob": -0.18316137635862673, "compression_ratio": 1.5566037735849056, "no_speech_prob": + 0.017945485189557076}, {"id": 818, "seek": 491320, "start": 4919.92, "end": 4926.24, + "text": " on a conference page, how much knowledge do I have myself? 
And if I had + that magical", "tokens": [50700, 322, 257, 7586, 3028, 11, 577, 709, 3601, 360, + 286, 362, 2059, 30, 400, 498, 286, 632, 300, 12066, 51016], "temperature": 0.0, + "avg_logprob": -0.18316137635862673, "compression_ratio": 1.5566037735849056, "no_speech_prob": + 0.017945485189557076}, {"id": 819, "seek": 491320, "start": 4927.2, "end": 4932.24, + "text": " search bar where I could have typed something and just get the support + material,", "tokens": [51064, 3164, 2159, 689, 286, 727, 362, 33941, 746, 293, 445, + 483, 264, 1406, 2527, 11, 51316], "temperature": 0.0, "avg_logprob": -0.18316137635862673, + "compression_ratio": 1.5566037735849056, "no_speech_prob": 0.017945485189557076}, + {"id": 820, "seek": 491320, "start": 4933.599999999999, "end": 4937.679999999999, + "text": " not to go all over the place, essentially doing what search engines should + do,", "tokens": [51384, 406, 281, 352, 439, 670, 264, 1081, 11, 4476, 884, 437, + 3164, 12982, 820, 360, 11, 51588], "temperature": 0.0, "avg_logprob": -0.18316137635862673, + "compression_ratio": 1.5566037735849056, "no_speech_prob": 0.017945485189557076}, + {"id": 821, "seek": 493768, "start": 4938.56, "end": 4943.6, "text": " just go and + check what happened where and when and by whom?", "tokens": [50408, 445, 352, 293, + 1520, 437, 2011, 689, 293, 562, 293, 538, 7101, 30, 50660], "temperature": 0.0, + "avg_logprob": -0.17529078892299108, "compression_ratio": 1.606425702811245, "no_speech_prob": + 0.005362936295568943}, {"id": 822, "seek": 493768, "start": 4945.04, "end": 4950.4800000000005, + "text": " Exactly. Exactly. 
There''s so much amazing work and time and", "tokens": + [50732, 7587, 13, 7587, 13, 821, 311, 370, 709, 2243, 589, 293, 565, 293, 51004], + "temperature": 0.0, "avg_logprob": -0.17529078892299108, "compression_ratio": 1.606425702811245, + "no_speech_prob": 0.005362936295568943}, {"id": 823, "seek": 493768, "start": 4951.280000000001, + "end": 4954.88, "text": " genius that''s gone into some of these apps. I mean, who + doesn''t love them? Like they''re,", "tokens": [51044, 14017, 300, 311, 2780, 666, + 512, 295, 613, 7733, 13, 286, 914, 11, 567, 1177, 380, 959, 552, 30, 1743, 436, + 434, 11, 51224], "temperature": 0.0, "avg_logprob": -0.17529078892299108, "compression_ratio": + 1.606425702811245, "no_speech_prob": 0.005362936295568943}, {"id": 824, "seek": + 493768, "start": 4955.4400000000005, "end": 4961.04, "text": " you know, they all + have incredible capabilities and they''re evolved, they''re growing all the time.", + "tokens": [51252, 291, 458, 11, 436, 439, 362, 4651, 10862, 293, 436, 434, 14178, + 11, 436, 434, 4194, 439, 264, 565, 13, 51532], "temperature": 0.0, "avg_logprob": + -0.17529078892299108, "compression_ratio": 1.606425702811245, "no_speech_prob": + 0.005362936295568943}, {"id": 825, "seek": 493768, "start": 4962.16, "end": 4966.400000000001, + "text": " In a way, right, the idea that you would take data out to try to make + sense of it is absurd.", "tokens": [51588, 682, 257, 636, 11, 558, 11, 264, 1558, + 300, 291, 576, 747, 1412, 484, 281, 853, 281, 652, 2020, 295, 309, 307, 19774, 13, + 51800], "temperature": 0.0, "avg_logprob": -0.17529078892299108, "compression_ratio": + 1.606425702811245, "no_speech_prob": 0.005362936295568943}, {"id": 826, "seek": + 496640, "start": 4967.04, "end": 4973.2, "text": " It really is. 
Think of Salesforce + as 2,000 plus tables just to make the application work,", "tokens": [50396, 467, + 534, 307, 13, 6557, 295, 40398, 382, 568, 11, 1360, 1804, 8020, 445, 281, 652, 264, + 3861, 589, 11, 50704], "temperature": 0.0, "avg_logprob": -0.1639777981505102, "compression_ratio": + 1.5791666666666666, "no_speech_prob": 0.007999851368367672}, {"id": 827, "seek": + 496640, "start": 4973.2, "end": 4979.599999999999, "text": " you''re going to extract + that? No, you''re going to query it. And that''s the key, right? And so", "tokens": + [50704, 291, 434, 516, 281, 8947, 300, 30, 883, 11, 291, 434, 516, 281, 14581, 309, + 13, 400, 300, 311, 264, 2141, 11, 558, 30, 400, 370, 51024], "temperature": 0.0, + "avg_logprob": -0.1639777981505102, "compression_ratio": 1.5791666666666666, "no_speech_prob": + 0.007999851368367672}, {"id": 828, "seek": 496640, "start": 4979.599999999999, "end": + 4984.96, "text": " we''re focused on making the querying easy and understandable + simplicity. You know, I''ve worked on", "tokens": [51024, 321, 434, 5178, 322, 1455, + 264, 7083, 1840, 1858, 293, 25648, 25632, 13, 509, 458, 11, 286, 600, 2732, 322, + 51292], "temperature": 0.0, "avg_logprob": -0.1639777981505102, "compression_ratio": + 1.5791666666666666, "no_speech_prob": 0.007999851368367672}, {"id": 829, "seek": + 496640, "start": 4984.96, "end": 4989.92, "text": " some amazing products that were + not simple. And I''m sorry for some of them, right? 
Not being that", "tokens": [51292, + 512, 2243, 3383, 300, 645, 406, 2199, 13, 400, 286, 478, 2597, 337, 512, 295, 552, + 11, 558, 30, 1726, 885, 300, 51540], "temperature": 0.0, "avg_logprob": -0.1639777981505102, + "compression_ratio": 1.5791666666666666, "no_speech_prob": 0.007999851368367672}, + {"id": 830, "seek": 498992, "start": 4989.92, "end": 4997.2, "text": " simple, but + at the end of the day, I think today in the enterprise, it''s got to get easier.", + "tokens": [50364, 2199, 11, 457, 412, 264, 917, 295, 264, 786, 11, 286, 519, 965, + 294, 264, 14132, 11, 309, 311, 658, 281, 483, 3571, 13, 50728], "temperature": 0.0, + "avg_logprob": -0.1337649167239011, "compression_ratio": 1.6604651162790698, "no_speech_prob": + 0.010883676819503307}, {"id": 831, "seek": 498992, "start": 4997.2, "end": 5001.84, + "text": " And there''s got to be alternatives to indexing. And so thus the simplicity.", + "tokens": [50728, 400, 456, 311, 658, 281, 312, 20478, 281, 8186, 278, 13, 400, + 370, 8807, 264, 25632, 13, 50960], "temperature": 0.0, "avg_logprob": -0.1337649167239011, + "compression_ratio": 1.6604651162790698, "no_speech_prob": 0.010883676819503307}, + {"id": 832, "seek": 498992, "start": 5002.64, "end": 5008.8, "text": " Amazing. + Here comes my favorite question. 
As we get closer to the end of this amazing podcast", + "tokens": [51000, 14165, 13, 1692, 1487, 452, 2954, 1168, 13, 1018, 321, 483, 4966, + 281, 264, 917, 295, 341, 2243, 7367, 51308], "temperature": 0.0, "avg_logprob": + -0.1337649167239011, "compression_ratio": 1.6604651162790698, "no_speech_prob": + 0.010883676819503307}, {"id": 833, "seek": 498992, "start": 5008.8, "end": 5017.36, + "text": " episode, the question of why you''ve done a lot in software engineering, + you''ve done quite a lot", "tokens": [51308, 3500, 11, 264, 1168, 295, 983, 291, + 600, 1096, 257, 688, 294, 4722, 7043, 11, 291, 600, 1096, 1596, 257, 688, 51736], + "temperature": 0.0, "avg_logprob": -0.1337649167239011, "compression_ratio": 1.6604651162790698, + "no_speech_prob": 0.010883676819503307}, {"id": 834, "seek": 501736, "start": 5017.36, + "end": 5023.679999999999, "text": " in search. You mentioned on all this companies, + you know, like fast, which, you know,", "tokens": [50364, 294, 3164, 13, 509, 2835, + 322, 439, 341, 3431, 11, 291, 458, 11, 411, 2370, 11, 597, 11, 291, 458, 11, 50680], + "temperature": 0.0, "avg_logprob": -0.22209063212076824, "compression_ratio": 1.4806629834254144, + "no_speech_prob": 0.0016709007322788239}, {"id": 835, "seek": 501736, "start": 5023.679999999999, + "end": 5030.08, "text": " product became like Vespa and so on. You''re building + swirl. Why? Like what keeps you", "tokens": [50680, 1674, 3062, 411, 691, 279, 4306, + 293, 370, 322, 13, 509, 434, 2390, 30310, 13, 1545, 30, 1743, 437, 5965, 291, 51000], + "temperature": 0.0, "avg_logprob": -0.22209063212076824, "compression_ratio": 1.4806629834254144, + "no_speech_prob": 0.0016709007322788239}, {"id": 836, "seek": 501736, "start": 5031.2, + "end": 5040.4, "text": " motivated to do this? And as amazing as it is, like you''re + doing a lot of things. 
And also in the", "tokens": [51056, 14515, 281, 360, 341, + 30, 400, 382, 2243, 382, 309, 307, 11, 411, 291, 434, 884, 257, 688, 295, 721, 13, + 400, 611, 294, 264, 51516], "temperature": 0.0, "avg_logprob": -0.22209063212076824, + "compression_ratio": 1.4806629834254144, "no_speech_prob": 0.0016709007322788239}, + {"id": 837, "seek": 504040, "start": 5040.4, "end": 5050.16, "text": " open, what + motivates you to stay in this topic of search? You know, whether or not it''s been", + "tokens": [50364, 1269, 11, 437, 42569, 291, 281, 1754, 294, 341, 4829, 295, 3164, + 30, 509, 458, 11, 1968, 420, 406, 309, 311, 668, 50852], "temperature": 0.0, "avg_logprob": + -0.1261233507200729, "compression_ratio": 1.5183673469387755, "no_speech_prob": + 0.0040545351803302765}, {"id": 838, "seek": 504040, "start": 5050.16, "end": 5056.0, + "text": " searched, data integration has been the thing that I''ve always liked. + I started my career at", "tokens": [50852, 22961, 11, 1412, 10980, 575, 668, 264, + 551, 300, 286, 600, 1009, 4501, 13, 286, 1409, 452, 3988, 412, 51144], "temperature": + 0.0, "avg_logprob": -0.1261233507200729, "compression_ratio": 1.5183673469387755, + "no_speech_prob": 0.0040545351803302765}, {"id": 839, "seek": 504040, "start": 5056.0, + "end": 5061.679999999999, "text": " John Hancock financial services working in marketing, + doing customer segmentation. Interesting", "tokens": [51144, 2619, 7820, 29779, + 4669, 3328, 1364, 294, 6370, 11, 884, 5474, 9469, 399, 13, 14711, 51428], "temperature": + 0.0, "avg_logprob": -0.1261233507200729, "compression_ratio": 1.5183673469387755, + "no_speech_prob": 0.0040545351803302765}, {"id": 840, "seek": 504040, "start": 5061.679999999999, + "end": 5069.679999999999, "text": " stuff. 
But really, the problem the company couldn''t + solve was how to view, well, completely", "tokens": [51428, 1507, 13, 583, 534, + 11, 264, 1154, 264, 2237, 2809, 380, 5039, 390, 577, 281, 1910, 11, 731, 11, 2584, + 51828], "temperature": 0.0, "avg_logprob": -0.1261233507200729, "compression_ratio": + 1.5183673469387755, "no_speech_prob": 0.0040545351803302765}, {"id": 841, "seek": + 506968, "start": 5069.68, "end": 5078.240000000001, "text": " separate product lines + in one way. They had no idea, right? 110-year-old company had no idea", "tokens": + [50364, 4994, 1674, 3876, 294, 472, 636, 13, 814, 632, 572, 1558, 11, 558, 30, 20154, + 12, 5294, 12, 2641, 2237, 632, 572, 1558, 50792], "temperature": 0.0, "avg_logprob": + -0.12974620902019998, "compression_ratio": 1.5685483870967742, "no_speech_prob": + 0.004880491644144058}, {"id": 842, "seek": 506968, "start": 5078.240000000001, "end": + 5084.0, "text": " that it had a Pareto actually was somewhat worse. Like 10% or + 15% of the customers were producing", "tokens": [50792, 300, 309, 632, 257, 31189, + 1353, 767, 390, 8344, 5324, 13, 1743, 1266, 4, 420, 2119, 4, 295, 264, 4581, 645, + 10501, 51080], "temperature": 0.0, "avg_logprob": -0.12974620902019998, "compression_ratio": + 1.5685483870967742, "no_speech_prob": 0.004880491644144058}, {"id": 843, "seek": + 506968, "start": 5084.0, "end": 5089.52, "text": " 80% of the premiums. Everybody + got treated equally. It was like a very old school business that", "tokens": [51080, + 4688, 4, 295, 264, 12049, 82, 13, 7646, 658, 8668, 12309, 13, 467, 390, 411, 257, + 588, 1331, 1395, 1606, 300, 51356], "temperature": 0.0, "avg_logprob": -0.12974620902019998, + "compression_ratio": 1.5685483870967742, "no_speech_prob": 0.004880491644144058}, + {"id": 844, "seek": 506968, "start": 5089.52, "end": 5094.96, "text": " was all + about customers without really understanding customers. 
And it was still massively + successful.", "tokens": [51356, 390, 439, 466, 4581, 1553, 534, 3701, 4581, 13, + 400, 309, 390, 920, 29379, 4406, 13, 51628], "temperature": 0.0, "avg_logprob": + -0.12974620902019998, "compression_ratio": 1.5685483870967742, "no_speech_prob": + 0.004880491644144058}, {"id": 845, "seek": 509496, "start": 5095.28, "end": 5100.32, + "text": " So that''s not an act. They were one of the biggest users of technology. + Also, Hancock had the", "tokens": [50380, 407, 300, 311, 406, 364, 605, 13, 814, + 645, 472, 295, 264, 3880, 5022, 295, 2899, 13, 2743, 11, 7820, 29779, 632, 264, + 50632], "temperature": 0.0, "avg_logprob": -0.16405684668738563, "compression_ratio": + 1.7075812274368232, "no_speech_prob": 0.025208672508597374}, {"id": 846, "seek": + 509496, "start": 5100.32, "end": 5108.16, "text": " largest IBM mainframe, I think, + in the Northeast for many years. But the silo problem was the problem", "tokens": + [50632, 6443, 23487, 2135, 17265, 11, 286, 519, 11, 294, 264, 42150, 337, 867, 924, + 13, 583, 264, 3425, 78, 1154, 390, 264, 1154, 51024], "temperature": 0.0, "avg_logprob": + -0.16405684668738563, "compression_ratio": 1.7075812274368232, "no_speech_prob": + 0.025208672508597374}, {"id": 847, "seek": 509496, "start": 5108.64, "end": 5113.76, + "text": " that we had to solve to actually take the company to the level that it + could compete with direct", "tokens": [51048, 300, 321, 632, 281, 5039, 281, 767, + 747, 264, 2237, 281, 264, 1496, 300, 309, 727, 11831, 365, 2047, 51304], "temperature": + 0.0, "avg_logprob": -0.16405684668738563, "compression_ratio": 1.7075812274368232, + "no_speech_prob": 0.025208672508597374}, {"id": 848, "seek": 509496, "start": 5113.76, + "end": 5118.0, "text": " mail companies because direct mail companies had a lower + cost basis and they knew the customer.", "tokens": [51304, 10071, 3431, 570, 2047, + 10071, 3431, 632, 257, 3126, 2063, 5143, 293, 436, 2586, 264, 5474, 13, 51516], + "temperature": 
0.0, "avg_logprob": -0.16405684668738563, "compression_ratio": 1.7075812274368232, + "no_speech_prob": 0.025208672508597374}, {"id": 849, "seek": 509496, "start": 5119.04, + "end": 5124.0, "text": " And that project quite honestly is the pattern that I have + seen over and over again,", "tokens": [51568, 400, 300, 1716, 1596, 6095, 307, 264, + 5102, 300, 286, 362, 1612, 670, 293, 670, 797, 11, 51816], "temperature": 0.0, "avg_logprob": + -0.16405684668738563, "compression_ratio": 1.7075812274368232, "no_speech_prob": + 0.025208672508597374}, {"id": 850, "seek": 512400, "start": 5124.0, "end": 5129.04, + "text": " regardless of what venture search has been one of them. But I was really + lucky to work on mortgage", "tokens": [50364, 10060, 295, 437, 18474, 3164, 575, + 668, 472, 295, 552, 13, 583, 286, 390, 534, 6356, 281, 589, 322, 20236, 50616], + "temperature": 0.0, "avg_logprob": -0.1379446045965211, "compression_ratio": 1.6452702702702702, + "no_speech_prob": 0.0015558414161205292}, {"id": 851, "seek": 512400, "start": 5129.04, + "end": 5136.64, "text": " processing too. So a company called AI Foundry was actually + backed by Kodak Alaris, which was the", "tokens": [50616, 9007, 886, 13, 407, 257, + 2237, 1219, 7318, 8207, 627, 390, 767, 20391, 538, 591, 378, 514, 967, 27489, 11, + 597, 390, 264, 50996], "temperature": 0.0, "avg_logprob": -0.1379446045965211, "compression_ratio": + 1.6452702702702702, "no_speech_prob": 0.0015558414161205292}, {"id": 852, "seek": + 512400, "start": 5136.64, "end": 5141.52, "text": " world leader in scanning at + the time, right? 
Said, we need to come up with something to do.", "tokens": [50996, + 1002, 5263, 294, 27019, 412, 264, 565, 11, 558, 30, 26490, 11, 321, 643, 281, 808, + 493, 365, 746, 281, 360, 13, 51240], "temperature": 0.0, "avg_logprob": -0.1379446045965211, + "compression_ratio": 1.6452702702702702, "no_speech_prob": 0.0015558414161205292}, + {"id": 853, "seek": 512400, "start": 5141.52, "end": 5145.52, "text": " We need + to do something interesting with this scanning technology. And we''d like to apply + it in a", "tokens": [51240, 492, 643, 281, 360, 746, 1880, 365, 341, 27019, 2899, + 13, 400, 321, 1116, 411, 281, 3079, 309, 294, 257, 51440], "temperature": 0.0, "avg_logprob": + -0.1379446045965211, "compression_ratio": 1.6452702702702702, "no_speech_prob": + 0.0015558414161205292}, {"id": 854, "seek": 512400, "start": 5145.52, "end": 5150.24, + "text": " market other than consumer photos or things like that. Try to find a new + market. And mortgage turned", "tokens": [51440, 2142, 661, 813, 9711, 5787, 420, + 721, 411, 300, 13, 6526, 281, 915, 257, 777, 2142, 13, 400, 20236, 3574, 51676], + "temperature": 0.0, "avg_logprob": -0.1379446045965211, "compression_ratio": 1.6452702702702702, + "no_speech_prob": 0.0015558414161205292}, {"id": 855, "seek": 515024, "start": 5150.24, + "end": 5154.5599999999995, "text": " out to be hot because if you''ve done a mortgage, + right, if you''ve taken a mortgage, you have this", "tokens": [50364, 484, 281, + 312, 2368, 570, 498, 291, 600, 1096, 257, 20236, 11, 558, 11, 498, 291, 600, 2726, + 257, 20236, 11, 291, 362, 341, 50580], "temperature": 0.0, "avg_logprob": -0.14849209104265484, + "compression_ratio": 1.7283582089552239, "no_speech_prob": 0.014303239062428474}, + {"id": 856, "seek": 515024, "start": 5154.5599999999995, "end": 5158.24, "text": + " ugly moment of sending them a bunch of documents and then you just have to wait. 
+ And then sometimes", "tokens": [50580, 12246, 1623, 295, 7750, 552, 257, 3840, 295, + 8512, 293, 550, 291, 445, 362, 281, 1699, 13, 400, 550, 2171, 50764], "temperature": + 0.0, "avg_logprob": -0.14849209104265484, "compression_ratio": 1.7283582089552239, + "no_speech_prob": 0.014303239062428474}, {"id": 857, "seek": 515024, "start": 5158.24, + "end": 5162.719999999999, "text": " they''re like, Oh, I need to do this one again. + I believe there''s research that showed that something", "tokens": [50764, 436, + 434, 411, 11, 876, 11, 286, 643, 281, 360, 341, 472, 797, 13, 286, 1697, 456, 311, + 2132, 300, 4712, 300, 746, 50988], "temperature": 0.0, "avg_logprob": -0.14849209104265484, + "compression_ratio": 1.7283582089552239, "no_speech_prob": 0.014303239062428474}, + {"id": 858, "seek": 515024, "start": 5162.719999999999, "end": 5167.92, "text": + " like one third of the applicants drop out every two or three days after, you know, + you haven''t", "tokens": [50988, 411, 472, 2636, 295, 264, 28767, 3270, 484, 633, + 732, 420, 1045, 1708, 934, 11, 291, 458, 11, 291, 2378, 380, 51248], "temperature": + 0.0, "avg_logprob": -0.14849209104265484, "compression_ratio": 1.7283582089552239, + "no_speech_prob": 0.014303239062428474}, {"id": 859, "seek": 515024, "start": 5167.92, + "end": 5170.8, "text": " got back to them with their documents. 
They just want that + all clear, like you''re good.", "tokens": [51248, 658, 646, 281, 552, 365, 641, + 8512, 13, 814, 445, 528, 300, 439, 1850, 11, 411, 291, 434, 665, 13, 51392], "temperature": + 0.0, "avg_logprob": -0.14849209104265484, "compression_ratio": 1.7283582089552239, + "no_speech_prob": 0.014303239062428474}, {"id": 860, "seek": 515024, "start": 5171.679999999999, + "end": 5177.28, "text": " So AI Foundry used pretty interesting OCR''s, zone technology, + classification, text classification", "tokens": [51436, 407, 7318, 8207, 627, 1143, + 1238, 1880, 422, 18547, 311, 11, 6668, 2899, 11, 21538, 11, 2487, 21538, 51716], + "temperature": 0.0, "avg_logprob": -0.14849209104265484, "compression_ratio": 1.7283582089552239, + "no_speech_prob": 0.014303239062428474}, {"id": 861, "seek": 517728, "start": 5177.36, + "end": 5182.8, "text": " to turn the mortgage app into data, not 100% with the state + of the art before was", "tokens": [50368, 281, 1261, 264, 20236, 724, 666, 1412, + 11, 406, 2319, 4, 365, 264, 1785, 295, 264, 1523, 949, 390, 50640], "temperature": + 0.0, "avg_logprob": -0.22198511872972762, "compression_ratio": 1.6472727272727272, + "no_speech_prob": 0.006602272856980562}, {"id": 862, "seek": 517728, "start": 5182.8, + "end": 5186.8, "text": " keying it, manually keying it, and then someone would manually + review it. So we switched it to review.", "tokens": [50640, 803, 1840, 309, 11, + 16945, 803, 1840, 309, 11, 293, 550, 1580, 576, 16945, 3131, 309, 13, 407, 321, + 16858, 309, 281, 3131, 13, 50840], "temperature": 0.0, "avg_logprob": -0.22198511872972762, + "compression_ratio": 1.6472727272727272, "no_speech_prob": 0.006602272856980562}, + {"id": 863, "seek": 517728, "start": 5187.5199999999995, "end": 5193.36, "text": + " Company was successful. It was a silo problem again. 
You could think of the different", + "tokens": [50876, 13918, 390, 4406, 13, 467, 390, 257, 3425, 78, 1154, 797, 13, + 509, 727, 519, 295, 264, 819, 51168], "temperature": 0.0, "avg_logprob": -0.22198511872972762, + "compression_ratio": 1.6472727272727272, "no_speech_prob": 0.006602272856980562}, + {"id": 864, "seek": 517728, "start": 5194.24, "end": 5199.12, "text": " types, right, + of articles as being fundamentally silos and understanding them was hard,", "tokens": + [51212, 3467, 11, 558, 11, 295, 11290, 382, 885, 17879, 48893, 293, 3701, 552, 390, + 1152, 11, 51456], "temperature": 0.0, "avg_logprob": -0.22198511872972762, "compression_ratio": + 1.6472727272727272, "no_speech_prob": 0.006602272856980562}, {"id": 865, "seek": + 517728, "start": 5199.12, "end": 5203.04, "text": " and we do a lot of modeling + and it worked. It worked great, right? Gaulous bought the company.", "tokens": [51456, + 293, 321, 360, 257, 688, 295, 15983, 293, 309, 2732, 13, 467, 2732, 869, 11, 558, + 30, 10384, 6893, 4243, 264, 2237, 13, 51652], "temperature": 0.0, "avg_logprob": + -0.22198511872972762, "compression_ratio": 1.6472727272727272, "no_speech_prob": + 0.006602272856980562}, {"id": 866, "seek": 520304, "start": 5203.76, "end": 5210.24, + "text": " That''s just another example. 
Did the same thing in an IoT company, most + recently, where we''re", "tokens": [50400, 663, 311, 445, 1071, 1365, 13, 2589, + 264, 912, 551, 294, 364, 30112, 2237, 11, 881, 3938, 11, 689, 321, 434, 50724], + "temperature": 0.0, "avg_logprob": -0.15992421117322198, "compression_ratio": 1.6079734219269104, + "no_speech_prob": 0.0012512467801570892}, {"id": 867, "seek": 520304, "start": 5210.24, + "end": 5214.88, "text": " basically taking sensor data from healthcare settings, + marrying it up with other data, like", "tokens": [50724, 1936, 1940, 10200, 1412, + 490, 8884, 6257, 11, 36376, 309, 493, 365, 661, 1412, 11, 411, 50956], "temperature": + 0.0, "avg_logprob": -0.15992421117322198, "compression_ratio": 1.6079734219269104, + "no_speech_prob": 0.0012512467801570892}, {"id": 868, "seek": 520304, "start": 5214.88, + "end": 5220.24, "text": " their EHR data and trying to predict, you know, likelihood + of various conditions. So it''s always", "tokens": [50956, 641, 39416, 49, 1412, + 293, 1382, 281, 6069, 11, 291, 458, 11, 22119, 295, 3683, 4487, 13, 407, 309, 311, + 1009, 51224], "temperature": 0.0, "avg_logprob": -0.15992421117322198, "compression_ratio": + 1.6079734219269104, "no_speech_prob": 0.0012512467801570892}, {"id": 869, "seek": + 520304, "start": 5220.24, "end": 5224.8, "text": " the silo problem. And frankly, + every single one of these ventures would have benefited from something", "tokens": + [51224, 264, 3425, 78, 1154, 13, 400, 11939, 11, 633, 2167, 472, 295, 613, 6931, + 1303, 576, 362, 33605, 490, 746, 51452], "temperature": 0.0, "avg_logprob": -0.15992421117322198, + "compression_ratio": 1.6079734219269104, "no_speech_prob": 0.0012512467801570892}, + {"id": 870, "seek": 520304, "start": 5224.8, "end": 5230.08, "text": " like swirl. + So that''s why I did it. 
It''s because to be honest with you, I think the data problem + is", "tokens": [51452, 411, 30310, 13, 407, 300, 311, 983, 286, 630, 309, 13, 467, + 311, 570, 281, 312, 3245, 365, 291, 11, 286, 519, 264, 1412, 1154, 307, 51716], + "temperature": 0.0, "avg_logprob": -0.15992421117322198, "compression_ratio": 1.6079734219269104, + "no_speech_prob": 0.0012512467801570892}, {"id": 871, "seek": 523008, "start": 5230.16, + "end": 5236.32, "text": " huge. I''m passionate about it. And I think it''s important + to solve it because frankly, some of the", "tokens": [50368, 2603, 13, 286, 478, + 11410, 466, 309, 13, 400, 286, 519, 309, 311, 1021, 281, 5039, 309, 570, 11939, + 11, 512, 295, 264, 50676], "temperature": 0.0, "avg_logprob": -0.12234523684479469, + "compression_ratio": 1.6464646464646464, "no_speech_prob": 0.015091207809746265}, + {"id": 872, "seek": 523008, "start": 5236.32, "end": 5240.72, "text": " service + problems, right, that we all suffer when we''re out in the field dealing with large + companies", "tokens": [50676, 2643, 2740, 11, 558, 11, 300, 321, 439, 9753, 562, + 321, 434, 484, 294, 264, 2519, 6260, 365, 2416, 3431, 50896], "temperature": 0.0, + "avg_logprob": -0.12234523684479469, "compression_ratio": 1.6464646464646464, "no_speech_prob": + 0.015091207809746265}, {"id": 873, "seek": 523008, "start": 5240.72, "end": 5244.96, + "text": " because they just don''t have the data. They''re not just trying to be + mean or be clueless, right?", "tokens": [50896, 570, 436, 445, 500, 380, 362, 264, + 1412, 13, 814, 434, 406, 445, 1382, 281, 312, 914, 420, 312, 596, 3483, 442, 11, + 558, 30, 51108], "temperature": 0.0, "avg_logprob": -0.12234523684479469, "compression_ratio": + 1.6464646464646464, "no_speech_prob": 0.015091207809746265}, {"id": 874, "seek": + 523008, "start": 5244.96, "end": 5249.68, "text": " Sometimes it''s like, it''s + a hard problem to solve. We expect a lot now. 
As an engineer, right?", "tokens": + [51108, 4803, 309, 311, 411, 11, 309, 311, 257, 1152, 1154, 281, 5039, 13, 492, + 2066, 257, 688, 586, 13, 1018, 364, 11403, 11, 558, 30, 51344], "temperature": 0.0, + "avg_logprob": -0.12234523684479469, "compression_ratio": 1.6464646464646464, "no_speech_prob": + 0.015091207809746265}, {"id": 875, "seek": 523008, "start": 5249.68, "end": 5255.28, + "text": " I''m expecting chat GPT level response is pretty soon. And yet, what we + have is Siri, who like can", "tokens": [51344, 286, 478, 9650, 5081, 26039, 51, + 1496, 4134, 307, 1238, 2321, 13, 400, 1939, 11, 437, 321, 362, 307, 33682, 11, 567, + 411, 393, 51624], "temperature": 0.0, "avg_logprob": -0.12234523684479469, "compression_ratio": + 1.6464646464646464, "no_speech_prob": 0.015091207809746265}, {"id": 876, "seek": + 525528, "start": 5255.28, "end": 5260.8, "text": " barely figure out to turn off + the alarm, you know, what it''s going to. So there are going to be some", "tokens": + [50364, 10268, 2573, 484, 281, 1261, 766, 264, 14183, 11, 291, 458, 11, 437, 309, + 311, 516, 281, 13, 407, 456, 366, 516, 281, 312, 512, 50640], "temperature": 0.0, + "avg_logprob": -0.2271327105435458, "compression_ratio": 1.5633802816901408, "no_speech_prob": + 0.004162776283919811}, {"id": 877, "seek": 525528, "start": 5260.8, "end": 5267.12, + "text": " bumps. There''s going to be some sudden pulls and pushes. 
But I think + the important thing is that", "tokens": [50640, 27719, 13, 821, 311, 516, 281, 312, + 512, 3990, 16982, 293, 21020, 13, 583, 286, 519, 264, 1021, 551, 307, 300, 50956], + "temperature": 0.0, "avg_logprob": -0.2271327105435458, "compression_ratio": 1.5633802816901408, + "no_speech_prob": 0.004162776283919811}, {"id": 878, "seek": 525528, "start": 5268.0, + "end": 5271.28, "text": " why you ask me why do it open because prove it.", "tokens": + [51000, 983, 291, 1029, 385, 983, 360, 309, 1269, 570, 7081, 309, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.2271327105435458, "compression_ratio": 1.5633802816901408, + "no_speech_prob": 0.004162776283919811}, {"id": 879, "seek": 525528, "start": 5273.92, + "end": 5283.36, "text": " Awesome. Yeah, this is an amazing answer. So data is literally + king and the one who has", "tokens": [51296, 10391, 13, 865, 11, 341, 307, 364, + 2243, 1867, 13, 407, 1412, 307, 3736, 4867, 293, 264, 472, 567, 575, 51768], "temperature": + 0.0, "avg_logprob": -0.2271327105435458, "compression_ratio": 1.5633802816901408, + "no_speech_prob": 0.004162776283919811}, {"id": 880, "seek": 528336, "start": 5284.32, + "end": 5289.44, "text": " universal access to data, wins, right? In so many senses + of this word.", "tokens": [50412, 11455, 2105, 281, 1412, 11, 10641, 11, 558, 30, + 682, 370, 867, 17057, 295, 341, 1349, 13, 50668], "temperature": 0.0, "avg_logprob": + -0.15400277953786948, "compression_ratio": 1.5493562231759657, "no_speech_prob": + 0.030450498685240746}, {"id": 881, "seek": 528336, "start": 5290.719999999999, "end": + 5296.88, "text": " This is so great. This is so good. Chatting to you, Sid, I''ve + learned a lot. 
I was wondering if", "tokens": [50732, 639, 307, 370, 869, 13, 639, + 307, 370, 665, 13, 27503, 783, 281, 291, 11, 19797, 11, 286, 600, 3264, 257, 688, + 13, 286, 390, 6359, 498, 51040], "temperature": 0.0, "avg_logprob": -0.15400277953786948, + "compression_ratio": 1.5493562231759657, "no_speech_prob": 0.030450498685240746}, + {"id": 882, "seek": 528336, "start": 5296.88, "end": 5302.719999999999, "text": + " there''s something you would like to announce. Something is cooking. Or you simply + want to invite", "tokens": [51040, 456, 311, 746, 291, 576, 411, 281, 7478, 13, + 6595, 307, 6361, 13, 1610, 291, 2935, 528, 281, 7980, 51332], "temperature": 0.0, + "avg_logprob": -0.15400277953786948, "compression_ratio": 1.5493562231759657, "no_speech_prob": + 0.030450498685240746}, {"id": 883, "seek": 528336, "start": 5302.719999999999, "end": + 5309.2, "text": " developers to a tutorial and to send a pull request. Well, I would + love to do that. First of all,", "tokens": [51332, 8849, 281, 257, 7073, 293, 281, + 2845, 257, 2235, 5308, 13, 1042, 11, 286, 576, 959, 281, 360, 300, 13, 2386, 295, + 439, 11, 51656], "temperature": 0.0, "avg_logprob": -0.15400277953786948, "compression_ratio": + 1.5493562231759657, "no_speech_prob": 0.030450498685240746}, {"id": 884, "seek": + 530920, "start": 5309.2, "end": 5316.16, "text": " we have webinars every couple + of weeks. Please come if you''re interested. Just, it''s a,", "tokens": [50364, + 321, 362, 26065, 633, 1916, 295, 3259, 13, 2555, 808, 498, 291, 434, 3102, 13, 1449, + 11, 309, 311, 257, 11, 50712], "temperature": 0.0, "avg_logprob": -0.23585526645183563, + "compression_ratio": 1.6283783783783783, "no_speech_prob": 0.01513128262013197}, + {"id": 885, "seek": 530920, "start": 5316.16, "end": 5323.2, "text": " you just + need to put an email address at the edge of the red form. 
We are also totally available", + "tokens": [50712, 291, 445, 643, 281, 829, 364, 3796, 2985, 412, 264, 4691, 295, + 264, 2182, 1254, 13, 492, 366, 611, 3879, 2435, 51064], "temperature": 0.0, "avg_logprob": + -0.23585526645183563, "compression_ratio": 1.6283783783783783, "no_speech_prob": + 0.01513128262013197}, {"id": 886, "seek": 530920, "start": 5323.2, "end": 5328.16, + "text": " on Slack. There''s, you know, totally, we don''t have sales. It''s free. + Just connect up. You''ll talk", "tokens": [51064, 322, 37211, 13, 821, 311, 11, + 291, 458, 11, 3879, 11, 321, 500, 380, 362, 5763, 13, 467, 311, 1737, 13, 1449, + 1745, 493, 13, 509, 603, 751, 51312], "temperature": 0.0, "avg_logprob": -0.23585526645183563, + "compression_ratio": 1.6283783783783783, "no_speech_prob": 0.01513128262013197}, + {"id": 887, "seek": 530920, "start": 5328.16, "end": 5333.36, "text": " to support + or customer success. I guess is the more, more appropriate term these days. But + they''re", "tokens": [51312, 281, 1406, 420, 5474, 2245, 13, 286, 2041, 307, 264, + 544, 11, 544, 6854, 1433, 613, 1708, 13, 583, 436, 434, 51572], "temperature": 0.0, + "avg_logprob": -0.23585526645183563, "compression_ratio": 1.6283783783783783, "no_speech_prob": + 0.01513128262013197}, {"id": 888, "seek": 530920, "start": 5333.36, "end": 5337.12, + "text": " here. We''re here to help. That includes me and everybody else on the + team. There''s only five of us.", "tokens": [51572, 510, 13, 492, 434, 510, 281, + 854, 13, 663, 5974, 385, 293, 2201, 1646, 322, 264, 1469, 13, 821, 311, 787, 1732, + 295, 505, 13, 51760], "temperature": 0.0, "avg_logprob": -0.23585526645183563, "compression_ratio": + 1.6283783783783783, "no_speech_prob": 0.01513128262013197}, {"id": 889, "seek": + 533712, "start": 5338.08, "end": 5343.76, "text": " But we''re all here to help. 
+ We would love to hear what you want to do this world, what you''re", "tokens": [50412, + 583, 321, 434, 439, 510, 281, 854, 13, 492, 576, 959, 281, 1568, 437, 291, 528, + 281, 360, 341, 1002, 11, 437, 291, 434, 50696], "temperature": 0.0, "avg_logprob": + -0.1151013780933942, "compression_ratio": 1.6958041958041958, "no_speech_prob": + 0.0313730388879776}, {"id": 890, "seek": 533712, "start": 5343.76, "end": 5347.68, + "text": " doing this world. We are here to write, if you need help with a search + provider, we''ll write it", "tokens": [50696, 884, 341, 1002, 13, 492, 366, 510, + 281, 2464, 11, 498, 291, 643, 854, 365, 257, 3164, 12398, 11, 321, 603, 2464, 309, + 50892], "temperature": 0.0, "avg_logprob": -0.1151013780933942, "compression_ratio": + 1.6958041958041958, "no_speech_prob": 0.0313730388879776}, {"id": 891, "seek": 533712, + "start": 5347.68, "end": 5354.08, "text": " for you or help you help you get it + working. What I can say for sure is this. Next month, version 2.0", "tokens": [50892, + 337, 291, 420, 854, 291, 854, 291, 483, 309, 1364, 13, 708, 286, 393, 584, 337, + 988, 307, 341, 13, 3087, 1618, 11, 3037, 568, 13, 15, 51212], "temperature": 0.0, + "avg_logprob": -0.1151013780933942, "compression_ratio": 1.6958041958041958, "no_speech_prob": + 0.0313730388879776}, {"id": 892, "seek": 533712, "start": 5354.08, "end": 5359.68, + "text": " will drop. It will be something you can one click try and it will have + the M365 integration that I", "tokens": [51212, 486, 3270, 13, 467, 486, 312, 746, + 291, 393, 472, 2052, 853, 293, 309, 486, 362, 264, 376, 11309, 20, 10980, 300, 286, + 51492], "temperature": 0.0, "avg_logprob": -0.1151013780933942, "compression_ratio": + 1.6958041958041958, "no_speech_prob": 0.0313730388879776}, {"id": 893, "seek": 533712, + "start": 5359.68, "end": 5364.4, "text": " talked about. 
So the full ability to + deploy it to your tenant in our hosted version or just to", "tokens": [51492, 2825, + 466, 13, 407, 264, 1577, 3485, 281, 7274, 309, 281, 428, 31000, 294, 527, 19204, + 3037, 420, 445, 281, 51728], "temperature": 0.0, "avg_logprob": -0.1151013780933942, + "compression_ratio": 1.6958041958041958, "no_speech_prob": 0.0313730388879776}, + {"id": 894, "seek": 536440, "start": 5364.4, "end": 5370.5599999999995, "text": + " take the Docker, run with it, hook that up so it will support OAuth 2 and OIDC. + Many, many more features", "tokens": [50364, 747, 264, 33772, 11, 1190, 365, 309, + 11, 6328, 300, 493, 370, 309, 486, 1406, 48424, 2910, 568, 293, 422, 2777, 34, 13, + 5126, 11, 867, 544, 4122, 50672], "temperature": 0.0, "avg_logprob": -0.17458599585073967, + "compression_ratio": 1.5134099616858236, "no_speech_prob": 0.003458797698840499}, + {"id": 895, "seek": 536440, "start": 5370.5599999999995, "end": 5374.799999999999, + "text": " will be elaborating on the things you can do with it over the next couple + of months, particularly in", "tokens": [50672, 486, 312, 16298, 990, 322, 264, 721, + 291, 393, 360, 365, 309, 670, 264, 958, 1916, 295, 2493, 11, 4098, 294, 50884], + "temperature": 0.0, "avg_logprob": -0.17458599585073967, "compression_ratio": 1.5134099616858236, + "no_speech_prob": 0.003458797698840499}, {"id": 896, "seek": 536440, "start": 5374.799999999999, + "end": 5383.839999999999, "text": " May. And I just really would beg people to try + it and tell us what you think. 
That''s my ask.", "tokens": [50884, 1891, 13, 400, + 286, 445, 534, 576, 4612, 561, 281, 853, 309, 293, 980, 505, 437, 291, 519, 13, + 663, 311, 452, 1029, 13, 51336], "temperature": 0.0, "avg_logprob": -0.17458599585073967, + "compression_ratio": 1.5134099616858236, "no_speech_prob": 0.003458797698840499}, + {"id": 897, "seek": 536440, "start": 5383.839999999999, "end": 5390.16, "text": + " So if, and if anybody can, if you want to work on it, you know, we''re always + delighted to accept", "tokens": [51336, 407, 498, 11, 293, 498, 4472, 393, 11, 498, + 291, 528, 281, 589, 322, 309, 11, 291, 458, 11, 321, 434, 1009, 18783, 281, 3241, + 51652], "temperature": 0.0, "avg_logprob": -0.17458599585073967, "compression_ratio": + 1.5134099616858236, "no_speech_prob": 0.003458797698840499}, {"id": 898, "seek": + 539016, "start": 5391.12, "end": 5396.48, "text": " and even guide anybody as to + where to start. Right. So that''s where we are. We''re very young", "tokens": [50412, + 293, 754, 5934, 4472, 382, 281, 689, 281, 722, 13, 1779, 13, 407, 300, 311, 689, + 321, 366, 13, 492, 434, 588, 2037, 50680], "temperature": 0.0, "avg_logprob": -0.19176543386358963, + "compression_ratio": 1.602510460251046, "no_speech_prob": 0.03748473897576332}, + {"id": 899, "seek": 539016, "start": 5396.48, "end": 5402.96, "text": " and we''re + trying to figure this out. And energetic and knowledgeable. 
And I think we will + link", "tokens": [50680, 293, 321, 434, 1382, 281, 2573, 341, 484, 13, 400, 24935, + 293, 33800, 13, 400, 286, 519, 321, 486, 2113, 51004], "temperature": 0.0, "avg_logprob": + -0.19176543386358963, "compression_ratio": 1.602510460251046, "no_speech_prob": + 0.03748473897576332}, {"id": 900, "seek": 539016, "start": 5402.96, "end": 5408.72, + "text": " everything you mentioned, of course, in the episode show notes so everyone + can click it their will.", "tokens": [51004, 1203, 291, 2835, 11, 295, 1164, 11, + 294, 264, 3500, 855, 5570, 370, 1518, 393, 2052, 309, 641, 486, 13, 51292], "temperature": + 0.0, "avg_logprob": -0.19176543386358963, "compression_ratio": 1.602510460251046, + "no_speech_prob": 0.03748473897576332}, {"id": 901, "seek": 539016, "start": 5408.72, + "end": 5416.96, "text": " And you know, follow and learn from you as I did today. + And I really want to allocate time also", "tokens": [51292, 400, 291, 458, 11, 1524, + 293, 1466, 490, 291, 382, 286, 630, 965, 13, 400, 286, 534, 528, 281, 35713, 565, + 611, 51704], "temperature": 0.0, "avg_logprob": -0.19176543386358963, "compression_ratio": + 1.602510460251046, "no_speech_prob": 0.03748473897576332}, {"id": 902, "seek": 541696, + "start": 5417.36, "end": 5421.28, "text": " to participate in one of your webinars + with them. I''m pretty sure I will learn more.", "tokens": [50384, 281, 8197, 294, + 472, 295, 428, 26065, 365, 552, 13, 286, 478, 1238, 988, 286, 486, 1466, 544, 13, + 50580], "temperature": 0.0, "avg_logprob": -0.20890057881673177, "compression_ratio": + 1.6774193548387097, "no_speech_prob": 0.013585628010332584}, {"id": 903, "seek": + 541696, "start": 5422.32, "end": 5427.2, "text": " That would be great. We are definitely + bringing in folks. 
We had again, KMW, which makes", "tokens": [50632, 663, 576, + 312, 869, 13, 492, 366, 2138, 5062, 294, 4024, 13, 492, 632, 797, 11, 591, 44, 54, + 11, 597, 1669, 50876], "temperature": 0.0, "avg_logprob": -0.20890057881673177, + "compression_ratio": 1.6774193548387097, "no_speech_prob": 0.013585628010332584}, + {"id": 904, "seek": 541696, "start": 5427.2, "end": 5432.56, "text": " Spyglass + the open source project. We had the author of Quarge came, came previously Renee. + It was", "tokens": [50876, 35854, 28851, 264, 1269, 4009, 1716, 13, 492, 632, 264, + 3793, 295, 2326, 289, 432, 1361, 11, 1361, 8046, 47790, 13, 467, 390, 51144], "temperature": + 0.0, "avg_logprob": -0.20890057881673177, "compression_ratio": 1.6774193548387097, + "no_speech_prob": 0.013585628010332584}, {"id": 905, "seek": 541696, "start": 5432.56, + "end": 5436.64, "text": " great fun. We hope to have him on again, because I think + we could learn. I''m actually listening for", "tokens": [51144, 869, 1019, 13, 492, + 1454, 281, 362, 796, 322, 797, 11, 570, 286, 519, 321, 727, 1466, 13, 286, 478, + 767, 4764, 337, 51348], "temperature": 0.0, "avg_logprob": -0.20890057881673177, + "compression_ratio": 1.6774193548387097, "no_speech_prob": 0.013585628010332584}, + {"id": 906, "seek": 541696, "start": 5436.64, "end": 5442.0, "text": " our talk + about the things they''re doing. So and many others. 
So absolutely, we''d love to + have you on.", "tokens": [51348, 527, 751, 466, 264, 721, 436, 434, 884, 13, 407, + 293, 867, 2357, 13, 407, 3122, 11, 321, 1116, 959, 281, 362, 291, 322, 13, 51616], + "temperature": 0.0, "avg_logprob": -0.20890057881673177, "compression_ratio": 1.6774193548387097, + "no_speech_prob": 0.013585628010332584}, {"id": 907, "seek": 541696, "start": 5442.0, + "end": 5446.16, "text": " And if you know anybody who wants to talk about the stuff + too, please, I''d love to have them on as", "tokens": [51616, 400, 498, 291, 458, + 4472, 567, 2738, 281, 751, 466, 264, 1507, 886, 11, 1767, 11, 286, 1116, 959, 281, + 362, 552, 322, 382, 51824], "temperature": 0.0, "avg_logprob": -0.20890057881673177, + "compression_ratio": 1.6774193548387097, "no_speech_prob": 0.013585628010332584}, + {"id": 908, "seek": 544616, "start": 5446.16, "end": 5454.96, "text": " well. Fantastic. + Thanks for pushing the envelope of search. Keep pushing. I wish you all the success", + "tokens": [50364, 731, 13, 21320, 13, 2561, 337, 7380, 264, 19989, 295, 3164, 13, + 5527, 7380, 13, 286, 3172, 291, 439, 264, 2245, 50804], "temperature": 0.0, "avg_logprob": + -0.16196058234389948, "compression_ratio": 1.625514403292181, "no_speech_prob": + 0.02095511183142662}, {"id": 909, "seek": 544616, "start": 5454.96, "end": 5462.88, + "text": " that you can get and beyond. And I hope we can chat more down the line + down the road as you got,", "tokens": [50804, 300, 291, 393, 483, 293, 4399, 13, + 400, 286, 1454, 321, 393, 5081, 544, 760, 264, 1622, 760, 264, 3060, 382, 291, 658, + 11, 51200], "temperature": 0.0, "avg_logprob": -0.16196058234389948, "compression_ratio": + 1.625514403292181, "no_speech_prob": 0.02095511183142662}, {"id": 910, "seek": 544616, + "start": 5462.88, "end": 5469.599999999999, "text": " as you guys grow and I''m + pretty sure you will. 
Thank you so much for the confidence we will love to", "tokens": + [51200, 382, 291, 1074, 1852, 293, 286, 478, 1238, 988, 291, 486, 13, 1044, 291, + 370, 709, 337, 264, 6687, 321, 486, 959, 281, 51536], "temperature": 0.0, "avg_logprob": + -0.16196058234389948, "compression_ratio": 1.625514403292181, "no_speech_prob": + 0.02095511183142662}, {"id": 911, "seek": 544616, "start": 5469.599999999999, "end": + 5474.639999999999, "text": " share updates in future, especially I''ll be very psyched + to show you some of the machine learning", "tokens": [51536, 2073, 9205, 294, 2027, + 11, 2318, 286, 603, 312, 588, 4681, 292, 281, 855, 291, 512, 295, 264, 3479, 2539, + 51788], "temperature": 0.0, "avg_logprob": -0.16196058234389948, "compression_ratio": + 1.625514403292181, "no_speech_prob": 0.02095511183142662}, {"id": 912, "seek": 547464, + "start": 5474.64, "end": 5479.4400000000005, "text": " stuff we''re talking about + as a case, we definitely want to build that as a use case and make it", "tokens": + [50364, 1507, 321, 434, 1417, 466, 382, 257, 1389, 11, 321, 2138, 528, 281, 1322, + 300, 382, 257, 764, 1389, 293, 652, 309, 50604], "temperature": 0.0, "avg_logprob": + -0.18566633760929108, "compression_ratio": 1.6655052264808363, "no_speech_prob": + 0.04799362272024155}, {"id": 913, "seek": 547464, "start": 5479.4400000000005, "end": + 5484.8, "text": " one click easy to do that. So yeah, let''s keep it touch. I love + to too. I mean, I''m a huge fan of", "tokens": [50604, 472, 2052, 1858, 281, 360, + 300, 13, 407, 1338, 11, 718, 311, 1066, 309, 2557, 13, 286, 959, 281, 886, 13, 286, + 914, 11, 286, 478, 257, 2603, 3429, 295, 50872], "temperature": 0.0, "avg_logprob": + -0.18566633760929108, "compression_ratio": 1.6655052264808363, "no_speech_prob": + 0.04799362272024155}, {"id": 914, "seek": 547464, "start": 5484.8, "end": 5491.84, + "text": " the podcast. 
Obviously, I''ve listened to the Vespa cast several times + and I think please keep", "tokens": [50872, 264, 7367, 13, 7580, 11, 286, 600, 13207, + 281, 264, 691, 279, 4306, 4193, 2940, 1413, 293, 286, 519, 1767, 1066, 51224], "temperature": + 0.0, "avg_logprob": -0.18566633760929108, "compression_ratio": 1.6655052264808363, + "no_speech_prob": 0.04799362272024155}, {"id": 915, "seek": 547464, "start": 5491.84, + "end": 5495.52, "text": " it up. It''s awesome. There''s not enough people focused + on this incredible area of technology.", "tokens": [51224, 309, 493, 13, 467, 311, + 3476, 13, 821, 311, 406, 1547, 561, 5178, 322, 341, 4651, 1859, 295, 2899, 13, 51408], + "temperature": 0.0, "avg_logprob": -0.18566633760929108, "compression_ratio": 1.6655052264808363, + "no_speech_prob": 0.04799362272024155}, {"id": 916, "seek": 547464, "start": 5497.200000000001, + "end": 5500.96, "text": " We''re talking about stuff. I think it''s going to become + more common, but it''s still a little bit", "tokens": [51492, 492, 434, 1417, 466, + 1507, 13, 286, 519, 309, 311, 516, 281, 1813, 544, 2689, 11, 457, 309, 311, 920, + 257, 707, 857, 51680], "temperature": 0.0, "avg_logprob": -0.18566633760929108, + "compression_ratio": 1.6655052264808363, "no_speech_prob": 0.04799362272024155}, + {"id": 917, "seek": 550096, "start": 5500.96, "end": 5507.28, "text": " unknown. + Yeah, appreciate your kind words. It''s thanks to you, makers. Thank you so much,", + "tokens": [50364, 9841, 13, 865, 11, 4449, 428, 733, 2283, 13, 467, 311, 3231, 281, + 291, 11, 19323, 13, 1044, 291, 370, 709, 11, 50680], "temperature": 0.0, "avg_logprob": + -0.23133602509131798, "compression_ratio": 1.072289156626506, "no_speech_prob": + 0.03108839876949787}, {"id": 918, "seek": 550728, "start": 5507.28, "end": 5511.599999999999, + "text": " say it for your time. Really enjoyed it. Thank you very much. 
Bye bye.",
+ "tokens": [50364, 584, 309, 337, 428, 565, 13, 4083, 4626, 309, 13, 1044, 291, 588,
+ 709, 13, 4621, 6543, 13, 50580], "temperature": 0.0, "avg_logprob": -0.4556102752685547,
+ "compression_ratio": 0.958904109589041, "no_speech_prob": 0.17116160690784454}]'
+---
+
In this episode, you will learn about Swirl, a MetaSearch engine with large language models for your silo data. Here you can see how it works for the summary transcript of this episode, created with the tool called ClearWord.
Hello there, Vector Podcast Season 2, and today I'm super excited to be talking to Sid Probstein, the creator of Swirl Search. It's a federated VectorSearch engine, if I'm correct, but we will hear more from Sid himself. Hello, Sid, how are you? I'm doing great. It's really great to be here.
Thank you so much for inviting me. Yeah, thanks for joining. I'm sure you are very busy building Swirl, and I'm really curious to learn more about it. Amidst all the discussion, you know, how ChatGPT is going to change things.
You know, is it going to conquer us or whatnot? But yeah, I mean, I'm really interested to hear how you guys are doing, how you guys are building this. And traditionally, we start with your background because we really want to know how you got here. Absolutely.
And it's been an interesting journey. Swirl actually is my, the 12th venture I've been lucky enough to work on. I started actually at a free email company called FreeMarkMail. You might remember Juno, our vastly more successful competitor.
It was a great, great lesson in marketing and customer acquisition.
But long story short, you know, my dad was an MIT professor, and he suggested, or he was interested in computers, and somewhere around, it was too long ago, but I was about 12 and I picked up a TRS-80 with 16K of RAM, I think, and a cassette tape for storage.
And we went to a couple of, actually, we went to two classes together, and then he didn't want to do it anymore, but I stayed with it. And I have always loved getting that computer to do things that we wanted to do.
And so I guess ever since then, I followed the tech path, so I was lucky enough to do my undergrad at MIT. I actually have an MBA, though, I'm one of those MBA CTOs. And mostly I've worked building software and leading teams to build products and services.
So some of them have been Attivio, which is now actually ServiceNow, which is obviously one of the unicorns out there. They really totally disrupted the knowledge base and help desk space.
And it's an incredible application of interesting core technology at the beginning, when things were whiteboardy.
I've worked in a couple of other search companies, and with some other search companies, I was lucky to spend a little time with Massood Zarrabian over at BA Insight, which was a very cool company, and also Jeff Fried, very cool.
And since I know those guys back from FAST, another company that I worked at, now Microsoft, FAST was one of the early players in enterprise search that had an excellent product that scaled, right as Google was sort of becoming a household name and just disintermediating everybody.
We had the tool to build the catalog, the e-catalog, mostly for publishers, but then it really spread out and started to affect intranets. And it was truly there that I saw the power of search and how it could change almost everything from the business perspective.
You know, business intelligence and reporting and all of these systems that have been around for 70, 80 years, they're what we settle for. But everybody, you know, from Brin and Page on, right, and way before that, we were all inspired by that Star Trek computer.
Why can't we just ask it, you know, it seems like it's not that hard. And now of course, not to give away the lead, right?
But there's definitely something doing that and it's been a long time coming. But that is not structured data.
Well, let's not argue about the semantics, but it's not what people refer to as structured. It's not database data, metrics and KPIs and sales numbers and things like that.
I think that it was really at FAST and also at Northern Light Technology, which is still going strong, by the way, with some fantastic indexing search. And now they're doing question answering. First place I really touched search, right, was at Northern Light. It's the human interface.
And we feel like it should be coming along faster. And now the tech, after many years of indexing and vector search, right? And the advances driven by Google so much, right? Transformer architectures and vectors. And that has all come together into a pretty amazing place.
And so long story short, that background led me to create Swirl because I noticed a couple things. It really came down to three things. One is that there are silos, super silos, like ServiceNow.
ServiceNow really did get a lot of the knowledge bases and a lot of the help desk, you know, the tickets base with the streams and tickets. M365 kind of won the files race at least, right, along with email. And I guess they've done very well.
Obviously, very impressive performance to build Teams to the large community that it has developed. So, and then there are others, right? There's certainly Salesforce, a great example of where most of the CRM data now lives. Snowflake's another one, you can't really get a copy of these.
I mean, moving the data out from Snowflake is relatively easy, but the others, there's a complicated API there. Salesforce has thousands of tables. So, you can't really get that data anymore, but yet it has some of the most important ideas, concepts, and knowledge in your entire company.
So, that's when I realized something that had been tried before. MetaSearch, right, or federated search.
I think MetaSearch is the clearer term, because now sometimes people say federated search is about e-commerce federation.
MetaSearch was hard to do because of connectivity, right? Like it could take you months to just get somebody to change a network thing or to put a VPN in place or change permissions. That was expensive in the large enterprise.
But now, especially with public services, pretty much everything has an API. The perimeter doesn't exist the way it used to. And so, you can query everything. So, that left the problem of can you make sense of things? And that's of course what we're here about, right, is vectors.
The power of vector search and vector similarity, specifically, right, cosine vector similarity that we use in Swirl to make sense of completely disparate and very, very, very incompatible results, if you will. And it's shocking how well it works.
That's when I saw it work, I said there's more to this than I thought. And now it seems I'm not the only one. So, but anyway, that's a little bit of the story in my background. I hope that that made some sense. Yeah, it's a very solid background.
You reminded me of one, I don't remember the name of that computer, but it didn't have the display the way we have today, right? It just had the keyboard. And then it had the cassette. And so, my friend and I were sitting there for several minutes to boot it.
And then there was some game like Mario or whatever. That was on the cool Apple IIs. I was always envious of the Apple II, you know, kids. Because you're right, on the TRS-80, we only had block graphics. It was hilarious. But it did move a little bit faster in a way.
Like you got to wait a long time for Apple upgrades. But I remember the TRS-80, there was an incredible ecosystem of things you could add to it. So, memory. And then there was a company called Percom that put out disk drives. Wow, a disk drive.
That was a game changer if you played with a cassette recorder.
Although, who didn't love switching your parents' cassettes with the data tape so they'd put it on in the car and we go, okay, are they going to stop and turn that off? It was a hilarious prank. A great way to get some sound.
But disk drives then, right? First, there were the five and a quarter, or actually eight inch, then five and a quarter. And at that point the cassette was sort of replaced, right? Then the IBM PC showed up. And that was a bit of a game changer.
But the Apple always had better graphics. Yeah, absolutely. I just wanted to come back to what you just said about federated search and enterprise search.
I think I remember hearing about enterprise search, was it like 15, 16, 17 years ago, I don't remember, in the university, when one of my supervisors was focusing on it and he was saying, this is the next big thing. And once it's figured out, you know, we will be rich. But somehow it didn't happen.
And then later in my career, I heard the term federated search in connection to, okay, we have our own search engine, we have clients' data, can we combine the two without needing them to upload their data to our servers?
Because in some cases, they wouldn't trust us, you know, securing it enough and so on.
And so forth. And then we were confronted with the fact that maybe it will incur quite a bit of latency. And also, even in the first place, how would we build this?
But you know, like before we even get there, how do you relate enterprise search versus federated search? So, I think they're different in that enterprise search is about a realm. Right. Enterprise search means usually not public sources.
And I think it's important to differentiate: the problems of the large enterprise and even the medium enterprise are not the same as the sort of small and medium enterprise. Maybe that's not a great dividing line.
But definitely the large enterprise has a very different set of problems. It's so much more about, you know, global distribution and languages and regulation. If you're a, you know, small company like Swirl Inc., we have five people, we can work off of almost anything.
I mean, and we don't have the silo problem because we just have picked, you know, we have four. But it's interesting. We do still have the silo problem, right.
And as I'm going to show you, just when we were trying to find the steering document for this discussion, I realized I was hunting around which silo did I put it in, instead of just going to search. So it's funny that we've trained ourselves to work that way.
I think it's a reflection of the reality that in the large enterprise, it's exactly what you said: entitlements are extremely important. You're talking about crown jewel data like P&L, product data or customer feedback. CRM data is much less sensitive in some ways.
Also data that you might purchase, it's very common for companies to build and/or purchase data sets and assemble them or assemble derivative sets. These would be incredibly valuable for lots of uses.
The simplest one, right, usually is sales, or the most obvious one is help sales, help partners sell more; at the knowledge companies, help the sales people better understand their customers or industries. And there's a massive amount of information overload. So the problems are different.
They're acute. They're willing to spend significant money, right, and invest in really creating a better world. I think now, maybe one of the most important trends is people are not so interested in more search boxes.
They want to build proactive systems that bring people the information that they need. And this has been a long time thing in sales with things like LinkedIn Navigator, right? A lot of the public data gets harvested and brought to you.
But think about all of that incredibly rich, valuable internal data, and needing to bring that, and how hard it is to bring that to people inside the enterprise because of those entitlement lines.
So federated or meta search is a technical approach which says: do it differently than in traditional enterprise search, where traditionally the tool is indexing.
So you take the data from all the sources that you need to query, which usually, since that's hundreds, if not thousands, inside the large enterprise, you usually start with a few.
And you extract the data, meaning you pull it all out, then you have to remodel it, because you could leave it sort of as is, but the odds are high that won't help with search. You need to make at least some of the fields, things like title and body, line up.
So you map those things over and you have to make sure that the set of entitlements, meaning who's authorized to see stuff, all of that from all the silos has to be aggregated and correctly rationalized and put together, then you index it.
Indexing is a technical process, like creating a structure like the back of most books, or most long books: a list of words with basically page numbers, but in this case, they're slightly more complex. They might identify the document, the field, and the exact token that it occurs in.
So you have this kind of data structure. And you just have to keep it up to date anytime anything changes. So it's really hard. I have been very lucky to work in search, and FAST was a phenomenal indexing company and it innovated in indexing beyond the pale. Really incredible stuff.
So FAST was one of the first companies to do updateable indices. You could actually update it. Then a lot of the stuff that they did was advanced vector work. We did it at FAST, but you know, a tiny bit, right? Whatever the nuggets were, but they went on. They went so far with engine development at FAST.
And now it's by the way available through the Vespa project, right?
If you go to vespa.ai, all that stuff is available, open source too. Yeah, we have an episode with Vespa. You probably would know. Yes. He's my hero on Twitter.
So incredible advances at FAST and frankly at Attivio, you know, there were a bunch of patents filed.
Some very smart people worked on that problem and came up with incredible ways to interlink data by combining graph and a traditional inverted index, and doing things like then adding that to machine learning and doing things like predicting the answer to a service ticket.
So there's no end of indexing. It's just hard. That's all. It's just hard. And especially when you want to combine silos. And so over the years, I've bumped into people who have had the multi silo problem in great numbers.
There is one consulting company that has more than 500 silos, separate installations of Elastic, literally from version two to version eight or whatever they're on now, right? Because that was a standard.
And when they got a JSON data set or a database, or they bought something, or they did a hackathon, invariably the documents ended up in some Elastic with some security on it.
And now some of the variation, right, in partner and case team performance is attributed internally, through surveys, to who knows where to get the data.
If you know, oh, I know to talk to this person, they will have the key to unlock this particular thing that I can then use to say, hey, look at this incredible work we did in your industry before, or look at this incredible work we did for you in the past, right?
A new partner might not know that.
They've done five engagements that were very similar. So it's that kind of, and I think the word is systematic. People want to be very much more systematic now because everyone is too busy and there's information overload. So that's really the goal, to break those lines down.
My view is enterprise search now really desperately, desperately, critically requires meta search.
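The "back of the book" index structure described a little earlier — a list of words, each pointing at the document and the exact token position it occurs in — can be sketched minimally in Python. This is a toy illustration only (the document IDs are made up, and it is not Swirl's or FAST's actual code):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Build a toy inverted index: term -> list of (doc_id, token_position).

    Real engines add fields, compression, and incremental updates; this
    only shows the basic postings structure being discussed.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].append((doc_id, pos))
    return index

docs = {
    "kb-1": "Electric vehicle charging guide",
    "kb-2": "Tesla ships new electric vehicle software",
}
index = build_inverted_index(docs)
# Each posting records which document the term is in and where.
print(index["electric"])
```

Keeping every posting list current whenever any source document changes is exactly the maintenance burden the conversation attributes to index-based enterprise search.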
It's the only choice. You cannot be downloading, you know, pulling out all of the data, even if you were to desire that. It's very hard to do.
Because basically the old way would be to pull all the data out of everything and sort of filter it down. Why not search it where it is? Yeah, our approach is to say it's out there now. The vendors are doing incredible things.
I mean, ServiceNow, from where it was years ago to where it is today. It's incredible. There's an amazing team of people working away on that, and that's true of most applications now. Somebody's working on search. It has a nice high quality API. So let them do their thing. Let them master it.
And the other interesting thing that makes meta search particularly powerful for the enterprise is you're always searching on behalf of something. Right. It goes with the flow. It goes with the grain of the enterprise architecture.
You're supposed to query on behalf of something, and if you do, in theory, the app can just maintain the context. It only gets tricky when you start saying, oh, I want to combine these five together at the data level. When you do it at the user level, that's fine.
Either the user was authorized to see all three or they weren't, or they were able to see a portion of it or they weren't. That's the way things work in the enterprise. So that's the subtle difference, right? To delineate them. Yeah. And why the potential is there is that indexing is costly.
Yeah. And you described it really eloquently, in a way that to some extent by implementing meta search, you wouldn't need to solve indexing issues. You wouldn't need to solve entitlement issues, right? You kind of like use the existing proxies.
But there is one remaining bit that I'm really curious about. So if you look at, let's say, what Google did to web search, is that they looked at what you could call a proxonym effect.
So other people created pages linked to more important pages, hubs, and then you invent the algorithm, create it to you.
But you still kind of rely on what others did in a way, right? And so now you have the PageRank algorithm, how you rank the documents, and all of a sudden, this is the breakthrough and this looks a lot more relevant. In enterprise search, you don't necessarily have this.
Okay, you do have documents that are being created, and so forth. But then as you said, there are a lot of silos, right? And so things get created.
There is no single place where you can say, what happened? What did I miss? What do you have on this topic, and so forth? Just today in the morning, I was browsing through Office 365.
They have like a single page, which shows me all the documents that either I interacted with or someone interacted with and I am part of that group. And I can search there. That was helpful actually. That saved a lot of time. But again, it doesn't have Confluence.
It doesn't have Salesforce. It doesn't have a bunch of other places where it would go. So I guess one missing component still in enterprise search was how would you rank these documents, right? Because you don't have a lot of signals. You simply have the documents themselves.
And so would you say that vector search now opens up this horizon for us? It helps solve this problem. Absolutely. And I think if we untangle it a little bit, it gets back to Google. In fact, it goes right back to Google. Google had the challenge of making sense of the biggest data set in history.
The web, incredibly interlinked. And they did the absolute best job of figuring out how to model that structure. You weren't searching every web page every time you searched. You were searching a structure that in fact is a large language model. Right? That's what they built.
They were the ones who pioneered it. But was it the very first one? Or no, that's probably not true at all.
BERT was an early one that got popular. I don't want to make, I have no idea, right, what came first. But BERT was certainly the one that was the game changer. It was very recognized.
That's where the real popularization of transformer models I think came from. And it's that structure. What is that structure? It's a structure that can evaluate results almost independent of the results themselves. You don't have to look at every web page. And so that's the key.
In fact, you're absolutely right. I think there have been many attempts to do meta search and federated search, even against APIs. But you then end up with just all the results. Tiled or whatever it is, but it's just all the results. And that doesn't help with information overload.
It also doesn't really get to the potential. So the key is, and what Swirl uses, we use a large language model. There's more to it, right? There's a relevancy algorithm around it.
There's a similarity pipeline around it, right? But at the end of the day, there's a large model that evaluates data as vectors with real numbers. And it allows you to do incredible comparisons. And the thing is, as I put this together, I wrote it nights and weekends starting last year.
And when I started to get results from it, I was shocked, because I did not expect it to be as good as it came out. The thing about relevancy, right? It's, oh, man, we can always say we'll identify it when we see it. But building tests around it is very difficult.
You come out with gold standards. And I love all the tooling. There's so much good tooling around it. But at the end of the day, it all depends, fundamentally, on some set of finite labeled outcomes, right? That's what it is. I found another way to measure the relevancy without doing that.
And the way to do that is how far to the right are the term hits.
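That position heuristic can be sketched in a few lines of Python. This is my own toy illustration, not Swirl's actual relevancy module: a result is scored by how far left its earliest query-term hit sits, so a leading mention outranks a "terrible mention" trailing at the end, and a result with no visible hit scores zero (no evidence):

```python
def position_score(text, query_terms):
    """Toy position-based relevancy: the further left (earlier) the
    query terms appear in the text, the higher the score.
    Returns 0.0 when no term is visible in the result -- no evidence."""
    tokens = [t.strip(".,") for t in text.lower().split()]
    positions = [i for i, tok in enumerate(tokens) if tok in query_terms]
    if not positions:
        return 0.0
    # A hit at token 0 scores 1.0; a lone mention at the very end
    # approaches 0 as the document grows.
    return 1.0 - min(positions) / len(tokens)

terms = {"new", "york"}
lead = position_score("New York startups raised record funding this year.", terms)
trailing = position_score("The list of cities includes Boston, Chicago and New York.", terms)
print(lead > trailing)  # True: the leading mention outranks the trailing one
```

The appeal of a heuristic like this is exactly what is said above: it needs no finite set of labeled outcomes to produce a usable ranking signal.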
And when you're using Swirl, because of the large language model we used, it favors hits that are to the left, at the beginning of the sentence. It views aboutness as being at the beginning of the sentence.
+It's extremely good at discriminating, again, identifying hits that are just in passing. So I think we can all agree. I've always viewed relevancy as a bit of a stepped function.
+The absolute top is exactly what I searched for in the entire field of the title and the body, right? And then, hits at the beginning of the body, we can probably agree that's got to be a good hit, to the degree there's a title and a body.
+And then we all fear the terrible mention, right? The enemy of relevancy is one mention of New York at the very end, right? It's like it's in a list of cities that absolutely has nothing to do with the Big Apple.
+And that's what I found: the lower you are in the result list, the more to the right your search terms are. And the other thing about metasearch is you don't have the documents. I believe that an evidence-based approach is critical.
+Did the silo return the search terms that you, the user, put in? Are they visible in the results? If they're not visible, then there's a question. So that's the other piece of it: we use an evidence-based metric combined with similarity to rank, and it works.
+And now, that said, here's all the stuff that I just left out. You have to normalize the query, especially if you interpret the query and rewrite it. One of the most important things about metasearch is you can't send the same query to every endpoint. Not all endpoints are equal.
+Some endpoints, for example, might be a database that's really able to target one field at a time effectively. For example, there might be a repository that knows about companies. So if your search is electric vehicle Tesla, don't send electric vehicle in, just send Tesla.
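The position-based heuristic being described, where earlier term hits score higher and a lone mention at the very end scores near zero, could be sketched as a toy function (all names here are hypothetical illustrations, not Swirl's actual code):

```python
def position_score(query_terms, text):
    """Toy relevancy heuristic: hits near the start of the text
    ("to the left") score higher; a lone mention at the very end
    scores close to zero. Not Swirl's real algorithm."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    scores = []
    for term in query_terms:
        term = term.lower()
        positions = [i for i, tok in enumerate(tokens) if tok.strip(".,") == term]
        if not positions:
            continue
        # earliest occurrence, normalized: position 0 -> 1.0, last token -> near 0
        scores.append(1.0 - positions[0] / len(tokens))
    return sum(scores) / len(query_terms) if query_terms else 0.0

good = position_score(["new", "york"],
                      "New York is the subject of this article about cities")
bad = position_score(["new", "york"],
                     "A long list of many unrelated places ends with new york")
```

Here `good` comes out far higher than `bad`, matching the intuition above: the "terrible mention" buried at the end is exactly the result this kind of scoring pushes down.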
+So we provide a way to mark that. Swirl has the ability to tag each search provider with what it knows about. So you'd write that as electric vehicle company:Tesla. Just Tesla goes to the company silos; everybody else drops the tag.
+So Google gets electric vehicle Tesla, which of course it does a magnificent job on. So then you have to normalize the query when you're scoring, as well as normalize each field, right? Freshness is certainly an interesting thing.
+I found the model also works best if we add a boost based on the topness of the results. So if a repository gave it rank number one, we should probably at least factor that in a little bit. And then of course, there's things like number of hits.
+And vector similarity is ultimately used, right? We aggregate vector similarities to reflect the evidence level. And the strength of it, right, is captured in the similarity. So a lot went into it.
+It's probably the most awful module in our repo, but somebody smarter will rewrite it soon, and it works. And that's the important thing. And that is why I'm here today, right? I have exited other ventures because I believe in this so much.
+And I put together a little company that is here to support it. It's 100% open source under Apache 2.0. Git it, or grab the Docker image, and you can make it run in two lines. And you'll see. Yeah, that sounds fantastic. And I'm sure our listeners will take a look, especially because it's open source.
+It's much easier to, you know, start hacking over the weekend or something. I also wanted to ask you, before you show us some demos, I think this will be really, really interesting and a change in format of the podcast to some extent.
+You mentioned the similarity aspect of vector search, right? And it probably also exists in keyword search to some extent, but there, as you said, we trained ourselves to look at what we see.
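The tag-based routing described here, where electric vehicle company:Tesla sends only Tesla to providers tagged company and the full untagged query everywhere else, can be sketched like this (a rough illustration with hypothetical provider names, not Swirl's implementation):

```python
import re

def route_query(query, providers):
    """Sketch of tag-based query routing. 'providers' maps each
    provider name to a list of the tags it knows about. Tagged terms
    go only to providers with a matching tag; untagged providers get
    the query with the tag syntax stripped."""
    tags = dict(re.findall(r"(\w+):(\S+)", query))   # e.g. {'company': 'Tesla'}
    untagged = re.sub(r"\w+:(\S+)", r"\1", query)    # tag syntax dropped
    routed = {}
    for name, provider_tags in providers.items():
        matching = [value for tag, value in tags.items() if tag in provider_tags]
        routed[name] = " ".join(matching) if matching else untagged
    return routed

queries = route_query("electric vehicle company:Tesla",
                      {"bigquery_funding": ["company"], "google_pse": []})
```

With the providers above, the funding database receives just `Tesla` while the web engine receives `electric vehicle Tesla`, matching the behavior described in the conversation.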
And if we see a keyword, we kind of trust it more.
+Although this probably varies per case. But in similarity search, in vector search, this similarity is at play, right? So like, if you cannot find a top result, or even a middling relevant result, you only find very distant ones.
+How do you detect this and how do you deal with this? So the similarity will be poor and there'll be no evidence. So the score will be low and it will end up dropped to the back of the result list. That's the key. Now, there are a bunch of reasons that can happen, though.
+One of those reasons could be that perhaps we do not understand the domain as well as the silo does.
+And one example of that is perhaps we're dealing with transformations of entities, very often dictionary-based, or sometimes it's more subtle. But one of the things we learned very quickly: Querqy is an amazing open source package that's used with Elasticsearch, Solr, and OpenSearch, I should say.
+And it rewrites queries. It's kind of the standard for it. It's very common to find it. It's really amazing. So here, the idea is that the silo knows something that we don't. So we actually have an integration now where we listen to the feedback that comes from each engine.
+So if they report, for example, if they highlight hits, we check the similarity. If the similarity is high enough, we'll honor that. And that's another idea: we want to honor the feedback from each of those silos.
+Now, not today, but in the future, why not requery based on expanding our vocabulary around the search? Those are all things that can be done. And by the way, we'd love to get a pull request if someone wants to do that. That honestly is kind of the key to the whole thing. Yeah.
+So you kind of learn to integrate.
In a way, you have a multiple-voter problem, but you also want to really hear the signal from every one of the voters, and sort of make sure that you roll this up into the best formula, right? The best representation of this signal to the user.
+That's right. Absolutely. And then you can label some of those sources, because you're right, some of them are getting really smart. Just some examples, I'll throw out Vectara. Amazing, amazing, incredible vector database.
+That's probably an inadequate description, but it answers questions on your documents. There's a revolution in vector search. Some people are focused very much on performance, right? Some people are focused on knowledge. Some people are focused on exporting vectors.
+So I think the enterprise, especially the large enterprise, already has dozens of indexing tools and engines and others. And there will be many of these too, special case, right? There'll be some that are incredible at customer service. And some will be incredible at exception handling.
+Some will be incredible at perhaps sales pre-qualification. You know, I just sort of learned from the past examples. Watson was going to diagnose everything, right? And I think what it ultimately did well was pre-approval authorizations.
+So, over time, I think it's clear these will all become more automated. But if you're trying to figure out what's new across these silos, you still need a way to query them all. And that's where Swirl is happy. We have an integration with ChatGPT.
+You can query ChatGPT. In fact, by default, we query it every time if you put your key in. And we rewrite the query. If the query is a question, we just pass it right along. If it's not, we rewrite it using a prompt into something like "tell me about thing".
+So you get a summary, right, which we pop up. I think we also have a query processor, so you can have ChatGPT change the user's query.
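The pass-through-or-rewrite behavior just described, where a question goes to ChatGPT unchanged and a keyword query gets wrapped into a "tell me about" prompt, could look roughly like this (a hypothetical helper with a made-up heuristic, not Swirl's actual prompt logic):

```python
def chatgpt_query(user_query):
    """Sketch: pass a question through unchanged; turn a keyword
    query into a 'tell me about' style prompt. The question
    heuristic here is deliberately simple."""
    q = user_query.strip()
    question_words = ("who", "what", "when", "where", "why", "how",
                      "is", "are", "does", "can")
    is_question = q.endswith("?") or q.lower().split()[0] in question_words
    return q if is_question else f"Tell me about: {q}"
```

So `chatgpt_query("What is metasearch?")` passes through unchanged, while `chatgpt_query("enterprise search")` becomes a summarization prompt.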
Like you could say, rewrite this to a Boolean, or, why not, rewrite this to a vector.
+But in the end, right, it's going to do that on its own on the back side of things. So when you're trying to solve problems in the enterprise, the key is you need an interface to write to.
+And it would be nice not to have to write code to connect to all these things, getting back to your question about architecture. And so those are the key abstractions in Swirl. With Swirl, you don't have to write code to connect to an endpoint that we already have a connector to.
+You just write a search provider. All you need to know is JSONPath and maybe be able to read a little developer API doc. Right. And with that, you'll pretty much be able to get the search provider working. And if you need to write a connector, of course, here's the punch line:
+Well, I think it will probably take you a couple hours, depending on your skill at Python. But on average, it shouldn't take more than two hours, because I can give you a prompt, and we can teach ChatGPT about the connector class.
+You should be able to get that done in a couple hours, just fixing up what it does. I found that about 70% of the time, it will produce a workable connector. Just fast. The right prompt, right? Teach it our connector class.
+Give it the right prompt and, bang, you have almost working code. Yeah, I think this is the best part of systems like ChatGPT: you can outsource this mundane work and really focus on the actual thing. I was actually blown away myself.
+And to some extent scared, a few weeks ago, when I was just saying, hey, can you create Python code which will talk to the TomTom search API, the map search API? And it did create it. It just asked me to insert a token. So I just subscribed for a developer token. But I was really blown away.
+Like, I would have spent probably several half-days here and there figuring things out, right? Whether it was TomTom or some other API.
But yeah, I think this is amazing. And I think, well, I believe that you guys are documenting a lot.
+But if you haven't yet, just explaining this within the documents, I think that could save a lot of time for developers. But I was wondering, maybe you would like to show us a demo of Swirl, and then we'll dig deeper into that. Absolutely. So let me share my screen.
+So hopefully you can see my screen. Yes. So this is Swirl. Actually, I'll start here. This is the Swirl repo. Everything you need to get started is here. The readme describes pretty much the two commands you need to run to get Swirl running if you have Docker.
+There are more detailed instructions if you want to download. Everything that you'll see here runs, we have automated tests against everything. We have a whole CI/CD environment. And support, I just want to be clear, is free.
+Please just join our Slack channel and we're happy to help anytime, anywhere. Now, when you get Swirl installed locally, as I have it, you'll get this nice homepage. But ultimately, what most people want to see is the UI. So this is Spyglass.
+It's an open source UI produced by a sister company, KMW, actually a long-time friend, Kevin Waters. And he's a long-time committer and contributor to the open source community as well. So Spyglass is a great starting point for building user interfaces. It has a lot of the key building blocks.
+And so here, yesterday, I was thinking about how we wrote a document. You sent me a document to use. And I admit, today I was searching, going, where is that document? And I actually said, okay, it's in Microsoft Outlook, and I found it.
+But I forgot that I could search, because one of the great things that's coming out in Swirl version two, which is going to drop next month in May, is we have full M365 support. So you can do the OAuth2 dance. And I've actually searched through my M365.
+And here's my acceptance of your meeting, actually, and some other references to it.
And then here, document number four, a document shared with you, Vector Podcast. So if I had searched, it would have been the fourth hit above the fold.
+And I actually haven't done the relevancy tuning on email or OneDrive yet. So it worked well enough to come up. But what I think you can see again is the matches are early in the document. It favors them.
+First of all, of course, it likes both terms together, but with some exceptions, it favors the term that's to the left. And so you can see there were a lot of results, but only a few really ranked high. And that's the key, right? I scan it. I'm pretty much done now.
+And I can say, you know, I probably want to go look in my email or my OneDrive. That's more than likely where it is. And I can go and do that, you know, very simply. Right, there we go. Now I have it in the top three. So the power of metasearch, though, is more than just that.
+I will just, let's do that. Is it like a Django app, or? Yes. Yeah. Yeah. So the stack is RabbitMQ, Django, Python, Celery, although we're not using too much Celery, and SQLite or Postgres, with a lot of packages. We use NLTK, spaCy, JSONPath, some others.
+So now, here I am running my electric vehicle company:Tesla. This is an earlier version of the software, so you're going to see one bug here, which is you'll see the emphasis tags instead of having them render, just because I reloaded the older version.
+But here we can see a lot more sources than just, you know, enterprise sources. In particular, one of the things that the Swirl adaptive query processor does is it rewrites this query. Most repositories will get the search electric vehicle Tesla with the company tag removed.
+However, the company funding database in BigQuery, which I just fixed, will actually only get the query Tesla.
So if we now look at the results, you know, we'll see fairly traditional high-quality content here about electric vehicles, with Tesla favored early on.
+So for example, it loves this hit with Tesla right at the beginning of the body. Most of these, I think, are pretty good hits. And here, here's a database hit. This is from BigQuery. It's a company funding record. So Tesla Motors raised a large Series C back in 2006.
+This is an old database of funding records from Kaggle. Now, a couple of things I want to point out. On the fly, Swirl allows you to turn a database record into a sort of pseudo-document. You can actually just write this as a Python expression and use braces to refer to the fields.
+And I'll show that in a second. In addition, though, Swirl has a fixed schema: URL, title, body, date published, date retrieved, and author. But it also has a payload field. The payload field can hold anything. And by default, anything that you don't specify a mapping for goes into the payload.
+You can also say, please don't put anything in the payload. So here, the fields are also repeated, right, as data items, so that if I wanted, I could extract those individually. And the idea here is you have a normalized record that reflects the sort of top relevancy items.
+So you know whether or not you should go deeper. And then the payload will have anything extra that you might need to make that decision. So for example, if we look a little further down, here's a result from NLResearch.com.
+Northern Light, the company where I learned a lot about search, really the first search company I worked for. Still going strong. One of the things they do is extract super high-quality news from the web.
+And they field it and they classify it and can really do rich searching. So here is an article that they pulled together that is basically not so much about the electric vehicle market, it's about Tesla. So it ranked a little bit lower.
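The fixed-schema-plus-payload idea described here can be sketched as a small normalization step (hypothetical code and field names; the actual mapping syntax in Swirl differs): mapped fields fill the normalized schema, and every unmapped field of the source record lands in the payload.

```python
def normalize_record(record, mappings):
    """Sketch: map selected source fields onto a fixed result schema;
    everything unmapped goes into 'payload' by default."""
    schema = {"url": "", "title": "", "body": "", "date_published": "",
              "date_retrieved": "", "author": "", "payload": {}}
    for target, source in mappings.items():
        schema[target] = record.get(source, "")
    mapped_sources = set(mappings.values())
    schema["payload"] = {k: v for k, v in record.items()
                         if k not in mapped_sources}
    return schema

# A funding record like the BigQuery example in the demo
row = {"company": "Tesla Motors", "round": "Series C",
       "year": 2006, "amount_usd": 40000000}
result = normalize_record(row, {"title": "company", "body": "round"})
```

After this, `result["title"]` is the company name, and the extra columns (`year`, `amount_usd`) survive in `result["payload"]` so a consumer can decide whether to dig deeper.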
+In this case, there were some other ones that ranked higher. They have some nice data that we like to capture and put in the payload as well. So this really is the core of Swirl. And it has things like facets. For example, we use YouTrack internally to track issues.
+So if I want to, you know, just switch to those, it'll bring just those up. Oh, looks like I goofed on that one. Another thing you can do when you're running, oops, just a second. Another thing you can do, we have the concept of mixers. Not for drinks, but for results.
+You can mix the results up. By default, we do it by relevancy, but you can specify different mixers. For example, the date mixer will return everything date-sorted, and it hides anything that doesn't have a date published.
+The round-robin mixer, on the other hand, still sort of honors relevancy, but it just takes one from each result set. So you get a cross-section of the results. So here, for example, just looking at the top five: one result, the best result, from each silo, right here at the top.
+And of course, here I'm arguing a little bit about the relevancy of this right in one of our support tickets. So you see everything kind of just brought together for me, and I can decide which things I might like to do.
+Yeah, maybe it could, I mean, I'm just commenting as we go, but maybe visually it could also show where this comes from, right? Because you do have the sources on the left. Yes. So it could actually say this comes from here, this comes from there. But again, the combined view is also excellent.
+It's just if you needed to know, right? If you need to know, where did I get this from, right? That's right. So we do, we keep the source in the result here, along with whoever the source tells us the author is. However, in this version, we didn't get to it.
+We like to report the original rank. So you should see IT News and NLResearch ranked one here.
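The round-robin mixer described here, taking the best remaining result from each source in turn so the top of the list is a cross-section of all silos, is easy to illustrate (a toy sketch, not Swirl's mixer code):

```python
from itertools import zip_longest

def round_robin_mix(results_by_source):
    """Sketch of a round-robin mixer: interleave one result per
    source per pass, preserving each source's internal ranking."""
    columns = list(results_by_source.values())
    mixed = []
    for row in zip_longest(*columns):          # one pass per rank position
        mixed.extend(r for r in row if r is not None)
    return mixed

mixed = round_robin_mix({
    "web":  ["w1", "w2", "w3"],
    "mail": ["m1"],
    "db":   ["d1", "d2"],
})
```

The top of `mixed` is then the number-one result from each silo, exactly the cross-section effect described above.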
That's the number one result in the 2.0 version. Actually, there's a new version that's coming out. I think we're going to just do a bug fix on this. The latest version, 10.1, which is in the repo now, fixes that and a couple of other issues. So if you just get the newest, you'll be good.
In 2.0 we have a little bit of a new treatment for this, which I think you'll like a lot better. But before I jump to that, you asked me a really important question.
+Right? So, this is great, this UI will evolve. It's here so that you can show the power, right? And we ship it integrated. But from a developer perspective, none of this is super helpful, right? How do I integrate this with an existing UI? So that's what I really wanted to show you next.
+So first, how do we connect to something? The answer is a search provider definition. So this definition right here, this text record, mostly JSON, but mostly just strings.
+This configures our out-of-the-box RequestsGet connector to query a search provider, to query in particular this Google Programmable Search Engine that I put together. And actually, we ship with three of them preset, and please feel free to share our keys.
+We're happy, we want to make sure that something is working for everybody, right, out of the box. So further in this are the things you'd expect. You can configure this by providing a URL. You can construct the URL by pulling in fields from the query mappings.
+The only thing that ever really changes in a Google PSE is the CX code. Everything else you can just copy and paste. You can put dozens of them in. Also here are some of the important system things that help the system work, help us process this.
+So we have four different processing pipelines built into Swirl. One is a pre-query pipeline that runs before federation. And then there's a query processing pipeline that runs for each connector, or actually I should say search provider, which is an instance, a configured instance, of a connector.
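A search provider definition along these lines might look like the sketch below. The field names, mapping syntax, and the `build_url` helper are illustrative assumptions, not Swirl's exact schema, and `YOUR_CX_CODE` / `YOUR_KEY` are placeholders:

```python
# Hypothetical search provider definition: a text record, mostly strings,
# that configures a generic HTTP GET connector for a Google PSE.
search_provider = {
    "name": "My Google PSE",
    "connector": "RequestsGet",
    "url": "https://www.googleapis.com/customsearch/v1",
    "query_template": "{url}?cx={cx}&key={key}&q={query_string}",
    # per-provider capabilities: the CX code, date sort, paging, NOT char
    "query_mappings": "cx=YOUR_CX_CODE,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT_CHAR=-",
    "result_mappings": "url=link,body=snippet,cacheId,NO_PAYLOAD",
}

def build_url(provider, query, key="YOUR_KEY"):
    """Toy expansion of the URL template from the query mappings."""
    mappings = dict(kv.split("=", 1)
                    for kv in provider["query_mappings"].split(",") if "=" in kv)
    return provider["query_template"].format(
        url=provider["url"], cx=mappings.get("cx", ""),
        key=key, query_string=query)

url = build_url(search_provider, "knowledge management")
```

The point of the abstraction is visible even in the toy: only the CX code and mappings change from one PSE to the next, so you can configure dozens of providers by copy, paste, and edit, with no connector code.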
+Then each of those also has a result processing pipeline, which transforms the results from the source into our normalized format. And then there's post-result processing that does things like relevancy ranking, where you want all of the data. And they're all different.
+By the way, there's an object model behind Swirl. So creating these things is really simple. There are different base classes for those, and they set you up with everything you need. So essentially you come in, you have a Python model, or I should say a Django model object, to operate on.
+All you have to do is change it and exit, and you're done. Simple, simple. Also, we map out the different query capabilities of each provider in the query mappings. So how do you tell a given endpoint to sort by date? This is how you add this to the URL. How do you page the results? This is how.
+Result index is a Swirl capability where we can provide you with the index number. You can also use result page, so the count or the page that you want. And here's an important one too: the NOT character. So does the silo support NOT as a term? This one doesn't. It does not support NOT as a term.
+But it supports the NOT character. So as an example, now if I go to the search object, I can run a search. I'll run it for knowledge management. So actually, I'll just let that one run for a second. There we go. I got my ChatGPT error, I have the wrong ChatGPT API key. But that's okay.
+Everybody knows what it would say about this stuff. So actually the query I really want to do is Elon Musk not Twitter. A perfectly legitimate query, right? What's going on in Elon Musk's world that's not related to Twitter? Now here's the thing. Google PSE will not understand that query.
+Everybody says, what, Google doesn't understand NOT? Normal web Google does, but Google Programmable Search Engine does not honor the NOT. And in fact, just to prove it, pse.google.com. By the way, before I talked to you, I didn't know of this system's existence myself, PSE.
Oh my gosh.
+For slicing up the web, it is incredible. I mean, it takes two seconds to build one, right? And you just give it examples. So here's the thing. You can go, here's the public URL for one of the programmable search engines I put in, and I'll do the same exact query. Elon Musk.
+Okay, so the very first result has Twitter in it, right? It's right there. In fact, the second result also has Twitter. Google Programmable Search Engine is not going through the full Google parser. And it does not honor the NOT. However, if I say this, it works perfectly.
+The plus-minus syntax works. Okay. So now when we look at this definition, it says the NOT character for Google PSE is minus. So now if we look at the search I ran, let's look at the search object. It's another object inside Swirl.
+Why is there a search object? Because in metasearch, it takes a few seconds to get the results from everything. And you may want to look at that data over and over again.
+In fact, one of the cool things you can do in Swirl is you can set the subscribe function. Swirl will then recheck for new results every so often and update and mark them new, and you can even get updates, things like that, right? So alert mode, if you will, or subscribe mode, as we like to call it.
+So let's take a look at the search object. What this object contains, for starters, is a block of messages that explain exactly what was done to the query. And here you can see the adaptive query processor rewrote the query for Google PSE from Elon Musk not Twitter to Elon Musk minus Twitter.
+So this way we guarantee you're going to get the right result, not a bad result. Oh, and also our relevancy model checks: if you have a NOTed term in your query and it finds it during relevancy, we drop it to the bottom and we actually put a special flag on it. We say this was a bad result.
+Most of the others, though, frankly, just didn't know, they don't handle NOT. YouTrack doesn't handle a NOT at all.
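The adaptive NOT handling described here, rewriting `NOT term` to the provider's own syntax (e.g. `-term` for Google PSE) or dropping the NOTed term entirely for providers with no NOT support at all, can be sketched like this (a toy version, not Swirl's adaptive query processor):

```python
def adapt_not(query, not_char):
    """Sketch: rewrite 'NOT term' per provider capability.
    not_char is the provider's NOT character (e.g. '-'); if the
    provider has no NOT support at all, pass None and the NOTed
    term is removed entirely."""
    tokens = query.split()
    out, i = [], 0
    while i < len(tokens):
        if tokens[i].lower() == "not" and i + 1 < len(tokens):
            if not_char:                      # e.g. '-' for Google PSE
                out.append(not_char + tokens[i + 1])
            # otherwise drop both 'not' and the term
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

pse_query = adapt_not("Elon Musk not Twitter", "-")        # Google PSE style
youtrack_query = adapt_not("Elon Musk not Twitter", None)  # no NOT support
```

So the PSE gets the minus syntax it honors, while a provider like YouTrack simply gets the query without the negated term, matching the two cases in the demo.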
So we removed it completely and just said, go and give us what you've got for that. And for others, we probably would have left it.
+Looking at the results, there's also an info block. This is all JSON, so it's straightforward for a developer using Python. It's little lists and dictionaries. There's a record that describes what each of the different sources gave back. Easy to parse if you want to build on that.
+You have a filter URL so you can construct your own facet display and jump to any given provider. We actually give you the query that we ran. So if you want to check the results, assuming you have the right credentials, there are the results. So I can actually go look at and modify my JSON.
+And then, as you would expect, there's a summary of what was found. So here's what we actually searched. The overall query, if you want to rerun or update a query or rescore it, you can do that right from the result list. So those links are available.
+We summarize the federation results and the time, give you the next page of results, everything stored in Swirl, so you can page through. By the way, you can also set a retention or expiration policy if you want, so results will simply disappear. For secure applications, you can even set it
+so there's no storage at all. And then the results. So from a developer perspective, literally, I'm going to extract the results dictionary, or sorry, the results list, from this structure that I get back when I call it. And I'm going to iterate on that, and each thing's a dictionary.
+It's a flat dictionary with the things you would expect, pretty much, right? Title, URL, body, date published, date retrieved, and author. Everything else is meta information. So what search provider responded, what the rank was, our score is a score.
+There are various techniques to turn that into a probability or a confidence level if you would like. We may do that in the future. I think if people wanted it, we'd love to hear about it.
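The developer loop just described, pull the results list out of the response structure and iterate over flat dictionaries, might look like this (the response shape below is a simplified illustration based on the description, not a verbatim Swirl payload):

```python
def extract_hits(response):
    """Sketch: take the 'results' list from the response structure and
    keep the core fields of each flat result dictionary."""
    hits = []
    for item in response.get("results", []):
        hits.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "body": item.get("body", ""),
            "source": item.get("searchprovider", ""),  # assumed meta field name
        })
    return hits

# Illustrative response: messages block plus a results list
response = {
    "messages": ["Rewrote query for Google PSE"],
    "results": [
        {"title": "Tesla raises Series C", "url": "https://example.com/1",
         "body": "Tesla Motors raised...", "searchprovider": "BigQuery"},
    ],
}
hits = extract_hits(response)
```

Since everything is plain lists and dictionaries, a UI or integration only needs a loop like this, with no provider-specific parsing.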
I think for now, though, people seem to be very happy just with rank.
+Most importantly, and really this is Swirl's ultimate value, we explain exactly why the result matched and why it scored as it did. So for example, in this case, of course, there are no stems for a name, but basically we use NLTK, we stem to maximize recall.
+Then you'll see the actual extracted hits, the actual hit, not the lowercase tokenized version, right? So we extract the actual hit. And then we produce the score, which is the cosine similarity between the query and the text around it in the result.
+So we kind of sentence-tokenize the result that we get, and then we're basically looking to try and stay within that sentence and see how relevant it is.
+And ultimately, we also adjust, since we are sending different queries to different systems and, of course, different systems have different result lengths on average. We do adjustments for both of those. We also give you the exact token locations for everything that's hit, and you're ready to re-rank from there.
+Wow. So much is done behind the scenes here. And so much is simplified on the other side, on the outer side. That's amazing.
+And how many systems do you support, or which systems do you support out of the box today? So I'm happy to say we have connectors to all of the major open source search engines, including Solr, AWS OpenSearch, or OpenSearch.org I should say, and Elasticsearch.
+We also support the main open databases, Postgres, SQLite, also some of the more traditional cloud ones, Google BigQuery, for example. And we are in the process of adding, as I mentioned, M365. We also have, as of the last release, you can connect to Atlassian using our RequestsGet connector.
+You can connect to YouTrack. So for many of the sophisticated repositories, you can actually just use the RequestsGet connector to talk to them. And M365 and Slack are coming in our next release, which is next month.
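The sentence-level cosine comparison described above is straightforward to illustrate. Note the real system compares embedding vectors from a language model; the toy below uses term-count vectors only to show the mechanics of "sentence-tokenize, then score the query against each sentence":

```python
import math
from collections import Counter

def cosine(a_tokens, b_tokens):
    """Cosine similarity over term-count vectors (toy stand-in for
    the embedding-based similarity described in the conversation)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def best_sentence_score(query, text):
    """Naive sentence tokenization (split on periods), then score the
    query against each sentence and keep the best. Hypothetical code."""
    q = query.lower().split()
    sentences = [s.strip() for s in text.lower().split(".") if s.strip()]
    return max((cosine(q, s.split()) for s in sentences), default=0.0)

score = best_sentence_score(
    "electric vehicle",
    "Tesla builds an electric vehicle. The weather was fine.")
```

Staying within the sentence is the key design choice here: a match confined to one sentence scores on that sentence's own terms, rather than being diluted by the rest of a long document.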
+Well, I think especially Slack, or any messenger that also has these kinds of APIs you can utilize, I think that's going to be a big thing, in my opinion, because so much is happening in Slack or similar platforms. You know, so much knowledge is kind of written there in public channels.
+Or in your own direct messages, right? If it's possible to access them, then I think this is amazing. We even support Microsoft Teams in the next release, full search of messages, also all the shared objects, depending on configuration.
+And if you're familiar with the M365 OpenID Connect infrastructure and sort of that ecosystem, it's entirely under the deployer's control. Swirl is just software.
+I mean, we have a hosted platform which you get connected to, but the permissioning, all of that, is actually done on the owner's side. And you can turn it off in one second for any reason you're uncomfortable. But Swirl 2.0,
+again, which will be coming out next month, has all of the OAuth and OIDC capabilities, so that you're really just connecting your Microsoft account and searching through that stuff. And there are no other user interfaces or IDs or anything like that. It's all seamless.
+And again, all completely controlled by the deployer inside that M365 tenant, and the owner. Yeah, fantastic. Is there something else you want to show on this demo? Or do we want to go back to our audio mode, video and audio for those who are listening only? All right. I hope that was more than enough.
+You know, there's a ton to show. I just wanted to give a little flavor for it. And in particular, you know, we're really focused on making this easy for developers. That's the current audience. I think there's lots more we can do in the future.
+But if you want to add a bunch of sources or solve a multi-silo search problem, that's what Swirl is intended to do. That's... It's amazing. It's amazing.
+And how do you see the clientele? Like, what is the ideal client for this system?
How do you want to interact with these clients? And how do you see...
+Or maybe you've already experienced, you know, the first steps to succeeding on this path? So honestly, people who are using it today are doing three things with it. And I'm super curious, right, as to which ones of these will evolve.
+I think the most basic, you know, or interesting use case, right, the sort of most obvious use case, is one search box to rule them all, pardon the Lord of the Rings reference. But honestly, that's been so hard.
+If you've done a lot of enterprise search projects, normally, you know, for the initial scope, it's expensive, and it takes about a year or whatever, you know, you get a couple of silos in place, and things are good, and people like it.
+But adding silos over time is super costly, and it's hard, and this is the way to do it. You have a great existing search index, you have a search UI, awesome. Connect the search index to Swirl, and connect your UI to Swirl.
+Now you can add a whole bunch of other sources and get great ranking, and you don't have to change the UI necessarily. For the most part, every search UI has URL, title, and body, and maybe a date. Okay, so for starters, you can just take those.
+And if you have more, right, if you want to do a source facet, that's cool. From there, I think, you know, people with Python, right, Django, experience, who want to take this and tailor it, we'd love to help, we'd love to hear what you're doing.
+Again, please, the Slack support is all free, just join up with the community and get in there, and tell us what's going on, or ask. And I think there are lots of other people who are working with it too, who have started to, you know, answer questions and things like that.
+The second thing, though: there are definitely use cases where people really want to monitor multiple sources and push notifications out, like, to Slack, and to Teams, and things like that. That's a very different model.
+I don't know if that's for everybody, but I think, in a way, that's the future. Right, we shouldn't have to ask, when going to a search box takes time, and then I still have to parse it.
+Depending on what you know, Swirl doesn't do any profiling or anything like that, but depending on what you know, you're the builder of search apps, right, or insight apps.
+You should be able to target them, but the barrier is usually not what we know about the user, right? Since they're an employee, we might have skill knowledge about them, right? We probably have access, in theory, to some other information about their job function and department and who they talk to.
+So it shouldn't be that hard, but the problem isn't knowing that stuff. The problem is saying, okay, well, how do I get content, right? How do I get that out? So again, hook it up to Swirl.
+Build a watch list, which can be essentially a group of queries, or a set of search objects with the subscribe function turned on, you know, for a bunch of topics. Push that data out to the people who need to know. Create groups. Use service accounts to search, as opposed to using individual users, right?
+Targeting individual users: not super valuable for proactive delivery. But on a group basis, very valuable.
+So, right, create an industry feed. You know, if you really know where to get the best industry data, why not make that systematic? Why not make that data available to everybody who's out there trying to talk to those folks, through whatever, through their mobile?
+And this is a thing: trying to do end-to-end enterprise search is super hard.
+You've got to get people to adopt your solution. Why would, what would you want my mobile app for? You probably already have a cool one. You might already have five. So it's all about just putting that data out there so people can keep building fast. That's it. Yeah, this is amazing.
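The subscribe/watch-list idea sketched here, re-run a saved query periodically and push only results not seen before, marked as new, reduces to a small diffing step (hypothetical code; Swirl's subscribe mode handles the scheduling and storage itself):

```python
def new_results(previous_urls, latest_results):
    """Sketch: given the URLs already delivered for a saved query,
    return only the not-yet-seen results, flagged as new, ready to
    push to a group channel like Slack or Teams."""
    fresh = [r for r in latest_results if r["url"] not in previous_urls]
    for r in fresh:
        r["new"] = True
    return fresh

# One re-check cycle for a watched query
seen = {"https://example.com/old"}
latest = [{"url": "https://example.com/old", "title": "Old story"},
          {"url": "https://example.com/new", "title": "New story"}]
fresh = new_results(seen, latest)
```

Run on a schedule against a group of queries with a service account, this is the core of the proactive-delivery model described above: the diff, not the full result list, is what gets pushed out.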
I mean, you simplified a lot in how you presented this, and you solved so many edge cases... like, not edge cases, but these really challenging things that are showstoppers sometimes, you know? Like, okay, I have this existing search demo app or something, it's used within my department. I just want to add one data source. Now, what do I do, right? Do I really need to change my UI? Do I really need to rewrite the back end and things like that? And so, when I introduce Swirl, will it actually sit in front of every search back-end call between the UI and the search back end? That's how to do it, yes. And that's how we're setting it up; we use it internally, and that's the way to do it: rather than querying an index directly, you just query Swirl and have it query all of those things. And what you get is the best results from across all sources. Now, that's no substitute, though, right, for going into the silo. Sometimes you need to go into the silo. In addition to a great search API, they have a lot of business logic on their side, like query synonyms. There's a lot more. You probably want to view the object in their environment versus in Swirl. We could create a copy of it or whatever, like everybody else does. We don't. If somebody wants to do preview, you know, there are so many technologies to do that, but why? Instead, I think the best thing to do is: after the user has scanned the shallow results that Swirl gives you immediately, in two or three seconds (that's nothing compared to the time it takes to go to each silo; after you've done three silos, you're already saving a lot, right?), then say, okay, look, it's obvious to me that the best results here are maybe in OneDrive in this folder, or maybe in this Teams chat or these Teams chats. So now click, go into that environment, and hopefully you can then, right, traverse the data and get what you actually need.
And down the road, when those repositories are serving up answers, right? We haven't mentioned ChatGPT much, but I assume you've seen the Microsoft Copilot demo. How long before that's pushing the data back, as opposed to you asking for it, right? It's saying, oh, here's the summary you need today. If you knew what to tell it, it could probably do that for you. So I think that's the new landscape. The much more important thing than the one search box to rule them all is to use the power of meta-search to connect systems together and deliver information to the stuff you have already, to the workflows that already work and create value. Whether that's Slack, or a newsletter, or a notification to a Salesforce queue, that's what you should do. The world doesn't need another search UI. Yeah, especially... like today, I saw a message on Slack from one of the senior managers saying, hey, what's the password to this thing? And I can imagine that with their busy schedules, you know, if they don't have access, if they don't have the password right now, they will switch to another topic. But maybe this topic was still important, maybe even the more important one; they just don't want to wait. And what you're saying is that, in principle, they could have configured it once and accessed it as many times as they need. Exactly. Exactly. And it's not uncommon in the world of, you know, consulting, strategic consulting, tech strategy, that the most powerful people are analysts and admins, because, you know, partners are very busy, right, talking to clients, solving client problems, and finding new ones. So they rely on those folks to have access to all the systems and to go scour them. And of course, that's a waste, right? Probably nobody loves scouring those silos, but even more, we cannot be 100% systematic all the time.
But with technologies like meta-search and push technologies, and there's a million things you could use and a million ways to deliver those things, the opportunity is really there to let those people work on something else, right, to create value in other ways and not just be scouring everything for everything that's relevant, for every, you know... to give the best chance. Yeah, absolutely. And how do you view the problem of, or do you think it's a problem at all, of evolving such a search engine? You know, like, if I have domain experts who could actually label results for me for these queries, could I somehow integrate this into the process with Swirl? Absolutely. So that's actually a nice lead-in to the third use case that people are starting to look at with Swirl. So, exactly what you said: maybe I'm trying to build the ChatGPT of my business, okay? Maybe it doesn't even have to be that, maybe it can be something even simpler. How would I automate handling of an exception when processing a mortgage, as an example? How could I automate that? That's really hard. That is probably not a rules-based system. But it's exactly what you said: I need labels, right? So you're going to have your humans go scour, whatever, the various locations, Slack and Teams and various products. And hopefully they find the examples and they label them. Why not use meta-search for that? So if you can meta-search those things and use the language model, right, to basically say, I'm going to label anything over a certain score as being about this thing, then I give it a bunch of labels, let it hit a bunch of targets, let it go find those things, and pull the documents, because you will need the documents. The difficulty of pulling documents, compared to searching documents, in M365 is one of permissions.
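The "label anything over a certain score" step just described can be sketched in a few lines. The score threshold, labels, and result shapes below are made-up assumptions for illustration; the idea is that meta-search results per target label become weakly labeled training data, with low scorers left for humans.

```python
# Sketch: turn per-label search results into weak labels using a score cutoff.
# The 0.75 threshold and the result format are assumptions, not Swirl's output.
def label_results(results_by_label, threshold=0.75):
    labeled = []
    for label, results in results_by_label.items():
        for r in results:
            if r["score"] >= threshold:
                labeled.append({"doc": r["doc"], "label": label, "score": r["score"]})
    return labeled

# Toy meta-search output for two hypothetical target labels.
search_output = {
    "mortgage-exception": [
        {"doc": "escrow shortfall memo", "score": 0.88},
        {"doc": "lunch menu", "score": 0.12},
    ],
    "rate-lock": [{"doc": "lock extension email", "score": 0.81}],
}
training_seed = label_results(search_output)
```

Everything below the threshold stays unlabeled, which is exactly where a review UI like Prodigy would come in, as mentioned next.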
So today, right, if you install Swirl against M365, against your tenant, you are granting permission for Swirl, on behalf of some user, right, to search through the OneDrive files. So you could also grant a permission to fetch those files. So use Swirl to find the documents that are about the exception handling across silos, and label the ones that are above a certain threshold. Perhaps you could display those in a UI and let the analyst check the labels; you could use a cool tool like Prodigy, as an example, right, from Explosion, the same folks who make spaCy, which is what we use in Swirl. And I think from there, if you trusted the labels, if the labels were good enough, you could actually do your first run, right: take 25 or 40 percent, or whatever your preferred number, of the labeled results out, build the machine learning model with the rest, and then test with the holdout set; and if it's bad, build a confusion matrix, etc., etc. There you go. And at least now you're reviewing and refining and adjusting the threshold, as opposed to starting with hand labeling of data. Yeah. Yeah. That's a great application for meta-search and language models. Exactly. And you explained it basically in the most straightforward way, you know: machine learning model training, testing, validation, right? The search world doesn't escape from this. I think this is amazing. You chose open source as the model for your product. You must have some thoughts on this. I really like this question; I think I asked it to Bob van Luijt from Weaviate as well. You know, why did you guys... looking at your competition, let's say Pinecone didn't choose open source, for reasons that are valid for them, but you guys did. So what makes you believe this model works better?
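The hold-out split and confusion-matrix loop mentioned in that answer can be sketched in plain Python (in practice scikit-learn does this). The 25% hold-out fraction comes from the conversation; the "model" here is a trivial majority-class stand-in, an assumption purely to make the example run end to end.

```python
import random

def split_holdout(examples, holdout_frac=0.25, seed=42):
    """Shuffle deterministically and split off a hold-out set for testing."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_frac)
    return shuffled[cut:], shuffled[:cut]  # (train, holdout)

def confusion_matrix(pairs):
    """pairs: (true_label, predicted_label); returns counts keyed by (true, pred)."""
    counts = {}
    for true, pred in pairs:
        counts[(true, pred)] = counts.get((true, pred), 0) + 1
    return counts

# Toy labeled docs, as the thresholded meta-search step might have produced them.
examples = [("doc%d" % i, "pos" if i % 2 else "neg") for i in range(20)]
train, holdout = split_holdout(examples)

# Stand-in "model": predict the training set's majority class for everything.
majority = max(("pos", "neg"), key=lambda c: sum(1 for _, y in train if y == c))
cm = confusion_matrix([(y, majority) for _, y in holdout])
```

If the matrix looks bad, you adjust the labeling threshold and rerun, which is the review-and-refine loop described above, rather than starting from fully hand-labeled data.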
Because in some sense, it does require a lot of public-facing work, right? You need to explain, you need to document, you need to review pull requests, with all the goodies that come with this, of course, right? There is extra work involved, but you definitely get some benefits. What is your thinking here? The truth is, I've been an open source person forever. I just believe in it. Whether it was Jeff Hammerbacker's amazing comment about how it's too bad that everyone's spending their time on getting clicks, right, he believes that the data science approach benefits hugely from open source. That's so true. Joseph Jacks, the notable VC, right, has written so much about how it's open-source software that's really eating the world; it's eating at a considerably higher rate. And the reason, I think, is a few things. One is trust. One is trust. You know, during the pandemic, I think the large enterprises saw a lot of promising young ventures just not make it. And if you bet on one of those technologies, you probably didn't get the technology. Or maybe you did, right? I don't know, but there was a certain amount of risk involved in that. And open source helps there: although I don't think people want to take the code and run with it, they want to know that they could, if they had to. The second thing, though: the trust is much deeper when you have a commercial company that supports open source, the so-called commercial open source model, because it does require that public investment, that public discipline. We're all about people using it. There's no sales; nobody has that title. We're here to make people successful using it. And I'm not sure, to be honest with you, of all the different ways it's going to evolve, but we want to evolve in line with what the actual community needs. You know, I think you start with a kernel of an idea, and I've worked in search enough to have that. But beyond that, it's a collective thing.
I love the way Vespa, as an example, is so open; look at how well it's evolving in the hands of the community that needs it. I think there's a similar community here. And what's out there for them is a bunch of vendors, potentially some good and some unknown, and some interesting open source products, some of which might take a lot of work to put together. And then there are stories about super hot projects where there's one committer, and they go on vacation for two months and everything falls apart, or they lose interest after two years, and they leave with 2,000 open tickets. It's good to know that there's a little commercial entity behind it. But ultimately, aren't the greatest innovations coming from open source? OpenAI: most of the pieces are out there; that's why there have been so many replications. And that's the last piece of it: it's provable. You know, you can take my word for it, you can look at all the charts and stuff. But with two commands, if you have Docker running, you can get Swirl going, and you can see for yourself. Yeah. If it doesn't do something, well, help us make it better. Sorry, go ahead. Exactly. Exactly. No, I mean, that exactly proves it, because however magical the software is, if you are the engineer, you really want to, you know, open the engine and see what's going on in there. How can I modify this? How can I plug this in? Because if it's not open, I guess, well, maybe someone will blame me and say, no, this is wrong. But you know, if it's an API that I need to pay for, what's the path for me to get into hacking? Should I buy it on my own credit card? Or should I call my manager and say, hey, can you... Well, usually what happens, if you look at Pinecone, for example, is they will allocate a free tier, right? And so you can kind of hack with the free tier. If you run out, then you'll call your manager, I guess. Right. And nothing wrong with that, too. I mean, I think that's just a facilitation of the try-and-buy process.
It's still a commercial company. You can't know for sure. Right. And honestly, that works for many companies. There's no one size fits all. My point is this: I think for solving the kinds of complex multi-silo problems in the large enterprise, where I have been very lucky to work before, and where, at least to some degree, I hear about the problems, right, even if I don't understand them... I hear about the problems. Open source is the winning model because it is so tailorable. You know, no one has the same thing. Everybody has seven of everything, I think, in the large enterprise. And then there's regulation and compliance, regulatory systems, all that stuff. Those are the actual barriers. So open source is the most adoptable in that regard. And then, I think, as long as there's some option to say, well, they're not disappearing, right? There's still someone to help us who really knows how this thing works. It's safe and tailorable. And that's what's really driving so much of the growth, the incredible growth, in the software. Again, ChatGPT, right? There's a paper, but the methods? Not. It's being commercialized, but that's no surprise. Yeah, I mean, it probably wouldn't exist otherwise. Like, just yesterday I was hacking on something before going to bed. And it was super slow, because I think it was US daytime; everyone was probably hacking there as well. But I was fine with that. It was typing slowly, giving me some code snippets. But could it have given me these code snippets if they were not online, if they were not on GitHub or somewhere else, right? So I think it's kind of standing on the shoulders of giants again. Totally. I completely agree with you. And it's extremely limited. Look, it was trained, at least partly, the non-code part, right, on Reddit. It reads like Reddit.
It has a little bit of a know-it-all quality, you know, and it gives the sort of consensus answer. Now, that's great for code, as long as the consensus data is modern, current, and available. So it's never going to teach you... it probably won't teach you that much about enterprise integration patterns and enterprise workloads. But it'll teach you a lot about open source. I write with it, I try to write with it, almost every day. And I can say this: it's very good at filling in a class function. If you teach it a class, it's very good at that. And that's really, I think, commodity work, right? How to connect to X? It's very, very disruptive there. It's also potentially disruptive to a lot of natural language tasks. I think that's the way it is, because it is, at the end of the day, a giant natural language model, right? So it's not surprising it can do things like translation. It's very good at rewriting a query to make it broader. It knows how to rewrite a query to make it Boolean. Those things are never going to change. But getting the data to it... Again, if you want to build the ChatGPT of mortgage exception handling, you're going to need to pull a lot of internal data and label it carefully. And you might discover you don't have enough. That could also be the case. There's a whole synthetic data market that's ready to solve that problem. But in a large enterprise, I think it's much more the other problem: we can't get to the data. We know it's there. On that front, have you actually considered implementing a ChatGPT plugin, so that I, as a user, can go configure things, add my tokens, and now, boom, I can search my internal data lakes? So we are integrated with ChatGPT. There's a connector, so you can query it. We, by default, send every query to it. We also have a query processor, and we will soon have a result processor that will summarize your results for you.
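The "rewrite a query to make it broader or Boolean" idea can be illustrated with a toy query processor. Swirl's real processors, or an LLM, would do this far better; the synonym table and rules below are invented stand-ins so the sketch stays self-contained and deterministic.

```python
# Toy Boolean query rewriter: ORs each term with assumed synonyms, then ANDs
# the clauses. The SYNONYMS table is a made-up example, not a real resource.
SYNONYMS = {"error": ["exception", "failure"], "mortgage": ["loan"]}

def to_boolean(query):
    """Rewrite a plain keyword query into a broader Boolean expression."""
    clauses = []
    for term in query.lower().split():
        options = [term] + SYNONYMS.get(term, [])
        clauses.append("(" + " OR ".join(options) + ")" if len(options) > 1 else term)
    return " AND ".join(clauses)

print(to_boolean("mortgage error handling"))
# -> (mortgage OR loan) AND (error OR exception OR failure) AND handling
```

An LLM-backed version would replace the synonym lookup with a prompt like "rewrite this query as a Boolean expression," which is essentially what the query-processor hook described above makes pluggable.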
But frankly, I think several people have already done stuff like that, so you just copy and paste the links; you can probably get that. I think that's really an essential piece of it. Now, to generate queries from ChatGPT, I think that's easy to do. Right. Someone can do that. But this is my point: there will be other GPTs. We refer to ChatGPT as a question answerer, right? For questions. If you type "question:" and put your question in, we'll send it to ChatGPT. I am sure people are looking at the amazing platforms you've just mentioned, right? All of them. Those are going to end up deployed in different parts of the enterprise: answering questions, summarizing, extracting, predicting, prescribing. There will be all those things out there. And the key will be: how do you get at them? Yeah. It's still the same problem. Right. Just because you have something that will comment on the financial implications of a federal rule change, for example, right, doesn't mean anyone's going to go look at it. But if you made sure that every day, or whatever it is, we were checking for new temporal updates from that, and those were being pushed out to the people who needed to know and read it, especially if you could check that they read it... If you could imagine doing something like pushing information to analysts, or to somebody who's taking action on it, and then tracking to see who read it, and then watching their performance, I am sure that will be a thing in the financial services world. You know, it's a tough world. They're very used to a high level of governance, if you will, but I think that's the kind of system that will ultimately produce the automation where the ChatGPT will be able to solve the mortgage exception on its own 90% of the time, right? 10% of the time engaging a human. Yes. That's somewhat scary, but I think it could also be liberating if done well.
And I think there is a big discussion on this topic going on: how do we, collectively as humanity, you know, make sure that this tech doesn't oust us, right? Doesn't just kick us out of our professions, or, you know... that we still have a way to... I mean, even just going back to yesterday's example, I was really going in circles. I was just drawing some pins on a map using ChatGPT. And it couldn't get exactly the crux of what I was asking. And so I went to the kitchen. I thought for just two minutes, and I realized, okay, I can just break down my code into two parts without telling ChatGPT what I'm doing, and just run everything in my IDE, and boom, I'm done, because I was basically going in circles. And maybe it's just me, unable to, you know, engineer a better prompt, to engineer better questions. Or maybe ChatGPT does have limitations as well. You never know. But it did help me: probably 90% of the work was done using that interaction. I would have spent several "half-days," as they call them, or whatever, evenings, figuring out all these things. Like, what library should I use to connect to the open source map? You know, how do I drop pins? Absolutely. ChatGPT is the perfect replacement for the more senior developer who will answer your texts, right, or your Slack... sorry, Dave, naming names. You know, you used to work until you were blocked, and then you'd go find somebody and say, okay, I can't figure this out. This was pre-internet, right? Then, for a long time, we had... stack trace, or the other thing that ChatGPT has completely revised. Yeah, Stack Overflow. Stack Overflow. Right. Exactly. For a while, we had Stack Overflow. And now, ChatGPT. It's funny, I forgot the name because I use ChatGPT instead. I haven't Googled for a code thing in so long. It can even replace your habit, right? Your memory and habit, in some sense. Yeah. Well, you know, we all get good at evaluating those, right?
With Stack Overflow articles, it's like: okay, when's it from? How many upvotes does it have? Is there a good response? Does it have the green check mark? ChatGPT is pretty much bringing you back the green-check-mark answer. So there's no point anymore. That's what it's good at. I totally agree. It's funny you mention this, because exactly the same thought crossed my mind when I was interacting with ChatGPT, relating to my experience with Stack Overflow, doing some small Android application. I ran into an issue which was described in something like 20 questions and answers on exactly the same topic. And every one had a green check mark, you know, upvotes, but nothing worked. And in the end, I found just one of them that worked. And you know, that process was, in a way, iterative, repetitive, and also in some sense frustrating, but then in the end, when you achieve it, you know, it's fine. You achieve what you want. With ChatGPT it's somewhat similar, but the experience is different. I don't need to type that much. I mean, I don't need to type something into Google, then go to Stack Overflow, you know, read this thing, comprehend it, and then apply it. With ChatGPT, all of these steps are condensed into me just literally typing what I want and getting something on the screen. Right. This part, by itself, is amazing. It is hard to predict how far that will go. But I think one thing is very clear: the M365 silo is probably the most important one going forward, because it's going to kind of automatically be taking in the knowledge, which is very present in Outlook, right? Maybe not so much in Calendar, but in your email there is a lot of knowledge; in Teams, there's a lot of knowledge. Documents, probably a decent amount there too, although I think that tends to be more scattered. But effectively, right, ChatGPT was trained on Reddit, which is chat. Teams is chat.
Outlook is sort of chat. So there's no doubt that maybe those early interactions will come through that channel. But I do think that, exactly as you said early on, Microsoft is never going to make it easy to talk to anybody else. They still come from that position of silo dominance, or whatever it is. They don't like to work with Salesforce; Salesforce doesn't like to work with them. Nobody likes to use the non-great product in someone else's stack just because we're trying to consolidate. So that's why it persists. And that is very real, and it exacerbates the problem, the walls between the silos. And then throw in all the others: after you get past the basic, whatever, big five, then you have all the Elastics and OpenSearches and Solrs and Postgres, to say nothing of the applications. So one group is using Swirl to look at five different ticket systems. They're all just tickets. YouTrack is one, from JetBrains, and then there are some others. Okay, that's a really interesting problem. The cost to migrate all that stuff would be... I don't think it's necessarily that much money; it's just a massive amount of pain. If you could figure out how to do it, probably, to transfer it, it's not that much money, but it is a tremendous amount of work. Yes. I think you probably don't realize it yourself yet, but from the way you explain this, it feels like you've invented a ChatGPT for the search part. I mean, in some sense, simplifying things; as you said, not requiring anyone to physically reinvent things, like moving data here and there, which can take years, sometimes dozens of years, so people simply don't do it. And also access to the data: like, today, I only remember a fraction of the things that I did. I literally forget things that I've done yesterday.
I might sometimes reflect and remember something from a week ago or so, but still, because of information overload, when I need to make decisions, when I need to scramble something together quickly on a conference page, how much knowledge do I really have at hand? And if I had that magical search bar where I could have typed something and just gotten the supporting material, not having to go all over the place, essentially doing what search engines should do: just go and check what happened, where, and when, and by whom. Exactly. Exactly. There's so much amazing work and time and genius that's gone into some of these apps. I mean, who doesn't love them? They all have incredible capabilities, and they're evolving, they're growing all the time. In a way, right, the idea that you would take data out to try to make sense of it is absurd. It really is. Think of Salesforce: 2,000-plus tables just to make the application work. You're going to extract that? No, you're going to query it. And that's the key, right? And so we're focused on making the querying easy and understandable. Simplicity. You know, I've worked on some amazing products that were not simple. And I'm sorry for some of them, right, not being that simple. But at the end of the day, I think today in the enterprise, it's got to get easier. And there have got to be alternatives to indexing. And so, thus the simplicity. Amazing. Here comes my favorite question, as we get closer to the end of this amazing podcast episode. You've done a lot in software engineering, you've done quite a lot in search. You mentioned all these companies, you know, like FAST, whose product, you know, became like Vespa and so on. You're building Swirl. Why? What keeps you motivated to do this? As amazing as it is, you're doing a lot of things, and also in the open. What motivates you to stay in this topic of search?
You know, whether or not it's been search, data integration has been the thing that I've always liked. I started my career at John Hancock Financial Services, working in marketing, doing customer segmentation. Interesting stuff. But really, the problem the company couldn't solve was how to view completely separate product lines in one way. They had no idea, right? A 110-year-old company had no idea that it had a Pareto distribution; actually, it was somewhat worse: like 10% or 15% of the customers were producing 80% of the premiums. Everybody got treated equally. It was a very old-school business that was all about customers, without really understanding customers. And it was still massively successful. So that's not a knock; they were one of the biggest users of technology. Hancock had the largest IBM mainframe, I think, in the Northeast for many years. But the silo problem was the problem we had to solve to actually take the company to the level where it could compete with direct mail companies, because direct mail companies had a lower cost basis and they knew the customer. And that project, quite honestly, is the pattern that I have seen over and over again, regardless of the venture; search has been one of them. But I was really lucky to work on mortgage processing too. So a company called AI Foundry, which was actually backed by Kodak Alaris, the world leader in scanning at the time, right, said: we need to come up with something to do, something interesting with this scanning technology. And we'd like to apply it in a market other than consumer photos or things like that; try to find a new market. And mortgage turned out to be hot, because if you've done a mortgage, right, if you've taken a mortgage, you have this ugly moment of sending them a bunch of documents, and then you just have to wait. And then sometimes they're like, oh, I need you to do this one again.
I believe there's research that showed that something like one third of the applicants drop out every two or three days after, you know, you haven't gotten back to them about their documents. They just want that all-clear, like, you're good. So AI Foundry used pretty interesting OCR, zone technology, and text classification to turn the mortgage application into data. Not 100%: the state of the art before was keying it, manually keying it, and then someone would manually review it. So we switched it to review. The company was successful. It was a silo problem again. You could think of the different types of documents, right, as being fundamentally silos, and understanding them was hard, and we did a lot of modeling, and it worked. It worked great, right? Gaulous bought the company. That's just another example. I did the same thing in an IoT company, most recently, where we were basically taking sensor data from healthcare settings, marrying it up with other data, like their EHR data, and trying to predict, you know, the likelihood of various conditions. So it's always the silo problem. And frankly, every single one of these ventures would have benefited from something like Swirl. So that's why I did it. Because, to be honest with you, I think the data problem is huge. I'm passionate about it. And I think it's important to solve it, because, frankly, some of the service problems that we all suffer when we're out in the field dealing with large companies happen because they just don't have the data. They're not just trying to be mean or clueless, right? Sometimes it's a hard problem to solve. We expect a lot now, as engineers, right? I'm expecting ChatGPT-level responses pretty soon. And yet what we have is Siri, who can barely figure out how to turn off the alarm, you know. So there are going to be some bumps. There are going to be some sudden pulls and pushes.
But I think the important thing is, when you ask me why do it open: because, prove it. Awesome. Yeah, this is an amazing answer. So data is literally king, and the one who has universal access to data wins, right? In so many senses of the word. This is so great. It's so good chatting to you, Sid; I've learned a lot. I was wondering if there's something you would like to announce. Something that's cooking. Or maybe you simply want to invite developers to a tutorial and to send a pull request. Well, I would love to do that. First of all, we have webinars every couple of weeks. Please come if you're interested. You just need to put an email address at the end of the registration form. We are also totally available on Slack. We don't have sales; it's free. Just connect up. You'll talk to support, or customer success, which I guess is the more appropriate term these days. But they're here. We're here to help. That includes me and everybody else on the team. There are only five of us, but we're all here to help. We would love to hear what you want to do with Swirl, what you're doing with Swirl. If you need help with a search provider, we'll write it for you, or help you get it working. What I can say for sure is this: next month, version 2.0 will drop. It will be something you can try with one click, and it will have the M365 integration that I talked about. So, the full ability to deploy it to your tenant in our hosted version, or just to take the Docker image and run with it; hook that up, and it will support OAuth 2 and OIDC. Many, many more features. We'll be elaborating on the things you can do with it over the next couple of months, particularly in May. And I just really would beg people to try it and tell us what you think. That's my ask. And if anybody wants to work on it, you know, we're always delighted to accept, and even guide anybody as to where to start. Right. So that's where we are.
We're very young, and we're trying to figure this out. And energetic and knowledgeable. We will link everything you mentioned, of course, in the episode show notes, so everyone can click through at will and, you know, follow and learn from you as I did today. And I really want to allocate time to participate in one of your webinars; I'm pretty sure I will learn more. That would be great. We are definitely bringing in folks. We had, again, KMW, which makes Spyglass, the open source project. We had the author of Quarge on previously, Renee. It was great fun. We hope to have him on again, because I think we could learn a lot; I'm actually looking forward to our talk about the things they're doing. And many others. So absolutely, we'd love to have you on. And if you know anybody who wants to talk about this stuff too, please, I'd love to have them on as well. Fantastic. Thanks for pushing the envelope of search. Keep pushing. I wish you all the success that you can get, and beyond. And I hope we can chat more down the road, as you guys grow, and I'm pretty sure you will. Thank you so much for the confidence. We would love to share updates in the future; especially, I'll be very psyched to show you some of the machine learning stuff we're talking about. We definitely want to build that as a use case and make it one-click easy to do. So yeah, let's keep in touch. I'd love to, too. I mean, I'm a huge fan of the podcast. Obviously, I've listened to the Vespa episode several times, and I think, please keep it up. It's awesome. There aren't enough people focused on this incredible area of technology we're talking about. I think it's going to become more common, but it's still a little bit unknown. Yeah, I appreciate your kind words. It's thanks to you, the makers. Thank you so much, Sid, for your time. Really enjoyed it. Thank you very much. Bye bye.
\ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md b/transcripts_with_timestamps/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md new file mode 100644 index 0000000..5e61eeb --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/sid-probstein-part-ii-bring-ai-to-company-data-with-swirl.md @@ -0,0 +1,1979 @@ +--- +description: '

This episode on YouTube: https://www.youtube.com/watch?v=5fafSkzKpfw

00:00 Intro

01:54 Reflection on the past year in AI

08:08 Reader LLM (and RAG)

12:36 Does it need fine-tuning to a domain?

14:20 How LLMs can lie

17:32 What if data isn''t perfect

21:21 SWIRL''s secret sauce with Reader LLM

23:55 Feedback loop

26:14 Some surprising client perspective

31:17 How Gen AI can change communication interfaces

34:11 Call-out to the Community

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20240515_120505_ab56f7a7d7ebadfb6bbd3486a4d2e7ad.png +pub_date: Wed, 15 May 2024 12:57:55 GMT +title: Sid Probstein, part II - Bring AI to company data with SWIRL +url: https://rss.com/podcasts/vector-podcast/1480271 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 20.96, "text": " Hello + there, this is Vector Podcast Season 3 and I''m super excited to be talking to", + "tokens": [50364, 2425, 456, 11, 341, 307, 691, 20814, 29972, 16465, 805, 293, 286, + 478, 1687, 2919, 281, 312, 1417, 281, 51412], "temperature": 0.0, "avg_logprob": + -0.339648785798446, "compression_ratio": 1.0, "no_speech_prob": 0.0807822197675705}, + {"id": 1, "seek": 14096, "start": 140.96, "end": 147.28, "text": " companies with + thousands and thousands of users and thousands and thousands of systems that it''s + been a", "tokens": [50364, 3431, 365, 5383, 293, 5383, 295, 5022, 293, 5383, 293, + 5383, 295, 3652, 300, 309, 311, 668, 257, 50680], "temperature": 0.0, "avg_logprob": + -0.16850528520407135, "compression_ratio": 1.6653061224489796, "no_speech_prob": + 0.5573660135269165}, {"id": 2, "seek": 14096, "start": 147.28, "end": 155.12, "text": + " time of inspiration and a little bit of continued nervousness about what it all + means. Last March on the", "tokens": [50680, 565, 295, 10249, 293, 257, 707, 857, + 295, 7014, 6296, 1287, 466, 437, 309, 439, 1355, 13, 5264, 6129, 322, 264, 51072], + "temperature": 0.0, "avg_logprob": -0.16850528520407135, "compression_ratio": 1.6653061224489796, + "no_speech_prob": 0.5573660135269165}, {"id": 3, "seek": 14096, "start": 155.12, + "end": 163.04000000000002, "text": " 15th actually was the 14th was Pi Day and that + was the one year anniversary of GPT-4. 
What I''ve learned", "tokens": [51072, 2119, + 392, 767, 390, 264, 3499, 392, 390, 17741, 5226, 293, 300, 390, 264, 472, 1064, + 12962, 295, 26039, 51, 12, 19, 13, 708, 286, 600, 3264, 51468], "temperature": 0.0, + "avg_logprob": -0.16850528520407135, "compression_ratio": 1.6653061224489796, "no_speech_prob": + 0.5573660135269165}, {"id": 4, "seek": 14096, "start": 163.04000000000002, "end": + 169.12, "text": " is that those large enterprises were again looked at GPT-4 and + said this is going to change our", "tokens": [51468, 307, 300, 729, 2416, 29034, + 645, 797, 2956, 412, 26039, 51, 12, 19, 293, 848, 341, 307, 516, 281, 1319, 527, + 51772], "temperature": 0.0, "avg_logprob": -0.16850528520407135, "compression_ratio": + 1.6653061224489796, "no_speech_prob": 0.5573660135269165}, {"id": 5, "seek": 16912, + "start": 169.12, "end": 177.76, "text": " business. This can really help everybody + be an efficient expert and just slice through the", "tokens": [50364, 1606, 13, + 639, 393, 534, 854, 2201, 312, 364, 7148, 5844, 293, 445, 13153, 807, 264, 50796], + "temperature": 0.0, "avg_logprob": -0.15865017067302356, "compression_ratio": 1.583673469387755, + "no_speech_prob": 0.002471005544066429}, {"id": 6, "seek": 16912, "start": 177.76, + "end": 184.4, "text": " current problems of silo data and inconsistent systems. 
+ But at the same time there were a lot of", "tokens": [50796, 2190, 2740, 295, 3425, + 78, 1412, 293, 36891, 3652, 13, 583, 412, 264, 912, 565, 456, 645, 257, 688, 295, + 51128], "temperature": 0.0, "avg_logprob": -0.15865017067302356, "compression_ratio": + 1.583673469387755, "no_speech_prob": 0.002471005544066429}, {"id": 7, "seek": 16912, + "start": 184.4, "end": 189.04000000000002, "text": " fear about well are we exposing + invaluable internal data to AI''s that are then going to be trained on", "tokens": + [51128, 4240, 466, 731, 366, 321, 33178, 40367, 6920, 1412, 281, 7318, 311, 300, + 366, 550, 516, 281, 312, 8895, 322, 51360], "temperature": 0.0, "avg_logprob": -0.15865017067302356, + "compression_ratio": 1.583673469387755, "no_speech_prob": 0.002471005544066429}, + {"id": 8, "seek": 16912, "start": 189.04000000000002, "end": 194.64000000000001, + "text": " it? Is this going to be exposed? Lost? There have been many many lawsuits. + So ultimately the large", "tokens": [51360, 309, 30, 1119, 341, 516, 281, 312, 9495, + 30, 23422, 30, 821, 362, 668, 867, 867, 39493, 13, 407, 6284, 264, 2416, 51640], + "temperature": 0.0, "avg_logprob": -0.15865017067302356, "compression_ratio": 1.583673469387755, + "no_speech_prob": 0.002471005544066429}, {"id": 9, "seek": 19464, "start": 194.64, + "end": 200.0, "text": " enterprises did what they always do which is engage with + it on their own terms and many of them", "tokens": [50364, 29034, 630, 437, 436, + 1009, 360, 597, 307, 4683, 365, 309, 322, 641, 1065, 2115, 293, 867, 295, 552, 50632], + "temperature": 0.0, "avg_logprob": -0.1540858849235203, "compression_ratio": 1.5934959349593496, + "no_speech_prob": 0.0038459908682852983}, {"id": 10, "seek": 19464, "start": 200.0, + "end": 207.51999999999998, "text": " purchased download installed AI, generative + AI''s and LLMs in their private clouds and we''re working", "tokens": [50632, 14734, + 5484, 8899, 7318, 11, 1337, 1166, 7318, 311, 293, 441, 43, 26386, 294, 641, 
4551, + 12193, 293, 321, 434, 1364, 51008], "temperature": 0.0, "avg_logprob": -0.1540858849235203, + "compression_ratio": 1.5934959349593496, "no_speech_prob": 0.0038459908682852983}, + {"id": 11, "seek": 19464, "start": 207.51999999999998, "end": 214.32, "text": " + with one large company that did that and trained it with a bunch of what they called + safe data. So", "tokens": [51008, 365, 472, 2416, 2237, 300, 630, 300, 293, 8895, + 309, 365, 257, 3840, 295, 437, 436, 1219, 3273, 1412, 13, 407, 51348], "temperature": + 0.0, "avg_logprob": -0.1540858849235203, "compression_ratio": 1.5934959349593496, + "no_speech_prob": 0.0038459908682852983}, {"id": 12, "seek": 19464, "start": 214.32, + "end": 221.6, "text": " annual reports and you employ a handbook and it''s very + interesting to talk to but it can''t really", "tokens": [51348, 9784, 7122, 293, + 291, 3188, 257, 1011, 2939, 293, 309, 311, 588, 1880, 281, 751, 281, 457, 309, 393, + 380, 534, 51712], "temperature": 0.0, "avg_logprob": -0.1540858849235203, "compression_ratio": + 1.5934959349593496, "no_speech_prob": 0.0038459908682852983}, {"id": 13, "seek": + 22160, "start": 221.6, "end": 228.16, "text": " help a business person or somebody + trying to answer a question in the supply chain group or in", "tokens": [50364, + 854, 257, 1606, 954, 420, 2618, 1382, 281, 1867, 257, 1168, 294, 264, 5847, 5021, + 1594, 420, 294, 50692], "temperature": 0.0, "avg_logprob": -0.1555552811458193, + "compression_ratio": 1.6864111498257839, "no_speech_prob": 0.002812143648043275}, + {"id": 14, "seek": 22160, "start": 228.16, "end": 233.68, "text": " the R&D group + or in HR because this doesn''t have access to those systems and in those places", + "tokens": [50692, 264, 497, 5, 35, 1594, 420, 294, 19460, 570, 341, 1177, 380, 362, + 2105, 281, 729, 3652, 293, 294, 729, 3190, 50968], "temperature": 0.0, "avg_logprob": + -0.1555552811458193, "compression_ratio": 1.6864111498257839, "no_speech_prob": + 0.002812143648043275}, 
{"id": 15, "seek": 22160, "start": 234.72, "end": 239.2, + "text": " you''ve ever worked there. You know when you onboard the first thing they + do is your manager does", "tokens": [51020, 291, 600, 1562, 2732, 456, 13, 509, + 458, 562, 291, 24033, 264, 700, 551, 436, 360, 307, 428, 6598, 775, 51244], "temperature": + 0.0, "avg_logprob": -0.1555552811458193, "compression_ratio": 1.6864111498257839, + "no_speech_prob": 0.002812143648043275}, {"id": 16, "seek": 22160, "start": 239.2, + "end": 243.84, "text": " right is they open a bunch of tickets so that you could + have access to systems. That''s hard enough.", "tokens": [51244, 558, 307, 436, + 1269, 257, 3840, 295, 12628, 370, 300, 291, 727, 362, 2105, 281, 3652, 13, 663, + 311, 1152, 1547, 13, 51476], "temperature": 0.0, "avg_logprob": -0.1555552811458193, + "compression_ratio": 1.6864111498257839, "no_speech_prob": 0.002812143648043275}, + {"id": 17, "seek": 22160, "start": 244.79999999999998, "end": 251.04, "text": " + So the reason that there''s been so in a way so little progress right lots of installs + of AI but not", "tokens": [51524, 407, 264, 1778, 300, 456, 311, 668, 370, 294, + 257, 636, 370, 707, 4205, 558, 3195, 295, 3625, 82, 295, 7318, 457, 406, 51836], + "temperature": 0.0, "avg_logprob": -0.1555552811458193, "compression_ratio": 1.6864111498257839, + "no_speech_prob": 0.002812143648043275}, {"id": 18, "seek": 25104, "start": 251.04, + "end": 255.84, "text": " that much real I''d love to hear from you some of the use + cases out there. 
People are still trying to", "tokens": [50364, 300, 709, 957, 286, + 1116, 959, 281, 1568, 490, 291, 512, 295, 264, 764, 3331, 484, 456, 13, 3432, 366, + 920, 1382, 281, 50604], "temperature": 0.0, "avg_logprob": -0.12538250724037925, + "compression_ratio": 1.7644444444444445, "no_speech_prob": 0.0005569803761318326}, + {"id": 19, "seek": 25104, "start": 255.84, "end": 262.48, "text": " say we''re still + trying to get the data to the AI so that it can provide the benefit and what ultimately", + "tokens": [50604, 584, 321, 434, 920, 1382, 281, 483, 264, 1412, 281, 264, 7318, + 370, 300, 309, 393, 2893, 264, 5121, 293, 437, 6284, 50936], "temperature": 0.0, + "avg_logprob": -0.12538250724037925, "compression_ratio": 1.7644444444444445, "no_speech_prob": + 0.0005569803761318326}, {"id": 20, "seek": 25104, "start": 262.48, "end": 270.0, + "text": " what what happened is this they''ve got the AI''s installed the first + generation of AI architecture", "tokens": [50936, 437, 437, 2011, 307, 341, 436, + 600, 658, 264, 7318, 311, 8899, 264, 700, 5125, 295, 7318, 9482, 51312], "temperature": + 0.0, "avg_logprob": -0.12538250724037925, "compression_ratio": 1.7644444444444445, + "no_speech_prob": 0.0005569803761318326}, {"id": 21, "seek": 25104, "start": 270.0, + "end": 276.15999999999997, "text": " solution architectures is what I will refer + to as a vendor driven put the data in architecture", "tokens": [51312, 3827, 6331, + 1303, 307, 437, 286, 486, 2864, 281, 382, 257, 24321, 9555, 829, 264, 1412, 294, + 9482, 51620], "temperature": 0.0, "avg_logprob": -0.12538250724037925, "compression_ratio": + 1.7644444444444445, "no_speech_prob": 0.0005569803761318326}, {"id": 22, "seek": + 27616, "start": 276.16, "end": 282.48, "text": " literally every product out there + I don''t want to name them but they all say the first step", "tokens": [50364, 3736, + 633, 1674, 484, 456, 286, 500, 380, 528, 281, 1315, 552, 457, 436, 439, 584, 264, + 700, 1823, 50680], "temperature": 0.0, 
"avg_logprob": -0.0800493066961115, "compression_ratio": + 1.6784452296819787, "no_speech_prob": 0.0043405089527368546}, {"id": 23, "seek": + 27616, "start": 282.48, "end": 288.56, "text": " is put the data in again like for + some people for many applications from POVs for testing it out", "tokens": [50680, + 307, 829, 264, 1412, 294, 797, 411, 337, 512, 561, 337, 867, 5821, 490, 22299, 53, + 82, 337, 4997, 309, 484, 50984], "temperature": 0.0, "avg_logprob": -0.0800493066961115, + "compression_ratio": 1.6784452296819787, "no_speech_prob": 0.0043405089527368546}, + {"id": 24, "seek": 27616, "start": 288.56, "end": 292.96000000000004, "text": " + that''s great and I''ve who hasn''t done it with a few PDFs right and got some interesting + results", "tokens": [50984, 300, 311, 869, 293, 286, 600, 567, 6132, 380, 1096, + 309, 365, 257, 1326, 17752, 82, 558, 293, 658, 512, 1880, 3542, 51204], "temperature": + 0.0, "avg_logprob": -0.0800493066961115, "compression_ratio": 1.6784452296819787, + "no_speech_prob": 0.0043405089527368546}, {"id": 25, "seek": 27616, "start": 293.6, + "end": 300.64000000000004, "text": " but you can''t just take a copy of a departmental + database and hand it over to a centralized", "tokens": [51236, 457, 291, 393, 380, + 445, 747, 257, 5055, 295, 257, 5882, 304, 8149, 293, 1011, 309, 670, 281, 257, 32395, + 51588], "temperature": 0.0, "avg_logprob": -0.0800493066961115, "compression_ratio": + 1.6784452296819787, "no_speech_prob": 0.0043405089527368546}, {"id": 26, "seek": + 27616, "start": 300.64000000000004, "end": 305.92, "text": " corporate database + for training like that their rules in place to prevent that even more difficult", + "tokens": [51588, 10896, 8149, 337, 3097, 411, 300, 641, 4474, 294, 1081, 281, 4871, + 300, 754, 544, 2252, 51852], "temperature": 0.0, "avg_logprob": -0.0800493066961115, + "compression_ratio": 1.6784452296819787, "no_speech_prob": 0.0043405089527368546}, + {"id": 27, "seek": 30592, "start": 305.92, "end": 
311.12, "text": " is the idea + that you would send it outside your perimeter into someone else''s cloud right at + another", "tokens": [50364, 307, 264, 1558, 300, 291, 576, 2845, 309, 2380, 428, + 32404, 666, 1580, 1646, 311, 4588, 558, 412, 1071, 50624], "temperature": 0.0, "avg_logprob": + -0.10807621479034424, "compression_ratio": 1.651877133105802, "no_speech_prob": + 0.0006289776647463441}, {"id": 28, "seek": 30592, "start": 311.12, "end": 316.16, + "text": " big manufacturing firm they have a 24 month waiting list to onboard a + new SaaS product right they''d", "tokens": [50624, 955, 11096, 6174, 436, 362, 257, + 4022, 1618, 3806, 1329, 281, 24033, 257, 777, 49733, 1674, 558, 436, 1116, 50876], + "temperature": 0.0, "avg_logprob": -0.10807621479034424, "compression_ratio": 1.651877133105802, + "no_speech_prob": 0.0006289776647463441}, {"id": 29, "seek": 30592, "start": 316.16, + "end": 321.92, "text": " like we have to put our security team on it so I believe + it''s a very interesting time and ultimately", "tokens": [50876, 411, 321, 362, + 281, 829, 527, 3825, 1469, 322, 309, 370, 286, 1697, 309, 311, 257, 588, 1880, 565, + 293, 6284, 51164], "temperature": 0.0, "avg_logprob": -0.10807621479034424, "compression_ratio": + 1.651877133105802, "no_speech_prob": 0.0006289776647463441}, {"id": 30, "seek": + 30592, "start": 321.92, "end": 327.36, "text": " what happened is Swirl thought + differently about the problem as you said we thought about it from", "tokens": [51164, + 437, 2011, 307, 3926, 1648, 1194, 7614, 466, 264, 1154, 382, 291, 848, 321, 1194, + 466, 309, 490, 51436], "temperature": 0.0, "avg_logprob": -0.10807621479034424, + "compression_ratio": 1.651877133105802, "no_speech_prob": 0.0006289776647463441}, + {"id": 31, "seek": 30592, "start": 327.36, "end": 333.92, "text": " the search technology + perspective why would we move all of the data instead move the", "tokens": [51436, + 264, 3164, 2899, 4585, 983, 576, 321, 1286, 439, 295, 264, 1412, 
2602, 1286, 264, + 51764], "temperature": 0.0, "avg_logprob": -0.10807621479034424, "compression_ratio": + 1.651877133105802, "no_speech_prob": 0.0006289776647463441}, {"id": 32, "seek": + 33392, "start": 334.88, "end": 340.88, "text": " essentially take only the data + that you didn''t give it to the AI at that moment and what Swirl does", "tokens": + [50412, 4476, 747, 787, 264, 1412, 300, 291, 994, 380, 976, 309, 281, 264, 7318, + 412, 300, 1623, 293, 437, 3926, 1648, 775, 50712], "temperature": 0.0, "avg_logprob": + -0.10267083834757847, "compression_ratio": 1.7563636363636363, "no_speech_prob": + 0.0016980391228571534}, {"id": 33, "seek": 33392, "start": 340.88, "end": 346.16, + "text": " first to do that is we create a single pane of glass well the next thing + I''ll mention is Swirl is", "tokens": [50712, 700, 281, 360, 300, 307, 321, 1884, + 257, 2167, 32605, 295, 4276, 731, 264, 958, 551, 286, 603, 2152, 307, 3926, 1648, + 307, 50976], "temperature": 0.0, "avg_logprob": -0.10267083834757847, "compression_ratio": + 1.7563636363636363, "no_speech_prob": 0.0016980391228571534}, {"id": 34, "seek": + 33392, "start": 346.16, "end": 351.04, "text": " software we are a software company + and our software is typically deployed in the customers private", "tokens": [50976, + 4722, 321, 366, 257, 4722, 2237, 293, 527, 4722, 307, 5850, 17826, 294, 264, 4581, + 4551, 51220], "temperature": 0.0, "avg_logprob": -0.10267083834757847, "compression_ratio": + 1.7563636363636363, "no_speech_prob": 0.0016980391228571534}, {"id": 35, "seek": + 33392, "start": 351.04, "end": 357.20000000000005, "text": " cloud there''s we are + happy to do hosting for POVs and for various applications but for larger", "tokens": + [51220, 4588, 456, 311, 321, 366, 2055, 281, 360, 16058, 337, 22299, 53, 82, 293, + 337, 3683, 5821, 457, 337, 4833, 51528], "temperature": 0.0, "avg_logprob": -0.10267083834757847, + "compression_ratio": 1.7563636363636363, "no_speech_prob": 0.0016980391228571534}, + 
{"id": 36, "seek": 33392, "start": 357.20000000000005, "end": 361.92, "text": " + enterprise we don''t expect that to be the case once you deploy Swirl it integrates + with your", "tokens": [51528, 14132, 321, 500, 380, 2066, 300, 281, 312, 264, 1389, + 1564, 291, 7274, 3926, 1648, 309, 3572, 1024, 365, 428, 51764], "temperature": 0.0, + "avg_logprob": -0.10267083834757847, "compression_ratio": 1.7563636363636363, "no_speech_prob": + 0.0016980391228571534}, {"id": 37, "seek": 36192, "start": 362.24, "end": 367.6, + "text": " single sign on systems such as Microsoft or Octa or ping federate others + you can have cast whatever", "tokens": [50380, 2167, 1465, 322, 3652, 1270, 382, + 8116, 420, 6788, 64, 420, 26151, 38024, 473, 2357, 291, 393, 362, 4193, 2035, 50648], + "temperature": 0.0, "avg_logprob": -0.13063430786132812, "compression_ratio": 1.652542372881356, + "no_speech_prob": 0.002351338043808937}, {"id": 38, "seek": 36192, "start": 367.6, + "end": 375.52000000000004, "text": " in there once it''s configured you send a question + prompt or query search query to Swirl it brokers", "tokens": [50648, 294, 456, 1564, + 309, 311, 30538, 291, 2845, 257, 1168, 12391, 420, 14581, 3164, 14581, 281, 3926, + 1648, 309, 47549, 51044], "temperature": 0.0, "avg_logprob": -0.13063430786132812, + "compression_ratio": 1.652542372881356, "no_speech_prob": 0.002351338043808937}, + {"id": 39, "seek": 36192, "start": 375.52000000000004, "end": 381.44, "text": " + that query to all the sources that it''s authorized to do so and it does so on behalf + of that user", "tokens": [51044, 300, 14581, 281, 439, 264, 7139, 300, 309, 311, + 28312, 281, 360, 370, 293, 309, 775, 370, 322, 9490, 295, 300, 4195, 51340], "temperature": + 0.0, "avg_logprob": -0.13063430786132812, "compression_ratio": 1.652542372881356, + "no_speech_prob": 0.002351338043808937}, {"id": 40, "seek": 36192, "start": 382.56, + "end": 388.48, "text": " so it''s not only is it safe compliance search using existing + 
infrastructure but it''s personal", "tokens": [51396, 370, 309, 311, 406, 787, 307, + 309, 3273, 15882, 3164, 1228, 6741, 6896, 457, 309, 311, 2973, 51692], "temperature": + 0.0, "avg_logprob": -0.13063430786132812, "compression_ratio": 1.652542372881356, + "no_speech_prob": 0.002351338043808937}, {"id": 41, "seek": 38848, "start": 389.36, + "end": 395.20000000000005, "text": " the data the user or caller gets back is based + on what that user can see so I use it all the time", "tokens": [50408, 264, 1412, + 264, 4195, 420, 48324, 2170, 646, 307, 2361, 322, 437, 300, 4195, 393, 536, 370, + 286, 764, 309, 439, 264, 565, 50700], "temperature": 0.0, "avg_logprob": -0.09666124396367905, + "compression_ratio": 1.78515625, "no_speech_prob": 0.00889927800744772}, {"id": + 42, "seek": 38848, "start": 395.76, "end": 400.8, "text": " and it''s my email my + outlook my calendar my LinkedIn whatever right it''s my view", "tokens": [50728, + 293, 309, 311, 452, 3796, 452, 26650, 452, 12183, 452, 20657, 2035, 558, 309, 311, + 452, 1910, 50980], "temperature": 0.0, "avg_logprob": -0.09666124396367905, "compression_ratio": + 1.78515625, "no_speech_prob": 0.00889927800744772}, {"id": 43, "seek": 38848, "start": + 401.92, "end": 406.72, "text": " we actually love the idea that we should present + the data to the user so you get that single", "tokens": [51036, 321, 767, 959, 264, + 1558, 300, 321, 820, 1974, 264, 1412, 281, 264, 4195, 370, 291, 483, 300, 2167, + 51276], "temperature": 0.0, "avg_logprob": -0.09666124396367905, "compression_ratio": + 1.78515625, "no_speech_prob": 0.00889927800744772}, {"id": 44, "seek": 38848, "start": + 406.72, "end": 410.16, "text": " pane of glass and actually you can decide what + to do with it you can say I don''t want this", "tokens": [51276, 32605, 295, 4276, + 293, 767, 291, 393, 4536, 437, 281, 360, 365, 309, 291, 393, 584, 286, 500, 380, + 528, 341, 51448], "temperature": 0.0, "avg_logprob": -0.09666124396367905, "compression_ratio": + 
1.78515625, "no_speech_prob": 0.00889927800744772}, {"id": 45, "seek": 38848, "start": + 410.16, "end": 415.68, "text": " source or whatever you can make adjustments but + ultimately we then execute rag we have our own", "tokens": [51448, 4009, 420, 2035, + 291, 393, 652, 18624, 457, 6284, 321, 550, 14483, 17539, 321, 362, 527, 1065, 51724], + "temperature": 0.0, "avg_logprob": -0.09666124396367905, "compression_ratio": 1.78515625, + "no_speech_prob": 0.00889927800744772}, {"id": 46, "seek": 41568, "start": 416.40000000000003, + "end": 422.16, "text": " excellent high quality rag better than many in particular + it seeks highly relevant", "tokens": [50400, 7103, 1090, 3125, 17539, 1101, 813, + 867, 294, 1729, 309, 28840, 5405, 7340, 50688], "temperature": 0.0, "avg_logprob": + -0.12839764887743657, "compression_ratio": 1.6867924528301887, "no_speech_prob": + 0.0017235697014257312}, {"id": 47, "seek": 41568, "start": 422.16, "end": 426.64, + "text": " passages from all of the documents we can fetch the documents and authenticate + on the fly", "tokens": [50688, 31589, 490, 439, 295, 264, 8512, 321, 393, 23673, + 264, 8512, 293, 9214, 8700, 322, 264, 3603, 50912], "temperature": 0.0, "avg_logprob": + -0.12839764887743657, "compression_ratio": 1.6867924528301887, "no_speech_prob": + 0.0017235697014257312}, {"id": 48, "seek": 41568, "start": 426.64, "end": 431.28000000000003, + "text": " as to do that um bind those to a prompt we have our own prompt engineering + you can", "tokens": [50912, 382, 281, 360, 300, 1105, 14786, 729, 281, 257, 12391, + 321, 362, 527, 1065, 12391, 7043, 291, 393, 51144], "temperature": 0.0, "avg_logprob": + -0.12839764887743657, "compression_ratio": 1.6867924528301887, "no_speech_prob": + 0.0017235697014257312}, {"id": 49, "seek": 41568, "start": 431.28000000000003, "end": + 437.36, "text": " uh override it and then do the rag against a huge list of AI providers + actually we support more than", "tokens": [51144, 2232, 42321, 309, 293, 550, 
360, + 264, 17539, 1970, 257, 2603, 1329, 295, 7318, 11330, 767, 321, 1406, 544, 813, 51448], + "temperature": 0.0, "avg_logprob": -0.12839764887743657, "compression_ratio": 1.6867924528301887, + "no_speech_prob": 0.0017235697014257312}, {"id": 50, "seek": 41568, "start": 437.36, + "end": 442.72, "text": " 20 today including most of the ones we see out there open + AI open AI and azure bedrock and", "tokens": [51448, 945, 965, 3009, 881, 295, 264, + 2306, 321, 536, 484, 456, 1269, 7318, 1269, 7318, 293, 7883, 540, 2901, 17799, 293, + 51716], "temperature": 0.0, "avg_logprob": -0.12839764887743657, "compression_ratio": + 1.6867924528301887, "no_speech_prob": 0.0017235697014257312}, {"id": 51, "seek": + 44272, "start": 442.72, "end": 451.04, "text": " prop at google mistral uh co here + etc and in all cases no code should be required you configure", "tokens": [50364, + 2365, 412, 20742, 3544, 2155, 2232, 598, 510, 5183, 293, 294, 439, 3331, 572, 3089, + 820, 312, 4739, 291, 22162, 50780], "temperature": 0.0, "avg_logprob": -0.15557110526344992, + "compression_ratio": 1.8432835820895523, "no_speech_prob": 0.001988664735108614}, + {"id": 52, "seek": 44272, "start": 451.04, "end": 455.68, "text": " an existing + connector more than likely you''re putting in just endpoint information and authentication", + "tokens": [50780, 364, 6741, 19127, 544, 813, 3700, 291, 434, 3372, 294, 445, 35795, + 1589, 293, 26643, 51012], "temperature": 0.0, "avg_logprob": -0.15557110526344992, + "compression_ratio": 1.8432835820895523, "no_speech_prob": 0.001988664735108614}, + {"id": 53, "seek": 44272, "start": 455.68, "end": 461.28000000000003, "text": " + tokens and then swirl again does that broker and creates that pane of glass and + execute rag you can", "tokens": [51012, 22667, 293, 550, 30310, 797, 775, 300, 26502, + 293, 7829, 300, 32605, 295, 4276, 293, 14483, 17539, 291, 393, 51292], "temperature": + 0.0, "avg_logprob": -0.15557110526344992, "compression_ratio": 
1.8432835820895523, + "no_speech_prob": 0.001988664735108614}, {"id": 54, "seek": 44272, "start": 461.28000000000003, + "end": 466.88000000000005, "text": " also use swirl just for the R if you have your + own rag right you can get the result list and do", "tokens": [51292, 611, 764, 30310, + 445, 337, 264, 497, 498, 291, 362, 428, 1065, 17539, 558, 291, 393, 483, 264, 1874, + 1329, 293, 360, 51572], "temperature": 0.0, "avg_logprob": -0.15557110526344992, + "compression_ratio": 1.8432835820895523, "no_speech_prob": 0.001988664735108614}, + {"id": 55, "seek": 44272, "start": 466.88000000000005, "end": 472.0, "text": " your + fetching or you can hook after you''ve got the swirl has the fetched results and + you can operate", "tokens": [51572, 428, 23673, 278, 420, 291, 393, 6328, 934, 291, + 600, 658, 264, 30310, 575, 264, 23673, 292, 3542, 293, 291, 393, 9651, 51828], "temperature": + 0.0, "avg_logprob": -0.15557110526344992, "compression_ratio": 1.8432835820895523, + "no_speech_prob": 0.001988664735108614}, {"id": 56, "seek": 47200, "start": 472.0, + "end": 478.64, "text": " that on just the full documents the key to this i love + that you asked is the reader lllm we have been", "tokens": [50364, 300, 322, 445, + 264, 1577, 8512, 264, 2141, 281, 341, 741, 959, 300, 291, 2351, 307, 264, 15149, + 287, 285, 76, 321, 362, 668, 50696], "temperature": 0.0, "avg_logprob": -0.1298468278186156, + "compression_ratio": 1.7731481481481481, "no_speech_prob": 0.0022196744102984667}, + {"id": 57, "seek": 47200, "start": 478.64, "end": 484.72, "text": " really heads + down working on the reader llm um i''ve actually been asking people if they have + heard", "tokens": [50696, 534, 8050, 760, 1364, 322, 264, 15149, 287, 75, 76, 1105, + 741, 600, 767, 668, 3365, 561, 498, 436, 362, 2198, 51000], "temperature": 0.0, + "avg_logprob": -0.1298468278186156, "compression_ratio": 1.7731481481481481, "no_speech_prob": + 0.0022196744102984667}, {"id": 58, "seek": 47200, "start": 484.72, 
"end": 490.72, + "text": " the term before and many haven''t uh i don''t know what what your take + is on on reader llm these days", "tokens": [51000, 264, 1433, 949, 293, 867, 2378, + 380, 2232, 741, 500, 380, 458, 437, 437, 428, 747, 307, 322, 322, 15149, 287, 75, + 76, 613, 1708, 51300], "temperature": 0.0, "avg_logprob": -0.1298468278186156, "compression_ratio": + 1.7731481481481481, "no_speech_prob": 0.0022196744102984667}, {"id": 59, "seek": + 47200, "start": 492.0, "end": 496.96, "text": " oh yeah i''m still catching up really + i mean the way i see it and i''m still kind of", "tokens": [51364, 1954, 1338, 741, + 478, 920, 16124, 493, 534, 741, 914, 264, 636, 741, 536, 309, 293, 741, 478, 920, + 733, 295, 51612], "temperature": 0.0, "avg_logprob": -0.1298468278186156, "compression_ratio": + 1.7731481481481481, "no_speech_prob": 0.0022196744102984667}, {"id": 60, "seek": + 49696, "start": 497.03999999999996, "end": 503.68, "text": " plowing through rag + itself right so you you said what is my take on on how easy it is to", "tokens": + [50368, 499, 9637, 807, 17539, 2564, 558, 370, 291, 291, 848, 437, 307, 452, 747, + 322, 322, 577, 1858, 309, 307, 281, 50700], "temperature": 0.0, "avg_logprob": -0.15840950277116564, + "compression_ratio": 1.6057142857142856, "no_speech_prob": 0.003570165252313018}, + {"id": 61, "seek": 49696, "start": 503.68, "end": 511.59999999999997, "text": " + on board to the say i models and so on i i have a sense that people are aware of + this because", "tokens": [50700, 322, 3150, 281, 264, 584, 741, 5245, 293, 370, + 322, 741, 741, 362, 257, 2020, 300, 561, 366, 3650, 295, 341, 570, 51096], "temperature": + 0.0, "avg_logprob": -0.15840950277116564, "compression_ratio": 1.6057142857142856, + "no_speech_prob": 0.003570165252313018}, {"id": 62, "seek": 49696, "start": 511.59999999999997, + "end": 519.1999999999999, "text": " it''s so easy to access through chat chat gpt + and similar tools but then when it comes to deploying", 
"tokens": [51096, 309, 311, + 370, 1858, 281, 2105, 807, 5081, 5081, 290, 662, 293, 2531, 3873, 457, 550, 562, + 309, 1487, 281, 34198, 51476], "temperature": 0.0, "avg_logprob": -0.15840950277116564, + "compression_ratio": 1.6057142857142856, "no_speech_prob": 0.003570165252313018}, + {"id": 63, "seek": 51920, "start": 519.2800000000001, "end": 527.44, "text": " these + things i don''t think that it''s as easy right so because you you have to go through + a list of", "tokens": [50368, 613, 721, 741, 500, 380, 519, 300, 309, 311, 382, + 1858, 558, 370, 570, 291, 291, 362, 281, 352, 807, 257, 1329, 295, 50776], "temperature": + 0.0, "avg_logprob": -0.13676402443333677, "compression_ratio": 1.7798165137614679, + "no_speech_prob": 0.007207722403109074}, {"id": 64, "seek": 51920, "start": 527.44, + "end": 532.32, "text": " models you need to figure out which one to pick and and + and and hence you need to be a data scientist", "tokens": [50776, 5245, 291, 643, + 281, 2573, 484, 597, 472, 281, 1888, 293, 293, 293, 293, 16678, 291, 643, 281, 312, + 257, 1412, 12662, 51020], "temperature": 0.0, "avg_logprob": -0.13676402443333677, + "compression_ratio": 1.7798165137614679, "no_speech_prob": 0.007207722403109074}, + {"id": 65, "seek": 51920, "start": 532.32, "end": 539.0400000000001, "text": " right + at that point or ml practitioner or whatever um and it''s not and it''s like the + web is", "tokens": [51020, 558, 412, 300, 935, 420, 23271, 32125, 420, 2035, 1105, + 293, 309, 311, 406, 293, 309, 311, 411, 264, 3670, 307, 51356], "temperature": 0.0, + "avg_logprob": -0.13676402443333677, "compression_ratio": 1.7798165137614679, "no_speech_prob": + 0.007207722403109074}, {"id": 66, "seek": 51920, "start": 539.6800000000001, "end": + 548.08, "text": " exploding with so many cheap advice you know use these use that + but then as you go through that", "tokens": [51388, 35175, 365, 370, 867, 7084, + 5192, 291, 458, 764, 613, 764, 300, 457, 550, 382, 291, 352, 807, 300, 51808], 
"temperature": + 0.0, "avg_logprob": -0.13676402443333677, "compression_ratio": 1.7798165137614679, + "no_speech_prob": 0.007207722403109074}, {"id": 67, "seek": 54808, "start": 548.08, + "end": 553.36, "text": " process you realize that none of those models work and + so you need to do something okay the", "tokens": [50364, 1399, 291, 4325, 300, 6022, + 295, 729, 5245, 589, 293, 370, 291, 643, 281, 360, 746, 1392, 264, 50628], "temperature": + 0.0, "avg_logprob": -0.15698103471235794, "compression_ratio": 1.7455357142857142, + "no_speech_prob": 0.0012613519793376327}, {"id": 68, "seek": 54808, "start": 553.36, + "end": 559.76, "text": " risk rag but setting up rag means that you need to bring + in an effective database that you haven''t", "tokens": [50628, 3148, 17539, 457, + 3287, 493, 17539, 1355, 300, 291, 643, 281, 1565, 294, 364, 4942, 8149, 300, 291, + 2378, 380, 50948], "temperature": 0.0, "avg_logprob": -0.15698103471235794, "compression_ratio": + 1.7455357142857142, "no_speech_prob": 0.0012613519793376327}, {"id": 69, "seek": + 54808, "start": 559.76, "end": 568.64, "text": " seen before and things like this + right so it''s yeah so i love that so just speaking of misinformation", "tokens": + [50948, 1612, 949, 293, 721, 411, 341, 558, 370, 309, 311, 1338, 370, 741, 959, + 300, 370, 445, 4124, 295, 34238, 51392], "temperature": 0.0, "avg_logprob": -0.15698103471235794, + "compression_ratio": 1.7455357142857142, "no_speech_prob": 0.0012613519793376327}, + {"id": 70, "seek": 54808, "start": 568.64, "end": 574.4000000000001, "text": " right + i think you''re absolutely right there''s so much um confusing stuff out there you + do not need", "tokens": [51392, 558, 741, 519, 291, 434, 3122, 558, 456, 311, 370, + 709, 1105, 13181, 1507, 484, 456, 291, 360, 406, 643, 51680], "temperature": 0.0, + "avg_logprob": -0.15698103471235794, "compression_ratio": 1.7455357142857142, "no_speech_prob": + 0.0012613519793376327}, {"id": 71, "seek": 57440, "start": 574.4, 
"end": 579.52, + "text": " a vector database to rag you never did it''s it''s a it''s a vendor thing + that i totally understand", "tokens": [50364, 257, 8062, 8149, 281, 17539, 291, + 1128, 630, 309, 311, 309, 311, 257, 309, 311, 257, 24321, 551, 300, 741, 3879, 1223, + 50620], "temperature": 0.0, "avg_logprob": -0.15462010527310305, "compression_ratio": + 1.8513931888544892, "no_speech_prob": 0.003457391634583473}, {"id": 72, "seek": + 57440, "start": 579.52, "end": 583.6, "text": " they''re charging per gigabyte or + whatever so they say you have to have it to rag uh there''s an excellent", "tokens": + [50620, 436, 434, 11379, 680, 8741, 34529, 420, 2035, 370, 436, 584, 291, 362, 281, + 362, 309, 281, 17539, 2232, 456, 311, 364, 7103, 50824], "temperature": 0.0, "avg_logprob": + -0.15462010527310305, "compression_ratio": 1.8513931888544892, "no_speech_prob": + 0.003457391634583473}, {"id": 73, "seek": 57440, "start": 583.6, "end": 588.48, + "text": " study by zet hub and actually simpson garf ankles and advisor to swore + all you may have heard that name", "tokens": [50824, 2979, 538, 710, 302, 11838, + 293, 767, 1034, 16962, 3691, 69, 40962, 293, 19161, 281, 1693, 418, 439, 291, 815, + 362, 2198, 300, 1315, 51068], "temperature": 0.0, "avg_logprob": -0.15462010527310305, + "compression_ratio": 1.8513931888544892, "no_speech_prob": 0.003457391634583473}, + {"id": 74, "seek": 57440, "start": 588.48, "end": 593.36, "text": " incredible tech + writer um he recently wrote a study a survey or a summary i should say of the zet", + "tokens": [51068, 4651, 7553, 9936, 1105, 415, 3938, 4114, 257, 2979, 257, 8984, + 420, 257, 12691, 741, 820, 584, 295, 264, 710, 302, 51312], "temperature": 0.0, + "avg_logprob": -0.15462010527310305, "compression_ratio": 1.8513931888544892, "no_speech_prob": + 0.003457391634583473}, {"id": 75, "seek": 57440, "start": 593.36, "end": 598.72, + "text": " hub study the zet hub study shows that you do not need to vectorize your + data to get 
high quality", "tokens": [51312, 11838, 2979, 264, 710, 302, 11838, + 2979, 3110, 300, 291, 360, 406, 643, 281, 8062, 1125, 428, 1412, 281, 483, 1090, + 3125, 51580], "temperature": 0.0, "avg_logprob": -0.15462010527310305, "compression_ratio": + 1.8513931888544892, "no_speech_prob": 0.003457391634583473}, {"id": 76, "seek": + 57440, "start": 598.72, "end": 604.16, "text": " results instead you just increase + the number of results you get from a so-called naive nonvector", "tokens": [51580, + 3542, 2602, 291, 445, 3488, 264, 1230, 295, 3542, 291, 483, 490, 257, 370, 12, 11880, + 29052, 2107, 303, 1672, 51852], "temperature": 0.0, "avg_logprob": -0.15462010527310305, + "compression_ratio": 1.8513931888544892, "no_speech_prob": 0.003457391634583473}, + {"id": 77, "seek": 60416, "start": 604.16, "end": 610.0, "text": " search engine + or database and re-rank using vectors that''s exactly what swore all this we vectorize", + "tokens": [50364, 3164, 2848, 420, 8149, 293, 319, 12, 20479, 1228, 18875, 300, + 311, 2293, 437, 1693, 418, 439, 341, 321, 8062, 1125, 50656], "temperature": 0.0, + "avg_logprob": -0.15329949468628973, "compression_ratio": 1.8996138996138996, "no_speech_prob": + 0.0006137712625786662}, {"id": 78, "seek": 60416, "start": 610.0, "end": 616.56, + "text": " the result set snippets we vectorize the full text of the documents we + vectorize the query the prompt", "tokens": [50656, 264, 1874, 992, 35623, 1385, + 321, 8062, 1125, 264, 1577, 2487, 295, 264, 8512, 321, 8062, 1125, 264, 14581, 264, + 12391, 50984], "temperature": 0.0, "avg_logprob": -0.15329949468628973, "compression_ratio": + 1.8996138996138996, "no_speech_prob": 0.0006137712625786662}, {"id": 79, "seek": + 60416, "start": 616.56, "end": 621.68, "text": " whatever it is right and our reader + llem is responsible for a complex similarity re-ranking", "tokens": [50984, 2035, + 309, 307, 558, 293, 527, 15149, 287, 10386, 307, 6250, 337, 257, 3997, 32194, 319, + 12, 20479, 278, 51240], 
"temperature": 0.0, "avg_logprob": -0.15329949468628973, + "compression_ratio": 1.8996138996138996, "no_speech_prob": 0.0006137712625786662}, + {"id": 80, "seek": 60416, "start": 622.4, "end": 627.1999999999999, "text": " you + can actually plug the a your choice of embeddings into our reader llem embeddings + are actually", "tokens": [51276, 291, 393, 767, 5452, 264, 257, 428, 3922, 295, + 12240, 29432, 666, 527, 15149, 287, 10386, 12240, 29432, 366, 767, 51516], "temperature": + 0.0, "avg_logprob": -0.15329949468628973, "compression_ratio": 1.8996138996138996, + "no_speech_prob": 0.0006137712625786662}, {"id": 81, "seek": 60416, "start": 627.1999999999999, + "end": 632.88, "text": " just a feature one one of the many things that llem''s + do so you can change that but the reader llem", "tokens": [51516, 445, 257, 4111, + 472, 472, 295, 264, 867, 721, 300, 287, 10386, 311, 360, 370, 291, 393, 1319, 300, + 457, 264, 15149, 287, 10386, 51800], "temperature": 0.0, "avg_logprob": -0.15329949468628973, + "compression_ratio": 1.8996138996138996, "no_speech_prob": 0.0006137712625786662}, + {"id": 82, "seek": 63288, "start": 633.6, "end": 639.28, "text": " here''s really + the core of it it''s the middle layers of the generative AI llem without the", "tokens": + [50400, 510, 311, 534, 264, 4965, 295, 309, 309, 311, 264, 2808, 7914, 295, 264, + 1337, 1166, 7318, 287, 10386, 1553, 264, 50684], "temperature": 0.0, "avg_logprob": + -0.11676945405847886, "compression_ratio": 1.7004405286343611, "no_speech_prob": + 0.002972096437588334}, {"id": 83, "seek": 63288, "start": 640.0, "end": 646.32, + "text": " you know um text generation and text interpretation part that''s not there + at all instead you use it", "tokens": [50720, 291, 458, 1105, 2487, 5125, 293, 2487, + 14174, 644, 300, 311, 406, 456, 412, 439, 2602, 291, 764, 309, 51036], "temperature": + 0.0, "avg_logprob": -0.11676945405847886, "compression_ratio": 1.7004405286343611, + "no_speech_prob": 0.002972096437588334}, 
{"id": 84, "seek": 63288, "start": 646.32, + "end": 651.76, "text": " to determine the similarity right cosine or they''re many + different algorithms but ultimately you''re", "tokens": [51036, 281, 6997, 264, + 32194, 558, 23565, 420, 436, 434, 867, 819, 14642, 457, 6284, 291, 434, 51308], + "temperature": 0.0, "avg_logprob": -0.11676945405847886, "compression_ratio": 1.7004405286343611, + "no_speech_prob": 0.002972096437588334}, {"id": 85, "seek": 63288, "start": 651.76, + "end": 656.64, "text": " taking some algorithm like that and you''re using embeddings + plus the reader llem''s own knowledge", "tokens": [51308, 1940, 512, 9284, 411, + 300, 293, 291, 434, 1228, 12240, 29432, 1804, 264, 15149, 287, 10386, 311, 1065, + 3601, 51552], "temperature": 0.0, "avg_logprob": -0.11676945405847886, "compression_ratio": + 1.7004405286343611, "no_speech_prob": 0.002972096437588334}, {"id": 86, "seek": + 65664, "start": 657.28, "end": 663.6, "text": " to say how similar is the query + or prompt or part of it to the response that i got or find the", "tokens": [50396, + 281, 584, 577, 2531, 307, 264, 14581, 420, 12391, 420, 644, 295, 309, 281, 264, + 4134, 300, 741, 658, 420, 915, 264, 50712], "temperature": 0.0, "avg_logprob": -0.13489681285816235, + "compression_ratio": 1.673728813559322, "no_speech_prob": 0.003514697542414069}, + {"id": 87, "seek": 65664, "start": 663.6, "end": 668.88, "text": " most relevant + passage in a document because you''re absolutely right there are tools like langshane", + "tokens": [50712, 881, 7340, 11497, 294, 257, 4166, 570, 291, 434, 3122, 558, 456, + 366, 3873, 411, 2265, 2716, 1929, 50976], "temperature": 0.0, "avg_logprob": -0.13489681285816235, + "compression_ratio": 1.673728813559322, "no_speech_prob": 0.003514697542414069}, + {"id": 88, "seek": 65664, "start": 668.88, "end": 673.76, "text": " out there as + in one example which give you lots of interesting tooling right but it''s still + on you", "tokens": [50976, 484, 456, 382, 294, 
472, 1365, 597, 976, 291, 3195, 295, + 1880, 46593, 558, 457, 309, 311, 920, 322, 291, 51220], "temperature": 0.0, "avg_logprob": + -0.13489681285816235, "compression_ratio": 1.673728813559322, "no_speech_prob": + 0.003514697542414069}, {"id": 89, "seek": 65664, "start": 673.76, "end": 681.6, + "text": " the developer i actually had chat tpt generate me a pipeline just as a + demo and the biggest problem is", "tokens": [51220, 264, 10754, 741, 767, 632, 5081, + 256, 662, 8460, 385, 257, 15517, 445, 382, 257, 10723, 293, 264, 3880, 1154, 307, + 51612], "temperature": 0.0, "avg_logprob": -0.13489681285816235, "compression_ratio": + 1.673728813559322, "no_speech_prob": 0.003514697542414069}, {"id": 90, "seek": 68160, + "start": 681.6, "end": 687.36, "text": " it generated a function that i have to + fill in which is called select documents that''s really hard", "tokens": [50364, + 309, 10833, 257, 2445, 300, 741, 362, 281, 2836, 294, 597, 307, 1219, 3048, 8512, + 300, 311, 534, 1152, 50652], "temperature": 0.0, "avg_logprob": -0.12007498467105558, + "compression_ratio": 1.7004405286343611, "no_speech_prob": 0.00027063299785368145}, + {"id": 91, "seek": 68160, "start": 687.9200000000001, "end": 692.72, "text": " and + ultimately like you''re basically just providing the pipeline to move the data once + again", "tokens": [50680, 293, 6284, 411, 291, 434, 1936, 445, 6530, 264, 15517, + 281, 1286, 264, 1412, 1564, 797, 50920], "temperature": 0.0, "avg_logprob": -0.12007498467105558, + "compression_ratio": 1.7004405286343611, "no_speech_prob": 0.00027063299785368145}, + {"id": 92, "seek": 68160, "start": 693.36, "end": 700.32, "text": " but it''s the + reader llem in swirl is all about re-ranking and finding the best passages so that + you", "tokens": [50952, 457, 309, 311, 264, 15149, 287, 10386, 294, 30310, 307, + 439, 466, 319, 12, 20479, 278, 293, 5006, 264, 1151, 31589, 370, 300, 291, 51300], + "temperature": 0.0, "avg_logprob": -0.12007498467105558, 
"compression_ratio": 1.7004405286343611, + "no_speech_prob": 0.00027063299785368145}, {"id": 93, "seek": 68160, "start": 700.32, + "end": 705.9200000000001, "text": " are not sending a hundred pdf of which one paragraph + is relevant you are sending the paragraph", "tokens": [51300, 366, 406, 7750, 257, + 3262, 280, 45953, 295, 597, 472, 18865, 307, 7340, 291, 366, 7750, 264, 18865, 51580], + "temperature": 0.0, "avg_logprob": -0.12007498467105558, "compression_ratio": 1.7004405286343611, + "no_speech_prob": 0.00027063299785368145}, {"id": 94, "seek": 70592, "start": 706.56, + "end": 711.92, "text": " that way you can put a lot more data and you can also not + blow out your token limits right assuming", "tokens": [50396, 300, 636, 291, 393, + 829, 257, 688, 544, 1412, 293, 291, 393, 611, 406, 6327, 484, 428, 14862, 10406, + 558, 11926, 50664], "temperature": 0.0, "avg_logprob": -0.14703163023917906, "compression_ratio": + 1.7720588235294117, "no_speech_prob": 0.0033860597759485245}, {"id": 95, "seek": + 70592, "start": 711.92, "end": 716.56, "text": " you have such a thing if you''re + on prem but that''s what that''s the reader lm i''ll say this", "tokens": [50664, + 291, 362, 1270, 257, 551, 498, 291, 434, 322, 5624, 457, 300, 311, 437, 300, 311, + 264, 15149, 287, 76, 741, 603, 584, 341, 50896], "temperature": 0.0, "avg_logprob": + -0.14703163023917906, "compression_ratio": 1.7720588235294117, "no_speech_prob": + 0.0033860597759485245}, {"id": 96, "seek": 70592, "start": 716.56, "end": 723.52, + "text": " their reader lm are the unsung heroes of especially search but also a + rag when you''re looking at", "tokens": [50896, 641, 15149, 287, 76, 366, 264, 2693, + 1063, 12332, 295, 2318, 3164, 457, 611, 257, 17539, 562, 291, 434, 1237, 412, 51244], + "temperature": 0.0, "avg_logprob": -0.14703163023917906, "compression_ratio": 1.7720588235294117, + "no_speech_prob": 0.0033860597759485245}, {"id": 97, "seek": 70592, "start": 723.52, + "end": 728.8, "text": " i would 
say bing or or chat gpt and you ask it a question + and it goes and fetches documents from", "tokens": [51244, 741, 576, 584, 272, 278, + 420, 420, 5081, 290, 662, 293, 291, 1029, 309, 257, 1168, 293, 309, 1709, 293, 15136, + 3781, 8512, 490, 51508], "temperature": 0.0, "avg_logprob": -0.14703163023917906, + "compression_ratio": 1.7720588235294117, "no_speech_prob": 0.0033860597759485245}, + {"id": 98, "seek": 70592, "start": 728.8, "end": 734.9599999999999, "text": " the + web it''s almost certainly using a reader llm to determine which pages are best + and to be fair", "tokens": [51508, 264, 3670, 309, 311, 1920, 3297, 1228, 257, 15149, + 287, 75, 76, 281, 6997, 597, 7183, 366, 1151, 293, 281, 312, 3143, 51816], "temperature": + 0.0, "avg_logprob": -0.14703163023917906, "compression_ratio": 1.7720588235294117, + "no_speech_prob": 0.0033860597759485245}, {"id": 99, "seek": 73496, "start": 735.36, + "end": 739.6, "text": " being in google have incredible knowledge of that already + so it''s not like it''s that hard but then", "tokens": [50384, 885, 294, 20742, + 362, 4651, 3601, 295, 300, 1217, 370, 309, 311, 406, 411, 309, 311, 300, 1152, 457, + 550, 50596], "temperature": 0.0, "avg_logprob": -0.11065021861683239, "compression_ratio": + 1.8, "no_speech_prob": 0.00176139734685421}, {"id": 100, "seek": 73496, "start": + 739.6, "end": 743.76, "text": " they''re almost certainly reading the most relevant + passages right they''re not just passing the whole web", "tokens": [50596, 436, + 434, 1920, 3297, 3760, 264, 881, 7340, 31589, 558, 436, 434, 406, 445, 8437, 264, + 1379, 3670, 50804], "temperature": 0.0, "avg_logprob": -0.11065021861683239, "compression_ratio": + 1.8, "no_speech_prob": 0.00176139734685421}, {"id": 101, "seek": 73496, "start": + 743.76, "end": 750.72, "text": " page in so reader lm''s are a thing they''re definitely + becoming more and more prevalent and they", "tokens": [50804, 3028, 294, 370, 15149, + 287, 76, 311, 366, 257, 551, 436, 434, 2138, 
5617, 544, 293, 544, 30652, 293, 436, + 51152], "temperature": 0.0, "avg_logprob": -0.11065021861683239, "compression_ratio": + 1.8, "no_speech_prob": 0.00176139734685421}, {"id": 102, "seek": 73496, "start": + 750.72, "end": 755.76, "text": " provide a critical non hallucinating step to help + find the best results so the user doesn''t have to", "tokens": [51152, 2893, 257, + 4924, 2107, 35212, 8205, 1823, 281, 854, 915, 264, 1151, 3542, 370, 264, 4195, 1177, + 380, 362, 281, 51404], "temperature": 0.0, "avg_logprob": -0.11065021861683239, + "compression_ratio": 1.8, "no_speech_prob": 0.00176139734685421}, {"id": 103, "seek": + 73496, "start": 756.64, "end": 764.88, "text": " and that''s very interesting and + and how let''s say if you plug into a companies network right so", "tokens": [51448, + 293, 300, 311, 588, 1880, 293, 293, 577, 718, 311, 584, 498, 291, 5452, 666, 257, + 3431, 3209, 558, 370, 51860], "temperature": 0.0, "avg_logprob": -0.11065021861683239, + "compression_ratio": 1.8, "no_speech_prob": 0.00176139734685421}, {"id": 104, "seek": + 76488, "start": 764.88, "end": 770.64, "text": " and they focus on something i don''t + know healthcare banking what have you would you need to fine tune", "tokens": [50364, + 293, 436, 1879, 322, 746, 741, 500, 380, 458, 8884, 18261, 437, 362, 291, 576, 291, + 643, 281, 2489, 10864, 50652], "temperature": 0.0, "avg_logprob": -0.09680914324383404, + "compression_ratio": 1.6936170212765957, "no_speech_prob": 0.00058299012016505}, + {"id": 105, "seek": 76488, "start": 770.64, "end": 778.24, "text": " reader lm in + any way no i actually don''t recommend it i think there''s a lot of evidence that + fine", "tokens": [50652, 15149, 287, 76, 294, 604, 636, 572, 741, 767, 500, 380, + 2748, 309, 741, 519, 456, 311, 257, 688, 295, 4467, 300, 2489, 51032], "temperature": + 0.0, "avg_logprob": -0.09680914324383404, "compression_ratio": 1.6936170212765957, + "no_speech_prob": 0.00058299012016505}, {"id": 106, "seek": 76488, 
"start": 778.24, + "end": 783.12, "text": " tuning because of its fundamentally lossy process right + is somewhat responsible for hallucinations", "tokens": [51032, 15164, 570, 295, + 1080, 17879, 4470, 88, 1399, 558, 307, 8344, 6250, 337, 35212, 10325, 51276], "temperature": + 0.0, "avg_logprob": -0.09680914324383404, "compression_ratio": 1.6936170212765957, + "no_speech_prob": 0.00058299012016505}, {"id": 107, "seek": 76488, "start": 783.12, + "end": 788.0, "text": " there''s been quite a bit written about this and i think + that ultimately the the winning combination", "tokens": [51276, 456, 311, 668, 1596, + 257, 857, 3720, 466, 341, 293, 741, 519, 300, 6284, 264, 264, 8224, 6562, 51520], + "temperature": 0.0, "avg_logprob": -0.09680914324383404, "compression_ratio": 1.6936170212765957, + "no_speech_prob": 0.00058299012016505}, {"id": 108, "seek": 78800, "start": 788.08, + "end": 795.52, "text": " today is that you use a very well trained capable model + that is generalist and you provide it with", "tokens": [50368, 965, 307, 300, 291, + 764, 257, 588, 731, 8895, 8189, 2316, 300, 307, 2674, 468, 293, 291, 2893, 309, + 365, 50740], "temperature": 0.0, "avg_logprob": -0.060065312044961114, "compression_ratio": + 1.9203187250996017, "no_speech_prob": 0.004245159216225147}, {"id": 109, "seek": + 78800, "start": 795.52, "end": 800.24, "text": " the data that you need to provide + it with at the moment you need to for example swirls prompt", "tokens": [50740, + 264, 1412, 300, 291, 643, 281, 2893, 309, 365, 412, 264, 1623, 291, 643, 281, 337, + 1365, 30310, 82, 12391, 50976], "temperature": 0.0, "avg_logprob": -0.060065312044961114, + "compression_ratio": 1.9203187250996017, "no_speech_prob": 0.004245159216225147}, + {"id": 110, "seek": 78800, "start": 800.24, "end": 805.2, "text": " engineering + does a few things one we force it to only consider the rag data and not add its + own", "tokens": [50976, 7043, 775, 257, 1326, 721, 472, 321, 3464, 309, 281, 787, + 1949, 
264, 17539, 1412, 293, 406, 909, 1080, 1065, 51224], "temperature": 0.0, "avg_logprob": + -0.060065312044961114, "compression_ratio": 1.9203187250996017, "no_speech_prob": + 0.004245159216225147}, {"id": 111, "seek": 78800, "start": 805.2, "end": 809.84, + "text": " model thoughts right you can interpret but don''t say don''t create facts + that aren''t presented to you", "tokens": [51224, 2316, 4598, 558, 291, 393, 7302, + 457, 500, 380, 584, 500, 380, 1884, 9130, 300, 3212, 380, 8212, 281, 291, 51456], + "temperature": 0.0, "avg_logprob": -0.060065312044961114, "compression_ratio": 1.9203187250996017, + "no_speech_prob": 0.004245159216225147}, {"id": 112, "seek": 78800, "start": 810.64, + "end": 816.0, "text": " second force it to disambiguate this is one of the worst + errors in prompt engineering is not", "tokens": [51496, 1150, 3464, 309, 281, 717, + 2173, 328, 10107, 341, 307, 472, 295, 264, 5855, 13603, 294, 12391, 7043, 307, 406, + 51764], "temperature": 0.0, "avg_logprob": -0.060065312044961114, "compression_ratio": + 1.9203187250996017, "no_speech_prob": 0.004245159216225147}, {"id": 113, "seek": + 81600, "start": 816.0, "end": 820.64, "text": " is just letting it go right up on + past equating right two entities with the same name as if they''re", "tokens": [50364, + 307, 445, 8295, 309, 352, 558, 493, 322, 1791, 1267, 990, 558, 732, 16667, 365, + 264, 912, 1315, 382, 498, 436, 434, 50596], "temperature": 0.0, "avg_logprob": -0.12356334924697876, + "compression_ratio": 1.8438661710037174, "no_speech_prob": 0.0021778661757707596}, + {"id": 114, "seek": 81600, "start": 820.64, "end": 826.08, "text": " the same thing + so our default engineering says listen if you see and into two entities with the + same", "tokens": [50596, 264, 912, 551, 370, 527, 7576, 7043, 1619, 2140, 498, 291, + 536, 293, 666, 732, 16667, 365, 264, 912, 50868], "temperature": 0.0, "avg_logprob": + -0.12356334924697876, "compression_ratio": 1.8438661710037174, "no_speech_prob": + 
0.0021778661757707596}, {"id": 115, "seek": 81600, "start": 826.08, "end": 831.84, + "text": " name don''t you know essentially call that out and don''t just gloss over + it the last one is especially", "tokens": [50868, 1315, 500, 380, 291, 458, 4476, + 818, 300, 484, 293, 500, 380, 445, 19574, 670, 309, 264, 1036, 472, 307, 2318, 51156], + "temperature": 0.0, "avg_logprob": -0.12356334924697876, "compression_ratio": 1.8438661710037174, + "no_speech_prob": 0.0021778661757707596}, {"id": 116, "seek": 81600, "start": 831.84, + "end": 836.56, "text": " when you''re talking about multiple sources of data and + enterprise data the user must be able to", "tokens": [51156, 562, 291, 434, 1417, + 466, 3866, 7139, 295, 1412, 293, 14132, 1412, 264, 4195, 1633, 312, 1075, 281, 51392], + "temperature": 0.0, "avg_logprob": -0.12356334924697876, "compression_ratio": 1.8438661710037174, + "no_speech_prob": 0.0021778661757707596}, {"id": 117, "seek": 81600, "start": 836.56, + "end": 841.28, "text": " verify or nobody wants to make a career limiting move because + they took chat gpt''s and answer and", "tokens": [51392, 16888, 420, 5079, 2738, + 281, 652, 257, 3988, 22083, 1286, 570, 436, 1890, 5081, 290, 662, 311, 293, 1867, + 293, 51628], "temperature": 0.0, "avg_logprob": -0.12356334924697876, "compression_ratio": + 1.8438661710037174, "no_speech_prob": 0.0021778661757707596}, {"id": 118, "seek": + 84128, "start": 841.28, "end": 846.24, "text": " said here it is here it is right + put it up on the on the investor site not a good idea but", "tokens": [50364, 848, + 510, 309, 307, 510, 309, 307, 558, 829, 309, 493, 322, 264, 322, 264, 18479, 3621, + 406, 257, 665, 1558, 457, 50612], "temperature": 0.0, "avg_logprob": -0.11461341164328835, + "compression_ratio": 1.7735849056603774, "no_speech_prob": 0.00804323423653841}, + {"id": 119, "seek": 84128, "start": 846.8, "end": 852.48, "text": " swirl also forces + the AI to quote the sources that you use to cite them and of course you 
also have", + "tokens": [50640, 30310, 611, 5874, 264, 7318, 281, 6513, 264, 7139, 300, 291, 764, + 281, 37771, 552, 293, 295, 1164, 291, 611, 362, 50924], "temperature": 0.0, "avg_logprob": + -0.11461341164328835, "compression_ratio": 1.7735849056603774, "no_speech_prob": + 0.00804323423653841}, {"id": 120, "seek": 84128, "start": 852.48, "end": 856.4, + "text": " access to the underlying search results right so you can verify that yes + you have a million", "tokens": [50924, 2105, 281, 264, 14217, 3164, 3542, 558, 370, + 291, 393, 16888, 300, 2086, 291, 362, 257, 2459, 51120], "temperature": 0.0, "avg_logprob": + -0.11461341164328835, "compression_ratio": 1.7735849056603774, "no_speech_prob": + 0.00804323423653841}, {"id": 121, "seek": 84128, "start": 856.4, "end": 862.0, "text": + " dollars in insurance coverage and it covers x y and z that''s key yeah that''s + amazing I was just", "tokens": [51120, 3808, 294, 7214, 9645, 293, 309, 10538, 2031, + 288, 293, 710, 300, 311, 2141, 1338, 300, 311, 2243, 286, 390, 445, 51400], "temperature": + 0.0, "avg_logprob": -0.11461341164328835, "compression_ratio": 1.7735849056603774, + "no_speech_prob": 0.00804323423653841}, {"id": 122, "seek": 84128, "start": 862.0, + "end": 867.1999999999999, "text": " you reminded me of when you said about hallucinations + I was just listening to one interview", "tokens": [51400, 291, 15920, 385, 295, + 562, 291, 848, 466, 35212, 10325, 286, 390, 445, 4764, 281, 472, 4049, 51660], "temperature": + 0.0, "avg_logprob": -0.11461341164328835, "compression_ratio": 1.7735849056603774, + "no_speech_prob": 0.00804323423653841}, {"id": 123, "seek": 86720, "start": 868.0, + "end": 874.88, "text": " is not related to AI world attack world it''s political + sciences and so she was asked the", "tokens": [50404, 307, 406, 4077, 281, 7318, + 1002, 2690, 1002, 309, 311, 3905, 17677, 293, 370, 750, 390, 2351, 264, 50748], + "temperature": 0.0, "avg_logprob": -0.1185360116473699, "compression_ratio": 
1.8588235294117648, + "no_speech_prob": 0.014525174163281918}, {"id": 124, "seek": 86720, "start": 874.88, + "end": 879.44, "text": " scientists she was asked you know are you using chat gpt + at work and she said yes sometimes I do", "tokens": [50748, 7708, 750, 390, 2351, + 291, 458, 366, 291, 1228, 5081, 290, 662, 412, 589, 293, 750, 848, 2086, 2171, 286, + 360, 50976], "temperature": 0.0, "avg_logprob": -0.1185360116473699, "compression_ratio": + 1.8588235294117648, "no_speech_prob": 0.014525174163281918}, {"id": 125, "seek": + 86720, "start": 879.44, "end": 885.9200000000001, "text": " sometimes I use it as + a co-writer so you know I I draft some things quickly and and I still see", "tokens": + [50976, 2171, 286, 764, 309, 382, 257, 598, 12, 23681, 370, 291, 458, 286, 286, + 11206, 512, 721, 2661, 293, 293, 286, 920, 536, 51300], "temperature": 0.0, "avg_logprob": + -0.1185360116473699, "compression_ratio": 1.8588235294117648, "no_speech_prob": + 0.014525174163281918}, {"id": 126, "seek": 86720, "start": 885.9200000000001, "end": + 890.1600000000001, "text": " that chat gpt is very crude you know in the way it + approaches you know I can do it better but", "tokens": [51300, 300, 5081, 290, 662, + 307, 588, 30796, 291, 458, 294, 264, 636, 309, 11587, 291, 458, 286, 393, 360, 309, + 1101, 457, 51512], "temperature": 0.0, "avg_logprob": -0.1185360116473699, "compression_ratio": + 1.8588235294117648, "no_speech_prob": 0.014525174163281918}, {"id": 127, "seek": + 86720, "start": 890.1600000000001, "end": 896.88, "text": " sometimes I''m just + you know lazy or tired okay let it do it but then the thing that struck her was", + "tokens": [51512, 2171, 286, 478, 445, 291, 458, 14847, 420, 5868, 1392, 718, 309, + 360, 309, 457, 550, 264, 551, 300, 13159, 720, 390, 51848], "temperature": 0.0, + "avg_logprob": -0.1185360116473699, "compression_ratio": 1.8588235294117648, "no_speech_prob": + 0.014525174163281918}, {"id": 128, "seek": 89688, "start": 896.88, "end": 
903.04, + "text": " that it actually hallucinates she was asking give me you know top five + books in political science", "tokens": [50364, 300, 309, 767, 35212, 259, 1024, + 750, 390, 3365, 976, 385, 291, 458, 1192, 1732, 3642, 294, 3905, 3497, 50672], "temperature": + 0.0, "avg_logprob": -0.09582684491131757, "compression_ratio": 1.9467213114754098, + "no_speech_prob": 0.0008806909318082035}, {"id": 129, "seek": 89688, "start": 903.04, + "end": 908.88, "text": " you know in specific country and chat gpt was very confident + and and said they said the five books", "tokens": [50672, 291, 458, 294, 2685, 1941, + 293, 5081, 290, 662, 390, 588, 6679, 293, 293, 848, 436, 848, 264, 1732, 3642, 50964], + "temperature": 0.0, "avg_logprob": -0.09582684491131757, "compression_ratio": 1.9467213114754098, + "no_speech_prob": 0.0008806909318082035}, {"id": 130, "seek": 89688, "start": 908.88, + "end": 913.92, "text": " and the authors and when she googled them they don''t exist + and and then she said they don''t exist", "tokens": [50964, 293, 264, 16552, 293, + 562, 750, 50061, 1493, 552, 436, 500, 380, 2514, 293, 293, 550, 750, 848, 436, 500, + 380, 2514, 51216], "temperature": 0.0, "avg_logprob": -0.09582684491131757, "compression_ratio": + 1.9467213114754098, "no_speech_prob": 0.0008806909318082035}, {"id": 131, "seek": + 89688, "start": 914.64, "end": 918.96, "text": " and then chat gpt responded okay + here is only one book that you should read and that didn''t", "tokens": [51252, + 293, 550, 5081, 290, 662, 15806, 1392, 510, 307, 787, 472, 1446, 300, 291, 820, + 1401, 293, 300, 994, 380, 51468], "temperature": 0.0, "avg_logprob": -0.09582684491131757, + "compression_ratio": 1.9467213114754098, "no_speech_prob": 0.0008806909318082035}, + {"id": 132, "seek": 89688, "start": 918.96, "end": 925.12, "text": " exist either + so she was genuinely like baffled and she said okay you might say something", "tokens": + [51468, 2514, 2139, 370, 750, 390, 17839, 411, 272, 2518, 1493, 
293, 750, 848, 1392, + 291, 1062, 584, 746, 51776], "temperature": 0.0, "avg_logprob": -0.09582684491131757, + "compression_ratio": 1.9467213114754098, "no_speech_prob": 0.0008806909318082035}, + {"id": 133, "seek": 92512, "start": 925.92, "end": 933.12, "text": " with less confidence + but why lie why do you lie she doesn''t know what is hallucinations but she''s", + "tokens": [50404, 365, 1570, 6687, 457, 983, 4544, 983, 360, 291, 4544, 750, 1177, + 380, 458, 437, 307, 35212, 10325, 457, 750, 311, 50764], "temperature": 0.0, "avg_logprob": + -0.08857180408595763, "compression_ratio": 1.75, "no_speech_prob": 0.015747902914881706}, + {"id": 134, "seek": 92512, "start": 933.12, "end": 939.52, "text": " she looks at + it as a user and it''s very disconcerting so believe it or not when I first started", + "tokens": [50764, 750, 1542, 412, 309, 382, 257, 4195, 293, 309, 311, 588, 717, + 1671, 1776, 783, 370, 1697, 309, 420, 406, 562, 286, 700, 1409, 51084], "temperature": + 0.0, "avg_logprob": -0.08857180408595763, "compression_ratio": 1.75, "no_speech_prob": + 0.015747902914881706}, {"id": 135, "seek": 92512, "start": 939.52, "end": 944.96, + "text": " using gpt 4 I got a hallucination that I thought was so real I wrote to + the publisher and said why is", "tokens": [51084, 1228, 290, 662, 1017, 286, 658, + 257, 35212, 2486, 300, 286, 1194, 390, 370, 957, 286, 4114, 281, 264, 25088, 293, + 848, 983, 307, 51356], "temperature": 0.0, "avg_logprob": -0.08857180408595763, + "compression_ratio": 1.75, "no_speech_prob": 0.015747902914881706}, {"id": 136, + "seek": 92512, "start": 944.96, "end": 950.0, "text": " this article no longer online + and the publisher wrote back and said there is no such article but", "tokens": [51356, + 341, 7222, 572, 2854, 2950, 293, 264, 25088, 4114, 646, 293, 848, 456, 307, 572, + 1270, 7222, 457, 51608], "temperature": 0.0, "avg_logprob": -0.08857180408595763, + "compression_ratio": 1.75, "no_speech_prob": 0.015747902914881706}, {"id": 137, + 
"seek": 95000, "start": 950.0, "end": 955.12, "text": " it could have been it was + authored by someone they said gpt 4 said it was authored by another author", "tokens": + [50364, 309, 727, 362, 668, 309, 390, 6979, 2769, 538, 1580, 436, 848, 290, 662, + 1017, 848, 309, 390, 6979, 2769, 538, 1071, 3793, 50620], "temperature": 0.0, "avg_logprob": + -0.11615510490851674, "compression_ratio": 1.7544483985765125, "no_speech_prob": + 0.02273571863770485}, {"id": 138, "seek": 95000, "start": 955.12, "end": 960.0, + "text": " who had posted on that site the url looked correct and the content looked + I mean the snippet", "tokens": [50620, 567, 632, 9437, 322, 300, 3621, 264, 4038, + 75, 2956, 3006, 293, 264, 2701, 2956, 286, 914, 264, 35623, 302, 50864], "temperature": + 0.0, "avg_logprob": -0.11615510490851674, "compression_ratio": 1.7544483985765125, + "no_speech_prob": 0.02273571863770485}, {"id": 139, "seek": 95000, "start": 960.0, + "end": 966.16, "text": " it gave me looked absolutely real but again when they build + these models a few you know 10 20 gigabyte", "tokens": [50864, 309, 2729, 385, 2956, + 3122, 957, 457, 797, 562, 436, 1322, 613, 5245, 257, 1326, 291, 458, 1266, 945, + 8741, 34529, 51172], "temperature": 0.0, "avg_logprob": -0.11615510490851674, "compression_ratio": + 1.7544483985765125, "no_speech_prob": 0.02273571863770485}, {"id": 140, "seek": + 95000, "start": 966.16, "end": 971.44, "text": " model right of gpt 4 or 35 or whatever + it is petabytes and petabytes of data went into that so by", "tokens": [51172, 2316, + 558, 295, 290, 662, 1017, 420, 6976, 420, 2035, 309, 307, 3817, 24538, 293, 3817, + 24538, 295, 1412, 1437, 666, 300, 370, 538, 51436], "temperature": 0.0, "avg_logprob": + -0.11615510490851674, "compression_ratio": 1.7544483985765125, "no_speech_prob": + 0.02273571863770485}, {"id": 141, "seek": 95000, "start": 971.44, "end": 977.52, + "text": " definition it''s lossy but the way the lllm the generative part works + is it must 
provide a response", "tokens": [51436, 7123, 309, 311, 4470, 88, 457, + 264, 636, 264, 287, 285, 76, 264, 1337, 1166, 644, 1985, 307, 309, 1633, 2893, 257, + 4134, 51740], "temperature": 0.0, "avg_logprob": -0.11615510490851674, "compression_ratio": + 1.7544483985765125, "no_speech_prob": 0.02273571863770485}, {"id": 142, "seek": + 97752, "start": 977.52, "end": 982.3199999999999, "text": " so you know how that + is when you can''t quite remember the name of something and it''s essentially", + "tokens": [50364, 370, 291, 458, 577, 300, 307, 562, 291, 393, 380, 1596, 1604, + 264, 1315, 295, 746, 293, 309, 311, 4476, 50604], "temperature": 0.0, "avg_logprob": + -0.08851746532404534, "compression_ratio": 1.7740740740740741, "no_speech_prob": + 0.0024549805093556643}, {"id": 143, "seek": 97752, "start": 982.3199999999999, "end": + 986.24, "text": " doing the same thing so it knows it I saw an artifact that looked + like that but I don''t have the", "tokens": [50604, 884, 264, 912, 551, 370, 309, + 3255, 309, 286, 1866, 364, 34806, 300, 2956, 411, 300, 457, 286, 500, 380, 362, + 264, 50800], "temperature": 0.0, "avg_logprob": -0.08851746532404534, "compression_ratio": + 1.7740740740740741, "no_speech_prob": 0.0024549805093556643}, {"id": 144, "seek": + 97752, "start": 986.24, "end": 993.1999999999999, "text": " artifact anymore so + it generates something that is the consensus version of what it would have been", + "tokens": [50800, 34806, 3602, 370, 309, 23815, 746, 300, 307, 264, 19115, 3037, + 295, 437, 309, 576, 362, 668, 51148], "temperature": 0.0, "avg_logprob": -0.08851746532404534, + "compression_ratio": 1.7740740740740741, "no_speech_prob": 0.0024549805093556643}, + {"id": 145, "seek": 97752, "start": 993.1999999999999, "end": 999.52, "text": " + had it existed and that''s why I don''t believe in fine tuning so much think if + you have a high", "tokens": [51148, 632, 309, 13135, 293, 300, 311, 983, 286, 500, + 380, 1697, 294, 2489, 15164, 370, 709, 519, 498, 
291, 362, 257, 1090, 51464], "temperature": + 0.0, "avg_logprob": -0.08851746532404534, "compression_ratio": 1.7740740740740741, + "no_speech_prob": 0.0024549805093556643}, {"id": 146, "seek": 97752, "start": 999.52, + "end": 1004.4, "text": " capable model with some reasoning and the ability to interpret + text and follow instructions", "tokens": [51464, 8189, 2316, 365, 512, 21577, 293, + 264, 3485, 281, 7302, 2487, 293, 1524, 9415, 51708], "temperature": 0.0, "avg_logprob": + -0.08851746532404534, "compression_ratio": 1.7740740740740741, "no_speech_prob": + 0.0024549805093556643}, {"id": 147, "seek": 100440, "start": 1004.4, "end": 1011.04, + "text": " you provide it with your internal data and that is the beauty of reg because + here''s the thing", "tokens": [50364, 291, 2893, 309, 365, 428, 6920, 1412, 293, + 300, 307, 264, 6643, 295, 1121, 570, 510, 311, 264, 551, 50696], "temperature": + 0.0, "avg_logprob": -0.13919814573515446, "compression_ratio": 1.7984790874524714, + "no_speech_prob": 0.0036329131107777357}, {"id": 148, "seek": 100440, "start": 1011.68, + "end": 1016.24, "text": " the reason it''s so good at things why does why does chat + gpt 4 sound like a smart person on", "tokens": [50728, 264, 1778, 309, 311, 370, + 665, 412, 721, 983, 775, 983, 775, 5081, 290, 662, 1017, 1626, 411, 257, 4069, 954, + 322, 50956], "temperature": 0.0, "avg_logprob": -0.13919814573515446, "compression_ratio": + 1.7984790874524714, "no_speech_prob": 0.0036329131107777357}, {"id": 149, "seek": + 100440, "start": 1016.24, "end": 1020.24, "text": " you know reddit or or some or + Facebook or something like that right and that''s because that''s", "tokens": [50956, + 291, 458, 2182, 17975, 420, 420, 512, 420, 4384, 420, 746, 411, 300, 558, 293, 300, + 311, 570, 300, 311, 51156], "temperature": 0.0, "avg_logprob": -0.13919814573515446, + "compression_ratio": 1.7984790874524714, "no_speech_prob": 0.0036329131107777357}, + {"id": 150, "seek": 100440, "start": 1020.24, "end": 
1025.2, "text": " where it + was trained from you''re internal and of course on something like reddit or whatever + they", "tokens": [51156, 689, 309, 390, 8895, 490, 291, 434, 6920, 293, 295, 1164, + 322, 746, 411, 2182, 17975, 420, 2035, 436, 51404], "temperature": 0.0, "avg_logprob": + -0.13919814573515446, "compression_ratio": 1.7984790874524714, "no_speech_prob": + 0.0036329131107777357}, {"id": 151, "seek": 100440, "start": 1025.2, "end": 1029.44, + "text": " have a new the same conversation 10 million times right I mean how many + discussions of whatever", "tokens": [51404, 362, 257, 777, 264, 912, 3761, 1266, + 2459, 1413, 558, 286, 914, 577, 867, 11088, 295, 2035, 51616], "temperature": 0.0, + "avg_logprob": -0.13919814573515446, "compression_ratio": 1.7984790874524714, "no_speech_prob": + 0.0036329131107777357}, {"id": 152, "seek": 102944, "start": 1029.44, "end": 1034.88, + "text": " twin peaks or battle star galactica you know are there there are a lot + of them and so it learns the", "tokens": [50364, 18397, 26897, 420, 4635, 3543, + 7660, 578, 2262, 291, 458, 366, 456, 456, 366, 257, 688, 295, 552, 293, 370, 309, + 27152, 264, 50636], "temperature": 0.0, "avg_logprob": -0.10733833483287267, "compression_ratio": + 1.8544776119402986, "no_speech_prob": 0.00649447413161397}, {"id": 153, "seek": + 102944, "start": 1034.88, "end": 1039.6000000000001, "text": " core of these things + right and can answer those questions but if you feed at your internal data like", + "tokens": [50636, 4965, 295, 613, 721, 558, 293, 393, 1867, 729, 1651, 457, 498, + 291, 3154, 412, 428, 6920, 1412, 411, 50872], "temperature": 0.0, "avg_logprob": + -0.10733833483287267, "compression_ratio": 1.8544776119402986, "no_speech_prob": + 0.00649447413161397}, {"id": 154, "seek": 102944, "start": 1039.6000000000001, "end": + 1046.56, "text": " it''s probably not so repetitive it''s probably much more conflicting + than not and so you that''s", "tokens": [50872, 309, 311, 1391, 406, 
370, 29404, + 309, 311, 1391, 709, 544, 43784, 813, 406, 293, 370, 291, 300, 311, 51220], "temperature": + 0.0, "avg_logprob": -0.10733833483287267, "compression_ratio": 1.8544776119402986, + "no_speech_prob": 0.00649447413161397}, {"id": 155, "seek": 102944, "start": 1046.56, + "end": 1051.28, "text": " why you produce more problems it''s much better give it + the one thing that''s really relevant and let it", "tokens": [51220, 983, 291, 5258, + 544, 2740, 309, 311, 709, 1101, 976, 309, 264, 472, 551, 300, 311, 534, 7340, 293, + 718, 309, 51456], "temperature": 0.0, "avg_logprob": -0.10733833483287267, "compression_ratio": + 1.8544776119402986, "no_speech_prob": 0.00649447413161397}, {"id": 156, "seek": + 102944, "start": 1051.28, "end": 1058.4, "text": " reason yeah that sounds go and + slight live right it''s something that can be updated throughout the", "tokens": + [51456, 1778, 1338, 300, 3263, 352, 293, 4036, 1621, 558, 309, 311, 746, 300, 393, + 312, 10588, 3710, 264, 51812], "temperature": 0.0, "avg_logprob": -0.10733833483287267, + "compression_ratio": 1.8544776119402986, "no_speech_prob": 0.00649447413161397}, + {"id": 157, "seek": 105840, "start": 1058.4, "end": 1063.2, "text": " lifecycle + of your company or department whatever but there is one challenge they want to offer + to you", "tokens": [50364, 45722, 295, 428, 2237, 420, 5882, 2035, 457, 456, 307, + 472, 3430, 436, 528, 281, 2626, 281, 291, 50604], "temperature": 0.0, "avg_logprob": + -0.14367870850996536, "compression_ratio": 1.7012987012987013, "no_speech_prob": + 0.0036661839112639427}, {"id": 158, "seek": 105840, "start": 1063.2, "end": 1071.0400000000002, + "text": " and came to me just today as I was thinking and preparing for this episode + data is not gold you know", "tokens": [50604, 293, 1361, 281, 385, 445, 965, 382, + 286, 390, 1953, 293, 10075, 337, 341, 3500, 1412, 307, 406, 3821, 291, 458, 50996], + "temperature": 0.0, "avg_logprob": -0.14367870850996536, "compression_ratio": 
1.7012987012987013, + "no_speech_prob": 0.0036661839112639427}, {"id": 159, "seek": 105840, "start": 1071.0400000000002, + "end": 1076.72, "text": " sometimes it is gold because everyone talks about it la + la la but like it also is very complex", "tokens": [50996, 2171, 309, 307, 3821, + 570, 1518, 6686, 466, 309, 635, 635, 635, 457, 411, 309, 611, 307, 588, 3997, 51280], + "temperature": 0.0, "avg_logprob": -0.14367870850996536, "compression_ratio": 1.7012987012987013, + "no_speech_prob": 0.0036661839112639427}, {"id": 160, "seek": 105840, "start": 1076.72, + "end": 1082.8000000000002, "text": " machinery and it can have the stakes of it + of its own you know misattribution misclassification", "tokens": [51280, 27302, + 293, 309, 393, 362, 264, 28429, 295, 309, 295, 1080, 1065, 291, 458, 3346, 1591, + 30783, 3346, 11665, 3774, 51584], "temperature": 0.0, "avg_logprob": -0.14367870850996536, + "compression_ratio": 1.7012987012987013, "no_speech_prob": 0.0036661839112639427}, + {"id": 161, "seek": 108280, "start": 1083.68, "end": 1091.52, "text": " and human + error what have you how how would you say reader lllm or swirl gonna tackle this", + "tokens": [50408, 293, 1952, 6713, 437, 362, 291, 577, 577, 576, 291, 584, 15149, + 287, 285, 76, 420, 30310, 799, 14896, 341, 50800], "temperature": 0.0, "avg_logprob": + -0.18142213017107492, "compression_ratio": 1.6077586206896552, "no_speech_prob": + 0.004327643662691116}, {"id": 162, "seek": 108280, "start": 1091.52, "end": 1096.56, + "text": " issue or is it just gonna transparently sort of like garbage in garbage + out type of response", "tokens": [50800, 2734, 420, 307, 309, 445, 799, 7132, 6420, + 1333, 295, 411, 14150, 294, 14150, 484, 2010, 295, 4134, 51052], "temperature": + 0.0, "avg_logprob": -0.18142213017107492, "compression_ratio": 1.6077586206896552, + "no_speech_prob": 0.004327643662691116}, {"id": 163, "seek": 108280, "start": 1098.48, + "end": 1104.3999999999999, "text": " it''s a great question speaking 
of hallucinations + in AI right we all have probably worked with", "tokens": [51148, 309, 311, 257, + 869, 1168, 4124, 295, 35212, 10325, 294, 7318, 558, 321, 439, 362, 1391, 2732, 365, + 51444], "temperature": 0.0, "avg_logprob": -0.18142213017107492, "compression_ratio": + 1.6077586206896552, "no_speech_prob": 0.004327643662691116}, {"id": 164, "seek": + 108280, "start": 1104.3999999999999, "end": 1109.04, "text": " somebody at one time + another who made a mistake right or whatever didn''t understand the problem", "tokens": + [51444, 2618, 412, 472, 565, 1071, 567, 1027, 257, 6146, 558, 420, 2035, 994, 380, + 1223, 264, 1154, 51676], "temperature": 0.0, "avg_logprob": -0.18142213017107492, + "compression_ratio": 1.6077586206896552, "no_speech_prob": 0.004327643662691116}, + {"id": 165, "seek": 110904, "start": 1109.28, "end": 1113.84, "text": " enough and + that stuff gets into teams and slack and you know documents are wrong like it''s", + "tokens": [50376, 1547, 293, 300, 1507, 2170, 666, 5491, 293, 29767, 293, 291, 458, + 8512, 366, 2085, 411, 309, 311, 50604], "temperature": 0.0, "avg_logprob": -0.15550581014381265, + "compression_ratio": 1.7655677655677655, "no_speech_prob": 0.00762926647439599}, + {"id": 166, "seek": 110904, "start": 1113.84, "end": 1118.72, "text": " incredible + you''re right it''s incredibly messy in the enterprise happy as anybody not worked + at a firm", "tokens": [50604, 4651, 291, 434, 558, 309, 311, 6252, 16191, 294, 264, + 14132, 2055, 382, 4472, 406, 2732, 412, 257, 6174, 50848], "temperature": 0.0, "avg_logprob": + -0.15550581014381265, "compression_ratio": 1.7655677655677655, "no_speech_prob": + 0.00762926647439599}, {"id": 167, "seek": 110904, "start": 1118.72, "end": 1124.8, + "text": " where they had you know 500 versions of the same PowerPoint that is just + evolved right so absolutely", "tokens": [50848, 689, 436, 632, 291, 458, 5923, 9606, + 295, 264, 912, 25584, 300, 307, 445, 14178, 558, 370, 3122, 51152], 
"temperature": + 0.0, "avg_logprob": -0.15550581014381265, "compression_ratio": 1.7655677655677655, + "no_speech_prob": 0.00762926647439599}, {"id": 168, "seek": 110904, "start": 1124.8, + "end": 1128.8799999999999, "text": " well these are things that ultimately are gonna + have to be continued to be work on but here''s", "tokens": [51152, 731, 613, 366, + 721, 300, 6284, 366, 799, 362, 281, 312, 7014, 281, 312, 589, 322, 457, 510, 311, + 51356], "temperature": 0.0, "avg_logprob": -0.15550581014381265, "compression_ratio": + 1.7655677655677655, "no_speech_prob": 0.00762926647439599}, {"id": 169, "seek": + 110904, "start": 1128.8799999999999, "end": 1133.92, "text": " one point number + one if you leave the system in the system data in the system of record you''re", + "tokens": [51356, 472, 935, 1230, 472, 498, 291, 1856, 264, 1185, 294, 264, 1185, + 1412, 294, 264, 1185, 295, 2136, 291, 434, 51608], "temperature": 0.0, "avg_logprob": + -0.15550581014381265, "compression_ratio": 1.7655677655677655, "no_speech_prob": + 0.00762926647439599}, {"id": 170, "seek": 113392, "start": 1133.92, "end": 1139.68, + "text": " much less likely to introduce new problems especially like security problems + and you leave it in", "tokens": [50364, 709, 1570, 3700, 281, 5366, 777, 2740, 2318, + 411, 3825, 2740, 293, 291, 1856, 309, 294, 50652], "temperature": 0.0, "avg_logprob": + -0.10585203877201786, "compression_ratio": 1.772563176895307, "no_speech_prob": + 0.0005382790695875883}, {"id": 171, "seek": 113392, "start": 1139.68, "end": 1144.48, + "text": " the system of record than any domain modeling lexicon''s ontologies text + items you get the benefit of", "tokens": [50652, 264, 1185, 295, 2136, 813, 604, + 9274, 15983, 476, 87, 11911, 311, 6592, 6204, 2487, 4754, 291, 483, 264, 5121, 295, + 50892], "temperature": 0.0, "avg_logprob": -0.10585203877201786, "compression_ratio": + 1.772563176895307, "no_speech_prob": 0.0005382790695875883}, {"id": 172, "seek": + 113392, "start": 
1144.48, "end": 1149.04, "text": " those if someone cared about + that source they might very well have done some of that right so if", "tokens": + [50892, 729, 498, 1580, 19779, 466, 300, 4009, 436, 1062, 588, 731, 362, 1096, 512, + 295, 300, 558, 370, 498, 51120], "temperature": 0.0, "avg_logprob": -0.10585203877201786, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.0005382790695875883}, + {"id": 173, "seek": 113392, "start": 1149.04, "end": 1154.3200000000002, "text": + " you pull it all out and put it in a vector database like what happened to all + of that knowledge so", "tokens": [51120, 291, 2235, 309, 439, 484, 293, 829, 309, + 294, 257, 8062, 8149, 411, 437, 2011, 281, 439, 295, 300, 3601, 370, 51384], "temperature": + 0.0, "avg_logprob": -0.10585203877201786, "compression_ratio": 1.772563176895307, + "no_speech_prob": 0.0005382790695875883}, {"id": 174, "seek": 113392, "start": 1154.3200000000002, + "end": 1159.8400000000001, "text": " I would argue that the systems of record that + are valuable have things in place to deal with that", "tokens": [51384, 286, 576, + 9695, 300, 264, 3652, 295, 2136, 300, 366, 8263, 362, 721, 294, 1081, 281, 2028, + 365, 300, 51660], "temperature": 0.0, "avg_logprob": -0.10585203877201786, "compression_ratio": + 1.772563176895307, "no_speech_prob": 0.0005382790695875883}, {"id": 175, "seek": + 115984, "start": 1160.48, "end": 1165.76, "text": " number two the reader lm does + a couple things that to help this one it''s aware of certain", "tokens": [50396, + 1230, 732, 264, 15149, 287, 76, 775, 257, 1916, 721, 300, 281, 854, 341, 472, 309, + 311, 3650, 295, 1629, 50660], "temperature": 0.0, "avg_logprob": -0.11508199572563171, + "compression_ratio": 1.7622641509433963, "no_speech_prob": 0.0027516174595803022}, + {"id": 176, "seek": 115984, "start": 1165.76, "end": 1172.08, "text": " commonly + problematic formats email is the worst reply forward and signature content very + very", "tokens": [50660, 12719, 
19011, 25879, 3796, 307, 264, 5855, 16972, 2128, + 293, 13397, 2701, 588, 588, 50976], "temperature": 0.0, "avg_logprob": -0.11508199572563171, + "compression_ratio": 1.7622641509433963, "no_speech_prob": 0.0027516174595803022}, + {"id": 177, "seek": 115984, "start": 1172.08, "end": 1177.04, "text": " problematic + we have a solution for public data too so that you can get article content without", + "tokens": [50976, 19011, 321, 362, 257, 3827, 337, 1908, 1412, 886, 370, 300, 291, + 393, 483, 7222, 2701, 1553, 51224], "temperature": 0.0, "avg_logprob": -0.11508199572563171, + "compression_ratio": 1.7622641509433963, "no_speech_prob": 0.0027516174595803022}, + {"id": 178, "seek": 115984, "start": 1177.04, "end": 1183.36, "text": " getting + as an example at navigation advertisements cloked data stuff like that right so + because", "tokens": [51224, 1242, 382, 364, 1365, 412, 17346, 42897, 596, 9511, + 1412, 1507, 411, 300, 558, 370, 570, 51540], "temperature": 0.0, "avg_logprob": + -0.11508199572563171, "compression_ratio": 1.7622641509433963, "no_speech_prob": + 0.0027516174595803022}, {"id": 179, "seek": 115984, "start": 1183.36, "end": 1188.08, + "text": " very often public data is relevant right to to large enterprise like they + want to see policy", "tokens": [51540, 588, 2049, 1908, 1412, 307, 7340, 558, 281, + 281, 2416, 14132, 411, 436, 528, 281, 536, 3897, 51776], "temperature": 0.0, "avg_logprob": + -0.11508199572563171, "compression_ratio": 1.7622641509433963, "no_speech_prob": + 0.0027516174595803022}, {"id": 180, "seek": 118808, "start": 1188.08, "end": 1193.84, + "text": " changes regulatory changes online catalog changes right that that''s all + relevant stuff then there''s", "tokens": [50364, 2962, 18260, 2962, 2950, 19746, + 2962, 558, 300, 300, 311, 439, 7340, 1507, 550, 456, 311, 50652], "temperature": + 0.0, "avg_logprob": -0.13668353469283492, "compression_ratio": 1.8446969696969697, + "no_speech_prob": 0.0026266479399055243}, {"id": 181, 
"seek": 118808, "start": 1193.84, + "end": 1198.96, "text": " the similarity problem right so one of the another thing + the reader lm does it can do semantic", "tokens": [50652, 264, 32194, 1154, 558, + 370, 472, 295, 264, 1071, 551, 264, 15149, 287, 76, 775, 309, 393, 360, 47982, 50908], + "temperature": 0.0, "avg_logprob": -0.13668353469283492, "compression_ratio": 1.8446969696969697, + "no_speech_prob": 0.0026266479399055243}, {"id": 182, "seek": 118808, "start": 1198.96, + "end": 1204.8799999999999, "text": " analysis to determine which is the latest version + of the same document that''s a one of the great", "tokens": [50908, 5215, 281, 6997, + 597, 307, 264, 6792, 3037, 295, 264, 912, 4166, 300, 311, 257, 472, 295, 264, 869, + 51204], "temperature": 0.0, "avg_logprob": -0.13668353469283492, "compression_ratio": + 1.8446969696969697, "no_speech_prob": 0.0026266479399055243}, {"id": 183, "seek": + 118808, "start": 1204.8799999999999, "end": 1209.6799999999998, "text": " lm''s + are amazing at that much better at old school like multi windows setups where you''re + trying to", "tokens": [51204, 287, 76, 311, 366, 2243, 412, 300, 709, 1101, 412, + 1331, 1395, 411, 4825, 9309, 46832, 689, 291, 434, 1382, 281, 51444], "temperature": + 0.0, "avg_logprob": -0.13668353469283492, "compression_ratio": 1.8446969696969697, + "no_speech_prob": 0.0026266479399055243}, {"id": 184, "seek": 118808, "start": 1210.3999999999999, + "end": 1215.6799999999998, "text": " take out like a signature of the document and + say well this could it''s very similar but lm does", "tokens": [51480, 747, 484, + 411, 257, 13397, 295, 264, 4166, 293, 584, 731, 341, 727, 309, 311, 588, 2531, 457, + 287, 76, 775, 51744], "temperature": 0.0, "avg_logprob": -0.13668353469283492, "compression_ratio": + 1.8446969696969697, "no_speech_prob": 0.0026266479399055243}, {"id": 185, "seek": + 121568, "start": 1215.68, "end": 1219.52, "text": " it much better right and you + can quickly say now this is the 
latest version of that spreadsheet", "tokens": [50364, + 309, 709, 1101, 558, 293, 291, 393, 2661, 584, 586, 341, 307, 264, 6792, 3037, 295, + 300, 27733, 50556], "temperature": 0.0, "avg_logprob": -0.08372452022793057, "compression_ratio": + 1.8277153558052435, "no_speech_prob": 0.003425214672461152}, {"id": 186, "seek": + 121568, "start": 1220.48, "end": 1226.0800000000002, "text": " or you can let the + user decide it''s another thing who doesn''t love shopping I love being able to", + "tokens": [50604, 420, 291, 393, 718, 264, 4195, 4536, 309, 311, 1071, 551, 567, + 1177, 380, 959, 8688, 286, 959, 885, 1075, 281, 50884], "temperature": 0.0, "avg_logprob": + -0.08372452022793057, "compression_ratio": 1.8277153558052435, "no_speech_prob": + 0.003425214672461152}, {"id": 187, "seek": 121568, "start": 1226.0800000000002, + "end": 1231.2, "text": " look at my shopping cart full of swirl results and say + you know this one I know isn''t really relevant", "tokens": [50884, 574, 412, 452, + 8688, 5467, 1577, 295, 30310, 3542, 293, 584, 291, 458, 341, 472, 286, 458, 1943, + 380, 534, 7340, 51140], "temperature": 0.0, "avg_logprob": -0.08372452022793057, + "compression_ratio": 1.8277153558052435, "no_speech_prob": 0.003425214672461152}, + {"id": 188, "seek": 121568, "start": 1231.2, "end": 1235.8400000000001, "text": + " these are the five I''ve risen or maybe this is the source or these are the sources + that I want my", "tokens": [51140, 613, 366, 264, 1732, 286, 600, 28614, 420, 1310, + 341, 307, 264, 4009, 420, 613, 366, 264, 7139, 300, 286, 528, 452, 51372], "temperature": + 0.0, "avg_logprob": -0.08372452022793057, "compression_ratio": 1.8277153558052435, + "no_speech_prob": 0.003425214672461152}, {"id": 189, "seek": 121568, "start": 1235.8400000000001, + "end": 1241.3600000000001, "text": " data from today that''s another way of allowing + the user to bring their expertise and experience", "tokens": [51372, 1412, 490, + 965, 300, 311, 1071, 636, 295, 8293, 264, 
4195, 281, 1565, 641, 11769, 293, 1752, + 51648], "temperature": 0.0, "avg_logprob": -0.08372452022793057, "compression_ratio": + 1.8277153558052435, "no_speech_prob": 0.003425214672461152}, {"id": 190, "seek": + 124136, "start": 1241.36, "end": 1247.6, "text": " and knowledge and say no no no + colibre not thoughts but snowflake not oracle whatever and I''m", "tokens": [50364, + 293, 3601, 293, 584, 572, 572, 572, 1173, 897, 265, 406, 4598, 457, 44124, 619, + 406, 420, 7041, 2035, 293, 286, 478, 50676], "temperature": 0.0, "avg_logprob": + -0.10338307263558372, "compression_ratio": 1.8277153558052435, "no_speech_prob": + 0.0075044408440589905}, {"id": 191, "seek": 124136, "start": 1247.6, "end": 1252.9599999999998, + "text": " not picking on anybody we all can say they''re all present they all have + value the question is which", "tokens": [50676, 406, 8867, 322, 4472, 321, 439, + 393, 584, 436, 434, 439, 1974, 436, 439, 362, 2158, 264, 1168, 307, 597, 50944], + "temperature": 0.0, "avg_logprob": -0.10338307263558372, "compression_ratio": 1.8277153558052435, + "no_speech_prob": 0.0075044408440589905}, {"id": 192, "seek": 124136, "start": 1252.9599999999998, + "end": 1258.24, "text": " one has the answer for me today well until the until they + can write the query with the context that", "tokens": [50944, 472, 575, 264, 1867, + 337, 385, 965, 731, 1826, 264, 1826, 436, 393, 2464, 264, 14581, 365, 264, 4319, + 300, 51208], "temperature": 0.0, "avg_logprob": -0.10338307263558372, "compression_ratio": + 1.8277153558052435, "no_speech_prob": 0.0075044408440589905}, {"id": 193, "seek": + 124136, "start": 1258.24, "end": 1262.3999999999999, "text": " answers that you + know I think the key is to keep the user in the loop make sure that there are", + "tokens": [51208, 6338, 300, 291, 458, 286, 519, 264, 2141, 307, 281, 1066, 264, + 4195, 294, 264, 6367, 652, 988, 300, 456, 366, 51416], "temperature": 0.0, "avg_logprob": + -0.10338307263558372, "compression_ratio": 
1.8277153558052435, "no_speech_prob": + 0.0075044408440589905}, {"id": 194, "seek": 124136, "start": 1262.3999999999999, + "end": 1267.76, "text": " citations and ultimately that in a year the systems will + be smarter and many of these problems will", "tokens": [51416, 4814, 763, 293, 6284, + 300, 294, 257, 1064, 264, 3652, 486, 312, 20294, 293, 867, 295, 613, 2740, 486, + 51684], "temperature": 0.0, "avg_logprob": -0.10338307263558372, "compression_ratio": + 1.8277153558052435, "no_speech_prob": 0.0075044408440589905}, {"id": 195, "seek": + 126776, "start": 1267.76, "end": 1273.6, "text": " be solved after all almost all + the naive search engines right that were BM25 or whatever", "tokens": [50364, 312, + 13041, 934, 439, 1920, 439, 264, 29052, 3164, 12982, 558, 300, 645, 15901, 6074, + 420, 2035, 50656], "temperature": 0.0, "avg_logprob": -0.21619083180147058, "compression_ratio": + 1.5261044176706828, "no_speech_prob": 0.0016703270375728607}, {"id": 196, "seek": + 126776, "start": 1273.6, "end": 1278.4, "text": " pretty much all have vector upgrades + now only questions can you wait long enough to vectorize", "tokens": [50656, 1238, + 709, 439, 362, 8062, 24868, 586, 787, 1651, 393, 291, 1699, 938, 1547, 281, 8062, + 1125, 50896], "temperature": 0.0, "avg_logprob": -0.21619083180147058, "compression_ratio": + 1.5261044176706828, "no_speech_prob": 0.0016703270375728607}, {"id": 197, "seek": + 126776, "start": 1278.4, "end": 1286.0, "text": " at high-dimensional space a few + million documents exactly yeah sometimes when I use chat GPT I don''t", "tokens": + [50896, 412, 1090, 12, 18759, 1901, 257, 1326, 2459, 8512, 2293, 1338, 2171, 562, + 286, 764, 5081, 26039, 51, 286, 500, 380, 51276], "temperature": 0.0, "avg_logprob": + -0.21619083180147058, "compression_ratio": 1.5261044176706828, "no_speech_prob": + 0.0016703270375728607}, {"id": 198, "seek": 126776, "start": 1286.0, "end": 1291.36, + "text": " use it that often by the way for some reason maybe it says 
something about + me maybe I should you", "tokens": [51276, 764, 309, 300, 2049, 538, 264, 636, 337, + 512, 1778, 1310, 309, 1619, 746, 466, 385, 1310, 286, 820, 291, 51544], "temperature": + 0.0, "avg_logprob": -0.21619083180147058, "compression_ratio": 1.5261044176706828, + "no_speech_prob": 0.0016703270375728607}, {"id": 199, "seek": 129136, "start": 1291.84, + "end": 1298.0, "text": " learn to do it but sometimes as you said you know it just + generates something it seems a little", "tokens": [50388, 1466, 281, 360, 309, 457, + 2171, 382, 291, 848, 291, 458, 309, 445, 23815, 746, 309, 2544, 257, 707, 50696], + "temperature": 0.0, "avg_logprob": -0.10082892120861617, "compression_ratio": 1.8549618320610688, + "no_speech_prob": 0.009863103739917278}, {"id": 200, "seek": 129136, "start": 1298.0, + "end": 1302.7199999999998, "text": " average you know it''s a code snippet or something + like that you try it it doesn''t work at that point", "tokens": [50696, 4274, 291, + 458, 309, 311, 257, 3089, 35623, 302, 420, 746, 411, 300, 291, 853, 309, 309, 1177, + 380, 589, 412, 300, 935, 50932], "temperature": 0.0, "avg_logprob": -0.10082892120861617, + "compression_ratio": 1.8549618320610688, "no_speech_prob": 0.009863103739917278}, + {"id": 201, "seek": 129136, "start": 1303.4399999999998, "end": 1308.32, "text": + " when I when I get frustrated a little bit I''m like can you show me the source + maybe a link to", "tokens": [50968, 562, 286, 562, 286, 483, 15751, 257, 707, 857, + 286, 478, 411, 393, 291, 855, 385, 264, 4009, 1310, 257, 2113, 281, 51212], "temperature": + 0.0, "avg_logprob": -0.10082892120861617, "compression_ratio": 1.8549618320610688, + "no_speech_prob": 0.009863103739917278}, {"id": 202, "seek": 129136, "start": 1308.32, + "end": 1314.1599999999999, "text": " stack over flow so I can go and drill in for + myself you know I don''t have to sort of keep pounding", "tokens": [51212, 8630, + 670, 3095, 370, 286, 393, 352, 293, 11392, 294, 337, 2059, 291, 
458, 286, 500, 380, + 362, 281, 1333, 295, 1066, 40034, 51504], "temperature": 0.0, "avg_logprob": -0.10082892120861617, + "compression_ratio": 1.8549618320610688, "no_speech_prob": 0.009863103739917278}, + {"id": 203, "seek": 129136, "start": 1314.1599999999999, "end": 1319.9199999999998, + "text": " and you and I''m asking you know okay that didn''t work this didn''t work + because I can do the same", "tokens": [51504, 293, 291, 293, 286, 478, 3365, 291, + 458, 1392, 300, 994, 380, 589, 341, 994, 380, 589, 570, 286, 393, 360, 264, 912, + 51792], "temperature": 0.0, "avg_logprob": -0.10082892120861617, "compression_ratio": + 1.8549618320610688, "no_speech_prob": 0.009863103739917278}, {"id": 204, "seek": + 131992, "start": 1319.92, "end": 1324.96, "text": " thing just staring at the stack + over flow page right and maybe they have been already some updates", "tokens": [50364, + 551, 445, 18043, 412, 264, 8630, 670, 3095, 3028, 558, 293, 1310, 436, 362, 668, + 1217, 512, 9205, 50616], "temperature": 0.0, "avg_logprob": -0.14670597423206677, + "compression_ratio": 1.7522522522522523, "no_speech_prob": 0.0025897531304508448}, + {"id": 205, "seek": 131992, "start": 1324.96, "end": 1329.68, "text": " and someone + said no that doesn''t work in all some times they see you see the selected answer", + "tokens": [50616, 293, 1580, 848, 572, 300, 1177, 380, 589, 294, 439, 512, 1413, + 436, 536, 291, 536, 264, 8209, 1867, 50852], "temperature": 0.0, "avg_logprob": + -0.14670597423206677, "compression_ratio": 1.7522522522522523, "no_speech_prob": + 0.0025897531304508448}, {"id": 206, "seek": 131992, "start": 1329.68, "end": 1335.76, + "text": " but then there is another one which everyone says that works not the selected + one so that''s just", "tokens": [50852, 457, 550, 456, 307, 1071, 472, 597, 1518, + 1619, 300, 1985, 406, 264, 8209, 472, 370, 300, 311, 445, 51156], "temperature": + 0.0, "avg_logprob": -0.14670597423206677, "compression_ratio": 1.7522522522522523, + 
"no_speech_prob": 0.0025897531304508448}, {"id": 207, "seek": 131992, "start": 1335.76, + "end": 1343.52, "text": " funny yeah that''s amazing so reader lulm like just to + sort of bring it back to the ground especially", "tokens": [51156, 4074, 1338, 300, + 311, 2243, 370, 15149, 287, 425, 76, 411, 445, 281, 1333, 295, 1565, 309, 646, 281, + 264, 2727, 2318, 51544], "temperature": 0.0, "avg_logprob": -0.14670597423206677, + "compression_ratio": 1.7522522522522523, "no_speech_prob": 0.0025897531304508448}, + {"id": 208, "seek": 134352, "start": 1343.52, "end": 1350.72, "text": " for those + who are sort of no vice like myself I still consider myself no vice you know have + you", "tokens": [50364, 337, 729, 567, 366, 1333, 295, 572, 11964, 411, 2059, 286, + 920, 1949, 2059, 572, 11964, 291, 458, 362, 291, 50724], "temperature": 0.0, "avg_logprob": + -0.14166661671229772, "compression_ratio": 1.8160377358490567, "no_speech_prob": + 0.009141028858721256}, {"id": 209, "seek": 134352, "start": 1350.72, "end": 1356.56, + "text": " sort of taken a reader lulm off the shelf you sort of implemented someone''s + paper or took it", "tokens": [50724, 1333, 295, 2726, 257, 15149, 287, 425, 76, + 766, 264, 15222, 291, 1333, 295, 12270, 1580, 311, 3035, 420, 1890, 309, 51016], + "temperature": 0.0, "avg_logprob": -0.14166661671229772, "compression_ratio": 1.8160377358490567, + "no_speech_prob": 0.009141028858721256}, {"id": 210, "seek": 134352, "start": 1356.56, + "end": 1364.56, "text": " or did you did you have to train it how did you go about + it we built it largely from it''s it''s been", "tokens": [51016, 420, 630, 291, + 630, 291, 362, 281, 3847, 309, 577, 630, 291, 352, 466, 309, 321, 3094, 309, 11611, + 490, 309, 311, 309, 311, 668, 51416], "temperature": 0.0, "avg_logprob": -0.14166661671229772, + "compression_ratio": 1.8160377358490567, "no_speech_prob": 0.009141028858721256}, + {"id": 211, "seek": 134352, "start": 1364.56, "end": 1369.76, "text": " an evolving + thing 
but they''re they''re definitely our other reader lulm''s out there the key + is to", "tokens": [51416, 364, 21085, 551, 457, 436, 434, 436, 434, 2138, 527, 661, + 15149, 287, 425, 76, 311, 484, 456, 264, 2141, 307, 281, 51676], "temperature": + 0.0, "avg_logprob": -0.14166661671229772, "compression_ratio": 1.8160377358490567, + "no_speech_prob": 0.009141028858721256}, {"id": 212, "seek": 136976, "start": 1369.76, + "end": 1374.4, "text": " preserve the structure right and the pieces the structure + that allows you to do similarity we", "tokens": [50364, 15665, 264, 3877, 558, 293, + 264, 3755, 264, 3877, 300, 4045, 291, 281, 360, 32194, 321, 50596], "temperature": + 0.0, "avg_logprob": -0.10438672115928248, "compression_ratio": 1.8314176245210727, + "no_speech_prob": 0.0012780092656612396}, {"id": 213, "seek": 136976, "start": 1374.4, + "end": 1378.96, "text": " implemented our own similarity and other algos we also + do things like named entity recognition and", "tokens": [50596, 12270, 527, 1065, + 32194, 293, 661, 3501, 329, 321, 611, 360, 721, 411, 4926, 13977, 11150, 293, 50824], + "temperature": 0.0, "avg_logprob": -0.10438672115928248, "compression_ratio": 1.8314176245210727, + "no_speech_prob": 0.0012780092656612396}, {"id": 214, "seek": 136976, "start": 1378.96, + "end": 1383.84, "text": " sentiment analysis well those are great at that stuff + it can do scoring for machine learning", "tokens": [50824, 16149, 5215, 731, 729, + 366, 869, 412, 300, 1507, 309, 393, 360, 22358, 337, 3479, 2539, 51068], "temperature": + 0.0, "avg_logprob": -0.10438672115928248, "compression_ratio": 1.8314176245210727, + "no_speech_prob": 0.0012780092656612396}, {"id": 215, "seek": 136976, "start": 1383.84, + "end": 1388.8799999999999, "text": " purposes right so we have a nice intention + detection system now that will essentially based on", "tokens": [51068, 9932, 558, + 370, 321, 362, 257, 1481, 7789, 17784, 1185, 586, 300, 486, 4476, 2361, 322, 51320], + "temperature": 0.0, 
"avg_logprob": -0.10438672115928248, "compression_ratio": 1.8314176245210727, + "no_speech_prob": 0.0012780092656612396}, {"id": 216, "seek": 136976, "start": 1388.8799999999999, + "end": 1393.76, "text": " the responses that you get to tell you which sources are + most relevant right up front right based", "tokens": [51320, 264, 13019, 300, 291, + 483, 281, 980, 291, 597, 7139, 366, 881, 7340, 558, 493, 1868, 558, 2361, 51564], + "temperature": 0.0, "avg_logprob": -0.10438672115928248, "compression_ratio": 1.8314176245210727, + "no_speech_prob": 0.0012780092656612396}, {"id": 217, "seek": 139376, "start": 1393.76, + "end": 1398.24, "text": " on the responses and also optionally ratings right if + you want to bring bring that into the system", "tokens": [50364, 322, 264, 13019, + 293, 611, 3614, 379, 24603, 558, 498, 291, 528, 281, 1565, 1565, 300, 666, 264, + 1185, 50588], "temperature": 0.0, "avg_logprob": -0.10572098348742333, "compression_ratio": + 1.8314176245210727, "no_speech_prob": 0.001105372211895883}, {"id": 218, "seek": + 139376, "start": 1399.44, "end": 1404.72, "text": " passage detection is totally + in response our reader lulm''s passage detection is totally in", "tokens": [50648, + 11497, 17784, 307, 3879, 294, 4134, 527, 15149, 287, 425, 76, 311, 11497, 17784, + 307, 3879, 294, 50912], "temperature": 0.0, "avg_logprob": -0.10572098348742333, + "compression_ratio": 1.8314176245210727, "no_speech_prob": 0.001105372211895883}, + {"id": 219, "seek": 139376, "start": 1404.72, "end": 1410.96, "text": " response + to the problem you described right which is the data is messy and we don''t necessarily + want", "tokens": [50912, 4134, 281, 264, 1154, 291, 7619, 558, 597, 307, 264, 1412, + 307, 16191, 293, 321, 500, 380, 4725, 528, 51224], "temperature": 0.0, "avg_logprob": + -0.10572098348742333, "compression_ratio": 1.8314176245210727, "no_speech_prob": + 0.001105372211895883}, {"id": 220, "seek": 139376, "start": 1410.96, "end": 1417.68, + "text": " to 
ship a you know 500 100 page PDFs that have essentially the same data + so there we it finds the", "tokens": [51224, 281, 5374, 257, 291, 458, 5923, 2319, + 3028, 17752, 82, 300, 362, 4476, 264, 912, 1412, 370, 456, 321, 309, 10704, 264, + 51560], "temperature": 0.0, "avg_logprob": -0.10572098348742333, "compression_ratio": + 1.8314176245210727, "no_speech_prob": 0.001105372211895883}, {"id": 221, "seek": + 139376, "start": 1417.68, "end": 1421.6, "text": " most relevant passage super quickly + and truncates the document down to a window around it", "tokens": [51560, 881, 7340, + 11497, 1687, 2661, 293, 504, 409, 66, 1024, 264, 4166, 760, 281, 257, 4910, 926, + 309, 51756], "temperature": 0.0, "avg_logprob": -0.10572098348742333, "compression_ratio": + 1.8314176245210727, "no_speech_prob": 0.001105372211895883}, {"id": 222, "seek": + 142160, "start": 1422.56, "end": 1426.8, "text": " um those those are the things + and it we''ve really implemented it ourselves it''s our it''s our own", "tokens": + [50412, 1105, 729, 729, 366, 264, 721, 293, 309, 321, 600, 534, 12270, 309, 4175, + 309, 311, 527, 309, 311, 527, 1065, 50624], "temperature": 0.0, "avg_logprob": -0.1914636960593603, + "compression_ratio": 1.8066037735849056, "no_speech_prob": 0.010378072038292885}, + {"id": 223, "seek": 142160, "start": 1426.8, "end": 1431.9199999999998, "text": + " it''s our own creation oh that''s fantastic so that''s your secret source as well + I mean that''s", "tokens": [50624, 309, 311, 527, 1065, 8016, 1954, 300, 311, 5456, + 370, 300, 311, 428, 4054, 4009, 382, 731, 286, 914, 300, 311, 50880], "temperature": + 0.0, "avg_logprob": -0.1914636960593603, "compression_ratio": 1.8066037735849056, + "no_speech_prob": 0.010378072038292885}, {"id": 224, "seek": 142160, "start": 1431.9199999999998, + "end": 1438.56, "text": " that''s something to be proud of and also I want to sort + of close up that the sort of you know", "tokens": [50880, 300, 311, 746, 281, 312, + 4570, 295, 293, 611, 
286, 528, 281, 1333, 295, 1998, 493, 300, 264, 1333, 295, 291, + 458, 51212], "temperature": 0.0, "avg_logprob": -0.1914636960593603, "compression_ratio": + 1.8066037735849056, "no_speech_prob": 0.010378072038292885}, {"id": 225, "seek": + 142160, "start": 1438.56, "end": 1446.8799999999999, "text": " description that + he gave or maybe looking at the future does world have some way of feedback do you", + "tokens": [51212, 3855, 300, 415, 2729, 420, 1310, 1237, 412, 264, 2027, 775, 1002, + 362, 512, 636, 295, 5824, 360, 291, 51628], "temperature": 0.0, "avg_logprob": -0.1914636960593603, + "compression_ratio": 1.8066037735849056, "no_speech_prob": 0.010378072038292885}, + {"id": 226, "seek": 144688, "start": 1446.88, "end": 1453.5200000000002, "text": + " plan to if not do you plan to implement do you think it''s reasonable to have + a feedback loop you", "tokens": [50364, 1393, 281, 498, 406, 360, 291, 1393, 281, + 4445, 360, 291, 519, 309, 311, 10585, 281, 362, 257, 5824, 6367, 291, 50696], "temperature": + 0.0, "avg_logprob": -0.14271168856276678, "compression_ratio": 1.7092511013215859, + "no_speech_prob": 0.01309046521782875}, {"id": 227, "seek": 144688, "start": 1453.5200000000002, + "end": 1458.4, "text": " know like in chat jpd you can say thumbs up thumbs down + you cannot say much you know you can say", "tokens": [50696, 458, 411, 294, 5081, + 361, 79, 67, 291, 393, 584, 8838, 493, 8838, 760, 291, 2644, 584, 709, 291, 458, + 291, 393, 584, 50940], "temperature": 0.0, "avg_logprob": -0.14271168856276678, + "compression_ratio": 1.7092511013215859, "no_speech_prob": 0.01309046521782875}, + {"id": 228, "seek": 144688, "start": 1458.4, "end": 1463.68, "text": " this was + the answer I don''t know if that''s gonna go into the loop but whatever because + it gives me", "tokens": [50940, 341, 390, 264, 1867, 286, 500, 380, 458, 498, 300, + 311, 799, 352, 666, 264, 6367, 457, 2035, 570, 309, 2709, 385, 51204], "temperature": + 0.0, "avg_logprob": 
-0.14271168856276678, "compression_ratio": 1.7092511013215859, + "no_speech_prob": 0.01309046521782875}, {"id": 229, "seek": 144688, "start": 1463.68, + "end": 1472.4, "text": " the the joy of sort of completing it um yeah oh yes so + when swirl AI connect is deployed in the", "tokens": [51204, 264, 264, 6258, 295, + 1333, 295, 19472, 309, 1105, 1338, 1954, 2086, 370, 562, 30310, 7318, 1745, 307, + 17826, 294, 264, 51640], "temperature": 0.0, "avg_logprob": -0.14271168856276678, + "compression_ratio": 1.7092511013215859, "no_speech_prob": 0.01309046521782875}, + {"id": 230, "seek": 147240, "start": 1472.4, "end": 1478.16, "text": " enterprise + where starters you get to connect the data and get rag and your choice of AI''s + and by", "tokens": [50364, 14132, 689, 35131, 291, 483, 281, 1745, 264, 1412, 293, + 483, 17539, 293, 428, 3922, 295, 7318, 311, 293, 538, 50652], "temperature": 0.0, + "avg_logprob": -0.13804718653361003, "compression_ratio": 1.9254901960784314, "no_speech_prob": + 0.0011625030310824513}, {"id": 231, "seek": 147240, "start": 1478.16, "end": 1482.0800000000002, + "text": " the way it''s again configuration for the AI you put your keys in right + check pick the model and you", "tokens": [50652, 264, 636, 309, 311, 797, 11694, + 337, 264, 7318, 291, 829, 428, 9317, 294, 558, 1520, 1888, 264, 2316, 293, 291, + 50848], "temperature": 0.0, "avg_logprob": -0.13804718653361003, "compression_ratio": + 1.9254901960784314, "no_speech_prob": 0.0011625030310824513}, {"id": 232, "seek": + 147240, "start": 1482.0800000000002, "end": 1487.0400000000002, "text": " can rag + against it you can also choose the role you want to use different generative AI + in rights you", "tokens": [50848, 393, 17539, 1970, 309, 291, 393, 611, 2826, 264, + 3090, 291, 528, 281, 764, 819, 1337, 1166, 7318, 294, 4601, 291, 51096], "temperature": + 0.0, "avg_logprob": -0.13804718653361003, "compression_ratio": 1.9254901960784314, + "no_speech_prob": 0.0011625030310824513}, {"id": 
233, "seek": 147240, "start": 1487.0400000000002, + "end": 1491.1200000000001, "text": " can use it for query writing you can use it + for direct answer you can use it for rag if it has", "tokens": [51096, 393, 764, + 309, 337, 14581, 3579, 291, 393, 764, 309, 337, 2047, 1867, 291, 393, 764, 309, + 337, 17539, 498, 309, 575, 51300], "temperature": 0.0, "avg_logprob": -0.13804718653361003, + "compression_ratio": 1.9254901960784314, "no_speech_prob": 0.0011625030310824513}, + {"id": 234, "seek": 147240, "start": 1491.1200000000001, "end": 1496.8000000000002, + "text": " embeddings you can use that to power the the reader L so just just to + be clear there it''s uh it''s", "tokens": [51300, 12240, 29432, 291, 393, 764, 300, + 281, 1347, 264, 264, 15149, 441, 370, 445, 445, 281, 312, 1850, 456, 309, 311, 2232, + 309, 311, 51584], "temperature": 0.0, "avg_logprob": -0.13804718653361003, "compression_ratio": + 1.9254901960784314, "no_speech_prob": 0.0011625030310824513}, {"id": 235, "seek": + 149680, "start": 1496.8, "end": 1502.6399999999999, "text": " a bit more flexible + yeah i was asking about feedback right so like do you plan do you have it and", + "tokens": [50364, 257, 857, 544, 11358, 1338, 741, 390, 3365, 466, 5824, 558, 370, + 411, 360, 291, 1393, 360, 291, 362, 309, 293, 50656], "temperature": 0.0, "avg_logprob": + -0.0796822767991286, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.00028872390976175666}, {"id": 236, "seek": 149680, "start": 1502.6399999999999, + "end": 1508.6399999999999, "text": " if not do you plan to think it''s reasonable + to have it right absolutely so after you deploy AI", "tokens": [50656, 498, 406, + 360, 291, 1393, 281, 519, 309, 311, 10585, 281, 362, 309, 558, 3122, 370, 934, 291, + 7274, 7318, 50956], "temperature": 0.0, "avg_logprob": -0.0796822767991286, "compression_ratio": + 1.7419354838709677, "no_speech_prob": 0.00028872390976175666}, {"id": 237, "seek": + 149680, "start": 1508.6399999999999, "end": 
1513.9199999999998, "text": " connect + as mentioned you get those abilities then we have an analytics package which will + give you", "tokens": [50956, 1745, 382, 2835, 291, 483, 729, 11582, 550, 321, 362, + 364, 15370, 7372, 597, 486, 976, 291, 51220], "temperature": 0.0, "avg_logprob": + -0.0796822767991286, "compression_ratio": 1.7419354838709677, "no_speech_prob": + 0.00028872390976175666}, {"id": 238, "seek": 149680, "start": 1513.9199999999998, + "end": 1520.6399999999999, "text": " insight as to which sources are providing the + most relevant responses and rating and putting all", "tokens": [51220, 11269, 382, + 281, 597, 7139, 366, 6530, 264, 881, 7340, 13019, 293, 10990, 293, 3372, 439, 51556], + "temperature": 0.0, "avg_logprob": -0.0796822767991286, "compression_ratio": 1.7419354838709677, + "no_speech_prob": 0.00028872390976175666}, {"id": 239, "seek": 149680, "start": + 1520.6399999999999, "end": 1525.04, "text": " of that into dashboards understanding + who are the number one users who write the best prompts who", "tokens": [51556, + 295, 300, 666, 8240, 17228, 3701, 567, 366, 264, 1230, 472, 5022, 567, 2464, 264, + 1151, 41095, 567, 51776], "temperature": 0.0, "avg_logprob": -0.0796822767991286, + "compression_ratio": 1.7419354838709677, "no_speech_prob": 0.00028872390976175666}, + {"id": 240, "seek": 152504, "start": 1525.04, "end": 1529.68, "text": " get which + sources produce the best results which prompts you absolutely is all part of the", + "tokens": [50364, 483, 597, 7139, 5258, 264, 1151, 3542, 597, 41095, 291, 3122, + 307, 439, 644, 295, 264, 50596], "temperature": 0.0, "avg_logprob": -0.14850575455995363, + "compression_ratio": 1.743682310469314, "no_speech_prob": 0.00039235540316440165}, + {"id": 241, "seek": 152504, "start": 1530.3999999999999, "end": 1537.28, "text": + " the offering and ultimately it''s part of what we tailor right for the for the + deployment and again", "tokens": [50632, 264, 8745, 293, 6284, 309, 311, 644, 295, + 
437, 321, 33068, 558, 337, 264, 337, 264, 19317, 293, 797, 50976], "temperature": + 0.0, "avg_logprob": -0.14850575455995363, "compression_ratio": 1.743682310469314, + "no_speech_prob": 0.00039235540316440165}, {"id": 242, "seek": 152504, "start": + 1537.28, "end": 1543.04, "text": " that can be on premises but AI connect is the + key because it''s collecting that data on again always", "tokens": [50976, 300, + 393, 312, 322, 34266, 457, 7318, 1745, 307, 264, 2141, 570, 309, 311, 12510, 300, + 1412, 322, 797, 1009, 51264], "temperature": 0.0, "avg_logprob": -0.14850575455995363, + "compression_ratio": 1.743682310469314, "no_speech_prob": 0.00039235540316440165}, + {"id": 243, "seek": 152504, "start": 1543.04, "end": 1548.24, "text": " in the customers + private cloud like we don''t see it we''re not sass but that data is absolutely", + "tokens": [51264, 294, 264, 4581, 4551, 4588, 411, 321, 500, 380, 536, 309, 321, + 434, 406, 262, 640, 457, 300, 1412, 307, 3122, 51524], "temperature": 0.0, "avg_logprob": + -0.14850575455995363, "compression_ratio": 1.743682310469314, "no_speech_prob": + 0.00039235540316440165}, {"id": 244, "seek": 152504, "start": 1548.24, "end": 1552.8799999999999, + "text": " turnable into gold a variety of different gold things and so you can hopefully + figure out which AI", "tokens": [51524, 1261, 712, 666, 3821, 257, 5673, 295, 819, + 3821, 721, 293, 370, 291, 393, 4696, 2573, 484, 597, 7318, 51756], "temperature": + 0.0, "avg_logprob": -0.14850575455995363, "compression_ratio": 1.743682310469314, + "no_speech_prob": 0.00039235540316440165}, {"id": 245, "seek": 155288, "start": + 1552.88, "end": 1557.6000000000001, "text": " works best for which groups you can + figure out which sources right are providing the best", "tokens": [50364, 1985, + 1151, 337, 597, 3935, 291, 393, 2573, 484, 597, 7139, 558, 366, 6530, 264, 1151, + 50600], "temperature": 0.0, "avg_logprob": -0.08563851375205844, "compression_ratio": + 1.7224334600760456, 
"no_speech_prob": 0.004041314125061035}, {"id": 246, "seek": + 155288, "start": 1557.6000000000001, "end": 1566.4, "text": " input for rag etc + yeah that''s fantastic i love that you do have feedback uh i think it''s definitely", + "tokens": [50600, 4846, 337, 17539, 5183, 1338, 300, 311, 5456, 741, 959, 300, 291, + 360, 362, 5824, 2232, 741, 519, 309, 311, 2138, 51040], "temperature": 0.0, "avg_logprob": + -0.08563851375205844, "compression_ratio": 1.7224334600760456, "no_speech_prob": + 0.004041314125061035}, {"id": 247, "seek": 155288, "start": 1566.4, "end": 1570.5600000000002, + "text": " gold it could could also be super messy and noisy and stuff but it''s + better than", "tokens": [51040, 3821, 309, 727, 727, 611, 312, 1687, 16191, 293, + 24518, 293, 1507, 457, 309, 311, 1101, 813, 51248], "temperature": 0.0, "avg_logprob": + -0.08563851375205844, "compression_ratio": 1.7224334600760456, "no_speech_prob": + 0.004041314125061035}, {"id": 248, "seek": 155288, "start": 1571.3600000000001, + "end": 1577.92, "text": " absence of it um yeah that''s amazing maybe like in the + past year so you''ve been deploying", "tokens": [51288, 17145, 295, 309, 1105, 1338, + 300, 311, 2243, 1310, 411, 294, 264, 1791, 1064, 370, 291, 600, 668, 34198, 51616], + "temperature": 0.0, "avg_logprob": -0.08563851375205844, "compression_ratio": 1.7224334600760456, + "no_speech_prob": 0.004041314125061035}, {"id": 249, "seek": 155288, "start": 1577.92, + "end": 1582.5600000000002, "text": " this with clients obviously you don''t have + to mention the names but was there something that", "tokens": [51616, 341, 365, + 6982, 2745, 291, 500, 380, 362, 281, 2152, 264, 5288, 457, 390, 456, 746, 300, 51848], + "temperature": 0.0, "avg_logprob": -0.08563851375205844, "compression_ratio": 1.7224334600760456, + "no_speech_prob": 0.004041314125061035}, {"id": 250, "seek": 158256, "start": 1582.6399999999999, + "end": 1590.1599999999999, "text": " surprised you how clients you know sort of + 
perceived uh swirl yeah i i would say that um", "tokens": [50368, 6100, 291, 577, + 6982, 291, 458, 1333, 295, 19049, 2232, 30310, 1338, 741, 741, 576, 584, 300, 1105, + 50744], "temperature": 0.0, "avg_logprob": -0.13787886229428378, "compression_ratio": + 1.6724890829694323, "no_speech_prob": 0.00022580412041861564}, {"id": 251, "seek": + 158256, "start": 1591.52, "end": 1599.36, "text": " people have not really been + looking for search if i''m the AI the explosion of AI and the excitement", "tokens": + [50812, 561, 362, 406, 534, 668, 1237, 337, 3164, 498, 741, 478, 264, 7318, 264, + 15673, 295, 7318, 293, 264, 14755, 51204], "temperature": 0.0, "avg_logprob": -0.13787886229428378, + "compression_ratio": 1.6724890829694323, "no_speech_prob": 0.00022580412041861564}, + {"id": 252, "seek": 158256, "start": 1599.36, "end": 1605.52, "text": " around AI + kind of crowded everything out round everything out so that''s why i think so many + of these", "tokens": [51204, 926, 7318, 733, 295, 21634, 1203, 484, 3098, 1203, + 484, 370, 300, 311, 983, 741, 519, 370, 867, 295, 613, 51512], "temperature": 0.0, + "avg_logprob": -0.13787886229428378, "compression_ratio": 1.6724890829694323, "no_speech_prob": + 0.00022580412041861564}, {"id": 253, "seek": 158256, "start": 1605.52, "end": 1611.36, + "text": " copy in architectures were got so much momentum but and i by the way i + think people are doing", "tokens": [51512, 5055, 294, 6331, 1303, 645, 658, 370, + 709, 11244, 457, 293, 741, 538, 264, 636, 741, 519, 561, 366, 884, 51804], "temperature": + 0.0, "avg_logprob": -0.13787886229428378, "compression_ratio": 1.6724890829694323, + "no_speech_prob": 0.00022580412041861564}, {"id": 254, "seek": 161136, "start": + 1611.36, "end": 1615.04, "text": " incredible stuff with that so it''s not like + those aren''t perfectly legitimate i mean every database", "tokens": [50364, 4651, + 1507, 365, 300, 370, 309, 311, 406, 411, 729, 3212, 380, 6239, 17956, 741, 914, + 633, 8149, 50548], 
"temperature": 0.0, "avg_logprob": -0.10960712271221613, "compression_ratio": + 1.8014440433212997, "no_speech_prob": 0.0005734601290896535}, {"id": 255, "seek": + 161136, "start": 1615.04, "end": 1620.08, "text": " ever starts that way right you + put the data in and you get inside of it''s just that there''s a bit more", "tokens": + [50548, 1562, 3719, 300, 636, 558, 291, 829, 264, 1412, 294, 293, 291, 483, 1854, + 295, 309, 311, 445, 300, 456, 311, 257, 857, 544, 50800], "temperature": 0.0, "avg_logprob": + -0.10960712271221613, "compression_ratio": 1.8014440433212997, "no_speech_prob": + 0.0005734601290896535}, {"id": 256, "seek": 161136, "start": 1620.08, "end": 1624.3999999999999, + "text": " to the story right there''s a whole other world of well there''s a lot + of these and i just moved them", "tokens": [50800, 281, 264, 1657, 558, 456, 311, + 257, 1379, 661, 1002, 295, 731, 456, 311, 257, 688, 295, 613, 293, 741, 445, 4259, + 552, 51016], "temperature": 0.0, "avg_logprob": -0.10960712271221613, "compression_ratio": + 1.8014440433212997, "no_speech_prob": 0.0005734601290896535}, {"id": 257, "seek": + 161136, "start": 1624.3999999999999, "end": 1629.84, "text": " to cloud and i can''t + necessarily do it but people weren''t thinking search i''m not sure what they", + "tokens": [51016, 281, 4588, 293, 741, 393, 380, 4725, 360, 309, 457, 561, 4999, + 380, 1953, 3164, 741, 478, 406, 988, 437, 436, 51288], "temperature": 0.0, "avg_logprob": + -0.10960712271221613, "compression_ratio": 1.8014440433212997, "no_speech_prob": + 0.0005734601290896535}, {"id": 258, "seek": 161136, "start": 1629.84, "end": 1634.8, + "text": " believed the answer would be but uh there were some excellent posts there + was one on linkedin by um", "tokens": [51288, 7847, 264, 1867, 576, 312, 457, 2232, + 456, 645, 512, 7103, 12300, 456, 390, 472, 322, 9408, 259, 538, 1105, 51536], "temperature": + 0.0, "avg_logprob": -0.10960712271221613, "compression_ratio": 1.8014440433212997, + 
"no_speech_prob": 0.0005734601290896535}, {"id": 259, "seek": 163480, "start": 1635.76, + "end": 1640.48, "text": " vector ventures i think or um i should probably get the + name right but in any event they", "tokens": [50412, 8062, 6931, 1303, 741, 519, + 420, 1105, 741, 820, 1391, 483, 264, 1315, 558, 457, 294, 604, 2280, 436, 50648], + "temperature": 0.0, "avg_logprob": -0.13112862604968953, "compression_ratio": 1.7410071942446044, + "no_speech_prob": 0.0007252498180605471}, {"id": 260, "seek": 163480, "start": 1641.44, + "end": 1647.28, "text": " published an excellent piece about how search is probably + the answer to making AI work in a lot", "tokens": [50696, 6572, 364, 7103, 2522, + 466, 577, 3164, 307, 1391, 264, 1867, 281, 1455, 7318, 589, 294, 257, 688, 50988], + "temperature": 0.0, "avg_logprob": -0.13112862604968953, "compression_ratio": 1.7410071942446044, + "no_speech_prob": 0.0007252498180605471}, {"id": 261, "seek": 163480, "start": 1647.28, + "end": 1651.76, "text": " of these cases and uh you know they also point out there''s + not that many people who have come at it", "tokens": [50988, 295, 613, 3331, 293, + 2232, 291, 458, 436, 611, 935, 484, 456, 311, 406, 300, 867, 561, 567, 362, 808, + 412, 309, 51212], "temperature": 0.0, "avg_logprob": -0.13112862604968953, "compression_ratio": + 1.7410071942446044, "no_speech_prob": 0.0007252498180605471}, {"id": 262, "seek": + 163480, "start": 1651.76, "end": 1657.04, "text": " from the search perspective + so that was a bit surprising to me because the large enterprise has", "tokens": + [51212, 490, 264, 3164, 4585, 370, 300, 390, 257, 857, 8830, 281, 385, 570, 264, + 2416, 14132, 575, 51476], "temperature": 0.0, "avg_logprob": -0.13112862604968953, + "compression_ratio": 1.7410071942446044, "no_speech_prob": 0.0007252498180605471}, + {"id": 263, "seek": 163480, "start": 1657.84, "end": 1664.48, "text": " always loved + search always uh because that that''s how knowledge workers and people get stuff + 
done right", "tokens": [51516, 1009, 4333, 3164, 1009, 2232, 570, 300, 300, 311, + 577, 3601, 5600, 293, 561, 483, 1507, 1096, 558, 51848], "temperature": 0.0, "avg_logprob": + -0.13112862604968953, "compression_ratio": 1.7410071942446044, "no_speech_prob": + 0.0007252498180605471}, {"id": 264, "seek": 166448, "start": 1664.48, "end": 1668.56, + "text": " it''s yes you have business intelligence and dashboards and reporting + and we like those things but", "tokens": [50364, 309, 311, 2086, 291, 362, 1606, + 7599, 293, 8240, 17228, 293, 10031, 293, 321, 411, 729, 721, 457, 50568], "temperature": + 0.0, "avg_logprob": -0.12428855895996094, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.00013345517800189555}, {"id": 265, "seek": 166448, "start": + 1669.28, "end": 1674.0, "text": " so much of the qualitative why did things happen + explain it to me how do i solve this", "tokens": [50604, 370, 709, 295, 264, 31312, + 983, 630, 721, 1051, 2903, 309, 281, 385, 577, 360, 741, 5039, 341, 50840], "temperature": + 0.0, "avg_logprob": -0.12428855895996094, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.00013345517800189555}, {"id": 266, "seek": 166448, "start": + 1674.56, "end": 1680.8, "text": " that''s been something that search did a good + job of um ultimately it''s a technology right and the", "tokens": [50868, 300, 311, + 668, 746, 300, 3164, 630, 257, 665, 1691, 295, 1105, 6284, 309, 311, 257, 2899, + 558, 293, 264, 51180], "temperature": 0.0, "avg_logprob": -0.12428855895996094, + "compression_ratio": 1.6888888888888889, "no_speech_prob": 0.00013345517800189555}, + {"id": 267, "seek": 166448, "start": 1680.8, "end": 1690.32, "text": " marriage + of search with llm that seems to be the unlocker if you will in the enterprise and + that''s", "tokens": [51180, 7194, 295, 3164, 365, 287, 75, 76, 300, 2544, 281, 312, + 264, 11634, 260, 498, 291, 486, 294, 264, 14132, 293, 300, 311, 51656], "temperature": + 0.0, "avg_logprob": 
-0.12428855895996094, "compression_ratio": 1.6888888888888889, + "no_speech_prob": 0.00013345517800189555}, {"id": 268, "seek": 169032, "start": + 1690.3999999999999, "end": 1694.8, "text": " that was surprising to me right i i + thought that there would be much more of a search first approach", "tokens": [50368, + 300, 390, 8830, 281, 385, 558, 741, 741, 1194, 300, 456, 576, 312, 709, 544, 295, + 257, 3164, 700, 3109, 50588], "temperature": 0.0, "avg_logprob": -0.10382959577772352, + "compression_ratio": 1.7741935483870968, "no_speech_prob": 0.001205646782182157}, + {"id": 269, "seek": 169032, "start": 1694.8, "end": 1699.2, "text": " and i think + everybody had to get through the understanding of what it means to adopt AI and + how", "tokens": [50588, 293, 741, 519, 2201, 632, 281, 483, 807, 264, 3701, 295, + 437, 309, 1355, 281, 6878, 7318, 293, 577, 50808], "temperature": 0.0, "avg_logprob": + -0.10382959577772352, "compression_ratio": 1.7741935483870968, "no_speech_prob": + 0.001205646782182157}, {"id": 270, "seek": 169032, "start": 1699.2, "end": 1706.48, + "text": " the first generation works and now i think people are recognizing real + time real time architecture using", "tokens": [50808, 264, 700, 5125, 1985, 293, + 586, 741, 519, 561, 366, 18538, 957, 565, 957, 565, 9482, 1228, 51172], "temperature": + 0.0, "avg_logprob": -0.10382959577772352, "compression_ratio": 1.7741935483870968, + "no_speech_prob": 0.001205646782182157}, {"id": 271, "seek": 169032, "start": 1706.48, + "end": 1713.04, "text": " systems of record with re-ranking and and then just stay + keeping up right with the incredible", "tokens": [51172, 3652, 295, 2136, 365, 319, + 12, 20479, 278, 293, 293, 550, 445, 1754, 5145, 493, 558, 365, 264, 4651, 51500], + "temperature": 0.0, "avg_logprob": -0.10382959577772352, "compression_ratio": 1.7741935483870968, + "no_speech_prob": 0.001205646782182157}, {"id": 272, "seek": 169032, "start": 1713.04, + "end": 1718.3999999999999, "text": " 
innovation in it in let''s call it generative + AI right that''s that''s the interesting thing but again", "tokens": [51500, 8504, + 294, 309, 294, 718, 311, 818, 309, 1337, 1166, 7318, 558, 300, 311, 300, 311, 264, + 1880, 551, 457, 797, 51768], "temperature": 0.0, "avg_logprob": -0.10382959577772352, + "compression_ratio": 1.7741935483870968, "no_speech_prob": 0.001205646782182157}, + {"id": 273, "seek": 171840, "start": 1718.5600000000002, "end": 1722.72, "text": + " i come back to what i said at the beginning there''s going to be a many many incredible + generative", "tokens": [50372, 741, 808, 646, 281, 437, 741, 848, 412, 264, 2863, + 456, 311, 516, 281, 312, 257, 867, 867, 4651, 1337, 1166, 50580], "temperature": + 0.0, "avg_logprob": -0.14686921161154043, "compression_ratio": 1.862962962962963, + "no_speech_prob": 0.0010927822440862656}, {"id": 274, "seek": 171840, "start": 1722.72, + "end": 1727.76, "text": " AI''s they''re going to do different things that we haven''t + even seen i don''t think the most extraordinary", "tokens": [50580, 7318, 311, 436, + 434, 516, 281, 360, 819, 721, 300, 321, 2378, 380, 754, 1612, 741, 500, 380, 519, + 264, 881, 10581, 50832], "temperature": 0.0, "avg_logprob": -0.14686921161154043, + "compression_ratio": 1.862962962962963, "no_speech_prob": 0.0010927822440862656}, + {"id": 275, "seek": 171840, "start": 1727.76, "end": 1732.16, "text": " ones they''ll + be from the big science publishers they''re going to build incredible life sciences + gai", "tokens": [50832, 2306, 436, 603, 312, 490, 264, 955, 3497, 30421, 436, 434, + 516, 281, 1322, 4651, 993, 17677, 290, 1301, 51052], "temperature": 0.0, "avg_logprob": + -0.14686921161154043, "compression_ratio": 1.862962962962963, "no_speech_prob": + 0.0010927822440862656}, {"id": 276, "seek": 171840, "start": 1732.8000000000002, + "end": 1739.2800000000002, "text": " i''m sure people like bloomberg ft you''re + going to build incredible um financial ones and that''s great", 
"tokens": [51084, + 741, 478, 988, 561, 411, 26899, 6873, 283, 83, 291, 434, 516, 281, 1322, 4651, 1105, + 4669, 2306, 293, 300, 311, 869, 51408], "temperature": 0.0, "avg_logprob": -0.14686921161154043, + "compression_ratio": 1.862962962962963, "no_speech_prob": 0.0010927822440862656}, + {"id": 277, "seek": 171840, "start": 1739.92, "end": 1747.52, "text": " but all + of those still need your data to operate on your environment and give you answers + that are", "tokens": [51440, 457, 439, 295, 729, 920, 643, 428, 1412, 281, 9651, + 322, 428, 2823, 293, 976, 291, 6338, 300, 366, 51820], "temperature": 0.0, "avg_logprob": + -0.14686921161154043, "compression_ratio": 1.862962962962963, "no_speech_prob": + 0.0010927822440862656}, {"id": 278, "seek": 174752, "start": 1747.52, "end": 1753.68, + "text": " meaningful and that problem that problem is the problem that''s world + solves so just to understand", "tokens": [50364, 10995, 293, 300, 1154, 300, 1154, + 307, 264, 1154, 300, 311, 1002, 39890, 370, 445, 281, 1223, 50672], "temperature": + 0.0, "avg_logprob": -0.1668013466729058, "compression_ratio": 1.813953488372093, + "no_speech_prob": 0.0017082623671740294}, {"id": 279, "seek": 174752, "start": 1753.68, + "end": 1761.44, "text": " what were they expecting was it like chat interface or + you said they didn''t expect search no they", "tokens": [50672, 437, 645, 436, 9650, + 390, 309, 411, 5081, 9226, 420, 291, 848, 436, 994, 380, 2066, 3164, 572, 436, 51060], + "temperature": 0.0, "avg_logprob": -0.1668013466729058, "compression_ratio": 1.813953488372093, + "no_speech_prob": 0.0017082623671740294}, {"id": 280, "seek": 174752, "start": 1761.44, + "end": 1768.0, "text": " thought they would ask the question oh yeah so they thought + like a chat right yeah that everybody", "tokens": [51060, 1194, 436, 576, 1029, + 264, 1168, 1954, 1338, 370, 436, 1194, 411, 257, 5081, 558, 1338, 300, 2201, 51388], + "temperature": 0.0, "avg_logprob": -0.1668013466729058, 
"compression_ratio": 1.813953488372093, + "no_speech_prob": 0.0017082623671740294}, {"id": 281, "seek": 174752, "start": 1768.0, + "end": 1772.96, "text": " wants kind of a conversational interface you know another + thing i learned actually you''re really", "tokens": [51388, 2738, 733, 295, 257, + 2615, 1478, 9226, 291, 458, 1071, 551, 741, 3264, 767, 291, 434, 534, 51636], "temperature": + 0.0, "avg_logprob": -0.1668013466729058, "compression_ratio": 1.813953488372093, + "no_speech_prob": 0.0017082623671740294}, {"id": 282, "seek": 177296, "start": 1772.96, + "end": 1780.24, "text": " reminded me people are not so interested in the idea that + there is an AI place that you go i think", "tokens": [50364, 15920, 385, 561, 366, + 406, 370, 3102, 294, 264, 1558, 300, 456, 307, 364, 7318, 1081, 300, 291, 352, 741, + 519, 50728], "temperature": 0.0, "avg_logprob": -0.0865283693586077, "compression_ratio": + 1.6926406926406927, "no_speech_prob": 0.0011206247145310044}, {"id": 283, "seek": + 177296, "start": 1780.24, "end": 1785.52, "text": " another very logical step is + business folks knowledge workers they would like to use the channels", "tokens": + [50728, 1071, 588, 14978, 1823, 307, 1606, 4024, 3601, 5600, 436, 576, 411, 281, + 764, 264, 9235, 50992], "temperature": 0.0, "avg_logprob": -0.0865283693586077, + "compression_ratio": 1.6926406926406927, "no_speech_prob": 0.0011206247145310044}, + {"id": 284, "seek": 177296, "start": 1785.52, "end": 1790.32, "text": " they use + today so rather than i have to have a new place to go why can''t i talk to it on + teens", "tokens": [50992, 436, 764, 965, 370, 2831, 813, 741, 362, 281, 362, 257, + 777, 1081, 281, 352, 983, 393, 380, 741, 751, 281, 309, 322, 24849, 51232], "temperature": + 0.0, "avg_logprob": -0.0865283693586077, "compression_ratio": 1.6926406926406927, + "no_speech_prob": 0.0011206247145310044}, {"id": 285, "seek": 177296, "start": 1791.04, + "end": 1799.52, "text": " or in slack or on my whatsapp why 
can''t i text it um + if i need to get a visual i could always go to", "tokens": [51268, 420, 294, 29767, + 420, 322, 452, 29625, 1746, 983, 393, 380, 741, 2487, 309, 1105, 498, 741, 643, + 281, 483, 257, 5056, 741, 727, 1009, 352, 281, 51692], "temperature": 0.0, "avg_logprob": + -0.0865283693586077, "compression_ratio": 1.6926406926406927, "no_speech_prob": + 0.0011206247145310044}, {"id": 286, "seek": 179952, "start": 1799.52, "end": 1804.8799999999999, + "text": " a screen right and then i could have it show me the underlying data show + me the graph show me the", "tokens": [50364, 257, 2568, 558, 293, 550, 741, 727, + 362, 309, 855, 385, 264, 14217, 1412, 855, 385, 264, 4295, 855, 385, 264, 50632], + "temperature": 0.0, "avg_logprob": -0.1014098403274372, "compression_ratio": 1.7946428571428572, + "no_speech_prob": 0.0011538842227309942}, {"id": 287, "seek": 179952, "start": 1804.8799999999999, + "end": 1812.8799999999999, "text": " chart but for the the the future is not applications + that are destinations the future is an ongoing", "tokens": [50632, 6927, 457, 337, + 264, 264, 264, 2027, 307, 406, 5821, 300, 366, 37787, 264, 2027, 307, 364, 10452, + 51032], "temperature": 0.0, "avg_logprob": -0.1014098403274372, "compression_ratio": + 1.7946428571428572, "no_speech_prob": 0.0011538842227309942}, {"id": 288, "seek": + 179952, "start": 1812.8799999999999, "end": 1818.48, "text": " that i log with the + AI that understands your business your world has access to your data and becomes", + "tokens": [51032, 300, 741, 3565, 365, 264, 7318, 300, 15146, 428, 1606, 428, 1002, + 575, 2105, 281, 428, 1412, 293, 3643, 51312], "temperature": 0.0, "avg_logprob": + -0.1014098403274372, "compression_ratio": 1.7946428571428572, "no_speech_prob": + 0.0011538842227309942}, {"id": 289, "seek": 179952, "start": 1818.48, "end": 1824.0, + "text": " your uh trusted advisor and agent i don''t want to use the word co-pilot + because i think that''s a little", "tokens": [51312, 428, 
2232, 16034, 19161, 293, + 9461, 741, 500, 380, 528, 281, 764, 264, 1349, 598, 12, 79, 31516, 570, 741, 519, + 300, 311, 257, 707, 51588], "temperature": 0.0, "avg_logprob": -0.1014098403274372, + "compression_ratio": 1.7946428571428572, "no_speech_prob": 0.0011538842227309942}, + {"id": 290, "seek": 182400, "start": 1824.96, "end": 1829.68, "text": " it''s much + more your confidant it''s much more your agent it''s going to tell you stuff like + hey", "tokens": [50412, 309, 311, 709, 544, 428, 1497, 327, 394, 309, 311, 709, + 544, 428, 9461, 309, 311, 516, 281, 980, 291, 1507, 411, 4177, 50648], "temperature": + 0.0, "avg_logprob": -0.12206665674845378, "compression_ratio": 1.946843853820598, + "no_speech_prob": 0.004025470931082964}, {"id": 291, "seek": 182400, "start": 1829.92, + "end": 1834.08, "text": " that question you asked last month there''s a very different + answer this month that''s a pretty", "tokens": [50660, 300, 1168, 291, 2351, 1036, + 1618, 456, 311, 257, 588, 819, 1867, 341, 1618, 300, 311, 257, 1238, 50868], "temperature": + 0.0, "avg_logprob": -0.12206665674845378, "compression_ratio": 1.946843853820598, + "no_speech_prob": 0.004025470931082964}, {"id": 292, "seek": 182400, "start": 1834.08, + "end": 1838.4, "text": " interesting thing or it''s going to let you explore so + you know tell me about our customer service", "tokens": [50868, 1880, 551, 420, + 309, 311, 516, 281, 718, 291, 6839, 370, 291, 458, 980, 385, 466, 527, 5474, 2643, + 51084], "temperature": 0.0, "avg_logprob": -0.12206665674845378, "compression_ratio": + 1.946843853820598, "no_speech_prob": 0.004025470931082964}, {"id": 293, "seek": + 182400, "start": 1838.4, "end": 1843.36, "text": " ratings well which region right + disambiguation which was previously something you know that you would", "tokens": + [51084, 24603, 731, 597, 4458, 558, 717, 2173, 328, 16073, 597, 390, 8046, 746, + 291, 458, 300, 291, 576, 51332], "temperature": 0.0, "avg_logprob": -0.12206665674845378, + 
"compression_ratio": 1.946843853820598, "no_speech_prob": 0.004025470931082964}, + {"id": 294, "seek": 182400, "start": 1843.36, "end": 1848.8, "text": " do like through + facets in search right that''s kind of thing that should become more dialogue oriented", + "tokens": [51332, 360, 411, 807, 49752, 294, 3164, 558, 300, 311, 733, 295, 551, + 300, 820, 1813, 544, 10221, 21841, 51604], "temperature": 0.0, "avg_logprob": -0.12206665674845378, + "compression_ratio": 1.946843853820598, "no_speech_prob": 0.004025470931082964}, + {"id": 295, "seek": 182400, "start": 1848.8, "end": 1853.04, "text": " but those + things that''s going to take some time because in order to know how to disambiguate + you", "tokens": [51604, 457, 729, 721, 300, 311, 516, 281, 747, 512, 565, 570, 294, + 1668, 281, 458, 577, 281, 717, 2173, 328, 10107, 291, 51816], "temperature": 0.0, + "avg_logprob": -0.12206665674845378, "compression_ratio": 1.946843853820598, "no_speech_prob": + 0.004025470931082964}, {"id": 296, "seek": 185304, "start": 1853.04, "end": 1857.6, + "text": " still have to know what data is relevant right so so that''s been surprising + but i think we''re going", "tokens": [50364, 920, 362, 281, 458, 437, 1412, 307, + 7340, 558, 370, 370, 300, 311, 668, 8830, 457, 741, 519, 321, 434, 516, 50592], + "temperature": 0.0, "avg_logprob": -0.1127456318248402, "compression_ratio": 1.7863636363636364, + "no_speech_prob": 0.0001136185455834493}, {"id": 297, "seek": 185304, "start": 1857.6, + "end": 1864.6399999999999, "text": " to see a wave of search driven innovation and + i''m excited about it i think the more people shift away", "tokens": [50592, 281, + 536, 257, 5772, 295, 3164, 9555, 8504, 293, 741, 478, 2919, 466, 309, 741, 519, + 264, 544, 561, 5513, 1314, 50944], "temperature": 0.0, "avg_logprob": -0.1127456318248402, + "compression_ratio": 1.7863636363636364, "no_speech_prob": 0.0001136185455834493}, + {"id": 298, "seek": 185304, "start": 1864.6399999999999, "end": 1870.56, 
"text": + " from innovating in a repository to innovating across repositories we''ll see you + know another", "tokens": [50944, 490, 5083, 990, 294, 257, 25841, 281, 5083, 990, + 2108, 22283, 2083, 321, 603, 536, 291, 458, 1071, 51240], "temperature": 0.0, "avg_logprob": + -0.1127456318248402, "compression_ratio": 1.7863636363636364, "no_speech_prob": + 0.0001136185455834493}, {"id": 299, "seek": 185304, "start": 1870.56, "end": 1876.48, + "text": " another layer of innovation and and even more productivity left right + for for the people to use it", "tokens": [51240, 1071, 4583, 295, 8504, 293, 293, + 754, 544, 15604, 1411, 558, 337, 337, 264, 561, 281, 764, 309, 51536], "temperature": + 0.0, "avg_logprob": -0.1127456318248402, "compression_ratio": 1.7863636363636364, + "no_speech_prob": 0.0001136185455834493}, {"id": 300, "seek": 187648, "start": 1877.1200000000001, + "end": 1882.72, "text": " oh yeah that''s fantastic the way i put it and i''m glad + to hear this because there''s the search", "tokens": [50396, 1954, 1338, 300, 311, + 5456, 264, 636, 741, 829, 309, 293, 741, 478, 5404, 281, 1568, 341, 570, 456, 311, + 264, 3164, 50676], "temperature": 0.0, "avg_logprob": -0.15686762068006727, "compression_ratio": + 1.7219730941704037, "no_speech_prob": 0.00398709811270237}, {"id": 301, "seek": + 187648, "start": 1882.72, "end": 1889.44, "text": " professional you know now a + product manager i love the fact that the powerhouse of you know the", "tokens": + [50676, 4843, 291, 458, 586, 257, 1674, 6598, 741, 959, 264, 1186, 300, 264, 1347, + 6410, 295, 291, 458, 264, 51012], "temperature": 0.0, "avg_logprob": -0.15686762068006727, + "compression_ratio": 1.7219730941704037, "no_speech_prob": 0.00398709811270237}, + {"id": 302, "seek": 187648, "start": 1889.44, "end": 1896.16, "text": " future of + AI still continues to be searched right at the core and i think search it also says + that", "tokens": [51012, 2027, 295, 7318, 920, 6515, 281, 312, 22961, 558, 412, + 
264, 4965, 293, 741, 519, 3164, 309, 611, 1619, 300, 51348], "temperature": 0.0, + "avg_logprob": -0.15686762068006727, "compression_ratio": 1.7219730941704037, "no_speech_prob": + 0.00398709811270237}, {"id": 303, "seek": 187648, "start": 1896.16, "end": 1902.8, + "text": " search isn''t solved and maybe this is another iteration that we will + approach it but it''s like", "tokens": [51348, 3164, 1943, 380, 13041, 293, 1310, + 341, 307, 1071, 24784, 300, 321, 486, 3109, 309, 457, 309, 311, 411, 51680], "temperature": + 0.0, "avg_logprob": -0.15686762068006727, "compression_ratio": 1.7219730941704037, + "no_speech_prob": 0.00398709811270237}, {"id": 304, "seek": 190280, "start": 1903.36, + "end": 1909.04, "text": " because search is also perception right it''s also how + you express yourself how you perceive what you", "tokens": [50392, 570, 3164, 307, + 611, 12860, 558, 309, 311, 611, 577, 291, 5109, 1803, 577, 291, 20281, 437, 291, + 50676], "temperature": 0.0, "avg_logprob": -0.10426116270177505, "compression_ratio": + 1.8130841121495327, "no_speech_prob": 0.004224587697535753}, {"id": 305, "seek": + 190280, "start": 1909.04, "end": 1916.32, "text": " see maybe the interfaces will + change right so sometimes i do want to you know that that product", "tokens": [50676, + 536, 1310, 264, 28416, 486, 1319, 558, 370, 2171, 741, 360, 528, 281, 291, 458, + 300, 300, 1674, 51040], "temperature": 0.0, "avg_logprob": -0.10426116270177505, + "compression_ratio": 1.8130841121495327, "no_speech_prob": 0.004224587697535753}, + {"id": 306, "seek": 190280, "start": 1916.32, "end": 1922.48, "text": " that google + had google glass sometimes i want to have glass on me to take a picture or you know", + "tokens": [51040, 300, 20742, 632, 20742, 4276, 2171, 741, 528, 281, 362, 4276, + 322, 385, 281, 747, 257, 3036, 420, 291, 458, 51348], "temperature": 0.0, "avg_logprob": + -0.10426116270177505, "compression_ratio": 1.8130841121495327, "no_speech_prob": + 0.004224587697535753}, 
{"id": 307, "seek": 190280, "start": 1922.48, "end": 1928.6399999999999, + "text": " not to be as distracted by going and fetching my phone or something right + because today i still", "tokens": [51348, 406, 281, 312, 382, 21658, 538, 516, 293, + 23673, 278, 452, 2593, 420, 746, 558, 570, 965, 741, 920, 51656], "temperature": + 0.0, "avg_logprob": -0.10426116270177505, "compression_ratio": 1.8130841121495327, + "no_speech_prob": 0.004224587697535753}, {"id": 308, "seek": 192864, "start": 1928.72, + "end": 1934.8000000000002, "text": " have to do that it''s not as immersive experience + and also i''ve noticed working with engineers now", "tokens": [50368, 362, 281, + 360, 300, 309, 311, 406, 382, 35409, 1752, 293, 611, 741, 600, 5694, 1364, 365, + 11955, 586, 50672], "temperature": 0.0, "avg_logprob": -0.09696457121107313, "compression_ratio": + 1.7454545454545454, "no_speech_prob": 0.012357759289443493}, {"id": 309, "seek": + 192864, "start": 1934.8000000000002, "end": 1940.0800000000002, "text": " when i + flipped on this side of you know the process i''m a product manager so i keep thinking + about", "tokens": [50672, 562, 741, 26273, 322, 341, 1252, 295, 291, 458, 264, 1399, + 741, 478, 257, 1674, 6598, 370, 741, 1066, 1953, 466, 50936], "temperature": 0.0, + "avg_logprob": -0.09696457121107313, "compression_ratio": 1.7454545454545454, "no_speech_prob": + 0.012357759289443493}, {"id": 310, "seek": 192864, "start": 1940.0800000000002, + "end": 1945.76, "text": " things and they keep coding and sometimes i''ve noticed + that they don''t even go back on slack", "tokens": [50936, 721, 293, 436, 1066, + 17720, 293, 2171, 741, 600, 5694, 300, 436, 500, 380, 754, 352, 646, 322, 29767, + 51220], "temperature": 0.0, "avg_logprob": -0.09696457121107313, "compression_ratio": + 1.7454545454545454, "no_speech_prob": 0.012357759289443493}, {"id": 311, "seek": + 192864, "start": 1945.76, "end": 1952.4, "text": " from like a couple hours when + i when i ask something because 
they don''t want to be uh you know", "tokens": [51220, + 490, 411, 257, 1916, 2496, 562, 741, 562, 741, 1029, 746, 570, 436, 500, 380, 528, + 281, 312, 2232, 291, 458, 51552], "temperature": 0.0, "avg_logprob": -0.09696457121107313, + "compression_ratio": 1.7454545454545454, "no_speech_prob": 0.012357759289443493}, + {"id": 312, "seek": 195240, "start": 1952.96, "end": 1960.88, "text": " distracted + from their ideas so maybe there could be a way for me or agent or whoever to sort + of", "tokens": [50392, 21658, 490, 641, 3487, 370, 1310, 456, 727, 312, 257, 636, + 337, 385, 420, 9461, 420, 11387, 281, 1333, 295, 50788], "temperature": 0.0, "avg_logprob": + -0.1186936212622601, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0033342058304697275}, {"id": 313, "seek": 195240, "start": 1960.88, "end": 1967.2800000000002, + "text": " sneak into their idea ask a question talk to them right that would be + fantastic maybe it sounds", "tokens": [50788, 13164, 666, 641, 1558, 1029, 257, + 1168, 751, 281, 552, 558, 300, 576, 312, 5456, 1310, 309, 3263, 51108], "temperature": + 0.0, "avg_logprob": -0.1186936212622601, "compression_ratio": 1.7692307692307692, + "no_speech_prob": 0.0033342058304697275}, {"id": 314, "seek": 195240, "start": 1967.2800000000002, + "end": 1974.0, "text": " also a little crazy like you you still want to have privacy + and sort of you know uh flow but at the", "tokens": [51108, 611, 257, 707, 3219, + 411, 291, 291, 920, 528, 281, 362, 11427, 293, 1333, 295, 291, 458, 2232, 3095, + 457, 412, 264, 51444], "temperature": 0.0, "avg_logprob": -0.1186936212622601, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.0033342058304697275}, {"id": 315, "seek": + 195240, "start": 1974.0, "end": 1980.24, "text": " same time there is reality of + your job right you you do need to go back to your email to your slack", "tokens": + [51444, 912, 565, 456, 307, 4103, 295, 428, 1691, 558, 291, 291, 360, 643, 281, + 352, 646, 281, 428, 3796, 
281, 428, 29767, 51756], "temperature": 0.0, "avg_logprob": + -0.1186936212622601, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0033342058304697275}, {"id": 316, "seek": 198024, "start": 1980.32, "end": 1986.08, + "text": " or whatever you''re using teams and get distracted and then you forget + what is it that you''ve been", "tokens": [50368, 420, 2035, 291, 434, 1228, 5491, + 293, 483, 21658, 293, 550, 291, 2870, 437, 307, 309, 300, 291, 600, 668, 50656], + "temperature": 0.0, "avg_logprob": -0.08900578429059285, "compression_ratio": 1.6822033898305084, + "no_speech_prob": 0.008829892612993717}, {"id": 317, "seek": 198024, "start": 1986.08, + "end": 1994.72, "text": " onto when you come back to your your motive execution + absolutely you know in a way applications", "tokens": [50656, 3911, 562, 291, 808, + 646, 281, 428, 428, 28827, 15058, 3122, 291, 458, 294, 257, 636, 5821, 51088], "temperature": + 0.0, "avg_logprob": -0.08900578429059285, "compression_ratio": 1.6822033898305084, + "no_speech_prob": 0.008829892612993717}, {"id": 318, "seek": 198024, "start": 1994.72, + "end": 1999.2, "text": " are distracting i think there was a really good study recently + that showed the danger of interrupting", "tokens": [51088, 366, 36689, 741, 519, + 456, 390, 257, 534, 665, 2979, 3938, 300, 4712, 264, 4330, 295, 49455, 51312], "temperature": + 0.0, "avg_logprob": -0.08900578429059285, "compression_ratio": 1.6822033898305084, + "no_speech_prob": 0.008829892612993717}, {"id": 319, "seek": 198024, "start": 1999.2, + "end": 2004.96, "text": " engineers right because of the context switch um it''s + definitely the same for business people they''re", "tokens": [51312, 11955, 558, + 570, 295, 264, 4319, 3679, 1105, 309, 311, 2138, 264, 912, 337, 1606, 561, 436, + 434, 51600], "temperature": 0.0, "avg_logprob": -0.08900578429059285, "compression_ratio": + 1.6822033898305084, "no_speech_prob": 0.008829892612993717}, {"id": 320, "seek": + 200496, "start": 2005.04, 
"end": 2010.88, "text": " just like everybody else right + context matters and it can be hard to switch um i think that''s the real", "tokens": + [50368, 445, 411, 2201, 1646, 558, 4319, 7001, 293, 309, 393, 312, 1152, 281, 3679, + 1105, 741, 519, 300, 311, 264, 957, 50660], "temperature": 0.0, "avg_logprob": -0.0938236220129605, + "compression_ratio": 1.8507462686567164, "no_speech_prob": 0.0005857805954292417}, + {"id": 321, "seek": 200496, "start": 2010.88, "end": 2015.92, "text": " promise + of a i look at chat gpt you go to chat gpt they''re gonna have search soon web search + you can", "tokens": [50660, 6228, 295, 257, 741, 574, 412, 5081, 290, 662, 291, + 352, 281, 5081, 290, 662, 436, 434, 799, 362, 3164, 2321, 3670, 3164, 291, 393, + 50912], "temperature": 0.0, "avg_logprob": -0.0938236220129605, "compression_ratio": + 1.8507462686567164, "no_speech_prob": 0.0005857805954292417}, {"id": 322, "seek": + 200496, "start": 2015.92, "end": 2020.96, "text": " ask it questions you don''t + have to go to google and being in five other places right to get it", "tokens": + [50912, 1029, 309, 1651, 291, 500, 380, 362, 281, 352, 281, 20742, 293, 885, 294, + 1732, 661, 3190, 558, 281, 483, 309, 51164], "temperature": 0.0, "avg_logprob": + -0.0938236220129605, "compression_ratio": 1.8507462686567164, "no_speech_prob": + 0.0005857805954292417}, {"id": 323, "seek": 200496, "start": 2020.96, "end": 2027.76, + "text": " and that is the real possibility that you would choose the way you want + to interact with it and", "tokens": [51164, 293, 300, 307, 264, 957, 7959, 300, + 291, 576, 2826, 264, 636, 291, 528, 281, 4648, 365, 309, 293, 51504], "temperature": + 0.0, "avg_logprob": -0.0938236220129605, "compression_ratio": 1.8507462686567164, + "no_speech_prob": 0.0005857805954292417}, {"id": 324, "seek": 200496, "start": 2027.76, + "end": 2034.56, "text": " that thing in theory that single point that single pane + of glass or single conversational agent right", "tokens": 
[51504, 300, 551, 294, + 5261, 300, 2167, 935, 300, 2167, 32605, 295, 4276, 420, 2167, 2615, 1478, 9461, + 558, 51844], "temperature": 0.0, "avg_logprob": -0.0938236220129605, "compression_ratio": + 1.8507462686567164, "no_speech_prob": 0.0005857805954292417}, {"id": 325, "seek": + 203456, "start": 2034.96, "end": 2040.8, "text": " that could potentially be in + front of many many sources of data and that i think that''s what", "tokens": [50384, + 300, 727, 7263, 312, 294, 1868, 295, 867, 867, 7139, 295, 1412, 293, 300, 741, 519, + 300, 311, 437, 50676], "temperature": 0.0, "avg_logprob": -0.11139005144065786, + "compression_ratio": 1.8097165991902835, "no_speech_prob": 0.002264224225655198}, + {"id": 326, "seek": 203456, "start": 2040.8, "end": 2044.96, "text": " what people + realize it''s hard to say how does what does it really look like in five years", + "tokens": [50676, 437, 561, 4325, 309, 311, 1152, 281, 584, 577, 775, 437, 775, + 309, 534, 574, 411, 294, 1732, 924, 50884], "temperature": 0.0, "avg_logprob": -0.11139005144065786, + "compression_ratio": 1.8097165991902835, "no_speech_prob": 0.002264224225655198}, + {"id": 327, "seek": 203456, "start": 2044.96, "end": 2049.2, "text": " if a i really + continues along the path of sun the answer is it''s the end of applications", "tokens": + [50884, 498, 257, 741, 534, 6515, 2051, 264, 3100, 295, 3295, 264, 1867, 307, 309, + 311, 264, 917, 295, 5821, 51096], "temperature": 0.0, "avg_logprob": -0.11139005144065786, + "compression_ratio": 1.8097165991902835, "no_speech_prob": 0.002264224225655198}, + {"id": 328, "seek": 203456, "start": 2050.72, "end": 2055.7599999999998, "text": + " yeah yeah exactly and going back to being immersive and sort of feeling that", + "tokens": [51172, 1338, 1338, 2293, 293, 516, 646, 281, 885, 35409, 293, 1333, 295, + 2633, 300, 51424], "temperature": 0.0, "avg_logprob": -0.11139005144065786, "compression_ratio": + 1.8097165991902835, "no_speech_prob": 0.002264224225655198}, {"id": 
329, "seek": + 203456, "start": 2056.72, "end": 2063.2799999999997, "text": " i''m myself and i + am in control and not like vice versa when today i don''t feel like i''m in control", + "tokens": [51472, 741, 478, 2059, 293, 741, 669, 294, 1969, 293, 406, 411, 11964, + 25650, 562, 965, 741, 500, 380, 841, 411, 741, 478, 294, 1969, 51800], "temperature": + 0.0, "avg_logprob": -0.11139005144065786, "compression_ratio": 1.8097165991902835, + "no_speech_prob": 0.002264224225655198}, {"id": 330, "seek": 206328, "start": 2063.6000000000004, + "end": 2070.1600000000003, "text": " applications update by themselves i phone restarts + i have no idea what''s in that update i will not", "tokens": [50380, 5821, 5623, + 538, 2969, 741, 2593, 1472, 11814, 741, 362, 572, 1558, 437, 311, 294, 300, 5623, + 741, 486, 406, 50708], "temperature": 0.0, "avg_logprob": -0.12655060941522772, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.0031326808966696262}, + {"id": 331, "seek": 206328, "start": 2070.1600000000003, "end": 2076.88, "text": + " be able to ever understand what they do but they do it so sometimes i feel like + whatever i have bought", "tokens": [50708, 312, 1075, 281, 1562, 1223, 437, 436, + 360, 457, 436, 360, 309, 370, 2171, 741, 841, 411, 2035, 741, 362, 4243, 51044], + "temperature": 0.0, "avg_logprob": -0.12655060941522772, "compression_ratio": 1.7636363636363637, + "no_speech_prob": 0.0031326808966696262}, {"id": 332, "seek": 206328, "start": 2076.88, + "end": 2082.5600000000004, "text": " belongs to someone else but probably this will + change and i think this should change", "tokens": [51044, 12953, 281, 1580, 1646, + 457, 1391, 341, 486, 1319, 293, 741, 519, 341, 820, 1319, 51328], "temperature": + 0.0, "avg_logprob": -0.12655060941522772, "compression_ratio": 1.7636363636363637, + "no_speech_prob": 0.0031326808966696262}, {"id": 333, "seek": 206328, "start": 2083.84, + "end": 2089.92, "text": " as we wrap up i was thinking is there something you 
want + to sort of call out to the community and say", "tokens": [51392, 382, 321, 7019, + 493, 741, 390, 1953, 307, 456, 746, 291, 528, 281, 1333, 295, 818, 484, 281, 264, + 1768, 293, 584, 51696], "temperature": 0.0, "avg_logprob": -0.12655060941522772, + "compression_ratio": 1.7636363636363637, "no_speech_prob": 0.0031326808966696262}, + {"id": 334, "seek": 208992, "start": 2090.0, "end": 2094.56, "text": " you know + by now swirl obviously has progressed you guys open source i love it", "tokens": + [50368, 291, 458, 538, 586, 30310, 2745, 575, 36789, 291, 1074, 1269, 4009, 741, + 959, 309, 50596], "temperature": 0.0, "avg_logprob": -0.14063210487365724, "compression_ratio": + 1.7019230769230769, "no_speech_prob": 0.0029575617518275976}, {"id": 335, "seek": + 208992, "start": 2095.28, "end": 2100.08, "text": " you you have a bunch of contributors + probably that you trust and you work with but", "tokens": [50632, 291, 291, 362, + 257, 3840, 295, 45627, 1391, 300, 291, 3361, 293, 291, 589, 365, 457, 50872], "temperature": + 0.0, "avg_logprob": -0.14063210487365724, "compression_ratio": 1.7019230769230769, + "no_speech_prob": 0.0029575617518275976}, {"id": 336, "seek": 208992, "start": 2100.08, + "end": 2105.6, "text": " is there anything that you would you know benefit from + calling out the the larger community", "tokens": [50872, 307, 456, 1340, 300, 291, + 576, 291, 458, 5121, 490, 5141, 484, 264, 264, 4833, 1768, 51148], "temperature": + 0.0, "avg_logprob": -0.14063210487365724, "compression_ratio": 1.7019230769230769, + "no_speech_prob": 0.0029575617518275976}, {"id": 337, "seek": 208992, "start": 2107.76, + "end": 2114.7200000000003, "text": " i think i''m very happy to see the the folks + shift and focus towards search i think the thing i''d call", "tokens": [51256, 741, + 519, 741, 478, 588, 2055, 281, 536, 264, 264, 4024, 5513, 293, 1879, 3030, 3164, + 741, 519, 264, 551, 741, 1116, 818, 51604], "temperature": 0.0, "avg_logprob": -0.14063210487365724, + 
"compression_ratio": 1.7019230769230769, "no_speech_prob": 0.0029575617518275976}, + {"id": 338, "seek": 211472, "start": 2114.7999999999997, "end": 2121.68, "text": + " out is to say you know there are many different user communities that want to + consume AI they will", "tokens": [50368, 484, 307, 281, 584, 291, 458, 456, 366, + 867, 819, 4195, 4456, 300, 528, 281, 14732, 7318, 436, 486, 50712], "temperature": + 0.0, "avg_logprob": -0.12618076408302392, "compression_ratio": 1.6359832635983265, + "no_speech_prob": 0.0013017432065680623}, {"id": 339, "seek": 211472, "start": 2121.68, + "end": 2130.08, "text": " benefit from it and i think the key is not to go too far + on the hype cycle right and because honestly", "tokens": [50712, 5121, 490, 309, + 293, 741, 519, 264, 2141, 307, 406, 281, 352, 886, 1400, 322, 264, 24144, 6586, + 558, 293, 570, 6095, 51132], "temperature": 0.0, "avg_logprob": -0.12618076408302392, + "compression_ratio": 1.6359832635983265, "no_speech_prob": 0.0013017432065680623}, + {"id": 340, "seek": 211472, "start": 2131.12, "end": 2137.8399999999997, "text": + " another thing i learned is not everybody is into the details of how AI works right + and like", "tokens": [51184, 1071, 551, 741, 3264, 307, 406, 2201, 307, 666, 264, + 4365, 295, 577, 7318, 1985, 558, 293, 411, 51520], "temperature": 0.0, "avg_logprob": + -0.12618076408302392, "compression_ratio": 1.6359832635983265, "no_speech_prob": + 0.0013017432065680623}, {"id": 341, "seek": 211472, "start": 2137.8399999999997, + "end": 2143.2, "text": " fine tuning is an example it''s a very deep discussion + at some level i''m no expert right i can tell", "tokens": [51520, 2489, 15164, 307, + 364, 1365, 309, 311, 257, 588, 2452, 5017, 412, 512, 1496, 741, 478, 572, 5844, + 558, 741, 393, 980, 51788], "temperature": 0.0, "avg_logprob": -0.12618076408302392, + "compression_ratio": 1.6359832635983265, "no_speech_prob": 0.0013017432065680623}, + {"id": 342, "seek": 214320, "start": 
2143.4399999999996, "end": 2147.6, "text": + " a lot about it but i think there''s people who have done much more on it than + i will ever do", "tokens": [50376, 257, 688, 466, 309, 457, 741, 519, 456, 311, + 561, 567, 362, 1096, 709, 544, 322, 309, 813, 741, 486, 1562, 360, 50584], "temperature": + 0.0, "avg_logprob": -0.10820113198231843, "compression_ratio": 1.8409090909090908, + "no_speech_prob": 0.0027906633913517}, {"id": 343, "seek": 214320, "start": 2148.24, + "end": 2154.56, "text": " but the end of the day the user that''s way way way far + from the user''s head right what they''re what", "tokens": [50616, 457, 264, 917, + 295, 264, 786, 264, 4195, 300, 311, 636, 636, 636, 1400, 490, 264, 4195, 311, 1378, + 558, 437, 436, 434, 437, 50932], "temperature": 0.0, "avg_logprob": -0.10820113198231843, + "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.0027906633913517}, + {"id": 344, "seek": 214320, "start": 2154.56, "end": 2157.52, "text": " they''re + trying to understand and the people are making decisions about bringing these things + in", "tokens": [50932, 436, 434, 1382, 281, 1223, 293, 264, 561, 366, 1455, 5327, + 466, 5062, 613, 721, 294, 51080], "temperature": 0.0, "avg_logprob": -0.10820113198231843, + "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.0027906633913517}, + {"id": 345, "seek": 214320, "start": 2157.52, "end": 2164.0, "text": " are is it + safe how can i trust it how do i get it to provide a benefit so i think the honest + thing is", "tokens": [51080, 366, 307, 309, 3273, 577, 393, 741, 3361, 309, 577, + 360, 741, 483, 309, 281, 2893, 257, 5121, 370, 741, 519, 264, 3245, 551, 307, 51404], + "temperature": 0.0, "avg_logprob": -0.10820113198231843, "compression_ratio": 1.8409090909090908, + "no_speech_prob": 0.0027906633913517}, {"id": 346, "seek": 214320, "start": 2164.0, + "end": 2170.56, "text": " rather than focusing on like we''ve got a few more tokens + than somebody else talk about use cases", "tokens": 
[51404, 2831, 813, 8416, 322, + 411, 321, 600, 658, 257, 1326, 544, 22667, 813, 2618, 1646, 751, 466, 764, 3331, + 51732], "temperature": 0.0, "avg_logprob": -0.10820113198231843, "compression_ratio": + 1.8409090909090908, "no_speech_prob": 0.0027906633913517}, {"id": 347, "seek": 217056, + "start": 2170.64, "end": 2174.48, "text": " focus on the user i think that''s where + that''s what''s world did from the beginning because if you''re", "tokens": [50368, + 1879, 322, 264, 4195, 741, 519, 300, 311, 689, 300, 311, 437, 311, 1002, 630, 490, + 264, 2863, 570, 498, 291, 434, 50560], "temperature": 0.0, "avg_logprob": -0.17847571165665335, + "compression_ratio": 2.0280701754385966, "no_speech_prob": 0.0029498040676116943}, + {"id": 348, "seek": 217056, "start": 2174.48, "end": 2180.16, "text": " in search + like there is nothing but the user right the user''s intent is everything right + and from", "tokens": [50560, 294, 3164, 411, 456, 307, 1825, 457, 264, 4195, 558, + 264, 4195, 311, 8446, 307, 1203, 558, 293, 490, 50844], "temperature": 0.0, "avg_logprob": + -0.17847571165665335, "compression_ratio": 2.0280701754385966, "no_speech_prob": + 0.0029498040676116943}, {"id": 349, "seek": 217056, "start": 2180.72, "end": 2184.72, + "text": " like we can go back to lots of lots of great writing about that from biaires + e8s to tinkle and", "tokens": [50872, 411, 321, 393, 352, 646, 281, 3195, 295, 3195, + 295, 869, 3579, 466, 300, 490, 272, 654, 3145, 308, 23, 82, 281, 256, 14095, 293, + 51072], "temperature": 0.0, "avg_logprob": -0.17847571165665335, "compression_ratio": + 2.0280701754385966, "no_speech_prob": 0.0029498040676116943}, {"id": 350, "seek": + 217056, "start": 2184.72, "end": 2190.4, "text": " and all points in between but + the user''s intent''s important and that''s the thing to focus on what", "tokens": + [51072, 293, 439, 2793, 294, 1296, 457, 264, 4195, 311, 8446, 311, 1021, 293, 300, + 311, 264, 551, 281, 1879, 322, 437, 51356], "temperature": 0.0, 
"avg_logprob": -0.17847571165665335, + "compression_ratio": 2.0280701754385966, "no_speech_prob": 0.0029498040676116943}, + {"id": 351, "seek": 217056, "start": 2190.4, "end": 2195.44, "text": " are they + trying to accomplish and build great use cases that ultimately you know allow people + to", "tokens": [51356, 366, 436, 1382, 281, 9021, 293, 1322, 869, 764, 3331, 300, + 6284, 291, 458, 2089, 561, 281, 51608], "temperature": 0.0, "avg_logprob": -0.17847571165665335, + "compression_ratio": 2.0280701754385966, "no_speech_prob": 0.0029498040676116943}, + {"id": 352, "seek": 217056, "start": 2195.44, "end": 2200.08, "text": " focus on + the things they''d rather focus on instead of you know the minutia and the time + of", "tokens": [51608, 1879, 322, 264, 721, 436, 1116, 2831, 1879, 322, 2602, 295, + 291, 458, 264, 13951, 654, 293, 264, 565, 295, 51840], "temperature": 0.0, "avg_logprob": + -0.17847571165665335, "compression_ratio": 2.0280701754385966, "no_speech_prob": + 0.0029498040676116943}, {"id": 353, "seek": 220008, "start": 2200.08, "end": 2205.2799999999997, + "text": " collecting all these different data points yeah that''s i think that''s + i think what you do as a", "tokens": [50364, 12510, 439, 613, 819, 1412, 2793, 1338, + 300, 311, 741, 519, 300, 311, 741, 519, 437, 291, 360, 382, 257, 50624], "temperature": + 0.0, "avg_logprob": -0.15746388965182834, "compression_ratio": 1.730593607305936, + "no_speech_prob": 0.003096443135291338}, {"id": 354, "seek": 220008, "start": 2205.2799999999997, + "end": 2211.04, "text": " as a tech industry responding to the user demand for AI + my two cents oh that''s amazing i don''t", "tokens": [50624, 382, 257, 7553, 3518, + 16670, 281, 264, 4195, 4733, 337, 7318, 452, 732, 14941, 1954, 300, 311, 2243, 741, + 500, 380, 50912], "temperature": 0.0, "avg_logprob": -0.15746388965182834, "compression_ratio": + 1.730593607305936, "no_speech_prob": 0.003096443135291338}, {"id": 355, "seek": + 220008, "start": 2211.04, "end": 
2218.7999999999997, "text": " don''t try to outsmart + the users and uh make things that you produce explainable and so they", "tokens": + [50912, 500, 380, 853, 281, 484, 10817, 446, 264, 5022, 293, 2232, 652, 721, 300, + 291, 5258, 2903, 712, 293, 370, 436, 51300], "temperature": 0.0, "avg_logprob": + -0.15746388965182834, "compression_ratio": 1.730593607305936, "no_speech_prob": + 0.003096443135291338}, {"id": 356, "seek": 220008, "start": 2218.7999999999997, + "end": 2226.88, "text": " probably will adopt them weaker uh that''s amazing i also + see uh super uh maybe provocative really", "tokens": [51300, 1391, 486, 6878, 552, + 24286, 2232, 300, 311, 2243, 741, 611, 536, 2232, 1687, 2232, 1310, 47663, 534, + 51704], "temperature": 0.0, "avg_logprob": -0.15746388965182834, "compression_ratio": + 1.730593607305936, "no_speech_prob": 0.003096443135291338}, {"id": 357, "seek": + 222688, "start": 2226.96, "end": 2233.28, "text": " nice one from your side where + you say that you outshine google i will link it in the show notes and", "tokens": + [50368, 1481, 472, 490, 428, 1252, 689, 291, 584, 300, 291, 484, 19686, 20742, 741, + 486, 2113, 309, 294, 264, 855, 5570, 293, 50684], "temperature": 0.0, "avg_logprob": + -0.10509832612760775, "compression_ratio": 1.735159817351598, "no_speech_prob": + 0.00680195540189743}, {"id": 358, "seek": 222688, "start": 2233.28, "end": 2239.2000000000003, + "text": " maybe we''ll discuss it at some point as well something about ranking + looks it i really", "tokens": [50684, 1310, 321, 603, 2248, 309, 412, 512, 935, + 382, 731, 746, 466, 17833, 1542, 309, 741, 534, 50980], "temperature": 0.0, "avg_logprob": + -0.10509832612760775, "compression_ratio": 1.735159817351598, "no_speech_prob": + 0.00680195540189743}, {"id": 359, "seek": 222688, "start": 2239.92, "end": 2246.7200000000003, + "text": " enjoyed chatting to you today i''m sure there will be someone you know + in the community reaching out", "tokens": [51016, 4626, 24654, 281, 
291, 965, 741, + 478, 988, 456, 486, 312, 1580, 291, 458, 294, 264, 1768, 9906, 484, 51356], "temperature": + 0.0, "avg_logprob": -0.10509832612760775, "compression_ratio": 1.735159817351598, + "no_speech_prob": 0.00680195540189743}, {"id": 360, "seek": 222688, "start": 2246.7200000000003, + "end": 2252.0, "text": " and maybe trying out swirl to be honest it''s itching for + me to try it out right when i when i", "tokens": [51356, 293, 1310, 1382, 484, 30310, + 281, 312, 3245, 309, 311, 309, 17354, 337, 385, 281, 853, 309, 484, 558, 562, 741, + 562, 741, 51620], "temperature": 0.0, "avg_logprob": -0.10509832612760775, "compression_ratio": + 1.735159817351598, "no_speech_prob": 0.00680195540189743}, {"id": 361, "seek": 225200, + "start": 2252.0, "end": 2258.24, "text": " said it when i mentioned in my company + and someone said i would love to have that single keyword box", "tokens": [50364, + 848, 309, 562, 741, 2835, 294, 452, 2237, 293, 1580, 848, 741, 576, 959, 281, 362, + 300, 2167, 20428, 2424, 50676], "temperature": 0.0, "avg_logprob": -0.11457275789837505, + "compression_ratio": 1.727699530516432, "no_speech_prob": 0.009190120734274387}, + {"id": 362, "seek": 225200, "start": 2258.24, "end": 2264.56, "text": " so that + i can search slack and conference and email and everything um that''s amazing that''s", + "tokens": [50676, 370, 300, 741, 393, 3164, 29767, 293, 7586, 293, 3796, 293, 1203, + 1105, 300, 311, 2243, 300, 311, 50992], "temperature": 0.0, "avg_logprob": -0.11457275789837505, + "compression_ratio": 1.727699530516432, "no_speech_prob": 0.009190120734274387}, + {"id": 363, "seek": 225200, "start": 2264.56, "end": 2270.88, "text": " fantastic + and also amazing that you guys do it uh in the open so everyone can try it um all + the best", "tokens": [50992, 5456, 293, 611, 2243, 300, 291, 1074, 360, 309, 2232, + 294, 264, 1269, 370, 1518, 393, 853, 309, 1105, 439, 264, 1151, 51308], "temperature": + 0.0, "avg_logprob": -0.11457275789837505, 
"compression_ratio": 1.727699530516432, + "no_speech_prob": 0.009190120734274387}, {"id": 364, "seek": 225200, "start": 2270.88, + "end": 2276.16, "text": " to you good luck in in whatever you''re building in your + uh next big things", "tokens": [51308, 281, 291, 665, 3668, 294, 294, 2035, 291, + 434, 2390, 294, 428, 2232, 958, 955, 721, 51572], "temperature": 0.0, "avg_logprob": + -0.11457275789837505, "compression_ratio": 1.727699530516432, "no_speech_prob": + 0.009190120734274387}, {"id": 365, "seek": 227616, "start": 2276.48, "end": 2281.2, + "text": " thanks to me tree thank you so much it was great to talk to you and uh + when you want to", "tokens": [50380, 3231, 281, 385, 4230, 1309, 291, 370, 709, + 309, 390, 869, 281, 751, 281, 291, 293, 2232, 562, 291, 528, 281, 50616], "temperature": + 0.0, "avg_logprob": -0.2952918677494444, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.015635771676898003}, {"id": 366, "seek": 227616, "start": 2281.2, + "end": 2285.92, "text": " ready to try a swirl you got the open source version enjoy + our slack and we''ll be happy to help", "tokens": [50616, 1919, 281, 853, 257, 30310, + 291, 658, 264, 1269, 4009, 3037, 2103, 527, 29767, 293, 321, 603, 312, 2055, 281, + 854, 50852], "temperature": 0.0, "avg_logprob": -0.2952918677494444, "compression_ratio": + 1.6122448979591837, "no_speech_prob": 0.015635771676898003}, {"id": 367, "seek": + 227616, "start": 2285.92, "end": 2296.0, "text": " absolutely thank you very much + enjoy your day youtube", "tokens": [50852, 3122, 1309, 291, 588, 709, 2103, 428, + 786, 12487, 51356], "temperature": 0.0, "avg_logprob": -0.2952918677494444, "compression_ratio": + 1.6122448979591837, "no_speech_prob": 0.015635771676898003}]' +--- + +Hello there, this is Vector Podcast Season 3 and I'm super excited to be talking to companies with thousands and thousands of users and thousands and thousands of systems that it's been a time of inspiration and a little bit of continued 
nervousness about what it all means.

Last March, on the 14th, which was Pi Day, was the one-year anniversary of GPT-4. What I've learned is that those large enterprises looked at GPT-4 and said: this is going to change our business. This can really help everybody be an efficient expert and just slice through the current problems of siloed data and inconsistent systems. But at the same time there was a lot of fear: are we exposing valuable internal data to AIs that are then going to be trained on it? Is this going to be exposed? Lost? There have been many, many lawsuits. So ultimately the large enterprises did what they always do, which is engage with it on their own terms, and many of them purchased, downloaded, and installed generative AIs and LLMs in their private clouds. We're working with one large company that did that and trained it with a bunch of what they called safe data: annual reports, an employee handbook. It's very interesting to talk to, but it can't really help a business person trying to answer a question in the supply chain group, or in the R&D group, or in HR, because it doesn't have access to those systems. If you've ever worked at those places, you know that when you onboard, the first thing your manager does is open a bunch of tickets so that you can get access to systems. That's hard enough. So that's the reason there has been, in a way, so little progress: lots of installs of AI, but not that much real value.

I'd love to hear from you about some of the use cases out there.
People are still trying to get the data to the AI so that it can provide the benefit. What ultimately happened is this: they've got the AIs installed, and the first generation of AI solution architectures is what I will refer to as a vendor-driven "put the data in" architecture. Literally every product out there, and I don't want to name them, says the first step is: put the data in. For some people, for many applications, for POVs, for testing it out, that's great, and who hasn't done it with a few PDFs and got some interesting results? But you can't just take a copy of a departmental database and hand it over to a centralized corporate database for training; there are rules in place to prevent that. Even more difficult is the idea that you would send it outside your perimeter into someone else's cloud. At another big manufacturing firm, they have a 24-month waiting list to onboard a new SaaS product: "we have to put our security team on it." So I believe it's a very interesting time, and ultimately what happened is Swirl thought differently about the problem. As you said, we thought about it from the search technology perspective: why would we move all of the data? Instead, essentially take only the data that you need and give it to the AI at that moment. What Swirl does first, to do that, is create a single pane of glass. The next thing I'll mention is that Swirl is software; we are a software company, and our software is typically deployed in the customer's private cloud. We are happy to do hosting for POVs and for various applications, but for larger enterprises we don't expect that to be the case. Once you deploy Swirl, it integrates with your single sign-on systems, such as Microsoft or Okta or Ping Federate or others, whatever you have in there. Once it's configured, you send a question, prompt, or search query to Swirl, and it brokers that query to all the sources
that it's authorized to do so and it does so on behalf of that user so it's not only is it safe compliance search using existing infrastructure but it's personal the data the user or caller gets back is based on what that user can see so I use it all the time and it's my email my outlook my calendar my LinkedIn whatever right it's my view we actually love the idea that we should present the data to the user so you get that single pane of glass and actually you can decide what to do with it you can say I don't want this source or whatever you can make adjustments but ultimately we then execute rag we have our own excellent high quality rag better than many in particular it seeks highly relevant passages from all of the documents we can fetch the documents and authenticate on the fly as to do that um bind those to a prompt we have our own prompt engineering you can uh override it and then do the rag against a huge list of AI providers actually we support more than 20 today including most of the ones we see out there open AI open AI and azure bedrock and prop at google mistral uh co here etc and in all cases no code should be required you configure an existing connector more than likely you're putting in just endpoint information and authentication tokens and then swirl again does that broker and creates that pane of glass and execute rag you can also use swirl just for the R if you have your own rag right you can get the result list and do your fetching or you can hook after you've got the swirl has the fetched results and you can operate that on just the full documents the key to this i love that you asked is the reader lllm we have been really heads down working on the reader llm um i've actually been asking people if they have heard the term before and many haven't uh i don't know what what your take is on on reader llm these days oh yeah i'm still catching up really i mean the way i see it and i'm still kind of plowing through rag itself right so you you said 
what my take is on how easy it is to onboard to these AI models, and so on. I have a sense that people are aware of this because it's so easy to access through ChatGPT and similar tools, but when it comes to deploying these things I don't think it's as easy. Because you have to go through a list of models, you need to figure out which one to pick, and hence you need to be a data scientist at that point, or an ML practitioner or whatever. And the web is exploding with so much cheap advice, you know: use this, use that. But as you go through that process you realize that none of those models work out of the box, and so you need to do something, okay, this RAG. But setting up RAG means that you need to bring in a vector database that you haven't seen before, and things like this.

Yeah, I love that. Just speaking of misinformation, I think you're absolutely right, there's so much confusing stuff out there. You do not need a vector database to do RAG; you never did. It's a vendor thing that I totally understand: they're charging per gigabyte or whatever, so they say you have to have one to do RAG. There's an excellent study by Zet Hub, and actually Simson Garfinkel, an advisor to Swirl, you may have heard that name, an incredible tech writer, recently wrote a summary of that study. The study shows that you do not need to vectorize your data to get high-quality results. Instead, you just increase the number of results you get from a so-called naive, non-vector search engine or database, and re-rank using vectors. That's exactly what Swirl does: we vectorize the result-set snippets, we vectorize the full text of the documents, we vectorize the query, the prompt, whatever it is, and our reader LLM is responsible for a complex similarity re-ranking. You can actually plug your choice of embeddings into our reader LLM; embeddings are
actually just a feature, one of the many things that LLMs do, so you can change that. But the reader LLM, here's really the core of it: it's the middle layers of the generative LLM without the text generation and text interpretation parts; those aren't there at all. Instead you use it to determine similarity, cosine or one of many other algorithms, but ultimately you're taking some algorithm like that and using embeddings plus the reader LLM's own knowledge to say: how similar is the query or prompt, or part of it, to the response that I got? Or: find the most relevant passage in a document. Because you're absolutely right, there are tools like LangChain out there, as one example, which give you lots of interesting tooling, but it's still on you, the developer. I actually had ChatGPT generate me a pipeline just as a demo, and the biggest problem is that it generated a function I had to fill in, called "select documents." That's really hard, and ultimately you're basically just providing the pipeline to move the data once again. But the reader LLM in Swirl is all about re-ranking and finding the best passages, so that you are not sending a hundred PDFs of which one paragraph is relevant; you are sending the paragraph. That way you can put in a lot more data, and you also don't blow out your token limits, assuming you have such a thing if you're on-prem. That's the reader LLM. I'll say this: reader LLMs are the unsung heroes of search especially, but also of RAG. When you're looking at, say, Bing or ChatGPT, and you ask it a question and it goes and fetches documents from the web, it's almost certainly using a reader LLM to determine which pages are best, and to be fair, Bing and Google have incredible knowledge of that already, so it's not like it's that hard for them. But then they're almost certainly reading only the most relevant passages; they're not just passing the whole web page in. So reader LLMs are a
thing they're definitely becoming more and more prevalent and they provide a critical non hallucinating step to help find the best results so the user doesn't have to and that's very interesting and and how let's say if you plug into a companies network right so and they focus on something i don't know healthcare banking what have you would you need to fine tune reader lm in any way no i actually don't recommend it i think there's a lot of evidence that fine tuning because of its fundamentally lossy process right is somewhat responsible for hallucinations there's been quite a bit written about this and i think that ultimately the the winning combination today is that you use a very well trained capable model that is generalist and you provide it with the data that you need to provide it with at the moment you need to for example swirls prompt engineering does a few things one we force it to only consider the rag data and not add its own model thoughts right you can interpret but don't say don't create facts that aren't presented to you second force it to disambiguate this is one of the worst errors in prompt engineering is not is just letting it go right up on past equating right two entities with the same name as if they're the same thing so our default engineering says listen if you see and into two entities with the same name don't you know essentially call that out and don't just gloss over it the last one is especially when you're talking about multiple sources of data and enterprise data the user must be able to verify or nobody wants to make a career limiting move because they took chat gpt's and answer and said here it is here it is right put it up on the on the investor site not a good idea but swirl also forces the AI to quote the sources that you use to cite them and of course you also have access to the underlying search results right so you can verify that yes you have a million dollars in insurance coverage and it covers x y and z that's key yeah 
that's amazing I was just you reminded me of when you said about hallucinations I was just listening to one interview is not related to AI world attack world it's political sciences and so she was asked the scientists she was asked you know are you using chat gpt at work and she said yes sometimes I do sometimes I use it as a co-writer so you know I I draft some things quickly and and I still see that chat gpt is very crude you know in the way it approaches you know I can do it better but sometimes I'm just you know lazy or tired okay let it do it but then the thing that struck her was that it actually hallucinates she was asking give me you know top five books in political science you know in specific country and chat gpt was very confident and and said they said the five books and the authors and when she googled them they don't exist and and then she said they don't exist and then chat gpt responded okay here is only one book that you should read and that didn't exist either so she was genuinely like baffled and she said okay you might say something with less confidence but why lie why do you lie she doesn't know what is hallucinations but she's she looks at it as a user and it's very disconcerting so believe it or not when I first started using gpt 4 I got a hallucination that I thought was so real I wrote to the publisher and said why is this article no longer online and the publisher wrote back and said there is no such article but it could have been it was authored by someone they said gpt 4 said it was authored by another author who had posted on that site the url looked correct and the content looked I mean the snippet it gave me looked absolutely real but again when they build these models a few you know 10 20 gigabyte model right of gpt 4 or 35 or whatever it is petabytes and petabytes of data went into that so by definition it's lossy but the way the lllm the generative part works is it must provide a response so you know how that is when you can't 
quite remember the name of something: it's essentially doing the same thing. It knows: I saw an artifact that looked like that, but I don't have the artifact anymore, so it generates something that is the consensus version of what it would have been had it existed. And that's why I don't believe in fine-tuning so much. I think if you have a highly capable model with some reasoning and the ability to interpret text and follow instructions, you provide it with your internal data, and that is the beauty of RAG. Because here's the thing, the reason it's so good at things: why does GPT-4 sound like a smart person on Reddit or Facebook or something like that? Because that's where it was trained. And of course on something like Reddit they have the same conversation 10 million times; I mean, how many discussions of Twin Peaks or Battlestar Galactica are there? There are a lot of them. So it learns the core of these things and can answer those questions. But your internal data is probably not so repetitive; it's probably much more conflicting than not, and that's why you produce more problems. It's much better to give it the one thing that's really relevant and let it reason.

Yeah, that sounds good, and it stays live, right? It's something that can be updated throughout the lifecycle of your company or department or whatever. But there is one challenge I want to offer you; it came to me just today as I was thinking and preparing for this episode. Data is not gold. Sometimes it is gold, because everyone talks about it, la la la, but it is also very complex machinery, and it can have mistakes of its own: misattribution, misclassification, human error, what have you. How would you say the reader LLM or Swirl is going to tackle this issue, or is it just going to transparently give a garbage-in, garbage-out type of response?

It's a
great question. Speaking of hallucinations in AI: we have all probably worked with somebody at one time or another who made a mistake, or didn't understand the problem well enough, and that stuff gets into Teams and Slack, and documents are wrong; it's incredible. You're right, it's incredibly messy in the enterprise. Who hasn't worked at a firm where they had 500 versions of the same PowerPoint that just evolved? So absolutely, these are things that are ultimately going to have to continue to be worked on, but here are a couple of points. Number one, if you leave the data in the system of record, you're much less likely to introduce new problems, especially security problems. And if you leave it in the system of record, you get the benefit of any domain modeling, lexicons, and ontologies: if someone cared about that source, they might very well have done some of that. If you pull it all out and put it in a vector database, what happened to all of that knowledge? So I would argue that the systems of record that are valuable have things in place to deal with this. Number two, the reader LLM does a couple of things to help. One, it's aware of certain commonly problematic formats; email is the worst: reply, forward, and signature content are very, very problematic. We have a solution for public data too, so that you can get article content without getting, as an example, navigation, advertisements, cloaked data, stuff like that. Because very often public data is relevant to the large enterprise: they want to see policy changes, regulatory changes, online catalog changes; that's all relevant stuff. Then there's the similarity problem. Another thing the reader LLM does is semantic analysis to determine which is the latest version of the same document; LLMs are amazing at that, much better than old-school multi-window setups
where you're trying to take out like a signature of the document and say well this could it's very similar but lm does it much better right and you can quickly say now this is the latest version of that spreadsheet or you can let the user decide it's another thing who doesn't love shopping I love being able to look at my shopping cart full of swirl results and say you know this one I know isn't really relevant these are the five I've risen or maybe this is the source or these are the sources that I want my data from today that's another way of allowing the user to bring their expertise and experience and knowledge and say no no no colibre not thoughts but snowflake not oracle whatever and I'm not picking on anybody we all can say they're all present they all have value the question is which one has the answer for me today well until the until they can write the query with the context that answers that you know I think the key is to keep the user in the loop make sure that there are citations and ultimately that in a year the systems will be smarter and many of these problems will be solved after all almost all the naive search engines right that were BM25 or whatever pretty much all have vector upgrades now only questions can you wait long enough to vectorize at high-dimensional space a few million documents exactly yeah sometimes when I use chat GPT I don't use it that often by the way for some reason maybe it says something about me maybe I should you learn to do it but sometimes as you said you know it just generates something it seems a little average you know it's a code snippet or something like that you try it it doesn't work at that point when I when I get frustrated a little bit I'm like can you show me the source maybe a link to stack over flow so I can go and drill in for myself you know I don't have to sort of keep pounding and you and I'm asking you know okay that didn't work this didn't work because I can do the same thing just staring at the stack 
Overflow page. And maybe there have already been some updates, and someone said, no, that doesn't work at all. Sometimes you see the selected answer, but then there is another one which everyone says works, not the selected one. So that's just funny. Yeah, that's amazing. So, the reader LLM, just to bring it back to the ground, especially for those who are novices like myself, and I still consider myself a novice: have you taken a reader LLM off the shelf, implemented someone's paper, or did you have to train it? How did you go about it?

We built it largely ourselves; it's been an evolving thing, but there are definitely other reader LLMs out there. The key is to preserve the structure, the pieces of the structure that allow you to do similarity. We implemented our own similarity and other algorithms. We also do things like named entity recognition and sentiment analysis; LLMs are great at that stuff. It can do scoring for machine learning purposes, so we have a nice intent detection system now that will, based on the responses that you get, tell you which sources are most relevant right up front, and also optionally ratings, if you want to bring that into the system. Passage detection in our reader LLM is totally in response to the problem you described, which is that the data is messy and we don't necessarily want to ship 500 hundred-page PDFs that have essentially the same data. So it finds the most relevant passage super quickly and truncates the document down to a window around it. Those are the things, and we've really implemented it ourselves; it's our own creation.

Oh, that's fantastic, so that's your secret sauce as well. That's something to be proud of. And also, I want to close up on the description
that you gave, or maybe looking at the future: does Swirl have some way of collecting feedback? If not, do you plan to implement it, and do you think it's reasonable to have a feedback loop? Like in ChatGPT, you can say thumbs up, thumbs down; you cannot say much, you can say this was the answer, and I don't know if that's going to go into the loop, but whatever, it gives me the joy of completing it.

Oh yes. So when Swirl AI Connect is deployed in the enterprise, for starters you get to connect the data and get RAG with your choice of AIs, and by the way, it's again configuration for the AI: you put your keys in, pick the model, and you can RAG against it. You can also choose the role; you can use different generative AIs in different roles: you can use one for query writing, one for direct answers, one for RAG, and if it has embeddings, you can use those to power the reader LLM. So just to be clear, it's a bit more flexible.

Yeah, I was asking about feedback: do you have it, and if not, do you plan to, and do you think it's reasonable to have it?

Right, absolutely. So after you deploy AI Connect, as mentioned, you get those abilities. Then we have an analytics package which will give you insight as to which sources are providing the most relevant responses and ratings, putting all of that into dashboards: understanding who the number one users are, who writes the best prompts, which sources produce the best results, which prompts work. That is absolutely all part of the offering, and ultimately it's part of what we tailor for the deployment. And again, that can be on-premises, but AI Connect is the key because it's collecting that data, always in the customer's private cloud; we don't see it, we're not SaaS. But that data is absolutely turnable into gold, a variety of different gold things, so you can hopefully figure out which AI works best for which groups, and you can figure out which
sources are providing the best input for RAG, etc.

Yeah, that's fantastic. I love that you do have feedback; I think it's definitely gold. It could also be super messy and noisy and stuff, but it's better than the absence of it. That's amazing. Maybe, in the past year or so that you've been deploying this with clients, and obviously you don't have to mention names, was there something that surprised you in how clients perceived Swirl?

Yeah, I would say that people have not really been looking for search. The explosion of AI and the excitement around AI kind of crowded everything out, and that's why I think so many of these copy-in architectures got so much momentum. And by the way, I think people are doing incredible stuff with that, so it's not like those aren't perfectly legitimate; I mean, every database ever starts that way, you put the data in and you get insight out of it. It's just that there's a bit more to the story: there's a whole other world of, well, there are a lot of these systems, I just moved them to the cloud, and I can't necessarily do that. But people weren't thinking search. I'm not sure what they believed the answer would be, but there were some excellent posts; there was one on LinkedIn by Vector Ventures, I think, I should probably get the name right, but in any event they published an excellent piece about how search is probably the answer to making AI work in a lot of these cases, and they also point out that there are not that many people who have come at it from the search perspective. So that was a bit surprising to me, because the large enterprise has always loved search, always, because that's how knowledge workers get stuff done. Yes, you have business intelligence and dashboards and reporting, and we like those things, but so much of the qualitative, why did things happen, explain it to me, how do I solve this, that's been something that search did a good
job of. Ultimately it's a technology, and the marriage of search with LLMs seems to be the unlocker, if you will, in the enterprise. That was surprising to me: I thought there would be much more of a search-first approach, and I think everybody had to get through the understanding of what it means to adopt AI and how the first generation works. Now I think people are recognizing real-time architecture using systems of record with re-ranking, and then just keeping up with the incredible innovation in, let's call it, generative AI. That's the interesting thing. But again, I come back to what I said at the beginning: there are going to be many, many incredible generative AIs, and they're going to do different things that we haven't even seen yet, I don't think, the most extraordinary ones. They'll come from the big science publishers, who are going to build incredible life-sciences generative AI; I'm sure people like Bloomberg and the FT are going to build incredible financial ones. And that's great, but all of those still need your data, to operate in your environment and give you answers that are meaningful, and that problem is the problem that Swirl solves.

So just to understand, what were they expecting? Was it a chat interface, since you said they didn't expect search?

No, they thought they would just ask the question.

Oh yeah, so they thought of it like a chat, right?

Yeah, everybody wants kind of a conversational interface. Another thing I learned, actually, and you really reminded me: people are not so interested in the idea that there is an AI place that you go to. I think another very logical step is that business folks, knowledge workers, would like to use the channels they use today. So rather than having a new place to go: why can't I talk to it on Teams or in Slack or on my WhatsApp? Why can't I text it? If I need a visual, I can always go to a screen, and then I can have it show me the underlying
data, show me the graph, show me the chart. The future is not applications that are destinations; the future is an ongoing dialogue with an AI that understands your business and your world, has access to your data, and becomes your trusted advisor and agent. I don't want to use the word co-pilot, because I think that's a little off; it's much more your confidant, much more your agent. It's going to tell you stuff like, hey, that question you asked last month, there's a very different answer this month. That's a pretty interesting thing. Or it's going to let you explore: tell me about our customer service ratings; well, which region? Disambiguation, which was previously something you would do through facets in search, is the kind of thing that should become more dialogue-oriented. But those things are going to take some time, because in order to know how to disambiguate, you still have to know what data is relevant. So that's been surprising, but I think we're going to see a wave of search-driven innovation, and I'm excited about it. I think the more people shift away from innovating in a repository to innovating across repositories, the more we'll see another layer of innovation and even more productivity left for the people to use.

Oh yeah, that's fantastic, the way you put it, and I'm glad to hear this, because here's the search professional, you know, now a product manager. I love the fact that the powerhouse of the future of AI still continues to be search, right at the core. I think it also says that search isn't solved, and maybe this is another iteration in which we will approach it. Because search is also perception: it's how you express yourself, how you perceive what you see. Maybe the interfaces will change. Sometimes I do want, you know, that product that Google had, Google Glass: sometimes I want to have glasses on me to take a picture, or, you know,
not to be as distracted by going and fetching my phone or something, because today I still have to do that; it's not as immersive an experience. And also, I've noticed working with engineers, now that I've flipped to this side of the process: I'm a product manager, so I keep thinking about things and they keep coding, and sometimes I've noticed that they don't even go back on Slack for a couple of hours when I ask something, because they don't want to be distracted from their ideas. So maybe there could be a way for me, or an agent, or whoever, to sneak into their idea, ask a question, talk to them. That would be fantastic. Maybe it sounds a little crazy; you still want to have privacy and flow, but at the same time there is the reality of your job: you do need to go back to your email, to your Slack or Teams or whatever you're using, and get distracted, and then you forget what it is that you've been onto when you come back to your mode of execution.

Absolutely. In a way, applications are distracting. I think there was a really good study recently that showed the danger of interrupting engineers because of the context switch, and it's definitely the same for business people; they're just like everybody else, context matters and it can be hard to switch. I think that's the real promise of AI. Look at ChatGPT: you go to ChatGPT, they're going to have web search soon, you can ask it questions, and you don't have to go to Google and Bing and five other places to get the answer. And that is the real possibility: that you would choose the way you want to interact with it, and that thing, in theory, that single point, that single pane of glass or single conversational agent, could potentially be in front of many, many sources of data. I think that's what people realize. It's hard to say what it really looks like in five years, but if AI really continues along the path it's on,
the answer is: it's the end of applications.

Yeah, exactly. And going back to being immersive and feeling that I'm myself and I am in control, and not vice versa. Today I don't feel like I'm in control: applications update by themselves, my iPhone restarts, I have no idea what's in that update; I will never be able to understand what they do, but they do it. So sometimes I feel like whatever I have bought belongs to someone else. But probably this will change, and I think this should change. As we wrap up, I was thinking: is there something you want to call out to the community? By now Swirl obviously has progressed, you guys are open source, I love it, you have a bunch of contributors that you trust and work with, but is there anything that you would benefit from calling out to the larger community?

I'm very happy to see folks shift and focus towards search. The thing I'd call out is that there are many different user communities that want to consume AI, and they will benefit from it, and I think the key is not to go too far on the hype cycle. Because honestly, another thing I learned is that not everybody is into the details of how AI works; fine-tuning is an example, it's a very deep discussion. At some level I'm no expert; I can tell you a lot about it, but there are people who have done much more on it than I ever will. But at the end of the day, that's way, way far from the user's head. What they're trying to understand, and what the people making decisions about bringing these things in are asking, is: is it safe, how can I trust it, how do I get it to provide a benefit? So I think the honest thing is, rather than focusing on, like, we've got a few more tokens than somebody else, talk about use cases, focus on the user. I think that's what Swirl did from the beginning, because if you're in search, there is nothing but the user: the
user's intent is everything right and from like we can go back to lots of lots of great writing about that from biaires e8s to tinkle and and all points in between but the user's intent's important and that's the thing to focus on what are they trying to accomplish and build great use cases that ultimately you know allow people to focus on the things they'd rather focus on instead of you know the minutia and the time of collecting all these different data points yeah that's i think that's i think what you do as a as a tech industry responding to the user demand for AI my two cents oh that's amazing i don't don't try to outsmart the users and uh make things that you produce explainable and so they probably will adopt them weaker uh that's amazing i also see uh super uh maybe provocative really nice one from your side where you say that you outshine google i will link it in the show notes and maybe we'll discuss it at some point as well something about ranking looks it i really enjoyed chatting to you today i'm sure there will be someone you know in the community reaching out and maybe trying out swirl to be honest it's itching for me to try it out right when i when i said it when i mentioned in my company and someone said i would love to have that single keyword box so that i can search slack and conference and email and everything um that's amazing that's fantastic and also amazing that you guys do it uh in the open so everyone can try it um all the best to you good luck in in whatever you're building in your uh next big things thanks to me tree thank you so much it was great to talk to you and uh when you want to ready to try a swirl you got the open source version enjoy our slack and we'll be happy to help absolutely thank you very much enjoy your day youtube \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md 
b/transcripts_with_timestamps/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md new file mode 100644 index 0000000..a87a270 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/tom-lackner-vp-engineering-classic-com-on-qdrant-nft-challenges-and-joys-of-ml-engineering.md @@ -0,0 +1,3385 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=kVCIDTmiZyk

Show notes:

- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction https://research.google/pubs/pub46555/

- IEEE MLOps Standard for Ethical AI https://docs.google.com/document/d/1x...

- Qdrant: https://qdrant.tech/

- Elixir connector for Qdrant by Tom: https://github.com/tlack/exqdr

- Other 6 vector databases: https://towardsdatascience.com/milvus...

- ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626

- Tantivy: https://github.com/quickwit-inc/tantivy

- Papers with code: https://paperswithcode.com/

' +image_url: https://media.rss.com/vector-podcast/20211223_041259_de64d1b728c612795842622095155ffc.jpg +pub_date: Thu, 23 Dec 2021 16:01:59 GMT +title: Tom Lackner - VP Engineering - Classic.com - on Qdrant, NFT, challenges and + joys of ML engineering +url: https://rss.com/podcasts/vector-podcast/347538 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 14.120000000000001, + "text": " Hi, everyone.", "tokens": [50364, 2421, 11, 1518, 13, 51070], "temperature": + 0.0, "avg_logprob": -0.33490589141845706, "compression_ratio": 1.2318840579710144, + "no_speech_prob": 0.06484056264162064}, {"id": 1, "seek": 0, "start": 14.120000000000001, + "end": 16.04, "text": " Bector Podcast is here.", "tokens": [51070, 363, 20814, + 29972, 307, 510, 13, 51166], "temperature": 0.0, "avg_logprob": -0.33490589141845706, + "compression_ratio": 1.2318840579710144, "no_speech_prob": 0.06484056264162064}, + {"id": 2, "seek": 0, "start": 16.04, "end": 22.44, "text": " And today we have Tom + Lackner, Vice President of Technology at the company called Classic.", "tokens": + [51166, 400, 965, 321, 362, 5041, 441, 501, 1193, 11, 13276, 3117, 295, 15037, 412, + 264, 2237, 1219, 25008, 13, 51486], "temperature": 0.0, "avg_logprob": -0.33490589141845706, + "compression_ratio": 1.2318840579710144, "no_speech_prob": 0.06484056264162064}, + {"id": 3, "seek": 0, "start": 22.44, "end": 24.52, "text": " And I''m sure Tom will + talk more about it.", "tokens": [51486, 400, 286, 478, 988, 5041, 486, 751, 544, + 466, 309, 13, 51590], "temperature": 0.0, "avg_logprob": -0.33490589141845706, "compression_ratio": + 1.2318840579710144, "no_speech_prob": 0.06484056264162064}, {"id": 4, "seek": 2452, + "start": 24.52, "end": 31.12, "text": " And he''s also the founder and sole developer + of Lookpop, which I''m sure Tom will talk more", "tokens": [50364, 400, 415, 311, + 611, 264, 14917, 293, 12321, 10754, 295, 2053, 13872, 11, 597, 286, 478, 988, 5041, + 486, 751, 544, 50694], "temperature": 
0.0, "avg_logprob": -0.2699189552894005, "compression_ratio": + 1.6223175965665235, "no_speech_prob": 0.13463129103183746}, {"id": 5, "seek": 2452, + "start": 31.12, "end": 33.32, "text": " about as well today.", "tokens": [50694, + 466, 382, 731, 965, 13, 50804], "temperature": 0.0, "avg_logprob": -0.2699189552894005, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.13463129103183746}, + {"id": 6, "seek": 2452, "start": 33.32, "end": 39.64, "text": " And what''s really + cool is that Tom has been using vector database called Quadrant in his", "tokens": + [50804, 400, 437, 311, 534, 1627, 307, 300, 5041, 575, 668, 1228, 8062, 8149, 1219, + 29619, 7541, 294, 702, 51120], "temperature": 0.0, "avg_logprob": -0.2699189552894005, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.13463129103183746}, + {"id": 7, "seek": 2452, "start": 39.64, "end": 40.92, "text": " development.", "tokens": + [51120, 3250, 13, 51184], "temperature": 0.0, "avg_logprob": -0.2699189552894005, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.13463129103183746}, + {"id": 8, "seek": 2452, "start": 40.92, "end": 45.879999999999995, "text": " And + so today we have a user of a vector database, not a maker.", "tokens": [51184, 400, + 370, 965, 321, 362, 257, 4195, 295, 257, 8062, 8149, 11, 406, 257, 17127, 13, 51432], + "temperature": 0.0, "avg_logprob": -0.2699189552894005, "compression_ratio": 1.6223175965665235, + "no_speech_prob": 0.13463129103183746}, {"id": 9, "seek": 2452, "start": 45.879999999999995, + "end": 50.2, "text": " And that''s amazing to hear firsthand how it goes with vector + database.", "tokens": [51432, 400, 300, 311, 2243, 281, 1568, 38599, 577, 309, 1709, + 365, 8062, 8149, 13, 51648], "temperature": 0.0, "avg_logprob": -0.2699189552894005, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.13463129103183746}, + {"id": 10, "seek": 2452, "start": 50.2, "end": 51.2, "text": " Hey Tom.", "tokens": + [51648, 1911, 5041, 13, 
51698], "temperature": 0.0, "avg_logprob": -0.2699189552894005, + "compression_ratio": 1.6223175965665235, "no_speech_prob": 0.13463129103183746}, + {"id": 11, "seek": 2452, "start": 51.2, "end": 53.96, "text": " Hey, what''s going + on?", "tokens": [51698, 1911, 11, 437, 311, 516, 322, 30, 51836], "temperature": + 0.0, "avg_logprob": -0.2699189552894005, "compression_ratio": 1.6223175965665235, + "no_speech_prob": 0.13463129103183746}, {"id": 12, "seek": 5396, "start": 53.96, + "end": 56.28, "text": " So great that you joined today.", "tokens": [50364, 407, + 869, 300, 291, 6869, 965, 13, 50480], "temperature": 0.0, "avg_logprob": -0.20187821463933067, + "compression_ratio": 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, + {"id": 13, "seek": 5396, "start": 56.28, "end": 60.52, "text": " And I just wanted + to start as usual, like if you could please introduce yourself and", "tokens": [50480, + 400, 286, 445, 1415, 281, 722, 382, 7713, 11, 411, 498, 291, 727, 1767, 5366, 1803, + 293, 50692], "temperature": 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": + 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, {"id": 14, "seek": + 5396, "start": 60.52, "end": 63.28, "text": " give you a little bit like color to + your background.", "tokens": [50692, 976, 291, 257, 707, 857, 411, 2017, 281, 428, + 3678, 13, 50830], "temperature": 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": + 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, {"id": 15, "seek": + 5396, "start": 63.28, "end": 64.28, "text": " Sure.", "tokens": [50830, 4894, 13, + 50880], "temperature": 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": + 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, {"id": 16, "seek": + 5396, "start": 64.28, "end": 65.28, "text": " My name is Tom Lackner.", "tokens": + [50880, 1222, 1315, 307, 5041, 441, 501, 1193, 13, 50930], "temperature": 0.0, "avg_logprob": + -0.20187821463933067, 
"compression_ratio": 1.5819397993311037, "no_speech_prob": + 0.008327370509505272}, {"id": 17, "seek": 5396, "start": 65.28, "end": 69.68, "text": + " I''m a software developer living in Miami, Florida, a very warm place.", "tokens": + [50930, 286, 478, 257, 4722, 10754, 2647, 294, 18367, 11, 9117, 11, 257, 588, 4561, + 1081, 13, 51150], "temperature": 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": + 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, {"id": 18, "seek": + 5396, "start": 69.68, "end": 74.88, "text": " I''ve been developing stuff on the + web for about 20 years now since the early days of", "tokens": [51150, 286, 600, + 668, 6416, 1507, 322, 264, 3670, 337, 466, 945, 924, 586, 1670, 264, 2440, 1708, + 295, 51410], "temperature": 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": + 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, {"id": 19, "seek": + 5396, "start": 74.88, "end": 75.88, "text": " it.", "tokens": [51410, 309, 13, 51460], + "temperature": 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": 1.5819397993311037, + "no_speech_prob": 0.008327370509505272}, {"id": 20, "seek": 5396, "start": 75.88, + "end": 80.28, "text": " And I really, really love vector databases these days and + doing stuff with embeddings.", "tokens": [51460, 400, 286, 534, 11, 534, 959, 8062, + 22380, 613, 1708, 293, 884, 1507, 365, 12240, 29432, 13, 51680], "temperature": + 0.0, "avg_logprob": -0.20187821463933067, "compression_ratio": 1.5819397993311037, + "no_speech_prob": 0.008327370509505272}, {"id": 21, "seek": 5396, "start": 80.28, + "end": 83.2, "text": " Yeah, fantastic, fantastic.", "tokens": [51680, 865, 11, + 5456, 11, 5456, 13, 51826], "temperature": 0.0, "avg_logprob": -0.20187821463933067, + "compression_ratio": 1.5819397993311037, "no_speech_prob": 0.008327370509505272}, + {"id": 22, "seek": 8320, "start": 83.2, "end": 86.44, "text": " And can you tell + more about classic?", "tokens": [50364, 
400, 393, 291, 980, 544, 466, 7230, 30, + 50526], "temperature": 0.0, "avg_logprob": -0.17682403564453125, "compression_ratio": + 1.723021582733813, "no_speech_prob": 0.0038303793407976627}, {"id": 23, "seek": + 8320, "start": 86.44, "end": 91.76, "text": " So I know that it''s about classic + cars, but yeah, what''s this website is about and what''s", "tokens": [50526, 407, + 286, 458, 300, 309, 311, 466, 7230, 5163, 11, 457, 1338, 11, 437, 311, 341, 3144, + 307, 466, 293, 437, 311, 50792], "temperature": 0.0, "avg_logprob": -0.17682403564453125, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.0038303793407976627}, + {"id": 24, "seek": 8320, "start": 91.76, "end": 94.84, "text": " the community maybe + around it and so on.", "tokens": [50792, 264, 1768, 1310, 926, 309, 293, 370, 322, + 13, 50946], "temperature": 0.0, "avg_logprob": -0.17682403564453125, "compression_ratio": + 1.723021582733813, "no_speech_prob": 0.0038303793407976627}, {"id": 25, "seek": + 8320, "start": 94.84, "end": 99.6, "text": " So I''m the VP of technology for a + site called classic.com, the tracks classic car values.", "tokens": [50946, 407, + 286, 478, 264, 35812, 295, 2899, 337, 257, 3621, 1219, 7230, 13, 1112, 11, 264, + 10218, 7230, 1032, 4190, 13, 51184], "temperature": 0.0, "avg_logprob": -0.17682403564453125, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.0038303793407976627}, + {"id": 26, "seek": 8320, "start": 99.6, "end": 104.2, "text": " So what we basically + do is we go out on the web and we grab all the car sales that are", "tokens": [51184, + 407, 437, 321, 1936, 360, 307, 321, 352, 484, 322, 264, 3670, 293, 321, 4444, 439, + 264, 1032, 5763, 300, 366, 51414], "temperature": 0.0, "avg_logprob": -0.17682403564453125, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.0038303793407976627}, + {"id": 27, "seek": 8320, "start": 104.2, "end": 108.88, "text": " occurring that + are happening in a way that''s easily understood.", "tokens": 
[51414, 18386, 300, + 366, 2737, 294, 257, 636, 300, 311, 3612, 7320, 13, 51648], "temperature": 0.0, + "avg_logprob": -0.17682403564453125, "compression_ratio": 1.723021582733813, "no_speech_prob": + 0.0038303793407976627}, {"id": 28, "seek": 8320, "start": 108.88, "end": 112.4, + "text": " So if anything is sold with a price on it, we record that information.", + "tokens": [51648, 407, 498, 1340, 307, 3718, 365, 257, 3218, 322, 309, 11, 321, + 2136, 300, 1589, 13, 51824], "temperature": 0.0, "avg_logprob": -0.17682403564453125, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.0038303793407976627}, + {"id": 29, "seek": 11240, "start": 112.4, "end": 116.68, "text": " And then we cross + reference all these vehicles broken down into what we call markets.", "tokens": + [50364, 400, 550, 321, 3278, 6408, 439, 613, 8948, 5463, 760, 666, 437, 321, 818, + 8383, 13, 50578], "temperature": 0.0, "avg_logprob": -0.1719828212962431, "compression_ratio": + 1.697594501718213, "no_speech_prob": 0.00224810978397727}, {"id": 30, "seek": 11240, + "start": 116.68, "end": 122.0, "text": " So if a vehicle came in two different trims, + two different levels of options, we break", "tokens": [50578, 407, 498, 257, 5864, + 1361, 294, 732, 819, 10445, 82, 11, 732, 819, 4358, 295, 3956, 11, 321, 1821, 50844], + "temperature": 0.0, "avg_logprob": -0.1719828212962431, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.00224810978397727}, {"id": 31, "seek": 11240, "start": 122.0, + "end": 126.92, "text": " those out separately and we can give the user a really + good estimate of value with very", "tokens": [50844, 729, 484, 14759, 293, 321, + 393, 976, 264, 4195, 257, 534, 665, 12539, 295, 2158, 365, 588, 51090], "temperature": + 0.0, "avg_logprob": -0.1719828212962431, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.00224810978397727}, {"id": 32, "seek": 11240, "start": 126.92, + "end": 131.44, "text": " specific and granular understanding of what a car 
is really + worth.", "tokens": [51090, 2685, 293, 39962, 3701, 295, 437, 257, 1032, 307, 534, + 3163, 13, 51316], "temperature": 0.0, "avg_logprob": -0.1719828212962431, "compression_ratio": + 1.697594501718213, "no_speech_prob": 0.00224810978397727}, {"id": 33, "seek": 11240, + "start": 131.44, "end": 134.88, "text": " So it''s basically like a big data for + cars type project, I guess you could say.", "tokens": [51316, 407, 309, 311, 1936, + 411, 257, 955, 1412, 337, 5163, 2010, 1716, 11, 286, 2041, 291, 727, 584, 13, 51488], + "temperature": 0.0, "avg_logprob": -0.1719828212962431, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.00224810978397727}, {"id": 34, "seek": 11240, "start": 134.88, + "end": 139.64000000000001, "text": " Yeah, and I mean, I checked the website and + I mean, the cars look so great and some of", "tokens": [51488, 865, 11, 293, 286, + 914, 11, 286, 10033, 264, 3144, 293, 286, 914, 11, 264, 5163, 574, 370, 869, 293, + 512, 295, 51726], "temperature": 0.0, "avg_logprob": -0.1719828212962431, "compression_ratio": + 1.697594501718213, "no_speech_prob": 0.00224810978397727}, {"id": 35, "seek": 13964, + "start": 139.64, "end": 143.04, "text": " them are kind of like on on high end in + terms of pricing.", "tokens": [50364, 552, 366, 733, 295, 411, 322, 322, 1090, 917, + 294, 2115, 295, 17621, 13, 50534], "temperature": 0.0, "avg_logprob": -0.19497376955472506, + "compression_ratio": 1.6026058631921825, "no_speech_prob": 0.06344874948263168}, + {"id": 36, "seek": 13964, "start": 143.04, "end": 145.6, "text": " So it also defines + the audience, right?", "tokens": [50534, 407, 309, 611, 23122, 264, 4034, 11, 558, + 30, 50662], "temperature": 0.0, "avg_logprob": -0.19497376955472506, "compression_ratio": + 1.6026058631921825, "no_speech_prob": 0.06344874948263168}, {"id": 37, "seek": 13964, + "start": 145.6, "end": 150.64, "text": " Yeah, it''s classic car values have really + gone up in the past five years, especially considering", 
"tokens": [50662, 865, + 11, 309, 311, 7230, 1032, 4190, 362, 534, 2780, 493, 294, 264, 1791, 1732, 924, + 11, 2318, 8079, 50914], "temperature": 0.0, "avg_logprob": -0.19497376955472506, + "compression_ratio": 1.6026058631921825, "no_speech_prob": 0.06344874948263168}, + {"id": 38, "seek": 13964, "start": 150.64, "end": 153.6, "text": " COVID and a couple + of factors in the United States.", "tokens": [50914, 4566, 293, 257, 1916, 295, + 6771, 294, 264, 2824, 3040, 13, 51062], "temperature": 0.0, "avg_logprob": -0.19497376955472506, + "compression_ratio": 1.6026058631921825, "no_speech_prob": 0.06344874948263168}, + {"id": 39, "seek": 13964, "start": 153.6, "end": 157.44, "text": " So it''s more + important than ever to do really intelligent, like savvy shopping before you", "tokens": + [51062, 407, 309, 311, 544, 1021, 813, 1562, 281, 360, 534, 13232, 11, 411, 47506, + 8688, 949, 291, 51254], "temperature": 0.0, "avg_logprob": -0.19497376955472506, + "compression_ratio": 1.6026058631921825, "no_speech_prob": 0.06344874948263168}, + {"id": 40, "seek": 13964, "start": 157.44, "end": 158.44, "text": " make a purchase.", + "tokens": [51254, 652, 257, 8110, 13, 51304], "temperature": 0.0, "avg_logprob": + -0.19497376955472506, "compression_ratio": 1.6026058631921825, "no_speech_prob": + 0.06344874948263168}, {"id": 41, "seek": 13964, "start": 158.44, "end": 159.44, + "text": " So that''s where we''re coming from.", "tokens": [51304, 407, 300, 311, + 689, 321, 434, 1348, 490, 13, 51354], "temperature": 0.0, "avg_logprob": -0.19497376955472506, + "compression_ratio": 1.6026058631921825, "no_speech_prob": 0.06344874948263168}, + {"id": 42, "seek": 13964, "start": 159.44, "end": 160.44, "text": " Oh, yeah.", + "tokens": [51354, 876, 11, 1338, 13, 51404], "temperature": 0.0, "avg_logprob": + -0.19497376955472506, "compression_ratio": 1.6026058631921825, "no_speech_prob": + 0.06344874948263168}, {"id": 43, "seek": 13964, "start": 160.44, "end": 161.44, + "text": " 
Awesome.", "tokens": [51404, 10391, 13, 51454], "temperature": 0.0, "avg_logprob": + -0.19497376955472506, "compression_ratio": 1.6026058631921825, "no_speech_prob": + 0.06344874948263168}, {"id": 44, "seek": 13964, "start": 161.44, "end": 167.79999999999998, + "text": " And like, is it so that the kind of the user experience is mostly kind + of managed on", "tokens": [51454, 400, 411, 11, 307, 309, 370, 300, 264, 733, 295, + 264, 4195, 1752, 307, 5240, 733, 295, 6453, 322, 51772], "temperature": 0.0, "avg_logprob": + -0.19497376955472506, "compression_ratio": 1.6026058631921825, "no_speech_prob": + 0.06344874948263168}, {"id": 45, "seek": 16780, "start": 167.8, "end": 173.08, "text": + " the website or you have also some offline part of the operations?", "tokens": + [50364, 264, 3144, 420, 291, 362, 611, 512, 21857, 644, 295, 264, 7705, 30, 50628], + "temperature": 0.0, "avg_logprob": -0.19043871251548208, "compression_ratio": 1.7671232876712328, + "no_speech_prob": 0.00932757742702961}, {"id": 46, "seek": 16780, "start": 173.08, + "end": 175.92000000000002, "text": " So most of our operations are online on the + website.", "tokens": [50628, 407, 881, 295, 527, 7705, 366, 2950, 322, 264, 3144, + 13, 50770], "temperature": 0.0, "avg_logprob": -0.19043871251548208, "compression_ratio": + 1.7671232876712328, "no_speech_prob": 0.00932757742702961}, {"id": 47, "seek": 16780, + "start": 175.92000000000002, "end": 179.96, "text": " We also have an iPhone app, + but what''s really important is our backend crawlers.", "tokens": [50770, 492, 611, + 362, 364, 7252, 724, 11, 457, 437, 311, 534, 1021, 307, 527, 38087, 13999, 11977, + 13, 50972], "temperature": 0.0, "avg_logprob": -0.19043871251548208, "compression_ratio": + 1.7671232876712328, "no_speech_prob": 0.00932757742702961}, {"id": 48, "seek": 16780, + "start": 179.96, "end": 187.0, "text": " So we have a huge amount of software and + resources attached to the idea of brightened crawlers", "tokens": [50972, 407, 321, + 
362, 257, 2603, 2372, 295, 4722, 293, 3593, 8570, 281, 264, 1558, 295, 4730, 5320, + 13999, 11977, 51324], "temperature": 0.0, "avg_logprob": -0.19043871251548208, "compression_ratio": + 1.7671232876712328, "no_speech_prob": 0.00932757742702961}, {"id": 49, "seek": 16780, + "start": 187.0, "end": 190.60000000000002, "text": " that can understand different + auction websites really, really well.", "tokens": [51324, 300, 393, 1223, 819, 24139, + 12891, 534, 11, 534, 731, 13, 51504], "temperature": 0.0, "avg_logprob": -0.19043871251548208, + "compression_ratio": 1.7671232876712328, "no_speech_prob": 0.00932757742702961}, + {"id": 50, "seek": 16780, "start": 190.60000000000002, "end": 194.08, "text": " + That''s like a critical part of the infrastructure that''s sort of behind the scenes, + but ends up", "tokens": [51504, 663, 311, 411, 257, 4924, 644, 295, 264, 6896, 300, + 311, 1333, 295, 2261, 264, 8026, 11, 457, 5314, 493, 51678], "temperature": 0.0, + "avg_logprob": -0.19043871251548208, "compression_ratio": 1.7671232876712328, "no_speech_prob": + 0.00932757742702961}, {"id": 51, "seek": 16780, "start": 194.08, "end": 197.08, + "text": " doing becoming, you know, a key part of what we''re doing.", "tokens": + [51678, 884, 5617, 11, 291, 458, 11, 257, 2141, 644, 295, 437, 321, 434, 884, 13, + 51828], "temperature": 0.0, "avg_logprob": -0.19043871251548208, "compression_ratio": + 1.7671232876712328, "no_speech_prob": 0.00932757742702961}, {"id": 52, "seek": 19708, + "start": 197.20000000000002, "end": 198.08, "text": " Yeah.", "tokens": [50370, + 865, 13, 50414], "temperature": 0.0, "avg_logprob": -0.23322883981173156, "compression_ratio": + 1.7169811320754718, "no_speech_prob": 0.0056465016677975655}, {"id": 53, "seek": + 19708, "start": 198.08, "end": 200.16000000000003, "text": " And I knew obviously + I have a search bar there.", "tokens": [50414, 400, 286, 2586, 2745, 286, 362, 257, + 3164, 2159, 456, 13, 50518], "temperature": 0.0, "avg_logprob": 
-0.23322883981173156, + "compression_ratio": 1.7169811320754718, "no_speech_prob": 0.0056465016677975655}, + {"id": 54, "seek": 19708, "start": 200.16000000000003, "end": 206.44, "text": " + So what happens when I type something like in, you know, on classic, we use a combination", + "tokens": [50518, 407, 437, 2314, 562, 286, 2010, 746, 411, 294, 11, 291, 458, 11, + 322, 7230, 11, 321, 764, 257, 6562, 50832], "temperature": 0.0, "avg_logprob": -0.23322883981173156, + "compression_ratio": 1.7169811320754718, "no_speech_prob": 0.0056465016677975655}, + {"id": 55, "seek": 19708, "start": 206.44, "end": 211.8, "text": " of Postgres for + the actual like OLTP data, like, you know, the actual ground truth.", "tokens": + [50832, 295, 10223, 45189, 337, 264, 3539, 411, 39191, 16804, 1412, 11, 411, 11, + 291, 458, 11, 264, 3539, 2727, 3494, 13, 51100], "temperature": 0.0, "avg_logprob": + -0.23322883981173156, "compression_ratio": 1.7169811320754718, "no_speech_prob": + 0.0056465016677975655}, {"id": 56, "seek": 19708, "start": 211.8, "end": 215.84, + "text": " And then we feed that into a last search to do the full text search.", + "tokens": [51100, 400, 550, 321, 3154, 300, 666, 257, 1036, 3164, 281, 360, 264, + 1577, 2487, 3164, 13, 51302], "temperature": 0.0, "avg_logprob": -0.23322883981173156, + "compression_ratio": 1.7169811320754718, "no_speech_prob": 0.0056465016677975655}, + {"id": 57, "seek": 19708, "start": 215.84, "end": 221.64000000000001, "text": " + What we''re actually trying to do there is transition that as well to using a text + embedding.", "tokens": [51302, 708, 321, 434, 767, 1382, 281, 360, 456, 307, 6034, + 300, 382, 731, 281, 1228, 257, 2487, 12240, 3584, 13, 51592], "temperature": 0.0, + "avg_logprob": -0.23322883981173156, "compression_ratio": 1.7169811320754718, "no_speech_prob": + 0.0056465016677975655}, {"id": 58, "seek": 19708, "start": 221.64000000000001, "end": + 225.44, "text": " We I find that text embeddings are easier to use in the 
long run.", + "tokens": [51592, 492, 286, 915, 300, 2487, 12240, 29432, 366, 3571, 281, 764, 294, + 264, 938, 1190, 13, 51782], "temperature": 0.0, "avg_logprob": -0.23322883981173156, + "compression_ratio": 1.7169811320754718, "no_speech_prob": 0.0056465016677975655}, + {"id": 59, "seek": 22544, "start": 225.44, "end": 228.96, "text": " But what''s + actually challenging there is developing a good understanding of typos.", "tokens": + [50364, 583, 437, 311, 767, 7595, 456, 307, 6416, 257, 665, 3701, 295, 2125, 329, + 13, 50540], "temperature": 0.0, "avg_logprob": -0.24042604815575383, "compression_ratio": + 1.7112676056338028, "no_speech_prob": 0.0036462817806750536}, {"id": 60, "seek": + 22544, "start": 228.96, "end": 229.96, "text": " Right.", "tokens": [50540, 1779, + 13, 50590], "temperature": 0.0, "avg_logprob": -0.24042604815575383, "compression_ratio": + 1.7112676056338028, "no_speech_prob": 0.0036462817806750536}, {"id": 61, "seek": + 22544, "start": 229.96, "end": 234.72, "text": " So we could probably go into one + detail later, but the most of the text embeddings that you", "tokens": [50590, 407, + 321, 727, 1391, 352, 666, 472, 2607, 1780, 11, 457, 264, 881, 295, 264, 2487, 12240, + 29432, 300, 291, 50828], "temperature": 0.0, "avg_logprob": -0.24042604815575383, + "compression_ratio": 1.7112676056338028, "no_speech_prob": 0.0036462817806750536}, + {"id": 62, "seek": 22544, "start": 234.72, "end": 237.28, "text": " encounter aren''t + really typo tolerant.", "tokens": [50828, 8593, 3212, 380, 534, 2125, 78, 45525, + 13, 50956], "temperature": 0.0, "avg_logprob": -0.24042604815575383, "compression_ratio": + 1.7112676056338028, "no_speech_prob": 0.0036462817806750536}, {"id": 63, "seek": + 22544, "start": 237.28, "end": 241.07999999999998, "text": " So in our case, that + search box needs to really understand, like, let''s say, Ferrari", "tokens": [50956, + 407, 294, 527, 1389, 11, 300, 3164, 2424, 2203, 281, 534, 1223, 11, 411, 11, 718, + 311, 584, 11, 
29828, 51146], "temperature": 0.0, "avg_logprob": -0.24042604815575383, + "compression_ratio": 1.7112676056338028, "no_speech_prob": 0.0036462817806750536}, + {"id": 64, "seek": 22544, "start": 241.07999999999998, "end": 242.07999999999998, + "text": " or Lamborghini.", "tokens": [51146, 420, 48389, 45600, 13, 51196], "temperature": + 0.0, "avg_logprob": -0.24042604815575383, "compression_ratio": 1.7112676056338028, + "no_speech_prob": 0.0036462817806750536}, {"id": 65, "seek": 22544, "start": 242.07999999999998, + "end": 246.0, "text": " Those words are often spelled incorrectly for obvious reasons.", + "tokens": [51196, 3950, 2283, 366, 2049, 34388, 42892, 337, 6322, 4112, 13, 51392], + "temperature": 0.0, "avg_logprob": -0.24042604815575383, "compression_ratio": 1.7112676056338028, + "no_speech_prob": 0.0036462817806750536}, {"id": 66, "seek": 22544, "start": 246.0, + "end": 251.07999999999998, "text": " So one of the things that''s holding us up + there is developing a typo, is a system level", "tokens": [51392, 407, 472, 295, + 264, 721, 300, 311, 5061, 505, 493, 456, 307, 6416, 257, 2125, 78, 11, 307, 257, + 1185, 1496, 51646], "temperature": 0.0, "avg_logprob": -0.24042604815575383, "compression_ratio": + 1.7112676056338028, "no_speech_prob": 0.0036462817806750536}, {"id": 67, "seek": + 22544, "start": 251.07999999999998, "end": 252.07999999999998, "text": " of embedding.", + "tokens": [51646, 295, 12240, 3584, 13, 51696], "temperature": 0.0, "avg_logprob": + -0.24042604815575383, "compression_ratio": 1.7112676056338028, "no_speech_prob": + 0.0036462817806750536}, {"id": 68, "seek": 25208, "start": 252.52, "end": 258.2, + "text": " Yeah, it sounds also some similarities to web search, for example, you + know, like where", "tokens": [50386, 865, 11, 309, 3263, 611, 512, 24197, 281, 3670, + 3164, 11, 337, 1365, 11, 291, 458, 11, 411, 689, 50670], "temperature": 0.0, "avg_logprob": + -0.21365135962809992, "compression_ratio": 1.6330935251798562, 
"no_speech_prob": + 0.015611261129379272}, {"id": 69, "seek": 25208, "start": 258.2, "end": 263.32, + "text": " users are using like colloquial language, or if they if they talk to their + microphone", "tokens": [50670, 5022, 366, 1228, 411, 1263, 29826, 831, 2856, 11, + 420, 498, 436, 498, 436, 751, 281, 641, 10952, 50926], "temperature": 0.0, "avg_logprob": + -0.21365135962809992, "compression_ratio": 1.6330935251798562, "no_speech_prob": + 0.015611261129379272}, {"id": 70, "seek": 25208, "start": 263.32, "end": 269.04, + "text": " instead of typing, then you have these typical problems from ASR, like + automatic speech recognition", "tokens": [50926, 2602, 295, 18444, 11, 550, 291, + 362, 613, 7476, 2740, 490, 7469, 49, 11, 411, 12509, 6218, 11150, 51212], "temperature": + 0.0, "avg_logprob": -0.21365135962809992, "compression_ratio": 1.6330935251798562, + "no_speech_prob": 0.015611261129379272}, {"id": 71, "seek": 25208, "start": 269.04, + "end": 270.88, "text": " systems, and you need to tolerate that.", "tokens": [51212, + 3652, 11, 293, 291, 643, 281, 25773, 300, 13, 51304], "temperature": 0.0, "avg_logprob": + -0.21365135962809992, "compression_ratio": 1.6330935251798562, "no_speech_prob": + 0.015611261129379272}, {"id": 72, "seek": 25208, "start": 270.88, "end": 275.84000000000003, + "text": " So so it means that I don''t know, like we''ve been thinking about data + augmentation techniques.", "tokens": [51304, 407, 370, 309, 1355, 300, 286, 500, + 380, 458, 11, 411, 321, 600, 668, 1953, 466, 1412, 14501, 19631, 7512, 13, 51552], + "temperature": 0.0, "avg_logprob": -0.21365135962809992, "compression_ratio": 1.6330935251798562, + "no_speech_prob": 0.015611261129379272}, {"id": 73, "seek": 25208, "start": 275.84000000000003, + "end": 277.84000000000003, "text": " Have you have you thought about that as well?", + "tokens": [51552, 3560, 291, 362, 291, 1194, 466, 300, 382, 731, 30, 51652], "temperature": + 0.0, "avg_logprob": -0.21365135962809992, 
"compression_ratio": 1.6330935251798562, + "no_speech_prob": 0.015611261129379272}, {"id": 74, "seek": 27784, "start": 278.67999999999995, + "end": 285.59999999999997, "text": " So what I''ve tried to do is to retrain the + retrain the model using basically our input", "tokens": [50406, 407, 437, 286, 600, + 3031, 281, 360, 307, 281, 1533, 7146, 264, 1533, 7146, 264, 2316, 1228, 1936, 527, + 4846, 50752], "temperature": 0.0, "avg_logprob": -0.25395614143431655, "compression_ratio": + 1.697594501718213, "no_speech_prob": 0.01793365366756916}, {"id": 75, "seek": 27784, + "start": 285.59999999999997, "end": 291.12, "text": " data, but with certain transformations + applied certain permutations.", "tokens": [50752, 1412, 11, 457, 365, 1629, 34852, + 6456, 1629, 4784, 325, 763, 13, 51028], "temperature": 0.0, "avg_logprob": -0.25395614143431655, + "compression_ratio": 1.697594501718213, "no_speech_prob": 0.01793365366756916}, + {"id": 76, "seek": 27784, "start": 291.12, "end": 295.15999999999997, "text": " + At this point, I''m not I am not the point where I have a usable model coming out + of that,", "tokens": [51028, 1711, 341, 935, 11, 286, 478, 406, 286, 669, 406, 264, + 935, 689, 286, 362, 257, 29975, 2316, 1348, 484, 295, 300, 11, 51230], "temperature": + 0.0, "avg_logprob": -0.25395614143431655, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.01793365366756916}, {"id": 77, "seek": 27784, "start": 295.15999999999997, + "end": 297.84, "text": " but I''m still doing some research and I it should work + in theory.", "tokens": [51230, 457, 286, 478, 920, 884, 512, 2132, 293, 286, 309, + 820, 589, 294, 5261, 13, 51364], "temperature": 0.0, "avg_logprob": -0.25395614143431655, + "compression_ratio": 1.697594501718213, "no_speech_prob": 0.01793365366756916}, + {"id": 78, "seek": 27784, "start": 297.84, "end": 302.15999999999997, "text": " + Yeah, and there are so many models like on hugging face that they guess you can + also kind", "tokens": [51364, 865, 
11, 293, 456, 366, 370, 867, 5245, 411, 322, + 41706, 1851, 300, 436, 2041, 291, 393, 611, 733, 51580], "temperature": 0.0, "avg_logprob": + -0.25395614143431655, "compression_ratio": 1.697594501718213, "no_speech_prob": + 0.01793365366756916}, {"id": 79, "seek": 27784, "start": 302.15999999999997, "end": + 303.15999999999997, "text": " of tap in, right?", "tokens": [51580, 295, 5119, 294, + 11, 558, 30, 51630], "temperature": 0.0, "avg_logprob": -0.25395614143431655, "compression_ratio": + 1.697594501718213, "no_speech_prob": 0.01793365366756916}, {"id": 80, "seek": 27784, + "start": 303.15999999999997, "end": 307.23999999999995, "text": " And that''s actually + one of the hard parts is to evaluate all those models.", "tokens": [51630, 400, + 300, 311, 767, 472, 295, 264, 1152, 3166, 307, 281, 13059, 439, 729, 5245, 13, 51834], + "temperature": 0.0, "avg_logprob": -0.25395614143431655, "compression_ratio": 1.697594501718213, + "no_speech_prob": 0.01793365366756916}, {"id": 81, "seek": 30724, "start": 307.24, + "end": 310.72, "text": " So I have been taking a couple days to write a script that + just downloaded every single", "tokens": [50364, 407, 286, 362, 668, 1940, 257, + 1916, 1708, 281, 2464, 257, 5755, 300, 445, 21748, 633, 2167, 50538], "temperature": + 0.0, "avg_logprob": -0.22285458996037769, "compression_ratio": 1.717201166180758, + "no_speech_prob": 0.0009586111409589648}, {"id": 82, "seek": 30724, "start": 310.72, + "end": 313.68, "text": " one and tried them to determine which to best at our tasks.", + "tokens": [50538, 472, 293, 3031, 552, 281, 6997, 597, 281, 1151, 412, 527, 9608, + 13, 50686], "temperature": 0.0, "avg_logprob": -0.22285458996037769, "compression_ratio": + 1.717201166180758, "no_speech_prob": 0.0009586111409589648}, {"id": 83, "seek": + 30724, "start": 313.68, "end": 314.68, "text": " Yeah, exactly.", "tokens": [50686, + 865, 11, 2293, 13, 50736], "temperature": 0.0, "avg_logprob": -0.22285458996037769, + "compression_ratio": 
1.717201166180758, "no_speech_prob": 0.0009586111409589648}, + {"id": 84, "seek": 30724, "start": 314.68, "end": 319.68, "text": " And also like + choosing kind of the quality metrics is another, another direction like how", "tokens": + [50736, 400, 611, 411, 10875, 733, 295, 264, 3125, 16367, 307, 1071, 11, 1071, 3513, + 411, 577, 50986], "temperature": 0.0, "avg_logprob": -0.22285458996037769, "compression_ratio": + 1.717201166180758, "no_speech_prob": 0.0009586111409589648}, {"id": 85, "seek": + 30724, "start": 319.68, "end": 320.68, "text": " do you evaluate?", "tokens": [50986, + 360, 291, 13059, 30, 51036], "temperature": 0.0, "avg_logprob": -0.22285458996037769, + "compression_ratio": 1.717201166180758, "no_speech_prob": 0.0009586111409589648}, + {"id": 86, "seek": 30724, "start": 320.68, "end": 321.68, "text": " Yeah, absolutely.", + "tokens": [51036, 865, 11, 3122, 13, 51086], "temperature": 0.0, "avg_logprob": + -0.22285458996037769, "compression_ratio": 1.717201166180758, "no_speech_prob": + 0.0009586111409589648}, {"id": 87, "seek": 30724, "start": 321.68, "end": 325.0, + "text": " Yeah, we''re kind of a new territory for a lot of this.", "tokens": [51086, + 865, 11, 321, 434, 733, 295, 257, 777, 11360, 337, 257, 688, 295, 341, 13, 51252], + "temperature": 0.0, "avg_logprob": -0.22285458996037769, "compression_ratio": 1.717201166180758, + "no_speech_prob": 0.0009586111409589648}, {"id": 88, "seek": 30724, "start": 325.0, + "end": 328.0, "text": " So I mean, that''s exciting on one hand, but on the other + hand, like sometimes you just", "tokens": [51252, 407, 286, 914, 11, 300, 311, 4670, + 322, 472, 1011, 11, 457, 322, 264, 661, 1011, 11, 411, 2171, 291, 445, 51402], "temperature": + 0.0, "avg_logprob": -0.22285458996037769, "compression_ratio": 1.717201166180758, + "no_speech_prob": 0.0009586111409589648}, {"id": 89, "seek": 30724, "start": 328.0, + "end": 331.92, "text": " don''t know the answer to a problem, which is like, yeah, + yeah, for sure.", 
"tokens": [51402, 500, 380, 458, 264, 1867, 281, 257, 1154, 11, + 597, 307, 411, 11, 1338, 11, 1338, 11, 337, 988, 13, 51598], "temperature": 0.0, + "avg_logprob": -0.22285458996037769, "compression_ratio": 1.717201166180758, "no_speech_prob": + 0.0009586111409589648}, {"id": 90, "seek": 30724, "start": 331.92, "end": 336.68, + "text": " So in that sense, classic is kind of well, it''s it''s funny that there + is a coincidence", "tokens": [51598, 407, 294, 300, 2020, 11, 7230, 307, 733, 295, + 731, 11, 309, 311, 309, 311, 4074, 300, 456, 307, 257, 22137, 51836], "temperature": + 0.0, "avg_logprob": -0.22285458996037769, "compression_ratio": 1.717201166180758, + "no_speech_prob": 0.0009586111409589648}, {"id": 91, "seek": 33668, "start": 336.68, + "end": 341.64, "text": " classic and then classic search like in a way that you''re + using TF idea for BM25,", "tokens": [50364, 7230, 293, 550, 7230, 3164, 411, 294, + 257, 636, 300, 291, 434, 1228, 40964, 1558, 337, 15901, 6074, 11, 50612], "temperature": + 0.0, "avg_logprob": -0.24085406651572575, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.02021767757833004}, {"id": 92, "seek": 33668, "start": 341.64, + "end": 342.64, "text": " right?", "tokens": [50612, 558, 30, 50662], "temperature": + 0.0, "avg_logprob": -0.24085406651572575, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.02021767757833004}, {"id": 93, "seek": 33668, "start": 342.64, + "end": 348.12, "text": " Well, of course, you will add an embedding layer at some + point to make it more semantic,", "tokens": [50662, 1042, 11, 295, 1164, 11, 291, + 486, 909, 364, 12240, 3584, 4583, 412, 512, 935, 281, 652, 309, 544, 47982, 11, + 50936], "temperature": 0.0, "avg_logprob": -0.24085406651572575, "compression_ratio": + 1.6725352112676057, "no_speech_prob": 0.02021767757833004}, {"id": 94, "seek": 33668, + "start": 348.12, "end": 349.12, "text": " right?", "tokens": [50936, 558, 30, 50986], + "temperature": 0.0, "avg_logprob": 
-0.24085406651572575, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.02021767757833004}, {"id": 95, "seek": 33668, "start": 349.12, + "end": 354.88, "text": " Then you said that you have look pop, which where you are + now experimenting with vector", "tokens": [50986, 1396, 291, 848, 300, 291, 362, + 574, 1665, 11, 597, 689, 291, 366, 586, 29070, 365, 8062, 51274], "temperature": + 0.0, "avg_logprob": -0.24085406651572575, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.02021767757833004}, {"id": 96, "seek": 33668, "start": 354.88, + "end": 355.88, "text": " search.", "tokens": [51274, 3164, 13, 51324], "temperature": + 0.0, "avg_logprob": -0.24085406651572575, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.02021767757833004}, {"id": 97, "seek": 33668, "start": 355.88, + "end": 358.36, "text": " Can you tell me more what is look pop?", "tokens": [51324, + 1664, 291, 980, 385, 544, 437, 307, 574, 1665, 30, 51448], "temperature": 0.0, "avg_logprob": + -0.24085406651572575, "compression_ratio": 1.6725352112676057, "no_speech_prob": + 0.02021767757833004}, {"id": 98, "seek": 33668, "start": 358.36, "end": 361.36, + "text": " And then how do you implement vector search there?", "tokens": [51448, + 400, 550, 577, 360, 291, 4445, 8062, 3164, 456, 30, 51598], "temperature": 0.0, + "avg_logprob": -0.24085406651572575, "compression_ratio": 1.6725352112676057, "no_speech_prob": + 0.02021767757833004}, {"id": 99, "seek": 33668, "start": 361.36, "end": 364.64, + "text": " For the last couple of years, I''ve been really interested in search engines + and how search", "tokens": [51598, 1171, 264, 1036, 1916, 295, 924, 11, 286, 600, + 668, 534, 3102, 294, 3164, 12982, 293, 577, 3164, 51762], "temperature": 0.0, "avg_logprob": + -0.24085406651572575, "compression_ratio": 1.6725352112676057, "no_speech_prob": + 0.02021767757833004}, {"id": 100, "seek": 33668, "start": 364.64, "end": 365.64, + "text": " engines work.", "tokens": [51762, 
12982, 589, 13, 51812], "temperature": + 0.0, "avg_logprob": -0.24085406651572575, "compression_ratio": 1.6725352112676057, + "no_speech_prob": 0.02021767757833004}, {"id": 101, "seek": 36564, "start": 365.96, + "end": 370.28, "text": " I feel like Google has sort of done us a disservice in + certain ways over the past, you", "tokens": [50380, 286, 841, 411, 3329, 575, 1333, + 295, 1096, 505, 257, 7802, 25006, 294, 1629, 2098, 670, 264, 1791, 11, 291, 50596], + "temperature": 0.0, "avg_logprob": -0.186670838740834, "compression_ratio": 1.5851851851851853, + "no_speech_prob": 0.0034059451427310705}, {"id": 102, "seek": 36564, "start": 370.28, + "end": 373.24, "text": " know, a couple generations of its development.", "tokens": + [50596, 458, 11, 257, 1916, 10593, 295, 1080, 3250, 13, 50744], "temperature": 0.0, + "avg_logprob": -0.186670838740834, "compression_ratio": 1.5851851851851853, "no_speech_prob": + 0.0034059451427310705}, {"id": 103, "seek": 36564, "start": 373.24, "end": 377.03999999999996, + "text": " So I''ve been interested in like, you know, developing better web search + tools.", "tokens": [50744, 407, 286, 600, 668, 3102, 294, 411, 11, 291, 458, 11, + 6416, 1101, 3670, 3164, 3873, 13, 50934], "temperature": 0.0, "avg_logprob": -0.186670838740834, + "compression_ratio": 1.5851851851851853, "no_speech_prob": 0.0034059451427310705}, + {"id": 104, "seek": 36564, "start": 377.03999999999996, "end": 381.71999999999997, + "text": " Lookpop.co is my effort to make a NFT search tool.", "tokens": [50934, + 2053, 13872, 13, 1291, 307, 452, 4630, 281, 652, 257, 50075, 3164, 2290, 13, 51168], + "temperature": 0.0, "avg_logprob": -0.186670838740834, "compression_ratio": 1.5851851851851853, + "no_speech_prob": 0.0034059451427310705}, {"id": 105, "seek": 36564, "start": 381.71999999999997, + "end": 384.84, "text": " So NFTs are digital artworks that you can buy in the past + year.", "tokens": [51168, 407, 13576, 33424, 366, 4562, 15829, 82, 300, 291, 393, + 2256, 
294, 264, 1791, 1064, 13, 51324], "temperature": 0.0, "avg_logprob": -0.186670838740834, + "compression_ratio": 1.5851851851851853, "no_speech_prob": 0.0034059451427310705}, + {"id": 106, "seek": 36564, "start": 384.84, "end": 387.0, "text": " The NFT market + has exploded.", "tokens": [51324, 440, 50075, 2142, 575, 27049, 13, 51432], "temperature": + 0.0, "avg_logprob": -0.186670838740834, "compression_ratio": 1.5851851851851853, + "no_speech_prob": 0.0034059451427310705}, {"id": 107, "seek": 36564, "start": 387.0, + "end": 392.28, "text": " I think something like $6 billion has been exchanged this + year in NFTs.", "tokens": [51432, 286, 519, 746, 411, 1848, 21, 5218, 575, 668, + 38378, 341, 1064, 294, 13576, 33424, 13, 51696], "temperature": 0.0, "avg_logprob": + -0.186670838740834, "compression_ratio": 1.5851851851851853, "no_speech_prob": 0.0034059451427310705}, + {"id": 108, "seek": 39228, "start": 392.28, "end": 396.96, "text": " But the problem + with NFTs is that coming from the world and the language of cryptocurrency,", "tokens": + [50364, 583, 264, 1154, 365, 13576, 33424, 307, 300, 1348, 490, 264, 1002, 293, + 264, 2856, 295, 28809, 11, 50598], "temperature": 0.0, "avg_logprob": -0.20883095782736075, + "compression_ratio": 1.7586206896551724, "no_speech_prob": 0.029990853741765022}, + {"id": 109, "seek": 39228, "start": 396.96, "end": 401.64, "text": " a lot of the + websites related to NFTs are about the price, the up, the down, the this,", "tokens": + [50598, 257, 688, 295, 264, 12891, 4077, 281, 13576, 33424, 366, 466, 264, 3218, + 11, 264, 493, 11, 264, 760, 11, 264, 341, 11, 50832], "temperature": 0.0, "avg_logprob": + -0.20883095782736075, "compression_ratio": 1.7586206896551724, "no_speech_prob": + 0.029990853741765022}, {"id": 110, "seek": 39228, "start": 401.64, "end": 402.64, + "text": " that that, you know, what''s hot?", "tokens": [50832, 300, 300, 11, 291, + 458, 11, 437, 311, 2368, 30, 50882], "temperature": 0.0, "avg_logprob": 
-0.20883095782736075, + "compression_ratio": 1.7586206896551724, "no_speech_prob": 0.029990853741765022}, + {"id": 111, "seek": 39228, "start": 402.64, "end": 407.03999999999996, "text": " + What''s not blah, blah, blah, who''s flipping, you know, like I personally don''t + care for", "tokens": [50882, 708, 311, 406, 12288, 11, 12288, 11, 12288, 11, 567, + 311, 26886, 11, 291, 458, 11, 411, 286, 5665, 500, 380, 1127, 337, 51102], "temperature": + 0.0, "avg_logprob": -0.20883095782736075, "compression_ratio": 1.7586206896551724, + "no_speech_prob": 0.029990853741765022}, {"id": 112, "seek": 39228, "start": 407.03999999999996, + "end": 408.03999999999996, "text": " that.", "tokens": [51102, 300, 13, 51152], + "temperature": 0.0, "avg_logprob": -0.20883095782736075, "compression_ratio": 1.7586206896551724, + "no_speech_prob": 0.029990853741765022}, {"id": 113, "seek": 39228, "start": 408.03999999999996, + "end": 411.32, "text": " So I was looking for an NFT search engine that could actually + help me understand the", "tokens": [51152, 407, 286, 390, 1237, 337, 364, 50075, + 3164, 2848, 300, 727, 767, 854, 385, 1223, 264, 51316], "temperature": 0.0, "avg_logprob": + -0.20883095782736075, "compression_ratio": 1.7586206896551724, "no_speech_prob": + 0.029990853741765022}, {"id": 114, "seek": 39228, "start": 411.32, "end": 414.71999999999997, + "text": " meaning of NFTs and find visually similar ones.", "tokens": [51316, 3620, + 295, 13576, 33424, 293, 915, 19622, 2531, 2306, 13, 51486], "temperature": 0.0, + "avg_logprob": -0.20883095782736075, "compression_ratio": 1.7586206896551724, "no_speech_prob": + 0.029990853741765022}, {"id": 115, "seek": 39228, "start": 414.71999999999997, "end": + 417.11999999999995, "text": " If I find something I like, it would be cool like + the up to see stuff that''s kind of in", "tokens": [51486, 759, 286, 915, 746, 286, + 411, 11, 309, 576, 312, 1627, 411, 264, 493, 281, 536, 1507, 300, 311, 733, 295, + 294, 51606], "temperature": 0.0, 
"avg_logprob": -0.20883095782736075, "compression_ratio": + 1.7586206896551724, "no_speech_prob": 0.029990853741765022}, {"id": 116, "seek": + 39228, "start": 417.11999999999995, "end": 422.08, "text": " that same vein without + having to manually search around on OpenC, for instance, which", "tokens": [51606, + 300, 912, 30669, 1553, 1419, 281, 16945, 3164, 926, 322, 7238, 34, 11, 337, 5197, + 11, 597, 51854], "temperature": 0.0, "avg_logprob": -0.20883095782736075, "compression_ratio": + 1.7586206896551724, "no_speech_prob": 0.029990853741765022}, {"id": 117, "seek": + 42208, "start": 422.08, "end": 424.71999999999997, "text": " is the number one NFT + market.", "tokens": [50364, 307, 264, 1230, 472, 50075, 2142, 13, 50496], "temperature": + 0.0, "avg_logprob": -0.22264932584361868, "compression_ratio": 1.6245487364620939, + "no_speech_prob": 0.00550074502825737}, {"id": 118, "seek": 42208, "start": 424.71999999999997, + "end": 428.03999999999996, "text": " You can only search by the name of the creators, + right, which is so weird to me.", "tokens": [50496, 509, 393, 787, 3164, 538, 264, + 1315, 295, 264, 16039, 11, 558, 11, 597, 307, 370, 3657, 281, 385, 13, 50662], "temperature": + 0.0, "avg_logprob": -0.22264932584361868, "compression_ratio": 1.6245487364620939, + "no_speech_prob": 0.00550074502825737}, {"id": 119, "seek": 42208, "start": 428.03999999999996, + "end": 431.8, "text": " I wanted to be able to search by themes, by visual styles.", + "tokens": [50662, 286, 1415, 281, 312, 1075, 281, 3164, 538, 13544, 11, 538, 5056, + 13273, 13, 50850], "temperature": 0.0, "avg_logprob": -0.22264932584361868, "compression_ratio": + 1.6245487364620939, "no_speech_prob": 0.00550074502825737}, {"id": 120, "seek": + 42208, "start": 431.8, "end": 437.08, "text": " And when I came across clip, the + text embedding system or the imaging embedding system, it", "tokens": [50850, 400, + 562, 286, 1361, 2108, 7353, 11, 264, 2487, 12240, 3584, 1185, 420, 264, 25036, 12240, + 
3584, 1185, 11, 309, 51114], "temperature": 0.0, "avg_logprob": -0.22264932584361868, + "compression_ratio": 1.6245487364620939, "no_speech_prob": 0.00550074502825737}, + {"id": 121, "seek": 42208, "start": 437.08, "end": 439.91999999999996, "text": " + really, like it provided all those features in a pretty easy to use way.", "tokens": + [51114, 534, 11, 411, 309, 5649, 439, 729, 4122, 294, 257, 1238, 1858, 281, 764, + 636, 13, 51256], "temperature": 0.0, "avg_logprob": -0.22264932584361868, "compression_ratio": + 1.6245487364620939, "no_speech_prob": 0.00550074502825737}, {"id": 122, "seek": + 42208, "start": 439.91999999999996, "end": 442.91999999999996, "text": " So I''m + really excited about that functionality.", "tokens": [51256, 407, 286, 478, 534, + 2919, 466, 300, 14980, 13, 51406], "temperature": 0.0, "avg_logprob": -0.22264932584361868, + "compression_ratio": 1.6245487364620939, "no_speech_prob": 0.00550074502825737}, + {"id": 123, "seek": 42208, "start": 442.91999999999996, "end": 443.91999999999996, + "text": " Yeah.", "tokens": [51406, 865, 13, 51456], "temperature": 0.0, "avg_logprob": + -0.22264932584361868, "compression_ratio": 1.6245487364620939, "no_speech_prob": + 0.00550074502825737}, {"id": 124, "seek": 42208, "start": 443.91999999999996, "end": + 449.2, "text": " And clip is basically the embedding model developed by OpenAI.", + "tokens": [51456, 400, 7353, 307, 1936, 264, 12240, 3584, 2316, 4743, 538, 7238, + 48698, 13, 51720], "temperature": 0.0, "avg_logprob": -0.22264932584361868, "compression_ratio": + 1.6245487364620939, "no_speech_prob": 0.00550074502825737}, {"id": 125, "seek": + 44920, "start": 449.2, "end": 452.84, "text": " I think it''s also available as + a hiking face model.", "tokens": [50364, 286, 519, 309, 311, 611, 2435, 382, 257, + 23784, 1851, 2316, 13, 50546], "temperature": 0.0, "avg_logprob": -0.25169246952708174, + "compression_ratio": 1.6074074074074074, "no_speech_prob": 0.008038913831114769}, + {"id": 126, "seek": 
44920, "start": 452.84, "end": 457.56, "text": " So you can + kind of plug it in much easier in the code.", "tokens": [50546, 407, 291, 393, 733, + 295, 5452, 309, 294, 709, 3571, 294, 264, 3089, 13, 50782], "temperature": 0.0, + "avg_logprob": -0.25169246952708174, "compression_ratio": 1.6074074074074074, "no_speech_prob": + 0.008038913831114769}, {"id": 127, "seek": 44920, "start": 457.56, "end": 463.48, + "text": " And so you have what is your experience with clip so far?", "tokens": + [50782, 400, 370, 291, 362, 437, 307, 428, 1752, 365, 7353, 370, 1400, 30, 51078], + "temperature": 0.0, "avg_logprob": -0.25169246952708174, "compression_ratio": 1.6074074074074074, + "no_speech_prob": 0.008038913831114769}, {"id": 128, "seek": 44920, "start": 463.48, + "end": 468.03999999999996, "text": " So one of the great things about embedding + is that when they work, it''s like a sort of", "tokens": [51078, 407, 472, 295, + 264, 869, 721, 466, 12240, 3584, 307, 300, 562, 436, 589, 11, 309, 311, 411, 257, + 1333, 295, 51306], "temperature": 0.0, "avg_logprob": -0.25169246952708174, "compression_ratio": + 1.6074074074074074, "no_speech_prob": 0.008038913831114769}, {"id": 129, "seek": + 44920, "start": 468.03999999999996, "end": 469.03999999999996, "text": " like magic, + right?", "tokens": [51306, 411, 5585, 11, 558, 30, 51356], "temperature": 0.0, "avg_logprob": + -0.25169246952708174, "compression_ratio": 1.6074074074074074, "no_speech_prob": + 0.008038913831114769}, {"id": 130, "seek": 44920, "start": 469.03999999999996, "end": + 471.44, "text": " Like it''s like, it''s amazing that this was even possible.", + "tokens": [51356, 1743, 309, 311, 411, 11, 309, 311, 2243, 300, 341, 390, 754, 1944, + 13, 51476], "temperature": 0.0, "avg_logprob": -0.25169246952708174, "compression_ratio": + 1.6074074074074074, "no_speech_prob": 0.008038913831114769}, {"id": 131, "seek": + 44920, "start": 471.44, "end": 477.4, "text": " The problem is though, if you actually + look at the 
results set as a whole, it''s only", "tokens": [51476, 440, 1154, 307, + 1673, 11, 498, 291, 767, 574, 412, 264, 3542, 992, 382, 257, 1379, 11, 309, 311, + 787, 51774], "temperature": 0.0, "avg_logprob": -0.25169246952708174, "compression_ratio": + 1.6074074074074074, "no_speech_prob": 0.008038913831114769}, {"id": 132, "seek": + 44920, "start": 477.4, "end": 479.12, "text": " 80% accurate, right?", "tokens": + [51774, 4688, 4, 8559, 11, 558, 30, 51860], "temperature": 0.0, "avg_logprob": -0.25169246952708174, + "compression_ratio": 1.6074074074074074, "no_speech_prob": 0.008038913831114769}, + {"id": 133, "seek": 47912, "start": 479.12, "end": 484.32, "text": " Like you''ll + find 20% of those in there are just what the hell is this?", "tokens": [50364, 1743, + 291, 603, 915, 945, 4, 295, 729, 294, 456, 366, 445, 437, 264, 4921, 307, 341, 30, + 50624], "temperature": 0.0, "avg_logprob": -0.18269220475227602, "compression_ratio": + 1.657992565055762, "no_speech_prob": 0.004415399394929409}, {"id": 134, "seek": + 47912, "start": 484.32, "end": 490.88, "text": " So as a sort of imperative programmer + coming to it, or a guy who''s my experience is based", "tokens": [50624, 407, 382, + 257, 1333, 295, 32490, 32116, 1348, 281, 309, 11, 420, 257, 2146, 567, 311, 452, + 1752, 307, 2361, 50952], "temperature": 0.0, "avg_logprob": -0.18269220475227602, + "compression_ratio": 1.657992565055762, "no_speech_prob": 0.004415399394929409}, + {"id": 135, "seek": 47912, "start": 490.88, "end": 497.32, "text": " in the world + of traditional programming, to see that it''s like, okay, this is a bug,", "tokens": + [50952, 294, 264, 1002, 295, 5164, 9410, 11, 281, 536, 300, 309, 311, 411, 11, 1392, + 11, 341, 307, 257, 7426, 11, 51274], "temperature": 0.0, "avg_logprob": -0.18269220475227602, + "compression_ratio": 1.657992565055762, "no_speech_prob": 0.004415399394929409}, + {"id": 136, "seek": 47912, "start": 497.32, "end": 498.32, "text": " but it''s not.", + "tokens": [51274, 457, 
309, 311, 406, 13, 51324], "temperature": 0.0, "avg_logprob": + -0.18269220475227602, "compression_ratio": 1.657992565055762, "no_speech_prob": + 0.004415399394929409}, {"id": 137, "seek": 47912, "start": 498.32, "end": 502.36, + "text": " It''s one of the switchovers you need to make is to accept the fact that + you''re going", "tokens": [51324, 467, 311, 472, 295, 264, 3679, 25348, 291, 643, + 281, 652, 307, 281, 3241, 264, 1186, 300, 291, 434, 516, 51526], "temperature": + 0.0, "avg_logprob": -0.18269220475227602, "compression_ratio": 1.657992565055762, + "no_speech_prob": 0.004415399394929409}, {"id": 138, "seek": 47912, "start": 502.36, + "end": 507.56, "text": " to get a lot of great results for very little effort, but + you''re not going to get every", "tokens": [51526, 281, 483, 257, 688, 295, 869, + 3542, 337, 588, 707, 4630, 11, 457, 291, 434, 406, 516, 281, 483, 633, 51786], "temperature": + 0.0, "avg_logprob": -0.18269220475227602, "compression_ratio": 1.657992565055762, + "no_speech_prob": 0.004415399394929409}, {"id": 139, "seek": 47912, "start": 507.56, + "end": 508.56, "text": " 100% results.", "tokens": [51786, 2319, 4, 3542, 13, 51836], + "temperature": 0.0, "avg_logprob": -0.18269220475227602, "compression_ratio": 1.657992565055762, + "no_speech_prob": 0.004415399394929409}, {"id": 140, "seek": 50856, "start": 508.56, + "end": 512.72, "text": " It''s more about identifying the results that are bad, + flagging them and trying to retrain", "tokens": [50364, 467, 311, 544, 466, 16696, + 264, 3542, 300, 366, 1578, 11, 7166, 3249, 552, 293, 1382, 281, 1533, 7146, 50572], + "temperature": 0.0, "avg_logprob": -0.2140293294733221, "compression_ratio": 1.6754716981132076, + "no_speech_prob": 0.004566708113998175}, {"id": 141, "seek": 50856, "start": 512.72, + "end": 515.8, "text": " to get them out of the loop in the future.", "tokens": [50572, + 281, 483, 552, 484, 295, 264, 6367, 294, 264, 2027, 13, 50726], "temperature": 0.0, + "avg_logprob": 
-0.2140293294733221, "compression_ratio": 1.6754716981132076, "no_speech_prob": + 0.004566708113998175}, {"id": 142, "seek": 50856, "start": 515.8, "end": 516.8, + "text": " Yeah, exactly.", "tokens": [50726, 865, 11, 2293, 13, 50776], "temperature": + 0.0, "avg_logprob": -0.2140293294733221, "compression_ratio": 1.6754716981132076, + "no_speech_prob": 0.004566708113998175}, {"id": 143, "seek": 50856, "start": 516.8, + "end": 521.8, "text": " So like building that pipeline, it''s essentially like MLOPS + pipeline, right?", "tokens": [50776, 407, 411, 2390, 300, 15517, 11, 309, 311, 4476, + 411, 21601, 46, 6273, 15517, 11, 558, 30, 51026], "temperature": 0.0, "avg_logprob": + -0.2140293294733221, "compression_ratio": 1.6754716981132076, "no_speech_prob": + 0.004566708113998175}, {"id": 144, "seek": 50856, "start": 521.8, "end": 526.16, + "text": " Machine learning operations where you need to switch your mind a little + bit into building", "tokens": [51026, 22155, 2539, 7705, 689, 291, 643, 281, 3679, + 428, 1575, 257, 707, 857, 666, 2390, 51244], "temperature": 0.0, "avg_logprob": + -0.2140293294733221, "compression_ratio": 1.6754716981132076, "no_speech_prob": + 0.004566708113998175}, {"id": 145, "seek": 50856, "start": 526.16, "end": 528.0, + "text": " this pipeline way.", "tokens": [51244, 341, 15517, 636, 13, 51336], "temperature": + 0.0, "avg_logprob": -0.2140293294733221, "compression_ratio": 1.6754716981132076, + "no_speech_prob": 0.004566708113998175}, {"id": 146, "seek": 50856, "start": 528.0, + "end": 533.96, "text": " You can like detect problems and then feedback to the process + of building a next version", "tokens": [51336, 509, 393, 411, 5531, 2740, 293, 550, + 5824, 281, 264, 1399, 295, 2390, 257, 958, 3037, 51634], "temperature": 0.0, "avg_logprob": + -0.2140293294733221, "compression_ratio": 1.6754716981132076, "no_speech_prob": + 0.004566708113998175}, {"id": 147, "seek": 50856, "start": 533.96, "end": 535.32, + "text": " of your model, right?", 
"tokens": [51634, 295, 428, 2316, 11, 558, 30, + 51702], "temperature": 0.0, "avg_logprob": -0.2140293294733221, "compression_ratio": + 1.6754716981132076, "no_speech_prob": 0.004566708113998175}, {"id": 148, "seek": + 53532, "start": 535.32, "end": 539.32, "text": " So it''s not as easy as opening + your debugger and then, okay, here is the bug.", "tokens": [50364, 407, 309, 311, + 406, 382, 1858, 382, 5193, 428, 24083, 1321, 293, 550, 11, 1392, 11, 510, 307, 264, + 7426, 13, 50564], "temperature": 0.0, "avg_logprob": -0.18235963488382007, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.010557171888649464}, {"id": 149, "seek": + 53532, "start": 539.32, "end": 541.5200000000001, "text": " It''s logical, fix it, + done.", "tokens": [50564, 467, 311, 14978, 11, 3191, 309, 11, 1096, 13, 50674], + "temperature": 0.0, "avg_logprob": -0.18235963488382007, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.010557171888649464}, {"id": 150, "seek": 53532, "start": 541.5200000000001, + "end": 545.6800000000001, "text": " Yeah, the pipeline to develop these models is + long term.", "tokens": [50674, 865, 11, 264, 15517, 281, 1499, 613, 5245, 307, 938, + 1433, 13, 50882], "temperature": 0.0, "avg_logprob": -0.18235963488382007, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.010557171888649464}, {"id": 151, "seek": + 53532, "start": 545.6800000000001, "end": 548.9200000000001, "text": " It''s very + different than a piece of software and you need continuous monitoring and you", + "tokens": [50882, 467, 311, 588, 819, 813, 257, 2522, 295, 4722, 293, 291, 643, + 10957, 11028, 293, 291, 51044], "temperature": 0.0, "avg_logprob": -0.18235963488382007, + "compression_ratio": 1.6551724137931034, "no_speech_prob": 0.010557171888649464}, + {"id": 152, "seek": 53532, "start": 548.9200000000001, "end": 553.2800000000001, + "text": " need to continuously be able to sort of have signals feeding in to make + that make that", "tokens": [51044, 643, 
281, 15684, 312, 1075, 281, 1333, 295, 362, + 12354, 12919, 294, 281, 652, 300, 652, 300, 51262], "temperature": 0.0, "avg_logprob": + -0.18235963488382007, "compression_ratio": 1.6551724137931034, "no_speech_prob": + 0.010557171888649464}, {"id": 153, "seek": 53532, "start": 553.2800000000001, "end": + 554.2800000000001, "text": " model better next time.", "tokens": [51262, 2316, 1101, + 958, 565, 13, 51312], "temperature": 0.0, "avg_logprob": -0.18235963488382007, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.010557171888649464}, {"id": 154, "seek": + 53532, "start": 554.2800000000001, "end": 555.96, "text": " It''s actually pretty + difficult.", "tokens": [51312, 467, 311, 767, 1238, 2252, 13, 51396], "temperature": + 0.0, "avg_logprob": -0.18235963488382007, "compression_ratio": 1.6551724137931034, + "no_speech_prob": 0.010557171888649464}, {"id": 155, "seek": 53532, "start": 555.96, + "end": 560.72, "text": " I know there''s a lot of startups around like MLOPS now, + which makes total sense to me,", "tokens": [51396, 286, 458, 456, 311, 257, 688, + 295, 28041, 926, 411, 21601, 46, 6273, 586, 11, 597, 1669, 3217, 2020, 281, 385, + 11, 51634], "temperature": 0.0, "avg_logprob": -0.18235963488382007, "compression_ratio": + 1.6551724137931034, "no_speech_prob": 0.010557171888649464}, {"id": 156, "seek": + 56072, "start": 561.72, "end": 565.72, "text": " but it''s almost like I feel like + developers myself included need to build the mindset", "tokens": [50414, 457, 309, + 311, 1920, 411, 286, 841, 411, 8849, 2059, 5556, 643, 281, 1322, 264, 12543, 50614], + "temperature": 0.0, "avg_logprob": -0.2332855837188498, "compression_ratio": 1.751700680272109, + "no_speech_prob": 0.01702490821480751}, {"id": 157, "seek": 56072, "start": 565.72, + "end": 569.52, "text": " and to know, and to, and to mentally know like, okay, these + are the different components", "tokens": [50614, 293, 281, 458, 11, 293, 281, 11, + 293, 281, 17072, 458, 411, 11, 1392, 11, 
613, 366, 264, 819, 6677, 50804], "temperature": + 0.0, "avg_logprob": -0.2332855837188498, "compression_ratio": 1.751700680272109, + "no_speech_prob": 0.01702490821480751}, {"id": 158, "seek": 56072, "start": 569.52, + "end": 572.1600000000001, "text": " that I need to put into the system.", "tokens": + [50804, 300, 286, 643, 281, 829, 666, 264, 1185, 13, 50936], "temperature": 0.0, + "avg_logprob": -0.2332855837188498, "compression_ratio": 1.751700680272109, "no_speech_prob": + 0.01702490821480751}, {"id": 159, "seek": 56072, "start": 572.1600000000001, "end": + 573.1600000000001, "text": " Yeah, absolutely.", "tokens": [50936, 865, 11, 3122, + 13, 50986], "temperature": 0.0, "avg_logprob": -0.2332855837188498, "compression_ratio": + 1.751700680272109, "no_speech_prob": 0.01702490821480751}, {"id": 160, "seek": 56072, + "start": 573.1600000000001, "end": 576.96, "text": " And there is like, there are + white papers published, for example, there is one by Google,", "tokens": [50986, + 400, 456, 307, 411, 11, 456, 366, 2418, 10577, 6572, 11, 337, 1365, 11, 456, 307, + 472, 538, 3329, 11, 51176], "temperature": 0.0, "avg_logprob": -0.2332855837188498, + "compression_ratio": 1.751700680272109, "no_speech_prob": 0.01702490821480751}, + {"id": 161, "seek": 56072, "start": 576.96, "end": 582.08, "text": " I will try + to also, you know, link it in the show notes and share with you.", "tokens": [51176, + 286, 486, 853, 281, 611, 11, 291, 458, 11, 2113, 309, 294, 264, 855, 5570, 293, + 2073, 365, 291, 13, 51432], "temperature": 0.0, "avg_logprob": -0.2332855837188498, + "compression_ratio": 1.751700680272109, "no_speech_prob": 0.01702490821480751}, + {"id": 162, "seek": 56072, "start": 582.08, "end": 586.72, "text": " But it''s fairly + long document and it goes so high level that you might get, get,", "tokens": [51432, + 583, 309, 311, 6457, 938, 4166, 293, 309, 1709, 370, 1090, 1496, 300, 291, 1062, + 483, 11, 483, 11, 51664], "temperature": 0.0, "avg_logprob": 
-0.2332855837188498, + "compression_ratio": 1.751700680272109, "no_speech_prob": 0.01702490821480751}, + {"id": 163, "seek": 56072, "start": 586.72, "end": 588.52, "text": " get a slip, + you know, while reading it.", "tokens": [51664, 483, 257, 11140, 11, 291, 458, 11, + 1339, 3760, 309, 13, 51754], "temperature": 0.0, "avg_logprob": -0.2332855837188498, + "compression_ratio": 1.751700680272109, "no_speech_prob": 0.01702490821480751}, + {"id": 164, "seek": 58852, "start": 588.52, "end": 590.76, "text": " So you need + like real tools, right?", "tokens": [50364, 407, 291, 643, 411, 957, 3873, 11, 558, + 30, 50476], "temperature": 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": + 1.6402640264026402, "no_speech_prob": 0.023784223943948746}, {"id": 165, "seek": + 58852, "start": 590.76, "end": 595.76, "text": " And you need some kind of understanding, + okay, I stitched this together and I just achieved", "tokens": [50476, 400, 291, + 643, 512, 733, 295, 3701, 11, 1392, 11, 286, 48992, 341, 1214, 293, 286, 445, 11042, + 50726], "temperature": 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": + 1.6402640264026402, "no_speech_prob": 0.023784223943948746}, {"id": 166, "seek": + 58852, "start": 595.76, "end": 596.76, "text": " my goal.", "tokens": [50726, 452, + 3387, 13, 50776], "temperature": 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": + 1.6402640264026402, "no_speech_prob": 0.023784223943948746}, {"id": 167, "seek": + 58852, "start": 596.76, "end": 600.64, "text": " I don''t want to like build the + fully blown MLOPS pipeline, right?", "tokens": [50776, 286, 500, 380, 528, 281, + 411, 1322, 264, 4498, 16479, 21601, 46, 6273, 15517, 11, 558, 30, 50970], "temperature": + 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": 1.6402640264026402, + "no_speech_prob": 0.023784223943948746}, {"id": 168, "seek": 58852, "start": 600.64, + "end": 604.96, "text": " And it''s also expensive, like retraining these models + is 
very slow, which means you''re", "tokens": [50970, 400, 309, 311, 611, 5124, + 11, 411, 49356, 1760, 613, 5245, 307, 588, 2964, 11, 597, 1355, 291, 434, 51186], + "temperature": 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": 1.6402640264026402, + "no_speech_prob": 0.023784223943948746}, {"id": 169, "seek": 58852, "start": 604.96, + "end": 606.84, "text": " going to want to use the best hardware you can.", "tokens": + [51186, 516, 281, 528, 281, 764, 264, 1151, 8837, 291, 393, 13, 51280], "temperature": + 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": 1.6402640264026402, + "no_speech_prob": 0.023784223943948746}, {"id": 170, "seek": 58852, "start": 606.84, + "end": 610.1999999999999, "text": " And if you''re doing that every day, which is + crazy, but let''s say you do it every week", "tokens": [51280, 400, 498, 291, 434, + 884, 300, 633, 786, 11, 597, 307, 3219, 11, 457, 718, 311, 584, 291, 360, 309, 633, + 1243, 51448], "temperature": 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": + 1.6402640264026402, "no_speech_prob": 0.023784223943948746}, {"id": 171, "seek": + 58852, "start": 610.1999999999999, "end": 614.0, "text": " or every month, it''s + still a significant amount of like fixed resources.", "tokens": [51448, 420, 633, + 1618, 11, 309, 311, 920, 257, 4776, 2372, 295, 411, 6806, 3593, 13, 51638], "temperature": + 0.0, "avg_logprob": -0.16535390507091174, "compression_ratio": 1.6402640264026402, + "no_speech_prob": 0.023784223943948746}, {"id": 172, "seek": 61400, "start": 614.0, + "end": 618.32, "text": " You have to allocate to it and like mental resources to + understand it to debug it.", "tokens": [50364, 509, 362, 281, 35713, 281, 309, 293, + 411, 4973, 3593, 281, 1223, 309, 281, 24083, 309, 13, 50580], "temperature": 0.0, + "avg_logprob": -0.22747085889180502, "compression_ratio": 1.6842105263157894, "no_speech_prob": + 0.05114749073982239}, {"id": 173, "seek": 61400, "start": 618.32, "end": 619.32, + 
"text": " Yes.", "tokens": [50580, 1079, 13, 50630], "temperature": 0.0, "avg_logprob": + -0.22747085889180502, "compression_ratio": 1.6842105263157894, "no_speech_prob": + 0.05114749073982239}, {"id": 174, "seek": 61400, "start": 619.32, "end": 620.32, + "text": " Yes.", "tokens": [50630, 1079, 13, 50680], "temperature": 0.0, "avg_logprob": + -0.22747085889180502, "compression_ratio": 1.6842105263157894, "no_speech_prob": + 0.05114749073982239}, {"id": 175, "seek": 61400, "start": 620.32, "end": 621.32, + "text": " Yeah.", "tokens": [50680, 865, 13, 50730], "temperature": 0.0, "avg_logprob": + -0.22747085889180502, "compression_ratio": 1.6842105263157894, "no_speech_prob": + 0.05114749073982239}, {"id": 176, "seek": 61400, "start": 621.32, "end": 622.32, + "text": " You''re right.", "tokens": [50730, 509, 434, 558, 13, 50780], "temperature": + 0.0, "avg_logprob": -0.22747085889180502, "compression_ratio": 1.6842105263157894, + "no_speech_prob": 0.05114749073982239}, {"id": 177, "seek": 61400, "start": 622.32, + "end": 626.48, "text": " Yeah, I think there is still a long way to go in this in + this direction, but at the", "tokens": [50780, 865, 11, 286, 519, 456, 307, 920, + 257, 938, 636, 281, 352, 294, 341, 294, 341, 3513, 11, 457, 412, 264, 50988], "temperature": + 0.0, "avg_logprob": -0.22747085889180502, "compression_ratio": 1.6842105263157894, + "no_speech_prob": 0.05114749073982239}, {"id": 178, "seek": 61400, "start": 626.48, + "end": 634.0, "text": " same time, you like you as a developer, you want to focus + on the task and on the result,", "tokens": [50988, 912, 565, 11, 291, 411, 291, + 382, 257, 10754, 11, 291, 528, 281, 1879, 322, 264, 5633, 293, 322, 264, 1874, 11, + 51364], "temperature": 0.0, "avg_logprob": -0.22747085889180502, "compression_ratio": + 1.6842105263157894, "no_speech_prob": 0.05114749073982239}, {"id": 179, "seek": + 61400, "start": 634.0, "end": 635.0, "text": " right?", "tokens": [51364, 558, 30, + 51414], "temperature": 0.0, 
"avg_logprob": -0.22747085889180502, "compression_ratio": + 1.6842105263157894, "no_speech_prob": 0.05114749073982239}, {"id": 180, "seek": + 61400, "start": 635.0, "end": 639.2, "text": " Like not on figuring out what''s + the bug and that framework or whatever.", "tokens": [51414, 1743, 406, 322, 15213, + 484, 437, 311, 264, 7426, 293, 300, 8388, 420, 2035, 13, 51624], "temperature": + 0.0, "avg_logprob": -0.22747085889180502, "compression_ratio": 1.6842105263157894, + "no_speech_prob": 0.05114749073982239}, {"id": 181, "seek": 61400, "start": 639.2, + "end": 641.72, "text": " So yeah, I think there are tools already available.", "tokens": + [51624, 407, 1338, 11, 286, 519, 456, 366, 3873, 1217, 2435, 13, 51750], "temperature": + 0.0, "avg_logprob": -0.22747085889180502, "compression_ratio": 1.6842105263157894, + "no_speech_prob": 0.05114749073982239}, {"id": 182, "seek": 64172, "start": 641.72, + "end": 646.96, "text": " So maybe one of them that I''ve been using is determined + AI or kind of doing some early", "tokens": [50364, 407, 1310, 472, 295, 552, 300, + 286, 600, 668, 1228, 307, 9540, 7318, 420, 733, 295, 884, 512, 2440, 50626], "temperature": + 0.0, "avg_logprob": -0.22890563540988498, "compression_ratio": 1.5367965367965368, + "no_speech_prob": 0.005908787716180086}, {"id": 183, "seek": 64172, "start": 646.96, + "end": 649.24, "text": " stage moves there.", "tokens": [50626, 3233, 6067, 456, + 13, 50740], "temperature": 0.0, "avg_logprob": -0.22890563540988498, "compression_ratio": + 1.5367965367965368, "no_speech_prob": 0.005908787716180086}, {"id": 184, "seek": + 64172, "start": 649.24, "end": 656.28, "text": " It''s completely open source and + it''s and it claims that basically it utilizes your GPUs", "tokens": [50740, 467, + 311, 2584, 1269, 4009, 293, 309, 311, 293, 309, 9441, 300, 1936, 309, 4976, 5660, + 428, 18407, 82, 51092], "temperature": 0.0, "avg_logprob": -0.22890563540988498, + "compression_ratio": 1.5367965367965368, "no_speech_prob": 
0.005908787716180086}, + {"id": 185, "seek": 64172, "start": 656.28, "end": 659.28, "text": " to the maximum + because GPU is super expensive.", "tokens": [51092, 281, 264, 6674, 570, 18407, + 307, 1687, 5124, 13, 51242], "temperature": 0.0, "avg_logprob": -0.22890563540988498, + "compression_ratio": 1.5367965367965368, "no_speech_prob": 0.005908787716180086}, + {"id": 186, "seek": 64172, "start": 659.28, "end": 666.8000000000001, "text": " + And yeah, so basically at the abstracts GPU kind of allocation away from you, but + it has", "tokens": [51242, 400, 1338, 11, 370, 1936, 412, 264, 12649, 82, 18407, + 733, 295, 27599, 1314, 490, 291, 11, 457, 309, 575, 51618], "temperature": 0.0, + "avg_logprob": -0.22890563540988498, "compression_ratio": 1.5367965367965368, "no_speech_prob": + 0.005908787716180086}, {"id": 187, "seek": 64172, "start": 666.8000000000001, "end": + 668.9200000000001, "text": " some limitations as well.", "tokens": [51618, 512, + 15705, 382, 731, 13, 51724], "temperature": 0.0, "avg_logprob": -0.22890563540988498, + "compression_ratio": 1.5367965367965368, "no_speech_prob": 0.005908787716180086}, + {"id": 188, "seek": 66892, "start": 668.92, "end": 673.52, "text": " So the team + is working on or resulting them, but like PyTorch and TensorFlow are supported.", + "tokens": [50364, 407, 264, 1469, 307, 1364, 322, 420, 16505, 552, 11, 457, 411, + 9953, 51, 284, 339, 293, 37624, 366, 8104, 13, 50594], "temperature": 0.0, "avg_logprob": + -0.21397534438541957, "compression_ratio": 1.6141732283464567, "no_speech_prob": + 0.028628148138523102}, {"id": 189, "seek": 66892, "start": 673.52, "end": 679.52, + "text": " So like you can run some fine tuning or training or evaluation and hyperparameter + search.", "tokens": [50594, 407, 411, 291, 393, 1190, 512, 2489, 15164, 420, 3097, + 420, 13344, 293, 9848, 2181, 335, 2398, 3164, 13, 50894], "temperature": 0.0, "avg_logprob": + -0.21397534438541957, "compression_ratio": 1.6141732283464567, "no_speech_prob": 
+ 0.028628148138523102}, {"id": 190, "seek": 66892, "start": 679.52, "end": 685.4399999999999, + "text": " So yeah, I mean, it gives you a sense of control in a way, but of course + that comes with some", "tokens": [50894, 407, 1338, 11, 286, 914, 11, 309, 2709, + 291, 257, 2020, 295, 1969, 294, 257, 636, 11, 457, 295, 1164, 300, 1487, 365, 512, + 51190], "temperature": 0.0, "avg_logprob": -0.21397534438541957, "compression_ratio": + 1.6141732283464567, "no_speech_prob": 0.028628148138523102}, {"id": 191, "seek": + 66892, "start": 685.4399999999999, "end": 691.92, "text": " rigidity built in, but + eventually I really hope that they will make it and it will be", "tokens": [51190, + 8329, 17711, 3094, 294, 11, 457, 4728, 286, 534, 1454, 300, 436, 486, 652, 309, + 293, 309, 486, 312, 51514], "temperature": 0.0, "avg_logprob": -0.21397534438541957, + "compression_ratio": 1.6141732283464567, "no_speech_prob": 0.028628148138523102}, + {"id": 192, "seek": 66892, "start": 691.92, "end": 694.04, "text": " more kind of + widespread.", "tokens": [51514, 544, 733, 295, 22679, 13, 51620], "temperature": + 0.0, "avg_logprob": -0.21397534438541957, "compression_ratio": 1.6141732283464567, + "no_speech_prob": 0.028628148138523102}, {"id": 193, "seek": 66892, "start": 694.04, + "end": 695.04, "text": " Yeah.", "tokens": [51620, 865, 13, 51670], "temperature": + 0.0, "avg_logprob": -0.21397534438541957, "compression_ratio": 1.6141732283464567, + "no_speech_prob": 0.028628148138523102}, {"id": 194, "seek": 66892, "start": 695.04, + "end": 696.04, "text": " Awesome.", "tokens": [51670, 10391, 13, 51720], "temperature": + 0.0, "avg_logprob": -0.21397534438541957, "compression_ratio": 1.6141732283464567, + "no_speech_prob": 0.028628148138523102}, {"id": 195, "seek": 66892, "start": 696.04, + "end": 697.04, "text": " Awesome.", "tokens": [51720, 10391, 13, 51770], "temperature": + 0.0, "avg_logprob": -0.21397534438541957, "compression_ratio": 1.6141732283464567, + "no_speech_prob": 
0.028628148138523102}, {"id": 196, "seek": 69704, "start": 697.36, + "end": 700.9599999999999, "text": " So today if I go to look up, can I buy an NFT + already?", "tokens": [50380, 407, 965, 498, 286, 352, 281, 574, 493, 11, 393, 286, + 2256, 364, 50075, 1217, 30, 50560], "temperature": 0.0, "avg_logprob": -0.28291544420965786, + "compression_ratio": 1.6203007518796992, "no_speech_prob": 0.07014330476522446}, + {"id": 197, "seek": 69704, "start": 700.9599999999999, "end": 702.88, "text": " + Okay, then just find it.", "tokens": [50560, 1033, 11, 550, 445, 915, 309, 13, 50656], + "temperature": 0.0, "avg_logprob": -0.28291544420965786, "compression_ratio": 1.6203007518796992, + "no_speech_prob": 0.07014330476522446}, {"id": 198, "seek": 69704, "start": 702.88, + "end": 710.56, "text": " You can just find it and click through the open see the + actual process of you getting the", "tokens": [50656, 509, 393, 445, 915, 309, 293, + 2052, 807, 264, 1269, 536, 264, 3539, 1399, 295, 291, 1242, 264, 51040], "temperature": + 0.0, "avg_logprob": -0.28291544420965786, "compression_ratio": 1.6203007518796992, + "no_speech_prob": 0.07014330476522446}, {"id": 199, "seek": 69704, "start": 710.56, + "end": 713.0799999999999, "text": " deliverable and the token and all that stuff + is actually pretty complicated.", "tokens": [51040, 4239, 712, 293, 264, 14862, + 293, 439, 300, 1507, 307, 767, 1238, 6179, 13, 51166], "temperature": 0.0, "avg_logprob": + -0.28291544420965786, "compression_ratio": 1.6203007518796992, "no_speech_prob": + 0.07014330476522446}, {"id": 200, "seek": 69704, "start": 713.0799999999999, "end": + 715.36, "text": " So I''m going to let them do it.", "tokens": [51166, 407, 286, + 478, 516, 281, 718, 552, 360, 309, 13, 51280], "temperature": 0.0, "avg_logprob": + -0.28291544420965786, "compression_ratio": 1.6203007518796992, "no_speech_prob": + 0.07014330476522446}, {"id": 201, "seek": 69704, "start": 715.36, "end": 720.36, + "text": " I really want to on look 
up, hopefully be indexing tons of different NFT + markets.", "tokens": [51280, 286, 534, 528, 281, 322, 574, 493, 11, 4696, 312, 8186, + 278, 9131, 295, 819, 50075, 8383, 13, 51530], "temperature": 0.0, "avg_logprob": + -0.28291544420965786, "compression_ratio": 1.6203007518796992, "no_speech_prob": + 0.07014330476522446}, {"id": 202, "seek": 69704, "start": 720.36, "end": 723.5999999999999, + "text": " Open see is the biggest one, but there''s quite a few other small ones.", + "tokens": [51530, 7238, 536, 307, 264, 3880, 472, 11, 457, 456, 311, 1596, 257, + 1326, 661, 1359, 2306, 13, 51692], "temperature": 0.0, "avg_logprob": -0.28291544420965786, + "compression_ratio": 1.6203007518796992, "no_speech_prob": 0.07014330476522446}, + {"id": 203, "seek": 72360, "start": 723.6, "end": 727.24, "text": " So I didn''t + want to time myself too closely to one particular blockchain or one particular", + "tokens": [50364, 407, 286, 994, 380, 528, 281, 565, 2059, 886, 8185, 281, 472, + 1729, 17176, 420, 472, 1729, 50546], "temperature": 0.0, "avg_logprob": -0.2259359844660355, + "compression_ratio": 1.6398467432950192, "no_speech_prob": 0.02143983729183674}, + {"id": 204, "seek": 72360, "start": 727.24, "end": 728.5600000000001, "text": " + form of operation.", "tokens": [50546, 1254, 295, 6916, 13, 50612], "temperature": + 0.0, "avg_logprob": -0.2259359844660355, "compression_ratio": 1.6398467432950192, + "no_speech_prob": 0.02143983729183674}, {"id": 205, "seek": 72360, "start": 728.5600000000001, + "end": 730.5600000000001, "text": " I do think that this is developing so quickly.", + "tokens": [50612, 286, 360, 519, 300, 341, 307, 6416, 370, 2661, 13, 50712], "temperature": + 0.0, "avg_logprob": -0.2259359844660355, "compression_ratio": 1.6398467432950192, + "no_speech_prob": 0.02143983729183674}, {"id": 206, "seek": 72360, "start": 730.5600000000001, + "end": 733.32, "text": " NFTs weren''t even a thing until about two years ago.", + "tokens": [50712, 13576, 33424, 4999, 
380, 754, 257, 551, 1826, 466, 732, 924, 2057, + 13, 50850], "temperature": 0.0, "avg_logprob": -0.2259359844660355, "compression_ratio": + 1.6398467432950192, "no_speech_prob": 0.02143983729183674}, {"id": 207, "seek": + 72360, "start": 733.32, "end": 738.6800000000001, "text": " So I feel like it''s + a little early to sort of like get in bed with just one of the vendors", "tokens": + [50850, 407, 286, 841, 411, 309, 311, 257, 707, 2440, 281, 1333, 295, 411, 483, + 294, 2901, 365, 445, 472, 295, 264, 22056, 51118], "temperature": 0.0, "avg_logprob": + -0.2259359844660355, "compression_ratio": 1.6398467432950192, "no_speech_prob": + 0.02143983729183674}, {"id": 208, "seek": 72360, "start": 738.6800000000001, "end": + 740.8000000000001, "text": " or just one of the vendors.", "tokens": [51118, 420, + 445, 472, 295, 264, 22056, 13, 51224], "temperature": 0.0, "avg_logprob": -0.2259359844660355, + "compression_ratio": 1.6398467432950192, "no_speech_prob": 0.02143983729183674}, + {"id": 209, "seek": 72360, "start": 740.8000000000001, "end": 741.8000000000001, + "text": " Yeah.", "tokens": [51224, 865, 13, 51274], "temperature": 0.0, "avg_logprob": + -0.2259359844660355, "compression_ratio": 1.6398467432950192, "no_speech_prob": + 0.02143983729183674}, {"id": 210, "seek": 72360, "start": 741.8000000000001, "end": + 742.8000000000001, "text": " Yeah.", "tokens": [51274, 865, 13, 51324], "temperature": + 0.0, "avg_logprob": -0.2259359844660355, "compression_ratio": 1.6398467432950192, + "no_speech_prob": 0.02143983729183674}, {"id": 211, "seek": 72360, "start": 742.8000000000001, + "end": 748.6800000000001, "text": " And I''ve been also like when I joined Quadrant + Telegram channel, I saw like you''ve been", "tokens": [51324, 400, 286, 600, 668, + 611, 411, 562, 286, 6869, 29619, 7541, 14889, 1342, 2269, 11, 286, 1866, 411, 291, + 600, 668, 51618], "temperature": 0.0, "avg_logprob": -0.2259359844660355, "compression_ratio": + 1.6398467432950192, "no_speech_prob": 
0.02143983729183674}, {"id": 212, "seek": + 74868, "start": 748.68, "end": 755.04, "text": " so active like you are sending + some advice or commentary almost every single day.", "tokens": [50364, 370, 4967, + 411, 291, 366, 7750, 512, 5192, 420, 23527, 1920, 633, 2167, 786, 13, 50682], "temperature": + 0.0, "avg_logprob": -0.24700527954101562, "compression_ratio": 1.6948529411764706, + "no_speech_prob": 0.20283888280391693}, {"id": 213, "seek": 74868, "start": 755.04, + "end": 757.2399999999999, "text": " So I love Quadrant and I love Telegram.", "tokens": + [50682, 407, 286, 959, 29619, 7541, 293, 286, 959, 14889, 1342, 13, 50792], "temperature": + 0.0, "avg_logprob": -0.24700527954101562, "compression_ratio": 1.6948529411764706, + "no_speech_prob": 0.20283888280391693}, {"id": 214, "seek": 74868, "start": 757.2399999999999, + "end": 762.2399999999999, "text": " Yeah, I was just thinking like you are the developer + of Quadrant or what, but you are the", "tokens": [50792, 865, 11, 286, 390, 445, + 1953, 411, 291, 366, 264, 10754, 295, 29619, 7541, 420, 437, 11, 457, 291, 366, + 264, 51042], "temperature": 0.0, "avg_logprob": -0.24700527954101562, "compression_ratio": + 1.6948529411764706, "no_speech_prob": 0.20283888280391693}, {"id": 215, "seek": + 74868, "start": 762.2399999999999, "end": 763.68, "text": " user, is that right?", + "tokens": [51042, 4195, 11, 307, 300, 558, 30, 51114], "temperature": 0.0, "avg_logprob": + -0.24700527954101562, "compression_ratio": 1.6948529411764706, "no_speech_prob": + 0.20283888280391693}, {"id": 216, "seek": 74868, "start": 763.68, "end": 768.7199999999999, + "text": " Yeah, I mean, I think I''m doing like informal tech support in the opposite + time zone because", "tokens": [51114, 865, 11, 286, 914, 11, 286, 519, 286, 478, + 884, 411, 24342, 7553, 1406, 294, 264, 6182, 565, 6668, 570, 51366], "temperature": + 0.0, "avg_logprob": -0.24700527954101562, "compression_ratio": 1.6948529411764706, + "no_speech_prob": 
0.20283888280391693}, {"id": 217, "seek": 74868, "start": 768.7199999999999, + "end": 773.0, "text": " they''re all on CET like you are in a, you know, for some + reason, although a lot of people", "tokens": [51366, 436, 434, 439, 322, 383, 4850, + 411, 291, 366, 294, 257, 11, 291, 458, 11, 337, 512, 1778, 11, 4878, 257, 688, 295, + 561, 51580], "temperature": 0.0, "avg_logprob": -0.24700527954101562, "compression_ratio": + 1.6948529411764706, "no_speech_prob": 0.20283888280391693}, {"id": 218, "seek": + 74868, "start": 773.0, "end": 776.4399999999999, "text": " wind up in, you know, + American time in there.", "tokens": [51580, 2468, 493, 294, 11, 291, 458, 11, 2665, + 565, 294, 456, 13, 51752], "temperature": 0.0, "avg_logprob": -0.24700527954101562, + "compression_ratio": 1.6948529411764706, "no_speech_prob": 0.20283888280391693}, + {"id": 219, "seek": 77644, "start": 776.44, "end": 779.1600000000001, "text": " + I was looking for a long time for a vector database.", "tokens": [50364, 286, 390, + 1237, 337, 257, 938, 565, 337, 257, 8062, 8149, 13, 50500], "temperature": 0.0, + "avg_logprob": -0.3319802965436663, "compression_ratio": 1.670995670995671, "no_speech_prob": + 0.001006167265586555}, {"id": 220, "seek": 77644, "start": 779.1600000000001, "end": + 785.9200000000001, "text": " I tried FAASS face, I think they call it in Python + and I tried a couple others and I really", "tokens": [50500, 286, 3031, 479, 5265, + 21929, 1851, 11, 286, 519, 436, 818, 309, 294, 15329, 293, 286, 3031, 257, 1916, + 2357, 293, 286, 534, 50838], "temperature": 0.0, "avg_logprob": -0.3319802965436663, + "compression_ratio": 1.670995670995671, "no_speech_prob": 0.001006167265586555}, + {"id": 221, "seek": 77644, "start": 785.9200000000001, "end": 786.9200000000001, + "text": " didn''t find anything.", "tokens": [50838, 994, 380, 915, 1340, 13, 50888], + "temperature": 0.0, "avg_logprob": -0.3319802965436663, "compression_ratio": 1.670995670995671, + "no_speech_prob": 
0.001006167265586555}, {"id": 222, "seek": 77644, "start": 786.9200000000001, + "end": 796.6800000000001, "text": " I guess you could say like intuitive that intuitively + like scratch, scratch my itch, you know, like", "tokens": [50888, 286, 2041, 291, + 727, 584, 411, 21769, 300, 46506, 411, 8459, 11, 8459, 452, 309, 339, 11, 291, 458, + 11, 411, 51376], "temperature": 0.0, "avg_logprob": -0.3319802965436663, "compression_ratio": + 1.670995670995671, "no_speech_prob": 0.001006167265586555}, {"id": 223, "seek": + 77644, "start": 796.6800000000001, "end": 801.32, "text": " I don''t like software + to complicated things that are sort of like isolated and independent", "tokens": + [51376, 286, 500, 380, 411, 4722, 281, 6179, 721, 300, 366, 1333, 295, 411, 14621, + 293, 6695, 51608], "temperature": 0.0, "avg_logprob": -0.3319802965436663, "compression_ratio": + 1.670995670995671, "no_speech_prob": 0.001006167265586555}, {"id": 224, "seek": + 77644, "start": 801.32, "end": 803.6400000000001, "text": " and easy to install + and use.", "tokens": [51608, 293, 1858, 281, 3625, 293, 764, 13, 51724], "temperature": + 0.0, "avg_logprob": -0.3319802965436663, "compression_ratio": 1.670995670995671, + "no_speech_prob": 0.001006167265586555}, {"id": 225, "seek": 80364, "start": 803.64, + "end": 807.96, "text": " Quadrant just sort of like ticked all those boxes for me + like it''s a small download, it''s", "tokens": [50364, 29619, 7541, 445, 1333, 295, + 411, 5204, 292, 439, 729, 9002, 337, 385, 411, 309, 311, 257, 1359, 5484, 11, 309, + 311, 50580], "temperature": 0.0, "avg_logprob": -0.24959374952685925, "compression_ratio": + 1.6453900709219857, "no_speech_prob": 0.00087282236199826}, {"id": 226, "seek": + 80364, "start": 807.96, "end": 811.0, "text": " a dockerized so it''s very easy + to install.", "tokens": [50580, 257, 360, 9178, 1602, 370, 309, 311, 588, 1858, + 281, 3625, 13, 50732], "temperature": 0.0, "avg_logprob": -0.24959374952685925, + "compression_ratio": 
1.6453900709219857, "no_speech_prob": 0.00087282236199826}, + {"id": 227, "seek": 80364, "start": 811.0, "end": 813.88, "text": " The API just + makes sense.", "tokens": [50732, 440, 9362, 445, 1669, 2020, 13, 50876], "temperature": + 0.0, "avg_logprob": -0.24959374952685925, "compression_ratio": 1.6453900709219857, + "no_speech_prob": 0.00087282236199826}, {"id": 228, "seek": 80364, "start": 813.88, + "end": 819.24, "text": " I had evaluated other vector databases we can talk about + that if you want.", "tokens": [50876, 286, 632, 25509, 661, 8062, 22380, 321, 393, + 751, 466, 300, 498, 291, 528, 13, 51144], "temperature": 0.0, "avg_logprob": -0.24959374952685925, + "compression_ratio": 1.6453900709219857, "no_speech_prob": 0.00087282236199826}, + {"id": 229, "seek": 80364, "start": 819.24, "end": 822.76, "text": " I found that + Quadrant was the best mix of all those different factors.", "tokens": [51144, 286, + 1352, 300, 29619, 7541, 390, 264, 1151, 2890, 295, 439, 729, 819, 6771, 13, 51320], + "temperature": 0.0, "avg_logprob": -0.24959374952685925, "compression_ratio": 1.6453900709219857, + "no_speech_prob": 0.00087282236199826}, {"id": 230, "seek": 80364, "start": 822.76, + "end": 828.36, "text": " So, you know, when I embrace an open source project to + try to do my best to help them out.", "tokens": [51320, 407, 11, 291, 458, 11, 562, + 286, 14038, 364, 1269, 4009, 1716, 281, 853, 281, 360, 452, 1151, 281, 854, 552, + 484, 13, 51600], "temperature": 0.0, "avg_logprob": -0.24959374952685925, "compression_ratio": + 1.6453900709219857, "no_speech_prob": 0.00087282236199826}, {"id": 231, "seek": + 80364, "start": 828.36, "end": 833.04, "text": " So I built the first elixir connector + to use Quadrant for me, Lixir.", "tokens": [51600, 407, 286, 3094, 264, 700, 806, + 970, 347, 19127, 281, 764, 29619, 7541, 337, 385, 11, 441, 970, 347, 13, 51834], + "temperature": 0.0, "avg_logprob": -0.24959374952685925, "compression_ratio": 1.6453900709219857, + 
"no_speech_prob": 0.00087282236199826}, {"id": 232, "seek": 83304, "start": 833.04, + "end": 837.36, "text": " I''m trying to still develop other little pieces of the + puzzle.", "tokens": [50364, 286, 478, 1382, 281, 920, 1499, 661, 707, 3755, 295, + 264, 12805, 13, 50580], "temperature": 0.0, "avg_logprob": -0.24683025905064174, + "compression_ratio": 1.6090534979423867, "no_speech_prob": 0.0024677375331521034}, + {"id": 233, "seek": 83304, "start": 837.36, "end": 841.76, "text": " So actually + I''m interested, I''m quite interested because you know, like I published a blog", + "tokens": [50580, 407, 767, 286, 478, 3102, 11, 286, 478, 1596, 3102, 570, 291, + 458, 11, 411, 286, 6572, 257, 6968, 50800], "temperature": 0.0, "avg_logprob": -0.24683025905064174, + "compression_ratio": 1.6090534979423867, "no_speech_prob": 0.0024677375331521034}, + {"id": 234, "seek": 83304, "start": 841.76, "end": 844.3199999999999, "text": " + on seven vector databases.", "tokens": [50800, 322, 3407, 8062, 22380, 13, 50928], + "temperature": 0.0, "avg_logprob": -0.24683025905064174, "compression_ratio": 1.6090534979423867, + "no_speech_prob": 0.0024677375331521034}, {"id": 235, "seek": 83304, "start": 844.3199999999999, + "end": 850.8, "text": " It was actually six and then the founder of Quadrant knocks + on my door, virtual door and he", "tokens": [50928, 467, 390, 767, 2309, 293, 550, + 264, 14917, 295, 29619, 7541, 40815, 322, 452, 2853, 11, 6374, 2853, 293, 415, 51252], + "temperature": 0.0, "avg_logprob": -0.24683025905064174, "compression_ratio": 1.6090534979423867, + "no_speech_prob": 0.0024677375331521034}, {"id": 236, "seek": 83304, "start": 850.8, + "end": 857.4399999999999, "text": " said hey, please add our database as well because + we are the new kid on the block and you", "tokens": [51252, 848, 4177, 11, 1767, + 909, 527, 8149, 382, 731, 570, 321, 366, 264, 777, 1636, 322, 264, 3461, 293, 291, + 51584], "temperature": 0.0, "avg_logprob": -0.24683025905064174, 
"compression_ratio": + 1.6090534979423867, "no_speech_prob": 0.0024677375331521034}, {"id": 237, "seek": + 83304, "start": 857.4399999999999, "end": 859.52, "text": " just probably didn''t + see us.", "tokens": [51584, 445, 1391, 994, 380, 536, 505, 13, 51688], "temperature": + 0.0, "avg_logprob": -0.24683025905064174, "compression_ratio": 1.6090534979423867, + "no_speech_prob": 0.0024677375331521034}, {"id": 238, "seek": 85952, "start": 859.52, + "end": 863.36, "text": " And then I opened their website and I was like kind of + a little bit blown away because,", "tokens": [50364, 400, 550, 286, 5625, 641, 3144, + 293, 286, 390, 411, 733, 295, 257, 707, 857, 16479, 1314, 570, 11, 50556], "temperature": + 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": 1.6705426356589148, + "no_speech_prob": 0.012875286862254143}, {"id": 239, "seek": 85952, "start": 863.36, + "end": 867.1999999999999, "text": " you know, the documentation looks interesting, + very good and also like the way they position", "tokens": [50556, 291, 458, 11, + 264, 14333, 1542, 1880, 11, 588, 665, 293, 611, 411, 264, 636, 436, 2535, 50748], + "temperature": 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": 1.6705426356589148, + "no_speech_prob": 0.012875286862254143}, {"id": 240, "seek": 85952, "start": 867.1999999999999, + "end": 868.1999999999999, "text": " it.", "tokens": [50748, 309, 13, 50798], "temperature": + 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": 1.6705426356589148, + "no_speech_prob": 0.012875286862254143}, {"id": 241, "seek": 85952, "start": 868.1999999999999, + "end": 869.1999999999999, "text": " Yeah.", "tokens": [50798, 865, 13, 50848], "temperature": + 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": 1.6705426356589148, + "no_speech_prob": 0.012875286862254143}, {"id": 242, "seek": 85952, "start": 869.1999999999999, + "end": 874.0799999999999, "text": " They talk about like metric deep learning, some + things that I didn''t 
even hear before.", "tokens": [50848, 814, 751, 466, 411, + 20678, 2452, 2539, 11, 512, 721, 300, 286, 994, 380, 754, 1568, 949, 13, 51092], + "temperature": 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": 1.6705426356589148, + "no_speech_prob": 0.012875286862254143}, {"id": 243, "seek": 85952, "start": 874.0799999999999, + "end": 880.6, "text": " And then I also discovered the developer team, like what + they do and also they customized", "tokens": [51092, 400, 550, 286, 611, 6941, 264, + 10754, 1469, 11, 411, 437, 436, 360, 293, 611, 436, 30581, 51418], "temperature": + 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": 1.6705426356589148, + "no_speech_prob": 0.012875286862254143}, {"id": 244, "seek": 85952, "start": 880.6, + "end": 883.92, "text": " like H&SW algorithm algorithm as well.", "tokens": [51418, + 411, 389, 5, 50, 54, 9284, 9284, 382, 731, 13, 51584], "temperature": 0.0, "avg_logprob": + -0.28487283212167247, "compression_ratio": 1.6705426356589148, "no_speech_prob": + 0.012875286862254143}, {"id": 245, "seek": 85952, "start": 883.92, "end": 886.0799999999999, + "text": " They do graph algorithm.", "tokens": [51584, 814, 360, 4295, 9284, 13, + 51692], "temperature": 0.0, "avg_logprob": -0.28487283212167247, "compression_ratio": + 1.6705426356589148, "no_speech_prob": 0.012875286862254143}, {"id": 246, "seek": + 88608, "start": 886.08, "end": 890.32, "text": " And can you like a little bit walk + me through the options that you considered, like which", "tokens": [50364, 400, + 393, 291, 411, 257, 707, 857, 1792, 385, 807, 264, 3956, 300, 291, 4888, 11, 411, + 597, 50576], "temperature": 0.0, "avg_logprob": -0.23395334531183112, "compression_ratio": + 1.7361963190184049, "no_speech_prob": 0.04934825003147125}, {"id": 247, "seek": + 88608, "start": 890.32, "end": 894.9200000000001, "text": " are the databases you + have taken a look at, how deep you went before you decided to go", "tokens": [50576, + 366, 264, 22380, 291, 362, 
2726, 257, 574, 412, 11, 577, 2452, 291, 1437, 949, 291, + 3047, 281, 352, 50806], "temperature": 0.0, "avg_logprob": -0.23395334531183112, + "compression_ratio": 1.7361963190184049, "no_speech_prob": 0.04934825003147125}, + {"id": 248, "seek": 88608, "start": 894.9200000000001, "end": 899.2800000000001, + "text": " with Quadrant and what was the ticking moment like, okay, I like this.", + "tokens": [50806, 365, 29619, 7541, 293, 437, 390, 264, 33999, 1623, 411, 11, 1392, + 11, 286, 411, 341, 13, 51024], "temperature": 0.0, "avg_logprob": -0.23395334531183112, + "compression_ratio": 1.7361963190184049, "no_speech_prob": 0.04934825003147125}, + {"id": 249, "seek": 88608, "start": 899.2800000000001, "end": 900.2800000000001, + "text": " Right.", "tokens": [51024, 1779, 13, 51074], "temperature": 0.0, "avg_logprob": + -0.23395334531183112, "compression_ratio": 1.7361963190184049, "no_speech_prob": + 0.04934825003147125}, {"id": 250, "seek": 88608, "start": 900.2800000000001, "end": + 904.6, "text": " So I think the main one that I think I studied for a while, I think + a lot of people look", "tokens": [51074, 407, 286, 519, 264, 2135, 472, 300, 286, + 519, 286, 9454, 337, 257, 1339, 11, 286, 519, 257, 688, 295, 561, 574, 51290], "temperature": + 0.0, "avg_logprob": -0.23395334531183112, "compression_ratio": 1.7361963190184049, + "no_speech_prob": 0.04934825003147125}, {"id": 251, "seek": 88608, "start": 904.6, + "end": 906.5600000000001, "text": " at is Milvis.", "tokens": [51290, 412, 307, + 7036, 4938, 13, 51388], "temperature": 0.0, "avg_logprob": -0.23395334531183112, + "compression_ratio": 1.7361963190184049, "no_speech_prob": 0.04934825003147125}, + {"id": 252, "seek": 88608, "start": 906.5600000000001, "end": 908.76, "text": " + Milvis has like a lot of really exciting energy going on.", "tokens": [51388, 7036, + 4938, 575, 411, 257, 688, 295, 534, 4670, 2281, 516, 322, 13, 51498], "temperature": + 0.0, "avg_logprob": -0.23395334531183112, "compression_ratio": 
1.7361963190184049, + "no_speech_prob": 0.04934825003147125}, {"id": 253, "seek": 88608, "start": 908.76, + "end": 911.6400000000001, "text": " I think they go to have a good replication story + as well.", "tokens": [51498, 286, 519, 436, 352, 281, 362, 257, 665, 39911, 1657, + 382, 731, 13, 51642], "temperature": 0.0, "avg_logprob": -0.23395334531183112, "compression_ratio": + 1.7361963190184049, "no_speech_prob": 0.04934825003147125}, {"id": 254, "seek": + 88608, "start": 911.6400000000001, "end": 914.96, "text": " But the problem where + they seem like they wanted me to use their Python data science toolkit", "tokens": + [51642, 583, 264, 1154, 689, 436, 1643, 411, 436, 1415, 385, 281, 764, 641, 15329, + 1412, 3497, 40167, 51808], "temperature": 0.0, "avg_logprob": -0.23395334531183112, + "compression_ratio": 1.7361963190184049, "no_speech_prob": 0.04934825003147125}, + {"id": 255, "seek": 91496, "start": 914.96, "end": 917.0400000000001, "text": " + to sort of interact with it.", "tokens": [50364, 281, 1333, 295, 4648, 365, 309, + 13, 50468], "temperature": 0.0, "avg_logprob": -0.2445657644698869, "compression_ratio": + 1.685121107266436, "no_speech_prob": 0.0018451543292030692}, {"id": 256, "seek": + 91496, "start": 917.0400000000001, "end": 922.0400000000001, "text": " They were, + their API was very abstract and focused on, obviously just not what I was really", + "tokens": [50468, 814, 645, 11, 641, 9362, 390, 588, 12649, 293, 5178, 322, 11, + 2745, 445, 406, 437, 286, 390, 534, 50718], "temperature": 0.0, "avg_logprob": -0.2445657644698869, + "compression_ratio": 1.685121107266436, "no_speech_prob": 0.0018451543292030692}, + {"id": 257, "seek": 91496, "start": 922.0400000000001, "end": 923.0400000000001, + "text": " doing.", "tokens": [50718, 884, 13, 50768], "temperature": 0.0, "avg_logprob": + -0.2445657644698869, "compression_ratio": 1.685121107266436, "no_speech_prob": 0.0018451543292030692}, + {"id": 258, "seek": 91496, "start": 923.0400000000001, 
"end": 929.72, "text": " + I needed an API that was oriented around data operations and working not so much + analysis.", "tokens": [50768, 286, 2978, 364, 9362, 300, 390, 21841, 926, 1412, + 7705, 293, 1364, 406, 370, 709, 5215, 13, 51102], "temperature": 0.0, "avg_logprob": + -0.2445657644698869, "compression_ratio": 1.685121107266436, "no_speech_prob": 0.0018451543292030692}, + {"id": 259, "seek": 91496, "start": 929.72, "end": 932.2800000000001, "text": " + So that kind of slowed me down there.", "tokens": [51102, 407, 300, 733, 295, 32057, + 385, 760, 456, 13, 51230], "temperature": 0.0, "avg_logprob": -0.2445657644698869, + "compression_ratio": 1.685121107266436, "no_speech_prob": 0.0018451543292030692}, + {"id": 260, "seek": 91496, "start": 932.2800000000001, "end": 936.36, "text": " + With Quadrant, I felt that as soon as I got to the web page, I knew how to use it.", + "tokens": [51230, 2022, 29619, 7541, 11, 286, 2762, 300, 382, 2321, 382, 286, 658, + 281, 264, 3670, 3028, 11, 286, 2586, 577, 281, 764, 309, 13, 51434], "temperature": + 0.0, "avg_logprob": -0.2445657644698869, "compression_ratio": 1.685121107266436, + "no_speech_prob": 0.0018451543292030692}, {"id": 261, "seek": 91496, "start": 936.36, + "end": 937.36, "text": " Right.", "tokens": [51434, 1779, 13, 51484], "temperature": + 0.0, "avg_logprob": -0.2445657644698869, "compression_ratio": 1.685121107266436, + "no_speech_prob": 0.0018451543292030692}, {"id": 262, "seek": 91496, "start": 937.36, + "end": 940.24, "text": " Which, you know, for some reason to me, you know, if a + software system abstracts as", "tokens": [51484, 3013, 11, 291, 458, 11, 337, 512, + 1778, 281, 385, 11, 291, 458, 11, 498, 257, 4722, 1185, 12649, 82, 382, 51628], + "temperature": 0.0, "avg_logprob": -0.2445657644698869, "compression_ratio": 1.685121107266436, + "no_speech_prob": 0.0018451543292030692}, {"id": 263, "seek": 91496, "start": 940.24, + "end": 942.72, "text": " specific of how it operates and how you use it 
too much.", + "tokens": [51628, 2685, 295, 577, 309, 22577, 293, 577, 291, 764, 309, 886, 709, + 13, 51752], "temperature": 0.0, "avg_logprob": -0.2445657644698869, "compression_ratio": + 1.685121107266436, "no_speech_prob": 0.0018451543292030692}, {"id": 264, "seek": + 94272, "start": 942.72, "end": 946.08, "text": " I like, I like to say, like, I + like a language where you go to the site and you just see", "tokens": [50364, 286, + 411, 11, 286, 411, 281, 584, 11, 411, 11, 286, 411, 257, 2856, 689, 291, 352, 281, + 264, 3621, 293, 291, 445, 536, 50532], "temperature": 0.0, "avg_logprob": -0.204294170032848, + "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0020431755110621452}, + {"id": 265, "seek": 94272, "start": 946.08, "end": 949.36, "text": " a bit of code + there on the home page and it''s a button you can run it, you know, like,", "tokens": + [50532, 257, 857, 295, 3089, 456, 322, 264, 1280, 3028, 293, 309, 311, 257, 2960, + 291, 393, 1190, 309, 11, 291, 458, 11, 411, 11, 50696], "temperature": 0.0, "avg_logprob": + -0.204294170032848, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0020431755110621452}, + {"id": 266, "seek": 94272, "start": 949.36, "end": 954.2, "text": " I don''t want + to be too removed from the actual tasks of what I''m doing.", "tokens": [50696, + 286, 500, 380, 528, 281, 312, 886, 7261, 490, 264, 3539, 9608, 295, 437, 286, 478, + 884, 13, 50938], "temperature": 0.0, "avg_logprob": -0.204294170032848, "compression_ratio": + 1.8235294117647058, "no_speech_prob": 0.0020431755110621452}, {"id": 267, "seek": + 94272, "start": 954.2, "end": 956.6800000000001, "text": " So Quadrant just seemed + to like understand that and get that.", "tokens": [50938, 407, 29619, 7541, 445, + 6576, 281, 411, 1223, 300, 293, 483, 300, 13, 51062], "temperature": 0.0, "avg_logprob": + -0.204294170032848, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0020431755110621452}, + {"id": 268, "seek": 94272, "start": 
956.6800000000001, "end": 963.8000000000001, + "text": " And then going to the channel, I like the fact that they had a good, a + good, a deep technical", "tokens": [51062, 400, 550, 516, 281, 264, 2269, 11, 286, + 411, 264, 1186, 300, 436, 632, 257, 665, 11, 257, 665, 11, 257, 2452, 6191, 51418], + "temperature": 0.0, "avg_logprob": -0.204294170032848, "compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.0020431755110621452}, {"id": 269, "seek": 94272, "start": 963.8000000000001, + "end": 966.08, "text": " understanding of how it works, but they weren''t trying + to beat you over the head with", "tokens": [51418, 3701, 295, 577, 309, 1985, 11, + 457, 436, 4999, 380, 1382, 281, 4224, 291, 670, 264, 1378, 365, 51532], "temperature": + 0.0, "avg_logprob": -0.204294170032848, "compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.0020431755110621452}, {"id": 270, "seek": 94272, "start": 966.08, + "end": 967.08, "text": " the specifics.", "tokens": [51532, 264, 28454, 13, 51582], + "temperature": 0.0, "avg_logprob": -0.204294170032848, "compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.0020431755110621452}, {"id": 271, "seek": 94272, "start": 967.08, + "end": 968.4, "text": " It was kind of like abstract at the right level.", "tokens": + [51582, 467, 390, 733, 295, 411, 12649, 412, 264, 558, 1496, 13, 51648], "temperature": + 0.0, "avg_logprob": -0.204294170032848, "compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.0020431755110621452}, {"id": 272, "seek": 94272, "start": 968.4, + "end": 971.1600000000001, "text": " So, and you know, it''s really fast.", "tokens": + [51648, 407, 11, 293, 291, 458, 11, 309, 311, 534, 2370, 13, 51786], "temperature": + 0.0, "avg_logprob": -0.204294170032848, "compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.0020431755110621452}, {"id": 273, "seek": 97116, "start": 971.16, + "end": 974.76, "text": " I tried tossing millions of records at it and it was almost + if perceptibly slower.", 
"tokens": [50364, 286, 3031, 14432, 278, 6803, 295, 7724, + 412, 309, 293, 309, 390, 1920, 498, 43276, 3545, 14009, 13, 50544], "temperature": + 0.0, "avg_logprob": -0.2629930443233914, "compression_ratio": 1.7938144329896908, + "no_speech_prob": 0.02281758561730385}, {"id": 274, "seek": 97116, "start": 974.76, + "end": 977.68, "text": " Like you almost couldn''t tell that you were adding so + many records.", "tokens": [50544, 1743, 291, 1920, 2809, 380, 980, 300, 291, 645, + 5127, 370, 867, 7724, 13, 50690], "temperature": 0.0, "avg_logprob": -0.2629930443233914, + "compression_ratio": 1.7938144329896908, "no_speech_prob": 0.02281758561730385}, + {"id": 275, "seek": 97116, "start": 977.68, "end": 979.8, "text": " So I thought + that was, that was really fair.", "tokens": [50690, 407, 286, 1194, 300, 390, 11, + 300, 390, 534, 3143, 13, 50796], "temperature": 0.0, "avg_logprob": -0.2629930443233914, + "compression_ratio": 1.7938144329896908, "no_speech_prob": 0.02281758561730385}, + {"id": 276, "seek": 97116, "start": 979.8, "end": 984.04, "text": " A lot of these + vector databases now, I feel like the more like platforms, you know,", "tokens": + [50796, 316, 688, 295, 613, 8062, 22380, 586, 11, 286, 841, 411, 264, 544, 411, + 9473, 11, 291, 458, 11, 51008], "temperature": 0.0, "avg_logprob": -0.2629930443233914, + "compression_ratio": 1.7938144329896908, "no_speech_prob": 0.02281758561730385}, + {"id": 277, "seek": 97116, "start": 984.04, "end": 985.04, "text": " I didn''t want + that.", "tokens": [51008, 286, 994, 380, 528, 300, 13, 51058], "temperature": 0.0, + "avg_logprob": -0.2629930443233914, "compression_ratio": 1.7938144329896908, "no_speech_prob": + 0.02281758561730385}, {"id": 278, "seek": 97116, "start": 985.04, "end": 989.36, + "text": " I wanted almost like a redis of vector databases, you know, that kind + of, by platform,", "tokens": [51058, 286, 1415, 1920, 411, 257, 2182, 271, 295, + 8062, 22380, 11, 291, 458, 11, 300, 733, 295, 11, 538, 3663, 11, 
51274], "temperature": + 0.0, "avg_logprob": -0.2629930443233914, "compression_ratio": 1.7938144329896908, + "no_speech_prob": 0.02281758561730385}, {"id": 279, "seek": 97116, "start": 989.36, + "end": 994.0799999999999, "text": " do you mean like this database is trying to + lock you in in a way, kind of like give you", "tokens": [51274, 360, 291, 914, 411, + 341, 8149, 307, 1382, 281, 4017, 291, 294, 294, 257, 636, 11, 733, 295, 411, 976, + 291, 51510], "temperature": 0.0, "avg_logprob": -0.2629930443233914, "compression_ratio": + 1.7938144329896908, "no_speech_prob": 0.02281758561730385}, {"id": 280, "seek": + 97116, "start": 994.0799999999999, "end": 995.6, "text": " so many features you + don''t need?", "tokens": [51510, 370, 867, 4122, 291, 500, 380, 643, 30, 51586], + "temperature": 0.0, "avg_logprob": -0.2629930443233914, "compression_ratio": 1.7938144329896908, + "no_speech_prob": 0.02281758561730385}, {"id": 281, "seek": 97116, "start": 995.6, + "end": 996.6, "text": " Exactly.", "tokens": [51586, 7587, 13, 51636], "temperature": + 0.0, "avg_logprob": -0.2629930443233914, "compression_ratio": 1.7938144329896908, + "no_speech_prob": 0.02281758561730385}, {"id": 282, "seek": 97116, "start": 996.6, + "end": 997.6, "text": " Okay.", "tokens": [51636, 1033, 13, 51686], "temperature": + 0.0, "avg_logprob": -0.2629930443233914, "compression_ratio": 1.7938144329896908, + "no_speech_prob": 0.02281758561730385}, {"id": 283, "seek": 99760, "start": 997.6, + "end": 1001.6800000000001, "text": " So I think it''s all well meaning, but I just, + I just feel like I can''t, I can''t trust", "tokens": [50364, 407, 286, 519, 309, + 311, 439, 731, 3620, 11, 457, 286, 445, 11, 286, 445, 841, 411, 286, 393, 380, 11, + 286, 393, 380, 3361, 50568], "temperature": 0.0, "avg_logprob": -0.22437323593511815, + "compression_ratio": 1.6595092024539877, "no_speech_prob": 0.0274764783680439}, + {"id": 284, "seek": 99760, "start": 1001.6800000000001, "end": 1004.0, "text": " + one vendor 
for a lot of what I''m doing.", "tokens": [50568, 472, 24321, 337, 257, + 688, 295, 437, 286, 478, 884, 13, 50684], "temperature": 0.0, "avg_logprob": -0.22437323593511815, + "compression_ratio": 1.6595092024539877, "no_speech_prob": 0.0274764783680439}, + {"id": 285, "seek": 99760, "start": 1004.0, "end": 1007.96, "text": " I need to + sort of spread my, spread my risk, you know, over different parts.", "tokens": [50684, + 286, 643, 281, 1333, 295, 3974, 452, 11, 3974, 452, 3148, 11, 291, 458, 11, 670, + 819, 3166, 13, 50882], "temperature": 0.0, "avg_logprob": -0.22437323593511815, + "compression_ratio": 1.6595092024539877, "no_speech_prob": 0.0274764783680439}, + {"id": 286, "seek": 99760, "start": 1007.96, "end": 1012.44, "text": " So I try + not to embrace any parts of the system that are too large, too monolithic.", "tokens": + [50882, 407, 286, 853, 406, 281, 14038, 604, 3166, 295, 264, 1185, 300, 366, 886, + 2416, 11, 886, 1108, 42878, 13, 51106], "temperature": 0.0, "avg_logprob": -0.22437323593511815, + "compression_ratio": 1.6595092024539877, "no_speech_prob": 0.0274764783680439}, + {"id": 287, "seek": 99760, "start": 1012.44, "end": 1013.44, "text": " You know, + yeah.", "tokens": [51106, 509, 458, 11, 1338, 13, 51156], "temperature": 0.0, "avg_logprob": + -0.22437323593511815, "compression_ratio": 1.6595092024539877, "no_speech_prob": + 0.0274764783680439}, {"id": 288, "seek": 99760, "start": 1013.44, "end": 1017.0400000000001, + "text": " And I guess at this point, you''re wearing your VP hat, right?", "tokens": + [51156, 400, 286, 2041, 412, 341, 935, 11, 291, 434, 4769, 428, 35812, 2385, 11, + 558, 30, 51336], "temperature": 0.0, "avg_logprob": -0.22437323593511815, "compression_ratio": + 1.6595092024539877, "no_speech_prob": 0.0274764783680439}, {"id": 289, "seek": 99760, + "start": 1017.0400000000001, "end": 1022.12, "text": " VP of the genuine hat that + you, you don''t just kind of like, oh, this is a sexy platform", "tokens": [51336, + 35812, 
295, 264, 16699, 2385, 300, 291, 11, 291, 500, 380, 445, 733, 295, 411, 11, + 1954, 11, 341, 307, 257, 13701, 3663, 51590], "temperature": 0.0, "avg_logprob": + -0.22437323593511815, "compression_ratio": 1.6595092024539877, "no_speech_prob": + 0.0274764783680439}, {"id": 290, "seek": 99760, "start": 1022.12, "end": 1023.12, + "text": " or tool.", "tokens": [51590, 420, 2290, 13, 51640], "temperature": 0.0, + "avg_logprob": -0.22437323593511815, "compression_ratio": 1.6595092024539877, "no_speech_prob": + 0.0274764783680439}, {"id": 291, "seek": 99760, "start": 1023.12, "end": 1024.1200000000001, + "text": " Let''s use it.", "tokens": [51640, 961, 311, 764, 309, 13, 51690], "temperature": + 0.0, "avg_logprob": -0.22437323593511815, "compression_ratio": 1.6595092024539877, + "no_speech_prob": 0.0274764783680439}, {"id": 292, "seek": 99760, "start": 1024.1200000000001, + "end": 1027.16, "text": " But you want to see long term, what are the implications, + right?", "tokens": [51690, 583, 291, 528, 281, 536, 938, 1433, 11, 437, 366, 264, + 16602, 11, 558, 30, 51842], "temperature": 0.0, "avg_logprob": -0.22437323593511815, + "compression_ratio": 1.6595092024539877, "no_speech_prob": 0.0274764783680439}, + {"id": 293, "seek": 102716, "start": 1028.16, "end": 1032.0400000000002, "text": + " And you need to, you need to adopt technology that has the right sort of surface + shape that", "tokens": [50414, 400, 291, 643, 281, 11, 291, 643, 281, 6878, 2899, + 300, 575, 264, 558, 1333, 295, 3753, 3909, 300, 50608], "temperature": 0.0, "avg_logprob": + -0.2272656514094426, "compression_ratio": 1.8610271903323263, "no_speech_prob": + 0.0027661689091473818}, {"id": 294, "seek": 102716, "start": 1032.0400000000002, + "end": 1034.0, "text": " you know, it''s going to slide in easily.", "tokens": [50608, + 291, 458, 11, 309, 311, 516, 281, 4137, 294, 3612, 13, 50706], "temperature": 0.0, + "avg_logprob": -0.2272656514094426, "compression_ratio": 1.8610271903323263, "no_speech_prob": 
+ 0.0027661689091473818}, {"id": 295, "seek": 102716, "start": 1034.0, "end": 1039.0, + "text": " You know, I, I, I, with, for instance, with Python face or whatever, I + knew that that was", "tokens": [50706, 509, 458, 11, 286, 11, 286, 11, 286, 11, + 365, 11, 337, 5197, 11, 365, 15329, 1851, 420, 2035, 11, 286, 2586, 300, 300, 390, + 50956], "temperature": 0.0, "avg_logprob": -0.2272656514094426, "compression_ratio": + 1.8610271903323263, "no_speech_prob": 0.0027661689091473818}, {"id": 296, "seek": + 102716, "start": 1039.0, "end": 1042.16, "text": " going to be a nightmare to wrap + to make connectors for for different systems.", "tokens": [50956, 516, 281, 312, + 257, 18724, 281, 7019, 281, 652, 31865, 337, 337, 819, 3652, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.2272656514094426, "compression_ratio": 1.8610271903323263, + "no_speech_prob": 0.0027661689091473818}, {"id": 297, "seek": 102716, "start": 1042.16, + "end": 1045.0400000000002, "text": " And I also knew that I wasn''t going to program + my entire thing in Python.", "tokens": [51114, 400, 286, 611, 2586, 300, 286, 2067, + 380, 516, 281, 1461, 452, 2302, 551, 294, 15329, 13, 51258], "temperature": 0.0, + "avg_logprob": -0.2272656514094426, "compression_ratio": 1.8610271903323263, "no_speech_prob": + 0.0027661689091473818}, {"id": 298, "seek": 102716, "start": 1045.0400000000002, + "end": 1049.76, "text": " And I also knew that, you know, I would need to have a + long term component running, running", "tokens": [51258, 400, 286, 611, 2586, 300, + 11, 291, 458, 11, 286, 576, 643, 281, 362, 257, 938, 1433, 6542, 2614, 11, 2614, + 51494], "temperature": 0.0, "avg_logprob": -0.2272656514094426, "compression_ratio": + 1.8610271903323263, "no_speech_prob": 0.0027661689091473818}, {"id": 299, "seek": + 102716, "start": 1049.76, "end": 1053.24, "text": " a web server that was independent + of Python back and restarts.", "tokens": [51494, 257, 3670, 7154, 300, 390, 6695, + 295, 15329, 646, 293, 1472, 
11814, 13, 51668], "temperature": 0.0, "avg_logprob": + -0.2272656514094426, "compression_ratio": 1.8610271903323263, "no_speech_prob": + 0.0027661689091473818}, {"id": 300, "seek": 102716, "start": 1053.24, "end": 1055.8000000000002, + "text": " So with all those factors together, I think that quadrant was kind of + like the obvious", "tokens": [51668, 407, 365, 439, 729, 6771, 1214, 11, 286, 519, + 300, 46856, 390, 733, 295, 411, 264, 6322, 51796], "temperature": 0.0, "avg_logprob": + -0.2272656514094426, "compression_ratio": 1.8610271903323263, "no_speech_prob": + 0.0027661689091473818}, {"id": 301, "seek": 105580, "start": 1055.8, "end": 1056.8, + "text": " choice.", "tokens": [50364, 3922, 13, 50414], "temperature": 0.0, "avg_logprob": + -0.2997707598137133, "compression_ratio": 1.7027972027972027, "no_speech_prob": + 0.02486780658364296}, {"id": 302, "seek": 105580, "start": 1056.8, "end": 1059.28, + "text": " And plus just looking through the code, it seemed, it seemed short.", + "tokens": [50414, 400, 1804, 445, 1237, 807, 264, 3089, 11, 309, 6576, 11, 309, + 6576, 2099, 13, 50538], "temperature": 0.0, "avg_logprob": -0.2997707598137133, + "compression_ratio": 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, + {"id": 303, "seek": 105580, "start": 1059.28, "end": 1063.3999999999999, "text": + " I, for some reason, I''ve been having really good, really good results with stuff + right", "tokens": [50538, 286, 11, 337, 512, 1778, 11, 286, 600, 668, 1419, 534, + 665, 11, 534, 665, 3542, 365, 1507, 558, 50744], "temperature": 0.0, "avg_logprob": + -0.2997707598137133, "compression_ratio": 1.7027972027972027, "no_speech_prob": + 0.02486780658364296}, {"id": 304, "seek": 105580, "start": 1063.3999999999999, "end": + 1064.3999999999999, "text": " and rest lately.", "tokens": [50744, 293, 1472, 12881, + 13, 50794], "temperature": 0.0, "avg_logprob": -0.2997707598137133, "compression_ratio": + 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, 
{"id": 305, "seek": + 105580, "start": 1064.3999999999999, "end": 1068.84, "text": " I''ve, like, a lot + of rest software come across as really reliable and performant.", "tokens": [50794, + 286, 600, 11, 411, 11, 257, 688, 295, 1472, 4722, 808, 2108, 382, 534, 12924, 293, + 2042, 394, 13, 51016], "temperature": 0.0, "avg_logprob": -0.2997707598137133, "compression_ratio": + 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, {"id": 306, "seek": + 105580, "start": 1068.84, "end": 1069.84, "text": " Yes.", "tokens": [51016, 1079, + 13, 51066], "temperature": 0.0, "avg_logprob": -0.2997707598137133, "compression_ratio": + 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, {"id": 307, "seek": + 105580, "start": 1069.84, "end": 1074.48, "text": " So like, that''s what I was + about to ask, like, when you, then when you''re choosing, you", "tokens": [51066, + 407, 411, 11, 300, 311, 437, 286, 390, 466, 281, 1029, 11, 411, 11, 562, 291, 11, + 550, 562, 291, 434, 10875, 11, 291, 51298], "temperature": 0.0, "avg_logprob": -0.2997707598137133, + "compression_ratio": 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, + {"id": 308, "seek": 105580, "start": 1074.48, "end": 1078.68, "text": " were choosing + the platform or the database, did you pay attention to the programming language", + "tokens": [51298, 645, 10875, 264, 3663, 420, 264, 8149, 11, 630, 291, 1689, 3202, + 281, 264, 9410, 2856, 51508], "temperature": 0.0, "avg_logprob": -0.2997707598137133, + "compression_ratio": 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, + {"id": 309, "seek": 105580, "start": 1078.68, "end": 1081.28, "text": " that was + implemented in as well?", "tokens": [51508, 300, 390, 12270, 294, 382, 731, 30, + 51638], "temperature": 0.0, "avg_logprob": -0.2997707598137133, "compression_ratio": + 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, {"id": 310, "seek": + 105580, "start": 1081.28, "end": 1082.28, "text": " Yeah.", "tokens": [51638, 865, 
+ 13, 51688], "temperature": 0.0, "avg_logprob": -0.2997707598137133, "compression_ratio": + 1.7027972027972027, "no_speech_prob": 0.02486780658364296}, {"id": 311, "seek": + 108228, "start": 1083.28, "end": 1087.44, "text": " For some reason, I know it''s + unfair, but I''ve definitely observed certain patterns in", "tokens": [50414, 1171, + 512, 1778, 11, 286, 458, 309, 311, 17019, 11, 457, 286, 600, 2138, 13095, 1629, + 8294, 294, 50622], "temperature": 0.0, "avg_logprob": -0.20221754881712767, "compression_ratio": + 1.5919117647058822, "no_speech_prob": 0.027429813519120216}, {"id": 312, "seek": + 108228, "start": 1087.44, "end": 1093.6399999999999, "text": " all, let''s say, + all Java based server applications, like, let''s say, elastic is a great example.", + "tokens": [50622, 439, 11, 718, 311, 584, 11, 439, 10745, 2361, 7154, 5821, 11, + 411, 11, 718, 311, 584, 11, 17115, 307, 257, 869, 1365, 13, 50932], "temperature": + 0.0, "avg_logprob": -0.20221754881712767, "compression_ratio": 1.5919117647058822, + "no_speech_prob": 0.027429813519120216}, {"id": 313, "seek": 108228, "start": 1093.6399999999999, + "end": 1097.44, "text": " They always want to consume many, many, many CPUs and + have low RAM limitations.", "tokens": [50932, 814, 1009, 528, 281, 14732, 867, 11, + 867, 11, 867, 13199, 82, 293, 362, 2295, 14561, 15705, 13, 51122], "temperature": + 0.0, "avg_logprob": -0.20221754881712767, "compression_ratio": 1.5919117647058822, + "no_speech_prob": 0.027429813519120216}, {"id": 314, "seek": 108228, "start": 1097.44, + "end": 1103.8, "text": " And of course, they''re still that confusing garbage collection + cycles and Java.", "tokens": [51122, 400, 295, 1164, 11, 436, 434, 920, 300, 13181, + 14150, 5765, 17796, 293, 10745, 13, 51440], "temperature": 0.0, "avg_logprob": -0.20221754881712767, + "compression_ratio": 1.5919117647058822, "no_speech_prob": 0.027429813519120216}, + {"id": 315, "seek": 108228, "start": 1103.8, "end": 1108.8, "text": " And like 
every + time I run a Java based service, I need to end up doing tweaking on the JVM", "tokens": + [51440, 400, 411, 633, 565, 286, 1190, 257, 10745, 2361, 2643, 11, 286, 643, 281, + 917, 493, 884, 6986, 2456, 322, 264, 508, 53, 44, 51690], "temperature": 0.0, "avg_logprob": + -0.20221754881712767, "compression_ratio": 1.5919117647058822, "no_speech_prob": + 0.027429813519120216}, {"id": 316, "seek": 110880, "start": 1109.1599999999999, + "end": 1111.32, "text": " Yeah, which is like a garbage collector, right?", "tokens": + [50382, 865, 11, 597, 307, 411, 257, 14150, 23960, 11, 558, 30, 50490], "temperature": + 0.0, "avg_logprob": -0.34161218007405597, "compression_ratio": 1.8218181818181818, + "no_speech_prob": 0.09418073296546936}, {"id": 317, "seek": 110880, "start": 1111.32, + "end": 1113.24, "text": " You don''t want to do that black magic.", "tokens": [50490, + 509, 500, 380, 528, 281, 360, 300, 2211, 5585, 13, 50586], "temperature": 0.0, "avg_logprob": + -0.34161218007405597, "compression_ratio": 1.8218181818181818, "no_speech_prob": + 0.09418073296546936}, {"id": 318, "seek": 110880, "start": 1114.0, "end": 1118.48, + "text": " Yeah, I want, I want the thing that''s doing the running to sort of be + self operating.", "tokens": [50624, 865, 11, 286, 528, 11, 286, 528, 264, 551, 300, + 311, 884, 264, 2614, 281, 1333, 295, 312, 2698, 7447, 13, 50848], "temperature": + 0.0, "avg_logprob": -0.34161218007405597, "compression_ratio": 1.8218181818181818, + "no_speech_prob": 0.09418073296546936}, {"id": 319, "seek": 110880, "start": 1118.48, + "end": 1121.76, "text": " I don''t want to have to be tweaking that all the time + as my application needs grow and change.", "tokens": [50848, 286, 500, 380, 528, + 281, 362, 281, 312, 6986, 2456, 300, 439, 264, 565, 382, 452, 3861, 2203, 1852, + 293, 1319, 13, 51012], "temperature": 0.0, "avg_logprob": -0.34161218007405597, + "compression_ratio": 1.8218181818181818, "no_speech_prob": 0.09418073296546936}, + {"id": 320, 
"seek": 110880, "start": 1122.96, "end": 1126.12, "text": " So that + kind of like disqualifies all Java based software for me.", "tokens": [51072, 407, + 300, 733, 295, 411, 717, 22345, 11221, 439, 10745, 2361, 4722, 337, 385, 13, 51230], + "temperature": 0.0, "avg_logprob": -0.34161218007405597, "compression_ratio": 1.8218181818181818, + "no_speech_prob": 0.09418073296546936}, {"id": 321, "seek": 110880, "start": 1126.12, + "end": 1128.68, "text": " And I believe that one of the major vector databases is + Java, right?", "tokens": [51230, 400, 286, 1697, 300, 472, 295, 264, 2563, 8062, + 22380, 307, 10745, 11, 558, 30, 51358], "temperature": 0.0, "avg_logprob": -0.34161218007405597, + "compression_ratio": 1.8218181818181818, "no_speech_prob": 0.09418073296546936}, + {"id": 322, "seek": 110880, "start": 1128.68, "end": 1136.36, "text": " I think + it''s the, the, the, the, the, the, the, the, the, the, the, it is written in go, + actually.", "tokens": [51358, 286, 519, 309, 311, 264, 11, 264, 11, 264, 11, 264, + 11, 264, 11, 264, 11, 264, 11, 264, 11, 264, 11, 264, 11, 264, 11, 309, 307, 3720, + 294, 352, 11, 767, 13, 51742], "temperature": 0.0, "avg_logprob": -0.34161218007405597, + "compression_ratio": 1.8218181818181818, "no_speech_prob": 0.09418073296546936}, + {"id": 323, "seek": 113636, "start": 1136.36, "end": 1141.7199999999998, "text": + " So, but if you mean Vespa, Vespa is written in Java and some other part.", "tokens": + [50364, 407, 11, 457, 498, 291, 914, 691, 279, 4306, 11, 691, 279, 4306, 307, 3720, + 294, 10745, 293, 512, 661, 644, 13, 50632], "temperature": 0.0, "avg_logprob": -0.4084318921082002, + "compression_ratio": 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, + {"id": 324, "seek": 113636, "start": 1141.7199999999998, "end": 1142.52, "text": + " Yeah, Vespa, yeah.", "tokens": [50632, 865, 11, 691, 279, 4306, 11, 1338, 13, + 50672], "temperature": 0.0, "avg_logprob": -0.4084318921082002, "compression_ratio": + 
1.5913043478260869, "no_speech_prob": 0.013458467088639736}, {"id": 325, "seek": + 113636, "start": 1142.52, "end": 1143.0, "text": " Yeah.", "tokens": [50672, 865, + 13, 50696], "temperature": 0.0, "avg_logprob": -0.4084318921082002, "compression_ratio": + 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, {"id": 326, "seek": + 113636, "start": 1144.12, "end": 1149.1599999999999, "text": " But by the way, did + you consider Vespa or VEV8 as you''ve been studying different databases?", "tokens": + [50752, 583, 538, 264, 636, 11, 630, 291, 1949, 691, 279, 4306, 420, 691, 36, 53, + 23, 382, 291, 600, 668, 7601, 819, 22380, 30, 51004], "temperature": 0.0, "avg_logprob": + -0.4084318921082002, "compression_ratio": 1.5913043478260869, "no_speech_prob": + 0.013458467088639736}, {"id": 327, "seek": 113636, "start": 1149.8799999999999, + "end": 1152.52, "text": " I believe I checked out the Vespa site, like you said, + Java.", "tokens": [51040, 286, 1697, 286, 10033, 484, 264, 691, 279, 4306, 3621, + 11, 411, 291, 848, 11, 10745, 13, 51172], "temperature": 0.0, "avg_logprob": -0.4084318921082002, + "compression_ratio": 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, + {"id": 328, "seek": 113636, "start": 1153.56, "end": 1154.56, "text": " What was + the other one you said?", "tokens": [51224, 708, 390, 264, 661, 472, 291, 848, 30, + 51274], "temperature": 0.0, "avg_logprob": -0.4084318921082002, "compression_ratio": + 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, {"id": 329, "seek": + 113636, "start": 1155.04, "end": 1155.7199999999998, "text": " VEV8.", "tokens": + [51298, 691, 36, 53, 23, 13, 51332], "temperature": 0.0, "avg_logprob": -0.4084318921082002, + "compression_ratio": 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, + {"id": 330, "seek": 113636, "start": 1156.4399999999998, "end": 1157.1599999999999, + "text": " VEV8.", "tokens": [51368, 691, 36, 53, 23, 13, 51404], "temperature": + 0.0, "avg_logprob": 
-0.4084318921082002, "compression_ratio": 1.5913043478260869, + "no_speech_prob": 0.013458467088639736}, {"id": 331, "seek": 113636, "start": 1157.1599999999999, + "end": 1158.04, "text": " Written and go.", "tokens": [51404, 10159, 2987, 293, + 352, 13, 51448], "temperature": 0.0, "avg_logprob": -0.4084318921082002, "compression_ratio": + 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, {"id": 332, "seek": + 113636, "start": 1158.3999999999999, "end": 1158.6799999999998, "text": " Yeah.", + "tokens": [51466, 865, 13, 51480], "temperature": 0.0, "avg_logprob": -0.4084318921082002, + "compression_ratio": 1.5913043478260869, "no_speech_prob": 0.013458467088639736}, + {"id": 333, "seek": 113636, "start": 1159.3999999999999, "end": 1161.9599999999998, + "text": " It''s also open source and also first year cloud.", "tokens": [51516, + 467, 311, 611, 1269, 4009, 293, 611, 700, 1064, 4588, 13, 51644], "temperature": + 0.0, "avg_logprob": -0.4084318921082002, "compression_ratio": 1.5913043478260869, + "no_speech_prob": 0.013458467088639736}, {"id": 334, "seek": 116196, "start": 1161.96, + "end": 1169.88, "text": " By the way, I have episodes with both Milbus and VEV8 + for those of us that would like to kind of", "tokens": [50364, 3146, 264, 636, 11, + 286, 362, 9313, 365, 1293, 7036, 21441, 293, 691, 36, 53, 23, 337, 729, 295, 505, + 300, 576, 411, 281, 733, 295, 50760], "temperature": 0.0, "avg_logprob": -0.2620533046437733, + "compression_ratio": 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, + {"id": 335, "seek": 116196, "start": 1169.88, "end": 1174.1200000000001, "text": + " listen in what, what are the building blocks and architectures behind this?", + "tokens": [50760, 2140, 294, 437, 11, 437, 366, 264, 2390, 8474, 293, 6331, 1303, + 2261, 341, 30, 50972], "temperature": 0.0, "avg_logprob": -0.2620533046437733, "compression_ratio": + 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, {"id": 336, "seek": + 116196, "start": 
1174.1200000000001, "end": 1174.4, "text": " Yeah.", "tokens": + [50972, 865, 13, 50986], "temperature": 0.0, "avg_logprob": -0.2620533046437733, + "compression_ratio": 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, + {"id": 337, "seek": 116196, "start": 1174.4, "end": 1175.04, "text": " And features.", + "tokens": [50986, 400, 4122, 13, 51018], "temperature": 0.0, "avg_logprob": -0.2620533046437733, + "compression_ratio": 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, + {"id": 338, "seek": 116196, "start": 1175.04, "end": 1175.52, "text": " It''s actually + awesome.", "tokens": [51018, 467, 311, 767, 3476, 13, 51042], "temperature": 0.0, + "avg_logprob": -0.2620533046437733, "compression_ratio": 1.5641891891891893, "no_speech_prob": + 0.00403061555698514}, {"id": 339, "seek": 116196, "start": 1175.52, "end": 1176.52, + "text": " I have to listen to that.", "tokens": [51042, 286, 362, 281, 2140, 281, + 300, 13, 51092], "temperature": 0.0, "avg_logprob": -0.2620533046437733, "compression_ratio": + 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, {"id": 340, "seek": + 116196, "start": 1176.52, "end": 1176.72, "text": " Yeah.", "tokens": [51092, 865, + 13, 51102], "temperature": 0.0, "avg_logprob": -0.2620533046437733, "compression_ratio": + 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, {"id": 341, "seek": + 116196, "start": 1176.72, "end": 1178.8400000000001, "text": " I''m actually very + curious about the different implementation choices.", "tokens": [51102, 286, 478, + 767, 588, 6369, 466, 264, 819, 11420, 7994, 13, 51208], "temperature": 0.0, "avg_logprob": + -0.2620533046437733, "compression_ratio": 1.5641891891891893, "no_speech_prob": + 0.00403061555698514}, {"id": 342, "seek": 116196, "start": 1179.8400000000001, "end": + 1180.2, "text": " Yeah.", "tokens": [51258, 865, 13, 51276], "temperature": 0.0, + "avg_logprob": -0.2620533046437733, "compression_ratio": 1.5641891891891893, "no_speech_prob": + 
0.00403061555698514}, {"id": 343, "seek": 116196, "start": 1180.2, "end": 1180.44, + "text": " Yeah.", "tokens": [51276, 865, 13, 51288], "temperature": 0.0, "avg_logprob": + -0.2620533046437733, "compression_ratio": 1.5641891891891893, "no_speech_prob": + 0.00403061555698514}, {"id": 344, "seek": 116196, "start": 1180.44, "end": 1184.52, + "text": " Because go is also a high performance language, right?", "tokens": [51288, + 1436, 352, 307, 611, 257, 1090, 3389, 2856, 11, 558, 30, 51492], "temperature": + 0.0, "avg_logprob": -0.2620533046437733, "compression_ratio": 1.5641891891891893, + "no_speech_prob": 0.00403061555698514}, {"id": 345, "seek": 116196, "start": 1184.52, + "end": 1186.8, "text": " Compared to Python or Java.", "tokens": [51492, 30539, + 281, 15329, 420, 10745, 13, 51606], "temperature": 0.0, "avg_logprob": -0.2620533046437733, + "compression_ratio": 1.5641891891891893, "no_speech_prob": 0.00403061555698514}, + {"id": 346, "seek": 116196, "start": 1187.04, "end": 1189.56, "text": " Of course, + in Java, it also depends how you do it.", "tokens": [51618, 2720, 1164, 11, 294, + 10745, 11, 309, 611, 5946, 577, 291, 360, 309, 13, 51744], "temperature": 0.0, "avg_logprob": + -0.2620533046437733, "compression_ratio": 1.5641891891891893, "no_speech_prob": + 0.00403061555698514}, {"id": 347, "seek": 118956, "start": 1189.56, "end": 1194.6799999999998, + "text": " You know, if you take elastic search or solar, both of them are using + Apache", "tokens": [50364, 509, 458, 11, 498, 291, 747, 17115, 3164, 420, 7936, + 11, 1293, 295, 552, 366, 1228, 46597, 50620], "temperature": 0.0, "avg_logprob": + -0.23298489730969996, "compression_ratio": 1.5992063492063493, "no_speech_prob": + 0.008370354771614075}, {"id": 348, "seek": 118956, "start": 1194.6799999999998, + "end": 1197.0, "text": " Ducin, which is a search library inside.", "tokens": [50620, + 413, 1311, 259, 11, 597, 307, 257, 3164, 6405, 1854, 13, 50736], "temperature": + 0.0, "avg_logprob": 
-0.23298489730969996, "compression_ratio": 1.5992063492063493, + "no_speech_prob": 0.008370354771614075}, {"id": 349, "seek": 118956, "start": 1197.36, + "end": 1202.9199999999998, "text": " And Apache Ducin has been optimized for, well, + close to 20 years or even more.", "tokens": [50754, 400, 46597, 413, 1311, 259, + 575, 668, 26941, 337, 11, 731, 11, 1998, 281, 945, 924, 420, 754, 544, 13, 51032], + "temperature": 0.0, "avg_logprob": -0.23298489730969996, "compression_ratio": 1.5992063492063493, + "no_speech_prob": 0.008370354771614075}, {"id": 350, "seek": 118956, "start": 1203.32, + "end": 1208.12, "text": " So I mean, it''s close to see in some sense, but of course, + it is not see.", "tokens": [51052, 407, 286, 914, 11, 309, 311, 1998, 281, 536, + 294, 512, 2020, 11, 457, 295, 1164, 11, 309, 307, 406, 536, 13, 51292], "temperature": + 0.0, "avg_logprob": -0.23298489730969996, "compression_ratio": 1.5992063492063493, + "no_speech_prob": 0.008370354771614075}, {"id": 351, "seek": 118956, "start": 1208.1599999999999, + "end": 1214.36, "text": " So like when you, when you load more and more data, eventually + it will, you will run into", "tokens": [51294, 407, 411, 562, 291, 11, 562, 291, + 3677, 544, 293, 544, 1412, 11, 4728, 309, 486, 11, 291, 486, 1190, 666, 51604], + "temperature": 0.0, "avg_logprob": -0.23298489730969996, "compression_ratio": 1.5992063492063493, + "no_speech_prob": 0.008370354771614075}, {"id": 352, "seek": 118956, "start": 1214.36, + "end": 1216.28, "text": " situations that you just explained, right?", "tokens": + [51604, 6851, 300, 291, 445, 8825, 11, 558, 30, 51700], "temperature": 0.0, "avg_logprob": + -0.23298489730969996, "compression_ratio": 1.5992063492063493, "no_speech_prob": + 0.008370354771614075}, {"id": 353, "seek": 121628, "start": 1216.28, "end": 1218.44, + "text": " That you start tweaking the garbage collector.", "tokens": [50364, 663, + 291, 722, 6986, 2456, 264, 14150, 23960, 13, 50472], "temperature": 0.0, 
"avg_logprob": + -0.26298143109704697, "compression_ratio": 1.6714285714285715, "no_speech_prob": + 0.0492158867418766}, {"id": 354, "seek": 121628, "start": 1218.76, "end": 1225.0, + "text": " There is like a dedicated channel or like even a person, I feel like Sean + Hasey in the", "tokens": [50488, 821, 307, 411, 257, 8374, 2269, 420, 411, 754, + 257, 954, 11, 286, 841, 411, 14839, 389, 651, 88, 294, 264, 50800], "temperature": + 0.0, "avg_logprob": -0.26298143109704697, "compression_ratio": 1.6714285714285715, + "no_speech_prob": 0.0492158867418766}, {"id": 355, "seek": 121628, "start": 1225.48, + "end": 1230.04, "text": " commuter side, who has a lot of wisdom in how specifically + you need to tune which", "tokens": [50824, 800, 20314, 1252, 11, 567, 575, 257, + 688, 295, 10712, 294, 577, 4682, 291, 643, 281, 10864, 597, 51052], "temperature": + 0.0, "avg_logprob": -0.26298143109704697, "compression_ratio": 1.6714285714285715, + "no_speech_prob": 0.0492158867418766}, {"id": 356, "seek": 121628, "start": 1230.04, + "end": 1231.84, "text": " parameters in which garbage collector.", "tokens": [51052, + 9834, 294, 597, 14150, 23960, 13, 51142], "temperature": 0.0, "avg_logprob": -0.26298143109704697, + "compression_ratio": 1.6714285714285715, "no_speech_prob": 0.0492158867418766}, + {"id": 357, "seek": 121628, "start": 1232.36, "end": 1234.48, "text": " Do you want + GC or whatever you have?", "tokens": [51168, 1144, 291, 528, 29435, 420, 2035, 291, + 362, 30, 51274], "temperature": 0.0, "avg_logprob": -0.26298143109704697, "compression_ratio": + 1.6714285714285715, "no_speech_prob": 0.0492158867418766}, {"id": 358, "seek": 121628, + "start": 1235.0, "end": 1237.08, "text": " Depending on the Java version, right?", + "tokens": [51300, 22539, 322, 264, 10745, 3037, 11, 558, 30, 51404], "temperature": + 0.0, "avg_logprob": -0.26298143109704697, "compression_ratio": 1.6714285714285715, + "no_speech_prob": 0.0492158867418766}, {"id": 359, "seek": 121628, "start": 
1237.08, + "end": 1240.72, "text": " Because different Java versions have different GCs and + like it''s almost like", "tokens": [51404, 1436, 819, 10745, 9606, 362, 819, 29435, + 82, 293, 411, 309, 311, 1920, 411, 51586], "temperature": 0.0, "avg_logprob": -0.26298143109704697, + "compression_ratio": 1.6714285714285715, "no_speech_prob": 0.0492158867418766}, + {"id": 360, "seek": 121628, "start": 1240.72, "end": 1245.24, "text": " opening + a whole can of worms when you don''t want to actually.", "tokens": [51586, 5193, + 257, 1379, 393, 295, 28271, 562, 291, 500, 380, 528, 281, 767, 13, 51812], "temperature": + 0.0, "avg_logprob": -0.26298143109704697, "compression_ratio": 1.6714285714285715, + "no_speech_prob": 0.0492158867418766}, {"id": 361, "seek": 124524, "start": 1245.24, + "end": 1247.88, "text": " You want to solve that task and move on to the next one.", + "tokens": [50364, 509, 528, 281, 5039, 300, 5633, 293, 1286, 322, 281, 264, 958, + 472, 13, 50496], "temperature": 0.0, "avg_logprob": -0.21187834703285274, "compression_ratio": + 1.6254071661237786, "no_speech_prob": 0.002716003218665719}, {"id": 362, "seek": + 124524, "start": 1248.48, "end": 1252.04, "text": " So yeah, so far, I haven''t + had that issue with the rest based stuff that I''m", "tokens": [50526, 407, 1338, + 11, 370, 1400, 11, 286, 2378, 380, 632, 300, 2734, 365, 264, 1472, 2361, 1507, 300, + 286, 478, 50704], "temperature": 0.0, "avg_logprob": -0.21187834703285274, "compression_ratio": + 1.6254071661237786, "no_speech_prob": 0.002716003218665719}, {"id": 363, "seek": + 124524, "start": 1252.04, "end": 1254.44, "text": " integrating into my work so + far.", "tokens": [50704, 26889, 666, 452, 589, 370, 1400, 13, 50824], "temperature": + 0.0, "avg_logprob": -0.21187834703285274, "compression_ratio": 1.6254071661237786, + "no_speech_prob": 0.002716003218665719}, {"id": 364, "seek": 124524, "start": 1254.6, + "end": 1258.28, "text": " But you know, I would imagine the people using the 
first + wave of Java based", "tokens": [50832, 583, 291, 458, 11, 286, 576, 3811, 264, 561, + 1228, 264, 700, 5772, 295, 10745, 2361, 51016], "temperature": 0.0, "avg_logprob": + -0.21187834703285274, "compression_ratio": 1.6254071661237786, "no_speech_prob": + 0.002716003218665719}, {"id": 365, "seek": 124524, "start": 1258.28, "end": 1260.0, + "text": " server software didn''t find any problems either.", "tokens": [51016, + 7154, 4722, 994, 380, 915, 604, 2740, 2139, 13, 51102], "temperature": 0.0, "avg_logprob": + -0.21187834703285274, "compression_ratio": 1.6254071661237786, "no_speech_prob": + 0.002716003218665719}, {"id": 366, "seek": 124524, "start": 1260.0, "end": 1264.32, + "text": " So maybe as time goes on, we''ll discover that, you know, you can''t do + large", "tokens": [51102, 407, 1310, 382, 565, 1709, 322, 11, 321, 603, 4411, 300, + 11, 291, 458, 11, 291, 393, 380, 360, 2416, 51318], "temperature": 0.0, "avg_logprob": + -0.21187834703285274, "compression_ratio": 1.6254071661237786, "no_speech_prob": + 0.002716003218665719}, {"id": 367, "seek": 124524, "start": 1264.32, "end": 1265.88, + "text": " allocations in a roster or something like that.", "tokens": [51318, 12660, + 763, 294, 257, 29892, 420, 746, 411, 300, 13, 51396], "temperature": 0.0, "avg_logprob": + -0.21187834703285274, "compression_ratio": 1.6254071661237786, "no_speech_prob": + 0.002716003218665719}, {"id": 368, "seek": 124524, "start": 1266.8, "end": 1272.2, + "text": " Yeah, it''s also cool that actually Rust has been picked up by many teams + developing", "tokens": [51442, 865, 11, 309, 311, 611, 1627, 300, 767, 34952, 575, + 668, 6183, 493, 538, 867, 5491, 6416, 51712], "temperature": 0.0, "avg_logprob": + -0.21187834703285274, "compression_ratio": 1.6254071661237786, "no_speech_prob": + 0.002716003218665719}, {"id": 369, "seek": 127220, "start": 1272.2, "end": 1276.52, + "text": " search engines and not necessarily the vector databases, but like traditional,", + "tokens": [50364, 
3164, 12982, 293, 406, 4725, 264, 8062, 22380, 11, 457, 411, 5164, + 11, 50580], "temperature": 0.0, "avg_logprob": -0.30072550151659094, "compression_ratio": + 1.611353711790393, "no_speech_prob": 0.005150607321411371}, {"id": 370, "seek": + 127220, "start": 1277.0800000000002, "end": 1283.4, "text": " you know, inverted + index databases like Tantivi is one of them, which is using Rust.", "tokens": [50608, + 291, 458, 11, 38969, 8186, 22380, 411, 314, 394, 33448, 307, 472, 295, 552, 11, + 597, 307, 1228, 34952, 13, 50924], "temperature": 0.0, "avg_logprob": -0.30072550151659094, + "compression_ratio": 1.611353711790393, "no_speech_prob": 0.005150607321411371}, + {"id": 371, "seek": 127220, "start": 1284.2, "end": 1288.6000000000001, "text": + " And they have a nice blog as well explaining some of the performance bottlenecks", + "tokens": [50964, 400, 436, 362, 257, 1481, 6968, 382, 731, 13468, 512, 295, 264, + 3389, 44641, 2761, 51184], "temperature": 0.0, "avg_logprob": -0.30072550151659094, + "compression_ratio": 1.611353711790393, "no_speech_prob": 0.005150607321411371}, + {"id": 372, "seek": 127220, "start": 1288.6000000000001, "end": 1290.0, "text": + " they were able to resolve.", "tokens": [51184, 436, 645, 1075, 281, 14151, 13, + 51254], "temperature": 0.0, "avg_logprob": -0.30072550151659094, "compression_ratio": + 1.611353711790393, "no_speech_prob": 0.005150607321411371}, {"id": 373, "seek": + 127220, "start": 1290.44, "end": 1293.24, "text": " And Tantivi is way faster than + you''ve seen.", "tokens": [51276, 400, 314, 394, 33448, 307, 636, 4663, 813, 291, + 600, 1612, 13, 51416], "temperature": 0.0, "avg_logprob": -0.30072550151659094, + "compression_ratio": 1.611353711790393, "no_speech_prob": 0.005150607321411371}, + {"id": 374, "seek": 127220, "start": 1294.04, "end": 1298.68, "text": " So there + is another presentation by the Tantivi main.", "tokens": [51456, 407, 456, 307, + 1071, 5860, 538, 264, 314, 394, 33448, 2135, 13, 51688], 
"temperature": 0.0, "avg_logprob": + -0.30072550151659094, "compression_ratio": 1.611353711790393, "no_speech_prob": + 0.005150607321411371}, {"id": 375, "seek": 129868, "start": 1298.68, "end": 1304.2, + "text": " I''m looking for a good free tech system with BM 25 and all that kind + of stuff.", "tokens": [50364, 286, 478, 1237, 337, 257, 665, 1737, 7553, 1185, 365, + 15901, 3552, 293, 439, 300, 733, 295, 1507, 13, 50640], "temperature": 0.0, "avg_logprob": + -0.18949323625706915, "compression_ratio": 1.726384364820847, "no_speech_prob": + 0.026245126500725746}, {"id": 376, "seek": 129868, "start": 1305.48, "end": 1307.8, + "text": " Unfortunately, quadrant isn''t going to add that.", "tokens": [50704, + 8590, 11, 46856, 1943, 380, 516, 281, 909, 300, 13, 50820], "temperature": 0.0, + "avg_logprob": -0.18949323625706915, "compression_ratio": 1.726384364820847, "no_speech_prob": + 0.026245126500725746}, {"id": 377, "seek": 129868, "start": 1307.8, "end": 1310.76, + "text": " I don''t think it''s kind of off base for them, but that''s such an important + part.", "tokens": [50820, 286, 500, 380, 519, 309, 311, 733, 295, 766, 3096, 337, + 552, 11, 457, 300, 311, 1270, 364, 1021, 644, 13, 50968], "temperature": 0.0, "avg_logprob": + -0.18949323625706915, "compression_ratio": 1.726384364820847, "no_speech_prob": + 0.026245126500725746}, {"id": 378, "seek": 129868, "start": 1312.2, "end": 1315.48, + "text": " You know, there''s a reason that so many of these startups have a team + of people", "tokens": [51040, 509, 458, 11, 456, 311, 257, 1778, 300, 370, 867, + 295, 613, 28041, 362, 257, 1469, 295, 561, 51204], "temperature": 0.0, "avg_logprob": + -0.18949323625706915, "compression_ratio": 1.726384364820847, "no_speech_prob": + 0.026245126500725746}, {"id": 379, "seek": 129868, "start": 1315.48, "end": 1318.6000000000001, + "text": " just doing search result quality, you know, search results are critical.", + "tokens": [51204, 445, 884, 3164, 1874, 3125, 11, 291, 458, 
11, 3164, 3542, 366, + 4924, 13, 51360], "temperature": 0.0, "avg_logprob": -0.18949323625706915, "compression_ratio": + 1.726384364820847, "no_speech_prob": 0.026245126500725746}, {"id": 380, "seek": + 129868, "start": 1319.0, "end": 1323.4, "text": " Yeah, now that you mentioned this + important topic, I also wanted to kind of a little bit", "tokens": [51380, 865, + 11, 586, 300, 291, 2835, 341, 1021, 4829, 11, 286, 611, 1415, 281, 733, 295, 257, + 707, 857, 51600], "temperature": 0.0, "avg_logprob": -0.18949323625706915, "compression_ratio": + 1.726384364820847, "no_speech_prob": 0.026245126500725746}, {"id": 381, "seek": + 129868, "start": 1323.4, "end": 1327.3200000000002, "text": " pick your brain on + that, you know, like you have the traditional search engine,", "tokens": [51600, + 1888, 428, 3567, 322, 300, 11, 291, 458, 11, 411, 291, 362, 264, 5164, 3164, 2848, + 11, 51796], "temperature": 0.0, "avg_logprob": -0.18949323625706915, "compression_ratio": + 1.726384364820847, "no_speech_prob": 0.026245126500725746}, {"id": 382, "seek": + 132732, "start": 1327.32, "end": 1331.72, "text": " let''s say on your classic dot + com site, right, where I type text and you use inverted index", "tokens": [50364, + 718, 311, 584, 322, 428, 7230, 5893, 395, 3621, 11, 558, 11, 689, 286, 2010, 2487, + 293, 291, 764, 38969, 8186, 50584], "temperature": 0.0, "avg_logprob": -0.16429784893989563, + "compression_ratio": 1.8028169014084507, "no_speech_prob": 0.0013273870572447777}, + {"id": 383, "seek": 132732, "start": 1331.72, "end": 1332.6799999999998, "text": + " to find the results.", "tokens": [50584, 281, 915, 264, 3542, 13, 50632], "temperature": + 0.0, "avg_logprob": -0.16429784893989563, "compression_ratio": 1.8028169014084507, + "no_speech_prob": 0.0013273870572447777}, {"id": 384, "seek": 132732, "start": 1332.6799999999998, + "end": 1338.12, "text": " And then you want to bring in bird model or some other + model to deal with more semantic search, right.", "tokens": 
[50632, 400, 550, 291, + 528, 281, 1565, 294, 5255, 2316, 420, 512, 661, 2316, 281, 2028, 365, 544, 47982, + 3164, 11, 558, 13, 50904], "temperature": 0.0, "avg_logprob": -0.16429784893989563, + "compression_ratio": 1.8028169014084507, "no_speech_prob": 0.0013273870572447777}, + {"id": 385, "seek": 132732, "start": 1338.12, "end": 1341.56, "text": " So have + you thought about how you would combine them?", "tokens": [50904, 407, 362, 291, + 1194, 466, 577, 291, 576, 10432, 552, 30, 51076], "temperature": 0.0, "avg_logprob": + -0.16429784893989563, "compression_ratio": 1.8028169014084507, "no_speech_prob": + 0.0013273870572447777}, {"id": 386, "seek": 132732, "start": 1341.56, "end": 1343.8, + "text": " Let''s say inverted index versus vector search.", "tokens": [51076, 961, + 311, 584, 38969, 8186, 5717, 8062, 3164, 13, 51188], "temperature": 0.0, "avg_logprob": + -0.16429784893989563, "compression_ratio": 1.8028169014084507, "no_speech_prob": + 0.0013273870572447777}, {"id": 387, "seek": 132732, "start": 1344.6, "end": 1349.56, + "text": " So I actually, okay, so we say search, right, but there''s actually kind + of different sub tasks inside", "tokens": [51228, 407, 286, 767, 11, 1392, 11, 370, + 321, 584, 3164, 11, 558, 11, 457, 456, 311, 767, 733, 295, 819, 1422, 9608, 1854, + 51476], "temperature": 0.0, "avg_logprob": -0.16429784893989563, "compression_ratio": + 1.8028169014084507, "no_speech_prob": 0.0013273870572447777}, {"id": 388, "seek": + 132732, "start": 1349.56, "end": 1354.04, "text": " of search. One of them is when + you search for something, we want to show you something that''s", "tokens": [51476, + 295, 3164, 13, 1485, 295, 552, 307, 562, 291, 3164, 337, 746, 11, 321, 528, 281, + 855, 291, 746, 300, 311, 51700], "temperature": 0.0, "avg_logprob": -0.16429784893989563, + "compression_ratio": 1.8028169014084507, "no_speech_prob": 0.0013273870572447777}, + {"id": 389, "seek": 135404, "start": 1354.04, "end": 1357.6399999999999, "text": + " similar. 
So you don''t necessarily want to get the exact same term.", "tokens": + [50364, 2531, 13, 407, 291, 500, 380, 4725, 528, 281, 483, 264, 1900, 912, 1433, + 13, 50544], "temperature": 0.0, "avg_logprob": -0.16186086654663087, "compression_ratio": + 1.7095435684647302, "no_speech_prob": 0.000980414915829897}, {"id": 390, "seek": + 135404, "start": 1358.28, "end": 1364.2, "text": " So that requires like one piece + of data or one mechanism, right, which is more like a recommendation", "tokens": + [50576, 407, 300, 7029, 411, 472, 2522, 295, 1412, 420, 472, 7513, 11, 558, 11, + 597, 307, 544, 411, 257, 11879, 50872], "temperature": 0.0, "avg_logprob": -0.16186086654663087, + "compression_ratio": 1.7095435684647302, "no_speech_prob": 0.000980414915829897}, + {"id": 391, "seek": 135404, "start": 1364.2, "end": 1369.0, "text": " type system. + You also want to handle things with direct keyword matches, of course, but you also", + "tokens": [50872, 2010, 1185, 13, 509, 611, 528, 281, 4813, 721, 365, 2047, 20428, + 10676, 11, 295, 1164, 11, 457, 291, 611, 51112], "temperature": 0.0, "avg_logprob": + -0.16186086654663087, "compression_ratio": 1.7095435684647302, "no_speech_prob": + 0.000980414915829897}, {"id": 392, "seek": 135404, "start": 1369.0, "end": 1374.92, + "text": " want to handle typos, right. 
So typos requires like a second layer and + structure of databases or", "tokens": [51112, 528, 281, 4813, 2125, 329, 11, 558, + 13, 407, 2125, 329, 7029, 411, 257, 1150, 4583, 293, 3877, 295, 22380, 420, 51408], + "temperature": 0.0, "avg_logprob": -0.16186086654663087, "compression_ratio": 1.7095435684647302, + "no_speech_prob": 0.000980414915829897}, {"id": 393, "seek": 135404, "start": 1377.08, + "end": 1379.1599999999999, "text": " the way you implement it as to work a certain + way.", "tokens": [51516, 264, 636, 291, 4445, 309, 382, 281, 589, 257, 1629, 636, + 13, 51620], "temperature": 0.0, "avg_logprob": -0.16186086654663087, "compression_ratio": + 1.7095435684647302, "no_speech_prob": 0.000980414915829897}, {"id": 394, "seek": + 137916, "start": 1379.3200000000002, "end": 1388.2, "text": " Okay. And I feel like + the best way to kind of do this is to have the search piece do multiple", "tokens": + [50372, 1033, 13, 400, 286, 841, 411, 264, 1151, 636, 281, 733, 295, 360, 341, 307, + 281, 362, 264, 3164, 2522, 360, 3866, 50816], "temperature": 0.0, "avg_logprob": + -0.22429548525342755, "compression_ratio": 1.6356589147286822, "no_speech_prob": + 0.001332070678472519}, {"id": 395, "seek": 137916, "start": 1388.2, "end": 1391.64, + "text": " different attempts at solving the query and then combine them with an + intelligent strategy.", "tokens": [50816, 819, 15257, 412, 12606, 264, 14581, 293, + 550, 10432, 552, 365, 364, 13232, 5206, 13, 50988], "temperature": 0.0, "avg_logprob": + -0.22429548525342755, "compression_ratio": 1.6356589147286822, "no_speech_prob": + 0.001332070678472519}, {"id": 396, "seek": 137916, "start": 1391.64, "end": 1397.5600000000002, + "text": " So, like for instance, on classic, now we''re building a better auto-suggest + component.", "tokens": [50988, 407, 11, 411, 337, 5197, 11, 322, 7230, 11, 586, + 321, 434, 2390, 257, 1101, 8399, 12, 82, 697, 2629, 6542, 13, 51284], "temperature": + 0.0, "avg_logprob": -0.22429548525342755, 
"compression_ratio": 1.6356589147286822, + "no_speech_prob": 0.001332070678472519}, {"id": 397, "seek": 137916, "start": 1397.5600000000002, + "end": 1399.4, "text": " And it''s actually doing three different types of queries.", + "tokens": [51284, 400, 309, 311, 767, 884, 1045, 819, 3467, 295, 24109, 13, 51376], + "temperature": 0.0, "avg_logprob": -0.22429548525342755, "compression_ratio": 1.6356589147286822, + "no_speech_prob": 0.001332070678472519}, {"id": 398, "seek": 137916, "start": 1400.8400000000001, + "end": 1407.96, "text": " And I think that if you really, really, if you start recording + what users are doing and you", "tokens": [51448, 400, 286, 519, 300, 498, 291, 534, + 11, 534, 11, 498, 291, 722, 6613, 437, 5022, 366, 884, 293, 291, 51804], "temperature": + 0.0, "avg_logprob": -0.22429548525342755, "compression_ratio": 1.6356589147286822, + "no_speech_prob": 0.001332070678472519}, {"id": 399, "seek": 140796, "start": 1407.96, + "end": 1410.76, "text": " start looking at every single, every single search and + saying, what did we do wrong here? How", "tokens": [50364, 722, 1237, 412, 633, + 2167, 11, 633, 2167, 3164, 293, 1566, 11, 437, 630, 321, 360, 2085, 510, 30, 1012, + 50504], "temperature": 0.0, "avg_logprob": -0.13930261941780722, "compression_ratio": + 1.78125, "no_speech_prob": 0.00030742018134333193}, {"id": 400, "seek": 140796, + "start": 1410.76, "end": 1415.56, "text": " could we not service? I think you''ll + see that it''s actually not just one type of query.", "tokens": [50504, 727, 321, + 406, 2643, 30, 286, 519, 291, 603, 536, 300, 309, 311, 767, 406, 445, 472, 2010, + 295, 14581, 13, 50744], "temperature": 0.0, "avg_logprob": -0.13930261941780722, + "compression_ratio": 1.78125, "no_speech_prob": 0.00030742018134333193}, {"id": + 401, "seek": 140796, "start": 1415.56, "end": 1418.6000000000001, "text": " When + people see a type of search box, they''ll just start plugging things in. 
They don''t + know if", "tokens": [50744, 1133, 561, 536, 257, 2010, 295, 3164, 2424, 11, 436, + 603, 445, 722, 42975, 721, 294, 13, 814, 500, 380, 458, 498, 50896], "temperature": + 0.0, "avg_logprob": -0.13930261941780722, "compression_ratio": 1.78125, "no_speech_prob": + 0.00030742018134333193}, {"id": 402, "seek": 140796, "start": 1418.6000000000001, + "end": 1422.1200000000001, "text": " they''re going to do English language queries, + which is something that an embedding would handle,", "tokens": [50896, 436, 434, + 516, 281, 360, 3669, 2856, 24109, 11, 597, 307, 746, 300, 364, 12240, 3584, 576, + 4813, 11, 51072], "temperature": 0.0, "avg_logprob": -0.13930261941780722, "compression_ratio": + 1.78125, "no_speech_prob": 0.00030742018134333193}, {"id": 403, "seek": 140796, + "start": 1422.1200000000001, "end": 1428.2, "text": " right, because an embedding + can understand any sort of information or any sort of intention in", "tokens": [51072, + 558, 11, 570, 364, 12240, 3584, 393, 1223, 604, 1333, 295, 1589, 420, 604, 1333, + 295, 7789, 294, 51376], "temperature": 0.0, "avg_logprob": -0.13930261941780722, + "compression_ratio": 1.78125, "no_speech_prob": 0.00030742018134333193}, {"id": + 404, "seek": 140796, "start": 1428.2, "end": 1433.64, "text": " that query. 
But + sometimes they''re just searching for a specific model number or something like + that.", "tokens": [51376, 300, 14581, 13, 583, 2171, 436, 434, 445, 10808, 337, + 257, 2685, 2316, 1230, 420, 746, 411, 300, 13, 51648], "temperature": 0.0, "avg_logprob": + -0.13930261941780722, "compression_ratio": 1.78125, "no_speech_prob": 0.00030742018134333193}, + {"id": 405, "seek": 143364, "start": 1434.6000000000001, "end": 1439.48, "text": + " In my experience, a lot of text embedding models, if you use a term that''s outside + of the domain,", "tokens": [50412, 682, 452, 1752, 11, 257, 688, 295, 2487, 12240, + 3584, 5245, 11, 498, 291, 764, 257, 1433, 300, 311, 2380, 295, 264, 9274, 11, 50656], + "temperature": 0.0, "avg_logprob": -0.18848081518102575, "compression_ratio": 1.755485893416928, + "no_speech_prob": 0.0014015301130712032}, {"id": 406, "seek": 143364, "start": 1439.48, + "end": 1442.3600000000001, "text": " something that was outside the keyword list + it was trained with, you''re going to get really", "tokens": [50656, 746, 300, 390, + 2380, 264, 20428, 1329, 309, 390, 8895, 365, 11, 291, 434, 516, 281, 483, 534, 50800], + "temperature": 0.0, "avg_logprob": -0.18848081518102575, "compression_ratio": 1.755485893416928, + "no_speech_prob": 0.0014015301130712032}, {"id": 407, "seek": 143364, "start": 1442.3600000000001, + "end": 1446.3600000000001, "text": " bad results. So that''s another thing I have + to sort of be thinking about. 
So unfortunately,", "tokens": [50800, 1578, 3542, + 13, 407, 300, 311, 1071, 551, 286, 362, 281, 1333, 295, 312, 1953, 466, 13, 407, + 7015, 11, 51000], "temperature": 0.0, "avg_logprob": -0.18848081518102575, "compression_ratio": + 1.755485893416928, "no_speech_prob": 0.0014015301130712032}, {"id": 408, "seek": + 143364, "start": 1447.24, "end": 1450.8400000000001, "text": " right now I think + that the best way to set these things up is to do multiple last-use", "tokens": + [51044, 558, 586, 286, 519, 300, 264, 1151, 636, 281, 992, 613, 721, 493, 307, 281, + 360, 3866, 1036, 12, 438, 51224], "temperature": 0.0, "avg_logprob": -0.18848081518102575, + "compression_ratio": 1.755485893416928, "no_speech_prob": 0.0014015301130712032}, + {"id": 409, "seek": 143364, "start": 1450.8400000000001, "end": 1455.16, "text": + " search queries, maybe a postgres query, and maybe a quadrant query, and then correlate + all those", "tokens": [51224, 3164, 24109, 11, 1310, 257, 2183, 45189, 14581, 11, + 293, 1310, 257, 46856, 14581, 11, 293, 550, 48742, 439, 729, 51440], "temperature": + 0.0, "avg_logprob": -0.18848081518102575, "compression_ratio": 1.755485893416928, + "no_speech_prob": 0.0014015301130712032}, {"id": 410, "seek": 143364, "start": 1455.16, + "end": 1461.64, "text": " results and display them intelligently. Yeah, exactly. 
+ So basically, you almost need some smart", "tokens": [51440, 3542, 293, 4674, 552, + 5613, 2276, 13, 865, 11, 2293, 13, 407, 1936, 11, 291, 1920, 643, 512, 4069, 51764], + "temperature": 0.0, "avg_logprob": -0.18848081518102575, "compression_ratio": 1.755485893416928, + "no_speech_prob": 0.0014015301130712032}, {"id": 411, "seek": 146164, "start": 1461.64, + "end": 1469.0, "text": " runker or re-runker, which combines these results, and + it doesn''t care, which algorithm was used", "tokens": [50364, 367, 3197, 260, 420, + 319, 12, 81, 3197, 260, 11, 597, 29520, 613, 3542, 11, 293, 309, 1177, 380, 1127, + 11, 597, 9284, 390, 1143, 50732], "temperature": 0.0, "avg_logprob": -0.20048571095883266, + "compression_ratio": 1.5714285714285714, "no_speech_prob": 0.0023459226358681917}, + {"id": 412, "seek": 146164, "start": 1469.0, "end": 1475.0, "text": " to bring them + in. But what it cares is the, you know, to optimize the KPI, let''s say,", "tokens": + [50732, 281, 1565, 552, 294, 13, 583, 437, 309, 12310, 307, 264, 11, 291, 458, 11, + 281, 19719, 264, 591, 31701, 11, 718, 311, 584, 11, 51032], "temperature": 0.0, + "avg_logprob": -0.20048571095883266, "compression_ratio": 1.5714285714285714, "no_speech_prob": + 0.0023459226358681917}, {"id": 413, "seek": 146164, "start": 1475.88, "end": 1481.48, + "text": " flicks through rate or whatever it is. 
Because in some applications, like, + I''ve been talking to one", "tokens": [51076, 932, 7663, 807, 3314, 420, 2035, 309, + 307, 13, 1436, 294, 512, 5821, 11, 411, 11, 286, 600, 668, 1417, 281, 472, 51356], + "temperature": 0.0, "avg_logprob": -0.20048571095883266, "compression_ratio": 1.5714285714285714, + "no_speech_prob": 0.0023459226358681917}, {"id": 414, "seek": 146164, "start": 1481.48, + "end": 1486.92, "text": " company building maps, and they said that, for example, + when you sit in a car and you start", "tokens": [51356, 2237, 2390, 11317, 11, 293, + 436, 848, 300, 11, 337, 1365, 11, 562, 291, 1394, 294, 257, 1032, 293, 291, 722, + 51628], "temperature": 0.0, "avg_logprob": -0.20048571095883266, "compression_ratio": + 1.5714285714285714, "no_speech_prob": 0.0023459226358681917}, {"id": 415, "seek": + 148692, "start": 1487.0, "end": 1493.3200000000002, "text": " typing some, like, + few letters, like two or three, you don''t have much time as a driver, and you just", + "tokens": [50368, 18444, 512, 11, 411, 11, 1326, 7825, 11, 411, 732, 420, 1045, + 11, 291, 500, 380, 362, 709, 565, 382, 257, 6787, 11, 293, 291, 445, 50684], "temperature": + 0.0, "avg_logprob": -0.10982670446838995, "compression_ratio": 1.6608695652173913, + "no_speech_prob": 0.015799125656485558}, {"id": 416, "seek": 148692, "start": 1493.3200000000002, + "end": 1501.0, "text": " need to hit the road going, right? 
So if this company is + doing bad at predicting the intent,", "tokens": [50684, 643, 281, 2045, 264, 3060, + 516, 11, 558, 30, 407, 498, 341, 2237, 307, 884, 1578, 412, 32884, 264, 8446, 11, + 51068], "temperature": 0.0, "avg_logprob": -0.10982670446838995, "compression_ratio": + 1.6608695652173913, "no_speech_prob": 0.015799125656485558}, {"id": 417, "seek": + 148692, "start": 1501.64, "end": 1508.1200000000001, "text": " and by the way, what + they do is that they don''t limit the search only to the radius around you,", "tokens": + [51100, 293, 538, 264, 636, 11, 437, 436, 360, 307, 300, 436, 500, 380, 4948, 264, + 3164, 787, 281, 264, 15845, 926, 291, 11, 51424], "temperature": 0.0, "avg_logprob": + -0.10982670446838995, "compression_ratio": 1.6608695652173913, "no_speech_prob": + 0.015799125656485558}, {"id": 418, "seek": 148692, "start": 1508.1200000000001, + "end": 1513.8000000000002, "text": " because they believe that you might be going + to the airport from where you will fly out to,", "tokens": [51424, 570, 436, 1697, + 300, 291, 1062, 312, 516, 281, 264, 10155, 490, 689, 291, 486, 3603, 484, 281, 11, + 51708], "temperature": 0.0, "avg_logprob": -0.10982670446838995, "compression_ratio": + 1.6608695652173913, "no_speech_prob": 0.015799125656485558}, {"id": 419, "seek": + 151380, "start": 1513.8, "end": 1518.36, "text": " I don''t know, Washington, DC, + and then you are looking for that specific street while you are", "tokens": [50364, + 286, 500, 380, 458, 11, 6149, 11, 9114, 11, 293, 550, 291, 366, 1237, 337, 300, + 2685, 4838, 1339, 291, 366, 50592], "temperature": 0.0, "avg_logprob": -0.13780059511699375, + "compression_ratio": 1.6808510638297873, "no_speech_prob": 0.01753455214202404}, + {"id": 420, "seek": 151380, "start": 1518.36, "end": 1524.28, "text": " sitting + in the car in Europe. 
And so they search the whole database of points of interest, + and,", "tokens": [50592, 3798, 294, 264, 1032, 294, 3315, 13, 400, 370, 436, 3164, + 264, 1379, 8149, 295, 2793, 295, 1179, 11, 293, 11, 50888], "temperature": 0.0, + "avg_logprob": -0.13780059511699375, "compression_ratio": 1.6808510638297873, "no_speech_prob": + 0.01753455214202404}, {"id": 421, "seek": 151380, "start": 1524.28, "end": 1528.9199999999998, + "text": " and you know, like, first of all, it''s scale. It''s going to be a problem. + And the other thing is,", "tokens": [50888, 293, 291, 458, 11, 411, 11, 700, 295, + 439, 11, 309, 311, 4373, 13, 467, 311, 516, 281, 312, 257, 1154, 13, 400, 264, 661, + 551, 307, 11, 51120], "temperature": 0.0, "avg_logprob": -0.13780059511699375, "compression_ratio": + 1.6808510638297873, "no_speech_prob": 0.01753455214202404}, {"id": 422, "seek": + 151380, "start": 1528.9199999999998, "end": 1534.9199999999998, "text": " you need + to actually rank the results in such a way that they get it, right? So it''s extremely", + "tokens": [51120, 291, 643, 281, 767, 6181, 264, 3542, 294, 1270, 257, 636, 300, + 436, 483, 309, 11, 558, 30, 407, 309, 311, 4664, 51420], "temperature": 0.0, "avg_logprob": + -0.13780059511699375, "compression_ratio": 1.6808510638297873, "no_speech_prob": + 0.01753455214202404}, {"id": 423, "seek": 151380, "start": 1534.9199999999998, "end": + 1539.96, "text": " difficult problem to handle. So in that case, I would, yeah, + in that case, they''re probably", "tokens": [51420, 2252, 1154, 281, 4813, 13, 407, + 294, 300, 1389, 11, 286, 576, 11, 1338, 11, 294, 300, 1389, 11, 436, 434, 1391, + 51672], "temperature": 0.0, "avg_logprob": -0.13780059511699375, "compression_ratio": + 1.6808510638297873, "no_speech_prob": 0.01753455214202404}, {"id": 424, "seek": + 153996, "start": 1539.96, "end": 1543.96, "text": " predicting from where you are + now. 
If you''re here now, what''s the most likely thing that you want to", "tokens": + [50364, 32884, 490, 689, 291, 366, 586, 13, 759, 291, 434, 510, 586, 11, 437, 311, + 264, 881, 3700, 551, 300, 291, 528, 281, 50564], "temperature": 0.0, "avg_logprob": + -0.11520998101485402, "compression_ratio": 1.8404907975460123, "no_speech_prob": + 0.009563334286212921}, {"id": 425, "seek": 153996, "start": 1543.96, "end": 1547.88, + "text": " go to? It''s kind of an interesting problem. And actually, that''s like, + you actually kind of bring", "tokens": [50564, 352, 281, 30, 467, 311, 733, 295, + 364, 1880, 1154, 13, 400, 767, 11, 300, 311, 411, 11, 291, 767, 733, 295, 1565, + 50760], "temperature": 0.0, "avg_logprob": -0.11520998101485402, "compression_ratio": + 1.8404907975460123, "no_speech_prob": 0.009563334286212921}, {"id": 426, "seek": + 153996, "start": 1547.88, "end": 1552.3600000000001, "text": " up a good point, + is that a lot of startups don''t have enough data to make those intelligent associations.", + "tokens": [50760, 493, 257, 665, 935, 11, 307, 300, 257, 688, 295, 28041, 500, 380, + 362, 1547, 1412, 281, 652, 729, 13232, 26597, 13, 50984], "temperature": 0.0, "avg_logprob": + -0.11520998101485402, "compression_ratio": 1.8404907975460123, "no_speech_prob": + 0.009563334286212921}, {"id": 427, "seek": 153996, "start": 1553.0, "end": 1556.76, + "text": " So it becomes a game of sort of, this, of finding an open data set that + you can use, or something", "tokens": [51016, 407, 309, 3643, 257, 1216, 295, 1333, + 295, 11, 341, 11, 295, 5006, 364, 1269, 1412, 992, 300, 291, 393, 764, 11, 420, + 746, 51204], "temperature": 0.0, "avg_logprob": -0.11520998101485402, "compression_ratio": + 1.8404907975460123, "no_speech_prob": 0.009563334286212921}, {"id": 428, "seek": + 153996, "start": 1556.76, "end": 1561.56, "text": " you do have, and like sort of + abstracting from it, or extending it in a certain way that you can", "tokens": [51204, + 291, 360, 362, 11, 293, 411, 
1333, 295, 12649, 278, 490, 309, 11, 420, 24360, 309, + 294, 257, 1629, 636, 300, 291, 393, 51444], "temperature": 0.0, "avg_logprob": -0.11520998101485402, + "compression_ratio": 1.8404907975460123, "no_speech_prob": 0.009563334286212921}, + {"id": 429, "seek": 153996, "start": 1561.56, "end": 1566.8400000000001, "text": + " make these intelligent inferences. But it''s very, very difficult. And until you + have a lot of users,", "tokens": [51444, 652, 613, 13232, 13596, 2667, 13, 583, + 309, 311, 588, 11, 588, 2252, 13, 400, 1826, 291, 362, 257, 688, 295, 5022, 11, + 51708], "temperature": 0.0, "avg_logprob": -0.11520998101485402, "compression_ratio": + 1.8404907975460123, "no_speech_prob": 0.009563334286212921}, {"id": 430, "seek": + 156684, "start": 1566.84, "end": 1569.9599999999998, "text": " you don''t have any + data coming back to you, telling you whether or not you''re doing a good job.", + "tokens": [50364, 291, 500, 380, 362, 604, 1412, 1348, 646, 281, 291, 11, 3585, + 291, 1968, 420, 406, 291, 434, 884, 257, 665, 1691, 13, 50520], "temperature": 0.0, + "avg_logprob": -0.17050583906999722, "compression_ratio": 1.7363013698630136, "no_speech_prob": + 0.003944059368222952}, {"id": 431, "seek": 156684, "start": 1570.6799999999998, + "end": 1575.6399999999999, "text": " So it''s not easy. 
And I think that''s one + of the reasons that we see that some of these big startups,", "tokens": [50556, + 407, 309, 311, 406, 1858, 13, 400, 286, 519, 300, 311, 472, 295, 264, 4112, 300, + 321, 536, 300, 512, 295, 613, 955, 28041, 11, 50804], "temperature": 0.0, "avg_logprob": + -0.17050583906999722, "compression_ratio": 1.7363013698630136, "no_speech_prob": + 0.003944059368222952}, {"id": 432, "seek": 156684, "start": 1576.1999999999998, + "end": 1580.52, "text": " these platforms become very entrenched with their data + learning tools, or their machine learning tools,", "tokens": [50832, 613, 9473, + 1813, 588, 948, 42388, 365, 641, 1412, 2539, 3873, 11, 420, 641, 3479, 2539, 3873, + 11, 51048], "temperature": 0.0, "avg_logprob": -0.17050583906999722, "compression_ratio": + 1.7363013698630136, "no_speech_prob": 0.003944059368222952}, {"id": 433, "seek": + 156684, "start": 1580.52, "end": 1585.8, "text": " their data sets, there you have, + it becomes hard to, hard to unseat them. Because all the activity and", "tokens": + [51048, 641, 1412, 6352, 11, 456, 291, 362, 11, 309, 3643, 1152, 281, 11, 1152, + 281, 517, 43152, 552, 13, 1436, 439, 264, 5191, 293, 51312], "temperature": 0.0, + "avg_logprob": -0.17050583906999722, "compression_ratio": 1.7363013698630136, "no_speech_prob": + 0.003944059368222952}, {"id": 434, "seek": 156684, "start": 1585.8, "end": 1592.1999999999998, + "text": " that space is happening on their property, you know? Yeah, yeah, exactly. + So, and one thing I wanted to", "tokens": [51312, 300, 1901, 307, 2737, 322, 641, + 4707, 11, 291, 458, 30, 865, 11, 1338, 11, 2293, 13, 407, 11, 293, 472, 551, 286, + 1415, 281, 51632], "temperature": 0.0, "avg_logprob": -0.17050583906999722, "compression_ratio": + 1.7363013698630136, "no_speech_prob": 0.003944059368222952}, {"id": 435, "seek": + 159220, "start": 1592.2, "end": 1599.72, "text": " also mention that you said you + want to handle typos. 
Did you know, or did you look into bite level", "tokens": + [50364, 611, 2152, 300, 291, 848, 291, 528, 281, 4813, 2125, 329, 13, 2589, 291, + 458, 11, 420, 630, 291, 574, 666, 7988, 1496, 50740], "temperature": 0.0, "avg_logprob": + -0.145377700145428, "compression_ratio": 1.6375, "no_speech_prob": 0.011846935376524925}, + {"id": 436, "seek": 159220, "start": 1601.0, "end": 1607.16, "text": " embedding + models? So basically, instead of, you know, segmenting the word, let''s say, letter + by letter,", "tokens": [50804, 12240, 3584, 5245, 30, 407, 1936, 11, 2602, 295, + 11, 291, 458, 11, 9469, 278, 264, 1349, 11, 718, 311, 584, 11, 5063, 538, 5063, + 11, 51112], "temperature": 0.0, "avg_logprob": -0.145377700145428, "compression_ratio": + 1.6375, "no_speech_prob": 0.011846935376524925}, {"id": 437, "seek": 159220, "start": + 1607.16, "end": 1611.56, "text": " whatever, which could be also expensive. They + go into bite level. I think that paper was published", "tokens": [51112, 2035, 11, + 597, 727, 312, 611, 5124, 13, 814, 352, 666, 7988, 1496, 13, 286, 519, 300, 3035, + 390, 6572, 51332], "temperature": 0.0, "avg_logprob": -0.145377700145428, "compression_ratio": + 1.6375, "no_speech_prob": 0.011846935376524925}, {"id": 438, "seek": 159220, "start": + 1611.56, "end": 1617.0, "text": " by Google. I will try to look it up and also link + it in the show notes. But have you, have,", "tokens": [51332, 538, 3329, 13, 286, + 486, 853, 281, 574, 309, 493, 293, 611, 2113, 309, 294, 264, 855, 5570, 13, 583, + 362, 291, 11, 362, 11, 51604], "temperature": 0.0, "avg_logprob": -0.145377700145428, + "compression_ratio": 1.6375, "no_speech_prob": 0.011846935376524925}, {"id": 439, + "seek": 161700, "start": 1617.08, "end": 1622.44, "text": " did you know about this? + Or did you consider such models? That''s news to me. 
What I''ve been trying to do", + "tokens": [50368, 630, 291, 458, 466, 341, 30, 1610, 630, 291, 1949, 1270, 5245, + 30, 663, 311, 2583, 281, 385, 13, 708, 286, 600, 668, 1382, 281, 360, 50636], "temperature": + 0.0, "avg_logprob": -0.1710439066248616, "compression_ratio": 1.6601307189542485, + "no_speech_prob": 0.02082948572933674}, {"id": 440, "seek": 161700, "start": 1622.44, + "end": 1627.4, "text": " is just retrain an existing model with a bunch of permutations + and things like that, obviously,", "tokens": [50636, 307, 445, 1533, 7146, 364, + 6741, 2316, 365, 257, 3840, 295, 4784, 325, 763, 293, 721, 411, 300, 11, 2745, 11, + 50884], "temperature": 0.0, "avg_logprob": -0.1710439066248616, "compression_ratio": + 1.6601307189542485, "no_speech_prob": 0.02082948572933674}, {"id": 441, "seek": + 161700, "start": 1627.4, "end": 1631.72, "text": " think of that were like common + typos like dashes and stuff like that. But that''s a very interesting", "tokens": + [50884, 519, 295, 300, 645, 411, 2689, 2125, 329, 411, 8240, 279, 293, 1507, 411, + 300, 13, 583, 300, 311, 257, 588, 1880, 51100], "temperature": 0.0, "avg_logprob": + -0.1710439066248616, "compression_ratio": 1.6601307189542485, "no_speech_prob": + 0.02082948572933674}, {"id": 442, "seek": 161700, "start": 1631.72, "end": 1635.48, + "text": " idea. So basically, they''re working on a character by character level, + right? 
So the embedding itself", "tokens": [51100, 1558, 13, 407, 1936, 11, 436, + 434, 1364, 322, 257, 2517, 538, 2517, 1496, 11, 558, 30, 407, 264, 12240, 3584, + 2564, 51288], "temperature": 0.0, "avg_logprob": -0.1710439066248616, "compression_ratio": + 1.6601307189542485, "no_speech_prob": 0.02082948572933674}, {"id": 443, "seek": + 161700, "start": 1635.48, "end": 1641.64, "text": " is composed of, it''s even bite + because the language could be something they like, okay, you don''t want to", "tokens": + [51288, 307, 18204, 295, 11, 309, 311, 754, 7988, 570, 264, 2856, 727, 312, 746, + 436, 411, 11, 1392, 11, 291, 500, 380, 528, 281, 51596], "temperature": 0.0, "avg_logprob": + -0.1710439066248616, "compression_ratio": 1.6601307189542485, "no_speech_prob": + 0.02082948572933674}, {"id": 444, "seek": 164164, "start": 1641.88, "end": 1647.24, + "text": " apply some linguistic component, which is language dependent. Let''s say + in Chinese, you need to", "tokens": [50376, 3079, 512, 43002, 6542, 11, 597, 307, + 2856, 12334, 13, 961, 311, 584, 294, 4649, 11, 291, 643, 281, 50644], "temperature": + 0.0, "avg_logprob": -0.1800061502764302, "compression_ratio": 1.7617328519855595, + "no_speech_prob": 0.007794060278683901}, {"id": 445, "seek": 164164, "start": 1647.24, + "end": 1652.76, "text": " segment the string, right? You need to know where is that + space, which is not there geometrically.", "tokens": [50644, 9469, 264, 6798, 11, + 558, 30, 509, 643, 281, 458, 689, 307, 300, 1901, 11, 597, 307, 406, 456, 12956, + 81, 984, 13, 50920], "temperature": 0.0, "avg_logprob": -0.1800061502764302, "compression_ratio": + 1.7617328519855595, "no_speech_prob": 0.007794060278683901}, {"id": 446, "seek": + 164164, "start": 1652.76, "end": 1658.76, "text": " And then in some other languages, + let''s say, Russian, you will have like rich morphology. 
So a lot of", "tokens": + [50920, 400, 550, 294, 512, 661, 8650, 11, 718, 311, 584, 11, 7220, 11, 291, 486, + 362, 411, 4593, 25778, 1793, 13, 407, 257, 688, 295, 51220], "temperature": 0.0, + "avg_logprob": -0.1800061502764302, "compression_ratio": 1.7617328519855595, "no_speech_prob": + 0.007794060278683901}, {"id": 447, "seek": 164164, "start": 1658.76, "end": 1664.5200000000002, + "text": " endings and prefixes of the word, right? So instead of, yeah, like surface + forms, instead of", "tokens": [51220, 42474, 293, 18417, 36005, 295, 264, 1349, + 11, 558, 30, 407, 2602, 295, 11, 1338, 11, 411, 3753, 6422, 11, 2602, 295, 51508], + "temperature": 0.0, "avg_logprob": -0.1800061502764302, "compression_ratio": 1.7617328519855595, + "no_speech_prob": 0.007794060278683901}, {"id": 448, "seek": 164164, "start": 1664.5200000000002, + "end": 1670.76, "text": " applying that surface form lematizer, you will go and + just look at the bites. And then you ask the", "tokens": [51508, 9275, 300, 3753, + 1254, 7495, 267, 6545, 11, 291, 486, 352, 293, 445, 574, 412, 264, 26030, 13, 400, + 550, 291, 1029, 264, 51820], "temperature": 0.0, "avg_logprob": -0.1800061502764302, + "compression_ratio": 1.7617328519855595, "no_speech_prob": 0.007794060278683901}, + {"id": 449, "seek": 167076, "start": 1670.76, "end": 1677.08, "text": " neural networks + to pay attention to the bites instead of the characters. Yeah. That''s actually + a brilliant", "tokens": [50364, 18161, 9590, 281, 1689, 3202, 281, 264, 26030, 2602, + 295, 264, 4342, 13, 865, 13, 663, 311, 767, 257, 10248, 50680], "temperature": 0.0, + "avg_logprob": -0.1695186346769333, "compression_ratio": 1.6722408026755853, "no_speech_prob": + 0.0051723564974963665}, {"id": 450, "seek": 167076, "start": 1677.08, "end": 1681.24, + "text": " idea. And no, I haven''t heard that, but I would love to apply it. 
So + please send me the link.", "tokens": [50680, 1558, 13, 400, 572, 11, 286, 2378, + 380, 2198, 300, 11, 457, 286, 576, 959, 281, 3079, 309, 13, 407, 1767, 2845, 385, + 264, 2113, 13, 50888], "temperature": 0.0, "avg_logprob": -0.1695186346769333, "compression_ratio": + 1.6722408026755853, "no_speech_prob": 0.0051723564974963665}, {"id": 451, "seek": + 167076, "start": 1681.24, "end": 1685.72, "text": " I will, I will for sure. It + would be cool if you can apply and take a look at it. And hopefully,", "tokens": + [50888, 286, 486, 11, 286, 486, 337, 988, 13, 467, 576, 312, 1627, 498, 291, 393, + 3079, 293, 747, 257, 574, 412, 309, 13, 400, 4696, 11, 51112], "temperature": 0.0, + "avg_logprob": -0.1695186346769333, "compression_ratio": 1.6722408026755853, "no_speech_prob": + 0.0051723564974963665}, {"id": 452, "seek": 167076, "start": 1685.72, "end": 1691.4, + "text": " there is a model that you can take off the shelf and not like spend weeks + or months researching it.", "tokens": [51112, 456, 307, 257, 2316, 300, 291, 393, + 747, 766, 264, 15222, 293, 406, 411, 3496, 3259, 420, 2493, 24176, 309, 13, 51396], + "temperature": 0.0, "avg_logprob": -0.1695186346769333, "compression_ratio": 1.6722408026755853, + "no_speech_prob": 0.0051723564974963665}, {"id": 453, "seek": 167076, "start": 1691.4, + "end": 1696.76, "text": " So the amount of effort going into training these models + now is really, really absurd. It''s ludicrous,", "tokens": [51396, 407, 264, 2372, + 295, 4630, 516, 666, 3097, 613, 5245, 586, 307, 534, 11, 534, 19774, 13, 467, 311, + 15946, 299, 21189, 11, 51664], "temperature": 0.0, "avg_logprob": -0.1695186346769333, + "compression_ratio": 1.6722408026755853, "no_speech_prob": 0.0051723564974963665}, + {"id": 454, "seek": 169676, "start": 1697.56, "end": 1702.04, "text": " yes. 
And + I mean, the models are getting bigger and bigger, but it''s funny that they", "tokens": + [50404, 2086, 13, 400, 286, 914, 11, 264, 5245, 366, 1242, 3801, 293, 3801, 11, + 457, 309, 311, 4074, 300, 436, 50628], "temperature": 0.0, "avg_logprob": -0.1478987565407386, + "compression_ratio": 1.649789029535865, "no_speech_prob": 0.01335699949413538}, + {"id": 455, "seek": 169676, "start": 1702.04, "end": 1708.68, "text": " not necessarily + becoming more smart in a way. And I will try to open it. And I''m actually now editing", + "tokens": [50628, 406, 4725, 5617, 544, 4069, 294, 257, 636, 13, 400, 286, 486, + 853, 281, 1269, 309, 13, 400, 286, 478, 767, 586, 10000, 50960], "temperature": + 0.0, "avg_logprob": -0.1478987565407386, "compression_ratio": 1.649789029535865, + "no_speech_prob": 0.01335699949413538}, {"id": 456, "seek": 169676, "start": 1708.68, + "end": 1715.8, "text": " an episode. So by the time this one is published, that + one should also be published. And yeah, so basically,", "tokens": [50960, 364, 3500, + 13, 407, 538, 264, 565, 341, 472, 307, 6572, 11, 300, 472, 820, 611, 312, 6572, + 13, 400, 1338, 11, 370, 1936, 11, 51316], "temperature": 0.0, "avg_logprob": -0.1478987565407386, + "compression_ratio": 1.649789029535865, "no_speech_prob": 0.01335699949413538}, + {"id": 457, "seek": 169676, "start": 1718.36, "end": 1724.12, "text": " yeah, how + you train everything, I don''t know, it''s like, you don''t want like you as a developer,", + "tokens": [51444, 1338, 11, 577, 291, 3847, 1203, 11, 286, 500, 380, 458, 11, 309, + 311, 411, 11, 291, 500, 380, 528, 411, 291, 382, 257, 10754, 11, 51732], "temperature": + 0.0, "avg_logprob": -0.1478987565407386, "compression_ratio": 1.649789029535865, + "no_speech_prob": 0.01335699949413538}, {"id": 458, "seek": 172412, "start": 1724.12, + "end": 1729.08, "text": " you don''t have time researching things, right? 
Now with + what options do you have, you will probably", "tokens": [50364, 291, 500, 380, 362, + 565, 24176, 721, 11, 558, 30, 823, 365, 437, 3956, 360, 291, 362, 11, 291, 486, + 1391, 50612], "temperature": 0.0, "avg_logprob": -0.1210069578837573, "compression_ratio": + 1.685121107266436, "no_speech_prob": 0.01240096241235733}, {"id": 459, "seek": 172412, + "start": 1729.08, "end": 1734.28, "text": " need to go to some company, which will + offer you the service for money, or you need to go and scout", "tokens": [50612, + 643, 281, 352, 281, 512, 2237, 11, 597, 486, 2626, 291, 264, 2643, 337, 1460, 11, + 420, 291, 643, 281, 352, 293, 34392, 50872], "temperature": 0.0, "avg_logprob": + -0.1210069578837573, "compression_ratio": 1.685121107266436, "no_speech_prob": 0.01240096241235733}, + {"id": 460, "seek": 172412, "start": 1734.28, "end": 1739.8, "text": " on the hugging + face side and hope for the best, right? How do you feel about that? Yeah.", "tokens": + [50872, 322, 264, 41706, 1851, 1252, 293, 1454, 337, 264, 1151, 11, 558, 30, 1012, + 360, 291, 841, 466, 300, 30, 865, 13, 51148], "temperature": 0.0, "avg_logprob": + -0.1210069578837573, "compression_ratio": 1.685121107266436, "no_speech_prob": 0.01240096241235733}, + {"id": 461, "seek": 172412, "start": 1741.8, "end": 1746.36, "text": " I haven''t + spent so much time just sort of getting my brain around certain things that are + like,", "tokens": [51248, 286, 2378, 380, 4418, 370, 709, 565, 445, 1333, 295, 1242, + 452, 3567, 926, 1629, 721, 300, 366, 411, 11, 51476], "temperature": 0.0, "avg_logprob": + -0.1210069578837573, "compression_ratio": 1.685121107266436, "no_speech_prob": 0.01240096241235733}, + {"id": 462, "seek": 172412, "start": 1746.36, "end": 1749.6399999999999, "text": + " you know, there''s no real jumping off point for a lot of the stuff. 
There''s + no single place you can go", "tokens": [51476, 291, 458, 11, 456, 311, 572, 957, + 11233, 766, 935, 337, 257, 688, 295, 264, 1507, 13, 821, 311, 572, 2167, 1081, 291, + 393, 352, 51640], "temperature": 0.0, "avg_logprob": -0.1210069578837573, "compression_ratio": + 1.685121107266436, "no_speech_prob": 0.01240096241235733}, {"id": 463, "seek": 174964, + "start": 1749.72, "end": 1754.0400000000002, "text": " to people, I see people on + the web on these sites, like, what book should I read about this?", "tokens": [50368, + 281, 561, 11, 286, 536, 561, 322, 264, 3670, 322, 613, 7533, 11, 411, 11, 437, 1446, + 820, 286, 1401, 466, 341, 30, 50584], "temperature": 0.0, "avg_logprob": -0.16473599697681182, + "compression_ratio": 1.7537993920972645, "no_speech_prob": 0.0058298357762396336}, + {"id": 464, "seek": 174964, "start": 1754.68, "end": 1758.3600000000001, "text": + " Book, you kidding me? What book should you write about this? There are no books + about any of this,", "tokens": [50616, 9476, 11, 291, 9287, 385, 30, 708, 1446, + 820, 291, 2464, 466, 341, 30, 821, 366, 572, 3642, 466, 604, 295, 341, 11, 50800], + "temperature": 0.0, "avg_logprob": -0.16473599697681182, "compression_ratio": 1.7537993920972645, + "no_speech_prob": 0.0058298357762396336}, {"id": 465, "seek": 174964, "start": 1758.3600000000001, + "end": 1763.64, "text": " you know? It''s changing so quickly that I feel like you + have to be part of numerous, like internet", "tokens": [50800, 291, 458, 30, 467, + 311, 4473, 370, 2661, 300, 286, 841, 411, 291, 362, 281, 312, 644, 295, 12546, 11, + 411, 4705, 51064], "temperature": 0.0, "avg_logprob": -0.16473599697681182, "compression_ratio": + 1.7537993920972645, "no_speech_prob": 0.0058298357762396336}, {"id": 466, "seek": + 174964, "start": 1763.64, "end": 1768.76, "text": " subcultures and very specific, + like research websites, even I understand what''s going on. 
But", "tokens": [51064, + 1422, 66, 723, 1303, 293, 588, 2685, 11, 411, 2132, 12891, 11, 754, 286, 1223, 437, + 311, 516, 322, 13, 583, 51320], "temperature": 0.0, "avg_logprob": -0.16473599697681182, + "compression_ratio": 1.7537993920972645, "no_speech_prob": 0.0058298357762396336}, + {"id": 467, "seek": 174964, "start": 1768.76, "end": 1772.92, "text": " thank God + that people like hugging face are putting so many resources into just making these + tools", "tokens": [51320, 1309, 1265, 300, 561, 411, 41706, 1851, 366, 3372, 370, + 867, 3593, 666, 445, 1455, 613, 3873, 51528], "temperature": 0.0, "avg_logprob": + -0.16473599697681182, "compression_ratio": 1.7537993920972645, "no_speech_prob": + 0.0058298357762396336}, {"id": 468, "seek": 174964, "start": 1772.92, "end": 1778.2800000000002, + "text": " available, like their pipelines package is like such a time saver. I can''t + believe I was ever", "tokens": [51528, 2435, 11, 411, 641, 40168, 7372, 307, 411, + 1270, 257, 565, 601, 331, 13, 286, 393, 380, 1697, 286, 390, 1562, 51796], "temperature": + 0.0, "avg_logprob": -0.16473599697681182, "compression_ratio": 1.7537993920972645, + "no_speech_prob": 0.0058298357762396336}, {"id": 469, "seek": 177828, "start": 1778.28, + "end": 1784.36, "text": " implementing all these things from scratch or from, you + know, separate tool. 
Yeah, yeah, there is", "tokens": [50364, 18114, 439, 613, 721, + 490, 8459, 420, 490, 11, 291, 458, 11, 4994, 2290, 13, 865, 11, 1338, 11, 456, 307, + 50668], "temperature": 0.0, "avg_logprob": -0.16054482758045197, "compression_ratio": + 1.7298245614035088, "no_speech_prob": 0.01278048288077116}, {"id": 470, "seek": + 177828, "start": 1784.36, "end": 1789.3999999999999, "text": " another site that + I wanted to mention, which also picks up is papers with code because when you", + "tokens": [50668, 1071, 3621, 300, 286, 1415, 281, 2152, 11, 597, 611, 16137, 493, + 307, 10577, 365, 3089, 570, 562, 291, 50920], "temperature": 0.0, "avg_logprob": + -0.16054482758045197, "compression_ratio": 1.7298245614035088, "no_speech_prob": + 0.01278048288077116}, {"id": 471, "seek": 177828, "start": 1789.3999999999999, "end": + 1794.04, "text": " read a paper and you''re like, okay, how do I do it? I need to + spend a few weekends to implement it.", "tokens": [50920, 1401, 257, 3035, 293, + 291, 434, 411, 11, 1392, 11, 577, 360, 286, 360, 309, 30, 286, 643, 281, 3496, 257, + 1326, 23595, 281, 4445, 309, 13, 51152], "temperature": 0.0, "avg_logprob": -0.16054482758045197, + "compression_ratio": 1.7298245614035088, "no_speech_prob": 0.01278048288077116}, + {"id": 472, "seek": 177828, "start": 1794.04, "end": 1800.28, "text": " Some of + us have the skills, some of us don''t. So people are lucky. Yeah. And they probably + belong to", "tokens": [51152, 2188, 295, 505, 362, 264, 3942, 11, 512, 295, 505, + 500, 380, 13, 407, 561, 366, 6356, 13, 865, 13, 400, 436, 1391, 5784, 281, 51464], + "temperature": 0.0, "avg_logprob": -0.16054482758045197, "compression_ratio": 1.7298245614035088, + "no_speech_prob": 0.01278048288077116}, {"id": 473, "seek": 177828, "start": 1800.28, + "end": 1806.04, "text": " the communities so well that, you know, they know their + way through. 
Yeah, I think that, you know,", "tokens": [51464, 264, 4456, 370, 731, + 300, 11, 291, 458, 11, 436, 458, 641, 636, 807, 13, 865, 11, 286, 519, 300, 11, + 291, 458, 11, 51752], "temperature": 0.0, "avg_logprob": -0.16054482758045197, "compression_ratio": + 1.7298245614035088, "no_speech_prob": 0.01278048288077116}, {"id": 474, "seek": + 180604, "start": 1806.12, "end": 1810.84, "text": " papers without code are kind + of like, like to me, a little scandalous now. I feel like it''s very", "tokens": + [50368, 10577, 1553, 3089, 366, 733, 295, 411, 11, 411, 281, 385, 11, 257, 707, + 40273, 11553, 586, 13, 286, 841, 411, 309, 311, 588, 50604], "temperature": 0.0, + "avg_logprob": -0.1501302719116211, "compression_ratio": 1.7655677655677655, "no_speech_prob": + 0.0019139123614877462}, {"id": 475, "seek": 180604, "start": 1810.84, "end": 1816.52, + "text": " difficult to measure someone''s results and to really evaluate what people + are doing if they''re", "tokens": [50604, 2252, 281, 3481, 1580, 311, 3542, 293, + 281, 534, 13059, 437, 561, 366, 884, 498, 436, 434, 50888], "temperature": 0.0, + "avg_logprob": -0.1501302719116211, "compression_ratio": 1.7655677655677655, "no_speech_prob": + 0.0019139123614877462}, {"id": 476, "seek": 180604, "start": 1816.52, "end": 1822.68, + "text": " not releasing the code. And even if they explain the algorithm and the + paper a lot of times,", "tokens": [50888, 406, 16327, 264, 3089, 13, 400, 754, 498, + 436, 2903, 264, 9284, 293, 264, 3035, 257, 688, 295, 1413, 11, 51196], "temperature": + 0.0, "avg_logprob": -0.1501302719116211, "compression_ratio": 1.7655677655677655, + "no_speech_prob": 0.0019139123614877462}, {"id": 477, "seek": 180604, "start": 1822.68, + "end": 1826.68, "text": " the specific training process for the model is what''s + really critical. 
So certain decisions they", "tokens": [51196, 264, 2685, 3097, + 1399, 337, 264, 2316, 307, 437, 311, 534, 4924, 13, 407, 1629, 5327, 436, 51396], + "temperature": 0.0, "avg_logprob": -0.1501302719116211, "compression_ratio": 1.7655677655677655, + "no_speech_prob": 0.0019139123614877462}, {"id": 478, "seek": 180604, "start": 1826.68, + "end": 1831.32, "text": " made about what''s included in the day set, what isn''t + included in the day set. And just sort of like", "tokens": [51396, 1027, 466, 437, + 311, 5556, 294, 264, 786, 992, 11, 437, 1943, 380, 5556, 294, 264, 786, 992, 13, + 400, 445, 1333, 295, 411, 51628], "temperature": 0.0, "avg_logprob": -0.1501302719116211, + "compression_ratio": 1.7655677655677655, "no_speech_prob": 0.0019139123614877462}, + {"id": 479, "seek": 183132, "start": 1831.3999999999999, "end": 1837.8, "text": + " the training loop engineering, like I feel like that''s super important. So I + think what the success", "tokens": [50368, 264, 3097, 6367, 7043, 11, 411, 286, + 841, 411, 300, 311, 1687, 1021, 13, 407, 286, 519, 437, 264, 2245, 50688], "temperature": + 0.0, "avg_logprob": -0.18125890096028646, "compression_ratio": 1.6643598615916955, + "no_speech_prob": 0.020016679540276527}, {"id": 480, "seek": 183132, "start": 1837.8, + "end": 1842.4399999999998, "text": " of OpenAI has had with Clip, I think goes to + show that someone with a great idea and a model", "tokens": [50688, 295, 7238, 48698, + 575, 632, 365, 2033, 647, 11, 286, 519, 1709, 281, 855, 300, 1580, 365, 257, 869, + 1558, 293, 257, 2316, 50920], "temperature": 0.0, "avg_logprob": -0.18125890096028646, + "compression_ratio": 1.6643598615916955, "no_speech_prob": 0.020016679540276527}, + {"id": 481, "seek": 183132, "start": 1842.4399999999998, "end": 1846.6799999999998, + "text": " that''s released on time and early is just it''s going to really be a + game changer for the industry.", "tokens": [50920, 300, 311, 4736, 322, 565, 293, + 2440, 307, 445, 309, 311, 516, 281, 534, 
312, 257, 1216, 22822, 337, 264, 3518, + 13, 51132], "temperature": 0.0, "avg_logprob": -0.18125890096028646, "compression_ratio": + 1.6643598615916955, "no_speech_prob": 0.020016679540276527}, {"id": 482, "seek": + 183132, "start": 1847.3999999999999, "end": 1852.2, "text": " Yeah, I remember like + in the Telegram channel of Quadrant, two people including you, I think,", "tokens": + [51168, 865, 11, 286, 1604, 411, 294, 264, 14889, 1342, 2269, 295, 29619, 7541, + 11, 732, 561, 3009, 291, 11, 286, 519, 11, 51408], "temperature": 0.0, "avg_logprob": + -0.18125890096028646, "compression_ratio": 1.6643598615916955, "no_speech_prob": + 0.020016679540276527}, {"id": 483, "seek": 183132, "start": 1852.2, "end": 1857.72, + "text": " said that without any fine tuning, you got really great results with Clip. + And I think you guys", "tokens": [51408, 848, 300, 1553, 604, 2489, 15164, 11, 291, + 658, 534, 869, 3542, 365, 2033, 647, 13, 400, 286, 519, 291, 1074, 51684], "temperature": + 0.0, "avg_logprob": -0.18125890096028646, "compression_ratio": 1.6643598615916955, + "no_speech_prob": 0.020016679540276527}, {"id": 484, "seek": 185772, "start": 1857.72, + "end": 1863.24, "text": " applied it to different domains. And that was amazing + because especially the cross-domain", "tokens": [50364, 6456, 309, 281, 819, 25514, + 13, 400, 300, 390, 2243, 570, 2318, 264, 3278, 12, 4121, 491, 50640], "temperature": + 0.0, "avg_logprob": -0.14832240079356507, "compression_ratio": 1.7573529411764706, + "no_speech_prob": 0.06046351045370102}, {"id": 485, "seek": 185772, "start": 1863.24, + "end": 1867.96, "text": " application, you know, it''s such a big pain for these + models. 
There is a paper as well where they", "tokens": [50640, 3861, 11, 291, 458, + 11, 309, 311, 1270, 257, 955, 1822, 337, 613, 5245, 13, 821, 307, 257, 3035, 382, + 731, 689, 436, 50876], "temperature": 0.0, "avg_logprob": -0.14832240079356507, + "compression_ratio": 1.7573529411764706, "no_speech_prob": 0.06046351045370102}, + {"id": 486, "seek": 185772, "start": 1867.96, "end": 1873.8, "text": " take the + off-the-shelf models and they apply them to different like search tasks because + a search", "tokens": [50876, 747, 264, 766, 12, 3322, 12, 46626, 5245, 293, 436, + 3079, 552, 281, 819, 411, 3164, 9608, 570, 257, 3164, 51168], "temperature": 0.0, + "avg_logprob": -0.14832240079356507, "compression_ratio": 1.7573529411764706, "no_speech_prob": + 0.06046351045370102}, {"id": 487, "seek": 185772, "start": 1873.8, "end": 1880.2, + "text": " task could be, let''s say, it looks like a question answering system or + it operates in financial", "tokens": [51168, 5633, 727, 312, 11, 718, 311, 584, + 11, 309, 1542, 411, 257, 1168, 13430, 1185, 420, 309, 22577, 294, 4669, 51488], + "temperature": 0.0, "avg_logprob": -0.14832240079356507, "compression_ratio": 1.7573529411764706, + "no_speech_prob": 0.06046351045370102}, {"id": 488, "seek": 185772, "start": 1880.2, + "end": 1884.76, "text": " domain, right? So it goes in specific domain. And then + they basically what they did is that they", "tokens": [51488, 9274, 11, 558, 30, + 407, 309, 1709, 294, 2685, 9274, 13, 400, 550, 436, 1936, 437, 436, 630, 307, 300, + 436, 51716], "temperature": 0.0, "avg_logprob": -0.14832240079356507, "compression_ratio": + 1.7573529411764706, "no_speech_prob": 0.06046351045370102}, {"id": 489, "seek": + 188476, "start": 1884.76, "end": 1889.56, "text": " applied no fine tuning whatsoever. 
+ So they took the models and they applied what they call zero", "tokens": [50364, + 6456, 572, 2489, 15164, 17076, 13, 407, 436, 1890, 264, 5245, 293, 436, 6456, 437, + 436, 818, 4018, 50604], "temperature": 0.0, "avg_logprob": -0.11859741886105157, + "compression_ratio": 1.7408759124087592, "no_speech_prob": 0.006165551953017712}, + {"id": 490, "seek": 188476, "start": 1889.56, "end": 1894.2, "text": " short learning, + right? So you just, you need to predict it right away. And they showed that,", "tokens": + [50604, 2099, 2539, 11, 558, 30, 407, 291, 445, 11, 291, 643, 281, 6069, 309, 558, + 1314, 13, 400, 436, 4712, 300, 11, 50836], "temperature": 0.0, "avg_logprob": -0.11859741886105157, + "compression_ratio": 1.7408759124087592, "no_speech_prob": 0.006165551953017712}, + {"id": 491, "seek": 188476, "start": 1894.2, "end": 1900.28, "text": " ah, man, + they''re not all equal at all. And sometimes they miserably fail. But they actually", + "tokens": [50836, 3716, 11, 587, 11, 436, 434, 406, 439, 2681, 412, 439, 13, 400, + 2171, 436, 17725, 1188, 3061, 13, 583, 436, 767, 51140], "temperature": 0.0, "avg_logprob": + -0.11859741886105157, "compression_ratio": 1.7408759124087592, "no_speech_prob": + 0.006165551953017712}, {"id": 492, "seek": 188476, "start": 1901.08, "end": 1906.76, + "text": " found out that the specific category of algorithms based on, I think based + on dense retrievers,", "tokens": [51180, 1352, 484, 300, 264, 2685, 7719, 295, 14642, + 2361, 322, 11, 286, 519, 2361, 322, 18011, 19817, 840, 11, 51464], "temperature": + 0.0, "avg_logprob": -0.11859741886105157, "compression_ratio": 1.7408759124087592, + "no_speech_prob": 0.006165551953017712}, {"id": 493, "seek": 188476, "start": 1906.76, + "end": 1913.56, "text": " if I remember correctly. So they perform better than others. 
+ But if you compare the dense retrievers", "tokens": [51464, 498, 286, 1604, 8944, + 13, 407, 436, 2042, 1101, 813, 2357, 13, 583, 498, 291, 6794, 264, 18011, 19817, + 840, 51804], "temperature": 0.0, "avg_logprob": -0.11859741886105157, "compression_ratio": + 1.7408759124087592, "no_speech_prob": 0.006165551953017712}, {"id": 494, "seek": + 191356, "start": 1913.56, "end": 1922.12, "text": " to BM25 method, BM25 was still + fairly close to them. And it''s way less expensive. So that''s", "tokens": [50364, + 281, 15901, 6074, 3170, 11, 15901, 6074, 390, 920, 6457, 1998, 281, 552, 13, 400, + 309, 311, 636, 1570, 5124, 13, 407, 300, 311, 50792], "temperature": 0.0, "avg_logprob": + -0.12113968206911671, "compression_ratio": 1.5726141078838174, "no_speech_prob": + 0.002635474083945155}, {"id": 495, "seek": 191356, "start": 1922.12, "end": 1927.8, + "text": " really interesting. Yeah. It also depends a lot on the very, very specific + use cases of your", "tokens": [50792, 534, 1880, 13, 865, 13, 467, 611, 5946, 257, + 688, 322, 264, 588, 11, 588, 2685, 764, 3331, 295, 428, 51076], "temperature": 0.0, + "avg_logprob": -0.12113968206911671, "compression_ratio": 1.5726141078838174, "no_speech_prob": + 0.002635474083945155}, {"id": 496, "seek": 191356, "start": 1927.8, "end": 1932.36, + "text": " users. 
Like what you''re saying with BM25, if something is very, if they''re + searching for a lot of", "tokens": [51076, 5022, 13, 1743, 437, 291, 434, 1566, + 365, 15901, 6074, 11, 498, 746, 307, 588, 11, 498, 436, 434, 10808, 337, 257, 688, + 295, 51304], "temperature": 0.0, "avg_logprob": -0.12113968206911671, "compression_ratio": + 1.5726141078838174, "no_speech_prob": 0.002635474083945155}, {"id": 497, "seek": + 191356, "start": 1933.32, "end": 1937.1599999999999, "text": " sort of jargon and + industry specific stuff, BM25 is definitely going to kill it compared to even", + "tokens": [51352, 1333, 295, 15181, 10660, 293, 3518, 2685, 1507, 11, 15901, 6074, + 307, 2138, 516, 281, 1961, 309, 5347, 281, 754, 51544], "temperature": 0.0, "avg_logprob": + -0.12113968206911671, "compression_ratio": 1.5726141078838174, "no_speech_prob": + 0.002635474083945155}, {"id": 498, "seek": 193716, "start": 1937.24, "end": 1941.96, + "text": " models that are tolerant of the terms outside of their, outside of the + keyword space. 
Like,", "tokens": [50368, 5245, 300, 366, 45525, 295, 264, 2115, + 2380, 295, 641, 11, 2380, 295, 264, 20428, 1901, 13, 1743, 11, 50604], "temperature": + 0.0, "avg_logprob": -0.12340681176436574, "compression_ratio": 1.7121771217712176, + "no_speech_prob": 0.0004939164500683546}, {"id": 499, "seek": 193716, "start": 1944.68, + "end": 1950.2, "text": " I really feel like what we need is a more natural way to + blend these two kinds of techniques.", "tokens": [50740, 286, 534, 841, 411, 437, + 321, 643, 307, 257, 544, 3303, 636, 281, 10628, 613, 732, 3685, 295, 7512, 13, 51016], + "temperature": 0.0, "avg_logprob": -0.12340681176436574, "compression_ratio": 1.7121771217712176, + "no_speech_prob": 0.0004939164500683546}, {"id": 500, "seek": 193716, "start": 1951.0, + "end": 1955.16, "text": " And I think as we see more and more advanced vector-based + search engines coming out, we''re", "tokens": [51056, 400, 286, 519, 382, 321, 536, + 544, 293, 544, 7339, 8062, 12, 6032, 3164, 12982, 1348, 484, 11, 321, 434, 51264], + "temperature": 0.0, "avg_logprob": -0.12340681176436574, "compression_ratio": 1.7121771217712176, + "no_speech_prob": 0.0004939164500683546}, {"id": 501, "seek": 193716, "start": 1955.16, + "end": 1959.48, "text": " going to see systems that are able to sort of store the + full text, store the vector embedding,", "tokens": [51264, 516, 281, 536, 3652, + 300, 366, 1075, 281, 1333, 295, 3531, 264, 1577, 2487, 11, 3531, 264, 8062, 12240, + 3584, 11, 51480], "temperature": 0.0, "avg_logprob": -0.12340681176436574, "compression_ratio": + 1.7121771217712176, "no_speech_prob": 0.0004939164500683546}, {"id": 502, "seek": + 193716, "start": 1959.48, "end": 1966.68, "text": " compare them both and rank them + in a uniform way, which is like so critical. 
And one of the,", "tokens": [51480, + 6794, 552, 1293, 293, 6181, 552, 294, 257, 9452, 636, 11, 597, 307, 411, 370, 4924, + 13, 400, 472, 295, 264, 11, 51840], "temperature": 0.0, "avg_logprob": -0.12340681176436574, + "compression_ratio": 1.7121771217712176, "no_speech_prob": 0.0004939164500683546}, + {"id": 503, "seek": 196668, "start": 1966.68, "end": 1971.8, "text": " I think something + you mentioned that is super interesting is to the systems that are,", "tokens": + [50364, 286, 519, 746, 291, 2835, 300, 307, 1687, 1880, 307, 281, 264, 3652, 300, + 366, 11, 50620], "temperature": 0.0, "avg_logprob": -0.16152468267476783, "compression_ratio": + 1.7703703703703704, "no_speech_prob": 0.0003389701887499541}, {"id": 504, "seek": + 196668, "start": 1973.0800000000002, "end": 1978.8400000000001, "text": " they''re + retraining them using these simple keyword question answering tasks. And result + comes out", "tokens": [50684, 436, 434, 49356, 1760, 552, 1228, 613, 2199, 20428, + 1168, 13430, 9608, 13, 400, 1874, 1487, 484, 50972], "temperature": 0.0, "avg_logprob": + -0.16152468267476783, "compression_ratio": 1.7703703703703704, "no_speech_prob": + 0.0003389701887499541}, {"id": 505, "seek": 196668, "start": 1978.8400000000001, + "end": 1983.0, "text": " much, much better, the accuracy and stuff. I think that''s + so interesting. 
And I believe that", "tokens": [50972, 709, 11, 709, 1101, 11, 264, + 14170, 293, 1507, 13, 286, 519, 300, 311, 370, 1880, 13, 400, 286, 1697, 300, 51180], + "temperature": 0.0, "avg_logprob": -0.16152468267476783, "compression_ratio": 1.7703703703703704, + "no_speech_prob": 0.0003389701887499541}, {"id": 506, "seek": 196668, "start": 1984.28, + "end": 1988.8400000000001, "text": " if you could take a model and take specifics + about your use case and blend them together very quickly", "tokens": [51244, 498, + 291, 727, 747, 257, 2316, 293, 747, 28454, 466, 428, 764, 1389, 293, 10628, 552, + 1214, 588, 2661, 51472], "temperature": 0.0, "avg_logprob": -0.16152468267476783, + "compression_ratio": 1.7703703703703704, "no_speech_prob": 0.0003389701887499541}, + {"id": 507, "seek": 196668, "start": 1988.8400000000001, "end": 1994.76, "text": + " and easily, I think that we would end up seeing embeddings that produce a much + better result in the", "tokens": [51472, 293, 3612, 11, 286, 519, 300, 321, 576, + 917, 493, 2577, 12240, 29432, 300, 5258, 257, 709, 1101, 1874, 294, 264, 51768], + "temperature": 0.0, "avg_logprob": -0.16152468267476783, "compression_ratio": 1.7703703703703704, + "no_speech_prob": 0.0003389701887499541}, {"id": 508, "seek": 199476, "start": 1994.76, + "end": 2002.52, "text": " field. Yeah. And I think you are tapping also in the in + the part where I hope that at some point,", "tokens": [50364, 2519, 13, 865, 13, + 400, 286, 519, 291, 366, 21444, 611, 294, 264, 294, 264, 644, 689, 286, 1454, 300, + 412, 512, 935, 11, 50752], "temperature": 0.0, "avg_logprob": -0.1449907543589768, + "compression_ratio": 1.588235294117647, "no_speech_prob": 0.005579933058470488}, + {"id": 509, "seek": 199476, "start": 2002.52, "end": 2009.4, "text": " we also get + a confidence level. 
Let''s say from BM25, you can also get a confidence estimate", + "tokens": [50752, 321, 611, 483, 257, 6687, 1496, 13, 961, 311, 584, 490, 15901, + 6074, 11, 291, 393, 611, 483, 257, 6687, 12539, 51096], "temperature": 0.0, "avg_logprob": + -0.1449907543589768, "compression_ratio": 1.588235294117647, "no_speech_prob": 0.005579933058470488}, + {"id": 510, "seek": 199476, "start": 2009.4, "end": 2016.36, "text": " how well + it worked. And the same goes to dense retrieving, you know, the vector retrieving.", + "tokens": [51096, 577, 731, 309, 2732, 13, 400, 264, 912, 1709, 281, 18011, 19817, + 798, 11, 291, 458, 11, 264, 8062, 19817, 798, 13, 51444], "temperature": 0.0, "avg_logprob": + -0.1449907543589768, "compression_ratio": 1.588235294117647, "no_speech_prob": 0.005579933058470488}, + {"id": 511, "seek": 199476, "start": 2016.36, "end": 2022.84, "text": " Wait, could + also say, hey, I''m kind of like 60% confident that I found you a good result, right?", + "tokens": [51444, 3802, 11, 727, 611, 584, 11, 4177, 11, 286, 478, 733, 295, 411, + 4060, 4, 6679, 300, 286, 1352, 291, 257, 665, 1874, 11, 558, 30, 51768], "temperature": + 0.0, "avg_logprob": -0.1449907543589768, "compression_ratio": 1.588235294117647, + "no_speech_prob": 0.005579933058470488}, {"id": 512, "seek": 202284, "start": 2022.84, + "end": 2027.1599999999999, "text": " Then the question is how you build it, but + like that''s the goal. Let''s say I''m just pushing this out", "tokens": [50364, + 1396, 264, 1168, 307, 577, 291, 1322, 309, 11, 457, 411, 300, 311, 264, 3387, 13, + 961, 311, 584, 286, 478, 445, 7380, 341, 484, 50580], "temperature": 0.0, "avg_logprob": + -0.17208438966332412, "compression_ratio": 1.6342281879194631, "no_speech_prob": + 0.006456795148551464}, {"id": 513, "seek": 202284, "start": 2027.1599999999999, + "end": 2031.72, "text": " to the universe. So hey, everyone who works on search + engines, maybe you can consider this. 
Or maybe", "tokens": [50580, 281, 264, 6445, + 13, 407, 4177, 11, 1518, 567, 1985, 322, 3164, 12982, 11, 1310, 291, 393, 1949, + 341, 13, 1610, 1310, 50808], "temperature": 0.0, "avg_logprob": -0.17208438966332412, + "compression_ratio": 1.6342281879194631, "no_speech_prob": 0.006456795148551464}, + {"id": 514, "seek": 202284, "start": 2031.72, "end": 2037.24, "text": " you already + are considering this. But I think that would be so much easier because, for example,", + "tokens": [50808, 291, 1217, 366, 8079, 341, 13, 583, 286, 519, 300, 576, 312, 370, + 709, 3571, 570, 11, 337, 1365, 11, 51084], "temperature": 0.0, "avg_logprob": -0.17208438966332412, + "compression_ratio": 1.6342281879194631, "no_speech_prob": 0.006456795148551464}, + {"id": 515, "seek": 202284, "start": 2037.24, "end": 2042.28, "text": " in e-commerce, + right, one of the problems is zero heat search. And probably for your search", "tokens": + [51084, 294, 308, 12, 26926, 11, 558, 11, 472, 295, 264, 2740, 307, 4018, 3738, + 3164, 13, 400, 1391, 337, 428, 3164, 51336], "temperature": 0.0, "avg_logprob": + -0.17208438966332412, "compression_ratio": 1.6342281879194631, "no_speech_prob": + 0.006456795148551464}, {"id": 516, "seek": 202284, "start": 2042.28, "end": 2047.48, + "text": " as well, right? Like somebody typed something that you couldn''t handle. + Now, what do you show? A", "tokens": [51336, 382, 731, 11, 558, 30, 1743, 2618, + 33941, 746, 300, 291, 2809, 380, 4813, 13, 823, 11, 437, 360, 291, 855, 30, 316, + 51596], "temperature": 0.0, "avg_logprob": -0.17208438966332412, "compression_ratio": + 1.6342281879194631, "no_speech_prob": 0.006456795148551464}, {"id": 517, "seek": + 204748, "start": 2047.48, "end": 2054.44, "text": " blank screen or you show the + most popular NFTs, right? 
And that''s one of the hard things about,", "tokens": + [50364, 8247, 2568, 420, 291, 855, 264, 881, 3743, 13576, 33424, 11, 558, 30, 400, + 300, 311, 472, 295, 264, 1152, 721, 466, 11, 50712], "temperature": 0.0, "avg_logprob": + -0.17251034106238414, "compression_ratio": 1.7167832167832169, "no_speech_prob": + 0.03491286560893059}, {"id": 518, "seek": 204748, "start": 2054.44, "end": 2060.12, + "text": " I guess, a traditional imperative based search engine. You can never show + the user an empty page.", "tokens": [50712, 286, 2041, 11, 257, 5164, 32490, 2361, + 3164, 2848, 13, 509, 393, 1128, 855, 264, 4195, 364, 6707, 3028, 13, 50996], "temperature": + 0.0, "avg_logprob": -0.17251034106238414, "compression_ratio": 1.7167832167832169, + "no_speech_prob": 0.03491286560893059}, {"id": 519, "seek": 204748, "start": 2060.12, + "end": 2064.28, "text": " You can never just say, ah, nothing, sorry, try again. + You have to always be like feeding them", "tokens": [50996, 509, 393, 1128, 445, + 584, 11, 3716, 11, 1825, 11, 2597, 11, 853, 797, 13, 509, 362, 281, 1009, 312, 411, + 12919, 552, 51204], "temperature": 0.0, "avg_logprob": -0.17251034106238414, "compression_ratio": + 1.7167832167832169, "no_speech_prob": 0.03491286560893059}, {"id": 520, "seek": + 204748, "start": 2064.28, "end": 2069.48, "text": " next steps that they can go + to that make that make a lot of sense. 
And that''s definitely one of the", "tokens": + [51204, 958, 4439, 300, 436, 393, 352, 281, 300, 652, 300, 652, 257, 688, 295, 2020, + 13, 400, 300, 311, 2138, 472, 295, 264, 51464], "temperature": 0.0, "avg_logprob": + -0.17251034106238414, "compression_ratio": 1.7167832167832169, "no_speech_prob": + 0.03491286560893059}, {"id": 521, "seek": 204748, "start": 2069.48, "end": 2075.72, + "text": " challenges with old style database search approaches, just finding, finding + results that are relevant,", "tokens": [51464, 4759, 365, 1331, 3758, 8149, 3164, + 11587, 11, 445, 5006, 11, 5006, 3542, 300, 366, 7340, 11, 51776], "temperature": + 0.0, "avg_logprob": -0.17251034106238414, "compression_ratio": 1.7167832167832169, + "no_speech_prob": 0.03491286560893059}, {"id": 522, "seek": 207572, "start": 2075.72, + "end": 2080.8399999999997, "text": " but not quite right. You know, SQL doesn''t + really do that. So that''s another great thing about", "tokens": [50364, 457, 406, + 1596, 558, 13, 509, 458, 11, 19200, 1177, 380, 534, 360, 300, 13, 407, 300, 311, + 1071, 869, 551, 466, 50620], "temperature": 0.0, "avg_logprob": -0.18517385996305025, + "compression_ratio": 1.5533596837944663, "no_speech_prob": 0.01086879801005125}, + {"id": 523, "seek": 207572, "start": 2080.8399999999997, "end": 2086.6, "text": + " Quadrant. At least a lot of times you''re getting a score metric back that is + a good and continuous,", "tokens": [50620, 29619, 7541, 13, 1711, 1935, 257, 688, + 295, 1413, 291, 434, 1242, 257, 6175, 20678, 646, 300, 307, 257, 665, 293, 10957, + 11, 50908], "temperature": 0.0, "avg_logprob": -0.18517385996305025, "compression_ratio": + 1.5533596837944663, "no_speech_prob": 0.01086879801005125}, {"id": 524, "seek": + 207572, "start": 2087.56, "end": 2093.3999999999996, "text": " value. It''s not + bullying. Yes, this match, no, that didn''t match. Yeah, yeah, exactly. 
I remember + like", "tokens": [50956, 2158, 13, 467, 311, 406, 4693, 1840, 13, 1079, 11, 341, + 2995, 11, 572, 11, 300, 994, 380, 2995, 13, 865, 11, 1338, 11, 2293, 13, 286, 1604, + 411, 51248], "temperature": 0.0, "avg_logprob": -0.18517385996305025, "compression_ratio": + 1.5533596837944663, "no_speech_prob": 0.01086879801005125}, {"id": 525, "seek": + 207572, "start": 2093.3999999999996, "end": 2101.8799999999997, "text": " when I + was entering this field slowly, I had a friend who was majoring in information retrieval", + "tokens": [51248, 562, 286, 390, 11104, 341, 2519, 5692, 11, 286, 632, 257, 1277, + 567, 390, 2563, 278, 294, 1589, 19817, 3337, 51672], "temperature": 0.0, "avg_logprob": + -0.18517385996305025, "compression_ratio": 1.5533596837944663, "no_speech_prob": + 0.01086879801005125}, {"id": 526, "seek": 210188, "start": 2101.88, "end": 2109.08, + "text": " systems as an academic. And and I asked him, hey, so if I, how do search + engine work, you know,", "tokens": [50364, 3652, 382, 364, 7778, 13, 400, 293, 286, + 2351, 796, 11, 4177, 11, 370, 498, 286, 11, 577, 360, 3164, 2848, 589, 11, 291, + 458, 11, 50724], "temperature": 0.0, "avg_logprob": -0.17766803741455078, "compression_ratio": + 1.6523605150214593, "no_speech_prob": 0.014163213782012463}, {"id": 527, "seek": + 210188, "start": 2109.08, "end": 2114.92, "text": " like if I type something, what + happens next? And I knew nothing about inverted indexes, nothing at", "tokens": + [50724, 411, 498, 286, 2010, 746, 11, 437, 2314, 958, 30, 400, 286, 2586, 1825, + 466, 38969, 8186, 279, 11, 1825, 412, 51016], "temperature": 0.0, "avg_logprob": + -0.17766803741455078, "compression_ratio": 1.6523605150214593, "no_speech_prob": + 0.014163213782012463}, {"id": 528, "seek": 210188, "start": 2114.92, "end": 2121.08, + "text": " all. And he said, yeah, there is like an inverted index. 
So we break that + documents down into this", "tokens": [51016, 439, 13, 400, 415, 848, 11, 1338, 11, + 456, 307, 411, 364, 38969, 8186, 13, 407, 321, 1821, 300, 8512, 760, 666, 341, 51324], + "temperature": 0.0, "avg_logprob": -0.17766803741455078, "compression_ratio": 1.6523605150214593, + "no_speech_prob": 0.014163213782012463}, {"id": 529, "seek": 210188, "start": 2121.08, + "end": 2126.84, "text": " kind of vector of terms. And then it points each, each + term points to the posting list with", "tokens": [51324, 733, 295, 8062, 295, 2115, + 13, 400, 550, 309, 2793, 1184, 11, 1184, 1433, 2793, 281, 264, 15978, 1329, 365, + 51612], "temperature": 0.0, "avg_logprob": -0.17766803741455078, "compression_ratio": + 1.6523605150214593, "no_speech_prob": 0.014163213782012463}, {"id": 530, "seek": + 212684, "start": 2126.84, "end": 2134.44, "text": " the OKDs and so on. And then + you apply Boolean, Boolean like logic on top of those. And you make", "tokens": + [50364, 264, 2264, 35, 82, 293, 370, 322, 13, 400, 550, 291, 3079, 23351, 28499, + 11, 23351, 28499, 411, 9952, 322, 1192, 295, 729, 13, 400, 291, 652, 50744], "temperature": + 0.0, "avg_logprob": -0.15811083732394998, "compression_ratio": 1.56198347107438, + "no_speech_prob": 0.006134833674877882}, {"id": 531, "seek": 212684, "start": 2134.44, + "end": 2140.36, "text": " it efficient. But then I was still not satisfied. I said, + hey, so it means that if I need to find", "tokens": [50744, 309, 7148, 13, 583, + 550, 286, 390, 920, 406, 11239, 13, 286, 848, 11, 4177, 11, 370, 309, 1355, 300, + 498, 286, 643, 281, 915, 51040], "temperature": 0.0, "avg_logprob": -0.15811083732394998, + "compression_ratio": 1.56198347107438, "no_speech_prob": 0.006134833674877882}, + {"id": 532, "seek": 212684, "start": 2140.36, "end": 2145.0, "text": " something, + let''s say I''m in discovery mode, I don''t know what type. 
So what should I do?", + "tokens": [51040, 746, 11, 718, 311, 584, 286, 478, 294, 12114, 4391, 11, 286, 500, + 380, 458, 437, 2010, 13, 407, 437, 820, 286, 360, 30, 51272], "temperature": 0.0, + "avg_logprob": -0.15811083732394998, "compression_ratio": 1.56198347107438, "no_speech_prob": + 0.006134833674877882}, {"id": 533, "seek": 212684, "start": 2145.8, "end": 2150.92, + "text": " And and he said, yeah, the IR is not there yet. Like there is no discovery. + You literally need", "tokens": [51312, 400, 293, 415, 848, 11, 1338, 11, 264, 16486, + 307, 406, 456, 1939, 13, 1743, 456, 307, 572, 12114, 13, 509, 3736, 643, 51568], + "temperature": 0.0, "avg_logprob": -0.15811083732394998, "compression_ratio": 1.56198347107438, + "no_speech_prob": 0.006134833674877882}, {"id": 534, "seek": 215092, "start": 2151.08, + "end": 2156.6, "text": " to type at least something, right? And then I said, OK, + when I type something, like how does the", "tokens": [50372, 281, 2010, 412, 1935, + 746, 11, 558, 30, 400, 550, 286, 848, 11, 2264, 11, 562, 286, 2010, 746, 11, 411, + 577, 775, 264, 50648], "temperature": 0.0, "avg_logprob": -0.09454145206241157, + "compression_ratio": 1.8339622641509434, "no_speech_prob": 0.002648799680173397}, + {"id": 535, "seek": 215092, "start": 2156.6, "end": 2161.4, "text": " search engine + know what I''m looking for? 
And he said, well, that inverted model, which is a", + "tokens": [50648, 3164, 2848, 458, 437, 286, 478, 1237, 337, 30, 400, 415, 848, + 11, 731, 11, 300, 38969, 2316, 11, 597, 307, 257, 50888], "temperature": 0.0, "avg_logprob": + -0.09454145206241157, "compression_ratio": 1.8339622641509434, "no_speech_prob": + 0.002648799680173397}, {"id": 536, "seek": 215092, "start": 2161.4, "end": 2168.12, + "text": " vector space model from the 60s or 70s, it basically builds some kind + of understanding of the document.", "tokens": [50888, 8062, 1901, 2316, 490, 264, + 4060, 82, 420, 5285, 82, 11, 309, 1936, 15182, 512, 733, 295, 3701, 295, 264, 4166, + 13, 51224], "temperature": 0.0, "avg_logprob": -0.09454145206241157, "compression_ratio": + 1.8339622641509434, "no_speech_prob": 0.002648799680173397}, {"id": 537, "seek": + 215092, "start": 2168.12, "end": 2173.0, "text": " And I said, how exactly does + it understand the document? And he said, basically, it''s a bag of words.", "tokens": + [51224, 400, 286, 848, 11, 577, 2293, 775, 309, 1223, 264, 4166, 30, 400, 415, 848, + 11, 1936, 11, 309, 311, 257, 3411, 295, 2283, 13, 51468], "temperature": 0.0, "avg_logprob": + -0.09454145206241157, "compression_ratio": 1.8339622641509434, "no_speech_prob": + 0.002648799680173397}, {"id": 538, "seek": 215092, "start": 2173.0, "end": 2177.56, + "text": " And I said, how can it, how can it make sure that it understands the meaning + when it''s just", "tokens": [51468, 400, 286, 848, 11, 577, 393, 309, 11, 577, 393, + 309, 652, 988, 300, 309, 15146, 264, 3620, 562, 309, 311, 445, 51696], "temperature": + 0.0, "avg_logprob": -0.09454145206241157, "compression_ratio": 1.8339622641509434, + "no_speech_prob": 0.002648799680173397}, {"id": 539, "seek": 217756, "start": 2177.56, + "end": 2183.64, "text": " bag of words? Well, he said, there is also IDF component + and NTF component. 
And these two play", "tokens": [50364, 3411, 295, 2283, 30, 1042, + 11, 415, 848, 11, 456, 307, 611, 7348, 37, 6542, 293, 43452, 37, 6542, 13, 400, + 613, 732, 862, 50668], "temperature": 0.0, "avg_logprob": -0.1503015118975972, "compression_ratio": + 1.74822695035461, "no_speech_prob": 0.01665777526795864}, {"id": 540, "seek": 217756, + "start": 2183.64, "end": 2189.32, "text": " together. And hopefully the ideas that + you will find some unique document, which which uniquely", "tokens": [50668, 1214, + 13, 400, 4696, 264, 3487, 300, 291, 486, 915, 512, 3845, 4166, 11, 597, 597, 31474, + 50952], "temperature": 0.0, "avg_logprob": -0.1503015118975972, "compression_ratio": + 1.74822695035461, "no_speech_prob": 0.01665777526795864}, {"id": 541, "seek": 217756, + "start": 2189.32, "end": 2196.04, "text": " explains what you''re looking for. But + if I''m not looking for a term, if I''m looking for to be or not", "tokens": [50952, + 13948, 437, 291, 434, 1237, 337, 13, 583, 498, 286, 478, 406, 1237, 337, 257, 1433, + 11, 498, 286, 478, 1237, 337, 281, 312, 420, 406, 51288], "temperature": 0.0, "avg_logprob": + -0.1503015118975972, "compression_ratio": 1.74822695035461, "no_speech_prob": 0.01665777526795864}, + {"id": 542, "seek": 217756, "start": 2196.04, "end": 2201.96, "text": " to be, each + of these words is a stop word. Now, how does it know what I''m looking for? And + then he said,", "tokens": [51288, 281, 312, 11, 1184, 295, 613, 2283, 307, 257, + 1590, 1349, 13, 823, 11, 577, 775, 309, 458, 437, 286, 478, 1237, 337, 30, 400, + 550, 415, 848, 11, 51584], "temperature": 0.0, "avg_logprob": -0.1503015118975972, + "compression_ratio": 1.74822695035461, "no_speech_prob": 0.01665777526795864}, {"id": + 543, "seek": 217756, "start": 2201.96, "end": 2207.08, "text": " OK, Google actually + pays more attention to the title. 
So like if these words occur in the title,", "tokens": + [51584, 2264, 11, 3329, 767, 10604, 544, 3202, 281, 264, 4876, 13, 407, 411, 498, + 613, 2283, 5160, 294, 264, 4876, 11, 51840], "temperature": 0.0, "avg_logprob": + -0.1503015118975972, "compression_ratio": 1.74822695035461, "no_speech_prob": 0.01665777526795864}, + {"id": 544, "seek": 220708, "start": 2207.08, "end": 2211.3199999999997, "text": + " they will rank the document higher. And at that point, I was like, this is like + magic. So it", "tokens": [50364, 436, 486, 6181, 264, 4166, 2946, 13, 400, 412, + 300, 935, 11, 286, 390, 411, 11, 341, 307, 411, 5585, 13, 407, 309, 50576], "temperature": + 0.0, "avg_logprob": -0.11439579091173538, "compression_ratio": 1.7228915662650603, + "no_speech_prob": 0.004149710293859243}, {"id": 545, "seek": 220708, "start": 2211.3199999999997, + "end": 2216.36, "text": " doesn''t understand anything. I''m searching. It''s just + tuning it, right? It''s layers of hacks upon", "tokens": [50576, 1177, 380, 1223, + 1340, 13, 286, 478, 10808, 13, 467, 311, 445, 15164, 309, 11, 558, 30, 467, 311, + 7914, 295, 33617, 3564, 50828], "temperature": 0.0, "avg_logprob": -0.11439579091173538, + "compression_ratio": 1.7228915662650603, "no_speech_prob": 0.004149710293859243}, + {"id": 546, "seek": 220708, "start": 2216.36, "end": 2220.36, "text": " hacks upon + hacks to achieve certain goals. It''s very interesting. And in the case of Google,", + "tokens": [50828, 33617, 3564, 33617, 281, 4584, 1629, 5493, 13, 467, 311, 588, + 1880, 13, 400, 294, 264, 1389, 295, 3329, 11, 51028], "temperature": 0.0, "avg_logprob": + -0.11439579091173538, "compression_ratio": 1.7228915662650603, "no_speech_prob": + 0.004149710293859243}, {"id": 547, "seek": 220708, "start": 2220.36, "end": 2225.56, + "text": " it''s amazing. And it works as well as it does. 
The scope of documents + that they have in that index", "tokens": [51028, 309, 311, 2243, 13, 400, 309, 1985, + 382, 731, 382, 309, 775, 13, 440, 11923, 295, 8512, 300, 436, 362, 294, 300, 8186, + 51288], "temperature": 0.0, "avg_logprob": -0.11439579091173538, "compression_ratio": + 1.7228915662650603, "no_speech_prob": 0.004149710293859243}, {"id": 548, "seek": + 220708, "start": 2225.56, "end": 2230.44, "text": " is ridiculous. And to be able + to sort of fulfill realistic queries, especially if you consider", "tokens": [51288, + 307, 11083, 13, 400, 281, 312, 1075, 281, 1333, 295, 13875, 12465, 24109, 11, 2318, + 498, 291, 1949, 51532], "temperature": 0.0, "avg_logprob": -0.11439579091173538, + "compression_ratio": 1.7228915662650603, "no_speech_prob": 0.004149710293859243}, + {"id": 549, "seek": 220708, "start": 2230.44, "end": 2234.52, "text": " doing an + exact magic query for long terms across a huge index of documents, like how the + hell,", "tokens": [51532, 884, 364, 1900, 5585, 14581, 337, 938, 2115, 2108, 257, + 2603, 8186, 295, 8512, 11, 411, 577, 264, 4921, 11, 51736], "temperature": 0.0, + "avg_logprob": -0.11439579091173538, "compression_ratio": 1.7228915662650603, "no_speech_prob": + 0.004149710293859243}, {"id": 550, "seek": 223452, "start": 2235.24, "end": 2238.6, + "text": " like the quotation mark queries, I guess you could call them. Very interesting.", + "tokens": [50400, 411, 264, 47312, 1491, 24109, 11, 286, 2041, 291, 727, 818, 552, + 13, 4372, 1880, 13, 50568], "temperature": 0.0, "avg_logprob": -0.2885671191745334, + "compression_ratio": 1.6691176470588236, "no_speech_prob": 0.007981045171618462}, + {"id": 551, "seek": 223452, "start": 2238.6, "end": 2244.04, "text": " I think that''s + a real good thing. 
One of the things that I''ve found to help me evaluate the overall", + "tokens": [50568, 286, 519, 300, 311, 257, 957, 665, 551, 13, 1485, 295, 264, 721, + 300, 286, 600, 1352, 281, 854, 385, 13059, 264, 4787, 50840], "temperature": 0.0, + "avg_logprob": -0.2885671191745334, "compression_ratio": 1.6691176470588236, "no_speech_prob": + 0.007981045171618462}, {"id": 552, "seek": 223452, "start": 2246.2, "end": 2251.72, + "text": " the overall confidence level of that these texts and writings do is I + evaluate different choices.", "tokens": [50948, 264, 4787, 6687, 1496, 295, 300, + 613, 15765, 293, 30083, 360, 307, 286, 13059, 819, 7994, 13, 51224], "temperature": + 0.0, "avg_logprob": -0.2885671191745334, "compression_ratio": 1.6691176470588236, + "no_speech_prob": 0.007981045171618462}, {"id": 553, "seek": 223452, "start": 2252.44, + "end": 2257.32, "text": " So for instance, on classic.com, one of the options we''re + exploring is we have an enormous", "tokens": [51260, 407, 337, 5197, 11, 322, 7230, + 13, 1112, 11, 472, 295, 264, 3956, 321, 434, 12736, 307, 321, 362, 364, 11322, 51504], + "temperature": 0.0, "avg_logprob": -0.2885671191745334, "compression_ratio": 1.6691176470588236, + "no_speech_prob": 0.007981045171618462}, {"id": 554, "seek": 223452, "start": 2258.04, + "end": 2262.2, "text": " editor workflow. So when a new vehicle comes into the site, + we need to have a vehicle", "tokens": [51540, 9839, 20993, 13, 407, 562, 257, 777, + 5864, 1487, 666, 264, 3621, 11, 321, 643, 281, 362, 257, 5864, 51748], "temperature": + 0.0, "avg_logprob": -0.2885671191745334, "compression_ratio": 1.6691176470588236, + "no_speech_prob": 0.007981045171618462}, {"id": 555, "seek": 226220, "start": 2262.3599999999997, + "end": 2265.8799999999997, "text": " person is expert at that making model. 
Look + at the vehicle and determine what it is and answer", "tokens": [50372, 954, 307, + 5844, 412, 300, 1455, 2316, 13, 2053, 412, 264, 5864, 293, 6997, 437, 309, 307, + 293, 1867, 50548], "temperature": 0.0, "avg_logprob": -0.1676562403289365, "compression_ratio": + 1.7898089171974523, "no_speech_prob": 0.003732377430424094}, {"id": 556, "seek": + 226220, "start": 2265.8799999999997, "end": 2270.6, "text": " some questions about + it. Like what color is it? Has it been restored or is it in an original condition?", + "tokens": [50548, 512, 1651, 466, 309, 13, 1743, 437, 2017, 307, 309, 30, 8646, + 309, 668, 23143, 420, 307, 309, 294, 364, 3380, 4188, 30, 50784], "temperature": + 0.0, "avg_logprob": -0.1676562403289365, "compression_ratio": 1.7898089171974523, + "no_speech_prob": 0.003732377430424094}, {"id": 557, "seek": 226220, "start": 2270.6, + "end": 2275.3999999999996, "text": " So what we''re trying now is to actually use + clip for that. So I have a database of those,", "tokens": [50784, 407, 437, 321, + 434, 1382, 586, 307, 281, 767, 764, 7353, 337, 300, 13, 407, 286, 362, 257, 8149, + 295, 729, 11, 51024], "temperature": 0.0, "avg_logprob": -0.1676562403289365, "compression_ratio": + 1.7898089171974523, "no_speech_prob": 0.003732377430424094}, {"id": 558, "seek": + 226220, "start": 2275.3999999999996, "end": 2280.9199999999996, "text": " let''s + say, potential colors. And then I evaluate the image with clip and I say,", "tokens": + [51024, 718, 311, 584, 11, 3995, 4577, 13, 400, 550, 286, 13059, 264, 3256, 365, + 7353, 293, 286, 584, 11, 51300], "temperature": 0.0, "avg_logprob": -0.1676562403289365, + "compression_ratio": 1.7898089171974523, "no_speech_prob": 0.003732377430424094}, + {"id": 559, "seek": 226220, "start": 2281.72, "end": 2286.8399999999997, "text": + " picture of a red car, picture of a blue car, picture of a green car. 
And then + I look at all of them,", "tokens": [51340, 3036, 295, 257, 2182, 1032, 11, 3036, + 295, 257, 3344, 1032, 11, 3036, 295, 257, 3092, 1032, 13, 400, 550, 286, 574, 412, + 439, 295, 552, 11, 51596], "temperature": 0.0, "avg_logprob": -0.1676562403289365, + "compression_ratio": 1.7898089171974523, "no_speech_prob": 0.003732377430424094}, + {"id": 560, "seek": 226220, "start": 2286.8399999999997, "end": 2291.3999999999996, + "text": " and I determine what one obviously which one has the highest the closest + distance. But also,", "tokens": [51596, 293, 286, 6997, 437, 472, 2745, 597, 472, + 575, 264, 6343, 264, 13699, 4560, 13, 583, 611, 11, 51824], "temperature": 0.0, + "avg_logprob": -0.1676562403289365, "compression_ratio": 1.7898089171974523, "no_speech_prob": + 0.003732377430424094}, {"id": 561, "seek": 229140, "start": 2291.4, "end": 2296.12, + "text": " overall, did any of them have a close distance or were they all kind of + distant or were they", "tokens": [50364, 4787, 11, 630, 604, 295, 552, 362, 257, + 1998, 4560, 420, 645, 436, 439, 733, 295, 17275, 420, 645, 436, 50600], "temperature": + 0.0, "avg_logprob": -0.2019174893697103, "compression_ratio": 1.7391304347826086, + "no_speech_prob": 0.0016881530173122883}, {"id": 562, "seek": 229140, "start": 2296.12, + "end": 2303.56, "text": " all very far away from the embedding of the query. And + if so, then I tell myself that okay,", "tokens": [50600, 439, 588, 1400, 1314, 490, + 264, 12240, 3584, 295, 264, 14581, 13, 400, 498, 370, 11, 550, 286, 980, 2059, 300, + 1392, 11, 50972], "temperature": 0.0, "avg_logprob": -0.2019174893697103, "compression_ratio": + 1.7391304347826086, "no_speech_prob": 0.0016881530173122883}, {"id": 563, "seek": + 229140, "start": 2303.56, "end": 2308.76, "text": " we''re not answering this question + well. Right? 
Like the fact that it had no strong suggestion at all", "tokens": [50972, + 321, 434, 406, 13430, 341, 1168, 731, 13, 1779, 30, 1743, 264, 1186, 300, 309, 632, + 572, 2068, 16541, 412, 439, 51232], "temperature": 0.0, "avg_logprob": -0.2019174893697103, + "compression_ratio": 1.7391304347826086, "no_speech_prob": 0.0016881530173122883}, + {"id": 564, "seek": 229140, "start": 2308.76, "end": 2314.6, "text": " is in a way + a confidence factor or a confidence metric in a way. Which is fantastic. Like you''re", + "tokens": [51232, 307, 294, 257, 636, 257, 6687, 5952, 420, 257, 6687, 20678, 294, + 257, 636, 13, 3013, 307, 5456, 13, 1743, 291, 434, 51524], "temperature": 0.0, "avg_logprob": + -0.2019174893697103, "compression_ratio": 1.7391304347826086, "no_speech_prob": + 0.0016881530173122883}, {"id": 565, "seek": 229140, "start": 2314.6, "end": 2320.28, + "text": " able to find an answer to my question, which is like broad enough, I think, + but like essentially,", "tokens": [51524, 1075, 281, 915, 364, 1867, 281, 452, 1168, + 11, 597, 307, 411, 4152, 1547, 11, 286, 519, 11, 457, 411, 4476, 11, 51808], "temperature": + 0.0, "avg_logprob": -0.2019174893697103, "compression_ratio": 1.7391304347826086, + "no_speech_prob": 0.0016881530173122883}, {"id": 566, "seek": 232028, "start": 2320.28, + "end": 2324.44, "text": " you can use a threshold on the distance that didn''t cross + my mind at all. Like, yeah, you''re right.", "tokens": [50364, 291, 393, 764, 257, + 14678, 322, 264, 4560, 300, 994, 380, 3278, 452, 1575, 412, 439, 13, 1743, 11, 1338, + 11, 291, 434, 558, 13, 50572], "temperature": 0.0, "avg_logprob": -0.19012149047851562, + "compression_ratio": 1.6925795053003534, "no_speech_prob": 0.004254921339452267}, + {"id": 567, "seek": 232028, "start": 2324.44, "end": 2330.1200000000003, "text": + " Like you can define kind of like the confidence interval for these distances, + right? 
And you know,", "tokens": [50572, 1743, 291, 393, 6964, 733, 295, 411, 264, + 6687, 15035, 337, 613, 22182, 11, 558, 30, 400, 291, 458, 11, 50856], "temperature": + 0.0, "avg_logprob": -0.19012149047851562, "compression_ratio": 1.6925795053003534, + "no_speech_prob": 0.004254921339452267}, {"id": 568, "seek": 232028, "start": 2330.1200000000003, + "end": 2334.92, "text": " which metric you''re using and you know your data set + as well. Right? So you could go through the", "tokens": [50856, 597, 20678, 291, + 434, 1228, 293, 291, 458, 428, 1412, 992, 382, 731, 13, 1779, 30, 407, 291, 727, + 352, 807, 264, 51096], "temperature": 0.0, "avg_logprob": -0.19012149047851562, + "compression_ratio": 1.6925795053003534, "no_speech_prob": 0.004254921339452267}, + {"id": 569, "seek": 232028, "start": 2334.92, "end": 2339.8, "text": " meal of your + lab and check, okay, is this a good one? Is this a bad one? Yeah, that''s an amazing", + "tokens": [51096, 6791, 295, 428, 2715, 293, 1520, 11, 1392, 11, 307, 341, 257, + 665, 472, 30, 1119, 341, 257, 1578, 472, 30, 865, 11, 300, 311, 364, 2243, 51340], + "temperature": 0.0, "avg_logprob": -0.19012149047851562, "compression_ratio": 1.6925795053003534, + "no_speech_prob": 0.004254921339452267}, {"id": 570, "seek": 232028, "start": 2339.8, + "end": 2347.32, "text": " solution that you just came up with. 
And from the perspective + of the amount of art of,", "tokens": [51340, 3827, 300, 291, 445, 1361, 493, 365, + 13, 400, 490, 264, 4585, 295, 264, 2372, 295, 1523, 295, 11, 51716], "temperature": + 0.0, "avg_logprob": -0.19012149047851562, "compression_ratio": 1.6925795053003534, + "no_speech_prob": 0.004254921339452267}, {"id": 571, "seek": 234732, "start": 2347.32, + "end": 2351.32, "text": " like when you''re building a piece of software, you have + to say, how many little artifacts am I", "tokens": [50364, 411, 562, 291, 434, 2390, + 257, 2522, 295, 4722, 11, 291, 362, 281, 584, 11, 577, 867, 707, 24617, 669, 286, + 50564], "temperature": 0.0, "avg_logprob": -0.14281707920440256, "compression_ratio": + 1.7337278106508875, "no_speech_prob": 0.00410262169316411}, {"id": 572, "seek": + 234732, "start": 2351.32, "end": 2354.36, "text": " creating here? What do I have + to actually do? Am I creating a lot of stuff? Am I just creating a", "tokens": [50564, + 4084, 510, 30, 708, 360, 286, 362, 281, 767, 360, 30, 2012, 286, 4084, 257, 688, + 295, 1507, 30, 2012, 286, 445, 4084, 257, 50716], "temperature": 0.0, "avg_logprob": + -0.14281707920440256, "compression_ratio": 1.7337278106508875, "no_speech_prob": + 0.00410262169316411}, {"id": 573, "seek": 234732, "start": 2354.36, "end": 2360.6800000000003, + "text": " little bit of stuff that works for a broad range of data and use cases? 
+ Now with clip, you get so much", "tokens": [50716, 707, 857, 295, 1507, 300, 1985, + 337, 257, 4152, 3613, 295, 1412, 293, 764, 3331, 30, 823, 365, 7353, 11, 291, 483, + 370, 709, 51032], "temperature": 0.0, "avg_logprob": -0.14281707920440256, "compression_ratio": + 1.7337278106508875, "no_speech_prob": 0.00410262169316411}, {"id": 574, "seek": + 234732, "start": 2361.6400000000003, "end": 2366.04, "text": " for free, quote unquote, + like that whole question answering system that I used to implement that", "tokens": + [51080, 337, 1737, 11, 6513, 37557, 11, 411, 300, 1379, 1168, 13430, 1185, 300, + 286, 1143, 281, 4445, 300, 51300], "temperature": 0.0, "avg_logprob": -0.14281707920440256, + "compression_ratio": 1.7337278106508875, "no_speech_prob": 0.00410262169316411}, + {"id": 575, "seek": 234732, "start": 2367.2400000000002, "end": 2370.36, "text": + " took a couple hours. And of course, I''m going to take a lot of tweaking, but + compared to training,", "tokens": [51360, 1890, 257, 1916, 2496, 13, 400, 295, 1164, + 11, 286, 478, 516, 281, 747, 257, 688, 295, 6986, 2456, 11, 457, 5347, 281, 3097, + 11, 51516], "temperature": 0.0, "avg_logprob": -0.14281707920440256, "compression_ratio": + 1.7337278106508875, "no_speech_prob": 0.00410262169316411}, {"id": 576, "seek": + 234732, "start": 2370.36, "end": 2374.28, "text": " a bunch of image classifiers + to answer the same task, which would take me an enormous amount of", "tokens": [51516, + 257, 3840, 295, 3256, 1508, 23463, 281, 1867, 264, 912, 5633, 11, 597, 576, 747, + 385, 364, 11322, 2372, 295, 51712], "temperature": 0.0, "avg_logprob": -0.14281707920440256, + "compression_ratio": 1.7337278106508875, "no_speech_prob": 0.00410262169316411}, + {"id": 577, "seek": 237428, "start": 2374.28, "end": 2377.5600000000004, "text": + " effort I would have to have, you know, we have seven different attributes. 
So + there have to be seven", "tokens": [50364, 4630, 286, 576, 362, 281, 362, 11, 291, + 458, 11, 321, 362, 3407, 819, 17212, 13, 407, 456, 362, 281, 312, 3407, 50528], + "temperature": 0.0, "avg_logprob": -0.13637736072279003, "compression_ratio": 1.727536231884058, + "no_speech_prob": 0.0012051535304635763}, {"id": 578, "seek": 237428, "start": 2377.5600000000004, + "end": 2381.88, "text": " different models, hundreds of thousands of training images + for each, a very elaborate process of", "tokens": [50528, 819, 5245, 11, 6779, 295, + 5383, 295, 3097, 5267, 337, 1184, 11, 257, 588, 20945, 1399, 295, 50744], "temperature": + 0.0, "avg_logprob": -0.13637736072279003, "compression_ratio": 1.727536231884058, + "no_speech_prob": 0.0012051535304635763}, {"id": 579, "seek": 237428, "start": 2381.88, + "end": 2386.44, "text": " manually correlating them with clip. I just got all that + quickly. And again, it''s not super accurate,", "tokens": [50744, 16945, 13983, + 990, 552, 365, 7353, 13, 286, 445, 658, 439, 300, 2661, 13, 400, 797, 11, 309, 311, + 406, 1687, 8559, 11, 50972], "temperature": 0.0, "avg_logprob": -0.13637736072279003, + "compression_ratio": 1.727536231884058, "no_speech_prob": 0.0012051535304635763}, + {"id": 580, "seek": 237428, "start": 2386.44, "end": 2391.8, "text": " but it gives + you a building block that you can just apply everywhere. And you know, if at some + point,", "tokens": [50972, 457, 309, 2709, 291, 257, 2390, 3461, 300, 291, 393, + 445, 3079, 5315, 13, 400, 291, 458, 11, 498, 412, 512, 935, 11, 51240], "temperature": + 0.0, "avg_logprob": -0.13637736072279003, "compression_ratio": 1.727536231884058, + "no_speech_prob": 0.0012051535304635763}, {"id": 581, "seek": 237428, "start": 2391.8, + "end": 2395.96, "text": " I wanted to find other vehicles like this one, that same + model works. 
If I want to find, if this", "tokens": [51240, 286, 1415, 281, 915, + 661, 8948, 411, 341, 472, 11, 300, 912, 2316, 1985, 13, 759, 286, 528, 281, 915, + 11, 498, 341, 51448], "temperature": 0.0, "avg_logprob": -0.13637736072279003, "compression_ratio": + 1.727536231884058, "no_speech_prob": 0.0012051535304635763}, {"id": 582, "seek": + 237428, "start": 2395.96, "end": 2400.6800000000003, "text": " matches a certain + given piece of text, like, is this a Ford Mustang? That model works. It''s just,", + "tokens": [51448, 10676, 257, 1629, 2212, 2522, 295, 2487, 11, 411, 11, 307, 341, + 257, 11961, 37115, 30, 663, 2316, 1985, 13, 467, 311, 445, 11, 51684], "temperature": + 0.0, "avg_logprob": -0.13637736072279003, "compression_ratio": 1.727536231884058, + "no_speech_prob": 0.0012051535304635763}, {"id": 583, "seek": 240068, "start": 2401.3999999999996, + "end": 2405.0, "text": " I don''t know, it''s really, really, really amazing. Yeah, + it''s mind blowing that, you know,", "tokens": [50400, 286, 500, 380, 458, 11, 309, + 311, 534, 11, 534, 11, 534, 2243, 13, 865, 11, 309, 311, 1575, 15068, 300, 11, 291, + 458, 11, 50580], "temperature": 0.0, "avg_logprob": -0.15706961304991396, "compression_ratio": + 1.8534201954397393, "no_speech_prob": 0.019258173182606697}, {"id": 584, "seek": + 240068, "start": 2405.0, "end": 2410.04, "text": " science, as you said, you know, + somebody in the science world thought about this problem,", "tokens": [50580, 3497, + 11, 382, 291, 848, 11, 291, 458, 11, 2618, 294, 264, 3497, 1002, 1194, 466, 341, + 1154, 11, 50832], "temperature": 0.0, "avg_logprob": -0.15706961304991396, "compression_ratio": + 1.8534201954397393, "no_speech_prob": 0.019258173182606697}, {"id": 585, "seek": + 240068, "start": 2410.04, "end": 2415.0, "text": " and they came up with some really + great solution that you can actually use. 
But when you discovered", "tokens": [50832, + 293, 436, 1361, 493, 365, 512, 534, 869, 3827, 300, 291, 393, 767, 764, 13, 583, + 562, 291, 6941, 51080], "temperature": 0.0, "avg_logprob": -0.15706961304991396, + "compression_ratio": 1.8534201954397393, "no_speech_prob": 0.019258173182606697}, + {"id": 586, "seek": 240068, "start": 2415.0, "end": 2420.52, "text": " that clip + works so well, did you get amazed at the point of going back and reading the paper,", + "tokens": [51080, 300, 7353, 1985, 370, 731, 11, 630, 291, 483, 20507, 412, 264, + 935, 295, 516, 646, 293, 3760, 264, 3035, 11, 51356], "temperature": 0.0, "avg_logprob": + -0.15706961304991396, "compression_ratio": 1.8534201954397393, "no_speech_prob": + 0.019258173182606697}, {"id": 587, "seek": 240068, "start": 2420.52, "end": 2426.12, + "text": " or are you not interested in papers? I flipped to the paper. I''m interested + in papers to some extent.", "tokens": [51356, 420, 366, 291, 406, 3102, 294, 10577, + 30, 286, 26273, 281, 264, 3035, 13, 286, 478, 3102, 294, 10577, 281, 512, 8396, + 13, 51636], "temperature": 0.0, "avg_logprob": -0.15706961304991396, "compression_ratio": + 1.8534201954397393, "no_speech_prob": 0.019258173182606697}, {"id": 588, "seek": + 240068, "start": 2426.12, "end": 2429.56, "text": " I know some people like take + a week off of work to read paper and stuff like that. I''m like,", "tokens": [51636, + 286, 458, 512, 561, 411, 747, 257, 1243, 766, 295, 589, 281, 1401, 3035, 293, 1507, + 411, 300, 13, 286, 478, 411, 11, 51808], "temperature": 0.0, "avg_logprob": -0.15706961304991396, + "compression_ratio": 1.8534201954397393, "no_speech_prob": 0.019258173182606697}, + {"id": 589, "seek": 243068, "start": 2430.68, "end": 2436.3599999999997, "text": + " wow. You know, I don''t come from a math background. 
I come from more of a practitioner + programmer", "tokens": [50364, 6076, 13, 509, 458, 11, 286, 500, 380, 808, 490, + 257, 5221, 3678, 13, 286, 808, 490, 544, 295, 257, 32125, 32116, 50648], "temperature": + 0.0, "avg_logprob": -0.16955054578163642, "compression_ratio": 1.7446808510638299, + "no_speech_prob": 0.009962506592273712}, {"id": 590, "seek": 243068, "start": 2436.3599999999997, + "end": 2442.2799999999997, "text": " background. So for me, I actually prefer sometimes + to study the code and to understand, like,", "tokens": [50648, 3678, 13, 407, 337, + 385, 11, 286, 767, 4382, 2171, 281, 2979, 264, 3089, 293, 281, 1223, 11, 411, 11, + 50944], "temperature": 0.0, "avg_logprob": -0.16955054578163642, "compression_ratio": + 1.7446808510638299, "no_speech_prob": 0.009962506592273712}, {"id": 591, "seek": + 243068, "start": 2442.2799999999997, "end": 2445.64, "text": " a lot of times their + usage instructions will give you a lot of like subtle information about", "tokens": + [50944, 257, 688, 295, 1413, 641, 14924, 9415, 486, 976, 291, 257, 688, 295, 411, + 13743, 1589, 466, 51112], "temperature": 0.0, "avg_logprob": -0.16955054578163642, + "compression_ratio": 1.7446808510638299, "no_speech_prob": 0.009962506592273712}, + {"id": 592, "seek": 243068, "start": 2445.64, "end": 2450.04, "text": " ways that + you should and should not apply it. So I kind of stand that space for the most part.", + "tokens": [51112, 2098, 300, 291, 820, 293, 820, 406, 3079, 309, 13, 407, 286, 733, + 295, 1463, 300, 1901, 337, 264, 881, 644, 13, 51332], "temperature": 0.0, "avg_logprob": + -0.16955054578163642, "compression_ratio": 1.7446808510638299, "no_speech_prob": + 0.009962506592273712}, {"id": 593, "seek": 243068, "start": 2450.8399999999997, + "end": 2455.7999999999997, "text": " But I am definitely paying a lot of attention + to all, to all embeddings at this point. 
And I", "tokens": [51372, 583, 286, 669, + 2138, 6229, 257, 688, 295, 3202, 281, 439, 11, 281, 439, 12240, 29432, 412, 341, + 935, 13, 400, 286, 51620], "temperature": 0.0, "avg_logprob": -0.16955054578163642, + "compression_ratio": 1.7446808510638299, "no_speech_prob": 0.009962506592273712}, + {"id": 594, "seek": 243068, "start": 2455.7999999999997, "end": 2460.12, "text": + " feel like this is like, especially the multi-volta ones, once they start including + the video content,", "tokens": [51620, 841, 411, 341, 307, 411, 11, 2318, 264, 4825, + 12, 9646, 1328, 2306, 11, 1564, 436, 722, 3009, 264, 960, 2701, 11, 51836], "temperature": + 0.0, "avg_logprob": -0.16955054578163642, "compression_ratio": 1.7446808510638299, + "no_speech_prob": 0.009962506592273712}, {"id": 595, "seek": 246012, "start": 2460.12, + "end": 2464.44, "text": " and once we can run audio through there as well, it''s + just going to be a really exciting time to", "tokens": [50364, 293, 1564, 321, 393, + 1190, 6278, 807, 456, 382, 731, 11, 309, 311, 445, 516, 281, 312, 257, 534, 4670, + 565, 281, 50580], "temperature": 0.0, "avg_logprob": -0.12377304840087891, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 0.007017049472779036}, {"id": 596, "seek": + 246012, "start": 2464.44, "end": 2469.88, "text": " be alive. Yeah, I think it''s + great that you''re looking at the code because it''s like several levels", "tokens": + [50580, 312, 5465, 13, 865, 11, 286, 519, 309, 311, 869, 300, 291, 434, 1237, 412, + 264, 3089, 570, 309, 311, 411, 2940, 4358, 50852], "temperature": 0.0, "avg_logprob": + -0.12377304840087891, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.007017049472779036}, {"id": 597, "seek": 246012, "start": 2469.88, "end": 2475.7999999999997, + "text": " of abstraction, right? First, you''re trying to understand, okay, is this + useful? 
Okay, it is.", "tokens": [50852, 295, 37765, 11, 558, 30, 2386, 11, 291, + 434, 1382, 281, 1223, 11, 1392, 11, 307, 341, 4420, 30, 1033, 11, 309, 307, 13, + 51148], "temperature": 0.0, "avg_logprob": -0.12377304840087891, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 0.007017049472779036}, {"id": 598, "seek": + 246012, "start": 2475.7999999999997, "end": 2481.72, "text": " How does it work? + Maybe what are the limitations? What are the advantages? Then you go to the paper,", + "tokens": [51148, 1012, 775, 309, 589, 30, 2704, 437, 366, 264, 15705, 30, 708, + 366, 264, 14906, 30, 1396, 291, 352, 281, 264, 3035, 11, 51444], "temperature": + 0.0, "avg_logprob": -0.12377304840087891, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.007017049472779036}, {"id": 599, "seek": 246012, "start": 2481.72, + "end": 2486.2, "text": " so where they, of course, beautifully describe the algorithm + and they say it''s the best. So it", "tokens": [51444, 370, 689, 436, 11, 295, 1164, + 11, 16525, 6786, 264, 9284, 293, 436, 584, 309, 311, 264, 1151, 13, 407, 309, 51668], + "temperature": 0.0, "avg_logprob": -0.12377304840087891, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 0.007017049472779036}, {"id": 600, "seek": 248620, "start": 2486.2, + "end": 2491.16, "text": " beats the state of the art and, you know, over the previous + work. But the problem is that there is", "tokens": [50364, 16447, 264, 1785, 295, + 264, 1523, 293, 11, 291, 458, 11, 670, 264, 3894, 589, 13, 583, 264, 1154, 307, + 300, 456, 307, 50612], "temperature": 0.0, "avg_logprob": -0.1436243874686105, "compression_ratio": + 1.779783393501805, "no_speech_prob": 0.030283942818641663}, {"id": 601, "seek": + 248620, "start": 2491.16, "end": 2497.3999999999996, "text": " always a gap, or + usually there is a gap between the paper and the code. 
So if they publish the open + source", "tokens": [50612, 1009, 257, 7417, 11, 420, 2673, 456, 307, 257, 7417, + 1296, 264, 3035, 293, 264, 3089, 13, 407, 498, 436, 11374, 264, 1269, 4009, 50924], + "temperature": 0.0, "avg_logprob": -0.1436243874686105, "compression_ratio": 1.779783393501805, + "no_speech_prob": 0.030283942818641663}, {"id": 602, "seek": 248620, "start": 2497.3999999999996, + "end": 2502.2, "text": " code, you go there and oh my god, there''s like a bunch + of additional hacks on top of the paper to", "tokens": [50924, 3089, 11, 291, 352, + 456, 293, 1954, 452, 3044, 11, 456, 311, 411, 257, 3840, 295, 4497, 33617, 322, + 1192, 295, 264, 3035, 281, 51164], "temperature": 0.0, "avg_logprob": -0.1436243874686105, + "compression_ratio": 1.779783393501805, "no_speech_prob": 0.030283942818641663}, + {"id": 603, "seek": 248620, "start": 2502.2, "end": 2508.8399999999997, "text": + " make it really work. Oh, I see. So yeah, it''s amazing that you go back and read + the code. And are", "tokens": [51164, 652, 309, 534, 589, 13, 876, 11, 286, 536, + 13, 407, 1338, 11, 309, 311, 2243, 300, 291, 352, 646, 293, 1401, 264, 3089, 13, + 400, 366, 51496], "temperature": 0.0, "avg_logprob": -0.1436243874686105, "compression_ratio": + 1.779783393501805, "no_speech_prob": 0.030283942818641663}, {"id": 604, "seek": + 248620, "start": 2508.8399999999997, "end": 2515.8799999999997, "text": " you getting + scared of reading like C++ code? No, no, no, no. In a way, C++ is like, it''s so", + "tokens": [51496, 291, 1242, 5338, 295, 3760, 411, 383, 25472, 3089, 30, 883, 11, + 572, 11, 572, 11, 572, 13, 682, 257, 636, 11, 383, 25472, 307, 411, 11, 309, 311, + 370, 51848], "temperature": 0.0, "avg_logprob": -0.1436243874686105, "compression_ratio": + 1.779783393501805, "no_speech_prob": 0.030283942818641663}, {"id": 605, "seek": + 251588, "start": 2515.88, "end": 2519.2400000000002, "text": " much, you know, it''s + different. It''s a different part of your brain. 
You know, C++ is so much", "tokens": + [50364, 709, 11, 291, 458, 11, 309, 311, 819, 13, 467, 311, 257, 819, 644, 295, + 428, 3567, 13, 509, 458, 11, 383, 25472, 307, 370, 709, 50532], "temperature": 0.0, + "avg_logprob": -0.1507323645298777, "compression_ratio": 1.8074534161490683, "no_speech_prob": + 0.0018829114269465208}, {"id": 606, "seek": 251588, "start": 2519.2400000000002, + "end": 2525.8, "text": " simpler in a certain sense that every, every line there + has a specific action at a specific point", "tokens": [50532, 18587, 294, 257, 1629, + 2020, 300, 633, 11, 633, 1622, 456, 575, 257, 2685, 3069, 412, 257, 2685, 935, 50860], + "temperature": 0.0, "avg_logprob": -0.1507323645298777, "compression_ratio": 1.8074534161490683, + "no_speech_prob": 0.0018829114269465208}, {"id": 607, "seek": 251588, "start": 2525.8, + "end": 2530.12, "text": " in the code. Like every line there has a certain meaning + with, let''s say a model and PyTorch,", "tokens": [50860, 294, 264, 3089, 13, 1743, + 633, 1622, 456, 575, 257, 1629, 3620, 365, 11, 718, 311, 584, 257, 2316, 293, 9953, + 51, 284, 339, 11, 51076], "temperature": 0.0, "avg_logprob": -0.1507323645298777, + "compression_ratio": 1.8074534161490683, "no_speech_prob": 0.0018829114269465208}, + {"id": 608, "seek": 251588, "start": 2530.12, "end": 2534.84, "text": " there''s + a lot of like, for instance, like if your normalization is wrong, right? It''s hard + to tell", "tokens": [51076, 456, 311, 257, 688, 295, 411, 11, 337, 5197, 11, 411, + 498, 428, 2710, 2144, 307, 2085, 11, 558, 30, 467, 311, 1152, 281, 980, 51312], + "temperature": 0.0, "avg_logprob": -0.1507323645298777, "compression_ratio": 1.8074534161490683, + "no_speech_prob": 0.0018829114269465208}, {"id": 609, "seek": 251588, "start": 2534.84, + "end": 2538.04, "text": " that. 
And it''s hard to even see that except for watching + a training curve and guessing and sort", "tokens": [51312, 300, 13, 400, 309, 311, + 1152, 281, 754, 536, 300, 3993, 337, 1976, 257, 3097, 7605, 293, 17939, 293, 1333, + 51472], "temperature": 0.0, "avg_logprob": -0.1507323645298777, "compression_ratio": + 1.8074534161490683, "no_speech_prob": 0.0018829114269465208}, {"id": 610, "seek": + 251588, "start": 2538.04, "end": 2543.7200000000003, "text": " of hoping. And maybe + that''s part of my, maybe that speaks to my skillset, but I definitely think that", + "tokens": [51472, 295, 7159, 13, 400, 1310, 300, 311, 644, 295, 452, 11, 1310, 300, + 10789, 281, 452, 3942, 302, 11, 457, 286, 2138, 519, 300, 51756], "temperature": + 0.0, "avg_logprob": -0.1507323645298777, "compression_ratio": 1.8074534161490683, + "no_speech_prob": 0.0018829114269465208}, {"id": 611, "seek": 254372, "start": 2544.6, + "end": 2548.6, "text": " that the machine learning model is brilliant because it''s + such a small amount of code that can do so", "tokens": [50408, 300, 264, 3479, 2539, + 2316, 307, 10248, 570, 309, 311, 1270, 257, 1359, 2372, 295, 3089, 300, 393, 360, + 370, 50608], "temperature": 0.0, "avg_logprob": -0.2083967673678358, "compression_ratio": + 1.6245733788395904, "no_speech_prob": 0.012448362074792385}, {"id": 612, "seek": + 254372, "start": 2548.6, "end": 2553.8799999999997, "text": " much. Whereas C++ + stuff is interesting because everything is excruciatingly carefully defined.", "tokens": + [50608, 709, 13, 13813, 383, 25472, 1507, 307, 1880, 570, 1203, 307, 1624, 894, + 537, 990, 356, 7500, 7642, 13, 50872], "temperature": 0.0, "avg_logprob": -0.2083967673678358, + "compression_ratio": 1.6245733788395904, "no_speech_prob": 0.012448362074792385}, + {"id": 613, "seek": 254372, "start": 2553.8799999999997, "end": 2557.8799999999997, + "text": " So it''s kind of two separate sides, but both you to hold in their own + way. 
Yeah, absolutely,", "tokens": [50872, 407, 309, 311, 733, 295, 732, 4994, 4881, + 11, 457, 1293, 291, 281, 1797, 294, 641, 1065, 636, 13, 865, 11, 3122, 11, 51072], + "temperature": 0.0, "avg_logprob": -0.2083967673678358, "compression_ratio": 1.6245733788395904, + "no_speech_prob": 0.012448362074792385}, {"id": 614, "seek": 254372, "start": 2557.8799999999997, + "end": 2563.48, "text": " absolutely. Especially when they get combined. So you''re + like, you''re doing some model on C++", "tokens": [51072, 3122, 13, 8545, 562, 436, + 483, 9354, 13, 407, 291, 434, 411, 11, 291, 434, 884, 512, 2316, 322, 383, 25472, + 51352], "temperature": 0.0, "avg_logprob": -0.2083967673678358, "compression_ratio": + 1.6245733788395904, "no_speech_prob": 0.012448362074792385}, {"id": 615, "seek": + 254372, "start": 2563.48, "end": 2569.24, "text": " because for example, H&SW the + graph algorithm is implemented in C++, which looks like C. So I", "tokens": [51352, + 570, 337, 1365, 11, 389, 5, 50, 54, 264, 4295, 9284, 307, 12270, 294, 383, 25472, + 11, 597, 1542, 411, 383, 13, 407, 286, 51640], "temperature": 0.0, "avg_logprob": + -0.2083967673678358, "compression_ratio": 1.6245733788395904, "no_speech_prob": + 0.012448362074792385}, {"id": 616, "seek": 256924, "start": 2569.3199999999997, + "end": 2575.64, "text": " took a look at it with one of my colleagues and he was + like, wow, this is not the modern C++ code.", "tokens": [50368, 1890, 257, 574, + 412, 309, 365, 472, 295, 452, 7734, 293, 415, 390, 411, 11, 6076, 11, 341, 307, + 406, 264, 4363, 383, 25472, 3089, 13, 50684], "temperature": 0.0, "avg_logprob": + -0.19346286178728855, "compression_ratio": 1.6753246753246753, "no_speech_prob": + 0.01894608698785305}, {"id": 617, "seek": 256924, "start": 2575.64, "end": 2582.04, + "text": " And yeah, it''s like basic, it looks very basic in a way. 
Of course, they + use some C++ elements,", "tokens": [50684, 400, 1338, 11, 309, 311, 411, 3875, 11, + 309, 1542, 588, 3875, 294, 257, 636, 13, 2720, 1164, 11, 436, 764, 512, 383, 25472, + 4959, 11, 51004], "temperature": 0.0, "avg_logprob": -0.19346286178728855, "compression_ratio": + 1.6753246753246753, "no_speech_prob": 0.01894608698785305}, {"id": 618, "seek": + 256924, "start": 2582.04, "end": 2589.08, "text": " but like, for example, they + allocate memory with like this level of mal-locks. Yeah, and you''re like,", "tokens": + [51004, 457, 411, 11, 337, 1365, 11, 436, 35713, 4675, 365, 411, 341, 1496, 295, + 2806, 12, 34896, 13, 865, 11, 293, 291, 434, 411, 11, 51356], "temperature": 0.0, + "avg_logprob": -0.19346286178728855, "compression_ratio": 1.6753246753246753, "no_speech_prob": + 0.01894608698785305}, {"id": 619, "seek": 256924, "start": 2589.08, "end": 2595.3199999999997, + "text": " wow, yeah, you''re doing that. And then some other companies, for example, + like semi, which", "tokens": [51356, 6076, 11, 1338, 11, 291, 434, 884, 300, 13, + 400, 550, 512, 661, 3431, 11, 337, 1365, 11, 411, 12909, 11, 597, 51668], "temperature": + 0.0, "avg_logprob": -0.19346286178728855, "compression_ratio": 1.6753246753246753, + "no_speech_prob": 0.01894608698785305}, {"id": 620, "seek": 259532, "start": 2595.32, + "end": 2602.1200000000003, "text": " build the aviate or quadrant, like basically + re-implement the algorithm and their language of", "tokens": [50364, 1322, 264, + 1305, 13024, 420, 46856, 11, 411, 1936, 319, 12, 332, 43704, 264, 9284, 293, 641, + 2856, 295, 50704], "temperature": 0.0, "avg_logprob": -0.20520151072535023, "compression_ratio": + 1.5982905982905984, "no_speech_prob": 0.005057184491306543}, {"id": 621, "seek": + 259532, "start": 2602.1200000000003, "end": 2608.6000000000004, "text": " choice, + go or rest or whatever. 
So because you feel probably better after understanding + each", "tokens": [50704, 3922, 11, 352, 420, 1472, 420, 2035, 13, 407, 570, 291, + 841, 1391, 1101, 934, 3701, 1184, 51028], "temperature": 0.0, "avg_logprob": -0.20520151072535023, + "compression_ratio": 1.5982905982905984, "no_speech_prob": 0.005057184491306543}, + {"id": 622, "seek": 259532, "start": 2608.6000000000004, "end": 2613.7200000000003, + "text": " bit there and then you can also control it in the way you want, especially + after listening to users.", "tokens": [51028, 857, 456, 293, 550, 291, 393, 611, + 1969, 309, 294, 264, 636, 291, 528, 11, 2318, 934, 4764, 281, 5022, 13, 51284], + "temperature": 0.0, "avg_logprob": -0.20520151072535023, "compression_ratio": 1.5982905982905984, + "no_speech_prob": 0.005057184491306543}, {"id": 623, "seek": 259532, "start": 2614.92, + "end": 2619.7200000000003, "text": " Yeah, and every, I think that it''s kind of + like they''re living in different, different", "tokens": [51344, 865, 11, 293, 633, + 11, 286, 519, 300, 309, 311, 733, 295, 411, 436, 434, 2647, 294, 819, 11, 819, 51584], + "temperature": 0.0, "avg_logprob": -0.20520151072535023, "compression_ratio": 1.5982905982905984, + "no_speech_prob": 0.005057184491306543}, {"id": 624, "seek": 261972, "start": 2619.72, + "end": 2626.4399999999996, "text": " computational spaces in a sense, like what + they''re expressing and what that line of code does is", "tokens": [50364, 28270, + 7673, 294, 257, 2020, 11, 411, 437, 436, 434, 22171, 293, 437, 300, 1622, 295, 3089, + 775, 307, 50700], "temperature": 0.0, "avg_logprob": -0.17500666852267283, "compression_ratio": + 1.7418181818181817, "no_speech_prob": 0.016194012016057968}, {"id": 625, "seek": + 261972, "start": 2626.4399999999996, "end": 2630.4399999999996, "text": " the complete + opposite from the perspective of what is what we''re committing to the machine here.", + "tokens": [50700, 264, 3566, 6182, 490, 264, 4585, 295, 437, 307, 437, 321, 434, + 
26659, 281, 264, 3479, 510, 13, 50900], "temperature": 0.0, "avg_logprob": -0.17500666852267283, + "compression_ratio": 1.7418181818181817, "no_speech_prob": 0.016194012016057968}, + {"id": 626, "seek": 261972, "start": 2630.4399999999996, "end": 2634.7599999999998, + "text": " You know, the machine learning models, we''re building a framework in + which it''s capable of", "tokens": [50900, 509, 458, 11, 264, 3479, 2539, 5245, + 11, 321, 434, 2390, 257, 8388, 294, 597, 309, 311, 8189, 295, 51116], "temperature": + 0.0, "avg_logprob": -0.17500666852267283, "compression_ratio": 1.7418181818181817, + "no_speech_prob": 0.016194012016057968}, {"id": 627, "seek": 261972, "start": 2634.7599999999998, + "end": 2640.7599999999998, "text": " learning something in a C++ based or any imperative + environment like that. We''re expressing", "tokens": [51116, 2539, 746, 294, 257, + 383, 25472, 2361, 420, 604, 32490, 2823, 411, 300, 13, 492, 434, 22171, 51416], + "temperature": 0.0, "avg_logprob": -0.17500666852267283, "compression_ratio": 1.7418181818181817, + "no_speech_prob": 0.016194012016057968}, {"id": 628, "seek": 261972, "start": 2641.56, + "end": 2647.24, "text": " everything you can specifically do. It''s almost the opposite. + It''s kind of interesting to think about.", "tokens": [51456, 1203, 291, 393, 4682, + 360, 13, 467, 311, 1920, 264, 6182, 13, 467, 311, 733, 295, 1880, 281, 519, 466, + 13, 51740], "temperature": 0.0, "avg_logprob": -0.17500666852267283, "compression_ratio": + 1.7418181818181817, "no_speech_prob": 0.016194012016057968}, {"id": 629, "seek": + 264724, "start": 2647.24, "end": 2652.52, "text": " Yeah, exactly. 
Hey, Tom, it + was really great talking to you, but I was still thinking like if you", "tokens": + [50364, 865, 11, 2293, 13, 1911, 11, 5041, 11, 309, 390, 534, 869, 1417, 281, 291, + 11, 457, 286, 390, 920, 1953, 411, 498, 291, 50628], "temperature": 0.0, "avg_logprob": + -0.13145134129475072, "compression_ratio": 1.6040816326530611, "no_speech_prob": + 0.0031567849218845367}, {"id": 630, "seek": 264724, "start": 2652.52, "end": 2658.68, + "text": " can spend a few minutes and if you are not averse to philosophy, like + I like to ask this question", "tokens": [50628, 393, 3496, 257, 1326, 2077, 293, + 498, 291, 366, 406, 257, 4308, 281, 10675, 11, 411, 286, 411, 281, 1029, 341, 1168, + 50936], "temperature": 0.0, "avg_logprob": -0.13145134129475072, "compression_ratio": + 1.6040816326530611, "no_speech_prob": 0.0031567849218845367}, {"id": 631, "seek": + 264724, "start": 2658.68, "end": 2665.64, "text": " also to each guest on my on + my podcast, like considering that vector search is an emerging field in", "tokens": + [50936, 611, 281, 1184, 8341, 322, 452, 322, 452, 7367, 11, 411, 8079, 300, 8062, + 3164, 307, 364, 14989, 2519, 294, 51284], "temperature": 0.0, "avg_logprob": -0.13145134129475072, + "compression_ratio": 1.6040816326530611, "no_speech_prob": 0.0031567849218845367}, + {"id": 632, "seek": 264724, "start": 2665.64, "end": 2671.7999999999997, "text": + " many ways. We don''t know yet if it will fully replace the traditional search + or if they will work", "tokens": [51284, 867, 2098, 13, 492, 500, 380, 458, 1939, + 498, 309, 486, 4498, 7406, 264, 5164, 3164, 420, 498, 436, 486, 589, 51592], "temperature": + 0.0, "avg_logprob": -0.13145134129475072, "compression_ratio": 1.6040816326530611, + "no_speech_prob": 0.0031567849218845367}, {"id": 633, "seek": 267180, "start": 2672.6000000000004, + "end": 2677.88, "text": " together. But in general, like what makes you excited? + Why are you doing this? 
Like what,", "tokens": [50404, 1214, 13, 583, 294, 2674, + 11, 411, 437, 1669, 291, 2919, 30, 1545, 366, 291, 884, 341, 30, 1743, 437, 11, + 50668], "temperature": 0.0, "avg_logprob": -0.2140679486029971, "compression_ratio": + 1.7712177121771218, "no_speech_prob": 0.017046816647052765}, {"id": 634, "seek": + 267180, "start": 2677.88, "end": 2683.2400000000002, "text": " what keeps you going + and exploring this field today? I have a very simple answer for you. I''m tired", + "tokens": [50668, 437, 5965, 291, 516, 293, 12736, 341, 2519, 965, 30, 286, 362, + 257, 588, 2199, 1867, 337, 291, 13, 286, 478, 5868, 50936], "temperature": 0.0, + "avg_logprob": -0.2140679486029971, "compression_ratio": 1.7712177121771218, "no_speech_prob": + 0.017046816647052765}, {"id": 635, "seek": 267180, "start": 2683.2400000000002, + "end": 2689.2400000000002, "text": " of writing if statements. So you want to piggyback + on some complex models that the trends are.", "tokens": [50936, 295, 3579, 498, + 12363, 13, 407, 291, 528, 281, 39349, 3207, 322, 512, 3997, 5245, 300, 264, 13892, + 366, 13, 51236], "temperature": 0.0, "avg_logprob": -0.2140679486029971, "compression_ratio": + 1.7712177121771218, "no_speech_prob": 0.017046816647052765}, {"id": 636, "seek": + 267180, "start": 2689.2400000000002, "end": 2694.44, "text": " I want to show the + machine examples of it working correctly and examples of it working incorrectly", + "tokens": [51236, 286, 528, 281, 855, 264, 3479, 5110, 295, 309, 1364, 8944, 293, + 5110, 295, 309, 1364, 42892, 51496], "temperature": 0.0, "avg_logprob": -0.2140679486029971, + "compression_ratio": 1.7712177121771218, "no_speech_prob": 0.017046816647052765}, + {"id": 637, "seek": 267180, "start": 2694.44, "end": 2699.0800000000004, "text": + " and the machine learns exactly what those if statements should be. 
I mean, it''s + the idea that we", "tokens": [51496, 293, 264, 3479, 27152, 2293, 437, 729, 498, + 12363, 820, 312, 13, 286, 914, 11, 309, 311, 264, 1558, 300, 321, 51728], "temperature": + 0.0, "avg_logprob": -0.2140679486029971, "compression_ratio": 1.7712177121771218, + "no_speech_prob": 0.017046816647052765}, {"id": 638, "seek": 269908, "start": 2699.08, + "end": 2704.44, "text": " have to train something by illustrating every possible + variation of it is just insane. If imagine", "tokens": [50364, 362, 281, 3847, 746, + 538, 8490, 8754, 633, 1944, 12990, 295, 309, 307, 445, 10838, 13, 759, 3811, 50632], + "temperature": 0.0, "avg_logprob": -0.19227474076407297, "compression_ratio": 1.6576271186440679, + "no_speech_prob": 0.005028299055993557}, {"id": 639, "seek": 269908, "start": 2704.44, + "end": 2709.3199999999997, "text": " like on the say on Lookpop when you''re searching + for money and you see images with dollar signs in it,", "tokens": [50632, 411, 322, + 264, 584, 322, 2053, 13872, 562, 291, 434, 10808, 337, 1460, 293, 291, 536, 5267, + 365, 7241, 7880, 294, 309, 11, 50876], "temperature": 0.0, "avg_logprob": -0.19227474076407297, + "compression_ratio": 1.6576271186440679, "no_speech_prob": 0.005028299055993557}, + {"id": 640, "seek": 269908, "start": 2709.3199999999997, "end": 2713.96, "text": + " like that could have been programmed by a human being, but it would take a team + of hundreds and", "tokens": [50876, 411, 300, 727, 362, 668, 31092, 538, 257, 1952, + 885, 11, 457, 309, 576, 747, 257, 1469, 295, 6779, 293, 51108], "temperature": 0.0, + "avg_logprob": -0.19227474076407297, "compression_ratio": 1.6576271186440679, "no_speech_prob": + 0.005028299055993557}, {"id": 641, "seek": 269908, "start": 2713.96, "end": 2717.56, + "text": " it would take them 10 years and then they would finally have the money + detector, you know.", "tokens": [51108, 309, 576, 747, 552, 1266, 924, 293, 550, + 436, 576, 2721, 362, 264, 1460, 25712, 11, 291, 458, 13, 
51288], "temperature": + 0.0, "avg_logprob": -0.19227474076407297, "compression_ratio": 1.6576271186440679, + "no_speech_prob": 0.005028299055993557}, {"id": 642, "seek": 269908, "start": 2719.16, + "end": 2723.08, "text": " Some brilliant dudes took a couple months to express how + it could work and now we can solve all these", "tokens": [51368, 2188, 10248, 27717, + 1890, 257, 1916, 2493, 281, 5109, 577, 309, 727, 589, 293, 586, 321, 393, 5039, + 439, 613, 51564], "temperature": 0.0, "avg_logprob": -0.19227474076407297, "compression_ratio": + 1.6576271186440679, "no_speech_prob": 0.005028299055993557}, {"id": 643, "seek": + 272308, "start": 2723.08, "end": 2729.4, "text": " different questions. It''s fascinating. + No, it''s an amazing answer actually. Thank you. I mean,", "tokens": [50364, 819, + 1651, 13, 467, 311, 10343, 13, 883, 11, 309, 311, 364, 2243, 1867, 767, 13, 1044, + 291, 13, 286, 914, 11, 50680], "temperature": 0.0, "avg_logprob": -0.1705579069471851, + "compression_ratio": 1.6333333333333333, "no_speech_prob": 0.010130831971764565}, + {"id": 644, "seek": 272308, "start": 2729.4, "end": 2735.72, "text": " it''s you + know, like some people get entrenched in like, oh, I''m so in love with machine + learning,", "tokens": [50680, 309, 311, 291, 458, 11, 411, 512, 561, 483, 948, 42388, + 294, 411, 11, 1954, 11, 286, 478, 370, 294, 959, 365, 3479, 2539, 11, 50996], "temperature": + 0.0, "avg_logprob": -0.1705579069471851, "compression_ratio": 1.6333333333333333, + "no_speech_prob": 0.010130831971764565}, {"id": 645, "seek": 272308, "start": 2735.72, + "end": 2741.64, "text": " but like what you say is that you have a practical need + and you also know the limitations of your", "tokens": [50996, 457, 411, 437, 291, + 584, 307, 300, 291, 362, 257, 8496, 643, 293, 291, 611, 458, 264, 15705, 295, 428, + 51292], "temperature": 0.0, "avg_logprob": -0.1705579069471851, "compression_ratio": + 1.6333333333333333, "no_speech_prob": 0.010130831971764565}, {"id": 
646, "seek": + 272308, "start": 2741.64, "end": 2747.0, "text": " previous approach, right? Like + if statements like who wants to code if statements or like if you take", "tokens": + [51292, 3894, 3109, 11, 558, 30, 1743, 498, 12363, 411, 567, 2738, 281, 3089, 498, + 12363, 420, 411, 498, 291, 747, 51560], "temperature": 0.0, "avg_logprob": -0.1705579069471851, + "compression_ratio": 1.6333333333333333, "no_speech_prob": 0.010130831971764565}, + {"id": 647, "seek": 274700, "start": 2747.0, "end": 2752.28, "text": " would say + dictionary like somewhere in solar elastic search, you need to manually code that + dictionary", "tokens": [50364, 576, 584, 25890, 411, 4079, 294, 7936, 17115, 3164, + 11, 291, 643, 281, 16945, 3089, 300, 25890, 50628], "temperature": 0.0, "avg_logprob": + -0.29806305481506895, "compression_ratio": 1.6939501779359432, "no_speech_prob": + 0.02377639338374138}, {"id": 648, "seek": 274700, "start": 2752.28, "end": 2757.8, + "text": " up and like maintain it. Oh my god. Really? Is that the best part of your + job? Probably not.", "tokens": [50628, 493, 293, 411, 6909, 309, 13, 876, 452, 3044, + 13, 4083, 30, 1119, 300, 264, 1151, 644, 295, 428, 1691, 30, 9210, 406, 13, 50904], + "temperature": 0.0, "avg_logprob": -0.29806305481506895, "compression_ratio": 1.6939501779359432, + "no_speech_prob": 0.02377639338374138}, {"id": 649, "seek": 274700, "start": 2759.0, + "end": 2763.48, "text": " Defining synonyms is a whole I cannot believe I have to + define synonyms. 
Someone''s already done this", "tokens": [50964, 9548, 1760, 5451, + 2526, 2592, 307, 257, 1379, 286, 2644, 1697, 286, 362, 281, 6964, 5451, 2526, 2592, + 13, 8734, 311, 1217, 1096, 341, 51188], "temperature": 0.0, "avg_logprob": -0.29806305481506895, + "compression_ratio": 1.6939501779359432, "no_speech_prob": 0.02377639338374138}, + {"id": 650, "seek": 274700, "start": 2763.48, "end": 2767.4, "text": " somewhere, + you know, induction rich somewhere and they''re just sitting on the", "tokens": + [51188, 4079, 11, 291, 458, 11, 33371, 4593, 4079, 293, 436, 434, 445, 3798, 322, + 264, 51384], "temperature": 0.0, "avg_logprob": -0.29806305481506895, "compression_ratio": + 1.6939501779359432, "no_speech_prob": 0.02377639338374138}, {"id": 651, "seek": + 274700, "start": 2767.4, "end": 2773.72, "text": " ring. It is somewhere dusty dusty + shelves from why not why not embed them inside the machine learning", "tokens": + [51384, 4875, 13, 467, 307, 4079, 41973, 41973, 24349, 490, 983, 406, 983, 406, + 12240, 552, 1854, 264, 3479, 2539, 51700], "temperature": 0.0, "avg_logprob": -0.29806305481506895, + "compression_ratio": 1.6939501779359432, "no_speech_prob": 0.02377639338374138}, + {"id": 652, "seek": 277372, "start": 2773.72, "end": 2777.8799999999997, "text": + " algorithm? Yeah, absolutely. Hey, it''s so fantastic talking to you. 
Thanks for + bringing this", "tokens": [50364, 9284, 30, 865, 11, 3122, 13, 1911, 11, 309, 311, + 370, 5456, 1417, 281, 291, 13, 2561, 337, 5062, 341, 50572], "temperature": 0.0, + "avg_logprob": -0.12324299017588297, "compression_ratio": 1.6493055555555556, "no_speech_prob": + 0.012031695805490017}, {"id": 653, "seek": 277372, "start": 2777.8799999999997, + "end": 2783.3199999999997, "text": " user perspective and like, is there something + you would like to announce or share with with the", "tokens": [50572, 4195, 4585, + 293, 411, 11, 307, 456, 746, 291, 576, 411, 281, 7478, 420, 2073, 365, 365, 264, + 50844], "temperature": 0.0, "avg_logprob": -0.12324299017588297, "compression_ratio": + 1.6493055555555556, "no_speech_prob": 0.012031695805490017}, {"id": 654, "seek": + 277372, "start": 2783.3199999999997, "end": 2790.12, "text": " audience, you know, + anything at all? Just check out lookpop.co and taking some NFTs in your life.", + "tokens": [50844, 4034, 11, 291, 458, 11, 1340, 412, 439, 30, 1449, 1520, 484, 574, + 13872, 13, 1291, 293, 1940, 512, 13576, 33424, 294, 428, 993, 13, 51184], "temperature": + 0.0, "avg_logprob": -0.12324299017588297, "compression_ratio": 1.6493055555555556, + "no_speech_prob": 0.012031695805490017}, {"id": 655, "seek": 277372, "start": 2790.7599999999998, + "end": 2797.0, "text": " Yeah, and buy an NFT and spice up your life, the digital + life, right? Yeah, awesome. Hey,", "tokens": [51216, 865, 11, 293, 2256, 364, 50075, + 293, 19436, 493, 428, 993, 11, 264, 4562, 993, 11, 558, 30, 865, 11, 3476, 13, 1911, + 11, 51528], "temperature": 0.0, "avg_logprob": -0.12324299017588297, "compression_ratio": + 1.6493055555555556, "no_speech_prob": 0.012031695805490017}, {"id": 656, "seek": + 277372, "start": 2797.0, "end": 2803.08, "text": " Tom, thanks so much. 
I really + wish you all the best in trying quadrant and implementing it in your", "tokens": + [51528, 5041, 11, 3231, 370, 709, 13, 286, 534, 3172, 291, 439, 264, 1151, 294, + 1382, 46856, 293, 18114, 309, 294, 428, 51832], "temperature": 0.0, "avg_logprob": + -0.12324299017588297, "compression_ratio": 1.6493055555555556, "no_speech_prob": + 0.012031695805490017}, {"id": 657, "seek": 280308, "start": 2803.08, "end": 2810.04, + "text": " product and also like the whole web to your user base. And I''m sure we + can talk later as well", "tokens": [50364, 1674, 293, 611, 411, 264, 1379, 3670, + 281, 428, 4195, 3096, 13, 400, 286, 478, 988, 321, 393, 751, 1780, 382, 731, 50712], + "temperature": 0.0, "avg_logprob": -0.23591993675857295, "compression_ratio": 1.4657534246575343, + "no_speech_prob": 0.00378863955847919}, {"id": 658, "seek": 280308, "start": 2810.7599999999998, + "end": 2816.36, "text": " and you can share some progress speeds, you know, as you + go. Great. Thank you so much. Thank you so", "tokens": [50748, 293, 291, 393, 2073, + 512, 4205, 16411, 11, 291, 458, 11, 382, 291, 352, 13, 3769, 13, 1044, 291, 370, + 709, 13, 1044, 291, 370, 51028], "temperature": 0.0, "avg_logprob": -0.23591993675857295, + "compression_ratio": 1.4657534246575343, "no_speech_prob": 0.00378863955847919}, + {"id": 659, "seek": 280308, "start": 2816.36, "end": 2817.96, "text": " much. Yeah, + bye bye.", "tokens": [51028, 709, 13, 865, 11, 6543, 6543, 13, 51108], "temperature": + 0.0, "avg_logprob": -0.23591993675857295, "compression_ratio": 1.4657534246575343, + "no_speech_prob": 0.00378863955847919}]' +--- + +Hi, everyone. Bector Podcast is here. And today we have Tom Lackner, Vice President of Technology at the company called Classic. And I'm sure Tom will talk more about it. And he's also the founder and sole developer of Lookpop, which I'm sure Tom will talk more about as well today. 
+
And what's really cool is that Tom has been using a vector database called Qdrant in his development. And so today we have a user of a vector database, not a maker. And that's amazing, to hear firsthand how it goes with a vector database. Hey Tom. Hey, what's going on? So great that you joined today.
And I just wanted to start as usual, like if you could please introduce yourself and give a little bit of color to your background. Sure. My name is Tom Lackner. I'm a software developer living in Miami, Florida, a very warm place.
I've been developing stuff on the web for about 20 years now, since the early days of it. And I really, really love vector databases these days and doing stuff with embeddings. Yeah, fantastic, fantastic.
And can you tell more about Classic? So I know that it's about classic cars, but yeah, what is this website about and what's the community maybe around it and so on. So I'm the VP of technology for a site called classic.com, which tracks classic car values.
So what we basically do is we go out on the web and we grab all the car sales that are happening and record them in a way that's easily understood. So if anything is sold with a price on it, we record that information.
And then we cross-reference all these vehicles, broken down into what we call markets.
So if a vehicle came in two different trims, two different levels of options, we break those out separately and we can give the user a really good estimate of value with a very specific and granular understanding of what a car is really worth.
So it's basically like a big-data-for-cars type project, I guess you could say. Yeah, and I mean, I checked the website, and the cars look so great and some of them are kind of on the high end in terms of pricing.
So it also defines the audience, right? Yeah, classic car values have really gone up in the past five years, especially considering COVID and a couple of factors in the United States.
So it's more important than ever to do really intelligent, like, savvy shopping before you make a purchase.
So that's where we're coming from. Oh, yeah. Awesome. And like, is it so that the user experience is mostly managed on the website, or do you also have some offline part of the operations? So most of our operations are online on the website.
We also have an iPhone app, but what's really important is our backend crawlers. So we have a huge amount of software and resources attached to building crawlers that can understand different auction websites really, really well.
That's like a critical part of the infrastructure that's sort of behind the scenes, but ends up becoming, you know, a key part of what we're doing. Yeah. And I noticed, obviously, you have a search bar there.
So what happens when I type something in? You know, on Classic, we use a combination of Postgres for the actual, like, OLTP data, like, you know, the actual ground truth. And then we feed that into Elasticsearch to do the full-text search.
What we're actually trying to do there is transition that as well to using a text embedding. I find that text embeddings are easier to use in the long run. But what's actually challenging there is developing a good understanding of typos. Right.
So we could probably go into more detail later, but most of the text embeddings that you encounter aren't really typo-tolerant. So in our case, that search box needs to really understand, like, let's say, Ferrari or Lamborghini. Those words are often spelled incorrectly for obvious reasons.
So one of the things that's holding us up there is developing a typo-tolerant embedding at the system level.
+
Yeah, that also sounds similar to web search, for example, you know, where users are using colloquial language, or if they talk to their microphone instead of typing, then you have these typical problems from ASR, like automatic speech recognition systems, and you need to tolerate that.
So it means that, I don't know, like, we've been thinking about data augmentation techniques. Have you thought about that as well? So what I've tried to do is to retrain the model using basically our input data, but with certain transformations applied, certain permutations.
At this point, I am not at the point where I have a usable model coming out of that, but I'm still doing some research, and it should work in theory.
Yeah, and there are so many models, like, on Hugging Face that I guess you can also kind of tap into, right? And that's actually one of the hard parts, is to evaluate all those models.
So I have been taking a couple days to write a script that just downloaded every single one and tried them to determine which did best at our tasks. Yeah, exactly. And also, like, choosing the quality metrics is another direction, like, how do you evaluate? Yeah, absolutely.
Yeah, we're in kind of a new territory for a lot of this. So I mean, that's exciting on one hand, but on the other hand, like, sometimes you just don't know the answer to a problem, which is like, yeah, yeah, for sure.
So in that sense, it's funny that there is a coincidence, Classic and then classic search, in a way that you're using TF-IDF or BM25, right? Well, of course, you will add an embedding layer at some point to make it more semantic, right?
Then you said that you have Lookpop, where you are now experimenting with vector search.
Can you tell me more, what is Lookpop? And then how do you implement vector search there?
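The retraining-with-permutations idea above can be sketched as a small augmentation step: corrupt clean queries with plausible typo edits and pair each corrupted form with its clean label for fine-tuning. The edit operations and rates here are my assumptions, not Tom's actual pipeline.

```python
# Data augmentation sketch: generate typo permutations of training queries.
# Edit operations (drop/swap/replace) and counts are assumed, not Tom's pipeline.
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def typo_variants(word: str, rng: random.Random, n: int = 3) -> list[str]:
    """Produce n corrupted copies of a word via single drop/swap/replace edits."""
    variants = []
    for _ in range(n):
        chars = list(word)
        i = rng.randrange(len(chars))
        op = rng.choice(["drop", "swap", "replace"])
        if op == "drop" and len(chars) > 1:
            del chars[i]                                  # missing letter
        elif op == "swap" and i < len(chars) - 1:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # transposition
        else:
            chars[i] = rng.choice(LETTERS)                # fat-finger substitution
        variants.append("".join(chars))
    return variants

rng = random.Random(42)
# Pair each corrupted surface form with the clean label for fine-tuning.
augmented = [(v, "ferrari") for v in typo_variants("ferrari", rng)]
print(augmented)
```

A real pipeline would weight the edits by keyboard adjacency and observed query logs, but even this uniform version yields (typo, label) pairs an embedding model can be fine-tuned on.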
For the last couple of years, I've been really interested in search engines and how search engines work.
I feel like Google has sort of done us a disservice in certain ways over the past, you know, couple generations of its development. So I've been interested in, like, you know, developing better web search tools. Lookpop.co is my effort to make an NFT search tool.
So NFTs are digital artworks that you can buy. In the past year, the NFT market has exploded. I think something like $6 billion has been exchanged this year in NFTs.
But the problem with NFTs is that, coming from the world and the language of cryptocurrency, a lot of the websites related to NFTs are about the price, the up, the down, this, that, you know, what's hot?
What's not, blah, blah, blah, who's flipping, you know. Like, I personally don't care for that.
So I was looking for an NFT search engine that could actually help me understand the meaning of NFTs and find visually similar ones.
If I find something I like, it would be cool to be able to see stuff that's kind of in that same vein without having to manually search around on OpenSea, for instance, which is the number one NFT market. You can only search by the name of the creators, right, which is so weird to me.
I wanted to be able to search by themes, by visual styles. And when I came across CLIP, the text embedding system, or the image embedding system, it really, like, it provided all those features in a pretty easy-to-use way. So I'm really excited about that functionality. Yeah.
And CLIP is basically the embedding model developed by OpenAI. I think it's also available as a Hugging Face model. So you can plug it into your code much more easily.
And so, what is your experience with CLIP so far? So one of the great things about embeddings is that when they work, it's sort of like magic, right? Like, it's amazing that this was even possible.
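Since CLIP maps images and text into one shared vector space, "find visually similar NFTs" reduces to nearest-neighbor search by cosine similarity. Here is a toy sketch of that retrieval step, with hand-made 4-d vectors standing in for real CLIP embeddings (which are hundreds of dimensions) and invented catalog names.

```python
# Nearest-neighbor retrieval by cosine similarity (a toy stand-in for CLIP search).
# The 4-d vectors and item names are fabricated for illustration.
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def top_k(query: list[float], catalog: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank catalog items by similarity to the query embedding, best first."""
    ranked = sorted(catalog.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Pretend embeddings: the two pixel-art apes cluster; the landscape sits elsewhere.
catalog = {
    "pixel-ape-1": [0.9, 0.1, 0.0, 0.1],
    "pixel-ape-2": [0.8, 0.2, 0.1, 0.0],
    "landscape-7": [0.0, 0.1, 0.9, 0.4],
}
query = [0.85, 0.15, 0.05, 0.05]  # embedding of an artwork the user liked
print(top_k(query, catalog))      # the two apes rank ahead of the landscape
```

In production this brute-force scan is what a vector database like Qdrant replaces with an approximate index, so the same query stays fast over millions of items.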
+
The problem is, though, if you actually look at the result set as a whole, it's only 80% accurate, right? Like, you'll find that 20% of those in there are just, what the hell is this?
So as a sort of imperative programmer coming to it, or a guy whose experience is based in the world of traditional programming, to see that, it's like, okay, this is a bug. But it's not.
One of the switchovers you need to make is to accept the fact that you're going to get a lot of great results for very little effort, but you're not going to get 100% of the results right.
It's more about identifying the results that are bad, flagging them, and trying to retrain to get them out of the loop in the future. Yeah, exactly.
So, like, building that pipeline, it's essentially like an MLOps pipeline, right? Machine learning operations, where you need to switch your mind a little bit into building this pipeline in a way
that you can detect problems and then feed back into the process of building the next version of your model, right? So it's not as easy as opening your debugger and then, okay, here is the bug, it's logical, fix it, done. Yeah, the pipeline to develop these models is long term.
It's very different than a piece of software, and you need continuous monitoring and you need to continuously be able to sort of have signals feeding in to make that model better next time. It's actually pretty difficult.
I know there's a lot of startups around, like, MLOps now, which makes total sense to me, but it's almost like, I feel like developers, myself included, need to build the mindset and to mentally know, like, okay, these are the different components that I need to put into the system.
Yeah, absolutely. And there are, like, white papers published; for example, there is one by Google. I will try to, you know, link it in the show notes and share it with you.
+
But it's a fairly long document and it goes so high-level that you might fall asleep, you know, while reading it. So you need, like, real tools, right? And you need some kind of understanding, okay, I stitched this together and I just achieved my goal.
I don't want to, like, build the full-blown MLOps pipeline, right? And it's also expensive; like, retraining these models is very slow, which means you're going to want to use the best hardware you can.
And if you're doing that every day, which is crazy, but let's say you do it every week or every month, it's still a significant amount of, like, fixed resources you have to allocate to it, and, like, mental resources to understand it, to debug it. Yes. Yes. Yeah. You're right.
Yeah, I think there is still a long way to go in this direction, but at the same time, you as a developer, you want to focus on the task and on the result, right? Like, not on figuring out what's the bug in that framework or whatever.
So yeah, I think there are tools already available. So maybe one of them that I've been using is Determined AI; they're kind of doing some early-stage moves there. It's completely open source, and it claims that basically it utilizes your GPUs to the maximum, because GPUs are super expensive.
And yeah, so basically it abstracts GPU allocation away from you, but it has some limitations as well. So the team is working on resolving them, but, like, PyTorch and TensorFlow are supported. So, like, you can run some fine-tuning or training or evaluation and hyperparameter search.
So yeah, I mean, it gives you a sense of control in a way, but of course that comes with some rigidity built in, but eventually I really hope that they will make it and it will be more widespread. Yeah. Awesome. Awesome.
So today if I go to Lookpop, can I buy an NFT already? Okay, you can just find it.
You can just find it and click through to OpenSea. The actual process of you getting the deliverable and the token and all that stuff is actually pretty complicated, so I'm going to let them do it.
I really want Lookpop to hopefully be indexing tons of different NFT markets. OpenSea is the biggest one, but there's quite a few other small ones. So I didn't want to tie myself too closely to one particular blockchain or one particular form of operation.
I do think that this is developing so quickly. NFTs weren't even a thing until about two years ago. So I feel like it's a little early to sort of, like, get in bed with just one of the vendors. Yeah. Yeah.
And I've been also, like, when I joined the Qdrant Telegram channel, I saw, like, you've been so active, like, you are sending some advice or commentary almost every single day. So I love Qdrant and I love Telegram.
Yeah, I was just thinking, like, are you a developer of Qdrant or what? But you are a user, is that right?
Yeah, I mean, I think I'm doing, like, informal tech support in the opposite time zone, because they're all on CET like you are, you know, although a lot of people wind up in, you know, American time in there.
I was looking for a long time for a vector database. I tried FAISS, I think they call it, in Python, and I tried a couple others, and I really didn't find anything,
I guess you could say, intuitive, that intuitively, like, scratched my itch, you know. Like, I don't like software that overcomplicates things; I like things that are sort of isolated and independent and easy to install and use.
Qdrant just sort of, like, ticked all those boxes for me: it's a small download, it's dockerized so it's very easy to install, the API just makes sense. I had evaluated other vector databases; we can talk about that if you want.
I found that Qdrant was the best mix of all those different factors.
So, you know, when I embrace an open source project, I try to do my best to help them out. So I built the first Elixir connector to use Qdrant from Elixir. I'm still trying to develop other little pieces of the puzzle.
So actually I'm interested, I'm quite interested, because, you know, I published a blog on seven vector databases.
It was actually six, and then the founder of Qdrant knocked on my door, a virtual door, and he said, hey, please add our database as well, because we are the new kid on the block and you just probably didn't see us.
And then I opened their website and I was kind of a little bit blown away, because, you know, the documentation looks interesting, very good, and also, like, the way they position it. Yeah. They talk about, like, metric learning, some things that I didn't even hear of before.
And then I also discovered the developer team, like, what they do, and also that they customized the HNSW algorithm as well. They do a graph algorithm.
And can you, like, a little bit walk me through the options that you considered, like, which are the databases you have taken a look at, how deep you went before you decided to go with Qdrant, and what was the tipping moment, like, okay, I like this. Right.
So I think the main one that I studied for a while, that I think a lot of people look at, is Milvus. Milvus has, like, a lot of really exciting energy going on. I think they've got a good replication story as well.
But the problem was, they seemed like they wanted me to use their Python data science toolkit to sort of interact with it. Their API was very abstract and focused on, obviously, just not what I was really doing.
I needed an API that was oriented around data operations, not so much analysis. So that kind of slowed me down there. With Qdrant, I felt that as soon as I got to the web page, I knew how to use it. Right.
+
Which, you know, matters to me, because some software systems abstract away the specifics of how they operate and how you use them too much.
I like, I like to say, like, I like a language where you go to the site and you just see a bit of code there on the home page and there's a button, you can run it, you know. Like, I don't want to be too removed from the actual tasks of what I'm doing.
So Qdrant just seemed to, like, understand that and get that. And then going to the channel, I liked the fact that they had a good, deep technical understanding of how it works, but they weren't trying to beat you over the head with the specifics.
It was kind of, like, abstract at the right level. And, you know, it's really fast. I tried tossing millions of records at it and it was almost imperceptibly slower. Like, you almost couldn't tell that you were adding so many records. So I thought that was really fair.
A lot of these vector databases now, I feel like they're more like platforms, you know. I didn't want that.
I wanted almost, like, a Redis of vector databases, you know, that kind of thing. By platform, do you mean, like, this database is trying to lock you in, in a way, kind of, like, give you so many features you don't need? Exactly. Okay.
So I think it's all well-meaning, but I just feel like I can't trust one vendor for a lot of what I'm doing. I need to sort of spread my risk, you know, over different parts. So I try not to embrace any parts of the system that are too large, too monolithic.
You know, yeah. And I guess at this point, you're wearing your VP hat, right? Your VP of engineering hat, in that you don't just go, like, oh, this is a sexy platform or tool, let's use it.
But you want to see, long term, what are the implications, right? And you need to adopt technology that has the right sort of surface shape so that you know it's going to slide in easily.
+
You know, with, for instance, FAISS or whatever, I knew that that was going to be a nightmare to wrap, to make connectors for different systems. And I also knew that I wasn't going to program my entire thing in Python.
And I also knew that, you know, I would need to have a long-term component running, running a web server, that was independent of Python backend restarts. So with all those factors together, I think that Qdrant was kind of, like, the obvious choice.
And plus, just looking through the code, it seemed short. I, for some reason, I've been having really good results with stuff written in Rust lately. Like, a lot of Rust software comes across as really reliable and performant. Yes.
So, like, that's what I was about to ask: when you were choosing the platform or the database, did you pay attention to the programming language it was implemented in as well? Yeah.
For some reason, I know it's unfair, but I've definitely observed certain patterns in, let's say, all Java-based server applications; like, let's say, Elasticsearch is a great example. They always want to consume many, many, many CPUs and run up against RAM limitations.
And of course, there are still those confusing garbage collection cycles in Java. And, like, every time I run a Java-based service, I end up doing tweaking on the JVM. Yeah, which is, like, the garbage collector, right? You don't want to do that black magic.
Yeah, I want the thing that's doing the running to sort of be self-operating. I don't want to have to be tweaking that all the time as my application needs grow and change. So that kind of, like, disqualifies all Java-based software for me.
And I believe that one of the major vector databases is Java, right? I think it's the... it is written in Go, actually. So, but if you mean Vespa, Vespa is written in Java and some other parts.
Yeah, Vespa, yeah. Yeah.
+But by the way, did you consider Vespa or Weaviate as you've been studying different databases? I believe I checked out the Vespa site; like you said, Java. What was the other one you said? Weaviate. Weaviate. Written in Go. Yeah. It's also open source, and there's also a cloud offering.
By the way, I have episodes with both Milvus and Weaviate, for those of you that would like to kind of listen in on what the building blocks and architectures behind these are. Yeah. And features. That's actually awesome. I have to listen to that. Yeah.
I'm actually very curious about the different implementation choices. Yeah. Yeah. Because Go is also a high-performance language, right? Compared to Python or Java. Of course, in Java, it also depends how you do it.
You know, if you take Elasticsearch or Solr, both of them are using Apache Lucene, which is a search library inside. And Apache Lucene has been optimized for, well, close to 20 years or even more. So I mean, it's close to C in some sense, but of course, it is not C.
So, like, when you load more and more data, eventually you will run into the situations that you just explained, right? That you start tweaking the garbage collector.
There is, like, a dedicated channel, or, like, even a person, I think Shawn Heisey on the community side, who has a lot of wisdom in how specifically you need to tune which parameters in which garbage collector.
The G1 GC or whatever you have, depending on the Java version, right? Because different Java versions have different GCs, and, like, it's almost like opening a whole can of worms when you don't want to, actually. You want to solve that task and move on to the next one.
So yeah, so far, I haven't had that issue with the Rust-based stuff that I'm integrating into my work. But, you know, I would imagine the people using the first wave of Java-based server software didn't find any problems either.
+
So maybe as time goes on, we'll discover that, you know, you can't do large allocations in Rust or something like that.
Yeah, it's also cool that actually Rust has been picked up by many teams developing search engines, and not necessarily vector databases, but, like, traditional, you know, inverted-index databases; Tantivy is one of them, which is written in Rust.
And they have a nice blog as well, explaining some of the performance bottlenecks they were able to resolve. And Tantivy is way faster than Lucene. There is another presentation by the Tantivy maintainer on that. I'm looking for a good full-text system with BM25 and all that kind of stuff.
Unfortunately, Qdrant isn't going to add that, I don't think; it's kind of off base for them, but that's such an important part. You know, there's a reason that so many of these startups have a team of people just doing search result quality, you know; search results are critical.
Yeah, now that you mentioned this important topic, I also wanted to kind of pick your brain on that, you know. Like, you have the traditional search engine, let's say on your classic.com site, right, where I type text and you use an inverted index to find the results.
And then you want to bring in a BERT model or some other model to deal with more semantic search, right. So have you thought about how you would combine them? Let's say inverted index versus vector search.
So I actually, okay, so we say search, right, but there's actually kind of different sub-tasks inside of search. One of them is, when you search for something, we want to show you something that's similar. So you don't necessarily want to get the exact same term.
So that requires, like, one piece of data or one mechanism, right, which is more like a recommendation-type system. You also want to handle things with direct keyword matches, of course, but you also want to handle typos, right.
+
So typos require, like, a second layer and structure of databases, or the way you implement it has to work a certain way. Okay.
+And I feel like the best way to kind of do this is to have the search piece do multiple different attempts at solving the query and then combine them with an intelligent strategy. So, like, for instance, on Classic, now we're building a better auto-suggest component.
And it's actually doing three different types of queries.
And I think that if you really, really, if you start recording what users are doing and you start looking at every single search and saying, what did we do wrong here? Why could we not serve this? I think you'll see that it's actually not just one type of query.
When people see a search box, they'll just start plugging things in. They don't know if they're going to do English-language queries, which is something that an embedding would handle, right, because an embedding can understand any sort of information or any sort of intention in that query.
But sometimes they're just searching for a specific model number or something like that. In my experience, with a lot of text embedding models, if you use a term that's outside of the domain, something that was outside the keyword list it was trained with, you're going to get really bad results.
So that's another thing I have to sort of be thinking about.
So unfortunately, right now I think that the best way to set these things up is to do multiple Elasticsearch queries, maybe a Postgres query, and maybe a Qdrant query, and then correlate all those results and display them intelligently. Yeah, exactly.
So basically, you almost need some smart ranker or re-ranker, which combines these results, and it doesn't care which algorithm was used to bring them in. But what it cares about is, you know, optimizing the KPI, let's say, click-through rate or whatever it is.
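One simple way to sketch the "multiple queries, then combine" strategy described above is reciprocal rank fusion (RRF): merge several ranked lists without caring which backend (keyword, typo layer, vector) produced each one. The hit lists and item names below are invented for illustration.

```python
# Reciprocal rank fusion: merge ranked result lists from different backends.
# The three hit lists are fabricated examples, not real query results.
from collections import defaultdict

def rrf(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Each hit earns 1/(k + rank) from every list it appears in; sort by total."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["ferrari-250", "ferrari-308", "dino-246"]  # inverted-index query
typo_hits    = ["ferrari-308", "ferrari-250"]              # fuzzy/trigram layer
vector_hits  = ["dino-246", "ferrari-250", "lambo-miura"]  # embedding search
print(rrf([keyword_hits, typo_hits, vector_hits]))
```

An item surfaced by all three strategies (here "ferrari-250") floats to the top even if no single backend ranked it first everywhere; a learned re-ranker optimizing click-through rate would be the production-grade version of this merge step.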
+Because in some applications, like, I've been talking to one company building maps, and they said that, for example, when you sit in a car and you start typing a few letters, like two or three, you don't have much time as a driver, and you just need to get going, right? + So this company could be doing badly at predicting the intent. And by the way, what they do is that they don't limit the search only to the radius around you, because they believe that you might be going to the airport, from where you will fly out to, I don't know, Washington, DC, and then you are looking for that specific street while you are sitting in the car in Europe. +And so they search the whole database of points of interest, and, you know, first of all, at that scale it's going to be a problem. And the other thing is, you need to actually rank the results in such a way that they get it right. So it's an extremely difficult problem to handle. +So in that case, yeah, they're probably predicting from where you are now. If you're here now, what's the most likely thing that you want to go to? It's kind of an interesting problem. +And actually, you kind of bring up a good point, which is that a lot of startups don't have enough data to make those intelligent associations. +So it becomes a game of sort of finding an open data set that you can use, or something you do have, and abstracting from it, or extending it in a certain way so that you can make these intelligent inferences. But it's very, very difficult. +And until you have a lot of users, you don't have any data coming back to you telling you whether or not you're doing a good job. So it's not easy. +And I think that's one of the reasons we see some of these big startups, these platforms, become very entrenched, with their machine learning tools and their data sets; it becomes hard to unseat them.
+Because all the activity in that space is happening on their property, you know? Yeah, yeah, exactly. And one thing I also wanted to mention: you said you want to handle typos. +Did you know about, or did you look into, byte-level embedding models? So basically, instead of, you know, segmenting the word, let's say, letter by letter, whatever, which could also be expensive, they go down to the byte level. I think that paper was published by Google. +I will try to look it up and also link it in the show notes. But did you know about this? Or did you consider such models? That's news to me. +What I've been trying to do is just retrain an existing model with a bunch of permutations and things like that, obviously things that were like common typos, dashes and stuff like that. But that's a very interesting idea. +So basically, instead of working on a character-by-character level, right, the embedding itself is composed of bytes, because the language could be anything, and you don't want to apply some linguistic component which is language-dependent. +Let's say in Chinese, you need to segment the string, right? You need to know where that word boundary is, which isn't literally there. And then in some other languages, let's say Russian, you will have rich morphology. +So a lot of endings and prefixes of the word, right? So instead of applying a surface-form lemmatizer, you go and just look at the bytes. And then you ask the neural network to pay attention to the bytes instead of the characters. Yeah. +That's actually a brilliant idea. And no, I haven't heard of that, but I would love to apply it. So please send me the link. I will, for sure. It would be cool if you can apply it and take a look. +And hopefully there is a model that you can take off the shelf, and not, like, spend weeks or months researching it.
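As a toy illustration of the byte-level idea (this is not the unnamed Google paper, just an assumption-laden sketch): encoding text as raw UTF-8 bytes gives a fixed vocabulary of 256 symbols and needs no language-specific segmentation or lemmatization, at the cost of longer input sequences:

```python
def char_tokens(text):
    # Character-level: open-ended vocabulary, needs Unicode-aware handling
    return list(text)

def byte_tokens(text):
    # Byte-level: fixed vocabulary of 256 values, works for any language
    return list(text.encode("utf-8"))

# Chinese has no spaces between words; bytes sidestep segmentation entirely
print(len(char_tokens("中文")))  # 2 characters
print(len(byte_tokens("中文")))  # 6 bytes (CJK characters are 3 bytes each in UTF-8)
```

A byte-level model's embedding table only ever needs 256 entries, which is why such models can skip lemmatizers and segmenters, but sequences get roughly 2-4x longer for non-Latin scripts.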
So the amount of effort going into training these models now is really, really absurd. It's ludicrous, yes. +And I mean, the models are getting bigger and bigger, but it's funny that they're not necessarily becoming smarter, in a way. And I will try to look that up. And I'm actually editing an episode now, so by the time this one is published, that one should also be published. +And yeah, so basically, how do you train all this? I don't know. It's like, as a developer, you don't have time to research things, right? +Now, what options do you have? You will probably need to go to some company which will offer you the service for money, or you need to go and scout on the Hugging Face site and hope for the best, right? How do you feel about that? Yeah. +I've spent so much time just sort of getting my brain around certain things. You know, there's no real jumping-off point for a lot of this stuff. +There's no single place you can go. I see people on the web, on these sites, asking, what book should I read about this? A book, are you kidding me? What book could you even write about this? There are no books about any of this, you know? +It's changing so quickly that I feel like you have to be part of numerous, like, internet subcultures and very specific research websites even to understand what's going on. +But thank God that people like Hugging Face are putting so many resources into just making these tools available. Like, their pipelines package is such a time-saver. I can't believe I was ever implementing all these things from scratch or from, you know, separate tools. +Yeah, yeah, there is another site that I wanted to mention, which is also picking up: Papers with Code. Because when you read a paper, you're like, okay, how do I do this? I'd need to spend a few weekends to implement it. Some of us have the skills, some of us don't. So some people are lucky. Yeah.
+And they probably belong to the communities so well that, you know, they know their way through. Yeah, I think that, you know, papers without code are kind of, to me, a little scandalous now. +I feel like it's very difficult to measure someone's results and to really evaluate what people are doing if they're not releasing the code. And even if they explain the algorithm in the paper, a lot of times the specific training process for the model is what's really critical. +So certain decisions they made about what's included in the dataset, what isn't included in the dataset, and just sort of the training-loop engineering, I feel like that's super important. +So I think the success OpenAI has had with CLIP goes to show that someone with a great idea and a model that's released on time and early is really going to be a game changer for the industry. +Yeah, I remember in the Telegram channel of Qdrant, two people, including you, I think, said that without any fine-tuning, you got really great results with CLIP. And I think you guys applied it to different domains. +And that was amazing, because especially the cross-domain application, you know, is such a big pain for these models. +There is a paper as well where they take off-the-shelf models and apply them to different search tasks, because a search task could, let's say, look like a question answering system, or it could operate in the financial domain, right? So it goes into a specific domain. +And basically what they did is they applied no fine-tuning whatsoever. They took the models and applied what they call zero-shot learning, right? So you need to predict right away. And they showed that, ah man, they're not all equal at all. +And sometimes they fail miserably. But they actually found that a specific category of algorithms, based on dense retrievers, I think, if I remember correctly.
So they perform better than others. +But if you compare the dense retrievers to the BM25 method, BM25 was still fairly close to them. And it's way less expensive. So that's really interesting. Yeah. It also depends a lot on the very, very specific use cases of your users. +Like what you're saying with BM25: if they're searching for a lot of jargon and industry-specific stuff, BM25 is definitely going to kill it, compared even to models that are tolerant of terms outside of their keyword space. +Like, I really feel like what we need is a more natural way to blend these two kinds of techniques. +And I think as we see more and more advanced vector-based search engines coming out, we're going to see systems that are able to store the full text, store the vector embedding, compare them both, and rank them in a uniform way, which is so critical. +And I think something you mentioned that is super interesting is the systems where they're retraining them using these simple keyword question-answering tasks, and the result comes out much, much better, the accuracy and so on. I think that's so interesting. +And I believe that if you could take a model, take specifics about your use case, and blend them together very quickly and easily, I think we would end up seeing embeddings that produce a much better result in the field. Yeah. +And I think you're also tapping into the part where I hope that, at some point, we also get a confidence level. Let's say from BM25 you can also get a confidence estimate of how well it worked. And the same goes for dense retrieval, you know, vector retrieval. +It could also say, hey, I'm kind of 60% confident that I found you a good result, right? Then the question is how you build it, but that's the goal. Let's say I'm just pushing this out to the universe. So hey, everyone who works on search engines, maybe you can consider this.
+Or maybe you already are considering this. But I think that would be so much easier, because, for example, in e-commerce, right, one of the problems is zero-hit search. And probably for your search as well, right? Like somebody typed something that you couldn't handle. +Now, what do you show? A blank screen, or do you show the most popular NFTs, right? And that's one of the hard things about, I guess, a traditional imperative-based search engine. You can never show the user an empty page. You can never just say, ah, nothing, sorry, try again. +You always have to be feeding them next steps that they can go to that make a lot of sense. And that's definitely one of the challenges with old-style database search approaches: finding results that are relevant but not quite right. +You know, SQL doesn't really do that. So that's another great thing about Qdrant. At least a lot of times you're getting a score metric back that is a good, continuous value. It's not Boolean: yes, this matched; no, that didn't match. Yeah, yeah, exactly. +I remember when I was slowly entering this field, I had a friend who was majoring in information retrieval systems as an academic. +And I asked him, hey, how do search engines work? You know, if I type something, what happens next? And I knew nothing about inverted indexes, nothing at all. And he said, yeah, there is an inverted index. We break the documents down into this kind of vector of terms. +And then each term points to a posting list with the doc IDs and so on. And then you apply Boolean-like logic on top of those, and you make it efficient. But I was still not satisfied. +I said, hey, so it means that if I need to find something, let's say I'm in discovery mode, I don't know what to type. So what should I do? And he said, yeah, IR is not there yet. Like, there is no discovery. +You literally need to type at least something, right?
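The inverted-index mechanics the friend describes (each term pointing to a posting list of doc IDs, with Boolean logic on top) fit in a few lines; a toy sketch with a made-up corpus:

```python
from collections import defaultdict

docs = {
    1: "to be or not to be",
    2: "the cat sat on the mat",
    3: "better to travel than to arrive",
}

# Inverted index: each term points to a posting list of doc IDs
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search_and(*terms):
    """Boolean AND: intersect the posting lists of all query terms."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(search_and("to", "be"))  # {1}: only doc 1 contains both terms
```

Real engines keep the posting lists sorted and compressed so intersections stay fast at scale, but the data structure is exactly this: term in, doc IDs out.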
And then I said, OK, when I type something, how does the search engine know what I'm looking for? +And he said, well, that inverted model, which is a vector space model from the 60s or 70s, basically builds some kind of understanding of the document. +And I said, how exactly does it understand the document? And he said, basically, it's a bag of words. And I said, how can it make sure it understands the meaning when it's just a bag of words? Well, he said, there is also an IDF component and a TF component, and these two play together. +And hopefully the idea is that you will find some unique document which uniquely explains what you're looking for. But what if I'm not looking for a rare term? What if I'm looking for "to be or not to be"? Each of these words is a stop word. +Now, how does it know what I'm looking for? And then he said, OK, Google actually pays more attention to the title. So if these words occur in the title, they will rank the document higher. And at that point, I was like, this is like magic. So it doesn't understand anything I'm searching. +It's just tuning, right? It's layers of hacks upon hacks upon hacks to achieve certain goals. It's very interesting. And in the case of Google, it's amazing that it works as well as it does. The scope of documents that they have in that index is ridiculous. +And to be able to fulfill realistic queries, especially if you consider doing an exact match query for long terms across a huge index of documents, like, how the hell? The quotation-mark queries, I guess you could call them. Very interesting. +One of the things that I've found to help me evaluate the overall confidence level of what these techniques are doing is that I evaluate different choices. So, for instance, on classic.com, one of the options we're exploring: we have an enormous editor workflow.
+So when a new vehicle comes into the site, we need to have a person who is an expert at that make and model look at the vehicle, determine what it is, and answer some questions about it. +Like, what color is it? Has it been restored, or is it in original condition? So what we're trying now is to actually use CLIP for that. So I have a database of those, let's say, potential colors. +And then I evaluate the image with CLIP and I say: picture of a red car, picture of a blue car, picture of a green car. And then I look at all of them and determine which one has the closest distance. +But also, overall, did any of them have a close distance, or were they all kind of distant, or were they all very far away from the embedding of the query? And if so, then I tell myself, okay, we're not answering this question well. +Right? The fact that it had no strong suggestion at all is itself a confidence factor, or a confidence metric, in a way. Which is fantastic. +Like, you were able to find an answer to my question, which was broad enough, I think. But essentially, you can use a threshold on the distance; that didn't cross my mind at all. Like, yeah, you're right. +You can define kind of a confidence interval for these distances, right? And you know which metric you're using, and you know your data set as well. +Right? So you could go through them all in your lab and check: okay, is this a good one? Is this a bad one? Yeah, that's an amazing solution that you just came up with. +And from the perspective of the amount of artifacts: when you're building a piece of software, you have to ask, how many little artifacts am I creating here? What do I actually have to do? Am I creating a lot of stuff? +Or am I just creating a little bit of stuff that works for a broad range of data and use cases?
Now with CLIP, you get so much for free, quote-unquote. Like, that whole question-answering system I just described took a couple of hours to implement. +And of course, it's going to take a lot of tweaking, but compare that to training a bunch of image classifiers for the same task, which would take me an enormous amount of effort. You know, we have seven different attributes. +So there would have to be seven different models, hundreds of thousands of training images for each, a very elaborate process of manually correlating them. With CLIP, I just got all that quickly. And again, it's not super accurate, but it gives you a building block that you can just apply everywhere. +And you know, if at some point I wanted to find other vehicles like this one, that same model works. If I want to check whether this matches a certain given piece of text, like, is this a Ford Mustang? That model works. It's just, I don't know, really, really amazing. +Yeah, it's mind-blowing that, you know, as you said, somebody in the science world thought about this problem and came up with some really great solution that you can actually use. +But when you discovered that CLIP works so well, were you amazed to the point of going back and reading the paper, or are you not interested in papers? I flipped through the paper. I'm interested in papers to some extent. +I know some people take a week off of work to read papers and stuff like that. I'm like, wow. You know, I don't come from a math background; I come from more of a practitioner, programmer background. +So for me, I actually sometimes prefer to study the code and to understand it. Like, a lot of times their usage instructions will give you a lot of subtle information about ways that you should and should not apply it. So I kind of stay in that space for the most part. +But I am definitely paying a lot of attention to all embeddings at this point.
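The zero-shot attribute check described above boils down to: embed the image, embed one text prompt per candidate answer, pick the closest, and abstain when even the best match is distant. A minimal sketch of that thresholding logic, with made-up vectors standing in for real CLIP embeddings (the threshold value is hypothetical and would need tuning per metric and dataset):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def classify_with_confidence(image_vec, prompt_vecs, threshold=0.25):
    """Pick the closest text prompt; abstain (return None) when nothing
    is close enough, so the case can be routed to a human editor."""
    sims = {label: cosine(image_vec, vec) for label, vec in prompt_vecs.items()}
    best = max(sims, key=sims.get)
    return (best if sims[best] >= threshold else None), sims

# Made-up embeddings standing in for CLIP("picture of a red car"), etc.
prompts = {"red": [0.9, 0.1, 0.0], "blue": [0.0, 1.0, 0.0]}
label, _ = classify_with_confidence([1.0, 0.0, 0.0], prompts)     # confident match
no_label, _ = classify_with_confidence([0.0, 0.0, 1.0], prompts)  # abstain
```

The abstain branch is the "no strong suggestion is itself a signal" idea: instead of always returning the argmax, a weak best score falls back to the human workflow.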
+And I feel like, especially with the multimodal ones, once they start including video content, and once we can run audio through there as well, it's just going to be a really exciting time to be alive. +Yeah, I think it's great that you're looking at the code, because it's like several levels of abstraction, right? First, you're trying to understand, okay, is this useful? Okay, it is. +How does it work? Maybe, what are the limitations? What are the advantages? Then you go to the paper, where they, of course, beautifully describe the algorithm and say it's the best, that it beats the state of the art and, you know, the previous work. +But the problem is that there is always a gap, or usually there is a gap, between the paper and the code. So if they publish the open source code, you go there and, oh my god, there's a bunch of additional hacks on top of the paper to make it really work. Oh, I see. +So yeah, it's amazing that you go back and read the code. And do you get scared of reading, like, C++ code? No, no, no. In a way, C++ is, you know, different. It's a different part of your brain. +You know, C++ is so much simpler in a certain sense, in that every line there has a specific action at a specific point in the code. +Every line there has a certain meaning. With, let's say, a model in PyTorch, for instance, if your normalization is wrong, right? It's hard to tell. And it's hard to even see that, except by watching a training curve and guessing and sort of hoping. +And maybe that speaks to my skill set, but I definitely think the machine learning model is brilliant, because it's such a small amount of code that can do so much. Whereas the C++ stuff is interesting because everything is excruciatingly carefully defined. +So it's kind of two separate sides, but both beautiful in their own way. Yeah, absolutely, absolutely.
Especially when they get combined. So you're doing some model in C++, because, for example, HNSW, the graph algorithm, is implemented in C++ that looks like C. +So I took a look at it with one of my colleagues, and he was like, wow, this is not modern C++ code. And yeah, it looks very basic in a way. Of course, they use some C++ elements, but, for example, they allocate memory with low-level mallocs. +Yeah, and you're like, wow, yeah, you're doing that. And then some other companies, for example SeMI, which builds Weaviate, or Qdrant, basically re-implement the algorithm in their language of choice, Go or Rust or whatever. +Because you probably feel better after understanding each bit there, and then you can also control it the way you want, especially after listening to users. +Yeah, and I think it's kind of like they're living in different computational spaces, in a sense. What they're expressing and what a line of code does is the complete opposite, from the perspective of what we're committing to the machine here. +You know, with the machine learning models, we're building a framework in which it's capable of learning something. In a C++-based or any imperative environment like that, we're expressing everything it can specifically do. It's almost the opposite. It's kind of interesting to think about. +Yeah, exactly. +Hey, Tom, it was really great talking to you, but I was thinking, if you can spend a few minutes, and if you are not averse to philosophy: I like to ask this question of each guest on my podcast. Considering that vector search is an emerging field in many ways. +We don't know yet if it will fully replace traditional search, or if they will work together. But in general, what makes you excited? Why are you doing this? What keeps you going and exploring this field today?
I have a very simple answer for you. +I'm tired of writing if statements. So you want to piggyback on some complex models that others have trained. I want to show the machine examples of it working correctly and examples of it working incorrectly, and the machine learns exactly what those if statements should be. +I mean, the idea that we have to build something by illustrating every possible variation of it is just insane. + Imagine, say, on Lookpop, when you're searching for "money" and you see images with dollar signs in them. That could have been programmed by a human being, but it would take a team of hundreds, and it would take them 10 years, and then they would finally have the money detector, you know. +Some brilliant dudes took a couple of months to express how it could work, and now we can solve all these different questions. It's fascinating. No, it's an amazing answer, actually. Thank you. +I mean, you know, some people get entrenched in, like, oh, I'm so in love with machine learning. But what you say is that you have a practical need, and you also know the limitations of your previous approach, right? +Like, who wants to code if statements? Or if you take, let's say, a dictionary somewhere in Solr or Elasticsearch, you need to manually code that dictionary up and maintain it. +Oh my god. Really? Is that the best part of your job? Probably not. Defining synonyms? I cannot believe I have to define synonyms. Someone's already done this somewhere, you know, in some dictionary somewhere, and it's just sitting there. +On some dusty shelves somewhere. So why not embed them inside the machine learning algorithm? Yeah, absolutely. Hey, it's so fantastic talking to you. +Thanks for bringing this user perspective. And is there something you would like to announce or share with the audience, you know, anything at all? Just check out lookpop.co and take in some NFTs in your life.
+Yeah, and buy an NFT and spice up your life, the digital life, right? Yeah, awesome. Hey, Tom, thanks so much. I really wish you all the best in trying Qdrant and implementing it in your product, and rolling it out to your whole user base. +And I'm sure we can talk later as well, and you can share some progress updates, you know, as you go. Great. Thank you so much. Thank you so much. Yeah, bye bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/trey-grainger-wormhole-vectors.md b/transcripts_with_timestamps/vector-podcast/trey-grainger-wormhole-vectors.md new file mode 100644 index 0000000..aec3c41 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/trey-grainger-wormhole-vectors.md @@ -0,0 +1,6801 @@ +--- +description: '

This lightning session introduces a new idea in vector search - Wormhole + vectors!

It has deep roots in physics and allows for transcending spaces of + any nature: sparse, vector and behaviour (but could theoretically be any N-dimensional + space).

Blog post on Medium: https://dmitry-kan.medium.com/novel-idea-in-vector-search-wormhole-vectors-6093910593b8

Session + page on maven: https://maven.com/p/8c7de9/beyond-hybrid-search-with-wormhole-vectors?utm_campaign=NzI2NzIx&utm_medium=ll_share_link&utm_source=instructor

To + try the managed OpenSearch (multi-cloud, automatic backups, disaster recovery, vector + search and more), go here: https://console.aiven.io/signup?utm_source=youtube&utm_medium&&utm_content=vectorpodcast

Get + credits to use Aiven''s products (PG, Kafka, Valkey, OpenSearch, ClickHouse): https://aiven.io/startups

Timecodes:

00:00 + Intro by Dmitry

01:48 Trey''s presentation

03:05 Walk through the AI-Powered + Search course by Trey and Doug

07:07 Intro to vector spaces and embeddings

19:03 + Disjoint vector spaces and the need for hybrid search

23:11 Different modes + of search

24:49 Wormhole vectors

47:49 Q&A

What you''ll + learn:

- What are "Wormhole Vectors"?

Learn how wormhole vectors + work & how to use them to traverse between disparate vector spaces for better + hybrid search.

- Building a behavioral vector space from click stream data

Learn + to generate behavioral embeddings to be integrated with dense/semantic and sparse/lexical + vector queries.

- Traverse lexical, semantic, & behavioral vector spaces

Jump + back and forth between multiple dense and sparse vector spaces in the same query

- + Advanced hybrid search techniques (beyond fusion algorithms)

Hybrid search + is more than mixing lexical + semantic search. See advanced techniques and where + wormhole vectors fit in.

YouTube: https://www.youtube.com/watch?v=fvDC7nK-_C0

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20251107_051156_724d3e0d493d36eed167f0604822b7e3.png +pub_date: Fri, 07 Nov 2025 05:58:00 GMT +title: Trey Grainger - Wormhole Vectors +url: https://rss.com/podcasts/vector-podcast/2314900 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 9.0, "text": " Alright, + hello everyone, wherever you are, really, really happy to see all of you online.", + "tokens": [50364, 2798, 11, 7751, 1518, 11, 8660, 291, 366, 11, 534, 11, 534, 2055, + 281, 536, 439, 295, 291, 2950, 13, 50814], "temperature": 0.0, "avg_logprob": -0.3031943683892908, + "compression_ratio": 1.4818652849740932, "no_speech_prob": 0.3462936580181122}, + {"id": 1, "seek": 0, "start": 9.0, "end": 15.0, "text": " Welcome to the Beyond + Hybrid Search with Warm Home Vectors. It''s another idea that", "tokens": [50814, + 4027, 281, 264, 19707, 47088, 17180, 365, 40353, 8719, 691, 557, 830, 13, 467, 311, + 1071, 1558, 300, 51114], "temperature": 0.0, "avg_logprob": -0.3031943683892908, + "compression_ratio": 1.4818652849740932, "no_speech_prob": 0.3462936580181122}, + {"id": 2, "seek": 0, "start": 15.0, "end": 24.0, "text": " Tray is going to present + today and we will have a discussion and all of you are welcome to ask questions + as well.", "tokens": [51114, 1765, 320, 307, 516, 281, 1974, 965, 293, 321, 486, + 362, 257, 5017, 293, 439, 295, 291, 366, 2928, 281, 1029, 1651, 382, 731, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.3031943683892908, "compression_ratio": 1.4818652849740932, + "no_speech_prob": 0.3462936580181122}, {"id": 3, "seek": 2400, "start": 24.0, "end": + 31.0, "text": " Yeah, cool. I think we''ll start with that. This is just a quick + intro from me. 
I''m Dmitri Khan.", "tokens": [50364, 865, 11, 1627, 13, 286, 519, + 321, 603, 722, 365, 300, 13, 639, 307, 445, 257, 1702, 12897, 490, 385, 13, 286, + 478, 413, 3508, 470, 18136, 13, 50714], "temperature": 0.0, "avg_logprob": -0.18720425092256987, + "compression_ratio": 1.4427860696517414, "no_speech_prob": 0.31713834404945374}, + {"id": 4, "seek": 2400, "start": 31.0, "end": 40.0, "text": " I, most recently, + I''m with Ivan. I joined as a product director leading the search domain.", "tokens": + [50714, 286, 11, 881, 3938, 11, 286, 478, 365, 28893, 13, 286, 6869, 382, 257, 1674, + 5391, 5775, 264, 3164, 9274, 13, 51164], "temperature": 0.0, "avg_logprob": -0.18720425092256987, + "compression_ratio": 1.4427860696517414, "no_speech_prob": 0.31713834404945374}, + {"id": 5, "seek": 2400, "start": 40.0, "end": 49.0, "text": " We offer managed open + search so that you don''t have your headaches setting it up and doing some DevOps.", + "tokens": [51164, 492, 2626, 6453, 1269, 3164, 370, 300, 291, 500, 380, 362, 428, + 35046, 3287, 309, 493, 293, 884, 512, 43051, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.18720425092256987, "compression_ratio": 1.4427860696517414, "no_speech_prob": + 0.31713834404945374}, {"id": 6, "seek": 4900, "start": 49.0, "end": 57.0, "text": + " And you can choose any cloud whatsoever, really. And then just go and run with + that.", "tokens": [50364, 400, 291, 393, 2826, 604, 4588, 17076, 11, 534, 13, 400, + 550, 445, 352, 293, 1190, 365, 300, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.11983788398004347, "compression_ratio": 1.564102564102564, "no_speech_prob": + 0.19468526542186737}, {"id": 7, "seek": 4900, "start": 57.0, "end": 65.0, "text": + " And I''ll share a couple of links later. 
I''m also a host of the vector podcast + that I started, I think, four years ago.", "tokens": [50764, 400, 286, 603, 2073, + 257, 1916, 295, 6123, 1780, 13, 286, 478, 611, 257, 3975, 295, 264, 8062, 7367, + 300, 286, 1409, 11, 286, 519, 11, 1451, 924, 2057, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.11983788398004347, "compression_ratio": 1.564102564102564, "no_speech_prob": + 0.19468526542186737}, {"id": 8, "seek": 4900, "start": 65.0, "end": 75.0, "text": + " I already stopped counting. Maybe some of you have heard some of the episodes. + And yeah, it keeps going on and off, but I''m really excited to continue doing that.", + "tokens": [51164, 286, 1217, 5936, 13251, 13, 2704, 512, 295, 291, 362, 2198, 512, + 295, 264, 9313, 13, 400, 1338, 11, 309, 5965, 516, 322, 293, 766, 11, 457, 286, + 478, 534, 2919, 281, 2354, 884, 300, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.11983788398004347, "compression_ratio": 1.564102564102564, "no_speech_prob": + 0.19468526542186737}, {"id": 9, "seek": 7500, "start": 75.0, "end": 84.0, "text": + " I''ve been in search for, I think, 16 years, maybe 20 years if I include academic + experience or exposure.", "tokens": [50364, 286, 600, 668, 294, 3164, 337, 11, 286, + 519, 11, 3165, 924, 11, 1310, 945, 924, 498, 286, 4090, 7778, 1752, 420, 10420, + 13, 50814], "temperature": 0.0, "avg_logprob": -0.14241212873316522, "compression_ratio": + 1.4505494505494505, "no_speech_prob": 0.24794630706310272}, {"id": 10, "seek": 7500, + "start": 84.0, "end": 97.0, "text": " I''ve built search at startups, multinational + technology giants. 
I think what was the startup, for example, AlfaSense became, + I think, a unicorn company by now.", "tokens": [50814, 286, 600, 3094, 3164, 412, + 28041, 11, 45872, 1478, 2899, 31894, 13, 286, 519, 437, 390, 264, 18578, 11, 337, + 1365, 11, 967, 11771, 50, 1288, 3062, 11, 286, 519, 11, 257, 28122, 2237, 538, 586, + 13, 51464], "temperature": 0.0, "avg_logprob": -0.14241212873316522, "compression_ratio": + 1.4505494505494505, "no_speech_prob": 0.24794630706310272}, {"id": 11, "seek": 9700, + "start": 97.0, "end": 108.0, "text": " Yeah, I''m super excited to partner with + three AI power search and support from vector podcasts looking forward to the session + today.", "tokens": [50364, 865, 11, 286, 478, 1687, 2919, 281, 4975, 365, 1045, + 7318, 1347, 3164, 293, 1406, 490, 8062, 24045, 1237, 2128, 281, 264, 5481, 965, + 13, 50914], "temperature": 0.0, "avg_logprob": -0.19855614994349105, "compression_ratio": + 1.5585585585585586, "no_speech_prob": 0.2782234847545624}, {"id": 12, "seek": 9700, + "start": 108.0, "end": 110.0, "text": " Over to you, Trey.", "tokens": [50914, 4886, + 281, 291, 11, 314, 7950, 13, 51014], "temperature": 0.0, "avg_logprob": -0.19855614994349105, + "compression_ratio": 1.5585585585585586, "no_speech_prob": 0.2782234847545624}, + {"id": 13, "seek": 9700, "start": 110.0, "end": 112.0, "text": " Awesome. Thanks, + Dmitri. Appreciate it.", "tokens": [51014, 10391, 13, 2561, 11, 413, 3508, 470, + 13, 37601, 309, 13, 51114], "temperature": 0.0, "avg_logprob": -0.19855614994349105, + "compression_ratio": 1.5585585585585586, "no_speech_prob": 0.2782234847545624}, + {"id": 14, "seek": 9700, "start": 112.0, "end": 121.0, "text": " I''m really excited + to have Dmitri Khan more for the conversation part of this. 
He''s been doing, doing + the vector podcast and in the space for a long time.", "tokens": [51114, 286, 478, + 534, 2919, 281, 362, 413, 3508, 470, 18136, 544, 337, 264, 3761, 644, 295, 341, + 13, 634, 311, 668, 884, 11, 884, 264, 8062, 7367, 293, 294, 264, 1901, 337, 257, + 938, 565, 13, 51564], "temperature": 0.0, "avg_logprob": -0.19855614994349105, "compression_ratio": + 1.5585585585585586, "no_speech_prob": 0.2782234847545624}, {"id": 15, "seek": 12100, + "start": 121.0, "end": 135.0, "text": " So I think it''d be useful to help him facilitate, + get lots of questions and good discussions. So I''m Trey Granger. I''m the lead + author on the book AI Powered Search along with Doug Turnbull and Max Irwin.", "tokens": + [50364, 407, 286, 519, 309, 1116, 312, 4420, 281, 854, 796, 20207, 11, 483, 3195, + 295, 1651, 293, 665, 11088, 13, 407, 286, 478, 314, 7950, 2606, 3176, 13, 286, 478, + 264, 1477, 3793, 322, 264, 1446, 7318, 7086, 292, 17180, 2051, 365, 12742, 7956, + 37290, 293, 7402, 9151, 9136, 13, 51064], "temperature": 0.0, "avg_logprob": -0.19658759877651552, + "compression_ratio": 1.5, "no_speech_prob": 0.32589954137802124}, {"id": 16, "seek": + 12100, "start": 135.0, "end": 142.0, "text": " I''m the founder of Search Colonel + company that does AI Powered Search consulting, technical advisor, open source connections.", + "tokens": [51064, 286, 478, 264, 14917, 295, 17180, 28478, 2237, 300, 775, 7318, + 7086, 292, 17180, 23682, 11, 6191, 19161, 11, 1269, 4009, 9271, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.19658759877651552, "compression_ratio": 1.5, "no_speech_prob": + 0.32589954137802124}, {"id": 17, "seek": 14200, "start": 142.0, "end": 156.0, "text": + " Last year, I''ve been an adjunct professor at from university teaching computer + science. 
My background, basically my entire career has been in search, particularly + the intersection of data science, AI and search.", "tokens": [50364, 5264, 1064, + 11, 286, 600, 668, 364, 614, 10010, 349, 8304, 412, 490, 5454, 4571, 3820, 3497, + 13, 1222, 3678, 11, 1936, 452, 2302, 3988, 575, 668, 294, 3164, 11, 4098, 264, 15236, + 295, 1412, 3497, 11, 7318, 293, 3164, 13, 51064], "temperature": 0.0, "avg_logprob": + -0.21775614298306978, "compression_ratio": 1.6487455197132617, "no_speech_prob": + 0.5073336362838745}, {"id": 18, "seek": 14200, "start": 156.0, "end": 170.0, "text": + " My last company, prior to search journal, I was the CTO of pre search, which is + a decentralized web search engine prior to that. I was the chief algorithms officer + at Lucidworks, a search company, as well as prior to that, their SVP of engineering.", + "tokens": [51064, 1222, 1036, 2237, 11, 4059, 281, 3164, 6708, 11, 286, 390, 264, + 383, 15427, 295, 659, 3164, 11, 597, 307, 257, 32870, 3670, 3164, 2848, 4059, 281, + 300, 13, 286, 390, 264, 9588, 14642, 8456, 412, 9593, 327, 18357, 11, 257, 3164, + 2237, 11, 382, 731, 382, 4059, 281, 300, 11, 641, 31910, 47, 295, 7043, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.21775614298306978, "compression_ratio": 1.6487455197132617, + "no_speech_prob": 0.5073336362838745}, {"id": 19, "seek": 17000, "start": 170.0, + "end": 181.0, "text": " I also had a search at career builder, prior to that. 
I + also a decade ago, solar in action, but AI Powered Search is the focus of what I''m + doing right now.", "tokens": [50364, 286, 611, 632, 257, 3164, 412, 3988, 27377, + 11, 4059, 281, 300, 13, 286, 611, 257, 10378, 2057, 11, 7936, 294, 3069, 11, 457, + 7318, 7086, 292, 17180, 307, 264, 1879, 295, 437, 286, 478, 884, 558, 586, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.21004395346039706, "compression_ratio": 1.6053639846743295, + "no_speech_prob": 0.2600209712982178}, {"id": 20, "seek": 17000, "start": 181.0, + "end": 197.0, "text": " The books got, you know, quite good reviews from folks. + If you haven''t checked it out, please check it out. And this lightning lesson is + one of a series leading up to an AI Powered Search course that Doug Turnbull and + I are teaching starting two weeks from today.", "tokens": [50914, 440, 3642, 658, + 11, 291, 458, 11, 1596, 665, 10229, 490, 4024, 13, 759, 291, 2378, 380, 10033, 309, + 484, 11, 1767, 1520, 309, 484, 13, 400, 341, 16589, 6898, 307, 472, 295, 257, 2638, + 5775, 493, 281, 364, 7318, 7086, 292, 17180, 1164, 300, 12742, 7956, 37290, 293, + 286, 366, 4571, 2891, 732, 3259, 490, 965, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.21004395346039706, "compression_ratio": 1.6053639846743295, "no_speech_prob": + 0.2600209712982178}, {"id": 21, "seek": 19700, "start": 197.0, "end": 205.0, "text": + " I heard of it. 
It''s kind of themed based upon the book, but we''ll be going into + a lot of new and emerging techniques that aren''t in the book as well.", "tokens": + [50364, 286, 2198, 295, 309, 13, 467, 311, 733, 295, 33920, 2361, 3564, 264, 1446, + 11, 457, 321, 603, 312, 516, 666, 257, 688, 295, 777, 293, 14989, 7512, 300, 3212, + 380, 294, 264, 1446, 382, 731, 13, 50764], "temperature": 0.0, "avg_logprob": -0.15663624560739112, + "compression_ratio": 1.6645569620253164, "no_speech_prob": 0.3834373354911804}, + {"id": 22, "seek": 19700, "start": 205.0, "end": 226.0, "text": " Just to give you + a sense, I''ll spend like a minute on this, maybe two. If you''re curious, it''s, + you know, four solid weeks of material, the first week will sort of, you know, do + a course intro, introduce the search relevance problem, talk about ranking those + things. We''ll have a guest session from Eric Pugh from open source connections, + talking about user behavior insights.", "tokens": [50764, 1449, 281, 976, 291, 257, + 2020, 11, 286, 603, 3496, 411, 257, 3456, 322, 341, 11, 1310, 732, 13, 759, 291, + 434, 6369, 11, 309, 311, 11, 291, 458, 11, 1451, 5100, 3259, 295, 2527, 11, 264, + 700, 1243, 486, 1333, 295, 11, 291, 458, 11, 360, 257, 1164, 12897, 11, 5366, 264, + 3164, 32684, 1154, 11, 751, 466, 17833, 729, 721, 13, 492, 603, 362, 257, 8341, + 5481, 490, 9336, 430, 1984, 490, 1269, 4009, 9271, 11, 1417, 466, 4195, 5223, 14310, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.15663624560739112, "compression_ratio": + 1.6645569620253164, "no_speech_prob": 0.3834373354911804}, {"id": 23, "seek": 22600, + "start": 226.0, "end": 236.0, "text": " For collecting click stream data and how + to properly collect and process that are an accession will be on signals and reflected + intelligence models.", "tokens": [50364, 1171, 12510, 2052, 4309, 1412, 293, 577, + 281, 6108, 2500, 293, 1399, 300, 366, 364, 2105, 313, 486, 312, 322, 12354, 293, + 15502, 7599, 5245, 13, 50864], "temperature": 
0.0, "avg_logprob": -0.143905967029173, + "compression_ratio": 1.7889908256880733, "no_speech_prob": 0.009816485457122326}, + {"id": 24, "seek": 22600, "start": 236.0, "end": 253.0, "text": " Everything from + signals boosting for popularized relevance to learning to rank for generalized relevance + to collaborative filtering and matrix factorization for personalized relevance to + knowledge graph learning to learn from user behaviors.", "tokens": [50864, 5471, + 490, 12354, 43117, 337, 3743, 1602, 32684, 281, 2539, 281, 6181, 337, 44498, 32684, + 281, 16555, 30822, 293, 8141, 5952, 2144, 337, 28415, 32684, 281, 3601, 4295, 2539, + 281, 1466, 490, 4195, 15501, 13, 51714], "temperature": 0.0, "avg_logprob": -0.143905967029173, + "compression_ratio": 1.7889908256880733, "no_speech_prob": 0.009816485457122326}, + {"id": 25, "seek": 25300, "start": 253.0, "end": 267.0, "text": " You know, terms, + misspelling things like that about your domain. And then every week will have office + hours where you can bring your hardest questions or we''ve got labs throughout the + course as well. If you need help with those, we can help.", "tokens": [50364, 509, + 458, 11, 2115, 11, 1713, 494, 2669, 721, 411, 300, 466, 428, 9274, 13, 400, 550, + 633, 1243, 486, 362, 3398, 2496, 689, 291, 393, 1565, 428, 13158, 1651, 420, 321, + 600, 658, 20339, 3710, 264, 1164, 382, 731, 13, 759, 291, 643, 854, 365, 729, 11, + 321, 393, 854, 13, 51064], "temperature": 0.0, "avg_logprob": -0.1717073576790946, + "compression_ratio": 1.4260355029585798, "no_speech_prob": 0.36015018820762634}, + {"id": 26, "seek": 26700, "start": 267.0, "end": 287.0, "text": " We the next week + will dive into AI powered query modalities, things like buying coders and crossing + coders talk about chunking, talk about late interaction models, hybrid search, multimodal + search, all of those. 
Again, all of this has code and notebooks associated with + it will be working through.", "tokens": [50364, 492, 264, 958, 1243, 486, 9192, + 666, 7318, 17786, 14581, 1072, 16110, 11, 721, 411, 6382, 17656, 433, 293, 14712, + 17656, 433, 751, 466, 16635, 278, 11, 751, 466, 3469, 9285, 5245, 11, 13051, 3164, + 11, 32972, 378, 304, 3164, 11, 439, 295, 729, 13, 3764, 11, 439, 295, 341, 575, + 3089, 293, 43782, 6615, 365, 309, 486, 312, 1364, 807, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.15559605396155154, "compression_ratio": 1.5797872340425532, + "no_speech_prob": 0.6516717672348022}, {"id": 27, "seek": 28700, "start": 287.0, + "end": 305.0, "text": " We have a guest lecture from Jenny from quadrant who will + be talking about mixing sparse sentence representations with mini coil and then + we''ll dive after that into sort of hands on building ranking classifiers or learning + to rank models.", "tokens": [50364, 492, 362, 257, 8341, 7991, 490, 20580, 490, + 46856, 567, 486, 312, 1417, 466, 11983, 637, 11668, 8174, 33358, 365, 8382, 22225, + 293, 550, 321, 603, 9192, 934, 300, 666, 1333, 295, 2377, 322, 2390, 17833, 1508, + 23463, 420, 2539, 281, 6181, 5245, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.24801528453826904, "compression_ratio": 1.451219512195122, "no_speech_prob": + 0.5208132266998291}, {"id": 28, "seek": 30500, "start": 305.0, "end": 315.0, "text": + " And what is entailed in that we will of course then have office hours again the + next week will dive deep into rag talk about rag.", "tokens": [50364, 400, 437, + 307, 948, 24731, 294, 300, 321, 486, 295, 1164, 550, 362, 3398, 2496, 797, 264, + 958, 1243, 486, 9192, 2452, 666, 17539, 751, 466, 17539, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.25813097702829463, "compression_ratio": 1.7728937728937728, + "no_speech_prob": 0.56816565990448}, {"id": 29, "seek": 30500, "start": 315.0, "end": + 323.0, "text": " You know, sort of naive rag, agentech rag adaptive rag guard rails + all 
the sorts of things you need to sort of understand to do to rag well.", "tokens": + [50864, 509, 458, 11, 1333, 295, 29052, 17539, 11, 623, 1576, 339, 17539, 27912, + 17539, 6290, 27649, 439, 264, 7527, 295, 721, 291, 643, 281, 1333, 295, 1223, 281, + 360, 281, 17539, 731, 13, 51264], "temperature": 0.0, "avg_logprob": -0.25813097702829463, + "compression_ratio": 1.7728937728937728, "no_speech_prob": 0.56816565990448}, {"id": + 30, "seek": 30500, "start": 323.0, "end": 334.0, "text": " We''ll talk about agentech + search towards the end of the course talk about interleaving strategies for rag + will have Max Irwin our co author on a powered search to giving a guest lecture + session after that will be.", "tokens": [51264, 492, 603, 751, 466, 623, 1576, 339, + 3164, 3030, 264, 917, 295, 264, 1164, 751, 466, 728, 306, 6152, 9029, 337, 17539, + 486, 362, 7402, 9151, 9136, 527, 598, 3793, 322, 257, 17786, 3164, 281, 2902, 257, + 8341, 7991, 5481, 934, 300, 486, 312, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.25813097702829463, "compression_ratio": 1.7728937728937728, "no_speech_prob": + 0.56816565990448}, {"id": 31, "seek": 33400, "start": 334.0, "end": 354.0, "text": + " Automating learning to rank and with click models and with active learning so + we''ll be diving into how to deal with biases in your data how to deal with exploration + versus exploitation looking for results that may maybe don''t show up in your normal + search patterns and then.", "tokens": [50364, 24619, 990, 2539, 281, 6181, 293, + 365, 2052, 5245, 293, 365, 4967, 2539, 370, 321, 603, 312, 20241, 666, 577, 281, + 2028, 365, 32152, 294, 428, 1412, 577, 281, 2028, 365, 16197, 5717, 33122, 1237, + 337, 3542, 300, 815, 1310, 500, 380, 855, 493, 294, 428, 2710, 3164, 8294, 293, + 550, 13, 51364], "temperature": 0.0, "avg_logprob": -0.14062168768474034, "compression_ratio": + 1.6407185628742516, "no_speech_prob": 0.005514203105121851}, {"id": 32, "seek": + 35400, "start": 354.0, "end": 378.0, 
"text": " The sort of final two weeks will + have a guest lecture from john handler from open search and AWS really talking about + scaling vector search in production with lots of good experience from large scale + open search clusters and Amazon servers and then we''ll dive into optimizing I search + for production everything from quantization re ranking strategies semantic caching.", + "tokens": [50364, 440, 1333, 295, 2572, 732, 3259, 486, 362, 257, 8341, 7991, 490, + 35097, 41967, 490, 1269, 3164, 293, 17650, 534, 1417, 466, 21589, 8062, 3164, 294, + 4265, 365, 3195, 295, 665, 1752, 490, 2416, 4373, 1269, 3164, 23313, 293, 6795, + 15909, 293, 550, 321, 603, 9192, 666, 40425, 286, 3164, 337, 4265, 1203, 490, 4426, + 2144, 319, 17833, 9029, 47982, 269, 2834, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.1519966552506632, "compression_ratio": 1.6428571428571428, "no_speech_prob": + 0.20361757278442383}, {"id": 33, "seek": 37800, "start": 378.0, "end": 400.0, "text": + " Running local models and then for our last session will dive deep into AI powered + query understanding and agentech search focused on really interpreting understanding + queries leveraging agents as part of that process and so if that''s interesting + to you there''s a link and a QR code here anyone who attends today is eligible for + 20% off the course.", "tokens": [50364, 28136, 2654, 5245, 293, 550, 337, 527, 1036, + 5481, 486, 9192, 2452, 666, 7318, 17786, 14581, 3701, 293, 623, 1576, 339, 3164, + 5178, 322, 534, 37395, 3701, 24109, 32666, 12554, 382, 644, 295, 300, 1399, 293, + 370, 498, 300, 311, 1880, 281, 291, 456, 311, 257, 2113, 293, 257, 32784, 3089, + 510, 2878, 567, 49837, 965, 307, 14728, 337, 945, 4, 766, 264, 1164, 13, 51464], + "temperature": 0.0, "avg_logprob": -0.10343417568483214, "compression_ratio": 1.5844748858447488, + "no_speech_prob": 0.19440768659114838}, {"id": 34, "seek": 40000, "start": 401.0, + "end": 425.0, "text": " And so definitely check it out if you''ve 
been considering + it there''s two weeks left and of course even if you can''t attend all the sessions + everyone who''s enrolled will have permanent access to all of the recordings all + the code and all the course materials so you can use these going forward into the + future if that''s interesting to you so done with that now I''d like to get to our + topic which is beyond hybrid search with wormhole vectors.", "tokens": [50414, 400, + 370, 2138, 1520, 309, 484, 498, 291, 600, 668, 8079, 309, 456, 311, 732, 3259, 1411, + 293, 295, 1164, 754, 498, 291, 393, 380, 6888, 439, 264, 11081, 1518, 567, 311, + 25896, 486, 362, 10996, 2105, 281, 439, 295, 264, 25162, 439, 264, 3089, 293, 439, + 264, 1164, 5319, 370, 291, 393, 764, 613, 516, 2128, 666, 264, 2027, 498, 300, 311, + 1880, 281, 291, 370, 1096, 365, 300, 586, 286, 1116, 411, 281, 483, 281, 527, 4829, + 597, 307, 4399, 13051, 3164, 365, 23835, 14094, 18875, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.07660269480879589, "compression_ratio": 1.6768060836501901, + "no_speech_prob": 0.17372703552246094}, {"id": 35, "seek": 42500, "start": 425.0, + "end": 454.0, "text": " So let me dive straight in and feel free if you have questions + as Dimitri said post them in the comments Dimitri feel free to interrupt me at any + point if there''s something worth diving into otherwise i''m just going to keep + going and kind of focus on conversation at the end so I want to start with some + basic material on vectors and vector spaces to kind of set our expectations for + where we''re going with wormhole vectors to start vectors by definition mathematically + or something that have direction and we''re going to start with the end of the end + of the session.", "tokens": [50414, 407, 718, 385, 9192, 2997, 294, 293, 841, 1737, + 498, 291, 362, 1651, 382, 20975, 270, 470, 848, 2183, 552, 294, 264, 3053, 20975, + 270, 470, 841, 1737, 281, 12729, 385, 412, 604, 935, 498, 456, 311, 746, 3163, 20241, + 666, 5911, 741, 478, 445, 
516, 281, 1066, 516, 293, 733, 295, 1879, 322, 3761, 412, + 264, 917, 370, 286, 528, 281, 722, 365, 512, 3875, 2527, 322, 18875, 293, 8062, + 7673, 281, 733, 295, 992, 527, 9843, 337, 689, 321, 434, 516, 365, 23835, 14094, + 18875, 281, 722, 18875, 538, 7123, 44003, 420, 746, 300, 362, 3513, 293, 321, 434, + 516, 281, 722, 365, 264, 917, 295, 264, 917, 295, 264, 5481, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.3083912123981704, "compression_ratio": 1.89, "no_speech_prob": + 0.12884946167469025}, {"id": 36, "seek": 45500, "start": 455.0, "end": 457.18, "text": + " And in this video let''s see in situations [\u266amagnitude [\u266amagnitude [\u266amagnitude + [\u266amagnitude [\u266amagnitude [\u266amagnitude [\u266amagnitude [\u266amagnitude + [\u266amagnitude [\u266amagnitude [\u266amagnitude [\u266amagnitude [\u266amagnitude + [\u266amagnitude [\u266amagnitude [\u266amanage [\u266amagnitude [\u266af Mash Fans + Bake \u0443\u0432\u0438\u0434 then", "tokens": [50364, 400, 294, 341, 960, 718, + 311, 536, 294, 6851, 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, 76, 4535, + 4377, 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, + 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, + 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, 76, 4535, 4377, 44529, 76, + 4535, 4377, 44529, 76, 4535, 4377, 44529, 1601, 609, 44529, 76, 4535, 4377, 44529, + 69, 42039, 25065, 42597, 21974, 550, 50473], "temperature": 1.0, "avg_logprob": + -2.9473230931665992, "compression_ratio": 2.16, "no_speech_prob": 0.0055676670745015144}, + {"id": 37, "seek": 45500, "start": 457.18, "end": 458.88, "text": "\u82e5ymab\u0457r + VPN www.pe devils.comspeaking.habit.com", "tokens": [50473, 13736, 98, 4199, 455, + 8045, 81, 24512, 12520, 13, 494, 1905, 4174, 13, 1112, 14579, 13, 7821, 270, 13, + 1112, 50558], "temperature": 1.0, "avg_logprob": -2.9473230931665992, "compression_ratio": + 2.16, "no_speech_prob": 
0.0055676670745015144}, {"id": 38, "seek": 45500, "start": + 458.88, "end": 468.96, "text": " So let''s see how much you could get together.", + "tokens": [50558, 407, 718, 311, 536, 577, 709, 291, 727, 483, 1214, 13, 51062], + "temperature": 1.0, "avg_logprob": -2.9473230931665992, "compression_ratio": 2.16, + "no_speech_prob": 0.0055676670745015144}, {"id": 39, "seek": 45500, "start": 474.0, + "end": 480.9, "text": " of course GK doctrine is a huge task, a huge task that a + man develops on who the heck is dominant during the process and setup it in", "tokens": + [51314, 295, 1164, 460, 42, 23290, 307, 257, 2603, 5633, 11, 257, 2603, 5633, 300, + 257, 587, 25453, 322, 567, 264, 12872, 307, 15657, 1830, 264, 1399, 293, 8657, 309, + 294, 51659], "temperature": 1.0, "avg_logprob": -2.9473230931665992, "compression_ratio": + 2.16, "no_speech_prob": 0.0055676670745015144}, {"id": 40, "seek": 48090, "start": + 480.9, "end": 483.65999999999997, "text": " in vector space and a Hilbert space,", + "tokens": [50364, 294, 8062, 1901, 293, 257, 19914, 4290, 1901, 11, 50502], "temperature": + 0.0, "avg_logprob": -0.29920297241210936, "compression_ratio": 1.7569721115537849, + "no_speech_prob": 0.1460099071264267}, {"id": 41, "seek": 48090, "start": 483.65999999999997, + "end": 485.9, "text": " but in a semantic vector space,", "tokens": [50502, 457, + 294, 257, 47982, 8062, 1901, 11, 50614], "temperature": 0.0, "avg_logprob": -0.29920297241210936, + "compression_ratio": 1.7569721115537849, "no_speech_prob": 0.1460099071264267}, + {"id": 42, "seek": 48090, "start": 485.9, "end": 487.94, "text": " into which we + can map a concept.", "tokens": [50614, 666, 597, 321, 393, 4471, 257, 3410, 13, + 50716], "temperature": 0.0, "avg_logprob": -0.29920297241210936, "compression_ratio": + 1.7569721115537849, "no_speech_prob": 0.1460099071264267}, {"id": 43, "seek": 48090, + "start": 487.94, "end": 491.29999999999995, "text": " So whereas, vectors have dimensions,", + "tokens": [50716, 
407, 9735, 11, 18875, 362, 12819, 11, 50884], "temperature": 0.0, + "avg_logprob": -0.29920297241210936, "compression_ratio": 1.7569721115537849, "no_speech_prob": + 0.1460099071264267}, {"id": 44, "seek": 48090, "start": 491.29999999999995, "end": + 494.58, "text": " and those dimensions sort of go in any direction.", "tokens": + [50884, 293, 729, 12819, 1333, 295, 352, 294, 604, 3513, 13, 51048], "temperature": + 0.0, "avg_logprob": -0.29920297241210936, "compression_ratio": 1.7569721115537849, + "no_speech_prob": 0.1460099071264267}, {"id": 45, "seek": 48090, "start": 494.58, + "end": 496.14, "text": " When we talk about an embedding,", "tokens": [51048, 1133, + 321, 751, 466, 364, 12240, 3584, 11, 51126], "temperature": 0.0, "avg_logprob": + -0.29920297241210936, "compression_ratio": 1.7569721115537849, "no_speech_prob": + 0.1460099071264267}, {"id": 46, "seek": 48090, "start": 496.14, "end": 499.17999999999995, + "text": " an embedding is actually a point in vector space.", "tokens": [51126, + 364, 12240, 3584, 307, 767, 257, 935, 294, 8062, 1901, 13, 51278], "temperature": + 0.0, "avg_logprob": -0.29920297241210936, "compression_ratio": 1.7569721115537849, + "no_speech_prob": 0.1460099071264267}, {"id": 47, "seek": 48090, "start": 499.17999999999995, + "end": 501.06, "text": " So for example, this point right here,", "tokens": [51278, + 407, 337, 1365, 11, 341, 935, 558, 510, 11, 51372], "temperature": 0.0, "avg_logprob": + -0.29920297241210936, "compression_ratio": 1.7569721115537849, "no_speech_prob": + 0.1460099071264267}, {"id": 48, "seek": 48090, "start": 501.06, "end": 504.38, "text": + " this series of floats for book or tree or what have you.", "tokens": [51372, 341, + 2638, 295, 37878, 337, 1446, 420, 4230, 420, 437, 362, 291, 13, 51538], "temperature": + 0.0, "avg_logprob": -0.29920297241210936, "compression_ratio": 1.7569721115537849, + "no_speech_prob": 0.1460099071264267}, {"id": 49, "seek": 48090, "start": 504.38, + "end": 506.09999999999997, 
"text": " You can think of it as a vector", "tokens": + [51538, 509, 393, 519, 295, 309, 382, 257, 8062, 51624], "temperature": 0.0, "avg_logprob": + -0.29920297241210936, "compression_ratio": 1.7569721115537849, "no_speech_prob": + 0.1460099071264267}, {"id": 50, "seek": 48090, "start": 506.09999999999997, "end": + 509.85999999999996, "text": " originating from the origin at 0, 0 here,", "tokens": + [51624, 4957, 990, 490, 264, 4957, 412, 1958, 11, 1958, 510, 11, 51812], "temperature": + 0.0, "avg_logprob": -0.29920297241210936, "compression_ratio": 1.7569721115537849, + "no_speech_prob": 0.1460099071264267}, {"id": 51, "seek": 50986, "start": 509.86, + "end": 512.26, "text": " and extending out to that point.", "tokens": [50364, 293, + 24360, 484, 281, 300, 935, 13, 50484], "temperature": 0.0, "avg_logprob": -0.19104307731695935, + "compression_ratio": 1.695970695970696, "no_speech_prob": 0.00017427001148462296}, + {"id": 52, "seek": 50986, "start": 512.26, "end": 515.34, "text": " But fundamentally, + we think of an embedding as a coordinate,", "tokens": [50484, 583, 17879, 11, 321, + 519, 295, 364, 12240, 3584, 382, 257, 15670, 11, 50638], "temperature": 0.0, "avg_logprob": + -0.19104307731695935, "compression_ratio": 1.695970695970696, "no_speech_prob": + 0.00017427001148462296}, {"id": 53, "seek": 50986, "start": 515.34, "end": 518.0600000000001, + "text": " that is a point in vector space that corresponds", "tokens": [50638, 300, + 307, 257, 935, 294, 8062, 1901, 300, 23249, 50774], "temperature": 0.0, "avg_logprob": + -0.19104307731695935, "compression_ratio": 1.695970695970696, "no_speech_prob": + 0.00017427001148462296}, {"id": 54, "seek": 50986, "start": 518.0600000000001, "end": + 521.3000000000001, "text": " with some semantic meaning.", "tokens": [50774, 365, + 512, 47982, 3620, 13, 50936], "temperature": 0.0, "avg_logprob": -0.19104307731695935, + "compression_ratio": 1.695970695970696, "no_speech_prob": 0.00017427001148462296}, + {"id": 55, "seek": 
50986, "start": 521.3000000000001, "end": 523.62, "text": " And + search whenever we''re dealing with embeddings,", "tokens": [50936, 400, 3164, 5699, + 321, 434, 6260, 365, 12240, 29432, 11, 51052], "temperature": 0.0, "avg_logprob": + -0.19104307731695935, "compression_ratio": 1.695970695970696, "no_speech_prob": + 0.00017427001148462296}, {"id": 56, "seek": 50986, "start": 523.62, "end": 527.58, + "text": " we often have things like word or phrase embeddings,", "tokens": [51052, + 321, 2049, 362, 721, 411, 1349, 420, 9535, 12240, 29432, 11, 51250], "temperature": + 0.0, "avg_logprob": -0.19104307731695935, "compression_ratio": 1.695970695970696, + "no_speech_prob": 0.00017427001148462296}, {"id": 57, "seek": 50986, "start": 527.58, + "end": 529.1800000000001, "text": " where we take an individual word", "tokens": + [51250, 689, 321, 747, 364, 2609, 1349, 51330], "temperature": 0.0, "avg_logprob": + -0.19104307731695935, "compression_ratio": 1.695970695970696, "no_speech_prob": + 0.00017427001148462296}, {"id": 58, "seek": 50986, "start": 529.1800000000001, "end": + 531.46, "text": " and leveraging a transformer model typically.", "tokens": [51330, + 293, 32666, 257, 31782, 2316, 5850, 13, 51444], "temperature": 0.0, "avg_logprob": + -0.19104307731695935, "compression_ratio": 1.695970695970696, "no_speech_prob": + 0.00017427001148462296}, {"id": 59, "seek": 50986, "start": 531.46, "end": 534.9, + "text": " We will generate that series of floats", "tokens": [51444, 492, 486, 8460, + 300, 2638, 295, 37878, 51616], "temperature": 0.0, "avg_logprob": -0.19104307731695935, + "compression_ratio": 1.695970695970696, "no_speech_prob": 0.00017427001148462296}, + {"id": 60, "seek": 50986, "start": 534.9, "end": 537.02, "text": " that represents + the meaning of that word,", "tokens": [51616, 300, 8855, 264, 3620, 295, 300, 1349, + 11, 51722], "temperature": 0.0, "avg_logprob": -0.19104307731695935, "compression_ratio": + 1.695970695970696, "no_speech_prob": 
0.00017427001148462296}, {"id": 61, "seek": + 50986, "start": 537.02, "end": 538.54, "text": " given the context around it.", + "tokens": [51722, 2212, 264, 4319, 926, 309, 13, 51798], "temperature": 0.0, "avg_logprob": + -0.19104307731695935, "compression_ratio": 1.695970695970696, "no_speech_prob": + 0.00017427001148462296}, {"id": 62, "seek": 53854, "start": 538.54, "end": 540.9399999999999, + "text": " But we can also have sentence embeddings,", "tokens": [50364, 583, 321, + 393, 611, 362, 8174, 12240, 29432, 11, 50484], "temperature": 0.0, "avg_logprob": + -0.1349480564432933, "compression_ratio": 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, + {"id": 63, "seek": 53854, "start": 540.9399999999999, "end": 542.8199999999999, + "text": " where we look at all of the words in the sentence", "tokens": [50484, + 689, 321, 574, 412, 439, 295, 264, 2283, 294, 264, 8174, 50578], "temperature": + 0.0, "avg_logprob": -0.1349480564432933, "compression_ratio": 1.937269372693727, + "no_speech_prob": 0.0001605852594366297}, {"id": 64, "seek": 53854, "start": 542.8199999999999, + "end": 544.86, "text": " and their contextual meaning,", "tokens": [50578, 293, + 641, 35526, 3620, 11, 50680], "temperature": 0.0, "avg_logprob": -0.1349480564432933, + "compression_ratio": 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, + {"id": 65, "seek": 53854, "start": 544.86, "end": 548.18, "text": " and generate + an embedding that represents", "tokens": [50680, 293, 8460, 364, 12240, 3584, 300, + 8855, 50846], "temperature": 0.0, "avg_logprob": -0.1349480564432933, "compression_ratio": + 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, {"id": 66, "seek": + 53854, "start": 548.18, "end": 549.18, "text": " the meaning of the sentence.", + "tokens": [50846, 264, 3620, 295, 264, 8174, 13, 50896], "temperature": 0.0, "avg_logprob": + -0.1349480564432933, "compression_ratio": 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, + {"id": 67, "seek": 
53854, "start": 549.18, "end": 550.54, "text": " We can have + paragraph embeddings", "tokens": [50896, 492, 393, 362, 18865, 12240, 29432, 50964], + "temperature": 0.0, "avg_logprob": -0.1349480564432933, "compression_ratio": 1.937269372693727, + "no_speech_prob": 0.0001605852594366297}, {"id": 68, "seek": 53854, "start": 550.54, + "end": 554.02, "text": " that sort of summarize the core ideas of that paragraph,", + "tokens": [50964, 300, 1333, 295, 20858, 264, 4965, 3487, 295, 300, 18865, 11, 51138], + "temperature": 0.0, "avg_logprob": -0.1349480564432933, "compression_ratio": 1.937269372693727, + "no_speech_prob": 0.0001605852594366297}, {"id": 69, "seek": 53854, "start": 554.02, + "end": 555.54, "text": " and the same thing with a document.", "tokens": [51138, + 293, 264, 912, 551, 365, 257, 4166, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.1349480564432933, "compression_ratio": 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, + {"id": 70, "seek": 53854, "start": 555.54, "end": 558.5799999999999, "text": " Often + in search, we''ll start with just a document embedding,", "tokens": [51214, 20043, + 294, 3164, 11, 321, 603, 722, 365, 445, 257, 4166, 12240, 3584, 11, 51366], "temperature": + 0.0, "avg_logprob": -0.1349480564432933, "compression_ratio": 1.937269372693727, + "no_speech_prob": 0.0001605852594366297}, {"id": 71, "seek": 53854, "start": 558.5799999999999, + "end": 561.6999999999999, "text": " and when we take a query, we generate an embedding,", + "tokens": [51366, 293, 562, 321, 747, 257, 14581, 11, 321, 8460, 364, 12240, 3584, + 11, 51522], "temperature": 0.0, "avg_logprob": -0.1349480564432933, "compression_ratio": + 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, {"id": 72, "seek": + 53854, "start": 561.6999999999999, "end": 563.54, "text": " and we do sort of a + vector similarity", "tokens": [51522, 293, 321, 360, 1333, 295, 257, 8062, 32194, + 51614], "temperature": 0.0, "avg_logprob": -0.1349480564432933, 
"compression_ratio": + 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, {"id": 73, "seek": + 53854, "start": 563.54, "end": 567.0999999999999, "text": " between defined related + documents that match the query.", "tokens": [51614, 1296, 7642, 4077, 8512, 300, + 2995, 264, 14581, 13, 51792], "temperature": 0.0, "avg_logprob": -0.1349480564432933, + "compression_ratio": 1.937269372693727, "no_speech_prob": 0.0001605852594366297}, + {"id": 74, "seek": 56710, "start": 567.1, "end": 569.26, "text": " But you can chunk + documents up in any way", "tokens": [50364, 583, 291, 393, 16635, 8512, 493, 294, + 604, 636, 50472], "temperature": 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": + 1.628691983122363, "no_speech_prob": 9.877497359411791e-05}, {"id": 75, "seek": + 56710, "start": 569.26, "end": 571.74, "text": " and any number of vectors.", "tokens": + [50472, 293, 604, 1230, 295, 18875, 13, 50596], "temperature": 0.0, "avg_logprob": + -0.16214718204913753, "compression_ratio": 1.628691983122363, "no_speech_prob": + 9.877497359411791e-05}, {"id": 76, "seek": 56710, "start": 571.74, "end": 575.3000000000001, + "text": " Conceptually, we typically think of embeddings and vectors", "tokens": + [50596, 47482, 671, 11, 321, 5850, 519, 295, 12240, 29432, 293, 18875, 50774], "temperature": + 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": 1.628691983122363, + "no_speech_prob": 9.877497359411791e-05}, {"id": 77, "seek": 56710, "start": 575.3000000000001, + "end": 578.86, "text": " as having a relatively small number of dimensions.", "tokens": + [50774, 382, 1419, 257, 7226, 1359, 1230, 295, 12819, 13, 50952], "temperature": + 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": 1.628691983122363, + "no_speech_prob": 9.877497359411791e-05}, {"id": 78, "seek": 56710, "start": 578.86, + "end": 582.5, "text": " We call these dense vectors, where maybe there''s 768", + "tokens": [50952, 492, 818, 613, 18011, 18875, 11, 689, 1310, 
456, 311, 1614, 27102, + 51134], "temperature": 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": + 1.628691983122363, "no_speech_prob": 9.877497359411791e-05}, {"id": 79, "seek": + 56710, "start": 582.5, "end": 586.1, "text": " or 1024, some number of dimensions.", + "tokens": [51134, 420, 1266, 7911, 11, 512, 1230, 295, 12819, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": 1.628691983122363, + "no_speech_prob": 9.877497359411791e-05}, {"id": 80, "seek": 56710, "start": 586.1, + "end": 589.9, "text": " And we compress lots of data into a continuous space", "tokens": + [51314, 400, 321, 14778, 3195, 295, 1412, 666, 257, 10957, 1901, 51504], "temperature": + 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": 1.628691983122363, + "no_speech_prob": 9.877497359411791e-05}, {"id": 81, "seek": 56710, "start": 589.9, + "end": 590.9, "text": " within those.", "tokens": [51504, 1951, 729, 13, 51554], + "temperature": 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": 1.628691983122363, + "no_speech_prob": 9.877497359411791e-05}, {"id": 82, "seek": 56710, "start": 590.9, + "end": 596.7, "text": " However, there''s also the notion of sparse vectors.", "tokens": + [51554, 2908, 11, 456, 311, 611, 264, 10710, 295, 637, 11668, 18875, 13, 51844], + "temperature": 0.0, "avg_logprob": -0.16214718204913753, "compression_ratio": 1.628691983122363, + "no_speech_prob": 9.877497359411791e-05}, {"id": 83, "seek": 59670, "start": 596.7, + "end": 598.5400000000001, "text": " And the best way to think of a sparse vector", + "tokens": [50364, 400, 264, 1151, 636, 281, 519, 295, 257, 637, 11668, 8062, 50456], + "temperature": 0.0, "avg_logprob": -0.1382564211648608, "compression_ratio": 1.7547892720306513, + "no_speech_prob": 4.4013890146743506e-05}, {"id": 84, "seek": 59670, "start": 598.5400000000001, + "end": 600.7, "text": " for purposes of our discussion today", "tokens": [50456, + 337, 9932, 295, 527, 
5017, 965, 50564], "temperature": 0.0, "avg_logprob": -0.1382564211648608, + "compression_ratio": 1.7547892720306513, "no_speech_prob": 4.4013890146743506e-05}, + {"id": 85, "seek": 59670, "start": 600.7, "end": 604.46, "text": " is to think of + lexical search and to think of just,", "tokens": [50564, 307, 281, 519, 295, 476, + 87, 804, 3164, 293, 281, 519, 295, 445, 11, 50752], "temperature": 0.0, "avg_logprob": + -0.1382564211648608, "compression_ratio": 1.7547892720306513, "no_speech_prob": + 4.4013890146743506e-05}, {"id": 86, "seek": 59670, "start": 604.46, "end": 606.82, + "text": " when I''m trying to run a search for keywords.", "tokens": [50752, 562, + 286, 478, 1382, 281, 1190, 257, 3164, 337, 21009, 13, 50870], "temperature": 0.0, + "avg_logprob": -0.1382564211648608, "compression_ratio": 1.7547892720306513, "no_speech_prob": + 4.4013890146743506e-05}, {"id": 87, "seek": 59670, "start": 606.82, "end": 613.1, + "text": " So imagine you have a 1 million dimensional vector, not 768,", "tokens": + [50870, 407, 3811, 291, 362, 257, 502, 2459, 18795, 8062, 11, 406, 1614, 27102, + 11, 51184], "temperature": 0.0, "avg_logprob": -0.1382564211648608, "compression_ratio": + 1.7547892720306513, "no_speech_prob": 4.4013890146743506e-05}, {"id": 88, "seek": + 59670, "start": 613.1, "end": 614.74, "text": " but a million dimensions.", "tokens": + [51184, 457, 257, 2459, 12819, 13, 51266], "temperature": 0.0, "avg_logprob": -0.1382564211648608, + "compression_ratio": 1.7547892720306513, "no_speech_prob": 4.4013890146743506e-05}, + {"id": 89, "seek": 59670, "start": 614.74, "end": 617.1400000000001, "text": " And + every single one of those dimensions", "tokens": [51266, 400, 633, 2167, 472, 295, + 729, 12819, 51386], "temperature": 0.0, "avg_logprob": -0.1382564211648608, "compression_ratio": + 1.7547892720306513, "no_speech_prob": 4.4013890146743506e-05}, {"id": 90, "seek": + 59670, "start": 617.1400000000001, "end": 620.74, "text": " corresponds to a term + in your 
index,", "tokens": [51386, 23249, 281, 257, 1433, 294, 428, 8186, 11, 51566], + "temperature": 0.0, "avg_logprob": -0.1382564211648608, "compression_ratio": 1.7547892720306513, + "no_speech_prob": 4.4013890146743506e-05}, {"id": 91, "seek": 59670, "start": 620.74, + "end": 622.38, "text": " where you''ve indexed all of your keywords.", "tokens": + [51566, 689, 291, 600, 8186, 292, 439, 295, 428, 21009, 13, 51648], "temperature": + 0.0, "avg_logprob": -0.1382564211648608, "compression_ratio": 1.7547892720306513, + "no_speech_prob": 4.4013890146743506e-05}, {"id": 92, "seek": 59670, "start": 622.38, + "end": 624.26, "text": " And let''s just assume that there''s only a million terms", + "tokens": [51648, 400, 718, 311, 445, 6552, 300, 456, 311, 787, 257, 2459, 2115, + 51742], "temperature": 0.0, "avg_logprob": -0.1382564211648608, "compression_ratio": + 1.7547892720306513, "no_speech_prob": 4.4013890146743506e-05}, {"id": 93, "seek": + 59670, "start": 624.26, "end": 626.58, "text": " in your index.", "tokens": [51742, + 294, 428, 8186, 13, 51858], "temperature": 0.0, "avg_logprob": -0.1382564211648608, + "compression_ratio": 1.7547892720306513, "no_speech_prob": 4.4013890146743506e-05}, + {"id": 94, "seek": 62658, "start": 626.58, "end": 630.9000000000001, "text": " If + I wanted to represent latte as a query,", "tokens": [50364, 759, 286, 1415, 281, + 2906, 37854, 382, 257, 14581, 11, 50580], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 95, "seek": 62658, "start": 630.9000000000001, "end": 631.82, "text": " well, + let me not do latte.", "tokens": [50580, 731, 11, 718, 385, 406, 360, 37854, 13, + 50626], "temperature": 0.0, "avg_logprob": -0.17519365824185884, "compression_ratio": + 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, {"id": 96, "seek": + 62658, "start": 631.82, "end": 634.5, "text": " Let me do a donut.", "tokens": [50626, + 961, 385, 
360, 257, 33782, 13, 50760], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 97, "seek": 62658, "start": 634.5, "end": 636.7, "text": " If I want to represent + donut as a query,", "tokens": [50760, 759, 286, 528, 281, 2906, 33782, 382, 257, + 14581, 11, 50870], "temperature": 0.0, "avg_logprob": -0.17519365824185884, "compression_ratio": + 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, {"id": 98, "seek": + 62658, "start": 636.7, "end": 640.7, "text": " then I can represent that as a vector + with a million zeros", "tokens": [50870, 550, 286, 393, 2906, 300, 382, 257, 8062, + 365, 257, 2459, 35193, 51070], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 99, "seek": 62658, "start": 640.7, "end": 641.86, "text": " minus 1.", "tokens": + [51070, 3175, 502, 13, 51128], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 100, "seek": 62658, "start": 641.86, "end": 645.14, "text": " And there''s + a 1 in the column for donut,", "tokens": [51128, 400, 456, 311, 257, 502, 294, 264, + 7738, 337, 33782, 11, 51292], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 101, "seek": 62658, "start": 645.14, "end": 648.0200000000001, "text": " + indicating that this is a million dimensional vector", "tokens": [51292, 25604, + 300, 341, 307, 257, 2459, 18795, 8062, 51436], "temperature": 0.0, "avg_logprob": + -0.17519365824185884, "compression_ratio": 1.7941176470588236, "no_speech_prob": + 0.0010118169011548162}, {"id": 102, "seek": 62658, "start": 648.0200000000001, "end": + 650.9000000000001, "text": " with only one value represented.", "tokens": [51436, + 365, 787, 
472, 2158, 10379, 13, 51580], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 103, "seek": 62658, "start": 650.9000000000001, "end": 655.26, "text": " + And that''s whether the text donut appears", "tokens": [51580, 400, 300, 311, 1968, + 264, 2487, 33782, 7038, 51798], "temperature": 0.0, "avg_logprob": -0.17519365824185884, + "compression_ratio": 1.7941176470588236, "no_speech_prob": 0.0010118169011548162}, + {"id": 104, "seek": 65526, "start": 655.26, "end": 659.38, "text": " within this + document or query.", "tokens": [50364, 1951, 341, 4166, 420, 14581, 13, 50570], + "temperature": 0.0, "avg_logprob": -0.1314561406119925, "compression_ratio": 1.7829787234042553, + "no_speech_prob": 0.0004816131549887359}, {"id": 105, "seek": 65526, "start": 659.38, + "end": 663.8199999999999, "text": " And so that''s a sparse vector, where it''s + sparse", "tokens": [50570, 400, 370, 300, 311, 257, 637, 11668, 8062, 11, 689, 309, + 311, 637, 11668, 50792], "temperature": 0.0, "avg_logprob": -0.1314561406119925, + "compression_ratio": 1.7829787234042553, "no_speech_prob": 0.0004816131549887359}, + {"id": 106, "seek": 65526, "start": 663.8199999999999, "end": 665.8199999999999, + "text": " because most of the data is not filled.", "tokens": [50792, 570, 881, + 295, 264, 1412, 307, 406, 6412, 13, 50892], "temperature": 0.0, "avg_logprob": -0.1314561406119925, + "compression_ratio": 1.7829787234042553, "no_speech_prob": 0.0004816131549887359}, + {"id": 107, "seek": 65526, "start": 665.8199999999999, "end": 669.1, "text": " And + I have mostly zeros, but there''s some ones.", "tokens": [50892, 400, 286, 362, + 5240, 35193, 11, 457, 456, 311, 512, 2306, 13, 51056], "temperature": 0.0, "avg_logprob": + -0.1314561406119925, "compression_ratio": 1.7829787234042553, "no_speech_prob": + 0.0004816131549887359}, {"id": 108, "seek": 65526, "start": 669.1, "end": 672.5, + "text": " And of 
course, in this case, if I had the search for cheese pizza,", "tokens": + [51056, 400, 295, 1164, 11, 294, 341, 1389, 11, 498, 286, 632, 264, 3164, 337, 5399, + 8298, 11, 51226], "temperature": 0.0, "avg_logprob": -0.1314561406119925, "compression_ratio": + 1.7829787234042553, "no_speech_prob": 0.0004816131549887359}, {"id": 109, "seek": + 65526, "start": 672.5, "end": 675.18, "text": " that vector would have two ones + in it, one for cheese", "tokens": [51226, 300, 8062, 576, 362, 732, 2306, 294, 309, + 11, 472, 337, 5399, 51360], "temperature": 0.0, "avg_logprob": -0.1314561406119925, + "compression_ratio": 1.7829787234042553, "no_speech_prob": 0.0004816131549887359}, + {"id": 110, "seek": 65526, "start": 675.18, "end": 676.42, "text": " and one for + pizza.", "tokens": [51360, 293, 472, 337, 8298, 13, 51422], "temperature": 0.0, + "avg_logprob": -0.1314561406119925, "compression_ratio": 1.7829787234042553, "no_speech_prob": + 0.0004816131549887359}, {"id": 111, "seek": 65526, "start": 676.42, "end": 679.8199999999999, + "text": " So it''s a million dimensional vector with two ones in it.", "tokens": + [51422, 407, 309, 311, 257, 2459, 18795, 8062, 365, 732, 2306, 294, 309, 13, 51592], + "temperature": 0.0, "avg_logprob": -0.1314561406119925, "compression_ratio": 1.7829787234042553, + "no_speech_prob": 0.0004816131549887359}, {"id": 112, "seek": 65526, "start": 679.8199999999999, + "end": 683.8199999999999, "text": " This is just as valid as a vector, as a dense + vector,", "tokens": [51592, 639, 307, 445, 382, 7363, 382, 257, 8062, 11, 382, 257, + 18011, 8062, 11, 51792], "temperature": 0.0, "avg_logprob": -0.1314561406119925, + "compression_ratio": 1.7829787234042553, "no_speech_prob": 0.0004816131549887359}, + {"id": 113, "seek": 68382, "start": 683.82, "end": 686.22, "text": " with only 738 + dimensions.", "tokens": [50364, 365, 787, 1614, 12625, 12819, 13, 50484], "temperature": + 0.0, "avg_logprob": -0.16312205684077632, "compression_ratio": 
1.7066115702479339, + "no_speech_prob": 0.0014139787526801229}, {"id": 114, "seek": 68382, "start": 686.22, + "end": 689.3000000000001, "text": " But what we typically do, when we start", "tokens": + [50484, 583, 437, 321, 5850, 360, 11, 562, 321, 722, 50638], "temperature": 0.0, + "avg_logprob": -0.16312205684077632, "compression_ratio": 1.7066115702479339, "no_speech_prob": + 0.0014139787526801229}, {"id": 115, "seek": 68382, "start": 689.3000000000001, "end": + 693.22, "text": " to move from lexical matching, where we can match on those", "tokens": + [50638, 281, 1286, 490, 476, 87, 804, 14324, 11, 689, 321, 393, 2995, 322, 729, + 50834], "temperature": 0.0, "avg_logprob": -0.16312205684077632, "compression_ratio": + 1.7066115702479339, "no_speech_prob": 0.0014139787526801229}, {"id": 116, "seek": + 68382, "start": 693.22, "end": 696.46, "text": " yes or no ones or zeros in an inverted + index,", "tokens": [50834, 2086, 420, 572, 2306, 420, 35193, 294, 364, 38969, 8186, + 11, 50996], "temperature": 0.0, "avg_logprob": -0.16312205684077632, "compression_ratio": + 1.7066115702479339, "no_speech_prob": 0.0014139787526801229}, {"id": 117, "seek": + 68382, "start": 696.46, "end": 699.7, "text": " what we typically do when we move + to doing semantic search", "tokens": [50996, 437, 321, 5850, 360, 562, 321, 1286, + 281, 884, 47982, 3164, 51158], "temperature": 0.0, "avg_logprob": -0.16312205684077632, + "compression_ratio": 1.7066115702479339, "no_speech_prob": 0.0014139787526801229}, + {"id": 118, "seek": 68382, "start": 699.7, "end": 702.1400000000001, "text": " is + we focus on a much smaller number of dimensions.", "tokens": [51158, 307, 321, 1879, + 322, 257, 709, 4356, 1230, 295, 12819, 13, 51280], "temperature": 0.0, "avg_logprob": + -0.16312205684077632, "compression_ratio": 1.7066115702479339, "no_speech_prob": + 0.0014139787526801229}, {"id": 119, "seek": 68382, "start": 702.1400000000001, "end": + 705.1400000000001, "text": " And so conceptually, as an 
embedding here,", "tokens": + [51280, 400, 370, 3410, 671, 11, 382, 364, 12240, 3584, 510, 11, 51430], "temperature": + 0.0, "avg_logprob": -0.16312205684077632, "compression_ratio": 1.7066115702479339, + "no_speech_prob": 0.0014139787526801229}, {"id": 120, "seek": 68382, "start": 705.1400000000001, + "end": 708.2600000000001, "text": " what I have is eight dimensions.", "tokens": + [51430, 437, 286, 362, 307, 3180, 12819, 13, 51586], "temperature": 0.0, "avg_logprob": + -0.16312205684077632, "compression_ratio": 1.7066115702479339, "no_speech_prob": + 0.0014139787526801229}, {"id": 121, "seek": 68382, "start": 708.2600000000001, "end": + 710.9000000000001, "text": " Each of these items that I showed on the previous slide", + "tokens": [51586, 6947, 295, 613, 4754, 300, 286, 4712, 322, 264, 3894, 4137, 51718], + "temperature": 0.0, "avg_logprob": -0.16312205684077632, "compression_ratio": 1.7066115702479339, + "no_speech_prob": 0.0014139787526801229}, {"id": 122, "seek": 71090, "start": 710.9399999999999, + "end": 714.62, "text": " has dimensions indicating whether it''s food,", "tokens": + [50366, 575, 12819, 25604, 1968, 309, 311, 1755, 11, 50550], "temperature": 0.0, + "avg_logprob": -0.12700746259616533, "compression_ratio": 2.0088105726872247, "no_speech_prob": + 0.01598554477095604}, {"id": 123, "seek": 71090, "start": 714.62, "end": 718.66, + "text": " whether it''s a drink, how much dairy it has, is it bread,", "tokens": + [50550, 1968, 309, 311, 257, 2822, 11, 577, 709, 21276, 309, 575, 11, 307, 309, + 5961, 11, 50752], "temperature": 0.0, "avg_logprob": -0.12700746259616533, "compression_ratio": + 2.0088105726872247, "no_speech_prob": 0.01598554477095604}, {"id": 124, "seek": + 71090, "start": 718.66, "end": 721.14, "text": " is it caffeine, sweet, calories, + healthy, et cetera.", "tokens": [50752, 307, 309, 31261, 11, 3844, 11, 14904, 11, + 4627, 11, 1030, 11458, 13, 50876], "temperature": 0.0, "avg_logprob": -0.12700746259616533, + 
"compression_ratio": 2.0088105726872247, "no_speech_prob": 0.01598554477095604}, + {"id": 125, "seek": 71090, "start": 721.14, "end": 724.4599999999999, "text": " + So you can see apple juice now is not represented as,", "tokens": [50876, 407, 291, + 393, 536, 10606, 8544, 586, 307, 406, 10379, 382, 11, 51042], "temperature": 0.0, + "avg_logprob": -0.12700746259616533, "compression_ratio": 2.0088105726872247, "no_speech_prob": + 0.01598554477095604}, {"id": 126, "seek": 71090, "start": 724.4599999999999, "end": + 726.42, "text": " it has the word apple and it has the word juice,", "tokens": [51042, + 309, 575, 264, 1349, 10606, 293, 309, 575, 264, 1349, 8544, 11, 51140], "temperature": + 0.0, "avg_logprob": -0.12700746259616533, "compression_ratio": 2.0088105726872247, + "no_speech_prob": 0.01598554477095604}, {"id": 127, "seek": 71090, "start": 726.42, + "end": 729.98, "text": " but it''s represented as very much not food,", "tokens": + [51140, 457, 309, 311, 10379, 382, 588, 709, 406, 1755, 11, 51318], "temperature": + 0.0, "avg_logprob": -0.12700746259616533, "compression_ratio": 2.0088105726872247, + "no_speech_prob": 0.01598554477095604}, {"id": 128, "seek": 71090, "start": 729.98, + "end": 733.9399999999999, "text": " very much a drink, no dairy, no bread, no caffeine,", + "tokens": [51318, 588, 709, 257, 2822, 11, 572, 21276, 11, 572, 5961, 11, 572, 31261, + 11, 51516], "temperature": 0.0, "avg_logprob": -0.12700746259616533, "compression_ratio": + 2.0088105726872247, "no_speech_prob": 0.01598554477095604}, {"id": 129, "seek": + 71090, "start": 733.9399999999999, "end": 736.22, "text": " very high on sweet, + but not all the way up,", "tokens": [51516, 588, 1090, 322, 3844, 11, 457, 406, + 439, 264, 636, 493, 11, 51630], "temperature": 0.0, "avg_logprob": -0.12700746259616533, + "compression_ratio": 2.0088105726872247, "no_speech_prob": 0.01598554477095604}, + {"id": 130, "seek": 71090, "start": 736.22, "end": 739.38, "text": " very high on + calories, but 
all the way up and in between,", "tokens": [51630, 588, 1090, 322, + 14904, 11, 457, 439, 264, 636, 493, 293, 294, 1296, 11, 51788], "temperature": 0.0, + "avg_logprob": -0.12700746259616533, "compression_ratio": 2.0088105726872247, "no_speech_prob": + 0.01598554477095604}, {"id": 131, "seek": 73938, "start": 739.38, "end": 743.22, + "text": " but kind of sort of healthy, but not really.", "tokens": [50364, 457, + 733, 295, 1333, 295, 4627, 11, 457, 406, 534, 13, 50556], "temperature": 0.0, "avg_logprob": + -0.1634188982156607, "compression_ratio": 1.6804979253112033, "no_speech_prob": + 0.00027289101853966713}, {"id": 132, "seek": 73938, "start": 743.22, "end": 745.46, + "text": " And then same thing, cheese bread sticks,", "tokens": [50556, 400, 550, + 912, 551, 11, 5399, 5961, 12518, 11, 50668], "temperature": 0.0, "avg_logprob": + -0.1634188982156607, "compression_ratio": 1.6804979253112033, "no_speech_prob": + 0.00027289101853966713}, {"id": 133, "seek": 73938, "start": 745.46, "end": 748.66, + "text": " very much food, not a drink, good bit of dairy,", "tokens": [50668, 588, + 709, 1755, 11, 406, 257, 2822, 11, 665, 857, 295, 21276, 11, 50828], "temperature": + 0.0, "avg_logprob": -0.1634188982156607, "compression_ratio": 1.6804979253112033, + "no_speech_prob": 0.00027289101853966713}, {"id": 134, "seek": 73938, "start": 748.66, + "end": 751.34, "text": " very much bread, no caffeine, you get the idea.", "tokens": + [50828, 588, 709, 5961, 11, 572, 31261, 11, 291, 483, 264, 1558, 13, 50962], "temperature": + 0.0, "avg_logprob": -0.1634188982156607, "compression_ratio": 1.6804979253112033, + "no_speech_prob": 0.00027289101853966713}, {"id": 135, "seek": 73938, "start": 751.34, + "end": 755.42, "text": " These map in the attributes are the dimensions", "tokens": + [50962, 1981, 4471, 294, 264, 17212, 366, 264, 12819, 51166], "temperature": 0.0, + "avg_logprob": -0.1634188982156607, "compression_ratio": 1.6804979253112033, "no_speech_prob": + 
0.00027289101853966713}, {"id": 136, "seek": 73938, "start": 755.42, "end": 758.34, + "text": " of these concepts over here by representing them", "tokens": [51166, 295, + 613, 10392, 670, 510, 538, 13460, 552, 51312], "temperature": 0.0, "avg_logprob": + -0.1634188982156607, "compression_ratio": 1.6804979253112033, "no_speech_prob": + 0.00027289101853966713}, {"id": 137, "seek": 73938, "start": 758.34, "end": 759.82, + "text": " in these eight dimensions.", "tokens": [51312, 294, 613, 3180, 12819, + 13, 51386], "temperature": 0.0, "avg_logprob": -0.1634188982156607, "compression_ratio": + 1.6804979253112033, "no_speech_prob": 0.00027289101853966713}, {"id": 138, "seek": + 73938, "start": 759.82, "end": 764.46, "text": " And in search, what we typically + do is we represent documents", "tokens": [51386, 400, 294, 3164, 11, 437, 321, 5850, + 360, 307, 321, 2906, 8512, 51618], "temperature": 0.0, "avg_logprob": -0.1634188982156607, + "compression_ratio": 1.6804979253112033, "no_speech_prob": 0.00027289101853966713}, + {"id": 139, "seek": 73938, "start": 764.46, "end": 767.58, "text": " and queries + leveraging these vectors,", "tokens": [51618, 293, 24109, 32666, 613, 18875, 11, + 51774], "temperature": 0.0, "avg_logprob": -0.1634188982156607, "compression_ratio": + 1.6804979253112033, "no_speech_prob": 0.00027289101853966713}, {"id": 140, "seek": + 76758, "start": 767.58, "end": 771.46, "text": " and then we do some sort of vector + similarity calculation", "tokens": [50364, 293, 550, 321, 360, 512, 1333, 295, 8062, + 32194, 17108, 50558], "temperature": 0.0, "avg_logprob": -0.11309589717699134, "compression_ratio": + 1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, {"id": 141, "seek": + 76758, "start": 771.46, "end": 774.5, "text": " in order to say how later similar + things are.", "tokens": [50558, 294, 1668, 281, 584, 577, 1780, 2531, 721, 366, + 13, 50710], "temperature": 0.0, "avg_logprob": -0.11309589717699134, "compression_ratio": + 
1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, {"id": 142, "seek": + 76758, "start": 774.5, "end": 777.14, "text": " So if I were to, for example, take + the vector", "tokens": [50710, 407, 498, 286, 645, 281, 11, 337, 1365, 11, 747, + 264, 8062, 50842], "temperature": 0.0, "avg_logprob": -0.11309589717699134, "compression_ratio": + 1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, {"id": 143, "seek": + 76758, "start": 777.14, "end": 781.1800000000001, "text": " over here for cheese + pizza, and I were to then do a cosine", "tokens": [50842, 670, 510, 337, 5399, 8298, + 11, 293, 286, 645, 281, 550, 360, 257, 23565, 51044], "temperature": 0.0, "avg_logprob": + -0.11309589717699134, "compression_ratio": 1.8464566929133859, "no_speech_prob": + 0.00037480180617421865}, {"id": 144, "seek": 76758, "start": 781.1800000000001, + "end": 784.4200000000001, "text": " similarity between that vector and every other + vector,", "tokens": [51044, 32194, 1296, 300, 8062, 293, 633, 661, 8062, 11, 51206], + "temperature": 0.0, "avg_logprob": -0.11309589717699134, "compression_ratio": 1.8464566929133859, + "no_speech_prob": 0.00037480180617421865}, {"id": 145, "seek": 76758, "start": 784.4200000000001, + "end": 788.0600000000001, "text": " I would see that cheese bread sticks have a + very high", "tokens": [51206, 286, 576, 536, 300, 5399, 5961, 12518, 362, 257, 588, + 1090, 51388], "temperature": 0.0, "avg_logprob": -0.11309589717699134, "compression_ratio": + 1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, {"id": 146, "seek": + 76758, "start": 788.0600000000001, "end": 790.82, "text": " similarity followed + by cinnamon bread sticks,", "tokens": [51388, 32194, 6263, 538, 22969, 5961, 12518, + 11, 51526], "temperature": 0.0, "avg_logprob": -0.11309589717699134, "compression_ratio": + 1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, {"id": 147, "seek": + 76758, "start": 790.82, "end": 792.86, "text": " followed by doughnut, 
all the way + down to water.", "tokens": [51526, 6263, 538, 7984, 18316, 11, 439, 264, 636, 760, + 281, 1281, 13, 51628], "temperature": 0.0, "avg_logprob": -0.11309589717699134, + "compression_ratio": 1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, + {"id": 148, "seek": 76758, "start": 792.86, "end": 796.1, "text": " So these are + essentially ranked based upon cheese pizza,", "tokens": [51628, 407, 613, 366, 4476, + 20197, 2361, 3564, 5399, 8298, 11, 51790], "temperature": 0.0, "avg_logprob": -0.11309589717699134, + "compression_ratio": 1.8464566929133859, "no_speech_prob": 0.00037480180617421865}, + {"id": 149, "seek": 79610, "start": 796.1, "end": 797.9, "text": " these are the + cheesiest, breadiest,", "tokens": [50364, 613, 366, 264, 947, 279, 6495, 11, 5961, + 6495, 11, 50454], "temperature": 0.0, "avg_logprob": -0.177269571976696, "compression_ratio": + 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, {"id": 150, "seek": + 79610, "start": 797.9, "end": 801.38, "text": " unhealthiest, non-drinkiest things + at the top.", "tokens": [50454, 517, 19225, 6495, 11, 2107, 12, 16753, 475, 6495, + 721, 412, 264, 1192, 13, 50628], "temperature": 0.0, "avg_logprob": -0.177269571976696, + "compression_ratio": 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, + {"id": 151, "seek": 79610, "start": 801.38, "end": 805.7, "text": " This is still + very, very non-drinky,", "tokens": [50628, 639, 307, 920, 588, 11, 588, 2107, 12, + 16753, 22998, 11, 50844], "temperature": 0.0, "avg_logprob": -0.177269571976696, + "compression_ratio": 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, + {"id": 152, "seek": 79610, "start": 805.7, "end": 808.22, "text": " not very healthy + here, ranked all the way down to it.", "tokens": [50844, 406, 588, 4627, 510, 11, + 20197, 439, 264, 636, 760, 281, 309, 13, 50970], "temperature": 0.0, "avg_logprob": + -0.177269571976696, "compression_ratio": 1.8473895582329318, "no_speech_prob": 
0.00041461861110292375}, + {"id": 153, "seek": 79610, "start": 808.22, "end": 810.38, "text": " It''s essentially + opposite in this vector space,", "tokens": [50970, 467, 311, 4476, 6182, 294, 341, + 8062, 1901, 11, 51078], "temperature": 0.0, "avg_logprob": -0.177269571976696, "compression_ratio": + 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, {"id": 154, "seek": + 79610, "start": 810.38, "end": 811.74, "text": " which is water, which is all the + way", "tokens": [51078, 597, 307, 1281, 11, 597, 307, 439, 264, 636, 51146], "temperature": + 0.0, "avg_logprob": -0.177269571976696, "compression_ratio": 1.8473895582329318, + "no_speech_prob": 0.00041461861110292375}, {"id": 155, "seek": 79610, "start": 811.74, + "end": 813.1800000000001, "text": " on the other end of the spectrum.", "tokens": + [51146, 322, 264, 661, 917, 295, 264, 11143, 13, 51218], "temperature": 0.0, "avg_logprob": + -0.177269571976696, "compression_ratio": 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, + {"id": 156, "seek": 79610, "start": 813.1800000000001, "end": 815.5400000000001, + "text": " Same thing with green tea, very similar to water,", "tokens": [51218, + 10635, 551, 365, 3092, 5817, 11, 588, 2531, 281, 1281, 11, 51336], "temperature": + 0.0, "avg_logprob": -0.177269571976696, "compression_ratio": 1.8473895582329318, + "no_speech_prob": 0.00041461861110292375}, {"id": 157, "seek": 79610, "start": 815.5400000000001, + "end": 820.86, "text": " cappuccino latte, healthy, no calories drink,", "tokens": + [51336, 1335, 427, 39407, 2982, 37854, 11, 4627, 11, 572, 14904, 2822, 11, 51602], + "temperature": 0.0, "avg_logprob": -0.177269571976696, "compression_ratio": 1.8473895582329318, + "no_speech_prob": 0.00041461861110292375}, {"id": 158, "seek": 79610, "start": 820.86, + "end": 824.34, "text": " all the way down to a very unhealthy, very not drink.", + "tokens": [51602, 439, 264, 636, 760, 281, 257, 588, 29147, 11, 588, 406, 2822, + 13, 51776], 
"temperature": 0.0, "avg_logprob": -0.177269571976696, "compression_ratio": + 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, {"id": 159, "seek": + 79610, "start": 824.34, "end": 825.14, "text": " You get the idea.", "tokens": [51776, + 509, 483, 264, 1558, 13, 51816], "temperature": 0.0, "avg_logprob": -0.177269571976696, + "compression_ratio": 1.8473895582329318, "no_speech_prob": 0.00041461861110292375}, + {"id": 160, "seek": 82514, "start": 825.14, "end": 829.02, "text": " So that''s + essentially in a semantic vector space,", "tokens": [50364, 407, 300, 311, 4476, + 294, 257, 47982, 8062, 1901, 11, 50558], "temperature": 0.0, "avg_logprob": -0.14254786825587606, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.00014917526277713478}, + {"id": 161, "seek": 82514, "start": 829.02, "end": 831.62, "text": " things span + across these dimensions,", "tokens": [50558, 721, 16174, 2108, 613, 12819, 11, 50688], + "temperature": 0.0, "avg_logprob": -0.14254786825587606, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.00014917526277713478}, {"id": 162, "seek": 82514, "start": 831.62, + "end": 834.62, "text": " and they fit at different places along,", "tokens": [50688, + 293, 436, 3318, 412, 819, 3190, 2051, 11, 50838], "temperature": 0.0, "avg_logprob": + -0.14254786825587606, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.00014917526277713478}, {"id": 163, "seek": 82514, "start": 834.62, "end": 836.54, + "text": " within the vector space, that corresponds", "tokens": [50838, 1951, 264, + 8062, 1901, 11, 300, 23249, 50934], "temperature": 0.0, "avg_logprob": -0.14254786825587606, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.00014917526277713478}, + {"id": 164, "seek": 82514, "start": 836.54, "end": 839.26, "text": " to the meaning + of these attributes.", "tokens": [50934, 281, 264, 3620, 295, 613, 17212, 13, 51070], + "temperature": 0.0, "avg_logprob": -0.14254786825587606, 
"compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.00014917526277713478}, {"id": 165, "seek": 82514, "start": 839.26, + "end": 841.74, "text": " Now, when we deal with transformers,", "tokens": [51070, + 823, 11, 562, 321, 2028, 365, 4088, 433, 11, 51194], "temperature": 0.0, "avg_logprob": + -0.14254786825587606, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.00014917526277713478}, {"id": 166, "seek": 82514, "start": 841.74, "end": 843.54, + "text": " which we get from all the LLMs today", "tokens": [51194, 597, 321, 483, + 490, 439, 264, 441, 43, 26386, 965, 51284], "temperature": 0.0, "avg_logprob": -0.14254786825587606, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.00014917526277713478}, + {"id": 167, "seek": 82514, "start": 843.54, "end": 845.98, "text": " that we''re + leveraging for vector search,", "tokens": [51284, 300, 321, 434, 32666, 337, 8062, + 3164, 11, 51406], "temperature": 0.0, "avg_logprob": -0.14254786825587606, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.00014917526277713478}, {"id": 168, "seek": + 82514, "start": 845.98, "end": 848.62, "text": " these don''t use explicit features,", + "tokens": [51406, 613, 500, 380, 764, 13691, 4122, 11, 51538], "temperature": 0.0, + "avg_logprob": -0.14254786825587606, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.00014917526277713478}, {"id": 169, "seek": 82514, "start": 848.62, "end": 851.74, + "text": " like we have here, food, drink, dairy, bread, et cetera.", "tokens": [51538, + 411, 321, 362, 510, 11, 1755, 11, 2822, 11, 21276, 11, 5961, 11, 1030, 11458, 13, + 51694], "temperature": 0.0, "avg_logprob": -0.14254786825587606, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.00014917526277713478}, {"id": 170, "seek": + 82514, "start": 851.74, "end": 853.22, "text": " They use latent features.", "tokens": + [51694, 814, 764, 48994, 4122, 13, 51768], "temperature": 0.0, "avg_logprob": -0.14254786825587606, + 
"compression_ratio": 1.6553030303030303, "no_speech_prob": 0.00014917526277713478}, + {"id": 171, "seek": 85322, "start": 853.22, "end": 855.46, "text": " And latent + just means sort of hidden,", "tokens": [50364, 400, 48994, 445, 1355, 1333, 295, + 7633, 11, 50476], "temperature": 0.0, "avg_logprob": -0.12584982094941316, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, {"id": 172, "seek": + 85322, "start": 855.46, "end": 857.1, "text": " or another way to put it is,", "tokens": + [50476, 420, 1071, 636, 281, 829, 309, 307, 11, 50558], "temperature": 0.0, "avg_logprob": + -0.12584982094941316, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 4.148954030824825e-05}, {"id": 173, "seek": 85322, "start": 857.1, "end": 859.6600000000001, + "text": " the dimensions don''t correspond one to one", "tokens": [50558, 264, 12819, + 500, 380, 6805, 472, 281, 472, 50686], "temperature": 0.0, "avg_logprob": -0.12584982094941316, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, + {"id": 174, "seek": 85322, "start": 859.6600000000001, "end": 861.1800000000001, + "text": " with particular attributes.", "tokens": [50686, 365, 1729, 17212, 13, + 50762], "temperature": 0.0, "avg_logprob": -0.12584982094941316, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, {"id": 175, "seek": + 85322, "start": 861.1800000000001, "end": 864.26, "text": " It''s combinations of + those dimensions together", "tokens": [50762, 467, 311, 21267, 295, 729, 12819, + 1214, 50916], "temperature": 0.0, "avg_logprob": -0.12584982094941316, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, {"id": 176, "seek": + 85322, "start": 864.26, "end": 865.94, "text": " that they give us our meaning.", + "tokens": [50916, 300, 436, 976, 505, 527, 3620, 13, 51000], "temperature": 0.0, + "avg_logprob": -0.12584982094941316, "compression_ratio": 1.6363636363636365, 
"no_speech_prob": + 4.148954030824825e-05}, {"id": 177, "seek": 85322, "start": 865.94, "end": 870.5, + "text": " And so to think of that visually,", "tokens": [51000, 400, 370, 281, 519, + 295, 300, 19622, 11, 51228], "temperature": 0.0, "avg_logprob": -0.12584982094941316, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, + {"id": 178, "seek": 85322, "start": 870.5, "end": 872.82, "text": " if I were to + create an embedding space,", "tokens": [51228, 498, 286, 645, 281, 1884, 364, 12240, + 3584, 1901, 11, 51344], "temperature": 0.0, "avg_logprob": -0.12584982094941316, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, + {"id": 179, "seek": 85322, "start": 872.82, "end": 874.4200000000001, "text": " + and this is obviously flattened,", "tokens": [51344, 293, 341, 307, 2745, 24183, + 292, 11, 51424], "temperature": 0.0, "avg_logprob": -0.12584982094941316, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, {"id": 180, "seek": + 85322, "start": 874.4200000000001, "end": 876.62, "text": " there could be thousands + and thousands of dimensions", "tokens": [51424, 456, 727, 312, 5383, 293, 5383, + 295, 12819, 51534], "temperature": 0.0, "avg_logprob": -0.12584982094941316, "compression_ratio": + 1.6363636363636365, "no_speech_prob": 4.148954030824825e-05}, {"id": 181, "seek": + 85322, "start": 876.62, "end": 880.02, "text": " or hundreds, but in this vector + space,", "tokens": [51534, 420, 6779, 11, 457, 294, 341, 8062, 1901, 11, 51704], + "temperature": 0.0, "avg_logprob": -0.12584982094941316, "compression_ratio": 1.6363636363636365, + "no_speech_prob": 4.148954030824825e-05}, {"id": 182, "seek": 88002, "start": 880.02, + "end": 883.5799999999999, "text": " if these are all of the embeddings that I have,", + "tokens": [50364, 498, 613, 366, 439, 295, 264, 12240, 29432, 300, 286, 362, 11, + 50542], "temperature": 0.0, "avg_logprob": -0.11100775400797526, 
"compression_ratio": + 1.772563176895307, "no_speech_prob": 0.000533765705768019}, {"id": 183, "seek": + 88002, "start": 883.5799999999999, "end": 887.78, "text": " and I would assert for + the phrase Darth Vader,", "tokens": [50542, 293, 286, 576, 19810, 337, 264, 9535, + 40696, 36337, 11, 50752], "temperature": 0.0, "avg_logprob": -0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 184, "seek": 88002, "start": 887.78, "end": 889.66, "text": " turn that into + an embedding and match it,", "tokens": [50752, 1261, 300, 666, 364, 12240, 3584, + 293, 2995, 309, 11, 50846], "temperature": 0.0, "avg_logprob": -0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 185, "seek": 88002, "start": 889.66, "end": 891.06, "text": " you''ll see + that over here on the right,", "tokens": [50846, 291, 603, 536, 300, 670, 510, 322, + 264, 558, 11, 50916], "temperature": 0.0, "avg_logprob": -0.11100775400797526, "compression_ratio": + 1.772563176895307, "no_speech_prob": 0.000533765705768019}, {"id": 186, "seek": + 88002, "start": 891.06, "end": 893.5, "text": " I have a cluster of meaning associated", + "tokens": [50916, 286, 362, 257, 13630, 295, 3620, 6615, 51038], "temperature": + 0.0, "avg_logprob": -0.11100775400797526, "compression_ratio": 1.772563176895307, + "no_speech_prob": 0.000533765705768019}, {"id": 187, "seek": 88002, "start": 893.5, + "end": 895.06, "text": " with the search for Darth Vader.", "tokens": [51038, 365, + 264, 3164, 337, 40696, 36337, 13, 51116], "temperature": 0.0, "avg_logprob": -0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 188, "seek": 88002, "start": 895.06, "end": 896.98, "text": " Now, there''s + some other points in various places,", "tokens": [51116, 823, 11, 456, 311, 512, + 661, 2793, 294, 3683, 3190, 11, 51212], "temperature": 0.0, "avg_logprob": 
-0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 189, "seek": 88002, "start": 896.98, "end": 900.62, "text": " but if I were + to look at the items in this cluster,", "tokens": [51212, 457, 498, 286, 645, 281, + 574, 412, 264, 4754, 294, 341, 13630, 11, 51394], "temperature": 0.0, "avg_logprob": + -0.11100775400797526, "compression_ratio": 1.772563176895307, "no_speech_prob": + 0.000533765705768019}, {"id": 190, "seek": 88002, "start": 900.62, "end": 902.34, + "text": " I see pictures of Darth Vader,", "tokens": [51394, 286, 536, 5242, 295, + 40696, 36337, 11, 51480], "temperature": 0.0, "avg_logprob": -0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 191, "seek": 88002, "start": 902.34, "end": 903.74, "text": " which is what + I would expect,", "tokens": [51480, 597, 307, 437, 286, 576, 2066, 11, 51550], "temperature": + 0.0, "avg_logprob": -0.11100775400797526, "compression_ratio": 1.772563176895307, + "no_speech_prob": 0.000533765705768019}, {"id": 192, "seek": 88002, "start": 903.74, + "end": 905.62, "text": " because the meaning of Darth Vader", "tokens": [51550, + 570, 264, 3620, 295, 40696, 36337, 51644], "temperature": 0.0, "avg_logprob": -0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 193, "seek": 88002, "start": 905.62, "end": 908.42, "text": " is essentially + in this area of vector space.", "tokens": [51644, 307, 4476, 294, 341, 1859, 295, + 8062, 1901, 13, 51784], "temperature": 0.0, "avg_logprob": -0.11100775400797526, + "compression_ratio": 1.772563176895307, "no_speech_prob": 0.000533765705768019}, + {"id": 194, "seek": 90842, "start": 908.42, "end": 911.9, "text": " Similarly, if + I were to search for Puppy,", "tokens": [50364, 13157, 11, 498, 286, 645, 281, 3164, + 337, 13605, 7966, 11, 50538], "temperature": 0.0, "avg_logprob": -0.21191672908449635, + 
"compression_ratio": 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, + {"id": 195, "seek": 90842, "start": 911.9, "end": 914.3399999999999, "text": " then + this cluster of meaning right here", "tokens": [50538, 550, 341, 13630, 295, 3620, + 558, 510, 50660], "temperature": 0.0, "avg_logprob": -0.21191672908449635, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, {"id": 196, "seek": + 90842, "start": 914.3399999999999, "end": 916.06, "text": " corresponds with puppies + and d,", "tokens": [50660, 23249, 365, 33734, 293, 274, 11, 50746], "temperature": + 0.0, "avg_logprob": -0.21191672908449635, "compression_ratio": 1.5867768595041323, + "no_speech_prob": 0.00018618193280417472}, {"id": 197, "seek": 90842, "start": 916.06, + "end": 918.14, "text": " I see pictures of puppies.", "tokens": [50746, 286, 536, + 5242, 295, 33734, 13, 50850], "temperature": 0.0, "avg_logprob": -0.21191672908449635, + "compression_ratio": 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, + {"id": 198, "seek": 90842, "start": 918.14, "end": 920.9799999999999, "text": " + So the interesting question arises,", "tokens": [50850, 407, 264, 1880, 1168, 27388, + 11, 50992], "temperature": 0.0, "avg_logprob": -0.21191672908449635, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, {"id": 199, "seek": + 90842, "start": 920.9799999999999, "end": 922.66, "text": " when I ask what happens,", + "tokens": [50992, 562, 286, 1029, 437, 2314, 11, 51076], "temperature": 0.0, "avg_logprob": + -0.21191672908449635, "compression_ratio": 1.5867768595041323, "no_speech_prob": + 0.00018618193280417472}, {"id": 200, "seek": 90842, "start": 922.66, "end": 927.54, + "text": " if I were to find the midpoint between Puppy and Darth Vader", "tokens": + [51076, 498, 286, 645, 281, 915, 264, 2062, 6053, 1296, 13605, 7966, 293, 40696, + 36337, 51320], "temperature": 0.0, "avg_logprob": -0.21191672908449635, 
"compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, {"id": 201, "seek": + 90842, "start": 927.54, "end": 929.42, "text": " in this semantic vector space?", + "tokens": [51320, 294, 341, 47982, 8062, 1901, 30, 51414], "temperature": 0.0, "avg_logprob": + -0.21191672908449635, "compression_ratio": 1.5867768595041323, "no_speech_prob": + 0.00018618193280417472}, {"id": 202, "seek": 90842, "start": 933.02, "end": 934.18, + "text": " People have different intuitions", "tokens": [51594, 3432, 362, 819, 16224, + 626, 51652], "temperature": 0.0, "avg_logprob": -0.21191672908449635, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, {"id": 203, "seek": + 90842, "start": 934.18, "end": 935.9, "text": " about what actually happens here.", + "tokens": [51652, 466, 437, 767, 2314, 510, 13, 51738], "temperature": 0.0, "avg_logprob": + -0.21191672908449635, "compression_ratio": 1.5867768595041323, "no_speech_prob": + 0.00018618193280417472}, {"id": 204, "seek": 90842, "start": 935.9, "end": 936.9399999999999, + "text": " Some people think it''s,", "tokens": [51738, 2188, 561, 519, 309, 311, + 11, 51790], "temperature": 0.0, "avg_logprob": -0.21191672908449635, "compression_ratio": + 1.5867768595041323, "no_speech_prob": 0.00018618193280417472}, {"id": 205, "seek": + 93694, "start": 936.94, "end": 939.1, "text": " I don''t know what I would find + in the middle,", "tokens": [50364, 286, 500, 380, 458, 437, 286, 576, 915, 294, + 264, 2808, 11, 50472], "temperature": 0.0, "avg_logprob": -0.12856858352134967, + "compression_ratio": 2.015810276679842, "no_speech_prob": 0.0029177896212786436}, + {"id": 206, "seek": 93694, "start": 939.1, "end": 942.1800000000001, "text": " but + the answer is if this vector space is properly constructed,", "tokens": [50472, + 457, 264, 1867, 307, 498, 341, 8062, 1901, 307, 6108, 17083, 11, 50626], "temperature": + 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": 
2.015810276679842, + "no_speech_prob": 0.0029177896212786436}, {"id": 207, "seek": 93694, "start": 942.1800000000001, + "end": 944.7800000000001, "text": " so that the semantic meaning is represented,", + "tokens": [50626, 370, 300, 264, 47982, 3620, 307, 10379, 11, 50756], "temperature": + 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": 2.015810276679842, + "no_speech_prob": 0.0029177896212786436}, {"id": 208, "seek": 93694, "start": 944.7800000000001, + "end": 947.74, "text": " i.e., the further away I get from this point,", "tokens": + [50756, 741, 13, 68, 7933, 264, 3052, 1314, 286, 483, 490, 341, 935, 11, 50904], + "temperature": 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": 2.015810276679842, + "no_speech_prob": 0.0029177896212786436}, {"id": 209, "seek": 93694, "start": 947.74, + "end": 949.1400000000001, "text": " the more I get away from dog,", "tokens": [50904, + 264, 544, 286, 483, 1314, 490, 3000, 11, 50974], "temperature": 0.0, "avg_logprob": + -0.12856858352134967, "compression_ratio": 2.015810276679842, "no_speech_prob": + 0.0029177896212786436}, {"id": 210, "seek": 93694, "start": 949.1400000000001, "end": + 950.6600000000001, "text": " the further away I get from this,", "tokens": [50974, + 264, 3052, 1314, 286, 483, 490, 341, 11, 51050], "temperature": 0.0, "avg_logprob": + -0.12856858352134967, "compression_ratio": 2.015810276679842, "no_speech_prob": + 0.0029177896212786436}, {"id": 211, "seek": 93694, "start": 950.6600000000001, "end": + 953.62, "text": " the more I get away from Darth Vader and vice versa,", "tokens": + [51050, 264, 544, 286, 483, 1314, 490, 40696, 36337, 293, 11964, 25650, 11, 51198], + "temperature": 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": 2.015810276679842, + "no_speech_prob": 0.0029177896212786436}, {"id": 212, "seek": 93694, "start": 953.62, + "end": 955.58, "text": " then what I would expect to find,", "tokens": [51198, 550, + 437, 286, 576, 2066, 281, 915, 11, 
51296], "temperature": 0.0, "avg_logprob": -0.12856858352134967, + "compression_ratio": 2.015810276679842, "no_speech_prob": 0.0029177896212786436}, + {"id": 213, "seek": 93694, "start": 955.58, "end": 958.22, "text": " if I sort of + average those two,", "tokens": [51296, 498, 286, 1333, 295, 4274, 729, 732, 11, + 51428], "temperature": 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": + 2.015810276679842, "no_speech_prob": 0.0029177896212786436}, {"id": 214, "seek": + 93694, "start": 958.22, "end": 960.74, "text": " a vector from here and a vector + from here together,", "tokens": [51428, 257, 8062, 490, 510, 293, 257, 8062, 490, + 510, 1214, 11, 51554], "temperature": 0.0, "avg_logprob": -0.12856858352134967, + "compression_ratio": 2.015810276679842, "no_speech_prob": 0.0029177896212786436}, + {"id": 215, "seek": 93694, "start": 960.74, "end": 962.94, "text": " is a Puppy + Darth Vader,", "tokens": [51554, 307, 257, 13605, 7966, 40696, 36337, 11, 51664], + "temperature": 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": 2.015810276679842, + "no_speech_prob": 0.0029177896212786436}, {"id": 216, "seek": 93694, "start": 962.94, + "end": 965.7800000000001, "text": " a cute Puppy Darth Vader right here in the middle.", + "tokens": [51664, 257, 4052, 13605, 7966, 40696, 36337, 558, 510, 294, 264, 2808, + 13, 51806], "temperature": 0.0, "avg_logprob": -0.12856858352134967, "compression_ratio": + 2.015810276679842, "no_speech_prob": 0.0029177896212786436}, {"id": 217, "seek": + 96578, "start": 966.74, "end": 968.5, "text": " For some people that makes intuitive + sense,", "tokens": [50412, 1171, 512, 561, 300, 1669, 21769, 2020, 11, 50500], "temperature": + 0.0, "avg_logprob": -0.1405307400611139, "compression_ratio": 1.693069306930693, + "no_speech_prob": 0.00044243282172828913}, {"id": 218, "seek": 96578, "start": 968.5, + "end": 971.86, "text": " but if you think about what a semantic vector space is + doing,", "tokens": [50500, 457, 498, 
291, 519, 466, 437, 257, 47982, 8062, 1901, + 307, 884, 11, 50668], "temperature": 0.0, "avg_logprob": -0.1405307400611139, "compression_ratio": + 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, {"id": 219, "seek": + 96578, "start": 971.86, "end": 975.14, "text": " where it''s representing meaning + across a continuous spectrum,", "tokens": [50668, 689, 309, 311, 13460, 3620, 2108, + 257, 10957, 11143, 11, 50832], "temperature": 0.0, "avg_logprob": -0.1405307400611139, + "compression_ratio": 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, + {"id": 220, "seek": 96578, "start": 975.14, "end": 976.38, "text": " you would expect + to find this,", "tokens": [50832, 291, 576, 2066, 281, 915, 341, 11, 50894], "temperature": + 0.0, "avg_logprob": -0.1405307400611139, "compression_ratio": 1.693069306930693, + "no_speech_prob": 0.00044243282172828913}, {"id": 221, "seek": 96578, "start": 976.38, + "end": 978.62, "text": " because I''m essentially finding what the thing is,", "tokens": + [50894, 570, 286, 478, 4476, 5006, 437, 264, 551, 307, 11, 51006], "temperature": + 0.0, "avg_logprob": -0.1405307400611139, "compression_ratio": 1.693069306930693, + "no_speech_prob": 0.00044243282172828913}, {"id": 222, "seek": 96578, "start": 978.62, + "end": 982.4599999999999, "text": " that is the average sort of in between Darth + Vader and Puppy", "tokens": [51006, 300, 307, 264, 4274, 1333, 295, 294, 1296, 40696, + 36337, 293, 13605, 7966, 51198], "temperature": 0.0, "avg_logprob": -0.1405307400611139, + "compression_ratio": 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, + {"id": 223, "seek": 96578, "start": 982.4599999999999, "end": 984.62, "text": " + within the semantic vector space.", "tokens": [51198, 1951, 264, 47982, 8062, 1901, + 13, 51306], "temperature": 0.0, "avg_logprob": -0.1405307400611139, "compression_ratio": + 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, {"id": 224, "seek": + 96578, "start": 984.62, "end": 
987.1, "text": " Now there''s all sorts of reasons + why this could not work,", "tokens": [51306, 823, 456, 311, 439, 7527, 295, 4112, + 983, 341, 727, 406, 589, 11, 51430], "temperature": 0.0, "avg_logprob": -0.1405307400611139, + "compression_ratio": 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, + {"id": 225, "seek": 96578, "start": 987.1, "end": 989.02, "text": " depending upon + how you''ve changed your model,", "tokens": [51430, 5413, 3564, 577, 291, 600, 3105, + 428, 2316, 11, 51526], "temperature": 0.0, "avg_logprob": -0.1405307400611139, "compression_ratio": + 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, {"id": 226, "seek": + 96578, "start": 989.02, "end": 994.66, "text": " and if there''s too much data being + compressed into little space,", "tokens": [51526, 293, 498, 456, 311, 886, 709, + 1412, 885, 30353, 666, 707, 1901, 11, 51808], "temperature": 0.0, "avg_logprob": + -0.1405307400611139, "compression_ratio": 1.693069306930693, "no_speech_prob": 0.00044243282172828913}, + {"id": 227, "seek": 99466, "start": 994.66, "end": 997.18, "text": " but conceptually + this works.", "tokens": [50364, 457, 3410, 671, 341, 1985, 13, 50490], "temperature": + 0.0, "avg_logprob": -0.1647048232578995, "compression_ratio": 1.8780487804878048, + "no_speech_prob": 6.653396121691912e-05}, {"id": 228, "seek": 99466, "start": 998.2199999999999, + "end": 1001.14, "text": " So similarly, if I were to do an embedding search", "tokens": + [50542, 407, 14138, 11, 498, 286, 645, 281, 360, 364, 12240, 3584, 3164, 50688], + "temperature": 0.0, "avg_logprob": -0.1647048232578995, "compression_ratio": 1.8780487804878048, + "no_speech_prob": 6.653396121691912e-05}, {"id": 229, "seek": 99466, "start": 1001.14, + "end": 1004.4599999999999, "text": " for superhero flying versus superhero''s flying,", + "tokens": [50688, 337, 19428, 7137, 5717, 19428, 311, 7137, 11, 50854], "temperature": + 0.0, "avg_logprob": -0.1647048232578995, "compression_ratio": 
1.8780487804878048, + "no_speech_prob": 6.653396121691912e-05}, {"id": 230, "seek": 99466, "start": 1004.4599999999999, + "end": 1010.4599999999999, "text": " this is very comparable to running a search + for superhero flying", "tokens": [50854, 341, 307, 588, 25323, 281, 2614, 257, 3164, + 337, 19428, 7137, 51154], "temperature": 0.0, "avg_logprob": -0.1647048232578995, + "compression_ratio": 1.8780487804878048, "no_speech_prob": 6.653396121691912e-05}, + {"id": 231, "seek": 99466, "start": 1010.4599999999999, "end": 1015.4599999999999, + "text": " sort of with the idea of a singular hero", "tokens": [51154, 1333, 295, + 365, 264, 1558, 295, 257, 20010, 5316, 51404], "temperature": 0.0, "avg_logprob": + -0.1647048232578995, "compression_ratio": 1.8780487804878048, "no_speech_prob": + 6.653396121691912e-05}, {"id": 232, "seek": 99466, "start": 1016.3399999999999, + "end": 1019.66, "text": " and then sort of tracking out the idea of a singular", + "tokens": [51448, 293, 550, 1333, 295, 11603, 484, 264, 1558, 295, 257, 20010, 51614], + "temperature": 0.0, "avg_logprob": -0.1647048232578995, "compression_ratio": 1.8780487804878048, + "no_speech_prob": 6.653396121691912e-05}, {"id": 233, "seek": 99466, "start": 1019.66, + "end": 1021.54, "text": " and adding in the idea of a plural,", "tokens": [51614, + 293, 5127, 294, 264, 1558, 295, 257, 25377, 11, 51708], "temperature": 0.0, "avg_logprob": + -0.1647048232578995, "compression_ratio": 1.8780487804878048, "no_speech_prob": + 6.653396121691912e-05}, {"id": 234, "seek": 99466, "start": 1021.54, "end": 1023.06, + "text": " again, from here to heroes,", "tokens": [51708, 797, 11, 490, 510, 281, + 12332, 11, 51784], "temperature": 0.0, "avg_logprob": -0.1647048232578995, "compression_ratio": + 1.8780487804878048, "no_speech_prob": 6.653396121691912e-05}, {"id": 235, "seek": + 99466, "start": 1023.06, "end": 1024.42, "text": " and then what happens over here + is,", "tokens": [51784, 293, 550, 437, 2314, 670, 510, 307, 11, 
51852], "temperature": + 0.0, "avg_logprob": -0.1647048232578995, "compression_ratio": 1.8780487804878048, + "no_speech_prob": 6.653396121691912e-05}, {"id": 236, "seek": 102442, "start": 1024.42, + "end": 1026.74, "text": " this is essentially the same vector,", "tokens": [50364, + 341, 307, 4476, 264, 912, 8062, 11, 50480], "temperature": 0.0, "avg_logprob": -0.15135489391679524, + "compression_ratio": 1.7159090909090908, "no_speech_prob": 5.813015013700351e-05}, + {"id": 237, "seek": 102442, "start": 1026.74, "end": 1031.74, "text": " but moved + toward or in the direction of multiple versus singular.", "tokens": [50480, 457, + 4259, 7361, 420, 294, 264, 3513, 295, 3866, 5717, 20010, 13, 50730], "temperature": + 0.0, "avg_logprob": -0.15135489391679524, "compression_ratio": 1.7159090909090908, + "no_speech_prob": 5.813015013700351e-05}, {"id": 238, "seek": 102442, "start": 1032.74, + "end": 1034.26, "text": " And so what you see over here, in fact,", "tokens": [50780, + 400, 370, 437, 291, 536, 670, 510, 11, 294, 1186, 11, 50856], "temperature": 0.0, + "avg_logprob": -0.15135489391679524, "compression_ratio": 1.7159090909090908, "no_speech_prob": + 5.813015013700351e-05}, {"id": 239, "seek": 102442, "start": 1034.26, "end": 1037.3400000000001, + "text": " is that while some of the images are the same,", "tokens": [50856, 307, + 300, 1339, 512, 295, 264, 5267, 366, 264, 912, 11, 51010], "temperature": 0.0, "avg_logprob": + -0.15135489391679524, "compression_ratio": 1.7159090909090908, "no_speech_prob": + 5.813015013700351e-05}, {"id": 240, "seek": 102442, "start": 1037.3400000000001, + "end": 1040.74, "text": " I''ll, in general, I''m seeing more images of superheroes", + "tokens": [51010, 286, 603, 11, 294, 2674, 11, 286, 478, 2577, 544, 5267, 295, 45417, + 51180], "temperature": 0.0, "avg_logprob": -0.15135489391679524, "compression_ratio": + 1.7159090909090908, "no_speech_prob": 5.813015013700351e-05}, {"id": 241, "seek": + 102442, "start": 1040.74, "end": 
1043.5, "text": " that are in groups of multiple + superheroes.", "tokens": [51180, 300, 366, 294, 3935, 295, 3866, 45417, 13, 51318], + "temperature": 0.0, "avg_logprob": -0.15135489391679524, "compression_ratio": 1.7159090909090908, + "no_speech_prob": 5.813015013700351e-05}, {"id": 242, "seek": 102442, "start": 1043.5, + "end": 1048.3000000000002, "text": " And so to demonstrate this with a very explicit + concrete example,", "tokens": [51318, 400, 370, 281, 11698, 341, 365, 257, 588, + 13691, 9859, 1365, 11, 51558], "temperature": 0.0, "avg_logprob": -0.15135489391679524, + "compression_ratio": 1.7159090909090908, "no_speech_prob": 5.813015013700351e-05}, + {"id": 243, "seek": 102442, "start": 1048.3000000000002, "end": 1051.42, "text": + " if I were to take this, an embedding for this image,", "tokens": [51558, 498, + 286, 645, 281, 747, 341, 11, 364, 12240, 3584, 337, 341, 3256, 11, 51714], "temperature": + 0.0, "avg_logprob": -0.15135489391679524, "compression_ratio": 1.7159090909090908, + "no_speech_prob": 5.813015013700351e-05}, {"id": 244, "seek": 102442, "start": 1051.42, + "end": 1053.46, "text": " which is a delorean from back to the future,", "tokens": + [51714, 597, 307, 257, 1103, 25885, 490, 646, 281, 264, 2027, 11, 51816], "temperature": + 0.0, "avg_logprob": -0.15135489391679524, "compression_ratio": 1.7159090909090908, + "no_speech_prob": 5.813015013700351e-05}, {"id": 245, "seek": 105346, "start": 1053.54, + "end": 1055.3400000000001, "text": " and I were to describe it, right?", "tokens": + [50368, 293, 286, 645, 281, 6786, 309, 11, 558, 30, 50458], "temperature": 0.0, + "avg_logprob": -0.13120878348916265, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0003347648598719388}, {"id": 246, "seek": 105346, "start": 1055.3400000000001, + "end": 1060.3400000000001, "text": " This is a sporty car with one door on either + side,", "tokens": [50458, 639, 307, 257, 45804, 1032, 365, 472, 2853, 322, 2139, + 1252, 11, 50708], 
"temperature": 0.0, "avg_logprob": -0.13120878348916265, "compression_ratio": + 1.7647058823529411, "no_speech_prob": 0.0003347648598719388}, {"id": 247, "seek": + 105346, "start": 1061.3, "end": 1065.82, "text": " and it''s kind of boxy and it''s + got really cool lighting.", "tokens": [50756, 293, 309, 311, 733, 295, 2424, 88, + 293, 309, 311, 658, 534, 1627, 9577, 13, 50982], "temperature": 0.0, "avg_logprob": + -0.13120878348916265, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0003347648598719388}, {"id": 248, "seek": 105346, "start": 1065.82, "end": 1067.94, + "text": " And so when I run that search for this embedding", "tokens": [50982, 400, + 370, 562, 286, 1190, 300, 3164, 337, 341, 12240, 3584, 51088], "temperature": 0.0, + "avg_logprob": -0.13120878348916265, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0003347648598719388}, {"id": 249, "seek": 105346, "start": 1067.94, "end": 1070.9, + "text": " on other images, I find other sporty cars,", "tokens": [51088, 322, 661, + 5267, 11, 286, 915, 661, 45804, 5163, 11, 51236], "temperature": 0.0, "avg_logprob": + -0.13120878348916265, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0003347648598719388}, {"id": 250, "seek": 105346, "start": 1070.9, "end": 1072.54, + "text": " obviously some delorians in here,", "tokens": [51236, 2745, 512, 1103, + 284, 2567, 294, 510, 11, 51318], "temperature": 0.0, "avg_logprob": -0.13120878348916265, + "compression_ratio": 1.7647058823529411, "no_speech_prob": 0.0003347648598719388}, + {"id": 251, "seek": 105346, "start": 1072.54, "end": 1076.3, "text": " but also + just in general sporty cars with a door on either side", "tokens": [51318, 457, + 611, 445, 294, 2674, 45804, 5163, 365, 257, 2853, 322, 2139, 1252, 51506], "temperature": + 0.0, "avg_logprob": -0.13120878348916265, "compression_ratio": 1.7647058823529411, + "no_speech_prob": 0.0003347648598719388}, {"id": 252, "seek": 105346, "start": 1076.3, + "end": 1078.38, 
"text": " and really cool lighting for the most part.", "tokens": + [51506, 293, 534, 1627, 9577, 337, 264, 881, 644, 13, 51610], "temperature": 0.0, + "avg_logprob": -0.13120878348916265, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0003347648598719388}, {"id": 253, "seek": 105346, "start": 1078.38, "end": 1081.06, + "text": " However, what if I were to take an embedding", "tokens": [51610, 2908, + 11, 437, 498, 286, 645, 281, 747, 364, 12240, 3584, 51744], "temperature": 0.0, + "avg_logprob": -0.13120878348916265, "compression_ratio": 1.7647058823529411, "no_speech_prob": + 0.0003347648598719388}, {"id": 254, "seek": 108106, "start": 1081.06, "end": 1085.46, + "text": " for the query from the last slide superhero,", "tokens": [50364, 337, + 264, 14581, 490, 264, 1036, 4137, 19428, 11, 50584], "temperature": 0.0, "avg_logprob": + -0.15508970595498123, "compression_ratio": 1.7723880597014925, "no_speech_prob": + 0.004468755330890417}, {"id": 255, "seek": 108106, "start": 1085.46, "end": 1088.4199999999998, + "text": " and I were to average that or pull it", "tokens": [50584, 293, 286, 645, + 281, 4274, 300, 420, 2235, 309, 50732], "temperature": 0.0, "avg_logprob": -0.15508970595498123, + "compression_ratio": 1.7723880597014925, "no_speech_prob": 0.004468755330890417}, + {"id": 256, "seek": 108106, "start": 1088.4199999999998, "end": 1090.26, "text": + " with this image embedding, what would I get?", "tokens": [50732, 365, 341, 3256, + 12240, 3584, 11, 437, 576, 286, 483, 30, 50824], "temperature": 0.0, "avg_logprob": + -0.15508970595498123, "compression_ratio": 1.7723880597014925, "no_speech_prob": + 0.004468755330890417}, {"id": 257, "seek": 108106, "start": 1090.26, "end": 1092.3, + "text": " Well, in fact, we have an example of this", "tokens": [50824, 1042, 11, + 294, 1186, 11, 321, 362, 364, 1365, 295, 341, 50926], "temperature": 0.0, "avg_logprob": + -0.15508970595498123, "compression_ratio": 1.7723880597014925, "no_speech_prob": + 
0.004468755330890417}, {"id": 258, "seek": 108106, "start": 1092.3, "end": 1095.86, + "text": " in the iPad search book when we''re doing multi-modal search.", "tokens": + [50926, 294, 264, 12945, 3164, 1446, 562, 321, 434, 884, 4825, 12, 8014, 304, 3164, + 13, 51104], "temperature": 0.0, "avg_logprob": -0.15508970595498123, "compression_ratio": + 1.7723880597014925, "no_speech_prob": 0.004468755330890417}, {"id": 259, "seek": + 108106, "start": 1095.86, "end": 1097.7, "text": " If I take an embedding for superhero", + "tokens": [51104, 759, 286, 747, 364, 12240, 3584, 337, 19428, 51196], "temperature": + 0.0, "avg_logprob": -0.15508970595498123, "compression_ratio": 1.7723880597014925, + "no_speech_prob": 0.004468755330890417}, {"id": 260, "seek": 108106, "start": 1097.7, + "end": 1099.34, "text": " and an embedding for this image,", "tokens": [51196, 293, + 364, 12240, 3584, 337, 341, 3256, 11, 51278], "temperature": 0.0, "avg_logprob": + -0.15508970595498123, "compression_ratio": 1.7723880597014925, "no_speech_prob": + 0.004468755330890417}, {"id": 261, "seek": 108106, "start": 1099.34, "end": 1102.86, + "text": " what I, in fact, do get is this very first result", "tokens": [51278, + 437, 286, 11, 294, 1186, 11, 360, 483, 307, 341, 588, 700, 1874, 51454], "temperature": + 0.0, "avg_logprob": -0.15508970595498123, "compression_ratio": 1.7723880597014925, + "no_speech_prob": 0.004468755330890417}, {"id": 262, "seek": 108106, "start": 1102.86, + "end": 1106.54, "text": " as a sporty car with cool lighting with a superhero on + top,", "tokens": [51454, 382, 257, 45804, 1032, 365, 1627, 9577, 365, 257, 19428, + 322, 1192, 11, 51638], "temperature": 0.0, "avg_logprob": -0.15508970595498123, + "compression_ratio": 1.7723880597014925, "no_speech_prob": 0.004468755330890417}, + {"id": 263, "seek": 108106, "start": 1106.54, "end": 1109.22, "text": " because + that''s what I would expect in this semantic vector space", "tokens": [51638, 570, + 300, 311, 437, 286, 576, 
2066, 294, 341, 47982, 8062, 1901, 51772], "temperature": + 0.0, "avg_logprob": -0.15508970595498123, "compression_ratio": 1.7723880597014925, + "no_speech_prob": 0.004468755330890417}, {"id": 264, "seek": 110922, "start": 1109.22, + "end": 1112.1000000000001, "text": " to be in between these things.", "tokens": + [50364, 281, 312, 294, 1296, 613, 721, 13, 50508], "temperature": 0.0, "avg_logprob": + -0.12945638164397208, "compression_ratio": 1.7445255474452555, "no_speech_prob": + 0.006800663657486439}, {"id": 265, "seek": 110922, "start": 1112.1000000000001, + "end": 1116.06, "text": " And for these other images, again, sporty cars, single + door,", "tokens": [50508, 400, 337, 613, 661, 5267, 11, 797, 11, 45804, 5163, 11, + 2167, 2853, 11, 50706], "temperature": 0.0, "avg_logprob": -0.12945638164397208, + "compression_ratio": 1.7445255474452555, "no_speech_prob": 0.006800663657486439}, + {"id": 266, "seek": 110922, "start": 1116.06, "end": 1119.02, "text": " but notice + that in all of them, there''s a person,", "tokens": [50706, 457, 3449, 300, 294, + 439, 295, 552, 11, 456, 311, 257, 954, 11, 50854], "temperature": 0.0, "avg_logprob": + -0.12945638164397208, "compression_ratio": 1.7445255474452555, "no_speech_prob": + 0.006800663657486439}, {"id": 267, "seek": 110922, "start": 1119.02, "end": 1120.78, + "text": " and it just so happens that that person", "tokens": [50854, 293, 309, + 445, 370, 2314, 300, 300, 954, 50942], "temperature": 0.0, "avg_logprob": -0.12945638164397208, + "compression_ratio": 1.7445255474452555, "no_speech_prob": 0.006800663657486439}, + {"id": 268, "seek": 110922, "start": 1120.78, "end": 1123.14, "text": " is the protagonist + of their story.", "tokens": [50942, 307, 264, 24506, 295, 641, 1657, 13, 51060], + "temperature": 0.0, "avg_logprob": -0.12945638164397208, "compression_ratio": 1.7445255474452555, + "no_speech_prob": 0.006800663657486439}, {"id": 269, "seek": 110922, "start": 1123.14, + "end": 1125.66, "text": " So maybe those 
stories didn''t have actual superheroes,", + "tokens": [51060, 407, 1310, 729, 3676, 994, 380, 362, 3539, 45417, 11, 51186], + "temperature": 0.0, "avg_logprob": -0.12945638164397208, "compression_ratio": 1.7445255474452555, + "no_speech_prob": 0.006800663657486439}, {"id": 270, "seek": 110922, "start": 1125.66, + "end": 1127.98, "text": " but these are the heroes of those stories.", "tokens": + [51186, 457, 613, 366, 264, 12332, 295, 729, 3676, 13, 51302], "temperature": 0.0, + "avg_logprob": -0.12945638164397208, "compression_ratio": 1.7445255474452555, "no_speech_prob": + 0.006800663657486439}, {"id": 271, "seek": 110922, "start": 1127.98, "end": 1130.82, + "text": " So you get the idea, and I wanted to paint that conceptually", "tokens": + [51302, 407, 291, 483, 264, 1558, 11, 293, 286, 1415, 281, 4225, 300, 3410, 671, + 51444], "temperature": 0.0, "avg_logprob": -0.12945638164397208, "compression_ratio": + 1.7445255474452555, "no_speech_prob": 0.006800663657486439}, {"id": 272, "seek": + 110922, "start": 1130.82, "end": 1133.7, "text": " just to talk about regions of + vector space", "tokens": [51444, 445, 281, 751, 466, 10682, 295, 8062, 1901, 51588], + "temperature": 0.0, "avg_logprob": -0.12945638164397208, "compression_ratio": 1.7445255474452555, + "no_speech_prob": 0.006800663657486439}, {"id": 273, "seek": 110922, "start": 1133.7, + "end": 1137.3, "text": " and what they represent and how you can use math on vectors", + "tokens": [51588, 293, 437, 436, 2906, 293, 577, 291, 393, 764, 5221, 322, 18875, + 51768], "temperature": 0.0, "avg_logprob": -0.12945638164397208, "compression_ratio": + 1.7445255474452555, "no_speech_prob": 0.006800663657486439}, {"id": 274, "seek": + 113730, "start": 1137.3, "end": 1140.74, "text": " to move between them and sort + of combine concepts", "tokens": [50364, 281, 1286, 1296, 552, 293, 1333, 295, 10432, + 10392, 50536], "temperature": 0.0, "avg_logprob": -0.13015363673971156, "compression_ratio": + 1.688073394495413, 
"no_speech_prob": 0.00017746536468621343}, {"id": 275, "seek": + 113730, "start": 1140.74, "end": 1142.8999999999999, "text": " and find related + things.", "tokens": [50536, 293, 915, 4077, 721, 13, 50644], "temperature": 0.0, + "avg_logprob": -0.13015363673971156, "compression_ratio": 1.688073394495413, "no_speech_prob": + 0.00017746536468621343}, {"id": 276, "seek": 113730, "start": 1143.94, "end": 1148.94, + "text": " And so one problem, now zooming back out to the topic of today,", "tokens": + [50696, 400, 370, 472, 1154, 11, 586, 48226, 646, 484, 281, 264, 4829, 295, 965, + 11, 50946], "temperature": 0.0, "avg_logprob": -0.13015363673971156, "compression_ratio": + 1.688073394495413, "no_speech_prob": 0.00017746536468621343}, {"id": 277, "seek": + 113730, "start": 1148.94, "end": 1152.62, "text": " one problem that we commonly + come across,", "tokens": [50946, 472, 1154, 300, 321, 12719, 808, 2108, 11, 51130], + "temperature": 0.0, "avg_logprob": -0.13015363673971156, "compression_ratio": 1.688073394495413, + "no_speech_prob": 0.00017746536468621343}, {"id": 278, "seek": 113730, "start": + 1152.62, "end": 1155.02, "text": " and this is where hybrid search comes into play,", + "tokens": [51130, 293, 341, 307, 689, 13051, 3164, 1487, 666, 862, 11, 51250], "temperature": + 0.0, "avg_logprob": -0.13015363673971156, "compression_ratio": 1.688073394495413, + "no_speech_prob": 0.00017746536468621343}, {"id": 279, "seek": 113730, "start": + 1155.02, "end": 1158.3799999999999, "text": " is that we have disjoint vector spaces + in search", "tokens": [51250, 307, 300, 321, 362, 717, 48613, 8062, 7673, 294, 3164, + 51418], "temperature": 0.0, "avg_logprob": -0.13015363673971156, "compression_ratio": + 1.688073394495413, "no_speech_prob": 0.00017746536468621343}, {"id": 280, "seek": + 113730, "start": 1158.3799999999999, "end": 1161.3, "text": " and that leads to + disjoint query paradigms.", "tokens": [51418, 293, 300, 6689, 281, 717, 48613, 14581, + 13480, 328, 2592, 
13, 51564], "temperature": 0.0, "avg_logprob": -0.13015363673971156, + "compression_ratio": 1.688073394495413, "no_speech_prob": 0.00017746536468621343}, + {"id": 281, "seek": 113730, "start": 1161.3, "end": 1166.1, "text": " What I mean + by that is that we have a sparse,", "tokens": [51564, 708, 286, 914, 538, 300, 307, + 300, 321, 362, 257, 637, 11668, 11, 51804], "temperature": 0.0, "avg_logprob": -0.13015363673971156, + "compression_ratio": 1.688073394495413, "no_speech_prob": 0.00017746536468621343}, + {"id": 282, "seek": 116610, "start": 1166.1, "end": 1170.82, "text": " lexical semantic + space, which is our inverted index.", "tokens": [50364, 476, 87, 804, 47982, 1901, + 11, 597, 307, 527, 38969, 8186, 13, 50600], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 283, "seek": 116610, "start": 1170.82, "end": 1172.9399999999998, "text": + " What I showed you earlier with the million dimensions", "tokens": [50600, 708, + 286, 4712, 291, 3071, 365, 264, 2459, 12819, 50706], "temperature": 0.0, "avg_logprob": + -0.11010892982156868, "compression_ratio": 1.6853932584269662, "no_speech_prob": + 9.287398279411718e-05}, {"id": 284, "seek": 116610, "start": 1172.9399999999998, + "end": 1175.3799999999999, "text": " and the keywords represent the dimensions,", + "tokens": [50706, 293, 264, 21009, 2906, 264, 12819, 11, 50828], "temperature": + 0.0, "avg_logprob": -0.11010892982156868, "compression_ratio": 1.6853932584269662, + "no_speech_prob": 9.287398279411718e-05}, {"id": 285, "seek": 116610, "start": 1175.3799999999999, + "end": 1176.86, "text": " that is a vector space.", "tokens": [50828, 300, 307, + 257, 8062, 1901, 13, 50902], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 286, "seek": 116610, "start": 1176.86, "end": 1178.2199999999998, "text": + " It''s 
just a very sparse one.", "tokens": [50902, 467, 311, 445, 257, 588, 637, + 11668, 472, 13, 50970], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 287, "seek": 116610, "start": 1179.2199999999998, "end": 1181.86, "text": + " Similarly, we have dense vector spaces", "tokens": [51020, 13157, 11, 321, 362, + 18011, 8062, 7673, 51152], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 288, "seek": 116610, "start": 1181.86, "end": 1184.2199999999998, "text": + " where most of our embeddings are,", "tokens": [51152, 689, 881, 295, 527, 12240, + 29432, 366, 11, 51270], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 289, "seek": 116610, "start": 1184.2199999999998, "end": 1185.86, "text": + " that we get out of large language models", "tokens": [51270, 300, 321, 483, 484, + 295, 2416, 2856, 5245, 51352], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 290, "seek": 116610, "start": 1185.86, "end": 1189.34, "text": " where they''re + compact into a small number of dimensions,", "tokens": [51352, 689, 436, 434, 14679, + 666, 257, 1359, 1230, 295, 12819, 11, 51526], "temperature": 0.0, "avg_logprob": + -0.11010892982156868, "compression_ratio": 1.6853932584269662, "no_speech_prob": + 9.287398279411718e-05}, {"id": 291, "seek": 116610, "start": 1189.34, "end": 1191.1799999999998, + "text": " but they''re continuous.", "tokens": [51526, 457, 436, 434, 10957, 13, + 51618], "temperature": 0.0, "avg_logprob": -0.11010892982156868, "compression_ratio": + 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, {"id": 292, "seek": + 116610, "start": 1191.1799999999998, 
"end": 1193.82, "text": " Because we have these + two different query paradigms,", "tokens": [51618, 1436, 321, 362, 613, 732, 819, + 14581, 13480, 328, 2592, 11, 51750], "temperature": 0.0, "avg_logprob": -0.11010892982156868, + "compression_ratio": 1.6853932584269662, "no_speech_prob": 9.287398279411718e-05}, + {"id": 293, "seek": 119382, "start": 1193.82, "end": 1198.22, "text": " what often + happens with vector search is we say,", "tokens": [50364, 437, 2049, 2314, 365, + 8062, 3164, 307, 321, 584, 11, 50584], "temperature": 0.0, "avg_logprob": -0.12971499870563374, + "compression_ratio": 1.657992565055762, "no_speech_prob": 0.00019350160437170416}, + {"id": 294, "seek": 119382, "start": 1198.22, "end": 1201.3, "text": " hey, I don''t + know how to combine this dense query", "tokens": [50584, 4177, 11, 286, 500, 380, + 458, 577, 281, 10432, 341, 18011, 14581, 50738], "temperature": 0.0, "avg_logprob": + -0.12971499870563374, "compression_ratio": 1.657992565055762, "no_speech_prob": + 0.00019350160437170416}, {"id": 295, "seek": 119382, "start": 1201.3, "end": 1204.62, + "text": " on this embedding with the sparse query with these keywords.", "tokens": + [50738, 322, 341, 12240, 3584, 365, 264, 637, 11668, 14581, 365, 613, 21009, 13, + 50904], "temperature": 0.0, "avg_logprob": -0.12971499870563374, "compression_ratio": + 1.657992565055762, "no_speech_prob": 0.00019350160437170416}, {"id": 296, "seek": + 119382, "start": 1204.62, "end": 1207.02, "text": " So I''m just going to run them + as separate searches.", "tokens": [50904, 407, 286, 478, 445, 516, 281, 1190, 552, + 382, 4994, 26701, 13, 51024], "temperature": 0.0, "avg_logprob": -0.12971499870563374, + "compression_ratio": 1.657992565055762, "no_speech_prob": 0.00019350160437170416}, + {"id": 297, "seek": 119382, "start": 1207.02, "end": 1210.74, "text": " And in fact, + that''s what most sort of hybrid searches,", "tokens": [51024, 400, 294, 1186, 11, + 300, 311, 437, 881, 1333, 295, 13051, 26701, 11, 
51210], "temperature": 0.0, "avg_logprob": + -0.12971499870563374, "compression_ratio": 1.657992565055762, "no_speech_prob": + 0.00019350160437170416}, {"id": 298, "seek": 119382, "start": 1210.74, "end": 1213.86, + "text": " hybrid search implementations look like out of the box.", "tokens": [51210, + 13051, 3164, 4445, 763, 574, 411, 484, 295, 264, 2424, 13, 51366], "temperature": + 0.0, "avg_logprob": -0.12971499870563374, "compression_ratio": 1.657992565055762, + "no_speech_prob": 0.00019350160437170416}, {"id": 299, "seek": 119382, "start": + 1213.86, "end": 1218.62, "text": " So this is an example of RRF or the reciprocal + rank fusion algorithm", "tokens": [51366, 407, 341, 307, 364, 1365, 295, 497, 49, + 37, 420, 264, 46948, 6181, 23100, 9284, 51604], "temperature": 0.0, "avg_logprob": + -0.12971499870563374, "compression_ratio": 1.657992565055762, "no_speech_prob": + 0.00019350160437170416}, {"id": 300, "seek": 119382, "start": 1218.62, "end": 1221.9399999999998, + "text": " where I''m essentially taking a lexical query over here", "tokens": [51604, + 689, 286, 478, 4476, 1940, 257, 476, 87, 804, 14581, 670, 510, 51770], "temperature": + 0.0, "avg_logprob": -0.12971499870563374, "compression_ratio": 1.657992565055762, + "no_speech_prob": 0.00019350160437170416}, {"id": 301, "seek": 122194, "start": + 1221.94, "end": 1223.46, "text": " for the Hobbit.", "tokens": [50364, 337, 264, + 22966, 5260, 13, 50440], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 302, "seek": 122194, "start": 1223.46, "end": 1225.22, "text": " And I''m + matching on a bunch of documents.", "tokens": [50440, 400, 286, 478, 14324, 322, + 257, 3840, 295, 8512, 13, 50528], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 303, "seek": 122194, "start": 1225.22, "end": 1227.98, "text": " 
You''ll + see each of these has the word Hobbit", "tokens": [50528, 509, 603, 536, 1184, 295, + 613, 575, 264, 1349, 22966, 5260, 50666], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 304, "seek": 122194, "start": 1227.98, "end": 1229.78, "text": " and it''s + somewhere either in the title", "tokens": [50666, 293, 309, 311, 4079, 2139, 294, + 264, 4876, 50756], "temperature": 0.0, "avg_logprob": -0.1446118462354617, "compression_ratio": + 1.873015873015873, "no_speech_prob": 0.006634780205786228}, {"id": 305, "seek": + 122194, "start": 1229.78, "end": 1231.74, "text": " or maybe in the description.", + "tokens": [50756, 420, 1310, 294, 264, 3855, 13, 50854], "temperature": 0.0, "avg_logprob": + -0.1446118462354617, "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 306, "seek": 122194, "start": 1231.74, "end": 1236.66, "text": " But notice + that while the first four results look pretty good,", "tokens": [50854, 583, 3449, + 300, 1339, 264, 700, 1451, 3542, 574, 1238, 665, 11, 51100], "temperature": 0.0, + "avg_logprob": -0.1446118462354617, "compression_ratio": 1.873015873015873, "no_speech_prob": + 0.006634780205786228}, {"id": 307, "seek": 122194, "start": 1236.66, "end": 1238.46, + "text": " the next, these are the only results", "tokens": [51100, 264, 958, 11, + 613, 366, 264, 787, 3542, 51190], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 308, "seek": 122194, "start": 1238.46, "end": 1240.3400000000001, "text": + " that had the word Hobbit in them.", "tokens": [51190, 300, 632, 264, 1349, 22966, + 5260, 294, 552, 13, 51284], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 309, "seek": 122194, "start": 1240.3400000000001, 
"end": 1241.98, "text": + " And then the rest of these results,", "tokens": [51284, 400, 550, 264, 1472, 295, + 613, 3542, 11, 51366], "temperature": 0.0, "avg_logprob": -0.1446118462354617, "compression_ratio": + 1.873015873015873, "no_speech_prob": 0.006634780205786228}, {"id": 310, "seek": + 122194, "start": 1241.98, "end": 1244.26, "text": " the good, the bad, and the ugly.", + "tokens": [51366, 264, 665, 11, 264, 1578, 11, 293, 264, 12246, 13, 51480], "temperature": + 0.0, "avg_logprob": -0.1446118462354617, "compression_ratio": 1.873015873015873, + "no_speech_prob": 0.006634780205786228}, {"id": 311, "seek": 122194, "start": 1244.26, + "end": 1247.8200000000002, "text": " This just happens to match on the word the + three times.", "tokens": [51480, 639, 445, 2314, 281, 2995, 322, 264, 1349, 264, + 1045, 1413, 13, 51658], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 312, "seek": 122194, "start": 1247.8200000000002, "end": 1250.46, "text": + " And then this next result happens to match", "tokens": [51658, 400, 550, 341, + 958, 1874, 2314, 281, 2995, 51790], "temperature": 0.0, "avg_logprob": -0.1446118462354617, + "compression_ratio": 1.873015873015873, "no_speech_prob": 0.006634780205786228}, + {"id": 313, "seek": 125046, "start": 1250.46, "end": 1253.14, "text": " on the Lord + of the ring.", "tokens": [50364, 322, 264, 3257, 295, 264, 4875, 13, 50498], "temperature": + 0.0, "avg_logprob": -0.14320736057710964, "compression_ratio": 1.7451612903225806, + "no_speech_prob": 0.0005634360131807625}, {"id": 314, "seek": 125046, "start": 1253.14, + "end": 1255.3400000000001, "text": " So it''s got the in it three times as well.", + "tokens": [50498, 407, 309, 311, 658, 264, 294, 309, 1045, 1413, 382, 731, 13, 50608], + "temperature": 0.0, "avg_logprob": -0.14320736057710964, "compression_ratio": 1.7451612903225806, + "no_speech_prob": 0.0005634360131807625}, {"id": 
315, "seek": 125046, "start": 1255.3400000000001, + "end": 1257.22, "text": " It happened to give me a good result,", "tokens": [50608, + 467, 2011, 281, 976, 385, 257, 665, 1874, 11, 50702], "temperature": 0.0, "avg_logprob": + -0.14320736057710964, "compression_ratio": 1.7451612903225806, "no_speech_prob": + 0.0005634360131807625}, {"id": 316, "seek": 125046, "start": 1257.22, "end": 1258.74, + "text": " but it was purely coincidence", "tokens": [50702, 457, 309, 390, 17491, + 22137, 50778], "temperature": 0.0, "avg_logprob": -0.14320736057710964, "compression_ratio": + 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, {"id": 317, "seek": + 125046, "start": 1258.74, "end": 1260.78, "text": " because it doesn''t have the + word Hobbit here.", "tokens": [50778, 570, 309, 1177, 380, 362, 264, 1349, 22966, + 5260, 510, 13, 50880], "temperature": 0.0, "avg_logprob": -0.14320736057710964, + "compression_ratio": 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, + {"id": 318, "seek": 125046, "start": 1260.78, "end": 1263.06, "text": " And then + I get the abyss and then the apartment again,", "tokens": [50880, 400, 550, 286, + 483, 264, 410, 31059, 293, 550, 264, 9587, 797, 11, 50994], "temperature": 0.0, + "avg_logprob": -0.14320736057710964, "compression_ratio": 1.7451612903225806, "no_speech_prob": + 0.0005634360131807625}, {"id": 319, "seek": 125046, "start": 1263.06, "end": 1264.46, + "text": " only matching on the word the.", "tokens": [50994, 787, 14324, 322, 264, + 1349, 264, 13, 51064], "temperature": 0.0, "avg_logprob": -0.14320736057710964, + "compression_ratio": 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, + {"id": 320, "seek": 125046, "start": 1264.46, "end": 1266.46, "text": " So the lexical + search found all the results", "tokens": [51064, 407, 264, 476, 87, 804, 3164, 1352, + 439, 264, 3542, 51164], "temperature": 0.0, "avg_logprob": -0.14320736057710964, + "compression_ratio": 1.7451612903225806, 
"no_speech_prob": 0.0005634360131807625}, + {"id": 321, "seek": 125046, "start": 1266.46, "end": 1268.1000000000001, "text": + " that had the word Hobbit in them,", "tokens": [51164, 300, 632, 264, 1349, 22966, + 5260, 294, 552, 11, 51246], "temperature": 0.0, "avg_logprob": -0.14320736057710964, + "compression_ratio": 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, + {"id": 322, "seek": 125046, "start": 1268.1000000000001, "end": 1269.8600000000001, + "text": " but it completely missed a whole bunch", "tokens": [51246, 457, 309, 2584, + 6721, 257, 1379, 3840, 51334], "temperature": 0.0, "avg_logprob": -0.14320736057710964, + "compression_ratio": 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, + {"id": 323, "seek": 125046, "start": 1269.8600000000001, "end": 1271.82, "text": + " of other potentially relevant results.", "tokens": [51334, 295, 661, 7263, 7340, + 3542, 13, 51432], "temperature": 0.0, "avg_logprob": -0.14320736057710964, "compression_ratio": + 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, {"id": 324, "seek": + 125046, "start": 1271.82, "end": 1275.22, "text": " Likewise, my vector query over + here for this embedding", "tokens": [51432, 30269, 11, 452, 8062, 14581, 670, 510, + 337, 341, 12240, 3584, 51602], "temperature": 0.0, "avg_logprob": -0.14320736057710964, + "compression_ratio": 1.7451612903225806, "no_speech_prob": 0.0005634360131807625}, + {"id": 325, "seek": 125046, "start": 1275.22, "end": 1277.3, "text": " matched the + Hobbit here.", "tokens": [51602, 21447, 264, 22966, 5260, 510, 13, 51706], "temperature": + 0.0, "avg_logprob": -0.14320736057710964, "compression_ratio": 1.7451612903225806, + "no_speech_prob": 0.0005634360131807625}, {"id": 326, "seek": 125046, "start": 1277.3, + "end": 1279.74, "text": " It matched a Harry Potter movie here.", "tokens": [51706, + 467, 21447, 257, 9378, 18115, 3169, 510, 13, 51828], "temperature": 0.0, "avg_logprob": + -0.14320736057710964, "compression_ratio": 
1.7451612903225806, "no_speech_prob": + 0.0005634360131807625}, {"id": 327, "seek": 127974, "start": 1279.74, "end": 1282.74, + "text": " Similar concepts, similar themes,", "tokens": [50364, 10905, 10392, 11, + 2531, 13544, 11, 50514], "temperature": 0.0, "avg_logprob": -0.18317613670294233, + "compression_ratio": 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, + {"id": 328, "seek": 127974, "start": 1282.74, "end": 1286.5, "text": " and similar + kind of visual style.", "tokens": [50514, 293, 2531, 733, 295, 5056, 3758, 13, 50702], + "temperature": 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": 1.962962962962963, + "no_speech_prob": 0.00045225981739349663}, {"id": 329, "seek": 127974, "start": + 1286.5, "end": 1288.1, "text": " Lord of the rings, Lord of the rings,", "tokens": + [50702, 3257, 295, 264, 11136, 11, 3257, 295, 264, 11136, 11, 50782], "temperature": + 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": 1.962962962962963, + "no_speech_prob": 0.00045225981739349663}, {"id": 330, "seek": 127974, "start": + 1288.1, "end": 1289.46, "text": " Rise of the Guardians, I guess,", "tokens": [50782, + 34482, 295, 264, 45236, 11, 286, 2041, 11, 50850], "temperature": 0.0, "avg_logprob": + -0.18317613670294233, "compression_ratio": 1.962962962962963, "no_speech_prob": + 0.00045225981739349663}, {"id": 331, "seek": 127974, "start": 1289.46, "end": 1291.02, + "text": " is maybe kind of conceptually similar", "tokens": [50850, 307, 1310, 733, + 295, 3410, 671, 2531, 50928], "temperature": 0.0, "avg_logprob": -0.18317613670294233, + "compression_ratio": 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, + {"id": 332, "seek": 127974, "start": 1291.02, "end": 1293.7, "text": " even though + it''s a cartoon.", "tokens": [50928, 754, 1673, 309, 311, 257, 18569, 13, 51062], + "temperature": 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": 1.962962962962963, + "no_speech_prob": 0.00045225981739349663}, {"id": 
333, "seek": 127974, "start": + 1293.7, "end": 1296.1, "text": " The wailing, I think, just has a visually similar + style,", "tokens": [51062, 440, 261, 23315, 11, 286, 519, 11, 445, 575, 257, 19622, + 2531, 3758, 11, 51182], "temperature": 0.0, "avg_logprob": -0.18317613670294233, + "compression_ratio": 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, + {"id": 334, "seek": 127974, "start": 1296.1, "end": 1297.18, "text": " but it''s + a really bad match.", "tokens": [51182, 457, 309, 311, 257, 534, 1578, 2995, 13, + 51236], "temperature": 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": + 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, {"id": 335, "seek": + 127974, "start": 1297.18, "end": 1298.02, "text": " You get the idea.", "tokens": + [51236, 509, 483, 264, 1558, 13, 51278], "temperature": 0.0, "avg_logprob": -0.18317613670294233, + "compression_ratio": 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, + {"id": 336, "seek": 127974, "start": 1298.02, "end": 1299.66, "text": " So there''s + some really good results", "tokens": [51278, 407, 456, 311, 512, 534, 665, 3542, + 51360], "temperature": 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": + 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, {"id": 337, "seek": + 127974, "start": 1299.66, "end": 1302.18, "text": " I get from the vector search,", + "tokens": [51360, 286, 483, 490, 264, 8062, 3164, 11, 51486], "temperature": 0.0, + "avg_logprob": -0.18317613670294233, "compression_ratio": 1.962962962962963, "no_speech_prob": + 0.00045225981739349663}, {"id": 338, "seek": 127974, "start": 1302.18, "end": 1303.9, + "text": " some the dense vector search,", "tokens": [51486, 512, 264, 18011, 8062, + 3164, 11, 51572], "temperature": 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": + 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, {"id": 339, "seek": + 127974, "start": 1303.9, "end": 1306.22, "text": " some 
really good results I get + from this lexical", "tokens": [51572, 512, 534, 665, 3542, 286, 483, 490, 341, 476, + 87, 804, 51688], "temperature": 0.0, "avg_logprob": -0.18317613670294233, "compression_ratio": + 1.962962962962963, "no_speech_prob": 0.00045225981739349663}, {"id": 340, "seek": + 127974, "start": 1306.22, "end": 1308.26, "text": " or sparse vector search.", "tokens": + [51688, 420, 637, 11668, 8062, 3164, 13, 51790], "temperature": 0.0, "avg_logprob": + -0.18317613670294233, "compression_ratio": 1.962962962962963, "no_speech_prob": + 0.00045225981739349663}, {"id": 341, "seek": 130826, "start": 1308.26, "end": 1311.78, + "text": " And then with hybrid search with reciprocal rank fusion,", "tokens": [50364, + 400, 550, 365, 13051, 3164, 365, 46948, 6181, 23100, 11, 50540], "temperature": + 0.0, "avg_logprob": -0.1321264338270526, "compression_ratio": 1.728395061728395, + "no_speech_prob": 6.19793645455502e-05}, {"id": 342, "seek": 130826, "start": 1311.78, + "end": 1314.3799999999999, "text": " we can essentially take each of those separate + sets", "tokens": [50540, 321, 393, 4476, 747, 1184, 295, 729, 4994, 6352, 50670], + "temperature": 0.0, "avg_logprob": -0.1321264338270526, "compression_ratio": 1.728395061728395, + "no_speech_prob": 6.19793645455502e-05}, {"id": 343, "seek": 130826, "start": 1314.3799999999999, + "end": 1316.86, "text": " of results and combine them together", "tokens": [50670, + 295, 3542, 293, 10432, 552, 1214, 50794], "temperature": 0.0, "avg_logprob": -0.1321264338270526, + "compression_ratio": 1.728395061728395, "no_speech_prob": 6.19793645455502e-05}, + {"id": 344, "seek": 130826, "start": 1316.86, "end": 1320.74, "text": " in a way + that weights things that both the lexical", "tokens": [50794, 294, 257, 636, 300, + 17443, 721, 300, 1293, 264, 476, 87, 804, 50988], "temperature": 0.0, "avg_logprob": + -0.1321264338270526, "compression_ratio": 1.728395061728395, "no_speech_prob": 6.19793645455502e-05}, + {"id": 345, "seek": 
130826, "start": 1320.74, "end": 1324.62, "text": " and the + dense search found relevant.", "tokens": [50988, 293, 264, 18011, 3164, 1352, 7340, + 13, 51182], "temperature": 0.0, "avg_logprob": -0.1321264338270526, "compression_ratio": + 1.728395061728395, "no_speech_prob": 6.19793645455502e-05}, {"id": 346, "seek": + 130826, "start": 1324.62, "end": 1325.94, "text": " It moves those to the top", + "tokens": [51182, 467, 6067, 729, 281, 264, 1192, 51248], "temperature": 0.0, "avg_logprob": + -0.1321264338270526, "compression_ratio": 1.728395061728395, "no_speech_prob": 6.19793645455502e-05}, + {"id": 347, "seek": 130826, "start": 1325.94, "end": 1329.1, "text": " and then + kind of gives us better results overall.", "tokens": [51248, 293, 550, 733, 295, + 2709, 505, 1101, 3542, 4787, 13, 51406], "temperature": 0.0, "avg_logprob": -0.1321264338270526, + "compression_ratio": 1.728395061728395, "no_speech_prob": 6.19793645455502e-05}, + {"id": 348, "seek": 130826, "start": 1329.1, "end": 1333.58, "text": " And you can + see that I''ve matched most of the results over here.", "tokens": [51406, 400, 291, + 393, 536, 300, 286, 600, 21447, 881, 295, 264, 3542, 670, 510, 13, 51630], "temperature": + 0.0, "avg_logprob": -0.1321264338270526, "compression_ratio": 1.728395061728395, + "no_speech_prob": 6.19793645455502e-05}, {"id": 349, "seek": 130826, "start": 1333.58, + "end": 1335.94, "text": " So it''s better than either of the two lexical", "tokens": + [51630, 407, 309, 311, 1101, 813, 2139, 295, 264, 732, 476, 87, 804, 51748], "temperature": + 0.0, "avg_logprob": -0.1321264338270526, "compression_ratio": 1.728395061728395, + "no_speech_prob": 6.19793645455502e-05}, {"id": 350, "seek": 133594, "start": 1335.94, + "end": 1339.38, "text": " or dense vector search mechanisms individually.", "tokens": + [50364, 420, 18011, 8062, 3164, 15902, 16652, 13, 50536], "temperature": 0.0, "avg_logprob": + -0.14885834185746463, "compression_ratio": 1.8265682656826567, 
"no_speech_prob": + 0.001988595351576805}, {"id": 351, "seek": 133594, "start": 1339.38, "end": 1342.38, + "text": " However, I''m still treating them", "tokens": [50536, 2908, 11, 286, 478, + 920, 15083, 552, 50686], "temperature": 0.0, "avg_logprob": -0.14885834185746463, + "compression_ratio": 1.8265682656826567, "no_speech_prob": 0.001988595351576805}, + {"id": 352, "seek": 133594, "start": 1342.38, "end": 1344.1000000000001, "text": + " as entirely separate things.", "tokens": [50686, 382, 7696, 4994, 721, 13, 50772], + "temperature": 0.0, "avg_logprob": -0.14885834185746463, "compression_ratio": 1.8265682656826567, + "no_speech_prob": 0.001988595351576805}, {"id": 353, "seek": 133594, "start": 1344.1000000000001, + "end": 1345.3400000000001, "text": " I''m doing the lexical search,", "tokens": + [50772, 286, 478, 884, 264, 476, 87, 804, 3164, 11, 50834], "temperature": 0.0, + "avg_logprob": -0.14885834185746463, "compression_ratio": 1.8265682656826567, "no_speech_prob": + 0.001988595351576805}, {"id": 354, "seek": 133594, "start": 1345.3400000000001, + "end": 1346.54, "text": " I''m doing an embedding search", "tokens": [50834, 286, + 478, 884, 364, 12240, 3584, 3164, 50894], "temperature": 0.0, "avg_logprob": -0.14885834185746463, + "compression_ratio": 1.8265682656826567, "no_speech_prob": 0.001988595351576805}, + {"id": 355, "seek": 133594, "start": 1346.54, "end": 1349.66, "text": " and then + I''m combining them together.", "tokens": [50894, 293, 550, 286, 478, 21928, 552, + 1214, 13, 51050], "temperature": 0.0, "avg_logprob": -0.14885834185746463, "compression_ratio": + 1.8265682656826567, "no_speech_prob": 0.001988595351576805}, {"id": 356, "seek": + 133594, "start": 1349.66, "end": 1351.8200000000002, "text": " But in reality, there''s + lots of ways", "tokens": [51050, 583, 294, 4103, 11, 456, 311, 3195, 295, 2098, + 51158], "temperature": 0.0, "avg_logprob": -0.14885834185746463, "compression_ratio": + 1.8265682656826567, "no_speech_prob": 
0.001988595351576805}, {"id": 357, "seek": + 133594, "start": 1351.8200000000002, "end": 1353.66, "text": " to merge these different + paradigms.", "tokens": [51158, 281, 22183, 613, 819, 13480, 328, 2592, 13, 51250], + "temperature": 0.0, "avg_logprob": -0.14885834185746463, "compression_ratio": 1.8265682656826567, + "no_speech_prob": 0.001988595351576805}, {"id": 358, "seek": 133594, "start": 1353.66, + "end": 1357.46, "text": " And even beyond just the embeddings I''m getting from + texts,", "tokens": [51250, 400, 754, 4399, 445, 264, 12240, 29432, 286, 478, 1242, + 490, 15765, 11, 51440], "temperature": 0.0, "avg_logprob": -0.14885834185746463, + "compression_ratio": 1.8265682656826567, "no_speech_prob": 0.001988595351576805}, + {"id": 359, "seek": 133594, "start": 1357.46, "end": 1359.3, "text": " we can get + text embeddings.", "tokens": [51440, 321, 393, 483, 2487, 12240, 29432, 13, 51532], + "temperature": 0.0, "avg_logprob": -0.14885834185746463, "compression_ratio": 1.8265682656826567, + "no_speech_prob": 0.001988595351576805}, {"id": 360, "seek": 133594, "start": 1359.3, + "end": 1361.9, "text": " So for example, we can do a text encoder", "tokens": [51532, + 407, 337, 1365, 11, 321, 393, 360, 257, 2487, 2058, 19866, 51662], "temperature": + 0.0, "avg_logprob": -0.14885834185746463, "compression_ratio": 1.8265682656826567, + "no_speech_prob": 0.001988595351576805}, {"id": 361, "seek": 133594, "start": 1361.9, + "end": 1363.38, "text": " to generate embeddings for that.", "tokens": [51662, 281, + 8460, 12240, 29432, 337, 300, 13, 51736], "temperature": 0.0, "avg_logprob": -0.14885834185746463, + "compression_ratio": 1.8265682656826567, "no_speech_prob": 0.001988595351576805}, + {"id": 362, "seek": 133594, "start": 1363.38, "end": 1365.8600000000001, "text": + " We can take images and generate embeddings for that.", "tokens": [51736, 492, + 393, 747, 5267, 293, 8460, 12240, 29432, 337, 300, 13, 51860], "temperature": 0.0, + "avg_logprob": 
-0.14885834185746463, "compression_ratio": 1.8265682656826567, "no_speech_prob": + 0.001988595351576805}, {"id": 363, "seek": 136586, "start": 1365.86, "end": 1368.1, + "text": " We can also take user behaviors", "tokens": [50364, 492, 393, 611, 747, + 4195, 15501, 50476], "temperature": 0.0, "avg_logprob": -0.12627686828863424, "compression_ratio": + 1.7043795620437956, "no_speech_prob": 6.328119343379512e-05}, {"id": 364, "seek": + 136586, "start": 1368.1, "end": 1370.74, "text": " and generate behavioral based + embeddings", "tokens": [50476, 293, 8460, 19124, 2361, 12240, 29432, 50608], "temperature": + 0.0, "avg_logprob": -0.12627686828863424, "compression_ratio": 1.7043795620437956, + "no_speech_prob": 6.328119343379512e-05}, {"id": 365, "seek": 136586, "start": 1370.74, + "end": 1372.1, "text": " and combine those together.", "tokens": [50608, 293, 10432, + 729, 1214, 13, 50676], "temperature": 0.0, "avg_logprob": -0.12627686828863424, + "compression_ratio": 1.7043795620437956, "no_speech_prob": 6.328119343379512e-05}, + {"id": 366, "seek": 136586, "start": 1372.1, "end": 1375.1399999999999, "text": + " And there''s different ways to generate new vector spaces.", "tokens": [50676, + 400, 456, 311, 819, 2098, 281, 8460, 777, 8062, 7673, 13, 50828], "temperature": + 0.0, "avg_logprob": -0.12627686828863424, "compression_ratio": 1.7043795620437956, + "no_speech_prob": 6.328119343379512e-05}, {"id": 367, "seek": 136586, "start": 1375.1399999999999, + "end": 1376.9399999999998, "text": " You can concatenate these together", "tokens": + [50828, 509, 393, 1588, 7186, 473, 613, 1214, 50918], "temperature": 0.0, "avg_logprob": + -0.12627686828863424, "compression_ratio": 1.7043795620437956, "no_speech_prob": + 6.328119343379512e-05}, {"id": 368, "seek": 136586, "start": 1376.9399999999998, + "end": 1378.9799999999998, "text": " and you can do dimensionality reduction", "tokens": + [50918, 293, 291, 393, 360, 10139, 1860, 11004, 51020], "temperature": 0.0, 
"avg_logprob": + -0.12627686828863424, "compression_ratio": 1.7043795620437956, "no_speech_prob": + 6.328119343379512e-05}, {"id": 369, "seek": 136586, "start": 1378.9799999999998, + "end": 1380.1399999999999, "text": " or you can stack them.", "tokens": [51020, + 420, 291, 393, 8630, 552, 13, 51078], "temperature": 0.0, "avg_logprob": -0.12627686828863424, + "compression_ratio": 1.7043795620437956, "no_speech_prob": 6.328119343379512e-05}, + {"id": 370, "seek": 136586, "start": 1380.1399999999999, "end": 1382.02, "text": + " I''m not gonna get into those today.", "tokens": [51078, 286, 478, 406, 799, 483, + 666, 729, 965, 13, 51172], "temperature": 0.0, "avg_logprob": -0.12627686828863424, + "compression_ratio": 1.7043795620437956, "no_speech_prob": 6.328119343379512e-05}, + {"id": 371, "seek": 136586, "start": 1382.02, "end": 1385.54, "text": " But the + reality is we''ve got a lot of tools at our disposal", "tokens": [51172, 583, 264, + 4103, 307, 321, 600, 658, 257, 688, 295, 3873, 412, 527, 26400, 51348], "temperature": + 0.0, "avg_logprob": -0.12627686828863424, "compression_ratio": 1.7043795620437956, + "no_speech_prob": 6.328119343379512e-05}, {"id": 372, "seek": 136586, "start": 1385.54, + "end": 1388.26, "text": " to be able to query and get at data", "tokens": [51348, + 281, 312, 1075, 281, 14581, 293, 483, 412, 1412, 51484], "temperature": 0.0, "avg_logprob": + -0.12627686828863424, "compression_ratio": 1.7043795620437956, "no_speech_prob": + 6.328119343379512e-05}, {"id": 373, "seek": 136586, "start": 1388.26, "end": 1389.9399999999998, + "text": " and relate it in different ways.", "tokens": [51484, 293, 10961, 309, + 294, 819, 2098, 13, 51568], "temperature": 0.0, "avg_logprob": -0.12627686828863424, + "compression_ratio": 1.7043795620437956, "no_speech_prob": 6.328119343379512e-05}, + {"id": 374, "seek": 136586, "start": 1389.9399999999998, "end": 1394.1799999999998, + "text": " In fact, what I described for hybrid searches", "tokens": [51568, 682, 
+ 1186, 11, 437, 286, 7619, 337, 13051, 26701, 51780], "temperature": 0.0, "avg_logprob": + -0.12627686828863424, "compression_ratio": 1.7043795620437956, "no_speech_prob": + 6.328119343379512e-05}, {"id": 375, "seek": 139418, "start": 1394.26, "end": 1397.02, + "text": " I can go with RRF is just scratching the surface", "tokens": [50368, 286, + 393, 352, 365, 497, 49, 37, 307, 445, 29699, 264, 3753, 50506], "temperature": 0.0, + "avg_logprob": -0.2643263006723055, "compression_ratio": 1.5272727272727273, "no_speech_prob": + 0.0011563804000616074}, {"id": 376, "seek": 139418, "start": 1397.02, "end": 1401.1000000000001, + "text": " for what we can do with combining different paradigms.", "tokens": [50506, + 337, 437, 321, 393, 360, 365, 21928, 819, 13480, 328, 2592, 13, 50710], "temperature": + 0.0, "avg_logprob": -0.2643263006723055, "compression_ratio": 1.5272727272727273, + "no_speech_prob": 0.0011563804000616074}, {"id": 377, "seek": 139418, "start": 1401.1000000000001, + "end": 1403.5800000000002, "text": " And so this spectrum here on the left,", "tokens": + [50710, 400, 370, 341, 11143, 510, 322, 264, 1411, 11, 50834], "temperature": 0.0, + "avg_logprob": -0.2643263006723055, "compression_ratio": 1.5272727272727273, "no_speech_prob": + 0.0011563804000616074}, {"id": 378, "seek": 139418, "start": 1403.5800000000002, + "end": 1408.5800000000002, "text": " this is token matching or traditional electrical + search.", "tokens": [50834, 341, 307, 14862, 14324, 420, 5164, 12147, 3164, 13, + 51084], "temperature": 0.0, "avg_logprob": -0.2643263006723055, "compression_ratio": + 1.5272727272727273, "no_speech_prob": 0.0011563804000616074}, {"id": 379, "seek": + 139418, "start": 1412.14, "end": 1417.14, "text": " And you''ll see that things + like TF IDF,", "tokens": [51262, 400, 291, 603, 536, 300, 721, 411, 40964, 7348, + 37, 11, 51512], "temperature": 0.0, "avg_logprob": -0.2643263006723055, "compression_ratio": + 1.5272727272727273, "no_speech_prob": 
0.0011563804000616074}, {"id": 380, "seek": + 139418, "start": 1417.54, "end": 1420.6200000000001, "text": " will be a matching + those kinds of things fit over here.", "tokens": [51532, 486, 312, 257, 14324, 729, + 3685, 295, 721, 3318, 670, 510, 13, 51686], "temperature": 0.0, "avg_logprob": -0.2643263006723055, + "compression_ratio": 1.5272727272727273, "no_speech_prob": 0.0011563804000616074}, + {"id": 381, "seek": 139418, "start": 1420.6200000000001, "end": 1422.74, "text": + " We''ve also, let me just check the, yeah.", "tokens": [51686, 492, 600, 611, 11, + 718, 385, 445, 1520, 264, 11, 1338, 13, 51792], "temperature": 0.0, "avg_logprob": + -0.2643263006723055, "compression_ratio": 1.5272727272727273, "no_speech_prob": + 0.0011563804000616074}, {"id": 382, "seek": 142274, "start": 1422.74, "end": 1426.26, + "text": " Okay, we''ve also got on the opposite of the spectrum", "tokens": [50364, + 1033, 11, 321, 600, 611, 658, 322, 264, 6182, 295, 264, 11143, 50540], "temperature": + 0.0, "avg_logprob": -0.1475802584811374, "compression_ratio": 1.688, "no_speech_prob": + 0.00024897450930438936}, {"id": 383, "seek": 142274, "start": 1426.26, "end": 1427.66, + "text": " this dense vector search.", "tokens": [50540, 341, 18011, 8062, 3164, + 13, 50610], "temperature": 0.0, "avg_logprob": -0.1475802584811374, "compression_ratio": + 1.688, "no_speech_prob": 0.00024897450930438936}, {"id": 384, "seek": 142274, "start": + 1427.66, "end": 1430.6200000000001, "text": " And of course the RRF would fall in + here", "tokens": [50610, 400, 295, 1164, 264, 497, 49, 37, 576, 2100, 294, 510, + 50758], "temperature": 0.0, "avg_logprob": -0.1475802584811374, "compression_ratio": + 1.688, "no_speech_prob": 0.00024897450930438936}, {"id": 385, "seek": 142274, "start": + 1430.6200000000001, "end": 1434.3, "text": " in this sort of hybrid sparse retrieval", + "tokens": [50758, 294, 341, 1333, 295, 13051, 637, 11668, 19817, 3337, 50942], "temperature": + 0.0, "avg_logprob": 
-0.1475802584811374, "compression_ratio": 1.688, "no_speech_prob": + 0.00024897450930438936}, {"id": 386, "seek": 142274, "start": 1434.3, "end": 1436.46, + "text": " in dense vector search where we''re running them", "tokens": [50942, 294, + 18011, 8062, 3164, 689, 321, 434, 2614, 552, 51050], "temperature": 0.0, "avg_logprob": + -0.1475802584811374, "compression_ratio": 1.688, "no_speech_prob": 0.00024897450930438936}, + {"id": 387, "seek": 142274, "start": 1436.46, "end": 1440.34, "text": " independently + and in parallel and combining the results.", "tokens": [51050, 21761, 293, 294, + 8952, 293, 21928, 264, 3542, 13, 51244], "temperature": 0.0, "avg_logprob": -0.1475802584811374, + "compression_ratio": 1.688, "no_speech_prob": 0.00024897450930438936}, {"id": 388, + "seek": 142274, "start": 1440.34, "end": 1443.54, "text": " But there''s also mechanisms + where we could,", "tokens": [51244, 583, 456, 311, 611, 15902, 689, 321, 727, 11, + 51404], "temperature": 0.0, "avg_logprob": -0.1475802584811374, "compression_ratio": + 1.688, "no_speech_prob": 0.00024897450930438936}, {"id": 389, "seek": 142274, "start": + 1443.54, "end": 1446.06, "text": " for example, run sparse retrieval first", "tokens": + [51404, 337, 1365, 11, 1190, 637, 11668, 19817, 3337, 700, 51530], "temperature": + 0.0, "avg_logprob": -0.1475802584811374, "compression_ratio": 1.688, "no_speech_prob": + 0.00024897450930438936}, {"id": 390, "seek": 142274, "start": 1446.06, "end": 1449.02, + "text": " and then re-rank using dense embeddings", "tokens": [51530, 293, 550, + 319, 12, 20479, 1228, 18011, 12240, 29432, 51678], "temperature": 0.0, "avg_logprob": + -0.1475802584811374, "compression_ratio": 1.688, "no_speech_prob": 0.00024897450930438936}, + {"id": 391, "seek": 142274, "start": 1449.02, "end": 1451.7, "text": " or something + like with mini coil,", "tokens": [51678, 420, 746, 411, 365, 8382, 22225, 11, 51812], + "temperature": 0.0, "avg_logprob": -0.1475802584811374, "compression_ratio": 
1.688, + "no_speech_prob": 0.00024897450930438936}, {"id": 392, "seek": 145170, "start": + 1451.7, "end": 1453.78, "text": " which I mentioned Jenny from Quadrant''s", "tokens": + [50364, 597, 286, 2835, 20580, 490, 29619, 7541, 311, 50468], "temperature": 0.0, + "avg_logprob": -0.1499513584932835, "compression_ratio": 1.7763578274760383, "no_speech_prob": + 0.00028708824538625777}, {"id": 393, "seek": 145170, "start": 1453.78, "end": 1456.42, + "text": " going to come talk to us about in the AI powered search course.", "tokens": + [50468, 516, 281, 808, 751, 281, 505, 466, 294, 264, 7318, 17786, 3164, 1164, 13, + 50600], "temperature": 0.0, "avg_logprob": -0.1499513584932835, "compression_ratio": + 1.7763578274760383, "no_speech_prob": 0.00028708824538625777}, {"id": 394, "seek": + 145170, "start": 1456.42, "end": 1459.3, "text": " You can actually run a sparse + search", "tokens": [50600, 509, 393, 767, 1190, 257, 637, 11668, 3164, 50744], "temperature": + 0.0, "avg_logprob": -0.1499513584932835, "compression_ratio": 1.7763578274760383, + "no_speech_prob": 0.00028708824538625777}, {"id": 395, "seek": 145170, "start": + 1459.3, "end": 1462.78, "text": " and have embeddings that are sort of adding additional", + "tokens": [50744, 293, 362, 12240, 29432, 300, 366, 1333, 295, 5127, 4497, 50918], + "temperature": 0.0, "avg_logprob": -0.1499513584932835, "compression_ratio": 1.7763578274760383, + "no_speech_prob": 0.00028708824538625777}, {"id": 396, "seek": 145170, "start": + 1462.78, "end": 1465.5, "text": " semantic data to your electrical queries", "tokens": + [50918, 47982, 1412, 281, 428, 12147, 24109, 51054], "temperature": 0.0, "avg_logprob": + -0.1499513584932835, "compression_ratio": 1.7763578274760383, "no_speech_prob": + 0.00028708824538625777}, {"id": 397, "seek": 145170, "start": 1465.5, "end": 1468.46, + "text": " to be able to better leverage semantics", "tokens": [51054, 281, 312, + 1075, 281, 1101, 13982, 4361, 45298, 51202], "temperature": 0.0, 
"avg_logprob": + -0.1499513584932835, "compression_ratio": 1.7763578274760383, "no_speech_prob": + 0.00028708824538625777}, {"id": 398, "seek": 145170, "start": 1468.46, "end": 1469.98, + "text": " as part of your sparse search.", "tokens": [51202, 382, 644, 295, 428, + 637, 11668, 3164, 13, 51278], "temperature": 0.0, "avg_logprob": -0.1499513584932835, + "compression_ratio": 1.7763578274760383, "no_speech_prob": 0.00028708824538625777}, + {"id": 399, "seek": 145170, "start": 1469.98, "end": 1471.6200000000001, "text": + " There''s splay, there''s semantic knowledge graphs,", "tokens": [51278, 821, 311, + 262, 2858, 11, 456, 311, 47982, 3601, 24877, 11, 51360], "temperature": 0.0, "avg_logprob": + -0.1499513584932835, "compression_ratio": 1.7763578274760383, "no_speech_prob": + 0.00028708824538625777}, {"id": 400, "seek": 145170, "start": 1471.6200000000001, + "end": 1472.98, "text": " there''s all these different techniques", "tokens": [51360, + 456, 311, 439, 613, 819, 7512, 51428], "temperature": 0.0, "avg_logprob": -0.1499513584932835, + "compression_ratio": 1.7763578274760383, "no_speech_prob": 0.00028708824538625777}, + {"id": 401, "seek": 145170, "start": 1472.98, "end": 1474.7, "text": " that we can + use to get better search,", "tokens": [51428, 300, 321, 393, 764, 281, 483, 1101, + 3164, 11, 51514], "temperature": 0.0, "avg_logprob": -0.1499513584932835, "compression_ratio": + 1.7763578274760383, "no_speech_prob": 0.00028708824538625777}, {"id": 402, "seek": + 145170, "start": 1474.7, "end": 1477.38, "text": " whether it''s hybrid search or + leveraging one of the techniques.", "tokens": [51514, 1968, 309, 311, 13051, 3164, + 420, 32666, 472, 295, 264, 7512, 13, 51648], "temperature": 0.0, "avg_logprob": + -0.1499513584932835, "compression_ratio": 1.7763578274760383, "no_speech_prob": + 0.00028708824538625777}, {"id": 403, "seek": 145170, "start": 1477.38, "end": 1480.78, + "text": " But I want to just like mention that there''s lots of ways", "tokens": 
+ [51648, 583, 286, 528, 281, 445, 411, 2152, 300, 456, 311, 3195, 295, 2098, 51818], + "temperature": 0.0, "avg_logprob": -0.1499513584932835, "compression_ratio": 1.7763578274760383, + "no_speech_prob": 0.00028708824538625777}, {"id": 404, "seek": 148078, "start": + 1480.78, "end": 1482.62, "text": " to deal with embeddings and to deal", "tokens": + [50364, 281, 2028, 365, 12240, 29432, 293, 281, 2028, 50456], "temperature": 0.0, + "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, "no_speech_prob": + 0.0002039766841335222}, {"id": 405, "seek": 148078, "start": 1482.62, "end": 1485.66, + "text": " with sparse and dense vectors to combine them", "tokens": [50456, 365, + 637, 11668, 293, 18011, 18875, 281, 10432, 552, 50608], "temperature": 0.0, "avg_logprob": + -0.1386562796200023, "compression_ratio": 1.9735849056603774, "no_speech_prob": + 0.0002039766841335222}, {"id": 406, "seek": 148078, "start": 1485.66, "end": 1488.62, + "text": " to improve query understanding and to improve recall.", "tokens": [50608, + 281, 3470, 14581, 3701, 293, 281, 3470, 9901, 13, 50756], "temperature": 0.0, "avg_logprob": + -0.1386562796200023, "compression_ratio": 1.9735849056603774, "no_speech_prob": + 0.0002039766841335222}, {"id": 407, "seek": 148078, "start": 1488.62, "end": 1493.3799999999999, + "text": " And so one of the things that I''m experimenting with", "tokens": [50756, + 400, 370, 472, 295, 264, 721, 300, 286, 478, 29070, 365, 50994], "temperature": + 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 0.0002039766841335222}, {"id": 408, "seek": 148078, "start": 1493.3799999999999, + "end": 1495.1399999999999, "text": " is sort of like an emerging way to do this", + "tokens": [50994, 307, 1333, 295, 411, 364, 14989, 636, 281, 360, 341, 51082], "temperature": + 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 0.0002039766841335222}, {"id": 
409, "seek": 148078, "start": 1495.1399999999999, + "end": 1496.98, "text": " is something I''m calling wormhole vectors.", "tokens": + [51082, 307, 746, 286, 478, 5141, 23835, 14094, 18875, 13, 51174], "temperature": + 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 0.0002039766841335222}, {"id": 410, "seek": 148078, "start": 1496.98, + "end": 1499.86, "text": " And the idea of wormhole vectors is that I''ve got", "tokens": + [51174, 400, 264, 1558, 295, 23835, 14094, 18875, 307, 300, 286, 600, 658, 51318], + "temperature": 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 0.0002039766841335222}, {"id": 411, "seek": 148078, "start": 1499.86, + "end": 1502.3, "text": " these sort of different vector spaces.", "tokens": [51318, + 613, 1333, 295, 819, 8062, 7673, 13, 51440], "temperature": 0.0, "avg_logprob": + -0.1386562796200023, "compression_ratio": 1.9735849056603774, "no_speech_prob": + 0.0002039766841335222}, {"id": 412, "seek": 148078, "start": 1502.3, "end": 1504.5, + "text": " I''ve got my sparse lexical vector space,", "tokens": [51440, 286, 600, + 658, 452, 637, 11668, 476, 87, 804, 8062, 1901, 11, 51550], "temperature": 0.0, + "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, "no_speech_prob": + 0.0002039766841335222}, {"id": 413, "seek": 148078, "start": 1504.5, "end": 1505.66, + "text": " which we talked about.", "tokens": [51550, 597, 321, 2825, 466, 13, 51608], + "temperature": 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 0.0002039766841335222}, {"id": 414, "seek": 148078, "start": 1505.66, + "end": 1507.98, "text": " I''ve got my dense semantic vector space.", "tokens": + [51608, 286, 600, 658, 452, 18011, 47982, 8062, 1901, 13, 51724], "temperature": + 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 
0.0002039766841335222}, {"id": 415, "seek": 148078, "start": 1507.98, + "end": 1510.62, "text": " And then I mentioned we can generate behavioral vector", + "tokens": [51724, 400, 550, 286, 2835, 321, 393, 8460, 19124, 8062, 51856], "temperature": + 0.0, "avg_logprob": -0.1386562796200023, "compression_ratio": 1.9735849056603774, + "no_speech_prob": 0.0002039766841335222}, {"id": 416, "seek": 151062, "start": 1510.62, + "end": 1513.6999999999998, "text": " spaces, which I''ll show in just a little bit.", + "tokens": [50364, 7673, 11, 597, 286, 603, 855, 294, 445, 257, 707, 857, 13, 50518], + "temperature": 0.0, "avg_logprob": -0.13930073430982687, "compression_ratio": 1.6730769230769231, + "no_speech_prob": 0.000653287919703871}, {"id": 417, "seek": 151062, "start": 1513.6999999999998, + "end": 1517.82, "text": " And so I want to walk through what this technique looks + like.", "tokens": [50518, 400, 370, 286, 528, 281, 1792, 807, 437, 341, 6532, 1542, + 411, 13, 50724], "temperature": 0.0, "avg_logprob": -0.13930073430982687, "compression_ratio": + 1.6730769230769231, "no_speech_prob": 0.000653287919703871}, {"id": 418, "seek": + 151062, "start": 1517.82, "end": 1521.1799999999998, "text": " And I do want to + frame this talk as this is sort of like", "tokens": [50724, 400, 286, 360, 528, + 281, 3920, 341, 751, 382, 341, 307, 1333, 295, 411, 50892], "temperature": 0.0, + "avg_logprob": -0.13930073430982687, "compression_ratio": 1.6730769230769231, "no_speech_prob": + 0.000653287919703871}, {"id": 419, "seek": 151062, "start": 1521.1799999999998, + "end": 1522.7399999999998, "text": " new and emerging.", "tokens": [50892, 777, + 293, 14989, 13, 50970], "temperature": 0.0, "avg_logprob": -0.13930073430982687, + "compression_ratio": 1.6730769230769231, "no_speech_prob": 0.000653287919703871}, + {"id": 420, "seek": 151062, "start": 1522.7399999999998, "end": 1526.6999999999998, + "text": " I''ve got lots of experience doing some of this", "tokens": [50970, 286, + 600, 
658, 3195, 295, 1752, 884, 512, 295, 341, 51168], "temperature": 0.0, "avg_logprob": + -0.13930073430982687, "compression_ratio": 1.6730769230769231, "no_speech_prob": + 0.000653287919703871}, {"id": 421, "seek": 151062, "start": 1526.6999999999998, + "end": 1527.9799999999998, "text": " across different vector spaces.", "tokens": + [51168, 2108, 819, 8062, 7673, 13, 51232], "temperature": 0.0, "avg_logprob": -0.13930073430982687, + "compression_ratio": 1.6730769230769231, "no_speech_prob": 0.000653287919703871}, + {"id": 422, "seek": 151062, "start": 1527.9799999999998, "end": 1531.02, "text": + " But there''s a lot of things that I so need to iron out", "tokens": [51232, 583, + 456, 311, 257, 688, 295, 721, 300, 286, 370, 643, 281, 6497, 484, 51384], "temperature": + 0.0, "avg_logprob": -0.13930073430982687, "compression_ratio": 1.6730769230769231, + "no_speech_prob": 0.000653287919703871}, {"id": 423, "seek": 151062, "start": 1531.02, + "end": 1532.8999999999999, "text": " in terms of best practices for doing this.", + "tokens": [51384, 294, 2115, 295, 1151, 7525, 337, 884, 341, 13, 51478], "temperature": + 0.0, "avg_logprob": -0.13930073430982687, "compression_ratio": 1.6730769230769231, + "no_speech_prob": 0.000653287919703871}, {"id": 424, "seek": 151062, "start": 1532.8999999999999, + "end": 1536.9799999999998, "text": " So treat this as something that''s emerging", + "tokens": [51478, 407, 2387, 341, 382, 746, 300, 311, 14989, 51682], "temperature": + 0.0, "avg_logprob": -0.13930073430982687, "compression_ratio": 1.6730769230769231, + "no_speech_prob": 0.000653287919703871}, {"id": 425, "seek": 151062, "start": 1536.9799999999998, + "end": 1538.1, "text": " and something you can play with.", "tokens": [51682, 293, + 746, 291, 393, 862, 365, 13, 51738], "temperature": 0.0, "avg_logprob": -0.13930073430982687, + "compression_ratio": 1.6730769230769231, "no_speech_prob": 0.000653287919703871}, + {"id": 426, "seek": 153810, "start": 1538.1, "end": 1541.1, 
"text": " And I think + the intuition will be really helpful.", "tokens": [50364, 400, 286, 519, 264, 24002, + 486, 312, 534, 4961, 13, 50514], "temperature": 0.0, "avg_logprob": -0.1733814380787037, + "compression_ratio": 1.7692307692307692, "no_speech_prob": 0.0001525890693301335}, + {"id": 427, "seek": 153810, "start": 1541.1, "end": 1543.78, "text": " But I''m, + you know, if in preparation for the course", "tokens": [50514, 583, 286, 478, 11, + 291, 458, 11, 498, 294, 13081, 337, 264, 1164, 50648], "temperature": 0.0, "avg_logprob": + -0.1733814380787037, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0001525890693301335}, {"id": 428, "seek": 153810, "start": 1543.78, "end": 1545.62, + "text": " and going forward, I''m going to be doing a lot more", "tokens": [50648, + 293, 516, 2128, 11, 286, 478, 516, 281, 312, 884, 257, 688, 544, 50740], "temperature": + 0.0, "avg_logprob": -0.1733814380787037, "compression_ratio": 1.7692307692307692, + "no_speech_prob": 0.0001525890693301335}, {"id": 429, "seek": 153810, "start": 1545.62, + "end": 1547.74, "text": " in terms of concrete examples for this.", "tokens": [50740, + 294, 2115, 295, 9859, 5110, 337, 341, 13, 50846], "temperature": 0.0, "avg_logprob": + -0.1733814380787037, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0001525890693301335}, {"id": 430, "seek": 153810, "start": 1547.74, "end": 1552.58, + "text": " And so I don''t want to get into quantum physics or, you know,", "tokens": + [50846, 400, 370, 286, 500, 380, 528, 281, 483, 666, 13018, 10649, 420, 11, 291, + 458, 11, 51088], "temperature": 0.0, "avg_logprob": -0.1733814380787037, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.0001525890693301335}, {"id": 431, "seek": + 153810, "start": 1552.58, "end": 1553.3799999999999, "text": " physics in general.", + "tokens": [51088, 10649, 294, 2674, 13, 51128], "temperature": 0.0, "avg_logprob": + -0.1733814380787037, "compression_ratio": 1.7692307692307692, 
"no_speech_prob": + 0.0001525890693301335}, {"id": 432, "seek": 153810, "start": 1553.3799999999999, + "end": 1555.5, "text": " But, you know, wormholes, if you''re not familiar,", "tokens": + [51128, 583, 11, 291, 458, 11, 23835, 37894, 11, 498, 291, 434, 406, 4963, 11, 51234], + "temperature": 0.0, "avg_logprob": -0.1733814380787037, "compression_ratio": 1.7692307692307692, + "no_speech_prob": 0.0001525890693301335}, {"id": 433, "seek": 153810, "start": 1555.5, + "end": 1559.1, "text": " are essentially passages through space time.", "tokens": + [51234, 366, 4476, 31589, 807, 1901, 565, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.1733814380787037, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0001525890693301335}, {"id": 434, "seek": 153810, "start": 1559.1, "end": 1562.1799999999998, + "text": " You can think of it as, you know, the ability to, you know,", "tokens": + [51414, 509, 393, 519, 295, 309, 382, 11, 291, 458, 11, 264, 3485, 281, 11, 291, + 458, 11, 51568], "temperature": 0.0, "avg_logprob": -0.1733814380787037, "compression_ratio": + 1.7692307692307692, "no_speech_prob": 0.0001525890693301335}, {"id": 435, "seek": + 153810, "start": 1562.1799999999998, "end": 1566.1399999999999, "text": " go from + one point in space to another point in space", "tokens": [51568, 352, 490, 472, + 935, 294, 1901, 281, 1071, 935, 294, 1901, 51766], "temperature": 0.0, "avg_logprob": + -0.1733814380787037, "compression_ratio": 1.7692307692307692, "no_speech_prob": + 0.0001525890693301335}, {"id": 436, "seek": 156614, "start": 1566.14, "end": 1568.66, + "text": " and essentially like hop there instantly.", "tokens": [50364, 293, 4476, + 411, 3818, 456, 13518, 13, 50490], "temperature": 0.0, "avg_logprob": -0.25437104317449755, + "compression_ratio": 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, + {"id": 437, "seek": 156614, "start": 1568.66, "end": 1570.7800000000002, "text": + " I could get into Einstein Rosenberg''s", "tokens": 
[50490, 286, 727, 483, 666, + 23486, 33630, 6873, 311, 50596], "temperature": 0.0, "avg_logprob": -0.25437104317449755, + "compression_ratio": 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, + {"id": 438, "seek": 156614, "start": 1570.7800000000002, "end": 1572.6200000000001, + "text": " and all that kind of stuff, but don''t really want to", "tokens": [50596, + 293, 439, 300, 733, 295, 1507, 11, 457, 500, 380, 534, 528, 281, 50688], "temperature": + 0.0, "avg_logprob": -0.25437104317449755, "compression_ratio": 1.5805243445692885, + "no_speech_prob": 0.001434129080735147}, {"id": 439, "seek": 156614, "start": 1572.6200000000001, + "end": 1575.5, "text": " for purposes of today.", "tokens": [50688, 337, 9932, 295, + 965, 13, 50832], "temperature": 0.0, "avg_logprob": -0.25437104317449755, "compression_ratio": + 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, {"id": 440, "seek": + 156614, "start": 1575.5, "end": 1579.46, "text": " And what I do want to do though + is talk about,", "tokens": [50832, 400, 437, 286, 360, 528, 281, 360, 1673, 307, + 751, 466, 11, 51030], "temperature": 0.0, "avg_logprob": -0.25437104317449755, "compression_ratio": + 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, {"id": 441, "seek": + 156614, "start": 1579.46, "end": 1580.9, "text": " oh, give me a second.", "tokens": + [51030, 1954, 11, 976, 385, 257, 1150, 13, 51102], "temperature": 0.0, "avg_logprob": + -0.25437104317449755, "compression_ratio": 1.5805243445692885, "no_speech_prob": + 0.001434129080735147}, {"id": 442, "seek": 156614, "start": 1580.9, "end": 1582.8200000000002, + "text": " Well, I''ll skip over this.", "tokens": [51102, 1042, 11, 286, 603, 10023, + 670, 341, 13, 51198], "temperature": 0.0, "avg_logprob": -0.25437104317449755, "compression_ratio": + 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, {"id": 443, "seek": + 156614, "start": 1582.8200000000002, "end": 1584.66, "text": " We''ll maybe come + to that in the 
Q&A", "tokens": [51198, 492, 603, 1310, 808, 281, 300, 294, 264, + 1249, 5, 32, 51290], "temperature": 0.0, "avg_logprob": -0.25437104317449755, "compression_ratio": + 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, {"id": 444, "seek": + 156614, "start": 1584.66, "end": 1589.8600000000001, "text": " if Demetri''s interested + in talking about this notion", "tokens": [51290, 498, 4686, 302, 470, 311, 3102, + 294, 1417, 466, 341, 10710, 51550], "temperature": 0.0, "avg_logprob": -0.25437104317449755, + "compression_ratio": 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, + {"id": 445, "seek": 156614, "start": 1589.8600000000001, "end": 1593.6200000000001, + "text": " of entanglement and how that relates to wormholes.", "tokens": [51550, + 295, 948, 656, 3054, 293, 577, 300, 16155, 281, 23835, 37894, 13, 51738], "temperature": + 0.0, "avg_logprob": -0.25437104317449755, "compression_ratio": 1.5805243445692885, + "no_speech_prob": 0.001434129080735147}, {"id": 446, "seek": 156614, "start": 1593.6200000000001, + "end": 1594.9, "text": " It might be interesting later.", "tokens": [51738, 467, + 1062, 312, 1880, 1780, 13, 51802], "temperature": 0.0, "avg_logprob": -0.25437104317449755, + "compression_ratio": 1.5805243445692885, "no_speech_prob": 0.001434129080735147}, + {"id": 447, "seek": 159490, "start": 1594.9, "end": 1598.3400000000001, "text": + " But I don''t, this is about search, not about quantum physics", "tokens": [50364, + 583, 286, 500, 380, 11, 341, 307, 466, 3164, 11, 406, 466, 13018, 10649, 50536], + "temperature": 0.0, "avg_logprob": -0.17942346152612718, "compression_ratio": 1.8705357142857142, + "no_speech_prob": 0.00015755061758682132}, {"id": 448, "seek": 159490, "start": + 1598.3400000000001, "end": 1599.3400000000001, "text": " and physics in general.", + "tokens": [50536, 293, 10649, 294, 2674, 13, 50586], "temperature": 0.0, "avg_logprob": + -0.17942346152612718, "compression_ratio": 1.8705357142857142, "no_speech_prob": + 
0.00015755061758682132}, {"id": 449, "seek": 159490, "start": 1599.3400000000001, + "end": 1603.42, "text": " So this is what it means to generate a wormhole vector", + "tokens": [50586, 407, 341, 307, 437, 309, 1355, 281, 8460, 257, 23835, 14094, 8062, + 50790], "temperature": 0.0, "avg_logprob": -0.17942346152612718, "compression_ratio": + 1.8705357142857142, "no_speech_prob": 0.00015755061758682132}, {"id": 450, "seek": + 159490, "start": 1603.42, "end": 1604.6200000000001, "text": " bi-practically.", + "tokens": [50790, 3228, 12, 42559, 984, 13, 50850], "temperature": 0.0, "avg_logprob": + -0.17942346152612718, "compression_ratio": 1.8705357142857142, "no_speech_prob": + 0.00015755061758682132}, {"id": 451, "seek": 159490, "start": 1604.6200000000001, + "end": 1609.14, "text": " So if you want to generate a wormhole vector,", "tokens": + [50850, 407, 498, 291, 528, 281, 8460, 257, 23835, 14094, 8062, 11, 51076], "temperature": + 0.0, "avg_logprob": -0.17942346152612718, "compression_ratio": 1.8705357142857142, + "no_speech_prob": 0.00015755061758682132}, {"id": 452, "seek": 159490, "start": + 1609.14, "end": 1612.02, "text": " there''s a fundamental base reality", "tokens": + [51076, 456, 311, 257, 8088, 3096, 4103, 51220], "temperature": 0.0, "avg_logprob": + -0.17942346152612718, "compression_ratio": 1.8705357142857142, "no_speech_prob": + 0.00015755061758682132}, {"id": 453, "seek": 159490, "start": 1612.02, "end": 1614.3400000000001, + "text": " of all these vector spaces, meaning if I query", "tokens": [51220, 295, + 439, 613, 8062, 7673, 11, 3620, 498, 286, 14581, 51336], "temperature": 0.0, "avg_logprob": + -0.17942346152612718, "compression_ratio": 1.8705357142857142, "no_speech_prob": + 0.00015755061758682132}, {"id": 454, "seek": 159490, "start": 1614.3400000000001, + "end": 1616.74, "text": " with an embedding and a dense vector space,", "tokens": + [51336, 365, 364, 12240, 3584, 293, 257, 18011, 8062, 1901, 11, 51456], "temperature": + 0.0, 
"avg_logprob": -0.17942346152612718, "compression_ratio": 1.8705357142857142, + "no_speech_prob": 0.00015755061758682132}, {"id": 455, "seek": 159490, "start": + 1616.74, "end": 1619.5, "text": " or I query with a lexical query over here,", "tokens": + [51456, 420, 286, 14581, 365, 257, 476, 87, 804, 14581, 670, 510, 11, 51594], "temperature": + 0.0, "avg_logprob": -0.17942346152612718, "compression_ratio": 1.8705357142857142, + "no_speech_prob": 0.00015755061758682132}, {"id": 456, "seek": 159490, "start": + 1619.5, "end": 1623.18, "text": " or I query with IDs and user behavior over here,", + "tokens": [51594, 420, 286, 14581, 365, 48212, 293, 4195, 5223, 670, 510, 11, 51778], + "temperature": 0.0, "avg_logprob": -0.17942346152612718, "compression_ratio": 1.8705357142857142, + "no_speech_prob": 0.00015755061758682132}, {"id": 457, "seek": 162318, "start": + 1623.18, "end": 1628.5800000000002, "text": " all of those queries ultimately boil + down to matching something.", "tokens": [50364, 439, 295, 729, 24109, 6284, 13329, + 760, 281, 14324, 746, 13, 50634], "temperature": 0.0, "avg_logprob": -0.12074902424445519, + "compression_ratio": 1.7883817427385893, "no_speech_prob": 0.0001317524875048548}, + {"id": 458, "seek": 162318, "start": 1628.5800000000002, "end": 1631.54, "text": + " And the something that they match is really critical", "tokens": [50634, 400, + 264, 746, 300, 436, 2995, 307, 534, 4924, 50782], "temperature": 0.0, "avg_logprob": + -0.12074902424445519, "compression_ratio": 1.7883817427385893, "no_speech_prob": + 0.0001317524875048548}, {"id": 459, "seek": 162318, "start": 1631.54, "end": 1634.9, + "text": " to how we understand queries and how we understand relevance.", "tokens": + [50782, 281, 577, 321, 1223, 24109, 293, 577, 321, 1223, 32684, 13, 50950], "temperature": + 0.0, "avg_logprob": -0.12074902424445519, "compression_ratio": 1.7883817427385893, + "no_speech_prob": 0.0001317524875048548}, {"id": 460, "seek": 162318, "start": 1634.9, + 
"end": 1637.8200000000002, "text": " And what they boil down to is a document set.", + "tokens": [50950, 400, 437, 436, 13329, 760, 281, 307, 257, 4166, 992, 13, 51096], + "temperature": 0.0, "avg_logprob": -0.12074902424445519, "compression_ratio": 1.7883817427385893, + "no_speech_prob": 0.0001317524875048548}, {"id": 461, "seek": 162318, "start": 1637.8200000000002, + "end": 1640.5800000000002, "text": " So if you run an embedding search over here,", + "tokens": [51096, 407, 498, 291, 1190, 364, 12240, 3584, 3164, 670, 510, 11, 51234], + "temperature": 0.0, "avg_logprob": -0.12074902424445519, "compression_ratio": 1.7883817427385893, + "no_speech_prob": 0.0001317524875048548}, {"id": 462, "seek": 162318, "start": 1640.5800000000002, + "end": 1642.98, "text": " you find a point in vector space.", "tokens": [51234, + 291, 915, 257, 935, 294, 8062, 1901, 13, 51354], "temperature": 0.0, "avg_logprob": + -0.12074902424445519, "compression_ratio": 1.7883817427385893, "no_speech_prob": + 0.0001317524875048548}, {"id": 463, "seek": 162318, "start": 1642.98, "end": 1646.54, + "text": " And if it''s a dense space, you typically", "tokens": [51354, 400, 498, + 309, 311, 257, 18011, 1901, 11, 291, 5850, 51532], "temperature": 0.0, "avg_logprob": + -0.12074902424445519, "compression_ratio": 1.7883817427385893, "no_speech_prob": + 0.0001317524875048548}, {"id": 464, "seek": 162318, "start": 1646.54, "end": 1649.26, + "text": " do an approximate nearest neighbor algorithm,", "tokens": [51532, 360, + 364, 30874, 23831, 5987, 9284, 11, 51668], "temperature": 0.0, "avg_logprob": -0.12074902424445519, + "compression_ratio": 1.7883817427385893, "no_speech_prob": 0.0001317524875048548}, + {"id": 465, "seek": 162318, "start": 1649.26, "end": 1652.46, "text": " or otherwise + find the nearest neighbors", "tokens": [51668, 420, 5911, 915, 264, 23831, 12512, + 51828], "temperature": 0.0, "avg_logprob": -0.12074902424445519, "compression_ratio": + 1.7883817427385893, "no_speech_prob": 
0.0001317524875048548}, {"id": 466, "seek": + 165246, "start": 1652.46, "end": 1653.94, "text": " to whatever point you''re querying.", + "tokens": [50364, 281, 2035, 935, 291, 434, 7083, 1840, 13, 50438], "temperature": + 0.0, "avg_logprob": -0.15615109879841177, "compression_ratio": 1.8913857677902621, + "no_speech_prob": 0.0002027682203333825}, {"id": 467, "seek": 165246, "start": 1653.94, + "end": 1656.46, "text": " And those are your relevant documents.", "tokens": [50438, + 400, 729, 366, 428, 7340, 8512, 13, 50564], "temperature": 0.0, "avg_logprob": -0.15615109879841177, + "compression_ratio": 1.8913857677902621, "no_speech_prob": 0.0002027682203333825}, + {"id": 468, "seek": 165246, "start": 1656.46, "end": 1657.98, "text": " Those documents + form a set.", "tokens": [50564, 3950, 8512, 1254, 257, 992, 13, 50640], "temperature": + 0.0, "avg_logprob": -0.15615109879841177, "compression_ratio": 1.8913857677902621, + "no_speech_prob": 0.0002027682203333825}, {"id": 469, "seek": 165246, "start": 1657.98, + "end": 1660.26, "text": " And you can cut off the threshold at any point to say,", + "tokens": [50640, 400, 291, 393, 1723, 766, 264, 14678, 412, 604, 935, 281, 584, + 11, 50754], "temperature": 0.0, "avg_logprob": -0.15615109879841177, "compression_ratio": + 1.8913857677902621, "no_speech_prob": 0.0002027682203333825}, {"id": 470, "seek": + 165246, "start": 1660.26, "end": 1662.1000000000001, "text": " these are the documents + that matched.", "tokens": [50754, 613, 366, 264, 8512, 300, 21447, 13, 50846], "temperature": + 0.0, "avg_logprob": -0.15615109879841177, "compression_ratio": 1.8913857677902621, + "no_speech_prob": 0.0002027682203333825}, {"id": 471, "seek": 165246, "start": 1662.1000000000001, + "end": 1664.46, "text": " But that set of documents collectively", "tokens": [50846, + 583, 300, 992, 295, 8512, 24341, 50964], "temperature": 0.0, "avg_logprob": -0.15615109879841177, + "compression_ratio": 1.8913857677902621, "no_speech_prob": 
0.0002027682203333825}, + {"id": 472, "seek": 165246, "start": 1664.46, "end": 1667.8600000000001, "text": + " has some meaning that some relationships within it", "tokens": [50964, 575, 512, + 3620, 300, 512, 6159, 1951, 309, 51134], "temperature": 0.0, "avg_logprob": -0.15615109879841177, + "compression_ratio": 1.8913857677902621, "no_speech_prob": 0.0002027682203333825}, + {"id": 473, "seek": 165246, "start": 1667.8600000000001, "end": 1670.22, "text": + " that represent the meaning of that query.", "tokens": [51134, 300, 2906, 264, + 3620, 295, 300, 14581, 13, 51252], "temperature": 0.0, "avg_logprob": -0.15615109879841177, + "compression_ratio": 1.8913857677902621, "no_speech_prob": 0.0002027682203333825}, + {"id": 474, "seek": 165246, "start": 1670.22, "end": 1673.1000000000001, "text": + " Likewise, if I do a keyword search,", "tokens": [51252, 30269, 11, 498, 286, 360, + 257, 20428, 3164, 11, 51396], "temperature": 0.0, "avg_logprob": -0.15615109879841177, + "compression_ratio": 1.8913857677902621, "no_speech_prob": 0.0002027682203333825}, + {"id": 475, "seek": 165246, "start": 1673.1000000000001, "end": 1674.7, "text": + " I find a document set.", "tokens": [51396, 286, 915, 257, 4166, 992, 13, 51476], + "temperature": 0.0, "avg_logprob": -0.15615109879841177, "compression_ratio": 1.8913857677902621, + "no_speech_prob": 0.0002027682203333825}, {"id": 476, "seek": 165246, "start": 1674.7, + "end": 1677.1000000000001, "text": " And the collection of those documents", "tokens": + [51476, 400, 264, 5765, 295, 729, 8512, 51596], "temperature": 0.0, "avg_logprob": + -0.15615109879841177, "compression_ratio": 1.8913857677902621, "no_speech_prob": + 0.0002027682203333825}, {"id": 477, "seek": 165246, "start": 1677.1000000000001, + "end": 1679.5, "text": " represents the meaning of that query,", "tokens": [51596, + 8855, 264, 3620, 295, 300, 14581, 11, 51716], "temperature": 0.0, "avg_logprob": + -0.15615109879841177, "compression_ratio": 1.8913857677902621, 
"no_speech_prob": + 0.0002027682203333825}, {"id": 478, "seek": 165246, "start": 1679.5, "end": 1681.54, + "text": " at least as we''ve been able to represent it", "tokens": [51716, 412, + 1935, 382, 321, 600, 668, 1075, 281, 2906, 309, 51818], "temperature": 0.0, "avg_logprob": + -0.15615109879841177, "compression_ratio": 1.8913857677902621, "no_speech_prob": + 0.0002027682203333825}, {"id": 479, "seek": 168154, "start": 1681.54, "end": 1683.62, + "text": " in that vector space, same thing over here.", "tokens": [50364, 294, 300, + 8062, 1901, 11, 912, 551, 670, 510, 13, 50468], "temperature": 0.0, "avg_logprob": + -0.19520183231519617, "compression_ratio": 1.8846153846153846, "no_speech_prob": + 2.7283009330858476e-05}, {"id": 480, "seek": 168154, "start": 1683.62, "end": 1685.74, + "text": " So the idea of a wormhole vector is", "tokens": [50468, 407, 264, 1558, + 295, 257, 23835, 14094, 8062, 307, 50574], "temperature": 0.0, "avg_logprob": -0.19520183231519617, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, + {"id": 481, "seek": 168154, "start": 1685.74, "end": 1690.06, "text": " if I want + to query in one vector space", "tokens": [50574, 498, 286, 528, 281, 14581, 294, + 472, 8062, 1901, 50790], "temperature": 0.0, "avg_logprob": -0.19520183231519617, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, + {"id": 482, "seek": 168154, "start": 1690.06, "end": 1693.74, "text": " and find + a corresponding region in another vector space", "tokens": [50790, 293, 915, 257, + 11760, 4458, 294, 1071, 8062, 1901, 50974], "temperature": 0.0, "avg_logprob": -0.19520183231519617, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, + {"id": 483, "seek": 168154, "start": 1693.74, "end": 1698.3799999999999, "text": + " that shares the same meaning semantically,", "tokens": [50974, 300, 12182, 264, + 912, 3620, 4361, 49505, 11, 51206], "temperature": 0.0, "avg_logprob": 
-0.19520183231519617, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, + {"id": 484, "seek": 168154, "start": 1698.3799999999999, "end": 1700.8999999999999, + "text": " then what I''ll do is I''ll query in the current vector space,", "tokens": + [51206, 550, 437, 286, 603, 360, 307, 286, 603, 14581, 294, 264, 2190, 8062, 1901, + 11, 51332], "temperature": 0.0, "avg_logprob": -0.19520183231519617, "compression_ratio": + 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, {"id": 485, "seek": + 168154, "start": 1700.8999999999999, "end": 1702.94, "text": " for example, within + embedding here.", "tokens": [51332, 337, 1365, 11, 1951, 12240, 3584, 510, 13, 51434], + "temperature": 0.0, "avg_logprob": -0.19520183231519617, "compression_ratio": 1.8846153846153846, + "no_speech_prob": 2.7283009330858476e-05}, {"id": 486, "seek": 168154, "start": + 1702.94, "end": 1703.8999999999999, "text": " Actually, let me start it here.", + "tokens": [51434, 5135, 11, 718, 385, 722, 309, 510, 13, 51482], "temperature": + 0.0, "avg_logprob": -0.19520183231519617, "compression_ratio": 1.8846153846153846, + "no_speech_prob": 2.7283009330858476e-05}, {"id": 487, "seek": 168154, "start": + 1703.8999999999999, "end": 1706.5, "text": " I''ll query in the sparse-like-school + vector space.", "tokens": [51482, 286, 603, 14581, 294, 264, 637, 11668, 12, 4092, + 12, 82, 21856, 8062, 1901, 13, 51612], "temperature": 0.0, "avg_logprob": -0.19520183231519617, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, + {"id": 488, "seek": 168154, "start": 1706.5, "end": 1708.7, "text": " I will then + find a relevant document set.", "tokens": [51612, 286, 486, 550, 915, 257, 7340, + 4166, 992, 13, 51722], "temperature": 0.0, "avg_logprob": -0.19520183231519617, + "compression_ratio": 1.8846153846153846, "no_speech_prob": 2.7283009330858476e-05}, + {"id": 489, "seek": 168154, "start": 1708.7, "end": 1709.82, "text": " This is 
what + search does.", "tokens": [51722, 639, 307, 437, 3164, 775, 13, 51778], "temperature": + 0.0, "avg_logprob": -0.19520183231519617, "compression_ratio": 1.8846153846153846, + "no_speech_prob": 2.7283009330858476e-05}, {"id": 490, "seek": 168154, "start": + 1709.82, "end": 1711.3799999999999, "text": " It finds a document set.", "tokens": + [51778, 467, 10704, 257, 4166, 992, 13, 51856], "temperature": 0.0, "avg_logprob": + -0.19520183231519617, "compression_ratio": 1.8846153846153846, "no_speech_prob": + 2.7283009330858476e-05}, {"id": 491, "seek": 171138, "start": 1711.38, "end": 1713.6200000000001, + "text": " And then I will derive a wormhole vector", "tokens": [50364, 400, 550, + 286, 486, 28446, 257, 23835, 14094, 8062, 50476], "temperature": 0.0, "avg_logprob": + -0.15189586836716223, "compression_ratio": 1.897872340425532, "no_speech_prob": + 2.2624504708801396e-05}, {"id": 492, "seek": 171138, "start": 1713.6200000000001, + "end": 1717.14, "text": " to a corresponding region of another vector space.", "tokens": + [50476, 281, 257, 11760, 4458, 295, 1071, 8062, 1901, 13, 50652], "temperature": + 0.0, "avg_logprob": -0.15189586836716223, "compression_ratio": 1.897872340425532, + "no_speech_prob": 2.2624504708801396e-05}, {"id": 493, "seek": 171138, "start": + 1717.14, "end": 1720.5, "text": " So for example, once I found a document set over + here,", "tokens": [50652, 407, 337, 1365, 11, 1564, 286, 1352, 257, 4166, 992, 670, + 510, 11, 50820], "temperature": 0.0, "avg_logprob": -0.15189586836716223, "compression_ratio": + 1.897872340425532, "no_speech_prob": 2.2624504708801396e-05}, {"id": 494, "seek": + 171138, "start": 1720.5, "end": 1723.8600000000001, "text": " I will use that document + set to generate,", "tokens": [50820, 286, 486, 764, 300, 4166, 992, 281, 8460, 11, + 50988], "temperature": 0.0, "avg_logprob": -0.15189586836716223, "compression_ratio": + 1.897872340425532, "no_speech_prob": 2.2624504708801396e-05}, {"id": 495, "seek": + 171138, 
"start": 1723.8600000000001, "end": 1725.3400000000001, "text": " what I''m + calling a wormhole vector,", "tokens": [50988, 437, 286, 478, 5141, 257, 23835, + 14094, 8062, 11, 51062], "temperature": 0.0, "avg_logprob": -0.15189586836716223, + "compression_ratio": 1.897872340425532, "no_speech_prob": 2.2624504708801396e-05}, + {"id": 496, "seek": 171138, "start": 1725.3400000000001, "end": 1728.0200000000002, + "text": " but to generate a vector that will allow me", "tokens": [51062, 457, 281, + 8460, 257, 8062, 300, 486, 2089, 385, 51196], "temperature": 0.0, "avg_logprob": + -0.15189586836716223, "compression_ratio": 1.897872340425532, "no_speech_prob": + 2.2624504708801396e-05}, {"id": 497, "seek": 171138, "start": 1728.0200000000002, + "end": 1730.74, "text": " to query in the other vector space or hop or traverse", + "tokens": [51196, 281, 14581, 294, 264, 661, 8062, 1901, 420, 3818, 420, 45674, + 51332], "temperature": 0.0, "avg_logprob": -0.15189586836716223, "compression_ratio": + 1.897872340425532, "no_speech_prob": 2.2624504708801396e-05}, {"id": 498, "seek": + 171138, "start": 1730.74, "end": 1732.6200000000001, "text": " to the other vector + space instantly,", "tokens": [51332, 281, 264, 661, 8062, 1901, 13518, 11, 51426], + "temperature": 0.0, "avg_logprob": -0.15189586836716223, "compression_ratio": 1.897872340425532, + "no_speech_prob": 2.2624504708801396e-05}, {"id": 499, "seek": 171138, "start": + 1732.6200000000001, "end": 1736.0200000000002, "text": " to a region that shares + a similar semantic meaning", "tokens": [51426, 281, 257, 4458, 300, 12182, 257, + 2531, 47982, 3620, 51596], "temperature": 0.0, "avg_logprob": -0.15189586836716223, + "compression_ratio": 1.897872340425532, "no_speech_prob": 2.2624504708801396e-05}, + {"id": 500, "seek": 171138, "start": 1736.0200000000002, "end": 1738.6200000000001, + "text": " to the region in the Lexical space.", "tokens": [51596, 281, 264, 4458, + 294, 264, 24086, 804, 1901, 13, 51726], "temperature": 
0.0, "avg_logprob": -0.15189586836716223, + "compression_ratio": 1.897872340425532, "no_speech_prob": 2.2624504708801396e-05}, + {"id": 501, "seek": 173862, "start": 1738.62, "end": 1742.58, "text": " Then once + I''ve found that vector for the other vector space,", "tokens": [50364, 1396, 1564, + 286, 600, 1352, 300, 8062, 337, 264, 661, 8062, 1901, 11, 50562], "temperature": + 0.0, "avg_logprob": -0.15582829814846233, "compression_ratio": 1.8911290322580645, + "no_speech_prob": 0.00017260682943742722}, {"id": 502, "seek": 173862, "start": + 1742.58, "end": 1744.86, "text": " I will run that query in the other vector space", + "tokens": [50562, 286, 486, 1190, 300, 14581, 294, 264, 661, 8062, 1901, 50676], + "temperature": 0.0, "avg_logprob": -0.15582829814846233, "compression_ratio": 1.8911290322580645, + "no_speech_prob": 0.00017260682943742722}, {"id": 503, "seek": 173862, "start": + 1744.86, "end": 1746.82, "text": " to traverse to that vector space,", "tokens": + [50676, 281, 45674, 281, 300, 8062, 1901, 11, 50774], "temperature": 0.0, "avg_logprob": + -0.15582829814846233, "compression_ratio": 1.8911290322580645, "no_speech_prob": + 0.00017260682943742722}, {"id": 504, "seek": 173862, "start": 1746.82, "end": 1748.4599999999998, + "text": " and then I''ll repeat as needed.", "tokens": [50774, 293, 550, 286, 603, + 7149, 382, 2978, 13, 50856], "temperature": 0.0, "avg_logprob": -0.15582829814846233, + "compression_ratio": 1.8911290322580645, "no_speech_prob": 0.00017260682943742722}, + {"id": 505, "seek": 173862, "start": 1748.4599999999998, "end": 1750.9799999999998, + "text": " So I can actually hop back and forth between vector spaces", "tokens": + [50856, 407, 286, 393, 767, 3818, 646, 293, 5220, 1296, 8062, 7673, 50982], "temperature": + 0.0, "avg_logprob": -0.15582829814846233, "compression_ratio": 1.8911290322580645, + "no_speech_prob": 0.00017260682943742722}, {"id": 506, "seek": 173862, "start": + 1750.9799999999998, "end": 1753.26, "text": " to find 
and collect documents,", "tokens": + [50982, 281, 915, 293, 2500, 8512, 11, 51096], "temperature": 0.0, "avg_logprob": + -0.15582829814846233, "compression_ratio": 1.8911290322580645, "no_speech_prob": + 0.00017260682943742722}, {"id": 507, "seek": 173862, "start": 1753.26, "end": 1754.9799999999998, + "text": " to try to better understand them,", "tokens": [51096, 281, 853, 281, 1101, + 1223, 552, 11, 51182], "temperature": 0.0, "avg_logprob": -0.15582829814846233, + "compression_ratio": 1.8911290322580645, "no_speech_prob": 0.00017260682943742722}, + {"id": 508, "seek": 173862, "start": 1754.9799999999998, "end": 1758.86, "text": + " and then to use that understanding to take those documents", "tokens": [51182, + 293, 550, 281, 764, 300, 3701, 281, 747, 729, 8512, 51376], "temperature": 0.0, + "avg_logprob": -0.15582829814846233, "compression_ratio": 1.8911290322580645, "no_speech_prob": + 0.00017260682943742722}, {"id": 509, "seek": 173862, "start": 1758.86, "end": 1761.34, + "text": " and return them for the full set of search results.", "tokens": [51376, + 293, 2736, 552, 337, 264, 1577, 992, 295, 3164, 3542, 13, 51500], "temperature": + 0.0, "avg_logprob": -0.15582829814846233, "compression_ratio": 1.8911290322580645, + "no_speech_prob": 0.00017260682943742722}, {"id": 510, "seek": 173862, "start": + 1762.5, "end": 1765.7399999999998, "text": " So I''m gonna actually just show this + visually for a second.", "tokens": [51558, 407, 286, 478, 799, 767, 445, 855, 341, + 19622, 337, 257, 1150, 13, 51720], "temperature": 0.0, "avg_logprob": -0.15582829814846233, + "compression_ratio": 1.8911290322580645, "no_speech_prob": 0.00017260682943742722}, + {"id": 511, "seek": 176574, "start": 1766.74, "end": 1769.14, "text": " Let me put + them up.", "tokens": [50414, 961, 385, 829, 552, 493, 13, 50534], "temperature": + 0.0, "avg_logprob": -0.35301541678513154, "compression_ratio": 1.5568181818181819, + "no_speech_prob": 0.002192049054428935}, {"id": 512, "seek": 176574, 
"start": 1773.6200000000001, + "end": 1776.6200000000001, "text": " Let me click here and restart this demo.", + "tokens": [50758, 961, 385, 2052, 510, 293, 21022, 341, 10723, 13, 50908], "temperature": + 0.0, "avg_logprob": -0.35301541678513154, "compression_ratio": 1.5568181818181819, + "no_speech_prob": 0.002192049054428935}, {"id": 513, "seek": 176574, "start": 1779.38, + "end": 1782.98, "text": " So imagine I have a sparse vector space over here on the + left.", "tokens": [51046, 407, 3811, 286, 362, 257, 637, 11668, 8062, 1901, 670, + 510, 322, 264, 1411, 13, 51226], "temperature": 0.0, "avg_logprob": -0.35301541678513154, + "compression_ratio": 1.5568181818181819, "no_speech_prob": 0.002192049054428935}, + {"id": 514, "seek": 176574, "start": 1783.98, "end": 1786.66, "text": " The way + this works is I send a query in.", "tokens": [51276, 440, 636, 341, 1985, 307, 286, + 2845, 257, 14581, 294, 13, 51410], "temperature": 0.0, "avg_logprob": -0.35301541678513154, + "compression_ratio": 1.5568181818181819, "no_speech_prob": 0.002192049054428935}, + {"id": 515, "seek": 176574, "start": 1786.66, "end": 1789.98, "text": " This query + finds a set of relevant documents", "tokens": [51410, 639, 14581, 10704, 257, 992, + 295, 7340, 8512, 51576], "temperature": 0.0, "avg_logprob": -0.35301541678513154, + "compression_ratio": 1.5568181818181819, "no_speech_prob": 0.002192049054428935}, + {"id": 516, "seek": 176574, "start": 1789.98, "end": 1792.26, "text": " that are + in this vector space,", "tokens": [51576, 300, 366, 294, 341, 8062, 1901, 11, 51690], + "temperature": 0.0, "avg_logprob": -0.35301541678513154, "compression_ratio": 1.5568181818181819, + "no_speech_prob": 0.002192049054428935}, {"id": 517, "seek": 176574, "start": 1792.26, + "end": 1793.66, "text": " and what''s found those documents,", "tokens": [51690, + 293, 437, 311, 1352, 729, 8512, 11, 51760], "temperature": 0.0, "avg_logprob": -0.35301541678513154, + "compression_ratio": 1.5568181818181819, 
"no_speech_prob": 0.002192049054428935}, + {"id": 518, "seek": 179366, "start": 1793.66, "end": 1797.18, "text": " it uses + them to essentially a wormhole in the pauses", "tokens": [50364, 309, 4960, 552, + 281, 4476, 257, 23835, 14094, 294, 264, 2502, 8355, 50540], "temperature": 0.0, + "avg_logprob": -0.19951272518076796, "compression_ratio": 1.7517482517482517, "no_speech_prob": + 9.638664778321981e-05}, {"id": 519, "seek": 179366, "start": 1797.18, "end": 1798.18, + "text": " for a second.", "tokens": [50540, 337, 257, 1150, 13, 50590], "temperature": + 0.0, "avg_logprob": -0.19951272518076796, "compression_ratio": 1.7517482517482517, + "no_speech_prob": 9.638664778321981e-05}, {"id": 520, "seek": 179366, "start": 1798.18, + "end": 1800.1000000000001, "text": " Oh, maybe I can''t.", "tokens": [50590, 876, + 11, 1310, 286, 393, 380, 13, 50686], "temperature": 0.0, "avg_logprob": -0.19951272518076796, + "compression_ratio": 1.7517482517482517, "no_speech_prob": 9.638664778321981e-05}, + {"id": 521, "seek": 179366, "start": 1800.1000000000001, "end": 1801.5400000000002, + "text": " Essentially, once I''ve run that query,", "tokens": [50686, 23596, 11, + 1564, 286, 600, 1190, 300, 14581, 11, 50758], "temperature": 0.0, "avg_logprob": + -0.19951272518076796, "compression_ratio": 1.7517482517482517, "no_speech_prob": + 9.638664778321981e-05}, {"id": 522, "seek": 179366, "start": 1801.5400000000002, + "end": 1803.02, "text": " I find the relevant documents,", "tokens": [50758, 286, + 915, 264, 7340, 8512, 11, 50832], "temperature": 0.0, "avg_logprob": -0.19951272518076796, + "compression_ratio": 1.7517482517482517, "no_speech_prob": 9.638664778321981e-05}, + {"id": 523, "seek": 179366, "start": 1803.02, "end": 1805.3400000000001, "text": + " which are the things close by in vector space.", "tokens": [50832, 597, 366, 264, + 721, 1998, 538, 294, 8062, 1901, 13, 50948], "temperature": 0.0, "avg_logprob": + -0.19951272518076796, "compression_ratio": 1.7517482517482517, 
"no_speech_prob": + 9.638664778321981e-05}, {"id": 524, "seek": 179366, "start": 1805.3400000000001, + "end": 1809.02, "text": " I then use that to generate a vector and embedding", "tokens": + [50948, 286, 550, 764, 300, 281, 8460, 257, 8062, 293, 12240, 3584, 51132], "temperature": + 0.0, "avg_logprob": -0.19951272518076796, "compression_ratio": 1.7517482517482517, + "no_speech_prob": 9.638664778321981e-05}, {"id": 525, "seek": 179366, "start": 1809.02, + "end": 1812.0600000000002, "text": " that I''m gonna run a search for over here + in the dense space.", "tokens": [51132, 300, 286, 478, 799, 1190, 257, 3164, 337, + 670, 510, 294, 264, 18011, 1901, 13, 51284], "temperature": 0.0, "avg_logprob": + -0.19951272518076796, "compression_ratio": 1.7517482517482517, "no_speech_prob": + 9.638664778321981e-05}, {"id": 526, "seek": 179366, "start": 1812.0600000000002, + "end": 1813.5, "text": " And once I run that search,", "tokens": [51284, 400, 1564, + 286, 1190, 300, 3164, 11, 51356], "temperature": 0.0, "avg_logprob": -0.19951272518076796, + "compression_ratio": 1.7517482517482517, "no_speech_prob": 9.638664778321981e-05}, + {"id": 527, "seek": 179366, "start": 1813.5, "end": 1815.98, "text": " you''ll notice + that in this example,", "tokens": [51356, 291, 603, 3449, 300, 294, 341, 1365, 11, + 51480], "temperature": 0.0, "avg_logprob": -0.19951272518076796, "compression_ratio": + 1.7517482517482517, "no_speech_prob": 9.638664778321981e-05}, {"id": 528, "seek": + 179366, "start": 1815.98, "end": 1818.42, "text": " it''s not exactly where these + documents are,", "tokens": [51480, 309, 311, 406, 2293, 689, 613, 8512, 366, 11, + 51602], "temperature": 0.0, "avg_logprob": -0.19951272518076796, "compression_ratio": + 1.7517482517482517, "no_speech_prob": 9.638664778321981e-05}, {"id": 529, "seek": + 179366, "start": 1818.42, "end": 1819.74, "text": " but it''s very nearby,", "tokens": + [51602, 457, 309, 311, 588, 11184, 11, 51668], "temperature": 0.0, "avg_logprob": + 
-0.19951272518076796, "compression_ratio": 1.7517482517482517, "no_speech_prob": + 9.638664778321981e-05}, {"id": 530, "seek": 179366, "start": 1819.74, "end": 1822.66, + "text": " meaning the sort of collection of these things together", "tokens": [51668, + 3620, 264, 1333, 295, 5765, 295, 613, 721, 1214, 51814], "temperature": 0.0, "avg_logprob": + -0.19951272518076796, "compression_ratio": 1.7517482517482517, "no_speech_prob": + 9.638664778321981e-05}, {"id": 531, "seek": 182266, "start": 1822.66, "end": 1826.26, + "text": " and what''s understood semantically about the relationship,", "tokens": + [50364, 293, 437, 311, 7320, 4361, 49505, 466, 264, 2480, 11, 50544], "temperature": + 0.0, "avg_logprob": -0.15157008171081543, "compression_ratio": 1.7628458498023716, + "no_speech_prob": 5.4762684158049524e-05}, {"id": 532, "seek": 182266, "start": + 1826.26, "end": 1829.18, "text": " maps to this point and vector space on the right.", + "tokens": [50544, 11317, 281, 341, 935, 293, 8062, 1901, 322, 264, 558, 13, 50690], + "temperature": 0.0, "avg_logprob": -0.15157008171081543, "compression_ratio": 1.7628458498023716, + "no_speech_prob": 5.4762684158049524e-05}, {"id": 533, "seek": 182266, "start": + 1829.18, "end": 1833.8600000000001, "text": " And then that allows me to then find + other things surrounding it", "tokens": [50690, 400, 550, 300, 4045, 385, 281, 550, + 915, 661, 721, 11498, 309, 50924], "temperature": 0.0, "avg_logprob": -0.15157008171081543, + "compression_ratio": 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, + {"id": 534, "seek": 182266, "start": 1833.8600000000001, "end": 1836.66, "text": + " that represent a similar meaning.", "tokens": [50924, 300, 2906, 257, 2531, 3620, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.15157008171081543, "compression_ratio": + 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, {"id": 535, "seek": + 182266, "start": 1836.66, "end": 1839.26, "text": " And this is just looking at + 
two vector spaces,", "tokens": [51064, 400, 341, 307, 445, 1237, 412, 732, 8062, + 7673, 11, 51194], "temperature": 0.0, "avg_logprob": -0.15157008171081543, "compression_ratio": + 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, {"id": 536, "seek": + 182266, "start": 1839.26, "end": 1841.8600000000001, "text": " a sparse vector space + and a dense vector space", "tokens": [51194, 257, 637, 11668, 8062, 1901, 293, 257, + 18011, 8062, 1901, 51324], "temperature": 0.0, "avg_logprob": -0.15157008171081543, + "compression_ratio": 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, + {"id": 537, "seek": 182266, "start": 1841.8600000000001, "end": 1844.02, "text": + " for keywords and then for embeddings.", "tokens": [51324, 337, 21009, 293, 550, + 337, 12240, 29432, 13, 51432], "temperature": 0.0, "avg_logprob": -0.15157008171081543, + "compression_ratio": 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, + {"id": 538, "seek": 182266, "start": 1844.02, "end": 1845.8600000000001, "text": + " But as I mentioned, there''s also this notion", "tokens": [51432, 583, 382, 286, + 2835, 11, 456, 311, 611, 341, 10710, 51524], "temperature": 0.0, "avg_logprob": + -0.15157008171081543, "compression_ratio": 1.7628458498023716, "no_speech_prob": + 5.4762684158049524e-05}, {"id": 539, "seek": 182266, "start": 1845.8600000000001, + "end": 1847.5800000000002, "text": " of a behavioral vector space.", "tokens": [51524, + 295, 257, 19124, 8062, 1901, 13, 51610], "temperature": 0.0, "avg_logprob": -0.15157008171081543, + "compression_ratio": 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, + {"id": 540, "seek": 182266, "start": 1847.5800000000002, "end": 1849.02, "text": + " So the same thing happens here.", "tokens": [51610, 407, 264, 912, 551, 2314, + 510, 13, 51682], "temperature": 0.0, "avg_logprob": -0.15157008171081543, "compression_ratio": + 1.7628458498023716, "no_speech_prob": 5.4762684158049524e-05}, {"id": 541, "seek": + 
184902, "start": 1849.1, "end": 1853.26, "text": " I can run a query, find relevant + documents,", "tokens": [50368, 286, 393, 1190, 257, 14581, 11, 915, 7340, 8512, + 11, 50576], "temperature": 0.0, "avg_logprob": -0.1160381283380289, "compression_ratio": + 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, {"id": 542, "seek": + 184902, "start": 1853.26, "end": 1855.54, "text": " use those as my wormhole.", + "tokens": [50576, 764, 729, 382, 452, 23835, 14094, 13, 50690], "temperature": 0.0, + "avg_logprob": -0.1160381283380289, "compression_ratio": 1.7328244274809161, "no_speech_prob": + 0.0007076597539708018}, {"id": 543, "seek": 184902, "start": 1855.54, "end": 1857.5, + "text": " And then I generate this wormhole vector", "tokens": [50690, 400, 550, + 286, 8460, 341, 23835, 14094, 8062, 50788], "temperature": 0.0, "avg_logprob": -0.1160381283380289, + "compression_ratio": 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, + {"id": 544, "seek": 184902, "start": 1857.5, "end": 1859.94, "text": " to hop through + the wormhole to the other side", "tokens": [50788, 281, 3818, 807, 264, 23835, 14094, + 281, 264, 661, 1252, 50910], "temperature": 0.0, "avg_logprob": -0.1160381283380289, + "compression_ratio": 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, + {"id": 545, "seek": 184902, "start": 1859.94, "end": 1862.66, "text": " to find + the region corresponding to that meaning", "tokens": [50910, 281, 915, 264, 4458, + 11760, 281, 300, 3620, 51046], "temperature": 0.0, "avg_logprob": -0.1160381283380289, + "compression_ratio": 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, + {"id": 546, "seek": 184902, "start": 1862.66, "end": 1864.9, "text": " in either + of these other vector spaces.", "tokens": [51046, 294, 2139, 295, 613, 661, 8062, + 7673, 13, 51158], "temperature": 0.0, "avg_logprob": -0.1160381283380289, "compression_ratio": + 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, {"id": 547, "seek": 
+ 184902, "start": 1864.9, "end": 1868.82, "text": " So in this case, if I''ve done + major expectation,", "tokens": [51158, 407, 294, 341, 1389, 11, 498, 286, 600, 1096, + 2563, 14334, 11, 51354], "temperature": 0.0, "avg_logprob": -0.1160381283380289, + "compression_ratio": 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, + {"id": 548, "seek": 184902, "start": 1868.82, "end": 1870.74, "text": " which is + like the process you go through", "tokens": [51354, 597, 307, 411, 264, 1399, 291, + 352, 807, 51450], "temperature": 0.0, "avg_logprob": -0.1160381283380289, "compression_ratio": + 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, {"id": 549, "seek": + 184902, "start": 1870.74, "end": 1873.3799999999999, "text": " when you''re doing + collaborative filtering for recommendations,", "tokens": [51450, 562, 291, 434, + 884, 16555, 30822, 337, 10434, 11, 51582], "temperature": 0.0, "avg_logprob": -0.1160381283380289, + "compression_ratio": 1.7328244274809161, "no_speech_prob": 0.0007076597539708018}, + {"id": 550, "seek": 184902, "start": 1873.3799999999999, "end": 1877.34, "text": + " then I would hop to the corresponding region over here.", "tokens": [51582, 550, + 286, 576, 3818, 281, 264, 11760, 4458, 670, 510, 13, 51780], "temperature": 0.0, + "avg_logprob": -0.1160381283380289, "compression_ratio": 1.7328244274809161, "no_speech_prob": + 0.0007076597539708018}, {"id": 551, "seek": 187734, "start": 1877.34, "end": 1881.86, + "text": " So that''s the general idea, just kind of visually describing it.", "tokens": + [50364, 407, 300, 311, 264, 2674, 1558, 11, 445, 733, 295, 19622, 16141, 309, 13, + 50590], "temperature": 0.2, "avg_logprob": -0.5212716052406713, "compression_ratio": + 1.4437869822485208, "no_speech_prob": 6.683952960884199e-05}, {"id": 552, "seek": + 187734, "start": 1881.86, "end": 1885.5, "text": " And hop like over here.", "tokens": + [50590, 400, 3818, 411, 670, 510, 13, 50772], "temperature": 0.2, "avg_logprob": + 
-0.5212716052406713, "compression_ratio": 1.4437869822485208, "no_speech_prob": + 6.683952960884199e-05}, {"id": 553, "seek": 187734, "start": 1885.5, "end": 1888.34, + "text": " I can just one second.", "tokens": [50772, 286, 393, 445, 472, 1150, 13, + 50914], "temperature": 0.2, "avg_logprob": -0.5212716052406713, "compression_ratio": + 1.4437869822485208, "no_speech_prob": 6.683952960884199e-05}, {"id": 554, "seek": + 187734, "start": 1888.34, "end": 1890.26, "text": " I can be one second.", "tokens": + [50914, 286, 393, 312, 472, 1150, 13, 51010], "temperature": 0.2, "avg_logprob": + -0.5212716052406713, "compression_ratio": 1.4437869822485208, "no_speech_prob": + 6.683952960884199e-05}, {"id": 555, "seek": 187734, "start": 1890.26, "end": 1891.74, + "text": " Come on, slides up.", "tokens": [51010, 2492, 322, 11, 9788, 493, 13, + 51084], "temperature": 0.2, "avg_logprob": -0.5212716052406713, "compression_ratio": + 1.4437869822485208, "no_speech_prob": 6.683952960884199e-05}, {"id": 556, "seek": + 187734, "start": 1900.54, "end": 1901.06, "text": " All right.", "tokens": [51524, + 1057, 558, 13, 51550], "temperature": 0.2, "avg_logprob": -0.5212716052406713, "compression_ratio": + 1.4437869822485208, "no_speech_prob": 6.683952960884199e-05}, {"id": 557, "seek": + 187734, "start": 1903.98, "end": 1905.22, "text": " And then the next question is, + how", "tokens": [51696, 400, 550, 264, 958, 1168, 307, 11, 577, 51758], "temperature": + 0.2, "avg_logprob": -0.5212716052406713, "compression_ratio": 1.4437869822485208, + "no_speech_prob": 6.683952960884199e-05}, {"id": 558, "seek": 187734, "start": 1905.22, + "end": 1907.3, "text": " do we actually create these wormhole vectors?", "tokens": + [51758, 360, 321, 767, 1884, 613, 23835, 14094, 18875, 30, 51862], "temperature": + 0.2, "avg_logprob": -0.5212716052406713, "compression_ratio": 1.4437869822485208, + "no_speech_prob": 6.683952960884199e-05}, {"id": 559, "seek": 190730, "start": 1907.46, + "end": 1909.22, 
"text": " So to meet you, if there''s any questions,", "tokens": + [50372, 407, 281, 1677, 291, 11, 498, 456, 311, 604, 1651, 11, 50460], "temperature": + 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": 1.5751072961373391, + "no_speech_prob": 0.00018061262380797416}, {"id": 560, "seek": 190730, "start": + 1909.22, "end": 1910.46, "text": " feel free to interrupt me at any point.", "tokens": + [50460, 841, 1737, 281, 12729, 385, 412, 604, 935, 13, 50522], "temperature": 0.0, + "avg_logprob": -0.23760106922250934, "compression_ratio": 1.5751072961373391, "no_speech_prob": + 0.00018061262380797416}, {"id": 561, "seek": 190730, "start": 1910.46, "end": 1912.7, + "text": " But I''ll keep going otherwise.", "tokens": [50522, 583, 286, 603, 1066, + 516, 5911, 13, 50634], "temperature": 0.0, "avg_logprob": -0.23760106922250934, + "compression_ratio": 1.5751072961373391, "no_speech_prob": 0.00018061262380797416}, + {"id": 562, "seek": 190730, "start": 1912.7, "end": 1914.86, "text": " I think we + have a couple of questions,", "tokens": [50634, 286, 519, 321, 362, 257, 1916, 295, + 1651, 11, 50742], "temperature": 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": + 1.5751072961373391, "no_speech_prob": 0.00018061262380797416}, {"id": 563, "seek": + 190730, "start": 1914.86, "end": 1916.78, "text": " but we''ll defer at the end.", + "tokens": [50742, 457, 321, 603, 25704, 412, 264, 917, 13, 50838], "temperature": + 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": 1.5751072961373391, + "no_speech_prob": 0.00018061262380797416}, {"id": 564, "seek": 190730, "start": + 1916.78, "end": 1917.62, "text": " Sounds good.", "tokens": [50838, 14576, 665, + 13, 50880], "temperature": 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": + 1.5751072961373391, "no_speech_prob": 0.00018061262380797416}, {"id": 565, "seek": + 190730, "start": 1917.62, "end": 1918.26, "text": " All right.", "tokens": [50880, + 1057, 558, 13, 50912], 
"temperature": 0.0, "avg_logprob": -0.23760106922250934, + "compression_ratio": 1.5751072961373391, "no_speech_prob": 0.00018061262380797416}, + {"id": 566, "seek": 190730, "start": 1918.26, "end": 1921.62, "text": " So the question + now is, how do we create a wormhole vector?", "tokens": [50912, 407, 264, 1168, + 586, 307, 11, 577, 360, 321, 1884, 257, 23835, 14094, 8062, 30, 51080], "temperature": + 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": 1.5751072961373391, + "no_speech_prob": 0.00018061262380797416}, {"id": 567, "seek": 190730, "start": + 1921.62, "end": 1927.3799999999999, "text": " And there''s essentially two types + that I''m going to focus on right now.", "tokens": [51080, 400, 456, 311, 4476, + 732, 3467, 300, 286, 478, 516, 281, 1879, 322, 558, 586, 13, 51368], "temperature": + 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": 1.5751072961373391, + "no_speech_prob": 0.00018061262380797416}, {"id": 568, "seek": 190730, "start": + 1927.3799999999999, "end": 1933.7, "text": " One is the, sorry, I lost this.", "tokens": + [51368, 1485, 307, 264, 11, 2597, 11, 286, 2731, 341, 13, 51684], "temperature": + 0.0, "avg_logprob": -0.23760106922250934, "compression_ratio": 1.5751072961373391, + "no_speech_prob": 0.00018061262380797416}, {"id": 569, "seek": 193370, "start": + 1933.7, "end": 1934.98, "text": " OK.", "tokens": [50364, 2264, 13, 50428], "temperature": + 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + "no_speech_prob": 0.002920187311246991}, {"id": 570, "seek": 193370, "start": 1934.98, + "end": 1941.74, "text": " The first is if I''m trying to go to a dense vector space + within bedding.", "tokens": [50428, 440, 700, 307, 498, 286, 478, 1382, 281, 352, + 281, 257, 18011, 8062, 1901, 1951, 2901, 3584, 13, 50766], "temperature": 0.0, "avg_logprob": + -0.2036243237947163, "compression_ratio": 1.682608695652174, "no_speech_prob": 0.002920187311246991}, + {"id": 571, "seek": 193370, 
"start": 1941.74, "end": 1942.66, "text": " So this + is very easy.", "tokens": [50766, 407, 341, 307, 588, 1858, 13, 50812], "temperature": + 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + "no_speech_prob": 0.002920187311246991}, {"id": 572, "seek": 193370, "start": 1942.66, + "end": 1948.22, "text": " All I have to do is pull the vectors or average the vectors + of the top end documents.", "tokens": [50812, 1057, 286, 362, 281, 360, 307, 2235, + 264, 18875, 420, 4274, 264, 18875, 295, 264, 1192, 917, 8512, 13, 51090], "temperature": + 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + "no_speech_prob": 0.002920187311246991}, {"id": 573, "seek": 193370, "start": 1948.22, + "end": 1950.78, "text": " So imagine I run a keyword search.", "tokens": [51090, + 407, 3811, 286, 1190, 257, 20428, 3164, 13, 51218], "temperature": 0.0, "avg_logprob": + -0.2036243237947163, "compression_ratio": 1.682608695652174, "no_speech_prob": 0.002920187311246991}, + {"id": 574, "seek": 193370, "start": 1950.78, "end": 1952.3, "text": " I find a + set of documents.", "tokens": [51218, 286, 915, 257, 992, 295, 8512, 13, 51294], + "temperature": 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + "no_speech_prob": 0.002920187311246991}, {"id": 575, "seek": 193370, "start": 1952.3, + "end": 1953.8600000000001, "text": " I rank those.", "tokens": [51294, 286, 6181, + 729, 13, 51372], "temperature": 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": + 1.682608695652174, "no_speech_prob": 0.002920187311246991}, {"id": 576, "seek": + 193370, "start": 1953.8600000000001, "end": 1956.8600000000001, "text": " And then + I don''t necessarily need to take the entire document set.", "tokens": [51372, 400, + 550, 286, 500, 380, 4725, 643, 281, 747, 264, 2302, 4166, 992, 13, 51522], "temperature": + 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + 
"no_speech_prob": 0.002920187311246991}, {"id": 577, "seek": 193370, "start": 1956.8600000000001, + "end": 1957.78, "text": " I could.", "tokens": [51522, 286, 727, 13, 51568], "temperature": + 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + "no_speech_prob": 0.002920187311246991}, {"id": 578, "seek": 193370, "start": 1957.78, + "end": 1959.66, "text": " But if I wanted to just take the top end documents", "tokens": + [51568, 583, 498, 286, 1415, 281, 445, 747, 264, 1192, 917, 8512, 51662], "temperature": + 0.0, "avg_logprob": -0.2036243237947163, "compression_ratio": 1.682608695652174, + "no_speech_prob": 0.002920187311246991}, {"id": 579, "seek": 195966, "start": 1959.66, + "end": 1963.1000000000001, "text": " to get a more sort of semantically relevant,", + "tokens": [50364, 281, 483, 257, 544, 1333, 295, 4361, 49505, 7340, 11, 50536], + "temperature": 0.0, "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, + "no_speech_prob": 0.0010433149291202426}, {"id": 580, "seek": 195966, "start": 1963.1000000000001, + "end": 1967.5800000000002, "text": " or just let''s just say relevant set corresponding + to that keyword query,", "tokens": [50536, 420, 445, 718, 311, 445, 584, 7340, 992, + 11760, 281, 300, 20428, 14581, 11, 50760], "temperature": 0.0, "avg_logprob": -0.15439103395884274, + "compression_ratio": 1.7346938775510203, "no_speech_prob": 0.0010433149291202426}, + {"id": 581, "seek": 195966, "start": 1967.5800000000002, "end": 1972.14, "text": + " then I generate a new embedding in the dense space", "tokens": [50760, 550, 286, + 8460, 257, 777, 12240, 3584, 294, 264, 18011, 1901, 50988], "temperature": 0.0, + "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, "no_speech_prob": + 0.0010433149291202426}, {"id": 582, "seek": 195966, "start": 1972.14, "end": 1974.6200000000001, + "text": " that is just an average of those.", "tokens": [50988, 300, 307, 445, 364, + 4274, 295, 
729, 13, 51112], "temperature": 0.0, "avg_logprob": -0.15439103395884274, + "compression_ratio": 1.7346938775510203, "no_speech_prob": 0.0010433149291202426}, + {"id": 583, "seek": 195966, "start": 1974.6200000000001, "end": 1977.1000000000001, + "text": " If you go back to the Darth Vader example from earlier,", "tokens": [51112, + 759, 291, 352, 646, 281, 264, 40696, 36337, 1365, 490, 3071, 11, 51236], "temperature": + 0.0, "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, + "no_speech_prob": 0.0010433149291202426}, {"id": 584, "seek": 195966, "start": 1977.1000000000001, + "end": 1978.98, "text": " where the puppy Darth Vader is in the middle,", "tokens": + [51236, 689, 264, 18196, 40696, 36337, 307, 294, 264, 2808, 11, 51330], "temperature": + 0.0, "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, + "no_speech_prob": 0.0010433149291202426}, {"id": 585, "seek": 195966, "start": 1978.98, + "end": 1983.1000000000001, "text": " it was sort of a combination of the meaning + of Darth Vader and a meaning of puppy.", "tokens": [51330, 309, 390, 1333, 295, + 257, 6562, 295, 264, 3620, 295, 40696, 36337, 293, 257, 3620, 295, 18196, 13, 51536], + "temperature": 0.0, "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, + "no_speech_prob": 0.0010433149291202426}, {"id": 586, "seek": 195966, "start": 1983.1000000000001, + "end": 1985.1000000000001, "text": " Think of this as taking a bunch of documents", + "tokens": [51536, 6557, 295, 341, 382, 1940, 257, 3840, 295, 8512, 51636], "temperature": + 0.0, "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, + "no_speech_prob": 0.0010433149291202426}, {"id": 587, "seek": 195966, "start": 1985.1000000000001, + "end": 1986.5400000000002, "text": " that each have their own meaning.", "tokens": + [51636, 300, 1184, 362, 641, 1065, 3620, 13, 51708], "temperature": 0.0, "avg_logprob": + -0.15439103395884274, "compression_ratio": 
1.7346938775510203, "no_speech_prob": + 0.0010433149291202426}, {"id": 588, "seek": 195966, "start": 1986.5400000000002, + "end": 1989.5, "text": " And when I pull them together, I''m creating", "tokens": + [51708, 400, 562, 286, 2235, 552, 1214, 11, 286, 478, 4084, 51856], "temperature": + 0.0, "avg_logprob": -0.15439103395884274, "compression_ratio": 1.7346938775510203, + "no_speech_prob": 0.0010433149291202426}, {"id": 589, "seek": 198950, "start": 1989.5, + "end": 1992.34, "text": " an embedding that has the average of the meaning.", "tokens": + [50364, 364, 12240, 3584, 300, 575, 264, 4274, 295, 264, 3620, 13, 50506], "temperature": + 0.0, "avg_logprob": -0.13117500967230677, "compression_ratio": 1.8130081300813008, + "no_speech_prob": 8.349609561264515e-05}, {"id": 590, "seek": 198950, "start": 1992.34, + "end": 1998.34, "text": " And if I assume my documents that I queried on the lexical + side,", "tokens": [50506, 400, 498, 286, 6552, 452, 8512, 300, 286, 7083, 1091, + 322, 264, 476, 87, 804, 1252, 11, 50806], "temperature": 0.0, "avg_logprob": -0.13117500967230677, + "compression_ratio": 1.8130081300813008, "no_speech_prob": 8.349609561264515e-05}, + {"id": 591, "seek": 198950, "start": 1998.34, "end": 2001.14, "text": " have some + sense of a shared meaning within them.", "tokens": [50806, 362, 512, 2020, 295, + 257, 5507, 3620, 1951, 552, 13, 50946], "temperature": 0.0, "avg_logprob": -0.13117500967230677, + "compression_ratio": 1.8130081300813008, "no_speech_prob": 8.349609561264515e-05}, + {"id": 592, "seek": 198950, "start": 2001.14, "end": 2002.86, "text": " And I take + the top documents from that,", "tokens": [50946, 400, 286, 747, 264, 1192, 8512, + 490, 300, 11, 51032], "temperature": 0.0, "avg_logprob": -0.13117500967230677, "compression_ratio": + 1.8130081300813008, "no_speech_prob": 8.349609561264515e-05}, {"id": 593, "seek": + 198950, "start": 2002.86, "end": 2005.7, "text": " then that shared meaning I can + hop over to the dense space,", 
"tokens": [51032, 550, 300, 5507, 3620, 286, 393, + 3818, 670, 281, 264, 18011, 1901, 11, 51174], "temperature": 0.0, "avg_logprob": + -0.13117500967230677, "compression_ratio": 1.8130081300813008, "no_speech_prob": + 8.349609561264515e-05}, {"id": 594, "seek": 198950, "start": 2005.7, "end": 2008.58, + "text": " find and then find other things that have similar meaning,", "tokens": + [51174, 915, 293, 550, 915, 661, 721, 300, 362, 2531, 3620, 11, 51318], "temperature": + 0.0, "avg_logprob": -0.13117500967230677, "compression_ratio": 1.8130081300813008, + "no_speech_prob": 8.349609561264515e-05}, {"id": 595, "seek": 198950, "start": 2008.58, + "end": 2010.98, "text": " even if they don''t match the keywords.", "tokens": [51318, + 754, 498, 436, 500, 380, 2995, 264, 21009, 13, 51438], "temperature": 0.0, "avg_logprob": + -0.13117500967230677, "compression_ratio": 1.8130081300813008, "no_speech_prob": + 8.349609561264515e-05}, {"id": 596, "seek": 198950, "start": 2010.98, "end": 2014.66, + "text": " Likewise, I can go the other direction if I''m in my embedding space,", + "tokens": [51438, 30269, 11, 286, 393, 352, 264, 661, 3513, 498, 286, 478, 294, + 452, 12240, 3584, 1901, 11, 51622], "temperature": 0.0, "avg_logprob": -0.13117500967230677, + "compression_ratio": 1.8130081300813008, "no_speech_prob": 8.349609561264515e-05}, + {"id": 597, "seek": 198950, "start": 2014.66, "end": 2016.42, "text": " my dense + space.", "tokens": [51622, 452, 18011, 1901, 13, 51710], "temperature": 0.0, "avg_logprob": + -0.13117500967230677, "compression_ratio": 1.8130081300813008, "no_speech_prob": + 8.349609561264515e-05}, {"id": 598, "seek": 201642, "start": 2016.42, "end": 2021.94, + "text": " I can run a search, find the top in most related embeddings", "tokens": + [50364, 286, 393, 1190, 257, 3164, 11, 915, 264, 1192, 294, 881, 4077, 12240, 29432, + 50640], "temperature": 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": + 1.6123778501628665, "no_speech_prob": 
0.00032095983624458313}, {"id": 599, "seek": + 201642, "start": 2021.94, "end": 2024.46, "text": " by cosine similar to what have + you.", "tokens": [50640, 538, 23565, 2531, 281, 437, 362, 291, 13, 50766], "temperature": + 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": 1.6123778501628665, + "no_speech_prob": 0.00032095983624458313}, {"id": 600, "seek": 201642, "start": + 2024.46, "end": 2027.54, "text": " And then conceptually, it seems more difficult", + "tokens": [50766, 400, 550, 3410, 671, 11, 309, 2544, 544, 2252, 50920], "temperature": + 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": 1.6123778501628665, + "no_speech_prob": 0.00032095983624458313}, {"id": 601, "seek": 201642, "start": + 2027.54, "end": 2030.3400000000001, "text": " to then hop over to the sparse space.", + "tokens": [50920, 281, 550, 3818, 670, 281, 264, 637, 11668, 1901, 13, 51060], "temperature": + 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": 1.6123778501628665, + "no_speech_prob": 0.00032095983624458313}, {"id": 602, "seek": 201642, "start": + 2030.3400000000001, "end": 2032.0600000000002, "text": " How do you generate a sparse + vector?", "tokens": [51060, 1012, 360, 291, 8460, 257, 637, 11668, 8062, 30, 51146], + "temperature": 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": 1.6123778501628665, + "no_speech_prob": 0.00032095983624458313}, {"id": 603, "seek": 201642, "start": + 2032.0600000000002, "end": 2034.78, "text": " But there''s a technique called semi-technology + graphs,", "tokens": [51146, 583, 456, 311, 257, 6532, 1219, 12909, 12, 29113, 1793, + 24877, 11, 51282], "temperature": 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": + 1.6123778501628665, "no_speech_prob": 0.00032095983624458313}, {"id": 604, "seek": + 201642, "start": 2034.78, "end": 2037.78, "text": " which I''ll kind of walk you + through, which allows you to do this.", "tokens": [51282, 597, 286, 603, 733, 295, + 1792, 291, 807, 11, 
597, 4045, 291, 281, 360, 341, 13, 51432], "temperature": 0.0, + "avg_logprob": -0.17304037619328153, "compression_ratio": 1.6123778501628665, "no_speech_prob": + 0.00032095983624458313}, {"id": 605, "seek": 201642, "start": 2037.78, "end": 2041.98, + "text": " So zooming back out, I mentioned pulling the vectors of the K-NN documents.", + "tokens": [51432, 407, 48226, 646, 484, 11, 286, 2835, 8407, 264, 18875, 295, 264, + 591, 12, 45, 45, 8512, 13, 51642], "temperature": 0.0, "avg_logprob": -0.17304037619328153, + "compression_ratio": 1.6123778501628665, "no_speech_prob": 0.00032095983624458313}, + {"id": 606, "seek": 201642, "start": 2041.98, "end": 2044.46, "text": " All you + need to do, again, I query an electrical space,", "tokens": [51642, 1057, 291, 643, + 281, 360, 11, 797, 11, 286, 14581, 364, 12147, 1901, 11, 51766], "temperature": + 0.0, "avg_logprob": -0.17304037619328153, "compression_ratio": 1.6123778501628665, + "no_speech_prob": 0.00032095983624458313}, {"id": 607, "seek": 201642, "start": + 2044.46, "end": 2045.9, "text": " get the top K documents,", "tokens": [51766, 483, + 264, 1192, 591, 8512, 11, 51838], "temperature": 0.0, "avg_logprob": -0.17304037619328153, + "compression_ratio": 1.6123778501628665, "no_speech_prob": 0.00032095983624458313}, + {"id": 608, "seek": 204590, "start": 2045.9, "end": 2048.46, "text": " get the embeddings + of those documents and average them together.", "tokens": [50364, 483, 264, 12240, + 29432, 295, 729, 8512, 293, 4274, 552, 1214, 13, 50492], "temperature": 0.0, "avg_logprob": + -0.16094406007781742, "compression_ratio": 1.7570422535211268, "no_speech_prob": + 4.3208525312365964e-05}, {"id": 609, "seek": 204590, "start": 2048.46, "end": 2053.7000000000003, + "text": " This is the simple way to do that, just using NumPy.", "tokens": [50492, + 639, 307, 264, 2199, 636, 281, 360, 300, 11, 445, 1228, 22592, 47, 88, 13, 50754], + "temperature": 0.0, "avg_logprob": -0.16094406007781742, "compression_ratio": 
1.7570422535211268, + "no_speech_prob": 4.3208525312365964e-05}, {"id": 610, "seek": 204590, "start": + 2053.7000000000003, "end": 2055.1800000000003, "text": " For the semantics knowledge + graph approach,", "tokens": [50754, 1171, 264, 4361, 45298, 3601, 4295, 3109, 11, + 50828], "temperature": 0.0, "avg_logprob": -0.16094406007781742, "compression_ratio": + 1.7570422535211268, "no_speech_prob": 4.3208525312365964e-05}, {"id": 611, "seek": + 204590, "start": 2055.1800000000003, "end": 2058.1800000000003, "text": " same thing + I get the top K documents in the current vector space.", "tokens": [50828, 912, + 551, 286, 483, 264, 1192, 591, 8512, 294, 264, 2190, 8062, 1901, 13, 50978], "temperature": + 0.0, "avg_logprob": -0.16094406007781742, "compression_ratio": 1.7570422535211268, + "no_speech_prob": 4.3208525312365964e-05}, {"id": 612, "seek": 204590, "start": + 2058.1800000000003, "end": 2061.1, "text": " And then I do a semantics knowledge + graph traversal", "tokens": [50978, 400, 550, 286, 360, 257, 4361, 45298, 3601, + 4295, 23149, 304, 51124], "temperature": 0.0, "avg_logprob": -0.16094406007781742, + "compression_ratio": 1.7570422535211268, "no_speech_prob": 4.3208525312365964e-05}, + {"id": 613, "seek": 204590, "start": 2061.1, "end": 2063.94, "text": " to derive + a sparse lexical query", "tokens": [51124, 281, 28446, 257, 637, 11668, 476, 87, + 804, 14581, 51266], "temperature": 0.0, "avg_logprob": -0.16094406007781742, "compression_ratio": + 1.7570422535211268, "no_speech_prob": 4.3208525312365964e-05}, {"id": 614, "seek": + 204590, "start": 2063.94, "end": 2066.94, "text": " that best represents those documents.", + "tokens": [51266, 300, 1151, 8855, 729, 8512, 13, 51416], "temperature": 0.0, "avg_logprob": + -0.16094406007781742, "compression_ratio": 1.7570422535211268, "no_speech_prob": + 4.3208525312365964e-05}, {"id": 615, "seek": 204590, "start": 2066.94, "end": 2069.14, + "text": " So functionally, if you think of language,", "tokens": [51416, 
407, 2445, + 379, 11, 498, 291, 519, 295, 2856, 11, 51526], "temperature": 0.0, "avg_logprob": + -0.16094406007781742, "compression_ratio": 1.7570422535211268, "no_speech_prob": + 4.3208525312365964e-05}, {"id": 616, "seek": 204590, "start": 2069.14, "end": 2071.14, + "text": " I just talk about semantics knowledge graphs for a second", "tokens": + [51526, 286, 445, 751, 466, 4361, 45298, 3601, 24877, 337, 257, 1150, 51626], "temperature": + 0.0, "avg_logprob": -0.16094406007781742, "compression_ratio": 1.7570422535211268, + "no_speech_prob": 4.3208525312365964e-05}, {"id": 617, "seek": 204590, "start": + 2071.14, "end": 2074.06, "text": " and show you the structure of natural language.", + "tokens": [51626, 293, 855, 291, 264, 3877, 295, 3303, 2856, 13, 51772], "temperature": + 0.0, "avg_logprob": -0.16094406007781742, "compression_ratio": 1.7570422535211268, + "no_speech_prob": 4.3208525312365964e-05}, {"id": 618, "seek": 207406, "start": + 2074.06, "end": 2076.54, "text": " You could think of it as a graph of relationships.", + "tokens": [50364, 509, 727, 519, 295, 309, 382, 257, 4295, 295, 6159, 13, 50488], + "temperature": 0.0, "avg_logprob": -0.16026883885480356, "compression_ratio": 1.9652509652509653, + "no_speech_prob": 0.0006026344490237534}, {"id": 619, "seek": 207406, "start": 2076.54, + "end": 2080.58, "text": " We''ve got prefixes and suffixes and those mapped to terms,", + "tokens": [50488, 492, 600, 658, 18417, 36005, 293, 3889, 36005, 293, 729, 33318, + 281, 2115, 11, 50690], "temperature": 0.0, "avg_logprob": -0.16026883885480356, + "compression_ratio": 1.9652509652509653, "no_speech_prob": 0.0006026344490237534}, + {"id": 620, "seek": 207406, "start": 2080.58, "end": 2083.1, "text": " those mapped + to term sequences and documents.", "tokens": [50690, 729, 33318, 281, 1433, 22978, + 293, 8512, 13, 50816], "temperature": 0.0, "avg_logprob": -0.16026883885480356, + "compression_ratio": 1.9652509652509653, "no_speech_prob": 0.0006026344490237534}, + 
{"id": 621, "seek": 207406, "start": 2083.1, "end": 2086.34, "text": " But once + you get documents and we''ve got these terms across documents,", "tokens": [50816, + 583, 1564, 291, 483, 8512, 293, 321, 600, 658, 613, 2115, 2108, 8512, 11, 50978], + "temperature": 0.0, "avg_logprob": -0.16026883885480356, "compression_ratio": 1.9652509652509653, + "no_speech_prob": 0.0006026344490237534}, {"id": 622, "seek": 207406, "start": 2086.34, + "end": 2089.5, "text": " you can just think of this as a giant graph of relationships.", + "tokens": [50978, 291, 393, 445, 519, 295, 341, 382, 257, 7410, 4295, 295, 6159, + 13, 51136], "temperature": 0.0, "avg_logprob": -0.16026883885480356, "compression_ratio": + 1.9652509652509653, "no_speech_prob": 0.0006026344490237534}, {"id": 623, "seek": + 207406, "start": 2089.5, "end": 2092.34, "text": " And so I can take individual + words.", "tokens": [51136, 400, 370, 286, 393, 747, 2609, 2283, 13, 51278], "temperature": + 0.0, "avg_logprob": -0.16026883885480356, "compression_ratio": 1.9652509652509653, + "no_speech_prob": 0.0006026344490237534}, {"id": 624, "seek": 207406, "start": 2092.34, + "end": 2096.2599999999998, "text": " In this case, Trey, he, he, they all refer + to the same entity.", "tokens": [51278, 682, 341, 1389, 11, 314, 7950, 11, 415, + 11, 415, 11, 436, 439, 2864, 281, 264, 912, 13977, 13, 51474], "temperature": 0.0, + "avg_logprob": -0.16026883885480356, "compression_ratio": 1.9652509652509653, "no_speech_prob": + 0.0006026344490237534}, {"id": 625, "seek": 207406, "start": 2096.2599999999998, + "end": 2097.38, "text": " I can take other things.", "tokens": [51474, 286, 393, + 747, 661, 721, 13, 51530], "temperature": 0.0, "avg_logprob": -0.16026883885480356, + "compression_ratio": 1.9652509652509653, "no_speech_prob": 0.0006026344490237534}, + {"id": 626, "seek": 207406, "start": 2097.38, "end": 2101.38, "text": " And if I + think of this as a graph, then in fact,", "tokens": [51530, 400, 498, 286, 519, + 295, 341, 
382, 257, 4295, 11, 550, 294, 1186, 11, 51730], "temperature": 0.0, "avg_logprob": + -0.16026883885480356, "compression_ratio": 1.9652509652509653, "no_speech_prob": + 0.0006026344490237534}, {"id": 627, "seek": 207406, "start": 2101.38, "end": 2103.9, + "text": " you can leverage your inverted index as a graph", "tokens": [51730, 291, + 393, 13982, 428, 38969, 8186, 382, 257, 4295, 51856], "temperature": 0.0, "avg_logprob": + -0.16026883885480356, "compression_ratio": 1.9652509652509653, "no_speech_prob": + 0.0006026344490237534}, {"id": 628, "seek": 210390, "start": 2103.9, "end": 2106.98, + "text": " and you can traverse it to find these relationships.", "tokens": [50364, + 293, 291, 393, 45674, 309, 281, 915, 613, 6159, 13, 50518], "temperature": 0.0, + "avg_logprob": -0.18201105613408125, "compression_ratio": 1.7575757575757576, "no_speech_prob": + 4.3960564653389156e-05}, {"id": 629, "seek": 210390, "start": 2106.98, "end": 2108.94, + "text": " And so, and a typical search engine.", "tokens": [50518, 400, 370, 11, + 293, 257, 7476, 3164, 2848, 13, 50616], "temperature": 0.0, "avg_logprob": -0.18201105613408125, + "compression_ratio": 1.7575757575757576, "no_speech_prob": 4.3960564653389156e-05}, + {"id": 630, "seek": 210390, "start": 2108.94, "end": 2111.62, "text": " So like + any of the Lucy and engines, for example,", "tokens": [50616, 407, 411, 604, 295, + 264, 22698, 293, 12982, 11, 337, 1365, 11, 50750], "temperature": 0.0, "avg_logprob": + -0.18201105613408125, "compression_ratio": 1.7575757575757576, "no_speech_prob": + 4.3960564653389156e-05}, {"id": 631, "seek": 210390, "start": 2111.62, "end": 2114.34, + "text": " you have an inverted index,", "tokens": [50750, 291, 362, 364, 38969, + 8186, 11, 50886], "temperature": 0.0, "avg_logprob": -0.18201105613408125, "compression_ratio": + 1.7575757575757576, "no_speech_prob": 4.3960564653389156e-05}, {"id": 632, "seek": + 210390, "start": 2114.34, "end": 2117.54, "text": " which is something that maps + 
terms to sets of documents.", "tokens": [50886, 597, 307, 746, 300, 11317, 2115, + 281, 6352, 295, 8512, 13, 51046], "temperature": 0.0, "avg_logprob": -0.18201105613408125, + "compression_ratio": 1.7575757575757576, "no_speech_prob": 4.3960564653389156e-05}, + {"id": 633, "seek": 210390, "start": 2117.54, "end": 2119.86, "text": " And then + you''ve got usually a forward index", "tokens": [51046, 400, 550, 291, 600, 658, + 2673, 257, 2128, 8186, 51162], "temperature": 0.0, "avg_logprob": -0.18201105613408125, + "compression_ratio": 1.7575757575757576, "no_speech_prob": 4.3960564653389156e-05}, + {"id": 634, "seek": 210390, "start": 2119.86, "end": 2124.86, "text": " and open + search, elastic search, solar, any Lucy and engine.", "tokens": [51162, 293, 1269, + 3164, 11, 17115, 3164, 11, 7936, 11, 604, 22698, 293, 2848, 13, 51412], "temperature": + 0.0, "avg_logprob": -0.18201105613408125, "compression_ratio": 1.7575757575757576, + "no_speech_prob": 4.3960564653389156e-05}, {"id": 635, "seek": 210390, "start": + 2124.86, "end": 2127.2200000000003, "text": " This is going to be your doc values.", + "tokens": [51412, 639, 307, 516, 281, 312, 428, 3211, 4190, 13, 51530], "temperature": + 0.0, "avg_logprob": -0.18201105613408125, "compression_ratio": 1.7575757575757576, + "no_speech_prob": 4.3960564653389156e-05}, {"id": 636, "seek": 210390, "start": + 2127.2200000000003, "end": 2130.34, "text": " But essentially, I can take any term + and map it to a set of documents.", "tokens": [51530, 583, 4476, 11, 286, 393, 747, + 604, 1433, 293, 4471, 309, 281, 257, 992, 295, 8512, 13, 51686], "temperature": + 0.0, "avg_logprob": -0.18201105613408125, "compression_ratio": 1.7575757575757576, + "no_speech_prob": 4.3960564653389156e-05}, {"id": 637, "seek": 210390, "start": + 2130.34, "end": 2132.46, "text": " So if I can take any term,", "tokens": [51686, + 407, 498, 286, 393, 747, 604, 1433, 11, 51792], "temperature": 0.0, "avg_logprob": + -0.18201105613408125, 
"compression_ratio": 1.7575757575757576, "no_speech_prob": + 4.3960564653389156e-05}, {"id": 638, "seek": 213246, "start": 2132.46, "end": 2134.7400000000002, + "text": " I''m sorry, any document map it to a set of terms.", "tokens": [50364, + 286, 478, 2597, 11, 604, 4166, 4471, 309, 281, 257, 992, 295, 2115, 13, 50478], + "temperature": 0.0, "avg_logprob": -0.14726111725086474, "compression_ratio": 2.017167381974249, + "no_speech_prob": 0.00017674015543889254}, {"id": 639, "seek": 213246, "start": + 2134.7400000000002, "end": 2138.2200000000003, "text": " So if I can take any term + and map it to a document set,", "tokens": [50478, 407, 498, 286, 393, 747, 604, + 1433, 293, 4471, 309, 281, 257, 4166, 992, 11, 50652], "temperature": 0.0, "avg_logprob": + -0.14726111725086474, "compression_ratio": 2.017167381974249, "no_speech_prob": + 0.00017674015543889254}, {"id": 640, "seek": 213246, "start": 2138.2200000000003, + "end": 2141.38, "text": " and I can take any document and map it to a set of terms,", + "tokens": [50652, 293, 286, 393, 747, 604, 4166, 293, 4471, 309, 281, 257, 992, + 295, 2115, 11, 50810], "temperature": 0.0, "avg_logprob": -0.14726111725086474, + "compression_ratio": 2.017167381974249, "no_speech_prob": 0.00017674015543889254}, + {"id": 641, "seek": 213246, "start": 2141.38, "end": 2144.38, "text": " then that''s + a graph and I can traverse back and forth across this.", "tokens": [50810, 550, + 300, 311, 257, 4295, 293, 286, 393, 45674, 646, 293, 5220, 2108, 341, 13, 50960], + "temperature": 0.0, "avg_logprob": -0.14726111725086474, "compression_ratio": 2.017167381974249, + "no_speech_prob": 0.00017674015543889254}, {"id": 642, "seek": 213246, "start": + 2144.38, "end": 2150.66, "text": " So for example, if I have the skill of Java and + the skill field,", "tokens": [50960, 407, 337, 1365, 11, 498, 286, 362, 264, 5389, + 295, 10745, 293, 264, 5389, 2519, 11, 51274], "temperature": 0.0, "avg_logprob": + -0.14726111725086474, "compression_ratio": 
2.017167381974249, "no_speech_prob": + 0.00017674015543889254}, {"id": 643, "seek": 213246, "start": 2150.66, "end": 2154.1, + "text": " and I''ve got a set of documents that has the keyword Java,", "tokens": + [51274, 293, 286, 600, 658, 257, 992, 295, 8512, 300, 575, 264, 20428, 10745, 11, + 51446], "temperature": 0.0, "avg_logprob": -0.14726111725086474, "compression_ratio": + 2.017167381974249, "no_speech_prob": 0.00017674015543889254}, {"id": 644, "seek": + 213246, "start": 2154.1, "end": 2159.7400000000002, "text": " you can think of this + set of documents as representing the keyword Java.", "tokens": [51446, 291, 393, + 519, 295, 341, 992, 295, 8512, 382, 13460, 264, 20428, 10745, 13, 51728], "temperature": + 0.0, "avg_logprob": -0.14726111725086474, "compression_ratio": 2.017167381974249, + "no_speech_prob": 0.00017674015543889254}, {"id": 645, "seek": 213246, "start": + 2159.7400000000002, "end": 2161.98, "text": " And then similarly, there''s sort + of a link", "tokens": [51728, 400, 550, 14138, 11, 456, 311, 1333, 295, 257, 2113, + 51840], "temperature": 0.0, "avg_logprob": -0.14726111725086474, "compression_ratio": + 2.017167381974249, "no_speech_prob": 0.00017674015543889254}, {"id": 646, "seek": + 216198, "start": 2161.98, "end": 2162.82, "text": " to other documents.", "tokens": + [50364, 281, 661, 8512, 13, 50406], "temperature": 0.0, "avg_logprob": -0.16245371765560573, + "compression_ratio": 1.9816176470588236, "no_speech_prob": 0.0005235624266788363}, + {"id": 647, "seek": 216198, "start": 2162.82, "end": 2165.18, "text": " You''ll + notice that there''s no documents that link", "tokens": [50406, 509, 603, 3449, + 300, 456, 311, 572, 8512, 300, 2113, 50524], "temperature": 0.0, "avg_logprob": + -0.16245371765560573, "compression_ratio": 1.9816176470588236, "no_speech_prob": + 0.0005235624266788363}, {"id": 648, "seek": 216198, "start": 2165.18, "end": 2168.14, + "text": " both the skill of Java and the skill of hibernate.", "tokens": [50524, + 
1293, 264, 5389, 295, 10745, 293, 264, 5389, 295, 4879, 26848, 473, 13, 50672], + "temperature": 0.0, "avg_logprob": -0.16245371765560573, "compression_ratio": 1.9816176470588236, + "no_speech_prob": 0.0005235624266788363}, {"id": 649, "seek": 216198, "start": 2168.14, + "end": 2170.14, "text": " And so in a set theory view, it looks like this.", "tokens": + [50672, 400, 370, 294, 257, 992, 5261, 1910, 11, 309, 1542, 411, 341, 13, 50772], + "temperature": 0.0, "avg_logprob": -0.16245371765560573, "compression_ratio": 1.9816176470588236, + "no_speech_prob": 0.0005235624266788363}, {"id": 650, "seek": 216198, "start": 2170.14, + "end": 2172.7400000000002, "text": " Notice that this set doesn''t intersect with + these.", "tokens": [50772, 13428, 300, 341, 992, 1177, 380, 27815, 365, 613, 13, + 50902], "temperature": 0.0, "avg_logprob": -0.16245371765560573, "compression_ratio": + 1.9816176470588236, "no_speech_prob": 0.0005235624266788363}, {"id": 651, "seek": + 216198, "start": 2172.7400000000002, "end": 2176.14, "text": " And from a graph + theory view, the same underlying indexes", "tokens": [50902, 400, 490, 257, 4295, + 5261, 1910, 11, 264, 912, 14217, 8186, 279, 51072], "temperature": 0.0, "avg_logprob": + -0.16245371765560573, "compression_ratio": 1.9816176470588236, "no_speech_prob": + 0.0005235624266788363}, {"id": 652, "seek": 216198, "start": 2176.14, "end": 2179.86, + "text": " look like this where I have a graph where I''ve got the skill of Java", + "tokens": [51072, 574, 411, 341, 689, 286, 362, 257, 4295, 689, 286, 600, 658, 264, + 5389, 295, 10745, 51258], "temperature": 0.0, "avg_logprob": -0.16245371765560573, + "compression_ratio": 1.9816176470588236, "no_speech_prob": 0.0005235624266788363}, + {"id": 653, "seek": 216198, "start": 2179.86, "end": 2182.7400000000002, "text": + " with a has related skill edge to the skill of Scala", "tokens": [51258, 365, 257, + 575, 4077, 5389, 4691, 281, 264, 5389, 295, 2747, 5159, 51402], "temperature": 0.0, + 
"avg_logprob": -0.16245371765560573, "compression_ratio": 1.9816176470588236, "no_speech_prob": + 0.0005235624266788363}, {"id": 654, "seek": 216198, "start": 2182.7400000000002, + "end": 2184.02, "text": " and the skill of hibernate.", "tokens": [51402, 293, 264, + 5389, 295, 4879, 26848, 473, 13, 51466], "temperature": 0.0, "avg_logprob": -0.16245371765560573, + "compression_ratio": 1.9816176470588236, "no_speech_prob": 0.0005235624266788363}, + {"id": 655, "seek": 216198, "start": 2184.02, "end": 2187.06, "text": " And then + oncology is completely disconnected from the graph.", "tokens": [51466, 400, 550, + 40592, 1793, 307, 2584, 29426, 490, 264, 4295, 13, 51618], "temperature": 0.0, "avg_logprob": + -0.16245371765560573, "compression_ratio": 1.9816176470588236, "no_speech_prob": + 0.0005235624266788363}, {"id": 656, "seek": 216198, "start": 2187.06, "end": 2189.86, + "text": " And all I''m doing is leveraging my inverted index,", "tokens": [51618, + 400, 439, 286, 478, 884, 307, 32666, 452, 38969, 8186, 11, 51758], "temperature": + 0.0, "avg_logprob": -0.16245371765560573, "compression_ratio": 1.9816176470588236, + "no_speech_prob": 0.0005235624266788363}, {"id": 657, "seek": 218986, "start": 2189.86, + "end": 2196.3, "text": " my sparse representation to traverse across these relationships.", + "tokens": [50364, 452, 637, 11668, 10290, 281, 45674, 2108, 613, 6159, 13, 50686], + "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": 1.8244897959183672, + "no_speech_prob": 0.001203685300424695}, {"id": 658, "seek": 218986, "start": 2196.3, + "end": 2199.02, "text": " This is very useful for things like disambiguation,", + "tokens": [50686, 639, 307, 588, 4420, 337, 721, 411, 717, 2173, 328, 16073, 11, + 50822], "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": + 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, {"id": 659, "seek": + 218986, "start": 2199.02, "end": 2200.98, "text": " where I can take a 
keyword like + server.", "tokens": [50822, 689, 286, 393, 747, 257, 20428, 411, 7154, 13, 50920], + "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": 1.8244897959183672, + "no_speech_prob": 0.001203685300424695}, {"id": 660, "seek": 218986, "start": 2200.98, + "end": 2203.58, "text": " I can traverse through documents to find", "tokens": [50920, + 286, 393, 45674, 807, 8512, 281, 915, 51050], "temperature": 0.0, "avg_logprob": + -0.157505534519659, "compression_ratio": 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, + {"id": 661, "seek": 218986, "start": 2203.58, "end": 2206.2200000000003, "text": + " what are the top semantically related categories,", "tokens": [51050, 437, 366, + 264, 1192, 4361, 49505, 4077, 10479, 11, 51182], "temperature": 0.0, "avg_logprob": + -0.157505534519659, "compression_ratio": 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, + {"id": 662, "seek": 218986, "start": 2206.2200000000003, "end": 2207.98, "text": + " for example, DevOps and Travel.", "tokens": [51182, 337, 1365, 11, 43051, 293, + 20610, 13, 51270], "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": + 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, {"id": 663, "seek": + 218986, "start": 2207.98, "end": 2209.6600000000003, "text": " And then within each + of those categories,", "tokens": [51270, 400, 550, 1951, 1184, 295, 729, 10479, + 11, 51354], "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": + 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, {"id": 664, "seek": + 218986, "start": 2209.6600000000003, "end": 2211.94, "text": " I can traverse to + other keywords and find", "tokens": [51354, 286, 393, 45674, 281, 661, 21009, 293, + 915, 51468], "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": + 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, {"id": 665, "seek": + 218986, "start": 2211.94, "end": 
2214.1400000000003, "text": " which are the most + semantically related keywords", "tokens": [51468, 597, 366, 264, 881, 4361, 49505, + 4077, 21009, 51578], "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": + 1.8244897959183672, "no_speech_prob": 0.001203685300424695}, {"id": 666, "seek": + 218986, "start": 2214.1400000000003, "end": 2216.3, "text": " to server and the + DevOps category.", "tokens": [51578, 281, 7154, 293, 264, 43051, 7719, 13, 51686], + "temperature": 0.0, "avg_logprob": -0.157505534519659, "compression_ratio": 1.8244897959183672, + "no_speech_prob": 0.001203685300424695}, {"id": 667, "seek": 221630, "start": 2216.3, + "end": 2218.82, "text": " For example, I get terms like Docker,", "tokens": [50364, + 1171, 1365, 11, 286, 483, 2115, 411, 33772, 11, 50490], "temperature": 0.0, "avg_logprob": + -0.1971553455699574, "compression_ratio": 1.5691056910569106, "no_speech_prob": + 0.000775289663579315}, {"id": 668, "seek": 221630, "start": 2218.82, "end": 2221.82, + "text": " Ingenx, Jenkins, Git, Words and Travel,", "tokens": [50490, 682, 1766, + 87, 11, 41273, 11, 16939, 11, 32857, 293, 20610, 11, 50640], "temperature": 0.0, + "avg_logprob": -0.1971553455699574, "compression_ratio": 1.5691056910569106, "no_speech_prob": + 0.000775289663579315}, {"id": 669, "seek": 221630, "start": 2221.82, "end": 2224.2200000000003, + "text": " I get things like tip, restaurant,", "tokens": [50640, 286, 483, 721, + 411, 4125, 11, 6383, 11, 50760], "temperature": 0.0, "avg_logprob": -0.1971553455699574, + "compression_ratio": 1.5691056910569106, "no_speech_prob": 0.000775289663579315}, + {"id": 670, "seek": 221630, "start": 2224.2200000000003, "end": 2226.5800000000004, + "text": " bill, wages, things like that.", "tokens": [50760, 2961, 11, 20097, 11, + 721, 411, 300, 13, 50878], "temperature": 0.0, "avg_logprob": -0.1971553455699574, + "compression_ratio": 1.5691056910569106, "no_speech_prob": 0.000775289663579315}, + {"id": 671, "seek": 
221630, "start": 2226.5800000000004, "end": 2229.9, "text": + " And so all of this just leverages an inverted index.", "tokens": [50878, 400, + 370, 439, 295, 341, 445, 12451, 1660, 364, 38969, 8186, 13, 51044], "temperature": + 0.0, "avg_logprob": -0.1971553455699574, "compression_ratio": 1.5691056910569106, + "no_speech_prob": 0.000775289663579315}, {"id": 672, "seek": 221630, "start": 2229.9, + "end": 2232.78, "text": " There''s no embeddings whatsoever.", "tokens": [51044, + 821, 311, 572, 12240, 29432, 17076, 13, 51188], "temperature": 0.0, "avg_logprob": + -0.1971553455699574, "compression_ratio": 1.5691056910569106, "no_speech_prob": + 0.000775289663579315}, {"id": 673, "seek": 221630, "start": 2232.78, "end": 2237.9, + "text": " This is all just leveraging the sparse semantic space.", "tokens": [51188, + 639, 307, 439, 445, 32666, 264, 637, 11668, 47982, 1901, 13, 51444], "temperature": + 0.0, "avg_logprob": -0.1971553455699574, "compression_ratio": 1.5691056910569106, + "no_speech_prob": 0.000775289663579315}, {"id": 674, "seek": 221630, "start": 2237.9, + "end": 2240.26, "text": " But why this matters for modeling a tent", "tokens": [51444, + 583, 983, 341, 7001, 337, 15983, 257, 7054, 51562], "temperature": 0.0, "avg_logprob": + -0.1971553455699574, "compression_ratio": 1.5691056910569106, "no_speech_prob": + 0.000775289663579315}, {"id": 675, "seek": 221630, "start": 2240.26, "end": 2244.6200000000003, + "text": " is if I have a query like barbecue near haystack over here,", "tokens": + [51562, 307, 498, 286, 362, 257, 14581, 411, 21877, 2651, 4842, 372, 501, 670, 510, + 11, 51780], "temperature": 0.0, "avg_logprob": -0.1971553455699574, "compression_ratio": + 1.5691056910569106, "no_speech_prob": 0.000775289663579315}, {"id": 676, "seek": + 224462, "start": 2245.58, "end": 2250.1, "text": " I can generate a sparse vector + representing the meaning", "tokens": [50412, 286, 393, 8460, 257, 637, 11668, 8062, + 13460, 264, 3620, 50638], "temperature": 0.0, 
"avg_logprob": -0.154041166305542, + "compression_ratio": 1.7110091743119267, "no_speech_prob": 0.0029200271237641573}, + {"id": 677, "seek": 224462, "start": 2250.1, "end": 2252.9, "text": " of barbecue + by looking at the index", "tokens": [50638, 295, 21877, 538, 1237, 412, 264, 8186, + 50778], "temperature": 0.0, "avg_logprob": -0.154041166305542, "compression_ratio": + 1.7110091743119267, "no_speech_prob": 0.0029200271237641573}, {"id": 678, "seek": + 224462, "start": 2252.9, "end": 2254.02, "text": " and seeing what''s related to + it.", "tokens": [50778, 293, 2577, 437, 311, 4077, 281, 309, 13, 50834], "temperature": + 0.0, "avg_logprob": -0.154041166305542, "compression_ratio": 1.7110091743119267, + "no_speech_prob": 0.0029200271237641573}, {"id": 679, "seek": 224462, "start": 2254.02, + "end": 2256.7, "text": " So in this context, what I''m able to find", "tokens": + [50834, 407, 294, 341, 4319, 11, 437, 286, 478, 1075, 281, 915, 50968], "temperature": + 0.0, "avg_logprob": -0.154041166305542, "compression_ratio": 1.7110091743119267, + "no_speech_prob": 0.0029200271237641573}, {"id": 680, "seek": 224462, "start": 2256.7, + "end": 2260.2999999999997, "text": " is that barbecue is related to things like + ribs", "tokens": [50968, 307, 300, 21877, 307, 4077, 281, 721, 411, 21400, 51148], + "temperature": 0.0, "avg_logprob": -0.154041166305542, "compression_ratio": 1.7110091743119267, + "no_speech_prob": 0.0029200271237641573}, {"id": 681, "seek": 224462, "start": 2260.2999999999997, + "end": 2264.14, "text": " and brisket and pork and the category of restaurant.", + "tokens": [51148, 293, 738, 271, 5758, 293, 10208, 293, 264, 7719, 295, 6383, 13, + 51340], "temperature": 0.0, "avg_logprob": -0.154041166305542, "compression_ratio": + 1.7110091743119267, "no_speech_prob": 0.0029200271237641573}, {"id": 682, "seek": + 224462, "start": 2264.14, "end": 2269.14, "text": " IE, I can generate a sparse + lexical vector like this,", "tokens": [51340, 286, 36, 11, 286, 
393, 8460, 257, + 637, 11668, 476, 87, 804, 8062, 411, 341, 11, 51590], "temperature": 0.0, "avg_logprob": + -0.154041166305542, "compression_ratio": 1.7110091743119267, "no_speech_prob": 0.0029200271237641573}, + {"id": 683, "seek": 224462, "start": 2269.66, "end": 2273.8199999999997, "text": + " purely from the things that are semantically nearby", "tokens": [51616, 17491, + 490, 264, 721, 300, 366, 4361, 49505, 11184, 51824], "temperature": 0.0, "avg_logprob": + -0.154041166305542, "compression_ratio": 1.7110091743119267, "no_speech_prob": 0.0029200271237641573}, + {"id": 684, "seek": 227382, "start": 2273.82, "end": 2276.86, "text": " in my sparse + vector space to barbecue.", "tokens": [50364, 294, 452, 637, 11668, 8062, 1901, + 281, 21877, 13, 50516], "temperature": 0.0, "avg_logprob": -0.11250356038411459, + "compression_ratio": 1.7067669172932332, "no_speech_prob": 0.00020677690918091685}, + {"id": 685, "seek": 227382, "start": 2276.86, "end": 2279.1800000000003, "text": + " But also, if you look at the query over on the right,", "tokens": [50516, 583, + 611, 11, 498, 291, 574, 412, 264, 14581, 670, 322, 264, 558, 11, 50632], "temperature": + 0.0, "avg_logprob": -0.11250356038411459, "compression_ratio": 1.7067669172932332, + "no_speech_prob": 0.00020677690918091685}, {"id": 686, "seek": 227382, "start": + 2279.1800000000003, "end": 2281.7000000000003, "text": " barbecue grill, what I''m + able to do", "tokens": [50632, 21877, 16492, 11, 437, 286, 478, 1075, 281, 360, + 50758], "temperature": 0.0, "avg_logprob": -0.11250356038411459, "compression_ratio": + 1.7067669172932332, "no_speech_prob": 0.00020677690918091685}, {"id": 687, "seek": + 227382, "start": 2281.7000000000003, "end": 2286.26, "text": " is generate a sparse + vector that is barbecue or grill", "tokens": [50758, 307, 8460, 257, 637, 11668, + 8062, 300, 307, 21877, 420, 16492, 50986], "temperature": 0.0, "avg_logprob": -0.11250356038411459, + "compression_ratio": 1.7067669172932332, 
"no_speech_prob": 0.00020677690918091685}, + {"id": 688, "seek": 227382, "start": 2286.26, "end": 2288.2200000000003, "text": + " or propane or charcoal.", "tokens": [50986, 420, 2365, 1929, 420, 30625, 13, 51084], + "temperature": 0.0, "avg_logprob": -0.11250356038411459, "compression_ratio": 1.7067669172932332, + "no_speech_prob": 0.00020677690918091685}, {"id": 689, "seek": 227382, "start": + 2288.2200000000003, "end": 2290.3, "text": " Notice that this vector is now different", + "tokens": [51084, 13428, 300, 341, 8062, 307, 586, 819, 51188], "temperature": 0.0, + "avg_logprob": -0.11250356038411459, "compression_ratio": 1.7067669172932332, "no_speech_prob": + 0.00020677690918091685}, {"id": 690, "seek": 227382, "start": 2290.3, "end": 2293.2200000000003, + "text": " because it''s contextualized based upon grill being", "tokens": [51188, + 570, 309, 311, 35526, 1602, 2361, 3564, 16492, 885, 51334], "temperature": 0.0, + "avg_logprob": -0.11250356038411459, "compression_ratio": 1.7067669172932332, "no_speech_prob": + 0.00020677690918091685}, {"id": 691, "seek": 227382, "start": 2293.2200000000003, + "end": 2294.06, "text": " in this query.", "tokens": [51334, 294, 341, 14581, 13, + 51376], "temperature": 0.0, "avg_logprob": -0.11250356038411459, "compression_ratio": + 1.7067669172932332, "no_speech_prob": 0.00020677690918091685}, {"id": 692, "seek": + 227382, "start": 2294.06, "end": 2297.94, "text": " So now my query becomes a category + about to our appliance", "tokens": [51376, 407, 586, 452, 14581, 3643, 257, 7719, + 466, 281, 527, 45646, 51570], "temperature": 0.0, "avg_logprob": -0.11250356038411459, + "compression_ratio": 1.7067669172932332, "no_speech_prob": 0.00020677690918091685}, + {"id": 693, "seek": 227382, "start": 2297.94, "end": 2299.38, "text": " and then + this is the list of words", "tokens": [51570, 293, 550, 341, 307, 264, 1329, 295, + 2283, 51642], "temperature": 0.0, "avg_logprob": -0.11250356038411459, "compression_ratio": + 
1.7067669172932332, "no_speech_prob": 0.00020677690918091685}, {"id": 694, "seek": + 227382, "start": 2299.38, "end": 2301.9, "text": " that better represents the meaning + of barbecue.", "tokens": [51642, 300, 1101, 8855, 264, 3620, 295, 21877, 13, 51768], + "temperature": 0.0, "avg_logprob": -0.11250356038411459, "compression_ratio": 1.7067669172932332, + "no_speech_prob": 0.00020677690918091685}, {"id": 695, "seek": 230190, "start": + 2301.9, "end": 2305.3, "text": " Again, no embeddings, no transformer models,", + "tokens": [50364, 3764, 11, 572, 12240, 29432, 11, 572, 31782, 5245, 11, 50534], + "temperature": 0.0, "avg_logprob": -0.16834406088326723, "compression_ratio": 1.658273381294964, + "no_speech_prob": 0.00031480571487918496}, {"id": 696, "seek": 230190, "start": + 2305.3, "end": 2306.54, "text": " no LLMs involved here.", "tokens": [50534, 572, + 441, 43, 26386, 3288, 510, 13, 50596], "temperature": 0.0, "avg_logprob": -0.16834406088326723, + "compression_ratio": 1.658273381294964, "no_speech_prob": 0.00031480571487918496}, + {"id": 697, "seek": 230190, "start": 2306.54, "end": 2309.46, "text": " This is + purely leveraging my sparse lexical space", "tokens": [50596, 639, 307, 17491, 32666, + 452, 637, 11668, 476, 87, 804, 1901, 50742], "temperature": 0.0, "avg_logprob": + -0.16834406088326723, "compression_ratio": 1.658273381294964, "no_speech_prob": + 0.00031480571487918496}, {"id": 698, "seek": 230190, "start": 2309.46, "end": 2311.46, + "text": " in the semantics within it.", "tokens": [50742, 294, 264, 4361, 45298, + 1951, 309, 13, 50842], "temperature": 0.0, "avg_logprob": -0.16834406088326723, + "compression_ratio": 1.658273381294964, "no_speech_prob": 0.00031480571487918496}, + {"id": 699, "seek": 230190, "start": 2311.46, "end": 2313.7000000000003, "text": + " And so this is some example source code", "tokens": [50842, 400, 370, 341, 307, + 512, 1365, 4009, 3089, 50954], "temperature": 0.0, "avg_logprob": -0.16834406088326723, + 
"compression_ratio": 1.658273381294964, "no_speech_prob": 0.00031480571487918496}, + {"id": 700, "seek": 230190, "start": 2313.7000000000003, "end": 2314.94, "text": + " from the AI Power Search book", "tokens": [50954, 490, 264, 7318, 7086, 17180, + 1446, 51016], "temperature": 0.0, "avg_logprob": -0.16834406088326723, "compression_ratio": + 1.658273381294964, "no_speech_prob": 0.00031480571487918496}, {"id": 701, "seek": + 230190, "start": 2314.94, "end": 2317.98, "text": " for traversing semantic knowledge + graphs.", "tokens": [51016, 337, 23149, 278, 47982, 3601, 24877, 13, 51168], "temperature": + 0.0, "avg_logprob": -0.16834406088326723, "compression_ratio": 1.658273381294964, + "no_speech_prob": 0.00031480571487918496}, {"id": 702, "seek": 230190, "start": + 2317.98, "end": 2321.3, "text": " But the idea here with the wormhole vectors", + "tokens": [51168, 583, 264, 1558, 510, 365, 264, 23835, 14094, 18875, 51334], "temperature": + 0.0, "avg_logprob": -0.16834406088326723, "compression_ratio": 1.658273381294964, + "no_speech_prob": 0.00031480571487918496}, {"id": 703, "seek": 230190, "start": + 2321.3, "end": 2323.82, "text": " is that I can take a query in any vector space.", + "tokens": [51334, 307, 300, 286, 393, 747, 257, 14581, 294, 604, 8062, 1901, 13, + 51460], "temperature": 0.0, "avg_logprob": -0.16834406088326723, "compression_ratio": + 1.658273381294964, "no_speech_prob": 0.00031480571487918496}, {"id": 704, "seek": + 230190, "start": 2323.82, "end": 2326.02, "text": " So for example, if I take a + lexical query here,", "tokens": [51460, 407, 337, 1365, 11, 498, 286, 747, 257, + 476, 87, 804, 14581, 510, 11, 51570], "temperature": 0.0, "avg_logprob": -0.16834406088326723, + "compression_ratio": 1.658273381294964, "no_speech_prob": 0.00031480571487918496}, + {"id": 705, "seek": 230190, "start": 2327.9, "end": 2331.62, "text": " I can easily + take a lasagna or drive through it, what have you.", "tokens": [51664, 286, 393, + 3612, 747, 257, 2439, 
35697, 420, 3332, 807, 309, 11, 437, 362, 291, 13, 51850], + "temperature": 0.0, "avg_logprob": -0.16834406088326723, "compression_ratio": 1.658273381294964, + "no_speech_prob": 0.00031480571487918496}, {"id": 706, "seek": 233162, "start": + 2331.62, "end": 2334.7799999999997, "text": " And I can generate these representations + over here", "tokens": [50364, 400, 286, 393, 8460, 613, 33358, 670, 510, 50522], + "temperature": 0.0, "avg_logprob": -0.27583667719475574, "compression_ratio": 1.7725321888412018, + "no_speech_prob": 0.00028894373099319637}, {"id": 707, "seek": 233162, "start": + 2334.7799999999997, "end": 2338.98, "text": " by taking a lasagna, finding a dockset + that matches that keyword,", "tokens": [50522, 538, 1940, 257, 2439, 35697, 11, + 5006, 257, 360, 2761, 302, 300, 10676, 300, 20428, 11, 50732], "temperature": 0.0, + "avg_logprob": -0.27583667719475574, "compression_ratio": 1.7725321888412018, "no_speech_prob": + 0.00028894373099319637}, {"id": 708, "seek": 233162, "start": 2338.98, "end": 2342.1, + "text": " and then from that dockset, finding these other relationships,", "tokens": + [50732, 293, 550, 490, 300, 360, 2761, 302, 11, 5006, 613, 661, 6159, 11, 50888], + "temperature": 0.0, "avg_logprob": -0.27583667719475574, "compression_ratio": 1.7725321888412018, + "no_speech_prob": 0.00028894373099319637}, {"id": 709, "seek": 233162, "start": + 2342.1, "end": 2346.94, "text": " for example, lasagna can be described as Italian", + "tokens": [50888, 337, 1365, 11, 2439, 35697, 393, 312, 7619, 382, 10003, 51130], + "temperature": 0.0, "avg_logprob": -0.27583667719475574, "compression_ratio": 1.7725321888412018, + "no_speech_prob": 0.00028894373099319637}, {"id": 710, "seek": 233162, "start": + 2346.94, "end": 2350.1, "text": " with keywords like lasagna, Alfredo pasta and + Italian.", "tokens": [51130, 365, 21009, 411, 2439, 35697, 11, 28327, 78, 13296, + 293, 10003, 13, 51288], "temperature": 0.0, "avg_logprob": -0.27583667719475574, + 
"compression_ratio": 1.7725321888412018, "no_speech_prob": 0.00028894373099319637}, + {"id": 711, "seek": 233162, "start": 2351.22, "end": 2354.14, "text": " And then + Korean barbecue can be represented as category", "tokens": [51344, 400, 550, 6933, + 21877, 393, 312, 10379, 382, 7719, 51490], "temperature": 0.0, "avg_logprob": -0.27583667719475574, + "compression_ratio": 1.7725321888412018, "no_speech_prob": 0.00028894373099319637}, + {"id": 712, "seek": 233162, "start": 2354.14, "end": 2357.54, "text": " of Korean + with terms like Korean,", "tokens": [51490, 295, 6933, 365, 2115, 411, 6933, 11, + 51660], "temperature": 0.0, "avg_logprob": -0.27583667719475574, "compression_ratio": + 1.7725321888412018, "no_speech_prob": 0.00028894373099319637}, {"id": 713, "seek": + 233162, "start": 2357.54, "end": 2360.42, "text": " Bonchon, Saruwan, et cetera, + fast food,", "tokens": [51660, 7368, 339, 266, 11, 6894, 84, 7916, 11, 1030, 11458, + 11, 2370, 1755, 11, 51804], "temperature": 0.0, "avg_logprob": -0.27583667719475574, + "compression_ratio": 1.7725321888412018, "no_speech_prob": 0.00028894373099319637}, + {"id": 714, "seek": 236042, "start": 2360.42, "end": 2362.1800000000003, "text": + " gets things like McDonald''s in window.", "tokens": [50364, 2170, 721, 411, 16889, + 311, 294, 4910, 13, 50452], "temperature": 0.0, "avg_logprob": -0.12369064952051917, + "compression_ratio": 1.7932330827067668, "no_speech_prob": 0.0006806566379964352}, + {"id": 715, "seek": 236042, "start": 2362.1800000000003, "end": 2363.98, "text": + " So this is purely leveraging,", "tokens": [50452, 407, 341, 307, 17491, 32666, + 11, 50542], "temperature": 0.0, "avg_logprob": -0.12369064952051917, "compression_ratio": + 1.7932330827067668, "no_speech_prob": 0.0006806566379964352}, {"id": 716, "seek": + 236042, "start": 2363.98, "end": 2365.34, "text": " and I''ve been doing this for + years,", "tokens": [50542, 293, 286, 600, 668, 884, 341, 337, 924, 11, 50610], "temperature": + 0.0, 
"avg_logprob": -0.12369064952051917, "compression_ratio": 1.7932330827067668, + "no_speech_prob": 0.0006806566379964352}, {"id": 717, "seek": 236042, "start": 2365.34, + "end": 2367.34, "text": " and it works very, very well.", "tokens": [50610, 293, + 309, 1985, 588, 11, 588, 731, 13, 50710], "temperature": 0.0, "avg_logprob": -0.12369064952051917, + "compression_ratio": 1.7932330827067668, "no_speech_prob": 0.0006806566379964352}, + {"id": 718, "seek": 236042, "start": 2367.34, "end": 2369.9, "text": " But this + is purely leveraging the inverted index", "tokens": [50710, 583, 341, 307, 17491, + 32666, 264, 38969, 8186, 50838], "temperature": 0.0, "avg_logprob": -0.12369064952051917, + "compression_ratio": 1.7932330827067668, "no_speech_prob": 0.0006806566379964352}, + {"id": 719, "seek": 236042, "start": 2369.9, "end": 2371.3, "text": " in this document + set.", "tokens": [50838, 294, 341, 4166, 992, 13, 50908], "temperature": 0.0, "avg_logprob": + -0.12369064952051917, "compression_ratio": 1.7932330827067668, "no_speech_prob": + 0.0006806566379964352}, {"id": 720, "seek": 236042, "start": 2371.3, "end": 2373.1800000000003, + "text": " But the idea with the wormhole vectors", "tokens": [50908, 583, 264, 1558, + 365, 264, 23835, 14094, 18875, 51002], "temperature": 0.0, "avg_logprob": -0.12369064952051917, + "compression_ratio": 1.7932330827067668, "no_speech_prob": 0.0006806566379964352}, + {"id": 721, "seek": 236042, "start": 2373.1800000000003, "end": 2377.5, "text": + " is not just to stay within a single vector space,", "tokens": [51002, 307, 406, + 445, 281, 1754, 1951, 257, 2167, 8062, 1901, 11, 51218], "temperature": 0.0, "avg_logprob": + -0.12369064952051917, "compression_ratio": 1.7932330827067668, "no_speech_prob": + 0.0006806566379964352}, {"id": 722, "seek": 236042, "start": 2377.5, "end": 2379.14, + "text": " but to be able to go across vector spaces.", "tokens": [51218, 457, 281, + 312, 1075, 281, 352, 2108, 8062, 7673, 13, 51300], "temperature": 
0.0, "avg_logprob": + -0.12369064952051917, "compression_ratio": 1.7932330827067668, "no_speech_prob": + 0.0006806566379964352}, {"id": 723, "seek": 236042, "start": 2379.14, "end": 2383.2200000000003, + "text": " So similarly, I should be able to take an embedding", "tokens": [51300, + 407, 14138, 11, 286, 820, 312, 1075, 281, 747, 364, 12240, 3584, 51504], "temperature": + 0.0, "avg_logprob": -0.12369064952051917, "compression_ratio": 1.7932330827067668, + "no_speech_prob": 0.0006806566379964352}, {"id": 724, "seek": 236042, "start": 2383.2200000000003, + "end": 2386.9, "text": " that finds a region in semantic vector space", "tokens": + [51504, 300, 10704, 257, 4458, 294, 47982, 8062, 1901, 51688], "temperature": 0.0, + "avg_logprob": -0.12369064952051917, "compression_ratio": 1.7932330827067668, "no_speech_prob": + 0.0006806566379964352}, {"id": 725, "seek": 236042, "start": 2386.9, "end": 2389.62, + "text": " and a dense space, find the nearby things,", "tokens": [51688, 293, 257, + 18011, 1901, 11, 915, 264, 11184, 721, 11, 51824], "temperature": 0.0, "avg_logprob": + -0.12369064952051917, "compression_ratio": 1.7932330827067668, "no_speech_prob": + 0.0006806566379964352}, {"id": 726, "seek": 238962, "start": 2389.62, "end": 2392.02, + "text": " which ultimately just translate to a dockset.", "tokens": [50364, 597, + 6284, 445, 13799, 281, 257, 360, 2761, 302, 13, 50484], "temperature": 0.0, "avg_logprob": + -0.12167274212015086, "compression_ratio": 1.7849829351535835, "no_speech_prob": + 8.210516534745693e-05}, {"id": 727, "seek": 238962, "start": 2392.02, "end": 2394.7, + "text": " And then from that dockset, I can use the same technique", "tokens": [50484, + 400, 550, 490, 300, 360, 2761, 302, 11, 286, 393, 764, 264, 912, 6532, 50618], "temperature": + 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": 1.7849829351535835, + "no_speech_prob": 8.210516534745693e-05}, {"id": 728, "seek": 238962, "start": 2394.7, + "end": 2396.7799999999997, 
"text": " to say what are the things that are related", + "tokens": [50618, 281, 584, 437, 366, 264, 721, 300, 366, 4077, 50722], "temperature": + 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": 1.7849829351535835, + "no_speech_prob": 8.210516534745693e-05}, {"id": 729, "seek": 238962, "start": 2396.7799999999997, + "end": 2399.14, "text": " within these documents and generate", "tokens": [50722, + 1951, 613, 8512, 293, 8460, 50840], "temperature": 0.0, "avg_logprob": -0.12167274212015086, + "compression_ratio": 1.7849829351535835, "no_speech_prob": 8.210516534745693e-05}, + {"id": 730, "seek": 238962, "start": 2399.14, "end": 2401.7, "text": " the similar + kinds of outputs over here?", "tokens": [50840, 264, 2531, 3685, 295, 23930, 670, + 510, 30, 50968], "temperature": 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": + 1.7849829351535835, "no_speech_prob": 8.210516534745693e-05}, {"id": 731, "seek": + 238962, "start": 2401.7, "end": 2404.7, "text": " You can also think of this, if + taking away", "tokens": [50968, 509, 393, 611, 519, 295, 341, 11, 498, 1940, 1314, + 51118], "temperature": 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": + 1.7849829351535835, "no_speech_prob": 8.210516534745693e-05}, {"id": 732, "seek": + 238962, "start": 2404.7, "end": 2407.22, "text": " all the wormhole vector terminology,", + "tokens": [51118, 439, 264, 23835, 14094, 8062, 27575, 11, 51244], "temperature": + 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": 1.7849829351535835, + "no_speech_prob": 8.210516534745693e-05}, {"id": 733, "seek": 238962, "start": 2407.22, + "end": 2408.94, "text": " you can think of this as just a way to make", "tokens": + [51244, 291, 393, 519, 295, 341, 382, 445, 257, 636, 281, 652, 51330], "temperature": + 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": 1.7849829351535835, + "no_speech_prob": 8.210516534745693e-05}, {"id": 734, "seek": 238962, "start": 2408.94, + "end": 
2410.98, "text": " your embeddings more explainable.", "tokens": [51330, + 428, 12240, 29432, 544, 2903, 712, 13, 51432], "temperature": 0.0, "avg_logprob": + -0.12167274212015086, "compression_ratio": 1.7849829351535835, "no_speech_prob": + 8.210516534745693e-05}, {"id": 735, "seek": 238962, "start": 2410.98, "end": 2413.66, + "text": " I''ve got an embedding, I go to a dense vector space,", "tokens": [51432, + 286, 600, 658, 364, 12240, 3584, 11, 286, 352, 281, 257, 18011, 8062, 1901, 11, + 51566], "temperature": 0.0, "avg_logprob": -0.12167274212015086, "compression_ratio": + 1.7849829351535835, "no_speech_prob": 8.210516534745693e-05}, {"id": 736, "seek": + 238962, "start": 2413.66, "end": 2416.02, "text": " I find documents, and then from + that set of documents,", "tokens": [51566, 286, 915, 8512, 11, 293, 550, 490, 300, + 992, 295, 8512, 11, 51684], "temperature": 0.0, "avg_logprob": -0.12167274212015086, + "compression_ratio": 1.7849829351535835, "no_speech_prob": 8.210516534745693e-05}, + {"id": 737, "seek": 238962, "start": 2416.02, "end": 2418.14, "text": " I''m now + deriving a lexical vector,", "tokens": [51684, 286, 478, 586, 1163, 2123, 257, 476, + 87, 804, 8062, 11, 51790], "temperature": 0.0, "avg_logprob": -0.12167274212015086, + "compression_ratio": 1.7849829351535835, "no_speech_prob": 8.210516534745693e-05}, + {"id": 738, "seek": 241814, "start": 2418.14, "end": 2421.58, "text": " which is + readable that''s describing what''s happening there.", "tokens": [50364, 597, 307, + 49857, 300, 311, 16141, 437, 311, 2737, 456, 13, 50536], "temperature": 0.0, "avg_logprob": + -0.13206357404220204, "compression_ratio": 1.6751824817518248, "no_speech_prob": + 0.0001020477429847233}, {"id": 739, "seek": 241814, "start": 2421.58, "end": 2422.7799999999997, + "text": " And of course, I can then turn around", "tokens": [50536, 400, 295, 1164, + 11, 286, 393, 550, 1261, 926, 50596], "temperature": 0.0, "avg_logprob": -0.13206357404220204, + 
"compression_ratio": 1.6751824817518248, "no_speech_prob": 0.0001020477429847233}, + {"id": 740, "seek": 241814, "start": 2422.7799999999997, "end": 2425.66, "text": + " and take that and query in my sparse space", "tokens": [50596, 293, 747, 300, + 293, 14581, 294, 452, 637, 11668, 1901, 50740], "temperature": 0.0, "avg_logprob": + -0.13206357404220204, "compression_ratio": 1.6751824817518248, "no_speech_prob": + 0.0001020477429847233}, {"id": 741, "seek": 241814, "start": 2425.66, "end": 2428.2999999999997, + "text": " to match other things that have the terms,", "tokens": [50740, 281, 2995, + 661, 721, 300, 362, 264, 2115, 11, 50872], "temperature": 0.0, "avg_logprob": -0.13206357404220204, + "compression_ratio": 1.6751824817518248, "no_speech_prob": 0.0001020477429847233}, + {"id": 742, "seek": 241814, "start": 2428.2999999999997, "end": 2429.8599999999997, + "text": " but maybe didn''t match in the dense space.", "tokens": [50872, 457, 1310, + 994, 380, 2995, 294, 264, 18011, 1901, 13, 50950], "temperature": 0.0, "avg_logprob": + -0.13206357404220204, "compression_ratio": 1.6751824817518248, "no_speech_prob": + 0.0001020477429847233}, {"id": 743, "seek": 241814, "start": 2429.8599999999997, + "end": 2431.7, "text": " So that''s the general idea.", "tokens": [50950, 407, 300, + 311, 264, 2674, 1558, 13, 51042], "temperature": 0.0, "avg_logprob": -0.13206357404220204, + "compression_ratio": 1.6751824817518248, "no_speech_prob": 0.0001020477429847233}, + {"id": 744, "seek": 241814, "start": 2432.8199999999997, "end": 2437.9, "text": + " There''s one last thing I wanted to cover briefly,", "tokens": [51098, 821, 311, + 472, 1036, 551, 286, 1415, 281, 2060, 10515, 11, 51352], "temperature": 0.0, "avg_logprob": + -0.13206357404220204, "compression_ratio": 1.6751824817518248, "no_speech_prob": + 0.0001020477429847233}, {"id": 745, "seek": 241814, "start": 2437.9, "end": 2440.62, + "text": " which is this notion of behavioral embedding spaces,", "tokens": [51352, + 
597, 307, 341, 10710, 295, 19124, 12240, 3584, 7673, 11, 51488], "temperature": + 0.0, "avg_logprob": -0.13206357404220204, "compression_ratio": 1.6751824817518248, + "no_speech_prob": 0.0001020477429847233}, {"id": 746, "seek": 241814, "start": 2440.62, + "end": 2442.66, "text": " because I''ve mentioned it multiple times.", "tokens": + [51488, 570, 286, 600, 2835, 309, 3866, 1413, 13, 51590], "temperature": 0.0, "avg_logprob": + -0.13206357404220204, "compression_ratio": 1.6751824817518248, "no_speech_prob": + 0.0001020477429847233}, {"id": 747, "seek": 241814, "start": 2442.66, "end": 2446.22, + "text": " And I have a feeling a lot of people aren''t super familiar.", "tokens": + [51590, 400, 286, 362, 257, 2633, 257, 688, 295, 561, 3212, 380, 1687, 4963, 13, + 51768], "temperature": 0.0, "avg_logprob": -0.13206357404220204, "compression_ratio": + 1.6751824817518248, "no_speech_prob": 0.0001020477429847233}, {"id": 748, "seek": + 244622, "start": 2446.2999999999997, "end": 2450.14, "text": " And so let me click + here.", "tokens": [50368, 400, 370, 718, 385, 2052, 510, 13, 50560], "temperature": + 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": 1.6072727272727272, + "no_speech_prob": 0.0005561095895245671}, {"id": 749, "seek": 244622, "start": 2450.14, + "end": 2451.98, "text": " The general idea, and I''ll be very quick through this,", + "tokens": [50560, 440, 2674, 1558, 11, 293, 286, 603, 312, 588, 1702, 807, 341, + 11, 50652], "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": + 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, {"id": 750, "seek": + 244622, "start": 2451.98, "end": 2453.8199999999997, "text": " we''ll spend more + time in the AI Power Search course", "tokens": [50652, 321, 603, 3496, 544, 565, + 294, 264, 7318, 7086, 17180, 1164, 50744], "temperature": 0.0, "avg_logprob": -0.12966444553473058, + "compression_ratio": 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, + {"id": 
751, "seek": 244622, "start": 2453.8199999999997, "end": 2457.7799999999997, + "text": " diving into all of this, but the very high level intuition", "tokens": + [50744, 20241, 666, 439, 295, 341, 11, 457, 264, 588, 1090, 1496, 24002, 50942], + "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": 1.6072727272727272, + "no_speech_prob": 0.0005561095895245671}, {"id": 752, "seek": 244622, "start": 2457.7799999999997, + "end": 2461.2999999999997, "text": " is that when users interact with your documents, + right?", "tokens": [50942, 307, 300, 562, 5022, 4648, 365, 428, 8512, 11, 558, 30, + 51118], "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": + 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, {"id": 753, "seek": + 244622, "start": 2461.2999999999997, "end": 2463.74, "text": " They run queries, + they click on things,", "tokens": [51118, 814, 1190, 24109, 11, 436, 2052, 322, + 721, 11, 51240], "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": + 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, {"id": 754, "seek": + 244622, "start": 2463.74, "end": 2466.2999999999997, "text": " they like them, add + to cart purchase,", "tokens": [51240, 436, 411, 552, 11, 909, 281, 5467, 8110, 11, + 51368], "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": + 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, {"id": 755, "seek": + 244622, "start": 2466.2999999999997, "end": 2468.3799999999997, "text": " those + are user behavioral signals.", "tokens": [51368, 729, 366, 4195, 19124, 12354, 13, + 51472], "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": + 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, {"id": 756, "seek": + 244622, "start": 2468.3799999999997, "end": 2470.98, "text": " And if you''ve got + a sufficient amount of traffic,", "tokens": [51472, 400, 498, 291, 600, 658, 257, + 11563, 2372, 
295, 6419, 11, 51602], "temperature": 0.0, "avg_logprob": -0.12966444553473058, + "compression_ratio": 1.6072727272727272, "no_speech_prob": 0.0005561095895245671}, + {"id": 757, "seek": 244622, "start": 2470.98, "end": 2472.58, "text": " you want + to be collecting those", "tokens": [51602, 291, 528, 281, 312, 12510, 729, 51682], + "temperature": 0.0, "avg_logprob": -0.12966444553473058, "compression_ratio": 1.6072727272727272, + "no_speech_prob": 0.0005561095895245671}, {"id": 758, "seek": 247258, "start": 2472.58, + "end": 2477.42, "text": " and leveraging them to build reflected intelligence algorithms.", + "tokens": [50364, 293, 32666, 552, 281, 1322, 15502, 7599, 14642, 13, 50606], "temperature": + 0.0, "avg_logprob": -0.1764691925048828, "compression_ratio": 1.828358208955224, + "no_speech_prob": 0.000283190980553627}, {"id": 759, "seek": 247258, "start": 2477.42, + "end": 2479.62, "text": " So one of the types I mentioned", "tokens": [50606, 407, + 472, 295, 264, 3467, 286, 2835, 50716], "temperature": 0.0, "avg_logprob": -0.1764691925048828, + "compression_ratio": 1.828358208955224, "no_speech_prob": 0.000283190980553627}, + {"id": 760, "seek": 247258, "start": 2479.62, "end": 2482.8199999999997, "text": + " several earlier signals boosting, collaborative filtering,", "tokens": [50716, + 2940, 3071, 12354, 43117, 11, 16555, 30822, 11, 50876], "temperature": 0.0, "avg_logprob": + -0.1764691925048828, "compression_ratio": 1.828358208955224, "no_speech_prob": 0.000283190980553627}, + {"id": 761, "seek": 247258, "start": 2482.8199999999997, "end": 2484.9, "text": + " and matrix factorization, learning to rank,", "tokens": [50876, 293, 8141, 5952, + 2144, 11, 2539, 281, 6181, 11, 50980], "temperature": 0.0, "avg_logprob": -0.1764691925048828, + "compression_ratio": 1.828358208955224, "no_speech_prob": 0.000283190980553627}, + {"id": 762, "seek": 247258, "start": 2484.9, "end": 2486.66, "text": " and knowledge + graph learning,", "tokens": [50980, 293, 3601, 
4295, 2539, 11, 51068], "temperature": + 0.0, "avg_logprob": -0.1764691925048828, "compression_ratio": 1.828358208955224, + "no_speech_prob": 0.000283190980553627}, {"id": 763, "seek": 247258, "start": 2486.66, + "end": 2489.18, "text": " but specifically on collaborative filtering,", "tokens": + [51068, 457, 4682, 322, 16555, 30822, 11, 51194], "temperature": 0.0, "avg_logprob": + -0.1764691925048828, "compression_ratio": 1.828358208955224, "no_speech_prob": 0.000283190980553627}, + {"id": 764, "seek": 247258, "start": 2489.18, "end": 2492.38, "text": " which is + mostly focused on personalized search,", "tokens": [51194, 597, 307, 5240, 5178, + 322, 28415, 3164, 11, 51354], "temperature": 0.0, "avg_logprob": -0.1764691925048828, + "compression_ratio": 1.828358208955224, "no_speech_prob": 0.000283190980553627}, + {"id": 765, "seek": 247258, "start": 2492.38, "end": 2493.7799999999997, "text": + " or understanding user behavior", "tokens": [51354, 420, 3701, 4195, 5223, 51424], + "temperature": 0.0, "avg_logprob": -0.1764691925048828, "compression_ratio": 1.828358208955224, + "no_speech_prob": 0.000283190980553627}, {"id": 766, "seek": 247258, "start": 2493.7799999999997, + "end": 2496.34, "text": " to generate better personalized results.", "tokens": [51424, + 281, 8460, 1101, 28415, 3542, 13, 51552], "temperature": 0.0, "avg_logprob": -0.1764691925048828, + "compression_ratio": 1.828358208955224, "no_speech_prob": 0.000283190980553627}, + {"id": 767, "seek": 247258, "start": 2496.34, "end": 2498.7, "text": " We typically + leverage collaborative filtering,", "tokens": [51552, 492, 5850, 13982, 16555, 30822, + 11, 51670], "temperature": 0.0, "avg_logprob": -0.1764691925048828, "compression_ratio": + 1.828358208955224, "no_speech_prob": 0.000283190980553627}, {"id": 768, "seek": + 247258, "start": 2498.7, "end": 2500.86, "text": " which is now algorithm for doing + recommendations.", "tokens": [51670, 597, 307, 586, 9284, 337, 884, 10434, 13, 51778], + 
"temperature": 0.0, "avg_logprob": -0.1764691925048828, "compression_ratio": 1.828358208955224, + "no_speech_prob": 0.000283190980553627}, {"id": 769, "seek": 250086, "start": 2500.86, + "end": 2502.98, "text": " So I start with a particular item,", "tokens": [50364, + 407, 286, 722, 365, 257, 1729, 3174, 11, 50470], "temperature": 0.0, "avg_logprob": + -0.17474715492942117, "compression_ratio": 1.7568627450980392, "no_speech_prob": + 0.0001063140225596726}, {"id": 770, "seek": 250086, "start": 2502.98, "end": 2506.3, + "text": " or particular user, and I recommend other items", "tokens": [50470, 420, + 1729, 4195, 11, 293, 286, 2748, 661, 4754, 50636], "temperature": 0.0, "avg_logprob": + -0.17474715492942117, "compression_ratio": 1.7568627450980392, "no_speech_prob": + 0.0001063140225596726}, {"id": 771, "seek": 250086, "start": 2506.3, "end": 2508.46, + "text": " based upon that item or user.", "tokens": [50636, 2361, 3564, 300, 3174, + 420, 4195, 13, 50744], "temperature": 0.0, "avg_logprob": -0.17474715492942117, + "compression_ratio": 1.7568627450980392, "no_speech_prob": 0.0001063140225596726}, + {"id": 772, "seek": 250086, "start": 2508.46, "end": 2510.3, "text": " So this is + what that typically looks like, right?", "tokens": [50744, 407, 341, 307, 437, 300, + 5850, 1542, 411, 11, 558, 30, 50836], "temperature": 0.0, "avg_logprob": -0.17474715492942117, + "compression_ratio": 1.7568627450980392, "no_speech_prob": 0.0001063140225596726}, + {"id": 773, "seek": 250086, "start": 2510.3, "end": 2513.46, "text": " Somebody + runs searches or purchases things like Apple", "tokens": [50836, 13463, 6676, 26701, + 420, 26762, 721, 411, 6373, 50994], "temperature": 0.0, "avg_logprob": -0.17474715492942117, + "compression_ratio": 1.7568627450980392, "no_speech_prob": 0.0001063140225596726}, + {"id": 774, "seek": 250086, "start": 2513.46, "end": 2516.7000000000003, "text": + " and MacBook, and then these are the items they interact with,", "tokens": [50994, + 293, 
31737, 11, 293, 550, 613, 366, 264, 4754, 436, 4648, 365, 11, 51156], "temperature": + 0.0, "avg_logprob": -0.17474715492942117, "compression_ratio": 1.7568627450980392, + "no_speech_prob": 0.0001063140225596726}, {"id": 775, "seek": 250086, "start": 2516.7000000000003, + "end": 2519.82, "text": " iPads and MacBook Airs, things like that.", "tokens": + [51156, 5180, 5834, 293, 31737, 5774, 82, 11, 721, 411, 300, 13, 51312], "temperature": + 0.0, "avg_logprob": -0.17474715492942117, "compression_ratio": 1.7568627450980392, + "no_speech_prob": 0.0001063140225596726}, {"id": 776, "seek": 250086, "start": 2519.82, + "end": 2521.7400000000002, "text": " And then for that user, we can generate", "tokens": + [51312, 400, 550, 337, 300, 4195, 11, 321, 393, 8460, 51408], "temperature": 0.0, + "avg_logprob": -0.17474715492942117, "compression_ratio": 1.7568627450980392, "no_speech_prob": + 0.0001063140225596726}, {"id": 777, "seek": 250086, "start": 2521.7400000000002, + "end": 2524.78, "text": " this list of recommendations based upon running", "tokens": + [51408, 341, 1329, 295, 10434, 2361, 3564, 2614, 51560], "temperature": 0.0, "avg_logprob": + -0.17474715492942117, "compression_ratio": 1.7568627450980392, "no_speech_prob": + 0.0001063140225596726}, {"id": 778, "seek": 250086, "start": 2524.78, "end": 2527.78, + "text": " this collaborative filtering algorithm.", "tokens": [51560, 341, 16555, + 30822, 9284, 13, 51710], "temperature": 0.0, "avg_logprob": -0.17474715492942117, + "compression_ratio": 1.7568627450980392, "no_speech_prob": 0.0001063140225596726}, + {"id": 779, "seek": 252778, "start": 2527.78, "end": 2532.5, "text": " In this case, + I want to briefly mention again,", "tokens": [50364, 682, 341, 1389, 11, 286, 528, + 281, 10515, 2152, 797, 11, 50600], "temperature": 0.0, "avg_logprob": -0.1571255260043674, + "compression_ratio": 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, + {"id": 780, "seek": 252778, "start": 2532.5, "end": 2535.78, 
"text": " that with + typical content based embeddings,", "tokens": [50600, 300, 365, 7476, 2701, 2361, + 12240, 29432, 11, 50764], "temperature": 0.0, "avg_logprob": -0.1571255260043674, + "compression_ratio": 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, + {"id": 781, "seek": 252778, "start": 2535.78, "end": 2538.26, "text": " I mentioned + latent features earlier.", "tokens": [50764, 286, 2835, 48994, 4122, 3071, 13, 50888], + "temperature": 0.0, "avg_logprob": -0.1571255260043674, "compression_ratio": 1.8680851063829786, + "no_speech_prob": 0.0008366529946215451}, {"id": 782, "seek": 252778, "start": 2538.26, + "end": 2540.46, "text": " Typically you have items,", "tokens": [50888, 23129, 291, + 362, 4754, 11, 50998], "temperature": 0.0, "avg_logprob": -0.1571255260043674, "compression_ratio": + 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, {"id": 783, "seek": + 252778, "start": 2540.46, "end": 2544.7000000000003, "text": " and there''s these + densely packed dimensions", "tokens": [50998, 293, 456, 311, 613, 24505, 736, 13265, + 12819, 51210], "temperature": 0.0, "avg_logprob": -0.1571255260043674, "compression_ratio": + 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, {"id": 784, "seek": + 252778, "start": 2544.7000000000003, "end": 2547.5, "text": " that represents different + features.", "tokens": [51210, 300, 8855, 819, 4122, 13, 51350], "temperature": 0.0, + "avg_logprob": -0.1571255260043674, "compression_ratio": 1.8680851063829786, "no_speech_prob": + 0.0008366529946215451}, {"id": 785, "seek": 252778, "start": 2547.5, "end": 2550.34, + "text": " Collectively, this particular feature", "tokens": [51350, 31896, 3413, + 11, 341, 1729, 4111, 51492], "temperature": 0.0, "avg_logprob": -0.1571255260043674, + "compression_ratio": 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, + {"id": 786, "seek": 252778, "start": 2550.34, "end": 2552.5, "text": " might have + a strong correlation with size,", 
"tokens": [51492, 1062, 362, 257, 2068, 20009, + 365, 2744, 11, 51600], "temperature": 0.0, "avg_logprob": -0.1571255260043674, "compression_ratio": + 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, {"id": 787, "seek": + 252778, "start": 2552.5, "end": 2554.5800000000004, "text": " this one I have a + strong correlation with color,", "tokens": [51600, 341, 472, 286, 362, 257, 2068, + 20009, 365, 2017, 11, 51704], "temperature": 0.0, "avg_logprob": -0.1571255260043674, + "compression_ratio": 1.8680851063829786, "no_speech_prob": 0.0008366529946215451}, + {"id": 788, "seek": 252778, "start": 2554.5800000000004, "end": 2556.1400000000003, + "text": " this one I have a strong correlation with,", "tokens": [51704, 341, 472, + 286, 362, 257, 2068, 20009, 365, 11, 51782], "temperature": 0.0, "avg_logprob": + -0.1571255260043674, "compression_ratio": 1.8680851063829786, "no_speech_prob": + 0.0008366529946215451}, {"id": 789, "seek": 252778, "start": 2556.1400000000003, + "end": 2557.7400000000002, "text": " is this kind of like a computer.", "tokens": + [51782, 307, 341, 733, 295, 411, 257, 3820, 13, 51862], "temperature": 0.0, "avg_logprob": + -0.1571255260043674, "compression_ratio": 1.8680851063829786, "no_speech_prob": + 0.0008366529946215451}, {"id": 790, "seek": 255774, "start": 2557.74, "end": 2562.18, + "text": " But those meanings spread across many different", "tokens": [50364, 583, + 729, 28138, 3974, 2108, 867, 819, 50586], "temperature": 0.0, "avg_logprob": -0.1256363902773176, + "compression_ratio": 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 791, + "seek": 255774, "start": 2562.18, "end": 2564.2999999999997, "text": " of these + dimensions.", "tokens": [50586, 295, 613, 12819, 13, 50692], "temperature": 0.0, + "avg_logprob": -0.1256363902773176, "compression_ratio": 1.96875, "no_speech_prob": + 8.213351247832179e-05}, {"id": 792, "seek": 255774, "start": 2564.2999999999997, + "end": 2567.2599999999998, "text": " Similarly, 
whenever we''re doing collaborative + filtering,", "tokens": [50692, 13157, 11, 5699, 321, 434, 884, 16555, 30822, 11, + 50840], "temperature": 0.0, "avg_logprob": -0.1256363902773176, "compression_ratio": + 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 793, "seek": 255774, "start": + 2567.2599999999998, "end": 2571.06, "text": " these also rely on latent features + or latent dimensions.", "tokens": [50840, 613, 611, 10687, 322, 48994, 4122, 420, + 48994, 12819, 13, 51030], "temperature": 0.0, "avg_logprob": -0.1256363902773176, + "compression_ratio": 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 794, + "seek": 255774, "start": 2571.06, "end": 2574.4599999999996, "text": " So for example, + if I have a bunch of users,", "tokens": [51030, 407, 337, 1365, 11, 498, 286, 362, + 257, 3840, 295, 5022, 11, 51200], "temperature": 0.0, "avg_logprob": -0.1256363902773176, + "compression_ratio": 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 795, + "seek": 255774, "start": 2574.4599999999996, "end": 2577.18, "text": " my first + user likes these three movies,", "tokens": [51200, 452, 700, 4195, 5902, 613, 1045, + 6233, 11, 51336], "temperature": 0.0, "avg_logprob": -0.1256363902773176, "compression_ratio": + 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 796, "seek": 255774, "start": + 2577.18, "end": 2579.58, "text": " my next user likes these three movies,", "tokens": + [51336, 452, 958, 4195, 5902, 613, 1045, 6233, 11, 51456], "temperature": 0.0, "avg_logprob": + -0.1256363902773176, "compression_ratio": 1.96875, "no_speech_prob": 8.213351247832179e-05}, + {"id": 797, "seek": 255774, "start": 2579.58, "end": 2581.3399999999997, "text": + " my third user likes these three,", "tokens": [51456, 452, 2636, 4195, 5902, 613, + 1045, 11, 51544], "temperature": 0.0, "avg_logprob": -0.1256363902773176, "compression_ratio": + 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 798, "seek": 255774, "start": + 2581.3399999999997, 
"end": 2582.8999999999996, "text": " my next user likes these + three,", "tokens": [51544, 452, 958, 4195, 5902, 613, 1045, 11, 51622], "temperature": + 0.0, "avg_logprob": -0.1256363902773176, "compression_ratio": 1.96875, "no_speech_prob": + 8.213351247832179e-05}, {"id": 799, "seek": 255774, "start": 2582.8999999999996, + "end": 2585.06, "text": " and my last user likes these three.", "tokens": [51622, + 293, 452, 1036, 4195, 5902, 613, 1045, 13, 51730], "temperature": 0.0, "avg_logprob": + -0.1256363902773176, "compression_ratio": 1.96875, "no_speech_prob": 8.213351247832179e-05}, + {"id": 800, "seek": 255774, "start": 2585.06, "end": 2586.8199999999997, "text": + " You can kind of visually see here,", "tokens": [51730, 509, 393, 733, 295, 19622, + 536, 510, 11, 51818], "temperature": 0.0, "avg_logprob": -0.1256363902773176, "compression_ratio": + 1.96875, "no_speech_prob": 8.213351247832179e-05}, {"id": 801, "seek": 258682, "start": + 2586.82, "end": 2589.6600000000003, "text": " these are some similarity here,", + "tokens": [50364, 613, 366, 512, 32194, 510, 11, 50506], "temperature": 0.0, "avg_logprob": + -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 802, "seek": 258682, "start": 2589.6600000000003, "end": + 2591.1000000000004, "text": " there''s some similarity here,", "tokens": [50506, + 456, 311, 512, 32194, 510, 11, 50578], "temperature": 0.0, "avg_logprob": -0.16618667431731723, + "compression_ratio": 1.8274647887323943, "no_speech_prob": 0.00411580502986908}, + {"id": 803, "seek": 258682, "start": 2591.1000000000004, "end": 2593.38, "text": + " your brain''s probably picking out what it is.", "tokens": [50578, 428, 3567, + 311, 1391, 8867, 484, 437, 309, 307, 13, 50692], "temperature": 0.0, "avg_logprob": + -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 804, "seek": 258682, "start": 2593.38, "end": 2596.46, + "text": " But 
if I were to map these conceptually,", "tokens": [50692, 583, 498, + 286, 645, 281, 4471, 613, 3410, 671, 11, 50846], "temperature": 0.0, "avg_logprob": + -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 805, "seek": 258682, "start": 2596.46, "end": 2599.34, + "text": " I would say that users one, two, and three", "tokens": [50846, 286, 576, + 584, 300, 5022, 472, 11, 732, 11, 293, 1045, 50990], "temperature": 0.0, "avg_logprob": + -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 806, "seek": 258682, "start": 2599.34, "end": 2601.46, + "text": " tended to like movies that were about superheroes", "tokens": [50990, + 34732, 281, 411, 6233, 300, 645, 466, 45417, 51096], "temperature": 0.0, "avg_logprob": + -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 807, "seek": 258682, "start": 2601.46, "end": 2604.46, + "text": " made by Marvel Studios and occasionally Warner Brothers,", "tokens": [51096, + 1027, 538, 13837, 23005, 293, 16895, 31769, 19886, 11, 51246], "temperature": 0.0, + "avg_logprob": -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 808, "seek": 258682, "start": 2604.46, "end": 2605.5800000000004, + "text": " they''re all action movies,", "tokens": [51246, 436, 434, 439, 3069, 6233, + 11, 51302], "temperature": 0.0, "avg_logprob": -0.16618667431731723, "compression_ratio": + 1.8274647887323943, "no_speech_prob": 0.00411580502986908}, {"id": 809, "seek": + 258682, "start": 2605.5800000000004, "end": 2607.54, "text": " and they''re not + suitable for small children.", "tokens": [51302, 293, 436, 434, 406, 12873, 337, + 1359, 2227, 13, 51400], "temperature": 0.0, "avg_logprob": -0.16618667431731723, + "compression_ratio": 1.8274647887323943, "no_speech_prob": 0.00411580502986908}, + {"id": 810, "seek": 258682, 
"start": 2607.54, "end": 2609.02, "text": " Whereas + users four and five,", "tokens": [51400, 13813, 5022, 1451, 293, 1732, 11, 51474], + "temperature": 0.0, "avg_logprob": -0.16618667431731723, "compression_ratio": 1.8274647887323943, + "no_speech_prob": 0.00411580502986908}, {"id": 811, "seek": 258682, "start": 2609.02, + "end": 2611.2200000000003, "text": " all liked animated movies,", "tokens": [51474, + 439, 4501, 18947, 6233, 11, 51584], "temperature": 0.0, "avg_logprob": -0.16618667431731723, + "compression_ratio": 1.8274647887323943, "no_speech_prob": 0.00411580502986908}, + {"id": 812, "seek": 258682, "start": 2611.2200000000003, "end": 2612.94, "text": + " all of them were suitable for small children,", "tokens": [51584, 439, 295, 552, + 645, 12873, 337, 1359, 2227, 11, 51670], "temperature": 0.0, "avg_logprob": -0.16618667431731723, + "compression_ratio": 1.8274647887323943, "no_speech_prob": 0.00411580502986908}, + {"id": 813, "seek": 258682, "start": 2612.94, "end": 2615.1400000000003, "text": + " and all of them were made by Disney and Pixar.", "tokens": [51670, 293, 439, 295, + 552, 645, 1027, 538, 8653, 293, 46695, 13, 51780], "temperature": 0.0, "avg_logprob": + -0.16618667431731723, "compression_ratio": 1.8274647887323943, "no_speech_prob": + 0.00411580502986908}, {"id": 814, "seek": 261514, "start": 2616.1, "end": 2617.9, + "text": " A collaborative filtering algorithm", "tokens": [50412, 316, 16555, 30822, + 9284, 50502], "temperature": 0.0, "avg_logprob": -0.15696475575271163, "compression_ratio": + 1.7020408163265306, "no_speech_prob": 0.001121428911574185}, {"id": 815, "seek": + 261514, "start": 2618.74, "end": 2621.3799999999997, "text": " sort of discovers + these relationships", "tokens": [50544, 1333, 295, 44522, 613, 6159, 50676], "temperature": + 0.0, "avg_logprob": -0.15696475575271163, "compression_ratio": 1.7020408163265306, + "no_speech_prob": 0.001121428911574185}, {"id": 816, "seek": 261514, "start": 2621.3799999999997, + "end": 
2623.54, "text": " and recommends based upon them", "tokens": [50676, 293, + 34556, 2361, 3564, 552, 50784], "temperature": 0.0, "avg_logprob": -0.15696475575271163, + "compression_ratio": 1.7020408163265306, "no_speech_prob": 0.001121428911574185}, + {"id": 817, "seek": 261514, "start": 2623.54, "end": 2626.62, "text": " because + they exist in the underlying documents,", "tokens": [50784, 570, 436, 2514, 294, + 264, 14217, 8512, 11, 50938], "temperature": 0.0, "avg_logprob": -0.15696475575271163, + "compression_ratio": 1.7020408163265306, "no_speech_prob": 0.001121428911574185}, + {"id": 818, "seek": 261514, "start": 2626.62, "end": 2630.1, "text": " even though + we don''t have them modeled out explicitly.", "tokens": [50938, 754, 1673, 321, + 500, 380, 362, 552, 37140, 484, 20803, 13, 51112], "temperature": 0.0, "avg_logprob": + -0.15696475575271163, "compression_ratio": 1.7020408163265306, "no_speech_prob": + 0.001121428911574185}, {"id": 819, "seek": 261514, "start": 2630.1, "end": 2632.2999999999997, + "text": " And the way this works with collaborative filtering", "tokens": [51112, + 400, 264, 636, 341, 1985, 365, 16555, 30822, 51222], "temperature": 0.0, "avg_logprob": + -0.15696475575271163, "compression_ratio": 1.7020408163265306, "no_speech_prob": + 0.001121428911574185}, {"id": 820, "seek": 261514, "start": 2632.2999999999997, + "end": 2634.8199999999997, "text": " is we do matrix factorization.", "tokens": + [51222, 307, 321, 360, 8141, 5952, 2144, 13, 51348], "temperature": 0.0, "avg_logprob": + -0.15696475575271163, "compression_ratio": 1.7020408163265306, "no_speech_prob": + 0.001121428911574185}, {"id": 821, "seek": 261514, "start": 2634.8199999999997, + "end": 2638.18, "text": " So we start with a user item matrix,", "tokens": [51348, + 407, 321, 722, 365, 257, 4195, 3174, 8141, 11, 51516], "temperature": 0.0, "avg_logprob": + -0.15696475575271163, "compression_ratio": 1.7020408163265306, "no_speech_prob": + 0.001121428911574185}, {"id": 822, 
"seek": 261514, "start": 2638.18, "end": 2639.62, + "text": " where here''s my list of users,", "tokens": [51516, 689, 510, 311, 452, + 1329, 295, 5022, 11, 51588], "temperature": 0.0, "avg_logprob": -0.15696475575271163, + "compression_ratio": 1.7020408163265306, "no_speech_prob": 0.001121428911574185}, + {"id": 823, "seek": 261514, "start": 2639.62, "end": 2640.62, "text": " and here''s + my items,", "tokens": [51588, 293, 510, 311, 452, 4754, 11, 51638], "temperature": + 0.0, "avg_logprob": -0.15696475575271163, "compression_ratio": 1.7020408163265306, + "no_speech_prob": 0.001121428911574185}, {"id": 824, "seek": 261514, "start": 2640.62, + "end": 2643.3799999999997, "text": " and then these are sort of the amount", "tokens": + [51638, 293, 550, 613, 366, 1333, 295, 264, 2372, 51776], "temperature": 0.0, "avg_logprob": + -0.15696475575271163, "compression_ratio": 1.7020408163265306, "no_speech_prob": + 0.001121428911574185}, {"id": 825, "seek": 264338, "start": 2643.5, "end": 2645.2200000000003, + "text": " to which they like those items.", "tokens": [50370, 281, 597, 436, 411, + 729, 4754, 13, 50456], "temperature": 0.0, "avg_logprob": -0.15000835748819205, + "compression_ratio": 1.7330827067669172, "no_speech_prob": 0.0006736969808116555}, + {"id": 826, "seek": 264338, "start": 2645.2200000000003, "end": 2648.62, "text": + " We can derive this based upon just their querian click patterns.", "tokens": [50456, + 492, 393, 28446, 341, 2361, 3564, 445, 641, 7083, 952, 2052, 8294, 13, 50626], "temperature": + 0.0, "avg_logprob": -0.15000835748819205, "compression_ratio": 1.7330827067669172, + "no_speech_prob": 0.0006736969808116555}, {"id": 827, "seek": 264338, "start": 2650.26, + "end": 2652.42, "text": " The intermediate step for collaborative filtering", "tokens": + [50708, 440, 19376, 1823, 337, 16555, 30822, 50816], "temperature": 0.0, "avg_logprob": + -0.15000835748819205, "compression_ratio": 1.7330827067669172, "no_speech_prob": + 0.0006736969808116555}, 
{"id": 828, "seek": 264338, "start": 2652.42, "end": 2654.1800000000003, + "text": " is matrix factorization,", "tokens": [50816, 307, 8141, 5952, 2144, 11, + 50904], "temperature": 0.0, "avg_logprob": -0.15000835748819205, "compression_ratio": + 1.7330827067669172, "no_speech_prob": 0.0006736969808116555}, {"id": 829, "seek": + 264338, "start": 2654.1800000000003, "end": 2658.58, "text": " which is taking this + underlying user item interaction matrix", "tokens": [50904, 597, 307, 1940, 341, + 14217, 4195, 3174, 9285, 8141, 51124], "temperature": 0.0, "avg_logprob": -0.15000835748819205, + "compression_ratio": 1.7330827067669172, "no_speech_prob": 0.0006736969808116555}, + {"id": 830, "seek": 264338, "start": 2658.58, "end": 2661.38, "text": " and trying + to break it into two different matrices.", "tokens": [51124, 293, 1382, 281, 1821, + 309, 666, 732, 819, 32284, 13, 51264], "temperature": 0.0, "avg_logprob": -0.15000835748819205, + "compression_ratio": 1.7330827067669172, "no_speech_prob": 0.0006736969808116555}, + {"id": 831, "seek": 264338, "start": 2661.38, "end": 2664.9, "text": " This user + feature matrix and this item feature matrix.", "tokens": [51264, 639, 4195, 4111, + 8141, 293, 341, 3174, 4111, 8141, 13, 51440], "temperature": 0.0, "avg_logprob": + -0.15000835748819205, "compression_ratio": 1.7330827067669172, "no_speech_prob": + 0.0006736969808116555}, {"id": 832, "seek": 264338, "start": 2664.9, "end": 2669.1800000000003, + "text": " And the idea is that if I can generate a set of latent values", "tokens": + [51440, 400, 264, 1558, 307, 300, 498, 286, 393, 8460, 257, 992, 295, 48994, 4190, + 51654], "temperature": 0.0, "avg_logprob": -0.15000835748819205, "compression_ratio": + 1.7330827067669172, "no_speech_prob": 0.0006736969808116555}, {"id": 833, "seek": + 264338, "start": 2669.1800000000003, "end": 2671.98, "text": " associated with this + user across some number of dimensions,", "tokens": [51654, 6615, 365, 341, 4195, + 2108, 512, 1230, 
295, 12819, 11, 51794], "temperature": 0.0, "avg_logprob": -0.15000835748819205, + "compression_ratio": 1.7330827067669172, "no_speech_prob": 0.0006736969808116555}, + {"id": 834, "seek": 267198, "start": 2671.98, "end": 2673.7400000000002, "text": + " I''m only showing three here visually,", "tokens": [50364, 286, 478, 787, 4099, + 1045, 510, 19622, 11, 50452], "temperature": 0.0, "avg_logprob": -0.10863017234481684, + "compression_ratio": 1.7865612648221343, "no_speech_prob": 0.0004396222939249128}, + {"id": 835, "seek": 267198, "start": 2673.7400000000002, "end": 2676.7400000000002, + "text": " because it''s a PowerPoint slide, but there''ll be more.", "tokens": [50452, + 570, 309, 311, 257, 25584, 4137, 11, 457, 456, 603, 312, 544, 13, 50602], "temperature": + 0.0, "avg_logprob": -0.10863017234481684, "compression_ratio": 1.7865612648221343, + "no_speech_prob": 0.0004396222939249128}, {"id": 836, "seek": 267198, "start": 2676.7400000000002, + "end": 2680.14, "text": " And if I have the same latent dimensions", "tokens": [50602, + 400, 498, 286, 362, 264, 912, 48994, 12819, 50772], "temperature": 0.0, "avg_logprob": + -0.10863017234481684, "compression_ratio": 1.7865612648221343, "no_speech_prob": + 0.0004396222939249128}, {"id": 837, "seek": 267198, "start": 2680.14, "end": 2681.7400000000002, + "text": " over here for the items,", "tokens": [50772, 670, 510, 337, 264, 4754, + 11, 50852], "temperature": 0.0, "avg_logprob": -0.10863017234481684, "compression_ratio": + 1.7865612648221343, "no_speech_prob": 0.0004396222939249128}, {"id": 838, "seek": + 267198, "start": 2681.7400000000002, "end": 2683.98, "text": " when I multiply a + particular user", "tokens": [50852, 562, 286, 12972, 257, 1729, 4195, 50964], "temperature": + 0.0, "avg_logprob": -0.10863017234481684, "compression_ratio": 1.7865612648221343, + "no_speech_prob": 0.0004396222939249128}, {"id": 839, "seek": 267198, "start": 2683.98, + "end": 2688.98, "text": " and their particular values associated", 
"tokens": [50964, + 293, 641, 1729, 4190, 6615, 51214], "temperature": 0.0, "avg_logprob": -0.10863017234481684, + "compression_ratio": 1.7865612648221343, "no_speech_prob": 0.0004396222939249128}, + {"id": 840, "seek": 267198, "start": 2688.98, "end": 2691.7, "text": " with these + latent dimensions with the movie,", "tokens": [51214, 365, 613, 48994, 12819, 365, + 264, 3169, 11, 51350], "temperature": 0.0, "avg_logprob": -0.10863017234481684, + "compression_ratio": 1.7865612648221343, "no_speech_prob": 0.0004396222939249128}, + {"id": 841, "seek": 267198, "start": 2691.7, "end": 2694.42, "text": " then I''m + pulling apart how much of this belongs to the movie", "tokens": [51350, 550, 286, + 478, 8407, 4936, 577, 709, 295, 341, 12953, 281, 264, 3169, 51486], "temperature": + 0.0, "avg_logprob": -0.10863017234481684, "compression_ratio": 1.7865612648221343, + "no_speech_prob": 0.0004396222939249128}, {"id": 842, "seek": 267198, "start": 2694.42, + "end": 2695.86, "text": " and how much of this belongs to the user", "tokens": [51486, + 293, 577, 709, 295, 341, 12953, 281, 264, 4195, 51558], "temperature": 0.0, "avg_logprob": + -0.10863017234481684, "compression_ratio": 1.7865612648221343, "no_speech_prob": + 0.0004396222939249128}, {"id": 843, "seek": 267198, "start": 2695.86, "end": 2697.34, + "text": " in terms of an interest.", "tokens": [51558, 294, 2115, 295, 364, 1179, + 13, 51632], "temperature": 0.0, "avg_logprob": -0.10863017234481684, "compression_ratio": + 1.7865612648221343, "no_speech_prob": 0.0004396222939249128}, {"id": 844, "seek": + 267198, "start": 2697.34, "end": 2699.34, "text": " But at the end of the day, these + are embeddings.", "tokens": [51632, 583, 412, 264, 917, 295, 264, 786, 11, 613, + 366, 12240, 29432, 13, 51732], "temperature": 0.0, "avg_logprob": -0.10863017234481684, + "compression_ratio": 1.7865612648221343, "no_speech_prob": 0.0004396222939249128}, + {"id": 845, "seek": 269934, "start": 2699.34, "end": 2702.78, "text": " This is 
+ a user embedding and this is an item embedding.", "tokens": [50364, 639, 307, 257, + 4195, 12240, 3584, 293, 341, 307, 364, 3174, 12240, 3584, 13, 50536], "temperature": + 0.0, "avg_logprob": -0.11654238547048261, "compression_ratio": 1.8694029850746268, + "no_speech_prob": 0.00016376233543269336}, {"id": 846, "seek": 269934, "start": + 2702.78, "end": 2704.7000000000003, "text": " What that means is that,", "tokens": + [50536, 708, 300, 1355, 307, 300, 11, 50632], "temperature": 0.0, "avg_logprob": + -0.11654238547048261, "compression_ratio": 1.8694029850746268, "no_speech_prob": + 0.00016376233543269336}, {"id": 847, "seek": 269934, "start": 2704.7000000000003, + "end": 2707.6600000000003, "text": " and this is just how it works to do collaborative + filtering", "tokens": [50632, 293, 341, 307, 445, 577, 309, 1985, 281, 360, 16555, + 30822, 50780], "temperature": 0.0, "avg_logprob": -0.11654238547048261, "compression_ratio": + 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, {"id": 848, "seek": + 269934, "start": 2707.6600000000003, "end": 2710.46, "text": " and actually generate + recommendations for particular items,", "tokens": [50780, 293, 767, 8460, 10434, + 337, 1729, 4754, 11, 50920], "temperature": 0.0, "avg_logprob": -0.11654238547048261, + "compression_ratio": 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, + {"id": 849, "seek": 269934, "start": 2710.46, "end": 2712.2200000000003, "text": + " not particularly useful for today.", "tokens": [50920, 406, 4098, 4420, 337, 965, + 13, 51008], "temperature": 0.0, "avg_logprob": -0.11654238547048261, "compression_ratio": + 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, {"id": 850, "seek": + 269934, "start": 2712.2200000000003, "end": 2716.58, "text": " But what I can do + is I can generate these latent embeddings", "tokens": [51008, 583, 437, 286, 393, + 360, 307, 286, 393, 8460, 613, 48994, 12240, 29432, 51226], "temperature": 0.0, + "avg_logprob": 
-0.11654238547048261, "compression_ratio": 1.8694029850746268, "no_speech_prob": + 0.00016376233543269336}, {"id": 851, "seek": 269934, "start": 2716.58, "end": 2718.3, + "text": " and these essentially allow me to create", "tokens": [51226, 293, 613, + 4476, 2089, 385, 281, 1884, 51312], "temperature": 0.0, "avg_logprob": -0.11654238547048261, + "compression_ratio": 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, + {"id": 852, "seek": 269934, "start": 2718.3, "end": 2721.9, "text": " a behavioral + embedding space for my items.", "tokens": [51312, 257, 19124, 12240, 3584, 1901, + 337, 452, 4754, 13, 51492], "temperature": 0.0, "avg_logprob": -0.11654238547048261, + "compression_ratio": 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, + {"id": 853, "seek": 269934, "start": 2721.9, "end": 2723.06, "text": " So once I''ve + done that,", "tokens": [51492, 407, 1564, 286, 600, 1096, 300, 11, 51550], "temperature": + 0.0, "avg_logprob": -0.11654238547048261, "compression_ratio": 1.8694029850746268, + "no_speech_prob": 0.00016376233543269336}, {"id": 854, "seek": 269934, "start": + 2723.06, "end": 2725.7000000000003, "text": " I can add these behavioral embeddings + onto documents", "tokens": [51550, 286, 393, 909, 613, 19124, 12240, 29432, 3911, + 8512, 51682], "temperature": 0.0, "avg_logprob": -0.11654238547048261, "compression_ratio": + 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, {"id": 855, "seek": + 269934, "start": 2725.7000000000003, "end": 2728.94, "text": " just like I do with + content-based embeddings", "tokens": [51682, 445, 411, 286, 360, 365, 2701, 12, + 6032, 12240, 29432, 51844], "temperature": 0.0, "avg_logprob": -0.11654238547048261, + "compression_ratio": 1.8694029850746268, "no_speech_prob": 0.00016376233543269336}, + {"id": 856, "seek": 272894, "start": 2728.94, "end": 2731.38, "text": " or whether + it''s images or text or what have you,", "tokens": [50364, 420, 1968, 309, 311, + 5267, 420, 2487, 420, 
437, 362, 291, 11, 50486], "temperature": 0.0, "avg_logprob": + -0.20460670535303965, "compression_ratio": 1.5326460481099657, "no_speech_prob": + 6.502851465484127e-05}, {"id": 857, "seek": 272894, "start": 2731.38, "end": 2735.62, + "text": " and then leverage those as a behavioral space.", "tokens": [50486, 293, + 550, 13982, 729, 382, 257, 19124, 1901, 13, 50698], "temperature": 0.0, "avg_logprob": + -0.20460670535303965, "compression_ratio": 1.5326460481099657, "no_speech_prob": + 6.502851465484127e-05}, {"id": 858, "seek": 272894, "start": 2735.62, "end": 2738.62, + "text": " So we do this commonly with personalized search,", "tokens": [50698, 407, + 321, 360, 341, 12719, 365, 28415, 3164, 11, 50848], "temperature": 0.0, "avg_logprob": + -0.20460670535303965, "compression_ratio": 1.5326460481099657, "no_speech_prob": + 6.502851465484127e-05}, {"id": 859, "seek": 272894, "start": 2738.62, "end": 2740.7000000000003, + "text": " for example, we''ll go through this in the course.", "tokens": [50848, + 337, 1365, 11, 321, 603, 352, 807, 341, 294, 264, 1164, 13, 50952], "temperature": + 0.0, "avg_logprob": -0.20460670535303965, "compression_ratio": 1.5326460481099657, + "no_speech_prob": 6.502851465484127e-05}, {"id": 860, "seek": 272894, "start": 2740.7000000000003, + "end": 2743.86, "text": " But if I have a person who previously searched", "tokens": + [50952, 583, 498, 286, 362, 257, 954, 567, 8046, 22961, 51110], "temperature": 0.0, + "avg_logprob": -0.20460670535303965, "compression_ratio": 1.5326460481099657, "no_speech_prob": + 6.502851465484127e-05}, {"id": 861, "seek": 272894, "start": 2743.86, "end": 2747.54, + "text": " for Hello Kitty Plus toy, GE Electric Razor,", "tokens": [51110, 337, + 2425, 36393, 7721, 12058, 11, 18003, 24677, 29051, 284, 11, 51294], "temperature": + 0.0, "avg_logprob": -0.20460670535303965, "compression_ratio": 1.5326460481099657, + "no_speech_prob": 6.502851465484127e-05}, {"id": 862, "seek": 272894, "start": 2747.54, + "end": 
2749.26, "text": " GE Bright White Lightbulbs,", "tokens": [51294, 18003, + 24271, 5552, 8279, 12176, 929, 11, 51380], "temperature": 0.0, "avg_logprob": -0.20460670535303965, + "compression_ratio": 1.5326460481099657, "no_speech_prob": 6.502851465484127e-05}, + {"id": 863, "seek": 272894, "start": 2749.26, "end": 2751.82, "text": " Samsung + Stainless Steel refrigerator,", "tokens": [51380, 13173, 745, 491, 1832, 26038, + 19655, 11, 51508], "temperature": 0.0, "avg_logprob": -0.20460670535303965, "compression_ratio": + 1.5326460481099657, "no_speech_prob": 6.502851465484127e-05}, {"id": 864, "seek": + 272894, "start": 2751.82, "end": 2753.86, "text": " I can take a normal query,", + "tokens": [51508, 286, 393, 747, 257, 2710, 14581, 11, 51610], "temperature": 0.0, + "avg_logprob": -0.20460670535303965, "compression_ratio": 1.5326460481099657, "no_speech_prob": + 6.502851465484127e-05}, {"id": 865, "seek": 272894, "start": 2753.86, "end": 2755.38, + "text": " keyword query for microwave,", "tokens": [51610, 20428, 14581, 337, 19025, + 11, 51686], "temperature": 0.0, "avg_logprob": -0.20460670535303965, "compression_ratio": + 1.5326460481099657, "no_speech_prob": 6.502851465484127e-05}, {"id": 866, "seek": + 272894, "start": 2755.38, "end": 2757.38, "text": " which just returns random microwaves.", + "tokens": [51686, 597, 445, 11247, 4974, 17177, 5423, 13, 51786], "temperature": + 0.0, "avg_logprob": -0.20460670535303965, "compression_ratio": 1.5326460481099657, + "no_speech_prob": 6.502851465484127e-05}, {"id": 867, "seek": 275738, "start": 2758.34, + "end": 2763.06, "text": " If I use these vectors improperly with no guardrails,", + "tokens": [50412, 759, 286, 764, 613, 18875, 40651, 356, 365, 572, 6290, 424, 4174, + 11, 50648], "temperature": 0.0, "avg_logprob": -0.15857930481433868, "compression_ratio": + 1.723021582733813, "no_speech_prob": 0.002244642935693264}, {"id": 868, "seek": + 275738, "start": 2763.06, "end": 2766.1800000000003, "text": " I might do 
things + like blur the lines between categories.", "tokens": [50648, 286, 1062, 360, 721, + 411, 14257, 264, 3876, 1296, 10479, 13, 50804], "temperature": 0.0, "avg_logprob": + -0.15857930481433868, "compression_ratio": 1.723021582733813, "no_speech_prob": + 0.002244642935693264}, {"id": 869, "seek": 275738, "start": 2766.1800000000003, + "end": 2768.46, "text": " Most people, if they''ve searched for a Samsung Stainless", + "tokens": [50804, 4534, 561, 11, 498, 436, 600, 22961, 337, 257, 13173, 745, 491, + 1832, 50918], "temperature": 0.0, "avg_logprob": -0.15857930481433868, "compression_ratio": + 1.723021582733813, "no_speech_prob": 0.002244642935693264}, {"id": 870, "seek": + 275738, "start": 2768.46, "end": 2769.86, "text": " Steel refrigerator,", "tokens": + [50918, 26038, 19655, 11, 50988], "temperature": 0.0, "avg_logprob": -0.15857930481433868, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.002244642935693264}, + {"id": 871, "seek": 275738, "start": 2769.86, "end": 2773.34, "text": " the best + result here would be a Samsung Stainless Steel microwave.", "tokens": [50988, 264, + 1151, 1874, 510, 576, 312, 257, 13173, 745, 491, 1832, 26038, 19025, 13, 51162], + "temperature": 0.0, "avg_logprob": -0.15857930481433868, "compression_ratio": 1.723021582733813, + "no_speech_prob": 0.002244642935693264}, {"id": 872, "seek": 275738, "start": 2773.34, + "end": 2774.9, "text": " But if you do this wrong,", "tokens": [51162, 583, 498, + 291, 360, 341, 2085, 11, 51240], "temperature": 0.0, "avg_logprob": -0.15857930481433868, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.002244642935693264}, + {"id": 873, "seek": 275738, "start": 2774.9, "end": 2776.7400000000002, "text": + " the sort of naive approach is,", "tokens": [51240, 264, 1333, 295, 29052, 3109, + 307, 11, 51332], "temperature": 0.0, "avg_logprob": -0.15857930481433868, "compression_ratio": + 1.723021582733813, "no_speech_prob": 0.002244642935693264}, {"id": 874, "seek": + 
275738, "start": 2776.7400000000002, "end": 2778.42, "text": " I might end up with + a Hello Kitty microwave", "tokens": [51332, 286, 1062, 917, 493, 365, 257, 2425, + 36393, 19025, 51416], "temperature": 0.0, "avg_logprob": -0.15857930481433868, "compression_ratio": + 1.723021582733813, "no_speech_prob": 0.002244642935693264}, {"id": 875, "seek": + 275738, "start": 2778.42, "end": 2782.7000000000003, "text": " or a Panasonic microwave, + or not Panasonic,", "tokens": [51416, 420, 257, 7557, 39460, 19025, 11, 420, 406, + 7557, 39460, 11, 51630], "temperature": 0.0, "avg_logprob": -0.15857930481433868, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.002244642935693264}, + {"id": 876, "seek": 275738, "start": 2782.7000000000003, "end": 2784.2200000000003, + "text": " but I might end up with things", "tokens": [51630, 457, 286, 1062, 917, + 493, 365, 721, 51706], "temperature": 0.0, "avg_logprob": -0.15857930481433868, + "compression_ratio": 1.723021582733813, "no_speech_prob": 0.002244642935693264}, + {"id": 877, "seek": 275738, "start": 2784.2200000000003, "end": 2786.5, "text": + " that don''t exactly match all of the preferences", "tokens": [51706, 300, 500, + 380, 2293, 2995, 439, 295, 264, 21910, 51820], "temperature": 0.0, "avg_logprob": + -0.15857930481433868, "compression_ratio": 1.723021582733813, "no_speech_prob": + 0.002244642935693264}, {"id": 878, "seek": 278650, "start": 2786.78, "end": 2789.06, + "text": " of the category, again, for another day.", "tokens": [50378, 295, 264, + 7719, 11, 797, 11, 337, 1071, 786, 13, 50492], "temperature": 0.0, "avg_logprob": + -0.15876143833376327, "compression_ratio": 1.62109375, "no_speech_prob": 0.00041343827615492046}, + {"id": 879, "seek": 278650, "start": 2789.06, "end": 2791.62, "text": " But this + is how a behavioral vector space would typically", "tokens": [50492, 583, 341, 307, + 577, 257, 19124, 8062, 1901, 576, 5850, 50620], "temperature": 0.0, "avg_logprob": + -0.15876143833376327, 
"compression_ratio": 1.62109375, "no_speech_prob": 0.00041343827615492046}, + {"id": 880, "seek": 278650, "start": 2791.62, "end": 2793.3, "text": " be used.", + "tokens": [50620, 312, 1143, 13, 50704], "temperature": 0.0, "avg_logprob": -0.15876143833376327, + "compression_ratio": 1.62109375, "no_speech_prob": 0.00041343827615492046}, {"id": + 881, "seek": 278650, "start": 2793.3, "end": 2796.42, "text": " But ultimately, + there''s a lot of tips and tricks", "tokens": [50704, 583, 6284, 11, 456, 311, 257, + 688, 295, 6082, 293, 11733, 50860], "temperature": 0.0, "avg_logprob": -0.15876143833376327, + "compression_ratio": 1.62109375, "no_speech_prob": 0.00041343827615492046}, {"id": + 882, "seek": 278650, "start": 2796.42, "end": 2798.74, "text": " you can use to + do AI-powered search", "tokens": [50860, 291, 393, 764, 281, 360, 7318, 12, 27178, + 3164, 50976], "temperature": 0.0, "avg_logprob": -0.15876143833376327, "compression_ratio": + 1.62109375, "no_speech_prob": 0.00041343827615492046}, {"id": 883, "seek": 278650, + "start": 2798.74, "end": 2801.66, "text": " to combine all of these different techniques", + "tokens": [50976, 281, 10432, 439, 295, 613, 819, 7512, 51122], "temperature": 0.0, + "avg_logprob": -0.15876143833376327, "compression_ratio": 1.62109375, "no_speech_prob": + 0.00041343827615492046}, {"id": 884, "seek": 278650, "start": 2801.66, "end": 2803.82, + "text": " that you might use to run searches", "tokens": [51122, 300, 291, 1062, + 764, 281, 1190, 26701, 51230], "temperature": 0.0, "avg_logprob": -0.15876143833376327, + "compression_ratio": 1.62109375, "no_speech_prob": 0.00041343827615492046}, {"id": + 885, "seek": 278650, "start": 2803.82, "end": 2806.3, "text": " and to query understanding + of relevance", "tokens": [51230, 293, 281, 14581, 3701, 295, 32684, 51354], "temperature": + 0.0, "avg_logprob": -0.15876143833376327, "compression_ratio": 1.62109375, "no_speech_prob": + 0.00041343827615492046}, {"id": 886, "seek": 278650, "start": 
2806.3, "end": 2809.3, + "text": " and sort of integrate wormhole vectors in various places.", "tokens": + [51354, 293, 1333, 295, 13365, 23835, 14094, 18875, 294, 3683, 3190, 13, 51504], + "temperature": 0.0, "avg_logprob": -0.15876143833376327, "compression_ratio": 1.62109375, + "no_speech_prob": 0.00041343827615492046}, {"id": 887, "seek": 278650, "start": + 2809.3, "end": 2812.3, "text": " So there''s lots of different query paradigms", + "tokens": [51504, 407, 456, 311, 3195, 295, 819, 14581, 13480, 328, 2592, 51654], + "temperature": 0.0, "avg_logprob": -0.15876143833376327, "compression_ratio": 1.62109375, + "no_speech_prob": 0.00041343827615492046}, {"id": 888, "seek": 281230, "start": + 2812.3, "end": 2816.6200000000003, "text": " to experiment with, to merge using + wormhole vectors.", "tokens": [50364, 281, 5120, 365, 11, 281, 22183, 1228, 23835, + 14094, 18875, 13, 50580], "temperature": 0.0, "avg_logprob": -0.13896166376706934, + "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, {"id": 889, + "seek": 281230, "start": 2816.6200000000003, "end": 2818.78, "text": " But that''s + the general idea.", "tokens": [50580, 583, 300, 311, 264, 2674, 1558, 13, 50688], + "temperature": 0.0, "avg_logprob": -0.13896166376706934, "compression_ratio": 1.75, + "no_speech_prob": 0.0018121888861060143}, {"id": 890, "seek": 281230, "start": 2818.78, + "end": 2820.26, "text": " I wanted to kind of introduce today", "tokens": [50688, + 286, 1415, 281, 733, 295, 5366, 965, 50762], "temperature": 0.0, "avg_logprob": + -0.13896166376706934, "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, + {"id": 891, "seek": 281230, "start": 2820.26, "end": 2823.1800000000003, "text": + " to get the discussion going about going", "tokens": [50762, 281, 483, 264, 5017, + 516, 466, 516, 50908], "temperature": 0.0, "avg_logprob": -0.13896166376706934, + "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, {"id": 892, + "seek": 281230, 
"start": 2823.1800000000003, "end": 2825.3, "text": " from thinking + of these vector spaces", "tokens": [50908, 490, 1953, 295, 613, 8062, 7673, 51014], + "temperature": 0.0, "avg_logprob": -0.13896166376706934, "compression_ratio": 1.75, + "no_speech_prob": 0.0018121888861060143}, {"id": 893, "seek": 281230, "start": 2825.3, + "end": 2827.1400000000003, "text": " as entirely sort of orthogonal,", "tokens": + [51014, 382, 7696, 1333, 295, 41488, 11, 51106], "temperature": 0.0, "avg_logprob": + -0.13896166376706934, "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, + {"id": 894, "seek": 281230, "start": 2827.1400000000003, "end": 2828.7000000000003, + "text": " where I have to query them separately,", "tokens": [51106, 689, 286, 362, + 281, 14581, 552, 14759, 11, 51184], "temperature": 0.0, "avg_logprob": -0.13896166376706934, + "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, {"id": 895, + "seek": 281230, "start": 2828.7000000000003, "end": 2830.9, "text": " or maybe I + could even query them in the same query,", "tokens": [51184, 420, 1310, 286, 727, + 754, 14581, 552, 294, 264, 912, 14581, 11, 51294], "temperature": 0.0, "avg_logprob": + -0.13896166376706934, "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, + {"id": 896, "seek": 281230, "start": 2830.9, "end": 2833.0600000000004, "text": + " but I''m filtering on them separately,", "tokens": [51294, 457, 286, 478, 30822, + 322, 552, 14759, 11, 51402], "temperature": 0.0, "avg_logprob": -0.13896166376706934, + "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, {"id": 897, + "seek": 281230, "start": 2833.0600000000004, "end": 2836.46, "text": " to trying + to actually pull out the semantic understanding", "tokens": [51402, 281, 1382, 281, + 767, 2235, 484, 264, 47982, 3701, 51572], "temperature": 0.0, "avg_logprob": -0.13896166376706934, + "compression_ratio": 1.75, "no_speech_prob": 0.0018121888861060143}, {"id": 898, + "seek": 281230, 
"start": 2836.46, "end": 2840.78, "text": " from one vector space + and use that to craft a sort of wormhole", "tokens": [51572, 490, 472, 8062, 1901, + 293, 764, 300, 281, 8448, 257, 1333, 295, 23835, 14094, 51788], "temperature": 0.0, + "avg_logprob": -0.13896166376706934, "compression_ratio": 1.75, "no_speech_prob": + 0.0018121888861060143}, {"id": 899, "seek": 284078, "start": 2840.78, "end": 2842.98, + "text": " or hopping off point to another vector space", "tokens": [50364, 420, + 47199, 766, 935, 281, 1071, 8062, 1901, 50474], "temperature": 0.0, "avg_logprob": + -0.19012510272818553, "compression_ratio": 1.6539682539682539, "no_speech_prob": + 0.0009992808336392045}, {"id": 900, "seek": 284078, "start": 2842.98, "end": 2845.3, + "text": " to sort of continue to explore leveraging", "tokens": [50474, 281, 1333, + 295, 2354, 281, 6839, 32666, 50590], "temperature": 0.0, "avg_logprob": -0.19012510272818553, + "compression_ratio": 1.6539682539682539, "no_speech_prob": 0.0009992808336392045}, + {"id": 901, "seek": 284078, "start": 2845.3, "end": 2846.6200000000003, "text": + " a different query paradigm.", "tokens": [50590, 257, 819, 14581, 24709, 13, 50656], + "temperature": 0.0, "avg_logprob": -0.19012510272818553, "compression_ratio": 1.6539682539682539, + "no_speech_prob": 0.0009992808336392045}, {"id": 902, "seek": 284078, "start": 2846.6200000000003, + "end": 2850.98, "text": " So that''s pretty much it for the talk for today.", "tokens": + [50656, 407, 300, 311, 1238, 709, 309, 337, 264, 751, 337, 965, 13, 50874], "temperature": + 0.0, "avg_logprob": -0.19012510272818553, "compression_ratio": 1.6539682539682539, + "no_speech_prob": 0.0009992808336392045}, {"id": 903, "seek": 284078, "start": 2850.98, + "end": 2854.5400000000004, "text": " Dima, Dmitry, I don''t know if you want to + start", "tokens": [50874, 413, 4775, 11, 413, 3508, 627, 11, 286, 500, 380, 458, + 498, 291, 528, 281, 722, 51052], "temperature": 0.0, "avg_logprob": -0.19012510272818553, + 
"compression_ratio": 1.6539682539682539, "no_speech_prob": 0.0009992808336392045}, + {"id": 904, "seek": 284078, "start": 2854.5400000000004, "end": 2856.2200000000003, + "text": " to dive in some questions.", "tokens": [51052, 281, 9192, 294, 512, 1651, + 13, 51136], "temperature": 0.0, "avg_logprob": -0.19012510272818553, "compression_ratio": + 1.6539682539682539, "no_speech_prob": 0.0009992808336392045}, {"id": 905, "seek": + 284078, "start": 2856.2200000000003, "end": 2858.9, "text": " I know some people + will have to hop off at the top of the hour", "tokens": [51136, 286, 458, 512, 561, + 486, 362, 281, 3818, 766, 412, 264, 1192, 295, 264, 1773, 51270], "temperature": + 0.0, "avg_logprob": -0.19012510272818553, "compression_ratio": 1.6539682539682539, + "no_speech_prob": 0.0009992808336392045}, {"id": 906, "seek": 284078, "start": 2858.9, + "end": 2860.3, "text": " because this is scheduled for an hour,", "tokens": [51270, + 570, 341, 307, 15678, 337, 364, 1773, 11, 51340], "temperature": 0.0, "avg_logprob": + -0.19012510272818553, "compression_ratio": 1.6539682539682539, "no_speech_prob": + 0.0009992808336392045}, {"id": 907, "seek": 284078, "start": 2860.3, "end": 2862.5, + "text": " but I''m also happy to just kind of keep going", "tokens": [51340, 457, + 286, 478, 611, 2055, 281, 445, 733, 295, 1066, 516, 51450], "temperature": 0.0, + "avg_logprob": -0.19012510272818553, "compression_ratio": 1.6539682539682539, "no_speech_prob": + 0.0009992808336392045}, {"id": 908, "seek": 284078, "start": 2862.5, "end": 2864.78, + "text": " with questions a little bit after", "tokens": [51450, 365, 1651, 257, + 707, 857, 934, 51564], "temperature": 0.0, "avg_logprob": -0.19012510272818553, + "compression_ratio": 1.6539682539682539, "no_speech_prob": 0.0009992808336392045}, + {"id": 909, "seek": 284078, "start": 2864.78, "end": 2866.78, "text": " if it makes + sense and people can drop off when they want.", "tokens": [51564, 498, 309, 1669, + 2020, 293, 561, 393, 3270, 
766, 562, 436, 528, 13, 51664], "temperature": 0.0, "avg_logprob": + -0.19012510272818553, "compression_ratio": 1.6539682539682539, "no_speech_prob": + 0.0009992808336392045}, {"id": 910, "seek": 284078, "start": 2866.78, "end": 2869.26, + "text": " But let''s maybe dive into some discussion.", "tokens": [51664, 583, 718, + 311, 1310, 9192, 666, 512, 5017, 13, 51788], "temperature": 0.0, "avg_logprob": + -0.19012510272818553, "compression_ratio": 1.6539682539682539, "no_speech_prob": + 0.0009992808336392045}, {"id": 911, "seek": 286926, "start": 2869.26, "end": 2870.94, + "text": " Yeah, we have a bunch of questions.", "tokens": [50364, 865, 11, 321, + 362, 257, 3840, 295, 1651, 13, 50448], "temperature": 0.0, "avg_logprob": -0.26576798050491895, + "compression_ratio": 1.6063829787234043, "no_speech_prob": 0.009843863546848297}, + {"id": 912, "seek": 286926, "start": 2870.94, "end": 2872.26, "text": " Thanks, + Tray, a bunch.", "tokens": [50448, 2561, 11, 1765, 320, 11, 257, 3840, 13, 50514], + "temperature": 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": 1.6063829787234043, + "no_speech_prob": 0.009843863546848297}, {"id": 913, "seek": 286926, "start": 2872.26, + "end": 2873.94, "text": " This is fantastic topic.", "tokens": [50514, 639, 307, + 5456, 4829, 13, 50598], "temperature": 0.0, "avg_logprob": -0.26576798050491895, + "compression_ratio": 1.6063829787234043, "no_speech_prob": 0.009843863546848297}, + {"id": 914, "seek": 286926, "start": 2873.94, "end": 2876.86, "text": " I just recently + traveled to Texas from Finland", "tokens": [50598, 286, 445, 3938, 16147, 281, 7885, + 490, 24869, 50744], "temperature": 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": + 1.6063829787234043, "no_speech_prob": 0.009843863546848297}, {"id": 915, "seek": + 286926, "start": 2876.86, "end": 2878.7000000000003, "text": " and it took me like + 12 hours.", "tokens": [50744, 293, 309, 1890, 385, 411, 2272, 2496, 13, 50836], + "temperature": 0.0, 
"avg_logprob": -0.26576798050491895, "compression_ratio": 1.6063829787234043, + "no_speech_prob": 0.009843863546848297}, {"id": 916, "seek": 286926, "start": 2878.7000000000003, + "end": 2881.78, "text": " I wish there was a wormhole jump through points", "tokens": + [50836, 286, 3172, 456, 390, 257, 23835, 14094, 3012, 807, 2793, 50990], "temperature": + 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": 1.6063829787234043, + "no_speech_prob": 0.009843863546848297}, {"id": 917, "seek": 286926, "start": 2881.78, + "end": 2884.2200000000003, "text": " so I could just end up there much quicker.", + "tokens": [50990, 370, 286, 727, 445, 917, 493, 456, 709, 16255, 13, 51112], "temperature": + 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": 1.6063829787234043, + "no_speech_prob": 0.009843863546848297}, {"id": 918, "seek": 286926, "start": 2884.2200000000003, + "end": 2885.46, "text": " There''d be all of them.", "tokens": [51112, 821, 1116, + 312, 439, 295, 552, 13, 51174], "temperature": 0.0, "avg_logprob": -0.26576798050491895, + "compression_ratio": 1.6063829787234043, "no_speech_prob": 0.009843863546848297}, + {"id": 919, "seek": 286926, "start": 2885.46, "end": 2887.5800000000004, "text": + " We have so many questions, man.", "tokens": [51174, 492, 362, 370, 867, 1651, + 11, 587, 13, 51280], "temperature": 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": + 1.6063829787234043, "no_speech_prob": 0.009843863546848297}, {"id": 920, "seek": + 286926, "start": 2888.5800000000004, "end": 2892.5400000000004, "text": " So I''ll + defer my questions off and I''ll just jump.", "tokens": [51330, 407, 286, 603, 25704, + 452, 1651, 766, 293, 286, 603, 445, 3012, 13, 51528], "temperature": 0.0, "avg_logprob": + -0.26576798050491895, "compression_ratio": 1.6063829787234043, "no_speech_prob": + 0.009843863546848297}, {"id": 921, "seek": 286926, "start": 2892.5400000000004, + "end": 2895.6200000000003, "text": " There is one logistic question 
from Ardune.", + "tokens": [51528, 821, 307, 472, 3565, 3142, 1168, 490, 1587, 67, 2613, 13, 51682], + "temperature": 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": 1.6063829787234043, + "no_speech_prob": 0.009843863546848297}, {"id": 922, "seek": 286926, "start": 2895.6200000000003, + "end": 2898.5, "text": " I hope I pronounced you name correctly, I''m sorry.", "tokens": + [51682, 286, 1454, 286, 23155, 291, 1315, 8944, 11, 286, 478, 2597, 13, 51826], + "temperature": 0.0, "avg_logprob": -0.26576798050491895, "compression_ratio": 1.6063829787234043, + "no_speech_prob": 0.009843863546848297}, {"id": 923, "seek": 289850, "start": 2898.5, + "end": 2901.3, "text": " How''s the course different from the AI Powered Search + book?", "tokens": [50364, 1012, 311, 264, 1164, 819, 490, 264, 7318, 7086, 292, + 17180, 1446, 30, 50504], "temperature": 0.0, "avg_logprob": -0.23608602134926807, + "compression_ratio": 1.5289256198347108, "no_speech_prob": 0.03237403929233551}, + {"id": 924, "seek": 289850, "start": 2901.3, "end": 2905.86, "text": " And then + later is this topic wormhole vectors", "tokens": [50504, 400, 550, 1780, 307, 341, + 4829, 23835, 14094, 18875, 50732], "temperature": 0.0, "avg_logprob": -0.23608602134926807, + "compression_ratio": 1.5289256198347108, "no_speech_prob": 0.03237403929233551}, + {"id": 925, "seek": 289850, "start": 2905.86, "end": 2908.22, "text": " covered + in the book?", "tokens": [50732, 5343, 294, 264, 1446, 30, 50850], "temperature": + 0.0, "avg_logprob": -0.23608602134926807, "compression_ratio": 1.5289256198347108, + "no_speech_prob": 0.03237403929233551}, {"id": 926, "seek": 289850, "start": 2908.22, + "end": 2909.06, "text": " Awesome.", "tokens": [50850, 10391, 13, 50892], "temperature": + 0.0, "avg_logprob": -0.23608602134926807, "compression_ratio": 1.5289256198347108, + "no_speech_prob": 0.03237403929233551}, {"id": 927, "seek": 289850, "start": 2909.06, + "end": 2913.42, "text": " OK, so I would say there''s 
materialized.", "tokens": + [50892, 2264, 11, 370, 286, 576, 584, 456, 311, 2527, 1602, 13, 51110], "temperature": + 0.0, "avg_logprob": -0.23608602134926807, "compression_ratio": 1.5289256198347108, + "no_speech_prob": 0.03237403929233551}, {"id": 928, "seek": 289850, "start": 2913.42, + "end": 2918.54, "text": " There''s probably like about a 40% overlap.", "tokens": + [51110, 821, 311, 1391, 411, 466, 257, 3356, 4, 19959, 13, 51366], "temperature": + 0.0, "avg_logprob": -0.23608602134926807, "compression_ratio": 1.5289256198347108, + "no_speech_prob": 0.03237403929233551}, {"id": 929, "seek": 289850, "start": 2918.54, + "end": 2921.26, "text": " The book is a good solid foundation", "tokens": [51366, + 440, 1446, 307, 257, 665, 5100, 7030, 51502], "temperature": 0.0, "avg_logprob": + -0.23608602134926807, "compression_ratio": 1.5289256198347108, "no_speech_prob": + 0.03237403929233551}, {"id": 930, "seek": 289850, "start": 2921.26, "end": 2923.06, + "text": " for how to think about AI Powered Search.", "tokens": [51502, 337, 577, + 281, 519, 466, 7318, 7086, 292, 17180, 13, 51592], "temperature": 0.0, "avg_logprob": + -0.23608602134926807, "compression_ratio": 1.5289256198347108, "no_speech_prob": + 0.03237403929233551}, {"id": 931, "seek": 289850, "start": 2923.06, "end": 2925.34, + "text": " Obviously we go through all the mental models", "tokens": [51592, 7580, + 321, 352, 807, 439, 264, 4973, 5245, 51706], "temperature": 0.0, "avg_logprob": + -0.23608602134926807, "compression_ratio": 1.5289256198347108, "no_speech_prob": + 0.03237403929233551}, {"id": 932, "seek": 289850, "start": 2925.34, "end": 2926.62, + "text": " and lots of code examples.", "tokens": [51706, 293, 3195, 295, 3089, 5110, + 13, 51770], "temperature": 0.0, "avg_logprob": -0.23608602134926807, "compression_ratio": + 1.5289256198347108, "no_speech_prob": 0.03237403929233551}, {"id": 933, "seek": + 292662, "start": 2926.62, "end": 2929.06, "text": " So a lot of the labs and a lot + of the 
code", "tokens": [50364, 407, 257, 688, 295, 264, 20339, 293, 257, 688, 295, + 264, 3089, 50486], "temperature": 0.0, "avg_logprob": -0.1772294725690569, "compression_ratio": + 2.0368852459016393, "no_speech_prob": 0.0012175999581813812}, {"id": 934, "seek": + 292662, "start": 2929.06, "end": 2931.02, "text": " for the course will come from + the book.", "tokens": [50486, 337, 264, 1164, 486, 808, 490, 264, 1446, 13, 50584], + "temperature": 0.0, "avg_logprob": -0.1772294725690569, "compression_ratio": 2.0368852459016393, + "no_speech_prob": 0.0012175999581813812}, {"id": 935, "seek": 292662, "start": 2931.02, + "end": 2935.18, "text": " However, there''s a lot of new topics and things", "tokens": + [50584, 2908, 11, 456, 311, 257, 688, 295, 777, 8378, 293, 721, 50792], "temperature": + 0.0, "avg_logprob": -0.1772294725690569, "compression_ratio": 2.0368852459016393, + "no_speech_prob": 0.0012175999581813812}, {"id": 936, "seek": 292662, "start": 2935.18, + "end": 2937.9, "text": " that we just like, we couldn''t write a thousand page book.", + "tokens": [50792, 300, 321, 445, 411, 11, 321, 2809, 380, 2464, 257, 4714, 3028, + 1446, 13, 50928], "temperature": 0.0, "avg_logprob": -0.1772294725690569, "compression_ratio": + 2.0368852459016393, "no_speech_prob": 0.0012175999581813812}, {"id": 937, "seek": + 292662, "start": 2937.9, "end": 2939.66, "text": " And so there''s a lot of things + we just couldn''t get to", "tokens": [50928, 400, 370, 456, 311, 257, 688, 295, + 721, 321, 445, 2809, 380, 483, 281, 51016], "temperature": 0.0, "avg_logprob": -0.1772294725690569, + "compression_ratio": 2.0368852459016393, "no_speech_prob": 0.0012175999581813812}, + {"id": 938, "seek": 292662, "start": 2939.66, "end": 2942.02, "text": " because + we had to start from the beginning and frame it.", "tokens": [51016, 570, 321, 632, + 281, 722, 490, 264, 2863, 293, 3920, 309, 13, 51134], "temperature": 0.0, "avg_logprob": + -0.1772294725690569, "compression_ratio": 2.0368852459016393, 
"no_speech_prob": + 0.0012175999581813812}, {"id": 939, "seek": 292662, "start": 2942.02, "end": 2946.94, + "text": " So things like late interaction models, things like", "tokens": [51134, + 407, 721, 411, 3469, 9285, 5245, 11, 721, 411, 51380], "temperature": 0.0, "avg_logprob": + -0.1772294725690569, "compression_ratio": 2.0368852459016393, "no_speech_prob": + 0.0012175999581813812}, {"id": 940, "seek": 292662, "start": 2946.94, "end": 2950.06, + "text": " a gentick search that aren''t in the book,", "tokens": [51380, 257, 16108, + 618, 3164, 300, 3212, 380, 294, 264, 1446, 11, 51536], "temperature": 0.0, "avg_logprob": + -0.1772294725690569, "compression_ratio": 2.0368852459016393, "no_speech_prob": + 0.0012175999581813812}, {"id": 941, "seek": 292662, "start": 2950.06, "end": 2951.58, + "text": " like late interaction models are referenced,", "tokens": [51536, 411, + 3469, 9285, 5245, 366, 32734, 11, 51612], "temperature": 0.0, "avg_logprob": -0.1772294725690569, + "compression_ratio": 2.0368852459016393, "no_speech_prob": 0.0012175999581813812}, + {"id": 942, "seek": 292662, "start": 2951.58, "end": 2955.02, "text": " but we just + couldn''t get into depth that are more modern", "tokens": [51612, 457, 321, 445, + 2809, 380, 483, 666, 7161, 300, 366, 544, 4363, 51784], "temperature": 0.0, "avg_logprob": + -0.1772294725690569, "compression_ratio": 2.0368852459016393, "no_speech_prob": + 0.0012175999581813812}, {"id": 943, "seek": 295502, "start": 2955.02, "end": 2958.14, + "text": " and interesting ways to solve problems.", "tokens": [50364, 293, 1880, + 2098, 281, 5039, 2740, 13, 50520], "temperature": 0.0, "avg_logprob": -0.17497139236553996, + "compression_ratio": 1.8979591836734695, "no_speech_prob": 0.008172878064215183}, + {"id": 944, "seek": 295502, "start": 2958.14, "end": 2960.46, "text": " Things like + mini coil, which I mentioned,", "tokens": [50520, 9514, 411, 8382, 22225, 11, 597, + 286, 2835, 11, 50636], "temperature": 0.0, "avg_logprob": 
-0.17497139236553996, + "compression_ratio": 1.8979591836734695, "no_speech_prob": 0.008172878064215183}, + {"id": 945, "seek": 295502, "start": 2960.46, "end": 2963.2599999999998, "text": + " those things will be in the course and unique to the course", "tokens": [50636, + 729, 721, 486, 312, 294, 264, 1164, 293, 3845, 281, 264, 1164, 50776], "temperature": + 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": 1.8979591836734695, + "no_speech_prob": 0.008172878064215183}, {"id": 946, "seek": 295502, "start": 2963.2599999999998, + "end": 2966.34, "text": " and will have guest speakers who are experts in those + things.", "tokens": [50776, 293, 486, 362, 8341, 9518, 567, 366, 8572, 294, 729, + 721, 13, 50930], "temperature": 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": + 1.8979591836734695, "no_speech_prob": 0.008172878064215183}, {"id": 947, "seek": + 295502, "start": 2966.34, "end": 2970.7, "text": " So I would say the course doesn''t + expect you to have read the book", "tokens": [50930, 407, 286, 576, 584, 264, 1164, + 1177, 380, 2066, 291, 281, 362, 1401, 264, 1446, 51148], "temperature": 0.0, "avg_logprob": + -0.17497139236553996, "compression_ratio": 1.8979591836734695, "no_speech_prob": + 0.008172878064215183}, {"id": 948, "seek": 295502, "start": 2970.7, "end": 2972.3, + "text": " or to understand the fundamentals in the book.", "tokens": [51148, 420, + 281, 1223, 264, 29505, 294, 264, 1446, 13, 51228], "temperature": 0.0, "avg_logprob": + -0.17497139236553996, "compression_ratio": 1.8979591836734695, "no_speech_prob": + 0.008172878064215183}, {"id": 949, "seek": 295502, "start": 2972.3, "end": 2975.74, + "text": " We''ll cover those, but we won''t cover everything in the book.", "tokens": + [51228, 492, 603, 2060, 729, 11, 457, 321, 1582, 380, 2060, 1203, 294, 264, 1446, + 13, 51400], "temperature": 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": + 1.8979591836734695, "no_speech_prob": 0.008172878064215183}, 
{"id": 950, "seek": + 295502, "start": 2975.74, "end": 2977.1, "text": " And we''ll also cover a lot of + things", "tokens": [51400, 400, 321, 603, 611, 2060, 257, 688, 295, 721, 51468], + "temperature": 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": 1.8979591836734695, + "no_speech_prob": 0.008172878064215183}, {"id": 951, "seek": 295502, "start": 2977.1, + "end": 2979.38, "text": " that aren''t in the book and go in deeper depth.", "tokens": + [51468, 300, 3212, 380, 294, 264, 1446, 293, 352, 294, 7731, 7161, 13, 51582], "temperature": + 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": 1.8979591836734695, + "no_speech_prob": 0.008172878064215183}, {"id": 952, "seek": 295502, "start": 2979.38, + "end": 2981.86, "text": " And so I would say, if you''ve read the book,", "tokens": + [51582, 400, 370, 286, 576, 584, 11, 498, 291, 600, 1401, 264, 1446, 11, 51706], + "temperature": 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": 1.8979591836734695, + "no_speech_prob": 0.008172878064215183}, {"id": 953, "seek": 295502, "start": 2981.86, + "end": 2984.58, "text": " the course is still going to be really valuable.", "tokens": + [51706, 264, 1164, 307, 920, 516, 281, 312, 534, 8263, 13, 51842], "temperature": + 0.0, "avg_logprob": -0.17497139236553996, "compression_ratio": 1.8979591836734695, + "no_speech_prob": 0.008172878064215183}, {"id": 954, "seek": 298458, "start": 2984.58, + "end": 2986.38, "text": " And even if you can''t make all the sessions,", "tokens": + [50364, 400, 754, 498, 291, 393, 380, 652, 439, 264, 11081, 11, 50454], "temperature": + 0.0, "avg_logprob": -0.20999315750500389, "compression_ratio": 1.76, "no_speech_prob": + 0.0011745034717023373}, {"id": 955, "seek": 298458, "start": 2986.38, "end": 2992.54, + "text": " again, the videos and all the materials available for you forever.", "tokens": + [50454, 797, 11, 264, 2145, 293, 439, 264, 5319, 2435, 337, 291, 5680, 13, 50762], + "temperature": 0.0, 
"avg_logprob": -0.20999315750500389, "compression_ratio": 1.76, + "no_speech_prob": 0.0011745034717023373}, {"id": 956, "seek": 298458, "start": 2992.54, + "end": 2996.5, "text": " So you don''t have to have read the book to take the course,", + "tokens": [50762, 407, 291, 500, 380, 362, 281, 362, 1401, 264, 1446, 281, 747, + 264, 1164, 11, 50960], "temperature": 0.0, "avg_logprob": -0.20999315750500389, + "compression_ratio": 1.76, "no_speech_prob": 0.0011745034717023373}, {"id": 957, + "seek": 298458, "start": 2996.5, "end": 2998.22, "text": " but if you have read + the book, the course", "tokens": [50960, 457, 498, 291, 362, 1401, 264, 1446, 11, + 264, 1164, 51046], "temperature": 0.0, "avg_logprob": -0.20999315750500389, "compression_ratio": + 1.76, "no_speech_prob": 0.0011745034717023373}, {"id": 958, "seek": 298458, "start": + 2998.22, "end": 3000.06, "text": " is still going to be massively useful.", "tokens": + [51046, 307, 920, 516, 281, 312, 29379, 4420, 13, 51138], "temperature": 0.0, "avg_logprob": + -0.20999315750500389, "compression_ratio": 1.76, "no_speech_prob": 0.0011745034717023373}, + {"id": 959, "seek": 298458, "start": 3000.06, "end": 3001.7, "text": " Yeah, so + to implement each other.", "tokens": [51138, 865, 11, 370, 281, 4445, 1184, 661, + 13, 51220], "temperature": 0.0, "avg_logprob": -0.20999315750500389, "compression_ratio": + 1.76, "no_speech_prob": 0.0011745034717023373}, {"id": 960, "seek": 298458, "start": + 3001.7, "end": 3005.94, "text": " And by the way, I own the book and it''s amazing + read in silency.", "tokens": [51220, 400, 538, 264, 636, 11, 286, 1065, 264, 1446, + 293, 309, 311, 2243, 1401, 294, 3425, 3020, 13, 51432], "temperature": 0.0, "avg_logprob": + -0.20999315750500389, "compression_ratio": 1.76, "no_speech_prob": 0.0011745034717023373}, + {"id": 961, "seek": 298458, "start": 3005.94, "end": 3008.62, "text": " And then + the course is a different way of engaging", "tokens": [51432, 400, 550, 264, 1164, + 307, 257, 
819, 636, 295, 11268, 51566], "temperature": 0.0, "avg_logprob": -0.20999315750500389, + "compression_ratio": 1.76, "no_speech_prob": 0.0011745034717023373}, {"id": 962, + "seek": 298458, "start": 3008.62, "end": 3012.62, "text": " with the material like + a dynamic way.", "tokens": [51566, 365, 264, 2527, 411, 257, 8546, 636, 13, 51766], + "temperature": 0.0, "avg_logprob": -0.20999315750500389, "compression_ratio": 1.76, + "no_speech_prob": 0.0011745034717023373}, {"id": 963, "seek": 301262, "start": 3012.62, + "end": 3014.1, "text": " Well, I didn''t answer the last part, which", "tokens": + [50364, 1042, 11, 286, 994, 380, 1867, 264, 1036, 644, 11, 597, 50438], "temperature": + 0.0, "avg_logprob": -0.20913433661827674, "compression_ratio": 1.674496644295302, + "no_speech_prob": 0.0022750417701900005}, {"id": 964, "seek": 301262, "start": 3014.1, + "end": 3016.46, "text": " is, will wormhole vectors be covered?", "tokens": [50438, + 307, 11, 486, 23835, 14094, 18875, 312, 5343, 30, 50556], "temperature": 0.0, "avg_logprob": + -0.20913433661827674, "compression_ratio": 1.674496644295302, "no_speech_prob": + 0.0022750417701900005}, {"id": 965, "seek": 301262, "start": 3016.46, "end": 3019.54, + "text": " They will definitely be covered more", "tokens": [50556, 814, 486, 2138, + 312, 5343, 544, 50710], "temperature": 0.0, "avg_logprob": -0.20913433661827674, + "compression_ratio": 1.674496644295302, "no_speech_prob": 0.0022750417701900005}, + {"id": 966, "seek": 301262, "start": 3019.54, "end": 3023.42, "text": " so as the + techniques and strategies for how", "tokens": [50710, 370, 382, 264, 7512, 293, + 9029, 337, 577, 50904], "temperature": 0.0, "avg_logprob": -0.20913433661827674, + "compression_ratio": 1.674496644295302, "no_speech_prob": 0.0022750417701900005}, + {"id": 967, "seek": 301262, "start": 3023.42, "end": 3025.18, "text": " to hop back + and forth between.", "tokens": [50904, 281, 3818, 646, 293, 5220, 1296, 13, 50992], + "temperature": 0.0, 
"avg_logprob": -0.20913433661827674, "compression_ratio": 1.674496644295302, + "no_speech_prob": 0.0022750417701900005}, {"id": 968, "seek": 301262, "start": 3025.18, + "end": 3028.18, "text": " So some of it''s actually in the book, the semantic knowledge", + "tokens": [50992, 407, 512, 295, 309, 311, 767, 294, 264, 1446, 11, 264, 47982, + 3601, 51142], "temperature": 0.0, "avg_logprob": -0.20913433661827674, "compression_ratio": + 1.674496644295302, "no_speech_prob": 0.0022750417701900005}, {"id": 969, "seek": + 301262, "start": 3028.18, "end": 3029.7799999999997, "text": " graph stuff is already + in the book.", "tokens": [51142, 4295, 1507, 307, 1217, 294, 264, 1446, 13, 51222], + "temperature": 0.0, "avg_logprob": -0.20913433661827674, "compression_ratio": 1.674496644295302, + "no_speech_prob": 0.0022750417701900005}, {"id": 970, "seek": 301262, "start": 3029.7799999999997, + "end": 3034.2999999999997, "text": " But the, yeah, we''ll definitely talk about + wormhole vectors", "tokens": [51222, 583, 264, 11, 1338, 11, 321, 603, 2138, 751, + 466, 23835, 14094, 18875, 51448], "temperature": 0.0, "avg_logprob": -0.20913433661827674, + "compression_ratio": 1.674496644295302, "no_speech_prob": 0.0022750417701900005}, + {"id": 971, "seek": 301262, "start": 3034.2999999999997, "end": 3036.38, "text": + " explicitly and have some more specific examples", "tokens": [51448, 20803, 293, + 362, 512, 544, 2685, 5110, 51552], "temperature": 0.0, "avg_logprob": -0.20913433661827674, + "compression_ratio": 1.674496644295302, "no_speech_prob": 0.0022750417701900005}, + {"id": 972, "seek": 301262, "start": 3036.38, "end": 3037.62, "text": " people can + play with.", "tokens": [51552, 561, 393, 862, 365, 13, 51614], "temperature": 0.0, + "avg_logprob": -0.20913433661827674, "compression_ratio": 1.674496644295302, "no_speech_prob": + 0.0022750417701900005}, {"id": 973, "seek": 301262, "start": 3037.62, "end": 3039.1, + "text": " Yeah, awesome.", "tokens": [51614, 865, 11, 3476, 
13, 51688], "temperature": + 0.0, "avg_logprob": -0.20913433661827674, "compression_ratio": 1.674496644295302, + "no_speech_prob": 0.0022750417701900005}, {"id": 974, "seek": 301262, "start": 3039.1, + "end": 3042.38, "text": " And I do want to mention this is like experimental and + emerging.", "tokens": [51688, 400, 286, 360, 528, 281, 2152, 341, 307, 411, 17069, + 293, 14989, 13, 51852], "temperature": 0.0, "avg_logprob": -0.20913433661827674, + "compression_ratio": 1.674496644295302, "no_speech_prob": 0.0022750417701900005}, + {"id": 975, "seek": 304238, "start": 3042.38, "end": 3046.2200000000003, "text": + " And there''s some things that I glossed over today", "tokens": [50364, 400, 456, + 311, 512, 721, 300, 286, 19574, 292, 670, 965, 50556], "temperature": 0.0, "avg_logprob": + -0.23646517412378154, "compression_ratio": 1.6282051282051282, "no_speech_prob": + 0.00036594553967006505}, {"id": 976, "seek": 304238, "start": 3046.2200000000003, + "end": 3050.2200000000003, "text": " in terms of hopping to a particular point", + "tokens": [50556, 294, 2115, 295, 47199, 281, 257, 1729, 935, 50756], "temperature": + 0.0, "avg_logprob": -0.23646517412378154, "compression_ratio": 1.6282051282051282, + "no_speech_prob": 0.00036594553967006505}, {"id": 977, "seek": 304238, "start": + 3050.2200000000003, "end": 3053.54, "text": " versus trying to hop to a region and + have more of a shape", "tokens": [50756, 5717, 1382, 281, 3818, 281, 257, 4458, + 293, 362, 544, 295, 257, 3909, 50922], "temperature": 0.0, "avg_logprob": -0.23646517412378154, + "compression_ratio": 1.6282051282051282, "no_speech_prob": 0.00036594553967006505}, + {"id": 978, "seek": 304238, "start": 3053.54, "end": 3055.34, "text": " that we + could chat about as well.", "tokens": [50922, 300, 321, 727, 5081, 466, 382, 731, + 13, 51012], "temperature": 0.0, "avg_logprob": -0.23646517412378154, "compression_ratio": + 1.6282051282051282, "no_speech_prob": 0.00036594553967006505}, {"id": 979, "seek": + 
304238, "start": 3055.34, "end": 3060.54, "text": " But yeah, there''s still some + things", "tokens": [51012, 583, 1338, 11, 456, 311, 920, 512, 721, 51272], "temperature": + 0.0, "avg_logprob": -0.23646517412378154, "compression_ratio": 1.6282051282051282, + "no_speech_prob": 0.00036594553967006505}, {"id": 980, "seek": 304238, "start": + 3060.54, "end": 3063.5, "text": " I''m doing to kind of better understand it fine + to know.", "tokens": [51272, 286, 478, 884, 281, 733, 295, 1101, 1223, 309, 2489, + 281, 458, 13, 51420], "temperature": 0.0, "avg_logprob": -0.23646517412378154, "compression_ratio": + 1.6282051282051282, "no_speech_prob": 0.00036594553967006505}, {"id": 981, "seek": + 304238, "start": 3063.5, "end": 3065.34, "text": " Yeah, awesome.", "tokens": [51420, + 865, 11, 3476, 13, 51512], "temperature": 0.0, "avg_logprob": -0.23646517412378154, + "compression_ratio": 1.6282051282051282, "no_speech_prob": 0.00036594553967006505}, + {"id": 982, "seek": 304238, "start": 3065.34, "end": 3066.42, "text": " I''m trying + to speed up.", "tokens": [51512, 286, 478, 1382, 281, 3073, 493, 13, 51566], "temperature": + 0.0, "avg_logprob": -0.23646517412378154, "compression_ratio": 1.6282051282051282, + "no_speech_prob": 0.00036594553967006505}, {"id": 983, "seek": 304238, "start": + 3066.42, "end": 3069.26, "text": " But there''s a question from Claudiu.", "tokens": + [51566, 583, 456, 311, 257, 1168, 490, 24858, 5951, 13, 51708], "temperature": 0.0, + "avg_logprob": -0.23646517412378154, "compression_ratio": 1.6282051282051282, "no_speech_prob": + 0.00036594553967006505}, {"id": 984, "seek": 304238, "start": 3069.26, "end": 3070.62, + "text": " What are the latent features?", "tokens": [51708, 708, 366, 264, 48994, + 4122, 30, 51776], "temperature": 0.0, "avg_logprob": -0.23646517412378154, "compression_ratio": + 1.6282051282051282, "no_speech_prob": 0.00036594553967006505}, {"id": 985, "seek": + 307062, "start": 3070.62, "end": 3074.7799999999997, "text": " And 
basically where + you switched from sort of explicit feature", "tokens": [50364, 400, 1936, 689, 291, + 16858, 490, 1333, 295, 13691, 4111, 50572], "temperature": 0.0, "avg_logprob": -0.23717843569242036, + "compression_ratio": 1.646341463414634, "no_speech_prob": 0.005522354505956173}, + {"id": 986, "seek": 307062, "start": 3074.7799999999997, "end": 3077.18, "text": + " metrics to like latent features.", "tokens": [50572, 16367, 281, 411, 48994, 4122, + 13, 50692], "temperature": 0.0, "avg_logprob": -0.23717843569242036, "compression_ratio": + 1.646341463414634, "no_speech_prob": 0.005522354505956173}, {"id": 987, "seek": + 307062, "start": 3077.18, "end": 3078.66, "text": " Maybe I can take it quickly.", + "tokens": [50692, 2704, 286, 393, 747, 309, 2661, 13, 50766], "temperature": 0.0, + "avg_logprob": -0.23717843569242036, "compression_ratio": 1.646341463414634, "no_speech_prob": + 0.005522354505956173}, {"id": 988, "seek": 307062, "start": 3078.66, "end": 3080.94, + "text": " It''s basically in an LLAM.", "tokens": [50766, 467, 311, 1936, 294, 364, + 441, 43, 2865, 13, 50880], "temperature": 0.0, "avg_logprob": -0.23717843569242036, + "compression_ratio": 1.646341463414634, "no_speech_prob": 0.005522354505956173}, + {"id": 989, "seek": 307062, "start": 3080.94, "end": 3083.22, "text": " So if you + deal with an encoder model,", "tokens": [50880, 407, 498, 291, 2028, 365, 364, 2058, + 19866, 2316, 11, 50994], "temperature": 0.0, "avg_logprob": -0.23717843569242036, + "compression_ratio": 1.646341463414634, "no_speech_prob": 0.005522354505956173}, + {"id": 990, "seek": 307062, "start": 3083.22, "end": 3086.54, "text": " where you + generate embeddings, basically", "tokens": [50994, 689, 291, 8460, 12240, 29432, + 11, 1936, 51160], "temperature": 0.0, "avg_logprob": -0.23717843569242036, "compression_ratio": + 1.646341463414634, "no_speech_prob": 0.005522354505956173}, {"id": 991, "seek": + 307062, "start": 3086.54, "end": 3088.3399999999997, "text": " these are 
like internal + representations", "tokens": [51160, 613, 366, 411, 6920, 33358, 51250], "temperature": + 0.0, "avg_logprob": -0.23717843569242036, "compression_ratio": 1.646341463414634, + "no_speech_prob": 0.005522354505956173}, {"id": 992, "seek": 307062, "start": 3088.3399999999997, + "end": 3089.58, "text": " that the model learns.", "tokens": [51250, 300, 264, 2316, + 27152, 13, 51312], "temperature": 0.0, "avg_logprob": -0.23717843569242036, "compression_ratio": + 1.646341463414634, "no_speech_prob": 0.005522354505956173}, {"id": 993, "seek": + 307062, "start": 3089.58, "end": 3091.1, "text": " And they''re like compressed.", + "tokens": [51312, 400, 436, 434, 411, 30353, 13, 51388], "temperature": 0.0, "avg_logprob": + -0.23717843569242036, "compression_ratio": 1.646341463414634, "no_speech_prob": + 0.005522354505956173}, {"id": 994, "seek": 307062, "start": 3091.1, "end": 3094.8599999999997, + "text": " They''re like abstract way of dealing with patterns", "tokens": [51388, + 814, 434, 411, 12649, 636, 295, 6260, 365, 8294, 51576], "temperature": 0.0, "avg_logprob": + -0.23717843569242036, "compression_ratio": 1.646341463414634, "no_speech_prob": + 0.005522354505956173}, {"id": 995, "seek": 307062, "start": 3094.8599999999997, + "end": 3097.42, "text": " or relationships and your data.", "tokens": [51576, 420, + 6159, 293, 428, 1412, 13, 51704], "temperature": 0.0, "avg_logprob": -0.23717843569242036, + "compression_ratio": 1.646341463414634, "no_speech_prob": 0.005522354505956173}, + {"id": 996, "seek": 309742, "start": 3097.42, "end": 3100.7400000000002, "text": + " It''s not exactly directly that black and white,", "tokens": [50364, 467, 311, + 406, 2293, 3838, 300, 2211, 293, 2418, 11, 50530], "temperature": 0.0, "avg_logprob": + -0.2316513421400538, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.00023609970230609179}, + {"id": 997, "seek": 309742, "start": 3100.7400000000002, "end": 3104.02, "text": + " but the thing is that on the 
conceptual level,", "tokens": [50530, 457, 264, 551, + 307, 300, 322, 264, 24106, 1496, 11, 50694], "temperature": 0.0, "avg_logprob": + -0.2316513421400538, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.00023609970230609179}, + {"id": 998, "seek": 309742, "start": 3104.02, "end": 3108.14, "text": " these are + like internal weights that the model learns.", "tokens": [50694, 613, 366, 411, + 6920, 17443, 300, 264, 2316, 27152, 13, 50900], "temperature": 0.0, "avg_logprob": + -0.2316513421400538, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.00023609970230609179}, + {"id": 999, "seek": 309742, "start": 3108.14, "end": 3111.2200000000003, "text": + " Then there is a question from Julian, very concrete one.", "tokens": [50900, 1396, + 456, 307, 257, 1168, 490, 25151, 11, 588, 9859, 472, 13, 51054], "temperature": + 0.0, "avg_logprob": -0.2316513421400538, "compression_ratio": 1.578512396694215, + "no_speech_prob": 0.00023609970230609179}, {"id": 1000, "seek": 309742, "start": + 3111.2200000000003, "end": 3113.78, "text": " Can you give an great example of how", + "tokens": [51054, 1664, 291, 976, 364, 869, 1365, 295, 577, 51182], "temperature": + 0.0, "avg_logprob": -0.2316513421400538, "compression_ratio": 1.578512396694215, + "no_speech_prob": 0.00023609970230609179}, {"id": 1001, "seek": 309742, "start": + 3113.78, "end": 3119.54, "text": " to compute the wormhole vector from sparse to + dense space?", "tokens": [51182, 281, 14722, 264, 23835, 14094, 8062, 490, 637, + 11668, 281, 18011, 1901, 30, 51470], "temperature": 0.0, "avg_logprob": -0.2316513421400538, + "compression_ratio": 1.578512396694215, "no_speech_prob": 0.00023609970230609179}, + {"id": 1002, "seek": 309742, "start": 3119.54, "end": 3121.46, "text": " Yeah, so + I had a slide.", "tokens": [51470, 865, 11, 370, 286, 632, 257, 4137, 13, 51566], + "temperature": 0.0, "avg_logprob": -0.2316513421400538, "compression_ratio": 1.578512396694215, + "no_speech_prob": 
0.00023609970230609179}, {"id": 1003, "seek": 309742, "start": + 3121.46, "end": 3127.06, "text": " It''s sparse to dense is the easy one, which + is, let me,", "tokens": [51566, 467, 311, 637, 11668, 281, 18011, 307, 264, 1858, + 472, 11, 597, 307, 11, 718, 385, 11, 51846], "temperature": 0.0, "avg_logprob": + -0.2316513421400538, "compression_ratio": 1.578512396694215, "no_speech_prob": 0.00023609970230609179}, + {"id": 1004, "seek": 312706, "start": 3127.06, "end": 3128.2999999999997, "text": + " let me go back to the slide.", "tokens": [50364, 718, 385, 352, 646, 281, 264, + 4137, 13, 50426], "temperature": 0.0, "avg_logprob": -0.222856482680963, "compression_ratio": + 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, {"id": 1005, "seek": + 312706, "start": 3131.1, "end": 3133.7, "text": " One second almost there.", "tokens": + [50566, 1485, 1150, 1920, 456, 13, 50696], "temperature": 0.0, "avg_logprob": -0.222856482680963, + "compression_ratio": 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, + {"id": 1006, "seek": 312706, "start": 3136.66, "end": 3138.38, "text": " Here we + go.", "tokens": [50844, 1692, 321, 352, 13, 50930], "temperature": 0.0, "avg_logprob": + -0.222856482680963, "compression_ratio": 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, + {"id": 1007, "seek": 312706, "start": 3138.38, "end": 3143.38, "text": " So to go + from sparse to dense, think of it this way.", "tokens": [50930, 407, 281, 352, 490, + 637, 11668, 281, 18011, 11, 519, 295, 309, 341, 636, 13, 51180], "temperature": + 0.0, "avg_logprob": -0.222856482680963, "compression_ratio": 1.5944700460829493, + "no_speech_prob": 0.0004166789003647864}, {"id": 1008, "seek": 312706, "start": + 3143.38, "end": 3145.7799999999997, "text": " You''ve got a bunch of documents in + your index,", "tokens": [51180, 509, 600, 658, 257, 3840, 295, 8512, 294, 428, 8186, + 11, 51300], "temperature": 0.0, "avg_logprob": -0.222856482680963, "compression_ratio": + 
1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, {"id": 1009, "seek": + 312706, "start": 3145.7799999999997, "end": 3148.14, "text": " and you generate + embeddings for those documents.", "tokens": [51300, 293, 291, 8460, 12240, 29432, + 337, 729, 8512, 13, 51418], "temperature": 0.0, "avg_logprob": -0.222856482680963, + "compression_ratio": 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, + {"id": 1010, "seek": 312706, "start": 3148.14, "end": 3151.34, "text": " That''s + how your dense space is constructed, right?", "tokens": [51418, 663, 311, 577, 428, + 18011, 1901, 307, 17083, 11, 558, 30, 51578], "temperature": 0.0, "avg_logprob": + -0.222856482680963, "compression_ratio": 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, + {"id": 1011, "seek": 312706, "start": 3151.34, "end": 3153.2999999999997, "text": + " Those embeddings on the documents,", "tokens": [51578, 3950, 12240, 29432, 322, + 264, 8512, 11, 51676], "temperature": 0.0, "avg_logprob": -0.222856482680963, "compression_ratio": + 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, {"id": 1012, "seek": + 312706, "start": 3153.2999999999997, "end": 3155.94, "text": " if you query for + the documents using keywords", "tokens": [51676, 498, 291, 14581, 337, 264, 8512, + 1228, 21009, 51808], "temperature": 0.0, "avg_logprob": -0.222856482680963, "compression_ratio": + 1.5944700460829493, "no_speech_prob": 0.0004166789003647864}, {"id": 1013, "seek": + 315594, "start": 3155.94, "end": 3158.58, "text": " in your sparse space, then you''re + still", "tokens": [50364, 294, 428, 637, 11668, 1901, 11, 550, 291, 434, 920, 50496], + "temperature": 0.0, "avg_logprob": -0.16652173844594803, "compression_ratio": 1.8171641791044777, + "no_speech_prob": 0.0008758945623412728}, {"id": 1014, "seek": 315594, "start": + 3158.58, "end": 3159.86, "text": " matching that set of documents,", "tokens": [50496, + 14324, 300, 992, 295, 8512, 11, 50560], "temperature": 0.0, 
"avg_logprob": -0.16652173844594803, + "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.0008758945623412728}, + {"id": 1015, "seek": 315594, "start": 3159.86, "end": 3162.3, "text": " and all + of those documents have the embeddings on them.", "tokens": [50560, 293, 439, 295, + 729, 8512, 362, 264, 12240, 29432, 322, 552, 13, 50682], "temperature": 0.0, "avg_logprob": + -0.16652173844594803, "compression_ratio": 1.8171641791044777, "no_speech_prob": + 0.0008758945623412728}, {"id": 1016, "seek": 315594, "start": 3162.3, "end": 3165.54, + "text": " So all you do is run a keyword search on your documents,", "tokens": [50682, + 407, 439, 291, 360, 307, 1190, 257, 20428, 3164, 322, 428, 8512, 11, 50844], "temperature": + 0.0, "avg_logprob": -0.16652173844594803, "compression_ratio": 1.8171641791044777, + "no_speech_prob": 0.0008758945623412728}, {"id": 1017, "seek": 315594, "start": + 3165.54, "end": 3168.18, "text": " take the top end documents are the most relevant, + right?", "tokens": [50844, 747, 264, 1192, 917, 8512, 366, 264, 881, 7340, 11, 558, + 30, 50976], "temperature": 0.0, "avg_logprob": -0.16652173844594803, "compression_ratio": + 1.8171641791044777, "no_speech_prob": 0.0008758945623412728}, {"id": 1018, "seek": + 315594, "start": 3168.18, "end": 3172.7400000000002, "text": " They hopefully semantically + represent the concept the best.", "tokens": [50976, 814, 4696, 4361, 49505, 2906, + 264, 3410, 264, 1151, 13, 51204], "temperature": 0.0, "avg_logprob": -0.16652173844594803, + "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.0008758945623412728}, + {"id": 1019, "seek": 315594, "start": 3172.7400000000002, "end": 3175.26, "text": + " And then you take those embeddings off of them,", "tokens": [51204, 400, 550, + 291, 747, 729, 12240, 29432, 766, 295, 552, 11, 51330], "temperature": 0.0, "avg_logprob": + -0.16652173844594803, "compression_ratio": 1.8171641791044777, "no_speech_prob": + 0.0008758945623412728}, {"id": 1020, "seek": 
315594, "start": 3175.26, "end": 3177.02, + "text": " and you literally abverse them together.", "tokens": [51330, 293, 291, + 3736, 410, 4308, 552, 1214, 13, 51418], "temperature": 0.0, "avg_logprob": -0.16652173844594803, + "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.0008758945623412728}, + {"id": 1021, "seek": 315594, "start": 3177.02, "end": 3180.78, "text": " The code + for that is on the screen right here,", "tokens": [51418, 440, 3089, 337, 300, 307, + 322, 264, 2568, 558, 510, 11, 51606], "temperature": 0.0, "avg_logprob": -0.16652173844594803, + "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.0008758945623412728}, + {"id": 1022, "seek": 315594, "start": 3180.78, "end": 3183.7000000000003, "text": + " and that you just generate this pooled embedding.", "tokens": [51606, 293, 300, + 291, 445, 8460, 341, 7005, 292, 12240, 3584, 13, 51752], "temperature": 0.0, "avg_logprob": + -0.16652173844594803, "compression_ratio": 1.8171641791044777, "no_speech_prob": + 0.0008758945623412728}, {"id": 1023, "seek": 318370, "start": 3183.7, "end": 3186.14, + "text": " It''s that notion of Darth Vader versus Puppy,", "tokens": [50364, 467, + 311, 300, 10710, 295, 40696, 36337, 5717, 13605, 7966, 11, 50486], "temperature": + 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": 1.8772563176895307, + "no_speech_prob": 0.0005989124183543026}, {"id": 1024, "seek": 318370, "start": + 3186.14, "end": 3188.5, "text": " and finding the Puppy Darth Vader in the middle, + right?", "tokens": [50486, 293, 5006, 264, 13605, 7966, 40696, 36337, 294, 264, + 2808, 11, 558, 30, 50604], "temperature": 0.0, "avg_logprob": -0.1876959707222733, + "compression_ratio": 1.8772563176895307, "no_speech_prob": 0.0005989124183543026}, + {"id": 1025, "seek": 318370, "start": 3188.5, "end": 3190.8599999999997, "text": + " If someone were to run a keyword search,", "tokens": [50604, 759, 1580, 645, 281, + 1190, 257, 20428, 3164, 11, 50722], "temperature": 0.0, 
"avg_logprob": -0.1876959707222733, + "compression_ratio": 1.8772563176895307, "no_speech_prob": 0.0005989124183543026}, + {"id": 1026, "seek": 318370, "start": 3190.8599999999997, "end": 3193.3399999999997, + "text": " and it''s sort of, it''s easy to think of this", "tokens": [50722, 293, + 309, 311, 1333, 295, 11, 309, 311, 1858, 281, 519, 295, 341, 50846], "temperature": + 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": 1.8772563176895307, + "no_speech_prob": 0.0005989124183543026}, {"id": 1027, "seek": 318370, "start": + 3193.3399999999997, "end": 3195.8999999999996, "text": " with a single keyword, + but let''s go back to my,", "tokens": [50846, 365, 257, 2167, 20428, 11, 457, 718, + 311, 352, 646, 281, 452, 11, 50974], "temperature": 0.0, "avg_logprob": -0.1876959707222733, + "compression_ratio": 1.8772563176895307, "no_speech_prob": 0.0005989124183543026}, + {"id": 1028, "seek": 318370, "start": 3199.18, "end": 3200.8599999999997, "text": + " let''s go back to cheese pizza, right?", "tokens": [51138, 718, 311, 352, 646, + 281, 5399, 8298, 11, 558, 30, 51222], "temperature": 0.0, "avg_logprob": -0.1876959707222733, + "compression_ratio": 1.8772563176895307, "no_speech_prob": 0.0005989124183543026}, + {"id": 1029, "seek": 318370, "start": 3200.8599999999997, "end": 3203.8199999999997, + "text": " Like if I search for pizza, I''m gonna match a bunch of pizzas.", "tokens": + [51222, 1743, 498, 286, 3164, 337, 8298, 11, 286, 478, 799, 2995, 257, 3840, 295, + 44037, 13, 51370], "temperature": 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": + 1.8772563176895307, "no_speech_prob": 0.0005989124183543026}, {"id": 1030, "seek": + 318370, "start": 3203.8199999999997, "end": 3205.7, "text": " If I search for, maybe + cheese pizza is back", "tokens": [51370, 759, 286, 3164, 337, 11, 1310, 5399, 8298, + 307, 646, 51464], "temperature": 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": + 1.8772563176895307, "no_speech_prob": 
0.0005989124183543026}, {"id": 1031, "seek": + 318370, "start": 3205.7, "end": 3209.3799999999997, "text": " as all pizza has cheese, + let''s do cinnamon bread sticks, right?", "tokens": [51464, 382, 439, 8298, 575, + 5399, 11, 718, 311, 360, 22969, 5961, 12518, 11, 558, 30, 51648], "temperature": + 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": 1.8772563176895307, + "no_speech_prob": 0.0005989124183543026}, {"id": 1032, "seek": 318370, "start": + 3209.3799999999997, "end": 3212.02, "text": " If I search for bread, I''m gonna + find bread.", "tokens": [51648, 759, 286, 3164, 337, 5961, 11, 286, 478, 799, 915, + 5961, 13, 51780], "temperature": 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": + 1.8772563176895307, "no_speech_prob": 0.0005989124183543026}, {"id": 1033, "seek": + 318370, "start": 3212.02, "end": 3213.1, "text": " Documents have the word bread.", + "tokens": [51780, 16024, 4697, 362, 264, 1349, 5961, 13, 51834], "temperature": + 0.0, "avg_logprob": -0.1876959707222733, "compression_ratio": 1.8772563176895307, + "no_speech_prob": 0.0005989124183543026}, {"id": 1034, "seek": 321310, "start": + 3213.1, "end": 3215.46, "text": " If I search for cinnamon, I find documents with + cinnamon.", "tokens": [50364, 759, 286, 3164, 337, 22969, 11, 286, 915, 8512, 365, + 22969, 13, 50482], "temperature": 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": + 2.1294117647058823, "no_speech_prob": 0.0004339195729698986}, {"id": 1035, "seek": + 321310, "start": 3215.46, "end": 3217.7799999999997, "text": " If I search for sticks, + I find documents with sticks.", "tokens": [50482, 759, 286, 3164, 337, 12518, 11, + 286, 915, 8512, 365, 12518, 13, 50598], "temperature": 0.0, "avg_logprob": -0.11311496628655328, + "compression_ratio": 2.1294117647058823, "no_speech_prob": 0.0004339195729698986}, + {"id": 1036, "seek": 321310, "start": 3217.7799999999997, "end": 3220.42, "text": + " Sticks by itself isn''t really what I''m looking 
for,", "tokens": [50598, 745, + 7663, 538, 2564, 1943, 380, 534, 437, 286, 478, 1237, 337, 11, 50730], "temperature": + 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": 2.1294117647058823, + "no_speech_prob": 0.0004339195729698986}, {"id": 1037, "seek": 321310, "start": + 3220.42, "end": 3222.9, "text": " but if I do cinnamon bread sticks,", "tokens": + [50730, 457, 498, 286, 360, 22969, 5961, 12518, 11, 50854], "temperature": 0.0, + "avg_logprob": -0.11311496628655328, "compression_ratio": 2.1294117647058823, "no_speech_prob": + 0.0004339195729698986}, {"id": 1038, "seek": 321310, "start": 3222.9, "end": 3224.42, + "text": " then I''m finding all of the documents", "tokens": [50854, 550, 286, 478, + 5006, 439, 295, 264, 8512, 50930], "temperature": 0.0, "avg_logprob": -0.11311496628655328, + "compression_ratio": 2.1294117647058823, "no_speech_prob": 0.0004339195729698986}, + {"id": 1039, "seek": 321310, "start": 3224.42, "end": 3226.54, "text": " that have + those terms together,", "tokens": [50930, 300, 362, 729, 2115, 1214, 11, 51036], + "temperature": 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": 2.1294117647058823, + "no_speech_prob": 0.0004339195729698986}, {"id": 1040, "seek": 321310, "start": + 3226.54, "end": 3229.8199999999997, "text": " which likely are cinnamon bread sticks,", + "tokens": [51036, 597, 3700, 366, 22969, 5961, 12518, 11, 51200], "temperature": + 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": 2.1294117647058823, + "no_speech_prob": 0.0004339195729698986}, {"id": 1041, "seek": 321310, "start": + 3229.8199999999997, "end": 3231.54, "text": " or have the notion of cinnamon bread + sticks,", "tokens": [51200, 420, 362, 264, 10710, 295, 22969, 5961, 12518, 11, 51286], + "temperature": 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": 2.1294117647058823, + "no_speech_prob": 0.0004339195729698986}, {"id": 1042, "seek": 321310, "start": + 3231.54, "end": 3233.74, "text": " or talking 
about cinnamon bread sticks.", "tokens": + [51286, 420, 1417, 466, 22969, 5961, 12518, 13, 51396], "temperature": 0.0, "avg_logprob": + -0.11311496628655328, "compression_ratio": 2.1294117647058823, "no_speech_prob": + 0.0004339195729698986}, {"id": 1043, "seek": 321310, "start": 3233.74, "end": 3235.5, + "text": " So if I take all of those documents,", "tokens": [51396, 407, 498, 286, + 747, 439, 295, 729, 8512, 11, 51484], "temperature": 0.0, "avg_logprob": -0.11311496628655328, + "compression_ratio": 2.1294117647058823, "no_speech_prob": 0.0004339195729698986}, + {"id": 1044, "seek": 321310, "start": 3235.5, "end": 3238.46, "text": " the most + relevant ones, and I generate,", "tokens": [51484, 264, 881, 7340, 2306, 11, 293, + 286, 8460, 11, 51632], "temperature": 0.0, "avg_logprob": -0.11311496628655328, + "compression_ratio": 2.1294117647058823, "no_speech_prob": 0.0004339195729698986}, + {"id": 1045, "seek": 321310, "start": 3238.46, "end": 3239.86, "text": " and I average + their embeddings together,", "tokens": [51632, 293, 286, 4274, 641, 12240, 29432, + 1214, 11, 51702], "temperature": 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": + 2.1294117647058823, "no_speech_prob": 0.0004339195729698986}, {"id": 1046, "seek": + 321310, "start": 3239.86, "end": 3242.7799999999997, "text": " and go over to the + dense space,", "tokens": [51702, 293, 352, 670, 281, 264, 18011, 1901, 11, 51848], + "temperature": 0.0, "avg_logprob": -0.11311496628655328, "compression_ratio": 2.1294117647058823, + "no_speech_prob": 0.0004339195729698986}, {"id": 1047, "seek": 324278, "start": + 3242.78, "end": 3247.42, "text": " where I land should be where the concept of cinnamon + bread sticks is,", "tokens": [50364, 689, 286, 2117, 820, 312, 689, 264, 3410, 295, + 22969, 5961, 12518, 307, 11, 50596], "temperature": 0.0, "avg_logprob": -0.22591648931088654, + "compression_ratio": 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, + {"id": 1048, "seek": 324278, 
"start": 3247.42, "end": 3252.26, "text": " and things + nearby, which may not have the word cinnamon bread or sticks in them,", "tokens": + [50596, 293, 721, 11184, 11, 597, 815, 406, 362, 264, 1349, 22969, 5961, 420, 12518, + 294, 552, 11, 50838], "temperature": 0.0, "avg_logprob": -0.22591648931088654, "compression_ratio": + 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, {"id": 1049, "seek": + 324278, "start": 3252.26, "end": 3252.98, "text": " should come back.", "tokens": + [50838, 820, 808, 646, 13, 50874], "temperature": 0.0, "avg_logprob": -0.22591648931088654, + "compression_ratio": 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, + {"id": 1050, "seek": 324278, "start": 3252.98, "end": 3255.78, "text": " I might + get certain kinds of doughnuts,", "tokens": [50874, 286, 1062, 483, 1629, 3685, + 295, 7984, 29251, 11, 51014], "temperature": 0.0, "avg_logprob": -0.22591648931088654, + "compression_ratio": 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, + {"id": 1051, "seek": 324278, "start": 3255.78, "end": 3258.5800000000004, "text": + " and certain kind, I might get like a churro or something like that.", "tokens": + [51014, 293, 1629, 733, 11, 286, 1062, 483, 411, 257, 417, 374, 340, 420, 746, 411, + 300, 13, 51154], "temperature": 0.0, "avg_logprob": -0.22591648931088654, "compression_ratio": + 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, {"id": 1052, "seek": + 324278, "start": 3258.5800000000004, "end": 3259.86, "text": " So that''s how it + works.", "tokens": [51154, 407, 300, 311, 577, 309, 1985, 13, 51218], "temperature": + 0.0, "avg_logprob": -0.22591648931088654, "compression_ratio": 1.7508650519031141, + "no_speech_prob": 0.0006612985162064433}, {"id": 1053, "seek": 324278, "start": + 3259.86, "end": 3261.2200000000003, "text": " But this is the math here.", "tokens": + [51218, 583, 341, 307, 264, 5221, 510, 13, 51286], "temperature": 0.0, "avg_logprob": + -0.22591648931088654, 
"compression_ratio": 1.7508650519031141, "no_speech_prob": + 0.0006612985162064433}, {"id": 1054, "seek": 324278, "start": 3261.2200000000003, + "end": 3264.1000000000004, "text": " That''s actually the easiest way to go from + sparse to dense.", "tokens": [51286, 663, 311, 767, 264, 12889, 636, 281, 352, 490, + 637, 11668, 281, 18011, 13, 51430], "temperature": 0.0, "avg_logprob": -0.22591648931088654, + "compression_ratio": 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, + {"id": 1055, "seek": 324278, "start": 3264.1000000000004, "end": 3267.46, "text": + " The dense to sparse requires a semantic knowledge graph or similar.", "tokens": + [51430, 440, 18011, 281, 637, 11668, 7029, 257, 47982, 3601, 4295, 420, 2531, 13, + 51598], "temperature": 0.0, "avg_logprob": -0.22591648931088654, "compression_ratio": + 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, {"id": 1056, "seek": + 324278, "start": 3268.42, "end": 3269.26, "text": " Awesome.", "tokens": [51646, + 10391, 13, 51688], "temperature": 0.0, "avg_logprob": -0.22591648931088654, "compression_ratio": + 1.7508650519031141, "no_speech_prob": 0.0006612985162064433}, {"id": 1057, "seek": + 324278, "start": 3269.26, "end": 3271.26, "text": " I hope this answers your question, + Julen.", "tokens": [51688, 286, 1454, 341, 6338, 428, 1168, 11, 7174, 268, 13, 51788], + "temperature": 0.0, "avg_logprob": -0.22591648931088654, "compression_ratio": 1.7508650519031141, + "no_speech_prob": 0.0006612985162064433}, {"id": 1058, "seek": 327126, "start": + 3271.26, "end": 3276.46, "text": " I would not feel free to unmute and ask full + of questions.", "tokens": [50364, 286, 576, 406, 841, 1737, 281, 41445, 293, 1029, + 1577, 295, 1651, 13, 50624], "temperature": 0.0, "avg_logprob": -0.3246119659725982, + "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.0066570378839969635}, + {"id": 1059, "seek": 327126, "start": 3276.46, "end": 3280.5, "text": " Otherwise, + I''ll jump to the next one 
from Ursula.", "tokens": [50624, 10328, 11, 286, 603, + 3012, 281, 264, 958, 472, 490, 41303, 3780, 13, 50826], "temperature": 0.0, "avg_logprob": + -0.3246119659725982, "compression_ratio": 1.5957446808510638, "no_speech_prob": + 0.0066570378839969635}, {"id": 1060, "seek": 327126, "start": 3280.5, "end": 3283.5800000000004, + "text": " Do we build the inverted index and the forward index", "tokens": [50826, + 1144, 321, 1322, 264, 38969, 8186, 293, 264, 2128, 8186, 50980], "temperature": + 0.0, "avg_logprob": -0.3246119659725982, "compression_ratio": 1.5957446808510638, + "no_speech_prob": 0.0066570378839969635}, {"id": 1061, "seek": 327126, "start": + 3283.5800000000004, "end": 3287.38, "text": " to build the knowledge graph using + just some document chunks?", "tokens": [50980, 281, 1322, 264, 3601, 4295, 1228, + 445, 512, 4166, 24004, 30, 51170], "temperature": 0.0, "avg_logprob": -0.3246119659725982, + "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.0066570378839969635}, + {"id": 1062, "seek": 327126, "start": 3287.38, "end": 3290.42, "text": " Or do we + need a much bigger document base to make it work?", "tokens": [51170, 1610, 360, + 321, 643, 257, 709, 3801, 4166, 3096, 281, 652, 309, 589, 30, 51322], "temperature": + 0.0, "avg_logprob": -0.3246119659725982, "compression_ratio": 1.5957446808510638, + "no_speech_prob": 0.0066570378839969635}, {"id": 1063, "seek": 327126, "start": + 3291.6600000000003, "end": 3292.6200000000003, "text": " It''s a good question.", + "tokens": [51384, 467, 311, 257, 665, 1168, 13, 51432], "temperature": 0.0, "avg_logprob": + -0.3246119659725982, "compression_ratio": 1.5957446808510638, "no_speech_prob": + 0.0066570378839969635}, {"id": 1064, "seek": 327126, "start": 3292.6200000000003, + "end": 3293.6600000000003, "text": " Mm-hmm.", "tokens": [51432, 8266, 12, 10250, + 13, 51484], "temperature": 0.0, "avg_logprob": -0.3246119659725982, "compression_ratio": + 1.5957446808510638, "no_speech_prob": 
0.0066570378839969635}, {"id": 1065, "seek": + 327126, "start": 3293.6600000000003, "end": 3301.1400000000003, "text": " So the + best way for semantic knowledge graph to work the best,", "tokens": [51484, 407, + 264, 1151, 636, 337, 47982, 3601, 4295, 281, 589, 264, 1151, 11, 51858], "temperature": + 0.0, "avg_logprob": -0.3246119659725982, "compression_ratio": 1.5957446808510638, + "no_speech_prob": 0.0066570378839969635}, {"id": 1066, "seek": 330126, "start": + 3301.26, "end": 3307.1400000000003, "text": " you need to have overlaps of terms + that across documents.", "tokens": [50364, 291, 643, 281, 362, 15986, 2382, 295, + 2115, 300, 2108, 8512, 13, 50658], "temperature": 0.0, "avg_logprob": -0.26547984073036596, + "compression_ratio": 1.585, "no_speech_prob": 0.0027499250136315823}, {"id": 1067, + "seek": 330126, "start": 3307.1400000000003, "end": 3311.5400000000004, "text": + " Meaning, if I take something like stack exchange,", "tokens": [50658, 19948, 11, + 498, 286, 747, 746, 411, 8630, 7742, 11, 50878], "temperature": 0.0, "avg_logprob": + -0.26547984073036596, "compression_ratio": 1.585, "no_speech_prob": 0.0027499250136315823}, + {"id": 1068, "seek": 330126, "start": 3311.5400000000004, "end": 3315.46, "text": + " where there''s a bunch of topics being talked about,", "tokens": [50878, 689, + 456, 311, 257, 3840, 295, 8378, 885, 2825, 466, 11, 51074], "temperature": 0.0, + "avg_logprob": -0.26547984073036596, "compression_ratio": 1.585, "no_speech_prob": + 0.0027499250136315823}, {"id": 1069, "seek": 330126, "start": 3315.46, "end": 3320.1400000000003, + "text": " you''ll have lots of people who use the same words together and the same + documents.", "tokens": [51074, 291, 603, 362, 3195, 295, 561, 567, 764, 264, 912, + 2283, 1214, 293, 264, 912, 8512, 13, 51308], "temperature": 0.0, "avg_logprob": + -0.26547984073036596, "compression_ratio": 1.585, "no_speech_prob": 0.0027499250136315823}, + {"id": 1070, "seek": 330126, "start": 3320.1400000000003, 
"end": 3326.7400000000002, + "text": " When that happens, you can easily find sets of terms that overlap commonly", + "tokens": [51308, 1133, 300, 2314, 11, 291, 393, 3612, 915, 6352, 295, 2115, 300, + 19959, 12719, 51638], "temperature": 0.0, "avg_logprob": -0.26547984073036596, "compression_ratio": + 1.585, "no_speech_prob": 0.0027499250136315823}, {"id": 1071, "seek": 332674, "start": + 3326.74, "end": 3331.2599999999998, "text": " and use the semantic knowledge graph + to generate semantic understanding", "tokens": [50364, 293, 764, 264, 47982, 3601, + 4295, 281, 8460, 47982, 3701, 50590], "temperature": 0.0, "avg_logprob": -0.21526068084093988, + "compression_ratio": 1.6872586872586872, "no_speech_prob": 0.003010509768500924}, + {"id": 1072, "seek": 332674, "start": 3331.2599999999998, "end": 3333.8599999999997, + "text": " and relationships based upon those co-occurrences.", "tokens": [50590, + 293, 6159, 2361, 3564, 729, 598, 12, 905, 14112, 38983, 13, 50720], "temperature": + 0.0, "avg_logprob": -0.21526068084093988, "compression_ratio": 1.6872586872586872, + "no_speech_prob": 0.003010509768500924}, {"id": 1073, "seek": 332674, "start": 3333.8599999999997, + "end": 3337.02, "text": " All the math for that''s in the AI powered search book,", + "tokens": [50720, 1057, 264, 5221, 337, 300, 311, 294, 264, 7318, 17786, 3164, 1446, + 11, 50878], "temperature": 0.0, "avg_logprob": -0.21526068084093988, "compression_ratio": + 1.6872586872586872, "no_speech_prob": 0.003010509768500924}, {"id": 1074, "seek": + 332674, "start": 3337.02, "end": 3341.4599999999996, "text": " but that''s when + it works the best.", "tokens": [50878, 457, 300, 311, 562, 309, 1985, 264, 1151, + 13, 51100], "temperature": 0.0, "avg_logprob": -0.21526068084093988, "compression_ratio": + 1.6872586872586872, "no_speech_prob": 0.003010509768500924}, {"id": 1075, "seek": + 332674, "start": 3341.4599999999996, "end": 3345.22, "text": " Something like Wikipedia + is actually even though it''s 
commonly used", "tokens": [51100, 6595, 411, 28999, + 307, 767, 754, 1673, 309, 311, 12719, 1143, 51288], "temperature": 0.0, "avg_logprob": + -0.21526068084093988, "compression_ratio": 1.6872586872586872, "no_speech_prob": + 0.003010509768500924}, {"id": 1076, "seek": 332674, "start": 3345.22, "end": 3348.4599999999996, + "text": " for every data science project.", "tokens": [51288, 337, 633, 1412, 3497, + 1716, 13, 51450], "temperature": 0.0, "avg_logprob": -0.21526068084093988, "compression_ratio": + 1.6872586872586872, "no_speech_prob": 0.003010509768500924}, {"id": 1077, "seek": + 332674, "start": 3348.4599999999996, "end": 3350.62, "text": " It''s actually really + bad for semantic knowledge graphs", "tokens": [51450, 467, 311, 767, 534, 1578, + 337, 47982, 3601, 24877, 51558], "temperature": 0.0, "avg_logprob": -0.21526068084093988, + "compression_ratio": 1.6872586872586872, "no_speech_prob": 0.003010509768500924}, + {"id": 1078, "seek": 332674, "start": 3350.62, "end": 3355.74, "text": " because + every Wikipedia document tends to be about a particular topic", "tokens": [51558, + 570, 633, 28999, 4166, 12258, 281, 312, 466, 257, 1729, 4829, 51814], "temperature": + 0.0, "avg_logprob": -0.21526068084093988, "compression_ratio": 1.6872586872586872, + "no_speech_prob": 0.003010509768500924}, {"id": 1079, "seek": 335574, "start": 3355.74, + "end": 3358.62, "text": " and other than common terminology,", "tokens": [50364, + 293, 661, 813, 2689, 27575, 11, 50508], "temperature": 0.0, "avg_logprob": -0.1441986051380125, + "compression_ratio": 1.803088803088803, "no_speech_prob": 0.0001800281897885725}, + {"id": 1080, "seek": 335574, "start": 3358.62, "end": 3361.4199999999996, "text": + " you tend to not have a lot of overlap across documents", "tokens": [50508, 291, + 3928, 281, 406, 362, 257, 688, 295, 19959, 2108, 8512, 50648], "temperature": 0.0, + "avg_logprob": -0.1441986051380125, "compression_ratio": 1.803088803088803, "no_speech_prob": + 
0.0001800281897885725}, {"id": 1081, "seek": 335574, "start": 3361.4199999999996, + "end": 3362.9799999999996, "text": " because they''re all focused on one idea.", + "tokens": [50648, 570, 436, 434, 439, 5178, 322, 472, 1558, 13, 50726], "temperature": + 0.0, "avg_logprob": -0.1441986051380125, "compression_ratio": 1.803088803088803, + "no_speech_prob": 0.0001800281897885725}, {"id": 1082, "seek": 335574, "start": + 3362.9799999999996, "end": 3366.4599999999996, "text": " So for semantic knowledge + graph to work well,", "tokens": [50726, 407, 337, 47982, 3601, 4295, 281, 589, 731, + 11, 50900], "temperature": 0.0, "avg_logprob": -0.1441986051380125, "compression_ratio": + 1.803088803088803, "no_speech_prob": 0.0001800281897885725}, {"id": 1083, "seek": + 335574, "start": 3366.4599999999996, "end": 3370.2999999999997, "text": " you typically + are going to want to have overlap across your documents.", "tokens": [50900, 291, + 5850, 366, 516, 281, 528, 281, 362, 19959, 2108, 428, 8512, 13, 51092], "temperature": + 0.0, "avg_logprob": -0.1441986051380125, "compression_ratio": 1.803088803088803, + "no_speech_prob": 0.0001800281897885725}, {"id": 1084, "seek": 335574, "start": + 3370.2999999999997, "end": 3375.2599999999998, "text": " What that means is that + if you chunk your document so small", "tokens": [51092, 708, 300, 1355, 307, 300, + 498, 291, 16635, 428, 4166, 370, 1359, 51340], "temperature": 0.0, "avg_logprob": + -0.1441986051380125, "compression_ratio": 1.803088803088803, "no_speech_prob": 0.0001800281897885725}, + {"id": 1085, "seek": 335574, "start": 3375.2599999999998, "end": 3379.54, "text": + " that you only have a couple of words or sentences or something like that,", "tokens": + [51340, 300, 291, 787, 362, 257, 1916, 295, 2283, 420, 16579, 420, 746, 411, 300, + 11, 51554], "temperature": 0.0, "avg_logprob": -0.1441986051380125, "compression_ratio": + 1.803088803088803, "no_speech_prob": 0.0001800281897885725}, {"id": 1086, "seek": + 335574, 
"start": 3379.54, "end": 3381.7, "text": " you lose a lot of that context.", + "tokens": [51554, 291, 3624, 257, 688, 295, 300, 4319, 13, 51662], "temperature": + 0.0, "avg_logprob": -0.1441986051380125, "compression_ratio": 1.803088803088803, + "no_speech_prob": 0.0001800281897885725}, {"id": 1087, "seek": 335574, "start": + 3381.7, "end": 3383.9799999999996, "text": " I mean, in general, when you chunk, + you lose context.", "tokens": [51662, 286, 914, 11, 294, 2674, 11, 562, 291, 16635, + 11, 291, 3624, 4319, 13, 51776], "temperature": 0.0, "avg_logprob": -0.1441986051380125, + "compression_ratio": 1.803088803088803, "no_speech_prob": 0.0001800281897885725}, + {"id": 1088, "seek": 338398, "start": 3383.98, "end": 3387.26, "text": " That''s + the problem with chunking, with most forms of chunking.", "tokens": [50364, 663, + 311, 264, 1154, 365, 16635, 278, 11, 365, 881, 6422, 295, 16635, 278, 13, 50528], + "temperature": 0.0, "avg_logprob": -0.16659028412865812, "compression_ratio": 1.673913043478261, + "no_speech_prob": 0.004153615795075893}, {"id": 1089, "seek": 338398, "start": 3387.26, + "end": 3389.98, "text": " And so you have to be careful not to chunk too much,", + "tokens": [50528, 400, 370, 291, 362, 281, 312, 5026, 406, 281, 16635, 886, 709, + 11, 50664], "temperature": 0.0, "avg_logprob": -0.16659028412865812, "compression_ratio": + 1.673913043478261, "no_speech_prob": 0.004153615795075893}, {"id": 1090, "seek": + 338398, "start": 3389.98, "end": 3394.9, "text": " but the end verse is also true + if you only have 100 documents", "tokens": [50664, 457, 264, 917, 7996, 307, 611, + 2074, 498, 291, 787, 362, 2319, 8512, 50910], "temperature": 0.0, "avg_logprob": + -0.16659028412865812, "compression_ratio": 1.673913043478261, "no_speech_prob": + 0.004153615795075893}, {"id": 1091, "seek": 338398, "start": 3394.9, "end": 3397.7, + "text": " and every single one of them is 1,000 pages long,", "tokens": [50910, + 293, 633, 2167, 472, 295, 552, 307, 502, 11, 
1360, 7183, 938, 11, 51050], "temperature": + 0.0, "avg_logprob": -0.16659028412865812, "compression_ratio": 1.673913043478261, + "no_speech_prob": 0.004153615795075893}, {"id": 1092, "seek": 338398, "start": 3397.7, + "end": 3399.26, "text": " well, there''s way too much overlap", "tokens": [51050, + 731, 11, 456, 311, 636, 886, 709, 19959, 51128], "temperature": 0.0, "avg_logprob": + -0.16659028412865812, "compression_ratio": 1.673913043478261, "no_speech_prob": + 0.004153615795075893}, {"id": 1093, "seek": 338398, "start": 3399.26, "end": 3400.78, + "text": " and everything is related at that point.", "tokens": [51128, 293, 1203, + 307, 4077, 412, 300, 935, 13, 51204], "temperature": 0.0, "avg_logprob": -0.16659028412865812, + "compression_ratio": 1.673913043478261, "no_speech_prob": 0.004153615795075893}, + {"id": 1094, "seek": 338398, "start": 3400.78, "end": 3405.02, "text": " So I would + say it''s no different than just how you would typically", "tokens": [51204, 407, + 286, 576, 584, 309, 311, 572, 819, 813, 445, 577, 291, 576, 5850, 51416], "temperature": + 0.0, "avg_logprob": -0.16659028412865812, "compression_ratio": 1.673913043478261, + "no_speech_prob": 0.004153615795075893}, {"id": 1095, "seek": 338398, "start": 3405.02, + "end": 3407.66, "text": " segment your documents for any search problem.", "tokens": + [51416, 9469, 428, 8512, 337, 604, 3164, 1154, 13, 51548], "temperature": 0.0, "avg_logprob": + -0.16659028412865812, "compression_ratio": 1.673913043478261, "no_speech_prob": + 0.004153615795075893}, {"id": 1096, "seek": 338398, "start": 3407.66, "end": 3411.38, + "text": " You need to be granular enough to be useful,", "tokens": [51548, 509, + 643, 281, 312, 39962, 1547, 281, 312, 4420, 11, 51734], "temperature": 0.0, "avg_logprob": + -0.16659028412865812, "compression_ratio": 1.673913043478261, "no_speech_prob": + 0.004153615795075893}, {"id": 1097, "seek": 341138, "start": 3411.46, "end": 3416.38, + "text": " but not broad enough to be too 
general.", "tokens": [50368, 457, 406, + 4152, 1547, 281, 312, 886, 2674, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.25470991868239184, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.014531845226883888}, {"id": 1098, "seek": 341138, "start": 3416.38, "end": 3417.54, + "text": " Yeah, awesome.", "tokens": [50614, 865, 11, 3476, 13, 50672], "temperature": + 0.0, "avg_logprob": -0.25470991868239184, "compression_ratio": 1.6840148698884758, + "no_speech_prob": 0.014531845226883888}, {"id": 1099, "seek": 341138, "start": 3417.54, + "end": 3419.9, "text": " And now the logistical question from Arun,", "tokens": + [50672, 400, 586, 264, 3565, 42686, 1168, 490, 1587, 409, 11, 50790], "temperature": + 0.0, "avg_logprob": -0.25470991868239184, "compression_ratio": 1.6840148698884758, + "no_speech_prob": 0.014531845226883888}, {"id": 1100, "seek": 341138, "start": 3419.9, + "end": 3422.46, "text": " whether we will share slides.", "tokens": [50790, 1968, + 321, 486, 2073, 9788, 13, 50918], "temperature": 0.0, "avg_logprob": -0.25470991868239184, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.014531845226883888}, + {"id": 1101, "seek": 341138, "start": 3422.46, "end": 3423.62, "text": " Yes, absolutely.", + "tokens": [50918, 1079, 11, 3122, 13, 50976], "temperature": 0.0, "avg_logprob": + -0.25470991868239184, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.014531845226883888}, {"id": 1102, "seek": 341138, "start": 3423.62, "end": 3426.7400000000002, + "text": " Yeah, the video for this, everybody who signed up", "tokens": [50976, + 865, 11, 264, 960, 337, 341, 11, 2201, 567, 8175, 493, 51132], "temperature": 0.0, + "avg_logprob": -0.25470991868239184, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.014531845226883888}, {"id": 1103, "seek": 341138, "start": 3426.7400000000002, + "end": 3428.78, "text": " will get in probably like 48 hours,", "tokens": [51132, + 486, 483, 294, 1391, 411, 11174, 2496, 11, 
51234], "temperature": 0.0, "avg_logprob": + -0.25470991868239184, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.014531845226883888}, {"id": 1104, "seek": 341138, "start": 3428.78, "end": 3430.3, + "text": " you''ll get emailed a copy of the video,", "tokens": [51234, 291, 603, + 483, 3796, 292, 257, 5055, 295, 264, 960, 11, 51310], "temperature": 0.0, "avg_logprob": + -0.25470991868239184, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.014531845226883888}, {"id": 1105, "seek": 341138, "start": 3430.3, "end": 3431.3, + "text": " so you can refer back to it.", "tokens": [51310, 370, 291, 393, 2864, + 646, 281, 309, 13, 51360], "temperature": 0.0, "avg_logprob": -0.25470991868239184, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.014531845226883888}, + {"id": 1106, "seek": 341138, "start": 3431.3, "end": 3435.54, "text": " And I''ll + also send an email with the slides probably shortly thereafter.", "tokens": [51360, + 400, 286, 603, 611, 2845, 364, 3796, 365, 264, 9788, 1391, 13392, 38729, 13, 51572], + "temperature": 0.0, "avg_logprob": -0.25470991868239184, "compression_ratio": 1.6840148698884758, + "no_speech_prob": 0.014531845226883888}, {"id": 1107, "seek": 341138, "start": 3435.54, + "end": 3439.54, "text": " Yeah, and I plan to publish this in the vector podcast + as well.", "tokens": [51572, 865, 11, 293, 286, 1393, 281, 11374, 341, 294, 264, + 8062, 7367, 382, 731, 13, 51772], "temperature": 0.0, "avg_logprob": -0.25470991868239184, + "compression_ratio": 1.6840148698884758, "no_speech_prob": 0.014531845226883888}, + {"id": 1108, "seek": 341138, "start": 3439.54, "end": 3440.82, "text": " Yes, absolutely.", + "tokens": [51772, 1079, 11, 3122, 13, 51836], "temperature": 0.0, "avg_logprob": + -0.25470991868239184, "compression_ratio": 1.6840148698884758, "no_speech_prob": + 0.014531845226883888}, {"id": 1109, "seek": 344082, "start": 3440.82, "end": 3443.34, + "text": " I''ll follow up later.", "tokens": 
[50364, 286, 603, 1524, 493, 1780, + 13, 50490], "temperature": 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": + 1.7216117216117217, "no_speech_prob": 0.03407295420765877}, {"id": 1110, "seek": + 344082, "start": 3443.34, "end": 3445.5, "text": " The next question is really cool + from Cloud.", "tokens": [50490, 440, 958, 1168, 307, 534, 1627, 490, 8061, 13, 50598], + "temperature": 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.03407295420765877}, {"id": 1111, "seek": 344082, "start": 3445.5, + "end": 3450.34, "text": " You''re creating a wormhole vector that will move us from + embedding space", "tokens": [50598, 509, 434, 4084, 257, 23835, 14094, 8062, 300, + 486, 1286, 505, 490, 12240, 3584, 1901, 50840], "temperature": 0.0, "avg_logprob": + -0.23563863314115086, "compression_ratio": 1.7216117216117217, "no_speech_prob": + 0.03407295420765877}, {"id": 1112, "seek": 344082, "start": 3450.34, "end": 3452.06, + "text": " to sparse vector.", "tokens": [50840, 281, 637, 11668, 8062, 13, 50926], + "temperature": 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.03407295420765877}, {"id": 1113, "seek": 344082, "start": 3452.06, + "end": 3454.5, "text": " I understand the methodology, but the way back now,", "tokens": + [50926, 286, 1223, 264, 24850, 11, 457, 264, 636, 646, 586, 11, 51048], "temperature": + 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.03407295420765877}, {"id": 1114, "seek": 344082, "start": 3454.5, + "end": 3457.38, "text": " how do we aggregate a set of sparse vectors", "tokens": + [51048, 577, 360, 321, 26118, 257, 992, 295, 637, 11668, 18875, 51192], "temperature": + 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.03407295420765877}, {"id": 1115, "seek": 344082, "start": 3457.38, + "end": 3460.78, "text": " 
that represent documents in a way that will allow us to + move us", "tokens": [51192, 300, 2906, 8512, 294, 257, 636, 300, 486, 2089, 505, + 281, 1286, 505, 51362], "temperature": 0.0, "avg_logprob": -0.23563863314115086, + "compression_ratio": 1.7216117216117217, "no_speech_prob": 0.03407295420765877}, + {"id": 1116, "seek": 344082, "start": 3460.78, "end": 3461.86, "text": " to the + embedding space?", "tokens": [51362, 281, 264, 12240, 3584, 1901, 30, 51416], "temperature": + 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.03407295420765877}, {"id": 1117, "seek": 344082, "start": 3461.86, + "end": 3464.94, "text": " In other words, from the sparse, like you showed the tray,", + "tokens": [51416, 682, 661, 2283, 11, 490, 264, 637, 11668, 11, 411, 291, 4712, + 264, 16027, 11, 51570], "temperature": 0.0, "avg_logprob": -0.23563863314115086, + "compression_ratio": 1.7216117216117217, "no_speech_prob": 0.03407295420765877}, + {"id": 1118, "seek": 344082, "start": 3464.94, "end": 3467.06, "text": " we have + millions of dimensions, right?", "tokens": [51570, 321, 362, 6803, 295, 12819, 11, + 558, 30, 51676], "temperature": 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": + 1.7216117216117217, "no_speech_prob": 0.03407295420765877}, {"id": 1119, "seek": + 344082, "start": 3467.06, "end": 3469.98, "text": " How do we compact that, right?", + "tokens": [51676, 1012, 360, 321, 14679, 300, 11, 558, 30, 51822], "temperature": + 0.0, "avg_logprob": -0.23563863314115086, "compression_ratio": 1.7216117216117217, + "no_speech_prob": 0.03407295420765877}, {"id": 1120, "seek": 346998, "start": 3469.98, + "end": 3472.9, "text": " And don''t lose anything and don''t introduce any noise", + "tokens": [50364, 400, 500, 380, 3624, 1340, 293, 500, 380, 5366, 604, 5658, 50510], + "temperature": 0.0, "avg_logprob": -0.18355014181544638, "compression_ratio": 1.653386454183267, + "no_speech_prob": 
0.00017973485228139907}, {"id": 1121, "seek": 346998, "start": + 3472.9, "end": 3475.54, "text": " when we''re not way to the dense space?", "tokens": + [50510, 562, 321, 434, 406, 636, 281, 264, 18011, 1901, 30, 50642], "temperature": + 0.0, "avg_logprob": -0.18355014181544638, "compression_ratio": 1.653386454183267, + "no_speech_prob": 0.00017973485228139907}, {"id": 1122, "seek": 346998, "start": + 3475.54, "end": 3477.42, "text": " Yeah, so it''s a really great question.", "tokens": + [50642, 865, 11, 370, 309, 311, 257, 534, 869, 1168, 13, 50736], "temperature": + 0.0, "avg_logprob": -0.18355014181544638, "compression_ratio": 1.653386454183267, + "no_speech_prob": 0.00017973485228139907}, {"id": 1123, "seek": 346998, "start": + 3477.42, "end": 3479.5, "text": " I answered it technically in terms of pulling,", + "tokens": [50736, 286, 10103, 309, 12120, 294, 2115, 295, 8407, 11, 50840], "temperature": + 0.0, "avg_logprob": -0.18355014181544638, "compression_ratio": 1.653386454183267, + "no_speech_prob": 0.00017973485228139907}, {"id": 1124, "seek": 346998, "start": + 3479.5, "end": 3482.06, "text": " but let me add some color to it in terms of techniques.", + "tokens": [50840, 457, 718, 385, 909, 512, 2017, 281, 309, 294, 2115, 295, 7512, + 13, 50968], "temperature": 0.0, "avg_logprob": -0.18355014181544638, "compression_ratio": + 1.653386454183267, "no_speech_prob": 0.00017973485228139907}, {"id": 1125, "seek": + 346998, "start": 3482.06, "end": 3484.06, "text": " So the...", "tokens": [50968, + 407, 264, 485, 51068], "temperature": 0.0, "avg_logprob": -0.18355014181544638, + "compression_ratio": 1.653386454183267, "no_speech_prob": 0.00017973485228139907}, + {"id": 1126, "seek": 346998, "start": 3486.02, "end": 3486.98, "text": " There''s + a couple of things here.", "tokens": [51166, 821, 311, 257, 1916, 295, 721, 510, + 13, 51214], "temperature": 0.0, "avg_logprob": -0.18355014181544638, "compression_ratio": + 1.653386454183267, "no_speech_prob": 
0.00017973485228139907}, {"id": 1127, "seek": + 346998, "start": 3486.98, "end": 3491.54, "text": " One, whenever you''re querying + in an inverted index,", "tokens": [51214, 1485, 11, 5699, 291, 434, 7083, 1840, + 294, 364, 38969, 8186, 11, 51442], "temperature": 0.0, "avg_logprob": -0.18355014181544638, + "compression_ratio": 1.653386454183267, "no_speech_prob": 0.00017973485228139907}, + {"id": 1128, "seek": 346998, "start": 3491.54, "end": 3495.66, "text": " there''s + typically a kind of Boolean matching phase,", "tokens": [51442, 456, 311, 5850, + 257, 733, 295, 23351, 28499, 14324, 5574, 11, 51648], "temperature": 0.0, "avg_logprob": + -0.18355014181544638, "compression_ratio": 1.653386454183267, "no_speech_prob": + 0.00017973485228139907}, {"id": 1129, "seek": 346998, "start": 3495.66, "end": 3497.22, + "text": " and then there''s a ranking phase,", "tokens": [51648, 293, 550, 456, + 311, 257, 17833, 5574, 11, 51726], "temperature": 0.0, "avg_logprob": -0.18355014181544638, + "compression_ratio": 1.653386454183267, "no_speech_prob": 0.00017973485228139907}, + {"id": 1130, "seek": 349722, "start": 3497.22, "end": 3500.54, "text": " meaning + if you had 10 million documents in your index,", "tokens": [50364, 3620, 498, 291, + 632, 1266, 2459, 8512, 294, 428, 8186, 11, 50530], "temperature": 0.0, "avg_logprob": + -0.17541495103102464, "compression_ratio": 1.8815331010452963, "no_speech_prob": + 0.004542068112641573}, {"id": 1131, "seek": 349722, "start": 3500.54, "end": 3503.9399999999996, + "text": " you''re not gonna return a ranked list of 10 million documents.", "tokens": + [50530, 291, 434, 406, 799, 2736, 257, 20197, 1329, 295, 1266, 2459, 8512, 13, 50700], + "temperature": 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": 1.8815331010452963, + "no_speech_prob": 0.004542068112641573}, {"id": 1132, "seek": 349722, "start": 3503.9399999999996, + "end": 3505.3799999999997, "text": " You''re gonna probably return the documents", + "tokens": 
[50700, 509, 434, 799, 1391, 2736, 264, 8512, 50772], "temperature": 0.0, + "avg_logprob": -0.17541495103102464, "compression_ratio": 1.8815331010452963, "no_speech_prob": + 0.004542068112641573}, {"id": 1133, "seek": 349722, "start": 3505.3799999999997, + "end": 3507.8999999999996, "text": " that have the specific keywords you search + for,", "tokens": [50772, 300, 362, 264, 2685, 21009, 291, 3164, 337, 11, 50898], + "temperature": 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": 1.8815331010452963, + "no_speech_prob": 0.004542068112641573}, {"id": 1134, "seek": 349722, "start": 3507.8999999999996, + "end": 3510.14, "text": " which is gonna be a much smaller document set.", "tokens": + [50898, 597, 307, 799, 312, 257, 709, 4356, 4166, 992, 13, 51010], "temperature": + 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": 1.8815331010452963, + "no_speech_prob": 0.004542068112641573}, {"id": 1135, "seek": 349722, "start": 3511.18, + "end": 3512.02, "text": " And so that''s...", "tokens": [51062, 400, 370, 300, 311, + 485, 51104], "temperature": 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": + 1.8815331010452963, "no_speech_prob": 0.004542068112641573}, {"id": 1136, "seek": + 349722, "start": 3512.02, "end": 3513.54, "text": " And you can do the same thing + on the dense side", "tokens": [51104, 400, 291, 393, 360, 264, 912, 551, 322, 264, + 18011, 1252, 51180], "temperature": 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": + 1.8815331010452963, "no_speech_prob": 0.004542068112641573}, {"id": 1137, "seek": + 349722, "start": 3513.54, "end": 3516.8999999999996, "text": " with cutoffs on cosine + similarity or something like that.", "tokens": [51180, 365, 1723, 19231, 322, 23565, + 32194, 420, 746, 411, 300, 13, 51348], "temperature": 0.0, "avg_logprob": -0.17541495103102464, + "compression_ratio": 1.8815331010452963, "no_speech_prob": 0.004542068112641573}, + {"id": 1138, "seek": 349722, "start": 
3516.8999999999996, "end": 3521.8999999999996, + "text": " But, the step one is you start with a condensed document set", "tokens": + [51348, 583, 11, 264, 1823, 472, 307, 291, 722, 365, 257, 36398, 4166, 992, 51598], + "temperature": 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": 1.8815331010452963, + "no_speech_prob": 0.004542068112641573}, {"id": 1139, "seek": 349722, "start": 3522.18, + "end": 3524.18, "text": " that should represent generally the meaning", "tokens": + [51612, 300, 820, 2906, 5101, 264, 3620, 51712], "temperature": 0.0, "avg_logprob": + -0.17541495103102464, "compression_ratio": 1.8815331010452963, "no_speech_prob": + 0.004542068112641573}, {"id": 1140, "seek": 349722, "start": 3524.18, "end": 3526.9399999999996, + "text": " of what you searched for using the keywords you searched", "tokens": [51712, + 295, 437, 291, 22961, 337, 1228, 264, 21009, 291, 22961, 51850], "temperature": + 0.0, "avg_logprob": -0.17541495103102464, "compression_ratio": 1.8815331010452963, + "no_speech_prob": 0.004542068112641573}, {"id": 1141, "seek": 352694, "start": 3526.98, + "end": 3528.42, "text": " from the lexical side.", "tokens": [50366, 490, 264, 476, + 87, 804, 1252, 13, 50438], "temperature": 0.0, "avg_logprob": -0.14011366536298137, + "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286}, + {"id": 1142, "seek": 352694, "start": 3528.42, "end": 3532.26, "text": " However, + because the idea of a wormhole vector", "tokens": [50438, 2908, 11, 570, 264, 1558, + 295, 257, 23835, 14094, 8062, 50630], "temperature": 0.0, "avg_logprob": -0.14011366536298137, + "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286}, + {"id": 1143, "seek": 352694, "start": 3532.26, "end": 3536.14, "text": " is to find + the best corresponding region", "tokens": [50630, 307, 281, 915, 264, 1151, 11760, + 4458, 50824], "temperature": 0.0, "avg_logprob": -0.14011366536298137, "compression_ratio": + 
1.7424242424242424, "no_speech_prob": 0.0026540064718574286}, {"id": 1144, "seek": + 352694, "start": 3536.14, "end": 3538.54, "text": " in the other semantic space,", + "tokens": [50824, 294, 264, 661, 47982, 1901, 11, 50944], "temperature": 0.0, "avg_logprob": + -0.14011366536298137, "compression_ratio": 1.7424242424242424, "no_speech_prob": + 0.0026540064718574286}, {"id": 1145, "seek": 352694, "start": 3538.54, "end": 3541.94, + "text": " it can often be useful to not take that entire document set,", "tokens": + [50944, 309, 393, 2049, 312, 4420, 281, 406, 747, 300, 2302, 4166, 992, 11, 51114], + "temperature": 0.0, "avg_logprob": -0.14011366536298137, "compression_ratio": 1.7424242424242424, + "no_speech_prob": 0.0026540064718574286}, {"id": 1146, "seek": 352694, "start": + 3541.94, "end": 3543.26, "text": " either matching the query,", "tokens": [51114, + 2139, 14324, 264, 14581, 11, 51180], "temperature": 0.0, "avg_logprob": -0.14011366536298137, + "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286}, + {"id": 1147, "seek": 352694, "start": 3543.26, "end": 3545.78, "text": " but if + you feel confident about your ranking,", "tokens": [51180, 457, 498, 291, 841, 6679, + 466, 428, 17833, 11, 51306], "temperature": 0.0, "avg_logprob": -0.14011366536298137, + "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286}, + {"id": 1148, "seek": 352694, "start": 3545.78, "end": 3547.5, "text": " then you + can take the top end document.", "tokens": [51306, 550, 291, 393, 747, 264, 1192, + 917, 4166, 13, 51392], "temperature": 0.0, "avg_logprob": -0.14011366536298137, + "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286}, + {"id": 1149, "seek": 352694, "start": 3547.5, "end": 3549.78, "text": " So maybe + you match 10,000,", "tokens": [51392, 407, 1310, 291, 2995, 1266, 11, 1360, 11, + 51506], "temperature": 0.0, "avg_logprob": -0.14011366536298137, "compression_ratio": + 
1.7424242424242424, "no_speech_prob": 0.0026540064718574286},
{"id": 1150, "seek": 352694, "start": 3549.78, "end": 3551.1, "text": " and maybe you only take the top hundred", "tokens": [51506, 293, 1310, 291, 787, 747, 264, 1192, 3262, 51572], "temperature": 0.0, "avg_logprob": -0.14011366536298137, "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286},
{"id": 1151, "seek": 352694, "start": 3551.1, "end": 3552.62, "text": " and say, hey, from the top hundred,", "tokens": [51572, 293, 584, 11, 4177, 11, 490, 264, 1192, 3262, 11, 51648], "temperature": 0.0, "avg_logprob": -0.14011366536298137, "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286},
{"id": 1152, "seek": 352694, "start": 3552.62, "end": 3554.86, "text": " if you know your relevance ranking is good,", "tokens": [51648, 498, 291, 458, 428, 32684, 17833, 307, 665, 11, 51760], "temperature": 0.0, "avg_logprob": -0.14011366536298137, "compression_ratio": 1.7424242424242424, "no_speech_prob": 0.0026540064718574286},
{"id": 1153, "seek": 355486, "start": 3554.86, "end": 3558.1400000000003, "text": " then you're gonna use that to generate a more precise wormhole", "tokens": [50364, 550, 291, 434, 799, 764, 300, 281, 8460, 257, 544, 13600, 23835, 14094, 50528], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1154, "seek": 355486, "start": 3558.1400000000003, "end": 3560.54, "text": " vector to the meaning of those top documents", "tokens": [50528, 8062, 281, 264, 3620, 295, 729, 1192, 8512, 50648], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1155, "seek": 355486, "start": 3560.54, "end": 3562.02, "text": " over to the dense space.", "tokens": [50648, 670, 281, 264, 18011, 1901, 13, 50722], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1156, "seek": 355486, "start": 3562.02, "end": 3566.02, "text": " So, whether you go with the full matching document set,", "tokens": [50722, 407, 11, 1968, 291, 352, 365, 264, 1577, 14324, 4166, 992, 11, 50922], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1157, "seek": 355486, "start": 3566.02, "end": 3568.02, "text": " or you go with the top end,", "tokens": [50922, 420, 291, 352, 365, 264, 1192, 917, 11, 51022], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1158, "seek": 355486, "start": 3568.02, "end": 3570.1, "text": " that's really a just practical matter", "tokens": [51022, 300, 311, 534, 257, 445, 8496, 1871, 51126], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1159, "seek": 355486, "start": 3570.1, "end": 3572.2200000000003, "text": " of how confident you are on the ranking.", "tokens": [51126, 295, 577, 6679, 291, 366, 322, 264, 17833, 13, 51232], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1160, "seek": 355486, "start": 3572.2200000000003, "end": 3574.78, "text": " If you're really confident in your relevance,", "tokens": [51232, 759, 291, 434, 534, 6679, 294, 428, 32684, 11, 51360], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1161, "seek": 355486, "start": 3574.78, "end": 3576.6200000000003, "text": " you should go with the more relevant documents.", "tokens": [51360, 291, 820, 352, 365, 264, 544, 7340, 8512, 13, 51452], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1162, "seek": 355486, "start": 3576.6200000000003, "end": 3578.6600000000003, "text": " And if you're not, just take the whole document set", "tokens": [51452, 400, 498, 291, 434, 406, 11, 445, 747, 264, 1379, 4166, 992, 51554], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1163, "seek": 355486, "start": 3578.6600000000003, "end": 3581.78, "text": " and it should sort of average out the meaning.", "tokens": [51554, 293, 309, 820, 1333, 295, 4274, 484, 264, 3620, 13, 51710], "temperature": 0.0, "avg_logprob": -0.13648429630309578, "compression_ratio": 1.8841698841698842, "no_speech_prob": 0.00020072469487786293},
{"id": 1164, "seek": 358178, "start": 3582.78, "end": 3585.26, "text": " One other thing that we didn't really get into", "tokens": [50414, 1485, 661, 551, 300, 321, 994, 380, 534, 483, 666, 50538], "temperature": 0.0, "avg_logprob": -0.27242630499380605, "compression_ratio": 1.3409090909090908, "no_speech_prob": 0.0033424245193600655},
{"id": 1165, "seek": 358178, "start": 3585.26, "end": 3589.2200000000003, "text": " is that the strategy, the technique I was showing", "tokens": [50538, 307, 300, 264, 5206, 11, 264, 6532, 286, 390, 4099, 50736], "temperature": 0.0, "avg_logprob": -0.27242630499380605, "compression_ratio": 1.3409090909090908, "no_speech_prob": 0.0033424245193600655},
{"id": 1166, "seek": 358178, "start": 3589.2200000000003, "end": 3594.2200000000003, "text": " if I, let me jump back to the final slide one second.", "tokens": [50736, 498, 286, 11, 718, 385, 3012, 646, 281, 264, 2572, 4137, 472, 1150, 13, 50986], "temperature": 0.0, "avg_logprob": -0.27242630499380605, "compression_ratio": 1.3409090909090908, "no_speech_prob": 0.0033424245193600655},
{"id": 1167, "seek": 358178, "start": 3600.0600000000004, "end": 3601.78, "text": " If I jump back to...", "tokens": [51278, 759, 286, 3012, 646, 281, 485, 51364], "temperature": 0.0, "avg_logprob": -0.27242630499380605, "compression_ratio": 1.3409090909090908, "no_speech_prob": 0.0033424245193600655},
{"id": 1168, "seek": 358178, "start": 3606.7400000000002, "end": 3607.5800000000004, "text": " Here.", "tokens": [51612, 1692, 13, 51654], "temperature": 0.0, "avg_logprob": -0.27242630499380605, "compression_ratio": 1.3409090909090908, "no_speech_prob": 0.0033424245193600655},
{"id": 1169, "seek": 360758, "start": 3608.58, "end": 3611.8199999999997, "text": " So the technique that I'm showing,", "tokens": [50414, 407, 264, 6532, 300, 286, 478, 4099, 11, 50576], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1170, "seek": 360758, "start": 3611.8199999999997, "end": 3614.1, "text": " where I get my document set,", "tokens": [50576, 689, 286, 483, 452, 4166, 992, 11, 50690], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1171, "seek": 360758, "start": 3614.1, "end": 3615.98, "text": " pull my embeddings together,", "tokens": [50690, 2235, 452, 12240, 29432, 1214, 11, 50784], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1172, "seek": 360758, "start": 3615.98, "end": 3619.7, "text": " that ultimately gives me a single embedding,", "tokens": [50784, 300, 6284, 2709, 385, 257, 2167, 12240, 3584, 11, 50970], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1173, "seek": 360758, "start": 3619.7, "end": 3623.58, "text": " which is a single point over here in my dense vector space.", "tokens": [50970, 597, 307, 257, 2167, 935, 670, 510, 294, 452, 18011, 8062, 1901, 13, 51164], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1174, "seek": 360758, "start": 3623.58, "end": 3627.8199999999997, "text": " The reality is that different queries have different specificity.", "tokens": [51164, 440, 4103, 307, 300, 819, 24109, 362, 819, 2685, 507, 13, 51376], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1175, "seek": 360758, "start": 3627.8199999999997, "end": 3630.62, "text": " So imagine this is like a job search engine.", "tokens": [51376, 407, 3811, 341, 307, 411, 257, 1691, 3164, 2848, 13, 51516], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1176, "seek": 360758, "start": 3630.62, "end": 3635.62, "text": " If I run a search for senior AI search engineer,", "tokens": [51516, 759, 286, 1190, 257, 3164, 337, 7965, 7318, 3164, 11403, 11, 51766], "temperature": 0.0, "avg_logprob": -0.15337547011997388, "compression_ratio": 1.6227272727272728, "no_speech_prob": 0.0016515774186700583},
{"id": 1177, "seek": 363758, "start": 3638.58, "end": 3642.22, "text": " a late interaction, culverts,", "tokens": [50414, 257, 3469, 9285, 11, 11021, 36999, 11, 50596], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1178, "seek": 363758, "start": 3644.9, "end": 3647.7799999999997, "text": " signals boosting and collaborative filter.", "tokens": [50730, 12354, 43117, 293, 16555, 6608, 13, 50874], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1179, "seek": 363758, "start": 3647.7799999999997, "end": 3650.74, "text": " If I run that search, that's a very specific search.", "tokens": [50874, 759, 286, 1190, 300, 3164, 11, 300, 311, 257, 588, 2685, 3164, 13, 51022], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1180, "seek": 363758, "start": 3650.74, "end": 3653.22, "text": " Frankly, it probably doesn't match anybody,", "tokens": [51022, 41344, 11, 309, 1391, 1177, 380, 2995, 4472, 11, 51146], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1181, "seek": 363758, "start": 3653.22, "end": 3654.94, "text": " but if I ran that search,", "tokens": [51146, 457, 498, 286, 5872, 300, 3164, 11, 51232], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1182, "seek": 363758, "start": 3654.94, "end": 3656.62, "text": " it would be a very small number of documents.", "tokens": [51232, 309, 576, 312, 257, 588, 1359, 1230, 295, 8512, 13, 51316], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1183, "seek": 363758, "start": 3656.62, "end": 3657.66, "text": " It's very specific.", "tokens": [51316, 467, 311, 588, 2685, 13, 51368], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1184, "seek": 363758, "start": 3657.66, "end": 3661.46, "text": " However, and so in that case, having a point kind of makes sense.", "tokens": [51368, 2908, 11, 293, 370, 294, 300, 1389, 11, 1419, 257, 935, 733, 295, 1669, 2020, 13, 51558], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1185, "seek": 363758, "start": 3661.46, "end": 3663.86, "text": " However, if I ran a search for sales,", "tokens": [51558, 2908, 11, 498, 286, 5872, 257, 3164, 337, 5763, 11, 51678], "temperature": 0.0, "avg_logprob": -0.1882278849777666, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.0019133547320961952},
{"id": 1186, "seek": 366386, "start": 3664.86, "end": 3668.86, "text": " that's like a third of all jobs.", "tokens": [50414, 300, 311, 411, 257, 2636, 295, 439, 4782, 13, 50614], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1187, "seek": 366386, "start": 3668.86, "end": 3671.5, "text": " And for me to take the notion of sales,", "tokens": [50614, 400, 337, 385, 281, 747, 264, 10710, 295, 5763, 11, 50746], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1188, "seek": 366386, "start": 3671.5, "end": 3674.94, "text": " which is probably a giant region in this vector space", "tokens": [50746, 597, 307, 1391, 257, 7410, 4458, 294, 341, 8062, 1901, 50918], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1189, "seek": 366386, "start": 3674.94, "end": 3676.7000000000003, "text": " with lots of nuance inside of it,", "tokens": [50918, 365, 3195, 295, 42625, 1854, 295, 309, 11, 51006], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1190, "seek": 366386, "start": 3676.7000000000003, "end": 3680.58, "text": " and to then turn that into just a point in the other vector space,", "tokens": [51006, 293, 281, 550, 1261, 300, 666, 445, 257, 935, 294, 264, 661, 8062, 1901, 11, 51200], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1191, "seek": 366386, "start": 3680.58, "end": 3682.7400000000002, "text": " it's probably not gonna work out super well", "tokens": [51200, 309, 311, 1391, 406, 799, 589, 484, 1687, 731, 51308], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1192, "seek": 366386, "start": 3682.7400000000002, "end": 3683.78, "text": " because there's probably,", "tokens": [51308, 570, 456, 311, 1391, 11, 51360], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1193, "seek": 366386, "start": 3683.78, "end": 3687.34, "text": " sales is probably distributed across that other vector space", "tokens": [51360, 5763, 307, 1391, 12631, 2108, 300, 661, 8062, 1901, 51538], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1194, "seek": 366386, "start": 3687.34, "end": 3689.1800000000003, "text": " in a much larger region.", "tokens": [51538, 294, 257, 709, 4833, 4458, 13, 51630], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1195, "seek": 366386, "start": 3689.1800000000003, "end": 3693.7400000000002, "text": " And so there's this notion of query specificity", "tokens": [51630, 400, 370, 456, 311, 341, 10710, 295, 14581, 2685, 507, 51858], "temperature": 0.0, "avg_logprob": -0.16494624382626694, "compression_ratio": 1.8185654008438819, "no_speech_prob": 0.001431996002793312},
{"id": 1196, "seek": 369374, "start": 3693.7799999999997, "end": 3695.58, "text": " which is also really useful.", "tokens": [50366, 597, 307, 611, 534, 4420, 13, 50456], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1197, "seek": 369374, "start": 3695.58, "end": 3698.8199999999997, "text": " So I would actually argue that the better way to do this technique", "tokens": [50456, 407, 286, 576, 767, 9695, 300, 264, 1101, 636, 281, 360, 341, 6532, 50618], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1198, "seek": 369374, "start": 3698.8199999999997, "end": 3701.54, "text": " is as part of your initial query", "tokens": [50618, 307, 382, 644, 295, 428, 5883, 14581, 50754], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1199, "seek": 369374, "start": 3701.54, "end": 3704.22, "text": " when you're sort of finding the set of documents.", "tokens": [50754, 562, 291, 434, 1333, 295, 5006, 264, 992, 295, 8512, 13, 50888], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1200, "seek": 369374, "start": 3704.22, "end": 3706.74, "text": " If you can look, for example, at the embeddings", "tokens": [50888, 759, 291, 393, 574, 11, 337, 1365, 11, 412, 264, 12240, 29432, 51014], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1201, "seek": 369374, "start": 3706.74, "end": 3708.74, "text": " and do just like a co-sign similarity", "tokens": [51014, 293, 360, 445, 411, 257, 598, 12, 82, 788, 32194, 51114], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1202, "seek": 369374, "start": 3708.74, "end": 3710.7, "text": " across the embeddings that you're pulling,", "tokens": [51114, 2108, 264, 12240, 29432, 300, 291, 434, 8407, 11, 51212], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1203, "seek": 369374, "start": 3710.7, "end": 3715.8599999999997, "text": " you can go from a bunch of embeddings", "tokens": [51212, 291, 393, 352, 490, 257, 3840, 295, 12240, 29432, 51470], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1204, "seek": 369374, "start": 3715.8599999999997, "end": 3717.62, "text": " that are just pulled together into a point", "tokens": [51470, 300, 366, 445, 7373, 1214, 666, 257, 935, 51558], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1205, "seek": 369374, "start": 3717.62, "end": 3720.4599999999996, "text": " to actually saying, what is the relative size", "tokens": [51558, 281, 767, 1566, 11, 437, 307, 264, 4972, 2744, 51700], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1206, "seek": 369374, "start": 3720.4599999999996, "end": 3722.66, "text": " of the range of the co-signs", "tokens": [51700, 295, 264, 3613, 295, 264, 598, 12, 82, 42636, 51810], "temperature": 0.0, "avg_logprob": -0.14200683653823973, "compression_ratio": 1.7671755725190839, "no_speech_prob": 9.53077687881887e-05},
{"id": 1207, "seek": 372266, "start": 3722.66, "end": 3724.5, "text": " within these embeddings?", "tokens": [50364, 1951, 613, 12240, 29432, 30, 50456], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1208, "seek": 372266, "start": 3724.5, "end": 3726.3799999999997, "text": " And if it's a very large range,", "tokens": [50456, 400, 498, 309, 311, 257, 588, 2416, 3613, 11, 50550], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1209, "seek": 372266, "start": 3726.3799999999997, "end": 3729.02, "text": " I understand that this is not a very specific query,", "tokens": [50550, 286, 1223, 300, 341, 307, 406, 257, 588, 2685, 14581, 11, 50682], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1210, "seek": 372266, "start": 3729.02, "end": 3730.22, "text": " it's a broad query.", "tokens": [50682, 309, 311, 257, 4152, 14581, 13, 50742], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1211, "seek": 372266, "start": 3730.22, "end": 3734.62, "text": " Therefore, when I go query in the dense space,", "tokens": [50742, 7504, 11, 562, 286, 352, 14581, 294, 264, 18011, 1901, 11, 50962], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1212, "seek": 372266, "start": 3734.62, "end": 3737.66, "text": " I need to draw a larger radius", "tokens": [50962, 286, 643, 281, 2642, 257, 4833, 15845, 51114], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1213, "seek": 372266, "start": 3737.66, "end": 3741.1, "text": " or a larger kind of shape around what I'm searching for.", "tokens": [51114, 420, 257, 4833, 733, 295, 3909, 926, 437, 286, 478, 10808, 337, 13, 51286], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1214, "seek": 372266, "start": 3741.1, "end": 3743.46, "text": " So ideally, you're actually searching for a shape", "tokens": [51286, 407, 22915, 11, 291, 434, 767, 10808, 337, 257, 3909, 51404], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1215, "seek": 372266, "start": 3743.46, "end": 3744.8599999999997, "text": " at not just a point,", "tokens": [51404, 412, 406, 445, 257, 935, 11, 51474], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1216, "seek": 372266, "start": 3744.8599999999997, "end": 3748.42, "text": " but literally every vector search implementation I've seen", "tokens": [51474, 457, 3736, 633, 8062, 3164, 11420, 286, 600, 1612, 51652], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1217, "seek": 372266, "start": 3748.42, "end": 3751.7, "text": " at any company is searching on embeddings as points", "tokens": [51652, 412, 604, 2237, 307, 10808, 322, 12240, 29432, 382, 2793, 51816], "temperature": 0.0, "avg_logprob": -0.10577959265590699, "compression_ratio": 1.702290076335878, "no_speech_prob": 0.00023765789228491485},
{"id": 1218, "seek": 375170, "start": 3751.7, "end": 3753.54, "text": " and just looking for the nearest things,", "tokens": [50364, 293, 445, 1237, 337, 264, 23831, 721, 11, 50456], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1219, "seek": 375170, "start": 3753.54, "end": 3754.7, "text": " not searching on shapes.", "tokens": [50456, 406, 10808, 322, 10854, 13, 50514], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1220, "seek": 375170, "start": 3754.7, "end": 3757.8999999999996, "text": " And so we don't even really have the query patterns", "tokens": [50514, 400, 370, 321, 500, 380, 754, 534, 362, 264, 14581, 8294, 50674], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1221, "seek": 375170, "start": 3757.8999999999996, "end": 3761.74, "text": " and paradigms in place today to do that kind of a query.", "tokens": [50674, 293, 13480, 328, 2592, 294, 1081, 965, 281, 360, 300, 733, 295, 257, 14581, 13, 50866], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1222, "seek": 375170, "start": 3761.74, "end": 3763.62, "text": " But I think that would be a further improvement", "tokens": [50866, 583, 286, 519, 300, 576, 312, 257, 3052, 10444, 50960], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1223, "seek": 375170, "start": 3763.62, "end": 3764.8999999999996, "text": " on the paradigm here.", "tokens": [50960, 322, 264, 24709, 510, 13, 51024], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1224, "seek": 375170, "start": 3765.9399999999996, "end": 3767.74, "text": " Awesome.", "tokens": [51076, 10391, 13, 51166], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1225, "seek": 375170, "start": 3767.74, "end": 3769.66, "text": " Yeah, Tim Allison says thank you.", "tokens": [51166, 865, 11, 7172, 32638, 1619, 1309, 291, 13, 51262], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1226, "seek": 375170, "start": 3769.66, "end": 3771.14, "text": " Thanks, Tim.", "tokens": [51262, 2561, 11, 7172, 13, 51336], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1227, "seek": 375170, "start": 3771.14, "end": 3772.54, "text": " The next question is from Julian.", "tokens": [51336, 440, 958, 1168, 307, 490, 25151, 13, 51406], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1228, "seek": 375170, "start": 3772.54, "end": 3774.98, "text": " Can you recommend any papers or other material", "tokens": [51406, 1664, 291, 2748, 604, 10577, 420, 661, 2527, 51528], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1229, "seek": 375170, "start": 3774.98, "end": 3776.98, "text": " to explore the topic further?", "tokens": [51528, 281, 6839, 264, 4829, 3052, 30, 51628], "temperature": 0.0, "avg_logprob": -0.20429954194186026, "compression_ratio": 1.5807692307692307, "no_speech_prob": 0.0001122245448641479},
{"id": 1230, "seek": 377698, "start": 3777.26, "end": 3780.7400000000002, "text": " So not really.", "tokens": [50378, 407, 406, 534, 13, 50552], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1231, "seek": 377698, "start": 3780.7400000000002, "end": 3787.54, "text": " So the one whole vector thing is something I kind of came up with.", "tokens": [50552, 407, 264, 472, 1379, 8062, 551, 307, 746, 286, 733, 295, 1361, 493, 365, 13, 50892], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1232, "seek": 377698, "start": 3787.54, "end": 3790.1, "text": " I will say, well, two things.", "tokens": [50892, 286, 486, 584, 11, 731, 11, 732, 721, 13, 51020], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1233, "seek": 377698, "start": 3790.1, "end": 3792.94, "text": " One, semantic knowledge graphs.", "tokens": [51020, 1485, 11, 47982, 3601, 24877, 13, 51162], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1234, "seek": 377698, "start": 3792.94, "end": 3796.02, "text": " I actually was the lead author on the original semantic knowledge", "tokens": [51162, 286, 767, 390, 264, 1477, 3793, 322, 264, 3380, 47982, 3601, 51316], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1235, "seek": 377698, "start": 3796.02, "end": 3800.18, "text": " graph paper back in like, I don't know, 2016 or whatever was published.", "tokens": [51316, 4295, 3035, 646, 294, 411, 11, 286, 500, 380, 458, 11, 6549, 420, 2035, 390, 6572, 13, 51524], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1236, "seek": 377698, "start": 3800.18, "end": 3806.62, "text": " So this notion of being able to jump between spaces back", "tokens": [51524, 407, 341, 10710, 295, 885, 1075, 281, 3012, 1296, 7673, 646, 51846], "temperature": 0.0, "avg_logprob": -0.2884144943751646, "compression_ratio": 1.5648148148148149, "no_speech_prob": 0.00744219496846199},
{"id": 1237, "seek": 380662, "start": 3806.62, "end": 3809.98, "text": " to a sparse space, you could look at that paper", "tokens": [50364, 281, 257, 637, 11668, 1901, 11, 291, 727, 574, 412, 300, 3035, 50532], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1238, "seek": 380662, "start": 3809.98, "end": 3811.54, "text": " if you want an actual research paper.", "tokens": [50532, 498, 291, 528, 364, 3539, 2132, 3035, 13, 50610], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1239, "seek": 380662, "start": 3811.54, "end": 3813.06, "text": " I've also given lots of talks about it.", "tokens": [50610, 286, 600, 611, 2212, 3195, 295, 6686, 466, 309, 13, 50686], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1240, "seek": 380662, "start": 3813.06, "end": 3814.3399999999997, "text": " It's in the AI powered search book.", "tokens": [50686, 467, 311, 294, 264, 7318, 17786, 3164, 1446, 13, 50750], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1241, "seek": 380662, "start": 3814.3399999999997, "end": 3816.14, "text": " It'll be in the course.", "tokens": [50750, 467, 603, 312, 294, 264, 1164, 13, 50840], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1242, "seek": 380662, "start": 3816.14, "end": 3820.46, "text": " However, the notion of taking, running a query", "tokens": [50840, 2908, 11, 264, 10710, 295, 1940, 11, 2614, 257, 14581, 51056], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1243, "seek": 380662, "start": 3820.46, "end": 3825.74, "text": " and pulling vectors together and even the notion of query", "tokens": [51056, 293, 8407, 18875, 1214, 293, 754, 264, 10710, 295, 14581, 51320], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1244, "seek": 380662, "start": 3825.74, "end": 3828.9, "text": " specificity that co-science similarity thing.", "tokens": [51320, 2685, 507, 300, 598, 12, 82, 6699, 32194, 551, 13, 51478], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1245, "seek": 380662, "start": 3828.9, "end": 3833.06, "text": " If you look at Daniel Tukolang, he actually did a lightning", "tokens": [51478, 759, 291, 574, 412, 8033, 314, 2034, 401, 656, 11, 415, 767, 630, 257, 16589, 51686], "temperature": 0.0, "avg_logprob": -0.21112240971745672, "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.0006073295371606946},
{"id": 1246, "seek": 383306, "start": 3833.06, "end": 3838.06, "text": " talk with us a week or two ago on query understanding.", "tokens": [50364, 751, 365, 505, 257, 1243, 420, 732, 2057, 322, 14581, 3701, 13, 50614], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1247, "seek": 383306, "start": 3838.06, "end": 3840.82, "text": " He actually talks about this notion of a bag of documents", "tokens": [50614, 634, 767, 6686, 466, 341, 10710, 295, 257, 3411, 295, 8512, 50752], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1248, "seek": 383306, "start": 3840.82, "end": 3842.18, "text": " to represent a query.", "tokens": [50752, 281, 2906, 257, 14581, 13, 50820], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1249, "seek": 383306, "start": 3842.18, "end": 3844.66, "text": " It's functionally the exact same thing.", "tokens": [50820, 467, 311, 2445, 379, 264, 1900, 912, 551, 13, 50944], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1250, "seek": 383306, "start": 3844.66, "end": 3848.58, "text": " So if I run a query and think of the query's meaning", "tokens": [50944, 407, 498, 286, 1190, 257, 14581, 293, 519, 295, 264, 14581, 311, 3620, 51140], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1251, "seek": 383306, "start": 3848.58, "end": 3852.58, "text": " as being represented by the set of documents that match that query,", "tokens": [51140, 382, 885, 10379, 538, 264, 992, 295, 8512, 300, 2995, 300, 14581, 11, 51340], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1252, "seek": 383306, "start": 3852.58, "end": 3855.94, "text": " then to take that set of documents that holds that meaning", "tokens": [51340, 550, 281, 747, 300, 992, 295, 8512, 300, 9190, 300, 3620, 51508], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1253, "seek": 383306, "start": 3855.94, "end": 3859.86, "text": " and pull the embeddings to create an average embedding", "tokens": [51508, 293, 2235, 264, 12240, 29432, 281, 1884, 364, 4274, 12240, 3584, 51704], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1254, "seek": 383306, "start": 3859.86, "end": 3862.7, "text": " that represents that meaning and embedding space,", "tokens": [51704, 300, 8855, 300, 3620, 293, 12240, 3584, 1901, 11, 51846], "temperature": 0.0, "avg_logprob": -0.11625098345572488, "compression_ratio": 1.9367088607594938, "no_speech_prob": 0.00895062554627657},
{"id": 1255, "seek": 386270, "start": 3862.7, "end": 3865.18, "text": " it's functionally the same thing that Daniel describes", "tokens": [50364, 309, 311, 2445, 379, 264, 912, 551, 300, 8033, 15626, 50488], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1256, "seek": 386270, "start": 3865.18, "end": 3867.4199999999996, "text": " when he talks about bags of documents.", "tokens": [50488, 562, 415, 6686, 466, 10405, 295, 8512, 13, 50600], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1257, "seek": 386270, "start": 3867.4199999999996, "end": 3869.8199999999997, "text": " So I would say look at Daniel's work,", "tokens": [50600, 407, 286, 576, 584, 574, 412, 8033, 311, 589, 11, 50720], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1258, "seek": 386270, "start": 3869.8199999999997, "end": 3874.1, "text": " look at the lightning talk he gave a week or two ago with us.", "tokens": [50720, 574, 412, 264, 16589, 751, 415, 2729, 257, 1243, 420, 732, 2057, 365, 505, 13, 50934], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1259, "seek": 386270, "start": 3874.1, "end": 3876.58, "text": " And those are some good resources.", "tokens": [50934, 400, 729, 366, 512, 665, 3593, 13, 51058], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1260, "seek": 386270, "start": 3876.58, "end": 3879.58, "text": " And of course, the book and the course.", "tokens": [51058, 400, 295, 1164, 11, 264, 1446, 293, 264, 1164, 13, 51208], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1261, "seek": 386270, "start": 3879.58, "end": 3880.74, "text": " Yeah, awesome.", "tokens": [51208, 865, 11, 3476, 13, 51266], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1262, "seek": 386270, "start": 3880.74, "end": 3883.54, "text": " Maybe at some point of paper as well, right?", "tokens": [51266, 2704, 412, 512, 935, 295, 3035, 382, 731, 11, 558, 30, 51406], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1263, "seek": 386270, "start": 3883.54, "end": 3885.1, "text": " Yeah, it's definitely possible.", "tokens": [51406, 865, 11, 309, 311, 2138, 1944, 13, 51484], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1264, "seek": 386270, "start": 3885.1, "end": 3888.1, "text": " I need lots of good.", "tokens": [51484, 286, 643, 3195, 295, 665, 13, 51634], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio": 1.6217228464419475, "no_speech_prob": 0.0002090014168061316},
{"id": 1265, "seek": 386270, "start": 3888.1, "end": 3891.8599999999997, "text": " I need evals on how this actually does in practice.", "tokens": [51634, 286, 643, 1073, 1124, 322, 577, 341, 767, 775, 294, 3124, 13, 51822], "temperature": 0.0, "avg_logprob": -0.20744129919236706, "compression_ratio":
1.6217228464419475, "no_speech_prob": 0.0002090014168061316}, + {"id": 1266, "seek": 389186, "start": 3891.86, "end": 3893.06, "text": " Yeah, absolutely.", + "tokens": [50364, 865, 11, 3122, 13, 50424], "temperature": 0.0, "avg_logprob": + -0.2862801752170595, "compression_ratio": 1.6326530612244898, "no_speech_prob": + 0.0007126876735128462}, {"id": 1267, "seek": 389186, "start": 3893.06, "end": 3894.26, + "text": " Are you not the same question?", "tokens": [50424, 2014, 291, 406, 264, + 912, 1168, 30, 50484], "temperature": 0.0, "avg_logprob": -0.2862801752170595, "compression_ratio": + 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, {"id": 1268, "seek": + 389186, "start": 3894.26, "end": 3895.34, "text": " I''ll skip that.", "tokens": + [50484, 286, 603, 10023, 300, 13, 50538], "temperature": 0.0, "avg_logprob": -0.2862801752170595, + "compression_ratio": 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, + {"id": 1269, "seek": 389186, "start": 3895.34, "end": 3898.9, "text": " Mustafa + is asking for a knife phone query cases.", "tokens": [50538, 37229, 307, 3365, 337, + 257, 7976, 2593, 14581, 3331, 13, 50716], "temperature": 0.0, "avg_logprob": -0.2862801752170595, + "compression_ratio": 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, + {"id": 1270, "seek": 389186, "start": 3898.9, "end": 3901.54, "text": " So charges + may also appear.", "tokens": [50716, 407, 12235, 815, 611, 4204, 13, 50848], "temperature": + 0.0, "avg_logprob": -0.2862801752170595, "compression_ratio": 1.6326530612244898, + "no_speech_prob": 0.0007126876735128462}, {"id": 1271, "seek": 389186, "start": + 3901.54, "end": 3903.6200000000003, "text": " Would be correct to take the average.", + "tokens": [50848, 6068, 312, 3006, 281, 747, 264, 4274, 13, 50952], "temperature": + 0.0, "avg_logprob": -0.2862801752170595, "compression_ratio": 1.6326530612244898, + "no_speech_prob": 0.0007126876735128462}, {"id": 1272, "seek": 389186, "start": + 
3903.6200000000003, "end": 3906.7000000000003, "text": " Would it be correct to + take the average of them?", "tokens": [50952, 6068, 309, 312, 3006, 281, 747, 264, + 4274, 295, 552, 30, 51106], "temperature": 0.0, "avg_logprob": -0.2862801752170595, + "compression_ratio": 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, + {"id": 1273, "seek": 389186, "start": 3906.7000000000003, "end": 3907.38, "text": + " So good question.", "tokens": [51106, 407, 665, 1168, 13, 51140], "temperature": + 0.0, "avg_logprob": -0.2862801752170595, "compression_ratio": 1.6326530612244898, + "no_speech_prob": 0.0007126876735128462}, {"id": 1274, "seek": 389186, "start": + 3907.38, "end": 3910.58, "text": " So yeah, if you, so,", "tokens": [51140, 407, + 1338, 11, 498, 291, 11, 370, 11, 51300], "temperature": 0.0, "avg_logprob": -0.2862801752170595, + "compression_ratio": 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, + {"id": 1275, "seek": 389186, "start": 3911.7000000000003, "end": 3914.3, "text": + " Lexical queries work really well.", "tokens": [51356, 24086, 804, 24109, 589, + 534, 731, 13, 51486], "temperature": 0.0, "avg_logprob": -0.2862801752170595, "compression_ratio": + 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, {"id": 1276, "seek": + 389186, "start": 3914.3, "end": 3917.1800000000003, "text": " When you''ve got particular + terms you''re looking for,", "tokens": [51486, 1133, 291, 600, 658, 1729, 2115, + 291, 434, 1237, 337, 11, 51630], "temperature": 0.0, "avg_logprob": -0.2862801752170595, + "compression_ratio": 1.6326530612244898, "no_speech_prob": 0.0007126876735128462}, + {"id": 1277, "seek": 389186, "start": 3917.1800000000003, "end": 3920.1, "text": + " whether it''s an ID or whether it''s a keyword,", "tokens": [51630, 1968, 309, + 311, 364, 7348, 420, 1968, 309, 311, 257, 20428, 11, 51776], "temperature": 0.0, + "avg_logprob": -0.2862801752170595, "compression_ratio": 1.6326530612244898, "no_speech_prob": + 
0.0007126876735128462}, {"id": 1278, "seek": 392010, "start": 3920.1, "end": 3922.62, + "text": " they don''t work as well with like semantic meaning.", "tokens": [50364, + 436, 500, 380, 589, 382, 731, 365, 411, 47982, 3620, 13, 50490], "temperature": + 0.0, "avg_logprob": -0.15670201315808652, "compression_ratio": 1.6769759450171822, + "no_speech_prob": 0.0007207689923234284}, {"id": 1279, "seek": 392010, "start": + 3922.62, "end": 3925.7, "text": " Whereas in a dent space, obviously you query on + meaning,", "tokens": [50490, 13813, 294, 257, 7059, 1901, 11, 2745, 291, 14581, + 322, 3620, 11, 50644], "temperature": 0.0, "avg_logprob": -0.15670201315808652, + "compression_ratio": 1.6769759450171822, "no_speech_prob": 0.0007207689923234284}, + {"id": 1280, "seek": 392010, "start": 3925.7, "end": 3928.62, "text": " but if you + try to search for a product ID in a dent space,", "tokens": [50644, 457, 498, 291, + 853, 281, 3164, 337, 257, 1674, 7348, 294, 257, 7059, 1901, 11, 50790], "temperature": + 0.0, "avg_logprob": -0.15670201315808652, "compression_ratio": 1.6769759450171822, + "no_speech_prob": 0.0007207689923234284}, {"id": 1281, "seek": 392010, "start": + 3928.62, "end": 3930.7, "text": " unless you''ve fine-tuned it for that,", "tokens": + [50790, 5969, 291, 600, 2489, 12, 83, 43703, 309, 337, 300, 11, 50894], "temperature": + 0.0, "avg_logprob": -0.15670201315808652, "compression_ratio": 1.6769759450171822, + "no_speech_prob": 0.0007207689923234284}, {"id": 1282, "seek": 392010, "start": + 3930.7, "end": 3933.18, "text": " it''s gonna do an awful job.", "tokens": [50894, + 309, 311, 799, 360, 364, 11232, 1691, 13, 51018], "temperature": 0.0, "avg_logprob": + -0.15670201315808652, "compression_ratio": 1.6769759450171822, "no_speech_prob": + 0.0007207689923234284}, {"id": 1283, "seek": 392010, "start": 3933.18, "end": 3935.98, + "text": " And so, in the case of like searching for iPhone", "tokens": [51018, 400, + 370, 11, 294, 264, 1389, 295, 411, 10808, 
337, 7252, 51158], "temperature": 0.0, + "avg_logprob": -0.15670201315808652, "compression_ratio": 1.6769759450171822, "no_speech_prob": + 0.0007207689923234284}, {"id": 1284, "seek": 392010, "start": 3935.98, "end": 3938.18, + "text": " and getting iPhone cases,", "tokens": [51158, 293, 1242, 7252, 3331, 11, + 51268], "temperature": 0.0, "avg_logprob": -0.15670201315808652, "compression_ratio": + 1.6769759450171822, "no_speech_prob": 0.0007207689923234284}, {"id": 1285, "seek": + 392010, "start": 3938.18, "end": 3940.94, "text": " this somewhat gets back to what + I said earlier about,", "tokens": [51268, 341, 8344, 2170, 646, 281, 437, 286, 848, + 3071, 466, 11, 51406], "temperature": 0.0, "avg_logprob": -0.15670201315808652, + "compression_ratio": 1.6769759450171822, "no_speech_prob": 0.0007207689923234284}, + {"id": 1286, "seek": 392010, "start": 3940.94, "end": 3943.98, "text": " ideally, + if you take the top end documents", "tokens": [51406, 22915, 11, 498, 291, 747, + 264, 1192, 917, 8512, 51558], "temperature": 0.0, "avg_logprob": -0.15670201315808652, + "compression_ratio": 1.6769759450171822, "no_speech_prob": 0.0007207689923234284}, + {"id": 1287, "seek": 392010, "start": 3943.98, "end": 3946.9, "text": " that are + the most relevant and you limit to that,", "tokens": [51558, 300, 366, 264, 881, + 7340, 293, 291, 4948, 281, 300, 11, 51704], "temperature": 0.0, "avg_logprob": -0.15670201315808652, + "compression_ratio": 1.6769759450171822, "no_speech_prob": 0.0007207689923234284}, + {"id": 1288, "seek": 392010, "start": 3946.9, "end": 3948.46, "text": " like if + you''re ranking algorithm", "tokens": [51704, 411, 498, 291, 434, 17833, 9284, 51782], + "temperature": 0.0, "avg_logprob": -0.15670201315808652, "compression_ratio": 1.6769759450171822, + "no_speech_prob": 0.0007207689923234284}, {"id": 1289, "seek": 394846, "start": + 3948.46, "end": 3950.78, "text": " can already sort of understand that when someone + searches", "tokens": [50364, 393, 1217, 
1333, 295, 1223, 300, 562, 1580, 26701, + 50480], "temperature": 0.0, "avg_logprob": -0.2072736805882947, "compression_ratio": + 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, {"id": 1290, "seek": + 394846, "start": 3950.78, "end": 3955.78, "text": " for iPhone that they mean an + actual iPhone versus a case,", "tokens": [50480, 337, 7252, 300, 436, 914, 364, + 3539, 7252, 5717, 257, 1389, 11, 50730], "temperature": 0.0, "avg_logprob": -0.2072736805882947, + "compression_ratio": 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, + {"id": 1291, "seek": 394846, "start": 3955.78, "end": 3958.14, "text": " that''s + a better way to go", "tokens": [50730, 300, 311, 257, 1101, 636, 281, 352, 50848], + "temperature": 0.0, "avg_logprob": -0.2072736805882947, "compression_ratio": 1.8132780082987552, + "no_speech_prob": 0.0004132489557377994}, {"id": 1292, "seek": 394846, "start": + 3958.14, "end": 3960.5, "text": " versus just anything that matches the term.", + "tokens": [50848, 5717, 445, 1340, 300, 10676, 264, 1433, 13, 50966], "temperature": + 0.0, "avg_logprob": -0.2072736805882947, "compression_ratio": 1.8132780082987552, + "no_speech_prob": 0.0004132489557377994}, {"id": 1293, "seek": 394846, "start": + 3960.5, "end": 3961.7, "text": " That being said,", "tokens": [50966, 663, 885, + 848, 11, 51026], "temperature": 0.0, "avg_logprob": -0.2072736805882947, "compression_ratio": + 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, {"id": 1294, "seek": + 394846, "start": 3963.34, "end": 3967.66, "text": " what you can do is you can, + for example, in that case,", "tokens": [51108, 437, 291, 393, 360, 307, 291, 393, + 11, 337, 1365, 11, 294, 300, 1389, 11, 51324], "temperature": 0.0, "avg_logprob": + -0.2072736805882947, "compression_ratio": 1.8132780082987552, "no_speech_prob": + 0.0004132489557377994}, {"id": 1295, "seek": 394846, "start": 3967.66, "end": 3970.9, + "text": " search for iPhone, find the iPhone cases,", "tokens": 
[51324, 3164, 337, + 7252, 11, 915, 264, 7252, 3331, 11, 51486], "temperature": 0.0, "avg_logprob": -0.2072736805882947, + "compression_ratio": 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, + {"id": 1296, "seek": 394846, "start": 3970.9, "end": 3972.94, "text": " along with + iPhone, get that average vector,", "tokens": [51486, 2051, 365, 7252, 11, 483, 300, + 4274, 8062, 11, 51588], "temperature": 0.0, "avg_logprob": -0.2072736805882947, + "compression_ratio": 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, + {"id": 1297, "seek": 394846, "start": 3972.94, "end": 3974.82, "text": " and then + there''s still this region of,", "tokens": [51588, 293, 550, 456, 311, 920, 341, + 4458, 295, 11, 51682], "temperature": 0.0, "avg_logprob": -0.2072736805882947, "compression_ratio": + 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, {"id": 1298, "seek": + 394846, "start": 3974.82, "end": 3975.78, "text": " along certain dimensions,", + "tokens": [51682, 2051, 1629, 12819, 11, 51730], "temperature": 0.0, "avg_logprob": + -0.2072736805882947, "compression_ratio": 1.8132780082987552, "no_speech_prob": + 0.0004132489557377994}, {"id": 1299, "seek": 394846, "start": 3975.78, "end": 3977.34, + "text": " it''s associated with iPhone.", "tokens": [51730, 309, 311, 6615, 365, + 7252, 13, 51808], "temperature": 0.0, "avg_logprob": -0.2072736805882947, "compression_ratio": + 1.8132780082987552, "no_speech_prob": 0.0004132489557377994}, {"id": 1300, "seek": + 397734, "start": 3977.34, "end": 3979.7000000000003, "text": " If you hopped over + to the behavioral embedding space,", "tokens": [50364, 759, 291, 3818, 3452, 670, + 281, 264, 19124, 12240, 3584, 1901, 11, 50482], "temperature": 0.0, "avg_logprob": + -0.14208681528804867, "compression_ratio": 1.7731958762886597, "no_speech_prob": + 8.853290637489408e-05}, {"id": 1301, "seek": 397734, "start": 3979.7000000000003, + "end": 3981.06, "text": " what you''re gonna find is that,", "tokens": 
[50482, 437, + 291, 434, 799, 915, 307, 300, 11, 50550], "temperature": 0.0, "avg_logprob": -0.14208681528804867, + "compression_ratio": 1.7731958762886597, "no_speech_prob": 8.853290637489408e-05}, + {"id": 1302, "seek": 397734, "start": 3981.06, "end": 3985.7400000000002, "text": + " hey, these cases are very highly correlated to these items,", "tokens": [50550, + 4177, 11, 613, 3331, 366, 588, 5405, 38574, 281, 613, 4754, 11, 50784], "temperature": + 0.0, "avg_logprob": -0.14208681528804867, "compression_ratio": 1.7731958762886597, + "no_speech_prob": 8.853290637489408e-05}, {"id": 1303, "seek": 397734, "start": + 3985.7400000000002, "end": 3988.42, "text": " the iPhones that actually correspond + to those cases,", "tokens": [50784, 264, 43793, 300, 767, 6805, 281, 729, 3331, + 11, 50918], "temperature": 0.0, "avg_logprob": -0.14208681528804867, "compression_ratio": + 1.7731958762886597, "no_speech_prob": 8.853290637489408e-05}, {"id": 1304, "seek": + 397734, "start": 3988.42, "end": 3989.98, "text": " so that might be a case where + you would want to hop", "tokens": [50918, 370, 300, 1062, 312, 257, 1389, 689, 291, + 576, 528, 281, 3818, 50996], "temperature": 0.0, "avg_logprob": -0.14208681528804867, + "compression_ratio": 1.7731958762886597, "no_speech_prob": 8.853290637489408e-05}, + {"id": 1305, "seek": 397734, "start": 3989.98, "end": 3992.94, "text": " to the + behavioral space and leverage what''s there.", "tokens": [50996, 281, 264, 19124, + 1901, 293, 13982, 437, 311, 456, 13, 51144], "temperature": 0.0, "avg_logprob": + -0.14208681528804867, "compression_ratio": 1.7731958762886597, "no_speech_prob": + 8.853290637489408e-05}, {"id": 1306, "seek": 397734, "start": 3992.94, "end": 3994.7000000000003, + "text": " There''s also just a note,", "tokens": [51144, 821, 311, 611, 445, 257, + 3637, 11, 51232], "temperature": 0.0, "avg_logprob": -0.14208681528804867, "compression_ratio": + 1.7731958762886597, "no_speech_prob": 8.853290637489408e-05}, {"id": 1307, 
"seek": + 397734, "start": 3994.7000000000003, "end": 3996.78, "text": " we''ve talked about + taking entire queries", "tokens": [51232, 321, 600, 2825, 466, 1940, 2302, 24109, + 51336], "temperature": 0.0, "avg_logprob": -0.14208681528804867, "compression_ratio": + 1.7731958762886597, "no_speech_prob": 8.853290637489408e-05}, {"id": 1308, "seek": + 397734, "start": 3996.78, "end": 3999.02, "text": " and hopping from between spaces,", + "tokens": [51336, 293, 47199, 490, 1296, 7673, 11, 51448], "temperature": 0.0, "avg_logprob": + -0.14208681528804867, "compression_ratio": 1.7731958762886597, "no_speech_prob": + 8.853290637489408e-05}, {"id": 1309, "seek": 397734, "start": 3999.02, "end": 4002.9, + "text": " but there''s also a line of thinking and practice here", "tokens": [51448, + 457, 456, 311, 611, 257, 1622, 295, 1953, 293, 3124, 510, 51642], "temperature": + 0.0, "avg_logprob": -0.14208681528804867, "compression_ratio": 1.7731958762886597, + "no_speech_prob": 8.853290637489408e-05}, {"id": 1310, "seek": 397734, "start": + 4002.9, "end": 4005.82, "text": " around using this for query understanding,", "tokens": + [51642, 926, 1228, 341, 337, 14581, 3701, 11, 51788], "temperature": 0.0, "avg_logprob": + -0.14208681528804867, "compression_ratio": 1.7731958762886597, "no_speech_prob": + 8.853290637489408e-05}, {"id": 1311, "seek": 397734, "start": 4005.82, "end": 4007.1000000000004, + "text": " not just ranking,", "tokens": [51788, 406, 445, 17833, 11, 51852], "temperature": + 0.0, "avg_logprob": -0.14208681528804867, "compression_ratio": 1.7731958762886597, + "no_speech_prob": 8.853290637489408e-05}, {"id": 1312, "seek": 400710, "start": + 4007.1, "end": 4008.98, "text": " and so you could, for example,", "tokens": [50364, + 293, 370, 291, 727, 11, 337, 1365, 11, 50458], "temperature": 0.0, "avg_logprob": + -0.17946782243361167, "compression_ratio": 1.7349397590361446, "no_speech_prob": + 1.633443207538221e-05}, {"id": 1313, "seek": 400710, "start": 4008.98, 
"end": 4011.86, + "text": " split the query into individual keywords.", "tokens": [50458, 7472, 264, + 14581, 666, 2609, 21009, 13, 50602], "temperature": 0.0, "avg_logprob": -0.17946782243361167, + "compression_ratio": 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, + {"id": 1314, "seek": 400710, "start": 4011.86, "end": 4015.22, "text": " iPhone, + like just the word iPhone,", "tokens": [50602, 7252, 11, 411, 445, 264, 1349, 7252, + 11, 50770], "temperature": 0.0, "avg_logprob": -0.17946782243361167, "compression_ratio": + 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, {"id": 1315, "seek": + 400710, "start": 4015.22, "end": 4017.06, "text": " and you could also search on + the dense space,", "tokens": [50770, 293, 291, 727, 611, 3164, 322, 264, 18011, + 1901, 11, 50862], "temperature": 0.0, "avg_logprob": -0.17946782243361167, "compression_ratio": + 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, {"id": 1316, "seek": + 400710, "start": 4017.06, "end": 4019.3399999999997, "text": " and you could try + to take the individual pieces", "tokens": [50862, 293, 291, 727, 853, 281, 747, + 264, 2609, 3755, 50976], "temperature": 0.0, "avg_logprob": -0.17946782243361167, + "compression_ratio": 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, + {"id": 1317, "seek": 400710, "start": 4019.3399999999997, "end": 4020.94, "text": + " and find things related to them,", "tokens": [50976, 293, 915, 721, 4077, 281, + 552, 11, 51056], "temperature": 0.0, "avg_logprob": -0.17946782243361167, "compression_ratio": + 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, {"id": 1318, "seek": + 400710, "start": 4020.94, "end": 4023.1, "text": " and then leverage that for query + understanding", "tokens": [51056, 293, 550, 13982, 300, 337, 14581, 3701, 51164], + "temperature": 0.0, "avg_logprob": -0.17946782243361167, "compression_ratio": 1.7349397590361446, + "no_speech_prob": 1.633443207538221e-05}, {"id": 1319, "seek": 
400710, "start": + 4023.1, "end": 4024.54, "text": " to hop back and forth between spaces.", "tokens": + [51164, 281, 3818, 646, 293, 5220, 1296, 7673, 13, 51236], "temperature": 0.0, "avg_logprob": + -0.17946782243361167, "compression_ratio": 1.7349397590361446, "no_speech_prob": + 1.633443207538221e-05}, {"id": 1320, "seek": 400710, "start": 4024.54, "end": 4028.14, + "text": " So the answer is you still have", "tokens": [51236, 407, 264, 1867, 307, + 291, 920, 362, 51416], "temperature": 0.0, "avg_logprob": -0.17946782243361167, + "compression_ratio": 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, + {"id": 1321, "seek": 400710, "start": 4028.14, "end": 4031.46, "text": " the fundamental + limitations of each space,", "tokens": [51416, 264, 8088, 15705, 295, 1184, 1901, + 11, 51582], "temperature": 0.0, "avg_logprob": -0.17946782243361167, "compression_ratio": + 1.7349397590361446, "no_speech_prob": 1.633443207538221e-05}, {"id": 1322, "seek": + 400710, "start": 4031.46, "end": 4034.46, "text": " but imagine if somebody searched + for,", "tokens": [51582, 457, 3811, 498, 2618, 22961, 337, 11, 51732], "temperature": + 0.0, "avg_logprob": -0.17946782243361167, "compression_ratio": 1.7349397590361446, + "no_speech_prob": 1.633443207538221e-05}, {"id": 1323, "seek": 403446, "start": + 4034.46, "end": 4039.26, "text": " I want a phone that''s really good at blah, blah, + blah,", "tokens": [50364, 286, 528, 257, 2593, 300, 311, 534, 665, 412, 12288, 11, + 12288, 11, 12288, 11, 50604], "temperature": 0.0, "avg_logprob": -0.20818191919571313, + "compression_ratio": 1.7264957264957266, "no_speech_prob": 0.00017591615323908627}, + {"id": 1324, "seek": 403446, "start": 4039.26, "end": 4044.26, "text": " that''s + made by Apple with product ID X, right?", "tokens": [50604, 300, 311, 1027, 538, + 6373, 365, 1674, 7348, 1783, 11, 558, 30, 50854], "temperature": 0.0, "avg_logprob": + -0.20818191919571313, "compression_ratio": 1.7264957264957266, "no_speech_prob": 
+ 0.00017591615323908627}, {"id": 1325, "seek": 403446, "start": 4044.98, "end": 4046.86, + "text": " You can imagine trying to search for that,", "tokens": [50890, 509, 393, + 3811, 1382, 281, 3164, 337, 300, 11, 50984], "temperature": 0.0, "avg_logprob": + -0.20818191919571313, "compression_ratio": 1.7264957264957266, "no_speech_prob": + 0.00017591615323908627}, {"id": 1326, "seek": 403446, "start": 4046.86, "end": 4048.7, + "text": " like an oratory on the,", "tokens": [50984, 411, 364, 420, 4745, 322, + 264, 11, 51076], "temperature": 0.0, "avg_logprob": -0.20818191919571313, "compression_ratio": + 1.7264957264957266, "no_speech_prob": 0.00017591615323908627}, {"id": 1327, "seek": + 403446, "start": 4048.7, "end": 4051.7400000000002, "text": " on the lexical side,", + "tokens": [51076, 322, 264, 476, 87, 804, 1252, 11, 51228], "temperature": 0.0, + "avg_logprob": -0.20818191919571313, "compression_ratio": 1.7264957264957266, "no_speech_prob": + 0.00017591615323908627}, {"id": 1328, "seek": 403446, "start": 4051.7400000000002, + "end": 4053.06, "text": " and you''ll actually match that ID", "tokens": [51228, + 293, 291, 603, 767, 2995, 300, 7348, 51294], "temperature": 0.0, "avg_logprob": + -0.20818191919571313, "compression_ratio": 1.7264957264957266, "no_speech_prob": + 0.00017591615323908627}, {"id": 1329, "seek": 403446, "start": 4053.06, "end": 4055.14, + "text": " and probably have it come up at the very top,", "tokens": [51294, 293, + 1391, 362, 309, 808, 493, 412, 264, 588, 1192, 11, 51398], "temperature": 0.0, "avg_logprob": + -0.20818191919571313, "compression_ratio": 1.7264957264957266, "no_speech_prob": + 0.00017591615323908627}, {"id": 1330, "seek": 403446, "start": 4055.14, "end": 4058.2200000000003, + "text": " and then you can imagine searching for that embedding", "tokens": [51398, + 293, 550, 291, 393, 3811, 10808, 337, 300, 12240, 3584, 51552], "temperature": 0.0, + "avg_logprob": -0.20818191919571313, "compression_ratio": 1.7264957264957266, 
"no_speech_prob": + 0.00017591615323908627}, {"id": 1331, "seek": 403446, "start": 4058.2200000000003, + "end": 4061.3, "text": " on the dense space,", "tokens": [51552, 322, 264, 18011, + 1901, 11, 51706], "temperature": 0.0, "avg_logprob": -0.20818191919571313, "compression_ratio": + 1.7264957264957266, "no_speech_prob": 0.00017591615323908627}, {"id": 1332, "seek": + 403446, "start": 4061.3, "end": 4064.18, "text": " and you can imagine for each + of those hopping back and forth", "tokens": [51706, 293, 291, 393, 3811, 337, 1184, + 295, 729, 47199, 646, 293, 5220, 51850], "temperature": 0.0, "avg_logprob": -0.20818191919571313, + "compression_ratio": 1.7264957264957266, "no_speech_prob": 0.00017591615323908627}, + {"id": 1333, "seek": 406418, "start": 4064.22, "end": 4066.14, "text": " and trying + to see what documents are there a couple of times.", "tokens": [50366, 293, 1382, + 281, 536, 437, 8512, 366, 456, 257, 1916, 295, 1413, 13, 50462], "temperature": + 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1334, "seek": 406418, "start": + 4066.14, "end": 4067.8599999999997, "text": " So there''s, look,", "tokens": [50462, + 407, 456, 311, 11, 574, 11, 50548], "temperature": 0.0, "avg_logprob": -0.15023418596595717, + "compression_ratio": 1.796116504854369, "no_speech_prob": 0.00033023249125108123}, + {"id": 1335, "seek": 406418, "start": 4068.98, "end": 4069.8199999999997, "text": + " sure to answer,", "tokens": [50604, 988, 281, 1867, 11, 50646], "temperature": + 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1336, "seek": 406418, "start": + 4069.8199999999997, "end": 4071.22, "text": " there''s a lot of different ways", + "tokens": [50646, 456, 311, 257, 688, 295, 819, 2098, 50716], "temperature": 0.0, + "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, 
"no_speech_prob": + 0.00033023249125108123}, {"id": 1337, "seek": 406418, "start": 4071.22, "end": 4072.7, + "text": " that you could leverage this technique", "tokens": [50716, 300, 291, 727, + 13982, 341, 6532, 50790], "temperature": 0.0, "avg_logprob": -0.15023418596595717, + "compression_ratio": 1.796116504854369, "no_speech_prob": 0.00033023249125108123}, + {"id": 1338, "seek": 406418, "start": 4072.7, "end": 4074.7799999999997, "text": + " to be hopping back and forth.", "tokens": [50790, 281, 312, 47199, 646, 293, 5220, + 13, 50894], "temperature": 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": + 1.796116504854369, "no_speech_prob": 0.00033023249125108123}, {"id": 1339, "seek": + 406418, "start": 4074.7799999999997, "end": 4076.7, "text": " I''m not gonna claim + now that I''ve thought through", "tokens": [50894, 286, 478, 406, 799, 3932, 586, + 300, 286, 600, 1194, 807, 50990], "temperature": 0.0, "avg_logprob": -0.15023418596595717, + "compression_ratio": 1.796116504854369, "no_speech_prob": 0.00033023249125108123}, + {"id": 1340, "seek": 406418, "start": 4076.7, "end": 4077.58, "text": " every single + one of them,", "tokens": [50990, 633, 2167, 472, 295, 552, 11, 51034], "temperature": + 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1341, "seek": 406418, "start": + 4077.58, "end": 4079.1, "text": " and there''s lots of ways to do it,", "tokens": + [51034, 293, 456, 311, 3195, 295, 2098, 281, 360, 309, 11, 51110], "temperature": + 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1342, "seek": 406418, "start": + 4079.1, "end": 4082.2999999999997, "text": " but I think as an introduction to the + topic,", "tokens": [51110, 457, 286, 519, 382, 364, 9339, 281, 264, 4829, 11, 51270], + "temperature": 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 
1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1343, "seek": 406418, "start": + 4082.2999999999997, "end": 4085.3799999999997, "text": " and as a tool that you + can add to your tool belt,", "tokens": [51270, 293, 382, 257, 2290, 300, 291, 393, + 909, 281, 428, 2290, 10750, 11, 51424], "temperature": 0.0, "avg_logprob": -0.15023418596595717, + "compression_ratio": 1.796116504854369, "no_speech_prob": 0.00033023249125108123}, + {"id": 1344, "seek": 406418, "start": 4085.3799999999997, "end": 4088.54, "text": + " to be able to get explainability", "tokens": [51424, 281, 312, 1075, 281, 483, + 2903, 2310, 51582], "temperature": 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": + 1.796116504854369, "no_speech_prob": 0.00033023249125108123}, {"id": 1345, "seek": + 406418, "start": 4088.54, "end": 4089.62, "text": " and another vector space,", + "tokens": [51582, 293, 1071, 8062, 1901, 11, 51636], "temperature": 0.0, "avg_logprob": + -0.15023418596595717, "compression_ratio": 1.796116504854369, "no_speech_prob": + 0.00033023249125108123}, {"id": 1346, "seek": 406418, "start": 4089.62, "end": 4091.4199999999996, + "text": " based upon what you found in the first vector space,", "tokens": [51636, + 2361, 3564, 437, 291, 1352, 294, 264, 700, 8062, 1901, 11, 51726], "temperature": + 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1347, "seek": 406418, "start": + 4091.4199999999996, "end": 4094.02, "text": " I think this is a really cool technique,", + "tokens": [51726, 286, 519, 341, 307, 257, 534, 1627, 6532, 11, 51856], "temperature": + 0.0, "avg_logprob": -0.15023418596595717, "compression_ratio": 1.796116504854369, + "no_speech_prob": 0.00033023249125108123}, {"id": 1348, "seek": 409402, "start": + 4094.02, "end": 4096.78, "text": " and I just wanted to kind of present it and get + feedback", "tokens": [50364, 293, 286, 445, 1415, 281, 733, 295, 
1974, 309, 293, + 483, 5824, 50502], "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": + 1.6866666666666668, "no_speech_prob": 0.0008274210849776864}, {"id": 1349, "seek": + 409402, "start": 4096.78, "end": 4098.94, "text": " and enjoy this discussion.", + "tokens": [50502, 293, 2103, 341, 5017, 13, 50610], "temperature": 0.0, "avg_logprob": + -0.16125579044736665, "compression_ratio": 1.6866666666666668, "no_speech_prob": + 0.0008274210849776864}, {"id": 1350, "seek": 409402, "start": 4098.94, "end": 4099.94, + "text": " Yeah, thanks, Trey.", "tokens": [50610, 865, 11, 3231, 11, 314, 7950, + 13, 50660], "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": + 1.6866666666666668, "no_speech_prob": 0.0008274210849776864}, {"id": 1351, "seek": + 409402, "start": 4099.94, "end": 4100.94, "text": " We''re quite over time,", "tokens": + [50660, 492, 434, 1596, 670, 565, 11, 50710], "temperature": 0.0, "avg_logprob": + -0.16125579044736665, "compression_ratio": 1.6866666666666668, "no_speech_prob": + 0.0008274210849776864}, {"id": 1352, "seek": 409402, "start": 4100.94, "end": 4102.7, + "text": " thanks everyone for staying on,", "tokens": [50710, 3231, 1518, 337, 7939, + 322, 11, 50798], "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": + 1.6866666666666668, "no_speech_prob": 0.0008274210849776864}, {"id": 1353, "seek": + 409402, "start": 4102.7, "end": 4104.54, "text": " and hopefully if you can still + stay on,", "tokens": [50798, 293, 4696, 498, 291, 393, 920, 1754, 322, 11, 50890], + "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": 1.6866666666666668, + "no_speech_prob": 0.0008274210849776864}, {"id": 1354, "seek": 409402, "start": + 4104.54, "end": 4106.62, "text": " we can get to the bottom of the list.", "tokens": + [50890, 321, 393, 483, 281, 264, 2767, 295, 264, 1329, 13, 50994], "temperature": + 0.0, "avg_logprob": -0.16125579044736665, 
"compression_ratio": 1.6866666666666668, + "no_speech_prob": 0.0008274210849776864}, {"id": 1355, "seek": 409402, "start": + 4108.1, "end": 4110.14, "text": " Yeah, and by the way, to your answer, Trey,", + "tokens": [51068, 865, 11, 293, 538, 264, 636, 11, 281, 428, 1867, 11, 314, 7950, + 11, 51170], "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": + 1.6866666666666668, "no_speech_prob": 0.0008274210849776864}, {"id": 1356, "seek": + 409402, "start": 4110.14, "end": 4112.14, "text": " I think somewhere there is probably + a notion", "tokens": [51170, 286, 519, 4079, 456, 307, 1391, 257, 10710, 51270], + "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": 1.6866666666666668, + "no_speech_prob": 0.0008274210849776864}, {"id": 1357, "seek": 409402, "start": + 4112.14, "end": 4114.62, "text": " of search result diversity as well, right?", + "tokens": [51270, 295, 3164, 1874, 8811, 382, 731, 11, 558, 30, 51394], "temperature": + 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": 1.6866666666666668, + "no_speech_prob": 0.0008274210849776864}, {"id": 1358, "seek": 409402, "start": + 4114.62, "end": 4116.7, "text": " So even if the user types iPhone,", "tokens": + [51394, 407, 754, 498, 264, 4195, 3467, 7252, 11, 51498], "temperature": 0.0, "avg_logprob": + -0.16125579044736665, "compression_ratio": 1.6866666666666668, "no_speech_prob": + 0.0008274210849776864}, {"id": 1359, "seek": 409402, "start": 4116.7, "end": 4117.9, + "text": " they only mean the phone,", "tokens": [51498, 436, 787, 914, 264, 2593, + 11, 51558], "temperature": 0.0, "avg_logprob": -0.16125579044736665, "compression_ratio": + 1.6866666666666668, "no_speech_prob": 0.0008274210849776864}, {"id": 1360, "seek": + 409402, "start": 4117.9, "end": 4119.94, "text": " but they actually may mean something + else, right?", "tokens": [51558, 457, 436, 767, 815, 914, 746, 1646, 11, 558, 30, + 51660], "temperature": 0.0, "avg_logprob": 
-0.16125579044736665, "compression_ratio": + 1.6866666666666668, "no_speech_prob": 0.0008274210849776864}, {"id": 1361, "seek": + 409402, "start": 4119.94, "end": 4121.86, "text": " So showing diverse results,", + "tokens": [51660, 407, 4099, 9521, 3542, 11, 51756], "temperature": 0.0, "avg_logprob": + -0.16125579044736665, "compression_ratio": 1.6866666666666668, "no_speech_prob": + 0.0008274210849776864}, {"id": 1362, "seek": 412186, "start": 4121.86, "end": 4124.94, + "text": " and then traversing to the other side", "tokens": [50364, 293, 550, 23149, + 278, 281, 264, 661, 1252, 50518], "temperature": 0.0, "avg_logprob": -0.26635313034057617, + "compression_ratio": 1.6473029045643153, "no_speech_prob": 0.0009137650486081839}, + {"id": 1363, "seek": 412186, "start": 4124.94, "end": 4127.82, "text": " with those + diverse results could also make sense.", "tokens": [50518, 365, 729, 9521, 3542, + 727, 611, 652, 2020, 13, 50662], "temperature": 0.0, "avg_logprob": -0.26635313034057617, + "compression_ratio": 1.6473029045643153, "no_speech_prob": 0.0009137650486081839}, + {"id": 1364, "seek": 412186, "start": 4129.099999999999, "end": 4130.259999999999, + "text": " Absolutely.", "tokens": [50726, 7021, 13, 50784], "temperature": 0.0, + "avg_logprob": -0.26635313034057617, "compression_ratio": 1.6473029045643153, "no_speech_prob": + 0.0009137650486081839}, {"id": 1365, "seek": 412186, "start": 4130.259999999999, + "end": 4132.099999999999, "text": " Then Arjun is asking,", "tokens": [50784, 1396, + 1587, 10010, 307, 3365, 11, 50876], "temperature": 0.0, "avg_logprob": -0.26635313034057617, + "compression_ratio": 1.6473029045643153, "no_speech_prob": 0.0009137650486081839}, + {"id": 1366, "seek": 412186, "start": 4132.099999999999, "end": 4133.66, "text": + " is there, if I summarize the question,", "tokens": [50876, 307, 456, 11, 498, + 286, 20858, 264, 1168, 11, 50954], "temperature": 0.0, "avg_logprob": -0.26635313034057617, + "compression_ratio": 
1.6473029045643153, "no_speech_prob": 0.0009137650486081839}, + {"id": 1367, "seek": 412186, "start": 4133.66, "end": 4137.179999999999, "text": + " is there a cheaper way than using semantic knowledge graph?", "tokens": [50954, + 307, 456, 257, 12284, 636, 813, 1228, 47982, 3601, 4295, 30, 51130], "temperature": + 0.0, "avg_logprob": -0.26635313034057617, "compression_ratio": 1.6473029045643153, + "no_speech_prob": 0.0009137650486081839}, {"id": 1368, "seek": 412186, "start": + 4137.179999999999, "end": 4141.78, "text": " Maybe the fear is that the graph approach", + "tokens": [51130, 2704, 264, 4240, 307, 300, 264, 4295, 3109, 51360], "temperature": + 0.0, "avg_logprob": -0.26635313034057617, "compression_ratio": 1.6473029045643153, + "no_speech_prob": 0.0009137650486081839}, {"id": 1369, "seek": 412186, "start": + 4141.78, "end": 4146.42, "text": " is computation-expensive, is there some cheap + way to...", "tokens": [51360, 307, 24903, 12, 27409, 11, 307, 456, 512, 7084, 636, + 281, 485, 51592], "temperature": 0.0, "avg_logprob": -0.26635313034057617, "compression_ratio": + 1.6473029045643153, "no_speech_prob": 0.0009137650486081839}, {"id": 1370, "seek": + 412186, "start": 4146.42, "end": 4148.78, "text": " It''s less computationally expensive + than running", "tokens": [51592, 467, 311, 1570, 24903, 379, 5124, 813, 2614, 51710], + "temperature": 0.0, "avg_logprob": -0.26635313034057617, "compression_ratio": 1.6473029045643153, + "no_speech_prob": 0.0009137650486081839}, {"id": 1371, "seek": 412186, "start": + 4149.78, "end": 4151.299999999999, "text": " an embedding model typically.", "tokens": + [51760, 364, 12240, 3584, 2316, 5850, 13, 51836], "temperature": 0.0, "avg_logprob": + -0.26635313034057617, "compression_ratio": 1.6473029045643153, "no_speech_prob": + 0.0009137650486081839}, {"id": 1372, "seek": 415186, "start": 4152.299999999999, + "end": 4154.78, "text": " But it just depends.", "tokens": [50386, 583, 309, 445, + 5946, 13, 50510], "temperature": 
0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": + 1.5850622406639003, "no_speech_prob": 0.00047733221435919404}, {"id": 1373, "seek": + 415186, "start": 4155.7, "end": 4157.259999999999, "text": " Yeah, I mean, there''s + other techniques.", "tokens": [50556, 865, 11, 286, 914, 11, 456, 311, 661, 7512, + 13, 50634], "temperature": 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": + 1.5850622406639003, "no_speech_prob": 0.00047733221435919404}, {"id": 1374, "seek": + 415186, "start": 4157.259999999999, "end": 4158.86, "text": " Like if you have a + fine-tuned,", "tokens": [50634, 1743, 498, 291, 362, 257, 2489, 12, 83, 43703, 11, + 50714], "temperature": 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": + 1.5850622406639003, "no_speech_prob": 0.00047733221435919404}, {"id": 1375, "seek": + 415186, "start": 4158.86, "end": 4161.0199999999995, "text": " blade model, for + example,", "tokens": [50714, 10959, 2316, 11, 337, 1365, 11, 50822], "temperature": + 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": 1.5850622406639003, + "no_speech_prob": 0.00047733221435919404}, {"id": 1376, "seek": 415186, "start": + 4161.0199999999995, "end": 4163.78, "text": " it can give you very comparable kind + of", "tokens": [50822, 309, 393, 976, 291, 588, 25323, 733, 295, 50960], "temperature": + 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": 1.5850622406639003, + "no_speech_prob": 0.00047733221435919404}, {"id": 1377, "seek": 415186, "start": + 4164.82, "end": 4167.78, "text": " semantic understanding on the sparse side.", + "tokens": [51012, 47982, 3701, 322, 264, 637, 11668, 1252, 13, 51160], "temperature": + 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": 1.5850622406639003, + "no_speech_prob": 0.00047733221435919404}, {"id": 1378, "seek": 415186, "start": + 4167.78, "end": 4169.5, "text": " The problem with that is you have to fine-tune + it", "tokens": [51160, 440, 1154, 365, 300, 307, 291, 362, 281, 
2489, 12, 83, 2613, + 309, 51246], "temperature": 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": + 1.5850622406639003, "no_speech_prob": 0.00047733221435919404}, {"id": 1379, "seek": + 415186, "start": 4169.5, "end": 4170.58, "text": " to your data,", "tokens": [51246, + 281, 428, 1412, 11, 51300], "temperature": 0.0, "avg_logprob": -0.2665522921401843, + "compression_ratio": 1.5850622406639003, "no_speech_prob": 0.00047733221435919404}, + {"id": 1380, "seek": 415186, "start": 4170.58, "end": 4173.82, "text": " and also + one of the benefits of the semantic knowledge graph", "tokens": [51300, 293, 611, + 472, 295, 264, 5311, 295, 264, 47982, 3601, 4295, 51462], "temperature": 0.0, "avg_logprob": + -0.2665522921401843, "compression_ratio": 1.5850622406639003, "no_speech_prob": + 0.00047733221435919404}, {"id": 1381, "seek": 415186, "start": 4173.82, "end": 4176.7, + "text": " is that if you,", "tokens": [51462, 307, 300, 498, 291, 11, 51606], "temperature": + 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": 1.5850622406639003, + "no_speech_prob": 0.00047733221435919404}, {"id": 1382, "seek": 415186, "start": + 4176.7, "end": 4179.0199999999995, "text": " I''m just gonna quickly jump to the + slide", "tokens": [51606, 286, 478, 445, 799, 2661, 3012, 281, 264, 4137, 51722], + "temperature": 0.0, "avg_logprob": -0.2665522921401843, "compression_ratio": 1.5850622406639003, + "no_speech_prob": 0.00047733221435919404}, {"id": 1383, "seek": 417902, "start": + 4179.02, "end": 4181.5, "text": " and show you this one.", "tokens": [50364, 293, + 855, 291, 341, 472, 13, 50488], "temperature": 0.0, "avg_logprob": -0.14282368324898384, + "compression_ratio": 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, + {"id": 1384, "seek": 417902, "start": 4183.540000000001, "end": 4185.860000000001, + "text": " Let me do the one that''s got keywords, here we go.", "tokens": [50590, + 961, 385, 360, 264, 472, 300, 311, 658, 21009, 11, 510, 321, 352, 
13, 50706], "temperature": + 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": 1.6437246963562753, + "no_speech_prob": 0.008077533915638924}, {"id": 1385, "seek": 417902, "start": 4186.9800000000005, + "end": 4188.700000000001, "text": " Share here.", "tokens": [50762, 14945, 510, + 13, 50848], "temperature": 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": + 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, {"id": 1386, "seek": + 417902, "start": 4188.700000000001, "end": 4190.26, "text": " With the semantic + knowledge graph approach,", "tokens": [50848, 2022, 264, 47982, 3601, 4295, 3109, + 11, 50926], "temperature": 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": + 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, {"id": 1387, "seek": + 417902, "start": 4190.26, "end": 4192.9400000000005, "text": " you have the ability + to not just represent", "tokens": [50926, 291, 362, 264, 3485, 281, 406, 445, 2906, + 51060], "temperature": 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": + 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, {"id": 1388, "seek": + 417902, "start": 4192.9400000000005, "end": 4197.26, "text": " the query with a + bunch of terms with values,", "tokens": [51060, 264, 14581, 365, 257, 3840, 295, + 2115, 365, 4190, 11, 51276], "temperature": 0.0, "avg_logprob": -0.14282368324898384, + "compression_ratio": 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, + {"id": 1389, "seek": 417902, "start": 4197.26, "end": 4199.22, "text": " but you + can actually use any field.", "tokens": [51276, 457, 291, 393, 767, 764, 604, 2519, + 13, 51374], "temperature": 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": + 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, {"id": 1390, "seek": + 417902, "start": 4199.22, "end": 4201.9400000000005, "text": " So it''s really useful + to be able to describe it", "tokens": [51374, 407, 309, 311, 534, 
4420, 281, 312, + 1075, 281, 6786, 309, 51510], "temperature": 0.0, "avg_logprob": -0.14282368324898384, + "compression_ratio": 1.6437246963562753, "no_speech_prob": 0.008077533915638924}, + {"id": 1391, "seek": 417902, "start": 4201.9400000000005, "end": 4204.700000000001, + "text": " with a category of Korean and a bunch of terms here,", "tokens": [51510, + 365, 257, 7719, 295, 6933, 293, 257, 3840, 295, 2115, 510, 11, 51648], "temperature": + 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": 1.6437246963562753, + "no_speech_prob": 0.008077533915638924}, {"id": 1392, "seek": 417902, "start": 4204.700000000001, + "end": 4207.780000000001, "text": " and maybe you''ve got other fields on your documents", + "tokens": [51648, 293, 1310, 291, 600, 658, 661, 7909, 322, 428, 8512, 51802], "temperature": + 0.0, "avg_logprob": -0.14282368324898384, "compression_ratio": 1.6437246963562753, + "no_speech_prob": 0.008077533915638924}, {"id": 1393, "seek": 420778, "start": 4207.78, + "end": 4210.46, "text": " that are really useful for describing the document,", + "tokens": [50364, 300, 366, 534, 4420, 337, 16141, 264, 4166, 11, 50498], "temperature": + 0.0, "avg_logprob": -0.2193455269666222, "compression_ratio": 1.6413793103448275, + "no_speech_prob": 0.002246849937364459}, {"id": 1394, "seek": 420778, "start": 4210.46, + "end": 4212.3, "text": " a taxonomy of some sort.", "tokens": [50498, 257, 3366, + 23423, 295, 512, 1333, 13, 50590], "temperature": 0.0, "avg_logprob": -0.2193455269666222, + "compression_ratio": 1.6413793103448275, "no_speech_prob": 0.002246849937364459}, + {"id": 1395, "seek": 420778, "start": 4212.3, "end": 4214.86, "text": " The semantic + knowledge graph gives you a lot richer ability", "tokens": [50590, 440, 47982, 3601, + 4295, 2709, 291, 257, 688, 29021, 3485, 50718], "temperature": 0.0, "avg_logprob": + -0.2193455269666222, "compression_ratio": 1.6413793103448275, "no_speech_prob": + 0.002246849937364459}, {"id": 1396, "seek": 420778, 
"start": 4214.86, "end": 4219.0599999999995, + "text": " to turn the set of documents into a fully expressive query.", "tokens": + [50718, 281, 1261, 264, 992, 295, 8512, 666, 257, 4498, 40189, 14581, 13, 50928], + "temperature": 0.0, "avg_logprob": -0.2193455269666222, "compression_ratio": 1.6413793103448275, + "no_speech_prob": 0.002246849937364459}, {"id": 1397, "seek": 420778, "start": 4219.0599999999995, + "end": 4220.86, "text": " So yeah, there''s other techniques.", "tokens": [50928, + 407, 1338, 11, 456, 311, 661, 7512, 13, 51018], "temperature": 0.0, "avg_logprob": + -0.2193455269666222, "compression_ratio": 1.6413793103448275, "no_speech_prob": + 0.002246849937364459}, {"id": 1398, "seek": 420778, "start": 4220.86, "end": 4222.98, + "text": " You could look at displayed things like that,", "tokens": [51018, 509, + 727, 574, 412, 16372, 721, 411, 300, 11, 51124], "temperature": 0.0, "avg_logprob": + -0.2193455269666222, "compression_ratio": 1.6413793103448275, "no_speech_prob": + 0.002246849937364459}, {"id": 1399, "seek": 420778, "start": 4222.98, "end": 4226.66, + "text": " but nothing that''s nearly as expressive.", "tokens": [51124, 457, 1825, + 300, 311, 6217, 382, 40189, 13, 51308], "temperature": 0.0, "avg_logprob": -0.2193455269666222, + "compression_ratio": 1.6413793103448275, "no_speech_prob": 0.002246849937364459}, + {"id": 1400, "seek": 420778, "start": 4226.66, "end": 4227.94, "text": " Yeah, and + these are the concepts", "tokens": [51308, 865, 11, 293, 613, 366, 264, 10392, 51372], + "temperature": 0.0, "avg_logprob": -0.2193455269666222, "compression_ratio": 1.6413793103448275, + "no_speech_prob": 0.002246849937364459}, {"id": 1401, "seek": 420778, "start": 4227.94, + "end": 4229.42, "text": " you''re gonna covering the course, right?", "tokens": + [51372, 291, 434, 799, 10322, 264, 1164, 11, 558, 30, 51446], "temperature": 0.0, + "avg_logprob": -0.2193455269666222, "compression_ratio": 1.6413793103448275, "no_speech_prob": + 
0.002246849937364459}, {"id": 1402, "seek": 420778, "start": 4229.42, "end": 4231.86, + "text": " Yeah, we''ll cover it all in the course.", "tokens": [51446, 865, 11, + 321, 603, 2060, 309, 439, 294, 264, 1164, 13, 51568], "temperature": 0.0, "avg_logprob": + -0.2193455269666222, "compression_ratio": 1.6413793103448275, "no_speech_prob": + 0.002246849937364459}, {"id": 1403, "seek": 420778, "start": 4231.86, "end": 4234.38, + "text": " For those who are interested to learn more.", "tokens": [51568, 1171, + 729, 567, 366, 3102, 281, 1466, 544, 13, 51694], "temperature": 0.0, "avg_logprob": + -0.2193455269666222, "compression_ratio": 1.6413793103448275, "no_speech_prob": + 0.002246849937364459}, {"id": 1404, "seek": 423438, "start": 4235.22, "end": 4238.74, + "text": " Sorry, not trying to make this talk just a big promo", "tokens": [50406, + 4919, 11, 406, 1382, 281, 652, 341, 751, 445, 257, 955, 26750, 50582], "temperature": + 0.0, "avg_logprob": -0.34484953629343135, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.03527650237083435}, {"id": 1405, "seek": 423438, "start": 4238.74, + "end": 4240.82, "text": " for the course, but this is a really,", "tokens": [50582, + 337, 264, 1164, 11, 457, 341, 307, 257, 534, 11, 50686], "temperature": 0.0, "avg_logprob": + -0.34484953629343135, "compression_ratio": 1.626923076923077, "no_speech_prob": + 0.03527650237083435}, {"id": 1406, "seek": 423438, "start": 4240.82, "end": 4243.34, + "text": " wormhole vectors by themselves are a really interesting topic,", "tokens": + [50686, 23835, 14094, 18875, 538, 2969, 366, 257, 534, 1880, 4829, 11, 50812], "temperature": + 0.0, "avg_logprob": -0.34484953629343135, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.03527650237083435}, {"id": 1407, "seek": 423438, "start": 4243.34, + "end": 4247.34, "text": " but yeah, obviously I would love if you would join us", + "tokens": [50812, 457, 1338, 11, 2745, 286, 576, 959, 498, 291, 576, 3917, 505, + 51012], 
"temperature": 0.0, "avg_logprob": -0.34484953629343135, "compression_ratio": + 1.626923076923077, "no_speech_prob": 0.03527650237083435}, {"id": 1408, "seek": + 423438, "start": 4247.34, "end": 4249.22, "text": " in the course, it''ll be fun.", + "tokens": [51012, 294, 264, 1164, 11, 309, 603, 312, 1019, 13, 51106], "temperature": + 0.0, "avg_logprob": -0.34484953629343135, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.03527650237083435}, {"id": 1409, "seek": 423438, "start": 4250.1, + "end": 4253.14, "text": " Debo Brata is asking, what do you think", "tokens": [51150, + 1346, 1763, 1603, 3274, 307, 3365, 11, 437, 360, 291, 519, 51302], "temperature": + 0.0, "avg_logprob": -0.34484953629343135, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.03527650237083435}, {"id": 1410, "seek": 423438, "start": 4253.14, + "end": 4255.74, "text": " about some of the reputable search sites,", "tokens": + [51302, 466, 512, 295, 264, 1085, 32148, 3164, 7533, 11, 51432], "temperature": + 0.0, "avg_logprob": -0.34484953629343135, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.03527650237083435}, {"id": 1411, "seek": 423438, "start": 4255.74, + "end": 4259.900000000001, "text": " like in Deden, LinkedIn, where searching for + a male engineer", "tokens": [51432, 411, 294, 413, 6876, 11, 20657, 11, 689, 10808, + 337, 257, 7133, 11403, 51640], "temperature": 0.0, "avg_logprob": -0.34484953629343135, + "compression_ratio": 1.626923076923077, "no_speech_prob": 0.03527650237083435}, + {"id": 1412, "seek": 423438, "start": 4259.900000000001, "end": 4262.34, "text": + " will bring your results like data engineers", "tokens": [51640, 486, 1565, 428, + 3542, 411, 1412, 11955, 51762], "temperature": 0.0, "avg_logprob": -0.34484953629343135, + "compression_ratio": 1.626923076923077, "no_speech_prob": 0.03527650237083435}, + {"id": 1413, "seek": 426234, "start": 4262.34, "end": 4265.82, "text": " and whatever + unrelated stuff, not directly related 
stuff.", "tokens": [50364, 293, 2035, 38967, + 1507, 11, 406, 3838, 4077, 1507, 13, 50538], "temperature": 0.0, "avg_logprob": + -0.2616902542114258, "compression_ratio": 1.7251184834123223, "no_speech_prob": + 0.0058535500429570675}, {"id": 1414, "seek": 426234, "start": 4265.82, "end": 4269.58, + "text": " And so the question is why search documents", "tokens": [50538, 400, 370, + 264, 1168, 307, 983, 3164, 8512, 50726], "temperature": 0.0, "avg_logprob": -0.2616902542114258, + "compression_ratio": 1.7251184834123223, "no_speech_prob": 0.0058535500429570675}, + {"id": 1415, "seek": 426234, "start": 4269.58, "end": 4271.9400000000005, "text": + " not based on the entire user query, right?", "tokens": [50726, 406, 2361, 322, + 264, 2302, 4195, 14581, 11, 558, 30, 50844], "temperature": 0.0, "avg_logprob": + -0.2616902542114258, "compression_ratio": 1.7251184834123223, "no_speech_prob": + 0.0058535500429570675}, {"id": 1416, "seek": 426234, "start": 4271.9400000000005, + "end": 4272.900000000001, "text": " Only part of it.", "tokens": [50844, 5686, 644, + 295, 309, 13, 50892], "temperature": 0.0, "avg_logprob": -0.2616902542114258, "compression_ratio": + 1.7251184834123223, "no_speech_prob": 0.0058535500429570675}, {"id": 1417, "seek": + 426234, "start": 4277.58, "end": 4280.14, "text": " Sorry, I''m trying to understand + the question", "tokens": [51126, 4919, 11, 286, 478, 1382, 281, 1223, 264, 1168, + 51254], "temperature": 0.0, "avg_logprob": -0.2616902542114258, "compression_ratio": + 1.7251184834123223, "no_speech_prob": 0.0058535500429570675}, {"id": 1418, "seek": + 426234, "start": 4280.14, "end": 4281.9800000000005, "text": " in relation to the + wormhole vector topic.", "tokens": [51254, 294, 9721, 281, 264, 23835, 14094, 8062, + 4829, 13, 51346], "temperature": 0.0, "avg_logprob": -0.2616902542114258, "compression_ratio": + 1.7251184834123223, "no_speech_prob": 0.0058535500429570675}, {"id": 1419, "seek": + 426234, "start": 4283.58, "end": 4287.58, 
"text": " Yeah, I think it''s more, I + think it''s less directly related.", "tokens": [51426, 865, 11, 286, 519, 309, 311, + 544, 11, 286, 519, 309, 311, 1570, 3838, 4077, 13, 51626], "temperature": 0.0, "avg_logprob": + -0.2616902542114258, "compression_ratio": 1.7251184834123223, "no_speech_prob": + 0.0058535500429570675}, {"id": 1420, "seek": 426234, "start": 4287.58, "end": 4292.14, + "text": " I think it''s more airing on the side of why data bias.", "tokens": [51626, + 286, 519, 309, 311, 544, 257, 5057, 322, 264, 1252, 295, 983, 1412, 12577, 13, 51854], + "temperature": 0.0, "avg_logprob": -0.2616902542114258, "compression_ratio": 1.7251184834123223, + "no_speech_prob": 0.0058535500429570675}, {"id": 1421, "seek": 429214, "start": + 4292.14, "end": 4296.700000000001, "text": " Why does reputable search sites do + not sort of", "tokens": [50364, 1545, 775, 1085, 32148, 3164, 7533, 360, 406, 1333, + 295, 50592], "temperature": 0.0, "avg_logprob": -0.1610468796321324, "compression_ratio": + 1.6544715447154472, "no_speech_prob": 0.0005980291753076017}, {"id": 1422, "seek": + 429214, "start": 4296.700000000001, "end": 4301.06, "text": " utilize the semantic + search, you know, one to one in a way?", "tokens": [50592, 16117, 264, 47982, 3164, + 11, 291, 458, 11, 472, 281, 472, 294, 257, 636, 30, 50810], "temperature": 0.0, + "avg_logprob": -0.1610468796321324, "compression_ratio": 1.6544715447154472, "no_speech_prob": + 0.0005980291753076017}, {"id": 1423, "seek": 429214, "start": 4301.06, "end": 4301.900000000001, + "text": " Yeah, I got you.", "tokens": [50810, 865, 11, 286, 658, 291, 13, 50852], + "temperature": 0.0, "avg_logprob": -0.1610468796321324, "compression_ratio": 1.6544715447154472, + "no_speech_prob": 0.0005980291753076017}, {"id": 1424, "seek": 429214, "start": + 4301.900000000001, "end": 4306.900000000001, "text": " I mean, the reality is that + most AI-powered search algorithms,", "tokens": [50852, 286, 914, 11, 264, 4103, + 307, 300, 881, 7318, 
12, 27178, 3164, 14642, 11, 51102], "temperature": 0.0, "avg_logprob": + -0.1610468796321324, "compression_ratio": 1.6544715447154472, "no_speech_prob": + 0.0005980291753076017}, {"id": 1425, "seek": 429214, "start": 4308.5, "end": 4312.58, + "text": " really all of them are used data and the data is biased, right?", "tokens": + [51182, 534, 439, 295, 552, 366, 1143, 1412, 293, 264, 1412, 307, 28035, 11, 558, + 30, 51386], "temperature": 0.0, "avg_logprob": -0.1610468796321324, "compression_ratio": + 1.6544715447154472, "no_speech_prob": 0.0005980291753076017}, {"id": 1426, "seek": + 429214, "start": 4312.58, "end": 4315.06, "text": " So like the reality is in the + world,", "tokens": [51386, 407, 411, 264, 4103, 307, 294, 264, 1002, 11, 51510], + "temperature": 0.0, "avg_logprob": -0.1610468796321324, "compression_ratio": 1.6544715447154472, + "no_speech_prob": 0.0005980291753076017}, {"id": 1427, "seek": 429214, "start": + 4315.06, "end": 4317.14, "text": " if you look at data engineering jobs,", "tokens": + [51510, 498, 291, 574, 412, 1412, 7043, 4782, 11, 51614], "temperature": 0.0, "avg_logprob": + -0.1610468796321324, "compression_ratio": 1.6544715447154472, "no_speech_prob": + 0.0005980291753076017}, {"id": 1428, "seek": 429214, "start": 4317.14, "end": 4320.04, + "text": " they are more statistically skewed towards males", "tokens": [51614, 436, + 366, 544, 36478, 8756, 26896, 3030, 20776, 51759], "temperature": 0.0, "avg_logprob": + -0.1610468796321324, "compression_ratio": 1.6544715447154472, "no_speech_prob": + 0.0005980291753076017}, {"id": 1429, "seek": 429214, "start": 4320.04, "end": 4321.26, + "text": " being in those jobs and females.", "tokens": [51759, 885, 294, 729, 4782, + 293, 21529, 13, 51820], "temperature": 0.0, "avg_logprob": -0.1610468796321324, + "compression_ratio": 1.6544715447154472, "no_speech_prob": 0.0005980291753076017}, + {"id": 1430, "seek": 432126, "start": 4321.26, "end": 4324.9800000000005, "text": + " That doesn''t mean 
anything in terms of who can do the job", "tokens": [50364, + 663, 1177, 380, 914, 1340, 294, 2115, 295, 567, 393, 360, 264, 1691, 50550], "temperature": + 0.0, "avg_logprob": -0.13937093189784458, "compression_ratio": 1.740072202166065, + "no_speech_prob": 0.0018295740010216832}, {"id": 1431, "seek": 432126, "start": + 4324.9800000000005, "end": 4325.820000000001, "text": " or who can''t.", "tokens": + [50550, 420, 567, 393, 380, 13, 50592], "temperature": 0.0, "avg_logprob": -0.13937093189784458, + "compression_ratio": 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, + {"id": 1432, "seek": 432126, "start": 4325.820000000001, "end": 4327.9400000000005, + "text": " It''s just a reality that, you know,", "tokens": [50592, 467, 311, 445, + 257, 4103, 300, 11, 291, 458, 11, 50698], "temperature": 0.0, "avg_logprob": -0.13937093189784458, + "compression_ratio": 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, + {"id": 1433, "seek": 432126, "start": 4327.9400000000005, "end": 4330.58, "text": + " there tend to be more males in engineering", "tokens": [50698, 456, 3928, 281, + 312, 544, 20776, 294, 7043, 50830], "temperature": 0.0, "avg_logprob": -0.13937093189784458, + "compression_ratio": 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, + {"id": 1434, "seek": 432126, "start": 4330.58, "end": 4332.42, "text": " and therefore + the data is reflecting that.", "tokens": [50830, 293, 4412, 264, 1412, 307, 23543, + 300, 13, 50922], "temperature": 0.0, "avg_logprob": -0.13937093189784458, "compression_ratio": + 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, {"id": 1435, "seek": + 432126, "start": 4332.42, "end": 4335.18, "text": " It would be nice to be able + to take those biases out", "tokens": [50922, 467, 576, 312, 1481, 281, 312, 1075, + 281, 747, 729, 32152, 484, 51060], "temperature": 0.0, "avg_logprob": -0.13937093189784458, + "compression_ratio": 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, + {"id": 
1436, "seek": 432126, "start": 4335.18, "end": 4337.58, "text": " and in + fact, there''s ways you can do that,", "tokens": [51060, 293, 294, 1186, 11, 456, + 311, 2098, 291, 393, 360, 300, 11, 51180], "temperature": 0.0, "avg_logprob": -0.13937093189784458, + "compression_ratio": 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, + {"id": 1437, "seek": 432126, "start": 4337.58, "end": 4339.26, "text": " but they''re + extra work.", "tokens": [51180, 457, 436, 434, 2857, 589, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.13937093189784458, "compression_ratio": 1.740072202166065, + "no_speech_prob": 0.0018295740010216832}, {"id": 1438, "seek": 432126, "start": + 4339.26, "end": 4341.22, "text": " And so the out of the box algorithms", "tokens": + [51264, 400, 370, 264, 484, 295, 264, 2424, 14642, 51362], "temperature": 0.0, "avg_logprob": + -0.13937093189784458, "compression_ratio": 1.740072202166065, "no_speech_prob": + 0.0018295740010216832}, {"id": 1439, "seek": 432126, "start": 4341.22, "end": 4343.74, + "text": " that are typically employed don''t necessarily", "tokens": [51362, 300, + 366, 5850, 20115, 500, 380, 4725, 51488], "temperature": 0.0, "avg_logprob": -0.13937093189784458, + "compression_ratio": 1.740072202166065, "no_speech_prob": 0.0018295740010216832}, + {"id": 1440, "seek": 432126, "start": 4343.74, "end": 4346.02, "text": " try to + tackle those biases.", "tokens": [51488, 853, 281, 14896, 729, 32152, 13, 51602], + "temperature": 0.0, "avg_logprob": -0.13937093189784458, "compression_ratio": 1.740072202166065, + "no_speech_prob": 0.0018295740010216832}, {"id": 1441, "seek": 432126, "start": + 4346.02, "end": 4349.34, "text": " So yeah, I think it''s, I think it''s valiant + to, you know,", "tokens": [51602, 407, 1338, 11, 286, 519, 309, 311, 11, 286, 519, + 309, 311, 1323, 5798, 281, 11, 291, 458, 11, 51768], "temperature": 0.0, "avg_logprob": + -0.13937093189784458, "compression_ratio": 1.740072202166065, "no_speech_prob": + 
0.0018295740010216832}, {"id": 1442, "seek": 434934, "start": 4349.34, "end": 4351.58, + "text": " try to, especially when you''re dealing with things", "tokens": [50364, + 853, 281, 11, 2318, 562, 291, 434, 6260, 365, 721, 50476], "temperature": 0.0, "avg_logprob": + -0.20804228502161362, "compression_ratio": 1.6989247311827957, "no_speech_prob": + 0.008751551620662212}, {"id": 1443, "seek": 434934, "start": 4351.58, "end": 4354.860000000001, + "text": " like people''s livelihoods and, you know, careers", "tokens": [50476, + 411, 561, 311, 34343, 82, 293, 11, 291, 458, 11, 16409, 50640], "temperature": 0.0, + "avg_logprob": -0.20804228502161362, "compression_ratio": 1.6989247311827957, "no_speech_prob": + 0.008751551620662212}, {"id": 1444, "seek": 434934, "start": 4354.860000000001, + "end": 4356.66, "text": " and things like that, I think it''s,", "tokens": [50640, + 293, 721, 411, 300, 11, 286, 519, 309, 311, 11, 50730], "temperature": 0.0, "avg_logprob": + -0.20804228502161362, "compression_ratio": 1.6989247311827957, "no_speech_prob": + 0.008751551620662212}, {"id": 1445, "seek": 434934, "start": 4356.66, "end": 4360.06, + "text": " it''s a great exercise and something they should focus on,", "tokens": + [50730, 309, 311, 257, 869, 5380, 293, 746, 436, 820, 1879, 322, 11, 50900], "temperature": + 0.0, "avg_logprob": -0.20804228502161362, "compression_ratio": 1.6989247311827957, + "no_speech_prob": 0.008751551620662212}, {"id": 1446, "seek": 434934, "start": 4360.06, + "end": 4362.22, "text": " but it''s like, unfortunately,", "tokens": [50900, 457, + 309, 311, 411, 11, 7015, 11, 51008], "temperature": 0.0, "avg_logprob": -0.20804228502161362, + "compression_ratio": 1.6989247311827957, "no_speech_prob": 0.008751551620662212}, + {"id": 1447, "seek": 434934, "start": 4362.22, "end": 4364.3, "text": " kind of + a reality of the underlying data", "tokens": [51008, 733, 295, 257, 4103, 295, 264, + 14217, 1412, 51112], "temperature": 0.0, "avg_logprob": 
-0.20804228502161362, "compression_ratio": + 1.6989247311827957, "no_speech_prob": 0.008751551620662212}, {"id": 1448, "seek": + 434934, "start": 4364.3, "end": 4366.58, "text": " that''s being bubbled up, I think.", + "tokens": [51112, 300, 311, 885, 13045, 1493, 493, 11, 286, 519, 13, 51226], "temperature": + 0.0, "avg_logprob": -0.20804228502161362, "compression_ratio": 1.6989247311827957, + "no_speech_prob": 0.008751551620662212}, {"id": 1449, "seek": 434934, "start": 4366.58, + "end": 4367.42, "text": " Yeah.", "tokens": [51226, 865, 13, 51268], "temperature": + 0.0, "avg_logprob": -0.20804228502161362, "compression_ratio": 1.6989247311827957, + "no_speech_prob": 0.008751551620662212}, {"id": 1450, "seek": 434934, "start": 4367.42, + "end": 4368.46, "text": " My take is that I-", "tokens": [51268, 1222, 747, 307, + 300, 286, 12, 51320], "temperature": 0.0, "avg_logprob": -0.20804228502161362, "compression_ratio": + 1.6989247311827957, "no_speech_prob": 0.008751551620662212}, {"id": 1451, "seek": + 434934, "start": 4368.46, "end": 4370.02, "text": " The data is biased.", "tokens": + [51320, 440, 1412, 307, 28035, 13, 51398], "temperature": 0.0, "avg_logprob": -0.20804228502161362, + "compression_ratio": 1.6989247311827957, "no_speech_prob": 0.008751551620662212}, + {"id": 1452, "seek": 434934, "start": 4370.02, "end": 4371.02, "text": " Yeah, I + agree.", "tokens": [51398, 865, 11, 286, 3986, 13, 51448], "temperature": 0.0, "avg_logprob": + -0.20804228502161362, "compression_ratio": 1.6989247311827957, "no_speech_prob": + 0.008751551620662212}, {"id": 1453, "seek": 434934, "start": 4371.02, "end": 4374.18, + "text": " I''ve been one job search engine like a couple companies ago", "tokens": + [51448, 286, 600, 668, 472, 1691, 3164, 2848, 411, 257, 1916, 3431, 2057, 51606], + "temperature": 0.0, "avg_logprob": -0.20804228502161362, "compression_ratio": 1.6989247311827957, + "no_speech_prob": 0.008751551620662212}, {"id": 1454, "seek": 434934, "start": 4374.18, + 
"end": 4377.860000000001, "text": " and my take is that probably these companies + are trying", "tokens": [51606, 293, 452, 747, 307, 300, 1391, 613, 3431, 366, 1382, + 51790], "temperature": 0.0, "avg_logprob": -0.20804228502161362, "compression_ratio": + 1.6989247311827957, "no_speech_prob": 0.008751551620662212}, {"id": 1455, "seek": + 437786, "start": 4377.86, "end": 4382.86, "text": " to avoid, you know, these traps + when your super,", "tokens": [50364, 281, 5042, 11, 291, 458, 11, 613, 24173, 562, + 428, 1687, 11, 50614], "temperature": 0.0, "avg_logprob": -0.19890231530643204, + "compression_ratio": 1.5860655737704918, "no_speech_prob": 0.006218739319592714}, + {"id": 1456, "seek": 437786, "start": 4382.86, "end": 4386.42, "text": " super precise + query will either lead to nothing", "tokens": [50614, 1687, 13600, 14581, 486, 2139, + 1477, 281, 1825, 50792], "temperature": 0.0, "avg_logprob": -0.19890231530643204, + "compression_ratio": 1.5860655737704918, "no_speech_prob": 0.006218739319592714}, + {"id": 1457, "seek": 437786, "start": 4386.42, "end": 4388.78, "text": " or lead + to just a couple of jobs on the screen", "tokens": [50792, 420, 1477, 281, 445, + 257, 1916, 295, 4782, 322, 264, 2568, 50910], "temperature": 0.0, "avg_logprob": + -0.19890231530643204, "compression_ratio": 1.5860655737704918, "no_speech_prob": + 0.006218739319592714}, {"id": 1458, "seek": 437786, "start": 4388.78, "end": 4391.86, + "text": " because their business is to show you as many jobs as possible", "tokens": + [50910, 570, 641, 1606, 307, 281, 855, 291, 382, 867, 4782, 382, 1944, 51064], "temperature": + 0.0, "avg_logprob": -0.19890231530643204, "compression_ratio": 1.5860655737704918, + "no_speech_prob": 0.006218739319592714}, {"id": 1459, "seek": 437786, "start": 4391.86, + "end": 4393.94, "text": " so that they can monetize that.", "tokens": [51064, 370, + 300, 436, 393, 15556, 1125, 300, 13, 51168], "temperature": 0.0, "avg_logprob": + -0.19890231530643204, 
"compression_ratio": 1.5860655737704918, "no_speech_prob": + 0.006218739319592714}, {"id": 1460, "seek": 437786, "start": 4393.94, "end": 4397.0599999999995, + "text": " So it''s all about maybe like business element as well.", "tokens": [51168, + 407, 309, 311, 439, 466, 1310, 411, 1606, 4478, 382, 731, 13, 51324], "temperature": + 0.0, "avg_logprob": -0.19890231530643204, "compression_ratio": 1.5860655737704918, + "no_speech_prob": 0.006218739319592714}, {"id": 1461, "seek": 437786, "start": 4397.0599999999995, + "end": 4401.0199999999995, "text": " But I''m sure there are other technical aspects + of this,", "tokens": [51324, 583, 286, 478, 988, 456, 366, 661, 6191, 7270, 295, + 341, 11, 51522], "temperature": 0.0, "avg_logprob": -0.19890231530643204, "compression_ratio": + 1.5860655737704918, "no_speech_prob": 0.006218739319592714}, {"id": 1462, "seek": + 437786, "start": 4401.0199999999995, "end": 4403.179999999999, "text": " which we + should note disregard.", "tokens": [51522, 597, 321, 820, 3637, 44493, 13, 51630], + "temperature": 0.0, "avg_logprob": -0.19890231530643204, "compression_ratio": 1.5860655737704918, + "no_speech_prob": 0.006218739319592714}, {"id": 1463, "seek": 437786, "start": 4404.66, + "end": 4405.5, "text": " Sure.", "tokens": [51704, 4894, 13, 51746], "temperature": + 0.0, "avg_logprob": -0.19890231530643204, "compression_ratio": 1.5860655737704918, + "no_speech_prob": 0.006218739319592714}, {"id": 1464, "seek": 440550, "start": 4406.46, + "end": 4409.1, "text": " The next question is from Arjun.", "tokens": [50412, 440, + 958, 1168, 307, 490, 1587, 10010, 13, 50544], "temperature": 0.0, "avg_logprob": + -0.3238900320870536, "compression_ratio": 1.7053571428571428, "no_speech_prob": + 0.000620715320110321}, {"id": 1465, "seek": 440550, "start": 4409.1, "end": 4412.02, + "text": " And you experience how much do the following results differ?", "tokens": + [50544, 400, 291, 1752, 577, 709, 360, 264, 3480, 3542, 743, 30, 50690], 
"temperature": + 0.0, "avg_logprob": -0.3238900320870536, "compression_ratio": 1.7053571428571428, + "no_speech_prob": 0.000620715320110321}, {"id": 1466, "seek": 440550, "start": 4412.02, + "end": 4416.42, "text": " First, query against dense vector space directly.", "tokens": + [50690, 2386, 11, 14581, 1970, 18011, 8062, 1901, 3838, 13, 50910], "temperature": + 0.0, "avg_logprob": -0.3238900320870536, "compression_ratio": 1.7053571428571428, + "no_speech_prob": 0.000620715320110321}, {"id": 1467, "seek": 440550, "start": 4416.42, + "end": 4420.1, "text": " And the second query in sparse vector space", "tokens": + [50910, 400, 264, 1150, 14581, 294, 637, 11668, 8062, 1901, 51094], "temperature": + 0.0, "avg_logprob": -0.3238900320870536, "compression_ratio": 1.7053571428571428, + "no_speech_prob": 0.000620715320110321}, {"id": 1468, "seek": 440550, "start": 4420.1, + "end": 4422.02, "text": " and wormhole to dense vector space", "tokens": [51094, + 293, 23835, 14094, 281, 18011, 8062, 1901, 51190], "temperature": 0.0, "avg_logprob": + -0.3238900320870536, "compression_ratio": 1.7053571428571428, "no_speech_prob": + 0.000620715320110321}, {"id": 1469, "seek": 440550, "start": 4422.02, "end": 4423.78, + "text": " and finally get the docs that are similar", "tokens": [51190, 293, 2721, + 483, 264, 45623, 300, 366, 2531, 51278], "temperature": 0.0, "avg_logprob": -0.3238900320870536, + "compression_ratio": 1.7053571428571428, "no_speech_prob": 0.000620715320110321}, + {"id": 1470, "seek": 440550, "start": 4423.78, "end": 4426.66, "text": " to the + wormhole vector average vector.", "tokens": [51278, 281, 264, 23835, 14094, 8062, + 4274, 8062, 13, 51422], "temperature": 0.0, "avg_logprob": -0.3238900320870536, + "compression_ratio": 1.7053571428571428, "no_speech_prob": 0.000620715320110321}, + {"id": 1471, "seek": 440550, "start": 4428.5, "end": 4430.3, "text": " So what''s + the question?", "tokens": [51514, 407, 437, 311, 264, 1168, 30, 51604], "temperature": + 0.0, 
"avg_logprob": -0.3238900320870536, "compression_ratio": 1.7053571428571428, + "no_speech_prob": 0.000620715320110321}, {"id": 1472, "seek": 440550, "start": 4430.3, + "end": 4431.06, "text": " That''s the answer.", "tokens": [51604, 663, 311, 264, + 1867, 13, 51642], "temperature": 0.0, "avg_logprob": -0.3238900320870536, "compression_ratio": + 1.7053571428571428, "no_speech_prob": 0.000620715320110321}, {"id": 1473, "seek": + 440550, "start": 4431.06, "end": 4433.78, "text": " I mean, at the end of the day, + the,", "tokens": [51642, 286, 914, 11, 412, 264, 917, 295, 264, 786, 11, 264, 11, + 51778], "temperature": 0.0, "avg_logprob": -0.3238900320870536, "compression_ratio": + 1.7053571428571428, "no_speech_prob": 0.000620715320110321}, {"id": 1474, "seek": + 443378, "start": 4433.78, "end": 4437.74, "text": " if you have a query that you + run against your lexical space", "tokens": [50364, 498, 291, 362, 257, 14581, 300, + 291, 1190, 1970, 428, 476, 87, 804, 1901, 50562], "temperature": 0.0, "avg_logprob": + -0.15940138670775267, "compression_ratio": 1.8608695652173912, "no_speech_prob": + 3.71196074411273e-05}, {"id": 1475, "seek": 443378, "start": 4437.74, "end": 4442.74, + "text": " that matches mostly documents that are related to the query", "tokens": + [50562, 300, 10676, 5240, 8512, 300, 366, 4077, 281, 264, 14581, 50812], "temperature": + 0.0, "avg_logprob": -0.15940138670775267, "compression_ratio": 1.8608695652173912, + "no_speech_prob": 3.71196074411273e-05}, {"id": 1476, "seek": 443378, "start": 4443.259999999999, + "end": 4446.74, "text": " and then you hop over to the dense space,", "tokens": + [50838, 293, 550, 291, 3818, 670, 281, 264, 18011, 1901, 11, 51012], "temperature": + 0.0, "avg_logprob": -0.15940138670775267, "compression_ratio": 1.8608695652173912, + "no_speech_prob": 3.71196074411273e-05}, {"id": 1477, "seek": 443378, "start": 4446.74, + "end": 4448.86, "text": " you''re typically gonna get a lot of overlap", "tokens": + [51012, 291, 
434, 5850, 799, 483, 257, 688, 295, 19959, 51118], "temperature": 0.0, + "avg_logprob": -0.15940138670775267, "compression_ratio": 1.8608695652173912, "no_speech_prob": + 3.71196074411273e-05}, {"id": 1478, "seek": 443378, "start": 4448.86, "end": 4451.82, + "text": " because the lexical space semantics", "tokens": [51118, 570, 264, 476, + 87, 804, 1901, 4361, 45298, 51266], "temperature": 0.0, "avg_logprob": -0.15940138670775267, + "compression_ratio": 1.8608695652173912, "no_speech_prob": 3.71196074411273e-05}, + {"id": 1479, "seek": 443378, "start": 4451.82, "end": 4454.46, "text": " are going + to be very similar to the dense vector space", "tokens": [51266, 366, 516, 281, + 312, 588, 2531, 281, 264, 18011, 8062, 1901, 51398], "temperature": 0.0, "avg_logprob": + -0.15940138670775267, "compression_ratio": 1.8608695652173912, "no_speech_prob": + 3.71196074411273e-05}, {"id": 1480, "seek": 443378, "start": 4454.46, "end": 4456.78, + "text": " semantics in terms of like the underlying meaning.", "tokens": [51398, + 4361, 45298, 294, 2115, 295, 411, 264, 14217, 3620, 13, 51514], "temperature": 0.0, + "avg_logprob": -0.15940138670775267, "compression_ratio": 1.8608695652173912, "no_speech_prob": + 3.71196074411273e-05}, {"id": 1481, "seek": 443378, "start": 4458.139999999999, + "end": 4460.74, "text": " If you were to take the lexical space", "tokens": [51582, + 759, 291, 645, 281, 747, 264, 476, 87, 804, 1901, 51712], "temperature": 0.0, "avg_logprob": + -0.15940138670775267, "compression_ratio": 1.8608695652173912, "no_speech_prob": + 3.71196074411273e-05}, {"id": 1482, "seek": 443378, "start": 4460.74, "end": 4463.099999999999, + "text": " and I should mention, you can actually use", "tokens": [51712, 293, 286, + 820, 2152, 11, 291, 393, 767, 764, 51830], "temperature": 0.0, "avg_logprob": -0.15940138670775267, + "compression_ratio": 1.8608695652173912, "no_speech_prob": 3.71196074411273e-05}, + {"id": 1483, "seek": 446310, "start": 4463.1, "end": 4465.38, 
"text": " the wormhole + vector in the same vector space.", "tokens": [50364, 264, 23835, 14094, 8062, 294, + 264, 912, 8062, 1901, 13, 50478], "temperature": 0.0, "avg_logprob": -0.15093188815646702, + "compression_ratio": 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, + {"id": 1484, "seek": 446310, "start": 4465.38, "end": 4468.860000000001, "text": + " I kind of showed that with like taking a query like lasagna", "tokens": [50478, + 286, 733, 295, 4712, 300, 365, 411, 1940, 257, 14581, 411, 2439, 35697, 50652], + "temperature": 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": 1.783882783882784, + "no_speech_prob": 0.00045594366383738816}, {"id": 1485, "seek": 446310, "start": + 4468.860000000001, "end": 4473.3, "text": " and then rewriting it with a more expanded + out lexical query", "tokens": [50652, 293, 550, 319, 19868, 309, 365, 257, 544, + 14342, 484, 476, 87, 804, 14581, 50874], "temperature": 0.0, "avg_logprob": -0.15093188815646702, + "compression_ratio": 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, + {"id": 1486, "seek": 446310, "start": 4473.3, "end": 4474.820000000001, "text": + " with a category of Italian.", "tokens": [50874, 365, 257, 7719, 295, 10003, 13, + 50950], "temperature": 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": + 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, {"id": 1487, "seek": + 446310, "start": 4474.820000000001, "end": 4476.9400000000005, "text": " And so + you don''t have to actually jump", "tokens": [50950, 400, 370, 291, 500, 380, 362, + 281, 767, 3012, 51056], "temperature": 0.0, "avg_logprob": -0.15093188815646702, + "compression_ratio": 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, + {"id": 1488, "seek": 446310, "start": 4476.9400000000005, "end": 4478.14, "text": + " between different vector spaces.", "tokens": [51056, 1296, 819, 8062, 7673, 13, + 51116], "temperature": 0.0, "avg_logprob": -0.15093188815646702, 
"compression_ratio": + 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, {"id": 1489, "seek": + 446310, "start": 4478.14, "end": 4480.5, "text": " You can even jump within the + same vector space.", "tokens": [51116, 509, 393, 754, 3012, 1951, 264, 912, 8062, + 1901, 13, 51234], "temperature": 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": + 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, {"id": 1490, "seek": + 446310, "start": 4480.5, "end": 4484.620000000001, "text": " And I think that in + this context,", "tokens": [51234, 400, 286, 519, 300, 294, 341, 4319, 11, 51440], + "temperature": 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": 1.783882783882784, + "no_speech_prob": 0.00045594366383738816}, {"id": 1491, "seek": 446310, "start": + 4484.620000000001, "end": 4488.22, "text": " the more similar, the meaning of the + underlying set", "tokens": [51440, 264, 544, 2531, 11, 264, 3620, 295, 264, 14217, + 992, 51620], "temperature": 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": + 1.783882783882784, "no_speech_prob": 0.00045594366383738816}, {"id": 1492, "seek": + 446310, "start": 4488.22, "end": 4490.3, "text": " of documents is matching each + query,", "tokens": [51620, 295, 8512, 307, 14324, 1184, 14581, 11, 51724], "temperature": + 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": 1.783882783882784, + "no_speech_prob": 0.00045594366383738816}, {"id": 1493, "seek": 446310, "start": + 4490.3, "end": 4492.46, "text": " the more interesting you''re gonna be able to + find", "tokens": [51724, 264, 544, 1880, 291, 434, 799, 312, 1075, 281, 915, 51832], + "temperature": 0.0, "avg_logprob": -0.15093188815646702, "compression_ratio": 1.783882783882784, + "no_speech_prob": 0.00045594366383738816}, {"id": 1494, "seek": 449246, "start": + 4492.5, "end": 4495.18, "text": " missing links in the other vector space.", "tokens": + [50366, 5361, 6123, 294, 264, 661, 8062, 1901, 13, 50500], 
"temperature": 0.0, "avg_logprob": + -0.20068416595458985, "compression_ratio": 1.725868725868726, "no_speech_prob": + 0.00027209712425246835}, {"id": 1495, "seek": 449246, "start": 4495.18, "end": 4496.86, + "text": " If you have very orthogonal queries,", "tokens": [50500, 759, 291, 362, + 588, 41488, 24109, 11, 50584], "temperature": 0.0, "avg_logprob": -0.20068416595458985, + "compression_ratio": 1.725868725868726, "no_speech_prob": 0.00027209712425246835}, + {"id": 1496, "seek": 449246, "start": 4496.86, "end": 4500.82, "text": " like you + can imagine on the lexical side searching for", "tokens": [50584, 411, 291, 393, + 3811, 322, 264, 476, 87, 804, 1252, 10808, 337, 50782], "temperature": 0.0, "avg_logprob": + -0.20068416595458985, "compression_ratio": 1.725868725868726, "no_speech_prob": + 0.00027209712425246835}, {"id": 1497, "seek": 449246, "start": 4500.82, "end": 4505.82, + "text": " orange juice and Nintendo switch, right?", "tokens": [50782, 7671, 8544, + 293, 11578, 3679, 11, 558, 30, 51032], "temperature": 0.0, "avg_logprob": -0.20068416595458985, + "compression_ratio": 1.725868725868726, "no_speech_prob": 0.00027209712425246835}, + {"id": 1498, "seek": 449246, "start": 4505.82, "end": 4507.58, "text": " Or you''ll + get nothing for that,", "tokens": [51032, 1610, 291, 603, 483, 1825, 337, 300, 11, + 51120], "temperature": 0.0, "avg_logprob": -0.20068416595458985, "compression_ratio": + 1.725868725868726, "no_speech_prob": 0.00027209712425246835}, {"id": 1499, "seek": + 449246, "start": 4507.58, "end": 4509.06, "text": " but orange juice or Nintendo + switch.", "tokens": [51120, 457, 7671, 8544, 420, 11578, 3679, 13, 51194], "temperature": + 0.0, "avg_logprob": -0.20068416595458985, "compression_ratio": 1.725868725868726, + "no_speech_prob": 0.00027209712425246835}, {"id": 1500, "seek": 449246, "start": + 4509.06, "end": 4511.82, "text": " Well, you basically end up with a document set", + "tokens": [51194, 1042, 11, 291, 1936, 917, 493, 365, 257, 
4166, 992, 51332], "temperature": + 0.0, "avg_logprob": -0.20068416595458985, "compression_ratio": 1.725868725868726, + "no_speech_prob": 0.00027209712425246835}, {"id": 1501, "seek": 449246, "start": + 4511.82, "end": 4514.38, "text": " that is really two separate document sets, right?", + "tokens": [51332, 300, 307, 534, 732, 4994, 4166, 6352, 11, 558, 30, 51460], "temperature": + 0.0, "avg_logprob": -0.20068416595458985, "compression_ratio": 1.725868725868726, + "no_speech_prob": 0.00027209712425246835}, {"id": 1502, "seek": 449246, "start": + 4514.38, "end": 4516.46, "text": " It''s like there''s not a lot of overlap", "tokens": + [51460, 467, 311, 411, 456, 311, 406, 257, 688, 295, 19959, 51564], "temperature": + 0.0, "avg_logprob": -0.20068416595458985, "compression_ratio": 1.725868725868726, + "no_speech_prob": 0.00027209712425246835}, {"id": 1503, "seek": 449246, "start": + 4516.46, "end": 4518.14, "text": " and if you hop over to the dense space", "tokens": + [51564, 293, 498, 291, 3818, 670, 281, 264, 18011, 1901, 51648], "temperature": + 0.0, "avg_logprob": -0.20068416595458985, "compression_ratio": 1.725868725868726, + "no_speech_prob": 0.00027209712425246835}, {"id": 1504, "seek": 449246, "start": + 4518.14, "end": 4520.3, "text": " and get the average of those,", "tokens": [51648, + 293, 483, 264, 4274, 295, 729, 11, 51756], "temperature": 0.0, "avg_logprob": -0.20068416595458985, + "compression_ratio": 1.725868725868726, "no_speech_prob": 0.00027209712425246835}, + {"id": 1505, "seek": 452030, "start": 4521.26, "end": 4523.66, "text": " there''s + still gonna be things that are like", "tokens": [50412, 456, 311, 920, 799, 312, + 721, 300, 366, 411, 50532], "temperature": 0.0, "avg_logprob": -0.13911766383958898, + "compression_ratio": 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, + {"id": 1506, "seek": 452030, "start": 4523.66, "end": 4526.820000000001, "text": + " probably close to Nintendo switch", "tokens": [50532, 1391, 1998, 281, 
11578, + 3679, 50690], "temperature": 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": + 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, {"id": 1507, "seek": + 452030, "start": 4526.820000000001, "end": 4529.860000000001, "text": " and probably + close to orange, but the more different", "tokens": [50690, 293, 1391, 1998, 281, + 7671, 11, 457, 264, 544, 819, 50842], "temperature": 0.0, "avg_logprob": -0.13911766383958898, + "compression_ratio": 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, + {"id": 1508, "seek": 452030, "start": 4529.860000000001, "end": 4532.46, "text": + " those things are, you might get some weird stuff in between.", "tokens": [50842, + 729, 721, 366, 11, 291, 1062, 483, 512, 3657, 1507, 294, 1296, 13, 50972], "temperature": + 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": 1.7453874538745386, + "no_speech_prob": 0.004013934638351202}, {"id": 1509, "seek": 452030, "start": 4532.46, + "end": 4535.900000000001, "text": " Because you''re now looking across two different + places,", "tokens": [50972, 1436, 291, 434, 586, 1237, 2108, 732, 819, 3190, 11, + 51144], "temperature": 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": + 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, {"id": 1510, "seek": + 452030, "start": 4535.900000000001, "end": 4539.02, "text": " any Nintendo switch + stuff that''s orange", "tokens": [51144, 604, 11578, 3679, 1507, 300, 311, 7671, + 51300], "temperature": 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": + 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, {"id": 1511, "seek": + 452030, "start": 4539.02, "end": 4540.860000000001, "text": " or related to juice + or something might show up,", "tokens": [51300, 420, 4077, 281, 8544, 420, 746, + 1062, 855, 493, 11, 51392], "temperature": 0.0, "avg_logprob": -0.13911766383958898, + "compression_ratio": 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, + {"id": 
1512, "seek": 452030, "start": 4540.860000000001, "end": 4542.62, "text": + " but it''s gonna be weird.", "tokens": [51392, 457, 309, 311, 799, 312, 3657, 13, + 51480], "temperature": 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": + 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, {"id": 1513, "seek": + 452030, "start": 4542.62, "end": 4545.7, "text": " And so this isn''t like a magical + silver bullet", "tokens": [51480, 400, 370, 341, 1943, 380, 411, 257, 12066, 8753, + 11632, 51634], "temperature": 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": + 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, {"id": 1514, "seek": + 452030, "start": 4545.7, "end": 4547.38, "text": " that solves every query understanding", + "tokens": [51634, 300, 39890, 633, 14581, 3701, 51718], "temperature": 0.0, "avg_logprob": + -0.13911766383958898, "compression_ratio": 1.7453874538745386, "no_speech_prob": + 0.004013934638351202}, {"id": 1515, "seek": 452030, "start": 4547.38, "end": 4548.62, + "text": " or every relevance problem.", "tokens": [51718, 420, 633, 32684, 1154, + 13, 51780], "temperature": 0.0, "avg_logprob": -0.13911766383958898, "compression_ratio": + 1.7453874538745386, "no_speech_prob": 0.004013934638351202}, {"id": 1516, "seek": + 454862, "start": 4548.62, "end": 4550.86, "text": " It''s just another tool in our + toolkit", "tokens": [50364, 467, 311, 445, 1071, 2290, 294, 527, 40167, 50476], + "temperature": 0.0, "avg_logprob": -0.26807504736858867, "compression_ratio": 1.6869918699186992, + "no_speech_prob": 0.00042176118586212397}, {"id": 1517, "seek": 454862, "start": + 4550.86, "end": 4553.94, "text": " to be able to better reason about the underlying + documents", "tokens": [50476, 281, 312, 1075, 281, 1101, 1778, 466, 264, 14217, + 8512, 50630], "temperature": 0.0, "avg_logprob": -0.26807504736858867, "compression_ratio": + 1.6869918699186992, "no_speech_prob": 0.00042176118586212397}, {"id": 1518, "seek": + 
454862, "start": 4553.94, "end": 4556.5, "text": " and queries and to explain queries", + "tokens": [50630, 293, 24109, 293, 281, 2903, 24109, 50758], "temperature": 0.0, + "avg_logprob": -0.26807504736858867, "compression_ratio": 1.6869918699186992, "no_speech_prob": + 0.00042176118586212397}, {"id": 1519, "seek": 454862, "start": 4556.5, "end": 4560.14, + "text": " in another modality, if you will, another query modality.", "tokens": + [50758, 294, 1071, 1072, 1860, 11, 498, 291, 486, 11, 1071, 14581, 1072, 1860, 13, + 50940], "temperature": 0.0, "avg_logprob": -0.26807504736858867, "compression_ratio": + 1.6869918699186992, "no_speech_prob": 0.00042176118586212397}, {"id": 1520, "seek": + 454862, "start": 4560.14, "end": 4562.0199999999995, "text": " Yeah, in other words, + what your searching for", "tokens": [50940, 865, 11, 294, 661, 2283, 11, 437, 428, + 10808, 337, 51034], "temperature": 0.0, "avg_logprob": -0.26807504736858867, "compression_ratio": + 1.6869918699186992, "no_speech_prob": 0.00042176118586212397}, {"id": 1521, "seek": + 454862, "start": 4562.0199999999995, "end": 4564.18, "text": " should still kind + of make sense.", "tokens": [51034, 820, 920, 733, 295, 652, 2020, 13, 51142], "temperature": + 0.0, "avg_logprob": -0.26807504736858867, "compression_ratio": 1.6869918699186992, + "no_speech_prob": 0.00042176118586212397}, {"id": 1522, "seek": 454862, "start": + 4564.18, "end": 4565.0199999999995, "text": " Yeah, yeah.", "tokens": [51142, 865, + 11, 1338, 13, 51184], "temperature": 0.0, "avg_logprob": -0.26807504736858867, "compression_ratio": + 1.6869918699186992, "no_speech_prob": 0.00042176118586212397}, {"id": 1523, "seek": + 454862, "start": 4565.0199999999995, "end": 4568.94, "text": " So if it does, it + probably will return some useful results.", "tokens": [51184, 407, 498, 309, 775, + 11, 309, 1391, 486, 2736, 512, 4420, 3542, 13, 51380], "temperature": 0.0, "avg_logprob": + -0.26807504736858867, "compression_ratio": 1.6869918699186992, 
"no_speech_prob": + 0.00042176118586212397}, {"id": 1524, "seek": 454862, "start": 4569.78, "end": 4572.62, + "text": " Tips say, thank you, thank you, tips.", "tokens": [51422, 36887, 584, + 11, 1309, 291, 11, 1309, 291, 11, 6082, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.26807504736858867, "compression_ratio": 1.6869918699186992, "no_speech_prob": + 0.00042176118586212397}, {"id": 1525, "seek": 454862, "start": 4572.62, "end": 4577.62, + "text": " Rustem says, you know, the impact of", "tokens": [51564, 34952, 443, 1619, + 11, 291, 458, 11, 264, 2712, 295, 51814], "temperature": 0.0, "avg_logprob": -0.26807504736858867, + "compression_ratio": 1.6869918699186992, "no_speech_prob": 0.00042176118586212397}, + {"id": 1526, "seek": 457762, "start": 4577.62, "end": 4580.0199999999995, "text": + " documents'' segmentation, so basically,", "tokens": [50364, 8512, 6, 9469, 399, + 11, 370, 1936, 11, 50484], "temperature": 0.0, "avg_logprob": -0.24571891959386927, + "compression_ratio": 1.782442748091603, "no_speech_prob": 0.00192662060726434}, + {"id": 1527, "seek": 457762, "start": 4580.0199999999995, "end": 4582.5, "text": + " what are the suggestions to improve that", "tokens": [50484, 437, 366, 264, 13396, + 281, 3470, 300, 50608], "temperature": 0.0, "avg_logprob": -0.24571891959386927, + "compression_ratio": 1.782442748091603, "no_speech_prob": 0.00192662060726434}, + {"id": 1528, "seek": 457762, "start": 4582.5, "end": 4586.42, "text": " so that + wormhole vectors would be useful?", "tokens": [50608, 370, 300, 23835, 14094, 18875, + 576, 312, 4420, 30, 50804], "temperature": 0.0, "avg_logprob": -0.24571891959386927, + "compression_ratio": 1.782442748091603, "no_speech_prob": 0.00192662060726434}, + {"id": 1529, "seek": 457762, "start": 4586.42, "end": 4587.26, "text": " Yeah, I + think-", "tokens": [50804, 865, 11, 286, 519, 12, 50846], "temperature": 0.0, "avg_logprob": + -0.24571891959386927, "compression_ratio": 1.782442748091603, "no_speech_prob": + 
0.00192662060726434}, {"id": 1530, "seek": 457762, "start": 4587.26, "end": 4588.42, + "text": " Especially for much documents.", "tokens": [50846, 8545, 337, 709, 8512, + 13, 50904], "temperature": 0.0, "avg_logprob": -0.24571891959386927, "compression_ratio": + 1.782442748091603, "no_speech_prob": 0.00192662060726434}, {"id": 1531, "seek": + 457762, "start": 4588.42, "end": 4590.82, "text": " Yeah, I mean, I think it''s + common sense,", "tokens": [50904, 865, 11, 286, 914, 11, 286, 519, 309, 311, 2689, + 2020, 11, 51024], "temperature": 0.0, "avg_logprob": -0.24571891959386927, "compression_ratio": + 1.782442748091603, "no_speech_prob": 0.00192662060726434}, {"id": 1532, "seek": + 457762, "start": 4590.82, "end": 4593.0199999999995, "text": " like the same way + that you would chunk documents", "tokens": [51024, 411, 264, 912, 636, 300, 291, + 576, 16635, 8512, 51134], "temperature": 0.0, "avg_logprob": -0.24571891959386927, + "compression_ratio": 1.782442748091603, "no_speech_prob": 0.00192662060726434}, + {"id": 1533, "seek": 457762, "start": 4593.0199999999995, "end": 4595.98, "text": + " for doing RAAG, you know, I happen to think of,", "tokens": [51134, 337, 884, + 497, 5265, 38, 11, 291, 458, 11, 286, 1051, 281, 519, 295, 11, 51282], "temperature": + 0.0, "avg_logprob": -0.24571891959386927, "compression_ratio": 1.782442748091603, + "no_speech_prob": 0.00192662060726434}, {"id": 1534, "seek": 457762, "start": 4595.98, + "end": 4599.34, "text": " you know, if the documents are too big,", "tokens": [51282, + 291, 458, 11, 498, 264, 8512, 366, 886, 955, 11, 51450], "temperature": 0.0, "avg_logprob": + -0.24571891959386927, "compression_ratio": 1.782442748091603, "no_speech_prob": + 0.00192662060726434}, {"id": 1535, "seek": 457762, "start": 4599.34, "end": 4602.58, + "text": " then there''s too much loss of specificity", "tokens": [51450, 550, 456, + 311, 886, 709, 4470, 295, 2685, 507, 51612], "temperature": 0.0, "avg_logprob": + -0.24571891959386927, 
"compression_ratio": 1.782442748091603, "no_speech_prob": + 0.00192662060726434}, {"id": 1536, "seek": 457762, "start": 4602.58, "end": 4605.0199999999995, + "text": " and too much context being blurred together.", "tokens": [51612, 293, + 886, 709, 4319, 885, 43525, 1214, 13, 51734], "temperature": 0.0, "avg_logprob": + -0.24571891959386927, "compression_ratio": 1.782442748091603, "no_speech_prob": + 0.00192662060726434}, {"id": 1537, "seek": 457762, "start": 4605.0199999999995, + "end": 4606.54, "text": " And if the documents are too tiny,", "tokens": [51734, + 400, 498, 264, 8512, 366, 886, 5870, 11, 51810], "temperature": 0.0, "avg_logprob": + -0.24571891959386927, "compression_ratio": 1.782442748091603, "no_speech_prob": + 0.00192662060726434}, {"id": 1538, "seek": 460654, "start": 4606.54, "end": 4608.5, + "text": " then you''re losing context,", "tokens": [50364, 550, 291, 434, 7027, + 4319, 11, 50462], "temperature": 0.0, "avg_logprob": -0.2600862979888916, "compression_ratio": + 1.626923076923077, "no_speech_prob": 0.0003269418375566602}, {"id": 1539, "seek": + 460654, "start": 4608.5, "end": 4610.82, "text": " and they''re too specific and + not enough overlap.", "tokens": [50462, 293, 436, 434, 886, 2685, 293, 406, 1547, + 19959, 13, 50578], "temperature": 0.0, "avg_logprob": -0.2600862979888916, "compression_ratio": + 1.626923076923077, "no_speech_prob": 0.0003269418375566602}, {"id": 1540, "seek": + 460654, "start": 4610.82, "end": 4614.38, "text": " So I think, you know, typical, + like whatever your domain is,", "tokens": [50578, 407, 286, 519, 11, 291, 458, 11, + 7476, 11, 411, 2035, 428, 9274, 307, 11, 50756], "temperature": 0.0, "avg_logprob": + -0.2600862979888916, "compression_ratio": 1.626923076923077, "no_speech_prob": 0.0003269418375566602}, + {"id": 1541, "seek": 460654, "start": 4614.38, "end": 4617.78, "text": " you know, + I mean, if you''ve got giant PDFs that are books,", "tokens": [50756, 291, 458, + 11, 286, 914, 11, 498, 291, 600, 
658, 7410, 17752, 82, 300, 366, 3642, 11, 50926], + "temperature": 0.0, "avg_logprob": -0.2600862979888916, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.0003269418375566602}, {"id": 1542, "seek": 460654, "start": + 4617.78, "end": 4620.86, "text": " maybe break into the chapters or possibly even + sections", "tokens": [50926, 1310, 1821, 666, 264, 20013, 420, 6264, 754, 10863, + 51080], "temperature": 0.0, "avg_logprob": -0.2600862979888916, "compression_ratio": + 1.626923076923077, "no_speech_prob": 0.0003269418375566602}, {"id": 1543, "seek": + 460654, "start": 4620.86, "end": 4622.5, "text": " of chapters of the large sections,", + "tokens": [51080, 295, 20013, 295, 264, 2416, 10863, 11, 51162], "temperature": + 0.0, "avg_logprob": -0.2600862979888916, "compression_ratio": 1.626923076923077, + "no_speech_prob": 0.0003269418375566602}, {"id": 1544, "seek": 460654, "start": + 4623.38, "end": 4625.86, "text": " but yeah, just like use common sense.", "tokens": + [51206, 457, 1338, 11, 445, 411, 764, 2689, 2020, 13, 51330], "temperature": 0.0, + "avg_logprob": -0.2600862979888916, "compression_ratio": 1.626923076923077, "no_speech_prob": + 0.0003269418375566602}, {"id": 1545, "seek": 460654, "start": 4625.86, "end": 4629.34, + "text": " What''s a reasonable size document that represents", "tokens": [51330, + 708, 311, 257, 10585, 2744, 4166, 300, 8855, 51504], "temperature": 0.0, "avg_logprob": + -0.2600862979888916, "compression_ratio": 1.626923076923077, "no_speech_prob": 0.0003269418375566602}, + {"id": 1546, "seek": 460654, "start": 4629.34, "end": 4632.62, "text": " the meaning + of something that is, like sort of,", "tokens": [51504, 264, 3620, 295, 746, 300, + 307, 11, 411, 1333, 295, 11, 51668], "temperature": 0.0, "avg_logprob": -0.2600862979888916, + "compression_ratio": 1.626923076923077, "no_speech_prob": 0.0003269418375566602}, + {"id": 1547, "seek": 463262, "start": 4633.62, "end": 4638.22, "text": " it''s called + integral, like a 
whole thing, a whole concept.", "tokens": [50414, 309, 311, 1219, + 11573, 11, 411, 257, 1379, 551, 11, 257, 1379, 3410, 13, 50644], "temperature": + 0.0, "avg_logprob": -0.3731326410325907, "compression_ratio": 1.5525291828793775, + "no_speech_prob": 0.007267951965332031}, {"id": 1548, "seek": 463262, "start": 4639.0199999999995, + "end": 4640.0199999999995, "text": " Yeah, yeah.", "tokens": [50684, 865, 11, 1338, + 13, 50734], "temperature": 0.0, "avg_logprob": -0.3731326410325907, "compression_ratio": + 1.5525291828793775, "no_speech_prob": 0.007267951965332031}, {"id": 1549, "seek": + 463262, "start": 4641.62, "end": 4644.54, "text": " It depends on the domain, but + I would just say,", "tokens": [50814, 467, 5946, 322, 264, 9274, 11, 457, 286, 576, + 445, 584, 11, 50960], "temperature": 0.0, "avg_logprob": -0.3731326410325907, "compression_ratio": + 1.5525291828793775, "no_speech_prob": 0.007267951965332031}, {"id": 1550, "seek": + 463262, "start": 4644.54, "end": 4646.46, "text": " you know, your common sense + is probably going to take you", "tokens": [50960, 291, 458, 11, 428, 2689, 2020, + 307, 1391, 516, 281, 747, 291, 51056], "temperature": 0.0, "avg_logprob": -0.3731326410325907, + "compression_ratio": 1.5525291828793775, "no_speech_prob": 0.007267951965332031}, + {"id": 1551, "seek": 463262, "start": 4646.46, "end": 4647.46, "text": " far on + that one.", "tokens": [51056, 1400, 322, 300, 472, 13, 51106], "temperature": 0.0, + "avg_logprob": -0.3731326410325907, "compression_ratio": 1.5525291828793775, "no_speech_prob": + 0.007267951965332031}, {"id": 1552, "seek": 463262, "start": 4647.46, "end": 4649.54, + "text": " Yeah, and like for long documents that are like", "tokens": [51106, 865, + 11, 293, 411, 337, 938, 8512, 300, 366, 411, 51210], "temperature": 0.0, "avg_logprob": + -0.3731326410325907, "compression_ratio": 1.5525291828793775, "no_speech_prob": + 0.007267951965332031}, {"id": 1553, "seek": 463262, "start": 4649.54, "end": 
4652.099999999999, + "text": " at 1,000 pages for sure you want to do that.", "tokens": [51210, 412, + 502, 11, 1360, 7183, 337, 988, 291, 528, 281, 360, 300, 13, 51338], "temperature": + 0.0, "avg_logprob": -0.3731326410325907, "compression_ratio": 1.5525291828793775, + "no_speech_prob": 0.007267951965332031}, {"id": 1554, "seek": 463262, "start": 4652.74, + "end": 4655.34, "text": " And maybe the last question is from Arjun.", "tokens": + [51370, 400, 1310, 264, 1036, 1168, 307, 490, 1587, 10010, 13, 51500], "temperature": + 0.0, "avg_logprob": -0.3731326410325907, "compression_ratio": 1.5525291828793775, + "no_speech_prob": 0.007267951965332031}, {"id": 1555, "seek": 463262, "start": 4656.5, + "end": 4659.0199999999995, "text": " Can this idea of wormhole vectors give us", + "tokens": [51558, 1664, 341, 1558, 295, 23835, 14094, 18875, 976, 505, 51684], "temperature": + 0.0, "avg_logprob": -0.3731326410325907, "compression_ratio": 1.5525291828793775, + "no_speech_prob": 0.007267951965332031}, {"id": 1556, "seek": 463262, "start": 4659.0199999999995, + "end": 4660.82, "text": " more serendipitous results?", "tokens": [51684, 544, 816, + 521, 647, 39831, 3542, 30, 51774], "temperature": 0.0, "avg_logprob": -0.3731326410325907, + "compression_ratio": 1.5525291828793775, "no_speech_prob": 0.007267951965332031}, + {"id": 1557, "seek": 466082, "start": 4661.66, "end": 4662.94, "text": " Give us + more what?", "tokens": [50406, 5303, 505, 544, 437, 30, 50470], "temperature": 0.0, + "avg_logprob": -0.29548610051472984, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.0021705341059714556}, {"id": 1558, "seek": 466082, "start": 4662.94, "end": 4664.94, + "text": " Serendipitous results.", "tokens": [50470, 4210, 521, 647, 39831, 3542, + 13, 50570], "temperature": 0.0, "avg_logprob": -0.29548610051472984, "compression_ratio": + 1.6553030303030303, "no_speech_prob": 0.0021705341059714556}, {"id": 1559, "seek": + 466082, "start": 4664.94, "end": 
4667.299999999999, "text": " Yeah, absolutely.", + "tokens": [50570, 865, 11, 3122, 13, 50688], "temperature": 0.0, "avg_logprob": + -0.29548610051472984, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.0021705341059714556}, {"id": 1560, "seek": 466082, "start": 4667.299999999999, + "end": 4670.94, "text": " So yeah, think of just the behavioral space, right?", + "tokens": [50688, 407, 1338, 11, 519, 295, 445, 264, 19124, 1901, 11, 558, 30, 50870], + "temperature": 0.0, "avg_logprob": -0.29548610051472984, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.0021705341059714556}, {"id": 1561, "seek": 466082, "start": + 4670.94, "end": 4673.46, "text": " If I run a query, a keyword query,", "tokens": + [50870, 759, 286, 1190, 257, 14581, 11, 257, 20428, 14581, 11, 50996], "temperature": + 0.0, "avg_logprob": -0.29548610051472984, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.0021705341059714556}, {"id": 1562, "seek": 466082, "start": + 4673.46, "end": 4677.54, "text": " and then I want to find other things that are + related to this", "tokens": [50996, 293, 550, 286, 528, 281, 915, 661, 721, 300, + 366, 4077, 281, 341, 51200], "temperature": 0.0, "avg_logprob": -0.29548610051472984, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.0021705341059714556}, + {"id": 1563, "seek": 466082, "start": 4677.54, "end": 4680.74, "text": " that don''t + match the terms and maybe don''t even match the meaning,", "tokens": [51200, 300, + 500, 380, 2995, 264, 2115, 293, 1310, 500, 380, 754, 2995, 264, 3620, 11, 51360], + "temperature": 0.0, "avg_logprob": -0.29548610051472984, "compression_ratio": 1.6553030303030303, + "no_speech_prob": 0.0021705341059714556}, {"id": 1564, "seek": 466082, "start": + 4680.74, "end": 4684.299999999999, "text": " but like user behavior has said that + these things I should suggest.", "tokens": [51360, 457, 411, 4195, 5223, 575, 848, + 300, 613, 721, 286, 820, 3402, 13, 51538], "temperature": 0.0, 
"avg_logprob": -0.29548610051472984, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.0021705341059714556}, + {"id": 1565, "seek": 466082, "start": 4684.299999999999, "end": 4686.98, "text": + " I''m basically infusing recommendations then.", "tokens": [51538, 286, 478, 1936, + 1536, 7981, 10434, 550, 13, 51672], "temperature": 0.0, "avg_logprob": -0.29548610051472984, + "compression_ratio": 1.6553030303030303, "no_speech_prob": 0.0021705341059714556}, + {"id": 1566, "seek": 466082, "start": 4686.98, "end": 4689.7, "text": " If I help + over to the cement to the dense space,", "tokens": [51672, 759, 286, 854, 670, 281, + 264, 19729, 281, 264, 18011, 1901, 11, 51808], "temperature": 0.0, "avg_logprob": + -0.29548610051472984, "compression_ratio": 1.6553030303030303, "no_speech_prob": + 0.0021705341059714556}, {"id": 1567, "seek": 468970, "start": 4690.54, "end": 4694.7, + "text": " then I take my keywords and I''m finding other things that share meaning,", + "tokens": [50406, 550, 286, 747, 452, 21009, 293, 286, 478, 5006, 661, 721, 300, + 2073, 3620, 11, 50614], "temperature": 0.0, "avg_logprob": -0.17417796741832386, + "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.002898290054872632}, + {"id": 1568, "seek": 468970, "start": 4694.7, "end": 4697.099999999999, "text": + " but don''t necessarily have that keyword.", "tokens": [50614, 457, 500, 380, 4725, + 362, 300, 20428, 13, 50734], "temperature": 0.0, "avg_logprob": -0.17417796741832386, + "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.002898290054872632}, + {"id": 1569, "seek": 468970, "start": 4697.099999999999, "end": 4700.66, "text": + " I''m starting with dense and I hop over to the electrical side.", "tokens": [50734, + 286, 478, 2891, 365, 18011, 293, 286, 3818, 670, 281, 264, 12147, 1252, 13, 50912], + "temperature": 0.0, "avg_logprob": -0.17417796741832386, "compression_ratio": 1.7755102040816326, + "no_speech_prob": 0.002898290054872632}, {"id": 1570, "seek": 
468970, "start": 4700.66, + "end": 4704.099999999999, "text": " I''m making sure that I''m finding things with + that meaning,", "tokens": [50912, 286, 478, 1455, 988, 300, 286, 478, 5006, 721, + 365, 300, 3620, 11, 51084], "temperature": 0.0, "avg_logprob": -0.17417796741832386, + "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.002898290054872632}, + {"id": 1571, "seek": 468970, "start": 4704.099999999999, "end": 4709.74, "text": + " but I''m adding in keywords that were completely ignored by the dense space.", + "tokens": [51084, 457, 286, 478, 5127, 294, 21009, 300, 645, 2584, 19735, 538, 264, + 18011, 1901, 13, 51366], "temperature": 0.0, "avg_logprob": -0.17417796741832386, + "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.002898290054872632}, + {"id": 1572, "seek": 468970, "start": 4709.74, "end": 4711.74, "text": " That was + not necessarily a serendipitous.", "tokens": [51366, 663, 390, 406, 4725, 257, 816, + 521, 647, 39831, 13, 51466], "temperature": 0.0, "avg_logprob": -0.17417796741832386, + "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.002898290054872632}, + {"id": 1573, "seek": 468970, "start": 4711.74, "end": 4716.54, "text": " That''s + just like fixing problems, but I would say going from lexical to semantic.", "tokens": + [51466, 663, 311, 445, 411, 19442, 2740, 11, 457, 286, 576, 584, 516, 490, 476, + 87, 804, 281, 47982, 13, 51706], "temperature": 0.0, "avg_logprob": -0.17417796741832386, + "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.002898290054872632}, + {"id": 1574, "seek": 471654, "start": 4716.54, "end": 4720.74, "text": " More so + we''ll get you things that were dismissed.", "tokens": [50364, 5048, 370, 321, 603, + 483, 291, 721, 300, 645, 29970, 13, 50574], "temperature": 0.0, "avg_logprob": -0.22747069464789496, + "compression_ratio": 1.6981818181818182, "no_speech_prob": 0.0048144799657166}, + {"id": 1575, "seek": 471654, "start": 4720.74, "end": 4722.26, "text": " But yeah, + 
for actual serendipitous,", "tokens": [50574, 583, 1338, 11, 337, 3539, 816, 521, + 647, 39831, 11, 50650], "temperature": 0.0, "avg_logprob": -0.22747069464789496, + "compression_ratio": 1.6981818181818182, "no_speech_prob": 0.0048144799657166}, + {"id": 1576, "seek": 471654, "start": 4722.26, "end": 4726.5, "text": " the behavioral + space is probably going to give you a lot more magic there.", "tokens": [50650, + 264, 19124, 1901, 307, 1391, 516, 281, 976, 291, 257, 688, 544, 5585, 456, 13, 50862], + "temperature": 0.0, "avg_logprob": -0.22747069464789496, "compression_ratio": 1.6981818181818182, + "no_speech_prob": 0.0048144799657166}, {"id": 1577, "seek": 471654, "start": 4726.5, + "end": 4727.42, "text": " All right, awesome.", "tokens": [50862, 1057, 558, 11, + 3476, 13, 50908], "temperature": 0.0, "avg_logprob": -0.22747069464789496, "compression_ratio": + 1.6981818181818182, "no_speech_prob": 0.0048144799657166}, {"id": 1578, "seek": + 471654, "start": 4727.42, "end": 4728.3, "text": " I think it''s a wrap.", "tokens": + [50908, 286, 519, 309, 311, 257, 7019, 13, 50952], "temperature": 0.0, "avg_logprob": + -0.22747069464789496, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.0048144799657166}, {"id": 1579, "seek": 471654, "start": 4728.3, "end": 4729.86, + "text": " Thanks so much, everyone.", "tokens": [50952, 2561, 370, 709, 11, 1518, + 13, 51030], "temperature": 0.0, "avg_logprob": -0.22747069464789496, "compression_ratio": + 1.6981818181818182, "no_speech_prob": 0.0048144799657166}, {"id": 1580, "seek": + 471654, "start": 4729.86, "end": 4733.58, "text": " Trey, thank you so much for + the presentation, for the idea,", "tokens": [51030, 314, 7950, 11, 1309, 291, 370, + 709, 337, 264, 5860, 11, 337, 264, 1558, 11, 51216], "temperature": 0.0, "avg_logprob": + -0.22747069464789496, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.0048144799657166}, {"id": 1581, "seek": 471654, "start": 4733.58, "end": 4737.7, + "text": " and 
for pounding at the questions with such an immense speed.", "tokens": + [51216, 293, 337, 40034, 412, 264, 1651, 365, 1270, 364, 22920, 3073, 13, 51422], + "temperature": 0.0, "avg_logprob": -0.22747069464789496, "compression_ratio": 1.6981818181818182, + "no_speech_prob": 0.0048144799657166}, {"id": 1582, "seek": 471654, "start": 4737.7, + "end": 4740.42, "text": " Thank you all for time.", "tokens": [51422, 1044, 291, + 439, 337, 565, 13, 51558], "temperature": 0.0, "avg_logprob": -0.22747069464789496, + "compression_ratio": 1.6981818181818182, "no_speech_prob": 0.0048144799657166}, + {"id": 1583, "seek": 471654, "start": 4740.42, "end": 4741.9, "text": " This was + awesome.", "tokens": [51558, 639, 390, 3476, 13, 51632], "temperature": 0.0, "avg_logprob": + -0.22747069464789496, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.0048144799657166}, {"id": 1584, "seek": 471654, "start": 4741.9, "end": 4742.74, + "text": " Awesome.", "tokens": [51632, 10391, 13, 51674], "temperature": 0.0, "avg_logprob": + -0.22747069464789496, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.0048144799657166}, {"id": 1585, "seek": 471654, "start": 4742.74, "end": 4745.06, + "text": " Thanks to me to really appreciate you coming on.", "tokens": [51674, 2561, + 281, 385, 281, 534, 4449, 291, 1348, 322, 13, 51790], "temperature": 0.0, "avg_logprob": + -0.22747069464789496, "compression_ratio": 1.6981818181818182, "no_speech_prob": + 0.0048144799657166}, {"id": 1586, "seek": 471654, "start": 4745.06, "end": 4745.78, + "text": " This was awesome.", "tokens": [51790, 639, 390, 3476, 13, 51826], "temperature": + 0.0, "avg_logprob": -0.22747069464789496, "compression_ratio": 1.6981818181818182, + "no_speech_prob": 0.0048144799657166}, {"id": 1587, "seek": 474578, "start": 4745.78, + "end": 4748.42, "text": " Thanks to everybody for joining.", "tokens": [50364, 2561, + 281, 2201, 337, 5549, 13, 50496], "temperature": 0.0, "avg_logprob": -0.2834304675721286, + 
"compression_ratio": 1.3983739837398375, "no_speech_prob": 0.009870157577097416}, + {"id": 1588, "seek": 474578, "start": 4748.42, "end": 4751.42, "text": " And yeah, + the video and slides and everything will be coming out to you.", "tokens": [50496, + 400, 1338, 11, 264, 960, 293, 9788, 293, 1203, 486, 312, 1348, 484, 281, 291, 13, + 50646], "temperature": 0.0, "avg_logprob": -0.2834304675721286, "compression_ratio": + 1.3983739837398375, "no_speech_prob": 0.009870157577097416}, {"id": 1589, "seek": + 474578, "start": 4751.42, "end": 4753.0199999999995, "text": " And I hope to see + you soon.", "tokens": [50646, 400, 286, 1454, 281, 536, 291, 2321, 13, 50726], "temperature": + 0.0, "avg_logprob": -0.2834304675721286, "compression_ratio": 1.3983739837398375, + "no_speech_prob": 0.009870157577097416}, {"id": 1590, "seek": 474578, "start": 4753.0199999999995, + "end": 4754.219999999999, "text": " Thank you.", "tokens": [50726, 1044, 291, 13, + 50786], "temperature": 0.0, "avg_logprob": -0.2834304675721286, "compression_ratio": + 1.3983739837398375, "no_speech_prob": 0.009870157577097416}, {"id": 1591, "seek": + 474578, "start": 4754.219999999999, "end": 4755.0599999999995, "text": " See you + soon.", "tokens": [50786, 3008, 291, 2321, 13, 50828], "temperature": 0.0, "avg_logprob": + -0.2834304675721286, "compression_ratio": 1.3983739837398375, "no_speech_prob": + 0.009870157577097416}, {"id": 1592, "seek": 474578, "start": 4755.0599999999995, + "end": 4755.62, "text": " Bye bye.", "tokens": [50828, 4621, 6543, 13, 50856], "temperature": + 0.0, "avg_logprob": -0.2834304675721286, "compression_ratio": 1.3983739837398375, + "no_speech_prob": 0.009870157577097416}, {"id": 1593, "seek": 474578, "start": 4755.62, + "end": 4756.62, "text": " Bye.", "tokens": [50856, 4621, 13, 50906], "temperature": + 0.0, "avg_logprob": -0.2834304675721286, "compression_ratio": 1.3983739837398375, + "no_speech_prob": 0.009870157577097416}]' +--- + +Alright, hello everyone, wherever you 
are, really, really happy to see all of you online. Welcome to Beyond Hybrid Search with Wormhole Vectors. It's another idea that Trey is going to present today, and we will have a discussion, and all of you are welcome to ask questions as well. Yeah, cool.
+I think we'll start with that. This is just a quick intro from me. I'm Dmytro Kan. Most recently, I'm with Aiven; I joined as a product director leading the search domain. We offer managed OpenSearch so that you don't have the headaches of setting it up and doing the DevOps.
+And you can choose any cloud whatsoever, really, and then just go and run with that. I'll share a couple of links later. I'm also the host of the Vector Podcast, which I started, I think, four years ago; I've already stopped counting. Maybe some of you have heard some of the episodes.
+And yeah, it keeps going on and off, but I'm really excited to continue doing that. I've been in search for, I think, 16 years, maybe 20 years if I include academic experience or exposure. I've built search at startups and multinational technology giants.
+One of those startups, AlphaSense, became, I think, a unicorn company by now. Yeah, I'm super excited to partner with Trey and AI-Powered Search, with support from the Vector Podcast, and I'm looking forward to the session today. Over to you, Trey. Awesome. Thanks, Dmytro. Appreciate it.
+I'm really excited to have Dmytro Kan here, especially for the conversation part of this. He's been doing the Vector Podcast and been in the space for a long time, so I think it'll be useful to have him facilitate and get lots of questions and good discussion. So, I'm Trey Grainger.
+I'm the lead author on the book AI-Powered Search, along with Doug Turnbull and Max Irwin. I'm the founder of Searchkernel, a company that does AI-powered search consulting, and a technical advisor to OpenSource Connections.
+For the last year, I've been an adjunct professor at Furman University, teaching computer science.
My background: basically my entire career has been in search, particularly the intersection of data science, AI, and search.
+My last company, prior to Searchkernel: I was the CTO of Presearch, which is a decentralized web search engine. Prior to that, I was the chief algorithms officer at Lucidworks, a search company, as well as, before that, their SVP of engineering.
+I also led search at CareerBuilder prior to that, and I also wrote, a decade ago, Solr in Action, but AI-Powered Search is the focus of what I'm doing right now. The book's got, you know, quite good reviews from folks. If you haven't checked it out, please check it out.
+And this lightning lesson is one of a series leading up to an AI-Powered Search course that Doug Turnbull and I are teaching, starting two weeks from today.
+It's kind of themed based upon the book, but we'll be going into a lot of new and emerging techniques that aren't in the book as well. Just to give you a sense, I'll spend like a minute on this, maybe two.
+If you're curious, it's, you know, four solid weeks of material. The first week will sort of, you know, do a course intro, introduce the search relevance problem, and talk about ranking, those kinds of things.
+We'll have a guest session from Eric Pugh from OpenSource Connections, talking about User Behavior Insights, for collecting clickstream data and how to properly collect and process it. After that, a session will be on signals and reflected intelligence models:
+everything from signals boosting for popularized relevance, to learning to rank for generalized relevance, to collaborative filtering and matrix factorization for personalized relevance, to knowledge graph learning, to learn from user behaviors:
+you know, terms, misspellings, things like that about your domain. And then every week we'll have office hours where you can bring your hardest questions, and we've got labs throughout the course as well. If you need help with those, we can help.
+The next week we'll dive into AI-powered query modalities, things like bi-encoders and cross-encoders; we'll talk about chunking, late interaction models, hybrid search, multimodal search, all of those. Again, all of this has code and notebooks associated with it that we'll be working through.
+We have a guest lecture from Jenny from Qdrant, who will be talking about mixing sparse sentence representations with miniCOIL, and then after that we'll dive into sort of hands-on building of ranking classifiers, or learning-to-rank models,
+and what is entailed in that. We will, of course, then have office hours again. The next week we'll dive deep into RAG: you know, sort of naive RAG, agentic RAG, adaptive RAG, guardrails, all the sorts of things you need to sort of understand to do RAG well.
+We'll talk about agentic search towards the end of the course, and talk about interleaving strategies for RAG. We'll have Max Irwin, our co-author on AI-Powered Search, giving a guest lecture session. After that will be
+automating learning to rank, with click models and with active learning, so we'll be diving into how to deal with biases in your data, how to deal with exploration versus exploitation, looking for results that maybe don't show up in your normal search patterns. And then
+ the sort of final two weeks: we'll have a guest lecture from Jon Handler from OpenSearch and AWS, really talking about scaling vector search in production, with lots of good experience from large-scale OpenSearch clusters and Amazon services, and then we'll dive into optimizing AI search for production: everything from quantization, to re-ranking strategies, to semantic caching.
+ Running local models. And then for our last session, we'll dive deep into AI-powered query understanding and agentic search, focused on really interpreting and understanding queries and leveraging agents as part of that process. And so, if that's interesting to you, there's a link and a QR code here; anyone who attends today is eligible for 20% off the course.
+ And so definitely check it out if you've been considering it; there's two weeks left. And of course, even if you can't attend all the sessions, everyone who's enrolled will have permanent access to all of the recordings, all the code, and all the course materials, so you can use these going forward into the future. So, done with that, now I'd like to get to our topic, which is Beyond Hybrid Search with Wormhole Vectors.
+ So let me dive straight in. Feel free, if you have questions, as Dmytro said, to post them in the comments. Dmytro, feel free to interrupt me at any point if there's something worth diving into; otherwise I'm just going to keep going and kind of focus on conversation at the end. I want to start with some basic material on vectors and vector spaces, to kind of set our expectations for where we're going with wormhole vectors. To start: vectors, by definition mathematically, are something that have direction and magnitude.
+We're not going to worry here about the formal physics of vector spaces or Hilbert spaces; what we care about is a semantic vector space, into which we can map a concept.
+So whereas vectors have dimensions, and those dimensions sort of go in any direction, when we talk about an embedding, an embedding is actually a point in vector space. So, for example, this point right here, this series of floats for book or tree or what have you.
+You can think of it as a vector originating from the origin at 0, 0 here, and extending out to that point. But fundamentally, we think of an embedding as a coordinate, that is, a point in vector space that corresponds with some semantic meaning.
+In search, whenever we're dealing with embeddings, we often have things like word or phrase embeddings, where we take an individual word and, leveraging a transformer model typically, we generate that series of floats that represents the meaning of that word, given the context around it.
+But we can also have sentence embeddings, where we look at all of the words in the sentence and their contextual meaning, and generate an embedding that represents the meaning of the sentence.
+We can have paragraph embeddings that sort of summarize the core ideas of that paragraph, and the same thing with a document.
+Often in search, we'll start with just a document embedding, and when we take a query, we generate an embedding, and we do sort of a vector similarity to find related documents that match the query. But you can chunk documents up in any way, into any number of vectors.
+Conceptually, we typically think of embeddings and vectors as having a relatively small number of dimensions. We call these dense vectors, where maybe there's 768 or 1024, some number of dimensions, and we compress lots of data into a continuous space within those.
+However, there's also the notion of sparse vectors. And the best way to think of a sparse vector, for purposes of our discussion today, is to think of lexical search, of just when I'm trying to run a search for keywords.
+So imagine you have a one-million-dimensional vector, not 768, but a million dimensions.
And every single one of those dimensions corresponds to a term in your index, where you've indexed all of your keywords. And let's just assume that there's only a million terms in your index.
+If I wanted to represent latte as a query, well, let me not do latte, let me do a donut. If I want to represent donut as a query, then I can represent that as a vector of a million values, all zeros except one:
+there's a 1 in the column for donut, indicating that this is a million-dimensional vector with only one value represented. And that value is whether the text donut appears within this document or query. And so that's a sparse vector, where it's sparse because most of the data is not filled:
+I have mostly zeros, but there are some ones. And of course, in this case, if I were to search for cheese pizza, that vector would have two ones in it, one for cheese and one for pizza. So it's a million-dimensional vector with two ones in it.
+This is just as valid a vector as a dense vector with only 768 dimensions.
+But what we typically do, when we start to move from lexical matching, where we can match on those yes-or-no ones or zeros in an inverted index, to doing semantic search, is we focus on a much smaller number of dimensions.
+And so conceptually, as an embedding here, what I have is eight dimensions. Each of these items that I showed on the previous slide has dimensions indicating whether it's food, whether it's a drink, how much dairy it has, is it bread, is it caffeine, sweet, calories, healthy, et cetera.
+ So you can see apple juice now is not represented as "it has the word apple and it has the word juice," but it's represented as very much not food, very much a drink, no dairy, no bread, no caffeine, very high on sweet but not all the way up, very high on calories but not all the way up, and kind of sort of healthy, but not really.
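Before moving further into the dense representation: the sparse keyword vectors just described are easiest to sketch as a map from term to value, rather than materializing a million mostly-zero entries. A minimal illustration in Python (the tokenization and example queries are hypothetical stand-ins):

```python
# A sparse term vector stored as {term: value}; conceptually it is a
# million-dimensional vector with zeros in every unlisted dimension.
def sparse_vector(text):
    return {term: 1 for term in text.lower().split()}

def sparse_dot(a, b):
    # The dot product only needs to touch the shared non-zero dimensions.
    return sum(value * b.get(term, 0) for term, value in a.items())

donut = sparse_vector("donut")                # one non-zero dimension
cheese_pizza = sparse_vector("cheese pizza")  # two non-zero dimensions

# "cheese bread sticks" overlaps with "cheese pizza" on one dimension.
print(sparse_dot(cheese_pizza, sparse_vector("cheese bread sticks")))  # 1
```

An inverted index is effectively an efficient way to evaluate exactly this kind of dot product, term by term, without ever storing the zeros.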
+And then same thing, cheese bread sticks: very much food, not a drink, a good bit of dairy, very much bread, no caffeine, you get the idea. The attributes map to the dimensions of these concepts over here, by representing them in these eight dimensions.
+And in search, what we typically do is represent documents and queries leveraging these vectors, and then we do some sort of vector similarity calculation in order to say how similar things are.
+ So if I were, for example, to take the vector over here for cheese pizza, and I were to then do a cosine similarity between that vector and every other vector, I would see that cheese bread sticks have a very high similarity, followed by cinnamon bread sticks, followed by doughnut, all the way down to water.
+So these are essentially ranked based upon cheese pizza: the cheesiest, breadiest, unhealthiest, non-drinkiest things are at the top, still very, very non-drinky and not very healthy here, ranked all the way down to
+what is essentially the opposite in this vector space, which is water, all the way on the other end of the spectrum. Same thing with green tea, very similar to water; cappuccino, latte, healthy, no-calorie drinks; all the way down to a very unhealthy, very-much-not-a-drink. You get the idea.
+So that's essentially a semantic vector space: things span across these dimensions, and they fit at different places within the vector space that correspond to the meaning of these attributes.
+Now, when we deal with transformers, which we get from all the LLMs today that we're leveraging for vector search, these don't use explicit features like we have here: food, drink, dairy, bread, et cetera. They use latent features.
+And latent just means sort of hidden; or, another way to put it is, the dimensions don't correspond one-to-one with particular attributes. It's combinations of those dimensions together that give us our meaning.
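The cosine-similarity ranking described above takes only a few lines of Python. The eight explicit dimensions follow the slide (food, drink, dairy, bread, caffeine, sweet, calories, healthy), but the numeric values here are made-up illustrations over a subset of the items, not the actual slide data:

```python
import math

# Hypothetical embeddings over explicit dimensions:
# [food, drink, dairy, bread, caffeine, sweet, calories, healthy]
items = {
    "cheese pizza":        [0.9, 0.0, 0.7, 0.8, 0.0, 0.1, 0.9, 0.2],
    "cheese bread sticks": [0.9, 0.0, 0.6, 0.9, 0.0, 0.1, 0.8, 0.2],
    "donut":               [0.9, 0.0, 0.2, 0.7, 0.0, 0.9, 0.9, 0.1],
    "green tea":           [0.0, 0.9, 0.0, 0.0, 0.3, 0.1, 0.1, 0.9],
    "water":               [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = items["cheese pizza"]
ranked = sorted(items, key=lambda name: cosine(items[name], query), reverse=True)
print(ranked)  # cheesiest, breadiest items first; water last
```

Swapping the query vector for any other item re-ranks the whole list relative to that item, which is exactly the "rank everything by similarity to cheese pizza" behavior described above.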
+ And so to think of that visually: if I were to create an embedding space, and this is obviously flattened, there could be hundreds or thousands and thousands of dimensions, but in this vector space, if these are all of the embeddings that I have, and I were to search for the phrase Darth Vader, turn that into an embedding, and match it, you'll see that over here on the right I have a cluster of meaning associated with the search for Darth Vader.
+Now, there's some other points in various places, but if I were to look at the items in this cluster, I see pictures of Darth Vader, which is what I would expect, because the meaning of Darth Vader is essentially in this area of vector space.
+Similarly, if I were to search for Puppy, then this cluster of meaning right here corresponds with puppies, and indeed, I see pictures of puppies.
+So the interesting question arises: what happens if I were to find the midpoint between Puppy and Darth Vader in this semantic vector space? People have different intuitions about what actually happens here.
+Some people think, "I don't know what I would find in the middle." But the answer is, if this vector space is properly constructed, so that the semantic meaning is represented, i.e.,
+the further away I get from this point, the more I get away from puppy, and the further away I get from this one, the more I get away from Darth Vader, and vice versa, then what I would expect to find, if I sort of average those two, a vector from here and a vector from here, together, is a Puppy Darth Vader, a cute Puppy Darth Vader, right here in the middle.
+ For some people that makes intuitive sense, but if you think about what a semantic vector space is doing, where it's representing meaning across a continuous spectrum, you would expect to find this, because I'm essentially finding the thing that is the average, sort of in between Darth Vader and Puppy, within the semantic vector space.
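The averaging just described is plain element-wise arithmetic. A toy sketch, with hypothetical four-dimensional embeddings standing in for real model output (real embeddings would have hundreds of dimensions):

```python
import math

def normalize(v):
    # Unit-length vectors make cosine similarity a plain dot product.
    norm = math.hypot(*v)
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def midpoint(a, b):
    # Average the two embeddings element-wise, then re-normalize.
    return normalize([(x + y) / 2 for x, y in zip(a, b)])

# Hypothetical 4-d embeddings (stand-ins for "puppy" and "Darth Vader").
puppy = normalize([0.9, 0.1, 0.0, 0.2])
darth_vader = normalize([0.0, 0.8, 0.9, 0.1])

mid = midpoint(puppy, darth_vader)
# For unit vectors, the normalized average is equally similar to both
# inputs, which is why it lands "in between" the two concepts.
print(dot(mid, puppy), dot(mid, darth_vader))
```

The superhero versus superheroes example works the same way, except that instead of averaging, you subtract one embedding's direction and add another's before re-normalizing.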
+Now, there's all sorts of reasons why this might not work, depending upon how you've trained your model, and if there's too much data being compressed into too little space, but conceptually this works.
+ So similarly, if I were to do an embedding search for superhero flying versus superheroes flying: this is very comparable to running a search for superhero flying, sort of with the idea of a singular hero, and then sort of subtracting out the idea of a singular and adding in the idea of a plural. Again, from hero to heroes. And what happens over here is, this is essentially the same vector, but moved toward, or in the direction of, multiple versus singular.
+And so what you see over here, in fact, is that while some of the images are the same, in general I'm seeing more images of superheroes that are in groups of multiple superheroes.
+And so, to demonstrate this with a very explicit, concrete example: if I were to take an embedding for this image, which is a DeLorean from Back to the Future, I could describe it, right?
+This is a sporty car with one door on either side, and it's kind of boxy, and it's got really cool lighting.
+And so when I run that search for this embedding on other images, I find other sporty cars, obviously some DeLoreans in here, but also just, in general, sporty cars with a door on either side and really cool lighting, for the most part.
+However, what if I were to take an embedding for the query from the last slide, superhero, and I were to average, or pool, that with this image embedding? What would I get? Well, in fact, we have an example of this in the AI-Powered Search book, where we're doing multi-modal search.
+If I take an embedding for superhero and an embedding for this image, what I in fact do get as this very first result is a sporty car with cool lighting with a superhero on top, because that's what I would expect, in this semantic vector space, to be in between these things.
+And for these other images, again, sporty cars, single door, but notice that in all of them there's a person, and it just so happens that that person is the protagonist of their story. So maybe those stories didn't have actual superheroes, but these are the heroes of those stories.
+So you get the idea, and I wanted to paint that conceptually just to talk about regions of vector space and what they represent, and how you can use math on vectors to move between them and sort of combine concepts and find related things.
+And so one problem, now zooming back out to the topic of today, one problem that we commonly come across, and this is where hybrid search comes into play, is that we have disjoint vector spaces in search, and that leads to disjoint query paradigms.
+What I mean by that is that we have a sparse, lexical vector space, which is our inverted index. What I showed you earlier, with the million dimensions where the keywords represent the dimensions, that is a vector space. It's just a very sparse one.
+Similarly, we have dense vector spaces, where most of our embeddings are, that we get out of large language models, where they're compacted into a small number of dimensions, but they're continuous.
+Because we have these two different query paradigms, what often happens with vector search is we say, hey, I don't know how to combine this dense query on this embedding with the sparse query with these keywords, so I'm just going to run them as separate searches.
+And in fact, that's what most hybrid search implementations look like out of the box. So this is an example of RRF, or the reciprocal rank fusion algorithm, where I'm essentially taking a lexical query over here for the Hobbit, and I'm matching on a bunch of documents.
+You'll see each of these has the word Hobbit somewhere, either in the title or maybe in the description.
But notice that while the first four results look pretty good, these are the only results that actually had the word "hobbit" in them. The rest of these results: The Good, the Bad and the Ugly just happens to match on the word "the" three times. The next result happens to match on The Lord of the Rings, which has "the" in it three times as well. It happened to give me a good result, but it was purely coincidence, because it doesn't have the word "hobbit" in it. And then I get The Abyss, and then The Apartment, again only matching on the word "the".

So the lexical search found all the results that had the word "hobbit" in them, but it completely missed a whole bunch of other potentially relevant results. Likewise, my vector query over here for this embedding matched The Hobbit here, and it matched a Harry Potter movie here: similar concepts, similar themes, and a similar kind of visual style. Lord of the Rings, Lord of the Rings, Rise of the Guardians, which I guess is maybe conceptually similar even though it's a cartoon. The Wailing, I think, just has a visually similar style, but it's a really bad match. You get the idea. So there are some really good results I get from the dense vector search, and some really good results I get from the lexical, or sparse vector, search.

With hybrid search using reciprocal rank fusion, we can essentially take each of those separate sets of results and combine them in a way that up-weights things that both the lexical and the dense search found relevant. It moves those to the top and gives us better results overall. You can see that I've matched most of the results over here, so it's better than either the lexical or the dense vector search individually. However, I'm still treating them as entirely separate things: I'm doing the lexical search, I'm doing an embedding search, and then I'm combining them together.
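The reciprocal rank fusion step described here can be sketched in a few lines. This is a minimal version: the document IDs are hypothetical, and `k=60` is the constant conventionally used with RRF.

```python
def rrf_fuse(result_lists, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1/(k + rank)."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked lists from the two separate searches.
lexical_hits = ["hobbit_1977", "hobbit_2012", "good_bad_ugly", "the_abyss"]
dense_hits   = ["hobbit_2012", "harry_potter", "lotr_1", "lotr_2"]

fused = rrf_fuse([lexical_hits, dense_hits])
# "hobbit_2012" appears in both lists, so it rises to the top.
```

Documents found by both searches accumulate score from both lists, which is exactly the up-weighting of agreement described above.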
But in reality, there are lots of ways to merge these different paradigms, and even beyond just the embeddings we're getting from text. For example, we can use a text encoder to generate embeddings for text, we can take images and generate embeddings for those, and we can also take user behaviors and generate behavior-based embeddings and combine those together. And there are different ways to generate new vector spaces: you can concatenate them together, you can do dimensionality reduction, or you can stack them. I'm not going to get into those today, but the reality is we've got a lot of tools at our disposal to query data and relate it in different ways.

In fact, what I described for hybrid search with RRF is just scratching the surface of what we can do by combining different paradigms. This spectrum here on the left is token matching, or traditional lexical search; things like TF-IDF matching fit over here. On the opposite end of the spectrum we've got dense vector search. And of course RRF would fall in here, in this hybrid of sparse retrieval and dense vector search, where we're running them independently, in parallel, and combining the results.

But there are also mechanisms where we could, for example, run sparse retrieval first and then re-rank using dense embeddings, or something like miniCOIL, which I mentioned Jenny from Qdrant is going to come talk to us about in the AI-Powered Search course. You can actually run a sparse search and have embeddings that add additional semantic data to your lexical queries, to better leverage semantics as part of your sparse search.
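One simple way to combine embeddings from the different spaces mentioned here is concatenation. A sketch, with arbitrary dimension sizes; normalizing each part before concatenating is one way to keep a single modality from dominating the combined space.

```python
import numpy as np

def concat_spaces(*vecs):
    # Normalize each modality's vector, concatenate them into one
    # combined space, then normalize the result for cosine search.
    parts = [np.asarray(v, float) for v in vecs]
    parts = [p / np.linalg.norm(p) for p in parts]
    combined = np.concatenate(parts)
    return combined / np.linalg.norm(combined)

# Toy vectors standing in for real encoder outputs.
text_emb = np.random.rand(8)       # e.g. from a text encoder
image_emb = np.random.rand(4)      # e.g. from an image encoder
behavior_emb = np.random.rand(6)   # e.g. from matrix factorization

doc_vec = concat_spaces(text_emb, image_emb, behavior_emb)
```

Dimensionality reduction or stacking, also mentioned above, would then operate on vectors like `doc_vec`.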
There's SPLADE, there are semantic knowledge graphs, there are all these different techniques that we can use to get better search, whether it's hybrid search or leveraging one of the techniques individually. But I want to mention that there are lots of ways to deal with embeddings, and with sparse and dense vectors, and to combine them to improve query understanding and improve recall.

And so one of the things I'm experimenting with, sort of an emerging way to do this, is something I'm calling wormhole vectors. The idea of wormhole vectors is that I've got these different vector spaces: my sparse lexical vector space, which we talked about; my dense semantic vector space; and then, as I mentioned, we can generate behavioral vector spaces, which I'll show in just a little bit. So I want to walk through what this technique looks like, and I do want to frame this talk as something new and emerging. I've got lots of experience doing some of this across different vector spaces, but there are a lot of things I still need to iron out in terms of best practices. So treat this as something emerging, something you can play with. I think the intuition will be really helpful, and in preparation for the course and going forward, I'm going to be doing a lot more in terms of concrete examples for this.

I don't want to get into quantum physics, or physics in general, but wormholes, if you're not familiar, are essentially passages through spacetime. You can think of them as the ability to go from one point in space to another point in space and essentially hop there instantly. I could get into Einstein-Rosen bridges and all that kind of stuff, but I don't really want to for the purposes of today. What I do want to do, though... oh, give me a second. Well, I'll skip over this.
We'll maybe come back to that in the Q&A if Dmitry's interested in talking about this notion of entanglement and how it relates to wormholes; it might be interesting later. But this is about search, not about quantum physics.

So this is what it means to generate a wormhole vector, practically. If you want to generate a wormhole vector, there's a fundamental base reality underlying all these vector spaces: whether I query with an embedding in a dense vector space, or with a lexical query over here, or with IDs and user behavior over here, all of those queries ultimately boil down to matching something. And the something that they match is really critical to how we understand queries and how we understand relevance. What they boil down to is a document set.

If you run an embedding search over here, you find a point in vector space. If it's a dense space, you typically use an approximate nearest neighbor algorithm, or otherwise find the nearest neighbors to whatever point you're querying, and those are your relevant documents. Those documents form a set. You can cut off the threshold at any point to say these are the documents that matched, but that set of documents collectively has some meaning, some relationships within it, that represents the meaning of that query. Likewise, if I do a keyword search, I find a document set, and the collection of those documents represents the meaning of that query, at least as we've been able to represent it in that vector space. Same thing over here.

So the idea of a wormhole vector is this: if I want to query in one vector space and find a corresponding region in another vector space that shares the same semantic meaning, then I'll query in the current vector space, for example with an embedding here. Actually, let me start here: I'll query in the sparse lexical vector space, and I will then find a relevant document set.
This is what search does: it finds a document set. Then I derive a wormhole vector to a corresponding region of another vector space. So, for example, once I've found a document set over here, I use that document set to generate what I'm calling a wormhole vector: a vector that allows me to query in the other vector space, to hop or traverse to it instantly, landing in a region that shares a similar semantic meaning to the region in the lexical space. Once I've found that vector, I run the query in the other vector space to traverse there, and then I repeat as needed. So I can actually hop back and forth between vector spaces to find and collect documents, to better understand them, and then to use that understanding to return those documents as the full set of search results.

Let me actually show this visually for a second. Let me click here and restart this demo. Imagine I have a sparse vector space over here on the left. The way this works is I send a query in. This query finds a set of relevant documents in this vector space, and once it's found those documents, it uses them as, essentially, a wormhole. Let me pause this for a second; oh, maybe I can't. Essentially, once I've run that query, I find the relevant documents, which are the things close by in vector space. I then use those to generate a vector, an embedding, that I'm going to run a search for over here in the dense space. And once I run that search, you'll notice that in this example it lands not exactly where these documents are, but very nearby, meaning the collection of these things together, and what's understood semantically about their relationship, maps to this point in the vector space on the right. That then allows me to find other things surrounding it that represent a similar meaning.
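The sparse-to-dense hop just described can be sketched end to end with a toy corpus. Everything below is made up for illustration: the doc IDs, terms, and 2-d embeddings are hypothetical, and raw keyword overlap stands in for real BM25 scoring.

```python
import numpy as np

# Toy corpus: each doc has keywords (sparse side) and an embedding (dense side).
docs = {
    "d1": {"terms": {"cinnamon", "bread", "sticks"}, "emb": np.array([0.9, 0.1])},
    "d2": {"terms": {"cinnamon", "roll"},            "emb": np.array([0.8, 0.3])},
    "d3": {"terms": {"pepperoni", "pizza"},          "emb": np.array([0.1, 0.9])},
    "d4": {"terms": {"churro"},                      "emb": np.array([0.85, 0.2])},
}

def sparse_search(query_terms):
    """Rank docs by raw keyword overlap (a stand-in for BM25 scoring)."""
    hits = [(d, len(info["terms"] & query_terms)) for d, info in docs.items()]
    return [d for d, score in sorted(hits, key=lambda h: -h[1]) if score > 0]

def wormhole_vector(doc_ids, top_n=2):
    """Average the embeddings of the top-N matched docs, then normalize:
    this is the hop from the sparse space into the dense space."""
    pooled = np.stack([docs[d]["emb"] for d in doc_ids[:top_n]]).mean(axis=0)
    return pooled / np.linalg.norm(pooled)

def dense_search(query_vec):
    """Rank every doc by cosine similarity to the wormhole vector."""
    sims = {d: float(np.dot(query_vec, info["emb"] / np.linalg.norm(info["emb"])))
            for d, info in docs.items()}
    return sorted(sims, key=sims.get, reverse=True)

matched = sparse_search({"cinnamon", "bread", "sticks"})  # keyword phase
hop = wormhole_vector(matched)                            # wormhole vector
neighbors = dense_search(hop)                             # dense phase
```

In this toy setup, the churro document surfaces near the top of the dense results even though it shares no keywords with the query, which is exactly the behavior the wormhole hop is after.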
And this is just looking at two vector spaces: a sparse vector space for keywords and a dense vector space for embeddings. But as I mentioned, there's also this notion of a behavioral vector space. The same thing happens here: I can run a query, find relevant documents, and use those as my wormhole. Then I generate this wormhole vector to hop through to the other side, to find the region corresponding to that meaning in either of these other vector spaces. So in this case, if I've done matrix factorization, which is the process you go through when you're doing collaborative filtering for recommendations, then I would hop to the corresponding region over here. So that's the general idea, just described visually. One second; come on, slides up. All right.

And then the next question is: how do we actually create these wormhole vectors? Dmitry, if there are any questions, feel free to interrupt me at any point, but I'll keep going otherwise.

I think we have a couple of questions, but we'll defer them to the end.

Sounds good. All right. So the question now is: how do we create a wormhole vector? There are essentially two types I'm going to focus on right now. Sorry, I lost this. OK. The first is if I'm trying to go to a dense vector space with an embedding. This is very easy: all I have to do is pool, or average, the vectors of the top-N documents. So imagine I run a keyword search, find a set of documents, and rank them. I don't necessarily need to take the entire document set; I could, but if I want a more semantically relevant, or let's just say relevant, set corresponding to that keyword query, I take the top-N documents and generate a new embedding in the dense space that is just an average of their embeddings.
If you go back to the Darth Vader example from earlier, where the puppy Darth Vader is in the middle, it was sort of a combination of the meaning of Darth Vader and the meaning of puppy. Think of this as taking a bunch of documents that each have their own meaning; when I pool them together, I'm creating an embedding that carries the average of those meanings. If I assume the documents I matched on the lexical side have some shared meaning within them, and I take the top documents from that set, then I can hop over to the dense space with that shared meaning and find other things with similar meaning, even if they don't match the keywords.

Likewise, I can go the other direction. If I'm in my embedding space, my dense space, I can run a search and find the top-N most related embeddings by cosine similarity or what have you. Conceptually, it seems more difficult to then hop over to the sparse space: how do you generate a sparse vector? But there's a technique called semantic knowledge graphs, which I'll walk you through, that allows you to do this.

So, zooming back out: I mentioned pooling the vectors of the top-K documents. All you need to do, again, is query in lexical space, get the top-K documents, get the embeddings of those documents, and average them together. This is the simple way to do it, just using NumPy. For the semantic knowledge graph approach, it's the same thing: I get the top-K documents in the current vector space, and then I do a semantic knowledge graph traversal to derive a sparse lexical query that best represents those documents.

So let me talk about semantic knowledge graphs for a second and show you the structure of natural language. You can think of language as a graph of relationships: we've got prefixes and suffixes, and those map to terms, which map to term sequences and documents.
Once you've got documents, and terms across documents, you can think of this as a giant graph of relationships. I can take individual words; in this case, "Trey," "he," "him," they all refer to the same entity. I can take other things, and if I think of this as a graph, then in fact you can leverage your inverted index as a graph and traverse it to find these relationships.

In a typical search engine, like any of the Lucene engines, you have an inverted index, which maps terms to sets of documents, and then you've usually got a forward index; in OpenSearch, Elasticsearch, Solr, any Lucene engine, this is going to be your doc values. So essentially, I can take any term and map it to a set of documents, and I can take any document and map it to a set of terms. If I can do both of those, then that's a graph, and I can traverse back and forth across it.

So, for example, if I have the skill Java in the skill field, then I've got a set of documents containing the keyword Java, and you can think of that set of documents as representing the keyword Java. And similarly, there are links to other documents. You'll notice that there are no documents containing both the skill Java and the skill oncology. In a set theory view it looks like this: notice that this set doesn't intersect with these. And from a graph theory view, the same underlying indexes look like this: I have a graph where the skill Java has a has-related-skill edge to the skill Scala and the skill Hibernate, and oncology is completely disconnected from the graph. All I'm doing is leveraging my inverted index, my sparse representation, to traverse across these relationships. This is very useful for things like disambiguation, where I can take a keyword like server.
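The term-to-documents-to-terms traversal described here can be sketched with a toy inverted index. The skills and doc IDs below are made up for illustration.

```python
# Toy inverted index (term -> doc ids); traversing term -> docs -> terms
# treats the two indexes together as a graph.
inverted = {
    "java":      {"d1", "d2", "d3"},
    "scala":     {"d2", "d3"},
    "hibernate": {"d1", "d3"},
    "oncology":  {"d4"},
}

# Build the forward index (doc id -> terms) from the inverted index.
forward = {}
for term, doc_ids in inverted.items():
    for d in doc_ids:
        forward.setdefault(d, set()).add(term)

def related_terms(term):
    """One graph hop: term -> matching docs -> co-occurring terms."""
    related = set()
    for d in inverted.get(term, set()):
        related |= forward[d]
    related.discard(term)
    return related
```

Here `related_terms("java")` yields Scala and Hibernate, while oncology stays disconnected, since no document links it to the others.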
I can traverse through documents to find the top semantically related categories, for example DevOps and Travel. Then, within each of those categories, I can traverse to other keywords and find which are the most semantically related to server. In the DevOps category I get terms like Docker, Nginx, Jenkins, and Git, whereas in Travel I get things like tip, restaurant, bill, wages, things like that. And all of this just leverages an inverted index. There are no embeddings whatsoever; this is all just leveraging the sparse semantic space.

But why this matters for modeling intent is that if I have a query like "barbecue near haystack" over here, I can generate a sparse vector representing the meaning of barbecue by looking at the index and seeing what's related to it. So in this context, what I find is that barbecue is related to things like ribs and brisket and pork, and the category of restaurant. That is, I can generate a sparse lexical vector like this purely from the things that are semantically nearby barbecue in my sparse vector space. But if you look at the query on the right, "barbecue grill," what I generate is a sparse vector that is barbecue or grill or propane or charcoal. Notice that this vector is now different, because it's contextualized by grill being in the query. So now my query becomes a category of outdoor appliances, and this is the list of words that better represents the meaning of barbecue. Again: no embeddings, no transformer models, no LLMs involved here. This purely leverages my sparse lexical space and the semantics within it.

This is some example source code from the AI-Powered Search book for traversing semantic knowledge graphs. But the idea with wormhole vectors is that I can take a query in any vector space. So, for example, with a lexical query here, I can easily take "lasagna" or "drive-thru" or what have you.
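The contextualized expansion described here (barbecue alone versus barbecue grill) can be sketched with simple co-occurrence counts over a toy corpus. Real semantic knowledge graphs score relatedness statistically rather than with raw counts, and the docs below are invented; this only shows the mechanism.

```python
from collections import Counter

# Toy corpus: each doc is a bag of terms.
corpus = [
    {"barbecue", "ribs", "brisket", "restaurant"},
    {"barbecue", "brisket", "pork", "restaurant"},
    {"barbecue", "grill", "propane", "charcoal"},
    {"grill", "charcoal", "patio"},
    {"restaurant", "pizza"},
]

def expand(query_terms, top_k=3):
    """Contextualized expansion: count co-occurring terms only in docs
    that match ALL query terms, so {'barbecue', 'grill'} expands
    differently from {'barbecue'} alone."""
    matched = [doc for doc in corpus if query_terms <= doc]
    counts = Counter(t for doc in matched for t in doc if t not in query_terms)
    return [t for t, _ in counts.most_common(top_k)]
```

With this corpus, `expand({"barbecue"})` pulls in food terms like brisket and restaurant, while `expand({"barbecue", "grill"})` pulls in propane and charcoal instead, mirroring the two sparse vectors on the slide.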
And I can generate these representations over here by taking "lasagna," finding a doc set that matches that keyword, and then, from that doc set, finding these other relationships. For example, lasagna can be described with the category Italian and keywords like lasagna, alfredo, pasta, and Italian. Korean barbecue can be represented with the category Korean and terms like Korean, Bonchon, and so on, and fast food gets things like McDonald's. This purely leverages the inverted index and the document set, and I've been doing this for years; it works very, very well.

But the idea with wormhole vectors is not just to stay within a single vector space, but to go across vector spaces. Similarly, I should be able to take an embedding that finds a region in a dense semantic vector space, find the nearby things, which ultimately just translate to a doc set, and then, from that doc set, use the same technique to ask what's related within those documents and generate the same kinds of outputs over here.

You can also think of this, taking away all the wormhole vector terminology, as just a way to make your embeddings more explainable: I've got an embedding, I go to a dense vector space, I find documents, and from that set of documents I'm now deriving a lexical vector, which is readable and describes what's happening there. And of course I can then turn around and query with it in my sparse space to match other things that have those terms, but maybe didn't match in the dense space. So that's the general idea.

There's one last thing I wanted to cover briefly, which is this notion of behavioral embedding spaces, because I've mentioned it multiple times and I have a feeling a lot of people aren't super familiar with it. So let me click here.
The general idea, and I'll be very quick through this; we'll spend more time in the AI-Powered Search course diving into all of it. The very high-level intuition is that users interact with your documents: they run queries, they click on things, they like them, add to cart, purchase. Those are user behavioral signals, and if you've got a sufficient amount of traffic, you want to be collecting those and leveraging them to build reflected intelligence algorithms.

Some of the types I mentioned earlier: signals boosting, collaborative filtering and matrix factorization, learning to rank, and knowledge graph learning. But specifically on collaborative filtering, which is mostly focused on personalized search, on understanding user behavior to generate better personalized results: we typically leverage collaborative filtering, which is an algorithm for doing recommendations. So I start with a particular item, or a particular user, and I recommend other items based on that item or user.

This is what that typically looks like, right? Somebody searches for or purchases things like Apple and MacBook, and these are the items they interact with, iPads and MacBook Airs, things like that. And then for that user, we can generate this list of recommendations by running the collaborative filtering algorithm.

In this case, I want to briefly mention again that with typical content-based embeddings, I mentioned latent features earlier. Typically you have items, and there are these densely packed dimensions that represent different features. Collectively, this particular dimension might have a strong correlation with size, this one with color, this one with "is this a computer?" But those meanings are spread across many of these dimensions. Similarly, collaborative filtering also relies on latent features, or latent dimensions.
So, for example, if I have a bunch of users: my first user likes these three movies, my next user likes these three, my third user likes these three, my next user likes these three, and my last user likes these three. You can visually see there's some similarity here and some similarity here; your brain's probably picking out what it is. If I were to map these conceptually, I would say that users one, two, and three tended to like movies about superheroes, made by Marvel Studios and occasionally Warner Brothers; they're all action movies, and they're not suitable for small children. Whereas users four and five all liked animated movies, all suitable for small children, and all made by Disney and Pixar.

A collaborative filtering algorithm discovers these relationships and recommends based on them, because they exist in the underlying data even though we haven't modeled them out explicitly. And the way this works with collaborative filtering is matrix factorization. We start with a user-item matrix: here's my list of users, here are my items, and these values are the degree to which each user likes each item. We can derive this just from their query and click patterns. The intermediate step for collaborative filtering is matrix factorization, which takes this underlying user-item interaction matrix and breaks it into two matrices: a user-feature matrix and an item-feature matrix. The idea is that I can generate a set of latent values associated with each user across some number of dimensions; I'm only showing three here visually, because it's a PowerPoint slide, but there will be more.
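The factorization step can be sketched with NumPy's truncated SVD, as a simple stand-in for the ALS or SGD solvers typically used for collaborative filtering. The interaction matrix below is made up to mirror the two taste clusters in the example (three superhero fans, two animation fans).

```python
import numpy as np

# Toy user-item interaction matrix (rows = users, cols = movies); 1 = liked.
# Users 0-2 like superhero movies (cols 0-2); users 3-4 like animated
# movies (cols 3-5).
R = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Factor R ~ U @ V.T with k latent dimensions via truncated SVD.
k = 2
u, s, vt = np.linalg.svd(R, full_matrices=False)
U = u[:, :k] * np.sqrt(s[:k])      # user embeddings (5 x k)
V = vt[:k, :].T * np.sqrt(s[:k])   # item embeddings (6 x k)

approx = U @ V.T  # reconstructed preferences
```

With two latent dimensions, the reconstruction keeps each taste cluster intact (a superhero fan's predicted score for a superhero movie stays high, and their score for an animated movie stays near zero), and the rows of `V` are exactly the behavioral item embeddings the talk goes on to index.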
And if I have the same latent dimensions over here for the items, then when I multiply a particular user's values on these latent dimensions with a movie's values, I'm pulling apart how much of the interest belongs to the movie and how much belongs to the user. But at the end of the day, these are embeddings: this is a user embedding and this is an item embedding. That's just how collaborative filtering works to generate recommendations for particular items, which isn't particularly useful for today. What matters is that I can generate these latent embeddings, and they essentially allow me to create a behavioral embedding space for my items.

So once I've done that, I can add these behavioral embeddings onto documents, just like I do with content-based embeddings, whether from images or text or what have you, and then leverage them as a behavioral space. We commonly do this with personalized search, for example; we'll go through this in the course. If I have a person who previously searched for a Hello Kitty plush toy, a GE electric razor, GE bright white lightbulbs, and a Samsung stainless steel refrigerator, I can take a normal keyword query for "microwave," which just returns random microwaves. For most people who have searched for a Samsung stainless steel refrigerator, the best result here would be a Samsung stainless steel microwave. But if I use these vectors improperly, with no guardrails, I might blur the lines between categories: with the naive approach, I might end up with a Hello Kitty microwave, or with things that don't match the category preferences at all. That's a topic for another day, but this is how a behavioral vector space would typically be used.
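One simple way to use a behavioral embedding at query time is to blend it with the query embedding. This is a sketch, not the talk's method: the `alpha` weight is a hypothetical guardrail that keeps the query dominant, which is one way to avoid the "Hello Kitty microwave" failure mode of over-personalizing.

```python
import numpy as np

def personalize(query_vec, user_vec, alpha=0.8):
    # alpha near 1 keeps the query dominant; the user's behavioral
    # embedding only nudges results toward their preferences.
    q = query_vec / np.linalg.norm(query_vec)
    u = user_vec / np.linalg.norm(user_vec)
    blended = alpha * q + (1 - alpha) * u
    return blended / np.linalg.norm(blended)

# Toy vectors: the query is about microwaves; the user's history leans
# toward a different region of the behavioral space.
microwave_query = np.array([1.0, 0.0, 0.0])
user_history = np.array([0.2, 1.0, 0.0])

personalized = personalize(microwave_query, user_history)
```

The blended vector stays much closer to the query than to the user history, so personalization reorders within the microwave category rather than replacing it.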
But ultimately, there are a lot of tips and tricks you can use in AI-powered search to combine all of these different techniques for running searches and for query understanding and relevance, and to integrate wormhole vectors in various places. So there are lots of different query paradigms to experiment with and to merge using wormhole vectors. But that's the general idea I wanted to introduce today, to get the discussion going: moving from thinking of these vector spaces as entirely orthogonal, where I have to query them separately, or maybe query them in the same request but filter on them separately, toward actually pulling the semantic understanding out of one vector space and using it to craft a wormhole, a hopping-off point, into another vector space, so I can continue exploring with a different query paradigm.

So that's pretty much it for the talk today. Dmitry, I don't know if you want to start diving into some questions. I know some people will have to hop off at the top of the hour, since this is scheduled for an hour, but I'm also happy to keep going with questions a little after if it makes sense, and people can drop off when they want. But let's maybe dive into some discussion.

Yeah, we have a bunch of questions. Thanks, Trey. This is a fantastic topic. I just recently traveled to Texas from Finland and it took me like 12 hours; I wish there were wormhole jump points so I could end up there much quicker. We have so many questions. So I'll defer my own questions and jump right in. There's one logistical question from Ardune, and I hope I pronounced your name correctly, I'm sorry: how is the course different from the AI-Powered Search book? And, later: is this wormhole vectors topic covered in the book?

Awesome. OK, so I would say there's probably about a 40% overlap.
The book is a good, solid foundation for how to think about AI-powered search. Obviously we go through all the mental models and lots of code examples, so a lot of the labs and a lot of the code for the course will come from the book. However, there are a lot of new topics and things we just couldn't fit; we couldn't write a thousand-page book, and there's a lot we couldn't get to because we had to start from the beginning and frame it. So things like late interaction models and agentic search aren't in the book (late interaction models are referenced, but we just couldn't go into depth), and those are more modern, interesting ways to solve problems. Things like miniCOIL, which I mentioned, will be in the course, unique to the course, and we'll have guest speakers who are experts in those things.

So I would say the course doesn't expect you to have read the book or to understand the fundamentals in the book. We'll cover those, but we won't cover everything in the book, and we'll also cover a lot of things that aren't in the book, in greater depth. So if you've read the book, the course is still going to be really valuable. And even if you can't make all the sessions, again, the videos and all the materials are available to you forever. So you don't have to have read the book to take the course, but if you have read the book, the course will still be massively useful.

Yeah, so they complement each other. And by the way, I own the book and it's an amazing read. The course is a different, more dynamic way of engaging with the material.

Well, I didn't answer the last part, which is: will wormhole vectors be covered? They will definitely be covered, especially the techniques and strategies for hopping back and forth between spaces. Some of it's actually in the book already; the semantic knowledge graph material is in the book.
But yes, we'll definitely talk about wormhole vectors explicitly and have some more specific examples people can play with.

Yeah, awesome.

And I do want to mention this is experimental and emerging, and there are some things I glossed over today, like hopping to a particular point versus trying to hop to a region with more of a shape, that we could chat about as well. But yeah, there are still some things I'm working through to better understand and refine it.

Yeah, awesome. I'm trying to speed up, but there's a question from Claudiu: what are the latent features? Basically where you switched from explicit feature matrices to latent features. Maybe I can take it quickly. Basically, in an LLM, if you deal with an encoder model that generates embeddings, these are internal representations that the model learns. They're compressed, abstract ways of capturing patterns and relationships in your data. It's not exactly that black and white, but on a conceptual level, these are internal weights that the model learns.

Then there's a question from Julien, a very concrete one: can you give a concrete example of how to compute the wormhole vector from sparse to dense space?

Yeah, so I had a slide for this. Sparse to dense is the easy one; let me go back to the slide. One second, almost there. Here we go. To go from sparse to dense, think of it this way: you've got a bunch of documents in your index, and you generate embeddings for those documents. That's how your dense space is constructed, right? Those embeddings live on the documents. If you query for the documents using keywords in your sparse space, you're still matching that same set of documents, and all of those documents have their embeddings on them. So all you do is run a keyword search on your documents and take the top-N most relevant, right?
They hopefully semantically represent the concept the best. Then you take their embeddings and literally average them together. The code for that is on the screen right here; you just generate this pooled embedding. It's that notion of Darth Vader versus puppy, and finding the puppy Darth Vader in the middle, right?

If someone were to run a keyword search, it's easy to think about with a single keyword, but let's go back to, say, cheese pizza. If I search for pizza, I'm going to match a bunch of pizzas. Actually, maybe cheese pizza is a bad example because all pizza has cheese; let's do cinnamon bread sticks. If I search for bread, I'm going to find bread: documents have the word bread. If I search for cinnamon, I find documents with cinnamon. If I search for sticks, I find documents with sticks. Sticks by itself isn't really what I'm looking for, but if I search for cinnamon bread sticks, then I'm finding all the documents that have those terms together, which likely are cinnamon bread sticks, or have the notion of cinnamon bread sticks, or are talking about them.

So if I take all of those documents, the most relevant ones, and average their embeddings together and go over to the dense space, where I land should be where the concept of cinnamon bread sticks lives, and things nearby, which may not have the words cinnamon, bread, or sticks in them, should come back. I might get certain kinds of doughnuts; I might get a churro or something like that. So that's how it works, and this is the math here. That's actually the easiest way to go from sparse to dense. Dense to sparse requires a semantic knowledge graph or similar.

Awesome. I hope this answers your question, Julien. If not, feel free to unmute and ask follow-up questions. Otherwise, I'll jump to the next one, from Ursula.
+Do we build the inverted index and the forward index to build the knowledge graph using just some document chunks? Or do we need a much bigger document base to make it work? It's a good question. Mm-hmm. +So for a semantic knowledge graph to work the best, you need to have overlap of terms across documents. +Meaning, if I take something like Stack Exchange, where there's a bunch of topics being talked about, you'll have lots of people who use the same words together in the same documents. +When that happens, you can easily find sets of terms that overlap commonly and use the semantic knowledge graph to generate semantic understanding and relationships based upon those co-occurrences. All the math for that's in the AI-Powered Search book, but that's when it works the best. +Something like Wikipedia, even though it's commonly used for every data science project, is actually really bad for semantic knowledge graphs, because every Wikipedia document tends to be about a particular topic, and other than common terminology, you tend to not have a lot of overlap across documents, because they're all focused on one idea. +So for a semantic knowledge graph to work well, you typically are going to want to have overlap across your documents. What that means is that if you chunk your documents so small that you only have a couple of words or sentences or something like that, you lose a lot of that context. +I mean, in general, when you chunk, you lose context. That's the problem with chunking, with most forms of chunking. +And so you have to be careful not to chunk too much, but the inverse is also true: if you only have 100 documents and every single one of them is 1,000 pages long, well, there's way too much overlap and everything is related at that point. +So I would say it's no different than just how you would typically segment your documents for any search problem.
You need to be granular enough to be useful, but not so broad as to be too general. Yeah, awesome. And now the logistical question from Arun, whether we will share slides. +Yes, absolutely. Yeah, the video for this, everybody who signed up will get it: in probably like 48 hours, you'll get emailed a copy of the video, so you can refer back to it. And I'll also send an email with the slides probably shortly thereafter. +Yeah, and I plan to publish this in the Vector Podcast as well. Yes, absolutely. I'll follow up later. The next question is really cool, from Claudiu. You're creating a wormhole vector that will move us from embedding space to sparse vector space. +I understand the methodology, but the way back now: how do we aggregate a set of sparse vectors that represent documents in a way that will allow us to move to the embedding space? In other words, in the sparse space, like you showed, Trey, we have millions of dimensions, right? +How do we compact that, right? And not lose anything and not introduce any noise when we move over to the dense space? Yeah, so it's a really great question. +I answered it technically in terms of pooling, but let me add some color to it in terms of techniques. So the... There's a couple of things here. +One, whenever you're querying an inverted index, there's typically a kind of Boolean matching phase, and then there's a ranking phase. Meaning, if you had 10 million documents in your index, you're not gonna return a ranked list of 10 million documents. +You're gonna probably return the documents that have the specific keywords you searched for, which is gonna be a much smaller document set. And you can do the same thing on the dense side with cutoffs on cosine similarity or something like that. +But, step one is you start with a condensed document set that should represent generally the meaning of what you searched for, using the keywords you searched, from the lexical side.
+However, because the idea of a wormhole vector is to find the best corresponding region in the other semantic space, it can often be useful to not take that entire document set that's matching the query; if you feel confident about your ranking, then you can take the top N documents. +So maybe you match 10,000, and maybe you only take the top hundred and say, hey, from the top hundred, if you know your relevance ranking is good, then you're gonna use that to generate a more precise wormhole vector to the meaning of those top documents over to the dense space. +So, whether you go with the full matching document set, or you go with the top N, that's really just a practical matter of how confident you are in the ranking. If you're really confident in your relevance, you should go with the more relevant documents. +And if you're not, just take the whole document set and it should sort of average out the meaning. One other thing that we didn't really get into is the strategy, the technique I was showing. If I, let me jump back to the final slide one second. If I jump back to... Here. +So the technique that I'm showing, where I get my document set and pool my embeddings together, that ultimately gives me a single embedding, which is a single point over here in my dense vector space. The reality is that different queries have different specificity. +So imagine this is like a job search engine. If I run a search for senior AI search engineer with late interaction, ColBERT, signals boosting and collaborative filtering. If I run that search, that's a very specific search. +Frankly, it probably doesn't match anybody, but if I ran that search, it would be a very small number of documents. It's very specific. And so in that case, having a point kind of makes sense. However, if I ran a search for sales, that's like a third of all jobs.
+ And for me to take the notion of sales, which is probably a giant region in this vector space with lots of nuance inside of it, and to then turn that into just a point in the other vector space, it's probably not gonna work out super well, because sales is probably distributed across that other vector space in a much larger region. +And so there's this notion of query specificity, which is also really useful. So I would actually argue that the better way to do this technique is as part of your initial query, when you're sort of finding the set of documents. + If you can look, for example, at the embeddings and do just, like, a cosine similarity across the embeddings that you're pooling, you can go from a bunch of embeddings that are just pooled together into a point to actually saying, what is the relative size of the range of the cosines within these embeddings? +And if it's a very large range, I understand that this is not a very specific query, it's a broad query. +Therefore, when I go query in the dense space, I need to draw a larger radius or a larger kind of shape around what I'm searching for. +So ideally, you're actually searching for a shape and not just a point, but literally every vector search implementation I've seen at any company is searching on embeddings as points and just looking for the nearest things, not searching on shapes. +And so we don't even really have the query patterns and paradigms in place today to do that kind of a query. But I think that would be a further improvement on the paradigm here. Awesome. Yeah, Tim Allison says thank you. Thanks, Tim. The next question is from Julian. +Can you recommend any papers or other material to explore the topic further? So, not really. The wormhole vector thing is something I kind of came up with. I will say, well, two things. One, semantic knowledge graphs.
+I actually was the lead author on the original semantic knowledge graph paper back in, like, I don't know, 2016 or whenever it was published. So for this notion of being able to jump between spaces back to a sparse space, you could look at that paper if you want an actual research paper. +I've also given lots of talks about it. It's in the AI-Powered Search book. It'll be in the course. However, for the notion of taking, running a query and pooling vectors together, and even the notion of query specificity, that cosine similarity thing: +if you look at Daniel Tunkelang, he actually did a lightning talk with us a week or two ago on query understanding. He actually talks about this notion of a bag of documents to represent a query. It's functionally the exact same thing. + So if I run a query and think of the query's meaning as being represented by the set of documents that match that query, then to take that set of documents that holds that meaning and pool the embeddings to create an average embedding that represents that meaning in embedding space, it's functionally the same thing that Daniel describes when he talks about bags of documents. +So I would say look at Daniel's work, look at the lightning talk he gave a week or two ago with us. Those are some good resources. And of course, the book and the course. Yeah, awesome. Maybe at some point a paper as well, right? Yeah, it's definitely possible. +I need lots of good evals on how this actually does in practice. Yeah, absolutely. Arun, that's the same question, so I'll skip that. Mustafa is asking about iPhone query cases: so chargers may also appear. Would it be correct to take the average of them? So, good question. +So yeah, lexical queries work really well when you've got particular terms you're looking for, whether it's an ID or whether it's a keyword. They don't work as well with, like, semantic meaning.
+Whereas in a dense space, obviously you query on meaning, but if you try to search for a product ID in a dense space, unless you've fine-tuned it for that, it's gonna do an awful job. + And so, in the case of, like, searching for iPhone and getting iPhone cases, this somewhat gets back to what I said earlier: ideally, if you take the top N documents that are the most relevant and you limit to that, like, if your ranking algorithm can already sort of understand that when someone searches for iPhone they mean an actual iPhone versus a case, that's a better way to go, versus just anything that matches the term. +That being said, what you can do is, for example, in that case, search for iPhone, find the iPhone cases along with iPhones, get that average vector, and then there's still this region that, along certain dimensions, is associated with iPhone. + If you hopped over to the behavioral embedding space, what you're gonna find is that, hey, these cases are very highly correlated to these items, the iPhones that actually correspond to those cases, so that might be a case where you would want to hop to the behavioral space and leverage what's there. +There's also, just a note: we've talked about taking entire queries and hopping between spaces, but there's also a line of thinking and practice here around using this for query understanding, not just ranking. And so you could, for example, split the query into individual keywords: +iPhone, like, just the word iPhone, and you could also search on the dense space, and you could try to take the individual pieces and find things related to them, and then leverage that for query understanding to hop back and forth between spaces. +So the answer is you still have the fundamental limitations of each space, but imagine if somebody searched for: I want a phone that's really good at blah, blah, blah, that's made by Apple, with product ID X, right?
+ You can imagine trying to search for that, like, literally on the lexical side, and you'll actually match that ID and probably have it come up at the very top. And then you can imagine searching for that embedding in the dense space, and you can imagine, for each of those, hopping back and forth and trying to see what documents are there, a couple of times. +So, long story short to answer: there are a lot of different ways that you could leverage this technique to be hopping back and forth. + I'm not gonna claim that I've thought through every single one of them, and there are lots of ways to do it, but I think as an introduction to the topic, and as a tool that you can add to your tool belt, to be able to get explainability in another vector space based upon what you found in the first vector space, I think this is a really cool technique, and I just wanted to kind of present it and get feedback and enjoy this discussion. +Yeah, thanks, Trey. We're quite over time, thanks everyone for staying on, and hopefully, if you can still stay on, we can get to the bottom of the list. +Yeah, and by the way, to your answer, Trey, I think somewhere there is probably a notion of search result diversity as well, right? So even if the user types iPhone, they may only mean the phone, but they actually may mean something else, right? +So showing diverse results, and then traversing to the other side with those diverse results, could also make sense. +Absolutely. Then Arjun is asking, if I summarize the question: is there a cheaper way than using a semantic knowledge graph? Maybe the fear is that the graph approach is computationally expensive, is there some cheap way to... +It's less computationally expensive than running an embedding model, typically. But it just depends. Yeah, I mean, there are other techniques. Like, if you have a fine-tuned SPLADE model, for example, it can give you very comparable kind of semantic understanding on the sparse side.
+The problem with that is you have to fine-tune it to your data. And also, one of the benefits of the semantic knowledge graph is that, I'm just gonna quickly jump to the slide and show you this one. Let me do the one that's got keywords, here we go. Share here. +With the semantic knowledge graph approach, you have the ability to not just represent the query with a bunch of terms with values, but you can actually use any field. +So it's really useful to be able to describe it with a category of Korean and a bunch of terms here, and maybe you've got other fields on your documents that are really useful for describing the document, a taxonomy of some sort. +The semantic knowledge graph gives you a much richer ability to turn the set of documents into a fully expressive query. So yeah, there are other techniques, you could look at SPLADE, things like that, but nothing that's nearly as expressive. +Yeah, and these are the concepts you're gonna cover in the course, right? Yeah, we'll cover it all in the course. For those who are interested to learn more. +Sorry, not trying to make this talk just a big promo for the course, but wormhole vectors by themselves are a really interesting topic, and yeah, obviously I would love it if you would join us in the course, it'll be fun. +Debo Brata is asking, what do you think about some of the reputable search sites, like Indeed, LinkedIn, where searching for a male engineer will bring you results like data engineers and whatever, unrelated stuff, not directly related stuff. +And so the question is, why do they search documents not based on the entire user query, right? Only part of it. Sorry, I'm trying to understand the question in relation to the wormhole vector topic. Yeah, I think it's less directly related. +I think it's more erring on the side of, why is the data biased? Why do reputable search sites not sort of utilize semantic search, you know, one to one in a way? Yeah, I got you.
+I mean, the reality is that most AI-powered search algorithms, really all of them, use data, and the data is biased, right? So, like, the reality is, in the world, if you look at data engineering jobs, they are statistically skewed towards more males being in those jobs than females. +That doesn't mean anything in terms of who can do the job or who can't. It's just a reality that, you know, there tend to be more males in engineering, and therefore the data is reflecting that. +It would be nice to be able to take those biases out, and in fact, there are ways you can do that, but they're extra work. And so the out-of-the-box algorithms that are typically employed don't necessarily try to tackle those biases. + So yeah, I think it's valiant to, you know, try to, especially when you're dealing with things like people's livelihoods and, you know, careers and things like that, I think it's a great exercise and something they should focus on, but it's, unfortunately, kind of a reality of the underlying data that's being bubbled up, I think. +Yeah. My take is that I- The data is biased. Yeah, I agree. + I worked at one job search engine, like, a couple companies ago, and my take is that probably these companies are trying to avoid, you know, these traps where your super, super precise query will either lead to nothing or lead to just a couple of jobs on the screen, because their business is to show you as many jobs as possible so that they can monetize that. +So maybe it's all about the business element as well. But I'm sure there are other technical aspects of this, which we should not disregard. Sure. The next question is from Arjun. In your experience, how much do the following results differ? First, querying against the dense vector space directly. +And second, querying in the sparse vector space and wormholing to the dense vector space, and finally getting the docs that are similar to the wormhole (average) vector. So that's the question. Here's the answer.
+ I mean, at the end of the day, if you have a query that you run against your lexical space that matches mostly documents that are related to the query, and then you hop over to the dense space, you're typically gonna get a lot of overlap, because the lexical space semantics are going to be very similar to the dense vector space semantics in terms of, like, the underlying meaning. +If you were to take the lexical space, and I should mention, you can actually use the wormhole vector in the same vector space. I kind of showed that with, like, taking a query like lasagna and then rewriting it with a more expanded-out lexical query with a category of Italian. +And so you don't have to actually jump between different vector spaces. You can even jump within the same vector space. +And I think that in this context, the more similar the meaning of the underlying set of documents matching each query, the more interesting the missing links you're gonna be able to find in the other vector space. +If you have very orthogonal queries, like, you can imagine on the lexical side searching for orange juice and Nintendo Switch, right? You'll get nothing for that, but orange juice OR Nintendo Switch, +well, you basically end up with a document set that is really two separate document sets, right? + It's like there's not a lot of overlap, and if you hop over to the dense space and get the average of those, there's still gonna be things that are, like, probably close to Nintendo Switch and probably close to orange juice, but the more different those things are, you might get some weird stuff in between. +Because you're now looking across two different places, any Nintendo Switch stuff that's orange or related to juice or something might show up, but it's gonna be weird. And so this isn't, like, a magical silver bullet that solves every query understanding or every relevance problem.
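The query-specificity idea from earlier, and the orange-juice-versus-Nintendo-Switch case here, can both be detected before pooling by looking at the spread of pairwise cosine similarities among the matched documents' embeddings. A minimal sketch, assuming L2-normalized embeddings (the function name is illustrative, not from the talk):

```python
import numpy as np

def query_specificity(embeddings):
    """Minimum pairwise cosine similarity among matched documents.

    embeddings: 2-D array of L2-normalized document embeddings.
    Values near 1.0 suggest a tight, specific query; low or negative
    values suggest a broad or orthogonal one, where a single pooled
    point would misrepresent the result set.
    """
    sims = embeddings @ embeddings.T                # pairwise cosines
    upper = np.triu_indices(len(embeddings), k=1)   # off-diagonal pairs
    return float(sims[upper].min())
```

A low score would be a signal to search a larger radius, or region, in the dense space rather than a single nearest-neighbor point.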
+It's just another tool in our toolkit to be able to better reason about the underlying documents and queries, and to explain queries in another modality, if you will, another query modality. Yeah, in other words, what you're searching for should still kind of make sense. Yeah, yeah. +So if it does, it probably will return some useful results. Tips says thank you. Thank you, Tips. +Rustem asks about the impact of document segmentation: basically, what are the suggestions to improve that so that wormhole vectors would be useful? Yeah, I think- Especially for large documents. +Yeah, I mean, I think it's common sense, like, the same way that you would chunk documents for doing RAG. You know, I happen to think, you know, if the documents are too big, then there's too much loss of specificity and too much context being blurred together. +And if the documents are too tiny, then you're losing context, and they're too specific, and there's not enough overlap. +So I think, you know, typical, like, whatever your domain is, you know, I mean, if you've got giant PDFs that are books, maybe break into the chapters, or possibly even sections of chapters for the large sections, but yeah, just, like, use common sense. +What's a reasonable size document that represents the meaning of something that is, sort of, call it integral, like a whole thing, a whole concept. Yeah, yeah. It depends on the domain, but I would just say, you know, your common sense is probably going to take you far on that one. +Yeah, and, like, for long documents that are, like, 1,000 pages, for sure you want to do that. And maybe the last question is from Arjun. Can this idea of wormhole vectors give us more serendipitous results? Give us more what? Serendipitous results. Yeah, absolutely. +So yeah, think of just the behavioral space, right?
If I run a query, a keyword query, and then I want to find other things that are related to this that don't match the terms, and maybe don't even match the meaning, but, like, user behavior has said that these things I should suggest, +I'm basically infusing recommendations then. If I hop over to the semantic, the dense space, then I take my keywords and I'm finding other things that share meaning, but don't necessarily have that keyword. If I'm starting with dense and I hop over to the lexical side, +I'm making sure that I'm finding things with that meaning, but I'm adding in keywords that were completely ignored by the dense space. That one's not necessarily serendipitous. That's just, like, fixing problems. But I would say going from lexical to semantic +will more likely get you things that were dismissed. But yeah, for actual serendipity, the behavioral space is probably going to give you a lot more magic there. All right, awesome. I think it's a wrap. Thanks so much, everyone. +Trey, thank you so much for the presentation, for the idea, and for powering through the questions with such immense speed. Thank you all for your time. This was awesome. Awesome. Thanks, Dmitry, really appreciate you coming on. This was awesome. Thanks to everybody for joining. +And yeah, the video and slides and everything will be coming out to you. And I hope to see you soon. Thank you. See you soon. Bye bye. Bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md b/transcripts_with_timestamps/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md new file mode 100644 index 0000000..7ad5eb9 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/vector-databases-the-rise-fall-and-future-by-notebooklm.md @@ -0,0 +1,2211 @@ +--- +description: '

https://www.vectorpodcast.com/

I + had fun interacting with NotebookLM - mostly for self-educational purposes. I think + this tool can help by bringing an additional perspective over a textual content. + It ties to what RAG (Retrieval Augmented Generation) can do to content generation + in another modality. In this case, text is used to augment the generation of a podcast + episode.

This episode is based on my blog post: https://dmitry-kan.medium.com/the-rise-fall-and-future-of-vector-databases-how-to-pick-the-one-that-lasts-6b9fbb43bbbe

Time + codes:

00:00 Intro to the topic

1:11 Dmitry''s knowledge in the space

1:54 + Unpacking the Rise & Fall idea

3:14 How attention got back to Vector DBs + for a bit

4:18 Getting practical: Dmitry''s guide for choosing the right Vector + Database

4:39 FAISS

5:34 What if you need fine-grained keyword search? + Look at Apache Lucene-based engines

6:41 Exception to the rule: Late-interaction + models

8:30 Latency and QPS: GSI APU, Vespa, Hyperspace

9:28 Strategic + approach

9:55 Cloud solutions: CosmosDB, Vertex AI, Pinecone, Weaviate Cloud

10:14 + Community voice: pgvector

10:48 Picture of the fascinating future of the field

12:23 + Question to the audience

12:44 Taking a step back: key points

13:45 + Don''t get caught up in trendy shiny new tech

YouTube: https://www.youtube.com/watch?v=403rxbWZK9Y

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20250302_080303_04f8b5f2665529faa9d13569b97a18c9.png +pub_date: Sun, 02 Mar 2025 08:27:58 GMT +title: 'Vector Databases: The Rise, Fall and Future - by NotebookLM' +url: https://rss.com/podcasts/vector-podcast/1922013 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 1.24, "text": " Welcome + back everybody.", "tokens": [50364, 4027, 646, 2201, 13, 50426], "temperature": + 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 1, "seek": 0, "start": 1.24, "end": + 5.72, "text": " Today, we''re gonna be diving into the world", "tokens": [50426, + 2692, 11, 321, 434, 799, 312, 20241, 666, 264, 1002, 50650], "temperature": 0.0, + "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, "no_speech_prob": + 0.01761999912559986}, {"id": 2, "seek": 0, "start": 5.72, "end": 7.2, "text": " + of vector databases.", "tokens": [50650, 295, 8062, 22380, 13, 50724], "temperature": + 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 3, "seek": 0, "start": 7.2, "end": + 8.44, "text": " Ooh, fun.", "tokens": [50724, 7951, 11, 1019, 13, 50786], "temperature": + 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 4, "seek": 0, "start": 8.44, "end": + 10.8, "text": " They''re rise, they''re potential fall,", "tokens": [50786, 814, + 434, 6272, 11, 436, 434, 3995, 2100, 11, 50904], "temperature": 0.0, "avg_logprob": + -0.2856946832993451, "compression_ratio": 1.4946996466431095, "no_speech_prob": + 0.01761999912559986}, {"id": 5, "seek": 0, "start": 10.8, "end": 12.08, "text": + " and with the future holds.", "tokens": [50904, 293, 365, 264, 2027, 9190, 13, + 50968], "temperature": 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": + 
1.4946996466431095, "no_speech_prob": 0.01761999912559986}, {"id": 6, "seek": 0, + "start": 12.08, "end": 12.92, "text": " Okay.", "tokens": [50968, 1033, 13, 51010], + "temperature": 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 7, "seek": 0, "start": 12.92, "end": + 14.68, "text": " You know, you sent us this fascinating medium article", "tokens": + [51010, 509, 458, 11, 291, 2279, 505, 341, 10343, 6399, 7222, 51098], "temperature": + 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 8, "seek": 0, "start": 14.68, "end": + 16.240000000000002, "text": " to kind of guide our exploration.", "tokens": [51098, + 281, 733, 295, 5934, 527, 16197, 13, 51176], "temperature": 0.0, "avg_logprob": + -0.2856946832993451, "compression_ratio": 1.4946996466431095, "no_speech_prob": + 0.01761999912559986}, {"id": 9, "seek": 0, "start": 16.240000000000002, "end": 17.080000000000002, + "text": " Ooh.", "tokens": [51176, 7951, 13, 51218], "temperature": 0.0, "avg_logprob": + -0.2856946832993451, "compression_ratio": 1.4946996466431095, "no_speech_prob": + 0.01761999912559986}, {"id": 10, "seek": 0, "start": 17.080000000000002, "end": + 21.64, "text": " Called The Rise, Fall, and Future of Vector Databases.", "tokens": + [51218, 45001, 440, 34482, 11, 7465, 11, 293, 20805, 295, 691, 20814, 40461, 1957, + 13, 51446], "temperature": 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": + 1.4946996466431095, "no_speech_prob": 0.01761999912559986}, {"id": 11, "seek": 0, + "start": 21.64, "end": 24.36, "text": " How to pick the one that lasts by Dimitri + Can.", "tokens": [51446, 1012, 281, 1888, 264, 472, 300, 20669, 538, 20975, 270, + 470, 1664, 13, 51582], "temperature": 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": + 1.4946996466431095, "no_speech_prob": 0.01761999912559986}, {"id": 12, "seek": 0, + 
"start": 24.36, "end": 25.52, "text": " Yeah, I saw that one.", "tokens": [51582, + 865, 11, 286, 1866, 300, 472, 13, 51640], "temperature": 0.0, "avg_logprob": -0.2856946832993451, + "compression_ratio": 1.4946996466431095, "no_speech_prob": 0.01761999912559986}, + {"id": 13, "seek": 0, "start": 25.52, "end": 27.92, "text": " Published January + 6th, 2025.", "tokens": [51640, 21808, 4173, 7061, 1386, 392, 11, 39209, 13, 51760], + "temperature": 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 14, "seek": 0, "start": 27.92, "end": + 28.84, "text": " Mm-hmm.", "tokens": [51760, 8266, 12, 10250, 13, 51806], "temperature": + 0.0, "avg_logprob": -0.2856946832993451, "compression_ratio": 1.4946996466431095, + "no_speech_prob": 0.01761999912559986}, {"id": 15, "seek": 2884, "start": 28.84, + "end": 33.32, "text": " So guess the term vector database might actually be", "tokens": + [50364, 407, 2041, 264, 1433, 8062, 8149, 1062, 767, 312, 50588], "temperature": + 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": 1.6265822784810127, + "no_speech_prob": 0.0015298514626920223}, {"id": 16, "seek": 2884, "start": 33.32, + "end": 34.16, "text": " on its way out.", "tokens": [50588, 322, 1080, 636, 484, + 13, 50630], "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": + 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, {"id": 17, "seek": + 2884, "start": 34.16, "end": 35.16, "text": " Really?", "tokens": [50630, 4083, + 30, 50680], "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": + 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, {"id": 18, "seek": + 2884, "start": 35.16, "end": 37.76, "text": " And your choice of database could + hinge on", "tokens": [50680, 400, 428, 3922, 295, 8149, 727, 28822, 322, 50810], + "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": 1.6265822784810127, 
+ "no_speech_prob": 0.0015298514626920223}, {"id": 19, "seek": 2884, "start": 37.76, + "end": 40.4, "text": " needing things like faceted search.", "tokens": [50810, 18006, + 721, 411, 1915, 10993, 3164, 13, 50942], "temperature": 0.0, "avg_logprob": -0.21132066788211945, + "compression_ratio": 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, + {"id": 20, "seek": 2884, "start": 40.4, "end": 41.24, "text": " Oh, wow.", "tokens": + [50942, 876, 11, 6076, 13, 50984], "temperature": 0.0, "avg_logprob": -0.21132066788211945, + "compression_ratio": 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, + {"id": 21, "seek": 2884, "start": 41.24, "end": 44.08, "text": " Or even those super + cool late interaction models.", "tokens": [50984, 1610, 754, 729, 1687, 1627, 3469, + 9285, 5245, 13, 51126], "temperature": 0.0, "avg_logprob": -0.21132066788211945, + "compression_ratio": 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, + {"id": 22, "seek": 2884, "start": 44.08, "end": 45.239999999999995, "text": " Huh, + interesting.", "tokens": [51126, 8063, 11, 1880, 13, 51184], "temperature": 0.0, + "avg_logprob": -0.21132066788211945, "compression_ratio": 1.6265822784810127, "no_speech_prob": + 0.0015298514626920223}, {"id": 23, "seek": 2884, "start": 45.239999999999995, "end": + 45.84, "text": " And treat.", "tokens": [51184, 400, 2387, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": 1.6265822784810127, + "no_speech_prob": 0.0015298514626920223}, {"id": 24, "seek": 2884, "start": 45.84, + "end": 46.68, "text": " I know I am.", "tokens": [51214, 286, 458, 286, 669, 13, + 51256], "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": + 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, {"id": 25, "seek": + 2884, "start": 46.68, "end": 47.44, "text": " Let''s break it all down.", "tokens": + [51256, 961, 311, 1821, 309, 439, 760, 13, 51294], "temperature": 0.0, 
"avg_logprob": + -0.21132066788211945, "compression_ratio": 1.6265822784810127, "no_speech_prob": + 0.0015298514626920223}, {"id": 26, "seek": 2884, "start": 47.44, "end": 48.28, "text": + " Okay, let''s do it.", "tokens": [51294, 1033, 11, 718, 311, 360, 309, 13, 51336], + "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": 1.6265822784810127, + "no_speech_prob": 0.0015298514626920223}, {"id": 27, "seek": 2884, "start": 48.28, + "end": 49.4, "text": " So it''s gonna be good.", "tokens": [51336, 407, 309, 311, + 799, 312, 665, 13, 51392], "temperature": 0.0, "avg_logprob": -0.21132066788211945, + "compression_ratio": 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, + {"id": 28, "seek": 2884, "start": 49.4, "end": 51.28, "text": " What I thought was + so interesting about this article", "tokens": [51392, 708, 286, 1194, 390, 370, + 1880, 466, 341, 7222, 51486], "temperature": 0.0, "avg_logprob": -0.21132066788211945, + "compression_ratio": 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, + {"id": 29, "seek": 2884, "start": 51.28, "end": 54.120000000000005, "text": " is + how it really blends like the technical side", "tokens": [51486, 307, 577, 309, + 534, 37619, 411, 264, 6191, 1252, 51628], "temperature": 0.0, "avg_logprob": -0.21132066788211945, + "compression_ratio": 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, + {"id": 30, "seek": 2884, "start": 54.120000000000005, "end": 55.8, "text": " with + the broader AI landscape.", "tokens": [51628, 365, 264, 13227, 7318, 9661, 13, 51712], + "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": 1.6265822784810127, + "no_speech_prob": 0.0015298514626920223}, {"id": 31, "seek": 2884, "start": 55.8, + "end": 56.92, "text": " Yeah, you''re right.", "tokens": [51712, 865, 11, 291, 434, + 558, 13, 51768], "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": + 1.6265822784810127, "no_speech_prob": 
0.0015298514626920223}, {"id": 32, "seek": + 2884, "start": 56.92, "end": 58.36, "text": " It''s not just about the nuts and + bolts.", "tokens": [51768, 467, 311, 406, 445, 466, 264, 10483, 293, 18127, 13, + 51840], "temperature": 0.0, "avg_logprob": -0.21132066788211945, "compression_ratio": + 1.6265822784810127, "no_speech_prob": 0.0015298514626920223}, {"id": 33, "seek": + 5836, "start": 58.36, "end": 60.519999999999996, "text": " It''s about how perceptions + and adoption", "tokens": [50364, 467, 311, 466, 577, 35258, 293, 19215, 50472], + "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 34, "seek": 5836, "start": 60.519999999999996, + "end": 64.08, "text": " of vector databases are shifting within the AI world.", + "tokens": [50472, 295, 8062, 22380, 366, 17573, 1951, 264, 7318, 1002, 13, 50650], + "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 35, "seek": 5836, "start": 64.08, + "end": 64.92, "text": " Absolutely.", "tokens": [50650, 7021, 13, 50692], "temperature": + 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 36, "seek": 5836, "start": 64.92, + "end": 66.8, "text": " Like this is not just a technical deep dive.", "tokens": + [50692, 1743, 341, 307, 406, 445, 257, 6191, 2452, 9192, 13, 50786], "temperature": + 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 37, "seek": 5836, "start": 66.8, + "end": 67.8, "text": " Right.", "tokens": [50786, 1779, 13, 50836], "temperature": + 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 38, "seek": 5836, "start": 67.8, + "end": 69.84, "text": " This is about how 
people are thinking about these things.", + "tokens": [50836, 639, 307, 466, 577, 561, 366, 1953, 466, 613, 721, 13, 50938], + "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 39, "seek": 5836, "start": 69.84, + "end": 70.36, "text": " Yeah.", "tokens": [50938, 865, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": 1.6266666666666667, + "no_speech_prob": 0.00015660123608540744}, {"id": 40, "seek": 5836, "start": 70.36, + "end": 71.36, "text": " And using them.", "tokens": [50964, 400, 1228, 552, 13, + 51014], "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": + 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, {"id": 41, "seek": + 5836, "start": 71.36, "end": 72.36, "text": " Totally.", "tokens": [51014, 22837, + 13, 51064], "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": + 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, {"id": 42, "seek": + 5836, "start": 72.36, "end": 74.72, "text": " And Dimitri brings this really unique + perspective", "tokens": [51064, 400, 20975, 270, 470, 5607, 341, 534, 3845, 4585, + 51182], "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": + 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, {"id": 43, "seek": + 5836, "start": 74.72, "end": 75.72, "text": " He does.", "tokens": [51182, 634, + 775, 13, 51232], "temperature": 0.0, "avg_logprob": -0.20929038885867957, "compression_ratio": + 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, {"id": 44, "seek": + 5836, "start": 75.72, "end": 76.96000000000001, "text": " to this whole conversation.", + "tokens": [51232, 281, 341, 1379, 3761, 13, 51294], "temperature": 0.0, "avg_logprob": + -0.20929038885867957, "compression_ratio": 1.6266666666666667, "no_speech_prob": + 0.00015660123608540744}, {"id": 45, "seek": 5836, 
"start": 76.96000000000001, "end": + 78.52, "text": " Because he was like deeply involved", "tokens": [51294, 1436, 415, + 390, 411, 8760, 3288, 51372], "temperature": 0.0, "avg_logprob": -0.20929038885867957, + "compression_ratio": 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, + {"id": 46, "seek": 5836, "start": 78.52, "end": 81.0, "text": " in this emerging + market just a few years back.", "tokens": [51372, 294, 341, 14989, 2142, 445, 257, + 1326, 924, 646, 13, 51496], "temperature": 0.0, "avg_logprob": -0.20929038885867957, + "compression_ratio": 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, + {"id": 47, "seek": 5836, "start": 81.0, "end": 82.12, "text": " Oh, really?", "tokens": + [51496, 876, 11, 534, 30, 51552], "temperature": 0.0, "avg_logprob": -0.20929038885867957, + "compression_ratio": 1.6266666666666667, "no_speech_prob": 0.00015660123608540744}, + {"id": 48, "seek": 5836, "start": 82.12, "end": 87.4, "text": " He was even advising + VCs on which vector database companies", "tokens": [51552, 634, 390, 754, 35598, + 691, 33290, 322, 597, 8062, 8149, 3431, 51816], "temperature": 0.0, "avg_logprob": + -0.20929038885867957, "compression_ratio": 1.6266666666666667, "no_speech_prob": + 0.00015660123608540744}, {"id": 49, "seek": 8740, "start": 87.4, "end": 88.4, "text": + " to back.", "tokens": [50364, 281, 646, 13, 50414], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 50, "seek": 8740, "start": 88.4, "end": 89.4, "text": + " Wow.", "tokens": [50414, 3153, 13, 50464], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 51, "seek": 8740, "start": 89.4, "end": 90.4, "text": + " So he''s like an insider.", "tokens": [50464, 407, 415, 311, 411, 364, 40990, + 13, 50514], "temperature": 0.0, "avg_logprob": -0.23036939559444305, 
"compression_ratio": + 1.694267515923567, "no_speech_prob": 0.0057039810344576836}, {"id": 52, "seek": + 8740, "start": 90.4, "end": 91.4, "text": " Yeah, he''s got the inside scoop.", + "tokens": [50514, 865, 11, 415, 311, 658, 264, 1854, 19555, 13, 50564], "temperature": + 0.0, "avg_logprob": -0.23036939559444305, "compression_ratio": 1.694267515923567, + "no_speech_prob": 0.0057039810344576836}, {"id": 53, "seek": 8740, "start": 91.4, + "end": 94.48, "text": " So he really saw this whole thing unfold firsthand.", "tokens": + [50564, 407, 415, 534, 1866, 341, 1379, 551, 17980, 38599, 13, 50718], "temperature": + 0.0, "avg_logprob": -0.23036939559444305, "compression_ratio": 1.694267515923567, + "no_speech_prob": 0.0057039810344576836}, {"id": 54, "seek": 8740, "start": 94.48, + "end": 95.72, "text": " He was there from the beginning.", "tokens": [50718, 634, + 390, 456, 490, 264, 2863, 13, 50780], "temperature": 0.0, "avg_logprob": -0.23036939559444305, + "compression_ratio": 1.694267515923567, "no_speech_prob": 0.0057039810344576836}, + {"id": 55, "seek": 8740, "start": 95.72, "end": 97.12, "text": " Wow, that''s amazing.", + "tokens": [50780, 3153, 11, 300, 311, 2243, 13, 50850], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 56, "seek": 8740, "start": 97.12, "end": 99.68, "text": + " And it''s interesting because just a few years ago,", "tokens": [50850, 400, 309, + 311, 1880, 570, 445, 257, 1326, 924, 2057, 11, 50978], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 57, "seek": 8740, "start": 99.68, "end": 102.88000000000001, + "text": " vector databases were like the hot topic.", "tokens": [50978, 8062, 22380, + 645, 411, 264, 2368, 4829, 13, 51138], "temperature": 0.0, "avg_logprob": -0.23036939559444305, + "compression_ratio": 1.694267515923567, 
"no_speech_prob": 0.0057039810344576836}, + {"id": 58, "seek": 8740, "start": 102.88000000000001, "end": 104.2, "text": " They + were everywhere, right?", "tokens": [51138, 814, 645, 5315, 11, 558, 30, 51204], + "temperature": 0.0, "avg_logprob": -0.23036939559444305, "compression_ratio": 1.694267515923567, + "no_speech_prob": 0.0057039810344576836}, {"id": 59, "seek": 8740, "start": 104.2, + "end": 105.36000000000001, "text": " Everybody was talking about it.", "tokens": + [51204, 7646, 390, 1417, 466, 309, 13, 51262], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 60, "seek": 8740, "start": 105.36000000000001, "end": + 107.4, "text": " Like they were the key to unlocking", "tokens": [51262, 1743, 436, + 645, 264, 2141, 281, 49620, 51364], "temperature": 0.0, "avg_logprob": -0.23036939559444305, + "compression_ratio": 1.694267515923567, "no_speech_prob": 0.0057039810344576836}, + {"id": 61, "seek": 8740, "start": 107.4, "end": 110.44, "text": " all these powerful + AI applications.", "tokens": [51364, 439, 613, 4005, 7318, 5821, 13, 51516], "temperature": + 0.0, "avg_logprob": -0.23036939559444305, "compression_ratio": 1.694267515923567, + "no_speech_prob": 0.0057039810344576836}, {"id": 62, "seek": 8740, "start": 110.44, + "end": 112.36000000000001, "text": " Like this was gonna change everything.", "tokens": + [51516, 1743, 341, 390, 799, 1319, 1203, 13, 51612], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 63, "seek": 8740, "start": 112.36000000000001, "end": + 113.36000000000001, "text": " Everyone was so excited.", "tokens": [51612, 5198, + 390, 370, 2919, 13, 51662], "temperature": 0.0, "avg_logprob": -0.23036939559444305, + "compression_ratio": 1.694267515923567, "no_speech_prob": 0.0057039810344576836}, + {"id": 64, "seek": 8740, "start": 
113.36000000000001, "end": 114.36000000000001, + "text": " Yeah.", "tokens": [51662, 865, 13, 51712], "temperature": 0.0, "avg_logprob": + -0.23036939559444305, "compression_ratio": 1.694267515923567, "no_speech_prob": + 0.0057039810344576836}, {"id": 65, "seek": 8740, "start": 114.36000000000001, "end": + 116.48, "text": " Okay, so let''s unpack this whole rise and fall idea.", "tokens": + [51712, 1033, 11, 370, 718, 311, 26699, 341, 1379, 6272, 293, 2100, 1558, 13, 51818], + "temperature": 0.0, "avg_logprob": -0.23036939559444305, "compression_ratio": 1.694267515923567, + "no_speech_prob": 0.0057039810344576836}, {"id": 66, "seek": 8740, "start": 116.48, + "end": 117.32000000000001, "text": " Okay.", "tokens": [51818, 1033, 13, 51860], + "temperature": 0.0, "avg_logprob": -0.23036939559444305, "compression_ratio": 1.694267515923567, + "no_speech_prob": 0.0057039810344576836}, {"id": 67, "seek": 11732, "start": 117.32, + "end": 118.96, "text": " So Dimitri noticed something interesting.", "tokens": [50364, + 407, 20975, 270, 470, 5694, 746, 1880, 13, 50446], "temperature": 0.0, "avg_logprob": + -0.20658189392089843, "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, + {"id": 68, "seek": 11732, "start": 118.96, "end": 119.96, "text": " What''s up?", + "tokens": [50446, 708, 311, 493, 30, 50496], "temperature": 0.0, "avg_logprob": + -0.20658189392089843, "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, + {"id": 69, "seek": 11732, "start": 119.96, "end": 123.32, "text": " Fewer people + were reading his early articles", "tokens": [50496, 33468, 260, 561, 645, 3760, + 702, 2440, 11290, 50664], "temperature": 0.0, "avg_logprob": -0.20658189392089843, + "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 70, + "seek": 11732, "start": 123.32, "end": 125.08, "text": " about vector databases.", + "tokens": [50664, 466, 8062, 22380, 13, 50752], "temperature": 0.0, "avg_logprob": + -0.20658189392089843, 
"compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, + {"id": 71, "seek": 11732, "start": 125.08, "end": 126.08, "text": " Oh, really?", + "tokens": [50752, 876, 11, 534, 30, 50802], "temperature": 0.0, "avg_logprob": -0.20658189392089843, + "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 72, + "seek": 11732, "start": 126.08, "end": 127.08, "text": " Huh?", "tokens": [50802, + 8063, 30, 50852], "temperature": 0.0, "avg_logprob": -0.20658189392089843, "compression_ratio": + 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 73, "seek": 11732, "start": + 127.08, "end": 128.07999999999998, "text": " I wonder why.", "tokens": [50852, 286, + 2441, 983, 13, 50902], "temperature": 0.0, "avg_logprob": -0.20658189392089843, + "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 74, + "seek": 11732, "start": 128.07999999999998, "end": 129.07999999999998, "text": " + What do you make of that?", "tokens": [50902, 708, 360, 291, 652, 295, 300, 30, + 50952], "temperature": 0.0, "avg_logprob": -0.20658189392089843, "compression_ratio": + 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 75, "seek": 11732, "start": + 129.07999999999998, "end": 132.51999999999998, "text": " Well, you know, it kind + of hints at a potential shift", "tokens": [50952, 1042, 11, 291, 458, 11, 309, 733, + 295, 27271, 412, 257, 3995, 5513, 51124], "temperature": 0.0, "avg_logprob": -0.20658189392089843, + "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 76, + "seek": 11732, "start": 132.51999999999998, "end": 133.51999999999998, "text": " + in the industry.", "tokens": [51124, 294, 264, 3518, 13, 51174], "temperature": + 0.0, "avg_logprob": -0.20658189392089843, "compression_ratio": 1.55, "no_speech_prob": + 0.0007272476796060801}, {"id": 77, "seek": 11732, "start": 133.51999999999998, "end": + 134.51999999999998, "text": " Okay.", "tokens": [51174, 1033, 13, 51224], "temperature": + 0.0, "avg_logprob": 
-0.20658189392089843, "compression_ratio": 1.55, "no_speech_prob": + 0.0007272476796060801}, {"id": 78, "seek": 11732, "start": 134.51999999999998, "end": + 138.4, "text": " Instead of being seen as these like standalone solutions,", "tokens": + [51224, 7156, 295, 885, 1612, 382, 613, 411, 37454, 6547, 11, 51418], "temperature": + 0.0, "avg_logprob": -0.20658189392089843, "compression_ratio": 1.55, "no_speech_prob": + 0.0007272476796060801}, {"id": 79, "seek": 11732, "start": 138.4, "end": 140.68, + "text": " it seems like vector search technology", "tokens": [51418, 309, 2544, + 411, 8062, 3164, 2899, 51532], "temperature": 0.0, "avg_logprob": -0.20658189392089843, + "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 80, + "seek": 11732, "start": 140.68, "end": 144.35999999999999, "text": " is kind of + merging with other AI advancements,", "tokens": [51532, 307, 733, 295, 44559, 365, + 661, 7318, 7295, 1117, 11, 51716], "temperature": 0.0, "avg_logprob": -0.20658189392089843, + "compression_ratio": 1.55, "no_speech_prob": 0.0007272476796060801}, {"id": 81, + "seek": 11732, "start": 144.35999999999999, "end": 146.32, "text": " becoming part + of a bigger picture.", "tokens": [51716, 5617, 644, 295, 257, 3801, 3036, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.20658189392089843, "compression_ratio": 1.55, + "no_speech_prob": 0.0007272476796060801}, {"id": 82, "seek": 14632, "start": 146.32, + "end": 147.32, "text": " Like what?", "tokens": [50364, 1743, 437, 30, 50414], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 83, "seek": 14632, "start": 147.32, + "end": 150.44, "text": " Think LLM, small time modal search.", "tokens": [50414, + 6557, 441, 43, 44, 11, 1359, 565, 39745, 3164, 13, 50570], "temperature": 0.0, "avg_logprob": + -0.1910396853819588, "compression_ratio": 1.6547231270358307, "no_speech_prob": + 0.0006415004609152675}, 
{"id": 84, "seek": 14632, "start": 150.44, "end": 152.0, + "text": " They''re all getting more integrated.", "tokens": [50570, 814, 434, 439, + 1242, 544, 10919, 13, 50648], "temperature": 0.0, "avg_logprob": -0.1910396853819588, + "compression_ratio": 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, + {"id": 85, "seek": 14632, "start": 152.0, "end": 154.35999999999999, "text": " Okay, + so it''s not that vector databases", "tokens": [50648, 1033, 11, 370, 309, 311, + 406, 300, 8062, 22380, 50766], "temperature": 0.0, "avg_logprob": -0.1910396853819588, + "compression_ratio": 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, + {"id": 86, "seek": 14632, "start": 154.35999999999999, "end": 155.6, "text": " are + like vanishing.", "tokens": [50766, 366, 411, 3161, 3807, 13, 50828], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 87, "seek": 14632, "start": 155.6, + "end": 156.44, "text": " Right.", "tokens": [50828, 1779, 13, 50870], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 88, "seek": 14632, "start": 156.44, + "end": 157.44, "text": " They''re evolving.", "tokens": [50870, 814, 434, 21085, + 13, 50920], "temperature": 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": + 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, {"id": 89, "seek": + 14632, "start": 157.44, "end": 158.44, "text": " Exactly.", "tokens": [50920, 7587, + 13, 50970], "temperature": 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": + 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, {"id": 90, "seek": + 14632, "start": 158.44, "end": 159.72, "text": " I''m blending into more comprehensive + solutions.", "tokens": [50970, 286, 478, 23124, 666, 544, 13914, 6547, 13, 51034], + "temperature": 0.0, "avg_logprob": 
-0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 91, "seek": 14632, "start": 159.72, + "end": 160.72, "text": " That''s it.", "tokens": [51034, 663, 311, 309, 13, 51084], + "temperature": 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 92, "seek": 14632, "start": 160.72, + "end": 161.72, "text": " Okay, I got it.", "tokens": [51084, 1033, 11, 286, 658, + 309, 13, 51134], "temperature": 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": + 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, {"id": 93, "seek": + 14632, "start": 161.72, "end": 163.48, "text": " The technology itself is still + crucial.", "tokens": [51134, 440, 2899, 2564, 307, 920, 11462, 13, 51222], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 94, "seek": 14632, "start": 163.48, + "end": 166.64, "text": " But how we think about it and use it is evolving.", "tokens": + [51222, 583, 577, 321, 519, 466, 309, 293, 764, 309, 307, 21085, 13, 51380], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 95, "seek": 14632, "start": 166.64, + "end": 167.64, "text": " Okay.", "tokens": [51380, 1033, 13, 51430], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 96, "seek": 14632, "start": 167.64, + "end": 170.04, "text": " Like, you know, you have your traditional databases, right?", + "tokens": [51430, 1743, 11, 291, 458, 11, 291, 362, 428, 5164, 22380, 11, 558, 30, + 51550], "temperature": 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": + 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, {"id": 97, "seek": + 14632, 
"start": 170.04, "end": 172.4, "text": " Like your SQL and no SQL types.", + "tokens": [51550, 1743, 428, 19200, 293, 572, 19200, 3467, 13, 51668], "temperature": + 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": 1.6547231270358307, + "no_speech_prob": 0.0006415004609152675}, {"id": 98, "seek": 14632, "start": 172.4, + "end": 176.04, "text": " Well, a lot of them have integrated vector search capabilities + now.", "tokens": [51668, 1042, 11, 257, 688, 295, 552, 362, 10919, 8062, 3164, 10862, + 586, 13, 51850], "temperature": 0.0, "avg_logprob": -0.1910396853819588, "compression_ratio": + 1.6547231270358307, "no_speech_prob": 0.0006415004609152675}, {"id": 99, "seek": + 17604, "start": 176.04, "end": 177.28, "text": " Oh, wow.", "tokens": [50364, 876, + 11, 6076, 13, 50426], "temperature": 0.0, "avg_logprob": -0.18342658148871527, "compression_ratio": + 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, {"id": 100, "seek": + 17604, "start": 177.28, "end": 179.95999999999998, "text": " So the data type itself + is becoming normalized", "tokens": [50426, 407, 264, 1412, 2010, 2564, 307, 5617, + 48704, 50560], "temperature": 0.0, "avg_logprob": -0.18342658148871527, "compression_ratio": + 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, {"id": 101, "seek": + 17604, "start": 179.95999999999998, "end": 181.6, "text": " within these existing + systems.", "tokens": [50560, 1951, 613, 6741, 3652, 13, 50642], "temperature": 0.0, + "avg_logprob": -0.18342658148871527, "compression_ratio": 1.7788161993769471, "no_speech_prob": + 0.001304388279095292}, {"id": 102, "seek": 17604, "start": 181.6, "end": 182.6, + "text": " Okay.", "tokens": [50642, 1033, 13, 50692], "temperature": 0.0, "avg_logprob": + -0.18342658148871527, "compression_ratio": 1.7788161993769471, "no_speech_prob": + 0.001304388279095292}, {"id": 103, "seek": 17604, "start": 182.6, "end": 183.6, + "text": " I see.", "tokens": [50692, 286, 536, 13, 50742], "temperature": 0.0, 
"avg_logprob": + -0.18342658148871527, "compression_ratio": 1.7788161993769471, "no_speech_prob": + 0.001304388279095292}, {"id": 104, "seek": 17604, "start": 183.6, "end": 184.6, + "text": " So it''s becoming more commonplace.", "tokens": [50742, 407, 309, 311, + 5617, 544, 2689, 6742, 13, 50792], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 105, "seek": 17604, "start": 184.6, "end": 185.6, "text": " Yeah.", "tokens": + [50792, 865, 13, 50842], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 106, "seek": 17604, "start": 185.6, "end": 186.6, "text": " Exactly.", "tokens": + [50842, 7587, 13, 50892], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 107, "seek": 17604, "start": 186.6, "end": 187.6, "text": " It''s not this + like niche thing anymore.", "tokens": [50892, 467, 311, 406, 341, 411, 19956, 551, + 3602, 13, 50942], "temperature": 0.0, "avg_logprob": -0.18342658148871527, "compression_ratio": + 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, {"id": 108, "seek": + 17604, "start": 187.6, "end": 188.6, "text": " Right.", "tokens": [50942, 1779, + 13, 50992], "temperature": 0.0, "avg_logprob": -0.18342658148871527, "compression_ratio": + 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, {"id": 109, "seek": + 17604, "start": 188.6, "end": 189.6, "text": " It''s just part of the toolkit.", + "tokens": [50992, 467, 311, 445, 644, 295, 264, 40167, 13, 51042], "temperature": + 0.0, "avg_logprob": -0.18342658148871527, "compression_ratio": 1.7788161993769471, + "no_speech_prob": 0.001304388279095292}, {"id": 110, "seek": 17604, "start": 189.6, + "end": 191.23999999999998, "text": " It''s becoming part of the fabric of how we + work with 
data.", "tokens": [51042, 467, 311, 5617, 644, 295, 264, 7253, 295, 577, + 321, 589, 365, 1412, 13, 51124], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 111, "seek": 17604, "start": 191.23999999999998, "end": 193.12, "text": " + I like that fabric of how we work with data.", "tokens": [51124, 286, 411, 300, + 7253, 295, 577, 321, 589, 365, 1412, 13, 51218], "temperature": 0.0, "avg_logprob": + -0.18342658148871527, "compression_ratio": 1.7788161993769471, "no_speech_prob": + 0.001304388279095292}, {"id": 112, "seek": 17604, "start": 193.12, "end": 194.28, + "text": " It''s a good way to put it right.", "tokens": [51218, 467, 311, 257, 665, + 636, 281, 829, 309, 558, 13, 51276], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 113, "seek": 17604, "start": 194.28, "end": 195.92, "text": " But here''s + where things get really interesting.", "tokens": [51276, 583, 510, 311, 689, 721, + 483, 534, 1880, 13, 51358], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 114, "seek": 17604, "start": 195.92, "end": 196.92, "text": " Okay.", "tokens": + [51358, 1033, 13, 51408], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 115, "seek": 17604, "start": 196.92, "end": 197.92, "text": " Tell me.", + "tokens": [51408, 5115, 385, 13, 51458], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 116, "seek": 17604, "start": 197.92, "end": 202.12, "text": " Dimitri saw + this resurgence in views for those older articles.", "tokens": [51458, 20975, 270, + 470, 1866, 341, 725, 44607, 294, 
6809, 337, 729, 4906, 11290, 13, 51668], "temperature": + 0.0, "avg_logprob": -0.18342658148871527, "compression_ratio": 1.7788161993769471, + "no_speech_prob": 0.001304388279095292}, {"id": 117, "seek": 17604, "start": 202.12, + "end": 203.76, "text": " Oh, so people are coming back to them.", "tokens": [51668, + 876, 11, 370, 561, 366, 1348, 646, 281, 552, 13, 51750], "temperature": 0.0, "avg_logprob": + -0.18342658148871527, "compression_ratio": 1.7788161993769471, "no_speech_prob": + 0.001304388279095292}, {"id": 118, "seek": 17604, "start": 203.76, "end": 205.32, + "text": " They''re coming back and guess what?", "tokens": [51750, 814, 434, 1348, + 646, 293, 2041, 437, 30, 51828], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 119, "seek": 17604, "start": 205.32, "end": 206.0, "text": " What?", "tokens": + [51828, 708, 30, 51862], "temperature": 0.0, "avg_logprob": -0.18342658148871527, + "compression_ratio": 1.7788161993769471, "no_speech_prob": 0.001304388279095292}, + {"id": 120, "seek": 20600, "start": 206.0, "end": 211.24, "text": " We''ve been + right when major funding announcements hit for some vector database startups back + in", "tokens": [50364, 492, 600, 668, 558, 562, 2563, 6137, 23785, 2045, 337, 512, + 8062, 8149, 28041, 646, 294, 50626], "temperature": 0.0, "avg_logprob": -0.34150967355501854, + "compression_ratio": 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, + {"id": 121, "seek": 20600, "start": 211.24, "end": 212.24, "text": " April 2023.", + "tokens": [50626, 6929, 44377, 13, 50676], "temperature": 0.0, "avg_logprob": -0.34150967355501854, + "compression_ratio": 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, + {"id": 122, "seek": 20600, "start": 212.24, "end": 213.24, "text": " Oh, interesting.", + "tokens": [50676, 876, 11, 1880, 13, 50726], "temperature": 0.0, "avg_logprob": + -0.34150967355501854, 
"compression_ratio": 1.5121951219512195, "no_speech_prob": + 0.003568369895219803}, {"id": 123, "seek": 20600, "start": 213.24, "end": 215.8, + "text": " Like, big money was flowing back into this space.", "tokens": [50726, + 1743, 11, 955, 1460, 390, 13974, 646, 666, 341, 1901, 13, 50854], "temperature": + 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": 1.5121951219512195, + "no_speech_prob": 0.003568369895219803}, {"id": 124, "seek": 20600, "start": 215.8, + "end": 216.8, "text": " Yeah.", "tokens": [50854, 865, 13, 50904], "temperature": + 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": 1.5121951219512195, + "no_speech_prob": 0.003568369895219803}, {"id": 125, "seek": 20600, "start": 216.8, + "end": 218.64, "text": " Like, pine cones, $100 million series B.", "tokens": [50904, + 1743, 11, 15113, 40548, 11, 1848, 6879, 2459, 2638, 363, 13, 50996], "temperature": + 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": 1.5121951219512195, + "no_speech_prob": 0.003568369895219803}, {"id": 126, "seek": 20600, "start": 218.64, + "end": 219.64, "text": " Yeah.", "tokens": [50996, 865, 13, 51046], "temperature": + 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": 1.5121951219512195, + "no_speech_prob": 0.003568369895219803}, {"id": 127, "seek": 20600, "start": 219.64, + "end": 220.64, "text": " Pine cone was huge.", "tokens": [51046, 33531, 19749, 390, + 2603, 13, 51096], "temperature": 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": + 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, {"id": 128, "seek": + 20600, "start": 220.64, "end": 222.72, "text": " Weve 8 securing $50 million.", + "tokens": [51096, 492, 303, 1649, 33640, 1848, 2803, 2459, 13, 51200], "temperature": + 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": 1.5121951219512195, + "no_speech_prob": 0.003568369895219803}, {"id": 129, "seek": 20600, "start": 222.72, + "end": 223.72, "text": " Huh.", "tokens": [51200, 
8063, 13, 51250], "temperature": + 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": 1.5121951219512195, + "no_speech_prob": 0.003568369895219803}, {"id": 130, "seek": 20600, "start": 223.72, + "end": 227.48, "text": " And QDran getting $7.5 million in seed funding.", "tokens": + [51250, 400, 1249, 35, 4257, 1242, 1848, 22, 13, 20, 2459, 294, 8871, 6137, 13, + 51438], "temperature": 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": + 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, {"id": 131, "seek": + 20600, "start": 227.48, "end": 228.48, "text": " Yeah.", "tokens": [51438, 865, + 13, 51488], "temperature": 0.0, "avg_logprob": -0.34150967355501854, "compression_ratio": + 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, {"id": 132, "seek": + 20600, "start": 228.48, "end": 229.8, "text": " Those were big headlines.", "tokens": + [51488, 3950, 645, 955, 23867, 13, 51554], "temperature": 0.0, "avg_logprob": -0.34150967355501854, + "compression_ratio": 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, + {"id": 133, "seek": 20600, "start": 229.8, "end": 230.8, "text": " They were.", + "tokens": [51554, 814, 645, 13, 51604], "temperature": 0.0, "avg_logprob": -0.34150967355501854, + "compression_ratio": 1.5121951219512195, "no_speech_prob": 0.003568369895219803}, + {"id": 134, "seek": 23080, "start": 230.8, "end": 237.16000000000003, "text": " + It really highlights how much media coverage and industry buzz can influence how + we perceive", "tokens": [50364, 467, 534, 14254, 577, 709, 3021, 9645, 293, 3518, + 13036, 393, 6503, 577, 321, 20281, 50682], "temperature": 0.0, "avg_logprob": -0.21253193600077025, + "compression_ratio": 1.6869009584664536, "no_speech_prob": 0.20910020172595978}, + {"id": 135, "seek": 23080, "start": 237.16000000000003, "end": 238.56, "text": " + technology trends.", "tokens": [50682, 2899, 13892, 13, 50752], "temperature": 0.0, + "avg_logprob": -0.21253193600077025, 
"compression_ratio": 1.6869009584664536, "no_speech_prob": + 0.20910020172595978}, {"id": 136, "seek": 23080, "start": 238.56, "end": 239.56, + "text": " Codally.", "tokens": [50752, 383, 378, 379, 13, 50802], "temperature": + 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 137, "seek": 23080, "start": 239.56, + "end": 241.28, "text": " It''s like a self-fulfilling prophecy almost.", "tokens": + [50802, 467, 311, 411, 257, 2698, 12, 906, 69, 7345, 23945, 1920, 13, 50888], "temperature": + 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 138, "seek": 23080, "start": 241.28, + "end": 242.28, "text": " Yeah.", "tokens": [50888, 865, 13, 50938], "temperature": + 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 139, "seek": 23080, "start": 242.28, + "end": 246.36, "text": " And it makes you think like how much of what we perceive + is like the next big thing is", "tokens": [50938, 400, 309, 1669, 291, 519, 411, + 577, 709, 295, 437, 321, 20281, 307, 411, 264, 958, 955, 551, 307, 51142], "temperature": + 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 140, "seek": 23080, "start": 246.36, + "end": 250.60000000000002, "text": " actually driven by, you know, the hype, the + hype, the funding, the media attention.", "tokens": [51142, 767, 9555, 538, 11, + 291, 458, 11, 264, 24144, 11, 264, 24144, 11, 264, 6137, 11, 264, 3021, 3202, 13, + 51354], "temperature": 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": + 1.6869009584664536, "no_speech_prob": 0.20910020172595978}, {"id": 141, "seek": + 23080, "start": 250.60000000000002, "end": 251.60000000000002, "text": " Yeah.", + "tokens": [51354, 865, 13, 51404], "temperature": 0.0, 
"avg_logprob": -0.21253193600077025, + "compression_ratio": 1.6869009584664536, "no_speech_prob": 0.20910020172595978}, + {"id": 142, "seek": 23080, "start": 251.60000000000002, "end": 252.60000000000002, + "text": " It''s fascinating.", "tokens": [51404, 467, 311, 10343, 13, 51454], "temperature": + 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 143, "seek": 23080, "start": 252.60000000000002, + "end": 256.40000000000003, "text": " So clearly, there''s still a ton of innovation + and investment happening in the vector database", "tokens": [51454, 407, 4448, 11, + 456, 311, 920, 257, 2952, 295, 8504, 293, 6078, 2737, 294, 264, 8062, 8149, 51644], + "temperature": 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 144, "seek": 23080, "start": 256.40000000000003, + "end": 257.40000000000003, "text": " space.", "tokens": [51644, 1901, 13, 51694], + "temperature": 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": 1.6869009584664536, + "no_speech_prob": 0.20910020172595978}, {"id": 145, "seek": 23080, "start": 257.40000000000003, + "end": 258.40000000000003, "text": " Be sure.", "tokens": [51694, 879, 988, 13, + 51744], "temperature": 0.0, "avg_logprob": -0.21253193600077025, "compression_ratio": + 1.6869009584664536, "no_speech_prob": 0.20910020172595978}, {"id": 146, "seek": + 23080, "start": 258.40000000000003, "end": 260.56, "text": " But let''s get practical + for our listener out there.", "tokens": [51744, 583, 718, 311, 483, 8496, 337, 527, + 31569, 484, 456, 13, 51852], "temperature": 0.0, "avg_logprob": -0.21253193600077025, + "compression_ratio": 1.6869009584664536, "no_speech_prob": 0.20910020172595978}, + {"id": 147, "seek": 26056, "start": 260.56, "end": 261.96, "text": " Let''s give + him some actionable advice.", "tokens": [50364, 961, 311, 976, 796, 512, 45098, + 5192, 13, 
50434], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 148, "seek": + 26056, "start": 261.96, "end": 266.48, "text": " The real gem in this article is + Demetri''s guide for choosing the right vector database.", "tokens": [50434, 440, + 957, 7173, 294, 341, 7222, 307, 4686, 302, 470, 311, 5934, 337, 10875, 264, 558, + 8062, 8149, 13, 50660], "temperature": 0.0, "avg_logprob": -0.2596795290511176, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, + {"id": 149, "seek": 26056, "start": 266.48, "end": 267.48, "text": " Right.", "tokens": + [50660, 1779, 13, 50710], "temperature": 0.0, "avg_logprob": -0.2596795290511176, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, + {"id": 150, "seek": 26056, "start": 267.48, "end": 269.2, "text": " Because there''s + no one size fits all solution.", "tokens": [50710, 1436, 456, 311, 572, 472, 2744, + 9001, 439, 3827, 13, 50796], "temperature": 0.0, "avg_logprob": -0.2596795290511176, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, + {"id": 151, "seek": 26056, "start": 269.2, "end": 270.2, "text": " Exactly.", "tokens": + [50796, 7587, 13, 50846], "temperature": 0.0, "avg_logprob": -0.2596795290511176, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, + {"id": 152, "seek": 26056, "start": 270.2, "end": 271.44, "text": " It really depends + on your specific needs.", "tokens": [50846, 467, 534, 5946, 322, 428, 2685, 2203, + 13, 50908], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 153, "seek": + 26056, "start": 271.44, "end": 274.84000000000003, "text": " It''s like having a + roadmap for navigating this complex landscape.", "tokens": [50908, 467, 311, 411, + 1419, 257, 35738, 337, 32054, 341, 3997, 9661, 
13, 51078], "temperature": 0.0, "avg_logprob": + -0.2596795290511176, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 0.01484401524066925}, {"id": 154, "seek": 26056, "start": 274.84000000000003, "end": + 276.48, "text": " Totally a roadmap.", "tokens": [51078, 22837, 257, 35738, 13, + 51160], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 155, "seek": + 26056, "start": 276.48, "end": 278.6, "text": " So where does he suggest starting?", + "tokens": [51160, 407, 689, 775, 415, 3402, 2891, 30, 51266], "temperature": 0.0, + "avg_logprob": -0.2596795290511176, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 0.01484401524066925}, {"id": 156, "seek": 26056, "start": 278.6, "end": 279.6, "text": + " Okay.", "tokens": [51266, 1033, 13, 51316], "temperature": 0.0, "avg_logprob": + -0.2596795290511176, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 0.01484401524066925}, {"id": 157, "seek": 26056, "start": 279.6, "end": 280.6, "text": + " I''m ready.", "tokens": [51316, 286, 478, 1919, 13, 51366], "temperature": 0.0, + "avg_logprob": -0.2596795290511176, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 0.01484401524066925}, {"id": 158, "seek": 26056, "start": 280.6, "end": 283.4, "text": + " His secret weapon is, FAYS.", "tokens": [51366, 2812, 4054, 7463, 307, 11, 479, + 4299, 50, 13, 51506], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 159, "seek": + 26056, "start": 283.4, "end": 284.4, "text": " Okay.", "tokens": [51506, 1033, 13, + 51556], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 160, "seek": + 26056, "start": 284.4, "end": 287.16, "text": " No, it''s not technically a full-fledged + database.", "tokens": [51556, 883, 
11, 309, 311, 406, 12120, 257, 1577, 12, 69, + 1493, 3004, 8149, 13, 51694], "temperature": 0.0, "avg_logprob": -0.2596795290511176, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, + {"id": 161, "seek": 26056, "start": 287.16, "end": 288.16, "text": " Right.", "tokens": + [51694, 1779, 13, 51744], "temperature": 0.0, "avg_logprob": -0.2596795290511176, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, + {"id": 162, "seek": 26056, "start": 288.16, "end": 289.16, "text": " It''s more + of a powerful library.", "tokens": [51744, 467, 311, 544, 295, 257, 4005, 6405, + 13, 51794], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 163, "seek": + 26056, "start": 289.16, "end": 290.16, "text": " Okay.", "tokens": [51794, 1033, + 13, 51844], "temperature": 0.0, "avg_logprob": -0.2596795290511176, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.01484401524066925}, {"id": 164, "seek": + 29016, "start": 290.16, "end": 294.40000000000003, "text": " So the kicker, it can + handle massive data sets.", "tokens": [50364, 407, 264, 4437, 260, 11, 309, 393, + 4813, 5994, 1412, 6352, 13, 50576], "temperature": 0.0, "avg_logprob": -0.2221156081108198, + "compression_ratio": 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, + {"id": 165, "seek": 29016, "start": 294.40000000000003, "end": 295.40000000000003, + "text": " Okay.", "tokens": [50576, 1033, 13, 50626], "temperature": 0.0, "avg_logprob": + -0.2221156081108198, "compression_ratio": 1.6058394160583942, "no_speech_prob": + 0.07253293693065643}, {"id": 166, "seek": 29016, "start": 295.40000000000003, "end": + 297.04, "text": " We''re talking over a billion vectors.", "tokens": [50626, 492, + 434, 1417, 670, 257, 5218, 18875, 13, 50708], "temperature": 0.0, "avg_logprob": + -0.2221156081108198, "compression_ratio": 1.6058394160583942, 
"no_speech_prob": + 0.07253293693065643}, {"id": 167, "seek": 29016, "start": 297.04, "end": 298.04, + "text": " Wow.", "tokens": [50708, 3153, 13, 50758], "temperature": 0.0, "avg_logprob": + -0.2221156081108198, "compression_ratio": 1.6058394160583942, "no_speech_prob": + 0.07253293693065643}, {"id": 168, "seek": 29016, "start": 298.04, "end": 299.04, + "text": " That''s a lot.", "tokens": [50758, 663, 311, 257, 688, 13, 50808], "temperature": + 0.0, "avg_logprob": -0.2221156081108198, "compression_ratio": 1.6058394160583942, + "no_speech_prob": 0.07253293693065643}, {"id": 169, "seek": 29016, "start": 299.04, + "end": 300.04, "text": " So it can scale.", "tokens": [50808, 407, 309, 393, 4373, + 13, 50858], "temperature": 0.0, "avg_logprob": -0.2221156081108198, "compression_ratio": + 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, {"id": 170, "seek": + 29016, "start": 300.04, "end": 301.32000000000005, "text": " We can handle the big + stuff.", "tokens": [50858, 492, 393, 4813, 264, 955, 1507, 13, 50922], "temperature": + 0.0, "avg_logprob": -0.2221156081108198, "compression_ratio": 1.6058394160583942, + "no_speech_prob": 0.07253293693065643}, {"id": 171, "seek": 29016, "start": 301.32000000000005, + "end": 305.16, "text": " And the beauty of FAYS is its simplicity and scalability.", + "tokens": [50922, 400, 264, 6643, 295, 479, 4299, 50, 307, 1080, 25632, 293, 15664, + 2310, 13, 51114], "temperature": 0.0, "avg_logprob": -0.2221156081108198, "compression_ratio": + 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, {"id": 172, "seek": + 29016, "start": 305.16, "end": 309.84000000000003, "text": " So it''s perfect for, + it''s perfect for initial exploration and prototyping.", "tokens": [51114, 407, + 309, 311, 2176, 337, 11, 309, 311, 2176, 337, 5883, 16197, 293, 46219, 3381, 13, + 51348], "temperature": 0.0, "avg_logprob": -0.2221156081108198, "compression_ratio": + 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, {"id": 173, 
"seek": + 29016, "start": 309.84000000000003, "end": 311.92, "text": " You can just get in + there and start playing around.", "tokens": [51348, 509, 393, 445, 483, 294, 456, + 293, 722, 2433, 926, 13, 51452], "temperature": 0.0, "avg_logprob": -0.2221156081108198, + "compression_ratio": 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, + {"id": 174, "seek": 29016, "start": 311.92, "end": 312.92, "text": " Exactly.", + "tokens": [51452, 7587, 13, 51502], "temperature": 0.0, "avg_logprob": -0.2221156081108198, + "compression_ratio": 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, + {"id": 175, "seek": 29016, "start": 312.92, "end": 313.92, "text": " Yeah.", "tokens": + [51502, 865, 13, 51552], "temperature": 0.0, "avg_logprob": -0.2221156081108198, + "compression_ratio": 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, + {"id": 176, "seek": 29016, "start": 313.92, "end": 315.96000000000004, "text": " + But of course, uh oh, there''s always a butt.", "tokens": [51552, 583, 295, 1164, + 11, 2232, 1954, 11, 456, 311, 1009, 257, 6660, 13, 51654], "temperature": 0.0, "avg_logprob": + -0.2221156081108198, "compression_ratio": 1.6058394160583942, "no_speech_prob": + 0.07253293693065643}, {"id": 177, "seek": 29016, "start": 315.96000000000004, "end": + 316.96000000000004, "text": " There''s a trade-off.", "tokens": [51654, 821, 311, + 257, 4923, 12, 4506, 13, 51704], "temperature": 0.0, "avg_logprob": -0.2221156081108198, + "compression_ratio": 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, + {"id": 178, "seek": 29016, "start": 316.96000000000004, "end": 317.96000000000004, + "text": " Okay.", "tokens": [51704, 1033, 13, 51754], "temperature": 0.0, "avg_logprob": + -0.2221156081108198, "compression_ratio": 1.6058394160583942, "no_speech_prob": + 0.07253293693065643}, {"id": 179, "seek": 29016, "start": 317.96000000000004, "end": + 318.96000000000004, "text": " What is it?", "tokens": [51754, 708, 307, 309, 30, + 51804], 
"temperature": 0.0, "avg_logprob": -0.2221156081108198, "compression_ratio": + 1.6058394160583942, "no_speech_prob": 0.07253293693065643}, {"id": 180, "seek": + 31896, "start": 318.96, "end": 321.12, "text": " Built-in filtering capabilities.", + "tokens": [50364, 49822, 12, 259, 30822, 10862, 13, 50472], "temperature": 0.0, + "avg_logprob": -0.2789870491863167, "compression_ratio": 1.564102564102564, "no_speech_prob": + 0.07089832425117493}, {"id": 181, "seek": 31896, "start": 321.12, "end": 324.28, + "text": " Uh, so you can''t really do that fine-grained search.", "tokens": [50472, + 4019, 11, 370, 291, 393, 380, 534, 360, 300, 2489, 12, 20735, 2001, 3164, 13, 50630], + "temperature": 0.0, "avg_logprob": -0.2789870491863167, "compression_ratio": 1.564102564102564, + "no_speech_prob": 0.07089832425117493}, {"id": 182, "seek": 31896, "start": 324.28, + "end": 325.28, "text": " Right.", "tokens": [50630, 1779, 13, 50680], "temperature": + 0.0, "avg_logprob": -0.2789870491863167, "compression_ratio": 1.564102564102564, + "no_speech_prob": 0.07089832425117493}, {"id": 183, "seek": 31896, "start": 325.28, + "end": 326.47999999999996, "text": " Like with keywords and stuff.", "tokens": [50680, + 1743, 365, 21009, 293, 1507, 13, 50740], "temperature": 0.0, "avg_logprob": -0.2789870491863167, + "compression_ratio": 1.564102564102564, "no_speech_prob": 0.07089832425117493}, + {"id": 184, "seek": 31896, "start": 326.47999999999996, "end": 328.52, "text": " + Which might mean getting creative with workarounds?", "tokens": [50740, 3013, 1062, + 914, 1242, 5880, 365, 589, 289, 4432, 30, 50842], "temperature": 0.0, "avg_logprob": + -0.2789870491863167, "compression_ratio": 1.564102564102564, "no_speech_prob": 0.07089832425117493}, + {"id": 185, "seek": 31896, "start": 328.52, "end": 329.52, "text": " Okay.", "tokens": + [50842, 1033, 13, 50892], "temperature": 0.0, "avg_logprob": -0.2789870491863167, + "compression_ratio": 1.564102564102564, "no_speech_prob": 
0.07089832425117493}, + {"id": 186, "seek": 31896, "start": 329.52, "end": 330.52, "text": " So you''ve + got to be a little clever.", "tokens": [50892, 407, 291, 600, 658, 281, 312, 257, + 707, 13494, 13, 50942], "temperature": 0.0, "avg_logprob": -0.2789870491863167, + "compression_ratio": 1.564102564102564, "no_speech_prob": 0.07089832425117493}, + {"id": 187, "seek": 31896, "start": 330.52, "end": 331.52, "text": " A little bit.", + "tokens": [50942, 316, 707, 857, 13, 50992], "temperature": 0.0, "avg_logprob": + -0.2789870491863167, "compression_ratio": 1.564102564102564, "no_speech_prob": 0.07089832425117493}, + {"id": 188, "seek": 31896, "start": 331.52, "end": 334.52, "text": " If you want + to use FAYS, I.S. for certain things.", "tokens": [50992, 759, 291, 528, 281, 764, + 479, 4299, 50, 11, 286, 13, 50, 13, 337, 1629, 721, 13, 51142], "temperature": 0.0, + "avg_logprob": -0.2789870491863167, "compression_ratio": 1.564102564102564, "no_speech_prob": + 0.07089832425117493}, {"id": 189, "seek": 31896, "start": 334.52, "end": 335.52, + "text": " Yeah.", "tokens": [51142, 865, 13, 51192], "temperature": 0.0, "avg_logprob": + -0.2789870491863167, "compression_ratio": 1.564102564102564, "no_speech_prob": 0.07089832425117493}, + {"id": 190, "seek": 31896, "start": 335.52, "end": 337.56, "text": " So if you need + that fine-grained controlled keyword search.", "tokens": [51192, 407, 498, 291, + 643, 300, 2489, 12, 20735, 2001, 10164, 20428, 3164, 13, 51294], "temperature": + 0.0, "avg_logprob": -0.2789870491863167, "compression_ratio": 1.564102564102564, + "no_speech_prob": 0.07089832425117493}, {"id": 191, "seek": 31896, "start": 337.56, + "end": 338.56, "text": " Yeah.", "tokens": [51294, 865, 13, 51344], "temperature": + 0.0, "avg_logprob": -0.2789870491863167, "compression_ratio": 1.564102564102564, + "no_speech_prob": 0.07089832425117493}, {"id": 192, "seek": 31896, "start": 338.56, + "end": 343.4, "text": " Along with your vector search, what does Demetri 
recommend?", + "tokens": [51344, 17457, 365, 428, 8062, 3164, 11, 437, 775, 4686, 302, 470, 2748, + 30, 51586], "temperature": 0.0, "avg_logprob": -0.2789870491863167, "compression_ratio": + 1.564102564102564, "no_speech_prob": 0.07089832425117493}, {"id": 193, "seek": 31896, + "start": 343.4, "end": 345.0, "text": " I''m all ears.", "tokens": [51586, 286, + 478, 439, 8798, 13, 51666], "temperature": 0.0, "avg_logprob": -0.2789870491863167, + "compression_ratio": 1.564102564102564, "no_speech_prob": 0.07089832425117493}, + {"id": 194, "seek": 34500, "start": 345.0, "end": 349.28, "text": " He suggests + looking at databases built on top of Lucene.", "tokens": [50364, 634, 13409, 1237, + 412, 22380, 3094, 322, 1192, 295, 9593, 1450, 13, 50578], "temperature": 0.0, "avg_logprob": + -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 195, "seek": 34500, "start": 349.28, "end": 350.28, + "text": " Lucene.", "tokens": [50578, 9593, 1450, 13, 50628], "temperature": 0.0, + "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 196, "seek": 34500, "start": 350.28, "end": 354.68, + "text": " Options like open search, elastic search, and a patchy solar.", "tokens": + [50628, 42934, 411, 1269, 3164, 11, 17115, 3164, 11, 293, 257, 9972, 88, 7936, 13, + 50848], "temperature": 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": + 1.6933797909407666, "no_speech_prob": 0.3848007917404175}, {"id": 197, "seek": 34500, + "start": 354.68, "end": 355.68, "text": " Got it.", "tokens": [50848, 5803, 309, + 13, 50898], "temperature": 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": + 1.6933797909407666, "no_speech_prob": 0.3848007917404175}, {"id": 198, "seek": 34500, + "start": 355.68, "end": 359.12, "text": " So these are all built on this like solid + foundation of Lucene technology.", "tokens": [50898, 407, 613, 366, 439, 3094, 322, + 
341, 411, 5100, 7030, 295, 9593, 1450, 2899, 13, 51070], "temperature": 0.0, "avg_logprob": + -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 199, "seek": 34500, "start": 359.12, "end": 360.12, + "text": " Yeah.", "tokens": [51070, 865, 13, 51120], "temperature": 0.0, "avg_logprob": + -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 200, "seek": 34500, "start": 360.12, "end": 361.12, + "text": " Lucene''s been around for a while, right?", "tokens": [51120, 9593, 1450, + 311, 668, 926, 337, 257, 1339, 11, 558, 30, 51170], "temperature": 0.0, "avg_logprob": + -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 201, "seek": 34500, "start": 361.12, "end": 363.56, + "text": " It''s a mature technology with a proven track record.", "tokens": [51170, + 467, 311, 257, 14442, 2899, 365, 257, 12785, 2837, 2136, 13, 51292], "temperature": + 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, + "no_speech_prob": 0.3848007917404175}, {"id": 202, "seek": 34500, "start": 363.56, + "end": 364.56, "text": " Yeah.", "tokens": [51292, 865, 13, 51342], "temperature": + 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, + "no_speech_prob": 0.3848007917404175}, {"id": 203, "seek": 34500, "start": 364.56, + "end": 365.56, "text": " So it''s reliable.", "tokens": [51342, 407, 309, 311, 12924, + 13, 51392], "temperature": 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": + 1.6933797909407666, "no_speech_prob": 0.3848007917404175}, {"id": 204, "seek": 34500, + "start": 365.56, "end": 366.56, "text": " Reliable.", "tokens": [51392, 8738, 9364, + 13, 51442], "temperature": 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": + 1.6933797909407666, "no_speech_prob": 0.3848007917404175}, {"id": 205, "seek": 34500, + "start": 366.56, 
"end": 367.56, "text": " Yeah.", "tokens": [51442, 865, 13, 51492], + "temperature": 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, + "no_speech_prob": 0.3848007917404175}, {"id": 206, "seek": 34500, "start": 367.56, + "end": 368.56, "text": " And it provides that robust keyword search.", "tokens": + [51492, 400, 309, 6417, 300, 13956, 20428, 3164, 13, 51542], "temperature": 0.0, + "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 207, "seek": 34500, "start": 368.56, "end": 369.56, + "text": " Okay.", "tokens": [51542, 1033, 13, 51592], "temperature": 0.0, "avg_logprob": + -0.1796564694108634, "compression_ratio": 1.6933797909407666, "no_speech_prob": + 0.3848007917404175}, {"id": 208, "seek": 34500, "start": 369.56, "end": 371.72, + "text": " You mentioned plus multilingual support.", "tokens": [51592, 509, 2835, + 1804, 2120, 38219, 1406, 13, 51700], "temperature": 0.0, "avg_logprob": -0.1796564694108634, + "compression_ratio": 1.6933797909407666, "no_speech_prob": 0.3848007917404175}, + {"id": 209, "seek": 34500, "start": 371.72, "end": 373.12, "text": " That''s important + these days.", "tokens": [51700, 663, 311, 1021, 613, 1708, 13, 51770], "temperature": + 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, + "no_speech_prob": 0.3848007917404175}, {"id": 210, "seek": 34500, "start": 373.12, + "end": 374.12, "text": " Super important.", "tokens": [51770, 4548, 1021, 13, 51820], + "temperature": 0.0, "avg_logprob": -0.1796564694108634, "compression_ratio": 1.6933797909407666, + "no_speech_prob": 0.3848007917404175}, {"id": 211, "seek": 37412, "start": 374.12, + "end": 376.72, "text": " And it performs incredibly well.", "tokens": [50364, 400, + 309, 26213, 6252, 731, 13, 50494], "temperature": 0.0, "avg_logprob": -0.2774854426761325, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.31069931387901306}, + {"id": 
212, "seek": 37412, "start": 376.72, "end": 378.56, "text": " So it''s fast + and efficient.", "tokens": [50494, 407, 309, 311, 2370, 293, 7148, 13, 50586], "temperature": + 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 213, "seek": 37412, "start": 378.56, + "end": 379.56, "text": " Nice.", "tokens": [50586, 5490, 13, 50636], "temperature": + 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 214, "seek": 37412, "start": 379.56, + "end": 382.64, "text": " This makes it particularly well suited for e-commerce.", + "tokens": [50636, 639, 1669, 309, 4098, 731, 24736, 337, 308, 12, 26926, 13, 50790], + "temperature": 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 215, "seek": 37412, "start": 382.64, + "end": 384.44, "text": " Oh, interesting why e-commerce?", "tokens": [50790, 876, + 11, 1880, 983, 308, 12, 26926, 30, 50880], "temperature": 0.0, "avg_logprob": -0.2774854426761325, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.31069931387901306}, + {"id": 216, "seek": 37412, "start": 384.44, "end": 391.36, "text": " Where features + like faceting, which allows users to refine their search by specific attributes.", + "tokens": [50880, 2305, 4122, 411, 1915, 9880, 11, 597, 4045, 5022, 281, 33906, + 641, 3164, 538, 2685, 17212, 13, 51226], "temperature": 0.0, "avg_logprob": -0.2774854426761325, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.31069931387901306}, + {"id": 217, "seek": 37412, "start": 391.36, "end": 392.36, "text": " Oh, I see.", + "tokens": [51226, 876, 11, 286, 536, 13, 51276], "temperature": 0.0, "avg_logprob": + -0.2774854426761325, "compression_ratio": 1.693103448275862, "no_speech_prob": 0.31069931387901306}, + {"id": 218, "seek": 37412, "start": 392.36, "end": 394.44, "text": " So 
like filtering + by brands.", "tokens": [51276, 407, 411, 30822, 538, 11324, 13, 51380], "temperature": + 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 219, "seek": 37412, "start": 394.44, + "end": 395.44, "text": " Yeah.", "tokens": [51380, 865, 13, 51430], "temperature": + 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 220, "seek": 37412, "start": 395.44, + "end": 396.44, "text": " Like filtering by brand.", "tokens": [51430, 1743, 30822, + 538, 3360, 13, 51480], "temperature": 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.31069931387901306}, {"id": 221, "seek": 37412, + "start": 396.44, "end": 397.44, "text": " Price range.", "tokens": [51480, 25803, + 3613, 13, 51530], "temperature": 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.31069931387901306}, {"id": 222, "seek": 37412, + "start": 397.44, "end": 398.44, "text": " Price range size.", "tokens": [51530, + 25803, 3613, 2744, 13, 51580], "temperature": 0.0, "avg_logprob": -0.2774854426761325, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.31069931387901306}, + {"id": 223, "seek": 37412, "start": 398.44, "end": 399.68, "text": " All those things + are essential.", "tokens": [51580, 1057, 729, 721, 366, 7115, 13, 51642], "temperature": + 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 224, "seek": 37412, "start": 399.68, + "end": 401.16, "text": " Makes sense for e-commerce.", "tokens": [51642, 25245, + 2020, 337, 308, 12, 26926, 13, 51716], "temperature": 0.0, "avg_logprob": -0.2774854426761325, + "compression_ratio": 1.693103448275862, "no_speech_prob": 0.31069931387901306}, + {"id": 225, "seek": 37412, "start": 401.16, "end": 402.56, 
"text": " He did mention + one exception though, right?", "tokens": [51716, 634, 630, 2152, 472, 11183, 1673, + 11, 558, 30, 51786], "temperature": 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": + 1.693103448275862, "no_speech_prob": 0.31069931387901306}, {"id": 226, "seek": 37412, + "start": 402.56, "end": 403.56, "text": " Though there''s always an exception.", + "tokens": [51786, 10404, 456, 311, 1009, 364, 11183, 13, 51836], "temperature": + 0.0, "avg_logprob": -0.2774854426761325, "compression_ratio": 1.693103448275862, + "no_speech_prob": 0.31069931387901306}, {"id": 227, "seek": 40356, "start": 404.56, + "end": 405.56, "text": " What is it?", "tokens": [50414, 708, 307, 309, 30, 50464], + "temperature": 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": 1.6, + "no_speech_prob": 0.002844778122380376}, {"id": 228, "seek": 40356, "start": 405.56, + "end": 408.28000000000003, "text": " Cudrant, even though it''s not built on Lucene.", + "tokens": [50464, 383, 532, 7541, 11, 754, 1673, 309, 311, 406, 3094, 322, 9593, + 1450, 13, 50600], "temperature": 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": + 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 229, "seek": 40356, "start": + 408.28000000000003, "end": 409.28000000000003, "text": " Oh, okay.", "tokens": [50600, + 876, 11, 1392, 13, 50650], "temperature": 0.0, "avg_logprob": -0.21318682352701823, + "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 230, "seek": + 40356, "start": 409.28000000000003, "end": 411.6, "text": " Includes fascinating + capabilities.", "tokens": [50650, 7779, 1471, 279, 10343, 10862, 13, 50766], "temperature": + 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": 1.6, "no_speech_prob": + 0.002844778122380376}, {"id": 231, "seek": 40356, "start": 411.6, "end": 412.6, + "text": " Interesting.", "tokens": [50766, 14711, 13, 50816], "temperature": 0.0, + "avg_logprob": -0.21318682352701823, "compression_ratio": 
1.6, "no_speech_prob": + 0.002844778122380376}, {"id": 232, "seek": 40356, "start": 412.6, "end": 414.4, + "text": " So it''s kind of a hybrid approach.", "tokens": [50816, 407, 309, 311, + 733, 295, 257, 13051, 3109, 13, 50906], "temperature": 0.0, "avg_logprob": -0.21318682352701823, + "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 233, "seek": + 40356, "start": 414.4, "end": 416.48, "text": " Making it a contender in those scenarios + too.", "tokens": [50906, 14595, 309, 257, 660, 3216, 294, 729, 15077, 886, 13, 51010], + "temperature": 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": 1.6, + "no_speech_prob": 0.002844778122380376}, {"id": 234, "seek": 40356, "start": 416.48, + "end": 418.4, "text": " So Cudrant''s kind of a wild card.", "tokens": [51010, 407, + 383, 532, 7541, 311, 733, 295, 257, 4868, 2920, 13, 51106], "temperature": 0.0, + "avg_logprob": -0.21318682352701823, "compression_ratio": 1.6, "no_speech_prob": + 0.002844778122380376}, {"id": 235, "seek": 40356, "start": 418.4, "end": 419.4, + "text": " A little bit.", "tokens": [51106, 316, 707, 857, 13, 51156], "temperature": + 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": 1.6, "no_speech_prob": + 0.002844778122380376}, {"id": 236, "seek": 40356, "start": 419.4, "end": 420.4, + "text": " Yeah.", "tokens": [51156, 865, 13, 51206], "temperature": 0.0, "avg_logprob": + -0.21318682352701823, "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, + {"id": 237, "seek": 40356, "start": 420.4, "end": 421.72, "text": " It''s got its + own unique set of features.", "tokens": [51206, 467, 311, 658, 1080, 1065, 3845, + 992, 295, 4122, 13, 51272], "temperature": 0.0, "avg_logprob": -0.21318682352701823, + "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 238, "seek": + 40356, "start": 421.72, "end": 424.8, "text": " And it shows the importance of going + beyond general categories.", "tokens": [51272, 400, 309, 3110, 264, 
7379, 295, 516, + 4399, 2674, 10479, 13, 51426], "temperature": 0.0, "avg_logprob": -0.21318682352701823, + "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 239, "seek": + 40356, "start": 424.8, "end": 425.8, "text": " Yeah.", "tokens": [51426, 865, 13, + 51476], "temperature": 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": + 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 240, "seek": 40356, "start": + 425.8, "end": 429.64, "text": " And really digging into the specific features each + database offers.", "tokens": [51476, 400, 534, 17343, 666, 264, 2685, 4122, 1184, + 8149, 7736, 13, 51668], "temperature": 0.0, "avg_logprob": -0.21318682352701823, + "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 241, "seek": + 40356, "start": 429.64, "end": 430.64, "text": " Absolutely.", "tokens": [51668, + 7021, 13, 51718], "temperature": 0.0, "avg_logprob": -0.21318682352701823, "compression_ratio": + 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 242, "seek": 40356, "start": + 430.64, "end": 433.12, "text": " You can''t just like assume that because it''s + in one category.", "tokens": [51718, 509, 393, 380, 445, 411, 6552, 300, 570, 309, + 311, 294, 472, 7719, 13, 51842], "temperature": 0.0, "avg_logprob": -0.21318682352701823, + "compression_ratio": 1.6, "no_speech_prob": 0.002844778122380376}, {"id": 243, "seek": + 43312, "start": 433.12, "end": 434.12, "text": " Right.", "tokens": [50364, 1779, + 13, 50414], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 244, "seek": 43312, + "start": 434.12, "end": 435.12, "text": " It''s got all the features you need.", + "tokens": [50414, 467, 311, 658, 439, 264, 4122, 291, 643, 13, 50464], "temperature": + 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 245, "seek": 43312, "start": 
435.12, + "end": 436.12, "text": " You got to do your research.", "tokens": [50464, 509, 658, + 281, 360, 428, 2132, 13, 50514], "temperature": 0.0, "avg_logprob": -0.1812015919203169, + "compression_ratio": 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, + {"id": 246, "seek": 43312, "start": 436.12, "end": 437.44, "text": " You got to + look under the hood.", "tokens": [50514, 509, 658, 281, 574, 833, 264, 13376, 13, + 50580], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 247, "seek": 43312, + "start": 437.44, "end": 438.44, "text": " Exactly.", "tokens": [50580, 7587, 13, + 50630], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 248, "seek": 43312, + "start": 438.44, "end": 441.64, "text": " Now what if you need something more advanced?", + "tokens": [50630, 823, 437, 498, 291, 643, 746, 544, 7339, 30, 50790], "temperature": + 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 249, "seek": 43312, "start": 441.64, + "end": 442.64, "text": " Okay.", "tokens": [50790, 1033, 13, 50840], "temperature": + 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 250, "seek": 43312, "start": 442.64, + "end": 443.64, "text": " Like what?", "tokens": [50840, 1743, 437, 30, 50890], "temperature": + 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 251, "seek": 43312, "start": 443.64, + "end": 444.96, "text": " Like support for those late interaction models.", "tokens": + [50890, 1743, 1406, 337, 729, 3469, 9285, 5245, 13, 50956], "temperature": 0.0, + "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, 
"no_speech_prob": + 0.1291946917772293}, {"id": 252, "seek": 43312, "start": 444.96, "end": 446.16, + "text": " Late interaction models, huh?", "tokens": [50956, 31220, 9285, 5245, 11, + 7020, 30, 51016], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 253, "seek": 43312, + "start": 446.16, "end": 447.16, "text": " Yeah.", "tokens": [51016, 865, 13, 51066], + "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 254, "seek": 43312, "start": 447.16, + "end": 448.16, "text": " If you heard of these.", "tokens": [51066, 759, 291, 2198, + 295, 613, 13, 51116], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 255, "seek": 43312, + "start": 448.16, "end": 449.64, "text": " I''ve heard the term, but I''m not really + sure what they were.", "tokens": [51116, 286, 600, 2198, 264, 1433, 11, 457, 286, + 478, 406, 534, 988, 437, 436, 645, 13, 51190], "temperature": 0.0, "avg_logprob": + -0.1812015919203169, "compression_ratio": 1.7701863354037266, "no_speech_prob": + 0.1291946917772293}, {"id": 256, "seek": 43312, "start": 449.64, "end": 450.64, + "text": " Okay.", "tokens": [51190, 1033, 13, 51240], "temperature": 0.0, "avg_logprob": + -0.1812015919203169, "compression_ratio": 1.7701863354037266, "no_speech_prob": + 0.1291946917772293}, {"id": 257, "seek": 43312, "start": 450.64, "end": 454.28000000000003, + "text": " So imagine you''re searching for the perfect pair of red shoes.", "tokens": + [51240, 407, 3811, 291, 434, 10808, 337, 264, 2176, 6119, 295, 2182, 6654, 13, 51422], + "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 258, "seek": 43312, "start": 454.28000000000003, + "end": 
455.28000000000003, "text": " Okay.", "tokens": [51422, 1033, 13, 51472], + "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": 1.7701863354037266, + "no_speech_prob": 0.1291946917772293}, {"id": 259, "seek": 43312, "start": 455.28000000000003, + "end": 456.28000000000003, "text": " I like shoes.", "tokens": [51472, 286, 411, + 6654, 13, 51522], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 260, "seek": 43312, + "start": 456.28000000000003, "end": 458.84000000000003, "text": " But only after + you''ve seen a picture of the outfit you want them to match.", "tokens": [51522, + 583, 787, 934, 291, 600, 1612, 257, 3036, 295, 264, 11263, 291, 528, 552, 281, 2995, + 13, 51650], "temperature": 0.0, "avg_logprob": -0.1812015919203169, "compression_ratio": + 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, {"id": 261, "seek": 43312, + "start": 458.84000000000003, "end": 459.84000000000003, "text": " Oh, I see.", "tokens": + [51650, 876, 11, 286, 536, 13, 51700], "temperature": 0.0, "avg_logprob": -0.1812015919203169, + "compression_ratio": 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, + {"id": 262, "seek": 43312, "start": 459.84000000000003, "end": 461.68, "text": " + So like the context of the search changes.", "tokens": [51700, 407, 411, 264, 4319, + 295, 264, 3164, 2962, 13, 51792], "temperature": 0.0, "avg_logprob": -0.1812015919203169, + "compression_ratio": 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, + {"id": 263, "seek": 43312, "start": 461.68, "end": 462.68, "text": " Exactly.", + "tokens": [51792, 7587, 13, 51842], "temperature": 0.0, "avg_logprob": -0.1812015919203169, + "compression_ratio": 1.7701863354037266, "no_speech_prob": 0.1291946917772293}, + {"id": 264, "seek": 46268, "start": 462.72, "end": 464.16, "text": " And then you + see something you see later on.", "tokens": [50366, 400, 550, 291, 536, 
746, 291, + 536, 1780, 322, 13, 50438], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + {"id": 265, "seek": 46268, "start": 464.16, "end": 465.96, "text": " That''s where + late interaction models come in.", "tokens": [50438, 663, 311, 689, 3469, 9285, + 5245, 808, 294, 13, 50528], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + {"id": 266, "seek": 46268, "start": 465.96, "end": 466.96, "text": " Okay.", "tokens": + [50528, 1033, 13, 50578], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + {"id": 267, "seek": 46268, "start": 466.96, "end": 471.16, "text": " Allowing you + to refine your search based on context that''s only available later in the", "tokens": + [50578, 1057, 9637, 291, 281, 33906, 428, 3164, 2361, 322, 4319, 300, 311, 787, + 2435, 1780, 294, 264, 50788], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + {"id": 268, "seek": 46268, "start": 471.16, "end": 472.16, "text": " process.", + "tokens": [50788, 1399, 13, 50838], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + {"id": 269, "seek": 46268, "start": 472.16, "end": 474.6, "text": " So it''s like + a more dynamic way of searching.", "tokens": [50838, 407, 309, 311, 411, 257, 544, + 8546, 636, 295, 10808, 13, 50960], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + {"id": 270, "seek": 46268, "start": 474.6, "end": 476.0, "text": " It is a more + dynamic way of searching.", "tokens": [50960, 467, 307, 257, 544, 8546, 636, 295, + 10808, 13, 51030], 
"temperature": 0.0, "avg_logprob": -0.24896220288245507, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, {"id": 271, "seek": + 46268, "start": 476.0, "end": 477.0, "text": " Interesting.", "tokens": [51030, + 14711, 13, 51080], "temperature": 0.0, "avg_logprob": -0.24896220288245507, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, {"id": 272, "seek": + 46268, "start": 477.0, "end": 479.36, "text": " And it requires a different level + of database support.", "tokens": [51080, 400, 309, 7029, 257, 819, 1496, 295, 8149, + 1406, 13, 51198], "temperature": 0.0, "avg_logprob": -0.24896220288245507, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, {"id": 273, "seek": + 46268, "start": 479.36, "end": 480.36, "text": " I bet.", "tokens": [51198, 286, + 778, 13, 51248], "temperature": 0.0, "avg_logprob": -0.24896220288245507, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, {"id": 274, "seek": + 46268, "start": 480.36, "end": 485.4, "text": " And Dmitry points to QDrent or Vespa + as potential solutions.", "tokens": [51248, 400, 413, 3508, 627, 2793, 281, 1249, + 35, 1753, 420, 691, 279, 4306, 382, 3995, 6547, 13, 51500], "temperature": 0.0, + "avg_logprob": -0.24896220288245507, "compression_ratio": 1.7896440129449838, "no_speech_prob": + 0.057629648596048355}, {"id": 275, "seek": 46268, "start": 485.4, "end": 486.4, + "text": " Okay.", "tokens": [51500, 1033, 13, 51550], "temperature": 0.0, "avg_logprob": + -0.24896220288245507, "compression_ratio": 1.7896440129449838, "no_speech_prob": + 0.057629648596048355}, {"id": 276, "seek": 46268, "start": 486.4, "end": 488.6, + "text": " So they can handle those late interactions.", "tokens": [51550, 407, 436, + 393, 4813, 729, 3469, 13280, 13, 51660], "temperature": 0.0, "avg_logprob": -0.24896220288245507, + "compression_ratio": 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, + 
{"id": 277, "seek": 46268, "start": 488.6, "end": 490.72, "text": " Because they + offer that support natively.", "tokens": [51660, 1436, 436, 2626, 300, 1406, 8470, + 356, 13, 51766], "temperature": 0.0, "avg_logprob": -0.24896220288245507, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, {"id": 278, "seek": + 46268, "start": 490.72, "end": 492.24, "text": " So you don''t have to hack it together + yourself.", "tokens": [51766, 407, 291, 500, 380, 362, 281, 10339, 309, 1214, 1803, + 13, 51842], "temperature": 0.0, "avg_logprob": -0.24896220288245507, "compression_ratio": + 1.7896440129449838, "no_speech_prob": 0.057629648596048355}, {"id": 279, "seek": + 49224, "start": 492.28000000000003, "end": 493.28000000000003, "text": " Exactly.", + "tokens": [50366, 7587, 13, 50416], "temperature": 0.0, "avg_logprob": -0.19433695475260418, + "compression_ratio": 1.5472312703583062, "no_speech_prob": 0.002053606091067195}, + {"id": 280, "seek": 49224, "start": 493.28000000000003, "end": 494.28000000000003, + "text": " That''s good to know.", "tokens": [50416, 663, 311, 665, 281, 458, 13, + 50466], "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": + 1.5472312703583062, "no_speech_prob": 0.002053606091067195}, {"id": 281, "seek": + 49224, "start": 494.28000000000003, "end": 500.32, "text": " So choosing a database + that can handle those complexities is critical for performance and", "tokens": [50466, + 407, 10875, 257, 8149, 300, 393, 4813, 729, 48705, 307, 4924, 337, 3389, 293, 50768], + "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 282, "seek": 49224, "start": 500.32, + "end": 501.32, "text": " efficiency.", "tokens": [50768, 10493, 13, 50818], "temperature": + 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 283, "seek": 49224, 
"start": 501.32, + "end": 503.72, "text": " You don''t want your search to be slow and clunky.", "tokens": + [50818, 509, 500, 380, 528, 428, 3164, 281, 312, 2964, 293, 596, 25837, 13, 50938], + "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 284, "seek": 49224, "start": 503.72, + "end": 505.92, "text": " Especially if you''re dealing with a lot of data.", "tokens": + [50938, 8545, 498, 291, 434, 6260, 365, 257, 688, 295, 1412, 13, 51048], "temperature": + 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 285, "seek": 49224, "start": 505.92, + "end": 506.92, "text": " Right.", "tokens": [51048, 1779, 13, 51098], "temperature": + 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 286, "seek": 49224, "start": 506.92, + "end": 508.72, "text": " Or if you need those results in real time.", "tokens": + [51098, 1610, 498, 291, 643, 729, 3542, 294, 957, 565, 13, 51188], "temperature": + 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 287, "seek": 49224, "start": 508.72, + "end": 509.92, "text": " But it doesn''t stop there.", "tokens": [51188, 583, 309, + 1177, 380, 1590, 456, 13, 51248], "temperature": 0.0, "avg_logprob": -0.19433695475260418, + "compression_ratio": 1.5472312703583062, "no_speech_prob": 0.002053606091067195}, + {"id": 288, "seek": 49224, "start": 509.92, "end": 510.92, "text": " There''s more.", + "tokens": [51248, 821, 311, 544, 13, 51298], "temperature": 0.0, "avg_logprob": + -0.19433695475260418, "compression_ratio": 1.5472312703583062, "no_speech_prob": + 0.002053606091067195}, {"id": 289, "seek": 49224, "start": 510.92, "end": 513.5600000000001, + "text": " The next step into Dmitry''s roadmap is super 
important.", "tokens": [51298, + 440, 958, 1823, 666, 413, 3508, 627, 311, 35738, 307, 1687, 1021, 13, 51430], "temperature": + 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 290, "seek": 49224, "start": 513.5600000000001, + "end": 514.5600000000001, "text": " Okay.", "tokens": [51430, 1033, 13, 51480], + "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 291, "seek": 49224, "start": 514.5600000000001, + "end": 515.5600000000001, "text": " Hit me.", "tokens": [51480, 9217, 385, 13, 51530], + "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 292, "seek": 49224, "start": 515.5600000000001, + "end": 516.5600000000001, "text": " Considering latency.", "tokens": [51530, 33854, + 27043, 13, 51580], "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": + 1.5472312703583062, "no_speech_prob": 0.002053606091067195}, {"id": 293, "seek": + 49224, "start": 516.5600000000001, "end": 517.5600000000001, "text": " Latency, + okay.", "tokens": [51580, 7354, 3020, 11, 1392, 13, 51630], "temperature": 0.0, + "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, "no_speech_prob": + 0.002053606091067195}, {"id": 294, "seek": 49224, "start": 517.5600000000001, "end": + 518.5600000000001, "text": " And those query per second abans?", "tokens": [51630, + 400, 729, 14581, 680, 1150, 410, 599, 30, 51680], "temperature": 0.0, "avg_logprob": + -0.19433695475260418, "compression_ratio": 1.5472312703583062, "no_speech_prob": + 0.002053606091067195}, {"id": 295, "seek": 49224, "start": 518.5600000000001, "end": + 519.5600000000001, "text": " Oh, yes.", "tokens": [51680, 876, 11, 2086, 13, 51730], + "temperature": 0.0, "avg_logprob": -0.19433695475260418, 
"compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 296, "seek": 49224, "start": 519.5600000000001, + "end": 520.5600000000001, "text": " QPS.", "tokens": [51730, 1249, 6273, 13, 51780], + "temperature": 0.0, "avg_logprob": -0.19433695475260418, "compression_ratio": 1.5472312703583062, + "no_speech_prob": 0.002053606091067195}, {"id": 297, "seek": 52056, "start": 520.64, + "end": 522.4799999999999, "text": " Can make or break your application?", "tokens": + [50368, 1664, 652, 420, 1821, 428, 3861, 30, 50460], "temperature": 0.0, "avg_logprob": + -0.1773903483436221, "compression_ratio": 1.631578947368421, "no_speech_prob": 0.02104545198380947}, + {"id": 298, "seek": 52056, "start": 522.4799999999999, "end": 523.3199999999999, + "text": " You''re telling me.", "tokens": [50460, 509, 434, 3585, 385, 13, 50502], + "temperature": 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 299, "seek": 52056, "start": 523.3199999999999, + "end": 524.8, "text": " If your database is slow.", "tokens": [50502, 759, 428, + 8149, 307, 2964, 13, 50576], "temperature": 0.0, "avg_logprob": -0.1773903483436221, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.02104545198380947}, + {"id": 300, "seek": 52056, "start": 524.8, "end": 525.8, "text": " Yeah.", "tokens": + [50576, 865, 13, 50626], "temperature": 0.0, "avg_logprob": -0.1773903483436221, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.02104545198380947}, + {"id": 301, "seek": 52056, "start": 525.8, "end": 527.4799999999999, "text": " Or + it can''t handle the volume of queries.", "tokens": [50626, 1610, 309, 393, 380, + 4813, 264, 5523, 295, 24109, 13, 50710], "temperature": 0.0, "avg_logprob": -0.1773903483436221, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.02104545198380947}, + {"id": 302, "seek": 52056, "start": 527.4799999999999, "end": 529.16, "text": " + It''s 
going to be a bad experience for the user.", "tokens": [50710, 467, 311, 516, + 281, 312, 257, 1578, 1752, 337, 264, 4195, 13, 50794], "temperature": 0.0, "avg_logprob": + -0.1773903483436221, "compression_ratio": 1.631578947368421, "no_speech_prob": 0.02104545198380947}, + {"id": 303, "seek": 52056, "start": 529.16, "end": 530.16, "text": " It''s going + to be a disaster.", "tokens": [50794, 467, 311, 516, 281, 312, 257, 11293, 13, 50844], + "temperature": 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 304, "seek": 52056, "start": 530.16, + "end": 532.0, "text": " So you''ve got to think about those things up front.", "tokens": + [50844, 407, 291, 600, 658, 281, 519, 466, 729, 721, 493, 1868, 13, 50936], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 305, "seek": 52056, "start": 532.0, + "end": 533.0, "text": " Absolutely.", "tokens": [50936, 7021, 13, 50986], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 306, "seek": 52056, "start": 533.0, + "end": 535.3199999999999, "text": " And choose a database that can handle the load.", + "tokens": [50986, 400, 2826, 257, 8149, 300, 393, 4813, 264, 3677, 13, 51102], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 307, "seek": 52056, "start": 535.3199999999999, + "end": 537.5999999999999, "text": " If high performance is the name of the game.", + "tokens": [51102, 759, 1090, 3389, 307, 264, 1315, 295, 264, 1216, 13, 51216], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 308, "seek": 52056, "start": 537.5999999999999, + "end": 538.5999999999999, 
"text": " Yeah.", "tokens": [51216, 865, 13, 51266], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 309, "seek": 52056, "start": 538.5999999999999, + "end": 544.4, "text": " You''ll want to explore solutions like GSI, APU, Vespa, + or hyperspace.", "tokens": [51266, 509, 603, 528, 281, 6839, 6547, 411, 460, 20262, + 11, 5372, 52, 11, 691, 279, 4306, 11, 420, 7420, 433, 17940, 13, 51556], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 310, "seek": 52056, "start": 544.4, + "end": 545.4, "text": " Got it.", "tokens": [51556, 5803, 309, 13, 51606], "temperature": + 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": 1.631578947368421, + "no_speech_prob": 0.02104545198380947}, {"id": 311, "seek": 52056, "start": 545.4, + "end": 548.2399999999999, "text": " In fact, Dmitry even shared an anecdote about + a CTO.", "tokens": [51606, 682, 1186, 11, 413, 3508, 627, 754, 5507, 364, 49845, + 466, 257, 383, 15427, 13, 51748], "temperature": 0.0, "avg_logprob": -0.1773903483436221, + "compression_ratio": 1.631578947368421, "no_speech_prob": 0.02104545198380947}, + {"id": 312, "seek": 52056, "start": 548.2399999999999, "end": 549.88, "text": " + Oh, I love a good anecdote.", "tokens": [51748, 876, 11, 286, 959, 257, 665, 49845, + 13, 51830], "temperature": 0.0, "avg_logprob": -0.1773903483436221, "compression_ratio": + 1.631578947368421, "no_speech_prob": 0.02104545198380947}, {"id": 313, "seek": 54988, + "start": 549.88, "end": 555.8, "text": " You''ll confess that no open source vector + database could handle their extreme workload.", "tokens": [50364, 509, 603, 19367, + 300, 572, 1269, 4009, 8062, 8149, 727, 4813, 641, 8084, 20139, 13, 50660], "temperature": + 0.0, "avg_logprob": -0.23072041810013866, "compression_ratio": 1.612794612794613, + "no_speech_prob": 
0.07140126824378967}, {"id": 314, "seek": 54988, "start": 555.8, + "end": 556.8, "text": " Wow.", "tokens": [50660, 3153, 13, 50710], "temperature": + 0.0, "avg_logprob": -0.23072041810013866, "compression_ratio": 1.612794612794613, + "no_speech_prob": 0.07140126824378967}, {"id": 315, "seek": 54988, "start": 556.8, + "end": 558.48, "text": " So they had to go with a commercial solution.", "tokens": + [50710, 407, 436, 632, 281, 352, 365, 257, 6841, 3827, 13, 50794], "temperature": + 0.0, "avg_logprob": -0.23072041810013866, "compression_ratio": 1.612794612794613, + "no_speech_prob": 0.07140126824378967}, {"id": 316, "seek": 54988, "start": 558.48, + "end": 559.68, "text": " They''d find something else.", "tokens": [50794, 814, 1116, + 915, 746, 1646, 13, 50854], "temperature": 0.0, "avg_logprob": -0.23072041810013866, + "compression_ratio": 1.612794612794613, "no_speech_prob": 0.07140126824378967}, + {"id": 317, "seek": 54988, "start": 559.68, "end": 560.68, "text": " That''s interesting.", + "tokens": [50854, 663, 311, 1880, 13, 50904], "temperature": 0.0, "avg_logprob": + -0.23072041810013866, "compression_ratio": 1.612794612794613, "no_speech_prob": + 0.07140126824378967}, {"id": 318, "seek": 54988, "start": 560.68, "end": 562.48, + "text": " Choosing wisely is essential.", "tokens": [50904, 12366, 6110, 37632, + 307, 7115, 13, 50994], "temperature": 0.0, "avg_logprob": -0.23072041810013866, + "compression_ratio": 1.612794612794613, "no_speech_prob": 0.07140126824378967}, + {"id": 319, "seek": 54988, "start": 562.48, "end": 564.64, "text": " You can''t + just pick the first one you see.", "tokens": [50994, 509, 393, 380, 445, 1888, 264, + 700, 472, 291, 536, 13, 51102], "temperature": 0.0, "avg_logprob": -0.23072041810013866, + "compression_ratio": 1.612794612794613, "no_speech_prob": 0.07140126824378967}, + {"id": 320, "seek": 54988, "start": 564.64, "end": 565.64, "text": " No.", "tokens": + [51102, 883, 13, 51152], "temperature": 0.0, "avg_logprob": 
-0.23072041810013866, + "compression_ratio": 1.612794612794613, "no_speech_prob": 0.07140126824378967}, + {"id": 321, "seek": 54988, "start": 565.64, "end": 566.64, "text": " You got to + your homework.", "tokens": [51152, 509, 658, 281, 428, 14578, 13, 51202], "temperature": + 0.0, "avg_logprob": -0.23072041810013866, "compression_ratio": 1.612794612794613, + "no_speech_prob": 0.07140126824378967}, {"id": 322, "seek": 54988, "start": 566.64, + "end": 568.12, "text": " And think about your long term needs.", "tokens": [51202, + 400, 519, 466, 428, 938, 1433, 2203, 13, 51276], "temperature": 0.0, "avg_logprob": + -0.23072041810013866, "compression_ratio": 1.612794612794613, "no_speech_prob": + 0.07140126824378967}, {"id": 323, "seek": 54988, "start": 568.12, "end": 571.84, + "text": " So the takeaway here is you need to think strategically.", "tokens": [51276, + 407, 264, 30681, 510, 307, 291, 643, 281, 519, 38061, 13, 51462], "temperature": + 0.0, "avg_logprob": -0.23072041810013866, "compression_ratio": 1.612794612794613, + "no_speech_prob": 0.07140126824378967}, {"id": 324, "seek": 54988, "start": 571.84, + "end": 572.84, "text": " Yep.", "tokens": [51462, 7010, 13, 51512], "temperature": + 0.0, "avg_logprob": -0.23072041810013866, "compression_ratio": 1.612794612794613, + "no_speech_prob": 0.07140126824378967}, {"id": 325, "seek": 54988, "start": 572.84, + "end": 578.12, "text": " Do you invest the engineering time to set up and maintain + an open source database?", "tokens": [51512, 1144, 291, 1963, 264, 7043, 565, 281, + 992, 493, 293, 6909, 364, 1269, 4009, 8149, 30, 51776], "temperature": 0.0, "avg_logprob": + -0.23072041810013866, "compression_ratio": 1.612794612794613, "no_speech_prob": + 0.07140126824378967}, {"id": 326, "seek": 54988, "start": 578.12, "end": 579.12, + "text": " Right.", "tokens": [51776, 1779, 13, 51826], "temperature": 0.0, "avg_logprob": + -0.23072041810013866, "compression_ratio": 1.612794612794613, "no_speech_prob": + 
0.07140126824378967}, {"id": 327, "seek": 57912, "start": 579.12, "end": 584.24, + "text": " Or do you go with the convenience and potentially higher costs of a cloud + solution?", "tokens": [50364, 1610, 360, 291, 352, 365, 264, 19283, 293, 7263, 2946, + 5497, 295, 257, 4588, 3827, 30, 50620], "temperature": 0.0, "avg_logprob": -0.22903351022415802, + "compression_ratio": 1.586466165413534, "no_speech_prob": 0.0004534748732112348}, + {"id": 328, "seek": 57912, "start": 584.24, "end": 585.24, "text": " Right.", "tokens": + [50620, 1779, 13, 50670], "temperature": 0.0, "avg_logprob": -0.22903351022415802, + "compression_ratio": 1.586466165413534, "no_speech_prob": 0.0004534748732112348}, + {"id": 329, "seek": 57912, "start": 585.24, "end": 586.24, "text": " It''s a classic + trade off.", "tokens": [50670, 467, 311, 257, 7230, 4923, 766, 13, 50720], "temperature": + 0.0, "avg_logprob": -0.22903351022415802, "compression_ratio": 1.586466165413534, + "no_speech_prob": 0.0004534748732112348}, {"id": 330, "seek": 57912, "start": 586.24, + "end": 587.24, "text": " There''s no right or wrong answer.", "tokens": [50720, + 821, 311, 572, 558, 420, 2085, 1867, 13, 50770], "temperature": 0.0, "avg_logprob": + -0.22903351022415802, "compression_ratio": 1.586466165413534, "no_speech_prob": + 0.0004534748732112348}, {"id": 331, "seek": 57912, "start": 587.24, "end": 588.24, + "text": " It depends on your situation.", "tokens": [50770, 467, 5946, 322, 428, + 2590, 13, 50820], "temperature": 0.0, "avg_logprob": -0.22903351022415802, "compression_ratio": + 1.586466165413534, "no_speech_prob": 0.0004534748732112348}, {"id": 332, "seek": + 57912, "start": 588.24, "end": 592.84, "text": " It''s all about finding the balance + that works best for your specific situation.", "tokens": [50820, 467, 311, 439, + 466, 5006, 264, 4772, 300, 1985, 1151, 337, 428, 2685, 2590, 13, 51050], "temperature": + 0.0, "avg_logprob": -0.22903351022415802, "compression_ratio": 1.586466165413534, + 
"no_speech_prob": 0.0004534748732112348}, {"id": 333, "seek": 57912, "start": 592.84, + "end": 594.32, "text": " Absolutely.", "tokens": [51050, 7021, 13, 51124], "temperature": + 0.0, "avg_logprob": -0.22903351022415802, "compression_ratio": 1.586466165413534, + "no_speech_prob": 0.0004534748732112348}, {"id": 334, "seek": 57912, "start": 594.32, + "end": 599.5600000000001, "text": " And there are a lot of great cloud and API based + options out there.", "tokens": [51124, 400, 456, 366, 257, 688, 295, 869, 4588, + 293, 9362, 2361, 3956, 484, 456, 13, 51386], "temperature": 0.0, "avg_logprob": + -0.22903351022415802, "compression_ratio": 1.586466165413534, "no_speech_prob": + 0.0004534748732112348}, {"id": 335, "seek": 57912, "start": 599.5600000000001, "end": + 600.5600000000001, "text": " Like what?", "tokens": [51386, 1743, 437, 30, 51436], + "temperature": 0.0, "avg_logprob": -0.22903351022415802, "compression_ratio": 1.586466165413534, + "no_speech_prob": 0.0004534748732112348}, {"id": 336, "seek": 57912, "start": 600.5600000000001, + "end": 607.76, "text": " Like Cosmos DB, Vertex AI, Pine Cone Cloud, Weeveate Cloud, + and other.", "tokens": [51436, 1743, 15855, 3415, 26754, 11, 21044, 3121, 7318, + 11, 33531, 383, 546, 8061, 11, 492, 68, 303, 473, 8061, 11, 293, 661, 13, 51796], + "temperature": 0.0, "avg_logprob": -0.22903351022415802, "compression_ratio": 1.586466165413534, + "no_speech_prob": 0.0004534748732112348}, {"id": 337, "seek": 60776, "start": 607.76, + "end": 609.4399999999999, "text": " But there''s no shortage of options.", "tokens": + [50364, 583, 456, 311, 572, 24708, 295, 3956, 13, 50448], "temperature": 0.0, "avg_logprob": + -0.21822113037109375, "compression_ratio": 1.6396103896103895, "no_speech_prob": + 0.23578330874443054}, {"id": 338, "seek": 60776, "start": 609.4399999999999, "end": + 610.68, "text": " There''s a lot to choose from.", "tokens": [50448, 821, 311, 257, + 688, 281, 2826, 490, 13, 50510], "temperature": 0.0, 
"avg_logprob": -0.21822113037109375, + "compression_ratio": 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, + {"id": 339, "seek": 60776, "start": 610.68, "end": 612.0, "text": " It''s a good + problem to have, right?", "tokens": [50510, 467, 311, 257, 665, 1154, 281, 362, + 11, 558, 30, 50576], "temperature": 0.0, "avg_logprob": -0.21822113037109375, "compression_ratio": + 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, {"id": 340, "seek": + 60776, "start": 612.0, "end": 613.2, "text": " It is a good problem to have.", "tokens": + [50576, 467, 307, 257, 665, 1154, 281, 362, 13, 50636], "temperature": 0.0, "avg_logprob": + -0.21822113037109375, "compression_ratio": 1.6396103896103895, "no_speech_prob": + 0.23578330874443054}, {"id": 341, "seek": 60776, "start": 613.2, "end": 615.4, "text": + " Better than having no options at all.", "tokens": [50636, 15753, 813, 1419, 572, + 3956, 412, 439, 13, 50746], "temperature": 0.0, "avg_logprob": -0.21822113037109375, + "compression_ratio": 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, + {"id": 342, "seek": 60776, "start": 615.4, "end": 617.24, "text": " And we love + hearing from our community.", "tokens": [50746, 400, 321, 959, 4763, 490, 527, 1768, + 13, 50838], "temperature": 0.0, "avg_logprob": -0.21822113037109375, "compression_ratio": + 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, {"id": 343, "seek": + 60776, "start": 617.24, "end": 619.48, "text": " Oh yes, our listeners are the best.", + "tokens": [50838, 876, 2086, 11, 527, 23274, 366, 264, 1151, 13, 50950], "temperature": + 0.0, "avg_logprob": -0.21822113037109375, "compression_ratio": 1.6396103896103895, + "no_speech_prob": 0.23578330874443054}, {"id": 344, "seek": 60776, "start": 619.48, + "end": 624.4399999999999, "text": " One reader, Matt Collins, suggested exploring + extensions like PG Vector.", "tokens": [50950, 1485, 15149, 11, 7397, 27973, 11, + 10945, 12736, 25129, 411, 40975, 691, 20814, 13, 51198], 
"temperature": 0.0, "avg_logprob": + -0.21822113037109375, "compression_ratio": 1.6396103896103895, "no_speech_prob": + 0.23578330874443054}, {"id": 345, "seek": 60776, "start": 624.4399999999999, "end": + 625.4399999999999, "text": " Did you vector?", "tokens": [51198, 2589, 291, 8062, + 30, 51248], "temperature": 0.0, "avg_logprob": -0.21822113037109375, "compression_ratio": + 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, {"id": 346, "seek": + 60776, "start": 625.4399999999999, "end": 626.4399999999999, "text": " Okay.", "tokens": + [51248, 1033, 13, 51298], "temperature": 0.0, "avg_logprob": -0.21822113037109375, + "compression_ratio": 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, + {"id": 347, "seek": 60776, "start": 626.4399999999999, "end": 628.3199999999999, + "text": " Which adds vector search to Postgresql.", "tokens": [51298, 3013, 10860, + 8062, 3164, 281, 10223, 45189, 80, 75, 13, 51392], "temperature": 0.0, "avg_logprob": + -0.21822113037109375, "compression_ratio": 1.6396103896103895, "no_speech_prob": + 0.23578330874443054}, {"id": 348, "seek": 60776, "start": 628.3199999999999, "end": + 632.12, "text": " Oh, so you can just add it onto your existing Postgres database.", + "tokens": [51392, 876, 11, 370, 291, 393, 445, 909, 309, 3911, 428, 6741, 10223, + 45189, 8149, 13, 51582], "temperature": 0.0, "avg_logprob": -0.21822113037109375, + "compression_ratio": 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, + {"id": 349, "seek": 60776, "start": 632.12, "end": 633.12, "text": " Exactly.", + "tokens": [51582, 7587, 13, 51632], "temperature": 0.0, "avg_logprob": -0.21822113037109375, + "compression_ratio": 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, + {"id": 350, "seek": 60776, "start": 633.12, "end": 634.12, "text": " That''s pretty + cool.", "tokens": [51632, 663, 311, 1238, 1627, 13, 51682], "temperature": 0.0, + "avg_logprob": -0.21822113037109375, "compression_ratio": 1.6396103896103895, 
"no_speech_prob": + 0.23578330874443054}, {"id": 351, "seek": 60776, "start": 634.12, "end": 635.12, + "text": " It''s a really clever solution.", "tokens": [51682, 467, 311, 257, 534, + 13494, 3827, 13, 51732], "temperature": 0.0, "avg_logprob": -0.21822113037109375, + "compression_ratio": 1.6396103896103895, "no_speech_prob": 0.23578330874443054}, + {"id": 352, "seek": 63512, "start": 635.12, "end": 637.88, "text": " You have to + rip and replace your whole infrastructure.", "tokens": [50364, 509, 362, 281, 12782, + 293, 7406, 428, 1379, 6896, 13, 50502], "temperature": 0.0, "avg_logprob": -0.2464251072286702, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, + {"id": 353, "seek": 63512, "start": 637.88, "end": 642.88, "text": " And it speaks + to the constantly evolving nature of the vector database landscape.", "tokens": + [50502, 400, 309, 10789, 281, 264, 6460, 21085, 3687, 295, 264, 8062, 8149, 9661, + 13, 50752], "temperature": 0.0, "avg_logprob": -0.2464251072286702, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, {"id": 354, "seek": 63512, + "start": 642.88, "end": 643.96, "text": " It''s a fast moving field.", "tokens": + [50752, 467, 311, 257, 2370, 2684, 2519, 13, 50806], "temperature": 0.0, "avg_logprob": + -0.2464251072286702, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.5175851583480835}, {"id": 355, "seek": 63512, "start": 643.96, "end": 645.68, + "text": " There''s always something new happening.", "tokens": [50806, 821, 311, + 1009, 746, 777, 2737, 13, 50892], "temperature": 0.0, "avg_logprob": -0.2464251072286702, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, + {"id": 356, "seek": 63512, "start": 645.68, "end": 647.92, "text": " You''ve been + in new solutions emerging.", "tokens": [50892, 509, 600, 668, 294, 777, 6547, 14989, + 13, 51004], "temperature": 0.0, "avg_logprob": -0.2464251072286702, "compression_ratio": + 
1.6666666666666667, "no_speech_prob": 0.5175851583480835}, {"id": 357, "seek": 63512, + "start": 647.92, "end": 648.92, "text": " Speaking of evolution.", "tokens": [51004, + 13069, 295, 9303, 13, 51054], "temperature": 0.0, "avg_logprob": -0.2464251072286702, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, + {"id": 358, "seek": 63512, "start": 648.92, "end": 650.52, "text": " Oh, this is + where it gets really interesting.", "tokens": [51054, 876, 11, 341, 307, 689, 309, + 2170, 534, 1880, 13, 51134], "temperature": 0.0, "avg_logprob": -0.2464251072286702, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, + {"id": 359, "seek": 63512, "start": 650.52, "end": 652.76, "text": " Dimitri paints + a fascinating picture of the future.", "tokens": [51134, 20975, 270, 470, 28076, + 257, 10343, 3036, 295, 264, 2027, 13, 51246], "temperature": 0.0, "avg_logprob": + -0.2464251072286702, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.5175851583480835}, {"id": 360, "seek": 63512, "start": 652.76, "end": 654.44, + "text": " I can''t wait to hear this.", "tokens": [51246, 286, 393, 380, 1699, 281, + 1568, 341, 13, 51330], "temperature": 0.0, "avg_logprob": -0.2464251072286702, "compression_ratio": + 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, {"id": 361, "seek": 63512, + "start": 654.44, "end": 658.5600000000001, "text": " He believes the future lies + in what he calls neural search frameworks.", "tokens": [51330, 634, 12307, 264, + 2027, 9134, 294, 437, 415, 5498, 18161, 3164, 29834, 13, 51536], "temperature": + 0.0, "avg_logprob": -0.2464251072286702, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.5175851583480835}, {"id": 362, "seek": 63512, "start": 658.5600000000001, + "end": 659.72, "text": " Oh, wow.", "tokens": [51536, 876, 11, 6076, 13, 51594], + "temperature": 0.0, "avg_logprob": -0.2464251072286702, "compression_ratio": 1.6666666666666667, + 
"no_speech_prob": 0.5175851583480835}, {"id": 363, "seek": 63512, "start": 659.72, + "end": 664.08, "text": " These frameworks could revolutionize how we build AI-powered + applications.", "tokens": [51594, 1981, 29834, 727, 8894, 1125, 577, 321, 1322, + 7318, 12, 27178, 5821, 13, 51812], "temperature": 0.0, "avg_logprob": -0.2464251072286702, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, + {"id": 364, "seek": 63512, "start": 664.08, "end": 665.08, "text": " Okay.", "tokens": + [51812, 1033, 13, 51862], "temperature": 0.0, "avg_logprob": -0.2464251072286702, + "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.5175851583480835}, + {"id": 365, "seek": 66508, "start": 665.08, "end": 669.24, "text": " And then we + have a system that streamlines the entire process from data modeling and embedding", + "tokens": [50364, 400, 550, 321, 362, 257, 1185, 300, 4309, 11045, 264, 2302, 1399, + 490, 1412, 15983, 293, 12240, 3584, 50572], "temperature": 0.0, "avg_logprob": -0.25581640355727253, + "compression_ratio": 1.67595818815331, "no_speech_prob": 0.0053773801773786545}, + {"id": 366, "seek": 66508, "start": 669.24, "end": 672.72, "text": " selection to + evaluation and scaling.", "tokens": [50572, 9450, 281, 13344, 293, 21589, 13, 50746], + "temperature": 0.0, "avg_logprob": -0.25581640355727253, "compression_ratio": 1.67595818815331, + "no_speech_prob": 0.0053773801773786545}, {"id": 367, "seek": 66508, "start": 672.72, + "end": 676.76, "text": " So instead of wrestling with the complexities of choosing + and integrating all the different", "tokens": [50746, 407, 2602, 295, 19274, 365, + 264, 48705, 295, 10875, 293, 26889, 439, 264, 819, 50948], "temperature": 0.0, "avg_logprob": + -0.25581640355727253, "compression_ratio": 1.67595818815331, "no_speech_prob": 0.0053773801773786545}, + {"id": 368, "seek": 66508, "start": 676.76, "end": 682.6800000000001, "text": " + components, it would be like having an intelligent assistant 
guiding you through + building a search", "tokens": [50948, 6677, 11, 309, 576, 312, 411, 1419, 364, 13232, + 10994, 25061, 291, 807, 2390, 257, 3164, 51244], "temperature": 0.0, "avg_logprob": + -0.25581640355727253, "compression_ratio": 1.67595818815331, "no_speech_prob": 0.0053773801773786545}, + {"id": 369, "seek": 66508, "start": 682.6800000000001, "end": 686.5600000000001, + "text": " application no matter what database technology you''re using.", "tokens": + [51244, 3861, 572, 1871, 437, 8149, 2899, 291, 434, 1228, 13, 51438], "temperature": + 0.0, "avg_logprob": -0.25581640355727253, "compression_ratio": 1.67595818815331, + "no_speech_prob": 0.0053773801773786545}, {"id": 370, "seek": 66508, "start": 686.5600000000001, + "end": 687.5600000000001, "text": " Exactly.", "tokens": [51438, 7587, 13, 51488], + "temperature": 0.0, "avg_logprob": -0.25581640355727253, "compression_ratio": 1.67595818815331, + "no_speech_prob": 0.0053773801773786545}, {"id": 371, "seek": 66508, "start": 687.5600000000001, + "end": 691.1600000000001, "text": " And this vision ties in nicely with the concept + of compound AI systems.", "tokens": [51488, 400, 341, 5201, 14039, 294, 9594, 365, + 264, 3410, 295, 14154, 7318, 3652, 13, 51668], "temperature": 0.0, "avg_logprob": + -0.25581640355727253, "compression_ratio": 1.67595818815331, "no_speech_prob": 0.0053773801773786545}, + {"id": 372, "seek": 66508, "start": 691.1600000000001, "end": 692.1600000000001, + "text": " Oh, interesting.", "tokens": [51668, 876, 11, 1880, 13, 51718], "temperature": + 0.0, "avg_logprob": -0.25581640355727253, "compression_ratio": 1.67595818815331, + "no_speech_prob": 0.0053773801773786545}, {"id": 373, "seek": 69216, "start": 692.16, + "end": 698.28, "text": " So where LLMs vector databases and other AI components + work together like a well coordinated", "tokens": [50364, 407, 689, 441, 43, 26386, + 8062, 22380, 293, 661, 7318, 6677, 589, 1214, 411, 257, 731, 29591, 50670], "temperature": + 0.0, 
"avg_logprob": -0.1728930188040448, "compression_ratio": 1.671280276816609, + "no_speech_prob": 0.10033581405878067}, {"id": 374, "seek": 69216, "start": 698.28, + "end": 699.28, "text": " orchestra.", "tokens": [50670, 25280, 13, 50720], "temperature": + 0.0, "avg_logprob": -0.1728930188040448, "compression_ratio": 1.671280276816609, + "no_speech_prob": 0.10033581405878067}, {"id": 375, "seek": 69216, "start": 699.28, + "end": 703.48, "text": " So instead of focusing on the individual instruments, you''re + conducting the entire symphony.", "tokens": [50720, 407, 2602, 295, 8416, 322, 264, + 2609, 12190, 11, 291, 434, 21749, 264, 2302, 6697, 28616, 13, 50930], "temperature": + 0.0, "avg_logprob": -0.1728930188040448, "compression_ratio": 1.671280276816609, + "no_speech_prob": 0.10033581405878067}, {"id": 376, "seek": 69216, "start": 703.48, + "end": 704.48, "text": " Precisely.", "tokens": [50930, 48746, 736, 13, 50980], + "temperature": 0.0, "avg_logprob": -0.1728930188040448, "compression_ratio": 1.671280276816609, + "no_speech_prob": 0.10033581405878067}, {"id": 377, "seek": 69216, "start": 704.48, + "end": 705.9599999999999, "text": " I love that analogy.", "tokens": [50980, 286, + 959, 300, 21663, 13, 51054], "temperature": 0.0, "avg_logprob": -0.1728930188040448, + "compression_ratio": 1.671280276816609, "no_speech_prob": 0.10033581405878067}, + {"id": 378, "seek": 69216, "start": 705.9599999999999, "end": 709.8, "text": " Users + can then focus on the task they''re trying to solve, rather than the technical nuts", + "tokens": [51054, 47092, 393, 550, 1879, 322, 264, 5633, 436, 434, 1382, 281, 5039, + 11, 2831, 813, 264, 6191, 10483, 51246], "temperature": 0.0, "avg_logprob": -0.1728930188040448, + "compression_ratio": 1.671280276816609, "no_speech_prob": 0.10033581405878067}, + {"id": 379, "seek": 69216, "start": 709.8, "end": 710.8, "text": " and bolts.", + "tokens": [51246, 293, 18127, 13, 51296], "temperature": 0.0, "avg_logprob": -0.1728930188040448, + 
"compression_ratio": 1.671280276816609, "no_speech_prob": 0.10033581405878067}, + {"id": 380, "seek": 69216, "start": 710.8, "end": 715.0799999999999, "text": " So + it''s about abstracting away the complexity and empowering users to focus on the + bigger", "tokens": [51296, 407, 309, 311, 466, 12649, 278, 1314, 264, 14024, 293, + 28261, 5022, 281, 1879, 322, 264, 3801, 51510], "temperature": 0.0, "avg_logprob": + -0.1728930188040448, "compression_ratio": 1.671280276816609, "no_speech_prob": 0.10033581405878067}, + {"id": 381, "seek": 69216, "start": 715.0799999999999, "end": 716.0799999999999, + "text": " picture.", "tokens": [51510, 3036, 13, 51560], "temperature": 0.0, "avg_logprob": + -0.1728930188040448, "compression_ratio": 1.671280276816609, "no_speech_prob": 0.10033581405878067}, + {"id": 382, "seek": 69216, "start": 716.0799999999999, "end": 719.0, "text": " It''s + about making AI more accessible and user friendly.", "tokens": [51560, 467, 311, + 466, 1455, 7318, 544, 9515, 293, 4195, 9208, 13, 51706], "temperature": 0.0, "avg_logprob": + -0.1728930188040448, "compression_ratio": 1.671280276816609, "no_speech_prob": 0.10033581405878067}, + {"id": 383, "seek": 71900, "start": 719.0, "end": 722.24, "text": " It''s fascinating + how this all connects to those funding announcements we talked about", "tokens": + [50364, 467, 311, 10343, 577, 341, 439, 16967, 281, 729, 6137, 23785, 321, 2825, + 466, 50526], "temperature": 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": + 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, {"id": 384, "seek": + 71900, "start": 722.24, "end": 723.24, "text": " earlier.", "tokens": [50526, 3071, + 13, 50576], "temperature": 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": + 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, {"id": 385, "seek": + 71900, "start": 723.24, "end": 724.24, "text": " Right.", "tokens": [50576, 1779, + 13, 50626], "temperature": 0.0, "avg_logprob": 
-0.1810559630393982, "compression_ratio": + 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, {"id": 386, "seek": + 71900, "start": 724.24, "end": 730.4, "text": " It seems like the industry might + be moving towards a more unified approach to AI solutions.", "tokens": [50626, 467, + 2544, 411, 264, 3518, 1062, 312, 2684, 3030, 257, 544, 26787, 3109, 281, 7318, 6547, + 13, 50934], "temperature": 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": + 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, {"id": 387, "seek": + 71900, "start": 730.4, "end": 732.16, "text": " That''s a keen observation.", "tokens": + [50934, 663, 311, 257, 20297, 14816, 13, 51022], "temperature": 0.0, "avg_logprob": + -0.1810559630393982, "compression_ratio": 1.6269592476489028, "no_speech_prob": + 0.005485461559146643}, {"id": 388, "seek": 71900, "start": 732.16, "end": 735.32, + "text": " While individual components like vector databases are still important.", + "tokens": [51022, 3987, 2609, 6677, 411, 8062, 22380, 366, 920, 1021, 13, 51180], + "temperature": 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": 1.6269592476489028, + "no_speech_prob": 0.005485461559146643}, {"id": 389, "seek": 71900, "start": 735.32, + "end": 736.32, "text": " For sure.", "tokens": [51180, 1171, 988, 13, 51230], "temperature": + 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": 1.6269592476489028, + "no_speech_prob": 0.005485461559146643}, {"id": 390, "seek": 71900, "start": 736.32, + "end": 740.56, "text": " The future might be about how these pieces fit into a larger + ecosystem.", "tokens": [51230, 440, 2027, 1062, 312, 466, 577, 613, 3755, 3318, + 666, 257, 4833, 11311, 13, 51442], "temperature": 0.0, "avg_logprob": -0.1810559630393982, + "compression_ratio": 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, + {"id": 391, "seek": 71900, "start": 740.56, "end": 743.16, "text": " Yeah, it''s + all about the big picture.", "tokens": [51442, 
865, 11, 309, 311, 439, 466, 264, + 955, 3036, 13, 51572], "temperature": 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": + 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, {"id": 392, "seek": + 71900, "start": 743.16, "end": 745.8, "text": " This brings us to an interesting + question for you, the listener.", "tokens": [51572, 639, 5607, 505, 281, 364, 1880, + 1168, 337, 291, 11, 264, 31569, 13, 51704], "temperature": 0.0, "avg_logprob": -0.1810559630393982, + "compression_ratio": 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, + {"id": 393, "seek": 71900, "start": 745.8, "end": 746.8, "text": " Oh, yes.", "tokens": + [51704, 876, 11, 2086, 13, 51754], "temperature": 0.0, "avg_logprob": -0.1810559630393982, + "compression_ratio": 1.6269592476489028, "no_speech_prob": 0.005485461559146643}, + {"id": 394, "seek": 71900, "start": 746.8, "end": 748.56, "text": " Let''s get our + listeners involved.", "tokens": [51754, 961, 311, 483, 527, 23274, 3288, 13, 51842], + "temperature": 0.0, "avg_logprob": -0.1810559630393982, "compression_ratio": 1.6269592476489028, + "no_speech_prob": 0.005485461559146643}, {"id": 395, "seek": 74856, "start": 748.56, + "end": 753.92, "text": " Do you see neural search frameworks as a complete paradigm + shift?", "tokens": [50364, 1144, 291, 536, 18161, 3164, 29834, 382, 257, 3566, 24709, + 5513, 30, 50632], "temperature": 0.0, "avg_logprob": -0.199441650390625, "compression_ratio": + 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, {"id": 396, "seek": + 74856, "start": 753.92, "end": 759.28, "text": " Or will specialized vector databases + continue to have a distinct role to play?", "tokens": [50632, 1610, 486, 19813, + 8062, 22380, 2354, 281, 362, 257, 10644, 3090, 281, 862, 30, 50900], "temperature": + 0.0, "avg_logprob": -0.199441650390625, "compression_ratio": 1.5654952076677315, + "no_speech_prob": 0.0009701931849122047}, {"id": 397, "seek": 74856, "start": 759.28, + "end": 760.8, 
"text": " It''s tough question.", "tokens": [50900, 467, 311, 4930, + 1168, 13, 50976], "temperature": 0.0, "avg_logprob": -0.199441650390625, "compression_ratio": + 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, {"id": 398, "seek": + 74856, "start": 760.8, "end": 762.0, "text": " It''s something to think about.", + "tokens": [50976, 467, 311, 746, 281, 519, 466, 13, 51036], "temperature": 0.0, + "avg_logprob": -0.199441650390625, "compression_ratio": 1.5654952076677315, "no_speech_prob": + 0.0009701931849122047}, {"id": 399, "seek": 74856, "start": 762.0, "end": 763.4799999999999, + "text": " Let us know your thoughts in the comments.", "tokens": [51036, 961, 505, + 458, 428, 4598, 294, 264, 3053, 13, 51110], "temperature": 0.0, "avg_logprob": -0.199441650390625, + "compression_ratio": 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, + {"id": 400, "seek": 74856, "start": 763.4799999999999, "end": 765.0, "text": " We''d + love to hear from you.", "tokens": [51110, 492, 1116, 959, 281, 1568, 490, 291, + 13, 51186], "temperature": 0.0, "avg_logprob": -0.199441650390625, "compression_ratio": + 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, {"id": 401, "seek": + 74856, "start": 765.0, "end": 770.3599999999999, "text": " But before we get too + caught up in the future, let''s take a step back and revisit one of Dmitry''s", + "tokens": [51186, 583, 949, 321, 483, 886, 5415, 493, 294, 264, 2027, 11, 718, 311, + 747, 257, 1823, 646, 293, 32676, 472, 295, 413, 3508, 627, 311, 51454], "temperature": + 0.0, "avg_logprob": -0.199441650390625, "compression_ratio": 1.5654952076677315, + "no_speech_prob": 0.0009701931849122047}, {"id": 402, "seek": 74856, "start": 770.3599999999999, + "end": 776.04, "text": " key points about the impact of media coverage on perceptions + of technology trends.", "tokens": [51454, 2141, 2793, 466, 264, 2712, 295, 3021, + 9645, 322, 35258, 295, 2899, 13892, 13, 51738], "temperature": 0.0, "avg_logprob": + 
-0.199441650390625, "compression_ratio": 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, + {"id": 403, "seek": 74856, "start": 776.04, "end": 777.04, "text": " Right.", "tokens": + [51738, 1779, 13, 51788], "temperature": 0.0, "avg_logprob": -0.199441650390625, + "compression_ratio": 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, + {"id": 404, "seek": 74856, "start": 777.04, "end": 777.9599999999999, "text": " + That was a really important point.", "tokens": [51788, 663, 390, 257, 534, 1021, + 935, 13, 51834], "temperature": 0.0, "avg_logprob": -0.199441650390625, "compression_ratio": + 1.5654952076677315, "no_speech_prob": 0.0009701931849122047}, {"id": 405, "seek": + 77796, "start": 777.96, "end": 782.52, "text": " It''s a crucial reminder to be + discerning consumers of information, especially in a field", "tokens": [50364, 467, + 311, 257, 11462, 13548, 281, 312, 717, 1776, 773, 11883, 295, 1589, 11, 2318, 294, + 257, 2519, 50592], "temperature": 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": + 1.634441087613293, "no_speech_prob": 0.007103354204446077}, {"id": 406, "seek": + 77796, "start": 782.52, "end": 786.4000000000001, "text": " as dynamic as AI, where + innovation is constant.", "tokens": [50592, 382, 8546, 382, 7318, 11, 689, 8504, + 307, 5754, 13, 50786], "temperature": 0.0, "avg_logprob": -0.15416003382483193, + "compression_ratio": 1.634441087613293, "no_speech_prob": 0.007103354204446077}, + {"id": 407, "seek": 77796, "start": 786.4000000000001, "end": 790.0400000000001, + "text": " What might seem like a decline could actually be a natural evolution.", + "tokens": [50786, 708, 1062, 1643, 411, 257, 15635, 727, 767, 312, 257, 3303, 9303, + 13, 50968], "temperature": 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": + 1.634441087613293, "no_speech_prob": 0.007103354204446077}, {"id": 408, "seek": + 77796, "start": 790.0400000000001, "end": 791.0400000000001, "text": " Interesting.", + "tokens": 
[50968, 14711, 13, 51018], "temperature": 0.0, "avg_logprob": -0.15416003382483193, + "compression_ratio": 1.634441087613293, "no_speech_prob": 0.007103354204446077}, + {"id": 409, "seek": 77796, "start": 791.0400000000001, "end": 794.72, "text": " + As a technology matures and finds its place within a larger ecosystem.", "tokens": + [51018, 1018, 257, 2899, 275, 3377, 293, 10704, 1080, 1081, 1951, 257, 4833, 11311, + 13, 51202], "temperature": 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": + 1.634441087613293, "no_speech_prob": 0.007103354204446077}, {"id": 410, "seek": + 77796, "start": 794.72, "end": 795.72, "text": " Right.", "tokens": [51202, 1779, + 13, 51252], "temperature": 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": + 1.634441087613293, "no_speech_prob": 0.007103354204446077}, {"id": 411, "seek": + 77796, "start": 795.72, "end": 798.0400000000001, "text": " Like a caterpillar transforming + into a butterfly.", "tokens": [51252, 1743, 257, 44982, 30635, 27210, 666, 257, + 22140, 13, 51368], "temperature": 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": + 1.634441087613293, "no_speech_prob": 0.007103354204446077}, {"id": 412, "seek": + 77796, "start": 798.0400000000001, "end": 799.0400000000001, "text": " That''s a + wonderful analogy.", "tokens": [51368, 663, 311, 257, 3715, 21663, 13, 51418], "temperature": + 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": 1.634441087613293, + "no_speech_prob": 0.007103354204446077}, {"id": 413, "seek": 77796, "start": 799.0400000000001, + "end": 802.8000000000001, "text": " It''s still the same creature, just in a more + advanced and beautiful form.", "tokens": [51418, 467, 311, 920, 264, 912, 12797, + 11, 445, 294, 257, 544, 7339, 293, 2238, 1254, 13, 51606], "temperature": 0.0, "avg_logprob": + -0.15416003382483193, "compression_ratio": 1.634441087613293, "no_speech_prob": + 0.007103354204446077}, {"id": 414, "seek": 77796, "start": 802.8000000000001, 
"end": + 807.36, "text": " It underscores the importance of staying curious, continuing to + explore, and never assuming", "tokens": [51606, 467, 16692, 66, 2706, 264, 7379, + 295, 7939, 6369, 11, 9289, 281, 6839, 11, 293, 1128, 11926, 51834], "temperature": + 0.0, "avg_logprob": -0.15416003382483193, "compression_ratio": 1.634441087613293, + "no_speech_prob": 0.007103354204446077}, {"id": 415, "seek": 80736, "start": 807.36, + "end": 810.72, "text": " that any technology is truly dead.", "tokens": [50364, + 300, 604, 2899, 307, 4908, 3116, 13, 50532], "temperature": 0.0, "avg_logprob": + -0.19437014447511547, "compression_ratio": 1.608695652173913, "no_speech_prob": + 0.06758448481559753}, {"id": 416, "seek": 80736, "start": 810.72, "end": 813.32, + "text": " Because it might just be evolving into something even better.", "tokens": + [50532, 1436, 309, 1062, 445, 312, 21085, 666, 746, 754, 1101, 13, 50662], "temperature": + 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 417, "seek": 80736, "start": 813.32, + "end": 816.92, "text": " Who knows what exciting developments await us in the world + of vector databases?", "tokens": [50662, 2102, 3255, 437, 4670, 20862, 19670, 505, + 294, 264, 1002, 295, 8062, 22380, 30, 50842], "temperature": 0.0, "avg_logprob": + -0.19437014447511547, "compression_ratio": 1.608695652173913, "no_speech_prob": + 0.06758448481559753}, {"id": 418, "seek": 80736, "start": 816.92, "end": 818.96, + "text": " I''m definitely eager to see what the future holds.", "tokens": [50842, + 286, 478, 2138, 18259, 281, 536, 437, 264, 2027, 9190, 13, 50944], "temperature": + 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 419, "seek": 80736, "start": 818.96, + "end": 819.96, "text": " E2.", "tokens": [50944, 462, 17, 13, 50994], "temperature": + 0.0, "avg_logprob": -0.19437014447511547, 
"compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 420, "seek": 80736, "start": 819.96, + "end": 822.5600000000001, "text": " This deep dive has given me a whole new perspective.", + "tokens": [50994, 639, 2452, 9192, 575, 2212, 385, 257, 1379, 777, 4585, 13, 51124], + "temperature": 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 421, "seek": 80736, "start": 822.5600000000001, + "end": 825.04, "text": " I''m sure it has for our listeners as well.", "tokens": + [51124, 286, 478, 988, 309, 575, 337, 527, 23274, 382, 731, 13, 51248], "temperature": + 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 422, "seek": 80736, "start": 825.04, + "end": 830.32, "text": " You know, as we''re discussing this, it strikes me that + Dmitry''s journey with vector databases", "tokens": [51248, 509, 458, 11, 382, 321, + 434, 10850, 341, 11, 309, 16750, 385, 300, 413, 3508, 627, 311, 4671, 365, 8062, + 22380, 51512], "temperature": 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.06758448481559753}, {"id": 423, "seek": 80736, + "start": 830.32, "end": 833.32, "text": " mirrors a broader trend in the tech world.", + "tokens": [51512, 24238, 257, 13227, 6028, 294, 264, 7553, 1002, 13, 51662], "temperature": + 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 424, "seek": 80736, "start": 833.32, + "end": 834.52, "text": " Oh, how so?", "tokens": [51662, 876, 11, 577, 370, 30, + 51722], "temperature": 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": + 1.608695652173913, "no_speech_prob": 0.06758448481559753}, {"id": 425, "seek": 80736, + "start": 834.52, "end": 836.16, "text": " We often get caught up in the hype cycle.", + "tokens": [51722, 
492, 2049, 483, 5415, 493, 294, 264, 24144, 6586, 13, 51804], + "temperature": 0.0, "avg_logprob": -0.19437014447511547, "compression_ratio": 1.608695652173913, + "no_speech_prob": 0.06758448481559753}, {"id": 426, "seek": 83616, "start": 836.16, + "end": 837.88, "text": " Oh, yeah, for sure.", "tokens": [50364, 876, 11, 1338, + 11, 337, 988, 13, 50450], "temperature": 0.0, "avg_logprob": -0.19287610521503523, + "compression_ratio": 1.5320754716981133, "no_speech_prob": 0.01279979757964611}, + {"id": 427, "seek": 83616, "start": 837.88, "end": 844.3199999999999, "text": " + But true innovation often emerges when technologies evolve and integrate in unexpected + ways.", "tokens": [50450, 583, 2074, 8504, 2049, 38965, 562, 7943, 16693, 293, 13365, + 294, 13106, 2098, 13, 50772], "temperature": 0.0, "avg_logprob": -0.19287610521503523, + "compression_ratio": 1.5320754716981133, "no_speech_prob": 0.01279979757964611}, + {"id": 428, "seek": 83616, "start": 844.3199999999999, "end": 849.0799999999999, + "text": " It''s like that saying the whole is greater than the sum of its parts.", + "tokens": [50772, 467, 311, 411, 300, 1566, 264, 1379, 307, 5044, 813, 264, 2408, + 295, 1080, 3166, 13, 51010], "temperature": 0.0, "avg_logprob": -0.19287610521503523, + "compression_ratio": 1.5320754716981133, "no_speech_prob": 0.01279979757964611}, + {"id": 429, "seek": 83616, "start": 849.0799999999999, "end": 851.3199999999999, + "text": " And that brings us to another crucial point from the article.", "tokens": + [51010, 400, 300, 5607, 505, 281, 1071, 11462, 935, 490, 264, 7222, 13, 51122], + "temperature": 0.0, "avg_logprob": -0.19287610521503523, "compression_ratio": 1.5320754716981133, + "no_speech_prob": 0.01279979757964611}, {"id": 430, "seek": 83616, "start": 851.3199999999999, + "end": 852.3199999999999, "text": " Okay.", "tokens": [51122, 1033, 13, 51172], + "temperature": 0.0, "avg_logprob": -0.19287610521503523, "compression_ratio": 1.5320754716981133, + 
"no_speech_prob": 0.01279979757964611}, {"id": 431, "seek": 83616, "start": 852.3199999999999, + "end": 854.92, "text": " One that I think holds immense value for our listeners + today.", "tokens": [51172, 1485, 300, 286, 519, 9190, 22920, 2158, 337, 527, 23274, + 965, 13, 51302], "temperature": 0.0, "avg_logprob": -0.19287610521503523, "compression_ratio": + 1.5320754716981133, "no_speech_prob": 0.01279979757964611}, {"id": 432, "seek": + 83616, "start": 854.92, "end": 856.24, "text": " I''m all ears.", "tokens": [51302, + 286, 478, 439, 8798, 13, 51368], "temperature": 0.0, "avg_logprob": -0.19287610521503523, + "compression_ratio": 1.5320754716981133, "no_speech_prob": 0.01279979757964611}, + {"id": 433, "seek": 83616, "start": 856.24, "end": 860.9599999999999, "text": " + Remember how Dmitry emphasized that it''s not just about the vectors themselves.", + "tokens": [51368, 5459, 577, 413, 3508, 627, 34068, 300, 309, 311, 406, 445, 466, + 264, 18875, 2969, 13, 51604], "temperature": 0.0, "avg_logprob": -0.19287610521503523, + "compression_ratio": 1.5320754716981133, "no_speech_prob": 0.01279979757964611}, + {"id": 434, "seek": 86096, "start": 860.96, "end": 866.32, "text": " It''s about + understanding the nuances of data pre-processing model selection and bedding", "tokens": + [50364, 467, 311, 466, 3701, 264, 38775, 295, 1412, 659, 12, 41075, 278, 2316, 9450, + 293, 2901, 3584, 50632], "temperature": 0.0, "avg_logprob": -0.20812978329865828, + "compression_ratio": 1.5774647887323943, "no_speech_prob": 0.08708859235048294}, + {"id": 435, "seek": 86096, "start": 866.32, "end": 867.48, "text": " techniques.", + "tokens": [50632, 7512, 13, 50690], "temperature": 0.0, "avg_logprob": -0.20812978329865828, + "compression_ratio": 1.5774647887323943, "no_speech_prob": 0.08708859235048294}, + {"id": 436, "seek": 86096, "start": 867.48, "end": 873.08, "text": " And even knowing + when to switch back to traditional keyword search for certain tasks.", "tokens": + [50690, 
400, 754, 5276, 562, 281, 3679, 646, 281, 5164, 20428, 3164, 337, 1629, + 9608, 13, 50970], "temperature": 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": + 1.5774647887323943, "no_speech_prob": 0.08708859235048294}, {"id": 437, "seek": + 86096, "start": 873.08, "end": 875.76, "text": " Yeah, sometimes the old ways are + still the best.", "tokens": [50970, 865, 11, 2171, 264, 1331, 2098, 366, 920, 264, + 1151, 13, 51104], "temperature": 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": + 1.5774647887323943, "no_speech_prob": 0.08708859235048294}, {"id": 438, "seek": + 86096, "start": 875.76, "end": 882.2, "text": " He''s advocating for a more holistic + approach where vector databases are seen as one tool", "tokens": [51104, 634, 311, + 32050, 337, 257, 544, 30334, 3109, 689, 8062, 22380, 366, 1612, 382, 472, 2290, + 51426], "temperature": 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": + 1.5774647887323943, "no_speech_prob": 0.08708859235048294}, {"id": 439, "seek": + 86096, "start": 882.2, "end": 884.2800000000001, "text": " among many in the AI + toolbox.", "tokens": [51426, 3654, 867, 294, 264, 7318, 44593, 13, 51530], "temperature": + 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": 1.5774647887323943, + "no_speech_prob": 0.08708859235048294}, {"id": 440, "seek": 86096, "start": 884.2800000000001, + "end": 885.76, "text": " So it''s not a silver bullet.", "tokens": [51530, 407, + 309, 311, 406, 257, 8753, 11632, 13, 51604], "temperature": 0.0, "avg_logprob": + -0.20812978329865828, "compression_ratio": 1.5774647887323943, "no_speech_prob": + 0.08708859235048294}, {"id": 441, "seek": 86096, "start": 885.76, "end": 886.9200000000001, + "text": " It''s not a magic solution.", "tokens": [51604, 467, 311, 406, 257, 5585, + 3827, 13, 51662], "temperature": 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": + 1.5774647887323943, "no_speech_prob": 0.08708859235048294}, {"id": 442, "seek": + 86096, 
"start": 886.9200000000001, "end": 888.76, "text": " It''s one piece of the + puzzle.", "tokens": [51662, 467, 311, 472, 2522, 295, 264, 12805, 13, 51754], "temperature": + 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": 1.5774647887323943, + "no_speech_prob": 0.08708859235048294}, {"id": 443, "seek": 86096, "start": 888.76, + "end": 889.76, "text": " Exactly.", "tokens": [51754, 7587, 13, 51804], "temperature": + 0.0, "avg_logprob": -0.20812978329865828, "compression_ratio": 1.5774647887323943, + "no_speech_prob": 0.08708859235048294}, {"id": 444, "seek": 88976, "start": 889.76, + "end": 895.0, "text": " There is a deeper understanding of the underlying principles, + not just blindly applying the latest", "tokens": [50364, 821, 307, 257, 7731, 3701, + 295, 264, 14217, 9156, 11, 406, 445, 47744, 9275, 264, 6792, 50626], "temperature": + 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": 1.6214511041009463, + "no_speech_prob": 0.17467549443244934}, {"id": 445, "seek": 88976, "start": 895.0, + "end": 896.0, "text": " trendy technology.", "tokens": [50626, 38596, 2899, 13, + 50676], "temperature": 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": + 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, {"id": 446, "seek": + 88976, "start": 896.0, "end": 897.0, "text": " Right.", "tokens": [50676, 1779, + 13, 50726], "temperature": 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": + 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, {"id": 447, "seek": + 88976, "start": 897.0, "end": 901.76, "text": " It''s about making informed choices + based on a thorough analysis of your specific needs and", "tokens": [50726, 467, + 311, 466, 1455, 11740, 7994, 2361, 322, 257, 12934, 5215, 295, 428, 2685, 2203, + 293, 50964], "temperature": 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": + 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, {"id": 448, "seek": + 88976, "start": 901.76, "end": 
902.76, "text": " constraints.", "tokens": [50964, + 18491, 13, 51014], "temperature": 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": + 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, {"id": 449, "seek": + 88976, "start": 902.76, "end": 904.3199999999999, "text": " So don''t just jump + on the bandwagon.", "tokens": [51014, 407, 500, 380, 445, 3012, 322, 264, 4116, + 86, 6709, 13, 51092], "temperature": 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": + 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, {"id": 450, "seek": + 88976, "start": 904.3199999999999, "end": 906.88, "text": " Do your research and + figure out what''s right for you.", "tokens": [51092, 1144, 428, 2132, 293, 2573, + 484, 437, 311, 558, 337, 291, 13, 51220], "temperature": 0.0, "avg_logprob": -0.20254910376764113, + "compression_ratio": 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, + {"id": 451, "seek": 88976, "start": 906.88, "end": 912.24, "text": " So for those + of you out there exploring AI solutions, don''t get fixated on buzzwords.", "tokens": + [51220, 407, 337, 729, 295, 291, 484, 456, 12736, 7318, 6547, 11, 500, 380, 483, + 3191, 770, 322, 13036, 13832, 13, 51488], "temperature": 0.0, "avg_logprob": -0.20254910376764113, + "compression_ratio": 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, + {"id": 452, "seek": 88976, "start": 912.24, "end": 915.64, "text": " Take the time + to really grasp the fundamentals.", "tokens": [51488, 3664, 264, 565, 281, 534, + 21743, 264, 29505, 13, 51658], "temperature": 0.0, "avg_logprob": -0.20254910376764113, + "compression_ratio": 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, + {"id": 453, "seek": 88976, "start": 915.64, "end": 917.4399999999999, "text": " + Understand the basics.", "tokens": [51658, 26093, 264, 14688, 13, 51748], "temperature": + 0.0, "avg_logprob": -0.20254910376764113, "compression_ratio": 1.6214511041009463, + "no_speech_prob": 
0.17467549443244934}, {"id": 454, "seek": 88976, "start": 917.4399999999999, + "end": 918.56, "text": " Experiment with different approaches.", "tokens": [51748, + 37933, 365, 819, 11587, 13, 51804], "temperature": 0.0, "avg_logprob": -0.20254910376764113, + "compression_ratio": 1.6214511041009463, "no_speech_prob": 0.17467549443244934}, + {"id": 455, "seek": 91856, "start": 918.56, "end": 920.16, "text": " Stay around + with different tools.", "tokens": [50364, 8691, 926, 365, 819, 3873, 13, 50444], + "temperature": 0.0, "avg_logprob": -0.2009817433153462, "compression_ratio": 1.5674740484429066, + "no_speech_prob": 0.10541005432605743}, {"id": 456, "seek": 91856, "start": 920.16, + "end": 922.8, "text": " And don''t be afraid to challenge assumptions.", "tokens": + [50444, 400, 500, 380, 312, 4638, 281, 3430, 17695, 13, 50576], "temperature": 0.0, + "avg_logprob": -0.2009817433153462, "compression_ratio": 1.5674740484429066, "no_speech_prob": + 0.10541005432605743}, {"id": 457, "seek": 91856, "start": 922.8, "end": 923.8, "text": + " Question everything.", "tokens": [50576, 14464, 1203, 13, 50626], "temperature": + 0.0, "avg_logprob": -0.2009817433153462, "compression_ratio": 1.5674740484429066, + "no_speech_prob": 0.10541005432605743}, {"id": 458, "seek": 91856, "start": 923.8, + "end": 926.5999999999999, "text": " And remember, the AI landscape is constantly + evolving.", "tokens": [50626, 400, 1604, 11, 264, 7318, 9661, 307, 6460, 21085, + 13, 50766], "temperature": 0.0, "avg_logprob": -0.2009817433153462, "compression_ratio": + 1.5674740484429066, "no_speech_prob": 0.10541005432605743}, {"id": 459, "seek": + 91856, "start": 926.5999999999999, "end": 930.68, "text": " What works best today + might be superseded by something even more powerful and efficient", "tokens": [50766, + 708, 1985, 1151, 965, 1062, 312, 37906, 37679, 538, 746, 754, 544, 4005, 293, 7148, + 50970], "temperature": 0.0, "avg_logprob": -0.2009817433153462, "compression_ratio": + 
1.5674740484429066, "no_speech_prob": 0.10541005432605743}, {"id": 460, "seek": + 91856, "start": 930.68, "end": 932.16, "text": " tomorrow.", "tokens": [50970, 4153, + 13, 51044], "temperature": 0.0, "avg_logprob": -0.2009817433153462, "compression_ratio": + 1.5674740484429066, "no_speech_prob": 0.10541005432605743}, {"id": 461, "seek": + 91856, "start": 932.16, "end": 933.16, "text": " So stay curious.", "tokens": [51044, + 407, 1754, 6369, 13, 51094], "temperature": 0.0, "avg_logprob": -0.2009817433153462, + "compression_ratio": 1.5674740484429066, "no_speech_prob": 0.10541005432605743}, + {"id": 462, "seek": 91856, "start": 933.16, "end": 934.16, "text": " They engage.", + "tokens": [51094, 814, 4683, 13, 51144], "temperature": 0.0, "avg_logprob": -0.2009817433153462, + "compression_ratio": 1.5674740484429066, "no_speech_prob": 0.10541005432605743}, + {"id": 463, "seek": 91856, "start": 934.16, "end": 935.16, "text": " And keep learning.", + "tokens": [51144, 400, 1066, 2539, 13, 51194], "temperature": 0.0, "avg_logprob": + -0.2009817433153462, "compression_ratio": 1.5674740484429066, "no_speech_prob": + 0.10541005432605743}, {"id": 464, "seek": 91856, "start": 935.16, "end": 937.52, + "text": " Couldn''t have said it better myself.", "tokens": [51194, 35800, 380, + 362, 848, 309, 1101, 2059, 13, 51312], "temperature": 0.0, "avg_logprob": -0.2009817433153462, + "compression_ratio": 1.5674740484429066, "no_speech_prob": 0.10541005432605743}, + {"id": 465, "seek": 91856, "start": 937.52, "end": 943.88, "text": " Well folks, + that brings us to the end of our deep dive into the world of vector databases.", + "tokens": [51312, 1042, 4024, 11, 300, 5607, 505, 281, 264, 917, 295, 527, 2452, + 9192, 666, 264, 1002, 295, 8062, 22380, 13, 51630], "temperature": 0.0, "avg_logprob": + -0.2009817433153462, "compression_ratio": 1.5674740484429066, "no_speech_prob": + 0.10541005432605743}, {"id": 466, "seek": 91856, "start": 943.88, "end": 945.52, + "text": " It''s been a 
wild ride.", "tokens": [51630, 467, 311, 668, 257, 4868, + 5077, 13, 51712], "temperature": 0.0, "avg_logprob": -0.2009817433153462, "compression_ratio": + 1.5674740484429066, "no_speech_prob": 0.10541005432605743}, {"id": 467, "seek": + 94552, "start": 945.52, "end": 951.52, "text": " We''ve explored their rise, their + potential fall, and the exciting possibilities of neural", "tokens": [50364, 492, + 600, 24016, 641, 6272, 11, 641, 3995, 2100, 11, 293, 264, 4670, 12178, 295, 18161, + 50664], "temperature": 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": + 1.627986348122867, "no_speech_prob": 0.04780411720275879}, {"id": 468, "seek": 94552, + "start": 951.52, "end": 952.76, "text": " search frameworks.", "tokens": [50664, + 3164, 29834, 13, 50726], "temperature": 0.0, "avg_logprob": -0.15887847439996128, + "compression_ratio": 1.627986348122867, "no_speech_prob": 0.04780411720275879}, + {"id": 469, "seek": 94552, "start": 952.76, "end": 954.4, "text": " We''ve covered + a lot of ground.", "tokens": [50726, 492, 600, 5343, 257, 688, 295, 2727, 13, 50808], + "temperature": 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": 1.627986348122867, + "no_speech_prob": 0.04780411720275879}, {"id": 470, "seek": 94552, "start": 954.4, + "end": 959.0799999999999, "text": " We''ve also learned some valuable lessons about + navigating the hype cycle and making informed", "tokens": [50808, 492, 600, 611, + 3264, 512, 8263, 8820, 466, 32054, 264, 24144, 6586, 293, 1455, 11740, 51042], "temperature": + 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": 1.627986348122867, + "no_speech_prob": 0.04780411720275879}, {"id": 471, "seek": 94552, "start": 959.0799999999999, + "end": 962.24, "text": " decisions in a rapidly changing technological landscape.", + "tokens": [51042, 5327, 294, 257, 12910, 4473, 18439, 9661, 13, 51200], "temperature": + 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": 1.627986348122867, + "no_speech_prob": 
0.04780411720275879}, {"id": 472, "seek": 94552, "start": 962.24, + "end": 963.8, "text": " It''s been a fascinating journey.", "tokens": [51200, 467, + 311, 668, 257, 10343, 4671, 13, 51278], "temperature": 0.0, "avg_logprob": -0.15887847439996128, + "compression_ratio": 1.627986348122867, "no_speech_prob": 0.04780411720275879}, + {"id": 473, "seek": 94552, "start": 963.8, "end": 964.8, "text": " Absolutely.", + "tokens": [51278, 7021, 13, 51328], "temperature": 0.0, "avg_logprob": -0.15887847439996128, + "compression_ratio": 1.627986348122867, "no_speech_prob": 0.04780411720275879}, + {"id": 474, "seek": 94552, "start": 964.8, "end": 966.96, "text": " And we hope + you''ve enjoyed it as much as we have.", "tokens": [51328, 400, 321, 1454, 291, + 600, 4626, 309, 382, 709, 382, 321, 362, 13, 51436], "temperature": 0.0, "avg_logprob": + -0.15887847439996128, "compression_ratio": 1.627986348122867, "no_speech_prob": + 0.04780411720275879}, {"id": 475, "seek": 94552, "start": 966.96, "end": 967.96, + "text": " Until next time.", "tokens": [51436, 9088, 958, 565, 13, 51486], "temperature": + 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": 1.627986348122867, + "no_speech_prob": 0.04780411720275879}, {"id": 476, "seek": 94552, "start": 967.96, + "end": 969.16, "text": " Keep exploring.", "tokens": [51486, 5527, 12736, 13, 51546], + "temperature": 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": 1.627986348122867, + "no_speech_prob": 0.04780411720275879}, {"id": 477, "seek": 94552, "start": 969.16, + "end": 970.24, "text": " Keep questioning.", "tokens": [51546, 5527, 21257, 13, + 51600], "temperature": 0.0, "avg_logprob": -0.15887847439996128, "compression_ratio": + 1.627986348122867, "no_speech_prob": 0.04780411720275879}, {"id": 478, "seek": 94552, + "start": 970.24, "end": 972.76, "text": " And keep that thirst for knowledge alive.", + "tokens": [51600, 400, 1066, 300, 34846, 337, 3601, 5465, 13, 51726], "temperature": + 0.0, 
"avg_logprob": -0.15887847439996128, "compression_ratio": 1.627986348122867, + "no_speech_prob": 0.04780411720275879}, {"id": 479, "seek": 97276, "start": 972.76, + "end": 977.2, "text": " It''s funny actually while we''re focused on all this cutting + edge tech, Dimitri actually", "tokens": [50364, 467, 311, 4074, 767, 1339, 321, + 434, 5178, 322, 439, 341, 6492, 4691, 7553, 11, 20975, 270, 470, 767, 50586], "temperature": + 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": 1.6242603550295858, + "no_speech_prob": 0.31692713499069214}, {"id": 480, "seek": 97276, "start": 977.2, + "end": 979.64, "text": " kind of throws it back to basics in the article a little + bit.", "tokens": [50586, 733, 295, 19251, 309, 646, 281, 14688, 294, 264, 7222, + 257, 707, 857, 13, 50708], "temperature": 0.0, "avg_logprob": -0.18286917550223214, + "compression_ratio": 1.6242603550295858, "no_speech_prob": 0.31692713499069214}, + {"id": 481, "seek": 97276, "start": 979.64, "end": 981.16, "text": " Oh yeah, I + remember that part.", "tokens": [50708, 876, 1338, 11, 286, 1604, 300, 644, 13, + 50784], "temperature": 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": + 1.6242603550295858, "no_speech_prob": 0.31692713499069214}, {"id": 482, "seek": + 97276, "start": 981.16, "end": 985.68, "text": " He recounts this conversation he + had with the chief data scientist at a major bank.", "tokens": [50784, 634, 43997, + 82, 341, 3761, 415, 632, 365, 264, 9588, 1412, 12662, 412, 257, 2563, 3765, 13, + 51010], "temperature": 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": + 1.6242603550295858, "no_speech_prob": 0.31692713499069214}, {"id": 483, "seek": + 97276, "start": 985.68, "end": 990.04, "text": " That was a good one, which I thought + was so interesting because it really emphasizes", "tokens": [51010, 663, 390, 257, + 665, 472, 11, 597, 286, 1194, 390, 370, 1880, 570, 309, 534, 48856, 51228], "temperature": + 0.0, "avg_logprob": -0.18286917550223214, 
"compression_ratio": 1.6242603550295858, + "no_speech_prob": 0.31692713499069214}, {"id": 484, "seek": 97276, "start": 990.04, + "end": 995.4399999999999, "text": " how even with all these advancements, sometimes + the simplest solution is the best one.", "tokens": [51228, 577, 754, 365, 439, 613, + 7295, 1117, 11, 2171, 264, 22811, 3827, 307, 264, 1151, 472, 13, 51498], "temperature": + 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": 1.6242603550295858, + "no_speech_prob": 0.31692713499069214}, {"id": 485, "seek": 97276, "start": 995.4399999999999, + "end": 997.68, "text": " You don''t always need the fanciest tools.", "tokens": + [51498, 509, 500, 380, 1009, 643, 264, 3429, 537, 377, 3873, 13, 51610], "temperature": + 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": 1.6242603550295858, + "no_speech_prob": 0.31692713499069214}, {"id": 486, "seek": 97276, "start": 997.68, + "end": 998.68, "text": " Right.", "tokens": [51610, 1779, 13, 51660], "temperature": + 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": 1.6242603550295858, + "no_speech_prob": 0.31692713499069214}, {"id": 487, "seek": 97276, "start": 998.68, + "end": 1001.08, "text": " Sometimes it''s about using the right tool for the job.", + "tokens": [51660, 4803, 309, 311, 466, 1228, 264, 558, 2290, 337, 264, 1691, 13, + 51780], "temperature": 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": + 1.6242603550295858, "no_speech_prob": 0.31692713499069214}, {"id": 488, "seek": + 97276, "start": 1001.08, "end": 1002.08, "text": " Exactly.", "tokens": [51780, + 7587, 13, 51830], "temperature": 0.0, "avg_logprob": -0.18286917550223214, "compression_ratio": + 1.6242603550295858, "no_speech_prob": 0.31692713499069214}, {"id": 489, "seek": + 100208, "start": 1002.08, "end": 1006.1600000000001, "text": " This bank had poured + resources into building this complex vector search system.", "tokens": [50364, 639, + 3765, 632, 23270, 3593, 666, 2390, 341, 3997, 8062, 
3164, 1185, 13, 50568], "temperature": + 0.0, "avg_logprob": -0.19838447868824005, "compression_ratio": 1.5377049180327869, + "no_speech_prob": 0.11691146343946457}, {"id": 490, "seek": 100208, "start": 1006.1600000000001, + "end": 1007.1600000000001, "text": " Okay.", "tokens": [50568, 1033, 13, 50618], + "temperature": 0.0, "avg_logprob": -0.19838447868824005, "compression_ratio": 1.5377049180327869, + "no_speech_prob": 0.11691146343946457}, {"id": 491, "seek": 100208, "start": 1007.1600000000001, + "end": 1008.1600000000001, "text": " But guess what?", "tokens": [50618, 583, 2041, + 437, 30, 50668], "temperature": 0.0, "avg_logprob": -0.19838447868824005, "compression_ratio": + 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, {"id": 492, "seek": + 100208, "start": 1008.1600000000001, "end": 1009.1600000000001, "text": " What?", + "tokens": [50668, 708, 30, 50718], "temperature": 0.0, "avg_logprob": -0.19838447868824005, + "compression_ratio": 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, + {"id": 493, "seek": 100208, "start": 1009.1600000000001, "end": 1012.0, "text": + " They ended up getting better results with good old fashioned keyword search.", + "tokens": [50718, 814, 4590, 493, 1242, 1101, 3542, 365, 665, 1331, 40646, 20428, + 3164, 13, 50860], "temperature": 0.0, "avg_logprob": -0.19838447868824005, "compression_ratio": + 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, {"id": 494, "seek": + 100208, "start": 1012.0, "end": 1013.0, "text": " Really?", "tokens": [50860, 4083, + 30, 50910], "temperature": 0.0, "avg_logprob": -0.19838447868824005, "compression_ratio": + 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, {"id": 495, "seek": + 100208, "start": 1013.0, "end": 1015.0, "text": " For some very specific tasks.", + "tokens": [50910, 1171, 512, 588, 2685, 9608, 13, 51010], "temperature": 0.0, "avg_logprob": + -0.19838447868824005, "compression_ratio": 1.5377049180327869, "no_speech_prob": + 
0.11691146343946457}, {"id": 496, "seek": 100208, "start": 1015.0, "end": 1016.0, + "text": " Huh.", "tokens": [51010, 8063, 13, 51060], "temperature": 0.0, "avg_logprob": + -0.19838447868824005, "compression_ratio": 1.5377049180327869, "no_speech_prob": + 0.11691146343946457}, {"id": 497, "seek": 100208, "start": 1016.0, "end": 1018.76, + "text": " So even the big bangs are going back to basics sometimes.", "tokens": + [51060, 407, 754, 264, 955, 32802, 366, 516, 646, 281, 14688, 2171, 13, 51198], + "temperature": 0.0, "avg_logprob": -0.19838447868824005, "compression_ratio": 1.5377049180327869, + "no_speech_prob": 0.11691146343946457}, {"id": 498, "seek": 100208, "start": 1018.76, + "end": 1019.76, "text": " Sometimes it makes more sense.", "tokens": [51198, 4803, + 309, 1669, 544, 2020, 13, 51248], "temperature": 0.0, "avg_logprob": -0.19838447868824005, + "compression_ratio": 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, + {"id": 499, "seek": 100208, "start": 1019.76, "end": 1020.76, "text": " Yeah.", + "tokens": [51248, 865, 13, 51298], "temperature": 0.0, "avg_logprob": -0.19838447868824005, + "compression_ratio": 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, + {"id": 500, "seek": 100208, "start": 1020.76, "end": 1025.08, "text": " It''s a + powerful reminder that we shouldn''t dismiss those tried and true methods.", "tokens": + [51298, 467, 311, 257, 4005, 13548, 300, 321, 4659, 380, 16974, 729, 3031, 293, + 2074, 7150, 13, 51514], "temperature": 0.0, "avg_logprob": -0.19838447868824005, + "compression_ratio": 1.5377049180327869, "no_speech_prob": 0.11691146343946457}, + {"id": 501, "seek": 100208, "start": 1025.08, "end": 1027.32, "text": " Like don''t + throw out the baby with the bathwater.", "tokens": [51514, 1743, 500, 380, 3507, + 484, 264, 3186, 365, 264, 6079, 8002, 13, 51626], "temperature": 0.0, "avg_logprob": + -0.19838447868824005, "compression_ratio": 1.5377049180327869, "no_speech_prob": + 0.11691146343946457}, 
{"id": 502, "seek": 100208, "start": 1027.32, "end": 1028.32, + "text": " Right.", "tokens": [51626, 1779, 13, 51676], "temperature": 0.0, "avg_logprob": + -0.19838447868824005, "compression_ratio": 1.5377049180327869, "no_speech_prob": + 0.11691146343946457}, {"id": 503, "seek": 100208, "start": 1028.32, "end": 1029.32, + "text": " Exactly.", "tokens": [51676, 7587, 13, 51726], "temperature": 0.0, "avg_logprob": + -0.19838447868824005, "compression_ratio": 1.5377049180327869, "no_speech_prob": + 0.11691146343946457}, {"id": 504, "seek": 102932, "start": 1029.32, "end": 1033.8, + "text": " The tools when used strategically can outperform the flashiest new tech.", + "tokens": [50364, 440, 3873, 562, 1143, 38061, 393, 484, 26765, 264, 7319, 6495, + 777, 7553, 13, 50588], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 505, + "seek": 102932, "start": 1033.8, "end": 1036.48, "text": " It''s all about choosing + the right tool for the job.", "tokens": [50588, 467, 311, 439, 466, 10875, 264, + 558, 2290, 337, 264, 1691, 13, 50722], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 506, + "seek": 102932, "start": 1036.48, "end": 1039.1599999999999, "text": " It''s like + trying to use a chainsaw to cut a piece of paper.", "tokens": [50722, 467, 311, + 411, 1382, 281, 764, 257, 12626, 1607, 281, 1723, 257, 2522, 295, 3035, 13, 50856], + "temperature": 0.0, "avg_logprob": -0.20490973525577122, "compression_ratio": 1.609375, + "no_speech_prob": 0.5447851419448853}, {"id": 507, "seek": 102932, "start": 1039.1599999999999, + "end": 1040.1599999999999, "text": " Oof.", "tokens": [50856, 422, 2670, 13, 50906], + "temperature": 0.0, "avg_logprob": -0.20490973525577122, "compression_ratio": 1.609375, + "no_speech_prob": 0.5447851419448853}, {"id": 508, "seek": 102932, "start": 1040.1599999999999, + "end": 
1041.76, "text": " Yeah, that wouldn''t end well.", "tokens": [50906, 865, + 11, 300, 2759, 380, 917, 731, 13, 50986], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 509, + "seek": 102932, "start": 1041.76, "end": 1044.32, "text": " Sometimes a simple pair + of scissors does the job better.", "tokens": [50986, 4803, 257, 2199, 6119, 295, + 16066, 775, 264, 1691, 1101, 13, 51114], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 510, + "seek": 102932, "start": 1044.32, "end": 1045.96, "text": " Much better.", "tokens": + [51114, 12313, 1101, 13, 51196], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 511, + "seek": 102932, "start": 1045.96, "end": 1049.6399999999999, "text": " And that + brings us back to Dimitri''s vision of neural search frameworks.", "tokens": [51196, + 400, 300, 5607, 505, 646, 281, 20975, 270, 470, 311, 5201, 295, 18161, 3164, 29834, + 13, 51380], "temperature": 0.0, "avg_logprob": -0.20490973525577122, "compression_ratio": + 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 512, "seek": 102932, "start": + 1049.6399999999999, "end": 1050.6399999999999, "text": " Okay.", "tokens": [51380, + 1033, 13, 51430], "temperature": 0.0, "avg_logprob": -0.20490973525577122, "compression_ratio": + 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 513, "seek": 102932, "start": + 1050.6399999999999, "end": 1051.6399999999999, "text": " If they become a reality.", + "tokens": [51430, 759, 436, 1813, 257, 4103, 13, 51480], "temperature": 0.0, "avg_logprob": + -0.20490973525577122, "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, + {"id": 514, "seek": 102932, "start": 1051.6399999999999, "end": 1052.6399999999999, + "text": " Yeah.", "tokens": [51480, 865, 13, 51530], 
"temperature": 0.0, "avg_logprob": + -0.20490973525577122, "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, + {"id": 515, "seek": 102932, "start": 1052.6399999999999, "end": 1054.6, "text": + " Could they simplify these choices for us?", "tokens": [51530, 7497, 436, 20460, + 613, 7994, 337, 505, 30, 51628], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 516, + "seek": 102932, "start": 1054.6, "end": 1055.6, "text": " Interesting question.", + "tokens": [51628, 14711, 1168, 13, 51678], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 517, + "seek": 102932, "start": 1055.6, "end": 1058.48, "text": " Would they be able to + determine the best approach?", "tokens": [51678, 6068, 436, 312, 1075, 281, 6997, + 264, 1151, 3109, 30, 51822], "temperature": 0.0, "avg_logprob": -0.20490973525577122, + "compression_ratio": 1.609375, "no_speech_prob": 0.5447851419448853}, {"id": 518, + "seek": 105848, "start": 1058.48, "end": 1064.16, "text": " Like whether it''s vector + search, keyword search or a hybrid based on the task at hand.", "tokens": [50364, + 1743, 1968, 309, 311, 8062, 3164, 11, 20428, 3164, 420, 257, 13051, 2361, 322, 264, + 5633, 412, 1011, 13, 50648], "temperature": 0.0, "avg_logprob": -0.1728199818095223, + "compression_ratio": 1.632867132867133, "no_speech_prob": 0.004857513587921858}, + {"id": 519, "seek": 105848, "start": 1064.16, "end": 1065.16, "text": " That''s + the dream.", "tokens": [50648, 663, 311, 264, 3055, 13, 50698], "temperature": 0.0, + "avg_logprob": -0.1728199818095223, "compression_ratio": 1.632867132867133, "no_speech_prob": + 0.004857513587921858}, {"id": 520, "seek": 105848, "start": 1065.16, "end": 1066.16, + "text": " How be amazed?", "tokens": [50698, 1012, 312, 20507, 30, 50748], "temperature": + 0.0, "avg_logprob": -0.1728199818095223, 
"compression_ratio": 1.632867132867133, + "no_speech_prob": 0.004857513587921858}, {"id": 521, "seek": 105848, "start": 1066.16, + "end": 1070.76, "text": " It would be like having this AI assistant that just knows + the best way to search for anything.", "tokens": [50748, 467, 576, 312, 411, 1419, + 341, 7318, 10994, 300, 445, 3255, 264, 1151, 636, 281, 3164, 337, 1340, 13, 50978], + "temperature": 0.0, "avg_logprob": -0.1728199818095223, "compression_ratio": 1.632867132867133, + "no_speech_prob": 0.004857513587921858}, {"id": 522, "seek": 105848, "start": 1070.76, + "end": 1072.2, "text": " That''s a fantastic question.", "tokens": [50978, 663, + 311, 257, 5456, 1168, 13, 51050], "temperature": 0.0, "avg_logprob": -0.1728199818095223, + "compression_ratio": 1.632867132867133, "no_speech_prob": 0.004857513587921858}, + {"id": 523, "seek": 105848, "start": 1072.2, "end": 1077.2, "text": " It really + gets to the heart of what these frameworks could potentially offer.", "tokens": + [51050, 467, 534, 2170, 281, 264, 1917, 295, 437, 613, 29834, 727, 7263, 2626, 13, + 51300], "temperature": 0.0, "avg_logprob": -0.1728199818095223, "compression_ratio": + 1.632867132867133, "no_speech_prob": 0.004857513587921858}, {"id": 524, "seek": + 105848, "start": 1077.2, "end": 1080.92, "text": " Like a more intelligent and adaptable + approach to search.", "tokens": [51300, 1743, 257, 544, 13232, 293, 6231, 712, 3109, + 281, 3164, 13, 51486], "temperature": 0.0, "avg_logprob": -0.1728199818095223, "compression_ratio": + 1.632867132867133, "no_speech_prob": 0.004857513587921858}, {"id": 525, "seek": + 105848, "start": 1080.92, "end": 1083.2, "text": " So instead of us having to figure + it all out.", "tokens": [51486, 407, 2602, 295, 505, 1419, 281, 2573, 309, 439, + 484, 13, 51600], "temperature": 0.0, "avg_logprob": -0.1728199818095223, "compression_ratio": + 1.632867132867133, "no_speech_prob": 0.004857513587921858}, {"id": 526, "seek": + 105848, "start": 1083.2, "end": 1084.2, 
"text": " Yeah.", "tokens": [51600, 865, + 13, 51650], "temperature": 0.0, "avg_logprob": -0.1728199818095223, "compression_ratio": + 1.632867132867133, "no_speech_prob": 0.004857513587921858}, {"id": 527, "seek": + 105848, "start": 1084.2, "end": 1085.96, "text": " The system would just do it for + us.", "tokens": [51650, 440, 1185, 576, 445, 360, 309, 337, 505, 13, 51738], "temperature": + 0.0, "avg_logprob": -0.1728199818095223, "compression_ratio": 1.632867132867133, + "no_speech_prob": 0.004857513587921858}, {"id": 528, "seek": 108596, "start": 1085.96, + "end": 1092.04, "text": " A vision of system that not only combines different AI + components, but also understands", "tokens": [50364, 316, 5201, 295, 1185, 300, + 406, 787, 29520, 819, 7318, 6677, 11, 457, 611, 15146, 50668], "temperature": 0.0, + "avg_logprob": -0.20408561494615343, "compression_ratio": 1.5614035087719298, "no_speech_prob": + 0.13744895160198212}, {"id": 529, "seek": 108596, "start": 1092.04, "end": 1096.16, + "text": " which tool is best suited for each specific task.", "tokens": [50668, + 597, 2290, 307, 1151, 24736, 337, 1184, 2685, 5633, 13, 50874], "temperature": 0.0, + "avg_logprob": -0.20408561494615343, "compression_ratio": 1.5614035087719298, "no_speech_prob": + 0.13744895160198212}, {"id": 530, "seek": 108596, "start": 1096.16, "end": 1097.56, + "text": " That would be a game changer.", "tokens": [50874, 663, 576, 312, 257, + 1216, 22822, 13, 50944], "temperature": 0.0, "avg_logprob": -0.20408561494615343, + "compression_ratio": 1.5614035087719298, "no_speech_prob": 0.13744895160198212}, + {"id": 531, "seek": 108596, "start": 1097.56, "end": 1101.88, "text": " It would + free us from getting bogged down in the technical details and allow us to focus", + "tokens": [50944, 467, 576, 1737, 505, 490, 1242, 26132, 3004, 760, 294, 264, 6191, + 4365, 293, 2089, 505, 281, 1879, 51160], "temperature": 0.0, "avg_logprob": -0.20408561494615343, + "compression_ratio": 1.5614035087719298, 
"no_speech_prob": 0.13744895160198212}, + {"id": 532, "seek": 108596, "start": 1101.88, "end": 1103.8400000000001, "text": + " on solving real world problems.", "tokens": [51160, 322, 12606, 957, 1002, 2740, + 13, 51258], "temperature": 0.0, "avg_logprob": -0.20408561494615343, "compression_ratio": + 1.5614035087719298, "no_speech_prob": 0.13744895160198212}, {"id": 533, "seek": + 108596, "start": 1103.8400000000001, "end": 1106.68, "text": " It''s all about making + AI work for us.", "tokens": [51258, 467, 311, 439, 466, 1455, 7318, 589, 337, 505, + 13, 51400], "temperature": 0.0, "avg_logprob": -0.20408561494615343, "compression_ratio": + 1.5614035087719298, "no_speech_prob": 0.13744895160198212}, {"id": 534, "seek": + 108596, "start": 1106.68, "end": 1108.16, "text": " Not the other way around.", + "tokens": [51400, 1726, 264, 661, 636, 926, 13, 51474], "temperature": 0.0, "avg_logprob": + -0.20408561494615343, "compression_ratio": 1.5614035087719298, "no_speech_prob": + 0.13744895160198212}, {"id": 535, "seek": 108596, "start": 1108.16, "end": 1109.3600000000001, + "text": " Exactly.", "tokens": [51474, 7587, 13, 51534], "temperature": 0.0, "avg_logprob": + -0.20408561494615343, "compression_ratio": 1.5614035087719298, "no_speech_prob": + 0.13744895160198212}, {"id": 536, "seek": 108596, "start": 1109.3600000000001, "end": + 1114.24, "text": " And it highlights the importance of staying flexible and open + to new possibilities.", "tokens": [51534, 400, 309, 14254, 264, 7379, 295, 7939, + 11358, 293, 1269, 281, 777, 12178, 13, 51778], "temperature": 0.0, "avg_logprob": + -0.20408561494615343, "compression_ratio": 1.5614035087719298, "no_speech_prob": + 0.13744895160198212}, {"id": 537, "seek": 111424, "start": 1114.24, "end": 1120.6, + "text": " As the AI landscape continues to evolve, we need to be willing to adapt + and embrace", "tokens": [50364, 1018, 264, 7318, 9661, 6515, 281, 16693, 11, 321, + 643, 281, 312, 4950, 281, 6231, 293, 14038, 50682], 
"temperature": 0.0, "avg_logprob": + -0.18256014869326637, "compression_ratio": 1.550387596899225, "no_speech_prob": + 0.008506925776600838}, {"id": 538, "seek": 111424, "start": 1120.6, "end": 1121.6, + "text": " new approaches.", "tokens": [50682, 777, 11587, 13, 50732], "temperature": + 0.0, "avg_logprob": -0.18256014869326637, "compression_ratio": 1.550387596899225, + "no_speech_prob": 0.008506925776600838}, {"id": 539, "seek": 111424, "start": 1121.6, + "end": 1124.16, "text": " Even if the challenger existing assumptions.", "tokens": + [50732, 2754, 498, 264, 2076, 10210, 6741, 17695, 13, 50860], "temperature": 0.0, + "avg_logprob": -0.18256014869326637, "compression_ratio": 1.550387596899225, "no_speech_prob": + 0.008506925776600838}, {"id": 540, "seek": 111424, "start": 1124.16, "end": 1125.16, + "text": " Challenge the status quo.", "tokens": [50860, 17517, 264, 6558, 28425, + 13, 50910], "temperature": 0.0, "avg_logprob": -0.18256014869326637, "compression_ratio": + 1.550387596899225, "no_speech_prob": 0.008506925776600838}, {"id": 541, "seek": + 111424, "start": 1125.16, "end": 1126.16, "text": " Right.", "tokens": [50910, 1779, + 13, 50960], "temperature": 0.0, "avg_logprob": -0.18256014869326637, "compression_ratio": + 1.550387596899225, "no_speech_prob": 0.008506925776600838}, {"id": 542, "seek": + 111424, "start": 1126.16, "end": 1129.2, "text": " Well said, well, this deep dive + has been a whirlwind of information.", "tokens": [50960, 1042, 848, 11, 731, 11, + 341, 2452, 9192, 575, 668, 257, 35706, 12199, 295, 1589, 13, 51112], "temperature": + 0.0, "avg_logprob": -0.18256014869326637, "compression_ratio": 1.550387596899225, + "no_speech_prob": 0.008506925776600838}, {"id": 543, "seek": 111424, "start": 1129.2, + "end": 1130.2, "text": " It has.", "tokens": [51112, 467, 575, 13, 51162], "temperature": + 0.0, "avg_logprob": -0.18256014869326637, "compression_ratio": 1.550387596899225, + "no_speech_prob": 0.008506925776600838}, {"id": 544, "seek": 
111424, "start": 1130.2, + "end": 1136.8, "text": " But I think the biggest takeaway for me is that the world + of vector databases and AI in general", "tokens": [51162, 583, 286, 519, 264, 3880, + 30681, 337, 385, 307, 300, 264, 1002, 295, 8062, 22380, 293, 7318, 294, 2674, 51492], + "temperature": 0.0, "avg_logprob": -0.18256014869326637, "compression_ratio": 1.550387596899225, + "no_speech_prob": 0.008506925776600838}, {"id": 545, "seek": 111424, "start": 1136.8, + "end": 1139.24, "text": " is anything but static.", "tokens": [51492, 307, 1340, + 457, 13437, 13, 51614], "temperature": 0.0, "avg_logprob": -0.18256014869326637, + "compression_ratio": 1.550387596899225, "no_speech_prob": 0.008506925776600838}, + {"id": 546, "seek": 111424, "start": 1139.24, "end": 1140.52, "text": " It''s constantly + changing.", "tokens": [51614, 467, 311, 6460, 4473, 13, 51678], "temperature": 0.0, + "avg_logprob": -0.18256014869326637, "compression_ratio": 1.550387596899225, "no_speech_prob": + 0.008506925776600838}, {"id": 547, "seek": 114052, "start": 1140.52, "end": 1145.12, + "text": " It''s a constantly evolving landscape full of exciting possibilities and + unexpected twists", "tokens": [50364, 467, 311, 257, 6460, 21085, 9661, 1577, 295, + 4670, 12178, 293, 13106, 35290, 50594], "temperature": 0.0, "avg_logprob": -0.16215332576206754, + "compression_ratio": 1.6586102719033233, "no_speech_prob": 0.10723783075809479}, + {"id": 548, "seek": 114052, "start": 1145.12, "end": 1146.12, "text": " in turn.", + "tokens": [50594, 294, 1261, 13, 50644], "temperature": 0.0, "avg_logprob": -0.16215332576206754, + "compression_ratio": 1.6586102719033233, "no_speech_prob": 0.10723783075809479}, + {"id": 549, "seek": 114052, "start": 1146.12, "end": 1147.28, "text": " You never + know what''s going to come next.", "tokens": [50644, 509, 1128, 458, 437, 311, 516, + 281, 808, 958, 13, 50702], "temperature": 0.0, "avg_logprob": -0.16215332576206754, + "compression_ratio": 1.6586102719033233, 
"no_speech_prob": 0.10723783075809479}, + {"id": 550, "seek": 114052, "start": 1147.28, "end": 1148.76, "text": " And that''s + what makes it so exciting.", "tokens": [50702, 400, 300, 311, 437, 1669, 309, 370, + 4670, 13, 50776], "temperature": 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": + 1.6586102719033233, "no_speech_prob": 0.10723783075809479}, {"id": 551, "seek": + 114052, "start": 1148.76, "end": 1149.76, "text": " I completely agree.", "tokens": + [50776, 286, 2584, 3986, 13, 50826], "temperature": 0.0, "avg_logprob": -0.16215332576206754, + "compression_ratio": 1.6586102719033233, "no_speech_prob": 0.10723783075809479}, + {"id": 552, "seek": 114052, "start": 1149.76, "end": 1152.8799999999999, "text": + " And it''s a landscape that requires us to stay curious.", "tokens": [50826, 400, + 309, 311, 257, 9661, 300, 7029, 505, 281, 1754, 6369, 13, 50982], "temperature": + 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": 1.6586102719033233, + "no_speech_prob": 0.10723783075809479}, {"id": 553, "seek": 114052, "start": 1152.8799999999999, + "end": 1153.8799999999999, "text": " Keep learning.", "tokens": [50982, 5527, 2539, + 13, 51032], "temperature": 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": + 1.6586102719033233, "no_speech_prob": 0.10723783075809479}, {"id": 554, "seek": + 114052, "start": 1153.8799999999999, "end": 1155.76, "text": " And never stop asking + questions.", "tokens": [51032, 400, 1128, 1590, 3365, 1651, 13, 51126], "temperature": + 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": 1.6586102719033233, + "no_speech_prob": 0.10723783075809479}, {"id": 555, "seek": 114052, "start": 1155.76, + "end": 1159.28, "text": " And on that note, we''d love to hear from you, our listeners.", + "tokens": [51126, 400, 322, 300, 3637, 11, 321, 1116, 959, 281, 1568, 490, 291, + 11, 527, 23274, 13, 51302], "temperature": 0.0, "avg_logprob": -0.16215332576206754, + "compression_ratio": 
1.6586102719033233, "no_speech_prob": 0.10723783075809479}, + {"id": 556, "seek": 114052, "start": 1159.28, "end": 1161.16, "text": " Yes, please + share your thoughts.", "tokens": [51302, 1079, 11, 1767, 2073, 428, 4598, 13, 51396], + "temperature": 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": 1.6586102719033233, + "no_speech_prob": 0.10723783075809479}, {"id": 557, "seek": 114052, "start": 1161.16, + "end": 1163.56, "text": " What are your thoughts on the future of vector databases?", + "tokens": [51396, 708, 366, 428, 4598, 322, 264, 2027, 295, 8062, 22380, 30, 51516], + "temperature": 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": 1.6586102719033233, + "no_speech_prob": 0.10723783075809479}, {"id": 558, "seek": 114052, "start": 1163.56, + "end": 1164.56, "text": " Hmm.", "tokens": [51516, 8239, 13, 51566], "temperature": + 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": 1.6586102719033233, + "no_speech_prob": 0.10723783075809479}, {"id": 559, "seek": 114052, "start": 1164.56, + "end": 1169.04, "text": " Do you think neural search frameworks will revolutionize + the way we build AI applications?", "tokens": [51566, 1144, 291, 519, 18161, 3164, + 29834, 486, 8894, 1125, 264, 636, 321, 1322, 7318, 5821, 30, 51790], "temperature": + 0.0, "avg_logprob": -0.16215332576206754, "compression_ratio": 1.6586102719033233, + "no_speech_prob": 0.10723783075809479}, {"id": 560, "seek": 116904, "start": 1169.04, + "end": 1171.04, "text": " That''s the big question.", "tokens": [50364, 663, 311, + 264, 955, 1168, 13, 50464], "temperature": 0.0, "avg_logprob": -0.16757558254485436, + "compression_ratio": 1.6036866359447004, "no_speech_prob": 0.25533971190452576}, + {"id": 561, "seek": 116904, "start": 1171.04, "end": 1173.24, "text": " Share your + insights, your predictions, and your questions.", "tokens": [50464, 14945, 428, + 14310, 11, 428, 21264, 11, 293, 428, 1651, 13, 50574], "temperature": 0.0, "avg_logprob": + 
-0.16757558254485436, "compression_ratio": 1.6036866359447004, "no_speech_prob": + 0.25533971190452576}, {"id": 562, "seek": 116904, "start": 1173.24, "end": 1174.24, + "text": " We''re all ears.", "tokens": [50574, 492, 434, 439, 8798, 13, 50624], + "temperature": 0.0, "avg_logprob": -0.16757558254485436, "compression_ratio": 1.6036866359447004, + "no_speech_prob": 0.25533971190452576}, {"id": 563, "seek": 116904, "start": 1174.24, + "end": 1176.32, "text": " We''re always eager to continue the conversation.", "tokens": + [50624, 492, 434, 1009, 18259, 281, 2354, 264, 3761, 13, 50728], "temperature": + 0.0, "avg_logprob": -0.16757558254485436, "compression_ratio": 1.6036866359447004, + "no_speech_prob": 0.25533971190452576}, {"id": 564, "seek": 116904, "start": 1176.32, + "end": 1180.6, "text": " Until next time, keep exploring, keep innovating, and keep + diving deep into the fascinating", "tokens": [50728, 9088, 958, 565, 11, 1066, 12736, + 11, 1066, 5083, 990, 11, 293, 1066, 20241, 2452, 666, 264, 10343, 50942], "temperature": + 0.0, "avg_logprob": -0.16757558254485436, "compression_ratio": 1.6036866359447004, + "no_speech_prob": 0.25533971190452576}, {"id": 565, "seek": 116904, "start": 1180.6, + "end": 1182.68, "text": " world of AI.", "tokens": [50942, 1002, 295, 7318, 13, + 51046], "temperature": 0.0, "avg_logprob": -0.16757558254485436, "compression_ratio": + 1.6036866359447004, "no_speech_prob": 0.25533971190452576}, {"id": 566, "seek": + 116904, "start": 1182.68, "end": 1184.8, "text": " That''s a wrap on this deep dive, + folks.", "tokens": [51046, 663, 311, 257, 7019, 322, 341, 2452, 9192, 11, 4024, + 13, 51152], "temperature": 0.0, "avg_logprob": -0.16757558254485436, "compression_ratio": + 1.6036866359447004, "no_speech_prob": 0.25533971190452576}, {"id": 567, "seek": + 116904, "start": 1184.8, "end": 1186.84, "text": " We hope you''ve enjoyed the journey + as much as we have.", "tokens": [51152, 492, 1454, 291, 600, 4626, 264, 4671, 382, + 709, 382, 
321, 362, 13, 51254], "temperature": 0.0, "avg_logprob": -0.16757558254485436, + "compression_ratio": 1.6036866359447004, "no_speech_prob": 0.25533971190452576}]' +--- + +Welcome back, everybody. Today, we're gonna be diving into the world of vector databases. Ooh, fun. Their rise, their potential fall, and what the future holds. Okay. You know, you sent us this fascinating Medium article to kind of guide our exploration. Ooh. +Called The Rise, Fall, and Future of Vector Databases: How to Pick the One That Lasts, by Dmitry Kan. Yeah, I saw that one. Published January 6th, 2025. Mm-hmm. So I guess the term vector database might actually be on its way out. +Really? And your choice of database could hinge on needing things like faceted search. Oh, wow. Or even those super cool late interaction models. Huh, interesting. Intrigued? I know I am. Let's break it all down. Okay, let's do it. This is gonna be good. +What I thought was so interesting about this article is how it really blends the technical side with the broader AI landscape. Yeah, you're right. It's not just about the nuts and bolts. It's about how perceptions and adoption of vector databases are shifting within the AI world. Absolutely. +Like this is not just a technical deep dive. Right. This is about how people are thinking about these things. Yeah. And using them. Totally. And Dmitry brings this really unique perspective. He does. To this whole conversation. +Because he was, like, deeply involved in this emerging market just a few years back. Oh, really? He was even advising VCs on which vector database companies to back. Wow. So he's like an insider. Yeah, he's got the inside scoop. So he really saw this whole thing unfold firsthand. +He was there from the beginning. Wow, that's amazing. And it's interesting because just a few years ago, vector databases were like the hot topic. They were everywhere, right? Everybody was talking about it.
Like they were the key to unlocking all these powerful AI applications. +Like this was gonna change everything. Everyone was so excited. Yeah. Okay, so let's unpack this whole rise and fall idea. Okay. So Dmitry noticed something interesting. What's up? Fewer people were reading his early articles about vector databases. Oh, really? Huh. I wonder why. +What do you make of that? Well, you know, it kind of hints at a potential shift in the industry. Okay. Instead of being seen as these, like, standalone solutions, it seems like vector search technology is kind of merging with other AI advancements, becoming part of a bigger picture. +Like what? Think LLMs, multimodal search. They're all getting more integrated. Okay, so it's not that vector databases are, like, vanishing. Right. They're evolving. Exactly. Blending into more comprehensive solutions. That's it. Okay, I got it. The technology itself is still crucial. +But how we think about it and use it is evolving. Okay. Like, you know, you have your traditional databases, right? Like your SQL and NoSQL types. Well, a lot of them have integrated vector search capabilities now. Oh, wow. +So the data type itself is becoming normalized within these existing systems. Okay. I see. So it's becoming more commonplace. Yeah. Exactly. It's not this, like, niche thing anymore. Right. It's just part of the toolkit. It's becoming part of the fabric of how we work with data. +I like that: the fabric of how we work with data. It's a good way to put it, right? But here's where things get really interesting. Okay. Tell me. Dmitry saw this resurgence in views for those older articles. Oh, so people are coming back to them. +They're coming back, and guess what? What? It was right when major funding announcements hit for some vector database startups back in April 2023. Oh, interesting. Like, big money was flowing back into this space. Yeah. Like Pinecone's $100 million Series B. Yeah. Pinecone was huge.
+Weaviate securing $50 million. Huh. And Qdrant getting $7.5 million in seed funding. Yeah. Those were big headlines. They were. It really highlights how much media coverage and industry buzz can influence how we perceive technology trends. Totally. It's like a self-fulfilling prophecy almost. Yeah. +And it makes you think, like, how much of what we perceive as, like, the next big thing is actually driven by, you know, the hype, the hype, the funding, the media attention. Yeah. It's fascinating. So clearly, there's still a ton of innovation and investment happening in the vector database space. +For sure. But let's get practical for our listener out there. Let's give them some actionable advice. The real gem in this article is Dmitry's guide for choosing the right vector database. Right. Because there's no one-size-fits-all solution. Exactly. It really depends on your specific needs. +It's like having a roadmap for navigating this complex landscape. Totally, a roadmap. So where does he suggest starting? Okay. I'm ready. His secret weapon is FAISS. Okay. Now, it's not technically a full-fledged database. Right. It's more of a powerful library. Okay. +But here's the kicker: it can handle massive data sets. Okay. We're talking over a billion vectors. Wow. That's a lot. So it can scale. It can handle the big stuff. And the beauty of FAISS is its simplicity and scalability. So it's perfect for, it's perfect for initial exploration and prototyping. +You can just get in there and start playing around. Exactly. Yeah. But of course, uh oh, there's always a but. There's a trade-off. Okay. What is it? No built-in filtering capabilities. Uh, so you can't really do that fine-grained search. Right. Like with keywords and stuff. +Which might mean getting creative with workarounds. Okay. So you've got to be a little clever. A little bit. If you want to use FAISS for certain things. Yeah. So if you need that fine-grained, controlled keyword search. Yeah.
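Before moving on to the keyword side: the library workflow just described — load all your vectors into one flat index, then query it for nearest neighbors — can be sketched in a few lines. This is a hedged toy illustration: plain NumPy brute force stands in for a flat vector index (which performs the same exact search, only heavily optimized), and all vectors are made-up random data.

```python
import numpy as np

# Toy "document" embeddings: 1,000 vectors of dimension 32,
# L2-normalized so that inner product equals cosine similarity.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 32)).astype(np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

def nearest(query, index, k=5):
    """Exact brute-force nearest-neighbor search by inner product.
    A flat vector index does exactly this computation, just far faster."""
    q = query / np.linalg.norm(query)
    scores = index @ q                 # similarity of the query to every vector
    top = np.argsort(-scores)[:k]      # ids of the k most similar vectors
    return top, scores[top]

ids, scores = nearest(rng.normal(size=32).astype(np.float32), docs)
```

Note the trade-off the conversation flags: this ranks by similarity only. Filtering ("red shoes under $100") has to be bolted on by post-filtering the returned ids — the kind of workaround being alluded to.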
+Along with your vector search, what does Dmitry recommend? I'm all ears. He suggests looking at databases built on top of Lucene. Lucene. Options like OpenSearch, Elasticsearch, and Apache Solr. Got it. So these are all built on this, like, solid foundation of Lucene technology. Yeah. +Lucene's been around for a while, right? It's a mature technology with a proven track record. Yeah. So it's reliable. Reliable. Yeah. And it provides that robust keyword search you mentioned. Okay. Plus multilingual support. That's important these days. Super important. +And it performs incredibly well. So it's fast and efficient. Nice. This makes it particularly well suited for e-commerce. Oh, interesting. Why e-commerce? Where features like faceting, which allows users to refine their search by specific attributes. Oh, I see. So like filtering by brands. Yeah. +Like filtering by brand. Price range. Price range, size. All those things are essential. Makes sense for e-commerce. He did mention one exception though, right? There's always an exception. What is it? Qdrant, even though it's not built on Lucene. Oh, okay. Includes faceting capabilities. +Interesting. So it's kind of a hybrid approach. Making it a contender in those scenarios too. So Qdrant's kind of a wild card. A little bit. Yeah. It's got its own unique set of features. And it shows the importance of going beyond general categories. Yeah. +And really digging into the specific features each database offers. Absolutely. You can't just, like, assume that because it's in one category. Right. It's got all the features you need. You got to do your research. You got to look under the hood. Exactly. +Now what if you need something more advanced? Okay. Like what? Like support for those late interaction models. Late interaction models, huh? Yeah. Have you heard of these? I've heard the term, but I'm not really sure what they are. Okay. So imagine you're searching for the perfect pair of red shoes. +Okay. I like shoes.
But only after you've seen a picture of the outfit you want them to match. Oh, I see. So like the context of the search changes. Exactly. Based on something you see later on. That's where late interaction models come in. Okay. +Allowing you to refine your search based on context that's only available later in the process. So it's like a more dynamic way of searching. It is a more dynamic way of searching. Interesting. And it requires a different level of database support. I bet. +And Dmitry points to Qdrant or Vespa as potential solutions. Okay. So they can handle those late interactions. Because they offer that support natively. So you don't have to hack it together yourself. Exactly. That's good to know. +So choosing a database that can handle those complexities is critical for performance and efficiency. You don't want your search to be slow and clunky. Especially if you're dealing with a lot of data. Right. Or if you need those results in real time. But it doesn't stop there. There's more. +The next step in Dmitry's roadmap is super important. Okay. Hit me. Considering latency. Latency, okay. And those queries-per-second demands. Oh, yes. QPS. Can make or break your application. You're telling me. If your database is slow. Yeah. Or it can't handle the volume of queries. +It's going to be a bad experience for the user. It's going to be a disaster. So you've got to think about those things up front. Absolutely. And choose a database that can handle the load. If high performance is the name of the game. Yeah. +You'll want to explore solutions like GSI APU, Vespa, or Hyperspace. Got it. In fact, Dmitry even shared an anecdote about a CTO. Oh, I love a good anecdote. Who confessed that no open source vector database could handle their extreme workload. Wow. So they had to go with a commercial solution. +They had to find something else. That's interesting. Choosing wisely is essential. You can't just pick the first one you see. No. You got to do your homework.
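The latency and QPS criteria discussed here are easy to sanity-check yourself before committing to a database: time each query, then look at throughput and tail latency. A toy back-of-the-envelope sketch — the index size, query count, and therefore the numbers it prints are purely illustrative:

```python
import time
import numpy as np

rng = np.random.default_rng(1)
index = rng.normal(size=(20000, 64)).astype(np.float32)  # toy vector index

def query_once(q):
    """One brute-force similarity query against the toy index."""
    return int(np.argmax(index @ q))

# Time a batch of queries, then derive tail latency and throughput.
latencies = []
for _ in range(100):
    q = rng.normal(size=64).astype(np.float32)
    t0 = time.perf_counter()
    query_once(q)
    latencies.append(time.perf_counter() - t0)

latencies.sort()
p95 = latencies[94]                    # 95th-percentile latency, in seconds
qps = len(latencies) / sum(latencies)  # single-threaded throughput
print(f"p95 latency: {p95 * 1e3:.2f} ms, ~{qps:.0f} QPS")
```

Running the same measurement against a candidate database — over the network, at realistic concurrency — is what tells you whether it survives your real workload, which is the point of the CTO anecdote.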
And think about your long-term needs. So the takeaway here is you need to think strategically. Yep. +Do you invest the engineering time to set up and maintain an open source database? Right. Or do you go with the convenience and potentially higher costs of a cloud solution? Right. It's a classic trade-off. There's no right or wrong answer. It depends on your situation. +It's all about finding the balance that works best for your specific situation. Absolutely. And there are a lot of great cloud and API-based options out there. Like what? Like Cosmos DB, Vertex AI, Pinecone Cloud, Weaviate Cloud, and others. So there's no shortage of options. +There's a lot to choose from. It's a good problem to have, right? It is a good problem to have. Better than having no options at all. And we love hearing from our community. Oh yes, our listeners are the best. One reader, Matt Collins, suggested exploring extensions like PG Vector. +PG Vector? Okay. Which adds vector search to PostgreSQL. Oh, so you can just add it onto your existing Postgres database. Exactly. That's pretty cool. It's a really clever solution. You don't have to rip and replace your whole infrastructure. +And it speaks to the constantly evolving nature of the vector database landscape. It's a fast-moving field. There's always something new happening. New solutions keep emerging. Speaking of evolution. Oh, this is where it gets really interesting. +Dmitry paints a fascinating picture of the future. I can't wait to hear this. He believes the future lies in what he calls neural search frameworks. Oh, wow. These frameworks could revolutionize how we build AI-powered applications. Okay. +Imagine a system that streamlines the entire process, from data modeling and embedding selection to evaluation and scaling.
+So instead of wrestling with the complexities of choosing and integrating all the different components, it would be like having an intelligent assistant guiding you through building a search application, no matter what database technology you're using. Exactly. +And this vision ties in nicely with the concept of compound AI systems. Oh, interesting. Where LLMs, vector databases, and other AI components work together like a well-coordinated orchestra. So instead of focusing on the individual instruments, you're conducting the entire symphony. Precisely. +I love that analogy. Users can then focus on the task they're trying to solve, rather than the technical nuts and bolts. So it's about abstracting away the complexity and empowering users to focus on the bigger picture. It's about making AI more accessible and user-friendly. +It's fascinating how this all connects to those funding announcements we talked about earlier. Right. It seems like the industry might be moving towards a more unified approach to AI solutions. That's a keen observation. While individual components like vector databases are still important. +For sure. The future might be about how these pieces fit into a larger ecosystem. Yeah, it's all about the big picture. This brings us to an interesting question for you, the listener. Oh, yes. Let's get our listeners involved. +Do you see neural search frameworks as a complete paradigm shift? Or will specialized vector databases continue to have a distinct role to play? It's a tough question. It's something to think about. Let us know your thoughts in the comments. We'd love to hear from you. +But before we get too caught up in the future, let's take a step back and revisit one of Dmitry's key points about the impact of media coverage on perceptions of technology trends. Right. That was a really important point. +It's a crucial reminder to be discerning consumers of information, especially in a field as dynamic as AI, where innovation is constant.
What might seem like a decline could actually be a natural evolution. Interesting. As a technology matures and finds its place within a larger ecosystem. Right. +Like a caterpillar transforming into a butterfly. That's a wonderful analogy. It's still the same creature, just in a more advanced and beautiful form. It underscores the importance of staying curious, continuing to explore, and never assuming that any technology is truly dead. +Because it might just be evolving into something even better. Who knows what exciting developments await us in the world of vector databases? I'm definitely eager to see what the future holds. Me too. This deep dive has given me a whole new perspective. I'm sure it has for our listeners as well. +You know, as we're discussing this, it strikes me that Dmitry's journey with vector databases mirrors a broader trend in the tech world. Oh, how so? We often get caught up in the hype cycle. Oh, yeah, for sure. +But true innovation often emerges when technologies evolve and integrate in unexpected ways. It's like that saying: the whole is greater than the sum of its parts. And that brings us to another crucial point from the article. Okay. One that I think holds immense value for our listeners today. +I'm all ears. Remember how Dmitry emphasized that it's not just about the vectors themselves. It's about understanding the nuances of data pre-processing, model selection, and embedding techniques. And even knowing when to switch back to traditional keyword search for certain tasks. +Yeah, sometimes the old ways are still the best. He's advocating for a more holistic approach where vector databases are seen as one tool among many in the AI toolbox. So it's not a silver bullet. It's not a magic solution. It's one piece of the puzzle. Exactly. +It's about a deeper understanding of the underlying principles, not just blindly applying the latest trendy technology. Right.
It's about making informed choices based on a thorough analysis of your specific needs and constraints. So don't just jump on the bandwagon. +Do your research and figure out what's right for you. So for those of you out there exploring AI solutions, don't get fixated on buzzwords. Take the time to really grasp the fundamentals. Understand the basics. Experiment with different approaches. Play around with different tools. +And don't be afraid to challenge assumptions. Question everything. And remember, the AI landscape is constantly evolving. What works best today might be superseded by something even more powerful and efficient tomorrow. So stay curious. Stay engaged. And keep learning. +Couldn't have said it better myself. Well, folks, that brings us to the end of our deep dive into the world of vector databases. It's been a wild ride. We've explored their rise, their potential fall, and the exciting possibilities of neural search frameworks. We've covered a lot of ground. +We've also learned some valuable lessons about navigating the hype cycle and making informed decisions in a rapidly changing technological landscape. It's been a fascinating journey. Absolutely. And we hope you've enjoyed it as much as we have. Until next time. Keep exploring. Keep questioning. +And keep that thirst for knowledge alive. It's funny, actually: while we're focused on all this cutting-edge tech, Dmitry actually kind of throws it back to basics in the article a little bit. Oh yeah, I remember that part. +He recounts this conversation he had with the chief data scientist at a major bank. That was a good one, which I thought was so interesting because it really emphasizes how even with all these advancements, sometimes the simplest solution is the best one. You don't always need the fanciest tools. +Right. Sometimes it's about using the right tool for the job. Exactly. This bank had poured resources into building this complex vector search system. Okay. But guess what? What?
They ended up getting better results with good old-fashioned keyword search. Really? For some very specific tasks. Huh. +So even the big banks are going back to basics sometimes. Sometimes it makes more sense. Yeah. It's a powerful reminder that we shouldn't dismiss those tried and true methods. Like don't throw out the baby with the bathwater. Right. Exactly. +These tools, when used strategically, can outperform the flashiest new tech. It's all about choosing the right tool for the job. It's like trying to use a chainsaw to cut a piece of paper. Oof. Yeah, that wouldn't end well. Sometimes a simple pair of scissors does the job better. Much better. +And that brings us back to Dmitry's vision of neural search frameworks. Okay. If they become a reality. Yeah. Could they simplify these choices for us? Interesting question. +Would they be able to determine the best approach, like whether it's vector search, keyword search, or a hybrid, based on the task at hand? That's the dream. That would be amazing. It would be like having this AI assistant that just knows the best way to search for anything. That's a fantastic question. +It really gets to the heart of what these frameworks could potentially offer. Like a more intelligent and adaptable approach to search. So instead of us having to figure it all out. Yeah. The system would just do it for us. +A vision of a system that not only combines different AI components, but also understands which tool is best suited for each specific task. That would be a game changer. It would free us from getting bogged down in the technical details and allow us to focus on solving real-world problems. +It's all about making AI work for us. Not the other way around. Exactly. And it highlights the importance of staying flexible and open to new possibilities. As the AI landscape continues to evolve, we need to be willing to adapt and embrace new approaches. +Even if they challenge existing assumptions. Challenge the status quo. Right.
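That per-task routing between keyword and vector search doesn't have to wait for full neural search frameworks: a crude version is plain hybrid scoring, blending both signals for every query. A toy sketch, where the hash-based stand-in "embedding", the documents, and the blend weight are all invented for illustration (a real system would use BM25 and a learned embedding model):

```python
import numpy as np

DOCS = [
    "postgres adds vector search via the pgvector extension",
    "keyword search with lucene bm25 and faceting",
    "flat indexes scale to a billion vectors",
]

def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms appearing verbatim in the document."""
    terms = query.lower().split()
    words = set(doc.lower().split())
    return sum(t in words for t in terms) / len(terms)

def embed(text: str, dim: int = 16) -> np.ndarray:
    """Stand-in embedding: hash words into a small dense vector.
    (A real system would call an embedding model here.)"""
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def hybrid_search(query: str, alpha: float = 0.5) -> list[int]:
    """Rank documents by a blend of keyword and vector scores.
    alpha=1.0 is pure keyword search; alpha=0.0 is pure vector search."""
    qv = embed(query)
    scored = [(alpha * keyword_score(query, d)
               + (1.0 - alpha) * float(embed(d) @ qv), i)
              for i, d in enumerate(DOCS)]
    return [i for _, i in sorted(scored, reverse=True)]

ranking = hybrid_search("vector search", alpha=1.0)
```

Tuning `alpha` per task (or learning it) is the pragmatic middle ground between the bank's back-to-keyword move and an all-vector setup.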
Well said. Well, this deep dive has been a whirlwind of information. It has. But I think the biggest takeaway for me is that the world of vector databases and AI in general is anything but static. +It's constantly changing. It's a constantly evolving landscape full of exciting possibilities and unexpected twists and turns. You never know what's going to come next. And that's what makes it so exciting. I completely agree. And it's a landscape that requires us to stay curious. Keep learning. +And never stop asking questions. And on that note, we'd love to hear from you, our listeners. Yes, please share your thoughts. What are your thoughts on the future of vector databases? Hmm. +Do you think neural search frameworks will revolutionize the way we build AI applications? That's the big question. Share your insights, your predictions, and your questions. We're all ears. We're always eager to continue the conversation. +Until next time, keep exploring, keep innovating, and keep diving deep into the fascinating world of AI. That's a wrap on this deep dive, folks. We hope you've enjoyed the journey as much as we have. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md b/transcripts_with_timestamps/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md new file mode 100644 index 0000000..4063c15 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/yaniv-vaknin-director-of-product-searchium-hardware-accelerated-vector-search.md @@ -0,0 +1,2490 @@ +--- +description: '

00:00 Introduction

01:11 Yaniv’s background and intro to Searchium + & GSI

04:12 Ways to consume the APU acceleration for vector search

05:39 + Power consumption dimension in vector search

07:40 Place of the platform in + terms of applications, use cases and developer experience

12:06 Advantages + of APU Vector Search Plugins for Elasticsearch and OpenSearch compared to their + own implementations

17:54 Everyone needs to save: the economic profile of + the APU solution

20:51 Features and ANN algorithms in the solution

24:23 + Consumers most interested in dedicated hardware for vector search vs SaaS

27:08 + Vector Database or a relevance oriented application?

33:51 Where to go with + vector search?

42:38 How Vector Search fits into Search

48:58 Role of + the human in the AI loop

58:05 The missing bit in the AI/ML/Search space

1:06:37 + Magical WHY question

1:09:54 Announcements

- Searchium vector search: + https://searchium.ai/

- Dr. Avidan Akerib, + founder behind the APU technology: https://www.linkedin.com/in/avidan-akerib-phd-bbb35b12/

- + OpenSearch benchmark for performance tuning: https://betterprogramming.pub/tired-of-troubleshooting-idle-search-resources-use-opensearch-benchmark-for-performance-tuning-d4277c9f724

- + APU KNN plugin for OpenSearch: https://towardsdatascience.com/bolster-opensearch-performance-with-5-simple-steps-ca7d21234f6b

- + Multilingual and Multimodal Search with Hardware Acceleration: https://blog.muves.io/multilingual-and-multimodal-vector-search-with-hardware-acceleration-2091a825de78

- + Muves talk at Berlin Buzzwords, where we have utilized GSI APU: https://blog.muves.io/muves-at-berlin-buzzwords-2022-3150eef01c4

- + Not All Vector Databases are made equal: https://towardsdatascience.com/milvus-pinecone-vespa-weaviate-vald-gsi-what-unites-these-buzz-words-and-what-makes-each-9c65a3bd0696

Episode + on YouTube: https://youtu.be/EerdWRPuqd4

Podcast + design: Saurabh Rai: https://twitter.com/srvbhr

' +image_url: https://media.rss.com/vector-podcast/ep_cover_20221221_081200_2671d7352871b25bd4959821449e1a69.jpg +pub_date: Wed, 21 Dec 2022 20:35:43 GMT +title: Yaniv Vaknin - Director of Product, Searchium - Hardware accelerated vector + search +url: https://rss.com/podcasts/vector-podcast/752549 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 23.56, "text": " Hello, + vector podcast is here. We again continue to roll in this season 2 of this podcast.", + "tokens": [50364, 2425, 11, 8062, 7367, 307, 510, 13, 492, 797, 2354, 281, 3373, + 294, 341, 3196, 568, 295, 341, 7367, 13, 51542], "temperature": 0.0, "avg_logprob": + -0.42507251103719074, "compression_ratio": 1.1111111111111112, "no_speech_prob": + 0.11455044895410538}, {"id": 1, "seek": 2356, "start": 23.56, "end": 31.6, "text": + " Today I have a very interesting guest, Yanny Vaknin, who is the director of product + at Searchum.", "tokens": [50364, 2692, 286, 362, 257, 588, 1880, 8341, 11, 398, + 11612, 691, 514, 22955, 11, 567, 307, 264, 5391, 295, 1674, 412, 17180, 449, 13, + 50766], "temperature": 0.0, "avg_logprob": -0.4544465201241629, "compression_ratio": + 1.4766839378238341, "no_speech_prob": 0.44815152883529663}, {"id": 2, "seek": 2356, + "start": 31.6, "end": 36.8, "text": " If you''ve read my blog post on not all vector + databases, I made equal.", "tokens": [50766, 759, 291, 600, 1401, 452, 6968, 2183, + 322, 406, 439, 8062, 22380, 11, 286, 1027, 2681, 13, 51026], "temperature": 0.0, + "avg_logprob": -0.4544465201241629, "compression_ratio": 1.4766839378238341, "no_speech_prob": + 0.44815152883529663}, {"id": 3, "seek": 2356, "start": 36.8, "end": 41.239999999999995, + "text": " One of the vector databases, all like technologies, stood out.", "tokens": + [51026, 1485, 295, 264, 8062, 22380, 11, 439, 411, 7943, 11, 9371, 484, 13, 51248], + "temperature": 0.0, "avg_logprob": -0.4544465201241629, "compression_ratio": 1.4766839378238341, + "no_speech_prob": 0.44815152883529663}, 
{"id": 4, "seek": 2356, "start": 41.239999999999995, + "end": 47.239999999999995, "text": " And it''s a technology made by GSI, technology + company.", "tokens": [51248, 400, 309, 311, 257, 2899, 1027, 538, 460, 20262, 11, + 2899, 2237, 13, 51548], "temperature": 0.0, "avg_logprob": -0.4544465201241629, + "compression_ratio": 1.4766839378238341, "no_speech_prob": 0.44815152883529663}, + {"id": 5, "seek": 4724, "start": 47.28, "end": 51.28, "text": " And it''s implementing + a hardware for vector search.", "tokens": [50366, 400, 309, 311, 18114, 257, 8837, + 337, 8062, 3164, 13, 50566], "temperature": 0.0, "avg_logprob": -0.2543359221073619, + "compression_ratio": 1.5333333333333334, "no_speech_prob": 0.029213469475507736}, + {"id": 6, "seek": 4724, "start": 51.28, "end": 57.480000000000004, "text": " It''s + very rare that this thing exists or this approach exists on the market today.", + "tokens": [50566, 467, 311, 588, 5892, 300, 341, 551, 8198, 420, 341, 3109, 8198, + 322, 264, 2142, 965, 13, 50876], "temperature": 0.0, "avg_logprob": -0.2543359221073619, + "compression_ratio": 1.5333333333333334, "no_speech_prob": 0.029213469475507736}, + {"id": 7, "seek": 4724, "start": 57.480000000000004, "end": 60.440000000000005, + "text": " And I''m super excited to talk to Yanny Vaknin.", "tokens": [50876, 400, + 286, 478, 1687, 2919, 281, 751, 281, 398, 11612, 691, 514, 22955, 13, 51024], "temperature": + 0.0, "avg_logprob": -0.2543359221073619, "compression_ratio": 1.5333333333333334, + "no_speech_prob": 0.029213469475507736}, {"id": 8, "seek": 4724, "start": 60.440000000000005, + "end": 61.440000000000005, "text": " How are you, Yanny Vaknin?", "tokens": [51024, + 1012, 366, 291, 11, 398, 11612, 691, 514, 22955, 30, 51074], "temperature": 0.0, + "avg_logprob": -0.2543359221073619, "compression_ratio": 1.5333333333333334, "no_speech_prob": + 0.029213469475507736}, {"id": 9, "seek": 4724, "start": 61.440000000000005, "end": + 62.64, "text": " Hey.", "tokens": [51074, 
1911, 13, 51134], "temperature": 0.0, + "avg_logprob": -0.2543359221073619, "compression_ratio": 1.5333333333333334, "no_speech_prob": + 0.029213469475507736}, {"id": 10, "seek": 4724, "start": 62.64, "end": 65.48, "text": + " Great. Thanks for having me, maybe, Mietri.", "tokens": [51134, 3769, 13, 2561, + 337, 1419, 385, 11, 1310, 11, 376, 1684, 470, 13, 51276], "temperature": 0.0, "avg_logprob": + -0.2543359221073619, "compression_ratio": 1.5333333333333334, "no_speech_prob": + 0.029213469475507736}, {"id": 11, "seek": 4724, "start": 65.48, "end": 70.72, "text": + " Yeah. I''m really glad you joined and found time in your business schedule.", + "tokens": [51276, 865, 13, 286, 478, 534, 5404, 291, 6869, 293, 1352, 565, 294, + 428, 1606, 7567, 13, 51538], "temperature": 0.0, "avg_logprob": -0.2543359221073619, + "compression_ratio": 1.5333333333333334, "no_speech_prob": 0.029213469475507736}, + {"id": 12, "seek": 4724, "start": 70.72, "end": 75.68, "text": " So can you first + explain how searchum and GSI are related?", "tokens": [51538, 407, 393, 291, 700, + 2903, 577, 3164, 449, 293, 460, 20262, 366, 4077, 30, 51786], "temperature": 0.0, + "avg_logprob": -0.2543359221073619, "compression_ratio": 1.5333333333333334, "no_speech_prob": + 0.029213469475507736}, {"id": 13, "seek": 7568, "start": 75.68, "end": 83.28, "text": + " And maybe at the same time, if you could traditionally introduce yourself and + talk about your background and how you got here.", "tokens": [50364, 400, 1310, + 412, 264, 912, 565, 11, 498, 291, 727, 19067, 5366, 1803, 293, 751, 466, 428, 3678, + 293, 577, 291, 658, 510, 13, 50744], "temperature": 0.0, "avg_logprob": -0.24611820318760017, + "compression_ratio": 1.5849056603773586, "no_speech_prob": 0.006571581587195396}, + {"id": 14, "seek": 7568, "start": 83.28, "end": 87.0, "text": " Yeah. 
So maybe I + will start with quick introduction.", "tokens": [50744, 865, 13, 407, 1310, 286, + 486, 722, 365, 1702, 9339, 13, 50930], "temperature": 0.0, "avg_logprob": -0.24611820318760017, + "compression_ratio": 1.5849056603773586, "no_speech_prob": 0.006571581587195396}, + {"id": 15, "seek": 7568, "start": 87.0, "end": 91.88000000000001, "text": " So I''m + director of product at Searchum AI.", "tokens": [50930, 407, 286, 478, 5391, 295, + 1674, 412, 17180, 449, 7318, 13, 51174], "temperature": 0.0, "avg_logprob": -0.24611820318760017, + "compression_ratio": 1.5849056603773586, "no_speech_prob": 0.006571581587195396}, + {"id": 16, "seek": 7568, "start": 91.88000000000001, "end": 104.44000000000001, + "text": " Searchum AI is a SaaS platform for ML search application, based on purpose + build AI chip for search applications.", "tokens": [51174, 17180, 449, 7318, 307, + 257, 49733, 3663, 337, 21601, 3164, 3861, 11, 2361, 322, 4334, 1322, 7318, 11409, + 337, 3164, 5821, 13, 51802], "temperature": 0.0, "avg_logprob": -0.24611820318760017, + "compression_ratio": 1.5849056603773586, "no_speech_prob": 0.006571581587195396}, + {"id": 17, "seek": 10444, "start": 104.44, "end": 118.67999999999999, "text": " + Prior to this role, I''ve worked at AWS as a machine learning specialist where I''ve + worked with broad spectrum of top t top tier tech companies.", "tokens": [50364, + 24032, 281, 341, 3090, 11, 286, 600, 2732, 412, 17650, 382, 257, 3479, 2539, 17008, + 689, 286, 600, 2732, 365, 4152, 11143, 295, 1192, 256, 1192, 12362, 7553, 3431, + 13, 51076], "temperature": 0.0, "avg_logprob": -0.23917608003358584, "compression_ratio": + 1.5804878048780489, "no_speech_prob": 0.062350302934646606}, {"id": 18, "seek": + 10444, "start": 118.67999999999999, "end": 121.44, "text": " Trying to help them + in their machine learning domain.", "tokens": [51076, 20180, 281, 854, 552, 294, + 641, 3479, 2539, 9274, 13, 51214], "temperature": 0.0, "avg_logprob": -0.23917608003358584, + 
"compression_ratio": 1.5804878048780489, "no_speech_prob": 0.062350302934646606}, + {"id": 19, "seek": 10444, "start": 121.44, "end": 133.68, "text": " And I was super + excited from the revolution of the like the fifth revolution, the AI revolution + with cool stuff of NLP search.", "tokens": [51214, 400, 286, 390, 1687, 2919, 490, + 264, 8894, 295, 264, 411, 264, 9266, 8894, 11, 264, 7318, 8894, 365, 1627, 1507, + 295, 426, 45196, 3164, 13, 51826], "temperature": 0.0, "avg_logprob": -0.23917608003358584, + "compression_ratio": 1.5804878048780489, "no_speech_prob": 0.062350302934646606}, + {"id": 20, "seek": 13368, "start": 133.68, "end": 137.32, "text": " Unstructed data + structure data.", "tokens": [50364, 1156, 372, 1757, 292, 1412, 3877, 1412, 13, + 50546], "temperature": 0.0, "avg_logprob": -0.3357957971507105, "compression_ratio": + 1.3680981595092025, "no_speech_prob": 0.025602206587791443}, {"id": 21, "seek": + 13368, "start": 137.32, "end": 146.08, "text": " I''ve worked with various companies + cyber, fintech e-commerce, etc.", "tokens": [50546, 286, 600, 2732, 365, 3683, 3431, + 13411, 11, 283, 686, 5023, 308, 12, 26926, 11, 5183, 13, 50984], "temperature": + 0.0, "avg_logprob": -0.3357957971507105, "compression_ratio": 1.3680981595092025, + "no_speech_prob": 0.025602206587791443}, {"id": 22, "seek": 13368, "start": 146.08, + "end": 157.12, "text": " I was co founder and CEO of deep sea AI, which was the + first computer vision based system for open water drowning detection.", "tokens": + [50984, 286, 390, 598, 14917, 293, 9282, 295, 2452, 4158, 7318, 11, 597, 390, 264, + 700, 3820, 5201, 2361, 1185, 337, 1269, 1281, 37198, 17784, 13, 51536], "temperature": + 0.0, "avg_logprob": -0.3357957971507105, "compression_ratio": 1.3680981595092025, + "no_speech_prob": 0.025602206587791443}, {"id": 23, "seek": 15712, "start": 157.12, + "end": 161.24, "text": " So we are the SaaS solution of GSI.", "tokens": [50364, + 407, 321, 366, 264, 49733, 3827, 295, 460, 
20262, 13, 50570], "temperature": 0.0, + "avg_logprob": -0.2721675025092231, "compression_ratio": 1.445414847161572, "no_speech_prob": + 0.04016530513763428}, {"id": 24, "seek": 15712, "start": 161.24, "end": 165.44, + "text": " GSI acquired an Israeli startup a few years ago.", "tokens": [50570, 460, + 20262, 17554, 364, 19974, 18578, 257, 1326, 924, 2057, 13, 50780], "temperature": + 0.0, "avg_logprob": -0.2721675025092231, "compression_ratio": 1.445414847161572, + "no_speech_prob": 0.04016530513763428}, {"id": 25, "seek": 15712, "start": 165.44, + "end": 172.52, "text": " And the founder is Dr. Avidan Krib, which is one of the + smartest guy that I ever met.", "tokens": [50780, 400, 264, 14917, 307, 2491, 13, + 11667, 31675, 591, 2024, 11, 597, 307, 472, 295, 264, 41491, 2146, 300, 286, 1562, + 1131, 13, 51134], "temperature": 0.0, "avg_logprob": -0.2721675025092231, "compression_ratio": + 1.445414847161572, "no_speech_prob": 0.04016530513763428}, {"id": 26, "seek": 15712, + "start": 172.52, "end": 177.0, "text": " And during this PhD, he invented a new + concept.", "tokens": [51134, 400, 1830, 341, 14476, 11, 415, 14479, 257, 777, 3410, + 13, 51358], "temperature": 0.0, "avg_logprob": -0.2721675025092231, "compression_ratio": + 1.445414847161572, "no_speech_prob": 0.04016530513763428}, {"id": 27, "seek": 15712, + "start": 177.0, "end": 186.88, "text": " So traditionally, CPU is communicating + with the memory and then you have like challenges of bottleneck, IO, etc.", "tokens": + [51358, 407, 19067, 11, 13199, 307, 17559, 365, 264, 4675, 293, 550, 291, 362, 411, + 4759, 295, 44641, 547, 11, 39839, 11, 5183, 13, 51852], "temperature": 0.0, "avg_logprob": + -0.2721675025092231, "compression_ratio": 1.445414847161572, "no_speech_prob": 0.04016530513763428}, + {"id": 28, "seek": 18688, "start": 186.88, "end": 193.6, "text": " But when you + build the new concept, you build a memory that is the processor.", "tokens": [50364, + 583, 562, 291, 1322, 264, 777, 3410, 
11, 291, 1322, 257, 4675, 300, 307, 264, 15321, + 13, 50700], "temperature": 0.0, "avg_logprob": -0.19830200407240126, "compression_ratio": + 1.7880434782608696, "no_speech_prob": 0.004914760589599609}, {"id": 29, "seek": + 18688, "start": 193.6, "end": 197.76, "text": " So all of the computation is happening + inside the memory.", "tokens": [50700, 407, 439, 295, 264, 24903, 307, 2737, 1854, + 264, 4675, 13, 50908], "temperature": 0.0, "avg_logprob": -0.19830200407240126, + "compression_ratio": 1.7880434782608696, "no_speech_prob": 0.004914760589599609}, + {"id": 30, "seek": 18688, "start": 197.76, "end": 213.88, "text": " You can guess + that when you''re running heavy or intensive intensive memory applications, if all + of the processing is happening inside the memory, you can get a single digit millisecond + latency.", "tokens": [50908, 509, 393, 2041, 300, 562, 291, 434, 2614, 4676, 420, + 18957, 18957, 4675, 5821, 11, 498, 439, 295, 264, 9007, 307, 2737, 1854, 264, 4675, + 11, 291, 393, 483, 257, 2167, 14293, 27940, 18882, 27043, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.19830200407240126, "compression_ratio": 1.7880434782608696, + "no_speech_prob": 0.004914760589599609}, {"id": 31, "seek": 21388, "start": 213.88, + "end": 222.72, "text": " Yeah, so GSI acquired the Israeli startup and we are based + in Israel.", "tokens": [50364, 865, 11, 370, 460, 20262, 17554, 264, 19974, 18578, + 293, 321, 366, 2361, 294, 5674, 13, 50806], "temperature": 0.0, "avg_logprob": -0.24529946018272722, + "compression_ratio": 1.4240837696335078, "no_speech_prob": 0.21629437804222107}, + {"id": 32, "seek": 21388, "start": 222.72, "end": 228.6, "text": " We have an R&D + team of approximately 40 to 50 people.", "tokens": [50806, 492, 362, 364, 497, 5, + 35, 1469, 295, 10447, 3356, 281, 2625, 561, 13, 51100], "temperature": 0.0, "avg_logprob": + -0.24529946018272722, "compression_ratio": 1.4240837696335078, "no_speech_prob": + 0.21629437804222107}, {"id": 33, "seek": 
21388, "start": 228.6, "end": 235.0, "text": + " In order to scale it, we started searching AI because it''s super hard to scale + hardware.", "tokens": [51100, 682, 1668, 281, 4373, 309, 11, 321, 1409, 10808, 7318, + 570, 309, 311, 1687, 1152, 281, 4373, 8837, 13, 51420], "temperature": 0.0, "avg_logprob": + -0.24529946018272722, "compression_ratio": 1.4240837696335078, "no_speech_prob": + 0.21629437804222107}, {"id": 34, "seek": 21388, "start": 235.0, "end": 240.8, "text": + " So today we are offering this unique hardware on the cloud.", "tokens": [51420, + 407, 965, 321, 366, 8745, 341, 3845, 8837, 322, 264, 4588, 13, 51710], "temperature": + 0.0, "avg_logprob": -0.24529946018272722, "compression_ratio": 1.4240837696335078, + "no_speech_prob": 0.21629437804222107}, {"id": 35, "seek": 24080, "start": 240.8, + "end": 251.56, "text": " It can be AWS, GCP or any other cloud and customers can + consume it as a SaaS platform.", "tokens": [50364, 467, 393, 312, 17650, 11, 460, + 20049, 420, 604, 661, 4588, 293, 4581, 393, 14732, 309, 382, 257, 49733, 3663, 13, + 50902], "temperature": 0.0, "avg_logprob": -0.22140585658061934, "compression_ratio": + 1.419811320754717, "no_speech_prob": 0.04357348009943962}, {"id": 36, "seek": 24080, + "start": 251.56, "end": 253.16000000000003, "text": " Yeah, makes sense.", "tokens": + [50902, 865, 11, 1669, 2020, 13, 50982], "temperature": 0.0, "avg_logprob": -0.22140585658061934, + "compression_ratio": 1.419811320754717, "no_speech_prob": 0.04357348009943962}, + {"id": 37, "seek": 24080, "start": 253.16000000000003, "end": 261.16, "text": " + But there is still an option to if I want to have a completely on-premise sort of + like setup, right?", "tokens": [50982, 583, 456, 307, 920, 364, 3614, 281, 498, + 286, 528, 281, 362, 257, 2584, 322, 12, 29403, 908, 1333, 295, 411, 8657, 11, 558, + 30, 51382], "temperature": 0.0, "avg_logprob": -0.22140585658061934, "compression_ratio": + 1.419811320754717, "no_speech_prob": 
0.04357348009943962}, {"id": 38, "seek": 24080, + "start": 261.16, "end": 268.0, "text": " In principle, I could have bought like + the APU cards, APU being a associative processing unit.", "tokens": [51382, 682, + 8665, 11, 286, 727, 362, 4243, 411, 264, 5372, 52, 5632, 11, 5372, 52, 885, 257, + 4180, 1166, 9007, 4985, 13, 51724], "temperature": 0.0, "avg_logprob": -0.22140585658061934, + "compression_ratio": 1.419811320754717, "no_speech_prob": 0.04357348009943962}, + {"id": 39, "seek": 26800, "start": 268.0, "end": 272.56, "text": " Cards and like + install them similar to what I would do with GPU. Is that right?", "tokens": [50364, + 383, 2287, 293, 411, 3625, 552, 2531, 281, 437, 286, 576, 360, 365, 18407, 13, 1119, + 300, 558, 30, 50592], "temperature": 0.0, "avg_logprob": -0.20464169081821237, "compression_ratio": + 1.6140350877192982, "no_speech_prob": 0.052067022770643234}, {"id": 40, "seek": + 26800, "start": 272.56, "end": 273.64, "text": " Yeah, yeah, totally.", "tokens": + [50592, 865, 11, 1338, 11, 3879, 13, 50646], "temperature": 0.0, "avg_logprob": + -0.20464169081821237, "compression_ratio": 1.6140350877192982, "no_speech_prob": + 0.052067022770643234}, {"id": 41, "seek": 26800, "start": 273.64, "end": 276.64, + "text": " Yeah, so there are two types of implementation.", "tokens": [50646, 865, + 11, 370, 456, 366, 732, 3467, 295, 11420, 13, 50796], "temperature": 0.0, "avg_logprob": + -0.20464169081821237, "compression_ratio": 1.6140350877192982, "no_speech_prob": + 0.052067022770643234}, {"id": 42, "seek": 26800, "start": 276.64, "end": 282.24, + "text": " The first one is on-prem and the second is via the cloud.", "tokens": + [50796, 440, 700, 472, 307, 322, 12, 29403, 293, 264, 1150, 307, 5766, 264, 4588, + 13, 51076], "temperature": 0.0, "avg_logprob": -0.20464169081821237, "compression_ratio": + 1.6140350877192982, "no_speech_prob": 0.052067022770643234}, {"id": 43, "seek": + 26800, "start": 282.24, "end": 297.4, "text": " There are various 
configurations + and in terms of if for instance, customers that would like to consume it as an on-prem + solution, there are various capabilities.", "tokens": [51076, 821, 366, 3683, 31493, + 293, 294, 2115, 295, 498, 337, 5197, 11, 4581, 300, 576, 411, 281, 14732, 309, 382, + 364, 322, 12, 29403, 3827, 11, 456, 366, 3683, 10862, 13, 51834], "temperature": + 0.0, "avg_logprob": -0.20464169081821237, "compression_ratio": 1.6140350877192982, + "no_speech_prob": 0.052067022770643234}, {"id": 44, "seek": 29740, "start": 297.4, + "end": 307.76, "text": " And one of the major things about this hardware accelerator + is the power consumption.", "tokens": [50364, 400, 472, 295, 264, 2563, 721, 466, + 341, 8837, 39889, 307, 264, 1347, 12126, 13, 50882], "temperature": 0.0, "avg_logprob": + -0.13492685045514788, "compression_ratio": 1.6388888888888888, "no_speech_prob": + 0.006789816077798605}, {"id": 45, "seek": 29740, "start": 307.76, "end": 316.03999999999996, + "text": " So comparing it to CPU or GPU, it is like can be 5% or 10% of the power + consumption.", "tokens": [50882, 407, 15763, 309, 281, 13199, 420, 18407, 11, 309, + 307, 411, 393, 312, 1025, 4, 420, 1266, 4, 295, 264, 1347, 12126, 13, 51296], "temperature": + 0.0, "avg_logprob": -0.13492685045514788, "compression_ratio": 1.6388888888888888, + "no_speech_prob": 0.006789816077798605}, {"id": 46, "seek": 29740, "start": 316.03999999999996, + "end": 327.03999999999996, "text": " So companies that are running heavy workloads + of GPU and CPU, the total cost of ownership for them is the power consumption.", + "tokens": [51296, 407, 3431, 300, 366, 2614, 4676, 32452, 295, 18407, 293, 13199, + 11, 264, 3217, 2063, 295, 15279, 337, 552, 307, 264, 1347, 12126, 13, 51846], "temperature": + 0.0, "avg_logprob": -0.13492685045514788, "compression_ratio": 1.6388888888888888, + "no_speech_prob": 0.006789816077798605}, {"id": 47, "seek": 32704, "start": 327.04, + "end": 328.44, "text": " And other factors.", "tokens": [50364, 
400, 661, 6771, + 13, 50434], "temperature": 0.0, "avg_logprob": -0.1871465618690748, "compression_ratio": + 1.6371681415929205, "no_speech_prob": 0.018471654504537582}, {"id": 48, "seek": + 32704, "start": 328.44, "end": 339.96000000000004, "text": " So on-prem customers + can reduce the infrastructure cost in terms of the total cost of ownership, power + consumption, etc.", "tokens": [50434, 407, 322, 12, 29403, 4581, 393, 5407, 264, + 6896, 2063, 294, 2115, 295, 264, 3217, 2063, 295, 15279, 11, 1347, 12126, 11, 5183, + 13, 51010], "temperature": 0.0, "avg_logprob": -0.1871465618690748, "compression_ratio": + 1.6371681415929205, "no_speech_prob": 0.018471654504537582}, {"id": 49, "seek": + 32704, "start": 339.96000000000004, "end": 341.0, "text": " Yeah, this is really + cool.", "tokens": [51010, 865, 11, 341, 307, 534, 1627, 13, 51062], "temperature": + 0.0, "avg_logprob": -0.1871465618690748, "compression_ratio": 1.6371681415929205, + "no_speech_prob": 0.018471654504537582}, {"id": 50, "seek": 32704, "start": 341.0, + "end": 348.20000000000005, "text": " And I think it''s not very frequently that + we mentioned power consumption as one of the dimensions on this podcast.", "tokens": + [51062, 400, 286, 519, 309, 311, 406, 588, 10374, 300, 321, 2835, 1347, 12126, 382, + 472, 295, 264, 12819, 322, 341, 7367, 13, 51422], "temperature": 0.0, "avg_logprob": + -0.1871465618690748, "compression_ratio": 1.6371681415929205, "no_speech_prob": + 0.018471654504537582}, {"id": 51, "seek": 32704, "start": 348.20000000000005, "end": + 354.36, "text": " I mean, I think it''s crucial of course for the planet and also + for the electricity bill.", "tokens": [51422, 286, 914, 11, 286, 519, 309, 311, + 11462, 295, 1164, 337, 264, 5054, 293, 611, 337, 264, 10356, 2961, 13, 51730], "temperature": + 0.0, "avg_logprob": -0.1871465618690748, "compression_ratio": 1.6371681415929205, + "no_speech_prob": 0.018471654504537582}, {"id": 52, "seek": 35436, "start": 354.36, + "end": 360.76, 
"text": " And how the electricity costs skyrocketing, you know, and + I think it''s quite important.", "tokens": [50364, 400, 577, 264, 10356, 5497, 5443, + 37463, 278, 11, 291, 458, 11, 293, 286, 519, 309, 311, 1596, 1021, 13, 50684], "temperature": + 0.0, "avg_logprob": -0.30089751533840015, "compression_ratio": 1.446927374301676, + "no_speech_prob": 0.10110354423522949}, {"id": 53, "seek": 35436, "start": 360.76, + "end": 363.36, "text": " Yeah, sorry.", "tokens": [50684, 865, 11, 2597, 13, 50814], + "temperature": 0.0, "avg_logprob": -0.30089751533840015, "compression_ratio": 1.446927374301676, + "no_speech_prob": 0.10110354423522949}, {"id": 54, "seek": 35436, "start": 363.36, + "end": 378.24, "text": " No, I was just kind of alluding to this fact that it''s + very like you should not skip it in kind of assessing a system or like a vector + search solution, right.", "tokens": [50814, 883, 11, 286, 390, 445, 733, 295, 439, + 33703, 281, 341, 1186, 300, 309, 311, 588, 411, 291, 820, 406, 10023, 309, 294, + 733, 295, 34348, 257, 1185, 420, 411, 257, 8062, 3164, 3827, 11, 558, 13, 51558], + "temperature": 0.0, "avg_logprob": -0.30089751533840015, "compression_ratio": 1.446927374301676, + "no_speech_prob": 0.10110354423522949}, {"id": 55, "seek": 37824, "start": 378.24, + "end": 388.88, "text": " Not only focusing entirely on the offering itself, like + you should still worry about how it will scale in different dimensions.", "tokens": + [50364, 1726, 787, 8416, 7696, 322, 264, 8745, 2564, 11, 411, 291, 820, 920, 3292, + 466, 577, 309, 486, 4373, 294, 819, 12819, 13, 50896], "temperature": 0.0, "avg_logprob": + -0.24110528628031414, "compression_ratio": 1.4516129032258065, "no_speech_prob": + 0.5214963555335999}, {"id": 56, "seek": 37824, "start": 388.88, "end": 393.92, "text": + " I''m glad you guys also worry about the power consumption part.", "tokens": [50896, + 286, 478, 5404, 291, 1074, 611, 3292, 466, 264, 1347, 12126, 644, 13, 51148], "temperature": + 0.0, 
"avg_logprob": -0.24110528628031414, "compression_ratio": 1.4516129032258065, + "no_speech_prob": 0.5214963555335999}, {"id": 57, "seek": 37824, "start": 393.92, + "end": 400.8, "text": " Yeah, low carbon footprint is a major issue right now and + especially in Europe.", "tokens": [51148, 865, 11, 2295, 5954, 24222, 307, 257, + 2563, 2734, 558, 586, 293, 2318, 294, 3315, 13, 51492], "temperature": 0.0, "avg_logprob": + -0.24110528628031414, "compression_ratio": 1.4516129032258065, "no_speech_prob": + 0.5214963555335999}, {"id": 58, "seek": 40080, "start": 401.76, "end": 412.88, "text": + " Usually developers when they are launching the AWS instances, so they choose by + parameters of virtual CPU, RAM, etc.", "tokens": [50412, 11419, 8849, 562, 436, + 366, 18354, 264, 17650, 14519, 11, 370, 436, 2826, 538, 9834, 295, 6374, 13199, + 11, 14561, 11, 5183, 13, 50968], "temperature": 0.0, "avg_logprob": -0.23259834096401552, + "compression_ratio": 1.6018957345971565, "no_speech_prob": 0.06344892084598541}, + {"id": 59, "seek": 40080, "start": 412.88, "end": 430.72, "text": " And they are + not aware of the carbon footprint, but when you are running it on-prem, this is + a major parameters and this is a key parameter to, you know, taking a decision what + is the right platform or right is the right.", "tokens": [50968, 400, 436, 366, + 406, 3650, 295, 264, 5954, 24222, 11, 457, 562, 291, 366, 2614, 309, 322, 12, 29403, + 11, 341, 307, 257, 2563, 9834, 293, 341, 307, 257, 2141, 13075, 281, 11, 291, 458, + 11, 1940, 257, 3537, 437, 307, 264, 558, 3663, 420, 558, 307, 264, 558, 13, 51860], + "temperature": 0.0, "avg_logprob": -0.23259834096401552, "compression_ratio": 1.6018957345971565, + "no_speech_prob": 0.06344892084598541}, {"id": 60, "seek": 43072, "start": 430.72, + "end": 433.72, "text": " Hardware for you.", "tokens": [50364, 11817, 3039, 337, + 291, 13, 50514], "temperature": 0.0, "avg_logprob": -0.25560032028749763, "compression_ratio": + 1.6411483253588517, 
"no_speech_prob": 0.020188316702842712}, {"id": 61, "seek": + 43072, "start": 433.72, "end": 447.72, "text": " So totally agree, but, you know, + I believe in an agree with you, we should take it into consideration and assume + for cloud providers to integrate cloud providers like AWS, GCP, Azure.", "tokens": + [50514, 407, 3879, 3986, 11, 457, 11, 291, 458, 11, 286, 1697, 294, 364, 3986, 365, + 291, 11, 321, 820, 747, 309, 666, 12381, 293, 6552, 337, 4588, 11330, 281, 13365, + 4588, 11330, 411, 17650, 11, 460, 20049, 11, 11969, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.25560032028749763, "compression_ratio": 1.6411483253588517, "no_speech_prob": + 0.020188316702842712}, {"id": 62, "seek": 43072, "start": 447.72, "end": 457.72, + "text": " This can be, you know, critical for them and we are in conversation with + some of the company of some of the cloud providers that I mentioned.", "tokens": + [51214, 639, 393, 312, 11, 291, 458, 11, 4924, 337, 552, 293, 321, 366, 294, 3761, + 365, 512, 295, 264, 2237, 295, 512, 295, 264, 4588, 11330, 300, 286, 2835, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.25560032028749763, "compression_ratio": 1.6411483253588517, + "no_speech_prob": 0.020188316702842712}, {"id": 63, "seek": 45772, "start": 457.72, + "end": 459.72, "text": " Yeah, this sounds great.", "tokens": [50364, 865, 11, 341, + 3263, 869, 13, 50464], "temperature": 0.0, "avg_logprob": -0.19881485034893084, + "compression_ratio": 1.4977168949771689, "no_speech_prob": 0.008465241640806198}, + {"id": 64, "seek": 45772, "start": 459.72, "end": 473.72, "text": " If we move a + little bit closer to the algorithm side, so this is a kind of like dedicated hardware + and as far as I understood also based on our brain buzz, buzzwords presentation.", + "tokens": [50464, 759, 321, 1286, 257, 707, 857, 4966, 281, 264, 9284, 1252, 11, + 370, 341, 307, 257, 733, 295, 411, 8374, 8837, 293, 382, 1400, 382, 286, 7320, 611, + 2361, 322, 527, 3567, 13036, 11, 13036, 
13832, 5860, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.19881485034893084, "compression_ratio": 1.4977168949771689, + "no_speech_prob": 0.008465241640806198}, {"id": 65, "seek": 45772, "start": 473.72, + "end": 478.72, "text": " This hardware can support not only vector search, but some + other scenarios, right.", "tokens": [51164, 639, 8837, 393, 1406, 406, 787, 8062, + 3164, 11, 457, 512, 661, 15077, 11, 558, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.19881485034893084, "compression_ratio": 1.4977168949771689, "no_speech_prob": + 0.008465241640806198}, {"id": 66, "seek": 45772, "start": 478.72, "end": 482.72, + "text": " Like for image processing related tasks.", "tokens": [51414, 1743, 337, + 3256, 9007, 4077, 9608, 13, 51614], "temperature": 0.0, "avg_logprob": -0.19881485034893084, + "compression_ratio": 1.4977168949771689, "no_speech_prob": 0.008465241640806198}, + {"id": 67, "seek": 48272, "start": 482.72, "end": 494.72, "text": " So is there + any kind of like constraint on what type of vector search algorithm you can implement + or is it doesn''t it doesn''t have any constraint.", "tokens": [50364, 407, 307, + 456, 604, 733, 295, 411, 25534, 322, 437, 2010, 295, 8062, 3164, 9284, 291, 393, + 4445, 420, 307, 309, 1177, 380, 309, 1177, 380, 362, 604, 25534, 13, 50964], "temperature": + 0.0, "avg_logprob": -0.16377704899485518, "compression_ratio": 1.6559633027522935, + "no_speech_prob": 0.036446474492549896}, {"id": 68, "seek": 48272, "start": 494.72, + "end": 495.72, "text": " Yeah, yeah.", "tokens": [50964, 865, 11, 1338, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.16377704899485518, "compression_ratio": 1.6559633027522935, + "no_speech_prob": 0.036446474492549896}, {"id": 69, "seek": 48272, "start": 495.72, + "end": 508.72, "text": " So I think that the biggest challenge today is when you + are developing hardly can develop like a state of the art hardware, but I think + the major challenge is how do you integrate it with the 
community.", "tokens": [51014, + 407, 286, 519, 300, 264, 3880, 3430, 965, 307, 562, 291, 366, 6416, 13572, 393, + 1499, 411, 257, 1785, 295, 264, 1523, 8837, 11, 457, 286, 519, 264, 2563, 3430, + 307, 577, 360, 291, 13365, 309, 365, 264, 1768, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.16377704899485518, "compression_ratio": 1.6559633027522935, "no_speech_prob": + 0.036446474492549896}, {"id": 70, "seek": 50872, "start": 508.72, "end": 511.72, + "text": " And video I''ve done it.", "tokens": [50364, 400, 960, 286, 600, 1096, + 309, 13, 50514], "temperature": 0.0, "avg_logprob": -0.2465040169510187, "compression_ratio": + 1.3426573426573427, "no_speech_prob": 0.042637743055820465}, {"id": 71, "seek": + 50872, "start": 511.72, "end": 516.72, "text": " Very good with the CUDA and it + should be part of the ecosystem.", "tokens": [50514, 4372, 665, 365, 264, 29777, + 7509, 293, 309, 820, 312, 644, 295, 264, 11311, 13, 50764], "temperature": 0.0, + "avg_logprob": -0.2465040169510187, "compression_ratio": 1.3426573426573427, "no_speech_prob": + 0.042637743055820465}, {"id": 72, "seek": 50872, "start": 516.72, "end": 523.72, + "text": " So in terms of applications here, we have like another application for + image processing, it is based on.", "tokens": [50764, 407, 294, 2115, 295, 5821, + 510, 11, 321, 362, 411, 1071, 3861, 337, 3256, 9007, 11, 309, 307, 2361, 322, 13, + 51114], "temperature": 0.0, "avg_logprob": -0.2465040169510187, "compression_ratio": + 1.3426573426573427, "no_speech_prob": 0.042637743055820465}, {"id": 73, "seek": + 52372, "start": 523.72, "end": 537.72, "text": " So, say it''s set light images + and radar images and we can process it like faster in a few orders of magnitude + comparing to Jeep and video GPU.", "tokens": [50364, 407, 11, 584, 309, 311, 992, + 1442, 5267, 293, 16544, 5267, 293, 321, 393, 1399, 309, 411, 4663, 294, 257, 1326, + 9470, 295, 15668, 15763, 281, 31748, 293, 960, 18407, 13, 51064], "temperature": + 0.0, 
"avg_logprob": -0.4312035151890346, "compression_ratio": 1.2456140350877194, + "no_speech_prob": 0.2566491961479187}, {"id": 74, "seek": 53772, "start": 537.72, + "end": 547.72, "text": " So, we have in the past we had a few other applications + for a genome and the molecules and today we are.", "tokens": [50364, 407, 11, 321, + 362, 294, 264, 1791, 321, 632, 257, 1326, 661, 5821, 337, 257, 1049, 423, 293, 264, + 13093, 293, 965, 321, 366, 13, 50864], "temperature": 0.0, "avg_logprob": -0.2518039927763097, + "compression_ratio": 1.6698564593301435, "no_speech_prob": 0.6177324056625366}, + {"id": 75, "seek": 53772, "start": 547.72, "end": 558.72, "text": " We would like + to you know to focus on on the biggest challenges like I believe that you know searches + and you know we can elaborate about it later on.", "tokens": [50864, 492, 576, 411, + 281, 291, 458, 281, 1879, 322, 322, 264, 3880, 4759, 411, 286, 1697, 300, 291, 458, + 26701, 293, 291, 458, 321, 393, 20945, 466, 309, 1780, 322, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.2518039927763097, "compression_ratio": 1.6698564593301435, + "no_speech_prob": 0.6177324056625366}, {"id": 76, "seek": 53772, "start": 558.72, + "end": 566.72, "text": " Search is still broken and this is a huge market and so + our focus right now is on the search.", "tokens": [51414, 17180, 307, 920, 5463, + 293, 341, 307, 257, 2603, 2142, 293, 370, 527, 1879, 558, 586, 307, 322, 264, 3164, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.2518039927763097, "compression_ratio": + 1.6698564593301435, "no_speech_prob": 0.6177324056625366}, {"id": 77, "seek": 56672, + "start": 566.72, "end": 578.72, "text": " And we actually have to spend it to other + solutions as well like image processing and we already have a solution and a customer + for this solution.", "tokens": [50364, 400, 321, 767, 362, 281, 3496, 309, 281, + 661, 6547, 382, 731, 411, 3256, 9007, 293, 321, 1217, 362, 257, 3827, 293, 257, + 5474, 337, 341, 3827, 13, 
50964], "temperature": 0.0, "avg_logprob": -0.2831337792532785, + "compression_ratio": 1.5467625899280575, "no_speech_prob": 0.017435187473893166}, + {"id": 78, "seek": 56672, "start": 578.72, "end": 585.72, "text": " And then one + of our efforts is to build an ecosystem around this so.", "tokens": [50964, 400, + 550, 472, 295, 527, 6484, 307, 281, 1322, 364, 11311, 926, 341, 370, 13, 51314], + "temperature": 0.0, "avg_logprob": -0.2831337792532785, "compression_ratio": 1.5467625899280575, + "no_speech_prob": 0.017435187473893166}, {"id": 79, "seek": 58572, "start": 585.72, + "end": 597.72, "text": " Hopefully soon we will launch our Python compiler so developers + can write their code on Python and then you know run it.", "tokens": [50364, 10429, + 2321, 321, 486, 4025, 527, 15329, 31958, 370, 8849, 393, 2464, 641, 3089, 322, 15329, + 293, 550, 291, 458, 1190, 309, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2535075407761794, + "compression_ratio": 1.4210526315789473, "no_speech_prob": 0.20332691073417664}, + {"id": 80, "seek": 58572, "start": 597.72, "end": 605.72, "text": " Simulnglessly + on on our APU without you know trying to learn a new framework or a new language.", + "tokens": [50964, 3998, 425, 872, 12048, 322, 322, 527, 5372, 52, 1553, 291, 458, + 1382, 281, 1466, 257, 777, 8388, 420, 257, 777, 2856, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.2535075407761794, "compression_ratio": 1.4210526315789473, + "no_speech_prob": 0.20332691073417664}, {"id": 81, "seek": 60572, "start": 605.72, + "end": 622.72, "text": " So, this is another direction that we are working we are + trying I think one of the biggest challenges today is you know to simplifying the + technological stack for developers so they are working with the common frameworks + or languages they don''t want to learn it.", "tokens": [50364, 407, 11, 341, 307, + 1071, 3513, 300, 321, 366, 1364, 321, 366, 1382, 286, 519, 472, 295, 264, 3880, + 4759, 965, 307, 291, 458, 281, 6883, 5489, 
264, 18439, 8630, 337, 8849, 370, 436, + 366, 1364, 365, 264, 2689, 29834, 420, 8650, 436, 500, 380, 528, 281, 1466, 309, + 13, 51214], "temperature": 0.0, "avg_logprob": -0.15087122387356228, "compression_ratio": + 1.5232558139534884, "no_speech_prob": 0.1232721135020256}, {"id": 82, "seek": 62272, + "start": 622.72, "end": 626.72, "text": " So, they they they would like you know + to stay with the.", "tokens": [50364, 407, 11, 436, 436, 436, 576, 411, 291, 458, + 281, 1754, 365, 264, 13, 50564], "temperature": 0.0, "avg_logprob": -0.2130439905019907, + "compression_ratio": 1.6488095238095237, "no_speech_prob": 0.2192198485136032}, + {"id": 83, "seek": 62272, "start": 626.72, "end": 633.72, "text": " With the languages + that are familiar and you know the learning curve is not always.", "tokens": [50564, + 2022, 264, 8650, 300, 366, 4963, 293, 291, 458, 264, 2539, 7605, 307, 406, 1009, + 13, 50914], "temperature": 0.0, "avg_logprob": -0.2130439905019907, "compression_ratio": + 1.6488095238095237, "no_speech_prob": 0.2192198485136032}, {"id": 84, "seek": 62272, + "start": 633.72, "end": 643.72, "text": " They don''t always have time for you know + to learn a new framework so we are trying to simplify the integration with their + current stack.", "tokens": [50914, 814, 500, 380, 1009, 362, 565, 337, 291, 458, + 281, 1466, 257, 777, 8388, 370, 321, 366, 1382, 281, 20460, 264, 10980, 365, 641, + 2190, 8630, 13, 51414], "temperature": 0.0, "avg_logprob": -0.2130439905019907, + "compression_ratio": 1.6488095238095237, "no_speech_prob": 0.2192198485136032}, + {"id": 85, "seek": 64372, "start": 643.72, "end": 654.72, "text": " One of our solution + is is a plugin on top of elastic and open search and which they are offering a vector + search today but and we can talk about it but.", "tokens": [50364, 1485, 295, 527, + 3827, 307, 307, 257, 23407, 322, 1192, 295, 17115, 293, 1269, 3164, 293, 597, 436, + 366, 8745, 257, 8062, 3164, 965, 457, 293, 321, 393, 751, 466, 309, 
457, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.18903590221794284, "compression_ratio": 1.4803149606299213, + "no_speech_prob": 0.04933660104870796}, {"id": 86, "seek": 64372, "start": 654.72, + "end": 659.72, "text": " So we have a plugin on top of this.", "tokens": [50914, + 407, 321, 362, 257, 23407, 322, 1192, 295, 341, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.18903590221794284, "compression_ratio": 1.4803149606299213, "no_speech_prob": + 0.04933660104870796}, {"id": 87, "seek": 65972, "start": 659.72, "end": 670.72, + "text": " Search applications because some of some of the customers would like you + know to stay with their curing the last or open search so we built a plugin on top + of it and we are.", "tokens": [50364, 17180, 5821, 570, 512, 295, 512, 295, 264, + 4581, 576, 411, 291, 458, 281, 1754, 365, 641, 1262, 278, 264, 1036, 420, 1269, + 3164, 370, 321, 3094, 257, 23407, 322, 1192, 295, 309, 293, 321, 366, 13, 50914], + "temperature": 0.0, "avg_logprob": -0.18826780821147718, "compression_ratio": 1.5304878048780488, + "no_speech_prob": 0.050167303532361984}, {"id": 88, "seek": 65972, "start": 670.72, + "end": 677.72, "text": " We are talking with search engines and vector database + in order to implement.", "tokens": [50914, 492, 366, 1417, 365, 3164, 12982, 293, + 8062, 8149, 294, 1668, 281, 4445, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.18826780821147718, "compression_ratio": 1.5304878048780488, "no_speech_prob": + 0.050167303532361984}, {"id": 89, "seek": 67772, "start": 677.72, "end": 693.72, + "text": " Our solution with their solution and I think in terms of like the the + landscape so we are we are not perceived the vector search engines and vector database + as competitors.", "tokens": [50364, 2621, 3827, 365, 641, 3827, 293, 286, 519, 294, + 2115, 295, 411, 264, 264, 9661, 370, 321, 366, 321, 366, 406, 19049, 264, 8062, + 3164, 12982, 293, 8062, 8149, 382, 18333, 13, 51164], "temperature": 0.0, "avg_logprob": + 
-0.18520394961039224, "compression_ratio": 1.4333333333333333, "no_speech_prob": + 0.27227693796157837}, {"id": 90, "seek": 69372, "start": 693.72, "end": 706.72, + "text": " My perception is that they are potential they are partners and better + together and you know to give a greater value for their customers reducing the infrastructure + cost and give.", "tokens": [50364, 1222, 12860, 307, 300, 436, 366, 3995, 436, 366, + 4462, 293, 1101, 1214, 293, 291, 458, 281, 976, 257, 5044, 2158, 337, 641, 4581, + 12245, 264, 6896, 2063, 293, 976, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.1768174171447754, "compression_ratio": 1.424, "no_speech_prob": 0.0941043272614479}, + {"id": 91, "seek": 70672, "start": 706.72, "end": 720.72, "text": " So the lower + latency with the same without sacrificing accuracy so yeah we are you know trying + to be part of the ecosystem and you know to help them and.", "tokens": [50364, 407, + 264, 3126, 27043, 365, 264, 912, 1553, 42294, 14170, 370, 1338, 321, 366, 291, 458, + 1382, 281, 312, 644, 295, 264, 11311, 293, 291, 458, 281, 854, 552, 293, 13, 51064], + "temperature": 0.4, "avg_logprob": -0.2978116726053172, "compression_ratio": 1.6449704142011834, + "no_speech_prob": 0.11550034582614899}, {"id": 92, "seek": 70672, "start": 720.72, + "end": 728.72, "text": " To help customer scale there and improve their scale their + search applications yeah yeah this is interesting you touched on.", "tokens": [51064, + 1407, 854, 5474, 4373, 456, 293, 3470, 641, 4373, 641, 3164, 5821, 1338, 1338, 341, + 307, 1880, 291, 9828, 322, 13, 51464], "temperature": 0.4, "avg_logprob": -0.2978116726053172, + "compression_ratio": 1.6449704142011834, "no_speech_prob": 0.11550034582614899}, + {"id": 93, "seek": 72872, "start": 728.72, "end": 747.72, "text": " You know being + like a competitor to vector database I think it''s interesting topic in general + because on one hand if you take all vector database players they kind of look at + each other as 
competitors probably but at the same time as all of you players are + sharing the.", "tokens": [50364, 509, 458, 885, 411, 257, 27266, 281, 8062, 8149, + 286, 519, 309, 311, 1880, 4829, 294, 2674, 570, 322, 472, 1011, 498, 291, 747, 439, + 8062, 8149, 4150, 436, 733, 295, 574, 412, 1184, 661, 382, 18333, 1391, 457, 412, + 264, 912, 565, 382, 439, 295, 291, 4150, 366, 5414, 264, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.11730776514325823, "compression_ratio": 1.5823529411764705, + "no_speech_prob": 0.3203605115413666}, {"id": 94, "seek": 74772, "start": 747.72, + "end": 774.72, "text": " You know the approach the documentation the how you think + about yourself I think it also helps cumulatively to the whole market but I wasn''t + also wanted to drill in a bit into this elastic search and open search plugin so + essentially like what elastic search team has been doing recently and I think they + released now some updates in version 8.5 where you can you can do things like hybrid + search right but this is all based on.", "tokens": [50364, 509, 458, 264, 3109, + 264, 14333, 264, 577, 291, 519, 466, 1803, 286, 519, 309, 611, 3665, 12713, 425, + 19020, 281, 264, 1379, 2142, 457, 286, 2067, 380, 611, 1415, 281, 11392, 294, 257, + 857, 666, 341, 17115, 3164, 293, 1269, 3164, 23407, 370, 4476, 411, 437, 17115, + 3164, 1469, 575, 668, 884, 3938, 293, 286, 519, 436, 4736, 586, 512, 9205, 294, + 3037, 1649, 13, 20, 689, 291, 393, 291, 393, 360, 721, 411, 13051, 3164, 558, 457, + 341, 307, 439, 2361, 322, 13, 51714], "temperature": 0.0, "avg_logprob": -0.14674968933791258, + "compression_ratio": 1.705179282868526, "no_speech_prob": 0.42380788922309875}, + {"id": 95, "seek": 77472, "start": 774.72, "end": 776.72, "text": " On the.", "tokens": + [50364, 1282, 264, 13, 50464], "temperature": 0.0, "avg_logprob": -0.26192683093952684, + "compression_ratio": 1.3529411764705883, "no_speech_prob": 0.0295390747487545}, + {"id": 96, "seek": 77472, "start": 776.72, "end": 
784.72, "text": " A and an implementation + on top of Lucin so it''s all inside Java so it kind of runs in the same gvm right.", + "tokens": [50464, 316, 293, 364, 11420, 322, 1192, 295, 9593, 259, 370, 309, 311, + 439, 1854, 10745, 370, 309, 733, 295, 6676, 294, 264, 912, 290, 85, 76, 558, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.26192683093952684, "compression_ratio": + 1.3529411764705883, "no_speech_prob": 0.0295390747487545}, {"id": 97, "seek": 77472, + "start": 784.72, "end": 791.72, "text": " The approach that you you guys have implemented + it''s basically like a.", "tokens": [50864, 440, 3109, 300, 291, 291, 1074, 362, + 12270, 309, 311, 1936, 411, 257, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.26192683093952684, "compression_ratio": 1.3529411764705883, "no_speech_prob": + 0.0295390747487545}, {"id": 98, "seek": 79172, "start": 791.72, "end": 811.72, "text": + " A vector search backend right which kind of runs somewhere else let''s see if + we''re using the cloud offering but at the same time it feels like a sort of like + native to elastic search so I don''t need to do much right I just need to install + the plugin of course I need to have credentials.", "tokens": [50364, 316, 8062, + 3164, 38087, 558, 597, 733, 295, 6676, 4079, 1646, 718, 311, 536, 498, 321, 434, + 1228, 264, 4588, 8745, 457, 412, 264, 912, 565, 309, 3417, 411, 257, 1333, 295, + 411, 8470, 281, 17115, 3164, 370, 286, 500, 380, 643, 281, 360, 709, 558, 286, 445, + 643, 281, 3625, 264, 23407, 295, 1164, 286, 643, 281, 362, 27404, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.11885370107797476, "compression_ratio": 1.5683060109289617, + "no_speech_prob": 0.4349227547645569}, {"id": 99, "seek": 81172, "start": 811.72, + "end": 838.72, "text": " And what I wanted to say is that it feels like you expand + the capabilities of elastic search beyond what what it offers in a way that you + can actually remove the load of vector search away from it to another back and 
right + can you talk a bit more on the unit cost on on this kind of unit economy and and + and sort of advantages of the approach that that you have implemented.", "tokens": + [50364, 400, 437, 286, 1415, 281, 584, 307, 300, 309, 3417, 411, 291, 5268, 264, + 10862, 295, 17115, 3164, 4399, 437, 437, 309, 7736, 294, 257, 636, 300, 291, 393, + 767, 4159, 264, 3677, 295, 8062, 3164, 1314, 490, 309, 281, 1071, 646, 293, 558, + 393, 291, 751, 257, 857, 544, 322, 264, 4985, 2063, 322, 322, 341, 733, 295, 4985, + 5010, 293, 293, 293, 1333, 295, 14906, 295, 264, 3109, 300, 300, 291, 362, 12270, + 13, 51714], "temperature": 0.0, "avg_logprob": -0.11812107563018799, "compression_ratio": + 1.7077625570776256, "no_speech_prob": 0.05050776153802872}, {"id": 100, "seek": + 83872, "start": 838.72, "end": 861.72, "text": " Yeah, this is a great point actually + so but we are trying to decouple storage and compute so let''s say for instance + customer with elastic or open search and they they''re having like tens of clusters + and they would like you know to scale it and to optimize it so we are running on + top of of elastic and you can.", "tokens": [50364, 865, 11, 341, 307, 257, 869, + 935, 767, 370, 457, 321, 366, 1382, 281, 979, 263, 781, 6725, 293, 14722, 370, 718, + 311, 584, 337, 5197, 5474, 365, 17115, 420, 1269, 3164, 293, 436, 436, 434, 1419, + 411, 10688, 295, 23313, 293, 436, 576, 411, 291, 458, 281, 4373, 309, 293, 281, + 19719, 309, 370, 321, 366, 2614, 322, 1192, 295, 295, 17115, 293, 291, 393, 13, + 51514], "temperature": 0.0, "avg_logprob": -0.12943125442719797, "compression_ratio": + 1.5897435897435896, "no_speech_prob": 0.023487763479351997}, {"id": 101, "seek": + 86172, "start": 861.72, "end": 881.72, "text": " Our solution is kind of the compute + for elastic so they can run and scale and reduce the infrastructure cost because + you know all of these is is a is a question of how many machines do you run okay + so you can get like 99.9 recall it and or 
accuracy.", "tokens": [50364, 2621, 3827, + 307, 733, 295, 264, 14722, 337, 17115, 370, 436, 393, 1190, 293, 4373, 293, 5407, + 264, 6896, 2063, 570, 291, 458, 439, 295, 613, 307, 307, 257, 307, 257, 1168, 295, + 577, 867, 8379, 360, 291, 1190, 1392, 370, 291, 393, 483, 411, 11803, 13, 24, 9901, + 309, 293, 420, 14170, 13, 51364], "temperature": 0.0, "avg_logprob": -0.15338777673655543, + "compression_ratio": 1.5182926829268293, "no_speech_prob": 0.10750913619995117}, + {"id": 102, "seek": 88172, "start": 881.72, "end": 893.72, "text": " And you can + get like single digit millisecond latency but in terms of the infrastructure costs + so you know one of the biggest challenges today for enterprises is the.", "tokens": + [50364, 400, 291, 393, 483, 411, 2167, 14293, 27940, 18882, 27043, 457, 294, 2115, + 295, 264, 6896, 5497, 370, 291, 458, 472, 295, 264, 3880, 4759, 965, 337, 29034, + 307, 264, 13, 50964], "temperature": 0.0, "avg_logprob": -0.1948120525905064, "compression_ratio": + 1.360655737704918, "no_speech_prob": 0.06562619656324387}, {"id": 103, "seek": 89372, + "start": 893.72, "end": 922.72, "text": " The low margins due to heavy infrastructure + applications so if you are running GPU on the cloud or like heavy machines with + big machines with high memory it''s great in terms of the business because it''s + great in terms of the dev team in terms of we are getting great performance high + recall but again when you''re moving and you''re discussing on on the business side.", + "tokens": [50414, 440, 2295, 30317, 3462, 281, 4676, 6896, 5821, 370, 498, 291, + 366, 2614, 18407, 322, 264, 4588, 420, 411, 4676, 8379, 365, 955, 8379, 365, 1090, + 4675, 309, 311, 869, 294, 2115, 295, 264, 1606, 570, 309, 311, 869, 294, 2115, 295, + 264, 1905, 1469, 294, 2115, 295, 321, 366, 1242, 869, 3389, 1090, 9901, 457, 797, + 562, 291, 434, 2684, 293, 291, 434, 10850, 322, 322, 264, 1606, 1252, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.1383481927820154, 
"compression_ratio": 1.7345971563981042, + "no_speech_prob": 0.14363382756710052}, {"id": 104, "seek": 92372, "start": 923.72, + "end": 952.72, "text": " So in terms of the margins and the profit of the companies + and today it''s a big issue you know with companies that are having the challenge + of being profitable so we are trying and we add like a few benchmarks we are trying + to reduce the infrastructure cost so instead of 10 machines it can be two machines + and and our accelerator our.", "tokens": [50364, 407, 294, 2115, 295, 264, 30317, + 293, 264, 7475, 295, 264, 3431, 293, 965, 309, 311, 257, 955, 2734, 291, 458, 365, + 3431, 300, 366, 1419, 264, 3430, 295, 885, 21608, 370, 321, 366, 1382, 293, 321, + 909, 411, 257, 1326, 43751, 321, 366, 1382, 281, 5407, 264, 6896, 2063, 370, 2602, + 295, 1266, 8379, 309, 393, 312, 732, 8379, 293, 293, 527, 39889, 527, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.12335879560829936, "compression_ratio": 1.6834170854271358, + "no_speech_prob": 0.009290657006204128}, {"id": 105, "seek": 95372, "start": 953.72, + "end": 977.72, "text": " For APU and with that you can you know scale it and you + know one one other interesting thing is like many companies are talking about the + scale challenge about the one billion scale challenge so and you know data is exploding + right because you know today they are.", "tokens": [50364, 1171, 5372, 52, 293, + 365, 300, 291, 393, 291, 458, 4373, 309, 293, 291, 458, 472, 472, 661, 1880, 551, + 307, 411, 867, 3431, 366, 1417, 466, 264, 4373, 3430, 466, 264, 472, 5218, 4373, + 3430, 370, 293, 291, 458, 1412, 307, 35175, 558, 570, 291, 458, 965, 436, 366, 13, + 51564], "temperature": 0.0, "avg_logprob": -0.1617334019054066, "compression_ratio": + 1.6815286624203822, "no_speech_prob": 0.02717452310025692}, {"id": 106, "seek": + 97772, "start": 977.72, "end": 1006.72, "text": " 80 zetabytes and 10 years ago + it was like 16 so essentially like data is growing very fast and I assume that 
in + the next couple of years it will grow exponentially and 90% of this data and the + data that is created every year is unstructured so you know this is the cliche of + finding a needle in a haystack.", "tokens": [50364, 4688, 710, 302, 24538, 293, + 1266, 924, 2057, 309, 390, 411, 3165, 370, 4476, 411, 1412, 307, 4194, 588, 2370, + 293, 286, 6552, 300, 294, 264, 958, 1916, 295, 924, 309, 486, 1852, 37330, 293, + 4289, 4, 295, 341, 1412, 293, 264, 1412, 300, 307, 2942, 633, 1064, 307, 18799, + 46847, 370, 291, 458, 341, 307, 264, 46705, 295, 5006, 257, 11037, 294, 257, 4842, + 372, 501, 13, 51814], "temperature": 0.0, "avg_logprob": -0.131530331893706, "compression_ratio": + 1.5612244897959184, "no_speech_prob": 0.050234563648700714}, {"id": 107, "seek": + 100772, "start": 1007.72, "end": 1036.72, "text": " So in terms of and I assume + more and more companies will face the scale challenge like above one billion and + I know that this is a challenge for some of the search engine companies you know + scaling to hundreds of millions and billions and I had a conversation with one of + the biggest e-commerce in in Asia and he told me yeah our our challenges to scale + they have like two billion.", "tokens": [50364, 407, 294, 2115, 295, 293, 286, 6552, + 544, 293, 544, 3431, 486, 1851, 264, 4373, 3430, 411, 3673, 472, 5218, 293, 286, + 458, 300, 341, 307, 257, 3430, 337, 512, 295, 264, 3164, 2848, 3431, 291, 458, 21589, + 281, 6779, 295, 6803, 293, 17375, 293, 286, 632, 257, 3761, 365, 472, 295, 264, + 3880, 308, 12, 26926, 294, 294, 10038, 293, 415, 1907, 385, 1338, 527, 527, 4759, + 281, 4373, 436, 362, 411, 732, 5218, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.12570935487747192, "compression_ratio": 1.7720930232558139, "no_speech_prob": + 0.013040537014603615}, {"id": 108, "seek": 103672, "start": 1036.72, "end": 1065.72, + "text": " Index and again the infrastructure cost is a major issue I read a post + by Amazon''s CFO and like a week ago and their 
focus right now is reducing the infrastructure + cost for their customers and any solution that can reduce the infrastructure cost + for enterprises I think it''s a major issue for.", "tokens": [50364, 33552, 293, + 797, 264, 6896, 2063, 307, 257, 2563, 2734, 286, 1401, 257, 2183, 538, 6795, 311, + 383, 18067, 293, 411, 257, 1243, 2057, 293, 641, 1879, 558, 586, 307, 12245, 264, + 6896, 2063, 337, 641, 4581, 293, 604, 3827, 300, 393, 5407, 264, 6896, 2063, 337, + 29034, 286, 519, 309, 311, 257, 2563, 2734, 337, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.16459927793409004, "compression_ratio": 1.723529411764706, "no_speech_prob": + 0.02813696302473545}, {"id": 109, "seek": 106572, "start": 1065.72, "end": 1074.72, + "text": " Not only R&D teams but business and decision-makers in enterprises.", + "tokens": [50364, 1726, 787, 497, 5, 35, 5491, 457, 1606, 293, 3537, 12, 15870, + 294, 29034, 13, 50814], "temperature": 0.0, "avg_logprob": -0.144998599956562, "compression_ratio": + 1.6153846153846154, "no_speech_prob": 0.07004126906394958}, {"id": 110, "seek": + 106572, "start": 1074.72, "end": 1091.72, "text": " Yeah well I will I will pull + for that link so we can also include it in the in the show notes some of our listeners + by the way find it quite educational to have all this additional links and study + materials and I think we can also include that that''s super super cool.", "tokens": + [50814, 865, 731, 286, 486, 286, 486, 2235, 337, 300, 2113, 370, 321, 393, 611, + 4090, 309, 294, 264, 294, 264, 855, 5570, 512, 295, 527, 23274, 538, 264, 636, 915, + 309, 1596, 10189, 281, 362, 439, 341, 4497, 6123, 293, 2979, 5319, 293, 286, 519, + 321, 393, 611, 4090, 300, 300, 311, 1687, 1687, 1627, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.144998599956562, "compression_ratio": 1.6153846153846154, + "no_speech_prob": 0.07004126906394958}, {"id": 111, "seek": 109172, "start": 1091.72, + "end": 1115.72, "text": " So in a way like your challenges that you 
basically need + to as you said there are low margins right for this big players everyone tries to + stay profitable so in a way your challenges to not only fit into that narrow kind + of window but also be profitable yourself right so like you like provide that acceleration.", + "tokens": [50364, 407, 294, 257, 636, 411, 428, 4759, 300, 291, 1936, 643, 281, + 382, 291, 848, 456, 366, 2295, 30317, 558, 337, 341, 955, 4150, 1518, 9898, 281, + 1754, 21608, 370, 294, 257, 636, 428, 4759, 281, 406, 787, 3318, 666, 300, 9432, + 733, 295, 4910, 457, 611, 312, 21608, 1803, 558, 370, 411, 291, 411, 2893, 300, + 17162, 13, 51564], "temperature": 0.0, "avg_logprob": -0.15088989621117002, "compression_ratio": + 1.7049180327868851, "no_speech_prob": 0.5821470022201538}, {"id": 112, "seek": 111572, + "start": 1115.72, "end": 1125.72, "text": " What do you think where do you stand + today on that I do you think there is a lot still to do or do you think it''s already + something that companies can try.", "tokens": [50364, 708, 360, 291, 519, 689, 360, + 291, 1463, 965, 322, 300, 286, 360, 291, 519, 456, 307, 257, 688, 920, 281, 360, + 420, 360, 291, 519, 309, 311, 1217, 746, 300, 3431, 393, 853, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.17950748072730172, "compression_ratio": 1.681081081081081, + "no_speech_prob": 0.1673039197921753}, {"id": 113, "seek": 111572, "start": 1125.72, + "end": 1143.72, "text": " Yeah so today we have like the first generation of our + AI chip the APU the potential of improving our hardware and our bill of material + of our hardware and", "tokens": [50864, 865, 370, 965, 321, 362, 411, 264, 700, + 5125, 295, 527, 7318, 11409, 264, 5372, 52, 264, 3995, 295, 11470, 527, 8837, 293, + 527, 2961, 295, 2527, 295, 527, 8837, 293, 51764], "temperature": 0.0, "avg_logprob": + -0.17950748072730172, "compression_ratio": 1.681081081081081, "no_speech_prob": + 0.1673039197921753}, {"id": 114, "seek": 114372, "start": 1143.72, "end": 1157.72, + "text": " 
generally speaking next year we launch our second generation so for instance + if today we can you know in terms of performance we are talking about single double + digit millisecond latency with one APU.", "tokens": [50364, 5101, 4124, 958, 1064, + 321, 4025, 527, 1150, 5125, 370, 337, 5197, 498, 965, 321, 393, 291, 458, 294, 2115, + 295, 3389, 321, 366, 1417, 466, 2167, 3834, 14293, 27940, 18882, 27043, 365, 472, + 5372, 52, 13, 51064], "temperature": 0.0, "avg_logprob": -0.17996202032250094, "compression_ratio": + 1.6909871244635193, "no_speech_prob": 0.09651032835245132}, {"id": 115, "seek": + 114372, "start": 1157.72, "end": 1172.72, "text": " Next year we will launch our + second generation it will be more than 10x faster so I think we are just scratching + the tip of the iceberg so the I think that the hardware challenge is solved but.", + "tokens": [51064, 3087, 1064, 321, 486, 4025, 527, 1150, 5125, 309, 486, 312, 544, + 813, 1266, 87, 4663, 370, 286, 519, 321, 366, 445, 29699, 264, 4125, 295, 264, 38880, + 370, 264, 286, 519, 300, 264, 8837, 3430, 307, 13041, 457, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.17996202032250094, "compression_ratio": 1.6909871244635193, + "no_speech_prob": 0.09651032835245132}, {"id": 116, "seek": 117272, "start": 1172.72, + "end": 1189.72, "text": " You know every week we have like a new implementation + and improving our performance on the software layer so we have a few layers we have + the hardware layers so I spoke about it like the first generation and the second + generation.", "tokens": [50364, 509, 458, 633, 1243, 321, 362, 411, 257, 777, 11420, + 293, 11470, 527, 3389, 322, 264, 4722, 4583, 370, 321, 362, 257, 1326, 7914, 321, + 362, 264, 8837, 7914, 370, 286, 7179, 466, 309, 411, 264, 700, 5125, 293, 264, 1150, + 5125, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1168725439842711, "compression_ratio": + 1.6312056737588652, "no_speech_prob": 0.009129791520535946}, {"id": 117, "seek": + 118972, "start": 
1189.72, "end": 1198.72, "text": " I believe that there there''s + a huge potential in terms of optimizing our software layer because it is.", "tokens": + [50364, 286, 1697, 300, 456, 456, 311, 257, 2603, 3995, 294, 2115, 295, 40425, 527, + 4722, 4583, 570, 309, 307, 13, 50814], "temperature": 0.0, "avg_logprob": -0.08955100003410787, + "compression_ratio": 1.7117647058823529, "no_speech_prob": 0.07505377382040024}, + {"id": 118, "seek": 118972, "start": 1198.72, "end": 1202.72, "text": " We are trying + to reinvent search so.", "tokens": [50814, 492, 366, 1382, 281, 33477, 3164, 370, + 13, 51014], "temperature": 0.0, "avg_logprob": -0.08955100003410787, "compression_ratio": + 1.7117647058823529, "no_speech_prob": 0.07505377382040024}, {"id": 119, "seek": + 118972, "start": 1202.72, "end": 1214.72, "text": " I think there''s a huge potential + on on the hardware side but I think we are just we are just we didn''t even start + to optimize our software performance.", "tokens": [51014, 286, 519, 456, 311, 257, + 2603, 3995, 322, 322, 264, 8837, 1252, 457, 286, 519, 321, 366, 445, 321, 366, 445, + 321, 994, 380, 754, 722, 281, 19719, 527, 4722, 3389, 13, 51614], "temperature": + 0.0, "avg_logprob": -0.08955100003410787, "compression_ratio": 1.7117647058823529, + "no_speech_prob": 0.07505377382040024}, {"id": 120, "seek": 121472, "start": 1214.72, + "end": 1233.72, "text": " Recently we found a new implementation to improve the + latency by reduce the latency by 40% like it was two weeks ago so hopefully we will + launch it to production in the upcoming in the next up upcoming weeks.", "tokens": + [50364, 20072, 321, 1352, 257, 777, 11420, 281, 3470, 264, 27043, 538, 5407, 264, + 27043, 538, 3356, 4, 411, 309, 390, 732, 3259, 2057, 370, 4696, 321, 486, 4025, + 309, 281, 4265, 294, 264, 11500, 294, 264, 958, 493, 11500, 3259, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.10941989686754015, "compression_ratio": 1.5182481751824817, + "no_speech_prob": 
0.12799666821956635}, {"id": 121, "seek": 123372, "start": 1233.72, + "end": 1250.72, "text": " So and in terms of your question yeah I think we are just + at the beginning and I believe that we can optimize both on the on the hardware + and the software layer and hopefully it will be very profitable.", "tokens": [50364, + 407, 293, 294, 2115, 295, 428, 1168, 1338, 286, 519, 321, 366, 445, 412, 264, 2863, + 293, 286, 1697, 300, 321, 393, 19719, 1293, 322, 264, 322, 264, 8837, 293, 264, + 4722, 4583, 293, 4696, 309, 486, 312, 588, 21608, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.11150744756062826, "compression_ratio": 1.4225352112676057, "no_speech_prob": + 0.15878064930438995}, {"id": 122, "seek": 125072, "start": 1250.72, "end": 1277.72, + "text": " Sounds great I mean in general it was since I have a had exposure to it + as we implemented the image search demo it was quite interesting how you know how + easy it was to set it up right so it like and and you don''t need to worry about + that hardware thing yeah it acts a little bit like a black box but on the other + hand it''s very scalable so.", "tokens": [50364, 14576, 869, 286, 914, 294, 2674, + 309, 390, 1670, 286, 362, 257, 632, 10420, 281, 309, 382, 321, 12270, 264, 3256, + 3164, 10723, 309, 390, 1596, 1880, 577, 291, 458, 577, 1858, 309, 390, 281, 992, + 309, 493, 558, 370, 309, 411, 293, 293, 291, 500, 380, 643, 281, 3292, 466, 300, + 8837, 551, 1338, 309, 10672, 257, 707, 857, 411, 257, 2211, 2424, 457, 322, 264, + 661, 1011, 309, 311, 588, 38481, 370, 13, 51714], "temperature": 0.0, "avg_logprob": + -0.11460753332210492, "compression_ratio": 1.6009389671361502, "no_speech_prob": + 0.30056703090667725}, {"id": 123, "seek": 127772, "start": 1277.72, "end": 1305.72, + "text": " And you guys also have I will make sure to link this you also published + the is called neural hashing algorithm right which you which you use one of the + algorithms that you have implemented it would also be equal to 
drill in into that + direction but I mean in general it was fairly straightforward how you know how we + upload the data how it gets indexed and then how we can query.", "tokens": [50364, + 400, 291, 1074, 611, 362, 286, 486, 652, 988, 281, 2113, 341, 291, 611, 6572, 264, + 307, 1219, 18161, 575, 571, 9284, 558, 597, 291, 597, 291, 764, 472, 295, 264, 14642, + 300, 291, 362, 12270, 309, 576, 611, 312, 2681, 281, 11392, 294, 666, 300, 3513, + 457, 286, 914, 294, 2674, 309, 390, 6457, 15325, 577, 291, 458, 577, 321, 6580, + 264, 1412, 577, 309, 2170, 8186, 292, 293, 550, 577, 321, 393, 14581, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.11058735847473145, "compression_ratio": 1.6725663716814159, + "no_speech_prob": 0.02938235178589821}, {"id": 124, "seek": 130572, "start": 1305.72, + "end": 1326.72, "text": " Yeah I was just thinking to take it a little bit deeper + can you talk to some of the features you know many of the vector database players + they say why do you need vector databases because first of all if you took files + for example a similar framework you wouldn''t have the filter support right and + of course in.", "tokens": [50364, 865, 286, 390, 445, 1953, 281, 747, 309, 257, + 707, 857, 7731, 393, 291, 751, 281, 512, 295, 264, 4122, 291, 458, 867, 295, 264, + 8062, 8149, 4150, 436, 584, 983, 360, 291, 643, 8062, 22380, 570, 700, 295, 439, + 498, 291, 1890, 7098, 337, 1365, 257, 2531, 8388, 291, 2759, 380, 362, 264, 6608, + 1406, 558, 293, 295, 1164, 294, 13, 51414], "temperature": 0.0, "avg_logprob": -0.11486555590774074, + "compression_ratio": 1.555, "no_speech_prob": 0.004475673194974661}, {"id": 125, + "seek": 132672, "start": 1327.1200000000001, "end": 1343.72, "text": " In real application + like such app you do need filters alongside the whatever retriever you implement + right keyword or vector so can you talk a bit more about features and maybe also + touch on the algorithms that you guys have implemented.", "tokens": [50384, 682, + 957, 
3861, 411, 1270, 724, 291, 360, 643, 15995, 12385, 264, 2035, 19817, 331, 291, + 4445, 558, 20428, 420, 8062, 370, 393, 291, 751, 257, 857, 544, 466, 4122, 293, + 1310, 611, 2557, 322, 264, 14642, 300, 291, 1074, 362, 12270, 13, 51214], "temperature": + 0.0, "avg_logprob": -0.14714757432328893, "compression_ratio": 1.490566037735849, + "no_speech_prob": 0.00956221204251051}, {"id": 126, "seek": 134372, "start": 1344.72, + "end": 1354.72, "text": " Yeah yeah so there are various types of features and implementations + we are working with you know the common.", "tokens": [50414, 865, 1338, 370, 456, + 366, 3683, 3467, 295, 4122, 293, 4445, 763, 321, 366, 1364, 365, 291, 458, 264, + 2689, 13, 50914], "temperature": 0.0, "avg_logprob": -0.16258420944213867, "compression_ratio": + 1.46, "no_speech_prob": 0.029695376753807068}, {"id": 127, "seek": 134372, "start": + 1356.72, "end": 1365.72, "text": " Algorithms it can be either flat search for applications + like it can be a face recognition where you need to.", "tokens": [51014, 35014, + 6819, 2592, 309, 393, 312, 2139, 4962, 3164, 337, 5821, 411, 309, 393, 312, 257, + 1851, 11150, 689, 291, 643, 281, 13, 51464], "temperature": 0.0, "avg_logprob": + -0.16258420944213867, "compression_ratio": 1.46, "no_speech_prob": 0.029695376753807068}, + {"id": 128, "seek": 136572, "start": 1365.72, "end": 1386.72, "text": " Search any + every record we have implementations of the a nm i vf and new implementation of + hnsw on our apu and pre filter and other features one of one of our.", "tokens": + [50364, 17180, 604, 633, 2136, 321, 362, 4445, 763, 295, 264, 257, 297, 76, 741, + 371, 69, 293, 777, 11420, 295, 276, 3695, 86, 322, 527, 1882, 84, 293, 659, 6608, + 293, 661, 4122, 472, 295, 472, 295, 527, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.4684944152832031, "compression_ratio": 1.4587155963302751, "no_speech_prob": + 0.07207847386598587}, {"id": 129, "seek": 138672, "start": 1386.72, "end": 1401.72, + "text": " One of 
the areas that we would like to focus is as you mentioned is is + to simplify so we can you know you can work with it as a as a black box and install + the plugin and work with your.", "tokens": [50364, 1485, 295, 264, 3179, 300, 321, + 576, 411, 281, 1879, 307, 382, 291, 2835, 307, 307, 281, 20460, 370, 321, 393, 291, + 458, 291, 393, 589, 365, 309, 382, 257, 382, 257, 2211, 2424, 293, 3625, 264, 23407, + 293, 589, 365, 428, 13, 51114], "temperature": 0.0, "avg_logprob": -0.14856313137297936, + "compression_ratio": 1.4453125, "no_speech_prob": 0.022032681852579117}, {"id": + 130, "seek": 140172, "start": 1401.72, "end": 1413.72, "text": " With your technological + stack and with your search application either elastic open search or a vector search + engine or vector database.", "tokens": [50364, 2022, 428, 18439, 8630, 293, 365, + 428, 3164, 3861, 2139, 17115, 1269, 3164, 420, 257, 8062, 3164, 2848, 420, 8062, + 8149, 13, 50964], "temperature": 0.0, "avg_logprob": -0.2360800046187181, "compression_ratio": + 1.3917525773195876, "no_speech_prob": 0.028889549896121025}, {"id": 131, "seek": + 141372, "start": 1414.72, "end": 1437.72, "text": " And pre filter as you mentioned + is supported and I think that we should focus on simplifying this is our biggest + challenge simplifying the work with our platform and creating more integrations + and more connectors not not on the feature level but in terms of you know working + with the the ecosystem this is this is our main.", "tokens": [50414, 400, 659, 6608, + 382, 291, 2835, 307, 8104, 293, 286, 519, 300, 321, 820, 1879, 322, 6883, 5489, + 341, 307, 527, 3880, 3430, 6883, 5489, 264, 589, 365, 527, 3663, 293, 4084, 544, + 3572, 763, 293, 544, 31865, 406, 406, 322, 264, 4111, 1496, 457, 294, 2115, 295, + 291, 458, 1364, 365, 264, 264, 11311, 341, 307, 341, 307, 527, 2135, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.1380288673169685, "compression_ratio": 1.6649484536082475, + "no_speech_prob": 0.08103273063898087}, 
{"id": 132, "seek": 143772, "start": 1437.72, + "end": 1457.72, "text": " We are in focus right now and again improving the performance + because we are you know customer obsessed and we would like our customer to get + the lowest infrastructure cost and with without sacrificing latency and.", "tokens": + [50364, 492, 366, 294, 1879, 558, 586, 293, 797, 11470, 264, 3389, 570, 321, 366, + 291, 458, 5474, 16923, 293, 321, 576, 411, 527, 5474, 281, 483, 264, 12437, 6896, + 2063, 293, 365, 1553, 42294, 27043, 293, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.33304762840270996, "compression_ratio": 1.5063291139240507, "no_speech_prob": + 0.07368817180395126}, {"id": 133, "seek": 143772, "start": 1457.72, "end": 1460.72, + "text": " Sorry and the accuracy.", "tokens": [51364, 4919, 293, 264, 14170, 13, + 51514], "temperature": 0.0, "avg_logprob": -0.33304762840270996, "compression_ratio": + 1.5063291139240507, "no_speech_prob": 0.07368817180395126}, {"id": 134, "seek": + 146072, "start": 1461.72, "end": 1480.72, "text": " Yeah that makes sense and especially + like to do this at scale right I know that some of the players they say that it''s + very rare that there are clients with more than you dozens of millions of items + right but today you already mentioned that there are clients.", "tokens": [50414, + 865, 300, 1669, 2020, 293, 2318, 411, 281, 360, 341, 412, 4373, 558, 286, 458, 300, + 512, 295, 264, 4150, 436, 584, 300, 309, 311, 588, 5892, 300, 456, 366, 6982, 365, + 544, 813, 291, 18431, 295, 6803, 295, 4754, 558, 457, 965, 291, 1217, 2835, 300, + 456, 366, 6982, 13, 51364], "temperature": 0.0, "avg_logprob": -0.13310369144786488, + "compression_ratio": 1.625, "no_speech_prob": 0.09243606775999069}, {"id": 135, + "seek": 148072, "start": 1480.72, "end": 1505.72, "text": " Which have you know + more than a billion items maybe more than two billion items so do you think that + going forward you will see more of these second you know type of players with more 
+ data or do you think that there is still a use for dedicated hardware for this kind + of smaller scale players.", "tokens": [50364, 3013, 362, 291, 458, 544, 813, 257, + 5218, 4754, 1310, 544, 813, 732, 5218, 4754, 370, 360, 291, 519, 300, 516, 2128, + 291, 486, 536, 544, 295, 613, 1150, 291, 458, 2010, 295, 4150, 365, 544, 1412, 420, + 360, 291, 519, 300, 456, 307, 920, 257, 764, 337, 8374, 8837, 337, 341, 733, 295, + 4356, 4373, 4150, 13, 51614], "temperature": 0.0, "avg_logprob": -0.11741599728984217, + "compression_ratio": 1.7650602409638554, "no_speech_prob": 0.12625530362129211}, + {"id": 136, "seek": 150572, "start": 1506.72, "end": 1534.72, "text": " Yeah yeah + absolutely agree with you I think that in terms of the scale challenge we are we + are working with with customers and some of them here as you mentioned like tens + of billions but moving forward I think most of the enterprises and the big companies + will move forward and they will scale to one billion 10 billion and maybe even more.", + "tokens": [50414, 865, 1338, 3122, 3986, 365, 291, 286, 519, 300, 294, 2115, 295, + 264, 4373, 3430, 321, 366, 321, 366, 1364, 365, 365, 4581, 293, 512, 295, 552, 510, + 382, 291, 2835, 411, 10688, 295, 17375, 457, 2684, 2128, 286, 519, 881, 295, 264, + 29034, 293, 264, 955, 3431, 486, 1286, 2128, 293, 436, 486, 4373, 281, 472, 5218, + 1266, 5218, 293, 1310, 754, 544, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.12302655759065048, "compression_ratio": 1.7309644670050761, "no_speech_prob": + 0.10071995854377747}, {"id": 137, "seek": 153472, "start": 1534.72, "end": 1563.72, + "text": " In terms of like the ecosystem so my two cents is that companies are still using + the concept of keyword search for some applications TF-IDF and BM25 for some for + some application it''s a good solution and you know you don''t need a hammer for + a screwdriver.", "tokens": [50414, 682, 2115, 295, 411, 264, 11311, 370, 452, 732, + 14941, 307, 300, 3431, 366, 19757, 264,
3410, 295, 10186, 3164, 337, 512, 5821, + 40964, 7348, 37, 412, 272, 76, 3552, 337, 512, 337, 512, 3861, 309, 311, 257, 665, + 3827, 293, 291, 458, 291, 500, 380, 643, 364, 13017, 337, 257, 5630, 6787, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.27452639529579564, "compression_ratio": 1.4912280701754386, + "no_speech_prob": 0.04111647233366966}, {"id": 138, "seek": 156472, "start": 1564.72, + "end": 1584.72, "text": " So that''s the problem right so for some use cases keyword + search is a good fit and this is like you know part of the concept of hybrid search + and so I think we are still we''re we''re at the beginning of the the vector search + if I may call it the vector search revolution when you know where you can have the.", + "tokens": [50364, 407, 300, 311, 264, 1154, 558, 370, 337, 512, 764, 3331, 10186, + 3164, 307, 257, 665, 3318, 293, 341, 307, 411, 291, 458, 644, 295, 264, 3410, 295, + 11666, 4404, 3164, 293, 370, 286, 519, 321, 366, 920, 321, 434, 321, 434, 264, 2863, + 295, 264, 264, 8062, 3164, 498, 286, 815, 818, 309, 264, 8062, 3164, 8894, 562, + 291, 458, 689, 291, 393, 362, 264, 13, 51364], "temperature": 0.0, "avg_logprob": + -0.2443498692042391, "compression_ratio": 1.6451612903225807, "no_speech_prob": + 0.048471856862306595}, {"id": 139, "seek": 158472, "start": 1584.72, "end": 1613.72, + "text": " Back concept like any unstructured data we were usually we are talking + about text but there are broad areas that we could you know develop some cool stuff + for as I mentioned genome audio video search we have a notebook with you know website + notebook with video search and again the the there''s a broad spectrum of applications + that companies can develop some cool stuff cool stuff.", "tokens": [50364, 5833, + 3410, 411, 604, 18799, 46847, 1412, 321, 645, 2673, 321, 366, 1417, 466, 2487, 457, + 456, 366, 4152, 3179, 300, 321, 727, 291, 458, 1499, 512, 1627, 1507, 337, 382, + 286, 2835, 1049, 423, 6278, 960, 3164, 321, 362, 257, 21060,
365, 291, 458, 3144, + 21060, 365, 960, 3164, 293, 797, 264, 264, 456, 311, 257, 4152, 11143, 295, 5821, + 300, 3431, 393, 1499, 512, 1627, 1507, 1627, 1507, 13, 51814], "temperature": 0.0, + "avg_logprob": -0.2134912078445022, "compression_ratio": 1.7720930232558139, "no_speech_prob": + 0.3108038604259491}, {"id": 140, "seek": 161472, "start": 1614.72, "end": 1627.72, + "text": " And we are you know excited to see brilliant ideas and start up that are + developing applications on top of of these vector search applications.", "tokens": + [50364, 400, 321, 366, 291, 458, 2919, 281, 536, 10248, 3487, 293, 722, 493, 300, + 366, 6416, 5821, 322, 1192, 295, 295, 613, 8062, 3164, 5821, 13, 51014], "temperature": + 0.0, "avg_logprob": -0.18827401569911412, "compression_ratio": 1.5495049504950495, + "no_speech_prob": 0.04221174120903015}, {"id": 141, "seek": 161472, "start": 1627.72, + "end": 1640.72, "text": " Yeah you touch on that topic by the way which I also spoke + to to some extent on the haystack conference in Berlin where I gave a keynote also + make sure to give the link.", "tokens": [51014, 865, 291, 2557, 322, 300, 4829, + 538, 264, 636, 597, 286, 611, 7179, 281, 281, 512, 8396, 322, 264, 4842, 372, 501, + 7586, 294, 13848, 689, 286, 2729, 257, 33896, 611, 652, 988, 281, 976, 264, 2113, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.18827401569911412, "compression_ratio": + 1.5495049504950495, "no_speech_prob": 0.04221174120903015}, {"id": 142, "seek": + 164072, "start": 1640.72, "end": 1669.72, "text": " Back turnbull said that let''s + stop calling it vector search because and I don''t know how I really interested + to really interested to hear your thoughts on that because in principle you you + and I being product managers like if we think about some problem to solve right + let''s say we want to introduce I don''t know question answering component in our + search engines it''s not like we would probably if we didn''t know that we probably + wouldn''t 
say oh I know how to solve it it''s vector search.", "tokens": [50364, + 5833, 1261, 37290, 848, 300, 718, 311, 1590, 5141, 309, 8062, 3164, 570, 293, 286, + 500, 380, 458, 577, 286, 534, 3102, 281, 534, 3102, 281, 1568, 428, 4598, 322, 300, + 570, 294, 8665, 291, 291, 293, 286, 885, 1674, 14084, 411, 498, 321, 519, 466, 512, + 1154, 281, 5039, 558, 718, 311, 584, 321, 528, 281, 5366, 286, 500, 380, 458, 1168, + 13430, 6542, 294, 527, 3164, 12982, 309, 311, 406, 411, 321, 576, 1391, 498, 321, + 994, 380, 458, 300, 321, 1391, 2759, 380, 584, 1954, 286, 458, 577, 281, 5039, 309, + 309, 311, 8062, 3164, 13, 51814], "temperature": 0.0, "avg_logprob": -0.12146264603994425, + "compression_ratio": 1.8593155893536122, "no_speech_prob": 0.1279418170452118}, + {"id": 143, "seek": 166972, "start": 1669.72, "end": 1698.72, "text": " And so instead + he was saying you know let''s call it relevance application right or relevance oriented + application what''s your broad take on this you touched on this as well like people + are not yet aware of this revolution it''s probably already happening but people + don''t know what to do with it right and I just yesterday saw it with from one user + saying can you actually explain what what can I do with it.", "tokens": [50364, + 400, 370, 2602, 415, 390, 1566, 291, 458, 718, 311, 818, 309, 32684, 3861, 558, + 420, 32684, 21841, 3861, 437, 311, 428, 4152, 747, 322, 341, 291, 9828, 322, 341, + 382, 731, 411, 561, 366, 406, 1939, 3650, 295, 341, 8894, 309, 311, 1391, 1217, + 2737, 457, 561, 500, 380, 458, 437, 281, 360, 365, 309, 558, 293, 286, 445, 5186, + 1866, 309, 365, 490, 472, 4195, 1566, 393, 291, 767, 2903, 437, 437, 393, 286, 360, + 365, 309, 13, 51814], "temperature": 0.0, "avg_logprob": -0.11065818014599028, "compression_ratio": + 1.7319148936170212, "no_speech_prob": 0.05738554149866104}, {"id": 144, "seek": + 169872, "start": 1698.72, "end": 1708.72, "text": " So do you think that the world + is still let''s say the world of 
software development is still awakening to this + new field.", "tokens": [50364, 407, 360, 291, 519, 300, 264, 1002, 307, 920, 718, + 311, 584, 264, 1002, 295, 4722, 3250, 307, 920, 31550, 281, 341, 777, 2519, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.35846584359395134, "compression_ratio": + 1.8982300884955752, "no_speech_prob": 0.05404602363705635}, {"id": 145, "seek": + 169872, "start": 1708.72, "end": 1727.72, "text": " Yeah yeah absolutely I fully + agree with you essentially when I''m talking with developers and I''m saying we + are I''m talking again we are working on vector search they are like asking vector + what and I think that most of the developers and this is one of the things that + I''m going to do is I''m going to do it.", "tokens": [50864, 865, 1338, 3122, 286, + 4498, 3986, 365, 291, 4476, 562, 286, 478, 1417, 365, 8849, 293, 286, 478, 1566, + 321, 366, 286, 478, 1417, 797, 321, 366, 1364, 322, 8062, 3164, 436, 366, 411, 3365, + 8062, 437, 293, 286, 519, 300, 881, 295, 264, 8849, 293, 341, 307, 472, 295, 264, + 721, 300, 286, 478, 516, 281, 360, 307, 286, 478, 516, 281, 360, 309, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.35846584359395134, "compression_ratio": 1.8982300884955752, + "no_speech_prob": 0.05404602363705635}, {"id": 146, "seek": 172772, "start": 1727.72, + "end": 1756.72, "text": " And this is one of our challenges is to democratize AI + and machine learning so in terms of technology my perspective and is that technology + is an enabler if the best solution is vector search great it can you know outperform + on on various applications but the technology on a product perspective so you are + trying to create value I think that the first lesson of a product.", "tokens": [50364, + 400, 341, 307, 472, 295, 527, 4759, 307, 281, 37221, 1125, 7318, 293, 3479, 2539, + 370, 294, 2115, 295, 2899, 452, 4585, 293, 307, 300, 2899, 307, 364, 465, 455, 1918, + 498, 264, 1151, 3827, 307, 8062, 3164, 869, 309, 393, 291, 458, 484, 
26765, 322, + 322, 3683, 5821, 457, 264, 2899, 322, 257, 1674, 4585, 370, 291, 366, 1382, 281, + 1884, 2158, 286, 519, 300, 264, 700, 6898, 295, 257, 1674, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.15559538308676188, "compression_ratio": 1.6801801801801801, + "no_speech_prob": 0.004189042840152979}, {"id": 147, "seek": 175672, "start": 1756.72, + "end": 1772.72, "text": " The second lesson of a product is to create value for + your customer that''s it simple as that and what is the technology and what is under + the hood and what is inside the black box it really doesn''t matter.", "tokens": + [50364, 440, 1150, 6898, 295, 257, 1674, 307, 281, 1884, 2158, 337, 428, 5474, 300, + 311, 309, 2199, 382, 300, 293, 437, 307, 264, 2899, 293, 437, 307, 833, 264, 13376, + 293, 437, 307, 1854, 264, 2211, 2424, 309, 534, 1177, 380, 1871, 13, 51164], "temperature": + 0.2, "avg_logprob": -0.16074164370273022, "compression_ratio": 1.5073529411764706, + "no_speech_prob": 0.01169244758784771}, {"id": 148, "seek": 177272, "start": 1772.72, + "end": 1796.72, "text": " In terms of technology yeah there''s you know and we are + like it''s a crazy time for developers in terms of the AI machine learning revolution + stable diffusion generative generative AI and I''ve heard about that they are going + OpenAI planning to launch the new GPT-4.", "tokens": [50364, 682, 2115, 295, + 2899, 1338, 456, 311, 291, 458, 293, 321, 366, 411, 309, 311, 257, 3219, 565, 337, + 8849, 294, 2115, 295, 264, 7318, 3479, 2539, 8894, 8351, 25242, 1337, 1166, 1337, + 1166, 7318, 293, 286, 600, 2198, 466, 300, 436, 366, 516, 1269, 7318, 5038, 281, + 4025, 264, 777, 26039, 51, 337, 13, 51564], "temperature": 0.0, "avg_logprob": -0.15446628958491956, + "compression_ratio": 1.4833333333333334, "no_speech_prob": 0.08518822491168976}, + {"id": 149, "seek": 179672, "start": 1796.72, "end": 1825.72, "text": " And the + pace of innovation is is totally crazy so and it''s really hard to keep it to keep + it
simple to simplify it when when people are asking you you know there''s the grandmother + test for startup state in plain English explain your idea in a plain English super + challenging you know to simplify so when you know when developers or companies asking + what is vector search I''m using the.", "tokens": [50364, 400, 264, 11638, 295, + 8504, 307, 307, 3879, 3219, 370, 293, 309, 311, 534, 1152, 281, 1066, 309, 281, + 1066, 309, 2199, 281, 20460, 309, 562, 562, 561, 366, 3365, 291, 291, 458, 456, + 311, 264, 14317, 1500, 337, 18578, 1785, 294, 11121, 3669, 2903, 428, 1558, 294, + 257, 11121, 3669, 1687, 7595, 291, 458, 281, 20460, 370, 562, 291, 458, 562, 8849, + 420, 3431, 3365, 437, 307, 8062, 3164, 286, 478, 1228, 264, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.1611012801145896, "compression_ratio": 1.7579908675799087, + "no_speech_prob": 0.048631180077791214}, {"id": 150, "seek": 182672, "start": 1826.72, + "end": 1851.72, "text": " The example of you know transforming words in the case + of text to numbers it''s easy for us to compare numbers right we know that three + is is close or similar to four right but what is the connection between king and + queen okay so how do you represent it as a number.", "tokens": [50364, 440, 1365, + 295, 291, 458, 27210, 2283, 294, 264, 1389, 295, 2487, 281, 3547, 309, 311, 1858, + 337, 505, 281, 6794, 3547, 558, 321, 458, 300, 1045, 307, 307, 1998, 420, 2531, + 281, 1451, 558, 457, 437, 307, 264, 4984, 1296, 4867, 293, 12206, 1392, 370, 577, + 360, 291, 2906, 309, 382, 257, 1230, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.17193661705922272, "compression_ratio": 1.5647058823529412, "no_speech_prob": + 0.007253407966345549}, {"id": 151, "seek": 185172, "start": 1851.72, "end": 1880.72, + "text": " So if you if you are trying again again I''m trying to super simplified + if you are trying to build an equation what is the connection between king and queen + so you can say king plus men minus woman equals to 
queen so you are you''re trying + to represent it as numbers so this is the concept of vector you are representing + I''m going to say.", "tokens": [50364, 407, 498, 291, 498, 291, 366, 1382, 797, + 797, 286, 478, 1382, 281, 1687, 26335, 498, 291, 366, 1382, 281, 1322, 364, 5367, + 437, 307, 264, 4984, 1296, 4867, 293, 12206, 370, 291, 393, 584, 4867, 1804, 1706, + 3175, 3059, 6915, 281, 12206, 370, 291, 366, 291, 434, 1382, 281, 2906, 309, 382, + 3547, 370, 341, 307, 264, 3410, 295, 8062, 291, 366, 13460, 286, 478, 516, 281, + 584, 13, 51814], "temperature": 0.0, "avg_logprob": -0.26626050794446793, "compression_ratio": + 1.8064516129032258, "no_speech_prob": 0.0976535975933075}, {"id": 152, "seek": 188072, + "start": 1880.72, "end": 1909.72, "text": " So I''m trying to make sure that you + are understanding and unstructured data and it can be you know with image image + embedding etc and then I think like you know most of the tech companies today their + core technology is search okay let''s take if you are looking for a movie it''s + it''s Netflix if you are would like you know to hear something cool or your podcast + so you are running query on Spotify vector podcast and you will get the.", "tokens": + [50364, 407, 286, 478, 1382, 281, 652, 988, 300, 291, 366, 3701, 293, 18799, 46847, + 1412, 293, 309, 393, 312, 291, 458, 365, 3256, 3256, 12240, 3584, 5183, 293, 550, + 286, 519, 411, 291, 458, 881, 295, 264, 7553, 3431, 965, 641, 4965, 2899, 307, 3164, + 1392, 718, 311, 747, 498, 291, 366, 1237, 337, 257, 3169, 309, 311, 309, 311, 12778, + 498, 291, 366, 576, 411, 291, 458, 281, 1568, 746, 1627, 420, 428, 7367, 370, 291, + 366, 2614, 14581, 322, 29036, 8062, 7367, 293, 291, 486, 483, 264, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.3485026258103391, "compression_ratio": 1.7389558232931728, + "no_speech_prob": 0.22334259748458862}, {"id": 153, "seek": 190972, "start": 1909.72, + "end": 1938.72, "text": " Dmitry''s podcast you would like to buy a dress or 
and + you you are trying to do it very simple you don''t usually more more of the let''s + take for instance e-commerce right so most of the consumers don''t have the time + and the patient to run you know SQL queries you know filter these filter that they + would like to write it in a simple English or or in a different language okay yeah + so let''s take for an example and.", "tokens": [50364, 413, 3508, 627, 311, 7367, + 291, 576, 411, 281, 2256, 257, 5231, 420, 293, 291, 291, 366, 1382, 281, 360, 309, + 588, 2199, 291, 500, 380, 2673, 544, 544, 295, 264, 718, 311, 747, 337, 5197, 308, + 12, 26926, 558, 370, 881, 295, 264, 11883, 500, 380, 362, 264, 565, 293, 264, 4537, + 281, 1190, 291, 458, 19200, 24109, 291, 458, 6608, 613, 6608, 300, 436, 576, 411, + 281, 2464, 309, 294, 257, 2199, 3669, 420, 420, 294, 257, 819, 2856, 1392, 1338, + 370, 718, 311, 747, 337, 364, 1365, 293, 13, 51814], "temperature": 0.4, "avg_logprob": + -0.18317365144428455, "compression_ratio": 1.7107438016528926, "no_speech_prob": + 0.01184324361383915}, {"id": 154, "seek": 193972, "start": 1939.72, "end": 1968.72, + "text": " Girl in Asia she would like to purchase a red and white short sleeve dress + up until the the vector search revolution she didn''t had the option to do it so + usually she will get like a similar result it will not always be red and white with + short sleeve dress and what about the challenge of the language okay so if our English + is not so great and she would like to purchase something.", "tokens": [50364, 8502, + 294, 10038, 750, 576, 411, 281, 8110, 257, 2182, 293, 2418, 2099, 21138, 5231, 493, + 1826, 264, 264, 8062, 3164, 8894, 750, 994, 380, 632, 264, 3614, 281, 360, 309, + 370, 2673, 750, 486, 483, 411, 257, 2531, 1874, 309, 486, 406, 1009, 312, 2182, + 293, 2418, 365, 2099, 21138, 5231, 293, 437, 466, 264, 3430, 295, 264, 2856, 1392, + 370, 498, 527, 3669, 307, 406, 370, 869, 293, 750, 576, 411, 281, 8110, 746, 13, + 51814], "temperature": 0.0, 
"avg_logprob": -0.12148785591125488, "compression_ratio": + 1.7813953488372094, "no_speech_prob": 0.04090399667620659}, {"id": 155, "seek": + 196972, "start": 1969.72, "end": 1999.68, "text": " An Amazon eBay or any other + e-commerce so the challenge of language so essentially vector search is breaking + the barrier of the language and the the barrier of understanding what is your what + is your question or what is your query so I think that in terms of you know and + there''s a broad discussion about it democratizing AI what is the added value of.", + "tokens": [50364, 1107, 6795, 33803, 420, 604, 661, 308, 12, 26926, 370, 264, 3430, + 295, 2856, 370, 4476, 8062, 3164, 307, 7697, 264, 13357, 295, 264, 2856, 293, 264, + 264, 13357, 295, 3701, 437, 307, 428, 437, 307, 428, 1168, 420, 437, 307, 428, 14581, + 370, 286, 519, 300, 294, 2115, 295, 291, 458, 293, 456, 311, 257, 4152, 5017, 466, + 309, 37221, 3319, 7318, 437, 307, 264, 3869, 2158, 295, 13, 51862], "temperature": + 0.0, "avg_logprob": -0.1465297333181721, "compression_ratio": 1.7170731707317073, + "no_speech_prob": 0.037404220551252365}, {"id": 156, "seek": 199972, "start": 1999.72, + "end": 2027.04, "text": " AI so you know you have like autonomous cars and this + is great but you know breaking barriers the language barrier with the multilingual + model and and some other cool stuff this is this is I think something that is doing + really good for the ecosystem and for consumer and the people that you know they + have like a barrier of a language so.", "tokens": [50364, 7318, 370, 291, 458, 291, + 362, 411, 23797, 5163, 293, 341, 307, 869, 457, 291, 458, 7697, 13565, 264, 2856, + 13357, 365, 264, 2120, 38219, 2316, 293, 293, 512, 661, 1627, 1507, 341, 307, 341, + 307, 286, 519, 746, 300, 307, 884, 534, 665, 337, 264, 11311, 293, 337, 9711, 293, + 264, 561, 300, 291, 458, 436, 362, 411, 257, 13357, 295, 257, 2856, 370, 13, 51730], + "temperature": 0.0, "avg_logprob": -0.153001526423863, 
"compression_ratio": 1.774869109947644, + "no_speech_prob": 0.06908848136663437}, {"id": 157, "seek": 202704, "start": 2027.96, + "end": 2047.76, "text": " This is a great example what is the added value of vector + search yeah I agree I mean all of the examples that you brought up you know if you + if you look at how you would tackle I don''t know like red short sleeve dress with + the more traditional approach I guess you will need to build some kind of.", "tokens": + [50410, 639, 307, 257, 869, 1365, 437, 307, 264, 3869, 2158, 295, 8062, 3164, 1338, + 286, 3986, 286, 914, 439, 295, 264, 5110, 300, 291, 3038, 493, 291, 458, 498, 291, + 498, 291, 574, 412, 577, 291, 576, 14896, 286, 500, 380, 458, 411, 2182, 2099, 21138, + 5231, 365, 264, 544, 5164, 3109, 286, 2041, 291, 486, 643, 281, 1322, 512, 733, + 295, 13, 51400], "temperature": 0.0, "avg_logprob": -0.13978313332173362, "compression_ratio": + 1.5549738219895288, "no_speech_prob": 0.024568524211645126}, {"id": 158, "seek": + 204776, "start": 2047.76, "end": 2068.76, "text": " Query understanding system but + even then like even if after you''ve built it let''s say you you will run filters + on your data right but that that also means you do need to have the filters but + but if you don''t have them if you don''t have the values in those fields in your + documents right so what if you want you have like and this is by the way not.", + "tokens": [50364, 2326, 2109, 3701, 1185, 457, 754, 550, 411, 754, 498, 934, 291, + 600, 3094, 309, 718, 311, 584, 291, 291, 486, 1190, 15995, 322, 428, 1412, 558, + 457, 300, 300, 611, 1355, 291, 360, 643, 281, 362, 264, 15995, 457, 457, 498, 291, + 500, 380, 362, 552, 498, 291, 500, 380, 362, 264, 4190, 294, 729, 7909, 294, 428, + 8512, 558, 370, 437, 498, 291, 528, 291, 362, 411, 293, 341, 307, 538, 264, 636, + 406, 13, 51414], "temperature": 0.0, "avg_logprob": -0.21288911795910495, "compression_ratio": + 1.8031088082901554, "no_speech_prob": 0.04797352850437164}, {"id": 159, 
"seek": + 206876, "start": 2068.76, "end": 2094.92, "text": " It''s very unusual like I used + to see I used to oversee a project in e-commerce space where we would get data from + new providers all the time right so one of the issues was to map them back to our + ontology but at the same time they would they would miss a lot of like field values + right so what would you put there so they give you some description and then they + give you the image on a set of images.", "tokens": [50364, 467, 311, 588, 10901, + 411, 286, 1143, 281, 536, 286, 1143, 281, 46543, 257, 1716, 294, 308, 12, 26926, + 1901, 689, 321, 576, 483, 1412, 490, 777, 11330, 439, 264, 565, 558, 370, 472, 295, + 264, 2663, 390, 281, 4471, 552, 646, 281, 527, 6592, 1793, 457, 412, 264, 912, 565, + 436, 576, 436, 576, 1713, 257, 688, 295, 411, 2519, 4190, 558, 370, 437, 576, 291, + 829, 456, 370, 436, 976, 291, 512, 3855, 293, 550, 436, 976, 291, 264, 3256, 322, + 257, 992, 295, 5267, 13, 51672], "temperature": 0.0, "avg_logprob": -0.19501097305961276, + "compression_ratio": 1.7621145374449338, "no_speech_prob": 0.3938063681125641}, + {"id": 160, "seek": 209492, "start": 2095.32, "end": 2108.2400000000002, "text": + " So like with with conventional not conventional but like more traditional approach + of search right keyword search you''re kind of like stuck right what would you do + there and I guess of course people do solve it in some way but.", "tokens": [50384, + 407, 411, 365, 365, 16011, 406, 16011, 457, 411, 544, 5164, 3109, 295, 3164, 558, + 20428, 3164, 291, 434, 733, 295, 411, 5541, 558, 437, 576, 291, 360, 456, 293, 286, + 2041, 295, 1164, 561, 360, 5039, 309, 294, 512, 636, 457, 13, 51030], "temperature": + 0.0, "avg_logprob": -0.18719393412272137, "compression_ratio": 1.6473988439306357, + "no_speech_prob": 0.020178183913230896}, {"id": 161, "seek": 209492, "start": 2108.8, + "end": 2112.7200000000003, "text": " Instead you could just apply vector search + right and and.", "tokens": 
[51058, 7156, 291, 727, 445, 3079, 8062, 3164, 558, 293, + 293, 13, 51254], "temperature": 0.0, "avg_logprob": -0.18719393412272137, "compression_ratio": + 1.6473988439306357, "no_speech_prob": 0.020178183913230896}, {"id": 162, "seek": + 211272, "start": 2113.64, "end": 2127.56, "text": " Even though I say just there + is still some challenge for example with model fine tuning and things like that + can you talk a bit more to this maybe new challenges that this field opens of course + it gives us.", "tokens": [50410, 2754, 1673, 286, 584, 445, 456, 307, 920, 512, + 3430, 337, 1365, 365, 2316, 2489, 15164, 293, 721, 411, 300, 393, 291, 751, 257, + 857, 544, 281, 341, 1310, 777, 4759, 300, 341, 2519, 9870, 295, 1164, 309, 2709, + 505, 13, 51106], "temperature": 0.0, "avg_logprob": -0.11314863628811306, "compression_ratio": + 1.4405594405594406, "no_speech_prob": 0.0452168732881546}, {"id": 163, "seek": 212756, + "start": 2128.16, "end": 2145.2, "text": " opportunities it gives us advantages + it solves some you know painstaking issues that we had before but what do we need + to focus on going forward then once we deploy such systems beyond only hardware + part but also like on this algorithm side.", "tokens": [50394, 4786, 309, 2709, + 505, 14906, 309, 39890, 512, 291, 458, 1822, 372, 2456, 2663, 300, 321, 632, 949, + 457, 437, 360, 321, 643, 281, 1879, 322, 516, 2128, 550, 1564, 321, 7274, 1270, + 3652, 4399, 787, 8837, 644, 457, 611, 411, 322, 341, 9284, 1252, 13, 51246], "temperature": + 0.0, "avg_logprob": -0.3053676986694336, "compression_ratio": 1.4968944099378882, + "no_speech_prob": 0.05416077747941017}, {"id": 164, "seek": 214520, "start": 2146.16, + "end": 2171.2799999999997, "text": " Yeah you know this is this is a great question + because it resonates with one of your blog post recent blog post where you published + the Google''s research about e-commerce companies in the US losing 300 billion dollars + due to search abandonment in the US only.", 
"tokens": [50412, 865, 291, 458, 341, + 307, 341, 307, 257, 869, 1168, 570, 309, 41051, 365, 472, 295, 428, 6968, 2183, + 5162, 6968, 2183, 689, 291, 6572, 264, 3329, 311, 2132, 466, 308, 12, 26926, 3431, + 294, 264, 2546, 7027, 6641, 5218, 3808, 3462, 281, 3164, 9072, 518, 294, 264, 2546, + 787, 13, 51668], "temperature": 0.0, "avg_logprob": -0.2387984882701527, "compression_ratio": + 1.48, "no_speech_prob": 0.08114374428987503}, {"id": 165, "seek": 217128, "start": + 2172.0400000000004, "end": 2196.84, "text": " And again this is crazy number because + if you have like I would like to buy a green polo shirt and I really want to buy + a green polo shirt and the e-commerce got this green polo shirt inside in the in + the warehouse or in the inventory and they can find the match we can find the match + for this for this challenge.", "tokens": [50402, 400, 797, 341, 307, 3219, 1230, + 570, 498, 291, 362, 411, 286, 576, 411, 281, 2256, 257, 3092, 1180, 78, 8336, 293, + 286, 534, 528, 281, 2256, 257, 3092, 1180, 78, 8336, 293, 264, 308, 12, 26926, 658, + 341, 3092, 1180, 78, 8336, 1854, 294, 264, 294, 264, 22244, 420, 294, 264, 14228, + 293, 436, 393, 915, 264, 2995, 321, 393, 915, 264, 2995, 337, 341, 337, 341, 3430, + 13, 51642], "temperature": 0.0, "avg_logprob": -0.22432667500263936, "compression_ratio": + 1.8520710059171597, "no_speech_prob": 0.04939769208431244}, {"id": 166, "seek": + 219684, "start": 2197.84, "end": 2219.56, "text": " This is this is the you challenge + but in terms of of and again this is just one one example but you know our mission + is to back to break this barrier for for developers it''s not only e-commerce so + expanding it to searching blocks okay if you would like to find.", "tokens": [50414, + 639, 307, 341, 307, 264, 291, 3430, 457, 294, 2115, 295, 295, 293, 797, 341, 307, + 445, 472, 472, 1365, 457, 291, 458, 527, 4447, 307, 281, 646, 281, 1821, 341, 13357, + 337, 337, 8849, 309, 311, 406, 787, 308, 12, 26926, 370, 14702, 309, 281, 
10808, + 8474, 1392, 498, 291, 576, 411, 281, 915, 13, 51500], "temperature": 0.0, "avg_logprob": + -0.2289884090423584, "compression_ratio": 1.5535714285714286, "no_speech_prob": + 0.0221580658107996}, {"id": 167, "seek": 221956, "start": 2220.52, "end": 2245.72, + "text": " And anomaly or you would like to understand what is the root cause when + you''re and you have like a software system logs and you would like to to understand + and to find some anomalies or even fintech e-commerce and other areas I think that + there''s some cool stuff over there so one one way.", "tokens": [50412, 400, 42737, + 420, 291, 576, 411, 281, 1223, 437, 307, 264, 5593, 3082, 562, 291, 434, 293, 291, + 362, 411, 257, 4722, 1185, 20820, 293, 291, 576, 411, 281, 281, 1223, 293, 281, + 915, 512, 24769, 48872, 420, 754, 283, 686, 5023, 308, 12, 26926, 293, 661, 3179, + 286, 519, 300, 456, 311, 512, 1627, 1507, 670, 456, 370, 472, 472, 636, 13, 51672], + "temperature": 0.0, "avg_logprob": -0.17895628801032679, "compression_ratio": 1.670520231213873, + "no_speech_prob": 0.06978459656238556}, {"id": 168, "seek": 224572, "start": 2246.2799999999997, + "end": 2270.68, "text": " And you know to move forward is if you would like to use + let''s take for instance Siri I would like to buy with your audio right I would + like to buy a red and white short sleeve dress below 100 dollar.", "tokens": [50392, + 400, 291, 458, 281, 1286, 2128, 307, 498, 291, 576, 411, 281, 764, 718, 311, 747, + 337, 5197, 33682, 286, 576, 411, 281, 2256, 365, 428, 6278, 558, 286, 576, 411, + 281, 2256, 257, 2182, 293, 2418, 2099, 21138, 5231, 2507, 2319, 7241, 13, 51612], + "temperature": 0.0, "avg_logprob": -0.23003037770589194, "compression_ratio": 1.5037593984962405, + "no_speech_prob": 0.06288184970617294}, {"id": 169, "seek": 227068, "start": 2270.68, + "end": 2298.72, "text": " Okay so you can this is a simple thing for you know consumers + but you know technology wise this is the you challenge so the first 
challenge is + to convert the audio to text and today there''s you know you can convert it directly + to vectors and then you can run this query but again you need to filter because + if you want.", "tokens": [50412, 1033, 370, 291, 393, 341, 307, 257, 2199, 551, + 337, 291, 458, 11883, 457, 291, 458, 2899, 10829, 341, 307, 264, 291, 3430, 370, + 264, 700, 3430, 307, 281, 7620, 264, 6278, 281, 2487, 293, 965, 456, 311, 291, 458, + 291, 393, 7620, 309, 3838, 281, 18875, 293, 550, 291, 393, 1190, 341, 14581, 457, + 797, 291, 643, 281, 6608, 570, 498, 291, 528, 13, 51766], "temperature": 0.0, "avg_logprob": + -0.1768792797537411, "compression_ratio": 1.7722222222222221, "no_speech_prob": + 0.03726491704583168}, {"id": 170, "seek": 230068, "start": 2300.68, "end": 2330.6, + "text": " Something that is below 100 so usually it''s the price field so I think + this is the biggest challenge that the consumers or people can communicate in a + natural way with the computer with audio and say it very simple without you know + trying to to run a complicated SQL queries etc so I think this is the like the the + holy grail of of the audio.", "tokens": [50364, 6595, 300, 307, 2507, 2319, 370, + 2673, 309, 311, 264, 3218, 2519, 370, 286, 519, 341, 307, 264, 3880, 3430, 300, + 264, 11883, 420, 561, 393, 7890, 294, 257, 3303, 636, 365, 264, 3820, 365, 6278, + 293, 584, 309, 588, 2199, 1553, 291, 458, 1382, 281, 281, 1190, 257, 6179, 19200, + 24109, 5183, 370, 286, 519, 341, 307, 264, 411, 264, 264, 10622, 1295, 388, 295, + 295, 264, 6278, 13, 51860], "temperature": 0.0, "avg_logprob": -0.32119361668416896, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.019489077851176262}, + {"id": 171, "seek": 233068, "start": 2330.68, "end": 2356.44, "text": " Machine + learning to process this query and give you like the example and when you are purchase + when you would like to purchase it on a certain website it will give you the place + order page and you will get all of the 
details you will see the type of the dress + and it will give you the right result and it will be below 100.", "tokens": [50364, + 22155, 2539, 281, 1399, 341, 14581, 293, 976, 291, 411, 264, 1365, 293, 562, 291, + 366, 8110, 562, 291, 576, 411, 281, 8110, 309, 322, 257, 1629, 3144, 309, 486, 976, + 291, 264, 1081, 1668, 3028, 293, 291, 486, 483, 439, 295, 264, 4365, 291, 486, 536, + 264, 2010, 295, 264, 5231, 293, 309, 486, 976, 291, 264, 558, 1874, 293, 309, 486, + 312, 2507, 2319, 13, 51652], "temperature": 0.0, "avg_logprob": -0.18584509298835003, + "compression_ratio": 1.8352272727272727, "no_speech_prob": 0.011681396514177322}, + {"id": 172, "seek": 235644, "start": 2356.44, "end": 2369.32, "text": " The 100 + dollar and I think this is the way or this is a direction that we we can move forward + with this technology.", "tokens": [50364, 440, 2319, 7241, 293, 286, 519, 341, 307, + 264, 636, 420, 341, 307, 257, 3513, 300, 321, 321, 393, 1286, 2128, 365, 341, 2899, + 13, 51008], "temperature": 0.0, "avg_logprob": -0.2316394962676584, "compression_ratio": + 1.603864734299517, "no_speech_prob": 0.18350249528884888}, {"id": 173, "seek": 235644, + "start": 2370.12, "end": 2386.36, "text": " Yeah yeah that sounds great so in principle + like so so that our listeners that present a new one will understand is that vector + search really opens doors to new types of data right new modalities as they say + so like.", "tokens": [51048, 865, 1338, 300, 3263, 869, 370, 294, 8665, 411, 370, + 370, 300, 527, 23274, 300, 1974, 257, 777, 472, 486, 1223, 307, 300, 8062, 3164, + 534, 9870, 8077, 281, 777, 3467, 295, 1412, 558, 777, 1072, 16110, 382, 436, 584, + 370, 411, 13, 51860], "temperature": 0.0, "avg_logprob": -0.2316394962676584, "compression_ratio": + 1.603864734299517, "no_speech_prob": 0.18350249528884888}, {"id": 174, "seek": 238644, + "start": 2386.44, "end": 2416.2000000000003, "text": " Previously it was maybe only + text modality even if you saw pictures on the on 
the monitor or on your phone as + you know as response to your query it doesn''t necessarily mean that that query + really was kind of grasping the best parts of that image like it would actually + understand what what is in the image but with vector search you can also implement + that and for example using clip model or some other model where you can.", "tokens": + [50364, 33606, 309, 390, 1310, 787, 2487, 1072, 1860, 754, 498, 291, 1866, 5242, + 322, 264, 322, 264, 6002, 420, 322, 428, 2593, 382, 291, 458, 382, 4134, 281, 428, + 14581, 309, 1177, 380, 4725, 914, 300, 300, 14581, 534, 390, 733, 295, 29444, 3381, + 264, 1151, 3166, 295, 300, 3256, 411, 309, 576, 767, 1223, 437, 437, 307, 294, 264, + 3256, 457, 365, 8062, 3164, 291, 393, 611, 4445, 300, 293, 337, 1365, 1228, 7353, + 2316, 420, 512, 661, 2316, 689, 291, 393, 13, 51852], "temperature": 0.0, "avg_logprob": + -0.1135278679858679, "compression_ratio": 1.7306122448979593, "no_speech_prob": + 0.022312765941023827}, {"id": 175, "seek": 241644, "start": 2417.4, "end": 2418.04, + "text": " Really.", "tokens": [50412, 4083, 13, 50444], "temperature": 0.0, "avg_logprob": + -0.2591490396639196, "compression_ratio": 1.6682692307692308, "no_speech_prob": + 0.008501578122377396}, {"id": 176, "seek": 241644, "start": 2419.32, "end": 2436.6, + "text": " Infer meaning from that picture right and what you are saying is that + in the future and maybe this is to some extent happening already is that we can + also cross modalities between voice and text right so like what I''m saying it can + it can represent as a vector and then.", "tokens": [50508, 682, 612, 3620, 490, + 300, 3036, 558, 293, 437, 291, 366, 1566, 307, 300, 294, 264, 2027, 293, 1310, 341, + 307, 281, 512, 8396, 2737, 1217, 307, 300, 321, 393, 611, 3278, 1072, 16110, 1296, + 3177, 293, 2487, 558, 370, 411, 437, 286, 478, 1566, 309, 393, 309, 393, 2906, 382, + 257, 8062, 293, 550, 13, 51372], "temperature": 0.0, "avg_logprob": -0.2591490396639196, + 
"compression_ratio": 1.6682692307692308, "no_speech_prob": 0.008501578122377396}, + {"id": 177, "seek": 241644, "start": 2437.48, "end": 2442.84, "text": " Find an + image or find a video right it''s like a lot of applications.", "tokens": [51416, + 11809, 364, 3256, 420, 915, 257, 960, 558, 309, 311, 411, 257, 688, 295, 5821, 13, + 51684], "temperature": 0.0, "avg_logprob": -0.2591490396639196, "compression_ratio": + 1.6682692307692308, "no_speech_prob": 0.008501578122377396}, {"id": 178, "seek": + 244284, "start": 2442.84, "end": 2471.88, "text": " Yeah yeah yeah totally yeah + exactly and you know if you are working with your Instagram and you found like a + nice celeb that is wearing a nice dress and you would like to buy something that + is similar so with image search you can find like a similar and find me the find + me this dress or the most relevant dress or the most the closest dress the.", "tokens": + [50372, 865, 1338, 1338, 3879, 1338, 2293, 293, 291, 458, 498, 291, 366, 1364, 365, + 428, 5281, 293, 291, 1352, 411, 257, 1481, 1769, 28512, 300, 307, 4769, 257, 1481, + 5231, 293, 291, 576, 411, 281, 2256, 746, 300, 307, 2531, 370, 365, 3256, 3164, + 291, 393, 915, 411, 257, 2531, 293, 915, 385, 264, 915, 385, 341, 5231, 420, 264, + 881, 7340, 5231, 420, 264, 881, 264, 13699, 5231, 264, 13, 51816], "temperature": + 0.0, "avg_logprob": -0.2532351983560098, "compression_ratio": 1.8804347826086956, + "no_speech_prob": 0.0316493846476078}, {"id": 179, "seek": 247284, "start": 2472.84, + "end": 2487.1600000000003, "text": " Closes example of this dress and yeah yeah + there are various options you know this is just one example you know of how to monetize + Instagram or a tick talk where you know consumers can.", "tokens": [50364, 2033, + 4201, 1365, 295, 341, 5231, 293, 1338, 1338, 456, 366, 3683, 3956, 291, 458, 341, + 307, 445, 472, 1365, 291, 458, 295, 577, 281, 15556, 1125, 5281, 420, 257, 5204, + 751, 689, 291, 458, 11883, 393, 13, 51080], "temperature": 
0.0, "avg_logprob": -0.4102802276611328, + "compression_ratio": 1.4785714285714286, "no_speech_prob": 0.010825091041624546}, + {"id": 180, "seek": 247284, "start": 2488.2000000000003, "end": 2490.52, "text": + " Watch their favorite.", "tokens": [51132, 7277, 641, 2954, 13, 51248], "temperature": + 0.0, "avg_logprob": -0.4102802276611328, "compression_ratio": 1.4785714285714286, + "no_speech_prob": 0.010825091041624546}, {"id": 181, "seek": 249052, "start": 2490.52, + "end": 2491.56, "text": " them.", "tokens": [50364, 552, 13, 50416], "temperature": + 0.0, "avg_logprob": -0.3449482295824134, "compression_ratio": 1.7577092511013215, + "no_speech_prob": 0.07154249399900436}, {"id": 182, "seek": 249052, "start": 2492.2, + "end": 2508.36, "text": " The celeb that they are following and if they were seeing + something this is great so I want to purchase it and in terms of monetization and + in terms of the added value of the customer take take this you take this platform + says that.", "tokens": [50448, 440, 1769, 28512, 300, 436, 366, 3480, 293, 498, + 436, 645, 2577, 746, 341, 307, 869, 370, 286, 528, 281, 8110, 309, 293, 294, 2115, + 295, 15556, 2144, 293, 294, 2115, 295, 264, 3869, 2158, 295, 264, 5474, 747, 747, + 341, 291, 747, 341, 3663, 1619, 300, 13, 51256], "temperature": 0.0, "avg_logprob": + -0.3449482295824134, "compression_ratio": 1.7577092511013215, "no_speech_prob": + 0.07154249399900436}, {"id": 183, "seek": 249052, "start": 2510.2, "end": 2520.44, + "text": " Any commerce platform okay this is like a fresh concept but this is ways + this is a way for companies to monetize the platform it''s not a social media it + can be.", "tokens": [51348, 2639, 26320, 3663, 1392, 341, 307, 411, 257, 4451, 3410, + 457, 341, 307, 2098, 341, 307, 257, 636, 337, 3431, 281, 15556, 1125, 264, 3663, + 309, 311, 406, 257, 2093, 3021, 309, 393, 312, 13, 51860], "temperature": 0.0, "avg_logprob": + -0.3449482295824134, "compression_ratio": 1.7577092511013215, 
"no_speech_prob": + 0.07154249399900436}, {"id": 184, "seek": 252052, "start": 2520.52, "end": 2528.6, + "text": " e-commerce and it can be super simple because you know up until now they + they''ve seen like a nice dress or a nice.", "tokens": [50364, 308, 12, 26926, 293, + 309, 393, 312, 1687, 2199, 570, 291, 458, 493, 1826, 586, 436, 436, 600, 1612, 411, + 257, 1481, 5231, 420, 257, 1481, 13, 50768], "temperature": 0.0, "avg_logprob": + -0.2727415385999178, "compression_ratio": 1.7136752136752136, "no_speech_prob": + 0.006026688031852245}, {"id": 185, "seek": 252052, "start": 2529.88, "end": 2538.04, + "text": " A shirt but they cannot do with it they cannot purchase it they don''t + know how to explain to the machine or the computer.", "tokens": [50832, 316, 8336, + 457, 436, 2644, 360, 365, 309, 436, 2644, 8110, 309, 436, 500, 380, 458, 577, 281, + 2903, 281, 264, 3479, 420, 264, 3820, 13, 51240], "temperature": 0.0, "avg_logprob": + -0.2727415385999178, "compression_ratio": 1.7136752136752136, "no_speech_prob": + 0.006026688031852245}, {"id": 186, "seek": 252052, "start": 2539.8, "end": 2550.44, + "text": " What what is the type of the of the clothing that they would like to buy + so yeah that there are various options and yeah i''m eager to see what are the applications.", + "tokens": [51328, 708, 437, 307, 264, 2010, 295, 264, 295, 264, 11502, 300, 436, + 576, 411, 281, 2256, 370, 1338, 300, 456, 366, 3683, 3956, 293, 1338, 741, 478, + 18259, 281, 536, 437, 366, 264, 5821, 13, 51860], "temperature": 0.0, "avg_logprob": + -0.2727415385999178, "compression_ratio": 1.7136752136752136, "no_speech_prob": + 0.006026688031852245}, {"id": 187, "seek": 255052, "start": 2550.52, "end": 2556.6, + "text": " That you know developers and the entrepreneurs will develop with this + technology.", "tokens": [50364, 663, 291, 458, 8849, 293, 264, 12639, 486, 1499, + 365, 341, 2899, 13, 50668], "temperature": 0.0, "avg_logprob": -0.2868285993250405, + "compression_ratio": 
1.5916666666666666, "no_speech_prob": 0.006189994513988495}, + {"id": 188, "seek": 255052, "start": 2557.48, "end": 2567.88, "text": " Yeah that + sounds great one of the apps that you just kind of reminded me of is I think it + was James Briggs who built the kind of simple demo.", "tokens": [50712, 865, 300, + 3263, 869, 472, 295, 264, 7733, 300, 291, 445, 733, 295, 15920, 385, 295, 307, 286, + 519, 309, 390, 5678, 1603, 32555, 567, 3094, 264, 733, 295, 2199, 10723, 13, 51232], + "temperature": 0.0, "avg_logprob": -0.2868285993250405, "compression_ratio": 1.5916666666666666, + "no_speech_prob": 0.006189994513988495}, {"id": 189, "seek": 255052, "start": 2568.68, + "end": 2580.44, "text": " Using the recent model called whisper from open AI so + you can actually you know like on YouTube today how you find things is basically + mostly based on titles.", "tokens": [51272, 11142, 264, 5162, 2316, 1219, 26018, + 490, 1269, 7318, 370, 291, 393, 767, 291, 458, 411, 322, 3088, 965, 577, 291, 915, + 721, 307, 1936, 5240, 2361, 322, 12992, 13, 51860], "temperature": 0.0, "avg_logprob": + -0.2868285993250405, "compression_ratio": 1.5916666666666666, "no_speech_prob": + 0.006189994513988495}, {"id": 190, "seek": 258052, "start": 2580.52, "end": 2582.68, + "text": " I believe this is what people type.", "tokens": [50364, 286, 1697, 341, + 307, 437, 561, 2010, 13, 50472], "temperature": 0.0, "avg_logprob": -0.27641512552897135, + "compression_ratio": 1.588235294117647, "no_speech_prob": 0.012654316611588001}, + {"id": 191, "seek": 258052, "start": 2583.96, "end": 2603.0, "text": " But then + he built a demo where he can land in the precise time code which contains the answer + to your question you know that could be really interesting like it just to think + about it at the unlocks even more what you said in the beginning like we have this + is a device of data and so on.", "tokens": [50536, 583, 550, 415, 3094, 257, 10723, + 689, 415, 393, 2117, 294, 264, 13600, 565, 3089, 
597, 8306, 264, 1867, 281, 428, + 1168, 291, 458, 300, 727, 312, 534, 1880, 411, 309, 445, 281, 519, 466, 309, 412, + 264, 517, 34896, 754, 544, 437, 291, 848, 294, 264, 2863, 411, 321, 362, 341, 307, + 257, 4302, 295, 1412, 293, 370, 322, 13, 51488], "temperature": 0.0, "avg_logprob": + -0.27641512552897135, "compression_ratio": 1.588235294117647, "no_speech_prob": + 0.012654316611588001}, {"id": 192, "seek": 260300, "start": 2603.56, "end": 2612.12, + "text": " But like we are not able to unlock the data right it''s just sitting there + waiting to be discovered so to say.", "tokens": [50392, 583, 411, 321, 366, 406, + 1075, 281, 11634, 264, 1412, 558, 309, 311, 445, 3798, 456, 3806, 281, 312, 6941, + 370, 281, 584, 13, 50820], "temperature": 0.0, "avg_logprob": -0.14792193375624618, + "compression_ratio": 1.5728155339805825, "no_speech_prob": 0.020676447078585625}, + {"id": 193, "seek": 260300, "start": 2613.56, "end": 2630.04, "text": " Yeah it''s + really cool I wanted to spend a bit of time on the search topic itself so you did + mention this search abandonment issue which is like an e-commerce but but in general + if we if we think about search field.", "tokens": [50892, 865, 309, 311, 534, 1627, + 286, 1415, 281, 3496, 257, 857, 295, 565, 322, 264, 3164, 4829, 2564, 370, 291, + 630, 2152, 341, 3164, 9072, 518, 2734, 597, 307, 411, 364, 308, 12, 26926, 457, + 457, 294, 2674, 498, 321, 498, 321, 519, 466, 3164, 2519, 13, 51716], "temperature": + 0.0, "avg_logprob": -0.14792193375624618, "compression_ratio": 1.5728155339805825, + "no_speech_prob": 0.020676447078585625}, {"id": 194, "seek": 263004, "start": 2630.04, + "end": 2659.88, "text": " On a much larger scale and I think Daniel tanky-lank also + said about it that when search engine doesn''t work you are blamed but when it does + work you don''t hear anything it''s like people take it for granted it''s kind of + like water from the tap I guess right if it''s the right analogy so what do you + think of the 
search field in general like where do you think vector search field + fits in and what''s the role of this hybrid.", "tokens": [50412, 1282, 257, 709, + 4833, 4373, 293, 286, 519, 8033, 5466, 88, 12, 75, 657, 611, 848, 466, 309, 300, + 562, 3164, 2848, 1177, 380, 589, 291, 366, 32027, 457, 562, 309, 775, 589, 291, + 500, 380, 1568, 1340, 309, 311, 411, 561, 747, 309, 337, 12344, 309, 311, 733, 295, + 411, 1281, 490, 264, 5119, 286, 2041, 558, 498, 309, 311, 264, 558, 21663, 370, + 437, 360, 291, 519, 295, 264, 3164, 2519, 294, 2674, 411, 689, 360, 291, 519, 8062, + 3164, 2519, 9001, 294, 293, 437, 311, 264, 3090, 295, 341, 13051, 13, 51856], "temperature": + 0.0, "avg_logprob": -0.20772019612420464, "compression_ratio": 1.726530612244898, + "no_speech_prob": 0.03149003908038139}, {"id": 195, "seek": 266004, "start": 2660.04, + "end": 2674.36, "text": " Approach where you have this keywords versus which are + more familiar to users versus vector search so where would you take this yourself + right as a product manager having unlimited resources where would you where would + you go.", "tokens": [50364, 29551, 608, 689, 291, 362, 341, 21009, 5717, 597, 366, + 544, 4963, 281, 5022, 5717, 8062, 3164, 370, 689, 576, 291, 747, 341, 1803, 558, + 382, 257, 1674, 6598, 1419, 21950, 3593, 689, 576, 291, 689, 576, 291, 352, 13, + 51080], "temperature": 0.0, "avg_logprob": -0.2025515428229944, "compression_ratio": + 1.703125, "no_speech_prob": 0.007938697934150696}, {"id": 196, "seek": 266004, "start": + 2675.64, "end": 2686.36, "text": " Yeah this is this is an interesting question + yeah so I think that search is still an unsold problem.", "tokens": [51144, 865, + 341, 307, 341, 307, 364, 1880, 1168, 1338, 370, 286, 519, 300, 3164, 307, 920, 364, + 517, 45537, 1154, 13, 51680], "temperature": 0.0, "avg_logprob": -0.2025515428229944, + "compression_ratio": 1.703125, "no_speech_prob": 0.007938697934150696}, {"id": 197, + "seek": 268636, "start": 2686.36, "end": 2715.96, "text": 
" And you know in order + to find the right object or the right the the most accurate type of data we are + still we have a lot of work to develop this ecosystem and you know to build the + multimodals and multilingual and I think that the big tech companies are doing some + great job with this.", "tokens": [50412, 400, 291, 458, 294, 1668, 281, 915, 264, + 558, 2657, 420, 264, 558, 264, 264, 881, 8559, 2010, 295, 1412, 321, 366, 920, 321, + 362, 257, 688, 295, 589, 281, 1499, 341, 11311, 293, 291, 458, 281, 1322, 264, 32972, + 378, 1124, 293, 2120, 38219, 293, 286, 519, 300, 264, 955, 7553, 3431, 366, 884, + 512, 869, 1691, 365, 341, 13, 51844], "temperature": 0.0, "avg_logprob": -0.19534264504909515, + "compression_ratio": 1.6067415730337078, "no_speech_prob": 0.042302731424570084}, + {"id": 198, "seek": 271636, "start": 2716.36, "end": 2741.1600000000003, "text": + " With this stuff like open and I and the other folks and hybrid searches is is + a is a very interesting concept I believe that we for some applications it can be + a good a good way to solve their challenges but I think that the one of the most + important.", "tokens": [50364, 2022, 341, 1507, 411, 1269, 293, 286, 293, 264, 661, + 4024, 293, 13051, 26701, 307, 307, 257, 307, 257, 588, 1880, 3410, 286, 1697, 300, + 321, 337, 512, 5821, 309, 393, 312, 257, 665, 257, 665, 636, 281, 5039, 641, 4759, + 457, 286, 519, 300, 264, 472, 295, 264, 881, 1021, 13, 51604], "temperature": 0.0, + "avg_logprob": -0.22703027725219727, "compression_ratio": 1.539877300613497, "no_speech_prob": + 0.009138207882642746}, {"id": 199, "seek": 274116, "start": 2742.12, "end": 2749.72, + "text": " Pilar is that you should and again I''ve learned that the data but yes + that they are like the concept.", "tokens": [50412, 430, 2202, 307, 300, 291, 820, + 293, 797, 286, 600, 3264, 300, 264, 1412, 457, 2086, 300, 436, 366, 411, 264, 3410, + 13, 50792], "temperature": 0.0, "avg_logprob": -0.2210918468433422, "compression_ratio": + 
1.8785046728971964, "no_speech_prob": 0.23245413601398468}, {"id": 200, "seek": + 274116, "start": 2751.48, "end": 2768.48, "text": " Of moving backwards from the + custom what is I don''t have solution if we have a discussion with a customer we + are asking what is the problem that you would like to solve and this is like you + should be focused what what is the problem that you would like to solve like more + than 50% of your discussion.", "tokens": [50880, 2720, 2684, 12204, 490, 264, 2375, + 437, 307, 286, 500, 380, 362, 3827, 498, 321, 362, 257, 5017, 365, 257, 5474, 321, + 366, 3365, 437, 307, 264, 1154, 300, 291, 576, 411, 281, 5039, 293, 341, 307, 411, + 291, 820, 312, 5178, 437, 437, 307, 264, 1154, 300, 291, 576, 411, 281, 5039, 411, + 544, 813, 2625, 4, 295, 428, 5017, 13, 51730], "temperature": 0.0, "avg_logprob": + -0.2210918468433422, "compression_ratio": 1.8785046728971964, "no_speech_prob": + 0.23245413601398468}, {"id": 201, "seek": 276848, "start": 2769.2, "end": 2795.2400000000002, + "text": " And if you don''t have a good fit it''s not a good fit if you did the + vector search technology is not a good fit we would say it to the customer we are + not trying you know to fit into a space that you know keyword search is a great + solution so I think it''s the focus should be around the problem space so trying + to figure out what is their pain point or what is their customers pain point.", + "tokens": [50400, 400, 498, 291, 500, 380, 362, 257, 665, 3318, 309, 311, 406, 257, + 665, 3318, 498, 291, 630, 264, 8062, 3164, 2899, 307, 406, 257, 665, 3318, 321, + 576, 584, 309, 281, 264, 5474, 321, 366, 406, 1382, 291, 458, 281, 3318, 666, 257, + 1901, 300, 291, 458, 20428, 3164, 307, 257, 869, 3827, 370, 286, 519, 309, 311, + 264, 1879, 820, 312, 926, 264, 1154, 1901, 370, 1382, 281, 2573, 484, 437, 307, + 641, 1822, 935, 420, 437, 307, 641, 4581, 1822, 935, 13, 51702], "temperature": + 0.0, "avg_logprob": -0.1458975009703904, "compression_ratio": 
1.8428571428571427, + "no_speech_prob": 0.02312430925667286}, {"id": 202, "seek": 279524, "start": 2796.04, + "end": 2821.3999999999996, "text": " Is it the accuracy for some for some applications + we we spoke with the fraud detection company and for their use case like keyword + search was good enough solution great so go go go ahead and we don''t want to disturb + you so I think the focus should be around the the problem and the challenge and + then again.", "tokens": [50404, 1119, 309, 264, 14170, 337, 512, 337, 512, 5821, + 321, 321, 7179, 365, 264, 14560, 17784, 2237, 293, 337, 641, 764, 1389, 411, 20428, + 3164, 390, 665, 1547, 3827, 869, 370, 352, 352, 352, 2286, 293, 321, 500, 380, 528, + 281, 18071, 291, 370, 286, 519, 264, 1879, 820, 312, 926, 264, 264, 1154, 293, 264, + 3430, 293, 550, 797, 13, 51672], "temperature": 0.0, "avg_logprob": -0.1825513693002554, + "compression_ratio": 1.5989583333333333, "no_speech_prob": 0.04490693286061287}, + {"id": 203, "seek": 282140, "start": 2821.4, "end": 2849.64, "text": " Focus on + what is the the challenge that they would like to achieve or what is the what is + the potential of the solution and sometimes we are talking about recall is it the + most important parameter for some of the for some of the customers and 90% is good + enough for the use case but for mission mission critically should be mission critical + applications.", "tokens": [50412, 21862, 322, 437, 307, 264, 264, 3430, 300, 436, + 576, 411, 281, 4584, 420, 437, 307, 264, 437, 307, 264, 3995, 295, 264, 3827, 293, + 2171, 321, 366, 1417, 466, 9901, 307, 309, 264, 881, 1021, 13075, 337, 512, 295, + 264, 337, 512, 295, 264, 4581, 293, 4289, 4, 307, 665, 1547, 337, 264, 764, 1389, + 457, 337, 4447, 4447, 22797, 820, 312, 4447, 4924, 5821, 13, 51776], "temperature": + 0.0, "avg_logprob": -0.2280451774597168, "compression_ratio": 1.77, "no_speech_prob": + 0.05462859943509102}, {"id": 204, "seek": 285140, "start": 2851.96, "end": 2863.2400000000002, + 
"text": " It should be 99.99% right so I think it''s it''s a matter to some extent + it''s it''s an issue of what is the problem and what is the KPI that you would like + to achieve.", "tokens": [50392, 467, 820, 312, 11803, 13, 8494, 4, 558, 370, 286, + 519, 309, 311, 309, 311, 257, 1871, 281, 512, 8396, 309, 311, 309, 311, 364, 2734, + 295, 437, 307, 264, 1154, 293, 437, 307, 264, 591, 31701, 300, 291, 576, 411, 281, + 4584, 13, 50956], "temperature": 0.0, "avg_logprob": -0.12031129914887097, "compression_ratio": + 1.757847533632287, "no_speech_prob": 0.012550857849419117}, {"id": 205, "seek": + 285140, "start": 2863.92, "end": 2880.52, "text": " Would you like to optimize the + recall great we would optimize it if you would like to reduce the infrastructure + cost with the same KPI recall of X and you have latency of X and it''s a good enough + and maybe it can be latency so.", "tokens": [50990, 6068, 291, 411, 281, 19719, + 264, 9901, 869, 321, 576, 19719, 309, 498, 291, 576, 411, 281, 5407, 264, 6896, + 2063, 365, 264, 912, 591, 31701, 9901, 295, 1783, 293, 291, 362, 27043, 295, 1783, + 293, 309, 311, 257, 665, 1547, 293, 1310, 309, 393, 312, 27043, 370, 13, 51820], + "temperature": 0.0, "avg_logprob": -0.12031129914887097, "compression_ratio": 1.757847533632287, + "no_speech_prob": 0.012550857849419117}, {"id": 206, "seek": 288140, "start": 2881.8, + "end": 2900.84, "text": " For instance, Amazon published a research that every 100 + milliseconds let it see equals to 1% of the revenue so if the revenue is $1 billion + then 100 milliseconds of latency equals to 10 million.", "tokens": [50384, 1171, + 5197, 11, 6795, 6572, 257, 2132, 300, 633, 2319, 34184, 718, 309, 536, 6915, 281, + 502, 4, 295, 264, 9324, 370, 498, 264, 9324, 307, 1848, 16, 5218, 550, 2319, 34184, + 295, 27043, 6915, 281, 1266, 2459, 13, 51336], "temperature": 0.0, "avg_logprob": + -0.236592616675035, "compression_ratio": 1.5064935064935066, "no_speech_prob": 0.011023120023310184}, + {"id": 
207, "seek": 288140, "start": 2901.88, "end": 2903.6800000000003, "text": + " This is a huge impact for companies.", "tokens": [51388, 639, 307, 257, 2603, + 2712, 337, 3431, 13, 51478], "temperature": 0.0, "avg_logprob": -0.236592616675035, + "compression_ratio": 1.5064935064935066, "no_speech_prob": 0.011023120023310184}, + {"id": 208, "seek": 290368, "start": 2904.16, "end": 2921.8399999999997, "text": + " So I think the main the main question is what is the problem that you would like + to solve what is the pain point and starting from the customer and then work backwards + to find if you have like a good solution and if a solution is a is a good fit.", + "tokens": [50388, 407, 286, 519, 264, 2135, 264, 2135, 1168, 307, 437, 307, 264, + 1154, 300, 291, 576, 411, 281, 5039, 437, 307, 264, 1822, 935, 293, 2891, 490, 264, + 5474, 293, 550, 589, 12204, 281, 915, 498, 291, 362, 411, 257, 665, 3827, 293, 498, + 257, 3827, 307, 257, 307, 257, 665, 3318, 13, 51272], "temperature": 0.0, "avg_logprob": + -0.2868145565653956, "compression_ratio": 1.8523809523809525, "no_speech_prob": + 0.04083620756864548}, {"id": 209, "seek": 290368, "start": 2922.3199999999997, "end": + 2931.64, "text": " And again, there are various concept keyword search is a great + solution vector search is a great solution and I read searches a good solution.", + "tokens": [51296, 400, 797, 11, 456, 366, 3683, 3410, 20428, 3164, 307, 257, 869, + 3827, 8062, 3164, 307, 257, 869, 3827, 293, 286, 1401, 26701, 257, 665, 3827, 13, + 51762], "temperature": 0.0, "avg_logprob": -0.2868145565653956, "compression_ratio": + 1.8523809523809525, "no_speech_prob": 0.04083620756864548}, {"id": 210, "seek": + 293164, "start": 2932.44, "end": 2937.3199999999997, "text": " The the big I think + the biggest question is what is the problem that the customer would like to solve.", + "tokens": [50404, 440, 264, 955, 286, 519, 264, 3880, 1168, 307, 437, 307, 264, + 1154, 300, 264, 5474, 576, 411, 281, 5039, 13, 50648], 
"temperature": 0.0, "avg_logprob": + -0.18510849835121468, "compression_ratio": 1.58, "no_speech_prob": 0.03003217838704586}, + {"id": 211, "seek": 293164, "start": 2937.64, "end": 2952.7999999999997, "text": + " Yeah, I think you put it really brilliantly because it''s very easy to get into + my new show of tweaking things like on the software side and saying I have the best + algorithm right or I have the fastest or whatever.", "tokens": [50664, 865, 11, + 286, 519, 291, 829, 309, 534, 8695, 42580, 570, 309, 311, 588, 1858, 281, 483, 666, + 452, 777, 855, 295, 6986, 2456, 721, 411, 322, 264, 4722, 1252, 293, 1566, 286, + 362, 264, 1151, 9284, 558, 420, 286, 362, 264, 14573, 420, 2035, 13, 51422], "temperature": + 0.0, "avg_logprob": -0.18510849835121468, "compression_ratio": 1.58, "no_speech_prob": + 0.03003217838704586}, {"id": 212, "seek": 295280, "start": 2953.2000000000003, "end": + 2962.0800000000004, "text": " But then if you if you forget that I guess the most + important dimension for your customer maybe it''s power consumption that we mentioned + previously or something else right.", "tokens": [50384, 583, 550, 498, 291, 498, + 291, 2870, 300, 286, 2041, 264, 881, 1021, 10139, 337, 428, 5474, 1310, 309, 311, + 1347, 12126, 300, 321, 2835, 8046, 420, 746, 1646, 558, 13, 50828], "temperature": + 0.0, "avg_logprob": -0.20031770746758643, "compression_ratio": 1.6984126984126984, + "no_speech_prob": 0.06714273244142532}, {"id": 213, "seek": 295280, "start": 2962.96, + "end": 2964.4, "text": " But but also what you said.", "tokens": [50872, 583, 457, + 611, 437, 291, 848, 13, 50944], "temperature": 0.0, "avg_logprob": -0.20031770746758643, + "compression_ratio": 1.6984126984126984, "no_speech_prob": 0.06714273244142532}, + {"id": 214, "seek": 295280, "start": 2966.0, "end": 2982.4, "text": " How you can + think like the way Amazon did it right that they think big right they say okay of + all these dollars we we earned how much actually we wasted on on you know 
latency + and also how much of clients we kind of lost right.", "tokens": [51024, 1012, 291, + 393, 519, 411, 264, 636, 6795, 630, 309, 558, 300, 436, 519, 955, 558, 436, 584, + 1392, 295, 439, 613, 3808, 321, 321, 12283, 577, 709, 767, 321, 19496, 322, 322, + 291, 458, 27043, 293, 611, 577, 709, 295, 6982, 321, 733, 295, 2731, 558, 13, 51844], + "temperature": 0.0, "avg_logprob": -0.20031770746758643, "compression_ratio": 1.6984126984126984, + "no_speech_prob": 0.06714273244142532}, {"id": 215, "seek": 298240, "start": 2982.4, + "end": 2996.1600000000003, "text": " Or potential clients because if if the server + doesn''t respond soon enough then and it''s only average right 100 millisecond maybe + for some it looks like more like closer to second including their own internet connection.", + "tokens": [50364, 1610, 3995, 6982, 570, 498, 498, 264, 7154, 1177, 380, 4196, 2321, + 1547, 550, 293, 309, 311, 787, 4274, 558, 2319, 27940, 18882, 1310, 337, 512, 309, + 1542, 411, 544, 411, 4966, 281, 1150, 3009, 641, 1065, 4705, 4984, 13, 51052], "temperature": + 0.0, "avg_logprob": -0.2873264983460143, "compression_ratio": 1.594488188976378, + "no_speech_prob": 0.008880238980054855}, {"id": 216, "seek": 298240, "start": 2996.7200000000003, + "end": 3004.1600000000003, "text": " And they might just give up and say hard this + is not working today I will go and check something else I will forget about what + I wanted to buy.", "tokens": [51080, 400, 436, 1062, 445, 976, 493, 293, 584, 1152, + 341, 307, 406, 1364, 965, 286, 486, 352, 293, 1520, 746, 1646, 286, 486, 2870, 466, + 437, 286, 1415, 281, 2256, 13, 51452], "temperature": 0.0, "avg_logprob": -0.2873264983460143, + "compression_ratio": 1.594488188976378, "no_speech_prob": 0.008880238980054855}, + {"id": 217, "seek": 298240, "start": 3004.4, "end": 3005.6, "text": " So right.", + "tokens": [51464, 407, 558, 13, 51524], "temperature": 0.0, "avg_logprob": -0.2873264983460143, + "compression_ratio": 1.594488188976378, 
"no_speech_prob": 0.008880238980054855}, + {"id": 218, "seek": 298240, "start": 3006.96, "end": 3008.1600000000003, "text": + " Yeah, this is very interesting.", "tokens": [51592, 865, 11, 341, 307, 588, 1880, + 13, 51652], "temperature": 0.0, "avg_logprob": -0.2873264983460143, "compression_ratio": + 1.594488188976378, "no_speech_prob": 0.008880238980054855}, {"id": 219, "seek": + 300816, "start": 3008.72, "end": 3035.2, "text": " Also like you you you you you + brought up some some topic behind the scenes like on the role of human in this whole + loop I also want to pick your brain on that you know there is one direction in AI + saying this is going to be a whole automatic thing you don''t need to do anything + it will decide for you which is also by the way a little bit worrisome if the yeah + it''s going to decide for everything.", "tokens": [50392, 2743, 411, 291, 291, 291, + 291, 291, 3038, 493, 512, 512, 4829, 2261, 264, 8026, 411, 322, 264, 3090, 295, + 1952, 294, 341, 1379, 6367, 286, 611, 528, 281, 1888, 428, 3567, 322, 300, 291, + 458, 456, 307, 472, 3513, 294, 7318, 1566, 341, 307, 516, 281, 312, 257, 1379, 12509, + 551, 291, 500, 380, 643, 281, 360, 1340, 309, 486, 4536, 337, 291, 597, 307, 611, + 538, 264, 636, 257, 707, 857, 469, 5714, 423, 498, 264, 1338, 309, 311, 516, 281, + 4536, 337, 1203, 13, 51716], "temperature": 0.0, "avg_logprob": -0.18203064635559751, + "compression_ratio": 1.7723214285714286, "no_speech_prob": 0.028258956968784332}, + {"id": 220, "seek": 303520, "start": 3035.2, "end": 3064.96, "text": " But but even + going back to earth like where do you see humans will play a role in some sense + we are slower right then machines in some sense I think we are still faster for + example in creating things but even they are the machines that tapping into it but + like connected with MLOPS topics also machine learning operations and connected + with bias and data that we collect to tell you what the real thing is.", "tokens": + [50392, 
583, 457, 754, 516, 646, 281, 4120, 411, 689, 360, 291, 536, 6255, 486, + 862, 257, 3090, 294, 512, 2020, 321, 366, 14009, 558, 550, 8379, 294, 512, 2020, + 286, 519, 321, 366, 920, 4663, 337, 1365, 294, 4084, 721, 457, 754, 436, 366, 264, + 8379, 300, 21444, 666, 309, 457, 411, 4582, 365, 21601, 46, 6273, 8378, 611, 3479, + 2539, 7705, 293, 4582, 365, 12577, 293, 1412, 300, 321, 2500, 281, 980, 291, 437, + 264, 957, 551, 307, 13, 51852], "temperature": 0.0, "avg_logprob": -0.39790858992611067, + "compression_ratio": 1.7276595744680852, "no_speech_prob": 0.027755936607718468}, + {"id": 221, "seek": 306520, "start": 3065.2, "end": 3075.8399999999997, "text": + " Train models or maybe some other dimension that I''m missing that you think human + is going to play a role can you can you expand a little bit on that yeah.", "tokens": + [50364, 28029, 5245, 420, 1310, 512, 661, 10139, 300, 286, 478, 5361, 300, 291, + 519, 1952, 307, 516, 281, 862, 257, 3090, 393, 291, 393, 291, 5268, 257, 707, 857, + 322, 300, 1338, 13, 50896], "temperature": 0.0, "avg_logprob": -0.181281694224183, + "compression_ratio": 1.5837837837837838, "no_speech_prob": 0.00645774370059371}, + {"id": 222, "seek": 306520, "start": 3076.3199999999997, "end": 3085.04, "text": + " So actually I wrote about it in medium about the MLOPS challenge and and the human + in the loop and what is the place of human in the loop.", "tokens": [50920, 407, + 767, 286, 4114, 466, 309, 294, 6399, 466, 264, 21601, 46, 6273, 3430, 293, 293, + 264, 1952, 294, 264, 6367, 293, 437, 307, 264, 1081, 295, 1952, 294, 264, 6367, + 13, 51356], "temperature": 0.0, "avg_logprob": -0.181281694224183, "compression_ratio": + 1.5837837837837838, "no_speech_prob": 0.00645774370059371}, {"id": 223, "seek": + 308504, "start": 3085.04, "end": 3095.44, "text": " Essentially I believe that machine + learning is a decision support system okay I believe that the human as.", "tokens": + [50364, 23596, 286, 1697, 300, 3479, 2539, 307, 257, 3537, 
1406, 1185, 1392, 286, + 1697, 300, 264, 1952, 382, 13, 50884], "temperature": 0.0, "avg_logprob": -0.31867987012106275, + "compression_ratio": 1.6096256684491979, "no_speech_prob": 0.011864340864121914}, + {"id": 224, "seek": 308504, "start": 3096.16, "end": 3114.96, "text": " As a huge + or important significant role in helping the machine to decide and where a good + way to automate processes is to use the machine and to set a threshold okay so for + instance if you were.", "tokens": [50920, 1018, 257, 2603, 420, 1021, 4776, 3090, + 294, 4315, 264, 3479, 281, 4536, 293, 689, 257, 665, 636, 281, 31605, 7555, 307, + 281, 764, 264, 3479, 293, 281, 992, 257, 14678, 1392, 370, 337, 5197, 498, 291, + 645, 13, 51860], "temperature": 0.0, "avg_logprob": -0.31867987012106275, "compression_ratio": + 1.6096256684491979, "no_speech_prob": 0.011864340864121914}, {"id": 225, "seek": + 311504, "start": 3115.04, "end": 3133.2799999999997, "text": " We''re talking about + cyber cyber security challenge so you can decide that the threshold is below 0.7 + is a good enough and you don''t that like the sock teams will will check this anomaly + and then again you are reducing.", "tokens": [50364, 492, 434, 1417, 466, 13411, + 13411, 3825, 3430, 370, 291, 393, 4536, 300, 264, 14678, 307, 2507, 1958, 13, 22, + 307, 257, 665, 1547, 293, 291, 500, 380, 300, 411, 264, 35302, 5491, 486, 486, 1520, + 341, 42737, 293, 550, 797, 291, 366, 12245, 13, 51276], "temperature": 0.0, "avg_logprob": + -0.25792349601278497, "compression_ratio": 1.4630872483221478, "no_speech_prob": + 0.006134802475571632}, {"id": 226, "seek": 313328, "start": 3133.28, "end": 3154.2400000000002, + "text": " The main power cost because you are automating and you are sending queries + or a stream of data to analyze that they would you know fine tune the model and + then you can create a learning model right so it''s a human in the loop the human + is giving a feedback to the model and then you can.", "tokens": [50364, 
440, 2135, + 1347, 2063, 570, 291, 366, 3553, 990, 293, 291, 366, 7750, 24109, 420, 257, 4309, + 295, 1412, 281, 12477, 300, 436, 576, 291, 458, 2489, 10864, 264, 2316, 293, 550, + 291, 393, 1884, 257, 2539, 2316, 558, 370, 309, 311, 257, 1952, 294, 264, 6367, + 264, 1952, 307, 2902, 257, 5824, 281, 264, 2316, 293, 550, 291, 393, 13, 51412], + "temperature": 0.0, "avg_logprob": -0.21799991314227765, "compression_ratio": 1.6783625730994152, + "no_speech_prob": 0.36805739998817444}, {"id": 227, "seek": 315424, "start": 3154.24, + "end": 3164.8799999999997, "text": " Detect data drift if it''s not automated you + know there there are solutions that are good that you know data drift etc but again.", + "tokens": [50364, 4237, 557, 1412, 19699, 498, 309, 311, 406, 18473, 291, 458, 456, + 456, 366, 6547, 300, 366, 665, 300, 291, 458, 1412, 19699, 5183, 457, 797, 13, 50896], + "temperature": 0.0, "avg_logprob": -0.27783331189836774, "compression_ratio": 1.6348314606741574, + "no_speech_prob": 0.42037612199783325}, {"id": 228, "seek": 315424, "start": 3166.3999999999996, + "end": 3182.56, "text": " My my two cents is that fully automated systems we are + not ready yet for it and I believe that in order and then again we don''t like all + of the anomalies will be.", "tokens": [50972, 1222, 452, 732, 14941, 307, 300, 4498, + 18473, 3652, 321, 366, 406, 1919, 1939, 337, 309, 293, 286, 1697, 300, 294, 1668, + 293, 550, 797, 321, 500, 380, 411, 439, 295, 264, 24769, 48872, 486, 312, 13, 51780], + "temperature": 0.0, "avg_logprob": -0.27783331189836774, "compression_ratio": 1.6348314606741574, + "no_speech_prob": 0.42037612199783325}, {"id": 229, "seek": 318256, "start": 3182.56, + "end": 3207.12, "text": " Tested by a human again because you have the false positive + fatigue or a lot fatigue in in the cyber domain so I believe that a combination + or the hybrid model where you can define a certain threshold and send it to a human + to run a sanity check and you know i''ve worked 
with many data scientist and.", + "tokens": [50364, 9279, 292, 538, 257, 1952, 797, 570, 291, 362, 264, 7908, 3353, + 20574, 420, 257, 688, 20574, 294, 294, 264, 13411, 9274, 370, 286, 1697, 300, 257, + 6562, 420, 264, 13051, 2316, 689, 291, 393, 6964, 257, 1629, 14678, 293, 2845, 309, + 281, 257, 1952, 281, 1190, 257, 47892, 1520, 293, 291, 458, 741, 600, 2732, 365, + 867, 1412, 12662, 293, 13, 51592], "temperature": 0.0, "avg_logprob": -0.29117217208399915, + "compression_ratio": 1.5625, "no_speech_prob": 0.059261504560709}, {"id": 230, "seek": + 320712, "start": 3207.12, "end": 3220.24, "text": " The the always like you know + to improve the state of the art model and improve the f 1 score for from 99 to 99.9.", + "tokens": [50364, 440, 264, 1009, 411, 291, 458, 281, 3470, 264, 1785, 295, 264, + 1523, 2316, 293, 3470, 264, 283, 502, 6175, 337, 490, 11803, 281, 11803, 13, 24, + 13, 51020], "temperature": 0.0, "avg_logprob": -0.2649727463722229, "compression_ratio": + 1.2282608695652173, "no_speech_prob": 0.18493223190307617}, {"id": 231, "seek": + 322024, "start": 3221.2, "end": 3242.3199999999997, "text": " But what is the the + impact on the business is it is it important enough for the business to invest resources + in this in this in this research or not like five data scientists are running and + testing and optimizing the hyper parameter.", "tokens": [50412, 583, 437, 307, 264, + 264, 2712, 322, 264, 1606, 307, 309, 307, 309, 1021, 1547, 337, 264, 1606, 281, + 1963, 3593, 294, 341, 294, 341, 294, 341, 2132, 420, 406, 411, 1732, 1412, 7708, + 366, 2614, 293, 4997, 293, 40425, 264, 9848, 13075, 13, 51468], "temperature": 0.0, + "avg_logprob": -0.2605609893798828, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.07706615328788757}, {"id": 232, "seek": 324232, "start": 3242.32, "end": 3261.92, + "text": " For months but what is the business what is the impact on the business + so essentially I believe that and and again it resonates with the 
search domain + so I believe that companies that will be smart enough to integrate the human and + the loop mechanism where they can find you know.", "tokens": [50364, 1171, 2493, + 457, 437, 307, 264, 1606, 437, 307, 264, 2712, 322, 264, 1606, 370, 4476, 286, 1697, + 300, 293, 293, 797, 309, 41051, 365, 264, 3164, 9274, 370, 286, 1697, 300, 3431, + 300, 486, 312, 4069, 1547, 281, 13365, 264, 1952, 293, 264, 6367, 7513, 689, 436, + 393, 915, 291, 458, 13, 51344], "temperature": 0.0, "avg_logprob": -0.25503315841942503, + "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0723816305398941}, {"id": + 233, "seek": 326192, "start": 3261.92, "end": 3287.92, "text": " Measure KPIs like + the clicks on on the on the first result how many clicks on the first result or + any other KPIs and then if it''s a good model then great we should keep it you know + if it''s not broken don''t touch it but if something is not working the mechanism + or something that there''s a drift in the data so we can you know.", "tokens": [50364, + 41436, 41371, 6802, 411, 264, 18521, 322, 322, 264, 322, 264, 700, 1874, 577, 867, + 18521, 322, 264, 700, 1874, 420, 604, 661, 41371, 6802, 293, 550, 498, 309, 311, + 257, 665, 2316, 550, 869, 321, 820, 1066, 309, 291, 458, 498, 309, 311, 406, 5463, + 500, 380, 2557, 309, 457, 498, 746, 307, 406, 1364, 264, 7513, 420, 746, 300, 456, + 311, 257, 19699, 294, 264, 1412, 370, 321, 393, 291, 458, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.1777657973460662, "compression_ratio": 1.7157894736842105, + "no_speech_prob": 0.24126069247722626}, {"id": 234, "seek": 328792, "start": 3287.92, + "end": 3292.4, "text": " Research research it again and find what is the root cause + and then.", "tokens": [50364, 10303, 2132, 309, 797, 293, 915, 437, 307, 264, 5593, + 3082, 293, 550, 13, 50588], "temperature": 0.0, "avg_logprob": -0.25997584379172023, + "compression_ratio": 1.7307692307692308, "no_speech_prob": 0.051333919167518616}, + {"id": 235, 
"seek": 328792, "start": 3293.2400000000002, "end": 3304.08, "text": + " Human will detect or machine will will detect it so I believe that this this is + the question of of layers so you have like the machine learning layer and then.", + "tokens": [50630, 10294, 486, 5531, 420, 3479, 486, 486, 5531, 309, 370, 286, 1697, + 300, 341, 341, 307, 264, 1168, 295, 295, 7914, 370, 291, 362, 411, 264, 3479, 2539, + 4583, 293, 550, 13, 51172], "temperature": 0.0, "avg_logprob": -0.25997584379172023, + "compression_ratio": 1.7307692307692308, "no_speech_prob": 0.051333919167518616}, + {"id": 236, "seek": 328792, "start": 3304.8, "end": 3314.52, "text": " ML obstacles + like you can you know auto ML and the hyper parameter optimization and data drift + and model drift and other tools but.", "tokens": [51208, 21601, 17735, 411, 291, + 393, 291, 458, 8399, 21601, 293, 264, 9848, 13075, 19618, 293, 1412, 19699, 293, + 2316, 19699, 293, 661, 3873, 457, 13, 51694], "temperature": 0.0, "avg_logprob": + -0.25997584379172023, "compression_ratio": 1.7307692307692308, "no_speech_prob": + 0.051333919167518616}, {"id": 237, "seek": 331452, "start": 3314.52, "end": 3329.08, + "text": " We are I don''t think we are ready to fully automated all of the process + and yeah this is like a great question for instance autonomous cars are we ready + yet or not.", "tokens": [50364, 492, 366, 286, 500, 380, 519, 321, 366, 1919, 281, + 4498, 18473, 439, 295, 264, 1399, 293, 1338, 341, 307, 411, 257, 869, 1168, 337, + 5197, 23797, 5163, 366, 321, 1919, 1939, 420, 406, 13, 51092], "temperature": 0.0, + "avg_logprob": -0.3535003344217936, "compression_ratio": 1.5740740740740742, "no_speech_prob": + 0.03337598219513893}, {"id": 238, "seek": 331452, "start": 3330.6, "end": 3338.52, + "text": " This is I think this is the the challenge of the data science ecosystem + in the next years.", "tokens": [51168, 639, 307, 286, 519, 341, 307, 264, 264, 3430, + 295, 264, 1412, 3497, 11311, 294, 264, 958, 924, 13, 
51564], "temperature": 0.0, + "avg_logprob": -0.3535003344217936, "compression_ratio": 1.5740740740740742, "no_speech_prob": + 0.03337598219513893}, {"id": 239, "seek": 333852, "start": 3339.32, "end": 3341.88, + "text": " Yeah I think it''s also like.", "tokens": [50404, 865, 286, 519, 309, + 311, 611, 411, 13, 50532], "temperature": 0.0, "avg_logprob": -0.2300984448400037, + "compression_ratio": 1.5133689839572193, "no_speech_prob": 0.042004186660051346}, + {"id": 240, "seek": 333852, "start": 3343.32, "end": 3359.0, "text": " Our psychological + readiness to accept this solutions rights maybe previously when we didn''t have + let''s say elevator so everyone was walking up the stairs and no one really complained + but then when the first elevator arrived maybe people were like really.", "tokens": + [50604, 2621, 14346, 34954, 281, 3241, 341, 6547, 4601, 1310, 8046, 562, 321, 994, + 380, 362, 718, 311, 584, 18782, 370, 1518, 390, 4494, 493, 264, 13471, 293, 572, + 472, 534, 33951, 457, 550, 562, 264, 700, 18782, 6678, 1310, 561, 645, 411, 534, + 13, 51388], "temperature": 0.0, "avg_logprob": -0.2300984448400037, "compression_ratio": + 1.5133689839572193, "no_speech_prob": 0.042004186660051346}, {"id": 241, "seek": + 335900, "start": 3359.56, "end": 3366.2, "text": " You know looking at it with open + eyes and like what is this should they trust it will I get interrupt in it or something + you know.", "tokens": [50392, 509, 458, 1237, 412, 309, 365, 1269, 2575, 293, 411, + 437, 307, 341, 820, 436, 3361, 309, 486, 286, 483, 12729, 294, 309, 420, 746, 291, + 458, 13, 50724], "temperature": 0.0, "avg_logprob": -0.21883544246707343, "compression_ratio": + 1.8192307692307692, "no_speech_prob": 0.04302505776286125}, {"id": 242, "seek": + 335900, "start": 3367.08, "end": 3370.6, "text": " So the same I think goes to what + you just raised as the.", "tokens": [50768, 407, 264, 912, 286, 519, 1709, 281, + 437, 291, 445, 6005, 382, 264, 13, 50944], "temperature": 0.0, 
"avg_logprob": -0.21883544246707343, + "compression_ratio": 1.8192307692307692, "no_speech_prob": 0.04302505776286125}, + {"id": 243, "seek": 335900, "start": 3372.04, "end": 3378.2, "text": " You know + self driving cars you know I think it was ill and mask saying I don''t remember + exactly the stats but something let''s say one and a thousand.", "tokens": [51016, + 509, 458, 2698, 4840, 5163, 291, 458, 286, 519, 309, 390, 3171, 293, 6094, 1566, + 286, 500, 380, 1604, 2293, 264, 18152, 457, 746, 718, 311, 584, 472, 293, 257, 4714, + 13, 51324], "temperature": 0.0, "avg_logprob": -0.21883544246707343, "compression_ratio": + 1.8192307692307692, "no_speech_prob": 0.04302505776286125}, {"id": 244, "seek": + 335900, "start": 3379.0, "end": 3388.76, "text": " You know so it avoids basically + nine hundred ninety nine you know cases bad cases so would you trust that or do + you need like it to be.", "tokens": [51364, 509, 458, 370, 309, 3641, 3742, 1936, + 4949, 3262, 25063, 4949, 291, 458, 3331, 1578, 3331, 370, 576, 291, 3361, 300, 420, + 360, 291, 643, 411, 309, 281, 312, 13, 51852], "temperature": 0.0, "avg_logprob": + -0.21883544246707343, "compression_ratio": 1.8192307692307692, "no_speech_prob": + 0.04302505776286125}, {"id": 245, "seek": 338900, "start": 3389.0, "end": 3412.92, + "text": " Even bigger number and so on so forth so like complete thousands so it''s + never mistaken but what about cases where it''s hard to decide right like you inevitably + going to crush the car now you need to choose where like to the human or maybe to + to the I don''t know for to the tree or something which hurts the driver and stuff + like that right so.", "tokens": [50364, 2754, 3801, 1230, 293, 370, 322, 370, 5220, + 370, 411, 3566, 5383, 370, 309, 311, 1128, 21333, 457, 437, 466, 3331, 689, 309, + 311, 1152, 281, 4536, 558, 411, 291, 28171, 516, 281, 10321, 264, 1032, 586, 291, + 643, 281, 2826, 689, 411, 281, 264, 1952, 420, 1310, 281, 281, 264, 286, 500, 380, + 458, 337, 
281, 264, 4230, 420, 746, 597, 11051, 264, 6787, 293, 1507, 411, 300, + 558, 370, 13, 51560], "temperature": 0.0, "avg_logprob": -0.13961144497520045, "compression_ratio": + 1.6618357487922706, "no_speech_prob": 0.006587734445929527}, {"id": 246, "seek": + 341292, "start": 3412.92, "end": 3442.76, "text": " So this I think the same the + same decisions that we would be making as humans then algorithms should make and + I think what humans or humanity has hard time with is probably accepting the fact + that someone else is going to make the decision right so yeah it''s yeah it''s a + revolution you know you mentioned the elevator but you know the famous story of + the end report with the horses and the cars why should we.", "tokens": [50412, 407, + 341, 286, 519, 264, 912, 264, 912, 5327, 300, 321, 576, 312, 1455, 382, 6255, 550, + 14642, 820, 652, 293, 286, 519, 437, 6255, 420, 10243, 575, 1152, 565, 365, 307, + 1391, 17391, 264, 1186, 300, 1580, 1646, 307, 516, 281, 652, 264, 3537, 558, 370, + 1338, 309, 311, 1338, 309, 311, 257, 8894, 291, 458, 291, 2835, 264, 18782, 457, + 291, 458, 264, 4618, 1657, 295, 264, 917, 2275, 365, 264, 13112, 293, 264, 5163, + 983, 820, 321, 13, 51856], "temperature": 0.0, "avg_logprob": -0.23662346885317848, + "compression_ratio": 1.8097345132743363, "no_speech_prob": 0.03795231878757477}, + {"id": 247, "seek": 344292, "start": 3442.92, "end": 3450.92, "text": " Need the + cars right so it''s a it''s a revolution I think that most of.", "tokens": [50364, + 16984, 264, 5163, 558, 370, 309, 311, 257, 309, 311, 257, 8894, 286, 519, 300, 881, + 295, 13, 50764], "temperature": 0.0, "avg_logprob": -0.24705459822469683, "compression_ratio": + 1.6031746031746033, "no_speech_prob": 0.005872825160622597}, {"id": 248, "seek": + 344292, "start": 3451.92, "end": 3471.92, "text": " The features that we are working + on improving the quality of life and people you know can automate processes and + and on focus in their you know in their family and 
instead of doing some complicated + task they can you know focus their.", "tokens": [50814, 440, 4122, 300, 321, 366, + 1364, 322, 11470, 264, 3125, 295, 993, 293, 561, 291, 458, 393, 31605, 7555, 293, + 293, 322, 1879, 294, 641, 291, 458, 294, 641, 1605, 293, 2602, 295, 884, 512, 6179, + 5633, 436, 393, 291, 458, 1879, 641, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.24705459822469683, "compression_ratio": 1.6031746031746033, "no_speech_prob": + 0.005872825160622597}, {"id": 249, "seek": 347292, "start": 3472.92, "end": 3490.92, + "text": " Time on innovation or you know play football or soccer whatever they want + and you know makes our life easier to some extent yeah and we believe collectively + that vector search is going to help there i.", "tokens": [50364, 6161, 322, 8504, + 420, 291, 458, 862, 7346, 420, 15469, 2035, 436, 528, 293, 291, 458, 1669, 527, + 993, 3571, 281, 512, 8396, 1338, 293, 321, 1697, 24341, 300, 8062, 3164, 307, 516, + 281, 854, 456, 741, 13, 51264], "temperature": 0.0, "avg_logprob": -0.1753052756899879, + "compression_ratio": 1.4460431654676258, "no_speech_prob": 0.035924822092056274}, + {"id": 250, "seek": 349092, "start": 3490.92, "end": 3511.92, "text": " I really + like also to of course ask this my philosophical question but before that i was + thinking like what do you think on the field in general the vector search and maybe + including search and machine learning what do a lot is happening but what do you + think is still missing from your perspective.", "tokens": [50364, 286, 534, 411, + 611, 281, 295, 1164, 1029, 341, 452, 25066, 1168, 457, 949, 300, 741, 390, 1953, + 411, 437, 360, 291, 519, 322, 264, 2519, 294, 2674, 264, 8062, 3164, 293, 1310, + 3009, 3164, 293, 3479, 2539, 437, 360, 257, 688, 307, 2737, 457, 437, 360, 291, + 519, 307, 920, 5361, 490, 428, 4585, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.11387457847595214, "compression_ratio": 1.6611111111111112, "no_speech_prob": + 0.16490373015403748}, 
{"id": 251, "seek": 351192, "start": 3512.92, "end": 3515.92, + "text": " Something that maybe we need to fix.", "tokens": [50414, 6595, 300, 1310, + 321, 643, 281, 3191, 13, 50564], "temperature": 0.0, "avg_logprob": -0.12341706202580378, + "compression_ratio": 1.6476683937823835, "no_speech_prob": 0.03781450539827347}, + {"id": 252, "seek": 351192, "start": 3516.92, "end": 3540.92, "text": " To be more + efficient yeah I think it''s it''s it''s education simplifying the concept of search + I think this is the should be our main focus so education generating content and + again I really like the grandmother test simplifying not like super complicated + mathematical equations etc.", "tokens": [50614, 1407, 312, 544, 7148, 1338, 286, + 519, 309, 311, 309, 311, 309, 311, 3309, 6883, 5489, 264, 3410, 295, 3164, 286, + 519, 341, 307, 264, 820, 312, 527, 2135, 1879, 370, 3309, 17746, 2701, 293, 797, + 286, 534, 411, 264, 14317, 1500, 6883, 5489, 406, 411, 1687, 6179, 18894, 11787, + 5183, 13, 51814], "temperature": 0.0, "avg_logprob": -0.12341706202580378, "compression_ratio": + 1.6476683937823835, "no_speech_prob": 0.03781450539827347}, {"id": 253, "seek": + 354192, "start": 3541.92, "end": 3565.92, "text": " So I think it''s education we + are you know the ecosystem is trying to generate high quality content videos a YouTube + blog post we we are trying to contribute to this effort as well.", "tokens": [50364, + 407, 286, 519, 309, 311, 3309, 321, 366, 291, 458, 264, 11311, 307, 1382, 281, 8460, + 1090, 3125, 2701, 2145, 257, 3088, 6968, 2183, 321, 321, 366, 1382, 281, 10586, + 281, 341, 4630, 382, 731, 13, 51564], "temperature": 0.0, "avg_logprob": -0.3443129539489746, + "compression_ratio": 1.3333333333333333, "no_speech_prob": 0.025917518883943558}, + {"id": 254, "seek": 356592, "start": 3565.92, "end": 3593.92, "text": " We are doing + enough and you know it can be like high school or universities so but again this + is vector search is technology it''s an enabler 
it''s not the it''s not the objective + it''s not the it''s not the target but in order to unlock the potential of vector + search and machine learning and transform and so on all of these cool stuff we should.", + "tokens": [50364, 492, 366, 884, 1547, 293, 291, 458, 309, 393, 312, 411, 1090, + 1395, 420, 11779, 370, 457, 797, 341, 307, 8062, 3164, 307, 2899, 309, 311, 364, + 465, 455, 1918, 309, 311, 406, 264, 309, 311, 406, 264, 10024, 309, 311, 406, 264, + 309, 311, 406, 264, 3779, 457, 294, 1668, 281, 11634, 264, 3995, 295, 8062, 3164, + 293, 3479, 2539, 293, 4088, 293, 370, 322, 439, 295, 613, 1627, 1507, 321, 820, + 13, 51764], "temperature": 0.0, "avg_logprob": -0.16971482986058944, "compression_ratio": + 1.7323232323232323, "no_speech_prob": 0.40467748045921326}, {"id": 255, "seek": + 359392, "start": 3593.92, "end": 3615.92, "text": " Invest some of our resources + in education and learning and training and you know unlock the potential that every + developer can kill build a vector vector search based application in in every field + like it can be as I mentioned before health care,", "tokens": [50364, 14008, 512, + 295, 527, 3593, 294, 3309, 293, 2539, 293, 3097, 293, 291, 458, 11634, 264, 3995, + 300, 633, 10754, 393, 1961, 1322, 257, 8062, 8062, 3164, 2361, 3861, 294, 294, 633, + 2519, 411, 309, 393, 312, 382, 286, 2835, 949, 1585, 1127, 11, 51464], "temperature": + 0.0, "avg_logprob": -0.20028056701024374, "compression_ratio": 1.5471698113207548, + "no_speech_prob": 0.029232246801257133}, {"id": 256, "seek": 361592, "start": 3615.92, + "end": 3642.92, "text": " care, FinTech education and any other domain that manufacturing + or any other domain that you would like that is bigger to solve some problem I think + we should you know simplified similar to the revolution of auto ml so instead of + you know processing and labeling images etc.", "tokens": [50364, 1127, 11, 3773, + 36050, 3309, 293, 604, 661, 9274, 300, 11096, 420, 604, 661, 9274, 300, 291, 
576, + 411, 300, 307, 3801, 281, 5039, 512, 1154, 286, 519, 321, 820, 291, 458, 26335, + 2531, 281, 264, 8894, 295, 8399, 23271, 370, 2602, 295, 291, 458, 9007, 293, 40244, + 5267, 5183, 13, 51714], "temperature": 0.0, "avg_logprob": -0.2777529629794034, + "compression_ratio": 1.5657142857142856, "no_speech_prob": 0.3736276924610138}, + {"id": 257, "seek": 364292, "start": 3642.92, "end": 3671.92, "text": " You have + like an auto auto ml tool or solution and your provision you provision the data + labeling it and then the under the hood the auto ml model will run the experiments + find the right algorithm find the right hyper parameters optimizing them you can + define what is the what is the KPI that you would like to optimize.", "tokens": + [50364, 509, 362, 411, 364, 8399, 8399, 23271, 2290, 420, 3827, 293, 428, 17225, + 291, 17225, 264, 1412, 40244, 309, 293, 550, 264, 833, 264, 13376, 264, 8399, 23271, + 2316, 486, 1190, 264, 12050, 915, 264, 558, 9284, 915, 264, 558, 9848, 9834, 40425, + 552, 291, 393, 6964, 437, 307, 264, 437, 307, 264, 591, 31701, 300, 291, 576, 411, + 281, 19719, 13, 51814], "temperature": 0.0, "avg_logprob": -0.15898974736531576, + "compression_ratio": 1.7204301075268817, "no_speech_prob": 0.3567464053630829}, + {"id": 258, "seek": 367192, "start": 3671.92, "end": 3700.92, "text": " If it''s + f1 score or recall or whatever and then you will get the model and if you would + like to deep dive you will get the code so you know generating models with low code + so this is another and another area that we should focus in but you know to some + extent I believe that education training and generating high quality content.", + "tokens": [50364, 759, 309, 311, 283, 16, 6175, 420, 9901, 420, 2035, 293, 550, + 291, 486, 483, 264, 2316, 293, 498, 291, 576, 411, 281, 2452, 9192, 291, 486, 483, + 264, 3089, 370, 291, 458, 17746, 5245, 365, 2295, 3089, 370, 341, 307, 1071, 293, + 1071, 1859, 300, 321, 820, 1879, 294, 457, 291, 458, 281, 512, 8396, 
286, 1697, + 300, 3309, 3097, 293, 17746, 1090, 3125, 2701, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.13065814971923828, "compression_ratio": 1.7098445595854923, "no_speech_prob": + 0.07434297353029251}, {"id": 259, "seek": 370092, "start": 3700.92, "end": 3706.92, + "text": " So it should be our focus right now.", "tokens": [50364, 407, 309, 820, + 312, 527, 1879, 558, 586, 13, 50664], "temperature": 0.0, "avg_logprob": -0.17756898403167726, + "compression_ratio": 1.6396396396396395, "no_speech_prob": 0.048153337091207504}, + {"id": 260, "seek": 370092, "start": 3706.92, "end": 3729.92, "text": " Yeah I think + you put it really well and I would probably even add to this that yes there is content + which kind of like promotes someone''s solution right but at the same time you really + want to educate like why should people even care about your solution so you need + to take few steps back and explain what are we talking about.", "tokens": [50664, + 865, 286, 519, 291, 829, 309, 534, 731, 293, 286, 576, 1391, 754, 909, 281, 341, + 300, 2086, 456, 307, 2701, 597, 733, 295, 411, 36015, 1580, 311, 3827, 558, 457, + 412, 264, 912, 565, 291, 534, 528, 281, 16092, 411, 983, 820, 561, 754, 1127, 466, + 428, 3827, 370, 291, 643, 281, 747, 1326, 4439, 646, 293, 2903, 437, 366, 321, 1417, + 466, 13, 51814], "temperature": 0.0, "avg_logprob": -0.17756898403167726, "compression_ratio": + 1.6396396396396395, "no_speech_prob": 0.048153337091207504}, {"id": 261, "seek": + 372992, "start": 3729.92, "end": 3757.92, "text": " You know what problems exist + that you are trying to that you are targeting right so I still if I was asking the + same question to myself I still see a lot of content which is much more promotional + than it should be because in the beginning of this revolution you still need to + explain what is happening what the hell is going on you know why why because the + reaction could be also from the incumbent players that they will say no this is + not.", 
"tokens": [50364, 509, 458, 437, 2740, 2514, 300, 291, 366, 1382, 281, 300, + 291, 366, 17918, 558, 370, 286, 920, 498, 286, 390, 3365, 264, 912, 1168, 281, 2059, + 286, 920, 536, 257, 688, 295, 2701, 597, 307, 709, 544, 41790, 813, 309, 820, 312, + 570, 294, 264, 2863, 295, 341, 8894, 291, 920, 643, 281, 2903, 437, 307, 2737, 437, + 264, 4921, 307, 516, 322, 291, 458, 983, 983, 570, 264, 5480, 727, 312, 611, 490, + 264, 45539, 4150, 300, 436, 486, 584, 572, 341, 307, 406, 13, 51764], "temperature": + 0.2, "avg_logprob": -0.10864009437980232, "compression_ratio": 1.778225806451613, + "no_speech_prob": 0.0878516212105751}, {"id": 262, "seek": 375792, "start": 3757.92, + "end": 3777.92, "text": " No this is not no this is not where things are going they + will go go back to their clients and say the same but like you should not position + it that way you should you should explain as you said like start from the problem + right start from what is your actual business and product target.", "tokens": [50364, + 883, 341, 307, 406, 572, 341, 307, 406, 689, 721, 366, 516, 436, 486, 352, 352, + 646, 281, 641, 6982, 293, 584, 264, 912, 457, 411, 291, 820, 406, 2535, 309, 300, + 636, 291, 820, 291, 820, 2903, 382, 291, 848, 411, 722, 490, 264, 1154, 558, 722, + 490, 437, 307, 428, 3539, 1606, 293, 1674, 3779, 13, 51364], "temperature": 0.0, + "avg_logprob": -0.0915422131938319, "compression_ratio": 1.7349397590361446, "no_speech_prob": + 0.07426150888204575}, {"id": 263, "seek": 377792, "start": 3777.92, "end": 3803.92, + "text": " And I guess this is not something that many engineers asking themselves + some of them some of the best that I know do some of the best data scientists do + as well they don''t code before they understood what is being asked of them and + I think it''s an amazing skill and this is exactly where education also helps like + why should also data scientists or engineers care about this new new field.", "tokens": + [50364, 400, 286, 2041, 341, 307, 406, 
746, 300, 867, 11955, 3365, 2969, 512, 295, + 552, 512, 295, 264, 1151, 300, 286, 458, 360, 512, 295, 264, 1151, 1412, 7708, 360, + 382, 731, 436, 500, 380, 3089, 949, 436, 7320, 437, 307, 885, 2351, 295, 552, 293, + 286, 519, 309, 311, 364, 2243, 5389, 293, 341, 307, 2293, 689, 3309, 611, 3665, + 411, 983, 820, 611, 1412, 7708, 420, 11955, 1127, 466, 341, 777, 777, 2519, 13, + 51664], "temperature": 0.0, "avg_logprob": -0.07445611357688904, "compression_ratio": + 1.8093023255813954, "no_speech_prob": 0.07346794754266739}, {"id": 264, "seek": + 380392, "start": 3803.92, "end": 3826.92, "text": " Yeah, yeah, this is super important + and yeah, we should honestly we should you know when I''m saying again internally + we should improve the quality of the content and not trying you know to sell our + solution just to explain for software developer without the background in machine + learning.", "tokens": [50364, 865, 11, 1338, 11, 341, 307, 1687, 1021, 293, 1338, + 11, 321, 820, 6095, 321, 820, 291, 458, 562, 286, 478, 1566, 797, 19501, 321, 820, + 3470, 264, 3125, 295, 264, 2701, 293, 406, 1382, 291, 458, 281, 3607, 527, 3827, + 445, 281, 2903, 337, 4722, 10754, 1553, 264, 3678, 294, 3479, 2539, 13, 51514], + "temperature": 0.0, "avg_logprob": -0.16991370299766803, "compression_ratio": 1.5706521739130435, + "no_speech_prob": 0.057909753173589706}, {"id": 265, "seek": 382692, "start": 3826.92, + "end": 3855.92, "text": " And to simplify it for him and to explain what is the + concept what is the trade off between the concept and again to give him the option + to understand what is happening and issue decide what is the best tool for him is + it a screwdriver is it an hammer it will understand the bits and bytes but and the + trade off and and give him the the full picture about what what are the problems.", + "tokens": [50414, 400, 281, 20460, 309, 337, 796, 293, 281, 2903, 437, 307, 264, + 3410, 437, 307, 264, 4923, 766, 1296, 264, 3410, 293, 797, 281, 976, 796, 264, 
3614, + 281, 1223, 437, 307, 2737, 293, 2734, 4536, 437, 307, 264, 1151, 2290, 337, 796, + 307, 309, 257, 27282, 307, 309, 364, 13017, 309, 486, 1223, 264, 9239, 293, 36088, + 457, 293, 264, 4923, 766, 293, 293, 976, 796, 264, 264, 1577, 3036, 466, 437, 437, + 366, 264, 2740, 13, 51814], "temperature": 0.0, "avg_logprob": -0.17395376276086877, + "compression_ratio": 1.9441624365482233, "no_speech_prob": 0.16073961555957794}, + {"id": 266, "seek": 385692, "start": 3856.92, "end": 3865.92, "text": " Yeah, I + think it''s a great question.", "tokens": [50364, 865, 11, 286, 519, 309, 311, 257, + 869, 1168, 13, 50814], "temperature": 0.4, "avg_logprob": -0.45776950353863594, + "compression_ratio": 1.670940170940171, "no_speech_prob": 0.07698927819728851}, + {"id": 267, "seek": 385692, "start": 3865.92, "end": 3872.92, "text": " It was in + cons of every solution and you know you will take the decision.", "tokens": [50814, + 467, 390, 294, 1014, 295, 633, 3827, 293, 291, 458, 291, 486, 747, 264, 3537, 13, + 51164], "temperature": 0.4, "avg_logprob": -0.45776950353863594, "compression_ratio": + 1.670940170940171, "no_speech_prob": 0.07698927819728851}, {"id": 268, "seek": 385692, + "start": 3872.92, "end": 3882.92, "text": " Yeah, exactly and I think if we decade + more some of the players doing doing really great job there and I am looking forward + also to see some blog posts you already mentioned the notebooks that you guys are + publishing on your website and I believe that was searching website right.", "tokens": + [51164, 865, 11, 2293, 293, 286, 519, 498, 321, 10378, 544, 512, 295, 264, 4150, + 884, 884, 534, 869, 1691, 456, 293, 286, 669, 1237, 2128, 611, 281, 536, 512, 6968, + 12300, 291, 1217, 2835, 264, 43782, 300, 291, 1074, 366, 17832, 322, 428, 3144, + 293, 286, 1697, 300, 390, 10808, 3144, 558, 13, 51664], "temperature": 0.4, "avg_logprob": + -0.45776950353863594, "compression_ratio": 1.670940170940171, "no_speech_prob": + 0.07698927819728851}, {"id": 269, 
"seek": 388292, "start": 3882.92, "end": 3894.92, + "text": " Now that I learned that you really care about the topic I think it''s + important to create and share and maybe educate the educators and give the example + so I think this is really great.", "tokens": [50364, 823, 300, 286, 3264, 300, 291, + 534, 1127, 466, 264, 4829, 286, 519, 309, 311, 1021, 281, 1884, 293, 2073, 293, + 1310, 16092, 264, 22819, 293, 976, 264, 1365, 370, 286, 519, 341, 307, 534, 869, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.13913886687334845, "compression_ratio": + 1.597938144329897, "no_speech_prob": 0.0948852077126503}, {"id": 270, "seek": 388292, + "start": 3894.92, "end": 3908.92, "text": " Yeah, yeah, one example the great blog + was that one of our software developer wrote is how to optimize open search workloads.", + "tokens": [50964, 865, 11, 1338, 11, 472, 1365, 264, 869, 6968, 390, 300, 472, 295, + 527, 4722, 10754, 4114, 307, 577, 281, 19719, 1269, 3164, 32452, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.13913886687334845, "compression_ratio": 1.597938144329897, + "no_speech_prob": 0.0948852077126503}, {"id": 271, "seek": 390892, "start": 3908.92, + "end": 3937.92, "text": " So again it''s not we have a plugin on top of it, but + he wrote about what are the options without you know writing about our solution + and what are the options out they can our customers can optimize it and another + interesting blog post that we will publish soon is you know benchmarking one of + the things that we should improve in our ecosystem is to decide on on a standard + tool that will.", "tokens": [50364, 407, 797, 309, 311, 406, 321, 362, 257, 23407, + 322, 1192, 295, 309, 11, 457, 415, 4114, 466, 437, 366, 264, 3956, 1553, 291, 458, + 3579, 466, 527, 3827, 293, 437, 366, 264, 3956, 484, 436, 393, 527, 4581, 393, 19719, + 309, 293, 1071, 1880, 6968, 2183, 300, 321, 486, 11374, 2321, 307, 291, 458, 18927, + 278, 472, 295, 264, 721, 300, 321, 820, 3470, 294, 527, 11311, 307, 
281, 4536, 322, + 322, 257, 3832, 2290, 300, 486, 13, 51814], "temperature": 0.0, "avg_logprob": -0.1335151051900473, + "compression_ratio": 1.728888888888889, "no_speech_prob": 0.01586165279150009}, + {"id": 272, "seek": 393792, "start": 3937.92, "end": 3965.92, "text": " Help us + to decide what is the KPI and the benchmark there are various benchmark over there + i''m familiar we are familiar with rally and the elastic benchmark i haven''t seen + like a good benchmark industry standard in in the vector search there was the challenge + of big a and one year ago two years ago, but again I don''t think we have.", "tokens": + [50364, 10773, 505, 281, 4536, 437, 307, 264, 591, 31701, 293, 264, 18927, 456, + 366, 3683, 18927, 670, 456, 741, 478, 4963, 321, 366, 4963, 365, 17584, 293, 264, + 17115, 18927, 741, 2378, 380, 1612, 411, 257, 665, 18927, 3518, 3832, 294, 294, + 264, 8062, 3164, 456, 390, 264, 3430, 295, 955, 257, 293, 472, 1064, 2057, 732, + 924, 2057, 11, 457, 797, 286, 500, 380, 519, 321, 362, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.22193279005076788, "compression_ratio": 1.6818181818181819, + "no_speech_prob": 0.03456810489296913}, {"id": 273, "seek": 396592, "start": 3965.92, + "end": 3980.92, "text": " Good tool today, but so one of our developers wrote out + to run the benchmark tool so it was like open search benchmark how to use this benchmark + and what is the.", "tokens": [50364, 2205, 2290, 965, 11, 457, 370, 472, 295, 527, + 8849, 4114, 484, 281, 1190, 264, 18927, 2290, 370, 309, 390, 411, 1269, 3164, 18927, + 577, 281, 764, 341, 18927, 293, 437, 307, 264, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.18407570688348068, "compression_ratio": 1.412280701754386, "no_speech_prob": + 0.039454713463783264}, {"id": 274, "seek": 398092, "start": 3980.92, "end": 4004.92, + "text": " Beats and bytes and tips how to understand the benchmark tools so yeah + I think that starting from the education and then offering customers to check your + 
solution yeah sounds great i think maybe even by the time this podcast is published + we have that new blog as well.", "tokens": [50364, 879, 1720, 293, 36088, 293, 6082, + 577, 281, 1223, 264, 18927, 3873, 370, 1338, 286, 519, 300, 2891, 490, 264, 3309, + 293, 550, 8745, 4581, 281, 1520, 428, 3827, 1338, 3263, 869, 741, 519, 1310, 754, + 538, 264, 565, 341, 7367, 307, 6572, 321, 362, 300, 777, 6968, 382, 731, 13, 51564], + "temperature": 0.0, "avg_logprob": -0.15552491274746982, "compression_ratio": 1.5491329479768785, + "no_speech_prob": 0.2141244113445282}, {"id": 275, "seek": 400492, "start": 4004.92, + "end": 4031.92, "text": " Hey, I''m really excited to be chatting to you today, + I mean it''s attached a lot of deep topics i''m sure we could have gone for longer, + but I was also really curious to ask you this magical why question you know the + same way as you said don''t think about software think about the problem that you''re + solving the reason i''m asking why question is because I truly believe.", "tokens": + [50364, 1911, 11, 286, 478, 534, 2919, 281, 312, 24654, 281, 291, 965, 11, 286, + 914, 309, 311, 8570, 257, 688, 295, 2452, 8378, 741, 478, 988, 321, 727, 362, 2780, + 337, 2854, 11, 457, 286, 390, 611, 534, 6369, 281, 1029, 291, 341, 12066, 983, 1168, + 291, 458, 264, 912, 636, 382, 291, 848, 500, 380, 519, 466, 4722, 519, 466, 264, + 1154, 300, 291, 434, 12606, 264, 1778, 741, 478, 3365, 983, 1168, 307, 570, 286, + 4908, 1697, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1777872471582322, + "compression_ratio": 1.6255506607929515, "no_speech_prob": 0.3613390326499939}, + {"id": 276, "seek": 403192, "start": 4031.92, "end": 4060.92, "text": " But if you + don''t understand why you do things then you''re kind of like flying through things + right so you might as well regret some sometime later maybe you train the muscle + but still i don''t think it''s a good sensible approach in your life so i''m really + interested given all your 
experience in in machine learning and product management + software development why are you excited to work on vector search search whatever + is it that you do day today.", "tokens": [50364, 583, 498, 291, 500, 380, 1223, + 983, 291, 360, 721, 550, 291, 434, 733, 295, 411, 7137, 807, 721, 558, 370, 291, + 1062, 382, 731, 10879, 512, 15053, 1780, 1310, 291, 3847, 264, 8679, 457, 920, 741, + 500, 380, 519, 309, 311, 257, 665, 25380, 3109, 294, 428, 993, 370, 741, 478, 534, + 3102, 2212, 439, 428, 1752, 294, 294, 3479, 2539, 293, 1674, 4592, 4722, 3250, 983, + 366, 291, 2919, 281, 589, 322, 8062, 3164, 3164, 2035, 307, 309, 300, 291, 360, + 786, 965, 13, 51814], "temperature": 0.0, "avg_logprob": -0.11619693241762312, "compression_ratio": + 1.6905660377358491, "no_speech_prob": 0.023022204637527466}, {"id": 277, "seek": + 406192, "start": 4062.32, "end": 4083.04, "text": " Yeah, I think this is a great + question I really like the why combinator accelerator approach for building products + build something that customers like or love and essentially you know we are building + some trying to build some cool stuff and make.", "tokens": [50384, 865, 11, 286, + 519, 341, 307, 257, 869, 1168, 286, 534, 411, 264, 983, 2512, 31927, 39889, 3109, + 337, 2390, 3383, 1322, 746, 300, 4581, 411, 420, 959, 293, 4476, 291, 458, 321, + 366, 2390, 512, 1382, 281, 1322, 512, 1627, 1507, 293, 652, 13, 51420], "temperature": + 0.0, "avg_logprob": -0.24081407274518693, "compression_ratio": 1.5279503105590062, + "no_speech_prob": 0.02774079144001007}, {"id": 278, "seek": 408304, "start": 4083.04, + "end": 4110.04, "text": " People''s life easier happier i gave an example of the + girl from Asia so this is this is I think one of our mission but it''s not only + the girl from Asia that would like to purchase a red short sleeve dress it''s the + DevOps that is trying to find the right log and instead of working it for hours + it will take him.", "tokens": [50364, 3432, 311, 993, 3571, 20423, 
741, 2729, 364, + 1365, 295, 264, 2013, 490, 10038, 370, 341, 307, 341, 307, 286, 519, 472, 295, 527, + 4447, 457, 309, 311, 406, 787, 264, 2013, 490, 10038, 300, 576, 411, 281, 8110, + 257, 2182, 2099, 21138, 5231, 309, 311, 264, 43051, 300, 307, 1382, 281, 915, 264, + 558, 3565, 293, 2602, 295, 1364, 309, 337, 2496, 309, 486, 747, 796, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.1759140756395128, "compression_ratio": 1.5707070707070707, + "no_speech_prob": 0.19459618628025055}, {"id": 279, "seek": 411004, "start": 4110.04, + "end": 4127.04, "text": " Seconds okay so essentially our mission and i''m excited + that i''m working on this topic it''s to make the the consumers businesses and enterprises + life easier.", "tokens": [50364, 5736, 82, 1392, 370, 4476, 527, 4447, 293, 741, + 478, 2919, 300, 741, 478, 1364, 322, 341, 4829, 309, 311, 281, 652, 264, 264, 11883, + 6011, 293, 29034, 993, 3571, 13, 51214], "temperature": 0.0, "avg_logprob": -0.2132699966430664, + "compression_ratio": 1.3305084745762712, "no_speech_prob": 0.0479571595788002}, + {"id": 280, "seek": 412704, "start": 4127.24, "end": 4136.44, "text": " And so I + think it''s a very simple statement of the why and I believe that this is this is + my mission or this is our mission.", "tokens": [50374, 400, 370, 286, 519, 309, + 311, 257, 588, 2199, 5629, 295, 264, 983, 293, 286, 1697, 300, 341, 307, 341, 307, + 452, 4447, 420, 341, 307, 527, 4447, 13, 50834], "temperature": 0.0, "avg_logprob": + -0.17513853406149243, "compression_ratio": 1.5471698113207548, "no_speech_prob": + 0.09198193997144699}, {"id": 281, "seek": 412704, "start": 4137.5199999999995, "end": + 4144.24, "text": " And and to some extent I think that this is like a doing good + perspective so you know you have like.", "tokens": [50888, 400, 293, 281, 512, 8396, + 286, 519, 300, 341, 307, 411, 257, 884, 665, 4585, 370, 291, 458, 291, 362, 411, + 13, 51224], "temperature": 0.0, "avg_logprob": -0.17513853406149243, 
"compression_ratio": + 1.5471698113207548, "no_speech_prob": 0.09198193997144699}, {"id": 282, "seek": + 412704, "start": 4146.12, "end": 4149.6, "text": " gambling company is.", "tokens": + [51318, 27077, 2237, 307, 13, 51492], "temperature": 0.0, "avg_logprob": -0.17513853406149243, + "compression_ratio": 1.5471698113207548, "no_speech_prob": 0.09198193997144699}, + {"id": 283, "seek": 414960, "start": 4150.6, "end": 4173.6, "text": " building some + stuff and building applications and i''m my approach is you know building things + that will help the humanity so i''m exciting that this is the things that i''m working + on and by the way this was in my previous startup when we tried to save life right + drowning detection.", "tokens": [50414, 2390, 512, 1507, 293, 2390, 5821, 293, 741, + 478, 452, 3109, 307, 291, 458, 2390, 721, 300, 486, 854, 264, 10243, 370, 741, 478, + 4670, 300, 341, 307, 264, 721, 300, 741, 478, 1364, 322, 293, 538, 264, 636, 341, + 390, 294, 452, 3894, 18578, 562, 321, 3031, 281, 3155, 993, 558, 37198, 17784, 13, + 51564], "temperature": 0.0, "avg_logprob": -0.1816116106712212, "compression_ratio": + 1.6149425287356323, "no_speech_prob": 0.017696810886263847}, {"id": 284, "seek": + 417360, "start": 4173.6, "end": 4184.6, "text": " and you issue residential pool + open water and you know save life and if we can you know save life maybe for health + applications.", "tokens": [50364, 293, 291, 2734, 17389, 7005, 1269, 1281, 293, + 291, 458, 3155, 993, 293, 498, 321, 393, 291, 458, 3155, 993, 1310, 337, 1585, 5821, + 13, 50914], "temperature": 0.0, "avg_logprob": -0.3255566564099542, "compression_ratio": + 1.5975609756097562, "no_speech_prob": 0.03280489891767502}, {"id": 285, "seek": + 417360, "start": 4185.6, "end": 4193.84, "text": " detecting cancer with the image + embeddings or some other cool stuff i''m super excited that this is the domain that + we are working on.", "tokens": [50964, 40237, 5592, 365, 264, 3256, 12240, 29432, + 420, 
512, 661, 1627, 1507, 741, 478, 1687, 2919, 300, 341, 307, 264, 9274, 300, + 321, 366, 1364, 322, 13, 51376], "temperature": 0.0, "avg_logprob": -0.3255566564099542, + "compression_ratio": 1.5975609756097562, "no_speech_prob": 0.03280489891767502}, + {"id": 286, "seek": 419384, "start": 4194.4400000000005, "end": 4202.360000000001, + "text": " yeah this is this is very relatable in this fantastic that you''re bringing + this up you know how we can actually improve life.", "tokens": [50394, 1338, 341, + 307, 341, 307, 588, 42355, 294, 341, 5456, 300, 291, 434, 5062, 341, 493, 291, 458, + 577, 321, 393, 767, 3470, 993, 13, 50790], "temperature": 0.0, "avg_logprob": -0.21439705954657662, + "compression_ratio": 1.6648936170212767, "no_speech_prob": 0.021486863493919373}, + {"id": 287, "seek": 419384, "start": 4203.56, "end": 4204.64, "text": " beside building.", + "tokens": [50850, 15726, 2390, 13, 50904], "temperature": 0.0, "avg_logprob": -0.21439705954657662, + "compression_ratio": 1.6648936170212767, "no_speech_prob": 0.021486863493919373}, + {"id": 288, "seek": 419384, "start": 4205.360000000001, "end": 4207.84, "text": + " great products or products that sell.", "tokens": [50940, 869, 3383, 420, 3383, + 300, 3607, 13, 51064], "temperature": 0.0, "avg_logprob": -0.21439705954657662, + "compression_ratio": 1.6648936170212767, "no_speech_prob": 0.021486863493919373}, + {"id": 289, "seek": 419384, "start": 4208.92, "end": 4218.16, "text": " This is + this is amazing and to conclude is there any announcement that you want to make + from the from your side of from search room.", "tokens": [51118, 639, 307, 341, + 307, 2243, 293, 281, 16886, 307, 456, 604, 12847, 300, 291, 528, 281, 652, 490, + 264, 490, 428, 1252, 295, 490, 3164, 1808, 13, 51580], "temperature": 0.0, "avg_logprob": + -0.21439705954657662, "compression_ratio": 1.6648936170212767, "no_speech_prob": + 0.021486863493919373}, {"id": 290, "seek": 421816, "start": 4218.5199999999995, + "end": 4219.16, "text": 
" A. I.", "tokens": [50382, 316, 13, 286, 13, 50414], "temperature": + 0.0, "avg_logprob": -0.40305386559437895, "compression_ratio": 1.42, "no_speech_prob": + 0.021343037486076355}, {"id": 291, "seek": 421816, "start": 4220.16, "end": 4224.48, + "text": " yeah yeah so i''m very excited because we are building.", "tokens": [50464, + 1338, 1338, 370, 741, 478, 588, 2919, 570, 321, 366, 2390, 13, 50680], "temperature": + 0.0, "avg_logprob": -0.40305386559437895, "compression_ratio": 1.42, "no_speech_prob": + 0.021343037486076355}, {"id": 292, "seek": 421816, "start": 4225.96, "end": 4235.32, + "text": " some cool stuff so the first one we launch our search you may i platform + where we offer customers a free tier to check our.", "tokens": [50754, 512, 1627, + 1507, 370, 264, 700, 472, 321, 4025, 527, 3164, 291, 815, 741, 3663, 689, 321, 2626, + 4581, 257, 1737, 12362, 281, 1520, 527, 13, 51222], "temperature": 0.0, "avg_logprob": + -0.40305386559437895, "compression_ratio": 1.42, "no_speech_prob": 0.021343037486076355}, + {"id": 293, "seek": 421816, "start": 4236.5199999999995, "end": 4239.32, "text": + " platform and again it''s not.", "tokens": [51282, 3663, 293, 797, 309, 311, 406, + 13, 51422], "temperature": 0.0, "avg_logprob": -0.40305386559437895, "compression_ratio": + 1.42, "no_speech_prob": 0.021343037486076355}, {"id": 294, "seek": 423932, "start": + 4240.32, "end": 4252.32, "text": " fully working we are not supporting all of the + features but it this is very important it is very important for us to get your feedback + so i encourage you to check it and to.", "tokens": [50414, 4498, 1364, 321, 366, + 406, 7231, 439, 295, 264, 4122, 457, 309, 341, 307, 588, 1021, 309, 307, 588, 1021, + 337, 505, 281, 483, 428, 5824, 370, 741, 5373, 291, 281, 1520, 309, 293, 281, 13, + 51014], "temperature": 0.0, "avg_logprob": -0.1634552323973024, "compression_ratio": + 1.7142857142857142, "no_speech_prob": 0.06491108238697052}, {"id": 295, "seek": + 423932, "start": 
4253.32, "end": 4257.32, "text": " send me an email or send my + team an email or in our support.", "tokens": [51064, 2845, 385, 364, 3796, 420, + 2845, 452, 1469, 364, 3796, 420, 294, 527, 1406, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.1634552323973024, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.06491108238697052}, {"id": 296, "seek": 423932, "start": 4258.32, "end": 4265.32, + "text": " give your feedback don''t be a gentle with us we are trying you know to + build.", "tokens": [51314, 976, 428, 5824, 500, 380, 312, 257, 6424, 365, 505, 321, + 366, 1382, 291, 458, 281, 1322, 13, 51664], "temperature": 0.0, "avg_logprob": -0.1634552323973024, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.06491108238697052}, + {"id": 297, "seek": 426532, "start": 4266.32, "end": 4269.32, "text": " things that + developers would like and.", "tokens": [50414, 721, 300, 8849, 576, 411, 293, 13, + 50564], "temperature": 0.0, "avg_logprob": -0.14582645359323987, "compression_ratio": + 1.58, "no_speech_prob": 0.026000134646892548}, {"id": 298, "seek": 426532, "start": + 4270.32, "end": 4281.32, "text": " we are very focused on the customer so this is + the first announcement and every feedback is valuable next year we will launch our + second generation where we can offer.", "tokens": [50614, 321, 366, 588, 5178, 322, + 264, 5474, 370, 341, 307, 264, 700, 12847, 293, 633, 5824, 307, 8263, 958, 1064, + 321, 486, 4025, 527, 1150, 5125, 689, 321, 393, 2626, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.14582645359323987, "compression_ratio": 1.58, "no_speech_prob": + 0.026000134646892548}, {"id": 299, "seek": 426532, "start": 4282.32, "end": 4293.32, + "text": " better performance more than 10x we have a few new implementations and + in terms of performance and hopefully.", "tokens": [51214, 1101, 3389, 544, 813, + 1266, 87, 321, 362, 257, 1326, 777, 4445, 763, 293, 294, 2115, 295, 3389, 293, 4696, + 13, 51764], "temperature": 0.0, 
"avg_logprob": -0.14582645359323987, "compression_ratio": + 1.58, "no_speech_prob": 0.026000134646892548}, {"id": 300, "seek": 429332, "start": + 4294.32, "end": 4296.32, "text": " at the beginning of.", "tokens": [50414, 412, + 264, 2863, 295, 13, 50514], "temperature": 0.0, "avg_logprob": -0.18840161809381448, + "compression_ratio": 1.4794520547945205, "no_speech_prob": 0.019750285893678665}, + {"id": 301, "seek": 429332, "start": 4296.32, "end": 4307.32, "text": " 2023 we + will release our python compiler and some other cool stuff so we are working on + a few vectors if I may use.", "tokens": [50514, 44377, 321, 486, 4374, 527, 38797, + 31958, 293, 512, 661, 1627, 1507, 370, 321, 366, 1364, 322, 257, 1326, 18875, 498, + 286, 815, 764, 13, 51064], "temperature": 0.0, "avg_logprob": -0.18840161809381448, + "compression_ratio": 1.4794520547945205, "no_speech_prob": 0.019750285893678665}, + {"id": 302, "seek": 429332, "start": 4307.32, "end": 4314.32, "text": " vectors + and yeah on the software on the hardware on the system user experience.", "tokens": + [51064, 18875, 293, 1338, 322, 264, 4722, 322, 264, 8837, 322, 264, 1185, 4195, + 1752, 13, 51414], "temperature": 0.0, "avg_logprob": -0.18840161809381448, "compression_ratio": + 1.4794520547945205, "no_speech_prob": 0.019750285893678665}, {"id": 303, "seek": + 431432, "start": 4315.32, "end": 4319.32, "text": " And the user interface and to + simplify it.", "tokens": [50414, 400, 264, 4195, 9226, 293, 281, 20460, 309, 13, + 50614], "temperature": 0.0, "avg_logprob": -0.1834882762697008, "compression_ratio": + 1.5642458100558658, "no_speech_prob": 0.03493248298764229}, {"id": 304, "seek": + 431432, "start": 4320.32, "end": 4326.32, "text": " yeah so this is the things that + we are working right now and we will be happy to be in touch.", "tokens": [50664, + 1338, 370, 341, 307, 264, 721, 300, 321, 366, 1364, 558, 586, 293, 321, 486, 312, + 2055, 281, 312, 294, 2557, 13, 50964], "temperature": 0.0, "avg_logprob": 
-0.1834882762697008, + "compression_ratio": 1.5642458100558658, "no_speech_prob": 0.03493248298764229}, + {"id": 305, "seek": 431432, "start": 4327.32, "end": 4335.32, "text": " sounds great + thanks you have looks like your plate is full of really exciting things so all the + best.", "tokens": [51014, 3263, 869, 3231, 291, 362, 1542, 411, 428, 5924, 307, + 1577, 295, 534, 4670, 721, 370, 439, 264, 1151, 13, 51414], "temperature": 0.0, + "avg_logprob": -0.1834882762697008, "compression_ratio": 1.5642458100558658, "no_speech_prob": + 0.03493248298764229}, {"id": 306, "seek": 431432, "start": 4335.32, "end": 4339.32, + "text": " to you and your team i know some of them.", "tokens": [51414, 281, 291, + 293, 428, 1469, 741, 458, 512, 295, 552, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.1834882762697008, "compression_ratio": 1.5642458100558658, "no_speech_prob": + 0.03493248298764229}, {"id": 307, "seek": 433932, "start": 4340.32, "end": 4350.32, + "text": " yeah it''s it''s amazing that you guys are building this and i''m really + looking forward to gen 2 of of the APU hardware as well.", "tokens": [50414, 1338, + 309, 311, 309, 311, 2243, 300, 291, 1074, 366, 2390, 341, 293, 741, 478, 534, 1237, + 2128, 281, 1049, 568, 295, 295, 264, 5372, 52, 8837, 382, 731, 13, 50914], "temperature": + 0.0, "avg_logprob": -0.27265625, "compression_ratio": 1.6968325791855203, "no_speech_prob": + 0.05698372796177864}, {"id": 308, "seek": 433932, "start": 4351.32, "end": 4356.32, + "text": " yeah and all the best we will stay in touch thank you very much for this + episode.", "tokens": [50964, 1338, 293, 439, 264, 1151, 321, 486, 1754, 294, 2557, + 1309, 291, 588, 709, 337, 341, 3500, 13, 51214], "temperature": 0.0, "avg_logprob": + -0.27265625, "compression_ratio": 1.6968325791855203, "no_speech_prob": 0.05698372796177864}, + {"id": 309, "seek": 433932, "start": 4358.32, "end": 4368.32, "text": " yeah thank + you very much the mid tree was a pleasure talking with you you know 
super interesting + stuff I can you know talk for hours about this the man you know it''s.", "tokens": + [51314, 1338, 1309, 291, 588, 709, 264, 2062, 4230, 390, 257, 6834, 1417, 365, 291, + 291, 458, 1687, 1880, 1507, 286, 393, 291, 458, 751, 337, 2496, 466, 341, 264, 587, + 291, 458, 309, 311, 13, 51814], "temperature": 0.0, "avg_logprob": -0.27265625, + "compression_ratio": 1.6968325791855203, "no_speech_prob": 0.05698372796177864}, + {"id": 310, "seek": 436932, "start": 4369.32, "end": 4376.32, "text": " i''m excited + that working this domain and really looking forward to you know.", "tokens": [50364, + 741, 478, 2919, 300, 1364, 341, 9274, 293, 534, 1237, 2128, 281, 291, 458, 13, 50714], + "temperature": 0.0, "avg_logprob": -0.3031829062928545, "compression_ratio": 1.4596774193548387, + "no_speech_prob": 0.006817470770329237}, {"id": 311, "seek": 436932, "start": 4376.32, + "end": 4381.32, "text": " hear from the community fantastic thanks so much and if.", + "tokens": [50714, 1568, 490, 264, 1768, 5456, 3231, 370, 709, 293, 498, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.3031829062928545, "compression_ratio": 1.4596774193548387, + "no_speech_prob": 0.006817470770329237}, {"id": 312, "seek": 436932, "start": 4381.32, + "end": 4383.32, "text": " thank you for now.", "tokens": [50964, 1309, 291, 337, + 586, 13, 51064], "temperature": 0.0, "avg_logprob": -0.3031829062928545, "compression_ratio": + 1.4596774193548387, "no_speech_prob": 0.006817470770329237}, {"id": 313, "seek": + 436932, "start": 4383.32, "end": 4386.32, "text": " thank you very much bye bye.", + "tokens": [51064, 1309, 291, 588, 709, 6543, 6543, 13, 51214], "temperature": 0.0, + "avg_logprob": -0.3031829062928545, "compression_ratio": 1.4596774193548387, "no_speech_prob": + 0.006817470770329237}, {"id": 314, "seek": 439932, "start": 4399.32, "end": 4410.32, + "text": " music playing", "tokens": [50364, 1318, 2433, 50914], "temperature": 0.0, + "avg_logprob": -0.9839839935302734, 
"compression_ratio": 0.6190476190476191, "no_speech_prob": + 0.8395236730575562}]' +--- + +Hello, vector podcast is here. We again continue to roll in this season 2 of this podcast. Today I have a very interesting guest, Yanny Vaknin, who is the director of product at Searchum. If you've read my blog post on not all vector databases, I made equal. +One of the vector databases, all like technologies, stood out. And it's a technology made by GSI, technology company. And it's implementing a hardware for vector search. It's very rare that this thing exists or this approach exists on the market today. And I'm super excited to talk to Yanny Vaknin. +How are you, Yanny Vaknin? Hey. Great. Thanks for having me, maybe, Mietri. Yeah. I'm really glad you joined and found time in your business schedule. +So can you first explain how searchum and GSI are related? And maybe at the same time, if you could traditionally introduce yourself and talk about your background and how you got here. Yeah. So maybe I will start with quick introduction. So I'm director of product at Searchum AI. +Searchum AI is a SaaS platform for ML search application, based on purpose build AI chip for search applications. Prior to this role, I've worked at AWS as a machine learning specialist where I've worked with broad spectrum of top t top tier tech companies. +Trying to help them in their machine learning domain. And I was super excited from the revolution of the like the fifth revolution, the AI revolution with cool stuff of NLP search. Unstructed data structure data. I've worked with various companies cyber, fintech e-commerce, etc. +I was co founder and CEO of deep sea AI, which was the first computer vision based system for open water drowning detection. So we are the SaaS solution of GSI. GSI acquired an Israeli startup a few years ago. And the founder is Dr. Avidan Krib, which is one of the smartest guy that I ever met. +And during this PhD, he invented a new concept. 
So traditionally the CPU communicates with the memory, and then you have challenges of bottlenecks, I/O, etc. But with the new concept, you build a memory that is the processor, so all of the computation happens inside the memory.
+You can guess that when you're running heavy, memory-intensive applications, if all of the processing happens inside the memory, you can get single-digit-millisecond latency. Yeah, so GSI acquired the Israeli startup, and we are based in Israel.
+We have an R&D team of approximately 40 to 50 people. In order to scale it, we started Searchium AI, because it's super hard to scale hardware. So today we are offering this unique hardware on the cloud, it can be AWS, GCP or any other cloud, and customers can consume it as a SaaS platform.
+Yeah, makes sense. But there is still an option, if I want a completely on-premise setup, right? In principle, I could have bought the APU cards, APU being an associative processing unit, and installed them similar to what I would do with a GPU.
+Is that right? Yeah, yeah, totally. So there are two types of implementation: the first one is on-prem and the second is via the cloud.
+There are various configurations, and for customers that would like to consume it as an on-prem solution, for instance, there are various capabilities. One of the major things about this hardware accelerator is the power consumption:
+comparing it to a CPU or GPU, it can be 5% or 10% of the power consumption. For companies that are running heavy GPU and CPU workloads, a big part of the total cost of ownership is the power consumption, among other factors.
+So on-prem customers can reduce the infrastructure cost in terms of total cost of ownership, power consumption, etc. Yeah, this is really cool, and it's not very often that we mention power consumption as one of the dimensions on this podcast.
+I mean, I think it's crucial, of course, for the planet and also for the electricity bill, with electricity costs skyrocketing, you know, and I think it's quite important. Yeah, sorry.
+No, I was just alluding to the fact that you should not skip it when assessing a system or a vector search solution, right? Not only focusing entirely on the offering itself: you should still worry about how it will scale in different dimensions.
+I'm glad you guys also worry about the power consumption part. Yeah, a low carbon footprint is a major issue right now, especially in Europe. Usually, when developers are launching AWS instances, they choose by parameters like virtual CPUs, RAM, etc.,
+and they are not aware of the carbon footprint. But when you are running it on-prem, this is a major parameter, a key parameter for deciding what is the right platform or the right hardware for you.
+So I totally agree with you, we should take it into consideration, and I assume for cloud providers like AWS, GCP, Azure,
+this can be, you know, critical for them, and we are in conversation with some of the cloud providers that I mentioned. Yeah, this sounds great.
+If we move a little bit closer to the algorithm side: this is dedicated hardware, and as far as I understood, also based on our Berlin Buzzwords presentation, this hardware can support not only vector search but some other scenarios, right?
+Like image-processing-related tasks. So is there any kind of constraint on what type of vector search algorithm you can implement, or does it not have any constraint? Yeah, yeah.
+So I think that the biggest challenge today is that you can develop state-of-the-art hardware, but the major challenge is how you integrate it with the community. NVIDIA has done it very well with CUDA, and it should be part of the ecosystem.
+In terms of applications, we have another application for image processing: it is based on, say, satellite images and radar images, and we can process them a few orders of magnitude faster compared to an NVIDIA GPU.
+In the past we had a few other applications, for genomes and molecules, but today we would like to focus on the biggest challenges. I believe that, and we can elaborate on it later on,
+search is still broken, and this is a huge market, so our focus right now is on search. We can actually expand to other solutions as well, like image processing, and we already have a solution and a customer there.
+One of our efforts is to build an ecosystem around this. Hopefully soon we will launch our Python compiler, so developers can write their code in Python and then run it seamlessly on our APU without having to learn a new framework or a new language.
+So this is another direction that we are working on. I think one of the biggest challenges today is simplifying the technological stack for developers, so they can keep working with the common frameworks or languages; they don't want to learn new ones.
+They would like to stay with the languages that are familiar, and they don't always have time to learn a new framework, so we are trying to simplify the integration with their current stack.
+One of our solutions is a plugin on top of Elasticsearch and OpenSearch, which do offer vector search today, and we can talk about that. So we have a plugin on top of these
+search applications, because some of the customers would like to stay with their current Elasticsearch or OpenSearch, so we built a plugin on top of it. We are also talking with search engines and vector databases in order to integrate
+our solution with theirs. In terms of the landscape, we do not perceive the vector search engines and vector databases as competitors.
+My perception is that they are potential partners, better together, you know, to give greater value to their customers: reducing the infrastructure cost and giving
+lower latency without sacrificing accuracy. So yeah, we are trying to be part of the ecosystem, to help them and to help customers scale and improve their search applications. Yeah, this is interesting, you touched on
+being like a competitor to vector databases. I think it's an interesting topic in general, because on one hand, if you take all the vector database players, they probably look at each other as competitors, but at the same time, as all of you players are sharing
+the approach, the documentation, how you think about yourselves, I think it also helps the whole market cumulatively. But I also wanted to drill in a bit into this Elasticsearch and OpenSearch plugin. Essentially, the Elasticsearch team has been busy recently, and I think they have now released some updates in version 8.5 where you can do things like hybrid search, right? But this is all based on an ANN implementation on top of Lucene, so it's all inside Java and it kind of runs in the same JVM, right? The approach that you guys have implemented is basically like a
+vector search backend, which runs somewhere else, let's say if we're using the cloud offering, but at the same time it feels sort of native to Elasticsearch, so I don't need to do much, right? I just need to install the plugin, and of course I need to have credentials.
+ And what I wanted to say is that it feels like you expand the capabilities of Elasticsearch beyond what it offers, in a way that you can actually move the load of vector search away from it to another backend, right? Can you talk a bit more about the unit cost, this kind of unit economics, and the advantages of the approach that you have implemented?
+ Yeah, this is a great point actually. We are trying to decouple storage and compute. Let's say, for instance, a customer with Elasticsearch or OpenSearch has tens of clusters and would like to scale and optimize them. We are running on top of Elasticsearch, and
+our solution is kind of the compute for Elasticsearch, so they can run and scale and reduce the infrastructure cost, because all of this is a question of how many machines you run, okay? You can get like 99.9% recall or accuracy,
+and you can get single-digit-millisecond latency, but in terms of the infrastructure costs, one of the biggest challenges today for enterprises is
+ the low margins due to infrastructure-heavy applications. So if you are running GPUs on the cloud, or big machines with high memory, it's great in terms of the dev team, we are getting great performance, high recall, but it's a different story when you move to discussing the business side.
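For context, the API surface such a plugin sits behind is the same one stock OpenSearch k-NN exposes: a `knn_vector` field in the index mapping and a `knn` clause in the query. A minimal sketch of those request bodies (the index layout, field names, and dimension are illustrative assumptions, not Searchium specifics):

```python
# Sketch of the request bodies used by stock OpenSearch k-NN.
# Field names ("embedding", "title") and the dimension are made up
# for illustration; a vector-search plugin sits behind this same API.

def knn_index_body(dim: int) -> dict:
    """Mapping for an index with one knn_vector field plus a text field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": dim},
                "title": {"type": "text"},
            }
        },
    }

def knn_query_body(vector: list[float], k: int = 10) -> dict:
    """Query body asking for the top-k nearest neighbours of `vector`."""
    return {"size": k, "query": {"knn": {"embedding": {"vector": vector, "k": k}}}}

# Against a live cluster these would be sent with the opensearch-py client, e.g.:
#   client.indices.create(index="episodes", body=knn_index_body(384))
#   client.search(index="episodes", body=knn_query_body(query_vec, k=5))
```

Because the plugin keeps this surface, swapping the compute backend stays invisible to the application code.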
+ So in terms of the margins and the profit of the companies, today it's a big issue, you know, with companies having the challenge of being profitable. So we are trying, and we have a few benchmarks, to reduce the infrastructure cost, so instead of 10 machines it can be two machines plus our accelerator,
+the APU, and with that you can scale it. One other interesting thing is that many companies are talking about the scale challenge, the one-billion-scale challenge. And, you know, data is exploding, right? Today there are
+80 zettabytes, and 10 years ago it was like 16, so essentially data is growing very fast, and I assume that in the next couple of years it will grow exponentially. 90% of the data that is created every year is unstructured, so, you know, this is the cliché of finding a needle in a haystack.
+ I assume more and more companies will face the scale challenge, above one billion, and I know that this is a challenge for some of the search engine companies, you know, scaling to hundreds of millions and billions. I had a conversation with one of the biggest e-commerce companies in Asia, and he told me: yeah, our challenge is to scale; they have like two billion in their
+index. And again, the infrastructure cost is a major issue. I read a post by Amazon's CFO like a week ago, and their focus right now is reducing the infrastructure cost for their customers. Any solution that can reduce the infrastructure cost for enterprises is, I think, a major issue for
+not only R&D teams but business and decision-makers in enterprises.
+Yeah, well, I will look for that link so we can also include it in the show notes. Some of our listeners, by the way, find it quite educational to have all these additional links and study materials, and I think we can also include that. That's super cool.
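The growth figures quoted above (roughly 16 zettabytes a decade ago versus about 80 today) can be sanity-checked with a back-of-the-envelope compound-growth calculation:

```python
# Back-of-the-envelope check of the quoted data-growth figures:
# growing from ~16 ZB to ~80 ZB over ~10 years implies this
# compound annual growth rate (the numbers are the episode's rough estimates).
start_zb, end_zb, years = 16, 80, 10
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"implied growth: {cagr:.1%} per year")  # roughly 17.5% per year
```

That steady ~17–18% a year is what makes the "billions of vectors" scale problem feel inevitable rather than exotic.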
+So in a way, your challenge is that, as you said, there are low margins for these big players, everyone tries to stay profitable, so you need to not only fit into that narrow window but also be profitable yourselves while providing that acceleration.
+Where do you think you stand on that today? Do you think there is still a lot to do, or is it already something that companies can try?
+Yeah, so today we have the first generation of our AI chip, the APU, and there is potential in improving our hardware and the bill of materials of our hardware. Next year we launch our second generation. For instance, if today, in terms of performance, we are talking about single- to double-digit-millisecond latency with one APU,
+next year we will launch our second generation and it will be more than 10x faster, so I think we are just scratching the surface. I think the hardware challenge is solved, but
+every week we have a new implementation improving our performance on the software layer. We have a few layers: the hardware layer, as I said, the first and second generation,
+and I believe there's huge potential in optimizing our software layer, because we are trying to reinvent search. So I think there's huge potential on the hardware side, but we didn't even start to optimize our software performance.
+Recently, like two weeks ago, we found a new implementation that reduces latency by 40%, and hopefully we will launch it to production in the upcoming weeks.
+So to your question: yeah, I think we are just at the beginning, and I believe we can optimize on both the hardware and the software layer, and hopefully it will be very profitable.
+Sounds great. In general, since I had exposure to it when we implemented the image search demo, it was quite interesting how easy it was to set up, and you don't need to worry about the hardware; it acts a little bit like a black box, but on the other hand it's very scalable.
+And you guys also published, I will make sure to link this, the so-called neural hashing algorithm, one of the algorithms that you have implemented; it would also be cool to drill into that direction. But in general it was fairly straightforward: how we upload the data, how it gets indexed, and then how we can query.
+I was just thinking to take it a little bit deeper: can you talk about some of the features? Many of the vector database players say, why do you need vector databases: because, first of all, if you took Faiss, for example, a similarity search framework, you wouldn't have the filter support, and
+in a real application, like a search app, you do need filters alongside whatever retriever you implement, keyword or vector. So can you talk a bit more about features and maybe also touch on the algorithms you guys have implemented?
+Yeah, so there are various types of features and implementations. We are working with the common algorithms: it can be flat search, for applications like face recognition where you need to
+search every record, and we have implementations of ANN algorithms, IVF and a new implementation of HNSW on our APU, plus pre-filtering and other features. One of
+the areas we would like to focus on, as you mentioned, is to simplify, so you can work with it as a black box: install the plugin and work with your
+technological stack and your search application, whether Elasticsearch, OpenSearch, or a vector search engine or vector database.
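To make the "flat search" idea concrete: a flat (brute-force) scan compares the query against every stored vector, which is why it suits recall-critical cases like face recognition but gets expensive at scale. A minimal NumPy sketch, with made-up data and dimensions purely for illustration:

```python
import numpy as np

def flat_search(index_vectors: np.ndarray, query: np.ndarray, k: int = 3):
    """Exact (flat) k-NN: score the query against every record."""
    # Normalize so the dot product equals cosine similarity.
    index_norm = index_vectors / np.linalg.norm(index_vectors, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm           # one score per stored vector
    top_k = np.argsort(-scores)[:k]            # highest similarity first
    return top_k, scores[top_k]

rng = np.random.default_rng(42)
vectors = rng.normal(size=(10_000, 128))       # 10k fake 128-dim embeddings
query = rng.normal(size=128)
ids, sims = flat_search(vectors, query, k=3)
print(ids, sims)
```

Because every record is scanned, recall is 100% by construction; ANN structures like IVF or HNSW trade a little recall for much lower latency at large scale, which is the trade-off discussed throughout this conversation.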
+And pre-filtering, as you mentioned, is supported. I think we should focus on simplifying, this is our biggest challenge: simplifying the work with our platform and creating more integrations and more connectors, not on the feature level but in terms of working with the ecosystem. This is our main
+focus right now, and again improving the performance, because we are, you know, customer-obsessed and we would like our customers to get the lowest infrastructure cost without sacrificing latency or accuracy.
+Yeah, that makes sense, especially doing this at scale. I know some of the players say it's very rare to have clients with more than, you know, dozens of millions of items, but you already mentioned today that there are clients
+with more than a billion items, maybe more than two billion. So do you think that going forward you will see more of this second type of player, with more data, or do you think there is still a use for dedicated hardware for smaller-scale players?
+Yeah, absolutely, I agree with you. In terms of the scale challenge, we are working with customers, and some of them, as you mentioned, have like tens of billions, but moving forward I think most of the enterprises and big companies will scale to one billion, 10 billion, and maybe even more.
+In terms of the ecosystem, my two cents is that companies are still using the concept of keyword search for some applications, TF-IDF and BM25. For some applications it's a good solution, and you know, you don't need a hammer for a screwdriver.
+So for some use cases keyword search is a good fit, and this is part of the concept of hybrid search. I think we are still at the beginning of the vector search revolution, if I may call it that, where you can apply the
+vector concept to any unstructured data. We usually talk about text, but there are broad areas that we could develop some cool stuff for: as I mentioned, genome, audio, and video search. We have a notebook on our website with video search, and again, there's a broad spectrum of applications where companies can develop cool stuff.
+And we are excited to see brilliant ideas and startups developing applications on top of these vector search capabilities.
+Yeah, you touch on a topic, by the way, which I also spoke about to some extent at the Haystack conference in Berlin, where I gave a keynote; I'll also make sure to give the link.
+Doug Turnbull said let's stop calling it vector search, and I'm really interested to hear your thoughts on that, because in principle, you and I being product managers, if we think about some problem to solve, let's say we want to introduce, I don't know, a question answering component into our search engine, it's not like we would say: oh, I know how to solve it, it's vector search.
+So instead he was saying, you know, let's call it a relevance application, or a relevance-oriented application. What's your broad take on this? You touched on this as well: people are not yet aware of this revolution; it's probably already happening, but people don't know what to do with it. Just yesterday I saw it from one user asking, can you actually explain what I can do with it?
+So do you think the world, let's say the world of software development, is still awakening to this new field?
+Yeah, absolutely, I fully agree with you. Essentially, when I'm talking with developers and I say we are working on vector search, they ask: vector what? And I think that goes for most developers.
+This is one of our challenges: to democratize AI and machine learning. In terms of technology, my perspective is that technology is an enabler; if the best solution is vector search, great, it can outperform on various applications. But from a product perspective, you are trying to create value. I think that the first lesson of product
+is to create value for your customer, simple as that, and what the technology is, what is under the hood, what is inside the black box, really doesn't matter.
+In terms of technology, it's a crazy time for developers, with the AI and machine learning revolution, Stable Diffusion, generative AI, and I've heard that OpenAI is planning to launch the new GPT-4.
+The pace of innovation is totally crazy, and it's really hard to keep it simple, to simplify, when people ask you. You know, there's the grandmother test for startups: state your idea in plain English. It's super challenging to simplify. So when developers or companies ask what vector search is, I use
+the example of transforming words, in the case of text, into numbers. It's easy for us to compare numbers, right? We know that three is close, or similar, to four. But what is the connection between king and queen, and how do you represent it as a number?
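One classic answer to that question, which the conversation develops next, is word embeddings: relationships between words become arithmetic between vectors. A toy sketch with hand-made 3-dimensional vectors, purely illustrative; real embeddings are learned by a model and have hundreds of dimensions:

```python
import numpy as np

# Hand-made toy embeddings (dimensions roughly: "royalty", "maleness", "fruitness").
vocab = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def nearest(target, exclude=()):
    """Return the vocab word whose vector is most cosine-similar to target."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vocab.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(candidates[w], target))

# The famous analogy: "king - man + woman" lands closest to "queen".
analogy = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(analogy, exclude=("king", "man", "woman")))  # → queen
```

The same nearest-neighbor step is what a vector search engine runs over millions of document or product embeddings instead of five toy words.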
+So, again, I'm trying to super-simplify: if you are trying to build an equation, what is the connection between king and queen? You can say king minus man plus woman equals queen. You are trying to represent it as numbers, and this is the concept of a vector:
+you are representing unstructured data, and it can be image embeddings and so on. And then, you know, for most of the tech companies today, their core technology is search. If you are looking for a movie, it's Netflix; if you would like to hear something cool, a podcast, you run a query on Spotify for "vector podcast" and you will get
+Dmitry's podcast. Or you would like to buy a dress, and you are trying to do it very simply. Let's take e-commerce, for instance: most consumers don't have the time and the patience to run, you know, SQL-like queries, filter this, filter that; they would like to write it in simple English, or in a different language, okay? So let's take an example:
+a girl in Asia would like to purchase a red and white short-sleeve dress. Up until the vector search revolution she didn't have the option to do that; usually she would get a similar result, but it would not always be a red and white short-sleeve dress. And what about the challenge of language? If her English is not so great and she would like to purchase something
+on Amazon, eBay, or any other e-commerce site, there's the challenge of language. Essentially, vector search is breaking the barrier of language, and the barrier of understanding what your question or your query is. So, and there's a broad discussion about democratizing AI, what is the added value of
+AI? You have autonomous cars, and this is great, but breaking barriers, the language barrier, with multilingual models, and other cool stuff: this is, I think, something that does real good for the ecosystem and for consumers, for people who have a language barrier.
+This is a great example of the added value of vector search. Yeah, I agree. For all the examples you brought up: if you look at how you would tackle, I don't know, "red short sleeve dress" with the more traditional approach, I guess you would need to build some kind of
+query understanding system. But even after you've built it, let's say you run filters on your data; that also means you need to have the filters. But what if you don't have the values in those fields in your documents? And this is, by the way, not
+unusual at all. I used to oversee a project in the e-commerce space where we would get data from new providers all the time, so one of the issues was to map them back to our ontology, but at the same time they would miss a lot of field values. So what would you put there? They give you some description, and then they give you the image and a set of images.
+So with the more traditional approach, keyword search, you're kind of stuck, right? What would you do there? People do solve it in some way, of course, but instead you could just apply vector search.
+Even though I say "just", there are still challenges, for example with model fine-tuning and things like that. Can you talk a bit more about the new challenges this field opens? Of course it gives us
+opportunities, it gives us advantages, it solves some painstaking issues that we had before, but what do we need to focus on going forward, once we deploy such systems, beyond the hardware part, also on the algorithm side?
+Yeah, this is a great question, because it resonates with one of your recent blog posts, where you cited Google's research about e-commerce companies losing 300 billion dollars due to search abandonment in the US alone.
+And again, this is a crazy number, because: I would like to buy a green polo shirt, I really want to buy a green polo shirt, and the e-commerce company has this green polo shirt in the warehouse, in the inventory, and they can't find the match. We can find the match for this challenge.
+This is a huge challenge, and again this is just one example, but our mission is to break this barrier for developers. It's not only e-commerce: expand it to searching logs, okay, if you would like to find
+an anomaly, or you would like to understand the root cause when you have software system logs and you would like to find some anomalies, or even fintech, e-commerce, and other areas; I think there's some cool stuff over there. So one way
+to move forward: if you would like to use, let's take for instance Siri, to buy with your voice. I would like to buy a red and white short-sleeve dress below 100 dollars.
+Okay, this is a simple thing for consumers, but technology-wise this is a huge challenge. The first challenge is to convert the audio to text, and today you can convert it directly to vectors, and then you can run this query. But again you need to filter, because if you want
+something that is below 100, that's usually the price field. I think this is the biggest challenge: that consumers, people, can communicate in a natural way with the computer, with audio, and say it very simply, without trying to run complicated SQL-like queries. So I think this is the holy grail of audio plus
+machine learning: to process this query, and when you would like to purchase it on a certain website, it gives you the place-order page with all of the details. You will see the type of the dress, it will give you the right result, and it will be below
+100 dollars. I think this is the way, or the direction, that we can move forward with this technology.
+Yeah, that sounds great. So in principle, so that our newer listeners will understand: vector search really opens doors to new types of data, new modalities, as they say.
+Previously it was maybe only the text modality. Even if you saw pictures on the monitor or on your phone as a response to your query, it doesn't necessarily mean that the query was really grasping the best parts of that image, like it would actually understand what is in the image. But with vector search you can also implement that, for example using a CLIP model or some other model where you can
+really infer meaning from that picture. And what you are saying is that in the future, and maybe this is to some extent happening already, we can also cross modalities between voice and text: what I'm saying can be represented as a vector, and then you can
+find an image or find a video. There are a lot of applications.
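The query discussed above, semantically matching a dress description while keeping only items below a price threshold, combines vector similarity with a structured pre-filter. A minimal NumPy sketch of that idea, with made-up embeddings and prices; a real system would use a learned text or image encoder and an ANN index rather than random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1_000, 64
catalog = rng.normal(size=(n_items, dim))           # fake product embeddings
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)
prices = rng.uniform(10, 300, size=n_items)         # fake prices in dollars

def filtered_knn(query_vec, max_price, k=5):
    """Pre-filter on price, then rank the survivors by cosine similarity."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    allowed = np.where(prices <= max_price)[0]      # structured pre-filter
    scores = catalog[allowed] @ query_vec           # similarity on survivors only
    return allowed[np.argsort(-scores)[:k]]

query = rng.normal(size=dim)                        # stands in for an encoded query
hits = filtered_knn(query, max_price=100.0)
assert all(prices[i] <= 100.0 for i in hits)        # every hit respects the filter
print(hits)
```

Pre-filtering first guarantees that every returned item satisfies the constraint; post-filtering an ANN result set instead can silently return fewer than k items, which is why filter support keeps coming up as a requirement for real search applications.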
+Yeah, totally, exactly. And you know, if you are on Instagram and you find a nice celeb wearing a nice dress and you would like to buy something similar, with image search you can say "find me this dress" and get the most relevant, the closest
+example of this dress. There are various options; this is just one example of how to monetize Instagram or TikTok, where consumers can watch their favorite
+celeb that they are following, and if they see something: this is great, I want to purchase it. In terms of monetization, and in terms of the added value for the customer, take this platform as
+an e-commerce platform. Okay, this is a fresh concept, but this is a way for companies to monetize the platform: it's not just social media, it can be e-commerce, and it can be super simple, because up until now they've seen a nice dress or a nice
+shirt but they cannot do anything with it; they cannot purchase it; they don't know how to explain to the machine, the computer, what type of clothing they would like to buy. So yeah, there are various options, and I'm eager to see what applications
+developers and entrepreneurs will develop with this technology. Yeah, that sounds great. One of the apps you just reminded me of: I think it was James Briggs who built a kind of simple demo
+using the recent model called Whisper from OpenAI. On YouTube today, how you find things is basically mostly based on titles; I believe this is what people type.
+But then he built a demo where he can land on the precise time code which contains the answer to your question. That could be really interesting; just thinking about it, it unlocks even more of what you said in the beginning, that we have this deluge of data and so on,
+but we are not able to unlock the data; it's just sitting there waiting to be discovered, so to say.
+Yeah, it's really cool. I wanted to spend a bit of time on the search topic itself. You did mention this search abandonment issue, which is in e-commerce, but in general, if we think about the search field
+on a much larger scale: I think Daniel Tunkelang also said that when a search engine doesn't work you are blamed, but when it does work you don't hear anything; people take it for granted, kind of like water from the tap, I guess, if that's the right analogy. So what do you think of the search field in general? Where do you think vector search fits in, and what's the role of the hybrid
+approach, where you have keywords, which are more familiar to users, versus vector search? Where would you take this yourself, as a product manager with unlimited resources?
+Yeah, this is an interesting question. I think that search is still an unsolved problem,
+and in order to find the right object, the most accurate result, we still have a lot of work to develop this ecosystem, to build the multimodal and multilingual models, and I think the big tech companies are doing a great job with this
+stuff, like OpenAI and the other folks. And hybrid search is a very interesting concept; I believe that for some applications it can be a good way to solve their challenges. But I think that one of the most important
+pillars, and again, this is something I've learned, is the concept
+of working backwards from the customer. We don't start from the solution: if we have a discussion with a customer, we ask what the problem is that they would like to solve, and you should be focused on that; what is the problem you would like to solve should be more than 50% of the discussion.
+And if it's not a good fit, it's not a good fit. If vector search technology is not a good fit, we say so to the customer; we are not trying to force our way into a space where keyword search is a great solution. So I think the focus should be on the problem space, trying to figure out what their pain point, or their customers' pain point, is.
+Is it accuracy? For some applications: we spoke with a fraud detection company, and for their use case keyword search was a good enough solution. Great, go ahead, we don't want to disturb you. So I think the focus should be on the problem and the challenge, and then
+on what they would like to achieve, or the potential of the solution. Sometimes we talk about recall: is it the most important parameter? For some customers 90% is good enough for the use case, but for mission-critical applications
+it should be 99.99%, right? So to some extent it's a question of what the problem is and what KPI you would like to achieve.
+Would you like to optimize recall? Great, we optimize it. Would you like to reduce the infrastructure cost with the same KPIs, recall of X, latency of Y, and that's good enough? Or maybe the priority is latency.
+For instance, Amazon published research that every 100 milliseconds of latency equals 1% of revenue, so if the revenue is $1 billion, then 100 milliseconds of latency equals 10 million dollars. This is a huge impact for companies.
+So I think the main question is: what is the problem that you would like to solve, what is the pain point, starting from the customer and then working backwards to find whether you have a good solution and whether that solution is a good fit.
+And again, there are various concepts: keyword search is a great solution, vector search is a great solution, and hybrid search is a good solution. The biggest question, I think, is what problem the customer would like to solve.
+Yeah, I think you put it really brilliantly, because it's very easy to get into the minutiae of tweaking things on the software side and saying, I have the best algorithm, or I have the fastest, or whatever,
+and forget what is perhaps the most important dimension for your customer; maybe it's power consumption, which we mentioned previously, or something else. But also what you said about
+how you can think the way Amazon did: they think big. They say, okay, of all these dollars we earned, how much did we actually waste on, you know, latency, and also how many clients did we lose,
+or potential clients? Because if the server doesn't respond soon enough, and it's only an average, right, 100 milliseconds; for some it may look closer to a second, including their own internet connection,
+they might just give up and say, ah, this is not working today, I will go check something else, I will forget what I wanted to buy. Yeah, this is very interesting.
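The back-of-the-envelope math behind the Amazon figure quoted above is easy to sanity-check. The 1%-per-100ms rule is the claim as quoted in the conversation, and the revenue number is just the example used there:

```python
annual_revenue = 1_000_000_000   # $1B, the example revenue from the conversation
loss_per_100ms = 0.01            # 1% of revenue per 100 ms of added latency (quoted rule)

def revenue_lost(extra_latency_ms: float) -> float:
    """Revenue lost for a given added latency, under the 1%-per-100ms rule."""
    return annual_revenue * loss_per_100ms * (extra_latency_ms / 100)

print(revenue_lost(100))   # → 10000000.0 (the $10M mentioned above)
print(revenue_lost(250))   # → 25000000.0
```

The model is linear, which real user behavior almost certainly is not, but even as a rough rule it shows why shaving tens of milliseconds can justify dedicated hardware.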
+Also, you brought up a topic behind the scenes on the role of the human in this whole loop; I want to pick your brain on that too. There is one direction in AI saying this is going to be a wholly automatic thing, you don't need to do anything, it will decide for you, which is also, by the way, a little bit worrisome if it's going to decide everything.
+But even coming back to earth: where do you see humans playing a role? In some sense we are slower than machines; in some sense I think we are still faster, for example in creating things, though even there the machines are tapping in. But connected with MLOps topics, machine learning operations, and connected with bias in the data that we collect to
+train models, or maybe some other dimension that I'm missing where you think the human is going to play a role, can you expand a little bit on that? Yeah, so actually I wrote on Medium about the MLOps challenge and the human in the loop, and what the place of the human in the loop is.
+Essentially, I believe that machine learning is a decision support system. I believe that the human has
+a huge, significant role in helping the machine decide, and a good way to automate processes is to use the machine and set a threshold. So for instance, if we're talking about a cybersecurity challenge, you can decide that below a threshold of 0.7
+is good enough and the SOC teams don't need to check that anomaly. And then again you are reducing
+the manpower cost, because you are automating, and you are sending the queries, the stream of data, to analysts who will fine-tune the model, so you can create a learning loop, right? It's a human in the loop: the human is giving feedback to the model, and then you can
+detect data drift. There are solutions that detect data drift and so on, but again, my two cents is that we are not ready yet for fully automated systems. And not all of the anomalies can be
+checked by a human either, because you have false-positive fatigue, or alert fatigue, in the cyber domain. So I believe in a combination, a hybrid model, where you define a certain threshold and send the rest to a human to run a sanity check. And you know, I've worked with many data scientists, and
+they always like to improve the state-of-the-art model, to improve the F1 score from 99 to 99.9.
+But what is the impact on the business? Is it important enough for the business to invest resources in this research, or not? Like five data scientists running, testing, and optimizing hyperparameters
+for months, but what is the impact on the business? So essentially I believe, and again it resonates with the search domain, that companies that are smart enough to integrate the human-in-the-loop mechanism, where they can
+measure KPIs like clicks on the first result, or any other KPIs: if it's a good model, great, we keep it; if it's not broken, don't touch it. But if something is not working in the mechanism, or there's a drift in the data, we can
+research it again and find the root cause, and then a human, or the machine, will detect it. So I believe this is a question of layers: you have the machine learning layer, and then
+the MLOps layer, with AutoML and hyperparameter optimization and data drift and model drift detection and other tools. But
+I don't think we are ready to fully automate the whole process. And yeah, this is a great question: for instance, autonomous cars, are we ready yet or not?
+I think this is the challenge of the data science ecosystem in the next years. Yeah, I think it's also about
+our psychological readiness to accept these solutions, right? Maybe previously, when we didn't have, let's say, elevators, everyone was walking up the stairs and no one really complained, but then when the first elevator arrived, maybe people were really
+looking at it wide-eyed: what is this, should they trust it, will I get stuck in it, or something.
+The same goes, I think, for what you just raised, the self-driving cars. I think it was Elon Musk saying, I don't remember the exact stats, but something like one in a thousand, so it avoids basically 999 bad cases. Would you trust that, or do you need an even bigger number, and so on,
+like the complete thousand, so it's never mistaken? But what about cases where it's hard to decide, like you are inevitably going to crash the car and now you need to choose where: into the human, or maybe, I don't know, into the tree, which hurts the driver, and stuff like that.
+So I think the same decisions that we would be making as humans, algorithms should now make, and what humans, or humanity, have a hard time with is probably accepting the fact that someone else is going to make the decision. Yeah, it's a revolution. You mentioned the elevator, but you know the famous Henry Ford story with the horses and the cars: why should we
+need cars, right? So it's a revolution. I think that most of
+the features we are working on improve people's quality of life: people can automate processes and focus on their family, and instead of doing some complicated task they can focus their
+time on innovation, or play football, or soccer, whatever they want; it makes our life easier to some extent. Yeah, and we believe collectively that vector search is going to help there.
+I'd also, of course, like to ask my philosophical question, but before that I was thinking: what do you think about the field in general, vector search, and maybe search and machine learning more broadly? A lot is happening, but what do you think is still missing from your perspective,
+something that maybe we need to fix
+to be more efficient? Yeah, I think it's education: simplifying the concept of search. I think this should be our main focus, education, generating content, and again, I really like the grandmother test: simplifying, not super complicated mathematical equations and so on.
+So I think it's education. The ecosystem is trying to generate high-quality content, videos, YouTube, blog posts, and we are trying to contribute to this effort as well.
+But are we doing enough? And you know, it can be at the level of high school or universities. But again, vector search is technology, it's an enabler; it's not the objective, it's not the target. In order to unlock the potential of vector search and machine learning and transformers and so on, all of this cool stuff, we should
+invest some of our resources in education, learning, and training, and unlock the potential so that every developer can build a vector-search-based application in every field: as I mentioned before, healthcare, fintech, education, manufacturing, or any other domain where someone is eager to solve some problem. I think we should simplify it, similar to the AutoML revolution: instead of processing and labeling images yourself,
+you have an AutoML tool or solution. You provision the data, label it, and then under the hood the AutoML model runs the experiments, finds the right algorithm, finds the right hyperparameters and optimizes them. You can define which KPI you would like to optimize,
+F1 score, or recall, or whatever, and then you get the model, and if you would like to deep-dive, you get the code. So, generating models with low code: this is another area we should focus on. But to some extent, I believe it comes down to education, training, and generating high-quality content.
+That should be our focus right now.
+Yeah, I think you put it really well, and I would probably even add to this: yes, there is content which kind of promotes someone's solution, but at the same time you really want to educate; why should people even care about your solution? You need to take a few steps back and explain what we are talking about,
+what problems exist that you are targeting. If I ask the same question of myself: I still see a lot of content which is much more promotional than it should be, because at the beginning of this revolution you still need to explain what is happening, what the hell is going on, and why. Because the reaction from the incumbent players could be that they will say: no, this is not
+where things are going, and they will go back to their clients and say the same. But you should not position it that way; you should explain, as you said, starting from the problem, from what your actual business and product target is. + And I guess this is not something that many engineers ask themselves. Some of them do, some of the best that I know, and some of the best data scientists do as well: they don't code before they have understood what is being asked of them. I think it's an amazing skill, and this is exactly where education also helps, like why should data scientists or engineers care about this new field. +Yeah, this is super important, and honestly, when I say this, I mean internally as well: we should improve the quality of the content and not try to sell our solution, just explain it for the software developer without a background in machine learning, + and simplify it for him, explain what the concept is, what the trade-offs are, and give him the option to understand what is happening and decide what the best tool is for him. Is it a screwdriver, is it a hammer? He will understand the bits and bytes and the trade-offs, and we give him the full picture of what the problems are. +Yeah, I think that's great: the pros and cons of every solution, and then you take the decision. +Yeah, exactly, and I think some of the players are doing a really great job there, and I am also looking forward to seeing more blog posts. You already mentioned the notebooks that you guys are publishing on your website, and I believe that was the Searchium website, right? +Now that I have learned that you really care about the topic, I think it's important to create and share, and maybe educate the educators and set the example. So I think this is really great.
+Yeah, one example of a great blog post that one of our software developers wrote is how to optimize OpenSearch workloads. + So again, we have a plugin on top of it, but he wrote about what the options are without writing about our solution, about the options out there that our customers can use to optimize it. And another interesting blog post that we will publish soon is about benchmarking. One of the things that we should improve in our ecosystem is to decide on a standard tool that will + help us decide what the KPI and the benchmark are. There are various benchmarks out there; we are familiar with Rally, the Elastic benchmark, but I haven't seen a good industry-standard benchmark in vector search. There was the Big ANN challenge a year or two ago, but again, I don't think we have a +good tool today. So one of our developers wrote about how to run the benchmark tool, OpenSearch Benchmark: how to use this benchmark and what the +bits and bytes and tips are for understanding the benchmark tools. So yeah, I think starting from the education and then offering customers to check your solution sounds great. I think maybe even by the time this podcast is published we will have that new blog as well. + Hey, I'm really excited to have been chatting with you today. We have touched a lot of deep topics, and I'm sure we could have gone on for longer, but I was also really curious to ask you this magical why question. The same way as you said, don't think about software, think about the problem that you're solving; the reason I'm asking the why question is because I truly believe
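Recall against exact brute-force results is the KPI that most ANN benchmarks, including the Big ANN challenge mentioned in this conversation, report alongside throughput. A minimal sketch of how recall@k is computed, assuming random NumPy vectors as a stand-in corpus:

```python
# Minimal sketch of the recall@k KPI used by ANN benchmarks: compare an
# (approximate) result list against exact brute-force neighbors.
# Random vectors stand in for a real corpus.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 32)).astype(np.float32)
queries = rng.normal(size=(10, 32)).astype(np.float32)
k = 10

# Exact ground truth: the k nearest neighbors by Euclidean distance.
dists = np.linalg.norm(corpus[None, :, :] - queries[:, None, :], axis=2)
ground_truth = np.argsort(dists, axis=1)[:, :k]

def recall_at_k(approx_ids, true_ids):
    """Fraction of true top-k neighbors found by the approximate search."""
    hits = sum(len(set(a) & set(t)) for a, t in zip(approx_ids, true_ids))
    return hits / (len(true_ids) * k)

# A perfect "index" returns the ground truth itself, giving recall 1.0.
print(recall_at_k(ground_truth, ground_truth))  # 1.0
```

In a real benchmark the approximate IDs would come from the index under test (HNSW, IVF, etc.), and recall would be plotted against queries per second at different index parameters.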
+ that if you don't understand why you do things, then you're kind of flying through things, and you might well regret it sometime later. Maybe you train the muscle, but still, I don't think it's a sensible approach in your life. So I'm really interested: given all your experience in machine learning, product management and software development, why are you excited to work on vector search, or whatever it is that you do day to day? +Yeah, I think this is a great question. I really like the Y Combinator accelerator approach to building products: build something that customers like, or love. And essentially we are trying to build some cool stuff and make + people's lives easier and happier. I gave the example of the girl from Asia, so this is, I think, one part of our mission. But it's not only the girl from Asia who would like to purchase a red short-sleeve dress; it's the DevOps engineer who is trying to find the right log, and instead of working on it for hours it will take him +seconds. Okay, so essentially our mission, and I'm excited that I'm working on this topic, is to make the lives of consumers, businesses and enterprises easier. So I think it's a very simple statement of the why, and I believe that this is my mission, or our mission. +And to some extent I think this is also a doing-good perspective. You know, you have, say, a gambling company +building some stuff and building applications, and my approach is building things that will help humanity. So I'm excited that these are the things I'm working on. By the way, this was also the case in my previous startup, when we tried to save lives, right, with drowning detection +in a residential pool or open water, to save lives. And if we can save lives, maybe with health applications, detecting cancer with image embeddings or some other cool stuff, I'm super excited that this is the domain that we are working in.
+Yeah, this is very relatable, and it's fantastic that you're bringing this up: how we can actually improve life besides building great products, or products that sell. +This is amazing. And to conclude, is there any announcement that you would like to make from your side, from Searchium.ai? Yeah, so I'm very excited, because we are building +some cool stuff. The first thing: we launched our Searchium.ai platform, where we offer customers a free tier to check our platform. And again, it's not +fully working, we are not supporting all of the features yet, but this is very important: it is very important for us to get your feedback. So I encourage you to check it and to send me an email, or send my team an email, or go through our support, and +give your feedback. Don't be gentle with us; we are trying to build things that developers would like, and we are very focused on the customer. So this is the first announcement, and every piece of feedback is valuable. Next year we will launch our second generation, where we can offer +better performance, more than 10x; we have a few new implementations in terms of performance. And hopefully at the beginning of 2023 we will release our Python compiler and some other cool stuff. So we are working on a few vectors, if I may use the word +vectors: on the software, on the hardware, on the system, on the user experience and the user interface, and on simplifying it. So these are the things that we are working on right now, and we will be happy to be in touch. +Sounds great, thank you. It looks like your plate is full of really exciting things, so all the best to you and your team; I know some of them. It's amazing that you guys are building this, and I'm really looking forward to gen 2 of the APU hardware as well. +Yeah, and all the best, we will stay in touch. Thank you very much for this episode.
Yeah, thank you very much, Dmitry, it was a pleasure talking with you. Super interesting stuff; I could talk for hours about this domain. +I'm excited to be working in this domain and really looking forward to hearing from the community. Fantastic, thanks so much. Thank you for now. Thank you very much, bye bye. music playing \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md new file mode 100644 index 0000000..bd19bc9 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/yury-malkov-staff-engineer-twitter-author-of-the-most-adopted-ann-algorithm-hnsw.md @@ -0,0 +1,3878 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=gvgD98jWrJM

Topics:

00:00 + Introduction

01:04 Yury’s background in laser physics, computer vision and + startups

05:14 How Yury entered the field of nearest neighbor search and his + impression of it

09:03 “Not all Small Worlds are Navigable”

10:10 Gentle + introduction to the theory of Navigable Small World graphs and related concepts

13:55 + Further clarification on the input constraints for the NN search algorithm design

15:03 + What did not work in the NSW algorithm and how Yury set out to invent the new algorithm + called HNSW

24:06 Collaboration with Leo Boytsov on integrating HNSW in nmslib

26:01 + Differences between HNSW and NSW

27:55 Does algorithm always converge?

31:56 + How FAISS’s implementation is different from the original HNSW

33:13 Could + Yury predict that his algorithm would be implemented in so many frameworks and vector + databases in languages like Go and Rust?

36:51 How our perception of high-dimensional + spaces change compared to 3D?

38:30 ANN Benchmarks

41:33 Feeling proud + of the invention and publication process during 2,5 years!

48:10 Yury’s effort + to maintain HNSW and its GitHub community and the algorithm’s design principles

53:29 + Dmitry’s ANN algorithm KANNDI, which uses HNSW as a building block

1:02:16 + Java / Python Virtual Machines, profiling and benchmarking. “Your analysis of performance + contradicts the profiler”

1:05:36 What are Yury’s hopes and goals for HNSW + and role of symbolic filtering in ANN in general

1:13:05 The future of ANN + field: search inside a neural network, graph ANN

1:15:14 Multistage ranking + with graph based nearest neighbor search

1:18:18 Do we have the “best” ANN + algorithm? How ANN algorithms influence each other

1:21:27 Yury’s plans on + publishing his ideas

1:23:42 The intriguing question of Why

Show notes:

- + HNSW library: https://github.com/nmslib/hnswlib/

- + HNSW paper Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate + nearest neighbor search using hierarchical navigable small world graphs. TPAMI, + 42(4), 824-836. (arxiv:1603.09320)

- NSW paper Malkov, Y., Ponomarenko, A., + Logvinov, A., & Krylov, V. (2014). Approximate nearest neighbor algorithm based + on navigable small world graphs. Information Systems, 45, 61-68.

- Yury Lifshits’s + paper: https://yury.name/papers/lifshits2009combinatorial.pdf

- + Sergey Brin’s work in nearest neighbour search: GNAT - Geometric Near-neighbour + Access Tree: [CiteSeerX — Near neighbor search in large metric spaces](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.173.8156)

- + Podcast with Leo Boytsov: https://rare-technologies.com/rrp-4-leo-boytsov-knn-search/

- + FALCONN algorithm: https://github.com/falconn-lib/falconn

- + Mentioned navigable small world papers:

Kleinberg, J. M. (2000). Navigation + in a small world. Nature, 406(6798), 845-845.;

Boguna, M., Krioukov, D., & + Claffy, K. C. (2009). Navigability of complex networks. Nature Physics, 5(1), 74-80.

' +image_url: https://media.rss.com/vector-podcast/20220131_090127_be85ef047356dd187c4b22fb3a9286be.jpg +pub_date: Mon, 31 Jan 2022 09:41:27 GMT +title: Yury Malkov - Staff Engineer, Twitter - Author of the most adopted ANN algorithm + HNSW +url: https://rss.com/podcasts/vector-podcast/377082 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 21.44, "text": " Hello, + vector podcast is here and today we''re going to be talking to the author of H&SW", + "tokens": [50364, 2425, 11, 8062, 7367, 307, 510, 293, 965, 321, 434, 516, 281, + 312, 1417, 281, 264, 3793, 295, 389, 5, 50, 54, 51436], "temperature": 0.0, "avg_logprob": + -0.4366465210914612, "compression_ratio": 1.1458333333333333, "no_speech_prob": + 0.09332770109176636}, {"id": 1, "seek": 0, "start": 21.44, "end": 25.44, "text": + " library and algorithm.", "tokens": [51436, 6405, 293, 9284, 13, 51636], "temperature": + 0.0, "avg_logprob": -0.4366465210914612, "compression_ratio": 1.1458333333333333, + "no_speech_prob": 0.09332770109176636}, {"id": 2, "seek": 2544, "start": 25.44, + "end": 32.0, "text": " It''s one of the best algorithms out there, one of the most + used algorithms in vector search.", "tokens": [50364, 467, 311, 472, 295, 264, 1151, + 14642, 484, 456, 11, 472, 295, 264, 881, 1143, 14642, 294, 8062, 3164, 13, 50692], + "temperature": 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, + "no_speech_prob": 0.04719945415854454}, {"id": 3, "seek": 2544, "start": 32.0, "end": + 34.160000000000004, "text": " And today I''m talking to Yuri Malkov.", "tokens": + [50692, 400, 965, 286, 478, 1417, 281, 33901, 376, 667, 5179, 13, 50800], "temperature": + 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, + "no_speech_prob": 0.04719945415854454}, {"id": 4, "seek": 2544, "start": 34.160000000000004, + "end": 36.160000000000004, "text": " How are you doing?", "tokens": [50800, 1012, + 366, 291, 884, 30, 50900], "temperature": 0.0, 
"avg_logprob": -0.4448713302612305, + "compression_ratio": 1.5817307692307692, "no_speech_prob": 0.04719945415854454}, + {"id": 5, "seek": 2544, "start": 36.160000000000004, "end": 37.160000000000004, + "text": " Hi.", "tokens": [50900, 2421, 13, 50950], "temperature": 0.0, "avg_logprob": + -0.4448713302612305, "compression_ratio": 1.5817307692307692, "no_speech_prob": + 0.04719945415854454}, {"id": 6, "seek": 2544, "start": 37.160000000000004, "end": + 38.160000000000004, "text": " Hi.", "tokens": [50950, 2421, 13, 51000], "temperature": + 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, + "no_speech_prob": 0.04719945415854454}, {"id": 7, "seek": 2544, "start": 38.160000000000004, + "end": 39.160000000000004, "text": " Hi.", "tokens": [51000, 2421, 13, 51050], "temperature": + 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, + "no_speech_prob": 0.04719945415854454}, {"id": 8, "seek": 2544, "start": 39.160000000000004, + "end": 44.24, "text": " So yeah, my name is Yuri Malkov.", "tokens": [51050, 407, + 1338, 11, 452, 1315, 307, 33901, 376, 667, 5179, 13, 51304], "temperature": 0.0, + "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, "no_speech_prob": + 0.04719945415854454}, {"id": 9, "seek": 2544, "start": 44.24, "end": 46.36, "text": + " Currently I''m working at Twitter.", "tokens": [51304, 19964, 286, 478, 1364, + 412, 5794, 13, 51410], "temperature": 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": + 1.5817307692307692, "no_speech_prob": 0.04719945415854454}, {"id": 10, "seek": 2544, + "start": 46.36, "end": 53.32, "text": " There''s a staff from my engineer and the + content understanding and research and recommender", "tokens": [51410, 821, 311, + 257, 3525, 490, 452, 11403, 293, 264, 2701, 3701, 293, 2132, 293, 2748, 260, 51758], + "temperature": 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, + "no_speech_prob": 
0.04719945415854454}, {"id": 11, "seek": 2544, "start": 53.32, + "end": 54.32, "text": " systems.", "tokens": [51758, 3652, 13, 51808], "temperature": + 0.0, "avg_logprob": -0.4448713302612305, "compression_ratio": 1.5817307692307692, + "no_speech_prob": 0.04719945415854454}, {"id": 12, "seek": 5432, "start": 54.56, + "end": 60.24, "text": " Yeah, please know that during discussion I don''t represent + like Twitter''s point of view.", "tokens": [50376, 865, 11, 1767, 458, 300, 1830, + 5017, 286, 500, 380, 2906, 411, 5794, 311, 935, 295, 1910, 13, 50660], "temperature": + 0.0, "avg_logprob": -0.3138644748263889, "compression_ratio": 1.53125, "no_speech_prob": + 0.004192720633000135}, {"id": 13, "seek": 5432, "start": 60.24, "end": 63.16, "text": + " The views are of my own.", "tokens": [50660, 440, 6809, 366, 295, 452, 1065, 13, + 50806], "temperature": 0.0, "avg_logprob": -0.3138644748263889, "compression_ratio": + 1.53125, "no_speech_prob": 0.004192720633000135}, {"id": 14, "seek": 5432, "start": + 63.16, "end": 66.08, "text": " Yeah, so it''s great.", "tokens": [50806, 865, 11, + 370, 309, 311, 869, 13, 50952], "temperature": 0.0, "avg_logprob": -0.3138644748263889, + "compression_ratio": 1.53125, "no_speech_prob": 0.004192720633000135}, {"id": 15, + "seek": 5432, "start": 66.08, "end": 69.56, "text": " So yeah, you already began + introducing yourself.", "tokens": [50952, 407, 1338, 11, 291, 1217, 4283, 15424, + 1803, 13, 51126], "temperature": 0.0, "avg_logprob": -0.3138644748263889, "compression_ratio": + 1.53125, "no_speech_prob": 0.004192720633000135}, {"id": 16, "seek": 5432, "start": + 69.56, "end": 76.03999999999999, "text": " So I was wondering if you could tell + me a bit about yourself, your background and then", "tokens": [51126, 407, 286, + 390, 6359, 498, 291, 727, 980, 385, 257, 857, 466, 1803, 11, 428, 3678, 293, 550, + 51450], "temperature": 0.0, "avg_logprob": -0.3138644748263889, "compression_ratio": + 1.53125, "no_speech_prob": 
0.004192720633000135}, {"id": 17, "seek": 5432, "start": + 76.03999999999999, "end": 80.0, "text": " maybe we can also move into discussing + the algorithm itself.", "tokens": [51450, 1310, 321, 393, 611, 1286, 666, 10850, + 264, 9284, 2564, 13, 51648], "temperature": 0.0, "avg_logprob": -0.3138644748263889, + "compression_ratio": 1.53125, "no_speech_prob": 0.004192720633000135}, {"id": 18, + "seek": 5432, "start": 80.0, "end": 81.64, "text": " Okay, sure.", "tokens": [51648, + 1033, 11, 988, 13, 51730], "temperature": 0.0, "avg_logprob": -0.3138644748263889, + "compression_ratio": 1.53125, "no_speech_prob": 0.004192720633000135}, {"id": 19, + "seek": 8164, "start": 82.12, "end": 90.56, "text": " Yeah, so my trajectory of + moving to ML is quite typical to Russia.", "tokens": [50388, 865, 11, 370, 452, + 21512, 295, 2684, 281, 21601, 307, 1596, 7476, 281, 6797, 13, 50810], "temperature": + 0.0, "avg_logprob": -0.32309078034900485, "compression_ratio": 1.3734177215189873, + "no_speech_prob": 0.005141934845596552}, {"id": 20, "seek": 8164, "start": 90.56, + "end": 98.84, "text": " So yeah, I got good physics education in Nizhny Novgorod + and there I did the PhD in laser", "tokens": [50810, 407, 1338, 11, 286, 658, 665, + 10649, 3309, 294, 426, 590, 71, 1634, 31948, 26465, 378, 293, 456, 286, 630, 264, + 14476, 294, 12530, 51224], "temperature": 0.0, "avg_logprob": -0.32309078034900485, + "compression_ratio": 1.3734177215189873, "no_speech_prob": 0.005141934845596552}, + {"id": 21, "seek": 8164, "start": 98.84, "end": 101.0, "text": " physics.", "tokens": + [51224, 10649, 13, 51332], "temperature": 0.0, "avg_logprob": -0.32309078034900485, + "compression_ratio": 1.3734177215189873, "no_speech_prob": 0.005141934845596552}, + {"id": 22, "seek": 8164, "start": 101.0, "end": 105.56, "text": " So there I was + doing experiments on teravat lasers.", "tokens": [51332, 407, 456, 286, 390, 884, + 12050, 322, 1796, 706, 267, 37948, 13, 51560], "temperature": 0.0, 
"avg_logprob": + -0.32309078034900485, "compression_ratio": 1.3734177215189873, "no_speech_prob": + 0.005141934845596552}, {"id": 23, "seek": 10556, "start": 105.56, "end": 112.56, + "text": " So that was fun and like that part of physics is like considered to be + like sexy part,", "tokens": [50364, 407, 300, 390, 1019, 293, 411, 300, 644, 295, + 10649, 307, 411, 4888, 281, 312, 411, 13701, 644, 11, 50714], "temperature": 0.0, + "avg_logprob": -0.3145904979486575, "compression_ratio": 1.6386138613861385, "no_speech_prob": + 0.011874135583639145}, {"id": 24, "seek": 10556, "start": 112.56, "end": 116.44, + "text": " similar to computer vision in machine learning.", "tokens": [50714, 2531, + 281, 3820, 5201, 294, 3479, 2539, 13, 50908], "temperature": 0.0, "avg_logprob": + -0.3145904979486575, "compression_ratio": 1.6386138613861385, "no_speech_prob": + 0.011874135583639145}, {"id": 25, "seek": 10556, "start": 116.44, "end": 119.68, + "text": " And I was lucky to have good supervisors.", "tokens": [50908, 400, 286, + 390, 6356, 281, 362, 665, 42218, 13, 51070], "temperature": 0.0, "avg_logprob": + -0.3145904979486575, "compression_ratio": 1.6386138613861385, "no_speech_prob": + 0.011874135583639145}, {"id": 26, "seek": 10556, "start": 119.68, "end": 125.48, + "text": " So one of my supervisor which was like mostly a supervisor of paper.", + "tokens": [51070, 407, 472, 295, 452, 24610, 597, 390, 411, 5240, 257, 24610, 295, + 3035, 13, 51360], "temperature": 0.0, "avg_logprob": -0.3145904979486575, "compression_ratio": + 1.6386138613861385, "no_speech_prob": 0.011874135583639145}, {"id": 27, "seek": + 10556, "start": 125.48, "end": 126.80000000000001, "text": " So he helped me.", + "tokens": [51360, 407, 415, 4254, 385, 13, 51426], "temperature": 0.0, "avg_logprob": + -0.3145904979486575, "compression_ratio": 1.6386138613861385, "no_speech_prob": + 0.011874135583639145}, {"id": 28, "seek": 10556, "start": 126.80000000000001, "end": + 129.44, "text": " Is now the head 
of Russian Academy.", "tokens": [51426, 1119, + 586, 264, 1378, 295, 7220, 11735, 13, 51558], "temperature": 0.0, "avg_logprob": + -0.3145904979486575, "compression_ratio": 1.6386138613861385, "no_speech_prob": + 0.011874135583639145}, {"id": 29, "seek": 10556, "start": 129.44, "end": 133.2, + "text": " So yeah, I had good supervisors.", "tokens": [51558, 407, 1338, 11, 286, + 632, 665, 42218, 13, 51746], "temperature": 0.0, "avg_logprob": -0.3145904979486575, + "compression_ratio": 1.6386138613861385, "no_speech_prob": 0.011874135583639145}, + {"id": 30, "seek": 13320, "start": 134.2, "end": 141.51999999999998, "text": " In + addition to physics, I was concurrently working part time in a startup that was + building", "tokens": [50414, 682, 4500, 281, 10649, 11, 286, 390, 37702, 356, 1364, + 644, 565, 294, 257, 18578, 300, 390, 2390, 50780], "temperature": 0.0, "avg_logprob": + -0.26957862782028486, "compression_ratio": 1.4036144578313252, "no_speech_prob": + 0.003964224364608526}, {"id": 31, "seek": 13320, "start": 141.51999999999998, "end": + 149.64, "text": " distributed scalable search systems based on insights from real + networks.", "tokens": [50780, 12631, 38481, 3164, 3652, 2361, 322, 14310, 490, 957, + 9590, 13, 51186], "temperature": 0.0, "avg_logprob": -0.26957862782028486, "compression_ratio": + 1.4036144578313252, "no_speech_prob": 0.003964224364608526}, {"id": 32, "seek": + 13320, "start": 149.64, "end": 159.95999999999998, "text": " Yeah, that worked ended + up in several papers on predecessor of H&W.", "tokens": [51186, 865, 11, 300, 2732, + 4590, 493, 294, 2940, 10577, 322, 34991, 295, 389, 5, 54, 13, 51702], "temperature": + 0.0, "avg_logprob": -0.26957862782028486, "compression_ratio": 1.4036144578313252, + "no_speech_prob": 0.003964224364608526}, {"id": 33, "seek": 15996, "start": 159.96, + "end": 169.56, "text": " And the startup, yeah, unfortunately the startup was closed + before even I got PhD.", "tokens": [50364, 400, 264, 18578, 11, 1338, 
11, 7015, + 264, 18578, 390, 5395, 949, 754, 286, 658, 14476, 13, 50844], "temperature": 0.0, + "avg_logprob": -0.2619208017985026, "compression_ratio": 1.4, "no_speech_prob": + 0.015900060534477234}, {"id": 34, "seek": 15996, "start": 169.56, "end": 181.20000000000002, + "text": " So yeah, I decided to focus on physics after that, but after I got my + PhD degree in physics.", "tokens": [50844, 407, 1338, 11, 286, 3047, 281, 1879, + 322, 10649, 934, 300, 11, 457, 934, 286, 658, 452, 14476, 4314, 294, 10649, 13, + 51426], "temperature": 0.0, "avg_logprob": -0.2619208017985026, "compression_ratio": + 1.4, "no_speech_prob": 0.015900060534477234}, {"id": 35, "seek": 18120, "start": + 181.2, "end": 190.51999999999998, "text": " So I, like there was a choice for me + like what to do next and to proceed with career", "tokens": [50364, 407, 286, 11, + 411, 456, 390, 257, 3922, 337, 385, 411, 437, 281, 360, 958, 293, 281, 8991, 365, + 3988, 50830], "temperature": 0.0, "avg_logprob": -0.31042656341156405, "compression_ratio": + 1.5696969696969696, "no_speech_prob": 0.030610591173171997}, {"id": 36, "seek": + 18120, "start": 190.51999999999998, "end": 191.51999999999998, "text": " and physics.", + "tokens": [50830, 293, 10649, 13, 50880], "temperature": 0.0, "avg_logprob": -0.31042656341156405, + "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.030610591173171997}, + {"id": 37, "seek": 18120, "start": 191.51999999999998, "end": 195.04, "text": " + I had to go abroad, like I didn''t want to go abroad.", "tokens": [50880, 286, 632, + 281, 352, 12637, 11, 411, 286, 994, 380, 528, 281, 352, 12637, 13, 51056], "temperature": + 0.0, "avg_logprob": -0.31042656341156405, "compression_ratio": 1.5696969696969696, + "no_speech_prob": 0.030610591173171997}, {"id": 38, "seek": 18120, "start": 195.04, + "end": 197.44, "text": " I want to stay in Nizhny Novgorod.", "tokens": [51056, + 286, 528, 281, 1754, 294, 426, 590, 71, 1634, 31948, 26465, 378, 13, 51176], "temperature": + 
0.0, "avg_logprob": -0.31042656341156405, "compression_ratio": 1.5696969696969696, + "no_speech_prob": 0.030610591173171997}, {"id": 39, "seek": 18120, "start": 197.44, + "end": 204.83999999999997, "text": " So I decided to just like switch directions + and to network science there.", "tokens": [51176, 407, 286, 3047, 281, 445, 411, + 3679, 11095, 293, 281, 3209, 3497, 456, 13, 51546], "temperature": 0.0, "avg_logprob": + -0.31042656341156405, "compression_ratio": 1.5696969696969696, "no_speech_prob": + 0.030610591173171997}, {"id": 40, "seek": 20484, "start": 204.84, "end": 213.0, + "text": " And then I got a really good grant from the Russian fund.", "tokens": + [50364, 400, 550, 286, 658, 257, 534, 665, 6386, 490, 264, 7220, 2374, 13, 50772], + "temperature": 0.0, "avg_logprob": -0.383243845469916, "compression_ratio": 1.4325842696629214, + "no_speech_prob": 0.01627843640744686}, {"id": 41, "seek": 20484, "start": 213.0, + "end": 217.52, "text": " Alpha Phi, which now is not present anymore.", "tokens": + [50772, 20588, 41435, 11, 597, 586, 307, 406, 1974, 3602, 13, 50998], "temperature": + 0.0, "avg_logprob": -0.383243845469916, "compression_ratio": 1.4325842696629214, + "no_speech_prob": 0.01627843640744686}, {"id": 42, "seek": 20484, "start": 217.52, + "end": 221.36, "text": " So I could do like research by my own.", "tokens": [50998, + 407, 286, 727, 360, 411, 2132, 538, 452, 1065, 13, 51190], "temperature": 0.0, "avg_logprob": + -0.383243845469916, "compression_ratio": 1.4325842696629214, "no_speech_prob": 0.01627843640744686}, + {"id": 43, "seek": 20484, "start": 221.36, "end": 224.6, "text": " Like this pretty + good salary.", "tokens": [51190, 1743, 341, 1238, 665, 15360, 13, 51352], "temperature": + 0.0, "avg_logprob": -0.383243845469916, "compression_ratio": 1.4325842696629214, + "no_speech_prob": 0.01627843640744686}, {"id": 44, "seek": 20484, "start": 224.6, + "end": 230.84, "text": " And yeah, I also joined companies, like computer vision + 
companies to get to insight", "tokens": [51352, 400, 1338, 11, 286, 611, 6869, 3431, + 11, 411, 3820, 5201, 3431, 281, 483, 281, 11269, 51664], "temperature": 0.0, "avg_logprob": + -0.383243845469916, "compression_ratio": 1.4325842696629214, "no_speech_prob": 0.01627843640744686}, + {"id": 45, "seek": 23084, "start": 230.84, "end": 236.56, "text": " into why people + actually use like similarities to your algorithm and machine learning.", "tokens": + [50364, 666, 983, 561, 767, 764, 411, 24197, 281, 428, 9284, 293, 3479, 2539, 13, + 50650], "temperature": 0.0, "avg_logprob": -0.42778515171360326, "compression_ratio": + 1.5388349514563107, "no_speech_prob": 0.0028358076233416796}, {"id": 46, "seek": + 23084, "start": 236.56, "end": 245.76, "text": " And I worked at an television and + later anti-club, which is the company that is like doing big", "tokens": [50650, + 400, 286, 2732, 412, 364, 8815, 293, 1780, 6061, 12, 40607, 11, 597, 307, 264, 2237, + 300, 307, 411, 884, 955, 51110], "temperature": 0.0, "avg_logprob": -0.42778515171360326, + "compression_ratio": 1.5388349514563107, "no_speech_prob": 0.0028358076233416796}, + {"id": 47, "seek": 23084, "start": 245.76, "end": 250.52, "text": " brother for + Moscow, like Moscow surveillance.", "tokens": [51110, 3708, 337, 18298, 11, 411, + 18298, 18475, 13, 51348], "temperature": 0.0, "avg_logprob": -0.42778515171360326, + "compression_ratio": 1.5388349514563107, "no_speech_prob": 0.0028358076233416796}, + {"id": 48, "seek": 23084, "start": 250.52, "end": 259.24, "text": " And later I + joined some some VIS Center in Moscow and I worked with like Victor Limpitsky", + "tokens": [51348, 400, 1780, 286, 6869, 512, 512, 691, 2343, 5169, 294, 18298, 293, + 286, 2732, 365, 411, 15777, 441, 8814, 1208, 4133, 51784], "temperature": 0.0, "avg_logprob": + -0.42778515171360326, "compression_ratio": 1.5388349514563107, "no_speech_prob": + 0.0028358076233416796}, {"id": 49, "seek": 25924, "start": 259.24, "end": 270.56, + "text": " 
who is one of the well known personas in Russia and in 2019 I moved to + US and now I", "tokens": [50364, 567, 307, 472, 295, 264, 731, 2570, 12019, 294, + 6797, 293, 294, 6071, 286, 4259, 281, 2546, 293, 586, 286, 50930], "temperature": + 0.0, "avg_logprob": -0.37339091691814486, "compression_ratio": 1.3798882681564246, + "no_speech_prob": 0.02167615294456482}, {"id": 50, "seek": 25924, "start": 270.56, + "end": 280.0, "text": " work in Twitter to recommend their systems and content understanding, + like board models.", "tokens": [50930, 589, 294, 5794, 281, 2748, 641, 3652, 293, + 2701, 3701, 11, 411, 3150, 5245, 13, 51402], "temperature": 0.0, "avg_logprob": + -0.37339091691814486, "compression_ratio": 1.3798882681564246, "no_speech_prob": + 0.02167615294456482}, {"id": 51, "seek": 25924, "start": 280.0, "end": 281.0, "text": + " Oh yeah.", "tokens": [51402, 876, 1338, 13, 51452], "temperature": 0.0, "avg_logprob": + -0.37339091691814486, "compression_ratio": 1.3798882681564246, "no_speech_prob": + 0.02167615294456482}, {"id": 52, "seek": 25924, "start": 281.0, "end": 285.84000000000003, + "text": " So you probably also use nearest neighbor search in your work or.", "tokens": + [51452, 407, 291, 1391, 611, 764, 23831, 5987, 3164, 294, 428, 589, 420, 13, 51694], + "temperature": 0.0, "avg_logprob": -0.37339091691814486, "compression_ratio": 1.3798882681564246, + "no_speech_prob": 0.02167615294456482}, {"id": 53, "seek": 28584, "start": 286.4, + "end": 290.52, "text": " Well, I can mention it.", "tokens": [50392, 1042, 11, 286, + 393, 2152, 309, 13, 50598], "temperature": 0.0, "avg_logprob": -0.46531993586842607, + "compression_ratio": 1.5166666666666666, "no_speech_prob": 0.06107814237475395}, + {"id": 54, "seek": 28584, "start": 290.52, "end": 292.88, "text": " Yeah, well, + not really.", "tokens": [50598, 865, 11, 731, 11, 406, 534, 13, 50716], "temperature": + 0.0, "avg_logprob": -0.46531993586842607, "compression_ratio": 1.5166666666666666, + 
"no_speech_prob": 0.06107814237475395}, {"id": 55, "seek": 28584, "start": 292.88, + "end": 300.28, "text": " So I''m focused on the so I can work Twitter most of the + time.", "tokens": [50716, 407, 286, 478, 5178, 322, 264, 370, 286, 393, 589, 5794, + 881, 295, 264, 565, 13, 51086], "temperature": 0.0, "avg_logprob": -0.46531993586842607, + "compression_ratio": 1.5166666666666666, "no_speech_prob": 0.06107814237475395}, + {"id": 56, "seek": 28584, "start": 300.28, "end": 305.55999999999995, "text": " + I can have last half a year, I spent on improving search relevance.", "tokens": + [51086, 286, 393, 362, 1036, 1922, 257, 1064, 11, 286, 4418, 322, 11470, 3164, 32684, + 13, 51350], "temperature": 0.0, "avg_logprob": -0.46531993586842607, "compression_ratio": + 1.5166666666666666, "no_speech_prob": 0.06107814237475395}, {"id": 57, "seek": 28584, + "start": 305.55999999999995, "end": 308.55999999999995, "text": " So that is mostly + the ranker.", "tokens": [51350, 407, 300, 307, 5240, 264, 6181, 260, 13, 51500], + "temperature": 0.0, "avg_logprob": -0.46531993586842607, "compression_ratio": 1.5166666666666666, + "no_speech_prob": 0.06107814237475395}, {"id": 58, "seek": 28584, "start": 308.55999999999995, + "end": 313.52, "text": " But that is closely related to the nearest neighbor search.", + "tokens": [51500, 583, 300, 307, 8185, 4077, 281, 264, 23831, 5987, 3164, 13, 51748], + "temperature": 0.0, "avg_logprob": -0.46531993586842607, "compression_ratio": 1.5166666666666666, + "no_speech_prob": 0.06107814237475395}, {"id": 59, "seek": 28584, "start": 313.52, + "end": 314.52, "text": " Yeah.", "tokens": [51748, 865, 13, 51798], "temperature": + 0.0, "avg_logprob": -0.46531993586842607, "compression_ratio": 1.5166666666666666, + "no_speech_prob": 0.06107814237475395}, {"id": 60, "seek": 31452, "start": 314.68, + "end": 319.08, "text": " So you mentioned like basically the background where you''ve + been in Russia, it was like kind", "tokens": [50372, 407, 291, 2835, 
411, 1936, + 264, 3678, 689, 291, 600, 668, 294, 6797, 11, 309, 390, 411, 733, 50592], "temperature": + 0.0, "avg_logprob": -0.20180282990137735, "compression_ratio": 1.6869918699186992, + "no_speech_prob": 0.0018707547569647431}, {"id": 61, "seek": 31452, "start": 319.08, + "end": 320.59999999999997, "text": " of related to computer vision.", "tokens": + [50592, 295, 4077, 281, 3820, 5201, 13, 50668], "temperature": 0.0, "avg_logprob": + -0.20180282990137735, "compression_ratio": 1.6869918699186992, "no_speech_prob": + 0.0018707547569647431}, {"id": 62, "seek": 31452, "start": 320.59999999999997, "end": + 325.59999999999997, "text": " Of course, you had physics background by education, + but you also kind of worked in computer vision", "tokens": [50668, 2720, 1164, 11, + 291, 632, 10649, 3678, 538, 3309, 11, 457, 291, 611, 733, 295, 2732, 294, 3820, + 5201, 50918], "temperature": 0.0, "avg_logprob": -0.20180282990137735, "compression_ratio": + 1.6869918699186992, "no_speech_prob": 0.0018707547569647431}, {"id": 63, "seek": + 31452, "start": 325.59999999999997, "end": 326.52, "text": " startups.", "tokens": + [50918, 28041, 13, 50964], "temperature": 0.0, "avg_logprob": -0.20180282990137735, + "compression_ratio": 1.6869918699186992, "no_speech_prob": 0.0018707547569647431}, + {"id": 64, "seek": 31452, "start": 326.52, "end": 332.24, "text": " So what was + your impression of this nearest neighbor search problem and like, how did you think", + "tokens": [50964, 407, 437, 390, 428, 9995, 295, 341, 23831, 5987, 3164, 1154, 293, + 411, 11, 577, 630, 291, 519, 51250], "temperature": 0.0, "avg_logprob": -0.20180282990137735, + "compression_ratio": 1.6869918699186992, "no_speech_prob": 0.0018707547569647431}, + {"id": 65, "seek": 31452, "start": 332.24, "end": 338.15999999999997, "text": " + about it when like, did you read papers to understand like what was done in that + area?", "tokens": [51250, 466, 309, 562, 411, 11, 630, 291, 1401, 10577, 281, 1223, + 411, 437, 
390, 1096, 294, 300, 1859, 30, 51546], "temperature": 0.0, "avg_logprob": + -0.20180282990137735, "compression_ratio": 1.6869918699186992, "no_speech_prob": + 0.0018707547569647431}, {"id": 66, "seek": 33816, "start": 338.16, "end": 346.44, + "text": " I think that areas pretty like developed right in in in the papers like + like NSW itself,", "tokens": [50364, 286, 519, 300, 3179, 1238, 411, 4743, 558, + 294, 294, 294, 264, 10577, 411, 411, 15943, 54, 2564, 11, 50778], "temperature": + 0.0, "avg_logprob": -0.39413238826550934, "compression_ratio": 1.4861878453038675, + "no_speech_prob": 0.02068152092397213}, {"id": 67, "seek": 33816, "start": 346.44, + "end": 347.44, "text": " right?", "tokens": [50778, 558, 30, 50828], "temperature": + 0.0, "avg_logprob": -0.39413238826550934, "compression_ratio": 1.4861878453038675, + "no_speech_prob": 0.02068152092397213}, {"id": 68, "seek": 33816, "start": 347.44, + "end": 349.44, "text": " Like navigators.", "tokens": [50828, 1743, 7407, 3391, + 13, 50928], "temperature": 0.0, "avg_logprob": -0.39413238826550934, "compression_ratio": + 1.4861878453038675, "no_speech_prob": 0.02068152092397213}, {"id": 69, "seek": 33816, + "start": 349.44, "end": 358.88, "text": " Well, so like in the startup meta labs, + I have been working, I think I''ve worked for", "tokens": [50928, 1042, 11, 370, + 411, 294, 264, 18578, 19616, 20339, 11, 286, 362, 668, 1364, 11, 286, 519, 286, + 600, 2732, 337, 51400], "temperature": 0.0, "avg_logprob": -0.39413238826550934, + "compression_ratio": 1.4861878453038675, "no_speech_prob": 0.02068152092397213}, + {"id": 70, "seek": 33816, "start": 358.88, "end": 361.16, "text": " six or seven + years.", "tokens": [51400, 2309, 420, 3407, 924, 13, 51514], "temperature": 0.0, + "avg_logprob": -0.39413238826550934, "compression_ratio": 1.4861878453038675, "no_speech_prob": + 0.02068152092397213}, {"id": 71, "seek": 33816, "start": 361.16, "end": 365.8, "text": + " So it was quite quite a significant period of 
time.", "tokens": [51514, 407, 309, + 390, 1596, 1596, 257, 4776, 2896, 295, 565, 13, 51746], "temperature": 0.0, "avg_logprob": + -0.39413238826550934, "compression_ratio": 1.4861878453038675, "no_speech_prob": + 0.02068152092397213}, {"id": 72, "seek": 36580, "start": 365.8, "end": 370.16, "text": + " And then we started just like from distributed search.", "tokens": [50364, 400, + 550, 321, 1409, 445, 411, 490, 12631, 3164, 13, 50582], "temperature": 0.0, "avg_logprob": + -0.24922507459467108, "compression_ratio": 1.6858638743455496, "no_speech_prob": + 0.0018466752953827381}, {"id": 73, "seek": 36580, "start": 370.16, "end": 373.36, + "text": " So the idea was like we do it from scratch.", "tokens": [50582, 407, 264, + 1558, 390, 411, 321, 360, 309, 490, 8459, 13, 50742], "temperature": 0.0, "avg_logprob": + -0.24922507459467108, "compression_ratio": 1.6858638743455496, "no_speech_prob": + 0.0018466752953827381}, {"id": 74, "seek": 36580, "start": 373.36, "end": 377.0, + "text": " So like we don''t care what I''ve been done before.", "tokens": [50742, + 407, 411, 321, 500, 380, 1127, 437, 286, 600, 668, 1096, 949, 13, 50924], "temperature": + 0.0, "avg_logprob": -0.24922507459467108, "compression_ratio": 1.6858638743455496, + "no_speech_prob": 0.0018466752953827381}, {"id": 75, "seek": 36580, "start": 377.0, + "end": 378.0, "text": " So we have an idea.", "tokens": [50924, 407, 321, 362, 364, + 1558, 13, 50974], "temperature": 0.0, "avg_logprob": -0.24922507459467108, "compression_ratio": + 1.6858638743455496, "no_speech_prob": 0.0018466752953827381}, {"id": 76, "seek": + 36580, "start": 378.0, "end": 385.08000000000004, "text": " So there are like distributed + hash tables like port or other stuff and we want to do it,", "tokens": [50974, 407, + 456, 366, 411, 12631, 22019, 8020, 411, 2436, 420, 661, 1507, 293, 321, 528, 281, + 360, 309, 11, 51328], "temperature": 0.0, "avg_logprob": -0.24922507459467108, "compression_ratio": + 1.6858638743455496, 
"no_speech_prob": 0.0018466752953827381}, {"id": 77, "seek": + 36580, "start": 385.08000000000004, "end": 386.92, "text": " but with similarity + search.", "tokens": [51328, 457, 365, 32194, 3164, 13, 51420], "temperature": 0.0, + "avg_logprob": -0.24922507459467108, "compression_ratio": 1.6858638743455496, "no_speech_prob": + 0.0018466752953827381}, {"id": 78, "seek": 36580, "start": 386.92, "end": 391.28000000000003, + "text": " So that should scale to better base.", "tokens": [51420, 407, 300, 820, + 4373, 281, 1101, 3096, 13, 51638], "temperature": 0.0, "avg_logprob": -0.24922507459467108, + "compression_ratio": 1.6858638743455496, "no_speech_prob": 0.0018466752953827381}, + {"id": 79, "seek": 39128, "start": 391.28, "end": 396.15999999999997, "text": " + And there that''s like very different approach from nearest neighbor search.", "tokens": + [50364, 400, 456, 300, 311, 411, 588, 819, 3109, 490, 23831, 5987, 3164, 13, 50608], + "temperature": 0.0, "avg_logprob": -0.2146826996200386, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.006103829946368933}, {"id": 80, "seek": 39128, "start": 396.15999999999997, + "end": 401.88, "text": " And like most of the time we spent like developing this + algorithm was not even like nearest", "tokens": [50608, 400, 411, 881, 295, 264, + 565, 321, 4418, 411, 6416, 341, 9284, 390, 406, 754, 411, 23831, 50894], "temperature": + 0.0, "avg_logprob": -0.2146826996200386, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.006103829946368933}, {"id": 81, "seek": 39128, "start": 401.88, + "end": 402.88, "text": " neighbor search.", "tokens": [50894, 5987, 3164, 13, 50944], + "temperature": 0.0, "avg_logprob": -0.2146826996200386, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.006103829946368933}, {"id": 82, "seek": 39128, "start": 402.88, + "end": 412.35999999999996, "text": " That was closer to this symbolic filtering, + but with like arbitrary filters.", "tokens": [50944, 663, 390, 4966, 
281, 341, 25755, + 30822, 11, 457, 365, 411, 23211, 15995, 13, 51418], "temperature": 0.0, "avg_logprob": + -0.2146826996200386, "compression_ratio": 1.6919642857142858, "no_speech_prob": + 0.006103829946368933}, {"id": 83, "seek": 39128, "start": 412.35999999999996, "end": + 418.08, "text": " And only at some point of time, like we had a realization that + oh, like that is similar", "tokens": [51418, 400, 787, 412, 512, 935, 295, 565, + 11, 411, 321, 632, 257, 25138, 300, 1954, 11, 411, 300, 307, 2531, 51704], "temperature": + 0.0, "avg_logprob": -0.2146826996200386, "compression_ratio": 1.6919642857142858, + "no_speech_prob": 0.006103829946368933}, {"id": 84, "seek": 39128, "start": 418.08, + "end": 420.4, "text": " to what people actually need.", "tokens": [51704, 281, 437, + 561, 767, 643, 13, 51820], "temperature": 0.0, "avg_logprob": -0.2146826996200386, + "compression_ratio": 1.6919642857142858, "no_speech_prob": 0.006103829946368933}, + {"id": 85, "seek": 42040, "start": 420.4, "end": 423.4, "text": " Like there are + a lot of papers of on nearest neighbor search.", "tokens": [50364, 1743, 456, 366, + 257, 688, 295, 10577, 295, 322, 23831, 5987, 3164, 13, 50514], "temperature": 0.0, + "avg_logprob": -0.34679229442889875, "compression_ratio": 1.6218487394957983, "no_speech_prob": + 0.008457459509372711}, {"id": 86, "seek": 42040, "start": 423.4, "end": 432.2, "text": + " So we switch direction and like the most cited publications are on nearest neighbors.", + "tokens": [50514, 407, 321, 3679, 3513, 293, 411, 264, 881, 30134, 25618, 366, 322, + 23831, 12512, 13, 50954], "temperature": 0.0, "avg_logprob": -0.34679229442889875, + "compression_ratio": 1.6218487394957983, "no_speech_prob": 0.008457459509372711}, + {"id": 87, "seek": 42040, "start": 432.2, "end": 433.2, "text": " Yeah.", "tokens": + [50954, 865, 13, 51004], "temperature": 0.0, "avg_logprob": -0.34679229442889875, + "compression_ratio": 1.6218487394957983, "no_speech_prob": 0.008457459509372711}, + 
{"id": 88, "seek": 42040, "start": 433.2, "end": 434.2, "text": " Yeah.", "tokens": + [51004, 865, 13, 51054], "temperature": 0.0, "avg_logprob": -0.34679229442889875, + "compression_ratio": 1.6218487394957983, "no_speech_prob": 0.008457459509372711}, + {"id": 89, "seek": 42040, "start": 434.2, "end": 437.2, "text": " I don''t remember + was it on your paper or somebody else''s paper.", "tokens": [51054, 286, 500, 380, + 1604, 390, 309, 322, 428, 3035, 420, 2618, 1646, 311, 3035, 13, 51204], "temperature": + 0.0, "avg_logprob": -0.34679229442889875, "compression_ratio": 1.6218487394957983, + "no_speech_prob": 0.008457459509372711}, {"id": 90, "seek": 42040, "start": 437.2, + "end": 442.35999999999996, "text": " I saw a paper of my old friend, you reliefs + because he actually defended his thesis like", "tokens": [51204, 286, 1866, 257, + 3035, 295, 452, 1331, 1277, 11, 291, 10915, 82, 570, 415, 767, 34135, 702, 22288, + 411, 51462], "temperature": 0.0, "avg_logprob": -0.34679229442889875, "compression_ratio": + 1.6218487394957983, "no_speech_prob": 0.008457459509372711}, {"id": 91, "seek": + 42040, "start": 442.35999999999996, "end": 445.32, "text": " in PG thesis in this + space.", "tokens": [51462, 294, 40975, 22288, 294, 341, 1901, 13, 51610], "temperature": + 0.0, "avg_logprob": -0.34679229442889875, "compression_ratio": 1.6218487394957983, + "no_speech_prob": 0.008457459509372711}, {"id": 92, "seek": 42040, "start": 445.32, + "end": 448.56, "text": " So when he was doing it, I think it was 2009.", "tokens": + [51610, 407, 562, 415, 390, 884, 309, 11, 286, 519, 309, 390, 11453, 13, 51772], + "temperature": 0.0, "avg_logprob": -0.34679229442889875, "compression_ratio": 1.6218487394957983, + "no_speech_prob": 0.008457459509372711}, {"id": 93, "seek": 44856, "start": 448.56, + "end": 453.64, "text": " I was like, I was considering this like a pure mathematical + problem without like maybe", "tokens": [50364, 286, 390, 411, 11, 286, 390, 8079, + 341, 411, 257, 
6075, 18894, 1154, 1553, 411, 1310, 50618], "temperature": 0.0, "avg_logprob": + -0.19077334684484146, "compression_ratio": 1.7306273062730628, "no_speech_prob": + 0.0022083779331296682}, {"id": 94, "seek": 44856, "start": 453.64, "end": 455.12, + "text": " direct application.", "tokens": [50618, 2047, 3861, 13, 50692], "temperature": + 0.0, "avg_logprob": -0.19077334684484146, "compression_ratio": 1.7306273062730628, + "no_speech_prob": 0.0022083779331296682}, {"id": 95, "seek": 44856, "start": 455.12, + "end": 458.44, "text": " But then he gave a talk at Google, like you know, Google + tech talks.", "tokens": [50692, 583, 550, 415, 2729, 257, 751, 412, 3329, 11, 411, + 291, 458, 11, 3329, 7553, 6686, 13, 50858], "temperature": 0.0, "avg_logprob": -0.19077334684484146, + "compression_ratio": 1.7306273062730628, "no_speech_prob": 0.0022083779331296682}, + {"id": 96, "seek": 44856, "start": 458.44, "end": 463.88, "text": " I don''t know + if they still exist or not, but like he presented this problem and like they", "tokens": + [50858, 286, 500, 380, 458, 498, 436, 920, 2514, 420, 406, 11, 457, 411, 415, 8212, + 341, 1154, 293, 411, 436, 51130], "temperature": 0.0, "avg_logprob": -0.19077334684484146, + "compression_ratio": 1.7306273062730628, "no_speech_prob": 0.0022083779331296682}, + {"id": 97, "seek": 44856, "start": 463.88, "end": 467.32, "text": " did some optimizations + as well.", "tokens": [51130, 630, 512, 5028, 14455, 382, 731, 13, 51302], "temperature": + 0.0, "avg_logprob": -0.19077334684484146, "compression_ratio": 1.7306273062730628, + "no_speech_prob": 0.0022083779331296682}, {"id": 98, "seek": 44856, "start": 467.32, + "end": 470.96, "text": " And then I think I think your paper sites it or maybe someone + else''s I don''t remember.", "tokens": [51302, 400, 550, 286, 519, 286, 519, 428, + 3035, 7533, 309, 420, 1310, 1580, 1646, 311, 286, 500, 380, 1604, 13, 51484], "temperature": + 0.0, "avg_logprob": -0.19077334684484146, "compression_ratio": 
1.7306273062730628, + "no_speech_prob": 0.0022083779331296682}, {"id": 99, "seek": 44856, "start": 470.96, + "end": 476.36, "text": " I was like really surprised to see, you know, his work + also kind of in the same line", "tokens": [51484, 286, 390, 411, 534, 6100, 281, + 536, 11, 291, 458, 11, 702, 589, 611, 733, 295, 294, 264, 912, 1622, 51754], "temperature": + 0.0, "avg_logprob": -0.19077334684484146, "compression_ratio": 1.7306273062730628, + "no_speech_prob": 0.0022083779331296682}, {"id": 100, "seek": 47636, "start": 476.36, + "end": 481.36, "text": " of things that now lead to vector search essentially.", + "tokens": [50364, 295, 721, 300, 586, 1477, 281, 8062, 3164, 4476, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.36263618940188563, "compression_ratio": 1.5544554455445545, + "no_speech_prob": 0.0341654010117054}, {"id": 101, "seek": 47636, "start": 481.36, + "end": 487.36, "text": " Well, yeah, I think I saw his work, but it seemed like + more theory.", "tokens": [50614, 1042, 11, 1338, 11, 286, 519, 286, 1866, 702, 589, + 11, 457, 309, 6576, 411, 544, 5261, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.36263618940188563, "compression_ratio": 1.5544554455445545, "no_speech_prob": + 0.0341654010117054}, {"id": 102, "seek": 47636, "start": 487.36, "end": 495.36, + "text": " Like if you look to history like of like graph approaches so like.", "tokens": + [50914, 1743, 498, 291, 574, 281, 2503, 411, 295, 411, 4295, 11587, 370, 411, 13, + 51314], "temperature": 0.0, "avg_logprob": -0.36263618940188563, "compression_ratio": + 1.5544554455445545, "no_speech_prob": 0.0341654010117054}, {"id": 103, "seek": 47636, + "start": 495.36, "end": 500.36, "text": " Now it''s mostly like rehashing of old + stuff.", "tokens": [51314, 823, 309, 311, 5240, 411, 22355, 11077, 295, 1331, 1507, + 13, 51564], "temperature": 0.0, "avg_logprob": -0.36263618940188563, "compression_ratio": + 1.5544554455445545, "no_speech_prob": 0.0341654010117054}, {"id": 104, "seek": 
47636, + "start": 500.36, "end": 505.48, "text": " So definitely new things, but like there + is so much work done before like Sergey", "tokens": [51564, 407, 2138, 777, 721, + 11, 457, 411, 456, 307, 370, 709, 589, 1096, 949, 411, 49238, 51820], "temperature": + 0.0, "avg_logprob": -0.36263618940188563, "compression_ratio": 1.5544554455445545, + "no_speech_prob": 0.0341654010117054}, {"id": 105, "seek": 50548, "start": 505.48, + "end": 511.48, "text": " Brein worked on nearest neighbor search with like GNIT.", + "tokens": [50364, 7090, 259, 2732, 322, 23831, 5987, 3164, 365, 411, 46411, 3927, + 13, 50664], "temperature": 0.0, "avg_logprob": -0.38368380069732666, "compression_ratio": + 1.4555555555555555, "no_speech_prob": 0.014138257130980492}, {"id": 106, "seek": + 50548, "start": 511.48, "end": 514.48, "text": " So that is also like good work.", + "tokens": [50664, 407, 300, 307, 611, 411, 665, 589, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.38368380069732666, "compression_ratio": 1.4555555555555555, + "no_speech_prob": 0.014138257130980492}, {"id": 107, "seek": 50548, "start": 514.48, + "end": 522.24, "text": " There were there were previous work on graph search, I + think in 1993, which like aren''t", "tokens": [50814, 821, 645, 456, 645, 3894, + 589, 322, 4295, 3164, 11, 286, 519, 294, 25137, 11, 597, 411, 3212, 380, 51202], + "temperature": 0.0, "avg_logprob": -0.38368380069732666, "compression_ratio": 1.4555555555555555, + "no_speech_prob": 0.014138257130980492}, {"id": 108, "seek": 50548, "start": 522.24, + "end": 527.48, "text": " that much different compared to like current though, like + they have also problems with", "tokens": [51202, 300, 709, 819, 5347, 281, 411, + 2190, 1673, 11, 411, 436, 362, 611, 2740, 365, 51464], "temperature": 0.0, "avg_logprob": + -0.38368380069732666, "compression_ratio": 1.4555555555555555, "no_speech_prob": + 0.014138257130980492}, {"id": 109, "seek": 52748, "start": 527.48, "end": 531.48, + "text": " scalability 
at that point.", "tokens": [50364, 15664, 2310, 412, 300, + 935, 13, 50564], "temperature": 0.0, "avg_logprob": -0.2575679723767267, "compression_ratio": + 1.4325842696629214, "no_speech_prob": 0.20761214196681976}, {"id": 110, "seek": + 52748, "start": 531.48, "end": 537.48, "text": " So I think yeah, so that was.", + "tokens": [50564, 407, 286, 519, 1338, 11, 370, 300, 390, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.2575679723767267, "compression_ratio": 1.4325842696629214, + "no_speech_prob": 0.20761214196681976}, {"id": 111, "seek": 52748, "start": 537.48, + "end": 543.48, "text": " There is a large number of like previous work in that area.", + "tokens": [50864, 821, 307, 257, 2416, 1230, 295, 411, 3894, 589, 294, 300, 1859, + 13, 51164], "temperature": 0.0, "avg_logprob": -0.2575679723767267, "compression_ratio": + 1.4325842696629214, "no_speech_prob": 0.20761214196681976}, {"id": 112, "seek": + 52748, "start": 543.48, "end": 549.48, "text": " But you said like you didn''t concern + yourself with reading too many papers before you started", "tokens": [51164, 583, + 291, 848, 411, 291, 994, 380, 3136, 1803, 365, 3760, 886, 867, 10577, 949, 291, + 1409, 51464], "temperature": 0.0, "avg_logprob": -0.2575679723767267, "compression_ratio": + 1.4325842696629214, "no_speech_prob": 0.20761214196681976}, {"id": 113, "seek": + 52748, "start": 549.48, "end": 551.48, "text": " inventing this new algorithm.", + "tokens": [51464, 7962, 278, 341, 777, 9284, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.2575679723767267, "compression_ratio": 1.4325842696629214, "no_speech_prob": + 0.20761214196681976}, {"id": 114, "seek": 52748, "start": 551.48, "end": 552.48, + "text": " Is that right?", "tokens": [51564, 1119, 300, 558, 30, 51614], "temperature": + 0.0, "avg_logprob": -0.2575679723767267, "compression_ratio": 1.4325842696629214, + "no_speech_prob": 0.20761214196681976}, {"id": 115, "seek": 55248, "start": 552.48, + "end": 558.48, "text": " Yeah, sure, sure, 
we read papers, but they were not really + relevant.", "tokens": [50364, 865, 11, 988, 11, 988, 11, 321, 1401, 10577, 11, 457, + 436, 645, 406, 534, 7340, 13, 50664], "temperature": 0.0, "avg_logprob": -0.2470530027984291, + "compression_ratio": 1.721951219512195, "no_speech_prob": 0.027562787756323814}, + {"id": 116, "seek": 55248, "start": 558.48, "end": 563.48, "text": " So we read + papers on network science.", "tokens": [50664, 407, 321, 1401, 10577, 322, 3209, + 3497, 13, 50914], "temperature": 0.0, "avg_logprob": -0.2470530027984291, "compression_ratio": + 1.721951219512195, "no_speech_prob": 0.027562787756323814}, {"id": 117, "seek": + 55248, "start": 563.48, "end": 569.48, "text": " And so we tried to so there was + a problem with building like this, no navigable small roles.", "tokens": [50914, + 400, 370, 321, 3031, 281, 370, 456, 390, 257, 1154, 365, 2390, 411, 341, 11, 572, + 7407, 712, 1359, 9604, 13, 51214], "temperature": 0.0, "avg_logprob": -0.2470530027984291, + "compression_ratio": 1.721951219512195, "no_speech_prob": 0.027562787756323814}, + {"id": 118, "seek": 55248, "start": 569.48, "end": 573.48, "text": " So like not + every small network is navigable.", "tokens": [51214, 407, 411, 406, 633, 1359, + 3209, 307, 7407, 712, 13, 51414], "temperature": 0.0, "avg_logprob": -0.2470530027984291, + "compression_ratio": 1.721951219512195, "no_speech_prob": 0.027562787756323814}, + {"id": 119, "seek": 55248, "start": 573.48, "end": 576.48, "text": " Like most models + are not.", "tokens": [51414, 1743, 881, 5245, 366, 406, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.2470530027984291, "compression_ratio": 1.721951219512195, + "no_speech_prob": 0.027562787756323814}, {"id": 120, "seek": 55248, "start": 576.48, + "end": 581.48, "text": " So we wanted to build navigable small and there were also + didn''t understand like.", "tokens": [51564, 407, 321, 1415, 281, 1322, 7407, 712, + 1359, 293, 456, 645, 611, 994, 380, 1223, 411, 13, 51814], 
"temperature": 0.0, "avg_logprob": + -0.2470530027984291, "compression_ratio": 1.721951219512195, "no_speech_prob": 0.027562787756323814}, + {"id": 121, "seek": 58148, "start": 581.48, "end": 591.48, "text": " Like what what + was the criteria like what is like how we could make it and we reinvented like this.", + "tokens": [50364, 1743, 437, 437, 390, 264, 11101, 411, 437, 307, 411, 577, 321, + 727, 652, 309, 293, 321, 33477, 292, 411, 341, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.3275919521556181, "compression_ratio": 1.6467391304347827, "no_speech_prob": + 0.0074677495285868645}, {"id": 122, "seek": 58148, "start": 591.48, "end": 602.48, + "text": " Dillon or graphs inside the company and after like you reinvented like + you know starting to search and see there are lots of papers who did the same.", + "tokens": [50864, 413, 30961, 420, 24877, 1854, 264, 2237, 293, 934, 411, 291, 33477, + 292, 411, 291, 458, 2891, 281, 3164, 293, 536, 456, 366, 3195, 295, 10577, 567, + 630, 264, 912, 13, 51414], "temperature": 0.0, "avg_logprob": -0.3275919521556181, + "compression_ratio": 1.6467391304347827, "no_speech_prob": 0.0074677495285868645}, + {"id": 123, "seek": 58148, "start": 602.48, "end": 603.48, "text": " Right.", "tokens": + [51414, 1779, 13, 51464], "temperature": 0.0, "avg_logprob": -0.3275919521556181, + "compression_ratio": 1.6467391304347827, "no_speech_prob": 0.0074677495285868645}, + {"id": 124, "seek": 58148, "start": 603.48, "end": 604.48, "text": " Yeah.", "tokens": + [51464, 865, 13, 51514], "temperature": 0.0, "avg_logprob": -0.3275919521556181, + "compression_ratio": 1.6467391304347827, "no_speech_prob": 0.0074677495285868645}, + {"id": 125, "seek": 58148, "start": 604.48, "end": 606.48, "text": " So yeah.", + "tokens": [51514, 407, 1338, 13, 51614], "temperature": 0.0, "avg_logprob": -0.3275919521556181, + "compression_ratio": 1.6467391304347827, "no_speech_prob": 0.0074677495285868645}, + {"id": 126, "seek": 58148, "start": 606.48, 
"end": 608.48, "text": " So we went + the other way.", "tokens": [51614, 407, 321, 1437, 264, 661, 636, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.3275919521556181, "compression_ratio": 1.6467391304347827, + "no_speech_prob": 0.0074677495285868645}, {"id": 127, "seek": 58148, "start": 608.48, + "end": 609.48, "text": " Yeah.", "tokens": [51714, 865, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.3275919521556181, "compression_ratio": 1.6467391304347827, + "no_speech_prob": 0.0074677495285868645}, {"id": 128, "seek": 60948, "start": 609.48, + "end": 625.48, "text": " So now that you mentioned this thing like can you actually + please introduce this concepts at least on high level to our audience like what + is a small world what is like what white it''s to be navigable kind of a little + bit like more to the user facing", "tokens": [50364, 407, 586, 300, 291, 2835, 341, + 551, 411, 393, 291, 767, 1767, 5366, 341, 10392, 412, 1935, 322, 1090, 1496, 281, + 527, 4034, 411, 437, 307, 257, 1359, 1002, 437, 307, 411, 437, 2418, 309, 311, 281, + 312, 7407, 712, 733, 295, 257, 707, 857, 411, 544, 281, 264, 4195, 7170, 51164], + "temperature": 0.0, "avg_logprob": -0.16488108559260292, "compression_ratio": 1.583815028901734, + "no_speech_prob": 0.011575441807508469}, {"id": 129, "seek": 60948, "start": 625.48, + "end": 628.48, "text": " level if it''s possible.", "tokens": [51164, 1496, 498, + 309, 311, 1944, 13, 51314], "temperature": 0.0, "avg_logprob": -0.16488108559260292, + "compression_ratio": 1.583815028901734, "no_speech_prob": 0.011575441807508469}, + {"id": 130, "seek": 62848, "start": 628.48, "end": 633.48, "text": " Well, like + navigable small world so you have a large network.", "tokens": [50364, 1042, 11, + 411, 7407, 712, 1359, 1002, 370, 291, 362, 257, 2416, 3209, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.1980497802513233, "compression_ratio": 1.6229508196721312, + "no_speech_prob": 0.03176131099462509}, {"id": 131, "seek": 62848, 
"start": 633.48, + "end": 648.48, "text": " And so navigable small world that means you can find paths + between like arbitrary elements in this network using which is a logarithmic scale.", + "tokens": [50614, 400, 370, 7407, 712, 1359, 1002, 300, 1355, 291, 393, 915, 14518, + 1296, 411, 23211, 4959, 294, 341, 3209, 1228, 597, 307, 257, 41473, 355, 13195, + 4373, 13, 51364], "temperature": 0.0, "avg_logprob": -0.1980497802513233, "compression_ratio": + 1.6229508196721312, "no_speech_prob": 0.03176131099462509}, {"id": 132, "seek": + 62848, "start": 648.48, "end": 655.48, "text": " So the number of hopes can be done + with the rhythmic and you can use only local information.", "tokens": [51364, 407, + 264, 1230, 295, 13681, 393, 312, 1096, 365, 264, 46967, 293, 291, 393, 764, 787, + 2654, 1589, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1980497802513233, + "compression_ratio": 1.6229508196721312, "no_speech_prob": 0.03176131099462509}, + {"id": 133, "seek": 65548, "start": 655.48, "end": 667.48, "text": " And do like + something like greedy search like greedy searches allow and if you can find like + the path and the algorithmic steps to your network is navigable.", "tokens": [50364, + 400, 360, 411, 746, 411, 28228, 3164, 411, 28228, 26701, 2089, 293, 498, 291, 393, + 915, 411, 264, 3100, 293, 264, 9284, 299, 4439, 281, 428, 3209, 307, 7407, 712, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.37479050305424905, "compression_ratio": + 1.5629629629629629, "no_speech_prob": 0.032770946621894836}, {"id": 134, "seek": + 65548, "start": 667.48, "end": 673.48, "text": " And that small world part like + why is it small small.", "tokens": [50964, 400, 300, 1359, 1002, 644, 411, 983, + 307, 309, 1359, 1359, 13, 51264], "temperature": 0.0, "avg_logprob": -0.37479050305424905, + "compression_ratio": 1.5629629629629629, "no_speech_prob": 0.032770946621894836}, + {"id": 135, "seek": 67348, "start": 673.48, "end": 689.48, "text": " And that''s + like history how 
he''s historical reasons so there was like a famous like milligram + experiment where they they send letters from one person like from random person + to some target person.", "tokens": [50364, 400, 300, 311, 411, 2503, 577, 415, 311, + 8584, 4112, 370, 456, 390, 411, 257, 4618, 411, 38298, 5120, 689, 436, 436, 2845, + 7825, 490, 472, 954, 411, 490, 4974, 954, 281, 512, 3779, 954, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.3330188536308181, "compression_ratio": 1.7676767676767677, + "no_speech_prob": 0.05594843998551369}, {"id": 136, "seek": 67348, "start": 689.48, + "end": 699.48, "text": " That was kind of greedy like greedy search for connections + very similar to this and that that''s called like small world experiment so like + a small world.", "tokens": [51164, 663, 390, 733, 295, 28228, 411, 28228, 3164, + 337, 9271, 588, 2531, 281, 341, 293, 300, 300, 311, 1219, 411, 1359, 1002, 5120, + 370, 411, 257, 1359, 1002, 13, 51664], "temperature": 0.0, "avg_logprob": -0.3330188536308181, + "compression_ratio": 1.7676767676767677, "no_speech_prob": 0.05594843998551369}, + {"id": 137, "seek": 69948, "start": 699.48, "end": 710.48, "text": " And real networks + like people have like real networks have low diameter like human human connection + networks.", "tokens": [50364, 400, 957, 9590, 411, 561, 362, 411, 957, 9590, 362, + 2295, 14196, 411, 1952, 1952, 4984, 9590, 13, 50914], "temperature": 0.0, "avg_logprob": + -0.16736959431269396, "compression_ratio": 1.7075471698113207, "no_speech_prob": + 0.11016340553760529}, {"id": 138, "seek": 69948, "start": 710.48, "end": 718.48, + "text": " And they are navigable like at least according to milligram experiments + and like subsequent experiments.", "tokens": [50914, 400, 436, 366, 7407, 712, 411, + 412, 1935, 4650, 281, 38298, 12050, 293, 411, 19962, 12050, 13, 51314], "temperature": + 0.0, "avg_logprob": -0.16736959431269396, "compression_ratio": 1.7075471698113207, + "no_speech_prob": 0.11016340553760529}, 
{"id": 139, "seek": 69948, "start": 718.48, + "end": 728.48, "text": " Is it kind of related in common terms like to six handshakes + that you need to connect every random person with another random person on the planet.", + "tokens": [51314, 1119, 309, 733, 295, 4077, 294, 2689, 2115, 411, 281, 2309, 2377, + 71, 3419, 300, 291, 643, 281, 1745, 633, 4974, 954, 365, 1071, 4974, 954, 322, 264, + 5054, 13, 51814], "temperature": 0.0, "avg_logprob": -0.16736959431269396, "compression_ratio": + 1.7075471698113207, "no_speech_prob": 0.11016340553760529}, {"id": 140, "seek": + 72848, "start": 728.48, "end": 737.48, "text": " Yes, yes, so that''s that''s like + that experiment is pretty sure I think it''s done in the 60s so yeah so.", "tokens": + [50364, 1079, 11, 2086, 11, 370, 300, 311, 300, 311, 411, 300, 5120, 307, 1238, + 988, 286, 519, 309, 311, 1096, 294, 264, 4060, 82, 370, 1338, 370, 13, 50814], "temperature": + 0.0, "avg_logprob": -0.17175868806384859, "compression_ratio": 1.6909871244635193, + "no_speech_prob": 0.0020402243826538324}, {"id": 141, "seek": 72848, "start": 737.48, + "end": 745.48, "text": " And so the navigable part is basically like if we put this + in the context of search right so.", "tokens": [50814, 400, 370, 264, 7407, 712, + 644, 307, 1936, 411, 498, 321, 829, 341, 294, 264, 4319, 295, 3164, 558, 370, 13, + 51214], "temperature": 0.0, "avg_logprob": -0.17175868806384859, "compression_ratio": + 1.6909871244635193, "no_speech_prob": 0.0020402243826538324}, {"id": 142, "seek": + 72848, "start": 745.48, "end": 756.48, "text": " So let''s say I have local information + I''m here I would like to travel from here let''s say I''m in Helsinki I would like + to travel to New York like how do I travel right I need to go to the airport.", + "tokens": [51214, 407, 718, 311, 584, 286, 362, 2654, 1589, 286, 478, 510, 286, + 576, 411, 281, 3147, 490, 510, 718, 311, 584, 286, 478, 294, 45429, 41917, 286, + 576, 411, 281, 3147, 281, 1873, 3609, 411, 577, 360, 
286, 3147, 558, 286, 643, 281, + 352, 281, 264, 10155, 13, 51764], "temperature": 0.0, "avg_logprob": -0.17175868806384859, + "compression_ratio": 1.6909871244635193, "no_speech_prob": 0.0020402243826538324}, + {"id": 143, "seek": 75648, "start": 756.48, "end": 766.48, "text": " From the airport + I will travel maybe to some city in Europe from there I will change you know the + airplane and then fly over to New York.", "tokens": [50364, 3358, 264, 10155, 286, + 486, 3147, 1310, 281, 512, 2307, 294, 3315, 490, 456, 286, 486, 1319, 291, 458, + 264, 17130, 293, 550, 3603, 670, 281, 1873, 3609, 13, 50864], "temperature": 0.0, + "avg_logprob": -0.10633975015559667, "compression_ratio": 1.5487179487179488, "no_speech_prob": + 0.025776835158467293}, {"id": 144, "seek": 75648, "start": 766.48, "end": 777.48, + "text": " I''m making it a little bit more complicated there is a direct flight + to New York from Helsinki but okay maybe that wasn''t right is that analogous to + navigable part.", "tokens": [50864, 286, 478, 1455, 309, 257, 707, 857, 544, 6179, + 456, 307, 257, 2047, 7018, 281, 1873, 3609, 490, 45429, 41917, 457, 1392, 1310, + 300, 2067, 380, 558, 307, 300, 16660, 563, 281, 7407, 712, 644, 13, 51414], "temperature": + 0.0, "avg_logprob": -0.10633975015559667, "compression_ratio": 1.5487179487179488, + "no_speech_prob": 0.025776835158467293}, {"id": 145, "seek": 77748, "start": 777.48, + "end": 792.48, "text": " Yes, yes, so like generally like that you can pinpoint + that but if you start and finish in like small local airports which usually don''t + have connections, my magic connection so they connected to hops.", "tokens": [50364, + 1079, 11, 2086, 11, 370, 411, 5101, 411, 300, 291, 393, 40837, 300, 457, 498, 291, + 722, 293, 2413, 294, 411, 1359, 2654, 36561, 597, 2673, 500, 380, 362, 9271, 11, + 452, 5585, 4984, 370, 436, 4582, 281, 47579, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.24590779472799862, "compression_ratio": 1.7465437788018434, 
"no_speech_prob": + 0.048686470836400986}, {"id": 146, "seek": 77748, "start": 792.48, "end": 803.48, + "text": " Yeah, and that is one of the model of navigable small roles so there are + like Kleinberg''s model which doesn''t have hops so you can also build navigable + small walls without hops.", "tokens": [51114, 865, 11, 293, 300, 307, 472, 295, + 264, 2316, 295, 7407, 712, 1359, 9604, 370, 456, 366, 411, 33327, 6873, 311, 2316, + 597, 1177, 380, 362, 47579, 370, 291, 393, 611, 1322, 7407, 712, 1359, 7920, 1553, + 47579, 13, 51664], "temperature": 0.0, "avg_logprob": -0.24590779472799862, "compression_ratio": + 1.7465437788018434, "no_speech_prob": 0.048686470836400986}, {"id": 147, "seek": + 80348, "start": 803.48, "end": 832.48, "text": " But they have polylogarifmix coedizian + so if you want to have polylogarifmix coedizian so maybe I''ll ask you to provide + some references later so especially for those who want to dig deeper into the smithematics + like you mentioned these different algorithms like many of them are new to me at + least so I''m sure to our part of our audience.", "tokens": [50364, 583, 436, 362, + 6754, 4987, 289, 351, 76, 970, 598, 292, 590, 952, 370, 498, 291, 528, 281, 362, + 6754, 4987, 289, 351, 76, 970, 598, 292, 590, 952, 370, 1310, 286, 603, 1029, 291, + 281, 2893, 512, 15400, 1780, 370, 2318, 337, 729, 567, 528, 281, 2528, 7731, 666, + 264, 899, 355, 37541, 411, 291, 2835, 613, 819, 14642, 411, 867, 295, 552, 366, + 777, 281, 385, 412, 1935, 370, 286, 478, 988, 281, 527, 644, 295, 527, 4034, 13, + 51814], "temperature": 0.0, "avg_logprob": -0.3868344475241268, "compression_ratio": + 1.6732673267326732, "no_speech_prob": 0.13643303513526917}, {"id": 148, "seek": + 83248, "start": 832.48, "end": 856.48, "text": " Part of our audience as well and + I wanted to also ask you like on the context of your invention like what was the + input so you said like you had a lot of data right from computer vision but like + was there something 
else like dimensionality or some other constraint that was kind + of tough for previous algorithms like a LSH or you know any other.", "tokens": [50364, + 4100, 295, 527, 4034, 382, 731, 293, 286, 1415, 281, 611, 1029, 291, 411, 322, 264, + 4319, 295, 428, 22265, 411, 437, 390, 264, 4846, 370, 291, 848, 411, 291, 632, 257, + 688, 295, 1412, 558, 490, 3820, 5201, 457, 411, 390, 456, 746, 1646, 411, 10139, + 1860, 420, 512, 661, 25534, 300, 390, 733, 295, 4930, 337, 3894, 14642, 411, 257, + 441, 17308, 420, 291, 458, 604, 661, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.12364180023605759, "compression_ratio": 1.619718309859155, "no_speech_prob": + 0.01772996410727501}, {"id": 149, "seek": 85648, "start": 856.48, "end": 868.48, + "text": " Well, there LSH didn''t even work so we worked with like three structures + we have to like how will you do LSH.", "tokens": [50364, 1042, 11, 456, 441, 17308, + 994, 380, 754, 589, 370, 321, 2732, 365, 411, 1045, 9227, 321, 362, 281, 411, 577, + 486, 291, 360, 441, 17308, 13, 50964], "temperature": 0.0, "avg_logprob": -0.35864014779367753, + "compression_ratio": 1.1595744680851063, "no_speech_prob": 0.16624902188777924}, + {"id": 150, "seek": 86848, "start": 868.48, "end": 893.48, "text": " Yeah, for and + for LSH so I thought that those are not practical algorithms so even when I spoke + with people who like were writing a lot of papers on LSH they like expressed doubts + and whether those algorithms are practical so they are not learnable so they cannot + take advantage of the data that you have so like that.", "tokens": [50364, 865, + 11, 337, 293, 337, 441, 17308, 370, 286, 1194, 300, 729, 366, 406, 8496, 14642, + 370, 754, 562, 286, 7179, 365, 561, 567, 411, 645, 3579, 257, 688, 295, 10577, 322, + 441, 17308, 436, 411, 12675, 22618, 293, 1968, 729, 14642, 366, 8496, 370, 436, + 366, 406, 1466, 712, 370, 436, 2644, 747, 5002, 295, 264, 1412, 300, 291, 362, 370, + 411, 300, 13, 51614], "temperature": 0.0, "avg_logprob": 
-0.1623264948527018, "compression_ratio": + 1.6476683937823835, "no_speech_prob": 0.2989859879016876}, {"id": 151, "seek": 89348, + "start": 894.08, "end": 901.48, "text": " And like what what they told is like they + see as quantization as just a better version of practical version of LSH.", "tokens": + [50394, 400, 411, 437, 437, 436, 1907, 307, 411, 436, 536, 382, 4426, 2144, 382, + 445, 257, 1101, 3037, 295, 8496, 3037, 295, 441, 17308, 13, 50764], "temperature": + 0.0, "avg_logprob": -0.14685373163934964, "compression_ratio": 1.5459183673469388, + "no_speech_prob": 0.020174356177449226}, {"id": 152, "seek": 89348, "start": 903.48, + "end": 917.48, "text": " Yeah right and so actually I''m really interested like + how did you set up to invent the algorithm like I can just give you briefly like + in the recent billion scale vector search challenge.", "tokens": [50864, 865, 558, + 293, 370, 767, 286, 478, 534, 3102, 411, 577, 630, 291, 992, 493, 281, 7962, 264, + 9284, 411, 286, 393, 445, 976, 291, 10515, 411, 294, 264, 5162, 5218, 4373, 8062, + 3164, 3430, 13, 51564], "temperature": 0.0, "avg_logprob": -0.14685373163934964, + "compression_ratio": 1.5459183673469388, "no_speech_prob": 0.020174356177449226}, + {"id": 153, "seek": 91748, "start": 918.48, "end": 939.48, "text": " We had like + a small team and one of our team members actually implemented like a small change + in product quantization layer like basically how you shuffle the dimensions in the + vector and he achieved like 12% recall increase over the baseline you know the Facebook + sell algorithm.", "tokens": [50414, 492, 632, 411, 257, 1359, 1469, 293, 472, 295, + 527, 1469, 2679, 767, 12270, 411, 257, 1359, 1319, 294, 1674, 4426, 2144, 4583, + 411, 1936, 577, 291, 39426, 264, 12819, 294, 264, 8062, 293, 415, 11042, 411, 2272, + 4, 9901, 3488, 670, 264, 20518, 291, 458, 264, 4384, 3607, 9284, 13, 51464], "temperature": + 0.0, "avg_logprob": -0.13454842133955522, "compression_ratio": 
1.5135135135135136, + "no_speech_prob": 0.0518990121781826}, {"id": 154, "seek": 93948, "start": 939.48, + "end": 955.48, "text": " I didn''t like have that much knowledge I''ve read your + paper I''ve read other papers and so I was just thinking okay if I if I would start + from first principles how would I solve it like I know nothing about this problem + right so like how can I solve you know the search in multi-dimensional space.", + "tokens": [50364, 286, 994, 380, 411, 362, 300, 709, 3601, 286, 600, 1401, 428, + 3035, 286, 600, 1401, 661, 10577, 293, 370, 286, 390, 445, 1953, 1392, 498, 286, + 498, 286, 576, 722, 490, 700, 9156, 577, 576, 286, 5039, 309, 411, 286, 458, 1825, + 466, 341, 1154, 558, 370, 411, 577, 393, 286, 5039, 291, 458, 264, 3164, 294, 4825, + 12, 18759, 1901, 13, 51164], "temperature": 0.0, "avg_logprob": -0.11763519610998766, + "compression_ratio": 1.707142857142857, "no_speech_prob": 0.005949764046818018}, + {"id": 155, "seek": 93948, "start": 955.48, "end": 967.48, "text": " And so I actually + implemented a very very simple algorithm using your algorithm as one of the components + maybe we can talk later about it but like how did you start inventing H&S W.", "tokens": + [51164, 400, 370, 286, 767, 12270, 257, 588, 588, 2199, 9284, 1228, 428, 9284, 382, + 472, 295, 264, 6677, 1310, 321, 393, 751, 1780, 466, 309, 457, 411, 577, 630, 291, + 722, 7962, 278, 389, 5, 50, 343, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.11763519610998766, "compression_ratio": 1.707142857142857, "no_speech_prob": + 0.005949764046818018}, {"id": 156, "seek": 96948, "start": 970.48, "end": 987.48, + "text": " Well H&S W had a pretty assessor so it has like an NSW it''s also called + MSW or SW graph in different places like depending on where you look so and there + I just so it had problems.", "tokens": [50414, 1042, 389, 5, 50, 343, 632, 257, + 1238, 5877, 284, 370, 309, 575, 411, 364, 15943, 54, 309, 311, 611, 1219, 7395, + 54, 420, 20346, 4295, 294, 
819, 3190, 411, 5413, 322, 689, 291, 574, 370, 293, 456, + 286, 445, 370, 309, 632, 2740, 13, 51264], "temperature": 0.0, "avg_logprob": -0.21885568268445074, + "compression_ratio": 1.3235294117647058, "no_speech_prob": 0.01921668089926243}, + {"id": 157, "seek": 98748, "start": 988.48, "end": 1005.48, "text": " So it had + several problems but like for like if you don''t think about distributed setup the + main problem it had poly algorithmic scalability with a number of elements and that + killed the performance on low dimensional data.", "tokens": [50414, 407, 309, 632, + 2940, 2740, 457, 411, 337, 411, 498, 291, 500, 380, 519, 466, 12631, 8657, 264, + 2135, 1154, 309, 632, 6754, 9284, 299, 15664, 2310, 365, 257, 1230, 295, 4959, 293, + 300, 4652, 264, 3389, 322, 2295, 18795, 1412, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.22851226640784222, "compression_ratio": 1.457516339869281, "no_speech_prob": + 0.042576249688863754}, {"id": 158, "seek": 100548, "start": 1006.48, "end": 1034.48, + "text": " So there were like comparison works like one by Leonid Bytes of where + he evaluated different algorithms and like its performance really like it didn''t + perform that well on some data set and the loss was by many orders of magnitude + so it could be like one like 1000 times slower than the best solution and yeah.", + "tokens": [50414, 407, 456, 645, 411, 9660, 1985, 411, 472, 538, 13244, 327, 3146, + 7269, 295, 689, 415, 25509, 819, 14642, 293, 411, 1080, 3389, 534, 411, 309, 994, + 380, 2042, 300, 731, 322, 512, 1412, 992, 293, 264, 4470, 390, 538, 867, 9470, 295, + 15668, 370, 309, 727, 312, 411, 472, 411, 9714, 1413, 14009, 813, 264, 1151, 3827, + 293, 1338, 13, 51814], "temperature": 0.0, "avg_logprob": -0.21878616626446062, + "compression_ratio": 1.5422885572139304, "no_speech_prob": 0.06631921976804733}, + {"id": 159, "seek": 103448, "start": 1034.48, "end": 1048.48, "text": " So the work + on H&S W were targeted at just improving the previous version 
so it wouldn''t have + this problem and like ideally would perform the best on all setups.", "tokens": + [50364, 407, 264, 589, 322, 389, 5, 50, 343, 645, 15045, 412, 445, 11470, 264, 3894, + 3037, 370, 309, 2759, 380, 362, 341, 1154, 293, 411, 22915, 576, 2042, 264, 1151, + 322, 439, 46832, 13, 51064], "temperature": 0.0, "avg_logprob": -0.1654958163990694, + "compression_ratio": 1.4285714285714286, "no_speech_prob": 0.003002072451636195}, + {"id": 160, "seek": 103448, "start": 1048.48, "end": 1052.48, "text": " So yeah + and that that that that has been solved.", "tokens": [51064, 407, 1338, 293, 300, + 300, 300, 300, 575, 668, 13041, 13, 51264], "temperature": 0.0, "avg_logprob": -0.1654958163990694, + "compression_ratio": 1.4285714285714286, "no_speech_prob": 0.003002072451636195}, + {"id": 161, "seek": 105248, "start": 1053.48, "end": 1071.48, "text": " Right but + like you still needed to add that magical age in front of it so you made it hierarchical + like what what pushed you in the direction of making it hierarchical and what what + did you think that it might work or was it like as a result of experimentation that + it proved to work.", "tokens": [50414, 1779, 457, 411, 291, 920, 2978, 281, 909, + 300, 12066, 3205, 294, 1868, 295, 309, 370, 291, 1027, 309, 35250, 804, 411, 437, + 437, 9152, 291, 294, 264, 3513, 295, 1455, 309, 35250, 804, 293, 437, 437, 630, + 291, 519, 300, 309, 1062, 589, 420, 390, 309, 411, 382, 257, 1874, 295, 37142, 300, + 309, 14617, 281, 589, 13, 51314], "temperature": 0.0, "avg_logprob": -0.1273075890919519, + "compression_ratio": 1.7005988023952097, "no_speech_prob": 0.042004723101854324}, + {"id": 162, "seek": 107148, "start": 1072.48, "end": 1097.48, "text": " Well yeah + that that that''s that''s yeah that has many ingredients in it so for for one thing + when I worked with the startup mirror labs so we had a different problem with distributed + index that NSW had a pleasant quality that the hubs that are created in the 
network + are the first elements.", "tokens": [50414, 1042, 1338, 300, 300, 300, 311, 300, + 311, 1338, 300, 575, 867, 6952, 294, 309, 370, 337, 337, 472, 551, 562, 286, 2732, + 365, 264, 18578, 8013, 20339, 370, 321, 632, 257, 819, 1154, 365, 12631, 8186, 300, + 15943, 54, 632, 257, 16232, 3125, 300, 264, 46870, 300, 366, 2942, 294, 264, 3209, + 366, 264, 700, 4959, 13, 51664], "temperature": 0.0, "avg_logprob": -0.25532833222419987, + "compression_ratio": 1.6420454545454546, "no_speech_prob": 0.03184979781508446}, + {"id": 163, "seek": 109748, "start": 1098.48, "end": 1121.48, "text": " So and the + for distributed system you would want to add new nodes to the system and you will + have much capacity like increase the capacity of the system but because all your + hubs on in the first notes like in the older notes because they have been created + before new nodes even existed.", "tokens": [50414, 407, 293, 264, 337, 12631, 1185, + 291, 576, 528, 281, 909, 777, 13891, 281, 264, 1185, 293, 291, 486, 362, 709, 6042, + 411, 3488, 264, 6042, 295, 264, 1185, 457, 570, 439, 428, 46870, 322, 294, 264, + 700, 5570, 411, 294, 264, 4906, 5570, 570, 436, 362, 668, 2942, 949, 777, 13891, + 754, 13135, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2255421169733597, + "compression_ratio": 1.7228915662650603, "no_speech_prob": 0.015513861551880836}, + {"id": 164, "seek": 112148, "start": 1121.48, "end": 1145.48, "text": " The traffic + is routed through the same old notes which make it not scalable and we spent quite + a lot of time on figuring out how to solve it and there at some point I''ve noticed + that like our NSW approach is pretty similar to skip list in terms of what what + has been what is being protest as final network.", "tokens": [50364, 440, 6419, + 307, 4020, 292, 807, 264, 912, 1331, 5570, 597, 652, 309, 406, 38481, 293, 321, + 4418, 1596, 257, 688, 295, 565, 322, 15213, 484, 577, 281, 5039, 309, 293, 456, + 412, 512, 935, 286, 600, 5694, 300, 411, 527, 
15943, 54, 3109, 307, 1238, 2531, + 281, 10023, 1329, 294, 2115, 295, 437, 437, 575, 668, 437, 307, 885, 11281, 382, + 2572, 3209, 13, 51564], "temperature": 0.0, "avg_logprob": -0.15753530419391135, + "compression_ratio": 1.5223880597014925, "no_speech_prob": 0.026150261983275414}, + {"id": 165, "seek": 114548, "start": 1145.48, "end": 1168.48, "text": " The idea + is like if you if you create a skip list for one D and create the NSW for one D + and then like for skip list you just merge the all all links regardless of player + you will get a similar network in terms of like degree distribution like distance + distribution well all major properties.", "tokens": [50364, 440, 1558, 307, 411, + 498, 291, 498, 291, 1884, 257, 10023, 1329, 337, 472, 413, 293, 1884, 264, 15943, + 54, 337, 472, 413, 293, 550, 411, 337, 10023, 1329, 291, 445, 22183, 264, 439, 439, + 6123, 10060, 295, 4256, 291, 486, 483, 257, 2531, 3209, 294, 2115, 295, 411, 4314, + 7316, 411, 4560, 7316, 731, 439, 2563, 7221, 13, 51514], "temperature": 0.0, "avg_logprob": + -0.16269907875666542, "compression_ratio": 1.7176470588235293, "no_speech_prob": + 0.03380730375647545}, {"id": 166, "seek": 116848, "start": 1168.48, "end": 1193.48, + "text": " So but skip list doesn''t have this property so you can add new nodes + and they can have like they can have higher levels and like your traffic will be + a road through notes in your form like across your distributed system so and that + thing we knew like from the startup that there is a like equivalence but that was + only for the problem of distributed search.", "tokens": [50364, 407, 457, 10023, + 1329, 1177, 380, 362, 341, 4707, 370, 291, 393, 909, 777, 13891, 293, 436, 393, + 362, 411, 436, 393, 362, 2946, 4358, 293, 411, 428, 6419, 486, 312, 257, 3060, 807, + 5570, 294, 428, 1254, 411, 2108, 428, 12631, 1185, 370, 293, 300, 551, 321, 2586, + 411, 490, 264, 18578, 300, 456, 307, 257, 411, 9052, 655, 457, 300, 390, 787, 337, + 264, 1154, 295, 12631, 
3164, 13, 51614], "temperature": 0.0, "avg_logprob": -0.23071039835611978, + "compression_ratio": 1.733009708737864, "no_speech_prob": 0.01258369255810976}, + {"id": 167, "seek": 119348, "start": 1193.48, "end": 1220.48, "text": " So it would + still use the same polylogar if Michael like 3d search algorithm like which doesn''t + think about like what is that how many links you have on a note so that was shelved + for that reasons in the startup but then so after ID PhD so like I wanted to publish + a good paper on network science.", "tokens": [50364, 407, 309, 576, 920, 764, 264, + 912, 6754, 4987, 289, 498, 5116, 411, 805, 67, 3164, 9284, 411, 597, 1177, 380, + 519, 466, 411, 437, 307, 300, 577, 867, 6123, 291, 362, 322, 257, 3637, 370, 300, + 390, 9180, 937, 337, 300, 4112, 294, 264, 18578, 457, 550, 370, 934, 7348, 14476, + 370, 411, 286, 1415, 281, 11374, 257, 665, 3035, 322, 3209, 3497, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.4356955929078918, "compression_ratio": 1.485, "no_speech_prob": + 0.06325482577085495}, {"id": 168, "seek": 122048, "start": 1220.48, "end": 1249.48, + "text": " And there like it was and I like there is there is a result that we can + create a new navigable networks which a method which was not known before so I tried + to publish it in nature so it was rejected like nature physics also rejected that + it was rejected by editors then in scientific reports was rejected after a review + and then like it was finally published in plus one.", "tokens": [50414, 400, 456, + 411, 309, 390, 293, 286, 411, 456, 307, 456, 307, 257, 1874, 300, 321, 393, 1884, + 257, 777, 7407, 712, 9590, 597, 257, 3170, 597, 390, 406, 2570, 949, 370, 286, 3031, + 281, 11374, 309, 294, 3687, 370, 309, 390, 15749, 411, 3687, 10649, 611, 15749, + 300, 309, 390, 15749, 538, 31446, 550, 294, 8134, 7122, 390, 15749, 934, 257, 3131, + 293, 550, 411, 309, 390, 2721, 6572, 294, 1804, 472, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.20691304392628856, 
"compression_ratio": 1.8235294117647058, + "no_speech_prob": 0.032385438680648804}, {"id": 169, "seek": 125048, "start": 1250.48, + "end": 1258.48, "text": " And I think like I really like this paper so that was + like the most surprising result I think I got but yet it''s not really decided.", + "tokens": [50364, 400, 286, 519, 411, 286, 534, 411, 341, 3035, 370, 300, 390, 411, + 264, 881, 8830, 1874, 286, 519, 286, 658, 457, 1939, 309, 311, 406, 534, 3047, 13, + 50764], "temperature": 0.0, "avg_logprob": -0.19177235727724823, "compression_ratio": + 1.7627118644067796, "no_speech_prob": 0.011710943654179573}, {"id": 170, "seek": + 125048, "start": 1259.48, "end": 1279.48, "text": " And as a byproduct of this I + did a comparison to other navigable small world methods and so like maybe I have + maybe like this approach with like the old vision that you can apply like you can + look at the real world networks and replicate it and like computer system and they + will be.", "tokens": [50814, 400, 382, 257, 538, 33244, 295, 341, 286, 630, 257, + 9660, 281, 661, 7407, 712, 1359, 1002, 7150, 293, 370, 411, 1310, 286, 362, 1310, + 411, 341, 3109, 365, 411, 264, 1331, 5201, 300, 291, 393, 3079, 411, 291, 393, 574, + 412, 264, 957, 1002, 9590, 293, 25356, 309, 293, 411, 3820, 1185, 293, 436, 486, + 312, 13, 51814], "temperature": 0.0, "avg_logprob": -0.19177235727724823, "compression_ratio": + 1.7627118644067796, "no_speech_prob": 0.011710943654179573}, {"id": 171, "seek": + 128048, "start": 1280.48, "end": 1290.48, "text": " So I replicate that the work + done like scale free navigable navigable small worlds which are very popular thing + till the moment all.", "tokens": [50364, 407, 286, 25356, 300, 264, 589, 1096, 411, + 4373, 1737, 7407, 712, 7407, 712, 1359, 13401, 597, 366, 588, 3743, 551, 4288, 264, + 1623, 439, 13, 50864], "temperature": 0.0, "avg_logprob": -0.3321691904312525, "compression_ratio": + 1.740566037735849, "no_speech_prob": 0.12275636196136475}, {"id": 
172, "seek": 128048, + "start": 1291.48, "end": 1309.48, "text": " And so that the performance was really + was like very bad like extremely bad and the reason for that that if you have a + scale free network and scale free means you have a power low distribution of degree + and usually they like there is a.", "tokens": [50914, 400, 370, 300, 264, 3389, + 390, 534, 390, 411, 588, 1578, 411, 4664, 1578, 293, 264, 1778, 337, 300, 300, 498, + 291, 362, 257, 4373, 1737, 3209, 293, 4373, 1737, 1355, 291, 362, 257, 1347, 2295, + 7316, 295, 4314, 293, 2673, 436, 411, 456, 307, 257, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.3321691904312525, "compression_ratio": 1.740566037735849, + "no_speech_prob": 0.12275636196136475}, {"id": 173, "seek": 131048, "start": 1310.48, + "end": 1339.48, "text": " coefficient gamma and like the best cases was gamma is + close to two but gamma close to two means that the scalability with the size of + the network so the degree scale is almost linearly so when you have a like a greedy + search for the hub so when it goes through the hub like it to play it''s like a + huge portion of the network so you have like linear scalability instead of like + ultra logarithmic so log log again which is.", "tokens": [50364, 17619, 15546, 293, + 411, 264, 1151, 3331, 390, 15546, 307, 1998, 281, 732, 457, 15546, 1998, 281, 732, + 1355, 300, 264, 15664, 2310, 365, 264, 2744, 295, 264, 3209, 370, 264, 4314, 4373, + 307, 1920, 8213, 356, 370, 562, 291, 362, 257, 411, 257, 28228, 3164, 337, 264, + 11838, 370, 562, 309, 1709, 807, 264, 11838, 411, 309, 281, 862, 309, 311, 411, + 257, 2603, 8044, 295, 264, 3209, 370, 291, 362, 411, 8213, 15664, 2310, 2602, 295, + 411, 14808, 41473, 355, 13195, 370, 3565, 3565, 797, 597, 307, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.2759508522607947, "compression_ratio": 1.8923766816143497, + "no_speech_prob": 0.02611459605395794}, {"id": 174, "seek": 134048, "start": 1340.48, + "end": 1369.48, "text": " They 
like the number of hope is log log in but at some + point you evaluate to like almost every point in your network and you have like + really bad performance and that like that after that realized what was the problem + with NSW and like I thought all like we already have a solution for that so because + keep least doesn''t have this problem and so yeah after that I implemented the prototype + and it worked.", "tokens": [50364, 814, 411, 264, 1230, 295, 1454, 307, 3565, 3565, + 294, 457, 412, 512, 935, 291, 13059, 281, 411, 1920, 633, 935, 294, 428, 3209, 293, + 291, 362, 411, 534, 1578, 3389, 293, 300, 411, 300, 934, 300, 5334, 437, 390, 264, + 1154, 365, 15943, 54, 293, 411, 286, 1194, 439, 411, 321, 1217, 362, 257, 3827, + 337, 300, 370, 570, 1066, 1935, 1177, 380, 362, 341, 1154, 293, 370, 1338, 934, + 300, 286, 12270, 264, 19475, 293, 309, 2732, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.24544520550463572, "compression_ratio": 1.7046413502109705, "no_speech_prob": + 0.024636508896946907}, {"id": 175, "seek": 137048, "start": 1370.48, "end": 1377.48, + "text": " Working on the like C++ version and the evaluation.", "tokens": [50364, + 18337, 322, 264, 411, 383, 25472, 3037, 293, 264, 13344, 13, 50714], "temperature": + 0.0, "avg_logprob": -0.27485483033316477, "compression_ratio": 1.5225806451612902, + "no_speech_prob": 0.0366550087928772}, {"id": 176, "seek": 137048, "start": 1377.48, + "end": 1392.48, "text": " By the way when when you started implementing your prototype + was it initially in C++ now it wasn''t in in in in Java and Java because Java is + your favorite language or what was it Java.", "tokens": [50714, 3146, 264, 636, + 562, 562, 291, 1409, 18114, 428, 19475, 390, 309, 9105, 294, 383, 25472, 586, 309, + 2067, 380, 294, 294, 294, 294, 10745, 293, 10745, 570, 10745, 307, 428, 2954, 2856, + 420, 437, 390, 309, 10745, 13, 51464], "temperature": 0.0, "avg_logprob": -0.27485483033316477, + "compression_ratio": 1.5225806451612902, 
"no_speech_prob": 0.0366550087928772}, + {"id": 177, "seek": 139248, "start": 1392.48, "end": 1409.48, "text": " Because + the distributed system like that was implemented in Java so that was close so like + it was easier to integrate like if you like maybe you were thinking it''s easier + to integrate in Java right.", "tokens": [50364, 1436, 264, 12631, 1185, 411, 300, + 390, 12270, 294, 10745, 370, 300, 390, 1998, 370, 411, 309, 390, 3571, 281, 13365, + 411, 498, 291, 411, 1310, 291, 645, 1953, 309, 311, 3571, 281, 13365, 294, 10745, + 558, 13, 51214], "temperature": 0.0, "avg_logprob": -0.2186222748017647, "compression_ratio": + 1.6684491978609626, "no_speech_prob": 0.08544904738664627}, {"id": 178, "seek": + 139248, "start": 1409.48, "end": 1420.48, "text": " Well I just know how to code + it in Java so I code that several times for NSW and that all Java code was released.", + "tokens": [51214, 1042, 286, 445, 458, 577, 281, 3089, 309, 294, 10745, 370, 286, + 3089, 300, 2940, 1413, 337, 15943, 54, 293, 300, 439, 10745, 3089, 390, 4736, 13, + 51764], "temperature": 0.0, "avg_logprob": -0.2186222748017647, "compression_ratio": + 1.6684491978609626, "no_speech_prob": 0.08544904738664627}, {"id": 179, "seek": + 142048, "start": 1421.48, "end": 1446.48, "text": " So yeah just code it and then + like I had to transfer it to C++ to make it efficient and like yeah and so there + is like Leonid bites off so who who is a maintainer of an MS leap so I have been + in contact with him for quite a while and yeah so it was implemented in the library.", + "tokens": [50414, 407, 1338, 445, 3089, 309, 293, 550, 411, 286, 632, 281, 5003, + 309, 281, 383, 25472, 281, 652, 309, 7148, 293, 411, 1338, 293, 370, 456, 307, 411, + 13244, 327, 26030, 766, 370, 567, 567, 307, 257, 6909, 260, 295, 364, 7395, 19438, + 370, 286, 362, 668, 294, 3385, 365, 796, 337, 1596, 257, 1339, 293, 1338, 370, 309, + 390, 12270, 294, 264, 6405, 13, 51664], "temperature": 0.0, "avg_logprob": 
-0.27116899904997455, + "compression_ratio": 1.541899441340782, "no_speech_prob": 0.02220718376338482}, + {"id": 180, "seek": 144648, "start": 1446.48, "end": 1456.48, "text": " Did you + like collaborate with him to to to implement it using the enemy sleep sort of the + most efficient way or.", "tokens": [50364, 2589, 291, 411, 18338, 365, 796, 281, + 281, 281, 4445, 309, 1228, 264, 5945, 2817, 1333, 295, 264, 881, 7148, 636, 420, + 13, 50864], "temperature": 0.0, "avg_logprob": -0.21896951728396946, "compression_ratio": + 1.5422885572139304, "no_speech_prob": 0.014739437028765678}, {"id": 181, "seek": + 144648, "start": 1456.48, "end": 1474.48, "text": " Well first of all like the ideology + of the library is very close to like what we have been developing so it''s not only + focused on like typical distances like L2, Cassian or even like inner product.", + "tokens": [50864, 1042, 700, 295, 439, 411, 264, 23101, 295, 264, 6405, 307, 588, + 1998, 281, 411, 437, 321, 362, 668, 6416, 370, 309, 311, 406, 787, 5178, 322, 411, + 7476, 22182, 411, 441, 17, 11, 383, 640, 952, 420, 754, 411, 7284, 1674, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.21896951728396946, "compression_ratio": 1.5422885572139304, + "no_speech_prob": 0.014739437028765678}, {"id": 182, "seek": 147448, "start": 1475.48, + "end": 1489.48, "text": " So yeah it makes sense to compare on all those distances + and Leonid also had a paper like in a bench like on all of those so we can just + implement a new algorithm and run a bench.", "tokens": [50414, 407, 1338, 309, 1669, + 2020, 281, 6794, 322, 439, 729, 22182, 293, 13244, 327, 611, 632, 257, 3035, 411, + 294, 257, 10638, 411, 322, 439, 295, 729, 370, 321, 393, 445, 4445, 257, 777, 9284, + 293, 1190, 257, 10638, 13, 51114], "temperature": 0.0, "avg_logprob": -0.2149033424181816, + "compression_ratio": 1.703125, "no_speech_prob": 0.016881758347153664}, {"id": 183, + "seek": 147448, "start": 1489.48, "end": 1502.48, "text": " Right and come so 
that + that was like a really good point and it also wasn''t implemented in and benchmark + so if you add an algorithm so we can like.", "tokens": [51114, 1779, 293, 808, 370, + 300, 300, 390, 411, 257, 534, 665, 935, 293, 309, 611, 2067, 380, 12270, 294, 293, + 18927, 370, 498, 291, 909, 364, 9284, 370, 321, 393, 411, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.2149033424181816, "compression_ratio": 1.703125, "no_speech_prob": + 0.016881758347153664}, {"id": 184, "seek": 150248, "start": 1503.48, "end": 1518.48, + "text": " Go through all sets of benchmarks yeah yeah yeah so like it was kind of + easy for you to evaluate where you algorithm stands against other algorithms right + so like yes right right and so what was the.", "tokens": [50414, 1037, 807, 439, + 6352, 295, 43751, 1338, 1338, 1338, 370, 411, 309, 390, 733, 295, 1858, 337, 291, + 281, 13059, 689, 291, 9284, 7382, 1970, 661, 14642, 558, 370, 411, 2086, 558, 558, + 293, 370, 437, 390, 264, 13, 51164], "temperature": 0.0, "avg_logprob": -0.1774702650127989, + "compression_ratio": 1.6629213483146068, "no_speech_prob": 0.006451904773712158}, + {"id": 185, "seek": 150248, "start": 1518.48, "end": 1525.48, "text": " And you + also had a call to write maybe maybe you could introduce him as well like on your + paper.", "tokens": [51164, 400, 291, 611, 632, 257, 818, 281, 2464, 1310, 1310, + 291, 727, 5366, 796, 382, 731, 411, 322, 428, 3035, 13, 51514], "temperature": 0.0, + "avg_logprob": -0.1774702650127989, "compression_ratio": 1.6629213483146068, "no_speech_prob": + 0.006451904773712158}, {"id": 186, "seek": 152548, "start": 1525.48, "end": 1530.48, + "text": " Oh yeah that that is midrida shunyan so.", "tokens": [50364, 876, 1338, + 300, 300, 307, 2062, 81, 2887, 402, 409, 6277, 370, 13, 50614], "temperature": 0.0, + "avg_logprob": -0.4161293195641559, "compression_ratio": 1.3070175438596492, "no_speech_prob": + 0.04281523451209068}, {"id": 187, "seek": 152548, "start": 1530.48, "end": 1543.48, + 
"text": " So that''s that that that was my peer in the like physics lab he also + got PhD the same year I did so yeah so.", "tokens": [50614, 407, 300, 311, 300, + 300, 300, 390, 452, 15108, 294, 264, 411, 10649, 2715, 415, 611, 658, 14476, 264, + 912, 1064, 286, 630, 370, 1338, 370, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.4161293195641559, "compression_ratio": 1.3070175438596492, "no_speech_prob": + 0.04281523451209068}, {"id": 188, "seek": 154348, "start": 1544.48, "end": 1555.48, + "text": " Yeah so we decided to team up with that so he helped a lot on he he did + he did the all evaluation so he integrated it with other code.", "tokens": [50414, + 865, 370, 321, 3047, 281, 1469, 493, 365, 300, 370, 415, 4254, 257, 688, 322, 415, + 415, 630, 415, 630, 264, 439, 13344, 370, 415, 10919, 309, 365, 661, 3089, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.1683224074694575, "compression_ratio": 1.8125, + "no_speech_prob": 0.15242688357830048}, {"id": 189, "seek": 154348, "start": 1555.48, + "end": 1560.48, "text": " And here on the evaluation on the like clusters that we + had.", "tokens": [50964, 400, 510, 322, 264, 13344, 322, 264, 411, 23313, 300, 321, + 632, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1683224074694575, "compression_ratio": + 1.8125, "no_speech_prob": 0.15242688357830048}, {"id": 190, "seek": 154348, "start": + 1560.48, "end": 1572.48, "text": " Yeah at that point nice so so back to the invention + like as you''ve been inventing this elbow did you have to make a lot of adjustments + to the core of the algorithm as you have been evaluating it or was it like.", "tokens": + [51214, 865, 412, 300, 935, 1481, 370, 370, 646, 281, 264, 22265, 411, 382, 291, + 600, 668, 7962, 278, 341, 18507, 630, 291, 362, 281, 652, 257, 688, 295, 18624, + 281, 264, 4965, 295, 264, 9284, 382, 291, 362, 668, 27479, 309, 420, 390, 309, 411, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.1683224074694575, "compression_ratio": + 1.8125, 
"no_speech_prob": 0.15242688357830048}, {"id": 191, "seek": 157248, "start": + 1572.48, "end": 1577.48, "text": " You know the first shot and it was it.", "tokens": + [50364, 509, 458, 264, 700, 3347, 293, 309, 390, 309, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.2782336984361921, "compression_ratio": 1.4705882352941178, + "no_speech_prob": 0.016889402642846107}, {"id": 192, "seek": 157248, "start": 1577.48, + "end": 1597.48, "text": " Well not really so there are like two changes compared + to NSW in the national SW so first one is the idea of layers so that''s all most + of the problems with like low dimensional data and.", "tokens": [50614, 1042, 406, + 534, 370, 456, 366, 411, 732, 2962, 5347, 281, 15943, 54, 294, 264, 4048, 20346, + 370, 700, 472, 307, 264, 1558, 295, 7914, 370, 300, 311, 439, 881, 295, 264, 2740, + 365, 411, 2295, 18795, 1412, 293, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.2782336984361921, "compression_ratio": 1.4705882352941178, "no_speech_prob": + 0.016889402642846107}, {"id": 193, "seek": 159748, "start": 1597.48, "end": 1620.48, + "text": " Yeah also improve performance like in most of the tax that like most of + the distributions even like but maybe not much like high dimensional data but still + when I ran the whole like suit that was there was a few data set on we should perform + worse compared to.", "tokens": [50364, 865, 611, 3470, 3389, 411, 294, 881, 295, + 264, 3366, 300, 411, 881, 295, 264, 37870, 754, 411, 457, 1310, 406, 709, 411, 1090, + 18795, 1412, 457, 920, 562, 286, 5872, 264, 1379, 411, 5722, 300, 390, 456, 390, + 257, 1326, 1412, 992, 322, 321, 820, 2042, 5324, 5347, 281, 13, 51514], "temperature": + 0.0, "avg_logprob": -0.24169871590354225, "compression_ratio": 1.6149068322981366, + "no_speech_prob": 0.02894921787083149}, {"id": 194, "seek": 162048, "start": 1620.48, + "end": 1636.48, "text": " VP3 so that''s from Leonits you then I thought that wasn''t + a big deal but like communicated the results 
with Leonit trying to convince him + that like we don''t need to have that much algorithms.", "tokens": [50364, 35812, + 18, 370, 300, 311, 490, 13244, 1208, 291, 550, 286, 1194, 300, 2067, 380, 257, 955, + 2028, 457, 411, 34989, 264, 3542, 365, 13244, 270, 1382, 281, 13447, 796, 300, 411, + 321, 500, 380, 643, 281, 362, 300, 709, 14642, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.277025844739831, "compression_ratio": 1.3642857142857143, "no_speech_prob": 0.09887982904911041}, + {"id": 195, "seek": 163648, "start": 1636.48, "end": 1654.48, "text": " But he was + not convinced so then we added like an improvement with the heuristic for selecting + the neighbors which like I personally knew from the work on spatial approximation + three.", "tokens": [50364, 583, 415, 390, 406, 12561, 370, 550, 321, 3869, 411, + 364, 10444, 365, 264, 415, 374, 3142, 337, 18182, 264, 12512, 597, 411, 286, 5665, + 2586, 490, 264, 589, 322, 23598, 28023, 1045, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.12329130423696417, "compression_ratio": 1.3656716417910448, "no_speech_prob": + 0.07337860763072968}, {"id": 196, "seek": 165448, "start": 1654.48, "end": 1675.48, + "text": " That made that that made the transition to skip list exact so it made + an exact so you can build the exact skip list in one day using this heuristic and + after that so yeah that that that that addition improved the performance yeah.", + "tokens": [50364, 663, 1027, 300, 300, 1027, 264, 6034, 281, 10023, 1329, 1900, + 370, 309, 1027, 364, 1900, 370, 291, 393, 1322, 264, 1900, 10023, 1329, 294, 472, + 786, 1228, 341, 415, 374, 3142, 293, 934, 300, 370, 1338, 300, 300, 300, 300, 4500, + 9689, 264, 3389, 1338, 13, 51414], "temperature": 0.0, "avg_logprob": -0.3832649530148974, + "compression_ratio": 1.7164179104477613, "no_speech_prob": 0.10238711535930634}, + {"id": 197, "seek": 167548, "start": 1675.48, "end": 1691.48, "text": " I remember + please correctly if i''m wrong but like I''ve read your 
paper actually really really + closely so I printed it and you know like I was reading with the pencil actually + making notes so remember like at some point was it so that you agree to them it + also needs to prove that it will converge.", "tokens": [50364, 286, 1604, 1767, + 8944, 498, 741, 478, 2085, 457, 411, 286, 600, 1401, 428, 3035, 767, 534, 534, 8185, + 370, 286, 13567, 309, 293, 291, 458, 411, 286, 390, 3760, 365, 264, 10985, 767, + 1455, 5570, 370, 1604, 411, 412, 512, 935, 390, 309, 370, 300, 291, 3986, 281, 552, + 309, 611, 2203, 281, 7081, 300, 309, 486, 41881, 13, 51164], "temperature": 0.0, + "avg_logprob": -0.170175239443779, "compression_ratio": 1.6408839779005524, "no_speech_prob": + 0.3042680323123932}, {"id": 198, "seek": 169148, "start": 1691.48, "end": 1717.48, + "text": " Or like because you keep resuffling the points in some way right like + as you build it you use multiple threads like in order to kind of build the actual + paths between the nodes between layers right so like do you need to kind of still + somehow make sure that it will converge on all dimensionality so all spaces or was + it was it not necessary.", "tokens": [50364, 1610, 411, 570, 291, 1066, 725, 1245, + 1688, 264, 2793, 294, 512, 636, 558, 411, 382, 291, 1322, 309, 291, 764, 3866, 19314, + 411, 294, 1668, 281, 733, 295, 1322, 264, 3539, 14518, 1296, 264, 13891, 1296, 7914, + 558, 370, 411, 360, 291, 643, 281, 733, 295, 920, 6063, 652, 988, 300, 309, 486, + 41881, 322, 439, 10139, 1860, 370, 439, 7673, 420, 390, 309, 390, 309, 406, 4818, + 13, 51664], "temperature": 0.0, "avg_logprob": -0.1653478596661542, "compression_ratio": + 1.693069306930693, "no_speech_prob": 0.19410616159439087}, {"id": 199, "seek": 171748, + "start": 1717.48, "end": 1746.48, "text": " Well so the algorithm is pretty stable + so the result is like how many threads you can go that is an empirical result so + I was surprised when I saw it but you know even like for NSW the first algorithm + even if 
you start to do like to use I know 40 threads from a single element I can + found no no no drop in the recall.", "tokens": [50364, 1042, 370, 264, 9284, 307, + 1238, 8351, 370, 264, 1874, 307, 411, 577, 867, 19314, 291, 393, 352, 300, 307, + 364, 31886, 1874, 370, 286, 390, 6100, 562, 286, 1866, 309, 457, 291, 458, 754, + 411, 337, 15943, 54, 264, 700, 9284, 754, 498, 291, 722, 281, 360, 411, 281, 764, + 286, 458, 3356, 19314, 490, 257, 2167, 4478, 286, 393, 1352, 572, 572, 572, 3270, + 294, 264, 9901, 13, 51814], "temperature": 0.0, "avg_logprob": -0.2161735586217932, + "compression_ratio": 1.601010101010101, "no_speech_prob": 0.08253836631774902}, + {"id": 200, "seek": 174648, "start": 1746.48, "end": 1751.48, "text": " No drop + in the recall or speed that was a bit surprising.", "tokens": [50364, 883, 3270, + 294, 264, 9901, 420, 3073, 300, 390, 257, 857, 8830, 13, 50614], "temperature": + 0.0, "avg_logprob": -0.16134048330372777, "compression_ratio": 1.5394736842105263, + "no_speech_prob": 0.002386378822848201}, {"id": 201, "seek": 174648, "start": 1751.48, + "end": 1755.48, "text": " In terms of stability.", "tokens": [50614, 682, 2115, + 295, 11826, 13, 50814], "temperature": 0.0, "avg_logprob": -0.16134048330372777, + "compression_ratio": 1.5394736842105263, "no_speech_prob": 0.002386378822848201}, + {"id": 202, "seek": 174648, "start": 1755.48, "end": 1768.48, "text": " So the main + way to make it stable is just like to avoid avoid exploring so like use use proper + parameters big enough there are ways to make it stable in.", "tokens": [50814, 407, + 264, 2135, 636, 281, 652, 309, 8351, 307, 445, 411, 281, 5042, 5042, 12736, 370, + 411, 764, 764, 2296, 9834, 955, 1547, 456, 366, 2098, 281, 652, 309, 8351, 294, + 13, 51464], "temperature": 0.0, "avg_logprob": -0.16134048330372777, "compression_ratio": + 1.5394736842105263, "no_speech_prob": 0.002386378822848201}, {"id": 203, "seek": + 176848, "start": 1768.48, "end": 1789.48, "text": " For corruption so 
when when + but that that that that is pretty costly so if you bootstrap the graph so if you + like do iterations like similar to an undecent I think you probably know that I + can make it stable even if it''s corrupted by a lot.", "tokens": [50364, 1171, 17959, + 370, 562, 562, 457, 300, 300, 300, 300, 307, 1238, 28328, 370, 498, 291, 11450, + 372, 4007, 264, 4295, 370, 498, 291, 411, 360, 36540, 411, 2531, 281, 364, 674, + 3045, 317, 286, 519, 291, 1391, 458, 300, 286, 393, 652, 309, 8351, 754, 498, 309, + 311, 39480, 538, 257, 688, 13, 51414], "temperature": 0.0, "avg_logprob": -0.20094226968699488, + "compression_ratio": 1.5686274509803921, "no_speech_prob": 0.16928915679454803}, + {"id": 204, "seek": 178948, "start": 1789.48, "end": 1801.48, "text": " So that + is done only for updates so like when you update your kind of corrupting the graph + and well in the like a channels WLIP.", "tokens": [50364, 407, 300, 307, 1096, 787, + 337, 9205, 370, 411, 562, 291, 5623, 428, 733, 295, 17366, 278, 264, 4295, 293, + 731, 294, 264, 411, 257, 9235, 343, 43, 9139, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.24967892198677522, "compression_ratio": 1.6930232558139535, "no_speech_prob": + 0.050628744065761566}, {"id": 205, "seek": 178948, "start": 1801.48, "end": 1817.48, + "text": " So for updates it wasn''t specifically made to be like very stable but + for just construction it doesn''t have to be like that''s stable doesn''t have to + conversion all situation just keep the parameters high enough and it wouldn''t diverge.", + "tokens": [50964, 407, 337, 9205, 309, 2067, 380, 4682, 1027, 281, 312, 411, 588, + 8351, 457, 337, 445, 6435, 309, 1177, 380, 362, 281, 312, 411, 300, 311, 8351, 1177, + 380, 362, 281, 14298, 439, 2590, 445, 1066, 264, 9834, 1090, 1547, 293, 309, 2759, + 380, 18558, 432, 13, 51764], "temperature": 0.0, "avg_logprob": -0.24967892198677522, + "compression_ratio": 1.6930232558139535, "no_speech_prob": 0.050628744065761566}, + {"id": 206, 
"seek": 181748, "start": 1817.48, "end": 1846.48, "text": " Right yeah + because I remember like and I''m also curious to hear your opinion so then I after + your paper I started reading other papers for example the Microsoft''s zoom algorithm + and then later they called it discount and with some modifications so they were + comparing to etch and SW at larger scale something like billions of nodes billions + of points in the space right.", "tokens": [50364, 1779, 1338, 570, 286, 1604, 411, + 293, 286, 478, 611, 6369, 281, 1568, 428, 4800, 370, 550, 286, 934, 428, 3035, 286, + 1409, 3760, 661, 10577, 337, 1365, 264, 8116, 311, 8863, 9284, 293, 550, 1780, 436, + 1219, 309, 11635, 293, 365, 512, 26881, 370, 436, 645, 15763, 281, 1030, 339, 293, + 20346, 412, 4833, 4373, 746, 411, 17375, 295, 13891, 17375, 295, 2793, 294, 264, + 1901, 558, 13, 51814], "temperature": 0.0, "avg_logprob": -0.22679744354666095, + "compression_ratio": 1.608695652173913, "no_speech_prob": 0.003249343251809478}, + {"id": 207, "seek": 184648, "start": 1846.48, "end": 1871.48, "text": " And so they + they were trying to minimize the cost that that that it will incur because basically + as you build the H and SW you also use memory quite a bit right so I wanted to hear + your opinion on that part and then they what they did is that they I don''t know + if you''re familiar with these papers but what they did is that they offloaded portion + of the retrieval to to an SSD disk.", "tokens": [50364, 400, 370, 436, 436, 645, + 1382, 281, 17522, 264, 2063, 300, 300, 300, 309, 486, 35774, 570, 1936, 382, 291, + 1322, 264, 389, 293, 20346, 291, 611, 764, 4675, 1596, 257, 857, 558, 370, 286, + 1415, 281, 1568, 428, 4800, 322, 300, 644, 293, 550, 436, 437, 436, 630, 307, 300, + 436, 286, 500, 380, 458, 498, 291, 434, 4963, 365, 613, 10577, 457, 437, 436, 630, + 307, 300, 436, 766, 2907, 292, 8044, 295, 264, 19817, 3337, 281, 281, 364, 30262, + 12355, 13, 51614], "temperature": 0.0, "avg_logprob": 
-0.15148478204553778, "compression_ratio": + 1.668122270742358, "no_speech_prob": 0.011889033019542694}, {"id": 208, "seek": + 187148, "start": 1871.48, "end": 1884.48, "text": " And so they kind of combined + your algorithm with like additional layers and then they kind of resolve to full + precision when they go to SSD disk but they don''t don''t do it in memory.", "tokens": + [50364, 400, 370, 436, 733, 295, 9354, 428, 9284, 365, 411, 4497, 7914, 293, 550, + 436, 733, 295, 14151, 281, 1577, 18356, 562, 436, 352, 281, 30262, 12355, 457, 436, + 500, 380, 500, 380, 360, 309, 294, 4675, 13, 51014], "temperature": 0.0, "avg_logprob": + -0.1278900771305479, "compression_ratio": 1.575, "no_speech_prob": 0.08497486263513565}, + {"id": 209, "seek": 187148, "start": 1884.48, "end": 1889.48, "text": " So they + do use quantization right yeah they use quantization exactly.", "tokens": [51014, + 407, 436, 360, 764, 4426, 2144, 558, 1338, 436, 764, 4426, 2144, 2293, 13, 51264], + "temperature": 0.0, "avg_logprob": -0.1278900771305479, "compression_ratio": 1.575, + "no_speech_prob": 0.08497486263513565}, {"id": 210, "seek": 188948, "start": 1890.48, + "end": 1911.48, "text": " That''s a very popular approach and that makes sense so + it''s so basically you have a hardware limitation so that you can can store but + you have a hardware here are here so you have like not so big RAM and like lots + of SSDs and maybe like if you have distributed system you have access to other nodes.", + "tokens": [50414, 663, 311, 257, 588, 3743, 3109, 293, 300, 1669, 2020, 370, 309, + 311, 370, 1936, 291, 362, 257, 8837, 27432, 370, 300, 291, 393, 393, 3531, 457, + 291, 362, 257, 8837, 510, 366, 510, 370, 291, 362, 411, 406, 370, 955, 14561, 293, + 411, 3195, 295, 30262, 82, 293, 1310, 411, 498, 291, 362, 12631, 1185, 291, 362, + 2105, 281, 661, 13891, 13, 51464], "temperature": 0.0, "avg_logprob": -0.20697151013274692, + "compression_ratio": 1.6611111111111112, "no_speech_prob": 
0.5008446574211121}, + {"id": 211, "seek": 191148, "start": 1912.48, "end": 1916.48, "text": " So yeah + that''s a clever use of here are here that makes sense.", "tokens": [50414, 407, + 1338, 300, 311, 257, 13494, 764, 295, 510, 366, 510, 300, 1669, 2020, 13, 50614], + "temperature": 0.0, "avg_logprob": -0.1383817396968244, "compression_ratio": 1.6388888888888888, + "no_speech_prob": 0.018013939261436462}, {"id": 212, "seek": 191148, "start": 1916.48, + "end": 1928.48, "text": " But at the same time like your algorithm was taken into + using to popular frameworks like files right so like files is not a single algorithm + like one of them as H and SW and then.", "tokens": [50614, 583, 412, 264, 912, 565, + 411, 428, 9284, 390, 2726, 666, 1228, 281, 3743, 29834, 411, 7098, 558, 370, 411, + 7098, 307, 406, 257, 2167, 9284, 411, 472, 295, 552, 382, 389, 293, 20346, 293, + 550, 13, 51214], "temperature": 0.0, "avg_logprob": -0.1383817396968244, "compression_ratio": + 1.6388888888888888, "no_speech_prob": 0.018013939261436462}, {"id": 213, "seek": + 191148, "start": 1928.48, "end": 1937.48, "text": " Actually don''t know how they + did it did they take your C++ dependency directly or did they implemented do know.", + "tokens": [51214, 5135, 500, 380, 458, 577, 436, 630, 309, 630, 436, 747, 428, 383, + 25472, 33621, 3838, 420, 630, 436, 12270, 360, 458, 13, 51664], "temperature": 0.0, + "avg_logprob": -0.1383817396968244, "compression_ratio": 1.6388888888888888, "no_speech_prob": + 0.018013939261436462}, {"id": 214, "seek": 193748, "start": 1937.48, "end": 1957.48, + "text": " They very implemented from scratch so like I talked to them once so they + said they tried different way but like in the end it was like pretty close to the + like the initial C++ library don''t have some different there is some if something''s + are implemented differently in fights.", "tokens": [50364, 814, 588, 12270, 490, + 8459, 370, 411, 286, 2825, 281, 552, 1564, 370, 436, 848, 436, 3031, 
819, 636, 457, + 411, 294, 264, 917, 309, 390, 411, 1238, 1998, 281, 264, 411, 264, 5883, 383, 25472, + 6405, 500, 380, 362, 512, 819, 456, 307, 512, 498, 746, 311, 366, 12270, 7614, 294, + 14512, 13, 51364], "temperature": 0.0, "avg_logprob": -0.3318731373754041, "compression_ratio": + 1.6104651162790697, "no_speech_prob": 0.020452603697776794}, {"id": 215, "seek": + 195748, "start": 1957.48, "end": 1975.48, "text": " So for instance there is a thread + pool like in channels WL for keeping track of visited elements so when you have + a new search if there is like a map like think of a bitmap for which knows which + notes in the network are visited.", "tokens": [50364, 407, 337, 5197, 456, 307, + 257, 7207, 7005, 411, 294, 9235, 343, 43, 337, 5145, 2837, 295, 11220, 4959, 370, + 562, 291, 362, 257, 777, 3164, 498, 456, 307, 411, 257, 4471, 411, 519, 295, 257, + 857, 24223, 337, 597, 3255, 597, 5570, 294, 264, 3209, 366, 11220, 13, 51264], "temperature": + 0.0, "avg_logprob": -0.2739807884648161, "compression_ratio": 1.52, "no_speech_prob": + 0.032176580280065536}, {"id": 216, "seek": 197548, "start": 1976.48, "end": 1991.48, + "text": " And the channels WL it''s kept in memory all the time and when you have + like a new search it will just like peaks from the pool and then face like it creates + it once per search so there are much searches more more effective.", "tokens": [50414, + 400, 264, 9235, 343, 43, 309, 311, 4305, 294, 4675, 439, 264, 565, 293, 562, 291, + 362, 411, 257, 777, 3164, 309, 486, 445, 411, 26897, 490, 264, 7005, 293, 550, 1851, + 411, 309, 7829, 309, 1564, 680, 3164, 370, 456, 366, 709, 26701, 544, 544, 4942, + 13, 51164], "temperature": 0.0, "avg_logprob": -0.3434600463280311, "compression_ratio": + 1.4966442953020134, "no_speech_prob": 0.0547904446721077}, {"id": 217, "seek": 199148, + "start": 1992.48, "end": 2017.48, "text": " Yeah yeah yeah by search yeah batch + search is another feature that sometimes is implemented in vector databases 
but + did you like expect that your algorithm would become so widely applicable like do + you know that it has been re implemented in several languages like for example as + part of vector database called V ev8 it was implemented in goal.", "tokens": [50414, + 865, 1338, 1338, 538, 3164, 1338, 15245, 3164, 307, 1071, 4111, 300, 2171, 307, + 12270, 294, 8062, 22380, 457, 630, 291, 411, 2066, 300, 428, 9284, 576, 1813, 370, + 13371, 21142, 411, 360, 291, 458, 300, 309, 575, 668, 319, 12270, 294, 2940, 8650, + 411, 337, 1365, 382, 644, 295, 8062, 8149, 1219, 691, 1073, 23, 309, 390, 12270, + 294, 3387, 13, 51664], "temperature": 0.0, "avg_logprob": -0.24775661121715198, + "compression_ratio": 1.72, "no_speech_prob": 0.04997752234339714}, {"id": 218, "seek": + 201748, "start": 2017.48, "end": 2032.48, "text": " And there is a database called + quadrant it it''s implemented in rast and of course all of these implementations + also add like crowd support so they you can actually update right because in reality + in database you need these features.", "tokens": [50364, 400, 456, 307, 257, 8149, + 1219, 10787, 7541, 309, 309, 311, 12270, 294, 367, 525, 293, 295, 1164, 439, 295, + 613, 4445, 763, 611, 909, 411, 6919, 1406, 370, 436, 291, 393, 767, 5623, 558, 570, + 294, 4103, 294, 8149, 291, 643, 613, 4122, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.16049987841875124, "compression_ratio": 1.7096774193548387, "no_speech_prob": + 0.008340234868228436}, {"id": 219, "seek": 201748, "start": 2032.48, "end": 2044.48, + "text": " And then they also added symbolic filtering on top of that so it''s also + inside your algorithm like did you did you expect such popularity.", "tokens": [51114, + 400, 550, 436, 611, 3869, 25755, 30822, 322, 1192, 295, 300, 370, 309, 311, 611, + 1854, 428, 9284, 411, 630, 291, 630, 291, 2066, 1270, 19301, 13, 51714], "temperature": + 0.0, "avg_logprob": -0.16049987841875124, "compression_ratio": 1.7096774193548387, + "no_speech_prob": 
0.008340234868228436}, {"id": 220, "seek": 204448, "start": 2045.48, + "end": 2052.48, "text": " No no like I thought that we will publish the algorithm + and like we will win the benchmarks and we''re clearly seeing.", "tokens": [50414, + 883, 572, 411, 286, 1194, 300, 321, 486, 11374, 264, 9284, 293, 411, 321, 486, 1942, + 264, 43751, 293, 321, 434, 4448, 2577, 13, 50764], "temperature": 0.0, "avg_logprob": + -0.22544760704040528, "compression_ratio": 1.863849765258216, "no_speech_prob": + 0.028511296957731247}, {"id": 221, "seek": 204448, "start": 2052.48, "end": 2072.48, + "text": " But though at that time like just before we published the benchmark there + was a like competitor Falcon which also published the benchmarks of widget better + but like for Falcon targeted like not like that much and I thought that well Falcon + was only like for few specific metrics.", "tokens": [50764, 583, 1673, 412, 300, + 565, 411, 445, 949, 321, 6572, 264, 18927, 456, 390, 257, 411, 27266, 31801, 597, + 611, 6572, 264, 43751, 295, 34047, 1101, 457, 411, 337, 31801, 15045, 411, 406, + 411, 300, 709, 293, 286, 1194, 300, 731, 31801, 390, 787, 411, 337, 1326, 2685, + 16367, 13, 51764], "temperature": 0.0, "avg_logprob": -0.22544760704040528, "compression_ratio": + 1.863849765258216, "no_speech_prob": 0.028511296957731247}, {"id": 222, "seek": + 207248, "start": 2072.48, "end": 2096.48, "text": " And yeah actually it also was + done by like person from the same school which I went so it was in jarrison stain + so I talked with him a bit and I thought that like we have open source to code so + we published the paper so like people will quickly just like iterate on top of that + and like improve.", "tokens": [50364, 400, 1338, 767, 309, 611, 390, 1096, 538, + 411, 954, 490, 264, 912, 1395, 597, 286, 1437, 370, 309, 390, 294, 361, 2284, 2770, + 16441, 370, 286, 2825, 365, 796, 257, 857, 293, 286, 1194, 300, 411, 321, 362, 1269, + 4009, 281, 3089, 370, 321, 6572, 264, 3035, 370, 411, 
561, 486, 2661, 445, 411, + 44497, 322, 1192, 295, 300, 293, 411, 3470, 13, 51564], "temperature": 0.0, "avg_logprob": + -0.30870457256541534, "compression_ratio": 1.560846560846561, "no_speech_prob": + 0.018862580880522728}, {"id": 223, "seek": 209648, "start": 2096.48, "end": 2125.48, + "text": " But yeah so it took much more time to others to improve upon it compared + to what I''ve expected and maybe that was due to lack of interest maybe that was + to some inertia so I don''t know like looking at the how many startups and solutions + are popping out right now it seems like that like the most of the interest came + much longer.", "tokens": [50364, 583, 1338, 370, 309, 1890, 709, 544, 565, 281, + 2357, 281, 3470, 3564, 309, 5347, 281, 437, 286, 600, 5176, 293, 1310, 300, 390, + 3462, 281, 5011, 295, 1179, 1310, 300, 390, 281, 512, 37234, 370, 286, 500, 380, + 458, 411, 1237, 412, 264, 577, 867, 28041, 293, 6547, 366, 18374, 484, 558, 586, + 309, 2544, 411, 300, 411, 264, 881, 295, 264, 1179, 1361, 709, 2854, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.1206220653322008, "compression_ratio": 1.604878048780488, + "no_speech_prob": 0.0800580233335495}, {"id": 224, "seek": 212548, "start": 2125.48, + "end": 2135.48, "text": " Like much later yeah to like to the point when it was + released so it was hard to predict it back then.", "tokens": [50364, 1743, 709, + 1780, 1338, 281, 411, 281, 264, 935, 562, 309, 390, 4736, 370, 309, 390, 1152, 281, + 6069, 309, 646, 550, 13, 50864], "temperature": 0.0, "avg_logprob": -0.19256186485290527, + "compression_ratio": 1.6790697674418604, "no_speech_prob": 0.05155416205525398}, + {"id": 225, "seek": 212548, "start": 2135.48, "end": 2153.48, "text": " Yeah do + you think that an MS leap has to do something with this success that it kind of + maybe an MS leap was somewhat visible and then when you edit your algorithm there + and show that it performs you know those people who followed this library at least + knew.", 
"tokens": [50864, 865, 360, 291, 519, 300, 364, 7395, 19438, 575, 281, 360, + 746, 365, 341, 2245, 300, 309, 733, 295, 1310, 364, 7395, 19438, 390, 8344, 8974, + 293, 550, 562, 291, 8129, 428, 9284, 456, 293, 855, 300, 309, 26213, 291, 458, 729, + 561, 567, 6263, 341, 6405, 412, 1935, 2586, 13, 51764], "temperature": 0.0, "avg_logprob": + -0.19256186485290527, "compression_ratio": 1.6790697674418604, "no_speech_prob": + 0.05155416205525398}, {"id": 226, "seek": 215348, "start": 2153.48, "end": 2156.48, + "text": " Okay there is a new algorithm.", "tokens": [50364, 1033, 456, 307, 257, + 777, 9284, 13, 50514], "temperature": 0.0, "avg_logprob": -0.29628263517867687, + "compression_ratio": 1.3488372093023255, "no_speech_prob": 0.01560369785875082}, + {"id": 227, "seek": 215348, "start": 2156.48, "end": 2168.48, "text": " I think + yeah well that helps so when the MS leap is a good library so it has some audience + I think the most attention came from and benchmarks.", "tokens": [50514, 286, 519, + 1338, 731, 300, 3665, 370, 562, 264, 7395, 19438, 307, 257, 665, 6405, 370, 309, + 575, 512, 4034, 286, 519, 264, 881, 3202, 1361, 490, 293, 43751, 13, 51114], "temperature": + 0.0, "avg_logprob": -0.29628263517867687, "compression_ratio": 1.3488372093023255, + "no_speech_prob": 0.01560369785875082}, {"id": 228, "seek": 216848, "start": 2169.48, + "end": 2180.48, "text": " So because yeah well an eye is like what was I had a lot + of attention by that point and that benchmark was done by the same person.", "tokens": + [50414, 407, 570, 1338, 731, 364, 3313, 307, 411, 437, 390, 286, 632, 257, 688, + 295, 3202, 538, 300, 935, 293, 300, 18927, 390, 1096, 538, 264, 912, 954, 13, 50964], + "temperature": 0.0, "avg_logprob": -0.22842265620376123, "compression_ratio": 1.6264367816091954, + "no_speech_prob": 0.037044331431388855}, {"id": 229, "seek": 216848, "start": 2180.48, + "end": 2197.48, "text": " Who did an eye so yeah I think that draw some like traffic + to the libraries 
and yeah also I think the idea of algorithm was like understandable + and so.", "tokens": [50964, 2102, 630, 364, 3313, 370, 1338, 286, 519, 300, 2642, + 512, 411, 6419, 281, 264, 15148, 293, 1338, 611, 286, 519, 264, 1558, 295, 9284, + 390, 411, 25648, 293, 370, 13, 51814], "temperature": 0.0, "avg_logprob": -0.22842265620376123, + "compression_ratio": 1.6264367816091954, "no_speech_prob": 0.037044331431388855}, + {"id": 230, "seek": 219748, "start": 2198.48, "end": 2207.48, "text": " So that + also like affects the usage so if you understand something you are more likely to + use it.", "tokens": [50414, 407, 300, 611, 411, 11807, 264, 14924, 370, 498, 291, + 1223, 746, 291, 366, 544, 3700, 281, 764, 309, 13, 50864], "temperature": 0.0, "avg_logprob": + -0.2194897292496322, "compression_ratio": 1.5515695067264574, "no_speech_prob": + 0.005177979823201895}, {"id": 231, "seek": 219748, "start": 2207.48, "end": 2226.48, + "text": " Yeah yeah it''s Eric Bernerson right the Swedish guy as he says the sweet + who is stuck in New York City yeah I think he implemented a no way originally there + is also like a presentation by him where he explains not only the annoy algorithm + but also.", "tokens": [50864, 865, 1338, 309, 311, 9336, 10781, 3953, 558, 264, + 23523, 2146, 382, 415, 1619, 264, 3844, 567, 307, 5541, 294, 1873, 3609, 4392, 1338, + 286, 519, 415, 12270, 257, 572, 636, 7993, 456, 307, 611, 411, 257, 5860, 538, 796, + 689, 415, 13948, 406, 787, 264, 8759, 9284, 457, 611, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.2194897292496322, "compression_ratio": 1.5515695067264574, + "no_speech_prob": 0.005177979823201895}, {"id": 232, "seek": 222648, "start": 2226.48, + "end": 2249.48, "text": " So how intuition doesn''t work in multi-dimensional spaces + anymore like we think that like in three in 3D world where we leave now right like + the further the point away from you like you can actually see it somehow perceive + it but like in multi-dimensional space 
it''s not like that I don''t know what''s + your view on that by the way.", "tokens": [50364, 407, 577, 24002, 1177, 380, 589, + 294, 4825, 12, 18759, 7673, 3602, 411, 321, 519, 300, 411, 294, 1045, 294, 805, + 35, 1002, 689, 321, 1856, 586, 558, 411, 264, 3052, 264, 935, 1314, 490, 291, 411, + 291, 393, 767, 536, 309, 6063, 20281, 309, 457, 411, 294, 4825, 12, 18759, 1901, + 309, 311, 406, 411, 300, 286, 500, 380, 458, 437, 311, 428, 1910, 322, 300, 538, + 264, 636, 13, 51514], "temperature": 0.0, "avg_logprob": -0.13716737111409505, "compression_ratio": + 1.6417910447761195, "no_speech_prob": 0.010929143987596035}, {"id": 233, "seek": + 224948, "start": 2249.48, "end": 2255.48, "text": " So like does geometry perception + changes in high dimensional space.", "tokens": [50364, 407, 411, 775, 18426, 12860, + 2962, 294, 1090, 18795, 1901, 13, 50664], "temperature": 0.0, "avg_logprob": -0.19851755717444042, + "compression_ratio": 1.6576086956521738, "no_speech_prob": 0.011454934254288673}, + {"id": 234, "seek": 224948, "start": 2255.48, "end": 2276.48, "text": " Well yes + yes so there are like many interpretations of this so people who work with nearest + neighbor search they know about it so like if you have like if you have like many + dimensions even small perturbations there they can go like far.", "tokens": [50664, + 1042, 2086, 2086, 370, 456, 366, 411, 867, 37547, 295, 341, 370, 561, 567, 589, + 365, 23831, 5987, 3164, 436, 458, 466, 309, 370, 411, 498, 291, 362, 411, 498, 291, + 362, 411, 867, 12819, 754, 1359, 40468, 763, 456, 436, 393, 352, 411, 1400, 13, + 51714], "temperature": 0.0, "avg_logprob": -0.19851755717444042, "compression_ratio": + 1.6576086956521738, "no_speech_prob": 0.011454934254288673}, {"id": 235, "seek": + 227648, "start": 2276.48, "end": 2297.48, "text": " So you all have like so to find + nearest neighbor you need to have like a huge cover sphere yeah like when you divide + divide the space so yeah that makes the problem complicated 
and that that one of + the reasons why all the practical methods are approximate.", "tokens": [50364, 407, + 291, 439, 362, 411, 370, 281, 915, 23831, 5987, 291, 643, 281, 362, 411, 257, 2603, + 2060, 16687, 1338, 411, 562, 291, 9845, 9845, 264, 1901, 370, 1338, 300, 1669, 264, + 1154, 6179, 293, 300, 300, 472, 295, 264, 4112, 983, 439, 264, 8496, 7150, 366, + 30874, 13, 51414], "temperature": 0.0, "avg_logprob": -0.17376018020341982, "compression_ratio": + 1.610062893081761, "no_speech_prob": 0.028064627200365067}, {"id": 236, "seek": + 229748, "start": 2298.48, "end": 2320.48, "text": " Right yeah yeah yeah so like + you do need some approximation in order to find the points and so yeah I mean it + sounds like so when you when you mentioned and then benchmarks was it you who submitted + the algorithm for the benchmarks or was it Eric who picked it up and he made it + kind of available in the end and benchmarks.", "tokens": [50414, 1779, 1338, 1338, + 1338, 370, 411, 291, 360, 643, 512, 28023, 294, 1668, 281, 915, 264, 2793, 293, + 370, 1338, 286, 914, 309, 3263, 411, 370, 562, 291, 562, 291, 2835, 293, 550, 43751, + 390, 309, 291, 567, 14405, 264, 9284, 337, 264, 43751, 420, 390, 309, 9336, 567, + 6183, 309, 493, 293, 415, 1027, 309, 733, 295, 2435, 294, 264, 917, 293, 43751, + 13, 51514], "temperature": 0.0, "avg_logprob": -0.143162008644878, "compression_ratio": + 1.7219251336898396, "no_speech_prob": 0.13415798544883728}, {"id": 237, "seek": + 232048, "start": 2320.48, "end": 2323.48, "text": " No no I did a full request to + edit.", "tokens": [50364, 883, 572, 286, 630, 257, 1577, 5308, 281, 8129, 13, 50514], + "temperature": 0.0, "avg_logprob": -0.31843823725634285, "compression_ratio": 1.7826086956521738, + "no_speech_prob": 0.2328237146139145}, {"id": 238, "seek": 232048, "start": 2323.48, + "end": 2333.48, "text": " All right so it basically yeah yeah so you pushed it forward + yourself right so it wasn''t like you just implemented and then you waited for 
it + to be discovered so to say.", "tokens": [50514, 1057, 558, 370, 309, 1936, 1338, + 1338, 370, 291, 9152, 309, 2128, 1803, 558, 370, 309, 2067, 380, 411, 291, 445, + 12270, 293, 550, 291, 15240, 337, 309, 281, 312, 6941, 370, 281, 584, 13, 51014], + "temperature": 0.0, "avg_logprob": -0.31843823725634285, "compression_ratio": 1.7826086956521738, + "no_speech_prob": 0.2328237146139145}, {"id": 239, "seek": 232048, "start": 2333.48, + "end": 2349.48, "text": " No yeah definitely so like the one of the like decisions + to use in the most sleep was that the most sleep was already integrated in an benchmark + so adding that will be just like adding some code in an benchmark that like pulls + the algorithm and.", "tokens": [51014, 883, 1338, 2138, 370, 411, 264, 472, 295, + 264, 411, 5327, 281, 764, 294, 264, 881, 2817, 390, 300, 264, 881, 2817, 390, 1217, + 10919, 294, 364, 18927, 370, 5127, 300, 486, 312, 445, 411, 5127, 512, 3089, 294, + 364, 18927, 300, 411, 16982, 264, 9284, 293, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.31843823725634285, "compression_ratio": 1.7826086956521738, "no_speech_prob": + 0.2328237146139145}, {"id": 240, "seek": 234948, "start": 2349.48, "end": 2369.48, + "text": " And then the tuning of the parameters so that was but that was simple + to do right yeah and so as you did that like what were the results like of that + of course an end benchmarks it has a number of parameters right for example like + even indexing speed.", "tokens": [50364, 400, 550, 264, 15164, 295, 264, 9834, 370, + 300, 390, 457, 300, 390, 2199, 281, 360, 558, 1338, 293, 370, 382, 291, 630, 300, + 411, 437, 645, 264, 3542, 411, 295, 300, 295, 1164, 364, 917, 43751, 309, 575, 257, + 1230, 295, 9834, 558, 337, 1365, 411, 754, 8186, 278, 3073, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.3187592029571533, "compression_ratio": 1.5987261146496816, + "no_speech_prob": 0.0018335786880925298}, {"id": 241, "seek": 236948, "start": 2369.48, + "end": 2383.48, 
"text": " Not only like recall versus QPS trade trade off like was + there some specific kind of metrics that were hnsw excelled over other algorithms.", + "tokens": [50364, 1726, 787, 411, 9901, 5717, 1249, 6273, 4923, 4923, 766, 411, + 390, 456, 512, 2685, 733, 295, 16367, 300, 645, 276, 3695, 86, 45817, 292, 670, + 661, 14642, 13, 51064], "temperature": 0.0, "avg_logprob": -0.27355136293353455, + "compression_ratio": 1.2636363636363637, "no_speech_prob": 0.21543322503566742}, + {"id": 242, "seek": 238348, "start": 2384.48, "end": 2404.48, "text": " Well at + that point of time there was like no logging of the construction time and memory + consumption and the like the initial version in the most sleep it had like clear + focus on the performance like recall to speed ratio.", "tokens": [50414, 1042, 412, + 300, 935, 295, 565, 456, 390, 411, 572, 27991, 295, 264, 6435, 565, 293, 4675, 12126, + 293, 264, 411, 264, 5883, 3037, 294, 264, 881, 2817, 309, 632, 411, 1850, 1879, + 322, 264, 3389, 411, 9901, 281, 3073, 8509, 13, 51414], "temperature": 0.0, "avg_logprob": + -0.21947410832280698, "compression_ratio": 1.5524475524475525, "no_speech_prob": + 0.0713333711028099}, {"id": 243, "seek": 240448, "start": 2405.48, "end": 2421.48, + "text": " Yeah and well you know it''s hard to do proper benchmarking so like there + are a number of scenarios somewhere you have a limit on memory somewhere you have + a limit on the construction time sometimes like you don''t care about them at all + you just care about the speed.", "tokens": [50414, 865, 293, 731, 291, 458, 309, + 311, 1152, 281, 360, 2296, 18927, 278, 370, 411, 456, 366, 257, 1230, 295, 15077, + 4079, 291, 362, 257, 4948, 322, 4675, 4079, 291, 362, 257, 4948, 322, 264, 6435, + 565, 2171, 411, 291, 500, 380, 1127, 466, 552, 412, 439, 291, 445, 1127, 466, 264, + 3073, 13, 51214], "temperature": 0.0, "avg_logprob": -0.11550265345080145, "compression_ratio": + 1.705128205128205, "no_speech_prob": 0.04405513033270836}, 
{"id": 244, "seek": 242148, + "start": 2421.48, "end": 2442.48, "text": " You can also care about like multi thread + performance or you can care about like single thread performance like maybe different + scenarios so it''s pretty hard to go proper benchmarking and the depth like like + I did a decision to just focus on the recall and don''t think about construction + and memory.", "tokens": [50364, 509, 393, 611, 1127, 466, 411, 4825, 7207, 3389, + 420, 291, 393, 1127, 466, 411, 2167, 7207, 3389, 411, 1310, 819, 15077, 370, 309, + 311, 1238, 1152, 281, 352, 2296, 18927, 278, 293, 264, 7161, 411, 411, 286, 630, + 257, 3537, 281, 445, 1879, 322, 264, 9901, 293, 500, 380, 519, 466, 6435, 293, 4675, + 13, 51414], "temperature": 0.0, "avg_logprob": -0.2436261018117269, "compression_ratio": + 1.670391061452514, "no_speech_prob": 0.05505936220288277}, {"id": 245, "seek": 244248, + "start": 2442.48, "end": 2454.48, "text": " Okay I see yeah and so and and basically + when you when you did that like was hnsw like at the top of the competition at that + point.", "tokens": [50364, 1033, 286, 536, 1338, 293, 370, 293, 293, 1936, 562, + 291, 562, 291, 630, 300, 411, 390, 276, 3695, 86, 411, 412, 264, 1192, 295, 264, + 6211, 412, 300, 935, 13, 50964], "temperature": 0.0, "avg_logprob": -0.23508886011635385, + "compression_ratio": 1.7614213197969544, "no_speech_prob": 0.055322013795375824}, + {"id": 246, "seek": 244248, "start": 2454.48, "end": 2471.48, "text": " Yes yes + it was like a top and on many many benchmarks it was like it was a huge cap compared + to the next competitor so not so maybe for a globe I think this Falcon there was + still there there was a like significant.", "tokens": [50964, 1079, 2086, 309, 390, + 411, 257, 1192, 293, 322, 867, 867, 43751, 309, 390, 411, 309, 390, 257, 2603, 1410, + 5347, 281, 264, 958, 27266, 370, 406, 370, 1310, 337, 257, 15371, 286, 519, 341, + 31801, 456, 390, 920, 456, 456, 390, 257, 411, 4776, 13, 51814], "temperature": + 0.0, 
"avg_logprob": -0.23508886011635385, "compression_ratio": 1.7614213197969544, + "no_speech_prob": 0.055322013795375824}, {"id": 247, "seek": 247248, "start": 2473.48, + "end": 2477.48, "text": " Difference yeah but many.", "tokens": [50414, 35940, 5158, + 1338, 457, 867, 13, 50614], "temperature": 0.0, "avg_logprob": -0.28007225556807086, + "compression_ratio": 1.4076923076923078, "no_speech_prob": 0.010700671002268791}, + {"id": 248, "seek": 247248, "start": 2479.48, "end": 2492.48, "text": " Yeah also + like at some point after that there was a real release of key graph algorithm so + which like decreased the difference but it was still on top of it.", "tokens": [50714, + 865, 611, 411, 412, 512, 935, 934, 300, 456, 390, 257, 957, 4374, 295, 2141, 4295, + 9284, 370, 597, 411, 24436, 264, 2649, 457, 309, 390, 920, 322, 1192, 295, 309, + 13, 51364], "temperature": 0.0, "avg_logprob": -0.28007225556807086, "compression_ratio": + 1.4076923076923078, "no_speech_prob": 0.010700671002268791}, {"id": 249, "seek": + 249248, "start": 2492.48, "end": 2502.48, "text": " Yeah so did you did you did + it make you feel proud at that moment when you saw the big gap and like this is + your invention for how did you feel about it.", "tokens": [50364, 865, 370, 630, + 291, 630, 291, 630, 309, 652, 291, 841, 4570, 412, 300, 1623, 562, 291, 1866, 264, + 955, 7417, 293, 411, 341, 307, 428, 22265, 337, 577, 630, 291, 841, 466, 309, 13, + 50864], "temperature": 0.0, "avg_logprob": -0.1310565899579953, "compression_ratio": + 1.6701030927835052, "no_speech_prob": 0.07944231480360031}, {"id": 250, "seek": + 249248, "start": 2502.48, "end": 2519.48, "text": " Well that felt nice for sure + so yeah so we published the paper I think like pop when the paper was finally accepted + so it''s also felt like really well so I think it took.", "tokens": [50864, 1042, + 300, 2762, 1481, 337, 988, 370, 1338, 370, 321, 6572, 264, 3035, 286, 519, 411, + 1665, 562, 264, 3035, 390, 2721, 9035, 370, 309, 
311, 611, 2762, 411, 534, 731, + 370, 286, 519, 309, 1890, 13, 51714], "temperature": 0.0, "avg_logprob": -0.1310565899579953, + "compression_ratio": 1.6701030927835052, "no_speech_prob": 0.07944231480360031}, + {"id": 251, "seek": 251948, "start": 2519.48, "end": 2524.48, "text": " Like two + and a half years to publish the paper well.", "tokens": [50364, 1743, 732, 293, + 257, 1922, 924, 281, 11374, 264, 3035, 731, 13, 50614], "temperature": 0.0, "avg_logprob": + -0.2913198149606083, "compression_ratio": 1.665289256198347, "no_speech_prob": 0.02951093018054962}, + {"id": 252, "seek": 251948, "start": 2524.48, "end": 2535.48, "text": " As they + say in the US I think every rejection brings you closer to the goal so it sounds + like you''ve been rejected in multiple like journals that was not that was still + published.", "tokens": [50614, 1018, 436, 584, 294, 264, 2546, 286, 519, 633, 26044, + 5607, 291, 4966, 281, 264, 3387, 370, 309, 3263, 411, 291, 600, 668, 15749, 294, + 3866, 411, 29621, 300, 390, 406, 300, 390, 920, 6572, 13, 51164], "temperature": + 0.0, "avg_logprob": -0.2913198149606083, "compression_ratio": 1.665289256198347, + "no_speech_prob": 0.02951093018054962}, {"id": 253, "seek": 251948, "start": 2535.48, + "end": 2547.48, "text": " Now that was a single journal it''s just like yeah one + revision took one year so that is that is palm year so transaction of pattern analyzing + a machine intelligence okay.", "tokens": [51164, 823, 300, 390, 257, 2167, 6708, + 309, 311, 445, 411, 1338, 472, 34218, 1890, 472, 1064, 370, 300, 307, 300, 307, + 17018, 1064, 370, 14425, 295, 5102, 23663, 257, 3479, 7599, 1392, 13, 51764], "temperature": + 0.0, "avg_logprob": -0.2913198149606083, "compression_ratio": 1.665289256198347, + "no_speech_prob": 0.02951093018054962}, {"id": 254, "seek": 254748, "start": 2548.48, + "end": 2560.48, "text": " So like we follow the practice and physics and ignore + ignore the conferences so and we also need the for the grants we 
need to have journal + publications not not confidence publication so we sent.", "tokens": [50414, 407, + 411, 321, 1524, 264, 3124, 293, 10649, 293, 11200, 11200, 264, 22032, 370, 293, + 321, 611, 643, 264, 337, 264, 16101, 321, 643, 281, 362, 6708, 25618, 406, 406, + 6687, 19953, 370, 321, 2279, 13, 51014], "temperature": 0.0, "avg_logprob": -0.2794168735372609, + "compression_ratio": 1.607361963190184, "no_speech_prob": 0.0058587719686329365}, + {"id": 255, "seek": 254748, "start": 2561.48, "end": 2566.48, "text": " To Pamy + and had few revisions there but each revision took a year.", "tokens": [51064, 1407, + 430, 7804, 293, 632, 1326, 3698, 4252, 456, 457, 1184, 34218, 1890, 257, 1064, 13, + 51314], "temperature": 0.0, "avg_logprob": -0.2794168735372609, "compression_ratio": + 1.607361963190184, "no_speech_prob": 0.0058587719686329365}, {"id": 256, "seek": + 256648, "start": 2566.48, "end": 2576.48, "text": " Wow this is super long why do + you think it was like that like why why would reviewers be so scrutinizing like + your submission.", "tokens": [50364, 3153, 341, 307, 1687, 938, 983, 360, 291, 519, + 309, 390, 411, 300, 411, 983, 983, 576, 45837, 312, 370, 28949, 259, 3319, 411, + 428, 23689, 13, 50864], "temperature": 0.0, "avg_logprob": -0.16562448848377576, + "compression_ratio": 1.483695652173913, "no_speech_prob": 0.19627168774604797}, + {"id": 257, "seek": 256648, "start": 2577.48, "end": 2593.48, "text": " Well I don''t + know so like I actually talked with the editor so I was very angry after the first + result so and it seems like just a problem is how.", "tokens": [50914, 1042, 286, + 500, 380, 458, 370, 411, 286, 767, 2825, 365, 264, 9839, 370, 286, 390, 588, 6884, + 934, 264, 700, 1874, 370, 293, 309, 2544, 411, 445, 257, 1154, 307, 577, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.16562448848377576, "compression_ratio": 1.483695652173913, + "no_speech_prob": 0.19627168774604797}, {"id": 258, "seek": 259348, "start": 2593.48, + "end": 
2603.48, "text": " So publications in computer science are organized so that''s + that''s not only that journal there are so many journals which have.", "tokens": + [50364, 407, 25618, 294, 3820, 3497, 366, 9983, 370, 300, 311, 300, 311, 406, 787, + 300, 6708, 456, 366, 370, 867, 29621, 597, 362, 13, 50864], "temperature": 0.0, + "avg_logprob": -0.3304427146911621, "compression_ratio": 1.6089385474860336, "no_speech_prob": + 0.08353373408317566}, {"id": 259, "seek": 259348, "start": 2603.48, "end": 2618.48, + "text": " This problem and like when I looked at the Twitter like when some discussions + were like oh I got like review invitation for like this like the national journal.", + "tokens": [50864, 639, 1154, 293, 411, 562, 286, 2956, 412, 264, 5794, 411, 562, + 512, 11088, 645, 411, 1954, 286, 658, 411, 3131, 17890, 337, 411, 341, 411, 264, + 4048, 6708, 13, 51614], "temperature": 0.0, "avg_logprob": -0.3304427146911621, + "compression_ratio": 1.6089385474860336, "no_speech_prob": 0.08353373408317566}, + {"id": 260, "seek": 261848, "start": 2618.48, "end": 2646.48, "text": " They said + I have to write review in 10 days oh I never gonna do that so no like no way I''m + writing a review in 10 days and yeah so in physics it took it sometimes took a few + weeks to get the review and journal in journal so you send it and thank you for + the months you can already start like writing to review like what what what takes + so long yeah and yeah in computer science.", "tokens": [50364, 814, 848, 286, 362, + 281, 2464, 3131, 294, 1266, 1708, 1954, 286, 1128, 799, 360, 300, 370, 572, 411, + 572, 636, 286, 478, 3579, 257, 3131, 294, 1266, 1708, 293, 1338, 370, 294, 10649, + 309, 1890, 309, 2171, 1890, 257, 1326, 3259, 281, 483, 264, 3131, 293, 6708, 294, + 6708, 370, 291, 2845, 309, 293, 1309, 291, 337, 264, 2493, 291, 393, 1217, 722, + 411, 3579, 281, 3131, 411, 437, 437, 437, 2516, 370, 938, 1338, 293, 1338, 294, + 3820, 3497, 13, 51764], "temperature": 0.0, 
"avg_logprob": -0.21093954041946766, + "compression_ratio": 1.75, "no_speech_prob": 0.12649580836296082}, {"id": 261, "seek": + 264648, "start": 2646.48, "end": 2664.48, "text": " But journals are very slow conferences + are also slow there''s several months to get the review and like people saw that + we are using archive yeah so if there were no archive I think they have already + they will just.", "tokens": [50364, 583, 29621, 366, 588, 2964, 22032, 366, 611, + 2964, 456, 311, 2940, 2493, 281, 483, 264, 3131, 293, 411, 561, 1866, 300, 321, + 366, 1228, 23507, 1338, 370, 498, 456, 645, 572, 23507, 286, 519, 436, 362, 1217, + 436, 486, 445, 13, 51264], "temperature": 0.0, "avg_logprob": -0.3064603805541992, + "compression_ratio": 1.4758620689655173, "no_speech_prob": 0.1618216633796692}, + {"id": 262, "seek": 266448, "start": 2664.48, "end": 2688.48, "text": " They will + create new journals yeah exactly like they should be any monopolies right in that + sense like maybe go and create your own journal but then the question is when the + problem is when you a PhD student let''s say you have a chicken act problem right + so you haven''t proven yourself yet you need a publication to defend your thesis + right so that''s the trap.", "tokens": [50364, 814, 486, 1884, 777, 29621, 1338, + 2293, 411, 436, 820, 312, 604, 47721, 530, 558, 294, 300, 2020, 411, 1310, 352, + 293, 1884, 428, 1065, 6708, 457, 550, 264, 1168, 307, 562, 264, 1154, 307, 562, + 291, 257, 14476, 3107, 718, 311, 584, 291, 362, 257, 4662, 605, 1154, 558, 370, + 291, 2378, 380, 12785, 1803, 1939, 291, 643, 257, 19953, 281, 8602, 428, 22288, + 558, 370, 300, 311, 264, 11487, 13, 51564], "temperature": 0.0, "avg_logprob": -0.1908327529304906, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.2940293252468109}, + {"id": 263, "seek": 268848, "start": 2688.48, "end": 2717.48, "text": " Well it''s + also known how how can this all so if like they created like a new conference conferences + like I 
think I didn''t remember I see a lot or I see a lot was created not that + long ago they could have created the journal as well yeah the same people said like + we don''t want to do conferences like conferences you have a very tight deadline + that means like if you miss it you''ll wait for another year and that is like not + not.", "tokens": [50414, 1042, 309, 311, 611, 2570, 577, 577, 393, 341, 439, 370, + 498, 411, 436, 2942, 411, 257, 777, 7586, 22032, 411, 286, 519, 286, 994, 380, 1604, + 286, 536, 257, 688, 420, 286, 536, 257, 688, 390, 2942, 406, 300, 938, 2057, 436, + 727, 362, 2942, 264, 6708, 382, 731, 1338, 264, 912, 561, 848, 411, 321, 500, 380, + 528, 281, 360, 22032, 411, 22032, 291, 362, 257, 588, 4524, 20615, 300, 1355, 411, + 498, 291, 1713, 309, 291, 603, 1699, 337, 1071, 1064, 293, 300, 307, 411, 406, 406, + 13, 51814], "temperature": 0.0, "avg_logprob": -0.20963461855624585, "compression_ratio": + 1.8771929824561404, "no_speech_prob": 0.26836782693862915}, {"id": 264, "seek": + 271848, "start": 2718.48, "end": 2735.48, "text": " Great let''s create a journal + and now you have a continuous like a spectrum of time when you want to send your + paper no deadlines there are no deadlines for reviewers yeah you could almost review + yourself.", "tokens": [50364, 3769, 718, 311, 1884, 257, 6708, 293, 586, 291, 362, + 257, 10957, 411, 257, 11143, 295, 565, 562, 291, 528, 281, 2845, 428, 3035, 572, + 37548, 456, 366, 572, 37548, 337, 45837, 1338, 291, 727, 1920, 3131, 1803, 13, 51214], + "temperature": 0.0, "avg_logprob": -0.23158929514330487, "compression_ratio": 1.4782608695652173, + "no_speech_prob": 0.0892980620265007}, {"id": 265, "seek": 273548, "start": 2735.48, + "end": 2755.48, "text": " Yeah I mean like during the review period on the conference + you can get like 10 papers at the same time so you have to review them like in a + batch but if you are working with journals you get a review like from time to time + yeah like your your load is 
distributed.", "tokens": [50364, 865, 286, 914, 411, + 1830, 264, 3131, 2896, 322, 264, 7586, 291, 393, 483, 411, 1266, 10577, 412, 264, + 912, 565, 370, 291, 362, 281, 3131, 552, 411, 294, 257, 15245, 457, 498, 291, 366, + 1364, 365, 29621, 291, 483, 257, 3131, 411, 490, 565, 281, 565, 1338, 411, 428, + 428, 3677, 307, 12631, 13, 51364], "temperature": 0.0, "avg_logprob": -0.1861377813048282, + "compression_ratio": 1.6, "no_speech_prob": 0.33215025067329407}, {"id": 266, "seek": + 275548, "start": 2755.48, "end": 2781.48, "text": " Yeah so by the way what is your + take like I think new IPS conference they decided this year they decided to hold + all reviews publicly so essentially anybody can see you know the comments from reviewers + and there is like a discussion between reviewers and authors and everything is public + do you think it improves the process somehow or not what''s your take on this.", + "tokens": [50364, 865, 370, 538, 264, 636, 437, 307, 428, 747, 411, 286, 519, 777, + 50021, 7586, 436, 3047, 341, 1064, 436, 3047, 281, 1797, 439, 10229, 14843, 370, + 4476, 4472, 393, 536, 291, 458, 264, 3053, 490, 45837, 293, 456, 307, 411, 257, + 5017, 1296, 45837, 293, 16552, 293, 1203, 307, 1908, 360, 291, 519, 309, 24771, + 264, 1399, 6063, 420, 406, 437, 311, 428, 747, 322, 341, 13, 51664], "temperature": + 0.0, "avg_logprob": -0.1015020145310296, "compression_ratio": 1.6898148148148149, + "no_speech_prob": 0.09201295673847198}, {"id": 267, "seek": 278148, "start": 2781.48, + "end": 2810.48, "text": " Well I think that makes sense so that opens well that + sets the bar for reviewers higher because if you know that your review will be read + by some random people you want to make it better and spend more time on reading + the paper it also helps to understand the review process from outside like for if + you''re a new reviewer you want to understand how to do proper review you can just + read reviews by other people.", "tokens": [50414, 1042, 286, 519, 300, 
1669, 2020, + 370, 300, 9870, 731, 300, 6352, 264, 2159, 337, 45837, 2946, 570, 498, 291, 458, + 300, 428, 3131, 486, 312, 1401, 538, 512, 4974, 561, 291, 528, 281, 652, 309, 1101, + 293, 3496, 544, 565, 322, 3760, 264, 3035, 309, 611, 3665, 281, 1223, 264, 3131, + 1399, 490, 2380, 411, 337, 498, 291, 434, 257, 777, 3131, 260, 291, 528, 281, 1223, + 577, 281, 360, 2296, 3131, 291, 393, 445, 1401, 10229, 538, 661, 561, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.11340210858513328, "compression_ratio": 1.7869565217391303, + "no_speech_prob": 0.061579495668411255}, {"id": 268, "seek": 281148, "start": 2811.48, + "end": 2835.48, "text": " And that is helpful and you can also like if you''re if + you want to publish a paper you can find similar papers and read the reviews for + those papers and understand like why they are rejected or accepted so that that + helps I don''t see like much problem in that that fights against against the corruption + and some places in science are corrupted so.", "tokens": [50364, 400, 300, 307, + 4961, 293, 291, 393, 611, 411, 498, 291, 434, 498, 291, 528, 281, 11374, 257, 3035, + 291, 393, 915, 2531, 10577, 293, 1401, 264, 10229, 337, 729, 10577, 293, 1223, 411, + 983, 436, 366, 15749, 420, 9035, 370, 300, 300, 3665, 286, 500, 380, 536, 411, 709, + 1154, 294, 300, 300, 14512, 1970, 1970, 264, 17959, 293, 512, 3190, 294, 3497, 366, + 39480, 370, 13, 51564], "temperature": 0.0, "avg_logprob": -0.20947571595509848, + "compression_ratio": 1.7263681592039801, "no_speech_prob": 0.04055245220661163}, + {"id": 269, "seek": 283548, "start": 2835.48, "end": 2864.48, "text": " Yeah it + kind of brings transparency at least with the process and also as you mentioned + someone can learn how to do these things right so I think it''s also useful and + maybe it prevents situations when the paper is rejected outright because the reviewer + has some bias against this topic or you know I mean at least transparency is good + I think yeah.", "tokens": 
[50414, 865, 309, 733, 295, 5607, 17131, 412, 1935, 365, + 264, 1399, 293, 611, 382, 291, 2835, 1580, 393, 1466, 577, 281, 360, 613, 721, 558, + 370, 286, 519, 309, 311, 611, 4420, 293, 1310, 309, 22367, 6851, 562, 264, 3035, + 307, 15749, 35189, 570, 264, 3131, 260, 575, 512, 12577, 1970, 341, 4829, 420, 291, + 458, 286, 914, 412, 1935, 17131, 307, 665, 286, 519, 1338, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.08758626665387835, "compression_ratio": 1.6186046511627907, + "no_speech_prob": 0.1423475593328476}, {"id": 270, "seek": 286548, "start": 2866.48, + "end": 2872.48, "text": " Are you publishing today by the way do you have any publishable + work do you intend to publish.", "tokens": [50414, 2014, 291, 17832, 965, 538, 264, + 636, 360, 291, 362, 604, 11374, 712, 589, 360, 291, 19759, 281, 11374, 13, 50714], + "temperature": 0.0, "avg_logprob": -0.2400408699398949, "compression_ratio": 1.6235294117647059, + "no_speech_prob": 0.032644256949424744}, {"id": 271, "seek": 286548, "start": 2874.48, + "end": 2888.48, "text": " Not much so I''m working mostly on protection like maybe + next year I work on something publishable we are last last thing I published that + wasn''t some song so for on both estimation.", "tokens": [50814, 1726, 709, 370, + 286, 478, 1364, 5240, 322, 6334, 411, 1310, 958, 1064, 286, 589, 322, 746, 11374, + 712, 321, 366, 1036, 1036, 551, 286, 6572, 300, 2067, 380, 512, 2153, 370, 337, + 322, 1293, 35701, 13, 51514], "temperature": 0.0, "avg_logprob": -0.2400408699398949, + "compression_ratio": 1.6235294117647059, "no_speech_prob": 0.032644256949424744}, + {"id": 272, "seek": 288848, "start": 2889.48, "end": 2914.48, "text": " Yeah but + like I''ve noticed like you are very active on hnsw github like when I when I posted + my question and maybe we can discuss that as well if you are kind of curious on + that kind of you responded really fast and so it means that you still continue to + allocate chunk of your time to to look at you 
know issues and pull requests on on + github.", "tokens": [50414, 865, 457, 411, 286, 600, 5694, 411, 291, 366, 588, 4967, + 322, 276, 3695, 86, 290, 355, 836, 411, 562, 286, 562, 286, 9437, 452, 1168, 293, + 1310, 321, 393, 2248, 300, 382, 731, 498, 291, 366, 733, 295, 6369, 322, 300, 733, + 295, 291, 15806, 534, 2370, 293, 370, 309, 1355, 300, 291, 920, 2354, 281, 35713, + 16635, 295, 428, 565, 281, 281, 574, 412, 291, 458, 2663, 293, 2235, 12475, 322, + 322, 290, 355, 836, 13, 51664], "temperature": 0.0, "avg_logprob": -0.12601515141929068, + "compression_ratio": 1.6618357487922706, "no_speech_prob": 0.06960642337799072}, + {"id": 273, "seek": 291448, "start": 2914.48, "end": 2943.48, "text": " Yeah so + like I wish I would have done it better so I miss some some things from there but + yeah I tried to update this library so I think that well when I designed hnsw so + there was some design decisions and even if I see like some algorithms outside improve + upon that I think they are not.", "tokens": [50414, 865, 370, 411, 286, 3172, 286, + 576, 362, 1096, 309, 1101, 370, 286, 1713, 512, 512, 721, 490, 456, 457, 1338, 286, + 3031, 281, 5623, 341, 6405, 370, 286, 519, 300, 731, 562, 286, 4761, 276, 3695, + 86, 370, 456, 390, 512, 1715, 5327, 293, 754, 498, 286, 536, 411, 512, 14642, 2380, + 3470, 3564, 300, 286, 519, 436, 366, 406, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.2035375741811899, "compression_ratio": 1.6235955056179776, "no_speech_prob": + 0.03246492147445679}, {"id": 274, "seek": 294448, "start": 2944.48, "end": 2946.48, + "text": " Aligned with the design.", "tokens": [50364, 967, 16690, 365, 264, 1715, + 13, 50464], "temperature": 0.0, "avg_logprob": -0.21321464909447563, "compression_ratio": + 1.5251396648044693, "no_speech_prob": 0.0077543724328279495}, {"id": 275, "seek": + 294448, "start": 2946.48, "end": 2956.48, "text": " So and I skip them one of that + is like hnsw tries to avoid global view of the network so because it''s it''s a + 
descendant of.", "tokens": [50464, 407, 293, 286, 10023, 552, 472, 295, 300, 307, + 411, 276, 3695, 86, 9898, 281, 5042, 4338, 1910, 295, 264, 3209, 370, 570, 309, + 311, 309, 311, 257, 16333, 394, 295, 13, 50964], "temperature": 0.0, "avg_logprob": + -0.21321464909447563, "compression_ratio": 1.5251396648044693, "no_speech_prob": + 0.0077543724328279495}, {"id": 276, "seek": 294448, "start": 2959.48, "end": 2968.48, + "text": " Distributed algorithms so like it''s like it''s not good strategically + if you have like a global view well sometimes it helps.", "tokens": [51114, 9840, + 2024, 4866, 14642, 370, 411, 309, 311, 411, 309, 311, 406, 665, 38061, 498, 291, + 362, 411, 257, 4338, 1910, 731, 2171, 309, 3665, 13, 51564], "temperature": 0.0, + "avg_logprob": -0.21321464909447563, "compression_ratio": 1.5251396648044693, "no_speech_prob": + 0.0077543724328279495}, {"id": 277, "seek": 296848, "start": 2969.48, "end": 2992.48, + "text": " Like there are papers where you can and that we should make that the pass + from the entry point of the network to every node is in short so you can make it + but that is that breaks if you''re doing assertions for instance so like you cannot + have a global view and dynamic nature at the same time.", "tokens": [50414, 1743, + 456, 366, 10577, 689, 291, 393, 293, 300, 321, 820, 652, 300, 264, 1320, 490, 264, + 8729, 935, 295, 264, 3209, 281, 633, 9984, 307, 294, 2099, 370, 291, 393, 652, 309, + 457, 300, 307, 300, 9857, 498, 291, 434, 884, 19810, 626, 337, 5197, 370, 411, 291, + 2644, 362, 257, 4338, 1910, 293, 8546, 3687, 412, 264, 912, 565, 13, 51564], "temperature": + 0.0, "avg_logprob": -0.17540849338878284, "compression_ratio": 1.5923913043478262, + "no_speech_prob": 0.027314988896250725}, {"id": 278, "seek": 299248, "start": 2993.48, + "end": 3018.48, "text": " Yeah so that that''s why I ignore some of the stuff there''s + also a focus on like custom distances so even though the hnsw lip supports only + free distances is 
pretty easy to implement what distances like you want and I believe + that there will be a shift in like what distances are being used.", "tokens": [50414, + 865, 370, 300, 300, 311, 983, 286, 11200, 512, 295, 264, 1507, 456, 311, 611, 257, + 1879, 322, 411, 2375, 22182, 370, 754, 1673, 264, 276, 3695, 86, 8280, 9346, 787, + 1737, 22182, 307, 1238, 1858, 281, 4445, 437, 22182, 411, 291, 528, 293, 286, 1697, + 300, 456, 486, 312, 257, 5513, 294, 411, 437, 22182, 366, 885, 1143, 13, 51664], + "temperature": 0.0, "avg_logprob": -0.17219698429107666, "compression_ratio": 1.6111111111111112, + "no_speech_prob": 0.08882223069667816}, {"id": 279, "seek": 301848, "start": 3018.48, + "end": 3045.48, "text": " After some time because there are problems with like those + like those simple distances you mean like a sign cosign dot product this type of + distances right or yeah yeah it''s more a problem that you want to embed everything + like you want to embed an entity into a single vector representation so and that + has limitations like as you probably know that.", "tokens": [50364, 2381, 512, 565, + 570, 456, 366, 2740, 365, 411, 729, 411, 729, 2199, 22182, 291, 914, 411, 257, 1465, + 3792, 788, 5893, 1674, 341, 2010, 295, 22182, 558, 420, 1338, 1338, 309, 311, 544, + 257, 1154, 300, 291, 528, 281, 12240, 1203, 411, 291, 528, 281, 12240, 364, 13977, + 666, 257, 2167, 8062, 10290, 370, 293, 300, 575, 15705, 411, 382, 291, 1391, 458, + 300, 13, 51714], "temperature": 0.0, "avg_logprob": -0.2616024562290737, "compression_ratio": + 1.755, "no_speech_prob": 0.04624165967106819}, {"id": 280, "seek": 304548, "start": + 3045.48, "end": 3074.48, "text": " Like transformers are based on attention and + there before there was a like a last year with attention for translation and without + attention of didn''t work well because it like compressed everything to a single + vector so I believe that in some time there will be at least set distances so where + your object and query 
represented as a set of like as a set of which can be shuffled + and doesn''t change the structure.", "tokens": [50364, 1743, 4088, 433, 366, 2361, + 322, 3202, 293, 456, 949, 456, 390, 257, 411, 257, 1036, 1064, 365, 3202, 337, 12853, + 293, 1553, 3202, 295, 994, 380, 589, 731, 570, 309, 411, 30353, 1203, 281, 257, + 2167, 8062, 370, 286, 1697, 300, 294, 512, 565, 456, 486, 312, 412, 1935, 992, 22182, + 370, 689, 428, 2657, 293, 14581, 10379, 382, 257, 992, 295, 411, 382, 257, 992, + 295, 597, 393, 312, 402, 33974, 293, 1177, 380, 1319, 264, 3877, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.17764822642008463, "compression_ratio": 1.7606837606837606, + "no_speech_prob": 0.014399350620806217}, {"id": 281, "seek": 307448, "start": 3074.48, + "end": 3102.48, "text": " So for a user that can be like set of user interests for + a document that can be a set of like subjects inside the document for the query + it can be like different parts that you want to have in the query at the same time + but those parts like might not be ordered and when you embed something you are that + you make it ordered and like for instance when I played with clip.", "tokens": [50364, + 407, 337, 257, 4195, 300, 393, 312, 411, 992, 295, 4195, 8847, 337, 257, 4166, 300, + 393, 312, 257, 992, 295, 411, 13066, 1854, 264, 4166, 337, 264, 14581, 309, 393, + 312, 411, 819, 3166, 300, 291, 528, 281, 362, 294, 264, 14581, 412, 264, 912, 565, + 457, 729, 3166, 411, 1062, 406, 312, 8866, 293, 562, 291, 12240, 746, 291, 366, + 300, 291, 652, 309, 8866, 293, 411, 337, 5197, 562, 286, 3737, 365, 7353, 13, 51764], + "temperature": 0.0, "avg_logprob": -0.16185068201135705, "compression_ratio": 1.8592964824120604, + "no_speech_prob": 0.012826280668377876}, {"id": 282, "seek": 310248, "start": 3102.48, + "end": 3131.48, "text": " So there is this for so I thought that like it can do + what''s your which is nice so you can like have an image and like see like what + are the words are which text is 
closest but definitely struggles with the notion + of like what words are there so what is the first word yeah yeah so like geometrically + or like in different languages it might be even different geometry of words right + like should you read left to right or right.", "tokens": [50364, 407, 456, 307, + 341, 337, 370, 286, 1194, 300, 411, 309, 393, 360, 437, 311, 428, 597, 307, 1481, + 370, 291, 393, 411, 362, 364, 3256, 293, 411, 536, 411, 437, 366, 264, 2283, 366, + 597, 2487, 307, 13699, 457, 2138, 17592, 365, 264, 10710, 295, 411, 437, 2283, 366, + 456, 370, 437, 307, 264, 700, 1349, 1338, 1338, 370, 411, 12956, 81, 984, 420, 411, + 294, 819, 8650, 309, 1062, 312, 754, 819, 18426, 295, 2283, 558, 411, 820, 291, + 1401, 1411, 281, 558, 420, 558, 13, 51814], "temperature": 0.0, "avg_logprob": -0.3359025389283568, + "compression_ratio": 1.8484848484848484, "no_speech_prob": 0.0416722372174263}, + {"id": 283, "seek": 313248, "start": 3132.48, "end": 3137.48, "text": " Right to + left and then like you also need another dimension of language they are guess.", + "tokens": [50364, 1779, 281, 1411, 293, 550, 411, 291, 611, 643, 1071, 10139, 295, + 2856, 436, 366, 2041, 13, 50614], "temperature": 0.0, "avg_logprob": -0.3035950488354786, + "compression_ratio": 1.6363636363636365, "no_speech_prob": 0.012068433687090874}, + {"id": 284, "seek": 313248, "start": 3137.48, "end": 3161.48, "text": " Yeah, we + can represent it as like bag of words maybe ordered bag of words is something encoding + as all people do now for text but that that I have so I think like an end would + need to adapt for the situation in the future and keeping the stability to add new + distances is like is important.", "tokens": [50614, 865, 11, 321, 393, 2906, 309, + 382, 411, 3411, 295, 2283, 1310, 8866, 3411, 295, 2283, 307, 746, 43430, 382, 439, + 561, 360, 586, 337, 2487, 457, 300, 300, 286, 362, 370, 286, 519, 411, 364, 917, + 576, 643, 281, 6231, 337, 264, 2590, 294, 264, 2027, 293, 5145, 264, 
11826, 281, + 909, 777, 22182, 307, 411, 307, 1021, 13, 51814], "temperature": 0.0, "avg_logprob": + -0.3035950488354786, "compression_ratio": 1.6363636363636365, "no_speech_prob": + 0.012068433687090874}, {"id": 285, "seek": 316148, "start": 3161.48, "end": 3173.48, + "text": " Yeah, so are you are you working on on this personally or are you like + welcoming pool requests as they say you know to implement different distances.", + "tokens": [50364, 865, 11, 370, 366, 291, 366, 291, 1364, 322, 322, 341, 5665, 420, + 366, 291, 411, 17378, 7005, 12475, 382, 436, 584, 291, 458, 281, 4445, 819, 22182, + 13, 50964], "temperature": 0.0, "avg_logprob": -0.19610314142136348, "compression_ratio": + 1.6022099447513811, "no_speech_prob": 0.011396877467632294}, {"id": 286, "seek": + 316148, "start": 3173.48, "end": 3184.48, "text": " Well, I''m welcoming pool requests + for sure because those are very application specific well it''s pretty easy to implement + like I don''t know.", "tokens": [50964, 1042, 11, 286, 478, 17378, 7005, 12475, + 337, 988, 570, 729, 366, 588, 3861, 2685, 731, 309, 311, 1238, 1858, 281, 4445, + 411, 286, 500, 380, 458, 13, 51514], "temperature": 0.0, "avg_logprob": -0.19610314142136348, + "compression_ratio": 1.6022099447513811, "no_speech_prob": 0.011396877467632294}, + {"id": 287, "seek": 318448, "start": 3185.48, "end": 3212.48, "text": " Or some + simple distance so you have like a set a set of distance you just select which are + the closest out of the set so it would like many many to many somewhat similar I + think to what colder does so though they can I think go without it but essentially + you all you''ll need a set to set distance right yeah make sense I was since I mentioned + it twice already I was wondering like to pick your name.", "tokens": [50414, 1610, + 512, 2199, 4560, 370, 291, 362, 411, 257, 992, 257, 992, 295, 4560, 291, 445, 3048, + 597, 366, 264, 13699, 484, 295, 264, 992, 370, 309, 576, 411, 867, 867, 281, 867, + 8344, 2531, 
286, 519, 281, 437, 31020, 775, 370, 1673, 436, 393, 286, 519, 352, + 1553, 309, 457, 4476, 291, 439, 291, 603, 643, 257, 992, 281, 992, 4560, 558, 1338, + 652, 2020, 286, 390, 1670, 286, 2835, 309, 6091, 1217, 286, 390, 6359, 411, 281, + 1888, 428, 1315, 13, 51764], "temperature": 0.0, "avg_logprob": -0.2948715166113843, + "compression_ratio": 1.7733333333333334, "no_speech_prob": 0.08927077054977417}, + {"id": 288, "seek": 321248, "start": 3212.48, "end": 3241.48, "text": " I was wondering + like to pick your brain on what I was thinking in this space like an and trust me + it''s absolutely very simple algorithm that I came up with the only problem is that + I chose Python as the language and so Python has this little bit weird virtual machine + kind of how it does the garbage collection and so what I suspect maybe it''s also + bugging my code but on billion nodes I cannot actually make it conversion reasonable + memory so I''m going to say.", "tokens": [50364, 286, 390, 6359, 411, 281, 1888, + 428, 3567, 322, 437, 286, 390, 1953, 294, 341, 1901, 411, 364, 293, 3361, 385, 309, + 311, 3122, 588, 2199, 9284, 300, 286, 1361, 493, 365, 264, 787, 1154, 307, 300, + 286, 5111, 15329, 382, 264, 2856, 293, 370, 15329, 575, 341, 707, 857, 3657, 6374, + 3479, 733, 295, 577, 309, 775, 264, 14150, 5765, 293, 370, 437, 286, 9091, 1310, + 309, 311, 611, 7426, 3249, 452, 3089, 457, 322, 5218, 13891, 286, 2644, 767, 652, + 309, 14298, 10585, 4675, 370, 286, 478, 516, 281, 584, 13, 51814], "temperature": + 0.0, "avg_logprob": -0.2782296446180835, "compression_ratio": 1.6824817518248176, + "no_speech_prob": 0.005664549767971039}, {"id": 289, "seek": 324148, "start": 3241.48, + "end": 3265.48, "text": " And so it runs out of memory like on 995 million and what + I did I was really what like I took the input set of points right so the points + are like 128 dimensions or 200 dimensions.", "tokens": [50364, 400, 370, 309, 6676, + 484, 295, 4675, 411, 322, 1722, 15718, 2459, 293, 437, 
286, 630, 286, 390, 534, + 437, 411, 286, 1890, 264, 4846, 992, 295, 2793, 558, 370, 264, 2793, 366, 411, 29810, + 12819, 420, 2331, 12819, 13, 51564], "temperature": 0.0, "avg_logprob": -0.2871645147150213, + "compression_ratio": 1.3636363636363635, "no_speech_prob": 0.0032338628079742193}, + {"id": 290, "seek": 326548, "start": 3265.48, "end": 3281.48, "text": " Essentially + I pick a random point the first one not not random the first one and then I on a + sample of points I compute a median distance right so basically what''s what''s + the kind of average distance between between all of them in a player wise fashion.", + "tokens": [50364, 23596, 286, 1888, 257, 4974, 935, 264, 700, 472, 406, 406, 4974, + 264, 700, 472, 293, 550, 286, 322, 257, 6889, 295, 2793, 286, 14722, 257, 26779, + 4560, 558, 370, 1936, 437, 311, 437, 311, 264, 733, 295, 4274, 4560, 1296, 1296, + 439, 295, 552, 294, 257, 4256, 10829, 6700, 13, 51164], "temperature": 0.0, "avg_logprob": + -0.08790369033813476, "compression_ratio": 1.6217948717948718, "no_speech_prob": + 0.03686128929257393}, {"id": 291, "seek": 328148, "start": 3282.48, "end": 3308.48, + "text": " And and so then I use that as as a filter to build what I called a sharp + right so essentially I decided to split the billion points down to controllable + number of charts let''s say 1000 charts right and so I pick the first point and + then I say okay which other point is close enough so like within that median distance + to this point.", "tokens": [50414, 400, 293, 370, 550, 286, 764, 300, 382, 382, + 257, 6608, 281, 1322, 437, 286, 1219, 257, 8199, 558, 370, 4476, 286, 3047, 281, + 7472, 264, 5218, 2793, 760, 281, 45159, 712, 1230, 295, 17767, 718, 311, 584, 9714, + 17767, 558, 293, 370, 286, 1888, 264, 700, 935, 293, 550, 286, 584, 1392, 597, 661, + 935, 307, 1998, 1547, 370, 411, 1951, 300, 26779, 4560, 281, 341, 935, 13, 51714], + "temperature": 0.0, "avg_logprob": -0.14440786022029511, "compression_ratio": 
1.6274509803921569, + "no_speech_prob": 0.02213110215961933}, {"id": 292, "seek": 330848, "start": 3308.48, + "end": 3333.48, "text": " And so I joined them together in the chart and as the + chart reaches 1 million so basically if it''s like 1000 charts each chart roughly + 1 million points that''s a billion points right and then I will close that chart + and I will run H and as double you on it so that I can actually have that chart + as a hierarchical navigable small world graph.", "tokens": [50364, 400, 370, 286, + 6869, 552, 1214, 294, 264, 6927, 293, 382, 264, 6927, 14235, 502, 2459, 370, 1936, + 498, 309, 311, 411, 9714, 17767, 1184, 6927, 9810, 502, 2459, 2793, 300, 311, 257, + 5218, 2793, 558, 293, 550, 286, 486, 1998, 300, 6927, 293, 286, 486, 1190, 389, + 293, 382, 3834, 291, 322, 309, 370, 300, 286, 393, 767, 362, 300, 6927, 382, 257, + 35250, 804, 7407, 712, 1359, 1002, 4295, 13, 51614], "temperature": 0.0, "avg_logprob": + -0.1812357275109542, "compression_ratio": 1.7577319587628866, "no_speech_prob": + 0.04544257000088692}, {"id": 293, "seek": 333348, "start": 3334.48, "end": 3348.48, + "text": " And and it seems to converge like at least on 10 million it converges + on 100 million converges it runs out of memory on one billion but I think it''s + just some weirdness in how I do it in this big loop or overall points.", "tokens": + [50414, 400, 293, 309, 2544, 281, 41881, 411, 412, 1935, 322, 1266, 2459, 309, 9652, + 2880, 322, 2319, 2459, 9652, 2880, 309, 6676, 484, 295, 4675, 322, 472, 5218, 457, + 286, 519, 309, 311, 445, 512, 3657, 1287, 294, 577, 286, 360, 309, 294, 341, 955, + 6367, 420, 4787, 2793, 13, 51114], "temperature": 0.0, "avg_logprob": -0.14005363429034198, + "compression_ratio": 1.5208333333333333, "no_speech_prob": 0.019728410989046097}, + {"id": 294, "seek": 334848, "start": 3348.48, "end": 3366.48, "text": " But when + I reached out to you on on GitHub like my idea was to also access the first layer + of the graph so that first 
layer where the query will enter I could use that.", + "tokens": [50364, 583, 562, 286, 6488, 484, 281, 291, 322, 322, 23331, 411, 452, + 1558, 390, 281, 611, 2105, 264, 700, 4583, 295, 264, 4295, 370, 300, 700, 4583, + 689, 264, 14581, 486, 3242, 286, 727, 764, 300, 13, 51264], "temperature": 0.0, + "avg_logprob": -0.13900033439077983, "compression_ratio": 1.3770491803278688, "no_speech_prob": + 0.09835981577634811}, {"id": 295, "seek": 336648, "start": 3367.48, "end": 3393.48, + "text": " And as the sort of entry point across this 1000 charts right so because + I don''t want to load all 1000 into memory I want to load only sufficient amount + of entry points so that I can quickly check which chart is closer to my query and + then go inside that and use it as W what do you think about this idea it''s very + simple I think.", "tokens": [50414, 400, 382, 264, 1333, 295, 8729, 935, 2108, 341, + 9714, 17767, 558, 370, 570, 286, 500, 380, 528, 281, 3677, 439, 9714, 666, 4675, + 286, 528, 281, 3677, 787, 11563, 2372, 295, 8729, 2793, 370, 300, 286, 393, 2661, + 1520, 597, 6927, 307, 4966, 281, 452, 14581, 293, 550, 352, 1854, 300, 293, 764, + 309, 382, 343, 437, 360, 291, 519, 466, 341, 1558, 309, 311, 588, 2199, 286, 519, + 13, 51714], "temperature": 0.0, "avg_logprob": -0.12703253428141276, "compression_ratio": + 1.6127450980392157, "no_speech_prob": 0.09558729082345963}, {"id": 296, "seek": + 339348, "start": 3394.48, "end": 3411.48, "text": " Yes well that that that makes + sense so that clustering it seems to be so like you did you have like a cluster + the points into 1000 clusters and then they select the clusters and.", "tokens": + [50414, 1079, 731, 300, 300, 300, 1669, 2020, 370, 300, 596, 48673, 309, 2544, 281, + 312, 370, 411, 291, 630, 291, 362, 411, 257, 13630, 264, 2793, 666, 9714, 23313, + 293, 550, 436, 3048, 264, 23313, 293, 13, 51264], "temperature": 0.0, "avg_logprob": + -0.2219376680327625, "compression_ratio": 1.575221238938053, "no_speech_prob": 
0.04425818473100662}, + {"id": 297, "seek": 341148, "start": 3412.48, "end": 3421.48, "text": " Well yeah + that that makes sense I think like historically there were other papers that suggested + something similar to.", "tokens": [50414, 1042, 1338, 300, 300, 1669, 2020, 286, + 519, 411, 16180, 456, 645, 661, 10577, 300, 10945, 746, 2531, 281, 13, 50864], "temperature": + 0.0, "avg_logprob": -0.2000776373821756, "compression_ratio": 1.5071428571428571, + "no_speech_prob": 0.032513830810785294}, {"id": 298, "seek": 341148, "start": 3421.48, + "end": 3429.48, "text": " And then I think in flam so that was one of the distributors + strategies that they suggested.", "tokens": [50864, 400, 550, 286, 519, 294, 932, + 335, 370, 300, 390, 472, 295, 264, 4400, 30751, 9029, 300, 436, 10945, 13, 51264], + "temperature": 0.0, "avg_logprob": -0.2000776373821756, "compression_ratio": 1.5071428571428571, + "no_speech_prob": 0.032513830810785294}, {"id": 299, "seek": 342948, "start": 3430.48, + "end": 3451.48, "text": " Yeah well that that that might work out so that though + that depends on on the scale so and so that also well for production system you + also want to replicate those notes and so right.", "tokens": [50414, 865, 731, 300, + 300, 300, 1062, 589, 484, 370, 300, 1673, 300, 5946, 322, 322, 264, 4373, 370, 293, + 370, 300, 611, 731, 337, 4265, 1185, 291, 611, 528, 281, 25356, 729, 5570, 293, + 370, 558, 13, 51464], "temperature": 0.0, "avg_logprob": -0.2498370612539896, "compression_ratio": + 1.525, "no_speech_prob": 0.07719220221042633}, {"id": 300, "seek": 345148, "start": + 3452.48, "end": 3476.48, "text": " Okay maybe like let''s come from a different + way so that you can also start to very small pieces so it might not be needed in + this case I can want to balance so but on the top layer you can also use like as + in this Microsoft paper that you mentioned also there are other papers like from + young so.", "tokens": [50414, 1033, 1310, 411, 718, 311, 808, 490, 
257, 819, 636, + 370, 300, 291, 393, 611, 722, 281, 588, 1359, 3755, 370, 309, 1062, 406, 312, 2978, + 294, 341, 1389, 286, 393, 528, 281, 4772, 370, 457, 322, 264, 1192, 4583, 291, 393, + 611, 764, 411, 382, 294, 341, 8116, 3035, 300, 291, 2835, 611, 456, 366, 661, 10577, + 411, 490, 2037, 370, 13, 51614], "temperature": 0.0, "avg_logprob": -0.21727066609396864, + "compression_ratio": 1.6174863387978142, "no_speech_prob": 0.006955789867788553}, + {"id": 301, "seek": 347648, "start": 3476.48, "end": 3486.48, "text": " So I have + a paper this those guys so you can use a in you can start into maybe not the short + you can.", "tokens": [50364, 407, 286, 362, 257, 3035, 341, 729, 1074, 370, 291, + 393, 764, 257, 294, 291, 393, 722, 666, 1310, 406, 264, 2099, 291, 393, 13, 50864], + "temperature": 0.0, "avg_logprob": -0.5875177712276064, "compression_ratio": 1.216867469879518, + "no_speech_prob": 0.025405174121260643}, {"id": 302, "seek": 348648, "start": 3486.48, + "end": 3501.48, "text": " So if you want to divide your data set into like million + clusters and use like a higher index to decide on which chart you want to select + it right yes.", "tokens": [50364, 407, 498, 291, 528, 281, 9845, 428, 1412, 992, + 666, 411, 2459, 23313, 293, 764, 411, 257, 2946, 8186, 281, 4536, 322, 597, 6927, + 291, 528, 281, 3048, 309, 558, 2086, 13, 51114], "temperature": 0.0, "avg_logprob": + -0.37323816006000227, "compression_ratio": 1.532846715328467, "no_speech_prob": + 0.24684560298919678}, {"id": 303, "seek": 348648, "start": 3501.48, "end": 3506.48, + "text": " So though like if you''re if you''re not talking about like.", "tokens": + [51114, 407, 1673, 411, 498, 291, 434, 498, 291, 434, 406, 1417, 466, 411, 13, 51364], + "temperature": 0.0, "avg_logprob": -0.37323816006000227, "compression_ratio": 1.532846715328467, + "no_speech_prob": 0.24684560298919678}, {"id": 304, "seek": 350648, "start": 3506.48, + "end": 3511.48, "text": " So it''s not a scale so probably like doesn''t 
make too + much sense.", "tokens": [50364, 407, 309, 311, 406, 257, 4373, 370, 1391, 411, 1177, + 380, 652, 886, 709, 2020, 13, 50614], "temperature": 0.0, "avg_logprob": -0.24902000427246093, + "compression_ratio": 1.6226415094339623, "no_speech_prob": 0.3841511607170105}, + {"id": 305, "seek": 350648, "start": 3511.48, "end": 3534.48, "text": " But yeah + yeah you can do this yeah I mean I''m still hopeful to kind of keep trying it I + have another friend who is like on Twitter he actually recorded like a YouTube video + where he said here here and here you make a mistake like this is why you lose memory + like you should never allocate objects inside loops you should pre-allocate them + as NAMPA erase and so on.", "tokens": [50614, 583, 1338, 1338, 291, 393, 360, 341, + 1338, 286, 914, 286, 478, 920, 20531, 281, 733, 295, 1066, 1382, 309, 286, 362, + 1071, 1277, 567, 307, 411, 322, 5794, 415, 767, 8287, 411, 257, 3088, 960, 689, + 415, 848, 510, 510, 293, 510, 291, 652, 257, 6146, 411, 341, 307, 983, 291, 3624, + 4675, 411, 291, 820, 1128, 35713, 6565, 1854, 16121, 291, 820, 659, 12, 336, 42869, + 552, 382, 426, 2865, 10297, 23525, 293, 370, 322, 13, 51764], "temperature": 0.0, + "avg_logprob": -0.24902000427246093, "compression_ratio": 1.6226415094339623, "no_speech_prob": + 0.3841511607170105}, {"id": 306, "seek": 353448, "start": 3534.48, "end": 3554.48, + "text": " So with his modifications it still runs out of memory so like I need to + kind of move forward and I''m still kind of like hopefully I can do it in Python + but something also tells me maybe I should move to more kind of memory controllable + language something like rast or C++ I don''t know.", "tokens": [50364, 407, 365, + 702, 26881, 309, 920, 6676, 484, 295, 4675, 370, 411, 286, 643, 281, 733, 295, 1286, + 2128, 293, 286, 478, 920, 733, 295, 411, 4696, 286, 393, 360, 309, 294, 15329, 457, + 746, 611, 5112, 385, 1310, 286, 820, 1286, 281, 544, 733, 295, 4675, 45159, 712, + 2856, 746, 411, 367, 
525, 420, 383, 25472, 286, 500, 380, 458, 13, 51364], "temperature": + 0.0, "avg_logprob": -0.1569580598310991, "compression_ratio": 1.5159574468085106, + "no_speech_prob": 0.33807137608528137}, {"id": 307, "seek": 355448, "start": 3554.48, + "end": 3583.48, "text": " Well I''m not sure so like so using something like so + you probably using C++ libraries from Python like NAMPA or torch yeah something + like that so they should not click memory so those are pretty pretty controllable + yeah yeah it is definitely my code somewhere in the loop it probably just computes + too many time like like basically the hottest part of the algorithm like in terms + of profiling it right is like.", "tokens": [50364, 1042, 286, 478, 406, 988, 370, + 411, 370, 1228, 746, 411, 370, 291, 1391, 1228, 383, 25472, 15148, 490, 15329, 411, + 426, 2865, 10297, 420, 27822, 1338, 746, 411, 300, 370, 436, 820, 406, 2052, 4675, + 370, 729, 366, 1238, 1238, 45159, 712, 1338, 1338, 309, 307, 2138, 452, 3089, 4079, + 294, 264, 6367, 309, 1391, 445, 715, 1819, 886, 867, 565, 411, 411, 1936, 264, 32780, + 644, 295, 264, 9284, 411, 294, 2115, 295, 1740, 4883, 309, 558, 307, 411, 13, 51814], + "temperature": 0.0, "avg_logprob": -0.14544051192527593, "compression_ratio": 1.6942148760330578, + "no_speech_prob": 0.044569287449121475}, {"id": 308, "seek": 358348, "start": 3583.48, + "end": 3605.48, "text": " Like when you can so you pre compute the median distance + once right and then you use that value all the time so it''s kind of it''s okay + it''s just an object so it doesn''t allocate much but then as you extract the next + batch of points so I read the one billion set in one million batches right.", "tokens": + [50364, 1743, 562, 291, 393, 370, 291, 659, 14722, 264, 26779, 4560, 1564, 558, + 293, 550, 291, 764, 300, 2158, 439, 264, 565, 370, 309, 311, 733, 295, 309, 311, + 1392, 309, 311, 445, 364, 2657, 370, 309, 1177, 380, 35713, 709, 457, 550, 382, + 291, 8947, 264, 958, 15245, 295, 2793, 370, 
286, 1401, 264, 472, 5218, 992, 294, + 472, 2459, 15245, 279, 558, 13, 51464], "temperature": 0.0, "avg_logprob": -0.13325601384259653, + "compression_ratio": 1.6440677966101696, "no_speech_prob": 0.020004967227578163}, + {"id": 309, "seek": 360548, "start": 3606.48, "end": 3625.48, "text": " I sense + that there could be a loss of memory because like it''s a binary file and so you + say in NAMPA you say from this file read the next batch right so like you you provide + the kind of offset and so I sense that maybe there it loses memory maybe not I don''t + know.", "tokens": [50414, 286, 2020, 300, 456, 727, 312, 257, 4470, 295, 4675, 570, + 411, 309, 311, 257, 17434, 3991, 293, 370, 291, 584, 294, 426, 2865, 10297, 291, + 584, 490, 341, 3991, 1401, 264, 958, 15245, 558, 370, 411, 291, 291, 2893, 264, + 733, 295, 18687, 293, 370, 286, 2020, 300, 1310, 456, 309, 18293, 4675, 1310, 406, + 286, 500, 380, 458, 13, 51364], "temperature": 0.2, "avg_logprob": -0.24742143232743818, + "compression_ratio": 1.698019801980198, "no_speech_prob": 0.01445314846932888}, + {"id": 310, "seek": 360548, "start": 3626.48, "end": 3634.48, "text": " Because + I''ve noticed that in files library they use M M M M M M M M M M M M M.", "tokens": + [51414, 1436, 286, 600, 5694, 300, 294, 7098, 6405, 436, 764, 376, 376, 376, 376, + 376, 376, 376, 376, 376, 376, 376, 376, 376, 13, 51814], "temperature": 0.2, "avg_logprob": + -0.24742143232743818, "compression_ratio": 1.698019801980198, "no_speech_prob": + 0.01445314846932888}, {"id": 311, "seek": 363448, "start": 3634.48, "end": 3644.48, + "text": " Yeah, I can also use M M M M. 
So NAMPA if you read the tenser from NAMPA + they can also have M M M M M M M options so you can load this M M M M M M M M M + M M M M M.", "tokens": [50364, 865, 11, 286, 393, 611, 764, 376, 376, 376, 376, + 13, 407, 426, 2865, 10297, 498, 291, 1401, 264, 10688, 260, 490, 426, 2865, 10297, + 436, 393, 611, 362, 376, 376, 376, 376, 376, 376, 376, 3956, 370, 291, 393, 3677, + 341, 376, 376, 376, 376, 376, 376, 376, 376, 376, 376, 376, 376, 376, 376, 13, 50864], + "temperature": 0.6, "avg_logprob": -0.3152877209233303, "compression_ratio": 1.8118279569892473, + "no_speech_prob": 0.021768230944871902}, {"id": 312, "seek": 363448, "start": 3645.48, + "end": 3660.48, "text": " But even if you''re using if you''re reading we are like + open like open file is like read binary it should not click memory so it should + it should do read then it''s just like.", "tokens": [50914, 583, 754, 498, 291, + 434, 1228, 498, 291, 434, 3760, 321, 366, 411, 1269, 411, 1269, 3991, 307, 411, + 1401, 17434, 309, 820, 406, 2052, 4675, 370, 309, 820, 309, 820, 360, 1401, 550, + 309, 311, 445, 411, 13, 51664], "temperature": 0.6, "avg_logprob": -0.3152877209233303, + "compression_ratio": 1.8118279569892473, "no_speech_prob": 0.021768230944871902}, + {"id": 313, "seek": 366048, "start": 3660.48, "end": 3668.1, "text": " Yeah, so + it must be something super stupid then in my code that kind of like really obvious + to somebody like you like okay", "tokens": [50364, 865, 11, 370, 309, 1633, 312, + 746, 1687, 6631, 550, 294, 452, 3089, 300, 733, 295, 411, 534, 6322, 281, 2618, + 411, 291, 411, 1392, 50745], "temperature": 0.0, "avg_logprob": -0.2828137919587909, + "compression_ratio": 1.7114624505928853, "no_speech_prob": 0.2249089628458023}, + {"id": 314, "seek": 366048, "start": 3668.1, "end": 3670.7, "text": " Here is here + is that point you should not do this", "tokens": [50745, 1692, 307, 510, 307, 300, + 935, 291, 820, 406, 360, 341, 50875], "temperature": 0.0, "avg_logprob": 
-0.2828137919587909, + "compression_ratio": 1.7114624505928853, "no_speech_prob": 0.2249089628458023}, + {"id": 315, "seek": 366048, "start": 3670.98, "end": 3674.42, "text": " But like + for me it''s like I invented this basic idea", "tokens": [50889, 583, 411, 337, + 385, 309, 311, 411, 286, 14479, 341, 3875, 1558, 51061], "temperature": 0.0, "avg_logprob": + -0.2828137919587909, "compression_ratio": 1.7114624505928853, "no_speech_prob": + 0.2249089628458023}, {"id": 316, "seek": 366048, "start": 3674.76, "end": 3679.86, + "text": " But then like pushing it maybe like like it works on 10 million and I''m + okay", "tokens": [51078, 583, 550, 411, 7380, 309, 1310, 411, 411, 309, 1985, 322, + 1266, 2459, 293, 286, 478, 1392, 51333], "temperature": 0.0, "avg_logprob": -0.2828137919587909, + "compression_ratio": 1.7114624505928853, "no_speech_prob": 0.2249089628458023}, + {"id": 317, "seek": 366048, "start": 3680.02, "end": 3685.44, "text": " But like + the task was as part of this challenge to do the billion scale right so this is + like", "tokens": [51341, 583, 411, 264, 5633, 390, 382, 644, 295, 341, 3430, 281, + 360, 264, 5218, 4373, 558, 370, 341, 307, 411, 51612], "temperature": 0.0, "avg_logprob": + -0.2828137919587909, "compression_ratio": 1.7114624505928853, "no_speech_prob": + 0.2249089628458023}, {"id": 318, "seek": 366048, "start": 3685.96, "end": 3688.46, + "text": " You crawl you crawl the the mountain", "tokens": [51638, 509, 24767, 291, + 24767, 264, 264, 6937, 51763], "temperature": 0.0, "avg_logprob": -0.2828137919587909, + "compression_ratio": 1.7114624505928853, "no_speech_prob": 0.2249089628458023}, + {"id": 319, "seek": 368846, "start": 3689.02, "end": 3693.5, "text": " Without the + top in a way, but yes, there is a top of course. 
It''s only one billion points", + "tokens": [50392, 9129, 264, 1192, 294, 257, 636, 11, 457, 2086, 11, 456, 307, 257, + 1192, 295, 1164, 13, 467, 311, 787, 472, 5218, 2793, 50616], "temperature": 0.0, + "avg_logprob": -0.286625608391718, "compression_ratio": 1.5576923076923077, "no_speech_prob": + 0.0039496049284935}, {"id": 320, "seek": 368846, "start": 3694.38, "end": 3699.7, + "text": " But yeah, I mean it keeps me quite excited to keep doing it. Of course, + I already see some", "tokens": [50660, 583, 1338, 11, 286, 914, 309, 5965, 385, + 1596, 2919, 281, 1066, 884, 309, 13, 2720, 1164, 11, 286, 1217, 536, 512, 50926], + "temperature": 0.0, "avg_logprob": -0.286625608391718, "compression_ratio": 1.5576923076923077, + "no_speech_prob": 0.0039496049284935}, {"id": 321, "seek": 368846, "start": 3700.42, + "end": 3702.02, "text": " maybe", "tokens": [50962, 1310, 51042], "temperature": + 0.0, "avg_logprob": -0.286625608391718, "compression_ratio": 1.5576923076923077, + "no_speech_prob": 0.0039496049284935}, {"id": 322, "seek": 368846, "start": 3702.02, + "end": 3705.68, "text": " Need for improvements for example. 
How how do I make updates?", + "tokens": [51042, 16984, 337, 13797, 337, 1365, 13, 1012, 577, 360, 286, 652, 9205, + 30, 51225], "temperature": 0.0, "avg_logprob": -0.286625608391718, "compression_ratio": + 1.5576923076923077, "no_speech_prob": 0.0039496049284935}, {"id": 323, "seek": 368846, + "start": 3705.98, "end": 3710.3, "text": " Right, so let''s say a new point comes + in and I have like 1000 charts predefined", "tokens": [51240, 1779, 11, 370, 718, + 311, 584, 257, 777, 935, 1487, 294, 293, 286, 362, 411, 9714, 17767, 659, 37716, + 51456], "temperature": 0.0, "avg_logprob": -0.286625608391718, "compression_ratio": + 1.5576923076923077, "no_speech_prob": 0.0039496049284935}, {"id": 324, "seek": 368846, + "start": 3710.66, "end": 3715.84, "text": " So I need to find either an existing + chart or create a new one at some point", "tokens": [51474, 407, 286, 643, 281, + 915, 2139, 364, 6741, 6927, 420, 1884, 257, 777, 472, 412, 512, 935, 51733], "temperature": + 0.0, "avg_logprob": -0.286625608391718, "compression_ratio": 1.5576923076923077, + "no_speech_prob": 0.0039496049284935}, {"id": 325, "seek": 371584, "start": 3715.88, + "end": 3723.8, "text": " So that that part I defer to the future, but like maybe + I still need to push push harder to just converge it first", "tokens": [50366, 407, + 300, 300, 644, 286, 25704, 281, 264, 2027, 11, 457, 411, 1310, 286, 920, 643, 281, + 2944, 2944, 6081, 281, 445, 41881, 309, 700, 50762], "temperature": 0.0, "avg_logprob": + -0.2673491564663974, "compression_ratio": 1.6164383561643836, "no_speech_prob": + 0.00398827251046896}, {"id": 326, "seek": 371584, "start": 3725.56, "end": 3733.26, + "text": " Okay, you can profile for memory so we can like loop some operations in + the code that you think that can leak and", "tokens": [50850, 1033, 11, 291, 393, + 7964, 337, 4675, 370, 321, 393, 411, 6367, 512, 7705, 294, 264, 3089, 300, 291, + 519, 300, 393, 17143, 293, 51235], "temperature": 0.0, "avg_logprob": 
-0.2673491564663974, + "compression_ratio": 1.6164383561643836, "no_speech_prob": 0.00398827251046896}, + {"id": 327, "seek": 371584, "start": 3733.76, "end": 3735.76, "text": " Profile + the memory for those", "tokens": [51260, 6039, 794, 264, 4675, 337, 729, 51360], + "temperature": 0.0, "avg_logprob": -0.2673491564663974, "compression_ratio": 1.6164383561643836, + "no_speech_prob": 0.00398827251046896}, {"id": 328, "seek": 371584, "start": 3735.76, + "end": 3740.1200000000003, "text": " Yeah, I''ve been doing that like actually I + also come from the world of Java so in Java it''s like", "tokens": [51360, 865, + 11, 286, 600, 668, 884, 300, 411, 767, 286, 611, 808, 490, 264, 1002, 295, 10745, + 370, 294, 10745, 309, 311, 411, 51578], "temperature": 0.0, "avg_logprob": -0.2673491564663974, + "compression_ratio": 1.6164383561643836, "no_speech_prob": 0.00398827251046896}, + {"id": 329, "seek": 374012, "start": 3740.7999999999997, "end": 3745.64, "text": + " Quite straightforward in a way there are also tools in Python when you plug in + this memory profiler", "tokens": [50398, 20464, 15325, 294, 257, 636, 456, 366, + 611, 3873, 294, 15329, 562, 291, 5452, 294, 341, 4675, 1740, 5441, 50640], "temperature": + 0.0, "avg_logprob": -0.29528640914749316, "compression_ratio": 1.5872340425531914, + "no_speech_prob": 0.012513574212789536}, {"id": 330, "seek": 374012, "start": 3745.64, + "end": 3751.0, "text": " It slows down your computations significantly and so you + have to wait like 10 times more", "tokens": [50640, 467, 35789, 760, 428, 2807, + 763, 10591, 293, 370, 291, 362, 281, 1699, 411, 1266, 1413, 544, 50908], "temperature": + 0.0, "avg_logprob": -0.29528640914749316, "compression_ratio": 1.5872340425531914, + "no_speech_prob": 0.012513574212789536}, {"id": 331, "seek": 374012, "start": 3754.48, + "end": 3763.7999999999997, "text": " Yeah, so I''m not a fan of profilers so like + recently I find I found a video like a talk on YouTube", "tokens": [51082, 865, + 
11, 370, 286, 478, 406, 257, 3429, 295, 1740, 388, 433, 370, 411, 3938, 286, 915, + 286, 1352, 257, 960, 411, 257, 751, 322, 3088, 51548], "temperature": 0.0, "avg_logprob": + -0.29528640914749316, "compression_ratio": 1.5872340425531914, "no_speech_prob": + 0.012513574212789536}, {"id": 332, "seek": 374012, "start": 3763.7999999999997, + "end": 3769.72, "text": " Which explain like why why we shouldn''t use profilers + and that was like the profilers", "tokens": [51548, 3013, 2903, 411, 983, 983, 321, + 4659, 380, 764, 1740, 388, 433, 293, 300, 390, 411, 264, 1740, 388, 433, 51844], + "temperature": 0.0, "avg_logprob": -0.29528640914749316, "compression_ratio": 1.5872340425531914, + "no_speech_prob": 0.012513574212789536}, {"id": 333, "seek": 377012, "start": 3770.2, + "end": 3777.2799999999997, "text": " They become obsolete when the code became like + not multi-freaded, but like with multiple paths", "tokens": [50368, 814, 1813, 46333, + 562, 264, 3089, 3062, 411, 406, 4825, 12, 69, 2538, 292, 11, 457, 411, 365, 3866, + 14518, 50722], "temperature": 0.0, "avg_logprob": -0.496322234471639, "compression_ratio": + 1.6419213973799127, "no_speech_prob": 0.005532990675419569}, {"id": 334, "seek": + 377012, "start": 3777.2799999999997, "end": 3780.7999999999997, "text": " So when + I''m totally this pension so pension was super scholar", "tokens": [50722, 407, + 562, 286, 478, 3879, 341, 21927, 370, 21927, 390, 1687, 17912, 50898], "temperature": + 0.0, "avg_logprob": -0.496322234471639, "compression_ratio": 1.6419213973799127, + "no_speech_prob": 0.005532990675419569}, {"id": 335, "seek": 377012, "start": 3781.2799999999997, + "end": 3786.48, "text": " So your operations are out of order and when you look + at the profiler results like", "tokens": [50922, 407, 428, 7705, 366, 484, 295, + 1668, 293, 562, 291, 574, 412, 264, 1740, 5441, 3542, 411, 51182], "temperature": + 0.0, "avg_logprob": -0.496322234471639, "compression_ratio": 1.6419213973799127, + 
"no_speech_prob": 0.005532990675419569}, {"id": 336, "seek": 377012, "start": 3787.24, + "end": 3793.56, "text": " I don''t understand them so when I was developing a S&S + W Libye, I haven''t used profilers", "tokens": [51220, 286, 500, 380, 1223, 552, + 370, 562, 286, 390, 6416, 257, 318, 5, 50, 343, 15834, 1200, 11, 286, 2378, 380, + 1143, 1740, 388, 433, 51536], "temperature": 0.0, "avg_logprob": -0.496322234471639, + "compression_ratio": 1.6419213973799127, "no_speech_prob": 0.005532990675419569}, + {"id": 337, "seek": 377012, "start": 3793.56, "end": 3797.0, "text": " So I just + like wrote benches for operations and", "tokens": [51536, 407, 286, 445, 411, 4114, + 3271, 3781, 337, 7705, 293, 51708], "temperature": 0.0, "avg_logprob": -0.496322234471639, + "compression_ratio": 1.6419213973799127, "no_speech_prob": 0.005532990675419569}, + {"id": 338, "seek": 379700, "start": 3797.64, "end": 3799.64, "text": " And like + I had like", "tokens": [50396, 400, 411, 286, 632, 411, 50496], "temperature": 0.0, + "avg_logprob": -0.3821509225027902, "compression_ratio": 1.575, "no_speech_prob": + 0.009380927309393883}, {"id": 339, "seek": 379700, "start": 3800.16, "end": 3804.32, + "text": " Faceline and trial so they usually work in the same memory", "tokens": + [50522, 17667, 5440, 293, 7308, 370, 436, 2673, 589, 294, 264, 912, 4675, 50730], + "temperature": 0.0, "avg_logprob": -0.3821509225027902, "compression_ratio": 1.575, + "no_speech_prob": 0.009380927309393883}, {"id": 340, "seek": 379700, "start": 3804.6, + "end": 3809.16, "text": " So the like index is the same, but there are different + implementations of search and", "tokens": [50744, 407, 264, 411, 8186, 307, 264, + 912, 11, 457, 456, 366, 819, 4445, 763, 295, 3164, 293, 50972], "temperature": 0.0, + "avg_logprob": -0.3821509225027902, "compression_ratio": 1.575, "no_speech_prob": + 0.009380927309393883}, {"id": 341, "seek": 379700, "start": 3809.84, "end": 3814.8, + "text": " Like your your speed can 
depend on memory how you allocate the memory + and", "tokens": [51006, 1743, 428, 428, 3073, 393, 5672, 322, 4675, 577, 291, 35713, + 264, 4675, 293, 51254], "temperature": 0.0, "avg_logprob": -0.3821509225027902, + "compression_ratio": 1.575, "no_speech_prob": 0.009380927309393883}, {"id": 342, + "seek": 379700, "start": 3815.96, "end": 3820.36, "text": " With those benches you + can measure something like up to 1 or 2% of difference", "tokens": [51312, 2022, + 729, 3271, 3781, 291, 393, 3481, 746, 411, 493, 281, 502, 420, 568, 4, 295, 2649, + 51532], "temperature": 0.0, "avg_logprob": -0.3821509225027902, "compression_ratio": + 1.575, "no_speech_prob": 0.009380927309393883}, {"id": 343, "seek": 382036, "start": + 3820.96, "end": 3829.1200000000003, "text": " And when you like do a lot of benches + with one or two percent improvement you can get like 20% improvement 50% improvement", + "tokens": [50394, 400, 562, 291, 411, 360, 257, 688, 295, 3271, 3781, 365, 472, + 420, 732, 3043, 10444, 291, 393, 483, 411, 945, 4, 10444, 2625, 4, 10444, 50802], + "temperature": 0.0, "avg_logprob": -0.35023237410045804, "compression_ratio": 1.7584541062801933, + "no_speech_prob": 0.00907359179109335}, {"id": 344, "seek": 382036, "start": 3831.04, + "end": 3839.44, "text": " Yeah, but like I never used profiles and like I never + saw like in my life that people use profiles and like get", "tokens": [50898, 865, + 11, 457, 411, 286, 1128, 1143, 23693, 293, 411, 286, 1128, 1866, 411, 294, 452, + 993, 300, 561, 764, 23693, 293, 411, 483, 51318], "temperature": 0.0, "avg_logprob": + -0.35023237410045804, "compression_ratio": 1.7584541062801933, "no_speech_prob": + 0.00907359179109335}, {"id": 345, "seek": 382036, "start": 3840.1600000000003, "end": + 3843.1600000000003, "text": " really complicated insights from using profiles", + "tokens": [51354, 534, 6179, 14310, 490, 1228, 23693, 51504], "temperature": 0.0, + "avg_logprob": -0.35023237410045804, "compression_ratio": 
1.7584541062801933, "no_speech_prob": + 0.00907359179109335}, {"id": 346, "seek": 382036, "start": 3843.1600000000003, "end": + 3847.92, "text": " Yeah, I agree like we we did like so we''re building also building + a search engine", "tokens": [51504, 865, 11, 286, 3986, 411, 321, 321, 630, 411, + 370, 321, 434, 2390, 611, 2390, 257, 3164, 2848, 51742], "temperature": 0.0, "avg_logprob": + -0.35023237410045804, "compression_ratio": 1.7584541062801933, "no_speech_prob": + 0.00907359179109335}, {"id": 347, "seek": 384792, "start": 3848.4, "end": 3857.76, + "text": " With like we had like by design we had like billions of documents and + each document was just a short sentence like a statement from a document real document", + "tokens": [50388, 2022, 411, 321, 632, 411, 538, 1715, 321, 632, 411, 17375, 295, + 8512, 293, 1184, 4166, 390, 445, 257, 2099, 8174, 411, 257, 5629, 490, 257, 4166, + 957, 4166, 50856], "temperature": 0.0, "avg_logprob": -0.2521374667132342, "compression_ratio": + 1.7876447876447876, "no_speech_prob": 0.008226973004639149}, {"id": 348, "seek": + 384792, "start": 3858.28, "end": 3866.08, "text": " And of course we were running + out like we were running into all this garbage collector stop the world problems + and so on and we were running this", "tokens": [50882, 400, 295, 1164, 321, 645, + 2614, 484, 411, 321, 645, 2614, 666, 439, 341, 14150, 23960, 1590, 264, 1002, 2740, + 293, 370, 322, 293, 321, 645, 2614, 341, 51272], "temperature": 0.0, "avg_logprob": + -0.2521374667132342, "compression_ratio": 1.7876447876447876, "no_speech_prob": + 0.008226973004639149}, {"id": 349, "seek": 384792, "start": 3866.08, "end": 3872.08, + "text": " Profilers, I think one of them was J rocket and then others and like when + you see the graphs you''re like, okay", "tokens": [51272, 6039, 388, 433, 11, 286, + 519, 472, 295, 552, 390, 508, 13012, 293, 550, 2357, 293, 411, 562, 291, 536, 264, + 24877, 291, 434, 411, 11, 1392, 51572], "temperature": 0.0, 
"avg_logprob": -0.2521374667132342, + "compression_ratio": 1.7876447876447876, "no_speech_prob": 0.008226973004639149}, + {"id": 350, "seek": 384792, "start": 3872.64, "end": 3876.08, "text": " So now I + know yes it leaks, but what should I do?", "tokens": [51600, 407, 586, 286, 458, + 2086, 309, 28885, 11, 457, 437, 820, 286, 360, 30, 51772], "temperature": 0.0, "avg_logprob": + -0.2521374667132342, "compression_ratio": 1.7876447876447876, "no_speech_prob": + 0.008226973004639149}, {"id": 351, "seek": 387608, "start": 3876.7999999999997, + "end": 3880.96, "text": " So or it tells you that your code is using like", "tokens": + [50400, 407, 420, 309, 5112, 291, 300, 428, 3089, 307, 1228, 411, 50608], "temperature": + 0.0, "avg_logprob": -0.3873465061187744, "compression_ratio": 1.5691489361702127, + "no_speech_prob": 0.01709228754043579}, {"id": 352, "seek": 387608, "start": 3881.52, + "end": 3884.7999999999997, "text": " Biterase too much like what can you do other + than that?", "tokens": [50636, 363, 1681, 651, 886, 709, 411, 437, 393, 291, 360, + 661, 813, 300, 30, 50800], "temperature": 0.0, "avg_logprob": -0.3873465061187744, + "compression_ratio": 1.5691489361702127, "no_speech_prob": 0.01709228754043579}, + {"id": 353, "seek": 387608, "start": 3887.56, "end": 3893.7999999999997, "text": + " Yeah, and for performance it''s even worse. So you see that like this model takes + a lot of time, but", "tokens": [50938, 865, 11, 293, 337, 3389, 309, 311, 754, 5324, + 13, 407, 291, 536, 300, 411, 341, 2316, 2516, 257, 688, 295, 565, 11, 457, 51250], + "temperature": 0.0, "avg_logprob": -0.3873465061187744, "compression_ratio": 1.5691489361702127, + "no_speech_prob": 0.01709228754043579}, {"id": 354, "seek": 387608, "start": 3895.68, + "end": 3902.56, "text": " The in a multi multi threaded world that like it''s not + for sure. 
So you may improve it that", "tokens": [51344, 440, 294, 257, 4825, 4825, + 47493, 1002, 300, 411, 309, 311, 406, 337, 988, 13, 407, 291, 815, 3470, 309, 300, + 51688], "temperature": 0.0, "avg_logprob": -0.3873465061187744, "compression_ratio": + 1.5691489361702127, "no_speech_prob": 0.01709228754043579}, {"id": 355, "seek": + 390256, "start": 3903.36, "end": 3910.0, "text": " Like and that happened so quite + quite quite a few times so people went to me and said like", "tokens": [50404, 1743, + 293, 300, 2011, 370, 1596, 1596, 1596, 257, 1326, 1413, 370, 561, 1437, 281, 385, + 293, 848, 411, 50736], "temperature": 0.0, "avg_logprob": -0.3109179941813151, "compression_ratio": + 1.595, "no_speech_prob": 0.009028066881000996}, {"id": 356, "seek": 390256, "start": + 3911.04, "end": 3914.56, "text": " You''re analysis of performance contradictory", + "tokens": [50788, 509, 434, 5215, 295, 3389, 49555, 50964], "temperature": 0.0, + "avg_logprob": -0.3109179941813151, "compression_ratio": 1.595, "no_speech_prob": + 0.009028066881000996}, {"id": 357, "seek": 390256, "start": 3915.84, "end": 3917.84, + "text": " Profile blocks", "tokens": [51028, 6039, 794, 8474, 51128], "temperature": + 0.0, "avg_logprob": -0.3109179941813151, "compression_ratio": 1.595, "no_speech_prob": + 0.009028066881000996}, {"id": 358, "seek": 390256, "start": 3918.96, "end": 3922.88, + "text": " And that''s okay, right because you didn''t optimize for the profiler", + "tokens": [51184, 400, 300, 311, 1392, 11, 558, 570, 291, 994, 380, 19719, 337, + 264, 1740, 5441, 51380], "temperature": 0.0, "avg_logprob": -0.3109179941813151, + "compression_ratio": 1.595, "no_speech_prob": 0.009028066881000996}, {"id": 359, + "seek": 390256, "start": 3924.4, "end": 3930.48, "text": " Yeah, because profiler + cannot like so it cannot say to you what would happen if you change something", + "tokens": [51456, 865, 11, 570, 1740, 5441, 2644, 411, 370, 309, 2644, 584, 281, + 291, 437, 576, 1051, 498, 291, 1319, 746, 
51760], "temperature": 0.0, "avg_logprob": + -0.3109179941813151, "compression_ratio": 1.595, "no_speech_prob": 0.009028066881000996}, + {"id": 360, "seek": 393048, "start": 3930.88, "end": 3935.44, "text": " Exactly, + it''s just a snapshot. Yeah, it''s just a snapshot. Yeah, exactly", "tokens": [50384, + 7587, 11, 309, 311, 445, 257, 30163, 13, 865, 11, 309, 311, 445, 257, 30163, 13, + 865, 11, 2293, 50612], "temperature": 0.0, "avg_logprob": -0.32242582241694134, + "compression_ratio": 1.7136563876651982, "no_speech_prob": 0.004768346436321735}, + {"id": 361, "seek": 393048, "start": 3935.92, "end": 3940.64, "text": " And and + like coming back to hsw what are you hoping to achieve like", "tokens": [50636, + 400, 293, 411, 1348, 646, 281, 276, 82, 86, 437, 366, 291, 7159, 281, 4584, 411, + 50872], "temperature": 0.0, "avg_logprob": -0.32242582241694134, "compression_ratio": + 1.7136563876651982, "no_speech_prob": 0.004768346436321735}, {"id": 362, "seek": + 393048, "start": 3941.28, "end": 3944.8, "text": " Maybe in some midterm future + for example like", "tokens": [50904, 2704, 294, 512, 2062, 7039, 2027, 337, 1365, + 411, 51080], "temperature": 0.0, "avg_logprob": -0.32242582241694134, "compression_ratio": + 1.7136563876651982, "no_speech_prob": 0.004768346436321735}, {"id": 363, "seek": + 393048, "start": 3945.92, "end": 3953.12, "text": " Why do you decide at work which + where the reimplementation as w is when they add symbolic filtering?", "tokens": + [51136, 1545, 360, 291, 4536, 412, 589, 597, 689, 264, 33433, 781, 19631, 382, 261, + 307, 562, 436, 909, 25755, 30822, 30, 51496], "temperature": 0.0, "avg_logprob": + -0.32242582241694134, "compression_ratio": 1.7136563876651982, "no_speech_prob": + 0.004768346436321735}, {"id": 364, "seek": 393048, "start": 3953.6, "end": 3959.6, + "text": " So like what would it take in your original paper in your original algorithm + to add symbolic filters?", "tokens": [51520, 407, 411, 437, 576, 309, 747, 294, + 428, 
3380, 3035, 294, 428, 3380, 9284, 281, 909, 25755, 15995, 30, 51820], "temperature": + 0.0, "avg_logprob": -0.32242582241694134, "compression_ratio": 1.7136563876651982, + "no_speech_prob": 0.004768346436321735}, {"id": 365, "seek": 395960, "start": 3959.8399999999997, + "end": 3963.2799999999997, "text": " How does it change the dynamic of that graph + and search?", "tokens": [50376, 1012, 775, 309, 1319, 264, 8546, 295, 300, 4295, + 293, 3164, 30, 50548], "temperature": 0.0, "avg_logprob": -0.21966276729808135, + "compression_ratio": 1.7327188940092166, "no_speech_prob": 0.003681499743834138}, + {"id": 366, "seek": 395960, "start": 3964.4, "end": 3973.52, "text": " Uh, well, + it seems like for me like so I can correlate interest to and then and interest to + symbolic filtering", "tokens": [50604, 4019, 11, 731, 11, 309, 2544, 411, 337, 385, + 411, 370, 286, 393, 48742, 1179, 281, 293, 550, 293, 1179, 281, 25755, 30822, 51060], + "temperature": 0.0, "avg_logprob": -0.21966276729808135, "compression_ratio": 1.7327188940092166, + "no_speech_prob": 0.003681499743834138}, {"id": 367, "seek": 395960, "start": 3973.8399999999997, + "end": 3982.7999999999997, "text": " So like I think two years ago I haven''t heard + like people talk about symbolic filtering in an but now like it''s a hot topic", + "tokens": [51076, 407, 411, 286, 519, 732, 924, 2057, 286, 2378, 380, 2198, 411, + 561, 751, 466, 25755, 30822, 294, 364, 457, 586, 411, 309, 311, 257, 2368, 4829, + 51524], "temperature": 0.0, "avg_logprob": -0.21966276729808135, "compression_ratio": + 1.7327188940092166, "no_speech_prob": 0.003681499743834138}, {"id": 368, "seek": + 395960, "start": 3983.44, "end": 3988.08, "text": " Like from different places people + want symbolic filtering that is like for targeting", "tokens": [51556, 1743, 490, + 819, 3190, 561, 528, 25755, 30822, 300, 307, 411, 337, 17918, 51788], "temperature": + 0.0, "avg_logprob": -0.21966276729808135, "compression_ratio": 1.7327188940092166, + 
"no_speech_prob": 0.003681499743834138}, {"id": 369, "seek": 398808, "start": 3988.72, + "end": 3994.64, "text": " So like for ads. Yeah, you can you want to have some targeting + for the audience or some other", "tokens": [50396, 407, 411, 337, 10342, 13, 865, + 11, 291, 393, 291, 528, 281, 362, 512, 17918, 337, 264, 4034, 420, 512, 661, 50692], + "temperature": 0.0, "avg_logprob": -0.24053382873535156, "compression_ratio": 1.5404040404040404, + "no_speech_prob": 0.0013806666247546673}, {"id": 370, "seek": 398808, "start": 3995.2, + "end": 3996.48, "text": " filters and", "tokens": [50720, 15995, 293, 50784], "temperature": + 0.0, "avg_logprob": -0.24053382873535156, "compression_ratio": 1.5404040404040404, + "no_speech_prob": 0.0013806666247546673}, {"id": 371, "seek": 398808, "start": 3997.68, + "end": 4001.12, "text": " But I see that as outside of the end and itself", "tokens": + [50844, 583, 286, 536, 300, 382, 2380, 295, 264, 917, 293, 2564, 51016], "temperature": + 0.0, "avg_logprob": -0.24053382873535156, "compression_ratio": 1.5404040404040404, + "no_speech_prob": 0.0013806666247546673}, {"id": 372, "seek": 398808, "start": 4001.92, + "end": 4003.92, "text": " so", "tokens": [51056, 370, 51156], "temperature": 0.0, + "avg_logprob": -0.24053382873535156, "compression_ratio": 1.5404040404040404, "no_speech_prob": + 0.0013806666247546673}, {"id": 373, "seek": 398808, "start": 4003.92, "end": 4008.08, + "text": " As I said when working on a startup so our first application was", "tokens": + [51156, 1018, 286, 848, 562, 1364, 322, 257, 18578, 370, 527, 700, 3861, 390, 51364], + "temperature": 0.0, "avg_logprob": -0.24053382873535156, "compression_ratio": 1.5404040404040404, + "no_speech_prob": 0.0013806666247546673}, {"id": 374, "seek": 398808, "start": 4008.96, + "end": 4014.4, "text": " Doing something like symbolic filtering and there it''s + easier in some sense because", "tokens": [51408, 18496, 746, 411, 25755, 30822, + 293, 456, 309, 311, 3571, 294, 
512, 2020, 570, 51680], "temperature": 0.0, "avg_logprob": + -0.24053382873535156, "compression_ratio": 1.5404040404040404, "no_speech_prob": + 0.0013806666247546673}, {"id": 375, "seek": 401440, "start": 4014.8, "end": 4021.36, + "text": " Like as you said there is a problem of this distances and high-dimensional + space and this problem", "tokens": [50384, 1743, 382, 291, 848, 456, 307, 257, 1154, + 295, 341, 22182, 293, 1090, 12, 18759, 1901, 293, 341, 1154, 50712], "temperature": + 0.0, "avg_logprob": -0.2316982632591611, "compression_ratio": 1.6396396396396395, + "no_speech_prob": 0.0011786088580265641}, {"id": 376, "seek": 401440, "start": 4021.44, + "end": 4023.6800000000003, "text": " There is no such problem in symbolic filtering", + "tokens": [50716, 821, 307, 572, 1270, 1154, 294, 25755, 30822, 50828], "temperature": + 0.0, "avg_logprob": -0.2316982632591611, "compression_ratio": 1.6396396396396395, + "no_speech_prob": 0.0011786088580265641}, {"id": 377, "seek": 401440, "start": 4024.7200000000003, + "end": 4030.56, "text": " So symbolic filtering you have a query that have exact + result and yeah, if you write the SQL query", "tokens": [50880, 407, 25755, 30822, + 291, 362, 257, 14581, 300, 362, 1900, 1874, 293, 1338, 11, 498, 291, 2464, 264, + 19200, 14581, 51172], "temperature": 0.0, "avg_logprob": -0.2316982632591611, "compression_ratio": + 1.6396396396396395, "no_speech_prob": 0.0011786088580265641}, {"id": 378, "seek": + 401440, "start": 4031.12, "end": 4037.92, "text": " So it can be optimized to work + efficiently and but the iNM does a very different job. 
It does approximate", "tokens": + [51200, 407, 309, 393, 312, 26941, 281, 589, 19621, 293, 457, 264, 741, 45, 44, + 775, 257, 588, 819, 1691, 13, 467, 775, 30874, 51540], "temperature": 0.0, "avg_logprob": + -0.2316982632591611, "compression_ratio": 1.6396396396396395, "no_speech_prob": + 0.0011786088580265641}, {"id": 379, "seek": 401440, "start": 4038.7200000000003, + "end": 4040.4, "text": " Yeah filtering", "tokens": [51580, 865, 30822, 51664], + "temperature": 0.0, "avg_logprob": -0.2316982632591611, "compression_ratio": 1.6396396396396395, + "no_speech_prob": 0.0011786088580265641}, {"id": 380, "seek": 404040, "start": 4040.48, + "end": 4048.56, "text": " We can kind of mix them together. So if you add like so + you have a distance and like you add some", "tokens": [50368, 492, 393, 733, 295, + 2890, 552, 1214, 13, 407, 498, 291, 909, 411, 370, 291, 362, 257, 4560, 293, 411, + 291, 909, 512, 50772], "temperature": 0.0, "avg_logprob": -0.16905040171608995, + "compression_ratio": 1.5913978494623655, "no_speech_prob": 0.001725785550661385}, + {"id": 381, "seek": 404040, "start": 4050.4, "end": 4057.6, "text": " Like prefix + for that which somehow captures the symbolic filtering and you can build an index + that also like takes", "tokens": [50864, 1743, 46969, 337, 300, 597, 6063, 27986, + 264, 25755, 30822, 293, 291, 393, 1322, 364, 8186, 300, 611, 411, 2516, 51224], + "temperature": 0.0, "avg_logprob": -0.16905040171608995, "compression_ratio": 1.5913978494623655, + "no_speech_prob": 0.001725785550661385}, {"id": 382, "seek": 404040, "start": 4059.36, + "end": 4064.48, "text": " Takes account and like there are some other people who + suggested to do that as well", "tokens": [51312, 44347, 2696, 293, 411, 456, 366, + 512, 661, 561, 567, 10945, 281, 360, 300, 382, 731, 51568], "temperature": 0.0, + "avg_logprob": -0.16905040171608995, "compression_ratio": 1.5913978494623655, "no_speech_prob": + 0.001725785550661385}, {"id": 383, "seek": 406448, "start": 4065.2, 
"end": 4069.04, + "text": " But the problem here like and yeah, that can help so during search", "tokens": + [50400, 583, 264, 1154, 510, 411, 293, 1338, 11, 300, 393, 854, 370, 1830, 3164, + 50592], "temperature": 0.0, "avg_logprob": -0.22255685594346789, "compression_ratio": + 1.7254098360655739, "no_speech_prob": 0.008683813735842705}, {"id": 384, "seek": + 406448, "start": 4069.36, "end": 4078.48, "text": " So if you filter by the symbol + and like you can easily add filtering so when hnsw does filtering for deletes like + can be done the same way", "tokens": [50608, 407, 498, 291, 6608, 538, 264, 5986, + 293, 411, 291, 393, 3612, 909, 30822, 370, 562, 276, 3695, 86, 775, 30822, 337, + 1103, 37996, 411, 393, 312, 1096, 264, 912, 636, 51064], "temperature": 0.0, "avg_logprob": + -0.22255685594346789, "compression_ratio": 1.7254098360655739, "no_speech_prob": + 0.008683813735842705}, {"id": 385, "seek": 406448, "start": 4080.08, "end": 4082.08, + "text": " Yeah, you can extract", "tokens": [51144, 865, 11, 291, 393, 8947, 51244], + "temperature": 0.0, "avg_logprob": -0.22255685594346789, "compression_ratio": 1.7254098360655739, + "no_speech_prob": 0.008683813735842705}, {"id": 386, "seek": 406448, "start": 4082.08, + "end": 4088.4, "text": " Like only elements that pass the filter and there is some + like guidance on the graph because you create that wizard", "tokens": [51244, 1743, + 787, 4959, 300, 1320, 264, 6608, 293, 456, 307, 512, 411, 10056, 322, 264, 4295, + 570, 291, 1884, 300, 25807, 51560], "temperature": 0.0, "avg_logprob": -0.22255685594346789, + "compression_ratio": 1.7254098360655739, "no_speech_prob": 0.008683813735842705}, + {"id": 387, "seek": 406448, "start": 4089.12, "end": 4094.2400000000002, "text": + " But for me like I don''t know so you have like huge number of possible filters", + "tokens": [51596, 583, 337, 385, 411, 286, 500, 380, 458, 370, 291, 362, 411, 2603, + 1230, 295, 1944, 15995, 51852], "temperature": 0.0, "avg_logprob": 
-0.22255685594346789, + "compression_ratio": 1.7254098360655739, "no_speech_prob": 0.008683813735842705}, + {"id": 388, "seek": 409448, "start": 4094.56, "end": 4101.68, "text": " So what + will be the metric and how would you balance it with the like approximate network + that creates a lot of problems", "tokens": [50368, 407, 437, 486, 312, 264, 20678, + 293, 577, 576, 291, 4772, 309, 365, 264, 411, 30874, 3209, 300, 7829, 257, 688, + 295, 2740, 50724], "temperature": 0.0, "avg_logprob": -0.2755220247351605, "compression_ratio": + 1.6440677966101696, "no_speech_prob": 0.0027677782345563173}, {"id": 389, "seek": + 409448, "start": 4102.24, "end": 4104.24, "text": " I think yeah, and I", "tokens": + [50752, 286, 519, 1338, 11, 293, 286, 50852], "temperature": 0.0, "avg_logprob": + -0.2755220247351605, "compression_ratio": 1.6440677966101696, "no_speech_prob": + 0.0027677782345563173}, {"id": 390, "seek": 409448, "start": 4104.88, "end": 4110.32, + "text": " I thought that the best solution would be like to keep this like to some + extent", "tokens": [50884, 286, 1194, 300, 264, 1151, 3827, 576, 312, 411, 281, + 1066, 341, 411, 281, 512, 8396, 51156], "temperature": 0.0, "avg_logprob": -0.2755220247351605, + "compression_ratio": 1.6440677966101696, "no_speech_prob": 0.0027677782345563173}, + {"id": 391, "seek": 409448, "start": 4111.52, "end": 4117.36, "text": " But focus + more on like how do you can chart the index according to those like", "tokens": + [51216, 583, 1879, 544, 322, 411, 577, 360, 291, 393, 6927, 264, 8186, 4650, 281, + 729, 411, 51508], "temperature": 0.0, "avg_logprob": -0.2755220247351605, "compression_ratio": + 1.6440677966101696, "no_speech_prob": 0.0027677782345563173}, {"id": 392, "seek": + 409448, "start": 4118.96, "end": 4123.28, "text": " Great theory don''t that there + are sharp. 
So you can like do SQL queries like for instance", "tokens": [51588, + 3769, 5261, 500, 380, 300, 456, 366, 8199, 13, 407, 291, 393, 411, 360, 19200, 24109, + 411, 337, 5197, 51804], "temperature": 0.0, "avg_logprob": -0.2755220247351605, + "compression_ratio": 1.6440677966101696, "no_speech_prob": 0.0027677782345563173}, + {"id": 393, "seek": 412328, "start": 4124.08, "end": 4126.0, "text": " Like there + are some queries that can", "tokens": [50404, 1743, 456, 366, 512, 24109, 300, 393, + 50500], "temperature": 0.0, "avg_logprob": -0.21215457706661014, "compression_ratio": + 1.6604651162790698, "no_speech_prob": 0.004452978726476431}, {"id": 394, "seek": + 412328, "start": 4127.04, "end": 4128.88, "text": " Work well with this filtering", + "tokens": [50552, 6603, 731, 365, 341, 30822, 50644], "temperature": 0.0, "avg_logprob": + -0.21215457706661014, "compression_ratio": 1.6604651162790698, "no_speech_prob": + 0.004452978726476431}, {"id": 395, "seek": 412328, "start": 4129.679999999999, "end": + 4135.44, "text": " Like if you''re most of like or like I don''t know 20% of the + elements pass the symbolic filter", "tokens": [50684, 1743, 498, 291, 434, 881, + 295, 411, 420, 411, 286, 500, 380, 458, 945, 4, 295, 264, 4959, 1320, 264, 25755, + 6608, 50972], "temperature": 0.0, "avg_logprob": -0.21215457706661014, "compression_ratio": + 1.6604651162790698, "no_speech_prob": 0.004452978726476431}, {"id": 396, "seek": + 412328, "start": 4135.84, "end": 4137.84, "text": " So that is fine you can use + it", "tokens": [50992, 407, 300, 307, 2489, 291, 393, 764, 309, 51092], "temperature": + 0.0, "avg_logprob": -0.21215457706661014, "compression_ratio": 1.6604651162790698, + "no_speech_prob": 0.004452978726476431}, {"id": 397, "seek": 412328, "start": 4137.92, + "end": 4145.44, "text": " But maybe there are some queries for which like I know + only like one of a million passes them and those are in different parts", "tokens": + [51096, 583, 1310, 456, 366, 512, 24109, 
337, 597, 411, 286, 458, 787, 411, 472, + 295, 257, 2459, 11335, 552, 293, 729, 366, 294, 819, 3166, 51472], "temperature": + 0.0, "avg_logprob": -0.21215457706661014, "compression_ratio": 1.6604651162790698, + "no_speech_prob": 0.004452978726476431}, {"id": 398, "seek": 412328, "start": 4145.84, + "end": 4147.44, "text": " Yeah exactly space", "tokens": [51492, 865, 2293, 1901, + 51572], "temperature": 0.0, "avg_logprob": -0.21215457706661014, "compression_ratio": + 1.6604651162790698, "no_speech_prob": 0.004452978726476431}, {"id": 399, "seek": + 412328, "start": 4148.16, "end": 4150.08, "text": " So for them you can", "tokens": + [51608, 407, 337, 552, 291, 393, 51704], "temperature": 0.0, "avg_logprob": -0.21215457706661014, + "compression_ratio": 1.6604651162790698, "no_speech_prob": 0.004452978726476431}, + {"id": 400, "seek": 415008, "start": 4150.24, "end": 4152.24, "text": " Uh", "tokens": + [50372, 4019, 50472], "temperature": 0.0, "avg_logprob": -0.28486966623843296, "compression_ratio": + 1.74235807860262, "no_speech_prob": 0.003158317180350423}, {"id": 401, "seek": 415008, + "start": 4152.24, "end": 4156.88, "text": " See in real time. 
So you like you search + and you see that it doesn''t perform well", "tokens": [50472, 3008, 294, 957, 565, + 13, 407, 291, 411, 291, 3164, 293, 291, 536, 300, 309, 1177, 380, 2042, 731, 50704], + "temperature": 0.0, "avg_logprob": -0.28486966623843296, "compression_ratio": 1.74235807860262, + "no_speech_prob": 0.003158317180350423}, {"id": 402, "seek": 415008, "start": 4158.08, + "end": 4160.24, "text": " Uh for those and you can just", "tokens": [50764, 4019, + 337, 729, 293, 291, 393, 445, 50872], "temperature": 0.0, "avg_logprob": -0.28486966623843296, + "compression_ratio": 1.74235807860262, "no_speech_prob": 0.003158317180350423}, + {"id": 403, "seek": 415008, "start": 4160.88, "end": 4167.36, "text": " Build the + separate index for them right because you know those are small those are people + want to find them", "tokens": [50904, 11875, 264, 4994, 8186, 337, 552, 558, 570, + 291, 458, 729, 366, 1359, 729, 366, 561, 528, 281, 915, 552, 51228], "temperature": + 0.0, "avg_logprob": -0.28486966623843296, "compression_ratio": 1.74235807860262, + "no_speech_prob": 0.003158317180350423}, {"id": 404, "seek": 415008, "start": 4168.24, + "end": 4174.08, "text": " Uh, maybe there are enough maybe they''re out of a billion, + but if you have three LN elements, so there''s like a million of them", "tokens": + [51272, 4019, 11, 1310, 456, 366, 1547, 1310, 436, 434, 484, 295, 257, 5218, 11, + 457, 498, 291, 362, 1045, 441, 45, 4959, 11, 370, 456, 311, 411, 257, 2459, 295, + 552, 51564], "temperature": 0.0, "avg_logprob": -0.28486966623843296, "compression_ratio": + 1.74235807860262, "no_speech_prob": 0.003158317180350423}, {"id": 405, "seek": 415008, + "start": 4174.64, "end": 4177.28, "text": " So you you you cash them like build + a cash index", "tokens": [51592, 407, 291, 291, 291, 6388, 552, 411, 1322, 257, + 6388, 8186, 51724], "temperature": 0.0, "avg_logprob": -0.28486966623843296, "compression_ratio": + 1.74235807860262, "no_speech_prob": 0.003158317180350423}, 
{"id": 406, "seek": 417728, + "start": 4177.92, "end": 4181.92, "text": " For those on the fly so that is like + discrete optimization problem", "tokens": [50396, 1171, 729, 322, 264, 3603, 370, + 300, 307, 411, 27706, 19618, 1154, 50596], "temperature": 0.0, "avg_logprob": -0.2454017167238845, + "compression_ratio": 1.7025862068965518, "no_speech_prob": 0.012665307149291039}, + {"id": 407, "seek": 417728, "start": 4182.5599999999995, "end": 4186.639999999999, + "text": " And I think that''s a bit outside of the index because index is like", + "tokens": [50628, 400, 286, 519, 300, 311, 257, 857, 2380, 295, 264, 8186, 570, + 8186, 307, 411, 50832], "temperature": 0.0, "avg_logprob": -0.2454017167238845, + "compression_ratio": 1.7025862068965518, "no_speech_prob": 0.012665307149291039}, + {"id": 408, "seek": 417728, "start": 4188.16, "end": 4189.84, "text": " Uh", "tokens": + [50908, 4019, 50992], "temperature": 0.0, "avg_logprob": -0.2454017167238845, "compression_ratio": + 1.7025862068965518, "no_speech_prob": 0.012665307149291039}, {"id": 409, "seek": + 417728, "start": 4189.92, "end": 4192.5599999999995, "text": " Yeah, so it''s focused + on the different part. Yeah", "tokens": [50996, 865, 11, 370, 309, 311, 5178, 322, + 264, 819, 644, 13, 865, 51128], "temperature": 0.0, "avg_logprob": -0.2454017167238845, + "compression_ratio": 1.7025862068965518, "no_speech_prob": 0.012665307149291039}, + {"id": 410, "seek": 417728, "start": 4193.44, "end": 4200.24, "text": " Yeah, and + I really I don''t think that other algorithms like and an algorithms can like somehow + avoid this problem", "tokens": [51172, 865, 11, 293, 286, 534, 286, 500, 380, 519, + 300, 661, 14642, 411, 293, 364, 14642, 393, 411, 6063, 5042, 341, 1154, 51512], + "temperature": 0.0, "avg_logprob": -0.2454017167238845, "compression_ratio": 1.7025862068965518, + "no_speech_prob": 0.012665307149291039}, {"id": 411, "seek": 417728, "start": 4200.96, + "end": 4205.679999999999, "text": " Yeah exactly. 
Yeah, I mean it sounds yeah, what + would you say like you find a stunt correctly", "tokens": [51548, 865, 2293, 13, + 865, 11, 286, 914, 309, 3263, 1338, 11, 437, 576, 291, 584, 411, 291, 915, 257, + 33391, 8944, 51784], "temperature": 0.0, "avg_logprob": -0.2454017167238845, "compression_ratio": + 1.7025862068965518, "no_speech_prob": 0.012665307149291039}, {"id": 412, "seek": + 420568, "start": 4205.76, "end": 4208.320000000001, "text": " Like a little bit + like a and then contradicts", "tokens": [50368, 1743, 257, 707, 857, 411, 257, 293, + 550, 28900, 82, 50496], "temperature": 0.0, "avg_logprob": -0.24843960541945237, + "compression_ratio": 1.7046413502109705, "no_speech_prob": 0.004104072693735361}, + {"id": 413, "seek": 420568, "start": 4209.04, "end": 4215.04, "text": " Just kind + of the nature of symbolic filtering in some sense, but still people do it right + so for example in VEV8", "tokens": [50532, 1449, 733, 295, 264, 3687, 295, 25755, + 30822, 294, 512, 2020, 11, 457, 920, 561, 360, 309, 558, 370, 337, 1365, 294, 691, + 36, 53, 23, 50832], "temperature": 0.0, "avg_logprob": -0.24843960541945237, "compression_ratio": + 1.7046413502109705, "no_speech_prob": 0.004104072693735361}, {"id": 414, "seek": + 420568, "start": 4215.4400000000005, "end": 4222.08, "text": " And in quadrant they + did it right so like you and in milbus as well, but it''s funny like in milbus they + use", "tokens": [50852, 400, 294, 46856, 436, 630, 309, 558, 370, 411, 291, 293, + 294, 1962, 21441, 382, 731, 11, 457, 309, 311, 4074, 411, 294, 1962, 21441, 436, + 764, 51184], "temperature": 0.0, "avg_logprob": -0.24843960541945237, "compression_ratio": + 1.7046413502109705, "no_speech_prob": 0.004104072693735361}, {"id": 415, "seek": + 420568, "start": 4223.360000000001, "end": 4228.88, "text": " Fies and then other + algorithms, right, but they say we only support you know integer", "tokens": [51248, + 479, 530, 293, 550, 661, 14642, 11, 558, 11, 457, 436, 584, 321, 787, 1406, 
291, + 458, 24922, 51524], "temperature": 0.0, "avg_logprob": -0.24843960541945237, "compression_ratio": + 1.7046413502109705, "no_speech_prob": 0.004104072693735361}, {"id": 416, "seek": + 420568, "start": 4229.4400000000005, "end": 4232.240000000001, "text": " Fields, + but we don''t support for example strings yet", "tokens": [51552, 48190, 11, 457, + 321, 500, 380, 1406, 337, 1365, 13985, 1939, 51692], "temperature": 0.0, "avg_logprob": + -0.24843960541945237, "compression_ratio": 1.7046413502109705, "no_speech_prob": + 0.004104072693735361}, {"id": 417, "seek": 423224, "start": 4232.32, "end": 4236.32, + "text": " So we are working on adding strings which means essentially they''re designing + like", "tokens": [50368, 407, 321, 366, 1364, 322, 5127, 13985, 597, 1355, 4476, + 436, 434, 14685, 411, 50568], "temperature": 0.0, "avg_logprob": -0.2312630773748009, + "compression_ratio": 1.6434108527131783, "no_speech_prob": 0.0019731600768864155}, + {"id": 418, "seek": 423224, "start": 4237.28, "end": 4240.96, "text": " This graph + somehow in such a way that okay, it doesn''t support strings yet", "tokens": [50616, + 639, 4295, 6063, 294, 1270, 257, 636, 300, 1392, 11, 309, 1177, 380, 1406, 13985, + 1939, 50800], "temperature": 0.0, "avg_logprob": -0.2312630773748009, "compression_ratio": + 1.6434108527131783, "no_speech_prob": 0.0019731600768864155}, {"id": 419, "seek": + 423224, "start": 4241.679999999999, "end": 4243.679999999999, "text": " Maybe because + it''s not so easy", "tokens": [50836, 2704, 570, 309, 311, 406, 370, 1858, 50936], + "temperature": 0.0, "avg_logprob": -0.2312630773748009, "compression_ratio": 1.6434108527131783, + "no_speech_prob": 0.0019731600768864155}, {"id": 420, "seek": 423224, "start": 4244.0, + "end": 4246.0, "text": " To to to to to edit right", "tokens": [50952, 1407, 281, + 281, 281, 281, 8129, 558, 51052], "temperature": 0.0, "avg_logprob": -0.2312630773748009, + "compression_ratio": 1.6434108527131783, "no_speech_prob": 
0.0019731600768864155}, + {"id": 421, "seek": 423224, "start": 4246.32, "end": 4252.96, "text": " Well, I + I''m not sure so that also depends how you measure the performance like if you have + rare queries", "tokens": [51068, 1042, 11, 286, 286, 478, 406, 988, 370, 300, 611, + 5946, 577, 291, 3481, 264, 3389, 411, 498, 291, 362, 5892, 24109, 51400], "temperature": + 0.0, "avg_logprob": -0.2312630773748009, "compression_ratio": 1.6434108527131783, + "no_speech_prob": 0.0019731600768864155}, {"id": 422, "seek": 423224, "start": 4253.44, + "end": 4259.12, "text": " That switch the rich don''t have any result. So like you + pro like your algorithm doesn''t even work on them", "tokens": [51424, 663, 3679, + 264, 4593, 500, 380, 362, 604, 1874, 13, 407, 411, 291, 447, 411, 428, 9284, 1177, + 380, 754, 589, 322, 552, 51708], "temperature": 0.0, "avg_logprob": -0.2312630773748009, + "compression_ratio": 1.6434108527131783, "no_speech_prob": 0.0019731600768864155}, + {"id": 423, "seek": 425912, "start": 4259.599999999999, "end": 4266.64, "text": + " But you either are rare to you measure the like overall recall and you don''t + see it like any problems", "tokens": [50388, 583, 291, 2139, 366, 5892, 281, 291, + 3481, 264, 411, 4787, 9901, 293, 291, 500, 380, 536, 309, 411, 604, 2740, 50740], + "temperature": 0.0, "avg_logprob": -0.23749273794668693, "compression_ratio": 1.6273584905660377, + "no_speech_prob": 0.0017013797769322991}, {"id": 424, "seek": 425912, "start": 4267.5199999999995, + "end": 4273.599999999999, "text": " So definitely you can build the solution maybe + like some simple with like filtering during search", "tokens": [50784, 407, 2138, + 291, 393, 1322, 264, 3827, 1310, 411, 512, 2199, 365, 411, 30822, 1830, 3164, 51088], + "temperature": 0.0, "avg_logprob": -0.23749273794668693, "compression_ratio": 1.6273584905660377, + "no_speech_prob": 0.0017013797769322991}, {"id": 425, "seek": 425912, "start": 4274.4, + "end": 4276.24, "text": " but like", "tokens": 
[51128, 457, 411, 51220], "temperature": + 0.0, "avg_logprob": -0.23749273794668693, "compression_ratio": 1.6273584905660377, + "no_speech_prob": 0.0017013797769322991}, {"id": 426, "seek": 425912, "start": 4276.24, + "end": 4281.28, "text": " It sure it will fail on some points and that is suboptimal + in terms of latency", "tokens": [51220, 467, 988, 309, 486, 3061, 322, 512, 2793, + 293, 300, 307, 1422, 5747, 10650, 294, 2115, 295, 27043, 51472], "temperature": + 0.0, "avg_logprob": -0.23749273794668693, "compression_ratio": 1.6273584905660377, + "no_speech_prob": 0.0017013797769322991}, {"id": 427, "seek": 425912, "start": 4282.16, + "end": 4285.28, "text": " Yeah, so if you if you''re talking about existing solution", + "tokens": [51516, 865, 11, 370, 498, 291, 498, 291, 434, 1417, 466, 6741, 3827, + 51672], "temperature": 0.0, "avg_logprob": -0.23749273794668693, "compression_ratio": + 1.6273584905660377, "no_speech_prob": 0.0017013797769322991}, {"id": 428, "seek": + 428528, "start": 4285.759999999999, "end": 4290.8, "text": " Maybe they have like + a really good solution, which I just don''t know I looked at few", "tokens": [50388, + 2704, 436, 362, 411, 257, 534, 665, 3827, 11, 597, 286, 445, 500, 380, 458, 286, + 2956, 412, 1326, 50640], "temperature": 0.0, "avg_logprob": -0.22526710264144406, + "compression_ratio": 1.6150627615062763, "no_speech_prob": 0.004469835199415684}, + {"id": 429, "seek": 428528, "start": 4292.0, "end": 4295.679999999999, "text": " + Uh, and that was mostly like filtering inside the graph", "tokens": [50700, 4019, + 11, 293, 300, 390, 5240, 411, 30822, 1854, 264, 4295, 50884], "temperature": 0.0, + "avg_logprob": -0.22526710264144406, "compression_ratio": 1.6150627615062763, "no_speech_prob": + 0.004469835199415684}, {"id": 430, "seek": 428528, "start": 4296.48, "end": 4304.24, + "text": " So like if you if you if you if you have really rare elements which are + like distributed across the search space", "tokens": [50924, 407, 
411, 498, 291, + 498, 291, 498, 291, 498, 291, 362, 534, 5892, 4959, 597, 366, 411, 12631, 2108, + 264, 3164, 1901, 51312], "temperature": 0.0, "avg_logprob": -0.22526710264144406, + "compression_ratio": 1.6150627615062763, "no_speech_prob": 0.004469835199415684}, + {"id": 431, "seek": 428528, "start": 4305.5199999999995, "end": 4308.48, "text": + " Evenly like in different parts. So it will struggle", "tokens": [51376, 2754, + 356, 411, 294, 819, 3166, 13, 407, 309, 486, 7799, 51524], "temperature": 0.0, "avg_logprob": + -0.22526710264144406, "compression_ratio": 1.6150627615062763, "no_speech_prob": + 0.004469835199415684}, {"id": 432, "seek": 428528, "start": 4309.28, "end": 4313.599999999999, + "text": " Because you need to just do brute force of the whole to find them. Yeah, + exactly", "tokens": [51564, 1436, 291, 643, 281, 445, 360, 47909, 3464, 295, 264, + 1379, 281, 915, 552, 13, 865, 11, 2293, 51780], "temperature": 0.0, "avg_logprob": + -0.22526710264144406, "compression_ratio": 1.6150627615062763, "no_speech_prob": + 0.004469835199415684}, {"id": 433, "seek": 431360, "start": 4313.68, "end": 4318.4800000000005, + "text": " I mean to me it sounds like computerial explosion like if I add more and + more symbolic filters", "tokens": [50368, 286, 914, 281, 385, 309, 3263, 411, 3820, + 831, 15673, 411, 498, 286, 909, 544, 293, 544, 25755, 15995, 50608], "temperature": + 0.0, "avg_logprob": -0.19057241760858215, "compression_ratio": 1.7877551020408162, + "no_speech_prob": 0.0008095403900370002}, {"id": 434, "seek": 431360, "start": 4318.96, + "end": 4322.72, "text": " Like essentially I''m introducing like new sub spaces + in my space, right?", "tokens": [50632, 1743, 4476, 286, 478, 15424, 411, 777, 1422, + 7673, 294, 452, 1901, 11, 558, 30, 50820], "temperature": 0.0, "avg_logprob": -0.19057241760858215, + "compression_ratio": 1.7877551020408162, "no_speech_prob": 0.0008095403900370002}, + {"id": 435, "seek": 431360, "start": 4322.96, "end": 4329.92, 
"text": " So like + I need to like push these points somehow closer to each other within that specific + symbolic filter", "tokens": [50832, 407, 411, 286, 643, 281, 411, 2944, 613, 2793, + 6063, 4966, 281, 1184, 661, 1951, 300, 2685, 25755, 6608, 51180], "temperature": + 0.0, "avg_logprob": -0.19057241760858215, "compression_ratio": 1.7877551020408162, + "no_speech_prob": 0.0008095403900370002}, {"id": 436, "seek": 431360, "start": 4330.0, + "end": 4335.68, "text": " But if I add more of them now I have like kind of like + multi-dimensional space of filters, right?", "tokens": [51184, 583, 498, 286, 909, + 544, 295, 552, 586, 286, 362, 411, 733, 295, 411, 4825, 12, 18759, 1901, 295, 15995, + 11, 558, 30, 51468], "temperature": 0.0, "avg_logprob": -0.19057241760858215, "compression_ratio": + 1.7877551020408162, "no_speech_prob": 0.0008095403900370002}, {"id": 437, "seek": + 431360, "start": 4336.8, "end": 4341.04, "text": " Yeah, and you you have a really + high dimensional space of filters", "tokens": [51524, 865, 11, 293, 291, 291, 362, + 257, 534, 1090, 18795, 1901, 295, 15995, 51736], "temperature": 0.0, "avg_logprob": + -0.19057241760858215, "compression_ratio": 1.7877551020408162, "no_speech_prob": + 0.0008095403900370002}, {"id": 438, "seek": 434104, "start": 4341.12, "end": 4343.84, + "text": " But you don''t really know like the distribution of queries", "tokens": + [50368, 583, 291, 500, 380, 534, 458, 411, 264, 7316, 295, 24109, 50504], "temperature": + 0.0, "avg_logprob": -0.17489470056740633, "compression_ratio": 1.7183098591549295, + "no_speech_prob": 0.002023716690018773}, {"id": 439, "seek": 434104, "start": 4344.56, + "end": 4349.68, "text": " For those filters it should be very different because + those are user distribution", "tokens": [50540, 1171, 729, 15995, 309, 820, 312, + 588, 819, 570, 729, 366, 4195, 7316, 50796], "temperature": 0.0, "avg_logprob": + -0.17489470056740633, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 
0.002023716690018773}, {"id": 440, "seek": 434104, "start": 4350.4, "end": 4354.08, + "text": " Yeah, so that also will make the problem more complicated", "tokens": + [50832, 865, 11, 370, 300, 611, 486, 652, 264, 1154, 544, 6179, 51016], "temperature": + 0.0, "avg_logprob": -0.17489470056740633, "compression_ratio": 1.7183098591549295, + "no_speech_prob": 0.002023716690018773}, {"id": 441, "seek": 434104, "start": 4354.24, + "end": 4359.2, "text": " So it still can work if you if the especially if distribution + is kind of similar", "tokens": [51024, 407, 309, 920, 393, 589, 498, 291, 498, 264, + 2318, 498, 7316, 307, 733, 295, 2531, 51272], "temperature": 0.0, "avg_logprob": + -0.17489470056740633, "compression_ratio": 1.7183098591549295, "no_speech_prob": + 0.002023716690018773}, {"id": 442, "seek": 434104, "start": 4359.68, "end": 4363.68, + "text": " So it will work if you crank up the parameters of the graph", "tokens": + [51296, 407, 309, 486, 589, 498, 291, 21263, 493, 264, 9834, 295, 264, 4295, 51496], + "temperature": 0.0, "avg_logprob": -0.17489470056740633, "compression_ratio": 1.7183098591549295, + "no_speech_prob": 0.002023716690018773}, {"id": 443, "seek": 434104, "start": 4364.56, + "end": 4366.56, "text": " Yeah, use more connections", "tokens": [51540, 865, 11, + 764, 544, 9271, 51640], "temperature": 0.0, "avg_logprob": -0.17489470056740633, + "compression_ratio": 1.7183098591549295, "no_speech_prob": 0.002023716690018773}, + {"id": 444, "seek": 436656, "start": 4366.64, "end": 4373.200000000001, "text": + " But so there is a mismatch so during query your distribution may be very different + and you need to think about it", "tokens": [50368, 583, 370, 456, 307, 257, 23220, + 852, 370, 1830, 14581, 428, 7316, 815, 312, 588, 819, 293, 291, 643, 281, 519, 466, + 309, 50696], "temperature": 0.0, "avg_logprob": -0.20746775540438564, "compression_ratio": + 1.82421875, "no_speech_prob": 0.004831674508750439}, {"id": 445, "seek": 436656, + "start": 
4373.84, "end": 4375.200000000001, "text": " So like", "tokens": [50728, + 407, 411, 50796], "temperature": 0.0, "avg_logprob": -0.20746775540438564, "compression_ratio": + 1.82421875, "no_speech_prob": 0.004831674508750439}, {"id": 446, "seek": 436656, + "start": 4375.200000000001, "end": 4380.240000000001, "text": " How you balance + those inside so you have like two types of distance and how you balance them", "tokens": + [50796, 1012, 291, 4772, 729, 1854, 370, 291, 362, 411, 732, 3467, 295, 4560, 293, + 577, 291, 4772, 552, 51048], "temperature": 0.0, "avg_logprob": -0.20746775540438564, + "compression_ratio": 1.82421875, "no_speech_prob": 0.004831674508750439}, {"id": + 447, "seek": 436656, "start": 4380.240000000001, "end": 4383.52, "text": " You want + to balance it so the the query distribution", "tokens": [51048, 509, 528, 281, 4772, + 309, 370, 264, 264, 14581, 7316, 51212], "temperature": 0.0, "avg_logprob": -0.20746775540438564, + "compression_ratio": 1.82421875, "no_speech_prob": 0.004831674508750439}, {"id": + 448, "seek": 436656, "start": 4384.080000000001, "end": 4385.280000000001, "text": + " Yeah", "tokens": [51240, 865, 51300], "temperature": 0.0, "avg_logprob": -0.20746775540438564, + "compression_ratio": 1.82421875, "no_speech_prob": 0.004831674508750439}, {"id": + 449, "seek": 436656, "start": 4385.280000000001, "end": 4391.84, "text": " That''s + that''s that''s this field like I think this field of vector search doesn''t make + you excited that you you contributed to it", "tokens": [51300, 663, 311, 300, 311, + 300, 311, 341, 2519, 411, 286, 519, 341, 2519, 295, 8062, 3164, 1177, 380, 652, + 291, 2919, 300, 291, 291, 18434, 281, 309, 51628], "temperature": 0.0, "avg_logprob": + -0.20746775540438564, "compression_ratio": 1.82421875, "no_speech_prob": 0.004831674508750439}, + {"id": 450, "seek": 436656, "start": 4391.92, "end": 4396.0, "text": " Like how + do you feel about this field that is emerging right now?", "tokens": [51632, 1743, + 
577, 360, 291, 841, 466, 341, 2519, 300, 307, 14989, 558, 586, 30, 51836], "temperature": + 0.0, "avg_logprob": -0.20746775540438564, "compression_ratio": 1.82421875, "no_speech_prob": + 0.004831674508750439}, {"id": 451, "seek": 439656, "start": 4397.200000000001, "end": + 4402.64, "text": " Well, I think it is very important so right now I''m working + mostly on applications", "tokens": [50396, 1042, 11, 286, 519, 309, 307, 588, 1021, + 370, 558, 586, 286, 478, 1364, 5240, 322, 5821, 50668], "temperature": 0.0, "avg_logprob": + -0.255834477742513, "compression_ratio": 1.575609756097561, "no_speech_prob": 0.0015009396011009812}, + {"id": 452, "seek": 439656, "start": 4403.52, "end": 4405.52, "text": " how to", + "tokens": [50712, 577, 281, 50812], "temperature": 0.0, "avg_logprob": -0.255834477742513, + "compression_ratio": 1.575609756097561, "no_speech_prob": 0.0015009396011009812}, + {"id": 453, "seek": 439656, "start": 4406.400000000001, "end": 4411.04, "text": + " Like get advantage of this and so there are many applications", "tokens": [50856, + 1743, 483, 5002, 295, 341, 293, 370, 456, 366, 867, 5821, 51088], "temperature": + 0.0, "avg_logprob": -0.255834477742513, "compression_ratio": 1.575609756097561, + "no_speech_prob": 0.0015009396011009812}, {"id": 454, "seek": 439656, "start": 4412.320000000001, + "end": 4417.120000000001, "text": " Which cannot be done without efficient search + like there was a paper for deep mind", "tokens": [51152, 3013, 2644, 312, 1096, + 1553, 7148, 3164, 411, 456, 390, 257, 3035, 337, 2452, 1575, 51392], "temperature": + 0.0, "avg_logprob": -0.255834477742513, "compression_ratio": 1.575609756097561, + "no_speech_prob": 0.0015009396011009812}, {"id": 455, "seek": 439656, "start": 4417.6, + "end": 4420.4800000000005, "text": " Like was quite recently where they used search", + "tokens": [51416, 1743, 390, 1596, 3938, 689, 436, 1143, 3164, 51560], "temperature": + 0.0, "avg_logprob": -0.255834477742513, "compression_ratio": 
1.575609756097561, + "no_speech_prob": 0.0015009396011009812}, {"id": 456, "seek": 439656, "start": 4421.4400000000005, + "end": 4424.96, "text": " Uh like inside of the network and uh well", "tokens": + [51608, 4019, 411, 1854, 295, 264, 3209, 293, 2232, 731, 51784], "temperature": + 0.0, "avg_logprob": -0.255834477742513, "compression_ratio": 1.575609756097561, + "no_speech_prob": 0.0015009396011009812}, {"id": 457, "seek": 442656, "start": 4426.64, + "end": 4428.64, "text": " That makes a lot of sense", "tokens": [50368, 663, 1669, + 257, 688, 295, 2020, 50468], "temperature": 0.0, "avg_logprob": -0.3023087637765067, + "compression_ratio": 1.75, "no_speech_prob": 0.0007562597165815532}, {"id": 458, + "seek": 442656, "start": 4428.64, "end": 4430.400000000001, "text": " And I think", + "tokens": [50468, 400, 286, 519, 50556], "temperature": 0.0, "avg_logprob": -0.3023087637765067, + "compression_ratio": 1.75, "no_speech_prob": 0.0007562597165815532}, {"id": 459, + "seek": 442656, "start": 4431.4400000000005, "end": 4434.8, "text": " Yeah, there + will be more papers and that there were papers before that paper", "tokens": [50608, + 865, 11, 456, 486, 312, 544, 10577, 293, 300, 456, 645, 10577, 949, 300, 3035, 50776], + "temperature": 0.0, "avg_logprob": -0.3023087637765067, "compression_ratio": 1.75, + "no_speech_prob": 0.0007562597165815532}, {"id": 460, "seek": 442656, "start": 4435.76, + "end": 4442.160000000001, "text": " But there will be more papers that use an M + inside the inside the big like a huge nal p model", "tokens": [50824, 583, 456, + 486, 312, 544, 10577, 300, 764, 364, 376, 1854, 264, 1854, 264, 955, 411, 257, 2603, + 297, 304, 280, 2316, 51144], "temperature": 0.0, "avg_logprob": -0.3023087637765067, + "compression_ratio": 1.75, "no_speech_prob": 0.0007562597165815532}, {"id": 461, + "seek": 442656, "start": 4442.96, "end": 4446.8, "text": " Yeah, yeah, for example + like this uh learning to hash methods", "tokens": [51184, 865, 11, 1338, 11, 
337, + 1365, 411, 341, 2232, 2539, 281, 22019, 7150, 51376], "temperature": 0.0, "avg_logprob": + -0.3023087637765067, "compression_ratio": 1.75, "no_speech_prob": 0.0007562597165815532}, + {"id": 462, "seek": 442656, "start": 4446.8, "end": 4448.72, "text": " I don''t + know if you heard about them", "tokens": [51376, 286, 500, 380, 458, 498, 291, 2198, + 466, 552, 51472], "temperature": 0.0, "avg_logprob": -0.3023087637765067, "compression_ratio": + 1.75, "no_speech_prob": 0.0007562597165815532}, {"id": 463, "seek": 442656, "start": + 4448.72, "end": 4451.76, "text": " So like um there are like when I when I tried + to kind of", "tokens": [51472, 407, 411, 1105, 456, 366, 411, 562, 286, 562, 286, + 3031, 281, 733, 295, 51624], "temperature": 0.0, "avg_logprob": -0.3023087637765067, + "compression_ratio": 1.75, "no_speech_prob": 0.0007562597165815532}, {"id": 464, + "seek": 445176, "start": 4452.56, "end": 4457.280000000001, "text": " Put everything + into their buckets like how many different types of algorithms exist", "tokens": + [50404, 4935, 1203, 666, 641, 32191, 411, 577, 867, 819, 3467, 295, 14642, 2514, + 50640], "temperature": 0.0, "avg_logprob": -0.21257998303669254, "compression_ratio": + 1.6320754716981132, "no_speech_prob": 0.004206873942166567}, {"id": 465, "seek": + 445176, "start": 4457.76, "end": 4462.88, "text": " Like I didn''t know about learning + to hash. 
It seems to be like one of the recent uh developments", "tokens": [50664, + 1743, 286, 994, 380, 458, 466, 2539, 281, 22019, 13, 467, 2544, 281, 312, 411, 472, + 295, 264, 5162, 2232, 20862, 50920], "temperature": 0.0, "avg_logprob": -0.21257998303669254, + "compression_ratio": 1.6320754716981132, "no_speech_prob": 0.004206873942166567}, + {"id": 466, "seek": 445176, "start": 4464.16, "end": 4467.2, "text": " Are you following + up on that as well or uh", "tokens": [50984, 2014, 291, 3480, 493, 322, 300, 382, + 731, 420, 2232, 51136], "temperature": 0.0, "avg_logprob": -0.21257998303669254, + "compression_ratio": 1.6320754716981132, "no_speech_prob": 0.004206873942166567}, + {"id": 467, "seek": 445176, "start": 4467.92, "end": 4476.08, "text": " Well learning + to hash so like I''m not really following that so learning to hash was before hnsw", + "tokens": [51172, 1042, 2539, 281, 22019, 370, 411, 286, 478, 406, 534, 3480, 300, + 370, 2539, 281, 22019, 390, 949, 276, 3695, 86, 51580], "temperature": 0.0, "avg_logprob": + -0.21257998303669254, "compression_ratio": 1.6320754716981132, "no_speech_prob": + 0.004206873942166567}, {"id": 468, "seek": 445176, "start": 4476.8, "end": 4478.8, + "text": " Okay, there are algorithms", "tokens": [51616, 1033, 11, 456, 366, 14642, + 51716], "temperature": 0.0, "avg_logprob": -0.21257998303669254, "compression_ratio": + 1.6320754716981132, "no_speech_prob": 0.004206873942166567}, {"id": 469, "seek": + 447880, "start": 4478.88, "end": 4485.4400000000005, "text": " And uh when I talked + with people who did like were specialized on product quantization and review the + papers", "tokens": [50368, 400, 2232, 562, 286, 2825, 365, 561, 567, 630, 411, 645, + 19813, 322, 1674, 4426, 2144, 293, 3131, 264, 10577, 50696], "temperature": 0.0, + "avg_logprob": -0.23785109852635583, "compression_ratio": 1.6056338028169015, "no_speech_prob": + 0.0006676787161268294}, {"id": 470, "seek": 447880, "start": 4486.320000000001, + "end": 4493.84, 
"text": " Uh, they told me that like learning to hash never reaches + the performance of like post quantization", "tokens": [50740, 4019, 11, 436, 1907, + 385, 300, 411, 2539, 281, 22019, 1128, 14235, 264, 3389, 295, 411, 2183, 4426, 2144, + 51116], "temperature": 0.0, "avg_logprob": -0.23785109852635583, "compression_ratio": + 1.6056338028169015, "no_speech_prob": 0.0006676787161268294}, {"id": 471, "seek": + 447880, "start": 4494.4800000000005, "end": 4497.28, "text": " Like at least at + what that was like a few years ago", "tokens": [51148, 1743, 412, 1935, 412, 437, + 300, 390, 411, 257, 1326, 924, 2057, 51288], "temperature": 0.0, "avg_logprob": + -0.23785109852635583, "compression_ratio": 1.6056338028169015, "no_speech_prob": + 0.0006676787161268294}, {"id": 472, "seek": 447880, "start": 4498.56, "end": 4501.12, + "text": " Yeah, and uh yeah, maybe like", "tokens": [51352, 865, 11, 293, 2232, + 1338, 11, 1310, 411, 51480], "temperature": 0.0, "avg_logprob": -0.23785109852635583, + "compression_ratio": 1.6056338028169015, "no_speech_prob": 0.0006676787161268294}, + {"id": 473, "seek": 447880, "start": 4502.72, "end": 4504.72, "text": " Now it''s + solved", "tokens": [51560, 823, 309, 311, 13041, 51660], "temperature": 0.0, "avg_logprob": + -0.23785109852635583, "compression_ratio": 1.6056338028169015, "no_speech_prob": + 0.0006676787161268294}, {"id": 474, "seek": 447880, "start": 4505.4400000000005, + "end": 4508.320000000001, "text": " Uh, but like when I talk about an n", "tokens": + [51696, 4019, 11, 457, 411, 562, 286, 751, 466, 364, 297, 51840], "temperature": + 0.0, "avg_logprob": -0.23785109852635583, "compression_ratio": 1.6056338028169015, + "no_speech_prob": 0.0006676787161268294}, {"id": 475, "seek": 450880, "start": 4509.12, + "end": 4512.08, "text": " Inside the i think about like about graph in it", "tokens": + [50380, 15123, 264, 741, 519, 466, 411, 466, 4295, 294, 309, 50528], "temperature": + 0.0, "avg_logprob": -0.28593329020908903, 
"compression_ratio": 1.635, "no_speech_prob": + 0.0005888649611733854}, {"id": 476, "seek": 450880, "start": 4512.88, "end": 4514.4800000000005, + "text": " So yeah, yeah, and", "tokens": [50568, 407, 1338, 11, 1338, 11, 293, 50648], + "temperature": 0.0, "avg_logprob": -0.28593329020908903, "compression_ratio": 1.635, + "no_speech_prob": 0.0005888649611733854}, {"id": 477, "seek": 450880, "start": 4515.52, + "end": 4517.52, "text": " Yeah, and uh", "tokens": [50700, 865, 11, 293, 2232, 50800], + "temperature": 0.0, "avg_logprob": -0.28593329020908903, "compression_ratio": 1.635, + "no_speech_prob": 0.0005888649611733854}, {"id": 478, "seek": 450880, "start": 4517.52, + "end": 4521.4400000000005, "text": " So one interesting thing also can happen uh + like with graphs", "tokens": [50800, 407, 472, 1880, 551, 611, 393, 1051, 2232, + 411, 365, 24877, 50996], "temperature": 0.0, "avg_logprob": -0.28593329020908903, + "compression_ratio": 1.635, "no_speech_prob": 0.0005888649611733854}, {"id": 479, + "seek": 450880, "start": 4522.08, "end": 4530.0, "text": " So what what like what + is an additional advantage of graph uh nearest neighbors to your engines is that + you can change the metric", "tokens": [51028, 407, 437, 437, 411, 437, 307, 364, + 4497, 5002, 295, 4295, 2232, 23831, 12512, 281, 428, 12982, 307, 300, 291, 393, + 1319, 264, 20678, 51424], "temperature": 0.0, "avg_logprob": -0.28593329020908903, + "compression_ratio": 1.635, "no_speech_prob": 0.0005888649611733854}, {"id": 480, + "seek": 450880, "start": 4531.52, "end": 4533.52, "text": " Uh, so", "tokens": [51500, + 4019, 11, 370, 51600], "temperature": 0.0, "avg_logprob": -0.28593329020908903, + "compression_ratio": 1.635, "no_speech_prob": 0.0005888649611733854}, {"id": 481, + "seek": 450880, "start": 4533.6, "end": 4536.64, "text": " For instance if you are + doing multi-stage ranking", "tokens": [51604, 1171, 5197, 498, 291, 366, 884, 4825, + 12, 17882, 17833, 51756], "temperature": 0.0, "avg_logprob": 
-0.28593329020908903, + "compression_ratio": 1.635, "no_speech_prob": 0.0005888649611733854}, {"id": 482, + "seek": 453664, "start": 4537.12, "end": 4538.320000000001, "text": " Like the", + "tokens": [50388, 1743, 264, 50448], "temperature": 0.0, "avg_logprob": -0.3003704522245674, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.001609328668564558}, + {"id": 483, "seek": 453664, "start": 4538.320000000001, "end": 4544.72, "text": + " You have like and you have multiple candidate sources like for search you have + something like like like the m25", "tokens": [50448, 509, 362, 411, 293, 291, 362, + 3866, 11532, 7139, 411, 337, 3164, 291, 362, 746, 411, 411, 411, 264, 275, 6074, + 50768], "temperature": 0.0, "avg_logprob": -0.3003704522245674, "compression_ratio": + 1.8436018957345972, "no_speech_prob": 0.001609328668564558}, {"id": 484, "seek": + 453664, "start": 4545.360000000001, "end": 4548.4800000000005, "text": " Also you + might have embeddings like with similarity search", "tokens": [50800, 2743, 291, + 1062, 362, 12240, 29432, 411, 365, 32194, 3164, 50956], "temperature": 0.0, "avg_logprob": + -0.3003704522245674, "compression_ratio": 1.8436018957345972, "no_speech_prob": + 0.001609328668564558}, {"id": 485, "seek": 453664, "start": 4550.0, "end": 4554.0, + "text": " So uh, and those are like three that are separate sources and then ranked", + "tokens": [51032, 407, 2232, 11, 293, 729, 366, 411, 1045, 300, 366, 4994, 7139, + 293, 550, 20197, 51232], "temperature": 0.0, "avg_logprob": -0.3003704522245674, + "compression_ratio": 1.8436018957345972, "no_speech_prob": 0.001609328668564558}, + {"id": 486, "seek": 453664, "start": 4554.88, "end": 4560.240000000001, "text": + " Uh, but essentially like why do you need an n like for the first like from from + the beginning", "tokens": [51276, 4019, 11, 457, 4476, 411, 983, 360, 291, 643, + 364, 297, 411, 337, 264, 700, 411, 490, 490, 264, 2863, 51544], "temperature": 0.0, + "avg_logprob": 
-0.3003704522245674, "compression_ratio": 1.8436018957345972, "no_speech_prob": + 0.001609328668564558}, {"id": 487, "seek": 453664, "start": 4560.96, "end": 4563.360000000001, + "text": " Uh, you need an n to speed up the ranking", "tokens": [51580, 4019, 11, + 291, 643, 364, 297, 281, 3073, 493, 264, 17833, 51700], "temperature": 0.0, "avg_logprob": + -0.3003704522245674, "compression_ratio": 1.8436018957345972, "no_speech_prob": + 0.001609328668564558}, {"id": 488, "seek": 456336, "start": 4563.44, "end": 4566.719999999999, + "text": " So essentially you can rank all the documents using your have a ranker", + "tokens": [50368, 407, 4476, 291, 393, 6181, 439, 264, 8512, 1228, 428, 362, 257, + 6181, 260, 50532], "temperature": 0.0, "avg_logprob": -0.2939552990895397, "compression_ratio": + 1.7887931034482758, "no_speech_prob": 0.0010237540118396282}, {"id": 489, "seek": + 456336, "start": 4567.36, "end": 4576.32, "text": " Uh, but uh you cannot it''s + to like to expensive to do so you can add an n and n is basically for vector search + uh that is you", "tokens": [50564, 4019, 11, 457, 2232, 291, 2644, 309, 311, 281, + 411, 281, 5124, 281, 360, 370, 291, 393, 909, 364, 297, 293, 297, 307, 1936, 337, + 8062, 3164, 2232, 300, 307, 291, 51012], "temperature": 0.0, "avg_logprob": -0.2939552990895397, + "compression_ratio": 1.7887931034482758, "no_speech_prob": 0.0010237540118396282}, + {"id": 490, "seek": 456336, "start": 4576.4, "end": 4581.28, "text": " Distill everything + to vectors and you have the same objective and you have like a", "tokens": [51016, + 9840, 373, 1203, 281, 18875, 293, 291, 362, 264, 912, 10024, 293, 291, 362, 411, + 257, 51260], "temperature": 0.0, "avg_logprob": -0.2939552990895397, "compression_ratio": + 1.7887931034482758, "no_speech_prob": 0.0010237540118396282}, {"id": 491, "seek": + 456336, "start": 4582.88, "end": 4584.88, "text": " Like a way to", "tokens": [51340, + 1743, 257, 636, 281, 51440], "temperature": 0.0, "avg_logprob": 
-0.2939552990895397, + "compression_ratio": 1.7887931034482758, "no_speech_prob": 0.0010237540118396282}, + {"id": 492, "seek": 456336, "start": 4584.88, "end": 4587.36, "text": " Sparsify + the interactions", "tokens": [51440, 1738, 685, 2505, 264, 13280, 51564], "temperature": + 0.0, "avg_logprob": -0.2939552990895397, "compression_ratio": 1.7887931034482758, + "no_speech_prob": 0.0010237540118396282}, {"id": 493, "seek": 456336, "start": 4588.24, + "end": 4593.28, "text": " Uh, but you can look about the other way so you know you + have a graph and the graph are just the", "tokens": [51608, 4019, 11, 457, 291, + 393, 574, 466, 264, 661, 636, 370, 291, 458, 291, 362, 257, 4295, 293, 264, 4295, + 366, 445, 264, 51860], "temperature": 0.0, "avg_logprob": -0.2939552990895397, "compression_ratio": + 1.7887931034482758, "no_speech_prob": 0.0010237540118396282}, {"id": 494, "seek": + 459336, "start": 4593.44, "end": 4597.839999999999, "text": " Candidates and you + have like a low simple metric now you have", "tokens": [50368, 20466, 327, 1024, + 293, 291, 362, 411, 257, 2295, 2199, 20678, 586, 291, 362, 50588], "temperature": + 0.0, "avg_logprob": -0.26624090592939775, "compression_ratio": 1.7783251231527093, + "no_speech_prob": 0.00101264170370996}, {"id": 495, "seek": 459336, "start": 4598.5599999999995, + "end": 4604.639999999999, "text": " More complicated metric on this graph and you + have like a final ranking that also can be searched on this graph", "tokens": [50624, + 5048, 6179, 20678, 322, 341, 4295, 293, 291, 362, 411, 257, 2572, 17833, 300, 611, + 393, 312, 22961, 322, 341, 4295, 50928], "temperature": 0.0, "avg_logprob": -0.26624090592939775, + "compression_ratio": 1.7783251231527093, "no_speech_prob": 0.00101264170370996}, + {"id": 496, "seek": 459336, "start": 4605.5199999999995, "end": 4607.5199999999995, + "text": " So that means you don''t supply", "tokens": [50972, 407, 300, 1355, 291, + 500, 380, 5847, 51072], "temperature": 0.0, "avg_logprob": 
-0.26624090592939775, + "compression_ratio": 1.7783251231527093, "no_speech_prob": 0.00101264170370996}, + {"id": 497, "seek": 459336, "start": 4608.32, "end": 4612.799999999999, "text": + " Like a set of candidates to the ranking, but rather you supply interpoints in + the graphs", "tokens": [51112, 1743, 257, 992, 295, 11255, 281, 264, 17833, 11, + 457, 2831, 291, 5847, 728, 20552, 294, 264, 24877, 51336], "temperature": 0.0, "avg_logprob": + -0.26624090592939775, "compression_ratio": 1.7783251231527093, "no_speech_prob": + 0.00101264170370996}, {"id": 498, "seek": 459336, "start": 4613.28, "end": 4615.679999999999, + "text": " So you have a graph which is uh, well", "tokens": [51360, 407, 291, 362, + 257, 4295, 597, 307, 2232, 11, 731, 51480], "temperature": 0.0, "avg_logprob": -0.26624090592939775, + "compression_ratio": 1.7783251231527093, "no_speech_prob": 0.00101264170370996}, + {"id": 499, "seek": 459336, "start": 4616.639999999999, "end": 4619.2, "text": " + Which is built a trying to uh", "tokens": [51528, 3013, 307, 3094, 257, 1382, 281, + 2232, 51656], "temperature": 0.0, "avg_logprob": -0.26624090592939775, "compression_ratio": + 1.7783251231527093, "no_speech_prob": 0.00101264170370996}, {"id": 500, "seek": + 461920, "start": 4619.28, "end": 4621.28, "text": " uh", "tokens": [50368, 2232, + 50468], "temperature": 0.0, "avg_logprob": -0.2878838309758826, "compression_ratio": + 1.634517766497462, "no_speech_prob": 0.0012936722487211227}, {"id": 501, "seek": + 461920, "start": 4621.28, "end": 4623.28, "text": " Capture the uh", "tokens": [50468, + 9480, 540, 264, 2232, 50568], "temperature": 0.0, "avg_logprob": -0.2878838309758826, + "compression_ratio": 1.634517766497462, "no_speech_prob": 0.0012936722487211227}, + {"id": 502, "seek": 461920, "start": 4623.28, "end": 4630.48, "text": " similarity + for the ranker and uh like when you so instead of filtering like from one stage + to the next stage", "tokens": [50568, 32194, 337, 264, 6181, 260, 293, 
2232, 411, + 562, 291, 370, 2602, 295, 30822, 411, 490, 472, 3233, 281, 264, 958, 3233, 50928], + "temperature": 0.0, "avg_logprob": -0.2878838309758826, "compression_ratio": 1.634517766497462, + "no_speech_prob": 0.0012936722487211227}, {"id": 503, "seek": 461920, "start": 4630.639999999999, + "end": 4639.12, "text": " You can uh just switch the metric in the graph. You had + light metric, which is like vectors now you have more complicated metrics", "tokens": + [50936, 509, 393, 2232, 445, 3679, 264, 20678, 294, 264, 4295, 13, 509, 632, 1442, + 20678, 11, 597, 307, 411, 18875, 586, 291, 362, 544, 6179, 16367, 51360], "temperature": + 0.0, "avg_logprob": -0.2878838309758826, "compression_ratio": 1.634517766497462, + "no_speech_prob": 0.0012936722487211227}, {"id": 504, "seek": 461920, "start": 4639.12, + "end": 4642.48, "text": " So you hydrate the features of the elements in the graph + and like", "tokens": [51360, 407, 291, 5796, 4404, 264, 4122, 295, 264, 4959, 294, + 264, 4295, 293, 411, 51528], "temperature": 0.0, "avg_logprob": -0.2878838309758826, + "compression_ratio": 1.634517766497462, "no_speech_prob": 0.0012936722487211227}, + {"id": 505, "seek": 464248, "start": 4643.28, "end": 4647.44, "text": " Traverse + and like now you have a really complicated metric", "tokens": [50404, 5403, 4308, + 293, 411, 586, 291, 362, 257, 534, 6179, 20678, 50612], "temperature": 0.0, "avg_logprob": + -0.27549139022827146, "compression_ratio": 1.6583333333333334, "no_speech_prob": + 0.0034101265482604504}, {"id": 506, "seek": 464248, "start": 4648.08, "end": 4654.639999999999, + "text": " Which yeah, like you just very heavy, but you still you just have an interpoint + in the graph. 
So you explore it and you can", "tokens": [50644, 3013, 1338, 11, + 411, 291, 445, 588, 4676, 11, 457, 291, 920, 291, 445, 362, 364, 728, 6053, 294, + 264, 4295, 13, 407, 291, 6839, 309, 293, 291, 393, 50972], "temperature": 0.0, "avg_logprob": + -0.27549139022827146, "compression_ratio": 1.6583333333333334, "no_speech_prob": + 0.0034101265482604504}, {"id": 507, "seek": 464248, "start": 4655.44, "end": 4662.16, + "text": " Uh, well, you can fix some mistakes done by the previous layers. Yeah, + so it''s not exact filtering. So that''s yeah, that''s another like", "tokens": + [51012, 4019, 11, 731, 11, 291, 393, 3191, 512, 8038, 1096, 538, 264, 3894, 7914, + 13, 865, 11, 370, 309, 311, 406, 1900, 30822, 13, 407, 300, 311, 1338, 11, 300, + 311, 1071, 411, 51348], "temperature": 0.0, "avg_logprob": -0.27549139022827146, + "compression_ratio": 1.6583333333333334, "no_speech_prob": 0.0034101265482604504}, + {"id": 508, "seek": 464248, "start": 4663.36, "end": 4665.36, "text": " Maybe unique", + "tokens": [51408, 2704, 3845, 51508], "temperature": 0.0, "avg_logprob": -0.27549139022827146, + "compression_ratio": 1.6583333333333334, "no_speech_prob": 0.0034101265482604504}, + {"id": 509, "seek": 464248, "start": 4665.36, "end": 4668.4, "text": " The feature + of the graph methods. 
Yeah, sounds quite exciting like", "tokens": [51508, 440, + 4111, 295, 264, 4295, 7150, 13, 865, 11, 3263, 1596, 4670, 411, 51660], "temperature": + 0.0, "avg_logprob": -0.27549139022827146, "compression_ratio": 1.6583333333333334, + "no_speech_prob": 0.0034101265482604504}, {"id": 510, "seek": 466840, "start": 4668.879999999999, + "end": 4674.48, "text": " Have you have you thought about publishing this idea or + like I mean it sounds quite quite unique", "tokens": [50388, 3560, 291, 362, 291, + 1194, 466, 17832, 341, 1558, 420, 411, 286, 914, 309, 3263, 1596, 1596, 3845, 50668], + "temperature": 0.0, "avg_logprob": -0.2381882115414268, "compression_ratio": 1.6598360655737705, + "no_speech_prob": 0.001674604951404035}, {"id": 511, "seek": 466840, "start": 4675.839999999999, + "end": 4679.28, "text": " Well, it doesn''t make sense to publish an idea without + implementation", "tokens": [50736, 1042, 11, 309, 1177, 380, 652, 2020, 281, 11374, + 364, 1558, 1553, 11420, 50908], "temperature": 0.0, "avg_logprob": -0.2381882115414268, + "compression_ratio": 1.6598360655737705, "no_speech_prob": 0.001674604951404035}, + {"id": 512, "seek": 466840, "start": 4679.599999999999, "end": 4684.08, "text": + " Yeah, for sure, but maybe you can influence those who who would like to", "tokens": + [50924, 865, 11, 337, 988, 11, 457, 1310, 291, 393, 6503, 729, 567, 567, 576, 411, + 281, 51148], "temperature": 0.0, "avg_logprob": -0.2381882115414268, "compression_ratio": + 1.6598360655737705, "no_speech_prob": 0.001674604951404035}, {"id": 513, "seek": + 466840, "start": 4685.44, "end": 4687.28, "text": " Experiment on it", "tokens": + [51216, 37933, 322, 309, 51308], "temperature": 0.0, "avg_logprob": -0.2381882115414268, + "compression_ratio": 1.6598360655737705, "no_speech_prob": 0.001674604951404035}, + {"id": 514, "seek": 466840, "start": 4687.28, "end": 4692.96, "text": " At least + those who will watch this podcast. I think they will listen. 
They will they will + probably pick it up", "tokens": [51308, 1711, 1935, 729, 567, 486, 1159, 341, 7367, + 13, 286, 519, 436, 486, 2140, 13, 814, 486, 436, 486, 1391, 1888, 309, 493, 51592], + "temperature": 0.0, "avg_logprob": -0.2381882115414268, "compression_ratio": 1.6598360655737705, + "no_speech_prob": 0.001674604951404035}, {"id": 515, "seek": 466840, "start": 4694.0, + "end": 4696.0, "text": " Yeah, and use graph algorithms for sure", "tokens": [51644, + 865, 11, 293, 764, 4295, 14642, 337, 988, 51744], "temperature": 0.0, "avg_logprob": + -0.2381882115414268, "compression_ratio": 1.6598360655737705, "no_speech_prob": + 0.001674604951404035}, {"id": 516, "seek": 469600, "start": 4696.0, "end": 4698.0, + "text": " Like", "tokens": [50364, 1743, 50464], "temperature": 0.0, "avg_logprob": + -0.28791150381398756, "compression_ratio": 1.6220095693779903, "no_speech_prob": + 0.0030487084295600653}, {"id": 517, "seek": 469600, "start": 4698.0, "end": 4701.68, + "text": " Yeah, I mean it sounds like all all of the NN algorithms they have", "tokens": + [50464, 865, 11, 286, 914, 309, 3263, 411, 439, 439, 295, 264, 426, 45, 14642, 436, + 362, 50648], "temperature": 0.0, "avg_logprob": -0.28791150381398756, "compression_ratio": + 1.6220095693779903, "no_speech_prob": 0.0030487084295600653}, {"id": 518, "seek": + 469600, "start": 4702.72, "end": 4707.6, "text": " Like advantages and disadvantages, + right? 
So it''s not like the all of them are uniquely", "tokens": [50700, 1743, + 14906, 293, 37431, 11, 558, 30, 407, 309, 311, 406, 411, 264, 439, 295, 552, 366, + 31474, 50944], "temperature": 0.0, "avg_logprob": -0.28791150381398756, "compression_ratio": + 1.6220095693779903, "no_speech_prob": 0.0030487084295600653}, {"id": 519, "seek": + 469600, "start": 4709.12, "end": 4711.12, "text": " Outperforming, you know the + others", "tokens": [51020, 5925, 26765, 278, 11, 291, 458, 264, 2357, 51120], "temperature": + 0.0, "avg_logprob": -0.28791150381398756, "compression_ratio": 1.6220095693779903, + "no_speech_prob": 0.0030487084295600653}, {"id": 520, "seek": 469600, "start": 4712.24, + "end": 4720.0, "text": " Well, there is like a division like if you think about + like quantization algorithms. So they are kind of orthogonal to", "tokens": [51176, + 1042, 11, 456, 307, 411, 257, 10044, 411, 498, 291, 519, 466, 411, 4426, 2144, 14642, + 13, 407, 436, 366, 733, 295, 41488, 281, 51564], "temperature": 0.0, "avg_logprob": + -0.28791150381398756, "compression_ratio": 1.6220095693779903, "no_speech_prob": + 0.0030487084295600653}, {"id": 521, "seek": 469600, "start": 4721.04, "end": 4723.04, + "text": " Graph algorithms. 
So they", "tokens": [51616, 21884, 14642, 13, 407, 436, + 51716], "temperature": 0.0, "avg_logprob": -0.28791150381398756, "compression_ratio": + 1.6220095693779903, "no_speech_prob": 0.0030487084295600653}, {"id": 522, "seek": + 472304, "start": 4723.44, "end": 4729.68, "text": " They quantize so they can speed + up a compressed like I''m compressed to save the memory and speed up the computation", + "tokens": [50384, 814, 4426, 1125, 370, 436, 393, 3073, 493, 257, 30353, 411, 286, + 478, 30353, 281, 3155, 264, 4675, 293, 3073, 493, 264, 24903, 50696], "temperature": + 0.0, "avg_logprob": -0.2789344178869369, "compression_ratio": 1.6923076923076923, + "no_speech_prob": 0.0017961168196052313}, {"id": 523, "seek": 472304, "start": 4730.72, + "end": 4735.12, "text": " But like older algorithm they just use something like + IVF. So", "tokens": [50748, 583, 411, 4906, 9284, 436, 445, 764, 746, 411, 15967, + 37, 13, 407, 50968], "temperature": 0.0, "avg_logprob": -0.2789344178869369, "compression_ratio": + 1.6923076923076923, "no_speech_prob": 0.0017961168196052313}, {"id": 524, "seek": + 472304, "start": 4735.76, "end": 4741.2, "text": " And then like one layer filtering + and you can use graphs instead of IVF", "tokens": [51000, 400, 550, 411, 472, 4583, + 30822, 293, 291, 393, 764, 24877, 2602, 295, 15967, 37, 51272], "temperature": 0.0, + "avg_logprob": -0.2789344178869369, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.0017961168196052313}, {"id": 525, "seek": 472304, "start": 4741.68, "end": 4746.32, + "text": " Right, so we can use graphs and add the quantization and at the Faiz did + that", "tokens": [51296, 1779, 11, 370, 321, 393, 764, 24877, 293, 909, 264, 4426, + 2144, 293, 412, 264, 12710, 590, 630, 300, 51528], "temperature": 0.0, "avg_logprob": + -0.2789344178869369, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.0017961168196052313}, {"id": 526, "seek": 472304, "start": 4747.04, "end": 4748.24, + "text": " before", 
"tokens": [51564, 949, 51624], "temperature": 0.0, "avg_logprob": + -0.2789344178869369, "compression_ratio": 1.6923076923076923, "no_speech_prob": + 0.0017961168196052313}, {"id": 527, "seek": 472304, "start": 4748.24, "end": 4750.96, + "text": " Yeah, I think some others also did that", "tokens": [51624, 865, 11, 286, + 519, 512, 2357, 611, 630, 300, 51760], "temperature": 0.0, "avg_logprob": -0.2789344178869369, + "compression_ratio": 1.6923076923076923, "no_speech_prob": 0.0017961168196052313}, + {"id": 528, "seek": 475096, "start": 4751.6, "end": 4756.08, "text": " Yeah, and + the thing and then like vector databases actually offer it as one method like like", + "tokens": [50396, 865, 11, 293, 264, 551, 293, 550, 411, 8062, 22380, 767, 2626, + 309, 382, 472, 3170, 411, 411, 50620], "temperature": 0.0, "avg_logprob": -0.20382654868950278, + "compression_ratio": 1.6748251748251748, "no_speech_prob": 0.0015540739987045527}, + {"id": 529, "seek": 475096, "start": 4756.72, "end": 4763.76, "text": " Milvus for + example like they offer IVF and then you can choose like if you want to do exact + K&N or if you want to do A&N", "tokens": [50652, 7036, 85, 301, 337, 1365, 411, + 436, 2626, 15967, 37, 293, 550, 291, 393, 2826, 411, 498, 291, 528, 281, 360, 1900, + 591, 5, 45, 420, 498, 291, 528, 281, 360, 316, 5, 45, 51004], "temperature": 0.0, + "avg_logprob": -0.20382654868950278, "compression_ratio": 1.6748251748251748, "no_speech_prob": + 0.0015540739987045527}, {"id": 530, "seek": 475096, "start": 4764.32, "end": 4766.8, + "text": " So you can actually configure it in different ways", "tokens": [51032, + 407, 291, 393, 767, 22162, 309, 294, 819, 2098, 51156], "temperature": 0.0, "avg_logprob": + -0.20382654868950278, "compression_ratio": 1.6748251748251748, "no_speech_prob": + 0.0015540739987045527}, {"id": 531, "seek": 475096, "start": 4768.0, "end": 4770.4, + "text": " Yeah, I mean just sounds like you''re", "tokens": [51216, 865, 11, 286, + 914, 445, 3263, 411, 
291, 434, 51336], "temperature": 0.0, "avg_logprob": -0.20382654868950278, + "compression_ratio": 1.6748251748251748, "no_speech_prob": 0.0015540739987045527}, + {"id": 532, "seek": 475096, "start": 4771.2, "end": 4777.04, "text": " Without maybe + realizing much like you are at the core of what''s happening in vector search in + some sense", "tokens": [51376, 9129, 1310, 16734, 709, 411, 291, 366, 412, 264, + 4965, 295, 437, 311, 2737, 294, 8062, 3164, 294, 512, 2020, 51668], "temperature": + 0.0, "avg_logprob": -0.20382654868950278, "compression_ratio": 1.6748251748251748, + "no_speech_prob": 0.0015540739987045527}, {"id": 533, "seek": 475096, "start": 4777.52, + "end": 4780.4, "text": " Of course, there have been other multiple contributions, + right? But like", "tokens": [51692, 2720, 1164, 11, 456, 362, 668, 661, 3866, 15725, + 11, 558, 30, 583, 411, 51836], "temperature": 0.0, "avg_logprob": -0.20382654868950278, + "compression_ratio": 1.6748251748251748, "no_speech_prob": 0.0015540739987045527}, + {"id": 534, "seek": 478040, "start": 4780.799999999999, "end": 4785.28, "text": + " For some reason exactly your algorithm has been picked by many vector databases", + "tokens": [50384, 1171, 512, 1778, 2293, 428, 9284, 575, 668, 6183, 538, 867, 8062, + 22380, 50608], "temperature": 0.0, "avg_logprob": -0.18953146683542352, "compression_ratio": + 1.6857142857142857, "no_speech_prob": 0.000993597786873579}, {"id": 535, "seek": + 478040, "start": 4785.36, "end": 4788.0, "text": " There are like seven of them. 
+ So actually wrote a blog about", "tokens": [50612, 821, 366, 411, 3407, 295, 552, + 13, 407, 767, 4114, 257, 6968, 466, 50744], "temperature": 0.0, "avg_logprob": -0.18953146683542352, + "compression_ratio": 1.6857142857142857, "no_speech_prob": 0.000993597786873579}, + {"id": 536, "seek": 478040, "start": 4788.719999999999, "end": 4793.2, "text": " + Six of them and then seventh kind of knocked on my door and said can you also add + at us", "tokens": [50780, 11678, 295, 552, 293, 550, 17875, 733, 295, 16914, 322, + 452, 2853, 293, 848, 393, 291, 611, 909, 412, 505, 51004], "temperature": 0.0, "avg_logprob": + -0.18953146683542352, "compression_ratio": 1.6857142857142857, "no_speech_prob": + 0.000993597786873579}, {"id": 537, "seek": 478040, "start": 4794.5599999999995, + "end": 4803.12, "text": " And so when I when I was going through different databases + like in Java implemented in Java or in in Python or you know in Rust and go", "tokens": + [51072, 400, 370, 562, 286, 562, 286, 390, 516, 807, 819, 22380, 411, 294, 10745, + 12270, 294, 10745, 420, 294, 294, 15329, 420, 291, 458, 294, 34952, 293, 352, 51500], + "temperature": 0.0, "avg_logprob": -0.18953146683542352, "compression_ratio": 1.6857142857142857, + "no_speech_prob": 0.000993597786873579}, {"id": 538, "seek": 478040, "start": 4803.599999999999, + "end": 4806.4, "text": " All of them picked your algorithm for some reason", "tokens": + [51524, 1057, 295, 552, 6183, 428, 9284, 337, 512, 1778, 51664], "temperature": + 0.0, "avg_logprob": -0.18953146683542352, "compression_ratio": 1.6857142857142857, + "no_speech_prob": 0.000993597786873579}, {"id": 539, "seek": 480640, "start": 4807.36, + "end": 4819.44, "text": " So like maybe it was easier like it''s a combination of + how easy it is to implement how transparent it is like to understand right and then + basically it''s stability. 
So it''s like a combination of things", "tokens": [50412, + 407, 411, 1310, 309, 390, 3571, 411, 309, 311, 257, 6562, 295, 577, 1858, 309, 307, + 281, 4445, 577, 12737, 309, 307, 411, 281, 1223, 558, 293, 550, 1936, 309, 311, + 11826, 13, 407, 309, 311, 411, 257, 6562, 295, 721, 51016], "temperature": 0.0, + "avg_logprob": -0.1975628266851586, "compression_ratio": 1.7320574162679425, "no_speech_prob": + 0.0028562198858708143}, {"id": 540, "seek": 480640, "start": 4822.24, "end": 4825.679999999999, + "text": " Yeah, probably like I''m not totally sure", "tokens": [51156, 865, 11, + 1391, 411, 286, 478, 406, 3879, 988, 51328], "temperature": 0.0, "avg_logprob": + -0.1975628266851586, "compression_ratio": 1.7320574162679425, "no_speech_prob": + 0.0028562198858708143}, {"id": 541, "seek": 480640, "start": 4826.32, "end": 4834.08, + "text": " So yeah, the initial library also was implemented as a header on the well, + not the initial so that was a second library", "tokens": [51360, 407, 1338, 11, + 264, 5883, 6405, 611, 390, 12270, 382, 257, 23117, 322, 264, 731, 11, 406, 264, + 5883, 370, 300, 390, 257, 1150, 6405, 51748], "temperature": 0.0, "avg_logprob": + -0.1975628266851586, "compression_ratio": 1.7320574162679425, "no_speech_prob": + 0.0028562198858708143}, {"id": 542, "seek": 483408, "start": 4834.72, "end": 4840.24, + "text": " So there there was a problem with HNW lippen implementation and NMS lip", + "tokens": [50396, 407, 456, 456, 390, 257, 1154, 365, 389, 45, 54, 375, 21278, 11420, + 293, 426, 10288, 8280, 50672], "temperature": 0.0, "avg_logprob": -0.4676639513037671, + "compression_ratio": 1.6116504854368932, "no_speech_prob": 0.0030006265733391047}, + {"id": 543, "seek": 483408, "start": 4840.8, "end": 4845.04, "text": " So it so + like the NMS lip format was a bit restrictive", "tokens": [50700, 407, 309, 370, + 411, 264, 426, 10288, 8280, 7877, 390, 257, 857, 43220, 50912], "temperature": 0.0, + "avg_logprob": -0.4676639513037671, "compression_ratio": 
1.6116504854368932, "no_speech_prob": + 0.0030006265733391047}, {"id": 544, "seek": 483408, "start": 4845.68, "end": 4848.48, + "text": " Like for efficient operation. So it converted it to", "tokens": [50944, + 1743, 337, 7148, 6916, 13, 407, 309, 16424, 309, 281, 51084], "temperature": 0.0, + "avg_logprob": -0.4676639513037671, "compression_ratio": 1.6116504854368932, "no_speech_prob": + 0.0030006265733391047}, {"id": 545, "seek": 483408, "start": 4849.76, "end": 4853.28, + "text": " Flat memory format and so that that", "tokens": [51148, 36172, 4675, 7877, + 293, 370, 300, 300, 51324], "temperature": 0.0, "avg_logprob": -0.4676639513037671, + "compression_ratio": 1.6116504854368932, "no_speech_prob": 0.0030006265733391047}, + {"id": 546, "seek": 483408, "start": 4854.16, "end": 4860.8, "text": " That makes + made construction slower and memory can sub-share bigger. So was re-implemented + as I had a wrongly library", "tokens": [51368, 663, 1669, 1027, 6435, 14009, 293, + 4675, 393, 1422, 12, 2716, 543, 3801, 13, 407, 390, 319, 12, 332, 781, 14684, 382, + 286, 632, 257, 2085, 356, 6405, 51700], "temperature": 0.0, "avg_logprob": -0.4676639513037671, + "compression_ratio": 1.6116504854368932, "no_speech_prob": 0.0030006265733391047}, + {"id": 547, "seek": 486080, "start": 4861.4400000000005, "end": 4864.0, "text": + " So header on the library was inspired by an I", "tokens": [50396, 407, 23117, + 322, 264, 6405, 390, 7547, 538, 364, 286, 50524], "temperature": 0.0, "avg_logprob": + -0.2919126465207055, "compression_ratio": 1.6122448979591837, "no_speech_prob": + 0.0021126149222254753}, {"id": 548, "seek": 486080, "start": 4864.56, "end": 4867.12, + "text": " So like by the success also and I", "tokens": [50552, 407, 411, 538, 264, + 2245, 611, 293, 286, 50680], "temperature": 0.0, "avg_logprob": -0.2919126465207055, + "compression_ratio": 1.6122448979591837, "no_speech_prob": 0.0021126149222254753}, + {"id": 549, "seek": 486080, "start": 4868.24, "end": 4872.72, 
"text": " Think that + also might have contributed because it''s very easy to like integrate it", "tokens": + [50736, 6557, 300, 611, 1062, 362, 18434, 570, 309, 311, 588, 1858, 281, 411, 13365, + 309, 50960], "temperature": 0.0, "avg_logprob": -0.2919126465207055, "compression_ratio": + 1.6122448979591837, "no_speech_prob": 0.0021126149222254753}, {"id": 550, "seek": + 486080, "start": 4873.52, "end": 4876.8, "text": " So there are a few files it compiles + in some seconds", "tokens": [51000, 407, 456, 366, 257, 1326, 7098, 309, 715, 4680, + 294, 512, 3949, 51164], "temperature": 0.0, "avg_logprob": -0.2919126465207055, + "compression_ratio": 1.6122448979591837, "no_speech_prob": 0.0021126149222254753}, + {"id": 551, "seek": 486080, "start": 4878.16, "end": 4879.6, "text": " Yeah, no", + "tokens": [51232, 865, 11, 572, 51304], "temperature": 0.0, "avg_logprob": -0.2919126465207055, + "compression_ratio": 1.6122448979591837, "no_speech_prob": 0.0021126149222254753}, + {"id": 552, "seek": 486080, "start": 4879.6, "end": 4881.6, "text": " Maybe maybe + also that help", "tokens": [51304, 2704, 1310, 611, 300, 854, 51404], "temperature": + 0.0, "avg_logprob": -0.2919126465207055, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.0021126149222254753}, {"id": 553, "seek": 486080, "start": 4881.6, + "end": 4884.56, "text": " So the library itself is simple and easy to integrate", + "tokens": [51404, 407, 264, 6405, 2564, 307, 2199, 293, 1858, 281, 13365, 51552], + "temperature": 0.0, "avg_logprob": -0.2919126465207055, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.0021126149222254753}, {"id": 554, "seek": 486080, "start": 4885.28, + "end": 4886.96, "text": " Yeah, yeah", "tokens": [51588, 865, 11, 1338, 51672], + "temperature": 0.0, "avg_logprob": -0.2919126465207055, "compression_ratio": 1.6122448979591837, + "no_speech_prob": 0.0021126149222254753}, {"id": 555, "seek": 488696, "start": 4887.12, + "end": 4891.12, "text": " And I mean it must 
feel kind of cool to have this impact", + "tokens": [50372, 400, 286, 914, 309, 1633, 841, 733, 295, 1627, 281, 362, 341, + 2712, 50572], "temperature": 0.0, "avg_logprob": -0.20754295587539673, "compression_ratio": + 1.6333333333333333, "no_speech_prob": 0.00152643583714962}, {"id": 556, "seek": + 488696, "start": 4891.52, "end": 4895.12, "text": " But but I also I also hope like + you you will continue kind of", "tokens": [50592, 583, 457, 286, 611, 286, 611, + 1454, 411, 291, 291, 486, 2354, 733, 295, 50772], "temperature": 0.0, "avg_logprob": + -0.20754295587539673, "compression_ratio": 1.6333333333333333, "no_speech_prob": + 0.00152643583714962}, {"id": 557, "seek": 488696, "start": 4895.84, "end": 4901.2, + "text": " Maybe doing some publishable publishable work in some fashion and doesn''t + need to be a journal", "tokens": [50808, 2704, 884, 512, 11374, 712, 11374, 712, + 589, 294, 512, 6700, 293, 1177, 380, 643, 281, 312, 257, 6708, 51076], "temperature": + 0.0, "avg_logprob": -0.20754295587539673, "compression_ratio": 1.6333333333333333, + "no_speech_prob": 0.00152643583714962}, {"id": 558, "seek": 488696, "start": 4901.2, + "end": 4903.84, "text": " Which is rejected five times but something else", "tokens": + [51076, 3013, 307, 15749, 1732, 1413, 457, 746, 1646, 51208], "temperature": 0.0, + "avg_logprob": -0.20754295587539673, "compression_ratio": 1.6333333333333333, "no_speech_prob": + 0.00152643583714962}, {"id": 559, "seek": 488696, "start": 4904.56, "end": 4906.72, + "text": " Is this something that you are planning to do or", "tokens": [51244, 1119, + 341, 746, 300, 291, 366, 5038, 281, 360, 420, 51352], "temperature": 0.0, "avg_logprob": + -0.20754295587539673, "compression_ratio": 1.6333333333333333, "no_speech_prob": + 0.00152643583714962}, {"id": 560, "seek": 488696, "start": 4909.6, "end": 4915.2, + "text": " Uh, well that depends so like I cannot talk too much about my work in + Twitter. 
So", "tokens": [51496, 4019, 11, 731, 300, 5946, 370, 411, 286, 2644, 751, + 886, 709, 466, 452, 589, 294, 5794, 13, 407, 51776], "temperature": 0.0, "avg_logprob": + -0.20754295587539673, "compression_ratio": 1.6333333333333333, "no_speech_prob": + 0.00152643583714962}, {"id": 561, "seek": 491696, "start": 4917.52, "end": 4919.52, + "text": " So", "tokens": [50392, 407, 50492], "temperature": 0.0, "avg_logprob": + -0.37414201100667316, "compression_ratio": 1.5507246376811594, "no_speech_prob": + 0.0019481817726045847}, {"id": 562, "seek": 491696, "start": 4920.32, "end": 4929.68, + "text": " Maybe maybe we will publish something so that that depends on how it goes. + I mean, I''m near even nearest neighbors. Yeah, not not only but yeah", "tokens": + [50532, 2704, 1310, 321, 486, 11374, 746, 370, 300, 300, 5946, 322, 577, 309, 1709, + 13, 286, 914, 11, 286, 478, 2651, 754, 23831, 12512, 13, 865, 11, 406, 406, 787, + 457, 1338, 51000], "temperature": 0.0, "avg_logprob": -0.37414201100667316, "compression_ratio": + 1.5507246376811594, "no_speech_prob": 0.0019481817726045847}, {"id": 563, "seek": + 491696, "start": 4931.44, "end": 4935.44, "text": " But I it''s hard to predict + now if it works well", "tokens": [51088, 583, 286, 309, 311, 1152, 281, 6069, 586, + 498, 309, 1985, 731, 51288], "temperature": 0.0, "avg_logprob": -0.37414201100667316, + "compression_ratio": 1.5507246376811594, "no_speech_prob": 0.0019481817726045847}, + {"id": 564, "seek": 491696, "start": 4936.16, "end": 4943.92, "text": " So that + then publish yeah, at least the idea that you mentioned like I mean if you if it''s + outside Twitter for example in hnsw", "tokens": [51324, 407, 300, 550, 11374, 1338, + 11, 412, 1935, 264, 1558, 300, 291, 2835, 411, 286, 914, 498, 291, 498, 309, 311, + 2380, 5794, 337, 1365, 294, 276, 3695, 86, 51712], "temperature": 0.0, "avg_logprob": + -0.37414201100667316, "compression_ratio": 1.5507246376811594, "no_speech_prob": + 0.0019481817726045847}, {"id": 565, 
"seek": 494392, "start": 4943.92, "end": 4948.24, + "text": " Your library like the idea of this multistage ranking sounds quite exciting", + "tokens": [50364, 2260, 6405, 411, 264, 1558, 295, 341, 2120, 468, 609, 17833, 3263, + 1596, 4670, 50580], "temperature": 0.0, "avg_logprob": -0.34343172429682134, "compression_ratio": + 1.7242990654205608, "no_speech_prob": 0.020026100799441338}, {"id": 566, "seek": + 494392, "start": 4948.96, "end": 4956.4800000000005, "text": " Um, uh, well, I think + it can be implemented only by the teams who own the rankers and all the whole pipeline", + "tokens": [50616, 3301, 11, 2232, 11, 731, 11, 286, 519, 309, 393, 312, 12270, 787, + 538, 264, 5491, 567, 1065, 264, 6181, 433, 293, 439, 264, 1379, 15517, 50992], "temperature": + 0.0, "avg_logprob": -0.34343172429682134, "compression_ratio": 1.7242990654205608, + "no_speech_prob": 0.020026100799441338}, {"id": 567, "seek": 494392, "start": 4956.96, + "end": 4960.24, "text": " Yes, true. I think it can be implemented as like", "tokens": + [51016, 1079, 11, 2074, 13, 286, 519, 309, 393, 312, 12270, 382, 411, 51180], "temperature": + 0.0, "avg_logprob": -0.34343172429682134, "compression_ratio": 1.7242990654205608, + "no_speech_prob": 0.020026100799441338}, {"id": 568, "seek": 494392, "start": 4961.2, + "end": 4969.28, "text": " As I think you need to hide ha you need to hydrate the + features and like yeah on the fly and have feature hydration is very specific to", + "tokens": [51228, 1018, 286, 519, 291, 643, 281, 6479, 324, 291, 643, 281, 5796, + 4404, 264, 4122, 293, 411, 1338, 322, 264, 3603, 293, 362, 4111, 43631, 307, 588, + 2685, 281, 51632], "temperature": 0.0, "avg_logprob": -0.34343172429682134, "compression_ratio": + 1.7242990654205608, "no_speech_prob": 0.020026100799441338}, {"id": 569, "seek": + 496928, "start": 4969.84, "end": 4971.04, "text": " application", "tokens": [50392, + 3861, 50452], "temperature": 0.0, "avg_logprob": -0.22211335834703946, "compression_ratio": + 
1.5893719806763285, "no_speech_prob": 0.0044957296922802925}, {"id": 570, "seek": + 496928, "start": 4971.04, "end": 4973.759999999999, "text": " Yeah, but not only + inside the production environment", "tokens": [50452, 865, 11, 457, 406, 787, 1854, + 264, 4265, 2823, 50588], "temperature": 0.0, "avg_logprob": -0.22211335834703946, + "compression_ratio": 1.5893719806763285, "no_speech_prob": 0.0044957296922802925}, + {"id": 571, "seek": 496928, "start": 4974.32, "end": 4984.719999999999, "text": + " Yeah, that makes sense. Yeah, uh, so maybe it will call for creation of data sets + and kind of this benchmarks if the industry chooses to move in that direction", + "tokens": [50616, 865, 11, 300, 1669, 2020, 13, 865, 11, 2232, 11, 370, 1310, 309, + 486, 818, 337, 8016, 295, 1412, 6352, 293, 733, 295, 341, 43751, 498, 264, 3518, + 25963, 281, 1286, 294, 300, 3513, 51136], "temperature": 0.0, "avg_logprob": -0.22211335834703946, + "compression_ratio": 1.5893719806763285, "no_speech_prob": 0.0044957296922802925}, + {"id": 572, "seek": 496928, "start": 4986.16, "end": 4990.5599999999995, "text": + " Well, like there are some obvious problems with data privacy", "tokens": [51208, + 1042, 11, 411, 456, 366, 512, 6322, 2740, 365, 1412, 11427, 51428], "temperature": + 0.0, "avg_logprob": -0.22211335834703946, "compression_ratio": 1.5893719806763285, + "no_speech_prob": 0.0044957296922802925}, {"id": 573, "seek": 496928, "start": 4991.44, + "end": 4994.24, "text": " With that so it''s hard to publish something", "tokens": + [51472, 2022, 300, 370, 309, 311, 1152, 281, 11374, 746, 51612], "temperature": + 0.0, "avg_logprob": -0.22211335834703946, "compression_ratio": 1.5893719806763285, + "no_speech_prob": 0.0044957296922802925}, {"id": 574, "seek": 499424, "start": 4994.96, + "end": 5003.04, "text": " Well, you can think of a toy problem. 
So like you have + like you don''t do actual like work with users, but maybe", "tokens": [50400, 1042, + 11, 291, 393, 519, 295, 257, 12058, 1154, 13, 407, 411, 291, 362, 411, 291, 500, + 380, 360, 3539, 411, 589, 365, 5022, 11, 457, 1310, 50804], "temperature": 0.0, + "avg_logprob": -0.2812381244841076, "compression_ratio": 1.6237113402061856, "no_speech_prob": + 0.0055713895708322525}, {"id": 575, "seek": 499424, "start": 5004.24, "end": 5008.4, + "text": " We do image to image search and you have like a huge transformer model", + "tokens": [50864, 492, 360, 3256, 281, 3256, 3164, 293, 291, 362, 411, 257, 2603, + 31782, 2316, 51072], "temperature": 0.0, "avg_logprob": -0.2812381244841076, "compression_ratio": + 1.6237113402061856, "no_speech_prob": 0.0055713895708322525}, {"id": 576, "seek": + 499424, "start": 5009.36, "end": 5012.24, "text": " On top of that or maybe like + something like marco", "tokens": [51120, 1282, 1192, 295, 300, 420, 1310, 411, 746, + 411, 1849, 1291, 51264], "temperature": 0.0, "avg_logprob": -0.2812381244841076, + "compression_ratio": 1.6237113402061856, "no_speech_prob": 0.0055713895708322525}, + {"id": 577, "seek": 499424, "start": 5012.719999999999, "end": 5016.88, "text": + " Emma smart car maybe it can be done with that like experimented", "tokens": [51288, + 17124, 4069, 1032, 1310, 309, 393, 312, 1096, 365, 300, 411, 5120, 292, 51496], + "temperature": 0.0, "avg_logprob": -0.2812381244841076, "compression_ratio": 1.6237113402061856, + "no_speech_prob": 0.0055713895708322525}, {"id": 578, "seek": 499424, "start": 5017.76, + "end": 5019.76, "text": " Hmm, maybe so", "tokens": [51540, 8239, 11, 1310, 370, + 51640], "temperature": 0.0, "avg_logprob": -0.2812381244841076, "compression_ratio": + 1.6237113402061856, "no_speech_prob": 0.0055713895708322525}, {"id": 579, "seek": + 499424, "start": 5020.32, "end": 5021.599999999999, "text": " Yeah", "tokens": [51668, + 865, 51732], "temperature": 0.0, "avg_logprob": -0.2812381244841076, 
"compression_ratio": + 1.6237113402061856, "no_speech_prob": 0.0055713895708322525}, {"id": 580, "seek": + 502160, "start": 5021.76, "end": 5027.200000000001, "text": " Yeah, I think we weren''t + really deep today. You really I think it was really really cold cold talking to + you", "tokens": [50372, 865, 11, 286, 519, 321, 4999, 380, 534, 2452, 965, 13, 509, + 534, 286, 519, 309, 390, 534, 534, 3554, 3554, 1417, 281, 291, 50644], "temperature": + 0.0, "avg_logprob": -0.2723648628492034, "compression_ratio": 1.6556603773584906, + "no_speech_prob": 0.0013995451154187322}, {"id": 581, "seek": 502160, "start": 5027.4400000000005, + "end": 5029.6, "text": " I always like to still ask", "tokens": [50656, 286, 1009, + 411, 281, 920, 1029, 50764], "temperature": 0.0, "avg_logprob": -0.2723648628492034, + "compression_ratio": 1.6556603773584906, "no_speech_prob": 0.0013995451154187322}, + {"id": 582, "seek": 502160, "start": 5030.4800000000005, "end": 5035.6, "text": + " Kind of this question orthogonal question of why like it''s a little bit more + philosophical", "tokens": [50808, 9242, 295, 341, 1168, 41488, 1168, 295, 983, 411, + 309, 311, 257, 707, 857, 544, 25066, 51064], "temperature": 0.0, "avg_logprob": + -0.2723648628492034, "compression_ratio": 1.6556603773584906, "no_speech_prob": + 0.0013995451154187322}, {"id": 583, "seek": 502160, "start": 5035.68, "end": 5037.68, + "text": " But like if you''re not a verse", "tokens": [51068, 583, 411, 498, 291, + 434, 406, 257, 7996, 51168], "temperature": 0.0, "avg_logprob": -0.2723648628492034, + "compression_ratio": 1.6556603773584906, "no_speech_prob": 0.0013995451154187322}, + {"id": 584, "seek": 502160, "start": 5038.400000000001, "end": 5040.400000000001, + "text": " Of philosophy like why", "tokens": [51204, 2720, 10675, 411, 983, 51304], + "temperature": 0.0, "avg_logprob": -0.2723648628492034, "compression_ratio": 1.6556603773584906, + "no_speech_prob": 0.0013995451154187322}, {"id": 585, "seek": 502160, 
"start": 5041.120000000001, + "end": 5046.72, "text": " Would you say like this field attracted you like in your + own words?", "tokens": [51340, 6068, 291, 584, 411, 341, 2519, 15912, 291, 411, + 294, 428, 1065, 2283, 30, 51620], "temperature": 0.0, "avg_logprob": -0.2723648628492034, + "compression_ratio": 1.6556603773584906, "no_speech_prob": 0.0013995451154187322}, + {"id": 586, "seek": 502160, "start": 5048.56, "end": 5050.08, "text": " Uh", "tokens": + [51712, 4019, 51788], "temperature": 0.0, "avg_logprob": -0.2723648628492034, "compression_ratio": + 1.6556603773584906, "no_speech_prob": 0.0013995451154187322}, {"id": 587, "seek": + 505008, "start": 5050.8, "end": 5053.6, "text": " I didn''t have much choice. It + just was like I", "tokens": [50400, 286, 994, 380, 362, 709, 3922, 13, 467, 445, + 390, 411, 286, 50540], "temperature": 0.0, "avg_logprob": -0.2576492533964269, "compression_ratio": + 1.658878504672897, "no_speech_prob": 0.0005544802406802773}, {"id": 588, "seek": + 505008, "start": 5054.64, "end": 5057.84, "text": " I got my first job offer and + that was", "tokens": [50592, 286, 658, 452, 700, 1691, 2626, 293, 300, 390, 50752], + "temperature": 0.0, "avg_logprob": -0.2576492533964269, "compression_ratio": 1.658878504672897, + "no_speech_prob": 0.0005544802406802773}, {"id": 589, "seek": 505008, "start": 5059.04, + "end": 5061.04, "text": " In this field", "tokens": [50812, 682, 341, 2519, 50912], + "temperature": 0.0, "avg_logprob": -0.2576492533964269, "compression_ratio": 1.658878504672897, + "no_speech_prob": 0.0005544802406802773}, {"id": 590, "seek": 505008, "start": 5061.76, + "end": 5070.24, "text": " That''s that''s about scale so like people like scaling + and like many games when you play on like android or other stuff", "tokens": [50948, + 663, 311, 300, 311, 466, 4373, 370, 411, 561, 411, 21589, 293, 411, 867, 2813, 562, + 291, 862, 322, 411, 36157, 420, 661, 1507, 51372], "temperature": 0.0, "avg_logprob": + -0.2576492533964269, 
"compression_ratio": 1.658878504672897, "no_speech_prob": 0.0005544802406802773}, + {"id": 591, "seek": 505008, "start": 5070.24, "end": 5077.84, "text": " They''re + based on scaling so you do like a little action and there are like huge consequences + of those actions like destroying something or", "tokens": [51372, 814, 434, 2361, + 322, 21589, 370, 291, 360, 411, 257, 707, 3069, 293, 456, 366, 411, 2603, 10098, + 295, 729, 5909, 411, 19926, 746, 420, 51752], "temperature": 0.0, "avg_logprob": + -0.2576492533964269, "compression_ratio": 1.658878504672897, "no_speech_prob": 0.0005544802406802773}, + {"id": 592, "seek": 507784, "start": 5078.56, "end": 5080.56, "text": " Like and + that is scaling and", "tokens": [50400, 1743, 293, 300, 307, 21589, 293, 50500], + "temperature": 0.0, "avg_logprob": -0.21875047246250537, "compression_ratio": 1.6719367588932805, + "no_speech_prob": 0.0016594500048086047}, {"id": 593, "seek": 507784, "start": 5081.2, + "end": 5085.76, "text": " uh, this is just like a pure scale of how how how we scale + machine learning", "tokens": [50532, 2232, 11, 341, 307, 445, 411, 257, 6075, 4373, + 295, 577, 577, 577, 321, 4373, 3479, 2539, 50760], "temperature": 0.0, "avg_logprob": + -0.21875047246250537, "compression_ratio": 1.6719367588932805, "no_speech_prob": + 0.0016594500048086047}, {"id": 594, "seek": 507784, "start": 5086.8, "end": 5088.24, + "text": " Applications", "tokens": [50812, 26519, 763, 50884], "temperature": 0.0, + "avg_logprob": -0.21875047246250537, "compression_ratio": 1.6719367588932805, "no_speech_prob": + 0.0016594500048086047}, {"id": 595, "seek": 507784, "start": 5088.32, "end": 5093.52, + "text": " Yeah, so on on one hand it kind of was predefined as you said you found + the job on the other hand", "tokens": [50888, 865, 11, 370, 322, 322, 472, 1011, + 309, 733, 295, 390, 659, 37716, 382, 291, 848, 291, 1352, 264, 1691, 322, 264, 661, + 1011, 51148], "temperature": 0.0, "avg_logprob": -0.21875047246250537, 
"compression_ratio": + 1.6719367588932805, "no_speech_prob": 0.0016594500048086047}, {"id": 596, "seek": + 507784, "start": 5094.0, "end": 5099.68, "text": " You still were curious to implement + that algorithm. So like it wasn''t like somebody said okay", "tokens": [51172, 509, + 920, 645, 6369, 281, 4445, 300, 9284, 13, 407, 411, 309, 2067, 380, 411, 2618, 848, + 1392, 51456], "temperature": 0.0, "avg_logprob": -0.21875047246250537, "compression_ratio": + 1.6719367588932805, "no_speech_prob": 0.0016594500048086047}, {"id": 597, "seek": + 507784, "start": 5099.68, "end": 5105.12, "text": " You have to do it right you + could also choose a job of like okay. I''m just coding nine to five and then I go + home", "tokens": [51456, 509, 362, 281, 360, 309, 558, 291, 727, 611, 2826, 257, + 1691, 295, 411, 1392, 13, 286, 478, 445, 17720, 4949, 281, 1732, 293, 550, 286, + 352, 1280, 51728], "temperature": 0.0, "avg_logprob": -0.21875047246250537, "compression_ratio": + 1.6719367588932805, "no_speech_prob": 0.0016594500048086047}, {"id": 598, "seek": + 510512, "start": 5106.08, "end": 5108.88, "text": " But like you still decided to + implement an algorithm", "tokens": [50412, 583, 411, 291, 920, 3047, 281, 4445, + 364, 9284, 50552], "temperature": 0.0, "avg_logprob": -0.23932915642147973, "compression_ratio": + 1.6132075471698113, "no_speech_prob": 0.003961828537285328}, {"id": 599, "seek": + 510512, "start": 5111.12, "end": 5114.24, "text": " Well, yes, well that that was + a fun job. So", "tokens": [50664, 1042, 11, 2086, 11, 731, 300, 300, 390, 257, 1019, + 1691, 13, 407, 50820], "temperature": 0.0, "avg_logprob": -0.23932915642147973, + "compression_ratio": 1.6132075471698113, "no_speech_prob": 0.003961828537285328}, + {"id": 600, "seek": 510512, "start": 5115.5199999999995, "end": 5122.24, "text": + " Yeah, so like you were not scared by the challenge itself, right? 
Maybe was it + like motivating actually", "tokens": [50884, 865, 11, 370, 411, 291, 645, 406, 5338, + 538, 264, 3430, 2564, 11, 558, 30, 2704, 390, 309, 411, 41066, 767, 51220], "temperature": + 0.0, "avg_logprob": -0.23932915642147973, "compression_ratio": 1.6132075471698113, + "no_speech_prob": 0.003961828537285328}, {"id": 601, "seek": 510512, "start": 5123.68, + "end": 5128.0, "text": " There was no like that much push like from the like", "tokens": + [51292, 821, 390, 572, 411, 300, 709, 2944, 411, 490, 264, 411, 51508], "temperature": + 0.0, "avg_logprob": -0.23932915642147973, "compression_ratio": 1.6132075471698113, + "no_speech_prob": 0.003961828537285328}, {"id": 602, "seek": 510512, "start": 5129.12, + "end": 5134.4, "text": " From from the company itself. So we could we could do whatever + we want inside the company", "tokens": [51564, 3358, 490, 264, 2237, 2564, 13, 407, + 321, 727, 321, 727, 360, 2035, 321, 528, 1854, 264, 2237, 51828], "temperature": + 0.0, "avg_logprob": -0.23932915642147973, "compression_ratio": 1.6132075471698113, + "no_speech_prob": 0.003961828537285328}, {"id": 603, "seek": 513440, "start": 5134.4, + "end": 5136.4, "text": " So it was very like relaxed", "tokens": [50364, 407, 309, + 390, 588, 411, 14628, 50464], "temperature": 0.0, "avg_logprob": -0.1659189987182617, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.0006032013334333897}, + {"id": 604, "seek": 513440, "start": 5136.799999999999, "end": 5141.2, "text": " + Yeah, that might be actually a really good background to invent things", "tokens": + [50484, 865, 11, 300, 1062, 312, 767, 257, 534, 665, 3678, 281, 7962, 721, 50704], + "temperature": 0.0, "avg_logprob": -0.1659189987182617, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.0006032013334333897}, {"id": 605, "seek": 513440, "start": 5141.679999999999, + "end": 5146.4, "text": " Don''t you think like if if you if you come to work and + somebody says no, you cannot do what you want", 
"tokens": [50728, 1468, 380, 291, + 519, 411, 498, 498, 291, 498, 291, 808, 281, 589, 293, 2618, 1619, 572, 11, 291, + 2644, 360, 437, 291, 528, 50964], "temperature": 0.0, "avg_logprob": -0.1659189987182617, + "compression_ratio": 1.7142857142857142, "no_speech_prob": 0.0006032013334333897}, + {"id": 606, "seek": 513440, "start": 5146.4, "end": 5150.08, "text": " You should + do this and it might be kind of too restrictive", "tokens": [50964, 509, 820, 360, + 341, 293, 309, 1062, 312, 733, 295, 886, 43220, 51148], "temperature": 0.0, "avg_logprob": + -0.1659189987182617, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.0006032013334333897}, {"id": 607, "seek": 513440, "start": 5150.5599999999995, + "end": 5155.759999999999, "text": " But here like they''ve been both challenges + and also that freedom to solve those challenges", "tokens": [51172, 583, 510, 411, + 436, 600, 668, 1293, 4759, 293, 611, 300, 5645, 281, 5039, 729, 4759, 51432], "temperature": + 0.0, "avg_logprob": -0.1659189987182617, "compression_ratio": 1.7142857142857142, + "no_speech_prob": 0.0006032013334333897}, {"id": 608, "seek": 513440, "start": 5157.44, + "end": 5161.5199999999995, "text": " Yeah, there are like two components first of + all you need to have like", "tokens": [51516, 865, 11, 456, 366, 411, 732, 6677, + 700, 295, 439, 291, 643, 281, 362, 411, 51720], "temperature": 0.0, "avg_logprob": + -0.1659189987182617, "compression_ratio": 1.7142857142857142, "no_speech_prob": + 0.0006032013334333897}, {"id": 609, "seek": 516152, "start": 5162.160000000001, + "end": 5169.92, "text": " Freedom and do long long term stuff. 
So like without worrying + or like what are you going to ship into production soon?", "tokens": [50396, 22208, + 293, 360, 938, 938, 1433, 1507, 13, 407, 411, 1553, 18788, 420, 411, 437, 366, 291, + 516, 281, 5374, 666, 4265, 2321, 30, 50784], "temperature": 0.0, "avg_logprob": + -0.24563800927364465, "compression_ratio": 1.7818930041152263, "no_speech_prob": + 0.0029202087316662073}, {"id": 610, "seek": 516152, "start": 5169.92, "end": 5171.68, + "text": " The second is concentration of talent", "tokens": [50784, 440, 1150, 307, + 9856, 295, 8301, 50872], "temperature": 0.0, "avg_logprob": -0.24563800927364465, + "compression_ratio": 1.7818930041152263, "no_speech_prob": 0.0029202087316662073}, + {"id": 611, "seek": 516152, "start": 5172.4800000000005, "end": 5176.0, "text": + " So you have like high concentration of talent so people can", "tokens": [50912, + 407, 291, 362, 411, 1090, 9856, 295, 8301, 370, 561, 393, 51088], "temperature": + 0.0, "avg_logprob": -0.24563800927364465, "compression_ratio": 1.7818930041152263, + "no_speech_prob": 0.0029202087316662073}, {"id": 612, "seek": 516152, "start": 5177.040000000001, + "end": 5178.88, "text": " Share ideas", "tokens": [51140, 14945, 3487, 51232], "temperature": + 0.0, "avg_logprob": -0.24563800927364465, "compression_ratio": 1.7818930041152263, + "no_speech_prob": 0.0029202087316662073}, {"id": 613, "seek": 516152, "start": 5178.88, + "end": 5183.200000000001, "text": " Yeah, if you have this mix so like there will + there will be innovations for sure", "tokens": [51232, 865, 11, 498, 291, 362, 341, + 2890, 370, 411, 456, 486, 456, 486, 312, 24283, 337, 988, 51448], "temperature": + 0.0, "avg_logprob": -0.24563800927364465, "compression_ratio": 1.7818930041152263, + "no_speech_prob": 0.0029202087316662073}, {"id": 614, "seek": 516152, "start": 5183.68, + "end": 5190.64, "text": " Yeah, it sounds like you had a combination of all three + components that you mentioned, right? 
So like talents and also yeah", "tokens": + [51472, 865, 11, 309, 3263, 411, 291, 632, 257, 6562, 295, 439, 1045, 6677, 300, + 291, 2835, 11, 558, 30, 407, 411, 19933, 293, 611, 1338, 51820], "temperature": + 0.0, "avg_logprob": -0.24563800927364465, "compression_ratio": 1.7818930041152263, + "no_speech_prob": 0.0029202087316662073}, {"id": 615, "seek": 519064, "start": 5191.52, + "end": 5194.320000000001, "text": " Yeah, yeah, I also saw that another company + is like", "tokens": [50408, 865, 11, 1338, 11, 286, 611, 1866, 300, 1071, 2237, + 307, 411, 50548], "temperature": 0.0, "avg_logprob": -0.21239442091721755, "compression_ratio": + 1.7885462555066078, "no_speech_prob": 0.00251026707701385}, {"id": 616, "seek": + 519064, "start": 5194.96, "end": 5201.4400000000005, "text": " Yeah, like in Samsung + there was already a strong team and there were like lots of innovation. So there + are a few startups", "tokens": [50580, 865, 11, 411, 294, 13173, 456, 390, 1217, + 257, 2068, 1469, 293, 456, 645, 411, 3195, 295, 8504, 13, 407, 456, 366, 257, 1326, + 28041, 50904], "temperature": 0.0, "avg_logprob": -0.21239442091721755, "compression_ratio": + 1.7885462555066078, "no_speech_prob": 0.00251026707701385}, {"id": 617, "seek": + 519064, "start": 5202.320000000001, "end": 5206.64, "text": " Uh, which came from + our lab and there was like a really good paper", "tokens": [50948, 4019, 11, 597, + 1361, 490, 527, 2715, 293, 456, 390, 411, 257, 534, 665, 3035, 51164], "temperature": + 0.0, "avg_logprob": -0.21239442091721755, "compression_ratio": 1.7885462555066078, + "no_speech_prob": 0.00251026707701385}, {"id": 618, "seek": 519064, "start": 5208.0, + "end": 5212.72, "text": " Yeah, so that that that''s a that''s a recipe for innovation + for sure", "tokens": [51232, 865, 11, 370, 300, 300, 300, 311, 257, 300, 311, 257, + 6782, 337, 8504, 337, 988, 51468], "temperature": 0.0, "avg_logprob": -0.21239442091721755, + "compression_ratio": 1.7885462555066078, "no_speech_prob": 
0.00251026707701385}, + {"id": 619, "seek": 519064, "start": 5213.52, "end": 5219.280000000001, "text": + " Yeah, yeah, I''m really happy that it turned out so well to you for you and uh + your author as well", "tokens": [51508, 865, 11, 1338, 11, 286, 478, 534, 2055, + 300, 309, 3574, 484, 370, 731, 281, 291, 337, 291, 293, 2232, 428, 3793, 382, 731, + 51796], "temperature": 0.0, "avg_logprob": -0.21239442091721755, "compression_ratio": + 1.7885462555066078, "no_speech_prob": 0.00251026707701385}, {"id": 620, "seek": + 521928, "start": 5219.84, "end": 5223.44, "text": " I think he continues to work + also in the industry at least last time I checked", "tokens": [50392, 286, 519, + 415, 6515, 281, 589, 611, 294, 264, 3518, 412, 1935, 1036, 565, 286, 10033, 50572], + "temperature": 0.0, "avg_logprob": -0.2183579314838756, "compression_ratio": 1.5961538461538463, + "no_speech_prob": 0.0023854163009673357}, {"id": 621, "seek": 521928, "start": 5223.92, + "end": 5226.88, "text": " um, and so I I really hope that", "tokens": [50596, 1105, + 11, 293, 370, 286, 286, 534, 1454, 300, 50744], "temperature": 0.0, "avg_logprob": + -0.2183579314838756, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 0.0023854163009673357}, {"id": 622, "seek": 521928, "start": 5227.92, "end": 5233.92, + "text": " You will get some really cool pull requests on hnsw that will pass your + criteria", "tokens": [50796, 509, 486, 483, 512, 534, 1627, 2235, 12475, 322, 276, + 3695, 86, 300, 486, 1320, 428, 11101, 51096], "temperature": 0.0, "avg_logprob": + -0.2183579314838756, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 0.0023854163009673357}, {"id": 623, "seek": 521928, "start": 5236.08, "end": 5239.44, + "text": " Well, yeah, most of them pass is just like I", "tokens": [51204, 1042, + 11, 1338, 11, 881, 295, 552, 1320, 307, 445, 411, 286, 51372], "temperature": 0.0, + "avg_logprob": -0.2183579314838756, "compression_ratio": 1.5961538461538463, "no_speech_prob": + 
0.0023854163009673357}, {"id": 624, "seek": 521928, "start": 5240.639999999999, + "end": 5243.84, "text": " Would love to have more time and I''ll try to allocate + more time", "tokens": [51432, 6068, 959, 281, 362, 544, 565, 293, 286, 603, 853, + 281, 35713, 544, 565, 51592], "temperature": 0.0, "avg_logprob": -0.2183579314838756, + "compression_ratio": 1.5961538461538463, "no_speech_prob": 0.0023854163009673357}, + {"id": 625, "seek": 521928, "start": 5244.719999999999, "end": 5246.639999999999, + "text": " Yeah, looking and checking them", "tokens": [51636, 865, 11, 1237, 293, + 8568, 552, 51732], "temperature": 0.0, "avg_logprob": -0.2183579314838756, "compression_ratio": + 1.5961538461538463, "no_speech_prob": 0.0023854163009673357}, {"id": 626, "seek": + 524664, "start": 5246.64, "end": 5253.360000000001, "text": " Yeah, it''s really + really great. I really enjoyed talking to you Yuri. Um, thanks so much for allocating + your time also in this", "tokens": [50364, 865, 11, 309, 311, 534, 534, 869, 13, + 286, 534, 4626, 1417, 281, 291, 33901, 13, 3301, 11, 3231, 370, 709, 337, 12660, + 990, 428, 565, 611, 294, 341, 50700], "temperature": 0.0, "avg_logprob": -0.2931802942511741, + "compression_ratio": 1.6390243902439023, "no_speech_prob": 0.0014533621724694967}, + {"id": 627, "seek": 524664, "start": 5253.360000000001, "end": 5260.8, "text": " + Precreasement time um, but yeah, I mean all the best to you in in the future also + twitter and uh", "tokens": [50700, 6001, 66, 265, 296, 1712, 565, 1105, 11, 457, + 1338, 11, 286, 914, 439, 264, 1151, 281, 291, 294, 294, 264, 2027, 611, 21439, 293, + 2232, 51072], "temperature": 0.0, "avg_logprob": -0.2931802942511741, "compression_ratio": + 1.6390243902439023, "no_speech_prob": 0.0014533621724694967}, {"id": 628, "seek": + 524664, "start": 5261.84, "end": 5268.88, "text": " Hope to see some published work + at some point, but I don''t know it just uh, I enjoyed reading your paper and and", + "tokens": [51124, 6483, 
281, 536, 512, 6572, 589, 412, 512, 935, 11, 457, 286, 500, + 380, 458, 309, 445, 2232, 11, 286, 4626, 3760, 428, 3035, 293, 293, 51476], "temperature": + 0.0, "avg_logprob": -0.2931802942511741, "compression_ratio": 1.6390243902439023, + "no_speech_prob": 0.0014533621724694967}, {"id": 629, "seek": 526888, "start": 5269.68, + "end": 5273.36, "text": " Kind of also then read read your code and it''s kind of + like", "tokens": [50404, 9242, 295, 611, 550, 1401, 1401, 428, 3089, 293, 309, 311, + 733, 295, 411, 50588], "temperature": 0.0, "avg_logprob": -0.1744119703155203, "compression_ratio": + 1.6782608695652175, "no_speech_prob": 0.002410390181466937}, {"id": 630, "seek": + 526888, "start": 5274.08, "end": 5277.52, "text": " It feels like you''ve put a + lot of effort there and and", "tokens": [50624, 467, 3417, 411, 291, 600, 829, 257, + 688, 295, 4630, 456, 293, 293, 50796], "temperature": 0.0, "avg_logprob": -0.1744119703155203, + "compression_ratio": 1.6782608695652175, "no_speech_prob": 0.002410390181466937}, + {"id": 631, "seek": 526888, "start": 5278.56, "end": 5281.76, "text": " It uh, it + also influences the industry so much today", "tokens": [50848, 467, 2232, 11, 309, + 611, 21222, 264, 3518, 370, 709, 965, 51008], "temperature": 0.0, "avg_logprob": + -0.1744119703155203, "compression_ratio": 1.6782608695652175, "no_speech_prob": + 0.002410390181466937}, {"id": 632, "seek": 526888, "start": 5282.08, "end": 5286.16, + "text": " So maybe you are not kind of realizing this every single day, but like", + "tokens": [51024, 407, 1310, 291, 366, 406, 733, 295, 16734, 341, 633, 2167, 786, + 11, 457, 411, 51228], "temperature": 0.0, "avg_logprob": -0.1744119703155203, "compression_ratio": + 1.6782608695652175, "no_speech_prob": 0.002410390181466937}, {"id": 633, "seek": + 526888, "start": 5286.88, "end": 5294.8, "text": " Yeah, you should know this that + there are so many databases that use your algorithm as as one of the baseline''s + in production", 
"tokens": [51264, 865, 11, 291, 820, 458, 341, 300, 456, 366, 370, + 867, 22380, 300, 764, 428, 9284, 382, 382, 472, 295, 264, 20518, 311, 294, 4265, + 51660], "temperature": 0.0, "avg_logprob": -0.1744119703155203, "compression_ratio": + 1.6782608695652175, "no_speech_prob": 0.002410390181466937}, {"id": 634, "seek": + 526888, "start": 5295.92, "end": 5297.92, "text": " It''s really cool work", "tokens": + [51716, 467, 311, 534, 1627, 589, 51816], "temperature": 0.0, "avg_logprob": -0.1744119703155203, + "compression_ratio": 1.6782608695652175, "no_speech_prob": 0.002410390181466937}, + {"id": 635, "seek": 529792, "start": 5298.08, "end": 5300.8, "text": " Yeah, yeah, + that that that that that was great that", "tokens": [50372, 865, 11, 1338, 11, 300, + 300, 300, 300, 300, 390, 869, 300, 50508], "temperature": 0.0, "avg_logprob": -0.3320886117440683, + "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 636, "seek": 529792, "start": 5301.68, "end": 5303.68, "text": " There was + success", "tokens": [50552, 821, 390, 2245, 50652], "temperature": 0.0, "avg_logprob": + -0.3320886117440683, "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 637, "seek": 529792, "start": 5303.68, "end": 5306.4800000000005, "text": + " Yeah, maybe one thing like I would note uh", "tokens": [50652, 865, 11, 1310, + 472, 551, 411, 286, 576, 3637, 2232, 50792], "temperature": 0.0, "avg_logprob": + -0.3320886117440683, "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 638, "seek": 529792, "start": 5307.28, "end": 5312.96, "text": " That the + idea was the rain cares. 
So that was partially implemented and there needs work", + "tokens": [50832, 663, 264, 1558, 390, 264, 4830, 12310, 13, 407, 300, 390, 18886, + 12270, 293, 456, 2203, 589, 51116], "temperature": 0.0, "avg_logprob": -0.3320886117440683, + "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 639, "seek": 529792, "start": 5313.76, "end": 5316.8, "text": " So he had + a work on ER maybe you know like", "tokens": [51156, 407, 415, 632, 257, 589, 322, + 14929, 1310, 291, 458, 411, 51308], "temperature": 0.0, "avg_logprob": -0.3320886117440683, + "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 640, "seek": 529792, "start": 5318.64, "end": 5320.64, "text": " by using + the", "tokens": [51400, 538, 1228, 264, 51500], "temperature": 0.0, "avg_logprob": + -0.3320886117440683, "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 641, "seek": 529792, "start": 5320.8, "end": 5323.04, "text": " And then + as the for the final rain care", "tokens": [51508, 400, 550, 382, 264, 337, 264, + 2572, 4830, 1127, 51620], "temperature": 0.0, "avg_logprob": -0.3320886117440683, + "compression_ratio": 1.706896551724138, "no_speech_prob": 0.0015692192828282714}, + {"id": 642, "seek": 532304, "start": 5323.92, "end": 5324.96, "text": " So", "tokens": + [50408, 407, 50460], "temperature": 0.0, "avg_logprob": -0.2514350825342639, "compression_ratio": + 1.6770428015564203, "no_speech_prob": 0.005218378733843565}, {"id": 643, "seek": + 532304, "start": 5324.96, "end": 5330.8, "text": " Yeah, it''s just like so I felt + that I need to cite this sure I need work for sure", "tokens": [50460, 865, 11, + 309, 311, 445, 411, 370, 286, 2762, 300, 286, 643, 281, 37771, 341, 988, 286, 643, + 589, 337, 988, 50752], "temperature": 0.0, "avg_logprob": -0.2514350825342639, "compression_ratio": + 1.6770428015564203, "no_speech_prob": 0.005218378733843565}, {"id": 644, "seek": + 532304, "start": 
5331.6, "end": 5335.04, "text": " I learned this idea like maybe + not was changing, but from from him", "tokens": [50792, 286, 3264, 341, 1558, 411, + 1310, 406, 390, 4473, 11, 457, 490, 490, 796, 50964], "temperature": 0.0, "avg_logprob": + -0.2514350825342639, "compression_ratio": 1.6770428015564203, "no_speech_prob": + 0.005218378733843565}, {"id": 645, "seek": 532304, "start": 5335.68, "end": 5342.32, + "text": " Yeah, yeah, it sounds great. I mean, I''ve also interacted a bit with + him uh and and it sounds like he''s very knowledgeable guy", "tokens": [50996, 865, + 11, 1338, 11, 309, 3263, 869, 13, 286, 914, 11, 286, 600, 611, 49621, 257, 857, + 365, 796, 2232, 293, 293, 309, 3263, 411, 415, 311, 588, 33800, 2146, 51328], "temperature": + 0.0, "avg_logprob": -0.2514350825342639, "compression_ratio": 1.6770428015564203, + "no_speech_prob": 0.005218378733843565}, {"id": 646, "seek": 532304, "start": 5342.88, + "end": 5344.88, "text": " And he has very strong opinions as well", "tokens": [51356, + 400, 415, 575, 588, 2068, 11819, 382, 731, 51456], "temperature": 0.0, "avg_logprob": + -0.2514350825342639, "compression_ratio": 1.6770428015564203, "no_speech_prob": + 0.005218378733843565}, {"id": 647, "seek": 532304, "start": 5344.96, "end": 5348.24, + "text": " So maybe we will also talk with him on one of the episodes", "tokens": + [51460, 407, 1310, 321, 486, 611, 751, 365, 796, 322, 472, 295, 264, 9313, 51624], + "temperature": 0.0, "avg_logprob": -0.2514350825342639, "compression_ratio": 1.6770428015564203, + "no_speech_prob": 0.005218378733843565}, {"id": 648, "seek": 532304, "start": 5349.04, + "end": 5352.8, "text": " Uh, but um, yeah, I''m glad that you guys collaborated", + "tokens": [51664, 4019, 11, 457, 1105, 11, 1338, 11, 286, 478, 5404, 300, 291, 1074, + 42463, 51852], "temperature": 0.0, "avg_logprob": -0.2514350825342639, "compression_ratio": + 1.6770428015564203, "no_speech_prob": 0.005218378733843565}, {"id": 649, "seek": + 535304, "start": 
5353.68, "end": 5357.36, "text": " Yeah, it''s it''s a fantastic + result for for the industry as well", "tokens": [50396, 865, 11, 309, 311, 309, + 311, 257, 5456, 1874, 337, 337, 264, 3518, 382, 731, 50580], "temperature": 0.0, + "avg_logprob": -0.22022696464292466, "compression_ratio": 1.8402061855670102, "no_speech_prob": + 0.0010746006155386567}, {"id": 650, "seek": 535304, "start": 5358.48, "end": 5363.44, + "text": " And and probably for your profiles well not probably but definitely for + your profiles", "tokens": [50636, 400, 293, 1391, 337, 428, 23693, 731, 406, 1391, + 457, 2138, 337, 428, 23693, 50884], "temperature": 0.0, "avg_logprob": -0.22022696464292466, + "compression_ratio": 1.8402061855670102, "no_speech_prob": 0.0010746006155386567}, + {"id": 651, "seek": 535304, "start": 5364.64, "end": 5371.92, "text": " So yeah, + um, thank you so much for your time and um, yeah, I hope you will have a relaxing + time over the Christmas and", "tokens": [50944, 407, 1338, 11, 1105, 11, 1309, 291, + 370, 709, 337, 428, 565, 293, 1105, 11, 1338, 11, 286, 1454, 291, 486, 362, 257, + 20103, 565, 670, 264, 5272, 293, 51308], "temperature": 0.0, "avg_logprob": -0.22022696464292466, + "compression_ratio": 1.8402061855670102, "no_speech_prob": 0.0010746006155386567}, + {"id": 652, "seek": 535304, "start": 5372.88, "end": 5374.88, "text": " Happy new + year as well", "tokens": [51356, 8277, 777, 1064, 382, 731, 51456], "temperature": + 0.0, "avg_logprob": -0.22022696464292466, "compression_ratio": 1.8402061855670102, + "no_speech_prob": 0.0010746006155386567}, {"id": 653, "seek": 535304, "start": 5375.28, + "end": 5377.28, "text": " So thank you very much for your time Yuri", "tokens": + [51476, 407, 1309, 291, 588, 709, 337, 428, 565, 33901, 51576], "temperature": 0.0, + "avg_logprob": -0.22022696464292466, "compression_ratio": 1.8402061855670102, "no_speech_prob": + 0.0010746006155386567}, {"id": 654, "seek": 535304, "start": 5378.4, "end": 5380.4, + "text": " Thank 
you", "tokens": [51632, 1044, 291, 51732], "temperature": 0.0, "avg_logprob":
+ -0.22022696464292466, "compression_ratio": 1.8402061855670102, "no_speech_prob":
+ 0.0010746006155386567}, {"id": 655, "seek": 535304, "start": 5380.4, "end": 5382.4,
+ "text": " Yeah, bye bye", "tokens": [51732, 865, 11, 6543, 6543, 51832], "temperature":
+ 0.0, "avg_logprob": -0.22022696464292466, "compression_ratio": 1.8402061855670102,
+ "no_speech_prob": 0.0010746006155386567}, {"id": 656, "seek": 538304, "start": 5383.04,
+ "end": 5402.64, "text": " Um", "tokens": [50396, 3301, 51344], "temperature": 1.0,
+ "avg_logprob": -3.687346935272217, "compression_ratio": 0.2, "no_speech_prob": 0.313255250453949}]'
+---
+
+Hello, Vector Podcast is here, and today we're going to be talking to the author of the HNSW library and algorithm. It's one of the best algorithms out there, one of the most used in vector search. And today I'm talking to Yuri Malkov. How are you doing? Hi. Hi.
+So yeah, my name is Yuri Malkov. Currently I'm working at Twitter as a staff ML engineer in content understanding, research, and recommender systems. Please note that during this discussion I don't represent Twitter's point of view; the views are my own.
+Yeah, that's great. So you already began introducing yourself. I was wondering if you could tell me a bit about yourself and your background, and then maybe we can move into discussing the algorithm itself. Okay, sure. Yeah, so my trajectory of moving into ML is quite typical for Russia.
+So yeah, I got a good physics education in Nizhny Novgorod, and there I did a PhD in laser physics, doing experiments on terawatt lasers. That was fun, and that part of physics is considered to be the sexy part, similar to computer vision in machine learning.
+And I was lucky to have good supervisors. One of my supervisors, who was mostly the supervisor of my papers and helped me, is now the head of the Russian Academy of Sciences. So yeah, I had good supervisors.
+In addition to physics, I was concurrently working part-time in a startup that was building distributed scalable search systems based on insights from real networks. That work ended up in several papers on the predecessor of HNSW.
+And the startup, yeah, unfortunately the startup was closed before I even got my PhD. So I decided to focus on physics after that. But after I got my PhD degree in physics, there was a choice for me: what to do next, whether to proceed with a career in physics.
+For that I had to go abroad, and I didn't want to go abroad; I wanted to stay in Nizhny Novgorod. So I decided to just switch directions into network science. And then I got a really good grant from a Russian fund, Alpha Phi, which is not present anymore.
+So I could do research on my own, with a pretty good salary. And I also joined companies, like computer vision companies, to get insight into why people actually use similarity search algorithms in machine learning.
+And I worked at [unclear] and later NtechLab, which is the company that is doing, like, big brother for Moscow, like Moscow surveillance.
+And later I joined the Samsung AI Center in Moscow, where I worked with Victor Lempitsky, who is one of the well-known people in ML in Russia. In 2019 I moved to the US, and now I work at Twitter on recommender systems and content understanding, like [unclear] models. Oh yeah.
+So you probably also use nearest neighbor search in your work, or... Well, I can mention it. Yeah, well, not really. So in my work at Twitter most of the time, like the last half a year, I've been focused on improving search relevance. So that is mostly the ranker.
Of course, you had physics background by education, but you also kind of worked in computer vision startups. +So what was your impression of this nearest neighbor search problem and like, how did you think about it when like, did you read papers to understand like what was done in that area? I think that areas pretty like developed right in in in the papers like like NSW itself, right? Like navigators. +Well, so like in the startup meta labs, I have been working, I think I've worked for six or seven years. So it was quite quite a significant period of time. And then we started just like from distributed search. So the idea was like we do it from scratch. +So like we don't care what I've been done before. So we have an idea. So there are like distributed hash tables like port or other stuff and we want to do it, but with similarity search. So that should scale to better base. And there that's like very different approach from nearest neighbor search. +And like most of the time we spent like developing this algorithm was not even like nearest neighbor search. That was closer to this symbolic filtering, but with like arbitrary filters. +And only at some point of time, like we had a realization that oh, like that is similar to what people actually need. Like there are a lot of papers of on nearest neighbor search. So we switch direction and like the most cited publications are on nearest neighbors. Yeah. Yeah. +I don't remember was it on your paper or somebody else's paper. I saw a paper of my old friend, you reliefs because he actually defended his thesis like in PG thesis in this space. So when he was doing it, I think it was 2009. +I was like, I was considering this like a pure mathematical problem without like maybe direct application. But then he gave a talk at Google, like you know, Google tech talks. I don't know if they still exist or not, but like he presented this problem and like they did some optimizations as well. 
And then, I think your paper cites it, or maybe someone else's, I don't remember. I was really surprised to see his work also in the same line of things that now leads to vector search, essentially. Well, yeah, I think I saw his work, but it seemed like more theory.
+If you look at the history of, like, graph approaches, now it's mostly rehashing of old stuff. There are definitely new things, but there is so much work done before. Like, Sergey Brin worked on nearest neighbor search with GNAT. That is also good work.
+There was previous work on graph search, I think in 1993, which isn't that much different compared to current approaches, though they also had problems with scalability at that point. So yeah, there is a large amount of previous work in that area.
+But you said you didn't concern yourself with reading too many papers before you started inventing this new algorithm. Is that right? Yeah, sure, sure, we read papers, but they were not really relevant. We read papers on network science.
+So there was a problem with building these navigable small worlds. Like, not every small-world network is navigable; most models are not. So we wanted to build a navigable small world, and we also didn't understand
+what the criteria were, like, how we could make it. And we reinvented, like, Delaunay graphs inside the company. And after you reinvent something, you start to search and see there are lots of papers where people did the same. Right. Yeah. So we went the other way.
+Yeah. So now that you mentioned this, can you please introduce these concepts, at least at a high level, to our audience? What is a small world, and what does it mean for it to be navigable? A little bit more at the user-facing level, if possible.
Well, like, navigable small world: you have a large network. And navigable small world means you can find paths between arbitrary elements in this network in a logarithmic number of steps. So the number of hops is logarithmic, and you can use only local information
+and do something like greedy search; only greedy search is allowed. And if you can find the path in a logarithmic number of steps, then your network is navigable. And that small world part, like, why is it called small world?
+That's history, like, historical reasons. So there was the famous Milgram experiment, where they sent letters from one person, like from a random person, to some target person.
+That was kind of a greedy search for connections, very similar to this, and that's called the small world experiment, so, a small world. And real networks, like human connection networks, have low diameter.
+And they are navigable, at least according to the Milgram experiment and subsequent experiments. Is it related, in common terms, to the six handshakes that you need to connect every random person with another random person on the planet?
+Yes, yes, that's that experiment. I'm pretty sure it was done in the 60s. And so the navigable part, basically, if we put this in the context of search:
+so let's say I have local information. I'm here, and I would like to travel from here. Say I'm in Helsinki and I would like to travel to New York. How do I travel? Right, I need to go to the airport.
+From the airport I will travel maybe to some city in Europe, from there I will change, you know, the airplane, and then fly over to New York.
+I'm making it a little bit more complicated, there is a direct flight to New York from Helsinki, but okay, maybe that example wasn't right. Is that analogous to the navigable part?
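The Helsinki-to-New-York analogy maps directly onto greedy routing over a graph. A minimal sketch of that idea follows; this is an illustrative toy, not Malkov's implementation, and the four-node "airport" graph, its coordinates, and the plain Euclidean distance over (lat, lon) pairs are all invented for the example.

```python
# Toy sketch of greedy routing with only local information (not an actual
# NSW/HNSW implementation). Graph, coordinates, and airport codes are invented.
import math

def greedy_search(graph, coords, start, target):
    """Hop to whichever neighbor of the current node is closest to the
    target; stop when no neighbor improves (a local minimum)."""
    current = start
    path = [current]
    while True:
        best = min(graph[current], key=lambda n: math.dist(coords[n], coords[target]))
        if math.dist(coords[best], coords[target]) >= math.dist(coords[current], coords[target]):
            return path  # no neighbor is closer: either arrived or stuck
        current = best
        path.append(current)

# Small local airports reach each other through hub airports.
coords = {"LOCAL": (61.0, 25.5), "HEL": (60.3, 24.9), "AMS": (52.3, 4.8), "JFK": (40.6, -73.8)}
graph = {"LOCAL": ["HEL"], "HEL": ["LOCAL", "AMS"], "AMS": ["HEL", "JFK"], "JFK": ["AMS"]}
print(greedy_search(graph, coords, "LOCAL", "JFK"))  # ['LOCAL', 'HEL', 'AMS', 'JFK']
```

If the network is a navigable small world, this kind of walk reaches the target in a logarithmic number of hops; in a general graph it can get stuck in a local minimum instead.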
Yes, yes, so generally you can pinpoint that. But if you start and finish in, like, small local airports, which usually don't have direct connections, then they are connected through hubs.
+Yeah, and that is one of the models of navigable small worlds. There is also Kleinberg's model, which doesn't have hubs, so you can also build navigable small worlds without hubs.
+But those have polylogarithmic complexity. So if you want to have polylogarithmic complexity... So maybe I'll ask you to provide some references later, especially for those who want to dig deeper into the mathematics, since you mentioned these different algorithms. Many of them are new to me, at least, so I'm sure they are new to part of our audience
+as well. And I also wanted to ask you, in the context of your invention, what was the input? So you said you had a lot of data from computer vision, but was there something else, like dimensionality or some other constraint, that was tough for previous algorithms, like LSH or any other?
+Well, there LSH didn't even work. So we worked with, like, tree structures; we had to, like, how would you even do LSH?
+Yeah, and as for LSH, I thought those are not practical algorithms. Even when I spoke with people who were writing a lot of papers on LSH, they expressed doubts about whether those algorithms are practical. They are not learnable, so they cannot take advantage of the data that you have.
+And what they told me is that they see quantization as just a better, practical version of LSH.
+Yeah, right. And so actually I'm really interested in how you set out to invent the algorithm. I can just tell you briefly: in the recent billion-scale vector search challenge,
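For readers unfamiliar with LSH, here is a toy random-hyperplane variant that illustrates the "not learnable" point made above: the hyperplanes are drawn at random, independent of the data, so the scheme cannot adapt to it. The dimension, number of planes, seed, and example vectors are all arbitrary choices for this sketch.

```python
# Toy random-hyperplane LSH. The planes are sampled blindly (data-independent),
# which is the "not learnable" property discussed above. All sizes are illustrative.
import random

def lsh_signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(int(sum(v * p for v, p in zip(vec, plane)) >= 0) for plane in planes)

def matches(x, y, planes):
    # Count signature bits on which the two vectors agree.
    return sum(i == j for i, j in zip(lsh_signature(x, planes), lsh_signature(y, planes)))

random.seed(0)
dim, n_planes = 8, 16
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

a = [1.0] * dim
b = [1.0] * (dim - 1) + [0.9]   # nearly identical to a
c = [-v for v in a]             # exactly opposite to a
print(matches(a, b, planes), matches(a, c, planes))  # near vectors agree on most bits
```

Quantization methods, by contrast, fit their codebooks to the data distribution, which is why the LSH authors quoted above saw them as the practical version of the same idea.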
+we had a small team, and one of our team members implemented a small change in the product quantization layer, basically how you shuffle the dimensions in the vector, and he achieved something like a 12% recall increase over the baseline, Facebook's algorithm.
+I didn't have that much knowledge; I had read your paper and other papers, and I was just thinking: if I were to start from first principles, how would I solve it, knowing nothing about this problem? How can I solve search in multi-dimensional space?
+I actually implemented a very, very simple algorithm using your algorithm as one of the components; maybe we can talk about it later. But how did you start inventing HNSW?
+Well, HNSW had a predecessor, NSW, which is also called MSW or SW-graph in different places, depending on where you look. And it had problems.
+It had several problems, but if you don't think about the distributed setup, the main one was polylogarithmic scalability with the number of elements, and that killed the performance on low-dimensional data.
+There were comparison works, one by Leonid Boytsov, where he evaluated different algorithms, and NSW really didn't perform that well on some datasets; the loss was by orders of magnitude, it could be 1000 times slower than the best solution.
+So the work on HNSW was targeted at improving the previous version so it wouldn't have this problem and ideally would perform the best on all setups. And yeah, that has been solved.
+Right, but you still needed to add that magical H in front of it, you made it hierarchical. What pushed you in the direction of making it hierarchical? Did you think it might work, or did it prove to work as a result of experimentation?
+Well, yeah, that has many ingredients in it. For one thing, when I worked with the startup Mirror Labs we had a different problem with a distributed index: NSW had the unpleasant property that the hubs created in the network are the first elements.
+For a distributed system you want to add new nodes to increase the capacity of the system, but all your hubs are in the first, older nodes, since they were created before the new nodes even existed,
+so the traffic is routed through the same old nodes, which makes it not scalable. We spent quite a lot of time figuring out how to solve it, and at some point I noticed that our NSW approach is pretty similar to a skip list in terms of what is produced as the final network.
+The idea is: if you create a skip list for 1-D data and create an NSW for 1-D, and for the skip list you merge all the links regardless of layer, you get a similar network in terms of degree distribution, distance distribution, all the major properties.
+But the skip list doesn't have this hub problem: you can add new nodes, they can get higher levels, and your traffic will be routed through nodes uniformly across your distributed system. So we knew from the startup that there is this equivalence, but that was only for the problem of distributed search.
+It would still use the same polylogarithmic greedy search algorithm, which doesn't think about how many links you have on a node, so it was shelved for those reasons at the startup. But then, after I did my PhD, I wanted to publish a good paper on network science.
+And there, there is a result that we can create navigable networks with a method that was not known before. I tried to publish it in Nature, and it was rejected; Nature Physics also rejected it, rejected by the editors; then in Scientific Reports it was rejected after review; and then it was finally published in PLOS ONE.
+I really like this paper; that was the most surprising result I think I ever got, but it's not really cited.
+As a byproduct of this I did a comparison to other navigable small-world methods. I had this old vision that you can look at real-world networks, replicate them in a computer system, and they will work.
+So I replicated the well-known scale-free navigable small worlds, which were a very popular thing up to that moment.
+And the performance was very bad, extremely bad. The reason is that a scale-free network means you have a power-law distribution of degrees, and usually there is a coefficient gamma, and the best cases were when gamma is close to two. But gamma close to two means the degree scales almost linearly with the size of the network, so when a greedy search goes through a hub it evaluates a huge portion of the network. You get linear scalability instead of, say, log log N,
+where the number of hops is log log N, but at some point you evaluate almost every point in your network and you have really bad performance. After that I realized what the problem with NSW was, and I thought: we already have a solution for that, because the skip list doesn't have this problem. So after that I implemented the prototype, and it worked.
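The skip-list-style fix described above is what became HNSW's layer assignment: each new element draws its maximum layer from an exponentially decaying distribution, so a newly inserted element can become a hub at any time. A minimal sketch of that sampling (our code; the HNSW paper suggests mL = 1/ln(M)):

```python
import math
import random

# Skip-list-style level assignment as used in HNSW:
# level = floor(-ln(U) * mL), with U uniform in (0, 1].
# Roughly a fraction 1/M of elements reaches layer >= 1, 1/M^2 layer >= 2, ...

def random_level(m_l, rng):
    # 1 - rng.random() lies in (0, 1], so the log is always defined.
    return int(-math.log(1.0 - rng.random()) * m_l)

rng = random.Random(42)
M = 16                              # max connections per node per layer
m_l = 1.0 / math.log(M)             # the paper's suggested normalization
levels = [random_level(m_l, rng) for _ in range(100_000)]
print(levels.count(0) / len(levels))  # prints roughly 0.94, i.e. 1 - 1/M
```

Because levels are drawn independently of insertion order, late arrivals are just as likely to land in the upper layers as early ones, which removes the "old hubs" bottleneck of NSW.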
+Then I worked on the C++ version and the evaluation. By the way, when you started implementing your prototype, was it initially in C++? No, it was in Java. Was it Java because Java is your favorite language, or why Java?
+Because the distributed system was implemented in Java, so that was close. It was easier to integrate; you were thinking it's easier to integrate in Java, right?
+Well, I just knew how to code it in Java; I had coded it several times for NSW, and all that Java code was released.
+So I just coded it, and then I had to transfer it to C++ to make it efficient. And there is Leonid Boytsov, who is a maintainer of nmslib; I had been in contact with him for quite a while, and so it was implemented in that library.
+Did you collaborate with him to implement it in nmslib in the most efficient way?
+Well, first of all, the ideology of the library is very close to what we had been developing: it's not only focused on typical distances like L2, cosine or inner product.
+So it made sense to compare on all those distances, and Leonid also had a benchmark paper covering all of those, so we could just implement the new algorithm and run the benchmark.
+Right, and that was a really good point; nmslib was also integrated in ann-benchmarks, so if you add an algorithm there you can
+go through the whole set of benchmarks. Yeah, so it was kind of easy for you to evaluate where your algorithm stands against other algorithms, right? Right. And you also had a co-author on your paper; maybe you could introduce him as well.
+Oh yeah, that is Dmitry Yashunin. He was my peer in the physics lab; he got his PhD the same year I did.
Yeah, so we decided to team up; he helped a lot, he did all the evaluation, he integrated it with other code
+and ran the evaluation on the clusters that we had. Nice. So, back to the invention: as you were inventing this algorithm, did you have to make a lot of adjustments to its core as you evaluated it, or was it right on the first shot?
+Well, not really. There are two changes in HNSW compared to NSW. The first one is the idea of layers; that solves most of the problems with low-dimensional data.
+It also improves performance on most of the tasks and most of the distributions, maybe not by much on high-dimensional data. Still, when I ran the whole suite, there were a few datasets on which it performed worse compared to
+the VP-tree from Leonid's library. At first I thought that wasn't a big deal, and I communicated the results to Leonid, trying to convince him that we don't need that many algorithms.
+But he was not convinced. So then we added an improvement: the heuristic for selecting the neighbors, which I personally knew from the work on the spatial approximation tree.
+That made the transition to the skip list exact, so you can build an exact skip list in 1-D using this heuristic. That addition improved the performance.
+Correct me if I'm wrong, but I read your paper really closely, I printed it and was reading it with a pencil, making notes. I remember that at some point the algorithm also needs to prove that it will converge,
+or, because you keep reshuffling the points in some way as you build it, and you use multiple threads to build the actual paths between the nodes, between layers, do you still need to somehow make sure that it will converge in all dimensionalities, all spaces, or was that not necessary?
+Well, the algorithm is pretty stable with respect to how many threads you use; that is an empirical result, and I was surprised when I saw it. Even for NSW, the first algorithm, if you start, say, 40 threads from a single element, I found no drop in the recall.
+No drop in the recall or in speed; that was a bit surprising. In terms of stability, the main way to make it stable is just to avoid under-exploring: use proper parameters, big enough. There are ways to make it stable even
+under corruption, but that is pretty costly: if you bootstrap the graph, doing iterations similar to NN-Descent (I think you probably know it), you can make it stable even if it's corrupted by a lot.
+That is done only for updates, because when you update you're kind of corrupting the graph, as in hnswlib.
+For updates it wasn't specifically made to be very stable, but plain construction doesn't have to be that stable; it doesn't have to converge in all situations, just keep the parameters high enough and it won't diverge.
+Right, yeah. I'm also curious to hear your opinion: after your paper I started reading other papers, for example Microsoft's Zoom algorithm, and later what they called DiskANN, with some modifications. They were comparing to HNSW at larger scale, something like billions of nodes, billions of points in the space.
+They were trying to minimize the cost it incurs, because as you build HNSW you also use quite a bit of memory, right? I wanted to hear your opinion on that part. I don't know if you're familiar with these papers, but what they did is offload a portion of the retrieval to an SSD.
+So they combined your algorithm with additional layers, and they resolve to full precision when they go to the SSD, but they don't do it in memory. So they do use quantization, right? Yeah, they use quantization, exactly.
+That's a very popular approach, and it makes sense: basically you have a hardware hierarchy, you have not-so-big RAM and lots of SSD, and maybe, if you have a distributed system, access to other nodes.
+So yeah, that's a clever use of the hardware hierarchy; it makes sense. At the same time, your algorithm was taken into popular frameworks like Faiss, right? Faiss is not a single algorithm; one of its indexes is HNSW.
+I actually don't know how they did it: did they take your C++ dependency directly, or did they re-implement it, do you know?
+They re-implemented it from scratch. I talked to them once; they said they tried different ways, but in the end it was pretty close to the initial C++ library, though some things are implemented differently in Faiss.
+For instance, there is a pool in hnswlib for keeping track of visited elements: for each search there is a map, think of a bitmap, which knows which nodes in the network are visited.
+In hnswlib it's kept in memory all the time, and when you have a new search it just picks one from the pool, while Faiss creates it once per search, so there batched searches are more effective.
+Yeah, batch search is another feature that is sometimes implemented in vector databases. But did you expect that your algorithm would become so widely applicable? Do you know that it has been re-implemented in several languages? For example, as part of the vector database called Weaviate it was implemented in Go.
+And there is a database called Qdrant, where it's implemented in Rust. Of course, all of these implementations also add CRUD support, so you can actually update, because in a real database you need these features.
+They also added filtering on top of that, inside the algorithm itself. Did you expect such popularity? No, no. I thought that we would publish the algorithm, win the benchmarks, and win clearly.
+Though at that time, just before we published, there was a competitor, FALCONN, which also published benchmarks where it was better. But FALCONN targeted only a few specific metrics.
+It was actually done by a person from the same school I went to, Ilya Razenshteyn, and I talked with him a bit. I thought that since we had open-sourced the code and published the paper, people would quickly iterate on top of it and improve it.
+But it took others much more time to improve upon it than I expected. Maybe that was due to lack of interest, maybe some inertia, I don't know. Looking at how many startups and solutions are popping up right now, it seems like most of the interest came much later,
+much later relative to when it was released, so it was hard to predict back then.
+Do you think nmslib had something to do with this success? Maybe nmslib was somewhat visible, and when you added your algorithm there and showed that it performs, the people who followed the library at least knew: okay, there is a new algorithm.
+I think that helped; nmslib is a good library and it has some audience. But I think the most attention came from ann-benchmarks, because Annoy had a lot of attention by that point, and that benchmark was done by the same person
+who did Annoy. I think that drew some traffic to the libraries. Also, the idea of the algorithm was understandable, and that affects usage: if you understand something, you are more likely to use it.
+Yeah, it's Erik Bernhardsson, right? The Swedish guy, as he says, the Swede who is stuck in New York City. I think he implemented Annoy originally. There is also a presentation by him where he explains not only the Annoy algorithm but also
+how intuition doesn't work in multi-dimensional spaces anymore. We think as in the 3-D world we live in: the further a point is from you, the more you can still see it, somehow perceive it. But in multi-dimensional space it's not like that. I don't know, what's your view on that, by the way?
+Does geometric intuition change in high-dimensional space?
+Well, yes, there are many interpretations of this. People who work with nearest neighbor search know about it: if you have many dimensions, even small perturbations can take you far.
+To find the nearest neighbor you need a huge covering sphere when you divide the space. That makes the problem complicated, and it's one of the reasons why all the practical methods are approximate.
+Right, yeah, you do need some approximation in order to find the points. When you mentioned ann-benchmarks, was it you who submitted the algorithm, or was it Erik who picked it up and made it available in ann-benchmarks?
+No, no, I did a pull request to add it. All right, so you pushed it forward yourself; it wasn't that you just implemented it and then waited for it to be discovered, so to say.
+Yeah, definitely. One of the reasons to use nmslib was that nmslib was already integrated in ann-benchmarks, so adding it was just adding some code in ann-benchmarks that pulls the algorithm,
+and then tuning the parameters; that was simple to do. And as you did that, what were the results? Of course, ann-benchmarks tracks a number of metrics, for example even indexing speed,
+not only the recall-versus-QPS trade-off. Were there some specific metrics where HNSW excelled over other algorithms?
+Well, at that point in time there was no logging of the construction time and memory consumption, and the initial version in nmslib had a clear focus on performance, the recall-to-speed ratio.
+Yeah, and you know it's hard to do proper benchmarking: there are a number of scenarios. Somewhere you have a limit on memory, somewhere a limit on the construction time; sometimes you don't care about them at all, you just care about the speed.
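The recall metric discussed here is simple to state: compare an approximate result list against the exact brute-force neighbors and report the overlap fraction. A self-contained sketch (our own illustration in NumPy; `recall_at_k` and the random data are ours, not ann-benchmarks code):

```python
import numpy as np

# Illustrative recall@k computation, in the spirit of ann-benchmarks:
# recall = |approximate top-k  ∩  exact top-k| / k.

def recall_at_k(approx_ids, true_ids):
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32)).astype(np.float32)
query = rng.standard_normal(32).astype(np.float32)

# Exact top-10 by L2 distance: the ground truth a benchmark precomputes.
order = np.argsort(np.linalg.norm(data - query, axis=1))
true10 = order[:10]

# Pretend an ANN index returned 9 of the 10 true neighbors plus one miss.
approx = list(true10[:9]) + [int(order[15])]
print(recall_at_k(approx, true10))  # → 0.9
```

In a benchmark, this number is plotted against queries per second at many parameter settings, producing the recall-versus-QPS curves mentioned in the conversation.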
+You can also care about multi-threaded performance, or about single-threaded performance; there are different scenarios, so it's pretty hard to do proper benchmarking. I made a decision to just focus on the recall and not think about construction and memory.
+Okay, I see. And when you did that, was HNSW at the top of the competition at that point?
+Yes, it was at the top, and on many benchmarks there was a huge gap compared to the next competitor. Maybe for GloVe, I think, with FALCONN there was still only a significant difference, but on many others the gap was large.
+Also, at some point after that there was a release of the KGraph algorithm, which decreased the difference, but HNSW was still on top.
+Did it make you feel proud at that moment, when you saw the big gap and knew this is your invention? How did you feel about it?
+Well, that felt nice for sure. And when the paper was finally accepted it also felt really good. I think it took about two and a half years to publish the paper.
+As they say in the US, every rejection brings you closer to the goal. It sounds like you were rejected by multiple journals before it was finally published?
+No, that was a single journal; it's just that one revision took one year. That is PAMI, Transactions on Pattern Analysis and Machine Intelligence.
+We follow the practice in physics and ignore the conferences, and also for the grants we need journal publications, not conference publications. So we sent it to PAMI and had a few revisions there, but each revision took a year.
+Wow, this is super long. Why do you think it was like that? Why would reviewers scrutinize your submission so much?
Well, I don't know. I actually talked with the editor, because I was very angry after the first result, and it seems the problem is just how
+publications in computer science are organized. It's not only that journal; there are many journals which have this problem. When I looked at Twitter, there were discussions like: oh, I got a review invitation from this or that journal,
+they say I have to write the review in 10 days; oh, I'm never going to do that, no way I'm writing a review in 10 days. In physics it sometimes took a few weeks to get a review, while in a computer science journal you send it and then wait for months, wondering what takes so long.
+In computer science the journals are very slow, and conferences are also slow; it takes several months to get reviews. And people saw that we were using arXiv; if there were no arXiv, I think they would have already created new journals.
+Yeah, exactly, there shouldn't be any monopolies in that sense: go and create your own journal. But then the problem is, when you're a PhD student, you have a chicken-and-egg problem: you haven't proven yourself yet, and you need a publication to defend your thesis. That's the trap.
+Well, it's also known how this can be done: they created new conferences, ICLR I think was created not that long ago, and they could have created a journal as well. The same people said: we don't want to do conferences, conferences have a very tight deadline, which means if you miss it you wait for another year, and that is not
+great. Let's create a journal instead, and now you have a continuous spectrum of time for when you want to send your paper; no deadlines. Though there are also no deadlines for reviewers.
+Yeah, during the review period of a conference you can get ten papers at the same time, so you have to review them in a batch, but if you are working with journals you get a review from time to time; your load is distributed.
+By the way, what is your take on this: NeurIPS decided this year to make all reviews public, so essentially anybody can see the comments from reviewers, there is a discussion between reviewers and authors, and everything is public. Do you think it improves the process somehow?
+Well, I think that makes sense. It sets the bar for reviewers higher, because if you know that your review will be read by random people, you want to make it better and spend more time reading the paper. It also helps people understand the review process from the outside: if you're a new reviewer and want to understand how to do a proper review, you can just read reviews by other people.
+That is helpful. And if you want to publish a paper, you can find similar papers, read the reviews for them, and understand why they were rejected or accepted. I don't see much of a problem in it; it fights against corruption, and some places in science are corrupted.
+Yeah, it brings transparency to the process, and as you mentioned, someone can learn how to do these things right. Maybe it also prevents situations where a paper is rejected outright because the reviewer has some bias against the topic. At least transparency is good, I think.
+Are you publishing today, by the way? Do you have any publishable work, do you intend to publish?
+Not much; I'm working mostly on production. Maybe next year I'll work on something publishable. The last thing I published was something on pose estimation.
+I've noticed you are very active on the hnswlib GitHub: when I posted my question (maybe we can discuss it as well, if you're curious), you responded really fast. So it means you still allocate a chunk of your time to look at issues and pull requests on GitHub.
+Yeah, I wish I did it better; I miss some things there. But I try to update this library. When I designed HNSW there were some design decisions, and even if I see algorithms outside that improve upon it, I think they are not
+aligned with the design, so I skip them. One of those decisions is that HNSW tries to avoid a global view of the network, because it's a descendant of distributed algorithms, and strategically it's not good to require a global view, although sometimes it helps.
+There are papers where you ensure that the path from the entry point of the network to every node is short. You can do that, but it breaks if you're doing insertions, for instance: you cannot have a global view and a dynamic nature at the same time.
+That's why I ignore some of that stuff. There is also a focus on custom distances: even though hnswlib supports only three distances, it's pretty easy to implement whatever distances you want, and I believe there will be a shift in what distances are being used
+after some time, because there are problems with those simple distances. You mean cosine, dot product, this type of distances? Yeah, though it's more a problem that you want to embed everything, you want to embed an entity into a single vector representation, and that has limitations, as you probably know.
+Transformers are based on attention; before that there was the LSTM with attention for translation, and without attention it didn't work well, because it compressed everything into a single vector. So I believe that in some time there will be at least set distances, where your object and query are represented as sets, which can be shuffled without changing the structure.
+For a user that can be a set of user interests; for a document, a set of subjects inside the document; for the query, different parts that you want to have in the query at the same time, but those parts might not be ordered. When you embed something, you make it ordered. For instance, when I played with CLIP,
+I thought it could do text search, which is nice: you can take an image and see which text is closest. But it definitely struggles with the notion of which words are where, what the first word is. Yeah, and geometrically, in different languages it might even be a different geometry of words: should you read left to right or
+right to left? And then you also need another dimension for the language, I guess.
+Yeah, we can represent it as a bag of words, maybe an ordered bag of words with positional encoding, as people do now for text. So I think ANN would need to adapt to that situation in the future, and keeping the ability to add new distances is important.
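One simple, order-invariant family of the "set distances" discussed above is a Chamfer-style distance: match each element of one set to its closest element in the other set and average. This is our own toy construction (function names and data are ours), only in the spirit of the many-to-many matching mentioned in the conversation:

```python
import math

# Toy order-invariant set-to-set distance (Chamfer-style, symmetrised):
# for each point of one set, take the distance to its nearest point in
# the other set, average, and do the same in the other direction.
# Shuffling either set does not change the result.

def one_sided(xs, ys):
    return sum(min(math.dist(x, y) for y in ys) for x in xs) / len(xs)

def chamfer(xs, ys):
    return 0.5 * (one_sided(xs, ys) + one_sided(ys, xs))

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(1.0, 0.0), (0.0, 0.0)]                 # same set, different order
print(chamfer(a, b))                          # → 0.0
print(chamfer(a, [(0.0, 1.0), (1.0, 1.0)]))  # → 1.0
```

Because only nearest-match distances enter the sum, the sets act like unordered bags of vectors, which is exactly the property single-vector embeddings lack.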
+Yeah, so are you working on this personally, or are you welcoming pull requests, as they say, to implement different distances? Well, I'm welcoming pull requests for sure, because those are very application specific, and it's pretty easy to implement, I don't know,
+some simple set distance: you have a set of distances and you just select which are the closest out of the set. It would be many-to-many, somewhat similar, I think, to what ColBERT does, though they can, I think, go without it; but essentially you'll need a set-to-set distance. Right, yeah, makes sense.
+Since I mentioned it twice already, I was wondering if I could pick your brain on what I was thinking in this space, an ANN algorithm, and trust me, it's an absolutely very simple algorithm that I came up with. The only problem is that I chose Python as the language, and Python has this slightly weird virtual machine and garbage collection, and maybe there is also a bug in my code, but on a billion nodes I cannot actually make it converge in reasonable memory.
+It runs out of memory at around 995 million. What I did is: I took the input set of points, where the points have 128 dimensions, or 200 dimensions.
+Essentially I pick the first point, not a random one, and on a sample of points I compute a median distance, basically the average distance between all of them in a pairwise fashion.
+Then I use that as a filter to build what I called a shard: essentially I decided to split the billion points down to a controllable number of shards, let's say 1000 shards. I pick the first point and then I ask which other points are close enough, within that median distance, to this point.
+I join them together in the shard, and when the shard reaches 1 million (so if it's 1000 shards, each shard roughly 1 million points, that's a billion points), I close that shard and run HNSW on it, so that I actually have that shard as a hierarchical navigable small world graph.
+And it seems to converge: at least on 10 million it converges, on 100 million it converges; it runs out of memory at one billion, but I think it's just some weirdness in how I do the big loop over all points.
+When I reached out to you on GitHub, my idea was to also access the first layer of the graph, the layer where the query enters; I could use that
+as a sort of entry point across these 1000 shards, because I don't want to load all 1000 into memory. I want to load only a sufficient number of entry points, so that I can quickly check which shard is closest to my query and then go inside it and use HNSW. What do you think about this idea? It's very simple, I think.
+Yes, well, that makes sense; it's clustering. You cluster the points into 1000 clusters and then you select among the clusters. I think historically there were other papers that suggested something similar;
+I think in FLANN that was one of the distribution strategies they suggested. Yeah, that might work out, though it depends on the scale, and for a production system you also want to replicate those nodes.
+Or, coming at it from a different way: you can also shard into very small pieces, so that might not be needed in this case, you want to balance. But on the top layer you can also use, as in this Microsoft paper that you mentioned, or other papers, like the ones from Yandex,
+So there is a paper from those guys, so you can use an ANN index on the top, maybe not over the shards themselves. So if you want to divide your dataset into, like, a million clusters, you can use a higher-level index to decide which shard you want to select, right? Yes. +Though, if you're not talking about that kind of scale, it probably doesn't make too much sense. + But yeah, you can do this. Yeah, I mean, I'm still hopeful, kind of, to keep trying it. I have another friend on Twitter who actually recorded a YouTube video where he said: here, here and here you make a mistake, this is why you lose memory, you should never allocate objects inside loops, you should pre-allocate them as NumPy arrays, and so on. +But with his modifications it still runs out of memory, so I need to kind of move forward, and I'm still hopeful I can do it in Python, but something also tells me maybe I should move to a more memory-controllable language, something like Rust or C++, I don't know. + Well, I'm not sure. So you're probably using C++ libraries from Python, like NumPy or Torch, something like that. They should not leak memory; those are pretty controllable. Yeah, it is definitely my code, somewhere in the loop. It probably just computes too many times. Basically, the hottest part of the algorithm, in terms of profiling, is this: +you pre-compute the median distance once, right, and then you use that value all the time. So that's okay, it's just one object, it doesn't allocate much. But then you extract the next batch of points; I read the one-billion set in one-million batches, right?
+I sense that there could be a memory leak there, because it's a binary file, and in NumPy you say: from this file, read the next batch, right? So you provide the kind of offset, and I sense that maybe there it loses memory. Maybe not, I don't know. +Because I've noticed that in the Faiss library they use mmap. Yeah, you can also use mmap. So in NumPy, if you read the tensor, they also have memmap options, so you can load it with mmap. +But even if you're reading via, like, open file as read-binary, it should not leak memory; it should just do the read. + Yeah, so it must be something super stupid, then, in my code, something really obvious to somebody like you: okay, here is that point, you should not do this. But for me, I invented this basic idea, but then pushing it further: it works on 10 million, and I'm okay, but the task, as part of this challenge, was to do the billion scale, right? So it's like you climb the mountain without reaching the top, in a way. But yes, there is a top, of course. +It's only one billion points. But yeah, I mean, it keeps me quite excited to keep doing it. Of course, I already see some need for improvements. For example: +how do I make updates?
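The mmap option mentioned here exists in NumPy as `np.memmap`. A minimal sketch of batched reading that lets the OS page data in instead of copying each batch onto the heap, assuming a headerless file of fixed-dimension float32 vectors (the file layout, `DIM`, and function name are hypothetical, not from the conversation):

```python
import numpy as np

# Hypothetical file layout: contiguous float32 vectors, fixed dimension, no header.
DIM = 128

def iter_batches_memmap(path, n_points, batch_size=1_000_000):
    # np.memmap maps the file into the address space; slicing returns
    # views backed by the mapping, so iterating batches does not create
    # fresh heap allocations per batch.
    data = np.memmap(path, dtype=np.float32, mode="r", shape=(n_points, DIM))
    for start in range(0, n_points, batch_size):
        yield data[start:start + batch_size]  # a view, not a copy
```

The alternative the guest alludes to, `np.fromfile` with an explicit offset, copies each batch into a fresh array; either way, neither call should leak if the references to old batches are dropped.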
+ Right, so let's say a new point comes in, and I have, like, 1000 shards predefined. So I need to find either an existing shard or create a new one at some point. So that part I defer to the future; maybe I still need to push harder to just make it converge first. Okay. You can profile for memory: you can loop some operations in the code that you think can leak, and profile the memory for those. Yeah, I've been doing that. Actually, I also come from the world of Java, and in Java it's quite straightforward, in a way. There are also tools in Python, but when you plug in this memory profiler, it slows down your computations significantly, and so you have to wait like 10 times longer. Yeah, so I'm not a fan of profilers. Recently I found a video, a talk on YouTube, which explained why we shouldn't use profilers: profilers became obsolete when code became not just multi-threaded but superscalar, with multiple execution paths. So your operations are out of order, and when you look at the profiler results, you don't understand them. So when I was developing hnswlib, I didn't use profilers. I just wrote benches for operations, and I had, like, baseline and trial, and they usually work on the same memory: the index is the same, but there are different implementations of search. And your speed can depend on memory, on how you allocate the memory, and with those benches you can measure something like up to 1 or 2% of difference. And when you do a lot of benches with one or two percent improvement each, you can get like 20% improvement, 50% improvement. Yeah, but I never used profilers, and I never saw in my life that people use profilers and get really complicated insights from them. Yeah, I agree. We were also building a search engine with, by design, billions
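The benches-over-profilers workflow described here (baseline vs. trial implementation on the same data, resolving differences of a percent or two) can be sketched with a tiny median-of-repeats harness. This helper is my illustration, not code from hnswlib:

```python
import time
import statistics

def bench(fn, *args, repeats=30, inner=100):
    # Median-of-repeats micro-benchmark: each sample times `inner` calls,
    # and the median over `repeats` samples is stable enough to resolve
    # ~1-2% differences between a baseline and a trial implementation.
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        for _ in range(inner):
            fn(*args)
        samples.append((time.perf_counter() - t0) / inner)
    return statistics.median(samples)

# Usage sketch: compare two search implementations over the same index.
#   speedup = bench(baseline_search, index, query) / bench(trial_search, index, query)
```

The point the guest makes is that many small, verified wins of this kind compound into large overall speedups, without ever consulting a profiler.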
of documents, and each document was just a short sentence, like a statement from a real document. And of course we were running into all these garbage-collector stop-the-world problems and so on, and we were running these profilers (I think one of them was JRockit, and then others), and when you see the graphs, you're like: okay, so now I know, yes, it leaks, but what should I do? +Or it tells you that your code is using, like, byte arrays too much; what can you do other than that? Yeah, and for performance it's even worse. +So you see that this method takes a lot of time, but in a multi-threaded world that's not for sure. + So you may improve it, and that happened quite a few times: people came to me and said, like, your analysis of performance contradicts the profiler logs. And that's okay, right, because you didn't optimize for the profiler. Yeah, because the profiler cannot tell you what would happen if you change something. Exactly, it's just a snapshot. +Yeah, it's just a snapshot. +Yeah, exactly. And coming back to HNSW: what are you hoping to achieve, maybe in some midterm future? For example, some vector databases reimplement HNSW so that they can add symbolic filtering. +So what would it take, in your original paper, in your original algorithm, to add symbolic filters? How does it change the dynamics of the graph and the search? + Uh, well, it seems, for me, like I can correlate interest in ANN with interest in symbolic filtering. I think two years ago I hadn't heard people talk about symbolic filtering in ANN, but now it's a hot topic. From different places people want symbolic filtering, for example for targeting, like for ads.
+ Yeah, you want to have some targeting for the audience, or some other filters. But I see that as outside of the ANN itself. As I said, when working on a startup, our first application was doing something like symbolic filtering, and there it's easier in some sense, because, as you said, there is this problem with distances in high-dimensional space, and there is no such problem in symbolic filtering. In symbolic filtering you have a query that has an exact result, and if you write the SQL query, it can be optimized to work efficiently. But the ANN does a very different job: it does approximate +filtering. We can kind of mix them together. + So if you have a distance, and you add some prefix for it which somehow captures the symbolic filtering, you can build an index that also takes that into account, and there are some other people who suggested doing that as well. And yeah, that can help during search: if you filter by the symbol, you can easily add filtering. When HNSW does filtering for deletes, it can be done the same way: you extract only elements that pass the filter, and there is still some guidance on the graph, because you traverse it. But for me, I don't know: you have a huge number of possible filters, so what will be the metric, and how would you balance it with the approximate network? That creates a lot of problems, I think. And I thought that the best solution would be to keep this to some extent, but focus more on how you can shard the index according to those filters, so that there are shards.
+ So you can do SQL-like queries, for instance. There are some queries that can work well with this filtering: like, if most of, or, I don't know, 20% of the elements pass the symbolic filter, that is fine, you can use it. But maybe there are some queries for which, I don't know, only one in a million passes, and those are in different parts, yeah, exactly, of the space. So for them you can + see in real time: you search and you see that it doesn't perform well for those, and you can just build a separate index for them, right? Because you know those are small, and people want to find them. Maybe there are enough of them: out of a billion elements, there's like a million of them, so you cache them, you build a cached index for those on the fly. So that is like a discrete optimization problem, and I think that's a bit outside of the index, because the index is + focused on a different part. Yeah. And I really don't think that other ANN algorithms can somehow avoid this problem. Yeah, exactly. + Yeah, I mean, what would you say: if I understand correctly, it's a little bit like ANN contradicts the very nature of symbolic filtering, in some sense. But still, people do it, right? For example in Weaviate, and in Qdrant they did it, and in Milvus as well. But it's funny: in Milvus they use Faiss and then other algorithms, right, but they say, we only support, you know, integer fields, we don't support, for example, strings yet, so we are working on adding strings. Which means essentially they're designing this graph somehow in such a way that, okay, it doesn't support strings yet, maybe because it's not so easy to edit, right? Well, I'm not sure; that also depends on how you measure the performance, like if you have rare queries which don't return good results.
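The idea above (graph-search the common filters, but brute-force or separately index the very selective ones) can be sketched as a selectivity-aware dispatcher. Everything here is a hypothetical illustration; the `graph_search` callback stands in for a filtered ANN query such as the filter support some libraries expose:

```python
import numpy as np

def filtered_search(query, vectors, mask, k=10, selectivity_cutoff=0.01,
                    graph_search=None):
    # If the filter keeps a reasonable fraction of points, filtering
    # during the (approximate) graph search works fine; if it is very
    # selective, it is cheaper and more accurate to brute-force the
    # small filtered subset directly.
    frac = mask.mean()
    if frac < selectivity_cutoff or graph_search is None:
        idx = np.flatnonzero(mask)                       # the rare subset
        d = np.linalg.norm(vectors[idx] - query, axis=1)
        return idx[np.argsort(d)[:k]]                    # exact top-k
    return graph_search(query, mask, k)                  # ANN with filter
```

In the transcript's terms, the brute-force branch could also be replaced by a small cached index built on the fly for a frequently-seen rare filter.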
+ So, like, your algorithm doesn't even work on them, but they are rare, and you measure the overall recall, so you don't see any problems. So definitely you can build a solution, maybe something simple with filtering during search, but surely it will fail on some points, and that is suboptimal in terms of latency. Yeah. So if you're talking about existing solutions, maybe they have a really good solution which I just don't know. I looked at a few, and it was mostly filtering inside the graph. So if you have really rare elements which are distributed across the search space evenly, in different parts, +it will struggle, because you need to just brute-force the whole dataset to find them. +Yeah, exactly. I mean, to me it sounds like a combinatorial explosion: if I add more and more symbolic filters, essentially I'm introducing new subspaces in my space, right? +So I need to push these points somehow closer to each other within that specific symbolic filter. But if I add more of them, now I have kind of a multi-dimensional space of filters, right?
+ Yeah, and you have a really high-dimensional space of filters, but you don't really know the distribution of queries for those filters. It should be very different, because that's the user distribution. Yeah, so that also makes the problem more complicated. It still can work, especially if the distribution is kind of similar; it will work if you crank up the parameters of the graph, yeah, use more connections. But there is a mismatch: during query time your distribution may be very different, and you need to think about it. So how do you balance those inside? You have two types of distance, and you want to balance them for the query distribution. Yeah. Does this field of vector search make you excited, the field you contributed to? Like, how do you feel about this field that is emerging right now? + Well, I think it is very important. Right now I'm working mostly on applications, on how to take advantage of this, and there are many applications which cannot be done without efficient search. Like, there was a paper from DeepMind, quite recently, where they used search inside of the network, and, well, that makes a lot of sense. And I think there will be more papers; there were papers before that one, but there will be more papers that use ANN inside a huge NLP model. Yeah, yeah. For example, these learning-to-hash methods, I don't know if you've heard about them. So, when I tried to kind of put everything into buckets, like how many different types of algorithms exist, I didn't know about learning to hash.
+ It seems to be one of the recent developments. Are you following up on that as well? Well, learning to hash, I'm not really following that. Learning to hash was before HNSW. Okay. There are algorithms, and when I talked with people who specialized in product quantization and reviewed the papers, they told me that learning to hash never reaches the performance of product quantization, at least that was the state a few years ago. Yeah, and maybe now it's solved, but when I talk about ANN inside, I think about the graph in it. So yeah. And one interesting thing can also happen with graphs. What is an additional advantage of graph nearest-neighbor engines? You can change the metric. For instance, if you are doing multi-stage ranking, you have multiple candidate sources: for search you have something like BM25, and also you might have embeddings with similarity search. So those are, like, three separate sources, and then they are ranked. But essentially, why do you need ANN from the beginning? You need ANN to speed up the ranking. Essentially you could rank all the documents using your heavy ranker, but you cannot, it's too expensive. So you add ANN, and ANN for vector search basically means you distill everything to vectors, you have the same objective, and you have a way to sparsify the interactions. But you can look at it the other way: you have a graph, and the graph nodes are just the candidates, and you have a simple, light metric. Now you put a more complicated metric on this graph, and you have a final ranking that can also be searched on this graph. So that means you don't supply a set of candidates to the ranking; rather, you supply entry points in the graph. So you
have a graph which is, well, built trying to capture the similarity for the ranker, and so, instead of filtering from one stage to the next stage, you can just switch the metric in the graph. + You had a light metric, which is vectors; now you have a more complicated metric. So you hydrate the features of the elements in the graph and traverse, and now you have a really complicated metric, which is very heavy, but you still just have an entry point in the graph. So you explore it and you can, + well, fix some mistakes made by the previous layers. Yeah, so it's not exact filtering. So that's another, maybe unique, feature of the graph methods. + Yeah, sounds quite exciting. Have you thought about publishing this idea? I mean, it sounds quite unique. Well, it doesn't make sense to publish an idea without an implementation. Yeah, for sure, but maybe you can influence those who would like to experiment on it, at least those who will watch this podcast. +I think they will listen. They will probably pick it up, yeah, and use graph algorithms, for sure. +Yeah, I mean, it sounds like all of the ANN algorithms have advantages and disadvantages, right? So it's not like some of them uniquely outperform the others. +Well, there is a division. If you think about quantization algorithms, they are kind of orthogonal to graph algorithms: they quantize, so they can compress to save memory and speed up the computation. But those algorithms usually just use something like IVF.
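The metric-switching idea sketched in this passage (one fixed neighbor graph, searched first with a cheap metric to find entry points, then re-entered with an expensive ranker) boils down to a best-first graph search with a pluggable distance function. A toy sketch, with hypothetical names and a plain dict as the graph:

```python
import heapq

def graph_search(graph, dist, entry, ef=16):
    # Best-first search over a fixed neighbor graph with a pluggable
    # distance. The same graph can serve stage 1 (cheap vector metric)
    # and stage 2 (heavy ranker), by swapping `dist` and re-entering
    # at the stage-1 results.
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap by distance
    results = [(-dist(entry), entry)]     # max-heap of current best ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break  # nearest open candidate is worse than the worst result
        for nb in graph.get(node, ()):
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict current worst
    return sorted((-d, n) for d, n in results)
```

Stage one would call this with a cheap vector distance to obtain entry points; stage two would call it again on the same `graph`, starting from those entry points, with the hydrated-feature ranker as `dist`.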
+ So then, one layer of filtering, and you can use graphs instead of IVF. Right, so we can use graphs and add the quantization on top, and Faiss did that before. Yeah, I think some others also did that. Yeah, and then vector databases actually offer it as one method. Like Milvus, for example: they offer IVF, and then you can choose if you want to do exact kNN or if you want to do ANN. So you can actually configure it in different ways. Yeah, I mean, it just sounds like, without maybe realizing it much, you are at the core of what's happening in vector search, in some sense. Of course, there have been multiple other contributions, right? But for some reason exactly your algorithm has been picked by many vector databases. +There are like seven of them. I actually wrote a blog about six of them, and then the seventh kind of knocked on my door and said, can you also add us? And when I was going through different databases, implemented in Java, or in Python, or, you know, in Rust and Go, all of them picked your algorithm for some reason. So maybe it was easier; it's a combination of how easy it is to implement, how transparent it is to understand, right, and then basically its stability. So it's a combination of things. + Yeah, probably. I'm not totally sure. So yeah, the library was implemented as header-only. Well, not the initial one, that was the second library. There was a problem with the HNSW implementation in nmslib: the nmslib format was a bit restrictive for efficient operation. So it converted everything to a flat memory format, and that made construction slower and memory consumption bigger.
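The IVF-style "one layer of filtering" contrasted with graphs here can be illustrated with a toy inverted-file index. This uses random centroids instead of the k-means a real system like Faiss would use, and all names are my own illustration:

```python
import numpy as np

def ivf_build(vectors, n_lists, rng=None):
    # Toy IVF: pick random vectors as centroids and assign every vector
    # to its nearest list (real systems train centroids with k-means).
    rng = rng or np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]
    assign = np.argmin(
        np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, n_probe=2, k=5):
    # One layer of coarse filtering: scan only the n_probe closest lists,
    # then rank the surviving candidates exactly.
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    cand = np.concatenate([lists[c] for c in order])
    d = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(d)[:k]]
```

The point in the conversation is that this coarse layer is replaceable: a graph index can select the lists (or the candidates directly), and quantization can then compress whatever is scanned, since the two techniques are orthogonal.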
+ So it was reimplemented as a header-only library. The header-only library was inspired by Annoy, by its success as well, and I think that also might have contributed, because it's very easy to integrate: there are a few files, and it compiles in seconds. Yeah. Maybe that also helped: the library itself is simple and easy to integrate. Yeah, yeah. And I mean, it must feel kind of cool to have this impact. But I also hope you will continue doing some publishable work in some fashion; it doesn't need to be a journal article which gets rejected five times, but something else. Is this something that you are planning to do? Uh, well, that depends. I cannot talk too much about my work at Twitter. +So maybe we will publish something; that depends on how it goes. I mean, not even only nearest neighbors. + Yeah, not only, but yeah. But it's hard to predict now; if it works well, then publish. Yeah, at least the idea that you mentioned: I mean, if it's outside Twitter, for example in HNSW, your library, the idea of this multi-stage ranking sounds quite exciting. Um, well, I think it can be implemented only by the teams who own the rankers and the whole pipeline. Yes, true. +I think, as I said, you need to hydrate the features on the fly, and feature hydration is very specific to the application. Yeah, and it lives inside the production environment. Yeah, that makes sense. Yeah, so maybe it will call for the creation of datasets and benchmarks, if the industry chooses to move in that direction. Well, there are some obvious problems with data privacy with that, so it's hard to publish something. Well, you can think of a toy problem.
+ So, like, you don't do actual work with users, but maybe you do image-to-image search, and you have a huge transformer model on top of that. Or maybe something like MS MARCO; maybe it can be experimented with on that. Hmm, maybe so. Yeah. Yeah, I think we went really deep today. + I think it was really, really cool talking to you. I always like to still ask this orthogonal question of why. It's a little bit more philosophical, but if you're not averse to philosophy: why would you say this field attracted you, in your own words? +Uh, I didn't have much choice. + It just was, like: I got my first job offer, and that was in this field. And it's about scale. People like scaling; in many games, when you play on Android or other platforms, they're based on scaling, so you do a little action and there are huge consequences of those actions, like destroying something. That is scaling, and this is just the pure scale of how we scale machine learning applications. Yeah, so on one hand it kind of was predefined, as you said, you found the job; on the other hand, you still were curious to implement that algorithm. So it wasn't like somebody said, okay, you have to do it, right? You could also choose a job of, like, okay, I'm just coding nine to five and then I go home. But you still decided to implement the algorithm. Well, yes, well, that was a fun job. So, yeah. So you were not scared by the challenge itself, right? Maybe it was actually motivating? There was no such big push from the company itself.
+ So we could do whatever we wanted inside the company, so it was very relaxed. Yeah, that might actually be a really good background to invent things, don't you think? Like, if you come to work and somebody says, no, you cannot do what you want, you should do this, it might be kind of too restrictive. But here there were both challenges and also the freedom to solve those challenges. Yeah, there are like two components: first of all, you need to have freedom and do long-term stuff, +without worrying about what you are going to ship into production soon. +The second is concentration of talent: you have a high concentration of talent, so people can share ideas. Yeah, if you have this mix, there will be innovations for sure. Yeah, it sounds like you had a combination of the components that you mentioned, right? The talent, and also the freedom. Yeah, yeah. I also saw that at other companies; like, in Samsung there was already a strong team, and there was a lot of innovation. + So there are a few startups which came from our lab, and there were some really good papers. Yeah, so that's a recipe for innovation, for sure. Yeah, I'm really happy that it turned out so well for you, and for your co-author as well; I think he continues to work in the industry too, at least last time I checked. And so I really hope that you will get some really cool pull requests on HNSW that will pass your criteria. Well, yeah, most of them pass; it's just that I would love to have more time, and I'll try to allocate more time to looking at and checking them. Yeah, it's really, really great. +I really enjoyed talking to you, Yury.
+ Um, thanks so much for allocating your time, also in this pre-Christmas period. Um, but yeah, I mean, all the best to you in the future, also at Twitter, and I hope to see some published work at some point. I enjoyed reading your paper, and then also reading your code, and it feels like you've put a lot of effort in there, and it also influences the industry so much today. So maybe you are not realizing this every single day, but you should know that there are so many databases that use your algorithm as one of the baselines in production. It's really cool work. Yeah, yeah, that was great, that there was success. Yeah, maybe one thing I would note: + the idea with the rerankers, so that was partially implemented, and it needs work. He had earlier work, maybe you know it, on using the ANN for the final reranker. So I felt that I needed to cite this, for sure; I learned this idea, maybe not unchanged, but from him. Yeah, yeah, it sounds great.
+ I mean, I've also interacted a bit with him, and it sounds like he's a very knowledgeable guy, and he has very strong opinions as well. So maybe we will also talk with him on one of the episodes. Uh, but, um, yeah, I'm glad that you guys collaborated. Yeah, it's a fantastic result for the industry as well, and probably for your profiles; well, not probably, but definitely for your profiles. So yeah, um, thank you so much for your time, and, um, yeah, I hope you will have a relaxing time over Christmas, and a happy New Year as well. So thank you very much for your time, Yury. Thank you. Yeah, bye-bye. \ No newline at end of file diff --git a/transcripts_with_timestamps/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md b/transcripts_with_timestamps/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md new file mode 100644 index 0000000..4364e21 --- /dev/null +++ b/transcripts_with_timestamps/vector-podcast/yusuf-sarigoz-ai-research-engineer-qdrant-getting-to-know-your-data-with-metric-learning.md @@ -0,0 +1,2188 @@ +--- +description: '

YouTube: https://www.youtube.com/watch?v=AU0O_6-EY6s

Topics:

00:00 + Intro

01:03 Yusuf’s background

03:00 Multimodal search in tech and humans

08:53 + CLIP: discovering hidden semantics

13:02 Where to start to apply metric learning + in practice. AutoEncoder architecture included!

19:00 Unpacking it further: + what is metric learning and the difference with deep metric learning?

28:50 + How Deep Learning allowed us to transition from pixels to meaning in the images

32:05 + Increasing efficiency: vector compression and quantization aspects

34:25 Yusuf + gives a practical use-case with Conversational AI of where metric learning can prove + to be useful. And tools!

40:59 A few words on how the podcast is made :) Yusuf’s + explanation of how Gmail smart reply feature works internally

51:19 Metric + learning helps us learn the best vector representation for the given task

52:16 + Metric learning shines in data scarce regimes. Positive impact on the planet

58:30 + Yusuf’s motivation to work in the space of vector search, Qdrant, deep learning + and metric learning — the question of Why

1:05:02 Announcements from Yusuf

- + Join discussions at Discord: https://discord.qdrant.tech

- Yusuf''s + Medium: https://medium.com/@yusufsarigoz + and LinkedIn: https://www.linkedin.com/in/yusufsarigoz/

- + GSOC 2022: TensorFlow Similarity - project led by Yusuf: https://docs.google.com/document/d/1fLDLwIhnwDUz3uUV8RyUZiOlmTN9Uzy5ZuvI8iDDFf8/edit#heading=h.zftd93u5hfnp

- + Dmitry''s Twitter: https://twitter.com/DmitryKan

Full + Show Notes: https://www.youtube.com/watch?v=AU0O_6-EY6s

' +image_url: https://media.rss.com/vector-podcast/20220507_080542_57009c58f961b6d0713e057b9a5a4832.jpg +pub_date: Sat, 07 May 2022 20:37:42 GMT +title: Yusuf Sarıgöz - AI Research Engineer, Qdrant - Getting to know your data with + metric learning +url: https://rss.com/podcasts/vector-podcast/479453 +whisper_segments: '[{"id": 0, "seek": 0, "start": 0.0, "end": 24.72, "text": " Hello, + today we have a new episode of the Vector Podcast and today I''m super happy to + have", "tokens": [50364, 2425, 11, 965, 321, 362, 257, 777, 3500, 295, 264, 691, + 20814, 29972, 293, 965, 286, 478, 1687, 2055, 281, 362, 51600], "temperature": 0.0, + "avg_logprob": -0.3523604583740234, "compression_ratio": 1.0714285714285714, "no_speech_prob": + 0.16972854733467102}, {"id": 1, "seek": 2472, "start": 24.72, "end": 32.64, "text": + " you, Suf Sangos, with me. He holds the role of AI Research Engineer at Quadrant. + It''s a", "tokens": [50364, 291, 11, 2746, 69, 19037, 329, 11, 365, 385, 13, 634, + 9190, 264, 3090, 295, 7318, 10303, 15808, 412, 29619, 7541, 13, 467, 311, 257, 50760], + "temperature": 0.0, "avg_logprob": -0.2194515677059398, "compression_ratio": 1.5743801652892562, + "no_speech_prob": 0.2924881875514984}, {"id": 2, "seek": 2472, "start": 32.64, "end": + 38.32, "text": " Vector Search Database Company and you might remember we had an + episode with Tom Lackner,", "tokens": [50760, 691, 20814, 17180, 40461, 651, 13918, + 293, 291, 1062, 1604, 321, 632, 364, 3500, 365, 5041, 441, 501, 1193, 11, 51044], + "temperature": 0.0, "avg_logprob": -0.2194515677059398, "compression_ratio": 1.5743801652892562, + "no_speech_prob": 0.2924881875514984}, {"id": 3, "seek": 2472, "start": 38.32, "end": + 44.16, "text": " who is the user of Quadrant today. 
We have an episode and discussion + with you, Suf, who works for", "tokens": [51044, 567, 307, 264, 4195, 295, 29619, + 7541, 965, 13, 492, 362, 364, 3500, 293, 5017, 365, 291, 11, 2746, 69, 11, 567, + 1985, 337, 51336], "temperature": 0.0, "avg_logprob": -0.2194515677059398, "compression_ratio": + 1.5743801652892562, "no_speech_prob": 0.2924881875514984}, {"id": 4, "seek": 2472, + "start": 44.16, "end": 50.8, "text": " Quadrant. And one of the core topics today + we''re going to be discussing metric learning, but before that,", "tokens": [51336, + 29619, 7541, 13, 400, 472, 295, 264, 4965, 8378, 965, 321, 434, 516, 281, 312, 10850, + 20678, 2539, 11, 457, 949, 300, 11, 51668], "temperature": 0.0, "avg_logprob": -0.2194515677059398, + "compression_ratio": 1.5743801652892562, "no_speech_prob": 0.2924881875514984}, + {"id": 5, "seek": 5080, "start": 50.8, "end": 60.4, "text": " hey, you Suf, how + are you doing? I''m very excited to join you in this episode to discuss metric learning", + "tokens": [50364, 4177, 11, 291, 2746, 69, 11, 577, 366, 291, 884, 30, 286, 478, + 588, 2919, 281, 3917, 291, 294, 341, 3500, 281, 2248, 20678, 2539, 50844], "temperature": + 0.0, "avg_logprob": -0.1496942705578274, "compression_ratio": 1.5265957446808511, + "no_speech_prob": 0.00567870307713747}, {"id": 6, "seek": 5080, "start": 60.4, "end": + 67.12, "text": " and thank you for having me. Yeah, thanks for coming up. 
Really, + I think this topic is", "tokens": [50844, 293, 1309, 291, 337, 1419, 385, 13, 865, + 11, 3231, 337, 1348, 493, 13, 4083, 11, 286, 519, 341, 4829, 307, 51180], "temperature": + 0.0, "avg_logprob": -0.1496942705578274, "compression_ratio": 1.5265957446808511, + "no_speech_prob": 0.00567870307713747}, {"id": 7, "seek": 5080, "start": 67.67999999999999, + "end": 74.64, "text": " something that has been crossing my area of focus and also + some of the questions that users are", "tokens": [51208, 746, 300, 575, 668, 14712, + 452, 1859, 295, 1879, 293, 611, 512, 295, 264, 1651, 300, 5022, 366, 51556], "temperature": + 0.0, "avg_logprob": -0.1496942705578274, "compression_ratio": 1.5265957446808511, + "no_speech_prob": 0.00567870307713747}, {"id": 8, "seek": 7464, "start": 74.64, + "end": 80.64, "text": " asking, you know, okay, if I have this data set, how can + I be sure that it will work with neural", "tokens": [50364, 3365, 11, 291, 458, + 11, 1392, 11, 498, 286, 362, 341, 1412, 992, 11, 577, 393, 286, 312, 988, 300, 309, + 486, 589, 365, 18161, 50664], "temperature": 0.0, "avg_logprob": -0.2478970399836904, + "compression_ratio": 1.5378486055776892, "no_speech_prob": 0.013338115066289902}, + {"id": 9, "seek": 7464, "start": 80.64, "end": 85.76, "text": " search, right? And + I think metric learning seems to be one of the answers. 
But before we start", "tokens": + [50664, 3164, 11, 558, 30, 400, 286, 519, 20678, 2539, 2544, 281, 312, 472, 295, + 264, 6338, 13, 583, 949, 321, 722, 50920], "temperature": 0.0, "avg_logprob": -0.2478970399836904, + "compression_ratio": 1.5378486055776892, "no_speech_prob": 0.013338115066289902}, + {"id": 10, "seek": 7464, "start": 85.76, "end": 91.12, "text": " discussing this + in deep, in depth, I was thinking, could you please introduce yourself to our audience?", + "tokens": [50920, 10850, 341, 294, 2452, 11, 294, 7161, 11, 286, 390, 1953, 11, + 727, 291, 1767, 5366, 1803, 281, 527, 4034, 30, 51188], "temperature": 0.0, "avg_logprob": + -0.2478970399836904, "compression_ratio": 1.5378486055776892, "no_speech_prob": + 0.013338115066289902}, {"id": 11, "seek": 7464, "start": 92.64, "end": 101.04, "text": + " Yes, sure. Armist has told Suf''s software developer and AI researcher with a + background in", "tokens": [51264, 1079, 11, 988, 13, 11893, 468, 575, 1907, 2746, + 69, 311, 4722, 10754, 293, 7318, 21751, 365, 257, 3678, 294, 51684], "temperature": + 0.0, "avg_logprob": -0.2478970399836904, "compression_ratio": 1.5378486055776892, + "no_speech_prob": 0.013338115066289902}, {"id": 12, "seek": 10104, "start": 101.12, + "end": 112.08000000000001, "text": " linguistics at the university. 
Actually, I''ve + been developing software since my high school years.", "tokens": [50368, 21766, + 6006, 412, 264, 5454, 13, 5135, 11, 286, 600, 668, 6416, 4722, 1670, 452, 1090, + 1395, 924, 13, 50916], "temperature": 0.0, "avg_logprob": -0.1662550083426542, "compression_ratio": + 1.3706293706293706, "no_speech_prob": 0.006533720064908266}, {"id": 13, "seek": + 10104, "start": 113.04, "end": 123.04, "text": " During my master''s study, I combined + my experience and my education to study machine translation.", "tokens": [50964, + 6842, 452, 4505, 311, 2979, 11, 286, 9354, 452, 1752, 293, 452, 3309, 281, 2979, + 3479, 12853, 13, 51464], "temperature": 0.0, "avg_logprob": -0.1662550083426542, + "compression_ratio": 1.3706293706293706, "no_speech_prob": 0.006533720064908266}, + {"id": 14, "seek": 12304, "start": 123.28, "end": 134.4, "text": " After several + years of experience in different roles and at different startups,", "tokens": [50376, + 2381, 2940, 924, 295, 1752, 294, 819, 9604, 293, 412, 819, 28041, 11, 50932], "temperature": + 0.0, "avg_logprob": -0.22620956521285207, "compression_ratio": 1.416, "no_speech_prob": + 0.03279435634613037}, {"id": 15, "seek": 12304, "start": 138.24, "end": 151.20000000000002, + "text": " I ended up with the multi model retrieval because I had a long experience + in both computer vision", "tokens": [51124, 286, 4590, 493, 365, 264, 4825, 2316, + 19817, 3337, 570, 286, 632, 257, 938, 1752, 294, 1293, 3820, 5201, 51772], "temperature": + 0.0, "avg_logprob": -0.22620956521285207, "compression_ratio": 1.416, "no_speech_prob": + 0.03279435634613037}, {"id": 16, "seek": 15120, "start": 151.2, "end": 159.92, "text": + " and measured language processing. 
So for some time, my main focus is metric learning.", + "tokens": [50364, 293, 12690, 2856, 9007, 13, 407, 337, 512, 565, 11, 452, 2135, + 1879, 307, 20678, 2539, 13, 50800], "temperature": 0.0, "avg_logprob": -0.31396156549453735, + "compression_ratio": 1.4619883040935673, "no_speech_prob": 0.003455967642366886}, + {"id": 17, "seek": 15120, "start": 161.11999999999998, "end": 171.51999999999998, + "text": " I was already a user of co-advent, even before joining co-advent and I + thought it would be very cool", "tokens": [50860, 286, 390, 1217, 257, 4195, 295, + 598, 12, 345, 2475, 11, 754, 949, 5549, 598, 12, 345, 2475, 293, 286, 1194, 309, + 576, 312, 588, 1627, 51380], "temperature": 0.0, "avg_logprob": -0.31396156549453735, + "compression_ratio": 1.4619883040935673, "no_speech_prob": 0.003455967642366886}, + {"id": 18, "seek": 15120, "start": 172.48, "end": 178.88, "text": " to work for + an open source project that I find valuable myself.", "tokens": [51428, 281, 589, + 337, 364, 1269, 4009, 1716, 300, 286, 915, 8263, 2059, 13, 51748], "temperature": + 0.0, "avg_logprob": -0.31396156549453735, "compression_ratio": 1.4619883040935673, + "no_speech_prob": 0.003455967642366886}, {"id": 19, "seek": 17888, "start": 179.6, + "end": 186.32, "text": " Yeah, sounds awesome. Sounds cool. You just mentioned multi + model. So you mean like multi model search,", "tokens": [50400, 865, 11, 3263, 3476, + 13, 14576, 1627, 13, 509, 445, 2835, 4825, 2316, 13, 407, 291, 914, 411, 4825, 2316, + 3164, 11, 50736], "temperature": 0.0, "avg_logprob": -0.2481926781790597, "compression_ratio": + 1.541237113402062, "no_speech_prob": 0.025137102231383324}, {"id": 20, "seek": 17888, + "start": 186.32, "end": 194.0, "text": " right? 
And I think this field is still + kind of in many ways shaping up and many people are still", "tokens": [50736, 558, + 30, 400, 286, 519, 341, 2519, 307, 920, 733, 295, 294, 867, 2098, 25945, 493, 293, + 867, 561, 366, 920, 51120], "temperature": 0.0, "avg_logprob": -0.2481926781790597, + "compression_ratio": 1.541237113402062, "no_speech_prob": 0.025137102231383324}, + {"id": 21, "seek": 17888, "start": 194.0, "end": 199.44, "text": " learning and + kind of scratching their heads like what is multi model? Like maybe if you could + give", "tokens": [51120, 2539, 293, 733, 295, 29699, 641, 8050, 411, 437, 307, 4825, + 2316, 30, 1743, 1310, 498, 291, 727, 976, 51392], "temperature": 0.0, "avg_logprob": + -0.2481926781790597, "compression_ratio": 1.541237113402062, "no_speech_prob": 0.025137102231383324}, + {"id": 22, "seek": 19944, "start": 199.52, "end": 212.64, "text": " an example or + a little bit explain what is multi model. Yes, sure. Actually, as you just said, + multi model is quite a", "tokens": [50368, 364, 1365, 420, 257, 707, 857, 2903, + 437, 307, 4825, 2316, 13, 1079, 11, 988, 13, 5135, 11, 382, 291, 445, 848, 11, 4825, + 2316, 307, 1596, 257, 51024], "temperature": 0.0, "avg_logprob": -0.34644007215312883, + "compression_ratio": 1.4444444444444444, "no_speech_prob": 0.006710630841553211}, + {"id": 23, "seek": 19944, "start": 213.2, "end": 224.48, "text": " new topic actually. 
+ Actually, it''s resurrecting with developments in deep metric learning.", "tokens": + [51052, 777, 4829, 767, 13, 5135, 11, 309, 311, 34338, 278, 365, 20862, 294, 2452, + 20678, 2539, 13, 51616], "temperature": 0.0, "avg_logprob": -0.34644007215312883, + "compression_ratio": 1.4444444444444444, "no_speech_prob": 0.006710630841553211}, + {"id": 24, "seek": 22448, "start": 224.56, "end": 236.39999999999998, "text": " + One of the most famous applications is a clip by OpenAI, short for contrastive language + image,", "tokens": [50368, 1485, 295, 264, 881, 4618, 5821, 307, 257, 7353, 538, + 7238, 48698, 11, 2099, 337, 8712, 488, 2856, 3256, 11, 50960], "temperature": 0.0, + "avg_logprob": -0.1859737237294515, "compression_ratio": 1.3642857142857143, "no_speech_prob": + 0.00890433695167303}, {"id": 25, "seek": 22448, "start": 238.0, "end": 251.28, "text": + " the pre-training. In the most basic term, they train a model to construct a unified + vector space", "tokens": [51040, 264, 659, 12, 17227, 1760, 13, 682, 264, 881, 3875, + 1433, 11, 436, 3847, 257, 2316, 281, 7690, 257, 26787, 8062, 1901, 51704], "temperature": + 0.0, "avg_logprob": -0.1859737237294515, "compression_ratio": 1.3642857142857143, + "no_speech_prob": 0.00890433695167303}, {"id": 26, "seek": 25128, "start": 251.28, + "end": 262.32, "text": " for both images and tests. 
Basically, they have two encoders, + one for images and one for", "tokens": [50364, 337, 1293, 5267, 293, 6921, 13, 8537, + 11, 436, 362, 732, 2058, 378, 433, 11, 472, 337, 5267, 293, 472, 337, 50916], "temperature": + 0.0, "avg_logprob": -0.20743141617885855, "compression_ratio": 1.4424778761061947, + "no_speech_prob": 0.0070470524951815605}, {"id": 27, "seek": 25128, "start": 263.28, + "end": 272.8, "text": " tests, support that you have a pair of images and its textual + description.", "tokens": [50964, 6921, 11, 1406, 300, 291, 362, 257, 6119, 295, + 5267, 293, 1080, 2487, 901, 3855, 13, 51440], "temperature": 0.0, "avg_logprob": + -0.20743141617885855, "compression_ratio": 1.4424778761061947, "no_speech_prob": + 0.0070470524951815605}, {"id": 28, "seek": 27280, "start": 273.12, "end": 285.36, + "text": " When you see this image and that textual description to these encoders, + you are supposed to get", "tokens": [50380, 1133, 291, 536, 341, 3256, 293, 300, + 2487, 901, 3855, 281, 613, 2058, 378, 433, 11, 291, 366, 3442, 281, 483, 50992], + "temperature": 0.0, "avg_logprob": -0.23163982232411703, "compression_ratio": 1.476923076923077, + "no_speech_prob": 0.04262159764766693}, {"id": 29, "seek": 27280, "start": 286.56, + "end": 301.2, "text": " very similar vectors, vector output from these encoders. + So you can search images with a textual", "tokens": [51052, 588, 2531, 18875, 11, + 8062, 5598, 490, 613, 2058, 378, 433, 13, 407, 291, 393, 3164, 5267, 365, 257, 2487, + 901, 51784], "temperature": 0.0, "avg_logprob": -0.23163982232411703, "compression_ratio": + 1.476923076923077, "no_speech_prob": 0.04262159764766693}, {"id": 30, "seek": 30120, + "start": 301.2, "end": 311.92, "text": " query or Y-14. 
So you sort of crossed the, + so in a way with one modalities text or image is another", "tokens": [50364, 14581, + 420, 398, 12, 7271, 13, 407, 291, 1333, 295, 14622, 264, 11, 370, 294, 257, 636, + 365, 472, 1072, 16110, 2487, 420, 3256, 307, 1071, 50900], "temperature": 0.0, "avg_logprob": + -0.3519118513379778, "compression_ratio": 1.4202898550724639, "no_speech_prob": + 0.007251732051372528}, {"id": 31, "seek": 30120, "start": 311.92, "end": 319.91999999999996, + "text": " modality, but in this case, we kind of like cross go across modalities. + I think we can cross the", "tokens": [50900, 1072, 1860, 11, 457, 294, 341, 1389, + 11, 321, 733, 295, 411, 3278, 352, 2108, 1072, 16110, 13, 286, 519, 321, 393, 3278, + 264, 51300], "temperature": 0.0, "avg_logprob": -0.3519118513379778, "compression_ratio": + 1.4202898550724639, "no_speech_prob": 0.007251732051372528}, {"id": 32, "seek": + 31992, "start": 319.92, "end": 330.40000000000003, "text": " border of modalities + with this. Yeah, which I think to many users will sound like a magic because", "tokens": + [50364, 7838, 295, 1072, 16110, 365, 341, 13, 865, 11, 597, 286, 519, 281, 867, + 5022, 486, 1626, 411, 257, 5585, 570, 50888], "temperature": 0.0, "avg_logprob": + -0.2309054798550076, "compression_ratio": 1.5957446808510638, "no_speech_prob": + 0.004148249980062246}, {"id": 33, "seek": 31992, "start": 331.28000000000003, "end": + 339.44, "text": " you essentially, if you view an image like a set of pixels and + if you query textual queries a set of", "tokens": [50932, 291, 4476, 11, 498, 291, + 1910, 364, 3256, 411, 257, 992, 295, 18668, 293, 498, 291, 14581, 2487, 901, 24109, + 257, 992, 295, 51340], "temperature": 0.0, "avg_logprob": -0.2309054798550076, "compression_ratio": + 1.5957446808510638, "no_speech_prob": 0.004148249980062246}, {"id": 34, "seek": + 31992, "start": 339.44, "end": 347.76, "text": " words, now you sort of somehow + magically search your words in pixels, but actually that''s not 
exactly", "tokens": + [51340, 2283, 11, 586, 291, 1333, 295, 6063, 39763, 3164, 428, 2283, 294, 18668, + 11, 457, 767, 300, 311, 406, 2293, 51756], "temperature": 0.0, "avg_logprob": -0.2309054798550076, + "compression_ratio": 1.5957446808510638, "no_speech_prob": 0.004148249980062246}, + {"id": 35, "seek": 34776, "start": 347.76, "end": 354.48, "text": " what''s happening. + Of course, we do the embedding and so on, but in a nutshell, it kind of sounds like", + "tokens": [50364, 437, 311, 2737, 13, 2720, 1164, 11, 321, 360, 264, 12240, 3584, + 293, 370, 322, 11, 457, 294, 257, 37711, 11, 309, 733, 295, 3263, 411, 50700], "temperature": + 0.0, "avg_logprob": -0.2396084947406121, "compression_ratio": 1.356164383561644, + "no_speech_prob": 0.003481998573988676}, {"id": 36, "seek": 34776, "start": 354.48, + "end": 365.52, "text": " this magical cross model search there. Yes, I expected + for newcomers is a little bit like magic,", "tokens": [50700, 341, 12066, 3278, + 2316, 3164, 456, 13, 1079, 11, 286, 5176, 337, 40014, 433, 307, 257, 707, 857, 411, + 5585, 11, 51252], "temperature": 0.0, "avg_logprob": -0.2396084947406121, "compression_ratio": + 1.356164383561644, "no_speech_prob": 0.003481998573988676}, {"id": 37, "seek": 36552, + "start": 366.08, "end": 381.03999999999996, "text": " but from quite a long time, + we have already been using vector search in the context of image search,", "tokens": + [50392, 457, 490, 1596, 257, 938, 565, 11, 321, 362, 1217, 668, 1228, 8062, 3164, + 294, 264, 4319, 295, 3256, 3164, 11, 51140], "temperature": 0.0, "avg_logprob": + -0.23621658325195313, "compression_ratio": 1.1764705882352942, "no_speech_prob": + 0.005519893951714039}, {"id": 38, "seek": 38104, "start": 381.12, "end": 396.16, + "text": " but in that case, we search for images with a query which is image if + that, but in this case,", "tokens": [50368, 457, 294, 300, 1389, 11, 321, 3164, + 337, 5267, 365, 257, 14581, 597, 307, 3256, 498, 300, 11, 457, 294, 341, 1389, 
11, + 51120], "temperature": 0.0, "avg_logprob": -0.24256067073091547, "compression_ratio": + 1.4296875, "no_speech_prob": 0.0025529435370117426}, {"id": 39, "seek": 38104, "start": + 396.8, "end": 409.76, "text": " we make a connection between two modalities actually. + This is also how our human brain is", "tokens": [51152, 321, 652, 257, 4984, 1296, + 732, 1072, 16110, 767, 13, 639, 307, 611, 577, 527, 1952, 3567, 307, 51800], "temperature": + 0.0, "avg_logprob": -0.24256067073091547, "compression_ratio": 1.4296875, "no_speech_prob": + 0.0025529435370117426}, {"id": 40, "seek": 40976, "start": 409.76, "end": 421.12, + "text": " functioning. For the most of the time, we don''t consume the information + from a single", "tokens": [50364, 18483, 13, 1171, 264, 881, 295, 264, 565, 11, + 321, 500, 380, 14732, 264, 1589, 490, 257, 2167, 50932], "temperature": 0.0, "avg_logprob": + -0.19399917957394622, "compression_ratio": 1.3636363636363635, "no_speech_prob": + 0.00391565402969718}, {"id": 41, "seek": 40976, "start": 421.59999999999997, "end": + 439.36, "text": " modality actually when we try to understand our environment, we + both take it as a visual input", "tokens": [50956, 1072, 1860, 767, 562, 321, 853, + 281, 1223, 527, 2823, 11, 321, 1293, 747, 309, 382, 257, 5056, 4846, 51844], "temperature": + 0.0, "avg_logprob": -0.19399917957394622, "compression_ratio": 1.3636363636363635, + "no_speech_prob": 0.00391565402969718}, {"id": 42, "seek": 43936, "start": 439.36, + "end": 451.92, "text": " and also an audio input and we also talk to people around + them for it gives us a better", "tokens": [50364, 293, 611, 364, 6278, 4846, 293, + 321, 611, 751, 281, 561, 926, 552, 337, 309, 2709, 505, 257, 1101, 50992], "temperature": + 0.0, "avg_logprob": -0.16713650660081344, "compression_ratio": 1.3671875, "no_speech_prob": + 0.007281173951923847}, {"id": 43, "seek": 43936, "start": 452.8, "end": 465.12, + "text": " understanding of the environment. 
So if we want to make our AI smarter, + we also need to", "tokens": [51036, 3701, 295, 264, 2823, 13, 407, 498, 321, 528, + 281, 652, 527, 7318, 20294, 11, 321, 611, 643, 281, 51652], "temperature": 0.0, + "avg_logprob": -0.16713650660081344, "compression_ratio": 1.3671875, "no_speech_prob": + 0.007281173951923847}, {"id": 44, "seek": 46512, "start": 466.0, "end": 477.92, + "text": " help them gain this ability as well. So beyond searching for images with + a textual query,", "tokens": [50408, 854, 552, 6052, 341, 3485, 382, 731, 13, 407, + 4399, 10808, 337, 5267, 365, 257, 2487, 901, 14581, 11, 51004], "temperature": 0.0, + "avg_logprob": -0.2102330525716146, "compression_ratio": 1.3970588235294117, "no_speech_prob": + 0.010370674543082714}, {"id": 45, "seek": 46512, "start": 478.88, "end": 493.12, + "text": " this also helps us to combine information from different sources. So in + this case, maybe we can also", "tokens": [51052, 341, 611, 3665, 505, 281, 10432, + 1589, 490, 819, 7139, 13, 407, 294, 341, 1389, 11, 1310, 321, 393, 611, 51764], + "temperature": 0.0, "avg_logprob": -0.2102330525716146, "compression_ratio": 1.3970588235294117, + "no_speech_prob": 0.010370674543082714}, {"id": 46, "seek": 49312, "start": 493.12, + "end": 504.64, "text": " have AI better understand its environment by combining, + for example, a stream from the", "tokens": [50364, 362, 7318, 1101, 1223, 1080, + 2823, 538, 21928, 11, 337, 1365, 11, 257, 4309, 490, 264, 50940], "temperature": + 0.0, "avg_logprob": -0.27706934276380035, "compression_ratio": 1.3643410852713178, + "no_speech_prob": 0.007500975858420134}, {"id": 47, "seek": 49312, "start": 506.32, + "end": 517.6800000000001, "text": " camera and also maybe an output from a speech + recognition and encoding them into a vector", "tokens": [51024, 2799, 293, 611, + 1310, 364, 5598, 490, 257, 6218, 11150, 293, 43430, 552, 666, 257, 8062, 51592], + "temperature": 0.0, "avg_logprob": -0.27706934276380035, "compression_ratio": 
1.3643410852713178, + "no_speech_prob": 0.007500975858420134}, {"id": 48, "seek": 51768, "start": 518.4799999999999, + "end": 532.3199999999999, "text": " we can combine these two vectors to fit into + that encoder. So this also opens such new opportunities.", "tokens": [50404, 321, + 393, 10432, 613, 732, 18875, 281, 3318, 666, 300, 2058, 19866, 13, 407, 341, 611, + 9870, 1270, 777, 4786, 13, 51096], "temperature": 0.0, "avg_logprob": -0.2814934507329413, + "compression_ratio": 1.3776223776223777, "no_speech_prob": 0.017302606254816055}, + {"id": 49, "seek": 51768, "start": 533.28, "end": 540.88, "text": " Yeah, that''s + a great intro there also like how you gave analogy with how human brain functions,", + "tokens": [51144, 865, 11, 300, 311, 257, 869, 12897, 456, 611, 411, 577, 291, 2729, + 21663, 365, 577, 1952, 3567, 6828, 11, 51524], "temperature": 0.0, "avg_logprob": + -0.2814934507329413, "compression_ratio": 1.3776223776223777, "no_speech_prob": + 0.017302606254816055}, {"id": 50, "seek": 54088, "start": 540.88, "end": 549.2, + "text": " so like how we take so many signals into our decision making. 
And specifically, + like what you", "tokens": [50364, 370, 411, 577, 321, 747, 370, 867, 12354, 666, + 527, 3537, 1455, 13, 400, 4682, 11, 411, 437, 291, 50780], "temperature": 0.0, "avg_logprob": + -0.1895873719367428, "compression_ratio": 1.6057142857142856, "no_speech_prob": + 0.033799298107624054}, {"id": 51, "seek": 54088, "start": 549.2, "end": 556.96, + "text": " mentioned about clip, I like the fact that in practical settings, let''s + say if you have images,", "tokens": [50780, 2835, 466, 7353, 11, 286, 411, 264, + 1186, 300, 294, 8496, 6257, 11, 718, 311, 584, 498, 291, 362, 5267, 11, 51168], + "temperature": 0.0, "avg_logprob": -0.1895873719367428, "compression_ratio": 1.6057142857142856, + "no_speech_prob": 0.033799298107624054}, {"id": 52, "seek": 54088, "start": 556.96, + "end": 562.88, "text": " let''s say of some goods and you want to make a search + in those goods and you also have some", "tokens": [51168, 718, 311, 584, 295, 512, + 10179, 293, 291, 528, 281, 652, 257, 3164, 294, 729, 10179, 293, 291, 611, 362, + 512, 51464], "temperature": 0.0, "avg_logprob": -0.1895873719367428, "compression_ratio": + 1.6057142857142856, "no_speech_prob": 0.033799298107624054}, {"id": 53, "seek": + 56288, "start": 562.88, "end": 571.6, "text": " metadata, let''s say titles or descriptions, + right? It may be that some human decided what to put", "tokens": [50364, 26603, + 11, 718, 311, 584, 12992, 420, 24406, 11, 558, 30, 467, 815, 312, 300, 512, 1952, + 3047, 437, 281, 829, 50800], "temperature": 0.0, "avg_logprob": -0.13722216116415487, + "compression_ratio": 1.5824175824175823, "no_speech_prob": 0.0037804818712174892}, + {"id": 54, "seek": 56288, "start": 571.6, "end": 576.96, "text": " in that text, + but they didn''t put everything that there is on the image, right? 
And so I think", + "tokens": [50800, 294, 300, 2487, 11, 457, 436, 994, 380, 829, 1203, 300, 456, 307, + 322, 264, 3256, 11, 558, 30, 400, 370, 286, 519, 51068], "temperature": 0.0, "avg_logprob": + -0.13722216116415487, "compression_ratio": 1.5824175824175823, "no_speech_prob": + 0.0037804818712174892}, {"id": 55, "seek": 56288, "start": 576.96, "end": 584.64, + "text": " clip helps us to find sort of semantics that''s hidden inside the image + itself, right? So I think", "tokens": [51068, 7353, 3665, 505, 281, 915, 1333, 295, + 4361, 45298, 300, 311, 7633, 1854, 264, 3256, 2564, 11, 558, 30, 407, 286, 519, + 51452], "temperature": 0.0, "avg_logprob": -0.13722216116415487, "compression_ratio": + 1.5824175824175823, "no_speech_prob": 0.0037804818712174892}, {"id": 56, "seek": + 58464, "start": 584.64, "end": 589.36, "text": " that''s kind of like has practical + impact on what we built.", "tokens": [50364, 300, 311, 733, 295, 411, 575, 8496, + 2712, 322, 437, 321, 3094, 13, 50600], "temperature": 0.0, "avg_logprob": -0.2164562463760376, + "compression_ratio": 1.2704918032786885, "no_speech_prob": 0.004923407919704914}, + {"id": 57, "seek": 58464, "start": 591.6, "end": 602.4, "text": " Yeah, exactly. 
+ Actually, in the traditional source, for example, let''s get the product source + as", "tokens": [50712, 865, 11, 2293, 13, 5135, 11, 294, 264, 5164, 4009, 11, 337, + 1365, 11, 718, 311, 483, 264, 1674, 4009, 382, 51252], "temperature": 0.0, "avg_logprob": + -0.2164562463760376, "compression_ratio": 1.2704918032786885, "no_speech_prob": + 0.004923407919704914}, {"id": 58, "seek": 60240, "start": 602.72, "end": 615.4399999999999, + "text": " example, when you want to develop a product source for, for example, an + e-commerce website,", "tokens": [50380, 1365, 11, 562, 291, 528, 281, 1499, 257, + 1674, 4009, 337, 11, 337, 1365, 11, 364, 308, 12, 26926, 3144, 11, 51016], "temperature": + 0.0, "avg_logprob": -0.1745330344798953, "compression_ratio": 1.4132231404958677, + "no_speech_prob": 0.006605126429349184}, {"id": 59, "seek": 60240, "start": 615.4399999999999, + "end": 625.04, "text": " you need to enter different terms that can define that + product to have a user''s", "tokens": [51016, 291, 643, 281, 3242, 819, 2115, 300, + 393, 6964, 300, 1674, 281, 362, 257, 4195, 311, 51496], "temperature": 0.0, "avg_logprob": + -0.1745330344798953, "compression_ratio": 1.4132231404958677, "no_speech_prob": + 0.006605126429349184}, {"id": 60, "seek": 62504, "start": 625.04, "end": 637.92, + "text": " find that product with different wording, but this is not so practical + because people use very", "tokens": [50364, 915, 300, 1674, 365, 819, 47602, 11, + 457, 341, 307, 406, 370, 8496, 570, 561, 764, 588, 51008], "temperature": 0.0, "avg_logprob": + -0.17904976436070033, "compression_ratio": 1.3880597014925373, "no_speech_prob": + 0.0028693124186247587}, {"id": 61, "seek": 62504, "start": 637.92, "end": 650.0, + "text": " different terms to refer to things. 
And you in the current capacity of + e-commerce websites,", "tokens": [51008, 819, 2115, 281, 2864, 281, 721, 13, 400, + 291, 294, 264, 2190, 6042, 295, 308, 12, 26926, 12891, 11, 51612], "temperature": + 0.0, "avg_logprob": -0.17904976436070033, "compression_ratio": 1.3880597014925373, + "no_speech_prob": 0.0028693124186247587}, {"id": 62, "seek": 65000, "start": 650.0, + "end": 663.28, "text": " we have hundreds of thousands of products and they also + need to be updated once you add new", "tokens": [50364, 321, 362, 6779, 295, 5383, + 295, 3383, 293, 436, 611, 643, 281, 312, 10588, 1564, 291, 909, 777, 51028], "temperature": + 0.0, "avg_logprob": -0.22537077040899367, "compression_ratio": 1.4634146341463414, + "no_speech_prob": 0.01681429147720337}, {"id": 63, "seek": 65000, "start": 663.28, + "end": 677.28, "text": " products and remove new products. And also like myths acted + at typos to this complexity,", "tokens": [51028, 3383, 293, 4159, 777, 3383, 13, + 400, 611, 411, 28205, 20359, 412, 2125, 329, 281, 341, 14024, 11, 51728], "temperature": + 0.0, "avg_logprob": -0.22537077040899367, "compression_ratio": 1.4634146341463414, + "no_speech_prob": 0.01681429147720337}, {"id": 64, "seek": 67728, "start": 678.0, + "end": 688.8, "text": " is actually explored to millions, maybe a tens of millions + of possibilities. This is beyond the", "tokens": [50400, 307, 767, 24016, 281, 6803, + 11, 1310, 257, 10688, 295, 6803, 295, 12178, 13, 639, 307, 4399, 264, 50940], "temperature": + 0.0, "avg_logprob": -0.26218256839486054, "compression_ratio": 1.4444444444444444, + "no_speech_prob": 0.019165612757205963}, {"id": 65, "seek": 67728, "start": 688.8, + "end": 703.04, "text": " power of humans actually. 
But once you make connections, + make a connection between text and images,", "tokens": [50940, 1347, 295, 6255, + 767, 13, 583, 1564, 291, 652, 9271, 11, 652, 257, 4984, 1296, 2487, 293, 5267, 11, + 51652], "temperature": 0.0, "avg_logprob": -0.26218256839486054, "compression_ratio": + 1.4444444444444444, "no_speech_prob": 0.019165612757205963}, {"id": 66, "seek": + 70304, "start": 703.04, "end": 716.4, "text": " you don''t need to enter such descriptive + text, you only encode images into vectors and index", "tokens": [50364, 291, 500, + 380, 643, 281, 3242, 1270, 42585, 2487, 11, 291, 787, 2058, 1429, 5267, 666, 18875, + 293, 8186, 51032], "temperature": 0.0, "avg_logprob": -0.2243068573322702, "compression_ratio": + 1.4427480916030535, "no_speech_prob": 0.005336532834917307}, {"id": 67, "seek": + 70304, "start": 716.4, "end": 726.4, "text": " time into a vector database. Then + in the inference time, all you need is just encode the textual", "tokens": [51032, + 565, 666, 257, 8062, 8149, 13, 1396, 294, 264, 38253, 565, 11, 439, 291, 643, 307, + 445, 2058, 1429, 264, 2487, 901, 51532], "temperature": 0.0, "avg_logprob": -0.2243068573322702, + "compression_ratio": 1.4427480916030535, "no_speech_prob": 0.005336532834917307}, + {"id": 68, "seek": 72640, "start": 726.88, "end": 736.9599999999999, "text": " input + as well and create that pre-indexed database to get similar results. 
Actually, this + also", "tokens": [50388, 4846, 382, 731, 293, 1884, 300, 659, 12, 471, 3121, 292, + 8149, 281, 483, 2531, 3542, 13, 5135, 11, 341, 611, 50892], "temperature": 0.0, + "avg_logprob": -0.24781280093722874, "compression_ratio": 1.2905982905982907, "no_speech_prob": + 0.0028040348552167416}, {"id": 69, "seek": 72640, "start": 738.8, "end": 743.28, + "text": " buildings new opportunities, for example, people usually", "tokens": [50984, + 7446, 777, 4786, 11, 337, 1365, 11, 561, 2673, 51208], "temperature": 0.0, "avg_logprob": + -0.24781280093722874, "compression_ratio": 1.2905982905982907, "no_speech_prob": + 0.0028040348552167416}, {"id": 70, "seek": 74328, "start": 743.36, "end": 758.4, + "text": " enter some pre-defined textual descriptors in this search engines, but + some new products may have", "tokens": [50368, 3242, 512, 659, 12, 37716, 2487, + 901, 31280, 830, 294, 341, 3164, 12982, 11, 457, 512, 777, 3383, 815, 362, 51120], + "temperature": 0.0, "avg_logprob": -0.34983181953430176, "compression_ratio": 1.1547619047619047, + "no_speech_prob": 0.00809534639120102}, {"id": 71, "seek": 75840, "start": 759.1999999999999, + "end": 773.1999999999999, "text": " brand new features that people are not accustomed + to. 
So even in this case, our vector search based", "tokens": [50404, 3360, 777, + 4122, 300, 561, 366, 406, 35980, 281, 13, 407, 754, 294, 341, 1389, 11, 527, 8062, + 3164, 2361, 51104], "temperature": 0.0, "avg_logprob": -0.22027762234210968, "compression_ratio": + 1.5056179775280898, "no_speech_prob": 0.005204358603805304}, {"id": 72, "seek": + 75840, "start": 774.8, "end": 780.8, "text": " solution that combines images and + text can be in that image as well.", "tokens": [51184, 3827, 300, 29520, 5267, 293, + 2487, 393, 312, 294, 300, 3256, 382, 731, 13, 51484], "temperature": 0.0, "avg_logprob": + -0.22027762234210968, "compression_ratio": 1.5056179775280898, "no_speech_prob": + 0.005204358603805304}, {"id": 73, "seek": 75840, "start": 782.0, "end": 787.12, + "text": " Yeah, that sounds cool. So it kind of opens up a lot of opportunities + that didn''t exist before when", "tokens": [51544, 865, 11, 300, 3263, 1627, 13, + 407, 309, 733, 295, 9870, 493, 257, 688, 295, 4786, 300, 994, 380, 2514, 949, 562, + 51800], "temperature": 0.0, "avg_logprob": -0.22027762234210968, "compression_ratio": + 1.5056179775280898, "no_speech_prob": 0.005204358603805304}, {"id": 74, "seek": + 78712, "start": 787.68, "end": 795.44, "text": " we modeled our object purely through + textual representation. 
Maybe somebody did attempt to", "tokens": [50392, 321, 37140, + 527, 2657, 17491, 807, 2487, 901, 10290, 13, 2704, 2618, 630, 5217, 281, 50780], + "temperature": 0.0, "avg_logprob": -0.18337147376116583, "compression_ratio": 1.4263959390862944, + "no_speech_prob": 0.002145607490092516}, {"id": 75, "seek": 78712, "start": 795.44, + "end": 801.2, "text": " also encode images of some other binary format, but I think + maybe it wasn''t as efficient or", "tokens": [50780, 611, 2058, 1429, 5267, 295, + 512, 661, 17434, 7877, 11, 457, 286, 519, 1310, 309, 2067, 380, 382, 7148, 420, + 51068], "temperature": 0.0, "avg_logprob": -0.18337147376116583, "compression_ratio": + 1.4263959390862944, "no_speech_prob": 0.002145607490092516}, {"id": 76, "seek": + 78712, "start": 802.08, "end": 811.84, "text": " definitely not multi-model. So + that sounds so cool. And so how do you connect? Where do you start?", "tokens": + [51112, 2138, 406, 4825, 12, 8014, 338, 13, 407, 300, 3263, 370, 1627, 13, 400, + 370, 577, 360, 291, 1745, 30, 2305, 360, 291, 722, 30, 51600], "temperature": 0.0, + "avg_logprob": -0.18337147376116583, "compression_ratio": 1.4263959390862944, "no_speech_prob": + 0.002145607490092516}, {"id": 77, "seek": 81184, "start": 811.84, "end": 818.0, + "text": " Usually, let''s say if you have a data set, right? And you want to implement + neural search", "tokens": [50364, 11419, 11, 718, 311, 584, 498, 291, 362, 257, + 1412, 992, 11, 558, 30, 400, 291, 528, 281, 4445, 18161, 3164, 50672], "temperature": + 0.0, "avg_logprob": -0.15680377403002108, "compression_ratio": 1.6519823788546255, + "no_speech_prob": 0.0053307972848415375}, {"id": 78, "seek": 81184, "start": 818.5600000000001, + "end": 827.0400000000001, "text": " experience. 
At one point of time, do you start + thinking about what the metric is the best for", "tokens": [50700, 1752, 13, 1711, + 472, 935, 295, 565, 11, 360, 291, 722, 1953, 466, 437, 264, 20678, 307, 264, 1151, + 337, 51124], "temperature": 0.0, "avg_logprob": -0.15680377403002108, "compression_ratio": + 1.6519823788546255, "no_speech_prob": 0.0053307972848415375}, {"id": 79, "seek": + 81184, "start": 827.0400000000001, "end": 833.2, "text": " my data set? And also, + how do you approach it from which angle do you usually approach this?", "tokens": + [51124, 452, 1412, 992, 30, 400, 611, 11, 577, 360, 291, 3109, 309, 490, 597, 5802, + 360, 291, 2673, 3109, 341, 30, 51432], "temperature": 0.0, "avg_logprob": -0.15680377403002108, + "compression_ratio": 1.6519823788546255, "no_speech_prob": 0.0053307972848415375}, + {"id": 80, "seek": 81184, "start": 833.2, "end": 837.76, "text": " And this is something + that really helps you to hear your theoretical as well as practical thoughts", "tokens": + [51432, 400, 341, 307, 746, 300, 534, 3665, 291, 281, 1568, 428, 20864, 382, 731, + 382, 8496, 4598, 51660], "temperature": 0.0, "avg_logprob": -0.15680377403002108, + "compression_ratio": 1.6519823788546255, "no_speech_prob": 0.0053307972848415375}, + {"id": 81, "seek": 83776, "start": 837.76, "end": 847.92, "text": " of this. 
Yes, + you''re actually there are lots of very different techniques and", "tokens": [50364, + 295, 341, 13, 1079, 11, 291, 434, 767, 456, 366, 3195, 295, 588, 819, 7512, 293, + 50872], "temperature": 0.0, "avg_logprob": -0.2619859544854415, "compression_ratio": + 1.328125, "no_speech_prob": 0.004639596678316593}, {"id": 82, "seek": 83776, "start": + 848.56, "end": 859.12, "text": " methods and approaches to metric learning that + can work for some specific types of problems.", "tokens": [50904, 7150, 293, 11587, + 281, 20678, 2539, 300, 393, 589, 337, 512, 2685, 3467, 295, 2740, 13, 51432], "temperature": + 0.0, "avg_logprob": -0.2619859544854415, "compression_ratio": 1.328125, "no_speech_prob": + 0.004639596678316593}, {"id": 83, "seek": 85912, "start": 859.84, "end": 870.24, + "text": " But in my practical experience, I usually begin with with with an auto + encoder, because it''s", "tokens": [50400, 583, 294, 452, 8496, 1752, 11, 286, 2673, + 1841, 365, 365, 365, 364, 8399, 2058, 19866, 11, 570, 309, 311, 50920], "temperature": + 0.0, "avg_logprob": -0.22210170911706012, "compression_ratio": 1.3880597014925373, + "no_speech_prob": 0.013364981859922409}, {"id": 84, "seek": 85912, "start": 872.0, + "end": 886.5600000000001, "text": " already very easy to implement and easy to train. 
+ It can be applied to almost any data track.", "tokens": [51008, 1217, 588, 1858, + 281, 4445, 293, 1858, 281, 3847, 13, 467, 393, 312, 6456, 281, 1920, 604, 1412, + 2837, 13, 51736], "temperature": 0.0, "avg_logprob": -0.22210170911706012, "compression_ratio": + 1.3880597014925373, "no_speech_prob": 0.013364981859922409}, {"id": 85, "seek": + 88656, "start": 887.4399999999999, "end": 896.64, "text": " Basically, in auto encoders, + we have two models and encoder and the encoders.", "tokens": [50408, 8537, 11, 294, + 8399, 2058, 378, 433, 11, 321, 362, 732, 5245, 293, 2058, 19866, 293, 264, 2058, + 378, 433, 13, 50868], "temperature": 0.0, "avg_logprob": -0.2684889923442494, "compression_ratio": + 1.4234234234234233, "no_speech_prob": 0.008925629779696465}, {"id": 86, "seek": + 88656, "start": 897.4399999999999, "end": 909.28, "text": " The encoders part encodes + samples into an dimensional vector. This and should be", "tokens": [50908, 440, + 2058, 378, 433, 644, 2058, 4789, 10938, 666, 364, 18795, 8062, 13, 639, 293, 820, + 312, 51500], "temperature": 0.0, "avg_logprob": -0.2684889923442494, "compression_ratio": + 1.4234234234234233, "no_speech_prob": 0.008925629779696465}, {"id": 87, "seek": + 90928, "start": 910.24, "end": 921.1999999999999, "text": " much lower than the + dimensionality of the input sample. And the decoder is supposed to", "tokens": [50412, + 709, 3126, 813, 264, 10139, 1860, 295, 264, 4846, 6889, 13, 400, 264, 979, 19866, + 307, 3442, 281, 50960], "temperature": 0.0, "avg_logprob": -0.24122115543910436, + "compression_ratio": 1.4444444444444444, "no_speech_prob": 0.006070391740649939}, + {"id": 88, "seek": 90928, "start": 921.92, "end": 936.72, "text": " reconstruct + the input sample when this encoded vector is given to it. 
So this is a", "tokens": + [50996, 31499, 264, 4846, 6889, 562, 341, 2058, 12340, 8062, 307, 2212, 281, 309, + 13, 407, 341, 307, 257, 51736], "temperature": 0.0, "avg_logprob": -0.24122115543910436, + "compression_ratio": 1.4444444444444444, "no_speech_prob": 0.006070391740649939}, + {"id": 89, "seek": 93672, "start": 937.6800000000001, "end": 949.28, "text": " the + self-provised method. So it can be applied to any type of data set. You don''t need + labels.", "tokens": [50412, 264, 2698, 12, 4318, 24420, 3170, 13, 407, 309, 393, + 312, 6456, 281, 604, 2010, 295, 1412, 992, 13, 509, 500, 380, 643, 16949, 13, 50992], + "temperature": 0.0, "avg_logprob": -0.33323415120442706, "compression_ratio": 1.2857142857142858, + "no_speech_prob": 0.012670663185417652}, {"id": 90, "seek": 93672, "start": 950.88, + "end": 963.6, "text": " It usually gives a very good resource. After training such + a model, you can visualize", "tokens": [51072, 467, 2673, 2709, 257, 588, 665, 7684, + 13, 2381, 3097, 1270, 257, 2316, 11, 291, 393, 23273, 51708], "temperature": 0.0, + "avg_logprob": -0.33323415120442706, "compression_ratio": 1.2857142857142858, "no_speech_prob": + 0.012670663185417652}, {"id": 91, "seek": 96360, "start": 964.0, "end": 976.5600000000001, + "text": " embedding. We call the output of the encoders, vectors embedding. So you + can visualize such embedding", "tokens": [50384, 12240, 3584, 13, 492, 818, 264, + 5598, 295, 264, 2058, 378, 433, 11, 18875, 12240, 3584, 13, 407, 291, 393, 23273, + 1270, 12240, 3584, 51012], "temperature": 0.0, "avg_logprob": -0.269812992640904, + "compression_ratio": 1.406015037593985, "no_speech_prob": 0.010135110467672348}, + {"id": 92, "seek": 96360, "start": 976.5600000000001, "end": 990.0, "text": " with + a tool. 
This tool can be, for example, TensorFlow projectors and another tool by", + "tokens": [51012, 365, 257, 2290, 13, 639, 2290, 393, 312, 11, 337, 1365, 11, 37624, + 1716, 830, 293, 1071, 2290, 538, 51684], "temperature": 0.0, "avg_logprob": -0.269812992640904, + "compression_ratio": 1.406015037593985, "no_speech_prob": 0.010135110467672348}, + {"id": 93, "seek": 99000, "start": 990.0, "end": 998.4, "text": " Yubach. I just + couldn''t show my word in there. Sorry. No worries. We can find those links later, + I guess.", "tokens": [50364, 398, 836, 608, 13, 286, 445, 2809, 380, 855, 452, 1349, + 294, 456, 13, 4919, 13, 883, 16340, 13, 492, 393, 915, 729, 6123, 1780, 11, 286, + 2041, 13, 50784], "temperature": 0.0, "avg_logprob": -0.4937224881402377, "compression_ratio": + 1.355263157894737, "no_speech_prob": 0.006493051536381245}, {"id": 94, "seek": 99000, + "start": 999.6, "end": 1013.28, "text": " Yeah, we can put a link in the description. + And this visualization tools have us see if our encoders", "tokens": [50844, 865, + 11, 321, 393, 829, 257, 2113, 294, 264, 3855, 13, 400, 341, 25801, 3873, 362, 505, + 536, 498, 527, 2058, 378, 433, 51528], "temperature": 0.0, "avg_logprob": -0.4937224881402377, + "compression_ratio": 1.355263157894737, "no_speech_prob": 0.006493051536381245}, + {"id": 95, "seek": 101328, "start": 1013.68, "end": 1026.56, "text": " really involve + similar samples need to each closer to each other than the similar ones.", "tokens": + [50384, 534, 9494, 2531, 10938, 643, 281, 1184, 4966, 281, 1184, 661, 813, 264, + 2531, 2306, 13, 51028], "temperature": 0.0, "avg_logprob": -0.24338811509152677, + "compression_ratio": 1.464, "no_speech_prob": 0.060637082904577255}, {"id": 96, + "seek": 101328, "start": 1027.84, "end": 1038.16, "text": " If it is, we can use + this encoders part. 
We can just dispose the decoder part and we can simply", "tokens": + [51092, 759, 309, 307, 11, 321, 393, 764, 341, 2058, 378, 433, 644, 13, 492, 393, + 445, 42537, 264, 979, 19866, 644, 293, 321, 393, 2935, 51608], "temperature": 0.0, + "avg_logprob": -0.24338811509152677, "compression_ratio": 1.464, "no_speech_prob": + 0.060637082904577255}, {"id": 97, "seek": 103816, "start": 1039.0400000000002, "end": + 1045.44, "text": " keep the encoder part and use it to encode our samples and index + them in the", "tokens": [50408, 1066, 264, 2058, 19866, 644, 293, 764, 309, 281, + 2058, 1429, 527, 10938, 293, 8186, 552, 294, 264, 50728], "temperature": 0.0, "avg_logprob": + -0.18929408146784857, "compression_ratio": 1.3214285714285714, "no_speech_prob": + 0.011630809865891933}, {"id": 98, "seek": 103816, "start": 1046.3200000000002, "end": + 1058.0800000000002, "text": " vector. And we can already start searching semantics. + But we usually do", "tokens": [50772, 8062, 13, 400, 321, 393, 1217, 722, 10808, + 4361, 45298, 13, 583, 321, 2673, 360, 51360], "temperature": 0.0, "avg_logprob": + -0.18929408146784857, "compression_ratio": 1.3214285714285714, "no_speech_prob": + 0.011630809865891933}, {"id": 99, "seek": 105808, "start": 1058.08, "end": 1069.12, + "text": " buzzers than this one with only small set of labeled data. And you actually + need only", "tokens": [50364, 13036, 433, 813, 341, 472, 365, 787, 1359, 992, 295, + 21335, 1412, 13, 400, 291, 767, 643, 787, 50916], "temperature": 0.0, "avg_logprob": + -0.23537533623831614, "compression_ratio": 1.3858267716535433, "no_speech_prob": + 0.01939673162996769}, {"id": 100, "seek": 105808, "start": 1070.8799999999999, "end": + 1079.1999999999998, "text": " a few with that one. 
Actually, we are preparing some + publications to demonstrate this one.", "tokens": [51004, 257, 1326, 365, 300, 472, + 13, 5135, 11, 321, 366, 10075, 512, 25618, 281, 11698, 341, 472, 13, 51420], "temperature": + 0.0, "avg_logprob": -0.23537533623831614, "compression_ratio": 1.3858267716535433, + "no_speech_prob": 0.01939673162996769}, {"id": 101, "seek": 107920, "start": 1079.68, + "end": 1095.3600000000001, "text": " After you train and encoders with a considerable + number of unlabeled data, all you need to do is", "tokens": [50388, 2381, 291, 3847, + 293, 2058, 378, 433, 365, 257, 24167, 1230, 295, 32118, 18657, 292, 1412, 11, 439, + 291, 643, 281, 360, 307, 51172], "temperature": 0.0, "avg_logprob": -0.4124216842651367, + "compression_ratio": 1.4511278195488722, "no_speech_prob": 0.020865069702267647}, + {"id": 102, "seek": 107920, "start": 1096.96, "end": 1108.64, "text": " just to + find to in it with a small set of labeled data. On the supervised site, there are + really", "tokens": [51252, 445, 281, 915, 281, 294, 309, 365, 257, 1359, 992, 295, + 21335, 1412, 13, 1282, 264, 46533, 3621, 11, 456, 366, 534, 51836], "temperature": + 0.0, "avg_logprob": -0.4124216842651367, "compression_ratio": 1.4511278195488722, + "no_speech_prob": 0.020865069702267647}, {"id": 103, "seek": 110920, "start": 1109.6000000000001, + "end": 1121.76, "text": " quite a number of very different approaches to matrix + learning from more traditional margin-based", "tokens": [50384, 1596, 257, 1230, + 295, 588, 819, 11587, 281, 8141, 2539, 490, 544, 5164, 10270, 12, 6032, 50992], + "temperature": 0.0, "avg_logprob": -0.26663709298158306, "compression_ratio": 1.4661654135338347, + "no_speech_prob": 0.004581479821354151}, {"id": 104, "seek": 110920, "start": 1122.48, + "end": 1138.24, "text": " approaches to newer categorization-based approaches. 
And + actually, they deserve a long discussion", "tokens": [51028, 11587, 281, 17628, + 19250, 2144, 12, 6032, 11587, 13, 400, 767, 11, 436, 9948, 257, 938, 5017, 51816], + "temperature": 0.0, "avg_logprob": -0.26663709298158306, "compression_ratio": 1.4661654135338347, + "no_speech_prob": 0.004581479821354151}, {"id": 105, "seek": 113824, "start": 1138.24, + "end": 1144.88, "text": " of data. For sure. Yeah, that''s awesome. But just to + unpack it a little bit, so", "tokens": [50364, 295, 1412, 13, 1171, 988, 13, 865, + 11, 300, 311, 3476, 13, 583, 445, 281, 26699, 309, 257, 707, 857, 11, 370, 50696], + "temperature": 0.0, "avg_logprob": -0.3364786207675934, "compression_ratio": 1.4911242603550297, + "no_speech_prob": 0.0079282121732831}, {"id": 106, "seek": 113824, "start": 1144.88, + "end": 1152.0, "text": " in a natural metric learning process allows me to learn + the optimal distance metric for my data.", "tokens": [50696, 294, 257, 3303, 20678, + 2539, 1399, 4045, 385, 281, 1466, 264, 16252, 4560, 20678, 337, 452, 1412, 13, 51052], + "temperature": 0.0, "avg_logprob": -0.3364786207675934, "compression_ratio": 1.4911242603550297, + "no_speech_prob": 0.0079282121732831}, {"id": 107, "seek": 113824, "start": 1152.8, + "end": 1158.48, "text": " So it''s kind of like a function of my dataset properties, + inner properties.", "tokens": [51092, 407, 309, 311, 733, 295, 411, 257, 2445, 295, + 452, 28872, 7221, 11, 7284, 7221, 13, 51376], "temperature": 0.0, "avg_logprob": + -0.3364786207675934, "compression_ratio": 1.4911242603550297, "no_speech_prob": + 0.0079282121732831}, {"id": 108, "seek": 115848, "start": 1159.3600000000001, "end": + 1166.56, "text": " Yeah, actually, let''s clarify this metric thing. 
What does it + mean in this context?", "tokens": [50408, 865, 11, 767, 11, 718, 311, 17594, 341, + 20678, 551, 13, 708, 775, 309, 914, 294, 341, 4319, 30, 50768], "temperature": 0.0, + "avg_logprob": -0.1807206796140087, "compression_ratio": 1.3333333333333333, "no_speech_prob": + 0.017488857731223106}, {"id": 109, "seek": 115848, "start": 1167.84, "end": 1181.2, + "text": " In this context, a metric is a non-negative function with two inputs. + Let''s say X and Y.", "tokens": [50832, 682, 341, 4319, 11, 257, 20678, 307, 257, + 2107, 12, 28561, 1166, 2445, 365, 732, 15743, 13, 961, 311, 584, 1783, 293, 398, + 13, 51500], "temperature": 0.0, "avg_logprob": -0.1807206796140087, "compression_ratio": + 1.3333333333333333, "no_speech_prob": 0.017488857731223106}, {"id": 110, "seek": + 118120, "start": 1181.76, "end": 1190.88, "text": " And it is used to measure what + is called the distance between X and Y.", "tokens": [50392, 400, 309, 307, 1143, + 281, 3481, 437, 307, 1219, 264, 4560, 1296, 1783, 293, 398, 13, 50848], "temperature": + 0.0, "avg_logprob": -0.16151898946517554, "compression_ratio": 1.2522522522522523, + "no_speech_prob": 0.00732125248759985}, {"id": 111, "seek": 118120, "start": 1192.72, + "end": 1202.8, "text": " When we feed such two inputs, it gives us a scaler''s positive + value.", "tokens": [50940, 1133, 321, 3154, 1270, 732, 15743, 11, 309, 2709, 505, + 257, 15664, 260, 311, 3353, 2158, 13, 51444], "temperature": 0.0, "avg_logprob": + -0.16151898946517554, "compression_ratio": 1.2522522522522523, "no_speech_prob": + 0.00732125248759985}, {"id": 112, "seek": 120280, "start": 1203.6, "end": 1216.1599999999999, + "text": " If this value is closer to zero, then we can assume that those two inputs + are more", "tokens": [50404, 759, 341, 2158, 307, 4966, 281, 4018, 11, 550, 321, + 393, 6552, 300, 729, 732, 15743, 366, 544, 51032], "temperature": 0.0, "avg_logprob": + -0.2132264773050944, "compression_ratio": 1.408, "no_speech_prob": 
0.011007328517735004}, + {"id": 113, "seek": 120280, "start": 1216.1599999999999, "end": 1230.0, "text": + " similar to each other with two inputs with a higher distance value. So our whole + objective in", "tokens": [51032, 2531, 281, 1184, 661, 365, 732, 15743, 365, 257, + 2946, 4560, 2158, 13, 407, 527, 1379, 10024, 294, 51724], "temperature": 0.0, "avg_logprob": + -0.2132264773050944, "compression_ratio": 1.408, "no_speech_prob": 0.011007328517735004}, + {"id": 114, "seek": 123000, "start": 1230.0, "end": 1245.28, "text": " metric learning + is to train functions that can give this distance value. On the practical", "tokens": + [50364, 20678, 2539, 307, 281, 3847, 6828, 300, 393, 976, 341, 4560, 2158, 13, 1282, + 264, 8496, 51128], "temperature": 0.0, "avg_logprob": -0.21928277015686035, "compression_ratio": + 1.141025641025641, "no_speech_prob": 0.0022721486166119576}, {"id": 115, "seek": + 124528, "start": 1246.24, "end": 1256.72, "text": " site, we usually train a model + that outputs a vector and a dimensional vector.", "tokens": [50412, 3621, 11, 321, + 2673, 3847, 257, 2316, 300, 23930, 257, 8062, 293, 257, 18795, 8062, 13, 50936], + "temperature": 0.0, "avg_logprob": -0.28207188844680786, "compression_ratio": 1.280373831775701, + "no_speech_prob": 0.009226060472428799}, {"id": 116, "seek": 124528, "start": 1258.08, + "end": 1264.24, "text": " And then we can apply different distance functions such + as", "tokens": [51004, 400, 550, 321, 393, 3079, 819, 4560, 6828, 1270, 382, 51312], + "temperature": 0.0, "avg_logprob": -0.28207188844680786, "compression_ratio": 1.280373831775701, + "no_speech_prob": 0.009226060472428799}, {"id": 117, "seek": 126424, "start": 1265.2, + "end": 1276.64, "text": " Euclidean and cosine distance to get a measurement of + the distance value.", "tokens": [50412, 462, 1311, 31264, 282, 293, 23565, 4560, + 281, 483, 257, 13160, 295, 264, 4560, 2158, 13, 50984], "temperature": 0.0, "avg_logprob": + -0.17669592405620374, 
"compression_ratio": 1.4051724137931034, "no_speech_prob": + 0.005759425926953554}, {"id": 118, "seek": 126424, "start": 1278.0, "end": 1288.16, + "text": " There is also a term deep metric learning. Actually, the traditional metric + learning uses", "tokens": [51052, 821, 307, 611, 257, 1433, 2452, 20678, 2539, 13, + 5135, 11, 264, 5164, 20678, 2539, 4960, 51560], "temperature": 0.0, "avg_logprob": + -0.17669592405620374, "compression_ratio": 1.4051724137931034, "no_speech_prob": + 0.005759425926953554}, {"id": 119, "seek": 128816, "start": 1288.8000000000002, + "end": 1297.1200000000001, "text": " some linear transformations to project samples + into an dimensional", "tokens": [50396, 512, 8213, 34852, 281, 1716, 10938, 666, + 364, 18795, 50812], "temperature": 0.0, "avg_logprob": -0.16697173118591307, "compression_ratio": + 1.4220183486238531, "no_speech_prob": 0.00545174814760685}, {"id": 120, "seek": + 128816, "start": 1298.88, "end": 1310.48, "text": " feature space to apply a metric + function. But this linear aspect of such transformations", "tokens": [50900, 4111, + 1901, 281, 3079, 257, 20678, 2445, 13, 583, 341, 8213, 4171, 295, 1270, 34852, 51480], + "temperature": 0.0, "avg_logprob": -0.16697173118591307, "compression_ratio": 1.4220183486238531, + "no_speech_prob": 0.00545174814760685}, {"id": 121, "seek": 131048, "start": 1310.48, + "end": 1325.2, "text": " limits the use of traditional metric learning using time + with more richers, data types,", "tokens": [50364, 10406, 264, 764, 295, 5164, 20678, + 2539, 1228, 565, 365, 544, 4593, 433, 11, 1412, 3467, 11, 51100], "temperature": + 0.0, "avg_logprob": -0.42113206444717033, "compression_ratio": 1.4566929133858268, + "no_speech_prob": 0.00289943628013134}, {"id": 122, "seek": 131048, "start": 1325.2, + "end": 1336.08, "text": " for example, images and texts. 
So deep metric learning + benefits from the methods of deep learning", "tokens": [51100, 337, 1365, 11, 5267, + 293, 15765, 13, 407, 2452, 20678, 2539, 5311, 490, 264, 7150, 295, 2452, 2539, 51644], + "temperature": 0.0, "avg_logprob": -0.42113206444717033, "compression_ratio": 1.4566929133858268, + "no_speech_prob": 0.00289943628013134}, {"id": 123, "seek": 133608, "start": 1336.24, + "end": 1349.4399999999998, "text": " to learn non-linear transformations to project + samples into a new and dimensional vector space.", "tokens": [50372, 281, 1466, + 2107, 12, 28263, 34852, 281, 1716, 10938, 666, 257, 777, 293, 18795, 8062, 1901, + 13, 51032], "temperature": 0.0, "avg_logprob": -0.16175737613584937, "compression_ratio": + 1.3555555555555556, "no_speech_prob": 0.008436362259089947}, {"id": 124, "seek": + 133608, "start": 1350.6399999999999, "end": 1361.28, "text": " But in this context, + I usually use metric learning as an umbrella term to refer to both", "tokens": [51092, + 583, 294, 341, 4319, 11, 286, 2673, 764, 20678, 2539, 382, 364, 21925, 1433, 281, + 2864, 281, 1293, 51624], "temperature": 0.0, "avg_logprob": -0.16175737613584937, + "compression_ratio": 1.3555555555555556, "no_speech_prob": 0.008436362259089947}, + {"id": 125, "seek": 136128, "start": 1361.36, "end": 1370.3999999999999, "text": + " traditional metric learning and deep metric learning. 
Just like we do with machine + learning", "tokens": [50368, 5164, 20678, 2539, 293, 2452, 20678, 2539, 13, 1449, + 411, 321, 360, 365, 3479, 2539, 50820], "temperature": 0.0, "avg_logprob": -0.3222321485861754, + "compression_ratio": 1.7079207920792079, "no_speech_prob": 0.009160725399851799}, + {"id": 126, "seek": 136128, "start": 1370.3999999999999, "end": 1374.32, "text": + " to refer to both classical machine learning and deep learning.", "tokens": [50820, + 281, 2864, 281, 1293, 13735, 3479, 2539, 293, 2452, 2539, 13, 51016], "temperature": + 0.0, "avg_logprob": -0.3222321485861754, "compression_ratio": 1.7079207920792079, + "no_speech_prob": 0.009160725399851799}, {"id": 127, "seek": 136128, "start": 1374.32, + "end": 1383.2, "text": " Yeah, that makes sense. Thank you. And so essentially, + in the lay main terms, deep learning allows us to", "tokens": [51016, 865, 11, 300, + 1669, 2020, 13, 1044, 291, 13, 400, 370, 4476, 11, 294, 264, 2360, 2135, 2115, 11, + 2452, 2539, 4045, 505, 281, 51460], "temperature": 0.0, "avg_logprob": -0.3222321485861754, + "compression_ratio": 1.7079207920792079, "no_speech_prob": 0.009160725399851799}, + {"id": 128, "seek": 136128, "start": 1383.2, "end": 1391.12, "text": " vectorize + data objects that previously we couldn''t vectorize in a celly, so images or", "tokens": + [51460, 8062, 1125, 1412, 6565, 300, 8046, 321, 2809, 380, 8062, 1125, 294, 257, + 2815, 88, 11, 370, 5267, 420, 51856], "temperature": 0.0, "avg_logprob": -0.3222321485861754, + "compression_ratio": 1.7079207920792079, "no_speech_prob": 0.009160725399851799}, + {"id": 129, "seek": 139128, "start": 1391.44, "end": 1397.36, "text": " I don''t + know. 
And do it efficiently, because in images, you might have way too many pixels.", + "tokens": [50372, 286, 500, 380, 458, 13, 400, 360, 309, 19621, 11, 570, 294, 5267, + 11, 291, 1062, 362, 636, 886, 867, 18668, 13, 50668], "temperature": 0.0, "avg_logprob": + -0.19561427126648606, "compression_ratio": 1.5964125560538116, "no_speech_prob": + 0.004142153076827526}, {"id": 130, "seek": 139128, "start": 1397.36, "end": 1402.8799999999999, + "text": " So if you just take the vector of all the pixels, it''s way too big of + an object to deal with.", "tokens": [50668, 407, 498, 291, 445, 747, 264, 8062, + 295, 439, 264, 18668, 11, 309, 311, 636, 886, 955, 295, 364, 2657, 281, 2028, 365, + 13, 50944], "temperature": 0.0, "avg_logprob": -0.19561427126648606, "compression_ratio": + 1.5964125560538116, "no_speech_prob": 0.004142153076827526}, {"id": 131, "seek": + 139128, "start": 1403.68, "end": 1408.0, "text": " And so you vectorize, as you + said, in the beginning, and you basically sort of", "tokens": [50984, 400, 370, + 291, 8062, 1125, 11, 382, 291, 848, 11, 294, 264, 2863, 11, 293, 291, 1936, 1333, + 295, 51200], "temperature": 0.0, "avg_logprob": -0.19561427126648606, "compression_ratio": + 1.5964125560538116, "no_speech_prob": 0.004142153076827526}, {"id": 132, "seek": + 139128, "start": 1410.32, "end": 1415.92, "text": " project it in a lower dimensional + space. So now you can actually efficiently operate on it.", "tokens": [51316, 1716, + 309, 294, 257, 3126, 18795, 1901, 13, 407, 586, 291, 393, 767, 19621, 9651, 322, + 309, 13, 51596], "temperature": 0.0, "avg_logprob": -0.19561427126648606, "compression_ratio": + 1.5964125560538116, "no_speech_prob": 0.004142153076827526}, {"id": 133, "seek": + 141592, "start": 1416.88, "end": 1430.8000000000002, "text": " Exactly. Let''s get + images as an example. 
Let''s assume that we have images with a size of", "tokens": + [50412, 7587, 13, 961, 311, 483, 5267, 382, 364, 1365, 13, 961, 311, 6552, 300, + 321, 362, 5267, 365, 257, 2744, 295, 51108], "temperature": 0.0, "avg_logprob": + -0.23407047271728515, "compression_ratio": 1.1282051282051282, "no_speech_prob": + 0.021925868466496468}, {"id": 134, "seek": 143080, "start": 1431.76, "end": 1447.44, + "text": " 200 times 200. And we also have a channel value of three. So we end up + with 200 times 200 times", "tokens": [50412, 2331, 1413, 2331, 13, 400, 321, 611, + 362, 257, 2269, 2158, 295, 1045, 13, 407, 321, 917, 493, 365, 2331, 1413, 2331, + 1413, 51196], "temperature": 0.0, "avg_logprob": -0.2718716132931593, "compression_ratio": + 1.309090909090909, "no_speech_prob": 0.01090193446725607}, {"id": 135, "seek": 143080, + "start": 1448.0, "end": 1454.8, "text": " three values for a single image. And also, + let''s", "tokens": [51224, 1045, 4190, 337, 257, 2167, 3256, 13, 400, 611, 11, 718, + 311, 51564], "temperature": 0.0, "avg_logprob": -0.2718716132931593, "compression_ratio": + 1.309090909090909, "no_speech_prob": 0.01090193446725607}, {"id": 136, "seek": 145480, + "start": 1454.96, "end": 1471.12, "text": " actually, too many values also mean + a great variance value. 
So it''s not so practical to make a", "tokens": [50372, + 767, 11, 886, 867, 4190, 611, 914, 257, 869, 21977, 2158, 13, 407, 309, 311, 406, + 370, 8496, 281, 652, 257, 51180], "temperature": 0.0, "avg_logprob": -0.3360712814331055, + "compression_ratio": 1.146341463414634, "no_speech_prob": 0.0020567975006997585}, + {"id": 137, "seek": 147112, "start": 1471.12, "end": 1485.52, "text": " measurement + between two images, because those pixel values can include very surface, quite", + "tokens": [50364, 13160, 1296, 732, 5267, 11, 570, 729, 19261, 4190, 393, 4090, + 588, 3753, 11, 1596, 51084], "temperature": 0.0, "avg_logprob": -0.28118324279785156, + "compression_ratio": 1.3445378151260505, "no_speech_prob": 0.004205784294754267}, + {"id": 138, "seek": 147112, "start": 1488.32, "end": 1498.0, "text": " shallow surface + features that do not make any sense in our semantics.", "tokens": [51224, 20488, + 3753, 4122, 300, 360, 406, 652, 604, 2020, 294, 527, 4361, 45298, 13, 51708], "temperature": + 0.0, "avg_logprob": -0.28118324279785156, "compression_ratio": 1.3445378151260505, + "no_speech_prob": 0.004205784294754267}, {"id": 139, "seek": 149800, "start": 1498.24, + "end": 1508.16, "text": " But once we encode those high dimensional inputs into + a low dimensional vector space,", "tokens": [50376, 583, 1564, 321, 2058, 1429, + 729, 1090, 18795, 15743, 666, 257, 2295, 18795, 8062, 1901, 11, 50872], "temperature": + 0.0, "avg_logprob": -0.4275725228445871, "compression_ratio": 1.238532110091743, + "no_speech_prob": 0.012053826823830605}, {"id": 140, "seek": 149800, "start": 1508.88, + "end": 1516.08, "text": " for example, we usually have 500 to 12, 10 to the", "tokens": + [50908, 337, 1365, 11, 321, 2673, 362, 5923, 281, 2272, 11, 1266, 281, 264, 51268], + "temperature": 0.0, "avg_logprob": -0.4275725228445871, "compression_ratio": 1.238532110091743, + "no_speech_prob": 0.012053826823830605}, {"id": 141, "seek": 151608, "start": 1516.08, + "end": 1530.1599999999999, 
"text": " 12, 12, 12, 14 dimensional vectors. And this + value is really low when compared to the original", "tokens": [50364, 2272, 11, + 2272, 11, 2272, 11, 3499, 18795, 18875, 13, 400, 341, 2158, 307, 534, 2295, 562, + 5347, 281, 264, 3380, 51068], "temperature": 0.0, "avg_logprob": -0.44789005279541017, + "compression_ratio": 1.0804597701149425, "no_speech_prob": 0.004807830322533846}, + {"id": 142, "seek": 153016, "start": 1531.1200000000001, "end": 1546.64, "text": + " dimension of that sample. So in this case, that model should learn, should learn + a representation", "tokens": [50412, 10139, 295, 300, 6889, 13, 407, 294, 341, 1389, + 11, 300, 2316, 820, 1466, 11, 820, 1466, 257, 10290, 51188], "temperature": 0.0, + "avg_logprob": -0.24646667812181555, "compression_ratio": 1.2278481012658229, "no_speech_prob": + 0.015870986506342888}, {"id": 143, "seek": 154664, "start": 1547.44, "end": 1562.0800000000002, + "text": " of high dimensional samples. Actually, we just throw the unnecessary part + of those samples,", "tokens": [50404, 295, 1090, 18795, 10938, 13, 5135, 11, 321, + 445, 3507, 264, 19350, 644, 295, 729, 10938, 11, 51136], "temperature": 0.0, "avg_logprob": + -0.20283959893619313, "compression_ratio": 1.2660550458715596, "no_speech_prob": + 0.028912030160427094}, {"id": 144, "seek": 154664, "start": 1562.0800000000002, + "end": 1568.48, "text": " and we only keep the part that matters for us.", "tokens": + [51136, 293, 321, 787, 1066, 264, 644, 300, 7001, 337, 505, 13, 51456], "temperature": + 0.0, "avg_logprob": -0.20283959893619313, "compression_ratio": 1.2660550458715596, + "no_speech_prob": 0.028912030160427094}, {"id": 145, "seek": 156848, "start": 1569.44, + "end": 1575.68, "text": " Yeah, yeah. 
So kind of in some sense, you could say it''s + like signal compression,", "tokens": [50412, 865, 11, 1338, 13, 407, 733, 295, 294, + 512, 2020, 11, 291, 727, 584, 309, 311, 411, 6358, 19355, 11, 50724], "temperature": + 0.0, "avg_logprob": -0.20223540845124618, "compression_ratio": 1.6756756756756757, + "no_speech_prob": 0.02364165149629116}, {"id": 146, "seek": 156848, "start": 1575.68, + "end": 1583.52, "text": " right? So in some sense, like using the signal law, like + the distribution, you could actually", "tokens": [50724, 558, 30, 407, 294, 512, + 2020, 11, 411, 1228, 264, 6358, 2101, 11, 411, 264, 7316, 11, 291, 727, 767, 51116], + "temperature": 0.0, "avg_logprob": -0.20223540845124618, "compression_ratio": 1.6756756756756757, + "no_speech_prob": 0.02364165149629116}, {"id": 147, "seek": 156848, "start": 1583.52, + "end": 1589.84, "text": " compress things, like I don''t know if theoretically speaking + in an image, you have like one object,", "tokens": [51116, 14778, 721, 11, 411, + 286, 500, 380, 458, 498, 29400, 4124, 294, 364, 3256, 11, 291, 362, 411, 472, 2657, + 11, 51432], "temperature": 0.0, "avg_logprob": -0.20223540845124618, "compression_ratio": + 1.6756756756756757, "no_speech_prob": 0.02364165149629116}, {"id": 148, "seek": + 156848, "start": 1589.84, "end": 1595.28, "text": " and the rest is just the background + of one color. 
You really don''t need to pass all these pixels", "tokens": [51432, + 293, 264, 1472, 307, 445, 264, 3678, 295, 472, 2017, 13, 509, 534, 500, 380, 643, + 281, 1320, 439, 613, 18668, 51704], "temperature": 0.0, "avg_logprob": -0.20223540845124618, + "compression_ratio": 1.6756756756756757, "no_speech_prob": 0.02364165149629116}, + {"id": 149, "seek": 159528, "start": 1595.28, "end": 1601.84, "text": " independently, + like you could just say, okay, it''s a background I''ve learned that it''s that + color", "tokens": [50364, 21761, 11, 411, 291, 727, 445, 584, 11, 1392, 11, 309, + 311, 257, 3678, 286, 600, 3264, 300, 309, 311, 300, 2017, 50692], "temperature": + 0.0, "avg_logprob": -0.15224934948815239, "compression_ratio": 1.4923857868020305, + "no_speech_prob": 0.0006475347327068448}, {"id": 150, "seek": 159528, "start": 1601.84, + "end": 1608.32, "text": " kind of semantically, I guess, and then what matters is + the object somewhere there that we focus on", "tokens": [50692, 733, 295, 4361, + 49505, 11, 286, 2041, 11, 293, 550, 437, 7001, 307, 264, 2657, 4079, 456, 300, 321, + 1879, 322, 51016], "temperature": 0.0, "avg_logprob": -0.15224934948815239, "compression_ratio": + 1.4923857868020305, "no_speech_prob": 0.0006475347327068448}, {"id": 151, "seek": + 159528, "start": 1608.32, "end": 1616.56, "text": " when we look at this picture, + right? Yeah, exactly. 
Actually, in the original distribution case,", "tokens": [51016, + 562, 321, 574, 412, 341, 3036, 11, 558, 30, 865, 11, 2293, 13, 5135, 11, 294, 264, + 3380, 7316, 1389, 11, 51428], "temperature": 0.0, "avg_logprob": -0.15224934948815239, + "compression_ratio": 1.4923857868020305, "no_speech_prob": 0.0006475347327068448}, + {"id": 152, "seek": 161656, "start": 1617.52, "end": 1624.8799999999999, "text": + " for example, of images, we don''t have any connection between the value of a pixel", + "tokens": [50412, 337, 1365, 11, 295, 5267, 11, 321, 500, 380, 362, 604, 4984, 1296, + 264, 2158, 295, 257, 19261, 50780], "temperature": 0.0, "avg_logprob": -0.12031276835951694, + "compression_ratio": 1.3461538461538463, "no_speech_prob": 0.0027168949600309134}, + {"id": 153, "seek": 161656, "start": 1626.56, "end": 1640.48, "text": " and the + semantic counterpart of that pixel one. But once we transform it into a vector space,", + "tokens": [50864, 293, 264, 47982, 22335, 295, 300, 19261, 472, 13, 583, 1564, 321, + 4088, 309, 666, 257, 8062, 1901, 11, 51560], "temperature": 0.0, "avg_logprob": + -0.12031276835951694, "compression_ratio": 1.3461538461538463, "no_speech_prob": + 0.0027168949600309134}, {"id": 154, "seek": 164048, "start": 1641.1200000000001, + "end": 1653.44, "text": " at least theoretically we can make conclusions. For example, + we have a 1024 dimensional", "tokens": [50396, 412, 1935, 29400, 321, 393, 652, + 22865, 13, 1171, 1365, 11, 321, 362, 257, 1266, 7911, 18795, 51012], "temperature": + 0.0, "avg_logprob": -0.2408162967578785, "compression_ratio": 1.2521008403361344, + "no_speech_prob": 0.0059758685529232025}, {"id": 155, "seek": 164048, "start": 1654.88, + "end": 1661.76, "text": " vector as a representation of that image. 
In this case, + if we", "tokens": [51084, 8062, 382, 257, 10290, 295, 300, 3256, 13, 682, 341, 1389, + 11, 498, 321, 51428], "temperature": 0.0, "avg_logprob": -0.2408162967578785, "compression_ratio": + 1.2521008403361344, "no_speech_prob": 0.0059758685529232025}, {"id": 156, "seek": + 166176, "start": 1661.76, "end": 1674.72, "text": " examine this vector space, we + can make conclusions of this value in the index zero,", "tokens": [50364, 17496, + 341, 8062, 1901, 11, 321, 393, 652, 22865, 295, 341, 2158, 294, 264, 8186, 4018, + 11, 51012], "temperature": 0.0, "avg_logprob": -0.2603144231049911, "compression_ratio": + 1.5081967213114753, "no_speech_prob": 0.0028704653959721327}, {"id": 157, "seek": + 166176, "start": 1674.72, "end": 1687.92, "text": " in cause the features of this + feature of image. For example, it can, in cause the size of a specific", "tokens": + [51012, 294, 3082, 264, 4122, 295, 341, 4111, 295, 3256, 13, 1171, 1365, 11, 309, + 393, 11, 294, 3082, 264, 2744, 295, 257, 2685, 51672], "temperature": 0.0, "avg_logprob": + -0.2603144231049911, "compression_ratio": 1.5081967213114753, "no_speech_prob": + 0.0028704653959721327}, {"id": 158, "seek": 168792, "start": 1688.0, "end": 1701.1200000000001, + "text": " object or the colors value of a specific object or maybe some more abstract + features of objects.", "tokens": [50368, 2657, 420, 264, 4577, 2158, 295, 257, 2685, + 2657, 420, 1310, 512, 544, 12649, 4122, 295, 6565, 13, 51024], "temperature": 0.0, + "avg_logprob": -0.2761072256626227, "compression_ratio": 1.4609375, "no_speech_prob": + 0.006052842829376459}, {"id": 159, "seek": 168792, "start": 1702.72, "end": 1716.72, + "text": " This enables us to search it more efficiently instead of otherwise our + values are actually", "tokens": [51104, 639, 17077, 505, 281, 3164, 309, 544, 19621, + 2602, 295, 5911, 527, 4190, 366, 767, 51804], "temperature": 0.0, "avg_logprob": + -0.2761072256626227, "compression_ratio": 1.4609375, "no_speech_prob": 
0.006052842829376459}, + {"id": 160, "seek": 171672, "start": 1717.2, "end": 1730.0, "text": " distributed + to a very wide range. And we don''t have such interpretations in that distribution + space.", "tokens": [50388, 12631, 281, 257, 588, 4874, 3613, 13, 400, 321, 500, + 380, 362, 1270, 37547, 294, 300, 7316, 1901, 13, 51028], "temperature": 0.0, "avg_logprob": + -0.2621048927307129, "compression_ratio": 1.5172413793103448, "no_speech_prob": + 0.008603821508586407}, {"id": 161, "seek": 171672, "start": 1730.72, "end": 1736.08, + "text": " Yeah, that makes sense. It''s a very unique high variant and also in some + senses,", "tokens": [51064, 865, 11, 300, 1669, 2020, 13, 467, 311, 257, 588, 3845, + 1090, 17501, 293, 611, 294, 512, 17057, 11, 51332], "temperature": 0.0, "avg_logprob": + -0.2621048927307129, "compression_ratio": 1.5172413793103448, "no_speech_prob": + 0.008603821508586407}, {"id": 162, "seek": 171672, "start": 1736.08, "end": 1741.68, + "text": " like waste of space because we are not communicating that much more information + by", "tokens": [51332, 411, 5964, 295, 1901, 570, 321, 366, 406, 17559, 300, 709, + 544, 1589, 538, 51612], "temperature": 0.0, "avg_logprob": -0.2621048927307129, + "compression_ratio": 1.5172413793103448, "no_speech_prob": 0.008603821508586407}, + {"id": 163, "seek": 174168, "start": 1742.0, "end": 1746.64, "text": " sort of encoding + all these pixels. But we could actually extract some features and patterns in", + "tokens": [50380, 1333, 295, 43430, 439, 613, 18668, 13, 583, 321, 727, 767, 8947, + 512, 4122, 293, 8294, 294, 50612], "temperature": 0.0, "avg_logprob": -0.23810322388358737, + "compression_ratio": 1.6083333333333334, "no_speech_prob": 0.01150743942707777}, + {"id": 164, "seek": 174168, "start": 1746.64, "end": 1752.0, "text": " the image. 
+ I think some early work on this was done using, if I remember, it was called a", + "tokens": [50612, 264, 3256, 13, 286, 519, 512, 2440, 589, 322, 341, 390, 1096, + 1228, 11, 498, 286, 1604, 11, 309, 390, 1219, 257, 50880], "temperature": 0.0, "avg_logprob": + -0.23810322388358737, "compression_ratio": 1.6083333333333334, "no_speech_prob": + 0.01150743942707777}, {"id": 165, "seek": 174168, "start": 1752.0, "end": 1758.24, + "text": " Gabor filter or some other ways of kind of smoothing your image and + trying to learn what features", "tokens": [50880, 1265, 13136, 6608, 420, 512, 661, + 2098, 295, 733, 295, 899, 6259, 571, 428, 3256, 293, 1382, 281, 1466, 437, 4122, + 51192], "temperature": 0.0, "avg_logprob": -0.23810322388358737, "compression_ratio": + 1.6083333333333334, "no_speech_prob": 0.01150743942707777}, {"id": 166, "seek": + 174168, "start": 1758.24, "end": 1765.3600000000001, "text": " you have, for instance, + if you try to differentiate between spruce and widely trees. So like for the", "tokens": + [51192, 291, 362, 11, 337, 5197, 11, 498, 291, 853, 281, 23203, 1296, 637, 41158, + 293, 13371, 5852, 13, 407, 411, 337, 264, 51548], "temperature": 0.0, "avg_logprob": + -0.23810322388358737, "compression_ratio": 1.6083333333333334, "no_speech_prob": + 0.01150743942707777}, {"id": 167, "seek": 176536, "start": 1765.36, "end": 1776.3999999999999, + "text": " purposes of keeping one tree and then maybe removing the others. 
But I + think it wasn''t as efficient", "tokens": [50364, 9932, 295, 5145, 472, 4230, 293, + 550, 1310, 12720, 264, 2357, 13, 583, 286, 519, 309, 2067, 380, 382, 7148, 50916], + "temperature": 0.0, "avg_logprob": -0.20047240257263182, "compression_ratio": 1.606837606837607, + "no_speech_prob": 0.0010489040287211537}, {"id": 168, "seek": 176536, "start": 1777.12, + "end": 1781.6799999999998, "text": " perhaps as compared to deep learning because + deep learning, as far as understanding,", "tokens": [50952, 4317, 382, 5347, 281, + 2452, 2539, 570, 2452, 2539, 11, 382, 1400, 382, 3701, 11, 51180], "temperature": + 0.0, "avg_logprob": -0.20047240257263182, "compression_ratio": 1.606837606837607, + "no_speech_prob": 0.0010489040287211537}, {"id": 169, "seek": 176536, "start": 1781.6799999999998, + "end": 1787.6, "text": " basically like learns without features in many ways. It + learns from the data and then you should", "tokens": [51180, 1936, 411, 27152, 1553, + 4122, 294, 867, 2098, 13, 467, 27152, 490, 264, 1412, 293, 550, 291, 820, 51476], + "temperature": 0.0, "avg_logprob": -0.20047240257263182, "compression_ratio": 1.606837606837607, + "no_speech_prob": 0.0010489040287211537}, {"id": 170, "seek": 176536, "start": 1787.6, + "end": 1794.24, "text": " have some target function that you''re optimizing for + so it can recreate the weights inside it.", "tokens": [51476, 362, 512, 3779, 2445, + 300, 291, 434, 40425, 337, 370, 309, 393, 25833, 264, 17443, 1854, 309, 13, 51808], + "temperature": 0.0, "avg_logprob": -0.20047240257263182, "compression_ratio": 1.606837606837607, + "no_speech_prob": 0.0010489040287211537}, {"id": 171, "seek": 179536, "start": 1796.32, + "end": 1800.6399999999999, "text": " Exactly. 
Actually, what is most differentiating", + "tokens": [50412, 7587, 13, 5135, 11, 437, 307, 881, 27372, 990, 50628], "temperature": + 0.0, "avg_logprob": -0.20722929123909242, "compression_ratio": 1.3557692307692308, + "no_speech_prob": 0.0030548565555363894}, {"id": 172, "seek": 179536, "start": 1804.24, + "end": 1816.6399999999999, "text": " feature of deep learning is deep learning is + actually used to learn the parameters of complex", "tokens": [50808, 4111, 295, + 2452, 2539, 307, 2452, 2539, 307, 767, 1143, 281, 1466, 264, 9834, 295, 3997, 51428], + "temperature": 0.0, "avg_logprob": -0.20722929123909242, "compression_ratio": 1.3557692307692308, + "no_speech_prob": 0.0030548565555363894}, {"id": 173, "seek": 181664, "start": 1817.6000000000001, + "end": 1831.1200000000001, "text": " functions instead of manually tuning them. + Before deep learning, we already had most of the", "tokens": [50412, 6828, 2602, + 295, 16945, 15164, 552, 13, 4546, 2452, 2539, 11, 321, 1217, 632, 881, 295, 264, + 51088], "temperature": 0.0, "avg_logprob": -0.1290648341178894, "compression_ratio": + 1.4803149606299213, "no_speech_prob": 0.00355158350430429}, {"id": 174, "seek": + 181664, "start": 1831.1200000000001, "end": 1844.5600000000002, "text": " filters + we currently have. But the parameters of such filters were supposed to be manually + tuned", "tokens": [51088, 15995, 321, 4362, 362, 13, 583, 264, 9834, 295, 1270, + 15995, 645, 3442, 281, 312, 16945, 10870, 51760], "temperature": 0.0, "avg_logprob": + -0.1290648341178894, "compression_ratio": 1.4803149606299213, "no_speech_prob": + 0.00355158350430429}, {"id": 175, "seek": 184456, "start": 1844.6399999999999, "end": + 1855.04, "text": " by experts in that domain. 
But in deep learning, we learn those + parameters directly from data.", "tokens": [50368, 538, 8572, 294, 300, 9274, 13, + 583, 294, 2452, 2539, 11, 321, 1466, 729, 9834, 3838, 490, 1412, 13, 50888], "temperature": + 0.0, "avg_logprob": -0.1896209239959717, "compression_ratio": 1.3306451612903225, + "no_speech_prob": 0.004211807157844305}, {"id": 176, "seek": 184456, "start": 1856.1599999999999, + "end": 1864.56, "text": " And as you said, actually, the beginning of metric learning + is also in", "tokens": [50944, 400, 382, 291, 848, 11, 767, 11, 264, 2863, 295, + 20678, 2539, 307, 611, 294, 51364], "temperature": 0.0, "avg_logprob": -0.1896209239959717, + "compression_ratio": 1.3306451612903225, "no_speech_prob": 0.004211807157844305}, + {"id": 177, "seek": 186456, "start": 1865.04, "end": 1876.96, "text": " dimensionality + reduction. We have most popular contrastive loss, for example. And the first", "tokens": + [50388, 10139, 1860, 11004, 13, 492, 362, 881, 3743, 8712, 488, 4470, 11, 337, 1365, + 13, 400, 264, 700, 50984], "temperature": 0.0, "avg_logprob": -0.25351572036743164, + "compression_ratio": 1.3984962406015038, "no_speech_prob": 0.004416854120790958}, + {"id": 178, "seek": 186456, "start": 1876.96, "end": 1888.72, "text": " introduction + of contrastive loss is in 2005 and the original purpose of that function actually", + "tokens": [50984, 9339, 295, 8712, 488, 4470, 307, 294, 14394, 293, 264, 3380, 4334, + 295, 300, 2445, 767, 51572], "temperature": 0.0, "avg_logprob": -0.25351572036743164, + "compression_ratio": 1.3984962406015038, "no_speech_prob": 0.004416854120790958}, + {"id": 179, "seek": 188872, "start": 1889.6000000000001, "end": 1902.64, "text": + " to reduce dimensionality of high dimensional inputs rather than vector search + or anything F", "tokens": [50408, 281, 5407, 10139, 1860, 295, 1090, 18795, 15743, + 2831, 813, 8062, 4009, 420, 1340, 479, 51060], "temperature": 0.0, "avg_logprob": + -0.27132900555928546, "compression_ratio": 
1.7333333333333334, "no_speech_prob": + 0.010725550353527069}, {"id": 180, "seek": 188872, "start": 1904.16, "end": 1915.44, + "text": " for actually another end just tried to reduce the dimensionality of high + dimensional input", "tokens": [51136, 337, 767, 1071, 917, 445, 3031, 281, 5407, + 264, 10139, 1860, 295, 1090, 18795, 4846, 51700], "temperature": 0.0, "avg_logprob": + -0.27132900555928546, "compression_ratio": 1.7333333333333334, "no_speech_prob": + 0.010725550353527069}, {"id": 181, "seek": 191544, "start": 1915.52, "end": 1923.6000000000001, + "text": " to use lower dimensional input F features to other models.", "tokens": + [50368, 281, 764, 3126, 18795, 4846, 479, 4122, 281, 661, 5245, 13, 50772], "temperature": + 0.0, "avg_logprob": -0.1849643144852076, "compression_ratio": 1.502183406113537, + "no_speech_prob": 0.0032915824558585882}, {"id": 182, "seek": 191544, "start": 1924.48, + "end": 1929.92, "text": " Yeah, that sounds exciting. Actually, before you brought + this up, I didn''t think that way because", "tokens": [50816, 865, 11, 300, 3263, + 4670, 13, 5135, 11, 949, 291, 3038, 341, 493, 11, 286, 994, 380, 519, 300, 636, + 570, 51088], "temperature": 0.0, "avg_logprob": -0.1849643144852076, "compression_ratio": + 1.502183406113537, "no_speech_prob": 0.0032915824558585882}, {"id": 183, "seek": + 191544, "start": 1931.1200000000001, "end": 1938.24, "text": " I was experimenting + in my team also with things like product quantization. 
So you do have all", "tokens": + [51148, 286, 390, 29070, 294, 452, 1469, 611, 365, 721, 411, 1674, 4426, 2144, 13, + 407, 291, 360, 362, 439, 51504], "temperature": 0.0, "avg_logprob": -0.1849643144852076, + "compression_ratio": 1.502183406113537, "no_speech_prob": 0.0032915824558585882}, + {"id": 184, "seek": 191544, "start": 1938.24, "end": 1943.04, "text": " already + the vectors computed by the neural network, but you could actually quantize them + even", "tokens": [51504, 1217, 264, 18875, 40610, 538, 264, 18161, 3209, 11, 457, + 291, 727, 767, 4426, 1125, 552, 754, 51744], "temperature": 0.0, "avg_logprob": + -0.1849643144852076, "compression_ratio": 1.502183406113537, "no_speech_prob": 0.0032915824558585882}, + {"id": 185, "seek": 194304, "start": 1943.04, "end": 1950.48, "text": " further. + So you save space and maybe of course you introduce some overlaps that might decrease + your", "tokens": [50364, 3052, 13, 407, 291, 3155, 1901, 293, 1310, 295, 1164, 291, + 5366, 512, 15986, 2382, 300, 1062, 11514, 428, 50736], "temperature": 0.0, "avg_logprob": + -0.15778249769068475, "compression_ratio": 1.582010582010582, "no_speech_prob": + 0.009588501416146755}, {"id": 186, "seek": 194304, "start": 1950.48, "end": 1958.56, + "text": " precision, but slightly, but you''re gonna save a ton of space and make + your search more efficient.", "tokens": [50736, 18356, 11, 457, 4748, 11, 457, 291, + 434, 799, 3155, 257, 2952, 295, 1901, 293, 652, 428, 3164, 544, 7148, 13, 51140], + "temperature": 0.0, "avg_logprob": -0.15778249769068475, "compression_ratio": 1.582010582010582, + "no_speech_prob": 0.009588501416146755}, {"id": 187, "seek": 194304, "start": 1958.56, + "end": 1964.48, "text": " So it''s almost like you could think of dimensionality + reduction in so many different levels and ways", "tokens": [51140, 407, 309, 311, + 1920, 411, 291, 727, 519, 295, 10139, 1860, 11004, 294, 370, 867, 819, 4358, 293, + 2098, 51436], "temperature": 0.0, "avg_logprob": 
-0.15778249769068475, "compression_ratio": + 1.582010582010582, "no_speech_prob": 0.009588501416146755}, {"id": 188, "seek": + 196448, "start": 1965.1200000000001, "end": 1972.8, "text": " as you have the reason + about your data, right? Yeah, exactly. Actually, metric learning is", "tokens": + [50396, 382, 291, 362, 264, 1778, 466, 428, 1412, 11, 558, 30, 865, 11, 2293, 13, + 5135, 11, 20678, 2539, 307, 50780], "temperature": 0.0, "avg_logprob": -0.2866149946700695, + "compression_ratio": 1.3703703703703705, "no_speech_prob": 0.00762192253023386}, + {"id": 189, "seek": 196448, "start": 1973.68, "end": 1984.88, "text": " itself a + type of dimensionality reduction, but even after you apply metric learning and vector", + "tokens": [50824, 2564, 257, 2010, 295, 10139, 1860, 11004, 11, 457, 754, 934, 291, + 3079, 20678, 2539, 293, 8062, 51384], "temperature": 0.0, "avg_logprob": -0.2866149946700695, + "compression_ratio": 1.3703703703703705, "no_speech_prob": 0.00762192253023386}, + {"id": 190, "seek": 198488, "start": 1984.88, "end": 1996.64, "text": " encoding + to your data, you still have a high dimensional vector. You have, for example, 10,", + "tokens": [50364, 43430, 281, 428, 1412, 11, 291, 920, 362, 257, 1090, 18795, 8062, + 13, 509, 362, 11, 337, 1365, 11, 1266, 11, 50952], "temperature": 0.0, "avg_logprob": + -0.24850128173828126, "compression_ratio": 1.0833333333333333, "no_speech_prob": + 0.007246140856295824}, {"id": 191, "seek": 199664, "start": 1996.64, "end": 2015.0400000000002, + "text": " 10, 10, 10, 4, dimensional data times 32 bits for a single float. 
So it''s + already a huge data", "tokens": [50364, 1266, 11, 1266, 11, 1266, 11, 1017, 11, + 18795, 1412, 1413, 8858, 9239, 337, 257, 2167, 13717, 13, 407, 309, 311, 1217, 257, + 2603, 1412, 51284], "temperature": 0.0, "avg_logprob": -0.43807647968160696, "compression_ratio": + 1.069767441860465, "no_speech_prob": 0.008578852750360966}, {"id": 192, "seek": + 201504, "start": 2015.04, "end": 2029.52, "text": " when you have, for example, + millions of samples. So you can still actually apply some quantization", "tokens": + [50364, 562, 291, 362, 11, 337, 1365, 11, 6803, 295, 10938, 13, 407, 291, 393, 920, + 767, 3079, 512, 4426, 2144, 51088], "temperature": 0.0, "avg_logprob": -0.2164289951324463, + "compression_ratio": 1.1264367816091954, "no_speech_prob": 0.018948595970869064}, + {"id": 193, "seek": 202952, "start": 2030.32, "end": 2042.8799999999999, "text": + " methods to get even smaller representations from that one. And this can be also + hierarchical, meaning that", "tokens": [50404, 7150, 281, 483, 754, 4356, 33358, + 490, 300, 472, 13, 400, 341, 393, 312, 611, 35250, 804, 11, 3620, 300, 51032], "temperature": + 0.0, "avg_logprob": -0.20502420572134164, "compression_ratio": 1.472, "no_speech_prob": + 0.08072381466627121}, {"id": 194, "seek": 202952, "start": 2043.68, "end": 2056.16, + "text": " you can get several representations of the same sample at different levels + of", "tokens": [51072, 291, 393, 483, 2940, 33358, 295, 264, 912, 6889, 412, 819, + 4358, 295, 51696], "temperature": 0.0, "avg_logprob": -0.20502420572134164, "compression_ratio": + 1.472, "no_speech_prob": 0.08072381466627121}, {"id": 195, "seek": 205616, "start": + 2057.04, "end": 2066.16, "text": " information encoded in that feature space. Yeah, + that''s fantastic. 
So I was also thinking like,", "tokens": [50408, 1589, 2058, + 12340, 294, 300, 4111, 1901, 13, 865, 11, 300, 311, 5456, 13, 407, 286, 390, 611, + 1953, 411, 11, 50864], "temperature": 0.0, "avg_logprob": -0.21805763244628906, + "compression_ratio": 1.5376344086021505, "no_speech_prob": 0.004282147157937288}, + {"id": 196, "seek": 205616, "start": 2067.6, "end": 2075.7599999999998, "text": + " if you could give like some practical example or setting where I could start thinking + about", "tokens": [50936, 498, 291, 727, 976, 411, 512, 8496, 1365, 420, 3287, 689, + 286, 727, 722, 1953, 466, 51344], "temperature": 0.0, "avg_logprob": -0.21805763244628906, + "compression_ratio": 1.5376344086021505, "no_speech_prob": 0.004282147157937288}, + {"id": 197, "seek": 205616, "start": 2075.7599999999998, "end": 2083.7599999999998, + "text": " deploying metric learning and also like, could you sort of point us in + the direction of what tools", "tokens": [51344, 34198, 20678, 2539, 293, 611, 411, + 11, 727, 291, 1333, 295, 935, 505, 294, 264, 3513, 295, 437, 3873, 51744], "temperature": + 0.0, "avg_logprob": -0.21805763244628906, "compression_ratio": 1.5376344086021505, + "no_speech_prob": 0.004282147157937288}, {"id": 198, "seek": 208376, "start": 2083.76, + "end": 2089.2000000000003, "text": " are available so that I don''t think we need + to reinvent everything from scratch, but maybe there are", "tokens": [50364, 366, + 2435, 370, 300, 286, 500, 380, 519, 321, 643, 281, 33477, 1203, 490, 8459, 11, 457, + 1310, 456, 366, 50636], "temperature": 0.0, "avg_logprob": -0.1574177811111229, + "compression_ratio": 1.5153061224489797, "no_speech_prob": 0.00320336502045393}, + {"id": 199, "seek": 208376, "start": 2089.2000000000003, "end": 2095.92, "text": + " some practices, also best practices available, you know, to structure this process. 
Can you give", "tokens": [50636, 512, 7525, 11, 611, 1151, 7525, 2435, 11, 291, + 458, 11, 281, 3877, 341, 1399, 13, 1664, 291, 976, 50972], "temperature": 0.0, "avg_logprob": + -0.1574177811111229, "compression_ratio": 1.5153061224489797, "no_speech_prob": + 0.00320336502045393}, {"id": 200, "seek": 208376, "start": 2095.92, "end": 2106.8, + "text": " some advice on that? Yeah, sure. For a starter example, actually, metric + learning is best known for", "tokens": [50972, 512, 5192, 322, 300, 30, 865, 11, + 988, 13, 1171, 257, 22465, 1365, 11, 767, 11, 20678, 2539, 307, 1151, 2570, 337, + 51516], "temperature": 0.0, "avg_logprob": -0.1574177811111229, "compression_ratio": + 1.5153061224489797, "no_speech_prob": 0.00320336502045393}, {"id": 201, "seek": + 210680, "start": 2106.8, "end": 2115.76, "text": " is used in face recognition, + but personally, I don''t support use of machine learning to process", "tokens": + [50364, 307, 1143, 294, 1851, 11150, 11, 457, 5665, 11, 286, 500, 380, 1406, 764, + 295, 3479, 2539, 281, 1399, 50812], "temperature": 0.0, "avg_logprob": -0.24581545049493964, + "compression_ratio": 1.3617021276595744, "no_speech_prob": 0.005657794885337353}, + {"id": 202, "seek": 210680, "start": 2115.76, "end": 2126.6400000000003, "text": + " biometric information. So I give an example from our everyday life, actually, + we almost everyday", "tokens": [50812, 3228, 29470, 1589, 13, 407, 286, 976, 364, + 1365, 490, 527, 7429, 993, 11, 767, 11, 321, 1920, 7429, 51356], "temperature": + 0.0, "avg_logprob": -0.24581545049493964, "compression_ratio": 1.3617021276595744, + "no_speech_prob": 0.005657794885337353}, {"id": 203, "seek": 212664, "start": 2127.44, + "end": 2139.2, "text": " use it, Smart Reply. 
The feature found in, for example, + Gmail, LinkedIn, and other messaging apps.", "tokens": [50404, 764, 309, 11, 4069, + 7274, 13, 440, 4111, 1352, 294, 11, 337, 1365, 11, 460, 9998, 11, 20657, 11, 293, + 661, 21812, 7733, 13, 50992], "temperature": 0.0, "avg_logprob": -0.23712456744650137, + "compression_ratio": 1.2993197278911566, "no_speech_prob": 0.00422839168459177}, + {"id": 204, "seek": 212664, "start": 2141.44, "end": 2152.3199999999997, "text": + " Actually, it is trained from a large collection of conversation histories in these + platforms.", "tokens": [51104, 5135, 11, 309, 307, 8895, 490, 257, 2416, 5765, 295, + 3761, 30631, 294, 613, 9473, 13, 51648], "temperature": 0.0, "avg_logprob": -0.23712456744650137, + "compression_ratio": 1.2993197278911566, "no_speech_prob": 0.00422839168459177}, + {"id": 205, "seek": 215232, "start": 2153.2000000000003, "end": 2169.44, "text": + " Basically, they just like the example we put in the beginning image and textual + unified vector space,", "tokens": [50408, 8537, 11, 436, 445, 411, 264, 1365, 321, + 829, 294, 264, 2863, 3256, 293, 2487, 901, 26787, 8062, 1901, 11, 51220], "temperature": + 0.0, "avg_logprob": -0.2126744099152394, "compression_ratio": 1.4803149606299213, + "no_speech_prob": 0.0026586647145450115}, {"id": 206, "seek": 215232, "start": 2169.44, + "end": 2178.88, "text": " they construct a unified vector space for conversation + histories and single sentences.", "tokens": [51220, 436, 7690, 257, 26787, 8062, + 1901, 337, 3761, 30631, 293, 2167, 16579, 13, 51692], "temperature": 0.0, "avg_logprob": + -0.2126744099152394, "compression_ratio": 1.4803149606299213, "no_speech_prob": + 0.0026586647145450115}, {"id": 207, "seek": 217888, "start": 2179.44, "end": 2192.2400000000002, + "text": " For any moment of conversation, you encode the history of that conversation + to retrieve", "tokens": [50392, 1171, 604, 1623, 295, 3761, 11, 291, 2058, 1429, + 264, 2503, 295, 300, 3761, 281, 30254, 51032], 
"temperature": 0.0, "avg_logprob": + -0.21244478225708008, "compression_ratio": 1.4508196721311475, "no_speech_prob": + 0.007910378277301788}, {"id": 208, "seek": 217888, "start": 2193.28, "end": 2207.76, + "text": " most relevant replies to that history. And you can show them as suggestions + to the users,", "tokens": [51084, 881, 7340, 42289, 281, 300, 2503, 13, 400, 291, + 393, 855, 552, 382, 13396, 281, 264, 14924, 11, 51808], "temperature": 0.0, "avg_logprob": + -0.21244478225708008, "compression_ratio": 1.4508196721311475, "no_speech_prob": + 0.007910378277301788}, {"id": 209, "seek": 220776, "start": 2207.76, "end": 2218.1600000000003, + "text": " and users can pick one of them. And what is exciting with this setup, + you can also", "tokens": [50364, 293, 5022, 393, 7293, 472, 295, 552, 13, 400, 437, + 307, 4670, 365, 341, 8657, 11, 291, 393, 611, 50884], "temperature": 0.0, "avg_logprob": + -0.17199566250755674, "compression_ratio": 1.390625, "no_speech_prob": 0.00404054019600153}, + {"id": 210, "seek": 220776, "start": 2219.5200000000004, "end": 2234.8, "text": + " log the chosen reply, and you can continue improving your model from direct feedback + from your", "tokens": [50952, 3565, 264, 8614, 16972, 11, 293, 291, 393, 2354, 11470, + 428, 2316, 490, 2047, 5824, 490, 428, 51716], "temperature": 0.0, "avg_logprob": + -0.17199566250755674, "compression_ratio": 1.390625, "no_speech_prob": 0.00404054019600153}, + {"id": 211, "seek": 223480, "start": 2235.52, "end": 2247.36, "text": " actual users. + So it''s a really practical use case of metric learning. 
And for practitioners who + want to", "tokens": [50400, 3539, 5022, 13, 407, 309, 311, 257, 534, 8496, 764, + 1389, 295, 20678, 2539, 13, 400, 337, 25742, 567, 528, 281, 50992], "temperature": + 0.0, "avg_logprob": -0.259184482485749, "compression_ratio": 1.4402985074626866, + "no_speech_prob": 0.00505034951493144}, {"id": 212, "seek": 223480, "start": 2248.32, + "end": 2259.92, "text": " start experimenting with metric learning, actually, there + are lots of tools to solve very", "tokens": [51040, 722, 29070, 365, 20678, 2539, + 11, 767, 11, 456, 366, 3195, 295, 3873, 281, 5039, 588, 51620], "temperature": 0.0, + "avg_logprob": -0.259184482485749, "compression_ratio": 1.4402985074626866, "no_speech_prob": + 0.00505034951493144}, {"id": 213, "seek": 225992, "start": 2259.92, "end": 2271.52, + "text": " few problems in metric learning. So in the context of deep learning model + development itself,", "tokens": [50364, 1326, 2740, 294, 20678, 2539, 13, 407, 294, + 264, 4319, 295, 2452, 2539, 2316, 3250, 2564, 11, 50944], "temperature": 0.0, "avg_logprob": + -0.2515622813527177, "compression_ratio": 1.4230769230769231, "no_speech_prob": + 0.0075847068801522255}, {"id": 214, "seek": 225992, "start": 2271.52, "end": 2282.08, + "text": " we have several libraries, such as PyTorch Metric Learning and TensorFlow + Similarity.", "tokens": [50944, 321, 362, 2940, 15148, 11, 1270, 382, 1090, + 12, 21151, 339, 20678, 2539, 293, 5003, 3095, 32194, 13, 51472], "temperature": + 0.0, "avg_logprob": -0.2515622813527177, "compression_ratio": 1.4230769230769231, + "no_speech_prob": 0.0075847068801522255}, {"id": 215, "seek": 228208, "start": 2282.56, + "end": 2297.44, "text": " There are other libraries as well, but I think these are + the most mature libraries and most", "tokens": [50388, 821, 366, 661, 15148, 382, + 731, 11, 457, 286, 519, 613, 366, 264, 881, 14442, 15148, 293, 881, 51132], "temperature": + 0.0, "avg_logprob": -0.3006820028478449, "compression_ratio": 
1.1973684210526316, + "no_speech_prob": 0.017655150964856148}, {"id": 216, "seek": 229744, "start": 2297.52, + "end": 2310.0, "text": " cultural, how should I say, virtual libraries to tackle + with different data tasks.", "tokens": [50368, 6988, 11, 577, 820, 286, 584, 11, + 6374, 15148, 281, 14896, 365, 819, 1412, 9608, 13, 50992], "temperature": 0.0, "avg_logprob": + -0.457450093449773, "compression_ratio": 1.2741935483870968, "no_speech_prob": 0.018973808735609055}, + {"id": 217, "seek": 229744, "start": 2312.16, "end": 2320.8, "text": " On the other + hand, for visualization, we have this TensorFlow Projector,", "tokens": [51100, + 1282, 264, 661, 1011, 11, 337, 25801, 11, 321, 362, 341, 5003, 3095, 39792, 11, + 51532], "temperature": 0.0, "avg_logprob": -0.457450093449773, "compression_ratio": + 1.2741935483870968, "no_speech_prob": 0.018973808735609055}, {"id": 218, "seek": + 232080, "start": 2320.8, "end": 2330.0800000000004, "text": " is a browser-based + tool for you can examine your embedding easily with that one.", "tokens": [50364, + 307, 257, 11185, 12, 6032, 2290, 337, 291, 393, 17496, 428, 12240, 3584, 3612, 365, + 300, 472, 13, 50828], "temperature": 0.0, "avg_logprob": -0.35618030734178496, "compression_ratio": + 1.344, "no_speech_prob": 0.01588754542171955}, {"id": 219, "seek": 232080, "start": + 2332.96, "end": 2343.6000000000004, "text": " There are also vector search databases, + there are increasing in numbers, but of course,", "tokens": [50972, 821, 366, 611, + 8062, 3164, 22380, 11, 456, 366, 5662, 294, 3547, 11, 457, 295, 1164, 11, 51504], + "temperature": 0.0, "avg_logprob": -0.35618030734178496, "compression_ratio": 1.344, + "no_speech_prob": 0.01588754542171955}, {"id": 220, "seek": 234360, "start": 2344.48, + "end": 2357.2799999999997, "text": " I am a fan of Qdrant because it''s really + doing a great job with an extensive filtering support", "tokens": [50408, 286, 669, + 257, 3429, 295, 2326, 345, 13966, 570, 309, 311, 534, 884, 
257, 869, 1691, 365, + 364, 13246, 30822, 1406, 51048], "temperature": 0.0, "avg_logprob": -0.3162554680032933, + "compression_ratio": 1.3098591549295775, "no_speech_prob": 0.014685436151921749}, + {"id": 221, "seek": 234360, "start": 2357.2799999999997, "end": 2372.24, "text": + " for a variety of data tasks. And it''s doing this very efficiently, very elegant + in only 40", "tokens": [51048, 337, 257, 5673, 295, 1412, 9608, 13, 400, 309, 311, + 884, 341, 588, 19621, 11, 588, 21117, 294, 787, 3356, 51796], "temperature": 0.0, + "avg_logprob": -0.3162554680032933, "compression_ratio": 1.3098591549295775, "no_speech_prob": + 0.014685436151921749}, {"id": 222, "seek": 237224, "start": 2372.8799999999997, + "end": 2387.6, "text": " megabytes. So it opens up very important is to put your + metric learning model into production", "tokens": [50396, 10816, 1607, 908, 13, + 407, 309, 9870, 493, 588, 1021, 307, 281, 829, 428, 20678, 2539, 2316, 666, 4265, + 51132], "temperature": 0.0, "avg_logprob": -0.31873972519584326, "compression_ratio": + 1.1358024691358024, "no_speech_prob": 0.013554328121244907}, {"id": 223, "seek": + 238760, "start": 2388.56, "end": 2402.16, "text": " and to combine vector search + with super search as well. 
So you can just filter your data based on", "tokens": + [50412, 293, 281, 10432, 8062, 3164, 365, 1687, 3164, 382, 731, 13, 407, 291, 393, + 445, 6608, 428, 1412, 2361, 322, 51092], "temperature": 0.0, "avg_logprob": -0.3026287749006942, + "compression_ratio": 1.373913043478261, "no_speech_prob": 0.025765538215637207}, + {"id": 224, "seek": 238760, "start": 2403.6, "end": 2409.8399999999997, "text": + " their payload information at the same time as vector search.", "tokens": [51164, + 641, 30918, 1589, 412, 264, 912, 565, 382, 8062, 3164, 13, 51476], "temperature": + 0.0, "avg_logprob": -0.3026287749006942, "compression_ratio": 1.373913043478261, + "no_speech_prob": 0.025765538215637207}, {"id": 225, "seek": 240984, "start": 2410.32, + "end": 2425.04, "text": " I think these are other than that, beyond beside my research + and engineering practices,", "tokens": [50388, 286, 519, 613, 366, 661, 813, 300, + 11, 4399, 15726, 452, 2132, 293, 7043, 7525, 11, 51124], "temperature": 0.0, "avg_logprob": + -0.3671827567251105, "compression_ratio": 1.3615384615384616, "no_speech_prob": + 0.018351832404732704}, {"id": 226, "seek": 240984, "start": 2425.76, "end": 2437.28, + "text": " I''m also maintaining a repository called Awesome Metric Learning and I''m + regularly sharing new", "tokens": [51160, 286, 478, 611, 14916, 257, 25841, 1219, + 6049, 13143, 15205, 293, 286, 478, 11672, 5414, 777, 51736], "temperature": 0.0, + "avg_logprob": -0.3671827567251105, "compression_ratio": 1.3615384615384616, "no_speech_prob": + 0.018351832404732704}, {"id": 227, "seek": 243728, "start": 2438.1600000000003, + "end": 2448.88, "text": " developments in the domain of metric learning with personal + annotations. 
So I think it might be", "tokens": [50408, 20862, 294, 264, 9274, 295, + 20678, 2539, 365, 2973, 25339, 763, 13, 407, 286, 519, 309, 1062, 312, 50944], "temperature": + 0.0, "avg_logprob": -0.20294805673452523, "compression_ratio": 1.3333333333333333, + "no_speech_prob": 0.011788278818130493}, {"id": 228, "seek": 243728, "start": 2448.88, + "end": 2459.52, "text": " also quite helpful for those who want to find their ways + in this domain.", "tokens": [50944, 611, 1596, 4961, 337, 729, 567, 528, 281, 915, + 641, 2098, 294, 341, 9274, 13, 51476], "temperature": 0.0, "avg_logprob": -0.20294805673452523, + "compression_ratio": 1.3333333333333333, "no_speech_prob": 0.011788278818130493}, + {"id": 229, "seek": 245952, "start": 2459.6, "end": 2465.7599999999998, "text": + " That''s awesome. Thank you. I will certainly make sure to add all of these links + in the", "tokens": [50368, 663, 311, 3476, 13, 1044, 291, 13, 286, 486, 3297, 652, + 988, 281, 909, 439, 295, 613, 6123, 294, 264, 50676], "temperature": 0.0, "avg_logprob": + -0.18772247225739236, "compression_ratio": 1.6666666666666667, "no_speech_prob": + 0.02823718637228012}, {"id": 230, "seek": 245952, "start": 2466.56, "end": 2473.68, + "text": " description notes, in the notes to this podcast and usually all of these + podcasts that I do,", "tokens": [50716, 3855, 5570, 11, 294, 264, 5570, 281, 341, + 7367, 293, 2673, 439, 295, 613, 24045, 300, 286, 360, 11, 51072], "temperature": + 0.0, "avg_logprob": -0.18772247225739236, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.02823718637228012}, {"id": 231, "seek": 245952, "start": 2473.68, + "end": 2478.64, "text": " they have a lot of links that actually you almost can + use as an educational material. 
And", "tokens": [51072, 436, 362, 257, 688, 295, + 6123, 300, 767, 291, 1920, 393, 764, 382, 364, 10189, 2527, 13, 400, 51320], "temperature": + 0.0, "avg_logprob": -0.18772247225739236, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.02823718637228012}, {"id": 232, "seek": 245952, "start": 2478.64, + "end": 2486.72, "text": " thanks so much for adding so much information here. And + I actually wanted to drill a little bit", "tokens": [51320, 3231, 370, 709, 337, + 5127, 370, 709, 1589, 510, 13, 400, 286, 767, 1415, 281, 11392, 257, 707, 857, 51724], + "temperature": 0.0, "avg_logprob": -0.18772247225739236, "compression_ratio": 1.6666666666666667, + "no_speech_prob": 0.02823718637228012}, {"id": 233, "seek": 248672, "start": 2486.72, + "end": 2492.24, "text": " again into that example that brilliant example you gave + about predicting sort of what''s next when I", "tokens": [50364, 797, 666, 300, 1365, + 300, 10248, 1365, 291, 2729, 466, 32884, 1333, 295, 437, 16160, 562, 286, 50640], + "temperature": 0.0, "avg_logprob": -0.2181310551140898, "compression_ratio": 1.6324786324786325, + "no_speech_prob": 0.0064032007940113544}, {"id": 234, "seek": 248672, "start": 2492.24, + "end": 2497.04, "text": " type. 
Actually, I used this feature quite a lot and especially + like when you''re on the go and", "tokens": [50640, 2010, 13, 5135, 11, 286, 1143, + 341, 4111, 1596, 257, 688, 293, 2318, 411, 562, 291, 434, 322, 264, 352, 293, 50880], + "temperature": 0.0, "avg_logprob": -0.2181310551140898, "compression_ratio": 1.6324786324786325, + "no_speech_prob": 0.0064032007940113544}, {"id": 235, "seek": 248672, "start": 2497.04, + "end": 2503.4399999999996, "text": " today I think I''ve used it somewhere with + a Gmail, I was on the go and I had only one finger,", "tokens": [50880, 965, 286, + 519, 286, 600, 1143, 309, 4079, 365, 257, 36732, 11, 286, 390, 322, 264, 352, 293, + 286, 632, 787, 472, 5984, 11, 51200], "temperature": 0.0, "avg_logprob": -0.2181310551140898, + "compression_ratio": 1.6324786324786325, "no_speech_prob": 0.0064032007940113544}, + {"id": 236, "seek": 248672, "start": 2503.4399999999996, "end": 2510.3999999999996, + "text": " right? So just holding my phone as I go and there was a question and the + answer was something,", "tokens": [51200, 558, 30, 407, 445, 5061, 452, 2593, 382, + 286, 352, 293, 456, 390, 257, 1168, 293, 264, 1867, 390, 746, 11, 51548], "temperature": + 0.0, "avg_logprob": -0.2181310551140898, "compression_ratio": 1.6324786324786325, + "no_speech_prob": 0.0064032007940113544}, {"id": 237, "seek": 251040, "start": 2510.4, + "end": 2516.4, "text": " yes, it happened or yes, it did. 
And maybe it wasn''t the + best sort of semantical choice or maybe", "tokens": [50364, 2086, 11, 309, 2011, + 420, 2086, 11, 309, 630, 13, 400, 1310, 309, 2067, 380, 264, 1151, 1333, 295, 4361, + 394, 804, 3922, 420, 1310, 50664], "temperature": 0.0, "avg_logprob": -0.1602321261451358, + "compression_ratio": 1.6527196652719665, "no_speech_prob": 0.005754351615905762}, + {"id": 238, "seek": 251040, "start": 2516.4, "end": 2521.84, "text": " not the most + elegant choice linguistically, like maybe I would add more color, but because I + was on", "tokens": [50664, 406, 264, 881, 21117, 3922, 21766, 20458, 11, 411, 1310, + 286, 576, 909, 544, 2017, 11, 457, 570, 286, 390, 322, 50936], "temperature": 0.0, + "avg_logprob": -0.1602321261451358, "compression_ratio": 1.6527196652719665, "no_speech_prob": + 0.005754351615905762}, {"id": 239, "seek": 251040, "start": 2521.84, "end": 2529.12, + "text": " the go, it was fine to save that, you know, few minutes and don''t be + distracted by the phone. So I", "tokens": [50936, 264, 352, 11, 309, 390, 2489, + 281, 3155, 300, 11, 291, 458, 11, 1326, 2077, 293, 500, 380, 312, 21658, 538, 264, + 2593, 13, 407, 286, 51300], "temperature": 0.0, "avg_logprob": -0.1602321261451358, + "compression_ratio": 1.6527196652719665, "no_speech_prob": 0.005754351615905762}, + {"id": 240, "seek": 251040, "start": 2529.12, "end": 2538.64, "text": " just pressed + that button and off it goes. And so that''s a fantastic feature. So I wanted to + sort of", "tokens": [51300, 445, 17355, 300, 2960, 293, 766, 309, 1709, 13, 400, + 370, 300, 311, 257, 5456, 4111, 13, 407, 286, 1415, 281, 1333, 295, 51776], "temperature": + 0.0, "avg_logprob": -0.1602321261451358, "compression_ratio": 1.6527196652719665, + "no_speech_prob": 0.005754351615905762}, {"id": 241, "seek": 253864, "start": 2539.2, + "end": 2544.64, "text": " open up the process a little bit of metric learning in + this case. 
Basically, I imagine and please", "tokens": [50392, 1269, 493, 264, 1399, + 257, 707, 857, 295, 20678, 2539, 294, 341, 1389, 13, 8537, 11, 286, 3811, 293, 1767, + 50664], "temperature": 0.0, "avg_logprob": -0.14401057215020208, "compression_ratio": + 1.6570247933884297, "no_speech_prob": 0.003972918260842562}, {"id": 242, "seek": + 253864, "start": 2544.64, "end": 2552.0, "text": " correct me if I''m wrong. As + an input, I would have, let''s say, a pair of sentences that what was the", "tokens": + [50664, 3006, 385, 498, 286, 478, 2085, 13, 1018, 364, 4846, 11, 286, 576, 362, + 11, 718, 311, 584, 11, 257, 6119, 295, 16579, 300, 437, 390, 264, 51032], "temperature": + 0.0, "avg_logprob": -0.14401057215020208, "compression_ratio": 1.6570247933884297, + "no_speech_prob": 0.003972918260842562}, {"id": 243, "seek": 253864, "start": 2552.0, + "end": 2557.7599999999998, "text": " input and what was the prediction and that + prediction could be either curated by experts or we could", "tokens": [51032, 4846, + 293, 437, 390, 264, 17630, 293, 300, 17630, 727, 312, 2139, 47851, 538, 8572, 420, + 321, 727, 51320], "temperature": 0.0, "avg_logprob": -0.14401057215020208, "compression_ratio": + 1.6570247933884297, "no_speech_prob": 0.003972918260842562}, {"id": 244, "seek": + 253864, "start": 2557.7599999999998, "end": 2565.2, "text": " have mined from the + logs, whatever. So let''s say we have a corpus like this, right? 
So we can employ", + "tokens": [51320, 362, 36707, 490, 264, 20820, 11, 2035, 13, 407, 718, 311, 584, + 321, 362, 257, 1181, 31624, 411, 341, 11, 558, 30, 407, 321, 393, 3188, 51692], + "temperature": 0.0, "avg_logprob": -0.14401057215020208, "compression_ratio": 1.6570247933884297, + "no_speech_prob": 0.003972918260842562}, {"id": 245, "seek": 256520, "start": 2565.2, + "end": 2570.64, "text": " sequence to sequence model or some other model to actually + train like our first first predictor.", "tokens": [50364, 8310, 281, 8310, 2316, + 420, 512, 661, 2316, 281, 767, 3847, 411, 527, 700, 700, 6069, 284, 13, 50636], + "temperature": 0.0, "avg_logprob": -0.1332760530359605, "compression_ratio": 1.748878923766816, + "no_speech_prob": 0.018314525485038757}, {"id": 246, "seek": 256520, "start": 2571.6, + "end": 2576.8799999999997, "text": " So at which point would you start thinking + and how exactly would you start thinking about metric", "tokens": [50684, 407, 412, + 597, 935, 576, 291, 722, 1953, 293, 577, 2293, 576, 291, 722, 1953, 466, 20678, + 50948], "temperature": 0.0, "avg_logprob": -0.1332760530359605, "compression_ratio": + 1.748878923766816, "no_speech_prob": 0.018314525485038757}, {"id": 247, "seek": + 256520, "start": 2576.8799999999997, "end": 2585.04, "text": " learning? Like how + can I change the behavior of my model? Like will I replace like last layer of my", + "tokens": [50948, 2539, 30, 1743, 577, 393, 286, 1319, 264, 5223, 295, 452, 2316, + 30, 1743, 486, 286, 7406, 411, 1036, 4583, 295, 452, 51356], "temperature": 0.0, + "avg_logprob": -0.1332760530359605, "compression_ratio": 1.748878923766816, "no_speech_prob": + 0.018314525485038757}, {"id": 248, "seek": 256520, "start": 2585.04, "end": 2589.6, + "text": " neural network with like different layer that I have learned from metric + learning? 
Can you a bit", "tokens": [51356, 18161, 3209, 365, 411, 819, 4583, 300, + 286, 362, 3264, 490, 20678, 2539, 30, 1664, 291, 257, 857, 51584], "temperature": + 0.0, "avg_logprob": -0.1332760530359605, "compression_ratio": 1.748878923766816, + "no_speech_prob": 0.018314525485038757}, {"id": 249, "seek": 258960, "start": 2589.6, + "end": 2597.04, "text": " open up this kitchen for me? Thanks. Actually, this smart + reply has", "tokens": [50364, 1269, 493, 341, 6525, 337, 385, 30, 2561, 13, 5135, + 11, 341, 4069, 5847, 575, 50736], "temperature": 0.0, "avg_logprob": -0.22520437240600585, + "compression_ratio": 1.2698412698412698, "no_speech_prob": 0.0060164667665958405}, + {"id": 250, "seek": 258960, "start": 2598.4, "end": 2611.04, "text": " its own paper + by Google as well and they are really doing a great job to describe the whole", + "tokens": [50804, 1080, 1065, 3035, 538, 3329, 382, 731, 293, 436, 366, 534, 884, + 257, 869, 1691, 281, 6786, 264, 1379, 51436], "temperature": 0.0, "avg_logprob": + -0.22520437240600585, "compression_ratio": 1.2698412698412698, "no_speech_prob": + 0.0060164667665958405}, {"id": 251, "seek": 261104, "start": 2611.04, "end": 2623.36, + "text": " logic to whole design decisions behind this feature. 
As you already said, + the suggested", "tokens": [50364, 9952, 281, 1379, 1715, 5327, 2261, 341, 4111, + 13, 1018, 291, 1217, 848, 11, 264, 10945, 50980], "temperature": 0.0, "avg_logprob": + -0.2349872134980701, "compression_ratio": 1.3909774436090225, "no_speech_prob": + 0.005320590455085039}, {"id": 252, "seek": 261104, "start": 2623.36, "end": 2638.48, + "text": " duplicates are not the best, the most specific replies that you can imagine, + but this is actually", "tokens": [50980, 17154, 1024, 366, 406, 264, 1151, 11, 264, + 881, 2685, 42289, 300, 291, 393, 3811, 11, 457, 341, 307, 767, 51736], "temperature": + 0.0, "avg_logprob": -0.2349872134980701, "compression_ratio": 1.3909774436090225, + "no_speech_prob": 0.005320590455085039}, {"id": 253, "seek": 263848, "start": 2639.44, + "end": 2650.8, "text": " are spied design because they do not generate those replies, + but they have a large collection of", "tokens": [50412, 366, 637, 1091, 1715, 570, + 436, 360, 406, 8460, 729, 42289, 11, 457, 436, 362, 257, 2416, 5765, 295, 50980], + "temperature": 0.0, "avg_logprob": -0.2822672681110661, "compression_ratio": 1.4651162790697674, + "no_speech_prob": 0.009230163879692554}, {"id": 254, "seek": 263848, "start": 2651.44, + "end": 2663.44, "text": " such replies and they should be as flexible as possible + to fit into different circumstances.", "tokens": [51012, 1270, 42289, 293, 436, + 820, 312, 382, 11358, 382, 1944, 281, 3318, 666, 819, 9121, 13, 51612], "temperature": + 0.0, "avg_logprob": -0.2822672681110661, "compression_ratio": 1.4651162790697674, + "no_speech_prob": 0.009230163879692554}, {"id": 255, "seek": 266344, "start": 2663.44, + "end": 2676.96, "text": " So they shouldn''t have any specific references to a specific + sentence in the conversation.", "tokens": [50364, 407, 436, 4659, 380, 362, 604, + 2685, 15400, 281, 257, 2685, 8174, 294, 264, 3761, 13, 51040], "temperature": 0.0, + "avg_logprob": -0.2183765411376953, "compression_ratio": 
1.4324324324324325, "no_speech_prob": + 0.004382849670946598}, {"id": 256, "seek": 266344, "start": 2676.96, "end": 2684.96, + "text": " So that should be a generic enough to apply almost any conversation.", + "tokens": [51040, 407, 300, 820, 312, 257, 19577, 1547, 281, 3079, 1920, 604, 3761, + 13, 51440], "temperature": 0.0, "avg_logprob": -0.2183765411376953, "compression_ratio": + 1.4324324324324325, "no_speech_prob": 0.004382849670946598}, {"id": 257, "seek": + 268496, "start": 2685.6, "end": 2698.8, "text": " For the training slide, yeah actually, + they filter a large collection from the different platforms", "tokens": [50396, + 1171, 264, 3097, 4137, 11, 1338, 767, 11, 436, 6608, 257, 2416, 5765, 490, 264, + 819, 9473, 51056], "temperature": 0.0, "avg_logprob": -0.40209515889485675, "compression_ratio": + 1.4957264957264957, "no_speech_prob": 0.032382991164922714}, {"id": 258, "seek": + 268496, "start": 2699.52, "end": 2709.36, "text": " they are running Gmail and other + platforms and they filter short replies and", "tokens": [51092, 436, 366, 2614, + 36732, 293, 661, 9473, 293, 436, 6608, 2099, 42289, 293, 51584], "temperature": + 0.0, "avg_logprob": -0.40209515889485675, "compression_ratio": 1.4957264957264957, + "no_speech_prob": 0.032382991164922714}, {"id": 259, "seek": 270936, "start": 2710.0, + "end": 2727.92, "text": " thematically more broad samples such as as you gave as + an example. Yes, I did or no, I didn''t", "tokens": [50396, 552, 5030, 544, 4152, + 10938, 1270, 382, 382, 291, 2729, 382, 364, 1365, 13, 1079, 11, 286, 630, 420, 572, + 11, 286, 994, 380, 51292], "temperature": 0.0, "avg_logprob": -0.34926251002720426, + "compression_ratio": 1.1071428571428572, "no_speech_prob": 0.005918622016906738}, + {"id": 260, "seek": 272792, "start": 2728.8, "end": 2740.0, "text": " does it have + such examples. And the actual training algorithm works like this. 
They", "tokens": + [50408, 775, 309, 362, 1270, 5110, 13, 400, 264, 3539, 3097, 9284, 1985, 411, 341, + 13, 814, 50968], "temperature": 0.0, "avg_logprob": -0.3826844930648804, "compression_ratio": + 1.3410852713178294, "no_speech_prob": 0.015645449981093407}, {"id": 261, "seek": + 272792, "start": 2741.44, "end": 2753.28, "text": " actually come up with a very + creative, very clever, loss function to just train with", "tokens": [51040, + 767, 808, 493, 365, 257, 588, 5880, 11, 588, 13494, 11, 2731, 2445, 337, 445, 257, + 17674, 365, 51632], "temperature": 0.0, "avg_logprob": -0.3826844930648804, "compression_ratio": + 1.3410852713178294, "no_speech_prob": 0.015645449981093407}, {"id": 262, "seek": + 275328, "start": 2753.28, "end": 2768.4, "text": " this model. They have only a + pair of two samples and there is no other label or information.", "tokens": [50364, + 341, 2316, 13, 814, 362, 787, 257, 6119, 295, 732, 10938, 293, 456, 307, 572, 661, + 7645, 420, 1589, 13, 51120], "temperature": 0.0, "avg_logprob": -0.27707455555597943, + "compression_ratio": 1.1219512195121952, "no_speech_prob": 0.01523177232593298}, + {"id": 263, "seek": 276840, "start": 2769.2000000000003, "end": 2783.6800000000003, + "text": " We only have one input and one ground truth, we have no other scoring, + no other label or", "tokens": [50404, 492, 787, 362, 472, 4846, 293, 472, 2727, + 3494, 11, 321, 362, 572, 661, 22358, 11, 572, 661, 7645, 420, 51128], "temperature": + 0.0, "avg_logprob": -0.402416189511617, "compression_ratio": 1.1733333333333333, + "no_speech_prob": 0.04214279353618622}, {"id": 264, "seek": 278368, "start": 2784.3999999999996, + "end": 2794.96, "text": " anything else. 
So we only get a batch of, for example,", + "tokens": [50400, 1340, 1646, 13, 407, 321, 787, 483, 257, 15245, 295, 11, 337, + 1365, 11, 50928], "temperature": 0.0, "avg_logprob": -0.24027713569434914, "compression_ratio": + 1.3557692307692308, "no_speech_prob": 0.029186204075813293}, {"id": 265, "seek": + 278368, "start": 2796.0, "end": 2807.68, "text": " n samples and we encode those + two n samples because we have two samples per pair", "tokens": [50980, 293, + 10938, 293, 321, 2058, 1429, 729, 732, 293, 10938, 570, 321, 362, 732, 10938, 700, + 3028, 51564], "temperature": 0.0, "avg_logprob": -0.24027713569434914, "compression_ratio": + 1.3557692307692308, "no_speech_prob": 0.029186204075813293}, {"id": 266, "seek": + 280768, "start": 2808.56, "end": 2822.24, "text": " and we end up with two n samples. + And once we encode them with our encoder, we can compute a", "tokens": [50408, 293, + 321, 917, 493, 365, 732, 293, 10938, 13, 400, 1564, 321, 2058, 1429, 552, 365, 527, + 2058, 19866, 11, 321, 393, 14722, 257, 51092], "temperature": 0.0, "avg_logprob": + -0.26709564364686306, "compression_ratio": 1.5241935483870968, "no_speech_prob": + 0.011978224851191044}, {"id": 267, "seek": 280768, "start": 2822.24, "end": 2834.24, + "text": " distance matrix between these all outputs of the encoder. A distance matrix + is a two-dimensional", "tokens": [51092, 4560, 8141, 1296, 613, 439, 12300, 295, + 264, 2058, 19866, 13, 316, 4560, 8141, 307, 257, 732, 12, 18759, 51692], "temperature": + 0.0, "avg_logprob": -0.26709564364686306, "compression_ratio": 1.5241935483870968, + "no_speech_prob": 0.011978224851191044}, {"id": 268, "seek": 283424, "start": 2835.2, + "end": 2850.24, "text": " matrix to define every distance value between all possible + pairs in a collection. 
So we have", "tokens": [50412, 8141, 281, 6964, 633, 4560, + 2158, 1296, 439, 1944, 15494, 294, 257, 5765, 13, 407, 321, 362, 51164], "temperature": + 0.0, "avg_logprob": -0.26274358658563524, "compression_ratio": 1.1358024691358024, + "no_speech_prob": 0.006556063424795866}, {"id": 269, "seek": 285024, "start": 2850.72, + "end": 2865.4399999999996, "text": " a matrix of size n times n and we already + have these samples as pairs. We already know", "tokens": [50388, 257, 8141, 295, + 1732, 293, 1413, 293, 293, 321, 1217, 362, 613, 10938, 382, 15494, 13, 492, 1217, + 458, 51124], "temperature": 0.0, "avg_logprob": -0.36319554370382556, "compression_ratio": + 1.1538461538461537, "no_speech_prob": 0.01102994754910469}, {"id": 270, "seek": + 286544, "start": 2865.92, "end": 2879.12, "text": " that there is a companion target + sample for the sample, for the first sample at index zero,", "tokens": [50388, + 300, 456, 307, 257, 2237, 3779, 10938, 337, 264, 6889, 11, 337, 264, 700, 6889, + 412, 8186, 4018, 11, 51048], "temperature": 0.0, "avg_logprob": -0.37609410840411517, + "compression_ratio": 1.801980198019802, "no_speech_prob": 0.01228384394198656}, + {"id": 271, "seek": 286544, "start": 2879.12, "end": 2893.6, "text": " the companion + sample should also be at index zero for sample at index one. The companion sample", + "tokens": [51048, 264, 2237, 6889, 820, 611, 312, 412, 8186, 4018, 337, 6889, 412, + 8186, 472, 13, 440, 2237, 6889, 51772], "temperature": 0.0, "avg_logprob": -0.37609410840411517, + "compression_ratio": 1.801980198019802, "no_speech_prob": 0.01228384394198656}, + {"id": 272, "seek": 289360, "start": 2894.08, "end": 2905.92, "text": " sample should + be at index one. 
So we can generate these target labels just based on this", "tokens": + [50388, 6889, 820, 312, 412, 8186, 472, 13, 407, 321, 393, 8460, 613, 3779, 16949, + 445, 2361, 322, 341, 50980], "temperature": 0.0, "avg_logprob": -0.11794736773468727, + "compression_ratio": 1.3984375, "no_speech_prob": 0.003321515629068017}, {"id": + 273, "seek": 289360, "start": 2905.92, "end": 2918.24, "text": " information. So + it''s like a categorical classification now. So for the first sample in the", "tokens": + [50980, 1589, 13, 407, 309, 311, 411, 257, 19250, 804, 21538, 586, 13, 407, 337, + 264, 700, 6889, 294, 264, 51596], "temperature": 0.0, "avg_logprob": -0.11794736773468727, + "compression_ratio": 1.3984375, "no_speech_prob": 0.003321515629068017}, {"id": + 274, "seek": 291824, "start": 2919.2, "end": 2932.8799999999997, "text": " pair + at index zero, the categorical label should be zero and others all index values + should be", "tokens": [50412, 6119, 412, 8186, 4018, 11, 264, 19250, 804, 7645, + 820, 312, 4018, 293, 2357, 439, 8186, 4190, 820, 312, 51096], "temperature": 0.0, + "avg_logprob": -0.179537586543871, "compression_ratio": 1.4761904761904763, "no_speech_prob": + 0.00920387264341116}, {"id": 275, "seek": 291824, "start": 2933.68, "end": 2942.7999999999997, + "text": " wrong. 
So we can just encode this information as a one-hot encoding + and we can simply use", "tokens": [51136, 2085, 13, 407, 321, 393, 445, 2058, 1429, + 341, 1589, 382, 257, 472, 12, 12864, 43430, 293, 321, 393, 2935, 764, 51592], "temperature": + 0.0, "avg_logprob": -0.179537586543871, "compression_ratio": 1.4761904761904763, + "no_speech_prob": 0.00920387264341116}, {"id": 276, "seek": 294280, "start": 2943.76, + "end": 2952.8, "text": " a cross-entropy loss once we encode this information + as a one-hot encoding and we can", "tokens": [50412, 257, 18965, 12071, 311, 30867, + 4470, 1564, 321, 2058, 1429, 341, 1589, 382, 257, 472, 12, 12864, 43430, 293, 321, + 393, 50864], "temperature": 0.0, "avg_logprob": -0.2634368159554221, "compression_ratio": + 1.3740458015267176, "no_speech_prob": 0.01413606759160757}, {"id": 277, "seek": + 294280, "start": 2953.52, "end": 2964.1600000000003, "text": " train this model + with this loss. So it is called multiple negatives ranking loss because in", "tokens": + [50900, 3847, 341, 2316, 365, 341, 4470, 13, 407, 309, 307, 1219, 3866, 3671, 17833, + 4470, 570, 294, 51432], "temperature": 0.0, "avg_logprob": -0.2634368159554221, + "compression_ratio": 1.3740458015267176, "no_speech_prob": 0.01413606759160757}, + {"id": 278, "seek": 296416, "start": 2965.12, "end": 2980.08, "text": " in some + way we rank all possible replies in a batch with multiple negatives and only one + positive", "tokens": [50412, 294, 512, 636, 321, 6181, 439, 1944, 42289, 294, 257, + 15245, 365, 3866, 40019, 293, 787, 472, 3353, 51160], "temperature": 0.0, "avg_logprob": + -0.2694201252677224, "compression_ratio": 1.4222222222222223, "no_speech_prob": + 0.0041774264536798}, {"id": 279, "seek": 296416, "start": 2980.96, "end": 2993.12, + "text": " sample. Yeah. 
And so you would train this network with this with this + loss function and so the", "tokens": [51204, 6889, 13, 865, 13, 400, 370, 291, 576, + 3847, 341, 3209, 365, 341, 365, 341, 4470, 2445, 293, 370, 264, 51812], "temperature": + 0.0, "avg_logprob": -0.2694201252677224, "compression_ratio": 1.4222222222222223, + "no_speech_prob": 0.0041774264536798}, {"id": 280, "seek": 299312, "start": 2993.12, + "end": 3002.08, "text": " output will be what? Like will it be like the optimal + metric or optimal? Yeah. In one, we train this", "tokens": [50364, 5598, 486, 312, + 437, 30, 1743, 486, 309, 312, 411, 264, 16252, 20678, 420, 16252, 30, 865, 13, 682, + 472, 11, 321, 3847, 341, 50812], "temperature": 0.0, "avg_logprob": -0.2433856208369417, + "compression_ratio": 1.4565217391304348, "no_speech_prob": 0.006256880704313517}, + {"id": 281, "seek": 299312, "start": 3003.2, "end": 3021.12, "text": " model, we + end up with a model that can encode a sentence in such a way that that vector can + retrieve", "tokens": [50868, 2316, 11, 321, 917, 493, 365, 257, 2316, 300, 393, + 2058, 1429, 257, 8174, 294, 1270, 257, 636, 300, 300, 8062, 393, 30254, 51764], + "temperature": 0.0, "avg_logprob": -0.2433856208369417, "compression_ratio": 1.4565217391304348, + "no_speech_prob": 0.006256880704313517}, {"id": 282, "seek": 302112, "start": 3021.12, + "end": 3034.3199999999997, "text": " the most relevant vectors from a collection + of possible replies. So after we train this model,", "tokens": [50364, 264, 881, + 7340, 18875, 490, 257, 5765, 295, 1944, 42289, 13, 407, 934, 321, 3847, 341, 2316, + 11, 51024], "temperature": 0.0, "avg_logprob": -0.10508264194835316, "compression_ratio": + 1.4538461538461538, "no_speech_prob": 0.008691701106727123}, {"id": 283, "seek": + 302112, "start": 3034.3199999999997, "end": 3046.3199999999997, "text": " we encode + all possible replies and index them in a vector database. 
And at the inference time,", + "tokens": [51024, 321, 2058, 1429, 439, 1944, 42289, 293, 8186, 552, 294, 257, 8062, + 8149, 13, 400, 412, 264, 38253, 565, 11, 51624], "temperature": 0.0, "avg_logprob": + -0.10508264194835316, "compression_ratio": 1.4538461538461538, "no_speech_prob": + 0.008691701106727123}, {"id": 284, "seek": 304632, "start": 3046.8, "end": 3059.76, + "text": " we encode the user input again with this model and make a query, a vector + search query to that", "tokens": [50388, 321, 2058, 1429, 264, 14924, 4846, 797, + 365, 341, 2316, 293, 652, 257, 14581, 11, 257, 8062, 4009, 14581, 281, 300, 51036], + "temperature": 0.0, "avg_logprob": -0.32258164648916204, "compression_ratio": 1.4104477611940298, + "no_speech_prob": 0.008722199127078056}, {"id": 285, "seek": 304632, "start": 3059.76, + "end": 3070.88, "text": " pre-indexed database of possible replies and we can get, + for example, a car, a chain, a nearest", "tokens": [51036, 659, 12, 471, 3121, 8149, + 295, 1944, 42289, 293, 321, 393, 483, 11, 337, 1365, 11, 257, 1032, 11, 257, 5021, + 11, 257, 23831, 51592], "temperature": 0.0, "avg_logprob": -0.32258164648916204, + "compression_ratio": 1.4104477611940298, "no_speech_prob": 0.008722199127078056}, + {"id": 286, "seek": 307088, "start": 3070.88, "end": 3082.7200000000003, "text": + " neighbor to that vector to suggest to use it. Yeah. 
I mean, after you explain + this, like to me,", "tokens": [50364, 5987, 281, 300, 8062, 281, 3402, 281, 764, + 309, 13, 865, 13, 286, 914, 11, 934, 291, 2903, 341, 11, 411, 281, 385, 11, 50956], + "temperature": 0.0, "avg_logprob": -0.24846298554364374, "compression_ratio": 1.676300578034682, + "no_speech_prob": 0.007206438574939966}, {"id": 287, "seek": 307088, "start": 3082.7200000000003, + "end": 3089.6, "text": " like the mental image that evokes is that we sort of like + learn rather than learning the metric,", "tokens": [50956, 411, 264, 4973, 3256, + 300, 1073, 8606, 307, 300, 321, 1333, 295, 411, 1466, 2831, 813, 2539, 264, 20678, + 11, 51300], "temperature": 0.0, "avg_logprob": -0.24846298554364374, "compression_ratio": + 1.676300578034682, "no_speech_prob": 0.007206438574939966}, {"id": 288, "seek": + 307088, "start": 3089.6, "end": 3095.36, "text": " we''re actually learning the + vectors themselves. We''re learning the best vector representation for", "tokens": + [51300, 321, 434, 767, 2539, 264, 18875, 2969, 13, 492, 434, 2539, 264, 1151, 8062, + 10290, 337, 51588], "temperature": 0.0, "avg_logprob": -0.24846298554364374, "compression_ratio": + 1.676300578034682, "no_speech_prob": 0.007206438574939966}, {"id": 289, "seek": + 309536, "start": 3095.44, "end": 3102.32, "text": " our object to satisfy some goal, + right? Let''s say that for this sentence, the closest", "tokens": [50368, 527, 2657, + 281, 19319, 512, 3387, 11, 558, 30, 961, 311, 584, 300, 337, 341, 8174, 11, 264, + 13699, 50712], "temperature": 0.0, "avg_logprob": -0.24090723557905716, "compression_ratio": + 1.3235294117647058, "no_speech_prob": 0.004641443956643343}, {"id": 290, "seek": + 309536, "start": 3102.32, "end": 3111.04, "text": " reply should be this in some + sense. Yeah, exactly. 
Actually, the model learns a representation", "tokens": [50712, + 16972, 820, 312, 341, 294, 512, 2020, 13, 865, 11, 2293, 13, 5135, 11, 264, 2316, + 27152, 257, 10290, 51148], "temperature": 0.0, "avg_logprob": -0.24090723557905716, + "compression_ratio": 1.3235294117647058, "no_speech_prob": 0.004641443956643343}, + {"id": 291, "seek": 311104, "start": 3111.12, "end": 3126.64, "text": " that satisfies + the satisfies our purpose. So in some way, we can fully pick any distance", "tokens": + [50368, 300, 44271, 264, 44271, 527, 4334, 13, 407, 294, 512, 636, 11, 321, 393, + 4498, 1888, 604, 4560, 51144], "temperature": 0.0, "avg_logprob": -0.24228973388671876, + "compression_ratio": 1.3360655737704918, "no_speech_prob": 0.02719750814139843}, + {"id": 292, "seek": 311104, "start": 3127.68, "end": 3138.4, "text": " metric based + on this intuition. Yeah. So the second part of your question,", "tokens": [51196, + 20678, 2361, 322, 341, 24002, 13, 865, 13, 407, 264, 1150, 644, 295, 428, 1168, + 11, 51732], "temperature": 0.0, "avg_logprob": -0.24228973388671876, "compression_ratio": + 1.3360655737704918, "no_speech_prob": 0.02719750814139843}, {"id": 293, "seek": + 313840, "start": 3139.12, "end": 3151.28, "text": " when we can think about metric + learning. 
Actually, metric learning can be applied to almost any domain", "tokens": + [50400, 562, 321, 393, 519, 466, 20678, 2539, 13, 5135, 11, 20678, 2539, 393, 312, + 6456, 281, 1920, 604, 9274, 51008], "temperature": 0.0, "avg_logprob": -0.15340256690979004, + "compression_ratio": 1.5, "no_speech_prob": 0.0053184013813734055}, {"id": 294, + "seek": 313840, "start": 3152.4, "end": 3161.92, "text": " of problems, but there + are some particular cases where metric learning really shines over", "tokens": [51064, + 295, 2740, 11, 457, 456, 366, 512, 1729, 3331, 689, 20678, 2539, 534, 28056, 670, + 51540], "temperature": 0.0, "avg_logprob": -0.15340256690979004, "compression_ratio": + 1.5, "no_speech_prob": 0.0053184013813734055}, {"id": 295, "seek": 316192, "start": + 3162.7200000000003, "end": 3175.2000000000003, "text": " other alternatives. These + are actually data scarce regimes, especially for labeled, if you are", "tokens": + [50404, 661, 20478, 13, 1981, 366, 767, 1412, 41340, 45738, 11, 2318, 337, 21335, + 11, 498, 291, 366, 51028], "temperature": 0.0, "avg_logprob": -0.24727450476752388, + "compression_ratio": 1.441860465116279, "no_speech_prob": 0.004401012323796749}, + {"id": 296, "seek": 316192, "start": 3176.48, "end": 3185.84, "text": " short for + labeled data, you can still do a pretty good job with, for example, auto encoder,", + "tokens": [51092, 2099, 337, 21335, 1412, 11, 291, 393, 920, 360, 257, 1238, 665, + 1691, 365, 11, 337, 1365, 11, 8399, 2058, 19866, 11, 51560], "temperature": 0.0, + "avg_logprob": -0.24727450476752388, "compression_ratio": 1.441860465116279, "no_speech_prob": + 0.004401012323796749}, {"id": 297, "seek": 318584, "start": 3185.84, "end": 3195.6000000000004, + "text": " as we already discussed previously. 
And also, if you have", "tokens": + [50364, 382, 321, 1217, 7152, 8046, 13, 400, 611, 11, 498, 291, 362, 50852], "temperature": + 0.0, "avg_logprob": -0.3226981664958753, "compression_ratio": 1.2767857142857142, + "no_speech_prob": 0.004759341012686491}, {"id": 298, "seek": 318584, "start": 3197.1200000000003, + "end": 3210.0, "text": " rapid-changing distributions, it''s again, very helpful. + And if you have, for example,", "tokens": [50928, 7558, 12, 27123, 37870, 11, 309, + 311, 797, 11, 588, 4961, 13, 400, 498, 291, 362, 11, 337, 1365, 11, 51572], "temperature": + 0.0, "avg_logprob": -0.3226981664958753, "compression_ratio": 1.2767857142857142, + "no_speech_prob": 0.004759341012686491}, {"id": 299, "seek": 321000, "start": 3210.0, + "end": 3225.52, "text": " a very, very high number of classes, again, metric learning + can do a good job. Finally, metric learning", "tokens": [50364, 257, 588, 11, 588, + 1090, 1230, 295, 5359, 11, 797, 11, 20678, 2539, 393, 360, 257, 665, 1691, 13, 6288, + 11, 20678, 2539, 51140], "temperature": 0.0, "avg_logprob": -0.20940260092417398, + "compression_ratio": 1.507462686567164, "no_speech_prob": 0.01618296280503273}, + {"id": 300, "seek": 321000, "start": 3227.12, "end": 3239.44, "text": " is one of + the best way to be able to actually increase the performance of machine learning + models,", "tokens": [51220, 307, 472, 295, 264, 1151, 636, 281, 312, 1075, 281, + 767, 3488, 264, 3389, 295, 3479, 2539, 5245, 11, 51836], "temperature": 0.0, "avg_logprob": + -0.20940260092417398, "compression_ratio": 1.507462686567164, "no_speech_prob": + 0.01618296280503273}, {"id": 301, "seek": 323944, "start": 3239.52, "end": 3248.88, + "text": " even after training. 
In normal deep learning training, there is no way + to increase the performance", "tokens": [50368, 754, 934, 3097, 13, 682, 2710, 2452, + 2539, 3097, 11, 456, 307, 572, 636, 281, 3488, 264, 3389, 50836], "temperature": + 0.0, "avg_logprob": -0.13123036490546333, "compression_ratio": 1.4887218045112782, + "no_speech_prob": 0.0019445125944912434}, {"id": 302, "seek": 323944, "start": 3248.88, + "end": 3261.68, "text": " of a model after training is complete. But in metric learning, + this is quite possible. For example,", "tokens": [50836, 295, 257, 2316, 934, 3097, + 307, 3566, 13, 583, 294, 20678, 2539, 11, 341, 307, 1596, 1944, 13, 1171, 1365, + 11, 51476], "temperature": 0.0, "avg_logprob": -0.13123036490546333, "compression_ratio": + 1.4887218045112782, "no_speech_prob": 0.0019445125944912434}, {"id": 303, "seek": + 326168, "start": 3261.7599999999998, "end": 3271.68, "text": " instead of just a + training classification model to make a probability distribution over", "tokens": + [50368, 2602, 295, 445, 257, 3097, 21538, 2316, 281, 652, 257, 8482, 7316, 670, + 50864], "temperature": 0.0, "avg_logprob": -0.26426038986597306, "compression_ratio": + 1.472, "no_speech_prob": 0.012363242916762829}, {"id": 304, "seek": 326168, "start": + 3273.04, "end": 3284.7999999999997, "text": " set of classes, we can train a metric + learning model and encode samples with that model to store", "tokens": [50932, 992, + 295, 5359, 11, 321, 393, 3847, 257, 20678, 2539, 2316, 293, 2058, 1429, 10938, 365, + 300, 2316, 281, 3531, 51520], "temperature": 0.0, "avg_logprob": -0.26426038986597306, + "compression_ratio": 1.472, "no_speech_prob": 0.012363242916762829}, {"id": 305, + "seek": 328480, "start": 3284.8, "end": 3294.96, "text": " somewhere. 
And during + the inference, we can query that store to get more similar", "tokens": [50364, 4079, + 13, 400, 1830, 264, 38253, 11, 321, 393, 14581, 300, 3531, 281, 483, 544, 2531, + 50872], "temperature": 0.0, "avg_logprob": -0.275920481295199, "compression_ratio": + 1.376, "no_speech_prob": 0.01892298273742199}, {"id": 306, "seek": 328480, "start": + 3296.6400000000003, "end": 3309.36, "text": " k-nearest neighbors and decide + on the predicted category based on the majority of those", "tokens": [50956, 5021, + 23831, 12512, 293, 4536, 322, 264, 19147, 7719, 2361, 322, 264, 6286, 295, 729, + 51592], "temperature": 0.0, "avg_logprob": -0.275920481295199, "compression_ratio": + 1.376, "no_speech_prob": 0.01892298273742199}, {"id": 307, "seek": 330936, "start": + 3309.36, "end": 3320.48, "text": " k-nearest neighbors. This is called kNN + classification, in fact. And in the practical", "tokens": [50364, 5021, 23831, 12512, + 13, 639, 307, 1219, 5021, 6219, 555, 8758, 11, 294, 1186, 13, 400, 294, 264, 8496, + 50920], "temperature": 0.0, "avg_logprob": -0.4215660509855851, "compression_ratio": + 1.4651162790697674, "no_speech_prob": 0.007927683182060719}, {"id": 308, "seek": + 330936, "start": 3320.48, "end": 3334.1600000000003, "text": " side, on the practical + side, you can continue to add new samples to that store without any need to", "tokens": + [50920, 1252, 11, 322, 264, 8496, 1252, 11, 291, 393, 2354, 281, 909, 777, 10938, + 281, 300, 3531, 1553, 604, 643, 281, 51604], "temperature": 0.0, "avg_logprob": + -0.4215660509855851, "compression_ratio": 1.4651162790697674, "no_speech_prob": + 0.007927683182060719}, {"id": 309, "seek": 333416, "start": 3334.96, "end": 3345.8399999999997, + "text": " retrain the model. 
And once you add new samples to that store your model + performance", "tokens": [50404, 1533, 7146, 264, 2316, 13, 400, 1564, 291, 909, + 777, 10938, 281, 300, 3531, 428, 2316, 3389, 50948], "temperature": 0.0, "avg_logprob": + -0.23172660101027714, "compression_ratio": 1.423728813559322, "no_speech_prob": + 0.005843288265168667}, {"id": 310, "seek": 333416, "start": 3346.64, "end": 3357.12, + "text": " will also increase. And also, there is another use case, for example, + a more recent", "tokens": [50988, 486, 611, 3488, 13, 400, 611, 11, 456, 307, 1071, + 764, 1389, 11, 337, 1365, 11, 257, 544, 5162, 51512], "temperature": 0.0, "avg_logprob": + -0.23172660101027714, "compression_ratio": 1.423728813559322, "no_speech_prob": + 0.005843288265168667}, {"id": 311, "seek": 335712, "start": 3357.7599999999998, + "end": 3373.2, "text": " approach by DeepMind. Up until now, the only way to make + AI smarter is usually train a bigger", "tokens": [50396, 3109, 538, 14895, 44, 471, + 13, 5858, 1826, 586, 11, 264, 787, 636, 281, 652, 7318, 20294, 307, 2673, 3847, + 257, 3801, 51168], "temperature": 0.0, "avg_logprob": -0.3901444948636569, "compression_ratio": + 1.0333333333333334, "no_speech_prob": 0.017984267324209213}, {"id": 312, "seek": + 337320, "start": 3373.2799999999997, "end": 3388.64, "text": " and bigger language + model. But in the most recent study by DeepMind, they augment language models", + "tokens": [50368, 293, 3801, 2856, 2316, 13, 583, 294, 264, 881, 5162, 2979, 538, + 14895, 44, 471, 11, 436, 29919, 2856, 5245, 51136], "temperature": 0.0, "avg_logprob": + -0.19730119705200194, "compression_ratio": 1.3070175438596492, "no_speech_prob": + 0.019084036350250244}, {"id": 313, "seek": 337320, "start": 3388.64, "end": 3394.56, + "text": " with retrieval capability. 
This means actually they", "tokens": [51136, + 365, 19817, 3337, 13759, 13, 639, 1355, 767, 436, 51432], "temperature": 0.0, "avg_logprob": + -0.19730119705200194, "compression_ratio": 1.3070175438596492, "no_speech_prob": + 0.019084036350250244}, {"id": 314, "seek": 339456, "start": 3395.52, "end": 3409.44, + "text": " encode and store a large collection of corpus in a Rector''s database. + And during the inference,", "tokens": [50412, 2058, 1429, 293, 3531, 257, 2416, + 5765, 295, 1181, 31624, 294, 257, 497, 20814, 311, 8149, 13, 400, 1830, 264, 38253, + 11, 51108], "temperature": 0.0, "avg_logprob": -0.32727373563326323, "compression_ratio": + 1.1176470588235294, "no_speech_prob": 0.0066364360973238945}, {"id": 315, "seek": + 340944, "start": 3410.4, "end": 3428.7200000000003, "text": " they query this database + to get most relevant sentences, most relevant text to the user input.", "tokens": + [50412, 436, 14581, 341, 8149, 281, 483, 881, 7340, 16579, 11, 881, 7340, 2487, + 281, 264, 4195, 4846, 13, 51328], "temperature": 0.0, "avg_logprob": -0.33846139907836914, + "compression_ratio": 1.2533333333333334, "no_speech_prob": 0.011745172552764416}, + {"id": 316, "seek": 342872, "start": 3428.7999999999997, "end": 3441.68, "text": + " And they combine them to feed to the model. And with this technique, they can + achieve the", "tokens": [50368, 400, 436, 10432, 552, 281, 3154, 281, 264, 2316, + 13, 400, 365, 341, 6532, 11, 436, 393, 4584, 264, 51012], "temperature": 0.0, "avg_logprob": + -0.21010682799599387, "compression_ratio": 1.2740740740740741, "no_speech_prob": + 0.013064459897577763}, {"id": 317, "seek": 342872, "start": 3442.56, "end": 3458.08, + "text": " same performance as GPT3 with 25x less parameters. 
So it''s a very efficient + way of", "tokens": [51056, 912, 3389, 382, 26039, 51, 18, 365, 3552, 87, 1570, 9834, + 13, 407, 309, 311, 257, 588, 7148, 636, 295, 51832], "temperature": 0.0, "avg_logprob": + -0.21010682799599387, "compression_ratio": 1.2740740740740741, "no_speech_prob": + 0.013064459897577763}, {"id": 318, "seek": 345872, "start": 3459.04, "end": 3470.72, + "text": " AI. So I''m also quite happy to see the direction of AI towards a more + efficient one with", "tokens": [50380, 7318, 13, 407, 286, 478, 611, 1596, 2055, + 281, 536, 264, 3513, 295, 7318, 3030, 257, 544, 7148, 472, 365, 50964], "temperature": + 0.0, "avg_logprob": -0.17484624620894312, "compression_ratio": 1.4583333333333333, + "no_speech_prob": 0.00731369573622942}, {"id": 319, "seek": 345872, "start": 3471.6, + "end": 3476.24, "text": " metric learning as well. Yeah, yeah, it''s fantastic. + And I think it''s like a good impact on the", "tokens": [51008, 20678, 2539, 382, + 731, 13, 865, 11, 1338, 11, 309, 311, 5456, 13, 400, 286, 519, 309, 311, 411, 257, + 665, 2712, 322, 264, 51240], "temperature": 0.0, "avg_logprob": -0.17484624620894312, + "compression_ratio": 1.4583333333333333, "no_speech_prob": 0.00731369573622942}, + {"id": 320, "seek": 345872, "start": 3476.24, "end": 3481.2799999999997, "text": + " planning, because I don''t think we want to spend too much electricity on all + power and training", "tokens": [51240, 5038, 11, 570, 286, 500, 380, 519, 321, 528, + 281, 3496, 886, 709, 10356, 322, 439, 1347, 293, 3097, 51492], "temperature": 0.0, + "avg_logprob": -0.17484624620894312, "compression_ratio": 1.4583333333333333, "no_speech_prob": + 0.00731369573622942}, {"id": 321, "seek": 348128, "start": 3481.28, "end": 3490.8, + "text": " neural networks. Yeah, exactly. 
And it also enables democratization of + Deep Learning, because", "tokens": [50364, 18161, 9590, 13, 865, 11, 2293, 13, 400, + 309, 611, 17077, 37221, 2144, 295, 14895, 15205, 11, 570, 50840], "temperature": + 0.0, "avg_logprob": -0.37569480718568316, "compression_ratio": 1.282758620689655, + "no_speech_prob": 0.012573977001011372}, {"id": 322, "seek": 348128, "start": 3491.6800000000003, + "end": 3503.0400000000004, "text": " not everyone has the same resources as this + large companies as Google, Facebook, and OpenAI.", "tokens": [50884, 406, 1518, + 575, 264, 912, 3593, 382, 341, 2416, 3431, 382, 3329, 11, 4384, 11, 293, 7238, 48698, + 13, 51452], "temperature": 0.0, "avg_logprob": -0.37569480718568316, "compression_ratio": + 1.282758620689655, "no_speech_prob": 0.012573977001011372}, {"id": 323, "seek": + 350304, "start": 3503.52, "end": 3508.48, "text": " So I think it''s also important + for that reason as well.", "tokens": [50388, 407, 286, 519, 309, 311, 611, 1021, + 337, 300, 1778, 382, 731, 13, 50636], "temperature": 0.0, "avg_logprob": -0.1723371891493208, + "compression_ratio": 1.6179245283018868, "no_speech_prob": 0.016916243359446526}, + {"id": 324, "seek": 350304, "start": 3509.12, "end": 3514.08, "text": " Yeah, that''s + fantastic. I mean, you gave quite a lot of detail on metric learning. Of course,", + "tokens": [50668, 865, 11, 300, 311, 5456, 13, 286, 914, 11, 291, 2729, 1596, 257, + 688, 295, 2607, 322, 20678, 2539, 13, 2720, 1164, 11, 50916], "temperature": 0.0, + "avg_logprob": -0.1723371891493208, "compression_ratio": 1.6179245283018868, "no_speech_prob": + 0.016916243359446526}, {"id": 325, "seek": 350304, "start": 3514.08, "end": 3522.4, + "text": " there is a ton to learn. 
And I even, I''ve seen like a book cited on one + of the metric learning pages", "tokens": [50916, 456, 307, 257, 2952, 281, 1466, + 13, 400, 286, 754, 11, 286, 600, 1612, 411, 257, 1446, 30134, 322, 472, 295, 264, + 20678, 2539, 7183, 51332], "temperature": 0.0, "avg_logprob": -0.1723371891493208, + "compression_ratio": 1.6179245283018868, "no_speech_prob": 0.016916243359446526}, + {"id": 326, "seek": 350304, "start": 3523.2, "end": 3531.12, "text": " that I found + through your awesome metric learning resource. And now that we touched a bit on", + "tokens": [51372, 300, 286, 1352, 807, 428, 3476, 20678, 2539, 7684, 13, 400, 586, + 300, 321, 9828, 257, 857, 322, 51768], "temperature": 0.0, "avg_logprob": -0.1723371891493208, + "compression_ratio": 1.6179245283018868, "no_speech_prob": 0.016916243359446526}, + {"id": 327, "seek": 353112, "start": 3532.08, "end": 3537.44, "text": " where the + AI is also going and then how to make it more efficient. I also like to ask a question", + "tokens": [50412, 689, 264, 7318, 307, 611, 516, 293, 550, 577, 281, 652, 309, 544, + 7148, 13, 286, 611, 411, 281, 1029, 257, 1168, 50680], "temperature": 0.0, "avg_logprob": + -0.17041492462158203, "compression_ratio": 1.6044444444444443, "no_speech_prob": + 0.0015371742192655802}, {"id": 328, "seek": 353112, "start": 3537.44, "end": 3545.3599999999997, + "text": " of why sort of this magical question which drills into your motivation + as to why at all you are", "tokens": [50680, 295, 983, 1333, 295, 341, 12066, 1168, + 597, 36126, 666, 428, 12335, 382, 281, 983, 412, 439, 291, 366, 51076], "temperature": + 0.0, "avg_logprob": -0.17041492462158203, "compression_ratio": 1.6044444444444443, + "no_speech_prob": 0.0015371742192655802}, {"id": 329, "seek": 353112, "start": 3545.3599999999997, + "end": 3551.8399999999997, "text": " in this space, let''s say deep learning and + quadrant vector search and also specifically metric", "tokens": [51076, 294, 341, + 1901, 11, 718, 311, 584, 2452, 
2539, 293, 46856, 8062, 3164, 293, 611, 4682, 20678, + 51400], "temperature": 0.0, "avg_logprob": -0.17041492462158203, "compression_ratio": + 1.6044444444444443, "no_speech_prob": 0.0015371742192655802}, {"id": 330, "seek": + 353112, "start": 3551.8399999999997, "end": 3556.24, "text": " learning. Can you + a bit elaborate on the philosophy that drives you here?", "tokens": [51400, 2539, + 13, 1664, 291, 257, 857, 20945, 322, 264, 10675, 300, 11754, 291, 510, 30, 51620], + "temperature": 0.0, "avg_logprob": -0.17041492462158203, "compression_ratio": 1.6044444444444443, + "no_speech_prob": 0.0015371742192655802}, {"id": 331, "seek": 355624, "start": 3556.72, + "end": 3570.24, "text": " Yes, sure. Actually, what motivates me to work with metric + learning is", "tokens": [50388, 1079, 11, 988, 13, 5135, 11, 437, 42569, 385, 281, + 589, 365, 20678, 2539, 307, 51064], "temperature": 0.0, "avg_logprob": -0.24811282910798726, + "compression_ratio": 0.958904109589041, "no_speech_prob": 0.009302918799221516}, + {"id": 332, "seek": 357024, "start": 3570.8799999999997, "end": 3595.52, "text": + " it''s potential to approach many different problems very efficiently. Before metric + learning actually,", "tokens": [50396, 309, 311, 3995, 281, 3109, 867, 819, 2740, + 588, 19621, 13, 4546, 20678, 2539, 767, 11, 51628], "temperature": 0.0, "avg_logprob": + -0.43420095443725587, "compression_ratio": 1.1222222222222222, "no_speech_prob": + 0.006233962252736092}, {"id": 333, "seek": 359552, "start": 3596.24, "end": 3611.28, + "text": " you need to train very different models to solve very different problems. 
+ But with metric learning,", "tokens": [50400, 291, 643, 281, 3847, 588, 819, 5245, + 281, 5039, 588, 819, 2740, 13, 583, 365, 20678, 2539, 11, 51152], "temperature": + 0.0, "avg_logprob": -0.149908954446966, "compression_ratio": 1.736842105263158, + "no_speech_prob": 0.006703353486955166}, {"id": 334, "seek": 359552, "start": 3611.28, + "end": 3624.48, "text": " you can train a single model and you can use the very + same model to solve very different problems.", "tokens": [51152, 291, 393, 3847, + 257, 2167, 2316, 293, 291, 393, 764, 264, 588, 912, 2316, 281, 5039, 588, 819, 2740, + 13, 51812], "temperature": 0.0, "avg_logprob": -0.149908954446966, "compression_ratio": + 1.736842105263158, "no_speech_prob": 0.006703353486955166}, {"id": 335, "seek": + 362448, "start": 3625.28, "end": 3638.0, "text": " And this is also another fight + that makes metric learning efficient. Actually, metric learning has a", "tokens": + [50404, 400, 341, 307, 611, 1071, 2092, 300, 1669, 20678, 2539, 7148, 13, 5135, + 11, 20678, 2539, 575, 257, 51040], "temperature": 0.0, "avg_logprob": -0.22633049488067628, + "compression_ratio": 1.4344262295081966, "no_speech_prob": 0.0023442262317985296}, + {"id": 336, "seek": 362448, "start": 3638.0, "end": 3646.56, "text": " great potential, + but you also need a great tool to put it into production.", "tokens": [51040, 869, + 3995, 11, 457, 291, 611, 643, 257, 869, 2290, 281, 829, 309, 666, 4265, 13, 51468], + "temperature": 0.0, "avg_logprob": -0.22633049488067628, "compression_ratio": 1.4344262295081966, + "no_speech_prob": 0.0023442262317985296}, {"id": 337, "seek": 364656, "start": 3647.04, + "end": 3664.16, "text": " For example, upon to now there was no way to combine vector + search with paid-out information.", "tokens": [50388, 1171, 1365, 11, 3564, 281, + 586, 456, 390, 572, 636, 281, 10432, 8062, 3164, 365, 4835, 12, 346, 1589, 13, 51244], + "temperature": 0.0, "avg_logprob": -0.38897028836337005, "compression_ratio": 
1.4360902255639099, + "no_speech_prob": 0.03534810617566109}, {"id": 338, "seek": 364656, "start": 3665.52, + "end": 3675.04, "text": " Even if you make a connection, it was not for practical + because you lose some information because", "tokens": [51312, 2754, 498, 291, 652, + 257, 4984, 11, 309, 390, 406, 337, 8496, 570, 291, 3624, 512, 1589, 570, 51788], + "temperature": 0.0, "avg_logprob": -0.38897028836337005, "compression_ratio": 1.4360902255639099, + "no_speech_prob": 0.03534810617566109}, {"id": 339, "seek": 367504, "start": 3675.7599999999998, + "end": 3689.92, "text": " you do not, you could not filter the tool systems of information + at the same time. Quadrant is", "tokens": [50400, 291, 360, 406, 11, 291, 727, 406, + 6608, 264, 2290, 3652, 295, 1589, 412, 264, 912, 565, 13, 29619, 7541, 307, 51108], + "temperature": 0.0, "avg_logprob": -0.2610823448668135, "compression_ratio": 1.4014598540145986, + "no_speech_prob": 0.003344888100400567}, {"id": 340, "seek": 367504, "start": 3690.72, + "end": 3703.2, "text": " doing a great job by combining vector search with filterable + paid-out information. So it opens up", "tokens": [51148, 884, 257, 869, 1691, 538, + 21928, 8062, 3164, 365, 6608, 712, 4835, 12, 346, 1589, 13, 407, 309, 9870, 493, + 51772], "temperature": 0.0, "avg_logprob": -0.2610823448668135, "compression_ratio": + 1.4014598540145986, "no_speech_prob": 0.003344888100400567}, {"id": 341, "seek": + 370320, "start": 3703.8399999999997, "end": 3719.12, "text": " quite a few new opportunities. 
+ For that one, you can filter your information based on", "tokens": [50396, 1596, + 257, 1326, 777, 4786, 13, 1171, 300, 472, 11, 291, 393, 6608, 428, 1589, 2361, 322, + 51160], "temperature": 0.0, "avg_logprob": -0.20006409145536877, "compression_ratio": + 1.0625, "no_speech_prob": 0.0070467074401676655}, {"id": 342, "seek": 371912, "start": + 3720.08, "end": 3731.2, "text": " if geographic, geographic place, for example, + or another sparse category,", "tokens": [50412, 498, 32318, 11, 32318, 1081, 11, + 337, 1365, 11, 420, 1071, 637, 11668, 7719, 11, 50968], "temperature": 0.0, "avg_logprob": + -0.2995162464323498, "compression_ratio": 1.3387096774193548, "no_speech_prob": + 0.007769379299134016}, {"id": 343, "seek": 371912, "start": 3733.04, "end": 3745.12, + "text": " numeric value or anything else while at the same time doing a vector search. + So I think it''s", "tokens": [51060, 7866, 299, 2158, 420, 1340, 1646, 1339, 412, + 264, 912, 565, 884, 257, 8062, 3164, 13, 407, 286, 519, 309, 311, 51664], "temperature": + 0.0, "avg_logprob": -0.2995162464323498, "compression_ratio": 1.3387096774193548, + "no_speech_prob": 0.007769379299134016}, {"id": 344, "seek": 374512, "start": 3746.08, + "end": 3760.16, "text": " really exciting. One of the most common problems in AI, + you actually do the research, but you", "tokens": [50412, 534, 4670, 13, 1485, 295, + 264, 881, 2689, 2740, 294, 7318, 11, 291, 767, 360, 264, 2132, 11, 457, 291, 51116], + "temperature": 0.0, "avg_logprob": -0.1324567084616803, "compression_ratio": 1.3357142857142856, + "no_speech_prob": 0.014797184616327286}, {"id": 345, "seek": 374512, "start": 3760.16, + "end": 3771.8399999999997, "text": " don''t have the required tooling to make it + practical in the real world. 
So I think it''s quite", "tokens": [51116, 500, 380, + 362, 264, 4739, 46593, 281, 652, 309, 8496, 294, 264, 957, 1002, 13, 407, 286, 519, + 309, 311, 1596, 51700], "temperature": 0.0, "avg_logprob": -0.1324567084616803, + "compression_ratio": 1.3357142857142856, "no_speech_prob": 0.014797184616327286}, + {"id": 346, "seek": 377184, "start": 3771.84, "end": 3784.0, "text": " important + to have such tools as Quadrant to achieve very different, very difficult and challenging", + "tokens": [50364, 1021, 281, 362, 1270, 3873, 382, 29619, 7541, 281, 4584, 588, + 819, 11, 588, 2252, 293, 7595, 50972], "temperature": 0.0, "avg_logprob": -0.2031852782718719, + "compression_ratio": 1.5026737967914439, "no_speech_prob": 0.0026681963354349136}, + {"id": 347, "seek": 377184, "start": 3784.0, "end": 3791.52, "text": " problems + very alien than efficiently. Yeah, absolutely. That''s quite deep. Thank you so + much", "tokens": [50972, 2740, 588, 12319, 813, 19621, 13, 865, 11, 3122, 13, 663, + 311, 1596, 2452, 13, 1044, 291, 370, 709, 51348], "temperature": 0.0, "avg_logprob": + -0.2031852782718719, "compression_ratio": 1.5026737967914439, "no_speech_prob": + 0.0026681963354349136}, {"id": 348, "seek": 377184, "start": 3791.52, "end": 3800.4, + "text": " for sharing this. 
It also resonates with me because in many ways, you + know, deep learning", "tokens": [51348, 337, 5414, 341, 13, 467, 611, 41051, 365, + 385, 570, 294, 867, 2098, 11, 291, 458, 11, 2452, 2539, 51792], "temperature": 0.0, + "avg_logprob": -0.2031852782718719, "compression_ratio": 1.5026737967914439, "no_speech_prob": + 0.0026681963354349136}, {"id": 349, "seek": 380040, "start": 3800.4, "end": 3806.08, + "text": " on one hand, maybe some people feel like it''s kind of overhyped and there + is so much material on", "tokens": [50364, 322, 472, 1011, 11, 1310, 512, 561, 841, + 411, 309, 311, 733, 295, 670, 3495, 3452, 293, 456, 307, 370, 709, 2527, 322, 50648], + "temperature": 0.0, "avg_logprob": -0.11098207877232479, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.025757860392332077}, {"id": 350, "seek": 380040, "start": 3806.08, + "end": 3812.1600000000003, "text": " the web. On the other hand, when you start + doing it yourself, you might end up, you know, going into", "tokens": [50648, 264, + 3670, 13, 1282, 264, 661, 1011, 11, 562, 291, 722, 884, 309, 1803, 11, 291, 1062, + 917, 493, 11, 291, 458, 11, 516, 666, 50952], "temperature": 0.0, "avg_logprob": + -0.11098207877232479, "compression_ratio": 1.6610169491525424, "no_speech_prob": + 0.025757860392332077}, {"id": 351, "seek": 380040, "start": 3812.1600000000003, + "end": 3816.32, "text": " down the rabbit hole and you don''t know all the tools + as you said. You don''t know all the best", "tokens": [50952, 760, 264, 19509, 5458, + 293, 291, 500, 380, 458, 439, 264, 3873, 382, 291, 848, 13, 509, 500, 380, 458, + 439, 264, 1151, 51160], "temperature": 0.0, "avg_logprob": -0.11098207877232479, + "compression_ratio": 1.6610169491525424, "no_speech_prob": 0.025757860392332077}, + {"id": 352, "seek": 380040, "start": 3816.32, "end": 3824.8, "text": " practices. 
+ And also, like, before we had vector databases, you couldn''t actually, well, apply + this,", "tokens": [51160, 7525, 13, 400, 611, 11, 411, 11, 949, 321, 632, 8062, + 22380, 11, 291, 2809, 380, 767, 11, 731, 11, 3079, 341, 11, 51584], "temperature": + 0.0, "avg_logprob": -0.11098207877232479, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.025757860392332077}, {"id": 353, "seek": 382480, "start": 3824.8, + "end": 3831.28, "text": " like, okay, you could of course build some nice demo and, + you know, throw a web page and just ask", "tokens": [50364, 411, 11, 1392, 11, 291, + 727, 295, 1164, 1322, 512, 1481, 10723, 293, 11, 291, 458, 11, 3507, 257, 3670, + 3028, 293, 445, 1029, 50688], "temperature": 0.0, "avg_logprob": -0.12215572042563527, + "compression_ratio": 1.6610169491525424, "no_speech_prob": 0.013267750851809978}, + {"id": 354, "seek": 382480, "start": 3831.28, "end": 3836.8, "text": " somebody, + okay, type something here and my neural network will do something. But now, like, + you could", "tokens": [50688, 2618, 11, 1392, 11, 2010, 746, 510, 293, 452, 18161, + 3209, 486, 360, 746, 13, 583, 586, 11, 411, 11, 291, 727, 50964], "temperature": + 0.0, "avg_logprob": -0.12215572042563527, "compression_ratio": 1.6610169491525424, + "no_speech_prob": 0.013267750851809978}, {"id": 355, "seek": 382480, "start": 3836.8, + "end": 3842.4, "text": " kind of scale this further and index your embeddings and + see the end result of what you''re doing", "tokens": [50964, 733, 295, 4373, 341, + 3052, 293, 8186, 428, 12240, 29432, 293, 536, 264, 917, 1874, 295, 437, 291, 434, + 884, 51244], "temperature": 0.0, "avg_logprob": -0.12215572042563527, "compression_ratio": + 1.6610169491525424, "no_speech_prob": 0.013267750851809978}, {"id": 356, "seek": + 382480, "start": 3842.4, "end": 3851.2000000000003, "text": " through the retrieval + process. So I think that opens up a lot of opportunities. 
So that''s super", "tokens": + [51244, 807, 264, 19817, 3337, 1399, 13, 407, 286, 519, 300, 9870, 493, 257, 688, + 295, 4786, 13, 407, 300, 311, 1687, 51684], "temperature": 0.0, "avg_logprob": -0.12215572042563527, + "compression_ratio": 1.6610169491525424, "no_speech_prob": 0.013267750851809978}, + {"id": 357, "seek": 385120, "start": 3851.2, "end": 3869.9199999999996, "text": + " cool. Yeah, exactly. Actually, once we have such tooling, the domain is also improving + more rapidly", "tokens": [50364, 1627, 13, 865, 11, 2293, 13, 5135, 11, 1564, 321, + 362, 1270, 46593, 11, 264, 9274, 307, 611, 11470, 544, 12910, 51300], "temperature": + 0.0, "avg_logprob": -0.2701752853393555, "compression_ratio": 1.1, "no_speech_prob": + 0.017540913075208664}, {"id": 358, "seek": 386992, "start": 3869.92, "end": 3882.8, + "text": " and also the improvements in the domain also foster development of such + tools. So I think it''s like", "tokens": [50364, 293, 611, 264, 13797, 294, 264, + 9274, 611, 17114, 3250, 295, 1270, 3873, 13, 407, 286, 519, 309, 311, 411, 51008], + "temperature": 0.0, "avg_logprob": -0.2383765345034392, "compression_ratio": 1.4393939393939394, + "no_speech_prob": 0.05824998393654823}, {"id": 359, "seek": 386992, "start": 3883.6, + "end": 3896.48, "text": " too far and it will be a metric learning will be in a + better place in the future with this", "tokens": [51048, 886, 1400, 293, 309, 486, + 312, 257, 20678, 2539, 486, 312, 294, 257, 1101, 1081, 294, 264, 2027, 365, 341, + 51692], "temperature": 0.0, "avg_logprob": -0.2383765345034392, "compression_ratio": + 1.4393939393939394, "no_speech_prob": 0.05824998393654823}, {"id": 360, "seek": + 389648, "start": 3897.2, "end": 3904.56, "text": " rapid developments in the domain. + Yeah, absolutely. 
And I was thinking, like, there''s like a ton of", "tokens": [50400, + 7558, 20862, 294, 264, 9274, 13, 865, 11, 3122, 13, 400, 286, 390, 1953, 11, 411, + 11, 456, 311, 411, 257, 2952, 295, 50768], "temperature": 0.0, "avg_logprob": -0.1799595850818562, + "compression_ratio": 1.6129032258064515, "no_speech_prob": 0.016853876411914825}, + {"id": 361, "seek": 389648, "start": 3904.56, "end": 3910.64, "text": " material, + I''m sure we''ll have to digest, at least I will have to digest a lot of it and + see how I can", "tokens": [50768, 2527, 11, 286, 478, 988, 321, 603, 362, 281, 13884, + 11, 412, 1935, 286, 486, 362, 281, 13884, 257, 688, 295, 309, 293, 536, 577, 286, + 393, 51072], "temperature": 0.0, "avg_logprob": -0.1799595850818562, "compression_ratio": + 1.6129032258064515, "no_speech_prob": 0.016853876411914825}, {"id": 362, "seek": + 389648, "start": 3910.64, "end": 3917.36, "text": " apply this. And thankfully, + you have, you know, you have this awesome metric learning resource on", "tokens": + [51072, 3079, 341, 13, 400, 27352, 11, 291, 362, 11, 291, 458, 11, 291, 362, 341, + 3476, 20678, 2539, 7684, 322, 51408], "temperature": 0.0, "avg_logprob": -0.1799595850818562, + "compression_ratio": 1.6129032258064515, "no_speech_prob": 0.016853876411914825}, + {"id": 363, "seek": 389648, "start": 3917.36, "end": 3924.88, "text": " GitHub that + we can check out. We''ll make sure to leave it in the notes. 
And if if some of us + want to", "tokens": [51408, 23331, 300, 321, 393, 1520, 484, 13, 492, 603, 652, + 988, 281, 1856, 309, 294, 264, 5570, 13, 400, 498, 498, 512, 295, 505, 528, 281, + 51784], "temperature": 0.0, "avg_logprob": -0.1799595850818562, "compression_ratio": + 1.6129032258064515, "no_speech_prob": 0.016853876411914825}, {"id": 364, "seek": + 392488, "start": 3924.88, "end": 3931.6, "text": " kind of work with you or interact + with you, can you make like a little announcement where we can", "tokens": [50364, + 733, 295, 589, 365, 291, 420, 4648, 365, 291, 11, 393, 291, 652, 411, 257, 707, + 12847, 689, 321, 393, 50700], "temperature": 0.0, "avg_logprob": -0.21776157809842017, + "compression_ratio": 1.5371428571428571, "no_speech_prob": 0.005566722247749567}, + {"id": 365, "seek": 392488, "start": 3931.6, "end": 3937.52, "text": " join forces + and kind of learn more about metric learning and maybe contribute to this field + together", "tokens": [50700, 3917, 5874, 293, 733, 295, 1466, 544, 466, 20678, 2539, + 293, 1310, 10586, 281, 341, 2519, 1214, 50996], "temperature": 0.0, "avg_logprob": + -0.21776157809842017, "compression_ratio": 1.5371428571428571, "no_speech_prob": + 0.005566722247749567}, {"id": 366, "seek": 392488, "start": 3937.52, "end": 3949.04, + "text": " with you? Yes, you''re right. I have several announcements maybe. 
First,", + "tokens": [50996, 365, 291, 30, 1079, 11, 291, 434, 558, 13, 286, 362, 2940, 23785, + 1310, 13, 2386, 11, 51572], "temperature": 0.0, "avg_logprob": -0.21776157809842017, + "compression_ratio": 1.5371428571428571, "no_speech_prob": 0.005566722247749567}, + {"id": 367, "seek": 394904, "start": 3949.2, "end": 3962.32, "text": " beyond my + resource and engineering site, I''m also a community guide guide and we have a", + "tokens": [50372, 4399, 452, 7684, 293, 7043, 3621, 11, 286, 478, 611, 257, 1768, + 5934, 5934, 293, 321, 362, 257, 51028], "temperature": 0.0, "avg_logprob": -0.33519233976091656, + "compression_ratio": 1.328358208955224, "no_speech_prob": 0.010248702950775623}, + {"id": 368, "seek": 394904, "start": 3962.32, "end": 3974.64, "text": " difficult + server at Quadrant where we hold paper reading class. We had the first one about", + "tokens": [51028, 2252, 7154, 412, 29619, 7541, 689, 321, 1797, 3035, 3760, 1508, + 13, 492, 632, 264, 700, 472, 466, 51644], "temperature": 0.0, "avg_logprob": -0.33519233976091656, + "compression_ratio": 1.328358208955224, "no_speech_prob": 0.010248702950775623}, + {"id": 369, "seek": 397464, "start": 3974.64, "end": 3985.12, "text": " contrastive + laws and we will also have another fashion about triplet laws. And I also", "tokens": + [50364, 8712, 488, 6064, 293, 321, 486, 611, 362, 1071, 6700, 466, 1376, 14657, + 6064, 13, 400, 286, 611, 50888], "temperature": 0.0, "avg_logprob": -0.39745194571358816, + "compression_ratio": 1.3706896551724137, "no_speech_prob": 0.006165626924484968}, + {"id": 370, "seek": 397464, "start": 3986.24, "end": 3996.64, "text": " wrote a + wrote an intuitional triplet law post. 
Our approach will be like,", "tokens": [50944, + 4114, 257, 4114, 364, 560, 84, 2628, 1376, 14657, 2101, 2183, 13, 2621, 3109, 486, + 312, 411, 11, 51464], "temperature": 0.0, "avg_logprob": -0.39745194571358816, "compression_ratio": + 1.3706896551724137, "no_speech_prob": 0.006165626924484968}, {"id": 371, "seek": + 399664, "start": 3997.12, "end": 4007.6, "text": " after I will write such intuitional + post about papers and then we will hold", "tokens": [50388, 934, 286, 486, 2464, + 1270, 560, 84, 2628, 2183, 466, 10577, 293, 550, 321, 486, 1797, 50912], "temperature": + 0.0, "avg_logprob": -0.18127928972244262, "compression_ratio": 1.3253968253968254, + "no_speech_prob": 0.006581358145922422}, {"id": 372, "seek": 399664, "start": 4008.48, + "end": 4019.7599999999998, "text": " Q&A sessions in our discourse servers. So everyone + who is curious about metric learning can", "tokens": [50956, 1249, 5, 32, 11081, + 294, 527, 23938, 15909, 13, 407, 1518, 567, 307, 6369, 466, 20678, 2539, 393, 51520], + "temperature": 0.0, "avg_logprob": -0.18127928972244262, "compression_ratio": 1.3253968253968254, + "no_speech_prob": 0.006581358145922422}, {"id": 373, "seek": 401976, "start": 4019.76, + "end": 4033.28, "text": " join the discourse server to enjoy this discussion. 
Apart + from that one, beside my professional", "tokens": [50364, 3917, 264, 23938, 7154, + 281, 2103, 341, 5017, 13, 24111, 490, 300, 472, 11, 15726, 452, 4843, 51040], "temperature": + 0.0, "avg_logprob": -0.16468121455265924, "compression_ratio": 1.300751879699248, + "no_speech_prob": 0.003287998028099537}, {"id": 374, "seek": 401976, "start": 4033.28, + "end": 4041.28, "text": " life, I''m recognized as a Google developer expert on + machine learning, on the", "tokens": [51040, 993, 11, 286, 478, 9823, 382, 257, + 3329, 10754, 5844, 322, 3479, 2539, 11, 322, 264, 51440], "temperature": 0.0, "avg_logprob": + -0.16468121455265924, "compression_ratio": 1.300751879699248, "no_speech_prob": + 0.003287998028099537}, {"id": 375, "seek": 404128, "start": 4041.76, "end": 4054.0800000000004, + "text": " volunteering site, community site. And this year at Google''s thunder + off call, I will serve", "tokens": [50388, 33237, 3621, 11, 1768, 3621, 13, 400, + 341, 1064, 412, 3329, 311, 19898, 766, 818, 11, 286, 486, 4596, 51004], "temperature": + 0.0, "avg_logprob": -0.6347226036919488, "compression_ratio": 1.3768115942028984, + "no_speech_prob": 0.003868973348289728}, {"id": 376, "seek": 404128, "start": 4054.7200000000003, + "end": 4063.84, "text": " as a transfer flow mentor for the transfer flow, the library + take Python package, if a package for", "tokens": [51036, 382, 257, 5003, 3095, + 14478, 337, 264, 5003, 3095, 11, 264, 6405, 747, 15329, 7372, 11, 498, 257, 7372, + 337, 51492], "temperature": 0.0, "avg_logprob": -0.6347226036919488, "compression_ratio": + 1.3768115942028984, "no_speech_prob": 0.003868973348289728}, {"id": 377, "seek": + 406384, "start": 4064.56, "end": 4072.8, "text": " metric learning in the transfer + flow ecosystem. 
So university students and fresh", "tokens": [50400, 20678, 2539, + 294, 264, 5003, 3095, 11311, 13, 407, 5454, 1731, 293, 4451, 50812], "temperature": + 0.0, "avg_logprob": -0.170440673828125, "compression_ratio": 1.5371428571428571, + "no_speech_prob": 0.011202441528439522}, {"id": 378, "seek": 406384, "start": 4073.6000000000004, + "end": 4083.6000000000004, "text": " graduates can apply to Google''s thunder off + call if they want to work with me in this effort and", "tokens": [50852, 13577, + 393, 3079, 281, 3329, 311, 19898, 766, 818, 498, 436, 528, 281, 589, 365, 385, 294, + 341, 4630, 293, 51352], "temperature": 0.0, "avg_logprob": -0.170440673828125, "compression_ratio": + 1.5371428571428571, "no_speech_prob": 0.011202441528439522}, {"id": 379, "seek": + 406384, "start": 4083.6000000000004, "end": 4092.6400000000003, "text": " contribute + to the field. That''s fantastic. I think Google''s thunder off code is an exciting", + "tokens": [51352, 10586, 281, 264, 2519, 13, 663, 311, 5456, 13, 286, 519, 3329, + 311, 19898, 766, 3089, 307, 364, 4670, 51804], "temperature": 0.0, "avg_logprob": + -0.170440673828125, "compression_ratio": 1.5371428571428571, "no_speech_prob": 0.011202441528439522}, + {"id": 380, "seek": 409264, "start": 4092.72, "end": 4097.599999999999, "text": + " place to be and there are so many projects but it''s great to learn that you are + leading the", "tokens": [50368, 1081, 281, 312, 293, 456, 366, 370, 867, 4455, 457, + 309, 311, 869, 281, 1466, 300, 291, 366, 5775, 264, 50612], "temperature": 0.0, + "avg_logprob": -0.14781305064325748, "compression_ratio": 1.6740088105726871, "no_speech_prob": + 0.011664592660963535}, {"id": 381, "seek": 409264, "start": 4097.599999999999, "end": + 4104.48, "text": " metric learning exploration there and I''m sure there will be + interest towards it and I will make", "tokens": [50612, 20678, 2539, 16197, 456, + 293, 286, 478, 988, 456, 486, 312, 1179, 3030, 309, 293, 286, 486, 652, 50956], + 
"temperature": 0.0, "avg_logprob": -0.14781305064325748, "compression_ratio": 1.6740088105726871, + "no_speech_prob": 0.011664592660963535}, {"id": 382, "seek": 409264, "start": 4104.48, + "end": 4112.5599999999995, "text": " sure to also leave the relevant link in the + show notes on this. Yeah, thanks so much. Use of this", "tokens": [50956, 988, 281, + 611, 1856, 264, 7340, 2113, 294, 264, 855, 5570, 322, 341, 13, 865, 11, 3231, 370, + 709, 13, 8278, 295, 341, 51360], "temperature": 0.0, "avg_logprob": -0.14781305064325748, + "compression_ratio": 1.6740088105726871, "no_speech_prob": 0.011664592660963535}, + {"id": 383, "seek": 409264, "start": 4113.76, "end": 4120.72, "text": " this was + a pleasure to discuss with you. I feel like I dipped some of my fingers in the water", + "tokens": [51420, 341, 390, 257, 6834, 281, 2248, 365, 291, 13, 286, 841, 411, 286, + 45162, 512, 295, 452, 7350, 294, 264, 1281, 51768], "temperature": 0.0, "avg_logprob": + -0.14781305064325748, "compression_ratio": 1.6740088105726871, "no_speech_prob": + 0.011664592660963535}, {"id": 384, "seek": 412072, "start": 4120.8, "end": 4125.360000000001, + "text": " of metric learning. I think there is still a ton to learn and thanks so + much for", "tokens": [50368, 295, 20678, 2539, 13, 286, 519, 456, 307, 920, 257, + 2952, 281, 1466, 293, 3231, 370, 709, 337, 50596], "temperature": 0.0, "avg_logprob": + -0.2606357607925147, "compression_ratio": 1.4573170731707317, "no_speech_prob": + 0.0056219808757305145}, {"id": 385, "seek": 412072, "start": 4126.320000000001, + "end": 4130.88, "text": " introducing it from so multiple angles. 
We''ve enjoyed + this conversation.", "tokens": [50644, 15424, 309, 490, 370, 3866, 14708, 13, 492, + 600, 4626, 341, 3761, 13, 50872], "temperature": 0.0, "avg_logprob": -0.2606357607925147, + "compression_ratio": 1.4573170731707317, "no_speech_prob": 0.0056219808757305145}, + {"id": 386, "seek": 412072, "start": 4132.96, "end": 4145.68, "text": " Thank you + Dimitriv again for this great opportunity. I hope the audience also enjoyed", "tokens": + [50976, 1044, 291, 20975, 270, 470, 85, 797, 337, 341, 869, 2650, 13, 286, 1454, + 264, 4034, 611, 4626, 51612], "temperature": 0.0, "avg_logprob": -0.2606357607925147, + "compression_ratio": 1.4573170731707317, "no_speech_prob": 0.0056219808757305145}, + {"id": 387, "seek": 414568, "start": 4146.56, "end": 4155.84, "text": " it as well + and I hope it will be helpful for those who are interested in metric learning.", + "tokens": [50408, 309, 382, 731, 293, 286, 1454, 309, 486, 312, 4961, 337, 729, + 567, 366, 3102, 294, 20678, 2539, 13, 50872], "temperature": 0.0, "avg_logprob": + -0.19913691740769607, "compression_ratio": 1.608695652173913, "no_speech_prob": + 0.008144969120621681}, {"id": 388, "seek": 414568, "start": 4156.4800000000005, + "end": 4162.96, "text": " Yeah, for sure. Thank you so much. I learned a ton and + I hope I''ll also see you maybe doing", "tokens": [50904, 865, 11, 337, 988, 13, + 1044, 291, 370, 709, 13, 286, 3264, 257, 2952, 293, 286, 1454, 286, 603, 611, 536, + 291, 1310, 884, 51228], "temperature": 0.0, "avg_logprob": -0.19913691740769607, + "compression_ratio": 1.608695652173913, "no_speech_prob": 0.008144969120621681}, + {"id": 389, "seek": 414568, "start": 4162.96, "end": 4168.240000000001, "text": + " some presentations or reading your blogs to learn more about it. 
Thanks so much.",
+ "tokens": [51228, 512, 18964, 420, 3760, 428, 31038, 281, 1466, 544, 466, 309, 13,
+ 2561, 370, 709, 13, 51492], "temperature": 0.0, "avg_logprob": -0.19913691740769607,
+ "compression_ratio": 1.608695652173913, "no_speech_prob": 0.008144969120621681},
+ {"id": 390, "seek": 414568, "start": 4169.6, "end": 4171.68, "text": " Thank you
+ so much. Yeah, bye bye.", "tokens": [51560, 1044, 291, 370, 709, 13, 865, 11, 6543,
+ 6543, 13, 51664], "temperature": 0.0, "avg_logprob": -0.19913691740769607, "compression_ratio":
+ 1.608695652173913, "no_speech_prob": 0.008144969120621681}]'
+---
+
Hello, today we have a new episode of the Vector Podcast, and today I'm super happy to have you, Yusuf Sarıgöz, with me. He holds the role of AI Research Engineer at Qdrant.
It's a vector search database company, and you might remember we had an episode with Tom Lackner, who is a user of Qdrant. Today we have an episode and discussion with you, Yusuf, who works for Qdrant.
And one of the core topics we're going to be discussing today is metric learning, but before that, hey, Yusuf, how are you doing? I'm very excited to join you in this episode to discuss metric learning, and thank you for having me. Yeah, thanks for coming on.
Really, I think this topic is something that has been crossing my area of focus, and also some of the questions that users are asking, you know: okay, if I have this dataset, how can I be sure that it will work with neural search, right? And I think metric learning seems to be one of the answers.
But before we start discussing this in depth, I was thinking, could you please introduce yourself to our audience? Yes, sure. As you said, I'm Yusuf, a software developer and AI researcher with a background in linguistics at the university.
Actually, I've been developing software since my high school years. During my master's study, I combined my experience and my education to study machine translation.
+
After several years of experience in different roles and at different startups, I ended up with multimodal retrieval, because I had long experience in both computer vision and natural language processing. So for some time now, my main focus has been metric learning.
I was already a user of Qdrant, even before joining Qdrant, and I thought it would be very cool to work for an open source project that I find valuable myself. Yeah, sounds awesome. Sounds cool. You just mentioned multimodal.
So you mean like multimodal search, right? And I think this field is still in many ways shaping up, and many people are still learning and kind of scratching their heads, like: what is multimodal? Maybe you could give an example or explain a little bit what multimodal is. Yes, sure.
Actually, as you just said, multimodal is quite a new topic. It's resurging with developments in deep metric learning. One of the most famous applications is CLIP by OpenAI, short for Contrastive Language-Image Pre-training.
In the most basic terms, they train a model to construct a unified vector space for both images and texts. Basically, they have two encoders, one for images and one for texts. Suppose that you have a pair of an image and its textual description.
When you feed this image and that textual description to these encoders, you are supposed to get very similar vectors as output from the two encoders. So you can search images with a textual query, or vice versa.
So you sort of cross the... so in a way, one modality is text and the image is another modality, but in this case we kind of go across modalities. I think we can cross the border of modalities with this.
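The cross-modal retrieval flow just described can be sketched in a few lines of Python. Everything below is illustrative only: the three-dimensional vectors and the `encode_text` lookup are made-up stand-ins for real CLIP encoder outputs, not actual model values.

```python
import math

# Hypothetical image embeddings, standing in for the output of a
# CLIP-style image encoder over an indexed image collection.
IMAGE_EMBEDDINGS = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.1],
    "photo_of_cat.jpg": [0.1, 0.9, 0.1],
    "photo_of_car.jpg": [0.1, 0.1, 0.9],
}

def encode_text(query):
    # Toy text "encoder": a real model maps arbitrary text into the
    # same vector space as the images; here it is a fixed lookup.
    toy_text_space = {
        "a dog playing outside": [0.85, 0.20, 0.05],
        "a fast red car": [0.05, 0.15, 0.88],
    }
    return toy_text_space[query]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search_images(query):
    """Rank images by similarity of their vectors to the text query vector."""
    q = encode_text(query)
    ranked = sorted(IMAGE_EMBEDDINGS.items(),
                    key=lambda kv: cosine_similarity(q, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked]

print(search_images("a fast red car")[0])  # the car photo ranks first
```

Because both modalities live in one vector space, the same `search_images` function would work unchanged if the query were an image embedding instead of a text embedding.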
+
Yeah, which I think to many users will sound like magic, because if you view an image as a set of pixels and your textual query as a set of words, you now sort of somehow magically search for your words in pixels. But actually, that's not exactly what's happening.
Of course, we do the embedding and so on, but in a nutshell it kind of sounds like this magical cross-modal search.
Yes, I expect for newcomers it is a little bit like magic, but for quite a long time we have already been using vector search in the context of image search. In that case, though, we search for images with a query which is itself an image, whereas in this case we make a connection between two modalities.
This is also how our human brain functions.
Most of the time, we don't consume information from a single modality. When we try to understand our environment, we take in both visual input and audio input, and we also talk to the people around us, which gives us a better understanding of the environment.
So if we want to make our AI smarter, we also need to help it gain this ability as well. So beyond searching for images with a textual query, this also helps us to combine information from different sources.
So in this case, maybe we can also have AI better understand its environment by combining, for example, a stream from the camera and an output from speech recognition: encoding each into a vector, we can combine these two vectors and feed them into the model.
So this also opens up such new opportunities. Yeah, that's a great intro, and I also liked how you gave the analogy with how the human brain functions, like how we take so many signals into our decision making.
+
And specifically, what you mentioned about CLIP: I like the fact that in practical settings, let's say you have images, say of some goods, and you want to search in those goods, and you also have some metadata, let's say titles or descriptions, right?
It may be that some human decided what to put in that text, but they didn't put everything that is in the image, right? And so I think CLIP helps us to find the semantics that are hidden inside the image itself, right? So I think that has practical impact on what we build.
Yeah, exactly.
Actually, in traditional search, let's take product search as an example: when you want to develop product search for, say, an e-commerce website, you need to enter the different terms that can describe each product, so that users can find that product with different wording. But this is not so practical, because people use very different terms to refer to things.
And at the current scale of e-commerce websites, we have hundreds of thousands of products, and these terms also need to be updated as you add new products and remove old ones.
And misspellings and typos add to this complexity; it actually explodes to millions, maybe tens of millions, of possibilities. This is beyond the power of humans, actually.
But once you make a connection between text and images, you don't need to enter such descriptive text: you only encode images into vectors at index time, into a vector database.
Then at inference time, all you need is to encode the textual input as well and query that pre-indexed database to get similar results.
Actually, this also brings new opportunities. For example, people usually enter some pre-defined textual descriptors into these search engines, but some new products may have brand-new features that people are not accustomed to.
+
So even in this case, our vector-search-based solution that combines images and text can find that image as well. Yeah, that sounds cool. So it kind of opens up a lot of opportunities that didn't exist before, when we modeled our object purely through a textual representation.
Maybe somebody did attempt to encode images in some other binary format, but I think maybe it wasn't as efficient, and definitely not multimodal. So that sounds so cool.
And so how do you connect it? Where do you start? Usually, let's say you have a dataset, right? And you want to implement a neural search experience.
At what point do you start thinking about which metric is best for your dataset? And how do you approach it, from which angle do you usually approach this? It would really help to hear your theoretical as well as practical thoughts on this.
Yes, sure. Actually, there are lots of very different techniques and methods and approaches to metric learning that work for some specific types of problems.
But in my practical experience, I usually begin with an autoencoder, because it's very easy to implement and easy to train, and it can be applied to almost any data type. Basically, in autoencoders, we have two models: an encoder and a decoder.
The encoder part encodes samples into an n-dimensional vector. This n should be much lower than the dimensionality of the input sample. And the decoder is supposed to reconstruct the input sample when this encoded vector is given to it. So this is a self-supervised method.
So it can be applied to any type of dataset; you don't need labels, and it usually gives very good results. After training such a model, you can visualize the embeddings. We call the output vectors of the encoder embeddings. So you can visualize such embeddings with a tool.
This tool can be, for example, TensorFlow Projector, and there's another tool whose name I just can't recall right now. Sorry.
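As a rough illustration of the encoder/decoder idea, here is a minimal linear autoencoder in plain Python. A real autoencoder would be a deep, non-linear network trained in a framework such as PyTorch or TensorFlow; the toy 2-D data, the 1-D bottleneck, and the numerical gradients below are assumptions made purely for the sketch.

```python
import random

# Toy dataset: 2-D points that (mostly) lie along the direction (1, 2),
# so a 1-D "embedding" can capture almost all of the information.
random.seed(0)
data = [(t, 2 * t + random.uniform(-0.05, 0.05)) for t in
        [-1.0, -0.5, 0.2, 0.7, 1.0]]

def encode(x, w):
    return w[0] * x[0] + w[1] * x[1]          # 2-D input -> 1-D embedding

def decode(z, w):
    return (w[2] * z, w[3] * z)               # 1-D embedding -> 2-D output

def loss(w):
    # Self-supervised reconstruction loss: no labels needed.
    total = 0.0
    for x in data:
        r = decode(encode(x, w), w)
        total += (x[0] - r[0]) ** 2 + (x[1] - r[1]) ** 2
    return total

w = [0.5, 0.5, 0.5, 0.5]                      # initial parameters
initial_loss = loss(w)
lr, eps = 0.005, 1e-6
for _ in range(1000):                          # plain gradient descent
    grad = []
    for i in range(len(w)):
        w_eps = list(w)
        w_eps[i] += eps
        grad.append((loss(w_eps) - loss(w)) / eps)  # numerical gradient
    w = [wi - lr * gi for wi, gi in zip(w, grad)]

final_loss = loss(w)
print(initial_loss, final_loss)               # reconstruction error shrinks
```

After training, only `encode` would be kept: the decoder is disposed of, and the 1-D outputs become the vectors to index, exactly as described next in the conversation.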
No worries. We can find those links later, I guess. Yeah, we can put a link in the description.
And these visualization tools let us see whether our encoder really embeds similar samples closer to each other than dissimilar ones. If it does, we can use the encoder part.
We can simply dispose of the decoder part, keep the encoder part, and use it to encode our samples and index them in the vector database. And we can already start searching semantically. But we can usually do better than this with only a small set of labeled data.
And you actually need only a few labels for that. Actually, we are preparing some publications to demonstrate this. After you train an encoder with a considerable number of unlabeled samples, all you need to do is fine-tune it with a small set of labeled data.
On the supervised side, there are really quite a number of very different approaches to metric learning, from more traditional margin-based approaches to newer classification-based approaches. And actually, they deserve a long discussion of their own. For sure. Yeah, that's awesome.
But just to unpack it a little bit: so in a nutshell, the metric learning process allows me to learn the optimal distance metric for my data. So it's kind of like a function of my dataset's inner properties. Yeah, actually, let's clarify this metric thing.
What does it mean in this context? In this context, a metric is a non-negative function with two inputs, let's say X and Y, and it is used to measure what is called the distance between X and Y. When we feed it two such inputs, it gives us a scalar positive value.
If this value is closer to zero, then we can assume that those two inputs are more similar to each other than two inputs with a higher distance value. So our whole objective in metric learning is to train functions that can give this distance value.
On the practical side, we usually train a model that outputs a vector, an n-dimensional vector.
And then we can apply different distance functions, such as Euclidean or cosine distance, to get a measurement of the distance value. There is also the term deep metric learning.
Actually, traditional metric learning uses linear transformations to project samples into an n-dimensional feature space where a metric function is applied.
But this linear aspect of such transformations limits the use of traditional metric learning, especially with richer data types, for example images and texts.
So deep metric learning benefits from the methods of deep learning to learn non-linear transformations that project samples into a new n-dimensional vector space.
But in this context, I usually use metric learning as an umbrella term to refer to both traditional metric learning and deep metric learning, just like we do with machine learning to refer to both classical machine learning and deep learning. Yeah, that makes sense. Thank you.
So essentially, in layman's terms, deep learning allows us to vectorize data objects that previously we couldn't vectorize easily, such as images. And to do it efficiently, because in images you might have way too many pixels.
So if you just take the vector of all the pixels, it's way too big an object to deal with. And so you vectorize, as you said in the beginning, and you basically project it into a lower-dimensional space. So now you can actually operate on it efficiently. Exactly.
Let's take images as an example. Let's assume that we have images with a size of 200 by 200, and we also have three channels. So we end up with 200 times 200 times 3 values for a single image. And actually, too many values also mean great variance.
So it's not so practical to make a measurement between two images directly, because those pixel values can include quite shallow surface features that do not make any sense for our semantics.
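The two distance functions mentioned here can be written out directly; the example vectors are invented purely for illustration.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions,
    # up to 2 for exactly opposite directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

anchor = [1.0, 0.0, 1.0]
near = [0.9, 0.1, 1.1]    # meant to be similar to the anchor
far = [-1.0, 1.0, -1.0]   # meant to be dissimilar

# For both metrics, a smaller value means "more similar".
print(euclidean_distance(anchor, near) < euclidean_distance(anchor, far))
print(cosine_distance(anchor, near) < cosine_distance(anchor, far))
```

Note the difference in behavior: Euclidean distance is sensitive to vector magnitude, while cosine distance only compares directions, which is why the choice between them depends on how the embeddings were trained.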
+
But once we encode those high-dimensional inputs into a low-dimensional vector space, for example 512- or 1024-dimensional vectors, the dimensionality is really low compared to the original dimension of the sample.
So in this case, the model has to learn a representation of the high-dimensional samples. Actually, we just throw away the unnecessary part of those samples, and we only keep the part that matters to us. Yeah, yeah.
So in some sense, you could say it's like signal compression, right?
So in some sense, using the signal's distribution, you could actually compress things. Theoretically speaking, if in an image you have one object and the rest is just a background of one color,
you really don't need to pass all these pixels independently; you could just say, okay, it's a background and I've learned that it's that color, kind of semantically, I guess. And then what matters is the object somewhere in there that we focus on when we look at this picture, right? Yeah, exactly.
Actually, in the original distribution, in the case of images for example, we don't have any connection between the value of a pixel and the semantic counterpart of that pixel. But once we transform it into a vector space, at least theoretically, we can make interpretations.
For example, say we have a 1024-dimensional vector as a representation of an image. In this case, if we examine this vector space, we may conclude that the value at index zero encodes some feature of the image.
For example, it can encode the size of a specific object, or the color value of a specific object, or maybe some more abstract features of objects. This enables us to search more efficiently; otherwise, our values are distributed across a very wide range,
and we don't have such interpretations in that distribution space. Yeah, that makes sense.
It's very high-variance, and also in some sense a waste of space, because we are not communicating that much more information by encoding all these pixels.
But we could actually extract some features and patterns from the image.
I think some early work on this was done using, if I remember correctly, what was called a Gabor filter, and other ways of smoothing your image and trying to learn what features you have, for instance if you try to differentiate between spruce and willow trees.
So, for the purposes of keeping one tree and maybe removing the others. But I think it wasn't as efficient as deep learning, because deep learning, as far as I understand, basically learns the features itself in many ways.
It learns from the data, and then you have some target function that you're optimizing for, so it can adjust the weights inside it. Exactly.
Actually, the most differentiating feature of deep learning is that it is used to learn the parameters of complex functions instead of manually tuning them. Before deep learning, we already had most of the filters we currently have.
But the parameters of such filters were supposed to be manually tuned by experts in the domain. In deep learning, we learn those parameters directly from data. And as you said, actually, the origins of metric learning are also in dimensionality reduction.
We have the most popular contrastive loss, for example.
The first introduction of contrastive loss was in 2005, and the original purpose of that function was actually to reduce dimensionality rather than to enable vector search or anything else; the authors just tried to reduce the dimensionality of high-dimensional inputs in order to feed lower-dimensional features to other models.
Yeah, that sounds exciting.
Actually, before you brought this up, I didn't think of it that way, because I was experimenting in my team with things like product quantization. So you already have the vectors computed by the neural network, but you could actually quantize them even further.
So you save space, and of course you introduce some overlaps that might decrease your precision, but only slightly, and you're going to save a ton of space and make your search more efficient.
So it's almost like you could think of dimensionality reduction on so many different levels and in so many ways as you reason about your data, right? Yeah, exactly.
Actually, metric learning is itself a type of dimensionality reduction, but even after you apply metric learning and vector encoding to your data, you still have a high-dimensional vector. You have, for example, 1024-dimensional data times 32 bits for a single float.
So it's already a huge amount of data when you have, for example, millions of samples. So you can still apply some quantization methods to get even smaller representations out of that.
And this can also be hierarchical, meaning that you can get several representations of the same sample, with different levels of information encoded in that feature space. Yeah, that's fantastic.
So I was also thinking, could you give some practical example or setting where I could start thinking about deploying metric learning? And could you point us in the direction of what tools are available? I don't think we need to reinvent everything from scratch; maybe there are some best practices available, you know, to structure this process.
Can you give some advice on that? Yeah, sure. For a starter example: actually, what metric learning is best known for is its use in face recognition, but personally, I don't support the use of machine learning to process biometric information.
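The quantization idea mentioned a moment ago can be illustrated with simple scalar quantization. Product quantization, which came up in the conversation, is more elaborate (it quantizes sub-vectors against learned codebooks); this float-to-8-bit toy only shows the space-versus-precision trade-off.

```python
# Scalar quantization sketch: map each float in an embedding vector to an
# 8-bit integer code, trading a small reconstruction error for a 4x memory
# saving (1 byte per code instead of 4 bytes per float32).

def quantize(vector, lo=-1.0, hi=1.0):
    """Map floats in [lo, hi] to integer codes in [0, 255]."""
    scale = 255.0 / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]

def dequantize(codes, lo=-1.0, hi=1.0):
    """Approximately recover the floats from their 8-bit codes."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]

original = [0.12, -0.83, 0.55, 0.99]
codes = quantize(original)
restored = dequantize(codes)

# Each restored value is off by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes, max_error)
```

The hierarchical variant mentioned above would simply keep several such representations at different resolutions, refining the coarse codes with the finer ones when more precision is needed.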
+
So I'll give an example from our everyday life: actually, we use it almost every day. Smart Reply, the feature found in, for example, Gmail, LinkedIn, and other messaging apps. It is trained on a large collection of conversation histories from these platforms.
Basically, just like the image-and-text unified vector space example we gave in the beginning, they construct a unified vector space for conversation histories and single-sentence replies.
At any moment in a conversation, you encode the history of that conversation to retrieve the most relevant replies to that history. And you can show them as suggestions to the users, and users can pick one of them.
And what is exciting with this setup: you can also log the chosen reply, and you can continue improving your model with direct feedback from your actual users. So it's a really practical use case of metric learning.
And for practitioners who want to start experimenting with metric learning, there are actually lots of tools to solve very different problems in metric learning.
In the context of deep learning model development itself, we have several libraries, such as PyTorch Metric Learning and TensorFlow Similarity.
There are other libraries as well, but I think these are the most mature and, how should I say, versatile libraries for tackling different tasks.
On the other hand, for visualization, we have the TensorFlow Projector, a browser-based tool with which you can examine your embeddings easily.
There are also vector search databases, which are increasing in number, but of course I am a fan of Qdrant, because it's really doing a great job, with extensive filtering support for a variety of data types. And it's doing this very efficiently, very elegantly, in only 40 megabytes.
So it opens up something very important: putting your metric learning model into production, and combining vector search with filtered search as well.
So you can just filter your data based on its payload information at the same time as doing vector search.
Other than that, beside my research and engineering practice, I'm also maintaining a repository called Awesome Metric Learning, and I'm regularly sharing new developments in the domain of metric learning there, with personal annotations.
So I think it might also be quite helpful for those who want to find their way in this domain. That's awesome. Thank you.
I will certainly make sure to add all of these links in the description, in the notes to this podcast. Usually, all of these podcasts that I do have a lot of links, so you can almost use them as educational material. And thanks so much for adding so much information here.
And I actually wanted to drill a little bit again into that brilliant example you gave about predicting sort of what's next when I type.
Actually, I use this feature quite a lot, especially when on the go, and today I think I used it somewhere in Gmail. I was on the go and I had only one finger free, right?
Just holding my phone as I go, and there was a question, and the suggested answer was something like "yes, it happened" or "yes, it did".
And maybe it wasn't the best semantic choice, or maybe not the most elegant choice linguistically, like maybe I would add more color, but because I was on the go, it was fine to save those few minutes and not be distracted by the phone.
So I just pressed that button, and off it goes. So that's a fantastic feature. So I wanted to open up the process of metric learning a little bit in this case. Basically, I imagine, and please correct me if I'm wrong:
as input I would have, let's say, a pair of sentences, what was the input and what was the prediction, and that prediction could be either curated by experts or we could have mined it from the logs, whatever.
So let's say we have a corpus like this, right?
So we can employ a sequence-to-sequence model, or some other model, to actually train our first predictor.
So at which point would you start thinking, and how exactly would you start thinking, about metric learning? Like, how can I change the behavior of my model? Will I replace the last layer of my neural network with a different layer that I have learned from metric learning?
Can you open up this kitchen for me a bit? Thanks.
Actually, this Smart Reply has its own paper by Google as well, and they do a really great job describing the whole logic, all the design decisions behind this feature.
As you already said, the suggested replies are not the best, most specific replies that you can imagine, but this is actually by design, because they do not generate those replies: they have a large collection of such replies, which should be as flexible as possible to fit different circumstances.
So they shouldn't have any specific references to a particular sentence in the conversation. They should be generic enough to apply to almost any conversation.
On the training side, yeah, actually, they filter a large collection from the different platforms they are running (Gmail and other platforms), keeping short replies and thematically broad samples, such as the examples you gave: "Yes, I did" or "No, I didn't".
And the actual training algorithm works like this. They came up with a very creative, very clever loss function just to train this model. They have only pairs of two samples, and there is no other label or information.
We only have one input and one ground truth; we have no other scoring, no other label or anything else. So we get a batch of, for example, n pairs, and we encode both sides of those n pairs, so we end up with two times n samples.
+
And once we encode them with our encoder, we can compute a distance matrix between all outputs of the encoder. A distance matrix is a two-dimensional matrix that holds the distance value between all possible pairs in a collection.
So we have a matrix of n times n, and we already have these samples as pairs. We already know that there is a companion target sample for each sample: for the first sample, at index zero, the companion sample is also at index zero; for the sample at index one,
the companion sample is at index one. So we can generate these target labels just based on this information. So it's like a categorical classification now.
So for the first sample in the pair, at index zero, the categorical label should be zero, and all other index values should be wrong answers.
So we can simply encode this information as a one-hot encoding, use a cross-entropy loss, and train the model with this loss.
So it is called multiple negatives ranking loss, because in some way we rank all possible replies in a batch, with multiple negatives and only one positive sample. Yeah.
And so you would train this network with this loss function, and then the output will be what? Will it be the optimal metric, or the optimal...? Yeah.
Once we train this model, we end up with a model that can encode a sentence in such a way that its vector can retrieve the most relevant vectors from a collection of possible replies. So after we train this model, we encode all possible replies and index them in a vector database.
And at inference time, we encode the user's input again with this model and make a query, a vector search query, to that pre-indexed database of possible replies, and we can get, for example, the k nearest neighbors to that vector to suggest to the user. Yeah.
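The loss just described can be sketched in plain Python. The context and reply embeddings below are invented for illustration; in a real system they would be encoder outputs, and the loss would be minimized with an optimizer over the encoder's parameters.

```python
import math

# Multiple negatives ranking loss: in a batch of n (context, reply) pairs,
# reply i is the positive target for context i, and every other reply in
# the batch serves as a negative. The targets are the "diagonal" indices,
# so a plain softmax cross-entropy does the ranking.

def multiple_negatives_ranking_loss(contexts, replies):
    n = len(contexts)
    loss = 0.0
    for i in range(n):
        # Similarity of context i to every reply in the batch (dot product).
        scores = [sum(c * r for c, r in zip(contexts[i], reply))
                  for reply in replies]
        # Cross-entropy against the one-hot target at index i.
        log_sum = math.log(sum(math.exp(s) for s in scores))
        loss += -(scores[i] - log_sum)
    return loss / n

# A "good" encoder: each context vector points at its own reply vector.
good_contexts = [[1.0, 0.0], [0.0, 1.0]]
good_replies  = [[0.9, 0.1], [0.1, 0.9]]

# A "bad" encoder: contexts point at the wrong replies.
bad_replies = [[0.1, 0.9], [0.9, 0.1]]

good_loss = multiple_negatives_ranking_loss(good_contexts, good_replies)
bad_loss = multiple_negatives_ranking_loss(good_contexts, bad_replies)
print(good_loss, bad_loss)  # the well-matched embeddings score a lower loss
```

Note that no explicit negative labels were ever needed: the other pairs in the batch supply the negatives for free, which is exactly why only (input, ground-truth) pairs suffice.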
+
I mean, after you explained this, the mental image it evokes for me is that rather than learning the metric, we're actually learning the vectors themselves.
We're learning the best vector representation of our object to satisfy some goal, right? Let's say that for this sentence, the closest reply should be this one, in some sense. Yeah, exactly. Actually, the model learns a representation that satisfies our purpose.
So in some way, we can freely pick any distance metric based on this intuition. Yeah. So, the second part of your question: when should we think about metric learning?
Actually, metric learning can be applied to almost any domain of problems, but there are some particular cases where metric learning really shines over other alternatives.
These are data-scarce regimes: especially if you are short of labeled data, you can still do a pretty good job with, for example, an autoencoder, as we already discussed. Also, if you have rapidly changing distributions, it's again very helpful.
And if you have, for example, a very high number of classes, again, metric learning can do a good job. Finally, metric learning is one of the best ways to actually increase the performance of machine learning models even after training.
In normal deep learning training, there is no way to increase the performance of a model after training is complete. But in metric learning, this is quite possible.
For example, instead of just training a classification model to produce a probability distribution over a set of classes, we can train a metric learning model, encode samples with that model, and store them somewhere.
And during inference, we can query that store to get the k most similar nearest neighbors, and decide on the predicted category based on the majority of those k nearest neighbors. This is called k-NN classification, in fact.
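The k-NN classification scheme just described can be sketched as follows. The stored embeddings and labels are made up; in practice they would come from a trained metric learning model.

```python
from collections import Counter
import math

# A store of (embedding, label) pairs, playing the role of the indexed
# vector database. The vectors here are invented toy embeddings.
store = [
    ([0.90, 0.10], "cat"),
    ([0.80, 0.20], "cat"),
    ([0.10, 0.90], "dog"),
    ([0.20, 0.80], "dog"),
    ([0.15, 0.85], "dog"),
]

def knn_classify(query, k=3):
    """Majority vote among the k stored embeddings nearest to the query."""
    by_distance = sorted(store,
                         key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify([0.85, 0.15]))  # "cat": its nearest stored vectors are cats
```

Because the encoder itself never changes, appending new (embedding, label) pairs to `store` immediately affects future predictions, which is the "improve after training, no retraining needed" property described next.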
+And on the practical side, you can continue to add new samples to that store without any need to retrain the model. And as you add new samples to that store, your model's performance will also increase. +And there is also another use case, for example, a more recent approach by DeepMind. Up until now, the usual way to make AI smarter was to train a bigger and bigger language model. But in the most recent study by DeepMind, they augment language models with retrieval capability. +This means they actually encode and store a large text corpus in a vector database. And during inference, they query this database to get the most relevant sentences, the most relevant text, for the user input, and they combine them to feed to the model. +And with this technique, they can achieve the same performance as GPT-3 with 25x fewer parameters. So it's a very efficient way of doing AI. So I'm also quite happy to see the direction of AI moving towards a more efficient one with metric learning as well. Yeah, yeah, it's fantastic. +And I think it has a good impact on the planet, because I don't think we want to spend too much electricity and power on training neural networks. Yeah, exactly. +And it also enables the democratization of deep learning, because not everyone has the same resources as these large companies like Google, Facebook, and OpenAI. So I think it's also important for that reason as well. Yeah, that's fantastic. I mean, you gave quite a lot of detail on metric learning. +Of course, there is a ton to learn. I've even seen a book cited on one of the metric learning pages that I found through your awesome-metric-learning resource. And now we've touched a bit on where AI is going and how to make it more efficient.
+I'd also like to ask a "why" question, this magical question which drills into your motivation: why are you in this space at all, let's say deep learning and Qdrant's vector search, and also specifically metric learning? +Can you elaborate a bit on the philosophy that drives you here? Yes, sure. Actually, what motivates me to work with metric learning is its potential to approach many different problems very efficiently. +Before metric learning, you needed to train very different models to solve very different problems. But with metric learning, you can train a single model and use that very same model to solve very different problems. +And this is another fact that makes metric learning efficient. Actually, metric learning has a great potential, but you also need a great tool to put it into production. For example, up until now there was no way to combine vector search with payload information. +Even if you made such a combination, it was not very practical, because you lost some information: you could not filter on the two kinds of information at the same time. Qdrant is doing a great job by combining vector search with filterable payload information. +So it opens up quite a few new opportunities. With it, you can filter your information based on, for example, a geographic place, or another sparse category, a numeric value, or anything else, while at the same time doing a vector search. So I think it's really exciting. +One of the most common problems in AI is that you actually do the research, but you don't have the required tooling to make it practical in the real world. +So I think it's quite important to have tools such as Qdrant to tackle very different, very difficult and challenging problems very elegantly and efficiently. Yeah, absolutely. That's quite deep. Thank you so much for sharing this.
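The combination of payload filtering and vector search described here can be illustrated with a toy linear scan. This is a hedged sketch only: `filtered_search` and the point layout are invented for the example, and a real engine such as Qdrant uses payload indexes and an ANN index rather than scanning every point.

```python
def filtered_search(points, query_vec, payload_filter, top=3):
    """Toy payload-filtered vector search.

    Keep only the points whose payload passes the filter, then rank the
    survivors by dot-product similarity to the query vector, so metadata
    filtering and vector search happen in a single step instead of
    post-filtering a plain nearest-neighbor result.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    hits = [p for p in points if payload_filter(p["payload"])]
    return sorted(hits, key=lambda p: -dot(p["vector"], query_vec))[:top]
```

Filtering before ranking is the point being made in the conversation: a candidate that matches the vector perfectly but fails the payload condition never appears in the result.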
+It also resonates with me because, in many ways, you know, deep learning on one hand maybe feels overhyped to some people, and there is so much material on the web. +On the other hand, when you start doing it yourself, you might end up going down the rabbit hole, and you don't know all the tools, as you said. You don't know all the best practices. +And also, before we had vector databases, you couldn't really apply this. Okay, you could of course build some nice demo, throw up a web page, and ask somebody: type something here and my neural network will do something. +But now you can scale this further, index your embeddings, and see the end result of what you're doing through the retrieval process. So I think that opens up a lot of opportunities. So that's super cool. Yeah, exactly. +Actually, once we have such tooling, the domain improves more rapidly, and the improvements in the domain in turn foster the development of such tools. +So I think it goes both ways, and metric learning will be in a better place in the future with these rapid developments in the domain. Yeah, absolutely. +And I was thinking, there's a ton of material; I'm sure we'll have to digest a lot of it, at least I will, and see how I can apply this. And thankfully, you have this awesome-metric-learning resource on GitHub that we can check out. +We'll make sure to leave it in the notes. And if some of us want to work with you or interact with you, can you make a little announcement about where we can join forces, learn more about metric learning, and maybe contribute to this field together with you? Yes, you're right. +I have several announcements, maybe. First, beyond my resource and the engineering side, I'm also a community guide, and we have a Discord server at Qdrant where we hold a paper reading club.
We had the first one about contrastive loss, and we will also have another session about triplet loss. +And I also wrote an intuitive post about triplet loss. Our approach will be like this: I will write such intuitive posts about papers, and then we will hold Q&A sessions on our Discord server. +So everyone who is curious about metric learning can join the Discord server to enjoy these discussions. Apart from that, beside my professional life, I'm recognized as a Google Developer Expert in machine learning on the volunteering, community side. +And this year at Google Summer of Code, I will serve as a mentor for TensorFlow Similarity, the Python package for metric learning in the TensorFlow ecosystem. +So university students and fresh graduates can apply to Google Summer of Code if they want to work with me on this effort and contribute to the field. That's fantastic. + I think Google Summer of Code is an exciting place to be, and there are so many projects, but it's great to learn that you are leading the metric learning exploration there. I'm sure there will be interest in it, and I will make sure to also leave the relevant link in the show notes. +Yeah, thanks so much, Yusuf, this was a pleasure to discuss with you. I feel like I've dipped my fingers in the waters of metric learning. I think there is still a ton to learn, and thanks so much for introducing it from so many angles. We've enjoyed this conversation. +Thank you, Dmitry, again for this great opportunity. I hope the audience enjoyed it as well, and I hope it will be helpful for those who are interested in metric learning. Yeah, for sure. Thank you so much. +I learned a ton, and I hope I'll also see you, maybe doing some presentations or reading your blogs, to learn more about it. Thanks so much. Thank you so much. Yeah, bye-bye. \ No newline at end of file